Environmental Data Analysis: Methods and Applications 9783110424904, 9783110430011

Most environmental data involve a large degree of complexity and uncertainty. Environmental Data Analysis is created to


English Pages 334 Year 2016


Table of contents :
Preface
Contents
1 Time series analysis
1.1 Stationary time series
1.2 Prediction of time series
1.3 Spectral analysis
1.4 Autoregressive moving average models
1.5 Prediction and modeling of ARMA processes
1.6 Multivariate ARMA processes
1.7 State-space models
2 Chaos and dynamical systems
2.1 Dynamical systems
2.2 Henon and logistic maps
2.3 Lyapunov exponents
2.4 Fractal dimension
2.5 Prediction
2.6 Delay embedding vectors
2.7 Singular spectrum analysis
2.8 Recurrence networks
3 Approximation
3.1 Trigonometric approximation
3.2 Multivariate approximation and dimensionality reduction
3.3 Polynomial approximation
3.4 Spline approximation and rational approximation
3.5 Wavelet approximation
3.6 Greedy algorithms
4 Interpolation
4.1 Curve fitting
4.2 Lagrange interpolation
4.3 Hermite interpolation
4.4 Spline interpolation
4.5 Trigonometric interpolation and fast Fourier transform
4.6 Bivariate interpolation
5 Statistical methods
5.1 Linear regression
5.2 Multiple regression
5.3 Case study: Tree-ring-based climate reconstructions
5.4 Covariance analysis
5.5 Discriminant analysis
5.6 Cluster analysis
5.7 Principal component analysis
5.8 Canonical correlation analysis
5.9 Factor analysis
6 Numerical methods
6.1 Numerical integration
6.2 Numerical differentiation
6.3 Iterative methods
6.4 Difference methods
6.5 Finite element methods
6.6 Wavelet methods
7 Optimization
7.1 Newton’s method and steepest descent method
7.2 The variational method
7.3 The simplex method
7.4 Fermat rules
7.5 Karush–Kuhn–Tucker optimality conditions
7.6 Primal and dual pairs of linear optimization
7.7 Case studies
8 Data envelopment analysis
8.1 Charnes–Cooper–Rhodes DEA models
8.2 Banker–Charnes–Cooper DEA models
8.3 One-stage and two-stage methods
8.4 Advanced DEA models
8.5 Software and case studies
9 Risk assessments
9.1 Decision rules under uncertainty
9.2 Decision trees
9.3 Fractile and triangular methods
9.4 The ε-constraint method
9.5 The uncertainty sensitivity index method
9.6 The partitioned multiobjective risk method
9.7 The multiobjective multistage impact analysis method
9.8 Multiobjective risk impact analysis method
9.9 The Leslie model
9.10 Leontief’s and inoperability input-output models
10 Life cycle assessments
10.1 Classic life cycle assessment
10.2 Exergetic life cycle assessment
10.3 Ecologically-based life cycle assessment
10.4 Case studies
Index

Zhihua Zhang Environmental Data Analysis

Also of Interest Probability Theory and Statistical Applications. A Profound Treatise for Self-Study Peter Zörnig, 2016 ISBN 978-3-11-036319-7, e-ISBN 978-3-11-040283-4

An Introduction to Nonlinear Optimization Theory Marius Durea, Radu Strugariu, 2014 ISBN 978-3-11-042603-8, e-ISBN 978-3-11-042735-6

Asymptotic Statistics. With a View to Stochastic Processes Reinhard Höpfner, 2014 ISBN 978-3-11-025024-4, e-ISBN 978-3-11-036778-2

Compressive Sensing. Applications to Sensor Systems and Image Processing Joachim Ender, 2017 ISBN 978-3-11-033531-6, e-ISBN 978-3-11-039027-8

Scientific Computing. For Scientists and Engineers Timo Heister, Leo G. Rebholz, 2015 ISBN 978-3-11-035940-4, e-ISBN 978-3-11-038680-6

Zhihua Zhang

Environmental Data Analysis | Methods and Applications

Author Prof. Zhihua Zhang College of Global Change & Earth System Science Beijing Normal University 19 Xinjiekou Wai St. 100875 Beijing People’s Republic of China

ISBN 978-3-11-043001-1 e-ISBN (PDF) 978-3-11-042490-4 e-ISBN (EPUB) 978-3-11-042498-0 Set-ISBN 978-3-11-042491-1

Library of Congress Cataloging-in-Publication Data A CIP catalog record for this book has been applied for at the Library of Congress. Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de. © 2017 Walter de Gruyter GmbH, Berlin/Boston Cover image: Merve Sarac/iStock/thinkstock Typesetting: PTP-Berlin, Protago-TEX-Production GmbH, Berlin Printing and binding: CPI books GmbH, Leck ♾ Printed on acid-free paper Printed in Germany www.degruyter.com

Preface Environmental data provide huge amounts of information, but it is complex to process due to the size, variety, and dynamic nature of the data. In order to develop solutions to many environmental issues and make predictions to determine how resources are best allocated, environmental researchers have spent considerable time ensuring well-conducted data collection, analyzing and interpreting environmental data, and describing environmental changes with sound and validated models. Therefore, researchers in environmental science need to be familiar with various advanced techniques for exploration, identification and analysis of patterns in data. This book covers the comprehensive range of topics in data analysis in space, time and spectral domains which are necessary knowledge for environmental research. Main topics include Models for Linear and Nonlinear Environmental Systems, Statistical and Numerical Methods, Data Envelopment Analysis, Risk Assessments, and Life Cycle Assessments. It is a concise and accessible book suitable for anyone interested in learning and understanding advanced methods and applications in environmental data analysis.

DOI 10.1515/9783110424904-001


1 Time series analysis

The main objectives of environmental time series analysis are to describe environmental change, explain the mechanisms underlying these changes, predict future environmental change under a given perturbation, and avoid undesired environmental impacts. In environmental time series analysis, autoregressive moving average (ARMA) models and state-space models are the two most useful tools: fitting them to environmental time series with complex dynamics reveals the dynamical characteristics of environmental change.

1.1 Stationary time series

A time series is a set of observations $\{x_t\}$, where each $t$ is a specific time. Time series are encountered for many environmental variables, such as temperature, rainfall, pollution, emissions and population. To draw inferences from a time series, including prediction and interrelationships, and to understand the mechanism generating the series, one needs to establish a probability model. The observed series is postulated to be a realization of the probability model.

For a complete probability model $\{X_t\}$, all of the joint distributions of $(X_1, \dots, X_n)^T$ $(n \in \mathbb{Z}_+)$ would be specified, i.e., for any $n \in \mathbb{Z}_+$, the probabilities $P(X_1 \le x_1, \dots, X_n \le x_n)$ are specified. If all the joint distributions are multivariate normal, the distributional properties of the model are determined completely by the means $EX_t$ and covariances $\mathrm{Cov}(X_{t+h}, X_t)$ $(t \in \mathbb{Z}_+,\ h = 0, 1, \dots)$.

Generally speaking, a complete probability model cannot be established from a single realization, and in most practical problems involving time series only one realization is observed. Moreover, to obtain all of the joint distributions, one would need to estimate too many parameters. However, from a linear prediction viewpoint, the minimum mean squared error linear predictor depends only on means and covariances. Therefore, one characterizes time series models by their second-order properties.

A time series model for observed data is a sequence of random variables $\{X_t\}$ with mean function $EX_t$ and covariance function $\mathrm{Cov}(X_{t+h}, X_t)$. Denote
$$\mu_X(t) = EX_t, \qquad \gamma_X(t+h, t) = \mathrm{Cov}(X_{t+h}, X_t).$$
We say a time series $\{X_t\}$ is stationary if its mean function $\mu_X(t)$ is independent of $t$ and the autocovariance function $\gamma_X(t+h, t)$ is independent of $t$ for each lag $h$.

DOI 10.1515/9783110424904-002


1.1.1 Autocovariance functions

If a time series $\{X_t\}$ is stationary, denote $\mu_X = \mu_X(t)$. The autocovariance function of $\{X_t\}$ at lag $h$ is $\gamma_X(h) = \gamma_X(t+h, t)$ and the autocorrelation function at lag $h$ is $\rho_X(h) = \gamma_X(h)/\gamma_X(0)$. The basic properties of the autocovariance function $\gamma_X(h)$ are as follows:
– $\gamma_X(0) \ge 0$, $\gamma_X(h)$ is even, and $|\gamma_X(h)| \le \gamma_X(0)$ for all $h$;
– the matrix $(\gamma_X(i-j))_{i,j=1,\dots,n}$ is nonnegative definite, i.e., for any real-valued vector $a = (a_1, \dots, a_n)^T$,
$$\sum_{i,j=1}^{n} a_i\, \gamma_X(i-j)\, a_j \ge 0 \qquad (n \in \mathbb{Z}_+).$$

In fact, it is clear that
$$\gamma_X(0) = \mathrm{Cov}(X_0, X_0) = \mathrm{Var}(X_0) \ge 0,$$
$$\gamma_X(-h) = \mathrm{Cov}(X_{-h}, X_0) = \mathrm{Cov}(X_0, X_h) = \mathrm{Cov}(X_h, X_0) = \gamma_X(h).$$
By the Schwarz inequality,
$$0 \le |\gamma_X(h)| = |\mathrm{Cov}(X_h, X_0)| = |E[(X_h - EX_h)(X_0 - EX_0)]|$$
$$\le \left(E[(X_h - EX_h)^2]\right)^{1/2} \left(E[(X_0 - EX_0)^2]\right)^{1/2} = \left(\mathrm{Var}(X_h)\, \mathrm{Var}(X_0)\right)^{1/2} = \mathrm{Var}(X_0) = \gamma_X(0).$$
Without loss of generality, assume $\mu_X = 0$. Let $\mathbf{X}_n = (X_n, \dots, X_1)^T$. Then
$$0 \le \mathrm{Var}(a^T \mathbf{X}_n) = E[(a^T \mathbf{X}_n)^2] = E[a^T \mathbf{X}_n \mathbf{X}_n^T a] = a^T E[\mathbf{X}_n \mathbf{X}_n^T]\, a = \sum_{i,j=1}^{n} a_i\, \gamma_X(i-j)\, a_j,$$

where $\mathrm{Var}(X)$ denotes the variance of $X$.

A time series $\{X_t\}$ is strictly stationary if $(X_1, \dots, X_n)$ and $(X_{1+h}, \dots, X_{n+h})$ $(h \in \mathbb{Z},\ n \in \mathbb{Z}_+)$ have the same joint distribution. If $\{X_t\}$ is strictly stationary with finite variance, then $\{X_t\}$ is stationary. The converse is not true in general.

As an example of a stationary series, consider the time series obtained by tossing a penny repeatedly and scoring $+1$ for each head and $-1$ for each tail. Its time series model $\{X_t\}$ is a sequence of independent random variables with
$$P(X_t = 1) = \tfrac12, \qquad P(X_t = -1) = \tfrac12 \qquad (t \in \mathbb{Z}_+).$$
Its mean function is $EX_t = 0$, its variance is $\mathrm{Var}(X_t) = 1$, and its autocovariance function is $\gamma_X(t+h, t) = 0$ $(h \ne 0)$. Clearly, both $\mu_X(t)$ and $\gamma_X(t+h, t)$ are independent of $t$. So $\{X_t\}$ is a stationary time series with mean 0 and autocovariance function $\gamma_X(h) = \delta_{0,h}$, where $\delta_{ij}$ is the Kronecker delta.

Take
$$S_t = X_1 + X_2 + \cdots + X_t \quad (t \in \mathbb{Z}_+), \qquad S_0 = 0,$$
which is called the random walk. Clearly, $\mu_S(t) = ES_t = 0$ and, for $h \ge 0$,
$$\gamma_S(t+h, t) = \mathrm{Cov}(S_{t+h}, S_t) = \mathrm{Cov}(S_t + X_{t+1} + \cdots + X_{t+h}, S_t) = \mathrm{Cov}(S_t, S_t) = E[(X_1 + \cdots + X_t)^2] = EX_1^2 + \cdots + EX_t^2 = t.$$
Since $\gamma_S(t+h, t)$ depends on $t$, $\{S_t\}$ is not a stationary time series.

Let $Y_t = X_t + \tfrac12 X_{t-1}$. Then $\mu_Y(t) = 0$ and
$$\gamma_Y(t+h, t) = \begin{cases} \tfrac54, & h = 0, \\ \tfrac12, & h = \pm 1, \\ 0, & |h| > 1, \end{cases}$$
which does not depend on $t$, and so $\{Y_t\}$ is stationary with mean 0 and autocovariance function $\gamma_Y(h) = \gamma_Y(t+h, t)$.
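The covariances of the random walk and of the series $Y_t$ can be checked exactly for a short run of tosses. The following Python sketch enumerates all equally likely outcomes of six tosses and computes covariances with exact fractions. The helper names (`exact_cov`, `S`, `Y`) and the choice of six tosses are illustrative assumptions, not part of the text.

```python
from fractions import Fraction
from itertools import product


def exact_cov(f, g, n):
    """Exact Cov(f(x), g(x)) over all 2**n equally likely sequences x in {-1, +1}^n."""
    outcomes = list(product((-1, 1), repeat=n))
    p = Fraction(1, len(outcomes))
    ef = sum(p * f(x) for x in outcomes)
    eg = sum(p * g(x) for x in outcomes)
    efg = sum(p * f(x) * g(x) for x in outcomes)
    return efg - ef * eg


n = 6  # number of tosses X_1, ..., X_n; x[0] holds X_1


def S(t):
    # random walk S_t = X_1 + ... + X_t
    return lambda x: sum(x[:t])


def Y(t):
    # Y_t = X_t + X_{t-1} / 2 (needs t >= 2)
    return lambda x: x[t - 1] + Fraction(1, 2) * x[t - 2]


# Cov(S_{t+2}, S_t) for t = 1, 2, 3, 4: grows with t, so the walk is not stationary
walk_cov = [exact_cov(S(t + 2), S(t), n) for t in (1, 2, 3, 4)]
# Cov(Y_{3+h}, Y_3) for h = 0, 1, 2
ma_cov = [exact_cov(Y(3 + h), Y(3), n) for h in (0, 1, 2)]

print([str(c) for c in walk_cov])  # ['1', '2', '3', '4']
print([str(c) for c in ma_cov])    # ['5/4', '1/2', '0']
```

Repeating the second computation with a different base time, for example `Y(4 + h)` and `Y(4)`, gives the same three values, which is what stationarity of $\{Y_t\}$ requires.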

As another example, consider the time series
$$X_t = A\cos(\omega t) + B\sin(\omega t) \qquad (t \in \mathbb{Z}),$$
where $A$ and $B$ are uncorrelated random variables with mean 0 and variance 1, and $\omega$ is a constant. The mean and covariance are, respectively,
$$EX_t = (EA)\cos(\omega t) + (EB)\sin(\omega t) = 0,$$
$$\mathrm{Cov}(X_{t+h}, X_t) = E[(A\cos\omega(t+h) + B\sin\omega(t+h))(A\cos(\omega t) + B\sin(\omega t))]$$
$$= E[A^2\cos\omega(t+h)\cos(\omega t) + AB\cos\omega(t+h)\sin(\omega t) + AB\sin\omega(t+h)\cos(\omega t) + B^2\sin\omega(t+h)\sin(\omega t)].$$
By the assumption that $EA^2 = EB^2 = 1$ and $E[AB] = 0$,
$$\mathrm{Cov}(X_{t+h}, X_t) = \cos\omega(t+h)\cos(\omega t) + \sin\omega(t+h)\sin(\omega t) = \cos(\omega h).$$
Therefore, $\{X_t\}$ is a stationary time series with mean $\mu = 0$ and autocovariance function $\gamma(h) = \cos(\omega h)$.

1.1.2 White noise and linear process

The simplest model for a time series is white noise, which is a sequence of uncorrelated random variables $\{Z_t\}$ with mean 0 and variance $\sigma^2$. Denote $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$. Since $EZ_t = 0$ and $\mathrm{Cov}(Z_{t+h}, Z_t) = \delta_{h,0}\,\sigma^2$ $(t \in \mathbb{Z})$, the time series $\{Z_t\}$ is stationary. Its mean is $\mu = 0$ and the covariance matrix of $Z_1, \dots, Z_n$ is $\sigma^2 I_n$, where $I_n$ is the identity matrix of order $n$.

White noise plays an important role as a building block for more complicated time series. For example, consider the time series
$$X_t = Z_t + \tfrac13 Z_{t-1}, \qquad \{Z_t\} \sim \mathrm{WN}(0, 1) \qquad (t \in \mathbb{Z}).$$
Its mean is $EX_t = 0$ and for all $t$,
$$E[X_{t+h} X_t] = \begin{cases} \tfrac{10}{9} & \text{if } h = 0, \\ \tfrac13 & \text{if } h = \pm 1, \\ 0 & \text{if } |h| > 1. \end{cases}$$

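So $\{X_t\}$ is a stationary time series.

We say that a time series $\{X_t\}$ is a linear process if
$$X_t = \sum_{j\in\mathbb{Z}} \psi_j Z_{t-j}, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2),$$
where each $\psi_j$ is a constant and $\sum_{j\in\mathbb{Z}} |\psi_j| < \infty$. Since $\sum_{j\in\mathbb{Z}} |\psi_j| < \infty$, clearly $\sum_{j\in\mathbb{Z}} \psi_j^2 < \infty$, which implies that $\sum_{j\in\mathbb{Z}} \psi_j Z_{t-j}$ converges in mean square. The linear process $\{X_t\}$ is stationary. In fact, from
$$E[X_t] = \sum_{j\in\mathbb{Z}} \psi_j E[Z_{t-j}] = 0,$$
$$E[X_{t+h} X_t] = \sum_{k\in\mathbb{Z}} \sum_{j\in\mathbb{Z}} \psi_j \psi_k E[Z_{t+h-j} Z_{t-k}], \qquad E[Z_{t+h-j} Z_{t-k}] = \delta_{j-h,k}\, \sigma^2,$$
it follows that
$$\gamma(h) = E[X_{t+h} X_t] = \sum_{k\in\mathbb{Z}} \psi_{k+h} \psi_k\, \sigma^2, \tag{1.1.1}$$
and so $\{X_t\}$ is stationary.

Assume that $\{X_t\}$ is a stationary time series satisfying
$$X_t = \varphi X_{t-1} + Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2) \qquad (t \in \mathbb{Z}),$$
where $|\varphi| < 1$ and $Z_t$ is uncorrelated with $X_s$ for $s < t$. Then
$$X_t = \varphi X_{t-1} + Z_t = \varphi(\varphi X_{t-2} + Z_{t-1}) + Z_t = \varphi^2 X_{t-2} + \varphi Z_{t-1} + Z_t = \cdots = \sum_{j=0}^{k} \varphi^j Z_{t-j} + \varphi^{k+1} X_{t-k-1}. \tag{1.1.2}$$
Since $\{X_t\}$ is stationary,
$$E\Big[\Big(X_t - \sum_{j=0}^{k} \varphi^j Z_{t-j}\Big)^2\Big] = \varphi^{2k+2} E[X_{t-k-1}^2] \to 0 \qquad (k \to \infty),$$
and so $X_t = \sum_{j=0}^{\infty} \varphi^j Z_{t-j}$ in the mean square sense. This is a linear process with $\psi_j = \varphi^j$ $(j \ge 0)$, $\psi_j = 0$ $(j < 0)$, and $\sum_{j=0}^{\infty} |\varphi|^j < \infty$. By (1.1.1),
$$\gamma_X(h) = \sum_{k=0}^{\infty} \varphi^{k+h} \varphi^k \sigma^2 = \frac{\sigma^2 \varphi^h}{1 - \varphi^2} \qquad (h \ge 0). \tag{1.1.3}$$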
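Formula (1.1.1) can be compared numerically with the closed form (1.1.3). The sketch below truncates the infinite sum at 200 terms; the function name `linear_process_acvf`, the truncation length, and the parameter values $\varphi = 0.5$, $\sigma^2 = 1$ are assumptions chosen for illustration.

```python
def linear_process_acvf(psi, h, sigma2=1.0):
    """gamma(h) = sigma^2 * sum_k psi[k + |h|] * psi[k] for weights psi_0, psi_1, ...

    psi is a finite list, so this is a truncation of formula (1.1.1)
    for a one-sided linear process (psi_j = 0 for j < 0).
    """
    h = abs(h)
    return sigma2 * sum(psi[k + h] * psi[k] for k in range(len(psi) - h))


phi, sigma2 = 0.5, 1.0
psi = [phi ** j for j in range(200)]  # AR(1) weights psi_j = phi^j, truncated

for h in range(4):
    truncated = linear_process_acvf(psi, h, sigma2)
    closed_form = sigma2 * phi ** h / (1 - phi ** 2)  # formula (1.1.3)
    print(h, truncated, closed_form)
```

Because $|\varphi| < 1$, the neglected tail of the sum is of order $\varphi^{400}$, so the two columns agree to machine precision.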

1.1.3 Sample autocorrelation function

In practical problems, we do not start with a model but with observed data $\{x_1, x_2, \dots, x_n\}$. Using these observations, we estimate the mean $\mu$, the autocovariance function $\gamma(h)$, and the autocorrelation function $\rho(h)$. The sample mean of $x_1, \dots, x_n$ is $\bar{x} = \frac1n \sum_{t=1}^{n} x_t$. The sample autocovariance function is
$$\hat\gamma(h) = \frac1n \sum_{t=1}^{n-|h|} (x_{t+|h|} - \bar{x})(x_t - \bar{x}) \qquad (-n < h < n).$$
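A direct implementation of the sample autocovariance is short. In the sketch below, the function names and the toy data are assumptions, and the sample autocorrelation is taken to be $\hat\gamma(h)/\hat\gamma(0)$, the usual definition, which mirrors $\rho_X(h) = \gamma_X(h)/\gamma_X(0)$ from Section 1.1.1.

```python
def sample_acvf(x, h):
    """Sample autocovariance at lag h, with divisor n (not n - |h|)."""
    n = len(x)
    h = abs(h)
    if h >= n:
        raise ValueError("lag must satisfy -n < h < n")
    xbar = sum(x) / n
    return sum((x[t + h] - xbar) * (x[t] - xbar) for t in range(n - h)) / n


def sample_acf(x, h):
    """Sample autocorrelation, assumed here to be gamma_hat(h) / gamma_hat(0)."""
    return sample_acvf(x, h) / sample_acvf(x, 0)


# toy data with period 4 (illustrative only)
x = [2.0, 4.0, 6.0, 4.0, 2.0, 4.0, 6.0, 4.0]
for h in range(4):
    print(h, sample_acvf(x, h), sample_acf(x, h))
```

The divisor $n$ rather than $n - |h|$ follows the formula above; it keeps the matrix of sample autocovariances nonnegative definite.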

1.1.4 Classical decomposition model

When we analyze any time series, we first plot the data. If there are discontinuities, we break the series into homogeneous segments. If there are outlying observations, we discard them. Inspection of the graph may also suggest the possibility of representing the data as a realization of a process. The classical decomposition model is
$$X_t = m_t + s_t + Y_t,$$
where $m_t$ is a trend component, $s_t$ is a seasonal component, and $Y_t$ is a stationary random noise component. We hope to estimate $m_t$ and $s_t$ such that the residual $Y_t$ is stationary.

First we consider the nonseasonal model with trend
$$X_t = m_t + Y_t \qquad (t = 1, \dots, n),$$
where $E[Y_t] = 0$. The moving average
$$\hat{m}_t = \frac{1}{2p+1} \sum_{j=-p}^{p} X_{t-j} \qquad (p + 1 \le t \le n - p)$$
provides an estimate of the trend $m_t$.

A general classical decomposition model is
$$X_t = m_t + s_t + Y_t \qquad (t = 1, \dots, n),$$
where $E[Y_t] = 0$, $s_{t+d} = s_t$, and $\sum_{j=1}^{d} s_j = 0$. For observations $\{x_1, \dots, x_n\}$, if the period $d = 2p$ is even, then the estimate of the trend is
$$\hat{m}_t = \frac{1}{2p} \left( \tfrac12 x_{t-p} + x_{t-p+1} + \cdots + x_{t+p-1} + \tfrac12 x_{t+p} \right) \qquad (p < t \le n - p);$$
if the period $d = 2p + 1$ is odd, then the estimate of the trend is
$$\hat{m}_t = \frac{1}{2p+1} \sum_{j=-p}^{p} x_{t-j} \qquad (p + 1 \le t \le n - p).$$
For each $k = 1, \dots, d$, let $\omega_k$ be the average of the deviations $\{x_{k+jd} - \hat{m}_{k+jd}\}$ $(p < k + jd \le n - p)$ and let
$$\hat{s}_k = \omega_k - \frac1d \sum_{i=1}^{d} \omega_i \quad (k = 1, \dots, d), \qquad \hat{s}_k = \hat{s}_{k-d} \quad (k > d).$$
Define $d_t = x_t - \hat{s}_t$ $(t = 1, \dots, n)$. Re-estimating the trend $\hat{m}_t$ from $\{d_t\}$ as above, the estimated noise series is
$$\hat{Y}_t = x_t - \hat{m}_t - \hat{s}_t \qquad (t = 1, \dots, n).$$
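The steps of the classical decomposition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: indices are 0-based, the trend is re-estimated with the same moving average filter (the text only says "as above"), positions where the window does not fit are returned as `None`, and all function names and the synthetic data are hypothetical.

```python
def trend_ma(x, d):
    """Centered moving-average trend estimate for period d; None where the window does not fit."""
    n, p = len(x), d // 2
    m = [None] * n
    for i in range(p, n - p):
        if d % 2 == 0:  # d = 2p: half weights on the two end points
            total = 0.5 * x[i - p] + sum(x[i - p + 1:i + p]) + 0.5 * x[i + p]
        else:           # d = 2p + 1: plain average over the window
            total = sum(x[i - p:i + p + 1])
        m[i] = total / d
    return m


def seasonal_estimate(x, d):
    """Seasonal estimates for one period, centered so that they sum to zero."""
    m = trend_ma(x, d)
    w = []
    for k in range(d):
        # deviations x - m at positions k, k + d, k + 2d, ... that have a trend estimate
        dev = [x[i] - m[i] for i in range(k, len(x), d) if m[i] is not None]
        w.append(sum(dev) / len(dev))
    wbar = sum(w) / d
    return [wk - wbar for wk in w]


def classical_decomposition(x, d):
    """Return (trend, seasonal, noise) lists of the same length as x."""
    s = seasonal_estimate(x, d)
    s_full = [s[i % d] for i in range(len(x))]
    deseason = [xi - si for xi, si in zip(x, s_full)]
    m = trend_ma(deseason, d)  # trend re-estimated from the deseasonalized data
    noise = [None if mi is None else xi - mi - si
             for xi, mi, si in zip(x, m, s_full)]
    return m, s_full, noise


# synthetic data: linear trend 0.5 * t plus a zero-sum seasonal pattern of period 4
season = [3.0, -1.0, -2.0, 0.0]
x = [0.5 * i + season[i % 4] for i in range(16)]
m_hat, s_hat, y_hat = classical_decomposition(x, 4)
print(s_hat[:4])
print(m_hat[2:6])
```

For this noise-free series the filter reproduces the linear trend and the seasonal pattern up to rounding, so the estimated noise is essentially zero wherever it is defined. With real data the series must contain at least a few full periods so that every season has interior observations.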

1.2 Prediction of time series

If the random variables of a time series $\{X_t\}$ at different times are independent, past values carry no information about future ones and no useful prediction can be made. If they are correlated, and the mean and covariance are known, we may predict $X_{t+h}$ from $X_0, \dots, X_t$. This requires finding a linear combination $\tau^*$ of $X_0, \dots, X_t$ such that the mean square error $E[(X_{t+h} - \tau^*)^2]$ is minimal. This $\tau^*$ is called the best linear approximation and is used as the forecast of $X_{t+h}$.

1.2.1 The best linear approximation

If a random variable $Y$ is approximated by a random variable $X$, define the mean square error as $E[(Y - X)^2]$. In the simplest case, $Y$ is approximated by a constant: then $EY$ is its best approximation and the minimum mean square error is $\mathrm{Var}(Y)$.

Orthogonality principle

Let $Y$ be a random variable. Denote by $\tau_n$ the set of all linear combinations of $1, X_1, \dots, X_n$, i.e.,
$$\tau_n = \Big\{ a_0 + \sum_{k=1}^{n} a_k X_k, \ \text{where each } a_k \text{ is constant} \Big\}.$$
For $X \in \tau_n$, define the approximation error as $E[(Y - X)^2]$. Write $X_0 = 1$. If $X^* \in \tau_n$ is such that
$$E[(Y - X^*) X_k] = 0 \qquad (k = 0, 1, \dots, n),$$
then $X^*$ is the best linear approximation of $Y$ in $\tau_n$, i.e., for all $X \in \tau_n$,
$$E[(Y - X^*)^2] \le E[(Y - X)^2],$$
and the minimum mean square error is
$$E[(Y - X^*)^2] = E[Y^2] - E[X^{*2}].$$
Conversely, if $X^* \in \tau_n$ is the best linear approximation of $Y$ in $\tau_n$, then
$$E[(Y - X^*) X_k] = 0 \qquad (k = 0, 1, \dots, n).$$

1.2.2 Prediction of stationary time series

Let $\{X_t\}$ be a stationary time series with mean $\mu$ and autocovariance function $\gamma$. We estimate $X_{n+h}$ $(h > 0)$ by a linear combination of $1, X_1, \dots, X_n$ such that the mean square error attains its minimal value. Denote by $P_n X_{n+h}$ the best linear predictor,
$$P_n X_{n+h} = a_0^* + a_1^* X_n + \cdots + a_n^* X_1, \tag{1.2.1}$$
where $a_0^*, a_1^*, \dots, a_n^*$ are undetermined coefficients. By the orthogonality principle, the best approximation $P_n X_{n+h}$ of $X_{n+h}$ among all linear combinations of $\{1, X_1, \dots, X_n\}$ satisfies
$$E[X_{n+h} - P_n X_{n+h}] = 0, \qquad E[(X_{n+h} - P_n X_{n+h}) X_{n+1-j}] = 0 \quad (j = 1, \dots, n).$$
This implies the following proposition.

Let $\{X_t\}$ be a stationary time series with mean $\mu$ and autocovariance function $\gamma$. Then
(a) the best predictor $P_n X_{n+h}$ of $X_{n+h}$ $(h > 0)$ among the linear combinations of $1, X_1, \dots, X_n$ is
$$P_n X_{n+h} = \mu + \sum_{i=1}^{n} a_i^* (X_{n+1-i} - \mu)$$
and the coefficient vector $\mathbf{a}_n^* = (a_1^*, \dots, a_n^*)^T$ satisfies the system of linear equations
$$\begin{cases} a_0^* = \mu \left(1 - \sum_{i=1}^{n} a_i^*\right), \\ \Gamma_n \mathbf{a}_n^* = R_n(h), \end{cases} \tag{1.2.2}$$
where $\Gamma_n = (\gamma(i - j))_{n \times n}$ and $R_n(h) = (\gamma(h), \gamma(h+1), \dots, \gamma(h+n-1))^T$;
(b) the minimum mean square error is
$$E[(X_{n+h} - P_n X_{n+h})^2] = \gamma(0) - \mathbf{a}_n^{*T} R_n(h).$$

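Consider a time series $\{X_t\}$ satisfying
$$X_t = \tfrac12 X_{t-1} + Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, 1) \qquad (t \in \mathbb{Z}).$$
The best predictor is
$$P_n X_{n+1} = \sum_{i=1}^{n} a_i^* X_{n+1-i} \tag{1.2.3}$$
and the coefficient vector $\mathbf{a}_n^* = (a_1^*, \dots, a_n^*)^T$ satisfies the equation
$$(\gamma(i - j))_{n \times n}\, (a_1^*, \dots, a_n^*)^T = (\gamma(1), \dots, \gamma(n))^T.$$
From (1.1.3), it follows that $\gamma(h) = \tfrac43\, 2^{-h}$ $(h \ge 0)$. So, after dividing both sides by $\tfrac43$,
$$\begin{pmatrix} 1 & 2^{-1} & 2^{-2} & \cdots & 2^{-n+1} \\ 2^{-1} & 1 & 2^{-1} & \cdots & 2^{-n+2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 2^{-n+1} & 2^{-n+2} & 2^{-n+3} & \cdots & 1 \end{pmatrix} \begin{pmatrix} a_1^* \\ a_2^* \\ \vdots \\ a_n^* \end{pmatrix} = \begin{pmatrix} 2^{-1} \\ 2^{-2} \\ \vdots \\ 2^{-n} \end{pmatrix}$$
and the solution is $a_1^* = \tfrac12$, $a_2^* = \cdots = a_n^* = 0$. By (1.2.3), $P_n X_{n+1} = \tfrac12 X_n$. So the minimum mean square error is
$$E[(X_{n+1} - P_n X_{n+1})^2] = \gamma(0) - a_1^* \gamma(1) = \tfrac43 - \tfrac23 a_1^* = 1.$$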
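The prediction equations of this example can be solved exactly by machine. The sketch below builds $\Gamma_n$ and $R_n(1)$ for $n = 5$ with rational arithmetic and solves the system by Gauss–Jordan elimination. The helper `solve` and the choice $n = 5$ are assumptions made for illustration; any linear solver would do.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A a = b by Gauss-Jordan elimination (exact when entries are Fractions)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]             # scale pivot row to 1
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def gamma(h):
    # AR(1) with coefficient 1/2 and unit noise variance: gamma(h) = (4/3) * 2^{-|h|}
    return Fraction(4, 3) / 2 ** abs(h)


n, h = 5, 1
Gamma = [[gamma(i - j) for j in range(n)] for i in range(n)]
R = [gamma(h + k) for k in range(n)]

a = solve(Gamma, R)                                     # coefficients a_1*, ..., a_n*
mse = gamma(0) - sum(ai * ri for ai, ri in zip(a, R))   # minimum mean square error

print([str(ai) for ai in a])  # ['1/2', '0', '0', '0', '0']
print(mse)                    # 1
```

Only the most recent observation receives a nonzero weight, in agreement with $P_n X_{n+1} = \tfrac12 X_n$.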

1.2.3 Durbin–Levinson and innovation algorithms

Generally speaking, to obtain $a_1^*, \dots, a_n^*$, we need to solve the system (1.2.2) of linear equations. This is expensive for large $n$. The following Durbin–Levinson algorithm gives a simple recursive formula for the coefficients of the best linear predictor of a stationary time series $\{X_t\}$ with mean $\mu$ and covariance function $\gamma(h)$. For convenience, we assume that $\mu = 0$ and $\gamma(h) \to 0$ $(h \to \infty)$.

Durbin–Levinson algorithm

Let $P_n X_{n+1} = a_{n1}^* X_n + \cdots + a_{nn}^* X_1$. The minimum mean square error is
$$U_n := \gamma(0) - (a_{n1}^* \gamma(1) + \cdots + a_{nn}^* \gamma(n)).$$
The coefficients $a_{n1}^*, \dots, a_{nn}^*$ can be computed recursively as follows:
$$a_{nn}^* = \Big( \gamma(n) - \sum_{j=1}^{n-1} a_{n-1,j}^*\, \gamma(n - j) \Big) U_{n-1}^{-1}, \qquad U_n = U_{n-1} \left(1 - (a_{nn}^*)^2\right),$$
$$\begin{pmatrix} a_{n1}^* \\ \vdots \\ a_{n,n-1}^* \end{pmatrix} = \begin{pmatrix} a_{n-1,1}^* \\ \vdots \\ a_{n-1,n-1}^* \end{pmatrix} - a_{nn}^* \begin{pmatrix} a_{n-1,n-1}^* \\ \vdots \\ a_{n-1,1}^* \end{pmatrix},$$
where $a_{11}^* = \gamma(1)/\gamma(0)$ and $U_0 = \gamma(0)$.

Now we drop the stationarity assumption. For a general time series $\{X_t\}$, if both its mean function $\mu_X(t) = E[X_t]$ and covariance function $\gamma_X(i, j) = \mathrm{Cov}(X_i, X_j)$ are known, we may predict $X_{n+h}$ from $1, X_1, \dots, X_n$. As in the stationary case, the best linear predictor of $X_{n+h}$ is
$$P_n X_{n+h} = E[X_{n+h}] + \sum_{i=1}^{n} a_i^* (X_{n+1-i} - E[X_{n+1-i}])$$
and the coefficient vector $\mathbf{a}_n^* = (a_1^*, \dots, a_n^*)^T$ satisfies the following system of linear equations:
$$\tilde\Gamma_n \mathbf{a}_n^* = \tilde{R}_n(h), \tag{1.2.4}$$
where
$$\tilde\Gamma_n = (\mathrm{Cov}(X_{n+1-i}, X_{n+1-j}))_{i,j=1,\dots,n}, \qquad \tilde{R}_n(h) = (\mathrm{Cov}(X_{n+h}, X_n), \dots, \mathrm{Cov}(X_{n+h}, X_1))^T.$$
The mean square error is
$$E[(X_{n+h} - P_n X_{n+h})^2] = \mathrm{Var}(X_{n+h}) - \mathbf{a}_n^{*T} \tilde{R}_n(h).$$

Consider the time series $\{X_t\}$ satisfying $X_t = \tfrac12 X_{t-1} + Z_t$, $\{Z_t\} \sim \mathrm{WN}(0, 1)$. Formula (1.1.3) gives the autocovariance function
$$\gamma(h) = \frac{2^{-h+2}}{3} \qquad (h \ge 0).$$
Suppose that the observations $X_1$ and $X_3$ are known but the observation $X_2$ is missing. We want to estimate $X_2$ by a linear combination $a_1 X_3 + a_2 X_1$ of $X_1$ and $X_3$. Now
$$\tilde\Gamma_2 = \begin{pmatrix} \mathrm{Cov}(X_3, X_3) & \mathrm{Cov}(X_3, X_1) \\ \mathrm{Cov}(X_1, X_3) & \mathrm{Cov}(X_1, X_1) \end{pmatrix} = \begin{pmatrix} \gamma(0) & \gamma(2) \\ \gamma(2) & \gamma(0) \end{pmatrix} = \begin{pmatrix} \tfrac43 & \tfrac13 \\ \tfrac13 & \tfrac43 \end{pmatrix},$$
$$\tilde{R}_2(1) = (\mathrm{Cov}(X_2, X_3), \mathrm{Cov}(X_2, X_1))^T = (\gamma(1), \gamma(1))^T = \left(\tfrac23, \tfrac23\right)^T.$$
This gives the system of equations
$$\begin{pmatrix} \tfrac43 & \tfrac13 \\ \tfrac13 & \tfrac43 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} \tfrac23 \\ \tfrac23 \end{pmatrix}$$
with solution $a_1 = a_2 = \tfrac25$. Therefore, the best estimator of $X_2$ is
$$P(X_2 \mid 1, X_1, X_3) = E[X_2] + \tfrac25 (X_3 + X_1) = \tfrac25 (X_1 + X_3).$$
The minimum mean square error is
$$E[(X_2 - P(X_2 \mid 1, X_1, X_3))^2] = \mathrm{Var}(X_2) - (a_1, a_2) \begin{pmatrix} \tfrac23 \\ \tfrac23 \end{pmatrix} = \frac43 - \frac{8}{15} = \frac45.$$

From this example, we see that even for a stationary time series, missing values are estimated by solving (1.2.4) rather than by the Durbin–Levinson method. To avoid solving the system of linear equations (1.2.4) for large $n$, one can use the innovation algorithm. It is a simple recursive algorithm for the best linear prediction that does not require the time series to be stationary.
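The missing-value example can be reproduced with a few lines of exact arithmetic. The sketch below sets up the $2 \times 2$ system (1.2.4) for estimating $X_2$ from $X_1$ and $X_3$ and solves it by Cramer's rule. The variable names are assumptions, and the covariance function is the one of the AR(1) example with coefficient $\tfrac12$ and unit noise variance.

```python
from fractions import Fraction


def gamma(h):
    # autocovariance of the AR(1) example: gamma(h) = 2^{-|h| + 2} / 3
    return Fraction(4, 3) / 2 ** abs(h)


observed = [3, 1]  # time indices of the known observations, ordered as in the text
target = 2         # time index of the missing observation

# covariance matrix of the observed values and their covariances with the target
G = [[gamma(i - j) for j in observed] for i in observed]
r = [gamma(target - i) for i in observed]

# Cramer's rule for the 2 x 2 system G a = r
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
a1 = (r[0] * G[1][1] - G[0][1] * r[1]) / det
a2 = (G[0][0] * r[1] - r[0] * G[1][0]) / det

mse = gamma(0) - (a1 * r[0] + a2 * r[1])

print(a1, a2)  # 2/5 2/5
print(mse)     # 4/5
```

The mean square error $\tfrac45$ is smaller than the one-step prediction error of 1 for this process, since the estimate uses an observation on each side of the gap.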

Innovation algorithm

Assume that the mean $EX_t = 0$ $(t \in \mathbb{Z}_+)$ and the covariance matrix $(h_{ij})_{n \times n}$ is nonsingular, where $h_{ij} = \mathrm{Cov}(X_i, X_j)$. Then the best linear prediction is
$$X_{n+1}^o = \sum_{j=1}^{n} \alpha_{nj} (X_{n+1-j} - X_{n+1-j}^o),$$
where $X_{n+1}^o = P_n X_{n+1}$ $(n \in \mathbb{Z}_+)$ and $X_1^o = 0$, and the coefficients $\alpha_{nj}$ and mean square errors $\lambda_n$ satisfy
$$\lambda_0 = h_{11}, \qquad \alpha_{nn} = \frac{1}{\lambda_0} h_{n+1,1},$$
$$\alpha_{n,n-k} = \frac{1}{\lambda_k} h_{n+1,k+1} - \frac{1}{\lambda_k} \sum_{j=0}^{k-1} \alpha_{k,k-j}\, \alpha_{n,n-j}\, \lambda_j \qquad (k = 1, \dots, n-1), \tag{1.2.5}$$
$$\lambda_n = h_{n+1,n+1} - \sum_{j=0}^{n-1} \alpha_{n,n-j}^2\, \lambda_j, \qquad E[(X_{n+1} - P_n X_{n+1})^2] = \lambda_n.$$
The recursion is applied in the order $\lambda_0, \alpha_{11}, \lambda_1, \alpha_{22}, \alpha_{21}, \lambda_2, \alpha_{33}, \alpha_{32}, \alpha_{31}, \lambda_3, \dots$.

For example, consider the time series $X_t = Z_t + \tfrac12 Z_{t-1}$, $\{Z_t\} \sim \mathrm{WN}(0, 1)$. Note that
$$h_{ij} = E[X_i X_j] = E[(Z_i + \tfrac12 Z_{i-1})(Z_j + \tfrac12 Z_{j-1})], \qquad E[Z_i Z_j] = 0 \ (i \ne j), \qquad E[Z_i^2] = 1.$$
Then
$$h_{ii} = \tfrac54, \qquad h_{i,i+1} = \tfrac12, \qquad h_{ij} = 0 \quad (|i - j| \ge 2).$$
Applying the innovation algorithm gives
$$\alpha_{nj} = 0 \quad (2 \le j \le n), \qquad \lambda_0 = \frac54, \qquad \alpha_{n1} = \frac{1}{2\lambda_{n-1}}, \qquad \lambda_n = \frac54 - \frac{1}{4\lambda_{n-1}}.$$
So the best linear prediction is
$$X_{n+1}^o = \sum_{j=1}^{n} \alpha_{nj} (X_{n+1-j} - X_{n+1-j}^o) = \alpha_{n1} (X_n - X_n^o).$$
For the recursive calculation of the $h$-step predictors, we may use the following formula:
$$P_n X_{n+h} = \sum_{j=h}^{n+h-1} \alpha_{n+h-1,j} (X_{n+h-j} - X_{n+h-j}^o),$$
where the coefficients $\alpha_{n+h-1,j}$ are determined as before by the innovation algorithm, and the minimum mean square error is
$$E[(X_{n+h} - P_n X_{n+h})^2] = \mathrm{Var}(X_{n+h}) - \sum_{j=h}^{n+h-1} \alpha_{n+h-1,j}^2\, \lambda_{n+h-j-1}.$$
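The innovation recursion can be written compactly in code. The following sketch takes a covariance function with 1-based indices and returns the coefficients and one-step mean square errors; the function name `innovations`, the argument names, and the list-of-lists storage are assumptions of this sketch. It is applied to the moving average example $X_t = Z_t + \tfrac12 Z_{t-1}$ with exact fractions.

```python
from fractions import Fraction


def innovations(cov, N):
    """Innovation algorithm for a zero-mean series.

    cov(i, j) = Cov(X_i, X_j) with 1-based i, j.
    Returns (alpha, lam): alpha[n][j] is the coefficient alpha_{nj}
    (1 <= j <= n <= N) and lam[n] is the mean square error of P_n X_{n+1}.
    """
    lam = [cov(1, 1)]
    alpha = [[]]
    for n in range(1, N + 1):
        row = [0] * (n + 1)          # row[j] = alpha_{nj}; row[0] is unused
        for k in range(n):           # fills alpha_{n,n-k} for k = 0, ..., n-1
            s = sum(alpha[k][k - j] * row[n - j] * lam[j] for j in range(k))
            row[n - k] = (cov(n + 1, k + 1) - s) / lam[k]
        alpha.append(row)
        lam.append(cov(n + 1, n + 1) - sum(row[n - j] ** 2 * lam[j] for j in range(n)))
    return alpha, lam


def ma1_cov(i, j):
    # covariances of X_t = Z_t + Z_{t-1} / 2 with unit-variance white noise
    if i == j:
        return Fraction(5, 4)
    if abs(i - j) == 1:
        return Fraction(1, 2)
    return Fraction(0)


alpha, lam = innovations(ma1_cov, 4)
for n in range(1, 5):
    print(n, [str(c) for c in alpha[n][1:]], lam[n])
```

For this series only $\alpha_{n1}$ is nonzero in each row, and the values satisfy $\alpha_{n1} = 1/(2\lambda_{n-1})$ and $\lambda_n = \tfrac54 - 1/(4\lambda_{n-1})$, as derived in the example.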

Compare these two algorithms. In the Durbin–Levinson algorithm, the best prediction is represented as a linear combination of $X_n, \dots, X_1$,
$$P_n X_{n+1} = \sum_{j=1}^{n} a_{nj} X_{n+1-j},$$
where the coefficients $\{a_{nj}\}$ are given recursively. The Durbin–Levinson algorithm is suited to the autoregressive process $X_t - a_1 X_{t-1} - \cdots - a_p X_{t-p} = Z_t$, since then $a_{nj} = 0$ for $j > p$ $(n \ge p)$. In the innovation algorithm, the best prediction is represented by another linear combination,
$$X_{n+1}^o = \sum_{j=1}^{n} \alpha_{nj} (X_{n+1-j} - X_{n+1-j}^o).$$
When $X_t$ is a moving average of order $q$, $\alpha_{nj} = 0$ for $j > q$. The innovation algorithm is therefore suited to the moving average process $X_t = Z_t + \alpha_1 Z_{t-1} + \cdots + \alpha_q Z_{t-q}$.
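For comparison, the Durbin–Levinson recursion can be sketched in the same style. The function name `durbin_levinson` and the storage layout are assumptions. The recursion is applied to the autoregressive example $X_t = \tfrac12 X_{t-1} + Z_t$, where the coefficients beyond the first vanish, and to the moving average example, where they do not.

```python
from fractions import Fraction


def durbin_levinson(gamma, N):
    """Durbin-Levinson recursion for a zero-mean stationary series.

    gamma(h) is the autocovariance at lag h >= 0.
    Returns (a, U): a[n][j] is the coefficient of X_{n+1-j} in P_n X_{n+1}
    (1 <= j <= n <= N) and U[n] is the corresponding mean square error.
    """
    U = [gamma(0)]
    a = [[0]]
    for n in range(1, N + 1):
        prev = a[n - 1]
        ann = (gamma(n) - sum(prev[j] * gamma(n - j) for j in range(1, n))) / U[n - 1]
        row = [0] * (n + 1)          # row[0] is unused
        row[n] = ann
        for j in range(1, n):
            row[j] = prev[j] - ann * prev[n - j]
        a.append(row)
        U.append(U[n - 1] * (1 - ann ** 2))
    return a, U


def gamma_ar1(h):
    # X_t = X_{t-1} / 2 + Z_t with unit noise variance
    return Fraction(4, 3) / 2 ** h


def gamma_ma1(h):
    # X_t = Z_t + Z_{t-1} / 2 with unit noise variance
    return {0: Fraction(5, 4), 1: Fraction(1, 2)}.get(h, Fraction(0))


a_ar, U_ar = durbin_levinson(gamma_ar1, 4)
a_ma, U_ma = durbin_levinson(gamma_ma1, 4)
print([str(c) for c in a_ar[4][1:]], U_ar[4])  # AR(1): a single nonzero coefficient
print([str(c) for c in a_ma[4][1:]], U_ma[4])  # MA(1): all coefficients nonzero
```

The error sequence `U_ma` coincides with the errors $\lambda_n$ of the innovation algorithm for the same moving average process, since both describe the same best linear predictor.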


1.2.4 Wold decomposition

In order to introduce the Wold decomposition of a stationary process, we first recall the notion of mean square convergence for a sequence of random variables. Let $\{X_n\}_{n=0,1,\dots}$ be a sequence of random variables and $X$ be a random variable. We say that $X_n \to X$ in mean square if $\lim_{n\to\infty} E[(X_n - X)^2] = 0$.

Now we consider the prediction of a stationary process in terms of infinitely many past values. Let $P_{m,n} X_{n+h}$ be the best linear predictor of $X_{n+h}$ $(h > 0)$ among the linear combinations of $1, X_m, \dots, X_0, \dots, X_n$ $(m < 0)$. Define the best predictor based on the infinite past $\{X_t\ (-\infty < t \le n)\}$ as
$$\tilde{P}_n X_{n+h} = \lim_{m \to -\infty} P_{m,n} X_{n+h}$$
in the mean square sense. A time series $\{X_t\}$ is called deterministic if $X_n = \tilde{P}_{n-1} X_n$ for all $n$. For example, let $X_t = A\cos(\omega t) + B\sin(\omega t)$, where $\omega$ is constant and $A$, $B$ are uncorrelated random variables with mean 0 and variance $\sigma^2$. Note that
$$2\cos(\omega)\, X_{n-1} = A(\cos(\omega n) + \cos\omega(n-2)) + B(\sin(\omega n) + \sin\omega(n-2)) = X_n + X_{n-2}.$$
Then
$$X_n = 2\cos(\omega)\, X_{n-1} - X_{n-2} = \tilde{P}_{n-1} X_n \qquad (n \in \mathbb{Z}_+).$$
So $\{X_t\}$ is deterministic.

Wold decomposition

If $\{X_t\}$ is a nondeterministic stationary time series, then
$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j} + V_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2),$$
where $\psi_0 = 1$, $\sum_{j=0}^{\infty} \psi_j^2 < \infty$, $\{V_t\}$ is deterministic, and $\mathrm{Cov}(Z_t, V_{t'}) = 0$ for all $t, t'$. For all causal ARMA processes (see Section 1.4), the deterministic component $V_t$ is zero in the Wold decomposition, and so
$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}.$$
Let $\{X_t\}$ be a stationary time series with mean 0 and autocovariance function $\gamma(h)$ such that $\gamma(k) = 0$ $(k > q)$ and $\gamma(q) \ne 0$. Then
$$X_t = \sum_{j=0}^{q} \psi_j Z_{t-j}, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2),$$
i.e., $\{X_t\}$ is an MA process of order $q$, where $\psi_0 = 1$.


1.3 Spectral analysis

The spectral representation of a stationary time series $\{X_t\}$ decomposes $\{X_t\}$ into a sum of sinusoidal components with uncorrelated random coefficients. The study of a time series through this representation is called spectral analysis.

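1.3.1 Spectral density

Suppose that $\{X_t\}$ is a stationary time series with mean 0 and autocovariance function $\gamma(h)$ satisfying $\sum_{h\in\mathbb{Z}} |\gamma(h)| < \infty$. The spectral density of $\{X_t\}$ is defined as
$$f(\alpha) = \sum_{h\in\mathbb{Z}} \gamma(h)\, e^{-ih\alpha} = \gamma(0) + 2 \sum_{h=1}^{\infty} \gamma(h) \cos(h\alpha). \tag{1.3.1}$$
The spectral density $f(\alpha)$ is a nonnegative even function. Using termwise integration, the autocovariance can be expressed as the following integral:
$$\gamma(h) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\alpha)\, e^{ih\alpha}\, d\alpha = \frac{1}{2\pi} \int_{-\pi}^{\pi} \cos(h\alpha) f(\alpha)\, d\alpha.$$
Since $f(\alpha)$ is an even function,
$$\gamma(h) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(-\alpha)\, e^{-ih\alpha}\, d\alpha = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(\alpha)\, e^{-ih\alpha}\, d\alpha.$$
This implies that the autocovariance function $\gamma(h)$ is the Fourier coefficient of the spectral density $f(\alpha)$.

The definition of spectral density is generalized as follows. For a stationary time series $\{X_t\}$ with autocovariance function $\gamma(h)$, if $f(\alpha)$ is a $2\pi$-periodic nonnegative function and $\gamma(h)$ is its Fourier coefficient, then $f(\alpha)$ is called the spectral density of $\{X_t\}$.

For example, suppose that $\{X_t\}$ is a stationary time series satisfying
$$X_t = \varphi X_{t-1} + Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, 1) \qquad (t \in \mathbb{Z}),$$
where $|\varphi| < 1$. By (1.1.3),
$$\gamma(h) = \frac{\varphi^{|h|}}{1 - \varphi^2}.$$
So $\{X_t\}$ has the spectral density
$$f(\alpha) = \sum_{h} e^{-ih\alpha} \gamma(h) = \frac{1}{1 - \varphi^2} \sum_{h} \varphi^{|h|} e^{-i\alpha h} = \frac{1}{1 - \varphi^2} \Big( 1 + \sum_{h=1}^{\infty} \varphi^h e^{-i\alpha h} + \sum_{h=1}^{\infty} \varphi^h e^{i\alpha h} \Big)$$
$$= \frac{1}{1 - \varphi^2} \Big( 1 + \frac{\varphi e^{-i\alpha}}{1 - \varphi e^{-i\alpha}} + \frac{\varphi e^{i\alpha}}{1 - \varphi e^{i\alpha}} \Big) = \frac{1}{1 - 2\varphi \cos\alpha + \varphi^2}.$$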

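For another time series
$$X_t = Z_t + \theta Z_{t-1}, \qquad \{Z_t\} \sim \mathrm{WN}(0, 1)$$
with mean $\mu = 0$ and autocovariance function
$$\gamma(h) = \begin{cases} 1 + \theta^2, & h = 0, \\ \theta, & h = \pm 1, \\ 0, & |h| > 1, \end{cases}$$
the spectral density is
$$f(\alpha) = \gamma(0) + \gamma(-1)\, e^{i\alpha} + \gamma(1)\, e^{-i\alpha} = 1 + \theta^2 + \theta (e^{i\alpha} + e^{-i\alpha}) = 1 + 2\theta \cos\alpha + \theta^2.$$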
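Both closed forms can be compared with a direct evaluation of the cosine series in (1.3.1). In the sketch below the series is truncated at a finite lag; the function name `spectral_density`, the truncation lag 200 and the parameter values $\varphi = 0.5$, $\theta = 0.4$ are illustrative assumptions.

```python
import math


def spectral_density(gamma, alpha, H):
    """gamma(0) + 2 * sum_{h=1}^{H} gamma(h) cos(h * alpha): series (1.3.1) truncated at lag H."""
    return gamma(0) + 2 * sum(gamma(h) * math.cos(h * alpha) for h in range(1, H + 1))


phi, theta = 0.5, 0.4


def gamma_ar1(h):
    # autocovariance of X_t = phi X_{t-1} + Z_t with unit noise variance
    return phi ** abs(h) / (1 - phi ** 2)


def gamma_ma1(h):
    # autocovariance of X_t = Z_t + theta Z_{t-1} with unit noise variance
    return {0: 1 + theta ** 2, 1: theta}.get(abs(h), 0.0)


for alpha in (0.0, 1.0, math.pi):
    ar_series = spectral_density(gamma_ar1, alpha, 200)
    ar_closed = 1 / (1 - 2 * phi * math.cos(alpha) + phi ** 2)
    ma_series = spectral_density(gamma_ma1, alpha, 1)
    ma_closed = 1 + 2 * theta * math.cos(alpha) + theta ** 2
    print(alpha, ar_series, ar_closed, ma_series, ma_closed)
```

With positive $\varphi$ and $\theta$, both densities are largest at $\alpha = 0$ and smallest at $\alpha = \pi$, so low frequencies dominate.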

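1.3.2 Spectral estimation

Consider a stationary time series $\{X_t\}$ with mean 0 and autocovariance function $\gamma(h)$. In applications, we only know finitely many observations $x_0, \dots, x_{N-1}$. Since $\bar{x} = \frac1N \sum_{k=0}^{N-1} x_k \approx 0$, $\gamma(h)$ is estimated by
$$\hat\gamma(h) = \frac1N \sum_{n=0}^{N-1} x_n x_{n+h}, \tag{1.3.2}$$
where $x_{n+h}$ is taken to be 0 when $n + h$ falls outside $0, \dots, N-1$. The corresponding spectral density $f(\alpha)$ is estimated by $\hat{f}(\alpha) = \sum_{k\in\mathbb{Z}} \hat\gamma(k)\, e^{-ik\alpha}$. Denote
$$x(n) = \begin{cases} x_n, & 0 \le n \le N - 1, \\ 0, & \text{otherwise}. \end{cases}$$
So
$$\hat\gamma(h) = \frac1N \sum_{n\in\mathbb{Z}} x(n + h)\, x(n)$$
and the spectral density is estimated by
$$\hat{f}(\alpha) = \sum_{h\in\mathbb{Z}} \hat\gamma(h)\, e^{-ih\alpha} = \frac1N \sum_{n\in\mathbb{Z}} x(n) \Big( \sum_{h\in\mathbb{Z}} x(n + h)\, e^{-ih\alpha} \Big)$$
$$= \frac1N \Big( \sum_{n\in\mathbb{Z}} x(n)\, e^{in\alpha} \Big) \Big( \sum_{m\in\mathbb{Z}} x(m)\, e^{-im\alpha} \Big) = \frac1N \Big| \sum_{n=0}^{N-1} x_n e^{-in\alpha} \Big|^2.$$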
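The identity between the Fourier series of the estimated autocovariances and the squared modulus of the finite Fourier sum can be confirmed numerically. The sketch below evaluates both sides directly with complex exponentials; the function names and the short zero-mean data vector are assumptions made for illustration.

```python
import cmath


def acvf_hat(x, h):
    """(1/N) * sum_n x(n) x(n + h), with x(n) = 0 outside 0, ..., N-1 (mean assumed to be 0)."""
    N, h = len(x), abs(h)
    return sum(x[n] * x[n + h] for n in range(N - h)) / N


def spectrum_from_acvf(x, alpha):
    """Sum over |h| < N of acvf_hat(h) * exp(-i h alpha)."""
    N = len(x)
    total = sum(acvf_hat(x, h) * cmath.exp(-1j * h * alpha) for h in range(-(N - 1), N))
    return total.real  # the imaginary parts cancel because acvf_hat is even in h


def periodogram(x, alpha):
    """(1/N) * |sum_n x_n exp(-i n alpha)|^2."""
    N = len(x)
    return abs(sum(x[n] * cmath.exp(-1j * n * alpha) for n in range(N))) ** 2 / N


x = [0.5, -1.0, 1.5, -0.5, -0.5]  # illustrative data with sample mean 0
for alpha in (0.3, 1.0, 2.5):
    print(alpha, spectrum_from_acvf(x, alpha), periodogram(x, alpha))
```

In practice the right-hand side is the cheaper one to compute, since at the frequencies $2\pi k/N$ the finite Fourier sum can be obtained with a fast Fourier transform.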


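For each $N \in \mathbb{Z}_+$, define
$$f_N(\alpha) = \frac1N E\Big[ \Big| \sum_{n=0}^{N-1} X_n e^{-in\alpha} \Big|^2 \Big] = \frac1N E\Big[ \sum_{n=0}^{N-1} X_n e^{-in\alpha} \sum_{m=0}^{N-1} X_m e^{im\alpha} \Big] = \frac1N \sum_{|h| < N} \cdots$$

[…] if $\psi_k = 0$ $(|k| > m)$, then $Y_t$ is a simple moving average
$$Y_t = \frac{1}{2m+1} \sum_{k=-m}^{m} X_{t-k}$$
and
$$\psi(e^{-i\alpha}) = \frac{1}{2m+1} \sum_{k=-m}^{m} e^{-ik\alpha}.$$
By the sum formula for a geometric series,
$$\psi(e^{-i\alpha}) = \frac{e^{im\alpha} - e^{-i(m+1)\alpha}}{(2m+1)(1 - e^{-i\alpha})} = \frac{e^{i(m+\frac12)\alpha} - e^{-i(m+\frac12)\alpha}}{(2m+1)(e^{i\frac{\alpha}{2}} - e^{-i\frac{\alpha}{2}})} = \frac{\sin\left((m + \tfrac12)\alpha\right)}{(2m+1) \sin\frac{\alpha}{2}} \qquad (\alpha \ne 0),$$
$$\psi(e^{-i\alpha}) = 1 \qquad (\alpha = 0).$$
So, by (a) and (b),
$$\gamma_Y(h) = \frac{1}{(2m+1)^2} \sum_{k=-m}^{m} \sum_{l=-m}^{m} \gamma_X(h - k + l),$$
$$f_Y(\alpha) = \frac{\sin^2\left((m + \tfrac12)\alpha\right)}{(2m+1)^2 \sin^2\frac{\alpha}{2}}\, f_X(\alpha) \quad (\alpha \ne 0), \qquad f_Y(\alpha) = f_X(\alpha) \quad (\alpha = 0).$$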
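The closed form of the transfer function of the simple moving average can be checked against the defining finite sum. In the sketch below the function names and the choice $m = 3$ are assumptions; the squared modulus of the transfer function is the factor by which the filter scales the spectral density at each frequency.

```python
import cmath
import math


def transfer_sum(m, alpha):
    """(1 / (2m + 1)) * sum_{k=-m}^{m} exp(-i k alpha), evaluated term by term."""
    return sum(cmath.exp(-1j * k * alpha) for k in range(-m, m + 1)) / (2 * m + 1)


def transfer_closed(m, alpha):
    """sin((m + 1/2) alpha) / ((2m + 1) sin(alpha / 2)), with the value 1 at alpha = 0."""
    if alpha == 0:
        return 1.0
    return math.sin((m + 0.5) * alpha) / ((2 * m + 1) * math.sin(alpha / 2))


m = 3
for alpha in (0.0, 0.2, 1.0, 3.0):
    direct = transfer_sum(m, alpha)
    closed = transfer_closed(m, alpha)
    gain = closed ** 2  # factor multiplying f_X(alpha) to give f_Y(alpha)
    print(alpha, direct.real, closed, gain)
```

The gain equals 1 at $\alpha = 0$ and is small at higher frequencies, which is why the moving average acts as a smoother.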

1.4 Autoregressive moving average models Autoregressive moving average (ARMA) models are the most important class of stationary time series. Here we discuss their basic properties and give Yule–Walker equations. We also introduce partial autocorrelation function and spectral densities of ARMA.

1.4.1 ARMA Models

Let $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$. If $\{X_t\}$ is stationary and for each $t$,
\[
X_t - \sum_{k=1}^{p} \varphi_k X_{t-k} = Z_t + \sum_{l=1}^{q} \theta_l Z_{t-l}, \tag{1.4.1}
\]
where $\varphi_k$, $\theta_l$ are constants, then $\{X_t\}$ is called an ARMA $(p, q)$ process and (1.4.1) is called an ARMA equation. If $\varphi_1 = \dots = \varphi_p = 0$, then $\{X_t\}$ is called an MA $(q)$ process. If $\theta_1 = \dots = \theta_q = 0$, then $\{X_t\}$ is called an AR $(p)$ process.

Let $\{X_t\}$ be a stationary time series. Define a time shift operator $B$ as $BX_t = X_{t-1}$. So, for any $l \in \mathbb{Z}_+$, $B^l X_t = X_{t-l}$. Denote
\[
\varphi(z) = 1 - \sum_{k=1}^{p} \varphi_k z^k, \qquad \theta(z) = 1 + \sum_{l=1}^{q} \theta_l z^l. \tag{1.4.2}
\]
They are polynomials of the complex variable $z$ of degree $\le p$ and degree $\le q$, respectively. Assume that $\varphi(z)$ and $\theta(z)$ have no common root. Denote the operator polynomials
\[
\varphi(B) = 1 - \sum_{k=1}^{p} \varphi_k B^k, \qquad \theta(B) = 1 + \sum_{l=1}^{q} \theta_l B^l.
\]

Since
\[
\varphi(B)X_t = X_t - \sum_{k=1}^{p} \varphi_k B^k X_t = X_t - \sum_{k=1}^{p} \varphi_k X_{t-k},
\qquad
\theta(B)Z_t = Z_t + \sum_{l=1}^{q} \theta_l B^l Z_t = Z_t + \sum_{l=1}^{q} \theta_l Z_{t-l},
\]
the ARMA equation can be written as $\varphi(B)X_t = \theta(B)Z_t$. The AR equation becomes $\varphi(B)X_t = Z_t$ and the MA equation becomes $X_t = \theta(B)Z_t$.

Suppose that the polynomial $\varphi(z) = 1 - \sum_{k=1}^{p} \varphi_k z^k \ne 0$ on the unit circle $|z| = 1$. Then the ARMA equation (1.4.1) has a unique solution
\[
X_t = \sum_{j\in\mathbb{Z}} \psi_j Z_{t-j} \quad (t \in \mathbb{Z}) \tag{1.4.3}
\]
in the mean square sense, where the coefficients $\psi_j$ $(j \in \mathbb{Z})$ are determined by
\[
\psi(z) = \frac{\theta(z)}{\varphi(z)} = \sum_{j\in\mathbb{Z}} \psi_j z^j \quad (|z| = 1), \tag{1.4.4}
\]
where $\theta(z) = 1 + \sum_{l=1}^{q} \theta_l z^l$. If the unique solution is $X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$, then the ARMA $(p, q)$ process is called causal.

For example, consider the ARMA equation
\[
X_t - \frac{10}{3} X_{t-1} + X_{t-2} = Z_t \quad\text{or}\quad \varphi(B)X_t = Z_t,
\]
where $\varphi(z) = 1 - \frac{10}{3} z + z^2$ and $\theta(z) = 1$. Since $\varphi(z) = 0$ has two roots $z = 3$ and $z = \frac13$, and $\varphi(z) \ne 0$ on $|z| = 1$, the ARMA equation has a unique solution. Note that
\[
\psi(z) = \frac{1}{\varphi(z)} = \frac{3}{8}\Big(\frac{1}{z-3} - \frac{1}{z-\frac13}\Big)
= -\frac{1}{8}\sum_{j=0}^{\infty}\Big(\frac{z}{3}\Big)^j - \frac{9}{8}\sum_{j=-\infty}^{-1}(3z)^j \quad (|z| = 1),
\]
i.e., $\psi(z) = \sum_{j\in\mathbb{Z}} \psi^o_j z^j$, where $\psi^o_j = -\frac{1}{8\cdot 3^j}$ $(j \ge 0)$ and $\psi^o_j = -\frac{9}{8}\,3^j$ $(j \le -1)$. By (1.4.3), the solution is
\[
X_t = \sum_{j\in\mathbb{Z}} \psi^o_j Z_{t-j}.
\]
Generally, the solution is determined by $\psi(z) = \frac{\theta(z)}{\varphi(z)}$, which is a rational function. When $\varphi(z) \ne 0$ $(|z| = 1)$, the function $\psi(z)$ can be expanded into (1.4.4). In fact, if $\varphi(z)$ has $p$ different roots $\alpha_1, \dots, \alpha_p$, then
\[
\psi(z) = \frac{\beta_1}{z - \alpha_1} + \frac{\beta_2}{z - \alpha_2} + \dots + \frac{\beta_p}{z - \alpha_p} + Q(z), \tag{1.4.5}
\]
where $Q(z)$ is a polynomial of $z$ and $\beta_\mu = \lim_{z\to\alpha_\mu} \psi(z)(z - \alpha_\mu)$ $(\mu = 1, \dots, p)$. If $|\alpha_\mu| > 1$, then $\frac{\beta_\mu}{z-\alpha_\mu}$ can be expanded into a power series on $|z| = 1$; if $|\alpha_\mu| < 1$, then it can be expanded into a negative power series. From this and (1.4.5), we get $\psi(z) = \sum_{j\in\mathbb{Z}} \psi_j z^j$ $(|z| = 1)$.
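As a numerical sanity check of the two-sided expansion in the example φ(z) = 1 − (10/3)z + z², the sketch below sums the series on the unit circle and compares it with 1/φ(z). It assumes the coefficients ψ_j = −1/(8·3^j) for j ≥ 0 and ψ_j = −(9/8)·3^j for j ≤ −1, which are the signs obtained from the partial fraction decomposition; the evaluation point and truncation level are arbitrary choices.

```python
import numpy as np

def psi(j):
    """Coefficients of 1/phi(z) for phi(z) = 1 - (10/3) z + z^2 on |z| = 1."""
    return -1.0 / (8 * 3.0**j) if j >= 0 else -(9.0 / 8) * 3.0**j

def phi(z):
    return 1 - (10 / 3) * z + z**2

z = np.exp(0.9j)          # an arbitrary point on the unit circle
J = 60                    # truncation level; terms decay like 3^{-|j|}
series = sum(psi(j) * z**j for j in range(-J, J + 1))
exact = 1 / phi(z)
```

Because the root z = 1/3 lies inside the unit disk, the expansion needs negative powers of z, so the solution depends on future noise values and is not causal.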

Causal solution

An ARMA $(p, q)$ process is causal if and only if $\varphi(z) \ne 0$ $(|z| \le 1)$, where $\varphi(z)$ is stated in (1.4.2). In fact, if $\varphi(z) = 1 - \varphi_1 z - \dots - \varphi_p z^p \ne 0$ $(|z| \le 1)$, then all roots of $\varphi(z)$ lie outside the unit disk. So each fraction $\frac{\beta_j}{z-\alpha_j}$ in (1.4.5) is expanded into a power series. This implies that
\[
\psi(z) = \frac{\theta(z)}{\varphi(z)} = \sum_{j=0}^{\infty} \psi_j z^j,
\]
i.e., the ARMA $(p, q)$ process is causal. The converse is also true.

Another method for finding the coefficients $\{\psi_j\}$ is as follows. From $\psi(z) = \frac{\theta(z)}{\varphi(z)}$, it follows that
\[
(1 - \varphi_1 z - \dots - \varphi_p z^p)(\psi_0 + \psi_1 z + \cdots) = 1 + \theta_1 z + \dots + \theta_q z^q.
\]
Equating the coefficients of $z^j$ $(j = 0, 1, \dots)$, we get
\[
\psi_0 = 1, \qquad \psi_1 = \theta_1 + \varphi_1, \qquad \psi_2 = \theta_2 + \varphi_2 + \theta_1\varphi_1 + \varphi_1^2.
\]


In general,
\[
\psi_l = \theta_l + \sum_{k=1}^{l} \varphi_k \psi_{l-k} \quad (l \in \mathbb{Z}_+),
\]
where $\theta_l = 0$ $(l > q)$ and $\varphi_k = 0$ $(k > p)$. Using a similar argument, we get the following:

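Invertible solution

An ARMA $(p, q)$ process is invertible if and only if $\theta(z) \ne 0$ $(|z| \le 1)$, where $\theta(z)$ is stated in (1.4.2).

For example, consider an ARMA $(1, 1)$ process
\[
X_t - \varphi X_{t-1} = Z_t + \theta Z_{t-1} \quad (|\varphi| < 1,\ |\theta| < 1,\ \varphi + \theta \ne 0).
\]
It is clear that for all $|z| \le 1$,
\[
\varphi(z) = 1 - \varphi z \ne 0, \qquad \theta(z) = 1 + \theta z \ne 0.
\]
Since $|\varphi z| < 1$ on $|z| \le 1$, the ARMA process is causal. Note that
\[
\psi(z) = \frac{1 + \theta z}{1 - \varphi z} = 1 + \sum_{j=1}^{\infty} \varphi^{j-1}(\varphi + \theta) z^j.
\]
So $\psi_0 = 1$ and $\psi_j = \varphi^{j-1}(\varphi + \theta)$ $(j = 1, 2, \dots)$, and
\[
X_t = Z_t + (\varphi + \theta)\sum_{j=1}^{\infty} \varphi^{j-1} Z_{t-j}.
\]
Since $|\theta z| < 1$ on $|z| \le 1$, the function
\[
\zeta(z) = \frac{1 - \varphi z}{1 + \theta z} = (1 - \varphi z)\sum_{j=0}^{\infty} (-1)^j \theta^j z^j = 1 + \sum_{j=1}^{\infty} (-1)^j \theta^{j-1}(\varphi + \theta) z^j
\]
is a power series. So
\[
Z_t = X_t + (\varphi + \theta)\sum_{j=1}^{\infty} (-1)^j \theta^{j-1} X_{t-j},
\]
i.e., the ARMA process is invertible.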
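The coefficient recursion ψ_l = θ_l + Σ_{k=1}^{l} φ_k ψ_{l−k} of this subsection is easy to implement. The sketch below (function and variable names are illustrative) computes the first few ψ_j for a causal ARMA process and compares them with the closed form ψ_j = φ^{j−1}(φ + θ) of the ARMA (1, 1) example.

```python
def psi_weights(phi, theta, n):
    """Return psi_0..psi_n with psi_l = theta_l + sum_{k=1}^{l} phi_k psi_{l-k}."""
    p, q = len(phi), len(theta)
    psi = [1.0]                                   # psi_0 = theta_0 = 1
    for l in range(1, n + 1):
        th = theta[l - 1] if l <= q else 0.0      # theta_l = 0 for l > q
        ar = sum(phi[k - 1] * psi[l - k] for k in range(1, min(l, p) + 1))
        psi.append(th + ar)
    return psi

phi1, theta1 = 0.5, 0.4                           # illustrative parameter values
weights = psi_weights([phi1], [theta1], 6)
closed_form = [1.0] + [phi1 ** (j - 1) * (phi1 + theta1) for j in range(1, 7)]
```

The same function with the roles of φ and θ exchanged (and signs adjusted) would give the coefficients of the invertible representation; that variant is not shown here.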

1.4.2 Yule–Walker equation

The Yule–Walker equation describes the relation between the parameters $\varphi_1, \dots, \varphi_p$; $\theta_1, \dots, \theta_q$; $\sigma^2$ of an ARMA process and its autocovariance function $\gamma(h)$.

Suppose that $\{X_t\}$ is a causal AR $(p)$ process, $X_t - \sum_{\nu=1}^{p} \varphi_\nu X_{t-\nu} = Z_t$. Multiplying both sides by $X_{t-k}$ and then taking expectations on both sides,
\[
E[X_t X_{t-k}] - \sum_{\nu=1}^{p} \varphi_\nu E[X_{t-k} X_{t-\nu}] = E[Z_t X_{t-k}].
\]
Since $\{X_t\}$ is causal, $X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$ $(t \in \mathbb{Z})$. So, for $k = 0, 1, \dots$,
\[
E[Z_t X_{t-k}] = \sum_{j=0}^{\infty} \psi_j E[Z_t Z_{t-k-j}] = \sigma^2 \sum_{j=0}^{\infty} \psi_j \delta_{0,k+j}
= \begin{cases} \sigma^2 & (k = 0),\\ 0 & (k \ne 0), \end{cases}
\]
where we used $\psi_0 = 1$.

From this and $E[X_t X_{t-k}] = \gamma(k)$, and $E[X_{t-k} X_{t-\nu}] = \gamma(k - \nu)$, it follows that the Yule–Walker equation is
\[
\gamma(k) - \sum_{\nu=1}^{p} \varphi_\nu \gamma(k - \nu) = \begin{cases} \sigma^2 & (k = 0),\\ 0 & (k \ne 0). \end{cases}
\]
Its matrix form is
\[
\gamma(0) - \varphi^T \gamma_p = \sigma^2, \qquad \Gamma_p \varphi = \gamma_p,
\]
where $\Gamma_p = (\gamma(i - j))_{i,j=1,\dots,p}$, $\gamma_p = (\gamma(1), \dots, \gamma(p))^T$, and $\varphi = (\varphi_1, \dots, \varphi_p)^T$.

For example, consider an AR $(2)$ process
\[
X_t - \frac{3}{4} X_{t-1} + \frac{1}{8} X_{t-2} = Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, 1).
\]
By the Yule–Walker equation, the autocovariance function $\gamma(h)$ satisfies
(a) $\gamma(0) - \frac34 \gamma(1) + \frac18 \gamma(2) = 1$,
(b) $-\frac34 \gamma(0) + \frac98 \gamma(1) = 0$,
(c) $\gamma(k) - \frac34 \gamma(k-1) + \frac18 \gamma(k-2) = 0$ $(k \ge 2)$.

Since the polynomial $\varphi(z) = 1 - \frac34 z + \frac18 z^2$ has the two zeros $z_1 = 2$ and $z_2 = 4$, the homogeneous linear difference equation (c) has the general solution
\[
\gamma(h) = a_1 2^{-h} + a_2 4^{-h} \quad (h \ge 0),
\]
where $a_1$ and $a_2$ are arbitrary constants. Substituting it into (a) and (b), we get
\[
84 a_1 + 105 a_2 = 128, \qquad 42 a_1 + 105 a_2 = 0.
\]
So $a_1 = \frac{64}{21}$ and $a_2 = -\frac{128}{105}$. This implies
\[
\gamma(h) = 128\Big(\frac{2^{-h}}{42} - \frac{4^{-h}}{105}\Big) \quad (h \ge 0).
\]
More conveniently, take $k = 2$ in (c) and solve (a), (b), (c) as three linear equations. The solution is
\[
\gamma(0) = \frac{64}{35}, \qquad \gamma(1) = \frac{128}{105}, \qquad \gamma(2) = \frac{24}{35},
\]
and then $\gamma(3), \gamma(4), \dots$ can be found from (c) successively. This is an especially convenient method for the numerical determination of the autocovariances $\gamma(h)$.
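The numerical route described for the AR (2) example can be sketched as follows: solve the Yule–Walker equations for k = 0, 1, 2 as a 3 × 3 linear system, then extend with the recursion (c). NumPy is assumed, and the variable names are illustrative.

```python
import numpy as np

phi1, phi2, sigma2 = 3 / 4, -1 / 8, 1.0   # X_t - (3/4)X_{t-1} + (1/8)X_{t-2} = Z_t

# Unknowns (gamma(0), gamma(1), gamma(2)); rows are the equations k = 0, 1, 2.
A = np.array([
    [1.0,   -phi1,      -phi2],   # gamma(0) - phi1*gamma(1) - phi2*gamma(2) = sigma^2
    [-phi1, 1.0 - phi2,  0.0],    # gamma(1) - phi1*gamma(0) - phi2*gamma(1) = 0
    [-phi2, -phi1,       1.0],    # gamma(2) - phi1*gamma(1) - phi2*gamma(0) = 0
])
b = np.array([sigma2, 0.0, 0.0])
gamma = list(np.linalg.solve(A, b))

# Later lags follow from the homogeneous recursion (c).
for k in range(3, 8):
    gamma.append(phi1 * gamma[k - 1] + phi2 * gamma[k - 2])

# Closed form obtained from the roots z = 2 and z = 4.
closed = [128 * (2.0**-h / 42 - 4.0**-h / 105) for h in range(8)]
```

The two lists agree, with gamma[0] = 64/35, gamma[1] = 128/105 and gamma[2] = 24/35.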

For an AR $(p)$ process, from the first $p + 1$ Yule–Walker equations
\[
\gamma(0) - \sum_{\nu=1}^{p} \varphi_\nu \gamma(\nu) = \sigma^2, \qquad
\gamma(k) - \sum_{\nu=1}^{p} \varphi_\nu \gamma(k - \nu) = 0 \quad (k = 1, \dots, p),
\]
we may find $\gamma(0), \gamma(1), \dots, \gamma(p)$ and then find $\gamma(p + 1), \gamma(p + 2), \dots$ successively using the equations
\[
\gamma(k) - \sum_{\nu=1}^{p} \varphi_\nu \gamma(k - \nu) = 0 \quad (k = p + 1, \dots).
\]

More generally, if $\{X_t\}$ is a causal ARMA process
\[
X_t - \sum_{k=1}^{p} \varphi_k X_{t-k} = Z_t + \sum_{l=1}^{q} \theta_l Z_{t-l}, \qquad X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j},
\]
the relation between $\varphi_k$, $\theta_l$, $\sigma^2$ and $\gamma(h)$ is obtained by a similar method:
\[
\gamma(k) - \sum_{\nu=1}^{p} \varphi_\nu \gamma(k - \nu) = \sigma^2 \sum_{j=k}^{q} \theta_j \psi_{j-k} \quad (0 \le k < m), \tag{1.4.6}
\]
\[
\gamma(k) - \sum_{\nu=1}^{p} \varphi_\nu \gamma(k - \nu) = 0 \quad (k \ge m), \tag{1.4.7}
\]
where $m = \max\{p, q + 1\}$, $\theta_0 = 1$, and $\theta_j = 0$ $(j \ge q + 1)$. The homogeneous linear difference equation (1.4.7) with constant coefficients has a solution
\[
\gamma(h) = \sum_{l=1}^{p} a_l \xi_l^{-h} \quad (h \ge m - p), \tag{1.4.8}
\]
where $a_1, \dots, a_p$ are arbitrary constants and $\xi_1, \dots, \xi_p$ are the different roots of the polynomial
\[
\varphi(z) = 1 - \sum_{k=1}^{p} \varphi_k z^k.
\]

In fact, substituting $\gamma(h) = \xi_l^{-h}$ $(l = 1, \dots, p;\ h \ge m - p)$ into the left-hand side of the equation (1.4.7),
\[
\gamma(k) - \sum_{\nu=1}^{p} \varphi_\nu \gamma(k - \nu) = \xi_l^{-k}\Big(1 - \sum_{\nu=1}^{p} \varphi_\nu \xi_l^{\nu}\Big) = \xi_l^{-k}\varphi(\xi_l) = 0
\]
since $\xi_l$ is a root of $\varphi(z)$. Note that (1.4.7) is a homogeneous linear difference equation. The linear combination (1.4.8) of $\xi_1^{-h}, \dots, \xi_p^{-h}$ is the general solution of (1.4.7). Substituting (1.4.8) into (1.4.6), we obtain a system of $m$ linear equations that determine the constants $a_1, \dots, a_p$ and the autocovariances $\gamma(0), \dots, \gamma(m - p - 1)$.

For example, consider the ARMA $(1, 1)$ process
\[
X_t - \varphi X_{t-1} = Z_t + \theta Z_{t-1} \quad (|\varphi| < 1), \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2).
\]
Since $|\varphi| < 1$, it has a causal solution
\[
X_t = Z_t + \sum_{n=1}^{\infty} \psi_n Z_{t-n},
\]
where $\psi_n = (\varphi + \theta)\varphi^{n-1}$ $(n \ge 1)$. By (1.4.6) with $k = 0, 1$, we get
\[
\gamma(0) - \varphi\gamma(1) = \sigma^2(1 + (\varphi + \theta)\theta), \qquad \gamma(1) - \varphi\gamma(0) = \sigma^2\theta.
\]
By (1.4.7) and (1.4.8), the homogeneous equation
\[
\gamma(k) - \varphi\gamma(k - 1) = 0 \quad (k \ge 2)
\]
has the solution $\gamma(h) = a\varphi^h$ $(h \ge 1)$. Substituting this solution into the above equations, we get
\[
\gamma(0) - a\varphi^2 = \sigma^2(1 + \theta(\varphi + \theta)), \qquad -\varphi\gamma(0) + a\varphi = \sigma^2\theta.
\]
Their solution is
\[
a = \frac{(\varphi + \theta)(1 + \varphi\theta)}{\varphi(1 - \varphi^2)}\,\sigma^2, \qquad
\gamma(0) = \Big(1 + \frac{(\varphi + \theta)^2}{1 - \varphi^2}\Big)\sigma^2. \tag{1.4.9}
\]
So
\[
\gamma(h) = a\varphi^h = \frac{(\varphi + \theta)(1 + \varphi\theta)}{1 - \varphi^2}\,\varphi^{h-1}\sigma^2 \quad (h \ge 1). \tag{1.4.10}
\]
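The closed forms (1.4.9) and (1.4.10) can be cross-checked against the linear-process formula γ(h) = σ² Σ_j ψ_j ψ_{j+h} with ψ_0 = 1 and ψ_n = (φ + θ)φ^{n−1}. The sketch below does this with a truncated sum; the function names, parameter values and truncation length are illustrative assumptions.

```python
def arma11_autocov(phi, theta, sigma2, h):
    """gamma(h) from (1.4.9) and (1.4.10); assumes |phi| < 1."""
    h = abs(h)
    if h == 0:
        return (1 + (phi + theta) ** 2 / (1 - phi**2)) * sigma2
    return (phi + theta) * (1 + phi * theta) / (1 - phi**2) * phi ** (h - 1) * sigma2

def autocov_from_psi(phi, theta, sigma2, h, terms=400):
    """gamma(h) = sigma^2 * sum_j psi_j psi_{j+h}, truncated after `terms` terms."""
    psi = [1.0] + [(phi + theta) * phi ** (n - 1) for n in range(1, terms + h + 1)]
    return sigma2 * sum(psi[j] * psi[j + h] for j in range(terms))

phi, theta, sigma2 = 0.6, 0.3, 2.0
pairs = [(arma11_autocov(phi, theta, sigma2, h), autocov_from_psi(phi, theta, sigma2, h))
         for h in range(5)]
```

Each pair agrees up to the truncation error, which decays geometrically in φ.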

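1.4.3 Partial autocorrelation function and spectral densities

Suppose that $\{X_t\}$ is a stationary time series with mean 0 and autocovariance function $\gamma(h)$. Denote the matrix $\Gamma_h = (\gamma(i - j))_{i,j=1,\dots,h}$ and the vector $\gamma_h = (\gamma(1), \dots, \gamma(h))^T$. Let
\[
\Phi_h = \Gamma_h^{-1}\gamma_h, \qquad \Phi_h = (\Phi_{h1}, \dots, \Phi_{hh})^T.
\]
The partial autocorrelation function $\beta(h)$ is defined as
\[
\beta(0) = 1, \qquad \beta(h) = \Phi_{hh} \quad (h \ge 1).
\]
Consider the MA $(1)$ process $X_t = Z_t + \theta Z_{t-1}$, $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$. By $\gamma(0) = (1 + \theta^2)\sigma^2$ and $\gamma(1) = \theta\sigma^2$, it follows that
\[
\Phi_1 = \Gamma_1^{-1}\gamma_1 = \frac{\gamma(1)}{\gamma(0)} = \frac{\theta}{1 + \theta^2}.
\]
So $\beta(1) = \Phi_{11} = \frac{\theta}{1+\theta^2}$. By $\gamma(0) = (1 + \theta^2)\sigma^2$, $\gamma(1) = \theta\sigma^2$, and $\gamma(2) = 0$, it follows that
\[
\Phi_2 = \Gamma_2^{-1}\gamma_2
= \begin{pmatrix} \gamma(0) & \gamma(1)\\ \gamma(1) & \gamma(0) \end{pmatrix}^{-1} \begin{pmatrix} \gamma(1)\\ \gamma(2) \end{pmatrix}
= \frac{1}{\gamma^2(0) - \gamma^2(1)} \begin{pmatrix} \gamma(0) & -\gamma(1)\\ -\gamma(1) & \gamma(0) \end{pmatrix} \begin{pmatrix} \gamma(1)\\ 0 \end{pmatrix}
= \frac{1}{\gamma^2(0) - \gamma^2(1)} \begin{pmatrix} \gamma(1)\gamma(0)\\ -\gamma^2(1) \end{pmatrix}.
\]
So
\[
\beta(2) = \Phi_{22} = \frac{-\gamma^2(1)}{\gamma^2(0) - \gamma^2(1)} = \frac{-\theta^2}{1 + \theta^2 + \theta^4}.
\]
By $\gamma(0) = (1 + \theta^2)\sigma^2$, $\gamma(1) = \theta\sigma^2$, $\gamma(2) = \gamma(3) = 0$, and
\[
\Gamma_3 = \begin{pmatrix} \gamma(0) & \gamma(1) & 0\\ \gamma(1) & \gamma(0) & \gamma(1)\\ 0 & \gamma(1) & \gamma(0) \end{pmatrix},
\]
and $\gamma_3 = (\gamma(1), 0, 0)^T$, it follows that
\[
\Gamma_3^{-1} = \big(\gamma^3(0) - 2\gamma^2(1)\gamma(0)\big)^{-1}
\begin{pmatrix}
\gamma^2(0) - \gamma^2(1) & -\gamma(0)\gamma(1) & \gamma^2(1)\\
-\gamma(0)\gamma(1) & \gamma^2(0) & -\gamma(0)\gamma(1)\\
\gamma^2(1) & -\gamma(0)\gamma(1) & \gamma^2(0) - \gamma^2(1)
\end{pmatrix},
\]
\[
\Phi_3 = \Gamma_3^{-1}\gamma_3 = \frac{\gamma(1)}{\gamma^3(0) - 2\gamma^2(1)\gamma(0)}
\begin{pmatrix} \gamma^2(0) - \gamma^2(1)\\ -\gamma(0)\gamma(1)\\ \gamma^2(1) \end{pmatrix}.
\]
So
\[
\beta(3) = \Phi_{33} = \frac{\gamma^3(1)}{\gamma^3(0) - 2\gamma^2(1)\gamma(0)} = \frac{\theta^3}{1 + \theta^2 + \theta^4 + \theta^6}.
\]
In general,
\[
\beta(h) = \Phi_{hh} = \frac{(-1)^{h+1}\theta^h}{1 + \theta^2 + \dots + \theta^{2h}} \quad (h \ge 1).
\]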
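The definition β(h) = Φ_hh can be evaluated directly by solving Γ_h Φ_h = γ_h. The following sketch, assuming NumPy and illustrative names, does so for the MA (1) process and compares the result with the closed form (−1)^{h+1} θ^h / (1 + θ² + ⋯ + θ^{2h}).

```python
import numpy as np

def pacf_from_autocov(gamma, h):
    """beta(h) = last component of Phi_h = Gamma_h^{-1} gamma_h (h >= 1)."""
    Gamma_h = np.array([[gamma(i - j) for j in range(h)] for i in range(h)])
    gamma_h = np.array([gamma(k) for k in range(1, h + 1)])
    return np.linalg.solve(Gamma_h, gamma_h)[-1]

theta, sigma2 = 0.5, 1.0                      # illustrative parameter values

def gamma_ma1(k):
    """Autocovariance of X_t = Z_t + theta Z_{t-1}."""
    k = abs(k)
    return (1 + theta**2) * sigma2 if k == 0 else (theta * sigma2 if k == 1 else 0.0)

def beta_closed_form(h):
    return (-1) ** (h + 1) * theta**h / sum(theta ** (2 * i) for i in range(h + 1))

numeric = [pacf_from_autocov(gamma_ma1, h) for h in range(1, 6)]
closed = [beta_closed_form(h) for h in range(1, 6)]
```

Unlike the autocovariance, the partial autocorrelation of an MA (1) process does not cut off; it decays geometrically with alternating sign when θ > 0.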


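Consider the AR $(p)$ process
\[
X_t - \varphi_1 X_{t-1} - \dots - \varphi_p X_{t-p} = Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2).
\]
The Yule–Walker equation is $\varphi = \Gamma_p^{-1}\gamma_p$, where $\varphi = (\varphi_1, \dots, \varphi_p)^T$, $\Gamma_p = (\gamma(i - j))_{i,j=1,\dots,p}$, and $\gamma_p = (\gamma(1), \dots, \gamma(p))^T$. Comparing it with the definition of the partial autocorrelation function,
\[
\Phi_p = \varphi, \qquad \beta(p) = \Phi_{pp} = \varphi_p, \qquad \beta(h) = 0 \quad (h > p).
\]
For $h < p$, $\beta(h)$ can easily be computed by the definition.

Now we turn to the spectral density. Suppose that $\{X_t\}$ is a causal ARMA $(p, q)$ process
\[
\varphi(B)X_t = \theta(B)Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2).
\]
So
\[
X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}, \qquad \varphi(z) \ne 0 \quad (|z| \le 1),
\]
where $\{\psi_j\}$ satisfy $\psi(z) = \frac{\theta(z)}{\varphi(z)} = \sum_{j=0}^{\infty} \psi_j z^j$ and $\sum_{j=0}^{\infty} |\psi_j| < \infty$. By (1.3.4), the spectral density is
\[
f_X(\alpha) = |\psi(e^{i\alpha})|^2 f_Z(\alpha) = \Big|\frac{\theta(e^{i\alpha})}{\varphi(e^{i\alpha})}\Big|^2 f_Z(\alpha).
\]
For white noise $\{Z_t\}$,
\[
E[Z_t] = 0, \qquad \gamma(0) = E[Z_t^2] = \sigma^2, \qquad \gamma(h) = E[Z_{t+|h|} Z_t] = 0 \quad (h \ne 0),
\]
and its spectral density is
\[
f_Z(\alpha) = \sum_{h} \gamma(h)\,e^{-ih\alpha} = \gamma(0) = \sigma^2.
\]
Thus the spectral density of the causal ARMA $(p, q)$ process $\{X_t\}$ is
\[
f_X(\alpha) = \sigma^2\,\frac{|\theta(e^{i\alpha})|^2}{|\varphi(e^{-i\alpha})|^2},
\]
where $|\varphi(e^{-i\alpha})| = |\varphi(e^{i\alpha})|$ because the coefficients are real.

For example, in the causal ARMA $(1, 2)$ process
\[
X_t - \varphi_1 X_{t-1} = Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2},
\]
we have
\[
\theta(z) = 1 + \theta_1 z + \theta_2 z^2, \qquad \varphi(z) = 1 - \varphi_1 z.
\]
So the spectral density of the causal ARMA $(1, 2)$ process is
\[
f_X(\alpha) = \sigma^2\,\frac{|\theta(e^{i\alpha})|^2}{|\varphi(e^{-i\alpha})|^2}
= \sigma^2\,\frac{1 + \theta_1^2 - 2\theta_2 + \theta_2^2 + 2(\theta_1\theta_2 + \theta_1)\cos\alpha + 4\theta_2\cos^2\alpha}{1 + \varphi_1^2 - 2\varphi_1\cos\alpha}.
\]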
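The trigonometric form of the ARMA (1, 2) spectral density can be checked against a direct evaluation of σ²|θ(e^{iα})|²/|φ(e^{−iα})|². The sketch below assumes NumPy; the function names and parameter values are illustrative.

```python
import numpy as np

def arma12_spectral_density(alpha, phi1, theta1, theta2, sigma2):
    """sigma^2 |theta(e^{i alpha})|^2 / |phi(e^{-i alpha})|^2, evaluated directly."""
    z = np.exp(1j * alpha)
    num = abs(1 + theta1 * z + theta2 * z**2) ** 2
    den = abs(1 - phi1 * np.conj(z)) ** 2
    return sigma2 * num / den

def arma12_closed_form(alpha, phi1, theta1, theta2, sigma2):
    """The expanded cosine form of the same density."""
    c = np.cos(alpha)
    num = (1 + theta1**2 - 2 * theta2 + theta2**2
           + 2 * (theta1 * theta2 + theta1) * c + 4 * theta2 * c**2)
    den = 1 + phi1**2 - 2 * phi1 * c
    return sigma2 * num / den

alphas = np.linspace(-np.pi, np.pi, 9)
direct = arma12_spectral_density(alphas, 0.5, 0.4, -0.3, 1.5)
closed = arma12_closed_form(alphas, 0.5, 0.4, -0.3, 1.5)
```

The term −2θ₂ + 4θ₂cos²α in the numerator is simply 2θ₂cos 2α rewritten with the double-angle formula.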

1.5 Prediction and modeling of ARMA processes

The innovation algorithm of the linear prediction of time series has been discussed in Section 1.2. For ARMA processes, one can simplify the innovation algorithm. Based on an observed stationary time series, one can determine an appropriate ARMA model fitting it.

1.5.1 Prediction

For a causal ARMA process
\[
X_t - \varphi_1 X_{t-1} - \dots - \varphi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2),
\]
we use the innovation algorithm to compute the one-step predictor $X^o_{n+1}$ of $X_{n+1}$. To simplify the algorithm, let $m = \max\{p, q\}$ and
\[
S_t = \sigma^{-1} X_t \quad (t = 1, \dots, m), \qquad
S_t = \sigma^{-1}(X_t - \varphi_1 X_{t-1} - \dots - \varphi_p X_{t-p}) \quad (t > m). \tag{1.5.1}
\]
Denote the autocovariances $h_{ij} = E[S_i S_j]$. A direct computation shows that for $1 \le i, j \le m$, since $\{X_t\}$ is stationary,
\[
h_{ij} = \sigma^{-2} E[X_i X_j] = \sigma^{-2}\gamma(i - j),
\]
where $\gamma$ is the autocovariance function of $\{X_t\}$, which can be calculated as in Section 1.4.2. For $1 \le i \le m$ and $m < j \le 2m$,
\[
h_{ij} = \sigma^{-2} E[X_i(X_j - \varphi_1 X_{j-1} - \dots - \varphi_p X_{j-p})]
= \sigma^{-2}\big(\gamma(j - i) - \varphi_1\gamma(j - 1 - i) - \dots - \varphi_p\gamma(j - p - i)\big).
\]
For $1 \le i \le m$ and $j > 2m$, $h_{ij} = 0$. For $1 \le j \le m$ and $m < i \le 2m$,
\[
h_{ij} = \sigma^{-2}\big(\gamma(i - j) - \varphi_1\gamma(i - 1 - j) - \dots - \varphi_p\gamma(i - p - j)\big).
\]
For $1 \le j \le m$ and $i > 2m$, $h_{ij} = 0$. For $i > m$ and $j > m$,
\[
h_{ij} = \sigma^{-2} E[(X_i - \varphi_1 X_{i-1} - \dots - \varphi_p X_{i-p})(X_j - \varphi_1 X_{j-1} - \dots - \varphi_p X_{j-p})]
= \sigma^{-2} E[(Z_i + \theta_1 Z_{i-1} + \dots + \theta_q Z_{i-q})(Z_j + \theta_1 Z_{j-1} + \dots + \theta_q Z_{j-q})].
\]
By the autocovariance function formula (1.1.1) of a linear process,
\[
h_{ij} = \begin{cases}
\theta_0\theta_{j-i} + \theta_1\theta_{1+j-i} + \dots + \theta_q\theta_{q+j-i} & (j > i),\\
\theta_0\theta_{i-j} + \theta_1\theta_{1+i-j} + \dots + \theta_q\theta_{q+i-j} & (i > j),
\end{cases}
\]
where $\theta_0 = 1$ and $\theta_l = 0$ for $l > q$. Therefore, the autocovariances $\{h_{ij}\}$ are as follows:
\[
h_{ij} = \begin{cases}
\sigma^{-2}\gamma(i - j) & (1 \le i, j \le m),\\
\sigma^{-2}\big(\gamma(i - j) - \sum_{l=1}^{p} \varphi_l \gamma(l - |i - j|)\big) & (\min\{i, j\} \le m < \max\{i, j\} \le 2m),\\
\sum_{l=0}^{q} \theta_l\theta_{l+|i-j|} & (\min\{i, j\} > m),\\
0 & \text{otherwise}.
\end{cases} \tag{1.5.2}
\]
Applying the innovation algorithm to $\{S_t\}$, the best linear predictor of $S_{n+1}$ in terms of $\{1, S_1, \dots, S_n\}$ is
\[
S^o_{n+1} = \begin{cases}
\sum_{j=1}^{n} \alpha_{nj}(S_{n+1-j} - S^o_{n+1-j}) & (1 \le n < m),\\
\sum_{j=1}^{q} \alpha_{nj}(S_{n+1-j} - S^o_{n+1-j}) & (n \ge m),
\end{cases} \tag{1.5.3}
\]
where the coefficients $\{\alpha_{nj}\}$ are found from the recurrence formula (1.2.5) in the innovation algorithm. Denote by $X^o_k$ the best linear predictor of $X_k$ in terms of $\{1, X_1, \dots, X_{k-1}\}$. By the linearity of the prediction operator and (1.5.1), we get
\[
S^o_k = \begin{cases}
\sigma^{-1} X^o_k & (k = 1, \dots, m),\\
\sigma^{-1}(X^o_k - \varphi_1 X_{k-1} - \dots - \varphi_p X_{k-p}) & (k > m),
\end{cases}
\qquad
S_k - S^o_k = \sigma^{-1}(X_k - X^o_k) \quad (k \in \mathbb{Z}_+).
\]
From this and (1.5.3), it follows that
\[
X^o_{n+1} = \begin{cases}
\sum_{j=1}^{n} \alpha_{nj}(X_{n+1-j} - X^o_{n+1-j}) & (1 \le n < m),\\
\varphi_1 X_n + \dots + \varphi_p X_{n+1-p} + \sum_{j=1}^{q} \alpha_{nj}(X_{n+1-j} - X^o_{n+1-j}) & (n \ge m),
\end{cases} \tag{1.5.4}
\]
and the mean squared errors are
\[
V_{n+1} = E[(X_{n+1} - X^o_{n+1})^2] = \sigma^2 E[(S_{n+1} - S^o_{n+1})^2] = \sigma^2\lambda_n, \tag{1.5.5}
\]

where $\alpha_{nj}$ and $\lambda_n$ are obtained from the innovation algorithm with the $h_{ij}$ stated in (1.5.2). In particular, the predictions of the AR $(p)$ and MA $(q)$ processes are easier. By (1.5.4), the prediction of the AR $(p)$ process is
\[
X^o_{n+1} = \varphi_1 X_n + \dots + \varphi_p X_{n+1-p} \quad (n \ge p)
\]
and the prediction of the MA $(q)$ process is
\[
X^o_{n+1} = \sum_{j=1}^{\min\{n, q\}} \alpha_{nj}(X_{n+1-j} - X^o_{n+1-j}) \quad (n \ge 1),
\]
where the $\alpha_{nj}$ are computed by the innovation algorithm from $h_{ij} = \sigma^{-2}\gamma(i - j) = \sum_{l=0}^{q-|i-j|} \theta_l\theta_{l+|i-j|}$.

In general, the algorithm of the best linear prediction for an ARMA $(p, q)$ process is as follows.
Step 1. Use the method given in Section 1.4.2 to compute the autocovariance function $\gamma$.
Step 2. Compute $\{h_{ij}\}_{i,j\in\mathbb{Z}_+}$ by (1.5.2).
Step 3. Use the recurrence formula (1.2.5) of the innovation algorithm to find $\alpha_{nj}$ and $\lambda_n$.
Step 4. The predictor $X^o_{n+1}$ and the mean squared error are given by the formulas (1.5.4) and (1.5.5).

For example, consider the prediction of an ARMA $(1, 1)$ process
\[
X_t - \varphi X_{t-1} = Z_t + \theta Z_{t-1}, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2),
\]
where $|\varphi| < 1$. Now $m = p = q = 1$. Let
\[
S_t = \sigma^{-1} X_t \quad (t = 1), \qquad S_t = \sigma^{-1}(X_t - \varphi X_{t-1}) \quad (t > 1).
\]
By (1.5.2) and $h_{ij} = E[S_i S_j]$, we get
\[
\begin{aligned}
h_{11} &= \sigma^{-2}\gamma(0) = 1 + \frac{(\varphi + \theta)^2}{1 - \varphi^2} && \text{(by (1.4.9))},\\
h_{12} &= h_{21} = \sigma^{-2}(\gamma(1) - \varphi\gamma(0)) = \theta && \text{(by (1.4.10))},\\
h_{i+1,i} &= h_{i,i+1} = \theta_0\theta_1 + \theta_1\theta_2 = \theta_0\theta_1 = \theta && (i > 1),\\
h_{ii} &= \theta_0^2 + \theta_1^2 = 1 + \theta^2 && (i \ge 2),\\
h_{ij} &= \theta_0\theta_{|i-j|} + \theta_1\theta_{1+|i-j|} = 0 && \text{otherwise}.
\end{aligned}
\]
By (1.5.4), the best linear predictor of $X_{n+1}$ is
\[
X^o_{n+1} = \varphi X_n + \alpha_{n1}(X_n - X^o_n) \quad (n \ge 1).
\]


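By the recurrence formula of the innovation algorithm, successively compute $\lambda_0, \alpha_{11}, \lambda_1, \alpha_{22}, \alpha_{21}, \lambda_2, \alpha_{33}, \alpha_{32}, \alpha_{31}, \dots$ as follows:
\[
\begin{aligned}
\lambda_0 &= h_{11} = 1 + \frac{(\varphi + \theta)^2}{1 - \varphi^2}, &
\alpha_{11} &= \frac{h_{21}}{\lambda_0} = \frac{\theta}{\lambda_0},\\
\lambda_1 &= h_{22} - \alpha_{11}^2\lambda_0 = 1 + \theta^2 - \frac{\theta^2}{\lambda_0}, &
\alpha_{22} &= \frac{h_{31}}{\lambda_0} = 0, \qquad
\alpha_{21} = \frac{h_{32} - \alpha_{11}\alpha_{22}\lambda_0}{\lambda_1} = \frac{\theta}{\lambda_1},\\
\lambda_2 &= 1 + \theta^2 - \frac{\theta^2}{\lambda_1}, &
\alpha_{33} &= \frac{h_{41}}{\lambda_0} = 0, \qquad
\alpha_{32} = \frac{h_{42} - \alpha_{11}\alpha_{33}\lambda_0}{\lambda_1} = 0, \qquad
\alpha_{31} = \frac{h_{43}}{\lambda_2} = \frac{\theta}{\lambda_2},\\
&\ \ \vdots & \alpha_{n1} &= \frac{\theta}{\lambda_{n-1}}.
\end{aligned}
\]
So the best linear predictor of $X_{n+1}$ satisfies
\[
X^o_{n+1} = \varphi X_n + \frac{\theta}{\lambda_{n-1}}(X_n - X^o_n) \quad (n \ge 1),
\]
where
\[
\lambda_0 = 1 + \frac{(\varphi + \theta)^2}{1 - \varphi^2}, \qquad \lambda_n = 1 + \theta^2 - \frac{\theta^2}{\lambda_{n-1}} \quad (n \ge 1),
\]
and the mean squared error satisfies
\[
E[(X_{n+1} - X^o_{n+1})^2] = \sigma^2\lambda_n.
\]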
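The ARMA (1, 1) recursion above translates almost line by line into code. The sketch below computes λ_n and the one-step predictors; it assumes X_1^o = 0 (the best linear predictor of a zero-mean variable when nothing has been observed), and the function names and data are illustrative.

```python
def arma11_lambdas(phi, theta, n):
    """lambda_0..lambda_n with lambda_k = 1 + theta^2 - theta^2 / lambda_{k-1}."""
    lam = [1 + (phi + theta) ** 2 / (1 - phi**2)]
    for _ in range(n):
        lam.append(1 + theta**2 - theta**2 / lam[-1])
    return lam

def arma11_one_step_predictors(x, phi, theta):
    """Return [X_1^o, ..., X_{n+1}^o] for observations x = [X_1, ..., X_n]."""
    lam = arma11_lambdas(phi, theta, len(x))
    pred = [0.0]                                  # assumption: X_1^o = 0 (zero-mean series)
    for n in range(1, len(x) + 1):
        innovation = x[n - 1] - pred[n - 1]       # X_n - X_n^o
        pred.append(phi * x[n - 1] + theta / lam[n - 1] * innovation)
    return pred

x = [0.5, -0.2, 1.1, 0.3]                         # illustrative data only
phi, theta = 0.5, 0.4
lam = arma11_lambdas(phi, theta, 30)
pred = arma11_one_step_predictors(x, phi, theta)
```

For |θ| < 1 the sequence λ_n decreases to 1, so the mean squared error σ²λ_n approaches the white noise variance σ² as more observations become available.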

Now we consider the $h$-step prediction of an ARMA $(p, q)$ process $\{X_t\}$. Denote by $P_n X_{n+h}$ the best linear predictor of $X_{n+h}$ in terms of $1, X_1, \dots, X_n$. For $n > m = \max\{p, q\}$, we find the predictors $X^o_1, \dots, X^o_n$ and then recursively find the predictors $P_n X_{n+1}, P_n X_{n+2}, \dots$ using the formula
\[
P_n X_{n+h} = \sum_{l=1}^{p} \varphi_l P_n X_{n+h-l} + \sum_{l=h}^{q} \alpha_{n+h-1,l}(X_{n+h-l} - X^o_{n+h-l}) \quad (h \ge 1)
\]
and the mean squared error is
\[
E[(X_{n+h} - P_n X_{n+h})^2] = \sigma^2 \sum_{l=0}^{h-1}\Big(\sum_{k=0}^{l} \chi_k \alpha_{n+h-k-1,\,l-k}\Big)^2 \lambda_{n+h-l-1},
\]
where
\[
\chi_k = \sum_{\nu=1}^{\min\{p, k\}} \varphi_\nu \chi_{k-\nu} \qquad (\chi_0 = 1).
\]
If $\{X_t\}$ is a causal ARMA $(p, q)$ process with Gaussian white noise
\[
\{Z_t\} \sim \mathrm{WN}(0, \sigma^2), \qquad Z_t \sim N(0, \sigma^2),
\]
then the prediction error is
\[
E[(X_{n+h} - P_n X_{n+h})^2] = \sigma^2 \sum_{l=0}^{h-1} \psi_l^2,
\]
where $\psi_l$ $(l = 0, 1, \dots)$ satisfies $\theta(z)/\varphi(z) = \sum_{l=0}^{\infty} \psi_l z^l$.

1.5.2 Modeling

We will estimate the coefficients $\varphi_1, \dots, \varphi_p$, $\theta_1, \dots, \theta_q$ and the white noise variance $\sigma^2$ such that the autocovariance $\gamma_X$ of the corresponding ARMA process approximates the autocovariance function $\gamma$ of the observed time series.

(a) The fitted AR model

For a causal AR $(p)$ process
\[
X_t - \sum_{k=1}^{p} \varphi_k X_{t-k} = Z_t,
\]
the Yule–Walker equation given in Section 1.4.2 is as follows:
\[
\gamma(0) - \Phi^T\gamma_p = \sigma^2, \qquad \Gamma_p\Phi = \gamma_p,
\]
where
\[
\Gamma_p = (\gamma(i - j))_{i,j=1,\dots,p}, \qquad \gamma_p = (\gamma(1), \dots, \gamma(p))^T, \qquad \Phi = (\varphi_1, \dots, \varphi_p)^T.
\]
In applications, we only know finitely many observations $x_0, \dots, x_{N-1}$. Since $\bar{x} = \frac{1}{N}\sum_{k=0}^{N-1} x_k \approx 0$, the autocovariance $\gamma(h)$ is estimated by
\[
\hat{\gamma}(h) = \frac{1}{N}\sum_{n=0}^{N-1} x_n x_{n+h}.
\]
In the Yule–Walker equation, replacing $\gamma(\nu)$ by the sample covariance $\hat{\gamma}(\nu)$, $\Phi = (\varphi_1, \dots, \varphi_p)^T$ by its estimator $\hat{\Phi} = (\hat{\varphi}_1, \dots, \hat{\varphi}_p)^T$, and $\sigma$ by its estimator $\hat{\sigma}$, we get
\[
\hat{\sigma}^2 = \hat{\gamma}(0) - \hat{\Phi}^T\hat{\gamma}_p, \qquad \hat{\Gamma}_p\hat{\Phi} = \hat{\gamma}_p. \tag{1.5.6}
\]
This is the Yule–Walker equation of estimators, where
\[
\hat{\Gamma}_p = (\hat{\gamma}(i - j))_{i,j=1,\dots,p}, \qquad \hat{\gamma}_p = (\hat{\gamma}(1), \dots, \hat{\gamma}(p))^T. \tag{1.5.7}
\]
If $\hat{\gamma}(0) > 0$, then $\hat{\Gamma}_p$ is nonsingular. From this and (1.5.6),
\[
\hat{\Phi} = (\hat{\varphi}_1, \dots, \hat{\varphi}_p)^T = \hat{\Gamma}_p^{-1}\hat{\gamma}_p.
\]
Moreover, $1 - \hat{\varphi}_1 z - \dots - \hat{\varphi}_p z^p \ne 0$ $(|z| \le 1)$. So the fitted model
\[
X_t - \hat{\varphi}_1 X_{t-1} - \dots - \hat{\varphi}_p X_{t-p} = Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, \hat{\sigma}^2) \tag{1.5.8}
\]
is a causal AR $(p)$ process. Therefore, its autocovariance function $\gamma_F(h)$ satisfies the Yule–Walker equation
\[
\gamma_F(h) - \hat{\varphi}_1\gamma_F(h - 1) - \dots - \hat{\varphi}_p\gamma_F(h - p) = \begin{cases} 0 & (h = 1, \dots, p),\\ \hat{\sigma}^2 & (h = 0). \end{cases} \tag{1.5.9}
\]
Comparing this and (1.5.6),
\[
\gamma_F(h) = \hat{\gamma}(h) \quad (h = 0, \dots, p).
\]
From this, it is seen that the autocovariances of the fitted model (1.5.8) at lags $0, 1, \dots, p$ coincide with the sample autocovariances.

For a causal AR $(p)$ process $\{X_t\}$,
\[
X_t - \sum_{k=1}^{p} \varphi_k X_{t-k} = Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2),
\]
the Yule–Walker estimator is $\hat{\Phi} = (\hat{\varphi}_1, \dots, \hat{\varphi}_p)^T = \hat{\Gamma}_p^{-1}\hat{\gamma}_p$ and its large-sample distribution is
\[
\hat{\Phi} \approx N(\Phi, n^{-1}\sigma^2\Gamma_p^{-1}),
\]
where $\Phi = (\varphi_1, \dots, \varphi_p)^T$ and $n$ is the number of observations.

How do we select the order $p$? The partial autocorrelation function (PACF) is given in Section 1.4.3. The sample PACF is defined as
\[
\hat{\beta}(0) = 1, \qquad \hat{\beta}(m) = \hat{\Phi}_{mm} \quad (m \in \mathbb{Z}_+),
\]
where $\hat{\Phi}_{mm}$ is the last component of $\hat{\Phi}_m = \hat{\Gamma}_m^{-1}\hat{\gamma}_m$. A $p$-order AR model can fit the observed data if the sample PACF $\hat{\beta}(m)$ is significantly different from zero for $0 \le m \le p$ and $\hat{\beta}(m) \approx 0$ for $m > p$. Precisely speaking, if
\[
|\hat{\beta}(m)| > 1.96\,n^{-\frac12} \quad (0 \le m \le p), \qquad |\hat{\beta}(m)| < 1.96\,n^{-\frac12} \quad (m > p),
\]
then a $p$-order AR model can fit the observed data.
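The Yule–Walker estimator and the sample PACF criterion can be sketched as follows. This is a minimal illustration assuming NumPy; the simulated AR (2) series, the seed, the sample size and the function names are all choices made for the example and do not come from the text.

```python
import numpy as np

def sample_autocov(x, max_lag):
    """gamma_hat(h) = (1/N) sum_n x_n x_{n+h} for h = 0..max_lag (mean assumed ~ 0)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    return np.array([np.dot(x[: N - h], x[h:]) / N for h in range(max_lag + 1)])

def yule_walker(x, p):
    """Solve Gamma_hat_p Phi_hat = gamma_hat_p; return (Phi_hat, sigma2_hat)."""
    g = sample_autocov(x, p)
    Gamma = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    phi_hat = np.linalg.solve(Gamma, g[1:])
    return phi_hat, g[0] - phi_hat @ g[1:]

def sample_pacf(x, max_lag):
    """beta_hat(m) = last component of the order-m Yule-Walker solution."""
    return np.array([yule_walker(x, m)[0][-1] for m in range(1, max_lag + 1)])

# Simulate X_t = 0.75 X_{t-1} - 0.125 X_{t-2} + Z_t and discard a burn-in period.
rng = np.random.default_rng(0)
N, burn = 20000, 500
z = rng.standard_normal(N + burn)
x = np.zeros(N + burn)
for t in range(2, N + burn):
    x[t] = 0.75 * x[t - 1] - 0.125 * x[t - 2] + z[t]
x = x[burn:]

phi_hat, sigma2_hat = yule_walker(x, 2)
pacf = sample_pacf(x, 6)
bound = 1.96 / np.sqrt(N)                     # the 1.96 n^{-1/2} threshold
significant_lags = [m for m in range(1, 7) if abs(pacf[m - 1]) > bound]
```

With a long series the estimates land close to (0.75, −0.125) and σ̂² close to 1. Lags 1 and 2 exceed the threshold; lags above 2 may occasionally do so by chance, since the bound corresponds to a 5% significance level.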

(b) The fitted MA model

Given the observed data $\{x_1, \dots, x_n\}$, denote by $\hat{\gamma}(h)$ its sample autocovariance function. The sample covariance matrix is $(\hat{\gamma}(i - j))_{i,j=1,\dots,n}$. We can fit moving average models
\[
X_t = Z_t + \hat{\alpha}_{m1} Z_{t-1} + \dots + \hat{\alpha}_{mm} Z_{t-m}, \qquad \{Z_t\} \sim \mathrm{WN}(0, \hat{V}_m) \quad (m \in \mathbb{Z}_+),
\]
where $\hat{\alpha}_{m1}, \dots, \hat{\alpha}_{mm}, \hat{V}_m$ are obtained by using the innovation algorithm with the autocovariance matrix $(\gamma(i - j))_{i,j=1,\dots,n}$ replaced by the sample autocovariance matrix $(\hat{\gamma}(i - j))_{i,j=1,\dots,n}$. For a zero-mean stationary process, if its autocovariance function (ACVF) satisfies $\gamma(h) = 0$ for $h > q$, then the process can be represented as a moving average process of order $q$ or less. Therefore, if the sample ACVF $\hat{\gamma}(h)$ is significantly different from zero for $0 \le h \le q$ and $\hat{\gamma}(h) \approx 0$ for $h > q$, precisely speaking, if
\[
\hat{\gamma}(h) > 1.96\,n^{-\frac12} \quad (0 \le h \le q), \qquad \hat{\gamma}(h) < 1.96\,n^{-\frac12} \quad (h > q),
\]
then a $q$-order MA model can fit the observed data.

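(c) The fitted ARMA model

We use the $m$-order Yule–Walker estimates
\[
X_t - \hat{\varphi}_{m1} X_{t-1} - \dots - \hat{\varphi}_{mm} X_{t-m} = Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, \hat{\sigma}_m)
\]
to fit the observed data $\{x_1, \dots, x_n\}$, where $m > \max\{p, q\}$. Then the Hannan–Rissanen algorithm is divided into three steps.

Step 1. Let $\hat{Z}_t = X_t - \hat{\varphi}_{m1} X_{t-1} - \dots - \hat{\varphi}_{mm} X_{t-m}$ $(t = m + 1, \dots, n)$.

Step 2. Choose $\beta = (\varphi_1, \dots, \varphi_p, \theta_1, \dots, \theta_q)$ such that the sum of squares
\[
S(\beta) = \sum_{t=m+1+q}^{n} (X_t - \varphi_1 X_{t-1} - \dots - \varphi_p X_{t-p} - \theta_1\hat{Z}_{t-1} - \dots - \theta_q\hat{Z}_{t-q})^2
\]
attains the minimal value. The minimizer is the least squares estimator $\hat{\beta} = (\hat{\varphi}_1, \dots, \hat{\varphi}_p, \hat{\theta}_1, \dots, \hat{\theta}_q)^T = (H^T H)^{-1} H^T X_n$, where $X_n = (X_{m+1+q}, \dots, X_n)^T$, $H = (H_1 | H_2)$, and
\[
H_1 = \begin{pmatrix}
X_{m+q} & X_{m+q-1} & \cdots & X_{m+q+1-p}\\
X_{m+q+1} & X_{m+q} & \cdots & X_{m+q+2-p}\\
\vdots & \vdots & & \vdots\\
X_{n-1} & X_{n-2} & \cdots & X_{n-p}
\end{pmatrix},
\qquad
H_2 = \begin{pmatrix}
\hat{Z}_{m+q} & \hat{Z}_{m+q-1} & \cdots & \hat{Z}_{m+1}\\
\hat{Z}_{m+q+1} & \hat{Z}_{m+q} & \cdots & \hat{Z}_{m+2}\\
\vdots & \vdots & & \vdots\\
\hat{Z}_{n-1} & \hat{Z}_{n-2} & \cdots & \hat{Z}_{n-q}
\end{pmatrix}.
\]
Step 3. For $t \le \max\{p, q\}$, let $\tilde{Z}_t = 0$, $V_t = 0$, and $W_t = 0$. For $\max\{p, q\} < t \le n$, let
\[
\tilde{Z}_t = X_t - \sum_{j=1}^{p} \hat{\varphi}_j X_{t-j} - \sum_{j=1}^{q} \hat{\theta}_j\tilde{Z}_{t-j}, \qquad
V_t = \sum_{j=1}^{p} \hat{\varphi}_j V_{t-j} + \tilde{Z}_t, \qquad
W_t = -\sum_{j=1}^{q} \hat{\theta}_j W_{t-j} + \tilde{Z}_t.
\]
Take $\beta = \hat{\beta}^+$ such that
\[
S^+(\beta) = \sum_{t=\max\{p,q\}+1}^{n} \Big(\tilde{Z}_t - \sum_{j=1}^{p} \beta_j V_{t-j} - \sum_{k=1}^{q} \beta_{k+p} W_{t-k}\Big)^2
\]
attains the minimal value. Then $\tilde{\beta} = \hat{\beta}^+ + \hat{\beta}$ is an improved estimator. Let $\tilde{\beta} = (\tilde{\varphi}_1, \dots, \tilde{\varphi}_p, \tilde{\theta}_1, \dots, \tilde{\theta}_q)$. Then the ARMA $(p, q)$ process
\[
X_t - \tilde{\varphi}_1 X_{t-1} - \dots - \tilde{\varphi}_p X_{t-p} = Z_t + \tilde{\theta}_1 Z_{t-1} + \dots + \tilde{\theta}_q Z_{t-q}
\]
can fit the observed data.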
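Steps 1 and 2 of the algorithm amount to two linear regressions. The sketch below is a simplified illustration, not the exact procedure of the text: the long AR (m) model is fitted by least squares rather than by the Yule–Walker equations, Step 3 is omitted, and p, q ≥ 1 is assumed. NumPy is assumed, and the function name, seed and simulated ARMA (1, 1) data are illustrative.

```python
import numpy as np

def hannan_rissanen_step12(x, p, q, m):
    """Preliminary ARMA(p, q) estimates from a long AR(m) fit (Steps 1 and 2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # High-order AR(m) fit; least squares is used here instead of Yule-Walker.
    A = np.column_stack([x[m - k: n - k] for k in range(1, m + 1)])
    ar_m = np.linalg.lstsq(A, x[m:], rcond=None)[0]
    # Step 1: residuals Z_hat_t for t = m+1..n (0-based indices m..n-1).
    z_hat = np.zeros(n)
    z_hat[m:] = x[m:] - A @ ar_m
    # Step 2: regress X_t on X_{t-1..t-p} and Z_hat_{t-1..t-q}.
    start = m + q                                  # first usable 0-based index
    H1 = np.column_stack([x[start - k: n - k] for k in range(1, p + 1)])
    H2 = np.column_stack([z_hat[start - k: n - k] for k in range(1, q + 1)])
    H = np.hstack([H1, H2])
    beta = np.linalg.lstsq(H, x[start:], rcond=None)[0]
    return beta[:p], beta[p:]

# Simulated ARMA(1, 1) data: X_t = 0.5 X_{t-1} + Z_t + 0.4 Z_{t-1}.
rng = np.random.default_rng(1)
n, burn, phi, theta = 20000, 500, 0.5, 0.4
z = rng.standard_normal(n + burn)
x = np.zeros(n + burn)
for t in range(1, n + burn):
    x[t] = phi * x[t - 1] + z[t] + theta * z[t - 1]
phi_hat, theta_hat = hannan_rissanen_step12(x[burn:], p=1, q=1, m=20)
```

`numpy.linalg.lstsq` is used instead of forming (HᵀH)⁻¹Hᵀ explicitly, which gives the same minimizer with better numerical behavior.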

(d) Maximum likelihood estimation

Suppose that $\{X_t\}$ is a zero-mean Gaussian time series and the covariance matrix $\Gamma_n$ of $X_n = (X_1, \dots, X_n)^T$ is nonsingular. The likelihood of $X_n$ is
\[
L(\Gamma_n) = (2\pi)^{-\frac{n}{2}}(\det\Gamma_n)^{-\frac12}\exp\Big(-\frac12 X_n^T\Gamma_n^{-1}X_n\Big).
\]
It can be rewritten in the form
\[
L(\Gamma_n) = \big((2\pi)^n V_0 \cdots V_{n-1}\big)^{-\frac12}\exp\Big(-\frac12\sum_{j=1}^{n}\frac{(X_j - X^o_j)^2}{V_{j-1}}\Big), \tag{1.5.10}
\]
where $X^o_j$ is the best linear predictor of $X_j$ in terms of $X_{j-1}, \dots, X_1$ and $V_{j-1}$ is the corresponding mean squared error. If $\{X_t\}$ is not Gaussian, we still call (1.5.10) the Gaussian likelihood of $X_1, \dots, X_n$.

Suppose that the data $\{X_t\}$ are from an ARMA $(p, q)$ process. The one-step predictors $X^o_{n+1}$ and the mean squared errors $V_{n+1}$ are stated in (1.5.4) and (1.5.5). Therefore, the $X^o_j$ are functions of $\varphi_1, \dots, \varphi_p$, $\theta_1, \dots, \theta_q$, and the Gaussian likelihood for an ARMA process is
\[
L(\varphi, \theta, \sigma^2) = \big((2\pi\sigma^2)^n\lambda_0 \cdots \lambda_{n-1}\big)^{-\frac12}\exp\Big(-\frac{1}{2\sigma^2}\sum_{j=1}^{n}\frac{(X_j - X^o_j)^2}{\lambda_{j-1}}\Big),
\]
where $\varphi = (\varphi_1, \dots, \varphi_p)$ and $\theta = (\theta_1, \dots, \theta_q)$. Note that
\[
\frac{\partial}{\partial\sigma^2}\log L(\varphi, \theta, \sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{j=1}^{n}\frac{(X_j - X^o_j)^2}{\lambda_{j-1}}.
\]
Then, for fixed $\varphi$ and $\theta$, $L(\varphi, \theta, \sigma^2)$ attains its maximal value when $\sigma^2 = \hat{\sigma}^2 = \frac{1}{n}\sum_{j=1}^{n}\frac{(X_j - X^o_j)^2}{\lambda_{j-1}}$. Denote
\[
H(\varphi, \theta) = \sum_{j=1}^{n}\frac{(X_j - X^o_j)^2}{\lambda_{j-1}}.
\]
Take $\varphi = \hat{\varphi}$ and $\theta = \hat{\theta}$ such that
\[
l(\varphi, \theta) = \log\frac{H(\varphi, \theta)}{n} + \frac{1}{n}\sum_{j=1}^{n}\log\lambda_{j-1}
\]
attains the minimal value, and the estimate of $\sigma^2$ is
\[
\hat{\sigma}^2 = \frac{1}{n - p - q}H(\hat{\varphi}, \hat{\theta}).
\]
In this way, for an ARMA $(p, q)$ process, we obtain the estimators $\hat{\varphi}_1, \dots, \hat{\varphi}_p$, $\hat{\theta}_1, \dots, \hat{\theta}_q$ and $\hat{\sigma}^2$. The fitted ARMA $(p, q)$ model is
\[
X_t - \hat{\varphi}_1 X_{t-1} - \dots - \hat{\varphi}_p X_{t-p} = Z_t + \hat{\theta}_1 Z_{t-1} + \dots + \hat{\theta}_q Z_{t-q}, \qquad \{Z_t\} \sim \mathrm{WN}(0, \hat{\sigma}^2).
\]
The selection of the orders $p$ and $q$ for general ARMA models is done by using the AICC statistic
\[
\mathrm{AICC} = -2\log L\Big(\varphi_p, \theta_q, \frac{H(\varphi_p, \theta_q)}{n}\Big) + \frac{2(p + q + 1)n}{n - p - q - 2}.
\]
Choose $p$, $q$, $\varphi_p$, $\theta_q$ such that AICC attains the minimal value.

1.6 Multivariate ARMA processes

Consider the $k$-dimensional time series $Y_t = (Y_{t1}, \dots, Y_{tk})^T$. The purpose of multivariate time series analysis is to understand the relationship among the component time series. If $Y_t$ has a constant mean vector $\mu = (\mu_1, \dots, \mu_k)^T$ for all $t$ and its cross-covariance matrix $R(Y_t, Y_{t+l}^T) = \Gamma(l)$ only depends on $l$, then the $k$-dimensional time series $Y_t$ is called a stationary vector process, where $R(Y_t, Y_{t+l}^T) = (E[Y_{ti}Y_{t+l,j}])_{i,j=1,\dots,k}$ and each $E[Y_{ti}Y_{t+l,j}]$ is independent of $t$. Denote
\[
\gamma_{ij}(l) = E[Y_{ti}Y_{t+l,j}], \qquad \Gamma(l) = (\gamma_{ij}(l))_{i,j=1,\dots,k}.
\]
If $\{Y_t\}$ is a $k$-dimensional zero-mean stationary vector process and
\[
Y_t - \sum_{j=1}^{p}\Phi_j Y_{t-j} = Z_t + \sum_{j=1}^{q}\Theta_j Z_{t-j}, \tag{1.6.1}
\]
then $Y_t$ is called a vector ARMA process, where $\Phi_j$, $\Theta_j$ are $k \times k$ matrices independent of $t$ and $Z_t$ is a $k$-dimensional white noise satisfying
\[
E[Z_t] = 0, \qquad E[Z_t Z_t^T] = \Sigma, \qquad E[Z_t Z_{t+l}^T] = 0 \quad (l \ne 0). \tag{1.6.2}
\]
Here $\Sigma$ is a $k \times k$ constant matrix. Denote $\{Z_t\} \sim \mathrm{WN}(0, \Sigma)$. In particular, if $\Phi_j = 0$ $(j = 1, \dots, p)$, then $Y_t$ is called a vector MA process. If $\Theta_j = 0$ $(j = 1, \dots, q)$, then $Y_t$ is called a vector AR process.

Introduce two matrix-valued polynomials
\[
\Phi(z) = I - \Phi_1 z - \dots - \Phi_p z^p, \qquad \Theta(z) = I + \Theta_1 z + \dots + \Theta_q z^q,
\]
where $I$ is the $k \times k$ identity matrix and $z$ is a complex variable. Each entry of the matrices $\Phi(z)$, $\Theta(z)$ is a scalar polynomial of degree at most $p$ and $q$, respectively. Denote
\[
\Phi_j = (\varphi^{(j)}_{mn})_{k\times k} \quad (j = 1, \dots, p), \qquad \Theta_j = (\theta^{(j)}_{mn})_{k\times k} \quad (j = 0, 1, \dots, q), \qquad \Theta_0 = I.
\]
Then
\[
\Phi(z) = \Big(\delta_{mn} - \sum_{j=1}^{p}\varphi^{(j)}_{mn}z^j\Big)_{k\times k}, \qquad
\Theta(z) = \Big(\sum_{j=0}^{q}\theta^{(j)}_{mn}z^j\Big)_{k\times k}.
\]
Let $B$ be the backward shift operator. Then $BY_t = Y_{t-1}$, $B^j Y_t = Y_{t-j}$, and so
\[
\Phi(B)Y_t = (I - \Phi_1 B - \dots - \Phi_p B^p)Y_t = Y_t - \Phi_1 Y_{t-1} - \dots - \Phi_p Y_{t-p},
\]
\[
\Theta(B)Z_t = (I + \Theta_1 B + \dots + \Theta_q B^q)Z_t = Z_t + \Theta_1 Z_{t-1} + \dots + \Theta_q Z_{t-q}.
\]
Equation (1.6.1) can be rewritten in the form $\Phi(B)Y_t = \Theta(B)Z_t$, where $\{Z_t\} \sim \mathrm{WN}(0, \Sigma)$.


1.6.1 Vector MA processes

Let $Y_t$ be an MA $(q)$ process
\[
Y_t = Z_t + \sum_{j=1}^{q}\Theta_j Z_{t-j} = \Theta(B)Z_t.
\]
If the determinant of the matrix $\Theta(z)$ satisfies $\det\Theta(z) \ne 0$ $(|z| \le 1)$, then the inverse matrix $\Theta^{-1}(z)$ exists on $|z| \le 1$ and
\[
\Theta^{-1}(z) = (\det\Theta(z))^{-1}\Theta^*(z) \quad (|z| \le 1),
\]
where $\Theta^*(z)$ is the adjugate matrix of $\Theta(z)$. Each component $h_{ij}(z)$ of $\Theta^*(z)$ and $\det\Theta(z)$ are polynomials of $z$, and $\det\Theta(z) \ne 0$ $(|z| \le 1)$. Therefore, $h_{ij}(z)/\det\Theta(z)$ can be expanded into a power series and $\Theta^{-1}(z)$ can be expanded into a power series with matrix coefficients, $\Theta^{-1}(z) = \sum_{j=0}^{\infty}\pi_j z^j$ $(|z| \le 1)$. It is easy to show that $\pi_0 = I$. Replacing $z$ by $B$,
\[
\Theta^{-1}(B) = I + \sum_{j=1}^{\infty}\pi_j B^j
\]
since $\|B\| = 1$. From this and $Y_t = \Theta(B)Z_t$, it follows that
\[
Z_t = \Theta^{-1}(B)Y_t = Y_t + \sum_{j=1}^{\infty}\pi_j Y_{t-j}.
\]
For a vector MA process, the cross-covariance matrix $\Gamma(l)$ is equal to
\[
\Gamma(l) = \mathrm{Cov}(Y_t, Y_{t+l}) = E[Y_t Y_{t+l}^T]
= E\Big[\Big(Z_t + \sum_{j=1}^{q}\Theta_j Z_{t-j}\Big)\Big(Z_{t+l} + \sum_{j=1}^{q}\Theta_j Z_{t+l-j}\Big)^T\Big]
= E\Big[\Big(\sum_{j=0}^{q}\Theta_j Z_{t-j}\Big)\Big(\sum_{k=0}^{q}\Theta_k Z_{t+l-k}\Big)^T\Big]
= \sum_{k=0}^{q}\sum_{j=0}^{q}\Theta_j E[Z_{t-j}Z_{t+l-k}^T]\Theta_k^T,
\]
where $\Theta_0 = I$. Again, by (1.6.2) and $\Theta_k = 0$ $(k > q)$, $E[Z_{t-j}Z_{t+l-k}^T] = \delta_{j,k-l}\Sigma$, and so
\[
\Gamma(l) = \sum_{k=0}^{q-l}\Theta_k\Sigma\Theta_{k+l}^T \quad (l = 0, 1, \dots, q).
\]
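The cross-covariance formula Γ(l) = Σ_{k=0}^{q−l} Θ_k Σ Θ_{k+l}^T is a short matrix sum. The sketch below, assuming NumPy and illustrative coefficient matrices, evaluates it for a bivariate MA (1) model.

```python
import numpy as np

def vma_cross_cov(thetas, Sigma, l):
    """Gamma(l) = sum_{k=0}^{q-l} Theta_k Sigma Theta_{k+l}^T for 0 <= l <= q."""
    k_dim = Sigma.shape[0]
    Theta = [np.eye(k_dim)] + [np.asarray(T, dtype=float) for T in thetas]  # Theta_0 = I
    q = len(thetas)
    return sum((Theta[k] @ Sigma @ Theta[k + l].T for k in range(q - l + 1)),
               np.zeros((k_dim, k_dim)))

Theta1 = np.array([[0.5, 0.2], [0.1, -0.3]])   # illustrative coefficients
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])     # illustrative noise covariance
Gamma0 = vma_cross_cov([Theta1], Sigma, 0)
Gamma1 = vma_cross_cov([Theta1], Sigma, 1)
```

For q = 1 this reduces to Γ(0) = Σ + Θ₁ΣΘ₁ᵀ and Γ(1) = ΣΘ₁ᵀ; note that Γ(1) is in general not symmetric.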

Conversely, if the cross-covariance matrix is known, the MA coefficients $\Theta_k$ and $\Sigma$ can be found by this formula.

For example, consider a bivariate MA $(1)$ model $Y_t = Z_t + \Theta_1 Z_{t-1}$, where $Y_t = (Y_{1t}, Y_{2t})^T$ and
\[
\Theta_1 = \begin{pmatrix} \alpha_{11} & \alpha_{12}\\ \alpha_{21} & \alpha_{22} \end{pmatrix}.
\]
Let $Z_t = (Z_{1t}, Z_{2t})^T$ and $Y_t = \Theta(B)Z_t$, and
\[
\Theta(B) = I + \Theta_1 B = \begin{pmatrix} 1 + \alpha_{11}B & \alpha_{12}B\\ \alpha_{21}B & 1 + \alpha_{22}B \end{pmatrix}.
\]
Then
\[
\det(\Theta(B)) = (1 + \alpha_{11}B)(1 + \alpha_{22}B) - \alpha_{12}\alpha_{21}B^2. \tag{1.6.3}
\]
The inverse matrix is
\[
\Theta^{-1}(B) = \frac{1}{\det(\Theta(B))}\begin{pmatrix} 1 + \alpha_{22}B & -\alpha_{12}B\\ -\alpha_{21}B & 1 + \alpha_{11}B \end{pmatrix}.
\]
This implies $\Theta^{-1}(B)Y_t = Z_t$, and so
\[
\begin{pmatrix} 1 + \alpha_{22}B & -\alpha_{12}B\\ -\alpha_{21}B & 1 + \alpha_{11}B \end{pmatrix} Y_t = \det(\Theta(B))\,Z_t
\]
or
\[
(1 + \alpha_{22}B)Y_{1t} = \alpha_{12}BY_{2t} + \det(\Theta(B))Z_{1t}, \qquad
(1 + \alpha_{11}B)Y_{2t} = \alpha_{21}BY_{1t} + \det(\Theta(B))Z_{2t}.
\]
Its solution is
\[
Y_{1t} = \alpha_{12}B(1 + \alpha_{22}B)^{-1}Y_{2t} + \det(\Theta(B))(1 + \alpha_{22}B)^{-1}Z_{1t}, \qquad
Y_{2t} = \alpha_{21}B(1 + \alpha_{11}B)^{-1}Y_{1t} + \det(\Theta(B))(1 + \alpha_{11}B)^{-1}Z_{2t}.
\]
Such a structure is called a joint transfer function structure. For convenience, assume that $\alpha_{12} = 0$. Then by (1.6.3) these equations can be rewritten in the form
\[
Y_{1t} = \det(\Theta(B))(1 + \alpha_{22}B)^{-1}Z_{1t} = (1 + \alpha_{11}B)Z_{1t}, \qquad
Y_{2t} = \alpha_{21}B(1 + \alpha_{11}B)^{-1}Y_{1t} + (1 + \alpha_{22}B)Z_{2t}. \tag{1.6.4}
\]
Even if $\alpha_{12} \ne 0$, equations of a form similar to (1.6.4) could still be arrived at. From (1.6.4) it is seen that future values of the process $Y_{2t}$ depend on the past of both $Y_{1t}$ and $Y_{2t}$, whereas future values of $Y_{1t}$ only depend on its own past and not on the past of $Y_{2t}$. In applications, the equations (1.6.4) often represent the model structure of most interest.

1.6.2 Vector AR processes

Let $Y_t$ be an AR $(p)$ process
\[
Y_t - \sum_{j=1}^{p}\Phi_j Y_{t-j} = Z_t \quad\text{or}\quad \Phi(B)Y_t = Z_t, \tag{1.6.5}
\]
where $\Phi(z) = I - \Phi_1 z - \dots - \Phi_p z^p$. If the determinant of $\Phi(z)$ satisfies $\det(\Phi(z)) \ne 0$ $(|z| \le 1)$, then the inverse matrix $\Phi^{-1}(z)$ exists and can be expanded into a power series with matrix coefficients
\[
\Phi^{-1}(z) = \sum_{j=0}^{\infty}\Psi_j z^j \quad (|z| \le 1).
\]
Since $Y_t$ is stationary and $\|B\| = 1$,
\[
\Phi^{-1}(B) = \sum_{j=0}^{\infty}\Psi_j B^j.
\]
From this and (1.6.5),
\[
Y_t = \Phi^{-1}(B)Z_t = \sum_{j=0}^{\infty}\Psi_j B^j(Z_t) = \sum_{j=0}^{\infty}\Psi_j Z_{t-j}, \qquad \Psi_0 = I.
\]
In particular, for the AR $(1)$ model
\[
Y_t = \Phi Y_{t-1} + Z_t, \tag{1.6.6}
\]
the condition $\det(\Phi(z)) = \det(I - \Phi z) \ne 0$ $(|z| \le 1)$ is equivalent to $\det(z^{-1}I - \Phi) \ne 0$ $(0 < |z| \le 1)$, i.e., the absolute values of all eigenvalues of the matrix $\Phi$ are less than 1. Repeatedly using (1.6.6), we get
\[
Y_t = \sum_{j=0}^{n}\Phi^j Z_{t-j} + \Phi^{n+1}Y_{t-n-1}.
\]
From this, it is seen that when the initial value $Y_{t-n-1}$ is known, the value of an AR $(1)$ process $Y_t$ can be deduced from the above formula.

From $Y_{t-l} = Z_{t-l} + \sum_{j=1}^{\infty}\Psi_j Z_{t-l-j}$ and $E[Y_{t-l}Z_t^T] = 0$ $(l > 0)$, we obtain the Yule–Walker equations
\[
\Gamma(0) = \sum_{j=1}^{p}\Gamma(-j)\Phi_j^T + \Sigma, \qquad
\Gamma(l) = E[Y_{t-l}Y_t^T] = \sum_{j=1}^{p}\Gamma(l - j)\Phi_j^T \quad (l = 1, \dots, p).
\]
The AR $(p)$ process can always be expressed in the form of a $kp$-dimensional vector AR $(1)$ model. Let $W_t = (Y_t^T, \dots, Y_{t-p+1}^T)^T$. Then
\[
W_t = \Phi W_{t-1} + Z_t,
\]
where here $Z_t$ denotes the stacked noise vector $(Z_t^T, 0^T, \dots, 0^T)^T$ and
\[
\Phi = \begin{pmatrix}
\Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p\\
I & 0 & \cdots & 0 & 0\\
0 & I & \cdots & 0 & 0\\
\vdots & & \ddots & & \vdots\\
0 & 0 & \cdots & I & 0
\end{pmatrix}
\]
is a $kp \times kp$ matrix associated with $\Phi(B)$.
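The stacked AR (1) form gives a practical way to check the condition det Φ(z) ≠ 0 (|z| ≤ 1): it holds exactly when all eigenvalues of the kp × kp block matrix lie strictly inside the unit circle. The sketch below builds that matrix with NumPy; the function names are illustrative, and the test case reuses the scalar AR (2) coefficients 3/4 and −1/8 from Section 1.4.2.

```python
import numpy as np

def companion_matrix(Phis):
    """kp x kp block matrix with (Phi_1, ..., Phi_p) in the first block row."""
    p, k = len(Phis), Phis[0].shape[0]
    C = np.zeros((k * p, k * p))
    C[:k, :] = np.hstack(Phis)
    C[k:, :-k] = np.eye(k * (p - 1))       # identity blocks below the first block row
    return C

def is_causal(Phis):
    """True if all eigenvalues of the companion matrix lie inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(companion_matrix(Phis)))) < 1)

Phis = [np.array([[0.75]]), np.array([[-0.125]])]   # k = 1, p = 2
eigs = sorted(np.abs(np.linalg.eigvals(companion_matrix(Phis))))
```

Here the eigenvalues are 1/4 and 1/2, the reciprocals of the roots z = 4 and z = 2 of φ(z) = 1 − (3/4)z + (1/8)z².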


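1.6.3 Multivariate ARMA (p, q) processes

Let $\{Y_t\}_{t\in\mathbb{Z}}$ be a $k$-variate ARMA $(p, q)$ model stated in (1.6.1) and (1.6.2). If $\det\Phi(z) \ne 0$ $(|z| \le 1)$, then the matrix $\Phi(z)$ is nonsingular and its inverse matrix $\Phi^{-1}(z)$ exists. Multiplying both sides of $\Phi(B)Y_t = \Theta(B)Z_t$ by $\Phi^{-1}(B)$,
\[
Y_t = \Phi^{-1}(B)\Theta(B)Z_t = \Psi(B)Z_t, \tag{1.6.7}
\]
where $\Psi(B) = \Phi^{-1}(B)\Theta(B)$. Expand $\Psi(z) = \Phi^{-1}(z)\Theta(z)$ into a power series in $z$ with matrix coefficients
\[
\Psi(z) = \sum_{j=0}^{\infty}\Psi_j z^j \quad (|z| \le 1),
\]
where $\Psi_j$ is a $k \times k$ matrix. The corresponding operator is
\[
\Psi(B) = \sum_{j=0}^{\infty}\Psi_j B^j.
\]
By (1.6.7), $\Theta(z) = \Phi(z)\Psi(z)$, i.e.,
\[
\sum_{j=0}^{q}\Theta_j z^j = \Big(I - \sum_{j=1}^{p}\Phi_j z^j\Big)\Big(I + \sum_{j=1}^{\infty}\Psi_j z^j\Big)
= I + (\Psi_1 - \Phi_1)z + \dots + (\Psi_j - \Phi_1\Psi_{j-1} - \dots - \Phi_p\Psi_{j-p})z^j + \cdots.
\]
Equating the coefficient matrices of the powers $z^j$, we get
\[
\Psi_0 = I, \qquad \Theta_j = \Psi_j - \Phi_1\Psi_{j-1} - \dots - \Phi_p\Psi_{j-p} \quad (j = 1, 2, \dots),
\]
where $\Theta_j = 0$ for $j > q$ and $\Psi_j = 0$ for $j < 0$. From this and (1.6.7),
\[
Y_t = \Psi(B)Z_t = \sum_{j=0}^{\infty}\Psi_j B^j Z_t = \sum_{j=0}^{\infty}\Psi_j Z_{t-j}.
\]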

1.7 State-space models

1.7.1 State-space representation

A state-space model for an $m$-dimensional time series $\{Y_t\}$ consists of two equations. The first equation

$$Y_t = K_t X_t + W_t \quad (t \in \mathbb{Z}_+) \tag{1.7.1}$$

is called the observation equation, and the second equation

$$X_{t+1} = L_t X_t + V_t \quad (t \in \mathbb{Z}_+) \tag{1.7.2}$$

is called the state equation, where $K_t$ is an $m \times n$ matrix, $L_t$ is an $n \times n$ matrix, and

$$\{W_t\} \sim \mathrm{WN}(0, R_t), \qquad \{V_t\} \sim \mathrm{WN}(0, Q_t). \tag{1.7.3}$$

Here $R_t$ is an $m \times m$ matrix and $Q_t$ is an $n \times n$ matrix. Assume that

$$E[W_t V_s^T] = 0, \qquad E[X_1 W_s^T] = 0, \qquad E[X_1 V_s^T] = 0 \quad (s, t \in \mathbb{Z}_+).$$

By (1.7.1) and (1.7.2), where each parenthesized factor is a matrix product,

$$X_t = (L_{t-1} \cdots L_1)X_1 + (L_{t-1} \cdots L_2)V_1 + \cdots + L_{t-1}V_{t-2} + V_{t-1},$$

$$Y_t = K_t X_t + W_t = K_t(L_{t-1} \cdots L_1)X_1 + K_t(L_{t-1} \cdots L_2)V_1 + \cdots + K_t V_{t-1} + W_t.$$

This implies that for $1 \le s \le t$,

$$E[V_t X_s^T] = 0, \qquad E[V_t Y_s^T] = 0, \qquad E[W_t X_s^T] = 0,$$

and for $1 \le s < t$,

$$E[W_t Y_s^T] = 0.$$

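The state-space model is an extension of the ARMA model. For example, let $\{Y_t\}$ be a causal and invertible ARMA(1, 1) model

$$Y_t - \varphi Y_{t-1} = Z_t + \theta Z_{t-1}, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2).$$

The corresponding observation equation is $Y_t = (\theta, 1)\mathbf{X}_t$, where $\mathbf{X}_t = (X_{t-1}, X_t)^T$ and $X_t = \sum_{k=0}^{\infty} \varphi^k Z_{t-k}$, and the state equation is

$$\mathbf{X}_{t+1} = \begin{pmatrix} 0 & 1 \\ 0 & \varphi \end{pmatrix}\mathbf{X}_t + \mathbf{V}_t,$$

where $\mathbf{V}_t = (0, Z_{t+1})^T$ and

$$\mathbf{X}_1 = \Big(\sum_{k=0}^{\infty} \varphi^k Z_{-k},\ \sum_{k=0}^{\infty} \varphi^k Z_{1-k}\Big)^T.$$

Let $\{Y_t\}$ be a causal ARMA($p$, $q$) model

$$Y_t - \varphi_1 Y_{t-1} - \cdots - \varphi_p Y_{t-p} = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}.$$

Denote $r = \max\{p, q+1\}$, $\varphi_j = 0$ $(j > p)$, $\theta_j = 0$ $(j > q)$, and $\theta_0 = 1$. Then the observation equation is

$$Y_t = (\theta_{r-1}, \dots, \theta_0)\mathbf{X}_t, \qquad \mathbf{X}_t = (U_{t-r+1}, \dots, U_t)^T,$$

where $\{U_t\}$ satisfies the causal AR($p$) equation

$$U_t - \varphi_1 U_{t-1} - \cdots - \varphi_p U_{t-p} = Z_t,$$

and the state equation is

$$\mathbf{X}_{t+1} = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \varphi_r & \varphi_{r-1} & \varphi_{r-2} & \cdots & \varphi_1 \end{pmatrix}\mathbf{X}_t + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix} Z_{t+1} \quad (t \in \mathbb{Z}).$$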
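The ARMA($p$, $q$) state-space form above can be assembled mechanically from the coefficients. The following is a minimal sketch, assuming NumPy; the function names are hypothetical, and the simulation starts from a zero state (all pre-sample values set to zero) rather than from the stationary initial state, which is a simplifying assumption.

```python
import numpy as np

def arma_state_space(phi, theta):
    """Return (L, K, r) with Y_t = K X_t and X_{t+1} = L X_t + e_r Z_{t+1}."""
    p, q = len(phi), len(theta)
    r = max(p, q + 1)
    # Pad with zeros: phi_j = 0 for j > p; theta_0 = 1, theta_j = 0 for j > q.
    phi_pad = np.concatenate([np.asarray(phi, dtype=float), np.zeros(r - p)])
    theta_pad = np.concatenate([[1.0], np.asarray(theta, dtype=float), np.zeros(r - 1 - q)])
    L = np.zeros((r, r))
    L[:-1, 1:] = np.eye(r - 1)       # shift structure of the first r - 1 rows
    L[-1, :] = phi_pad[::-1]         # last row: (phi_r, ..., phi_1)
    K = theta_pad[::-1]              # (theta_{r-1}, ..., theta_0)
    return L, K, r

def simulate_state_space(phi, theta, z):
    """Drive the state-space model with the noise sequence z from a zero state."""
    L, K, r = arma_state_space(phi, theta)
    x = np.zeros(r)                  # assumption: zero pre-sample state
    e_r = np.zeros(r)
    e_r[-1] = 1.0
    y = np.empty(len(z))
    for t, z_t in enumerate(z):
        x = L @ x + e_r * z_t        # state equation
        y[t] = K @ x                 # observation equation
    return y

rng = np.random.default_rng(0)
noise = rng.standard_normal(50)
y_ss = simulate_state_space([0.5, -0.2], [0.4], noise)
```

With zero pre-sample values, the output coincides with the direct recursion $Y_t = \varphi_1 Y_{t-1} + \cdots + \varphi_p Y_{t-p} + Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}$ driven by the same noise.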

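ARMA models extend to ARIMA models, which describe nonstationary time series and are effective for short-term forecasting. For $d = 0, 1, \dots$, let

$$\tilde{Y}_n = (1 - B)^d Y_n = \sum_{k=0}^{d} (-1)^k \frac{d!}{k!(d-k)!} Y_{n-k}.$$

If $\{\tilde{Y}_n\}_{n\in\mathbb{Z}}$ is a causal ARMA($p$, $q$) process, we say that $\{Y_n\}_{n\in\mathbb{Z}}$ is an ARIMA($p$, $d$, $q$) process. Let $\{Y_t\}$ be the ARIMA model

$$(1 - \varphi B)(1 - B)Y_t = (1 + \theta B)Z_t, \qquad \{Z_t\} \sim \mathrm{WN}(0, \sigma^2).$$

Then the observation equation is

$$Y_t = (\theta, 1, 1)(X_{t-1}, X_t, Y_{t-1})^T,$$

where

$$\begin{pmatrix} X_0 \\ X_1 \\ Y_0 \end{pmatrix} = \begin{pmatrix} \sum_{j=0}^{\infty} \varphi^j Z_{-j} \\ \sum_{j=0}^{\infty} \varphi^j Z_{1-j} \\ Y_0 \end{pmatrix}, \qquad \begin{pmatrix} X_t \\ X_{t+1} \\ Y_t \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & \varphi & 0 \\ \theta & 1 & 1 \end{pmatrix}\begin{pmatrix} X_{t-1} \\ X_t \\ Y_{t-1} \end{pmatrix} + \begin{pmatrix} 0 \\ Z_{t+1} \\ 0 \end{pmatrix} \quad (t \in \mathbb{Z}_+).$$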
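The differencing operator $(1 - B)^d$ in the definition of an ARIMA process is a finite binomial sum, so it is easy to apply directly. The following is a small sketch, assuming NumPy and the standard library; the function name `difference` is hypothetical.

```python
import numpy as np
from math import comb

def difference(y, d):
    """Apply (1 - B)^d through the binomial expansion; the first d values are lost."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    out = np.zeros(n - d)
    for k in range(d + 1):
        # Term (-1)^k * C(d, k) * Y_{n-k}, aligned so that out[i] refers to index i + d.
        out += (-1) ** k * comb(d, k) * y[d - k : n - k]
    return out

squares = [0.0, 1.0, 4.0, 9.0, 16.0]
print(difference(squares, 2))
```

Differencing a quadratic trend twice leaves a constant, which is the usual motivation for choosing $d$ equal to the degree of the polynomial trend to be removed.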

For a general ARIMA process, a state-space representation can be found in the same way.

Consider a randomly varying trend with noise. Let $Y_1$ be a random variable and

$$\{V_t\} \sim \mathrm{WN}(0, \sigma^2), \qquad E[V_t Y_1] = 0 \quad (t \in \mathbb{Z}_+).$$

Define a process $\{Y_t\}$ as

$$Y_{t+1} = Y_t + \alpha + V_t = Y_1 + \alpha t + V_1 + \cdots + V_t,$$

where $\alpha$ is a constant. Let $\mathbf{X}_t = (Y_t, \alpha)^T$. Then $\mathbf{X}_t$ satisfies the state equation

$$\mathbf{X}_{t+1} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\mathbf{X}_t + \mathbf{V}_t \quad (t \in \mathbb{Z}),$$

where $\mathbf{V}_t = (V_t, 0)^T$. The process $\{Y_t\}$ is determined by the observation equation $Y_t = (1, 0)\mathbf{X}_t$.


1.7.2 Kalman prediction

Let $\{Y_t\}_{t\in\mathbb{Z}}$ be a state-space model whose observation equation and state equation are stated in (1.7.1) and (1.7.2), respectively. We want to find the best linear prediction of the state $X_t$ from the observations $Y_1, \dots, Y_{t-1}$ and a random vector $Y_0$ satisfying $Y_0 \perp V_t$ and $Y_0 \perp W_t$ for all $t$. In many cases, one chooses $Y_0 = (1, 1, \dots, 1)^T$.

Let $X_t = (X_{t1}, \dots, X_{tn})^T$ be the $n$-dimensional state vector. The best one-step linear predictor $X_t^o = (X_{t1}^o, \dots, X_{tn}^o)^T$ of $X_t$ in terms of the $t$ vectors

$$Y_0 = (Y_{01}, \dots, Y_{0m}), \quad \dots, \quad Y_{t-1} = (Y_{t-1,1}, \dots, Y_{t-1,m})$$

is defined componentwise: each $X_{tk}^o$ is the best linear predictor of $X_{tk}$ in terms of all the components

$$Y_{01}, \dots, Y_{0m},\ Y_{11}, \dots, Y_{1m},\ \dots,\ Y_{t-1,1}, \dots, Y_{t-1,m}$$

of the $t$ vectors $Y_0, \dots, Y_{t-1}$. By the orthogonality principle, we get

$$X_{tk}^o = \sum_{l=1}^{m} \alpha_{kl}^{(0)} Y_{0l} + \sum_{l=1}^{m} \alpha_{kl}^{(1)} Y_{1l} + \cdots + \sum_{l=1}^{m} \alpha_{kl}^{(t-1)} Y_{t-1,l} \quad (k = 1, \dots, n)$$

and

$$E[(X_{tk} - X_{tk}^o)Y_{\mu\nu}] = 0 \quad (\mu = 0, \dots, t-1;\ \nu = 1, \dots, m).$$

Kalman prediction. For the state-space model (1.7.1) and (1.7.2), denote by $X_t^o$ the best one-step linear predictor of $X_t$ in terms of the $t$ observation vectors $Y_0, \dots, Y_{t-1}$, and denote by $\Omega_t$ the covariance matrix $E[(X_t - X_t^o)(X_t - X_t^o)^T]$ of the error $X_t - X_t^o$. Then the linear predictor $X_t^o$ is determined uniquely by the initial conditions $X_1^o$ and $\Omega_1 = E[(X_1 - X_1^o)(X_1 - X_1^o)^T]$ and the recursions

$$X_{t+1}^o = L_t X_t^o + \Theta_t \Delta_t^{-1}(Y_t - K_t X_t^o) \quad (t \in \mathbb{Z}_+),$$

$$\Omega_{t+1} = L_t \Omega_t L_t^T + Q_t - \Theta_t \Delta_t^{-1} \Theta_t^T \quad (t \in \mathbb{Z}_+),$$

where $\Theta_t = L_t \Omega_t K_t^T$, $\Delta_t = K_t \Omega_t K_t^T + R_t$, and $\Delta_t^{-1}$ is the inverse of $\Delta_t$. Here the matrices $L_t$, $K_t$, $R_t$, and $Q_t$ are those in (1.7.1), (1.7.2), and (1.7.3).
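One step of the Kalman prediction recursion translates almost line by line into code. The following is a minimal sketch, assuming NumPy and that $\Delta_t$ is invertible; the function name and the scalar example values are hypothetical.

```python
import numpy as np

def kalman_predict_step(x_pred, omega, y, L, K, Q, R):
    """Map (X_t^o, Omega_t, Y_t) to (X_{t+1}^o, Omega_{t+1})."""
    delta = K @ omega @ K.T + R                  # Delta_t = K Omega K^T + R
    theta = L @ omega @ K.T                      # Theta_t = L Omega K^T
    gain = theta @ np.linalg.inv(delta)          # Theta_t Delta_t^{-1}
    x_next = L @ x_pred + gain @ (y - K @ x_pred)
    omega_next = L @ omega @ L.T + Q - gain @ theta.T
    return x_next, omega_next

# Time-invariant scalar example with L = K = Q = R = 1 (illustrative values).
one = np.array([[1.0]])
x_o, omega = np.zeros(1), one.copy()
for _ in range(50):
    x_o, omega = kalman_predict_step(x_o, omega, np.zeros(1), one, one, one, one)
print(omega)
```

In this scalar example the covariance recursion reads $\Omega_{t+1} = (2\Omega_t + 1)/(\Omega_t + 1)$, whose fixed point is $(1 + \sqrt{5})/2$. The covariance recursion does not depend on the observed values, only on the model matrices.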

1.7.3 Kalman filtering and Kalman fixed point smoothing

For a state-space model, we find the estimator of the state vector $X_t$ in terms of $Y_0, \dots, Y_t$. This estimator is called the Kalman filter, denoted by $\tilde{X}_t$.

Kalman filtering. Let $X_t^o$, $\Omega_t$, and $\Delta_t^{-1}$ be as in the Kalman prediction. Then
(a) the filtered estimate $\tilde{X}_t$ is

$$\tilde{X}_t = X_t^o + \Omega_t K_t^T \Delta_t^{-1}(Y_t - K_t X_t^o);$$

(b) the error covariance matrix is

$$E[(X_t - \tilde{X}_t)(X_t - \tilde{X}_t)^T] = \Omega_t - \Omega_t K_t^T \Delta_t^{-1} K_t \Omega_t^T.$$

For a state-space model, we find the estimator of the state vector $X_t$ in terms of $Y_0, \dots, Y_t, \dots, Y_s$ $(s \ge t)$. This estimator is called the Kalman smoother, denoted by $X_t^s$.

Kalman smoothing. Let $X_t^o$ and $\Delta_t^{-1}$ be as in the Kalman prediction. Then
(a) the smoothed estimate $X_t^s$ is determined by the initial conditions $X_t^{t-1} = X_t^o$ and $\Omega_{tt} = \Omega_t$, and

$$X_t^s = X_t^{s-1} + \Omega_{ts} K_s^T \Delta_s^{-1}(Y_s - K_s X_s^o) \quad (s = t, t+1, \dots),$$

where $\Omega_{t,s+1} = \Omega_{ts}(L_s - \Theta_s \Delta_s^{-1} K_s)^T$;
(b) the error covariance matrix $E[(X_t - X_t^s)(X_t - X_t^s)^T] = \Omega_t^s$ is determined by the initial condition $\Omega_t^{t-1} = \Omega_t$ and the recursive formula

$$\Omega_t^s = \Omega_t^{s-1} - \Omega_{ts} K_s^T \Delta_s^{-1} K_s \Omega_{ts}^T.$$


2 Chaos and dynamical systems

Complex dynamical systems exist widely in nature. In 1961, Edward Lorenz ran a numerical computer model to make a weather prediction. When he entered the initial condition 0.506 instead of 0.506127, the result was a completely different weather scenario. Later, when Lorenz presented this result at the 139th meeting of the American Association for the Advancement of Science in 1972, Philip Merilees concocted “Does the flap of a butterfly’s wings in Brazil set off a tornado in Texas?” as the title of Lorenz’s talk. This is the so-called butterfly effect. In general, the trajectory of a complex dynamical system in environmental science often depends sensitively on initial conditions. If two initial conditions differ by a small amount and their trajectories separate exponentially after some time, such a system is called a chaotic dynamical system. If a set of initial conditions is attracted, after some time, to a subset that is invariant under dynamical evolution, such an invariant subset is called an attractor of the system. Attractors of chaotic dynamical systems often exhibit an unusual kind of self-similarity and have fractal dimension. The sensitive dependence on initial conditions and the fractal dimension of attractors are two of the most important characteristics of chaotic dynamical systems. In environmental science, the equations underlying a dynamical system are often unknown. Using the delay-coordinate embedding technique, the dynamical system can be reconstructed from observed trajectories.

DOI 10.1515/9783110424904-003

2.1 Dynamical systems

In theory, a dynamical system is defined as a first-order differential equation acting on a phase space $\mathbb{R}^m$,

$$\frac{d}{dt}x(t) = F(t, x(t)) \quad (t \in \mathbb{R}), \tag{2.1.1}$$

where $x = (x^{(1)}, \dots, x^{(m)})$ and $F = (F_1, \dots, F_m)$. In the discrete case, a dynamical system is defined as an $m$-dimensional map

$$x_{n+1} = F(x_n) \quad (n \in \mathbb{Z}), \tag{2.1.2}$$

where $x_k = (x_k^{(1)}, \dots, x_k^{(m)})$. If $F$ does not depend explicitly on $t$, i.e.,

$$\frac{d}{dt}x(t) = F(x(t)) \quad (t \in \mathbb{R}),$$

then the system is called autonomous. If $F$ satisfies the Lipschitz condition

$$\|F(x) - F(x')\| \le K\|x - x'\|^{\alpha} \quad (0 < \alpha \le 1) \tag{2.1.3}$$

with $\alpha = 1$, where $x, x' \in \mathbb{R}^m$, $K > 0$ is a constant, and $\|\cdot\|$ is the norm of $\mathbb{R}^m$, then the initial value problem

$$\begin{cases} \dfrac{d}{dt}x(t) = F(x(t)) & (t \in \mathbb{R}), \\ x(0) = a \end{cases}$$

has a unique solution.

Let $F = (F_1, \dots, F_m)$. Suppose that the divergence of $F$ in (2.1.3) is less than zero,

$$\operatorname{div} F = \nabla \cdot F = \frac{\partial F_1}{\partial x_1} + \frac{\partial F_2}{\partial x_2} + \cdots + \frac{\partial F_m}{\partial x_m} < 0.$$

Then the system is called a dissipative system. In this case, a set of initial conditions is contracted under the dynamical evolution. This set is attracted to some invariant subset, which is called an attractor of the system.

For the discrete dynamical system (2.1.2), consider the Jacobian matrix of $F$,

$$J_F = \left(\frac{\partial F_i}{\partial x^{(j)}}\right)_{m \times m} = \begin{pmatrix} \dfrac{\partial F_1}{\partial x^{(1)}} & \cdots & \dfrac{\partial F_1}{\partial x^{(m)}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial F_m}{\partial x^{(1)}} & \cdots & \dfrac{\partial F_m}{\partial x^{(m)}} \end{pmatrix}.$$

If the absolute value of the determinant of $J_F$ satisfies $|\det J_F| < 1$, then a set of initial conditions is contracted to an invariant subset, which is called an attractor.

2.2 Henon and logistic maps

Simple dynamical systems can exhibit completely unpredictable behavior. Here we give some examples, including the famous Henon and logistic maps.

2.2.1 Circular motion

For the dynamical system

$$\frac{dx_1}{dt} = -\omega x_2, \qquad \frac{dx_2}{dt} = \omega x_1, \tag{2.2.1}$$

where $\omega$ is a constant, we have $\frac{d^2x_1}{dt^2} = -\omega\frac{dx_2}{dt} = -\omega^2 x_1$, or

$$\frac{d^2x_1}{dt^2} + \omega^2 x_1 = 0.$$

The solution of this second-order differential equation,

$$x_1 = a\cos\omega(t - t_0), \qquad x_2 = a\sin\omega(t - t_0),$$

is a trajectory of the dynamical system (2.2.1), where $a, t_0 \in \mathbb{R}$. Rewrite (2.2.1) in the vector form

$$\frac{dx}{dt} = F(x) = Ax,$$

where $x = (x_1, x_2)^T$, $F = (F_1, F_2) = (-\omega x_2, \omega x_1)$, and $A = \begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix}$. The divergence of $F$ is

$$\nabla \cdot F = \frac{\partial F_1}{\partial x_1} + \frac{\partial F_2}{\partial x_2} = \frac{\partial(-\omega x_2)}{\partial x_1} + \frac{\partial(\omega x_1)}{\partial x_2} = 0.$$

So the system has bounded solutions and is not dissipative.

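2.2.2 Henon map

For the Henon map $x_{n+1} = F(x_n)$ $(n \in \mathbb{Z})$, where $x_n = (x_n^{(1)}, x_n^{(2)})$, $F = (F_1, F_2)^T$,

$$F_1(x_n) = a - (x_n^{(1)})^2 + b\,x_n^{(2)}, \qquad F_2(x_n) = x_n^{(1)},$$

and $a$, $b$ are constants, the Jacobian matrix is

$$J_F = \begin{pmatrix} \dfrac{\partial F_1}{\partial x^{(1)}} & \dfrac{\partial F_1}{\partial x^{(2)}} \\ \dfrac{\partial F_2}{\partial x^{(1)}} & \dfrac{\partial F_2}{\partial x^{(2)}} \end{pmatrix} = \begin{pmatrix} -2x^{(1)} & b \\ 1 & 0 \end{pmatrix}$$

and its determinant satisfies $|\det J_F| = |b|$. If $|b| < 1$, then the system is dissipative.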
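The Henon map and its Jacobian are easy to explore numerically. The following is a short sketch, assuming NumPy; the parameter values $a = 1.4$, $b = 0.3$ and the starting point are the values commonly used in the literature, not values given in the text.

```python
import numpy as np

def henon(x, a=1.4, b=0.3):
    """One step of the map F(x) = (a - x1^2 + b*x2, x1)."""
    return np.array([a - x[0] ** 2 + b * x[1], x[0]])

def henon_jacobian(x, b=0.3):
    """Jacobian of the map at x; its determinant is -b for every x."""
    return np.array([[-2.0 * x[0], b], [1.0, 0.0]])

x = np.array([0.0, 0.0])          # assumed starting point
orbit = []
for _ in range(1000):
    x = henon(x)
    orbit.append(x)
orbit = np.array(orbit)

dets = [abs(np.linalg.det(henon_jacobian(p))) for p in orbit[:5]]
print(dets)
```

Because $|\det J_F| = |b|$ at every point, areas shrink by the same factor $|b|$ at each iteration, independently of where the orbit currently is.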

2.2.3 Fixed point

Let $x_{n+1} = F(x_n)$, where $F : \mathbb{R}^m \to \mathbb{R}^m$ is a map satisfying

$$\|F(s) - F(t)\| \le q\|s - t\| \quad (0 \le q < 1).$$

Here $\|\cdot\|$ is the norm of the space $\mathbb{R}^m$. For any $x_1 \in \mathbb{R}^m$, define

$$x_2 = F(x_1), \quad \dots, \quad x_{n+1} = F(x_n),$$

and so

$$\|x_{n+1} - x_n\| = \|F(x_n) - F(x_{n-1})\| \le q\|x_n - x_{n-1}\| \le \cdots \le q^{n-1}\|x_2 - x_1\|.$$

This implies that $\{x_n\}$ is convergent. In fact,

$$\|x_{n+p} - x_n\| \le \|x_{n+p} - x_{n+p-1}\| + \cdots + \|x_{n+1} - x_n\| \le (q^{n+p-2} + \cdots + q^{n-1})\|x_2 - x_1\| \le \frac{q^{n-1}}{1-q}\|x_2 - x_1\| \to 0 \quad (n \to \infty),$$

so $\{x_n\}$ is a Cauchy sequence. Thus, there is a point $x^* \in \mathbb{R}^m$ such that $x_n \to x^*$ $(n \to \infty)$.

Finally, we prove the uniqueness of $x^*$, i.e., $x^*$ is independent of the choice of the initial value. Since $F$ is continuous, from $x_{n+1} = F(x_n)$ and $x_n \to x^*$ it follows that $x^* = F(x^*)$. If there is an $\bar{x} \ne x^*$ such that $\bar{x} = F(\bar{x})$, then

$$0 < \|x^* - \bar{x}\| = \|F(x^*) - F(\bar{x})\| \le q\|x^* - \bar{x}\| < \|x^* - \bar{x}\|.$$

This is a contradiction. So $x^* = \bar{x}$.

From this, it is seen that sometimes the solution of a dynamical equation tends to a point which is independent of the choice of the initial value. This point is an attractor of the system.
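The contraction argument can be illustrated with a direct iteration. The following is a minimal sketch in pure Python; the map $F(x) = \tfrac12\cos x$ is a hypothetical example chosen because $|F'(x)| \le \tfrac12$, so it satisfies the contraction condition with $q = \tfrac12$.

```python
import math

def fixed_point(F, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = F(x_n) until successive iterates differ by less than tol."""
    x = x0
    for n in range(max_iter):
        x_new = F(x)
        if abs(x_new - x) < tol:
            return x_new, n + 1
        x = x_new
    raise RuntimeError("iteration did not converge")

def F(x):
    # Contraction on the real line: |F'(x)| = 0.5 * |sin x| <= 0.5.
    return 0.5 * math.cos(x)

xa, steps_a = fixed_point(F, -10.0)
xb, steps_b = fixed_point(F, 25.0)
print(xa, xb, steps_a, steps_b)
```

Both starting values lead to the same limit, in line with the uniqueness part of the argument.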

2.2.4 Linear mapping

Consider the dynamical system

$$x_{n+1} = \alpha x_n + \beta \quad (n \in \mathbb{Z}), \tag{2.2.2}$$

where $x_n, \alpha, \beta \in \mathbb{R}$ and $\alpha \ne 1$, $\beta$ are constants. Let $x_n = \gamma y_n + \delta$ with $\gamma \ne 0$. Then (2.2.2) becomes

$$\gamma y_{n+1} + \delta = \alpha\gamma y_n + \alpha\delta + \beta.$$

Choose $\delta$ such that $\delta = \alpha\delta + \beta$, i.e., $\delta = \frac{\beta}{1-\alpha}$. So

$$y_{n+1} = \alpha y_n \quad (n \in \mathbb{Z}), \tag{2.2.3}$$

i.e., $y_{n+1} = F(y_n)$, where $F(y) = \alpha y$. By (2.2.3), $y_n = y_0\alpha^n$. If $|\alpha| < 1$, then $\{y_n\}$ decays exponentially for any $y_0$, so $y_n \to 0$ $(n \to \infty)$. Thus, the trajectory is attracted to the fixed point $y^* = 0$, i.e., $y^* = 0$ is an attractor. If $|\alpha| > 1$, then $\{|y_n|\}$ increases exponentially. If $\alpha = -1$, then the system oscillates between two values.

For a dynamical system $x_{n+1} = F(x_n)$ $(n \in \mathbb{Z})$, its fixed points are the solutions of $x = F(x)$. Start from a point $x_0$ in a small neighborhood of a fixed point $x^*$, and define a small number $\varepsilon_n$ such that $x_n = x^* + \varepsilon_n$. So

$$x^* + \varepsilon_{n+1} = x_{n+1} = F(x^* + \varepsilon_n) \approx F(x^*) + F'(x^*)\varepsilon_n.$$

Note that $x^* = F(x^*)$. Then

$$\varepsilon_{n+1} = F'(x^*)\varepsilon_n. \tag{2.2.4}$$

Furthermore, when $|F'(x^*)| < 1$, $x_n \to x^*$ $(n \to \infty)$. This means that $x^*$ is an attractor.

2.2.5 Logistic map

The logistic map is $x_{n+1} = F(x_n) = Ax_n(1 - x_n)$, where $A$ is a constant. Consider the fixed point $x_1^* = 0$. Then $F'(x_1^*) = A(1 - 2x_1^*) = A$, and so $|F'(x_1^*)| < 1$ if and only if $|A| < 1$. This means that $x_1^* = 0$ is an attractor when $-1 < A < 1$. For the other fixed point $x_2^* = 1 - \frac{1}{A}$,

$$F'(x_2^*) = A(1 - 2(1 - A^{-1})) = -A + 2,$$

and so $|F'(x_2^*)| < 1$ if and only if $1 < A < 3$. This means that $x_2^* = 1 - A^{-1}$ is an attractor when $1 < A < 3$.

Now consider the case $A = 3$. When $A = 3$, $F'(x_2^*) = -1$. By (2.2.4),

$$\varepsilon_{n+1} = -\varepsilon_n, \qquad \varepsilon_{n+2} = \varepsilon_n \quad (n \in \mathbb{Z}),$$

i.e., $x_{n+2} = x_n$. So $\{x_n\}$ is a trajectory that oscillates between two values under a small perturbation. To obtain such an oscillatory trajectory, consider the map $F^2(x) := F(F(x))$ and find its fixed points through the system of equations

$$y_2 = F(y_1) = Ay_1(1 - y_1), \qquad y_1 = F(y_2) = Ay_2(1 - y_2). \tag{2.2.5}$$

Its solution is

$$y_1^* = \frac{A + 1 + \sqrt{(A+1)(A-3)}}{2A}, \qquad y_2^* = \frac{A + 1 - \sqrt{(A+1)(A-3)}}{2A}. \tag{2.2.6}$$

Clearly, if $A \ge 3$, the solution exists. Then $y_1^*$ and $y_2^*$ are fixed points of the map $F^2$. From this and (2.2.5), it follows that

$$(F^2)'(y_1^*) = F'(F(y_1^*))F'(y_1^*) = F'(y_2^*)F'(y_1^*) = A^2(1 - 2y_1^*)(1 - 2y_2^*).$$

By (2.2.6),

$$y_1^* + y_2^* = 1 + A^{-1}, \qquad y_1^*y_2^* = \frac{1}{4A^2}\big((A+1)^2 - (A+1)(A-3)\big) = \frac{A+1}{A^2}.$$

So $(F^2)'(y_1^*) = 1 - (A+1)(A-3)$. Similarly, $(F^2)'(y_2^*) = (F^2)'(y_1^*) = 1 - (A+1)(A-3)$. Hence

$$(F^2)'(y_1^*) = (F^2)'(y_2^*) = 1 \quad \text{when } A = 3, \qquad (F^2)'(y_1^*) = (F^2)'(y_2^*) = -1 \quad \text{when } A = 1 + \sqrt{6}.$$

Similarly to the case $A = 3$, it is easily shown that when $A = 1 + \sqrt{6}$, the trajectories oscillate among four values. Therefore, as $A$ increases from 3, the trajectories oscillate among two values, then four values, eight values, and so on. As $A$ approaches the value 3.56995, the periods of the oscillations tend to infinity. When the parameter $A > 3.56995$, the system begins to exhibit chaotic behavior with the following characteristics:
– sensitive dependence on initial conditions;
– nonperiodicity;
– attractors with fractal structure (i.e., strange attractors).

The interval $[3.56995, 4]$ for the parameter $A$ is often called the chaotic domain.
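The period-two orbit (2.2.6) and its multiplier can be verified numerically. The following is a minimal sketch in pure Python; the value $A = 3.2$ and the starting point $0.3$ are illustrative assumptions.

```python
import math

def logistic(x, A):
    """Logistic map F(x) = A x (1 - x)."""
    return A * x * (1.0 - x)

def two_cycle(A):
    """Closed-form period-two points y1*, y2* from (2.2.6); requires A >= 3."""
    root = math.sqrt((A + 1.0) * (A - 3.0))
    return (A + 1.0 + root) / (2.0 * A), (A + 1.0 - root) / (2.0 * A)

A = 3.2
y1, y2 = two_cycle(A)
multiplier = A ** 2 * (1.0 - 2.0 * y1) * (1.0 - 2.0 * y2)   # (F^2)'(y1*)

# Iterate from an arbitrary interior point and see where the orbit settles.
x = 0.3
for _ in range(2000):
    x = logistic(x, A)
print(y1, y2, multiplier, x)
```

For $A = 3.2$ the multiplier equals $1 - (A+1)(A-3) = 0.16$, so the two-cycle is attracting and the iterated orbit ends up alternating between $y_1^*$ and $y_2^*$.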

2.3 Lyapunov exponents

For a dynamical system, if two initial conditions differ by a small amount $\delta x$ and their difference after time $t$ becomes $\delta x\,e^{\lambda t}$, then $\lambda$ is called the Lyapunov exponent. Numerically, one can calculate the maximal Lyapunov exponent $\lambda_{\max}$ as follows. Choose two very close initial points and let their distance be $d_0 \ll 1$. After time $t_i$, their distance becomes $d_i$ $(i \in \mathbb{Z}_+)$. Then the maximal Lyapunov exponent can be computed by

$$\lambda_{\max} = \lim_{N\to\infty} \frac{\sum_{i=1}^{N} \log\frac{d_i}{d_0}}{\sum_{i=1}^{N} t_i}.$$

Negative Lyapunov exponents are characteristic of dissipative or nonconservative systems. Such systems exhibit asymptotic stability; the more negative the exponent, the greater the stability. A positive Lyapunov exponent is characteristic of chaotic systems: nearby points, no matter how close, will diverge to any arbitrary separation. In a dynamical system there exist many different Lyapunov exponents, but the sum of all Lyapunov exponents cannot be positive in a physically meaningful system.

Let $x$ and $y$ be two close trajectories of a dynamical system of dimension $m$,

$$x_{n+1} = F(x_n), \qquad y_{n+1} = F(y_n).$$

Then

$$y_{n+1} - x_{n+1} = F(y_n) - F(x_n) = J_n(x_n)(y_n - x_n) + O(\|y_n - x_n\|^2),$$

where $J_n(x_n)$ is the $m \times m$ Jacobian matrix of $F$ at $x_n$. Let $\delta_n = y_n - x_n$. To first order, this is rewritten as $\delta_{n+1} = J_n\delta_n$ $(n \in \mathbb{Z}_+)$, where $J_n = J_n(x_n)$. This implies that

$$\delta_{N+1} = J_N\delta_N = J_NJ_{N-1}\delta_{N-1} = \cdots = U_N\delta_1,$$

where $U_N = J_NJ_{N-1}\cdots J_1$.

In order to compute Lyapunov exponents, we need to find $\delta_1$ such that $\|\delta_{N+1}\|$ attains its maximal value under the condition $\|\delta_1\| = 1$ (i.e., $\delta_1^T\delta_1 = 1$) by using the method of Lagrange multipliers. Introduce the function

$$H(u) = (U_Nu)^T(U_Nu) - \lambda(u^Tu - 1).$$

Note that $(U_Nu)^T(U_Nu) = u^TU_N^TU_Nu = u^TG_Nu$, where $G_N = U_N^TU_N$ is an $m \times m$ real symmetric positive semidefinite matrix, $u$ is an $m$-dimensional vector, and $u^TG_Nu$ is a quadratic form in $u$. So $H(u) = u^TG_Nu - \lambda(u^Tu - 1)$. Let $G_N = (\alpha_{ij})_{m\times m}$ and $u = (u_1, \dots, u_m)^T$. Then

$$H(u) = \sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_{ij}u_iu_j - \lambda\Big(\sum_{i=1}^{m} u_i^2 - 1\Big).$$

This implies that

$$\frac{\partial H}{\partial u_l} = 2\sum_{i=1}^{m} \alpha_{il}u_i - 2\lambda u_l \quad (l = 1, \dots, m).$$

Setting $\frac{\partial H}{\partial u_l} = 0$ $(l = 1, \dots, m)$ with $\|u\| = 1$ gives $G_Nu = \lambda u$. Hence $H(u)$ attains its extremal values when $\lambda$ is an eigenvalue of the matrix $G_N$ and $u$ is the corresponding unit eigenvector; the maximum corresponds to the largest eigenvalue. Denote the eigenvalues of $G_N$ by $\lambda_1^{(N)} \ge \lambda_2^{(N)} \ge \cdots \ge \lambda_m^{(N)}$. The corresponding unit eigenvectors $e_1^{(N)}, \dots, e_m^{(N)}$ satisfy $G_Ne_i^{(N)} = \lambda_i^{(N)}e_i^{(N)}$. Then the Lyapunov exponents are

$$\lambda_i = \lim_{N\to\infty} \frac{1}{2N}\log|\lambda_i^{(N)}| \quad (i = 1, \dots, m).$$

In the one-dimensional case, the Jacobian matrix $F'(x_n)$ at $x_n$ is a real number. The corresponding Lyapunov exponent is

$$\lambda = \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} \log|F'(x_n)|.$$
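The one-dimensional formula is simple to evaluate along a computed orbit. The following is a minimal sketch in pure Python for the logistic map $F(x) = Ax(1-x)$, for which $F'(x) = A(1 - 2x)$; the function name, the starting point, the transient length, and the finite number of iterations replacing the limit are all assumptions made for illustration.

```python
import math

def lyapunov_logistic(A, x0=0.3, n_transient=1000, n=20000):
    """Estimate (1/N) * sum log|F'(x_n)| for the logistic map with parameter A."""
    x = x0
    for _ in range(n_transient):          # discard the transient part of the orbit
        x = A * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(A * (1.0 - 2.0 * x)))   # log|F'(x_n)|
        x = A * x * (1.0 - x)
    return total / n

lam_stable = lyapunov_logistic(2.5)    # orbit settles on the fixed point 1 - 1/A
lam_chaotic = lyapunov_logistic(4.0)   # parameter at the end of the chaotic domain
print(lam_stable, lam_chaotic)
```

For $A = 2.5$ the orbit converges to $x_2^* = 0.6$, where $F'(x_2^*) = -0.5$, so the estimate is close to $\log 0.5 < 0$. For $A = 4$ the estimate is positive, consistent with chaotic behavior.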

2.4 Fractal dimension

From an intuitive notion of dimension, a finite set of points is zero-dimensional, a straight line is one-dimensional, a plane is two-dimensional, and a cube is three-dimensional. One often determines the dimension by the number of degrees of freedom. For example, the parametric equation

$$x = x(t), \qquad y = y(t) \quad (0 \le t \le 1),$$

where $x(t)$ and $y(t)$ are continuous functions, is considered to be a one-dimensional continuous curve since it has only one degree of freedom. The equation

$$x = x(u, \upsilon), \qquad y = y(u, \upsilon) \quad ((u, \upsilon) \in [0, 1]^2),$$

where $x(u, \upsilon)$ and $y(u, \upsilon)$ are continuous functions, is considered to be a two-dimensional continuous surface. However, using degrees of freedom to determine dimension is not precise. In 1890, Peano constructed a continuous curve

$$x = x_p(t), \qquad y = y_p(t) \quad (0 \le t \le 1),$$

such that its trajectory fills the whole unit square $[0, 1]^2$, i.e., its dimension is two. Peano’s curve is constructed as follows. First, the segment $S = [0, 1]$ is divided equally into four small segments

$$S_1^{(1)} = [0, \tfrac14], \quad S_2^{(1)} = [\tfrac14, \tfrac12], \quad S_3^{(1)} = [\tfrac12, \tfrac34], \quad S_4^{(1)} = [\tfrac34, 1].$$

Correspondingly, the unit square $Q = [0, 1]^2$ is divided equally into four small squares

$$Q_1^{(1)} = [0, \tfrac12] \times [0, \tfrac12], \quad Q_2^{(1)} = [\tfrac12, 1] \times [0, \tfrac12], \quad Q_3^{(1)} = [0, \tfrac12] \times [\tfrac12, 1], \quad Q_4^{(1)} = [\tfrac12, 1] \times [\tfrac12, 1],$$

and each small segment $S_k^{(1)}$ $(k = 1, 2, 3, 4)$ is divided equally into four small segments

$$S_{4(k-1)+1}^{(2)}, \quad S_{4(k-1)+2}^{(2)}, \quad S_{4(k-1)+3}^{(2)}, \quad S_{4k}^{(2)} \quad (k = 1, 2, 3, 4).$$

So the segment $S$ is divided into $2^4 = 16$ segments $S_1^{(2)}, S_2^{(2)}, \dots, S_{16}^{(2)}$. Correspondingly, each $Q_k^{(1)}$ $(k = 1, 2, 3, 4)$ is divided equally into four small squares, so the square $Q$ is divided equally into 16 small squares. Continuing this procedure to the $n$-th step, the segment $S$ is divided equally into $4^n$ small segments $S_1^{(n)}, S_2^{(n)}, \dots, S_{4^n}^{(n)}$ and the square $Q$ is divided equally into $4^n$ small squares $Q_1^{(n)}, \dots, Q_{4^n}^{(n)}$ $(n = 1, 2, \dots)$.

Now we construct a continuous map from $[0, 1]$ onto $[0, 1]^2$ as follows. For a point $a \in S$, there is a sequence of segments $S_{k_1}^{(1)}, \dots, S_{k_n}^{(n)}, \dots$ such that

$$S_{k_1}^{(1)} \supset S_{k_2}^{(2)} \supset \cdots \supset S_{k_n}^{(n)} \supset \cdots.$$

The sequence $\{S_{k_n}^{(n)}\}_{n\in\mathbb{Z}_+}$ converges to the point $a$. The corresponding sequence of small squares $\{Q_{k_n}^{(n)}\}_{n\in\mathbb{Z}_+}$ converges to a unique point $b$ as $n \to \infty$. Let $a$ correspond to $b$, i.e., the map $\varphi$ from $[0, 1]$ into $[0, 1]^2$ is given by $\varphi(a) = b$.

Conversely, for any $b' \in [0, 1]^2$, there is a sequence $\{Q_{k'_n}^{(n)}\}_{n\in\mathbb{Z}_+}$ such that

$$Q_{k'_1}^{(1)} \supset Q_{k'_2}^{(2)} \supset \cdots \supset Q_{k'_n}^{(n)} \supset \cdots$$

and $\{Q_{k'_n}^{(n)}\}_{n\in\mathbb{Z}_+}$ converges to $b'$. The corresponding sequence of small segments $\{S_{k'_n}^{(n)}\}_{n\in\mathbb{Z}_+}$ converges to a unique point $a'$, and so $\varphi(a') = b'$. This implies that $\varphi$ maps $[0, 1]$ onto $[0, 1]^2$, i.e., $\varphi([0, 1]) = [0, 1]^2$.

Finally, we prove that $\varphi$ is a continuous map. Let $a_n \to a$ $(a_n \in S)$, $b_n = \varphi(a_n)$, and $b = \varphi(a)$. We need only prove that $b_n \to b$. By the above construction, there are indices $k_1, k_2, \dots$ such that

$$S_{k_1}^{(1)} \supset S_{k_2}^{(2)} \supset \cdots \supset S_{k_n}^{(n)} \supset \cdots, \qquad Q_{k_1}^{(1)} \supset Q_{k_2}^{(2)} \supset \cdots \supset Q_{k_n}^{(n)} \supset \cdots,$$

and $a \in S_{k_n}^{(n)}$, $b \in Q_{k_n}^{(n)}$ $(n \in \mathbb{Z}_+)$. For $\varepsilon > 0$, choose $m \in \mathbb{Z}_+$ such that $\sqrt{2}\,(2^{-m}) < \varepsilon$. By $a_n \to a$, there is a $p \in \mathbb{Z}_+$ such that $a_n \in S_{k_m}^{(m)}$, $b_n \in Q_{k_m}^{(m)}$ $(n > p)$. Note that $b \in Q_{k_m}^{(m)}$ and $Q_{k_m}^{(m)}$ is a square with side $2^{-m}$ and diameter $\sqrt{2}\,(2^{-m})$. Then

$$\|b_n - b\| \le \sqrt{2}\,(2^{-m}) < \varepsilon.$$

So $b_n \to b$. This implies that $\varphi(a) = b$ is a continuous map from $[0, 1]$ onto $[0, 1]^2$. Let $\varphi(a) = (\varphi_1(a), \varphi_2(a))$ and $b = (b_1, b_2)$. Then the map $b = \varphi(a)$ can be rewritten as

$$b_1 = \varphi_1(a), \qquad b_2 = \varphi_2(a) \quad (0 \le a \le 1),$$

where both $\varphi_1$ and $\varphi_2$ are continuous. So it is a continuous curve, but it is two-dimensional. From this, it is seen that the notion of dimension needs to be described more precisely. There are several ways to define dimension.

2.4.1 Box-counting dimension

Cover a bounded data set $A$ by boxes with diameter $\varepsilon$. Denote the minimal number of boxes by $N(\varepsilon)$. Then the box-counting dimension $D_1$ is defined by

$$D_1 = \lim_{\varepsilon\to 0} \frac{\log N(\varepsilon)}{\log\frac{1}{\varepsilon}}.$$

By this definition, the segment $[0, 1]$ is one-dimensional, the square $[0, 1]^2$ is two-dimensional, and the cube $[0, 1]^3$ is three-dimensional. This definition coincides with the intuitive notion of dimension.

2.4.2 Information dimension

The definition of box-counting dimension does not take into account how many points of the set lie in each box. The information dimension overcomes this shortcoming and is defined as

$$D_2 = \lim_{\varepsilon\to 0} \frac{H(\varepsilon)}{\log\frac{1}{\varepsilon}},$$

where

$$H(\varepsilon) = -\sum_{i=1}^{N(\varepsilon)} P_i\log P_i$$

and $P_i$ is the relative frequency with which points of the set occur in the $i$-th box of the covering.

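2.4.3 Correlation dimension

In the higher-dimensional case, the computation of the box-counting dimension and the information dimension is too complicated. For convenience, the correlation dimension is introduced. It is defined as

$$D_3 = \lim_{\varepsilon\to 0} \frac{\log C(\varepsilon)}{\log\varepsilon},$$

where

$$C(\varepsilon) = \frac{2}{N(N-1)}\sum_{i=1}^{N}\sum_{j=i+1}^{N} \Theta(\varepsilon - \|x_i - x_j\|),$$

$x_1, \dots, x_N$ are the $N$ points of the data set, and $\Theta$ is the Heaviside step function

$$\Theta(x) = 0 \quad (x \le 0), \qquad \Theta(x) = 1 \quad (x > 0).$$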
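The correlation sum $C(\varepsilon)$ is a count of close pairs, and $D_3$ can be estimated as the slope of $\log C(\varepsilon)$ against $\log\varepsilon$ over a range of small $\varepsilon$. The following is a rough sketch, assuming NumPy; the test data (uniform random points on $[0, 1]$), the range of $\varepsilon$, and the use of a least-squares line instead of the limit are illustrative assumptions.

```python
import numpy as np

def correlation_sum(points, eps):
    """C(eps): fraction of pairs (i < j) with distance smaller than eps."""
    pts = np.asarray(points, dtype=float)
    if pts.ndim == 1:
        pts = pts[:, None]                       # treat scalars as 1-dimensional points
    n = len(pts)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    upper = np.triu_indices(n, k=1)              # index pairs with i < j
    return 2.0 * np.count_nonzero(dist[upper] < eps) / (n * (n - 1))

rng = np.random.default_rng(0)
sample = rng.random(1000)                        # points on a line segment
eps_values = np.logspace(-2, -1, 8)
c_values = np.array([correlation_sum(sample, e) for e in eps_values])
slope = np.polyfit(np.log(eps_values), np.log(c_values), 1)[0]
print(slope)
```

For points filling a segment the estimated slope is close to 1, matching the intuitive dimension. The pairwise distance matrix needs memory of order $N^2$, so this direct approach only suits moderate $N$.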

2.4.4 Self-similarity dimension

A segment may be divided into 2 subsegments with similarity ratio $\frac12$. A square may be divided into 4 subsquares with similarity ratio $\frac12$. A cube may be divided into 8 subcubes with similarity ratio $\frac12$. The numbers 2, 4, and 8 can be rewritten as $2^1$, $2^2$, and $2^3$. Here the exponents 1, 2, and 3 coincide with the intuitive dimensions. In the real world, many figures can be divided into $b$ similar small figures with similarity ratio $\frac1a$, where $a$ is a positive integer, and each small figure can again be divided into $b$ similar smaller figures with similarity ratio $\frac1a$. If this procedure can continue forever, then we call such a figure self-similar and define its dimension as

$$D_4 = \frac{\log b}{\log a}.$$

For example, the following Cantor set is a self-similar set. First, divide $[0, 1]$ into three parts $[0, \frac13]$, $(\frac13, \frac23)$, $[\frac23, 1]$. Remove the middle part $G_1 = (\frac13, \frac23)$. The residual part is

$$E_1 = [0, 1] \setminus (\tfrac13, \tfrac23) = [0, \tfrac13] \cup [\tfrac23, 1].$$

Secondly, divide $[0, \frac13]$ and $[\frac23, 1]$ into three parts each,

$$[0, \tfrac19], \quad (\tfrac19, \tfrac29), \quad [\tfrac29, \tfrac13], \quad [\tfrac23, \tfrac79], \quad (\tfrac79, \tfrac89), \quad [\tfrac89, 1].$$

Remove $G_2 = (\frac19, \frac29) \cup (\frac79, \frac89)$. The residual part is

$$E_2 = [0, 1] \setminus (G_1 \cup G_2) = [0, \tfrac19] \cup [\tfrac29, \tfrac13] \cup [\tfrac23, \tfrac79] \cup [\tfrac89, 1].$$

Continuing this procedure, we obtain a set $G_\infty = \bigcup_{n=1}^{\infty} G_n$. Clearly, $G_\infty$ is an open set with measure 1. Let $C = [0, 1] \setminus G_\infty$. Then $C$ is a closed set with measure 0. The set $C$ is called the Cantor set.

The Cantor set can be divided into 2 similar small sets $C_1$ and $C_2$ with similarity ratio $\frac13$. Both $C_1$ and $C_2$ can be divided into 2 similar smaller sets with similarity ratio $\frac13$. This procedure can continue forever. Therefore, the Cantor set $C$ is a self-similar set with dimension $D_4 = \frac{\log 2}{\log 3}$.

The dimension of the Cantor set is not an integer, i.e., it is a fractal dimension. Figures of fractal dimension are ubiquitous in nature. For chaotic dynamical systems, the attractors often have fractal dimension. A figure of fractal dimension has a complicated geometric structure. Self-similarity is the most important characteristic of fractal figures.
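For the Cantor set, the box-counting ratio $\log N(\varepsilon)/\log\frac1\varepsilon$ can be computed exactly at the scales $\varepsilon = 3^{-k}$. The following is a small sketch in pure Python; it represents the left endpoints of the $2^n$ intervals of $E_n$ as integers in units of $3^{-n}$ (base-3 digits 0 or 2) to avoid rounding, and the function names are hypothetical.

```python
import math
from itertools import product

def cantor_left_endpoints(n):
    """Left endpoints of the 2^n intervals of E_n, as integers in units of 3^-n."""
    return [sum(d * 3 ** (n - 1 - i) for i, d in enumerate(digits))
            for digits in product((0, 2), repeat=n)]

def box_count(n, k):
    """Number of boxes of size 3^-k (k <= n) containing at least one endpoint."""
    scale = 3 ** (n - k)
    return len({e // scale for e in cantor_left_endpoints(n)})

n = 10
estimates = [math.log(box_count(n, k)) / math.log(3 ** k) for k in range(1, n + 1)]
print(estimates[0], estimates[-1], math.log(2) / math.log(3))
```

At every scale $3^{-k}$ exactly $2^k$ boxes are occupied, so each ratio equals $\frac{\log 2}{\log 3} \approx 0.631$, in agreement with the self-similarity dimension $D_4$.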

2.5 Prediction

Consider linear prediction first. For a given time series $S_n$ $(n = 1, \dots, N)$ of measurements, we want to predict the next measurement $S_{N+1}$. The predictor $S_{n+1}^o$ of the measurement $S_{n+1}$ can be expressed as a linear combination of $S_{n-m+1}, \dots, S_n$, i.e.,

$$S_{n+1}^o = \sum_{i=1}^{m} \alpha_iS_{n-m+i} \quad (n = m, \dots, N), \tag{2.5.1}$$

where the $\alpha_i$ are $m$ undetermined coefficients which are independent of $n$. Choose the $\alpha_i$ such that

$$g(\alpha_1, \dots, \alpha_m) = \sum_{n=m}^{N-1} (S_{n+1}^o - S_{n+1})^2 \tag{2.5.2}$$

attains its minimal value. By the extremum principle, it is necessary to find the derivatives $\frac{\partial g}{\partial\alpha_i}$. By (2.5.1) and (2.5.2),

$$\frac12\frac{\partial g}{\partial\alpha_i} = \sum_{n=m}^{N-1} (S_{n+1}^o - S_{n+1})\frac{\partial S_{n+1}^o}{\partial\alpha_i} = \sum_{n=m}^{N-1}\Big(\sum_{j=1}^{m} \alpha_jS_{n-m+j} - S_{n+1}\Big)S_{n-m+i}.$$

So $\frac{\partial g}{\partial\alpha_i} = 0$ $(i = 1, \dots, m)$ is equivalent to the system of linear equations $\sum_{j=1}^{m} c_{ij}\alpha_j = d_i$ $(i = 1, \dots, m)$, where

$$c_{ij} = \sum_{n=m}^{N-1} S_{n-m+j}S_{n-m+i}, \qquad d_i = \sum_{n=m}^{N-1} S_{n-m+i}S_{n+1}.$$

The solutions are denoted by $\alpha_1^o, \dots, \alpha_m^o$. By (2.5.1), we get the prediction of $S_{N+1}$:

$$S_{N+1}^o = \sum_{i=1}^{m} \alpha_i^oS_{N-m+i}.$$

For nonlinear prediction, we consider the vector

$$\mathbf{S}_n = (S_{n-(m-1)}, S_{n-(m-2)}, \dots, S_{n-1}, S_n),$$

where $m$ is the embedding dimension. In order to predict $S_{N+1}$, a good method is to find a known embedding vector $\mathbf{R}_N$ closest to $\mathbf{S}_N$ and then use $R_{N+1}$ to predict $S_{N+1}$. More generally, we choose a small $\varepsilon > 0$ and find $L$ known embedding vectors $\mathbf{R}_N^k$ $(k = 1, \dots, L)$ such that $\|\mathbf{R}_N^k - \mathbf{S}_N\| \le \varepsilon$. The prediction of $S_{N+1}$ is

$$\hat{S}_{N+1} = \frac{1}{L}\sum_{k=1}^{L} R_{N+1}^k.$$
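The linear predictor amounts to solving the $m \times m$ system $\sum_j c_{ij}\alpha_j = d_i$. The following is a minimal sketch, assuming NumPy and a nonsingular matrix $(c_{ij})$; the function names are hypothetical, and the test series $S_n = \cos(0.3n)$ is chosen because it satisfies the exact two-term recurrence $S_{n+1} = 2\cos(0.3)S_n - S_{n-1}$.

```python
import numpy as np

def fit_linear_predictor(s, m):
    """Solve the normal equations for alpha_1..alpha_m (s[0] holds S_1)."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    # Row for n = m..N-1 (1-based): (S_{n-m+1}, ..., S_n); target is S_{n+1}.
    X = np.array([s[n - m:n] for n in range(m, N)])
    y = s[m:N]
    C = X.T @ X                      # c_ij = sum_n S_{n-m+j} S_{n-m+i}
    d = X.T @ y                      # d_i  = sum_n S_{n-m+i} S_{n+1}
    return np.linalg.solve(C, d)

def predict_next(s, alpha):
    """S^o_{N+1} = sum_i alpha_i S_{N-m+i}."""
    s = np.asarray(s, dtype=float)
    return float(alpha @ s[-len(alpha):])

series = np.cos(0.3 * np.arange(1, 41))          # S_1, ..., S_40
alpha = fit_linear_predictor(series, 2)
forecast = predict_next(series, alpha)
print(alpha, forecast, np.cos(0.3 * 41))
```

For a noise-free sinusoid the fitted coefficients recover the recurrence, so the one-step forecast is exact up to rounding. With noisy measurements the fit is only a least-squares compromise.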

2.6 Delay embedding vectors

For a dynamical system, instead of measuring the actual states $x_n$, we observe a scalar time series depending on the states, $S_n = S(x_n)$. This scalar is a projection of the variables of the system onto the real axis. Because the projection reduces dimensionality and is nonlinear, it is difficult to reconstruct the state space of the original system from this scalar time series. When the attractor dimension is much smaller than the dimension of the state space, it is enough to construct a new space such that the attractor in the new space is equivalent to the original one. Consider a dynamical system $x_{n+1} = F(x_n)$ $(n = 1, \dots, N)$ with the attractor $A$. To reconstruct the attractor $A$, we construct the $m$-dimensional delay vector from the scalar time series $\{S_n\}$,

$$\mathbf{S}_n = (S_{n-(m-1)\tau}, S_{n-(m-2)\tau}, \dots, S_{n-\tau}, S_n),$$

where $\tau$ is the delay time and $m$ is the embedding dimension; for $\tau = 1$ this is the vector used in Section 2.5. An embedding of a compact smooth manifold $M$ into the space $\mathbb{R}^m$ is defined as a one-to-one continuously differentiable map with nonsingular Jacobian matrix. The following theorem shows how to choose $m$ such that the attractor $A$ can be embedded into the space $\mathbb{R}^m$.

If $x_n \in A$, then $S_n \in \mathbb{R}$, and so $\mathbf{S}_n \in \tilde{A} \subset \mathbb{R}^m$, where $\tilde{A}$ is the image of $A$ in the embedding process. This implies $d_{\tilde{A}} = d_A$, where $d_B$ denotes the dimension of $B$. Since the embedding map is one-to-one and continuously differentiable, self-intersection must not occur in $\tilde{A}$. Otherwise, there are two directions at the self-intersection, which destroys the one-to-one correspondence. In an $m$-dimensional space, the intersection of a $d_1$-dimensional subspace and a $d_2$-dimensional subspace is in general a $d_I$-dimensional subspace, where $d_I = d_1 + d_2 - m$. When these two subspaces do not intersect, $d_I < 0$. Now $\tilde{A}$ does not intersect itself, so $d_I = 2d_A - m < 0$, i.e., $m > 2d_A$. Therefore, $m > 2d_A$ is a necessary condition for $A$ to be embedded in $\mathbb{R}^m$. On the other hand, we have the following theorem.

Delay embedding theorem. Let $d_A$ be the box-counting dimension of the attractor $A$. If $m > 2d_A$, then except for some special cases, the map from $x_n \in A$ to $\mathbf{S}_n = (S_{n-(m-1)\tau}, \dots, S_n) \in \mathbb{R}^m$ is an embedding map.

The whole embedding procedure can be visualized in the following diagram:

$$\begin{array}{ccc} x_n \in A & \xrightarrow{\ F\ } & x_{n+1} \in A \\ \downarrow & & \downarrow \\ S_n \in \mathbb{R} & & S_{n+1} \in \mathbb{R} \\ \downarrow & & \downarrow \\ \mathbf{S}_n \in \tilde{A} \subset \mathbb{R}^m & \xrightarrow{\ G\ } & \mathbf{S}_{n+1} \in \tilde{A} \subset \mathbb{R}^m \end{array}$$

Since the attractor is an invariant subset, from $\mathbf{S}_n \in \tilde{A}$ it follows that $\mathbf{S}_{n+1} \in \tilde{A}$. The new dynamical system $\mathbf{S}_{n+1} = G(\mathbf{S}_n)$ is uniquely determined by $x_{n+1} = F(x_n)$. A large embedding dimension requires a long time series. To reduce the cost of calculation, one often chooses $d_A < m < 2d_A$ if the self-intersection is negligible. Finally, we choose a suitable delay time $\tau$: large enough that $S_n$ and $S_{n+\tau}$ are rather independent, but not so large that they are completely independent in a statistical sense. Let $p_i$ be the probability that $S_n$ is in the $i$-th bin of the histogram, and let $p_{ij}(\tau)$ be the probability that $S_n$ is in the $i$-th bin and $S_{n+\tau}$ is in the $j$-th bin. The mutual information for time delay $\tau$ is
$$I(\tau) = \sum_{i,j} p_{ij}(\tau) \log p_{ij}(\tau) - 2 \sum_i p_i \log p_i .$$

The first minimum of the mutual information is a good candidate for time delay.
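The histogram estimate of $I(\tau)$ and the search for its first minimum can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `mutual_information` and `first_minimum`, the equal-width binning, and the default of 16 bins are assumptions made here for the example.

```python
import numpy as np

def mutual_information(s, tau, bins=16):
    """Histogram estimate of I(tau) = sum p_ij log p_ij - 2 sum p_i log p_i."""
    s = np.asarray(s, dtype=float)
    x, y = s[:-tau], s[tau:]                      # pairs (S_n, S_{n+tau}), tau >= 1
    edges = np.linspace(s.min(), s.max(), bins + 1)
    p_ij, _, _ = np.histogram2d(x, y, bins=[edges, edges])
    p_ij = p_ij / p_ij.sum()                      # joint probabilities
    p_i = np.histogram(s, bins=edges)[0] / len(s)  # single-point probabilities
    joint = p_ij[p_ij > 0]                        # skip empty bins (0 log 0 = 0)
    single = p_i[p_i > 0]
    return np.sum(joint * np.log(joint)) - 2.0 * np.sum(single * np.log(single))

def first_minimum(values):
    """Index of the first interior local minimum of a sequence, or None."""
    for k in range(1, len(values) - 1):
        if values[k] < values[k - 1] and values[k] <= values[k + 1]:
            return k
    return None
```

A typical use is to evaluate `mutual_information(s, tau)` for `tau = 1, 2, ...`, pass the resulting list to `first_minimum`, and add 1 to the returned index to obtain the candidate delay.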

2.7 Singular spectrum analysis
Singular spectrum analysis is a technique to identify recurrent patterns and adaptively enhance the signal-to-noise ratio in a dynamical system. Consider the dynamical system $x_{n+1} = F(x_n)$. Let $S_n = S(x_n)$ ($n = 1, 2, 3, \dots, N$) be an observed scalar time series from this dynamical system. Define
$$Y_1 = (S_{N-2(m-1)}, \dots, S_{N-m+1})^T, \quad \dots, \quad Y_m = (S_{N-(m-1)}, \dots, S_N)^T,$$

where $m \ll N$. Let $Y = (Y_1, \dots, Y_m)$. Then the covariance matrix $\Sigma_{YY}$ of $Y$ is $\Sigma_{YY} = (\mathrm{Cov}(Y_i, Y_j))_{m \times m}$. Since $\Sigma_{YY}$ is a real symmetric matrix, its eigenvalues are $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m$ and the corresponding eigenvectors $e_1, e_2, \dots, e_m$ form an orthonormal basis. In general, the main recurrent patterns are just the first several eigenvectors. Let $\alpha_k = (Y, e_k)$ ($k = 1, \dots, m$). Then
$$Y = \sum_{k=1}^{m} \alpha_k e_k . \tag{2.7.1}$$
Since $e_k$ is a normalized eigenvector of $\Sigma_{YY}$,
$$\begin{aligned}
\mathrm{Var}(\alpha_k) &= e_k^T (\lambda_k e_k) = \lambda_k (e_k^T e_k) = \lambda_k && (k = 1, \dots, m), \\
\mathrm{Cov}(\alpha_k, \alpha_l) &= e_k^T \Sigma_{YY} e_l = e_k^T (\lambda_l e_l) = 0 && (k \ne l).
\end{aligned} \tag{2.7.2}$$
By (2.7.1) and (2.7.2), it follows that the total variance of $Y$ is
$$\mathrm{Var}(Y) = E[\|Y\|_2^2] = E\Big[\sum_{k=1}^{m} \alpha_k^2\Big] = \sum_{k=1}^{m} E[\alpha_k^2] = \sum_{k=1}^{m} \lambda_k ,$$
and the ratio $\lambda_L / \sum_{k=1}^{m} \lambda_k$ is defined as the contribution of $e_L$ to $Y$.
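The eigen-decomposition behind singular spectrum analysis can be sketched numerically. The example below is only an illustration under stated assumptions: the function name `ssa` is hypothetical, and the covariance matrix is estimated from all $N - m + 1$ lagged windows of the series (a common practical choice) rather than from the last windows only.

```python
import numpy as np

def ssa(s, m):
    """Lagged-covariance SSA: eigenvalues, eigenvectors, components, contributions."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    # Row r of X is the window (S_r, S_{r+1}, ..., S_{r+m-1}).
    X = np.column_stack([s[i:N - m + 1 + i] for i in range(m)])
    X = X - X.mean(axis=0)                       # centre each lagged variable
    C = X.T @ X / (X.shape[0] - 1)               # m x m sample covariance matrix
    lam, E = np.linalg.eigh(C)                   # real symmetric eigenproblem
    order = np.argsort(lam)[::-1]                # sort so lambda_1 >= ... >= lambda_m
    lam, E = lam[order], E[:, order]
    alpha = X @ E                                # alpha_k = (Y, e_k) for every window
    return lam, E, alpha, lam / lam.sum()        # last item: contribution of each e_k
```

For a noisy sinusoid, the first two entries of the returned contribution vector carry almost all of the variance, which is the sense in which the leading eigenvectors capture the recurrent pattern.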

2.8 Recurrence networks
Recurrence networks are a new approach to analyzing the recurrence properties of complex dynamical systems. For $N$ observed trajectories $S_n$ ($n = 1, \dots, N$) from a dynamical system, their time delay embedding vectors are
$$\mathbf{S}_n^T = (S_{n-(m-1)\tau}^T, \dots, S_{n-\tau}^T, S_n^T),$$
where $m$ is the embedding dimension and $\tau$ is the delay. Define a recurrence network associated with these delay vectors: each $\mathbf{S}_n$ ($n = 1, \dots, N$) is considered as a vertex. A simple approach is to introduce an edge between two vertices $\mathbf{S}_i$ and $\mathbf{S}_j$ when the correlation coefficient between $\mathbf{S}_i^T$ and $\mathbf{S}_j^T$ is larger than a given threshold. Another approach is to introduce an edge between two vertices based on the distance. For $i \ne j$, if the distance between $\mathbf{S}_i$ and $\mathbf{S}_j$ is less than a given threshold $\varepsilon$, where $\varepsilon$ is reasonably small (in particular, much smaller than the attractor diameter), then the vertices $\mathbf{S}_i$ and $\mathbf{S}_j$ are considered to be connected by an edge. In detail, for a given threshold $\varepsilon > 0$, let
$$R_{ij} = H(\varepsilon - \mathrm{distance}(\mathbf{S}_i, \mathbf{S}_j)) \quad (i, j = 1, \dots, N),$$
where $H$ is the Heaviside function
$$H(\alpha) = \begin{cases} 1 & (\alpha \ge 0), \\ 0 & (\alpha < 0), \end{cases}$$
$\mathrm{distance}(\mathbf{S}_i, \mathbf{S}_j) = \|\mathbf{S}_i - \mathbf{S}_j\|$, and $\|\cdot\|$ is the norm of the space $\mathbb{R}^m$. If a pair $i, j$ ($i \ne j$) is such that $R_{ij}(\varepsilon) = 1$, then $\mathbf{S}_i$ and $\mathbf{S}_j$ are close, and we introduce an edge between them. If $R_{ij}(\varepsilon) = 0$, then $\mathbf{S}_i$ and $\mathbf{S}_j$ are not close, and we do not introduce an edge between them. The matrix $(R_{ij}(\varepsilon))_{N \times N}$ is called a recurrence matrix. Let
$$A_{ij}(\varepsilon) = R_{ij}(\varepsilon) - \delta_{ij} \quad (i, j = 1, \dots, N), \tag{2.8.1}$$
where $\delta_{ij}$ is the Kronecker delta, i.e., $\delta_{ij} = 0$ ($i \ne j$) and $\delta_{ij} = 1$ ($i = j$). The matrix $(A_{ij})_{N \times N}$ is called an adjacency matrix. When $i \ne j$, $A_{ij}(\varepsilon) = R_{ij}(\varepsilon)$. The topological characteristics of recurrence networks can capture the fundamental properties of dynamical systems.

2.8.1 Local recurrence rate
First, we measure the importance of a vertex in a complex network. The local recurrence rate of a vertex $\mathbf{S}_\nu$ ($\nu = 1, \dots, N$) is
$$k_\nu = \sum_{i=1}^{N} A_{\nu i} \quad (A_{\nu\nu} = 0),$$
which is the number of vertices $i \ne \nu$ connected directly with $\nu$. The local connectivity is then defined as
$$\rho_\nu = \frac{1}{N-1} \sum_{i=1}^{N} A_{\nu i} .$$
Local connectivity gives the relationship between local edge density and local correlation dimension.
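The distance-based construction, from delay vectors to the recurrence matrix, the adjacency matrix, and the local recurrence rate, can be sketched as below. The function names `delay_embed` and `recurrence_network` and the use of the Euclidean norm are assumptions made for this illustration.

```python
import numpy as np

def delay_embed(s, m, tau):
    """Rows are the delay vectors (S_{n-(m-1)tau}, ..., S_{n-tau}, S_n)."""
    s = np.asarray(s, dtype=float)
    n0 = (m - 1) * tau                            # first index with a full history
    cols = [s[n0 - (m - 1 - j) * tau: len(s) - (m - 1 - j) * tau] for j in range(m)]
    return np.column_stack(cols)

def recurrence_network(V, eps):
    """Recurrence matrix R, adjacency A = R - I, degrees k, local connectivity rho."""
    V = np.asarray(V, dtype=float)
    d = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)   # pairwise distances
    R = (d <= eps).astype(int)                    # Heaviside H(eps - d) with H(0) = 1
    A = R - np.eye(len(V), dtype=int)             # remove self-loops, as in (2.8.1)
    k = A.sum(axis=1)                             # local recurrence rate k_nu
    rho = k / (len(V) - 1)                        # local connectivity rho_nu
    return R, A, k, rho
```

The pairwise-distance array needs memory proportional to the square of the number of vertices, so for long series a neighbour search structure would be a more economical choice.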

2.8.2 Global recurrence rate (edge density)
We discuss the mean of all local recurrence rates
$$\bar{k} = \frac{1}{N} \sum_{l=1}^{N} k_l = \frac{1}{N} \sum_{i=1}^{N} \sum_{l=1}^{N} A_{li} .$$
Note that the correlation coefficients are symmetric, so $R_{ij} = R_{ji}$ and $R_{ii} = 1$. By (2.8.1), $A_{li} = A_{il}$ and $A_{ll} = 0$. This implies that
$$\bar{k} = \frac{2}{N} \sum_{i=1}^{N} \sum_{l=1}^{i-1} A_{il} .$$
Define the global recurrence rate $\bar{\rho}$ as $\bar{\rho} = \frac{1}{N} \sum_{l=1}^{N} \rho_l$. Furthermore,
$$\bar{k} = \frac{1}{N} \sum_{l=1}^{N} (N-1) \rho_l = (N-1) \bar{\rho},$$
and so
$$\bar{\rho} = \frac{2}{N(N-1)} \sum_{i<l} A_{il} = \frac{2}{N(N-1)} \sum_{i<l} \big( H(\varepsilon - \mathrm{distance}(\mathbf{S}_i, \mathbf{S}_l)) - \delta_{il} \big) .$$

[...]

[...] $\epsilon > 0$, there exists a polynomial $P(x)$ such that $|P(x) - f(x)| < \epsilon$ ($x \in [a, b]$). In fact, if $f \in C[0, 1]$, then the sequence of Bernstein polynomials of $f$,
$$(B_n f)(x) = \sum_{k=0}^{n} f\Big(\frac{k}{n}\Big) \frac{n!}{k!(n-k)!} x^k (1-x)^{n-k} \quad (n \in \mathbb{Z}_+),$$
converges uniformly to $f(x)$ on $[0, 1]$ as $n \to \infty$.
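The Bernstein polynomials are easy to evaluate directly from their definition, which makes the uniform convergence visible numerically. The following is a small sketch; the function name `bernstein` is an assumption of this example.

```python
import numpy as np
from math import comb

def bernstein(f, n, x):
    """Evaluate (B_n f)(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k) for x in [0, 1]."""
    k = np.arange(n + 1)
    binom = np.array([comb(n, j) for j in k], dtype=float)
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]   # column of evaluation points
    terms = f(k / n) * binom * x**k * (1.0 - x)**(n - k)
    return terms.sum(axis=1)
```

Comparing `bernstein(f, n, x)` with `f(x)` on a grid for increasing `n` shows the maximum error shrinking, although slowly: the convergence of Bernstein polynomials is uniform but not fast.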

3.3 Polynomial approximation


Let $f \in C([a, b])$. Denote by $H_n$ the set of polynomials of degree $\le n$. Then the best approximation error of $f$ in $H_n$ is defined as
$$E_n(f) = \inf_{P \in H_n} \Big\{ \max_{a \le x \le b} |P(x) - f(x)| \Big\} .$$
If $\tilde{P} \in H_n$ and $E_n(f) = \max_{a \le x \le b} |\tilde{P}(x) - f(x)|$, then $\tilde{P}$ is called the best approximation polynomial of $f$ in $H_n$. The best approximation polynomial exists and is unique. It is characterized as follows. Let $f \in C([a, b])$. Then $Q(x)$ is the best approximation polynomial of $f$ in $H_n$ if and only if there exist $n + 2$ points $a \le x_1 < x_2 < \dots < x_{n+2} \le b$ such that
$$\begin{aligned}
|Q(x_k) - f(x_k)| &= \max_{a \le x \le b} |Q(x) - f(x)| && (k = 1, \dots, n + 2), \\
Q(x_k) - f(x_k) &= -(Q(x_{k+1}) - f(x_{k+1})) && (k = 1, \dots, n + 1).
\end{aligned}$$

This characterization shows that if $Q(x)$ is the best approximation polynomial, then the deviation $Q(x) - f(x)$ is an undamped oscillation. It gives a numerical method for finding the best approximation polynomial. Suppose that $f \in C([a, b])$, its derivative exists on $[a, b]$, and its $n$-th best approximation polynomial is $P_n(x) = \sum_{k=0}^{n} \alpha_k x^k$. Then the best approximation error $E_n$, the best approximation polynomial $P_n(x)$, and the $n + 2$ deviation points $y_1, \dots, y_{n+2}$ should satisfy the following $2n + 4$ equations:
$$(f(y_k) - P_n(y_k))^2 = (E_n)^2, \qquad (y_k - a)(y_k - b)(f'(y_k) - P_n'(y_k)) = 0 \quad (k = 1, 2, \dots, n + 2). \tag{3.3.1}$$
To solve this system of equations, Remez gave the following numerical procedure.
(a) Take initial values $a \le y_1 < y_2 < \dots < y_{n+2} \le b$.
(b) Solve for the unknowns $a_0, a_1, \dots, a_n$ and $E_n$ from the system of linear equations
$$a_0 + a_1 y_k + \dots + a_n y_k^n - (-1)^k E_n = f(y_k) \quad (k = 1, 2, \dots, n + 2).$$
This gives the polynomial $P_n^0(x) = \sum_{k=0}^{n} a_k^0 x^k$ and the error $E_n^0$.
(c) Find the extreme points $x_1, x_2, \dots, x_{n+2}$ of $f(x) - P_n^0(x)$, where $a \le x_1 < x_2 < \dots < x_{n+2} \le b$.
(d) Replace $y_k$ by $x_k$ in (b), solve for $a_0^1, a_1^1, \dots, a_n^1$ and $E_n^1$, and obtain the polynomial $P_n^1(x)$. Again, take the extreme points $x_1^1, x_2^1, \dots, x_{n+2}^1$ of $f(x) - P_n^1(x)$, where $a \le x_1^1 < x_2^1 < \dots < x_{n+2}^1 \le b$.
Continuing this procedure, the obtained coefficients $a_0^\nu, a_1^\nu, \dots, a_n^\nu$ and $E_n^\nu$ satisfy
$$a_k^\nu \to \alpha_k, \qquad E_n^\nu \to E_n \quad (\nu \to \infty).$$
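Steps (a) to (d) can be sketched in a few lines. This is a simplified illustration rather than a production implementation: the function name `remez` is hypothetical, the initial points are taken as Chebyshev extrema, and the extreme points in step (c) are located on a dense grid by taking the largest deviation between consecutive sign changes of the error.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

def remez(f, a, b, n, iters=20, grid=4001):
    """Approximate the degree-n best uniform approximation of f on [a, b]."""
    j = np.arange(n + 2)
    # (a) initial points: Chebyshev extrema mapped to [a, b] (an assumption here)
    y = np.sort(0.5 * (a + b) + 0.5 * (b - a) * np.cos(np.pi * j / (n + 1)))
    x = np.linspace(a, b, grid)
    signs = (-1.0) ** np.arange(1, n + 3)          # (-1)^k for k = 1..n+2
    for _ in range(iters):
        # (b) solve a_0 + a_1 y_k + ... + a_n y_k^n - (-1)^k E = f(y_k)
        M = np.hstack([np.vander(y, n + 1, increasing=True), -signs[:, None]])
        sol = np.linalg.solve(M, f(y))
        coef, E = sol[:-1], sol[-1]
        # (c) extreme points of f - P: one per stretch of constant error sign
        err = f(x) - npoly.polyval(x, coef)
        s = np.where(err >= 0, 1, -1)
        cuts = np.flatnonzero(np.diff(s)) + 1
        segments = np.split(np.arange(grid), cuts)
        if len(segments) != n + 2:                 # exchange not possible on this grid
            break
        y_new = np.array([x[seg[np.argmax(np.abs(err[seg]))]] for seg in segments])
        if np.allclose(y_new, y):                  # (d) points have stopped moving
            break
        y = y_new
    return coef, abs(E), y
```

For example, `remez(np.exp, -1.0, 1.0, 2)` returns the coefficients in increasing powers of $x$, the levelled error, and the final deviation points. The grid-based extremum search limits the accuracy to roughly the square of the grid spacing.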

So the best approximation polynomial $P_n(x) = \sum_{k=0}^{n} \alpha_k x^k$ and the best approximation error $E_n$ are obtained. The Weierstrass theorem shows that for any $f \in C[a, b]$, the approximation error $E_n(f)$ in $H_n$ tends monotonically to zero as $n \to \infty$. The decay rate of $E_n(f)$ depends on the smoothness of $f$. Let $f^{(p)} \in \mathrm{lip}\,\alpha$ ($0 < \alpha \le 1$) on $[a, b]$. Then the best approximation of $f$ in $H_n$ satisfies
$$E_n(f) \le \frac{M}{n^{p+\alpha}},$$
where $M$ is a constant independent of $n$. The following proposition indicates that the polynomial approximation becomes better at the ends of the interval $[-1, 1]$. Let $f \in C[-1, 1]$. Then there exists a polynomial sequence $\{P_n(x)\}$ of degree $n$ such that for $-1 \le x \le 1$,
$$|f(x) - P_n(x)| \le M (\Delta_n(x))^{p+\alpha} \quad (0 < \alpha \le 1;\ p = 0, 1, \dots)$$
if and only if $f^{(p)} \in \mathrm{lip}\,\alpha$, where $M$ is a constant independent of $n$ and $x$, and
$$\Delta_n(x) = \max\Big( \frac{\sqrt{1 - x^2}}{n}, \frac{1}{n^2} \Big) .$$

3.3.2 Orthogonal polynomials
Now we consider a polynomial $P_n(x)$ of degree $n$ such that $\int_a^b (f(x) - P_n(x))^2\,dx$ is as small as possible. Such an approximation is called the square approximation by polynomials. It is different from uniform approximation. If a sequence of polynomials converges uniformly to a function, then the function is continuous. Therefore, if a function has points of discontinuity, then the function cannot be approximated uniformly by polynomials. However, a piecewise continuous function can be approximated by polynomials in the square sense. More generally, we consider the weighted square approximation. The notation $f \in L^2_\rho([a, b])$ means that the function $f$ on $[a, b]$ satisfies
$$\int_a^b \rho(x) f^2(x)\,dx < \infty,$$
where $\rho(x) \ge 0$ and $\int_a^b \rho(x)\,dx = 1$. Define the inner product of $f, g \in L^2_\rho([a, b])$ as
$$(f, g)_\rho = \int_a^b \rho(x) f(x) g(x)\,dx .$$
If $(f, g)_\rho = 0$, then $f$ and $g$ are orthogonal with weight $\rho$. Define the norm of $f \in L^2_\rho([a, b])$ as $\|f\|_\rho = (f, f)_\rho^{1/2}$.


Let $\{P_n\}_{n=0,1,\dots}$ be a system of polynomials, where $P_n$ has degree $n$, and let $\rho$ be a weight function on $[a, b]$. If, for $n, m = 0, 1, \dots$,
$$(P_n, P_m)_\rho = \int_a^b \rho(x) P_n(x) P_m(x)\,dx = 0 \quad (n \ne m),$$
then $\{P_n\}$ is said to be a system of orthogonal polynomials on $[a, b]$ with weight $\rho$. If, in addition, $(P_n, P_n)_\rho = 1$ ($n = 0, 1, \dots$), then $\{P_n\}$ is said to be a normal orthogonal polynomial system. For any weight function $\rho(x)$ on $[a, b]$, there is a normal orthogonal polynomial system $\{\omega_n(x)\}_{n=0,1,\dots}$ with weight $\rho(x)$, where $\omega_n(x)$ is a polynomial of degree exactly $n$. Any $f \in L^2_\rho([a, b])$ can be expanded into a Fourier series with respect to the system $\{\omega_n(x)\}_{n=0,1,\dots}$ as
$$f(x) = \sum_{k=0}^{\infty} c_k \omega_k(x),$$
where $c_k = \int_a^b f(x) \omega_k(x) \rho(x)\,dx$, and the Parseval identity holds:
$$\int_a^b \rho(x) f^2(x)\,dx = \sum_{k=0}^{\infty} c_k^2 .$$
Let $s_n(x)$ be its partial sum: $s_n(x) = \sum_{k=0}^{n} c_k \omega_k(x)$. Then
$$\int_a^b \rho(x) (f(x) - s_n(x))^2\,dx \to 0 \quad (n \to \infty).$$
For any polynomial $P_n(x)$ of degree $n$,
$$\int_a^b \rho(x) (f(x) - s_n(x))^2\,dx \le \int_a^b \rho(x) (f(x) - P_n(x))^2\,dx, \tag{3.3.2}$$
i.e., $s_n(x)$ is the best square approximation polynomial among all polynomials of degree $n$, and the best square approximation error is $E_n = \big( \sum_{k=n+1}^{\infty} c_k^2 \big)^{1/2}$. We introduce several important orthogonal polynomials often used in applications.

(a) Chebyshev polynomials
$$T_n(x) = \cos(n \arccos x)$$
are orthogonal polynomials with weight $(1 - x^2)^{-1/2}$ on $[-1, 1]$. So $T_0(x) = 1$ and $T_1(x) = x$, and the recurrence formula holds:
$$T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x) \quad (n \ge 1).$$
Note that
$$\int_{-1}^{1} T_n(x) T_m(x) \frac{dx}{\sqrt{1 - x^2}} = \int_0^\pi T_n(\cos\theta) T_m(\cos\theta)\,d\theta = \int_0^\pi \cos(n\theta) \cos(m\theta)\,d\theta = 0 \quad (n \ne m),$$
$$\int_{-1}^{1} T_n^2(x) \frac{dx}{\sqrt{1 - x^2}} = \int_0^\pi \cos^2(n\theta)\,d\theta = \begin{cases} \dfrac{\pi}{2} & (n \ne 0), \\ \pi & (n = 0). \end{cases}$$

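The system
$$\hat{T}_0(x) = \frac{1}{\sqrt{\pi}}, \qquad \hat{T}_n(x) = \sqrt{\frac{2}{\pi}} \cos(n \arccos x) \quad (n \in \mathbb{Z}_+)$$
is a normal orthogonal system with weight $\frac{1}{\sqrt{1 - x^2}}$ on $[-1, 1]$. If $\int_{-1}^{1} \frac{f^2(x)}{\sqrt{1 - x^2}}\,dx < \infty$, then $f$ can be expanded into the Fourier–Chebyshev series
$$f(x) = \sum_{n=0}^{\infty} c_n^T \hat{T}_n(x), \qquad \text{where} \quad c_n^T = \int_{-1}^{1} f(x) \hat{T}_n(x) \frac{dx}{\sqrt{1 - x^2}} .$$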

Let $x = \cos\theta$. Then $f(\cos\theta) = \alpha_0 + \sum_{n=1}^{\infty} \alpha_n \cos(n\theta)$, where
$$\alpha_0 = \frac{1}{\pi} \int_0^\pi f(\cos\theta)\,d\theta, \qquad \alpha_n = \frac{2}{\pi} \int_0^\pi f(\cos\theta) \cos(n\theta)\,d\theta \quad (n \in \mathbb{Z}_+).$$
This is just the Fourier cosine expansion of $f(\cos\theta)$. Chebyshev polynomials have the following properties:
$$|T_n(x)| \le 1 \quad (-1 \le x \le 1), \qquad T_n\Big(\cos\frac{k\pi}{n}\Big) = (-1)^k \quad (0 \le k \le n),$$
and their zeros are $x_k^{(n)} = \cos\frac{(2k+1)\pi}{2n}$ ($k = 0, 1, \dots, n - 1$). Among all monic polynomials $P_n(x) = x^n + \sum_{k=1}^{n} a_k x^{n-k}$, the polynomial $\frac{1}{2^{n-1}} T_n(x)$ is the one for which $\max_{|x| \le 1} |P_n(x)|$ attains the minimal value $\frac{1}{2^{n-1}}$. If we choose the zeros of the Chebyshev polynomial as the interpolation nodes, then a good interpolation approximation formula is obtained; it is given in Chapter 4.
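The recurrence formula and the cosine-series form of the Chebyshev coefficients translate directly into code. The sketch below is illustrative only: the function names `chebyshev_T` and `cosine_coefficients` are assumptions, and the integrals over $\theta$ are approximated with the midpoint rule on `N` equal subintervals.

```python
import numpy as np

def chebyshev_T(n, x):
    """Evaluate T_n(x) with the recurrence T_{n+1} = 2x T_n - T_{n-1}."""
    x = np.asarray(x, dtype=float)
    t_prev, t = np.ones_like(x), x.copy()         # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(1, n):
        t_prev, t = t, 2.0 * x * t - t_prev
    return t

def cosine_coefficients(f, n_max, N=256):
    """Approximate alpha_0, ..., alpha_{n_max} of f(cos(theta)) by the midpoint rule."""
    theta = (np.arange(N) + 0.5) * np.pi / N      # midpoints of N subintervals of [0, pi]
    g = f(np.cos(theta))
    alpha = np.array([2.0 / N * np.sum(g * np.cos(n * theta)) for n in range(n_max + 1)])
    alpha[0] /= 2.0                               # alpha_0 carries the factor 1/pi, not 2/pi
    return alpha
```

For $f(x) = x^2 = \frac{1}{2}(T_0(x) + T_2(x))$ the computed coefficients are $\alpha_0 = \alpha_2 = \frac{1}{2}$ with all others zero up to rounding, which gives a quick consistency check.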


(b) Legendre polynomials
$$P_n(x) = \frac{1}{2^n n!} \frac{d^n}{dx^n} (x^2 - 1)^n$$
are orthogonal polynomials with the weight $\rho(x) = 1$ on $[-1, 1]$, and $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = \frac{1}{2}(3x^2 - 1)$, and the recurrence formula holds:
$$P_{n+1}(x) = \frac{2n + 1}{n + 1} x P_n(x) - \frac{n}{n + 1} P_{n-1}(x) \quad (n \ge 1).$$
The Legendre polynomial $P_n(x)$ has $n$ real zeros $x_\nu = \cos\theta_\nu$, where $\frac{2\nu - 1}{2n + 1}\pi \le \theta_\nu \le \frac{2\nu}{2n + 1}\pi$ ($\nu = 1, \dots, n$). If we choose the zeros of a Legendre polynomial as nodes, a good numerical integration formula is obtained; it is given in Chapter 6. The system

$$\hat{P}_0(x) = \frac{1}{\sqrt{2}} P_0(x), \qquad \hat{P}_n(x) = \sqrt{\frac{2n + 1}{2}}\, P_n(x) \quad (n \in \mathbb{Z}_+)$$
is a normal orthogonal system on $[-1, 1]$ with the weight $\rho(x) = 1$. If $f \in L^2([-1, 1])$, then $f$ can be expanded into the Fourier–Legendre series
$$f(x) = \sum_{n=0}^{\infty} c_n^{(P)} \hat{P}_n(x), \qquad \text{where} \quad c_n^{(P)} = \int_{-1}^{1} f(x) \hat{P}_n(x)\,dx .$$
Its partial sums $s_n^{(P)}(f; x) = \sum_{k=0}^{n} c_k^{(P)} \hat{P}_k(x)$ satisfy
$$\int_{-1}^{1} (f(x) - s_n^{(P)}(f; x))^2\,dx \to 0 \quad (n \to \infty).$$
From (3.3.2), it is seen that $s_n^{(P)}(f; x)$ is the best square approximation polynomial among all polynomials of degree $n$, and the best square approximation error is
$$e_n(f) = \Big( \sum_{k=n+1}^{\infty} (c_k^{(P)})^2 \Big)^{1/2} .$$
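The Fourier–Legendre coefficients and the resulting best square approximation can be sketched with NumPy's Legendre utilities. The function names below are assumptions of this example, the integrals are approximated by Gauss–Legendre quadrature, and the error is estimated from a truncated tail of the coefficient sequence rather than the full infinite sum.

```python
import numpy as np
from numpy.polynomial import legendre as L

def legendre_coefficients(f, n_max, quad_points=64):
    """c_k = integral of f(x) * Phat_k(x) over [-1, 1], for k = 0..n_max."""
    x, w = L.leggauss(quad_points)                # Gauss-Legendre nodes and weights
    P = L.legvander(x, n_max)                     # P[:, k] = P_k(x)
    norm = np.sqrt((2 * np.arange(n_max + 1) + 1) / 2.0)   # Phat_k = norm_k * P_k
    return (w * f(x)) @ (P * norm)

def partial_sum(c, x):
    """Evaluate s_n(x) = sum_k c_k Phat_k(x) with n = len(c) - 1."""
    norm = np.sqrt((2 * np.arange(len(c)) + 1) / 2.0)
    return L.legval(x, c * norm)

def tail_error(c, n):
    """Square approximation error of s_n, using only the computed coefficients."""
    return np.sqrt(np.sum(c[n + 1:] ** 2))
```

For $f(x) = x^3$, only $c_1$ and $c_3$ are nonzero, and $c_1^2 + c_3^2 = \int_{-1}^{1} x^6\,dx = \frac{2}{7}$, which illustrates the Parseval identity.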

(c) Jacobi polynomials
$$J_n^{(\alpha,\beta)}(x) = (1 - x)^{-\alpha} (1 + x)^{-\beta} \frac{(-1)^n}{2^n n!} \frac{d^n}{dx^n} \big( (1 - x)^{n+\alpha} (1 + x)^{n+\beta} \big)$$
are orthogonal polynomials on $[-1, 1]$ with the weight $(1 - x)^\alpha (1 + x)^\beta$ ($\alpha, \beta > -1$).
– When $\alpha = \beta = 0$, $J_n^{(\alpha,\beta)}(x) = P_n(x)$ (Legendre polynomial).
– When $\alpha = \beta = -\frac{1}{2}$, $J_n^{(\alpha,\beta)}(x)$ equals $T_n(x)$ (Chebyshev polynomial) up to a constant factor.

(d) Laguerre polynomials and Hermite polynomials
Laguerre polynomials and Hermite polynomials are both orthogonal polynomials on infinite intervals. One uses them to solve square approximation problems on infinite intervals. The Laguerre polynomials
$$L_n(x) = \frac{e^x}{n!} \frac{d^n}{dx^n} (x^n e^{-x}) \quad (n = 0, 1, \dots)$$
are orthogonal with weight $e^{-x}$ on $[0, \infty)$, and $L_0(x) = 1$, $L_1(x) = -x + 1$, and the recurrence formula holds:
$$n L_n(x) = (-x + 2n - 1) L_{n-1}(x) - (n - 1) L_{n-2}(x) \quad (n \ge 2).$$
The Hermite polynomials
$$H_n(x) = (-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2} \quad (n = 0, 1, \dots)$$
are orthogonal with weight $e^{-x^2}$ on $(-\infty, \infty)$, and $H_0(x) = 1$, $H_1(x) = 2x$, and the recurrence formula holds:
$$H_n(x) = 2x H_{n-1}(x) - 2(n - 1) H_{n-2}(x) \quad (n \ge 2).$$
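The two three-term recurrences give a stable way to evaluate these polynomials without differentiating. A minimal sketch follows, starting from $L_0 = 1$, $L_1 = 1 - x$ and $H_0 = 1$, $H_1 = 2x$; the function names `laguerre` and `hermite` are assumptions of this example.

```python
import numpy as np

def laguerre(n, x):
    """L_n(x) from n L_n = (2n - 1 - x) L_{n-1} - (n - 1) L_{n-2}."""
    x = np.asarray(x, dtype=float)
    l_prev, l = np.ones_like(x), 1.0 - x          # L_0 and L_1
    if n == 0:
        return l_prev
    for k in range(2, n + 1):
        l_prev, l = l, ((2 * k - 1 - x) * l - (k - 1) * l_prev) / k
    return l

def hermite(n, x):
    """H_n(x) from H_n = 2x H_{n-1} - 2(n - 1) H_{n-2}."""
    x = np.asarray(x, dtype=float)
    h_prev, h = np.ones_like(x), 2.0 * x          # H_0 and H_1
    if n == 0:
        return h_prev
    for k in range(2, n + 1):
        h_prev, h = h, 2.0 * x * h - 2.0 * (k - 1) * h_prev
    return h
```

The values agree with `numpy.polynomial.laguerre.lagval` and `numpy.polynomial.hermite.hermval`, which use the same normalizations.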

3.4 Spline approximation and rational approximation
In this section, we discuss approximation by piecewise polynomials and rational approximation. For the approximation by piecewise polynomials, we consider not only the one-dimensional case but also the high-dimensional case. For functions defined in a domain, we use differentiability to describe the smoothness. The Sobolev space is the most important and best known smoothness space. For $1 \le p < \infty$ and $r \in \mathbb{Z}_+$, define the Sobolev space $W^r(L_p(\Omega))$ as the set of all functions $f$ on $\Omega$ which have all derivatives $D^\nu f$ ($|\nu| \le r$) in $L_p(\Omega)$, i.e.,
$$\int_\Omega |D^\nu f|^p\,dt < \infty \quad (|\nu| \le r),$$
where
$$D^\nu f = \frac{\partial^{|\nu|} f}{\partial t_1^{\nu_1} \cdots \partial t_d^{\nu_d}} .$$
Here $\nu = (\nu_1, \dots, \nu_d)$ and $|\nu| = \nu_1 + \dots + \nu_d$.

For $p = \infty$ and $r \in \mathbb{Z}_+$, define the Sobolev space $W^r(L_\infty(\Omega))$ as the set of all functions $f$ on $\Omega$ all of whose derivatives $D^\nu f$ ($|\nu| \le r$) are continuous on $\Omega$ and satisfy $\max_{t \in \Omega} |D^\nu f(t)| < \infty$ ($|\nu| \le r$), i.e., $W^r(L_\infty(\Omega)) = C^r(\Omega)$.

3.4.1 Approximation by piecewise polynomial based on fixed partitions
A partition $\Delta$ of $\Omega$ is a finite set $\Delta = \{C\}$ of subdomains $C$ (such as polyhedrons) which are pairwise disjoint and whose union is $\Omega$. Define a polynomial of degree $r$ on each subdomain $C$ to obtain a piecewise polynomial relative to the partition $\Delta$. Denote the set of these piecewise polynomials by $S^r(\Delta)$. The approximation error of $f$ by elements of $S^r(\Delta)$ is defined as
$$e_\Delta(f)_p := \inf_{S \in S^r(\Delta)} \Big( \int_\Omega |f(t) - S(t)|^p\,dt \Big)^{1/p} \quad (p < \infty), \qquad e_\Delta(f)_\infty := \inf_{S \in S^r(\Delta)} \Big( \max_{t \in \Omega} |f(t) - S(t)| \Big) .$$
Let $\Omega = [0, 1]^d$ and let $S^r(\Delta)$ be the space of piecewise polynomials of order $r$ relative to $\Delta$, where $\Delta = \{C\}$ is a partition of $\Omega$. Denote $\mathrm{diam}(\Delta) := \max_{C \in \Delta} (\mathrm{diam}(C))$. Then, for $f \in W^r(L_p(\Omega))$, the approximation error $e_\Delta(f)_p$ in the space $S^r(\Delta)$ satisfies
$$e_\Delta(f)_p \le L_{r,p} (\mathrm{diam}(\Delta))^r \sum_{|\nu| = r} \Big( \int_\Omega |D^\nu f|^p\,dt \Big)^{1/p},$$
where $L_{r,p}$ is a constant independent of $f$ and $\Delta$. In the one-dimensional case, suppose that $f$ is an $r$-order continuously differentiable function on $[0, 1]$, i.e., $f \in C^r([0, 1])$. Let $T : 0 = t_0 < t_1 < \dots < t_N = 1$ be a partition of $[0, 1]$ and $S^r(T)$ be the set of piecewise polynomials of order $r$ relative to the partition $T$. Then the approximation error of $f$ by elements of $S^r(T)$,
$$e_T(f)_\infty := \inf_{S \in S^r(T)} \Big( \max_{0 \le t \le 1} |f(t) - S(t)| \Big),$$
satisfies
$$e_T(f)_\infty \le L_r \max_{0 \le t \le 1} |f^{(r)}(t)|\, \delta_T^r,$$

where $\delta_T = \max_{0 \le k < N} (t_{k+1} - t_k)$ [...]

[...] $> 0$ such that the approximation error
$$\sigma_N(f) = \inf_{S \in \Sigma_N} \Big( \max_{0 \le t \le 1} |f(t) - S(t)| \Big)$$
satisfies $\sigma_N(f) = O\big(\frac{1}{N}\big)$. However, if $f \in \mathrm{lip}\,\alpha$ and we approximate $f$ by piecewise polynomials based on a fixed equally spaced partition $T$ with $N$ pieces, the approximation error $e_T(f)$ cannot attain $O\big(\frac{1}{N^\alpha}\big)$. In applications, if $f$ has different smoothness on each subinterval, we may choose a partition depending on $f$ such that there are more nodes where $f$ is not smooth and fewer nodes where $f$ is smooth. This gives a good approximation. Consider piecewise polynomial approximation with not only free knots but also free degrees of the polynomial pieces. The theory for this type of approximation is far from complete. Let $\Sigma_N^*$ denote the set of all piecewise polynomials $S = \sum_{I \in \Delta} P_I \chi_I$, where $\Delta$ is a partition and for each $I \in \Delta$ there is a polynomial $P_I$ of degree $r_I$ with $\sum_{I \in \Delta} r_I \le N$. Denote
$$\sigma_N^*(f)_p = \inf_{S \in \Sigma_N^*} \Big( \int_0^1 |f(t) - S(t)|^p\,dt \Big)^{1/p} .$$
Clearly, $\sigma_{Nr}^*(f)_p \le \sigma_{N,r}(f)_p$. For example, when $f(x) = x^\beta$ ($x \in [0, 1]$, $\beta > 0$),
$$\sigma_{N,r}(f)_p \approx \frac{1}{N^r}, \qquad \sigma_N^*(f)_p \le C e^{-(\sqrt{2} - 1)\sqrt{N}} .$$

3.4.3 Rational approximation
Let $R_n(\mathbb{R}^d)$ denote the space of rational functions in $d$ variables. Each $R \in R_n$ is a quotient $R = P/Q$, where $P$ and $Q$ are two $d$-variate polynomials of total degree $\le n$. Define the approximation error as
$$r_n(f)_p := \inf_{R \in R_n} \|f - R\|_{L_p(\Omega)} .$$
For many functions with few singularities, rational approximation is very efficient. For example, if $f(t) = |t|$ on $[-1, 1]$ is approximated by a polynomial of degree $\le n$, the approximation error is $E_n(f)_\infty \approx \frac{1}{n}$. However, Newman showed that if it is approximated by a rational function of total degree $\le n$, the approximation error satisfies $r_n(f)_\infty \le 3e^{-\sqrt{n}}$. In fact, let
$$r_n(t) = t\, \frac{p(t) - p(-t)}{p(t) + p(-t)}, \qquad \text{where} \quad p(t) = \prod_{k=1}^{n-1} \big( t + e^{-k/\sqrt{n}} \big) .$$
A direct estimate yields the above result.

(a) Padé approximation
Suppose that $f(x)$ can be expanded into a power series on $[-1, 1]$:
$$f(x) = c_0 + \sum_{k \in \mathbb{Z}_+} c_k x^k .$$
We approximate $f$ by a rational function $R_{mn}(x) = p_m(x)/q_n(x)$, where
$$p_m(x) = \sum_{k=0}^{m} a_k x^k, \qquad q_n(x) = \sum_{j=0}^{n} b_j x^j . \tag{3.4.2}$$
The coefficients $\{a_k\}_{k=0,\dots,m}$ and $\{b_j\}_{j=0,\dots,n}$ satisfy the system of linear equations
$$\begin{aligned}
a_0 &= c_0 b_0, \\
a_1 &= c_1 b_0 + c_0 b_1, \\
&\ \ \vdots \\
a_m &= c_m b_0 + c_{m-1} b_1 + \dots + c_{m-n} b_n, \\
0 &= c_{m+1} b_0 + c_m b_1 + \dots + c_{m-n+1} b_n, \\
0 &= c_{m+2} b_0 + c_{m+1} b_1 + \dots + c_{m-n+2} b_n, \\
&\ \ \vdots \\
0 &= c_{m+n} b_0 + c_{m+n-1} b_1 + \dots + c_m b_n .
\end{aligned}$$
This is a system of $m + n + 1$ linear equations with $m + n + 2$ unknowns. So this system has a nonzero solution, say $a_0, a_1, \dots, a_m, b_0, b_1, \dots, b_n$, and all its solutions are $ka_0, ka_1, \dots, ka_m, kb_0, kb_1, \dots, kb_n$ for $k \in \mathbb{R}$. From this and (3.4.2), the unique rational function $R_{mn}(x) = p_m(x)/q_n(x)$ of order $(m, n)$ approximating $f$ is determined. The rational function $R_{mn}(x)$ is called the Padé approximant. When $m = n$ or $m = n + 1$, the Padé approximation often gives a very accurate approximation.
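The linear system for the Padé coefficients can be solved directly once the free scaling is fixed. The sketch below is illustrative only: the function name `pade` is hypothetical, it normalizes the solution by setting $b_0 = 1$ (an assumption that requires the corresponding $n \times n$ system to be nonsingular), and coefficients $c_k$ with $k < 0$ are taken as zero.

```python
import numpy as np

def pade(c, m, n):
    """Pade coefficients (a, b) of order (m, n) from Taylor coefficients c_0..c_{m+n}."""
    c = np.asarray(c, dtype=float)
    if len(c) < m + n + 1:
        raise ValueError("need at least m + n + 1 Taylor coefficients")

    def cc(k):
        return c[k] if k >= 0 else 0.0            # c_k = 0 for negative k

    b = np.ones(n + 1)                            # b_0 = 1 fixes the scaling
    if n > 0:
        # Equations c_{m+i} b_0 + c_{m+i-1} b_1 + ... + c_{m+i-n} b_n = 0, i = 1..n
        A = np.array([[cc(m + i - j) for j in range(1, n + 1)] for i in range(1, n + 1)])
        rhs = -np.array([cc(m + i) for i in range(1, n + 1)])
        b[1:] = np.linalg.solve(A, rhs)
    # Equations a_k = c_k b_0 + c_{k-1} b_1 + ... for k = 0..m
    a = np.array([sum(cc(k - j) * b[j] for j in range(min(k, n) + 1)) for k in range(m + 1)])
    return a, b
```

For the exponential series $c_k = 1/k!$ with $m = n = 2$ this gives $p_2(x) = 1 + \frac{x}{2} + \frac{x^2}{12}$ and $q_2(x) = 1 - \frac{x}{2} + \frac{x^2}{12}$.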

(b) Maehly approximation
Replace the power series expansion in the Padé approximation by the Chebyshev series expansion to obtain the Maehly approximation. Suppose that a function $f(x)$ can be expanded into a Chebyshev series uniformly and absolutely as follows:
$$f(x) = \sum_{k=0}^{\infty} c_k T_k(x),$$
where $T_k(x)$ are Chebyshev polynomials, and approximate $f(x)$ by a rational function $R_{mn}(x) = p_m(x)/q_n(x)$ of degree $(m, n)$, where
$$p_m(x) = \sum_{k=0}^{m} a_k T_k(x), \qquad q_n(x) = \sum_{j=0}^{n} b_j T_j(x) .$$
Then the coefficients $\{a_k\}_{k=0,\dots,m}$ and $\{b_j\}_{j=0,\dots,n}$ are the nonzero solution of the system of linear equations
$$a_0 = b_0 c_0 + \frac{1}{2} \sum_{\nu=1}^{n} b_\nu c_\nu, \qquad a_k = b_0 c_k + \frac{1}{2} b_k c_0 + \frac{1}{2} \sum_{\nu=1}^{n} b_\nu (c_{\nu+k} + c_{|\nu-k|}) \quad (k = 1, 2, \dots, m + n).$$
Padé approximation is based on the power series expansion, so it is sometimes accurate only in a neighborhood of the center of the power series. In comparison, the Maehly approximation generally performs better than the Padé approximation. The difference is especially large when the power series expansion converges slowly while the Chebyshev series expansion converges quickly.

3.5 Wavelet approximation
Wavelet theory provides a simple and powerful decomposition of the target function into a series. Multiresolution approximation and the N-term approximation of wavelet series are often used. Moreover, these algorithms can be generalized easily to the high-dimensional case.

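3.5.1 Fourier transforms and wavelet transforms
The Fourier transform of $f \in L(\mathbb{R})$ is defined as
$$\hat{f}(\omega) = \int_{\mathbb{R}} f(t)\, e^{-it\omega}\,dt \quad (\omega \in \mathbb{R}).$$
If $\hat{f} \in L(\mathbb{R})$, the inverse transform is
$$f(t) = \frac{1}{2\pi} \int_{\mathbb{R}} \hat{f}(\omega)\, e^{it\omega}\,d\omega \quad (t \in \mathbb{R}).$$
For example, the Fourier transform of $f(t) = \chi_{[-1,1]}(t)$ is $\hat{f}(\omega) = \frac{2 \sin\omega}{\omega}$, where $\chi_{[-1,1]}$ is the characteristic function of $[-1, 1]$; the Fourier transform of $f(t) = e^{-t^2}$ is $\hat{f}(\omega) = \sqrt{\pi}\, e^{-\omega^2/4}$.

The Fourier transform has the following properties:
– If $f \in L^1(\mathbb{R})$, then $\lim_{|\omega| \to \infty} \hat{f}(\omega) = 0$.
– If $f^{(m)}(t)$ is continuous on $\mathbb{R}$ and $f^{(k)} \in L^1(\mathbb{R})$ for each $k \le m$, then $\hat{f}(\omega) = O\big( \frac{1}{|\omega|^m} \big)$.
– Operational rules:
$$(f(t + \alpha))^\wedge = e^{i\alpha\omega} \hat{f}(\omega), \qquad (e^{ibt} f(t))^\wedge = \hat{f}(\omega - b), \qquad (f(\lambda t))^\wedge = \frac{1}{|\lambda|} \hat{f}\Big( \frac{\omega}{\lambda} \Big) .$$
– Convolution formula: $(f * g)^\wedge(\omega) = \hat{f}(\omega) \hat{g}(\omega)$, where the convolution product of $f, g \in L^1(\mathbb{R})$ is defined as $(f * g)(t) = \int_{\mathbb{R}} f(t - x) g(x)\,dx$.

The space $L^2(\mathbb{R})$ is the set of functions $f$ satisfying $\int_{\mathbb{R}} |f(t)|^2\,dt < \infty$. In the space $L^2(\mathbb{R})$:
– The inner product is defined as $(f, g) = \int_{\mathbb{R}} f(t) \bar{g}(t)\,dt$.
– The norm is defined as $\|f\|_{L^2} = \big( \int_{\mathbb{R}} |f(t)|^2\,dt \big)^{1/2}$. If $\|f_n - f\|_{L^2} \to 0$ ($n \to \infty$), we say $\lim_{n \to \infty} f_n = f$.
– The Fourier transform is defined as $\hat{f}(\omega) = \lim_{R \to \infty} \int_{-R}^{R} f(t)\, e^{-i\omega t}\,dt$.
– Parseval identity: with the above normalization of the Fourier transform, $\|\hat{f}\|_{L^2} = \sqrt{2\pi}\, \|f\|_{L^2}$.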

Fourier transform of a signal can provide only global frequency information. To obtain the frequency content of the signal as it evolves with time, one introduces the wavelet transform.

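A wavelet is a damped function $\psi \in L^2(\mathbb{R})$ with zero average. The wavelet transform of $f \in L^2(\mathbb{R})$ is defined by
$$(W_\psi f)(a, b) = \frac{1}{\sqrt{|a|}} \int_{\mathbb{R}} f(t)\, \bar{\psi}\Big( \frac{t - b}{a} \Big)\,dt \quad (a \ne 0,\ b \in \mathbb{R}),$$
where $\bar{\psi}$ means the conjugate of $\psi$. Its inverse formula is
$$f(t) = \frac{1}{c_\psi} \iint_{\mathbb{R}^2} (W_\psi f)(a, b)\, \frac{1}{\sqrt{|a|}}\, \psi\Big( \frac{t - b}{a} \Big) \frac{da}{a^2}\,db \quad (t \in \mathbb{R}),$$
where $c_\psi = \int_{\mathbb{R}} \frac{|\hat{\psi}(\omega)|^2}{|\omega|}\,d\omega < \infty$. In applications, the Morlet wavelet $\psi^M(t)$ and the Mexican hat wavelet $\psi^H(t)$ are often used. They have respectively the representations
$$\psi^M(t) = \pi^{-1/4}\, e^{it\theta}\, e^{-t^2/2}, \qquad \psi^H(t) = -\frac{1}{\Gamma(2.5)} (1 - t^2)\, e^{-t^2/2},$$
where $\theta$ is a parameter and $\Gamma(t)$ is the gamma function.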

3.5.2 Multiresolution analyses and wavelet bases
Let $\psi \in L^2(\mathbb{R})$. If the integral translations and dyadic dilations of the function $\psi$,
$$\psi_{mn}(t) = 2^{m/2} \psi(2^m t - n) \quad (m, n \in \mathbb{Z}),$$
form a normal orthogonal basis for $L^2(\mathbb{R})$, then $\{\psi_{mn}\}_{m,n \in \mathbb{Z}}$ is called a wavelet basis and $\psi$ is called a wavelet. For example, let
$$\psi_h(t) = \begin{cases} -1, & 0 \le t < \frac{1}{2}, \\ 1, & \frac{1}{2} \le t < 1, \\ 0, & \text{otherwise}. \end{cases} \tag{3.5.1}$$
Then the Haar system $\{2^{m/2} \psi_h(2^m t - n)\}$ ($m, n \in \mathbb{Z}$) is a normal orthogonal basis for $L^2(\mathbb{R})$. This is the simplest wavelet basis. The construction of a wavelet basis is based on multiresolution analysis (MRA). An MRA consists of a sequence $\{V_m\}_{m \in \mathbb{Z}}$ of closed subspaces in $L^2(\mathbb{R})$ satisfying
(a) $V_m \subset V_{m+1}$, $\overline{\bigcup_{m \in \mathbb{Z}} V_m} = L^2(\mathbb{R})$, $\bigcap_{m \in \mathbb{Z}} V_m = \{0\}$;
(b) $f \in V_m$ if and only if $f(2\cdot) \in V_{m+1}$ ($m \in \mathbb{Z}$);
(c) there exists a $\varphi \in V_0$ such that $\{\varphi(t - n)\}_{n \in \mathbb{Z}}$ is a normal orthogonal basis of $V_0$, where $\varphi$ is called a scaling function.
For example, for any $m \in \mathbb{Z}$, let
$$V_m^{(h)} = \Big\{ f : f \in L^2(\mathbb{R}) \text{ and is a constant in each interval } \Big( \frac{n}{2^m}, \frac{n+1}{2^m} \Big),\ n \in \mathbb{Z} \Big\} .$$

Then $\{V_m^{(h)}\}_{m \in \mathbb{Z}}$ is an MRA with scaling function $\varphi_h(t) = \chi_{[0,1]}(t)$. Another example is
$$V_m^{(s)} = \{ f : f \in L^2(\mathbb{R}),\ \hat{f}(\omega) = 0\ (|\omega| > 2^m \pi) \} \quad (m \in \mathbb{Z}),$$
where $\hat{f}$ is the Fourier transform of $f$. Then $\{V_m^{(s)}\}$ is an MRA with scaling function $\varphi_s(t) = \frac{\sin(\pi t)}{\pi t}$. One uses an MRA to construct a wavelet basis as follows.
Step 1. Find the transfer function $H$ satisfying the bi-scale equation $\hat{\varphi}(2\cdot) = H \hat{\varphi}$, where $H$ is a $2\pi$-periodic function and $\int_{-\pi}^{\pi} |H(\omega)|^2\,d\omega < \infty$.
Step 2. Expand $H(\omega)$ into the Fourier series $H(\omega) = \sum_{n \in \mathbb{Z}} c_n e^{in\omega}$, where $c_n$ are the bi-scale coefficients.
Step 3. Find the wavelet using the wavelet formula
$$\psi(t) = -2 \sum_{n \in \mathbb{Z}} (-1)^n \bar{c}_{1-n}\, \varphi(2t - n) \qquad \text{or} \qquad \hat{\psi}(\omega) = e^{-i\omega/2}\, \bar{H}\Big( \frac{\omega}{2} + \pi \Big) \hat{\varphi}\Big( \frac{\omega}{2} \Big) .$$
Step 4. The integral translations and dyadic dilations of $\psi$ give a wavelet basis $\{\psi_{mn}(t)\}_{m,n \in \mathbb{Z}}$, where $\psi_{mn}(t) = 2^{m/2} \psi(2^m t - n)$.

For example, from the scaling function $\varphi_h(t) = \chi_{[0,1]}$, one constructs the Haar wavelet $\psi_h(t)$ (see (3.5.1)). The Haar wavelet is a discontinuous function. From the scaling function $\varphi_s(t) = \frac{\sin(\pi t)}{\pi t}$, one constructs the Shannon wavelet
$$\psi(t) = \frac{\sin 2\pi(t - \frac{1}{2}) - \sin \pi(t - \frac{1}{2})}{\pi(t - \frac{1}{2})} .$$

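The Shannon wavelet is infinitely differentiable but decays very slowly. Many good wavelets, such as Meyer wavelets, Battle–Lemarié wavelets, and Daubechies wavelets, have been constructed. Meyer wavelets and Battle–Lemarié wavelets have good smoothness and decay fast. The Fourier transform of the Meyer wavelet $\psi(t)$ is
$$\hat{\psi}(\omega) = \begin{cases} e^{-i\omega/2} \sin\Big( \dfrac{\pi}{2}\, \nu\Big( \dfrac{3}{2\pi} |\omega| - 1 \Big) \Big), & \dfrac{2}{3}\pi \le |\omega| \le \dfrac{4}{3}\pi, \\[2mm] e^{-i\omega/2} \cos\Big( \dfrac{\pi}{2}\, \nu\Big( \dfrac{3}{4\pi} |\omega| - 1 \Big) \Big), & \dfrac{4}{3}\pi \le |\omega| \le \dfrac{8}{3}\pi, \end{cases}$$
where $\nu(x)$ is an $n$-times differentiable real-valued function satisfying
$$\nu(x) = \begin{cases} 0 & (x \le 0), \\ 1 & (x \ge 1), \end{cases} \qquad \nu(x) + \nu(1 - x) = 1 \quad (x \in \mathbb{R}).$$
The Fourier transform of the Battle–Lemarié wavelet $\psi_k(t)$ of degree $k$ is
$$\hat{\psi}_k(\omega) = \Big( \frac{4}{i\omega} \Big)^k e^{-i\omega/2} \sin^{2k}\Big( \frac{\omega}{4} \Big) \left( \frac{F_k(\frac{\omega}{2} + \pi)}{F_k(\frac{\omega}{2})\, F_k(\omega)} \right)^{1/2},$$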

where $F_k(2\omega) = -\frac{\sin^{2k}\omega}{(2k-1)!} \frac{d^{2k-1}}{d\omega^{2k-1}} (\cot\omega)$. In particular, $F_1(\omega) = 1$ and $F_2(\omega) = \frac{1}{3} \sin^2\frac{\omega}{2} + \cos^2\frac{\omega}{2}$. The Daubechies wavelet has good smoothness and is compactly supported. It is introduced in Section 6.6. Let $\{\psi_{mn}\}_{m,n \in \mathbb{Z}}$ be a wavelet basis. Then $f \in L^2(\mathbb{R})$ can be expanded into a wavelet series
$$f(t) = \sum_{m,n \in \mathbb{Z}} d_{mn} \psi_{mn}(t)$$
in the $L^2$-sense, where
$$d_{mn} = (f, \psi_{mn}) = \int_{\mathbb{R}} f(t)\, \bar{\psi}_{mn}(t)\,dt$$
are called wavelet coefficients. Another wavelet expansion formula is that for $M \in \mathbb{Z}$,
$$f(t) = \sum_{n \in \mathbb{Z}} c_{Mn} \varphi_{Mn}(t) + \sum_{m=M}^{\infty} \sum_{n \in \mathbb{Z}} d_{mn} \psi_{mn}(t),$$
where $\varphi$ is the scaling function and
$$\varphi_{Mn}(t) = 2^{M/2} \varphi(2^M t - n), \qquad c_{Mn} = (f, \varphi_{Mn}) .$$
Moreover, $\{\varphi_{Mn}(t)\}_{n \in \mathbb{Z}}$ together with $\{\psi_{mn}\}_{m = M, M+1, \dots;\ n \in \mathbb{Z}}$ form a normal orthogonal basis for $L^2(\mathbb{R})$. If $f$ has few nonnegligible wavelet coefficients $d_{mn}$ and scale coefficients $c_{Mn}$, then $f$ can be approximated by the sum of a few terms in these expansions. This is very useful in data compression and noise removal. We say $\psi$ has $p$ vanishing moments if $\int_{\mathbb{R}} t^k \psi(t)\,dt = 0$ ($k = 0, \dots, p - 1$). If $f$ is smooth and the wavelet $\psi$ has many vanishing moments, the Taylor theorem shows that the fine-scale wavelet coefficients are small.

3.5.3 High-dimensional wavelets
The notion of multiresolution analysis (MRA) is generalized easily to the high-dimensional space $L^2(\mathbb{R}^d)$. An MRA consists of a sequence $\{V_m\}_{m \in \mathbb{Z}}$ of closed subspaces in $L^2(\mathbb{R}^d)$ satisfying
(a) $V_m \subset V_{m+1}$, $\overline{\bigcup_{m \in \mathbb{Z}} V_m} = L^2(\mathbb{R}^d)$, $\bigcap_{m \in \mathbb{Z}} V_m = \{0\}$;
(b) $f \in V_m$ if and only if $f(2\cdot) \in V_{m+1}$ ($m \in \mathbb{Z}$);
(c) there exists a $\varphi \in V_0$ such that $\{\varphi(t - n)\}_{n \in \mathbb{Z}^d}$ is a normal orthogonal basis of $V_0$, where $\varphi$ is called a scaling function.
One may use a high-dimensional MRA to construct a high-dimensional wavelet basis as follows.

As in the one-dimensional case, the scaling function $\varphi$ satisfies
$$\hat{\varphi}(2\omega) = H(\omega) \hat{\varphi}(\omega) \quad (\omega \in \mathbb{R}^d),$$
where $H$ is a $2\pi$-periodic function in $\mathbb{R}^d$ and $\int_{[-\pi,\pi]^d} |H(\omega)|^2\,d\omega < \infty$. Let $H_0 = H$.

Denote by $\{0, 1\}^d$ the set of vertices of $[0, 1]^d$. For example, $\{0, 1\}^2 = \{(0, 0), (0, 1), (1, 0), (1, 1)\}$. Take $2^d - 1$ functions $\{H_\mu\}_{\mu \in \{0,1\}^d \setminus \{0\}}$ with period $2\pi$ such that the $2^d \times 2^d$ matrix $(H_\mu(\omega + \pi\nu))_{\mu,\nu \in \{0,1\}^d}$ is a unitary matrix. Let $\{\psi_\mu\}_{\mu \in \{0,1\}^d \setminus \{0\}}$ be such that
$$\hat{\psi}_\mu(2\omega) = H_\mu(\omega) \hat{\varphi}(\omega) \quad (\mu \in \{0, 1\}^d \setminus \{0\}), \qquad \psi_{\mu m n}(t) = 2^{md/2} \psi_\mu(2^m t - n) .$$
Then $\{\psi_{\mu m n}\}_{\mu \in \{0,1\}^d \setminus \{0\},\ m \in \mathbb{Z},\ n \in \mathbb{Z}^d}$ form a normal orthogonal basis for $L^2(\mathbb{R}^d)$. The following is the simplest method for constructing a high-dimensional wavelet basis. Suppose that $\varphi$ is a univariate scaling function and $\psi$ is the corresponding wavelet. Denote $\psi_0 = \varphi$ and $\psi_1 = \psi$. For each nonzero vertex $e = (e_1, \dots, e_d) \in \{0, 1\}^d \setminus \{0\}$, define a $d$-variate function
$$\psi_e(t_1, \dots, t_d) = \psi_{e_1}(t_1) \cdots \psi_{e_d}(t_d) .$$
The set $\Psi = \{\psi_e,\ e \in \{0, 1\}^d \setminus \{0\}\}$ is a $d$-variate wavelet and
$$\psi_{emn}(t) = 2^{md/2} \psi_e(2^m t - n) \quad (m \in \mathbb{Z},\ n \in \mathbb{Z}^d)$$
is a $d$-dimensional wavelet basis, where $t = (t_1, \dots, t_d)$. Each $f \in L^2(\mathbb{R}^d)$ can be expanded into a wavelet series
$$f = \sum_{e \in \{0,1\}^d \setminus \{0\}} \sum_{m \in \mathbb{Z}} \sum_{n \in \mathbb{Z}^d} c_{emn} \psi_{emn},$$
where $c_{emn} = (f, \psi_{emn}) = \int_{\mathbb{R}^d} f(t)\, \bar{\psi}_{emn}(t)\,dt$. In particular, the two-dimensional wavelets are
$$\psi^{(1)}(t) = \varphi(t_1) \psi(t_2), \qquad \psi^{(2)}(t) = \psi(t_1) \varphi(t_2), \qquad \psi^{(3)}(t) = \psi(t_1) \psi(t_2),$$
where $t = (t_1, t_2) \in \mathbb{R}^2$, and
$$\psi_{mn}^{(k)}(t) = 2^m \psi^{(k)}(2^m t - n), \quad m \in \mathbb{Z},\ n \in \mathbb{Z}^2 \quad (k = 1, 2, 3)$$
is a bivariate wavelet basis.

3.5.4 Wavelet packet
The concept of the wavelet packet enables us to construct many normal orthogonal bases from an MRA. A wavelet packet is defined as follows. Let $\{V_m\}$ be a $d$-dimensional MRA with scaling function $\varphi$ and $2^d$ transfer functions $\{H_\mu\}_{\mu \in \{0,1\}^d}$. The corresponding wavelets $\{\psi_\mu\}$ satisfy
$$\hat{\psi}_\mu(\omega) = H_\mu\Big( \frac{\omega}{2} \Big) \hat{\varphi}\Big( \frac{\omega}{2} \Big), \quad \mu \in \{0, 1\}^d \setminus \{0\} .$$
Define $\psi_{\nu_1, \dots, \nu_j}$ such that
$$\hat{\psi}_{\nu_1, \dots, \nu_j}(\omega) = \hat{\varphi}\Big( \frac{\omega}{2^j} \Big) \prod_{i=1}^{j} H_{\nu_i}\Big( \frac{\omega}{2^i} \Big),$$
where each $\nu_i \in \{0, 1\}^d$ ($i = 1, \dots, j$). Let $\nu_i$ be the $\tilde{\nu}_{l_i}$-th vertex in $\{0, 1\}^d$. Denote
$$l = \tilde{\nu}_{l_1} + 2^d \tilde{\nu}_{l_2} + \dots + 2^{(j-1)d} \tilde{\nu}_{l_j},$$
where $\tilde{\nu}_1, \tilde{\nu}_2, \dots, \tilde{\nu}_{2^d}$ is an arbitrary enumeration of $\{0, 1\}^d$. Then $l$ corresponds one-to-one to the set $\nu_1, \dots, \nu_j$. Denote $\omega_l(t) = \psi_{\nu_1, \dots, \nu_j}(t)$. The system
$$\{ 2^{md/2} \omega_l(2^m t - k) \}_{l, m \in \mathbb{Z}_+^*,\ k \in \mathbb{Z}^d}$$
is called a wavelet packet, where $\mathbb{Z}_+^* = \mathbb{Z}_+ \cup \{0\}$. Let $S$ be a set of pairs $(l, m)$ ($l, m \in \mathbb{Z}_+^*$). The question is how to choose $S$ from a wavelet packet such that $\{2^{md/2} \omega_l(2^m t - k)\}_{(l,m) \in S,\ k \in \mathbb{Z}^d}$ form a normal orthogonal basis for $L^2(\mathbb{R}^d)$. The answer is as follows. Denote
$$I_{lm} = \{ \tau \in \mathbb{Z}_+^* : 2^{md} l \le \tau < 2^{md}(l + 1) \}, \quad \text{where } l \in \mathbb{Z}_+^*,\ m \in \mathbb{Z}_+^* .$$

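Then
$$\{ 2^{md/2} \omega_l(2^m t - k) \}_{(l,m) \in S,\ k \in \mathbb{Z}^d}$$
is a normal orthogonal basis for $L^2(\mathbb{R}^d)$ if and only if the set $S \subset \mathbb{Z}_+^* \times \mathbb{Z}_+^*$ is such that $\{I_{lm}\}_{(l,m) \in S}$ forms a partition of $\mathbb{Z}_+^*$. For example, let $\tau \in \mathbb{Z}_+^*$ and
$$S = \big\{ (l, m) : (l, m) \in \big( [0, 2^{\tau d}) \times \{0\} \big) \cup \big( [2^{\tau d}, 2^{(\tau+1)d}) \times \mathbb{Z}_+^* \big) \big\} .$$
Then $\{I_{lm}\}_{(l,m) \in S}$ is a disjoint covering of $\mathbb{Z}_+^*$. In fact, since the $I_{lm}$ ($(l, m) \in S$) are disjoint, from
$$\bigcup_{2^{\tau d} \le l < 2^{(\tau+1)d}} I_{lm} = [2^{(\tau+m)d}, 2^{(\tau+m+1)d}) \cap \mathbb{Z}_+^*$$
[...]

[...] $> 0$ such that for all $\eta > 0$, $\#(\Lambda_\eta(f)) \le M_f^\tau \eta^{-\tau}$, where $M_f$ is a constant depending only on $f$, then for a given $\varepsilon > 0$, the thresholding operator
$$T_\varepsilon(f) = \sum_{(m,n) \in \Lambda_\varepsilon(f)} c_{mn}(f) \psi_{mn} = \sum_{|c_{mn}| > \varepsilon} c_{mn}(f) \psi_{mn}$$
satisfies
$$\|f - T_\varepsilon(f)\|_{L^2(\mathbb{R})} \le C M_f^{\tau/2} \varepsilon^{1 - \tau/2},$$
where $C$ is an absolute constant. When $\varepsilon = M_f N^{-1/\tau}$, $\#\Lambda_\varepsilon(f) \le N$, and so the N-term approximation is estimated by
$$\|f - T_\varepsilon(f)\|_{L^2(\mathbb{R})} \le C M_f N^{\frac{1}{2} - \frac{1}{\tau}} .$$
This result is easily extended to $L^p$-approximation. The thresholding operator is unstable. To improve it, the soft thresholding operator $s_\varepsilon(t)$ is introduced by
$$s_\varepsilon(t) := \begin{cases} 0 & (|t| < \varepsilon), \\ 2(|t| - \varepsilon)\, \mathrm{sgn}\, t & (\varepsilon \le |t| \le 2\varepsilon), \\ t & (|t| > 2\varepsilon). \end{cases}$$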

The soft thresholding operator s ε (t) has the same approximation properties as the thresholding operator T ε .
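The two thresholding rules above can be sketched on a finite array of coefficients. This is only an illustration: the function names `hard_threshold` and `soft_threshold` are hypothetical, and the coefficients are assumed to be real numbers stored in a NumPy array.

```python
import numpy as np

def hard_threshold(c, eps):
    """Keep coefficients with |c| > eps, set the others to zero (T_eps)."""
    c = np.asarray(c, dtype=float)
    return np.where(np.abs(c) > eps, c, 0.0)

def soft_threshold(c, eps):
    """Piecewise rule s_eps: 0, then a linear ramp, then the identity."""
    c = np.asarray(c, dtype=float)
    a = np.abs(c)
    ramp = 2.0 * (a - eps) * np.sign(c)   # used for eps <= |t| <= 2*eps
    return np.where(a < eps, 0.0, np.where(a <= 2.0 * eps, ramp, c))
```

Unlike the hard rule, the ramp makes $s_\varepsilon$ continuous at $|t|=\varepsilon$ and $|t|=2\varepsilon$, so a small perturbation of a coefficient produces only a small change in the output.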

(c) Data compression

In lossy compression, the original data are replaced by an approximation. We choose a multivariate scaling function $\varphi$ and then expand the original data $I$ into a series

$$I(t) \sim \sum_{n\in\mathbb Z^d} c_{mn}\,\varphi(2^m t-n).$$

Using the decomposition formula of the pyramid algorithm, we obtain a wavelet expansion

$$I(t) \sim \sum_{n\in\mathbb Z^d} c_{0n}\,\varphi(t-n) + \sum_{k=0}^{m-1}\;\sum_{e\in\{0,1\}^d\setminus\{0\}}\;\sum_{n\in\mathbb Z^d} d_{ekn}\,\psi_{ekn}(t),$$

and then we use thresholding to give a compressed file $\{\tilde d_{ekn}\}$ of wavelet coefficients. The compressed coefficient file is further compressed using a lossless encoder. From the encoded compressed file of wavelet coefficients, we use a decoder and then use the synthesis formula of the pyramid algorithm to give the reconstructed data.

3.6 Greedy algorithms

In this section we take the Hilbert space $H$ to be $L^2(\mathbb R^d)$. An arbitrary subset of the Hilbert space is called a dictionary. In applications, one considers the $N$-term approximation from the dictionary $D$, i.e., the approximation of $f\in H$ by a linear combination of $N$ terms of the dictionary $D$.

In time-frequency analysis, the translations of the Gabor functions $g_{\alpha,\beta}(t) := e^{i\alpha t}e^{-\beta t^2}$ generate a dictionary $D$ in $L^2(\mathbb R)$, i.e.,

$$D := \{\,g_{\alpha,\beta}(t-\gamma) : \alpha,\beta,\gamma\in\mathbb R\,\}.$$

In an MRA of $L^2(\mathbb R^d)$, the wavelet packet $\{\omega_l\}_{l=0,1,\dots}$ has been constructed in Section 3.5. The integer translations and dyadic dilations of the wavelet packet generate a dictionary $D$ in $L^2(\mathbb R^d)$, i.e.,

$$D := \{\,\omega_{l,m,n}(t) = 2^{m/2}\,\omega_l(2^m t-n),\ l=0,1,\dots;\ m,n\in\mathbb Z\,\}.$$

For a general dictionary $D$ and any $\tau>0$, $M>0$, define $K_\tau^0(D,M)$ as the set of functions $f$ satisfying

$$f = \sum_{g\in\Lambda} c_g\,g \quad (\Lambda\subset D,\ \#\Lambda<\infty), \qquad \sum_{g\in\Lambda}|c_g|^{\tau}\le M^{\tau},$$

where $\#\Lambda$ represents the cardinality of $\Lambda$. Let $K_\tau(D,M)$ be the closure of $K_\tau^0(D,M)$ in $H$, and let

$$K_\tau(D) = \bigcup_{M>0} K_\tau(D,M).$$

Define

$$|f|_{K_\tau(D)} = \inf\{\,M : f\in K_\tau(D,M)\,\}. \tag{3.6.1}$$

Greedy algorithms are often used in numerical analysis. The most commonly used greedy algorithms are as follows.

3.6.1 Pure greedy algorithm

The advantage of this algorithm is its simplicity. Let $f\in H$. The algorithm is as follows. Let $h = h(f)\in D$ be such that

$$(f,h(f)) = \sup_{h\in D}(f,h),$$

where $(f,g) = \int_{\mathbb R^d} f(t)\,\overline{g(t)}\,dt$. Define

$$U_1(f) = (f,h(f))\,h(f), \qquad R_1(f) = f-U_1(f),$$

$$U_2(f) = U_1(f)+U_1(R_1(f)), \qquad R_2(f) = f-U_2(f).$$

For each $m\ge 2$, inductively define

$$U_m(f) = U_{m-1}(f)+U_1(R_{m-1}(f)), \qquad R_m(f) = f-U_m(f).$$

Because at each iteration the algorithm approximates the residual $R_m(f)$ as well as possible by a single function from the dictionary $D$, this algorithm is called the pure greedy algorithm.

If the dictionary $D$ is generated by an orthonormal basis, then $U_m(f)$ is a best $m$-term approximation of $f$ from $D$, i.e.,

$$\sigma_m(f;D) := \inf_{s\in\Sigma_m}\|f-s\| = \|f-U_m(f)\|,$$

where $\Sigma_m$ is the set of linear combinations $s = \sum_{h\in\Lambda} c_h h$ ($\Lambda\subset D$, $\#\Lambda\le m$). However, for a general dictionary, the pure greedy algorithm only gives the following estimate. For $f\in K_1(D)$,

$$\|f-U_m(f)\| \le |f|_{K_1(D)}\,m^{-1/6} \qquad (m\in\mathbb Z_+),$$

where $|f|_{K_1(D)}$ is stated in (3.6.1).
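The iteration $U_m = U_{m-1} + U_1(R_{m-1})$ can be sketched in a finite-dimensional setting. The following is a minimal illustration under stated assumptions, not the book's implementation: $H$ is replaced by $\mathbb R^n$, the dictionary is a finite array `D` whose rows are unit vectors, and the atom is selected by the largest absolute inner product (which amounts to assuming the dictionary contains $-h$ whenever it contains $h$). The name `pure_greedy` is hypothetical.

```python
import numpy as np

def pure_greedy(f, D, m):
    """Run m steps of the pure greedy algorithm.

    f : vector of length dim
    D : array of shape (n_atoms, dim); rows are assumed to have unit norm
    Returns the approximant U_m and the residual R_m.
    """
    f = np.asarray(f, dtype=float)
    D = np.asarray(D, dtype=float)
    U = np.zeros_like(f)
    R = f.copy()
    for _ in range(m):
        inner = D @ R                      # inner products (R, h) for all atoms
        j = int(np.argmax(np.abs(inner)))  # best single atom for the residual
        U = U + inner[j] * D[j]            # U_m = U_{m-1} + (R, h) h
        R = f - U                          # R_m = f - U_m
    return U, R
```

With an orthonormal dictionary such as `np.eye(3)`, the first $m$ steps pick the $m$ largest coefficients of $f$, which matches the best $m$-term approximation property stated above.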

3.6.2 Relaxed greedy algorithm

The relaxed greedy algorithm is an improvement of the pure greedy algorithm. Let $f\in H$ and let $D$ be any dictionary of $H$. The relaxed greedy algorithm is as follows. Define

$$R_0^{(r)}(f) = f, \qquad U_0^{(r)}(f) = 0,$$

$$U_1^{(r)}(f) = U_1(f), \qquad R_1^{(r)}(f) = R_1(f),$$

where $U_1(f)$ and $R_1(f)$ are stated in Section 3.6.1. For a function $g\in H$, let $h = h(g)\in D$ be such that $(g,h) = \sup_{\tau\in D}(g,\tau)$. Inductively define

$$U_m^{(r)}(f) = \left(1-\frac1m\right)U_{m-1}^{(r)}(f) + \frac1m\,h\big(R_{m-1}^{(r)}(f)\big), \qquad R_m^{(r)}(f) = f-U_m^{(r)}(f).$$

The relaxed greedy algorithm gives the following estimate. For $f\in K_1(D)$,

$$\|f-U_m^{(r)}(f)\| \le C\,m^{-1/2} \qquad (m\in\mathbb Z_+),$$

where $C$ is a constant. In this estimate, the approximation order is $m^{-1/2}$, so the relaxed greedy algorithm has better approximation properties than the pure greedy algorithm.

3.6.3 Orthogonal greedy algorithm

The orthogonal greedy algorithm is also an improvement of the pure greedy algorithm. Let $H_0$ be a finite-dimensional subspace of the Hilbert space $H$. The orthogonal greedy algorithm is as follows. Define

$$U_0^{(0)}(f) = 0, \qquad R_0^{(0)}(f) = f.$$

For $m\ge 1$, inductively define

$$H_m = \operatorname{span}\{\,h(R_0^{(0)}(f)),\dots,h(R_{m-1}^{(0)}(f))\,\},$$

$$U_m^{(0)}(f) = P_{H_m}(f), \qquad R_m^{(0)}(f) = f-U_m^{(0)}(f),$$

where $h(g)$ is stated in Section 3.6.2 and $P_{H_m}(f)$ is the best approximation to $f$ in $H_m$. The orthogonal greedy algorithm has the following estimate:

$$\|f-U_m^{(0)}(f)\| \le |f|_{K_1(D)}\,m^{-1/2} \qquad (m\in\mathbb Z_+). \tag{3.6.2}$$

In this estimate, the approximation order is $m^{-1/2}$, so the orthogonal greedy algorithm has better approximation properties than the pure greedy algorithm. However, in this algorithm the computation of the projection $P_{H_m}$ is more expensive.

Based on the estimate (3.6.2), the $m$-term approximation from a dictionary satisfies the following result. If $f\in K_\tau(D)$ with $\frac1\tau = \alpha+\frac12$ and $\alpha\ge\frac12$, then

$$\sigma_m(f;D) := \inf_{s\in\Sigma_m}\|f-s\| \le C\,|f|_{K_\tau(D)}\,m^{-\alpha} \qquad (m\in\mathbb Z_+),$$

where $C$ depends on $\tau$ if $\tau$ is small and each $s\in\Sigma_m$ can be written in the form $s = \sum_{h\in\Lambda} c_h h$ ($\Lambda\subset D$, $\#\Lambda\le m$).
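The orthogonal greedy iteration of Section 3.6.3 can be sketched in $\mathbb R^n$ in the same spirit. This is an illustration under assumptions: the dictionary is a finite array `D` of unit row vectors, atoms are selected by the largest absolute inner product, and the projection $P_{H_m}$ is computed with a least-squares solve. The name `orthogonal_greedy` is hypothetical.

```python
import numpy as np

def orthogonal_greedy(f, D, m):
    """Run m steps of the orthogonal greedy algorithm.

    f : vector of length dim
    D : array of shape (n_atoms, dim); rows are assumed to have unit norm
    Returns U_m, R_m and the list of selected atom indices.
    """
    f = np.asarray(f, dtype=float)
    D = np.asarray(D, dtype=float)
    chosen = []
    U = np.zeros_like(f)
    R = f.copy()
    for _ in range(m):
        j = int(np.argmax(np.abs(D @ R)))   # atom h(R_{m-1})
        if j not in chosen:
            chosen.append(j)
        A = D[chosen].T                     # columns span H_m
        coef, *_ = np.linalg.lstsq(A, f, rcond=None)
        U = A @ coef                        # U_m = P_{H_m}(f)
        R = f - U                           # R_m = f - U_m
    return U, R, chosen
```

Each step re-solves a least-squares problem over all atoms selected so far, which is the extra cost of the projection mentioned above; in exchange the residual is orthogonal to every selected atom.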


4 Interpolation

Data records with equidistant time intervals are fundamental prerequisites for the development of environmental modeling, simulation and impact assessment. Long-term environmental time series usually contain missing data or data with different sampling intervals. Interpolation can be used to handle missing environmental data or to fill the intervals between two grid points so that series of measurements with small intervals are obtained. In this chapter we will discuss curve fitting, Lagrange and Hermite interpolation, spline interpolation, trigonometric interpolation, and bivariate interpolation.

4.1 Curve fitting

Given observation data $(x_k,y_k)$ ($k=1,\dots,M$), we will find a polynomial $P(x)$ of degree $N$ ($N<M$) such that the sum $\sum_{k=1}^{M}(P(x_k)-y_k)^2$ attains the minimal value. This is the so-called curve fitting problem.

4.1.1 Polynomial fitting

Let $(x_k,y_k)$ ($k=1,\dots,M$) be the observation data. Take a polynomial with unknown coefficients $a_0,a_1,\dots,a_N$,

$$f(x) = a_0+a_1x+\dots+a_Nx^N \qquad (N<M),$$

to fit these data. Denote

$$F(a_0,a_1,\dots,a_N) = \sum_{k=1}^{M}(f(x_k)-y_k)^2 = \sum_{k=1}^{M}(a_0+a_1x_k+\dots+a_Nx_k^N-y_k)^2.$$

For $\nu=0,\dots,N$, let $\frac{\partial F}{\partial a_\nu}=0$. Then

$$\sum_{k=1}^{M}(a_0+a_1x_k+a_2x_k^2+\dots+a_Nx_k^N-y_k)\,x_k^{\nu} = 0.$$

Denote $R_l = \sum_{k=1}^{M}x_k^{l}$ and $S_\nu = \sum_{k=1}^{M}y_kx_k^{\nu}$. Then

$$R_\nu a_0+R_{\nu+1}a_1+R_{\nu+2}a_2+\dots+R_{\nu+N}a_N = S_\nu \qquad (\nu=0,1,\dots,N),$$

or

$$\sum_{i=0}^{N}R_{\nu+i}\,a_i = S_\nu \qquad (\nu=0,1,\dots,N).$$

This is a system of $N+1$ linear equations with $N+1$ unknowns. Provided at least $N+1$ of the nodes $x_k$ are distinct, it has a unique solution $a_0^*,a_1^*,\dots,a_N^*$.
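The normal equations $\sum_i R_{\nu+i}a_i = S_\nu$ can be assembled and solved directly. The sketch below is only an illustration; the function name `polyfit_normal_equations` is hypothetical, and for larger $N$ this system becomes ill-conditioned, so the code is meant to mirror the derivation rather than to serve as a robust solver.

```python
import numpy as np

def polyfit_normal_equations(x, y, N):
    """Least-squares polynomial coefficients a_0, ..., a_N via the normal equations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # R_l = sum_k x_k^l for l = 0, ..., 2N
    R = np.array([np.sum(x**l) for l in range(2 * N + 1)])
    # S_nu = sum_k y_k x_k^nu for nu = 0, ..., N
    S = np.array([np.sum(y * x**nu) for nu in range(N + 1)])
    # Matrix entry (nu, i) is R_{nu+i}
    A = np.array([[R[nu + i] for i in range(N + 1)] for nu in range(N + 1)])
    return np.linalg.solve(A, S)
```

For data that lie exactly on a polynomial of degree at most $N$, the returned coefficients reproduce that polynomial up to rounding.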


Proposition 4.1.1. If $a_0^*,a_1^*,\dots,a_N^*$ is the solution of the system of linear equations

$$\sum_{i=0}^{N}R_{\nu+i}\,a_i = S_\nu \qquad (\nu=0,1,\dots,N),$$

then the polynomial fitting the data $\{x_k,y_k\}_{k=1,\dots,M}$ is $f(x) = \sum_{\nu=0}^{N}a_\nu^*x^{\nu}$.

It is more convenient to choose a linear combination of orthogonal polynomials to fit data. For the given observation data $(x_k,y_k)$ ($k=1,\dots,M$), assume that the $\{x_k\}$ satisfy $-1=x_1<x_2<\dots<x_M=1$ and are equally spaced. Choose the following linear combination of normalized Legendre polynomials $\hat P_n(x)$ (see (3.3.1)) to fit the data:

$$f(x) = a_0\hat P_0(x)+a_1\hat P_1(x)+\dots+a_N\hat P_N(x).$$

Let

$$F(a_0,a_1,\dots,a_N) = \sum_{k=1}^{M}(f(x_k)-y_k)^2 = \sum_{k=1}^{M}\Big(\sum_{\mu=0}^{N}a_\mu\hat P_\mu(x_k)-y_k\Big)^2.$$

Then $\frac{\partial F}{\partial a_\nu}=0$ ($\nu=0,1,\dots,N$) is equivalent to

$$\sum_{k=1}^{M}\Big(\sum_{\mu=0}^{N}a_\mu\hat P_\mu(x_k)-y_k\Big)\hat P_\nu(x_k) = 0 \qquad (\nu=0,1,\dots,N).$$

The left-hand side is equal to $\sum_{\mu=0}^{N}\big(\sum_{k=1}^{M}\hat P_\mu(x_k)\hat P_\nu(x_k)\big)a_\mu-\sum_{k=1}^{M}y_k\hat P_\nu(x_k)$. So

$$\sum_{\mu=0}^{N}\alpha_{\mu,\nu}\,a_\mu-\beta_\nu = 0 \qquad (\nu=0,1,\dots,N), \tag{4.1.1}$$

where

$$\alpha_{\mu,\nu} = \sum_{k=1}^{M}\hat P_\mu(x_k)\hat P_\nu(x_k), \qquad \beta_\nu = \sum_{k=1}^{M}y_k\hat P_\nu(x_k). \tag{4.1.2}$$

Since $x_1,\dots,x_M$ are equally spaced nodes on $[-1,1]$ with spacing $\frac{2}{M-1}$ and the normalized Legendre polynomials satisfy

$$\int_{-1}^{1}\hat P_\nu(x)\hat P_\mu(x)\,dx = \delta_{\nu\mu},$$

where $\delta_{\nu\mu}$ is the Kronecker delta, the sum $\frac{2}{M-1}\sum_{k=1}^{M}\hat P_\mu(x_k)\hat P_\nu(x_k)$ is a Riemann sum approximating this integral, and we get

$$\alpha_{\mu,\nu} = \sum_{k=1}^{M}\hat P_\mu(x_k)\hat P_\nu(x_k) \approx 0 \quad (\mu\ne\nu), \qquad \alpha_{\nu,\nu} = \sum_{k=1}^{M}\hat P_\nu^2(x_k) \approx \frac{M-1}{2}.$$

From this and (4.1.1), $a_\nu \approx \frac{2\beta_\nu}{M-1}$ ($\nu=0,1,\dots,N$), where $\beta_\nu$ are stated in (4.1.2).
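The approximation $a_\nu\approx 2\beta_\nu/(M-1)$ can be checked numerically. The sketch below is an illustration under assumptions: the normalized Legendre polynomial is taken to be $\hat P_n = \sqrt{(2n+1)/2}\,P_n$ (the normalization that makes $\int_{-1}^{1}\hat P_n^2\,dx = 1$), and the helper names `normalized_legendre` and `legendre_fit_coeffs` are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def normalized_legendre(n, x):
    """Evaluate sqrt((2n+1)/2) * P_n(x), which has unit L2 norm on [-1, 1]."""
    c = np.zeros(n + 1)
    c[n] = 1.0                          # coefficient vector selecting P_n
    return np.sqrt((2 * n + 1) / 2.0) * legendre.legval(x, c)

def legendre_fit_coeffs(x, y, N):
    """Approximate coefficients a_nu = 2 * beta_nu / (M - 1), nu = 0, ..., N."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    M = len(x)
    return np.array([2.0 / (M - 1) * np.sum(y * normalized_legendre(nu, x))
                     for nu in range(N + 1)])
```

Because the formula replaces an integral by a Riemann sum, the coefficients are only approximate; the error shrinks as the number of equally spaced nodes $M$ grows.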

Proposition 4.1.2. Let $-1=x_1<x_2<\dots<x_M=1$ be equally spaced, and let data $(x_k,y_k)$ ($k=1,\dots,M$) be given. Then the polynomial fitting the data is $f(x) = \sum_{\nu=0}^{N}\beta_\nu\hat P_\nu(x)$, where $\beta_\nu \approx \frac{2}{M-1}\sum_{k=1}^{M}y_k\hat P_\nu(x_k)$ and $\hat P_\nu(x)$ is the $\nu$-th normalized Legendre polynomial.

Now we consider orthogonal polynomials with a weight function to fit the given data. Given data $(x_k,y_k)$ ($k=1,\dots,M$) satisfying $a=x_1<x_2<\dots<x_M=b$, the orthonormal polynomials $P_\nu^{(\rho)}(x)$ ($\nu=0,1,\dots$) on $[a,b]$ with weight function $\rho(x)$ satisfy

$$\int_a^b P_\mu^{(\rho)}(x)\,P_\nu^{(\rho)}(x)\,\rho(x)\,dx = \delta_{\mu,\nu},$$

where $\delta_{\mu,\nu}$ is the Kronecker delta. Then the polynomial fitting the data is

$$f(x) = \sum_{\mu=0}^{N}\beta_\mu^{(\rho)}P_\mu^{(\rho)}(x),$$

where

$$\beta_\mu^{(\rho)} \approx \frac{2}{M-1}\sum_{k=1}^{M}y_k\,P_\mu^{(\rho)}(x_k)\,\rho(x_k).$$

4.1.2 Orthogonality method

In fitting data using the orthogonality method, the BC-decomposition of matrices is crucial.

BC-decomposition

Let $A$ be an $M\times N$ matrix with rank $r$, where $M\ge N$. Then the matrix $A$ can be decomposed into $A=BC$, where $B$ is an $M\times r$ matrix, $C$ is an $r\times N$ matrix, and the ranks of $B$ and $C$ are both $r$.

In fact, let $A=(\alpha_{ij})_{M\times N}$ and let $a_j=(\alpha_{1j},\dots,\alpha_{Mj})^T$ ($j=1,\dots,N$) be its $j$-th column vector. Since $\operatorname{rank}(A)=r$, there are $r$ linearly independent column vectors; say, $a_1,a_2,\dots,a_r$. We construct an orthonormal basis $e_1,e_2,\dots,e_M$ of $\mathbb R^M$ such that for any $s=2,\dots,M$,

$$e_s\perp a_j \qquad (j=1,\dots,s-1).$$

Let $P=(e_1|e_2|\cdots|e_M)$. Then $P$ is an orthogonal matrix of order $M$. Define $U=P^TA$. Since $P^T=P^{-1}$, $A=PU$. Let $U=(u_{kl})_{M\times N}$, where $u_{kl}=(e_k,a_l)$. Since $\operatorname{rank}(A)=r$ and $a_1,a_2,\dots,a_r$ are linearly independent, every remaining column is a linear combination $a_l=\sum_{j=1}^{r}c_{jl}a_j$ ($l>r$). From this and $e_s\perp a_j$, it follows that $u_{kl}=(e_k,a_l)=\sum_{j=1}^{r}c_{jl}(e_k,a_j)=0$ ($k>r$). Denote $\tilde B=(u_{kl})_{k,l=1,\dots,r}$ and $\tilde C=(u_{kl})_{k=1,\dots,r;\;l=r+1,\dots,N}$. So

$$U = \begin{pmatrix}\tilde B & \tilde C\\ 0 & 0\end{pmatrix}$$

and the product $PU$ only depends on the first $r$ columns of $P$ and the first $r$ rows of $U$, and $A=PU=BC$, where $B=(e_1|e_2|\cdots|e_r)$ and $C=(\tilde B|\tilde C)$.

Consider a general system of linearly independent functions $\varphi_1(x),\varphi_2(x),\dots,\varphi_N(x)$. We use their linear combination $F(x)$ to fit observation data $(x_i,y_i)$ ($i=1,\dots,M$), where $M\gg N$, such that

$$\gamma^2 := \sum_{i=1}^{M}\Big(\sum_{j=1}^{N}\beta_j\varphi_j(x_i)-y_i\Big)^2$$

attains the minimal value. Some often used function systems are the power function system $\{x^i\}$, the trigonometric function system $\{\sin(ix)\}$, and the exponential function system $\{e^{\lambda_ix}\}$. Let $\frac{\partial\gamma^2}{\partial\beta_l}=0$ ($l=1,\dots,N$). Then

$$\sum_{i=1}^{M}\varphi_l(x_i)\Big(\sum_{j=1}^{N}\beta_j\varphi_j(x_i)-y_i\Big) = 0 \qquad (l=1,\dots,N). \tag{4.1.3}$$

This is a system of linear equations. The matrix form is

$$A^T(A\beta-y) = 0,$$

where $A=(\alpha_{ij})_{M\times N}$, $\beta=(\beta_1,\dots,\beta_N)^T$, $y=(y_1,\dots,y_M)^T$, and $\alpha_{ij}=\varphi_j(x_i)$ ($i=1,\dots,M$; $j=1,\dots,N$). Denote by $\beta^*=(\beta_1^*,\dots,\beta_N^*)^T$ the solution of (4.1.3). Then the combination fitting the data is $\sum_{j=1}^{N}\beta_j^*\varphi_j(x)$. We solve for $\beta=\beta^*$ below.

Replacing $A$ by its BC-decomposition in the matrix form of (4.1.3),

$$C^TB^TBC\beta = C^TB^Ty.$$

Multiplying both sides by $C$,

$$(CC^T)(B^TB)C\beta = (CC^T)B^Ty.$$

Since $\operatorname{rank}(B)=\operatorname{rank}(C)=r$, both $CC^T$ and $B^TB$ are nonsingular $r\times r$ matrices, so

$$C\beta = W, \qquad\text{where } W=(B^TB)^{-1}B^Ty.$$

This implies that $C^T(CC^T)^{-1}C\beta = C^T(CC^T)^{-1}W$. When $r=N$, $C^T(CC^T)^{-1}C=I$ and the solution is unique. In general, $C^T(CC^T)^{-1}C$ is the orthogonal projection onto the row space of $C$, and the solution of $C\beta=W$ with minimal norm is

$$\beta^* = C^T(CC^T)^{-1}W = C^T(CC^T)^{-1}(B^TB)^{-1}B^Ty.$$

Write $\beta^*=(\beta_1^*,\dots,\beta_N^*)^T$. So the combination $F(x)$ fitting the data is $\sum_{j=1}^{N}\beta_j^*\varphi_j(x)$.
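The formula $\beta^* = C^T(CC^T)^{-1}(B^TB)^{-1}B^Ty$ can be tried on a rank-deficient matrix. The sketch below is an illustration with one substitution that should be stated plainly: instead of the orthonormal-basis construction described above, the factors $B$ and $C$ are obtained from a truncated singular value decomposition, which also yields a full-rank factorization $A=BC$. The names `full_rank_factorization` and `least_squares_bc`, and the tolerance `tol`, are hypothetical.

```python
import numpy as np

def full_rank_factorization(A, tol=1e-10):
    """Return B (M x r) and C (r x N), both of rank r, with A = B @ C.

    Uses a truncated SVD (an assumption of this sketch, not the
    construction in the text).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))     # numerical rank
    B = U[:, :r]
    C = s[:r, None] * Vt[:r]
    return B, C

def least_squares_bc(A, y):
    """Least-squares solution beta* = C^T (C C^T)^{-1} (B^T B)^{-1} B^T y."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    B, C = full_rank_factorization(A)
    W = np.linalg.solve(B.T @ B, B.T @ y)      # W = (B^T B)^{-1} B^T y
    return C.T @ np.linalg.solve(C @ C.T, W)   # beta* = C^T (C C^T)^{-1} W
```

When $A$ has full column rank this coincides with the usual least-squares solution; when it does not, the result is the least-squares solution of smallest Euclidean norm.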


4.2 Lagrange interpolation

Given a real sequence $y_k$ ($k=1,\dots,n$) and nodes $x_k$ ($k=1,\dots,n$), where $x_1<x_2<\dots<x_n$, we construct a Lagrange interpolation polynomial $L_n(x)$ of degree $n-1$ such that $L_n(x_k)=y_k$ ($k=1,\dots,n$). Moreover, we introduce the uniform convergence and mean convergence of sequences of Lagrange interpolation polynomials.

4.2.1 Fundamental polynomials

Let $\omega_n(x)$ be the product of the $n$ factors $(x-x_k)$ ($k=1,\dots,n$), i.e.,

$$\omega_n(x) = (x-x_1)(x-x_2)\cdots(x-x_n).$$

Then $\omega_n(x)$ is a polynomial of degree $n$ with $\omega_n(x_k)=0$ ($k=1,\dots,n$), and

$$\omega_n'(x_k) = (x_k-x_1)\cdots(x_k-x_{k-1})(x_k-x_{k+1})\cdots(x_k-x_n) \qquad (k=1,\dots,n).$$

Define the fundamental polynomials as

$$l_k(x) = \frac{\omega_n(x)}{\omega_n'(x_k)(x-x_k)} \qquad (k=1,\dots,n). \tag{4.2.1}$$

Then $l_k(x)$ is a polynomial of degree $n-1$ and $l_k(x_j)=\delta_{jk}$ for $j,k=1,\dots,n$, where $\delta_{jk}$ is the Kronecker delta.

Let $P(x)$ be any polynomial of degree at most $n-1$. Then $P(x)=\sum_{k=1}^{n}P(x_k)l_k(x)$. In fact, let

$$Q(x) = \sum_{k=1}^{n}P(x_k)\,l_k(x).$$

Then $Q(x)$ is a polynomial of degree at most $n-1$ and $Q(x_k)=P(x_k)$ ($k=1,\dots,n$). These $n$ pairs of values determine that $P(x)=Q(x)$, i.e., $P(x)=\sum_{k=1}^{n}P(x_k)l_k(x)$.

4.2.2 Lagrange interpolation polynomials

The Lagrange interpolation polynomial of degree $n-1$ is defined as

$$L_n(x) = \sum_{k=1}^{n}y_k\,l_k(x) = \sum_{k=1}^{n}y_k\,\frac{\omega_n(x)}{\omega_n'(x_k)(x-x_k)}. \tag{4.2.2}$$

Clearly, $L_n(x_j)=y_j$ ($j=1,\dots,n$). Formula (4.2.2) is called the Lagrange interpolation formula. For convenience of computation, it is rewritten in the form

$$L_n(x) = c_0+c_1(x-x_1)+c_2(x-x_1)(x-x_2)+\dots+c_{n-1}(x-x_1)(x-x_2)\cdots(x-x_{n-1}). \tag{4.2.3}$$

This form is called the Newton interpolation formula. The coefficients $\{c_k\}$ are computed as follows:

$$c_0 = y_1, \qquad c_1 = \frac{y_2-c_0}{x_2-x_1}, \qquad \dots,$$

$$c_{k-1} = \frac{y_k-c_0-\sum_{l=1}^{k-2}c_l\,(x_k-x_1)\cdots(x_k-x_l)}{(x_k-x_1)\cdots(x_k-x_{k-1})} \qquad (k=3,\dots,n).$$

The combination of the Lagrange and Newton interpolation formulas gives

$$c_{k-1} = \sum_{\nu=1}^{k}\frac{y_\nu}{\omega_k'(x_\nu)} \qquad (k=1,\dots,n). \tag{4.2.4}$$

When we add a node, the Lagrange interpolation formula requires recomputing each fundamental polynomial $l_i(x)$, whereas with the Newton interpolation formula the coefficients already computed do not have to be changed. Therefore, in numerical computations, it is best to use the Newton interpolation formula.

Let $f\in C^n([a,b])$ and $a\le x_1<\dots<x_n\le b$, and let $L_n(x)$ be the Lagrange interpolation polynomial of degree $n-1$. Then the error between the interpolation polynomial and the original function is

$$f(x)-L_n(x) = \frac{1}{n!}\,f^{(n)}(\xi_x)\,\omega_n(x) \qquad (a<\xi_x<b),$$

and so

$$\max_{a\le x\le b}|f(x)-L_n(x)| \le \frac{1}{n!}\,\max_{a\le x\le b}|f^{(n)}(x)|\,(b-a)^n.$$
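The Newton coefficients $c_0,\dots,c_{n-1}$ of (4.2.3) can be computed with the standard divided-difference table, which is equivalent to the recursive formulas above. The following sketch is an illustration; the names `newton_coefficients` and `newton_eval` are hypothetical, and indices are 0-based, so `c[k]` plays the role of $c_k$ and `x[k]` the role of $x_{k+1}$.

```python
import numpy as np

def newton_coefficients(x, y):
    """Divided differences: c[k] is the coefficient of (t - x[0])...(t - x[k-1])."""
    x = np.asarray(x, dtype=float)
    c = np.array(y, dtype=float)
    n = len(x)
    for j in range(1, n):
        # Update the j-th order divided differences in place.
        c[j:] = (c[j:] - c[j - 1:-1]) / (x[j:] - x[:-j])
    return c

def newton_eval(c, x_nodes, t):
    """Evaluate the Newton form at t by nested (Horner-like) multiplication."""
    t = np.asarray(t, dtype=float)
    p = np.full_like(t, c[-1])
    for k in range(len(c) - 2, -1, -1):
        p = p * (t - x_nodes[k]) + c[k]
    return p
```

Adding a node only appends one entry to the coefficient array; the earlier entries are unchanged, which is the practical advantage of the Newton form noted above.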

4.2.3 Equally spaced nodes and Chebyshev nodes

Consider equally spaced nodes

$$x_1=a,\quad x_2=a+h,\quad\dots,\quad x_n=a+(n-1)h.$$

Then

$$\omega_k'(x_\nu) = (x_\nu-x_1)\cdots(x_\nu-x_{\nu-1})(x_\nu-x_{\nu+1})\cdots(x_\nu-x_k) = (-1)^{k-\nu}\,h^{k-1}\,(\nu-1)!\,(k-\nu)!. \tag{4.2.5}$$

From this and (4.2.4),

$$c_{k-1} = \frac{1}{h^{k-1}}\sum_{\nu=1}^{k}\frac{(-1)^{k-\nu}\,y_\nu}{(\nu-1)!\,(k-\nu)!} = \frac{\Delta^{k-1}y_1}{h^{k-1}(k-1)!},$$

where $\Delta^{k-1}y_1$ is the $(k-1)$-th difference. From this and (4.2.3), it follows that

$$L_n(x) = y_1+\sum_{\nu=1}^{n-1}\frac{(x-a)(x-a-h)\cdots(x-a-(\nu-1)h)}{\nu!\,h^{\nu}}\,\Delta^{\nu}y_1.$$

This formula is called the Newton interpolation formula with equally spaced nodes.

Consider the equally spaced nodes $x_k^{(n)}=-1+\frac{2k}{n}$ ($k=0,1,\dots,n$). The Lagrange interpolation polynomials of $f(x)=|x|$ ($-1\le x\le 1$) do not converge to $f(x)$ as $n\to\infty$ except at $x=-1,0,1$. Therefore, we need to look for other nodes.

Consider the Chebyshev nodes $x_k=\cos\frac{(2k-1)\pi}{2n}$ ($k=1,\dots,n$), i.e., the zeros of the Chebyshev polynomial $T_n(x)=\cos(n\arccos x)$. The term of highest degree of $T_n(x)$ is $2^{n-1}x^n$. So

$$\omega_n(x) = \frac{1}{2^{n-1}}\,T_n(x) = \frac{1}{2^{n-1}}\cos(n\arccos x), \tag{4.2.6}$$

$$\omega_n'(x) = \frac{n\sin(n\arccos x)}{2^{n-1}\sqrt{1-x^2}}, \qquad |\omega_n'(x)| = \frac{n\sqrt{1-T_n^2(x)}}{2^{n-1}\sqrt{1-x^2}}, \qquad \omega_n'(x_k) = \frac{(-1)^{k-1}\,n}{2^{n-1}\sqrt{1-x_k^2}},$$

where the sign follows from $\sin\frac{(2k-1)\pi}{2}=(-1)^{k-1}$. By (4.2.1),

$$l_k(x) = \frac{(-1)^{k-1}\,T_n(x)\sqrt{1-x_k^2}}{n\,(x-x_k)}$$

and the Lagrange interpolation formula with Chebyshev nodes is

$$L_n(x) = \frac{T_n(x)}{n}\sum_{k=1}^{n}\frac{(-1)^{k-1}\,y_k\sqrt{1-x_k^2}}{x-x_k}.$$

By (4.2.6), $|\omega_n(x)|\le\frac{1}{2^{n-1}}$ ($|x|\le 1$). From this and the error formula in Section 4.2.2, it follows that for the Chebyshev nodes $x_1,\dots,x_n$, the interpolation error is

$$|f(x)-L_n(x)| = \left|\frac{1}{n!}\,f^{(n)}(\xi_x)\,\omega_n(x)\right| \le \frac{1}{2^{n-1}\,n!}\,\max_{|t|\le 1}|f^{(n)}(t)|.$$

From this, it is seen that Chebyshev nodes are optimal nodes, since $|\omega_n(x)|\le\frac{1}{2^{n-1}}$ ($|x|\le 1$) holds if and only if the nodes $x_1,\dots,x_n$ are Chebyshev nodes.
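The bound $|\omega_n(x)|\le 2^{1-n}$ for Chebyshev nodes can be compared numerically with equally spaced nodes. The sketch below is an illustration; the names `chebyshev_nodes` and `max_abs_omega` and the grid size are hypothetical choices, and the maximum is only estimated on a fine grid of $[-1,1]$.

```python
import numpy as np

def chebyshev_nodes(n):
    """Zeros of T_n: x_k = cos((2k - 1) * pi / (2n)), k = 1, ..., n."""
    k = np.arange(1, n + 1)
    return np.cos((2 * k - 1) * np.pi / (2 * n))

def max_abs_omega(nodes, grid_size=20001):
    """Estimate max |omega_n(t)| over [-1, 1] on a uniform grid."""
    t = np.linspace(-1.0, 1.0, grid_size)
    omega = np.prod(t[:, None] - np.asarray(nodes)[None, :], axis=1)
    return float(np.max(np.abs(omega)))
```

For example, with $n=8$ one can compare `max_abs_omega(chebyshev_nodes(8))` against `max_abs_omega(np.linspace(-1, 1, 8))`; the first value should be close to $2^{-7}$ and the second noticeably larger.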

4.2.4 Convergence of interpolation polynomials

Given a continuous function $f$, if $P_n$ is an interpolation polynomial of $f$ with $n$ nodes, we expect that $\{P_n\}$ converges to $f$ as $n\to\infty$. Consider the triangular matrix of nodes on $[a,b]$

$$\begin{array}{llll} x_1^{(1)} & & & \\ x_1^{(2)}, & x_2^{(2)} & & \\ \;\vdots & \;\vdots & & \\ x_1^{(n)}, & x_2^{(n)}, & \dots, & x_n^{(n)}. \end{array}$$

The sequence of Lagrange interpolation polynomials of $f\in C([a,b])$ is

$$L_n(x) = \sum_{k=1}^{n}f(x_k^{(n)})\,l_k^{(n)}(x) \qquad (n\in\mathbb Z_+),$$

where $l_k^{(n)}(x)$ ($k=1,\dots,n$) are the fundamental polynomials based on the nodes $x_k^{(n)}$ ($k=1,\dots,n$),

$$l_k^{(n)}(x) = \frac{\omega_n(x)}{\omega_n'(x_k^{(n)})\,(x-x_k^{(n)})},$$

where $\omega_n(x)=\prod_{k=1}^{n}(x-x_k^{(n)})$.

Proposition 4.2.1. Let $f\in C([a,b])$, let $E_n$ be the error of its best approximation by polynomials of degree $n$, and let

$$\lambda_n(x) = \sum_{k=1}^{n}|l_k^{(n)}(x)|, \qquad \lambda_n = \max_{a\le x\le b}|\lambda_n(x)|.$$

If $\lim_{n\to\infty}\lambda_nE_{n-1}=0$, then the Lagrange interpolation polynomial $L_n(x)$ converges to $f(x)$ uniformly on $[a,b]$.

For the Chebyshev nodes $x_k^{(n)}=\cos\frac{(2k-1)\pi}{2n}$ ($k=1,\dots,n$),

$$\lambda_n = \max_{-1\le x\le 1}\lambda_n(x) \le 8+\frac{4}{\pi}\log n.$$

If $f\in C([-1,1])$ and $f$ is a piecewise differentiable function on $[-1,1]$, then $f\in\operatorname{lip}1$. From this and the Jackson inequality in Section 3.1, $E_n=O(\frac1n)$, and so

$$\lim_{n\to\infty}\lambda_nE_{n-1} = 0.$$

By Proposition 4.2.1, the interpolation polynomial $L_n(x)$ with Chebyshev nodes converges uniformly to $f(x)$ on $[-1,1]$.

There exists a continuous function $f(x)$ on $[-1,1]$ such that its Lagrange interpolation polynomial with Chebyshev nodes diverges everywhere. More generally, for any triangular matrix $\{x_k^{(n)}\}_{k=1,\dots,n}$, there exists a continuous function whose Lagrange interpolation polynomials do not converge to it uniformly.


4.2.5 Mean convergence

From Section 3.3, we see that Legendre polynomials are orthogonal polynomials with weight 1. The $n$-th Legendre polynomial has $n$ distinct zeros in $(-1,1)$. It is convenient to choose the zeros of Legendre polynomials as nodes for mean convergence, and the following proposition holds.

Let the nodes $\{x_k^{(n)}\}_{k=1,\dots,n}$ be the zeros of the $n$-th Legendre polynomial. If $f\in C([-1,1])$, then its Lagrange interpolation polynomial $L_n(x)$ converges to $f(x)$ in the mean square sense, i.e.,

$$\lim_{n\to\infty}\int_{-1}^{1}(L_n(x)-f(x))^2\,dx = 0.$$

For orthogonal polynomials on $[a,b]$ with weight $\rho(x)$, the zeros $\{x_k^{(n)}\}_{k=1,\dots,n}$ are all simple, satisfy $a<x_1^{(n)}<x_2^{(n)}<\dots<x_n^{(n)}<b$, and can be estimated precisely. The product

$$\omega_n(x) = (x-x_1^{(n)})(x-x_2^{(n)})\cdots(x-x_n^{(n)})$$

is an orthogonal polynomial with leading coefficient 1 and has a simple and clear representation, and its derivative $\omega_n'(x_k^{(n)})$ is estimated easily. Take these zeros $x_k^{(n)}$ ($k=1,\dots,n$) as nodes. For $f\in C([a,b])$, the Lagrange interpolation polynomials $L_n(x)$ with nodes $x_k^{(n)}$ ($k=1,\dots,n$) converge to $f(x)$ in the mean square sense:

$$\lim_{n\to\infty}\int_a^b(L_n(x)-f(x))^2\,\rho(x)\,dx = 0.$$

In particular, for $f\in C([-1,1])$, the Lagrange interpolation polynomial with Chebyshev nodes $x_k=\cos\frac{(2k-1)\pi}{2n}$ ($k=1,\dots,n$) satisfies

$$\lim_{n\to\infty}\int_{-1}^{1}(L_n(x)-f(x))^2\,\frac{1}{\sqrt{1-x^2}}\,dx = 0.$$

4.3 Hermite interpolation

For nodes $x_k$ ($k=1,\dots,n$) satisfying $x_1<x_2<\dots<x_n$, we will find the polynomial $H(x)$ of lowest degree such that $H^{(l)}(x_k)=y_k^{(l)}$ ($k=1,\dots,n$; $l=0,1,\dots,\alpha_k-1$), i.e.,

$$\begin{cases} H(x_k) = y_k,\\ H'(x_k) = y_k',\\ \quad\vdots\\ H^{(\alpha_k-1)}(x_k) = y_k^{(\alpha_k-1)} \end{cases} \qquad (k=1,\dots,n).$$

Such a polynomial exists and is unique. The polynomial $H(x)$ is called the Hermite interpolation polynomial. If $\alpha_1=\alpha_2=\dots=\alpha_n=1$, the Hermite interpolation polynomial reduces to the Lagrange interpolation polynomial. If $n=1$, the Hermite interpolation polynomial reduces to the Taylor polynomial

$$H(x) = y_1+\frac{y_1'}{1!}(x-x_1)+\dots+\frac{y_1^{(\alpha_1-1)}}{(\alpha_1-1)!}(x-x_1)^{\alpha_1-1}.$$

4.3.1 Hermite interpolation formula with remainder term

Assume that $f\in C^m([a,b])$, where $m=\alpha_1+\dots+\alpha_n$, that the nodes satisfy $x_k\in[a,b]$ ($k=1,\dots,n$), and that the Hermite interpolation polynomial $H(x)$ satisfies $H^{(l)}(x_k)=f^{(l)}(x_k)$ ($k=1,\dots,n$; $l=0,1,\dots,\alpha_k-1$). Then the error formula is as follows:

$$f(x) = H(x)+\frac{f^{(m)}(\xi)}{m!}\,\Omega(x) \qquad (a<\xi<b),$$

where $\Omega(x)=(x-x_1)^{\alpha_1}(x-x_2)^{\alpha_2}\cdots(x-x_n)^{\alpha_n}$ and $m=\alpha_1+\alpha_2+\dots+\alpha_n$.

4.3.2 Interpolation polynomial with double points

When $\alpha_1=\dots=\alpha_n=2$, the problem reduces to finding a polynomial $H(x)$ of degree $2n-1$ satisfying

$$H(x_k) = y_k, \qquad H'(x_k) = y_k' \qquad (k=1,\dots,n).$$

The polynomial $H(x)$ is called the interpolation polynomial with double points. Let

$$P_{2n-1}(x) = \sum_{k=1}^{n}y_k\left(1-\frac{\omega_n''(x_k)}{\omega_n'(x_k)}(x-x_k)\right)l_k^2(x), \qquad Q_{2n-1}(x) = \sum_{k=1}^{n}y_k'\,(x-x_k)\,l_k^2(x),$$

where $\omega_n(x)=(x-x_1)\cdots(x-x_n)$ and $l_k(x)$ are the fundamental polynomials stated in Section 4.2. Then

$$P_{2n-1}(x_k) = y_k, \quad P_{2n-1}'(x_k) = 0, \quad Q_{2n-1}(x_k) = 0, \quad Q_{2n-1}'(x_k) = y_k' \qquad (k=1,\dots,n).$$

Both $P_{2n-1}(x)$ and $Q_{2n-1}(x)$ are polynomials of degree $2n-1$. The following proposition holds.

Given two real number sequences $y_k$, $y_k'$ ($k=1,\dots,n$) and nodes $x_k$ ($k=1,\dots,n$), where $x_1<\dots<x_n$, the Hermite interpolation polynomial $H_{2n-1}(x)$ satisfying

$$H_{2n-1}(x_k) = y_k, \qquad H_{2n-1}'(x_k) = y_k' \qquad (k=1,\dots,n)$$

can be decomposed in the form $H_{2n-1}(x)=P_{2n-1}(x)+Q_{2n-1}(x)$.

In particular, for the case $n=2$, given real numbers $y_1,y_2$ and $y_1',y_2'$, and nodes $x_1,x_2$, the Hermite interpolation polynomial $H_3(x)$ satisfying

$$H_3(x_1) = y_1, \quad H_3(x_2) = y_2, \quad H_3'(x_1) = y_1', \quad H_3'(x_2) = y_2'$$

can be decomposed into $H_3(x)=P_3(x)+Q_3(x)$, where

$$\begin{aligned} P_3(x) &= y_1\left(1-\frac{2}{x_1-x_2}(x-x_1)\right)\left(\frac{x-x_2}{x_1-x_2}\right)^2 + y_2\left(1-\frac{2}{x_2-x_1}(x-x_2)\right)\left(\frac{x-x_1}{x_2-x_1}\right)^2,\\ Q_3(x) &= y_1'\,(x-x_1)\left(\frac{x-x_2}{x_1-x_2}\right)^2 + y_2'\,(x-x_2)\left(\frac{x-x_1}{x_2-x_1}\right)^2. \end{aligned} \tag{4.3.1}$$

Let $f\in C([-1,1])$ and let $x_k^{(n)}=\cos\frac{(2k-1)\pi}{2n}$ ($k=1,\dots,n$) be Chebyshev nodes. If $H_{2n-1}(x)$ is a polynomial of degree $2n-1$ satisfying $H_{2n-1}(x_k^{(n)})=f(x_k^{(n)})$ and $H_{2n-1}'(x_k^{(n)})=0$, then the polynomial $H_{2n-1}(x)$ converges to $f(x)$ uniformly on $[-1,1]$. Such an interpolation method is called the Fejér interpolation method.
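The two-node formula (4.3.1) is simple enough to evaluate directly. The sketch below is an illustration; the function name `hermite_cubic` and the argument names `d1`, `d2` (standing for $y_1'$, $y_2'$) are hypothetical.

```python
def hermite_cubic(x1, x2, y1, y2, d1, d2, x):
    """Evaluate H_3(x) = P_3(x) + Q_3(x) from formula (4.3.1).

    y1, y2 are the prescribed values and d1, d2 the prescribed
    derivatives at the nodes x1 and x2.
    """
    l1 = (x - x2) / (x1 - x2)   # fundamental polynomial for node x1
    l2 = (x - x1) / (x2 - x1)   # fundamental polynomial for node x2
    P3 = (y1 * (1 - 2 * (x - x1) / (x1 - x2)) * l1**2
          + y2 * (1 - 2 * (x - x2) / (x2 - x1)) * l2**2)
    Q3 = d1 * (x - x1) * l1**2 + d2 * (x - x2) * l2**2
    return P3 + Q3
```

Since $H_3$ is the unique cubic matching two values and two derivatives, feeding it data sampled from a cubic such as $x^3$ reproduces that cubic exactly, which gives a quick sanity check.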

4.4 Spline interpolation

Spline interpolation replaces polynomials by piecewise polynomials as interpolation functions and requires that the piecewise polynomials be smooth at each node.

4.4.1 Spline functions

Given nodes $a=x_0<x_1<\dots<x_n=b$, if $s(x)$ is a constant on each interval $[x_k,x_{k+1}]$ ($k=0,\dots,n-1$), then $s(x)$ is called a spline function of degree 0 with nodes $x_k$ ($k=0,\dots,n$). If $s(x)$ is a linear function on each interval $[x_k,x_{k+1}]$ and $s\in C([a,b])$, then $s(x)$ is called a spline function of degree 1. In general, if $s(x)$ is a polynomial of degree $m$ on each interval $[x_k,x_{k+1}]$ and $s\in C^{m-1}([a,b])$, then $s(x)$ is called a spline function of degree $m$.

For a spline function $s(x)$ of degree $m$, let $d_k=s^{(m)}(x_k+0)-s^{(m)}(x_k-0)$, and let $s_{mk}$ be the restriction of $s(x)$ to $[x_k,x_{k+1})$, i.e., $s_{mk}=s|_{[x_k,x_{k+1})}$. Then

$$s_{mk}(x) = s_{m,k-1}(x)+R(x),$$

where $R(x)$ is a polynomial of degree $m$. From $s(x)\in C^{m-1}([a,b])$, it follows that

$$R^{(l)}(x_k) = 0 \quad (l=0,1,\dots,m-1), \qquad R^{(m)}(x_k) = d_k.$$

These conditions determine $R(x)=\frac{d_k}{m!}(x-x_k)^m$, so

$$s_{mk}(x) = s_{m,k-1}(x)+\frac{d_k}{m!}(x-x_k)^m.$$

Introduce the notations $x_+=\max(0,x)$ and $x_+^m=(x_+)^m$ ($m\ge 1$). Then

$$s(x) = s|_{[x_0,x_1)}(x)+\sum_{k=1}^{n-1}\frac{d_k}{m!}(x-x_k)_+^m.$$

The general structure of spline functions is as follows.

Proposition 4.4.1. Let $s(x)$ be a spline function of degree $m$ with nodes $x_0<x_1<\dots<x_n$. Then there exist $c_0,\dots,c_m$ and $d_1,\dots,d_{n-1}$ such that

$$s(x) = c_0+c_1x+\dots+c_mx^m+\sum_{k=1}^{n-1}\frac{d_k}{m!}(x-x_k)_+^m.$$

4.4.2 Spline interpolation

Given nodes $a=x_0<x_1<\dots<x_n=b$ and numerical values $y_0,y_1,\dots,y_n$, if a function $s(x)$ satisfies
(a) $s(x)$ is a polynomial of degree $\le 3$ on each subinterval $[x_{k-1},x_k)$ ($k=1,\dots,n$),
(b) $s(x_k)=y_k$ ($k=0,1,\dots,n$) and $s(x)\in C^2([a,b])$,
then $s(x)$ is called a cubic spline function on $[a,b]$.

Let $s(x)$ be a cubic spline function on $[a,b]$. Then on each interval $[x_k,x_{k+1})$, $s(x)$ satisfies

$$s(x_k) = y_k, \qquad s(x_{k+1}) = y_{k+1}.$$

Denote

$$s'(x_k) = \mu_k, \qquad s'(x_{k+1}) = \mu_{k+1},$$

where $\{\mu_k\}_{k=0,\dots,n}$ are unknown. By (4.3.1),

$$\begin{aligned} s(x) ={}& \left(1+2\,\frac{x-x_k}{\delta_k}\right)\left(\frac{x-x_{k+1}}{\delta_k}\right)^2y_k + \left(1-2\,\frac{x-x_{k+1}}{\delta_k}\right)\left(\frac{x-x_k}{\delta_k}\right)^2y_{k+1}\\ &+ (x-x_k)\left(\frac{x-x_{k+1}}{\delta_k}\right)^2\mu_k + (x-x_{k+1})\left(\frac{x-x_k}{\delta_k}\right)^2\mu_{k+1}, \end{aligned} \tag{4.4.1}$$

where $\delta_k=x_{k+1}-x_k$. This implies that

$$\begin{aligned} s''(x) ={}& \left(\frac{6}{\delta_k^2}-\frac{12}{\delta_k^3}(x_{k+1}-x)\right)y_k + \left(\frac{6}{\delta_k^2}-\frac{12}{\delta_k^3}(x-x_k)\right)y_{k+1}\\ &+ \left(\frac{2}{\delta_k}-\frac{6}{\delta_k^2}(x_{k+1}-x)\right)\mu_k - \left(\frac{2}{\delta_k}-\frac{6}{\delta_k^2}(x-x_k)\right)\mu_{k+1}. \end{aligned} \tag{4.4.2}$$

Thus, the right-derivative and the left-derivative are, respectively,

$$s''(x_k^+) = -\frac{6}{\delta_k^2}\,y_k+\frac{6}{\delta_k^2}\,y_{k+1}-\frac{4}{\delta_k}\,\mu_k-\frac{2}{\delta_k}\,\mu_{k+1} \qquad (k=0,1,\dots,n-1),$$

$$s''(x_k^-) = \frac{6}{\delta_{k-1}^2}\,y_{k-1}-\frac{6}{\delta_{k-1}^2}\,y_k+\frac{2}{\delta_{k-1}}\,\mu_{k-1}+\frac{4}{\delta_{k-1}}\,\mu_k \qquad (k=1,\dots,n).$$

Since $s''(x_k^+)=s''(x_k^-)$ ($k=1,\dots,n-1$),

$$\frac{2}{\delta_{k-1}}\,\mu_{k-1}+4\left(\frac{1}{\delta_{k-1}}+\frac{1}{\delta_k}\right)\mu_k+\frac{2}{\delta_k}\,\mu_{k+1} = -\frac{6}{\delta_{k-1}^2}\,y_{k-1}+\left(\frac{6}{\delta_{k-1}^2}-\frac{6}{\delta_k^2}\right)y_k+\frac{6}{\delta_k^2}\,y_{k+1} \qquad (k=1,\dots,n-1).$$

To determine the $n+1$ unknowns $\mu_0,\mu_1,\dots,\mu_n$, these $n-1$ equations are supplemented by one of three kinds of boundary conditions. The first kind prescribes the end slopes:

$$s'(x_0) = \mu_0, \qquad s'(x_n) = \mu_n.$$

The second kind is

$$s''(x_0) = 0, \qquad s''(x_n) = 0.$$

From this and (4.4.2), it follows that

$$2\mu_0+\mu_1 = \frac{3}{\delta_0}(y_1-y_0), \qquad \mu_{n-1}+2\mu_n = \frac{3}{\delta_{n-1}}(y_n-y_{n-1}).$$

The third kind is that $s(x)$ is a periodic function with period $x_n-x_0$. In this case, $y_0=y_n$, and

$$s'(x_0) = s'(x_n), \qquad s''(x_0) = s''(x_n).$$

From this and (4.4.2), it follows that $\mu_0=\mu_n$ and

$$\frac{3}{\delta_0^2}(y_1-y_0)-\frac{1}{\delta_0}(2\mu_0+\mu_1) = \frac{3}{\delta_{n-1}^2}(y_{n-1}-y_n)+\frac{1}{\delta_{n-1}}(\mu_{n-1}+2\mu_n).$$

Each kind of boundary condition gives $n+1$ linear equations with $n+1$ unknowns. Solve these equations to find $\mu_0,\mu_1,\dots,\mu_n$.

Proposition 4.4.2. Let $f\in C^4([a,b])$, and let $s(x)$ be the cubic spline function on $[a,b]$ corresponding to the nodes $a=x_0<x_1<\dots<x_n=b$. Denote $\lambda=\max_{0\le k\le n-1}|x_{k+1}-x_k|$. Then $|f^{(i)}(x)-s^{(i)}(x)|\le C\lambda^{4-i}$ ($i=1,2,3$), where $C$ is a constant.
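For the second kind of boundary condition ($s''(x_0)=s''(x_n)=0$), the slopes $\mu_0,\dots,\mu_n$ solve a tridiagonal linear system, after which (4.4.1) evaluates the spline. The sketch below is an illustration under assumptions: it uses a dense solver for brevity rather than a dedicated tridiagonal routine, and the names `natural_spline_slopes` and `spline_eval` are hypothetical.

```python
import numpy as np

def natural_spline_slopes(x, y):
    """Solve for mu_0, ..., mu_n with s''(x_0) = s''(x_n) = 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x) - 1
    d = np.diff(x)                           # d[k] = x[k+1] - x[k]
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    A[0, 0], A[0, 1] = 2.0, 1.0              # 2 mu_0 + mu_1 = 3 (y_1 - y_0) / d_0
    b[0] = 3.0 * (y[1] - y[0]) / d[0]
    for k in range(1, n):                    # continuity of s'' at interior nodes
        A[k, k - 1] = 2.0 / d[k - 1]
        A[k, k] = 4.0 * (1.0 / d[k - 1] + 1.0 / d[k])
        A[k, k + 1] = 2.0 / d[k]
        b[k] = (-6.0 * y[k - 1] / d[k - 1]**2
                + (6.0 / d[k - 1]**2 - 6.0 / d[k]**2) * y[k]
                + 6.0 * y[k + 1] / d[k]**2)
    A[n, n - 1], A[n, n] = 1.0, 2.0          # mu_{n-1} + 2 mu_n = 3 (y_n - y_{n-1}) / d_{n-1}
    b[n] = 3.0 * (y[n] - y[n - 1]) / d[n - 1]
    return np.linalg.solve(A, b)

def spline_eval(x, y, mu, t):
    """Evaluate the spline at a scalar t using formula (4.4.1)."""
    k = int(np.clip(np.searchsorted(x, t, side="right") - 1, 0, len(x) - 2))
    dk = x[k + 1] - x[k]
    u = (t - x[k]) / dk
    v = (t - x[k + 1]) / dk
    return ((1 + 2 * u) * v**2 * y[k] + (1 - 2 * v) * u**2 * y[k + 1]
            + (t - x[k]) * v**2 * mu[k] + (t - x[k + 1]) * u**2 * mu[k + 1])
```

For data on a straight line, all slopes equal the slope of the line and the spline reproduces the line, which is a convenient check of the system above.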

4.4.3 B-splines

Let $N_1(x)=\chi_{[0,1]}(x)$, where $\chi_{[0,1]}(x)$ is the characteristic function of $[0,1]$, and

$$N_2(x) = \int_0^1N_1(x-t)\,dt, \qquad\dots,\qquad N_m(x) = \int_0^1N_{m-1}(x-t)\,dt.$$

The function $N_m(x)$ is called the B-spline of degree $m-1$. The B-spline $N_m$ has the following properties:

$$N_m(x) > 0 \quad (0<x<m), \qquad \operatorname{supp}N_m(x) = [0,m], \qquad \sum_{l\in\mathbb Z}N_m(x-l) = 1 \quad (x\in\mathbb R).$$

Let $s_m\in L^2(\mathbb R)$ be a spline function of degree $m-1$ with nodes $\mathbb Z$. Then

$$s_m(x) = \sum_{k\in\mathbb Z}c_k\,N_m(x-k)$$

in the $L^2(\mathbb R)$ sense. In particular, if $\operatorname{supp}s_m(x)=[N_1,N_2]$ for integers $N_1,N_2\in\mathbb Z$, then $s_m(x)$ is a sum of finitely many terms.
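The B-splines $N_m$ can be evaluated without integration by the standard two-term recurrence for cardinal B-splines, $N_m(x)=\frac{x}{m-1}N_{m-1}(x)+\frac{m-x}{m-1}N_{m-1}(x-1)$, which is equivalent to the convolution definition above. The sketch below is an illustration; the name `bspline` is hypothetical, and $N_1$ is taken on the half-open interval $[0,1)$ (an assumption that differs from $\chi_{[0,1]}$ only at the single point $x=1$) so that the partition of unity holds pointwise.

```python
def bspline(m, x):
    """Cardinal B-spline N_m(x) of degree m - 1, supported on [0, m]."""
    if m == 1:
        # Half-open indicator of [0, 1): an assumption of this sketch.
        return 1.0 if 0.0 <= x < 1.0 else 0.0
    return (x * bspline(m - 1, x)
            + (m - x) * bspline(m - 1, x - 1.0)) / (m - 1)
```

For instance, $N_2$ is the hat function on $[0,2]$ with peak value 1 at $x=1$, and summing `bspline(4, x - l)` over the integers `l` near `x` returns 1 up to rounding.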


4.5 Trigonometric interpolation and fast Fourier transform

Given $N$ points $x_0,x_1,\dots,x_{N-1}$, define the discrete Fourier transform as

$$X_k = \frac1N\sum_{n=0}^{N-1}x_n\,e^{-in\frac{2\pi k}{N}} \qquad (k=0,\dots,N-1).$$

Using the formula

$$\frac1N\sum_{j=0}^{N-1}e^{-ik\frac{2\pi j}{N}}\,e^{im\frac{2\pi j}{N}} = \delta_{km} \qquad (0\le k,m\le N-1),$$

where $\delta_{km}$ is the Kronecker delta, the inverse discrete Fourier transform is

$$x_n = \sum_{k=0}^{N-1}X_k\,e^{ik\frac{2\pi n}{N}} \qquad (n=0,1,\dots,N-1).$$

This implies the following formula.

Trigonometric interpolation formula

Let $f(x)$ be defined on $[0,1]$ and let $x_j=\frac jN$ ($j=0,1,\dots,N-1$) be $N$ nodes. Denote

$$c_k = \frac1N\sum_{j=0}^{N-1}f(x_j)\,e^{-2\pi ikx_j} \qquad (k=0,\dots,N-1).$$

Then the trigonometric polynomial $P(x)=\sum_{k=0}^{N-1}c_k\,e^{2\pi ikx}$ satisfies $P(x_j)=f(x_j)$ ($j=0,1,\dots,N-1$).
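The coefficients $c_k$ are a discrete Fourier transform of the samples $f(x_j)$ divided by $N$, so they can be computed with `numpy.fft.fft`, whose forward transform uses the same sign convention $e^{-2\pi ikj/N}$ but no $1/N$ factor. The sketch below is an illustration; the names `trig_interp_coeffs` and `trig_poly` are hypothetical.

```python
import numpy as np

def trig_interp_coeffs(f_vals):
    """c_k = (1/N) * sum_j f(x_j) exp(-2 pi i k x_j) with x_j = j / N."""
    f_vals = np.asarray(f_vals)
    return np.fft.fft(f_vals) / len(f_vals)

def trig_poly(c, x):
    """Evaluate P(x) = sum_k c_k exp(2 pi i k x) at a scalar x."""
    k = np.arange(len(c))
    return np.sum(c * np.exp(2j * np.pi * k * x))
```

Evaluating `trig_poly` at the nodes $x_j=j/N$ returns the original samples up to rounding; the result is complex-valued in general, with a negligible imaginary part at the nodes when the samples are real.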

Fast Fourier transform Fast Fourier transform is a fast algorithm computing discrete Fourier transform by the halving trick. Given a 2N -point time series x = (x0 , x1 , . . . , x2N −1 ), its discrete Fourier transform is N 1 2 −1 −in 2πk X k = N ∑ x n e 2N (k = 0, 1, . . . , 2N − 1). (4.5.1) 2 n=0 Now we halve X k by the halving trick. First, we compute the first half X0 , X1 , . . . , X2N−1 −1 . Decompose the given 2N point time series x into two 2N−1 -point time series u = (x0 , x2 , . . . , x2N −2 ) =: (u0 , u1 , . . . , u2N−1 −1 ), υ = (x1 , x3 , . . . , x2N −1 ) =: (υ0 , υ1 , . . . , υ2N−1 −1 ),

4.5 Trigonometric interpolation and fast Fourier transform

|

117

i.e., u is an even sample and υ is an odd sample of x. From (4.5.1), it follows that Xk =

1 2N

and so

2N−1 −1

∑ un e

+

n=0

where

Vk =

2

1 2N

2N−1 −1

∑ υn e

−(2n+1)i 2πk N 2

,

n=0

1 −i 2πk (U k + e 2N V k ) (k = 0, 1, . . . , 2N−1 − 1), 2

Xk =

Uk =

−2ni 2πk N

2N−1 −1

1 2N−1 2N−1

2πk 2N−1

−in

2πk 2N−1

,

n=0 2N−1 −1

1

−in

∑ un e ∑ υn e

(k = 0, 1, . . . , 2N−1 − 1).

n=0

Similarly, the second half X2N−1 , X2N−1 +1 , . . . , X2N −1 is computed as follows: X k+2N−1 =

1 −i 2πk (U k − e 2N V k ) (k = 0, 1, . . . , 2N−1 − 1). 2

where $U_k$ and $V_k$ are stated as above. We continue to halve the obtained $U_k$ and $V_k$ by the halving trick. Halving $U_k$ $(k = 0, 1, \dots, 2^{N-1}-1)$ gives

$$U_k = \frac{1}{2}\left(U'_k + e^{-i\frac{2\pi k}{2^{N-1}}}\, U''_k\right), \qquad U_{k+2^{N-2}} = \frac{1}{2}\left(U'_k - e^{-i\frac{2\pi k}{2^{N-1}}}\, U''_k\right) \qquad (k = 0, 1, \dots, 2^{N-2}-1),$$

where $U'_k$ and $U''_k$ are the discrete Fourier transforms of the two $2^{N-2}$-point time series which consist of the even samples and the odd samples of $u$, respectively. Halving $V_k$ $(k = 0, 1, \dots, 2^{N-1}-1)$ gives

$$V_k = \frac{1}{2}\left(V'_k + e^{-i\frac{2\pi k}{2^{N-1}}}\, V''_k\right), \qquad V_{k+2^{N-2}} = \frac{1}{2}\left(V'_k - e^{-i\frac{2\pi k}{2^{N-1}}}\, V''_k\right) \qquad (k = 0, 1, \dots, 2^{N-2}-1),$$

where $V'_k$ and $V''_k$ are the discrete Fourier transforms of the two $2^{N-2}$-point time series which consist of the even samples and the odd samples of $\upsilon$, respectively. Continue this procedure until one-point time series are reached.

Using the fast Fourier transform algorithm, the total number of multiplication operations is equal to $N 2^{N-1}$, whereas the original discrete Fourier transform algorithm requires $2^{2N}$ multiplication operations. This means that the fast Fourier transform has better computational efficiency.
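The halving trick translates directly into a recursive routine. The following is a minimal sketch, assuming NumPy and an input length that is a power of two; the name `fft_halving` is illustrative. It keeps the $1/2^N$ normalisation of (4.5.1), so its output equals NumPy's unnormalised `np.fft.fft` divided by the series length.

```python
import numpy as np

def fft_halving(x):
    """DFT with 1/len(x) normalisation as in (4.5.1); len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x.copy()                  # one-point series: X_0 = x_0
    U = fft_halving(x[0::2])             # transform of the even samples u
    V = fft_halving(x[1::2])             # transform of the odd samples v
    k = np.arange(n // 2)
    w = np.exp(-2j * np.pi * k / n)      # factors e^{-i 2 pi k / 2^N}
    first_half = 0.5 * (U + w * V)       # X_k
    second_half = 0.5 * (U - w * V)      # X_{k + 2^{N-1}}
    return np.concatenate([first_half, second_half])

rng = np.random.default_rng(0)
x = rng.standard_normal(16)              # hypothetical 2^4-point series
X = fft_halving(x)
```

The recursion is written for clarity rather than speed; production code would normally call an optimised library routine.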


4.6 Bivariate interpolation

Given a set of nodes in the $xy$-plane $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, where each node $(x_i, y_i)$ is associated with a real number $c_i$, we find a smooth and easily computed function $F$ such that $F(x_i, y_i) = c_i$ $(1 \le i \le n)$.

4.6.1 Cartesian product and grids

Assume that the set of nodes is a Cartesian product $\mathcal{N} = \{x_1, x_2, \dots, x_p\} \times \{y_1, y_2, \dots, y_q\}$, i.e., a grid $\mathcal{N} = \{(x_i, y_j) : 1 \le i \le p,\ 1 \le j \le q\}$. Let $u_i(x)$ $(i = 1, \dots, p)$ be univariate real functions such that $u_i(x_j) = \delta_{ij}$ $(1 \le i, j \le p)$, where $\delta_{ij}$ is the Kronecker delta. For example, $u_i(x)$ may be the fundamental polynomials in the Lagrange interpolation formula (see (4.2.1)). Let $f$ be a bivariate function. Define an operator $P$ as

$$(Pf)(x, y) = \sum_{i=1}^{p} f(x_i, y)\, u_i(x).$$

The function $Pf$ is a bivariate function that interpolates $f$ on the vertical lines $L_i : \{(x_i, y) : -\infty < y < \infty\}$ $(i = 1, \dots, p)$. Define another operator $Q$ as

$$(Qf)(x, y) = \sum_{j=1}^{q} f(x, y_j)\, \upsilon_j(y),$$

where $\upsilon_j(y_i) = \delta_{ij}$ $(1 \le i, j \le q)$. The function $Qf$ is a bivariate function that interpolates $f$ on the horizontal lines $L_j : \{(x, y_j) : -\infty < x < \infty\}$ $(j = 1, \dots, q)$. Then $P(Qf)$ is a function that interpolates $f$ at the nodes $(x_i, y_j)$ $(i = 1, \dots, p;\ j = 1, \dots, q)$. In fact, from

$$P(Qf)(x, y) = P\left(\sum_{j=1}^{q} f(x, y_j)\, \upsilon_j(y)\right) = \sum_{i=1}^{p}\sum_{j=1}^{q} f(x_i, y_j)\, u_i(x)\, \upsilon_j(y),$$

it follows that $P(Qf)(x_i, y_j) = f(x_i, y_j)$ $(i = 1, \dots, p;\ j = 1, \dots, q)$.
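The double sum for $P(Qf)$ can be sketched as follows, assuming NumPy and taking both $u_i$ and $\upsilon_j$ to be Lagrange fundamental polynomials (one admissible choice among others). The helper names and the test function are hypothetical.

```python
import numpy as np

def lagrange_basis(nodes, i, t):
    """Fundamental Lagrange polynomial u_i(t), with u_i(nodes[j]) = delta_ij."""
    val = 1.0
    for j, tj in enumerate(nodes):
        if j != i:
            val *= (t - tj) / (nodes[i] - tj)
    return val

def grid_interpolate(xs, ys, F, x, y):
    """Evaluate P(Qf)(x, y) = sum_i sum_j F[i, j] * u_i(x) * v_j(y)."""
    u = np.array([lagrange_basis(xs, i, x) for i in range(len(xs))])
    v = np.array([lagrange_basis(ys, j, y) for j in range(len(ys))])
    return u @ F @ v

xs = np.array([0.0, 0.5, 1.0])
ys = np.array([0.0, 1.0, 2.0, 3.0])
f = lambda x, y: x**2 + x * y + 1.0            # hypothetical test function
F = np.array([[f(xi, yj) for yj in ys] for xi in xs])
value = grid_interpolate(xs, ys, F, 0.3, 1.7)
```

Because the test function is a polynomial of degree 2 in $x$ and degree 1 in $y$, the interpolant on this $3 \times 4$ grid reproduces it exactly, not only at the nodes.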

4.6.2 Tensor product

A function $\sum_{0 \le i+j \le k} c_{ij}\, x^i y^j$ is called a bivariate polynomial of degree $\le k$, where the $c_{ij}$ are constants. The space of all bivariate polynomials of degree at most $k$ is denoted by $\Pi_k(\mathbb{R}^2)$. The set $\{x^i y^j\}_{0 \le i+j \le k}$ is a basis of $\Pi_k(\mathbb{R}^2)$. So the dimension of $\Pi_k(\mathbb{R}^2)$ is $\frac{1}{2}(k+1)(k+2)$.

For a node set $\mathcal{N}$, we ask whether interpolation is possible on $\mathcal{N}$. The following results are known:
(a) Assume that the set $\mathcal{N}$ consists of nodes $(x_1, y_1), (x_2, y_2), \dots, (x_{\tilde{k}}, y_{\tilde{k}})$, where $\tilde{k} = \frac{1}{2}(k+1)(k+2)$, that these nodes lie on lines $L_0, L_1, \dots, L_k$, and that for each $i$ the line $L_i$ contains just $i+1$ nodes. Then, for arbitrary associated data $c_1, c_2, \dots, c_{\tilde{k}}$, there is a polynomial $p \in \Pi_k(\mathbb{R}^2)$ such that $p(x_l, y_l) = c_l$ $(l = 1, 2, \dots, \tilde{k})$.
(b) For any set of $k+1$ distinct nodes $(x_1, y_1), (x_2, y_2), \dots, (x_{k+1}, y_{k+1})$ and data $c_1, c_2, \dots, c_{k+1}$, there is a $p \in \Pi_k(\mathbb{R}^2)$ such that $p(x_l, y_l) = c_l$ $(l = 1, \dots, k+1)$.

4.6.3 Shepard interpolation

Let $p_i = (x_i, y_i)$ $(i = 1, \dots, n)$ be $n$ nodes. We select a real-valued function $\varphi$ on $\mathbb{R}^2 \times \mathbb{R}^2$ such that $\varphi(p, q) = 0$ if and only if $p = q$. Define cardinality functions

$$u_i(p) = \prod_{\substack{j=1,\dots,n \\ j \ne i}} \frac{\varphi(p, p_j)}{\varphi(p_i, p_j)} \qquad (1 \le i \le n).$$

This leads to an interpolation formula of a function $f$ at the given nodes $p_i$ as follows:

$$F = \sum_{i=1}^{n} f(p_i)\, u_i.$$

In particular, if $\varphi(p, p_j) = \|p - p_j\|^2 = (x - x_j)^2 + (y - y_j)^2$, where $p = (x, y)$ and $p_j = (x_j, y_j)$, then the interpolation formula is

$$F(x, y) = \sum_{i=1}^{n} f(x_i, y_i) \prod_{\substack{j=1,\dots,n \\ j \ne i}} \frac{(x - x_j)^2 + (y - y_j)^2}{(x_i - x_j)^2 + (y_i - y_j)^2}.$$
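A direct transcription of the last formula is sketched below, assuming NumPy and the squared Euclidean distance for $\varphi$; the function name `shepard` and the node data are hypothetical.

```python
import numpy as np

def shepard(points, values, p):
    """F(p) = sum_i f(p_i) * prod_{j != i} |p - p_j|^2 / |p_i - p_j|^2."""
    points = np.asarray(points, dtype=float)
    p = np.asarray(p, dtype=float)
    total = 0.0
    for i, pi in enumerate(points):
        u = 1.0                                   # cardinality function u_i(p)
        for j, pj in enumerate(points):
            if j != i:
                u *= np.sum((p - pj) ** 2) / np.sum((pi - pj) ** 2)
        total += values[i] * u
    return total

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # hypothetical nodes
values = [1.0, 2.0, 3.0, 5.0]                               # hypothetical data c_i
at_nodes = [shepard(points, values, p) for p in points]
```

At each node the product for $u_i$ contains a zero factor unless $p = p_i$, which is why `at_nodes` reproduces `values`. The cost is $O(n^2)$ per evaluation point, so the sketch suits small node sets only.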

4.6.4 Triangulation

Triangulation is another method for interpolation problems. If a set of triangles $\Delta_1, \Delta_2, \dots, \Delta_m$ satisfies the following three conditions:
– each node is the vertex of some triangle $\Delta_s$;
– each vertex of a triangle in the set is a node;
– if a node belongs to a triangle, then the node must be a vertex of that triangle,
then this set is called a triangulation. In a triangle $\Delta_s$, define a linear function as

$$l_s(x, y) = \alpha_s x + \beta_s y + \gamma_s, \qquad (x, y) \in \Delta_s.$$

Choose $\alpha_s$, $\beta_s$, and $\gamma_s$ such that each linear function $l_s(x, y)$ $(s = 1, 2, \dots, m)$ takes the prescribed values $c_i$, $c_j$, and $c_k$ at the vertices $(x_i, y_i)$, $(x_j, y_j)$, and $(x_k, y_k)$ of $\Delta_s$, i.e.,

$$l_s(x_i, y_i) = c_i, \quad l_s(x_j, y_j) = c_j, \quad l_s(x_k, y_k) = c_k \qquad (s = 1, 2, \dots, m).$$

Then the obtained piecewise linear function is continuous on the union $\bigcup_{s=1}^{m} \Delta_s$.
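On a single triangle the three conditions form a $3 \times 3$ linear system for $(\alpha_s, \beta_s, \gamma_s)$. The following is a minimal sketch, assuming NumPy and a non-degenerate triangle; the function name and the data are hypothetical.

```python
import numpy as np

def linear_piece(vertices, c):
    """Solve alpha*x + beta*y + gamma = c at the three vertices of one triangle."""
    A = np.column_stack([np.asarray(vertices, dtype=float), np.ones(3)])
    return np.linalg.solve(A, np.asarray(c, dtype=float))

tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]     # hypothetical triangle
alpha, beta, gamma = linear_piece(tri, [1.0, 3.0, 2.0])
```

For this triangle the system gives $\gamma = 1$, $\alpha = 2$, $\beta = 1$. Building the triangulation itself (for example, a Delaunay triangulation) is a separate step not covered by this sketch.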


5 Statistical methods

In order to handle various environmental issues well, it is crucial to apply statistical methodology to ensure well-conducted data collection, to analyze and interpret environmental data, and to describe environmental changes with sound and validated models. In this chapter, we provide comprehensive coverage of the methodology used in the statistical investigation of environmental issues, including regression analysis, principal component analysis, discriminant analysis, cluster analysis, factor analysis, and canonical correlation analysis. All these statistical methods can be easily implemented in popular software packages such as SPSS, SAS, and R.

5.1 Linear regression

Linear regression analysis is the most widely used of all statistical techniques. It is used to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. The simplest linear regression model is

$$Y = \beta_0 + \beta_1 x + \varepsilon \qquad (\varepsilon \sim N(0, \sigma^2)), \tag{5.1.1}$$

where $Y$ is a dependent variable, $x$ is an explanatory variable, and $\varepsilon$ is the random error. If the data is a sample $\{Y_i, x_i\}_{i=1,\dots,n}$, then by (5.1.1),

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad (i = 1, \dots, n). \tag{5.1.2}$$

Common assumptions are that $\varepsilon_1, \dots, \varepsilon_n$ are independent, $E[\varepsilon_i] = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$ $(i = 1, \dots, n)$, and $\varepsilon_1, \dots, \varepsilon_n$ are normally distributed.

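5.1.1 Estimate of regression coefficients $\beta_0$ and $\beta_1$

Let $\hat{Y}_i$ be the estimate of $Y_i$, i.e.,

$$\hat{Y}_i = \beta_0 + \beta_1 x_i \qquad (i = 1, \dots, n). \tag{5.1.3}$$

Use the method of least squares to estimate $\beta_0$ and $\beta_1$. Let

$$Q(\beta_0, \beta_1) := \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i)^2. \tag{5.1.4}$$

We need to choose $\hat{\beta}_0$, $\hat{\beta}_1$ such that $Q(\hat{\beta}_0, \hat{\beta}_1)$ attains the minimal value. This means solving the two equations

$$\frac{\partial Q}{\partial \beta_0} = 0, \qquad \frac{\partial Q}{\partial \beta_1} = 0.$$

DOI 10.1515/9783110424904-006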

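Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$. By (5.1.4),

$$\frac{\partial Q}{\partial \beta_0} = -2\sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i) = -2(n\bar{Y} - n\beta_0 - n\bar{x}\beta_1).$$

Since $\frac{\partial Q}{\partial \beta_0} = 0$,

$$\beta_0 = \bar{Y} - \beta_1 \bar{x}. \tag{5.1.5}$$

Denote $\tilde{x}_i = x_i - \bar{x}$ and $\tilde{Y}_i = Y_i - \bar{Y}$. By (5.1.4) and (5.1.5), we get

$$\begin{aligned}
\frac{\partial Q}{\partial \beta_1} &= -2\sum_{i=1}^{n} \bigl(Y_i - (\beta_0 + \beta_1 x_i)\bigr)x_i = -2\sum_{i=1}^{n} \bigl(\tilde{Y}_i + \bar{Y} - (\beta_0 + \beta_1 x_i)\bigr)x_i \\
&= -2\sum_{i=1}^{n} \bigl(\tilde{Y}_i + (\beta_0 + \beta_1 \bar{x}) - (\beta_0 + \beta_1 x_i)\bigr)x_i = -2\sum_{i=1}^{n} (\tilde{Y}_i - \beta_1 \tilde{x}_i)\,x_i \\
&= -2\sum_{i=1}^{n} (\tilde{Y}_i - \beta_1 \tilde{x}_i)\,\tilde{x}_i = -2S_{xY} + 2\beta_1 S_{xx},
\end{aligned}$$

where $S_{xY} = \sum_{i=1}^{n} \tilde{x}_i \tilde{Y}_i$ and $S_{xx} = \sum_{i=1}^{n} (\tilde{x}_i)^2$. The factor $x_i$ may be replaced by $\tilde{x}_i$ in the last line because $\sum_{i=1}^{n} (\tilde{Y}_i - \beta_1 \tilde{x}_i) = 0$. Since $\frac{\partial Q}{\partial \beta_1} = 0$,

$$\beta_1 S_{xx} = S_{xY}.$$

From this and (5.1.5), the regression coefficients $\beta_0$, $\beta_1$ are estimated by

$$\hat{\beta}_1 = \frac{S_{xY}}{S_{xx}} = \frac{\sum_{i=1}^{n} \tilde{x}_i \tilde{Y}_i}{\sum_{i=1}^{n} (\tilde{x}_i)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}.$$

The equation $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x$ is called the regression equation.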
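The closed-form estimates are short enough to compute directly. The following is a minimal sketch, assuming NumPy; the function name `fit_line` and the data are hypothetical.

```python
import numpy as np

def fit_line(x, Y):
    """Least squares estimates: b1 = S_xY / S_xx, b0 = mean(Y) - b1 * mean(x)."""
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_t = x - x.mean()                 # centred x values
    Y_t = Y - Y.mean()                 # centred Y values
    S_xY = np.sum(x_t * Y_t)
    S_xx = np.sum(x_t ** 2)
    b1 = S_xY / S_xx
    b0 = Y.mean() - b1 * x.mean()
    return b0, b1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # hypothetical sample
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
b0, b1 = fit_line(x, Y)
```

The result agrees with `np.polyfit(x, Y, 1)`, which returns the slope first and the intercept second.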

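5.1.2 Estimate of $\sigma^2$

By (5.1.1), $E[(Y - (\beta_0 + \beta_1 x))^2] = E[\varepsilon^2] = \sigma^2$. Now we estimate $\sigma^2$ using the sample $(Y_i, x_i)_{i=1,\dots,n}$. By $\hat{\beta}_1 = S_{xY}/S_{xx}$, the sum of squares satisfies

$$\begin{aligned}
\sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 &= \sum_{i=1}^{n} \bigl(Y_i - \bar{Y} - \hat{\beta}_1 (x_i - \bar{x})\bigr)^2 \\
&= \sum_{i=1}^{n} (Y_i - \bar{Y})^2 - 2\hat{\beta}_1 \sum_{i=1}^{n} (Y_i - \bar{Y})(x_i - \bar{x}) + (\hat{\beta}_1)^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
&= S_{YY} - \hat{\beta}_1 S_{xY},
\end{aligned}$$

where $S_{YY} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$. The corresponding statistic is $Q = S_{YY} - \hat{\beta}_1 S_{xY}$. It can be proved that

$$\frac{Q}{\sigma^2} \sim \chi^2(n-2).$$

So $E[Q/(n-2)] = \sigma^2$. This gives the unbiased estimator

$$\hat{\sigma}^2 = \frac{Q}{n-2} = \frac{S_{YY} - \hat{\beta}_1 S_{xY}}{n-2}.$$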


5.1.3 Decomposition formula

We have

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + 2\sum_{i=1}^{n} (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y}) + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2.$$

From $\tilde{Y}_i = Y_i - \bar{Y}$, $\hat{Y}_i = \bar{Y} + \hat{\beta}_1 \tilde{x}_i$, $\tilde{x}_i = x_i - \bar{x}$, and (5.1.5), the second term on the right-hand side satisfies

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)(\hat{Y}_i - \bar{Y}) = \sum_{i=1}^{n} (\tilde{Y}_i - \hat{\beta}_1 \tilde{x}_i)\,\hat{\beta}_1 \tilde{x}_i = \hat{\beta}_1 (S_{xY} - \hat{\beta}_1 S_{xx}) = 0,$$

since $\hat{\beta}_1 S_{xx} = S_{xY}$. Thus the sum of squares of the deviations $Y_i - \bar{Y}$ is decomposed as follows:

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \tag{5.1.6}$$

where $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$. In (5.1.6), the sum $Q := \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ is called the residual sum of squares; the sum $U := \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$ is called the regression sum of squares. So (5.1.6) can be rewritten as $S_{YY} = Q + U$. Since the residuals $Y_i - \hat{Y}_i$ represent the differences between the observed values and the fitted values, they can be used to check whether the assumed model fits the data well. Define

$$R^2 = \frac{U}{S_{YY}} = 1 - \frac{Q}{S_{YY}} = 1 - \frac{\sum_{i=1}^{n} (\hat{Y}_i - Y_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}.$$

Here $R^2$ is called the coefficient of determination, which indicates the usefulness of the regression model. If $R^2$ is close to 1, the regression model is very useful.
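The decomposition $S_{YY} = Q + U$, the coefficient of determination, and the variance estimate of Section 5.1.2 can be verified on a small sample. The following is a minimal sketch, assuming NumPy and hypothetical data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # hypothetical sample
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

x_t, Y_t = x - x.mean(), Y - Y.mean()
S_xx, S_xY, S_YY = np.sum(x_t**2), np.sum(x_t * Y_t), np.sum(Y_t**2)
b1 = S_xY / S_xx
b0 = Y.mean() - b1 * x.mean()
Y_hat = b0 + b1 * x

Q = np.sum((Y - Y_hat) ** 2)           # residual sum of squares
U = np.sum((Y_hat - Y.mean()) ** 2)    # regression sum of squares
R2 = U / S_YY                          # coefficient of determination
sigma2_hat = Q / (n - 2)               # unbiased estimate of sigma^2
```

Up to rounding, `Q + U` equals `S_YY` and `Q` equals `S_YY - b1 * S_xY`, matching the two identities derived in the text.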

5.1.4 Extensions

Many more complicated regressions may be reduced to linear regression problems. Here we give some examples.

For the model $G(Y) = \beta_0 + \beta_1 g(x) + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2)$, where $\beta_0$, $\beta_1$, and $\sigma^2$ are independent of $x$, let the samples be $\{x_i, Y_i\}_{i=1,\dots,n}$. Let $Y^* = G(Y)$ and $x^* = g(x)$. Then $Y^* = \beta_0 + \beta_1 x^* + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$, is a linear regression model and the samples are $\{g(x_i), G(Y_i)\}_{i=1,\dots,n}$. By using the least squares method, the regression coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ can be estimated and the regression model is $G(Y) = \hat{\beta}_0 + \hat{\beta}_1 g(x) + \varepsilon$.

For the model $Y = a e^{bx} \varepsilon$ with $\log \varepsilon \sim N(0, \sigma^2)$ and the model $Y = a x^b \varepsilon$ with $\log \varepsilon \sim N(0, \sigma^2)$, where $a$, $b$, and $\varepsilon$ are independent of $x$, taking logarithms on both sides transforms these two models into the linear regression models $\log Y = \log a + bx + \log \varepsilon$ and $\log Y = \log a + b \log x + \log \varepsilon$, respectively.
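The logarithmic transformation can be illustrated with the exponential model. The following is a minimal sketch, assuming NumPy; the parameter values are hypothetical and the data are generated without noise so that the fit recovers them exactly.

```python
import numpy as np

a_true, b_true = 2.0, 0.5              # hypothetical parameters of Y = a * exp(b x)
x = np.linspace(0.0, 4.0, 9)
Y = a_true * np.exp(b_true * x)        # noise-free illustrative data

# Linear regression of log Y on x: log Y = log a + b x
b_hat, log_a_hat = np.polyfit(x, np.log(Y), 1)
a_hat = np.exp(log_a_hat)
```

For the power model $Y = a x^b \varepsilon$, the same call would regress `np.log(Y)` on `np.log(x)` (with $x > 0$).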

5.1.5 Nonlinear regression models

In general, the univariate regression model is

$$Y = \mu(x; \beta_1, \dots, \beta_p) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),$$

where $\beta_1, \dots, \beta_p$, and $\sigma^2$ are independent of $x$. If $\mu$ is a linear function of $\beta_1, \dots, \beta_p$, then the model is called a linear regression model; otherwise, it is called a nonlinear regression model. Let $\{x_i, Y_i\}_{i=1,\dots,n}$ $(n > p)$ be a known sample. A general nonlinear regression model can be written as

$$Y_i = \mu(x_i; \beta_1, \dots, \beta_p) + \varepsilon_i \qquad (i = 1, \dots, n).$$

For a nonlinear regression model, the assumptions on the random errors $\varepsilon_i$ are the same as those for a linear model. Use least squares to estimate $\beta_1, \dots, \beta_p$. Let

$$Q(\beta_1, \dots, \beta_p) = \sum_{i=1}^{n} \bigl(Y_i - \mu(x_i; \beta_1, \dots, \beta_p)\bigr)^2.$$

Differentiating both sides of the equation with respect to $\beta_k$ and setting the derivative to zero gives

$$\frac{\partial Q}{\partial \beta_k} = -2\sum_{i=1}^{n} \bigl(Y_i - \mu(x_i; \beta_1, \dots, \beta_p)\bigr)\frac{\partial \mu}{\partial \beta_k}(x_i; \beta_1, \dots, \beta_p) = 0 \qquad (k = 1, \dots, p).$$

One often solves these equations using the Newton–Raphson iterative algorithm (see Chapter 7) and obtains the estimates $\hat{\beta}_k$ of $\beta_k$ $(k = 1, \dots, p)$. Finally, the estimate of $Y$ is

$$Y \approx \mu(x; \hat{\beta}_1, \dots, \hat{\beta}_p).$$

5.2 Multiple regression

In this section, we discuss multivariate linear regression models with one or multiple dependent variables. The multivariate regression model incorporates the correlation between the responses. So it may provide more efficient inference than separate univariate regression models, but it is more complicated and more difficult to handle.


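5.2.1 A dependent variable case

In multivariate analysis, we choose $Y$ as a response and the variables $x_1, \dots, x_p$ as explanatory variables, and take a sample $\{Y_i; x_{i1}, \dots, x_{ip}\}_{i=1,\dots,n}$. Multivariate linear regression models attempt to find an approximate relationship

$$Y = \sum_{k=0}^{p} \beta_k x_k + \varepsilon \qquad (\varepsilon \sim N(0, \sigma^2 I)), \qquad x_0 = 1. \tag{5.2.1}$$

This implies that

$$Y_i = \sum_{k=0}^{p} \beta_k x_{ik} + \varepsilon_i \qquad (\varepsilon_i \sim N(0, \sigma^2)), \qquad x_{i0} = 1 \quad (i = 1, \dots, n)$$

and the $\{\varepsilon_i\}$ are independent. The matrix form is $Y = X\beta + \varepsilon$, where $Y = (Y_1, \dots, Y_n)^T$, $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^T$, $\beta = (\beta_0, \beta_1, \dots, \beta_p)^T$, and

$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}.$$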

(a) Estimate of regression coefficients $\beta = (\beta_0, \dots, \beta_p)$

Choose $\{\beta_k\}_{k=0,\dots,p}$ such that the following sum of squares attains the minimal value:

$$Q(\beta) = \sum_{i=1}^{n} \Bigl(Y_i - \sum_{k=0}^{p} \beta_k x_{ik}\Bigr)^2 = (Y - X\beta)^T (Y - X\beta). \tag{5.2.2}$$

Let $\frac{\partial Q}{\partial \beta_j} = 0$ $(j = 0, \dots, p)$. Then $\sum_{i=1}^{n} \bigl(Y_i - \sum_{k=0}^{p} \beta_k x_{ik}\bigr) x_{ij} = 0$. This is equivalent to

$$\sum_{k=0}^{p} \Bigl(\sum_{i=1}^{n} x_{ik} x_{ij}\Bigr) \beta_k = \sum_{i=1}^{n} Y_i x_{ij} \qquad (j = 0, \dots, p).$$

Its matrix form is $X^T X \beta = X^T Y$. So $\hat{\beta} = (X^T X)^{-1} X^T Y$ is the least squares estimate of $\beta$. Moreover, it is an unbiased estimate of $\beta$. It can be proved that $\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1})$. From this and (5.2.2), it follows that

$$Q(\hat{\beta}) = \sum_{i=1}^{n} \Bigl(Y_i - \sum_{k=0}^{p} \hat{\beta}_k x_{ik}\Bigr)^2 = Y^T Y - Y^T X \hat{\beta},$$

$$\hat{\sigma}^2 = \frac{Q(\hat{\beta})}{n-p-1} = \frac{1}{n-p-1} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2,$$

where $\hat{Y}_i = \sum_{k=0}^{p} \hat{\beta}_k x_{ik}$ $(i = 1, \dots, n)$ and $\hat{\sigma}^2$ is an unbiased estimate of $\sigma^2$. The following proposition holds.

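For the model (5.2.1),

$$\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1}), \qquad \frac{Q(\hat{\beta})}{\sigma^2} \sim \chi^2_{n-p-1}.$$

If $\beta_1 = \cdots = \beta_p = 0$, then $V/\sigma^2 \sim \chi^2_p$, where $V = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$.

To test the hypothesis $H_0$: $\beta_1 = \cdots = \beta_p = 0$, define a test statistic

$$F = \frac{V/p}{Q/(n-p-1)}.$$

If $\beta_1 = \cdots = \beta_p = 0$, then $F \sim F(p, n-p-1)$. From the known sample, we compute the value of $F$ and the significance probability $\tilde{p}$. If $\tilde{p}$ is less than the significance level $\alpha$, then we reject the hypothesis $H_0$. Otherwise, the hypothesis $H_0$ is not rejected.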
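The normal equations and the $F$ statistic can be sketched as follows, assuming NumPy; the simulated data and the coefficient values are hypothetical. The significance probability $\tilde{p}$ is not computed here, since it requires the $F$ distribution function from a statistics library.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
# Design matrix with the leading column x_{i0} = 1
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
beta_true = np.array([1.0, 2.0, -0.5])          # hypothetical coefficients
Y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)    # solves X^T X beta = X^T Y
Y_hat = X @ beta_hat

Q = np.sum((Y - Y_hat) ** 2)                    # residual sum of squares Q(beta_hat)
V = np.sum((Y_hat - Y.mean()) ** 2)             # regression sum of squares
sigma2_hat = Q / (n - p - 1)
F = (V / p) / (Q / (n - p - 1))                 # test statistic for H0
```

Solving the normal equations with `np.linalg.solve` avoids forming $(X^T X)^{-1}$ explicitly. For ill-conditioned design matrices, `np.linalg.lstsq` is usually the safer choice.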

(b) Model selection

If, in a regression model, some variable does not have a significant effect on the response, then we may remove it from the model. In general, a simple model is better than a complex model. In a linear regression model, suppose the first model chosen is

$$Y = \sum_{k=0}^{p} \beta_k x_k + \varepsilon_1 \quad (x_0 = 1), \qquad \hat{Y}^{(1)}_i = \sum_{k=0}^{p} \hat{\beta}_k x_{ik} \quad (x_{i0} = 1)$$

and the second model chosen is

$$Y = \sum_{k=0}^{q} b_k x_k + \varepsilon_2 \quad (x_0 = 1), \qquad \hat{Y}^{(2)}_i = \sum_{k=0}^{q} \hat{b}_k x_{ik} \quad (x_{i0} = 1),$$

where $p > q$. Denote

$$R_1 = \sum_{i=1}^{n} (Y_i - \hat{Y}^{(1)}_i)^2, \qquad R_2 = \sum_{i=1}^{n} (Y_i - \hat{Y}^{(2)}_i)^2.$$

If

$$F = \frac{(R_2 - R_1)/(p-q)}{R_1/(n-p)} > F_\alpha(p-q, n-p),$$

then we choose the first model. Otherwise, we choose the second model.


5.2.2 Multiple dependent variable case

Suppose that there are $p$ dependent variables $Y_1, \dots, Y_p$ and $m$ independent variables $x_1, \dots, x_m$, and that the data matrices are $X = (x_{ij})_{n \times m}$ and $Y = (Y_{ij})_{n \times p}$. If

$$Y_{ij} = \beta_{0j} + \sum_{k=1}^{m} \beta_{kj} x_{ik} + \varepsilon_{ij} \qquad (i = 1, \dots, n;\ j = 1, \dots, p),$$

then $Y = (I_n | X)\beta + E = C\beta + E$, where

$$I_n = (1, 1, \dots, 1)^T, \qquad C = (I_n | X), \qquad \beta = (\beta_{ij})_{(m+1) \times p}, \qquad E = (\varepsilon_{ij})_{n \times p}.$$

Here $I_n$ denotes the $n$-dimensional column vector of ones. Assume that $\varepsilon_i = (\varepsilon_{i1}, \dots, \varepsilon_{ip})^T$ $(i = 1, \dots, n)$ are independent, $E[\varepsilon_i] = 0$, the $\varepsilon_i$ have the same covariance matrix $\Sigma$, and $\varepsilon_i \sim N_p(0, \Sigma)$ $(i = 1, \dots, n)$. The model $Y = (I_n | X)\beta + E = C\beta + E$ is called a multivariate linear regression model of multiple dependent variables, where $Y$, $E$ are random matrices and $C = (I_n | X)$. Similar to the single dependent variable case, the parameter matrix $\beta$ and covariance matrix $\Sigma$ are estimated using the least squares method as follows:

$$\hat{\beta} = (C^T C)^{-1} C^T Y =: (b_{ij})_{(m+1) \times p}.$$

So the regression equation with $p$ dependent variables is

$$\hat{Y}_j = b_{0j} + b_{1j} x_1 + \cdots + b_{mj} x_m \qquad (j = 1, \dots, p).$$

5.3 Case study: Tree-ring-based climate reconstructions Due to the lack of reliable long-term meteorological records, it is hard to understand past climate change. Reconstructions of past climate conditions may be derived from various paleoclimatology proxies. Compared with other proxies, tree-ring-based reconstructions have many advantages, including wide spatial distribution, high climate sensitivity, high annual resolution and calendar-exact dating. Currently, the world’s longest tree-ring chronology extends over more than 7000 years.

5.3.1 Collection of tree-ring data

Since tree radial growth is always subject to climatic influences, one can use various parameters from sample tree rings to reconstruct historical temperature and precipitation. In order to reconstruct past temperature, the sample sites of tree-ring data should be in upper-elevation tree-line locations and cold mountain valley environments. In order to reconstruct past precipitation, sample sites should be on a steep, rocky, south-facing slope. At each sample site, core samples are collected from trees at breast height by removing a cylinder of wood 5 mm in diameter along the radius of the tree. Core samples are air dried and polished, then tree-ring widths are measured with a precision of 0.01 mm by using the LINTAB system or a similar system. After eliminating age-related growth trends, tree-ring width data need to be standardized using the program ARSTAN in order to get tree-ring width chronologies. Besides tree-ring widths, tree-ring isotopic data can also be used to reconstruct past climate. Each ring of the tree core samples is cut out by using a scalpel blade and cellulose is extracted; the $\delta^{18}\mathrm{O}$ and $\delta^{13}\mathrm{C}$ values of the cellulose are then measured by using a stable isotope ratio mass spectrometer.

5.3.2 Tree-ring-based climate reconstruction

Main tree-ring related parameters include: $a_1$ tree-ring width; $a_2$ $\delta^{18}\mathrm{O}$; $a_3$ $\delta^{13}\mathrm{C}$; $a_4$ mean latewood density.

Main reconstructed climate parameters include: $b_1$ temperature; $b_2$ precipitation; $b_3$ runoff; $b_4$ drought; $b_5$ $\mathrm{CO}_2$ concentration $C_a$.

Correlation analysis is used to examine the relationship between tree-ring related parameters $a_i$ and climate parameters $b_j$. Suppose that $X_i^{(1)}, \dots, X_i^{(n)}$ is the tree-ring chronology for parameter $a_i$ and $Y_j^{(1)}, \dots, Y_j^{(n)}$ is the observation of climate parameter $b_j$. Then the estimate of the correlation coefficient is

$$r_{ij} = \frac{\sum_{l=1}^{n} (X_i^{(l)} - \bar{X}_i)(Y_j^{(l)} - \bar{Y}_j)}{\sqrt{\sum_{l=1}^{n} (X_i^{(l)} - \bar{X}_i)^2}\ \sqrt{\sum_{l=1}^{n} (Y_j^{(l)} - \bar{Y}_j)^2}},$$

where $\bar{X}_i = \frac{1}{n}\sum_{l=1}^{n} X_i^{(l)}$ and $\bar{Y}_j = \frac{1}{n}\sum_{l=1}^{n} Y_j^{(l)}$. If $|r_{ij}|$ is large, we can use the tree-ring parameter $a_i$ to reconstruct the climate parameter $b_j$. Finally, with the help of the linear regression formula, we can reconstruct the climate parameter $b_j$ from the tree-ring parameter $a_i$ as follows:

$$b_j = \beta + \beta_1 a_i + \varepsilon_{i,j} \qquad (\varepsilon_{i,j} \sim N(0, \sigma^2)),$$

where $\beta$, $\beta_1$, and $\sigma^2$ are estimated from samples of $a_i$ and $b_j$ (see Section 5.1). Below we give some results on tree-ring-based reconstruction of climate.
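The correlation estimate can be computed directly from its definition. The following is a minimal sketch, assuming NumPy; the function name `corr` and both series are invented for illustration and are not data from the studies discussed here.

```python
import numpy as np

def corr(X, Y):
    """Sample correlation coefficient r between two equally long series."""
    Xc = X - X.mean()
    Yc = Y - Y.mean()
    return np.sum(Xc * Yc) / (np.sqrt(np.sum(Xc**2)) * np.sqrt(np.sum(Yc**2)))

# Hypothetical tree-ring width index and precipitation (mm) for six years
trw = np.array([0.82, 1.10, 0.95, 1.30, 0.70, 1.05])
precip = np.array([48.0, 110.0, 80.0, 160.0, 30.0, 95.0])
r = corr(trw, precip)
```

The value agrees with the off-diagonal entry of `np.corrcoef(trw, precip)`.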


The Hulunbuir region between 47–53° N and 115–126° E is extremely sensitive to climate changes. Therefore, it is an ideal region in which to carry out tree-ring research. Y. Liu et al. (2009) collected tree core samples and found that tree-ring width (TRW) is highly correlated with the precipitation $P_{76}$ from the previous July to the current June, with correlation coefficient $r = 0.711$. They give the following linear regression formula:

$$P_{76} = 222.408\,\mathrm{TRW} - 133.115$$

to reconstruct $P_{76}$ since 1865. During the calibration period 1952–2003, the total precipitation reconstructed by Liu et al. tracked the observations very well. In 2012, G. Bao et al. further reconstructed the April–September mean maximum temperature ($\mathrm{MMT}_{49}$) from 1868 to 2008 from tree-ring width chronologies by using the following linear regression formula:

$$\mathrm{MMT}_{49} = -2.807\,\mathrm{TRW} + 21.288.$$

Moreover, Bao et al. (2012) found significant correlations between the reconstructed MMT and the Pacific Decadal Oscillation/Niño. This explains the influences of large-scale atmospheric-oceanic variability on regional temperature and droughts in the Hulunbuir grassland.

The Luoshan Mountains are in the southern part of the Tengger Desert of China and are surrounded by land subject to desertification. The Palmer Drought Severity Index (PDSI) is a standardized measure of surface moisture conditions. In 2013, Y. Wang et al. used tree-ring width to reconstruct the annual PDSI in the Tengger Desert for the period 1897–2007 as follows:

$$\mathrm{PDSI} = 4.90 + 4.15\,\mathrm{TRW}.$$

In the West Tianmu Mountains of China, X. Zhao et al. (2006) collected tree-ring samples and reconstructed the atmospheric $\mathrm{CO}_2$ concentration $C_a$ by using tree-ring $\delta^{13}\mathrm{C}$ values:

$$C_a = 8598 + 810.922\,\delta^{13}\mathrm{C} + 19.748\,(\delta^{13}\mathrm{C})^2.$$

This is a curvilinear regression formula. Their results show that in 1685–1840, the evaluated atmospheric $\mathrm{CO}_2$ concentration was stable, but after 1840 it exhibited a rapid increase.

In the Wuyi Mountains of China, F. Chen et al. (2012) used the tree-ring width of the current and previous years to reconstruct the July–October minimum temperature $\mathrm{MT}_{710}$ in 1803–2008 as follows:

$$\mathrm{MT}_{710}(t) = 22.662 - 150\,\mathrm{TRW}(t) - 1.567\,\mathrm{TRW}(t-1).$$

This is a nonlinear regression formula. From this, they showed that there is a strong relationship between the reconstruction and the Summer Asian-Pacific Oscillation, which suggests linkages of regional temperature variability with the Asian-Pacific climate system.


5.4 Covariance analysis

The analysis of covariance is used to adjust or control for differences between groups. Consider several independent random variables that have normal distributions with unknown means and unknown but common variance. A test of the equality of several means is called analysis of variance. Suppose that $X_1, \dots, X_m$ are independent and each $X_k \sim N(\mu_k, \sigma^2)$ $(k = 1, \dots, m)$. Let $(X_{1k}, \dots, X_{nk})$ be a sample of each $X_k$. We test the hypothesis

$$H_0 : \mu_1 = \mu_2 = \cdots = \mu_m = \mu.$$

Denote

$$\bar{X}_{\cdot k} = \frac{1}{n}\sum_{j=1}^{n} X_{jk}, \qquad \bar{X}_{\cdot\cdot} = \frac{1}{mn}\sum_{j=1}^{n}\sum_{k=1}^{m} X_{jk}.$$

It is easily deduced that the sum of squares

$$mnS^2 = \sum_{j=1}^{n}\sum_{k=1}^{m} (X_{jk} - \bar{X}_{\cdot\cdot})^2$$

has the decomposition formula

$$mnS^2 = \sum_{j=1}^{n}\sum_{k=1}^{m} (X_{jk} - \bar{X}_{\cdot k})^2 + n\sum_{k=1}^{m} (\bar{X}_{\cdot k} - \bar{X}_{\cdot\cdot})^2 =: Q_1 + Q_2.$$

Note that the sample variance of $X_k$ is $S_k^2 = \frac{1}{n-1}\sum_{j=1}^{n} (X_{jk} - \bar{X}_{\cdot k})^2$ and $E[S_k^2] = \mathrm{Var}(X_k) = \sigma^2$. Then

$$E[Q_1] = \sum_{k=1}^{m} E[(n-1)S_k^2] = (n-1)m\sigma^2. \tag{5.4.1}$$

Denote $\mu = \frac{1}{m}\sum_{k=1}^{m} \mu_k$ and $\delta_k = \mu_k - \mu$ $(k = 1, \dots, m)$. By the independence of $X_{jk}$ $(j = 1, \dots, n;\ k = 1, \dots, m)$,

$$\bar{X}_{\cdot k} \sim N\Bigl(\mu_k, \frac{\sigma^2}{n}\Bigr), \qquad \bar{X}_{\cdot\cdot} \sim N\Bigl(\mu, \frac{\sigma^2}{nm}\Bigr),$$

and so the expectations of their squares are, respectively,

$$E[\bar{X}_{\cdot k}^2] = \mathrm{Var}(\bar{X}_{\cdot k}) + \mu_k^2 = \frac{\sigma^2}{n} + (\mu + \delta_k)^2, \qquad E[\bar{X}_{\cdot\cdot}^2] = \mathrm{Var}(\bar{X}_{\cdot\cdot}) + \mu^2 = \frac{\sigma^2}{nm} + \mu^2.$$

Since $\bar{X}_{\cdot k} = \frac{1}{n}\sum_{j=1}^{n} X_{jk}$, we have $\sum_{k=1}^{m} \bar{X}_{\cdot k} = m\bar{X}_{\cdot\cdot}$, and so

$$Q_2 = n\sum_{k=1}^{m} (\bar{X}_{\cdot k} - \bar{X}_{\cdot\cdot})^2 = n\sum_{k=1}^{m} (\bar{X}_{\cdot k}^2 + \bar{X}_{\cdot\cdot}^2 - 2\bar{X}_{\cdot k}\bar{X}_{\cdot\cdot}) = n\sum_{k=1}^{m} \bar{X}_{\cdot k}^2 - nm\bar{X}_{\cdot\cdot}^2.$$

Note that $\sum_{k=1}^{m} \delta_k = \sum_{k=1}^{m} \mu_k - m\mu = 0$. Then,

$$\begin{aligned}
E[Q_2] &= n\sum_{k=1}^{m} E[\bar{X}_{\cdot k}^2] - nmE[\bar{X}_{\cdot\cdot}^2] = n\sum_{k=1}^{m} \Bigl(\frac{\sigma^2}{n} + (\mu + \delta_k)^2\Bigr) - nm\Bigl(\frac{\sigma^2}{nm} + \mu^2\Bigr) \\
&= (m-1)\sigma^2 + 2n\mu\sum_{k=1}^{m} \delta_k + n\sum_{k=1}^{m} \delta_k^2 = (m-1)\sigma^2 + n\sum_{k=1}^{m} \delta_k^2.
\end{aligned}$$

Comparing this with (5.4.1),

$$\frac{E[Q_2/(m-1)]}{E[Q_1/(m(n-1))]} \ge 1, \tag{5.4.2}$$

and equality in (5.4.2) holds if and only if $H_0$ is true, i.e., $\delta_k = 0$ $(k = 1, \dots, m)$.

The other test method is as follows. It is easy to prove that $\frac{Q_1}{\sigma^2} \sim \chi^2(m(n-1))$ and, under $H_0$, $\frac{Q_2}{\sigma^2} \sim \chi^2(m-1)$. Since $Q_1$ and $Q_2$ are independent,

$$\frac{Q_2/(m-1)}{Q_1/(m(n-1))} \sim F(m-1, m(n-1)).$$

This implies that $H_0$ is rejected if

$$\frac{Q_2/(m-1)}{Q_1/(m(n-1))} \ge F_\alpha(m-1, m(n-1)),$$

where $\alpha$ is a given significance level.
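The decomposition $mnS^2 = Q_1 + Q_2$ and the $F$ ratio can be checked on simulated data. The following is a minimal sketch, assuming NumPy; the group means and the random seed are hypothetical, and the critical value $F_\alpha$ would have to come from tables or a statistics library.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 10, 3                                   # n observations in each of m groups
# X[j, k] is the j-th observation of X_k; the column shifts are hypothetical means
X = rng.standard_normal((n, m)) + np.array([0.0, 0.5, 1.0])

group_means = X.mean(axis=0)                   # X-bar_{.k}
grand_mean = X.mean()                          # X-bar_{..}

Q1 = np.sum((X - group_means) ** 2)            # within-group sum of squares
Q2 = n * np.sum((group_means - grand_mean) ** 2)   # between-group sum of squares
total = np.sum((X - grand_mean) ** 2)          # m n S^2

F = (Q2 / (m - 1)) / (Q1 / (m * (n - 1)))
```

Large values of `F` speak against $H_0$; the decision itself compares `F` with $F_\alpha(m-1, m(n-1))$.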

5.5 Discriminant analysis

Discriminant analysis is used to separate individuals into different populations based on given multivariate data. Suppose that there are $k$ $m$-dimensional populations $G_1, \dots, G_k$. For a given sample $X = (x_1, \dots, x_m)^T$, how do we decide whether $X \in G_l$?

5.5.1 Mahalanobis distance method

Let $G$ be an $m$-variate population with mean value vector $\mu = (\mu_1, \dots, \mu_m)^T$ and covariance matrix $\Sigma$. The Mahalanobis distance between an individual $X = (x_1, \dots, x_m)^T$ and the population $G$ is defined as

$$d^2(X, G) = (X - \mu)^T \Sigma^{-1} (X - \mu).$$

For example, let $G$ be a univariate population with mean $\mu$ and variance $\sigma^2$, and let $X$ be a sample. Then the Mahalanobis distance is $d^2(X, G) = (X - \mu)^2/\sigma^2$. Let $G$ be a bivariate population with mean value vector $\mu$ and covariance matrix $\Sigma$,

$$\mu = (\mu_1, \mu_2)^T, \qquad \Sigma = \begin{pmatrix} \tau_{11} & \tau_{12} \\ \tau_{21} & \tau_{22} \end{pmatrix},$$

and let $X = (x_1, x_2)^T$ be a sample. Then the Mahalanobis distance is

$$d^2(X, G) = (x_1 - \mu_1,\ x_2 - \mu_2) \begin{pmatrix} \tau_{11} & \tau_{12} \\ \tau_{21} & \tau_{22} \end{pmatrix}^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}.$$

Suppose that there are two populations $G_i$ $(i = 1, 2)$ which have samples $X_1^{(i)}, \dots, X_{n_i}^{(i)}$ $(i = 1, 2)$. Then the estimates of the mean $\mu_i$ and covariance matrix $\Sigma_i$ for $G_i$ are, respectively,

$$\bar{X}_i = \frac{1}{n_i}\sum_{k=1}^{n_i} X_k^{(i)}, \qquad S_i = \frac{1}{n_i - 1}\sum_{k=1}^{n_i} (X_k^{(i)} - \bar{X}_i)(X_k^{(i)} - \bar{X}_i)^T \qquad (i = 1, 2). \tag{5.5.1}$$

The Mahalanobis distance is estimated as

$$d^2(X, G_i) = (X - \bar{X}_i)^T S_i^{-1} (X - \bar{X}_i) \qquad (i = 1, 2).$$

We say $X \in G_1$ if $d^2(X, G_1) < d^2(X, G_2)$, and we say $X \in G_2$ if $d^2(X, G_1) \ge d^2(X, G_2)$. For example, consider two one-dimensional populations $G_i$ with mean $\mu^{(i)}$ and variance $\sigma_i^2$ $(i = 1, 2)$, and assume $\sigma_1 < \sigma_2$ and $\mu^{(1)} < \mu^{(2)}$. We say $x_0 \in G_1$ if $d^2(x_0, G_1) < d^2(x_0, G_2)$, i.e.,

$$\frac{(x_0 - \mu^{(1)})^2}{\sigma_1^2} < \frac{(x_0 - \mu^{(2)})^2}{\sigma_2^2},$$

which is equivalent to $\mu_* < x_0 < \mu^*$, and we say $x_0 \in G_2$ if $d^2(x_0, G_1) \ge d^2(x_0, G_2)$, i.e.,

$$\frac{(x_0 - \mu^{(1)})^2}{\sigma_1^2} \ge \frac{(x_0 - \mu^{(2)})^2}{\sigma_2^2},$$

which is equivalent to $x_0 \le \mu_*$ or $x_0 \ge \mu^*$, where

$$\mu_* = \frac{\mu^{(1)}\sigma_2 - \mu^{(2)}\sigma_1}{\sigma_2 - \sigma_1}, \qquad \mu^* = \frac{\mu^{(1)}\sigma_2 + \mu^{(2)}\sigma_1}{\sigma_2 + \sigma_1}.$$

When $\Sigma_1 = \Sigma_2 =: \Sigma$, the estimate of the covariance matrix $\Sigma$ is

$$S = \frac{1}{n_1 + n_2 - 2}\sum_{i=1}^{2}\sum_{k=1}^{n_i} (X_k^{(i)} - \bar{X}_i)(X_k^{(i)} - \bar{X}_i)^T. \tag{5.5.2}$$

The Mahalanobis distance is estimated as follows:

$$d^2(X, G_i) = (X - \bar{X}_i)^T S^{-1} (X - \bar{X}_i) = X^T S^{-1} X - (\bar{X}_i)^T S^{-1} X - X^T S^{-1} \bar{X}_i + (\bar{X}_i)^T S^{-1} \bar{X}_i \qquad (i = 1, 2),$$

where $S$ is stated in (5.5.2). Since $S$ is a symmetric matrix and $X^T S^{-1} \bar{X}_i$ is a $1 \times 1$ matrix, $(X^T S^{-1} \bar{X}_i)^T = (\bar{X}_i)^T S^{-1} X$, and so

$$d^2(X, G_i) = X^T S^{-1} X - 2T_i(X),$$

where

$$T_i(X) = (\bar{X}_i)^T S^{-1} X - \tfrac{1}{2}(\bar{X}_i)^T S^{-1} \bar{X}_i \qquad (i = 1, 2) \tag{5.5.3}$$

is a linear function. In fact, denote

$$(\bar{X}_i)^T S^{-1} = (\alpha_1^{(i)}, \dots, \alpha_m^{(i)}), \qquad (\bar{X}_i)^T S^{-1} \bar{X}_i = b_i.$$

Then, for $X = (x_1, \dots, x_m)^T$,

$$(\bar{X}_i)^T S^{-1} X = \sum_{k=1}^{m} \alpha_k^{(i)} x_k,$$

and so

$$T_i(X) = \sum_{k=1}^{m} \alpha_k^{(i)} x_k - \frac{b_i}{2},$$

where $\alpha_k^{(i)}$ and $b_i$ are constants, i.e., $T_i(X)$ is a linear function. The function $T_i(X)$ is called a linear discriminant function. The difference of the two Mahalanobis distances is

$$d^2(X, G_1) - d^2(X, G_2) = 2(T_2(X) - T_1(X)).$$

If $T_1(X) \ge T_2(X)$, then $X \in G_1$; otherwise $X \in G_2$.

More generally, suppose that there are $k$ $m$-variate populations $\{G_i\}_{i=1,\dots,k}$. For a given sample $X = (x_1, \dots, x_m)^T$, if $d_l^2(X) = \min_{i=1,\dots,k} \{d_i^2(X)\}$, then $X \in G_l$.
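The pooled-covariance rule can be sketched as follows, assuming NumPy; the helper names and the simulated two-dimensional samples are hypothetical.

```python
import numpy as np

def pooled_cov(samples):
    """Pooled covariance estimate S as in (5.5.2); samples is a list of (n_i, m) arrays."""
    n_total = sum(len(s) for s in samples)
    scatter = sum((s - s.mean(axis=0)).T @ (s - s.mean(axis=0)) for s in samples)
    return scatter / (n_total - len(samples))

def T(X, mean, S_inv):
    """Linear discriminant function (5.5.3)."""
    return mean @ S_inv @ X - 0.5 * mean @ S_inv @ mean

rng = np.random.default_rng(3)
G1 = rng.standard_normal((30, 2))                          # hypothetical sample of G_1
G2 = rng.standard_normal((30, 2)) + np.array([3.0, 3.0])   # hypothetical sample of G_2
means = [G1.mean(axis=0), G2.mean(axis=0)]
S_inv = np.linalg.inv(pooled_cov([G1, G2]))

X = np.array([0.2, -0.1])                                  # individual to classify
d2 = [(X - mu) @ S_inv @ (X - mu) for mu in means]         # Mahalanobis distances
T1, T2 = T(X, means[0], S_inv), T(X, means[1], S_inv)
label = 1 if T1 >= T2 else 2
```

The quadratic term $X^T S^{-1} X$ is common to both distances, so comparing $T_1$ and $T_2$ gives the same decision as comparing `d2[0]` and `d2[1]`.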

5.5.2 Fisher method

Suppose that there are $k$ $m$-dimensional populations $\{G_i\}_{i=1,\dots,k}$ with means $\mu_i$ and the same covariance matrix $\Sigma$, and $G_i$ has $n_i$ samples $X_{i1}, \dots, X_{in_i}$ $(i = 1, \dots, k)$. Let

$$\hat{\mu}_i = \bar{X}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}, \qquad \bar{X} = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij} = \frac{1}{n}\sum_{i=1}^{k} n_i \hat{\mu}_i,$$

where $n = n_1 + \cdots + n_k$. Denote $C = (A + B)^{-1} A$, where

$$A = \sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})(\bar{X}_i - \bar{X})^T, \qquad B = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)(X_{ij} - \bar{X}_i)^T.$$

Let $\lambda_1$ be the largest eigenvalue of the matrix $C$ and $\upsilon_1$ be the corresponding eigenvector. Then, for a given observation $X = (x_1, \dots, x_m)^T$, a linear discriminant function is $y_1 = \upsilon_1^T X$.

Suppose that $\hat{\mu}_1, \dots, \hat{\mu}_k$ lie on a straight line. We compute the $k$ distances on the straight line

$$d_i = |\upsilon_1^T X - \upsilon_1^T \hat{\mu}_i| \qquad (i = 1, \dots, k).$$

Let $d_l = \min_{i=1,\dots,k} \{d_i\}$. Then $X \in G_l$.

Suppose that $\hat{\mu}_1, \dots, \hat{\mu}_k$ do not lie on a straight line but on a plane. Let $\lambda_2$ be the second largest eigenvalue of the matrix $C$ and $\upsilon_2$ be the corresponding eigenvector. Then, for a given observation $X = (x_1, \dots, x_m)^T$, the two discriminant functions are $y_1 = \upsilon_1^T X$ and $y_2 = \upsilon_2^T X$. We compute the $k$ distances in the plane

$$d_i^2 = (\upsilon_1^T X - \upsilon_1^T \hat{\mu}_i)^2 + (\upsilon_2^T X - \upsilon_2^T \hat{\mu}_i)^2 \qquad (i = 1, \dots, k).$$

Let $d_l^2 = \min_{i=1,\dots,k} \{d_i^2\}$. Then $X \in G_l$. If necessary, three discriminant functions are computed, but experience shows that two discriminant functions are usually sufficient.

5.5.3 Bayes method

The Mahalanobis distance method and the Fisher method consider neither the prior probabilities of the populations nor the losses caused by mistaken decisions. The Bayes method takes both into account.

Prior probability

Suppose that there are $k$ populations $G_1, \dots, G_k$ with given prior probabilities $q_1, \dots, q_k$, where each $q_i > 0$ and $\sum_{i=1}^{k} q_i = 1$. Let $X = (x_1, \dots, x_m)^T$ be an individual. Define a generalized square distance from $X$ to a population $G_i$ as

$$D^2(X, G_i) = d_i^2(X) + h_1(i) + h_2(i) \quad (i = 1, \dots, k),$$

where $d_i^2(X)$ is the Mahalanobis distance between $X$ and $G_i$ and

$$h_1(i) = \begin{cases} 0 & \text{if } \Sigma_1 = \cdots = \Sigma_k, \\ \log |S_i| & \text{otherwise}, \end{cases} \qquad h_2(i) = \begin{cases} 0 & \text{if } q_1 = \cdots = q_k, \\ -2 \log q_i & \text{otherwise}. \end{cases}$$

Here $\Sigma_i$ is the covariance matrix of $G_i$ and $S_i$ is an estimate of $\Sigma_i$. The decision rule of the generalized square distance is $X \in G_l$ if $D^2(X, G_l) = \min_{i=1,\dots,k} D^2(X, G_i)$.
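As a hedged illustration of the generalized square distance rule, the following Python sketch handles the one-dimensional case, where the covariance estimate of each population reduces to a single variance. The function names and the sample numbers are assumptions made for this example only.

```python
import math


def generalized_sq_distance(x, mean, var, prior, equal_var, equal_prior):
    """D^2 = d^2 + h1 + h2 for a one-dimensional population."""
    d2 = (x - mean) ** 2 / var                      # Mahalanobis distance in 1-D
    h1 = 0.0 if equal_var else math.log(var)        # log|S_i| term
    h2 = 0.0 if equal_prior else -2.0 * math.log(prior)
    return d2 + h1 + h2


def bayes_assign(x, means, variances, priors):
    """Return the index l that minimizes D^2(x, G_l)."""
    equal_var = len(set(variances)) == 1
    equal_prior = len(set(priors)) == 1
    dists = [generalized_sq_distance(x, m, v, q, equal_var, equal_prior)
             for m, v, q in zip(means, variances, priors)]
    return min(range(len(dists)), key=dists.__getitem__)


# Hypothetical example: same point, equal priors versus unequal priors.
same_priors = bayes_assign(0.4, [0.0, 1.0], [1.0, 1.0], [0.5, 0.5])
skewed_priors = bayes_assign(0.4, [0.0, 1.0], [1.0, 1.0], [0.1, 0.9])
```

The point 0.4 is closer to the first mean, so equal priors assign it to the first population; a strong prior in favour of the second population reverses the decision.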


Posterior probability

When an individual $X$ is known, we compute the posterior probability

$$P(X \in G_i) = \frac{q_i f_i(x)}{\sum_{j=1}^{k} q_j f_j(x)} \quad (i = 1, \dots, k),$$

where $f_j(x)$ is the probability density of $G_j$ and $q_j$ is the prior probability associated with $G_j$. If each $G_i$ is a normal population, then its density function is

$$f_i(x) = (2\pi)^{-m/2} |\Sigma_i|^{-1/2} e^{-\frac{1}{2} d_i^2(x)}$$

and

$$P(X \in G_i) = \frac{e^{-\frac{1}{2} d_i^2(x)}}{\sum_{j=1}^{k} e^{-\frac{1}{2} d_j^2(x)}},$$

where $d_i^2(x) = d^2(X, G_i)$. This form holds when the prior probabilities and the covariance matrices are equal, so that the common factors cancel; in the general case $d_i^2(x)$ is replaced by $d_i^2(x) + \log |\Sigma_i| - 2 \log q_i$. The decision is $X \in G_i$ if $P(X \in G_i) = \max_{j=1,\dots,k} P(X \in G_j)$.

Suppose that there are $k$ populations $G_1, \dots, G_k$ with prior probabilities $q_1, \dots, q_k$. A discriminant criterion $D$ means that a partition of $\mathbb{R}^m$,

$$\mathbb{R}^m = D_1 \cup D_2 \cup \cdots \cup D_k,$$

is given, where $D_1, \dots, D_k$ are mutually disjoint. For an individual $X \in G_i$, the criterion $D$ may decide $X \in G_j$ $(j \ne i)$, which is a mistaken decision. Denote the probability of making such a mistaken decision by $P(j|i, D)$. Let the probability density function of the population $G_i$ be $f_i(x_1, \dots, x_m)$. Then

$$P(j|i, D) = \int \cdots \int_{D_j} f_i(x_1, \dots, x_m) \, dx_1 \cdots dx_m = \int_{D_j} f_i(X) \, dX \quad (j \ne i). \tag{5.5.4}$$

Let $L(j|i, D)$ be the loss of a mistaken decision, determined by experience. Define the mean loss of mistaken decisions as

$$g(D) = \sum_{l=1}^{k} q_l \sum_{j=1}^{k} P(j|l, D) L(j|l, D). \tag{5.5.5}$$

If a discriminant method $D^*$ is such that $g(D^*) = \min_{\text{all } D} g(D)$, we say $D^*$ conforms to the Bayes criterion.

Bayes criterion

Suppose that there are $k$ populations with joint density functions $f_1(X), \dots, f_k(X)$, prior probabilities $q_1, \dots, q_k$, and losses of mistaken decisions $L(j|i, D)$. Then $D^* = (D_1^*, \dots, D_k^*)$ conforms to the Bayes criterion, where

$$D_i^* = \{ X \mid h_i(X) < h_j(X) \ (j \ne i, \ j = 1, \dots, k) \}, \qquad h_j(X) = \sum_{l=1}^{k} q_l L(j|l, D) f_l(X). \tag{5.5.6}$$

In fact, by (5.5.4), (5.5.5), and (5.5.6),

$$g(D^*) = \sum_{l=1}^{k} q_l \sum_{j=1}^{k} \Big( \int_{D_j^*} f_l(X) \, dX \Big) L(j|l, D^*) = \sum_{j=1}^{k} \int_{D_j^*} \Big( \sum_{l=1}^{k} q_l f_l(X) L(j|l, D^*) \Big) dX = \sum_{j=1}^{k} \int_{D_j^*} h_j(X) \, dX.$$

If $D = (D_1, \dots, D_k)$ is any partition of $\mathbb{R}^m$, then the mean loss caused by it is $g(D) = \sum_{l=1}^{k} \int_{D_l} h_l(X) \, dX$. So

$$g(D^*) - g(D) = \sum_{j=1}^{k} \int_{D_j^*} h_j(X) \, dX - \sum_{l=1}^{k} \int_{D_l} h_l(X) \, dX = \sum_{l=1}^{k} \sum_{j=1}^{k} \int_{D_j^* \cap D_l} (h_j(X) - h_l(X)) \, dX.$$

By (5.5.6), $h_j(X) \le h_l(X)$ on $D_j^*$, so $g(D^*) \le g(D)$, i.e., $D^*$ conforms to the Bayes criterion.

5.6 Cluster analysis

Cluster analysis partitions all individuals into subgroups such that individuals in the same subgroup have similar characteristics. This requires a rule for measuring the similarity between two individuals. Let $X_1, \dots, X_n$ be $n$ individuals, each an $m$-dimensional random vector. Ordinarily, one measures the similarity by the distance between $X_i$ and $X_j$: when the distance is small, we say $X_i$ and $X_j$ are similar. We now define the distances between two individuals and between two subgroups.

Let $X_i = (x_{i1}, \dots, x_{im})$ $(i = 1, \dots, n)$. To ensure that all variables have a similar scale, one uses the standardized data of individuals. Denote the mean vector by $\bar X = (\bar X_1, \dots, \bar X_m)^T$ and the covariance matrix by $S = (S_{ij})_{m \times m}$. Then the standardized transform is defined as

$$x_{ij}^* = \frac{x_{ij} - \bar X_j}{S_j} \quad (i = 1, \dots, n; \ j = 1, \dots, m),$$

where $S_j^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_{ij} - \bar X_j)^2$ $(j = 1, \dots, m)$.

5.6.1 Distance between individuals X_i and X_j

– Minkowski distance is defined as

$$d_{ij}(p) = \Big( \sum_{k=1}^{m} |x_{ik} - x_{jk}|^p \Big)^{1/p} \quad (i, j = 1, \dots, n; \ 1 \le p \le \infty).$$

In particular, the following special cases have their own names.

Absolute value distance:

$$d_{ij}(1) = \sum_{k=1}^{m} |x_{ik} - x_{jk}|;$$

Euclidean distance:

$$d_{ij}(2) = \Big( \sum_{k=1}^{m} |x_{ik} - x_{jk}|^2 \Big)^{1/2};$$

Euclidean distance with variance weight, where $S_k^2$ is the sample variance of the $k$-th variable:

$$d_{ij}^*(2) = \Big( \sum_{k=1}^{m} \frac{|x_{ik} - x_{jk}|^2}{S_k^2} \Big)^{1/2};$$

Chebyshev distance:

$$d_{ij}(\infty) = \max_{k=1,\dots,m} |x_{ik} - x_{jk}|.$$

– Mahalanobis distance is defined as $d_{ij}(M) = (X_i - X_j)^T S^{-1} (X_i - X_j)$.

– The angle-based distance is defined as $d_{ij}^2 = 1 - c_{ij}^2$, where $c_{ij}$ is the cosine of the angle $\alpha_{ij}$ between the $m$-dimensional vectors $(x_{i1}, \dots, x_{im})$ and $(x_{j1}, \dots, x_{jm})$:

$$c_{ij} = \cos \alpha_{ij} = \frac{\sum_{k=1}^{m} x_{ik} x_{jk}}{\big( \sum_{k=1}^{m} x_{ik}^2 \big)^{1/2} \big( \sum_{k=1}^{m} x_{jk}^2 \big)^{1/2}} \quad (i, j = 1, \dots, n).$$

The $c_{ij}$ is called the similarity coefficient between $X_i$ and $X_j$.
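The distance measures above translate directly into code. The following Python sketch is illustrative only; the function names and the two sample vectors are assumptions, and individuals are represented as plain lists of floats.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x, y):
    """Chebyshev distance: largest coordinate-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def cosine_similarity(x, y):
    """Similarity coefficient: cosine of the angle between x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


xi, xj = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
d_abs = minkowski(xi, xj, 1)         # absolute value distance
d_euc = minkowski(xi, xj, 2)         # Euclidean distance
d_inf = chebyshev(xi, xj)            # Chebyshev distance
angle_d2 = 1.0 - cosine_similarity(xi, xj) ** 2   # squared angle-based distance
```

For these two vectors the coordinate differences are 3, 4, and 0, so the three Minkowski-type distances are 7, 5, and 4 respectively.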

5.6.2 Distance between subgroups G_p and G_q

Denote by $D_{pq}$ the distance between $G_p$ and $G_q$.
– Single linkage: $D_{pq} = \min_{X_i \in G_p, X_j \in G_q} d_{ij}$.
– Complete linkage: $D_{pq} = \max_{X_i \in G_p, X_j \in G_q} d_{ij}$.
– Centroid method: Let $G_r = G_p \cup G_q$, let $n_p$ and $n_q$ be the cardinal numbers of $G_p$ and $G_q$, respectively, and let $\bar X_p$ and $\bar X_q$ be the mean values of $G_p$ and $G_q$, respectively. Then the mean value of $G_r$ is

$$\bar X_r = \frac{1}{n_r} (n_p \bar X_p + n_q \bar X_q) \quad (n_r = n_p + n_q).$$

For a subgroup $G_k$ $(k \ne p, q)$ with the mean value $\bar X_k$, the distance $D_{rk}$ between $G_r$ and $G_k$ is $D_{rk} = d(\bar X_r, \bar X_k)$, where $d$ is the Euclidean distance.
– Average linkage: Let $n_p$, $n_q$ be the cardinal numbers of $G_p$, $G_q$, respectively. Define the distance as

$$D_{pq} = \frac{1}{n_p n_q} \sum_{X_i \in G_p, \, X_j \in G_q} d_{ij}^2.$$

5.6.3 Hierarchical cluster method

Suppose that there are n individuals and each individual has m indices.
Step 1. Start from n clusters, each containing only one individual. Compute the distance between any two individuals to obtain a matrix of distances.
Step 2. Merge the nearest pair of clusters.
Step 3. Compute the distances between the newly formed cluster and the remaining clusters to obtain a new matrix of distances.
Step 4. Repeat steps 2 and 3 until only one cluster is left.
Step 5. Determine the number of clusters and the members of each cluster.
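The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses single linkage with Euclidean distance, recomputes cluster distances from the individuals instead of updating a distance matrix, and stops as soon as a requested number of clusters remains rather than merging down to one cluster and cutting afterwards. The function name `single_linkage` and the sample points are hypothetical.

```python
def single_linkage(points, n_clusters):
    """Agglomerative clustering with single linkage and Euclidean distance."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    # Step 1: every individual starts as its own cluster (lists of indices).
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        # Steps 2-3: search for the nearest pair of clusters.
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = min(dist(points[i], points[j])
                        for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]   # merge cluster q into cluster p
        del clusters[q]
    return [sorted(c) for c in clusters]


data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
result = single_linkage(data, 3)
```

Replacing `min` by `max` inside the pair search would give complete linkage instead; the rest of the loop stays the same.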

5.7 Principal component analysis

In multivariate statistical analysis, many variables may be highly correlated and the covariance matrix is of high dimension, so statistical inference may involve too many parameters. Therefore, one hopes to reduce the number of variables without much loss of information.

5.7.1 Principal component decomposition

The main idea of principal component analysis (PCA) is to transform the set of variables $X_1, \dots, X_m$ with mean 0 and covariance matrix $\Sigma$ into a smaller set of uncorrelated new variables $Y_1, \dots, Y_k$ $(k < m)$ without much loss of information, where $Y_1, \dots, Y_k$ are linear combinations of the original variables $X_1, \dots, X_m$. Consider the linear combinations

$$Y_i = \sum_{j=1}^{m} a_{ij} X_j \quad (i = 1, \dots, m).$$

The matrix form is $Y = AX$, where $A = (a_{ij})_{i,j=1,\dots,m}$, $X = (X_1, \dots, X_m)^T$, $Y = (Y_1, \dots, Y_m)^T$. Denote the $i$-th row vector of $A$ by $A_i$, i.e., $A_i = (a_{i1}, \dots, a_{im})$. The covariance matrix of $Y$ is

$$E[YY^T] = E[AXX^TA^T] = A E[XX^T] A^T.$$

Since $\Sigma = E[XX^T]$, $E[YY^T] = A \Sigma A^T$, and so

$$\mathrm{Var}(Y_i) = A_i \Sigma A_i^T, \qquad \mathrm{Cov}(Y_i, Y_j) = A_i \Sigma A_j^T \quad (i, j = 1, \dots, m).$$

Since the covariance matrix $\Sigma$ is real, symmetric, and nonnegative definite, all its eigenvalues are nonnegative real numbers satisfying $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$, and the corresponding unit-length eigenvectors $a_1^*, \dots, a_m^*$ can be chosen mutually orthogonal. Denote $a_i^* = (a_{1i}^*, \dots, a_{mi}^*)^T$ $(i = 1, \dots, m)$. Define

$$Y_i^* = (a_i^*)^T X = \sum_{j=1}^{m} a_{ji}^* X_j \quad (i = 1, \dots, m). \tag{5.7.1}$$

Then

$$\mathrm{Var}(Y_i^*) = (a_i^*)^T \Sigma a_i^* = \lambda_i \quad (i = 1, \dots, m), \qquad \mathrm{Cov}(Y_i^*, Y_j^*) = 0 \quad (i \ne j; \ i, j = 1, \dots, m). \tag{5.7.2}$$

These new variables $Y_1^*, \dots, Y_m^*$ are linear combinations of the original variables $X_1, \dots, X_m$ and are called principal components. The goal of principal component analysis is to replace the original set of variables by the first few principal components, provided that these explain most of the variability. We rewrite (5.7.1) in the matrix form $Y^* = (A^*)^T X$, where $Y^* = (Y_1^*, \dots, Y_m^*)^T$, $X = (X_1, \dots, X_m)^T$, and $A^* = (a_{ij}^*)_{i,j=1,\dots,m} = (a_1^*, \dots, a_m^*)$ is the orthogonal matrix whose column vectors are the $m$ eigenvectors. So $X = ((A^*)^T)^{-1} Y^* = A^* Y^*$, i.e.,

$$X = \sum_{i=1}^{m} Y_i^* a_i^*. \tag{5.7.3}$$

The total variation of $X = (X_1, \dots, X_m)^T$ is equal to $\mathrm{Tr}(\Sigma)$, the trace of the covariance matrix $\Sigma$, which equals the sum of the eigenvalues: $\mathrm{Tr}(\Sigma) = \lambda_1 + \cdots + \lambda_m$. By (5.7.2), the importance of the $j$-th principal component can be measured by the ratio $\lambda_j / \mathrm{Tr}(\Sigma)$ $(j = 1, \dots, m)$. So the importance of the first $k$ principal components can be measured by the ratio

$$\sum_{j=1}^{k} \lambda_j / \mathrm{Tr}(\Sigma).$$

If two principal components can replace the original $m$ variables without much loss of information, we obtain better parameter estimates and can make better use of graphical tools.

In practice, the covariance matrix $\Sigma$ of $X$ is unknown and must be estimated from samples. Suppose that there is a sample $\{x_l\}_{l=1,\dots,n}$, where $x_l = (x_{1l}, \dots, x_{ml})^T$ $(l = 1, \dots, n)$ are $m$-dimensional vectors. The covariance matrix $\Sigma$ has an estimate $\hat\Sigma = (\alpha_{ij})_{i,j=1,\dots,m}$, where

$$\alpha_{ij} = \frac{1}{n} \sum_{k=1}^{n} x_{ik} x_{jk} \quad (i, j = 1, \dots, m).$$

The accuracy of these estimates depends on the sample size $n$. Since the results of principal component analysis depend on the units of the variables, it is desirable that the original data have a similar scale. Therefore, one often performs PCA on the standardized data $z_{ij} = x_{ij} / \hat\sigma_i$, where $\hat\sigma_i$ is the estimate of the standard deviation of the $i$-th variable.

Let $\hat\lambda_1 \ge \hat\lambda_2 \ge \cdots \ge \hat\lambda_m$ be the eigenvalues of $\hat\Sigma$ and let the corresponding eigenvectors with unit length be $\hat a_1^*, \dots, \hat a_m^*$. Denote $\hat a_i^* = (\hat a_{1i}^*, \dots, \hat a_{mi}^*)^T$ $(i = 1, \dots, m)$. Applying the $i$-th eigenvector to each sample gives

$$\hat y_{il}^* = (\hat a_i^*)^T x_l = \sum_{j=1}^{m} \hat a_{ji}^* x_{jl} \quad (i = 1, \dots, m; \ l = 1, \dots, n),$$

and we write $\hat y_i^* = (\hat y_{i1}^*, \dots, \hat y_{in}^*)^T$. These new data $\{\hat y_{il}^*\}_{m \times n}$ are called principal component scores and are used for further analysis.
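To make the sample procedure concrete, here is a hedged Python sketch for the special case of two mean-zero variables, where the eigenvalues and eigenvectors of the 2 × 2 covariance estimate have a closed form. The function name `pca_2d` and the four sample points are assumptions made for this illustration; for more variables a numerical eigenvalue routine would be needed.

```python
import math


def pca_2d(samples):
    """PCA for two mean-zero variables; samples is a list of (x1, x2) pairs."""
    n = len(samples)
    # Covariance estimate with entries (1/n) * sum of products.
    s11 = sum(a * a for a, _ in samples) / n
    s22 = sum(b * b for _, b in samples) / n
    s12 = sum(a * b for a, b in samples) / n
    # Eigenvalues of the symmetric matrix [[s11, s12], [s12, s22]].
    centre = (s11 + s22) / 2.0
    radius = math.hypot((s11 - s22) / 2.0, s12)
    lam1, lam2 = centre + radius, centre - radius
    # Unit eigenvector for lam1 (a coordinate axis if the variables are uncorrelated).
    if s12 != 0.0:
        v = (lam1 - s22, s12)
    else:
        v = (1.0, 0.0) if s11 >= s22 else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    a1 = (v[0] / norm, v[1] / norm)
    scores = [a1[0] * a + a1[1] * b for a, b in samples]   # first-component scores
    return (lam1, lam2), a1, scores


samples = [(2.0, 2.0), (-2.0, -2.0), (1.0, -1.0), (-1.0, 1.0)]
(lam1, lam2), a1, scores = pca_2d(samples)
explained = lam1 / (lam1 + lam2)    # importance of the first principal component
```

In this example the first principal component points along the diagonal and accounts for four fifths of the total variation, and the mean square of its scores reproduces the largest eigenvalue.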

5.7.2 Rotation of principal components

Assume that for $m$ random variables $X_1, \dots, X_m$ with mean 0, we find $k$ principal components $Y_1, \dots, Y_k$ without much loss of information. Each variable $X_i$ has practical meaning, but each $Y_i$ does not always have practical meaning. Take a $k \times k$ orthogonal matrix $G$. Define $Z = GY$, where $Z = (z_1, \dots, z_k)^T$ and $Y = (Y_1, \dots, Y_k)^T$. The $z_1, \dots, z_k$ are called the rotated principal components. They are new linear combinations of $X_1, \dots, X_m$ and possess the following two properties:
– the total variance is invariant, i.e.,

$$\sum_{i=1}^{k} \mathrm{Var}(z_i) = \sum_{i=1}^{k} \mathrm{Var}(Y_i) = \sum_{i=1}^{k} \lambda_i,$$

where $\lambda_1, \dots, \lambda_k$ are the first $k$ eigenvalues of the covariance matrix of $X$;
– $z_1, \dots, z_k$ are in general correlated, i.e., $\mathrm{Cov}(z_i, z_j) \ne 0$ $(i \ne j)$.

5.8 Canonical correlation analysis

Canonical correlation analysis is a statistical method for studying the relationship between two random vectors $X$ and $Y$, where $X = (X_1, \dots, X_p)^T$ and $Y = (Y_1, \dots, Y_q)^T$. Assume that the means of $X$ and $Y$ are both 0. We hope to choose a $p$-dimensional vector $f_1 = (f_{11}, \dots, f_{1p})^T$ and a $q$-dimensional vector $g_1 = (g_{11}, \dots, g_{1q})^T$ such that $\alpha_1 = \sum_{k=1}^{p} f_{1k} X_k$ and $\beta_1 = \sum_{l=1}^{q} g_{1l} Y_l$ satisfy the following two conditions:
(a) the correlation coefficient of $\alpha_1$ and $\beta_1$ attains the maximal value;
(b) $\mathrm{Var}(\alpha_1) = \mathrm{Var}(\beta_1) = 1$.

To find $f_1$ and $g_1$, we first state the concepts of the square roots and singular values of matrices. Let $A$ be a real symmetric nonnegative definite matrix. Then there is an orthogonal matrix $\Gamma$ such that $A = \Gamma \, \mathrm{diag}(\lambda_1, \dots, \lambda_m) \Gamma^T$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$. Define the square root of $A$ as

$$A^{1/2} = \Gamma \, \mathrm{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_m}) \Gamma^T.$$

It is easy to prove that $A^{1/2} A^{1/2} = A$. Let $B$ be an $m \times n$ matrix with rank $r$. Then there exist an $m \times m$ orthogonal matrix $C$ and an $n \times n$ orthogonal matrix $D$ such that

$$B = C \begin{pmatrix} G & 0 \\ 0 & 0 \end{pmatrix} D^T,$$

where $G = \mathrm{diag}(\mu_1, \dots, \mu_r)$ $(r \le \min\{m, n\})$ and $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_r > 0$. Each $\mu_k$ is called a singular value. The $k$-th column vector $C_k$ of the matrix $C$ is called a left singular vector of $B$. The $k$-th column vector $D_k$ of the matrix $D$ is called a right singular vector of $B$.

Using the Lagrange multiplier method, the vectors

$$f_1 = \Sigma_{XX}^{-1/2} C_1, \qquad g_1 = \Sigma_{YY}^{-1/2} D_1$$

satisfy the above conditions (a) and (b), where $\Sigma_{XX}$ and $\Sigma_{YY}$ are the covariance matrices of $X$ and $Y$, respectively, $\Sigma_{XY}$ is the covariance matrix of $X$ and $Y$, and $C_1$ and $D_1$ are the left and right singular vectors of the matrix

$$S = \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2}$$

corresponding to the maximal singular value $\mu_1$ of $S$. Moreover, since $\mathrm{Cov}(\alpha_1, \beta_1) = f_1^T \Sigma_{XY} g_1 = C_1^T S D_1$, we have $\mathrm{Cov}(\alpha_1, \beta_1) = \mu_1$.

Assume that $S$ has $r$ nonzero singular values $\mu_1 \ge \mu_2 \ge \cdots \ge \mu_r > 0$. We further find $f_k$ and $g_k$ $(k = 1, \dots, r)$ such that $\alpha_k = f_k^T X$ and $\beta_k = g_k^T Y$ $(k = 1, \dots, r)$ satisfy

$$\mathrm{Var}(\alpha_k) = \mathrm{Var}(\beta_k) = 1, \qquad \mathrm{Cov}(\alpha_k, \beta_k) = \mu_k,$$

$$(f_k, f_l) = (g_k, g_l) = 0 \quad (k \ne l), \qquad (f_k, g_l) = \delta_{kl} \quad (k, l = 1, \dots, r),$$

where $\delta_{kl}$ is the Kronecker delta.

5.9 Factor analysis

Factor analysis removes redundancy from a set of correlated statistical variables and represents these variables with a smaller set of new statistical variables.

5.9.1 The factor analysis model

Let $X = (X_1, \dots, X_p)^T$ be a random vector with mean 0. The factor analysis model can be written as

$$X_k = \sum_{i=1}^{m} \lambda_{ki} f_i + \varepsilon_k \quad (k = 1, \dots, p, \ m \le p),$$

where the $f_i$ are called factors, $\lambda_{ki}$ is called the loading of the $k$-th variable on the $i$-th factor, which reflects the relative importance of the $i$-th factor for the $k$-th variable, and the $\varepsilon_k$ are random errors. The matrix form of the factor analysis model is

$$X = \Lambda F + \epsilon, \tag{5.9.1}$$

where $\Lambda = (\lambda_{kl})_{p \times m}$ is the factor loading matrix, $F = (f_1, \dots, f_m)^T$ is the common factor vector, and $\epsilon = (\varepsilon_1, \dots, \varepsilon_p)^T$. The assumptions for factor analysis are
– the factors $f_k$ are independently and identically distributed with mean 0 and variance 1;
– the random errors $\varepsilon_k$ are independent with mean 0 and variance $\psi_k$;
– $f_k$ and $\varepsilon_j$ are independent for any $k$ and $j$.

5.9.2 The factor analysis equation

Let $X$ be a random vector with mean 0. Then the covariance matrix of $X$ is

$$\Sigma = E[XX^T] = E[(\Lambda F)(\Lambda F)^T] + E[\epsilon (\Lambda F)^T] + E[(\Lambda F) \epsilon^T] + E[\epsilon \epsilon^T] = A + B + C + D,$$

and by the assumptions,

$$A = E[\Lambda F F^T \Lambda^T] = \Lambda E[FF^T] \Lambda^T = \Lambda \Lambda^T, \quad B = E[\epsilon F^T] \Lambda^T = 0, \quad C = \Lambda E[F \epsilon^T] = 0, \quad D = E[\epsilon \epsilon^T] = \mathrm{diag}(\psi_1, \dots, \psi_p) =: \Psi.$$

So

$$\Sigma = \Lambda \Lambda^T + \Psi, \tag{5.9.2}$$

where $\Psi = \mathrm{diag}(\psi_1, \dots, \psi_p)$. Equation (5.9.2) is called the factor analysis equation. Let $\Sigma = (\sigma_{kl})_{p \times p}$. Then the variance of $X_k$ is

$$\sigma_{kk} = \sum_{j=1}^{m} \lambda_{kj}^2 + \psi_k \quad (k = 1, \dots, p).$$

Therefore, the proportion of the variance of $X_k$ explained by the factors $f_1, \dots, f_m$ is $\sum_{j=1}^{m} \lambda_{kj}^2 / \sigma_{kk}$. The factor loading matrix $\Lambda$ is not unique.
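A small Python sketch can illustrate the factor analysis equation, the explained proportion of variance, and the non-uniqueness of the loading matrix. The loadings and error variances below are hypothetical numbers chosen so that each variable has unit variance, and the helper names are assumptions; the `rotate` helper only covers the case of two factors.

```python
import math


def implied_covariance(loadings, psi):
    """Return Lambda Lambda^T + Psi for a p x m loading matrix (list of rows)."""
    p = len(loadings)
    sigma = [[sum(a * b for a, b in zip(loadings[k], loadings[l]))
              for l in range(p)] for k in range(p)]
    for k in range(p):
        sigma[k][k] += psi[k]
    return sigma


def explained_proportion(loadings, psi):
    """Proportion of Var(X_k) explained by the factors, for each k."""
    sigma = implied_covariance(loadings, psi)
    return [sum(a * a for a in loadings[k]) / sigma[k][k]
            for k in range(len(loadings))]


def rotate(loadings, theta):
    """Multiply a p x 2 loading matrix by a 2 x 2 rotation (orthogonal) matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c - b * s, a * s + b * c] for a, b in loadings]


Lam = [[0.9, 0.0], [0.6, 0.6], [0.0, 0.8]]   # hypothetical loadings, p = 3, m = 2
Psi = [0.19, 0.28, 0.36]                     # hypothetical error variances
props = explained_proportion(Lam, Psi)
sigma_original = implied_covariance(Lam, Psi)
sigma_rotated = implied_covariance(rotate(Lam, 0.5), Psi)
```

The rotated loadings differ from the original ones but imply the same covariance matrix, which is the sense in which the loading matrix is not unique.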

5.9.3 Parameter estimate method

The goal of factor analysis is to describe the covariance structure of $p$ variables by only a few factors. This requires estimating the factor loadings $\{\lambda_{ki}\}$ and the error variances $\psi_k$.

Maximal likelihood method

If the factors $f_k$ and the errors $\varepsilon_k$ follow normal distributions, the maximal likelihood estimates of the loading matrix and the error variances can be obtained. By the factor analysis equation, the sample likelihood function is

$$L(\bar X, \Sigma) = L(\bar X, \Lambda \Lambda^T + \Psi) =: \varphi(\Lambda, \Psi).$$

We choose $\Lambda_0$ and $\Psi_0$ such that $\varphi(\Lambda, \Psi)$ attains its maximal value. It can be proved that $\Lambda_0$ and $\Psi_0$ satisfy the system of equations

$$\begin{cases} S \Psi_0^{-1} \Lambda_0 = \Lambda_0 (I + \Lambda_0^T \Psi_0^{-1} \Lambda_0), \\ \Psi_0 = \mathrm{diag}(S - \Lambda_0 \Lambda_0^T), \end{cases} \tag{5.9.3}$$

where $S = \frac{1}{n} \sum_{k=1}^{n} (X_k - \bar X)(X_k - \bar X)^T$ is the sample covariance matrix. To ensure that the system of equations (5.9.3) has a unique solution, Bayes suggested adding the condition that $\Lambda_0^T \Psi_0^{-1} \Lambda_0$ is a diagonal matrix. The system of equations is solved by iteration.

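Principal component method

Let the eigenvalues of the sample covariance matrix $S$ be $\lambda_1 \ge \cdots \ge \lambda_p \ge 0$ and the corresponding unit orthogonal eigenvectors be $l_1, l_2, \dots, l_p$, i.e., $l^T S l = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ and

$$S = l \, \mathrm{diag}(\lambda_1, \dots, \lambda_p) \, l^T = \sum_{i=1}^{p} \lambda_i l_i l_i^T = \sum_{i=1}^{m} \lambda_i l_i l_i^T + \sum_{i=m+1}^{p} \lambda_i l_i l_i^T,$$

where $l = (l_1, \dots, l_p)$. This formula is called the spectral decomposition. When the eigenvalues $\lambda_{m+1}, \dots, \lambda_p$ are small, $S$ may be decomposed approximately into

$$S \approx (\sqrt{\lambda_1} l_1, \dots, \sqrt{\lambda_m} l_m) \begin{pmatrix} \sqrt{\lambda_1} l_1^T \\ \vdots \\ \sqrt{\lambda_m} l_m^T \end{pmatrix} + \mathrm{diag}(\psi_1, \dots, \psi_p) = \Lambda \Lambda^T + \Psi,$$

where

$$\Lambda = (\sqrt{\lambda_1} l_1, \dots, \sqrt{\lambda_m} l_m) =: (\lambda_{ki})_{p \times m}, \qquad \Psi = \mathrm{diag}(\psi_1, \dots, \psi_p), \qquad \psi_k = S_{kk} - \sum_{i=1}^{m} \lambda_{ki}^2 \quad (k = 1, \dots, p).$$

This gives a solution of the factor analysis equation. The $j$-th column of the loading matrix is $\sqrt{\lambda_j}$ times the unit eigenvector $l_j$, i.e., the coefficient vector of the $j$-th principal component of $X$. Denote the error by $S - (\Lambda \Lambda^T + \Psi) = (\eta_{ij})_{p \times p}$. It can be proved that

$$\sum_{i,j=1}^{p} \eta_{ij}^2 \le \sum_{j=m+1}^{p} \lambda_j^2.$$

Therefore, we may choose $m$ such that the error is very small. One often chooses $m$ such that

$$\frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{p} \lambda_j} \ge 0.7.$$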

5.9.4 Rotation of a loading matrix

The goal of factor analysis is not only to find common factors but also to understand the practical meaning of each common factor. In Section 5.9.3 we found the initial common factors using the maximal likelihood method and the principal component method. To ensure that each common factor has practical meaning, we choose an orthogonal matrix $Q$ such that $\Lambda^* := \Lambda Q$ is also a loading matrix and the new common factors have practical meaning. In fact, since $Q$ is an orthogonal matrix, $QQ^T = I$ and $\Lambda^* (\Lambda^*)^T = \Lambda Q Q^T \Lambda^T = \Lambda \Lambda^T$, and so

$$\Sigma = \Lambda^* (\Lambda^*)^T + \Psi, \qquad X = \Lambda F + \epsilon = \Lambda Q (Q^T F) + \epsilon = \Lambda^* Z + \epsilon,$$

where $Z = Q^T F$, the covariance matrix of $Z$ is $\Sigma_Z = E[ZZ^T] = E[Q^T F F^T Q] = I$, and $\mathrm{Cov}(Z, \epsilon) = \mathrm{Cov}(Q^T F, \epsilon) = Q^T \mathrm{Cov}(F, \epsilon) = 0$. Hence, if $F$ is the common factor vector of the factor model, then, for any orthogonal matrix $Q$, $Z = Q^T F$ is also a common factor vector and $\Lambda Q$ is the corresponding loading matrix. Rotation is applied repeatedly so that many factor loadings tend to zero while the other factor loadings are maximized. This lets us focus on the factors with large loadings. This method is called the orthogonal rotation of factor axes.

Suppose that the factor analysis model is $X = \Lambda F + \epsilon$ and $\Lambda = (\lambda_{ij})_{p \times m}$. Let $h_i^2 = \sum_{j=1}^{m} \lambda_{ij}^2$ $(i = 1, \dots, p)$. Then $h_i^2$ is called the communality of the variable $X_i$. Let

$$d_{ij}^2 = \frac{\lambda_{ij}^2}{h_i^2} \quad (i = 1, \dots, p; \ j = 1, \dots, m), \qquad D = \begin{pmatrix} d_{11} & \cdots & d_{1j} & \cdots & d_{1m} \\ \vdots & & \vdots & & \vdots \\ d_{i1} & \cdots & d_{ij} & \cdots & d_{im} \\ \vdots & & \vdots & & \vdots \\ d_{p1} & \cdots & d_{pj} & \cdots & d_{pm} \end{pmatrix}.$$

The variance of the $p$ values $d_{1j}^2, \dots, d_{pj}^2$ in the $j$-th column of $D$ is defined as

$$V_j = \frac{1}{p} \sum_{i=1}^{p} (d_{ij}^2 - \bar d_j)^2,$$

where $\bar d_j = \frac{1}{p} \sum_{i=1}^{p} d_{ij}^2$ $(j = 1, \dots, m)$. This implies that

$$V_j = \frac{1}{p} \sum_{i=1}^{p} \frac{\lambda_{ij}^4}{h_i^4} - \frac{1}{p^2} \Big( \sum_{\mu=1}^{p} \frac{\lambda_{\mu j}^2}{h_\mu^2} \Big)^2.$$

The variance of the factor loading matrix $\Lambda$ is

$$V = \sum_{j=1}^{m} V_j = \frac{1}{p^2} \sum_{j=1}^{m} \Big( p \sum_{i=1}^{p} \frac{\lambda_{ij}^4}{h_i^4} - \Big( \sum_{\mu=1}^{p} \frac{\lambda_{\mu j}^2}{h_\mu^2} \Big)^2 \Big).$$

Through a rotation of the loading matrix, we expect the variance $V$ to become large, so that the normalized loading values tend to 1 or tend to 0. In this way, the corresponding common factors have a simplified structure.
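The variance criterion above can be evaluated numerically. The following Python sketch computes it for a loading matrix given as a list of rows; the function name `loading_variance` and the two example matrices are assumptions made for illustration, and the sketch only evaluates the criterion without searching for the best rotation.

```python
import math


def loading_variance(loadings):
    """Sum over columns of the variance of lambda_ij^2 / h_i^2."""
    p, m = len(loadings), len(loadings[0])
    h2 = [sum(a * a for a in row) for row in loadings]          # communalities
    d2 = [[loadings[i][j] ** 2 / h2[i] for j in range(m)] for i in range(p)]
    total = 0.0
    for j in range(m):
        col = [d2[i][j] for i in range(p)]
        mean = sum(col) / p
        total += sum((v - mean) ** 2 for v in col) / p
    return total


r = math.sqrt(0.5)
simple = [[1.0, 0.0], [0.0, 1.0]]     # each variable loads on exactly one factor
mixed = [[r, r], [-r, r]]             # the same loadings rotated by 45 degrees
v_simple = loading_variance(simple)
v_mixed = loading_variance(mixed)
```

The simple structure gives the larger value, while the evenly mixed loadings give a value of zero, which is why a rotation that increases this variance is preferred.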


6 Numerical methods

Numerical methods are widely used in every branch of environmental science. While environmental simulation, prediction, and impact assessment depend on mathematical models, numerical methods provide a practical approach for quickly computing the solutions of these models. In this chapter, we introduce numerical integration and differentiation, numerical linear algebra, and the numerical solution of ordinary and partial differential equations by the iterative method, difference method, finite element method, and wavelet method.

6.1 Numerical integration

Let $x_k \in [a, b]$ $(k = 1, \dots, n)$. If a function $f$ is continuous on $[a, b]$ and $f(x_k)$ $(k = 1, \dots, n)$ are known, the quadrature formula for the integral $\int_a^b f(x) \, dx$ is

$$\int_a^b f(x) \, dx \doteq \sum_{k=1}^{n} A_k f(x_k),$$

where the $x_k$ are called nodes and the $A_k$ are called quadrature coefficients. Throughout this chapter, the notation $A \doteq B$ means that $A$ is approximately equal to $B$. One wants to choose suitable nodes $x_k$ and coefficients $A_k$ such that the error

$$e_n(f) = \int_a^b f(x) \, dx - \sum_{k=1}^{n} A_k f(x_k)$$

is as small as possible.

6.1.1 Interpolation-type quadrature formulas

Assume that $f$ is a continuous function on $[a, b]$ and $a \le x_1 < x_2 < \cdots < x_n \le b$. Let

$$\omega_n(x) = (x - x_1)(x - x_2) \cdots (x - x_n), \qquad l_k(x) = \frac{\omega_n(x)}{\omega_n'(x_k)(x - x_k)} \quad (k = 1, \dots, n).$$

The Lagrange interpolation polynomial for $f$ is $L(x) = \sum_{k=1}^{n} f(x_k) l_k(x)$ (see Section 4.2). Since $f(x) \doteq L(x)$,

$$\int_a^b f(x) \, dx \doteq \int_a^b \Big( \sum_{k=1}^{n} f(x_k) l_k(x) \Big) dx = \sum_{k=1}^{n} A_k f(x_k), \tag{6.1.1}$$

where $A_k = \int_a^b l_k(x) \, dx$ $(k = 1, \dots, n)$. Assume further that $f \in C^n([a, b])$. Then the error is estimated by

$$|e_n(f)| = \Big| \frac{1}{n!} \int_a^b f^{(n)}(\xi_x) \omega_n(x) \, dx \Big| \le \frac{C}{n!} \max_{a \le x \le b} |f^{(n)}(x)|,$$

where $C = \int_a^b |\omega_n(x)| \, dx$. If $f$ is a polynomial of degree $\le n - 1$, then $f(x) \equiv L(x)$. So (6.1.1) is exact, i.e.,

$$\int_a^b f(x) \, dx = \sum_{k=1}^{n} A_k f(x_k).$$

Let $n = 2$, $x_1 = a$, $x_2 = b$ in (6.1.1). The trapezoidal formula is

$$\int_a^b f(x) \, dx \doteq \frac{b - a}{2} (f(a) + f(b)).$$

Let $n = 3$, $x_1 = a$, $x_2 = \frac{a+b}{2}$, $x_3 = b$ in (6.1.1). Then the Simpson formula is

$$\int_a^b f(x) \, dx \doteq \frac{b - a}{6} \Big( f(a) + 4 f\Big( \frac{a + b}{2} \Big) + f(b) \Big).$$

Let $h = \frac{b-a}{n}$ and $x_k = a + kh$ $(k = 0, \dots, n)$. Applying the trapezoidal formula on each interval $[x_k, x_{k+1}]$ $(k = 0, \dots, n - 1)$, the composite trapezoidal formula is given by

$$\int_a^b f(x) \, dx \doteq \frac{h}{2} \Big( f(a) + 2 \sum_{k=1}^{n-1} f(a + kh) + f(b) \Big).$$

If $f \in C^2([a, b])$, the error is

$$e_n(f) = -\frac{(b - a)^3}{12 n^2} f''(\eta_1) \quad (a \le \eta_1 \le b).$$

Let $h = \frac{b-a}{2m}$ and $x_k = a + kh$ $(k = 0, \dots, 2m)$. Applying the Simpson formula on each interval $[x_{2k-2}, x_{2k}]$ $(k = 1, \dots, m)$, the composite Simpson formula is given by

$$\int_a^b f(x) \, dx \doteq \frac{h}{3} \Big( f(a) + 4 \sum_{k=1}^{m} f(x_{2k-1}) + 2 \sum_{k=1}^{m-1} f(x_{2k}) + f(b) \Big).$$

If $f \in C^4([a, b])$, the error is

$$e_n(f) = -\frac{(b - a)^5}{2880 m^4} f^{(4)}(\eta_2) \quad (a \le \eta_2 \le b).$$
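The composite rules with equally spaced nodes can be sketched in Python as follows. The function names are assumptions made for this illustration, and the integrand (the sine function on the interval from 0 to pi, whose integral is 2) is chosen only because its exact value is known.

```python
import math


def composite_trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    inner = sum(f(a + k * h) for k in range(1, n))
    return h / 2.0 * (f(a) + 2.0 * inner + f(b))


def composite_simpson(f, a, b, m):
    """Composite Simpson rule with 2m equal subintervals."""
    h = (b - a) / (2 * m)
    odd = sum(f(a + (2 * k - 1) * h) for k in range(1, m + 1))
    even = sum(f(a + 2 * k * h) for k in range(1, m))
    return h / 3.0 * (f(a) + 4.0 * odd + 2.0 * even + f(b))


exact = 2.0
trap = composite_trapezoid(math.sin, 0.0, math.pi, 100)
simp = composite_simpson(math.sin, 0.0, math.pi, 50)
```

Both calls use the same 101 nodes, yet the Simpson value is several orders of magnitude closer to the exact integral, in line with the error terms of order two and four in the step size.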

6.1.2 Gauss quadrature formula

One expects to choose the nodes such that the quadrature formula becomes more precise. The zeros of Legendre polynomials play such a role. It is well known that the Legendre polynomials

$$P_n(x) = \frac{1}{2^n n!} \frac{d^n}{dx^n} (x^2 - 1)^n \quad (n = 0, 1, \dots)$$

are orthogonal polynomials on $[-1, 1]$ with weight 1. The $n$ zeros $x_k^{(n)}$ of $P_n$ satisfy $-1 < x_1^{(n)} < \cdots < x_n^{(n)} < 1$. If the interpolation nodes are the zeros of the Legendre polynomial, then the Gauss–Legendre quadrature formula is

$$\int_{-1}^{1} f(x) \, dx \doteq \sum_{k=1}^{n} A_k^{(n)} f(x_k^{(n)}),$$

where

$$A_k^{(n)} = \frac{2 (1 - (x_k^{(n)})^2)}{n^2 P_{n-1}^2(x_k^{(n)})}.$$

This formula is the classical Gauss quadrature formula. For $f \in C^{2n}([-1, 1])$, the error is

$$e_n(f) = \frac{2^{2n+1} (n!)^4}{(2n + 1) ((2n)!)^3} f^{(2n)}(\xi) \quad (-1 < \xi < 1).$$

If $f$ is a polynomial of degree $\le 2n - 1$, then the Gauss–Legendre quadrature formula is exact, i.e.,

$$\int_{-1}^{1} f(x) \, dx = \sum_{k=1}^{n} A_k^{(n)} f(x_k^{(n)}).$$
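As a concrete instance, the three-point Gauss–Legendre rule can be written out in Python. The nodes are the zeros of the Legendre polynomial of degree three (zero and plus or minus the square root of 3/5), and the weights are computed from the coefficient formula above using the degree-two Legendre polynomial. The function names are assumptions made for this sketch.

```python
import math


def legendre_p2(x):
    """Legendre polynomial of degree 2."""
    return (3.0 * x * x - 1.0) / 2.0


def gauss_legendre_3(f):
    """Three-point Gauss-Legendre rule on [-1, 1]; returns (value, weights)."""
    n = 3
    nodes = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]        # zeros of P_3
    weights = [2.0 * (1.0 - x * x) / (n * n * legendre_p2(x) ** 2) for x in nodes]
    return sum(w * f(x) for w, x in zip(weights, nodes)), weights


value, weights = gauss_legendre_3(lambda x: x ** 4 + x ** 2)
exact = 2.0 / 5.0 + 2.0 / 3.0     # integral of x^4 + x^2 over [-1, 1]
```

The integrand has degree four, which is below the exactness limit of degree five for three nodes, so the rule reproduces the integral up to rounding.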

More generally, assume that $\{\omega_n(x)\}$ is a monic orthogonal polynomial system on $[a, b]$ with weight $\rho(x)$. Take the zeros $\{x_k^{(n)}\}_{k=1,\dots,n}$ of $\omega_n(x)$ as nodes. The general Gauss quadrature formula is

$$\int_a^b f(x) \rho(x) \, dx \doteq \sum_{k=1}^{n} A_k^{(n)} f(x_k^{(n)}),$$

where

$$A_k^{(n)} = \int_a^b \frac{\rho(x) \omega_n(x)}{(x - x_k^{(n)}) \, \omega_n'(x_k^{(n)})} \, dx.$$

Assume further that $f \in C^{2n}([a, b])$. The error is

$$e_n(f) = \int_a^b f(x) \rho(x) \, dx - \sum_{k=1}^{n} A_k^{(n)} f(x_k^{(n)}) = \frac{f^{(2n)}(\xi)}{(2n)!} \int_a^b \rho(x) \omega_n^2(x) \, dx \quad (a < \xi < b).$$

If $f$ is a polynomial of degree $\le 2n - 1$, the general Gauss quadrature formula is exact, i.e.,

$$\int_a^b f(x) \rho(x) \, dx = \sum_{k=1}^{n} A_k^{(n)} f(x_k^{(n)}).$$

As a special case, the monic Chebyshev polynomials

$$\tilde T_n(x) = \frac{\cos(n \arccos x)}{2^{n-1}} \quad (n = 1, 2, \dots)$$

are orthogonal polynomials on $[-1, 1]$ with weight $\rho(x) = \frac{1}{\sqrt{1 - x^2}}$. Taking the zeros $x_k = \cos \frac{2k - 1}{2n} \pi$ $(k = 1, \dots, n)$ of $\tilde T_n$ as nodes, the general Gauss quadrature formula is

$$\int_{-1}^{1} \frac{f(x)}{\sqrt{1 - x^2}} \, dx = \frac{\pi}{n} \sum_{k=1}^{n} f\Big( \cos \frac{2k - 1}{2n} \pi \Big) + \frac{\pi}{(2n)! \, 2^{2n-1}} f^{(2n)}(\xi) \quad (-1 < \xi \le 1).$$

This formula is also called the Hermite formula.
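Because all coefficients of this rule are equal, it is especially short to implement. The Python sketch below drops the remainder term and keeps only the weighted sum; the function name `gauss_chebyshev` and the test integrand are assumptions made for this illustration.

```python
import math


def gauss_chebyshev(f, n):
    """Approximate the integral of f(x) / sqrt(1 - x^2) over [-1, 1] with n nodes."""
    nodes = [math.cos((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]
    return math.pi / n * sum(f(x) for x in nodes)


# The weighted integral of x^2 equals pi / 2; two nodes already integrate it exactly.
approx = gauss_chebyshev(lambda x: x * x, 2)
```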

6.1.3 Gauss quadrature formula on infinite intervals

Laguerre polynomials

$$L_n(x) = e^x \frac{d^n}{dx^n} (e^{-x} x^n) \quad (n = 0, 1, \dots)$$

are orthogonal polynomials on $[0, \infty)$ with weight $\rho(x) = e^{-x}$. The $n$ zeros $x_k^{(n)}$ of $L_n$ satisfy $0 < x_1^{(n)} < \cdots < x_n^{(n)} < \infty$. Take these zeros as interpolation nodes. The Gauss–Laguerre quadrature formula is

$$\int_0^\infty e^{-x} f(x) \, dx \doteq \sum_{k=1}^{n} A_k^{(n)} f(x_k^{(n)}),$$

where

$$A_k^{(n)} = \frac{(n!)^2}{x_k^{(n)} (L_n'(x_k^{(n)}))^2}.$$

Assume that $f \in C^{2n}([0, \infty))$. Then the error is

$$e_n(f) = \frac{(n!)^2}{(2n)!} f^{(2n)}(\xi) \quad (0 < \xi < \infty).$$

Hermite polynomials

$$H_n(x) = (-1)^n e^{x^2} \frac{d^n}{dx^n} (e^{-x^2}) \quad (n = 0, 1, \dots)$$

are orthogonal polynomials on $(-\infty, \infty)$ with weight $\rho(x) = e^{-x^2}$. Take the zeros of $H_n$ as interpolation nodes. The Gauss–Hermite quadrature formula is

$$\int_{-\infty}^{\infty} e^{-x^2} f(x) \, dx \doteq \sum_{k=1}^{n} A_k^{(n)} f(x_k^{(n)}),$$

where

$$A_k^{(n)} = \frac{2^{n+1} n! \sqrt{\pi}}{(H_n'(x_k^{(n)}))^2}.$$

Assume that $f \in C^{2n}(\mathbb{R})$. The error is

$$e_n(f) = \frac{n! \sqrt{\pi}}{2^n (2n)!} f^{(2n)}(\xi) \quad (\xi \in \mathbb{R}).$$

If $f$ is a polynomial of degree $\le 2n - 1$, then the Gauss–Laguerre quadrature formula and the Gauss–Hermite quadrature formula are both exact.
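The two-point versions of both rules are small enough to write out by hand. In the Python sketch below, the degree-two polynomials follow the definitions above (without any extra normalization): the Hermite polynomial is 4x² − 2 and the Laguerre polynomial is x² − 4x + 2. The function names are assumptions made for this illustration.

```python
import math


def gauss_hermite_2(f):
    """Two-point Gauss-Hermite rule; H_2(x) = 4x^2 - 2, so H_2'(x) = 8x."""
    n = 2
    nodes = [-1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]          # zeros of H_2
    weights = [2 ** (n + 1) * math.factorial(n) * math.sqrt(math.pi) / (8.0 * x) ** 2
               for x in nodes]
    return sum(w * f(x) for w, x in zip(weights, nodes))


def gauss_laguerre_2(f):
    """Two-point Gauss-Laguerre rule; L_2(x) = x^2 - 4x + 2, so L_2'(x) = 2x - 4."""
    n = 2
    nodes = [2.0 - math.sqrt(2.0), 2.0 + math.sqrt(2.0)]           # zeros of L_2
    weights = [math.factorial(n) ** 2 / (x * (2.0 * x - 4.0) ** 2) for x in nodes]
    return sum(w * f(x) for w, x in zip(weights, nodes))


hermite_value = gauss_hermite_2(lambda x: x * x)     # exact value: sqrt(pi) / 2
laguerre_value = gauss_laguerre_2(lambda x: x ** 3)  # exact value: 3! = 6
```

Both test integrands have degree at most three, so the two-point rules reproduce the weighted integrals up to rounding error.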

6.2 Numerical differentiation

Assume that a function is smooth on a closed interval and its values at some points of the interval are known. Numerical differentiation estimates its derivative from its values at these points.

6.2.1 Differentiation via polynomial interpolation

Let $f(x)$ be a smooth function on $[a, b]$ and $a = x_1 < \cdots < x_n = b$. We use the derivative of the Lagrange interpolation polynomial $L_n(x)$ to estimate the derivative of $f$ as follows: $f'(x_k) \doteq L_n'(x_k)$ $(k = 1, \dots, n)$. By (4.2.5),

$$f(x) - L_n(x) = \frac{1}{n!} f^{(n)}(\xi_x) \, \omega_n(x) \quad (a < \xi_x < b),$$

where $\omega_n(x) = (x - x_1) \cdots (x - x_n)$. Differentiating both sides gives

$$f'(x) - L_n'(x) = \frac{f^{(n)}(\xi_x)}{n!} \frac{d\omega_n(x)}{dx} + \frac{\omega_n(x)}{n!} \frac{df^{(n)}(\xi_x)}{dx}.$$

Let $x = x_k$. Note that $\omega_n(x_k) = 0$. Then

$$f'(x_k) - L_n'(x_k) = \frac{f^{(n)}(\xi_{x_k})}{n!} \, \omega_n'(x_k).$$

This yields the following frequently used numerical differentiation formulas. Let $x_{k+1} - x_k = h$ $(k = 1, \dots, n - 1)$ and $y_k = f(x_k)$ $(k = 1, \dots, n)$.

(a) Two-point form

$$L_1(x) = \frac{x - x_1}{h} y_2 - \frac{x - x_2}{h} y_1$$

is the interpolation function with nodes $x_1$ and $x_2$. So

$$f'(x_1) \doteq L_1'(x_1) = \frac{y_2 - y_1}{h}, \qquad f'(x_2) \doteq L_1'(x_2) = \frac{y_2 - y_1}{h}.$$

(b) Three-point form

$$L_2(x) = \frac{(x - x_2)(x - x_3)}{2h^2} y_1 - \frac{(x - x_1)(x - x_3)}{h^2} y_2 + \frac{(x - x_1)(x - x_2)}{2h^2} y_3$$

is the interpolation function with nodes $x_1, x_2, x_3$. So the three-point form is

$$f'(x_1) \doteq L_2'(x_1) = \frac{-3y_1 + 4y_2 - y_3}{2h}, \qquad f'(x_2) \doteq L_2'(x_2) = \frac{y_3 - y_1}{2h}, \qquad f'(x_3) \doteq L_2'(x_3) = \frac{y_1 - 4y_2 + 3y_3}{2h}.$$

From $L_2(x)$, the second-order numerical differentiation formula is

$$f''(x_1) \doteq f''(x_2) \doteq f''(x_3) \doteq \frac{y_1 - 2y_2 + y_3}{h^2}.$$

In general, when L n (x) converges to f(x), L󸀠n (x) does not necessarily converge to To avoid this problem, we may use the spline interpolation function to find differentiation. f 󸀠 (x).
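The three-point formulas translate directly into code. The following is a minimal sketch; the function name `three_point_derivatives` is an illustrative choice. Because $L_2$ reproduces any quadratic exactly, the formulas are exact for $f(x) = x^2$, which gives a convenient check.

```python
def three_point_derivatives(y1, y2, y3, h):
    """Three-point estimates of f'(x1), f'(x2), f'(x3) and of f''
    from equally spaced samples y1, y2, y3 with spacing h."""
    d1 = (-3.0 * y1 + 4.0 * y2 - y3) / (2.0 * h)
    d2 = (y3 - y1) / (2.0 * h)
    d3 = (y1 - 4.0 * y2 + 3.0 * y3) / (2.0 * h)
    dd = (y1 - 2.0 * y2 + y3) / h**2
    return d1, d2, d3, dd


# Samples of f(x) = x**2 at x = 1.0, 1.5, 2.0 (so h = 0.5).
d1, d2, d3, dd = three_point_derivatives(1.0, 2.25, 4.0, 0.5)
```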

6.2.2 Differentiation via spline interpolation
Let $f \in C^4([a, b])$ and let $s(x)$ be its cubic spline interpolation function associated with the partition $a = x_0 < x_1 < \cdots < x_n = b$. When $\lambda = \max_{0 \le k \le n-1} |x_{k+1} - x_k| \to 0$, by Proposition 4.4.2, $s(x) \to f(x)$, $s'(x) \to f'(x)$, $s''(x) \to f''(x)$ uniformly, and the errors are estimated as
$$|f^{(i)}(x) - s^{(i)}(x)| \le C \lambda^{4-i} \quad (i = 0, 1, 2),$$
where $C$ is independent of $x$ and of the partition. The representation of $s(x)$ is given in Section 4.4.2.

Differentiating (4.4.1) once on $[x_k, x_{k+1}]$ $(k = 0, \dots, n - 1)$, we get
$$\begin{aligned}
s'(x) = {} & \frac{6}{\delta_k^2}\left(\frac{1}{\delta_k}(x - x_{k+1})^2 + (x - x_{k+1})\right) y_k
 + \frac{6}{\delta_k^2}\left((x - x_k) - \frac{1}{\delta_k}(x - x_k)^2\right) y_{k+1} \\
 & + \frac{1}{\delta_k}\left(\frac{3}{\delta_k}(x - x_{k+1})^2 + 2(x - x_{k+1})\right) \mu_k
 - \frac{1}{\delta_k}\left(2(x - x_k) - \frac{3}{\delta_k}(x - x_k)^2\right) \mu_{k+1},
\end{aligned}$$
where $\delta_k = x_{k+1} - x_k$ and $\mu_k = s'(x_k)$. So $f'(x) \doteq s'(x)$. In particular, $f'(x_k) \doteq \mu_k$ $(k = 0, \dots, n)$. The representation of $s''(x)$ is given in (4.4.2). So $f''(x) \doteq s''(x)$.

6.2.3 Richardson extrapolation
By the Taylor formula, it follows that for small $h$,
$$f(x + h) - f(x - h) = 2h f'(x) + \frac{2}{3!} h^3 f'''(x) + \frac{2}{5!} h^5 f^{(5)}(x) + \cdots$$
or
$$\frac{f(x + h) - f(x - h)}{2h} = f'(x) + \frac{1}{3!} h^2 f'''(x) + \frac{1}{5!} h^4 f^{(5)}(x) + \cdots.$$
Denote $a_2 = -\frac{f'''(x)}{3!}$, $a_4 = -\frac{f^{(5)}(x)}{5!}$, $\dots$. Then, for any small $h$,
$$f'(x) = \varphi(h) + a_2 h^2 + a_4 h^4 + a_6 h^6 + \cdots =: L(h), \tag{6.2.1}$$
where
$$\varphi(h) = \frac{f(x + h) - f(x - h)}{2h}.$$
Since $L(h) = f'(x)$ does not depend on $h$, we have $L(h) = L(h/2) = L(h/4) = \cdots$. So
$$4L(h) = 4L\left(\frac{h}{2}\right) = 4\varphi\left(\frac{h}{2}\right) + a_2 h^2 + a_4 \frac{h^4}{4} + a_6 \frac{h^6}{16} + \cdots.$$
Subtracting (6.2.1) and dividing by 3,
$$f'(x) = L(h) = \frac{4}{3} \varphi\left(\frac{h}{2}\right) - \frac{1}{3} \varphi(h) - \frac{1}{4} a_4 h^4 - \frac{5}{16} a_6 h^6 - \cdots,$$
and so
$$f'(x) = \frac{4}{3} \varphi\left(\frac{h}{2}\right) - \frac{1}{3} \varphi(h) + O(h^4),$$
$$L(h) = g(h) + b_4 h^4 + b_6 h^6 + \cdots, \tag{6.2.2}$$
where $b_4 = -\frac{a_4}{4}$, $b_6 = -\frac{5 a_6}{16}$, and
$$g(h) = \frac{4}{3} \varphi\left(\frac{h}{2}\right) - \frac{1}{3} \varphi(h). \tag{6.2.3}$$
Note that $L(h) = L(h/2)$. Then
$$16 L(h) = 16\, g\left(\frac{h}{2}\right) + b_4 h^4 + b_6 \frac{h^6}{4} + \cdots.$$
The combination of this with (6.2.2) gives
$$f'(x) = L(h) = \frac{16}{15}\, g\left(\frac{h}{2}\right) - \frac{1}{15}\, g(h) - \frac{b_6 h^6}{20} + \cdots,$$
where $b_6 = -\frac{5}{16} a_6 = \frac{5}{16} \frac{f^{(7)}(x)}{7!}$. From this and (6.2.1)–(6.2.3), it is seen that a simple combination of $\varphi(h)$, $\varphi(h/2)$, and $\varphi(h/4)$ furnishes an estimate of $f'(x)$ with accuracy $O(h^6)$. Continuing this procedure gives still more precise results.
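The two extrapolation steps can be sketched in a few lines. The names `phi`, `g`, and `richardson` mirror $\varphi$, $g$, and the final $O(h^6)$ combination; the test function $\sin$ and the step $h = 0.1$ are illustrative assumptions.

```python
import math


def phi(f, x, h):
    """Central difference quotient, accuracy O(h**2)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def g(f, x, h):
    """First extrapolation, as in (6.2.3), accuracy O(h**4)."""
    return (4.0 * phi(f, x, h / 2.0) - phi(f, x, h)) / 3.0


def richardson(f, x, h):
    """Second extrapolation, accuracy O(h**6)."""
    return (16.0 * g(f, x, h / 2.0) - g(f, x, h)) / 15.0


# Errors for f = sin at x = 1, where f'(1) = cos(1).
exact = math.cos(1.0)
err_phi = abs(phi(math.sin, 1.0, 0.1) - exact)
err_g = abs(g(math.sin, 1.0, 0.1) - exact)
err_r = abs(richardson(math.sin, 1.0, 0.1) - exact)
```

Each level reuses the central difference at halved steps, so `richardson` needs only $\varphi(h)$, $\varphi(h/2)$, and $\varphi(h/4)$.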

6.3 Iterative methods Iterative methods are used to solve nonlinear equations, systems of linear equations, ordinary differential equations, and the matrix eigenvalue problem.

6.3.1 Fixed point principle
Given an equation $f(x) = 0$, where $f$ is continuous, its equivalent form is $x = \varphi(x)$, where $\varphi(x) = x + f(x)$. Starting with an initial value $x_0$, we get the iterative sequence $\{x_n\}$ as follows:
$$x_1 = \varphi(x_0), \quad x_2 = \varphi(x_1), \quad \dots, \quad x_n = \varphi(x_{n-1}), \quad \dots$$
If $\{x_n\}$ converges to $\xi$, then $\xi = \varphi(\xi)$. So $\xi$ is a solution of $f(x) = 0$.

Fixed point principle. Assume that a function $\varphi(x)$ satisfies
$$|\varphi(y_1) - \varphi(y_2)| \le M |y_1 - y_2| \quad (y_1, y_2 \in [a, b]),$$
where $0 < M < 1$ is a constant independent of $y_1$ and $y_2$. Take an initial value $x_0 \in (a, b)$. Then
– the iterative sequence $\{x_n\}$ satisfying $x_n = \varphi(x_{n-1})$ converges. Denote $\xi = \lim_{n \to \infty} x_n$;
– $\xi = \varphi(\xi)$, i.e., $\xi$ is the fixed point of $\varphi(x)$; and
– the rate of convergence is $|x_n - \xi| \le \frac{M^n}{1 - M} |x_1 - x_0|$.

In fact, by the assumption, for any $k \in \mathbb{Z}_+$,
$$|x_{k+1} - x_k| = |\varphi(x_k) - \varphi(x_{k-1})| \le M |x_k - x_{k-1}| \le \cdots \le M^k |x_1 - x_0|.$$
This implies that
$$\begin{aligned}
|x_{n+p} - x_n| &\le |x_{n+p} - x_{n+p-1}| + |x_{n+p-1} - x_{n+p-2}| + \cdots + |x_{n+1} - x_n| \\
&\le (M^{n+p-1} + M^{n+p-2} + \cdots + M^n)\, |x_1 - x_0| = \frac{M^n (1 - M^p)}{1 - M}\, |x_1 - x_0|. 
\end{aligned} \tag{6.3.1}$$
Since $0 < M < 1$, $\lim_{n \to \infty} |x_{n+p} - x_n| = 0$. The Cauchy criterion indicates that the iterative sequence $\{x_n\}$ converges. Denote $\xi = \lim_{n \to \infty} x_n$. From this and $x_n = \varphi(x_{n-1})$, it follows that $\xi = \varphi(\xi)$, i.e., $\xi$ is the fixed point of $\varphi(x)$. Letting $p \to \infty$ in (6.3.1), since $x_{n+p} \to \xi$ and $M^p \to 0$, we get the desired rate of convergence.
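A minimal sketch of the iteration $x_n = \varphi(x_{n-1})$ follows. The function name `fixed_point`, the tolerance, and the example $\varphi = \cos$ are illustrative assumptions; $\cos$ is a contraction on $[0, 1]$ because $|\varphi'(x)| = |\sin x| \le \sin 1 < 1$ there.

```python
import math


def fixed_point(phi, x0, tol=1e-12, max_iter=1000):
    """Iterate x_n = phi(x_{n-1}) until two successive iterates differ
    by less than tol. Returns the last iterate and the step count."""
    x = x0
    for n in range(1, max_iter + 1):
        x_new = phi(x)
        if abs(x_new - x) < tol:
            return x_new, n
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")


root, steps = fixed_point(math.cos, 1.0)
```

The stopping rule uses successive differences because $\xi$ itself is unknown; by (6.3.1) a small difference bounds the true error up to the factor $M/(1 - M)$.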

6.3.2 Iterative methods of univariate nonlinear equations
Let $f \in C^2([a, b])$, let $\xi$ be a zero of $f$, and let $x$ be an approximation to $\xi$. By Taylor's formula,
$$0 = f(\xi) = f(x + h) = f(x) + h f'(x) + O(h^2) \approx f(x) + h f'(x),$$
where $h = \xi - x$. This implies $h \doteq -f(x)/f'(x)$. Since $x$ is an approximation to $\xi$, a better approximation to $\xi$ should be $x - f(x)/f'(x)$. Starting with an estimate $x_0$ of $\xi$, Newton's iterative method defines inductively
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \quad (n = 0, 1, \dots).$$
If $x_0$ is close to a zero of $f$, then the sequence $\{x_n\}$ converges to $\xi$. In particular, if $f \in C^2(\mathbb{R})$ is an increasing and convex function and has exactly one zero, then the Newton iteration sequence $\{x_n\}$ converges to the zero of $f$ from any starting point. Since
$$f'(x_n) \doteq \frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}},$$
replacing $f'(x_n)$ by this difference quotient in the above Newton formula, the iteration formula becomes
$$x_{n+1} = x_n - f(x_n) \left[\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}\right] \quad (n = 1, 2, \dots).$$

This iterative method (the secant method) is simpler than the Newton iterative method, since it needs no derivative, but its convergence rate is slower.

A polynomial of degree $n$
$$p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$$
has exactly $n$ complex roots, and all of its roots lie in the disk with center 0 and radius $\rho = 1 + |a_n|^{-1} \max_{0 \le k \le n} |a_k|$. Horner's algorithm can give the values of a polynomial and its derivative simultaneously. Define $\alpha_k$ and $\beta_k$ for $k = 0, 1, \dots, n$ as follows:
$$\begin{aligned}
\alpha_n &= a_n, & \beta_n &= 0, \\
\alpha_{n-1} &= a_{n-1} + x \alpha_n, & \beta_{n-1} &= \alpha_n + x \beta_n, \\
\alpha_{n-2} &= a_{n-2} + x \alpha_{n-1}, & \beta_{n-2} &= \alpha_{n-1} + x \beta_{n-1}, \\
&\ \ \vdots & &\ \ \vdots \\
\alpha_0 &= a_0 + x \alpha_1, & \beta_0 &= \alpha_1 + x \beta_1.
\end{aligned}$$
Then $\alpha_0 = p(x)$ and $\beta_0 = p'(x)$. Using Horner's algorithm, one can rapidly compute $p(x_k)/p'(x_k)$ to obtain $x_{k+1} = x_k - p(x_k)/p'(x_k)$ from $x_k$ in Newton's method. This iterative process yields a root $\xi$ of $p(x)$.

Let $q(x) = p(x)/(x - \xi)$. Then $q(x)$ is a polynomial of degree $n - 1$. Denote $q(x) = b_{n-1} x^{n-1} + b_{n-2} x^{n-2} + \cdots + b_0$. We will find $b_{n-1}, \dots, b_0$. From $p(x) = (x - \xi) q(x)$, it follows that
$$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0 = (x - \xi)(b_{n-1} x^{n-1} + b_{n-2} x^{n-2} + \cdots + b_0).$$
Comparing coefficients on both sides, we get
$$b_{n-1} = a_n, \quad b_{n-2} = a_{n-1} + \xi b_{n-1}, \quad \dots, \quad b_0 = a_1 + \xi b_1,$$
i.e., we get $q(x)$. The roots of $q(x)$ are the remaining $n - 1$ roots of $p(x)$. Repeating the above procedure with $p(x)$ replaced by $q(x)$, we get another root of $p(x)$. Continuing the procedure, we finally get all roots of $p(x)$.
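The Horner recurrences, Newton's iteration, and the deflation step can be combined as sketched below. This is an illustration under stated assumptions: coefficients are passed highest degree first, all roots are assumed real and simple (complex roots would need a complex starting value), and the names `horner`, `newton_poly`, `deflate`, and `real_roots` are hypothetical.

```python
def horner(coeffs, x):
    """coeffs = [a_n, ..., a_0]. Returns (p(x), p'(x)) using the
    alpha/beta recurrences."""
    alpha, beta = coeffs[0], 0.0
    for a in coeffs[1:]:
        beta = alpha + x * beta   # beta_k = alpha_{k+1} + x * beta_{k+1}
        alpha = a + x * alpha     # alpha_k = a_k + x * alpha_{k+1}
    return alpha, beta


def newton_poly(coeffs, x0, tol=1e-13, max_iter=100):
    """Newton's method for a polynomial, evaluated by Horner's algorithm."""
    x = x0
    for _ in range(max_iter):
        p, dp = horner(coeffs, x)
        if dp == 0.0:
            raise ZeroDivisionError("derivative vanished during iteration")
        step = p / dp
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


def deflate(coeffs, xi):
    """Coefficients [b_{n-1}, ..., b_0] of q(x) = p(x) / (x - xi)."""
    b = [coeffs[0]]
    for a in coeffs[1:-1]:
        b.append(a + xi * b[-1])
    return b


def real_roots(coeffs, x0=0.0):
    """Find all roots by repeated Newton iteration and deflation."""
    roots, current = [], list(coeffs)
    while len(current) > 1:
        xi = newton_poly(current, x0)
        roots.append(xi)
        current = deflate(current, xi)
    return roots


# p(x) = x**3 - 6x**2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
roots = real_roots([1.0, -6.0, 11.0, -6.0])
```

Each deflation uses an approximate root, so rounding errors accumulate; a common refinement, not shown here, is to polish every root against the original polynomial.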


6.3.3 Iterative method of systems of bivariate nonlinear equations
Consider a system of nonlinear equations
$$\begin{cases} f_1(x, y) = 0, \\ f_2(x, y) = 0, \end{cases}$$
where $f_1$ and $f_2$ are both continuously differentiable functions. A pair of initial values $(x_0, y_0)$ is given. Expanding $f_1$, $f_2$ in a neighborhood of $(x_0, y_0)$ by the Taylor theorem and keeping only the linear parts, we get the following system of equations:
$$\begin{cases}
f_1(x_0, y_0) + \dfrac{\partial f_1}{\partial x}(x_0, y_0)(x - x_0) + \dfrac{\partial f_1}{\partial y}(x_0, y_0)(y - y_0) = 0, \\[2mm]
f_2(x_0, y_0) + \dfrac{\partial f_2}{\partial x}(x_0, y_0)(x - x_0) + \dfrac{\partial f_2}{\partial y}(x_0, y_0)(y - y_0) = 0.
\end{cases}$$
If the Jacobian determinant
$$J_0 = \begin{vmatrix} \dfrac{\partial f_1}{\partial x}(x_0, y_0) & \dfrac{\partial f_1}{\partial y}(x_0, y_0) \\[2mm] \dfrac{\partial f_2}{\partial x}(x_0, y_0) & \dfrac{\partial f_2}{\partial y}(x_0, y_0) \end{vmatrix} \ne 0,$$
then, by Cramer's rule, the solution is
$$x_1 = x_0 - J_0^{-1} \begin{vmatrix} f_1(x_0, y_0) & \dfrac{\partial f_1}{\partial y}(x_0, y_0) \\[2mm] f_2(x_0, y_0) & \dfrac{\partial f_2}{\partial y}(x_0, y_0) \end{vmatrix}, \qquad
y_1 = y_0 - J_0^{-1} \begin{vmatrix} \dfrac{\partial f_1}{\partial x}(x_0, y_0) & f_1(x_0, y_0) \\[2mm] \dfrac{\partial f_2}{\partial x}(x_0, y_0) & f_2(x_0, y_0) \end{vmatrix}.$$
Continuing this procedure, in general, starting from $(x_k, y_k)$, if
$$J_k = \begin{vmatrix} \dfrac{\partial f_1}{\partial x}(x_k, y_k) & \dfrac{\partial f_1}{\partial y}(x_k, y_k) \\[2mm] \dfrac{\partial f_2}{\partial x}(x_k, y_k) & \dfrac{\partial f_2}{\partial y}(x_k, y_k) \end{vmatrix} \ne 0 \quad (k \in \mathbb{Z}_+),$$
the solution is
$$x_{k+1} = x_k - J_k^{-1} \begin{vmatrix} f_1(x_k, y_k) & \dfrac{\partial f_1}{\partial y}(x_k, y_k) \\[2mm] f_2(x_k, y_k) & \dfrac{\partial f_2}{\partial y}(x_k, y_k) \end{vmatrix}, \qquad
y_{k+1} = y_k - J_k^{-1} \begin{vmatrix} \dfrac{\partial f_1}{\partial x}(x_k, y_k) & f_1(x_k, y_k) \\[2mm] \dfrac{\partial f_2}{\partial x}(x_k, y_k) & f_2(x_k, y_k) \end{vmatrix}.$$
This gives the iterative process. If $\max\{ |x_{N+1} - x_N|, |y_{N+1} - y_N| \} < \varepsilon$ for a prescribed tolerance $\varepsilon$, then $(x_{N+1}, y_{N+1})$ is taken as the desired solution.
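The bivariate iteration can be sketched as follows, with each update obtained by Cramer's rule from the linearized system. The function name `newton_2d`, the convention that `jac` returns the four partial derivatives as a tuple, and the example system are assumptions made for illustration.

```python
import math


def newton_2d(f1, f2, jac, x0, y0, eps=1e-12, max_iter=50):
    """Newton's method for f1(x, y) = 0, f2(x, y) = 0.
    jac(x, y) returns (df1/dx, df1/dy, df2/dx, df2/dy)."""
    x, y = x0, y0
    for _ in range(max_iter):
        a, b, c, d = jac(x, y)
        det = a * d - b * c           # Jacobian determinant J_k
        if det == 0.0:
            raise ZeroDivisionError("singular Jacobian")
        u, v = f1(x, y), f2(x, y)
        x_new = x - (u * d - b * v) / det   # determinant | f1 f1_y ; f2 f2_y |
        y_new = y - (a * v - u * c) / det   # determinant | f1_x f1 ; f2_x f2 |
        if max(abs(x_new - x), abs(y_new - y)) < eps:
            return x_new, y_new
        x, y = x_new, y_new
    raise RuntimeError("Newton iteration did not converge")


# Intersection of the circle x**2 + y**2 = 4 with the line x = y.
sol_x, sol_y = newton_2d(
    lambda x, y: x**2 + y**2 - 4.0,
    lambda x, y: x - y,
    lambda x, y: (2.0 * x, 2.0 * y, 1.0, -1.0),
    1.0, 2.0,
)
```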


6.3.4 Iterative method to solve systems of linear equations
Consider a system of linear equations
$$\sum_{j=1}^{n} a_{ij} x_j = b_i \quad (i = 1, \dots, n). \tag{6.3.2}$$
The matrix form is $Ax = b$, where $A = (a_{ij})_{n \times n}$, $x = (x_1, \dots, x_n)^T$, and $b = (b_1, \dots, b_n)^T$.

(a) Jacobi iterative method
The system of linear equations (6.3.2) can be rewritten in the form
$$x_i = \sum_{j=1}^{n} c_{ij} x_j + d_i \quad (i = 1, \dots, n).$$
Take $x_1^{(0)}, \dots, x_n^{(0)}$ as initial values. Applying the iterative formula, we get
$$x_i^{(k)} = \sum_{j=1}^{n} c_{ij} x_j^{(k-1)} + d_i \quad (i = 1, \dots, n),$$
which is called the Jacobi iterative formula. We find the $k$-th approximate solution such that
$$\max_{1 \le i \le n} |x_i^{(k)} - x_i^{(k-1)}| < \varepsilon,$$
where $\varepsilon$ is a prescribed tolerance. So $(x_1^{(k)}, \dots, x_n^{(k)})$ is an approximate solution of (6.3.2) with tolerance $\varepsilon$.

Let $(x_1^*, \dots, x_n^*)$ be the exact solution. If $\mu = \max_{1 \le i \le n} \sum_{j=1}^{n} |c_{ij}| < 1$, then the error is estimated by
$$\max_{1 \le i \le n} |x_i^{(k)} - x_i^*| \le \frac{\mu^k}{1 - \mu} \max_{1 \le i \le n} |x_i^{(1)} - x_i^{(0)}|.$$
If $\nu = \max_{1 \le j \le n} \sum_{i=1}^{n} |c_{ij}| < 1$, then the error is estimated by
$$\sum_{i=1}^{n} |x_i^{(k)} - x_i^*| \le \frac{\nu^k}{1 - \nu} \sum_{i=1}^{n} |x_i^{(1)} - x_i^{(0)}|.$$
If $p = \sum_{i,j=1}^{n} c_{ij}^2 < 1$, then the error is estimated by
$$\left(\sum_{i=1}^{n} (x_i^{(k)} - x_i^*)^2\right)^{1/2} \le \frac{p^{k/2}}{1 - p^{1/2}} \left(\sum_{i=1}^{n} (x_i^{(1)} - x_i^{(0)})^2\right)^{1/2}.$$
These are the a priori estimates of the fixed point principle in Section 6.3.1, with the contraction constant $M$ replaced by $\mu$, $\nu$, and $p^{1/2}$, respectively.


(b) Seidel iterative method
If the newly computed components $x_1^{(k)}, \dots, x_{i-1}^{(k)}$ are used as soon as they are available, the Jacobi iterative formula becomes
$$x_i^{(k)} = \sum_{j=1}^{i-1} c_{ij} x_j^{(k)} + \sum_{j=i+1}^{n} c_{ij} x_j^{(k-1)} + d_i,$$
which is called the Seidel (Gauss–Seidel) iterative formula. The convergence of this iterative process is usually faster. If $\max_{1 \le i \le n} \sum_{j=1}^{n} |c_{ij}| < 1$, then the error is estimated by
$$\max_{1 \le i \le n} |x_i^{(k)} - x_i^*| \le \frac{\mu^k}{1 - \mu} \max_{1 \le i \le n} |x_i^{(1)} - x_i^{(0)}|,$$
where
$$\mu = \max_{1 \le i \le n} \left\{ \frac{\sum_{j=i}^{n} |c_{ij}|}{1 - \sum_{j=1}^{i-1} |c_{ij}|} \right\}.$$

(c) Relaxation iterative method
The Seidel iterative formula is rewritten in the form
$$x_i^{(k)} = \omega \left( \sum_{j=1}^{i-1} c_{ij} x_j^{(k)} + \sum_{j=i+1}^{n} c_{ij} x_j^{(k-1)} + d_i \right) + (1 - \omega)\, x_i^{(k-1)},$$
where $\omega$ is a constant called the relaxation factor. This formula is called the relaxation iterative formula; for $\omega = 1$ it reduces to the Seidel formula. We may choose $\omega$ so that the iterative process converges faster.
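The three iterations differ only in which components they reuse, as the sketch below shows. It assumes the usual splitting $c_{ij} = -a_{ij}/a_{ii}$ for $j \ne i$, $c_{ii} = 0$, $d_i = b_i/a_{ii}$, which the text does not spell out; the function names and the test system are illustrative.

```python
import numpy as np


def split(A, b):
    """Rewrite A x = b as x = C x + d (assumes nonzero diagonal entries)."""
    diag = np.diag(A)
    C = -A / diag[:, None]
    np.fill_diagonal(C, 0.0)
    return C, b / diag


def jacobi_step(C, d, x):
    """One Jacobi sweep: every component uses the previous iterate."""
    return C @ x + d


def relaxation_step(C, d, x, omega=1.0):
    """One relaxation sweep; omega = 1 gives the Seidel formula."""
    x_new = x.copy()
    for i in range(len(d)):
        s = C[i, :i] @ x_new[:i] + C[i, i + 1:] @ x[i + 1:] + d[i]
        x_new[i] = omega * s + (1.0 - omega) * x[i]
    return x_new


def iterate(step, C, d, eps=1e-12, max_iter=10000):
    """Repeat `step` until max_i |x_i^(k) - x_i^(k-1)| < eps."""
    x = np.zeros_like(d)
    for k in range(1, max_iter + 1):
        x_new = step(C, d, x)
        if np.max(np.abs(x_new - x)) < eps:
            return x_new, k
        x = x_new
    raise RuntimeError("iteration did not converge")


A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([5.0, 6.0, 5.0])          # exact solution is (1, 1, 1)
C, d = split(A, b)
x_jacobi, k_jacobi = iterate(jacobi_step, C, d)
x_seidel, k_seidel = iterate(relaxation_step, C, d)
```

For this diagonally dominant system both methods converge, and the Seidel sweep needs fewer iterations than the Jacobi sweep.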

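6.3.5 Maximum eigenvalues for matrices
Let $A$ be an $n \times n$ matrix. Suppose that the eigenvalues $\lambda_1, \dots, \lambda_n$ of $A$ satisfy $|\lambda_1| > \cdots > |\lambda_n|$ and the corresponding eigenvectors $\upsilon_1, \dots, \upsilon_n$ satisfy $\|\upsilon_1\| = \cdots = \|\upsilon_n\| = 1$. If these eigenvectors constitute a basis for the $n$-dimensional space, then an initial vector $x_0$ can be expressed as a linear combination of eigenvectors, i.e.,
$$x_0 = a_1 \upsilon_1 + a_2 \upsilon_2 + \cdots + a_n \upsilon_n = \sum_{i=1}^{n} a_i \upsilon_i.$$
Define $x_k = A x_{k-1}$ $(k \in \mathbb{Z}_+)$. Since $A \upsilon_i = \lambda_i \upsilon_i$,
$$x_1 = A x_0 = \sum_{i=1}^{n} a_i (A \upsilon_i) = \sum_{i=1}^{n} \lambda_i a_i \upsilon_i; \qquad
x_2 = A x_1 = \sum_{i=1}^{n} \lambda_i a_i (A \upsilon_i) = \sum_{i=1}^{n} \lambda_i^2 a_i \upsilon_i.$$
In general,
$$x_k = A x_{k-1} = \sum_{i=1}^{n} \lambda_i^k a_i \upsilon_i \quad (k \in \mathbb{Z}_+).$$
For $a_1 \ne 0$, since $|\lambda_i / \lambda_1| < 1$ $(i \ge 2)$, when $k$ is sufficiently large,
$$x_k = \lambda_1^k \left( a_1 \upsilon_1 + a_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k \upsilon_2 + \cdots + a_n \left(\frac{\lambda_n}{\lambda_1}\right)^k \upsilon_n \right) = \lambda_1^k (a_1 \upsilon_1 + \varepsilon_k),$$
where $\varepsilon_k = o(1)$, i.e.,
$$x_k \doteq \lambda_1^k a_1 \upsilon_1, \qquad x_{k+1} \doteq \lambda_1^{k+1} a_1 \upsilon_1.$$
Let $x_k = (x_{k1}, \dots, x_{kn})$ $(k \in \mathbb{Z}_+)$ and $\upsilon_1 = (\upsilon_{11}, \dots, \upsilon_{1n})$. Then $x_{ki} \doteq \lambda_1^k a_1 \upsilon_{1i}$ and $x_{k+1,i} \doteq \lambda_1^{k+1} a_1 \upsilon_{1i}$. This implies that, for any $i$ with $\upsilon_{1i} \ne 0$,
$$\frac{x_{k+1,i}}{x_{ki}} \doteq \frac{\lambda_1^{k+1} a_1 \upsilon_{1i}}{\lambda_1^k a_1 \upsilon_{1i}} = \lambda_1.$$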
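The component ratio can be computed as sketched below. Two details are additions for numerical safety rather than part of the derivation: the iterate is renormalized at every step to avoid overflow, and the ratio is taken at the component of largest magnitude. The name `power_method` and the test matrix are illustrative.

```python
import numpy as np


def power_method(A, x0, iters=200):
    """Estimate the eigenvalue of largest modulus from the ratio
    x_{k+1,i} / x_{k,i} of successive iterates."""
    x = np.asarray(x0, dtype=float)
    lam = 0.0
    for _ in range(iters):
        y = A @ x
        i = int(np.argmax(np.abs(x)))   # a component safely away from zero
        lam = y[i] / x[i]
        x = y / np.linalg.norm(y)       # rescaling leaves the ratio unchanged
    return lam, x


A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 3 and 1
lam, vec = power_method(A, [1.0, 0.0])
```

The error of the ratio decays like $|\lambda_2 / \lambda_1|^k$, so convergence is slow when the two largest eigenvalues are close in modulus.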

6.3.6 Iterative method of ordinary differential equations
Consider the initial value problem of the ordinary differential equation of order 1
$$\begin{cases} y' = f(x, y), \\ y(x_0) = y_0. \end{cases} \tag{6.3.3}$$
In the numerical solution of differential equations, we want to find the values of $y(x)$ on a sequence of points on the interval $[a, b]$
$$x_n = a + nh \quad (n = 0, 1, \dots, N),$$
where $h = \frac{b - a}{N}$ is called the step length. Denote $y(x_n) \doteq y_n$ $(n = 1, \dots, N)$.

(a) Euler method
The derivative may be represented by the difference quotient, i.e., $y'(x) \doteq \frac{1}{h}(y(x + h) - y(x))$. So
$$y'(x_n) \doteq \frac{y(x_n + h) - y(x_n)}{h} = \frac{y(x_{n+1}) - y(x_n)}{h} \doteq \frac{y_{n+1} - y_n}{h}.$$
By (6.3.3),
$$y'(x_n) = f(x_n, y(x_n)) \doteq f(x_n, y_n), \qquad y(x_0) = y_0.$$
These equalities give an iterative formula as follows:
$$y_0 = y(x_0), \qquad y_{n+1} \doteq y_n + h f(x_n, y_n) \quad (n = 0, 1, \dots, N - 1). \tag{6.3.4}$$
This iterative process gives the values of $y_n$ $(n = 0, 1, \dots, N)$. If $f(x, y)$ is smooth on $[a, b]$, then the solution of (6.3.3) is smooth. By the Taylor formula,
$$y(x_{n+1}) = y(x_n + h) = y(x_n) + h f(x_n, y(x_n)) + \frac{1}{2} h^2 y''(\xi) \doteq y_{n+1} + \frac{1}{2} h^2 y''(\xi),$$
where $x_n < \xi < x_{n+1}$. Denote $M = \max_{a \le x \le b} |y''(x)|$. The local truncation error is
$$|y(x_{n+1}) - y_{n+1}| \le \frac{M}{2} h^2 \quad (n = 0, 1, \dots, N - 1).$$
The accumulation of all the local truncation errors gives the global truncation error $O(h)$.
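Formula (6.3.4) and the $O(h)$ global error can be observed with a short sketch. The function name `euler` and the test problem $y' = y$, $y(0) = 1$ on $[0, 1]$ are illustrative choices.

```python
import math


def euler(f, x0, y0, b, N):
    """Euler's method (6.3.4) with N steps on [x0, b].
    Returns the lists of nodes x_n and approximations y_n."""
    h = (b - x0) / N
    xs, ys = [x0], [y0]
    for n in range(N):
        ys.append(ys[-1] + h * f(xs[-1], ys[-1]))
        xs.append(x0 + (n + 1) * h)
    return xs, ys


# y' = y, y(0) = 1 has the exact solution exp(x).
_, ys_100 = euler(lambda x, y: y, 0.0, 1.0, 1.0, 100)
_, ys_200 = euler(lambda x, y: y, 0.0, 1.0, 1.0, 200)
err_100 = abs(ys_100[-1] - math.e)
err_200 = abs(ys_200[-1] - math.e)
```

Halving the step length roughly halves the error at $x = 1$, which is the behavior expected of a first-order method.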

(b) Trapezoidal method
Integrating both sides of the differential equation in (6.3.3) from $x_n$ to $x_{n+1}$,
$$y(x_{n+1}) - y(x_n) = \int_{x_n}^{x_{n+1}} f(t, y(t))\,dt.$$
The integral on the right-hand side is approximated by the trapezoidal formula as follows:
$$\int_{x_n}^{x_{n+1}} f(t, y(t))\,dt \doteq \frac{h}{2} \left(f(x_n, y_n) + f(x_{n+1}, y_{n+1})\right).$$
Thus,
$$y_{n+1} = y_n + \frac{h}{2} \left(f(x_n, y_n) + f(x_{n+1}, y_{n+1})\right). \tag{6.3.5}$$
This is an implicit scheme, since $y_{n+1}$ appears on both sides. We use (6.3.4) in the Euler method to find the initial value
$$y_{n+1}^{(0)} \doteq y_n + h f(x_n, y_n),$$
and then use (6.3.5) to give the iterative formula
$$y_{n+1}^{(k+1)} = y_n + \frac{h}{2} \left(f(x_n, y_n) + f(x_{n+1}, y_{n+1}^{(k)})\right) \quad (k = 0, 1, \dots).$$
If $f$ satisfies a Lipschitz condition in $y$ with constant $M_1$, this implies that
$$|y_{n+1}^{(k+1)} - y_{n+1}^{(k)}| \le \frac{M_1 h}{2}\, |y_{n+1}^{(k)} - y_{n+1}^{(k-1)}|.$$
An argument similar to that of the fixed point principle implies that if $0 < \frac{M_1 h}{2} < 1$, then $\{y_{n+1}^{(k)}\}$ converges as $k \to \infty$. For the trapezoidal method, the global truncation error is $O(h^2)$.
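The predictor and the inner fixed-point iteration can be sketched as follows. The function name `trapezoidal`, the inner tolerance, and the test problem $y' = -y$ are illustrative assumptions.

```python
import math


def trapezoidal(f, x0, y0, b, N, inner_tol=1e-13, max_inner=100):
    """Trapezoidal method (6.3.5). Each implicit step starts from an
    Euler predictor and is solved by fixed-point iteration.
    Returns the approximation of y(b)."""
    h = (b - x0) / N
    x, y = x0, y0
    for n in range(N):
        x_next = x0 + (n + 1) * h
        fn = f(x, y)
        y_next = y + h * fn                      # Euler predictor y^(0)
        for _ in range(max_inner):
            y_new = y + 0.5 * h * (fn + f(x_next, y_next))
            converged = abs(y_new - y_next) < inner_tol
            y_next = y_new
            if converged:
                break
        x, y = x_next, y_next
    return y


# y' = -y, y(0) = 1 has the exact solution exp(-x).
err_50 = abs(trapezoidal(lambda x, y: -y, 0.0, 1.0, 1.0, 50) - math.exp(-1.0))
err_100 = abs(trapezoidal(lambda x, y: -y, 0.0, 1.0, 1.0, 100) - math.exp(-1.0))
```

Here the Lipschitz constant is $M_1 = 1$, so the inner iteration contracts by the factor $h/2$ per sweep, and halving $h$ reduces the error at $x = 1$ by about a factor of four.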


(c) System of ordinary differential equations
$$\begin{cases} y'(x) = f(x, y, z), & y(x_0) = y_0, \\ z'(x) = g(x, y, z), & z(x_0) = z_0. \end{cases}$$
Its Euler formula is
$$\begin{aligned}
y_{n+1} &= y_n + h f(x_n, y_n, z_n), & y(x_0) &= y_0, \\
z_{n+1} &= z_n + h g(x_n, y_n, z_n), & z(x_0) &= z_0.
\end{aligned}$$

6.4 Difference methods The difference method is a fundamental method for solving differential equations. In this method, the derivatives in a differential equation are replaced by difference quotients.

6.4.1 The difference method of ordinary differential equations
Consider an ordinary differential equation with the first boundary condition
$$\begin{cases} y'' - p(x) y = q(x), \quad q(x) > 0 & (a \le x \le b), \\ y(a) = \alpha, \quad y(b) = \beta. \end{cases} \tag{6.4.1}$$
Take a partition $x_k = a + kh$ $(k = 0, \dots, n)$, where $h = \frac{b - a}{n}$. Let $y = y(x)$ be the solution of the problem (6.4.1). We find the approximate value $y_i$ of $y(x_i)$ $(i = 0, \dots, n)$. For $i = 1, \dots, n - 1$, the second-order derivative $y''(x_i)$ is approximated by the second-order central difference quotient, i.e.,
$$y''(x_i) \doteq \frac{y_{i+1} - 2y_i + y_{i-1}}{h^2}.$$
From this and (6.4.1), the difference equation is given by
$$\frac{y_{i+1} - 2y_i + y_{i-1}}{h^2} - p_i y_i \doteq q_i \quad (i = 1, \dots, n - 1),$$
where $p_i = p(x_i)$ and $q_i = q(x_i)$. Combining this with the boundary conditions $y_0 = \alpha$ and $y_n = \beta$, the problem (6.4.1) is reduced to the following system of linear equations
$$\begin{cases} \dfrac{y_{i+1} - 2y_i + y_{i-1}}{h^2} - p_i y_i \doteq q_i & (i = 1, \dots, n - 1), \\[2mm] y_0 = \alpha, \quad y_n = \beta. \end{cases}$$
Solving the system of linear equations, the numerical solution $y_i$ $(i = 0, \dots, n)$ of the problem (6.4.1) is obtained.
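The linear system for the first boundary condition can be assembled and solved as sketched below. The function name `solve_bvp_fd` and the dense solver are illustrative simplifications; the system is tridiagonal, so a banded solver would be the natural choice for large $n$.

```python
import numpy as np


def solve_bvp_fd(p, q, a, b, alpha, beta, n):
    """Central-difference solution of y'' - p(x) y = q(x),
    y(a) = alpha, y(b) = beta on a uniform grid with n subintervals."""
    h = (b - a) / n
    x = a + h * np.arange(n + 1)
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0], rhs[0] = 1.0, alpha            # y_0 = alpha
    A[n, n], rhs[n] = 1.0, beta             # y_n = beta
    for i in range(1, n):
        A[i, i - 1] = 1.0 / h**2
        A[i, i] = -2.0 / h**2 - p(x[i])
        A[i, i + 1] = 1.0 / h**2
        rhs[i] = q(x[i])
    return x, np.linalg.solve(A, rhs)


# y = x**2 solves y'' - y = 2 - x**2 with y(0) = 0, y(1) = 1.
x_fd, y_fd = solve_bvp_fd(lambda x: 1.0, lambda x: 2.0 - x**2,
                          0.0, 1.0, 0.0, 1.0, 10)
```

The central difference quotient is exact for quadratics, so in this example the discrete solution agrees with $x^2$ at the nodes up to rounding.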


Consider an ordinary differential equation with the second boundary condition
$$\begin{cases} y'' - p(x) y = q(x), \quad q(x) > 0 & (a \le x \le b), \\ y'(a) = \alpha, \quad y'(b) = \beta. \end{cases}$$
Using the three-point form of the numerical differentiation formula (see Section 6.2)
$$y_0' \doteq \frac{-y_2 + 4y_1 - 3y_0}{2h}, \qquad y_n' \doteq \frac{3y_n - 4y_{n-1} + y_{n-2}}{2h},$$
the corresponding system of linear equations is given by
$$\begin{cases} \dfrac{y_{i+1} - 2y_i + y_{i-1}}{h^2} - p_i y_i \doteq q_i & (i = 1, \dots, n - 1), \\[2mm] \dfrac{-y_2 + 4y_1 - 3y_0}{2h} \doteq \alpha, \quad \dfrac{3y_n - 4y_{n-1} + y_{n-2}}{2h} \doteq \beta. \end{cases}$$
Solving the system of linear equations, the numerical solution is obtained.

Consider an ordinary differential equation with the third boundary condition
$$\begin{cases} y'' - p(x) y = q(x), \quad q(x) > 0 & (a \le x \le b), \\ y'(a) - \alpha_0 y(a) = \alpha_1, \quad y'(b) + \beta_0 y(b) = \beta_1, \end{cases}$$
where $\alpha_0 \ge 0$, $\beta_0 \ge 0$, and $\alpha_0 + \beta_0 > 0$. The corresponding system of linear equations is given by
$$\begin{cases} \dfrac{y_{i+1} - 2y_i + y_{i-1}}{h^2} - p_i y_i \doteq q_i & (i = 1, \dots, n - 1), \\[2mm] \dfrac{-y_2 + 4y_1 - 3y_0}{2h} - \alpha_0 y_0 \doteq \alpha_1, \quad \dfrac{3y_n - 4y_{n-1} + y_{n-2}}{2h} + \beta_0 y_n \doteq \beta_1. \end{cases}$$
Solving the system of linear equations, the numerical solution is obtained.

6.4.2 The difference method of elliptic equations
Consider an elliptic equation with the first boundary condition
$$\begin{cases} \Delta u := \dfrac{\partial^2 u}{\partial x^2} + \dfrac{\partial^2 u}{\partial y^2} = 0 & ((x, y) \in G), \\[2mm] u(x, y) = \varphi(x, y) & ((x, y) \in \Gamma), \end{cases} \tag{6.4.2}$$
where $G$ is a bounded domain and $\Gamma$ is the boundary of $G$. Let $x_i = ih$, $y_j = j\tau$ $(i, j \in \mathbb{Z})$, where $h > 0$, $\tau > 0$. The point $(ih, j\tau)$ is called a grid point, denoted by $(i, j)$. If $(i, j) \in G$ and all four neighboring points $(i - 1, j)$, $(i + 1, j)$, $(i, j - 1)$, and $(i, j + 1)$ belong to $G \cup \Gamma$, then $(i, j)$ is called an interior point. If $(i, j) \in G$ and there is a neighboring point that does not belong to $G$, then $(i, j)$ is called a boundary point.

Replacing the second-order partial derivatives in (6.4.2) by the second-order central difference quotients, we obtain the corresponding difference equation at each interior point
$$\frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h^2} + \frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{\tau^2} \doteq 0,$$
where $u_{i,j} = u(x_i, y_j)$. This equation is called a five-point scheme.

Consider the simplest case, in which $h = \tau$, $G$ is a rectangle
$$G : \{ 0 \le x \le L,\ 0 \le y \le M \},$$
and all boundary nodes lie on the boundary. Let $l = [L/h]$ and $m = [M/h]$. The boundary condition becomes
$$u_{i,j} = \varphi(ih, jh) \quad ((i = 0, l;\ j = 0, \dots, m) \text{ or } (i = 0, \dots, l;\ j = 0, m)).$$
So the problem (6.4.2) is reduced to the following system of linear equations in the $(l + 1)(m + 1)$ unknowns $u_{i,j}$:
$$\begin{cases} u_{i,j} = \frac{1}{4}(u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1}) & (i = 1, \dots, l - 1;\ j = 1, \dots, m - 1), \\ u_{i,j} = \varphi(ih, jh) & ((i = 0, l;\ j = 0, \dots, m) \text{ or } (i = 0, \dots, l;\ j = 0, m)). \end{cases}$$
It can be proved by the extremum principle that this system of difference equations has a unique solution. One may use either a direct method or an iterative method to solve this system of linear equations.
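The five-point system on a rectangle lends itself to a simple iterative solution. The sketch below uses Jacobi sweeps, one of several possible choices; the function name `solve_laplace`, the use of `round` in place of the integer part $[L/h]$ (assuming $L/h$ and $M/h$ are integers), and the harmonic test function are assumptions.

```python
import numpy as np


def solve_laplace(phi, L, M, h, eps=1e-10, max_iter=100000):
    """Five-point scheme for Laplace's equation on [0, L] x [0, M]
    with boundary values phi, solved by Jacobi iteration."""
    l, m = round(L / h), round(M / h)
    X, Y = np.meshgrid(h * np.arange(l + 1), h * np.arange(m + 1),
                       indexing="ij")
    u = np.zeros((l + 1, m + 1))
    u[0, :], u[l, :] = phi(X[0, :], Y[0, :]), phi(X[l, :], Y[l, :])
    u[:, 0], u[:, m] = phi(X[:, 0], Y[:, 0]), phi(X[:, m], Y[:, m])
    for _ in range(max_iter):
        new = u.copy()
        # Each interior value becomes the mean of its four neighbors.
        new[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1]
                                  + u[1:-1, 2:] + u[1:-1, :-2])
        if np.max(np.abs(new - u)) < eps:
            return X, Y, new
        u = new
    raise RuntimeError("Jacobi iteration did not converge")


# u = x**2 - y**2 is harmonic, and the five-point scheme reproduces it.
X, Y, U = solve_laplace(lambda x, y: x**2 - y**2, 1.0, 1.0, 0.1)
```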

6.4.3 The difference method of parabolic equations
Consider a parabolic equation with an initial condition and boundary conditions
$$\begin{cases} Lu := \dfrac{\partial u}{\partial t} - \dfrac{\partial^2 u}{\partial x^2} = 0 & (0 < x < 1,\ 0 < t \le T), \\[2mm] u(x, 0) = \varphi(x) & (0 \le x \le 1), \\ u(0, t) = \mu_1(t), \quad u(1, t) = \mu_2(t) & (0 \le t \le T), \end{cases} \tag{6.4.3}$$
where $\varphi(0) = \mu_1(0)$ and $\varphi(1) = \mu_2(0)$. Establish the following grid points $(x_k, t_j)$, where
$$x_k = kh \quad (k = 0, \dots, N), \qquad t_j = j\tau \quad (j = 0, \dots, m_0), \qquad Nh = 1, \quad m_0 = [T/\tau].$$
Using the numerical differentiation formulas, we obtain
$$\frac{\partial u}{\partial t}(x_k, t_j) \doteq \frac{u(x_k, t_{j+1}) - u(x_k, t_j)}{\tau}, \qquad
\frac{\partial^2 u}{\partial x^2}(x_k, t_j) \doteq \frac{u(x_{k+1}, t_j) - 2u(x_k, t_j) + u(x_{k-1}, t_j)}{h^2}.$$
By (6.4.3), it follows that
$$\frac{u_{k,j+1} - u_{k,j}}{\tau} - \frac{u_{k+1,j} - 2u_{k,j} + u_{k-1,j}}{h^2} \doteq 0,$$
where $u_{k,j} = u(x_k, t_j)$. Denoting $r = \tau/h^2$, this equality becomes
$$u_{k,j+1} \doteq (1 - 2r)\, u_{k,j} + r(u_{k+1,j} + u_{k-1,j}) \quad (k = 1, \dots, N - 1;\ j = 0, \dots, m_0 - 1).$$
The corresponding initial condition and boundary conditions are
$$u_{k,0} \doteq \varphi(kh) \quad (k = 0, \dots, N), \qquad u_{0,j} \doteq \mu_1(j\tau), \quad u_{N,j} \doteq \mu_2(j\tau) \quad (j = 0, \dots, m_0).$$
From this, the problem (6.4.3) is reduced to the following system of linear equations:
$$\begin{cases} u_{k,j+1} \doteq (1 - 2r)\, u_{k,j} + r(u_{k+1,j} + u_{k-1,j}) & (k = 1, \dots, N - 1;\ j = 0, \dots, m_0 - 1), \\ u_{k,0} \doteq \varphi(kh) & (k = 0, \dots, N), \\ u_{0,j} \doteq \mu_1(j\tau), \quad u_{N,j} \doteq \mu_2(j\tau) & (j = 0, \dots, m_0). \end{cases}$$
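The explicit scheme advances one time level at a time, as the sketch below shows. The function name `heat_explicit`, the use of `round` for $m_0$ (assuming $T/\tau$ is an integer), and the test data are assumptions. The restriction $r \le 1/2$ is the standard stability condition for this scheme; it is not stated in the text above.

```python
import numpy as np


def heat_explicit(phi, mu1, mu2, N, tau, T):
    """Explicit scheme for u_t = u_xx on 0 < x < 1 with u(x, 0) = phi(x),
    u(0, t) = mu1(t), u(1, t) = mu2(t). Returns the grid, the values at
    the final time level, and that time."""
    h = 1.0 / N
    r = tau / h**2                  # the scheme is stable for r <= 1/2
    m0 = int(round(T / tau))
    x = h * np.arange(N + 1)
    u = np.asarray(phi(x), dtype=float)
    for j in range(m0):
        new = np.empty_like(u)
        new[1:-1] = (1.0 - 2.0 * r) * u[1:-1] + r * (u[2:] + u[:-2])
        new[0], new[-1] = mu1((j + 1) * tau), mu2((j + 1) * tau)
        u = new
    return x, u, m0 * tau


# Exact solution: exp(-pi**2 * t) * sin(pi * x).
x_h, u_h, t_end = heat_explicit(lambda x: np.sin(np.pi * x),
                                lambda t: 0.0, lambda t: 0.0,
                                N=20, tau=0.001, T=0.1)
exact_h = np.exp(-np.pi**2 * t_end) * np.sin(np.pi * x_h)
```

With $N = 20$ and $\tau = 0.001$ the ratio is $r = 0.4$, inside the stable range.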

6.4.4 The difference method of hyperbolic equations
Consider a hyperbolic equation problem with initial and boundary conditions
$$\begin{cases} \dfrac{\partial^2 u}{\partial x^2} - \dfrac{\partial^2 u}{\partial t^2} = f(x, t) & (0 < x < 1,\ 0 < t \le T), \\[2mm] u(x, 0) = \varphi(x), \quad \dfrac{\partial u}{\partial t}(x, 0) = \psi(x) & (0 \le x \le 1), \\[2mm] u(0, t) = \Phi_0(t), \quad u(1, t) = \Phi_1(t) & (0 \le t \le T). \end{cases} \tag{6.4.4}$$
Take grid points $(x_k, t_j)$, where
$$x_k = kh \quad (k = 0, \dots, N), \qquad t_j = j\tau \quad (j = 0, \dots, m_0),$$
where $h = \frac{1}{N}$ and $m_0 = [T/\tau]$. Using the numerical differentiation formulas, we get
$$\frac{\partial^2 u}{\partial x^2}(x_k, t_j) \doteq \frac{u(x_{k+1}, t_j) - 2u(x_k, t_j) + u(x_{k-1}, t_j)}{h^2}, \qquad
\frac{\partial^2 u}{\partial t^2}(x_k, t_j) \doteq \frac{u(x_k, t_{j+1}) - 2u(x_k, t_j) + u(x_k, t_{j-1})}{\tau^2}.$$
By (6.4.4), it follows that
$$\frac{u_{k+1,j} - 2u_{k,j} + u_{k-1,j}}{h^2} - \frac{u_{k,j+1} - 2u_{k,j} + u_{k,j-1}}{\tau^2} \doteq f_{k,j},$$
where $u(x_k, t_j) = u_{k,j}$ and $f(x_k, t_j) = f_{k,j}$. Denote $s = \tau/h$. This equation becomes
$$u_{k,j+1} \doteq s^2 (u_{k+1,j} + u_{k-1,j}) + 2(1 - s^2)\, u_{k,j} - u_{k,j-1} - s^2 h^2 f_{k,j} \quad (k = 1, \dots, N - 1;\ j = 1, \dots, m_0 - 1).$$
So the problem (6.4.4) is reduced to the following system of linear equations
$$\begin{cases} u_{k,j+1} \doteq s^2 (u_{k+1,j} + u_{k-1,j}) + 2(1 - s^2)\, u_{k,j} - u_{k,j-1} - s^2 h^2 f_{k,j} & (k = 1, \dots, N - 1;\ j = 1, \dots, m_0 - 1), \\ u_{k,0} \doteq \varphi_k, \quad u_{k,1} \doteq \varphi_k + \tau \psi_k & (k = 0, \dots, N), \\ u_{0,j} \doteq \Phi_0(j\tau) = \Phi_{0j}, \quad u_{N,j} \doteq \Phi_1(j\tau) = \Phi_{1j} & (j = 0, \dots, m_0), \end{cases}$$
where $\varphi_k = \varphi(kh)$ and $\psi_k = \psi(kh)$. This is an explicit difference scheme: the values on each time level are computed directly from the two preceding levels, which gives the numerical solution.
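The explicit three-level scheme can be sketched as follows. The function name `wave_explicit`, the use of `round` for $m_0$ (assuming $T/\tau$ is an integer and $m_0 \ge 1$), and the test data are assumptions. The restriction $s \le 1$ is the standard stability condition for this scheme and is not stated in the text above.

```python
import numpy as np


def wave_explicit(f, phi, psi, Phi0, Phi1, N, tau, T):
    """Explicit scheme for u_xx - u_tt = f(x, t) on 0 < x < 1.
    Returns the grid, the values at the final time level, and that time."""
    h = 1.0 / N
    s = tau / h                               # stable for s <= 1
    m0 = int(round(T / tau))
    x = h * np.arange(N + 1)
    prev = np.asarray(phi(x), dtype=float)    # level j = 0
    curr = prev + tau * psi(x)                # level j = 1 (first order)
    curr[0], curr[-1] = Phi0(tau), Phi1(tau)
    for j in range(1, m0):
        nxt = np.empty_like(curr)
        nxt[1:-1] = (s**2 * (curr[2:] + curr[:-2])
                     + 2.0 * (1.0 - s**2) * curr[1:-1]
                     - prev[1:-1]
                     - s**2 * h**2 * f(x[1:-1], j * tau))
        nxt[0], nxt[-1] = Phi0((j + 1) * tau), Phi1((j + 1) * tau)
        prev, curr = curr, nxt
    return x, curr, m0 * tau


# f = 0, phi = sin(pi x), psi = 0: exact solution cos(pi t) sin(pi x).
x_w, u_w, t_w = wave_explicit(lambda x, t: np.zeros_like(x),
                              lambda x: np.sin(np.pi * x),
                              lambda x: np.zeros_like(x),
                              lambda t: 0.0, lambda t: 0.0,
                              N=50, tau=0.01, T=0.5)
exact_w = np.cos(np.pi * t_w) * np.sin(np.pi * x_w)
```

The starting value $u_{k,1} = \varphi_k + \tau \psi_k$ is only first-order accurate in $\tau$, so it dominates the error in this example.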

6.5 Finite element methods
The finite element method is another numerical method for solving differential equations. Its basic idea is to transform a differential equation into an integral equation, partition the domain of integration into finitely many subdomains (e.g. by triangulation), construct on each subdomain an interpolation polynomial of the generalized solution with nodes (e.g. the vertices of a triangle), and finally assemble the contributions of all subdomains into a system of linear equations.

6.5.1 The one-dimensional finite element method
Consider the boundary value problem of a second-order ordinary differential equation
$$\begin{cases} -(p(x) u'(x))' + q(x) u(x) = f(x) & (0 \le x \le b), \\ u(0) = 0, \quad p(b) u'(b) + \alpha u(b) = g, \end{cases} \tag{6.5.1}$$
where $p(x)$, $q(x)$ are both continuous functions on $[0, b]$, $p(x) \ge p_0 > 0$, $q(x) > 0$ $(0 \le x \le b)$, $\alpha > 0$, and $g$ is a constant. Its solution in the space $C^1([0, b]) \cap C^2((0, b))$ is called the classical solution.

(a) Generalized solution
We want to find the generalized solution of the problem (6.5.1) in the space $C^1([0, b])$. Let $\varphi(x)$ be a continuously differentiable function on $[0, b]$ satisfying the boundary condition $\varphi(0) = 0$. Multiplying both sides of the differential equation in (6.5.1) by $\varphi(x)$, and then integrating both sides from 0 to $b$, we get
$$\int_0^b \left(-(p(x) u'(x))' + q(x) u(x)\right) \varphi(x)\,dx = \int_0^b f(x) \varphi(x)\,dx. \tag{6.5.2}$$
Note that $\varphi(0) = 0$ and the boundary condition $p(b) u'(b) + \alpha u(b) = g$. Integration by parts gives
$$\begin{aligned}
\int_0^b (p(x) u'(x))' \varphi(x)\,dx &= p(b) u'(b) \varphi(b) - \int_0^b p(x) u'(x) \varphi'(x)\,dx \\
&= (g - \alpha u(b)) \varphi(b) - \int_0^b p(x) u'(x) \varphi'(x)\,dx.
\end{aligned}$$
From this and (6.5.2), it follows that
$$\int_0^b \left(p(x) \varphi'(x) u'(x) + q(x) \varphi(x) u(x)\right) dx + \alpha u(b) \varphi(b) = \int_0^b f(x) \varphi(x)\,dx + g \varphi(b). \tag{6.5.3}$$
If $u \in C^1([0, b])$ satisfies $u(0) = 0$ and $\int_0^b \left((u'(x))^2 + (u(x))^2\right) dx < \infty$, and (6.5.3) holds for any $\varphi \in C^1([0, b])$ satisfying $\varphi(0) = 0$, then the function $u(x)$ is called a generalized solution of the problem (6.5.1).

(b) Discretization
Take a partition of the interval $[0, b]$: $0 = x_0 < x_1 < \cdots < x_n = b$. For $i = 0, \dots, n - 1$, denote by $\eta_i$ the subinterval $[x_i, x_{i+1}]$ and denote by $u_i$ the value of $u(x)$ at the node $x_i$. From $[0, b] = \bigcup_{i=0}^{n-1} [x_i, x_{i+1}] = \bigcup_{i=0}^{n-1} \eta_i$, it follows by (6.5.3) that
$$\sum_{i=0}^{n-1} U_i + \alpha u(b) \varphi(b) = \sum_{i=0}^{n-1} V_i + g \varphi(b), \tag{6.5.4}$$
where
$$U_i = \int_{\eta_i} \left(p(x) \varphi'(x) u'(x) + q(x) \varphi(x) u(x)\right) dx, \qquad V_i = \int_{\eta_i} f(x) \varphi(x)\,dx.$$

(c) Interpolation function
Introduce the linear interpolations of $u(x)$ and $\varphi(x)$ on $\eta_i$ $(i = 0, \dots, n - 1)$ as follows:
$$u(x) \doteq A_i(x) u_i + B_i(x) u_{i+1}, \qquad \varphi(x) \doteq A_i(x) \varphi_i + B_i(x) \varphi_{i+1} \quad (x \in \eta_i),$$
where $A_i(x) = \frac{x_{i+1} - x}{M_i}$, $B_i(x) = \frac{x - x_i}{M_i}$, and $M_i = x_{i+1} - x_i$, and
$$u'(x) \doteq A_i'(x) u_i + B_i'(x) u_{i+1}, \qquad \varphi'(x) \doteq A_i'(x) \varphi_i + B_i'(x) \varphi_{i+1} \quad (x \in \eta_i),$$
where $A_i'(x) = -\frac{1}{M_i}$ and $B_i'(x) = \frac{1}{M_i}$.


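(d) Unit stiffness matrix and unit carrier vector
Introduce two 2-dimensional vectors $J_i$, $J_i^*$ and two $1 \times 2$ matrices $K_i(x)$, $H_i(x)$ as follows:
$$J_i = (u_i, u_{i+1})^T, \quad J_i^* = (\varphi_i, \varphi_{i+1})^T, \quad K_i(x) = (A_i(x), B_i(x)), \quad H_i(x) = (A_i'(x), B_i'(x)).$$
So, on $\eta_i$,
$$u(x) = K_i J_i, \quad \varphi(x) = K_i J_i^*, \quad u'(x) = H_i J_i, \quad \varphi'(x) = H_i J_i^* \quad (i = 0, \dots, n - 1). \tag{6.5.5}$$
Note that
$$H_i J_i^* = (H_i J_i^*)^T = (J_i^*)^T H_i^T, \qquad K_i J_i^* = (K_i J_i^*)^T = (J_i^*)^T K_i^T.$$
So
$$\varphi'(x) u'(x) = (J_i^*)^T H_i^T H_i J_i, \qquad \varphi(x) u(x) = (J_i^*)^T K_i^T K_i J_i$$
and
$$\begin{aligned}
U_i &= \int_{\eta_i} \left(p(x) \varphi'(x) u'(x) + q(x) \varphi(x) u(x)\right) dx \\
&\doteq \int_{\eta_i} p(x) (J_i^*)^T H_i^T H_i J_i\,dx + \int_{\eta_i} q(x) (J_i^*)^T K_i^T K_i J_i\,dx.
\end{aligned}$$
Since $J_i$ and $J_i^*$ are independent of $x$,
$$U_i \doteq (J_i^*)^T \left( \int_{\eta_i} p(x) H_i^T H_i\,dx + \int_{\eta_i} q(x) K_i^T K_i\,dx \right) J_i.$$
Let $G^{\eta_i} = \int_{\eta_i} \left(p(x) H_i^T H_i + q(x) K_i^T K_i\right) dx$ $(i = 0, \dots, n - 1)$. Then
$$U_i \doteq (J_i^*)^T G^{\eta_i} J_i \quad (i = 0, \dots, n - 1).$$
Note that
$$H_i^T H_i = \begin{pmatrix} \dfrac{1}{M_i^2} & -\dfrac{1}{M_i^2} \\[2mm] -\dfrac{1}{M_i^2} & \dfrac{1}{M_i^2} \end{pmatrix}, \qquad
K_i^T(x) K_i(x) = \begin{pmatrix} A_i^2(x) & A_i(x) B_i(x) \\ A_i(x) B_i(x) & B_i^2(x) \end{pmatrix}.$$
For $i = 0, \dots, n - 1$,
$$\begin{aligned}
G^{\eta_i} &= \int_{\eta_i} \left(p(x) H_i^T H_i + q(x) K_i^T(x) K_i(x)\right) dx \\
&= \begin{pmatrix} \dfrac{1}{M_i^2} & -\dfrac{1}{M_i^2} \\[2mm] -\dfrac{1}{M_i^2} & \dfrac{1}{M_i^2} \end{pmatrix} p_i + \int_{\eta_i} q(x) \begin{pmatrix} A_i^2(x) & A_i(x) B_i(x) \\ A_i(x) B_i(x) & B_i^2(x) \end{pmatrix} dx
= \begin{pmatrix} G_{i,i}^{\eta_i} & G_{i,i+1}^{\eta_i} \\ G_{i+1,i}^{\eta_i} & G_{i+1,i+1}^{\eta_i} \end{pmatrix},
\end{aligned} \tag{6.5.6}$$
where $p_i = \int_{\eta_i} p(x)\,dx$ and
$$\begin{aligned}
G_{i,i}^{\eta_i} &= \frac{p_i}{M_i^2} + \int_{\eta_i} q(x) A_i^2(x)\,dx, \\
G_{i,i+1}^{\eta_i} = G_{i+1,i}^{\eta_i} &= -\frac{p_i}{M_i^2} + \int_{\eta_i} q(x) A_i(x) B_i(x)\,dx, \\
G_{i+1,i+1}^{\eta_i} &= \frac{p_i}{M_i^2} + \int_{\eta_i} q(x) B_i^2(x)\,dx \quad (i = 0, \dots, n - 1).
\end{aligned}$$
The matrix
$$\begin{pmatrix} G_{i,i}^{\eta_i} & G_{i,i+1}^{\eta_i} \\ G_{i+1,i}^{\eta_i} & G_{i+1,i+1}^{\eta_i} \end{pmatrix}$$
is called the unit stiffness matrix. It is a positive definite and symmetric matrix. Substituting (6.5.6) into (6.5.4),
$$\sum_{i=0}^{n-1} (J_i^*)^T G^{\eta_i} J_i + \alpha u(b) \varphi(b) = \sum_{i=0}^{n-1} V_i + g \varphi(b). \tag{6.5.7}$$
Since $u(b) = u_n$ and $\varphi(b) = \varphi_n$,
$$(J_{n-1}^*)^T \begin{pmatrix} 0 & 0 \\ 0 & \alpha \end{pmatrix} J_{n-1} = (\varphi_{n-1}, \varphi_n) \begin{pmatrix} 0 & 0 \\ 0 & \alpha \end{pmatrix} \begin{pmatrix} u_{n-1} \\ u_n \end{pmatrix} = \alpha \varphi_n u_n = \alpha u(b) \varphi(b),$$
$$(J_{n-1}^*)^T \begin{pmatrix} 0 \\ g \end{pmatrix} = (\varphi_{n-1}, \varphi_n) \begin{pmatrix} 0 \\ g \end{pmatrix} = g \varphi_n = g \varphi(b).$$
From this and (6.5.7), it follows that
$$\sum_{i=0}^{n-1} (J_i^*)^T G^{\eta_i} J_i + (J_{n-1}^*)^T \begin{pmatrix} 0 & 0 \\ 0 & \alpha \end{pmatrix} J_{n-1} = \sum_{i=0}^{n-1} V_i + (J_{n-1}^*)^T \begin{pmatrix} 0 \\ g \end{pmatrix}. \tag{6.5.8}$$
Since $J_i^*$ is independent of $x$, by (6.5.5), we get
$$V_i = \int_{\eta_i} f(x) \varphi(x)\,dx \doteq \int_{\eta_i} (J_i^*)^T K_i^T(x) f(x)\,dx = (J_i^*)^T \int_{\eta_i} \begin{pmatrix} A_i(x) \\ B_i(x) \end{pmatrix} f(x)\,dx = (J_i^*)^T F^{e_i},$$
where
$$F^{e_i} = \left( \int_{\eta_i} A_i(x) f(x)\,dx,\ \int_{\eta_i} B_i(x) f(x)\,dx \right)^T.$$
The vector $F^{e_i}$ is called a unit carrier vector. By (6.5.8), it follows that
$$\sum_{i=0}^{n-1} (J_i^*)^T G^{\eta_i} J_i + (J_{n-1}^*)^T \begin{pmatrix} 0 & 0 \\ 0 & \alpha \end{pmatrix} J_{n-1} = \sum_{i=0}^{n-1} (J_i^*)^T F^{e_i} + (J_{n-1}^*)^T \begin{pmatrix} 0 \\ g \end{pmatrix}. \tag{6.5.9}$$
Let
$$\tilde{G}^{\eta_i} = G^{\eta_i}, \quad \tilde{F}^{e_i} = F^{e_i} \quad (i = 0, 1, \dots, n - 2),$$
$$\tilde{G}^{\eta_{n-1}} = G^{\eta_{n-1}} + \begin{pmatrix} 0 & 0 \\ 0 & \alpha \end{pmatrix}, \qquad \tilde{F}^{e_{n-1}} = F^{e_{n-1}} + \begin{pmatrix} 0 \\ g \end{pmatrix}.$$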


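Then (6.5.9) becomes
$$\sum_{i=0}^{n-1} (J_i^*)^T \tilde{G}^{\eta_i} J_i = \sum_{i=0}^{n-1} (J_i^*)^T \tilde{F}^{e_i}. \tag{6.5.10}$$

(e) Global stiffness matrix and global carrier vector
Let $J = (u_0, u_1, \dots, u_n)^T$ and $J^* = (\varphi_0, \varphi_1, \dots, \varphi_n)^T$. Extend each $\tilde{F}^{e_i}$ to an $(n + 1)$-dimensional vector $\hat{F}^{e_i} = (\alpha_0, \alpha_1, \dots, \alpha_n)^T$, where $\alpha_i$ and $\alpha_{i+1}$ are the two components of $\tilde{F}^{e_i}$, i.e.,
$$\alpha_i = \int_{\eta_i} A_i(x) f(x)\,dx, \qquad \alpha_{i+1} = \int_{\eta_i} B_i(x) f(x)\,dx \ \ (\text{plus } g \text{ when } i = n - 1), \qquad \alpha_\nu = 0 \quad (\nu \ne i, i + 1),$$
and extend each $\tilde{G}^{\eta_i}$ to an $(n + 1) \times (n + 1)$ matrix $\hat{G}^{\eta_i} = (\beta_{\mu,\nu})_{\mu,\nu = 0, \dots, n}$ $(i = 0, \dots, n - 1)$, where
$$\begin{cases}
\beta_{i,i} = \tilde{G}_{i,i}^{\eta_i}, \\
\beta_{i,i+1} = \beta_{i+1,i} = \tilde{G}_{i,i+1}^{\eta_i}, \\
\beta_{i+1,i+1} = \tilde{G}_{i+1,i+1}^{\eta_i} \quad (\text{so that } \beta_{n,n} = G_{n,n}^{\eta_{n-1}} + \alpha \text{ when } i = n - 1), \\
\beta_{\mu,\nu} = 0 \quad (\text{otherwise}).
\end{cases}$$
By (6.5.10),
$$(J^*)^T \left( \sum_{i=0}^{n-1} \hat{G}^{\eta_i} \right) J = (J^*)^T \left( \sum_{i=0}^{n-1} \hat{F}^{e_i} \right).$$
Let $G = \sum_{i=0}^{n-1} \hat{G}^{\eta_i}$ and $F = \sum_{i=0}^{n-1} \hat{F}^{e_i}$. Then
$$(J^*)^T (GJ - F) = 0, \tag{6.5.11}$$
where $G$ is called a global stiffness matrix and $F$ is called a global carrier vector. The matrix $G$ is a positive definite and symmetric matrix.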

(f) Constraint condition
Write $F = (F_0, F_1, \dots, F_n)^T$ and $G = (\beta_{\mu,\nu})_{\mu,\nu = 0, \dots, n}$. Define $\tilde{F} = (0, F_1, \dots, F_n)^T$, $\tilde{J} = (0, u_1, \dots, u_n)^T$, and $\tilde{J}^* = (0, \varphi_1, \dots, \varphi_n)^T$, and
$$\tilde{G} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & \beta_{11} & \cdots & \beta_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \beta_{n1} & \cdots & \beta_{nn} \end{pmatrix}.$$
From (6.5.11), $\varphi(0) = 0$, and $u(0) = 0$, it follows that
$$(\tilde{J}^*)^T (\tilde{G} \tilde{J} - \tilde{F}) = 0.$$
This equality holds for any function $\varphi$ satisfying $\varphi \in C^1([0, b])$ and $\varphi(0) = 0$. So, deleting $(\tilde{J}^*)^T$, we get
$$\tilde{G} \tilde{J} = \tilde{F}. \tag{6.5.12}$$
Since $\tilde{G}$ is a positive definite and symmetric matrix, the equation (6.5.12) has a unique solution $(0, u_1, \dots, u_n)$. The values $u_1, \dots, u_n$ are approximate values of the solution of the problem (6.5.1) at the nodes $x_1, \dots, x_n$.
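Steps (b) to (f) can be condensed into a short assembly loop, sketched below under several assumptions: a uniform partition, two-point Gauss quadrature for the element integrals, the approximation $p_i = \int_{\eta_i} p\,dx$ used in (6.5.6), and a hypothetical function name `fem_1d`. The coefficient functions must accept NumPy arrays.

```python
import numpy as np


def fem_1d(p, q, f, b, alpha, g, n):
    """Linear finite elements for -(p u')' + q u = f on [0, b] with
    u(0) = 0 and p(b) u'(b) + alpha u(b) = g, on n uniform elements."""
    x = np.linspace(0.0, b, n + 1)
    G = np.zeros((n + 1, n + 1))          # global stiffness matrix
    F = np.zeros(n + 1)                   # global carrier vector
    gauss = np.array([-1.0, 1.0]) / np.sqrt(3.0)
    for i in range(n):
        M = x[i + 1] - x[i]
        t = 0.5 * (x[i] + x[i + 1]) + 0.5 * M * gauss   # quadrature points
        w = 0.5 * M                                     # quadrature weight
        K = np.vstack([(x[i + 1] - t) / M, (t - x[i]) / M])  # rows A_i, B_i
        H = np.array([-1.0, 1.0]) / M                   # (A_i', B_i')
        p_i = w * np.sum(p(t))
        G[i:i + 2, i:i + 2] += p_i * np.outer(H, H) + w * (K * q(t)) @ K.T
        F[i:i + 2] += w * K @ f(t)
    G[n, n] += alpha                      # boundary term alpha u(b) phi(b)
    F[n] += g                             # boundary term g phi(b)
    # Constraint u(0) = 0: replace the first row and column as in step (f).
    G[0, :] = 0.0
    G[:, 0] = 0.0
    G[0, 0] = 1.0
    F[0] = 0.0
    return x, np.linalg.solve(G, F)


# -u'' = 1, u(0) = 0, u'(1) + u(1) = 1 has the solution 1.25 x - x**2 / 2.
# q = 0 is used here only to obtain a closed-form reference solution.
x_fe, u_fe = fem_1d(lambda t: np.ones_like(t), lambda t: np.zeros_like(t),
                    lambda t: np.ones_like(t), b=1.0, alpha=1.0, g=1.0, n=8)
```

For this constant-coefficient example with $q = 0$, the nodal values of the linear finite element solution coincide with the exact solution up to rounding, which makes it a convenient check of the assembly.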

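6.5.2 The two-dimensional finite element method
Consider the two-dimensional boundary value problem of the Poisson equation
$$\begin{cases} -\Delta u = f(x, y) & ((x, y) \in \Omega), \\ u|_{\partial\Omega} = u_0(x, y), \end{cases} \tag{6.5.13}$$
where $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$ is the Laplace operator and $\Omega$ is a bounded domain in the plane.

(a) Generalized solution
Take $\varphi(x, y) \in C^2(\Omega) \cap C^1(\bar{\Omega})$ with $\varphi(x, y) = 0$ on $\partial\Omega$. Multiplying both sides of the Poisson equation in (6.5.13) by $\varphi(x, y)$, and then integrating both sides over $\Omega$, we get
$$-\iint_\Omega (\Delta u) \varphi\,dx\,dy = \iint_\Omega f \varphi\,dx\,dy.$$
By the Green formula and $\varphi(x, y) = 0$ on $\partial\Omega$, the left-hand side is
$$\begin{aligned}
-\iint_\Omega (\Delta u) \varphi\,dx\,dy &= \iint_\Omega \left( \frac{\partial u}{\partial x} \frac{\partial \varphi}{\partial x} + \frac{\partial u}{\partial y} \frac{\partial \varphi}{\partial y} \right) dx\,dy - \oint_{\partial\Omega} \varphi \frac{\partial u}{\partial n}\,dS \\
&= \iint_\Omega \left( \frac{\partial u}{\partial x} \frac{\partial \varphi}{\partial x} + \frac{\partial u}{\partial y} \frac{\partial \varphi}{\partial y} \right) dx\,dy.
\end{aligned}$$
So
$$\iint_\Omega \left( \frac{\partial u}{\partial x} \frac{\partial \varphi}{\partial x} + \frac{\partial u}{\partial y} \frac{\partial \varphi}{\partial y} \right) dx\,dy = \iint_\Omega f \varphi\,dx\,dy. \tag{6.5.14}$$
If $u(x, y)$ is such that $\iint_\Omega (u^2 + |\nabla u|^2)\,dx\,dy < \infty$ and (6.5.14) holds for any $\varphi \in C^1(\bar{\Omega})$ satisfying $\varphi(x, y) = 0$ on $\partial\Omega$, then $u(x, y)$ is called a generalized solution of the problem (6.5.13).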

(b) Discretization
Partition the domain $\Omega$ into a union of triangle units $e_k$ $(k = 1, \dots, N_e)$. The vertices $P_i(x_i, y_i)$ $(i = 1, \dots, N_e)$ of the triangles are called nodes. No node may be an interior point of a side of another triangle.

Let $u(x_i, y_i) = u_i$ $(i = 1, \dots, N_e)$. Take a triangle unit $e$ whose three vertices are $P_i, P_j, P_m$ (in anticlockwise order). Choose three constants $a, b, c$ such that the function
$$\tilde{u}(x, y) = ax + by + c \tag{6.5.15}$$
attains the values $u_i$, $u_j$, and $u_m$ at $P_i$, $P_j$, and $P_m$, respectively, i.e., $a$, $b$, and $c$ satisfy the following equations:
$$\begin{cases} a x_i + b y_i + c = u_i, \\ a x_j + b y_j + c = u_j, \\ a x_m + b y_m + c = u_m. \end{cases}$$
Its solution is
$$\begin{aligned}
a &= \frac{1}{2\Delta_e} \left((y_j - y_m) u_i + (y_m - y_i) u_j + (y_i - y_j) u_m\right), \\
b &= -\frac{1}{2\Delta_e} \left((x_j - x_m) u_i + (x_m - x_i) u_j + (x_i - x_j) u_m\right), \\
c &= \frac{1}{2\Delta_e} \left((x_j y_m - x_m y_j) u_i + (x_m y_i - x_i y_m) u_j + (x_i y_j - x_j y_i) u_m\right),
\end{aligned}$$
where
$$\Delta_e = \frac{1}{2} \begin{vmatrix} x_i & y_i & 1 \\ x_j & y_j & 1 \\ x_m & y_m & 1 \end{vmatrix}.$$
Since the order of $P_i, P_j, P_m$ is anticlockwise, $\Delta_e$ is the area of the triangle unit $e = \Delta P_i P_j P_m$. From this and (6.5.15), the interpolation function is
$$\tilde{u}(x, y) = N_i(x, y) u_i + N_j(x, y) u_j + N_m(x, y) u_m,$$
where
$$\begin{aligned}
N_i(x, y) &= \frac{1}{2\Delta_e} \left((y_j - y_m) x - (x_j - x_m) y + (x_j y_m - x_m y_j)\right), \\
N_j(x, y) &= \frac{1}{2\Delta_e} \left((y_m - y_i) x - (x_m - x_i) y + (x_m y_i - x_i y_m)\right), \\
N_m(x, y) &= \frac{1}{2\Delta_e} \left((y_i - y_j) x - (x_i - x_j) y + (x_i y_j - x_j y_i)\right).
\end{aligned}$$
Here $N_i(x, y)$, $N_j(x, y)$, and $N_m(x, y)$ are all polynomials of degree 1 and satisfy $N_s(x_t, y_t) = \delta_{s,t}$ $(s, t = i, j, m)$, where $\delta_{s,t}$ is the Kronecker delta. These three functions are called the basis functions of linear interpolation on the triangle unit $e$. Denote $J_e = (u_i, u_j, u_m)^T$ and $N = (N_i, N_j, N_m)^T$. So $\tilde{u} = N^T J_e$.

The gradient vector of $\tilde{u}$ is
$$\nabla \tilde{u} = \begin{pmatrix} \dfrac{\partial \tilde{u}}{\partial x} \\[2mm] \dfrac{\partial \tilde{u}}{\partial y} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial N_i}{\partial x} & \dfrac{\partial N_j}{\partial x} & \dfrac{\partial N_m}{\partial x} \\[2mm] \dfrac{\partial N_i}{\partial y} & \dfrac{\partial N_j}{\partial y} & \dfrac{\partial N_m}{\partial y} \end{pmatrix} \begin{pmatrix} u_i \\ u_j \\ u_m \end{pmatrix}.$$
Note that
$$\begin{aligned}
\frac{\partial N_i}{\partial x} &= \frac{1}{2\Delta_e}(y_j - y_m), & \frac{\partial N_i}{\partial y} &= -\frac{1}{2\Delta_e}(x_j - x_m), \\
\frac{\partial N_j}{\partial x} &= \frac{1}{2\Delta_e}(y_m - y_i), & \frac{\partial N_j}{\partial y} &= -\frac{1}{2\Delta_e}(x_m - x_i), \\
\frac{\partial N_m}{\partial x} &= \frac{1}{2\Delta_e}(y_i - y_j), & \frac{\partial N_m}{\partial y} &= -\frac{1}{2\Delta_e}(x_i - x_j).
\end{aligned}$$
Then $\nabla \tilde{u} = B J_e$, where
$$B = \frac{1}{2\Delta_e} \begin{pmatrix} y_j - y_m & y_m - y_i & y_i - y_j \\ x_m - x_j & x_i - x_m & x_j - x_i \end{pmatrix}. \tag{6.5.16}$$

(c) Unit stiffness matrix and unit carrier vector

Let $\tilde{\varphi}(x, y)$ be the interpolation function of $\varphi$ on the triangle unit $e$. Then

$$\tilde{\varphi}(x, y) = N_i(x, y)\varphi_i + N_j(x, y)\varphi_j + N_m(x, y)\varphi_m = N^T J_e^*,$$

where

$$J_e^* = (\varphi_i, \varphi_j, \varphi_m)^T, \qquad N = (N_i, N_j, N_m)^T,$$

$$\varphi_i = \varphi(x_i, y_i), \qquad \varphi_j = \varphi(x_j, y_j), \qquad \varphi_m = \varphi(x_m, y_m).$$

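Similar to the argument for $\nabla\tilde{u}$, we get $\nabla\tilde{\varphi} = BJ_e^*$, where $B$ is stated in (6.5.16). Replacing $u$ and $\varphi$ by the interpolation functions $\tilde{u}$ and $\tilde{\varphi}$ in (6.5.14), respectively, and noticing that $\bigcup_{n=1}^{N_e} e_n = \Omega$, we get

$$\sum_{n=1}^{N_e} \iint_{e_n} \left( \frac{\partial\tilde{u}}{\partial x}\frac{\partial\tilde{\varphi}}{\partial x} + \frac{\partial\tilde{u}}{\partial y}\frac{\partial\tilde{\varphi}}{\partial y} \right) dx\,dy = \sum_{n=1}^{N_e} \iint_{e_n} f\varphi \,dx\,dy. \tag{6.5.17}$$

By $\nabla\tilde{u} = BJ_e$ and $\nabla\tilde{\varphi} = BJ_e^*$, each term on the left-hand side of (6.5.17) is

$$\iint_e \left( \frac{\partial\tilde{u}}{\partial x}\frac{\partial\tilde{\varphi}}{\partial x} + \frac{\partial\tilde{u}}{\partial y}\frac{\partial\tilde{\varphi}}{\partial y} \right) dx\,dy = \iint_e (\nabla\tilde{\varphi})^T (\nabla\tilde{u}) \,dx\,dy = \iint_e (J_e^*)^T B^T B J_e \,dx\,dy = (J_e^*)^T K_e J_e, \tag{6.5.18}$$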


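where $K_e = \iint_e B^T B \,dx\,dy$ is called a unit stiffness matrix. Since $B$ is a constant matrix and the area of $e$ is $\Delta_e$,

$$K_e = \Delta_e B^T B = \begin{pmatrix} K_{ii}^e & K_{ij}^e & K_{im}^e \\ K_{ji}^e & K_{jj}^e & K_{jm}^e \\ K_{mi}^e & K_{mj}^e & K_{mm}^e \end{pmatrix},$$

where

$$K_{st}^e = \Delta_e \left( \frac{\partial N_s}{\partial x}\frac{\partial N_t}{\partial x} + \frac{\partial N_s}{\partial y}\frac{\partial N_t}{\partial y} \right) = \frac{1}{4\Delta_e}(a_s a_t + b_s b_t) \qquad (s, t = i, j, m)$$

and

$$a_i = y_j - y_m, \quad a_j = y_m - y_i, \quad a_m = y_i - y_j, \quad b_i = x_m - x_j, \quad b_j = x_i - x_m, \quad b_m = x_j - x_i.$$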

Note that $f\varphi = \varphi^T f = (J_e^*)^T N f$. Since $J_e^*$ is independent of $x$ and $y$, each term on the right-hand side of (6.5.17) is

$$\iint_e f\varphi \,dx\,dy = \iint_e (J_e^*)^T N f \,dx\,dy = (J_e^*)^T F_e,$$

where

$$F_e = \iint_e N f \,dx\,dy = (F_i^e, F_j^e, F_m^e)^T.$$

The vector $F_e$ is called a unit carrier vector, and $F_s^e = \iint_e N_s f \,dx\,dy$ $(s = i, j, m)$.

(d) Global stiffness matrix and global carrier vector

Substitute the representations of the unit stiffness matrix and the unit carrier vector into (6.5.17). For convenience of superposition, each $K_e$ is extended to an $N_e \times N_e$ matrix $\tilde{K}_e$, each $F_e$ is extended to an $N_e$-dimensional vector $\tilde{F}_e$, and $J_e^*$ and $J_e$ are extended to $J^*$ and $J$, respectively. So

$$(J^*)^T \left( \sum_{n=1}^{N_e} \tilde{K}_{e_n} \right) J = (J^*)^T \left( \sum_{n=1}^{N_e} \tilde{F}_{e_n} \right).$$

This equality holds for arbitrary $J^*$. Deleting $(J^*)^T$, we get

$$\left( \sum_{n=1}^{N_e} \tilde{K}_{e_n} \right) J = \sum_{n=1}^{N_e} \tilde{F}_{e_n}.$$

Let $K = \sum_{n=1}^{N_e} \tilde{K}_{e_n}$ and $F = \sum_{n=1}^{N_e} \tilde{F}_{e_n}$. Then $KJ = F$. This is a system of linear equations, and $K$ and $F$ are called the global stiffness matrix and the global carrier vector, respectively. After this system is rewritten to incorporate the constraint conditions, the desired solution can be obtained.
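The formula $K_{st}^e = (a_s a_t + b_s b_t)/(4\Delta_e)$ for the unit stiffness matrix can be illustrated with a short sketch. The function name `element_stiffness` and the sample triangle are illustrative choices, not part of the text; the code only evaluates the closed-form entries given in step (c).

```python
def element_stiffness(pi, pj, pm):
    """Unit stiffness matrix K_e of a linear triangle (vertices anticlockwise)."""
    (xi, yi), (xj, yj), (xm, ym) = pi, pj, pm
    # Delta_e: half the determinant of [[xi, yi, 1], [xj, yj, 1], [xm, ym, 1]].
    area = 0.5 * (xi * (yj - ym) - yi * (xj - xm) + (xj * ym - xm * yj))
    if area <= 0:
        raise ValueError("vertices must be given in anticlockwise order")
    a = [yj - ym, ym - yi, yi - yj]   # a_i, a_j, a_m
    b = [xm - xj, xi - xm, xj - xi]   # b_i, b_j, b_m
    return [[(a[s] * a[t] + b[s] * b[t]) / (4.0 * area) for t in range(3)]
            for s in range(3)]


# Hypothetical example: the right triangle with vertices (0,0), (1,0), (0,1).
K = element_stiffness((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```

Each row of `K` sums to zero, which reflects the fact that a constant function has zero gradient on the element.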


6.6 Wavelet methods

Starting from the classical Galerkin method and Daubechies wavelets, we introduce the wavelet-Galerkin method for solving differential and integral equations.

6.6.1 Classical Galerkin method

Consider the linear differential equation

$$Lu = f, \tag{6.6.1}$$

where $L$ is a linear differential operator, $f$ is a known function, and $u$ is an undetermined function. Choosing a set of test functions $u_1, \dots, u_n$, we try to approximate $u$ by a linear combination $u = \sum_{k=1}^{n} c_k u_k$ of these test functions. Since $L$ is a linear operator,

$$Lu = \sum_{k=1}^{n} c_k L u_k.$$

Choose the coefficients $c_k$ $(k = 1, \dots, n)$ such that some norm

$$\|Lu - f\| = \left\| \sum_{k=1}^{n} c_k L u_k - f \right\|$$

attains its minimal value. The function $u$ is then an approximate solution of (6.6.1). The norm may be the norm of $C([a, b])$ or the norm of $L^2([a, b])$, i.e.,

$$\|f\| = \max_{a \le x \le b} |f(x)| \qquad \text{or} \qquad \|f\| = \left( \int_a^b |f(x)|^2 \,dx \right)^{1/2}.$$

The interpolation form of the Galerkin method is

$$\sum_{k=1}^{n} c_k (L u_k)(x_i) = f(x_i) \qquad (i = 1, \dots, n),$$

i.e., choose $c_1, \dots, c_n$ such that $Lu$ interpolates $f$ at the points $x_i$. The function

$$u(x) = \sum_{k=1}^{n} c_k u_k(x)$$

is an approximate solution of (6.6.1) in the interpolation sense. The weak form of the Galerkin method is often used. Choose $c_k$ $(k = 1, \dots, n)$ such that

$$(u_l, Lu)_{L^2} = (u_l, f)_{L^2} \qquad (l = 1, \dots, n), \tag{6.6.2}$$


i.e.,

$$\sum_{k=1}^{n} c_k (u_l, L u_k)_{L^2} = (u_l, f)_{L^2} \qquad (l = 1, \dots, n).$$

Denote $f_l = (u_l, f)_{L^2}$ and $g_{lk} = (u_l, L u_k)_{L^2}$. From the system of linear equations

$$\sum_{k=1}^{n} g_{lk} c_k = f_l \qquad (l = 1, \dots, n),$$

we solve for $c_k$ $(k = 1, \dots, n)$. The function $u = \sum_{k=1}^{n} c_k u_k$ satisfies (6.6.2). For example, consider the Dirichlet problem

$$\begin{cases} \Delta u = 0 & \text{in } \Omega, \\ u(x, y) = g & \text{on } \partial\Omega, \end{cases}$$

or, written as $Lu = f$, where $Lu = (\Delta u, u|_{\partial\Omega})^T$ and $f = (0, g)^T$.

Let $u_k$ be the real part of $z^k$, where $z = x + iy$. Then $u_k$ $(k = 1, \dots, n)$ are harmonic functions and their linear combination $u = \sum_{k=1}^{n} c_k u_k$ is also a harmonic function. So $\Delta u = 0$. Take $m$ points $(x_l, y_l)$ $(l = 1, \dots, m)$ distributed uniformly on $\partial\Omega$, where $m > n$, and then use the Remez algorithm (see Section 3.3) to find $c_k$ $(k = 1, \dots, n)$ such that

$$\max_{1 \le l \le m} \left| \sum_{k=1}^{n} c_k u_k(x_l, y_l) - g(x_l, y_l) \right|$$

attains its minimal value. This linear combination $u$ is then an approximate solution of the Dirichlet problem.

In the Galerkin method, one often takes a subset of an orthonormal basis as test functions such that their linear combination approximates the solution. Moreover, as the cardinality of the subsets increases to $\infty$, the approximate solutions tend to the exact solution. Wavelet bases have many useful properties such as compact support, smoothness, and vanishing moments. Therefore, the wavelet-Galerkin method has been developed in recent years. To give the wavelet-Galerkin method, we first state the construction of compactly supported wavelets.
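The weak form (6.6.2) can be sketched numerically as follows. This is a minimal illustration under assumptions that are not in the text: the operator is $Lu = -u''$ on $[0, 1]$ with zero boundary values, the test functions are $u_k(x) = \sin(k\pi x)$, the right-hand side is $f(x) = \pi^2 \sin(\pi x)$ (so the exact solution is $\sin(\pi x)$), and the inner products are approximated by a midpoint rule. The helper names `inner` and `solve` are hypothetical.

```python
import math


def inner(p, q, n=2000):
    """Midpoint-rule approximation of the L2([0, 1]) inner product (p, q)."""
    h = 1.0 / n
    return h * sum(p((i + 0.5) * h) * q((i + 0.5) * h) for i in range(n))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


n_basis = 3
# Assumed test functions u_k and their images L u_k = -u_k'' = (k*pi)^2 u_k.
u = [lambda x, k=k: math.sin(k * math.pi * x) for k in range(1, n_basis + 1)]
Lu = [lambda x, k=k: (k * math.pi) ** 2 * math.sin(k * math.pi * x)
      for k in range(1, n_basis + 1)]
f = lambda x: math.pi ** 2 * math.sin(math.pi * x)

# g_lk = (u_l, L u_k) and f_l = (u_l, f), then solve sum_k g_lk c_k = f_l.
G = [[inner(u[l], Lu[k]) for k in range(n_basis)] for l in range(n_basis)]
F = [inner(u[l], f) for l in range(n_basis)]
c = solve(G, F)
```

With these choices the computed coefficients are close to $(1, 0, 0)$, i.e., the Galerkin solution reproduces $\sin(\pi x)$.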

6.6.2 Compactly supported wavelets

If $\psi$ is an orthogonal wavelet with $p$ vanishing moments, then its support has size larger than or equal to $2p - 1$. Among these wavelets, the Daubechies wavelet has support of minimum size. Given a trigonometric polynomial $H(\omega)$ satisfying

$$|H(\omega)|^2 = 2 \left| \cos\frac{\omega}{2} \right|^{2p} P\left( \sin^2\frac{\omega}{2} \right),$$


where

$$P(y) = \sum_{k=0}^{p-1} C_{p-1+k}^{k}\, y^k, \qquad C_{p-1+k}^{k} = \frac{(p - 1 + k)!}{k!\,(p - 1)!},$$

one uses the Riesz lemma to construct a minimum-degree polynomial $R(z)$ by factorization such that

$$|R(e^{-i\omega})|^2 = P\left( \sin^2\frac{\omega}{2} \right),$$

and lets

$$H(\omega) = \sqrt{2}\, e^{-i(p-1)\omega} \left( \frac{1 + e^{-i\omega}}{2} \right)^{p} R(e^{-i\omega}) = \sum_{k=0}^{2p-1} c_k e^{-ik\omega}.$$

The bi-scale equation based on the coefficients $c_k$,

$$\frac{1}{2}\varphi(t) = \sum_{k=0}^{2p-1} c_k \varphi(2t - k), \tag{6.6.3}$$

can be solved by the ordinary iteration method as follows. Define an operator $T$ by

$$T(\varphi(t)) = 2 \sum_{k=0}^{2p-1} c_k \varphi(2t - k).$$

Take a compactly supported function $\varphi_0(t)$ such as

$$\varphi_0(t) = N_2(t) = \begin{cases} t & (0 \le t < 1), \\ 2 - t & (1 \le t < 2), \\ 0 & \text{otherwise}, \end{cases}$$

where $N_2(t)$ is a second-order B-spline (see Section 4.4). Let $\varphi_n(t) = T(\varphi_{n-1}(t))$ $(n \in \mathbb{Z}_+)$. Then $\varphi_n(t)$ converges as $n \to \infty$; the limit $\varphi^*(t)$ is a continuous solution of (6.6.3) with compact support $[0, 2p - 1]$. Under the additional condition

$$\sum_{k=0}^{2p-1} \varphi^*(k) = 1,$$

the values of the scaling function $\varphi^*(t)$ at the dyadic rationals $k/2^m$ are determined. The corresponding wavelet function $\psi^*$ is given by

$$\psi^*(t) = -2 \sum_{k=2-2p}^{1} (-1)^k c_{1-k}\, \varphi^*(2t - k)$$

(see Section 3.5). The function $\psi^*$ is called a Daubechies wavelet. It has $p$ vanishing moments and support $[-p + 1, p]$. The larger $p$ is, the smoother the Daubechies wavelet is.
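The evaluation of $\varphi^*$ at dyadic rationals can be sketched for $p = 2$. The sketch rests on an assumption not stated in the text: the coefficients are taken to be the standard four-tap Daubechies filter rescaled so that $\sum_k c_k = 1$, which is the normalization implied by $\varphi = 2\sum_k c_k \varphi(2t - k)$. Rather than iterating $T$ on functions, the code first finds the integer values by fixed-point iteration under $\sum_k \varphi^*(k) = 1$ and then refines to finer dyadic points with (6.6.3). The name `d4_scaling_values` is hypothetical.

```python
import math


def d4_scaling_values(levels=3, iterations=200):
    """Values of the p = 2 scaling function at dyadic points k / 2**levels."""
    s3 = math.sqrt(3.0)
    # Assumed coefficients for p = 2, normalised so that sum(c) == 1.
    c = [(1 + s3) / 8, (3 + s3) / 8, (3 - s3) / 8, (1 - s3) / 8]
    n = len(c)  # phi is supported on [0, n - 1] = [0, 2p - 1]

    def refine(values, t):
        # Bi-scale equation: phi(t) = 2 * sum_k c_k * phi(2t - k).
        return 2.0 * sum(c[k] * values.get(2.0 * t - k, 0.0) for k in range(n))

    # Integer values: fixed-point iteration started from a vector with sum 1;
    # for these coefficients the sum is preserved by each iteration.
    v = {float(i): 1.0 / n for i in range(n)}
    for _ in range(iterations):
        v = {float(i): refine(v, float(i)) for i in range(n)}

    # Finer dyadic points only need values from the previous, coarser level.
    for m in range(1, levels + 1):
        step = 2.0 ** (-m)
        for q in range(1, (n - 1) * 2 ** m, 2):
            t = q * step
            v[t] = refine(v, t)
    return v


phi = d4_scaling_values(levels=3)
```

The dictionary `phi` maps each dyadic point in $[0, 3]$ to the corresponding value of the scaling function; the integer translates sum to one at every sampled point.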


6.6.3 Wavelet-Galerkin method

Consider the one-dimensional differential equation with Dirichlet boundary condition

$$\begin{cases} Lu(x) = f(x) & (0 \le x \le 1), \\ u(0) = u(1) = 0, \end{cases} \tag{6.6.4}$$

where $f$ is a real-valued continuous function on $[0, 1]$ and $L$ is a uniformly elliptic differential operator.

Let $\psi_{mn}(x) = 2^{m/2}\psi(2^m x - n)$ be a wavelet basis for $L^2([0, 1])$ with boundary condition $\psi_{mn}(0) = \psi_{mn}(1) = 0$. Let $\Lambda$ be a set of indices $(m, n)$. For each $(m, n) \in \Lambda$, $\psi_{mn}$ is a twice continuously differentiable function and

$$\psi_{mn}(x) = 0 \quad \text{if} \quad \left| x - \frac{n}{2^m} \right| > \frac{1}{2^m}. \tag{6.6.5}$$

Let the approximate solution of (6.6.4) be

$$u_p = \sum_{(m,n) \in \Lambda} c_{mn} \psi_{mn},$$

where $c_{mn}$ $((m, n) \in \Lambda)$ are undetermined coefficients. Select $c_{mn}$ $((m, n) \in \Lambda)$ such that

$$(Lu_p, \psi_{jk}) = (f, \psi_{jk}) \qquad ((j, k) \in \Lambda).$$

This implies that

$$\sum_{(m,n) \in \Lambda} (L\psi_{mn}, \psi_{jk})\, c_{mn} = (f, \psi_{jk}) \qquad ((j, k) \in \Lambda). \tag{6.6.6}$$

Denote

$$\alpha_{mnjk} = (L\psi_{mn}, \psi_{jk}), \qquad y_{jk} = (f, \psi_{jk}),$$

$$A = (\alpha_{mnjk})_{(j,k),(m,n) \in \Lambda}, \qquad C = (c_{mn})_{(m,n) \in \Lambda}, \qquad Y = (y_{jk})_{(j,k) \in \Lambda},$$

where $(j, k)$ and $(m, n)$ index the rows and columns of $A$, respectively. So (6.6.6) is reduced to the following system of linear equations with unknowns $c_{mn}$:

$$\sum_{(m,n) \in \Lambda} \alpha_{mnjk}\, c_{mn} = y_{jk} \qquad ((j, k) \in \Lambda),$$

which is equivalent to $AC = Y$. From (6.6.5), it is seen that $A$ is a sparse matrix. If the condition number of $A$ is high, choose a matrix $B$ such that the system $AC = Y$ is replaced by the equivalent system $BAC = BY$, where $BA$ has a low condition number. Denote $BA = M$ and $BY = V$. The linear system $MC = V$ is stable. From it, the undetermined coefficients $c_{mn}$ are solved, and so we get the approximate solution $u_p$.


6.6.4 Numerical solution of integral equations

Numerical calculation of integral operators plays a key role in solving integral equations. Consider a linear integral operator

$$Tf(x) = \int_{\mathbb{R}} K(x, y) f(y) \,dy,$$

where $K(x, y) = 0$ for $(x, y) \in \mathbb{R}^2 \setminus [0, 1]^2$. Take $N$ points $(x_i, y_i)$ $(i = 1, \dots, N)$, where

$$0 = x_1 < x_2 < \dots < x_N = 1, \qquad 0 = y_1 < y_2 < \dots < y_N = 1.$$

Then

$$Tf(x_i) \approx \sum_{j=1}^{N} K(x_i, y_j) f(y_j) \Delta y_j \qquad (\Delta y_j = y_j - y_{j-1}).$$

From this, the calculation of $Tf(x)$ is reduced to the product of an $N \times N$ matrix and an $N$-dimensional vector. So the number of multiplications is $O(N^2)$, which is very large. When $T$ is a convolution operator, i.e.,

$$(Tf)(x) = \int_{\mathbb{R}} K(x - y) f(y) \,dy,$$

taking the Fourier transform gives

$$\widehat{(Tf)}(\omega) = \hat{K}(\omega) \hat{f}(\omega).$$

Its discrete form is

$$\widehat{(Tf)}(\omega_i) = \hat{K}(\omega_i) \hat{f}(\omega_i) \qquad (i = 1, \dots, N).$$

With the fast Fourier transform (see Section 4.5), the corresponding number of operations decreases from $O(N^2)$ to $O(N \log N)$. When the linear operator is not in convolution form, the Fourier transform approach does not work. Since the Daubechies wavelet expansion enables the discretization matrix corresponding to the integral operator $T$ to be reduced to a sparse matrix, the number of operations is only $O(N \log N)$ for wavelet-based numerical computation of integral operators.

Take a slightly modified Daubechies wavelet with $M - 1$ vanishing moments. Its scaling function $\varphi$ and wavelet $\psi$ satisfy

$$\varphi(x) = 2 \sum_{k=0}^{2M-1} h_{k+1} \varphi(2x - k), \qquad \psi(x) = 2 \sum_{k=0}^{2M-1} g_{k+1} \varphi(2x - k),$$

where $g_k = (-1)^{k-1} h_{2M-k+1}$ $(k = 1, \dots, 2M)$.
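The reduction from $O(N^2)$ to $O(N \log N)$ for convolution operators, discussed above, can be sketched as follows. The sketch assumes a periodic (circular) discrete convolution on a uniform grid with $N$ a power of two, which is a simplification of the continuous setting in the text; the function names are hypothetical and the FFT is a plain recursive radix-2 version written for clarity.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1.0 if invert else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def circular_convolution_direct(kernel, f):
    """O(N^2): matrix-vector product with the matrix K[(i - j) mod N]."""
    n = len(f)
    return [sum(kernel[(i - j) % n] * f[j] for j in range(n)) for i in range(n)]


def circular_convolution_fft(kernel, f):
    """O(N log N): multiply the discrete Fourier transforms pointwise."""
    n = len(f)
    k_hat, f_hat = fft(kernel), fft(f)
    product = [k_hat[i] * f_hat[i] for i in range(n)]
    return [z.real / n for z in fft(product, invert=True)]


kernel = [1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
signal = [1.0, 0.0, 2.0, 0.0, 0.0, 3.0, 0.0, 0.0]
direct = circular_convolution_direct(kernel, signal)
fast = circular_convolution_fft(kernel, signal)
```

Both routines return the same vector up to rounding error; only the operation count differs.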


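(a) Wavelet expansion of K(x, y)

The bivariate Daubechies wavelets are $\varphi(x)\psi(y)$, $\psi(x)\varphi(y)$, and $\psi(x)\psi(y)$, where $\varphi(x)$ and $\psi(x)$ are the Daubechies scaling function and the corresponding wavelet. The bivariate Daubechies wavelet basis is

$$\{ \varphi_{jm}(x)\psi_{jn}(y),\ \psi_{jm}(x)\varphi_{jn}(y),\ \psi_{jm}(x)\psi_{jn}(y) \}_{j,m,n \in \mathbb{Z}}.$$

Expand the kernel function $K(x, y)$ into a series with respect to this basis:

$$K(x, y) = \sum_{j,k,l} \alpha_{jkl}\, \varphi_{jk}(x)\psi_{jl}(y) + \sum_{j,k,l} \beta_{jkl}\, \psi_{jk}(x)\varphi_{jl}(y) + \sum_{j,k,l} \gamma_{jkl}\, \psi_{jk}(x)\psi_{jl}(y), \tag{6.6.7}$$

where the wavelet coefficients are

$$\begin{aligned}
\alpha_{jkl} &= \iint_{\mathbb{R}^2} K(x, y)\, \varphi_{jk}(x)\psi_{jl}(y) \,dx\,dy, \\
\beta_{jkl} &= \iint_{\mathbb{R}^2} K(x, y)\, \psi_{jk}(x)\varphi_{jl}(y) \,dx\,dy, \\
\gamma_{jkl} &= \iint_{\mathbb{R}^2} K(x, y)\, \psi_{jk}(x)\psi_{jl}(y) \,dx\,dy.
\end{aligned}$$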

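(b) Wavelet expansion of Tf(x)

By $Tf(x) = \int_{\mathbb{R}} K(x, y) f(y) \,dy$ and (6.6.7), it follows that

$$Tf(x) = \sum_{j,k,l} \alpha_{jkl}\, d_{jl}\, \varphi_{jk}(x) + \sum_{j,k,l} \beta_{jkl}\, c_{jl}\, \psi_{jk}(x) + \sum_{j,k,l} \gamma_{jkl}\, d_{jl}\, \psi_{jk}(x), \tag{6.6.8}$$

where

$$d_{jl} = \int_{\mathbb{R}} f(y)\psi_{jl}(y) \,dy, \qquad c_{jl} = \int_{\mathbb{R}} f(y)\varphi_{jl}(y) \,dy.$$

Define three matrices $\alpha^j$, $\beta^j$, $\gamma^j$ and two vectors $\tilde{d}^j$, $\tilde{c}^j$ as

$$\alpha^j = (\alpha_{jkl})_{k,l}, \quad \beta^j = (\beta_{jkl})_{k,l}, \quad \gamma^j = (\gamma_{jkl})_{k,l}, \quad \tilde{d}^j = \gamma^j d^j + \beta^j c^j, \quad \tilde{c}^j = \alpha^j d^j,$$

where $d^j = (d_{jl})_l$ and $c^j = (c_{jl})_l$ are vectors. From this and (6.6.8),

$$Tf(x) = \sum_{j} \sum_{k} \left( \tilde{d}_k^j\, \psi_{jk}(x) + \tilde{c}_k^j\, \varphi_{jk}(x) \right). \tag{6.6.9}$$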

(c) Numerical calculation of Tf(x)

Let $N = 2^n$. Take $c_k^0$ $(k = 1, \dots, N)$ as $N$ samples which are regarded as the averages of $f$ on dyadic intervals of length $2^{-n}$, and let $\{c_k^0\}$ be a $2^n$-periodic sequence. So we get a $(2N - 2)$-dimensional vector

$$\tilde{f} = \left( d_1^1, \dots, d_{N/2}^1, c_1^1, \dots, c_{N/2}^1, d_1^2, \dots, d_{N/4}^2, c_1^2, \dots, c_{N/4}^2, \dots, d_1^n, c_1^n \right)^T.$$

Take

$$T_N f(x) = \sum_{j=1}^{n} \sum_{k=1}^{2^{n-j}} \left( \tilde{d}_k^j\, \psi_{jk}(x) + \tilde{c}_k^j\, \varphi_{jk}(x) \right). \tag{6.6.10}$$

Then $Tf(x) \approx T_N f(x)$, and we only need to compute $T_N f(x)$. Formula (6.6.9) can be represented by the product of the matrix $A$ and the vector $\tilde{f}$, where

$$A = \begin{pmatrix} A_1 & & \\ & \ddots & \\ & & A_n \end{pmatrix}, \qquad A_j = \begin{pmatrix} \gamma^j & \beta^j \\ \alpha^j & 0 \end{pmatrix}, \tag{6.6.11}$$

and, for $j = 1, \dots, n$,

$$\alpha^j = (\alpha_{jkl})_{k,l=1,\dots,2^{n-j}}, \qquad \beta^j = (\beta_{jkl})_{k,l=1,\dots,2^{n-j}}, \qquad \gamma^j = (\gamma_{jkl})_{k,l=1,\dots,2^{n-j}}.$$

(d) Calderon–Zygmund operators

Calderon–Zygmund operators are among the most important integral operators and are widely applied in integral equations. Let $T$ be a Calderon–Zygmund operator whose kernel function satisfies the following three conditions:

$$|K(x, y)| \le C|x - y|^{-1},$$

$$\left| \frac{\partial^M}{\partial x^M} K(x, y) \right| + \left| \frac{\partial^M}{\partial y^M} K(x, y) \right| \le C|x - y|^{-1-M},$$

$$\left| \iint_{I \times I} K(x, y) \,dx\,dy \right| \le C|I| \quad \text{for any dyadic interval } I,$$

where $C$ is a constant and $M \ge 1$. Then the wavelet coefficients of $K(x, y)$ satisfy

$$|\alpha_{jkl}| + |\beta_{jkl}| + |\gamma_{jkl}| = O\left( (1 + |k - l|)^{-1-M} \right).$$

(e) Modified integral operators

Given $\varepsilon > 0$, take

$$B \ge \left( \frac{C}{\varepsilon} \log_2 N \right)^{1/M}.$$

Let $A = (A_{\mu\nu})$ be stated in (6.6.11). Define $A^{(B)} = (A_{\mu\nu}^{(B)})$, where

$$A_{\mu\nu}^{(B)} = \begin{cases} A_{\mu\nu}, & |\mu - \nu| \le B, \\ 0, & |\mu - \nu| > B. \end{cases}$$

Let $\alpha^{jB}$, $\beta^{jB}$, $\gamma^{jB}$ be obtained from $\alpha^j$, $\beta^j$, $\gamma^j$, respectively, by replacing $A$ by $A^{(B)}$, and let $\tilde{d}^{jB}$, $\tilde{c}^{jB}$ be obtained by replacing $\alpha^j$, $\beta^j$, $\gamma^j$ by $\alpha^{jB}$, $\beta^{jB}$, $\gamma^{jB}$ in (6.6.9), respectively. Denote

$$\tilde{d}^{jB} = \left( \tilde{d}_1^{jB}, \dots, \tilde{d}_{2^{n-j}}^{jB} \right)^T, \qquad \tilde{c}^{jB} = \left( \tilde{c}_1^{jB}, \dots, \tilde{c}_{2^{n-j}}^{jB} \right)^T.$$

The modified integral operator is defined as

$$T_N^B f(x) = \sum_{j=1}^{n} \sum_{k=1}^{2^{n-j}} \left( \tilde{d}_k^{jB}\, \psi_{jk}(x) + \tilde{c}_k^{jB}\, \varphi_{jk}(x) \right).$$

Then

$$\| T_N^B f - T_N f \|_{L^2} \le \frac{D \log_2 N}{B^M} \| f \|_{L^2} \le \varepsilon,$$

where $T_N$ is stated in (6.6.10). So we only need to compute $(T_N^B f)(x)$. The corresponding total number of operations is just $O(N \log N)$. From this, we can see that the numerical solution of integral equations $Tf(x) = g(x)$ via wavelet methods is fast.


7 Optimization

Optimization problems arise in almost all areas of environmental science. Optimization can be used to reduce overall product cost, minimize negative environmental impacts, and maximize the probability of making a correct decision. Many numerical methods can solve these optimization problems quickly, with or without constraints, especially linear and nonlinear optimization problems. In this chapter we introduce the main optimization techniques, including Newton’s method, the steepest descent method, the Newton–Raphson method, the variational method, the simplex method, Fermat rules, the KKT optimality conditions, and primal/dual pairs.

7.1 Newton’s method and steepest descent method

Newton’s algorithm is one of the well-known methods for solving unconstrained optimization problems. We start from the one-dimensional version. Let $f\colon \mathbb{R} \to \mathbb{R}$ be an objective function to be minimized, i.e., we solve the equation $f'(x) = 0$. First, $x_0 \in \mathbb{R}$ is taken as an initial point. If $f'(x_0) = 0$, the algorithm stops. Otherwise, $f'(x_0) \ne 0$. Ignoring all nonlinear terms in the Taylor series of $f'(x)$ at $x_0$,

$$f'(x) = f'(x_0) + (x - x_0) f''(x_0) + (x - x_0)^2 \frac{f'''(x_0)}{2!} + (x - x_0)^3 \frac{f^{(4)}(x_0)}{3!} + \cdots,$$

the nonlinear equation $f'(x) = 0$ can be approximated by the linear equation $f'(x_0) + (x - x_0) f''(x_0) = 0$. Assuming $f''(x_0) > 0$, the equation yields the first iterate

$$x_1 = x_0 - \frac{f'(x_0)}{f''(x_0)} \qquad (f'(x_0) \ne 0).$$

The process is repeated at $x_1$, leading to a new iterate $x_2$, and so on up to a new iterate $x_n$. The generic recurrence equation is

$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} \qquad (f'(x_k) \ne 0,\ f''(x_k) > 0).$$

The one-dimensional classical Newton’s algorithm has the following local convergence property. Let $f\colon \mathbb{R} \to \mathbb{R}$ be twice smooth with $f'(x^*) = 0$ and $f''(x^*) > 0$, where $x^* \in \mathbb{R}$. Denote $M^* = f''(x^*)$. If there are $r > 0$ and $M > 0$ such that for any $x', x'' \in (x^* - r, x^* + r)$,

$$|f''(x') - f''(x'')| \le M|x' - x''|, \tag{7.1.1}$$

then the iterates $\{x_k\}_{k=0,1,\dots} \subset (x^* - r, x^* + r)$ converge to $x^*$ quadratically, i.e., $|x_{k+1} - x^*| \le C|x_k - x^*|^2$, where $C = \frac{M}{M^* - Mr}$ $\left( r < \frac{M^*}{M} \right)$.

In fact, by (7.1.1) and $Mr < M^*$, it follows that for $x \in (x^* - r, x^* + r)$,

$$|f''(x)| = |f''(x^*) + f''(x) - f''(x^*)| \ge |f''(x^*)| - |f''(x) - f''(x^*)| \ge M^* - Mr. \tag{7.1.2}$$

Since $f(x)$ is twice smooth, the Lagrange form of the Taylor expansion is

$$f'(x_k) = f'(x^*) + (x_k - x^*) f''(\bar{x}),$$

where $\bar{x} \in (x_k, x^*)$ or $\bar{x} \in (x^*, x_k)$. From this and $\frac{f'(x^*)}{f''(x_k)} = 0$ (since $f'(x^*) = 0$), it follows that

$$\begin{aligned}
x_{k+1} - x^* &= \left( x_k - \frac{f'(x_k)}{f''(x_k)} \right) - x^* + \frac{f'(x^*)}{f''(x_k)} \\
&= \frac{(x_k - x^*) f''(x_k) - (f'(x_k) - f'(x^*))}{f''(x_k)} \\
&= \frac{(x_k - x^*)(f''(x_k) - f''(\bar{x}))}{f''(x_k)}.
\end{aligned}$$

Therefore, by (7.1.1) and (7.1.2), for $x_k \in (x^* - r, x^* + r)$,

$$|x_{k+1} - x^*| = \frac{|x_k - x^*|\, |f''(\bar{x}) - f''(x_k)|}{|f''(x_k)|} \le C|x_k - x^*|\, |\bar{x} - x_k|,$$

where $C = \frac{M}{M^* - Mr}$. Note that $\bar{x} \in (x_k, x^*)$ or $\bar{x} \in (x^*, x_k)$. Clearly, $|\bar{x} - x_k| \le |x_k - x^*|$. So

$$|x_{k+1} - x^*| \le C|x_k - x^*|^2,$$

i.e., the iterates $\{x_k\}_{k=0,1,\dots}$ converge to $x^*$ quadratically.
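The one-dimensional recurrence $x_{k+1} = x_k - f'(x_k)/f''(x_k)$ can be sketched as follows. The objective $f(x) = x^4/4 + x^2/2 - x$, the starting point, and the tolerance are illustrative assumptions; the function name `newton_minimize_1d` is hypothetical.

```python
def newton_minimize_1d(df, d2f, x0, tol=1e-12, max_iter=50):
    """Classical Newton iteration for f'(x) = 0, requiring f''(x_k) > 0."""
    x = x0
    history = [x]
    for _ in range(max_iter):
        g = df(x)
        if abs(g) <= tol:          # stop when f'(x_k) is (numerically) zero
            break
        h = d2f(x)
        if h <= 0:
            raise ValueError("f''(x_k) must be positive")
        x = x - g / h              # x_{k+1} = x_k - f'(x_k) / f''(x_k)
        history.append(x)
    return x, history


# Assumed example: f(x) = x**4 / 4 + x**2 / 2 - x, so f'(x) = x**3 + x - 1
# and f''(x) = 3 * x**2 + 1 > 0 everywhere.
x_star, iterates = newton_minimize_1d(lambda x: x ** 3 + x - 1,
                                      lambda x: 3 * x ** 2 + 1,
                                      x0=1.0)
```

The list `iterates` records $x_0, x_1, \dots$; near the minimizer the error is roughly squared at each step, which is the quadratic convergence stated above.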

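Classical Newton’s method

Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be a twice smooth function and let its Hessian matrix

$$\nabla^2 f(x) = \begin{pmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1^2} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\[2mm]
\dfrac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n^2}
\end{pmatrix}$$

be positive definite, where $x = (x_1, \dots, x_n)$. The classical Newton’s algorithm involves the following three steps:

Step 1. Start from an initial point $x_0 \in \mathbb{R}^n$.

Step 2. Stop if $\nabla f(x_k) = 0$, where $x_k \in \mathbb{R}^n$. Otherwise, the Newton direction is

$$d_k = -\nabla f(x_k) (\nabla^2 f(x_k))^{-1} \qquad (d_k \in \mathbb{R}^n),$$

where $(\nabla^2 f(x))^{-1}$ is the inverse matrix of the Hessian matrix.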


Step 3. The next iterate is $x_{k+1} = x_k + d_k$. Return to Step 2; the process is repeated at $x_{k+1}$.

The classical Newton method is closely related to the concept of a descent direction.

Definition 7.1.1. Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be a smooth function. The derivative of $f$ at $x$ in the direction $d$ is defined as

$$\frac{\partial f(x)}{\partial d} = \lim_{\lambda \to 0} \frac{f(x + \lambda d) - f(x)}{\lambda},$$

where $x, d \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$. If

$$\frac{\partial f(x)}{\partial d} < 0,$$

then $d$ is said to be a descent direction for $f$ at $x$.

Note that

$$\begin{aligned}
\frac{f(x + \lambda d) - f(x)}{\lambda}
&= \frac{f(x_1 + \lambda d_1, x_2 + \lambda d_2, \dots, x_n + \lambda d_n) - f(x_1, x_2, \dots, x_n)}{\lambda} \\
&= \frac{f(x_1 + \lambda d_1, x_2 + \lambda d_2, \dots, x_n + \lambda d_n) - f(x_1, x_2 + \lambda d_2, \dots, x_n + \lambda d_n)}{\lambda} \\
&\quad + \frac{f(x_1, x_2 + \lambda d_2, x_3 + \lambda d_3, \dots, x_n + \lambda d_n) - f(x_1, x_2, x_3 + \lambda d_3, \dots, x_n + \lambda d_n)}{\lambda} + \cdots \\
&\quad + \frac{f(x_1, x_2, \dots, x_{n-1}, x_n + \lambda d_n) - f(x_1, x_2, \dots, x_{n-1}, x_n)}{\lambda}.
\end{aligned}$$

Let $\lambda \to 0$. Then

$$\frac{\partial f(x)}{\partial d} = d_1 \frac{\partial f(x)}{\partial x_1} + d_2 \frac{\partial f(x)}{\partial x_2} + \cdots + d_n \frac{\partial f(x)}{\partial x_n} = \langle \nabla f(x), d \rangle,$$

where

$$\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \dots, \frac{\partial f(x)}{\partial x_n} \right)$$

is the gradient of $f$ at $x$. From this and Definition 7.1.1, it is seen that if $f$ is smooth, then every direction $d$ satisfying $\langle \nabla f(x), d \rangle < 0$ is a descent direction at $x$. Since the Hessian matrix $\nabla^2 f(x)$ is positive definite, its inverse matrix $(\nabla^2 f(x))^{-1}$ exists and is also positive definite, and for all $\nabla f(x_k) \ne 0$,

$$\langle \nabla f(x_k), \nabla f(x_k) (\nabla^2 f(x_k))^{-1} \rangle > 0 \qquad \text{or} \qquad \langle \nabla f(x_k), d_k \rangle < 0.$$

By Definition 7.1.1, the Newton direction $d_k$ is a descent direction.

The classical Newton’s algorithm has the following local convergence property. Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be twice smooth with $\nabla f(x^*) = 0$ and let $\nabla^2 f(x^*)$ be positive definite. If there are $r > 0$ and $M > 0$ such that the Hessian matrix satisfies the local Lipschitz condition, i.e., for any $x', x'' \in B(x^*, r)$,

$$\|\nabla^2 f(x') - \nabla^2 f(x'')\| \le M|x' - x''|, \tag{7.1.3}$$

where $B(x^*, r)$ is the ball with center $x^*$ and radius $r$, then the iterates $x_k \in B(x^*, r)$ $(k = 0, 1, \dots)$ converge to $x^*$ quadratically, i.e., $|x_{k+1} - x^*| \le C|x_k - x^*|^2$, where $C = \frac{M\|I\|}{M^* - Mr}$ $\left( r < \frac{M^*}{M} \right)$ and $M^* = \|\nabla^2 f(x^*)\|$.

In fact, by (7.1.3) and $Mr < M^*$, it follows that for $x \in B(x^*, r)$,

$$\|\nabla^2 f(x)\| = \|\nabla^2 f(x^*) + \nabla^2 f(x) - \nabla^2 f(x^*)\| \ge \|\nabla^2 f(x^*)\| - \|\nabla^2 f(x) - \nabla^2 f(x^*)\| \ge M^* - Mr.$$

Since $\nabla^2 f(x) (\nabla^2 f(x))^{-1} = I$, where $I$ is the unit matrix and $(\nabla^2 f(x))^{-1}$ is the inverse matrix of $\nabla^2 f(x)$, for $x \in B(x^*, r)$,

$$\|(\nabla^2 f(x))^{-1}\| = \frac{\|I\|}{\|\nabla^2 f(x)\|} \le \frac{\|I\|}{M^* - Mr}. \tag{7.1.4}$$

Since $\nabla f(x)$ is a continuously differentiable function, the Lagrange form of its Taylor expansion is

$$\nabla f(x_k) = \nabla f(x^*) + (x_k - x^*) \nabla^2 f(\bar{x}),$$

where $\bar{x} \in (x_k, x^*)$ or $\bar{x} \in (x^*, x_k)$. From this and $\nabla f(x^*) (\nabla^2 f(x_k))^{-1} = 0$ (since $\nabla f(x^*) = 0$), it follows that

$$\begin{aligned}
x_{k+1} - x^* &= \left( x_k - \nabla f(x_k) (\nabla^2 f(x_k))^{-1} \right) - x^* + \nabla f(x^*) (\nabla^2 f(x_k))^{-1} \\
&= (x_k - x^*) \nabla^2 f(x_k) (\nabla^2 f(x_k))^{-1} - (\nabla f(x_k) - \nabla f(x^*)) (\nabla^2 f(x_k))^{-1} \\
&= (x_k - x^*) (\nabla^2 f(x_k) - \nabla^2 f(\bar{x})) (\nabla^2 f(x_k))^{-1}.
\end{aligned}$$

By (7.1.3) and (7.1.4), it follows that for $x_k \in B(x^*, r)$,

$$|x_{k+1} - x^*| = |x_k - x^*|\, \|\nabla^2 f(x_k) - \nabla^2 f(\bar{x})\|\, \|(\nabla^2 f(x_k))^{-1}\| \le C|x_k - x^*|\, |x_k - \bar{x}|,$$

where $C = \frac{M\|I\|}{M^* - Mr}$. Note that $\bar{x} \in (x_k, x^*)$ or $\bar{x} \in (x^*, x_k)$. Clearly, $|x_k - \bar{x}| \le |x_k - x^*|$. So

$$|x_{k+1} - x^*| \le C|x_k - x^*|^2,$$

i.e., the iterates $\{x_k\}_{k=0,1,\dots} \subset B(x^*, r)$ converge to $x^*$ quadratically.

The steepest descent method was introduced by Cauchy. Unlike the classical Newton’s method, the steepest descent method chooses a descent direction along which $f$ decreases most quickly, and it includes a line minimization step.

Steepest descent method

Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be a smooth function. The steepest descent algorithm involves the following three steps:

Step 1. Start at an initial point $x_0 \in \mathbb{R}^n$.

Step 2. Stop if $\nabla f(x_k) = 0$, where $x_k \in \mathbb{R}^n$. Otherwise, the descent direction is $d_k = -\nabla f(x_k)$ $(d_k \in \mathbb{R}^n)$.

Step 3. The next iterate is $x_{k+1} = x_k + \mu_k d_k$, where the step size $\mu_k > 0$ minimizes the function $f(x_k + \mu d_k)$ with respect to $\mu$ on $[0, \infty)$. Return to Step 2; the process is repeated at $x_{k+1}$.

The steepest descent algorithm possesses the following convergence property. Either the algorithm stops after a finite number of steps at a point $x_k$ where $\nabla f(x_k) = 0$, or it generates an infinite sequence of points $\{x_k\}_{k=0,1,\dots}$. According to the Bolzano–Weierstrass theorem, the infinite sequence $\{x_k\}_{k=0,1,\dots}$ must have at least one convergent subsequence. Without loss of generality, we still denote this convergent subsequence by $\{x_k\}_{k=0,1,\dots}$. Let $x^*$ be its limit point. Now we prove $\nabla f(x^*) = 0$.

If $\nabla f(x^*) \ne 0$, then $d^* = -\nabla f(x^*)$ is a descent direction, i.e., there is a $\mu^* > 0$ such that $f(x^* + \mu d^*) < f(x^*)$ for any $\mu \in (0, \mu^*)$. Note that $d_k = -\nabla f(x_k)$ and $d^* = -\nabla f(x^*) \ne 0$. Since $f(x)$ is a smooth function, $d_k \to d^* \ne 0$. Combining this with $x_k \to x^*$, it follows from $x_{k+1} = x_k + \mu_k d_k$ that $\mu_k \to 0$. This implies $\mu_k \in (0, \mu^*)$. By the assumption that $\mu_k$ minimizes the function $f(x_k + \mu d_k)$ with respect to $\mu$ on $[0, \infty)$, and since $f(x_k)$ is independent of $\mu$, it is clear that $\mu_k$ also minimizes the function $f(x_k + \mu d_k) - f(x_k)$ with respect to $\mu$ on $[0, \infty)$. So, for any $\mu_k \in (0, \mu^*)$ and $\mu \in (0, \mu^*)$,

$$f(x_k + \mu_k d_k) - f(x_k) \le f(x_k + \mu d_k) - f(x_k). \tag{7.1.5}$$

Since $f(x)$ is smooth, $x_k \to x^*$, and $\mu_k \to 0$, the left-hand side of (7.1.5) satisfies $f(x_k + \mu_k d_k) - f(x_k) \to 0$. Since $f(x)$ is a smooth function and $d_k = -\nabla f(x_k)$, the right-hand side of (7.1.5) is

$$f(x_k + \mu d_k) - f(x_k) = \langle \mu d_k, \nabla f(x_k) \rangle + o(\mu) = -\mu \langle \nabla f(x_k), \nabla f(x_k) \rangle + o(\mu),$$

where $\frac{o(\mu)}{|\mu|} \to 0$ as $\mu \to 0$. Let $k \to \infty$ in (7.1.5) and choose $\mu$ close enough to 0. Note that $\mu > 0$ and, by assumption, $\langle \nabla f(x^*), \nabla f(x^*) \rangle > 0$. Then

$$0 \le -\mu \langle \nabla f(x^*), \nabla f(x^*) \rangle < 0.$$

This is a contradiction. Thus $\nabla f(x^*) = 0$.

Newton’s method includes a minimization step and a step size. It is a development of the classical Newton’s method and is a variant of the steepest descent method.
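The difference between the steepest descent direction and the Newton direction, both combined with an exact line minimization, can be sketched on a quadratic objective. The example is an assumption made for illustration: $f(x) = \tfrac{1}{2} x^T A x - b^T x$ in $\mathbb{R}^2$ with a symmetric positive definite $A$, for which $\nabla f(x) = Ax - b$ and the minimizing step along $d$ has the closed form $\mu = -\langle \nabla f(x), d \rangle / \langle d, Ad \rangle$. All function names are hypothetical.

```python
def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def solve2(A, r):
    """Solve the 2x2 system A z = r by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * r[0] - A[0][1] * r[1]) / det,
            (A[0][0] * r[1] - A[1][0] * r[0]) / det]


def descend(A, b, x0, use_newton_direction, tol=1e-10, max_iter=10000):
    """Line-search descent on f(x) = 0.5 x^T A x - b^T x; returns (x, steps)."""
    x, steps = list(x0), 0
    while steps < max_iter:
        g = [gi - bi for gi, bi in zip(matvec(A, x), b)]   # gradient A x - b
        if dot(g, g) ** 0.5 <= tol:
            break
        # Steepest descent: d = -g.  Newton: d = -(Hessian)^{-1} g with Hessian A.
        d = [-t for t in (solve2(A, g) if use_newton_direction else g)]
        mu = -dot(g, d) / dot(d, matvec(A, d))             # exact line minimization
        x = [x[0] + mu * d[0], x[1] + mu * d[1]]
        steps += 1
    return x, steps


A = [[10.0, 2.0], [2.0, 1.0]]   # assumed symmetric positive definite matrix
b = [1.0, 1.0]
x_sd, steps_sd = descend(A, b, [0.0, 0.0], use_newton_direction=False)
x_nt, steps_nt = descend(A, b, [0.0, 0.0], use_newton_direction=True)
```

On a quadratic the Newton direction reaches the minimizer in a single step with $\mu = 1$, whereas the gradient direction needs many steps when $A$ is poorly conditioned.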


Newton’s method

Let $f\colon \mathbb{R}^n \to \mathbb{R}$ be a twice smooth function and let its Hessian matrix

$$\nabla^2 f(x) = \begin{pmatrix}
\dfrac{\partial^2 f(x)}{\partial x_1^2} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\[2mm]
\dfrac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f(x)}{\partial x_n \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f(x)}{\partial x_n^2}
\end{pmatrix}$$

be positive definite, where $x = (x_1, \dots, x_n)$. Newton’s algorithm involves the following three steps:

Step 1. Start from an initial point $x_0 \in \mathbb{R}^n$.

Step 2. Stop if $\nabla f(x_k) = 0$, where $x_k \in \mathbb{R}^n$. Otherwise, the Newton direction is

$$d_k = -\nabla f(x_k) (\nabla^2 f(x_k))^{-1} \qquad (d_k \in \mathbb{R}^n),$$

where $(\nabla^2 f(x))^{-1}$ is the inverse matrix of the Hessian matrix.

Step 3. The next iterate is $x_{k+1} = x_k + \mu_k d_k$, where the step size $\mu_k > 0$ minimizes the function $f(x_k + \mu d_k)$ with respect to $\mu$ on $[0, \infty)$. Return to Step 2; the process is repeated at $x_{k+1}$.

Newton’s algorithm has the same convergence property as the steepest descent algorithm. Either it stops after a finite number of steps at a point $x_k$ where $\nabla f(x_k) = 0$, or it generates an infinite sequence of points $\{x_k\}_{k=0,1,\dots}$. By the Bolzano–Weierstrass theorem, the infinite sequence $\{x_k\}_{k=0,1,\dots}$ must have at least one convergent subsequence. Without loss of generality, we still denote this convergent subsequence by $\{x_k\}_{k=0,1,\dots}$. Let $x^*$ be its limit point. Now we prove $\nabla f(x^*) = 0$.

If $\nabla f(x^*) \ne 0$, then, since the Hessian matrix is positive definite, it is invertible and its inverse matrix is also positive definite. By the definition of positive definiteness, the inverse matrix satisfies

$$\langle \nabla f(x^*), \nabla f(x^*) (\nabla^2 f(x^*))^{-1} \rangle > 0, \tag{7.1.6}$$

or $\langle \nabla f(x^*), d^* \rangle < 0$, where $d^* = -\nabla f(x^*) (\nabla^2 f(x^*))^{-1}$. By Definition 7.1.1, $d^*$ is also a descent direction. So there is a $\mu^* > 0$ such that $f(x^* + \mu d^*) < f(x^*)$ for any $\mu \in (0, \mu^*)$. Note that

$$d_k = -\nabla f(x_k) (\nabla^2 f(x_k))^{-1}, \qquad d^* = -\nabla f(x^*) (\nabla^2 f(x^*))^{-1} \ne 0.$$

Since $f\colon \mathbb{R}^n \to \mathbb{R}$ is a twice smooth function and $x_k \to x^*$, we have $d_k \to d^* \ne 0$. Combining this with $x_k \to x^*$, it follows from $x_{k+1} = x_k + \mu_k d_k$ that $\mu_k \to 0$. This implies $\mu_k \in (0, \mu^*)$.


By assumption, $\mu_k$ minimizes the function $f(x_k + \mu d_k)$ with respect to $\mu$ on $[0, \infty)$; since $f(x_k)$ is independent of $\mu$, $\mu_k$ also minimizes the function $f(x_k + \mu d_k) - f(x_k)$ with respect to $\mu$ on $[0, \infty)$. This implies that for any $\mu_k \in (0, \mu^*)$ and $\mu \in (0, \mu^*)$,

$$f(x_k + \mu_k d_k) - f(x_k) \le f(x_k + \mu d_k) - f(x_k). \tag{7.1.7}$$

Since $f(x)$ is smooth, $x_k \to x^*$, and $\mu_k \to 0$, the left-hand side of (7.1.7) satisfies $f(x_k + \mu_k d_k) - f(x_k) \to 0$. Since $f(x)$ is smooth and $d_k = -\nabla f(x_k) (\nabla^2 f(x_k))^{-1}$, the right-hand side of (7.1.7) is

$$f(x_k + \mu d_k) - f(x_k) = \langle \nabla f(x_k), \mu d_k \rangle + o(\mu) = -\mu \langle \nabla f(x_k), \nabla f(x_k) (\nabla^2 f(x_k))^{-1} \rangle + o(\mu),$$

where $\frac{o(\mu)}{|\mu|} \to 0$ as $\mu \to 0$. Let $k \to \infty$ in (7.1.7) and choose $\mu$ close enough to 0. Noting that $\mu > 0$ and using (7.1.6), it is clear that

$$0 \le -\mu \langle \nabla f(x^*), \nabla f(x^*) (\nabla^2 f(x^*))^{-1} \rangle < 0.$$

This is a contradiction. Thus $\nabla f(x^*) = 0$.

Denote $G(x) = \nabla f(x)$ and $G_k(x) = \frac{\partial f(x)}{\partial x_k}$. The Newton method can be viewed as solving a system of $n$ linear or nonlinear equations in $n$ unknown variables,

$$G(x) = \begin{pmatrix} G_1(x) \\ \vdots \\ G_n(x) \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix},$$

with the Jacobian matrix

$$\nabla G(x) = \begin{pmatrix}
\dfrac{\partial G_1(x)}{\partial x_1} & \cdots & \dfrac{\partial G_1(x)}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial G_n(x)}{\partial x_1} & \cdots & \dfrac{\partial G_n(x)}{\partial x_n}
\end{pmatrix},$$

and Newton’s algorithm involves a line minimization. Inspired by it, we have the following method.

Newton–Raphson method

Assume that $\nabla G(x)$ is invertible. The algorithm involves three steps:

Step 1. Start from an initial point $x_0 \in \mathbb{R}^n$.

Step 2. Stop if $G(x_k) = 0$, where $x_k \in \mathbb{R}^n$. Otherwise, the Newton–Raphson direction is

$$d_k = -G(x_k) (\nabla G(x_k))^{-1} \qquad (d_k \in \mathbb{R}^n),$$

where $(\nabla G(x))^{-1}$ is the inverse matrix of the Jacobian matrix $\nabla G(x)$.

Step 3. The next iterate is $x_{k+1} = x_k + d_k$. Return to Step 2; the process is repeated at $x_{k+1}$.

The Newton–Raphson method has the same local convergence property as the classical Newton method. Let $G\colon \mathbb{R}^n \to \mathbb{R}^n$ be a twice smooth function with $G(x^*) = 0$ and $\nabla G(x^*)$ invertible. If there are $r > 0$ and $M \ge 0$ such that for any $x', x'' \in B(x^*, r)$,

$$\|\nabla G(x') - \nabla G(x'')\| \le M|x' - x''|,$$

then the iterates $\{x_k\}_{k=0,1,\dots} \subset B(x^*, r)$ converge to $x^*$ quadratically, i.e., $|x_{k+1} - x^*| \le C|x_k - x^*|^2$, where $C = \frac{M\|I\|}{M^* - Mr}$ $\left( r < \frac{M^*}{M} \right)$ and $M^* = \|\nabla G(x^*)\|$.

In fact, similar to the argument for (7.1.4), for $x \in B(x^*, r)$,

$$\|(\nabla G(x))^{-1}\| = \frac{\|I\|}{\|\nabla G(x)\|} \le \frac{\|I\|}{M^* - Mr}.$$

Since $G(x)$ is continuously differentiable, the Lagrange form of the Taylor expansion shows that for $x_k \in B(x^*, r)$,

$$G(x_k) = G(x^*) + (x_k - x^*) \nabla G(\bar{x}),$$

where $\bar{x} \in (x_k, x^*)$ or $\bar{x} \in (x^*, x_k)$. From this and $G(x^*) (\nabla G(x_k))^{-1} = 0$ (since $G(x^*) = 0$), it follows that

$$\begin{aligned}
x_{k+1} - x^* &= x_k + d_k - x^* + G(x^*) (\nabla G(x_k))^{-1} \\
&= x_k - G(x_k) (\nabla G(x_k))^{-1} - x^* + G(x^*) (\nabla G(x_k))^{-1} \\
&= (x_k - x^*) (\nabla G(x_k) - \nabla G(\bar{x})) (\nabla G(x_k))^{-1},
\end{aligned}$$

and so

$$|x_{k+1} - x^*| = |x_k - x^*|\, \|\nabla G(x_k) - \nabla G(\bar{x})\|\, \|(\nabla G(x_k))^{-1}\| \le C|x_k - x^*|^2,$$

where $C = \frac{M\|I\|}{M^* - Mr}$, i.e., the iterates $\{x_k\}_{k=0,1,\dots} \subset B(x^*, r)$ converge to $x^*$ quadratically.
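The three steps of the Newton–Raphson method can be sketched for a system of two equations. Two assumptions are made here that are not in the text: the example system $G(x, y) = (x^2 + y^2 - 4,\ xy - 1)$ is chosen for illustration, and the step is computed in the usual column-vector form, i.e., by solving $\nabla G(x_k)\, d_k = -G(x_k)$ with Cramer's rule instead of forming the inverse matrix. The function name `newton_raphson_2d` is hypothetical.

```python
def newton_raphson_2d(G, J, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson iteration for G(x, y) = (0, 0) with Jacobian J."""
    x, y = x0
    for _ in range(max_iter):
        g1, g2 = G(x, y)
        if max(abs(g1), abs(g2)) <= tol:       # Step 2: stop when G(x_k) = 0
            break
        (a, b), (c, d) = J(x, y)
        det = a * d - b * c
        if det == 0.0:
            raise ZeroDivisionError("Jacobian is singular")
        # Solve J * (dx, dy)^T = -(g1, g2)^T by Cramer's rule.
        dx = (-g1 * d + g2 * b) / det
        dy = (-g2 * a + g1 * c) / det
        x, y = x + dx, y + dy                   # Step 3: x_{k+1} = x_k + d_k
    return x, y


# Assumed example system: a circle of radius 2 intersected with xy = 1.
G = lambda x, y: (x * x + y * y - 4.0, x * y - 1.0)
J = lambda x, y: ((2.0 * x, 2.0 * y), (y, x))
root_x, root_y = newton_raphson_2d(G, J, x0=(2.0, 0.5))
```

Starting close to a root, the residual $G(x_k)$ shrinks roughly quadratically, in line with the local convergence property stated above.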

7.2 The variational method

The variational method is a useful method for solving the optimization problem of functionals, i.e., for finding a function $f$ such that the functional $\upsilon(f)$ with boundary conditions attains the minimal value:

$$\min \upsilon(f) \quad \text{subject to} \quad
\begin{cases}
\upsilon(f) = \int_a^b F(x, f(x), f'(x))\,dx, \\
f(a) = y_0, \quad f(b) = y_1 \quad \text{(boundary conditions)},
\end{cases} \tag{7.2.1}$$

where $F$ is a second-order differentiable function and $f$ is a second-order continuously differentiable function. For the optimization problem (7.2.1), if the functional $\upsilon(f)$ attains the minimal value, then $f$ must satisfy the Euler equation

$$F_f - \frac{d}{dx} F_{f'} = 0.$$
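As a short worked illustration of the Euler equation (the integrand is an assumption chosen here for the example, not taken from the text), consider the arc-length functional, whose minimizer should be the straight line joining $(a, y_0)$ and $(b, y_1)$:

```latex
% Illustrative example with an assumed integrand: the shortest curve
% joining (a, y_0) and (b, y_1).
\[
F(x, f, f') = \sqrt{1 + (f')^2}, \qquad
F_f = 0, \qquad
F_{f'} = \frac{f'}{\sqrt{1 + (f')^2}} .
\]
\[
F_f - \frac{d}{dx} F_{f'} = 0
\;\Longrightarrow\;
\frac{f'}{\sqrt{1 + (f')^2}} = \text{const}
\;\Longrightarrow\;
f' = \text{const},
\]
\[
f(x) = y_0 + \frac{y_1 - y_0}{b - a}\,(x - a).
\]
```

The two boundary conditions of (7.2.1) fix the two constants of integration.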

In fact, for a small parameter $\alpha$, let

$$I(\alpha) = \int_a^b F\big(x, f(x) + \alpha\eta(x), f'(x) + \alpha\eta'(x)\big)\,dx,$$

where $\eta$ is any differentiable function and $\eta(a) = \eta(b) = 0$. Clearly, $I(0) = \upsilon(f)$. Differentiating both sides,

$$I'(\alpha) = \int_a^b \big(F_{f + \alpha\eta}\,\eta(x) + F_{f' + \alpha\eta'}\,\eta'(x)\big)\,dx.$$

Let $\alpha = 0$. Then

$$I'(0) = \int_a^b \big(F_f\,\eta(x) + F_{f'}\,\eta'(x)\big)\,dx = \int_a^b F_f\,\eta(x)\,dx + \int_a^b F_{f'}\,\eta'(x)\,dx.$$

By integration by parts, the second integral on the right-hand side is

$$\int_a^b F_{f'}\,\eta'(x)\,dx = F_{f'}\,\eta(x)\Big|_a^b - \int_a^b \Big(\frac{d}{dx}F_{f'}\Big)\eta(x)\,dx = -\int_a^b \Big(\frac{d}{dx}F_{f'}\Big)\eta(x)\,dx.$$

Therefore,

$$I'(0) = \int_a^b F_f\,\eta(x)\,dx - \int_a^b \Big(\frac{d}{dx}F_{f'}\Big)\eta(x)\,dx = \int_a^b \Big(F_f - \frac{d}{dx}F_{f'}\Big)\eta(x)\,dx.$$

If $\upsilon(f)$ attains the minimal value, then $I'(0) = 0$, i.e.,

$$\int_a^b \eta(x)\Big(F_f - \frac{d}{dx}F_{f'}\Big)\,dx = 0.$$

Since $\eta(x)$ is arbitrary, this implies that $f$ must satisfy $F_f - \frac{d}{dx}F_{f'} = 0$.

The generalization of the optimization problem (7.2.1) is as follows:

$$\min \upsilon(f) \quad \text{subject to} \quad
\begin{cases}
\upsilon(f) = \int_a^b F\big(x, f_1(x), \dots, f_n(x), f_1'(x), \dots, f_n'(x)\big)\,dx, \\
f_i(a) = y_{i0}, \quad f_i(b) = y_{i1} \quad (i = 1, \dots, n) \quad \text{(boundary conditions)},
\end{cases}$$

where $F$ is a second-order differentiable function and $f_i$ $(i = 1, \dots, n)$ are second-order continuously differentiable functions. Similarly, for the generalized optimization problem, if the functional $\upsilon(f)$ attains the minimal value, then $f_i$ must satisfy the Euler equation

$$F_{f_i} - \frac{d}{dx}F_{f_i'} = 0 \quad (i = 1, \dots, n).$$

The conditional optimization problem of functionals is to find $f, g$ such that the functional $\upsilon(f, g)$ with an additional condition and boundary conditions attains the minimal value, i.e., to solve the following conditional optimization problem:

$$\min \upsilon(f, g) \quad \text{subject to} \quad
\begin{cases}
\upsilon(f, g) = \int_a^b F(x, f, f', g, g')\,dx, \\
G(x, f, g) = 0, \\
f(a) = y_0, \quad f(b) = y_1, \quad g(a) = z_0, \quad g(b) = z_1,
\end{cases}$$

where $G(x, f, g) = 0$ is the additional condition and $f(a) = y_0$, $f(b) = y_1$, $g(a) = z_0$, and $g(b) = z_1$ are boundary conditions. If the functional $\upsilon(f, g)$ attains the minimal value, then $f$ and $g$ must satisfy

$$\frac{d}{dx}F^*_{f'} - F^*_f = 0, \qquad \frac{d}{dx}F^*_{g'} - F^*_g = 0,$$

where $F^* = F + \lambda(x)G$ and

$$\lambda(x) = \frac{\frac{d}{dx}F_{f'} - F_f}{G_f} = \frac{\frac{d}{dx}F_{g'} - F_g}{G_g}.$$

In fact, the additional condition $G(x, f, g) = 0$ determines that $g$ is a function of $x$ and $f$, say, $g = \varphi(x, f)$. So

$$\begin{aligned}
F(x, f, f', g, g') &= F(x, f, f', \varphi, \varphi_x + \varphi_f f') =: \tilde{F}(x, f, f'), \\
\upsilon(f, g) &= \upsilon(f, \varphi(x, f)) =: \tilde{\upsilon}(f), \\
\varphi(a, y_0) &= z_0, \qquad \varphi(b, y_1) = z_1.
\end{aligned}$$

Then the conditional optimization problem of functionals is reduced to the optimization problem (7.2.1), i.e.,

$$\min \tilde{\upsilon}(f) \quad \text{subject to} \quad
\begin{cases}
\tilde{\upsilon}(f) = \int_a^b \tilde{F}(x, f, f')\,dx, \\
f(a) = y_0, \quad f(b) = y_1.
\end{cases}$$

If the functional $\upsilon(f, g)$ attains the minimal value, i.e., $\tilde{\upsilon}(f)$ attains the minimal value, then the Euler equation

$$\frac{d}{dx}\tilde{F}_{f'} - \tilde{F}_f = 0$$

holds. Note that

$$\begin{aligned}
\tilde{F}_f &= F_f + F_g\varphi_f + F_{g'}(\varphi_{xf} + \varphi_{ff}f'), \\
\tilde{F}_{f'} &= F_{f'} + F_{g'}\varphi_f, \\
\frac{d}{dx}\tilde{F}_{f'} &= \frac{d}{dx}F_{f'} + \varphi_f\frac{d}{dx}F_{g'} + F_{g'}(\varphi_{fx} + \varphi_{ff}f').
\end{aligned}$$

The Euler equation becomes

$$\frac{d}{dx}F_{f'} - F_f + \varphi_f\Big(\frac{d}{dx}F_{g'} - F_g\Big) = 0.$$

On the other hand, differentiating the equation $G(x, f, g) = 0$ with respect to $f$ gives $G_f + G_g\varphi_f = 0$, or $\varphi_f = -G_f/G_g$. So the Euler equation further becomes

$$\frac{d}{dx}F_{f'} - F_f - \frac{G_f}{G_g}\Big(\frac{d}{dx}F_{g'} - F_g\Big) = 0,$$

i.e.,

$$\frac{\frac{d}{dx}F_{f'} - F_f}{G_f} = \frac{\frac{d}{dx}F_{g'} - F_g}{G_g} = \lambda(x),$$

which is equivalent to

$$\frac{d}{dx}F_{f'} - \big(F_f + \lambda(x)G_f\big) = 0, \qquad \frac{d}{dx}F_{g'} - \big(F_g + \lambda(x)G_g\big) = 0. \tag{7.2.2}$$

Let $F^* = F + \lambda(x)G$. Note that

$$\frac{d}{dx}F^*_{f'} = \frac{d}{dx}F_{f'}, \qquad \frac{d}{dx}F^*_{g'} = \frac{d}{dx}F_{g'}, \qquad F^*_f = F_f + \lambda(x)G_f, \qquad F^*_g = F_g + \lambda(x)G_g.$$

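Then, the conditions (7.2.2) are further equivalent to the following conditions:

$$\frac{d}{dx}F^*_{f'} - F^*_f = 0, \qquad \frac{d}{dx}F^*_{g'} - F^*_g = 0.$$

The variational method is frequently used in data compression by combining with dyadic wavelet transform. Denote by $\theta(t)$ the Gauss function, i.e.,

$$\theta(t) = \frac{1}{\sqrt{2\pi\alpha}}\,e^{-\frac{t^2}{2\alpha}} \quad (\alpha > 0).$$

It is clear that

$$\int_{\mathbb{R}} \theta(t)\,dt = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\alpha}}\,e^{-\frac{t^2}{2\alpha}}\,dt = 1.$$

Let $\theta_\lambda(t) = \frac{1}{\lambda}\theta\big(\frac{t}{\lambda}\big)$ $(\lambda > 0)$. Using change of variables $t = \lambda u$ gives

$$\int_{\mathbb{R}} \theta_\lambda(t)\,dt = \int_{-\infty}^{\infty} \frac{1}{\lambda}\,\theta\Big(\frac{t}{\lambda}\Big)\,dt = \int_{-\infty}^{\infty} \theta(u)\,du = 1.$$

Let $f$ be a one-dimensional signal. The convolution of $f$ and $\theta_\lambda$ is a mean with weight $\theta_\lambda$,

$$(f * \theta_\lambda)(t) = \int_{\mathbb{R}} f(\tau)\,\theta_\lambda(t - \tau)\,d\tau,$$

and is an infinitely differentiable function. Note that $\theta_{2^{-m}}(t) = 2^m\theta(2^m t)$. Then

$$\frac{d}{dt}(f * \theta_{2^{-m}})(t) = 2^m \frac{d}{dt}\Big(\int_{\mathbb{R}} f(\tau)\,\theta(2^m(t - \tau))\,d\tau\Big) = 2^{2m}\int_{\mathbb{R}} f(\tau)\,\theta'(2^m(t - \tau))\,d\tau.$$

Let $\psi(t) = -\theta'(t)$. Its Fourier transform satisfies

$$A \le \sum_{m \in \mathbb{Z}} \Big|\hat{\psi}\Big(\frac{\omega}{2^m}\Big)\Big|^2 \le B \quad \text{(stability condition)},$$

where $A$ and $B$ are positive constants. Since $\theta(t)$ is an even function, $\psi$ is an odd function and

$$\frac{d}{dt}(f * \theta_{2^{-m}})(t) = 2^m (W_\psi^m f)(t) \quad (m \in \mathbb{Z}),$$

where $W_\psi^m f$ is the dyadic wavelet transform.

Assume that $|(W_\psi^m f)(t)|$ attains the maximal values on points $\{t_n^m\}_{n,m \in \mathbb{Z}}$. Denote by $\{X_n^m\}_{m,n \in \mathbb{Z}}$ these maximal values on points $\{t_n^m\}_{n,m \in \mathbb{Z}}$. Once $W_\psi^m f$ $(m \in \mathbb{Z})$ are found by data $\{t_n^m, X_n^m\}_{m,n \in \mathbb{Z}}$, the signal $f$ is obtained immediately using the inversion formula of dyadic wavelet transform. Thus, it remains to reconstruct $W_\psi^m f$ by data $\{t_n^m, X_n^m\}_{n,m \in \mathbb{Z}}$. This is reduced to finding $h(t)$ satisfying the following:

$$\begin{aligned}
(W_\psi^m h)(t_n^m) &= X_n^m \quad (m, n \in \mathbb{Z}), \\
\int_{\mathbb{R}} 2^m h(t)\,\psi\big(2^m(t - t_n^m)\big)\,dt &= X_n^m.
\end{aligned} \tag{7.2.3}$$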

Since the values of $|W_\psi^m h|$ at $t_n^m$ are known, the first equation of (7.2.3) can be replaced approximately by finding $h$ such that the integral

$$\|W_\psi^m h\|_2^2 = \int_{\mathbb{R}} |(W_\psi^m h)(t)|^2\,dt \quad (m \in \mathbb{Z})$$

is as small as possible, and the integral

$$\Big\|\frac{d}{dt}W_\psi^m h\Big\|_2^2 = \int_{\mathbb{R}} \Big|\frac{d}{dt}(W_\psi^m h)(t)\Big|^2\,dt$$

is also required to be as small as possible so that no new extreme points occur. Both require that

$$\|h\|_*^2 = \sum_m \Big(\|W_\psi^m h\|_2^2 + 2^{-2m}\Big\|\frac{d}{dt}W_\psi^m h\Big\|_2^2\Big)$$

attains the minimal value. It is easy to prove that $\|h\|_* < \infty$ for $f \in L^2(\mathbb{R})$. Let the space $K$ consist of all differentiable function sequences $F = \{F_m\}_{m \in \mathbb{Z}}$ satisfying

$$\sum_m \big(\|F_m\|_2^2 + 2^{-2m}\|F_m'\|_2^2\big) < \infty.$$

For $\{F_m\}_{m \in \mathbb{Z}} \in K$, define a bounded linear operator $W_\psi^{-1}$ from $K$ to $L^2(\mathbb{R})$ by

$$W_\psi^{-1}(F) = \frac{1}{2\pi}\sum_m F_m * H_m^*,$$

where $H_m^*(t) = 2^m\psi^*(2^m t)$ and $\psi^*$ is the dyadic dual wavelet of $\psi$. Let $\Gamma$ be the closed subset of $K$ consisting of the sequences $F = \{F_m\}_{m \in \mathbb{Z}}$ that satisfy the condition $F_m(t_n^m) = X_n^m$ $(m, n \in \mathbb{Z})$, and let $V$ denote the space of dyadic wavelet transforms $W_\psi f$, where $f \in L^2(\mathbb{R})$. Then it is desired to find $h \in V \cap \Gamma$ such that $\|h\|_K$ attains the minimal value.

Define a bounded linear operator from $K$ to $V$ as $P_V = W_\psi \circ W_\psi^{-1}$. When $F \in V$, $P_V F = F$. Since $\psi$ is a real-valued odd function, it can be proved that $P_V$ is an orthogonal projection operator from $K$ to $V$.

Now we find the projection of $F \in K$ onto the closed subset $\Gamma$. Let $P_\Gamma(F) = h$. Then $\|F - h^*\|_K^2$ $(h^* \in \Gamma)$ attains the minimal value at $h^* = h = \{h_m\}_{m \in \mathbb{Z}}$. Let

$$\varepsilon_m(t) = F_m(t) - h_m^*(t). \tag{7.2.4}$$

Then

$$\|F - h^*\|_K^2 = \sum_m \big(\|\varepsilon_m\|_2^2 + 2^{-2m}\|\varepsilon_m'\|_2^2\big).$$

When $h_m^* = h_m$, this formula attains the minimal value. Note that

$$\varepsilon_m(t_n^m) = F_m(t_n^m) - X_n^m, \qquad \varepsilon_m(t_{n+1}^m) = F_m(t_{n+1}^m) - X_{n+1}^m. \tag{7.2.5}$$

When $h_m^* = h_m$, the integral

$$\int_{t_n^m}^{t_{n+1}^m} \big(|\varepsilon_m(t)|^2 + 2^{-2m}|\varepsilon_m'(t)|^2\big)\,dt \tag{7.2.6}$$

attains the minimal value. Let $y = \varepsilon_m(t)$. Take $H = y^2 + 2^{-2m}(y')^2$. Then

$$H_y = 2\varepsilon_m(t), \qquad \frac{d}{dt}H_{y'} = \frac{d}{dt}\big(2^{-2m+1}y'\big) = 2^{-2m+1}\varepsilon_m''(t),$$

and

$$\int_{t_n^m}^{t_{n+1}^m} \big(|\varepsilon_m(t)|^2 + 2^{-2m}|\varepsilon_m'(t)|^2\big)\,dt = \int_{t_n^m}^{t_{n+1}^m} H(y(t), y'(t))\,dt.$$

The variational method says that if $y$ satisfies the Euler equation $H_y - \frac{d}{dt}H_{y'} = 0$, then the integral

$$\int_{t_n^m}^{t_{n+1}^m} H(y(t), y'(t))\,dt$$

attains the minimal value. Thus, if $\varepsilon_m(t)$ satisfies the Euler equation

$$\varepsilon_m(t) - 2^{-2m}\varepsilon_m''(t) = 0 \quad (t_n^m \le t \le t_{n+1}^m) \tag{7.2.7}$$

and (7.2.5), then the integral (7.2.6) attains the minimal value. The solution of (7.2.7) is

$$\varepsilon_m(t) = \alpha_{m,n}\,e^{2^m t} + \beta_{m,n}\,e^{-2^m t} \quad (t_n^m \le t \le t_{n+1}^m),$$

where $\alpha_{m,n}$ and $\beta_{m,n}$ are determined by (7.2.5). This implies from $P_\Gamma(F) = h$ and (7.2.4) that

$$P_\Gamma(\{F_m\}) = h_m(t) = F_m(t) - \big(\alpha_{m,n}\,e^{2^m t} + \beta_{m,n}\,e^{-2^m t}\big) \quad (t_n^m \le t \le t_{n+1}^m,\ n \in \mathbb{Z}).$$

Finally, denote $P = P_V \circ P_\Gamma$. Let $F$ be a sequence of null functions. Then $W_\psi^m f$ $(m \in \mathbb{Z})$ are reconstructed by data $\{t_n^m, X_n^m\}_{n,m \in \mathbb{Z}}$ as follows:

$$\{(W_\psi^m f)(t)\} \approx \lim_{\nu \to \infty} P(\psi) \circ P(\psi) \circ \cdots \circ P(\psi),$$

where $\nu$ is the number of $P(\psi)$.

Finally, the signal $f$ is reconstructed immediately by using the inversion formula of dyadic wavelet transform.
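The coefficients $\alpha_{m,n}$ and $\beta_{m,n}$ in the solution of (7.2.7) follow from the two boundary values in (7.2.5) by solving a 2×2 linear system. The following is a minimal sketch of that single step only; the function names and the numerical values are hypothetical and chosen for illustration.

```python
import math

def residual_coefficients(m, t0, t1, e0, e1):
    """Solve alpha*exp(s*t) + beta*exp(-s*t) = e at t = t0 and t = t1, s = 2**m.

    e0 and e1 play the role of the boundary values in (7.2.5).
    """
    s = 2.0 ** m
    a0, a1 = math.exp(s * t0), math.exp(s * t1)
    det = a0 / a1 - a1 / a0          # determinant of the 2x2 system
    alpha = (e0 / a1 - e1 / a0) / det
    beta = (a0 * e1 - a1 * e0) / det
    return alpha, beta

def epsilon(t, m, alpha, beta):
    """Evaluate alpha*exp(2^m t) + beta*exp(-2^m t)."""
    s = 2.0 ** m
    return alpha * math.exp(s * t) + beta * math.exp(-s * t)

# Hypothetical data: scale m = 1, interval [0, 1], boundary residuals 0.5 and -0.25.
m, t0, t1, e0, e1 = 1, 0.0, 1.0, 0.5, -0.25
alpha, beta = residual_coefficients(m, t0, t1, e0, e1)
```

For large $2^m (t_{n+1}^m - t_n^m)$ the exponentials overflow in floating point; an equivalent form in terms of hyperbolic sines would be more robust, which is a design choice outside the text.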

7.3 The simplex method

The simplex method is applied for solving linear optimization models. Linear optimization models may have an objective function that is either to be maximized or to be minimized; they may have variables that are either nonnegative or free; and they may have constraints that are either equalities or inequalities.

7.3.1 Pivot operations

Pivot operations are used for solving systems of linear equations or for solving linear optimization models. Here consider a system of $m$ linear equations with $n$ unknown variables

$$\begin{cases}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2, \\
\qquad\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m.
\end{cases}$$

Any entry $(i, j)$ consists of the row index $i$ and the column index $j$. The entry $(i, j)$ chosen in order to obtain a new system of linear equations from the current system is called a pivot entry. The resulting calculation of the new system from the current system is called a pivot operation. An indexing function $\phi$ of the row indices into the column indices records that for each row $k$ $(k = 1, \dots, m)$, the $\phi(k)$-th column has coefficient 1 in the entry $(k, \phi(k))$ and coefficient 0 in the entries $(i, \phi(k))$ $(i = 1, \dots, m,\ i \ne k)$.

Consider the $k$-th equation of the current system

$$a_{k1}x_1 + a_{k2}x_2 + \cdots + a_{kn}x_n = b_k.$$

There are three cases.

Case 1. $a_{kj} = 0$ for all $j = 1, \dots, n$ and $b_k = 0$. Then any values $x_1, \dots, x_n$ satisfy the $k$-th equation. The $k$-th equation is vacuous. In this case, define the indexing function as $\phi(k) = 0$.

Case 2. $a_{kj} = 0$ for all $j = 1, \dots, n$ but $b_k \ne 0$. In this case, the $k$-th equation has no solution.

Case 3. At least one coefficient $a_{kj}$ is nonzero. The gradient rule advises to choose the column index $l$ so that the absolute value of $a_{kl}$ $(a_{kl} \ne 0)$ is as large as possible. In this case, the $(k, l)$ entry is chosen as a pivot entry (i.e., $\phi(k) = l$), and then a pivot operation is performed by Gauss–Jordan elimination. This produces a new linear system from the current system. The process is as follows.

Divide the $k$-th equation by $a_{kl}$ to produce the new $k$-th equation, where the coefficient of $x_l$ is 1,

$$a_{k1}^{(1)}x_1 + \cdots + a_{k,l-1}^{(1)}x_{l-1} + x_l + a_{k,l+1}^{(1)}x_{l+1} + \cdots + a_{kn}^{(1)}x_n = b_k^{(1)}, \qquad \phi(k) = l,$$

where

$$a_{kj}^{(1)} = \frac{a_{kj}}{a_{kl}} \quad (j = 1, \dots, n), \qquad b_k^{(1)} = \frac{b_k}{a_{kl}}.$$

Multiply the new $k$-th equation by $a_{il}$ $(i = 1, \dots, m;\ i \ne k)$ so that the coefficient of $x_l$ becomes $a_{il}$, and then subtract this equation from the $i$-th equation of the current system. This produces the new $i$-th equation, where the coefficient of $x_l$ is 0,

$$a_{i1}^{(1)}x_1 + \cdots + a_{i,l-1}^{(1)}x_{l-1} + 0x_l + a_{i,l+1}^{(1)}x_{l+1} + \cdots + a_{in}^{(1)}x_n = b_i^{(1)},$$

where for $i = 1, \dots, m$, $i \ne k$,

$$a_{ij}^{(1)} = a_{ij} - \frac{a_{kj}}{a_{kl}}a_{il} \quad (j = 1, \dots, n), \qquad b_i^{(1)} = b_i - \frac{b_k}{a_{kl}}a_{il}.$$

Therefore, the resulting new system from the current system by a pivot operation is

$$\begin{cases}
a_{11}^{(1)}x_1 + \cdots + a_{1,l-1}^{(1)}x_{l-1} + 0 + a_{1,l+1}^{(1)}x_{l+1} + \cdots + a_{1n}^{(1)}x_n = b_1^{(1)}, \\
\qquad\vdots \\
a_{k1}^{(1)}x_1 + \cdots + a_{k,l-1}^{(1)}x_{l-1} + x_l + a_{k,l+1}^{(1)}x_{l+1} + \cdots + a_{kn}^{(1)}x_n = b_k^{(1)}, & \phi(k) = l, \\
\qquad\vdots \\
a_{m1}^{(1)}x_1 + \cdots + a_{m,l-1}^{(1)}x_{l-1} + 0 + a_{m,l+1}^{(1)}x_{l+1} + \cdots + a_{mn}^{(1)}x_n = b_m^{(1)},
\end{cases}$$

where

$$a_{kj}^{(1)} = \frac{a_{kj}}{a_{kl}} \quad (j = 1, \dots, n), \qquad b_k^{(1)} = \frac{b_k}{a_{kl}},$$

and for $i = 1, \dots, m$, $i \ne k$,

$$a_{ij}^{(1)} = a_{ij} - \frac{a_{kj}}{a_{kl}}a_{il} \quad (j = 1, \dots, n), \qquad b_i^{(1)} = b_i - \frac{b_k}{a_{kl}}a_{il},$$

which are called pivoting formulas. These formulas are extremely important for solving linear systems and linear optimization models. It can be verified that the new system and the current system are equivalent.
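The pivoting formulas translate directly into a few lines of code. The sketch below is illustrative only: the function name `pivot`, the 0-based indices, and the small 2×2 system are assumptions, and exact rational arithmetic is used so that the result can be compared with a hand calculation.

```python
from fractions import Fraction

def pivot(rows, k, l):
    """Return the system obtained by a Gauss-Jordan pivot on entry (k, l).

    rows[i] = [a_i1, ..., a_in, b_i]; k and l are 0-based here.
    """
    akl = rows[k][l]
    if akl == 0:
        raise ValueError("pivot entry must be nonzero")
    new_k = [v / akl for v in rows[k]]        # a_kj / a_kl and b_k / a_kl
    new_rows = []
    for i, row in enumerate(rows):
        if i == k:
            new_rows.append(new_k)
        else:
            ail = row[l]
            # a_ij - (a_kj / a_kl) a_il and b_i - (b_k / a_kl) a_il
            new_rows.append([v - ail * w for v, w in zip(row, new_k)])
    return new_rows

# Hypothetical system: 2 x1 + x2 = 5 and x1 + 3 x2 = 10.
system = [[Fraction(2), Fraction(1), Fraction(5)],
          [Fraction(1), Fraction(3), Fraction(10)]]
step1 = pivot(system, 0, 0)   # phi(1) = 1
step2 = pivot(step1, 1, 1)    # phi(2) = 2
```

After the two pivots each row contains a single unit coefficient, so the right-hand sides of `step2` are the solution $x_1 = 1$, $x_2 = 3$.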

7.3.2 Basic variables and nonbasic variables

The standard linear optimization model with equality constraints and nonnegative variables is of the form

$$(L):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + c_1x_1 + c_2x_2 + \cdots + c_nx_n = d, \\
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1, \\
\qquad\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m, \\
x_1 \ge 0,\ x_2 \ge 0,\ \dots,\ x_n \ge 0,
\end{cases}$$

where $b_i$, $c_j$ have any values. Assume that for each $k$ $(k = 1, \dots, m)$, the coefficients of the $\phi(k)$-th column satisfy

$$a_{k\phi(k)} = 1, \qquad a_{i\phi(k)} = 0 \quad (i = 1, \dots, m,\ i \ne k), \qquad c_{\phi(k)} = 0,$$

where $\phi$ is an indexing function of row indices into column indices. Then the corresponding variables $x_{\phi(k)}$ are called basic variables and the remaining variables are called nonbasic variables. The solution

$$\begin{cases}
x_{\phi(k)} = b_k & (k = 1, \dots, m), \\
x_j = 0 & \text{otherwise}
\end{cases}$$

is called the basic solution of the model $(L)$.

The objective function of the model $(L)$ is rewritten as $y = -c_1x_1 - \cdots - c_nx_n + d$. Under the above assumption condition, the coefficients of basic variables are equal to zero, i.e., $c_{\phi(k)} = 0$. Since nonbasic variables themselves in the basic solution of the model $(L)$ are equal to zero, i.e., $x_j = 0$ $(j \ne \phi(k))$, the value of the objective function is reduced to $y = d$.

Definition 7.3.1. Under the above assumption condition,
– if $b_i \ge 0$ for all $i = 1, \dots, m$ and all $c_j$ have any values, the model $(L)$ is said to be in basic form;
– if $c_j \ge 0$ for all $j = 1, \dots, n$ and all $b_i$ have any values, the model $(L)$ is said to be in dual basic form.
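Reading off the basic solution and checking the two conditions of Definition 7.3.1 is mechanical. A small sketch follows; the helper names `basic_solution` and `classify` are hypothetical, and the data are those of the first model of Example 7.3.2 in Section 7.3.5.

```python
def basic_solution(phi, b, n):
    """Basic solution: x_{phi(k)} = b_k for each row k, all other x_j = 0.

    phi uses the text's 1-based column indices; phi[k - 1] is phi(k).
    """
    x = [0] * n
    for k, col in enumerate(phi):
        x[col - 1] = b[k]
    return x

def classify(c, b):
    """Label a model according to Definition 7.3.1 (both labels may apply)."""
    labels = []
    if all(bi >= 0 for bi in b):
        labels.append("basic form")
    if all(cj >= 0 for cj in c):
        labels.append("dual basic form")
    return labels

# The slack columns 3, 4, 5 are basic in the first model of Example 7.3.2.
x = basic_solution(phi=[3, 4, 5], b=[12, 9, 40], n=5)
labels = classify(c=[-4, -5, 0, 0, 0], b=[12, 9, 40])
```

Both helpers assume that the column conditions on $\phi$ stated above already hold; they do not verify them.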

7.3.3 The simplex algorithm

The discovery of the simplex method is due to George Dantzig. This method is applied for solving the standard linear optimization models with equality constraints and nonnegative variables in basic form. Given a standard linear optimization model with equality constraints and nonnegative variables in basic form as follows:

$$(L_b):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + c_1x_1 + c_2x_2 + \cdots + c_nx_n = d, \\
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1, \\
\qquad\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m, \\
x_1 \ge 0,\ x_2 \ge 0,\ \dots,\ x_n \ge 0,
\end{cases}$$

where $b_i \ge 0$ for all $i = 1, \dots, m$ and all $c_j$ have any values, its matrix form is

$$(L_b):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + \langle c, x \rangle = d, \\
Ax^T = b^T, \\
x \in \mathbb{R}_+^n,
\end{cases}$$

where $b = (b_1, \dots, b_m) \ge 0$ and $c = (c_1, \dots, c_n)$ have any values, $\mathbb{R}_+^n = [0, \infty)^n$, and

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.$$

The simplex algorithm consists of three cases.

Case 1. $c_j \ge 0$ for all $j = 1, \dots, n$. Then the basic solution of the model $(L_b)$ is optimal and the corresponding value of the objective function is $d$. The algorithm stops.

In fact, if $x_1^*, x_2^*, \dots, x_n^*$ is the basic solution of the model $(L_b)$, then $y(x_1^*, x_2^*, \dots, x_n^*) = d$. On the other hand, for any feasible solution $x_1, x_2, \dots, x_n$, since $c_j, x_j \ge 0$ $(j = 1, \dots, n)$, the objective function satisfies

$$y(x_1, \dots, x_n) = -c_1x_1 - \cdots - c_nx_n + d \le d.$$

Thus, the basic solution is optimal.

Case 2. There is a column $l$ so that $c_l < 0$ and $a_{il} \le 0$ for all $i = 1, \dots, m$. Then the model $(L_b)$ has no optimal solution, because the objective function is unbounded above on the set of feasible solutions. The algorithm stops.

In fact, since $c_l < 0$, $x_l$ must be a nonbasic variable. It is known that for basic variables, $a_{k\phi(k)} = 1$, $a_{i\phi(k)} = 0$ $(i \ne k)$, $c_{\phi(k)} = 0$. Set the other nonbasic variables $x_j = 0$ $(j \ne l)$. Then the $k$-th equation of the model $(L_b)$ reduces to

$$x_{\phi(k)} + a_{kl}x_l = b_k \quad \text{or} \quad x_{\phi(k)} = b_k - a_{kl}x_l,$$

where $b_k \ge 0$ and $a_{kl} \le 0$, so $x_{\phi(k)} \ge 0$ for every $x_l \ge 0$, and the objective function of the model $(L_b)$ reduces to

$$y = -c_lx_l + d,$$

where $c_l < 0$. Thus, $y$ can be made arbitrarily large by choosing $x_l$ sufficiently large.

Case 3. There is a column $l$ so that $c_l < 0$ and $a_{il} > 0$ for some $i = 1, \dots, m$. If there are several columns $l$ so that $c_l < 0$, then the gradient rule advises to choose the column $l$ so that $c_l$ is the most negative, and then choose the row index $k$ so that

$$a_{kl} > 0, \qquad \frac{b_k}{a_{kl}} = \min_{1 \le i \le m}\Big\{\frac{b_i}{a_{il}} : a_{il} > 0\Big\}.$$

Pivoting on the $(k, l)$ entry, this produces a new model $(L_b^1)$ from the model $(L_b)$ by a pivot operation:

$$(L_b^1):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + c_1^{(1)}x_1 + \cdots + c_{l-1}^{(1)}x_{l-1} + 0 + c_{l+1}^{(1)}x_{l+1} + \cdots + c_n^{(1)}x_n = d^{(1)}, \\
a_{11}^{(1)}x_1 + \cdots + a_{1,l-1}^{(1)}x_{l-1} + 0 + a_{1,l+1}^{(1)}x_{l+1} + \cdots + a_{1n}^{(1)}x_n = b_1^{(1)}, \\
\qquad\vdots \\
a_{k1}^{(1)}x_1 + \cdots + a_{k,l-1}^{(1)}x_{l-1} + x_l + a_{k,l+1}^{(1)}x_{l+1} + \cdots + a_{kn}^{(1)}x_n = b_k^{(1)}, & \phi(k) = l, \\
\qquad\vdots \\
a_{m1}^{(1)}x_1 + \cdots + a_{m,l-1}^{(1)}x_{l-1} + 0 + a_{m,l+1}^{(1)}x_{l+1} + \cdots + a_{mn}^{(1)}x_n = b_m^{(1)}, \\
x_1 \ge 0,\ x_2 \ge 0,\ \dots,\ x_n \ge 0,
\end{cases}$$

where the new coefficients are given by the pivoting formulas

$$a_{kj}^{(1)} = \frac{a_{kj}}{a_{kl}} \quad (j = 1, \dots, n), \qquad b_k^{(1)} = \frac{b_k}{a_{kl}},$$

and the pivoting formulas for $i = 1, \dots, m$, $i \ne k$,

$$\begin{aligned}
a_{ij}^{(1)} &= a_{ij} - \frac{a_{kj}}{a_{kl}}a_{il} \quad (j = 1, \dots, n), &\qquad b_i^{(1)} &= b_i - \frac{b_k}{a_{kl}}a_{il}, \\
c_j^{(1)} &= c_j - \frac{a_{kj}}{a_{kl}}c_l \quad (j = 1, \dots, n), &\qquad d^{(1)} &= d - \frac{b_k}{a_{kl}}c_l.
\end{aligned}$$

Repeat the above process with $(L_b^1)$ until a newer model is in Case 1 or Case 2.
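The three cases can be sketched as a short tableau routine. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `pivot` and `simplex`, the 0-based `basis` list standing in for $\phi$, and the use of exact fractions are choices made here, and no anti-cycling safeguard is included. The data are those of Example 7.3.2 in Section 7.3.5.

```python
from fractions import Fraction

def pivot(T, k, l):
    """Gauss-Jordan pivot of the tableau T on entry (k, l), in place."""
    p = T[k][l]
    T[k] = [v / p for v in T[k]]
    for i in range(len(T)):
        if i != k:
            f = T[i][l]
            T[i] = [v - f * w for v, w in zip(T[i], T[k])]

def simplex(c, A, b, d, basis):
    """Maximize y subject to y + <c, x> = d, A x = b, x >= 0, with b >= 0.

    basis[i] is the 0-based column of the basic variable of constraint i.
    Returns (status, y, x); status is "optimal" or "unbounded".
    """
    n = len(c)
    basis = list(basis)
    # Row 0 is the objective row; rows 1..m are the constraints.
    T = [[Fraction(v) for v in c] + [Fraction(d)]]
    T += [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    while True:
        l = min(range(n), key=lambda j: T[0][j])        # most negative c_l
        if T[0][l] >= 0:                                 # Case 1: optimal
            break
        rows = [i for i in range(1, len(T)) if T[i][l] > 0]
        if not rows:                                     # Case 2: unbounded
            return "unbounded", None, None
        k = min(rows, key=lambda i: T[i][n] / T[i][l])   # Case 3: ratio test
        pivot(T, k, l)
        basis[k - 1] = l
    x = [Fraction(0)] * n
    for i, col in enumerate(basis):
        x[col] = T[i + 1][n]
    return "optimal", T[0][n], x

status, y, x = simplex(
    c=[-4, -5, 0, 0, 0],
    A=[[1, 2, 1, 0, 0], [1, 1, 0, 1, 0], [4, 5, 0, 0, 1]],
    b=[12, 9, 40], d=25, basis=[2, 3, 4])
```

The routine only inspects the most negative column in each pass, so an unbounded model may be detected a few pivots later than by checking every column with $c_l < 0$.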

7.3.4 The dual simplex method

The dual simplex method is used for solving the standard linear optimization models with equality constraints and nonnegative variables in dual basic form. Given a standard linear optimization model with equality constraints and nonnegative variables in dual basic form as follows:

$$(L_d):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + c_1x_1 + \cdots + c_nx_n = d, \\
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1, \\
\qquad\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m, \\
x_1 \ge 0,\ x_2 \ge 0,\ \dots,\ x_n \ge 0,
\end{cases}$$

where $c_j \ge 0$ for all $j = 1, \dots, n$ and all $b_i$ have any values. Its matrix form is

$$(L_d):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + \langle c, x \rangle = d, \\
Ax^T = b^T, \\
x \in \mathbb{R}_+^n,
\end{cases}$$

where $c = (c_1, \dots, c_n) \ge 0$ and $b = (b_1, \dots, b_m)$ have any values, $x = (x_1, \dots, x_n)$, and

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.$$

The dual simplex algorithm consists of three cases.

Case 1. $b_i \ge 0$ for all $i = 1, \dots, m$. This is just Case 1 of the above simplex algorithm. Thus, the basic solution of the model $(L_d)$ is optimal and the corresponding value of the objective function is $d$. The algorithm stops.

Case 2. There is a row $k$ so that $b_k < 0$ and $a_{kl} \ge 0$ for all $l = 1, \dots, n$. Then the model $(L_d)$ has no feasible solution. The algorithm stops.

In fact, for any $x_1 \ge 0, \dots, x_n \ge 0$, since $a_{kl} \ge 0$ for all $l = 1, \dots, n$,

$$a_{k1}x_1 + \cdots + a_{kn}x_n \ge 0.$$

But $b_k < 0$. So $x_1, \dots, x_n$ cannot satisfy the $k$-th equation, and so the model $(L_d)$ has no feasible solution.

Case 3. There is a row $k$ so that $b_k < 0$ and $a_{kl} < 0$ for some $l = 1, \dots, n$. If there are several rows $k$ so that $b_k < 0$, then the dual gradient rule advises to choose the row index $k$ so that $b_k$ is the most negative, and then choose the column index $l$ so that

$$a_{kl} < 0, \qquad \frac{c_l}{a_{kl}} = \max_{1 \le j \le n}\Big\{\frac{c_j}{a_{kj}} : a_{kj} < 0\Big\}.$$

Pivoting on the $(k, l)$ entry, this produces a new model $(L_d^1)$ from the current model $(L_d)$ by a pivot operation. Repeat the above process with $(L_d^1)$ until a newer model is in Case 1 or Case 2.
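The dual simplex cases can be sketched in the same tableau style. As before, this is an illustration under assumptions rather than a reference implementation: `pivot`, `dual_simplex`, and the 0-based `basis` list are names introduced here, and the data are those of Example 7.3.3 in Section 7.3.5 after the conversion to dual basic form.

```python
from fractions import Fraction

def pivot(T, k, l):
    """Gauss-Jordan pivot of the tableau T on entry (k, l), in place."""
    p = T[k][l]
    T[k] = [v / p for v in T[k]]
    for i in range(len(T)):
        if i != k:
            f = T[i][l]
            T[i] = [v - f * w for v, w in zip(T[i], T[k])]

def dual_simplex(c, A, b, d, basis):
    """Maximize y subject to y + <c, x> = d, A x = b, x >= 0, with c >= 0.

    basis[i] is the 0-based column of the basic variable of constraint i.
    Returns (status, y, x); status is "optimal" or "infeasible".
    """
    n = len(c)
    basis = list(basis)
    # Row 0 is the objective row; rows 1..m are the constraints.
    T = [[Fraction(v) for v in c] + [Fraction(d)]]
    T += [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    while True:
        k = min(range(1, len(T)), key=lambda i: T[i][n])   # most negative b_k
        if T[k][n] >= 0:                                    # Case 1: optimal
            break
        cols = [j for j in range(n) if T[k][j] < 0]
        if not cols:                                        # Case 2: infeasible
            return "infeasible", None, None
        l = max(cols, key=lambda j: T[0][j] / T[k][j])      # Case 3: ratio test
        pivot(T, k, l)
        basis[k - 1] = l
    x = [Fraction(0)] * n
    for i, col in enumerate(basis):
        x[col] = T[i + 1][n]
    return "optimal", T[0][n], x

status, y, x = dual_simplex(
    c=[3, 4, 0, 0, 0],
    A=[[-4, -2, 1, 0, 0], [-2, -5, 0, 1, 0], [-2, -3, 0, 0, 1]],
    b=[-18, -25, -19], d=0, basis=[2, 3, 4])
```

Since $y = -z$ in that example, the returned value of `y` is the negative of the minimum of $z$.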

7.3.5 Slack variables

Any linear optimization model with inequality constraints and nonnegative variables can be converted into an equivalent standard linear optimization model with equality constraints and nonnegative variables by introducing new variables. The introduced new variables are called slack variables. This is a major creative idea in the development of the simplex method. The types often used are as follows.

Type 1

The model is a linear optimization model with inequality constraints and nonnegative variables as follows:

$$(L):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + c_1x_1 + c_2x_2 + \cdots + c_nx_n = d, \\
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \le b_1, \\
\qquad\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \le b_m, \\
x_1 \ge 0,\ x_2 \ge 0,\ \dots,\ x_n \ge 0,
\end{cases}$$

where $b_i \ge 0$ for all $i = 1, \dots, m$ and all $c_j$ have any values. Its matrix form is

$$(L):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + \langle c, x \rangle = d, \\
Ax^T \le b^T, \\
x \in \mathbb{R}_+^n,
\end{cases}$$

where $b = (b_1, \dots, b_m) \ge 0$ and $c = (c_1, \dots, c_n)$ have any values, $x = (x_1, \dots, x_n)$, and

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.$$

Introduce $m$ new variables $x_{n+i}$ $(i = 1, \dots, m)$ (i.e., $m$ slack variables) to convert inequality constraints to equality constraints. This produces a standard linear optimization model with equality constraints and nonnegative variables in basic form as follows:

$$(L_b):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + c_1x_1 + \cdots + c_nx_n + 0 + 0 + 0 + \cdots + 0 = d, \\
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n + x_{n+1} + 0 + 0 + \cdots + 0 = b_1, & \phi(1) = n + 1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n + 0 + x_{n+2} + 0 + \cdots + 0 = b_2, & \phi(2) = n + 2, \\
\qquad\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n + 0 + 0 + \cdots + 0 + x_{n+m} = b_m, & \phi(m) = n + m, \\
x_1 \ge 0,\ x_2 \ge 0,\ \dots,\ x_{n+m} \ge 0,
\end{cases}$$

where $b_i \ge 0$ for all $i = 1, \dots, m$ and all $c_j$ have any values. Here $\phi$ is an indexing function of row indices into column indices and $x_{n+i} \ge 0$ $(i = 1, \dots, m)$. It is seen that when $\phi(k) = n + k$ $(k = 1, \dots, m)$,

$$a_{k\phi(k)} = 1, \qquad a_{i\phi(k)} = 0 \quad (i = 1, \dots, m,\ i \ne k), \qquad c_{\phi(k)} = 0.$$

Thus, the slack variables $x_{n+k}$ $(k = 1, \dots, m)$ are basic variables and the original variables $x_j$ $(j = 1, \dots, n)$ are nonbasic variables. Let the nonbasic variables $x_j = 0$ $(j = 1, \dots, n)$. Then $x_{n+i} = b_i$ $(i = 1, \dots, m)$. So

$$x_j = 0 \quad (j = 1, \dots, n), \qquad x_{n+i} = b_i \quad (i = 1, \dots, m)$$

is the basic solution of the model $(L_b)$. Applying the simplex method to the model $(L_b)$, we will obtain the optimal solution and the corresponding value of the objective function.

Example 7.3.2. Given a linear optimization model with inequality constraints and nonnegative variables,

$$(L):\ \max y \quad \text{subject to} \quad
\begin{cases}
y - 4x_1 - 5x_2 = 25, \\
x_1 + 2x_2 \le 12, \\
x_1 + x_2 \le 9, \\
4x_1 + 5x_2 \le 40, \\
x_1 \ge 0,\ x_2 \ge 0,
\end{cases}$$

try to solve it by the simplex method.

Solution. It is clear that the set of feasible solutions of the model $(L)$ is a polygonal region with vertices $(0, 0)$, $(0, 6)$, $(6, 3)$, and $(9, 0)$.

Introduce three slack variables $x_3, x_4, x_5$ to convert inequality constraints to equality constraints. This produces a standard linear optimization model with equality constraints and nonnegative variables in basic form as follows:

$$(L_b):\ \max y \quad \text{subject to} \quad
\begin{cases}
y - 4x_1 - 5x_2 + 0 + 0 + 0 = 25, \\
x_1 + 2x_2 + x_3 + 0 + 0 = 12, & \phi(1) = 3, \\
x_1 + x_2 + 0 + x_4 + 0 = 9, & \phi(2) = 4, \\
4x_1 + 5x_2 + 0 + 0 + x_5 = 40, & \phi(3) = 5, \\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0,\ x_5 \ge 0,
\end{cases}$$

where $\phi$ is an indexing function of row indices into column indices. It is seen that the basic variables of the model $(L_b)$ are $x_3, x_4, x_5$ and the nonbasic variables of the model $(L_b)$ are $x_1, x_2$. Let nonbasic variables $x_1 = x_2 = 0$. Then $x_3 = 12$, $x_4 = 9$, $x_5 = 40$. So the basic solution of the model $(L_b)$ is

$$x_1 = 0, \quad x_2 = 0, \quad x_3 = 12, \quad x_4 = 9, \quad x_5 = 40$$

and the corresponding value of the objective function is $y(0, 0) = 25$ (see Table 7.1).

Tab. 7.1: First simplex tableau.

| Basic variables | Equation number | Coefficient of y | x1 | x2 | x3 | x4 | x5 | Right side of equation |
|---|---|---|---|---|---|---|---|---|
| y | 0 | 1 | −4 | −5 | 0 | 0 | 0 | 25 |
| x3 | 1 | 0 | 1 | 2 | 1 | 0 | 0 | 12 |
| x4 | 2 | 0 | 1 | 1 | 0 | 1 | 0 | 9 |
| x5 | 3 | 0 | 4 | 5 | 0 | 0 | 1 | 40 |

Here the point $(0, 0)$ is the first vertex of the set of feasible solutions of the model $(L)$. Since $c_1 = -4$ and $c_2 = -5$, the model $(L_b)$ is not in Case 1 of the simplex algorithm. So the basic solution is not optimal.

In the model $(L_b)$, $c_1 = -4$ and $c_2 = -5$. The gradient rule advises to choose the column index $l = 2$ since $c_2 = -5$ is the most negative. In this column,

$$a_{12} = 2 > 0, \qquad a_{22} = 1 > 0, \qquad a_{32} = 5 > 0,$$

so the model $(L_b)$ is in Case 3 of the simplex algorithm. Choose the row index $k = 1$ since $a_{12} > 0$ and

$$\frac{b_k}{a_{k2}} = \min\Big\{\frac{b_1}{a_{12}}, \frac{b_2}{a_{22}}, \frac{b_3}{a_{32}}\Big\} = \min\{6, 9, 8\} = 6 = \frac{b_1}{a_{12}}.$$

Pivoting on the $(1, 2)$ entry, this produces a new model $(L_b^1)$ from the model $(L_b)$ by the first pivot operation:

$$(L_b^1):\ \max y \quad \text{subject to} \quad
\begin{cases}
y - \frac{3}{2}x_1 + 0 + \frac{5}{2}x_3 + 0 + 0 = 55, \\
\frac{1}{2}x_1 + x_2 + \frac{1}{2}x_3 + 0 + 0 = 6, & \phi(1) = 2, \\
\frac{1}{2}x_1 + 0 - \frac{1}{2}x_3 + x_4 + 0 = 3, & \phi(2) = 4, \\
\frac{3}{2}x_1 + 0 - \frac{5}{2}x_3 + 0 + x_5 = 10, & \phi(3) = 5, \\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0,\ x_5 \ge 0.
\end{cases}$$

It is seen that the basic variables of the model $(L_b^1)$ are $x_2, x_4, x_5$ and the nonbasic variables of the model $(L_b^1)$ are $x_1, x_3$. Let nonbasic variables $x_1 = x_3 = 0$. Then $x_2 = 6$, $x_4 = 3$, $x_5 = 10$. So the basic solution of the model $(L_b^1)$ is

$$x_1 = 0, \quad x_2 = 6, \quad x_3 = 0, \quad x_4 = 3, \quad x_5 = 10$$

and the corresponding value of the objective function is $y(0, 6) = 55$ (see Table 7.2). Here the point $(0, 6)$ is the second vertex of the set of feasible solutions of the model $(L)$. Since $c_1 = -\frac{3}{2}$, the model $(L_b^1)$ is not in Case 1 of the simplex algorithm. So the basic solution is not optimal.

Tab. 7.2: Second simplex tableau.

| Basic variables | Equation number | Coefficient of y | x1 | x2 | x3 | x4 | x5 | Right side of equation |
|---|---|---|---|---|---|---|---|---|
| y | 0 | 1 | −3/2 | 0 | 5/2 | 0 | 0 | 55 |
| x2 | 1 | 0 | 1/2 | 1 | 1/2 | 0 | 0 | 6 |
| x4 | 2 | 0 | 1/2 | 0 | −1/2 | 1 | 0 | 3 |
| x5 | 3 | 0 | 3/2 | 0 | −5/2 | 0 | 1 | 10 |

In the model $(L_b^1)$, $c_1 = -\frac{3}{2}$. The chosen column index is $l = 1$. In this column, $a_{11} = \frac{1}{2} > 0$, $a_{21} = \frac{1}{2} > 0$, and $a_{31} = \frac{3}{2} > 0$, so the model $(L_b^1)$ is in Case 3 of the simplex algorithm. Choose the row index $k = 2$ since $a_{21} > 0$ and

$$\frac{b_k}{a_{k1}} = \min\Big\{\frac{b_1}{a_{11}}, \frac{b_2}{a_{21}}, \frac{b_3}{a_{31}}\Big\} = \min\Big\{12, 6, \frac{20}{3}\Big\} = 6 = \frac{b_2}{a_{21}}.$$

Pivoting on the $(2, 1)$ entry, this produces a new model $(L_b^2)$ from the model $(L_b^1)$ by the second pivot operation:

$$(L_b^2):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + 0 + 0 + x_3 + 3x_4 + 0 = 64, \\
0 + x_2 + x_3 - x_4 + 0 = 3, & \phi(1) = 2, \\
x_1 + 0 - x_3 + 2x_4 + 0 = 6, & \phi(2) = 1, \\
0 + 0 - x_3 - 3x_4 + x_5 = 1, & \phi(3) = 5, \\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0,\ x_5 \ge 0.
\end{cases}$$

It is seen that the basic variables of the model $(L_b^2)$ are $x_1, x_2, x_5$ and the nonbasic variables of the model $(L_b^2)$ are $x_3, x_4$. Let nonbasic variables $x_3 = x_4 = 0$. Then $x_1 = 6$, $x_2 = 3$, $x_5 = 1$. In the model $(L_b^2)$, $c_j \ge 0$ for all $j = 1, 2, 3, 4, 5$. So the model $(L_b^2)$ is in Case 1 of the simplex algorithm. The basic solution of the model $(L_b^2)$

$$x_1 = 6, \quad x_2 = 3, \quad x_3 = 0, \quad x_4 = 0, \quad x_5 = 1$$

is the optimal solution of the model $(L)$ and the corresponding value of the objective function is $y(6, 3) = 64$. Here the point $(6, 3)$ is the third vertex of the set of feasible solutions of the model $(L)$ (see Table 7.3).

Tab. 7.3: Third simplex tableau.

| Basic variables | Equation number | Coefficient of y | x1 | x2 | x3 | x4 | x5 | Right side of equation |
|---|---|---|---|---|---|---|---|---|
| y | 0 | 1 | 0 | 0 | 1 | 3 | 0 | 64 |
| x2 | 1 | 0 | 0 | 1 | 1 | −1 | 0 | 3 |
| x1 | 2 | 0 | 1 | 0 | −1 | 2 | 0 | 6 |
| x5 | 3 | 0 | 0 | 0 | −1 | −3 | 1 | 1 |

Example 7.3.2 shows that applying the simplex method to a standard linear optimization model in basic form amounts to moving, by pivot operations, from one vertex of the set of feasible solutions of the given linear optimization model with inequality constraints and nonnegative variables to another such vertex.
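The Type 1 conversion and the vertex path of Example 7.3.2 can be checked with a few lines. This is a sketch with hypothetical helper names (`add_slack`, `objective`); it only appends the slack columns and evaluates $y$ from $y + \langle c, x \rangle = d$ at the three vertices visited in the example.

```python
def add_slack(c, A):
    """Type 1 conversion: A x <= b becomes A x + s = b with one slack per row.

    Returns (c_ext, A_ext, phi), where phi[i] is the 1-based column index
    of the slack variable that is basic in row i + 1.
    """
    m, n = len(A), len(c)
    c_ext = list(c) + [0] * m
    A_ext = [list(row) + [1 if i == r else 0 for i in range(m)]
             for r, row in enumerate(A)]
    phi = [n + i + 1 for i in range(m)]
    return c_ext, A_ext, phi

def objective(c, d, x):
    """Value of y obtained from y + <c, x> = d."""
    return d - sum(cj * xj for cj, xj in zip(c, x))

# Data of Example 7.3.2: c = (-4, -5), d = 25, three inequality constraints.
c_ext, A_ext, phi = add_slack([-4, -5], [[1, 2], [1, 1], [4, 5]])
path = [(0, 0), (0, 6), (6, 3)]            # vertices visited by the pivots
values = [objective([-4, -5], 25, v) for v in path]
```

The list `values` reproduces the objective values in the right-hand column of the objective rows of Tables 7.1 to 7.3.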

Type 2

The model is a linear optimization model with inequality constraints and nonnegative variables

$$(L):\ \min z \quad \text{subject to} \quad
\begin{cases}
z = c_1x_1 + c_2x_2 + \cdots + c_nx_n + d, \\
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \ge b_1, \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \ge b_2, \\
\qquad\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \ge b_m, \\
x_1 \ge 0,\ x_2 \ge 0,\ \dots,\ x_n \ge 0,
\end{cases}$$

where $c_j \ge 0$ for all $j = 1, \dots, n$ and all $b_i$ have any values. Its matrix form is

$$(L):\ \min z \quad \text{subject to} \quad
\begin{cases}
z = \langle c, x \rangle + d, \\
Ax^T \ge b^T, \\
x \in \mathbb{R}_+^n,
\end{cases}$$

where $c = (c_1, \dots, c_n) \ge 0$ and $b = (b_1, \dots, b_m)$ have any values, $x = (x_1, \dots, x_n)$, and

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.$$

Let $y = -z$. In the model $(L)$, converting minimization of $z$ to maximization of $y$, and then multiplying the inequalities by $-1$, produces a linear optimization model $(L^*)$ with inequality constraints and nonnegative variables from the current model $(L)$ as follows:

$$(L^*):\ \max y \quad \text{subject to} \quad
\begin{cases}
-y = c_1x_1 + c_2x_2 + \cdots + c_nx_n + d, \\
-a_{11}x_1 - a_{12}x_2 - \cdots - a_{1n}x_n \le -b_1, \\
\qquad\vdots \\
-a_{m1}x_1 - a_{m2}x_2 - \cdots - a_{mn}x_n \le -b_m, \\
x_1 \ge 0,\ x_2 \ge 0,\ \dots,\ x_n \ge 0,
\end{cases}$$

where $c_j \ge 0$ for all $j = 1, \dots, n$ and all $b_i$ have any values. Introducing $m$ slack variables $x_{n+i}$ $(i = 1, \dots, m)$, the model $(L^*)$ is further converted to a standard linear optimization model with equality constraints and nonnegative variables in dual basic form as follows:

$$(L_d):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + c_1x_1 + c_2x_2 + \cdots + c_nx_n + 0 + 0 + \cdots + 0 = -d, \\
-a_{11}x_1 - a_{12}x_2 - \cdots - a_{1n}x_n + x_{n+1} + 0 + \cdots + 0 = -b_1, & \phi(1) = n + 1, \\
-a_{21}x_1 - a_{22}x_2 - \cdots - a_{2n}x_n + 0 + x_{n+2} + \cdots + 0 = -b_2, & \phi(2) = n + 2, \\
\qquad\vdots \\
-a_{m1}x_1 - a_{m2}x_2 - \cdots - a_{mn}x_n + 0 + 0 + \cdots + 0 + x_{n+m} = -b_m, & \phi(m) = n + m, \\
x_1 \ge 0,\ x_2 \ge 0,\ \dots,\ x_{n+m} \ge 0,
\end{cases}$$

where $c_j \ge 0$ for all $j = 1, \dots, n$ and all $b_i$ have any values. Here $x_{n+i} \ge 0$ $(i = 1, \dots, m)$. Applying the dual simplex method to the model $(L_d)$, we will obtain the optimal solution of the model $(L_d)$ and the corresponding value of the objective function.

Example 7.3.3. Given a linear optimization model with inequality constraints and nonnegative variables

$$(L):\ \min z \quad \text{subject to} \quad
\begin{cases}
z = 3x_1 + 4x_2, \\
4x_1 + 2x_2 \ge 18, \\
2x_1 + 5x_2 \ge 25, \\
2x_1 + 3x_2 \ge 19,
\end{cases}$$

where $x_1 \ge 0$ and $x_2 \ge 0$, try to solve the model $(L)$ by the dual simplex method.

Solution. Let $y = -z$. Converting minimization of $z$ to maximization of $y$, multiplying the inequalities by $-1$, and then introducing three slack variables $x_3, x_4, x_5$, produces a standard linear optimization model with equality constraints and nonnegative variables in dual basic form as follows:

$$(L_d):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + 3x_1 + 4x_2 + 0 + 0 + 0 = 0, \\
-4x_1 - 2x_2 + x_3 + 0 + 0 = -18, & \phi(1) = 3, \\
-2x_1 - 5x_2 + 0 + x_4 + 0 = -25, & \phi(2) = 4, \\
-2x_1 - 3x_2 + 0 + 0 + x_5 = -19, & \phi(3) = 5, \\
x_1 \ge 0,\ x_2 \ge 0,
\end{cases}$$

where $c_1 = 3$ and $c_2 = 4$ are both nonnegative.

Since $b_1 = -18 < 0$, $b_2 = -25 < 0$, and $b_3 = -19 < 0$, the dual gradient rule advises to choose the second row $k = 2$ since $b_2 = -25$ is the most negative. In this row, $a_{21} = -2 < 0$ and $a_{22} = -5 < 0$. So the model $(L_d)$ is in Case 3 of the dual simplex algorithm. Note that $c_1 = 3$ and $c_2 = 4$. Choose the column $l = 2$ since $a_{22} < 0$ and

$$\frac{c_l}{a_{2l}} = \max\Big\{\frac{c_1}{a_{21}}, \frac{c_2}{a_{22}}\Big\} = \max\Big\{-\frac{3}{2}, -\frac{4}{5}\Big\} = -\frac{4}{5} = \frac{c_2}{a_{22}}.$$

Pivoting on the $(2, 2)$ entry, this produces a newer model $(L_d^1)$ from the model $(L_d)$ by the first pivot operation:

$$(L_d^1):\ \max y \quad \text{subject to} \quad
\begin{cases}
y + \frac{7}{5}x_1 + 0 + 0 + \frac{4}{5}x_4 + 0 = -20, \\
-\frac{16}{5}x_1 + 0 + x_3 - \frac{2}{5}x_4 + 0 = -8, & \phi(1) = 3, \\
\frac{2}{5}x_1 + x_2 + 0 - \frac{1}{5}x_4 + 0 = 5, & \phi(2) = 2, \\
-\frac{4}{5}x_1 + 0 + 0 - \frac{3}{5}x_4 + x_5 = -4, & \phi(3) = 5, \\
x_1 \ge 0,\ x_2 \ge 0,
\end{cases}$$

where $c_1 = \frac{7}{5}$ and $c_4 = \frac{4}{5}$ are both nonnegative. It is seen that basic variables of the model $(L_d^1)$ are $x_2, x_3, x_5$ and nonbasic variables of the model $(L_d^1)$ are $x_1, x_4$. Let nonbasic variables $x_1 = x_4 = 0$. Then $x_2 = 5$, $x_3 = -8$, $x_5 = -4$. So the basic solution of the model $(L_d^1)$ is

$$x_1 = 0, \quad x_2 = 5, \quad x_3 = -8, \quad x_4 = 0, \quad x_5 = -4$$

and the corresponding value of the objective function is $y(0, 5) = -20$ (see Table 7.4). Since $b_1 = -8$ and $b_3 = -4$, the model $(L_d^1)$ is not in Case 1 of the dual simplex algorithm. So the solution is not the optimal one.

Tab. 7.4: First dual simplex tableau.

| Basic variables | Equation number | Coefficient of y | x1 | x2 | x3 | x4 | x5 | Right side of equation |
|---|---|---|---|---|---|---|---|---|
| y | 0 | 1 | 7/5 | 0 | 0 | 4/5 | 0 | −20 |
| x3 | 1 | 0 | −16/5 | 0 | 1 | −2/5 | 0 | −8 |
| x2 | 2 | 0 | 2/5 | 1 | 0 | −1/5 | 0 | 5 |
| x5 | 3 | 0 | −4/5 | 0 | 0 | −3/5 | 1 | −4 |

In the model (L1d ), since b1 = −8 and b3 = −4, the dual gradient rule selects the row k = 1 since b1 = −8 is the most negative. In this row, b1 = −8 and a11 = −16/5 < 0 [...] b i > 0 for all i = 1, 2, 3 in the model (L3d ). So the model (L3d ) is in Case 1 of the dual simplex algorithm. So the basic solution of the model (L3d ) x1 = 2, x2 = 5, x3 = 0, x4 = 4, x5 = 0

is the optimal solution of the model (L) and the corresponding value of the objective function is y(2, 5) = −26 (see Table 7.6). Here (2, 5) is a vertex of the set of feasible solutions of the model (L).

Tab. 7.6: Third dual simplex tableau.

Basic      Equation   Coefficient of                               Right Side
Variables  Number     y     x1    x2    x3      x4    x5      of Equation
y          0          1     0     0     1/8     0     5/4     −26
x1         1          0     1     0     −3/8    0     1/4     2
x2         2          0     0     1     1/4     0     −1/2    5
x4         3          0     0     0     1/2     1     −2      4
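The pivots of Example 7.3.3 can be replayed mechanically. The following is a minimal sketch, not the book's own code: the function name `dual_simplex`, the tableau layout (objective row and constraint rows with the right-hand side as the last entry), and the pivot limit are assumptions made for illustration. It picks the row with the most negative right-hand side and the column that maximizes c j / a kj over a kj < 0, using exact rational arithmetic.

```python
from fractions import Fraction

def dual_simplex(obj, rows, max_pivots=100):
    """Dual simplex on a tableau in dual basic form (illustrative sketch).

    obj  : [c_1, ..., c_N, rhs] for the row  y + sum_j c_j x_j = rhs  (all c_j >= 0)
    rows : [[a_i1, ..., a_iN, b_i], ...] for the equality constraints
    Returns the final (obj, rows) with exact rational entries.
    """
    obj = [Fraction(v) for v in obj]
    rows = [[Fraction(v) for v in r] for r in rows]
    n = len(obj) - 1
    for _ in range(max_pivots):
        # Row choice: the most negative right-hand side.
        k = min(range(len(rows)), key=lambda i: rows[i][-1])
        if rows[k][-1] >= 0:
            return obj, rows                      # all b_i >= 0: optimal
        cols = [j for j in range(n) if rows[k][j] < 0]
        if not cols:
            raise ValueError("row k has no negative entry: no feasible solution")
        # Column choice: maximize c_j / a_kj over the columns with a_kj < 0.
        l = max(cols, key=lambda j: obj[j] / rows[k][j])
        pivot_row = [v / rows[k][l] for v in rows[k]]
        rows = [pivot_row if i == k else
                [a - r[l] * p for a, p in zip(r, pivot_row)]
                for i, r in enumerate(rows)]
        obj = [a - obj[l] * p for a, p in zip(obj, pivot_row)]
    raise RuntimeError("pivot limit reached")

# Model (L_d) of Example 7.3.3; columns are x1..x5, then the right-hand side.
obj, rows = dual_simplex([3, 4, 0, 0, 0, 0],
                         [[-4, -2, 1, 0, 0, -18],
                          [-2, -5, 0, 1, 0, -25],
                          [-2, -3, 0, 0, 1, -19]])
min_z = -obj[-1]   # min z = -max y
```

On this input the loop performs three pivots, on the entries (2, 2), (1, 1) and (3, 4), and the final rows agree with the third tableau. The sketch has no anti-cycling safeguard, so the pivot limit is only a crude guard for degenerate inputs.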

Note that min z = − max(−z) = − max y and max y = −26. Then min z = 26.

Consider the following standard linear optimization model, satisfying the assumption condition given in Section 7.3.2, with equality constraints and nonnegative variables:

\[
(L):\quad \min z \quad \text{subject to} \quad
\begin{cases}
z = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n + d,\\
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1,\\
\quad\vdots\\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m,\\
x_1 \ge 0,\ x_2 \ge 0,\ \ldots,\ x_n \ge 0,
\end{cases}
\]

where the first h − 1 right-hand sides b i ≥ 0 (i = 1, . . . , h − 1), and b i (i = h, . . . , m) and c j (j = 1, . . . , n) have any values. The model (L) will be converted to a model in basic form. For convenience, without loss of generality, assume that in the model (L), the first m − 1 right-hand sides b i ≥ 0 (i = 1, . . . , m − 1) and b m has any value. Focusing on the m-th row of the model (L), there are three cases.

Case 1. b m ≥ 0. Then the model (L) is the desired model. The algorithm stops.

Case 2. b m < 0 and a mj ≥ 0 for all j. Then a m1 x1 + ⋅ ⋅ ⋅ + a mn x n ≥ 0 for any solution x j ≥ 0 (j = 1, . . . , n) but b m < 0. Thus the m-th equation has no solution. The algorithm stops.

Case 3. b m < 0 and a ml < 0 for some l. Then the chosen column index is l. Regard the first (m − 1) rows as the constraints and the m-th row as the objective row. There are two alternatives.

– If a il ≤ 0 for all i = 1, . . . , m − 1, then the chosen row index is m. Pivoting on the (m, l) entry in the objective row produces a new model (L1 ) with coefficients c j^(1), a ij^(1), b i^(1), where

\[
b_m^{(1)} = \frac{b_m}{a_{ml}} > 0, \qquad
b_i^{(1)} = b_i - \frac{b_m}{a_{ml}}\, a_{il} \ge 0 \quad (i = 1, \ldots, m-1),
\]

since b i ≥ 0 and a il ≤ 0 (i = 1, . . . , m − 1). Thus the model (L1 ) is in basic form. The algorithm stops.

– If a il > 0 for some i (1 ≤ i ≤ m − 1), then the model (L) is in Case 3 of the simplex algorithm. So the chosen row index k is such that a kl > 0 and

\[
\frac{b_k}{a_{kl}} = \min_{1 \le j \le m-1} \left\{ \frac{b_j}{a_{jl}} : a_{jl} > 0 \right\}.
\]

Pivoting on the (k, l) entry, this produces a new model (L1 ) from the model (L). If the new model (L1 ) is in basic form, then the algorithm stops. Otherwise, repeat the above process with the model (L1 ) until a newer model is in basic form.


Type 3

The model is a linear optimization model with nonnegative variables, μ inequality constraints and m − μ equality constraints as follows:

\[
(L):\quad \min z \quad \text{subject to} \quad
\begin{cases}
z = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n + d,\\
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \ge b_1,\\
\quad\vdots\\
a_{\mu 1} x_1 + a_{\mu 2} x_2 + \cdots + a_{\mu n} x_n \ge b_\mu,\\
a_{\mu' 1} x_1 + a_{\mu' 2} x_2 + \cdots + a_{\mu' n} x_n = b_{\mu'},\\
\quad\vdots\\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m,\\
x_1 \ge 0,\ x_2 \ge 0,\ \ldots,\ x_n \ge 0,
\end{cases}
\]

where c j and b i have any values and μ′ = μ + 1. Its matrix form is

\[
(L):\quad \min z \quad \text{subject to} \quad
\begin{cases}
z = \langle c, x \rangle + d,\\
A_1 x^T \ge b_1^T,\\
A_2 x^T = b_2^T,\\
x \in \mathbb{R}^n_+,
\end{cases}
\]

where b1 = (b1 , . . . , b μ ), b2 = (b μ′ , . . . , b m ), c = (c1 , . . . , c n ), x = (x1 , . . . , x n ), and

\[
A_1 = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{\mu 1} & \cdots & a_{\mu n} \end{pmatrix},
\qquad
A_2 = \begin{pmatrix} a_{\mu' 1} & \cdots & a_{\mu' n}\\ \vdots & \ddots & \vdots\\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.
\]

Let y = −z. Converting minimization of z to maximization of y and multiplying the inequalities by −1, and then introducing μ slack variables x n+i (i = 1, . . . , μ), produces a standard linear optimization model (L∗ ), satisfying the assumption given in Section 7.3.2, with equality constraints and n + μ nonnegative variables as follows:

\[
(L^*):\quad \max y \quad \text{subject to} \quad
\begin{cases}
y + c_1 x_1 + c_2 x_2 + \cdots + c_n x_n = -d,\\
-a_{11} x_1 - a_{12} x_2 - \cdots - a_{1n} x_n + x_{n+1} = -b_1,\\
-a_{21} x_1 - a_{22} x_2 - \cdots - a_{2n} x_n + x_{n+2} = -b_2,\\
\quad\vdots\\
-a_{\mu 1} x_1 - a_{\mu 2} x_2 - \cdots - a_{\mu n} x_n + x_{n+\mu} = -b_\mu,\\
a_{\mu' 1} x_1 + a_{\mu' 2} x_2 + \cdots + a_{\mu' n} x_n = b_{\mu'},\\
\quad\vdots\\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m,\\
x_1 \ge 0,\ \ldots,\ x_n \ge 0,\ x_{n+1} \ge 0,\ \ldots,\ x_{n+\mu} \ge 0,
\end{cases}
\]

where c j and b i have any values. Applying the above method, the model (L∗ ) can be converted to a standard linear optimization model with equality constraints and nonnegative variables in basic form.

Example 7.3.4. Solve the following linear optimization model:

\[
(L):\quad \min z \quad \text{subject to} \quad
\begin{cases}
z + x_1 + 2x_2 - 2x_3 = 0,\\
x_1 - 2x_2 + 2x_3 \ge 2,\\
2x_1 - x_2 + x_3 \ge 2,\\
-2x_1 - x_2 + x_3 \ge 4,\\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0.
\end{cases}
\]

Solution. Let y = −z. Converting minimization to maximization, multiplying the inequalities by −1, and then introducing slack variables x4 , x5 , x6 , produces a standard linear optimization model (L∗ ) with equality constraints and nonnegative variables as follows:

\[
(L^*):\quad \max y \quad \text{subject to} \quad
\begin{cases}
y - x_1 - 2x_2 + 2x_3 = 0,\\
-x_1 + 2x_2 - 2x_3 + x_4 = -2,\\
-2x_1 + x_2 - x_3 + x_5 = -2,\\
2x_1 + x_2 - x_3 + x_6 = -4,\\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0,\ x_5 \ge 0,\ x_6 \ge 0.
\end{cases}
\]

Looking at the first row of (L∗ ), b1 = −2 < 0 and a11 = −1 < 0, a13 = −2 < 0. Choose l = 3 as the column index of the pivot entry. Pivoting on the (1, 3) entry produces a model (L1 ) from the model (L∗ )

\[
(L_1):\quad \max y \quad \text{subject to} \quad
\begin{cases}
y - x_1 - 2x_2 + 2x_3 = 0,\\
\frac{1}{2}x_1 - x_2 + x_3 - \frac{1}{2}x_4 = 1,\\
-\frac{3}{2}x_1 - \frac{1}{2}x_4 + x_5 = -1,\\
\frac{5}{2}x_1 - \frac{1}{2}x_4 + x_6 = -3,\\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0,\ x_5 \ge 0,\ x_6 \ge 0.
\end{cases}
\]

Looking at the first row of the model (L1 ), b1 = 1 > 0. The model (L1 ) is in Case 1. So we turn to look at its second row. Since b2 = −1 < 0, a21 = −3/2 < 0, and a24 = −1/2 < 0, the model (L1 ) is in Case 3. Choose the column index l = 1. Regard the first row as the only constraint and the second row as the objective row. Since only a11 = 1/2 > 0, the chosen row index is k = 1. Pivoting on the (1, 1) entry produces a model (L2 ) from (L1 ):

\[
(L_2):\quad \max y \quad \text{subject to} \quad
\begin{cases}
y - 4x_2 + 4x_3 - x_4 = 2,\\
x_1 - 2x_2 + 2x_3 - x_4 = 2,\\
-3x_2 + 3x_3 - 2x_4 + x_5 = 2,\\
5x_2 - 5x_3 + 2x_4 + x_6 = -8,\\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0,\ x_5 \ge 0,\ x_6 \ge 0.
\end{cases}
\]

Looking at the second row of the model (L2 ), b2 = 2 > 0, so the model (L2 ) is in Case 1. So we turn to look at its third row. Since b3 = −8 < 0 and a33 = −5 < 0, the model (L2 ) is in Case 3. The chosen column index is l = 3. Regard the first and second rows as the constraints and the third row as the objective row. Since a13 = 2 > 0 and a23 = 3 > 0, the model (L2 ) is in the second alternative of Case 3, and it follows that

\[
\frac{b_k}{a_{kl}} = \min\left\{ \frac{b_1}{a_{13}},\ \frac{b_2}{a_{23}} \right\}
= \min\left\{ 1,\ \frac{2}{3} \right\} = \frac{2}{3} = \frac{b_2}{a_{23}},
\]

and so the chosen row index is k = 2. Pivoting on the (2, 3) entry produces a model (L3 ) from (L2 ):

\[
(L_3):\quad \max y \quad \text{subject to} \quad
\begin{cases}
y + \frac{5}{3}x_4 - \frac{4}{3}x_5 = -\frac{2}{3},\\
x_1 + \frac{1}{3}x_4 - \frac{2}{3}x_5 = \frac{2}{3},\\
-x_2 + x_3 - \frac{2}{3}x_4 + \frac{1}{3}x_5 = \frac{2}{3},\\
-\frac{4}{3}x_4 + \frac{5}{3}x_5 + x_6 = -\frac{14}{3},\\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0,\ x_5 \ge 0,\ x_6 \ge 0.
\end{cases}
\]


Looking at the third row of the model (L3 ), b3 = −14/3 < 0 and a34 = −4/3 < 0, so the model (L3 ) is in Case 3. The chosen column index is l = 4. Regard the first and second rows as the constraints and the third row as the objective row. Since only a14 = 1/3 > 0, the chosen row index is k = 1. Pivoting on the (1, 4) entry produces a model (L4 ) from the model (L3 ):

\[
(L_4):\quad \max y \quad \text{subject to} \quad
\begin{cases}
y - 5x_1 + 2x_5 = -4,\\
3x_1 + x_4 - 2x_5 = 2,\\
2x_1 - x_2 + x_3 - x_5 = 2,\\
4x_1 - x_5 + x_6 = -2,\\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0,\ x_5 \ge 0,\ x_6 \ge 0.
\end{cases}
\]

Looking at the third row of the model (L4 ), b3 = −2 < 0 and a35 = −1 < 0, so the model (L4 ) is in Case 3. The chosen column index is l = 5. Regard the first and second rows as the constraints and the third row as the objective row. Since a15 = −2 < 0 and a25 = −1 < 0, the model (L4 ) is in the first alternative of Case 3, and so the chosen row index is k = 3. Pivoting on the (3, 5) entry in the objective row produces a model (L5 ) from the model (L4 ):

\[
(L_5):\quad \max y \quad \text{subject to} \quad
\begin{cases}
y + 3x_1 + 2x_6 = -8,\\
-5x_1 + x_4 - 2x_6 = 6,\\
-2x_1 - x_2 + x_3 - x_6 = 4,\\
-4x_1 + x_5 - x_6 = 2,\\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_4 \ge 0,\ x_5 \ge 0,\ x_6 \ge 0,
\end{cases}
\]

where b1 = 6, b2 = 4, b3 = 2. So the model (L5 ) is in basic form. Note that c1 = 3, c2 = c3 = c4 = c5 = 0, c6 = 2. The model (L5 ) is in Case 1 of the simplex algorithm, its basic variables are x3 , x4 , x5 , and its nonbasic variables are x1 , x2 , x6 . Let the nonbasic variables x1 = x2 = x6 = 0. Then x3 = 4, x4 = 6, x5 = 2. So the basic solution of the model (L5 )

x1 = 0, x2 = 0, x3 = 4, x4 = 6, x5 = 2, x6 = 0

is the optimal solution of the model (L∗ ) and the corresponding value of the objective function is y(0, 0, 4) = −8. Note that min z = − max y. Then min z = 8.
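The five pivots of Example 7.3.4 can be checked with exact arithmetic. The sketch below is an illustration only; the helper name `pivot` and the tableau layout (row 0 is the objective row, the last column is the right-hand side) are assumptions, not notation from the text. Unlike the printed model (L1 ), the sketch updates the objective row at every pivot, so after the first pivot its objective row reads y − 2x1 + x4 = −2; from (L2 ) onward the rows coincide with the printed ones.

```python
from fractions import Fraction

def pivot(tableau, k, l):
    """Pivot on row k, column l (both 1-based); row 0 is the objective row."""
    piv_row = [v / tableau[k][l - 1] for v in tableau[k]]
    return [piv_row if i == k else
            [a - r[l - 1] * p for a, p in zip(r, piv_row)]
            for i, r in enumerate(tableau)]

# Model (L*): columns x1..x6, then the right-hand side.
# Row 0 encodes  y - x1 - 2 x2 + 2 x3 = 0.
tableau = [[Fraction(v) for v in row] for row in [
    [-1, -2,  2, 0, 0, 0,  0],
    [-1,  2, -2, 1, 0, 0, -2],
    [-2,  1, -1, 0, 1, 0, -2],
    [ 2,  1, -1, 0, 0, 1, -4],
]]

# Pivot entries chosen in the example, in order.
for k, l in [(1, 3), (1, 1), (2, 3), (1, 4), (3, 5)]:
    tableau = pivot(tableau, k, l)

max_y = tableau[0][-1]
```

After the loop, row 0 encodes y + 3x1 + 2x6 = −8 and the right-hand sides of the three constraint rows are 6, 4, 2, in agreement with the model (L5 ).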


Type 4

The model is a general linear optimization model with ν nonnegative variables and n − ν free variables subject to μ inequality constraints and m − μ equality constraints as follows:

\[
(L):\quad \min z \quad \text{subject to} \quad
\begin{cases}
z = c_1 x_1 + \cdots + c_\nu x_\nu + c_{\nu'} x_{\nu'} + \cdots + c_n x_n + d,\\
a_{11} x_1 + \cdots + a_{1\nu} x_\nu + a_{1\nu'} x_{\nu'} + \cdots + a_{1n} x_n \ge b_1,\\
\quad\vdots\\
a_{\mu 1} x_1 + \cdots + a_{\mu\nu} x_\nu + a_{\mu\nu'} x_{\nu'} + \cdots + a_{\mu n} x_n \ge b_\mu,\\
a_{\mu' 1} x_1 + \cdots + a_{\mu'\nu} x_\nu + a_{\mu'\nu'} x_{\nu'} + \cdots + a_{\mu' n} x_n = b_{\mu'},\\
\quad\vdots\\
a_{m1} x_1 + \cdots + a_{m\nu} x_\nu + a_{m\nu'} x_{\nu'} + \cdots + a_{mn} x_n = b_m,\\
x_1 \ge 0,\ \ldots,\ x_\nu \ge 0,\ x_{\nu'}\ \text{free},\ \ldots,\ x_n\ \text{free},
\end{cases}
\]

where c j and b i have any values and μ′ = μ + 1 and ν′ = ν + 1. Its matrix form is

\[
(L):\quad \min z \quad \text{subject to} \quad
\begin{cases}
z = \langle c, x \rangle + d,\\
A_1 x^T \ge b_1^T,\\
A_2 x^T = b_2^T,\\
x \in \mathbb{R}^\nu_+ \times \mathbb{R}^{n-\nu},
\end{cases}
\]

where b1 = (b1 , . . . , b μ ), b2 = (b μ′ , . . . , b m ), c = (c1 , . . . , c n ), x = (x1 , . . . , x n ), d ∈ ℝ, and

\[
A_1 = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{\mu 1} & \cdots & a_{\mu n} \end{pmatrix},
\qquad
A_2 = \begin{pmatrix} a_{\mu' 1} & \cdots & a_{\mu' n}\\ \vdots & \ddots & \vdots\\ a_{m1} & \cdots & a_{mn} \end{pmatrix}.
\]

For each free variable x j (j = ν′ , . . . , n), let

\[
x_j^+ = x_j,\quad x_j^- = 0 \quad \text{if } x_j \ge 0,
\qquad\qquad
x_j^+ = 0,\quad x_j^- = -x_j \quad \text{if } x_j \le 0.
\]

Then x+j and x−j are nonnegative and x j = x+j − x−j (j = ν′ , . . . , n). Let y = −z. Converting minimization of z to maximization of y and multiplying the inequalities by −1, substituting each free variable x j (j = ν′ , . . . , n) by the difference of two nonnegative variables, x j = x+j − x−j , where x+j ≥ 0 and x−j ≥ 0, and then introducing μ slack variables x n+i (i = 1, . . . , μ), produces a standard linear optimization model, satisfying the assumption condition given in Section 7.3.2, with 2n − ν + μ nonnegative variables subject to m equality constraints as follows:

\[
(L^*):\quad \max y \quad \text{subject to} \quad
\begin{cases}
y + c_1 x_1 + \cdots + c_\nu x_\nu + c_{\nu'} x_{\nu'}^+ - c_{\nu'} x_{\nu'}^- + \cdots + c_n x_n^+ - c_n x_n^- = -d,\\
-a_{11} x_1 - \cdots - a_{1\nu} x_\nu - a_{1\nu'} x_{\nu'}^+ + a_{1\nu'} x_{\nu'}^- - \cdots - a_{1n} x_n^+ + a_{1n} x_n^- + x_{n+1} = -b_1,\\
\quad\vdots\\
-a_{\mu 1} x_1 - \cdots - a_{\mu\nu} x_\nu - a_{\mu\nu'} x_{\nu'}^+ + a_{\mu\nu'} x_{\nu'}^- - \cdots - a_{\mu n} x_n^+ + a_{\mu n} x_n^- + x_{n+\mu} = -b_\mu,\\
a_{\mu' 1} x_1 + \cdots + a_{\mu'\nu} x_\nu + a_{\mu'\nu'} x_{\nu'}^+ - a_{\mu'\nu'} x_{\nu'}^- + \cdots + a_{\mu' n} x_n^+ - a_{\mu' n} x_n^- = b_{\mu'},\\
\quad\vdots\\
a_{m1} x_1 + \cdots + a_{m\nu} x_\nu + a_{m\nu'} x_{\nu'}^+ - a_{m\nu'} x_{\nu'}^- + \cdots + a_{mn} x_n^+ - a_{mn} x_n^- = b_m,\\
x_1 \ge 0,\ \ldots,\ x_\nu \ge 0,\ x_{\nu'}^+ \ge 0,\ x_{\nu'}^- \ge 0,\ \ldots,\ x_n^+ \ge 0,\ x_n^- \ge 0,\ x_{n+1} \ge 0,\ \ldots,\ x_{n+\mu} \ge 0,
\end{cases}
\]

where c j and b i have any values, and μ′ = μ + 1 and ν′ = ν + 1. Applying the approach of Type 3, this produces a standard linear optimization model with equality constraints and nonnegative variables in basic form.

Example 7.3.5. Reduce the following linear optimization model:

\[
(L):\quad \min z \quad \text{subject to} \quad
\begin{cases}
z = -3x_1 + x_2 + x_3,\\
-4x_1 + x_2 + 2x_3 \le 3,\\
x_1 - 2x_2 + x_3 \le 11,\\
-2x_1 + 2x_3 = 1,\\
x_1 \ge 0,\ x_2 \ge 0,\ x_3\ \text{free}
\end{cases}
\]

to a standard linear optimization model with equality constraints and nonnegative variables.

Solution. Let y = −z. Converting minimization to maximization and substituting the variable x3 by x+3 − x−3 , where x+3 ≥ 0 and x−3 ≥ 0, and then introducing two slack variables x4 , x5 , produces a standard linear optimization model with three equality constraints and six nonnegative variables as follows:

\[
(L^*):\quad \max y \quad \text{subject to} \quad
\begin{cases}
y - 3x_1 + x_2 + x_3^+ - x_3^- = 0,\\
-4x_1 + x_2 + 2x_3^+ - 2x_3^- + x_4 = 3,\\
x_1 - 2x_2 + x_3^+ - x_3^- + x_5 = 11,\\
-2x_1 + 2x_3^+ - 2x_3^- = 1,\\
x_1 \ge 0,\ x_2 \ge 0,\ x_3^+ \ge 0,\ x_3^- \ge 0,\ x_4 \ge 0,\ x_5 \ge 0.
\end{cases}
\]

The model (L∗ ) is the desired model.
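The reduction used for Types 3 and 4 is mechanical enough to script. The following is a rough sketch under stated assumptions: the function name `to_standard_form`, the `(coeffs, sense, rhs)` row format, and the column order (each free variable replaced in place by its positive and negative parts, slack columns last) are choices made here for illustration, and the constant d is taken to be 0.

```python
def to_standard_form(c, rows, free):
    """Rewrite  min z = <c, x>  as  max y = -z  with equality rows only.

    rows : list of (coeffs, sense, rhs) with sense in {"<=", ">=", "="}
    free : set of 0-based indices of the free variables
    Returns (objective_row, equality_rows); the last entry of every row is its
    right-hand side, and the objective row encodes  y + <c, x> = 0.
    """
    def split(coeffs):
        out = []
        for j, a in enumerate(coeffs):
            out += [a, -a] if j in free else [a]   # x_j = x_j^+ - x_j^-
        return out

    n_slack = sum(1 for _, sense, _ in rows if sense != "=")
    objective = split(c) + [0] * n_slack + [0]
    equalities, s = [], 0
    for coeffs, sense, rhs in rows:
        sign = -1 if sense == ">=" else 1          # multiply ">=" rows by -1
        row = [sign * a for a in split(coeffs)]
        slack = [0] * n_slack
        if sense != "=":
            slack[s] = 1                           # one slack per inequality
            s += 1
        equalities.append(row + slack + [sign * rhs])
    return objective, equalities

# Example 7.3.5: x3 (index 2) is free; columns become x1, x2, x3+, x3-, x4, x5.
objective, equalities = to_standard_form(
    c=[-3, 1, 1],
    rows=[([-4, 1, 2], "<=", 3), ([1, -2, 1], "<=", 11), ([-2, 0, 2], "=", 1)],
    free={2},
)
```

For the data of Example 7.3.5 the returned rows reproduce the coefficients of the model (L∗ ) above, with six nonnegative variables and three equality constraints.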


7.4 Fermat rules

Let f : ℝ → ℝ be a function defined on ℝ and with values in ℝ. If f(x∗ ) ≤ f(x) for all x ∈ [x∗ − ε, x∗ + ε] (ε > 0), then we say x∗ minimizes f on the interval [x∗ − ε, x∗ + ε].

Fermat rule I

Let f : ℝ → ℝ be a smooth function. If x∗ minimizes f on the interval [x∗ − ε, x∗ + ε] (ε > 0), then f′(x∗ ) = 0.

The meaning of Fermat rule I is that the tangent to the graph of the smooth function f at the point (x∗ , f(x∗ )) is parallel to the real axis. Now let f : ℝn → ℝ be a function defined on ℝn and with values in ℝ. If f(x∗ ) ≤ f(x) for all x ∈ B(x∗ , ε), where B(x∗ , ε) is a closed ball of radius ε > 0 centered at x∗ ∈ ℝn , then we say x∗ minimizes f on B(x∗ , ε). Relying on this concept, Fermat rule I is extended to functions defined on the n-dimensional space ℝn .

Fermat rule II

Let f : ℝn → ℝ be a smooth function. If x∗ minimizes f on B(x∗ , ε) (ε > 0), then

\[
\nabla f(x^*) = \left( \frac{\partial f(x^*)}{\partial x_1}, \ldots, \frac{\partial f(x^*)}{\partial x_n} \right) = 0,
\]

i.e., the gradient of f(x) at x∗ is equal to zero, where x = (x1 , . . . , x n ) and x∗ = (x∗1 , . . . , x∗n ).

The extension of the Fermat rule to convex functions needs the following concepts. Let X be a subset of ℝn . If, for any x1 , x2 ∈ X, the line segment [x1 , x2 ] ⊂ X, i.e.,

\[
x_\tau = (1 - \tau) x_1 + \tau x_2 \in X \qquad (0 \le \tau \le 1),
\]

then the set X is said to be a convex set of ℝn . It is clear by the definition of convex sets that
(a) a set consisting of a single point is a convex set;
(b) the empty set is a convex set;
(c) if X ⊂ ℝn is a convex set, then the set S = {ax + b | x ∈ X, a, b ∈ ℝ} is a convex set.

Convex cones are a particularly important subclass of convex sets. For example, {0}, ℝ+n = [0, ∞)n , ℝn , and closed half-spaces are all convex cones.

Let f : ℝn → ℝ and X ⊂ ℝn be a convex set. If, for any x1 , x2 ∈ X,

\[
f((1 - \tau) x_1 + \tau x_2) \le (1 - \tau) f(x_1) + \tau f(x_2) \qquad (0 \le \tau \le 1),
\]

then f is said to be a convex function on X. If, for any x1 , x2 ∈ X with x1 ≠ x2 ,

\[
f((1 - \tau) x_1 + \tau x_2) < (1 - \tau) f(x_1) + \tau f(x_2) \qquad (0 < \tau < 1),
\]


then f is said to be a strictly convex function on X. If f is a convex function on X, then g = −f is said to be a concave function on X. Clearly, the linear function f(x) = ⟨a, x⟩ is both a convex function and a concave function.

The subderivative and the set of subgradients of convex functions are two fundamental concepts of subdifferential calculus. They are defined as follows.

Definition 7.4.1. Let f : ℝn → ℝ be convex and its effective domain be dom f = {x ∈ ℝn | f(x) < ∞}. The subderivative of f at x∗ ∈ dom f in direction w is defined as

\[
\frac{\partial^+ f(x^*)}{\partial w} = \lim_{\lambda \to 0+} \frac{f(x^* + \lambda w) - f(x^*)}{\lambda}
\qquad \text{for any } w \in \mathbb{R}^n.
\]

The set of subgradients of f at x∗ ∈ dom f is defined as

\[
\partial f(x^*) = \{\, y \in \mathbb{R}^n \mid f(x) \ge f(x^*) + \langle y, x - x^* \rangle \ \text{for any } x \in \mathbb{R}^n \,\}.
\tag{7.4.1}
\]

The set of subgradients depends on the subderivatives. If we set x − x∗ = λw (λ > 0) in (7.4.1), then the inequality f(x) ≥ f(x∗ ) + ⟨y, x − x∗ ⟩ is equivalent to

\[
\frac{f(x^* + \lambda w) - f(x^*)}{\lambda} \ge \langle y, w \rangle.
\]

Since f is convex, the ratio on the left-hand side is monotonically nonincreasing as λ decreases, and it tends to ∂+ f(x∗ )/∂w as λ → 0+. Let λ → 0+ on both sides of the above inequality. Then

\[
\frac{\partial^+ f(x^*)}{\partial w} \ge \langle y, w \rangle.
\]

So (7.4.1) has an equivalent form as follows:

\[
\partial f(x^*) = \left\{\, y \in \mathbb{R}^n \;\middle|\; \frac{\partial^+ f(x^*)}{\partial w} \ge \langle y, w \rangle \ \text{for any } w \in \mathbb{R}^n \,\right\}.
\tag{7.4.2}
\]

From this, it is seen that the set of subgradients depends on the subderivatives. The Fermat rule for convex functions defined on ℝn is as follows.

Fermat rule III

Let f : ℝn → ℝ be a convex function. Then x∗ ∈ ℝn minimizes f on B(x∗ , ε) (ε > 0) if and only if 0 ∈ ∂f(x∗ ).

Proof. By the definition, x∗ minimizes f on B(x∗ , ε) if and only if f(x) ≥ f(x∗ ) for any x ∈ B(x∗ , ε). Let x = x∗ + λw, where w ∈ ℝn and λ > 0. Then f(x) ≥ f(x∗ ) for any x ∈ B(x∗ , ε) if and only if f(x∗ + λw) ≥ f(x∗ ) + λ⟨0, w⟩. By Definition 7.4.1, f(x∗ + λw) ≥ f(x∗ ) + λ⟨0, w⟩ if and only if ∂+ f(x∗ )/∂w ≥ ⟨0, w⟩ (w ∈ ℝn ). Finally, by (7.4.2), it follows that ∂+ f(x∗ )/∂w ≥ ⟨0, w⟩ (w ∈ ℝn ) if and only if 0 ∈ ∂f(x∗ ).
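As a small illustration (an added example, not taken from the source), take n = 1 and f(x) = |x|, which is convex but not differentiable at x∗ = 0. The subderivative and the set of subgradients at 0 can be computed directly from Definition 7.4.1 and (7.4.2):

```latex
\[
\frac{\partial^{+} f(0)}{\partial w}
  = \lim_{\lambda \to 0+} \frac{|\lambda w| - |0|}{\lambda} = |w|,
\qquad
\partial f(0)
  = \{\, y \in \mathbb{R} \mid |w| \ge y\,w \ \text{for any } w \in \mathbb{R} \,\}
  = [-1, 1].
\]
```

Since 0 ∈ [−1, 1], Fermat rule III confirms that x∗ = 0 minimizes |x|, even though f′(0) does not exist and Fermat rule I cannot be applied there.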

To state a variant of Fermat rule III, the following two concepts are required.


Definition 7.4.2. Let Ω be a set and

\[
i_\Omega(x) =
\begin{cases}
0 & \text{if } x \in \Omega,\\
\infty & \text{otherwise.}
\end{cases}
\]

Then i Ω (x) is said to be the indicator function of the set Ω.

It is clear by Definition 7.4.2 that the indicator function i Ω (x) is convex if and only if the set Ω is convex.

Definition 7.4.3. Let Ω ⊂ ℝn be a nonempty closed convex set and x∗ ∈ Ω. If υ ∈ ℝn satisfies ⟨υ, x − x∗ ⟩ ≤ 0 for all x ∈ Ω, then the vector υ is said to be normal to Ω at x∗ , denoted by υ ∈ N Ω (x∗ ), where N Ω (x∗ ) is said to be the normal cone to Ω at x∗ .

Assume that Ω ⊂ ℝn is a nonempty closed convex set and x∗ ∈ Ω. Then the set of subgradients of the indicator function of Ω is just the normal cone to Ω, i.e.,

\[
\partial i_\Omega(x^*) = N_\Omega(x^*).
\tag{7.4.3}
\]

In fact, from Definition 7.4.3, it follows that

\[
N_\Omega(x^*) = \bigcap_{x \in \Omega} \{\, \upsilon \in \mathbb{R}^n \mid \langle \upsilon, x - x^* \rangle \le 0 \,\},
\tag{7.4.4}
\]

i.e., N Ω (x∗ ) is the intersection of closed half-spaces. Since x∗ ∈ Ω and x ∈ Ω, it is clear by Definition 7.4.2 that i Ω (x∗ ) = 0, i Ω (x) = 0. From this and (7.4.1), the set of subgradients of i Ω (x) (x ∈ Ω) at x∗ is

\[
\begin{aligned}
\partial i_\Omega(x^*)
&= \{\, \upsilon \in \mathbb{R}^n \mid i_\Omega(x) \ge i_\Omega(x^*) + \langle \upsilon, x - x^* \rangle \ \text{for any } x \in \Omega \,\}\\
&= \{\, \upsilon \in \mathbb{R}^n \mid \langle \upsilon, x - x^* \rangle \le 0 \ \text{for any } x \in \Omega \,\}\\
&= \bigcap_{x \in \Omega} \{\, \upsilon \in \mathbb{R}^n \mid \langle \upsilon, x - x^* \rangle \le 0 \,\}.
\end{aligned}
\]

The combination of this and (7.4.4) gives (7.4.3). The following rule is a variant of Fermat rule III.

Fermat rule IV

Let f : ℝn → ℝ be a convex function and Ω ⊂ ℝn be a nonempty closed convex set. Then x∗ minimizes f(x) on Ω if and only if 0 ∈ (∂f(x∗ ) + N Ω (x∗ )), which is in turn equivalent to the existence of a υ ∈ ∂f(x∗ ) such that −υ ∈ N Ω (x∗ ).


Proof. By Definition 7.4.2, i Ω (x) = 0 for all x ∈ Ω and i Ω (x) = ∞ otherwise. Hence x∗ minimizes f(x) on Ω if and only if it minimizes f(x) + i Ω (x) on ℝn . By Fermat rule III, x∗ minimizes f(x) + i Ω (x) if and only if 0 ∈ ∂(f + i Ω )(x∗ ). By Definition 7.4.1 and (7.4.3), it follows that

\[
\partial (f + i_\Omega)(x^*) = \partial f(x^*) + \partial i_\Omega(x^*) = \partial f(x^*) + N_\Omega(x^*).
\]

Therefore, x∗ minimizes f(x) on Ω if and only if 0 ∈ (∂f(x∗ ) + N Ω (x∗ )), which is in turn equivalent to the existence of a υ ∈ ∂f(x∗ ) such that −υ ∈ N Ω (x∗ ).

An application of Fermat rules is to solve the nonnegative constrained convex optimization model

\[
(L_{nc}):\quad \min_x f(x) \quad \text{subject to} \quad x \in \mathbb{R}^n_+,
\]

where the function f : ℝn → ℝ is convex and ℝ+n = [0, ∞)n is the n-dimensional nonnegative orthant. The following result is given. Let x∗ = (x∗1 , . . . , x∗n ) ∈ ℝ+n . Then x∗ is an optimal solution of the model (L nc ) if and only if there is a υ ∈ ∂f(x∗ ) such that

\[
\upsilon_i \ge 0 \quad (i = 1, \ldots, n), \qquad
\upsilon_i x_i^* = 0 \quad (i = 1, \ldots, n),
\]

where υ = (υ1 , . . . , υ n ). In fact, according to Fermat rule IV, x∗ is an optimal solution of the model (L nc ) if and only if there is a υ ∈ ∂f(x∗ ) such that −υ ∈ N ℝ+n (x∗ ). However, −υ ∈ N ℝ+n (x∗ ) if and only if −υ i ∈ N ℝ+ (x∗i ) (i = 1, . . . , n), where υ = (υ1 , . . . , υ n ), x∗ = (x∗1 , . . . , x∗n ), ℝ+n = [0, ∞)n , and ℝ+ = [0, ∞). By Definition 7.4.3, a direct computation shows that for i = 1, . . . , n,

\[
N_{\mathbb{R}_+}(x_i^*) =
\begin{cases}
\mathbb{R}_- & \text{if } x_i^* = 0,\\
\{0\} & \text{if } 0 < x_i^* < \infty,
\end{cases}
\]

where ℝ− = (−∞, 0]. So −υ i ∈ N ℝ+ (x∗i ) (i = 1, . . . , n) if and only if υ i ≥ 0 and υ i x∗i = 0 (i = 1, . . . , n). Therefore, x∗ is an optimal solution of the model (L nc ) if and only if there is a υ ∈ ∂f(x∗ ) such that

\[
\upsilon_i \ge 0 \quad (i = 1, \ldots, n), \qquad
\upsilon_i x_i^* = 0 \quad (i = 1, \ldots, n).
\]
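The two conditions can be checked numerically on a simple case. The sketch below is an added illustration under explicit assumptions: it takes the smooth convex function f(x) = ½‖x − p‖² for a given point p (so that ∂f(x) consists of the single gradient x − p) and uses the componentwise candidate x∗ = max(p, 0); the function name `nonneg_optimality` is hypothetical.

```python
def nonneg_optimality(p):
    """Check v_i >= 0 and v_i * x_i = 0 for f(x) = 0.5 * ||x - p||^2 on x >= 0."""
    x_star = [max(pi, 0.0) for pi in p]           # candidate minimizer
    v = [xi - pi for xi, pi in zip(x_star, p)]    # gradient of f at x_star
    sign_ok = all(vi >= 0 for vi in v)
    slack_ok = all(vi * xi == 0 for vi, xi in zip(v, x_star))
    return x_star, v, sign_ok and slack_ok

x_star, v, ok = nonneg_optimality([1.5, -2.0, 0.0])
```

Here the first component lies in the interior (x∗1 > 0, so υ1 must vanish), while the second lies on the boundary (x∗2 = 0, so υ2 may be positive), which mirrors the two branches of the normal cone N ℝ+ (x∗i ).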

7.5 Karush–Kuhn–Tucker optimality conditions

Karush–Kuhn–Tucker (KKT) conditions are optimality conditions for optimization models. These conditions are expressed in terms of KKT multipliers. In order to give the KKT conditions, we first introduce some propositions.


Proposition 7.5.1. Let Ω be a hyperplane and

\[
\Omega = \{\, x \in \mathbb{R}^n \mid \langle a, x \rangle = b \,\} \qquad (0 \ne a \in \mathbb{R}^n,\ b \in \mathbb{R}).
\]

Then, for x∗ ∈ Ω, the vector −υ ∈ N Ω (x∗ ) if and only if there is a λ∗ ∈ ℝ such that υ = λ∗ a.

Proof. Let U = Ω − x∗ . For each x ∈ U, there is an x′ ∈ Ω such that x = x′ − x∗ . Note that ⟨a, x′⟩ = b (x′ ∈ Ω) and ⟨a, x∗ ⟩ = b (x∗ ∈ Ω). It follows that for each x ∈ U,

\[
\langle a, x \rangle = \langle a, x' - x^* \rangle = \langle a, x' \rangle - \langle a, x^* \rangle = 0.
\]

So U = { x | ⟨a, x⟩ = 0 }. Let V = { λa | λ ∈ ℝ }. For any x ∈ U and any λa ∈ V, ⟨λa, x⟩ = λ⟨a, x⟩ = 0. So U ⊥ V.

Now we prove that N Ω (x∗ ) = V. If υ ∈ V, then υ = λa. For all x ∈ Ω,

\[
\langle \upsilon, x - x^* \rangle = \langle \lambda a, x - x^* \rangle = \lambda(\langle a, x \rangle - \langle a, x^* \rangle) = \lambda(b - b) = 0.
\]

By Definition 7.4.3, it follows that υ ∈ N Ω (x∗ ). So V ⊂ N Ω (x∗ ).

Conversely, we show that if υ ∈ N Ω (x∗ ), then υ ∈ V. Suppose υ ∉ V. Since U ⊥ V, υ = λa + υ̂ for some λ ∈ ℝ and 0 ≠ υ̂ ∈ U. On the one hand, since υ̂ ≠ 0,

\[
\langle \hat\upsilon, \hat\upsilon \rangle > 0.
\tag{7.5.1}
\]

On the other hand, since υ̂ ∈ U, there is an x̂ ∈ Ω such that υ̂ = x̂ − x∗ . Again, by the assumption υ ∈ N Ω (x∗ ), it follows by Definition 7.4.3 that

\[
\langle \upsilon, \hat x - x^* \rangle \le 0.
\tag{7.5.2}
\]

The left-hand side of (7.5.2) becomes

\[
\langle \upsilon, \hat x - x^* \rangle
= \langle \lambda a + \hat\upsilon, \hat x - x^* \rangle
= \lambda(\langle a, \hat x \rangle - \langle a, x^* \rangle) + \langle \hat\upsilon, \hat x - x^* \rangle
= \lambda(b - b) + \langle \hat\upsilon, \hat x - x^* \rangle
= \langle \hat\upsilon, \hat\upsilon \rangle,
\]

and so ⟨υ̂, υ̂⟩ ≤ 0. This is in contradiction with (7.5.1). Thus, υ ∈ V. So N Ω (x∗ ) ⊂ V. Hence N Ω (x∗ ) = V. This implies that −υ ∈ N Ω (x∗ ) if and only if −υ ∈ V. Note that V = {λa | λ ∈ ℝ}. Then, −υ ∈ V if and only if there is a λ∗ ∈ ℝ such that υ = λ∗ a. So Proposition 7.5.1 follows.


Proposition 7.5.2. Let Ω be a closed half-space and

\[
\Omega = \{\, x \in \mathbb{R}^n \mid \langle a, x \rangle \ge b \,\} \qquad (0 \ne a \in \mathbb{R}^n,\ b \in \mathbb{R}).
\]

Then, for x∗ ∈ Ω, the vector −υ ∈ N Ω (x∗ ) if and only if there is a λ∗ ≥ 0 such that

\[
\lambda^* (\langle a, x^* \rangle - b) = 0, \qquad \upsilon = \lambda^* a.
\]

Proof. There are two cases.

Case 1. ⟨a, x∗ ⟩ > b. Then x∗ is an interior point of Ω. So N Ω (x∗ ) = { 0 }. The conclusion holds clearly with λ∗ = 0.

Case 2. ⟨a, x∗ ⟩ = b. Let Ω1 = { x ∈ ℝn | ⟨a, x⟩ > b } and Ω2 = { x ∈ ℝn | ⟨a, x⟩ = b }, where 0 ≠ a ∈ ℝn and b ∈ ℝ. Then Ω = Ω1 ⋃ Ω2 and x∗ ∈ Ω2 . Let U = Ω2 − x∗ . An argument similar to that of Proposition 7.5.1 implies that U = { x ∈ ℝn | ⟨a, x⟩ = 0 }. Let V = { λa | λ ≤ 0 }. For any x ∈ U and any λa ∈ V, ⟨λa, x⟩ = λ⟨a, x⟩ = 0. So U ⊥ V.

Now we prove N Ω (x∗ ) = V. First, we prove that if υ ∈ N Ω (x∗ ), then υ ∈ V. Suppose υ ∉ V. Since U ⊥ V, υ = λa + υ̂ for some λ ≤ 0 and 0 ≠ υ̂ ∈ U. On the one hand, since υ̂ ≠ 0,

\[
\langle \hat\upsilon, \hat\upsilon \rangle > 0.
\tag{7.5.3}
\]

On the other hand, since υ̂ ∈ U, there is an x̂ ∈ Ω2 such that υ̂ = x̂ − x∗ . Since Ω = Ω1 ⋃ Ω2 , clearly, x̂ ∈ Ω. Again, by the assumption that υ ∈ N Ω (x∗ ), it follows by Definition 7.4.3 that

\[
\langle \upsilon, \hat x - x^* \rangle \le 0.
\tag{7.5.4}
\]

Since υ = λa + υ̂, where υ̂ = x̂ − x∗ and x̂ ∈ Ω2 , the left-hand side of (7.5.4) becomes

\[
\langle \upsilon, \hat x - x^* \rangle
= \langle \lambda a + \hat\upsilon, \hat x - x^* \rangle
= \lambda(\langle a, \hat x \rangle - \langle a, x^* \rangle) + \langle \hat\upsilon, \hat x - x^* \rangle
= \lambda(b - b) + \langle \hat\upsilon, \hat\upsilon \rangle
= \langle \hat\upsilon, \hat\upsilon \rangle,
\]

and so ⟨υ̂, υ̂⟩ ≤ 0. This is in contradiction with (7.5.3). Thus, υ ∈ V. So N Ω (x∗ ) ⊂ V.

Next, we prove V ⊂ N Ω (x∗ ). Let υ ∈ V. Then υ = λa, where λ ≤ 0. Note that x∗ ∈ Ω2 . For all x ∈ Ω,

\[
\langle \upsilon, x - x^* \rangle = \langle \lambda a, x - x^* \rangle = \lambda(\langle a, x \rangle - \langle a, x^* \rangle) = \lambda(\langle a, x \rangle - b) \le 0.
\]

By Definition 7.4.3, υ ∈ N Ω (x∗ ). So V ⊂ N Ω (x∗ ).

Hence V = N Ω (x∗ ). This implies that −υ ∈ N Ω (x∗ ) if and only if −υ ∈ V. Note that V = { λa | λ ≤ 0 }. Then, −υ ∈ V if and only if there is a λ∗ ≥ 0 such that υ = λ∗ a and λ∗ (⟨a, x∗ ⟩ − b) = 0 since the assumption in Case 2 is ⟨a, x∗ ⟩ = b. So Proposition 7.5.2 follows.

Propositions 7.5.1 and 7.5.2 can be generalized as follows.

Proposition 7.5.3. Let Ω be a nonempty polyhedral set and

\[
\Omega = \{\, x \in \mathbb{R}^n \mid \langle a_i, x \rangle \ge b_i \ (i = 1, \ldots, s),\ \langle a_i, x \rangle = b_i \ (i = s+1, \ldots, m) \,\},
\]

where 0 ≠ a i ∈ ℝn and b i ∈ ℝ. Then for x∗ ∈ Ω, the vector −υ ∈ N Ω (x∗ ) if and only if there are λ∗i ≥ 0 (i = 1, . . . , s) and λ∗i ∈ ℝ (i = s + 1, . . . , m) such that

\[
\lambda_i^* (\langle a_i, x^* \rangle - b_i) = 0 \quad (i = 1, \ldots, s), \qquad
\upsilon = \lambda_1^* a_1 + \cdots + \lambda_s^* a_s + \lambda_{s+1}^* a_{s+1} + \cdots + \lambda_m^* a_m.
\]

Proof. Let

\[
\Omega_i^1 = \{\, x \in \mathbb{R}^n \mid \langle a_i, x \rangle \ge b_i \,\} \quad (i = 1, \ldots, s), \qquad
\Omega_i^2 = \{\, x \in \mathbb{R}^n \mid \langle a_i, x \rangle = b_i \,\} \quad (i = s+1, \ldots, m),
\]

and

\[
\Omega_1 = \bigcap_{i=1}^{s} \Omega_i^1, \qquad \Omega_2 = \bigcap_{i=s+1}^{m} \Omega_i^2.
\]

Then both Ω1 and Ω2 are polyhedral sets, and Ω = Ω1 ⋂ Ω2 is also a polyhedral set. It is well known that the normal cones of polyhedral sets have the following property:

\[
N_\Omega(x^*) = N_{\Omega_1}(x^*) + N_{\Omega_2}(x^*)
= N_{\Omega_1^1}(x^*) + \cdots + N_{\Omega_s^1}(x^*) + N_{\Omega_{s+1}^2}(x^*) + \cdots + N_{\Omega_m^2}(x^*).
\]

So −υ ∈ N Ω (x∗ ) if and only if there exist υ^1_i (i = 1, . . . , s) and υ^2_i (i = s + 1, . . . , m) such that

\[
-\upsilon_i^1 \in N_{\Omega_i^1}(x^*) \quad (i = 1, \ldots, s), \qquad
-\upsilon_i^2 \in N_{\Omega_i^2}(x^*) \quad (i = s+1, \ldots, m),
\]

where υ = υ^1_1 + ⋅ ⋅ ⋅ + υ^1_s + υ^2_{s+1} + ⋅ ⋅ ⋅ + υ^2_m . By Proposition 7.5.2, −υ^1_i ∈ N Ω^1_i (x∗ ) (i = 1, . . . , s) if and only if there is a λ∗i ≥ 0 (i = 1, . . . , s) such that

\[
\lambda_i^* (\langle a_i, x^* \rangle - b_i) = 0 \quad (i = 1, \ldots, s), \qquad
\upsilon_i^1 = \lambda_i^* a_i \quad (i = 1, \ldots, s).
\]

By Proposition 7.5.1, −υ^2_i ∈ N Ω^2_i (x∗ ) (i = s + 1, . . . , m) if and only if there is a λ∗i ∈ ℝ (i = s + 1, . . . , m) such that

\[
\upsilon_i^2 = \lambda_i^* a_i \quad (i = s+1, \ldots, m).
\]


Therefore, −υ ∈ N Ω (x∗ ) if and only if there are λ∗i ≥ 0 (i = 1, . . . , s) and λ∗i ∈ ℝ (i = s + 1, . . . , m) such that

\[
\lambda_i^* (\langle a_i, x^* \rangle - b_i) = 0 \quad (i = 1, \ldots, s), \qquad
\upsilon = \lambda_1^* a_1 + \cdots + \lambda_s^* a_s + \lambda_{s+1}^* a_{s+1} + \cdots + \lambda_m^* a_m.
\]

Now we solve the convex optimization models by Propositions 7.5.1, 7.5.2, and 7.5.3. The simplest convex optimization model with an equality constraint has the form

\[
(L_{ce}):\quad \min_x f(x) \quad \text{subject to} \quad \langle a, x \rangle = b,
\]

where f : ℝn → ℝ is convex and 0 ≠ a ∈ ℝn , x ∈ ℝn , b ∈ ℝ. Its feasible set is Ω ce = { x ∈ ℝn | ⟨a, x⟩ = b }.

Theorem 7.5.4. x∗ ∈ Ω ce is the optimal solution of the model (L ce ) if and only if there is a λ∗ ∈ ℝ such that λ∗ a ∈ ∂f(x∗ ), where ⟨a, x∗ ⟩ = b.

Proof. The feasible set Ω ce is a hyperplane, and x∗ is the optimal solution exactly when x∗ minimizes f on Ω ce , where ⟨a, x∗ ⟩ = b. Fermat rule IV shows that x∗ minimizes f on Ω ce if and only if there is a υ ∈ ∂f(x∗ ) such that −υ ∈ N Ω ce (x∗ ). Proposition 7.5.1 shows that when the feasible set Ω ce is a hyperplane, −υ ∈ N Ω ce (x∗ ) if and only if there is a λ∗ ∈ ℝ such that υ = λ∗ a. Note that υ ∈ ∂f(x∗ ). Therefore, x∗ ∈ Ω ce is the optimal solution of the model (L ce ) if and only if there is a λ∗ ∈ ℝ such that λ∗ a ∈ ∂f(x∗ ), where ⟨a, x∗ ⟩ = b.

The simplest convex optimization model with an inequality constraint has the form

\[
(L_{ci}):\quad \min_x f(x) \quad \text{subject to} \quad \langle a, x \rangle \ge b,
\]

where f : ℝn → ℝ is convex and 0 ≠ a ∈ ℝn , x ∈ ℝn , b ∈ ℝ. Its feasible set is Ω ci = { x ∈ ℝn | ⟨a, x⟩ ≥ b }.

Theorem 7.5.5. x∗ ∈ Ω ci is the optimal solution of the model (L ci ) if and only if there is a λ∗ ≥ 0 such that λ∗ a ∈ ∂f(x∗ ) and λ∗ (⟨a, x∗ ⟩ − b) = 0, where ⟨a, x∗ ⟩ ≥ b.

Proof. The feasible set Ω ci is a half-space, and x∗ is the optimal solution exactly when x∗ minimizes f on Ω ci , where ⟨a, x∗ ⟩ ≥ b. Fermat rule IV says that x∗ minimizes f on Ω ci if and only if there is a υ ∈ ∂f(x∗ ) such that −υ ∈ N Ω ci (x∗ ). Proposition 7.5.2 says that when the feasible set Ω ci is a half-space, −υ ∈ N Ω ci (x∗ ) if and only if there is a λ∗ ≥ 0 such that λ∗ (⟨a, x∗ ⟩ − b) = 0, υ = λ∗ a. From this and υ ∈ ∂f(x∗ ), it follows that x∗ is the optimal solution of the model (L ci ) if and only if there is a λ∗ ≥ 0 such that λ∗ (⟨a, x∗ ⟩ − b) = 0 and λ∗ a ∈ ∂f(x∗ ).
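The conditions of Theorem 7.5.5 are easy to test on a concrete instance. The following sketch is an added illustration: it assumes the smooth convex function f(x) = ½‖x‖² (whose only subgradient at x is x itself), the data a = (3, 4), b = 5, and a hypothetical checker named `satisfies_theorem_755`. For b > 0 the candidate is λ∗ = b/‖a‖² and x∗ = λ∗ a.

```python
from fractions import Fraction

def satisfies_theorem_755(x, lam, a, b, grad_f):
    """Check feasibility, lam >= 0, complementary slackness, and lam*a == grad f(x)."""
    ax = sum(ai * xi for ai, xi in zip(a, x))
    return (ax >= b and lam >= 0 and lam * (ax - b) == 0
            and [lam * ai for ai in a] == grad_f(x))

a, b = [Fraction(3), Fraction(4)], Fraction(5)
grad_f = lambda x: list(x)                  # f(x) = 0.5 * ||x||^2, so grad f(x) = x
lam_star = b / sum(ai * ai for ai in a)     # 5 / 25 = 1/5
x_star = [lam_star * ai for ai in a]        # (3/5, 4/5)

ok_candidate = satisfies_theorem_755(x_star, lam_star, a, b, grad_f)
ok_origin = satisfies_theorem_755([Fraction(0), Fraction(0)], Fraction(0), a, b, grad_f)
```

The candidate passes all four checks, whereas the unconstrained minimizer x = 0 with λ = 0 fails because it is infeasible. The constraint is active at x∗ , so complementary slackness holds with a positive multiplier.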


The linearly constrained convex optimization model has the form

\[
(L_{lc}):\quad \min_x f(x) \quad \text{subject to} \quad
\begin{cases}
\text{inequality constraints:} & \langle a_i, x \rangle \ge b_i \quad (i = 1, \ldots, s),\\
\text{equality constraints:} & \langle a_i, x \rangle = b_i \quad (i = s+1, \ldots, m),\\
\text{restriction:} & x \in X \subset \mathbb{R}^n,
\end{cases}
\]

where f : ℝn → ℝ is convex and X is a polyhedral set, and a i , x ∈ ℝn , b i ∈ ℝ (i = 1, . . . , m). Let X lc = { x ∈ ℝn | ⟨a i , x⟩ ≥ b i (i = 1, . . . , s); ⟨a i , x⟩ = b i (i = s + 1, . . . , m) }. Its feasible set is Ω lc = X lc ⋂ X. Both X and X lc are polyhedral sets, and so is Ω lc . The assumption that the function f : ℝn → ℝ is convex and the set X is a polyhedral set is called the blanket assumption. The convexity of the function ensures that the function is continuous.

Theorem 7.5.6. x∗ ∈ Ω lc is the optimal solution of the model (L lc ) if and only if the following three conditions hold:
(a) ⟨a i , x∗ ⟩ ≥ b i (i = 1, . . . , s) and ⟨a i , x∗ ⟩ = b i (i = s + 1, . . . , m);
(b) there are m multipliers λ∗1 , . . . , λ∗m such that

\[
\lambda_i^* \ge 0 \quad (i = 1, \ldots, s), \qquad
\lambda_i^* \in \mathbb{R} \quad (i = s+1, \ldots, m), \qquad
\lambda_i^* (\langle a_i, x^* \rangle - b_i) = 0 \quad (i = 1, \ldots, s);
\]

(c) x∗ minimizes f(x) − ⟨λ∗1 a1 + ⋅ ⋅ ⋅ + λ∗m a m , x⟩ on X.

Here (c) is equivalent to the existence of a υ ∈ ∂f(x∗ ) such that (−υ + λ∗1 a1 + ⋅ ⋅ ⋅ + λ∗m a m ) ∈ N X (x∗ ).

The multipliers λ∗1 , . . . , λ∗m in Theorem 7.5.6 are called KKT multipliers of the linearly constrained convex optimization model. The conditions (a)–(c) in Theorem 7.5.6 are called KKT-optimality conditions of the linearly constrained convex optimization model.

Proof. By x∗ ∈ Ω lc , it is clear that

\[
\langle a_i, x^* \rangle \ge b_i \quad (i = 1, \ldots, s), \qquad
\langle a_i, x^* \rangle = b_i \quad (i = s+1, \ldots, m).
\]

By Fermat rule IV, x∗ minimizes f on Ω lc if and only if 0 ∈ (∂f(x∗ ) + N Ω lc (x∗ )). Note that Ω lc = X lc ⋂ X.


It is well known that normal cones of polyhedral sets have the following property: N Ω lc (x∗ ) = N X lc (x∗ ) + N X (x∗ ). So 0 ∈ (∂f(x∗ ) + N Ω lc (x∗ )) if and only if

\[
0 \in \left( \partial f(x^*) + N_{X_{lc}}(x^*) + N_X(x^*) \right).
\tag{7.5.5}
\]

By Fermat rule IV, this is further equivalent to the existence of a υ ∈ (∂f(x∗ ) + N X (x∗ )) such that −υ ∈ N X lc (x∗ ). By Proposition 7.5.3, −υ ∈ N X lc (x∗ ) if and only if there are λ∗i ≥ 0 (i = 1, . . . , s) and λ∗i ∈ ℝ (i = s + 1, . . . , m) such that

\[
\lambda_i^* (\langle a_i, x^* \rangle - b_i) = 0 \quad (i = 1, \ldots, s), \qquad
\upsilon = \lambda_1^* a_1 + \cdots + \lambda_m^* a_m.
\]

Note that υ ∈ (∂f(x∗ ) + N X (x∗ )). Then (λ∗1 a1 + ⋅ ⋅ ⋅ + λ∗m a m ) ∈ (∂f(x∗ ) + N X (x∗ )) or

\[
0 \in \left( \partial f(x^*) + N_X(x^*) - \lambda_1^* a_1 - \cdots - \lambda_m^* a_m \right).
\tag{7.5.6}
\]

Note that ∂⟨λ∗i a i , x∗ ⟩ = λ∗i a i (i = 1, . . . , m). Then 0 ∈ (∂f(x∗ ) + N X (x∗ ) − ∂⟨λ∗1 a1 + ⋅ ⋅ ⋅ + λ∗m a m , x∗ ⟩) or

\[
0 \in \left( \partial \left[ f(x^*) - \langle \lambda_1^* a_1 + \cdots + \lambda_m^* a_m, x^* \rangle \right] + N_X(x^*) \right).
\tag{7.5.7}
\]

By Fermat rule IV, (7.5.7) holds if and only if x ∗ minimizes f(x) − ⟨λ∗1 a1 + ⋅ ⋅ ⋅ + λ∗m a m , x⟩ on X, and (7.5.6) holds if and only if there is υ ∈ ∂f(x∗ ) such that (−υ + λ∗1 a1 + ⋅ ⋅ ⋅ + λ∗m a m ) ∈ N X (x ∗ ). In the linearly constrained convex optimization model, if X = ℝ+r × ℝn−r , the corresponding optimization model is called a linearly nonnegative constrained convex optimization model. Again, if f(x) = ⟨c, x⟩, where c is an n-dimensional constant vector, the corresponding optimization model is a linear optimization model. For these two models, two corollaries of Theorem 7.5.6 are given as follows. Corollary 7.5.7. Let X = ℝ+r × ℝn−r . Then x ∗ ∈ Ω lc is an optimal solution of the linearly nonnegative constrained convex optimization model if and only if the following KKToptimality conditions (a)–(d) hold: (a) x∗j ≥ 0 (j = 1, . . . , r) and x∗j ∈ ℝ (j = r + 1, . . . , n), where x ∗ = (x∗1 , . . . , x∗n ); (b) ⟨a i , x ∗ ⟩ ≥ b i (i = 1, . . . , s) and ⟨a i , x ∗ ⟩ = b i (i = s + 1, . . . , m), where a i = (a i1 , . . . , a in ); (c) there are m KKT multipliers λ∗1 , . . . , λ∗m such that

\[
\lambda_i^* \ge 0 \quad (i = 1, \dots, s), \qquad
\lambda_i^* \in \mathbb{R} \quad (i = s+1, \dots, m), \qquad
\lambda_i^* \,(\langle a_i, x^* \rangle - b_i) = 0 \quad (i = 1, \dots, s);
\]

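(d) there is a υ ∈ ∂f(x∗) such that
\[
\begin{aligned}
\upsilon_j - \lambda_1^* a_{1j} - \cdots - \lambda_m^* a_{mj} &\ge 0 && (j = 1, \dots, r),\\
(\upsilon_j - \lambda_1^* a_{1j} - \cdots - \lambda_m^* a_{mj})\, x_j^* &= 0 && (j = 1, \dots, r),\\
\upsilon_j - \lambda_1^* a_{1j} - \cdots - \lambda_m^* a_{mj} &= 0 && (j = r+1, \dots, n),
\end{aligned}
\]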

where υ = (υ1 , . . . , υ n ). Proof. Theorem 7.5.6 gives that x ∗ ∈ Ω lc is the optimal solution of the model (L lc ) if and only if the following KKT-optimality conditions (a)–(c) hold: (a) ⟨a i , x ∗ ⟩ ≥ b i (i = 1, . . . , s) and ⟨a i , x ∗ ⟩ = b i (i = s + 1, . . . , m); (b) there are m KKT multipliers λ∗1 , . . . , λ∗m such that

\[
\lambda_i^* \ge 0 \quad (i = 1, \dots, s), \qquad
\lambda_i^* \in \mathbb{R} \quad (i = s+1, \dots, m), \qquad
\lambda_i^* \,(\langle a_i, x^* \rangle - b_i) = 0 \quad (i = 1, \dots, s);
\]

(c) there is a υ ∈ ∂f(x∗) such that
\[
-\upsilon + \lambda_1^* a_1 + \cdots + \lambda_m^* a_m \in N_{\mathbb{R}_+^r \times \mathbb{R}^{n-r}}(x^*).
\]
Let υ = (υ1 , . . . , υ n ) and a i = (a i1 , . . . , a in ) (i = 1, . . . , m) in the KKT-optimality condition (c). Then the inclusion in (c) holds if and only if
\[
\begin{aligned}
-\upsilon_j + \lambda_1^* a_{1j} + \cdots + \lambda_m^* a_{mj} &\in N_{\mathbb{R}_+}(x_j^*) && (j = 1, \dots, r),\\
-\upsilon_j + \lambda_1^* a_{1j} + \cdots + \lambda_m^* a_{mj} &\in N_{\mathbb{R}}(x_j^*) && (j = r+1, \dots, n).
\end{aligned}
\tag{7.5.8}
\]

By Definition 7.4.3, for j = 1, . . . , r,
\[
N_{\mathbb{R}_+}(x_j^*) =
\begin{cases}
\mathbb{R}_- & \text{if } x_j^* = 0,\\
\{0\} & \text{if } 0 < x_j^* < \infty,
\end{cases}
\]
where ℝ+ = [0, ∞) and ℝ− = (−∞, 0], and for j = r + 1, . . . , n, N ℝ (x∗j ) = { 0 }. From this and (7.5.8), it is seen that the following (a) and (b) hold:

(a) (−υ j + λ∗1 a1j + ⋅ ⋅ ⋅ + λ∗m a mj ) ∈ N ℝ+ (x∗j ) (j = 1, . . . , r) if and only if
\[
\begin{aligned}
\upsilon_j - \lambda_1^* a_{1j} - \cdots - \lambda_m^* a_{mj} &\ge 0 && (j = 1, \dots, r),\\
(\upsilon_j - \lambda_1^* a_{1j} - \cdots - \lambda_m^* a_{mj})\, x_j^* &= 0 && (j = 1, \dots, r);
\end{aligned}
\]

(b) (−υ j + λ∗1 a1j + ⋅ ⋅ ⋅ + λ∗m a mj ) ∈ N ℝ (x∗j ) (j = r + 1, . . . , n) if and only if
\[
\upsilon_j - \lambda_1^* a_{1j} - \cdots - \lambda_m^* a_{mj} = 0 \quad (j = r+1, \dots, n).
\]

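Therefore (−υ + λ∗1 a1 + ⋅ ⋅ ⋅ + λ∗m a m ) ∈ N X (x∗) with X = ℝ+r × ℝn−r if and only if
\[
\begin{aligned}
\upsilon_j - \lambda_1^* a_{1j} - \cdots - \lambda_m^* a_{mj} &\ge 0 && (j = 1, \dots, r),\\
(\upsilon_j - \lambda_1^* a_{1j} - \cdots - \lambda_m^* a_{mj})\, x_j^* &= 0 && (j = 1, \dots, r),\\
\upsilon_j - \lambda_1^* a_{1j} - \cdots - \lambda_m^* a_{mj} &= 0 && (j = r+1, \dots, n).
\end{aligned}
\]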

Corollary 7.5.7 follows. Corollary 7.5.8. Let X = ℝ+r × ℝn−r and f(x) = ⟨c, x⟩. Then x∗ ∈ Ω lc is the optimal solution of the linear optimization model if and only if the following KKT-optimality conditions hold: (a) x∗j ≥ 0 (j = 1, . . . , r) and x∗j ∈ ℝ (j = r + 1, . . . , n), where x ∗ = (x∗1 , . . . , x∗n ); (b) ⟨a i , x ∗ ⟩ ≥ b i (i = 1, . . . , s) and ⟨a i , x ∗ ⟩ = b i (i = s + 1, . . . , m), where a i = (a i1 , . . . , a in ); (c) there are m KKT multipliers λ∗1 , . . . , λ∗m such that

\[
\lambda_i^* \ge 0 \quad (i = 1, \dots, s), \qquad
\lambda_i^* \in \mathbb{R} \quad (i = s+1, \dots, m), \qquad
\lambda_i^* \,(\langle a_i, x^* \rangle - b_i) = 0 \quad (i = 1, \dots, s);
\]

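(d) for c = (c1 , . . . , c n ),
\[
\begin{aligned}
c_j - \lambda_1^* a_{1j} - \cdots - \lambda_m^* a_{mj} &\ge 0 && (j = 1, \dots, r),\\
(c_j - \lambda_1^* a_{1j} - \cdots - \lambda_m^* a_{mj})\, x_j^* &= 0 && (j = 1, \dots, r),\\
c_j - \lambda_1^* a_{1j} - \cdots - \lambda_m^* a_{mj} &= 0 && (j = r+1, \dots, n).
\end{aligned}
\]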

Proof. Corollary 7.5.8 follows immediately from Corollary 7.5.7 and ∂f(x ∗ ) = {c}.
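The conditions (a)–(d) of Corollary 7.5.8 can be checked mechanically for a candidate pair (x∗, λ∗). The following is a minimal sketch, not part of the original text: the function name `check_lp_kkt` and the small test problem (min 2x1 + 3x2 subject to x1 + x2 ≥ 4, x1 ≥ 0, x2 ≥ 0) are illustrative assumptions, and integer data are used so that all comparisons are exact.

```python
def check_lp_kkt(c, A, b, s, r, x, lam):
    """Check conditions (a)-(d) of Corollary 7.5.8 for a candidate (x, lam).

    Rows 0..s-1 of A are '>=' constraints, rows s..m-1 are '=' constraints.
    Variables 0..r-1 are nonnegative, variables r..n-1 are free.
    Exact arithmetic (ints or Fractions) is assumed.
    """
    m, n = len(A), len(c)
    row = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    # (a) sign restrictions on x
    if any(x[j] < 0 for j in range(r)):
        return False
    # (b) feasibility of the linear constraints
    if any(row[i] < b[i] for i in range(s)):
        return False
    if any(row[i] != b[i] for i in range(s, m)):
        return False
    # (c) multiplier signs and complementary slackness
    if any(lam[i] < 0 or lam[i] * (row[i] - b[i]) != 0 for i in range(s)):
        return False
    # (d) reduced costs c_j - sum_i lam_i * a_ij
    red = [c[j] - sum(lam[i] * A[i][j] for i in range(m)) for j in range(n)]
    if any(red[j] < 0 or red[j] * x[j] != 0 for j in range(r)):
        return False
    return all(red[j] == 0 for j in range(r, n))


# Hypothetical test problem: min 2*x1 + 3*x2  s.t.  x1 + x2 >= 4, x >= 0.
c, A, b = [2, 3], [[1, 1]], [4]
optimal = check_lp_kkt(c, A, b, s=1, r=2, x=[4, 0], lam=[2])
not_optimal = check_lp_kkt(c, A, b, s=1, r=2, x=[0, 4], lam=[2])
```

Here the point (4, 0) with multiplier 2 passes all four conditions, while the feasible point (0, 4) fails the complementarity part of (d) because its reduced cost in the second coordinate is positive.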

7.6 Primal and dual pairs of linear optimization

Assume that a bivariate function S(x, y) defined on ℝn × ℝm is a convex-concave bivariate function, i.e.,
– for all y ∈ ℝm , the function S(x, y) is convex with respect to x ∈ ℝn ,
– for all x ∈ ℝn , the function S(x, y) is concave with respect to y ∈ ℝm .
Then S(x, y) is called a saddle function on ℝn × ℝm . Assume further that there is a point (x∗ , y∗ ) such that y∗ maximizes S(x∗ , y) on ℝm and x∗ minimizes S(x, y∗ ) on ℝn . Then the point (x∗ , y∗ ) is called a saddle point of the saddle function S(x, y).

Since y∗ maximizes S(x∗ , y) on ℝm and x∗ minimizes S(x, y∗ ) on ℝn ,
\[
S(x^*, y) \le S(x^*, y^*), \qquad S(x, y^*) \ge S(x^*, y^*).
\]
From these two inequalities, it follows that if (x∗ , y∗ ) is a saddle point of a saddle function S(x, y), then
\[
S(x^*, y) \le S(x^*, y^*) \le S(x, y^*).
\]

Definition 7.6.1. The following two optimization models:
\[
(L_p):\ \min_{x \in \mathbb{R}^n} \sup_{y \in \mathbb{R}^m} S(x, y), \qquad
(L_d):\ \max_{y \in \mathbb{R}^m} \inf_{x \in \mathbb{R}^n} S(x, y)
\]
are called a primal/dual pair, where the model (L p ) is called the primal model and the model (L d ) is called the dual model. Let
\[
f(x) = \sup_{y \in \mathbb{R}^m} S(x, y), \qquad
g(y) = \inf_{x \in \mathbb{R}^n} S(x, y).
\]

By the convex-concave property of the saddle function S(x, y), it is easy to deduce that f(x) is a convex function and g(y) is a concave function.

Dual theorem. The following two conditions are equivalent:
(a) (x∗ , y∗ ) is a saddle point of the saddle function S(x, y),
(b) x∗ solves the primal model (L p ) and y∗ solves the dual model (L d ), and f(x∗ ) = g(y∗ ), and the optimal values of the primal model (L p ) and the dual model (L d ) are both equal to S(x∗ , y∗ ).

Now consider the linearly constrained convex optimization model (L) given in Section 7.5:
\[
\min_x f(x) \quad \text{subject to} \quad
\begin{cases}
\langle a_i, x \rangle \ge b_i & (i = 1, \dots, s),\\
\langle a_i, x \rangle = b_i & (i = s+1, \dots, m),\\
x \in X \subset \mathbb{R}^n,
\end{cases}
\]
where f : ℝn → ℝ is convex and X is a polyhedral set. Define its Lagrangian function as
\[
L(x, \lambda) =
\begin{cases}
f(x) + \sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x \rangle) & \text{if } x \in X,\ \lambda \in S,\\
\infty & \text{if } x \notin X,\\
-\infty & \text{if } x \in X,\ \lambda \notin S,
\end{cases}
\]
where S = ℝ+s × ℝm−s and λ = (λ1 , . . . , λ m ).


7.6.1 Saddle function

It is clear that the Lagrangian function L(x, λ) is a bivariate function on ℝn × ℝm . We will prove that L(x, λ) is a convex-concave bivariate function on ℝn × ℝm . Without loss of generality, consider x ∈ X and λ ∈ S. Then
\[
L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x \rangle).
\]
On the one hand, for all λ ∈ S, any x1 , x2 ∈ X, and 0 ≤ τ ≤ 1,
\[
L((1-\tau)x_1 + \tau x_2, \lambda) = f((1-\tau)x_1 + \tau x_2) + \sum_{i=1}^{m} \lambda_i \bigl(b_i - \langle a_i, (1-\tau)x_1 + \tau x_2 \rangle\bigr).
\]
Since f is a convex function and b i = (1 − τ)b i + τb i , it follows that
\[
\begin{aligned}
f((1-\tau)x_1 + \tau x_2) &\le (1-\tau) f(x_1) + \tau f(x_2),\\
b_i - \langle a_i, (1-\tau)x_1 + \tau x_2 \rangle &= (1-\tau)(b_i - \langle a_i, x_1 \rangle) + \tau (b_i - \langle a_i, x_2 \rangle).
\end{aligned}
\]
Substituting them into the above equality gives
\[
\begin{aligned}
L((1-\tau)x_1 + \tau x_2, \lambda)
&\le (1-\tau)\Bigl(f(x_1) + \sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x_1 \rangle)\Bigr)
 + \tau \Bigl(f(x_2) + \sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x_2 \rangle)\Bigr)\\
&= (1-\tau) L(x_1, \lambda) + \tau L(x_2, \lambda) \qquad (0 \le \tau \le 1),
\end{aligned}
\]
i.e., for all λ ∈ S, the Lagrangian function is convex with respect to x ∈ X.

On the other hand, for all x ∈ X, any λ^(1) , λ^(2) ∈ S with components λ^(1)_i , λ^(2)_i , and 0 ≤ τ ≤ 1,
\[
L(x, (1-\tau)\lambda^{(1)} + \tau \lambda^{(2)}) = f(x) + \sum_{i=1}^{m} \bigl((1-\tau)\lambda_i^{(1)} + \tau \lambda_i^{(2)}\bigr)(b_i - \langle a_i, x \rangle).
\]
Note that f(x) = (1 − τ)f(x) + τf(x). Substituting it into the above equality, we get
\[
\begin{aligned}
L(x, (1-\tau)\lambda^{(1)} + \tau \lambda^{(2)})
&= (1-\tau)\Bigl(f(x) + \sum_{i=1}^{m} \lambda_i^{(1)} (b_i - \langle a_i, x \rangle)\Bigr)
 + \tau \Bigl(f(x) + \sum_{i=1}^{m} \lambda_i^{(2)} (b_i - \langle a_i, x \rangle)\Bigr)\\
&= (1-\tau) L(x, \lambda^{(1)}) + \tau L(x, \lambda^{(2)}) \qquad (0 \le \tau \le 1),
\end{aligned}
\]
i.e., for all x ∈ X, the Lagrangian function is affine, and hence concave, with respect to λ ∈ S. Thus the Lagrangian function L(x, λ) is a saddle function.


7.6.2 Saddle point

Consider the KKT-optimality conditions in Theorem 7.5.6. Let a i = (a i1 , . . . , a in ) (i = 1, . . . , m), x = (x1 , . . . , x n ). Then
\[
\begin{aligned}
\langle \lambda_1^* a_1 + \lambda_2^* a_2 + \cdots + \lambda_m^* a_m, x \rangle
&= (\lambda_1^* a_{11} + \lambda_2^* a_{21} + \cdots + \lambda_m^* a_{m1}) x_1 + \cdots + (\lambda_1^* a_{1n} + \lambda_2^* a_{2n} + \cdots + \lambda_m^* a_{mn}) x_n\\
&= \lambda_1^* (a_{11} x_1 + \cdots + a_{1n} x_n) + \lambda_2^* (a_{21} x_1 + \cdots + a_{2n} x_n) + \cdots + \lambda_m^* (a_{m1} x_1 + \cdots + a_{mn} x_n)\\
&= \sum_{i=1}^{m} \lambda_i^* (a_{i1} x_1 + \cdots + a_{in} x_n) = \sum_{i=1}^{m} \lambda_i^* \langle a_i, x \rangle,
\end{aligned}
\]
and so
\[
f(x) - \langle \lambda_1^* a_1 + \cdots + \lambda_m^* a_m, x \rangle = f(x) - \sum_{i=1}^{m} \lambda_i^* \langle a_i, x \rangle.
\]
Note that ∑ λ∗i b i (summed over i = 1, . . . , m) is constant. Therefore, the KKT-optimality condition (c) of Theorem 7.5.6 means that x∗ minimizes L(x, λ∗ ). The KKT-optimality conditions (a) and (b) in Theorem 7.5.6 imply that
\[
\sum_{i=1}^{m} \lambda_i^* (b_i - \langle a_i, x^* \rangle) = 0.
\]
On the other hand, since λ = (λ1 , . . . , λ m ) ∈ ℝ+s × ℝm−s , it is clear that λ i ≥ 0 (i = 1, . . . , s). Note that ⟨a i , x∗⟩ ≥ b i (i = 1, . . . , s) and ⟨a i , x∗⟩ = b i (i = s + 1, . . . , m). Then
\[
\sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x^* \rangle)
= \sum_{i=1}^{s} \lambda_i (b_i - \langle a_i, x^* \rangle) + \sum_{i=s+1}^{m} \lambda_i (b_i - \langle a_i, x^* \rangle) \le 0.
\]
Hence L(x∗ , λ) ≤ f(x∗ ) = L(x∗ , λ∗ ) for all λ ∈ S, so λ∗ maximizes L(x∗ , λ). Thus, (x∗ , λ∗ ) is a saddle point of the Lagrange function L(x, λ) for the model (L).
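The saddle-point inequality L(x∗, λ) ≤ L(x∗, λ∗) ≤ L(x, λ) can be spot-checked numerically. The sketch below is illustrative only: the test problem (min 2x1 + 3x2 subject to x1 + x2 ≥ 4, x ≥ 0, with x∗ = (4, 0) and λ∗ = 2) and the sampling grids are assumptions chosen for this example, and a grid check is a sanity test rather than a proof.

```python
def lagrangian(c, A, b, x, lam):
    """L(x, lam) = <c, x> + sum_i lam_i * (b_i - <a_i, x>) for x in X, lam in S."""
    n, m = len(c), len(A)
    objective = sum(c[j] * x[j] for j in range(n))
    penalty = sum(
        lam[i] * (b[i] - sum(A[i][j] * x[j] for j in range(n)))
        for i in range(m)
    )
    return objective + penalty


# Hypothetical linear model: min 2*x1 + 3*x2  s.t.  x1 + x2 >= 4, x >= 0.
c, A, b = [2, 3], [[1, 1]], [4]
x_star, lam_star = [4, 0], [2]
center = lagrangian(c, A, b, x_star, lam_star)

# Sample points of X = R_+^2 and S = R_+ on small integer grids.
xs = [[i, j] for i in range(7) for j in range(7)]
lams = [[k] for k in range(6)]

left_ok = all(lagrangian(c, A, b, x_star, lam) <= center for lam in lams)
right_ok = all(center <= lagrangian(c, A, b, x, lam_star) for x in xs)
```

For this problem L(x∗, λ) is constant in λ because the constraint is active at x∗, and L(x, λ∗) = 8 + x2, so both inequalities hold on the sampled points.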


7.6.3 Dual pair

Theorem 7.6.2. Let
\[
\begin{aligned}
c &= (c_1, \dots, c_n), & x &= (x_1, \dots, x_n), & a_i &= (a_{i1}, \dots, a_{in}) \quad (i = 1, \dots, m),\\
b &= (b_1, \dots, b_m), & \lambda &= (\lambda_1, \dots, \lambda_m), & a^j &= (a_{1j}, \dots, a_{mj}) \quad (j = 1, \dots, n).
\end{aligned}
\]
Then the linear optimization model
\[
(L_p):\ \min_x \langle c, x \rangle \quad \text{subject to} \quad
\begin{cases}
\langle a_i, x \rangle \ge b_i & (i = 1, \dots, s),\\
\langle a_i, x \rangle = b_i & (i = s+1, \dots, m),\\
x \in \mathbb{R}_+^r \times \mathbb{R}^{n-r}
\end{cases}
\]
and the linear optimization model
\[
(L_d):\ \max_\lambda \langle b, \lambda \rangle \quad \text{subject to} \quad
\begin{cases}
\langle a^j, \lambda \rangle \le c_j & (j = 1, \dots, r),\\
\langle a^j, \lambda \rangle = c_j & (j = r+1, \dots, n),\\
\lambda \in \mathbb{R}_+^s \times \mathbb{R}^{m-s}
\end{cases}
\]

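are a primal/dual pair.

Proof. In fact, let X = ℝ+r × ℝn−r , S = ℝ+s × ℝm−s . Then the Lagrangian function associated with the primal model (L p ) is given by
\[
L(x, \lambda) =
\begin{cases}
\langle c, x \rangle + \sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x \rangle) & \text{if } x \in X,\ \lambda \in S,\\
\infty & \text{if } x \notin X,\\
-\infty & \text{if } x \in X,\ \lambda \notin S,
\end{cases}
\]
where λ = (λ1 , . . . , λ m ). Note that for x ∈ X and λ ∈ S,
\[
L(x, \lambda) = \langle c, x \rangle + \sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x \rangle). \tag{7.6.1}
\]
The right-hand side of (7.6.1) is computed as follows:
\[
\begin{aligned}
\langle c, x \rangle + \sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x \rangle)
&= \langle c, x \rangle + \sum_{i=1}^{m} b_i \lambda_i - \sum_{i=1}^{m} \lambda_i \langle a_i, x \rangle\\
&= \sum_{j=1}^{n} c_j x_j + \langle b, \lambda \rangle - \sum_{i=1}^{m} \lambda_i \langle a_i, x \rangle.
\end{aligned}
\]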


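However,
\[
\begin{aligned}
\sum_{i=1}^{m} \lambda_i \langle a_i, x \rangle
&= \sum_{i=1}^{m} \lambda_i (a_{i1} x_1 + \cdots + a_{in} x_n)\\
&= \lambda_1 (a_{11} x_1 + \cdots + a_{1n} x_n) + \cdots + \lambda_m (a_{m1} x_1 + \cdots + a_{mn} x_n)\\
&= (\lambda_1 a_{11} + \cdots + \lambda_m a_{m1}) x_1 + \cdots + (\lambda_1 a_{1n} + \cdots + \lambda_m a_{mn}) x_n\\
&= \langle a^1, \lambda \rangle x_1 + \cdots + \langle a^n, \lambda \rangle x_n
 = \sum_{j=1}^{n} \langle a^j, \lambda \rangle x_j
\end{aligned}
\]
and
\[
\begin{aligned}
\sum_{j=1}^{n} c_j x_j + \langle b, \lambda \rangle - \sum_{i=1}^{m} \lambda_i \langle a_i, x \rangle
&= \langle b, \lambda \rangle + \sum_{j=1}^{n} c_j x_j - \sum_{j=1}^{n} \langle a^j, \lambda \rangle x_j\\
&= \langle b, \lambda \rangle + \sum_{j=1}^{n} (c_j - \langle a^j, \lambda \rangle) x_j .
\end{aligned}
\]
From this and (7.6.1), it follows that for x ∈ X and λ ∈ S,
\[
L(x, \lambda) = \langle b, \lambda \rangle + \sum_{j=1}^{n} (c_j - \langle a^j, \lambda \rangle) x_j . \tag{7.6.2}
\]
The combination of (7.6.1) and (7.6.2) gives
\[
L(x, \lambda) = \langle c, x \rangle + \sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x \rangle)
= \langle b, \lambda \rangle + \sum_{j=1}^{n} (c_j - \langle a^j, \lambda \rangle) x_j .
\]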

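From this, by Definition 7.6.1 of primal/dual pairs, it follows that
\[
\begin{aligned}
(L_p):\ \min_{x \in X} \sup_{\lambda \in S} L(x, \lambda)
&= \min_{x \in X} \sup_{\lambda \in S} \Bigl(\langle c, x \rangle + \sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x \rangle)\Bigr)\\
&= \min_{x \in X} \Bigl(\langle c, x \rangle + \sup_{\lambda \in S} \sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x \rangle)\Bigr),
\end{aligned}
\tag{7.6.3}
\]
\[
\begin{aligned}
(L_d):\ \max_{\lambda \in S} \inf_{x \in X} L(x, \lambda)
&= \max_{\lambda \in S} \inf_{x \in X} \Bigl(\langle b, \lambda \rangle + \sum_{j=1}^{n} (c_j - \langle a^j, \lambda \rangle) x_j\Bigr)\\
&= \max_{\lambda \in S} \Bigl(\langle b, \lambda \rangle + \inf_{x \in X} \sum_{j=1}^{n} (c_j - \langle a^j, \lambda \rangle) x_j\Bigr).
\end{aligned}
\tag{7.6.4}
\]
Note that
\[
x = (x_1, \dots, x_n) \in X = \mathbb{R}_+^r \times \mathbb{R}^{n-r}, \qquad
\lambda = (\lambda_1, \dots, \lambda_m) \in S = \mathbb{R}_+^s \times \mathbb{R}^{m-s}.
\]
It is seen that
\[
\begin{aligned}
&x_1 \ge 0, \ \dots, \ x_r \ge 0, \quad x_{r+1}, \dots, x_n \in \mathbb{R},\\
&\lambda_1 \ge 0, \ \dots, \ \lambda_s \ge 0, \quad \lambda_{s+1}, \dots, \lambda_m \in \mathbb{R}.
\end{aligned}
\tag{7.6.5}
\]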


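By the assumption that ⟨a i , x⟩ ≥ b i (i = 1, . . . , s) and ⟨a i , x⟩ = b i (i = s + 1, . . . , m), and (7.6.5),
\[
\sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x \rangle) = \sum_{i=1}^{s} \lambda_i (b_i - \langle a_i, x \rangle) \le 0.
\]
By the assumption that ⟨a^j , λ⟩ ≤ c j (j = 1, . . . , r) and ⟨a^j , λ⟩ = c j (j = r + 1, . . . , n), and (7.6.5),
\[
\sum_{j=1}^{n} (c_j - \langle a^j, \lambda \rangle) x_j = \sum_{j=1}^{r} (c_j - \langle a^j, \lambda \rangle) x_j \ge 0.
\]
Since the value 0 is attained at λ = 0 ∈ S and at x = 0 ∈ X, respectively,
\[
\sup_{\lambda \in S} \sum_{i=1}^{m} \lambda_i (b_i - \langle a_i, x \rangle) = 0, \qquad
\inf_{x \in X} \sum_{j=1}^{n} (c_j - \langle a^j, \lambda \rangle) x_j = 0.
\]
From this, (7.6.3), and (7.6.4), it follows that
\[
(L_p):\ \min_{x \in X} \sup_{\lambda \in S} L(x, \lambda) = \min_{x \in X} \langle c, x \rangle, \qquad
(L_d):\ \max_{\lambda \in S} \inf_{x \in X} L(x, \lambda) = \max_{\lambda \in S} \langle b, \lambda \rangle.
\]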

Therefore, the linear optimization models (L p ) and (L d ) are a primal/dual pair.

Theorem 7.6.2 shows that the two linear optimization models (L p ) and (L d ) are a primal/dual pair. This result will be applied in Chapter 8.

Example 7.6.3. The linear optimization model
\[
(L_p):\ \min \{ f(x_1, x_2, x_3) = 5x_1 - 6x_2 + 8x_3 \} \quad \text{subject to} \quad
\begin{cases}
2x_1 + 3x_2 + 2x_3 \ge 2,\\
4x_1 + 4x_2 + 3x_3 \ge 1,\\
5x_1 - 8x_2 + x_3 = -3,\\
7x_1 + 9x_2 - 7x_3 = 1,\\
x_1 \ge 0,\ x_2 \ge 0,\ x_3 \in \mathbb{R}
\end{cases}
\]
and the linear optimization model
\[
(L_d):\ \max \{ g(\lambda_1, \lambda_2, \lambda_3, \lambda_4) = 2\lambda_1 + \lambda_2 - 3\lambda_3 + \lambda_4 \} \quad \text{subject to} \quad
\begin{cases}
2\lambda_1 + 4\lambda_2 + 5\lambda_3 + 7\lambda_4 \le 5,\\
3\lambda_1 + 4\lambda_2 - 8\lambda_3 + 9\lambda_4 \le -6,\\
2\lambda_1 + 3\lambda_2 + \lambda_3 - 7\lambda_4 = 8,\\
\lambda_1 \ge 0,\ \lambda_2 \ge 0,\ \lambda_3 \in \mathbb{R},\ \lambda_4 \in \mathbb{R}
\end{cases}
\]
are a primal/dual pair, where the model (L p ) is the primal model and the model (L d ) is the dual model.
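The construction in Theorem 7.6.2 amounts to transposing the constraint matrix and swapping the roles of b and c. The following minimal sketch, with a hypothetical helper `dual_of` and a plain-dictionary output format chosen only for this illustration, rebuilds the dual data of Example 7.6.3 from the primal data.

```python
def dual_of(c, A, b, s, r):
    """Build the data of the dual model (L_d) from the primal model (L_p).

    Primal: min <c, x>; the first s rows of A are '>=' constraints, the rest
    are '='; the first r variables are nonnegative, the rest are free.
    Dual: max <b, lam>; the first r constraints are '<=', the rest are '=';
    the first s multipliers are nonnegative, the rest are free.
    """
    m, n = len(A), len(c)
    # Row j of the dual is the column a^j = (a_1j, ..., a_mj) of the primal.
    rows = [[A[i][j] for i in range(m)] for j in range(n)]
    return {
        "objective": list(b),   # coefficients of <b, lam>
        "rows": rows,           # left-hand sides <a^j, lam>
        "rhs": list(c),         # right-hand sides c_j
        "num_le": r,            # first r constraints are '<='
        "num_nonneg": s,        # first s multipliers are >= 0
    }


# Primal data of Example 7.6.3 (s = 2 inequalities, r = 2 nonnegative variables).
c = [5, -6, 8]
A = [[2, 3, 2], [4, 4, 3], [5, -8, 1], [7, 9, -7]]
b = [2, 1, -3, 1]
dual = dual_of(c, A, b, s=2, r=2)
```

The resulting rows (2, 4, 5, 7), (3, 4, −8, 9) and (2, 3, 1, −7) with right-hand sides 5, −6, 8 match the constraints of (L d ) stated in the example.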


7.7 Case studies

Optimization problems arise in almost all areas of environmental science. Optimization can be used to reduce overall product cost, minimize negative environmental impacts, and maximize the probability of making a correct decision. Here we give some case studies to explain how the algorithms in this chapter are applied in environmental science.

7.7.1 Building design

Buildings have considerable impacts on the environment, and it has become necessary to pay more attention to environmental performance in building design. Recent progress in the design of greener buildings advances the research and applications of various optimization methods in the building sector. In these optimization problems, the main variables include building orientation, window type, window-to-wall ratio, roof type, etc. The lower and upper boundary values of each variable are viewed as constraints. The objective is to achieve cost-effective green building design under these constraints.

7.7.2 Supply chain planning

The efficiency of a company can often be constrained by the efficiency of its supply chain management procedures. The supply chain planning problem consists of determining the optimal production, storage, backorder, subcontracting and distribution variables associated with a supply chain network. The objective is to minimize the overall total cost, carbon emissions, total delivery time, or tardiness, or to maximize the benefit, etc. The main constraints include mass balances, capacities (production, distribution, budget, storage), time restrictions, etc.

7.7.3 Coal mining

Underground coal mining discharges significant quantities of groundwater, which further damages the local water environment. Xu et al. (2015) developed an equilibrium strategy-based optimization method to solve the coal-water conflict in China. Each colliery pursues the largest possible profit and minimizes its cost, which consists of production costs, sewage treatment costs and punitive fees. The main constraints include production capacity, mining quota limitations, and environmental protection constraints. At the leader level, the government has the obligation to promote financial revenue so as to ensure an environmental self-repair capacity. In this case, financial revenue is the objective function and environmental protection is the constraint.

7.7.4 Iron and steel industry

The rapid development of the iron and steel industry requires large amounts of energy. Because energy shortages and environmental degradation have become increasingly prominent, energy conservation and emission reduction have become more important. The objective function of the optimization is the energy intensity in the iron and steel industry, while the constraints include production unit ferrite balance, process ferrite balance, market order, and production capacity.

7.7.5 Electric energy generating systems

Economic operation of electric energy generating systems is one of the prevailing problems in energy systems. The objective function includes cost reduction, voltage profile improvement, voltage stability enhancement, emission reduction, as well as their combinations. The constraints can be classified into equality and inequality constraints. Equality constraints consist of real power constraints and reactive power constraints. Inequality constraints consist of generator constraints, transformer constraints, shunt VAR compensator constraints, and security constraints.

Further reading

[1] Ahmadi P, Dincer I, Rosen MA. Multi-objective optimization of an ocean thermal energy conversion system for hydrogen production. International Journal of Hydrogen Energy. 2015(40):7601–7608.
[2] Cauchy A. Méthode générale pour la résolution des systèmes d'équations simultanées. C R Acad Sci Paris. 1847(25):536–538.
[3] Chaib AE, Bouchekara HREH, Mehasni R, Abido MA. Optimal power flow with emission and nonsmooth cost functions using backtracking search optimization algorithm. International Journal of Electrical Power & Energy Systems. 2016(81):64–77.
[4] Cooper FC, Zanna L. Optimisation of an idealised ocean model, stochastic parameterisation of sub-grid eddies. Ocean Modelling. 2015(88):38–53.
[5] Feng L, Mears L, Beaufort C, Schulte J. Energy, economy, and environment analysis and optimization on manufacturing plant energy supply system. Energy Conversion and Management. 2016(117):454–465.
[6] Fergani Z, Touil D, Morosuk T. Multi-criteria exergy based optimization of an organic Rankine cycle for waste heat recovery in the cement industry. Energy Conversion and Management. 2016(112):81–90.
[7] Harver CM. Operations Research: An Introduction to Linear Optimization and Decision Analysis, Elsevier North Holland, Inc., New York, 1979.
[8] Kucukmehmetoglu M, Geymen A. Optimization models for urban land readjustment practices in Turkey. Habitat International. 2016(53):517–533.
[9] Kuhn HW, Tucker AW. Nonlinear Programming. Proceedings of Second Berkeley Symposium on Mathematical Statistics and Probability, pp. 481–492, University of California Press, Berkeley, CA, 1950.
[10] Lemke CE. The dual method of solving a linear programming problem. Naval Res Logistics Q. 1954(1):48–54.
[11] Lendering KT, Jonkman SN, van Gelder PHAJM, Peters DJ. Risk-based optimization of land reclamation. Reliability Engineering & System Safety. 2015(144):193–203.
[12] Ma D, Wang S, Zhang Z. Hybrid algorithm of minimum relative entropy-particle swarm optimization with adjustment parameters for gas source term identification in atmosphere. Atmospheric Environment. 2014(94):637–646.
[13] Nguyen AT, Reiter S, Rigo P. A review on simulation-based optimization methods applied to building performance analysis. Applied Energy. 2014(113):1043–1058.
[14] Wang W, Zmeureanu R, Rivard H. Applying multi-objective genetic algorithms in green building design optimization. Building and Environment. 2005(40):1512–1525.
[15] Xu J, Lv C, Zhang M, Yao L, Zeng Z. Equilibrium strategy-based optimization method for the coal-water conflict: A perspective from China. Journal of Environmental Management. 2015(160):312–323.
[16] Zamarripa MA, Aguirre AM, Mendez CA, Espuna A. Mathematical programming and game theory optimization-based tool for supply chain planning in cooperative/competitive environments. Chemical Engineering Research and Design. 2013(91):1588–1600.

8 Data envelopment analysis

Data Envelopment Analysis (DEA) is a service management and benchmarking technique for evaluating organizations, called Decision Making Units (DMUs), such as companies, schools, hospitals, shops, bank branches and similar instances where there is a relatively homogeneous set of units. Each DMU has a varying level of inputs and gives a varying level of outputs. DEA uses a linear optimization technique to measure the relative performance of DMUs where the presence of multiple inputs and outputs makes comparisons difficult. The main DEA models can be classified into three groups according to orientation, returns to scale, and distance function. The oriented models include input-oriented, output-oriented, and nonoriented envelopment models. The returns to scale models include constant, variable, nonincreasing, nondecreasing, and generalized returns to scale models. The distance function models include traditional radial DEA, nonradial slack-based measure, and radial and hybrid slack-based measure models. In this chapter we will introduce these models.

8.1 Charnes–Cooper–Rhodes DEA models

The Charnes–Cooper–Rhodes (CCR) DEA model is the origin of DEA models. The CCR DEA model is referred to as the Constant Returns to Scale (CRS) model or the synthetic technical efficiency model. Assume that there are n DMUs to be evaluated, denoted by DMUj (j = 1, . . . , n). For each DMUj , the input x j has m components and the output y j has q components,
\[
x_j = (x_{1j}, x_{2j}, \dots, x_{mj}), \qquad y_j = (y_{1j}, y_{2j}, \dots, y_{qj}),
\]
and each DMUj consumes amount x ij of input i (i = 1, . . . , m) and produces amount y rj of output r (r = 1, . . . , q). Assume that x ij ≥ 0 and y rj ≥ 0 and that each DMUj has at least one positive input value and one positive output value. For the particular DMU0 being evaluated, its input and output are, respectively,
\[
x_0 = (x_{10}, x_{20}, \dots, x_{m0}), \qquad y_0 = (y_{10}, y_{20}, \dots, y_{q0}),
\]
and it consumes amount x i0 of input i (i = 1, . . . , m) and produces amount y r0 of output r (r = 1, . . . , q). Assume that x i0 ≥ 0 and y r0 ≥ 0.

8.1.1 Input-oriented CCR DEA models Let

u = (u1 , . . . , u q ), υ = (υ1 , . . . , υ m ).

DOI 10.1515/9783110424904-009


The ratio of outputs y0 with weight u to inputs x0 with weight υ forms the objective function
\[
\frac{\langle u, y_0 \rangle}{\langle \upsilon, x_0 \rangle}
= \frac{u_1 y_{10} + u_2 y_{20} + \cdots + u_q y_{q0}}{\upsilon_1 x_{10} + \upsilon_2 x_{20} + \cdots + \upsilon_m x_{m0}},
\]
where both u ≥ 0 and υ ≥ 0 are variables, and x0 , y0 are the observed input and output values, respectively. The objective function is a measure of efficiency. An additional constraint guarantees that the efficiency of each DMU is less than or equal to unity, i.e.,
\[
\frac{\langle u, y_j \rangle}{\langle \upsilon, x_j \rangle}
= \frac{u_1 y_{1j} + u_2 y_{2j} + \cdots + u_q y_{qj}}{\upsilon_1 x_{1j} + \upsilon_2 x_{2j} + \cdots + \upsilon_m x_{mj}} \le 1 \quad (j = 1, \dots, n).
\]
The first input-oriented CCR DEA model presented by Charnes, Cooper, and Rhodes has the form
\[
(L^{ip}_{CCR}):\ \max \left\{ \frac{\langle u, y_0 \rangle}{\langle \upsilon, x_0 \rangle} \right\} \quad \text{subject to} \quad
\begin{cases}
\dfrac{\langle u, y_j \rangle}{\langle \upsilon, x_j \rangle} \le 1 & (j = 1, \dots, n),\\
u \ge 0,\ \upsilon \ge 0.
\end{cases}
\]
If (u∗ , υ∗ ) is an optimal solution of this model, then (tu∗ , tυ∗ ) (t > 0) is also an optimal solution. Hence, the above ratio form has infinitely many optimal solutions. Using the Charnes–Cooper transformation
\[
t = \frac{1}{\langle \upsilon, x_0 \rangle} \quad \text{or} \quad \langle t\upsilon, x_0 \rangle = 1, \tag{8.1.1}
\]
the objective function becomes
\[
\frac{\langle u, y_0 \rangle}{\langle \upsilon, x_0 \rangle} = t \langle u, y_0 \rangle = \langle tu, y_0 \rangle \tag{8.1.2}
\]
and the additional constraint becomes
\[
\frac{\langle u, y_j \rangle}{\langle \upsilon, x_j \rangle} = \frac{t \langle u, y_j \rangle}{t \langle \upsilon, x_j \rangle} = \frac{\langle tu, y_j \rangle}{\langle t\upsilon, x_j \rangle} \le 1 \quad (j = 1, \dots, n).
\]
By the assumption, ⟨tυ, x j ⟩ = t⟨υ, x j ⟩ > 0. So the additional constraint is equivalent to ⟨tu, y j ⟩ ≤ ⟨tυ, x j ⟩ or
\[
\langle tu, y_j \rangle - \langle t\upsilon, x_j \rangle \le 0. \tag{8.1.3}
\]
Let μ = (μ1 , . . . , μ q ) and ν = (ν1 , . . . , ν m ). Using the change of variables μ = tu and ν = tυ in (8.1.1), (8.1.2), and (8.1.3), the input-oriented CCR DEA model has the following equivalent form
\[
(L^{ip}_{CCR}):\ \max \langle \mu, y_0 \rangle \quad \text{subject to} \quad
\begin{cases}
\langle \mu, y_j \rangle - \langle \nu, x_j \rangle \le 0 & (j = 1, \dots, n),\\
\langle \nu, x_0 \rangle = 1,\\
\mu \ge 0,\ \nu \ge 0.
\end{cases}
\]
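The effect of the Charnes–Cooper transformation can be illustrated with a few lines of exact arithmetic. This is a minimal sketch under stated assumptions: the helper names `inner` and `charnes_cooper`, and the numbers chosen for x0, y0, u and υ, are hypothetical and serve only to show that the rescaled weights satisfy ⟨ν, x0⟩ = 1 while ⟨μ, y0⟩ reproduces the original ratio.

```python
from fractions import Fraction


def inner(a, b):
    """Inner product <a, b> of two equally long sequences."""
    return sum(p * q for p, q in zip(a, b))


def charnes_cooper(u, v, x0):
    """Rescale weights (u, v) by t = 1 / <v, x0>, giving mu = t*u and nu = t*v."""
    t = Fraction(1) / inner(v, x0)
    mu = [t * ui for ui in u]
    nu = [t * vi for vi in v]
    return mu, nu


# Hypothetical data: two inputs, one output, and arbitrary positive weights.
x0, y0 = [15, 25], [10]
u = [Fraction(3)]
v = [Fraction(1), Fraction(1)]

ratio = inner(u, y0) / inner(v, x0)      # objective of the ratio form
mu, nu = charnes_cooper(u, v, x0)
normalized_input = inner(nu, x0)         # equals 1 by construction
linear_objective = inner(mu, y0)         # equals the ratio above
```

Because scaling (u, υ) by any t > 0 leaves the ratio unchanged, fixing ⟨ν, x0⟩ = 1 selects one representative from each family of equivalent weights and turns the fractional objective into a linear one.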


The input-oriented CCR DEA model is also called the multiplier model since μ and ν are multipliers. Example 8.1.1. Assume that there are three DMUs with two inputs x1 , x2 and one output y to be evaluated. The data for the input-oriented CCR DEA model are listed in the following table:

DMU     x1    x2    y     x1/y    x2/y
DMU1    15    25    10    1.50    2.50
DMU2    32    24    16    2.00    1.50
DMU3    50    60    20    2.50    3.00

Using the model (L^ip_CCR), three input-oriented CCR DEA models are used to assess the DMUs, respectively. Here W y , W x1 , and W x2 denote the weights on the output y and on the inputs x1 and x2 .
\[
\text{DMU}_1:\ \max 10\, W_y \quad \text{subject to} \quad
\begin{cases}
10\, W_y - 15\, W_{x_1} - 25\, W_{x_2} \le 0,\\
16\, W_y - 32\, W_{x_1} - 24\, W_{x_2} \le 0,\\
20\, W_y - 50\, W_{x_1} - 60\, W_{x_2} \le 0,\\
15\, W_{x_1} + 25\, W_{x_2} = 1.
\end{cases}
\]
\[
\text{DMU}_2:\ \max 16\, W_y \quad \text{subject to} \quad
\begin{cases}
10\, W_y - 15\, W_{x_1} - 25\, W_{x_2} \le 0,\\
16\, W_y - 32\, W_{x_1} - 24\, W_{x_2} \le 0,\\
20\, W_y - 50\, W_{x_1} - 60\, W_{x_2} \le 0,\\
32\, W_{x_1} + 24\, W_{x_2} = 1.
\end{cases}
\]
\[
\text{DMU}_3:\ \max 20\, W_y \quad \text{subject to} \quad
\begin{cases}
10\, W_y - 15\, W_{x_1} - 25\, W_{x_2} \le 0,\\
16\, W_y - 32\, W_{x_1} - 24\, W_{x_2} \le 0,\\
20\, W_y - 50\, W_{x_1} - 60\, W_{x_2} \le 0,\\
50\, W_{x_1} + 60\, W_{x_2} = 1.
\end{cases}
\]
The three evaluated DMUs may be represented by the following three points, respectively: D1 (1.5, 2.5), D2 (2, 1.5), D3 (2.5, 3). Set up a rectangular coordinate system with the origin O(0, 0), horizontal axis x1 /y, and vertical axis x2 /y on a sheet of graph paper and plot the three points D1 , D2 , and D3 . The curve through the points D1 and D2 , together with its extension, is called the frontier curve; it looks like a convex envelopment towards the origin and envelops these three points. Denote by D′i the intersection point of the straight line OD i and the frontier curve. The intersection point D′i is called the projection from the point D i to the frontier curve. Define the efficiency value of D i as OD′i /OD i . Clearly,
\[
\frac{OD'_1}{OD_1} = 1, \qquad \frac{OD'_2}{OD_2} = 1, \qquad \frac{OD'_3}{OD_3} < 1,
\]

i.e., the efficiency values of D1 and D2 are both 1, and the efficiency value of D3 is less than 1.

The detailed version of the input-oriented CCR DEA model is
\[
(L^{ip}_{CCR}):\ \max \{ y_{10}\mu_1 + y_{20}\mu_2 + \cdots + y_{q0}\mu_q \} \quad \text{subject to} \quad
\begin{cases}
y_{11}\mu_1 + y_{21}\mu_2 + \cdots + y_{q1}\mu_q - x_{11}\nu_1 - x_{21}\nu_2 - \cdots - x_{m1}\nu_m \le 0,\\
\qquad \vdots\\
y_{1n}\mu_1 + y_{2n}\mu_2 + \cdots + y_{qn}\mu_q - x_{1n}\nu_1 - x_{2n}\nu_2 - \cdots - x_{mn}\nu_m \le 0,\\
x_{10}\nu_1 + x_{20}\nu_2 + \cdots + x_{m0}\nu_m = 1,\\
\mu_1 \ge 0, \dots, \mu_q \ge 0,\ \nu_1 \ge 0, \dots, \nu_m \ge 0.
\end{cases}
\]
By Theorem 7.6.2, the dual model of the input-oriented CCR DEA model is
\[
(L^{id}_{CCR}):\ \min \theta \quad \text{subject to} \quad
\begin{cases}
y_{11}\lambda_1 + y_{12}\lambda_2 + \cdots + y_{1n}\lambda_n \ge y_{10},\\
\qquad \vdots\\
y_{q1}\lambda_1 + y_{q2}\lambda_2 + \cdots + y_{qn}\lambda_n \ge y_{q0},\\
-x_{11}\lambda_1 - x_{12}\lambda_2 - \cdots - x_{1n}\lambda_n + \theta x_{10} \ge 0,\\
\qquad \vdots\\
-x_{m1}\lambda_1 - x_{m2}\lambda_2 - \cdots - x_{mn}\lambda_n + \theta x_{m0} \ge 0,\\
\lambda_1 \ge 0,\ \lambda_2 \ge 0,\ \dots,\ \lambda_n \ge 0.
\end{cases}
\]
Note that x11 λ1 + x12 λ2 + ⋅ ⋅ ⋅ + x1n λ n ≤ θ x10 . From x11 λ1 + x12 λ2 + ⋅ ⋅ ⋅ + x1n λ n > 0 and x10 ≥ 0, it follows that θ > 0. Let λ = (λ1 , . . . , λ n ) and
\[
x^i = (x_{i1}, \dots, x_{in}) \quad (i = 1, \dots, m), \qquad
y^r = (y_{r1}, \dots, y_{rn}) \quad (r = 1, \dots, q).
\]


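Then the dual model can be written in the following compact form:
\[
(L^{id}_{CCR}):\ \min \theta \quad \text{subject to} \quad
\begin{cases}
\langle \lambda, y^r \rangle \ge y_{r0} & (r = 1, \dots, q),\\
\langle \lambda, x^i \rangle \le \theta x_{i0} & (i = 1, \dots, m),\\
\lambda \ge 0,\ \theta > 0.
\end{cases}
\]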

If the input-oriented CCR DEA model has the optimal solution z∗ and its dual model has the optimal solution θ∗ , the dual theorem says that z∗ = θ∗ . Thus, the problem of finding the optimal solution of the input-oriented CCR DEA model is reduced to the problem of finding the optimal solution of the dual model. One can solve the dual model to obtain the optimal solution because when θ = 1, λ0 = λ k = 1, and all other λ j = 0, the constraints of the dual model hold. So the solution satisfies θ∗ = min θ ≤ 1. So the optimal solution of the dual model satisfies that 0 < θ∗ ≤ 1. A DMU is efficient if the optimal solution θ∗ = 1 and the optimal multipliers μ∗r > 0 and ν∗i > 0. All efficient points representing DMUs lie on the frontier curve which looks like a concave envelopment towards the origin and envelops all DMUs. So the dual model is called the envelopment model. Example 8.1.2. Assume that there are five DMUs. Each DMU consumes a single input to produce a single output. The data are given in the following table:

DMUs      DMU1       DMU2       DMU3       DMU4       DMU5
Input     x11 = 2    x12 = 3    x13 = 6    x14 = 9    x15 = 5
Output    y11 = 1    y12 = 4    y13 = 6    y14 = 7    y15 = 3

and the observed input and output values are, respectively, x10 = x15 = 5 and y10 = y15 = 3. To evaluate the efficiency of DMU5 , by the dual theorem, we solve the following dual model:
\[
(L^{id}_{CCR}):\ \min \theta \quad \text{subject to} \quad
\begin{cases}
2\lambda_1 + 3\lambda_2 + 6\lambda_3 + 9\lambda_4 + 5\lambda_5 \le 5\theta,\\
\lambda_1 + 4\lambda_2 + 6\lambda_3 + 7\lambda_4 + 3\lambda_5 \ge 3,\\
\lambda_1 \ge 0,\ \lambda_2 \ge 0,\ \lambda_3 \ge 0,\ \lambda_4 \ge 0,\ \lambda_5 \ge 0,\ \theta > 0.
\end{cases}
\]
Let θ = −ξ. Converting minimization of θ to maximization of ξ, and then introducing a slack variable s1 ≥ 0 and a surplus variable s2 ≥ 0, this produces a new model (L0 ) from the dual model (L^id_CCR):
\[
(L_0):\ \max \xi \quad \text{subject to} \quad
\begin{cases}
5\xi + 2\lambda_1 + 3\lambda_2 + 6\lambda_3 + 9\lambda_4 + 5\lambda_5 + s_1 = 0,\\
\lambda_1 + 4\lambda_2 + 6\lambda_3 + 7\lambda_4 + 3\lambda_5 - s_2 = 3,\\
\lambda_1 \ge 0,\ \lambda_2 \ge 0,\ \lambda_3 \ge 0,\ \lambda_4 \ge 0,\ \lambda_5 \ge 0.
\end{cases}
\]
Considering the first row as the objective and the second row as the only constraint, we solve the model (L0 ) using the simplex method given in Section 7.3. There is only one constraint, so the computation is very simple.

Pivoting on the (1, 1) entry produces a model (L1 ) from the model (L0 ):
\[
(L_1):\ \max \xi \quad \text{subject to} \quad
\begin{cases}
5\xi - 5\lambda_2 - 6\lambda_3 - 5\lambda_4 - \lambda_5 + s_1 + 2 s_2 = -6,\\
\lambda_1 + 4\lambda_2 + 6\lambda_3 + 7\lambda_4 + 3\lambda_5 - s_2 = 3,\\
\lambda_1 \ge 0,\ \dots,\ \lambda_5 \ge 0,
\end{cases}
\qquad \phi(1) = 1.
\]
The basic variable is λ1 . Other variables are nonbasic variables. Let λ i = 0 (i ≠ 1) and s i = 0 (i = 1, 2). Then λ1 = 3 and ξ = −6/5.

Pivoting on the (1, 2) entry produces a model (L2 ) from the model (L0 ):
\[
(L_2):\ \max \xi \quad \text{subject to} \quad
\begin{cases}
5\xi + \tfrac{5}{4}\lambda_1 + \tfrac{3}{2}\lambda_3 + \tfrac{15}{4}\lambda_4 + \tfrac{11}{4}\lambda_5 + s_1 + \tfrac{3}{4} s_2 = -\tfrac{9}{4},\\
\tfrac{1}{4}\lambda_1 + \lambda_2 + \tfrac{3}{2}\lambda_3 + \tfrac{7}{4}\lambda_4 + \tfrac{3}{4}\lambda_5 - \tfrac{1}{4} s_2 = \tfrac{3}{4},\\
\lambda_1 \ge 0,\ \dots,\ \lambda_5 \ge 0,
\end{cases}
\qquad \phi(1) = 2.
\]
The basic variable is λ2 . Other variables are nonbasic variables. Let λ i = 0 (i ≠ 2) and s i = 0 (i = 1, 2). Then λ2 = 3/4 and ξ = −9/20.

Pivoting on the (1, 3) entry produces a model (L3 ) from the model (L0 ):
\[
(L_3):\ \max \xi \quad \text{subject to} \quad
\begin{cases}
5\xi + \lambda_1 - \lambda_2 + 2\lambda_4 + 2\lambda_5 + s_1 + s_2 = -3,\\
\tfrac{1}{6}\lambda_1 + \tfrac{2}{3}\lambda_2 + \lambda_3 + \tfrac{7}{6}\lambda_4 + \tfrac{1}{2}\lambda_5 - \tfrac{1}{6} s_2 = \tfrac{1}{2},\\
\lambda_1 \ge 0,\ \dots,\ \lambda_5 \ge 0,
\end{cases}
\qquad \phi(1) = 3.
\]
The basic variable is λ3 . Other variables are nonbasic variables. Let λ i = 0 (i ≠ 3) and s i = 0 (i = 1, 2). Then λ3 = 1/2 and ξ = −3/5.

Pivoting on the (1, 4) entry produces a model (L4 ) from the model (L0 ):
\[
(L_4):\ \max \xi \quad \text{subject to} \quad
\begin{cases}
\tfrac{5}{9}\xi + \tfrac{5}{63}\lambda_1 - \tfrac{15}{63}\lambda_2 - \tfrac{12}{63}\lambda_3 + \tfrac{8}{63}\lambda_5 + \tfrac{1}{9} s_1 + \tfrac{1}{7} s_2 = -\tfrac{3}{7},\\
\tfrac{1}{7}\lambda_1 + \tfrac{4}{7}\lambda_2 + \tfrac{6}{7}\lambda_3 + \lambda_4 + \tfrac{3}{7}\lambda_5 - \tfrac{1}{7} s_2 = \tfrac{3}{7},\\
\lambda_1 \ge 0,\ \dots,\ \lambda_5 \ge 0,
\end{cases}
\qquad \phi(1) = 4.
\]
The basic variable is λ4 . Other variables are nonbasic variables. Let λ i = 0 (i ≠ 4) and s i = 0 (i = 1, 2). Then λ4 = 3/7 and ξ = −27/35.

Pivoting on the (1, 5) entry produces a model (L5 ) from the model (L0 ):
\[
(L_5):\ \max \xi \quad \text{subject to} \quad
\begin{cases}
\xi + \tfrac{1}{15}\lambda_1 - \tfrac{11}{15}\lambda_2 - \tfrac{4}{5}\lambda_3 - \tfrac{8}{15}\lambda_4 + \tfrac{1}{5} s_1 + \tfrac{1}{3} s_2 = -1,\\
\tfrac{1}{3}\lambda_1 + \tfrac{4}{3}\lambda_2 + 2\lambda_3 + \tfrac{7}{3}\lambda_4 + \lambda_5 - \tfrac{1}{3} s_2 = 1,\\
\lambda_1 \ge 0,\ \dots,\ \lambda_5 \ge 0,
\end{cases}
\qquad \phi(1) = 5.
\]
The basic variable is λ5 . Other variables are nonbasic variables. Let λ i = 0 (i ≠ 5) and s i = 0 (i = 1, 2). Then λ5 = 1 and ξ = −1. Therefore,
\[
\max \xi = \max \left\{ -\frac{6}{5},\ -\frac{9}{20},\ -\frac{3}{5},\ -\frac{27}{35},\ -1 \right\} = -\frac{9}{20}
\]
and the corresponding λ2 = 3/4 and λ i = 0 (i ≠ 2). Note that min θ = −max ξ. So when λ2 = 3/4 and λ i = 0 (i ≠ 2), min θ = −max ξ = 9/20.
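The enumeration in Example 8.1.2 can be reproduced with exact fractions. The sketch below mirrors the example rather than implementing a general simplex method: for each DMU k it uses only λ k , takes the smallest λ k that meets the output constraint, and then the smallest θ that meets the input constraint. The function name `theta_using_only` is a hypothetical helper introduced for this illustration.

```python
from fractions import Fraction

# Data of Example 8.1.2: single input and single output for five DMUs.
inputs = [2, 3, 6, 9, 5]
outputs = [1, 4, 6, 7, 3]
x0, y0 = 5, 3  # DMU5 is the unit being evaluated


def theta_using_only(k, x, y, x0, y0):
    """Value of theta when lambda_k is the only nonzero multiplier."""
    lam = Fraction(y0, y[k])   # smallest lam_k with lam_k * y_k >= y0
    return lam * x[k] / x0     # smallest theta with lam_k * x_k <= theta * x0


candidates = [theta_using_only(k, inputs, outputs, x0, y0) for k in range(5)]
theta_star = min(candidates)
best = candidates.index(theta_star)          # 0-based index of the reference DMU
lam_best = Fraction(y0, outputs[best])       # multiplier of the reference DMU
```

The five candidate values are the negatives of the ξ values listed in the example, and the minimum is reached with DMU2 as the reference unit. In this single-input, single-output setting the result also equals the productivity of DMU5 (3/5) divided by the best productivity among all DMUs (4/3).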

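8.1.2 Output-oriented CCR DEA models

Let \(x_0, y_0, x_j, y_j\) \((j = 1, \dots, n)\), \(u, \upsilon\), and \(\mu, \nu\) be stated as in Section 8.1.1, i.e.,

\[
\begin{aligned}
x_0 &= (x_{10}, \dots, x_{m0}), & y_0 &= (y_{10}, \dots, y_{q0}), \\
x_j &= (x_{1j}, \dots, x_{mj}), & y_j &= (y_{1j}, \dots, y_{qj}) \quad (j = 1, \dots, n), \\
u &= (u_1, \dots, u_q), & \upsilon &= (\upsilon_1, \dots, \upsilon_m), \\
\mu &= (\mu_1, \dots, \mu_q), & \nu &= (\nu_1, \dots, \nu_m).
\end{aligned}
\]

The ratio of the inputs \(x_0\) with weight \(\upsilon\) to the outputs \(y_0\) with weight \(u\) forms the objective function

\[
\frac{\langle \upsilon, x_0 \rangle}{\langle u, y_0 \rangle}
= \frac{\upsilon_1 x_{10} + \upsilon_2 x_{20} + \cdots + \upsilon_m x_{m0}}{u_1 y_{10} + u_2 y_{20} + \cdots + u_q y_{q0}},
\]

where \(u \ge 0\) and \(\upsilon \ge 0\) are variables, and \(x_0, y_0\) are the observed input and output values, respectively. Using the Charnes–Cooper transformation

\[
t = \frac{1}{\langle u, y_0 \rangle} \quad\text{or}\quad \langle tu, y_0 \rangle = 1,
\tag{8.1.4}
\]

the objective function becomes

\[
\frac{\langle \upsilon, x_0 \rangle}{\langle u, y_0 \rangle} = t \langle \upsilon, x_0 \rangle = \langle t\upsilon, x_0 \rangle.
\tag{8.1.5}
\]

An additional constraint guarantees that the efficiency of each DMU is less than or equal to unity, i.e.,

\[
\frac{\langle u, y_j \rangle}{\langle \upsilon, x_j \rangle}
= \frac{t \langle u, y_j \rangle}{t \langle \upsilon, x_j \rangle}
= \frac{\langle tu, y_j \rangle}{\langle t\upsilon, x_j \rangle} \le 1 \quad (j = 1, \dots, n).
\tag{8.1.6}
\]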

Using the change of variables \(\mu = tu\) and \(\nu = t\upsilon\) in (8.1.4), (8.1.5), and (8.1.6), the output-oriented CCR DEA model takes the following form:

\[
(L^{op}_{CCR}):\quad \min \langle \nu, x_0 \rangle
\]

subject to

\[
\begin{cases}
\langle \nu, x_j \rangle - \langle \mu, y_j \rangle \ge 0 \quad (j = 1, \dots, n), \\
\langle \mu, y_0 \rangle = 1, \\
\nu \ge 0,\ \mu \ge 0.
\end{cases}
\]

Example 8.1.3. Assume that there are three DMUs with two outputs \(y_1, y_2\) and one input \(x\) to be evaluated. The data for the output-oriented CCR DEA model are listed in the following table:

DMU     y1    y2    x     y1/x    y2/x
DMU1    10    40    10    1.00    4.00
DMU2    20    40    20    1.00    2.00
DMU3    25    35    10    2.50    3.50

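Using the model \((L^{op}_{CCR})\), three output-oriented CCR DEA models are used to assess the three DMUs, respectively. Here \(W_x\) is the weight of the input and \(W_{y_1}, W_{y_2}\) are the weights of the two outputs; as in \((L^{op}_{CCR})\), all weights are nonnegative and the objective is minimized.

DMU1: \(\min 10 W_x\) subject to

\[
\begin{cases}
10 W_x - 10 W_{y_1} - 40 W_{y_2} \ge 0, \\
20 W_x - 20 W_{y_1} - 40 W_{y_2} \ge 0, \\
10 W_x - 25 W_{y_1} - 35 W_{y_2} \ge 0, \\
10 W_{y_1} + 40 W_{y_2} = 1.
\end{cases}
\]

DMU2: \(\min 20 W_x\) subject to

\[
\begin{cases}
10 W_x - 10 W_{y_1} - 40 W_{y_2} \ge 0, \\
20 W_x - 20 W_{y_1} - 40 W_{y_2} \ge 0, \\
10 W_x - 25 W_{y_1} - 35 W_{y_2} \ge 0, \\
20 W_{y_1} + 40 W_{y_2} = 1.
\end{cases}
\]

DMU3: \(\min 10 W_x\) subject to

\[
\begin{cases}
10 W_x - 10 W_{y_1} - 40 W_{y_2} \ge 0, \\
20 W_x - 20 W_{y_1} - 40 W_{y_2} \ge 0, \\
10 W_x - 25 W_{y_1} - 35 W_{y_2} \ge 0, \\
25 W_{y_1} + 35 W_{y_2} = 1.
\end{cases}
\]

The three evaluated DMUs may be represented by the following three points, respectively: \(D_1(1, 4)\), \(D_2(1, 2)\), \(D_3(2.5, 3.5)\).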
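The three small models of Example 8.1.3 can be solved exactly without an LP solver, because there is only one input. The following is a minimal sketch under that assumption, not a general DEA implementation: for fixed output weights the smallest feasible input weight is the largest weighted output-to-input ratio, so the objective is a convex piecewise-linear function of the second output weight on a segment, and its minimum lies at an endpoint or at a crossing of two pieces. The function name `ccr_output_score` is hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

# Data of Example 8.1.3: (y1, y2, x) for each DMU.
dmus = {"DMU1": (10, 40, 10), "DMU2": (20, 40, 20), "DMU3": (25, 35, 10)}

def ccr_output_score(name):
    """Optimal value of min x0*W_x for the named DMU (one input, two outputs)."""
    y10, y20, x0 = dmus[name]
    # Eliminate W_y1 through the normalisation y10*W_y1 + y20*W_y2 = 1.
    # Each DMU j then contributes a linear lower bound a_j + b_j*W_y2 on W_x.
    pieces = []
    for y1, y2, x in dmus.values():
        a = F(y1, x * y10)
        b = F(y2, x) - F(y20 * y1, x * y10)
        pieces.append((a, b))
    upper = F(1, y20)  # W_y1 >= 0 forces W_y2 <= 1/y20
    candidates = {F(0), upper}
    for (a1, b1), (a2, b2) in combinations(pieces, 2):
        if b1 != b2:
            w2 = (a2 - a1) / (b1 - b2)  # crossing point of two pieces
            if 0 <= w2 <= upper:
                candidates.add(w2)
    return min(x0 * max(a + b * w2 for a, b in pieces) for w2 in candidates)

scores = {name: ccr_output_score(name) for name in dmus}
efficiency = {name: 1 / value for name, value in scores.items()}
for name in dmus:
    print(name, scores[name], efficiency[name])
```

With these data the optimal values are 1 for DMU1 and DMU3 and 13/7 for DMU2, so the reciprocal for DMU2 is 7/13, consistent with D2 lying strictly inside the frontier through D1 and D3.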


Set up a rectangular coordinate system with the origin \(O(0, 0)\), horizontal axis \(y_1/x\), and vertical axis \(y_2/x\), and plot the three points \(D_1, D_2, D_3\). The curve through the points \(D_1\) and \(D_3\), together with its extension, is called the frontier curve. It is concave towards the origin and envelops the three points. Denote by \(D_i'\) the intersection point of the straight line \(OD_i\) and the frontier curve. The point \(D_i'\) is called the projection of the point \(D_i\) onto the frontier curve. Define the efficiency value of \(D_i\) as \(OD_i / OD_i'\). Clearly,

\[
\frac{OD_1}{OD_1'} = 1, \qquad \frac{OD_2}{OD_2'} < 1, \qquad \frac{OD_3}{OD_3'} = 1,
\]

i.e., the efficiency values of \(D_1\) and \(D_3\) are both 1, and the efficiency value of \(D_2\) is less than 1.

The detailed version of the output-oriented CCR DEA model is

\[
(L^{op}_{CCR}):\quad \min \{ x_{10}\nu_1 + x_{20}\nu_2 + \cdots + x_{m0}\nu_m \}
\]

subject to

\[
\begin{cases}
x_{11}\nu_1 + x_{21}\nu_2 + \cdots + x_{m1}\nu_m - y_{11}\mu_1 - y_{21}\mu_2 - \cdots - y_{q1}\mu_q \ge 0, \\
x_{12}\nu_1 + x_{22}\nu_2 + \cdots + x_{m2}\nu_m - y_{12}\mu_1 - y_{22}\mu_2 - \cdots - y_{q2}\mu_q \ge 0, \\
\qquad\vdots \\
x_{1n}\nu_1 + x_{2n}\nu_2 + \cdots + x_{mn}\nu_m - y_{1n}\mu_1 - y_{2n}\mu_2 - \cdots - y_{qn}\mu_q \ge 0, \\
y_{10}\mu_1 + y_{20}\mu_2 + \cdots + y_{q0}\mu_q = 1, \\
\nu_1 \ge 0, \dots, \nu_m \ge 0,\ \mu_1 \ge 0, \dots, \mu_q \ge 0.
\end{cases}
\]

By Theorem 7.6.2, the dual model of the output-oriented CCR DEA model is

\[
(L^{od}_{CCR}):\quad \max \varphi
\]

subject to

\[
\begin{cases}
x_{11}\lambda_1 + x_{12}\lambda_2 + \cdots + x_{1n}\lambda_n \le x_{10}, \\
\qquad\vdots \\
x_{m1}\lambda_1 + x_{m2}\lambda_2 + \cdots + x_{mn}\lambda_n \le x_{m0}, \\
-y_{11}\lambda_1 - y_{12}\lambda_2 - \cdots - y_{1n}\lambda_n + y_{10}\varphi \le 0, \\
\qquad\vdots \\
-y_{q1}\lambda_1 - y_{q2}\lambda_2 - \cdots - y_{qn}\lambda_n + y_{q0}\varphi \le 0, \\
\lambda_1 \ge 0,\ \lambda_2 \ge 0,\ \dots,\ \lambda_n \ge 0.
\end{cases}
\]

Let \(x_i\), \(y_r\), and \(\lambda\) be stated as in Section 8.1.1, i.e.,

\[
x_i = (x_{i1}, \dots, x_{in}) \quad (i = 1, \dots, m), \qquad
y_r = (y_{r1}, \dots, y_{rn}) \quad (r = 1, \dots, q), \qquad
\lambda = (\lambda_1, \dots, \lambda_n).
\]


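The contracted form of the dual model of the output-oriented CCR DEA model is as follows:

\[
(L^{od}_{CCR}):\quad \max \varphi
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle \le x_{i0} \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle \ge \varphi y_{r0} \quad (r = 1, \dots, q), \\
\lambda \ge 0.
\end{cases}
\]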

8.2 Banker–Charnes–Cooper DEA models

The Banker–Charnes–Cooper (BCC) DEA model was presented by Banker, Charnes, and Cooper in 1984. This model is referred to as the Variable Returns to Scale (VRS) model or the pure technical efficiency model. BCC DEA models are constructed by adding an additional constraint to the dual models \((L^{id}_{CCR})\) and \((L^{od}_{CCR})\).

8.2.1 Input-oriented BCC DEA models

Consider the dual model \((L^{id}_{CCR})\) introduced in Section 8.1.1. Adding the additional constraint

\[
\lambda_1 + \cdots + \lambda_n = 1 \quad (\lambda \ge 0)
\]

to the dual model \((L^{id}_{CCR})\) yields the input-oriented BCC DEA model \((L^{ip}_{BCC})\). The detailed version of the input-oriented BCC DEA model \((L^{ip}_{BCC})\) is as follows:

\[
(L^{ip}_{BCC}):\quad \min \theta
\]

subject to

\[
\begin{cases}
y_{11}\lambda_1 + y_{12}\lambda_2 + \cdots + y_{1n}\lambda_n \ge y_{10}, \\
\qquad\vdots \\
y_{q1}\lambda_1 + y_{q2}\lambda_2 + \cdots + y_{qn}\lambda_n \ge y_{q0}, \\
-x_{11}\lambda_1 - x_{12}\lambda_2 - \cdots - x_{1n}\lambda_n + \theta x_{10} \ge 0, \\
\qquad\vdots \\
-x_{m1}\lambda_1 - x_{m2}\lambda_2 - \cdots - x_{mn}\lambda_n + \theta x_{m0} \ge 0, \\
\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1, \\
\lambda_1 \ge 0,\ \lambda_2 \ge 0,\ \dots,\ \lambda_n \ge 0,
\end{cases}
\]

where \(\theta > 0\).


By Theorem 7.6.2, the dual model \((L^{id}_{BCC})\) of the input-oriented BCC DEA model \((L^{ip}_{BCC})\) is

\[
(L^{id}_{BCC}):\quad \max \{ y_{10}\mu_1 + y_{20}\mu_2 + \cdots + y_{q0}\mu_q + \mu_0 \}
\]

subject to

\[
\begin{cases}
y_{11}\mu_1 + y_{21}\mu_2 + \cdots + y_{q1}\mu_q - x_{11}\nu_1 - x_{21}\nu_2 - \cdots - x_{m1}\nu_m + \mu_0 \le 0, \\
\qquad\vdots \\
y_{1n}\mu_1 + y_{2n}\mu_2 + \cdots + y_{qn}\mu_q - x_{1n}\nu_1 - x_{2n}\nu_2 - \cdots - x_{mn}\nu_m + \mu_0 \le 0, \\
x_{10}\nu_1 + x_{20}\nu_2 + \cdots + x_{m0}\nu_m = 1, \\
\mu_1 \ge 0, \dots, \mu_q \ge 0,\ \nu_1 \ge 0, \dots, \nu_m \ge 0,\ \mu_0 \in \mathbb{R}.
\end{cases}
\]

Let \(x_0, y_0, x_j, y_j, x_i, y_r\), and \(\mu, \nu, \lambda\) be stated as in Section 8.1, and let \(e = (1, 1, \dots, 1)\). The contracted form of the input-oriented BCC DEA model is

\[
(L^{ip}_{BCC}):\quad \min \theta
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle \ge y_{r0} \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle \le \theta x_{i0} \quad (i = 1, \dots, m), \\
\langle e, \lambda \rangle = 1,\ \lambda \ge 0,\ \theta > 0,
\end{cases}
\]

and the contracted form of the dual model of the input-oriented BCC DEA model is

\[
(L^{id}_{BCC}):\quad \max \{ \langle \mu, y_0 \rangle + \mu_0 \}
\]

subject to

\[
\begin{cases}
\langle \mu, y_j \rangle - \langle \nu, x_j \rangle + \mu_0 \le 0 \quad (j = 1, \dots, n), \\
\langle \nu, x_0 \rangle = 1, \\
\mu \ge 0,\ \nu \ge 0,\ \mu_0 \in \mathbb{R},
\end{cases}
\]

where \(\mu_0\) is a free variable. The constraint added to the input-oriented BCC DEA model thus introduces an additional variable \(\mu_0\) in its dual model. This extra variable \(\mu_0\) makes it possible to carry out returns-to-scale evaluations. For this reason the input-oriented BCC DEA model is referred to as the Variable Returns to Scale (VRS) model.
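The effect of the convexity constraint ⟨e, λ⟩ = 1 can be illustrated on a toy case with one input and one output, where both envelopment models have closed-form solutions. The sketch below is an assumption-laden illustration: the data set `data` and the function names are hypothetical and do not come from the text. Under constant returns to scale the cheapest way to produce y0 uses the best input-to-output ratio; under variable returns to scale it is a single DMU with enough output or a convex combination of two DMUs that brackets y0.

```python
from fractions import Fraction as F

# Hypothetical single-input, single-output data (x, y); not taken from the text.
data = {"A": (2, 1), "B": (3, 3), "C": (6, 5), "D": (5, 3)}

def theta_ccr(name):
    """Input-oriented CCR score: lambda >= 0 only (constant returns to scale)."""
    x0, y0 = data[name]
    return y0 * min(F(x, y) for x, y in data.values()) / x0

def theta_bcc(name):
    """Input-oriented BCC score: additionally sum(lambda) = 1."""
    x0, y0 = data[name]
    units = list(data.values())
    # A single DMU that already produces at least y0.
    candidates = [F(x) for x, y in units if y >= y0]
    # A convex combination of two DMUs whose outputs bracket y0.
    for xj, yj in units:
        for xk, yk in units:
            if yj < y0 < yk:
                t = F(y0 - yj, yk - yj)
                candidates.append(xj + t * (xk - xj))
    return min(candidates) / x0

for name in data:
    print(name, theta_ccr(name), theta_bcc(name))
```

In this toy data set DMU A has a CCR score of 1/2 but a BCC score of 1: it is technically efficient at its own scale, and the gap between the two scores reflects scale effects. The BCC score is never smaller than the CCR score, since the BCC feasible set is contained in the CCR one.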

8.2.2 Output-oriented BCC DEA models

Similarly, adding the additional constraint \(\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1\) \((\lambda \ge 0)\) to the dual model \((L^{od}_{CCR})\) yields the output-oriented BCC DEA model. Its detailed version is as follows:

\[
(L^{op}_{BCC}):\quad \max \varphi
\]

subject to

\[
\begin{cases}
x_{11}\lambda_1 + x_{12}\lambda_2 + \cdots + x_{1n}\lambda_n \le x_{10}, \\
\qquad\vdots \\
x_{m1}\lambda_1 + x_{m2}\lambda_2 + \cdots + x_{mn}\lambda_n \le x_{m0}, \\
-y_{11}\lambda_1 - y_{12}\lambda_2 - \cdots - y_{1n}\lambda_n + y_{10}\varphi \le 0, \\
\qquad\vdots \\
-y_{q1}\lambda_1 - y_{q2}\lambda_2 - \cdots - y_{qn}\lambda_n + y_{q0}\varphi \le 0, \\
\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1, \\
\lambda_1 \ge 0,\ \lambda_2 \ge 0,\ \dots,\ \lambda_n \ge 0.
\end{cases}
\]

By Theorem 7.6.2, its dual model \((L^{od}_{BCC})\) is as follows (the dual of a maximization model is a minimization model):

\[
(L^{od}_{BCC}):\quad \min \{ x_{10}\nu_1 + x_{20}\nu_2 + \cdots + x_{m0}\nu_m + \nu_0 \}
\]

subject to

\[
\begin{cases}
x_{11}\nu_1 + x_{21}\nu_2 + \cdots + x_{m1}\nu_m - y_{11}\mu_1 - y_{21}\mu_2 - \cdots - y_{q1}\mu_q + \nu_0 \ge 0, \\
\qquad\vdots \\
x_{1n}\nu_1 + x_{2n}\nu_2 + \cdots + x_{mn}\nu_m - y_{1n}\mu_1 - y_{2n}\mu_2 - \cdots - y_{qn}\mu_q + \nu_0 \ge 0, \\
y_{10}\mu_1 + y_{20}\mu_2 + \cdots + y_{q0}\mu_q = 1, \\
\nu_1 \ge 0, \dots, \nu_m \ge 0,\ \mu_1 \ge 0, \dots, \mu_q \ge 0,\ \nu_0 \in \mathbb{R},
\end{cases}
\]

where \(\nu_0\) is a free variable. Let \(x_0, y_0, x_j, y_j, x_i, y_r\), and \(\mu, \nu, \lambda\) be stated as in Section 8.1 and \(e = (1, 1, \dots, 1)\). The contracted form of the output-oriented BCC DEA model is as follows:

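\[
(L^{op}_{BCC}):\quad \max \varphi
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle \le x_{i0} \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle \ge \varphi y_{r0} \quad (r = 1, \dots, q), \\
\langle e, \lambda \rangle = 1, \\
\lambda \ge 0.
\end{cases}
\]

The contracted form of its dual model, written to agree with the detailed version, is as follows:

\[
(L^{od}_{BCC}):\quad \min \{ \langle \nu, x_0 \rangle + \nu_0 \}
\]

subject to

\[
\begin{cases}
\langle \nu, x_j \rangle - \langle \mu, y_j \rangle + \nu_0 \ge 0 \quad (j = 1, \dots, n), \\
\langle \mu, y_0 \rangle = 1, \\
\nu \ge 0,\ \mu \ge 0,\ \nu_0 \in \mathbb{R},
\end{cases}
\]

where \(\nu_0\) is a free variable. Similarly, the added constraint \(\langle e, \lambda \rangle = 1\) introduces an extra variable \(\nu_0\).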


8.3 One-stage and two-stage methods

The one-stage and two-stage methods are used for solving optimization models. We use the dual model \((L^{id}_{CCR})\) of the input-oriented CCR DEA model to introduce these two methods.

8.3.1 One-stage method

Slack variables are used to convert the inequalities in the model \((L^{id}_{CCR})\) to equivalent equations. The one-stage method adds slack variables to the constraints only. The dual model \((L^{id}_{CCR})\) has the form

\[
(L^{id}_{CCR}):\quad \min \theta
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle \ge y_{r0} \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle \le \theta x_{i0} \quad (i = 1, \dots, m), \\
\lambda \ge 0,\ \theta > 0.
\end{cases}
\]

Introducing slack variables into the constraints, the dual model has the following equivalent form:

\[
(L^{id}_{CCR}):\quad \min \theta
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle - s^-_r = y_{r0}, \quad s^-_r \ge 0 \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle + s^+_i = \theta x_{i0}, \quad s^+_i \ge 0 \quad (i = 1, \dots, m), \\
\lambda \ge 0,\ \theta > 0.
\end{cases}
\]

In Section 8.1.1, the one-stage method was used for solving Example 8.1.2.

8.3.2 Two-stage method

When some slacks are nonzero at the optimum, a boundary point may be only weakly efficient, in the sense of the following definition.

Definition 8.3.1. Let the particular \(DMU_0\) being evaluated consume amounts \(x_{i0}\) of input \(i\) \((i = 1, \dots, m)\) and produce amounts \(y_{r0}\) of output \(r\) \((r = 1, \dots, q)\), and let \(\theta^*\) be the optimal value of the dual model \((L^{id}_{CCR})\), with optimal slacks \(s^{-*}_r\) and \(s^{+*}_i\). Then
(a) \(DMU_0\) is efficient if and only if \(\theta^* = 1\) and the slack variables satisfy \(s^{-*}_r = s^{+*}_i = 0\) for all \(r\) and \(i\).
(b) \(DMU_0\) is weakly efficient if and only if \(\theta^* = 1\) and \(s^{-*}_r \ne 0\) and/or \(s^{+*}_i \ne 0\) for some \(r\) and \(i\).


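To avoid weak efficiency, the two-stage method adds the slack variables to both the constraints and the objective, i.e.,

\[
(L^{id}_{CCR}):\quad \min \left\{ \theta - \varepsilon \left( \sum_{r=1}^{q} s^-_r + \sum_{i=1}^{m} s^+_i \right) \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle - s^-_r = y_{r0}, \quad s^-_r \ge 0 \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle + s^+_i = \theta x_{i0}, \quad s^+_i \ge 0 \quad (i = 1, \dots, m), \\
\lambda \ge 0,\ \theta > 0,\ \varepsilon > 0,
\end{cases}
\]

where \(\varepsilon\) is a so-called non-Archimedean element defined to be smaller than any positive real number. This model is solved in two stages. The first stage is to find the optimal value \(\theta^*\) of the dual model \((L^{id}_{CCR})\)

\[
(L^{id}_{CCR}):\quad \min \theta
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle \ge y_{r0} \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle \le \theta x_{i0} \quad (i = 1, \dots, m), \\
\lambda \ge 0,\ \theta > 0,
\end{cases}
\]

or its equivalent model

\[
(L^{id}_{CCR}):\quad \min \theta
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle - s^-_r = y_{r0}, \quad s^-_r \ge 0 \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle + s^+_i = \theta x_{i0}, \quad s^+_i \ge 0 \quad (i = 1, \dots, m), \\
\lambda \ge 0,\ \theta > 0.
\end{cases}
\]

The second stage is to solve the following model:

\[
\max \left\{ \sum_{r=1}^{q} s^-_r + \sum_{i=1}^{m} s^+_i \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle - s^-_r = y_{r0} \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle + s^+_i = \theta^* x_{i0} \quad (i = 1, \dots, m), \\
\lambda \ge 0,
\end{cases}
\]

where \(\theta^*\) is the optimal value obtained in the first stage.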
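Once the two stages have been solved, Definition 8.3.1 reduces to a simple test on the optimal value and the optimal slacks. The helper below is a hypothetical sketch of that test only; it does not solve either stage. The label "inefficient" for the case θ* ≠ 1 and the numerical tolerance `tol` are assumptions added for illustration and are not part of the definition.

```python
def classify_dmu(theta_star, slacks, tol=1e-9):
    """Classify DMU0 from the stage-one value theta* and the stage-two slacks.

    slacks: iterable with all optimal slack values (input and output slacks).
    """
    if abs(theta_star - 1) > tol:
        return "inefficient"          # assumed label, not in Definition 8.3.1
    if all(abs(s) <= tol for s in slacks):
        return "efficient"            # theta* = 1 and all slacks vanish
    return "weakly efficient"         # theta* = 1 but some slack is nonzero

print(classify_dmu(1.0, [0.0, 0.0, 0.0]))
print(classify_dmu(1.0, [0.0, 2.5, 0.0]))
print(classify_dmu(0.45, [0.0, 0.0, 0.0]))
```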


8.4 Advanced DEA models

Let \(x_0, y_0, x_j, y_j, x_i, y_r\) and \(\mu, \nu, \lambda\) be stated as in Section 8.1, and let \(e = (1, 1, \dots, 1)\).

8.4.1 Free disposal hull DEA models

The Free Disposal Hull (FDH) DEA model presented by Tulkens (1993) is a mixed integer linear programming model. The FDH DEA model is obtained from a DEA model by replacing the constraint \(\lambda \ge 0\) with the requirement that \(\lambda = (\lambda_1, \dots, \lambda_n)\) satisfies \(\lambda_j = 0\) or \(1\) \((j = 1, \dots, n)\). The input-oriented FDH DEA model has the form

\[
\min \theta
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle \ge y_{r0} \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle \le \theta x_{i0} \quad (i = 1, \dots, m), \\
\langle e, \lambda \rangle = 1 \quad (\lambda_j = 0 \text{ or } 1\ (j = 1, \dots, n)), \\
\theta > 0.
\end{cases}
\]
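Because ⟨e, λ⟩ = 1 with binary λ_j selects exactly one reference DMU, the input-oriented FDH model can be solved by enumeration. The sketch below is a minimal illustration under that observation; the function name and the small data set are hypothetical, and all inputs are assumed to be positive integers so that exact fractions can be used.

```python
from fractions import Fraction as F

def fdh_input_efficiency(inputs, outputs, k):
    """Input-oriented FDH score of DMU k.

    inputs[j] and outputs[j] are tuples with the inputs and outputs of DMU j.
    """
    x0, y0 = inputs[k], outputs[k]
    best = None
    for xj, yj in zip(inputs, outputs):
        # Output constraints: the selected DMU must produce at least y0.
        if all(a >= b for a, b in zip(yj, y0)):
            # Smallest theta with x_j <= theta * x_0 in every input.
            theta = max(F(a, b) for a, b in zip(xj, x0))
            if best is None or theta < best:
                best = theta
    return best  # DMU k itself is always feasible, so best <= 1

# Hypothetical data: four DMUs, two inputs, one output.
inputs = [(2, 4), (3, 3), (4, 5), (2, 3)]
outputs = [(5,), (5,), (5,), (4,)]
print([fdh_input_efficiency(inputs, outputs, k) for k in range(4)])
```

For the third DMU the best dominating unit is the second one, which gives a score of 3/4; the other three units are FDH efficient in this toy data set.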

8.4.2 Slack-based measure DEA models

Slack-Based Measure (SBM) DEA models were presented by Tone (2001). The SBM DEA models include the input-oriented, the output-oriented, and the nonoriented SBM DEA model.

– The input-oriented SBM DEA model has the form

\[
\min \left\{ 1 - \frac{1}{m} \sum_{i=1}^{m} \frac{s^-_i}{x_{i0}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle \ge y_{r0} \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle + s^-_i = x_{i0}, \quad s^-_i \ge 0 \quad (i = 1, \dots, m), \\
\lambda \ge 0,
\end{cases}
\]

where \(s^-_i\) \((i = 1, \dots, m)\) are \(m\) slack variables.

– The output-oriented SBM DEA model has the form

\[
\min \left\{ \frac{1}{1 + \dfrac{1}{q} \sum_{r=1}^{q} \dfrac{s^+_r}{y_{r0}}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle - s^+_r = y_{r0}, \quad s^+_r \ge 0 \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle \le x_{i0} \quad (i = 1, \dots, m), \\
\lambda \ge 0,
\end{cases}
\]

where \(s^+_r\) \((r = 1, \dots, q)\) are \(q\) slack variables.

– The nonoriented SBM DEA model has the form

\[
\min \left\{ \frac{1 - \dfrac{1}{m} \sum_{i=1}^{m} \dfrac{s^-_i}{x_{i0}}}{1 + \dfrac{1}{q} \sum_{r=1}^{q} \dfrac{s^+_r}{y_{r0}}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle - s^+_r = y_{r0}, \quad s^+_r \ge 0 \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle + s^-_i = x_{i0}, \quad s^-_i \ge 0 \quad (i = 1, \dots, m), \\
\lambda \ge 0,
\end{cases}
\]

where \(s^-_i\) \((i = 1, \dots, m)\) and \(s^+_r\) \((r = 1, \dots, q)\) are \(m + q\) slack variables.

The above input-oriented, output-oriented, and nonoriented SBM models are referred to as CRS models. If the additional constraint \(\langle e, \lambda \rangle = 1\) is added to the above models, the resulting models are referred to as VRS models. If the additional constraints \(\langle e, \lambda \rangle \ge 1\), \(\langle e, \lambda \rangle \le 1\), and \(L \le \langle e, \lambda \rangle \le U\) are added, respectively, the resulting models are referred to as Non-Decreasing Returns to Scale (NDRS) models, Non-Increasing Returns to Scale (NIRS) models, and Generalized Returns to Scale (GRS) models, respectively.
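The three SBM objectives differ only in which slacks are present, which the following sketch makes explicit. It evaluates the objective for given slack values and does not solve the optimization; the function name `sbm_objective` and the numbers in the usage lines are hypothetical.

```python
from fractions import Fraction as F

def sbm_objective(x0, y0, s_minus, s_plus):
    """Value of the nonoriented SBM objective for given input/output slacks."""
    m, q = len(x0), len(y0)
    numerator = 1 - sum(F(s) / x for s, x in zip(s_minus, x0)) / m
    denominator = 1 + sum(F(s) / y for s, y in zip(s_plus, y0)) / q
    return numerator / denominator

x0, y0 = (4, 8), (2,)                              # hypothetical inputs and output
nonoriented = sbm_objective(x0, y0, (1, 2), (1,))
input_only = sbm_objective(x0, y0, (1, 2), (0,))   # input-oriented objective
output_only = sbm_objective(x0, y0, (0, 0), (1,))  # output-oriented objective
print(nonoriented, input_only, output_only)
```

Setting the output slacks to zero recovers the input-oriented objective, and setting the input slacks to zero recovers the output-oriented one.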

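8.4.3 MSBM, WSBM, and WMSBM DEA models

– The Modified Slack-Based Measure (MSBM) model was presented by Sharp et al. (2007). Its form is

\[
\min \left\{ \frac{1 - \dfrac{1}{m} \sum_{i=1}^{m} \dfrac{s^-_i}{R_{i0}}}{1 + \dfrac{1}{q} \sum_{r=1}^{q} \dfrac{s^+_r}{R_{r0}}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle - s^+_r = y_{r0}, \quad s^+_r \ge 0 \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle + s^-_i = x_{i0}, \quad s^-_i \ge 0 \quad (i = 1, \dots, m), \\
\langle e, \lambda \rangle = 1, \\
\lambda \ge 0,
\end{cases}
\]

where \(s^+_r\) \((r = 1, \dots, q)\) and \(s^-_i\) \((i = 1, \dots, m)\) are slack variables, and

\[
R_{i0} = x_{i0} - \min_{1 \le j \le n} \{ x_{ij} \}, \qquad
R_{r0} = \max_{1 \le j \le n} \{ y_{rj} \} - y_{r0}.
\]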




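– The nonoriented Weighted Slack-Based Measure (WSBM) model has the form

\[
\min \left\{ \frac{1 - \dfrac{1}{\sum_{i=1}^{m} w^I_i} \sum_{i=1}^{m} \dfrac{w^I_i s^-_i}{x_{i0}}}{1 + \dfrac{1}{\sum_{r=1}^{q} w^o_r} \sum_{r=1}^{q} \dfrac{w^o_r s^+_r}{y_{r0}}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle - s^+_r = y_{r0}, \quad s^+_r \ge 0 \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle + s^-_i = x_{i0}, \quad s^-_i \ge 0 \quad (i = 1, \dots, m), \\
\lambda \ge 0,
\end{cases}
\]

where \(w^I_i\) \((i = 1, \dots, m)\) and \(w^o_r\) \((r = 1, \dots, q)\) are the input weights and the output weights, respectively.

– The Weighted Modified Slack-Based Measure (WMSBM) model has the form

\[
\min \left\{ \frac{1 - \dfrac{1}{\sum_{i=1}^{m} w^I_i} \sum_{i=1}^{m} \dfrac{w^I_i s^-_i}{R_{i0}}}{1 + \dfrac{1}{\sum_{r=1}^{q} w^o_r} \sum_{r=1}^{q} \dfrac{w^o_r s^+_r}{R_{r0}}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle - s^+_r = y_{r0}, \quad s^+_r \ge 0 \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle + s^-_i = x_{i0}, \quad s^-_i \ge 0 \quad (i = 1, \dots, m), \\
\langle e, \lambda \rangle = 1, \\
\lambda \ge 0,
\end{cases}
\]

where \(s^+_r\), \(s^-_i\), \(w^I_i\), \(w^o_r\), \(R_{i0}\), and \(R_{r0}\) are stated as above.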
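The ranges R_i0 and R_r0 and the weighted modified objective are straightforward to evaluate once the slacks are known. The sketch below only evaluates these formulas for given slacks; the function names and data are hypothetical, the ranges are assumed to be nonzero, and with unit weights the value reduces to the MSBM objective.

```python
from fractions import Fraction as F

def ranges(X, Y, k):
    """R_i0 and R_r0 for DMU k; X[i][j] is input i of DMU j, Y[r][j] output r."""
    R_in = [row[k] - min(row) for row in X]
    R_out = [max(row) - row[k] for row in Y]
    return R_in, R_out

def wmsbm_objective(s_minus, s_plus, R_in, R_out, w_in=None, w_out=None):
    """Weighted modified SBM objective; unit weights give the MSBM objective."""
    w_in = w_in or [1] * len(R_in)
    w_out = w_out or [1] * len(R_out)
    num = 1 - sum(F(w) * s / R for w, s, R in zip(w_in, s_minus, R_in)) / sum(w_in)
    den = 1 + sum(F(w) * s / R for w, s, R in zip(w_out, s_plus, R_out)) / sum(w_out)
    return num / den

# Hypothetical data: two inputs, one output, three DMUs; evaluate DMU index 2.
X = [[2, 4, 6], [3, 3, 5]]
Y = [[1, 3, 2]]
R_in, R_out = ranges(X, Y, 2)
value = wmsbm_objective([2, 1], [F(1, 2)], R_in, R_out)
print(R_in, R_out, value)
```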

8.4.4 Hybrid distance function DEA models

The main hybrid distance function DEA models fall into two groups. The first group was presented by Tone and Tsutsui (2010). Because of the parameter \(\varepsilon\), Tone and Tsutsui call the first group Epsilon-Based Measure (EBM) models. The second group was presented by Cooper, Seiford, and Tone (2007). Because they mix the radial model and the SBM model, Cooper, Seiford, and Tone call the second group Hybrid Models.


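(a) EBM models

– The input-oriented EBM model has the form

\[
\min \left\{ \theta - \frac{\varepsilon^-}{\sum_{i=1}^{m} w^-_i} \sum_{i=1}^{m} \frac{w^-_i s^-_i}{x_{i0}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle + s^-_i = \theta x_{i0}, \quad s^-_i \ge 0 \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle \ge y_{r0} \quad (r = 1, \dots, q), \\
\lambda \ge 0,
\end{cases}
\]

where \(s^-_i\) and \(w^-_i\) are the input slacks and input weights as above, and \(\varepsilon^- \in [0, 1]\).

– The output-oriented EBM model has the form

\[
\min \left\{ \frac{1}{\varphi + \dfrac{\varepsilon^+}{\sum_{r=1}^{q} w^+_r} \sum_{r=1}^{q} \dfrac{w^+_r s^+_r}{y_{r0}}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, y_r \rangle - s^+_r = \varphi y_{r0}, \quad s^+_r \ge 0 \quad (r = 1, \dots, q), \\
\langle \lambda, x_i \rangle \le x_{i0} \quad (i = 1, \dots, m), \\
\lambda \ge 0,
\end{cases}
\]

where \(s^+_r\) and \(w^+_r\) are the output slacks and output weights as above, and \(\varepsilon^+ \in [0, 1]\).

– The nonoriented EBM model has the form

\[
\min \left\{ \frac{\theta - \dfrac{\varepsilon^-}{\sum_{i=1}^{m} w^-_i} \sum_{i=1}^{m} \dfrac{w^-_i s^-_i}{x_{i0}}}{\varphi + \dfrac{\varepsilon^+}{\sum_{r=1}^{q} w^+_r} \sum_{r=1}^{q} \dfrac{w^+_r s^+_r}{y_{r0}}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle + s^-_i = \theta x_{i0}, \quad s^-_i \ge 0 \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle - s^+_r = \varphi y_{r0}, \quad s^+_r \ge 0 \quad (r = 1, \dots, q), \\
\lambda \ge 0,
\end{cases}
\]

where \(s^-_i, s^+_r, w^-_i, w^+_r\) are stated as above, and \(\varepsilon^-, \varepsilon^+ \in [0, 1]\). If \(\varepsilon^- = \varepsilon^+ = 0\) and \(\varphi = 1\), the EBM model is just the dual model of the input-oriented CCR DEA model. If \(\varepsilon^- = \varepsilon^+ = 1\) and \(\theta = \varphi = 1\), the EBM model is just the nonoriented weighted SBM model.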


(b) Hybrid models

In the following models the superscript \(R\) denotes radial and \(N\) denotes nonradial; \(m\) is the number of inputs, of which \(m_1\) are radial and \(m_2\) are nonradial, and \(q\) is the number of outputs, of which \(q_1\) are radial and \(q_2\) are nonradial.

– The input-oriented hybrid model has the form

\[
\min \left\{ 1 - \frac{m_1}{m}(1 - \theta) - \frac{1}{m} \sum_{i=1}^{m_2} \frac{s^{N-}_i}{x^N_{i0}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, x^R_i \rangle + s^{R-}_i = \theta x^R_{i0}, \quad s^{R-}_i \ge 0 \quad (i = 1, \dots, m), \\
\langle \lambda, x^N_i \rangle + s^{N-}_i = x^N_{i0}, \quad s^{N-}_i \ge 0 \quad (i = 1, \dots, m), \\
\langle \lambda, y^N_r \rangle \ge y^N_{r0} \quad (r = 1, \dots, q), \\
\lambda \ge 0.
\end{cases}
\]

– The output-oriented hybrid model has the form

\[
\min \left\{ \frac{1}{1 + \dfrac{q_1}{q}(\varphi - 1) + \dfrac{1}{q} \sum_{r=1}^{q_2} \dfrac{s^{N+}_r}{y^N_{r0}}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, y^R_r \rangle - s^{R+}_r = \varphi y^R_{r0}, \quad s^{R+}_r \ge 0 \quad (r = 1, \dots, q), \\
\langle \lambda, y^N_r \rangle - s^{N+}_r = y^N_{r0}, \quad s^{N+}_r \ge 0 \quad (r = 1, \dots, q), \\
\langle \lambda, x^N_i \rangle \le x^N_{i0} \quad (i = 1, \dots, m), \\
\lambda \ge 0.
\end{cases}
\]

– The hybrid model mixing the radial and SBM models has the form

\[
\min \left\{ \frac{1 - \dfrac{m_1}{m}(1 - \theta) - \dfrac{1}{m} \sum_{i=1}^{m_2} \dfrac{s^{N-}_i}{x^N_{i0}}}{1 + \dfrac{q_1}{q}(\varphi - 1) + \dfrac{1}{q} \sum_{r=1}^{q_2} \dfrac{s^{N+}_r}{y^N_{r0}}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, x^R_i \rangle + s^{R-}_i = \theta x^R_{i0}, \quad s^{R-}_i \ge 0 \quad (i = 1, \dots, m), \\
\langle \lambda, y^R_r \rangle - s^{R+}_r = \varphi y^R_{r0}, \quad s^{R+}_r \ge 0 \quad (r = 1, \dots, q), \\
\langle \lambda, x^N_i \rangle + s^{N-}_i = x^N_{i0}, \quad s^{N-}_i \ge 0 \quad (i = 1, \dots, m), \\
\langle \lambda, y^N_r \rangle - s^{N+}_r = y^N_{r0}, \quad s^{N+}_r \ge 0 \quad (r = 1, \dots, q), \\
\lambda \ge 0.
\end{cases}
\]


8.4.5 Super efficiency models

The main super efficiency models fall into three groups. The first group is the radial super efficiency models presented by Andersen and Petersen (1993). The second group is the directional distance function super efficiency models presented by Ray (2008). The third group is the SBM super efficiency models presented by Tone (2002).

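(a) Radial super efficiency models

In the models below, the terms \(\lambda_k x_{ik}\) and \(\lambda_k y_{rk}\) are subtracted so that the DMU indexed by \(k\) does not contribute to the reference set.

– The input-oriented CRS radial super efficiency model has the form

\[
\min \theta
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle - \lambda_k x_{ik} \le \theta x_{i0} \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle - \lambda_k y_{rk} \ge y_{r0} \quad (r = 1, \dots, q), \\
\lambda \ge 0,\ \theta > 0.
\end{cases}
\]

– The input-oriented VRS radial super efficiency model has the form

\[
\min \theta
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle - \lambda_k x_{ik} \le \theta x_{i0} \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle - \lambda_k y_{rk} \ge y_{r0} \quad (r = 1, \dots, q), \\
\langle e, \lambda \rangle - \lambda_k = 1, \\
\lambda \ge 0,\ \theta > 0.
\end{cases}
\]

– The output-oriented CRS radial super efficiency model has the form

\[
\max \varphi
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle - \lambda_k x_{ik} \le x_{i0} \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle - \lambda_k y_{rk} \ge \varphi y_{r0} \quad (r = 1, \dots, q), \\
\lambda \ge 0.
\end{cases}
\]

– The output-oriented VRS radial super efficiency model has the form

\[
\max \varphi
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle - \lambda_k x_{ik} \le x_{i0} \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle - \lambda_k y_{rk} \ge \varphi y_{r0} \quad (r = 1, \dots, q), \\
\langle e, \lambda \rangle - \lambda_k = 1, \\
\lambda \ge 0.
\end{cases}
\]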
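For a single input and a single output the input-oriented CRS radial super efficiency model has a closed form, which makes the effect of excluding the evaluated DMU easy to see. The sketch below is an illustration under that assumption; the data set and the function name are hypothetical and not taken from the text.

```python
from fractions import Fraction as F

# Hypothetical single-input, single-output data (x, y); not taken from the text.
data = {"A": (2, 1), "B": (3, 3), "C": (6, 5), "D": (5, 3)}

def radial_super_efficiency_crs(k):
    """Input-oriented CRS score of DMU k with DMU k removed from the reference set."""
    x0, y0 = data[k]
    # Cheapest input per unit of output among the remaining DMUs.
    best_ratio = min(F(x, y) for name, (x, y) in data.items() if name != k)
    return y0 * best_ratio / x0

for name in data:
    print(name, radial_super_efficiency_crs(name))
```

In this toy data set only DMU B is CRS efficient, and its super efficiency score is 6/5: its input could grow by 20% before the remaining DMUs would dominate it. For the inefficient DMUs the score coincides with the ordinary CCR score.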


(b) Directional distance function super efficiency models

– The form is as follows:

\[
\max \beta
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle - \lambda_k x_{ik} + \beta g^x_i \le x_{i0} \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle - \lambda_k y_{rk} - \beta g^y_r \ge y_{r0} \quad (r = 1, \dots, q), \\
\langle \lambda, b_t \rangle - \lambda_k b_{tk} - \beta g^y_t \le b_{t0}, \\
\lambda \ge 0,
\end{cases}
\]

where \(b_t = (b_{t1}, \dots, b_{tn})\).

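(c) SBM super efficiency models

– The input-oriented SBM super efficiency model has the form

\[
\min \left\{ 1 + \frac{1}{m} \sum_{i=1}^{m} \frac{s^-_i}{x_{i0}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle - \lambda_k x_{ik} + s^-_i = x_{i0}, \quad s^-_i \ge 0 \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle - \lambda_k y_{rk} \ge y_{r0} \quad (r = 1, \dots, q), \\
\lambda \ge 0.
\end{cases}
\]

– The output-oriented SBM super efficiency model has the form

\[
\min \left\{ \frac{1}{1 - \dfrac{1}{q} \sum_{r=1}^{q} \dfrac{s^+_r}{y_{r0}}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle - \lambda_k x_{ik} \le x_{i0} \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle - \lambda_k y_{rk} - s^+_r = y_{r0}, \quad s^+_r \ge 0 \quad (r = 1, \dots, q), \\
\lambda \ge 0.
\end{cases}
\]

– The nonoriented SBM super efficiency model has the form

\[
\min \left\{ \frac{1 + \dfrac{1}{m} \sum_{i=1}^{m} \dfrac{s^-_i}{x_{i0}}}{1 - \dfrac{1}{q} \sum_{r=1}^{q} \dfrac{s^+_r}{y_{r0}}} \right\}
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle - \lambda_k x_{ik} + s^-_i = x_{i0}, \quad s^-_i \ge 0 \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle - \lambda_k y_{rk} - s^+_r = y_{r0}, \quad s^+_r \ge 0 \quad (r = 1, \dots, q), \\
\lambda \ge 0.
\end{cases}
\]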


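8.4.6 Directional distance function models

The Directional Distance Function (DDF) model was presented by Chung et al. in 1997.

– The directional distance function CRS model is

\[
\max \beta
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle + \beta g^x_i \le x_{i0} \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle + \beta g^y_r \ge y_{r0} \quad (r = 1, \dots, q), \\
\lambda \ge 0,
\end{cases}
\]

where \(g^x = (g^x_1, \dots, g^x_m)\) and \(g^y = (g^y_1, \dots, g^y_q)\). By Theorem 7.6.2, its dual model is

\[
\min ( \langle \nu, x_0 \rangle - \langle \mu, y_0 \rangle )
\]

subject to

\[
\begin{cases}
\langle \nu, x_j \rangle - \langle \mu, y_j \rangle \ge 0 \quad (j = 1, \dots, n), \\
\langle \nu, g^x \rangle - \langle \mu, g^y \rangle = 1, \\
\nu \ge 0,\ \mu \ge 0.
\end{cases}
\]

– The directional distance function VRS model is

\[
\max \beta
\]

subject to

\[
\begin{cases}
\langle \lambda, x_i \rangle + \beta g^x_i \le x_{i0} \quad (i = 1, \dots, m), \\
\langle \lambda, y_r \rangle + \beta g^y_r \ge y_{r0} \quad (r = 1, \dots, q), \\
\langle e, \lambda \rangle = 1, \\
\lambda \ge 0,
\end{cases}
\]

where \(g^x = (g^x_1, \dots, g^x_m)\) and \(g^y = (g^y_1, \dots, g^y_q)\). By Theorem 7.6.2, its dual model is

\[
\min \{ \langle \nu, x_0 \rangle - \langle \mu, y_0 \rangle + \mu_0 \}
\]

subject to

\[
\begin{cases}
\langle \nu, x_j \rangle - \langle \mu, y_j \rangle + \mu_0 \ge 0 \quad (j = 1, \dots, n), \\
\langle \nu, g^x \rangle - \langle \mu, g^y \rangle = 1, \\
\nu \ge 0,\ \mu \ge 0,\ \mu_0 \in \mathbb{R}.
\end{cases}
\]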

8.5 Software and case studies

Data Envelopment Analysis (DEA) is a powerful service management and benchmarking technique for evaluating organizations. MaxDEA is an easy-to-use DEA software package that can be downloaded from http://www.maxdea.cn/. MaxDEA includes most DEA models and can import data from Excel, Access, and text files. Below we give some case studies that use DEA algorithms.


8.5.1 Carbon emissions reduction

DEA can be used to analyze the environmental efficiency of regional industries. Capital stock, population, and energy consumption are the three main inputs, GDP is the single desirable output, and carbon emissions are an undesirable output. Miao et al. (2016) use DEA to quantify the efficient allocation of carbon emissions between provinces, using China’s provincial data for 2006–2010. The results showed that the actual carbon emissions in some provinces were higher than their maximal carbon emission allowances calculated from the DEA model, indicating that these provinces face great pressure to reduce carbon emissions.

8.5.2 Coal mining industry

For the coal mining industry, the main inputs are capital and labour, as well as fossil fuels; the main output is coal production, together with undesirable outputs such as waste water, gas, and solids. With the help of DEA, Liu et al. compared two major coal-producing areas in China, Shanxi and Inner Mongolia, and showed that the market-oriented policy package performs better than the nationalization regulations: the former policy instruments not only reduce the economic shock on industry productivity, but also improve the unified environmental and operational efficiency to a larger extent.

8.5.3 Coal-fired power plants

DEA can be used to assess the energy efficiency of coal-fired plants. Song et al. (2015) use input-oriented CCR/BCC DEA models to measure the overall technical efficiency and the pure technical efficiency of 34 coal-fired power plants in China. The generalized energy efficiency is calculated from four input parameters: coal consumption, oil consumption, water consumption, and auxiliary power consumption by power units. The special energy efficiency is based on only two input parameters: coal consumption and auxiliary power consumption. The results showed that electricity saving potential exists for 14 out of the 34 power plants.

Further reading

[1] Andersen P, Petersen NC. A procedure for ranking efficient units in data envelopment analysis. Management Science. 1993(39):1261–1265.
[2] Banker RD, Charnes A, Cooper WW. Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science. 1984(30):1078–1092.
[3] Cauchy A. Methode génerale pour la résolution des systéms d’equations simultanées. C R Acad Sci Paris. 1847(25):536–538.
[4] Charnes A, Cooper WW, Rhodes E. Measuring the efficiency of decision making units. European Journal of Operational Research. 1978(2):429–444.
[5] Chen L, Jia G. Environmental efficiency analysis of China’s regional industry: a data envelopment analysis (DEA) based approach. Journal of Cleaner Production. 2016: in press.
[6] Cooper WW, Seiford LM, Tone K. Data Envelopment Analysis. A comprehensive text with models, applications, references and DEA-solver software, Kluwer Academic Publishers, Boston, 2000.
[7] Cooper WW, Seiford LM, Tone K. Data Envelopment Analysis. A comprehensive text with models, applications, references and DEA-Solver software, 2nd ed. New York: Springer Science & Business Media, 2007.
[8] Chung YH, Färe R, Grosskopf S. Productivity and undesirable outputs: A directional distance function approach. Journal of Environmental Management. 1997(51):229–240.
[9] Harver CM. Operations Research: An Introduction to Linear Optimization and Decision Analysis, Elsevier North Holland, Inc., New York, 1979.
[10] Kuhn HW, Tucker AW. Nonlinear Programming. Proceedings of Second Berkeley Symposium on Mathematical Statistics and Probability, pp. 481–492, University of California Press, Berkeley, CA, 1950.
[11] Lemke CE. The dual method of solving a linear programming problem. Naval Res Logistics Q. 1954(1):48–54.
[12] Liu J, Liu H, Yao XL, Liu Y. Evaluating the sustainability impact of consolidation policy in China’s coal mining industry: A data envelopment analysis. Journal of Cleaner Production. 2016(112):2969–2976.
[13] Miao Z, Geng Y, Sheng J. Efficient allocation of CO2 emissions in China: A zero sum gains data envelopment model. Journal of Cleaner Production. 2016(112):4144–4150.
[14] Ray SC. The directional distance function and measurement of super-efficiency: An application to airlines data. Journal of the Operational Research Society. 2008(59):788–797.
[15] Sharp JA, Meng W, Liu W. A modified slacks-based measure model for data envelopment analysis with natural negative outputs and inputs. Journal of the Operational Research Society. 2007(58):1672–1677.
[16] Song C, Li M, Zhang F, He YL, Tao WQ. A data envelopment analysis for energy efficiency of coal-fired power units in China. Energy Conversion and Management. 2015(102):121–130.
[17] Tone K. A slacks-based measure of efficiency in data envelopment analysis. European Journal of Operational Research. 2001(130):498–509.
[18] Tone K. A slacks-based measure of super-efficiency in data envelopment analysis. European Journal of Operational Research. 2002(143):32–41.
[19] Tone K. Variations on the theme of slacks-based measure of efficiency in DEA. European Journal of Operational Research. 2010(200):901–907.
[20] Tulkens H. On FDH efficiency analysis: Some methodological issues and applications to retail banking, courts, and urban transit. Journal of Productivity Analysis. 1993(4):183–210.

9 Risk assessments

Risk assessment provides a systematic procedure for estimating the probability of harm to, or from, the environment, the severity of harm, and uncertainty. Lowrance defined risk as a measure of the probability and severity of adverse effects. This definition is consistent with the mathematical formula used to calculate the expected value of risk. Later, Haimes showed that a risk assessment is a process consisting of a set of logical, systematic, and well-defined activities that provide the decision maker with a sound identification, measurement, quantification, and evaluation of the risk associated with certain natural phenomena or man-made activities. The main steps of risk assessment are formulating the problem, assessing the risk, appraising various management options, and determining the optimal risk management strategy. In this chapter we introduce fundamental methods used in risk assessment.

9.1 Decision rules under uncertainty

Denote by \(a_i\) \((i = 1, \dots, I)\) the \(i\)-th decision adopted by the decision maker, by \(s_j\) \((j = 1, \dots, J)\) the \(j\)-th scenario, and by \(P_{ij}\) \((i = 1, \dots, I;\ j = 1, \dots, J)\) the payoff associated with the pair \((a_i, s_j)\). The common decision rules include the pessimistic rule, the optimistic rule, and the Hurwitz rule. The pessimistic rule includes both the maximin criterion and the minimax criterion. The optimistic rule is a maximax criterion. The Hurwitz rule is a linear combination of the maximin and the maximax criteria.

The pessimistic rule

The rule includes two criteria as follows.

– Let \(P_{ij}\) represent a payoff. The maximin criterion is represented by

\[
\max_{1 \le i \le I} \left( \min_{1 \le j \le J} P_{ij} \right).
\]

– Let \(P_{ij}\) represent a loss or a risk. The minimax criterion is represented by

\[
\min_{1 \le i \le I} \left( \max_{1 \le j \le J} P_{ij} \right).
\]

The pessimistic rule is also called the maximin criterion or the minimax criterion. This rule ensures that the decision maker will at least realize the minimum gain or avoid the maximum loss.

DOI 10.1515/9783110424904-010


The optimistic rule

The rule is represented mathematically by

\[
\max_{1 \le i \le I} \left( \max_{1 \le j \le J} P_{ij} \right).
\]

Under the optimistic rule the decision maker is as optimistic as possible and seeks to maximize the maximum gain. The optimistic rule is sometimes called the maximax criterion.

The Hurwitz rule

The rule is represented mathematically by

\[
\max_{1 \le i \le I} P_i(\alpha) = \max_{1 \le i \le I} \left( \alpha \min_{1 \le j \le J} P_{ij} + (1 - \alpha) \max_{1 \le j \le J} P_{ij} \right) \quad (0 \le \alpha \le 1).
\]

From the Hurwitz rule, it is seen that \(\max_{1 \le i \le I} P_i(0)\) is the maximax criterion and \(\max_{1 \le i \le I} P_i(1)\) is the maximin criterion. Therefore, the Hurwitz rule is a compromise between the two extreme criteria through an \(\alpha\)-index, and the decision maker’s degree of optimism is specified by the parameter \(\alpha\).

Example 9.1.1. Assume that \(a_i\) \((i = 1, 2)\) are two kinds of carbon-emission-reduction technology that the decision maker may adopt and \(s_j\) \((j = 1, 2, 3)\) are three scenarios for the future carbon price. The payoff \(P_{ij}\) \((i = 1, 2;\ j = 1, 2, 3)\) for production with value $1000 associated with the pair \((a_i, s_j)\) is

\[
\begin{pmatrix} P_{11} & P_{12} & P_{13} \\ P_{21} & P_{22} & P_{23} \end{pmatrix}
=
\begin{pmatrix} 200 & 110 & -15 \\ 100 & 50 & 5 \end{pmatrix}.
\]

Find the optimal decision by using the pessimistic rule, the optimistic rule, and the Hurwitz rule, respectively.

Solution. First, consider the pessimistic rule. From

\[
\begin{aligned}
\min_{1 \le j \le 3} P_{1j} &= \min(P_{11}, P_{12}, P_{13}) = \min(200, 110, -15) = -15 && \text{for decision } a_1, \\
\min_{1 \le j \le 3} P_{2j} &= \min(P_{21}, P_{22}, P_{23}) = \min(100, 50, 5) = 5 && \text{for decision } a_2,
\end{aligned}
\]

it follows that

\[
\max_{1 \le i \le 2} \left( \min_{1 \le j \le 3} P_{ij} \right) = \max \left( \min_{1 \le j \le 3} P_{1j},\ \min_{1 \le j \le 3} P_{2j} \right) = \max(-15, 5) = 5,
\]

i.e., the maximin criterion implies a gain of at least $5 and the corresponding decision is \(a_2\).


Secondly, consider the optimistic rule. Since

\[
\max_{1\le j\le 3} P_{1j} = \max(P_{11}, P_{12}, P_{13}) = \max(200, 110, -15) = 200 \quad \text{for decision } a_1,
\]
\[
\max_{1\le j\le 3} P_{2j} = \max(P_{21}, P_{22}, P_{23}) = \max(100, 50, 5) = 100 \quad \text{for decision } a_2,
\]

it follows that

\[
\max_{1\le i\le 2}\Big(\max_{1\le j\le 3} P_{ij}\Big) = \max\Big(\max_{1\le j\le 3} P_{1j},\ \max_{1\le j\le 3} P_{2j}\Big) = \max(200, 100) = 200,
\]

i.e., the optimistic rule implies a gain of at most $ 200 and the corresponding decision is \(a_1\).

Third, consider the Hurwitz rule

\[
\max_{1\le i\le 2} P_i(\alpha) = \max_{1\le i\le 2}\Big(\alpha \min_{1\le j\le 3} P_{ij} + (1-\alpha)\max_{1\le j\le 3} P_{ij}\Big).
\]

From

\[
\min_{1\le j\le 3} P_{1j} = -15, \qquad \max_{1\le j\le 3} P_{1j} = 200, \qquad
\min_{1\le j\le 3} P_{2j} = 5, \qquad \max_{1\le j\le 3} P_{2j} = 100,
\]

it follows that for 0 ≤ α ≤ 1,

\[
P_1(\alpha) = \alpha \min_{1\le j\le 3} P_{1j} + (1-\alpha)\max_{1\le j\le 3} P_{1j} = -15\alpha + 200(1-\alpha) = 200 - 215\alpha \quad \text{for decision } a_1,
\]
\[
P_2(\alpha) = \alpha \min_{1\le j\le 3} P_{2j} + (1-\alpha)\max_{1\le j\le 3} P_{2j} = 5\alpha + 100(1-\alpha) = 100 - 95\alpha \quad \text{for decision } a_2.
\]

In order to find \(\max_{1\le i\le 2} P_i(\alpha) = \max(P_1(\alpha), P_2(\alpha))\), it is only necessary to solve the equation 200 − 215α = 100 − 95α. The solution is α = 5/6. It is clear that

\[
\begin{array}{ll}
200 - 215\alpha > 100 - 95\alpha & \text{if } 0 \le \alpha < \tfrac{5}{6}; \\
200 - 215\alpha < 100 - 95\alpha & \text{if } \tfrac{5}{6} < \alpha \le 1; \\
200 - 215\alpha = 100 - 95\alpha & \text{if } \alpha = \tfrac{5}{6}.
\end{array}
\]

Therefore, if 0 ≤ α < 5/6, then the optimal decision is \(a_1\); if 5/6 < α ≤ 1, then the optimal decision is \(a_2\); if α = 5/6, both \(a_1\) and \(a_2\) are optimal decisions.
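The three rules of Example 9.1.1 can be checked with a short script. The following is a minimal sketch in Python; the function names `maximin`, `maximax`, and `hurwitz` are illustrative choices, not notation from the text, and each function returns the zero-based row index of the chosen decision together with its score.

```python
def best_row(scores):
    """Return (row index, score) of the largest score; index 0 is decision a1."""
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]

def maximin(payoffs):
    """Pessimistic rule: maximize the worst payoff of each decision."""
    return best_row([min(row) for row in payoffs])

def maximax(payoffs):
    """Optimistic rule: maximize the best payoff of each decision."""
    return best_row([max(row) for row in payoffs])

def hurwitz(payoffs, alpha):
    """Hurwitz rule: P_i(alpha) = alpha * min_j P_ij + (1 - alpha) * max_j P_ij."""
    return best_row([alpha * min(row) + (1 - alpha) * max(row) for row in payoffs])

# Payoff matrix of Example 9.1.1 (rows a1, a2; columns s1, s2, s3).
payoffs = [[200, 110, -15], [100, 50, 5]]

print(maximin(payoffs))       # decision a2 with payoff 5
print(maximax(payoffs))       # decision a1 with payoff 200
print(hurwitz(payoffs, 0.5))  # alpha below 5/6: decision a1
print(hurwitz(payoffs, 0.9))  # alpha above 5/6: decision a2
```

Setting `alpha` to 0 or 1 in `hurwitz` reproduces the maximax and maximin choices, respectively.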


Example 9.1.2. A computer company knows the sales potential for low-performance, medium-performance, and high-performance computers. The company wishes to decide on the best development plan based on minimizing the risk of financial loss and/or maximizing the expected profit. The sales potential ($ 1000) of the company is given by the following table:

\[
\begin{array}{c|rrr}
 & s_1 & s_2 & s_3 \\ \hline
a_1 & 150 & 50 & -20 \\
a_2 & 300 & 175 & -100 \\
a_3 & 450 & 150 & -150
\end{array}
\]

where the decisions \(a_1\), \(a_2\), and \(a_3\) are the low-performance computer, the medium-performance computer, and the high-performance computer, respectively, and the scenarios \(s_1\), \(s_2\), and \(s_3\) are the excellent computer system, the good computer system, and the poor computer system, respectively.

Solution. Consider the Hurwitz rule

\[
\max_{1\le i\le 3}\Big\{ P_i(\alpha) = \alpha \min_{1\le j\le 3} P_{ij} + (1-\alpha)\max_{1\le j\le 3} P_{ij} \Big\} \qquad (0 \le \alpha \le 1).
\]

From

\[
\begin{aligned}
\min_{1\le j\le 3} P_{1j} &= \min(150, 50, -20) = -20, & \max_{1\le j\le 3} P_{1j} &= \max(150, 50, -20) = 150, \\
\min_{1\le j\le 3} P_{2j} &= \min(300, 175, -100) = -100, & \max_{1\le j\le 3} P_{2j} &= \max(300, 175, -100) = 300, \\
\min_{1\le j\le 3} P_{3j} &= \min(450, 150, -150) = -150, & \max_{1\le j\le 3} P_{3j} &= \max(450, 150, -150) = 450,
\end{aligned}
\]

it follows that

\[
\begin{aligned}
P_1(\alpha) &= -20\alpha + 150(1-\alpha) = 150 - 170\alpha && \text{for decision } a_1, \\
P_2(\alpha) &= -100\alpha + 300(1-\alpha) = 300 - 400\alpha && \text{for decision } a_2, \\
P_3(\alpha) &= -150\alpha + 450(1-\alpha) = 450 - 600\alpha && \text{for decision } a_3,
\end{aligned}
\]

which represent three straight lines. Since \(P_3(\alpha) = 1.5\,P_2(\alpha)\), the second straight line is dominated by the third one for α < 0.75, where both are positive. For α ≥ 0.75 the second straight line is dominated by the first one, because \(P_1(\alpha) - P_2(\alpha) = 230\alpha - 150 > 0\) there. The second straight line therefore never attains the maximum, and in order to find \(\max_{1\le i\le 3} P_i(\alpha)\), it is only necessary to solve the equation 450 − 600α = 150 − 170α. The solution is α = 30/43 ≈ 0.698. This indicates that if α < 30/43, the high-performance computer should be produced and sold; if α > 30/43, the low-performance computer should be produced and sold.
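The crossover value in Example 9.1.2 can be reproduced numerically. The sketch below, assuming Python and using illustrative names (`hurwitz_line`, `best_decision`), builds the three Hurwitz lines from the sales table, computes the intersection of the lines for \(a_1\) and \(a_3\) exactly, and scans α on a grid to see which decisions ever attain the maximum.

```python
from fractions import Fraction

# Sales potential ($ 1000) from Example 9.1.2 (columns s1, s2, s3).
payoffs = {
    "a1": [150, 50, -20],
    "a2": [300, 175, -100],
    "a3": [450, 150, -150],
}

def hurwitz_line(row):
    """Return (intercept, slope) of P(alpha) = alpha * min + (1 - alpha) * max."""
    low, high = min(row), max(row)
    return high, low - high

lines = {name: hurwitz_line(row) for name, row in payoffs.items()}

def best_decision(alpha):
    """Name of the decision with the largest Hurwitz value at this alpha."""
    scores = {name: c + m * alpha for name, (c, m) in lines.items()}
    return max(scores, key=scores.get)

# Intersection of the lines for a1 and a3: c1 + m1 * alpha = c3 + m3 * alpha.
(c1, m1), (c3, m3) = lines["a1"], lines["a3"]
crossover = Fraction(c3 - c1, m1 - m3)
print(crossover, float(crossover))

# Decisions that are optimal for at least one alpha on a grid over [0, 1].
winners = {best_decision(k / 100) for k in range(101)}
print(sorted(winners))
```

On this grid only \(a_1\) and \(a_3\) appear among the winners, which agrees with the observation that the medium-performance line never attains the maximum.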


9.2 Decision trees

The decision tree is the most commonly used tool in risk-based decision-making. It possesses the ability to represent and analyze multiple stages in the decision-making process, which is an attractive feature of decision trees. The single-objective decision tree consists of the following basic components.

Decision node. Decision nodes are designated by a square (□). Branches emanating from a decision node represent the various decisions to be investigated. Each alternative choice is designated by \(a_i\), and each branch is identified with that decision choice.

Chance node. Chance nodes are designated by a circle (○). Branches emanating from a chance node represent the various scenarios with their associated probabilities.

Consequences. The value of the consequences is written at the end of each branch. The consequence associated with the i-th decision and the j-th scenario is designated by \(P_{ij}\).

To determine the optimal manufacturing scenario, the following three measures are used for calculating the expected value of profits for each of the alternative decision options. Let \(a_i\) (i = 1, . . . , I) be the i-th decision. Denote by \(E[a_i]\) the expected value of profits associated with the i-th decision \(a_i\). Let \(s_j\) (j = 1, . . . , J) be the j-th scenario. Denote by \(p(s_j)\) the probability associated with the j-th scenario \(s_j\) and denote by \(P_{ij}\) the payoff associated with the decision/scenario pair \((a_i, s_j)\). Assume that \(p(s_j)\) and \(P_{ij}\) are known.

The Expected Monetary Value (EMV) measure is defined as

\[
\mathrm{EMV} = \max_{1\le i\le I} \sum_{j=1}^{J} p(s_j) P_{ij} = \max_{1\le i\le I} E[a_i].
\]

The Expected Value of Opportunity Loss (EOL) measure is defined as

\[
\mathrm{EOL} = \min_{1\le i\le I} \sum_{j=1}^{J} p(s_j)(M_j - P_{ij}),
\]

where \(M_j = \max_{1\le i\le I} P_{ij}\) (j = 1, . . . , J). The EOL measure is essentially a modification of the EMV measure.

The Most Likely Value (MLV) measure is not commonly used because the optimal results are very sensitive to the number of scenarios. In the MLV, for each decision one selects the scenario with the highest probability and then maximizes the corresponding \(P_{ij}\) over the decisions.

Example 9.2.1. Consider the payoff matrix ($ 1000) in Example 9.1.1

\[
\begin{pmatrix} P_{11} & P_{12} & P_{13} \\ P_{21} & P_{22} & P_{23} \end{pmatrix}
=
\begin{pmatrix} 200 & 110 & -15 \\ 100 & 50 & 5 \end{pmatrix},
\]

where the payoff \(P_{ij}\) is associated with the pair \((a_i, s_j)\) (i = 1, 2; j = 1, 2, 3). The decision tree is represented by

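\[
\begin{array}{lll}
\text{Decision} & \text{Chance} & \text{Consequence} \\ \hline
a_1 & s_1 & P_{11} = 200 \\
    & s_2 & P_{12} = 110 \\
    & s_3 & P_{13} = -15 \\
a_2 & s_1 & P_{21} = 100 \\
    & s_2 & P_{22} = 50 \\
    & s_3 & P_{23} = 5.
\end{array}
\]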

Assume that \(p(s_1) = 0.4\), \(p(s_2) = 0.5\), and \(p(s_3) = 0.1\). Consider the EMV measure. The expected values of profits for \(a_1\) and \(a_2\) are, respectively,

\[
E[a_1] = \sum_{j=1}^{3} p(s_j) P_{1j} = 0.4 \times 200 + 0.5 \times 110 + 0.1 \times (-15) = 80 + 55 - 1.5 = 133.5;
\]
\[
E[a_2] = \sum_{j=1}^{3} p(s_j) P_{2j} = 0.4 \times 100 + 0.5 \times 50 + 0.1 \times 5 = 40 + 25 + 0.5 = 65.5.
\]

Thus, the EMV measure is

\[
\mathrm{EMV} = \max_{1\le i\le 2} \sum_{j=1}^{3} p(s_j) P_{ij} = \max(E[a_1], E[a_2]) = \max(133.5, 65.5) = 133.5,
\]

i.e., $ 133,500, and the corresponding decision is \(a_1\).

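The decision tree with expected value of profits is represented by

\[
\begin{array}{lll}
\text{Decision} & \text{Chance} & \text{Consequence} \\ \hline
a_1 & s_1 & 0.4 \times 200 \\
    & s_2 & +\,0.5 \times 110 \\
    & s_3 & +\,0.1 \times (-15) = 133.5 \\
a_2 & s_1 & 0.4 \times 100 \\
    & s_2 & +\,0.5 \times 50 \\
    & s_3 & +\,0.1 \times 5 = 65.5.
\end{array}
\]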


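Consider the EOL measure. By \(M_j = \max_{1\le i\le 2} \{P_{ij}\}\) (j = 1, 2, 3), we get

\[
\begin{aligned}
M_1 &= \max_{1\le i\le 2}(P_{11}, P_{21}) = \max(200, 100) = 200, \\
M_2 &= \max_{1\le i\le 2}(P_{12}, P_{22}) = \max(110, 50) = 110, \\
M_3 &= \max_{1\le i\le 2}(P_{13}, P_{23}) = \max(-15, 5) = 5.
\end{aligned}
\]

By using \(\sum_{j=1}^{3} p(s_j)(M_j - P_{ij})\) (i = 1, 2), we get

\[
\begin{aligned}
\sum_{j=1}^{3} p(s_j)(M_j - P_{1j}) &= 0.4 \times (200 - 200) + 0.5 \times (110 - 110) + 0.1 \times (5 - (-15)) \\
&= 0.4 \times 0 + 0.5 \times 0 + 0.1 \times 20 = 0 + 0 + 2 = 2 \quad \text{for } i = 1;
\end{aligned}
\]
\[
\begin{aligned}
\sum_{j=1}^{3} p(s_j)(M_j - P_{2j}) &= 0.4 \times (200 - 100) + 0.5 \times (110 - 50) + 0.1 \times (5 - 5) \\
&= 0.4 \times 100 + 0.5 \times 60 + 0.1 \times 0 = 40 + 30 + 0 = 70 \quad \text{for } i = 2.
\end{aligned}
\]

Thus, the EOL measure is

\[
\mathrm{EOL} = \min_{1\le i\le 2} \sum_{j=1}^{3} p(s_j)(M_j - P_{ij}) = \min(2, 70) = 2.
\]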

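The decision tree with EOL measure is represented by

\[
\begin{array}{lll}
\text{Decision} & \text{Chance} & \text{Consequence} \\ \hline
a_1 & s_1 & 0.4 \times 0 \\
    & s_2 & +\,0.5 \times 0 \\
    & s_3 & +\,0.1 \times 20 = 2 \\
a_2 & s_1 & 0.4 \times 100 \\
    & s_2 & +\,0.5 \times 60 \\
    & s_3 & +\,0.1 \times 0 = 70.
\end{array}
\]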

Consider the MLV measure. For i = 1, the highest probability is max1≤j≤3 p(s j ) = 0.5. The corresponding profit is P12 = 110. For i = 2, the highest probability is max1≤j≤3 p(s j ) = 0.5. The corresponding profit is P22 = 50. Thus, the measure is MLV = max(110, 50) = 110.
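The three measures of Example 9.2.1 reduce to a few sums and can be sketched in Python as follows. The function names `emv`, `eol`, and `mlv` are illustrative, and the MLV version assumes, as in this example, that all decisions share the same scenario probabilities.

```python
def emv(payoffs, probs):
    """Expected Monetary Value: best decision index and all expected payoffs."""
    expected = [sum(p * v for p, v in zip(probs, row)) for row in payoffs]
    best = max(range(len(expected)), key=expected.__getitem__)
    return best, expected

def eol(payoffs, probs):
    """Expected Opportunity Loss: best decision index and all expected losses."""
    col_max = [max(col) for col in zip(*payoffs)]  # M_j for each scenario
    losses = [sum(p * (m - v) for p, m, v in zip(probs, col_max, row))
              for row in payoffs]
    best = min(range(len(losses)), key=losses.__getitem__)
    return best, losses

def mlv(payoffs, probs):
    """Most Likely Value: best payoff within the most probable scenario."""
    j = max(range(len(probs)), key=probs.__getitem__)
    column = [row[j] for row in payoffs]
    best = max(range(len(column)), key=column.__getitem__)
    return best, column[best]

payoffs = [[200, 110, -15], [100, 50, 5]]  # rows a1, a2; columns s1, s2, s3
probs = [0.4, 0.5, 0.1]

print(emv(payoffs, probs))  # a1 is chosen; expected payoffs near 133.5 and 65.5
print(eol(payoffs, probs))  # a1 is chosen; expected losses near 2 and 70
print(mlv(payoffs, probs))  # a1 is chosen with payoff 110
```

All three measures select \(a_1\) here, although in general they need not agree.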


9.3 Fractile and triangular methods

The fractile method is used for assessing probability distributions. A probability distribution can be represented by a cumulative distribution function (cdf) given by

\[
F(x) = \int_a^x f(t)\,dt \qquad (a < x < b),
\]

where f(t) is a density function and the interval (a, b) is often chosen as one of the three intervals (0, 1), (0, ∞), and (−∞, ∞). It is seen from this representation that if the density function is continuous on the interval [a, b] and f(t) > 0 (a < t < b), then the cdf F(x) is a smooth, positive, strictly increasing function, and the cdf tends to 0 as x tends to a and tends to 1 as x tends to b. It is well known that the values of the cdf can be interpreted as the probabilities of events

\[
F(x) = P(a < \theta < x) \qquad (a < x < b).
\]

The probabilities on the intervals a ≤ θ < x, a < θ ≤ x, a ≤ θ ≤ x, and a < θ < x are the same, i.e.,

\[
P(a < \theta < x) = P(a \le \theta < x) = P(a < \theta \le x) = P(a \le \theta \le x).
\]

Definition 9.3.1. Given a probability p (0 ≤ p ≤ 1), let \(x_p\) denote that value of x so that

\[
F(x_p) = P(a < \theta < x_p) = \int_a^{x_p} f(t)\,dt = p.
\]

The value \(x_p\) is called the p-th fractile of θ.

Assume that f(t) > 0 (a < t < b). Then

\[
F(a) = P(\theta = a) = P(a < \theta < a) = \int_a^a f(t)\,dt = 0,
\qquad
F(b) = P(a < \theta < b) = \int_a^b f(t)\,dt = 1,
\]

and according to the strictly increasing property of the cdf, the value \(x_p\) is uniquely defined. Especially, the 0.5-th fractile \(x_{0.5}\) is called the median of θ; the 0.25-th fractile \(x_{0.25}\) is called the lower quartile of θ; and the 0.75-th fractile \(x_{0.75}\) is called the upper quartile of θ.


For these three fractiles \(x_{0.25}\), \(x_{0.5}\), \(x_{0.75}\), noticing that \(\int_a^b f(t)\,dt = 1\), it follows from Definition 9.3.1 that

\[
\begin{aligned}
P(a < \theta < x_{0.5}) &= \int_a^{x_{0.5}} f(t)\,dt = 0.5, \\
P(x_{0.5} < \theta < b) &= \int_{x_{0.5}}^{b} f(t)\,dt = \int_a^b f(t)\,dt - \int_a^{x_{0.5}} f(t)\,dt = 1 - 0.5 = 0.5, \\
P(a < \theta < x_{0.25}) &= \int_a^{x_{0.25}} f(t)\,dt = 0.25, \\
P(x_{0.25} < \theta < x_{0.5}) &= \int_{x_{0.25}}^{x_{0.5}} f(t)\,dt = \int_a^{x_{0.5}} f(t)\,dt - \int_a^{x_{0.25}} f(t)\,dt = 0.5 - 0.25 = 0.25, \\
P(x_{0.5} < \theta < x_{0.75}) &= \int_{x_{0.5}}^{x_{0.75}} f(t)\,dt = \int_a^{x_{0.75}} f(t)\,dt - \int_a^{x_{0.5}} f(t)\,dt = 0.75 - 0.5 = 0.25, \\
P(x_{0.75} < \theta < b) &= \int_{x_{0.75}}^{b} f(t)\,dt = \int_a^b f(t)\,dt - \int_a^{x_{0.75}} f(t)\,dt = 1 - 0.75 = 0.25.
\end{aligned}
\]

So the three fractiles \(x_{0.25}\), \(x_{0.5}\), \(x_{0.75}\) have the following probability property:

\[
P(a < \theta < x_{0.5}) = P(x_{0.5} < \theta < b),
\]
\[
P(a < \theta < x_{0.25}) = P(x_{0.25} < \theta < x_{0.5}) = P(x_{0.5} < \theta < x_{0.75}) = P(x_{0.75} < \theta < b).
\]

Because of these equalities, the decision maker often directly assesses these three fractiles, and the cdf is then required to have the assessed fractiles, i.e.,

\[
F(x_{0.25}) = 0.25, \qquad F(x_{0.5}) = 0.5, \qquad F(x_{0.75}) = 0.75.
\]

The graphical method

Assume that the decision maker has assessed these three fractiles, so that the points \((x_{0.25}, 0.25)\), \((x_{0.5}, 0.5)\), and \((x_{0.75}, 0.75)\) are specified. The question is how to determine a cdf F(x) whose graph passes through the given points. Establish an oxp-rectangular coordinate system on a sheet of graph paper, where the vertical op-axis is the probability axis and the horizontal ox-axis is the fractile axis. In this coordinate system, plot the three assessed points \((x_{0.25}, 0.25)\), \((x_{0.5}, 0.5)\), \((x_{0.75}, 0.75)\) and the two end points (a, 0), (b, 1), and then use the smoothness and the strictly increasing property of the cdf to draw a graph through the plotted points. The cdf F(x) is obtained graphically.
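A crude computational stand-in for the hand-drawn curve is to join the five plotted points by straight segments. The sketch below assumes Python; the piecewise-linear interpolant is a simplification (it is strictly increasing but not smooth at the assessed points), and the numerical fractiles are hypothetical values chosen only for illustration.

```python
from bisect import bisect_right

def piecewise_linear_cdf(points):
    """Build F(x) through (x, p) points sorted by x, from (a, 0) to (b, 1)."""
    xs = [x for x, _ in points]
    ps = [p for _, p in points]

    def cdf(x):
        if x <= xs[0]:
            return 0.0
        if x >= xs[-1]:
            return 1.0
        k = bisect_right(xs, x) - 1            # segment containing x
        t = (x - xs[k]) / (xs[k + 1] - xs[k])  # relative position in the segment
        return ps[k] + t * (ps[k + 1] - ps[k])

    return cdf

# Hypothetical assessment on (a, b) = (0, 10): quartiles 2 and 5, median 3.
cdf = piecewise_linear_cdf([(0, 0.0), (2, 0.25), (3, 0.5), (5, 0.75), (10, 1.0)])

print(cdf(3))  # the median is reproduced: 0.5
print(cdf(4))  # halfway between the median and the upper quartile: 0.625
```

A smooth monotone interpolant could replace the straight segments if a differentiable cdf is required.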


The analytical method

In this method, the decision maker assumes that the considered probability distribution belongs to some parametric family of distributions, and then chooses values for the parameters so that the resulting cdf will have the assessed fractiles.

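9.3.1 Assessing a normal distribution

The decision maker first assumes that the considered probability distribution belongs to the family of normal distributions. Denote by \(f_N(t)\) a normal density function with parameters μ and σ²

\[
f_N(t) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}},
\]

where μ is the expected value and σ² is the variance. Secondly, the decision maker chooses values for the parameters μ and σ² so that the resulting normal cdf

\[
F_N(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt
\]

satisfies \(F_N(x_{0.25}) = 0.25\), \(F_N(x_{0.5}) = 0.5\), and \(F_N(x_{0.75}) = 0.75\). It follows from Definition 9.3.1 that

\[
F_N(x_{0.5}) = P(-\infty < \theta < x_{0.5}) = \int_{-\infty}^{x_{0.5}} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt = 0.5.
\]

Note that

\[
\int_{x_{0.5}}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt
= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt
- \int_{-\infty}^{x_{0.5}} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt
= 1 - 0.5 = 0.5.
\]

So

\[
\int_{-\infty}^{x_{0.5}} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt
= \int_{x_{0.5}}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt.
\]

By the change of variables \(u = \frac{t-\mu}{\sigma}\), this implies that

\[
\int_{-\infty}^{\frac{x_{0.5}-\mu}{\sigma}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\,du
= \int_{\frac{x_{0.5}-\mu}{\sigma}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\,du.
\]

Since the integrand \(e^{-u^2/2}\) is an even function,

\[
\frac{x_{0.5} - \mu}{\sigma} = 0 \quad \text{or} \quad \mu = x_{0.5},
\]

i.e., the parameter μ (i.e., the expected value) is chosen to be the 0.5-th fractile \(x_{0.5}\).

It follows from Definition 9.3.1 and \(\mu = x_{0.5}\) that

\[
F_N(x_{0.75}) = P(-\infty < \theta < x_{0.75}) = \int_{-\infty}^{x_{0.75}} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-x_{0.5})^2}{2\sigma^2}}\,dt = 0.75.
\]

By the change of variables \(u = \frac{t - x_{0.5}}{\sigma}\), this implies that

\[
\int_{-\infty}^{x_{0.75}} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-x_{0.5})^2}{2\sigma^2}}\,dt
= \int_{-\infty}^{\frac{x_{0.75}-x_{0.5}}{\sigma}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\,du,
\]

and so

\[
\int_{-\infty}^{\frac{x_{0.75}-x_{0.5}}{\sigma}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\,du = 0.75. \tag{9.3.1}
\]

The integrand \(\frac{1}{\sqrt{2\pi}} e^{-u^2/2}\) is the standard normal density function, and the upper quartile of the standard normal distribution can be found in a statistical table to be 0.674. So

\[
\frac{x_{0.75} - x_{0.5}}{\sigma} = 0.674
\quad \text{or} \quad
\sigma^2 = \frac{(x_{0.75} - x_{0.5})^2}{(0.674)^2} = 2.20\,(x_{0.75} - x_{0.5})^2,
\]

i.e., the parameter σ² (i.e., the variance) is chosen to be 2.20 times the square of the difference between the 0.75-th fractile and the 0.5-th fractile.

It follows from Definition 9.3.1 and \(\mu = x_{0.5}\) that

\[
F_N(x_{0.25}) = P(-\infty < \theta < x_{0.25}) = \int_{-\infty}^{x_{0.25}} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-x_{0.5})^2}{2\sigma^2}}\,dt = 0.25.
\]

By the change of variables \(u = \frac{x_{0.5} - t}{\sigma}\), this implies that

\[
\int_{-\infty}^{x_{0.25}} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-x_{0.5})^2}{2\sigma^2}}\,dt
= \int_{\frac{x_{0.5}-x_{0.25}}{\sigma}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\,du,
\]

and so

\[
\int_{\frac{x_{0.5}-x_{0.25}}{\sigma}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\,du = 0.25. \tag{9.3.2}
\]

On the other hand, by (9.3.1) and \(\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\,du = 1\),

\[
\int_{\frac{x_{0.75}-x_{0.5}}{\sigma}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\,du
= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\,du
- \int_{-\infty}^{\frac{x_{0.75}-x_{0.5}}{\sigma}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\,du
= 0.25.
\]

From this and (9.3.2),

\[
\int_{\frac{x_{0.5}-x_{0.25}}{\sigma}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\,du
= \int_{\frac{x_{0.75}-x_{0.5}}{\sigma}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\,du.
\]

So \(x_{0.5} - x_{0.25} = x_{0.75} - x_{0.5}\), i.e., the distance between the 0.5-th fractile and the 0.25-th fractile is equal to the distance between the 0.75-th fractile and the 0.5-th fractile. So the parameter σ² (i.e., the variance) may also be chosen to be 2.20 times the square of the difference between the 0.5-th fractile and the 0.25-th fractile

\[
\sigma^2 = 2.20\,(x_{0.5} - x_{0.25})^2.
\]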
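The two parameter choices derived above can be tried out with Python's standard library. This is a minimal sketch: the function name `assess_normal` and the assessed fractiles (median 10, upper quartile 12) are hypothetical, and the table value 0.674 is replaced by the more precise quantile returned by `statistics.NormalDist.inv_cdf`.

```python
from statistics import NormalDist

def assess_normal(x50, x75):
    """Choose mu and sigma so that the normal cdf has the assessed fractiles."""
    z75 = NormalDist().inv_cdf(0.75)  # upper quartile of N(0, 1), about 0.674
    mu = x50                          # mu is the 0.5-th fractile
    sigma = (x75 - x50) / z75         # from (x75 - x50) / sigma = z75
    return mu, sigma

# Hypothetical assessment: median 10 and upper quartile 12.
mu, sigma = assess_normal(10.0, 12.0)
fitted = NormalDist(mu, sigma)

print(mu, sigma, sigma ** 2)  # sigma^2 is close to 2.20 * (12 - 10)^2
print(fitted.cdf(12.0))       # close to 0.75
print(fitted.cdf(8.0))        # close to 0.25, by symmetry about the median
```

If the assessed lower quartile is not symmetric to the upper one about the median, no normal distribution matches all three fractiles exactly, and some compromise value of σ has to be chosen.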

9.3.2 Assessing a beta prior distribution

The decision maker assumes first that the considered probability distribution belongs to the beta family of distributions. Denote by \(f_\beta(t)\) the beta density function with parameters \(r_0\) and \(n_0\)

\[
f_\beta(t) = \frac{1}{B(r_0, n_0)}\, t^{r_0 - 1} (1 - t)^{n_0 - r_0 - 1} \qquad (0 \le t \le 1),
\]

where \(n_0 > r_0 > 0\) (so that the integral below converges) and \(B(r_0, n_0)\) is a beta function

\[
B(r_0, n_0) = \int_0^1 t^{r_0 - 1} (1 - t)^{n_0 - r_0 - 1}\,dt.
\]

Secondly, the decision maker chooses values for the parameters \(r_0\) and \(n_0\) so that the resulting beta cdf

\[
F_\beta(x) = \int_0^x f_\beta(t)\,dt = B(r_0, n_0)^{-1} \int_0^x t^{r_0 - 1} (1 - t)^{n_0 - r_0 - 1}\,dt \qquad (0 \le x \le 1)
\]

satisfies \(F_\beta(x_{0.25}) = 0.25\), \(F_\beta(x_{0.5}) = 0.5\), and \(F_\beta(x_{0.75}) = 0.75\). Generally speaking, it is not possible to satisfy all three conditions exactly with only two parameters. To satisfy them approximately, the decision maker considers the error function

\[
E(r_0, n_0) = (F_\beta(x_{0.25}) - 0.25)^2 + (F_\beta(x_{0.5}) - 0.5)^2 + (F_\beta(x_{0.75}) - 0.75)^2
\]

and uses a computer code to calculate values \(r_0\) and \(n_0\) that minimize \(E(r_0, n_0)\).
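The text does not specify the computer code used to minimize \(E(r_0, n_0)\). One very simple possibility, sketched here in pure Python, is a coarse grid search with the beta cdf evaluated by Simpson's rule. The function names, the grid, and the restriction to \(r_0 \ge 1\) and \(n_0 - r_0 \ge 1\) (which keeps the integrand bounded) are all assumptions of this sketch; a library optimizer and a library incomplete beta function would normally be preferred.

```python
def beta_cdf(x, r0, n0, steps=200):
    """F_beta(x) for the density t^(r0-1) (1-t)^(n0-r0-1) / B(r0, n0).

    Uses composite Simpson's rule; assumes r0 >= 1 and n0 - r0 >= 1.
    """
    def kernel(t):
        return t ** (r0 - 1) * (1 - t) ** (n0 - r0 - 1)

    def simpson(lo, hi):
        h = (hi - lo) / steps
        total = kernel(lo) + kernel(hi)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * kernel(lo + k * h)
        return total * h / 3

    return simpson(0.0, x) / simpson(0.0, 1.0)

def fit_error(r0, n0, fractiles):
    """E(r0, n0): sum of squared deviations at the assessed fractiles."""
    return sum((beta_cdf(x, r0, n0) - p) ** 2 for p, x in fractiles.items())

def fit_beta(fractiles, grid):
    """Grid search over r0 and n0 - r0; returns the best (r0, n0) pair."""
    candidates = [(r0, r0 + rest) for r0 in grid for rest in grid]
    return min(candidates, key=lambda c: fit_error(c[0], c[1], fractiles))

grid = [1 + 0.5 * k for k in range(9)]          # 1.0, 1.5, ..., 5.0
fractiles = {0.25: 0.25, 0.5: 0.5, 0.75: 0.75}  # hypothetical assessment

r0, n0 = fit_beta(fractiles, grid)
print(r0, n0)  # the uniform distribution, r0 = 1 and n0 = 2, fits these fractiles
```

The hypothetical fractiles were chosen so that an exact fit exists; for a real assessment the minimum of E is usually positive and a finer grid or a proper optimizer is needed.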


The triangular distribution method is another ideal method. This method qualifies a triangle as a probability density function. Let the area of the triangle be equal to 1. Denote by a and b the lowest and highest values of the outcome, respectively, and denote by c the most likely value of the outcome. The base of the triangle is equal to b − a and its height is the frequency p(c) of the most likely value. So

\[
\tfrac{1}{2}(b - a)\,p(c) = 1,
\]

and so the frequency of the most likely value of the outcome is given by

\[
p(c) = \frac{2}{b - a}.
\]

Law and Kelton presented that the cdf P(x) and pdf p(x) of the triangular distribution for a continuous random variable X are as follows:

\[
P(x) = P(X \le x) =
\begin{cases}
0 & (x < a), \\[2pt]
\dfrac{(x-a)^2}{(b-a)(c-a)} & (a \le x \le c), \\[6pt]
1 - \dfrac{(b-x)^2}{(b-a)(b-c)} & (c < x \le b), \\[6pt]
1 & (x > b)
\end{cases}
\]

and

\[
p(x) =
\begin{cases}
0 & (x < a), \\[2pt]
\dfrac{2(x-a)}{(b-a)(c-a)} & (a \le x \le c), \\[6pt]
\dfrac{2(b-x)}{(b-a)(b-c)} & (c < x \le b), \\[6pt]
0 & (x > b),
\end{cases}
\]

where \(p(x) = \frac{d}{dx} P(x)\).

The expected value of the triangular distribution is

\[
E[X] = \frac{a + b + c}{3}.
\]

In fact, by the definition, the expected value of the triangular distribution is

\[
E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx
= \frac{2}{(b-a)(c-a)} \int_a^c (x^2 - ax)\,dx + \frac{2}{(b-a)(b-c)} \int_c^b (bx - x^2)\,dx.
\]

However,

\[
\int_a^c (x^2 - ax)\,dx = \frac{c^3 - a^3}{3} - \frac{a(c^2 - a^2)}{2},
\qquad
\int_c^b (bx - x^2)\,dx = \frac{b^3 - bc^2}{2} - \frac{b^3 - c^3}{3}.
\]

Therefore, the expected value of the triangular distribution is

\[
\begin{aligned}
E[X] &= \frac{2}{(b-a)(c-a)} \left( \frac{c^3 - a^3}{3} - \frac{a(c^2 - a^2)}{2} \right)
+ \frac{2}{(b-a)(b-c)} \left( \frac{b^3 - bc^2}{2} - \frac{b^3 - c^3}{3} \right) \\
&= \frac{2c^2 - ac - a^2}{3(b-a)} + \frac{b^2 + bc - 2c^2}{3(b-a)} = \frac{a + b + c}{3}.
\end{aligned}
\]

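The variance of the triangular distribution is

\[
\mathrm{Var}(X) = \frac{a^2 + b^2 + c^2 - ab - ac - bc}{18}.
\]

In fact, by the definition, the variance of the triangular distribution is

\[
\begin{aligned}
\mathrm{Var}(X) &= E[(X - E[X])^2] \\
&= \int_{-\infty}^{\infty} \big(x^2 - 2xE[X] + (E[X])^2\big)\,p(x)\,dx \\
&= \int_{-\infty}^{\infty} x^2 p(x)\,dx - 2E[X] \int_{-\infty}^{\infty} x\,p(x)\,dx + (E[X])^2 \int_{-\infty}^{\infty} p(x)\,dx \\
&= I_1 - 2E[X]\,I_2 + (E[X])^2 I_3.
\end{aligned}
\tag{9.3.3}
\]

These three integrals \(I_1\), \(I_2\), and \(I_3\) are computed as follows:

\[
\begin{aligned}
I_1 &= \int_{-\infty}^{\infty} x^2 p(x)\,dx \\
&= \frac{2}{(b-a)(c-a)} \int_a^c (x^3 - ax^2)\,dx + \frac{2}{(b-a)(b-c)} \int_c^b (bx^2 - x^3)\,dx \\
&= \frac{2}{(b-a)(c-a)} \left( \frac{c^4 - a^4}{4} - \frac{a(c^3 - a^3)}{3} \right)
+ \frac{2}{(b-a)(b-c)} \left( \frac{b(b^3 - c^3)}{3} - \frac{b^4 - c^4}{4} \right) \\
&= \frac{2}{b-a} \left( \frac{a^3 - b^3 + c(a^2 - b^2) + c^2(a - b)}{4} + \frac{b^3 - a^3 + c(b^2 - a^2) + c^2(b - a)}{3} \right) \\
&= \frac{a^2 + b^2 + c^2 + ab + bc + ac}{6},
\end{aligned}
\]

\[
\begin{aligned}
I_2 &= \int_{-\infty}^{\infty} x\,p(x)\,dx \\
&= \frac{2}{(b-a)(c-a)} \int_a^c (x^2 - ax)\,dx + \frac{2}{(b-a)(b-c)} \int_c^b (bx - x^2)\,dx \\
&= \frac{2}{(b-a)(c-a)} \left( \frac{c^3 - a^3}{3} - \frac{a(c^2 - a^2)}{2} \right)
+ \frac{2}{(b-a)(b-c)} \left( \frac{b(b^2 - c^2)}{2} - \frac{b^3 - c^3}{3} \right) \\
&= \frac{2}{b-a} \left( \frac{a^2 - b^2 + c(a - b)}{3} + \frac{(b^2 - a^2) + c(b - a)}{2} \right) \\
&= \frac{a + b + c}{3},
\end{aligned}
\]

and

\[
\begin{aligned}
I_3 &= \int_{-\infty}^{\infty} p(x)\,dx \\
&= \frac{2}{(b-a)(c-a)} \int_a^c (x - a)\,dx + \frac{2}{(b-a)(b-c)} \int_c^b (b - x)\,dx \\
&= \frac{2}{(b-a)(c-a)} \left( \frac{c^2 - a^2}{2} - a(c - a) \right)
+ \frac{2}{(b-a)(b-c)} \left( b(b - c) - \frac{b^2 - c^2}{2} \right) \\
&= \frac{2}{b-a} \left( \frac{c + a}{2} - a \right) + \frac{2}{b-a} \left( b - \frac{b + c}{2} \right) \\
&= \frac{2}{b-a} \left( \frac{c - a}{2} + \frac{b - c}{2} \right) = 1.
\end{aligned}
\]

Note that \(E[X] = \frac{a+b+c}{3}\). By (9.3.3), the variance of the triangular distribution is

\[
\begin{aligned}
\mathrm{Var}(X) &= I_1 - 2E[X]\,I_2 + (E[X])^2 I_3 \\
&= \frac{a^2 + b^2 + c^2 + ab + bc + ac}{6} - 2E[X] \left( \frac{a + b + c}{3} \right) + (E[X])^2 \\
&= \frac{a^2 + b^2 + c^2 + ab + bc + ac}{6} - \frac{2(a + b + c)^2}{9} + \frac{(a + b + c)^2}{9} \\
&= \frac{a^2 + b^2 + c^2 - ab - ac - bc}{18}.
\end{aligned}
\]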
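The closed forms for the mean and variance can be cross-checked by integrating the pdf numerically. The following Python sketch uses illustrative names (`tri_pdf`, `simpson`, `moment`) and hypothetical values a = 1, c = 3, b = 7; it assumes a < c < b so that both pieces of the pdf are well defined. Simpson's rule is applied separately on [a, c] and [c, b], where the integrands are polynomials of degree at most three and the rule is exact up to rounding.

```python
def tri_pdf(x, a, c, b):
    """Triangular pdf with lowest value a, most likely value c, highest value b."""
    if a <= x <= c:
        return 2 * (x - a) / ((b - a) * (c - a))
    if c < x <= b:
        return 2 * (b - x) / ((b - a) * (b - c))
    return 0.0

def simpson(g, lo, hi, steps=100):
    """Composite Simpson's rule on [lo, hi] with an even number of steps."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

def moment(power, a, c, b):
    """Integral of x^power * p(x), split at the kink x = c."""
    g = lambda x: x ** power * tri_pdf(x, a, c, b)
    return simpson(g, a, c) + simpson(g, c, b)

a, c, b = 1.0, 3.0, 7.0  # hypothetical lowest, most likely, and highest values

i1, i2, i3 = moment(2, a, c, b), moment(1, a, c, b), moment(0, a, c, b)
mean_formula = (a + b + c) / 3
var_formula = (a * a + b * b + c * c - a * b - a * c - b * c) / 18

print(i3)                          # total probability, close to 1
print(i2, mean_formula)            # both close to 11/3
print(i1 - i2 ** 2, var_formula)   # both close to 28/18
```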

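Example 9.3.2. Given the pdf p(x) of the triangular distribution as above, compute the value of the conditional expected risk function

\[
g(\cdot) = \frac{\int_\beta^b x\,p(x)\,dx}{\int_\beta^b p(x)\,dx},
\]

where β > c.

Solution. Since β > c, \(p(x) = \frac{2(b-x)}{(b-a)(b-c)}\) on [β, b], and so

\[
g(\cdot) = \frac{\int_\beta^b x\,p(x)\,dx}{\int_\beta^b p(x)\,dx} = \frac{\int_\beta^b x(b - x)\,dx}{\int_\beta^b (b - x)\,dx}.
\]

The numerator is

\[
\int_\beta^b x(b - x)\,dx = \frac{b^3 - b\beta^2}{2} - \frac{b^3 - \beta^3}{3}
\]

and the denominator is

\[
\int_\beta^b (b - x)\,dx = b^2 - b\beta - \frac{b^2 - \beta^2}{2}.
\]

Thus, the conditional expected risk function is

\[
g(\cdot) = \frac{\dfrac{b^3 - b\beta^2}{2} - \dfrac{b^3 - \beta^3}{3}}{b^2 - b\beta - \dfrac{b^2 - \beta^2}{2}}
= \frac{\dfrac{b(b + \beta)}{2} - \dfrac{b^2 + b\beta + \beta^2}{3}}{b - \dfrac{b + \beta}{2}}
= \frac{b^2 - \beta^2 + b\beta - \beta^2}{3(b - \beta)} = \frac{b + 2\beta}{3}.
\]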
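The result \(g = (b + 2\beta)/3\) can be confirmed numerically. The sketch below, in Python, evaluates the two integrals by Simpson's rule (exact here up to rounding, since the integrands are polynomials of degree at most two); the function names and the values a = 1, c = 3, b = 7, β = 5 are hypothetical.

```python
def simpson(g, lo, hi, steps=100):
    """Composite Simpson's rule on [lo, hi] with an even number of steps."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3

def conditional_expected_risk(beta, a, c, b):
    """E[X | X > beta] for the triangular distribution, assuming c < beta < b."""
    if not c < beta < b:
        raise ValueError("this sketch requires c < beta < b")
    pdf = lambda x: 2 * (b - x) / ((b - a) * (b - c))  # upper branch of p(x)
    numerator = simpson(lambda x: x * pdf(x), beta, b)
    denominator = simpson(pdf, beta, b)
    return numerator / denominator

a, c, b, beta = 1.0, 3.0, 7.0, 5.0  # hypothetical values with beta > c

print(conditional_expected_risk(beta, a, c, b))  # numerical value
print((b + 2 * beta) / 3)                        # closed form, 17/3
```

Note that the result does not depend on a or c, because the normalizing constant of the upper branch cancels in the ratio.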

9.4 The ε-constraint method

Let \(f_i: \mathbb{R}^n \to \mathbb{R}\) (i = 1, . . . , N) and \(c_i: \mathbb{R}^n \to \mathbb{R}\) (i = 1, . . . , m) all be continuously differentiable functions. The model of the form

\[
\min_x \{ f_1(x), \ldots, f_N(x) \} \quad \text{subject to} \quad c_i(x) \le 0 \quad (i = 1, \ldots, m)
\tag{9.4.1}
\]

is called a multiobjective optimization model. Haimes et al. presented the ε-constraint method and showed that the model (9.4.1) is equivalent to the ε-constraint optimization model of the form

\[
\min_{x \in X} f_i(x) \quad \text{subject to} \quad f_j(x) \le \varepsilon_j \quad (j \ne i;\ j = 1, 2, \ldots, N),
\tag{9.4.2}
\]

where \(\varepsilon_j\) (j ≠ i; j = 1, . . . , N) are variables and

\[
X = \{ x \in \mathbb{R}^n \mid c_i(x) \le 0 \ (i = 1, \ldots, m) \},
\qquad
\varepsilon_j = \min_{x \in X} f_j(x) + \epsilon_j, \quad \epsilon_j > 0 \quad (j \ne i;\ j = 1, \ldots, N).
\]

From (9.4.2), it is seen that the ε-constraint optimization model replaces N − 1 objective functions in (9.4.1) by N − 1 constraints, i.e., one objective \(f_i(x)\) is the principal objective, and all others \(f_j(x)\) (j ≠ i; j = 1, . . . , N) are the constraining objectives.


The generalized Lagrange function corresponding to the ε-constraint optimization model is given by

\[
L(x; \lambda) = f_i(x) + \sum_{\substack{j \ne i \\ j = 1, \ldots, N}} \lambda_{ij} \big( f_j(x) - \varepsilon_j \big),
\]

where \(\lambda_{ij}\) are called generalized Lagrange multipliers and the positive Lagrange multipliers \(\lambda_{ij}\) are called the trade-off functions between the principal objective \(f_i(x)\) and the constraining objective \(f_j(x)\) (j ≠ i; i, j = 1, . . . , N). The trade-off function corresponds to the noninferior set of solutions, so this concept is very important.

Example 9.4.1. Let

\[
f_1(x_1, x_2) = (x_1 - 1)^2 + (x_2 - 2)^2 + 1,
\qquad
f_2(x_1, x_2) = (x_1 - 4)^2 + (x_2 - 5)^2 + 4.
\]

Consider a biobjective optimization model

\[
\min_{(x_1, x_2)} \{ f_1(x_1, x_2), f_2(x_1, x_2) \} \quad \text{subject to the restriction:} \quad x_1 \ge 0,\ x_2 \ge 0.
\]

Let \(f_1(x_1, x_2)\) be the principal objective and \(f_2(x_1, x_2)\) the constraining objective. Then this model is equivalent to the following ε-constraint optimization model:

\[
\min_{(x_1, x_2) \in X} f_1(x_1, x_2) \quad \text{subject to} \quad f_2(x_1, x_2) \le \varepsilon_2 \quad ((x_1, x_2) \in X),
\]

where \(X = \{ (x_1, x_2) \mid x_1 \ge 0,\ x_2 \ge 0 \}\). The corresponding generalized Lagrange function is given by

\[
\begin{aligned}
L(x_1, x_2; \lambda_{12}) &= f_1(x_1, x_2) + \lambda_{12} \big( f_2(x_1, x_2) - \varepsilon_2 \big) \\
&= (x_1 - 1)^2 + (x_2 - 2)^2 + 1 + \lambda_{12} \big( (x_1 - 4)^2 + (x_2 - 5)^2 + 4 - \varepsilon_2 \big).
\end{aligned}
\]

Differentiating both sides with respect to \(x_1\) and \(x_2\), respectively,

\[
\frac{\partial L}{\partial x_1} = 2(x_1 - 1) + 2\lambda_{12}(x_1 - 4),
\qquad
\frac{\partial L}{\partial x_2} = 2(x_2 - 2) + 2\lambda_{12}(x_2 - 5).
\]

Let \(\frac{\partial L}{\partial x_i} = 0\) (i = 1, 2). Then

\[
2(x_1 - 1) + 2\lambda_{12}(x_1 - 4) = 0,
\qquad
2(x_2 - 2) + 2\lambda_{12}(x_2 - 5) = 0.
\]

This implies that the trade-off function \(\lambda_{12}\) between \(f_1(x_1, x_2)\) and \(f_2(x_1, x_2)\) satisfies

\[
\lambda_{12} = \frac{x_1 - 1}{4 - x_1},
\qquad
\lambda_{12} = \frac{x_2 - 2}{5 - x_2}.
\tag{9.4.3}
\]

From (9.4.3) and \(\lambda_{12} > 0\), it follows that the decision variables \(x_1\) and \(x_2\) satisfy 1 < \(x_1\) < 4, 2 < \(x_2\) < 5. Several noninferior solutions and trade-off values of the biobjective optimization model are listed in the following table.

\[
\begin{array}{rr|rr|r}
x_1 & x_2 & f_1(x_1, x_2) & f_2(x_1, x_2) & \lambda_{12} \\ \hline
1.5 & 2.5 & 1.5 & 16.5 & 0.2 \\
2 & 3 & 3 & 12 & 0.5 \\
2.5 & 3.5 & 5.5 & 8.5 & 1 \\
3 & 4 & 9 & 6 & 2 \\
3.5 & 4.5 & 13.5 & 4.5 & 5
\end{array}
\]
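The table entries of Example 9.4.1 can be regenerated directly from the objective functions and from (9.4.3). In the Python sketch below the function names are illustrative; it uses the fact that equating the two expressions for \(\lambda_{12}\) in (9.4.3) forces \(x_2 = x_1 + 1\).

```python
def f1(x1, x2):
    """Principal objective of Example 9.4.1."""
    return (x1 - 1) ** 2 + (x2 - 2) ** 2 + 1

def f2(x1, x2):
    """Constraining objective of Example 9.4.1."""
    return (x1 - 4) ** 2 + (x2 - 5) ** 2 + 4

def tradeoff(x1):
    """lambda_12 from the first relation in (9.4.3); requires 1 < x1 < 4."""
    return (x1 - 1) / (4 - x1)

rows = []
for x1 in (1.5, 2, 2.5, 3, 3.5):
    x2 = x1 + 1  # both relations in (9.4.3) agree only on this line
    rows.append((x1, x2, f1(x1, x2), f2(x1, x2), tradeoff(x1)))

for row in rows:
    print(row)  # x1, x2, f1, f2, lambda_12
```

Moving along the segment from (1, 2) towards (4, 5) increases \(f_1\), decreases \(f_2\), and increases the trade-off value, which is the behaviour shown in the table.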

From (9.4.3), it follows that \(x_2 = x_1 + 1\). So the Pareto-optimal solutions in the decision space are determined by the equation \(x_2 = x_1 + 1\) (1 < \(x_1\) < 4), i.e., a line segment joining the two points (1, 2) and (4, 5).

Example 9.4.2. Let

\[
\begin{aligned}
f_1(x_1, x_2) &= (x_1 - 1)^2 + (x_2 - 2)^2 + 1, \\
f_2(x_1, x_2) &= (x_1 - 4)^2 + (x_2 - 5)^2 + 4, \\
f_3(x_1, x_2) &= (x_1 - 6)^2 + (x_2 - 10)^2 + 1.
\end{aligned}
\]

A three-objective optimization model is given as

\[
\min_{(x_1, x_2)} \{ f_1(x_1, x_2), f_2(x_1, x_2), f_3(x_1, x_2) \} \quad \text{subject to} \quad x_1 \ge 0,\ x_2 \ge 0.
\]

Let \(f_1(x_1, x_2)\) be the principal objective and \(f_2(x_1, x_2)\), \(f_3(x_1, x_2)\) the constraining objectives. This model is equivalent to the following ε-constraint optimization model:

\[
\min_{(x_1, x_2) \in X} f_1(x_1, x_2) \quad \text{subject to} \quad f_2(x_1, x_2) \le \varepsilon_2,\ f_3(x_1, x_2) \le \varepsilon_3 \quad ((x_1, x_2) \in X),
\]

where \(X = \{ (x_1, x_2) \mid x_1 \ge 0,\ x_2 \ge 0 \}\). The corresponding generalized Lagrange function is given by

\[
\begin{aligned}
L(x_1, x_2; \lambda_{12}, \lambda_{13}) &= f_1(x_1, x_2) + \lambda_{12} \big( f_2(x_1, x_2) - \varepsilon_2 \big) + \lambda_{13} \big( f_3(x_1, x_2) - \varepsilon_3 \big) \\
&= (x_1 - 1)^2 + (x_2 - 2)^2 + 1 + \lambda_{12} \big( (x_1 - 4)^2 + (x_2 - 5)^2 + 4 - \varepsilon_2 \big) \\
&\quad + \lambda_{13} \big( (x_1 - 6)^2 + (x_2 - 10)^2 + 1 - \varepsilon_3 \big).
\end{aligned}
\]


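Differentiating both sides with respect to \(x_1\) and \(x_2\), respectively,

\[
\begin{aligned}
\frac{\partial L}{\partial x_1} &= 2(x_1 - 1) + 2\lambda_{12}(x_1 - 4) + 2\lambda_{13}(x_1 - 6), \\
\frac{\partial L}{\partial x_2} &= 2(x_2 - 2) + 2\lambda_{12}(x_2 - 5) + 2\lambda_{13}(x_2 - 10).
\end{aligned}
\]

Let \(\frac{\partial L}{\partial x_i} = 0\) (i = 1, 2). Then

\[
\begin{aligned}
2(x_1 - 1) + 2\lambda_{12}(x_1 - 4) + 2\lambda_{13}(x_1 - 6) &= 0, \\
2(x_2 - 2) + 2\lambda_{12}(x_2 - 5) + 2\lambda_{13}(x_2 - 10) &= 0.
\end{aligned}
\]

This implies that the trade-off functions \(\lambda_{12}\) and \(\lambda_{13}\) satisfy

\[
\lambda_{12} = \frac{-8x_1 + 5x_2 - 2}{5x_1 - 2x_2 - 10},
\qquad
\lambda_{13} = \frac{3x_1 - 3x_2 + 3}{5x_1 - 2x_2 - 10}.
\tag{9.4.4}
\]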

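Secondly, let \(f_2(x_1, x_2)\) be the principal objective and \(f_1(x_1, x_2)\), \(f_3(x_1, x_2)\) be the constraining objectives. Then the given three-objective optimization model is also equivalent to the following ε-constraint optimization model:

\[
\min_{(x_1, x_2) \in X} f_2(x_1, x_2) \quad \text{subject to} \quad f_1(x_1, x_2) \le \varepsilon_1,\ f_3(x_1, x_2) \le \varepsilon_3 \quad ((x_1, x_2) \in X).
\]

The corresponding generalized Lagrange function is given by

\[
\begin{aligned}
L(x_1, x_2; \lambda_{21}, \lambda_{23}) &= f_2(x_1, x_2) + \lambda_{21} \big( f_1(x_1, x_2) - \varepsilon_1 \big) + \lambda_{23} \big( f_3(x_1, x_2) - \varepsilon_3 \big) \\
&= (x_1 - 4)^2 + (x_2 - 5)^2 + 4 + \lambda_{21} \big( (x_1 - 1)^2 + (x_2 - 2)^2 + 1 - \varepsilon_1 \big) \\
&\quad + \lambda_{23} \big( (x_1 - 6)^2 + (x_2 - 10)^2 + 1 - \varepsilon_3 \big).
\end{aligned}
\]

Let \(\frac{\partial L}{\partial x_i} = 0\) (i = 1, 2). Then

\[
\begin{aligned}
2(x_1 - 4) + 2\lambda_{21}(x_1 - 1) + 2\lambda_{23}(x_1 - 6) &= 0, \\
2(x_2 - 5) + 2\lambda_{21}(x_2 - 2) + 2\lambda_{23}(x_2 - 10) &= 0.
\end{aligned}
\]

This implies that the trade-off functions \(\lambda_{21}\) and \(\lambda_{23}\) satisfy

\[
\lambda_{21} = \frac{5x_1 - 2x_2 - 10}{-8x_1 + 5x_2 - 2},
\qquad
\lambda_{23} = \frac{3x_1 - 3x_2 + 3}{-8x_1 + 5x_2 - 2}.
\tag{9.4.5}
\]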

Third, let \(f_3(x_1, x_2)\) be the principal objective and \(f_1(x_1, x_2)\), \(f_2(x_1, x_2)\) the constraining objectives. Then the given three-objective optimization model is also equivalent to the following ε-constraint optimization model:

\[
\min_{(x_1, x_2) \in X} f_3(x_1, x_2) \quad \text{subject to} \quad f_1(x_1, x_2) \le \varepsilon_1,\ f_2(x_1, x_2) \le \varepsilon_2 \quad ((x_1, x_2) \in X).
\]

The corresponding generalized Lagrange function is given by

\[
\begin{aligned}
L(x_1, x_2; \lambda_{31}, \lambda_{32}) &= f_3(x_1, x_2) + \lambda_{31} \big( f_1(x_1, x_2) - \varepsilon_1 \big) + \lambda_{32} \big( f_2(x_1, x_2) - \varepsilon_2 \big) \\
&= (x_1 - 6)^2 + (x_2 - 10)^2 + 1 + \lambda_{31} \big( (x_1 - 1)^2 + (x_2 - 2)^2 + 1 - \varepsilon_1 \big) \\
&\quad + \lambda_{32} \big( (x_1 - 4)^2 + (x_2 - 5)^2 + 4 - \varepsilon_2 \big).
\end{aligned}
\]

Let \(\frac{\partial L}{\partial x_i} = 0\) (i = 1, 2). Then

\[
\begin{aligned}
2(x_1 - 6) + 2\lambda_{31}(x_1 - 1) + 2\lambda_{32}(x_1 - 4) &= 0, \\
2(x_2 - 10) + 2\lambda_{31}(x_2 - 2) + 2\lambda_{32}(x_2 - 5) &= 0.
\end{aligned}
\]

This implies that the trade-off functions \(\lambda_{31}\) and \(\lambda_{32}\) satisfy

\[
\lambda_{31} = \frac{5x_1 - 2x_2 - 10}{3x_1 - 3x_2 + 3},
\qquad
\lambda_{32} = \frac{-8x_1 + 5x_2 - 2}{3x_1 - 3x_2 + 3}.
\tag{9.4.6}
\]

The combination of (9.4.4), (9.4.5), and (9.4.6) gives

\[
\lambda_{12} = \frac{1}{\lambda_{21}}, \qquad
\lambda_{13} = \frac{1}{\lambda_{31}}, \qquad
\lambda_{23} = \frac{1}{\lambda_{32}}, \qquad
\lambda_{13} = \lambda_{12}\lambda_{23}, \qquad
\lambda_{23} = \lambda_{21}\lambda_{13}.
\]

Several noninferior solutions and trade-off values of the given three-objective optimization model are listed in the following table.

\[
\begin{array}{rr|rrr|rrrrrr}
x_1 & x_2 & f_1 & f_2 & f_3 & \lambda_{12} & \lambda_{13} & \lambda_{21} & \lambda_{23} & \lambda_{31} & \lambda_{32} \\ \hline
2 & 3.2 & 3.44 & 11.24 & 63.24 & 0.31 & 0.09 & 3.2 & 0.3 & 10.67 & 3.33 \\
3 & 5 & 14 & 5 & 35 & 0.2 & 0.6 & 5 & 3 & 1.67 & 0.33 \\
4 & 6 & 26 & 5 & 21 & 2 & 1.5 & 0.5 & 0.75 & 0.67 & 1.33 \\
5 & 8 & 53 & 14 & 6 & 2 & 6 & 0.5 & 3 & 0.17 & 0.33
\end{array}
\]

From (9.4.4), (9.4.5), and (9.4.6), the Pareto-optimal solution in the decision space is a triangle determined by the system of inequalities

\[
\begin{cases}
x_2 > x_1 + 1, \\[2pt]
x_2 > \tfrac{5}{2} x_1 - 5, \\[2pt]
x_2 < \tfrac{8}{5} x_1 + \tfrac{2}{5},
\end{cases}
\]

i.e., a triangle with three vertices (1, 2), (4, 5), and (6, 10). Examples 9.4.1 and 9.4.2 have two objective functions in common, but by adding the third objective function in Example 9.4.2, a large number of Pareto-optimal solutions have been added.
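The formulas (9.4.4) and the triangle of Pareto-optimal solutions in Example 9.4.2 can be checked point by point. The Python sketch below uses illustrative function names; it evaluates \(\lambda_{12}\) and \(\lambda_{13}\) at a given point, verifies that they make both partial derivatives of the Lagrange function vanish, and tests the three inequalities.

```python
def tradeoffs(x1, x2):
    """lambda_12 and lambda_13 from (9.4.4) at the point (x1, x2)."""
    d = 5 * x1 - 2 * x2 - 10
    return (-8 * x1 + 5 * x2 - 2) / d, (3 * x1 - 3 * x2 + 3) / d

def stationarity_residuals(x1, x2):
    """Values of dL/dx1 and dL/dx2 (up to the factor 2) with (9.4.4) inserted."""
    lam12, lam13 = tradeoffs(x1, x2)
    r1 = (x1 - 1) + lam12 * (x1 - 4) + lam13 * (x1 - 6)
    r2 = (x2 - 2) + lam12 * (x2 - 5) + lam13 * (x2 - 10)
    return r1, r2

def in_triangle(x1, x2):
    """Interior of the triangle with vertices (1, 2), (4, 5), and (6, 10)."""
    return x2 > x1 + 1 and x2 > 2.5 * x1 - 5 and x2 < 1.6 * x1 + 0.4

for point in [(2, 3.2), (3, 5), (4, 6), (5, 8)]:
    lam12, lam13 = tradeoffs(*point)
    print(point, round(lam12, 2), round(lam13, 2), in_triangle(*point))

print(in_triangle(2, 2))  # a point below the segment from (1, 2) to (4, 5)
```

Inside the triangle both trade-off values are positive, while outside at least one of them changes sign or becomes undefined.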

9.5 The uncertainty sensitivity index method

Uncertainty and sensitivity are two characteristics of systems. Uncertainty is defined as the inability to determine the true state of affairs of a system. It is caused by stochastic variability and incomplete knowledge. Taylor shows that the variability uncertainty arises from temporal, spatial, and individually heterogeneous variability. Finkel shows that the knowledge uncertainty arises from model, parameter, and decision uncertainty. Sensitivity is defined as the relation between changes in the system's performance index and possible variations in decision variables, constraint levels, and uncontrolled parameters.

The Uncertainty Sensitivity Index Method (USIM) presented by Li and Haimes is an uncertainty-sensitivity analysis method. This method represents the uncertainty associated with potential variations of the system's parameters by a sensitivity index of the system's output. At the core of the USIM is a joint optimality and sensitivity multiobjective model.

Let y = h(x; α) be the system's output response, where x is a vector consisting of n decision control variables and α is a vector consisting of m uncertain random system parameters

\[
x = (x_1, \ldots, x_n), \qquad \alpha = (\alpha_1, \ldots, \alpha_m),
\]

and y ∈ ℝ is differentiable with respect to x and α. Let f(x; y; α) = f(x; h(x; α); α) be the system's objective function. Denote the nominal value of α by \(\hat\alpha = (\hat\alpha_1, \ldots, \hat\alpha_m)\). The nominal values \(\hat\alpha_i\) (i = 1, . . . , m) may be determined by any system identification procedure. Assume that the m uncertain random system parameters \(\alpha_1, \ldots, \alpha_m\) vary in the neighborhood of their nominal value \((\hat\alpha_1, \ldots, \hat\alpha_m)\). The Taylor theorem shows that

\[
h(x; \hat\alpha_1 + \Delta\hat\alpha_1, \ldots, \hat\alpha_m + \Delta\hat\alpha_m) \approx h(x; \hat\alpha_1, \ldots, \hat\alpha_m) + \sum_{i=1}^{m} \frac{\partial h(x; \hat\alpha_1, \ldots, \hat\alpha_m)}{\partial \hat\alpha_i}\, \Delta\hat\alpha_i,
\]

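where \(\Delta\hat\alpha_i\) is a small variation in the nominal value \(\hat\alpha_i\). Thus, associated with variations in the parameters, the variation of the system's output is approximately equal to

\[
\sum_{i=1}^{m} \frac{\partial h(x; \hat\alpha_1, \ldots, \hat\alpha_m)}{\partial \hat\alpha_i}\, \Delta\hat\alpha_i,
\]

and the square of the variation satisfies the Cauchy–Schwarz inequality

\[
\left( \sum_{i=1}^{m} \frac{\partial h(x; \hat\alpha_1, \ldots, \hat\alpha_m)}{\partial \hat\alpha_i}\, \Delta\hat\alpha_i \right)^{2}
\le \sum_{i=1}^{m} \left( \frac{\partial h(x; \hat\alpha_1, \ldots, \hat\alpha_m)}{\partial \hat\alpha_i} \right)^{2} \cdot \sum_{i=1}^{m} (\Delta\hat\alpha_i)^2.
\]

In order to reduce this variation, a control scenario x is chosen such that

\[
\sum_{i=1}^{m} \left( \frac{\partial h(x; \hat\alpha_1, \ldots, \hat\alpha_m)}{\partial \hat\alpha_i} \right)^{2}
\]

is minimal. Define a sensitivity index function as

\[
s(x; \hat\alpha) = \sum_{i=1}^{m} \left( \frac{\partial h(x; \hat\alpha_1, \ldots, \hat\alpha_m)}{\partial \hat\alpha_i} \right)^{2}.
\]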
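When the output response h is available only as a computer routine, the sensitivity index can be approximated by finite differences. The Python sketch below is one way to do this; the function name `sensitivity_index`, the step size, and the quadratic test response are assumptions made for illustration only.

```python
def sensitivity_index(h, x, alpha_hat, step=1e-6):
    """Approximate s(x; alpha_hat) = sum_i (dh/dalpha_i)^2 by central differences."""
    total = 0.0
    for i in range(len(alpha_hat)):
        up = list(alpha_hat)
        down = list(alpha_hat)
        up[i] += step
        down[i] -= step
        derivative = (h(x, up) - h(x, down)) / (2 * step)
        total += derivative ** 2
    return total

def h(x, alpha):
    """Hypothetical output response: (sum of squared parameters) times x."""
    return sum(a * a for a in alpha) * x

alpha_hat = [1.0, 2.0, 1.0, 2.0]  # hypothetical nominal parameter values

# For this response dh/dalpha_i = 2 * alpha_i * x, so s = 4 * 10 * x^2.
print(sensitivity_index(h, 2.0, alpha_hat))  # close to 160
```

For a smooth response the central difference has an error of order step squared, so the approximation is usually adequate for comparing candidate control scenarios x.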

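The multiobjective optimization model of the form

\[
\min_x \{ f(x; h(x; \hat\alpha); \hat\alpha),\ s(x; \hat\alpha) \}
\]

is called a joint optimality and sensitivity multiobjective model. Let \(f(x; h(x; \hat\alpha); \hat\alpha)\) be the principal objective and \(s(x; \hat\alpha)\) be the constraining objective. Using the ε-constraint method, the joint optimality and sensitivity multiobjective optimization model is equivalent to the ε-constraint optimization model of the form

\[
\min_x f(x; h(x; \hat\alpha); \hat\alpha) \quad \text{subject to} \quad s(x; \hat\alpha) \le \varepsilon.
\]

The corresponding generalized Lagrangian function is given by

\[
L(x; \hat\alpha; \lambda_{fs}) = f(x; h(x; \hat\alpha); \hat\alpha) + \lambda_{fs} \big( s(x; \hat\alpha) - \varepsilon \big),
\]

where \(\lambda_{fs}\) is the generalized Lagrange multiplier. Differentiating both sides with respect to \(x_i\),

\[
\frac{\partial L}{\partial x_i} = \frac{\partial f(x; h(x; \hat\alpha); \hat\alpha)}{\partial x_i} + \lambda_{fs} \frac{\partial s(x; \hat\alpha)}{\partial x_i} \qquad (i = 1, \ldots, n),
\tag{9.5.1}
\]

where \(x = (x_1, \ldots, x_n)\) and \(\lambda_{fs}\) is the trade-off function between the system's objective function \(f(x; h(x; \hat\alpha); \hat\alpha)\) and the sensitivity index function \(s(x; \hat\alpha)\). Let \(\frac{\partial L}{\partial x_i} = 0\) (i = 1, . . . , n). Then

\[
\frac{\partial f(x; h(x; \hat\alpha); \hat\alpha)}{\partial x_i} + \lambda_{fs} \frac{\partial s(x; \hat\alpha)}{\partial x_i} = 0 \qquad (i = 1, \ldots, n)
\]

or

\[
\lambda_{fs} = - \frac{\partial f(x; h(x; \hat\alpha); \hat\alpha) / \partial x_i}{\partial s(x; \hat\alpha) / \partial x_i} \qquad (i = 1, \ldots, n).
\]

From this and \(\lambda_{fs} > 0\), the set of noninferior solutions can be determined by the systems of inequalities

\[
\begin{cases}
\dfrac{\partial f(x; h(x; \hat\alpha); \hat\alpha)}{\partial x_i} > 0, \\[8pt]
\dfrac{\partial s(x; \hat\alpha)}{\partial x_i} < 0
\end{cases}
\qquad \text{or} \qquad
\begin{cases}
\dfrac{\partial f(x; h(x; \hat\alpha); \hat\alpha)}{\partial x_i} < 0, \\[8pt]
\dfrac{\partial s(x; \hat\alpha)}{\partial x_i} > 0.
\end{cases}
\]

Let \(x^*\) be a decision variable which minimizes the system's objective function \(f(x; h(x; \hat\alpha); \hat\alpha)\), i.e.,

\[
f(x^*; h(x^*; \hat\alpha); \hat\alpha) = \min_x f(x; h(x; \hat\alpha); \hat\alpha),
\]

and let \(\hat x\) be another decision variable which minimizes the sensitivity index function \(s(x; \hat\alpha)\), i.e.,

\[
s(\hat x; \hat\alpha) = \min_x s(x; \hat\alpha).
\]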

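The most conservative policy \(\hat x\) provides a very stable solution, whereas the conventional solution \(x^*\) suffers the highest deviation. Based on the preference of the decision maker, the best compromise solution may be selected from the noninferior solution set \(\{ \hat x \le x \le x^* \}\).

Example 9.5.1. Consider a system that has the following output response and objective function:

\[
\begin{aligned}
h(x; \alpha) &= (\alpha_1^2 + \alpha_2^2 + \alpha_3^2 + \alpha_4^2)\,x, \\
f(x; h(x; \alpha); \alpha) &= x^2 - (\alpha_1\alpha_2^2 + \alpha_1\alpha_3^2 + \alpha_1\alpha_4^2 + \alpha_2\alpha_3^2 + \alpha_2\alpha_4^2 + \alpha_3\alpha_4^2)\,x \\
&\quad + \alpha_1^2\alpha_2 + \alpha_1^2\alpha_3 + \alpha_1^2\alpha_4 + \alpha_2^2\alpha_3 + \alpha_2^2\alpha_4 + \alpha_3^2\alpha_4,
\end{aligned}
\]

where x is the one-dimensional decision variable and α is a vector consisting of four uncertain random system parameters \(\alpha_1, \alpha_2, \alpha_3, \alpha_4\). The system's sensitivity index function is

\[
s(x; \alpha) = \sum_{i=1}^{4} \left( \frac{\partial h(x; \alpha_1, \alpha_2, \alpha_3, \alpha_4)}{\partial \alpha_i} \right)^{2} = 4\,(\alpha_1^2 + \alpha_2^2 + \alpha_3^2 + \alpha_4^2)\,x^2.
\]

Assume that the nominal values \(\hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4\) of the system's parameters \(\alpha_1, \alpha_2, \alpha_3, \alpha_4\) are, respectively,

\[
\hat\alpha_1 = 1, \qquad \hat\alpha_2 = 2, \qquad \hat\alpha_3 = 1, \qquad \hat\alpha_4 = 2.
\]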


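Then

\[
\begin{aligned}
h(x; \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4) &= h(x; 1, 2, 1, 2) = 10x, \\
f(x; h(x; \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4); \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4) &= f(x; h(x; 1, 2, 1, 2); 1, 2, 1, 2) = x^2 - 23x + 19, \\
s(x; \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4) &= s(x; 1, 2, 1, 2) = 40x^2.
\end{aligned}
\tag{9.5.2}
\]

When the nominal values \(\hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4\) are perturbed by \(\Delta\hat\alpha_1, \Delta\hat\alpha_2, \Delta\hat\alpha_3, \Delta\hat\alpha_4\) and \(\Delta\hat\alpha_1 = \Delta\hat\alpha_2 = \Delta\hat\alpha_3 = \Delta\hat\alpha_4 = 0.1\), the variation of the system's output response is

\[
\Delta h = \left( \sum_{i=1}^{4} (\hat\alpha_i + \Delta\hat\alpha_i)^2 \right) x - \left( \sum_{i=1}^{4} \hat\alpha_i^2 \right) x = \sum_{i=1}^{4} \big( 2\hat\alpha_i \Delta\hat\alpha_i + (\Delta\hat\alpha_i)^2 \big)\, x = 1.24\,x.
\tag{9.5.3}
\]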

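The joint optimality and sensitivity multiobjective model has the form

\[
\min_x
\begin{cases}
f(x; h(x; \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4); \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4) = x^2 - 23x + 19, \\
s(x; \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4) = 40x^2.
\end{cases}
\]

Using the ε-constraint method, this model is equivalent to the following ε-constraint optimization model:

\[
\min_x \{ f(x; h(x; \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4); \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4) = x^2 - 23x + 19 \}
\]

subject to the ε-constraint:

\[
s(x; \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4) = 40x^2 \le \varepsilon.
\]

The corresponding generalized Lagrangian function is given by

\[
\begin{aligned}
L(x, \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4, \lambda_{fs}) &= f(x; h(x; \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4); \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4) + \lambda_{fs} \big( s(x; \hat\alpha_1, \hat\alpha_2, \hat\alpha_3, \hat\alpha_4) - \varepsilon \big) \\
&= (x^2 - 23x + 19) + \lambda_{fs} (40x^2 - \varepsilon),
\end{aligned}
\]

where \(\lambda_{fs}\) is the trade-off between the system's objective function and the system's sensitivity index function. Let \(\frac{\partial L}{\partial x} = 0\). Then \(2x - 23 + 80\lambda_{fs}\,x = 0\), i.e.,

\[
\lambda_{fs} = -\frac{2x - 23}{80x}.
\]

From this and \(\lambda_{fs} > 0\), it follows that

\[
\begin{cases} 2x - 23 > 0, \\ 80x < 0 \end{cases}
\qquad \text{or} \qquad
\begin{cases} 2x - 23 < 0, \\ 80x > 0. \end{cases}
\]

The first system has no solution and the second gives 0 < x < 11.5. Including the end points, the set of noninferior solutions is 0 ≤ x ≤ 11.5.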


By (9.5.2) and (9.5.3), a sample of noninferior solutions of the joint optimality and sensitivity model is listed in the following table, together with the corresponding values of the system's objective function $f$, the system's sensitivity index function $s$, the trade-off $\lambda_{fs}$, the system's output response $h$, and its variation $\Delta h$.

| $x$ | $f$ | $s$ | $\lambda_{fs}$ | $h$ | $\Delta h$ |
|---|---|---|---|---|---|
| 0 | 19 | 0 | ∞ | 0 | 0 |
| 4 | −57 | 640 | 0.0469 | 40 | 4.96 |
| 8 | −101 | 2560 | 0.0109 | 80 | 9.92 |
| 10 | −111 | 4000 | 0.00375 | 100 | 12.4 |
| 11.5 | −113.25 | 5290 | 0 | 115 | 14.26 |

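Let $x^*$ be a decision variable which minimizes $f(x;h(x;\hat{\alpha}_1,\hat{\alpha}_2,\hat{\alpha}_3,\hat{\alpha}_4);\hat{\alpha}_1,\hat{\alpha}_2,\hat{\alpha}_3,\hat{\alpha}_4)$, and let $\hat{x}$ be another decision variable which minimizes $s(x;\hat{\alpha}_1,\hat{\alpha}_2,\hat{\alpha}_3,\hat{\alpha}_4)$, i.e.,
$$\begin{aligned}
f(x^*;h(x^*;\hat{\alpha}_1,\hat{\alpha}_2,\hat{\alpha}_3,\hat{\alpha}_4);\hat{\alpha}_1,\hat{\alpha}_2,\hat{\alpha}_3,\hat{\alpha}_4) &= \min_{x} f(x;h(x;\hat{\alpha}_1,\hat{\alpha}_2,\hat{\alpha}_3,\hat{\alpha}_4);\hat{\alpha}_1,\hat{\alpha}_2,\hat{\alpha}_3,\hat{\alpha}_4),\\
s(\hat{x};\hat{\alpha}_1,\hat{\alpha}_2,\hat{\alpha}_3,\hat{\alpha}_4) &= \min_{x} s(x;\hat{\alpha}_1,\hat{\alpha}_2,\hat{\alpha}_3,\hat{\alpha}_4).
\end{aligned} \tag{9.5.4}$$
From (9.5.2), it follows that
$$f(x;h(x;\hat{\alpha}_1,\hat{\alpha}_2,\hat{\alpha}_3,\hat{\alpha}_4);\hat{\alpha}_1,\hat{\alpha}_2,\hat{\alpha}_3,\hat{\alpha}_4) = x^2-23x+19 = (x-11.5)^2 - 113.25,\qquad s(x;\hat{\alpha}_1,\hat{\alpha}_2,\hat{\alpha}_3,\hat{\alpha}_4) = 40x^2.$$
The combination of this with (9.5.4) gives $x^* = 11.5$ and $\hat{x} = 0$. Thus, the best compromise solution can be selected from the noninferior solution set $\{\,0 \le x \le 11.5\,\}$.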
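The quantities of Example 9.5.1 are simple enough to tabulate directly. The following is a minimal Python sketch, not part of the original text; the function names are illustrative, and the formulas are the nominal-value expressions (9.5.2), the variation (9.5.3), and the trade-off $\lambda_{fs} = -(2x-23)/(80x)$.

```python
# Nominal parameter values and the common perturbation from Example 9.5.1.
ALPHA_HAT = [1.0, 2.0, 1.0, 2.0]
DELTA = 0.1


def objective(x):
    """f at the nominal parameter values, see (9.5.2)."""
    return x**2 - 23 * x + 19


def sensitivity(x):
    """Sensitivity index s at the nominal parameter values, see (9.5.2)."""
    return 40 * x**2


def trade_off(x):
    """Trade-off lambda_fs = -(2x - 23) / (80x); unbounded at x = 0."""
    if x == 0:
        return float("inf")
    return -(2 * x - 23) / (80 * x)


def response(x):
    """Output response h = (sum of squared nominal parameters) * x."""
    return sum(a**2 for a in ALPHA_HAT) * x


def response_variation(x):
    """Variation of h when every parameter is perturbed by DELTA, see (9.5.3)."""
    return sum((a + DELTA) ** 2 - a**2 for a in ALPHA_HAT) * x


def noninferior_table(xs):
    """One row (x, f, s, lambda_fs, h, delta_h) per decision value."""
    return [
        (x, objective(x), sensitivity(x), trade_off(x),
         response(x), response_variation(x))
        for x in xs
    ]


rows = noninferior_table([0, 4, 8, 10, 11.5])
for row in rows:
    print(row)
```

Evaluating the rows at other points of $[0, 11.5]$ shows the same pattern: moving toward $x^* = 11.5$ lowers $f$ but raises both $s$ and $\Delta h$.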

9.6 The partitioned multiobjective risk method

The Partitioned Multiobjective Risk (PMR) method is a risk analysis method. In the PMR method, the concept of the traditional expected value of damage is extended to generate a number of conditional expected value functions (or conditional expected risk functions), each associated with a particular range of exceedance probabilities (or the corresponding range of damage severities). The resulting conditional expected risk functions, together with the traditional expected value of damage, provide a family of risk measures associated with a particular policy.

Let the damage severity associated with a particular policy $s_j$ $(j = 1, \dots, m)$ be represented by the continuous random variable $X_j$ $(j = 1, \dots, m)$, and let $P_j(x)$ and $p_j(x)$ denote the cdf and the pdf of damage, respectively. Let $1-\alpha_i$ $(i = 1, \dots, n)$ denote $n$ exceedance probabilities, where $0 < \alpha_1 < \dots < \alpha_n < 1$. For the policy $s_j$, there is a unique damage $\beta_i^j$ $(i = 1, \dots, n)$ on the damage axis that corresponds to the exceedance probability $1-\alpha_i$ $(i = 1, \dots, n)$ on the probability axis. Thus, the partition of the probability axis
$$[0, 1-\alpha_n],\quad [1-\alpha_n, 1-\alpha_{n-1}],\quad \dots,\quad [1-\alpha_1, 1]$$
corresponds to the partition of the damage axis
$$[\beta_0^j, \beta_1^j],\quad [\beta_1^j, \beta_2^j],\quad \dots,\quad [\beta_n^j, \beta_{n+1}^j],$$
where $\beta_0^j$ and $\beta_{n+1}^j$ are the lower bound and upper bound of damage, respectively. The policies $s_j$, the exceedance probabilities $1-\alpha_i$, and the bounds $\beta_i^j$ of damage ranges satisfy the relationship
$$P_j(\beta_1^j) = 1-\alpha_1,\quad \dots,\quad P_j(\beta_n^j) = 1-\alpha_n.$$
If the inverse $P_j^{-1}(x)$ exists, then
$$P_j^{-1}(1-\alpha_1) = \beta_1^j,\quad \dots,\quad P_j^{-1}(1-\alpha_n) = \beta_n^j.$$

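The conditional expected risk functions $f_i(s_j)$ $(i = 2, \dots, n+2)$ of the damage are given by
$$\begin{aligned}
f_2(s_j) &= E[\,X_j \mid p_j(x),\ x\in[\beta_0^j,\beta_1^j]\,] = \frac{\int_{\beta_0^j}^{\beta_1^j} x\,p_j(x)\,dx}{\int_{\beta_0^j}^{\beta_1^j} p_j(x)\,dx},\\
f_3(s_j) &= E[\,X_j \mid p_j(x),\ x\in[\beta_1^j,\beta_2^j]\,] = \frac{\int_{\beta_1^j}^{\beta_2^j} x\,p_j(x)\,dx}{\int_{\beta_1^j}^{\beta_2^j} p_j(x)\,dx},\\
&\ \,\vdots\\
f_{n+2}(s_j) &= E[\,X_j \mid p_j(x),\ x\in[\beta_n^j,\beta_{n+1}^j]\,] = \frac{\int_{\beta_n^j}^{\beta_{n+1}^j} x\,p_j(x)\,dx}{\int_{\beta_n^j}^{\beta_{n+1}^j} p_j(x)\,dx}.
\end{aligned}$$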

The unconditional expected risk function $f_{n+3}(s_j)$ of the damage is given by
$$f_{n+3}(s_j) = E[X_j] = \frac{\int_{\beta_0^j}^{\beta_{n+1}^j} x\,p_j(x)\,dx}{\int_{\beta_0^j}^{\beta_{n+1}^j} p_j(x)\,dx}.$$
Since the total probability of the damage is equal to 1, i.e.,
$$\int_{\beta_0^j}^{\beta_{n+1}^j} p_j(x)\,dx = 1,$$
the unconditional expected risk function of the damage is
$$f_{n+3}(s_j) = \int_{\beta_0^j}^{\beta_{n+1}^j} x\,p_j(x)\,dx.$$

Denote the denominators of the conditional expected risk functions $f_i$ by $q_i$ $(i = 2, \dots, n+2)$, respectively, i.e.,
$$q_2 = \int_{\beta_0^j}^{\beta_1^j} p_j(x)\,dx,\qquad q_3 = \int_{\beta_1^j}^{\beta_2^j} p_j(x)\,dx,\qquad \dots,\qquad q_{n+2} = \int_{\beta_n^j}^{\beta_{n+1}^j} p_j(x)\,dx,$$
where $q_i > 0$ $(i = 2, \dots, n+2)$ and $q_2 + \dots + q_{n+2} = 1$. So the conditional expected risk functions $f_i$ $(i = 2, \dots, n+2)$ and the unconditional expected risk function $f_{n+3}$ satisfy the balance relationship
$$f_{n+3}(s_j) = q_2 f_2(s_j) + q_3 f_3(s_j) + \dots + q_{n+2} f_{n+2}(s_j) = \sum_{i=2}^{n+2} q_i f_i(s_j).$$

Let $f_1(s_j)$ denote the cost objective function. Combining any one of the conditional expected risk functions or the unconditional expected risk function with the cost objective function constitutes a set of biobjective optimization models, i.e.,
$$\begin{cases}
\min\{\,f_1(s_j),\ f_2(s_j)\,\},\\
\quad\vdots\\
\min\{\,f_1(s_j),\ f_{n+2}(s_j)\,\},\\
\min\{\,f_1(s_j),\ f_{n+3}(s_j)\,\}.
\end{cases}$$
These $n+2$ biobjective optimization models can offer more information about the probabilistic behavior than the single biobjective optimization model $\min\{\,f_1(s_j),\ f_{n+3}(s_j)\,\}$. Let $f_1(s_j)$ be the principal objective and $f_i(s_j)$ $(i = 2, \dots, n+3)$ the constraining objective. The above $n+2$ biobjective optimization models can be transformed into $n+2$ equivalent ε-constraint optimization models, and the trade-offs $\lambda_{1i}$ $(i = 2, \dots, n+3)$ between the cost function $f_1$ and the risk functions $f_i$ $(i = 2, \dots, n+3)$ satisfy
$$\frac{1}{\lambda_{1,n+3}} = \frac{q_2}{\lambda_{12}} + \frac{q_3}{\lambda_{13}} + \dots + \frac{q_{n+2}}{\lambda_{1,n+2}} = \sum_{i=2}^{n+2}\frac{q_i}{\lambda_{1i}},$$
where $\lambda_{1i} = -\partial f_1/\partial f_i$ $(i = 2, \dots, n+3)$ and $q_i$ $(i = 2, \dots, n+2)$ are stated as above.
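The balance relationship between the conditional and unconditional expected risk functions can be checked numerically. The sketch below is an illustration under stated assumptions and does not come from the original text: the damage density is a hypothetical triangular pdf $p(x) = x/50$ on $[0, 10]$, chosen only because its integrals have closed forms, and the partition points 4 and 8 are arbitrary.

```python
# Hypothetical damage density p(x) = x / 50 on [0, 10] (an assumption for
# illustration only). Its antiderivatives are known in closed form:
#   integral of p(x)     = x**2 / 100
#   integral of x * p(x) = x**3 / 150


def probability_mass(a, b):
    """q = integral of p over [a, b]."""
    return (b**2 - a**2) / 100


def conditional_expected_risk(a, b):
    """f = (integral of x p(x) over [a, b]) / (integral of p(x) over [a, b])."""
    return ((b**3 - a**3) / 150) / probability_mass(a, b)


# Partition of the damage axis: beta_0 = 0, two arbitrary interior points,
# and beta_{n+1} = 10.
betas = [0.0, 4.0, 8.0, 10.0]
intervals = list(zip(betas[:-1], betas[1:]))

q = [probability_mass(a, b) for a, b in intervals]
f = [conditional_expected_risk(a, b) for a, b in intervals]

# Unconditional expected damage over the whole range.
unconditional = (betas[-1] ** 3 - betas[0] ** 3) / 150

# Balance relationship: the q-weighted sum of the conditional risks.
weighted_sum = sum(qi * fi for qi, fi in zip(q, f))

print(q, f, unconditional, weighted_sum)
```

The weights $q_i$ sum to 1, and the weighted sum reproduces the unconditional expected damage, while the last conditional value is noticeably larger than the mean, which is the extra information the partitioned risk functions carry.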

9.7 The multiobjective multistage impact analysis method

A general definition of impact is the effect of one thing upon another. For impact analysis in a multiobjective framework, Gomide developed a theoretical basis. He formulated a multiobjective multistage optimization model and presented the Multiobjective Multistage Impact Analysis (MMIA) method.

Let $g : \mathbb{R}^n \times \mathbb{R}^r \to \mathbb{R}^n$ be a continuously differentiable function. Assume that the evolution of a system in stages can be represented by a multistage process
$$x(0) = x_0,\qquad x(k+1) = g(x(k), u(k); k)\quad (k = 0, \dots, T-1),$$
where $x(k) \in \mathbb{R}^n$ is the system's state at stage $k$, $u(k) \in \mathbb{R}^r$ is the decision at stage $k$, and $T$ is the horizon of interest to the system's decision maker. Denote by $F$ the universe of objectives of interest to the system,
$$F = \{\, f_i^k(x(k), u(k); k),\quad i = 1, 2, \dots, N_k;\ k = 0, \dots, T-1 \,\},$$
where $N_k$ is the number of objectives considered as important for stage $k$, and $f_i^k$ are continuously differentiable functions. Assume further that $\Omega_k$ $(k = 0, \dots, T)$ are specified by a system of inequality constraints
$$\Omega_k = \{\, (x(k), u(k)) : h(x(k), u(k); k) \le 0 \,\}\quad (k = 0, \dots, T-1),\qquad \Omega_T = \{\, x(T) : h(x(T); T) \le 0 \,\},$$
where $h(x(k), u(k); k) : \mathbb{R}^n \times \mathbb{R}^r \to \mathbb{R}^{m_k}$ $(k = 0, \dots, T-1)$ and $h(x(T); T) : \mathbb{R}^n \to \mathbb{R}^{m_T}$ are both continuously differentiable functions. The Multiobjective Multistage Impact Analysis (MMIA) model has the form

$$\min_{u}\begin{Bmatrix} f_1^0(x, u; 0)\\ \vdots\\ f_{N_0}^0(x, u; 0) \end{Bmatrix},\quad \dots,\quad \min_{u}\begin{Bmatrix} f_1^{T-1}(x, u; T-1)\\ \vdots\\ f_{N_{T-1}}^{T-1}(x, u; T-1) \end{Bmatrix}$$
subject to
$$\begin{aligned}
&x(k+1) = g(x(k), u(k); k)\quad (k = 0, \dots, T-1),\qquad x(0) = x_0,\\
&(x(k), u(k)) \in \Omega_k \subset \mathbb{R}^n \times \mathbb{R}^r\quad (k = 0, \dots, T-1),\\
&x(T) \in \Omega_T \subset \mathbb{R}^n.
\end{aligned}$$

From this, it is seen that the MMIA model amounts to solving a sequence of static (single-stage) multiobjective optimization models in which decisions made at stage $k$ affect stages $k+1, k+2, \dots, T-1$. Gomide and Haimes characterized noninferior policy decisions for the MMIA model. To obtain noninferior solutions, the MMIA model is converted into a series of single-objective models using the ε-constraint method as follows. Let
$$X = (x(0), \dots, x(T)),\qquad U = (u(0), \dots, u(T-1)),$$
$$\mathbb{K} = \{\,0, \dots, T-1\,\},\qquad \mathbb{N}_k = \{\,1, \dots, N_k\,\}\quad (k = 0, \dots, T-1),$$
and let $\Omega = \bigcup_{k=0}^{T}\Omega_k$. Using the ε-constraint method, the MMIA model is represented as
$$\min_{U} f_s^t(x(t), u(t); t)$$
subject to
$$(X, U) \in \big(\Omega \cap (\Gamma_s^t(\varepsilon) \cup \Gamma_{s,t}^{\tau}(\varepsilon)) \cap V\big),$$
where
$$\begin{aligned}
\Gamma_s^t(\varepsilon) &= \{\, (X, U) : f_i^t(x(t), u(t); t) \le \varepsilon_i^t \ \text{ for all } i \in \mathbb{N}_t\ (i \ne s),\ t \in \mathbb{K} \,\},\\
\Gamma_{s,t}^{\tau}(\varepsilon) &= \{\, (X, U) : f_i^{\tau}(x(\tau), u(\tau); \tau) \le \varepsilon_i^{\tau} \ \text{ for all } i \in \mathbb{N}_{\tau},\ \text{for all } \tau \in \mathbb{K}\ (\tau \ne t) \,\},\\
V &= \{\, (X, U) : x(k+1) = g(x(k), u(k); k)\ (k = 0, \dots, T-1),\ x(0) = x_0 \,\},
\end{aligned}$$
and $\varepsilon \in Y_s$ with
$$\varepsilon = \{\, \varepsilon_1^0, \dots, \varepsilon_{N_0}^0,\ \dots,\ \varepsilon_1^{t-1}, \dots, \varepsilon_{N_{t-1}}^{t-1},\ \varepsilon_1^t, \dots, \varepsilon_{s-1}^t, \varepsilon_{s+1}^t, \dots, \varepsilon_{N_t}^t,\ \varepsilon_1^{t+1}, \dots, \varepsilon_{N_{t+1}}^{t+1},\ \dots,\ \varepsilon_1^{T-1}, \dots, \varepsilon_{N_{T-1}}^{T-1} \,\},$$
$$Y_s = \{\, \varepsilon : (\Gamma_s^t(\varepsilon) \cup \Gamma_{s,t}^{\tau}(\varepsilon)) \ne \emptyset \,\}.$$
Gomide and Haimes proved that if a policy decision is the unique solution of the ε-constraint problem, then it is a noninferior policy decision. Conversely, any noninferior policy decision solves the ε-constraint problem.

9.8 Multiobjective risk impact analysis method

The PMR method is a risk analysis method. The MMIA method is an impact analysis method. The Multiobjective Risk Impact Analysis (MRIA) method is a combination of the MMIA method with the PMR method.

Let $x(k)$ and $y(k)$ be two normally distributed random variables. Assume that $x(k)$ represents the system's state and $y(k)$ represents the system's output, both at stage $k$, and
$$\begin{aligned}
x(k+1) &= Ax(k) + Bu(k) + w(k),\qquad x(0) = x_0,\\
y(k) &= Cx(k) + \upsilon(k)\qquad (k = 0, \dots, T-1),
\end{aligned} \tag{9.8.1}$$
where $A$, $B$, $C$ are system parameters, $u(k)$ is the decision, and $w(k)$, $\upsilon(k)$ are two normally distributed, purely random sequences. Assume further that the system satisfies the following statistical properties: for $0 \le k, l \le T-1$,
$$\begin{cases}
E[x(0)] = x_0,\\
E[(x(0)-x_0)^2] = X_0,\\
E[(x(0)-x_0)\,\upsilon(k)] = 0,\\
E[(x(0)-x_0)\,w(k)] = 0
\end{cases}
\qquad\text{and}\qquad
\begin{cases}
E[\upsilon(k)] = 0,\\
E[w(k)] = 0,\\
E[\upsilon^2(k)] = Q(k) = Q,\\
E[w^2(k)] = P(k) = P,\\
E[\upsilon(k)\upsilon(l)] = 0 \quad (k \ne l),\\
E[w(k)w(l)] = 0 \quad (k \ne l),\\
E[w(k)\upsilon(l)] = 0.
\end{cases}$$

Proposition 9.8.1. Under the above assumption, the mean of $y(k)$ satisfies
$$E[y(k)] = CA^k x_0 + \sum_{i=0}^{k-1} CA^i B\,u(k-1-i).$$

Proof. Note that $E[w(k)] = 0$ and $E[\upsilon(k)] = 0$. Since $u(k)$ is constant, it follows by (9.8.1) that
$$\begin{aligned}
E[x(k+1)] &= E[Ax(k) + Bu(k) + w(k)] = AE[x(k)] + BE[u(k)] + E[w(k)] = AE[x(k)] + Bu(k),\\
E[y(k)] &= E[Cx(k) + \upsilon(k)] = CE[x(k)] + E[\upsilon(k)] = CE[x(k)].
\end{aligned} \tag{9.8.2}$$
Now we prove by induction that the mean of $x(k)$ satisfies
$$E[x(k)] = A^k x_0 + \sum_{i=0}^{k-1} A^i B\,u(k-1-i). \tag{9.8.3}$$
For $k = 0$, the sum is empty and (9.8.3) reduces to $E[x(0)] = x_0$, which holds by assumption. Moreover, by (9.8.2),
$$E[x(1)] = AE[x(0)] + Bu(0) = Ax_0 + Bu(0),$$
so (9.8.3) also holds for $k = 1$. Assume for $k$ that
$$E[x(k)] = A^k x_0 + \sum_{i=0}^{k-1} A^i B\,u(k-1-i). \tag{9.8.4}$$
We prove that (9.8.3) holds for $k+1$. By (9.8.2) and (9.8.4), it follows that
$$\begin{aligned}
E[x(k+1)] &= AE[x(k)] + Bu(k)\\
&= A\Big(A^k x_0 + \sum_{i=0}^{k-1} A^i B\,u(k-1-i)\Big) + Bu(k)\\
&= A^{k+1} x_0 + \sum_{i=0}^{k-1} A^{i+1} B\,u(k-1-i) + Bu(k).
\end{aligned}$$
Let $j = 1 + i$. Note that $Bu(k) = A^0 Bu(k-0)$ since $A^0 = 1$ and $u(k) = u(k-0)$. Then
$$E[x(k+1)] = A^{k+1} x_0 + \sum_{j=1}^{k} A^j B\,u(k-j) + Bu(k) = A^{k+1} x_0 + \sum_{j=0}^{k} A^j B\,u(k-j),$$
i.e., (9.8.3) holds for $k+1$. Finally, by (9.8.2) and (9.8.3), a direct computation gives
$$E[y(k)] = CE[x(k)] = CA^k x_0 + \sum_{i=0}^{k-1} CA^i B\,u(k-1-i).$$

Proposition 9.8.2. Under the above assumption, the variance of $y(k)$ satisfies
$$\operatorname{Var}(y(k)) = C^2 A^{2k} X_0 + \sum_{i=0}^{k-1} C^2 A^{2i} P + Q.$$

Proof. It is clear by (9.8.1) that $\operatorname{Var}(y(k)) = \operatorname{Var}(Cx(k)) + \operatorname{Var}(\upsilon(k))$. A direct computation shows that
$$\operatorname{Var}(Cx(k)) = E[C^2 x^2(k)] - (E[Cx(k)])^2 = C^2\big(E[x^2(k)] - (E[x(k)])^2\big) = C^2 \operatorname{Var}(x(k)).$$
By $E[\upsilon^2(k)] = Q$ and $E[\upsilon(k)] = 0$, it follows that
$$\operatorname{Var}(\upsilon(k)) = E[\upsilon^2(k)] - (E[\upsilon(k)])^2 = Q.$$
Thus,
$$\operatorname{Var}(y(k)) = C^2 \operatorname{Var}(x(k)) + Q. \tag{9.8.5}$$
Similarly, by (9.8.1), it is clear that
$$\operatorname{Var}(x(k+1)) = \operatorname{Var}(Ax(k)) + \operatorname{Var}(Bu(k)) + \operatorname{Var}(w(k)).$$
A direct computation shows that
$$\operatorname{Var}(Ax(k)) = E[A^2 x^2(k)] - (E[Ax(k)])^2 = A^2\big(E[x^2(k)] - (E[x(k)])^2\big) = A^2 \operatorname{Var}(x(k)).$$
Since $B$ and $u(k)$ are constants,
$$\operatorname{Var}(Bu(k)) = E[B^2 u^2(k)] - (E[Bu(k)])^2 = B^2 u^2(k) - (Bu(k))^2 = 0.$$
By $E[w(k)] = 0$ and $E[w^2(k)] = P$, it follows that
$$\operatorname{Var}(w(k)) = E[w^2(k)] - (E[w(k)])^2 = P.$$
Thus,
$$\operatorname{Var}(x(k+1)) = A^2 \operatorname{Var}(x(k)) + P. \tag{9.8.6}$$
Now we prove by induction that the variance of $x(k)$ satisfies
$$\operatorname{Var}(x(k)) = A^{2k} X_0 + \sum_{i=0}^{k-1} A^{2i} P. \tag{9.8.7}$$
By $E[(x(0)-x_0)^2] = X_0$, $E[x(0)] = x_0$, and $E[x_0^2] = x_0^2$, it follows that
$$E[x^2(0)] = E[(x(0)-x_0)^2 + 2x(0)x_0 - x_0^2] = E[(x(0)-x_0)^2] + 2x_0E[x(0)] - E[x_0^2] = X_0 + x_0^2,\qquad (E[x(0)])^2 = x_0^2,$$
so $\operatorname{Var}(x(0)) = X_0$ and (9.8.7) holds for $k = 0$. Moreover, from (9.8.6),
$$\operatorname{Var}(x(1)) = A^2 \operatorname{Var}(x(0)) + P = A^2(X_0 + x_0^2 - x_0^2) + P = A^2 X_0 + P,$$
so (9.8.7) also holds for $k = 1$. Assume for $k$ that
$$\operatorname{Var}(x(k)) = A^{2k} X_0 + \sum_{i=0}^{k-1} A^{2i} P. \tag{9.8.8}$$
We prove that (9.8.7) holds for $k+1$. By (9.8.6) and (9.8.8), it follows that
$$\begin{aligned}
\operatorname{Var}(x(k+1)) &= A^2 \operatorname{Var}(x(k)) + P\\
&= A^2\Big(A^{2k} X_0 + \sum_{i=0}^{k-1} A^{2i} P\Big) + P\\
&= A^{2(k+1)} X_0 + \sum_{i=0}^{k-1} A^{2(i+1)} P + P.
\end{aligned}$$
Let $j = 1 + i$. Then
$$\operatorname{Var}(x(k+1)) = A^{2(k+1)} X_0 + \sum_{j=1}^{k} A^{2j} P + P = A^{2(k+1)} X_0 + \sum_{j=0}^{k} A^{2j} P,$$
i.e., (9.8.7) holds for $k+1$. Finally, by (9.8.5) and (9.8.7), a direct computation shows that
$$\operatorname{Var}(y(k)) = C^2 A^{2k} X_0 + \sum_{i=0}^{k-1} C^2 A^{2i} P + Q.$$
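The two propositions can be sanity-checked by comparing the closed-form expressions with a stage-by-stage propagation of the recursions (9.8.2) and (9.8.6). The following Python sketch is illustrative only; the function names and the parameter values are assumptions chosen for the check, not data from the text.

```python
def propagate_moments(A, B, C, x0, X0, P, Q, u):
    """Return [(E[y(k)], Var(y(k)))] for k = 0..len(u), using the recursions
    E[x(k+1)] = A E[x(k)] + B u(k) and Var(x(k+1)) = A^2 Var(x(k)) + P."""
    mean_x, var_x = x0, X0
    moments = [(C * mean_x, C**2 * var_x + Q)]
    for u_k in u:
        mean_x = A * mean_x + B * u_k
        var_x = A**2 * var_x + P
        moments.append((C * mean_x, C**2 * var_x + Q))
    return moments


def closed_form_moments(A, B, C, x0, X0, P, Q, u, k):
    """Closed forms of Propositions 9.8.1 and 9.8.2 at stage k."""
    mean_y = C * A**k * x0 + sum(C * A**i * B * u[k - 1 - i] for i in range(k))
    var_y = C**2 * A ** (2 * k) * X0 + sum(C**2 * A ** (2 * i) * P for i in range(k)) + Q
    return mean_y, var_y


# Arbitrary scalar parameters and decisions, assumed for illustration.
params = dict(A=0.9, B=0.5, C=2.0, x0=10.0, X0=4.0, P=1.5, Q=0.25)
decisions = [3.0, -1.0, 2.0, 0.5]

recursive = propagate_moments(u=decisions, **params)
closed = [closed_form_moments(u=decisions, k=k, **params)
          for k in range(len(decisions) + 1)]

for k, (rec, cf) in enumerate(zip(recursive, closed)):
    print(k, rec, cf)
```

The two columns agree up to floating-point rounding at every stage, and the variance column does not change when the decisions are altered, in line with the observation that $\operatorname{Var}(y(k))$ depends only on $k$.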

From Propositions 9.8.1 and 9.8.2, it is seen that $E[y(k)]$ depends on $k$ and $u$, while $\operatorname{Var}(y(k))$ depends only on $k$. Thus, $E[y(k)]$ and $\operatorname{Var}(y(k))$ may be denoted by $\mu_y(k,u)$ and $\sigma_y^2(k)$, respectively, i.e.,
$$\mu_y(k,u) := E[y(k)],\qquad \sigma_y^2(k) := \operatorname{Var}(y(k)).$$
Let $y(k)$ be the damage and let its pdf $p(y)$ be the normal density
$$p(y) := p(y(k)) = \frac{1}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right),$$
where $\sigma_y(k)$ is the positive square root of $\operatorname{Var}(y(k))$. Correspondingly,
– $f_1^k(u)$ represents the cost objective function of $y(k)$ at stage $k$;
– $f_j^k(u)$ $(j = 2, \dots, N_k+2)$ represents the $j$-th conditional expected risk function of $y(k)$ at stage $k$ and
$$f_j^k(u) = \frac{\int_{\beta_{j-2}^k}^{\beta_{j-1}^k} y(k)\,p(y(k))\,dy(k)}{\int_{\beta_{j-2}^k}^{\beta_{j-1}^k} p(y(k))\,dy(k)} = \frac{\int_{\beta_{j-2}^k}^{\beta_{j-1}^k} \frac{y(k)}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right)dy(k)}{\int_{\beta_{j-2}^k}^{\beta_{j-1}^k} \frac{1}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right)dy(k)}; \tag{9.8.9}$$
– $f_{N_k+3}^k(u)$ represents the unconditional expected risk function of $y(k)$ at stage $k$ and
$$f_{N_k+3}^k(u) = \frac{\int_{\beta_0^k}^{\beta_{N_k+1}^k} y(k)\,p(y(k))\,dy(k)}{\int_{\beta_0^k}^{\beta_{N_k+1}^k} p(y(k))\,dy(k)} = \frac{\int_{\beta_0^k}^{\beta_{N_k+1}^k} \frac{y(k)}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right)dy(k)}{\int_{\beta_0^k}^{\beta_{N_k+1}^k} \frac{1}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right)dy(k)}, \tag{9.8.10}$$
where $N_k$ is the number of partitions for the probability exceedance axis at stage $k$, $\beta_0^k = -\infty$ and $\beta_{N_k+1}^k = \infty$ are the lower and upper bounds of the damage at stage $k$, respectively, and $\beta_j^k$ $(j = 1, \dots, N_k)$ are the partition points of the damage at stage $k$. Denote the denominator of (9.8.9) by $q_j^k$, i.e.,
$$q_j^k = \int_{\beta_{j-2}^k}^{\beta_{j-1}^k} \frac{1}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right)dy(k)\qquad (j = 2, \dots, N_k+2).$$

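By (9.8.9) and (9.8.10), it follows that
$$f_{N_k+3}^k(u) = q_2^k f_2^k(u) + \dots + q_{N_k+2}^k f_{N_k+2}^k(u) = \sum_{j=2}^{N_k+2} q_j^k f_j^k(u),$$
where $q_j^k > 0$ and $q_2^k + \dots + q_{N_k+2}^k = \sum_{j=2}^{N_k+2} q_j^k = 1$.

The $f_j^k(u)$ $(j = 2, \dots, N_k+2)$ is computed as follows. Note that $y(k) = \mu_y(k,u) + (y(k) - \mu_y(k,u))$. The numerator of (9.8.9) becomes
$$\begin{aligned}
\int_{\beta_{j-2}^k}^{\beta_{j-1}^k} \frac{y(k)}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right)dy(k)
&= \int_{\beta_{j-2}^k}^{\beta_{j-1}^k} \frac{\mu_y(k,u)}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right)dy(k)\\
&\quad + \int_{\beta_{j-2}^k}^{\beta_{j-1}^k} \frac{y(k)-\mu_y(k,u)}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right)dy(k).
\end{aligned}$$
Let $\tau = \dfrac{y(k)-\mu_y(k,u)}{\sigma_y(k)}$ and $d\tau = \dfrac{dy(k)}{\sigma_y(k)}$. Then
$$\int_{\beta_{j-2}^k}^{\beta_{j-1}^k} \frac{\mu_y(k,u)}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right)dy(k) = \mu_y(k,u)\int_{\alpha_{j-2}^k}^{\alpha_{j-1}^k} \frac{1}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau,$$
$$\int_{\beta_{j-2}^k}^{\beta_{j-1}^k} \frac{y(k)-\mu_y(k,u)}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right)dy(k) = \sigma_y(k)\int_{\alpha_{j-2}^k}^{\alpha_{j-1}^k} \frac{\tau}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau,$$
where
$$\alpha_{j-i}^k = \frac{\beta_{j-i}^k - \mu_y(k,u)}{\sigma_y(k)}\qquad (j = 2, \dots, N_k+2;\ i = 1, 2), \tag{9.8.11}$$
and so the numerator of (9.8.9) becomes
$$\int_{\beta_{j-2}^k}^{\beta_{j-1}^k} \frac{y(k)}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right)dy(k) = \mu_y(k,u)\int_{\alpha_{j-2}^k}^{\alpha_{j-1}^k} \frac{1}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau + \sigma_y(k)\int_{\alpha_{j-2}^k}^{\alpha_{j-1}^k} \frac{\tau}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau.$$
Similarly, the denominator of (9.8.9) becomes
$$\int_{\beta_{j-2}^k}^{\beta_{j-1}^k} \frac{1}{\sqrt{2\pi}\,\sigma_y(k)}\exp\!\left(-\frac{1}{2}\left(\frac{y(k)-\mu_y(k,u)}{\sigma_y(k)}\right)^{2}\right)dy(k) = \int_{\alpha_{j-2}^k}^{\alpha_{j-1}^k} \frac{1}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau.$$
Therefore, the $j$-th conditional expected risk function of the damage at stage $k$ becomes
$$f_j^k(u) = \mu_y(k,u) + \sigma_y(k)\,b_j^k\qquad (j = 2, \dots, N_k+2), \tag{9.8.12}$$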

where
$$b_j^k = \frac{\int_{\alpha_{j-2}^k}^{\alpha_{j-1}^k} \tau\,e^{-\tau^2/2}\,d\tau}{\int_{\alpha_{j-2}^k}^{\alpha_{j-1}^k} e^{-\tau^2/2}\,d\tau}\qquad (j = 2, \dots, N_k+2) \tag{9.8.13}$$
and $\alpha_{j-i}^k$ is stated in (9.8.11). Similarly, the $f_{N_k+3}^k(u)$ is computed as follows:
$$f_{N_k+3}^k(u) = \frac{\mu_y(k,u)\int_{\alpha_0^k}^{\alpha_{N_k+1}^k} \frac{1}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau + \sigma_y(k)\int_{\alpha_0^k}^{\alpha_{N_k+1}^k} \frac{\tau}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau}{\int_{\alpha_0^k}^{\alpha_{N_k+1}^k} \frac{1}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau}.$$
Note that $\beta_0^k = -\infty$ and $\beta_{N_k+1}^k = \infty$, so that $\alpha_0^k = -\infty$ and $\alpha_{N_k+1}^k = \infty$. Since $e^{-\tau^2/2}$ is an even function and $\tau e^{-\tau^2/2}$ is an odd function,
$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau = 1,\qquad \int_{-\infty}^{\infty} \frac{\tau}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau = 0,$$
and so
$$f_{N_k+3}^k(u) = \frac{\mu_y(k,u)\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau + \sigma_y(k)\int_{-\infty}^{\infty} \frac{\tau}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau}{\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\tau^2/2}\,d\tau} = \mu_y(k,u), \tag{9.8.14}$$
i.e., the unconditional expected risk function of the damage at stage $k$ is equal to the mean of the damage. Note that $\sigma_y(k)$ is independent of $u$ and $b_j^k$ $(j = 2, \dots, N_k+2)$ is constant. Then
$$\min_{u}\{\,f_j^k(u)\,\} = \min_{u}\{\,\mu_y(k,u) + b_j^k\,\sigma_y(k)\,\} = b_j^k\,\sigma_y(k) + \min_{u}\{\,\mu_y(k,u)\,\}\qquad (j = 2, \dots, N_k+2),$$
i.e., minimizing any one of the conditional expected risk functions reduces to minimizing the mean of the damage. Combining the cost objective function with any one of the risk functions (the conditional or the unconditional) constitutes a set of biobjective optimization problems at stage $k$, i.e.,
$$\begin{cases}
\min\{\,f_1^k(u),\ f_2^k(u)\,\},\\
\quad\vdots\\
\min\{\,f_1^k(u),\ f_{N_k+2}^k(u)\,\},\\
\min\{\,f_1^k(u),\ f_{N_k+3}^k(u)\,\}.
\end{cases}$$

These biobjective optimization problems can be solved using the ε-constraint method. The trade-off functions $\lambda_{1j}^k$ between the cost objective function $f_1^k(u)$ and the risk function $f_j^k(u)$, both at stage $k$, are given by
$$\lambda_{1j}^k = -\frac{\partial f_1^k}{\partial f_j^k}\qquad (j = 2, \dots, N_k+3).$$
For $j = 2, \dots, N_k+2$, since $\sigma_y(k)$ is independent of $u$ and $b_j^k$ is constant, by (9.8.12), it follows that
$$\partial f_j^k(u) = \partial\big(\mu_y(k,u) + \sigma_y(k)\,b_j^k\big) = \partial\mu_y(k,u),$$
and for $j = N_k+3$, by (9.8.14), it is clear that $\partial f_{N_k+3}^k(u) = \partial\mu_y(k,u)$. Thus, the trade-off functions $\lambda_{1j}^k$ between the cost function $f_1^k(u)$ and the risk function $f_j^k(u)$ at stage $k$ are
$$\lambda_{1j}^k = -\frac{\partial f_1^k}{\partial\mu_y}\qquad (j = 2, \dots, N_k+3),$$
i.e., all trade-offs for the given stage $k$ are equal. Therefore, for the normal distribution, the risk functions of the damage at stage $k$ are parallel curves. This greatly simplifies the multiobjective optimization.

Example 9.8.3. Let $N_k = 2$ and $\alpha_0^k = -\infty$, $\alpha_1^k = -1$, $\alpha_2^k = 1$, $\alpha_3^k = \infty$ in (9.8.11). Denote
$$\Phi(x) := \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\tau^2/2}\,d\tau.$$
By (9.8.13),
$$\begin{aligned}
b_2^k &= \frac{\int_{-\infty}^{-1} \tau\,e^{-\tau^2/2}\,d\tau}{\sqrt{2\pi}\,\Phi(-1)} = -\frac{e^{-1/2}}{\sqrt{2\pi}\,\Phi(-1)} = -\frac{0.24197}{0.158655} = -1.52514,\\
b_3^k &= \frac{\int_{-1}^{1} \tau\,e^{-\tau^2/2}\,d\tau}{\sqrt{2\pi}\,(\Phi(1)-\Phi(-1))} = 0,\\
b_4^k &= \frac{\int_{1}^{\infty} \tau\,e^{-\tau^2/2}\,d\tau}{\sqrt{2\pi}\,(1-\Phi(1))} = \frac{e^{-1/2}}{\sqrt{2\pi}\,\Phi(-1)} = \frac{0.24197}{0.158655} = 1.52514.
\end{aligned}$$
Then, by (9.8.12), the conditional expected risk functions at stage $k$ are
$$\begin{aligned}
f_2^k(u) &= \mu_y(k,u) + \sigma_y(k)\,b_2^k = \mu_y(k,u) - 1.52514\,\sigma_y(k),\\
f_3^k(u) &= \mu_y(k,u) + \sigma_y(k)\,b_3^k = \mu_y(k,u),\\
f_4^k(u) &= \mu_y(k,u) + \sigma_y(k)\,b_4^k = \mu_y(k,u) + 1.52514\,\sigma_y(k),
\end{aligned} \tag{9.8.15}$$

where $\sigma_y(k)$ is the positive square root of $\sigma_y^2(k)$. By Propositions 9.8.1 and 9.8.2, $\mu_y(k,u)$ and $\sigma_y^2(k)$ satisfy the following:
$$\begin{aligned}
\mu_y(k,u) &= CA^k x_0 + \sum_{i=0}^{k-1} CA^i B\,u(k-1-i),\\
\sigma_y^2(k) &= C^2 A^{2k} X_0 + \sum_{i=0}^{k-1} C^2 A^{2i} P + Q.
\end{aligned} \tag{9.8.16}$$
Assume that for one scenario,
$$A = 1.02,\quad B = -0.01,\quad C = 1,\quad x_0 = 100,\quad X_0 = 100,\quad P = 200,\quad Q = 0,\quad u(0) = 200,\quad u(1) = 150.$$
We will find the values of the conditional expected risk functions at stage $k$ $(k = 1, 2)$.

For $k = 1$, by (9.8.16),
$$\begin{aligned}
\mu_y(1,u) &= (1.02)^1(100) - (1.02)^0(0.01)(200) = 100,\\
\sigma_y^2(1) &= (1.02)^2(100) + (1.02)^0(200) = 304.04,\qquad \sigma_y(1) = \sqrt{304.04} = 17.4.
\end{aligned}$$
From this and (9.8.15), we get
$$f_2^1(u) = 100 - (1.52514)(17.4) = 73.46,\qquad f_3^1(u) = 100,\qquad f_4^1(u) = 100 + (1.52514)(17.4) = 126.54.$$
For $k = 2$, by (9.8.16),
$$\begin{aligned}
\mu_y(2,u) &= (1.02)^2(100) - (1.02)^0(0.01)(150) - (1.02)(0.01)(200) = 100.5,\\
\sigma_y^2(2) &= (1.02)^4(100) + (1.02)^0(200) + (1.02)^2(200) = 516.32,\qquad \sigma_y(2) = \sqrt{516.32} = 22.7.
\end{aligned}$$
From this and (9.8.15), we get
$$f_2^2(u) = 100.5 - (1.52514)(22.7) = 65.88,\qquad f_3^2(u) = 100.5,\qquad f_4^2(u) = 100.5 + (1.52514)(22.7) = 135.12.$$
These values are summarized in the following table.

| Scenario | $k = 1$ | $k = 2$ |
|---|---|---|
| $f_2^k$ | 73.46 | 65.88 |
| $f_3^k$ | 100 | 100.5 |
| $f_4^k$ | 126.54 | 135.12 |
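The computation in Example 9.8.3 can be reproduced with the standard library alone. The sketch below is an illustration, not the authors' code: it uses the closed form $\int_a^b \tau\varphi(\tau)\,d\tau = \varphi(a) - \varphi(b)$ for the standard normal density $\varphi$ to evaluate (9.8.13), and the helper names are hypothetical.

```python
import math


def phi(t):
    """Standard normal density; evaluates to 0.0 at +/- infinity."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def Phi(t):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))


def b_coefficient(lo, hi):
    """b in (9.8.13) for the standardized interval [lo, hi].
    Numerator:   integral of tau * phi(tau) = phi(lo) - phi(hi).
    Denominator: integral of phi(tau)       = Phi(hi) - Phi(lo)."""
    return (phi(lo) - phi(hi)) / (Phi(hi) - Phi(lo))


def mean_and_std(A, B, C, x0, X0, P, Q, u, k):
    """mu_y(k, u) and sigma_y(k) from (9.8.16)."""
    mu = C * A**k * x0 + sum(C * A**i * B * u[k - 1 - i] for i in range(k))
    var = C**2 * A ** (2 * k) * X0 + sum(C**2 * A ** (2 * i) * P for i in range(k)) + Q
    return mu, math.sqrt(var)


# Partition alpha_0..alpha_3 = -inf, -1, 1, inf, as in the example.
alphas = [-math.inf, -1.0, 1.0, math.inf]
b = [b_coefficient(lo, hi) for lo, hi in zip(alphas[:-1], alphas[1:])]

# Scenario data of the example.
scenario = dict(A=1.02, B=-0.01, C=1.0, x0=100.0, X0=100.0, P=200.0, Q=0.0,
                u=[200.0, 150.0])

risk = {}
for k in (1, 2):
    mu, sigma = mean_and_std(k=k, **scenario)
    # f_2, f_3, f_4 from (9.8.12): mean plus b times standard deviation.
    risk[k] = [mu + sigma * bj for bj in b]
    print(k, mu, sigma, risk[k])
```

Because this sketch keeps $\sigma_y(k)$ at full precision rather than rounding it to one decimal place, the conditional values differ from the tabulated ones in the second decimal place.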

9.9 The Leslie model

In the simplest population dynamics model of a state variable, let
– $p(t)$ represent the level of population at time $t$ $(t = 0, 1, \dots)$;
– $B$ and $D$ be the numbers of births and deaths in any one year, respectively;
– $b(t)$ and $d(t)$ represent birth and death rates for the time between $t$ and $t+1$ $(t = 0, 1, \dots)$, respectively.

Clearly, $B = b(t)p(t)$ and $D = d(t)p(t)$. Assume that the population level $p(0)$ is known, and that the birth and death rates do not change with time, i.e., $b(t) = b$ and $d(t) = d$. Then population growth from $t$ to $t+1$ is balanced as follows:
$$p(t+1) = p(t) + B - D = p(t)\,r, \tag{9.9.1}$$
where $r = 1 + b - d$ is the overall growth rate. Meyer calls it the Malthusian parameter. For any $t$ $(t \in \mathbb{Z}_+)$, it follows from (9.9.1) that
$$p(t) = p(t-1)\,r = p(t-2)\,r^2 = \dots = p(0)\,r^t, \tag{9.9.2}$$
i.e., the population level is an exponential function of $t$; thus the simplest population dynamics model of a state variable is called an exponential model.

Example 9.9.1. Assume that the current worker population in a small factory is 70 workers, and that the rate of increase due to new hiring has been 0.04 and the rate of leaving the factory has been 0.02. How many workers will be at that factory in 5 years? How many years will it take for the number of workers to double?

Solution. By the assumption, $p(0) = 70$, $b = 0.04$, $d = 0.02$, and $r = 1 + b - d = 1.02$. By (9.9.2),
$$p(5) = p(0)\,r^5 = (70)(1.02)^5 \approx (70)(1.104) \approx 77.$$
Thus the number of workers in 5 years is about 77. Let $t$ be the number of years until doubling. Then $p(t) = 2p(0)$. From this and (9.9.2),
$$p(0)\,r^t = 2p(0)\qquad\text{or}\qquad 2 = (1.02)^t.$$
The solution is $t = \ln 2/\ln(1.02) \approx 35$, i.e., the number of years until doubling is about 35.
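The exponential model (9.9.2) and the doubling-time formula translate directly into code. A minimal Python sketch follows; the function names are illustrative and not from the original text.

```python
import math


def population(p0, r, t):
    """Population level after t periods under the exponential model p(t) = p(0) r^t."""
    return p0 * r**t


def doubling_time(r):
    """Number of periods t solving r^t = 2, valid for a growth rate r > 1."""
    return math.log(2) / math.log(r)


# Data of Example 9.9.1: 70 workers, hiring rate 0.04, leaving rate 0.02.
growth_rate = 1 + 0.04 - 0.02
workers_in_5_years = population(70, growth_rate, 5)
years_to_double = doubling_time(growth_rate)

print(workers_in_5_years, years_to_double)
```

Note that the doubling time depends only on the growth rate $r$, not on the initial population $p(0)$.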

The simplified version of the Leslie model adapted by Meyer is as follows. In the Leslie model, only the female population is considered and the female population is divided into $n$ age categories
$$[0, \Delta),\quad [\Delta, 2\Delta),\quad \dots,\quad [(n-1)\Delta, n\Delta),$$
where $\Delta$ is the width of each age interval of the population. Define
– $F_i(t)$ is the number of females in the age interval $[i\Delta, (i+1)\Delta)$ at time $t$,
– $m_i$ is the $\Delta$-year maternity rate for the age interval $[i\Delta, (i+1)\Delta)$,
– $p_i$ is the survival rate in the age interval $[i\Delta, (i+1)\Delta)$,
where $i = 0, 1, \dots, n-1$. The female population of the $(i+1)$-th age group at the next period $t + \Delta$ is given by
$$F_{i+1}(t+\Delta) = p_i F_i(t)\qquad (t = 0, \Delta, 2\Delta, \dots).$$
The number of newborns in the lowest age group at time $t + \Delta$ is given by
$$F_0(t+\Delta) = \sum_{i=0}^{n-1} m_i F_i(t).$$
The combination of these two equations gives
$$\begin{pmatrix} F_0(t+\Delta)\\ F_1(t+\Delta)\\ F_2(t+\Delta)\\ \vdots\\ F_{n-1}(t+\Delta) \end{pmatrix} = \begin{pmatrix} m_0 & m_1 & m_2 & \cdots & \cdots & m_{n-1}\\ p_0 & 0 & 0 & \cdots & \cdots & 0\\ 0 & p_1 & 0 & \cdots & \cdots & 0\\ \vdots & \vdots & \vdots & & & \vdots\\ 0 & 0 & 0 & \cdots & p_{n-2} & 0 \end{pmatrix} \begin{pmatrix} F_0(t)\\ F_1(t)\\ F_2(t)\\ \vdots\\ F_{n-1}(t) \end{pmatrix}.$$
Its matrix form is
$$F(t+\Delta) = MF(t), \tag{9.9.3}$$
where
$$F(t) = \begin{pmatrix} F_0(t)\\ F_1(t)\\ F_2(t)\\ \vdots\\ F_{n-1}(t) \end{pmatrix},\qquad M = \begin{pmatrix} m_0 & m_1 & m_2 & \cdots & \cdots & m_{n-1}\\ p_0 & 0 & 0 & \cdots & \cdots & 0\\ 0 & p_1 & 0 & \cdots & \cdots & 0\\ \vdots & \vdots & \vdots & & & \vdots\\ 0 & 0 & 0 & \cdots & p_{n-2} & 0 \end{pmatrix}.$$
The vector $F$ is called the age distribution vector and the $n \times n$ matrix $M$ is called the Leslie matrix. For any $k$ $(k \in \mathbb{Z}_+)$, it follows from (9.9.3) that
$$F(k\Delta) = MF((k-1)\Delta) = \dots = M^k F(0)\qquad (k \in \mathbb{Z}_+).$$

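Example 9.9.2. Based on an observation, the population, the maternity rate, and the survival rate of female birds are given by the following:
$$\begin{aligned}
F_0(0) &= 100, & F_1(0) &= 60, & F_2(0) &= 40,\\
m_0 &= 0, & m_1 &= 3, & m_2 &= 1,\\
p_0 &= \tfrac{1}{2}, & p_1 &= \tfrac{3}{4}, & p_2 &= 0.
\end{aligned}$$
Find the age distribution vectors $F(\Delta)$ and $F(2\Delta)$.

Solution. The initial vector and the Leslie matrix are, respectively,
$$F(0) = \begin{pmatrix} F_0(0)\\ F_1(0)\\ F_2(0) \end{pmatrix} = \begin{pmatrix} 100\\ 60\\ 40 \end{pmatrix},\qquad M = \begin{pmatrix} 0 & 3 & 1\\ \tfrac{1}{2} & 0 & 0\\ 0 & \tfrac{3}{4} & 0 \end{pmatrix}.$$
To find $F(\Delta)$, by $F(\Delta) = MF(0)$, we get
$$F(\Delta) = M\begin{pmatrix} F_0(0)\\ F_1(0)\\ F_2(0) \end{pmatrix} = \begin{pmatrix} 0 & 3 & 1\\ \tfrac{1}{2} & 0 & 0\\ 0 & \tfrac{3}{4} & 0 \end{pmatrix}\begin{pmatrix} 100\\ 60\\ 40 \end{pmatrix} = \begin{pmatrix} 220\\ 50\\ 45 \end{pmatrix}.$$
To find $F(2\Delta)$, by $F(2\Delta) = MF(\Delta)$, we get
$$F(2\Delta) = \begin{pmatrix} 0 & 3 & 1\\ \tfrac{1}{2} & 0 & 0\\ 0 & \tfrac{3}{4} & 0 \end{pmatrix}\begin{pmatrix} 220\\ 50\\ 45 \end{pmatrix} = \begin{pmatrix} 195\\ 110\\ 37.5 \end{pmatrix},$$
or by $F(2\Delta) = M^2F(0)$, noticing that
$$M^2 = \begin{pmatrix} 0 & 3 & 1\\ \tfrac{1}{2} & 0 & 0\\ 0 & \tfrac{3}{4} & 0 \end{pmatrix}^{2} = \begin{pmatrix} \tfrac{3}{2} & \tfrac{3}{4} & 0\\ 0 & \tfrac{3}{2} & \tfrac{1}{2}\\ \tfrac{3}{8} & 0 & 0 \end{pmatrix},$$
we get
$$F(2\Delta) = \begin{pmatrix} \tfrac{3}{2} & \tfrac{3}{4} & 0\\ 0 & \tfrac{3}{2} & \tfrac{1}{2}\\ \tfrac{3}{8} & 0 & 0 \end{pmatrix}\begin{pmatrix} 100\\ 60\\ 40 \end{pmatrix} = \begin{pmatrix} 195\\ 110\\ 37.5 \end{pmatrix}.$$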
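The projection $F(k\Delta) = M^kF(0)$ is a repeated matrix-vector product, which can be sketched in a few lines of plain Python. The helper names below are illustrative assumptions; the data are those of Example 9.9.2.

```python
def leslie_matrix(maternity, survival):
    """Build the n x n Leslie matrix: maternity rates m_0..m_{n-1} on the first
    row, survival rates p_0..p_{n-2} on the subdiagonal, zeros elsewhere."""
    n = len(maternity)
    M = [[0.0] * n for _ in range(n)]
    M[0] = [float(m) for m in maternity]
    for i in range(n - 1):
        M[i + 1][i] = float(survival[i])
    return M


def step(M, F):
    """One period: F(t + Delta) = M F(t)."""
    return [sum(m * f for m, f in zip(row, F)) for row in M]


def project(M, F0, k):
    """Age distribution after k periods, F(k Delta) = M^k F(0)."""
    F = list(F0)
    for _ in range(k):
        F = step(M, F)
    return F


# Data of Example 9.9.2 (p_2 does not enter the matrix).
M = leslie_matrix([0, 3, 1], [0.5, 0.75])
F0 = [100, 60, 40]
F1 = project(M, F0, 1)
F2 = project(M, F0, 2)
print(F1, F2)
```

Applying the matrix step by step avoids forming $M^k$ explicitly, which is usually the more convenient route when the whole trajectory $F(\Delta), F(2\Delta), \dots$ is of interest.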

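Example 9.9.3. Suppose that the numbers of females, the maternity rates, and the survival rates in the age intervals are given as follows:
$$\begin{aligned}
F_0(0) &= 50, & F_1(0) &= 30, & F_2(0) &= 20, & F_3(0) &= 10,\\
m_0 &= 0, & m_1 &= 1, & m_2 &= 2, & m_3 &= 3,\\
p_0 &= \tfrac{3}{4}, & p_1 &= \tfrac{1}{2}, & p_2 &= \tfrac{1}{2}.
\end{aligned}$$
Find the age distribution vectors $F(\Delta)$ and $F(2\Delta)$.

Solution. The initial vector and the Leslie matrix are, respectively,
$$F(0) = \begin{pmatrix} F_0(0)\\ F_1(0)\\ F_2(0)\\ F_3(0) \end{pmatrix} = \begin{pmatrix} 50\\ 30\\ 20\\ 10 \end{pmatrix},\qquad M = \begin{pmatrix} 0 & 1 & 2 & 3\\ \tfrac{3}{4} & 0 & 0 & 0\\ 0 & \tfrac{1}{2} & 0 & 0\\ 0 & 0 & \tfrac{1}{2} & 0 \end{pmatrix}.$$
To find $F(\Delta)$, by $F(\Delta) = MF(0)$, we get
$$F(\Delta) = M\begin{pmatrix} F_0(0)\\ F_1(0)\\ F_2(0)\\ F_3(0) \end{pmatrix} = \begin{pmatrix} 0 & 1 & 2 & 3\\ \tfrac{3}{4} & 0 & 0 & 0\\ 0 & \tfrac{1}{2} & 0 & 0\\ 0 & 0 & \tfrac{1}{2} & 0 \end{pmatrix}\begin{pmatrix} 50\\ 30\\ 20\\ 10 \end{pmatrix} = \begin{pmatrix} 100\\ 37.5\\ 15\\ 10 \end{pmatrix}.$$
To find $F(2\Delta)$, by $F(2\Delta) = MF(\Delta)$, we get
$$F(2\Delta) = \begin{pmatrix} 0 & 1 & 2 & 3\\ \tfrac{3}{4} & 0 & 0 & 0\\ 0 & \tfrac{1}{2} & 0 & 0\\ 0 & 0 & \tfrac{1}{2} & 0 \end{pmatrix}\begin{pmatrix} 100\\ 37.5\\ 15\\ 10 \end{pmatrix} = \begin{pmatrix} 97.5\\ 75\\ 18.75\\ 7.5 \end{pmatrix},$$
or by $F(2\Delta) = M^2F(0)$, noticing that
$$M^2 = \begin{pmatrix} 0 & 1 & 2 & 3\\ \tfrac{3}{4} & 0 & 0 & 0\\ 0 & \tfrac{1}{2} & 0 & 0\\ 0 & 0 & \tfrac{1}{2} & 0 \end{pmatrix}^{2} = \begin{pmatrix} \tfrac{3}{4} & 1 & \tfrac{3}{2} & 0\\ 0 & \tfrac{3}{4} & \tfrac{3}{2} & \tfrac{9}{4}\\ \tfrac{3}{8} & 0 & 0 & 0\\ 0 & \tfrac{1}{4} & 0 & 0 \end{pmatrix},$$
we get
$$F(2\Delta) = \begin{pmatrix} \tfrac{3}{4} & 1 & \tfrac{3}{2} & 0\\ 0 & \tfrac{3}{4} & \tfrac{3}{2} & \tfrac{9}{4}\\ \tfrac{3}{8} & 0 & 0 & 0\\ 0 & \tfrac{1}{4} & 0 & 0 \end{pmatrix}\begin{pmatrix} 50\\ 30\\ 20\\ 10 \end{pmatrix} = \begin{pmatrix} 97.5\\ 75\\ 18.75\\ 7.5 \end{pmatrix}.$$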

9.10 Leontief's and inoperability input-output models

Leontief's input-output model studies the equilibrium state of an economy consisting of a number of individual economic sectors. The inoperability input-output model presented by Haimes and Jiang studies the equilibrium state of a system consisting of $n$ critical complex intra- and interconnected infrastructures. Although the equations of these two models are the same, they connote different meanings.

In Leontief's input-output model, define the following notations:
– $x_i$ is the output (for the total economy) of the $i$-th goods, $i = 1, \dots, n$;
– $r_k$ is the input (for the total economy) of the $k$-th resource, $k = 1, \dots, m$;
– $x_{ij}$ is the amount of the $i$-th goods used in the production of the $j$-th goods;
– $r_{kj}$ is the amount of the $k$-th resource input used in the production of the $j$-th goods;
– $c_i$ is the $i$-th input for the production of other commodities, $i = 1, \dots, n$.

Assume that the inputs of both goods and resources required to produce any commodity are proportional to the output of that commodity, i.e.,
$$x_{ij} = a_{ij}x_j\quad (i, j = 1, \dots, n),\qquad r_{kj} = b_{kj}x_j\quad (k = 1, \dots, m;\ j = 1, \dots, n).$$
These two equations are called Leontief's proportionality equations of goods and resources, respectively. The equation
$$x_i = \sum_{j=1}^{n} x_{ij} + c_i\qquad (i = 1, \dots, n)$$
is called the Leontief-based equation. The combination of the proportionality equation of goods with the Leontief-based equation gives the following Leontief equation:
$$x_i = \sum_{j=1}^{n} a_{ij}x_j + c_i\qquad (i = 1, \dots, n),$$
which is written in matrix form as
$$x = Ax + c,$$
where $x = (x_1, \dots, x_n)^T$, $c = (c_1, \dots, c_n)^T$, and
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.$$
Similarly, applying the proportionality assumption of the resources gives
$$\sum_{j=1}^{n} r_{kj} = \sum_{j=1}^{n} b_{kj}x_j\qquad (k = 1, \dots, m).$$
Since the demand for the $k$-th resource cannot exceed its supply,
$$\sum_{j=1}^{n} b_{kj}x_j \le r_k,\qquad r_k \ge 0\qquad (k = 1, \dots, m).$$

In the Inoperability Input-Output Model (IIM), define the following notations:
– $x_i$ is the overall risk of inoperability of the $i$-th intra- and interconnected infrastructure that can be triggered by one or multiple failures caused by accidents or acts of terrorism, $i = 1, \dots, n$;
– $x_{ij}$ is the degree of inoperability triggered by one or multiple failures that the $j$-th infrastructure can contribute to the $i$-th infrastructure due to their complex intra- and interconnectedness, $i, j = 1, \dots, n$;
– $a_{ij}$ is the probability of inoperability that the $j$-th infrastructure contributes to the $i$-th infrastructure;
– $c_i$ is the natural or man-made perturbation into the $i$-th critical infrastructure, $i = 1, \dots, n$;

and assume that
$$x_{ij} = a_{ij}x_j\qquad (i, j = 1, \dots, n).$$
This equation is called the proportionality equation of IIM. The equation
$$x_i = \sum_{j=1}^{n} x_{ij} + c_i\qquad (i = 1, \dots, n)$$
is called the balance equation of IIM. The combination of the proportionality equation with the balance equation gives
$$x_i = \sum_{j=1}^{n} a_{ij}x_j + c_i\qquad (i = 1, \dots, n),$$
which is called the inoperability equation of IIM. Its matrix form is $x = Ax + c$, where $x = (x_1, \dots, x_n)^T$, $c = (c_1, \dots, c_n)^T$, and the A-matrix is
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \cdots & a_{nn} \end{pmatrix}.$$

Example 9.10.1. Assume that a system consists of three subsystems. Denote by $x_1, x_2, x_3$ the inoperability of these three subsystems. If Subsystem 1 fails completely, then Subsystems 2 and 3 can perform to only 40 % and 75 %, respectively, of their functionality. If Subsystem 2 fails completely, then Subsystems 1 and 3 can perform to only 20 % and 10 %, respectively, of their functionality. Assume that the inoperability of Subsystem 3 has no impact on the other two subsystems. Thus, the A-matrix for the system is
$$A = \begin{pmatrix} 0 & 0.8 & 0\\ 0.6 & 0 & 0\\ 0.25 & 0.9 & 0 \end{pmatrix}.$$
Assume that Subsystem 2 loses 40 % of its functionality due to an external perturbation. The inoperability equation of IIM is
$$\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} = \begin{pmatrix} 0 & 0.8 & 0\\ 0.6 & 0 & 0\\ 0.25 & 0.9 & 0 \end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} + \begin{pmatrix} 0\\ 0.4\\ 0 \end{pmatrix} = \begin{pmatrix} 0.8x_2\\ 0.6x_1 + 0.4\\ 0.25x_1 + 0.9x_2 \end{pmatrix}$$
or
$$\begin{cases} x_1 = 0.8x_2,\\ x_2 = 0.6x_1 + 0.4,\\ x_3 = 0.25x_1 + 0.9x_2. \end{cases}$$
Solving this set of equations gives
$$x_1 = \tfrac{8}{13} \approx 0.615,\qquad x_2 = \tfrac{10}{13} \approx 0.769,\qquad x_3 = \tfrac{11}{13} \approx 0.846.$$
This means that the inoperability of Subsystems 1, 2, and 3 is 0.615, 0.769, and 0.846, respectively.

Assume that $h \times 100\,\%$ of the operability of Subsystem 2 is lost due to the attack alone. Then the inoperability equation of IIM is
$$\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} = \begin{pmatrix} 0 & 0.8 & 0\\ 0.6 & 0 & 0\\ 0.25 & 0.9 & 0 \end{pmatrix}\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} + \begin{pmatrix} 0\\ h\\ 0 \end{pmatrix} = \begin{pmatrix} 0.8x_2\\ 0.6x_1 + h\\ 0.25x_1 + 0.9x_2 \end{pmatrix}$$
or
$$\begin{cases} x_1 = 0.8x_2,\\ x_2 = 0.6x_1 + h,\\ x_3 = 0.25x_1 + 0.9x_2. \end{cases}$$
Note the constraint $0 \le x_1, x_2, x_3 \le 1$. Solving the set of equations, the solution is
$$\begin{aligned}
&x_1 = \tfrac{20}{13}h,\quad x_2 = \tfrac{25}{13}h,\quad x_3 = \tfrac{55}{26}h &&\text{for } 0 \le h \le 0.47,\\
&x_1 = \tfrac{8}{11} \approx 0.727,\quad x_2 = \tfrac{10}{11} \approx 0.909,\quad x_3 = 1 &&\text{for } 0.47 < h \le 1.
\end{aligned}$$
Subsystem 3 fails completely when the external attack brings down 47 % of the operability of Subsystem 2. The remaining 53 % is taken away by its dependency on Subsystems 1 and 2.
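The inoperability equation $x = Ax + c$ is equivalent to the linear system $(I - A)x = c$, which can be solved exactly for the data of Example 9.10.1. The sketch below is an illustration, not the authors' procedure; it uses a small hand-written Gauss-Jordan elimination over exact fractions, and the helper names are assumptions. It does not enforce the bound $0 \le x_i \le 1$, so it applies only while the unconstrained solution stays inside that range.

```python
from fractions import Fraction


def solve_linear(M, b):
    """Solve M x = b by Gauss-Jordan elimination on exact fractions,
    taking the first nonzero entry in each column as the pivot."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def inoperability(A, c):
    """Solve x = A x + c, i.e. (I - A) x = c."""
    n = len(c)
    identity_minus_A = [
        [(1 if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)
    ]
    return solve_linear(identity_minus_A, c)


# A-matrix and perturbation vector of Example 9.10.1.
A = [
    [Fraction(0), Fraction("0.8"), Fraction(0)],
    [Fraction("0.6"), Fraction(0), Fraction(0)],
    [Fraction("0.25"), Fraction("0.9"), Fraction(0)],
]
c = [Fraction(0), Fraction("0.4"), Fraction(0)]

x = inoperability(A, c)
print(x, [float(v) for v in x])
```

Because the system is linear in $c$, scaling the perturbation to a general $h$ scales the unconstrained solution by $h/0.4$, which reproduces the coefficients $20/13$, $25/13$, and $55/26$ in the first branch of the solution.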


10 Life cycle assessments

The life cycle of a product, process, or system covers all of its stages: raw material acquisition, manufacture, use/reuse/maintenance, recycling/waste management, and final disposal. The concept of the life cycle was first mentioned by Novick in 1959. At that time, its main application was the costing of weapon systems, including the costs of purchase, development, and end-of-life operations. When environmental policy became a major issue in all industrialized societies, the life cycle concept was fully developed. Currently, life cycle assessment (LCA) has become a major instrument for assessing the ecological burdens and impacts throughout the consecutive and interlinked stages of a product system, from raw material acquisition or generation from natural resources, through production and use, to final disposal.

10.1 Classic life cycle assessment

Life Cycle Assessment (LCA) is a cradle-to-grave approach for assessing industrial systems. It begins with the initial gathering of raw materials from the earth to create the product and ends at the point when all residuals are returned to the earth. LCA is used for identifying, quantifying, and decreasing the overall environmental impact of a product, process, or system. The International Organization for Standardization (ISO) documents on LCA include the following parts:
– ISO-14040, Life Cycle Assessment – Principles and Framework (ISO, 1997)
– ISO-14041, Life Cycle Assessment – Goal and Scope Definition and Inventory Analysis (ISO, 1998)
– ISO-14042, Life Cycle Assessment – Life Cycle Impact Assessment (ISO, 2000)
– ISO-14043, Life Cycle Assessment – Life Cycle Interpretation (ISO, 2000)
– ISO-14044, Life Cycle Assessment – Requirements and Guidelines (ISO, 2006)

The LCA process is a systematic, phased approach. An LCA consists of the following four interrelated phases:
– goal and scope definition,
– life cycle inventory analysis (or inventory analysis for short),
– life cycle impact assessment (or impact assessment for short),
– life cycle interpretation (or interpretation for short).

The framework of LCA is shown in Figure 10.1. The arrows indicate that all phases are linked to each other, that the life cycle interpretation is linked to all other phases, and that LCA has direct applications including product development and improvement, strategic planning, public policy making, and marketing.

DOI 10.1515/9783110424904-011

[Fig. 10.1: The framework of LCA. Boxes: goal and scope definition, inventory analysis, impact assessment, interpretation, and direct applications.]

10.1.1 Goal and scope definition

There is no explicit ISO definition of this phase of LCA, but it clearly centers on formulating the question and stating the context in which the question is answered. In this phase, no data are collected and no results are calculated. Rather, the plan of the LCA study is defined as clearly and unambiguously as possible, so that one can quickly find out the precise question addressed and the main principles chosen. The goal of an LCA deals with the following topics:
– the intended application;
– reasons for carrying out the study;
– the intended audience;
– whether the results will be used as the basis for public comparative assertions.

The scope definition makes a number of major choices, for instance, the product system or systems to be studied, the function the system delivers, and the functional unit. It also sets the main outline for a number of subjects that are discussed and refined in more detail in the later phases, for instance, system boundaries, impact categories, and treatment of uncertainty.

10.1.2 Life cycle inventory analysis

Life Cycle Inventory Analysis (LCI) is the second phase of LCA. ISO defines it as the phase of LCA involving the compilation and quantification of inputs and outputs for a product throughout its life cycle. In this phase, quantification is an important aspect, and numbers (data and calculations) are of central concern. Unit processes are the central element of inventory analysis. A unit process is defined in ISO 14040 as the smallest element considered in the LCI. Examples of unit processes are coal mining, steel production, refining of oil, recycling of waste paper, and transport by lorry. In LCA, a unit process is commonly treated as a black box that converts a bundle of inputs into a bundle of outputs. LCA studies can connect different unit processes into a system through simple upstream-downstream connections or more complicated connections. In the present era of digital databases, LCA studies can easily connect several thousand unit processes.


In the LCI, all unit processes have to be quantified. This means that the sizes of the inflows and outflows per unit process have to be specified. In scaling the unit processes, the web-like nature of the system quickly creates complications. Two such complications are:
– upstream or downstream processes of some products may be difficult to quantify;
– some unit processes produce several co-products, so that the balance equations become impossible to solve.

The first issue can be handled by a procedure known as cut-off, and the second by co-product allocation. After appropriate cut-off and allocation, the final inventory results can be calculated. The key steps of LCI are to develop a flow diagram and an LCI data collection plan, collect data on inputs and outputs, and evaluate and document the LCI results.
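To make the idea of scaling unit processes concrete, the sketch below uses a common matrix formulation of the inventory problem: a technology matrix A (product flows per unit process), a demand vector f (the functional unit), scaling factors s solving A s = f, and an intervention matrix B giving the inventory g = B s. This formulation, the names `A`, `B`, `f`, `s`, `g`, `solve_2x2`, and all numbers are illustrative assumptions rather than data from the text, and the example assumes that cut-off and allocation have already produced a square, non-singular system.

```python
def solve_2x2(A, f):
    """Solve the 2x2 balance equations A s = f by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("technology matrix is singular")
    return [(f[0] * d - b * f[1]) / det, (a * f[1] - f[0] * c) / det]

# Hypothetical unit processes (columns) and product flows (rows).
# Column 0: electricity production, consumes 2 L fuel to deliver 10 kWh.
# Column 1: fuel production, delivers 100 L fuel.
A = [[-2.0, 100.0],   # fuel balance, L
     [10.0,   0.0]]   # electricity balance, kWh

# Environmental flows per run of each unit process (rows: CO2, SO2 in kg).
B = [[1.0, 10.0],
     [0.1,  2.0]]

# Functional unit: 1000 kWh of electricity and no net fuel output.
f = [0.0, 1000.0]

s = solve_2x2(A, f)  # how many times each unit process must run
g = [sum(B[i][j] * s[j] for j in range(2)) for i in range(2)]  # inventory

print("scaling factors:", s)
print("inventory (kg CO2, kg SO2):", g)
```

A unit process with several co-products would add rows without adding columns, which is one way to see why the balance equations can no longer be solved until allocation is applied.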

10.1.3 Life cycle impact assessment

Life Cycle Impact Assessment (LCIA) is the third phase of LCA. It aims at understanding and evaluating the magnitude and significance of the potential environmental impacts of a product system throughout the life cycle of the product. According to ISO 14042, LCIA is divided into mandatory and optional steps. The mandatory steps are as follows.
– Selection of impact categories. The impact category is the central element in life cycle impact assessment. ISO defines the impact category as a class representing environmental issues of concern to which LCI results may be assigned. Impact categories and the corresponding category indicators can be organized at a midpoint level and at an endpoint level along the cause-effect chain. Important environmental impact categories are Acidification Potential (AP), Eutrophication Potential (EP), Global Warming Potential (GWP), and Ozone Depletion Potential (ODP).
– Classification. ISO defines it as the assignment of LCI results to the selected impact categories.
– Characterization. This step converts the results of LCI into a common metric and aggregates the converted LCI results. The result of characterization is a list of numbers. ISO calls these numbers category indicator results (scores).

Many LCIA studies stop at the characterization. The optional steps are as follows.
– Normalization. This refers to calculating the magnitude of category indicator results relative to reference information. It fulfills several functions: it provides insight into the meaning of the impact indicator results, it helps to check for errors, and it prepares for a possible weighting step.

– Grouping. This is seldom seen in LCA studies. ISO defines it as the assignment of impact categories into one or more sets. ISO mentions two ways: sorting on a nominal basis and ranking on an ordinal basis.
– Weighting. This starts with the characterization (or normalization) results. Weighting factors are applied to the characterization indicator results or their normalized version. Weighting produces one final number by

$$W = \sum_{c} WF_c \times I_c,$$

where $I_c$ is the impact score and $WF_c$ is the weighting factor, both for the impact category $c$, and $W$ is the weighted result.
– Data quality analysis. This develops a better understanding of the reliability of the indicator results in the LCIA profile.
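The characterization and weighting steps listed above can be sketched in a few lines of Python. Everything numeric here is a hypothetical placeholder (the inventory, the characterization factors, and the weighting factors), and the function name `characterize` is an assumption for illustration; real studies take these values from a chosen LCIA method.

```python
# Hypothetical inventory results (kg emitted per functional unit).
inventory = {"CO2": 120.0, "CH4": 0.5, "SO2": 14.0, "NOx": 2.0}

# Illustrative characterization factors; classification is implicit in
# which substances appear under which impact category.
cf = {
    "GWP": {"CO2": 1.0, "CH4": 25.0},  # kg CO2-equivalent per kg
    "AP": {"SO2": 1.0, "NOx": 0.7},    # kg SO2-equivalent per kg
}

# Hypothetical weighting factors WF_c, one per impact category.
weights = {"GWP": 0.6, "AP": 0.4}

def characterize(inventory, cf):
    """Convert LCI results to a common metric per category and sum them."""
    return {
        category: sum(factor * inventory.get(substance, 0.0)
                      for substance, factor in factors.items())
        for category, factors in cf.items()
    }

scores = characterize(inventory, cf)  # category indicator results I_c
W = sum(weights[c] * scores[c] for c in scores)  # W = sum_c WF_c * I_c

print(scores)
print(W)
```

In this sketch the weights are applied directly to the characterization results, which carry different units; applying them to normalized results instead would only require dividing each score by a reference value before the final sum.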

10.1.4 Life cycle interpretation

Life cycle interpretation is the final phase of LCA. It integrates the LCI and LCIA results to develop conclusions and recommendations that relate to the goal and scope of the study. ISO 2006a defines two objectives of interpretation:
– Analyze results, reach conclusions, explain limitations, and provide recommendations based on the findings of the preceding phases, and report the results of the interpretation in a transparent manner.
– Provide a readily understandable, complete, and consistent presentation of the results of an LCA study in accordance with the goal and scope of the study.

LCA is best used as an iterative approach. In particular, if the results of the impact assessment or the underlying inventory data are incomplete or unacceptable for drawing conclusions and making recommendations, then the previous steps must be repeated until the results can support the original goals of the study.

10.2 Exergetic life cycle assessment

Exergy is a thermodynamic quantity. The exergy of a system is defined as the maximum shaft work that can be obtained as the system comes into equilibrium with a reference environment. Exergy is conserved only when all processes occurring in a system and its surroundings are reversible. Exergy is destroyed whenever an irreversible process occurs. Like energy, exergy can also be transferred across the boundary of a system. There is an exergy transfer corresponding to each type of energy transfer. The exergy transfer associated with shaft work is equal to the shaft work, while the exergy transfer associated with heat transfer depends on the temperature of the reference environment. A system in complete equilibrium with its environment has no exergy.


Exergy analysis is based on the second law of thermodynamics. An exergy balance for a process or system is the following:

Accumulated exergy = Input exergy − Output exergy − Destroyed exergy.

The exergy quantities in an exergy balance include the exergy of a matter flow, of a thermal energy transfer, and of electricity. The exergy of a matter flow is equal to the sum of its physical, chemical, kinetic, and potential components. The exergy associated with a thermal energy transfer depends on the temperatures of the system and the reference environment. The exergy associated with electricity is equal to the energy.

Reducing exergy losses or increasing exergy efficiencies can often decrease the environmental impacts associated with systems or processes. Exergy losses occur during the lifetime of a product or process, and reducing them helps improve sustainability. As exergy efficiency approaches ideality, environmental impacts approach zero. As exergy efficiency approaches zero, sustainability approaches zero. Exergy losses and environmental impacts generally occur during all phases of the life cycle. Their main connections are waste exergy emissions, resource degradation, and order destruction and chaos creation.

Exergetic life cycle assessment (ExLCA) is a different approach from LCA. Like LCA, it is a useful analytical tool to identify, quantify, and decrease the overall environmental impact of a process or a system. The general methodological framework of ExLCA is similar to that of LCA. The main differences are the following:
(a) The inventory analysis of ExLCA is more detailed than that of LCA. All inputs and outputs must be identified and quantified. The material and energy balances have to be closed.
(b) The impact assessment of ExLCA focuses on the determination of the exergies of the flows, the exergy destructions, and the exergy efficiencies of the overall process and its subprocesses.
(c) The improvement analysis in ExLCA is intended to reduce the life cycle irreversibilities of the product or process.

The sum of all exergy destructions in the life cycle gives the life cycle irreversibility of the product or process. Throughout ExLCA, the calculation of exergy values requires that the conditions and composition of the reference environment are specified.
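The exergy balance stated earlier in this section can be illustrated with a small steady-state calculation. The sketch below is an assumption-laden illustration: the function names and all numbers are hypothetical, and the thermal term uses the standard Carnot-factor relation (1 − T0/T)·Q, which the text refers to only qualitatively.

```python
def thermal_exergy(q, t, t0):
    """Exergy (J) of heat q (J) transferred at temperature t (K),
    with reference environment temperature t0 (K)."""
    if t <= 0 or t0 <= 0:
        raise ValueError("temperatures must be positive (kelvin)")
    return (1.0 - t0 / t) * q

def destroyed_exergy(exergy_in, exergy_out, accumulated=0.0):
    """Rearranged balance: destroyed = input - output - accumulated."""
    return sum(exergy_in) - sum(exergy_out) - accumulated

T0 = 300.0                                    # reference environment, K
heat_in = thermal_exergy(1000.0, 600.0, T0)   # exergy of 1000 J heat at 600 K
electricity_in = 200.0                        # exergy of electricity = energy
work_out = 400.0                              # shaft work delivered, J

# Steady state is assumed, so no exergy accumulates in the system.
destroyed = destroyed_exergy([heat_in, electricity_in], [work_out])
efficiency = work_out / (heat_in + electricity_in)

print(heat_in, destroyed, round(efficiency, 3))
```

Summing the `destroyed` term over every subprocess in the life cycle would give the life cycle irreversibility that the improvement analysis of ExLCA tries to reduce.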

10.3 Ecologically-based life cycle assessment

Ecosystem goods and services, such as fresh water, soil, and pollination, are essential to all human activity. These goods and services can be divided into the following categories:
– provisioning services that supply goods;
– regulating services that provide benefits by controlling ecosystem processes;
– cultural services that are all of the intangible benefits;
– supporting services that are required for all of the other ecosystem services to take place.

Twenty-four ecosystem services relating to provisioning, regulating, and cultural services are included in the Millennium Ecosystem Assessment (MEA). Of these twenty-four services, fifteen have degraded globally over the last fifty years, five show mixed results, and the remaining four have been enhanced. If these trends continue, the earth may no longer be able to sustain human life. This motivates scientists to bring these ecosystem services into the life cycle assessment methodology.

Ecologically-based Life Cycle Assessment (Eco-LCA) is a hybrid LCA method. In a hybrid study, the most important parts of the process are modeled using Process-based LCA (Process LCA), while the less important parts are modeled using Economic Input-Output LCA (EIO-LCA). Process LCA and EIO-LCA are two methods used routinely by LCA practitioners to assess products and processes. Eco-LCA combines the precision of Process LCA with the completeness of EIO-LCA. Eco-LCA contains twenty different ecosystem services: many of its provisioning and regulating services were assessed in the MEA, whereas some of its supporting services were not included in the MEA.

Many engineering analyses consider only energy consumption and emissions and undervalue or completely ignore the role of ecosystems. It is well known that the second law of thermodynamics has profound implications for the capability of man-made technology to meet sustainability goals. The second law says that a decrease in entropy in an open system must be accompanied by a greater increase in entropy in the surroundings. This increase manifests itself as environmental impact, since the surrounding environment must dissipate the local increase in entropy.

Eco-LCA can quantify the role of ecosystem goods and services in the life cycle. Different goods and services have different units that cannot be added together. To compare various ecosystem services, Eco-LCA converts the different units of ecosystem goods and services to a common basis of thermodynamic work using exergy and emergy. Exergy allows the comparison of different resources on a common basis of Joules of work. Emergy is also a basis of comparison. Its unit is the solar emergy Joule (seJ). The seJ allows differences in energy quality between resources to be accounted for. The factors used to convert from the original energy unit to seJ are called transformities (units: seJ/J) or, more generally, Unit Emergy Values (UEV, units: seJ/unit). A transformity indicates how many seJ are required to obtain a single Joule of the product.
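The conversion to a common emergy basis amounts to multiplying each energy flow by its transformity and summing the results. The short sketch below illustrates this; apart from sunlight, whose transformity is 1 seJ/J by definition, the transformities and the energy flows are hypothetical placeholders, and the names used are assumptions for illustration.

```python
# Transformities in seJ/J. Only the sunlight value is definitional;
# the other two are hypothetical placeholders.
transformity = {"sunlight": 1.0, "wind": 1.5e3, "electricity": 1.7e5}

# Hypothetical energy flows used by a process, in J.
energy_use_joules = {"sunlight": 4.0e9, "wind": 2.0e6, "electricity": 3.0e4}

def emergy(energy_j, sej_per_j):
    """Convert an energy flow in J to solar emergy in seJ."""
    return energy_j * sej_per_j

by_resource = {name: emergy(e, transformity[name])
               for name, e in energy_use_joules.items()}
total_emergy = sum(by_resource.values())

print(by_resource)
print(total_emergy)
```

Although the electricity flow is by far the smallest in Joules, its high transformity makes it the largest contribution in seJ here, which is the sense in which emergy accounts for differences in energy quality.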


10.4 Case studies

Life cycle assessment (LCA) has become a major instrument to assess ecological burdens and impacts throughout the consecutive and interlinked stages of a product system, from raw material acquisition or generation from natural resources, through production and use to final disposal. Here we give some case studies as examples.

10.4.1 Energy crops

The need to tackle climate change has pushed the cultivation of energy crops for the production of biofuels to the top of the global agenda. Christoforou et al. (2016) used LCA to quantify the environmental impact of first-generation energy crops in Cyprus, which include maize, sweet sorghum, winter wheat, sugar beet, potato, and winter barley. The goal of the study was to evaluate the environmental impact of different energy crop systems. The life cycle inventory data are classified into pre-farm activities (e.g. fuel consumption, electricity consumption, and transportation of seed, fertilizers, and pesticides) and on-farm activities (e.g. land preparation, fertilizer and pesticide application, irrigation, and harvesting). The main impact categories assessed are global warming potential, acidification potential, eutrophication potential, ozone depletion potential, and abiotic depletion potential. The results show that the barley, potato, and wheat crops had the highest environmental impact in Cyprus.

10.4.2 Wastewater treatment

Wastewater treatment plays a key role in assuring the continued utility of ecosystems. The three main inputs needed to treat wastewater are electricity, chlorine, and transportation services for the solid and sludge waste products. Morrison et al. (2016) investigated a wastewater treatment plant (WWTP) on a university campus. Their analysis begins at the point where raw wastewater from individual campus buildings enters the central wastewater collection system and is gravity-fed to the inlet port of the WWTP, and it extends to the point at which the wastewater has been treated to discharge standards. Based on LCA, Morrison et al. (2016) predicted the total energy consumption and CO2 emissions over the lifetime of the WWTP.

10.4.3 Livestock production

Commercial livestock production has significant impacts on the environment. Pig production is a highly complex global system which involves the production of fertilizers and pesticides for crop production, land transformation, transportation to and from farms, energy for light and heat, water for animal consumption and farmyard washing, and waste management. LCA can be used to measure the potential environmental performance of pig production, including acidification potential, eutrophication potential, and global warming potential (McAuliffe et al. 2016).

Further reading

[1] Bare JC. Developing a Consistent Decision-Making Framework by using U.S. EPA’s TRACI. National Risk Management Research Laboratory, US Environmental Protection Agency, Cincinnati, OH, 2002.
[2] Bare JC, Gloria TP. Critical analysis of the mathematical relationships and comprehensiveness of life cycle impact assessment approaches. Environmental Science & Technology. 2006(40):1104–1113.
[3] Brand G, Braunschweig A et al. Weighting in Ecobalances with the Ecoscarcity Method – Ecofactors. Environment Series, No. 297, Bern, Switzerland, Swiss Agency for the Environment, Forests, and Landscape (SAEFL), 1997.
[4] BUS. Ökobilanzen von Packstoffen. Schriftenreihe Umweltschutz Nr. 24. Bern, Switzerland, Bundesamt für Umweltschutz, 1984.
[5] Christoforou E, Fokaides PA, Koroneos CJ, Recchia L. Life Cycle Assessment of first generation energy crops in arid isolated island states: The case of Cyprus. Sustainable Energy Technologies and Assessments. 2016(14):1–8.
[6] Daniel JJ, Rosen MA. Exergetic environmental assessment of life cycle emissions for various automobiles and fuels. Exergy, an International Journal. 2002(2):283–294.
[7] Dincer I. Thermodynamics, exergy and environmental impact. Energy Sources. 2000(22):723–732.
[8] Dincer I, Rosen MA. Exergy: Energy, Environment and Sustainable Development. Elsevier, UK, 2007.
[9] EC-JRC. Framework and Requirements for Life Cycle Impact Assessment (LCIA) Models and Indicators. ILCD Handbook-International Reference Life Cycle Data System, European Commission-Joint Research Center, 2010.
[10] EC-JRC. An Analysis of existing Environmental Impact Assessment Methodologies for use in Life Cycle Assessment-Background Document. ILCD Handbook-International Reference Life Cycle Data System, European Commission-Joint Research Center, 2010.
[11] EC-JRC. General Guide for Life Cycle Assessment-Detailed Practice. ILCD Handbook-International Reference Life Cycle Data System, European Commission-Joint Research Center, 2010.
[12] Goedkoop M, Demmers M, Collignon M. The Eco-indicator 95 Manual for Designers. National Reuse of Waste Research Programme, The Netherlands, 1996.
[13] Guinée JB, Gorrée M et al. Handbook on Life Cycle Assessment, Operational guide to the ISO Standards. I: LCA in perspective, IIa: Guide, IIb: Operational annex, III: Scientific background. Kluwer Academic Publishers, Dordrecht, 2002.
[14] Heijungs R, Guinee J et al. Environment Life Cycle Assessment of Products: Guide and Background. CML, Leiden, The Netherlands, 1992.
[15] Heijungs R, Kleijn R. Numerical approaches to life-cycle interpretation, five examples. International Journal of Life Cycle Assessment. 2001(6):141–148.
[16] Heijungs R, Suh S. The Computational Structure of Life Cycle Assessment. Kluwer Academic Publishers, Dordrecht, 2002.


[17] Hendrickson CT, Lave LB, Matthews HS. Environmental Life Cycle Assessment of Goods and Services. Washington DC, RFF Press, 2006.
[18] Hermann WA. Quantifying global exergy resources. Energy. 2006(31):1349–1366.
[19] Huesemann MH. The limits of technological solutions to sustainable development. Clean Technologies and Environmental Policy. 2003(5):21–34.
[20] Huijbregts MA, Norris G, Bretz R, Ciroth A, Maurice B, von Bahr B et al. Framework for modelling data uncertainty in life cycle inventories. International Journal of Life Cycle Assessment. 2001(6):127–132.
[21] ISO. Environmental Management – Life Cycle Assessment – Requirements and Guidelines (ISO 14044). International Organization for Standardization, Geneva, 2006.
[22] Jolliet O, Margni M, Charles R, Humbert S, Payet J, Rebitzer G, Rosenbaum R. IMPACT 2002+: A new life cycle impact assessment methodology. International Journal of Life Cycle Assessment. 2003(8):324–330.
[23] Kotas TJ. The Exergy Method of Thermal Plant Analysis. Krieger, Malabar, Florida, 1995.
[24] McAuliffe GA, Chapman DV, Sage CL. A thematic review of life cycle assessment (LCA) applied to pig production. Environmental Impact Assessment Review. 2016(56):12–22.
[25] Millennium Ecosystem Assessment Board. Living Beyond Our Means: Natural Assets and Human Well-being, 2005.
[26] Moran MJ. Availability Analysis: A Guide to Efficient Energy Use. American Society of Mechanical Engineers, New York, 1989.
[27] Morrison M, Srinivasan RS, Ries R. Complementary life cycle assessment of wastewater treatment plants: An integrated approach to comprehensive upstream and downstream impact assessments and its extension to building-level wastewater generation. Sustainable Cities and Society. 2016(23):37–49.
[28] Novick D. The Federal Budget as an Indicator of Government Intentions and the Implications of Intentions. Santa Monica, CA: Rand Corporation, 1959, P-1803.
[29] Parry ML, Canziani OF, Palutikof JP, van der Linden PJ, Hanson CE. Contribution of Working Group II to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge Univ. Press, Cambridge, UK, 2007.
[30] Rosen MA, Dincer I. On exergy and environmental impact. International Journal of Energy Research. 1997(21):643–654.
[31] Rosen MA, Dincer I. Exergy as the confluence of energy, environment and sustainable development. Exergy, an International Journal. 2001(1):3–13.
[32] Steen B. A systematic approach to environmental priority strategies in product development (EPS), Version 2000-General system characteristics. CPM report 1999: 4, Center for Environmental Assessment, Chalmers Univ. of Technology, Gothenburg, Sweden, 1999.
[33] Ulgiati S. Energy quality, emergy, and transformity: H.T. Odum’s contributions to quantifying and understanding systems. Ecological Modelling. 2004(178):201–213.
[34] Zhang Y, Baral A, Bakshi BR. Accounting for ecosystem services in life cycle assessment, Part II: Toward an ecologically based LCA. Environmental Science & Technology. 2010(44):2624–2631.
[35] Zhang Y, Singh S, Bakshi BR. Accounting for ecosystem services in life cycle assessment, Part I: A critical review. Environmental Science & Technology. 2010(44):2232–2242.

Index adjacency matrix 59 analysis of covariance 131 analysis of variance 131 approximation 63 approximation error 84 approximation for random processes 69 AR model 30 ARMA model 1, 17, 32 attractor 46 autocorrelation function 2 autocovariance function 2 autoregressive moving average 1, 17 B-spline 115 Banker–Charnes–Cooper (BCC) DEA model 252 Battle–Lemarie wavelet 89 Bayes method 135 best approximation 66 bivariate interpolation 118 bivariate nonlinear equation 158 box-counting dimension 53 building design 240 butterfly effect 45 Calderon–Zygmund operator 182 canonical correlation analysis 142 carbon emissions reduction 265 chance node 271 chaotic domain 50 Charnes–Cooper–Rhodes (CCR) DEA model 243 Chebyshev distance 138 Chebyshev polynomial 79 circular motion 46 classic life cycle assessment 312 classical Galerkin method 176 climate reconstruction 128 cluster analysis 137 clustering coefficient 60 coal mining 240, 265 coal-fired power plant 265 complicated Simpson formula 149 complicated trapezoidal formula 149 consequence 271 correlation dimension 54 covariance function 1

data compression 97 data envelopment analysis 243 Daubechies wavelet 89 DEA 243 decision node 271 decision rule 267 decision tree 271 delay embedding theorem 57 delay embedding vector 56 difference method 163 dimension curse 63 dimensionality reduction 72 directional distance function model 264 directional distance function super efficiency model 263 discrete Fourier transform 116 discriminant analysis 132 dissipative system 46 distance 61 dual simplex method 203 Durbin–Levinson algorithm 8 dynamical system 45 ε-constraint method 282 ecologically-based life cycle assessment (Eco-LCA) 317 electric energy generating system 241 elliptic equation 164 energy crops 318 epsilon-based measure (EBM) model 259, 260 Euclidean distance 138 Euler method 161 exergetic life cycle assessment (ExLCA) 315, 316 exergy 315 expected monetary value (EMV) 271 expected value of opportunity loss (EOL) 271 factor analysis 143 fast Fourier transform 116 Fermat rule 222 finite element method 167 Fisher method 134 fixed point 47 fixed point principle 155 Fourier power spectrum 15 Fourier series 63

322 | Index

Fourier transform 87 fractal dimension 51 fractal structure 50 fractile method 274 free disposal hull DEA model 257 Gauss–Legendre quadrature formula 150 generalized Lagrange multiplier 283 global clustering coefficient 60 global recurrence rate 59 greedy algorithm 98 Haar wavelet 89 Henon map 47 Hermite interpolation 110 Hermite polynomial 82 high-dimensional wavelet 90 Hurwitz rule 268 hybrid model 261 hyperbolic equation 166 information dimension 54 innovation algorithm 10 inoperability input-output model 307 input-oriented BCC DEA model 252 input-oriented CCR DEA model 243 integral equation 180 interpolation 102 iron and steel industry 241 iterative method 155 Jacobian iterative method 159 Jacobian polynomial 81 joint optimality and sensitivity multiobjective model 288 Kalman filtering 43 Kalman prediction 42 Kalman smoothing 43 Karush–Kuhn–Tucker (KKT) condition 225 Lagrange interpolation 106 Lagrange interpolation formula 106 Laguerre polynomial 82 LCA 312 LCI 313 LCIA 314 Legendre polynomial 81 Leontief’s input-output model 307

Leslie model 304 life cycle assessment (LCA) 312 life cycle impact assessment 312, 314 life cycle interpretation 312, 315 life cycle inventory analysis 312, 313 linear mapping 48 linear process 4 linear regression 122 linearly constrained convex optimization model 230 linearly nonnegative constrained convex optimization model 231 livestock production 318 local clustering coefficient 60 local recurrence rate 59 logistic map 49 Lyapunov exponent 50 MA model 32 Mahalanobis distance 138 Mahalanobis distance method 132 Mahley approximation 86 MaxDEA 264 maximum likelihood estimation 33 mean function 1 Meyer wavelet 89 Minkowski distance 137 modeling 30 modified slack-based measure (MSBM) model 258 most likely value (MLV) 271 MRA 88 multiobjective multistage impact analysis (MMIA) method 294 multiobjective optimization model 282 multiobjective risk impact analysis (MRIA) method 296 multiple regression 125 multiresolution analysis 88 multiresolution approximation 95 multivariate approximation 72 multivariate ARMA process 34 N-term approximation 96 Newton interpolation formula 107, 108 Newton’s method 185 Newton–Raphson method 191 nonlinear regression model 125

numerical differentiation 152 numerical integration 148 observed trajectory 58 optimistic rule 268 ordinary differential equation 161, 163 orthogonal greedy algorithm 100 output-oriented BCC DEA model 253 output-oriented CCR DEA model 249 Padé approximation 85 parabolic equation 165 partial autocorrelation function 24 partitioned multiobjective risk (PMR) 291 path 61 pessimistic rule 267 polynomial approximation 76 polynomial fitting 102 prediction 6, 26, 55 primal and dual pairs of linear optimization 233 principal component analysis 139 pure greedy algorithm 98 radial super efficiency model 262 random walk 3 rational approximation 82, 85 recurrence matrix 59 recurrence network 58 relaxation iterative method 160 relaxed greedy algorithm 99 Richardson extrapolation 154 risk assessment 267 saddle function 233 saddle point 236 sample autocovariance function 5 SBM super efficiency model 263 scaling function 88 Seidel iterative method 160 self-similarity dimension 54 Shepard interpolation 119 simplex method 198 Simpson formula 149 singular spectrum analysis 57 slack-based measure DEA model 257 spectral analysis 13 spectral density 13, 25 spectral estimation 14

spline approximation 82 spline interpolation 112 state-space model 39 stationary 1 stationary vector process 35 steepest descent method 185 strange attractor 50 strictly stationary 2 super efficiency model 262 supply chain planning 240 system of linear equations 159 system of ordinary differential equations 163 thresholding value method 97 time series 1 time series analysis 1 time-invariant linear filter 16 trade-off function 283 trapezoidal formula 149 trapezoidal method 162 triangular distribution method 279 trigonometric approximation 63 trigonometric interpolation 116 uncertainty sensitivity index method (USIM) 287 univariate nonlinear equation 156 variational method 192 vector AR process 37 vector ARMA process 35 vector MA process 36 wastewater treatment 318 wavelet 88 wavelet approximation 86 wavelet basis 88 wavelet coefficient 90 wavelet decomposition formula 93 wavelet packet 92 wavelet-Galerkin method 176 weighted modified slack-based measure (WMSBM) model 259 weighted slack-based measure (WSBM) model 259 white noise 3 Wold decomposition 12 Yule–Walker equation 20, 38