The Statistical Analysis of Time Series 0471047457, 9780471047452

The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. Wi


English Pages 720 [716]



Author / Uploaded
Theodore W. Anderson


The Statistical Analysis of Time Series

The Statistical Analysis of Time Series T. W. ANDERSON

Professor of Statistics and Economics Stanford University

Wiley Classics Library Edition Published 1994

A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Brisbane Toronto Singapore

A NOTE TO THE READER

This book has been electronically reproduced from digital information stored at John Wiley & Sons, Inc. We are pleased that the use of this new technology will enable us to keep works of enduring scholarly value in print as long as there is a reasonable demand for them. The content of this book is identical to previous printings.

This text is printed on acid-free paper. Copyright © 1971 by John Wiley & Sons, Inc. Wiley Classics Library Edition Published 1994.

All rights reserved. Published simultaneously in Canada. Reproduction or translation of any part of this work beyond that permitted by Section 107 or 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful. Requests for permission or further information should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012. This work was supported by the Logistics and Mathematical Statistics Branch of the Office of Naval Research under Contracts Nonr-3279(00), NR 042-219; Nonr-266(33), NR 042-034; Nonr-4195(00), NR 042-233; Nonr-4259(08), NR 042-034; Nonr-225(53), NR 042-002; N 00014-67-A-0012-0030, NR 042-034. ISBN 0-471-04745-7 Library of Congress Catalog Number 70-126222

To

DOROTHY

Preface

In writing a book on the statistical analysis of time series an author has a choice of points of view. My selection is the mathematical theory of statistical inference concerning probabilistic models that are assumed to generate observed time series. The probability model may involve a deterministic trend and a random part constituting a stationary stochastic process; the statistical problems treated have to do with aspects of such trends and processes. Where possible, the problem is posed as one of finding an optimum procedure and such procedures are derived. The statistical properties of the various methods are studied; in many cases they can be developed only in terms of large samples, that is, on information from series observed a long time. In general these properties are derived on a rigorous mathematical basis. While the theory is developed under appropriate mathematical assumptions, the methods may be used where these assumptions are not strictly satisfied. It can be expected that in many cases the properties of the procedures will hold approximately. In any event the precisely stated results of the theorems give some guidelines for the use of the procedures. Some examples of the application of the methods are given, and the uses, computational approaches, and interpretations are discussed, but there is no attempt to put the methods in the form of programs for computers. This book grew out of a graduate course that I gave for many years at Columbia University, usually for one semester and occasionally for two semesters. By now the material included in the book cannot be covered completely in a two-semester course; an instructor using this book as a text will select the material that he feels most interesting and important. Many exercises are given. Some of these are applications of the methods described; some of the problems are to work out special cases of the general theory; some of the exercises fill in details in complicated proofs; and some extend the theory.


Besides serving as a text book I hope this book will furnish a means by which statisticians and other persons can learn about time series analysis without resort to a formal course. Reading this book and doing selected exercises will lead to a considerable knowledge of statistical methodology useful for the analysis of time series. This book may also serve for reference. Much material which has not been assembled together before is presented here in a fairly coherent fashion. Some new theorems and methods are presented. In other cases the assumptions of previously stated propositions have been weakened and conclusions strengthened. Since the area of time series analysis is so wide, an author must select the topics he will include. I have described in the introduction (Chapter 1) the material included as well as the limitations, and the Table of Contents also gives an indication. It is hoped that the more basic and important topics are treated here, though to some extent the coverage is a matter of taste. New methods are constantly being introduced and points of view are changing; the results here can hardly be definitive. In fact, some material included may at the present time be rather of historical interest. In view of the length of this book a few words of advice to readers and instructors may be useful in selecting material to study and teach. Chapter 2 is a self-contained summary of the methods of least squares; it may be largely redundant for many statisticians. Chapters 3 and 4 deal with models with independent random terms (known sometimes as “errors in variables”); some ideas and analysis are introduced which are used later, but the reader interested mainly in the later chapters can pass over a good deal (including much of Sections 3.4, 4.3, and 4.4). 
Autoregressive processes, which are useful in applications and which illustrate stationary stochastic processes, are treated in Chapter 5 ; Sections 5.5 and 5.6 on large-sample theory contain relevant theorems, but the proofs involve considerable detail and can be omitted. Statistical inference for these models is basic to analysis of stationary processes “in the time domain.” Chapter 6 is an extensive study of serial correlation and tests of independence; Sections 6.3 and 6.4 are primarily of theoretical statistical interest; Section 6.5 develops the algebra of quadratic forms and ratios of them; distributions, moments, and approximate distributions are obtained in Sections 6.7 and 6.8, and tables of significance points are given for tests. The first five sections of Chapter 7 constitute an introduction to stationary stochastic processes and their spectral distribution functions and densities. Chapter 8 develops the theory of statistics pertaining to stationary stochastic processes. Estimation of the spectral density is treated in Chapter 9; it forms the basis of analyzing stationary processes “in the frequency domain.” Section 10.2 extends regression analysis (Chapter 2) to stationary random terms; Section


10.3 extends Chapters 8 and 9 to this case; and Section 10.4 extends Chapter 6 to the case of residuals from fitted trends. Parts of the book that constitute units which may be read somewhat independently from other parts are (i) Chapter 2, (ii) Chapters 3 and 4, (iii) Chapter 5 , (iv) Chapter 6, (v) Chapter 7, and (vi) Chapters 8 and 9. The statistical analysis of time series in practical applications will also invoke less formal techniques (which are now sometimes called “data analysis”). A graphical presentation of an observed time series contributes to understanding the phenomenon. Transformations of the measurement and relations to other data may be useful. The rather precisely stated procedures studied in this book will not usually be used in isolation and may be adapted for various situations. However, in order to investigate statistical methods rigorously within a mathematical framework some aspects of the analysis are formalized. For instance, the determination of whether an effect is large enough to be important is sometimes formalized as testing the hypothesis that a parameter is 0. The level of this book is roughly that of my earlier book, An Introduction to Multivariate Statistical Analysis. Some knowledge of matrix algebra is needed. (The necessary material is given in the appendix of my earlier book; additional results are developed in the text and exercises of this present book.) A general knowledge of statistical methodology is useful; in particular, the reader is expected to know the standard material of univariate analysis such as t-tests and F-statistics, the multivariate normal distribution, and the elementary ideas of estimation and testing hypotheses. Some more sophisticated theory of testing hypotheses, estimation, and decision theory that is referred to is developed in the exercises. [The reader is referred to Lehmann (1959) for a detailed and rigorous treatment of testing hypotheses.] A moderate knowledge of advanced calculus is assumed. 
Although real-valued time series are treated, it is sometimes convenient to write expressions in terms of complex variables; actually the theory of complex variables is not used beyond the simple fact that $e^{i\theta} = \cos\theta + i\sin\theta$ (except for one problem). Probability theory is used to the extent of characteristic functions and some basic limit theorems. The theory of stochastic processes is developed to the extent that it is needed. As noted above, there are many problems posed at the end of each chapter except the first which is the introduction. Solutions to these problems have been prepared by Paul Shaman. Solutions which are referred to in the text or which demonstrate some particularly important point are printed in Appendix B of this book. Solutions to most other problems (except solutions that are straightforward and easy) are contained in a Solutions Manual which is issued as a separate booklet. This booklet is available free of charge by writing to the publisher.


I owe a great debt of gratitude to Paul Shaman for many contributions to this book in matters of exposition, selection of material, suggestions of references and problems, improvements of proofs and exposition, and corrections of errors of every magnitude. He has read my manuscript in many versions and drafts. The conventional statement that an acknowledged reader of a manuscript is not responsible for any errors in the publication I feel is usually superfluous because it is obvious that anyone kind enough to look at a manuscript assumes no such responsibility. Here such a disclaimer may be called for simply because Paul Shaman corrected so many errors that it is hard to believe any remain. However, I admit that in this material it is easy to generate errors and the reader should throw the blame on the author for any he finds (as well as inform him of them). My appreciation also goes to David Hinkley, Takamitsu Sawa, and George Styan, who read all or substantial parts of the manuscript and proofs and assisted with the preparation of the bibliography and index. There are many other colleagues and students to thank for assistance of various kinds. They include Selwyn Gallot, Joseph Gastwirth, Vernon Johns, Ted Matthes, Emily Stong Myers, Emanuel Parzen, Lloyd Rosenberg, Ester Samuel, and Morris Walker as well as Anupam Basu, Nancy David, Ronald Glaser, Elizabeth Hinkley, Raul Mentz, Fred Nold, Arthur V. Peterson, Jr., Cheryl Schiffman, Kenneth Thompson, Roger Ward, Larry Weldon, and Owen Whitby. No doubt I have forgotten others. I also wish to thank J. M. Craddock, C. W. I. Granger, M. G. Kendall, A. Stuart, and Herman Wold for use of some material. In preparing the typescript my greatest debt is to Pamela Oline Gerstman, my secretary for four years, who patiently went through innumerable drafts and revisions. (Among other tribulations her office was used as headquarters for the “liberators” of Fayerweather Hall in the spring of 1968.) The manuscript also bore the imprints of Helen Bellows, Shauneen Nelson, Carol Andermann Novak, Katherine Cane, Carol Hallett Robbins, Alexandra Mills, Susan Parry, Noreen Browne Ettl, Sandi Hochler Frost, Judi Campbell, and Polly Bixler. An important factor in my writing this book was the sustained financial support of the Office of Naval Research over a period of some ten years. The Logistics and Mathematical Statistics Branch has been helpful, encouraging, accommodating and patient in its sponsorship.

Stanford University, Stanford, California
February, 1970

T. W. ANDERSON

Contents

CHAPTER 1 Introduction
    References

CHAPTER 2 The Use of Regression Analysis
    2.1. Introduction
    2.2. The General Theory of Least Squares
    2.3. Linear Transformations of Independent Variables; Orthogonal Independent Variables
    2.4. Case of Correlation
    2.5. Prediction
    2.6. Asymptotic Theory
    References
    Problems

CHAPTER 3 Trends and Smoothing
    3.1. Introduction
    3.2. Polynomial Trends
    3.3. Smoothing
    3.4. The Variate Difference Method
    3.5. Nonlinear Trends
    3.6. Discussion
    References
    Problems

CHAPTER 4 Cyclical Trends
    4.1. Introduction
    4.2. Transformations and Representations
    4.3. Statistical Inference When the Periods of the Trend Are Integral Divisors of the Series Length
    4.4. Statistical Inference When the Periods of the Trend Are Not Integral Divisors of the Series Length
    4.5. Discussion
    References
    Problems

CHAPTER 5 Linear Stochastic Models with Finite Numbers of Parameters
    5.1. Introduction
    5.2. Autoregressive Processes
    5.3. Reduction of a General Scalar Equation to a First-Order Vector Equation
    5.4. Maximum Likelihood Estimates in the Case of the Normal Distribution
    5.5. The Asymptotic Distribution of the Maximum Likelihood Estimates
    5.6. Statistical Inference about Autoregressive Models Based on Large-Sample Theory
    5.7. The Moving Average Model
    5.8. Autoregressive Processes with Moving Average Residuals
    5.9. Some Examples
    5.10. Discussion
    References
    Problems

CHAPTER 6 Serial Correlation
    6.1. Introduction
    6.2. Types of Models
    6.3. Uniformly Most Powerful Tests of a Given Order
    6.4. Choice of the Order of Dependence as a Multiple-Decision Problem
    6.5. Models: Systems of Quadratic Forms
    6.6. Cases of Unknown Means
    6.7. Distributions of Serial Correlation Coefficients
    6.8. Approximate Distributions of Serial Correlation Coefficients
    6.9. Joint and Conditional Distributions of Serial Correlation Coefficients
    6.10. Distributions When the Observations Are Not Independent
    6.11. Some Maximum Likelihood Estimates
    6.12. Discussion
    References
    Problems

CHAPTER 7 Stationary Stochastic Processes
    7.1. Introduction
    7.2. Definitions and Discussion of Stationary Stochastic Processes
    7.3. The Spectral Distribution Function and the Spectral Density
    7.4. The Spectral Representation of a Stationary Stochastic Process
    7.5. Linear Operations on Stationary Processes
    7.6. Hilbert Space and Prediction Theory
    7.7. Some Limit Theorems
    References
    Problems

CHAPTER 8 The Sample Mean, Covariances, and Spectral Density
    8.1. Introduction
    8.2. Definitions of the Sample Mean, Covariances, and Spectral Density, and their Moments
    8.3. Asymptotic Means and Covariances of the Sample Mean, Covariances, and Spectral Density
    8.4. Asymptotic Distributions of the Sample Mean, Covariances, and Spectral Density
    8.5. Examples
    8.6. Discussion
    References
    Problems

CHAPTER 9 Estimation of the Spectral Density
    9.1. Introduction
    9.2. Estimates Based on Sample Covariances
    9.3. Asymptotic Means and Covariances of Spectral Estimates
    9.4. Asymptotic Normality of Spectral Estimates
    9.5. Examples
    9.6. Discussion
    References
    Problems

CHAPTER 10 Linear Trends with Stationary Random Terms
    10.1. Introduction
    10.2. Efficient Estimation of Trend Functions
    10.3. Estimation of the Covariances and Spectral Density Based on Residuals from Trends
    10.4. Testing Independence
    References
    Problems

APPENDIX A Some Data and Statistics
    A.1. The Beveridge Wheat Price Index
    A.2. Three Second-Order Autoregressive Processes Generated by Random Numbers
    A.3. Sunspot Numbers

APPENDIX B Solutions to Selected Problems

Bibliography
    Books
    Papers

Index


CHAPTER 1

Introduction A time series is a sequence of observations, usually ordered in time, although in some cases the ordering may be according to another dimension. The feature of time series analysis which distinguishes it from other statistical analyses is the explicit recognition of the importance of the order in which the observations are made. While in many problems the observations are statistically independent, in time series successive observations may be dependent, and the dependence may depend on the positions in the sequence. The nature of a series and the structure of its generating process may also involve in other ways the sequence in which the observations are taken. In almost every area there are phenomena whose development and variation with the passing of time are of interest and importance. In daily life one is interested in aspects of the weather, in prices that one pays, and in features of one’s health; these change in time. There are characteristics of a nation, affecting many individuals, such as economic conditions and population, that evolve and fluctuate over time. The activity of a business, the condition of an industrial process, the level of a person’s sleep, and the reception of a television program vary chronologically. The measurement of some particular characteristic over a period of time constitutes a time series. It may be an hourly record of temperature at a given place or the annual rainfall at a meteorological station. It may be a quarterly record of gross national product; an electrocardiogram may compose several time series. There are various purposes for using time series. The objective may be the prediction of the future based on knowledge of the past; the goal may be the control of the process producing the series; it may be to obtain an understanding of the mechanism generating the series; or simply a succinct description of the salient features of the series may be desired. 
As statisticians we shall be interested in statistical inference; on the basis of a limited amount of information, a time series of finite length, we wish to make inferences about the probabilistic mechanism that produced the series; we want to analyze the underlying structure. In principle the measurement of many quantities, such as temperature and voltage, can be made continuously and sometimes is recorded continuously in the form of a graph. In practice, however, the measurements are often made discretely in time; in other cases, such as the annual yield of grain per acre, the measurement must be made at definite intervals of time. Even if the data are recorded continuously in time only the values at discrete intervals can be used for digital computations. In this book we shall confine ourselves to time series that are recorded discretely in time, that is, at regular intervals, such as barometric pressure recorded each hour on the hour. Although the effect of one quantity on another and the interaction of several characteristics over time are often of consequence, in many studies a great deal of knowledge may be gained by the investigation of a single time series; this book (except with respect to autoregressive systems) is concerned with statistical methods for analyzing a univariate time series, that is, one type of measurement made repeatedly on the same object or individual. We shall, furthermore, suppose that the measurement is a real number, such as temperature, which is not limited to a finite (or denumerable) number of values; such a measurement is often called a continuous variable. Some measurements we treat mathematically as if they were continuous in time; for example, annual national income can at best be measured to the nearest penny, but the number of values that this quantity can take on is so large that there is no serious slight to reality in considering the variable as continuous.
Moreover, we shall consider series which are rather stable, that is, ones which tend to stay within certain bounds or at least are changing slowly, not explosively or abruptly; we would include many meteorological variables, for instance, but would exclude shock waves. Let an observed time series be $y_1, y_2, \ldots, y_T$. The notation means that we have $T$ numbers, which are observations on some variable made at $T$ equally distant time points, which for convenience we label $1, 2, \ldots, T$. A fairly general mathematical, statistical, or probabilistic model for the time series can be written as follows:

$$ y_t = f(t) + u_t, \qquad t = 1, 2, \ldots, T. \tag{1} $$

The observed series is considered as made up of a completely determined sequence $\{f(t)\}$, which may be called the systematic part, and of a random or stochastic sequence $\{u_t\}$, which obeys a probability law. (Signal and noise are sometimes used for these two components.) These two components of the observed series are not observable; they are theoretical quantities. For example,

if the measurements are daily rainfall, the $f(t)$ might be the climatic norm, which is the long-run average over many years, and the $u_t$ would represent those vagaries and irregularities in the weather that describe fluctuations from the climatic norm. Exactly what the decomposition means depends not only on the data, but, in part, depends on what is thought of as repetitions of the experiment giving rise to the data. The interpretation of the random part made here is the so-called “frequency” interpretation. In principle one can repeat the entire situation, obtaining a new set of observations; the $f(t)$ would be the same as before, but the random terms would be different, as a new realization of the stochastic process. The random terms may include errors of observation. [In effect $f(t) = \mathcal{E} y_t$.] We have some intuitive ideas of what time should mean in such a model or process. One notion is that time proceeds progressively in one direction. Another is that events which are close together in time should be relatively highly related and happenings farther apart in time should not be as strongly related. The effect of time in the mathematical model (1) can be inserted into specifications of the function or sequence $f(t)$; it can be put into the formulation of the probability process that defines the random term $u_t$; or it can be put into both components. The first part of this book will be devoted to time series represented by “error” models, in which the observations are considered to be independent random deviations from some function representing trend. In the second part we shall be concerned with sequences of dependent random variables, in general stationary stochastic processes with particular emphasis on autoregressive processes. Finally, we shall treat models in which there is a trend and the random terms constitute a stationary stochastic process. Stationary stochastic processes are explained in Chapter 7.
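The frequency interpretation described above can be sketched numerically: if the whole series is generated again and again with the same systematic part, the average of $y_t$ over repetitions should settle near $f(t)$, in line with $f(t) = \mathcal{E} y_t$. The following is a minimal Python sketch, not a procedure from the text. The linear form of `f`, the normal distribution of the random terms, the number of repetitions, the seed, and all function names are assumptions chosen only for illustration.

```python
import random


def f(t):
    """Hypothetical systematic part (an assumed linear 'norm', for illustration only)."""
    return 10.0 + 0.5 * t


def realization(T, rng):
    """One realization y_1..y_T of y_t = f(t) + u_t with independent N(0, 1) terms."""
    return [f(t) + rng.gauss(0.0, 1.0) for t in range(1, T + 1)]


def average_over_repetitions(T=5, repetitions=20000, seed=1):
    """Average y_t across independent repetitions of the entire series."""
    rng = random.Random(seed)
    totals = [0.0] * T
    for _ in range(repetitions):
        for i, y in enumerate(realization(T, rng)):
            totals[i] += y
    return [total / repetitions for total in totals]


means = average_over_repetitions()
for t, m in enumerate(means, start=1):
    print(f"t={t}  f(t)={f(t):.2f}  average of y_t={m:.2f}")
```

Each repetition keeps `f` fixed and draws fresh random terms, which is the sense in which the two components are theoretical quantities rather than observed ones.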
In many cases the model may be completely specified except for a finite number of parameters; in such a case the problems of statistical inference concern these parameters. In other cases the model may be more loosely defined and the corresponding methods are nonparametric. The model is to represent the mechanism generating the relevant series reasonably well, but as a mathematical abstraction the model is only an approximation to reality. How precisely the model can be determined depends on the state of knowledge about the process being studied, and, correspondingly, the information that can be supplied by statistical analysis depends on this state of knowledge. In this book many methods and their properties will be described, but, of course, these are only a selection from the many methods which are useful and available. Here the emphasis is on statistical inference and its mathematical basis. The early development of time series analysis was based on models in which the effect of time was made in the systematic part, but not in the random part.


For convenience this case might be termed the classical model, because in a way it goes back to the time when Gauss and others developed least squares theory and methods for use in astronomy and physics. In this case we assume that the random part does not show any effect of time. More specifically, we assume that the mathematical expectation (that is, the mean value) is zero, that the variance is constant, and that the $u_t$ are uncorrelated at different points in time. These specifications essentially force any effects of time to be made in the systematic part $f(t)$. The sequence $f(t)$ may depend on unknown coefficients and known quantities which vary over time. Then $f(t)$ is a “regression function.” Methods of inference for the coefficients in a regression function are useful in many branches of statistics. The cases which are peculiar to time series analysis are those cases in which the quantities varying over time are known functions of $t$. Within the limitations set out we may distinguish two types of sequences in time, $f(t)$. One is a slowly moving function of time, which is often called a trend, particularly by economists, and is exemplified by a polynomial of fairly low degree. Another type of sequence is cyclical; this is exemplified by a finite Fourier series, which is a finite sum of pairs of sine and cosine terms. A pair may be $\alpha \cos \lambda t + \beta \sin \lambda t$ ($0 < \lambda < \pi$), which can also be written as a cosine function, say $\rho \cos(\lambda t - \theta)$. The period of such a function of time is $2\pi/\lambda$; that is, the function repeats itself after $t$ has gone this amount; the frequency is the reciprocal of the period, namely $\lambda/(2\pi)$. The coefficient $\rho = \sqrt{\alpha^2 + \beta^2}$ is the amplitude and $\theta$ is the phase. The observed series is considered to be the sum of such a series $f(t)$ and a random term. Figure 1.1 presents $y_t = 5 + 2\cos(2\pi t/6) + \sin(2\pi t/6) + u_t$, where $u_t$ is normally distributed with mean 0 and variance 1.
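The relations among the coefficients, amplitude, phase, period, and frequency stated above can be checked numerically on the series of Figure 1.1. The following is a minimal Python sketch under stated assumptions: the series length `T = 10`, the seed, and all function names are illustrative choices and do not come from the text. The amplitude and phase are taken as $\rho = \sqrt{\alpha^2 + \beta^2}$ and $\theta = \operatorname{atan2}(\beta, \alpha)$, so that $\rho\cos\theta = \alpha$ and $\rho\sin\theta = \beta$.

```python
import math
import random


def trend(t, alpha=2.0, beta=1.0, level=5.0, lam=2 * math.pi / 6):
    """Systematic part f(t) = level + alpha*cos(lam*t) + beta*sin(lam*t)."""
    return level + alpha * math.cos(lam * t) + beta * math.sin(lam * t)


def amplitude_phase(alpha, beta):
    """Return (rho, theta) such that alpha*cos(x) + beta*sin(x) = rho*cos(x - theta)."""
    rho = math.hypot(alpha, beta)      # sqrt(alpha^2 + beta^2)
    theta = math.atan2(beta, alpha)    # rho*cos(theta) = alpha, rho*sin(theta) = beta
    return rho, theta


def simulate(T=10, seed=0):
    """Draw y_t = f(t) + u_t for t = 1..T with independent u_t ~ N(0, 1)."""
    rng = random.Random(seed)
    return [trend(t) + rng.gauss(0.0, 1.0) for t in range(1, T + 1)]


lam = 2 * math.pi / 6
rho, theta = amplitude_phase(2.0, 1.0)
period = 2 * math.pi / lam        # the function repeats after this many time units
frequency = lam / (2 * math.pi)   # reciprocal of the period
y = simulate()
print(f"amplitude={rho:.4f} phase={theta:.4f} period={period:.1f} frequency={frequency:.4f}")
```

Because the random terms here are independent of everything else, knowing earlier values of `y` adds nothing to `trend(t)` when predicting the next observation.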
[The function $f(t)$ is drawn as a function of a continuous variable $t$.] The successive values of $y_t$ are scattered randomly above and below $f(t)$. If we know this curve and the error distribution, information about $y_1, \ldots, y_{t-1}$ gives us no help in predicting $y_t$; the plot of $f(s)$ for $s > t - 1$ does not depend on $y_1, \ldots, y_{t-1}$. Such a model may be appropriate in certain physical or economic problems. In astronomy, for example, $f(t)$ might be one coordinate in space of a certain planet at time $t$. Because telescopes are not perfect, and because of atmospheric variations, the observation of this coordinate at time $t$ will have a small error. This error of observation does not affect later positions of the planet nor our observations of them. In the case of a freely swinging pendulum the displacement of the pendulum is a trigonometric function $\rho \cos(\lambda t - \theta)$ when measured from its lowest point. One general model with the effect of time represented in the random part is a

Figure 1.1. Series with a trigonometric trend.

stationary stochastic process; we can illustrate this with an autoregressive process. Suppose $y_1$ has some distribution with mean 0; let $y_1$ and $y_2$ have the joint distribution of $y_1$ and $\rho y_1 + u_2$, where $u_2$ is distributed with mean 0, independently of $y_1$. We write $y_2 = \rho y_1 + u_2$. Define in turn the joint distribution of $y_1, y_2, \ldots, y_{t-1}, y_t$ as the joint distribution of $y_1, y_2, \ldots, y_{t-1}, \rho y_{t-1} + u_t$, where $u_t$ is distributed with mean 0, independently of $y_1, \ldots, y_{t-1}$, $t = 3, 4, \ldots$. When the (marginal) distributions of $u_2, u_3, \ldots$ are identical and the distribution of $y_1$ is chosen suitably, $\{y_t\}$ is a stationary stochastic process, in fact, an autoregressive process, and

$$ y_t = \rho y_{t-1} + u_t \tag{2} $$

is a stochastic difference equation of first order. This construction is illustrated in Figure 1.2 for a particular value of $\rho$. In this model the “disturbance” $u_t$ has an effect on $y_t$ and all later $y_s$’s. It follows from the construction that the conditional expectation of $y_t$, given $y_1, \ldots, y_{t-1}$, is

$$ \mathcal{E}(y_t \mid y_1, \ldots, y_{t-1}) = \rho y_{t-1}. \tag{3} $$

(In fact, for a first-order process $y_t$ and $y_{t-2}, \ldots, y_1$ are conditionally independent given $y_{t-1}$.) If we want to predict $y_t$ from $y_1, \ldots, y_{t-1}$ and know the parameter $\rho$, our best prediction (in the sense of minimizing the mean square error) is $\rho y_{t-1}$; in this model knowledge of earlier observations assists in predicting $y_t$. An autoregressive process of second order is obtained by taking the joint

Figure 1.2. Construction of an autoregressive series.

distribution of $y_1, y_2, \ldots, y_{t-1}, y_t$ as the joint distribution of $y_1, y_2, \ldots, y_{t-1}, \rho_1 y_{t-1} + \rho_2 y_{t-2} + u_t$, where $u_t$ is independent of $y_1, y_2, \ldots, y_{t-1}$, $t = 3, 4, \ldots$, and the distribution of $y_1$ and $y_2$ is suitably chosen; graphs of such series are given in Appendix A.2. Graphs of other randomly generated series are given by Kendall and Stuart (1966), Chapter 45, and Wold (1965), Chapter 1. The variable $y_t$ may represent the displacement of a swinging pendulum when it is subjected to random "shocks" or pushes, $u_t$. The values of $y_t$ tend to be a trigonometric function $\rho \cos(\lambda t - \theta)$ but with varying amplitude, varying frequency, and varying phase. An autoregressive process of order 4, generated by $y_t = \sum_{s=1}^{4} \rho_s y_{t-s} + u_t$, tends to be like the sum of two trigonometric functions with varying amplitudes, frequencies, and phases. A general stationary stochastic process can be approximated by an autoregressive process of sufficiently high order, or it can be approximated by a process

(4)    $\sum_{j=1}^{q} (A_j \cos \lambda_j t + B_j \sin \lambda_j t)$,

where $A_1, B_1, \ldots, A_q, B_q$ are independently distributed with $\mathcal{E}A_j = \mathcal{E}B_j = 0$ and $\mathcal{E}A_j^2 = \mathcal{E}B_j^2 = \phi(\lambda_j)$. The process is the sum of $q$ trigonometric functions, whose amplitudes and phases are random variables. On the average the importance of the trigonometric function with frequency $\lambda_j/(2\pi)$ is proportional to the expectation of its squared amplitude, which is $2\phi(\lambda_j)$. In these terms a stationary stochastic process (in a certain class) may be characterized by a spectral density $f(\lambda)$ such that $\int_a^b f(\lambda)\,d\lambda$ is approximated by the sum of $\phi(\lambda_j)$ over $\lambda_j$ such that $a \le \lambda_j < b$. A feature of stationary stochastic processes is that the covariance $\mathcal{E}(y_t - \mathcal{E}y_t)(y_s - \mathcal{E}y_s)$ depends only on the difference in time $|t - s|$ and hence can be denoted by $\sigma(t - s)$. The covariance sequence and the spectral density (when it exists) are alternative ways of describing the


second-order moment structure of a stationary stochastic process. The covariance sequence may be more appropriate and instructive when the time sequence is most relevant, which is the case of many economic series. The spectral density may be more suitable for other analyses. This is particularly true in the physical sciences because the harmonic or trigonometric function of time frequently gives the basic description of a phenomenon. Since the fluctuation of air pressure of a pure tone is given by a cosine function, the natural analysis of sound is in Fourier terms. In particular, the ear operates in this manner in the way it senses pitch.

The effect of time may be present both in the systematic part $f(t)$ as a trend in time and in the random part $u_t$ as a stationary stochastic process. For example, an economic time series may consist of a long-run movement and seasonal variation, which together constitute $f(t)$, and an oscillatory component and other irregularities which constitute $u_t$ and which may be described by an autoregressive process.

When the trend $f(t)$ has a specified structure and depends on a finite number of parameters, we consider problems of inference concerning these parameters. For example, we may estimate the coefficients of powers of $t$ in polynomial trends and the coefficients of sines and cosines in trigonometric trends. In the former case we may want to decide the highest power of $t$ to include, and in the latter case we may want to decide which of several terms to include. When the trend is not specified so exactly, we may use nonparametric methods, such as smoothing, to estimate the trend. When the stochastic process is specified in terms of a finite number of parameters, such as an autoregressive process, we may wish to estimate the coefficients, test hypotheses about them, or decide what order process to use.
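The first-order autoregressive construction and the estimation of its coefficient can be illustrated numerically. The following is a minimal sketch, not part of the original text: it assumes normally distributed disturbances, the illustrative value `rho_true = 0.5`, and a hypothetical helper `simulate_ar1`; the burn-in period is a practical stand-in for choosing the distribution of $y_1$ suitably.

```python
import numpy as np

def simulate_ar1(rho, n, rng, burn_in=500):
    """Build y_t = rho * y_{t-1} + u_t step by step (hypothetical helper)."""
    u = rng.standard_normal(n + burn_in)      # disturbances u_t, mean 0
    y = np.empty(n + burn_in)
    y[0] = u[0]
    for t in range(1, n + burn_in):
        y[t] = rho * y[t - 1] + u[t]
    return y[burn_in:]                         # drop start-up values

rng = np.random.default_rng(0)
rho_true = 0.5                                 # assumed value, for illustration only
y = simulate_ar1(rho_true, 20_000, rng)

# Least squares estimate of rho: regress y_t on y_{t-1} (no constant, mean is 0).
rho_hat = np.dot(y[1:], y[:-1]) / np.dot(y[:-1], y[:-1])

# Mean square error of the predictor rho * y_{t-1} versus the naive predictor 0.
mse_conditional = np.mean((y[1:] - rho_true * y[:-1]) ** 2)
mse_naive = np.mean(y[1:] ** 2)
```

Under these assumptions `rho_hat` should be close to `rho_true`, and the predictor based on the preceding observation should have the smaller mean square error, in line with the remark that earlier observations assist in predicting $y_t$.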
A null hypothesis of particular interest is the independence of the random terms; this hypothesis may be tested by means of a serial correlation coefficient. When the process is stationary, but specified more loosely, we may estimate the covariances $\{\sigma(h)\}$ or the spectral density; these procedures are rather nonparametric. The methods presented here are mainly for inference concerning the structure of the mechanism producing the process. Methods of prediction of the future values of the process when the structure is known are indicated; when the structure is not known, it may be inferred from the data and that inferred structure used for forecasting.

REFERENCES

Kendall and Stuart (1966), Wold (1965).

The Statistical Analysis of Time Series by T. W. Anderson Copyright © 1971 John Wiley & Sons, Inc.

CHAPTER 2

The Use of Regression Analysis

2.1. INTRODUCTION

Many of the statistical techniques used in time series analysis are those of regression analysis (classical least squares theory) or are adaptations or analogues of them. The independent variables may be given functions of time, such as powers of time $t$ or trigonometric functions of $t$. We shall first summarize statistical procedures when the random terms are uncorrelated (Sections 2.2 and 2.3); these results are used in trend analysis (Chapters 3 and 4). Then these procedures are modified for the case of the random terms having an arbitrary covariance matrix, which is known to within a proportionality factor (Section 2.4). Statistical procedures for analyzing trends when the random terms constitute a stationary stochastic process are studied in Chapter 10. Some large-sample theory for regression analysis which holds when the random terms are not necessarily normally distributed is developed (Section 2.6); generalizations of these results are useful for estimates of coefficients of stochastic difference equations (Chapter 5) because exact distribution theory is unmanageable; other generalizations are needed when the random terms constitute a more general stationary process (Section 10.2).

2.2. THE GENERAL THEORY OF LEAST SQUARES

Consider random variables $y_1, y_2, \ldots, y_T$ which are uncorrelated and have means and variances

(1)    $\mathcal{E}y_t = \sum_{i=1}^{p} \beta_i z_{it}$,  $t = 1, \ldots, T$,

(2)    $\mathcal{E}(y_t - \mathcal{E}y_t)^2 = \sigma^2$,  $t = 1, \ldots, T$,


where the $z_{it}$'s are given numbers. The $z_{it}$'s are called independent variables and the $y_t$'s are called dependent variables. If we introduce the following vector notation

(3)    $\beta = (\beta_1, \ldots, \beta_p)'$,  $z_t = (z_{1t}, \ldots, z_{pt})'$,

the means (1) can be written

(4)    $\mathcal{E}y_t = \beta' z_t$,  $t = 1, \ldots, T$.

(The transpose of any vector or matrix $a$ is indicated by $a'$.) The estimate of $\beta$, denoted by $b$, is the solution to the normal equation

(5)    $Ab = c$,

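where

(6)    $A = \sum_{t=1}^{T} z_t z_t'$,  $c = \sum_{t=1}^{T} z_t y_t$,

and $A$ is assumed nonsingular (and thus $T \ge p$). The vector $b = A^{-1}c$ minimizes $\sum_{t=1}^{T} (y_t - \tilde\beta' z_t)^2$ with respect to vectors $\tilde\beta$ and is called the least squares estimate of $\beta$. An unbiased estimate $s^2$ of $\sigma^2$ may be obtained (when $T > p$) from

(7)    $(T - p)s^2 = \sum_{t=1}^{T} (y_t - b'z_t)^2 = \sum_{t=1}^{T} y_t^2 - b'Ab$.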

The least squares estimate $b$ is an unbiased estimate of $\beta$,

(8)    $\mathcal{E}b = \beta$,

with covariance matrix

(9)    $\mathcal{E}(b - \beta)(b - \beta)' = \sigma^2 A^{-1}$.

The Gauss-Markov Theorem states that the elements of $b$ are the best linear unbiased estimates (BLUE) of the corresponding components of $\beta$ in the sense that each element of $b$ has the minimum variance of all unbiased estimates of the corresponding element of $\beta$ that are linear in $y_1, \ldots, y_T$.
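The quantities in (5) to (9) can be computed directly. The sketch below is illustrative only: the quadratic trend regressors, the coefficient vector `beta`, and the error scale `sigma` are assumptions chosen for the example, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
T, p = 50, 3
t = np.arange(1, T + 1, dtype=float)
Z = np.column_stack([np.ones(T), t, t ** 2])   # row t of Z is z_t'
beta = np.array([1.0, 0.5, -0.01])             # illustrative "true" coefficients
sigma = 2.0                                    # illustrative error scale
y = Z @ beta + sigma * rng.standard_normal(T)

A = Z.T @ Z                    # A = sum_t z_t z_t'
c = Z.T @ y                    # c = sum_t z_t y_t
b = np.linalg.solve(A, c)      # solution of the normal equation A b = c

resid = y - Z @ b
rss_direct = resid @ resid             # sum_t (y_t - b'z_t)^2
rss_identity = y @ y - b @ A @ b       # sum_t y_t^2 - b'Ab, second form in (7)
s2 = rss_direct / (T - p)              # unbiased estimate of sigma^2
cov_b_estimate = s2 * np.linalg.inv(A) # estimate of the covariance matrix (9)
```

Solving the normal equation with `np.linalg.solve` rather than forming $A^{-1}$ explicitly is a numerical convenience; both routes give the same $b$ in exact arithmetic.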


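If the $y_t$'s are independently normally distributed, $b$ is the maximum likelihood estimate of $\beta$ and $(T - p)s^2/T$ is the maximum likelihood estimate of $\sigma^2$. Then $b$ is distributed according to $N(\beta, \sigma^2 A^{-1})$, where $N(\beta, \sigma^2 A^{-1})$ denotes the multivariate normal distribution with mean $\beta$ and covariance matrix $\sigma^2 A^{-1}$, and $(T - p)s^2/\sigma^2$ is distributed according to a $\chi^2$-distribution with $T - p$ degrees of freedom independently of $b$. The vector $b$ and $s^2$ form a sufficient set of statistics for $\beta$ and $\sigma^2$.

On the assumption of normality we can develop tests of hypotheses about the $\beta_i$'s and confidence regions for the $\beta_i$'s. Let

(10)    $\beta = (\beta^{(1)\prime} \;\; \beta^{(2)\prime})'$,

where

(11)    $\beta^{(1)} = (\beta_1, \ldots, \beta_r)'$,  $\beta^{(2)} = (\beta_{r+1}, \ldots, \beta_p)'$,

and suppose that $z_t$ and $b$ are partitioned similarly. [Partitioned vectors and matrices are treated in T. W. Anderson (1958), Appendix 1, Section 3.] Then to test the hypothesis $H: \beta^{(2)} = \bar\beta^{(2)}$, where $\bar\beta^{(2)}$ is a specified vector of numbers, we may use the $F$-statistic

(12)    $\dfrac{(b^{(2)} - \bar\beta^{(2)})'(A^{22})^{-1}(b^{(2)} - \bar\beta^{(2)})}{(p - r)s^2} = \dfrac{(b^{(2)} - \bar\beta^{(2)})'(A_{22} - A_{21}A_{11}^{-1}A_{12})(b^{(2)} - \bar\beta^{(2)})}{(p - r)s^2}$,

where $A$ and $A^{-1}$ have been partitioned into $r$ and $p - r$ rows and columns,

(13)    $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$,  $A^{-1} = \begin{pmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{pmatrix}$.

[The equality $A^{22} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}$ is indicated in Problem 8.] This statistic has the $F$-distribution with $p - r$ and $T - p$ degrees of freedom if the normality assumption holds and the null hypothesis is true. In general if the normality assumption holds, this statistic has the noncentral $F$-distribution with noncentrality parameter

(14)    $\dfrac{(\beta^{(2)} - \bar\beta^{(2)})'(A_{22} - A_{21}A_{11}^{-1}A_{12})(\beta^{(2)} - \bar\beta^{(2)})}{\sigma^2}$.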


These results follow from the fact that $b^{(2)}$ is distributed according to $N(\beta^{(2)}, \sigma^2 A^{22})$. [See Section 2.4 of T. W. Anderson (1958), for example.] If $\bar\beta^{(2)} = 0$, the null hypothesis means that $z_t^{(2)}$ does not enter into the regression function and one says that the $y_t$'s are independent of the $z_t^{(2)}$'s. In this important case the numerator of (12) is simply $b^{(2)\prime}(A_{22} - A_{21}A_{11}^{-1}A_{12})b^{(2)}$. A confidence region for $\beta^{(2)}$ with confidence coefficient $1 - \varepsilon$ is given by the set

(15)    $\{\beta^{(2)} : (b^{(2)} - \beta^{(2)})'(A_{22} - A_{21}A_{11}^{-1}A_{12})(b^{(2)} - \beta^{(2)}) \le (p - r)s^2 F_{p-r,T-p}(\varepsilon)\}$,

where $F_{p-r,T-p}(\varepsilon)$ is the upper $\varepsilon$ significance point of the $F$-distribution with $p - r$ and $T - p$ degrees of freedom.

If one is interested in only one component of $\beta$, these inferences can be based on a $t$-statistic (instead of an $F$-statistic). Suppose the component is $\beta_p$; then $A^{22} = a^{pp}$, say, is a scalar. The essential fact is that $(b_p - \beta_p)/(s\sqrt{a^{pp}})$ has a $t$-distribution with $T - p$ degrees of freedom.

We note that the residuals $\hat y_t = y_t - b'z_t$ are uncorrelated with the independent variables $z_t$ in the sample,

(16)    $\sum_{t=1}^{T} \hat y_t z_t = \sum_{t=1}^{T} y_t z_t - \sum_{t=1}^{T} z_t z_t' b = 0$,

and the set of residuals is uncorrelated with the sample regression vector in the population. (See Problem 7.)

The geometric interpretation in Figure 2.1 is enlightening. Let $y = (y_1, \ldots, y_T)'$ be a vector in a $T$-dimensional Euclidean space, let the $r$ columns of $Z_1 = (z_1^{(1)}, \ldots, z_T^{(1)})'$ be $r$ vectors in the space, let the $p - r$ columns of $Z_2 = (z_1^{(2)}, \ldots, z_T^{(2)})'$ be $p - r$ vectors, and let $Z = (Z_1 \;\; Z_2)$. Then the expectation of $y$ is $Z\beta$, which is a vector in the $p$-dimensional space spanned by the columns of $Z$. The sample regression $Zb$ is the projection of $y$ on this $p$-dimensional space; the vector of residuals $y - Zb$ is orthogonal to every vector in the $p$-dimensional space; it is (parallel to) the projection of $y$ on the $(T - p)$-dimensional space orthogonal to the columns of $Z$.

Figure 2.2 represents the $p$-dimensional space spanned by the columns of $Z$. The projection of the vector $Zb - Z\beta = Z(b - \beta)$ on the $r$-dimensional subspace spanned by the columns of $Z_1$ is $Z_1 b^{*(1)} - Z_1\beta^{*(1)} = Z_1(b^{*(1)} - \beta^{*(1)})$, where $b^{*(1)} = (Z_1'Z_1)^{-1}Z_1'y$. (See Section 2.3.) The projection on the $(p - r)$-dimensional space of $Z$ orthogonal to $Z_1$ is $Z(b - \beta) - Z_1(b^{*(1)} - \beta^{*(1)}) = (Z_2 - Z_1A_{11}^{-1}A_{12})(b^{(2)} - \beta^{(2)})$. The numerator of the $F$-statistic (12) is the squared length of this last vector, and the denominator is proportional to the squared length of $y - Zb$.
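The two forms of the numerator of (12), and its geometric reading as a squared length, can be checked numerically. This is a sketch under assumed data: the random regressors, the coefficient values, and the choice $\bar\beta^{(2)} = 0$ are illustrative and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
T, p, r = 40, 4, 2
Z = rng.standard_normal((T, p))                       # illustrative regressors
y = Z @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.standard_normal(T)

A = Z.T @ Z
b = np.linalg.solve(A, Z.T @ y)
s2 = np.sum((y - Z @ b) ** 2) / (T - p)

A11, A12 = A[:r, :r], A[:r, r:]
A21, A22 = A[r:, :r], A[r:, r:]
schur = A22 - A21 @ np.linalg.solve(A11, A12)         # A22 - A21 A11^{-1} A12
A_upper_22 = np.linalg.inv(A)[r:, r:]                 # lower-right block of A^{-1}

# Numerator of (12) for the hypothesis beta^(2) = 0, in both forms.
b2 = b[r:]
num_inverse_form = b2 @ np.linalg.solve(A_upper_22, b2)
num_schur_form = b2 @ schur @ b2
F_statistic = num_schur_form / ((p - r) * s2)

# Geometric reading: the numerator equals the drop in the residual sum of
# squares when the second block of variables is added to the first.
Z1 = Z[:, :r]
b1_star = np.linalg.solve(Z1.T @ Z1, Z1.T @ y)
rss_reduced = np.sum((y - Z1 @ b1_star) ** 2)
rss_full = np.sum((y - Z @ b) ** 2)
```

The comparison of `rss_reduced - rss_full` with the numerator is one way to see why the statistic measures the contribution of $z_t^{(2)}$ beyond $z_t^{(1)}$.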

Figure 2.1. Geometry of least squares estimation.

[Figure 2.2]

that is, $z^*_{kt}$ depends on $z_{jt}$ with index $j$ no greater than $k$. The conditions that $(z^*_{k1}, \ldots, z^*_{kT})$ is orthogonal to $(z^*_{11}, \ldots, z^*_{1T}), \ldots, (z^*_{k-1,1}, \ldots, z^*_{k-1,T})$ are the same as that $(z^*_{k1}, \ldots, z^*_{kT})$ is orthogonal to $(z_{11}, \ldots, z_{1T}), \ldots, (z_{k-1,1}, \ldots, z_{k-1,T})$, namely

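(16)    $-(\gamma_{k1}, \ldots, \gamma_{k,k-1}) = (a_{k1}, \ldots, a_{k,k-1}) \begin{pmatrix} a_{11} & \cdots & a_{1,k-1} \\ \vdots & & \vdots \\ a_{k-1,1} & \cdots & a_{k-1,k-1} \end{pmatrix}^{-1}$.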


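(If the orthogonal variables are normalized to have sum of squares of unity, that is, transformed to $z^*_{it}/\sqrt{a^*_{ii}}$, this orthogonalization procedure is often known as the Gram-Schmidt orthogonalization.)

The formulas and computations are considerably simplified when the independent variables are orthogonal. Since

(19)    $A^{*-1} = \begin{pmatrix} 1/a^*_{11} & 0 & \cdots & 0 \\ 0 & 1/a^*_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1/a^*_{pp} \end{pmatrix}$,

the elements of $b^*$ are uncorrelated and the variance of $b^*_i$ is $\sigma^2/a^*_{ii}$, $i = 1, \ldots, p$. In this case the normal equations have the simple form

(20)    $a^*_{ii} b^*_i = c^*_i$,  $i = 1, \ldots, p$,

and the formula for the estimate of the variance is

(21)    $(T - p)s^2 = \sum_{t=1}^{T} y_t^2 - \sum_{i=1}^{p} a^*_{ii} b_i^{*2}$.

The $F$-statistic (12) of Section 2.2 for testing $\beta^{*(2)} = \bar\beta^{*(2)}$ is

(22)    $\dfrac{\sum_{i=r+1}^{p} a^*_{ii}(b^*_i - \bar\beta^*_i)^2}{(p - r)s^2}$.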

In some cases independent variables are chosen at the outset so as to be orthogonal. As we shall see in Chapter 4 the trigonometric sequences $\{\cos 2\pi jt/T\}$, $j = 0, 1, \ldots, [\tfrac{1}{2}T]$, and $\{\sin 2\pi kt/T\}$, $k = 1, \ldots, [\tfrac{1}{2}(T - 1)]$, are orthogonal for $t = 1, \ldots, T$ and can be used as components of $z_t$. Although any set of independent variables can be orthogonalized, it is usually not worthwhile to do so if the given set is to be used only once. However, if the same set of independent variables is to be used many times, then the transformation to orthogonal $z^*_{it}$'s can save considerable computational effort.

The case of polynomial regression provides an example of the use of orthogonalized independent variables. Suppose $z_{it} = t^{i-1}$; that is, let

(23)    $\mathcal{E}y_t = \beta_1 + \beta_2 t + \cdots + \beta_p t^{p-1}$,  $t = 1, \ldots, T$.

The powers of $t$ can be replaced by orthogonal polynomials $\phi_{0T}(t) = 1, \phi_{1T}(t), \ldots, \phi_{p-1,T}(t)$. These have the form

(24)    $\phi_{kT}(t) = t^k + C_{k-1}(k, T)t^{k-1} + \cdots + C_1(k, T)t + C_0(k, T)$,

where the $C$'s depend on the length of the series $T$ and the degree of the polynomial $k$ and are determined by

(25)    $\sum_{t=1}^{T} \phi_{kT}(t)\phi_{jT}(t) = 0$,  $j = 0, 1, \ldots, k - 1$.

Then

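(26)    $\mathcal{E}y_t = \gamma_0\phi_{0T}(t) + \gamma_1\phi_{1T}(t) + \cdots + \gamma_{p-1}\phi_{p-1,T}(t)$.

Orthogonal polynomials will be considered in detail in Section 3.2.

Most of the computational methods for solving the normal equations $Ab = c$ consist of a so-called forward solution and a backward solution. The forward solution consists of operations involving the rows of $(A \;\; c)$ which end up with $A$ in triangular form; that is, it amounts to $D(A \;\; c) = (\bar A \;\; \bar c)$, namely

(27)    $\begin{pmatrix} 1 & 0 & \cdots & 0 \\ d_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ d_{p1} & d_{p2} & \cdots & 1 \end{pmatrix} (A \;\; c) = \begin{pmatrix} \bar a_{11} & \bar a_{12} & \cdots & \bar a_{1p} & \bar c_1 \\ 0 & \bar a_{22} & \cdots & \bar a_{2p} & \bar c_2 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & \bar a_{pp} & \bar c_p \end{pmatrix}$.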

The matrix $D$ is of the form of $\Gamma$. Consideration of (27) shows that each row of $D$ is identical to the corresponding row of $\Gamma$ and hence that $D = \Gamma$. [Different methods triangularize $A$ by different sequences of operations but they are algebraically equivalent, given the ordering of the variables; note that the forward solution of (27) for one $k$ is part of the forward solution for each subsequent $k$.] Thus

$\bar a_{ii} = \sum_{t=1}^{T} z_{it}^{*2} = a^*_{ii}$,  $\bar c_i = \sum_{t=1}^{T} z^*_{it} y_t = c^*_i$.

The sample regression coefficients for the orthogonalized variables $b^*_i = c^*_i/a^*_{ii}$ can be obtained from the forward solution of the normal equations as $b^*_i = \bar c_i/\bar a_{ii}$. In fact, in many procedures such as the Doolittle method each row of $(\bar A \;\; \bar c)$ is divided by the leading nonzero element and recorded; the last element in each row is then the regression coefficient of the corresponding orthogonal variable. Thus we see that the forward solution of the normal equations involves the same algebra as defining the orthogonal variables; the important difference, of course, is that the computation of the orthogonal variables involves the calculation of the $pT$ numbers $z^*_{it}$. We note in passing that $b^{(2)\prime}(A_{22} - A_{21}A_{11}^{-1}A_{12})b^{(2)}$ can be obtained from the forward solution as $\sum_{i=r+1}^{p} b^*_i \bar c_i = \sum_{i=r+1}^{p} \bar c_i^2/\bar a_{ii}$. (See Problem 12.)
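The equivalence between the forward solution and explicit orthogonalization can be checked on a small example. The sketch below assumes random illustrative data and uses plain Gaussian elimination without pivoting as the forward solution; it is one possible arrangement of the row operations, not a reproduction of any particular tabular method such as Doolittle's.

```python
import numpy as np

rng = np.random.default_rng(3)
T, p = 30, 3
Z = rng.standard_normal((T, p))      # illustrative independent variables
y = rng.standard_normal(T)           # illustrative dependent variable
A, c = Z.T @ Z, Z.T @ y

# Forward solution: row operations on (A c) that leave A upper triangular.
M = np.column_stack([A, c])
for k in range(p):
    for i in range(k + 1, p):
        M[i] -= (M[i, k] / M[k, k]) * M[k]
A_bar, c_bar = M[:, :p], M[:, p]
b_star_forward = c_bar / np.diag(A_bar)          # c_bar_i / a_bar_ii

# Direct orthogonalization: z*_k is z_k minus its projections on z*_1..z*_{k-1}.
Z_star = Z.copy()
for k in range(p):
    for j in range(k):
        coef = (Z_star[:, j] @ Z[:, k]) / (Z_star[:, j] @ Z_star[:, j])
        Z_star[:, k] -= coef * Z_star[:, j]
a_star = np.sum(Z_star ** 2, axis=0)             # a*_ii = sum_t z*_it^2
b_star_direct = (Z_star.T @ y) / a_star          # c*_i / a*_ii
```

The forward solution works only with the $p \times (p+1)$ array $(A \;\; c)$, whereas the direct route computes all $pT$ orthogonalized values, which is the computational difference the text points out.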

2.4. CASE OF CORRELATION

If the dependent variables are correlated and the covariance matrix is known except (possibly) for a multiplying factor, the foregoing theory can be modified. Suppose

(1)    $\mathcal{E}y_t = \beta'z_t$,  $t = 1, \ldots, T$,

(2)    $\mathcal{E}(y_t - \beta'z_t)(y_s - \beta'z_s) = \sigma_{ts}$,  $t, s = 1, \ldots, T$,

where $z_t$ is a known (column) vector of $p$ numbers, $t = 1, \ldots, T$, and $\sigma_{ts} = \sigma^2\psi_{ts}$, $t, s = 1, \ldots, T$, where the $\psi_{ts}$'s are known. It is convenient to put this model in a more complete matrix notation. Let $y = (y_1, \ldots, y_T)'$, $Z = (z_1, \ldots, z_T)'$, $\Sigma = (\sigma_{ts})$, and $\Psi = (\psi_{ts})$. Then (1) and (2) are

(3)    $\mathcal{E}y = Z\beta$,

(4)    $\mathcal{E}(y - Z\beta)(y - Z\beta)' = \Sigma$,

and $\Sigma = \sigma^2\Psi$, where $\Psi$ is known. Let $D$ be a matrix such that

(5)    $D\Psi D' = I$.

Let $Dy = x = (x_1, \ldots, x_T)'$ and $DZ = W = (w_1, \ldots, w_T)'$. Multiplication of (3) on the left by $D$ and multiplication of (4) on the left by $D$ and on the right by $D'$ gives

(6)    $\mathcal{E}x = W\beta$,

(7)    $\mathcal{E}(x - W\beta)(x - W\beta)' = \sigma^2 I$,

which is the model treated in Section 2.2. The normal equation $Ab = c$, where $A = W'W$ and $c = W'x$, is equivalent to

(8)    $Z'D'DZb = Z'D'Dy$.


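From (5) we see that $\Psi = D^{-1}(D')^{-1} = (D'D)^{-1}$ and $\Psi^{-1} = D'D$; the solution of (8) is

(9)    $b = (Z'\Psi^{-1}Z)^{-1}Z'\Psi^{-1}y$.

Then

(10)    $\mathcal{E}b = (Z'\Psi^{-1}Z)^{-1}Z'\Psi^{-1}\mathcal{E}y = \beta$,

(11)    $\mathcal{E}(b - \beta)(b - \beta)' = (Z'\Psi^{-1}Z)^{-1}Z'\Psi^{-1}\mathcal{E}(y - Z\beta)(y - Z\beta)'\Psi^{-1}Z(Z'\Psi^{-1}Z)^{-1} = \sigma^2(Z'\Psi^{-1}Z)^{-1}$.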

Each element of $b$ is the best linear unbiased estimate of the corresponding component of $\beta$ (Gauss-Markov Theorem); $b$ is called the Markov estimate of $\beta$. The vector $b$ minimizes $(y - Z\tilde\beta)'\Psi^{-1}(y - Z\tilde\beta)$. If $y_1, \ldots, y_T$ have a joint normal distribution, $b$ and $\hat\sigma^2 = (T - p)s^2/T$ are the maximum likelihood estimates of $\beta$ and $\sigma^2$ and constitute a sufficient set of statistics for $\beta$ and $\sigma^2$.

The least squares estimate

(12)    $b_L = (Z'Z)^{-1}Z'y$

is unbiased,

(13)    $\mathcal{E}b_L = \beta$,

with covariance matrix

(14)    $\mathcal{E}(b_L - \beta)(b_L - \beta)' = (Z'Z)^{-1}Z'\mathcal{E}(y - Z\beta)(y - Z\beta)'Z(Z'Z)^{-1} = \sigma^2(Z'Z)^{-1}Z'\Psi Z(Z'Z)^{-1}$.
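The transformation (5) and the comparison of the covariance matrices (11) and (14) can be sketched numerically. Everything specific below is an assumption for illustration: the matrix $\Psi$ with elements $0.6^{|t-s|}$, the linear trend regressors, and the use of the inverse Cholesky factor as one admissible choice of $D$. The last part builds a $Z$ from characteristic vectors of $\Psi$ to show a case in which the two estimates agree.

```python
import numpy as np

rng = np.random.default_rng(4)
T, p = 25, 2
t = np.arange(1, T + 1, dtype=float)
Z = np.column_stack([np.ones(T), t])

# Assumed covariance structure, known up to sigma^2: psi_ts = 0.6 ** |t - s|.
Psi = 0.6 ** np.abs(np.subtract.outer(t, t))
L = np.linalg.cholesky(Psi)                    # Psi = L L'
y = Z @ np.array([2.0, 0.3]) + L @ rng.standard_normal(T)

# One matrix D with D Psi D' = I is the inverse of the Cholesky factor.
D = np.linalg.inv(L)
W, x = D @ Z, D @ y
b_transformed = np.linalg.solve(W.T @ W, W.T @ x)     # least squares on (x, W)

Psi_inv = np.linalg.inv(Psi)
b_markov = np.linalg.solve(Z.T @ Psi_inv @ Z, Z.T @ Psi_inv @ y)   # formula (9)

# Covariance matrices (11) and (14), both divided by sigma^2.
cov_markov = np.linalg.inv(Z.T @ Psi_inv @ Z)
ZtZ_inv = np.linalg.inv(Z.T @ Z)
cov_ls = ZtZ_inv @ Z.T @ Psi @ Z @ ZtZ_inv
min_eig_of_difference = np.linalg.eigvalsh(cov_ls - cov_markov).min()

# Special case: columns of Z are combinations of characteristic vectors of Psi.
_, eigvecs = np.linalg.eigh(Psi)
Z_special = eigvecs[:, :p] @ np.array([[1.0, 2.0], [0.0, 1.0]])
b_ls_special = np.linalg.solve(Z_special.T @ Z_special, Z_special.T @ y)
b_markov_special = np.linalg.solve(Z_special.T @ Psi_inv @ Z_special,
                                   Z_special.T @ Psi_inv @ y)
```

A nonnegative smallest characteristic root of `cov_ls - cov_markov` (up to rounding) corresponds to the difference between (14) and (11) being positive semidefinite.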

Unless the columns of $Z$ have a specific relation to $\Psi$ the variance of a linear combination of $b_L$, say $\gamma'b_L$, will be greater than the variance of the corresponding linear combination of $b$, namely $\gamma'b$; then the difference between the matrix (14) and the matrix (11) is positive semidefinite.

THEOREM 2.4.1. If $Z = V^*C$, where the $p$ columns of $V^*$ are $p$ linearly independent characteristic vectors of $\Psi$ and $C$ is a nonsingular matrix, then the least squares estimate (12) is the same as the Markov estimate (9).

PROOF. The condition on $V^*$ may be summarized as

(15)    $\Psi V^* = V^*\Lambda^*$,

where $\Lambda^*$ is diagonal, consisting of the (positive) characteristic roots of $\Psi$ corresponding to the columns of $V^*$. Then

(16)    $V^{*\prime}\Psi^{-1} = \Lambda^{*-1}V^{*\prime}$,

and $V^{*\prime}\Psi^{-1}V^* = \Lambda^{*-1}V^{*\prime}V^*$. The least squares estimate is

(17)    $b_L = C^{-1}(V^{*\prime}V^*)^{-1}V^{*\prime}y$,

and the Markov estimate is

(18)    $b = (C'V^{*\prime}\Psi^{-1}V^*C)^{-1}C'V^{*\prime}\Psi^{-1}y = (C'\Lambda^{*-1}V^{*\prime}V^*C)^{-1}C'\Lambda^{*-1}V^{*\prime}y$,

which is identical to (17). Q.E.D.

In Section 10.2.1 it will be shown that the condition in Theorem 2.4.1 is necessary as well as sufficient. The condition may be stated equivalently that there are $p$ linearly independent linear combinations of the columns of $Z$ that are characteristic vectors of $\Psi$. The point of the theorem is that when the condition is satisfied, the least squares estimates (which are available when $\Psi$ is not known) have minimum variances among unbiased linear estimates. In Section 10.2.1 we shall also take up the case that there are an arbitrary number of linearly independent linear combinations of the columns of $Z$ that are characteristic vectors of $\Psi$; the least squares estimates of the coefficients of these linear combinations are the same as the Markov estimates when the other independent variables are orthogonal to these. That the condition of Theorem 2.4.1 implies that the least squares estimates are maximum likelihood under normality was shown by T. W. Anderson (1948).

2.5. PREDICTION

Let us consider prediction of $y_\tau$, the observation at $t = \tau$. If we know the regression function, then we know $\mathcal{E}y_\tau$, and this is the best prediction of $y_\tau$ in the sense of minimizing the mean square error of prediction. Suppose now that we have observations $y_1, \ldots, y_T$ and wish to use these to predict $y_\tau$ ($\tau > T$), where $\mathcal{E}y_\tau = \sum_{k=1}^{p} \beta_k z_{k\tau} = \beta'z_\tau$, and where $\beta$ is unknown and $z_\tau$ is known. It seems reasonable that we would estimate $\beta$ by the least squares estimate $b$ and predict $y_\tau$ by $b'z_\tau$. Let us justify this procedure.

We shall consider linear predictors $\sum_{t=1}^{T} d_t y_t$, where the $d_t$'s may depend on $z_1, \ldots, z_T$ and $z_\tau$. First we ask that the predictor be unbiased, that is, that

(1)    $\mathcal{E}\left(\sum_{t=1}^{T} d_t y_t - y_\tau\right) = 0$.

Thus the predictor is to be an unbiased estimate of $\mathcal{E}y_\tau = \beta'z_\tau$. We ask to minimize the variance (or equivalently the mean square error of prediction)

(2)    $\mathcal{E}\left(\sum_{t=1}^{T} d_t y_t - y_\tau\right)^2 = \mathcal{E}\left(\sum_{t=1}^{T} d_t y_t - \beta'z_\tau\right)^2 + \sigma^2$.

Thus the problem is to find the minimum variance linear unbiased estimate of $\beta'z_\tau$, a linear combination of the regression coefficients. The Gauss-Markov Theorem implies that this is $b'z_\tau$. This follows from the earlier discussion since the model can be transformed so that $\beta'z_\tau$ is a component of $\beta^*$ when one column of $G^{-1}$ is $z_\tau$. The variance of $b'z_\tau$ is

(3)    $\mathcal{E}(b'z_\tau - \beta'z_\tau)^2 = \mathcal{E}z_\tau'(b - \beta)(b - \beta)'z_\tau = \sigma^2 z_\tau'A^{-1}z_\tau$.

The mean square error of prediction is

(4)    $\mathcal{E}(b'z_\tau - y_\tau)^2 = \sigma^2(1 + z_\tau'A^{-1}z_\tau)$.

The property of this predictor can alternatively be stated as the property of minimizing the mean square error among linear predictors with bounded mean square error. (See Problem 15.)

If we assume normality, we can give a prediction interval for $y_\tau$. We have that $b'z_\tau - y_\tau$ is normally distributed with mean 0 and variance (4) and independently of $s^2$. Thus

(5)    $\dfrac{b'z_\tau - y_\tau}{s\sqrt{1 + z_\tau'A^{-1}z_\tau}}$

has a $t$-distribution with $T - p$ degrees of freedom. Then a prediction interval for $y_\tau$ with confidence $1 - \varepsilon$ is

(6)    $b'z_\tau - t_{T-p}(\varepsilon)\,s\sqrt{1 + z_\tau'A^{-1}z_\tau} \le y_\tau \le b'z_\tau + t_{T-p}(\varepsilon)\,s\sqrt{1 + z_\tau'A^{-1}z_\tau}$,

where $t_{T-p}(\varepsilon)$ is the number such that the probability of the $t$-distribution with $T - p$ degrees of freedom between $-t_{T-p}(\varepsilon)$ and $t_{T-p}(\varepsilon)$ is $1 - \varepsilon$.

Now let us study the error of prediction when some independent variables are neglected. Suppose the components of $z_t$ are numbered in a suitable way, that $z_t'$ is partitioned as before into $(z_t^{(1)\prime} \;\; z_t^{(2)\prime})$, and that the effect of $z_t^{(2)}$ is neglected, that is, that the regression is erroneously assumed to be a linear function of $z_t^{(1)}$ instead of a function of $z_t$. It is convenient to write $\beta'z_t$ as $\beta^{*\prime}z_t^*$, where $\beta^*$ is defined by (7) of Section 2.3 with $G_{11} = I$, $G_{21} = -A_{21}A_{11}^{-1}$, and $G_{22} = I$ and

(7)    $z_t^{*(1)} = z_t^{(1)}$,  $z_t^{*(2)} = z_t^{(2)} - A_{21}A_{11}^{-1}z_t^{(1)}$.

Then $z_t^{*(1)}$ and $z_t^{*(2)}$ are orthogonal, $t = 1, \ldots, T$, and $\beta'z_t = \beta^{*(1)\prime}z_t^{*(1)} + \beta^{*(2)\prime}z_t^{*(2)}$.

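The estimate of $\beta^{*(1)}$ is

(9)    $b^{*(1)} = A_{11}^{-1}c^{(1)} = A_{11}^{-1}\sum_{t=1}^{T} z_t^{(1)}y_t$.

The statistical properties of the estimate are

(10)    $\mathcal{E}b^{*(1)} = A_{11}^{-1}\sum_{t=1}^{T} z_t^{(1)}\left(z_t^{(1)\prime}\beta^{*(1)} + z_t^{*(2)\prime}\beta^{*(2)}\right) = A_{11}^{-1}\left(A_{11}\beta^{*(1)} + 0\,\beta^{*(2)}\right) = \beta^{*(1)}$,

(11)    $\mathcal{E}(b^{*(1)} - \beta^{*(1)})(b^{*(1)} - \beta^{*(1)})' = \sigma^2A_{11}^{-1}$.

The prediction at time $\tau$ ($> T$) is $b^{*(1)\prime}z_\tau^{(1)}$. The bias in prediction at $\tau$ is

(12)    $\mathcal{E}(b^{*(1)\prime}z_\tau^{(1)} - y_\tau) = \beta^{*(1)\prime}z_\tau^{(1)} - \beta'z_\tau = -\beta^{*(2)\prime}z_\tau^{*(2)} = -\beta^{(2)\prime}(z_\tau^{(2)} - A_{21}A_{11}^{-1}z_\tau^{(1)})$.

The variance of $b^{*(1)\prime}z_\tau^{(1)}$ is

(13)    $\mathcal{E}(b^{*(1)\prime}z_\tau^{(1)} - \beta^{*(1)\prime}z_\tau^{(1)})^2 = \sigma^2 z_\tau^{(1)\prime}A_{11}^{-1}z_\tau^{(1)}$,

and the mean square error of prediction is

(14)    $\mathcal{E}(b^{*(1)\prime}z_\tau^{(1)} - y_\tau)^2 = \left[\beta^{(2)\prime}(z_\tau^{(2)} - A_{21}A_{11}^{-1}z_\tau^{(1)})\right]^2 + \sigma^2 z_\tau^{(1)\prime}A_{11}^{-1}z_\tau^{(1)} + \sigma^2$.

In general, neglecting $z_t^{(2)}$ causes a bias in the prediction but decreases the variance. (See Problem 18.)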
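The trade-off between bias and variance when variables are neglected can be evaluated exactly for a given design, without simulation. The sketch below assumes illustrative regressors `Z`, coefficients `beta`, and a future regressor vector `z_tau`; the bias is computed once from the expected value of the reduced estimate and once from the closed form in terms of $\beta^{(2)}$.

```python
import numpy as np

rng = np.random.default_rng(5)
T, p, r = 30, 3, 2
sigma2 = 1.0                              # illustrative error variance
Z = rng.standard_normal((T, p))           # illustrative observed regressors
beta = np.array([1.0, 0.5, 0.2])          # illustrative coefficients
z_tau = rng.standard_normal(p)            # regressors at the future time tau

A = Z.T @ Z
A11, A21 = A[:r, :r], A[r:, :r]
Z1, z1, z2 = Z[:, :r], z_tau[:r], z_tau[r:]

# Full model: mean square error of prediction sigma^2 (1 + z'A^{-1}z).
var_full = sigma2 * (z_tau @ np.linalg.solve(A, z_tau))
mse_full = var_full + sigma2

# Reduced model: the expected value of the estimate is A11^{-1} Z1'(Z beta).
expected_b1 = np.linalg.solve(A11, Z1.T @ (Z @ beta))
bias_direct = expected_b1 @ z1 - beta @ z_tau
bias_formula = -beta[r:] @ (z2 - A21 @ np.linalg.solve(A11, z1))

var_reduced = sigma2 * (z1 @ np.linalg.solve(A11, z1))
mse_reduced = bias_formula ** 2 + var_reduced + sigma2
```

Whether `mse_reduced` is smaller or larger than `mse_full` depends on the size of $\beta^{(2)}$ relative to $\sigma^2$; only the ordering of the two variances is guaranteed.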


In some instances one might expect to predict in situations where the $z_t$'s of the future are like the $z_t$'s of the past. The squared bias averaged over the $z_t$'s already observed is

(15)    $\sum_{t=1}^{T}\left[\beta^{(2)\prime}(z_t^{(2)} - A_{21}A_{11}^{-1}z_t^{(1)})\right]^2 = \beta^{(2)\prime}\sum_{t=1}^{T}(z_t^{(2)} - A_{21}A_{11}^{-1}z_t^{(1)})(z_t^{(2)\prime} - z_t^{(1)\prime}A_{11}^{-1}A_{12})\beta^{(2)} = \beta^{(2)\prime}(A_{22} - A_{21}A_{11}^{-1}A_{12})\beta^{(2)}$,

which is proportional to the noncentrality parameter in the distribution of the $F$-statistic for testing the hypothesis $\beta^{(2)} = 0$. This fact may give another reason for preferring the $F$-test to other tests of the hypothesis.

2.6. ASYMPTOTIC THEORY

The procedures of testing hypotheses and setting up confidence regions have been based on the assumption that the observations are normally distributed. When the assumption of normality is not satisfied, these procedures can be justified for large samples on the basis of asymptotic theory.

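THEOREM 2.6.1. Let $y_t = \beta'z_t + u_t$, $t = 1, 2, \ldots$, where the $u_t$'s are independent and $u_t$ has mean 0, variance $\sigma^2$, and distribution function $F_t(u)$, $t = 1, 2, \ldots$. Let $A_T = \sum_{t=1}^{T} z_tz_t'$, $c_T = \sum_{t=1}^{T} z_ty_t$, $b_T = A_T^{-1}c_T$,

(1)    $D_T = \begin{pmatrix} \sqrt{a_{11}^T} & 0 & \cdots & 0 \\ 0 & \sqrt{a_{22}^T} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sqrt{a_{pp}^T} \end{pmatrix}$,

where $a_{ii}^T$ is the $i$th diagonal element of $A_T$, and

(2)    $R_T = D_T^{-1}A_TD_T^{-1}$.

Suppose (i) $a_{ii}^T \to \infty$ as $T \to \infty$, $i = 1, \ldots, p$, (ii) $z_{i,T+1}^2/a_{ii}^T \to 0$ as $T \to \infty$, $i = 1, \ldots, p$, (iii) $R_T \to R_\infty$ as $T \to \infty$, (iv) $R_\infty$ is nonsingular, and (v)

(3)    $\sup_{t=1,2,\ldots} \int_{|u|>c} u^2\,dF_t(u) \to 0$

as $c \to \infty$. Then $D_T(b_T - \beta)$ has a limiting normal distribution with mean 0 and covariance matrix $\sigma^2R_\infty^{-1}$.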

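PROOF. We have

(4)    $D_T(b_T - \beta) = D_TA_T^{-1}\sum_{t=1}^{T} z_tu_t = (D_T^{-1}A_TD_T^{-1})^{-1}D_T^{-1}\sum_{t=1}^{T} z_tu_t$.

We shall prove that $D_T^{-1}\sum_{t=1}^{T} z_tu_t$ has a limiting normal distribution with mean 0 and covariance matrix $\sigma^2R_\infty$ by showing that $\alpha'D_T^{-1}\sum_{t=1}^{T} z_tu_t$ has a limiting normal distribution with mean 0 and variance $\sigma^2\alpha'R_\infty\alpha$ for every arbitrary vector $\alpha$ ($\ne 0$). (See Theorem 7.7.7.) Let $\gamma_t^T = \alpha'D_T^{-1}z_t = \sum_{i=1}^{p} \alpha_iz_{it}/\sqrt{a_{ii}^T}$, $t = 1, \ldots, T$, $T = 1, 2, \ldots$. Then

(5)    $\sum_{t=1}^{T} (\gamma_t^T)^2 = \alpha'R_T\alpha \to \alpha'R_\infty\alpha$.

Let $w_t^T = \gamma_t^Tu_t/(\sigma\sqrt{\alpha'R_T\alpha})$. Then $\mathcal{E}w_t^T = 0$, $\sum_{t=1}^{T} \mathrm{Var}(w_t^T) = 1$, and for every $\delta > 0$

(6)    $\sum_{t=1}^{T} \int_{|w|>\delta} w^2\,dF_t^T(w) = \sum_{t=1}^{T} \dfrac{(\gamma_t^T)^2}{\sigma^2\alpha'R_T\alpha} \int_{u^2 > \delta^2\sigma^2\alpha'R_T\alpha/(\gamma_t^T)^2} u^2\,dF_t(u) \le \dfrac{1}{\sigma^2} \sup_{t=1,2,\ldots} \int_{u^2 > \delta^2\sigma^2\alpha'R_T\alpha/\max_{s=1,\ldots,T}(\gamma_s^T)^2} u^2\,dF_t(u) \to 0$,

where $F_t^T(w)$ is the distribution of $w_t^T$, $t = 1, \ldots, T$. The limit in (6) follows by (3) since

(7)    $\max_{t=1,\ldots,T} (\gamma_t^T)^2 \le \alpha'\alpha \sum_{i=1}^{p} \max_{t=1,\ldots,T} \dfrac{z_{it}^2}{a_{ii}^T}$,

which converges to 0 by (i), (ii), and Lemma 2.6.1 following. The Lindeberg-Feller condition (6) implies that $\alpha'D_T^{-1}\sum_{t=1}^{T} z_tu_t$ has a limiting normal distribution. [See Loève (1963), Section 21.2, or Theorem 7.7.2.] Therefore $D_T^{-1}\sum_{t=1}^{T} z_tu_t$ has a limiting normal distribution with mean 0 and covariance matrix $\sigma^2R_\infty$ and $D_T(b_T - \beta)$ has a limiting normal distribution with mean 0 and covariance matrix $\sigma^2R_\infty^{-1}$. Q.E.D.

LEMMA 2.6.1. (i) $a_{ii}^T \to \infty$ and (ii) $z_{i,T+1}^2/a_{ii}^T \to 0$ imply

(8)    $\dfrac{\max_{t=1,\ldots,T} z_{it}^2}{a_{ii}^T} \to 0$

as $T \to \infty$. Conversely (8) implies (i) and (ii).

PROOF. Let $t(T)$ be the largest index $t$ ($t = 1, \ldots, T$) such that $z_{i,t(T)}^2 = \max_{t=1,\ldots,T} z_{it}^2$. Then $\{t(T)\}$ is a nondecreasing sequence of integers. If $\{t(T)\}$ is bounded, let $\tau$ be the maximum; then

(9)    $\dfrac{\max_{t=1,\ldots,T} z_{it}^2}{a_{ii}^T} = \dfrac{z_{i\tau}^2}{a_{ii}^T} \to 0$

by (i). If $t(T) \to \infty$ as $T \to \infty$, then

(10)    $\dfrac{\max_{t=1,\ldots,T} z_{it}^2}{a_{ii}^T} \le \dfrac{z_{i,t(T)}^2}{a_{ii}^{t(T)-1}} \to 0$

by (ii). Proof of the converse is left to the reader.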

COROLLARY 2.6.1. Let $y_t = \beta'z_t + u_t$, $t = 1, 2, \ldots$, where the $u_t$'s are independently and identically distributed with mean 0 and variance $\sigma^2$. Suppose (i), (ii), (iii), and (iv) of Theorem 2.6.1 are satisfied. Then $D_T(b_T - \beta)$ has a limiting normal distribution with mean 0 and covariance matrix $\sigma^2R_\infty^{-1}$.

COROLLARY 2.6.2. Let $y_t = \beta'z_t + u_t$, $t = 1, 2, \ldots$, where the $u_t$'s are independent and $u_t$ has mean 0 and variance $\sigma^2$. Suppose (i), (ii), (iii), and (iv) of Theorem 2.6.1 are satisfied. If there exist $\delta > 0$ and $M > 0$ such that $\mathcal{E}|u_t|^{2+\delta} \le M$, $t = 1, 2, \ldots$, then $D_T(b_T - \beta)$ has a limiting normal distribution with mean 0 and covariance matrix $\sigma^2R_\infty^{-1}$.

Theorem 2.6.1 in a slightly different form was given by Eicker (1963). The two corollaries use stronger sufficient conditions. Corollary 2.6.2 with the condition (ii) replaced by $z_t'z_t$ uniformly bounded can be proved directly by use of the Liapounov central limit theorem. (See Problem 20.) However, the boundedness of $z_t'z_t$ is not sufficiently general for our purposes since polynomials in $t$ do not satisfy the condition.

THEOREM 2.6.2. Under the conditions of Theorem 2.6.1, Corollary 2.6.1, or Corollary 2.6.2 $s^2$ converges stochastically to $\sigma^2$.

PROOF. Since

(11)    $\sum_{t=1}^{T} (y_t - b_T'z_t)^2 = \sum_{t=1}^{T} u_t^2 - (b_T - \beta)'A_T(b_T - \beta)$,

we have

(12)    $s^2 = \dfrac{1}{T - p}\sum_{t=1}^{T} u_t^2 - \dfrac{1}{T - p}(b_T - \beta)'A_T(b_T - \beta)$.

The second term in (12) is nonnegative and has expected value

(13)    $\dfrac{p}{T - p}\sigma^2$,

which goes to 0 as $T \to \infty$. Hence, by Tchebycheff's inequality the second term converges stochastically to 0. The theorem will follow from the appropriate law of large numbers as

(14)    $\dfrac{1}{T - p}\sum_{t=1}^{T} u_t^2 = \dfrac{T}{T - p} \cdot \dfrac{1}{T}\sum_{t=1}^{T} x_t$,

where $x_t = u_t^2$ and $\mathcal{E}x_t = \mathcal{E}u_t^2 = \sigma^2$. For the conditions of Corollary 2.6.1 the $x_t$'s are identically distributed; for the conditions of Corollary 2.6.2, $\mathcal{E}|x_t|^{1+\delta/2} \le M$ [Loève (1963), Section 20.1]; and for Theorem 2.6.1

(15)    $\sup_{t=1,2,\ldots} \int_{x>d} x\,dG_t(x) \to 0$

as $d \to \infty$, where $G_t(x)$ is the distribution function of $x_t = u_t^2$. [See Loève (1963), Section 20.2, and Problem 21.] Q.E.D.

The implication of the theorems is that if the usual normal theory is used in the case of large samples it will be approximately correct even if the observations are not normally distributed. We shall see in Section 5.5 that in autoregressive processes, where $\beta'z_t$ is replaced by a linear combination of lagged values of $y_t$, there is further asymptotic theory justifying the procedures for large samples when the assumptions are not satisfied. The asymptotic theory will be generalized and presented in more detail in Section 5.5. In Section 10.2.4 similar theorems will be proved for $\{u_t\}$ a stationary stochastic process of the moving average type.
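The content of Theorem 2.6.1 can be illustrated by a small Monte Carlo experiment. The sketch below rests on assumptions made only for the example: a linear trend $z_t = (1, t)'$, disturbances uniformly distributed on $(-\sqrt{3}, \sqrt{3})$ (so that they are non-normal with mean 0 and variance 1), and arbitrary values of `T`, `reps`, and `beta`.

```python
import numpy as np

rng = np.random.default_rng(6)
T, reps = 200, 4000
t = np.arange(1, T + 1, dtype=float)
Z = np.column_stack([np.ones(T), t])
beta = np.array([1.0, 0.1])                    # illustrative coefficients

A_T = Z.T @ Z
D_T = np.diag(np.sqrt(np.diag(A_T)))           # diag(sqrt(a_11), sqrt(a_22))
D_inv = np.linalg.inv(D_T)
R_T = D_inv @ A_T @ D_inv

# Non-normal disturbances with mean 0 and variance 1.
sigma2 = 1.0
U = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(T, reps))
Y = (Z @ beta)[:, None] + U                    # one series per column

B = np.linalg.solve(A_T, Z.T @ Y)              # column j holds b_T for series j
normalized = D_T @ (B - beta[:, None])         # D_T (b_T - beta)

empirical_cov = np.cov(normalized)             # 2 x 2 sample covariance matrix
limiting_cov = sigma2 * np.linalg.inv(R_T)

# The ratio appearing in Lemma 2.6.1 for the trend variable z_2t = t.
lemma_ratio = (t ** 2).max() / np.sum(t ** 2)
```

For this design the covariance matrix of $D_T(b_T - \beta)$ equals $\sigma^2R_T^{-1}$ for every $T$; what the theorem adds is that the distribution itself approaches normality, which could be examined by inspecting histograms of the rows of `normalized`.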

REFERENCES

The theory of regression is treated more fully in several textbooks; for example, Applied Regression Analysis by N. R. Draper and H. Smith; An Introduction to Linear Statistical Models, Vol. I, by Franklin A. Graybill; The Design and Analysis of Experiments by Oscar Kempthorne; The Advanced Theory of Statistics, Vol. II, by Maurice G. Kendall and Alan Stuart; Principles of Regression Analysis by R. L. Plackett; The Analysis of Variance by Henry Scheffé; Mathematical Statistics by S. S. Wilks; and Regression Analysis by E. J. Williams.

Section 2.2. T. W. Anderson (1958).
Section 2.4. T. W. Anderson (1948).
Section 2.6. Eicker (1963), Loève (1963).

PROBLEMS

1. (Sec. 2.2.) Prove that $b$ defined by (5) minimizes $\sum_{t=1}^{T} (y_t - \tilde\beta'z_t)^2$ with respect to $\tilde\beta$. [Hint: Show that $\sum_{t=1}^{T} (y_t - \tilde\beta'z_t)^2 = \sum_{t=1}^{T} (y_t - b'z_t)^2 + (b - \tilde\beta)'A(b - \tilde\beta)$.]

2. (Sec. 2.2.) Verify (8) and (9).

3. (Sec. 2.2.) Prove the Gauss-Markov Theorem. [Hint: Show that if the components of $\sum_{t=1}^{T} a_ty_t$ are unbiased estimates of the components of $\beta$, where $a_t = A^{-1}z_t + d_t$, $t = 1, \ldots, T$, and $a_1, \ldots, a_T$ and $d_1, \ldots, d_T$ are vectors of $p$ components, then $\sum_{t=1}^{T} d_tz_t' = 0$. Show that the variances of such estimates are the diagonal elements of $\sigma^2A^{-1} + \sigma^2\sum_{t=1}^{T} d_td_t'$.]

4. (Sec. 2.2.) Show that $b$ and $\hat\sigma^2 = (T - p)s^2/T$ are the maximum likelihood estimates of $\beta$ and $\sigma^2$ if $y_1, \ldots, y_T$ are independently normally distributed.

5. (Sec. 2.2.) Show that $b$ and $s^2$ are a sufficient set of statistics for $\beta$ and $\sigma^2$ if $y_1, \ldots, y_T$ are independently normally distributed. [Hint: Show that the density is $(2\pi\sigma^2)^{-T/2}\exp\{-\tfrac{1}{2}[(b - \beta)'A(b - \beta) + (T - p)s^2]/\sigma^2\}$.]

6. (Sec. 2.2.) If $\mathcal{E}y = Z\beta$ and $\mathcal{E}(y - Z\beta)(y - Z\beta)' = \sigma^2I$, prove that the covariance matrix of residuals $y - Zb$ is $\sigma^2(I - ZA^{-1}Z')$.

7. (Sec. 2.2.) In the formulation of Problem 6, prove $\mathcal{E}(y - Zb)(b - \beta)' = 0$.

8. (Sec. 2.2.) Let $A$ (nonsingular) and $B = A^{-1}$ be partitioned similarly as

$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$,  $B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$,

where $A_{11}$ is nonsingular. By solving $AB = I$ according to the partitioning, prove
(i) $B_{12}B_{22}^{-1} = -A_{11}^{-1}A_{12}$,
(ii) $B_{22} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}$.

9. (Sec. 2.3.) Show that if $G = (g_{ij})$ is nonsingular and lower triangular (that is, $g_{ij} = 0$, $i < j$), then $G^{-1}$ is lower triangular.

10. (Sec. 2.3.) Let $\hat y_t$ be the residual of $y_t$ from the sample regression on $z_t^{(1)}$, and let $\hat z_t^{(2)}$ be the residuals of $z_t^{(2)}$ from the formal sample regression on $z_t^{(1)}$. (a) Show $\mathcal{E}\hat y_t = \beta^{(2)\prime}\hat z_t^{(2)}$. (b) Show that the least squares estimate of $\beta^{(2)}$ based on $\hat y_t$ and $\hat z_t^{(2)}$ according to (a) is identical to that based on $y_t$ and $z_t$.

11. (Sec. 2.3.) Prove explicitly and completely that $D = \Gamma$.

12. (Sec. 2.3.) Let $z_t^*$ be the orthogonalization of $z_t$, $t = 1, \ldots, T$, and let $\beta^{*\prime} = (\beta^{*(1)\prime} \;\; \beta^{*(2)\prime})$ be the corresponding transform of $\beta' = (\beta^{(1)\prime} \;\; \beta^{(2)\prime})$. Prove that criterion (12) of Section 2.2 for testing the hypothesis $H: \beta^{(2)} = 0$ is

$\dfrac{\sum_{i=r+1}^{p} a^*_{ii}b_i^{*2}}{(p - r)s^2}$.

13. (Sec. 2.4.) Show algebraically that the difference between (14) and (11) is positive semidefinite. [Hint: The difference is $\sigma^2$ times

$(Z'Z)^{-1}[Z'\Psi Z - Z'Z(Z'\Psi^{-1}Z)^{-1}Z'Z](Z'Z)^{-1}$,

which is positive semidefinite if

$\begin{pmatrix} Z'\Psi^{-1}Z & Z'Z \\ Z'Z & Z'\Psi Z \end{pmatrix} = \begin{pmatrix} Z'\Psi^{-1} \\ Z' \end{pmatrix}\Psi\,(\Psi^{-1}Z \;\; Z)$

is positive semidefinite.]

14. (Sec. 2.5.) Verify that the Gauss-Markov Theorem implies that $b'z_\tau$ is the best linear unbiased estimate of $\beta'z_\tau$.

15. (Sec. 2.5.) Show that if a linear estimate $\sum_{t=1}^{T} k_ty_t$ of an expected value is biased, then the mean square error is unbounded. [Hint: If $\mathcal{E}\sum_{t=1}^{T} k_ty_t \ne \beta'z_\tau$ for $\beta = \gamma$, then consider $\beta = k\gamma$ as $k \to \infty$.]

16. (Sec. 2.5.) Show that one-sided prediction intervals for $y_\tau$ with confidence $1 - \varepsilon$ are given alternatively by

$y_\tau \le b'z_\tau + t_{T-p}(2\varepsilon)\,s\sqrt{1 + z_\tau'A^{-1}z_\tau}$

and

$b'z_\tau - t_{T-p}(2\varepsilon)\,s\sqrt{1 + z_\tau'A^{-1}z_\tau} \le y_\tau$.

17. (Sec. 2.5.) Show that a prediction region for $y_\tau$ and $y_\rho$ ($\tau > T$, $\rho > T$, $\tau \ne \rho$) with confidence $1 - \varepsilon$ is given by

$e'(I + Z_fA^{-1}Z_f')^{-1}e \le 2s^2F_{2,T-p}(\varepsilon)$,

where $e = (b'z_\tau - y_\tau, \; b'z_\rho - y_\rho)'$ and $Z_f = (z_\tau \;\; z_\rho)'$.

18. (Sec. 2.5.) Prove

$z_\tau^{(1)\prime}A_{11}^{-1}z_\tau^{(1)} \le z_\tau'A^{-1}z_\tau$.

[Hint: Use (7).]

19. (Sec. 2.6.) Prove Corollary 2.6.2 using Theorem 2.6.1.

20. (Sec. 2.6.) Prove Corollary 2.6.2 (without use of Theorem 2.6.1) with condition (ii) replaced by (ii') there exists a constant $L$ such that $z_t'z_t \le L$, $t = 1, 2, \ldots$.

21. (Sec. 2.6.) Prove that (15) implies

0, we see that $b^{00}$ is the coefficient of $y_t$. However, the general theory of the least squares method states that the variance of $a_0$ is $\sigma^2b^{00}$. Q.E.D.

If $k = 0$, the variance is $\sigma^2/(2m + 1)$. If $k = 1$, the variance is

(26)    $\dfrac{3(3m^2 + 3m - 1)}{(2m - 1)(2m + 1)(2m + 3)}\sigma^2$.

If $m = k + 1$, the variance is

(27)    $\sigma^2\left[1 - \dbinom{2k+2}{k+1}^2 \Big/ \dbinom{4k+4}{2k+2}\right]$.

In Table 3.4 we give the variances of some smoothed values. For given $k$ the variance goes down as the number of points used increases. For a given number of points (that is, given $m$) the variance increases with $k$.

SEC.

3.3.

53

SMOOTHING

TABLE 3.4
VARIANCES OF SMOOTHED VALUES ($\sigma^2 = 1$)

2m+1   k = 0               k = 1               k = 2               k = 3
       q = 0, 1            q = 2, 3            q = 4, 5            q = 6, 7

  3    1/3  = 0.333
  5    1/5  = 0.200        17/35  = 0.486
  7    1/7  = 0.143        1/3    = 0.333      131/231 = 0.567
  9    1/9  = 0.111        59/231 = 0.255      179/429 = 0.417     797/1287 = 0.619
 11    1/11 = 0.091        89/429 = 0.207      1/3     = 0.333     89/187   = 0.476

In fact for given $m - k$ (which is half the excess of points over implicitly fitted constants) the variance increases with $k$. We notice also that $y_t - y_t^*$, the residual of the observed value from the fitted value, is uncorrelated with $y_t^*$ because in general the estimated regression coefficients are uncorrelated with the residuals. (See Problem 7 of Chapter 2.) Thus

(28)
$$\operatorname{Var}(y_t - y_t^*) = \sigma^2 - \operatorname{Var} y_t^* = \sigma^2(1 - c_0).$$
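The variances in (26), (27), and Table 3.4 can be checked numerically. The following is a minimal sketch, assuming the smoothed value is the fitted value at the central point of a least squares polynomial of degree $2k$ over $s = -m, \ldots, m$ (degree $2k+1$ gives the same central value by symmetry). The function names are illustrative only and do not come from the text.

```python
from math import comb
import numpy as np

def smoothing_coefficients(m, k):
    """Weights c_s, s = -m..m, giving the fitted value at s = 0 of a
    least squares polynomial of degree 2k on 2m + 1 equally spaced points."""
    s = np.arange(-m, m + 1, dtype=float)
    X = np.vander(s, 2 * k + 1, increasing=True)   # columns 1, s, ..., s^(2k)
    # The fitted value at s = 0 is the intercept, the first row of (X'X)^{-1} X'.
    return np.linalg.solve(X.T @ X, X.T)[0]

def variance_factor(m, k):
    """Var(y*_t) / sigma^2, which equals the central weight c_0."""
    return float(smoothing_coefficients(m, k)[m])

if __name__ == "__main__":
    for k in range(4):
        row = [round(variance_factor(m, k), 3) for m in range(k + 1, 6)]
        print("k =", k, row)
```

Because the weights come from a projection, the sum of their squares equals the central weight $c_0$, which is why a single coefficient gives the variance.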

As indicated earlier, successive smoothed values are correlated. As an example, the correlations of $y_t^*$ with $y_{t-1}^*$, $y_{t-2}^*$, $y_{t-3}^*$, and $y_{t-4}^*$ in the case of $k = 1$ and $m = 2$ are

(29)
$$\tfrac{48}{85} = 0.565, \qquad \tfrac{6}{85} = 0.071, \qquad -\tfrac{72}{595} = -0.121, \qquad \tfrac{9}{595} = 0.015,$$

respectively. We shall study this effect more in Chapter 7 after we have developed more powerful mathematical tools.

When $y_t = f(t) + u_t$ and the smoothing formula with coefficients $c_s$ is used, the systematic error in the smoothed value is

(30)
$$\mathscr{E}(y_t - y_t^*) = f(t) - \sum_{s=-m}^{m} c_s f(t+s).$$
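The correlations in (29) and the systematic error (30) can be reproduced directly from the weights. This is a small sketch, assuming uncorrelated $u_t$ with common variance and the $k = 1$, $m = 2$ weights $(-3, 12, 17, 12, -3)/35$; the helper names are hypothetical.

```python
import numpy as np

# Weights for k = 1, m = 2 (quadratic or cubic fitted to 5 points).
c = np.array([-3, 12, 17, 12, -3]) / 35.0

def smoothed_autocorrelation(c, lag):
    """Correlation of y*_t and y*_{t-lag} when the u_t are uncorrelated."""
    n = len(c)
    if lag >= n:
        return 0.0
    # Overlapping weights of the two moving windows.
    return float(np.dot(c[lag:], c[:n - lag]) / np.dot(c, c))

def systematic_error(f, t, c):
    """f(t) - sum_s c_s f(t + s), the bias of the smoothed value at t."""
    m = (len(c) - 1) // 2
    s = np.arange(-m, m + 1)
    return float(f(t) - np.dot(c, f(t + s)))

if __name__ == "__main__":
    print([round(smoothed_autocorrelation(c, h), 3) for h in range(1, 5)])
    print(systematic_error(lambda x: x ** 3, 2.0, c))   # cubic trend
    print(systematic_error(lambda x: x ** 4, 0.0, c))   # quartic trend
```

A cubic trend is reproduced without bias, while a quartic trend leaves a nonzero systematic error, in line with the remark that the error vanishes only when the trend has degree at most that of the fitted polynomial.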

If the smoothing formula is based on a polynomial of degree q and the trend is a polynomial of this degree (or less), the systematic error is 0; otherwise there will be some systematic error. Suppose k = 0 (q = 0 or 1) and the coefficients are the same; then the systematic error is

54

TRENDS AND SMOOTHING

CH.

3.

(31)
$$f(t) - \frac{1}{2m+1}\sum_{s=-m}^{m} f(t+s),$$

the difference between $f(t)$ and the arithmetic mean of the neighboring values. Suppose $f(t+s)$, $s = -m, \ldots, m$, is written in terms of the orthogonal polynomials to degree $2m$ (orthogonal over the range $-m, \ldots, m$) as

(32)
$$f(t+s) = \gamma_0 + \gamma_1\phi_{1,2m+1}(s) + \cdots + \gamma_{2m}\phi_{2m,2m+1}(s), \qquad s = -m, \ldots, m.$$

Then use of the smoothing formula based on a polynomial of degree $2k$ or $2k+1$ has systematic error

(33)
$$\mathscr{E}(y_t - y_t^*) = \gamma_{2k+2}\phi_{2k+2,2m+1}(0) + \gamma_{2k+4}\phi_{2k+4,2m+1}(0) + \cdots + \gamma_{2m}\phi_{2m,2m+1}(0),$$

since the fitted polynomial consists of the terms of (32) to degree $2k$ or $2k+1$ and $\phi_{i,2m+1}(0) = 0$ for $i$ odd. (See Problem 30.)

A mean value function we shall study in Chapter 4 is $f(t) = \cos(\lambda t - \theta)$. This is a cosine function with period $2\pi/\lambda$. If the coefficients are $c_s = 1/(2m+1)$, the expected value of the smoothed variate is

(34)
$$\mathscr{E}y_t^* = \frac{1}{2m+1}\sum_{s=-m}^{m}\cos[\lambda(t+s) - \theta] = \frac{\sin[\lambda(2m+1)/2]}{(2m+1)\sin(\lambda/2)}\,\cos(\lambda t - \theta).$$
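The effect of equal-weight smoothing on a cosine trend can be checked numerically. The sketch below assumes $f(t) = \cos(\lambda t - \theta)$ and weights $1/(2m+1)$, and compares the direct average with the closed form obtained from the geometric sum (the route suggested in Problem 31); it requires $\sin(\lambda/2) \ne 0$. The function names are illustrative.

```python
import numpy as np

def moving_average_of_cosine(lam, theta, t, m):
    """Direct average of cos(lam*(t+s) - theta) over s = -m..m."""
    s = np.arange(-m, m + 1)
    return float(np.mean(np.cos(lam * (t + s) - theta)))

def closed_form(lam, theta, t, m):
    """Same quantity via the geometric-sum identity (needs sin(lam/2) != 0)."""
    n = 2 * m + 1
    damping = np.sin(n * lam / 2.0) / (n * np.sin(lam / 2.0))
    return float(damping * np.cos(lam * t - theta))

if __name__ == "__main__":
    lam, theta = 2 * np.pi / 12, 0.3      # period 12, arbitrary phase
    for m in (1, 2, 6):
        print(m, moving_average_of_cosine(lam, theta, 5, m), closed_form(lam, theta, 5, m))
```

The smoothed cosine keeps its frequency and phase but its amplitude is multiplied by the damping factor, which can be negative or zero for some window lengths.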

Consideration of (28) for $q = 0, 1, \ldots, p$ in turn shows $d_0 = 0$, $d_1 = 0$, $\ldots$, $d_p = 0$, respectively. This proves the lemma.

COROLLARY 3.4.1. The residual (26) of a smoothing formula of $2m + 1$ terms based on a polynomial of degree $2k$ or $2k + 1$ is a linear combination of $\Delta^{2k+2}, \ldots, \Delta^{2m}$ operating on $y_{t-m}$.

In the case of $m = k + 1$, the operator is of degree $2m = 2k + 2$ and hence must be a constant multiple of $(\mathscr{P} - 1)^{2k+2} = \Delta^{2k+2}$, say $C'\Delta^{2k+2}$. We now evaluate the constant by comparing two formulas for the variance of

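$$y_t - \sum_{s=-(k+1)}^{k+1} c_s y_{t+s}.$$

We have

$$\operatorname{Var}\left(C'\Delta^{2k+2}y_{t-(k+1)}\right) = C'^2\sigma^2\binom{4k+4}{2k+2}$$

from (22). Furthermore

$$\operatorname{Var}\Big(y_t - \sum_{s=-(k+1)}^{k+1} c_s y_{t+s}\Big) = \sigma^2 - \sigma^2 c_0$$

follows from Theorem 3.3.1 and the fact that the estimated regression and the residual from the estimated regression are uncorrelated. Since $1 - c_0$ is the coefficient of $x^{k+1}$ in $C'(x - 1)^{2k+2}$ we have

(33)
$$1 - c_0 = (-1)^{k+1}C'\binom{2k+2}{k+1}.$$

Thus

(34)
$$C' = (-1)^{k+1}\binom{2k+2}{k+1}\bigg/\binom{4k+4}{2k+2},$$

(35)
$$c_{-s} = c_s = (-1)^{s+1}\binom{2k+2}{k+1}\binom{2k+2}{k+1+s}\bigg/\binom{4k+4}{2k+2}, \qquad s = 1, \ldots, k+1,$$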

which were given in (19), (20), and (21) of Section 3.3.

3.4.4. Estimating the Error Variance

We have shown that if the trend $f(t)$ is approximately a polynomial of low degree in short intervals a difference sequence has mean approximately 0. If these means are exactly 0,

3.4.

SEC.

67

THE VARIATE DIFFERENCE METHOD

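$$V_r = \frac{\sum_{t=1}^{T-r}(\Delta^r y_t)^2}{(T-r)\binom{2r}{r}}$$

is an unbiased estimate of $\sigma^2$. In fact, whatever the trend,

(37)
$$\mathscr{E}V_r = \sigma^2 + \frac{\sum_{t=1}^{T-r}[\Delta^r f(t)]^2}{(T-r)\binom{2r}{r}}.$$

Then the variance of $V_r$ when $\Delta^r f(t) = 0$, $t = 1, \ldots, T - r$, is $K_r^2$ times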

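LEMMA 3.4.3. Let $\sum_{i,j} a_{ij}u_iu_j = u'Au$, where $A$ is symmetric and $\mathscr{E}u_i = 0$, $\mathscr{E}u_i^2 = \sigma^2$, $\mathscr{E}u_iu_j = 0$, $i \ne j$, $\mathscr{E}u_i^4 = \kappa_4 + 3\sigma^4$, $\mathscr{E}u_i^2u_j^2 = \sigma^4$, $i \ne j$, and $\mathscr{E}u_iu_ju_ku_l = 0$ unless the subscripts are equal in pairs. Then

(39)
$$\mathscr{E}\sum_{i,j} a_{ij}u_iu_j = \sigma^2\sum_{i} a_{ii} = \sigma^2\operatorname{tr}A,$$

(40)
$$\mathscr{E}\Big(\sum_{i,j} a_{ij}u_iu_j\Big)^2 = \sum_{i,j,k,l} a_{ij}a_{kl}\,\mathscr{E}u_iu_ju_ku_l = \kappa_4\sum_{i} a_{ii}^2 + \sigma^4\Big[\Big(\sum_{i} a_{ii}\Big)^2 + 2\sum_{i,j} a_{ij}^2\Big].$$

LEMMA 3.4.4. The variance of $\sum_{i,j} a_{ij}u_iu_j$ under the conditions of Lemma 3.4.3 is

(41)
$$\operatorname{Var}\Big(\sum_{i,j} a_{ij}u_iu_j\Big) = \kappa_4\sum_{i} a_{ii}^2 + 2\sigma^4\sum_{i,j} a_{ij}^2 = \kappa_4\sum_{i} a_{ii}^2 + 2\sigma^4\operatorname{tr}A^2.$$

If the $u_t$'s are normally distributed, $\kappa_4 = 0$ and the first term in the variance is 0. Let

(42)
$$\sum_{t=1}^{T-r}(\Delta^r y_t)^2 = \sum_{i,j=1}^{T} a_{ij}^{(r)}y_iy_j,$$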

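and let $A_r = (a_{ij}^{(r)})$. Then $A_0 = I$,

(43)
$$A_1 = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ 0 & -1 & 2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{pmatrix},$$

(44)
$$A_2 = \begin{pmatrix} 1 & -2 & 1 & 0 & 0 & \cdots \\ -2 & 5 & -4 & 1 & 0 & \cdots \\ 1 & -4 & 6 & -4 & 1 & \cdots \\ 0 & 1 & -4 & 6 & -4 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \end{pmatrix},$$

(45)
$$A_3 = \begin{pmatrix} 1 & -3 & 3 & -1 & 0 & 0 & 0 & \cdots \\ -3 & 10 & -12 & 6 & -1 & 0 & 0 & \cdots \\ 3 & -12 & 19 & -15 & 6 & -1 & 0 & \cdots \\ -1 & 6 & -15 & 20 & -15 & 6 & -1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \end{pmatrix},$$

(46)
$$A_4 = \begin{pmatrix} 1 & -4 & 6 & -4 & 1 & 0 & 0 & 0 & 0 & \cdots \\ -4 & 17 & -28 & 22 & -8 & 1 & 0 & 0 & 0 & \cdots \\ 6 & -28 & 53 & -52 & 28 & -8 & 1 & 0 & 0 & \cdots \\ -4 & 22 & -52 & 69 & -56 & 28 & -8 & 1 & 0 & \cdots \\ 1 & -8 & 28 & -56 & 70 & -56 & 28 & -8 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \end{pmatrix}.$$

In each matrix the rows at the lower right repeat those at the upper left in reverse order.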
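The matrices $A_r$ are easy to generate by machine, which also gives an exact check of variance calculations based on Lemma 3.4.4. The sketch below assumes $A_r = D_r'D_r$, where $D_r$ is the $(T-r) \times T$ matrix that forms $r$th differences, and $V_r = S_r/[(T-r)\binom{2r}{r}]$ with $f(t) = 0$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb
import numpy as np

def difference_matrix(T, r):
    """(T - r) x T integer matrix D with (D y)_t = Delta^r y_t."""
    D = np.zeros((T - r, T), dtype=np.int64)
    coeffs = [(-1) ** (r - j) * comb(r, j) for j in range(r + 1)]
    for t in range(T - r):
        D[t, t:t + r + 1] = coeffs
    return D

def A_matrix(T, r):
    """Matrix of the quadratic form sum_t (Delta^r y_t)^2 = y' A_r y."""
    D = difference_matrix(T, r)
    return D.T @ D

def var_V_coefficients(T, r):
    """Exact coefficients of kappa_4 and sigma^4 in Var V_r (Lemma 3.4.4)."""
    A = A_matrix(T, r)
    norm = ((T - r) * comb(2 * r, r)) ** 2
    k4 = Fraction(int(np.sum(np.diag(A) ** 2)), norm)   # sum of a_ii^2
    s4 = Fraction(2 * int(np.sum(A * A)), norm)         # 2 tr A^2 (A symmetric)
    return k4, s4

if __name__ == "__main__":
    print(A_matrix(8, 2))
    print(var_V_coefficients(20, 3))
```

Working in integers and `Fraction` keeps the end effects exact, which is where the rows near the two ends of $A_r$ differ from those in the middle.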

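Also $V_r = K_rS_r$. When $f(t) = 0$, we compute

(47)
$$\operatorname{Var}V_0 = \frac{1}{T}\left(\kappa_4 + 2\sigma^4\right),$$

(48)
$$\operatorname{Var}V_1 = \frac{1}{T-1}\left(\left[1 - \frac{1}{2(T-1)}\right]\kappa_4 + 2\left[\frac{3}{2} - \frac{1}{2(T-1)}\right]\sigma^4\right),$$

(49)
$$\operatorname{Var}V_2 = \frac{1}{T-2}\left(\left[1 - \frac{5}{9(T-2)}\right]\kappa_4 + 2\left[\frac{35}{18} - \frac{1}{T-2}\right]\sigma^4\right),$$

(50)
$$\operatorname{Var}V_3 = \frac{1}{T-3}\left(\left[1 - \frac{69}{100(T-3)}\right]\kappa_4 + 2\left[\frac{231}{100} - \frac{3}{2(T-3)}\right]\sigma^4\right),$$

(51)
$$\operatorname{Var}V_4 = \frac{1}{T-4}\left(\left[1 - \frac{194}{245(T-4)}\right]\kappa_4 + 2\left[\frac{1287}{490} - \frac{2}{T-4}\right]\sigma^4\right),$$

where $\kappa_4$ is the fourth cumulant. To find a general formula for the variance of $V_r$ is difficult because the elements on each diagonal of $A_r$ at the two ends are different from the elements nearer the middle. We have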


Then $a_{ss}^{(r)}$ is the coefficient of $y_s^2$ in (52), which is

(53)
$$a_{ss}^{(r)} = \sum_{j=0}^{r}\binom{r}{j}^2 = \binom{2r}{r}, \qquad s = r+1, \ldots, T-r.$$

The first sum is evaluated from the identity

(54) = (z - 1)'(1

- zy

being the coefficient of zr. Other nonzero elements are

(55) s=r

fork= 1, ..., r.

- k + 1 , . . .,T - r,

s = 1,.

. .,r - k ,

The variances can now be evaluated, but the calculation is tedious. We have for even r , say r = 2n,


because a;ls+l,T--s+*- a"' I ( and fr)

(57)

ar+l-s.r+l--s=

Similarly, if r = 2n

+ 1, we have

(y ) -

a:).

s = 1,

..., r.

because

(59)

r=2n+1.

In each case the leading term is ( T - r ) We also find that for k = 1 , , . . ,r

(7 .

T-k

(See Problem 55.) Then the leading term in ~ ~ , = , [ uis~ ; ) ] ~

r

- k odd.


For r = 2n the other term is

:r-

Kendall and Stuart [(1966), p. 389) give this other term as

Thus we have [for Arf(t) = 0 ) (63) Var Sr= [(T

- r)(

+ 2[(T -

-/3,

=

--

0)

- /3,]u4,

where $\kappa_4$ is the fourth cumulant and $\alpha_r$ is deduced from (56) or (58). [Kendall and Stuart (1966), p. 389, also give an expression for $\alpha_r$.] If $T$ is large, the variance of $V_r$ is approximately

If the $u_t$'s are normally distributed, $\kappa_4 = 0$. Since the variance goes to 0 as $T$ increases, $V_r$ is a consistent estimate of $\sigma^2$. It can be shown that when $\kappa_4$ exists and the $u_t$'s are independently and identically distributed


has a limiting normal distribution with mean 0 and variance 1. (See Section 3.4.5.) If $\kappa_4 = 0$, the variance of $\sqrt{T-r}\,V_r$ is asymptotically

$$2\sigma^4\binom{4r}{2r}\bigg/\binom{2r}{r}^2,$$

and this is a measure of the efficiency of $V_r$ as an estimate of $\sigma^2$. As is to be expected, the variance of $V_r$ increases with $r$.

LEMMA 3.4.5. The covariance of the two (symmetric) quadratic forms $\sum_{i,j} a_{ij}u_iu_j = u'Au$ and $\sum_{k,l} b_{kl}u_ku_l = u'Bu$ under the conditions of Lemma 3.4.3 is

$$\kappa_4\sum_{i} a_{ii}b_{ii} + 2\sigma^4\operatorname{tr}AB.$$

It can be shown that the covariance between $V_q$ and $V_r$ is approximately

The exact formula differs from this one by terms which result from the fact that the coefficients of terms near the ends of the series are different from those in the middle. Kendall and Stuart [(1966), p. 389] give the covariance exactly for the case of $q = r + 1$.

Quenouille (1953) has pointed out that a linear combination $c_pV_p + \cdots + c_qV_q$ with $c_p + \cdots + c_q = 1$ is also an unbiased estimate of $\sigma^2$ [if $\Delta^pf(t) = 0$]. If $p = 0$, then $V_0 = \sum_{t=1}^{T} y_t^2/T$ is the best unbiased estimate of $\sigma^2$ because then $f(t) = 0$ and $V_0$ is a sufficient statistic for $\sigma^2$ under normality. [If $p = 1$, and hence $f(t)$ is constant, then $\bar{y}$ and $\sum_{t=1}^{T}(y_t - \bar{y})^2$ are a sufficient set of statistics under normality.] For $p \ge 1$ Quenouille raised the question of what linear combination has minimum variance for various $p$ and $q$ when $\kappa_4 = 0$. If $p = 1$ and $q = 4$, the "best" coefficients are $c_1 = 7.5$, $c_2 = -20.25$, $c_3 = 22.50$, and $c_4 = -8.75$; the variance of this is 3/4 the variance of $V_1$. It turns out that these linear combinations have coefficients that alternate in sign. Thus there is no assurance that the estimate will be positive; in fact, since the coefficients are numerically large relative to 1, one can expect that the probability of a negative estimate of the variance is not at all negligible.
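The "best" coefficients quoted for $p = 1$ and $q = 4$ can be recovered numerically. The sketch below rests on an assumption that is not taken verbatim from the text: for large $T$ and $\kappa_4 = 0$, the covariance of $V_q$ and $V_r$ is taken proportional to $\binom{2q+2r}{q+r}/[\binom{2q}{q}\binom{2r}{r}]$, which is what Lemma 3.4.5 gives when end effects are ignored. The minimum-variance weights with unit sum are then proportional to the solution of $Mc = \mathbf{1}$. Function names are illustrative.

```python
from math import comb
import numpy as np

def asymptotic_cov_factor(q, r):
    """Assumed large-T value of T * Cov(V_q, V_r) / (2 sigma^4) when kappa_4 = 0."""
    return comb(2 * (q + r), q + r) / (comb(2 * q, q) * comb(2 * r, r))

def best_combination(p, q):
    """Unit-sum weights c_p..c_q of minimum variance, and the ratio of the
    resulting variance to that of V_p alone."""
    idx = list(range(p, q + 1))
    M = np.array([[asymptotic_cov_factor(i, j) for j in idx] for i in idx])
    w = np.linalg.solve(M, np.ones(len(idx)))
    c = w / w.sum()                       # impose c_p + ... + c_q = 1
    return c, float(c @ M @ c / M[0, 0])

if __name__ == "__main__":
    c, ratio = best_combination(1, 4)
    print(np.round(c, 4), round(ratio, 4))
```

The alternating signs and large magnitudes of the weights are visible in the output, which is the source of the warning that the combined estimate can be negative.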


In formulating this problem one is led to the question of why limit the number of $V_r$'s and, more important, why restrict the statistician only to sums of squares of the variate differences? What is desired is the best estimate of $\sigma^2$ when the trend is "smooth." The difficulty is in giving a precise enough definition of "smooth" so that the problem of best estimate is mathematically defined. If smoothness is defined so that it implies the trend is a polynomial of degree $q$, then the best estimate of $\sigma^2$ is the one given in Section 3.2. Any other definition of smoothness is either too vague or too complicated.

Kendall [in Problem 30.8 (1946a)] has proposed modifying the variate difference procedure by inserting dummy variables $y_{-r+1} = y_{-r+2} = \cdots = y_0 = 0$ and $y_{T+1} = \cdots = y_{T+r} = 0$ and computing the sum of squares of $\Delta^ry_t$ from $t = -r + 1$ to $T$. Then there are simple relations between these sums of squares and sums of cross-products $\sum_t y_ty_{t+j}$. Quenouille (1953) has proposed another modification that involves changing the end-terms. His $m$th modified sum, say, is an average of two $(m-1)$st modified sums as

where (70)

Modified sums of $(\Delta^ry_t)^2$ and $y_ty_{t+j}$ can then be used and related. Another modification that leads to some simplification is using $A_1^r$ instead of $A_r$. (See Chapter 6.)

3.4.5. Determination of the Degree of Smoothness of a Trend

One may be interested in questions of whether the trend has a certain degree of smoothness. For instance, the choice of a smoothing formula depends on this feature. That is, one may want to know whether a trend is smooth in the sense that polynomials of specified degree q are adequate to represent the trend in each interval of time. This corresponds to a problem of testing the hypothesis that a given degree is suitable against the alternative hypothesis that a greater degree is needed. We can also consider the multiple-decision problem of determining the appropriate degree (between a specified m and a specified q ) of a polynomial as an approximation of the trend. If one considers an exact fit the only adequate representation, then these questions are equivalent to those treated in Section 3.2 on polynomial trends. If one permits a more generous, but unspecified, interpretation of adequate


representation these problems are not mathematically defined. It is conceivable that one specify the maximum allowable discrepancy between the actual trend and the polynomial approximation, but the resulting mathematical statistical problems would be intractable. We might note that the general theory of multiple-decision problems, which is the generalization of that given in Section 3.2, is not applicable here.

We shall now consider these problems restricted to the use of the sums of squares of variate differences. From (37) we see that the expected value of $V_r$ depends on $\sum_{t=1}^{T-r}[\Delta^rf(t)]^2$. This is 0 if $f(t)$ is a polynomial of degree less than $r$ and is close to 0 if $f(t)$ is approximately a polynomial of degree less than $r$ for every succession of $r + 1$ values of $t$. These facts suggest that the $V_r$'s are suitable for the problems of statistical inference mentioned. To decide whether $\Delta^rf(t)$ is approximately 0 for all available $t$ [that is, that $(r-1)$-degree polynomials are adequate representation] against the alternative that $\Delta^{r+1}f(t)$ is approximately 0 one could see whether $V_r$ was not much larger than $V_{r+1}$.

The smoothing formulas considered in Sections 3.3 and 3.4 were based on the assumption that a polynomial of degree $2k$ or $2k + 1$ was an adequate representation of the trend over $2m + 1$ successive time points. In particular we noted that when $m = k + 1$ the bias in estimating the trend was $C'\Delta^{2k+2}f(t)$. Thus the questions we are studying here correspond to problems about the suitability of smoothing formulas.

Consider testing the hypothesis $\Delta^rf(t) = 0$ under the assumption that $\Delta^qf(t) = 0$ ($q > r$). We shall reject this hypothesis if $V_r$ is much larger than $V_q$. The procedure can be based on

From (68) we see that the variance of the numerator $\sqrt{T-q}\,(V_r - V_q)$ is approximately


When the hypothesis is true, the expected value of the numerator of (71) is 0, and otherwise it is positive. We note that we can write

Each of the last two terms multiplied by $\sqrt{T-q}$ converges stochastically to 0. (Note the factor in front of the second term is of order $1/T^2$.) The first term is the average of $T - q$ terms. If the $u_t$'s are independently and identically distributed these terms are identically distributed but not independently distributed. However, the sequence of terms forms a stationary stochastic process (see Chapter 7) and terms more than $q$ apart are independent; this is called a finitely dependent stationary stochastic process. Theorem 7.7.5 asserts that $\sqrt{T-q}\,(V_r - V_q)$ has a limiting normal distribution. Since $V_q$ is a consistent estimate of $\sigma^2$ (whether the null hypothesis is true or not), (71) has a limiting normal distribution with variance given by (72) with $\sigma^4$ omitted.

THEOREM 3.4.1. If $\Delta^rf(t) = 0$, $t = 1, \ldots, T - r$, and $u_1, u_2, \ldots, u_T$ are independently and identically distributed with $\mathscr{E}u_t = 0$ and $\mathscr{E}u_t^4 < \infty$, then

0 has a rejection region of the form

xi-,,

+

.

xEn+2

t

+

.

[Hint: Since Problem 10 shows that the rejection region satisfies (i), consider the x:. Compare the intersection of x1 = density at given x,, . . . , x,, and clr . . . ,x, = c,, + . . . + z$ = c with the region R above and the intersection with any other region R* satisfying (i).]

zz,+,

13. (Sec. 3.2.2.) Prove that in the formulation of Problem I I the uniformly most powerful test of the hypothesis P , , + ~ = 0 at significance level E against alternatives P,,+~ > 0 has a rejection region (iv) of Problem 12.


14. (Sec. 3.2.2.) Let $x_1, \ldots, x_n$ be normally independently distributed with means 0 and variances 1, and let $S$ be independently distributed as $\chi^2$ with $m$ degrees of freedom. Let

$$t_i = \frac{x_i}{\sqrt{\dfrac{x_{i+1}^2 + \cdots + x_n^2 + S}{m + n - i}}}, \qquad i = 1, \ldots, n.$$

Prove that $t_1, \ldots, t_n$ are independently distributed according to $t$-distributions, the distribution of $t_i$ having $m + n - i$ degrees of freedom. [Hint: Transform the density of $x_1, \ldots, x_n$, and $S$ to that of $t_1, \ldots, t_n$, and $u = x_1^2 + \cdots + x_n^2 + S$.]

15. (Sec. 3.2.2.) Show that (19) and (20) are equivalent.

16. (Sec. 3.2.2.) Show that $a_{ii}$ defined by (22) is

$$\frac{(i!)^4\,T(T^2 - 1)(T^2 - 4)(T^2 - 9)\cdots(T^2 - i^2)}{(2i)!\,(2i+1)!}.$$

17. (Sec. 3.2.2.) Give the fitted trend of Example 3.1 as a polynomial in $t$ (where $t$ is the year less 1918).

18. (Sec. 3.2.2.)

Dow Jones Industrial Stock Price Averages

Year   Price      Year   Price      Year   Price
1897   45.5       1903   55.5       1909   92.8
1898   52.8       1904   55.1       1910   84.3
1899   71.6       1905   80.3       1911   82.4
1900   61.4       1906   93.9       1912   88.7
1901   69.9       1907   74.9       1913   79.2
1902   65.4       1908   75.6

Using these data do the following:
(a) Graph the data.
(b) Assuming the least squares model with $\mathscr{E}y_t = \gamma_0 + \gamma_1\phi_{1T}(t) + \gamma_2\phi_{2T}(t)$, where $\phi_{iT}(t)$ are the orthogonal polynomials, find the estimates of $\gamma_0$, $\gamma_1$, and $\gamma_2$.
(c) Find a confidence region for $\gamma_2$ with a confidence coefficient $1 - \varepsilon = .95$.
(d) Test the hypothesis $H$: $\gamma_2 = 0$ at the .05 level of significance.
(e) If $\mathscr{E}y_t = \beta_0 + \beta_1t + \beta_2t^2$, find the least squares estimates of $\beta_0$, $\beta_1$, and $\beta_2$. [Hint: These estimates can be found from (b).]
(f) Compute and graph the residuals from the estimated linear trend $c_0 + c_1\phi_{1T}(t)$.

19. (Sec. 3.2.2.) Suppose the polynomial regression is

$$\mathscr{E}y_t = \beta_0 + \beta_1t + \cdots + \beta_qt^q$$

and the investigator wants to determine the degree of polynomial according to the method of Section 3.2.2. Show how the computation can be done efficiently by using the forward solution of the normal equations.


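20. (Sec. 3.3.1.) In (1) let $a = 1$ and $k = 3$.
(a) Tabulate and graph $f(t)$.
(b) Verify that the two constituent functions define $df/dt$ consistently at $t = 2k = 6$.
(c) Find the parabola that approximates $f(t)$ best over the range $t = 0, 1, \ldots, 12$ in the sense of minimizing the sum of squared deviations.
(d) Tabulate and graph this parabola.
(e) Tabulate the deviations.
(f) What parabola fits best at $t = 5, 6, 7$ and what are its values at these points?
(g) What is the value of the first constituent parabola at $t = 12$?
(h) What is the best fitting parabola at $t = 4, 5, 6, 7, 8$ and what are its values at these points?
(i) What is the best fitting cubic over the range $t = 0, 1, \ldots, 12$?
(j) Tabulate and graph the cubic.

21. (Sec. 3.3.1.) If $\sum_{s=-m}^{m} c_s = 1$, show that

$$\sum_{s=-m}^{m} c_s^2 \ge \frac{1}{2m+1}.$$

If, furthermore, $c_s \ge 0$, show that

$$\sum_{s=-m}^{m} c_s^2 \le 1$$

and that equality holds only if all the $c_s$'s but one are 0.

22. (Sec. 3.3.1.) Find $a_0$ when $k = 2$. [Hint: Find the orthogonal polynomials $\phi_{i,2m+1}(s)$, $i = 0, 2, 4$, defined over $s = -m, \ldots, m$ and represent $a_0$ in terms of the fitted linear combination of these.]

23. (Sec. 3.3.1.) Smooth the series in Example 3.1 with the procedure based on $k = 1$ and $m = 2$. Compare with the fitted cubic polynomial.

24. (Sec. 3.3.1.) (a) Smooth the series in Problem 18 with the procedure based on $k = 0$ and $m = 3$. (b) Smooth with the procedure based on $k = 1$ and $m = 2$. (c) Compare the smoothed series with the fitted trend in Problem 18.

25. (Sec. 3.3.1.) Let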

and that equality holds only i f all the c,'s but one are 0. 22. (Sec. 3.3.1.) Find a, when k = 2. [Hint: Find the orthogonal polynomials ,(s), i = 0 , 2, 4, defined over s = -tn, . . . , m and represent a, in terms o f the fitted linear combination of these.] 23. (Sec. 3.3.1.) Smooth the series in Example 3.1 with the procedure based on k = I and In = 2. Compare with the fitted cubic polynomial. 24. (Sec. 3.3. I .) (a) Smooth the series in Problem 18 with the procedure based on k = 0 and n i = 3. (b) Smooth with the procedure based on k = 1 and m = 2. (c) Compare the smoothed series with the fitted trend i n Problem 18. 25. (Sec. 3.3.1.) Let

#cZm,

I I

C =

I

1

'.'

I

z2 32 . . .

in2

34

...

tn4

3Zh

...

,$k

124

. . .

. . .

1

22h

. . .

87

PROBLEMS

1

Prove that the matrix of coefficients in (13) is 1 0

...

0 0

. .

+ 2CC’.

. . . .

0 0

.. .. ..

0

26. (Sec. 3.3.1.) In Problem 25 let D = 2CC’ and let D,,,, be the cofactor of d,. Prove

27. (Sec. 3.3.1.) Verify the entries in Table 3.3. 28. (Sec.3.3.2.) Find a confidence interval for a. at r given an independent estimate s2 of u2, where s2/u2 has a X2-distribution with n degrees of freedom. 29. (Sec. 3.3.2.) Let the orthogonal polynomials4~,,+,(s) be defined over -m, . , m. Find +:,&) and +t,,(s). 30. ( S e c . 3.3.2.) Let theorthogonal polynomials $r2m+l(s)bedefinedover -m, . ,m. Prove +&,+,(O) = 0 for i odd. [Hint: Let +*(s) = co cls * * * c,,ss” sZrf1. Prove that 4*(s)s2j = 0, j = 0 , 1 , . . . , r , yield a set of r 1 homogeneous equations in the r + 1 unknowns co, c2, . . . , car with nonsingular coefficient matrix.] 31. (Sec. 3.3.2.) Verify (34). [Hint: Let cos [ A ( t s) - O] = g e x p [ i ( h It - O)], where g exp [iy] is the real part of eiY,and use the formula for a geometric xa = (1 - xm)/(l - z), x # 1. for x = eiy.1 sum 32. (Sec. 3.3.2.) Prove

+

zr=n=_m

+

+

+

.. .. +

+

+

zs

where $F_{2,T-p}(\varepsilon)$ is the $\varepsilon$-significance point of the $F$-distribution with 2 and $T - p$ degrees of freedom. This confidence region consists of the circumference and interior of the circle with center $[a(k_j), b(k_j)]$ and radius $\sqrt{4s^2F_{2,T-p}(\varepsilon)/T}$. The points of the circumference and interior of this circle in polar coordinates, $\rho^* = \sqrt{\alpha^{*2} + \beta^{*2}}$ and $\theta^* = \arctan(\beta^*/\alpha^*)$, constitute a confidence region for the amplitude $\rho(k_j)$ and phase $\theta(k_j)$. The minimum and maximum $\rho^*$ in the circle are the end points of the interval

(39)
$$R(k_j) - \sqrt{4s^2F_{2,T-p}(\varepsilon)/T} \le \rho^* \le R(k_j) + \sqrt{4s^2F_{2,T-p}(\varepsilon)/T};$$

the interval (39) constitutes a confidence interval for $\rho(k_j)$ with confidence greater than $1 - \varepsilon$. [If the lower limit of (39) is negative, it can be replaced by

114

CH. 4.

CYCLICAL TRENDS

0.] Using the noncentral $F$-distribution of (27), we could obtain a confidence interval on the noncentrality parameter $T\rho^2(k_j)/(2\sigma^2)$, but this is not very useful since $\sigma^2$ is unknown.

If $R(k_j) < \sqrt{4s^2F_{2,T-p}(\varepsilon)/T}$, then the origin is in the confidence circle and $\rho^* = 0$ is admitted; only the upper limit in (39) is effective. In this case the null hypothesis $\rho(k_j) = 0$ is not rejected at significance level $\varepsilon$. For every angle $\theta^*$ there are points in the confidence circle. However, if the origin is not in the confidence circle, one can define a confidence interval for $\theta(k_j)$ by including all angles $\theta^*$ in the circle; that is, one draws the two tangents to the circle that go through the origin and the angles of these are the endpoints of the confidence interval for $\theta(k_j)$. In this case the length of the interval for $\theta$ is less than $\pi$. The procedure yields confidence greater than $1 - \varepsilon$.

Another approach is to use the fact that $a(k_j)\tan\theta - b(k_j)$ is normally distributed with mean $\alpha(k_j)\tan\theta - \beta(k_j)$ and variance $2\sigma^2(1 + \tan^2\theta)/T = 2\sigma^2\sec^2\theta/T$. If $\theta = \theta(k_j)$, the mean is 0; then

(40)
$$\Pr\left\{\frac{[a(k_j)\tan\theta - b(k_j)]^2}{\frac{2}{T}s^2(1 + \tan^2\theta)} \le F_{1,T-p}(\varepsilon)\right\} = 1 - \varepsilon.$$

The event in the braces is

(41)
$$\left[a^2(k_j) - \tfrac{2}{T}s^2F_{1,T-p}(\varepsilon)\right]\tan^2\theta - 2a(k_j)b(k_j)\tan\theta + \left[b^2(k_j) - \tfrac{2}{T}s^2F_{1,T-p}(\varepsilon)\right] \le 0.$$

Inversion of (41) to obtain limits on $\tan\theta$ gives

SEC.

4.3.

115

WHEN PERIODS ARE DIVISORS OF SERIES LENGTH

(42)
$$\frac{a(k_j)b(k_j) - \sqrt{\frac{2}{T}s^2F_{1,T-p}(\varepsilon)\left[R^2(k_j) - \frac{2}{T}s^2F_{1,T-p}(\varepsilon)\right]}}{a^2(k_j) - \frac{2}{T}s^2F_{1,T-p}(\varepsilon)} \le \tan\theta \le \frac{a(k_j)b(k_j) + \sqrt{\frac{2}{T}s^2F_{1,T-p}(\varepsilon)\left[R^2(k_j) - \frac{2}{T}s^2F_{1,T-p}(\varepsilon)\right]}}{a^2(k_j) - \frac{2}{T}s^2F_{1,T-p}(\varepsilon)}$$

if the denominator is positive (for then the quantity under the radical is nonnegative), and

(43)
$$\tan\theta \le \frac{a(k_j)b(k_j) + \sqrt{\frac{2}{T}s^2F_{1,T-p}(\varepsilon)\left[R^2(k_j) - \frac{2}{T}s^2F_{1,T-p}(\varepsilon)\right]}}{a^2(k_j) - \frac{2}{T}s^2F_{1,T-p}(\varepsilon)} \quad\text{or}\quad \tan\theta \ge \frac{a(k_j)b(k_j) - \sqrt{\frac{2}{T}s^2F_{1,T-p}(\varepsilon)\left[R^2(k_j) - \frac{2}{T}s^2F_{1,T-p}(\varepsilon)\right]}}{a^2(k_j) - \frac{2}{T}s^2F_{1,T-p}(\varepsilon)}$$

if the denominator is negative and the quantity under the radical is nonnegative. If the quantity under the radical is negative, the two ratios are imaginary. In that case (41) is satisfied by all $\theta$ since $a^2(k_j) < 2s^2F_{1,T-p}(\varepsilon)/T$. The confidence region for $\tan\theta$ then is (42) when the denominator is positive and (43) when the denominator is negative in case the quantity under the radical is nonnegative; it is the entire line when the denominator is negative in case the quantity under the radical is negative. The probability that the interval be the whole line is the probability that

(44)
$$\frac{TR^2(k_j)}{4s^2} < \tfrac{1}{2}F_{1,T-p}(\varepsilon);$$

the left-hand side has a noncentral $F$-distribution with 2 and $T - p$ degrees of freedom and noncentrality parameter $T\rho^2(k_j)/(2\sigma^2)$. The probability of (44) is small if $T\rho^2(k_j)/(2\sigma^2)$ is large. Note that the confidence interval based on (38) is trivial when the left-hand side of (44) is less than $F_{2,T-p}(\varepsilon)$, which is about 3/2 of the right-hand side of (44). [Scheffé (1970) has proposed a procedure that yields the confidence set (38) for $\alpha(k_j)$ and $\beta(k_j)$ when that set includes the origin and yields intervals similar to (42) and (43) with $F_{1,T-p}(\varepsilon)$ replaced by a suitably chosen, decreasing function of $F_{2,T-p}(\varepsilon)$.]

The three cases are (42) when $2s^2F_{1,T-p}(\varepsilon)/T \le a^2(k_j)$, (43) when $a^2(k_j) \le 2s^2F_{1,T-p}(\varepsilon)/T \le a^2(k_j) + b^2(k_j)$, and the entire line when $a^2(k_j) + b^2(k_j) \le 2s^2F_{1,T-p}(\varepsilon)/T$. These three cases occur when $a^2(k_j)$ is large, when $a^2(k_j)$ is small and $b^2(k_j)$ is large, and when both are small, respectively.

The test procedures and confidence regions given in this chapter are based on the observations being independently normally distributed. However, the results hold asymptotically for fixed periods $T/k_j$ if the $y_t$'s are independently distributed with uniformly bounded (absolute) moments of order $2 + \delta$ for some $\delta > 0$ or if a Lindeberg-Feller type condition is satisfied. (See Section 2.6.)
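The three cases for the confidence region on $\tan\theta$ can be laid out as a small routine. This is a sketch only: the significance point $F_{1,T-p}(\varepsilon)$ is passed in as a number `F1` supplied by the user (for example from tables), and the function name and return convention are hypothetical, not part of the text.

```python
import math

def tan_theta_region(a, b, s2, T, F1):
    """Region for tan(theta) from (a*t - b)^2 <= c*(1 + t^2), c = 2*s2*F1/T.

    Returns ("interval", lo, hi)      for lo <= tan(theta) <= hi,
            ("complement", hi, lo)    for tan(theta) <= hi or tan(theta) >= lo,
            ("line",)                 when every value of tan(theta) is admitted.
    """
    c = 2.0 * s2 * F1 / T
    denom = a * a - c
    radicand = c * (a * a + b * b - c)      # c * (R^2 - c)
    if radicand < 0:
        return ("line",)
    if denom == 0:
        raise ValueError("degenerate case: a^2 equals 2*s2*F1/T")
    root = math.sqrt(radicand)
    first = (a * b - root) / denom
    second = (a * b + root) / denom
    if denom > 0:
        return ("interval", first, second)
    # Negative denominator: 'second' is the smaller root.
    return ("complement", second, first)

if __name__ == "__main__":
    print(tan_theta_region(3.0, 1.0, 1.0, 100, 4.0))
    print(tan_theta_region(0.1, 3.0, 1.0, 100, 4.0))
    print(tan_theta_region(0.1, 0.1, 1.0, 100, 4.0))
```

The three calls illustrate a large $a^2$, a small $a^2$ with large $b^2$, and both small, in that order.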


4.3.4. Deciding About Inclusion of Trigonometric Terms

A difficult but pertinent problem is what trigonometric terms to include in a cyclical trend. This is a multiple-decision problem that also arises in other areas where regression methods are used, but there are some special features that occur here. We assume there are $p$ terms in the trend function, including $\alpha_0$, and possibly $\alpha_{T/2}(-1)^t$, if $T$ is even, and $\frac{1}{2}(p - 2)$ or $\frac{1}{2}(p - 1)$ pairs of trigonometric terms with periods $T/k_j$, $j = 1, \ldots, \frac{1}{2}(p - 2)$ or $\frac{1}{2}(p - 1)$; then there is available an estimate of $\sigma^2$ with $T - p$ degrees of freedom. Suppose there are $q$ pairs of trigonometric terms whose inclusions are in question; that is, $p - 2q$ terms are certain to be included in the trend. For each pair of trigonometric terms we wish to decide whether the amplitude is 0; that is, $\rho(k_j) = 0$ for each $j$. The possible decisions are

(45)
$$\begin{aligned}
H_0 &: \rho(k_1) = \cdots = \rho(k_q) = 0, \\
H_i &: \rho(k_i) > 0,\ \rho(k_j) = 0, \quad j \ne i,\ j = 1, \ldots, q, \\
H_{hi} &: \rho(k_h) > 0,\ \rho(k_i) > 0,\ \rho(k_j) = 0, \quad j \ne i,\ j \ne h,\ j = 1, \ldots, q, \\
&\ \ \vdots \\
H_{12\cdots q} &: \rho(k_h) > 0, \quad h = 1, \ldots, q.
\end{aligned}$$

The multiple-decision problem here is basically different from the one treated in Section 3.2.2 because there is no a priori meaningful ordering of the periods as there is an ordering of degrees of polynomials. We assume that there is no interest in the phases and no prior knowledge of them. Hence, our procedures will be based on $R^2(k_j)$. These procedures are invariant under transformations $a_0^* = a_0 + c_0$, $a_{T/2}^* = a_{T/2} + c_{T/2}$ (if $T$ is even), $a^*(k_i) = a(k_i) + c_i$, $b^*(k_i) = b(k_i) + d_i$, $i = q + 1, \ldots, \frac{1}{2}(p - 2)$ or $\frac{1}{2}(p - 1)$, and $a^*(k_j) = a(k_j)\cos\theta_j + b(k_j)\sin\theta_j$, $b^*(k_j) = -a(k_j)\sin\theta_j + b(k_j)\cos\theta_j$, $j = 1, \ldots, q$; that is,

(46)
$$[a^*(k_j),\ b^*(k_j)] = [a(k_j),\ b(k_j)]\begin{pmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{pmatrix}, \qquad j = 1, \ldots, q.$$

Also we assume that there is no a priori preference for any one periodic function

among the q. Hence, it seems reasonable to impose symmetry on any procedure to be considered.


Let us consider first the case when $\sigma^2$ is known. This case is of some historical interest and is the limiting case of some other cases. Under the assumption of normality $a_0$, $a(k_j)$, $b(k_j)$, $j = 1, \ldots, \frac{1}{2}(p - 2)$ or $\frac{1}{2}(p - 1)$, and possibly $a_{T/2}$ form a sufficient set of statistics for $\alpha_0$, $\alpha(k_j)$, $\beta(k_j)$, $j = 1, \ldots, \frac{1}{2}(p - 2)$ or $\frac{1}{2}(p - 1)$, and possibly $\alpha_{T/2}$. We ask that procedures be invariant with respect to transformations (46). [Changes of phases are rotations of pairs $a(k_j)$, $b(k_j)$.] Thus the procedures should be based on $R^2(k_1), \ldots, R^2(k_q)$. If $\rho(k_j) = 0$,

(47)
$$z_j = \frac{T}{2\sigma^2}R^2(k_j)$$

has a $\chi^2$-distribution with 2 degrees of freedom; the density and distribution functions are

(48)
$$k(z) = \tfrac{1}{2}e^{-z/2}, \qquad K(z) = 1 - e^{-z/2},$$

respectively. In general $z_j$ has a noncentral $\chi^2$-distribution with 2 degrees of freedom and noncentrality parameter $T\rho^2(k_j)/(2\sigma^2) = \tau^2$, say; the density is

(49)
$$k(z \mid \tau^2) = \tfrac{1}{2}e^{-(\tau^2 + z)/2}I_0(\tau\sqrt{z}),$$

where

(50)
$$I_0(x) = \sum_{\nu=0}^{\infty}\frac{(x/2)^{2\nu}}{(\nu!)^2}$$

is the Bessel function of the first kind with purely imaginary argument of order 0. Because $I_0(x)$ is a power series with positive coefficients, it is a monotonically increasing function of $x$ ($0 \le x < \infty$), and therefore $k(z \mid \tau^2)/k(z) = e^{-\tau^2/2}I_0(\tau\sqrt{z})$ is a monotonically increasing function of $z$ ($0 \le z < \infty$) for every $\tau^2 > 0$. Consider testing the null hypothesis that a specified population amplitude is zero using only the corresponding sample amplitude.
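The densities and the monotone likelihood ratio can be examined numerically. The sketch below uses NumPy's `np.i0` for $I_0$ and a hand-written trapezoid rule; it assumes the noncentral density has the form $\frac{1}{2}e^{-(\tau^2+z)/2}I_0(\tau\sqrt{z})$ with $\tau^2$ the noncentrality parameter. The function names are illustrative.

```python
import numpy as np

def k_central(z):
    """Density of chi-square with 2 degrees of freedom."""
    return 0.5 * np.exp(-0.5 * z)

def k_noncentral(z, tau2):
    """Density of noncentral chi-square with 2 d.f., noncentrality tau2."""
    return 0.5 * np.exp(-0.5 * (tau2 + z)) * np.i0(np.sqrt(tau2 * z))

def likelihood_ratio(z, tau2):
    """k(z | tau2) / k(z) = exp(-tau2/2) * I_0(tau * sqrt(z))."""
    return np.exp(-0.5 * tau2) * np.i0(np.sqrt(tau2 * z))

def trapezoid(y, x):
    """Plain trapezoid rule, to avoid depending on a particular NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

if __name__ == "__main__":
    z = np.linspace(0.0, 80.0, 80001)
    print(trapezoid(k_noncentral(z, 3.0), z))                   # close to 1
    print(bool(np.all(np.diff(likelihood_ratio(z, 3.0)) > 0)))  # increasing in z
```

Since the ratio increases with $z$, large values of the sample amplitude favor a positive population amplitude, which is the property the one-sided test relies on.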

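0 at significance level $\varepsilon_j$ has the rejection region

(51)
$$\frac{TR^2(k_j)}{2\sigma^2} > -2\log\varepsilon_j.$$

PROOF. The probability of (51) under the null hypothesis is $\varepsilon_j$ since the cumulative distribution function of $z_j = TR^2(k_j)/(2\sigma^2)$ is $1 - e^{-z/2}$. That the test of $H_j^*$ which is most powerful against any specified alternative, say $\rho^2(k_j) = 2\sigma^2\tau^2/T$, has rejection region (51) is a consequence of the Neyman-Pearson Fundamental Lemma. To prove the result in this case we let the set (51) be $A^*$ and $A$ be any other rejection region with $\Pr\{A \mid H_j^*\} = \varepsilon_j$. The inequality (51) is equivalent to

(52)
$$\frac{k(z_j \mid \tau^2)}{k(z_j)} > e^{-\tau^2/2}I_0\left(\tau\sqrt{-2\log\varepsilon_j}\right) = d,$$

say. Then

(53)
$$\begin{aligned}
\Pr\{A^* \mid \tau^2\} &= \int_{A^* \cap \bar{A}} k(z_j \mid \tau^2)\,dz_j + \Pr\{A^* \cap A \mid \tau^2\} \\
&\ge d\int_{A^* \cap \bar{A}} k(z_j)\,dz_j + \Pr\{A^* \cap A \mid \tau^2\} \\
&= d\int_{\bar{A}^* \cap A} k(z_j)\,dz_j + \Pr\{A^* \cap A \mid \tau^2\} \\
&\ge \int_{\bar{A}^* \cap A} k(z_j \mid \tau^2)\,dz_j + \Pr\{A^* \cap A \mid \tau^2\} \\
&= \Pr\{A \mid \tau^2\}.
\end{aligned}$$

The theorem follows because (53) holds for every $\tau^2 > 0$.

This procedure is known as the Schuster test [Schuster (1898)].

The multiple-decision problem with which we are concerned can be considered as a combination of problems involving hypotheses about the individual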


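amplitudes; that is,

(54)
$$\begin{aligned}
H_0 &= \bigcap_{i=1}^{q} H_i^*, \\
H_i &= \bar{H}_i^* \cap \bigcap_{\substack{j=1 \\ j \ne i}}^{q} H_j^*, \qquad i = 1, \ldots, q, \\
H_{hi} &= \bar{H}_h^* \cap \bar{H}_i^* \cap \bigcap_{\substack{j=1 \\ j \ne h,\, j \ne i}}^{q} H_j^*, \qquad h < i,\ h, i = 1, \ldots, q, \\
&\ \ \vdots \\
H_{12\cdots q} &= \bigcap_{i=1}^{q} \bar{H}_i^*.
\end{aligned}$$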

Let $R_0$, $R_i$, etc., denote the regions in the sample space of $R^2(k_1), \ldots, R^2(k_q)$ for making the decisions that $H_0$, $H_i$, etc., are true, respectively, and let $R_i^*$, $i = 1, \ldots, q$, be the regions for accepting $H_i^*$, respectively. Then these two sets of regions in the sample space are related in exactly the same way as the two sets of regions in the parameter space are related, as indicated in (54). Suppose we ask that

(55)
$$\Pr\{R_j^* \mid \rho(k_1), \ldots, \rho(k_q);\ \rho(k_j) = 0\} = 1 - \varepsilon_j;$$

that is, that the probability of deciding $\rho(k_j) = 0$ does not depend on what the other amplitudes are. We shall call this property independence of irrelevant parameters. Then an argument similar to that of Section 3.2.2 (which we shall outline) demonstrates that the intersection of $R_j^*$ and $R^2(k_i) = c_i$, $i \ne j$, has probability $1 - \varepsilon_j$ for almost every set of $q - 1$ nonnegative $c_1, \ldots, c_{j-1}, c_{j+1}, \ldots, c_q$. Let $h[R^2(k_1), \ldots, R^2(k_q)]$ be 1 in $R_j^*$ and 0 outside. (This is the critical function and could be generalized to a randomized critical function.) The expectation of $h$ relative to $R^2(k_j)$ is a bounded function of the other $q - 1$ arguments. Condition (55) is that the expectation of this quantity minus $1 - \varepsilon_j$ is 0 identically in the noncentrality parameters. The result follows from the bounded completeness of the noncentral $\chi^2$-distributions. (See Problems 25 and 26.) Then in effect the decision about $\rho(k_j)$ is based on $R^2(k_j)$. When we invoke the symmetry condition, we see that $\varepsilon_1 = \cdots = \varepsilon_q = \varepsilon$, say. Given $\varepsilon$, the best


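procedure for $H_j^*$ is given by Theorem 4.3.1 (for $\varepsilon_j = \varepsilon$). We note that

(56)
$$\Pr\{R_0 \mid H_0\} = (1 - \varepsilon)^q.$$

If this probability is specified to be, say $1 - \delta$, then $1 - \varepsilon = (1 - \delta)^{1/q}$ or $\log(1 - \varepsilon) = [\log(1 - \delta)]/q$.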

THEOREM 4.3.2. The uniformly most powerful symmetric invariant decision procedure for selection of positive amplitudes given the probability $(1 - \varepsilon)^q$ of deciding all are zero when that is true and given independence of irrelevant parameters is to decide $\rho(k_j)$ is positive when $TR^2(k_j)/(2\sigma^2) > -2\log\varepsilon$.

If every $\rho(k_j) = 0$, an error is deciding that at least one $\rho(k_j) > 0$. Since the procedure is to decide $\rho(k_i) > 0$ if $z_i > C$, the probability of an error is

(57)
$$\Pr\{\bar{R}_0 \mid H_0\} = 1 - (1 - e^{-C/2})^q.$$

This function has been tabulated by Davis [(1941), Table 1], writing our $\frac{1}{2}C$ as his $\kappa$ and our $q$ as his $\frac{1}{2}N$. The procedure as a test of significance of $H_0$ is known as the Walker test [G. T. Walker (1914)]. This general type of decision problem has been treated in a different way by Lehmann (1957). We note that if $q = [\frac{1}{2}(T - 1)]$

(58)
$$\lim_{q\to\infty}\Pr\{z_j < C + 2\log q,\ j = 1, \ldots, q\} = \lim_{q\to\infty}\left[1 - \exp\left(-\tfrac{1}{2}C - \log q\right)\right]^q = \exp\left(-e^{-C/2}\right).$$
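The relations among $\delta$, $\varepsilon$, the threshold $C$, and the limit (58) are simple to evaluate. This is a minimal sketch, assuming $\sigma^2$ is known so that each $z_j$ is $\chi^2$ with 2 degrees of freedom under the null hypothesis; the function names are hypothetical.

```python
import math

def per_term_level(delta, q):
    """epsilon such that (1 - epsilon)^q = 1 - delta."""
    return 1.0 - (1.0 - delta) ** (1.0 / q)

def walker_threshold(delta, q):
    """C = -2 log(epsilon): decide rho(k_j) > 0 when z_j > C."""
    return -2.0 * math.log(per_term_level(delta, q))

def error_probability(C, q):
    """Probability (57) that at least one z_j exceeds C when all amplitudes are 0."""
    return 1.0 - (1.0 - math.exp(-0.5 * C)) ** q

def prob_all_below_shifted(C, q):
    """Pr{z_j < C + 2 log q for all j} under the null hypothesis."""
    return (1.0 - math.exp(-0.5 * C - math.log(q))) ** q

if __name__ == "__main__":
    C = walker_threshold(0.05, 10)
    print(C, error_probability(C, 10))
    print(prob_all_below_shifted(2.0, 10 ** 6), math.exp(-math.exp(-1.0)))
```

The threshold grows roughly like $2\log q$, so the more periods are examined, the larger a sample amplitude must be before it is declared significant.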

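Other formulations of the decision problem can be given. Suppose it is assumed that at most one amplitude is positive. Then the hypotheses to be considered are

(59)
$$\begin{aligned}
H_0 &: \rho(k_1) = \cdots = \rho(k_q) = 0, \\
H_j &: \rho(k_j) > 0,\ \rho(k_i) = 0, \quad i \ne j,\ i = 1, \ldots, q, \qquad j = 1, \ldots, q.
\end{aligned}$$

The joint density of the normalized sample amplitudes is

(60)
$$\prod_{i=1}^{q} k(z_i \mid \tau_i^2) = \prod_{i=1}^{q} \tfrac{1}{2}e^{-(\tau_i^2 + z_i)/2}I_0(\tau_i\sqrt{z_i}),$$

where at most one $\tau_j^2 = T\rho^2(k_j)/(2\sigma^2)$ is positive. If, for instance, $H_j$ is true,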


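that is, $\rho(k_j) > 0$ and $\rho(k_i) = 0$, $i \ne j$, then the likelihood ratio for $H_j$ and $H_0$ is

(61)
$$\frac{k(z_j \mid \tau_j^2)\prod_{i \ne j} k(z_i)}{\prod_{i=1}^{q} k(z_i)} = e^{-\tau_j^2/2}I_0(\tau_j\sqrt{z_j}),$$

which is a monotonic increasing function of $z_j$. This suggests that $H_j$ should be preferred to $H_0$ if $z_j$ is large. The likelihood ratio for $H_j$ and $H_i$, $i \ne j$, $i > 0$, is

(62)
$$\frac{e^{-\tau_j^2/2}I_0(\tau_j\sqrt{z_j})}{e^{-\tau_i^2/2}I_0(\tau_i\sqrt{z_i})}.$$

If $\tau_j^2 = \tau_i^2 = \tau^2$, say, the likelihood ratio is $I_0(\tau\sqrt{z_j})/I_0(\tau\sqrt{z_i})$. This is greater than 1 or less than 1 according to the ratio $z_j/z_i$ being greater than 1 or less than 1. Consider for a moment deciding between $H_j$ and $H_i$.

LEMMA 4.3.1. The uniformly most powerful symmetric invariant procedure for deciding between $H_j$ and $H_i$ is to choose $H_j$ if $z_j > z_i$ and $H_i$ if $z_j < z_i$.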

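PROOF. Here symmetric means symmetric between $z_j$ and $z_i$; that is, $B_j$ and $B_i$ are symmetric if interchanging $z_j$ and $z_i$ in the definition of $B_j$ gives a definition of $B_i$. Let $A_j^* = \{(z_1, \dots, z_q) \mid z_j > z_i\}$ and $A_i^* = \{(z_1, \dots, z_q) \mid z_j < z_i\}$ and let $A_j$ and $A_i$ be two other mutually exclusive, exhaustive, and symmetric sets. [We ignore sets of probability 0, such as $\{(z_1, \dots, z_q) \mid z_j = z_i\}$.] Then

(63) $\Pr\{A_j^* \mid \tau_j^2 = \tau^2, \tau_i^2 = 0\}$
$= \Pr\{A_j^* \cap A_j \mid \tau_j^2 = \tau^2, \tau_i^2 = 0\} + \Pr\{A_j^* \cap A_i \mid \tau_j^2 = \tau^2, \tau_i^2 = 0\}$
$\ge \Pr\{A_j^* \cap A_j \mid \tau_j^2 = \tau^2, \tau_i^2 = 0\} + \Pr\{A_i^* \cap A_j \mid \tau_j^2 = \tau^2, \tau_i^2 = 0\}$
$= \Pr\{A_j \mid \tau_j^2 = \tau^2, \tau_i^2 = 0\}$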


because $A_j^* \cap A_i$ and $A_j \cap A_i^*$ are symmetric. The above inequality holds for every value of $\tau^2 > 0$. Thus this symmetric procedure is most powerful uniformly in $\tau^2$.

Theorem 4.3.2 and Lemma 4.3.1 suggest that in the multiple-decision problem one should decide $H_0$ if all of the sample amplitudes are small and decide $H_j$ if $R^2(k_j)$ is the largest sample amplitude and is sufficiently large. This procedure is symmetric in the sense that permuting the indices of $H_1, \dots, H_q$ and $R^2(k_1), \dots, R^2(k_q)$ does not change the procedure.

THEOREM 4.3.3. The uniformly most powerful symmetric invariant decision procedure for deciding among $H_0, H_1, \dots, H_q$ given $\Pr\{R_0 \mid H_0\} = (1 - \varepsilon)^q$ and given independence of irrelevant parameters is to decide $H_0$ if $z_j = TR^2(k_j)/(2\sigma^2) \le -2\log\varepsilon$, $j = 1, \dots, q$, and otherwise to decide $H_j$ when $z_j > z_i$, $i \ne j$, $i = 1, \dots, q$.
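The rule stated in Theorem 4.3.3 can be written as a short function. This is only a sketch, assuming $\sigma^2$ is known and the normalized amplitudes $z_j$ have already been computed; the function names and the convention of returning 0 for $H_0$ are illustrative choices, not part of the text.

```python
import math


def normalized_amplitude(T, R2, sigma2):
    """z_j = T R^2(k_j) / (2 sigma^2) for one sample amplitude R^2(k_j)."""
    return T * R2 / (2.0 * sigma2)


def decide_hypothesis(z, eps):
    """Return 0 to decide H_0, or the 1-based index j to decide H_j.

    H_0 is chosen when every z_j is at most -2 log(eps); otherwise the
    hypothesis belonging to the largest z_j is chosen.
    """
    cutoff = -2.0 * math.log(eps)
    if all(zj <= cutoff for zj in z):
        return 0
    largest = max(range(len(z)), key=lambda j: z[j])
    return largest + 1


print(decide_hypothesis([0.5, 1.0, 2.0], eps=0.05))
print(decide_hypothesis([0.5, 9.0, 2.0], eps=0.05))
```

Ties between amplitudes have probability 0 and are resolved here by taking the first maximal index.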

PROOF. Let the mutually exclusive and exhaustive regions in the space of $z_1, \dots, z_q$ for deciding $H_0, H_1, \dots, H_q$ be $R_0, R_1, \dots, R_q$, respectively, for an arbitrary symmetric procedure with $\Pr\{R_0 \mid H_0\} = (1 - \varepsilon)^q$. Let $R_0' = R_0$ and

(64) $R_j' = \bar R_0 \cap \{(z_1, \dots, z_q) \mid z_j > z_i,\ i \ne j\}$, $j = 1, \dots, q$.

Then $\Pr\{R_0' \mid H_0\} = (1 - \varepsilon)^q$ and

(65) $\Pr\{R_j' \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\}$
$= \Pr\{R_j' \cap \bar R_j \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\} + \Pr\{R_j' \cap R_j \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\}$
$= \sum_{i \ne j} \Pr\{R_j' \cap R_i \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\} + \Pr\{R_j' \cap R_j \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\}$
$\ge \sum_{i \ne j} \Pr\{R_i' \cap R_j \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\} + \Pr\{R_j' \cap R_j \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\}$
$= \Pr\{R_j \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\}$, $j = 1, \dots, q$,

because

(66) $\Pr\{R_j' \cap R_i \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\} \ge \Pr\{R_i' \cap R_j \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\}$

by symmetry and the argument of Lemma 4.3.1. Now let

(67) $R_0^* = \{(z_1, \dots, z_q) \mid z_i \le -2\log\varepsilon,\ i = 1, \dots, q\}$,

(68) $R_j^* = \bar R_0^* \cap \{(z_1, \dots, z_q) \mid z_j > z_i,\ i \ne j\}$, $j = 1, \dots, q$.


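Then $\Pr\{R_0^* \mid H_0\} = (1 - \varepsilon)^q$ and

(69) $\Pr\{R_j^* \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\}$
$= \Pr\{R_j^* \cap R_j' \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\} + \Pr\{R_j^* \cap R_0' \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\}$
$\ge \Pr\{R_j' \cap R_j^* \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\} + \Pr\{R_j' \cap R_0^* \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\}$
$= \Pr\{R_j' \mid \tau_j^2 = \tau^2, \tau_h^2 = 0, h \ne j\}$

by the argument of Theorem 4.3.1. Since (65) and (69) hold for every $\tau^2$, the theorem follows.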

THEOREM 4.3.4. Any symmetric Bayes solution is a procedure to accept $H_0$ if $TR^2(k_j)/(2\sigma^2) \le C$, $j = 1, \dots, q$, and otherwise to accept $H_j$ if $z_j > z_i$, $i \ne j$, $i = 1, \dots, q$.

PROOF. Any Bayes solution is defined by [see Section 6.6, T. W. Anderson (1958), for example]

(70) $R_0$: $p_0 \prod_{j=1}^q k(z_j) > p_i\, k(z_i \mid \tau^2) \prod_{j \ne i} k(z_j)$, $i = 1, \dots, q$,

(71) $R_h$: $p_h\, k(z_h \mid \tau^2) \prod_{j \ne h} k(z_j) > p_i\, k(z_i \mid \tau^2) \prod_{j \ne i} k(z_j)$, $i = 1, \dots, q$, $i \ne h$,
$\qquad p_h\, k(z_h \mid \tau^2) \prod_{j \ne h} k(z_j) > p_0 \prod_{j=1}^q k(z_j)$,

where $p_i \ge 0$, $i = 0, 1, \dots, q$, and $\sum_{i=0}^q p_i = 1$. [These regions are defined to be disjoint; a point on the boundary of two regions (of probability 0) can be assigned to either region.] The first set of inequalities ($p_i > 0$) is

(72) $\dfrac{k(z_i \mid \tau^2)}{k(z_i)} < \dfrac{p_0}{p_i}$, $i = 1, \dots, q$,

the left-hand side being a monotonic increasing function of $z_i$. This region is symmetric if and only if $p_1 = \cdots = p_q$. Then $R_0$ is of the form given in the theorem and is identical when $p_0$ ($= 1 - \sum_{i=1}^q p_i$) is chosen suitably. Then the $R_i$'s agree with the theorem. Q.E.D.

We note that this procedure amounts to the test of significance given before with the further proviso that if the null hypothesis $H_0$ is rejected, one decides that the largest sample amplitude indicates the one population amplitude which is positive. This type of problem has been treated in a general way by Karlin and Truax (1960) and by Kudo (1960).


Since the class of Bayes solutions is the same as the class of admissible solutions in this problem [see Theorem 6.6.4 of T. W. Anderson (1958), for example], Theorem 4.3.3 and Theorem 4.3.4 are equivalent.

Now let us turn to the more important case of $\sigma^2$ being unknown. We must distinguish between the situation when there is available an estimate of $\sigma^2$ based on $n = T - p$ degrees of freedom and attention is focused on some $q$ given frequencies, not involved in the estimate of $\sigma^2$, and the situation when there is no estimate of $\sigma^2$ that does not involve the frequencies of interest. In the first situation there is an estimate $s^2$, such that $ns^2/\sigma^2 = S/\sigma^2$ has a $\chi^2$-distribution with $n$ degrees of freedom independent of the $q$ pairs $a(k_1), b(k_1), \dots, a(k_q), b(k_q)$.

In this first situation we may ask for symmetric procedures satisfying (55); that is, that the probability of deciding whether $\rho(k_j) = 0$ does not depend on the other amplitudes and the probability of deciding all the amplitudes are 0 when that is the case does not depend on $\sigma^2$. Then the best procedure for testing $H_j^*$ is to reject $H_j^*$ if $F_j = TR^2(k_j)/(4s^2) > c_j$. [This follows by consideration of the sufficient statistics invariant under the transformations (29) and the completeness of the family of distributions; see the remarks below (29).] For a symmetric procedure we require $c_1 = \cdots = c_q = c$, say. Then the probability of deciding that all the amplitudes are zero when that is the case is given by (73). [This approaches 1 minus (57) for $c = \tfrac{1}{2}C$ as $n \to \infty$.] One can adjust $c$ so (73) is a preassigned number.

THEOREM 4.3.5. When $\sigma^2$ is unknown, the uniformly most powerful symmetric invariant decision procedure for selection of positive amplitudes given the probability of deciding all are zero when that is the case and given independence of irrelevant parameters is to decide $\rho(k_j)$ is positive when $TR^2(k_j)/(4s^2) > c$, where $c$ is determined so (73) is the given probability.


When $\sigma^2$ is unknown, the procedure above is possible only if there is an estimate $s^2$ of $\sigma^2$ whose distribution does not depend on $\alpha(k_1), \dots, \beta(k_q)$ which are involved in the hypotheses. Otherwise one cannot make the requirement of independence of irrelevant parameters. This latter situation occurs when one is unwilling to assume that any coefficients $\alpha(k_j)$, $\beta(k_j)$ are 0; that is, one may want to permit trigonometric terms of any frequency (of the form $j/T$, $j = 0, 1, \dots, [\tfrac{1}{2}T]$). However, one may assume that there are at most only a small number of these nonzero but without delineating them further. We treat the simplest case in some detail.

Suppose at most one of $q$ population amplitudes is positive. The problem is to decide which of the hypotheses $H_0, H_1, \dots, H_q$ is true. We ask that the procedure be symmetric and that $\Pr\{R_0 \mid H_0\}$ be specified independent of the unknown $\sigma^2$. We shall further assume that there are $r - q$ sample amplitudes, $R^2(k_{q+1}), \dots, R^2(k_r)$, corresponding to zero population amplitudes, though $r - q$ may be 0. That leaves $T - 2r$ coefficients in the trend which may be nonzero and which we do not question. (This formulation leads to a more general problem than those considered by other authors.)

When $H_0$ is true, $\sum_{j=1}^r R^2(k_j)$, $a_0$, and any sample coefficients corresponding to other population coefficients possibly nonzero form a sufficient set of statistics for the parameters. Hence, for almost every set of values of $\sum_{j=1}^r R^2(k_j)$, $a_0$, and the other coefficients, the conditional probability of $R_0$ must be the assigned $\Pr\{R_0 \mid H_0\}$. (See Problem 10 of Chapter 3.) The joint density of $z_j = TR^2(k_j)/(2\sigma^2)$, $j = 1, \dots, r - 1$, given $\sum_{j=1}^r z_j = c$, that is, $z_r = c - \sum_{j=1}^{r-1} z_j$, is

(74) $\prod_{j=1}^r k(z_j) = (\tfrac{1}{2})^r e^{-\frac{1}{2}c}$, $z_j \ge 0$, $j = 1, \dots, r$,
$\qquad = 0$, otherwise,

when $H_0$ is true and is

(75) $k(z_j \mid \tau^2) \prod_{i \ne j} k(z_i) = (\tfrac{1}{2})^r e^{-\frac{1}{2}\tau^2} e^{-\frac{1}{2}c} I_0(\tau\sqrt{z_j})$, $z_i \ge 0$, $i = 1, \dots, r$,
$\qquad = 0$, otherwise,

when $H_j$ is true. We use the same argument as in proving Theorem 4.3.3. The intersection of the region $R_0$ and $\sum_{j=1}^r z_j = c$ is defined by $z_i \le g(c)$, $i = 1, \dots, q$, where $g(c)$ is a number so the conditional probability is the assigned $\Pr\{R_0 \mid H_0\}$. The intersection of $R_h$, $h = 1, \dots, q$, and $\sum_{j=1}^r z_j = c$ is defined by $z_h > g(c)$ and $z_h > z_i$, $i \ne h$, $i = 1, \dots, q$.

The conditional joint distribution of $z_1, \dots, z_r$ given $\sum_{j=1}^r z_j = c$ is uniform over the region in $r$-dimensional space where $z_j \ge 0$, $j = 1, \dots, r$, and $\sum_{j=1}^r z_j = c$. Thus the conditional distribution of $z_1/c, \dots, z_r/c$ given $\sum_{j=1}^r z_j/c = 1$ is uniform over the region $z_j/c \ge 0$, $j = 1, \dots, r$, and $\sum_{j=1}^r z_j/c = 1$. If $g(c) = gc$ for each positive $c$,

(76) $\Pr\{z_j \le gc,\ j = 1, \dots, q \mid \sum_{i=1}^r z_i = c,\ H_0\}$

does not depend on $c$, and hence is the same as the unconditional probability

(77) $\Pr\{x_j \le g,\ j = 1, \dots, q \mid H_0\}$, where $x_j = z_j / \sum_{i=1}^r z_i$.

THEOREM 4.3.6. When $\sigma^2$ is unknown, the uniformly most powerful symmetric invariant decision procedure for deciding among $H_0, H_1, \dots, H_q$ given $\Pr\{R_0 \mid H_0\}$ is to decide $H_0$ if

(78) $\dfrac{R^2(k_j)}{\sum_{i=1}^r R^2(k_i)} \le g$, $j = 1, \dots, q$,

and otherwise decide $H_j$ when $R^2(k_j) > R^2(k_i)$, $i \ne j$, $i = 1, \dots, q$, with $g$ chosen so that the probability of (78) under $H_0$ is the specified $\Pr\{R_0 \mid H_0\}$.
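Choosing $g$ requires the null probability that the largest of the $q$ normalized amplitudes exceeds $g$. The sketch below assumes the inclusion-exclusion expression $\sum_{k=1}^{n} (-1)^{k-1} \binom{q}{k} (1 - kg)^{r-1}$ with $n = \min(q, [1/g])$, which holds for coordinates uniformly distributed on the simplex $\sum_{j=1}^r x_j = 1$; the function names and the bisection search are illustrative choices, not taken from the text.

```python
import math


def prob_max_exceeds(g, q, r):
    """Pr{max_{j<=q} x_j > g} for (x_1..x_r) uniform on the unit simplex, r >= 2."""
    if g <= 0.0:
        return 1.0
    n = min(q, int(math.floor(1.0 / g)))
    total = 0.0
    for k in range(1, n + 1):
        # Guard against a tiny negative base caused by floating-point rounding.
        base = max(0.0, 1.0 - k * g)
        total += (-1) ** (k - 1) * math.comb(q, k) * base ** (r - 1)
    return total


def critical_g(alpha, q, r, iterations=60):
    """Bisection for the g with Pr{max x_j > g} = alpha (decreasing in g)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if prob_max_exceeds(mid, q, r) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


g = critical_g(0.05, q=5, r=5)
print("g =", g, "check:", prob_max_exceeds(g, 5, 5))
```

For large $q$ the alternating sum can lose accuracy through cancellation, so this direct evaluation is best kept to moderate $q$.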

Now let us evaluate 1 minus (77) which is

(79) $\Pr\{\max_{j=1,\dots,q} x_j > g \mid H_0\} = \Pr\{\text{at least one } x_j > g,\ j = 1, \dots, q \mid H_0\}.$

Let $A_j$ be the event $\{x_j > g\}$. Then under $H_0$

(80) $\Pr\{\bigcup_{j=1}^q A_j\} = \sum_{j} \Pr\{A_j\} - \sum_{i<j} \Pr\{A_i \cap A_j\} + \cdots + (-1)^{q-1} \Pr\{A_1 \cap \cdots \cap A_q\}.$

[See Feller (1968), Section 1 of Chapter IV, for example.] Now by the equidistribution of the $x_j$'s we have

(81) $\Pr\{\bigcup_{j=1}^q A_j\} = \sum_{k=1}^q (-1)^{k-1} \binom{q}{k} \Pr\{A_1 \cap \cdots \cap A_k\}.$

Since $\sum_{j=1}^r x_j = 1$ and $x_j \ge 0$, $j = 1, \dots, r$, we have

(82) $\Pr\{A_1 \cap \cdots \cap A_k\} = 0$

whenever $kg \ge 1$. Hence

(83) $\Pr\{\bigcup_{j=1}^q A_j\} = \sum_{k=1}^n (-1)^{k-1} \binom{q}{k} \Pr\{A_1 \cap \cdots \cap A_k\},$

where $n$ is the minimum of $q$ and $[1/g]$, which is the largest integer not greater than $1/g$.

The conditional distribution of $x_1, \dots, x_r$, given $\sum_{j=1}^r x_j = 1$, is uniform over the region where $x_j \ge 0$, $j = 1, \dots, r$, and $\sum_{j=1}^r x_j = 1$ (and zero elsewhere). The unconditional distribution of the $x_j$'s is the same as the conditional distribution since the condition is automatically satisfied by the way $x_j$ is defined. The $r$ coordinates in the $(r - 1)$-dimensional space $\sum_{j=1}^r x_j = 1$ are barycentric coordinates. The probability of a set is proportional to its $(r - 1)$-dimensional volume. We see

x-l

(84) Pr {A,} = Pr {zl > g}

< zl - g < 1 - g , o
$z_j > z_i$, $z_h > z_i$, $i \ne j$, $i \ne h$, $i = 1, \dots, q$. If $\tau_I^2 = \tau_{II}^2$, (110) for $i = h$ is equivalent to $z_j > d$ and (110) for $i = j$ is equivalent to $z_h > d$. Figure 4.3 indicates $R_0$, $R_1$, $R_2$, and $R_{12}$ when $\tau_I^2 = \tau_{II}^2$. The shapes of the regions depend on $p_0$, $p_I$, $p_{II}$, and $\tau_I^2$. [Inequality in (103) is impossible when $\tau_I^2 = \tau_{II}^2$.] If $\tau_I^2 > \tau_{II}^2$, the lines $z_1 = d$ and $z_2 = d$ are replaced by curves sloping upwards. If $\tau_I^2 < \tau_{II}^2$, the lines $z_1 = d$ and $z_2 = d$ are replaced by curves with negative slopes. The shapes of the regions depend on $p_0$, $p_I$, $p_{II}$, and $\tau_{II}^2$. There is no procedure that is optimal uniformly in $\tau_I^2$ and $\tau_{II}^2$.

These symmetric procedures may be characterized by the probabilities of correct decision $\Pr\{R_0 \mid H_0\}$, $\Pr\{R_1 \mid \tau_1^2 = \tau_I^2, \tau_i^2 = 0, i \ge 2\}$, and $\Pr\{R_{12} \mid \tau_1^2 = \tau_2^2 = \tau_{II}^2, \tau_i^2 = 0, i \ge 3\}$. An alternative formulation is to set $\Pr\{R_0 \mid H_0\}$ and $\Pr\{R_1 \mid \tau_1^2 = \tau_I^2, \tau_i^2 = 0, i \ge 2\}$ for a specified value of $\tau_I^2$ and maximize the third probability for a specified value of $\tau_{II}^2$.

с is equivalent to k(z | т*)/А:(г |

TJ)

CH. 4 .

> d for a suitable

d. Then

(47)

JTV» | T*) *[°°*(г | г») 0 and some m , then

where F is given by (22).

196 LINEAR STOCHASTIC MODELS CH. 5.

LEMMA 5.5.4. If $\Sigma$ is positive definite, $F$ is positive definite.

PROOF. $F$ is positive semidefinite because it is a covariance matrix. Since $F = \Sigma + BFB'$, it is the sum of a positive definite matrix and a positive semidefinite matrix and hence is positive definite. (See Problem 28.)
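The relation $F = \Sigma + BFB'$ used in the proof can be illustrated numerically. The sketch below finds $F$ by iterating that relation from $F = 0$, which converges when the characteristic roots of $B$ are less than 1 in absolute value; the 2 x 2 matrices are arbitrary illustrative values and the helper names are not from the text.

```python
def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def stationary_covariance(B, Sigma, iterations=500):
    """Fixed point of F = Sigma + B F B', obtained by repeated substitution."""
    F = [[0.0] * len(Sigma) for _ in Sigma]
    Bt = transpose(B)
    for _ in range(iterations):
        F = add(Sigma, matmul(matmul(B, F), Bt))
    return F


def is_positive_definite_2x2(M):
    """Leading principal minors test for a symmetric 2 x 2 matrix."""
    return M[0][0] > 0 and M[0][0] * M[1][1] - M[0][1] * M[1][0] > 0


B = [[0.5, 0.2], [0.1, 0.3]]        # characteristic roots about 0.57 and 0.23
Sigma = [[1.0, 0.3], [0.3, 2.0]]    # positive definite
F = stationary_covariance(B, Sigma)
print(F, is_positive_definite_2x2(F))
```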

THEOREM 5.5.3. Under the conditions of Theorem 5.5.2 and if $F$ is positive definite

(33) $\operatorname{plim}_{T\to\infty} \hat B = B,$

(34) $\operatorname{plim}_{T\to\infty} \hat\Sigma = \Sigma.$

PROOF. To prove (33) we observe that

= 0.

Then (34) follows from (28) and (33). Q.E.D.

Theorem 5.5.3 shows that the estimates of $B$ and $\Sigma$ are consistent under quite general conditions and in particular under more general conditions than those defining the estimates as maximum likelihood. The conditions assumed here have been selected because they are relatively simple and because they also lead to asymptotic normality. The assumption that the roots are less than 1 in

SEC.

5.5.

197

ASYMPTOTIC DISTRIBUTION OF ESTIMATES

absolute value is not necessary for consistency of the estimate of $B$ though it is necessary for a general theorem about asymptotic normality. [See T. W. Anderson (1959).] For consistency of the estimates for the $p$th-order univariate equation we need to show that $F$ is nonsingular.

LEMMA 5.5.5. If $\sigma^2$ is positive, then (37) is positive definite, where

(38)

2=

0, and if either the $u_t$'s are identically distributed or $\mathcal{E}|u_t|^{2+\varepsilon} < m$, $t = 1, 2, \dots$, for some $\varepsilon > 0$ and some $m$, then

(41) $\operatorname{plim}_{T\to\infty} \hat\beta = \beta,$

(42) $\operatorname{plim}_{T\to\infty} \hat\sigma^2 = \sigma^2.$

5.5.4. Asymptotic Normality of the Estimates

The limiting distribution of $\sqrt{T}(\hat B' - B')$ is the same as the limiting distribution of $-F^{-1}(1/\sqrt{T}) \sum_{t=1}^T y_{t-1} u_t'$. We shall find the limiting distribution of this last matrix by showing that an arbitrary linear combination of its elements, say (44)

has a limiting normal distribution. Let (45) a=O

(46)

(47)

Then (48)


(We have used the fact that x’x = tr xx’.) Then Mk (-B)’C(-B’)’ is convergent. For given k

has mean 0, variance

=c

k

0 as k

03

because

k

(50) ~ ~ u ~ * ( - B ) 8 ~ ~ - l - ~ ~ * ( - B ~ ~ * -trl (-B)’C(-B’)”cP’EcP -r s,r-0

= ui

S=O

say, and covariance at t and

7

(t #

7)

The sequence $\{u_t' \Phi y_{t-1}^{(k)}\}$ has the further property that any set of $n$ elements, $t = t_1 < \cdots < t_n$, is distributed independently of the elements for $t < t_1 - k - 1$ and for $t > t_n + k + 1$. Application of Theorem 7.7.5 yields the following lemma:

LEMMA 5.5.6. If $y_t^{(k)}$, $t = 1, 2, \dots$, is defined by (45) and the $u_t$'s are independently and identically distributed with $\mathcal{E}u_t = 0$ and $\mathcal{E}u_t u_t' = \Sigma$, then the distribution of $Z_{kT}$ defined by (46)

(52) $\Pr\{Z_{kT} \le z\} = F_{kT}(z)$

has the limit on T (53)

where 1

(54)

Aw’

and uk is given by (50). Then (55)

where

(56) $\sigma^2 = \lim_{k\to\infty} \sigma_k^2 = \operatorname{tr} \Sigma \Phi F \Phi'.$

We use Corollary 7.7.1.


LEMMA 5.5.7. Under the conditions of Lemma 5.5.6, if the characteristic roots of $-B$ are less than 1 in absolute value, $S_T$, defined by (44), has a limiting normal distribution with mean 0 and variance $\sigma^2$.

Application of Theorem 7.7.7 yields the desired result. The matrix whose elements are $f_{ij}\sigma_{kl}$ will be denoted by $F \otimes \Sigma$, the Kronecker product of $F$ and $\Sigma$.

THEOREM 5.5.5. Under the conditions of Lemmas 5.5.6 and 5.5.7, $(1/\sqrt{T}) \sum_{t=1}^T y_{t-1} u_t'$ has a limiting normal distribution with mean 0 and covariance matrix $F \otimes \Sigma$.

THEOREM 5.5.6. If $\{y_t\}$ is a sequence of random vectors satisfying (1) with $\{u_t\}$ independently and identically distributed with $\mathcal{E}u_t = 0$ and $\mathcal{E}u_t u_t' = \Sigma$, if $-B$ has all characteristic roots less than 1 in absolute value, and if $F$ is positive definite, then $\sqrt{T}(\hat B' - B')$ has a limiting normal distribution with mean 0 and covariance matrix $F^{-1} \otimes \Sigma$.

PROOF. This follows from Theorem 5.5.5 and (43). (See Problem 29.)
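In the scalar first-order case $y_t + b y_{t-1} = u_t$ with variance $\sigma^2$, one has $F = \sigma^2/(1 - b^2)$, so the limiting variance $F^{-1}\sigma^2$ of $\sqrt{T}(\hat b - b)$ reduces to $1 - b^2$. The following Monte Carlo sketch illustrates this under the assumption of normal disturbances; the sample sizes, seed, and function names are illustrative choices, not from the text.

```python
import math
import random


def simulate_ar1(b, sigma, T, rng):
    """Return y_0, ..., y_T from y_t + b*y_{t-1} = u_t, started in the stationary law."""
    y_prev = rng.gauss(0.0, sigma / math.sqrt(1.0 - b * b))
    series = [y_prev]
    for _ in range(T):
        y_prev = -b * y_prev + rng.gauss(0.0, sigma)
        series.append(y_prev)
    return series


def estimate_b(series):
    """Least squares estimate: minus the ratio of lagged cross-products."""
    num = sum(series[t - 1] * series[t] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return -num / den


def scaled_error_variance(b, sigma, T, replications, seed=0):
    """Sample variance of sqrt(T) * (b_hat - b) over independent replications."""
    rng = random.Random(seed)
    errors = [math.sqrt(T) * (estimate_b(simulate_ar1(b, sigma, T, rng)) - b)
              for _ in range(replications)]
    mean = sum(errors) / len(errors)
    return sum((e - mean) ** 2 for e in errors) / (len(errors) - 1)


b = 0.5
v = scaled_error_variance(b, sigma=2.0, T=400, replications=400)
print("simulated variance:", v, "limiting value 1 - b^2:", 1.0 - b * b)
```

The simulated variance should be close to $1 - b^2$ regardless of $\sigma$, up to Monte Carlo error.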

THEOREM 5.5.7. If $\{y_t\}$ is a sequence of random variables satisfying (1) of Section 5.1 for $t = \dots, -1, 0, 1, \dots$, if the roots of the associated polynomial equation are less than 1 in absolute value, and if the $u_t$'s are independently and identically distributed with mean 0 and variance $\sigma^2 > 0$, then $\sqrt{T}(\hat\beta - \beta)$ has a limiting normal distribution with mean 0 and covariance matrix $\sigma^2 F^{-1}$.

Theorems 5.5.6 and 5.5.7 are true if the condition on the distribution of disturbance is that moments of order $2 + \varepsilon$ are uniformly bounded instead of being identically distributed, or a Lindeberg-type condition could be used.

5.5.5. Case of Unknown Mean

We now suppose that the vector stochastic difference equation is

(57) $y_t + By_{t-1} + \nu = u_t.$


Then the maximum likelihood equations for B, v, and C are

5

2 Yt-1 T

YI--IYGl

1-1

2Y G l T

T

t=1

2Yt-lYi T

-

‘=l

2

t=1

Y;

t=1

Equation (60) leads to

(63) $\hat\nu = -\bar y - \hat B \bar y_{(1)},$

where $\bar y = (1/T) \sum_{t=1}^T y_t$ and $\bar y_{(1)} = (1/T) \sum_{t=1}^T y_{t-1}$. We want to show that the asymptotic behavior of $\hat B$ defined by (62) is the same as that of $\hat B$ defined by (3) when $\mu = 0$. This will follow from showing that $\sqrt{T}\, \bar y \bar y_{(1)}'$ and $\sqrt{T}\, \bar y_{(1)} \bar y_{(1)}'$ converge stochastically to 0 when $\mu = 0$. First consider the case where (57) holds for $t = \dots, -1, 0, 1, \dots$. Then

s=o

s=o

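Then

(66) $T\mathcal{E}(\bar y^* - \mu)(\bar y^* - \mu)' = F + \dfrac{1}{T} \sum_{s=1}^{T-1} (T - s) F(-B')^s + \dfrac{1}{T} \sum_{s=1}^{T-1} (T - s)(-B)^s F,$

which converges to

(67) $\lim_{T\to\infty} T\mathcal{E}(\bar y^* - \mu)(\bar y^* - \mu)' = F + F \sum_{s=1}^{\infty} (-B')^s + \sum_{s=1}^{\infty} (-B)^s F$
$\qquad = F + F(-B')(I + B')^{-1} + (I + B)^{-1}(-B)F$
$\qquad = F(I + B')^{-1} + (I + B)^{-1}F - F.$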
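A scalar check makes (67) concrete. For $y_t + b y_{t-1} + \nu = u_t$ one has $F = \sigma^2/(1 - b^2)$, and the right-hand side of (67) reduces to $2F/(1 + b) - F = \sigma^2/(1 + b)^2$. The sketch below compares this with the finite-$T$ expression preceding (67); the function names and numerical values are illustrative choices.

```python
def limiting_variance(b, sigma2):
    """Scalar form of (67): F/(1 + b) + F/(1 + b) - F."""
    F = sigma2 / (1.0 - b * b)
    return F / (1.0 + b) + F / (1.0 + b) - F


def finite_T_variance(b, sigma2, T):
    """Scalar form of T * E(ybar - mu)^2 for a stationary series of length T."""
    F = sigma2 / (1.0 - b * b)
    total = F
    for s in range(1, T):
        # The two sums in the finite-T expression coincide in the scalar case.
        total += 2.0 * (T - s) / T * F * (-b) ** s
    return total


b, sigma2 = 0.5, 1.0
print(limiting_variance(b, sigma2), sigma2 / (1.0 + b) ** 2)
print(finite_T_variance(b, sigma2, 5000))
```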

By the Tchebycheff inequality $\sqrt{T}(\bar y^* - \mu)(\bar y^* - \mu)'$ converges stochastically to 0. In the case where (57) holds for $t = 1, 2, \dots$ and $y_0$ is a fixed vector

(68) $y_t = \mu + (-B)^t (y_0 - \mu) + \sum_{s=0}^{t-1} (-B)^s u_{t-s}.$

The difference between $y_t$ defined by (64) and $y_t$ defined by (68) is

(69) $(-B)^t [y_0^* - (y_0 - \mu)],$

where $y_0^*$ is given by (7), and the difference between $\sqrt{T}\, \bar y^*$ and $\sqrt{T}\, \bar y$ defined by the two is (70); the sum of expected values of the squares of the components of (70) converges to 0. It follows from these facts that $\sqrt{T}\, \bar y \bar y_{(1)}'$ and $\sqrt{T}\, \bar y_{(1)} \bar y_{(1)}'$ converge stochastically to 0 if $\mu$ is 0 in the second case as well. Thus for both models $\hat B$ defined by (62) has the asymptotic properties of $\hat B$ defined by (3).

THEOREM 5.5.8. If $y_t$ is defined by (64) or (68), where $-B$ has characteristic roots less than 1 in absolute value, and if the $u_t$'s are independently and identically distributed with $\mathcal{E}u_t = 0$ and $\mathcal{E}u_t u_t' = \Sigma$, then $\sqrt{T}(\bar y - \mu)$ has a limiting normal distribution with mean 0 and covariance matrix (67).

PROOF. Let $S_T = \sqrt{T}\, \phi'(\bar y - \mu)$; let $y_t^{(k)}$ be (45) plus $\mu$, $Z_{kT} = (1/\sqrt{T}) \sum_{t=1}^T \phi'(y_t^{(k)} - \mu)$, and $X_{kT} = S_T - Z_{kT}$. Then if $y_t$ is defined by (64)

(71)

&'Xi, = +'(-B)k+l[F(l

+ B')-' + ( I + B)-'F - F](-B')k+'+

= MR,

which converges to 0. By Theorem 7.7.5 we find that $Z_{kT}$ has a limiting normal distribution as $T \to \infty$ with mean 0 and a variance depending on $k$ which converges as $k \to \infty$. It follows by Corollary 7.7.1 that $S_T$ has a limiting normal distribution, and hence $\sqrt{T}(\bar y - \mu)$ has a limiting normal distribution. Use of (70) and another application of Corollary 7.7.1 yields the theorem for $y_t$ defined by (68).

The scalar stochastic difference equation of order $p$ can be written in vector form with $\tilde\nu$ having only the first component different from 0, namely $\nu$. In fact, equations (60) for the model (57) with $B$ replaced by $\tilde B$, $y_t$ by $\tilde y_t$, and $u_t$ by $\tilde u_t$ force the last $p - 1$ components of the estimate of $\tilde\nu$ to be 0. Then (59) implies all of the components of $\tilde\mu$ are $\mu = -\nu/(1 + \sum_{j=1}^p \beta_j)$. The above results apply in this case; $\sqrt{T}(\bar{\tilde y} - \tilde\mu)$ has the limiting distribution of $\sqrt{T}(\bar y - \mu)\varepsilon$, where $\varepsilon = (1, 1, \dots, 1)'$.

5.5.6. Case of Fixed Variates

The maximum likelihood estimates were obtained when the model was

(72) $\sum_{s=0}^p \beta_s y_{t-s} + \sum_{j=1}^q \gamma_j z_{jt} = u_t.$

This can be considered as a special case of the $p$-component vector model

(73) $y_t + By_{t-1} + \Gamma z_t = u_t,$

where $\Gamma$ is a $p \times q$ matrix of constants and $z_t = (z_{1t}, \dots, z_{qt})'$. The maximum likelihood estimates of $B$, $\Gamma$, and $\Sigma$ are defined by

2 YL-lY;-l -T 2Yt-12; T

1 :

t= I

:=1

m

m

m

The asymptotic theory developed for $\hat B$ and $\hat\Sigma$ defined by (3) and (4) can be extended to $\hat B$, $\hat\Gamma$, and $\hat\Sigma$ defined here. The details, however, are extremely laborious. Instead of giving them in full we shall only sketch the developments. [See T. W. Anderson and Rubin (1950) and Koopmans, Rubin, and Leipnik (1950).]

If (73) holds for $t = 1, 2, \dots$ and $y_0$ is fixed, then

(76) $y_t = \sum_{s=0}^{t-1} (-B)^s u_{t-s} + w_t + (-B)^t y_0,$

where

(77) $w_t = -\sum_{s=0}^{t-1} (-B)^s \Gamma z_{t-s}.$

If (73) holds for $t = \dots, -1, 0, 1, \dots$ and $z_t = 0$ for $t = 0, -1, \dots$, then the solution is

(78) $y_t^* = \sum_{s=0}^{\infty} (-B)^s u_{t-s} + w_t.$

The difference $y_t^* - y_t$ is $(-B)^t (y_0^* - y_0)$, where $y_0^*$ is given by (7).

THEOREM 5.5.9. If $y_t$ is defined by (73) for $t = 1, 2, \dots$ with $y_0$ a fixed vector and $y_t^*$ is defined by (73) for $t = \dots, -1, 0, 1, \dots$, where the $u_t$'s are independently distributed with $\mathcal{E}u_t = 0$ and $\mathcal{E}u_t u_t' = \Sigma$, if $-B$ has characteristic roots less than 1 in absolute value, and if $z_t' z_t < N$, $t = 1, 2, \dots$, for some $N$, and $z_t = 0$, $t = 0, -1, \dots$, then (18), (19), (20) hold and (79)

PROOF. We have


The first, third, and fifth terms divided by $\sqrt{T}$ converge stochastically to 0 by the proof of Theorem 5.5.1. The sum of expected values of squares of components of the second term is

2 T

(82)

tr 8

(-B)’(yX

- yo)w;wl,(y(:- Y ~ ) ’ ( - B ’ ) ~ ‘

f.f’=l

(83)

and each element of $z_t$ is less than $\sqrt{N}$, (82) is uniformly bounded (by Lemma 5.5.1). Hence, (81) divided by $\sqrt{T}$ converges stochastically to 0. Now consider

f=1

f=I

The sum of expected values of the squares of the components of (84) is bounded; the argument is that given above with w f replaced by z t . This proves (79). Then ( 8 5 ) 2TY L Y ? ’ f=1

- CTY 1 - S ; 1=1

=

-

[t 2 Yt-IYt-I * *I

1-1

-

Yf-1Yt-1 I ] B‘

f*l

and the sum of expected values of the squares of the components of the last matrix is uniformly bounded. (See the proof of Theorem 5.5.1.) Finally

Thus, the theorem is proved.

As in Section 5.5.2 the asymptotic properties of $\hat B$, $\hat\Gamma$, and $\hat\Sigma$ (under appropriate conditions) are the same for $y_t$ defined by (73) for $t = 1, 2, \dots$ with $y_0$ a fixed vector as for $y_t^*$ defined by (73) for $t = \dots, -1, 0, 1, \dots$ or equivalently by (78). In the sequel we shall for convenience treat only the latter case and shall use the notation $y_t$ to denote (78).

LEMMA 5.5.8. If $y_t$ is defined by (78), where the $u_t$'s are independently distributed with $\mathcal{E}u_t = 0$ and $\mathcal{E}u_t u_t' = \Sigma$, if $-B$ has characteristic roots less than 1 in absolute value, and if $z_t' z_t < N$, $t = 1, 2, \dots$, for some $N$, then (87) holds and

(88) $\operatorname{plim}_{T\to\infty} \dfrac{1}{T} \sum_{t=1}^T u_t z_t' = 0.$

PROOF. The sum of the expected values of the squares of the components of (87) is

which converges to 0. The sum of expected values of the squares of the components of (88) is

(90) $\dfrac{1}{T^2} \sum_{t=1}^T z_t' z_t \operatorname{tr} \Sigma < \dfrac{1}{T} \operatorname{tr} \Sigma\, N,$

which converges to 0. Q.E.D.

Lemma 5.5.8 yields

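(92) $\operatorname{plim}_{T\to\infty} \left[ \dfrac{1}{T} \sum_{t=1}^T y_t z_t' + B \dfrac{1}{T} \sum_{t=1}^T y_{t-1} z_t' + \Gamma \dfrac{1}{T} \sum_{t=1}^T z_t z_t' \right] = 0.$

Lemma 5.5.2, (91), and (92) imply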


Now we assume the existence of the following limits: m

(94) (95) Then (96) and

0, if the roots of the associatedpolynomial equation are less than 1 in absolute value, ifziz, < N,t = 1, 2, . . . , for some N, if 8ut < m , t = 1,2, . . . , for some m , if M defined by (94) is positive definite, and if the limits (95) and (102) exist, where

SEC. 5.6.

21 1

INFERENCE BASED ON LARGE-SAMPLE THEORY

then $\sqrt{T}(\hat\beta - \beta)$, $\sqrt{T}(\hat\gamma - \gamma)$ have a limiting normal distribution with mean 0 and covariance matrix

5.6. STATISTICAL INFERENCE ABOUT AUTOREGRESSIVE MODELS BASED ON LARGE-SAMPLE THEORY

5.6.1. Relation to Regression Methods

In Section 5.4 we showed that the maximum likelihood estimates of $\beta = (\beta_1, \dots, \beta_p)'$, $\gamma = (\gamma_1, \dots, \gamma_q)'$, and $\sigma^2$ are based on minimizing

(1) $\sum_{t=1}^T \Bigl( y_t + \sum_{j=1}^p \beta_j y_{t-j} + \sum_{i=1}^q \gamma_i z_{it} \Bigr)^2$

and hence these estimates are formally least squares estimates. This implies that all computational procedures of ordinary regression analysis are available. In Section 5.5 we showed that under appropriate conditions the estimates $\hat\beta$ and $\hat\gamma$ had asymptotically a normal distribution just as in the case of ordinary regression. Thus for large samples (that is, large $T$) we can treat them as normally distributed and the procedures of ordinary regression analysis can be used. The approximating multivariate normal distribution of $\sqrt{T}(\hat\beta - \beta)$ and $\sqrt{T}(\hat\gamma - \gamma)$ has covariance matrix

-

which in turn is approximated by

(3)


where (4)

L$

Yt-IYt-2

l=l

L$

Yt-2Yt-1

tEl

1

-

2Y L T

f=l

$i ;E Yt-vYt-

Yt-,Yt-2

t-1

t"l

.

fC T

Yt-vZlt

t=1

L=l

1

r=i


Usually, there will be a constant which is included by letting $z_{1t} = 1$. Then when $\gamma_1$ is eliminated all the sums of squares and cross-products are in terms of deviations from the corresponding means. (We have dropped the tilde from $F$, $H$, and $L$ because in Section 5.6 the general vector case is not treated.)

As remarked at the end of Section 5.4, the terms on any diagonal of $(1/T) \sum_{t=1}^T \tilde y_{t-1} \tilde y_{t-1}'$ differ only with respect to the terms at the beginning and end of the series. As far as the large-sample theory goes, these end-effects are negligible and the same term could be used in each position on a given diagonal. Moreover, the divisor $T$ could be modified. This suggests

(8) $\dfrac{1}{T + p - s} \sum_{t=-p+s+1}^{T} y_t y_{t-s}$, $s = 0, 1, \dots, p$,

on the diagonal $s$ above or below the main diagonal. If the mean $\bar y^* = \sum_{t=-p+1}^{T} y_t/(T + p)$ is subtracted, the sum is

(9) $\dfrac{1}{T + p - s} \sum_{t=-p+s+1}^{T} (y_t - \bar y^*)(y_{t-s} - \bar y^*).$

Other modifications are possible such as $\sum_{t=-p+s+1}^{T} y_t y_{t-s}/(T + p - s) - \bar y^{*2}$. Note that we have available observations from $-p + 1$ to $T$.
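The modified diagonal terms are simple to compute. The sketch below assumes the $T + p$ observations $y_{-p+1}, \dots, y_T$ are held in a Python list in time order, and places one averaged lagged product on every position of each diagonal; the function names are illustrative choices.

```python
def lagged_mean_products(y, p, subtract_mean=False):
    """Averages of y_t * y_{t-s} over the T + p - s available pairs, s = 0..p.

    `y` holds y_{-p+1}, ..., y_T in time order, so len(y) = T + p.
    With subtract_mean=True the overall mean is removed first.
    """
    n = len(y)
    if subtract_mean:
        mean = sum(y) / n
        y = [v - mean for v in y]
    return [sum(y[i] * y[i - s] for i in range(s, n)) / (n - s)
            for s in range(p + 1)]


def toeplitz_from_products(c, p):
    """p x p matrix with the same term c[|i - j|] along each diagonal."""
    return [[c[abs(i - j)] for j in range(p)] for i in range(p)]


c = lagged_mean_products([1.0, 2.0, 3.0, 4.0], p=1)
c_centered = lagged_mean_products([1.0, 2.0, 3.0, 4.0], p=1, subtract_mean=True)
print(c, c_centered)
print(toeplitz_from_products(c_centered, 1))
```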

5.6.2. Tests of Hypotheses and Confidence Regions

From the usual regression theory we verify that each of

where $s^2 a^{ii}$ and $s^2 a^{p+j,p+j}$ are the $i$th diagonal element and $(p + j)$th diagonal element, respectively, of the matrix in (3), has approximately a normal distribution with mean 0 and variance 1. This enables us to test a hypothesis about $\beta_i$ or $\gamma_j$ or set up a confidence interval for $\beta_i$ or $\gamma_j$ in the usual way. Suppose we are interested in a set of coefficients, say, $\beta_{i_1}, \dots, \beta_{i_g}, \gamma_{j_1}, \dots, \gamma_{j_h}$. We can use the fact that

Ifyj,,tFzjujVform the matrix inverse to the matrix consisting of rows where iiui,, and columns il, . . . ,i,,

jl + p ,

. . . ,jn + p

+

of

(LA: :)-’,

has an

approximate $\chi^2$-distribution with $g + h$ degrees of freedom. This leads to tests of hypotheses and confidence regions. (The $F$-distribution with $g + h$ and $T - (p + q)$ degrees of freedom could also be used.)

5.6.3. Testing the Order of an Autoregressive Process

The order of a stochastic difference equation is determined by the highest lag which has a nonzero coefficient; as we have written the equation the order is $p$ ($\beta_p \ne 0$). Apart from the effect of any "independent variables" $z_{it}$, at most $p$ lagged variables, $y_{t-1}, \dots, y_{t-p}$, directly affect $y_t$; only these need be used in prediction. It is of interest, therefore, to determine the order of an autoregressive model. We shall now study this problem when the difference equation is

(12) $y_t + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + \gamma = u_t.$

In this case the estimating equations for $\beta_1, \dots, \beta_p$ are given by

(13) $\sum_{j=1}^p a_{ij}^* \hat\beta_j = -a_{i0}^*$, $i = 1, \dots, p$,


where (14)

and $\bar y = \bar y_{(0)}$, and $\bar y_{(j)}$ is given by (17) of Section 5.4. [The equations (13) are (18) of Section 5.4.]

First let us consider deciding whether the order is $p - 1$ or $p$; that is, deciding whether $\beta_p = 0$. We formulate this as a problem of testing the null hypothesis $H$: $\beta_p = 0$ against alternatives $\beta_p \ne 0$. The likelihood ratio test is the analogue of the two-sided $t$-test in the regression problem; we reject the null hypothesis if

(15) $\dfrac{\sqrt{T}\,|\hat\beta_p|}{s\sqrt{a^{*pp}}} > t(\varepsilon),$

where $(a^{*ij}) = (a_{ij}^*)^{-1}$ and $t(\varepsilon)$ is the (two-sided) $\varepsilon$-significance point of the standardized normal distribution. For sufficiently large $T$, this test has approximate significance level $\varepsilon$.

Suppose the null hypothesis is not true. In general $\operatorname{plim}_{T\to\infty} a^{*pp} = f^{pp}$, where $(f^{ij})$ is the inverse of the matrix given by (37) of Section 5.5, $\operatorname{plim}_{T\to\infty} s = \sigma$, and $\operatorname{plim}_{T\to\infty} \hat\beta_p = \beta_p$. Then roughly the test statistic behaves like $\sqrt{T}\,|\beta_p|/(\sigma\sqrt{f^{pp}})$. This argument can be made rigorous to prove that the test is consistent; that is, for a given alternative $\beta_p \ne 0$, the probability of rejecting the null hypothesis approaches 1 as $T$ increases.

A more interesting study of the power of the test can be based on considering a sequence of alternative hypotheses $\beta_{pT}$ (with $\sigma^2, \beta_1, \dots, \beta_{p-1}$ fixed and $u_t$ normally distributed, say) such that $\lim_{T\to\infty} \sqrt{T}\beta_{pT} = \phi$. Then $\sqrt{T}\hat\beta_p/(s\sqrt{a^{*pp}})$ has a limiting normal distribution with variance 1 and mean $\phi/(\sigma\sqrt{f^{pp}})$. To make the argument rigorous (and some later arguments rigorous) we could prove that $\sqrt{T}(\hat B - B_T)$ has a limiting normal distribution when $\lim_{T\to\infty} \sqrt{T}(B - B_T) = \Phi$ for suitable $B$, with mean matrix $\Phi$ and covariances corresponding to $B$. (Here $\hat B$, $B$, and $B_T$ refer to the Markov vector case.)
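The test of $\beta_p = 0$ can be sketched as follows. This is an illustration under stated assumptions, not the book's own computation: $a_{ij}^*$ is taken as the average over $t$ of products of deviations of $y_{t-i}$ and $y_{t-j}$ from their own means, $s^2$ is taken as the residual mean square with divisor $T$, and the statistic is $\sqrt{T}\,|\hat\beta_p|/(s\sqrt{a^{*pp}})$. All function names are hypothetical.

```python
import math
import random


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def order_test_statistic(y, p):
    """sqrt(T) |beta_p_hat| / (s sqrt(a^{pp})) from len(y) = T + p observations."""
    n = len(y)
    T = n - p
    lagged = [[y[t - i] for t in range(p, n)] for i in range(p + 1)]  # y_{t-i}
    dev = [[v - sum(col) / T for v in col] for col in lagged]
    a = [[sum(u * v for u, v in zip(dev[i], dev[j])) / T
          for j in range(p + 1)] for i in range(p + 1)]
    A_inv = invert([row[1:] for row in a[1:]])
    beta = [-sum(A_inv[i][j] * a[j + 1][0] for j in range(p)) for i in range(p)]
    s2 = a[0][0] + sum(beta[j] * a[0][j + 1] for j in range(p))  # residual mean square
    return math.sqrt(T) * abs(beta[-1]) / math.sqrt(s2 * A_inv[-1][-1])


def simulate_ar2(beta1, beta2, n, seed=0):
    """n values from y_t + beta1*y_{t-1} + beta2*y_{t-2} = u_t after a burn-in."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n + 100):
        y.append(-beta1 * y[-1] - beta2 * y[-2] + rng.gauss(0.0, 1.0))
    return y[-n:]


stat = order_test_statistic(simulate_ar2(0.0, 0.6, 502), p=2)
print("statistic:", stat, "reject beta_p = 0 at the 5% level:", stat > 1.96)
```

Here 1.96 is the two-sided 5% point of the standardized normal distribution, playing the role of $t(\varepsilon)$.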

An interesting point is the following:

LEMMA 5.6.1. If $\beta_p = 0$, then

(19) $\sigma^2 f^{pp} = 1.$

PROOF. We have (Problem 8 of Chapter 2)

(20)

f ; = f v p - (fvl

* *

*f,,v-d

f+-l,l

Since in the stationary case (21)

f i j

* - .

--

*

5.

*"f fv-1.9-1

- BYt-i)(Yt-, = 4%- '%++J =4Yt-i

CH.

fv-1.v

'Yt-j)

& M L j + i

= a(i - j ) ,

i , j = 1 ,... , p ,

corresponds to the difference equation of order $p - 1$ when $\beta_p = 0$, the coefficients satisfy [by (48) of Section 5.2]

Then (20) becomes [by (47) of Section 5.2]

(23) $\dfrac{1}{f^{pp}} = \sigma(0) + \sigma(1)\beta_1 + \cdots + \sigma(p - 1)\beta_{p-1} = \sigma^2.$

This proves the lemma, which in turn implies the following theorem:

THEOREM 5.6.1. If $\beta_p = 0$, then under the conditions of Theorem 5.5.7 $\sqrt{T}\hat\beta_p$ has a limiting normal distribution with mean 0 and variance 1.

Now consider testing the null hypothesis that the order of a stochastic difference equation is $m$ against alternative hypotheses that the order is $p$ ($> m$). Let $A^* = (a_{ij}^*)$ be partitioned into $m$ and $p - m$ rows and columns and $\hat\beta$ correspondingly


Then the likelihood ratio criterion for testing the hypothesis that $\beta_{m+1} = \cdots = \beta_p = 0$ is a monotonic function of

which has a limiting $\chi^2$-distribution with $p - m$ degrees of freedom when the null hypothesis is true. Analogy with the usual regression procedures suggests that the power of this test depends mainly on the function of the parameters measuring the excess in the mean square error in prediction by using $m$ terms instead of $p$. [See Section 2.5; this was pointed out by A. M. Walker (1952).]

Another approach to testing the hypothesis that $\beta_{m+1} = \cdots = \beta_p = 0$ has been given by Quenouille (1947) and extended by Bartlett and Diananda (1950) and others. For simplification in presentation we assume $\gamma = 0$, and therefore shall use $A$ in place of $A^*$. If $\{y_t\}$ is a stationary autoregressive process of arbitrary order, say $p$, with coefficients $\beta_1, \dots, \beta_m, 0, \dots, 0$, then

-

1

-

T

i = 1, . . . , p ,

have means 0 and covariance matrix aY = a2[a(i- j ) ] , which is of order p. It follows from Theorem 5.5.5 that g,, . . . ,g, have a limiting joint normal distribution. The limiting distribution of a given number of g,'s is the same as the distribution of that number of ay,'s when the ay,'s are normally distributed. Then m

J

=m

+ 1 , . . . ,p ,


have a limiting joint normal distribution. For any T the joint distribution has means 0 and covariances bhjhj,, based on

k=O

2 Bku(j m

= U*

k

- i)

k=O

> i,

= 0,

j

= a4,

j = i,

by (47) and (48) of Section 5.2. Thus m

= 0,

i >j',

= a4,

j =j'.

That is, when the null hypothesis is true, $h_{m+1}/\sigma^2, h_{m+2}/\sigma^2, \dots, h_p/\sigma^2$ have a joint distribution with means 0, variances 1, and covariances 0; their limiting distribution is normal. We could use the $h_j$'s to test the hypothesis that $\beta_{m+1} = \cdots = \beta_p = 0$ if $\sigma^2$ and $\beta_1, \dots, \beta_m$ are known because then

(30) $\sum_{j=m+1}^p h_j^2/\sigma^4$

has a limiting $\chi^2$-distribution with $p - m$ degrees of freedom. To make this procedure more efficient A. M. Walker (1952) has suggested adding another statistic so the criterion has a limiting $\chi^2$-distribution with $p$ degrees of freedom.

To test the hypothesis $\beta_{m+1} = \cdots = \beta_p = 0$ when the other parameters are unknown it has been suggested to use (30) with the unknown parameters replaced by estimates. Let $\hat h_j$ be defined by (31), where $\hat\beta_0 = 1$, $\hat\beta_1, \dots, \hat\beta_m$ are the maximum likelihood estimates given in Section 5.4 (with $p$ replaced by $m$). We propose the criterion $\sum_{j=m+1}^p \hat h_j^2/s^4$, and


now verify that it is asymptotically equivalent to (30) when the null hypothesis is true. Since $\hat\beta_k$ is a consistent estimate of $\beta_k$ and $\operatorname{plim}_{T \to \infty} a_{ij}/T = \sigma(i - j)$,

(33)

Hence

= 0,

j = m +

1, . . . , p .

Since $s^2$ (based on the $m$th-order stochastic difference equation) is a consistent estimate of $\sigma^2$ when the null hypothesis is true, our verification is complete.

THEOREM 5.6.2. If the roots of the polynomial equation associated with the $m$th-order stochastic difference equation are less than 1 in absolute value,

$$\sum_{j=m+1}^{p} \frac{\hat h_j^2}{s^4}, \tag{35}$$

where $\hat h_j$ is defined by (31) and $s^2$ is the estimate of $\sigma^2$ based on the $m$th-order stochastic difference equation, has a limiting $\chi^2$-distribution with $p - m$ degrees of freedom.

Now we shall relate these two approaches to testing the order of the stochastic difference equation.

LEMMA 5.6.2.

Whenp = m

+1

where Bm+l(m + 1) is the estimate of Pmfl on the assumption of an ( m order process and where ii is ah estimate of u2 based on (45) below.

+ 1)st-

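PROOF. Let the matrix of sums of squares and cross-products with elements

$$a_{ij} = \sum_{t=1}^{T} y_{t-i} y_{t-j}, \qquad i, j = 0, 1, \ldots, m + 1, \tag{37}$$

be partitioned

$$A = \begin{pmatrix} a_{00} & A_{01} & a_{0,m+1} \\ A_{10} & A_{11} & A_{1,m+1} \\ a_{m+1,0} & A_{m+1,1} & a_{m+1,m+1} \end{pmatrix}. \tag{38}$$

Then the estimate of $\beta(m) = [\beta_1(m), \ldots, \beta_m(m)]'$ based on a difference equation of order $m$ is

$$\hat\beta(m) = -A_{11}^{-1} A_{10}. \tag{39}$$

Use of these coefficients in the definition of $\hat h_{m+1}$ yields

$$\begin{aligned} \sqrt{T}\,\hat h_{m+1} &= a_{0,m+1} + \sum_{k=1}^{m} \hat\beta_k(m) a_{0,m+1-k} + \sum_{j=1}^{m} \hat\beta_j(m) a_{j,m+1} + \sum_{j,k=1}^{m} \hat\beta_k(m) \hat\beta_j(m) a_{j,m+1-k} \\ &= a_{0,m+1} + \sum_{j=1}^{m} \hat\beta_{m+1-j}(m) a_{0j} + A_{m+1,1} \hat\beta(m) + \sum_{i,j=1}^{m} \hat\beta_j(m) \hat\beta_{m+1-i}(m) a_{ij} \\ &= a_{0,m+1} - A_{m+1,1} A_{11}^{-1} A_{10}. \end{aligned} \tag{40}$$

The second and fourth terms of the middle expression cancel because $\hat\beta(m)$ satisfies $A_{10} + A_{11}\hat\beta(m) = 0$. The equations for the estimates of $\beta_1, \ldots, \beta_{m+1}$ based on a difference equation of order $m + 1$ are

$$\begin{pmatrix} A_{11} & A_{1,m+1} \\ A_{m+1,1} & a_{m+1,m+1} \end{pmatrix} \begin{pmatrix} \hat\beta^{(m)}(m+1) \\ \hat\beta_{m+1}(m+1) \end{pmatrix} + \begin{pmatrix} A_{10} \\ a_{m+1,0} \end{pmatrix} = 0, \tag{41}$$

where $\hat\beta^{(m)}(m+1)$ denotes the vector of the first $m$ estimated coefficients; that is,

$$A_{11} \hat\beta^{(m)}(m+1) + A_{1,m+1} \hat\beta_{m+1}(m+1) = -A_{10}, \tag{42}$$

$$A_{m+1,1} \hat\beta^{(m)}(m+1) + a_{m+1,m+1} \hat\beta_{m+1}(m+1) = -a_{m+1,0}. \tag{43}$$

[Note that these correspond to (29) and (30) of Section 5.4.] Multiplication of (42) by $A_{m+1,1} A_{11}^{-1}$ and subtraction from (43) yields

$$(a_{m+1,m+1} - A_{m+1,1} A_{11}^{-1} A_{1,m+1})\, \hat\beta_{m+1}(m+1) = -(a_{m+1,0} - A_{m+1,1} A_{11}^{-1} A_{10}). \tag{44}$$

The right-hand side is $-\sqrt{T}\,\hat h_{m+1}$. The coefficient of $\hat\beta_{m+1}(m+1)$ on the left-hand side is $T$ times

$$\tilde\sigma_m^2 = \frac{1}{T} (a_{m+1,m+1} - A_{m+1,1} A_{11}^{-1} A_{1,m+1}), \tag{45}$$

which is asymptotically equivalent to $s^2$ (based on an equation of order $m$); the corresponding sums in the matrices defining $\tilde\sigma_m^2$ and $s^2$ differ only with respect to end terms. This proves the lemma.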

LEMMA 5.6.3. If $\beta_{m+1} = \cdots = \beta_p = 0$ and $\gamma = 0$,

*

PROOF. The proof of Lemma 5.6.2 shows that $\hat\beta_j(j)$ is the ratio of the right-hand side of the normal equations to the coefficient of $\hat\beta_j$ after the first $j - 1$ $\hat\beta_i$'s have been eliminated from the normal equations. This corresponds to $b_j^* = c_j^*/a_{jj}^*$ in (20) of Section 2.3. In this case the coefficient of $\hat\beta_j$ is $T\tilde\sigma_{j-1}^2$ as defined in (45) for $j = m + 1$. Thus as shown in Section 2.3 the numerator of the left-hand term in (46) is

+

(47) The lemma follows because -2

sj-1 plim -=

(48)

T-+m

THEOREM 5.6.3. If&+1 = ( 2 9 , (301, (351, atid

*

- = 3/,

Q.E.D.

1.

3’

= 0 and y = 0 , the diferences between

(49) i=m+l

have probability limits 0.

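PROOF. These probability limits follow from Lemmas 5.6.2 and 5.6.3, (48), and $\operatorname{plim}_{T \to \infty} s^2 = \sigma^2$.

Each criterion has a limiting $\chi^2$-distribution with $p - m$ degrees of freedom under the null hypothesis $\beta_{m+1} = \cdots = \beta_p = 0$ if $\gamma = 0$; if $\gamma \neq 0$, $A$ is replaced by $A^*$. A variant on $\sum_{j=m+1}^{p} h_j^2/\sigma^4$ is obtained by replacing $h_j$ by

$$\sum_{k=0}^{m} \beta_k g_{j+k} = \frac{1}{\sqrt{T}} \sum_{k=0}^{m} \beta_k \sum_{t=1}^{T} y_{t-j-k} u_t = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} u_{t-j} u_t, \qquad j = m + 1, \ldots, p. \tag{50}$$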


These variables have means 0, variances $\sigma^4$, are uncorrelated, and have a limiting joint normal distribution. However, when $\beta_1, \ldots, \beta_m$ are replaced by estimates in the definition, the limiting distribution is altered. A. M. Walker (1952) has considered the limiting power of these tests against alternatives approaching the null hypothesis.

The asymptotic theory is not altered by changing end terms in the sums. In particular, when $\gamma \neq 0$, we can replace $A^*$ by

where $c_0^*, \ldots, c_p^*$ are defined by (24) of Section 5.4 when the observations are $y_1, \ldots, y_T$. In turn, the $c_j^*$ can be replaced by $r_j^* = c_j^*/c_0^*$, $j = 0, 1, \ldots, p$. Note that (44) of this section corresponds to (31) of Section 5.4 (with $p$ and $m$ interchanged). Let us partition $C^*$ into

c:

CZL1

Cf

(52)

Then the partial correlation between $y_t$ and $y_{t+p}$, "holding $y_{t+1}, \ldots, y_{t+p-1}$ fixed," is

(53)

c:

- c:,*I1(c:-l)-lc,-l

- z$:l(c:-l)-lE:-l

The normal equations for the estimate of $\beta(p)$ based on $c_j^*$ as an estimate of $\sigma(j)$, $j = 0, 1, \ldots, p$, are

(54)

The solution of (54) for $-b_p(p)$ is (53). Thus, except possibly for end effects, the estimate of $\beta_p$ under the assumption that the order is $p$ is, apart from sign, the partial correlation between $y_t$ and $y_{t+p}$ taking into account $y_{t+1}, \ldots, y_{t+p-1}$.
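The relation between the last coefficient and the partial correlation can be illustrated numerically. The following is a minimal sketch, not the computational scheme of Section 5.4: it assumes the correlations are supplied as a list with `r[0] = 1`, uses the sign convention $y_t + b_1 y_{t-1} + \cdots + b_p y_{t-p} = u_t$, and the function name `last_coefficients` is hypothetical. The recursion is the standard order-updating solution of the normal equations.

```python
def last_coefficients(r, p_max):
    """Return [b_1(1), b_2(2), ..., b_pmax(pmax)] from correlations r[0..p_max].

    Convention (assumed): y_t + b_1 y_{t-1} + ... + b_p y_{t-p} = u_t,
    so b_p(p) is minus the partial correlation at lag p.
    """
    b_prev = []   # coefficients b_1(p-1), ..., b_{p-1}(p-1) of the previous order
    last = []
    for p in range(1, p_max + 1):
        num = r[p] + sum(b_prev[k - 1] * r[p - k] for k in range(1, p))
        den = 1.0 + sum(b_prev[k - 1] * r[k] for k in range(1, p))
        b_pp = -num / den
        # update the lower-order coefficients: b_k(p) = b_k(p-1) + b_p(p) b_{p-k}(p-1)
        b_new = [b_prev[k - 1] + b_pp * b_prev[p - k - 1] for k in range(1, p)]
        b_new.append(b_pp)
        b_prev = b_new
        last.append(b_pp)
    return last


# Correlations of the sunspot numbers as recorded in Table 5.3.
sunspot_r = [1.0, 0.811180, 0.433998, 0.031574, -0.264463, -0.404119]
sunspot_b = last_coefficients(sunspot_r, 5)
```

With the correlations of Table 5.3 the first values returned should agree, up to rounding, with the last coefficients listed in Table 5.4.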


Another question is that of choosing the suitable order of the stochastic difference equation. The analogy with ordinary regression (Section 3.2) suggests that if the investigator can specify a maximum order $p$ and a minimum order $m$ he can test $\beta_p = 0$, then $\beta_{p-1} = 0$, etc. We shall treat this question in Chapter 6 in terms of a modified model that permits the use of an exact theory similar to that of Section 3.2. For large $T$ the normal theory can be used. The maximum order $p$ can be taken quite large, say $T/10$ or $T/4$, if $\varepsilon_i$, $i = p, p - 1, \ldots, m + 1$, is taken very small. Since the series is presumably studied because it is known that the observations are not independent, the minimum order $m$ might be taken positive but small, say 2.

To test $\beta_p = 0$ on a large-sample basis one needs only $b_p(p)$. In fact, to carry out the multiple-decision procedure one only needs $b_j(j)$ for $j = p$ and smaller $j$ until the hypothesis $\beta_j = 0$ is rejected. The calculation of these on the basis of $r_1^*, \ldots, r_p^*$ was indicated in Section 5.4. Alternatively the forward solution of $R_p^* b(p) = -r_p^*$ can be carried out to obtain the statistics needed.
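The multiple-decision procedure just described can be sketched in a few lines. This is an illustration under stated assumptions, not a prescription from the text: each hypothesis $\beta_j = 0$ is tested two-sided with the large-sample normal approximation for $\sqrt{T}\,b_j(j)$, the function name `decide_order` and the dictionary layout are hypothetical, and the critical values come from `statistics.NormalDist` in the Python standard library.

```python
from statistics import NormalDist


def decide_order(stats, m, eps):
    """Test beta_p = 0, then beta_{p-1} = 0, ..., stopping at the first rejection.

    stats[j] holds sqrt(T) * b_j(j) and eps[j] the significance level of the
    jth test, for j = m+1, ..., p. Returns the decided order (m if nothing
    is rejected).
    """
    p = max(stats)
    for j in range(p, m, -1):
        critical = NormalDist().inv_cdf(1.0 - eps[j] / 2.0)  # two-sided test
        if abs(stats[j]) > critical:
            return j
    return m


# Values of sqrt(T) * b_p(p) for the sunspot numbers (Table 5.4), with m = 0.
sunspot_stats = {1: -10.761, 2: 8.690, 3: 1.340, 4: -0.180, 5: 0.663}
levels = {j: 0.05 for j in sunspot_stats}
order = decide_order(sunspot_stats, 0, levels)
```

For these values the hypotheses for $j = 5, 4, 3$ are accepted and the one for $j = 2$ is rejected, so the sketch settles on order 2.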

5.7. THE MOVING AVERAGE MODEL

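5.7.1. The Model

Let

$$y_t = \sum_{j=0}^{q} \alpha_j v_{t-j}, \tag{1}$$

where $\alpha_0 = 1$ and $\{v_t\}$ is a sequence of independent random variables with $\mathscr{E} v_t = 0$ and $\mathscr{E} v_t^2 = \sigma^2$. Then $\mathscr{E} y_t = 0$ and

$$\mathscr{E} y_t y_s = \sigma(t - s) = \begin{cases} \sigma^2 \sum_{j=0}^{q-|t-s|} \alpha_j \alpha_{j+|t-s|}, & |t - s| \le q, \\ 0, & |t - s| > q. \end{cases} \tag{2}$$

These first-order and second-order moments, of course, only depend on the $v_t$'s being uncorrelated. We can also write (1) as

$$y_t = \mathscr{L}^q M(\mathscr{L}^{-1}) v_t, \tag{3}$$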


where $\mathscr{L}$ is the lag operator ($\mathscr{L} v_t = v_{t-1}$),

$$M(z) = \sum_{j=0}^{q} \alpha_j z^{q-j}, \tag{4}$$

and $\mathscr{L}^q M(\mathscr{L}^{-1})$ is the linear operator formed by replacing $z$ in $z^q M(z^{-1}) = \sum_{j=0}^{q} \alpha_j z^j$ by $\mathscr{L}$.

The covariance generating function of any stationary process is $\sum_{h=-\infty}^{\infty} \sigma(h) z^h$; the coefficient of $z^k$ is $\sigma(k)$, $k = 0, \pm 1, \ldots$. In the case of the moving average (1) with covariances (2) the covariance generating function is

$$\sum_{h=-q}^{q} \sigma(h) z^h = \sigma^2 M(z) M(z^{-1}). \tag{5}$$

[Note that this equation, and hence (2), can be obtained from Lemma 3.4.1.] Given the coefficients $\alpha_0 = 1, \alpha_1, \ldots, \alpha_q$, the covariances can be calculated by forming the product on the right-hand side of (5) and identifying the coefficients of powers of $z$. Since the $\alpha_j$'s are real, the roots $z_1, \ldots, z_q$ of

$$M(z) = 0 \tag{6}$$

are real or complex conjugate in pairs. When $\alpha_q \neq 0$ [and hence $\sigma(q) \neq 0$], the roots of

$$\sum_{g=0}^{2q} \sigma(g - q) z^g = 0 \tag{7}$$

are $z_1, \ldots, z_q, 1/z_1, \ldots, 1/z_q$.

Now suppose we are given an arbitrary set of covariances of a stationary process $\sigma(0), \sigma(1), \ldots, \sigma(q) \neq 0$ with $\sigma(q + 1) = \cdots = 0$. Then if $z_j$ is a root of (7), $1/z_j$ also is a root because $\sigma(h) = \sigma(-h)$. The multiplicity of a root less than 1 in absolute value must be the same as the multiplicity of its reciprocal (because if the derivative of any order vanishes at $z_j$ it must vanish at $1/z_j$). Thus the roots different from 1 in absolute value can be paired $(z_j, 1/z_j)$. Moreover, any root of absolute value 1 must have even multiplicity because $\sum_{g=-q}^{q} \sigma(g) z^g$ is real and nonnegative for $|z| = 1$, say $z = e^{i\lambda}$, for then the function is a spectral density (Section 7.3). (See Problem 35.) The $2q$ roots of (7) can be numbered and divided into two sets $(z_1, \ldots, z_q)$ and $(z_{q+1}, \ldots, z_{2q})$ such that if $|z_i| < 1$ then $z_{q+i} = 1/z_i$ and if $|z_i| = 1$ then $z_{q+i} = z_i$, $i = 1, \ldots, q$.
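The calculation of the covariances from the coefficients, and the fact that a root and its reciprocal lead to the same correlations, can be checked with a small sketch. It assumes the coefficients are passed as a list beginning with $\alpha_0 = 1$; the function name `ma_covariances` is hypothetical.

```python
def ma_covariances(alpha, sigma2=1.0):
    """Covariances sigma(0), ..., sigma(q) of y_t = sum_j alpha[j] * v_{t-j}.

    Implements sigma(h) = sigma2 * sum_{j=0}^{q-h} alpha[j] * alpha[j+h].
    """
    q = len(alpha) - 1
    return [
        sigma2 * sum(alpha[j] * alpha[j + h] for j in range(q - h + 1))
        for h in range(q + 1)
    ]


# First-order moving averages whose roots of M(z) = 0 are -1/2 and -2.
cov_inside = ma_covariances([1.0, 0.5])   # root inside the unit circle
cov_outside = ma_covariances([1.0, 2.0])  # reciprocal root
rho_inside = cov_inside[1] / cov_inside[0]
rho_outside = cov_outside[1] / cov_outside[0]
```

Both coefficient sets give the first-order correlation 0.4; only the variance of $v_t$ needed to reproduce a given $\sigma(0)$ differs, which is why the covariances alone do not determine which root of each reciprocal pair belongs to $M(z)$.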


Since the coefficients in (7) are real, the roots are real or occur in conjugate pairs. Then

has real coefficients and (5) holds for $M(z)$ replaced by $M^*(z)$. Thus the covariance sequence $\{\sigma(h)\}$ with $\sigma(q + 1) = \cdots = 0$ can be generated by a finite moving average process (1) with coefficients $\alpha_0^* = 1, \alpha_1^*, \ldots, \alpha_q^*$, and $\operatorname{Var}(v_t) = \sigma^{*2}$. Given the covariances $\sigma(0), \sigma(1), \ldots, \sigma(q)$ or the variance $\sigma(0)$ and correlations $\rho_h = \sigma(h)/\sigma(0)$, $h = 1, \ldots, q$, the coefficients of the moving average process can be obtained by solving (7) for $z_1, \ldots, z_{2q}$, composing $M^*(z)$, and finding the coefficients of the powers of $z$.

If the roots of (6) are less than 1 in absolute value,

$$\frac{1}{z^q M(z^{-1})} = \sum_{r=0}^{\infty} \gamma_r z^r \tag{9}$$

converges for $|z| < 1/\max_i |z_i|$. We can write

$$v_t = \sum_{r=0}^{\infty} \gamma_r y_{t-r}. \tag{10}$$

If the roots of (6) are less than 1 in absolute value, (10) converges in mean square; the argument is similar to that used to show that (28) in Section 5.2 converges in the mean. [See also (34), (35), and (36) of Section 7.5.] In fact, $\gamma_r = \sum_{i=1}^{q} k_i z_i^r$ for suitable constants $k_1, \ldots, k_q$ if the roots are all different. [See (39), (40), and (41) of Section 5.2.] Note that the condition excludes the simple average $\frac{1}{2}(v_t + v_{t-1})$ and the difference $\Delta v_t = v_{t+1} - v_t$.
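The weights $\gamma_r$ of the inverted representation can be generated by matching coefficients in $\left(\sum_j \alpha_j z^j\right)\left(\sum_r \gamma_r z^r\right) = 1$. The sketch below assumes that reading of (9) and (10); the function name `inverse_weights` is hypothetical.

```python
def inverse_weights(alpha, n):
    """First n+1 weights gamma_0, ..., gamma_n with v_t = sum_r gamma_r * y_{t-r}.

    Uses gamma_0 = 1 and gamma_r = -sum_{j=1}^{min(r, q)} alpha[j] * gamma_{r-j},
    obtained by equating coefficients of z^r to zero for r >= 1.
    """
    q = len(alpha) - 1
    gamma = [1.0]
    for r in range(1, n + 1):
        gamma.append(-sum(alpha[j] * gamma[r - j] for j in range(1, min(r, q) + 1)))
    return gamma


decaying = inverse_weights([1.0, 0.5], 6)     # root -1/2: weights (-1/2)^r
not_decaying = inverse_weights([1.0, 1.0], 6) # root -1: weights (-1)^r
```

In the first case the weights shrink geometrically, so a truncated sum approximates $v_t$ well; in the second case, which is proportional to the simple average, the weights alternate between 1 and -1 and the series does not converge.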

5.7.2. Estimation of Parameters

If the $v_t$'s are normally distributed, the $y_t$'s are normally distributed with means 0 and covariances (2). We now consider estimating the $q + 1$ parameters on the basis of $T$ observations, $y_1, \ldots, y_T$. Unfortunately, although the covariance matrix has a simple form, the inverse matrix is not simple. In fact, a minimal sufficient set of statistics has $T$ components, and the maximum likelihood equations are complicated and cannot be solved directly. (See Problems 4 and 5 of Chapter 6.)

The approach followed by A. M. Walker (1961) was to apply the method of maximum likelihood to the approximate normal distribution of some $n$ sample correlations. As is shown in Theorem 5.7.1, $\sqrt{T}(r_1 - \rho_1), \ldots, \sqrt{T}(r_n - \rho_n)$ have a limiting normal distribution with means 0 and covariances $w_{gh} = w_{gh}(\rho_1, \ldots, \rho_q)$, which are functions of the correlations $\rho_h = \sigma(h)/\sigma(0)$, $h = 1, \ldots, q$. The sample correlations are $r_h = c_h/c_0$, where

$$c_h = c_{-h} = \frac{1}{T} \sum_{t=1}^{T-h} y_t y_{t+h}, \qquad h = 0, 1, \ldots, T - 1. \tag{11}$$
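The sample correlations defined by (11) are straightforward to compute. The sketch below follows the definition literally, with divisor $T$ at every lag and no correction for the mean (the process is taken to have mean 0); the function name `sample_correlations` is hypothetical.

```python
def sample_correlations(y, n):
    """Return [r_0, r_1, ..., r_n] with r_h = c_h / c_0 and
    c_h = (1/T) * sum_{t=1}^{T-h} y_t * y_{t+h}.
    """
    T = len(y)
    c = [sum(y[t] * y[t + h] for t in range(T - h)) / T for h in range(n + 1)]
    return [c_h / c[0] for c_h in c]


r = sample_correlations([1.0, -1.0, 1.0, -1.0], 2)
```

For this short alternating series the lagged products sum to -3 and 2, so the correlations at lags 1 and 2 are -0.75 and 0.5.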

Here $n$ ($n \ge q$, $n \le T - 1$) is fixed. Actually, the maximization of the approximate normal distribution of the sample correlations is only to motivate the estimation equations obtained; the properties of the estimating procedure do not depend on this approach. Let $\rho = (\rho_1, \ldots, \rho_q)'$. The logarithm of the approximate likelihood function of $r = (r_1, \ldots, r_n)'$ is (12)

log L = -fn log 277

- 4 log IWI

r=

w-1

where (13)

('(I)),

r(2)

(

= Wll w 2 1

W12) W22


are partitioned into q and n - q rows and columns. Then the vector of partial derivatives is

We set this vector of derivatives equal to 0. Since $\sqrt{T}(r^{(1)} - \rho)$ and $\sqrt{T}\,r^{(2)}$ have a limiting normal distribution, we normalize the resulting equations by dividing by $\sqrt{T}$. Then the first two terms are of order $1/\sqrt{T}$ and converge stochastically to 0. The resulting equations are

$$\hat\rho = r^{(1)} - W_{12}(\hat\rho)\, W_{22}^{-1}(\hat\rho)\, r^{(2)}. \tag{15}$$

(See Problem 8 of Chapter 2.) An estimating procedure is to use $r^{(1)}$ as a consistent estimate of $\rho$, calculate $W_{12}(r^{(1)}) W_{22}^{-1}(r^{(1)})$, and estimate $\rho$ by

$$\hat\rho = r^{(1)} - W_{12}(r^{(1)})\, W_{22}^{-1}(r^{(1)})\, r^{(2)}. \tag{16}$$

Then $\sqrt{T}(\hat\rho - \rho)$ calculated according to (16) has the same limiting distribution as that of $\sqrt{T}(\hat\rho - \rho)$ calculated according to (15), which is normal with mean 0 and covariance matrix

$$W_{11} - W_{12} W_{22}^{-1} W_{21}. \tag{17}$$

One can iterate the above procedure. Let $\hat\rho^{(0)}$ be an initial consistent estimate; then iterate by

$$\hat\rho^{(i)} = r^{(1)} - W_{12}(\hat\rho^{(i-1)})\, W_{22}^{-1}(\hat\rho^{(i-1)})\, r^{(2)}. \tag{18}$$

Walker gives the following example, using a series of 100 observations that Durbin (1959) generated by the model with $q = 1$ and $\alpha_1 = \frac{1}{2}$. The first five sample correlations were 0.35005, -0.06174, -0.08007, -0.14116, and -0.15629. The matrix $W$ is taken from the covariance matrix of the first five


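correlations

$$\begin{pmatrix} 1 - 3\rho^2 + 4\rho^4 & 2\rho(1 - \rho^2) & \rho^2 & 0 & 0 \\ 2\rho(1 - \rho^2) & 1 + 2\rho^2 & 2\rho & \rho^2 & 0 \\ \rho^2 & 2\rho & 1 + 2\rho^2 & 2\rho & \rho^2 \\ 0 & \rho^2 & 2\rho & 1 + 2\rho^2 & 2\rho \\ 0 & 0 & \rho^2 & 2\rho & 1 + 2\rho^2 \end{pmatrix},$$

where $\rho = \rho_1$. For $n = 2$ and $q = 1$ (15) is

$$\hat\rho = r_1 - \frac{2\hat\rho(1 - \hat\rho^2)}{1 + 2\hat\rho^2}\, r_2.$$

We take $\hat\rho^{(0)} = r_1$; then $\hat\rho^{(1)} = 0.38051$ and $\hat\rho^{(2)} = 0.38121$ ($\rho = 0.4$). The estimated standard deviation of the asymptotic distribution is obtained from (17) as $\{[w_{11}(\hat\rho) - w_{12}^2(\hat\rho)/w_{22}(\hat\rho)]/T\}^{1/2}$.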
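The iterates quoted in the example can be reproduced directly. The sketch below is restricted to the case $q = 1$, $n = 2$ and assumes the entries $w_{12} = 2\rho(1 - \rho^2)$ and $w_{22} = 1 + 2\rho^2$ of the matrix above; the function name `walker_ma1` is hypothetical.

```python
def walker_ma1(r1, r2, iterations=2):
    """Iterate rho <- r1 - (w12(rho) / w22(rho)) * r2, starting from rho = r1.

    Returns the list [rho^(0), rho^(1), ..., rho^(iterations)].
    """
    rho = r1
    path = [rho]
    for _ in range(iterations):
        w12 = 2.0 * rho * (1.0 - rho ** 2)   # covariance term between r1 and r2
        w22 = 1.0 + 2.0 * rho ** 2           # variance term of r2
        rho = r1 - (w12 / w22) * r2
        path.append(rho)
    return path


# First two sample correlations of the series of 100 observations.
path = walker_ma1(0.35005, -0.06174)
```

The second and third entries of `path` should agree with the values 0.38051 and 0.38121 quoted above to about five decimal places.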

It will be observed that (17) is a monotonically nonincreasing function of $n$ in the sense that for every $x$ the quadratic form $x'(W_{11} - W_{12} W_{22}^{-1} W_{21})x$ is nonincreasing with $n$. More precisely, $x'[W_{11}^{(n)} - W_{12}^{(n)}(W_{22}^{(n)})^{-1} W_{21}^{(n)}]x$ is the variance of the limiting normal distribution of $\sqrt{T}\,x'(\hat\rho - \rho)$ when $r_1, \ldots, r_n$ are used; this asymptotic variance is monotonically nonincreasing in $n$ for every $x$. This fact suggests choosing $n$ relatively large (as long as $T$ is large enough for the asymptotic normality to hold reasonably well).

The estimates $\hat\rho$ of $\rho$ and $c_0$ of $\sigma(0)$ can be used to calculate estimates of $\alpha_1, \ldots, \alpha_q$, and $\sigma^2$. Then $\sqrt{T}(\hat\alpha_1 - \alpha_1), \ldots, \sqrt{T}(\hat\alpha_q - \alpha_q)$ have a limiting normal distribution.

Another approach was made by Durbin (1959). He suggested that the infinite sum (10) could be approximated by a finite sum

where $n$ is fairly large. (Note $\gamma_0 = 1$.) The sequence $\{v_t^*\}$ is a finite moving average process. [In fact (22) corresponds to (11) of Section 5.2 with suitable changes of notation such as $u_t$'s replaced by $y_t$'s and $y_t$'s replaced by $v_t$'s and with the resulting linear combination of $v_{t-n-1}, \ldots, v_{t-n-q}$ transposed to the left.] $v_t^*$ is $v_t$ plus a linear combination of $v_{t-n-1}, \ldots, v_{t-n-q}$; the coefficients in this linear combination are small when $n$ is large. The coefficients are linear combinations of $\gamma_{n+1}, \ldots, \gamma_{n+q}$ as defined by (9) or (10), and these coefficients are linear combinations of $(n + 1)$st to $(n + q)$th powers of the roots of $M(z) = 0$, which are assumed less than 1 in absolute value. [See (14) or (22) and (26) of Section 5.2.] Hence, the $v_t^*$'s are nearly uncorrelated if $n$ is large, and (22) is roughly a stochastic difference equation. This suggests estimating $\gamma = (\gamma_1, \ldots,$

+

+

The equation is written equivalently - in terms of the correlations r,, = 1, rlr . . . , r,. Theorem 5.7.1 asserts that JT(rl pl), . , . , J?.q and pk — 0. Since

2 α ί α ί + * = (Σ α ψ*·

(38)

the covariance between (34) and (34) with A replaced by A', 0 < A ^ A', is V

(39)

4

\t V+K'

α

σί 2 ' ) Σ

(/>

»-' + p'+h ~ 2ЛЛ)0>г-. + Ρ'+" *

= έ«Π 2 , α< ) =

Л

^*-«

+

/) +Ä -

'

2

Р*Р'Хрк->

2ph p

' ')

+

Ρ

'+Η' ~ 2PkPJ

CT 2 (0)W ÄÄ .

as denned in (28). [If Sv\ < oo, (39) is H m , , ^ 7tf(cA - />Ac0)( + Ф33*'3'] = О,

ар

where дхт'

(21)

j = q + 1, . . . , q + p, s = 1, . . . , p,

эр дхт'

(22)

J = q + p + U · ■ ■ , n, s = \, . . . , p.

эр

То simplify (20) we note that (23)

(24)

dXj

j = q + 1, . . . , q + p, s = 1,. . . , p,

=

dß, дх, 3ß.

= 2

Ζ^

ßtri-s-n

j = q + p + 1, . . . ,n, s = 1, . . . ,p.

Since 7 г ( ^ _ , - Pj_s), j — q + 1,. . . , q + p, s = 1 p, have a limiting distribution, dXjjdß, can be replaced by p,_a in (20) for these values of у and s. Because of (16), Ν / Γ ^ ^ , , ; = q +p + 1 , . . . , n, s = 1 p, have a limiting distribution and dxjdß, can be replaced by 0 in (20) for these values of у and s. Then (20) is asymptotically equivalent to (25)

Ц[фИ(Х _ p) + φ«χ + φ23χ]

=

0)

where

(26)

Po+i

P«+j>-i

P,-i

P«

Ра+р-г

,. Pe-p+l

Pq-v+2

R = 0»,-.) =


We now want to argue that $R$ is nonsingular. If $R$ were singular, there would be a vector of constants $c = (c_1, \ldots, c_p)'$ such that

2 P

-zp,.l

j =q

= 0,

C*Ph

a-1

+

+ 1, . . . ,q -b P.

From (16) we have $\rho_h = -\sum_{s=1}^{p} \beta_s \rho_{h-s}$, $h = q + 1, \ldots$. Application of this to (27) extends the range of validity of (27) to $j = q + 1, \ldots$. This fact would imply that the process would satisfy a model (1) with $\beta_0, \ldots, \beta_p$ replaced by $c_1, \ldots, c_p$ and values of $\alpha_1, \ldots, \alpha_q$ determined by $c_1, \ldots, c_p$ and $\rho_1, \ldots, \rho_q$ in (13). However, this would contradict the fact that $\beta_p \neq 0$ and $p + 1$ constants are needed on the left-hand side of (1). Since $R$ is nonsingular, (19) and (25) can be written

Then

(29)

(See Problem 8 of Chapter 2, for example.) The right-hand side of (29) is the regression of dl) p and xt2)on d3)based on the covariance matrix of the asymptotic distribution. Then

-

(30)

@ = x(')

-

*

4-'x'3' 13 33

consists of the (asymptotic) residuals of the correlations r l , . . . ,re on j =q p 1, . . . , n. The other set of equations in (29) states that the (asymptotic)residuals of pI,r j = q 1, . . . ,q p, on PJtr $, j =q p 1, . . . , n, are 0. These equations then can be written

I;t-o/3J?tr,-s-r,

z;,t=o ,-,-

+ + + +

zfs0 ,-,,

+

+

*=O

where rjwS(xf3)) denotes the residual of from its (asymptotic) regression on d3). The equations are nonlinear because 4 and d3)depend on the unknown parameters. However, (30) and (31) can be solved iteratively by computing 4


and xis) on the basis of initial estimates of p and the Ft’s and solving (30) and (31) for revised estimates. Initial consistent estimates can be found from 67 = ri, i = 1,. . . ,q, and = 0, j = q 1 , . . . ,q + p . The estimates defined by (30) and (31) are consistent and asymptotically normally distributed. In fact

~~=,&,-,

(32)

JT

+

( 6 - p) = \/T ( x ( ~-) p) - *l3*iiJT

x(~)

has a limiting normal distribution with mean 0 covariance matrix

*13*;i431.The equations (31) are

all-

(33) 1-1

-

JT

since lo= Po = 1. The set [ r i ( d 3 )) p i ] , i = 1 , . . . ,q limiting normal distribution. Then (33) is equivalent to

+ p,

has a

(34) j=q+

1,

. . . ,q + p .

By virtue of (16) the right-hand side of (34) can be written as (35) 1=0

which have a limiting normal distribution with means 0 and covariance matrix Since [ r , - , ( ~ ( ~ converges ))] stochastically to R’,(34) shows 4,, that JT(B - p), where p = (fll,. . . ,b,)‘ and p = (&, . . . ,p,,)’, has the limiting distribution of

(R’)-’JT ( x ‘ ~ ’ *23*iiX‘3’) which is normal with mean 0 and covariance matrix

(36)

(37)

(R’)-’(*Zz

- *23*ii*32)R-’.

Then JT(6 - p) and JT(p - p) have a limitingjoint normaldistribution and the covariance between the two vectors in this limiting distribution is

(38)

(*lZ

- *13*~~*32)R-1-

Durbin’s procedure for the moving average model can be modified [Durbin (1960b)l. Suppose one starts with some initial estimates of PI, . . . ,Pp, say I:’,. . . ,&’I (as suggested above for iteration); let 1.4:) = &’)yt-a.


Then use Durbin’s method described in Section 5.7.2 to estimate the coefficients in the moving average model

(39) i=O

where the u y ) ’ s are taken as the observed variables. Call the resulting estimates

. . . , a^ ( Op) .

& ,):

Let { x ~ be } a stochastic process such that

2 P

(40)

Bsx1-s =

017

t =

..., - l , O , l , . . . .

S=O

Then the stochastic process defined by y:

(41)

= xl

+ alxl-l + . - + aox1--9, *

t

=

. . .,-I,O, I , . .. ,

satisfies (I) for t = . . . , - 1 , 0 , 1, . . . . If { x t ) is given for t = . . . , - 1,0, and {Y:} is given for t = I , 2, . . . , then ( 2 , ) could be continued by solving (41) recursively for x l , x 2 , . . . . This suggests using d y ) , . . . , d r ) and arbitrary x r ’ , . . . ,x!:.,, to define

.I”’

(42)

= yr - ( & y x ; : ;

+ . . . + aZ“’x:!;),

t =

1,

. . . , T.

Then the fir’s are estimated on the basis of ( 5 : ) ) satisfying (41). This leads to estimates . . . ,&). Durbin proposes to alternate back and forth between these procedures, calculating t i p ) , , . . , &,): I:),. . . , etc. It is hoped that for large T (and large n ) the iteration converges.

&”,

JF),

5.9. SOME EXAMPLES

Wold (1965) has generated artificial time series by using second-order difference equations with $u_t$'s taken as independent random normal deviates. Three of these series are tabulated in Appendix A.2 and the three series are graphed. The coefficients were $\beta_1 = -\gamma$, $\beta_2 = \gamma^2$ and the roots of the associated polynomial equations were $\gamma e^{\pm i 2\pi/6}$ with $\gamma = 0.25$, 0.7, and 0.9, respectively. In each case the variance of $u_t$ was taken so the variance of $y_t$ was 1. There can be observed in the graphs a tendency towards oscillation with periods of length roughly 6; the period is more pronounced with the larger amplitude $\gamma$. The first three correlations ($r_h = C_h/C_0$) of the three tabulated series for $T = 200$ and the corresponding population correlations [$\rho_h = \sigma(h)/\sigma(0)$] are given in Table 5.1.
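The population correlations of these second-order processes follow from the difference equation satisfied by the correlations. The sketch below assumes the convention $y_t + \beta_1 y_{t-1} + \beta_2 y_{t-2} = u_t$ with $\beta_1 = -\gamma$ and $\beta_2 = \gamma^2$, so that $\rho_1 = -\beta_1/(1 + \beta_2)$ and $\rho_h = -\beta_1 \rho_{h-1} - \beta_2 \rho_{h-2}$ for $h \ge 2$; the function name `ar2_correlations` is hypothetical.

```python
def ar2_correlations(gamma, n):
    """Population correlations rho_0, ..., rho_n of the second-order process
    with beta_1 = -gamma and beta_2 = gamma**2.
    """
    beta1, beta2 = -gamma, gamma ** 2
    rho = [1.0, -beta1 / (1.0 + beta2)]
    for h in range(2, n + 1):
        rho.append(-beta1 * rho[h - 1] - beta2 * rho[h - 2])
    return rho[: n + 1]


population = {g: ar2_correlations(g, 3) for g in (0.25, 0.70, 0.90)}
```

For $\gamma = 0.25$ this gives approximately 0.23529, -0.00368, and -0.015625 at lags 1, 2, and 3, which can be compared with the population columns of Table 5.1.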


TABLE 5.1
SAMPLE AND POPULATION CORRELATION COEFFICIENTS FOR THREE AUTOREGRESSIVE PROCESSES (T = 200)

  γ       r_1       r_2       r_3       ρ_1        ρ_2        ρ_3
 0.25    0.2473    0.1120    0.0492    0.23529   -0.00368   -0.015625
 0.70    0.5113   -0.0473   -0.3001    0.46980   -0.16114   -0.34300
 0.90    0.5011   -0.3858   -0.7726    0.49724   -0.36249   -0.72900

Estimates of $\beta_1$ and $\beta_2$ under the assumption that $p = 2$ and estimates of $\beta_1$, $\beta_2$, and $\beta_3$ under the assumption that $p = 3$ were obtained from the equations

respectively. The results are given in Table 5.2. The test of the null hypothesis $\beta_3 = 0$ can be made on the basis of $\sqrt{T}\,b_3$; the three values are -0.1458, 1.208, and -0.7706. Under the null hypothesis the values of $\sqrt{T}\,b_3$ should be approximately distributed according to $N(0, 1)$. In none of the three cases would the null hypothesis be rejected at any reasonable level of significance such as 1%, 5%, or 10% for two-sided tests.

TABLE 5.2
ESTIMATES OF COEFFICIENTS OF AUTOREGRESSIVE PROCESSES

                                  p = 2                      p = 3
  γ       β_1      β_2        b_1       b_2        b_1       b_2       b_3
 0.25    -0.25    0.0625    -0.2339   -0.0542    -0.2334   -0.0518   -0.0103
 0.70    -0.70    0.49      -0.7250    0.4180    -0.6893    0.3561    0.0854
 0.90    -0.90    0.81      -0.9273    0.8505    -0.9736    0.9010   -0.0545

Yule (1927) proposed the autoregressive process as a statistical model in preference to the model of an error superimposed on trigonometric trends such as considered in Chapter 4. He applied it to Wolfer's sunspot numbers, which are annual measures of sunspot activity, from 1749 to 1924. [The data have been expanded and given in more detail by Waldmeier (1961).] The figures used by Yule are tabulated in Table A.3.1 of Appendix A.3. Similar data analyzed by Craddock (1967) are graphed below. Roughly, the series looks somewhat similar to the artificial series of Wold. The first five correlations as given by Yule are recorded in Table 5.3.

TABLE 5.3
CORRELATIONS FOR SUNSPOT NUMBERS

 Lag    Correlation
  1       0.811180
  2       0.433998
  3       0.031574
  4      -0.264463
  5      -0.404119

For a second-order process Yule obtains the estimates $b_1 = -1.34254$, $b_2 = 0.65504$, $\hat\gamma = -13.854$, $\hat\sigma = 15.41$. The roots of the associated polynomial equation are $0.67127 \pm 0.45215i = 0.80935\,e^{\pm i\,33.963^\circ}$. The tendency of the fitted model is to have a period of $360^\circ/33.963^\circ = 10.600$ (years). This is rather less than the usually accepted period of a little over 11 years. [Schuster (1906) arrived at 11.125 years.] From the data in Table 5.3 Yule worked out the estimates of $\beta_p$ on the basis of a $p$th-order process (the partial correlation coefficient) for $p = 1, 2, 3, 4, 5$. These are given in Table 5.4. It will be seen that the values of $\sqrt{T}\,b_p(p)$, which is approximately normally distributed if $\beta_p = \beta_{p+1} = \cdots = 0$, are relatively small numerically for $p = 3, 4, 5$.

TABLE 5.4
LAST COEFFICIENTS

  p      b_p(p)       √T b_p(p)
  1     -0.811180     -10.761
  2      0.655040       8.690
  3      0.101043       1.340
  4     -0.013531      -0.180
  5      0.050001       0.663

Another study of the sunspot data was made by Craddock (1967) on the basis of annual mean numbers from 1700 to 1965. A graph of the record is given in Figure 5.1.

Figure 5.1. Sunspot activity over the years 1700-1965.

The correlations are given in Figure 5.2. Craddock fitted alternative autoregressive models for values of $p$ from 1 to 30. In Figure 5.3 is plotted the ratio $\hat\sigma_p^2/\hat\sigma_0^2$ (in terms of percentages), where $\hat\sigma_p^2$ is the estimate of variance based on a $p$th-order model (with a constant included). The difference $T(\hat\sigma_{p-1}^2 - \hat\sigma_p^2)$ is proportional to the squared (normalized) estimate of $\beta_p$, which is used in testing the hypothesis $\beta_p = 0$. [See (16) of Section 5.6.] In fact the $t^2$-statistic is

Figure 5.2. Correlation coefficients for sunspot numbers by Craddock.

Figure 5.3. Residual variation in autoregressive models.

The data indicate clearly that $\beta_1$ and $\beta_2$ should be taken different from 0. The set $\beta_3, \ldots, \beta_9$ are doubtfully 0, $\beta_{10}, \ldots, \beta_{18}$ are more likely to be taken 0, and $\beta_{19}, \ldots, \beta_{30}$ can be concluded to be 0. In formal terms of the multiple decision procedure described at the end of Section 5.6.3 we take $p = 30$ and $m = 0$. The coefficients $\sqrt{T}\hat\beta_j$, $j = 9, \ldots, 30$, are roughly 1.2, except for $\sqrt{T}\hat\beta_{18}$ of about 2.2 and $\sqrt{T}\hat\beta_9$ of about 3.8. If the sequence of $\varepsilon_i$, $i = 9, \ldots, 30$, is such that $0.0002 < \varepsilon_i < 0.02$, then the order would be decided as 9; if the sequence of $\varepsilon_i$, $i = 18, \ldots, 30$, is such that $0.03 < \varepsilon_i < 0.1$, the order would be decided as 18. ($\varepsilon_1 = \cdots = \varepsilon_{30} = 0.0035$ corresponds to $p_0 = 0.9$.)

Whittle (1954) has studied the sunspot phenomenon on the basis of semiannual data from 1886 to 1945 and found an autoregressive model using lags of 1 and 22 (that is, 6 months and 11 years). Many other statistical investigations have been made. Schaerf (1964) has also studied Wolfer's series as well as Whittle's. She fits an autoregressive model with lags 1, 2, and 9.

As another example, consider the Beveridge trend-free wheat price index tabulated in Appendix A.1 together with the correlations. The estimated coefficients of a $p$th-order autoregressive process are given in Table 5.5 for $p = 2, 3, \ldots, 8$. The penultimate column gives $\sqrt{T}\,b_p(p)$, which can be used to test the null hypothesis $\beta_p = 0$ against the alternative $\beta_p \neq 0$ under the assumption that the order is not greater than $p$. It will be observed that the null hypothesis $\beta_2 = 0$ is rejected (at any reasonable significance level); the null hypothesis $\beta_p = 0$ is accepted for $p = 3$, 4, or 5 (at any reasonable significance level); and the null hypothesis $\beta_p = 0$ is rejected for $p = 6$, 7, or 8 at the 2.5% significance level and accepted for $p = 6$ or 7 at the 1% level. If one assumes


TABLE 5.5
ESTIMATED COEFFICIENTS OF AUTOREGRESSIVE PROCESSES FOR THE BEVERIDGE TREND-FREE WHEAT PRICE INDEX

 p   b_1(p)    b_2(p)   b_3(p)    b_4(p)   b_5(p)    b_6(p)   b_7(p)   b_8(p)   √T b_p(p)   σ̂²
 2  -0.7368   0.3110                                                              5.984    0.6179
 3  -0.7489   0.3391   -0.0388                                                   -0.746    0.6170
 4  -0.7503   0.3521   -0.0662   0.0367                                           0.706    0.6162
 5  -0.7488   0.3494   -0.0516   0.0055   -0.0415                                -0.798    0.6151
 6  -0.7435   0.3501   -0.0582   0.0504   -0.0546   0.1284                        2.470    0.6050
 7  -0.7285   0.3431   -0.0524   0.0436   -0.0138   0.0411   0.1165               2.241    0.5968
 8  -0.7123   0.3495   -0.0543   0.0496   -0.0211   0.0895   0.0153   0.1390      2.674    0.5853

that the order is not greater than $p$ for $p = 6$, 7, or 8 and uses the multiple-decision procedure suggested earlier one would decide on the value of $p$ if $\varepsilon_p$ is at least 0.025. On the other hand, if one used as a multiple-decision procedure testing hypotheses $\beta_j = 0$ in sequence, $j = 2, 3, \ldots$, until one was accepted and then deciding on the order of the last rejected hypothesis, he would settle on the second-order process [as did Sargan (1953)]. However, Table 5.5 suggests that an order greater than 8 might be suitable.

5.10. DISCUSSION

In this chapter we have discussed statistical models for stationary processes that depend on a finite number of parameters in a form that may be said to have linear structure: linear in the coefficients, in the observed variables, and in the hypothetical random variables. In some cases the model corresponds to a theory of the generation of the data and the coefficients have substantive meaning; in other cases the model is an approximation that is adequate for many purposes. The autoregressive model has several advantages over the moving average model and over the autoregressive process with moving average residuals, though in certain instances the latter models may give a good description of the formation of the observed time series. The estimates of the coefficients of the autoregressive process are readily calculated, and inference based on large samples is easy to carry out because it corresponds to ordinary least squares. In many cases the coefficients have a direct interpretation, and the linear functions of lagged variables can be used for prediction. As shall be seen later, any of these models can be used for approximating and estimating the spectral density.


REFERENCES

Section 5.1. Gilbert Walker (1931), Yule (1927).
Section 5.3. Halmos (1958), Turnbull and Aitken (1952).
Section 5.4. Durbin (1960a), (1960b), Mann and Wald (1943b).
Section 5.5. T. W. Anderson (1959), T. W. Anderson and Rubin (1950), Koopmans, Rubin, and Leipnik (1950), Loève (1963).
Section 5.6. Bartlett and Diananda (1950), Quenouille (1947), A. M. Walker (1952).
Section 5.7. Durbin (1959), A. M. Walker (1961).
Section 5.8. Doob (1944), Durbin (1960b), A. M. Walker (1962).
Section 5.9. Craddock (1967), Sargan (1953), Schaerf (1964), Schuster (1906), Waldmeier (1961), Whittle (1954), Wold (1965), Yule (1927).
Problems. Haavelmo (1947), Kuznets (1954).

PROBLEMS

1. (Sec. 5.1.) Find the coefficients $a_1, \ldots, a_p$ in (2) written as

$$(\Delta^p + a_1 \Delta^{p-1} + \cdots + a_p)\, y_{t-p} = u_t.$$

2. (Sec. 5.2.1.) Carry through in detail (30) in the case of $q = 1$, $p = 2$. Express $u_t^*$ defined by (31) in terms of $u_t$ for $r = 1$. Verify that the $u_t^*$ are uncorrelated.

3. (Sec. 5.2.1.) If $\{y_t\}$ is a stationary process and satisfies

$$y_t + y_{t-1} = u_t,$$

with $\mathscr{E} u_t = 0$, $\mathscr{E} u_t^2 = \sigma^2$, and $\mathscr{E} u_t u_s = 0$, $t \neq s$, prove $y_t = (-1)^s y_{t-s}$ with probability 1.

4. (Sec. 5.2.1.) Prove that a linear difference equation of order $p$

$$\beta_0(t) z_t + \beta_1(t) z_{t-1} + \cdots + \beta_p(t) z_{t-p} = \alpha(t),$$

$\beta_0(t) \neq 0$, $\beta_p(t) \neq 0$, over a set $S$ of consecutive integer values of $t$ has one and only one solution $z_t$ for which values at $p$ consecutive $t$-values are arbitrarily prescribed. [Hint: Prove by induction on $t$, showing that the values of a solution $z_t$ at any $p$ consecutive integers determine its value at the next integer.]

5. (Sec. 5.2.1.) Show that the $t$th power of any root $z_i$ of the polynomial equation (23) associated with

$$\beta_0 w_t + \beta_1 w_{t-1} + \cdots + \beta_p w_{t-p} = 0$$

is a solution to the difference equation ($\beta_0 \neq 0$, $\beta_p \neq 0$). [Hint: Substitute $w_t = z_i^t$.]


6. (Sec. 5.2.1.) Show that

$$(\mathcal{P} - a) P_q(t) a^t = P_{q-1}(t) a^{t+1},$$

where $P_q(t)$ and $P_{q-1}(t)$ are polynomials of degree $q$ and $q - 1$, respectively.

7. (Sec. 5.2.1.) Show that

$$(\mathcal{P} - a)^m P_{m-1}(t) a^t = 0.$$

8. (Sec. 5.2.1.) Show that $c_0 a^t + c_1 t a^{t-1} + \cdots + c_{m-1} t^{m-1} a^{t-m+1}$ can be written $P_{m-1}(t) a^t$ (for $a \neq 0$).

9. (Sec. 5.2.1.) Prove that if $a$ is a real root of multiplicity $m$ of the polynomial equation (23) associated with the homogeneous difference equation of Problem 5, then $w_t = P_{m-1}(t) a^t$ is a solution to the difference equation.

10. (Sec. 5.2.1.) Prove that if $\alpha e^{i\theta}$ and $\alpha e^{-i\theta}$ are a pair of conjugate complex roots of multiplicity $m$ of the polynomial equation (23) associated with the homogeneous difference equation of Problem 5, then

$$w_t = P_{m-1}(t) \alpha^t e^{it\theta} + \bar{P}_{m-1}(t) \alpha^t e^{-it\theta}$$

is a real solution to the difference equation, where $P_{m-1}(t)$ and $\bar{P}_{m-1}(t)$ are polynomials whose corresponding coefficients are conjugate complex numbers.

11. (Sec. 5.2.1.) Prove that if the roots of (23) are distinct, every solution of the equation of Problem 5 is $w_t = \sum_{i=1}^{p} k_i x_i^t$; $w_t$ real implies $k_i$ is real if $x_i$ is real and $k_i$ and $k_{i+1}$ are conjugate complex if $x_i$ and $x_{i+1}$ are conjugate complex. [Hint: Use Problem 5 to show the above is a solution and use Problem 4 to show it is unique for specified values of $w_t$ for $p$ consecutive values of $t$.]

12. (Sec. 5.2.1.) In a second-order difference equation model is $\beta_1 = 0$ a necessary and sufficient condition that $y_t$ and $y_{t-1}$ are independent?

13. (Sec. 5.2.2.) Verify from (51) that

$$\sigma(0) = \frac{1 + \beta_2}{1 - \beta_2} \cdot \frac{\sigma^2}{(1 + \beta_2)^2 - \beta_1^2}.$$

14. (Sec. 5.2.2.) Verify the results of Problem 13 from (42) and

15. (Sec. 5.2.2.) Graph $\delta_r$ and $\sigma(r)$ for $\sigma^2 = 1$ and (a) $\beta_1 = 0.9$, (b) $\beta_1 = -0.9$, (c) $\beta_1 = 0.1$, and (d) $\beta_1 = -0.1$.


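16. (Sec. 5.2.2.) Graph $\delta_r$ and $\sigma(r)$ for $\sigma^2 = 1$ and (a) $\beta_1 = 0.25$, $\beta_2 = 0.0625$, (b) $\beta_1 = 0.7$, $\beta_2 = 0.49$, and (c) $\beta_1 = 0.9$, $\beta_2 = 0.81$.

17. (Sec. 5.3.) Find $A^t$ when

$$A = \begin{bmatrix}
1 & 1 & 0 & \cdots & 0 & 0 \\
0 & 1 & 1 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}.$$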

18. (Sec. 5.3.) Prove that the matrix of Problem 17 has the characteristic root 1 of multiplicity equal to the order of the matrix and that there is only one linearly independent characteristic vector.

19. (Sec. 5.3.) Prove that the one and only characteristic vector of $-\tilde{B}$ (except for multiplication by a constant) corresponding to a root $x_i$ of any multiplicity is $(x_i^{p-1}, x_i^{p-2}, \ldots, 1)'$. Hence, prove that the number of linearly independent characteristic vectors of $-\tilde{B}$ is equal to the number of distinct roots. Argue from this that if the roots are not all distinct then $-\tilde{B}$ cannot be written as $C \Lambda C^{-1}$ with $\Lambda$ diagonal.

20. (Sec. 5.3.) Verify that (28) of Section 5.2 is identical to (22) of Section 5.3.

21. (Sec. 5.4.) Find conditions under which

-a

-a

is an unbiased estimate of -& when p = 1 (and q = 0). 22. (Sec. 5.4.) The estimates in (b) of Problem 18 in Chapter 3 are fo = 72.31, f1 = 2.206, and 9, = -0.1109. (a) Find the (maximum likelihood) estimates of the coefficients PI and p2 and the variance uz in the model i t

+ PlG1-1 + P Z V t - 2

where u t has mean 0 and variance Gt

= .!/t

=

I'tr

u2,

- Yo&T(Z)

- YidlT(Z),

and the 1897 price is Y-.~, the 1898 price is yo, etc. (b) Find estimates of the roots zland x 2 of the associated polynomial equation (23) of Section 5.2, estimates of coefficients 6, in (42) or (44) of Section 5.2, and estimates of u(r) in (51) or (52) of Section 5.2.


and ,!Iz in (a) are approximately the maximum

(c) Show that the estimates of likelihood estimates in

Y:

+ PlY1-I + Bey:-, + + a,? = u:. a0

23. (Sec. 5.4.) (a) Let estimates of

PI, B2 be defined by

Show that

(b) Let the estimates of

PI, p2, B3 be defined by

Show that

24. (Sec. 5.5.1.) Write out (3) in detail when $y_t = \tilde{y}_t$, and verify that (3) defines the same estimates as (8) of Section 5.4 with $z_t$ and $\gamma$ deleted.

25. (Sec. 5.5.1.) Show that (3) and (4) define the maximum likelihood estimates of $B$ and $\Sigma$ defined by (1) and the fact that $u_1, \ldots, u_T$ are independently distributed, each according to $N(0, \Sigma)$, and $y_0$ is a given vector.

26. (Sec. 5.5.2.) Show that the conditions that $A$ is positive semidefinite and $\operatorname{tr} A = 0$ imply that $A = 0$.

27. (Sec. 5.5.3.) Prove that (31) can be solved uniquely for $A$. [Hint: Transform (31) to

$$A^* - \Lambda A^* \Lambda' = \Sigma^*,$$

where $A^* = C^{-1} A (C')^{-1}$ and $\Sigma^* = C^{-1} \Sigma (C')^{-1}$.]

28. (Sec. 5.5.3.) Show that if $A = B + C$, where $B$ is positive definite and $C$ is positive semidefinite, then $A$ is positive definite. [Hint: $x'Ax = x'Bx + x'Cx$ and $A$ is positive definite if and only if $x'Ax \geq 0$ and $x'Ax = 0$ implies $x = 0$.]

29. (Sec. 5.5.4.) Verify that if the random matrix $W$ has covariance matrix $A \otimes \Sigma$, then $A^{-1}W$ has covariance matrix $A^{-1} \otimes \Sigma$.

30. (Sec. 5.5.5.) Show that (60) and (61) define the maximum likelihood estimates of $B$, $\Gamma$, and $\Sigma$ defined by (57) and the facts that $u_1, \ldots, u_T$ are independently distributed, each according to $N(0, \Sigma)$, and $y_0$ is a given vector.

31. (Sec. 5.5.5.) Evaluate (67) for p = 2 and B = s.


32. (Sec. 5.5.5.) Show that the sum of expected values of the squares of the components of (70) converges to 0. 33. (Sec. 5.6.3.) Verify that (50) has mean 0 and variance 6 and that the covariance between (50)and (50)with j replaced by j’ ( # j ) is 0. 34. (Sec. 5.7.1.) Forq = 1 show that lu(l)l S iu(0). 35. (Sec. 5.7.1.) Prove that any root of (7)of absolute value 1 is of even multiplicity. [Hinr: For z = e“

2

.@b*

=

2

u(g) cos ag =f(1),

say, is a spectral density and hence is nonnegative. (See Section 7.3.) Show that if 1 = Y is a root off(A), it is of even multiplicity by showing that the highest order of the derivative off(& vanishing at 1 = Y must be odd.] 36. (Sec. 5.7.1.) Prove that (10)converges in the mean. 37. (Sec. 5.7.1.) For q = 1 show that 1 E , 1 = u8,(1 - a2(*+l))(l 1 - cx?)-’, where E, = [u(t s)] is of order T. [Hinr: See Section 6.7.7.1 38. (Sec. 5.7.1.) For q = 1 show

-

- a*min(b)) 1 , 04(1 - at) where (ug) is the inverse of the Tth-order matrix [u(t - s)] = E , . lim u$ =

( -al)lt-al(l

39. (Sec. 5.7.3.) For q = 1 and p1 = p show that for positive indices wll = 1 - 3p3 W12

= 2p

+ 4p4,

- 2p8,

Wjj

=1

+ 2p*,

wi.j+l = 2 ~ 9

wj,,+*= p*, j = 1,2,. . . ,

j = 2,3,.. . , j

.. ,

= 2,3,.

wij = 0 ,

li - , j i

> 2.

40. (Sec. 5.9.) The following table gives Deflated Aggregate Disposable Income for the United States, 1919-1941, in billions of dollars:

Year   Income     Year   Income     Year   Income
1919   44.32      1927   59.26      1935   57.47
1920   45.45      1928   61.58      1936   65.87
1921   44.08      1929   65.04      1937   67.39
1922   47.67      1930   59.18      1938   62.34
1923   54.10      1931   54.91      1939   68.09
1924   54.65      1932   46.72      1940   72.71
1925   56.28      1933   48.12      1941   84.29
1926   58.00      1934   53.25

The data were obtained from Haavelmo (1947) for 1922-1941 and from Kuznets [(1954), pp. 151-153] for 1919, 1920, and 1921.

(a) Graph income against year.


(b) Find the (maximum likelihood) estimates of the coefficients $\beta_1$, $\beta_2$, $\gamma_1$, $\gamma_2$ and the standard deviation $\sigma$ in the model

$$y_t + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \gamma_1 + \gamma_2 t = u_t, \qquad t = 1, \ldots, 21,$$

where $u_t$ has mean 0 and variance $\sigma^2$, $y_t$ is income for year $t + 1920$, $t = -1, 0, 1, \ldots, 21$, and $y_{-1}$ and $y_0$ are treated as "given." The matrix of sums of squares and cross-products in the order $1$, $t$, $y_{t-1}$, $y_{t-2}$, $y_t$ is

$$\begin{bmatrix}
21 & 231 & 1{,}202 & 1{,}174 & 1{,}241 \\
231 & 3{,}311 & 13{,}905 & 13{,}551 & 14{,}473 \\
1{,}202 & 13{,}905 & 70{,}112 & 68{,}247 & 72{,}365 \\
1{,}174 & 13{,}551 & 68{,}247 & 66{,}781 & 70{,}313 \\
1{,}241 & 14{,}473 & 72{,}365 & 70{,}313 & 75{,}151
\end{bmatrix}.$$

(c) Find the (maximum likelihood) estimates of $\mu$ and $\delta$ (the coefficients of the trend) when the model is written

$$y_t - (\mu + \delta t) + \beta_1 \{ y_{t-1} - [\mu + \delta(t - 1)] \} + \beta_2 \{ y_{t-2} - [\mu + \delta(t - 2)] \} = u_t.$$

(d) Find the roots of the polynomial equation associated with the estimated coefficients of the second-order stochastic difference equation, and give the angle and modulus of these (complex) roots.

The Statistical Analysis of Time Series by T. W. Anderson Copyright © 1971 John Wiley & Sons, Inc.

CHAPTER 6

Serial Correlation

6.1. INTRODUCTION

One of the first problems of time series analysis is to decide whether an observed time series is from a process of independent random variables. A simple alternative to independence is a process in which successive observations are correlated. For example, the process may be generated by a first-order stochastic difference equation. In such cases it may be appropriate to test the null hypothesis of independence by some kind of first-order serial correlation. Given a series of observations $y_1, \ldots, y_T$ from a process with means known to be 0, one definition of a first-order serial correlation coefficient is

$$r_1 = \frac{\sum_{t=2}^{T} y_t y_{t-1}}{\sum_{t=1}^{T} y_t^2}. \tag{1}$$
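As a minimal sketch of definition (1), assuming the series is held in a NumPy array and its mean is known to be 0, the coefficient can be computed directly. The function name and the sample series are illustrative only and do not come from the text.

```python
import numpy as np


def first_order_serial_correlation(y):
    """Ratio of sum_{t=2}^T y_t*y_{t-1} to sum_{t=1}^T y_t^2 (means assumed 0)."""
    y = np.asarray(y, dtype=float)
    numerator = np.dot(y[1:], y[:-1])   # lagged cross-products, t = 2, ..., T
    denominator = np.dot(y, y)          # sum of squares, t = 1, ..., T
    return float(numerator / denominator)


# Hypothetical zero-mean series, used only to exercise the function.
y = [1.0, 2.0, 1.0, -1.0, -2.0, -1.0]
r1 = first_order_serial_correlation(y)
print(r1)
```

For this series the lagged products sum to 7 and the squares sum to 12, so the coefficient is 7/12; a strictly alternating series such as 1, -1, 1, -1 gives a negative value instead.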

In this chapter we shall consider various definitions of serial correlation. When the means are known to be 0, these definitions are modifications of (1); when the means are not known, $y_t$ in (1) is replaced by the deviation from the sample mean (or from a fitted trend). Many modifications consist in changes in the so-called end terms, that is, changes in terms involving $y_1$, $y_2$, $y_{T-1}$, $y_T$; other modifications may involve multiplication by a factor such as $T/(T - 1)$.

A serial correlation is a measurement of serial dependence in a sequence of observations that is similar to the measurement of dependence between two sets of observations $x_1, \ldots, x_N$ and $z_1, \ldots, z_N$ as furnished by the usual product-moment correlation coefficient (when the means are assumed 0)

$$r = \frac{\sum_{i=1}^{N} x_i z_i}{\sqrt{\sum_{i=1}^{N} x_i^2 \sum_{i=1}^{N} z_i^2}}. \tag{2}$$

In the case of the serial correlation $x_i$ is replaced by $y_t$ and $z_i$ is replaced by $y_{t-1}$ in the numerator. Since both $\sum_{t=2}^{T} y_t^2$ and $\sum_{t=2}^{T} y_{t-1}^2 = \sum_{t=1}^{T-1} y_t^2$ estimate $(T - 1)\sigma^2$ in the case of a stationary process (with mean 0), these may be replaced by $\sum_{t=1}^{T} y_t^2$ [possibly multiplied by $(T - 1)/T$]. This gives (1) as an analogue of (2). An alternative term for serial correlation is autocorrelation.

Higher-order serial correlations are defined similarly. One definition of a $j$th-order serial correlation is

$$r_j = \frac{\sum_{t=j+1}^{T} y_t y_{t-j}}{\sum_{t=1}^{T} y_t^2}; \tag{3}$$

this measures the serial dependence between observations $j$ time units apart. Such a correlation coefficient might be used to test the null hypothesis of independence against the alternative that there is dependence between observations $j$ time units apart. In most situations, however, if there is dependence between observations $i$ time units apart, it would be assumed that there is dependence between observations that are $1, 2, \ldots,$ and $i - 1$ time units apart. The $i$th-order serial correlation would customarily be used to test the null hypothesis of $(i - 1)$st-order dependence against alternatives of $i$th-order dependence. For example, a process generated by an $i$th-order difference equation would be said to exhibit $i$th-order dependence. The $i$th-order serial correlation coefficient would be used in conjunction with the serial correlation coefficients of lower order.

A more general problem is the determination of the order of dependence when it is assumed that the order is at least $m$ ($\geq 0$) and not greater than $p$ ($> m$). Such a multiple-decision procedure will be based on the serial correlation coefficients up to order $p$.

Tests of independence, tests of order of dependence, and determination of dependence were considered in Chapter 5 within the context of the stochastic difference equation model (roughly, the stationary Gaussian process of finite order). The properties of procedures and the distributions of estimates and test criteria were developed on a large-sample basis. In this chapter optimal procedures and exact distributions are derived for models which are moderate modifications of the stochastic difference equation model. (For instance, the estimate $-\hat{\beta}_1$ in the first-order difference equation is a type of first-order serial correlation coefficient.)

The general types of models will be developed from the stochastic difference equation model (Section 6.2). For these exponential models optimal tests of dependence (one-sided or unbiased) will be derived (Section 6.3). Multiple-decision procedures will be developed from these tests (Section 6.4) in a fashion similar to the development of procedures for choosing the degree of polynomial trend (in Section 3.2.2). Some explicit models will be defined for known means (Section 6.5) and for unknown means, including trends (Section 6.6); these models imply definitions of serial correlation coefficients, and the treatment includes the circular serial correlation coefficient and the coefficient based on the mean square successive difference. Some exact distributions of serial correlations are obtained (Section 6.7), as well as approximate distributions (Section 6.8). Some joint distributions are also found (Section 6.9). Serial correlations as estimates will be considered later (Chapter 8).

6.2. TYPES OF MODELS

In Chapter 5 we studied time series that satisfied a stochastic difference equation of order $p$,

$$y_t + \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} = u_t, \tag{1}$$

where $\{u_t\}$ is a sequence of independent random variables with means 0 and variances $\sigma^2$. If the distribution of $y_1, \ldots, y_p$ is given and $u_t$ is independent of $y_{t-1}, y_{t-2}, \ldots, y_1$, $t = p + 1, \ldots, T$, then the distributions of $u_{p+1}, \ldots, u_T$ determine the joint distribution of $y_1, \ldots, y_T$, which are the observed variables. If the $u_t$'s are identically distributed, the joint distribution of $y_1, \ldots, y_p$ can be taken so that the $y_t$'s form part of a stationary process when the roots of the associated polynomial equation are less than 1 in absolute value. [It was shown in Section 5.2.1 that a necessary and sufficient condition that a stochastic process $\ldots, y_{-1}, y_0, y_1, \ldots$ satisfying a stochastic difference equation (1) be stationary such that $y_t$ is independent of $u_{t+1}, u_{t+2}, \ldots$ is that the roots of the associated polynomial equation be less than 1 in absolute value.] If $\sigma^2$ is the only unknown parameter in the distribution of $u_t$, then $\beta_1, \ldots, \beta_p$ and $\sigma^2$ constitute the parameters of the process.
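The root condition can be checked numerically. The sketch below assumes the associated polynomial equation has the form $x^p + \beta_1 x^{p-1} + \cdots + \beta_p = 0$ (the form obtained by substituting $w_t = x^t$ into the homogeneous equation) and uses `numpy.roots`; the function name and the example coefficients are illustrative assumptions.

```python
import numpy as np


def roots_inside_unit_circle(betas):
    """Return (flag, roots) for x^p + beta_1 x^(p-1) + ... + beta_p = 0.

    flag is True when every root has absolute value less than 1.
    """
    coeffs = np.concatenate(([1.0], np.asarray(betas, dtype=float)))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) < 1.0)), roots


# Hypothetical coefficient sets, chosen only for illustration.
ok_first_order, _ = roots_inside_unit_circle([-0.5])          # root 0.5
ok_second_order, r2 = roots_inside_unit_circle([0.9, 0.81])   # complex pair, modulus 0.9
bad_second_order, _ = roots_inside_unit_circle([0.0, -1.21])  # roots +1.1 and -1.1
print(ok_first_order, ok_second_order, bad_second_order)
```

Roots exactly on the unit circle are sensitive to rounding, so a practical check would probably use a small tolerance rather than a strict comparison.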


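We shall now consider time series with joint normal distributions and with means 0; cases of nonzero means will be taken up later in this chapter. Models with normal distributions are called Gaussian. Here the joint distribution of $y_1, \ldots, y_p$ is determined by its covariance matrix, $\Sigma_{pp}$, say. Let

$$\Sigma_{pp}^{-1} = (a_{ts}). \tag{2}$$

Then the joint density of $y_1, \ldots, y_p$ is

$$(2\pi)^{-p/2} \, |\Sigma_{pp}|^{-1/2} \exp\Bigl[-\tfrac{1}{2} \sum_{t,s=1}^{p} a_{ts} y_t y_s\Bigr], \tag{3}$$

and the joint density of $u_{p+1}, \ldots, u_T$ is

$$(2\pi\sigma^2)^{-(T-p)/2} \exp\Bigl[-\frac{1}{2\sigma^2} \sum_{t=p+1}^{T} u_t^2\Bigr]. \tag{4}$$

Thus the joint density of $y_1, \ldots, y_T$ is

$$(2\pi)^{-T/2} \, |\Sigma_{pp}|^{-1/2} \, \sigma^{-(T-p)} \exp\Bigl[-\tfrac{1}{2} \sum_{t,s=1}^{p} a_{ts} y_t y_s - \frac{1}{2\sigma^2} \sum_{t=p+1}^{T} \Bigl(\sum_{r=0}^{p} \beta_r y_{t-r}\Bigr)^2\Bigr], \qquad \beta_0 = 1. \tag{5}$$

The exponent in (5) is $-\tfrac{1}{2}$ times

$$\sum_{t,s=1}^{p} a_{ts} y_t y_s + \frac{1}{\sigma^2} \sum_{t=p+1}^{T} \Bigl(\sum_{r=0}^{p} \beta_r y_{t-r}\Bigr)^2. \tag{6}$$

(We assume $T \geq 2p$.) Let (6) be $y'\Sigma^{-1}y$, where $y = (y_1, \ldots, y_T)'$. If $\Sigma_{pp}$ is chosen to obtain a stationary process, the fact that

$$\sigma(s - t) = \sigma[(T - s + 1) - (T - t + 1)] \tag{7}$$

implies that the element in row $t$ and column $s$ of $\Sigma^{-1}$ is the same as the element in row $(T - t + 1)$ and column $(T - s + 1)$ of $\Sigma^{-1}$ and therefore that the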


quadratic form in $y_1, \ldots, y_p$ is the same as the quadratic form in $y_T, \ldots, y_{T+1-p}$; that is, $a_{ts} = b_{ts}$, $t, s = 1, \ldots, p$. The coefficients of the form in $y_T, \ldots, y_{T+1-p}$ can be obtained from the left-hand side of (6).

It may be noted that the quadratic form (6) has no term $y_t y_s$ for which $|t - s| > p$. Except for the end terms, the quadratic form is a linear combination of the sums $\sum_t y_t y_{t-j}$, $j = 0, 1, \ldots, p$, namely,

$$\frac{1}{\sigma^2} \Bigl[ \sum_{r=0}^{p} \beta_r^2 \sum_t y_t^2 + 2 \sum_{j=1}^{p} \sum_{r=0}^{p-j} \beta_r \beta_{r+j} \sum_t y_t y_{t-j} \Bigr], \qquad \beta_0 = 1. \tag{8}$$

The $p + 1$ coefficients of the quadratic forms form a transformation of the $p + 1$ parameters $\sigma^2, \beta_1, \ldots, \beta_p$. Hypotheses and multiple-decision problems can be formulated in terms of these coefficients. In particular, $2\beta_p/\sigma^2$ is the coefficient of $\sum_t y_t y_{t-p}$. The end terms in (6) involve coefficients that are other functions of $\sigma^2, \beta_1, \ldots, \beta_p$. This fact implies that a sufficient set of statistics includes some of $y_t y_s$ and $y_{T+1-t} y_{T+1-s}$, $t, s = 1, \ldots, p$, as well as the $p + 1$ sums $\sum_t y_t y_{t-j}$, $j = 0, 1, \ldots, p$. In turn, the fact that the number of statistics in the minimal sufficient set is a good deal greater than the number of parameters makes it impossible to formulate inference problems in a way that leads to relatively simple optimum procedures.

If $T$ is large compared to $p$, the sums $\sum_t y_t y_{t-j}$ will tend to dominate the other terms. This feature suggests replacing the density (5) with densities that have these same sums but end terms modified to permit families of $p + 1$ sufficient statistics. We shall suppose that the density of $y_1, \ldots, y_T$ is given by

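$$f(y_1, \ldots, y_T \mid \gamma_0, \ldots, \gamma_p) = K \exp\bigl[-\tfrac{1}{2}(\gamma_0 Q_0 + \cdots + \gamma_p Q_p)\bigr], \tag{9}$$

where $Q_j$ is a quadratic form in $y = (y_1, \ldots, y_T)'$,

$$Q_j = y'A_j y, \tag{10}$$

$A_j$ is a given symmetric matrix, $j = 0, 1, \ldots, p$, with $A_0$ positive definite, and $\gamma_0, \ldots, \gamma_p$ are parameters with $\gamma_0$ positive such that

$$\gamma_0 A_0 + \cdots + \gamma_p A_p \tag{11}$$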


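is positive definite. The density (9) is then multivariate normal; the constant $K = K(\gamma_0, \ldots, \gamma_p)$ depends on the values of $\gamma_0, \ldots, \gamma_p$ and is given by

$$K(\gamma_0, \ldots, \gamma_p) = (2\pi)^{-T/2} \, |\gamma_0 A_0 + \cdots + \gamma_p A_p|^{1/2}. \tag{12}$$

Usually $A_0$ will be taken to be the identity matrix $I$. The relationship between the parameters in (9) and those in (5) is

$$\gamma_0 = \frac{1 + \beta_1^2 + \cdots + \beta_p^2}{\sigma^2}, \qquad
\gamma_1 = 2\,\frac{\beta_1 + \beta_1\beta_2 + \cdots + \beta_{p-1}\beta_p}{\sigma^2}, \qquad \ldots, \qquad
\gamma_p = \frac{2\beta_p}{\sigma^2}. \tag{13}$$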
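The mapping (13) from $(\sigma^2, \beta_1, \ldots, \beta_p)$ to $(\gamma_0, \ldots, \gamma_p)$ can be sketched in a few lines. The general term used here, $\gamma_j = 2\sum_{r=0}^{p-j} \beta_r \beta_{r+j}/\sigma^2$ with $\beta_0 = 1$, is inferred from the displayed cases $j = 1$ and $j = p$. As a check, the sketch uses circularly defined sums (indices taken modulo $T$), which is an assumption made only so that the identity holds exactly without end terms; the function names and the numbers are illustrative.

```python
import numpy as np


def gammas_from_betas(betas, sigma2):
    """gamma_0 = sum_r beta_r^2 / sigma2; gamma_j = 2 sum_r beta_r beta_{r+j} / sigma2."""
    b = np.concatenate(([1.0], np.asarray(betas, dtype=float)))  # beta_0 = 1
    p = len(b) - 1
    g = np.empty(p + 1)
    g[0] = np.dot(b, b) / sigma2
    for j in range(1, p + 1):
        g[j] = 2.0 * np.dot(b[: p + 1 - j], b[j:]) / sigma2
    return g


def circular_identity(betas, sigma2, y):
    """Compare sum_t u_t^2 / sigma2 with sum_j gamma_j Q_j under circular indexing."""
    y = np.asarray(y, dtype=float)
    b = np.concatenate(([1.0], np.asarray(betas, dtype=float)))
    # np.roll(y, r)[t] equals y[t - r] with the index taken modulo T.
    u = sum(b[r] * np.roll(y, r) for r in range(len(b)))
    lhs = np.dot(u, u) / sigma2
    q = np.array([np.dot(y, np.roll(y, j)) for j in range(len(b))])
    rhs = np.dot(gammas_from_betas(betas, sigma2), q)
    return float(lhs), float(rhs)


g = gammas_from_betas([0.5, -0.3], 2.0)          # hypothetical beta_1, beta_2, sigma^2
rng = np.random.default_rng(0)
lhs, rhs = circular_identity([0.5, -0.3], 2.0, rng.normal(size=12))
print(g, lhs, rhs)
```

With non-circular sums the two sides differ only by end terms, which is the point made above about modifying the end terms of (5).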

Note that $\gamma_{m+1} = \cdots = \gamma_p = 0$ if and only if $\beta_{m+1} = \cdots = \beta_p = 0$. Usually the quadratic form $Q_j$ is $\sum_{t=j+1}^{T} y_t y_{t-j}$ plus a quadratic form in $y_1, \ldots, y_p$ plus a similar quadratic form in $y_{T-p+1}, \ldots, y_T$. These $p + 1$ quadratic forms are chosen conveniently (for example, with common characteristic vectors). Three examples of systems of quadratic forms are given in Section 6.5. The basic theorems of this chapter apply to some other problems, such as those involving components of variance.

In terms of this model, we consider tests of the hypotheses that $\gamma_{m+1} = \cdots = \gamma_p = 0$ for specified $m$ and $p$. We are particularly interested in the case $m = p - 1$; the case $p = 1$ is most important and has been considered most in the literature. We find the uniformly most powerful test against alternatives $\gamma_p < 0$ (equivalently $\beta_p < 0$) and the uniformly most powerful unbiased test. The actual use of these tests is limited by the available theory and tabulation of the test statistics.

The second type of problem we treat is a multiple-decision procedure for choosing the largest index $i$ for which $\gamma_i \neq 0$ for $i = m, \ldots, p$; this is the order of dependence. We assign the probability of overstating the order, given the true order $i$ for $i = m, \ldots, p - 1$, and ask for the procedure that is uniformly most powerful against positive dependence ($\gamma_i < 0$) or uniformly most powerful unbiased. Such procedures are built up from the tests of hypotheses in a manner similar to that of Section 3.2.2. [See T. W. Anderson (1963).]


6.3. UNIFORMLY MOST POWERFUL TESTS OF A GIVEN ORDER

6.3.1. Uniformly Most Powerful Tests Against Alternatives of Positive Dependence

The basic model for the observations $y_1, \ldots, y_T$ is the multivariate normal density (9) of Section 6.2. In this section we shall be interested in testing a hypothesis about $\gamma_i$, $i = 1, \ldots, p$, when we assume $\gamma_{i+1} = \cdots = \gamma_p = 0$. In that case the density is

$$f_i(y_1, \ldots, y_T \mid \gamma_0, \ldots, \gamma_i) = K_i(\gamma_0, \ldots, \gamma_i) \exp\bigl[-\tfrac{1}{2}(\gamma_0 Q_0 + \cdots + \gamma_i Q_i)\bigr], \tag{1}$$

where

$$K_i(\gamma_0, \ldots, \gamma_i) = K(\gamma_0, \ldots, \gamma_i, 0, \ldots, 0) = (2\pi)^{-T/2} \, |\gamma_0 A_0 + \cdots + \gamma_i A_i|^{1/2}, \tag{2}$$

and $\gamma_0 Q_0 + \cdots + \gamma_i Q_i$ is positive definite. Then the null hypothesis that $\gamma_i = 0$ is the hypothesis that the order of dependence is less than $i$ (on the assumption that the order is not greater than $i$). When $i = 1$, the null hypothesis $\gamma_1 = 0$ is the hypothesis of independence (when $A_0$ is diagonal as, for instance, when $A_0 = I$).

In this subsection we shall consider testing the hypothesis that $\gamma_i$ is a specific value against one-sided alternatives. In the case of $i = 1$ and the null hypothesis $\gamma_1 = 0$ ($\beta_1 = 0$) a natural alternative is $\gamma_1 < 0$ (that is, $\beta_1 < 0$), which corresponds to positive population correlation between successive observations. In the case of $i > 1$, the alternative to the null hypothesis of $\gamma_i = 0$ ($\beta_i = 0$) is not necessarily $\gamma_i < 0$ ($\beta_i < 0$) because $-\beta_i$ represents the partial correlation between $y_t$ and $y_{t+i}$ given the correlations of lower order. Since the optimum tests against alternatives $\gamma_i > 0$ are constructed in a manner similar to the optimum tests against alternatives $\gamma_i < 0$ we shall restrict the actual treatment to the one-sided alternatives corresponding to positive dependence ($\gamma_i < 0$).

The problems of statistical inference are relatively simple for an exponential family of densities such as (1). One feature is that the minimal sufficient set of statistics is of the same dimensionality as the set of parameters.

THEOREM 6.3.1. $Q_0, \ldots, Q_i$ form a sufficient set of statistics for $\gamma_0, \ldots, \gamma_i$ when $y_1, \ldots, y_T$ have the density $f_i(y_1, \ldots, y_T \mid \gamma_0, \ldots, \gamma_i)$ given by (1).

The exponential family is of the classical Koopman-Darmois form, which obviously satisfies the factorization criterion that the density factor into a


nonnegative (measurable) function of the parameters and sufficient statistics and a nonnegative (measurable) function of the observations that does not depend on the parameters. [See Lehmann (1959), p. 49, or Kendall and Stuart (1961), p. 22, for example.] In this case the first function can be taken to be the density, and the second function can be taken to be 1. The implication of the theorem is that inference can be based on $Q_0, \ldots, Q_i$; for instance, the significance level and power function of any test based on $y_1, \ldots, y_T$ can be duplicated by a (possibly randomized) test based on $Q_0, \ldots, Q_i$.

THEOREM 6.3.2. The family of distributions of $Q_0, \ldots, Q_i$ based on densities $f_i(y_1, \ldots, y_T \mid \gamma_0, \ldots, \gamma_i)$ is complete as $\gamma_0, \ldots, \gamma_i$ range over the set of values for which $\gamma_0 Q_0 + \cdots + \gamma_i Q_i$ is positive definite.

The property of completeness is also a property of the exponential family. [See Lehmann (1959), p. 132, or Kendall and Stuart (1961), p. 190, for example.] A proof is outlined in Problem 7. Theorem 6.3.2 implies that a test of the null hypothesis that $\gamma_i$ is a given value, say $\gamma_i^{(1)}$, at assigned level of significance $\varepsilon_i$ and determined by a similar region has Neyman structure; that is, if $S_i$ is a set in the space of $Q_0, \ldots, Q_i$ such that

$$\Pr\{S_i \mid \gamma_0, \ldots, \gamma_{i-1}, \gamma_i^{(1)}, 0, \ldots, 0\} = \varepsilon_i, \tag{3}$$

then

$$\Pr\{S_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_0, \ldots, \gamma_{i-1}, \gamma_i^{(1)}, 0, \ldots, 0\} = \varepsilon_i \tag{4}$$

almost everywhere in the set of possible $Q_0, \ldots, Q_{i-1}$. [See Lehmann (1959), Section 4.3, or Kendall and Stuart (1961), Section 23.19.] A proof is outlined in Problem 9. Note that since $Q_0, \ldots, Q_{i-1}$ are sufficient for $\gamma_0, \ldots, \gamma_{i-1}$, $\Pr\{S_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_0, \ldots, \gamma_{i-1}, \gamma_i^{(1)}, 0, \ldots, 0\}$ does not depend on $\gamma_0, \ldots, \gamma_{i-1}$.

Let $h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_0, \ldots, \gamma_i)$ be the conditional density of $Q_i$, given $Q_0, \ldots, Q_{i-1}$, when $\gamma_{i+1} = \cdots = \gamma_p = 0$. We are interested in this conditional density on the set $V_{i-1}$ of possible values of $Q_0, \ldots, Q_{i-1}$ (that is, the set for which the marginal density of $Q_0, \ldots, Q_{i-1}$ is positive). Let $h_i(Q_0, \ldots, Q_i \mid \gamma_0, \ldots, \gamma_i)$ be the density of $Q_0, \ldots, Q_i$ when $\gamma_{i+1} = \cdots = \gamma_p = 0$. This density is

$$h_i(Q_0, \ldots, Q_i \mid \gamma_0, \ldots, \gamma_i) = K_i(\gamma_0, \ldots, \gamma_i) \exp\bigl[-\tfrac{1}{2}(\gamma_0 Q_0 + \cdots + \gamma_i Q_i)\bigr] \, k_i(Q_0, \ldots, Q_i). \tag{5}$$


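The density (5) is obtained from the density of $y_1, \ldots, y_T$ by transforming $y_1, \ldots, y_T$ to $Q_0, \ldots, Q_i$ and some $T - i - 1$ of the $y_t$'s and integrating these $y_t$'s over a range determined by $Q_0, \ldots, Q_i$; this operation yields $k_i(Q_0, \ldots, Q_i)$. The set $V_i$ is the set of $Q_0, \ldots, Q_i$ for which $k_i(Q_0, \ldots, Q_i) > 0$. Then the marginal density of $Q_0, \ldots, Q_{i-1}$ is

$$K_i(\gamma_0, \ldots, \gamma_i) \exp\bigl[-\tfrac{1}{2}(\gamma_0 Q_0 + \cdots + \gamma_{i-1} Q_{i-1})\bigr] \, m_i(Q_0, \ldots, Q_{i-1};\ \gamma_i), \tag{6}$$

where

$$m_i(Q_0, \ldots, Q_{i-1};\ \gamma_i) = \int_{-\infty}^{\infty} e^{-\frac{1}{2}\gamma_i Q_i} \, k_i(Q_0, \ldots, Q_i) \, dQ_i. \tag{7}$$

[Note that $m_i(Q_0, \ldots, Q_{i-1};\ \gamma_i) > 0$ for every value of $\gamma_i$.] Thus the conditional density of $Q_i$, given $Q_0, \ldots, Q_{i-1}$ in $V_{i-1}$, is

$$\frac{e^{-\frac{1}{2}\gamma_i Q_i} \, k_i(Q_0, \ldots, Q_i)}{m_i(Q_0, \ldots, Q_{i-1};\ \gamma_i)} = h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i), \tag{8}$$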

say. The conditional density of $Q_i$, given $Q_0, \ldots, Q_{i-1}$, is in an exponential family with parameter $\gamma_i$; for fixed $Q_0, \ldots, Q_{i-1}$, $k_i(Q_0, \ldots, Q_i)$ is a function of $Q_i$ and $m_i(Q_0, \ldots, Q_{i-1};\ \gamma_i)$ is a function of $\gamma_i$ and is the normalizing constant. It will be understood that the conditional density $h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i)$ will be written only for $Q_0, \ldots, Q_{i-1}$ in $V_{i-1}$.

Since a test of the hypothesis $\gamma_i = \gamma_i^{(1)}$ has Neyman structure, we can apply the Neyman-Pearson Fundamental Lemma (Problem 10) to the conditional distribution of $Q_i$. The best test of the hypothesis that $\gamma_i = \gamma_i^{(1)}$ against the alternative $\gamma_i = \gamma_i^{(2)}$ has a rejection region

$$\frac{h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i^{(2)})}{h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)})} > k, \tag{9}$$

where $k$, depending on $Q_0, \ldots, Q_{i-1}$, is determined so that the conditional probability of the inequality is the desired significance level when $\gamma_i = \gamma_i^{(1)}$. If $\gamma_i^{(1)} > \gamma_i^{(2)}$, then the rejection region (9) can be written

$$Q_i > c_i(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}, \gamma_i^{(2)}). \tag{10}$$

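In particular, the test of $\gamma_i = 0$ against an alternative $\gamma_i = \gamma_i^{(2)} < 0$ is of this form.

THEOREM 6.3.3. The best similar test of the null hypothesis $\gamma_i = \gamma_i^{(1)}$ against the alternative $\gamma_i = \gamma_i^{(2)} < \gamma_i^{(1)}$ at significance level $\varepsilon_i$ is given by (10), where $c_i(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}, \gamma_i^{(2)})$ is determined so the probability of (10) according to the density (8) is $\varepsilon_i$ when $\gamma_i = \gamma_i^{(1)}$.

Since $c_i(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}, \gamma_i^{(2)})$ is determined from the conditional density (8) with $\gamma_i = \gamma_i^{(1)}$, it does not depend on the value of $\gamma_i^{(2)}$, and we shall write it as $c_i(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)})$. Thus the region (10) does not depend on $\gamma_i^{(2)}$; it gives the best test against any alternative $\gamma_i^{(2)} < \gamma_i^{(1)}$.

THEOREM 6.3.4. The uniformly most powerful similar test of the null hypothesis $\gamma_i = \gamma_i^{(1)}$ against alternatives $\gamma_i < \gamma_i^{(1)}$ at significance level $\varepsilon_i$ has the rejection region

$$Q_i > c_i(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}), \tag{11}$$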

where $c_i(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)})$ is determined so the probability of (11) according to the density (8) is $\varepsilon_i$ when $\gamma_i = \gamma_i^{(1)}$.

We are particularly interested in the case $\gamma_i^{(1)} = 0$. Then the conditional density of $Q_i$ under the null hypothesis is

$$h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ 0) = \frac{k_i(Q_0, \ldots, Q_i)}{k_{i-1}(Q_0, \ldots, Q_{i-1})}. \tag{12}$$

COROLLARY 6.3.1. The uniformly most powerful similar test of the null hypothesis $\gamma_i = 0$ against alternatives $\gamma_i < 0$ at significance level $\varepsilon_i$ has the rejection region

$$Q_i > c_i(Q_0, \ldots, Q_{i-1}), \tag{13}$$

where $c_i(Q_0, \ldots, Q_{i-1})$ is determined so the integral of (12) over (13) is $\varepsilon_i$.

When we come to consider specific models we shall see that usually the rejection region (13) is

$$r_i > C_i(r_1, \ldots, r_{i-1}), \tag{14}$$

where $r_1 = Q_1/Q_0, \ldots, r_i = Q_i/Q_0$ are serial correlation coefficients. In particular, when $i = 1$, a test of the null hypothesis of independence against alternatives of positive dependence has the rejection region $r_1 > C_1$.

6.3.2. Uniformly Most Powerful Unbiased Tests

In testing that the order of dependence is less than $i$ (that is, testing $\gamma_i = 0$), we may be interested in alternatives of dependence of order $i$ (that is, $\gamma_i \neq 0$). Against two-sided alternatives no uniformly most powerful test exists. For example, the best similar test of $\gamma_i = \gamma_i^{(1)}$ against an alternative $\gamma_i = \gamma_i^{(2)} < \gamma_i^{(1)}$ at significance level $\varepsilon_i$ is (11), while the best similar test against an alternative $\gamma_i = \gamma_i^{(2)} > \gamma_i^{(1)}$ has a rejection region

$$Q_i < c_i'(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}), \tag{15}$$

where $c_i'(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)})$ is determined so the probability of (15) according to the density (8) is $\varepsilon_i$ when $\gamma_i = \gamma_i^{(1)}$.

When we are interested in two-sided alternatives, we shall require that the test be unbiased, that is, that the power function have its minimum at $\gamma_i = \gamma_i^{(1)}$. Among the class of unbiased tests at a given significance level we shall seek a uniformly most powerful test. Such a test exists, because the family of densities is exponential, and its acceptance region is an interval. We shall sketch the derivation of the uniformly most powerful unbiased test; a complete and rigorous proof is given in Lehmann (1959), Chapter 4, Section 4.

For given $Q_0, \ldots, Q_{i-1}$ let $R$ be a rejection region with significance level $\varepsilon_i$; that is,

$$\int_R h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}) \, dQ_i = \varepsilon_i. \tag{16}$$

Let the power function be $\beta(\gamma_i)$; that is, $\beta(\gamma_i^{(1)}) = \varepsilon_i$, where

$$\beta(\gamma_i) = \int_R h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i) \, dQ_i. \tag{17}$$

For the test to be locally unbiased we want the first derivative of the power function to be zero at $\gamma_i = \gamma_i^{(1)}$. Interchanging the order of integration and differentiation gives us

$$0 = \int_R \Bigl[ -\tfrac{1}{2} Q_i - \frac{\partial}{\partial \gamma_i} \log m_i(Q_0, \ldots, Q_{i-1};\ \gamma_i) \Bigr|_{\gamma_i = \gamma_i^{(1)}} \Bigr] h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}) \, dQ_i. \tag{18}$$


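If we similarly evaluate the derivative of

$$\int_{-\infty}^{\infty} h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i) \, dQ_i = 1 \tag{19}$$

at $\gamma_i = \gamma_i^{(1)}$ and compare with (18) we find

$$\int_R Q_i \, h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}) \, dQ_i = \varepsilon_i \int_{-\infty}^{\infty} Q_i \, h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}) \, dQ_i. \tag{20}$$

Thus (20) is the condition of local unbiasedness.

We now want to find the region $R$ satisfying (16) and (20) that maximizes the corresponding $\beta(\gamma_i)$ at some particular value of $\gamma_i$, say $\gamma_i = \gamma_i^{(2)} > \gamma_i^{(1)}$. By a generalized Neyman-Pearson Fundamental Lemma (Problem 12) that region will be defined by

$$h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i^{(2)}) > a \, h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}) + b \, Q_i \, h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}) \tag{21}$$

if there are functions $a$ and $b$ of $Q_0, \ldots, Q_{i-1}$ such that the region satisfies (16) and (20). The inequality (21) is equivalent to $g(Q_i) > 0$, where

$$g(Q_i) = \exp\bigl[-\tfrac{1}{2}(\gamma_i^{(2)} - \gamma_i^{(1)}) Q_i\bigr] - (A + B Q_i), \tag{22}$$

and

$$A = a \, \frac{m_i(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(2)})}{m_i(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)})}, \qquad
B = b \, \frac{m_i(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(2)})}{m_i(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)})}. \tag{23}$$

Then

$$\frac{d}{dQ_i} g(Q_i) = -\tfrac{1}{2}(\gamma_i^{(2)} - \gamma_i^{(1)}) \exp\bigl[-\tfrac{1}{2}(\gamma_i^{(2)} - \gamma_i^{(1)}) Q_i\bigr] - B.$$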

If $B > 0$ (and $\gamma_i^{(2)} > \gamma_i^{(1)}$), the derivative is negative and (21) defines a half-interval $(-\infty, c)$; but such a region cannot satisfy (16) and (20). (See Problem 13.) Thus $B$ should be negative. Then the derivative is a continuous increasing function with one zero; $g(Q_i)$ has a minimum at this point; if the minimum is negative (which will be the case if $A$ is sufficiently large), (21) will define two half-intervals $(-\infty, c)$ and $(d, \infty)$. The acceptance region will be the interval $(c, d)$. Note that the values of $c$ and $d$ will be defined by (16) and (20), which are independent of $\gamma_i^{(2)}$. In a similar fashion, when $\gamma_i^{(2)} < \gamma_i^{(1)}$ we argue that $B$ should be positive and that then the acceptance region is an interval satisfying (16) and (20) independently of $\gamma_i^{(2)}$. Thus we obtain the uniformly most powerful unbiased test.
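The two conditions that pin down the acceptance interval $(c, d)$ can be solved numerically once a conditional density is available. The sketch below is only an illustration under stated assumptions: it reads (16) and (20) in complementary form (the interval has probability $1 - \varepsilon$ and carries the fraction $1 - \varepsilon$ of the first moment), it works on a discrete grid, and it uses a hypothetical stand-in density proportional to $Q(1 - Q)^4$ on $(0, 1)$ rather than any density derived in this chapter. The function name is likewise an assumption.

```python
import numpy as np


def unbiased_acceptance_interval(q, h, eps):
    """Grid approximation to (c, d) with
    P(c < Q <= d) = 1 - eps  and  E[Q; c < Q <= d] = (1 - eps) * E[Q]."""
    q = np.asarray(q, dtype=float)
    w = np.asarray(h, dtype=float)
    w = w / w.sum()                    # probability mass at each grid point
    cdf = np.cumsum(w)
    pm = np.cumsum(q * w)              # partial first moments
    mean = pm[-1]

    def upper_index(lo):
        # smallest index hi such that the mass on (q[lo], q[hi]] is at least 1 - eps
        target = cdf[lo] + (1.0 - eps)
        return min(int(np.searchsorted(cdf, target)), len(cdf) - 1)

    def moment_gap(lo):
        hi = upper_index(lo)
        return (pm[hi] - pm[lo]) - (1.0 - eps) * mean

    # The gap increases as the interval slides to the right, so bisect on lo.
    a, b = 0, int(np.searchsorted(cdf, eps)) - 1
    while b - a > 1:
        mid = (a + b) // 2
        if moment_gap(mid) < 0.0:
            a = mid
        else:
            b = mid
    return float(q[b]), float(q[upper_index(b)])


# Hypothetical skewed density on (0, 1), for illustration only.
q_grid = np.linspace(0.0, 1.0, 20001)
h_grid = q_grid * (1.0 - q_grid) ** 4
c, d = unbiased_acceptance_interval(q_grid, h_grid, 0.05)
print(c, d)
```

For a skewed density the resulting interval generally differs from the equal-tails interval; when the density is symmetric about 0 the two coincide.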

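THEOREM 6.3.5. The uniformly most powerful unbiased test of the null hypothesis $\gamma_i = \gamma_i^{(1)}$ against alternatives $\gamma_i \neq \gamma_i^{(1)}$ at significance level $\varepsilon_i$ has the rejection region

$$Q_i > c_{Ui}(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}), \qquad Q_i < c_{Li}(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}), \tag{25}$$

where $c_{Ui}(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)})$ and $c_{Li}(Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)})$ are determined so that

$$\int_{c_{Li}}^{c_{Ui}} h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}) \, dQ_i = 1 - \varepsilon_i \tag{26}$$

and

$$\int_{c_{Li}}^{c_{Ui}} Q_i \, h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}) \, dQ_i = (1 - \varepsilon_i) \int_{-\infty}^{\infty} Q_i \, h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ \gamma_i^{(1)}) \, dQ_i. \tag{27}$$

COROLLARY 6.3.2. The uniformly most powerful unbiased test of the null hypothesis $\gamma_i = 0$ against alternatives $\gamma_i \neq 0$ at significance level $\varepsilon_i$ has the rejection region

$$Q_i > c_{Ui}(Q_0, \ldots, Q_{i-1}), \qquad Q_i < c_{Li}(Q_0, \ldots, Q_{i-1}), \tag{28}$$

where $c_{Ui}(Q_0, \ldots, Q_{i-1})$ and $c_{Li}(Q_0, \ldots, Q_{i-1})$ are determined by

$$\int_{c_{Li}}^{c_{Ui}} h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ 0) \, dQ_i = 1 - \varepsilon_i \tag{29}$$

and

$$\int_{c_{Li}}^{c_{Ui}} Q_i \, h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ 0) \, dQ_i = (1 - \varepsilon_i) \int_{-\infty}^{\infty} Q_i \, h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ 0) \, dQ_i. \tag{30}$$

If the conditional density of $Q_i$, given $Q_0, \ldots, Q_{i-1}$ for $\gamma_i = 0$, is symmetric, then (30) is automatically satisfied for

$$c_{Li}(Q_0, \ldots, Q_{i-1}) = -c_{Ui}(Q_0, \ldots, Q_{i-1}), \tag{31}$$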


since both sides are zero, and (29) can be written

$$\int_{c_{Ui}}^{\infty} h_i(Q_i \mid Q_0, \ldots, Q_{i-1};\ 0) \, dQ_i = \tfrac{1}{2}\varepsilon_i. \tag{32}$$

The uniformly most powerful one-sided similar tests can also be termed uniformly most powerful one-sided unbiased tests, for unbiasedness of a test implies similarity of the rejection region.

It should be noted that the conditional distribution of $Q_i$, given $Q_0, \ldots, Q_{i-1}$, does not depend on the values of $\gamma_0, \ldots, \gamma_{i-1}$. Hence these parameters can be assigned convenient values, such as $\gamma_0 = 1$, $\gamma_1 = \cdots = \gamma_{i-1} = 0$. Usually $Q_0$ will be $\sum_{t=1}^{T} y_t^2$. The ratio

$$r_j = \frac{Q_j}{Q_0} \tag{33}$$

can be considered as a serial correlation coefficient. When $\gamma_1 = \cdots = \gamma_i = 0$, the joint distribution of $r_1, \ldots, r_i$ is independent of $\gamma_0$ ($> 0$) and $Q_0$. (See Theorem 6.7.2.) The definitions of the optimum tests (13) and (28) can be given in terms of $r_i = Q_i/Q_0$ conditional on $r_1, \ldots, r_{i-1}$.

A partial serial correlation of order $i$, given the serial correlations of order $1, \ldots, i - 1$, can be defined in the usual way. It has been suggested that this partial serial correlation coefficient be used to test the hypothesis that $\gamma_i = 0$. Since this partial correlation coefficient is a linear function of $r_i$ (with coefficients depending on $r_1, \ldots, r_{i-1}$), using it conditional on $r_1, \ldots, r_{i-1}$ is equivalent to using $r_i$ conditional on $r_1, \ldots, r_{i-1}$. (See Section 6.9.2.)

6.3.3. Some Cases Where Uniformly Most Powerful Tests Do Not Exist

There are a number of problems of serial correlation whose solutions cannot be obtained from Theorems 6.3.4 and 6.3.5. If we wish to test the null hypothesis

$$\gamma_1 = \cdots = \gamma_p = 0 \tag{34}$$

against an alternative $\gamma_1 = \gamma_1^0, \ldots, \gamma_p = \gamma_p^0$, where $\gamma_1^0, \ldots, \gamma_p^0$ are specified values, the most powerful similar test may have the rejection region

$$\sum_{j=1}^{p} \gamma_j^0 Q_j < c(Q_0), \tag{35}$$

where $c(Q_0)$ is a constant determined so the conditional probability of (35) is $\varepsilon$ under the null hypothesis. If the alternative values of $\gamma_1, \ldots, \gamma_p$ are given by

$$\gamma_j = k \gamma_j^*, \qquad j = 1, \ldots, p, \tag{36}$$

where the $\gamma_j^*$'s are fixed, then (35) with $\gamma_j^0$ replaced by $\gamma_j^*$ is the best region for all alternatives $k > 0$. If the alternative values do not lie on a line (36) in the $\gamma$

268

CH. 6.

SERIAL CORRELATION

space, the region (35) is not the same for all admissible alternatives (y;, Hence there does not exist a uniformly most powerful test. Consider the stationary stochastic process defined by

Y: + BY:-, = U t ,

(37)

. . . ,y:).

..., - l , O , l , . . . ,

t=

where IBI < 1 and the distribution of ut is normal with mean zero and variance aa and is independent of ybl, yt-a, . . . and ukl, u , ~ , . . . . Then the marginal distribution of a sample Y,, . . . ,yT is given by (1) with A, = I, 0

1

0

--.

0

0

1

0

I

* * .

0

0

0

1

0

-. .

0

0

.

.

A, = f

-

0

0

0

..-

0

1

0

0

0

*..

l

oJ

'0

0

0

-..

0

0

0

1

0

* * .

0

0

0

0

1

..'

0

0

0

0

0

* - *

1

0

,o

0

0

.*.

0

0

,

A, =
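The decomposition of the exponent for the stationary first-order process can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper names `ar1_precision_pieces` and `ar1_covariance` and the parameter values are illustrative and not from the text. It builds $A_0$, $A_1$, $A_2$ as displayed above and confirms that $\gamma_0 A_0 + \gamma_1 A_1 + \gamma_2 A_2$ with $\gamma_0 = 1/\sigma^2$, $\gamma_1 = 2\beta/\sigma^2$, $\gamma_2 = \beta^2/\sigma^2$ inverts the covariance matrix of the process.

```python
import numpy as np

def ar1_precision_pieces(T):
    """Return A0, A1, A2 for the stationary first-order model (T >= 3)."""
    A0 = np.eye(T)
    # 1/2 on the two diagonals adjacent to the main diagonal
    A1 = 0.5 * (np.eye(T, k=1) + np.eye(T, k=-1))
    # identity with zeros in the upper-left and lower-right corners
    A2 = np.eye(T)
    A2[0, 0] = A2[-1, -1] = 0.0
    return A0, A1, A2

def ar1_covariance(T, beta, sigma2):
    """Covariance of y_t + beta*y_{t-1} = u_t: sigma^2 (-beta)^|h| / (1 - beta^2)."""
    lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return sigma2 * (-beta) ** lags / (1.0 - beta ** 2)

# Illustrative values (assumptions, not from the text)
T, beta, sigma2 = 6, 0.4, 2.0
A0, A1, A2 = ar1_precision_pieces(T)
gamma0, gamma1, gamma2 = 1 / sigma2, 2 * beta / sigma2, beta ** 2 / sigma2

precision = gamma0 * A0 + gamma1 * A1 + gamma2 * A2
identity_check = precision @ ar1_covariance(T, beta, sigma2)
print(np.allclose(identity_check, np.eye(T)))
```

Because $(\gamma_1, \gamma_2) = (2\beta, \beta^2)/\sigma^2$ traces a parabola as $\beta$ varies, the alternatives cannot lie on a single line through the origin, which is the point made in the text.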

The set of alternatives for $\gamma_1 = 2\beta/\sigma^2$ and $\gamma_2 = \beta^2/\sigma^2$ does not lie on a straight line. Hence, no uniformly most powerful test exists, even for one-sided alternatives. If $|\beta|$ is small (that is, near 0) this case is similar to the model of Section 6.5.4; if $\beta$ is small (that is, near $-1$) this case is similar to the model of Section 6.5.3. Thus one might conjecture that one of the two relevant correlations would be a good statistic to use in testing $\beta = 0$ in this population (when the mean is known). Positive dependence corresponds to $\beta < 0$.

Suppose (37) defines the distribution for $t = 1, \ldots, T$ with the condition $y_0 = 0$. Then $A_0$ and $A_1$ are the same as in the previous case and

(39) $A_2 = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{bmatrix}.$

The set of alternatives for $\gamma_1 = 2\beta/\sigma^2$ and $\gamma_2 = \beta^2/\sigma^2$ does not lie on a straight line. Hence no uniformly most powerful test exists for the hypothesis $\beta = 0$.

A model defined by a second-order difference equation

(40) $y_t + \beta_1 y_{t-1} + \beta_2 y_{t-2} = u_t, \qquad t = 1, \ldots, T,$

with the circular definition $y_0 \equiv y_T$, $y_{-1} \equiv y_{T-1}$ leads to a density (1) with

(41)

1

0

···

0

0

0

1

0

П

A, = \ 0

0

0

0

1

0

0

1

0) is the alternative to independence. Moreover, composite procedures can be constructed when the alternative to $\gamma_i = 0$ is specified as $\gamma_i < 0$, $\gamma_i > 0$, or $\gamma_i \neq 0$, $i = m + 1, \ldots, p$.

The comments that were made in Section 3.2.2 on this type of multiple-decision procedure for determining the degree of polynomial trend apply here as well. The theoretical advantage of this procedure is that the investigator controls the probability of error of saying the order of dependence is $i$ when it is actually lower. A disadvantage is that the maximum order of dependence $p$ must be specified in advance; this may be difficult to do, but it is essential. A practical disadvantage is that $p + 1$ quadratic forms must be calculated. The disadvantage of having to specify $p$ in advance might be mitigated by specifying $p$ fairly large, but taking $p_i$ very small for $i$ near $p$; that is, make the probabilities small of deciding on a large order of dependence when that is not the case.

An alternative multiple-decision procedure is to run significance tests in the opposite direction. If it is assumed that the order of dependence is at least $m$, the investigator may test the hypothesis $\gamma_{m+1} = 0$ at significance level $\varepsilon_{m+1}$. If the hypothesis is accepted, the investigator decides the order of dependence is $m$; if the hypothesis is rejected, he then tests the hypothesis $\gamma_{m+2} = 0$. The procedure may continue with such a sequence of significance tests. If a hypothesis $\gamma_i = 0$ is accepted, the decision is made that the order of dependence is $i - 1$; if the hypothesis $\gamma_i = 0$ is rejected, the hypothesis $\gamma_{i+1} = 0$ is tested. If successive hypotheses are rejected until the hypothesis $\gamma_p = 0$ is rejected for $p$ specified a priori, the order of dependence is said to be $p$. The advantage of this procedure is that not all $p + 1$ quadratic forms may need to be computed, because the sequence of tests may stop before $p$ is reached. Because of this fact the specification of $p$ is not as important.
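The control flow of the alternative procedure (testing $\gamma_{m+1} = 0$, $\gamma_{m+2} = 0$, and so on until the first acceptance) can be sketched as follows. This illustrates only the decision logic: the function name `choose_order_forward` and the callable `rejects` are hypothetical, and the actual significance tests, which require the conditional distributions of the serial correlations, are assumed to be supplied by the user.

```python
from typing import Callable

def choose_order_forward(rejects: Callable[[int], bool], m: int, p: int) -> int:
    """Test gamma_{m+1} = 0, gamma_{m+2} = 0, ... in turn; stop at the first acceptance.

    rejects(i) is assumed to return True when the hypothesis gamma_i = 0
    is rejected at its chosen significance level.
    """
    for i in range(m + 1, p + 1):
        if not rejects(i):   # hypothesis gamma_i = 0 accepted
            return i - 1     # decide the order of dependence is i - 1
    return p                 # every hypothesis up to gamma_p = 0 was rejected

# Made-up test outcomes for illustration: gamma_1, gamma_2 judged nonzero, gamma_3 not.
outcomes = {1: True, 2: True, 3: False, 4: True}
order = choose_order_forward(lambda i: outcomes[i], m=0, p=4)
print(order)
```

In this illustration the test for $i = 4$ is never evaluated, which reflects the remark that not all $p + 1$ quadratic forms need be computed.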
It was pointed out in Section 3.2.2 that the corresponding procedure for determining the degree of polynomial trend had the disadvantage that if the coefficient of degree $i$ was large it would tend to swell the estimates of error variance used to test $\gamma_{m+1}, \ldots, \gamma_{i-1}$ and hence would tend to increase the probability of accepting one of the hypotheses $\gamma_{m+1} = 0, \ldots, \gamma_{i-1} = 0$; that is, the effect would be to increase the probability of deciding the degree is lower than $i$. In fact, the larger $\gamma_i$ is, the greater is this probability of error (holding the other $\gamma_j$'s fixed). It is more difficult to analyze the case here of testing order of dependence. It is possible that if the dependence one time unit apart is small, but the dependence two time units apart is large, the probability will not be large of rejecting $\gamma_1 = 0$ and getting to decide the order of dependence is at least two. This procedure seems to require that $\gamma_{m+1}, \ldots, \gamma_{i-1}$ be substantial so that none of them is accepted as zero if $\gamma_i \neq 0$ for some particular $i$. For a situation where the $\gamma_i$'s fall off monotonically (or perhaps sharply) this procedure may be desirable. However, no theoretically optimum properties have been proved.

Thus far in this chapter we have assumed that $\mathscr{E}y_1 = \cdots = \mathscr{E}y_T = 0$; if the means are different from 0 but known, then the differences between the variates and their means can be treated under the above assumption. Knowledge of the means, however, is not a very realistic supposition. We shall take up the cases of unknown means in Section 6.6 after we discuss more specific models in Section 6.5. [See T. W. Anderson (1963).]

6.5. MODELS: SYSTEMS OF QUADRATIC FORMS

6.5.1. General Discussion

The densities of $y_1, \ldots, y_T$, the observed time series, which are studied here, are normal with exponent $-\tfrac{1}{2} \sum_{j=0}^{p} \gamma_j Q_j$, where $Q_j = y'A_j y$ is a quadratic form in $y = (y_1, \ldots, y_T)'$. The serial correlation coefficients obtained from these densities are $r_j = Q_j/Q_0$. In this section we shall suggest several different systems of matrices $A_0, A_1, \ldots, A_p$; in each system $Q_j$ is approximately $\sum_t y_t y_{t-j}$ as suggested in Section 6.2. In each system we can use $r_1, \ldots, r_p$ to test hypotheses about the order of dependence as developed in Section 6.3 and to choose the order of dependence as described in Section 6.4.

First we discuss sets of matrices in general to prepare for the analysis of the systems of matrices. The characteristic roots $\lambda_1, \ldots, \lambda_T$ and corresponding characteristic vectors $v_1, \ldots, v_T$ of a $T \times T$ matrix $A$ satisfy

(1) $A v_s = \lambda_s v_s, \qquad s = 1, \ldots, T.$

Since (1) can be written $(A - \lambda_s I) v_s = 0$ and $v_s \neq 0$, the matrix $A - \lambda_s I$ must be singular; this is equivalent to saying that $\lambda_s$ is a root of

(2) $|A - \lambda I| = 0.$


If $A$ is symmetric (which we assume for quadratic forms $y'Ay$), the roots are real and so are the components of the characteristic vectors. If the roots are distinct, the vectors are orthogonal. If $m$ roots are equal, say to $\lambda$, there are $m$ linearly independent solutions of $(A - \lambda I)v = 0$; any such set may serve as the characteristic vectors corresponding to this multiple root. It will be convenient to take a set of vectors that are mutually orthogonal; they are necessarily orthogonal to the other characteristic vectors. If each characteristic vector is normalized so the sum of squares of the components is 1,

(3) $v_s' v_t = \delta_{st},$

where $\delta_{ss} = 1$ and $\delta_{st} = 0$ for $s \neq t$. Let

(4) $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_T), \qquad V = (v_1, \ldots, v_T).$

Then (1) and (3) can be summarized as

(5) $AV = V\Lambda,$

(6) $V'V = I.$

If we multiply (5) on the left by $V'$ we obtain

(7) $V'AV = \Lambda;$

that is, the orthogonal matrix $V$ diagonalizes the symmetric matrix $A$. A quadratic form $y'Ay$ is transformed to diagonal form $x'\Lambda x$ by the linear transformation $y = Vx$. We shall use these transformations to simplify the problems of finding distributions of quadratic forms and serial correlation coefficients.

We are interested in systems of matrices, $A_0, A_1, \ldots, A_p$. Usually $A_0 = I$. Any nonzero vector is a characteristic vector of $I$, and any orthogonal matrix $V$ diagonalizes $I$ as $V'IV = I$. It is convenient if the same matrix $V$ diagonalizes

all the $A_j$; that is, if

(8) $V'A_j V = \Lambda_j, \qquad j = 1, \ldots, p.$

This will occur if every matrix $A_j$ has $v_1, \ldots, v_T$ as its characteristic vectors. The following theorem will be useful:

THEOREM 6.5.1. If $\lambda_1, \ldots, \lambda_T$ and $v_1, \ldots, v_T$ are characteristic roots and vectors of a matrix $A$ and $P(A)$ is a polynomial in $A$, then $P(\lambda_1), \ldots, P(\lambda_T)$ and $v_1, \ldots, v_T$ are characteristic roots and vectors of $P(A)$.

We use the following lemmas to prove this theorem:

LEMMA 6.5.1. If $\lambda_1, \ldots, \lambda_T$ and $v_1, \ldots, v_T$ are characteristic roots and vectors of $A$, then $\lambda_1^g, \ldots, \lambda_T^g$ and $v_1, \ldots, v_T$ are characteristic roots and vectors of $A^g$, $g = 0, 1, \ldots$.

PROOF OF LEMMA. We prove the lemma by induction. It obviously holds for $g = 1$. Suppose it holds for $g = G - 1$. Then

(9) $A^G v_s = A(A^{G-1} v_s) = A \lambda_s^{G-1} v_s = \lambda_s^{G-1}(A v_s) = \lambda_s^{G-1} \lambda_s v_s = \lambda_s^{G} v_s, \qquad s = 1, \ldots, T.$

LEMMA 6.5.2. If $\lambda_{h1}, \ldots, \lambda_{hT}$ and $v_1, \ldots, v_T$ are characteristic roots and vectors of $A_h$, $h = 0, \ldots, H$, then $\sum_{h=0}^{H} a_h \lambda_{h1}, \ldots, \sum_{h=0}^{H} a_h \lambda_{hT}$ and $v_1, \ldots, v_T$ are characteristic roots and vectors of $\sum_{h=0}^{H} a_h A_h$.

PROOF OF LEMMA. We have

(10) $\Big(\sum_{h=0}^{H} a_h A_h\Big) v_s = \sum_{h=0}^{H} a_h A_h v_s = \sum_{h=0}^{H} a_h \lambda_{hs} v_s = \Big(\sum_{h=0}^{H} a_h \lambda_{hs}\Big) v_s, \qquad s = 1, \ldots, T.$

PROOF OF THEOREM. Let

(11) $P(A) = a_0 I + a_1 A + \cdots + a_H A^H.$

Then the theorem follows from the two lemmas.

6.5.2. The Circular Model

One serial correlation that has simple mathematical properties and has been studied considerably is the circular serial correlation coefficient. It is based on a circular probability model.

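Let $B$ be the circulant

(12) $B = \begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & 0 & 0 & \cdots & 0 & 0 \end{bmatrix}.$

This matrix is orthogonal, $B^{-1} = B'$. Let

(13) $A_1 = \tfrac{1}{2}(B + B') = \tfrac{1}{2}\begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & 0 & 0 & \cdots & 1 & 0 \end{bmatrix}.$

Then

(14) $Q_1 = y'A_1 y = \sum_{t=1}^{T} y_t y_{t-1},$

where $y_0 \equiv y_T$. This amounts to $\sum_{t=2}^{T} y_t y_{t-1}$ plus the term $y_1 y_T$. In effect the last term is included for mathematical convenience. We define

(15) $A_j = \tfrac{1}{2}[B^j + (B')^j] = \tfrac{1}{2}(B^j + B^{-j}), \qquad j < \tfrac{1}{2}T.$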

This matrix has $\tfrac{1}{2}$'s on the diagonals $j$ elements above and $j$ elements below the main diagonal, corresponding to $\sum_t y_t y_{t-j}$, which is in the exponent of the density of a stationary Gaussian process of order at least $j$. $A_j$ also has $\tfrac{1}{2}$'s on the diagonals $T - j$ elements above and $T - j$ elements below the main diagonal. We have

(16) $A_1^g = (\tfrac{1}{2})^g [B + B^{-1}]^g.$

Thus $A_j$ can be written as a polynomial in $A_1$:

(18) $A_2 = 2A_1^2 - I, \quad A_3 = 4A_1^3 - 3A_1, \quad A_4 = 8A_1^4 - 8A_1^2 + I, \quad A_5 = 16A_1^5 - 20A_1^3 + 5A_1, \quad A_6 = 32A_1^6 - 48A_1^4 + 18A_1^2 - I.$
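The polynomial relations (18) can be verified directly for a particular $T$. The sketch below assumes NumPy; the helper names `circulant_shift` and `A` and the choice $T = 13$ are illustrative. It builds $A_j = \tfrac{1}{2}(B^j + B^{-j})$ from the circulant $B$ and compares it with the corresponding polynomial in $A_1$.

```python
import numpy as np

def circulant_shift(T):
    """Circulant B with ones on the superdiagonal and in the lower-left corner."""
    return np.roll(np.eye(T), 1, axis=1)

def A(j, T):
    """A_j = (B^j + B^{-j}) / 2; B is orthogonal, so B^{-j} is the transpose of B^j."""
    Bj = np.linalg.matrix_power(circulant_shift(T), j)
    return 0.5 * (Bj + Bj.T)

T = 13                      # illustrative size with j < T/2 for j = 2, ..., 6
I = np.eye(T)
A1 = A(1, T)
mp = np.linalg.matrix_power

# Right-hand sides of (18), written as polynomials in A_1
polynomials = {
    2: 2 * mp(A1, 2) - I,
    3: 4 * mp(A1, 3) - 3 * A1,
    4: 8 * mp(A1, 4) - 8 * mp(A1, 2) + I,
    5: 16 * mp(A1, 5) - 20 * mp(A1, 3) + 5 * A1,
    6: 32 * mp(A1, 6) - 48 * mp(A1, 4) + 18 * mp(A1, 2) - I,
}
all_match = all(np.allclose(A(j, T), poly) for j, poly in polynomials.items())
print(all_match)
```

The coefficients in (18) are those of the identity $\cos j\nu = P_j(\cos \nu)$, which is why the same polynomials reappear in Section 6.5.3.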

THEOREM 6.5.2. The characteristic roots of $A_1$ are

(19) $\cos 0 = 1, \ \cos\frac{2\pi}{T}, \ \cos\frac{2\pi}{T}, \ \cos\frac{4\pi}{T}, \ \cos\frac{4\pi}{T}, \ \ldots, \ \cos\frac{\pi(T-1)}{T}, \ \cos\frac{\pi(T-1)}{T}, \qquad T \text{ odd},$

(20) $\cos 0 = 1, \ \cos\frac{2\pi}{T}, \ \cos\frac{2\pi}{T}, \ \cos\frac{4\pi}{T}, \ \cos\frac{4\pi}{T}, \ \ldots, \ \cos\frac{\pi(T-2)}{T}, \ \cos\frac{\pi(T-2)}{T}, \ \cos\pi = -1, \qquad T \text{ even};$

the characteristic vector corresponding to the root 1 is $(1, 1, \ldots, 1)'$, the characteristic vectors corresponding to the roots $\cos 2\pi s/T$ ($s \neq 0, \tfrac{1}{2}T$) are

(21) $\Big(\cos\frac{2\pi s}{T}, \ \cos\frac{4\pi s}{T}, \ \ldots, \ \cos\frac{2\pi s(T-1)}{T}, \ 1\Big)', \qquad \Big(\sin\frac{2\pi s}{T}, \ \sin\frac{4\pi s}{T}, \ \ldots, \ \sin\frac{2\pi s(T-1)}{T}, \ 0\Big)',$

and the characteristic vector corresponding to the root $-1$ (if $T$ is even) is $(-1, 1, -1, \ldots, 1)'$.

PROOF. The equation defining the characteristic roots and vectors of $B$,

(22) $Bx = \nu x,$

is written in components as

(23) $x_{t+1} = \nu x_t, \qquad t = 1, \ldots, T - 1,$

(24) $x_1 = \nu x_T.$

Thus (23) implies

(25) $x_t = \nu^{t-1} x_1, \qquad t = 2, \ldots, T,$

and (24) then implies

(26) $x_1 = \nu^T x_1,$

(27) $\nu^T = 1.$

Thus the roots are the $T$th roots of 1, namely $1, e^{i2\pi/T}, e^{i4\pi/T}, \ldots, e^{i2\pi(T-1)/T}$. The characteristic vector corresponding to the root $e^{i2\pi s/T}$, $s = 0, 1, \ldots, T - 1$, has as its $t$th component $e^{i2\pi ts/T}$ if the first component is $e^{i2\pi s/T}$. Let this characteristic vector be $u_s$. Then

$A_1 u_s = \tfrac{1}{2}(B + B^{-1}) u_s = \tfrac{1}{2}\big(e^{i2\pi s/T} + e^{-i2\pi s/T}\big) u_s = \cos\frac{2\pi s}{T}\, u_s.$

If $s \neq 0, \tfrac{1}{2}T$, then $\cos(2\pi s/T) = \cos[2\pi(T - s)/T]$ is a double root. Corresponding to this double root, we take as characteristic vectors $\tfrac{1}{2}(u_s + u_{T-s})$ and $\tfrac{1}{2i}(u_s - u_{T-s})$, which have as $t$th components

$\cos\frac{2\pi ts}{T}, \qquad \sin\frac{2\pi ts}{T},$

respectively. This proves the theorem.
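Theorem 6.5.2 can be checked numerically for a given $T$. The following sketch assumes NumPy, with $T = 8$ chosen only for illustration. It confirms that the cosine and sine sequences satisfy $A_1 x = \cos(2\pi s/T)\,x$ and that the stated roots agree with those computed by a standard symmetric eigenvalue routine.

```python
import numpy as np

T = 8                                   # illustrative even sample size
B = np.roll(np.eye(T), 1, axis=1)       # the circulant B
A1 = 0.5 * (B + B.T)                    # A_1 = (B + B') / 2
t = np.arange(1, T + 1)

# Largest violation of A_1 x = cos(2 pi s / T) x over the vectors in (21)
residuals = []
for s in range(T):
    root = np.cos(2 * np.pi * s / T)
    for vec in (np.cos(2 * np.pi * s * t / T), np.sin(2 * np.pi * s * t / T)):
        residuals.append(np.max(np.abs(A1 @ vec - root * vec)))
max_residual = max(residuals)

# Compare the stated roots with those from a numerical eigenvalue routine
computed_roots = np.sort(np.linalg.eigvalsh(A1))
stated_roots = np.sort(np.cos(2 * np.pi * np.arange(T) / T))
print(max_residual < 1e-12, np.allclose(computed_roots, stated_roots))
```

For $s = 0$ and $s = \tfrac{1}{2}T$ the sine sequence is identically zero, so only the cosine sequence supplies a characteristic vector there, in agreement with the single roots $1$ and $-1$.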

THEOREM 6.5.3. The characteristic roots of $A_j$ are

(31) $\cos 0 = 1, \ \cos\frac{2\pi j}{T}, \ \cos\frac{2\pi j}{T}, \ \cos\frac{4\pi j}{T}, \ \cos\frac{4\pi j}{T}, \ \ldots, \ \cos\frac{\pi j(T-1)}{T}, \ \cos\frac{\pi j(T-1)}{T}, \qquad T \text{ odd},$

(32) $\cos 0 = 1, \ \cos\frac{2\pi j}{T}, \ \cos\frac{2\pi j}{T}, \ \cos\frac{4\pi j}{T}, \ \cos\frac{4\pi j}{T}, \ \ldots, \ \cos\frac{\pi j(T-2)}{T}, \ \cos\frac{\pi j(T-2)}{T}, \ \cos \pi j = (-1)^j, \qquad T \text{ even};$

the characteristic vector corresponding to the root 1 is $(1, 1, \ldots, 1)'$, the characteristic vectors corresponding to the roots $\cos 2\pi js/T$ ($s \neq 0, \tfrac{1}{2}T$) are (21), and the characteristic vector corresponding to the root $(-1)^j$ (if $T$ is even) is $(-1, 1, -1, \ldots, 1)'$.

PROOF. Since $A_j$ is a polynomial in $A_1$, it has the same characteristic vectors as $A_1$ and the characteristic roots are the polynomials in the roots of $A_1$ (by Theorem 6.5.1). Also, the characteristic vectors and roots of a power of $B$ are the vectors and powers of the roots of $B$. Thus

(33) $A_j u_s = \tfrac{1}{2}(B^j + B^{-j}) u_s = \tfrac{1}{2}\big(e^{i2\pi js/T} + e^{-i2\pi js/T}\big) u_s = \cos\frac{2\pi js}{T}\, u_s.$


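It will be observed that the characteristic vectors are the sequences of trigonometric functions that were used in the cyclic trends of Chapter 4. When these characteristic vectors are normalized, they form the orthogonal matrices of Section 4.2.2, namely

(34) $\sqrt{\frac{2}{T}}\begin{bmatrix} \cos\frac{2\pi}{T} & \sin\frac{2\pi}{T} & \cos\frac{4\pi}{T} & \cdots & \sin\frac{\pi(T-2)}{T} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \cos\frac{4\pi}{T} & \sin\frac{4\pi}{T} & \cos\frac{8\pi}{T} & \cdots & \sin\frac{2\pi(T-2)}{T} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 1 & 0 & 1 & \cdots & 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$

if $T$ is even and

$\sqrt{\frac{2}{T}}\begin{bmatrix} \cos\frac{2\pi}{T} & \sin\frac{2\pi}{T} & \cos\frac{4\pi}{T} & \cdots & \sin\frac{\pi(T-1)}{T} & \frac{1}{\sqrt{2}} \\ \cos\frac{4\pi}{T} & \sin\frac{4\pi}{T} & \cos\frac{8\pi}{T} & \cdots & \sin\frac{2\pi(T-1)}{T} & \frac{1}{\sqrt{2}} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & 0 & 1 & \cdots & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}$

if $T$ is odd. The same orthogonal matrix diagonalizes all of the quadratic forms. If we transform the vector $(y_1, \ldots, y_T)'$ by this matrix to $(z_1, \ldots, z_T)'$, the $j$th quadratic form is

$Q_j = \sum_{s=1}^{T} \cos\frac{2\pi j s}{T}\, z_s^2, \qquad j = 0, 1, \ldots, p.$

(Note $\cos 2\pi js/T$ is the same for $s = 0$ and $s = T$.) If the $y_t$'s are independently normally distributed with means 0 and variances $\sigma^2$ ($\gamma_0 = 1/\sigma^2$, $\gamma_1 = \cdots = \gamma_p = 0$), then the $z_t$'s are similarly distributed.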
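The canonical form of $Q_j$ in the circular model can be illustrated numerically. The sketch below assumes NumPy; the function name `trig_orthogonal_matrix`, the column ordering, the seed, and the values $T = 10$, $j = 3$ are illustrative. It builds the normalized trigonometric matrix, checks that it is orthogonal, and compares $y'A_j y$ with the weighted sum of squares of the transformed variables. Each column carries the frequency index $s$ whose root $\cos(2\pi js/T)$ weights the corresponding $z^2$.

```python
import numpy as np

def trig_orthogonal_matrix(T):
    """Normalized trigonometric columns and the frequency index s of each column."""
    t = np.arange(1, T + 1)
    cols, freqs = [], []
    for s in range(1, (T - 1) // 2 + 1):          # paired roots
        cols.append(np.cos(2 * np.pi * s * t / T))
        cols.append(np.sin(2 * np.pi * s * t / T))
        freqs += [s, s]
    if T % 2 == 0:                                # root (-1)^j, present only for even T
        cols.append((-1.0) ** t / np.sqrt(2))
        freqs.append(T // 2)
    cols.append(np.ones(T) / np.sqrt(2))          # root 1
    freqs.append(0)
    return np.sqrt(2 / T) * np.column_stack(cols), np.array(freqs)

T, j = 10, 3                                      # illustrative values
M, freqs = trig_orthogonal_matrix(T)

B = np.roll(np.eye(T), 1, axis=1)
Bj = np.linalg.matrix_power(B, j)
Aj = 0.5 * (Bj + Bj.T)

rng = np.random.default_rng(0)                    # arbitrary seed
y = rng.standard_normal(T)
z = M.T @ y                                       # transformed variables

Qj_direct = y @ Aj @ y
Qj_canonical = np.sum(np.cos(2 * np.pi * j * freqs / T) * z ** 2)
print(np.allclose(M.T @ M, np.eye(T)), np.isclose(Qj_direct, Qj_canonical))
```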


The $j$th-order circular serial correlation coefficient is

(37) $r_j = \dfrac{Q_j}{Q_0} = \dfrac{\sum_{t=1}^{T} y_t y_{t-j}}{\sum_{t=1}^{T} y_t^2} = \dfrac{\sum_{s=1}^{T} \cos\frac{2\pi j s}{T}\, z_s^2}{\sum_{s=1}^{T} z_s^2},$

where $y_{t-j} \equiv y_{t-j+T}$, $t = 1, \ldots, j$, $j = 1, \ldots, p$. It is called "circular" because the quadratic form $Q_j$ is made up by taking products of all $y_t$'s $j$ units apart when $y_1, \ldots, y_T$ are equally spaced on the circumference of a circle. The weights $\cos 2\pi s/T$ can be visualized by placing $T$ equidistant points on the unit circle with one point at $(1, 0)$ and dropping perpendiculars to the horizontal axis. The weights are the $x$-coordinates of these points.

The advantages of the circular model are that the roots occur in pairs for the first-order coefficient, except for possibly one or two roots, and all the matrices have the same characteristic vectors; it will be seen in Section 6.7 that when the roots occur in pairs except for possibly one or two roots the distribution of $r_1$ or $r_1^*$ can be given in closed form. There is a practical disadvantage in adding terms such as $y_1 y_T$, for such a term does not reflect serial dependence, and if there is a trend (which has been ignored) such a term may be numerically large and will tend to distort the measurement of dependence.
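The definition (37) translates directly into a short computation. The sketch below assumes NumPy and a series with known mean 0; the function names and the data are made up for illustration. The first function takes lagged products with indices modulo $T$. The second uses the discrete Fourier transform, whose squared moduli play the role of the $z_s^2$ up to pairing and a common scale factor.

```python
import numpy as np

def circular_serial_correlation(y, j):
    """r_j = sum_t y_t y_{t-j} / sum_t y_t^2, with indices taken modulo T."""
    y = np.asarray(y, dtype=float)
    return float(np.dot(y, np.roll(y, j)) / np.dot(y, y))

def circular_serial_correlation_spectral(y, j):
    """Same coefficient as a cosine-weighted average of squared DFT moduli."""
    y = np.asarray(y, dtype=float)
    power = np.abs(np.fft.fft(y)) ** 2
    s = np.arange(len(y))
    return float(np.sum(np.cos(2 * np.pi * j * s / len(y)) * power) / np.sum(power))

# Made-up series for illustration
y = np.array([1.0, 2.0, 0.0, -1.0, -2.0, 1.0])
r1 = circular_serial_correlation(y, 1)
r1_spectral = circular_serial_correlation_spectral(y, 1)

# Contribution of the wrap-around term y_1 * y_T to the numerator
wrap_term = y[0] * y[-1]
print(r1, r1_spectral, wrap_term)
```

For this series the numerator is $3$, of which the wrap-around product $y_1 y_T$ contributes $1$, so the extra term is not negligible in a short series.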

6.5.3. A Model Based on Successive Differences

Here we shall develop another system of matrices $A_0, A_1, \ldots, A_p$. In Section 3.4 we considered using sums of squares of differences to estimate the variance of a time series. If $y_1, \ldots, y_T$ have means 0, variances $\sigma^2$, and are uncorrelated,

$V_r = \dfrac{\sum_{t=1}^{T-r} (\Delta^r y_t)^2}{(T - r)\binom{2r}{r}}$

is an unbiased estimate of $\sigma^2$. If the means are possibly not 0 (but the $y_t$'s are uncorrelated), $V_r/V_q$ ($q > r$) can be used to test the null hypothesis that the trend $f(t) = \mathscr{E}y_t$ satisfies $\Delta^r f(t) = 0$ under the assumption that $\Delta^q f(t) = 0$. If the means are 0 and the $y_t$'s are possibly correlated, then the variate difference will reflect the correlation. In particular, for $r = 1$ the expected value of $2(T - 1)V_1$ is

$\mathscr{E}\sum_{t=2}^{T} (y_t - y_{t-1})^2 = \mathscr{E}\sum_{t=2}^{T} y_t^2 + \mathscr{E}\sum_{t=1}^{T-1} y_t^2 - 2\,\mathscr{E}\sum_{t=2}^{T} y_t y_{t-1}.$

If the successive pairs of $y_t$'s are positively correlated, $V_1$ will tend to underestimate the variance.

A statistic suggested for testing serial correlation is the ratio of the mean-square successive difference to the sum of squares. If the means are known to be 0, this statistic is

(40) $\dfrac{\sum_{t=2}^{T} (y_t - y_{t-1})^2}{\sum_{t=1}^{T} y_t^2} = 2\left[1 - \dfrac{\sum_{t=2}^{T} y_t y_{t-1} + \tfrac{1}{2}y_1^2 + \tfrac{1}{2}y_T^2}{\sum_{t=1}^{T} y_t^2}\right].$

As an alternative to the circular quadratic forms, (40) suggests defining $Q_0 = \sum_{t=1}^{T} y_t^2$ and $Q_1$ as the numerator in the fraction in the last term of (40). The matrix of this quadratic form is

(41) $A_1 = \tfrac{1}{2}\begin{bmatrix} 1 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 0 & 1 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 1 & 1 \end{bmatrix}.$

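This matrix differs from the matrix of the quadratic form $\sum_{t=2}^{T} y_t y_{t-1}$ by having $\tfrac{1}{2}$'s in the upper left-hand corner and in the lower right-hand corner. We can generate the subsequent matrices in this model by the polynomials in (18) [implied by (16) and (17)] in this $A_1$. These are the polynomials which we used in the circular case. The first are

(42) $A_2 = 2A_1^2 - I = \tfrac{1}{2}\begin{bmatrix} 0 & 1 & 1 & 0 & \cdots & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & \cdots & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & 1 & 0 \end{bmatrix},$

(43) $A_3 = 4A_1^3 - 3A_1 = \tfrac{1}{2}\begin{bmatrix} 0 & 0 & 1 & 1 & \cdots & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & \cdots & 1 & 1 & 0 & 0 \end{bmatrix}.$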

It will be noted that $A_j$ differs from the matrix of the desired form $\sum_t y_t y_{t-j}$ only by insertion of $j$ $\tfrac{1}{2}$'s in the upper left-hand corner and in the lower right-hand corner. Defining $A_j$, $j = 2, \ldots, p$, as polynomials in $A_1$ assures that the characteristic vectors of $A_j$ are the same as the characteristic vectors of $A_1$. The use of the same polynomials in this system as were used in the system of circular serial correlation coefficients implies that $A_j$ here will not differ very much from the matrix with $\tfrac{1}{2}$'s $j$ elements off the main diagonal because $A_1$ here differs from $\tfrac{1}{2}(B + B')$ only in corner elements, and hence $A_j$ differs from $\tfrac{1}{2}[B^j + (B')^j]$ by only a small number of elements when $j$ is small.

A characteristic vector $x$ with components $x_1, \ldots, x_T$ corresponding to a characteristic root $\lambda$ satisfying $A_1 x = \lambda x$ satisfies

(44) $\tfrac{1}{2}(x_1 + x_2) = \lambda x_1,$

(45) $\tfrac{1}{2}(x_{t-1} + x_{t+1}) = \lambda x_t, \qquad t = 2, \ldots, T - 1,$

(46) $\tfrac{1}{2}(x_{T-1} + x_T) = \lambda x_T.$

Equations (45) may be written

(47) $x_{t+1} - 2\lambda x_t + x_{t-1} = 0, \qquad t = 2, \ldots, T - 1,$

which is a second-order difference equation. The solution is

(48) $x_t = c_1 \xi_1^t + c_2 \xi_2^t,$

where $\xi_1$ and $\xi_2$ are the roots of the associated polynomial equation

(49) $x^2 - 2\lambda x + 1 = 0.$

The roots $\lambda \pm \sqrt{\lambda^2 - 1}$ are distinct unless $\lambda = 1$ or $\lambda = -1$. Since $\xi_1 \xi_2 = 1$ and $\xi_1 + \xi_2 = 2\lambda$, we can write (48) as

(50) $x_t = c_1 \xi^t + c_2 \xi^{-t}$

and

(51) $2\lambda = \xi + \xi^{-1}.$

Then (44) can be written as

$0 = x_2 + (1 - 2\lambda)x_1 = c_1\xi^2 + c_2\xi^{-2} + (1 - \xi - \xi^{-1})(c_1\xi + c_2\xi^{-1}) = c_1(1 - \xi^{-1})\xi + c_2(1 - \xi)\xi^{-1} = (\xi - 1)(c_1 - c_2\xi^{-1}).$

Thus $c_2 = c_1\xi$ and we can take $c_1 = \xi^{-1/2}$ and $c_2 = \xi^{1/2}$. Then

(52) $x_t = \xi^{t - \frac{1}{2}} + \xi^{-(t - \frac{1}{2})}.$

Equation (46) is

(53) $0 = (2\lambda - 1)x_T - x_{T-1} = (\xi + \xi^{-1} - 1)\big(\xi^{T-\frac{1}{2}} + \xi^{-(T-\frac{1}{2})}\big) - \big(\xi^{T-\frac{3}{2}} + \xi^{-(T-\frac{3}{2})}\big) = \xi^{T+\frac{1}{2}} + \xi^{-(T+\frac{1}{2})} - \xi^{T-\frac{1}{2}} - \xi^{-(T-\frac{1}{2})} = (\xi^{T} - \xi^{-T})(\xi^{\frac{1}{2}} - \xi^{-\frac{1}{2}}).$

Thus either $\xi = 1$ or $\xi^{2T} = 1$. The roots of $\xi^{2T} = 1$ are $e^{i2\pi s/(2T)}$, $s = 0, 1, \ldots, 2T - 1$. Since $e^{i2\pi s/(2T)} = e^{-i2\pi(2T - s)/(2T)}$, there are $T + 1$ different values of $\lambda = \tfrac{1}{2}(\xi + \xi^{-1})$ for $\xi = e^{i2\pi s/(2T)}$, $s = 0, 1, \ldots, T$. However, the value of $\lambda = -1$ corresponding to $\xi = -1$ for $s = T$ is inadmissible because $x_t = 0$. The remaining $\lambda$'s are characteristic roots of $A_1$ and since there are $T$ characteristic roots, these must constitute all of them. It will be convenient to multiply $e^{i(2t-1)\pi s/(2T)} + e^{-i(2t-1)\pi s/(2T)}$ by $\tfrac{1}{2}$ to obtain the $t$th component of the $s$th characteristic vector.

THEOREM 6.5.4. The characteristic roots of $A_1$ given by (41) are $\cos \pi s/T$, $s = 0, 1, \ldots, T - 1$, and the corresponding characteristic vectors are

(54) $\Big(\cos\frac{\pi s}{2T}, \ \cos\frac{3\pi s}{2T}, \ \ldots, \ \cos\frac{(2T-1)\pi s}{2T}\Big)'.$

COROLLARY 6.5.1. The characteristic roots of $A_j$ implied by (16) and (17) and based on $A_1$ given by (41) are $\cos \pi js/T$, $s = 0, 1, \ldots, T - 1$, and the corresponding characteristic vectors are (54).

PROOF. $A_j$ is defined as a polynomial in $A_1$, say $P_j(A_1)$, and the characteristic roots of $A_j$ are $P_j(\lambda_s)$, where $\lambda_s$ are the characteristic roots of $A_1$, $s = 0, 1, \ldots, T - 1$, by Theorem 6.5.1. The proof of Theorem 6.5.3, where $A_1 = \tfrac{1}{2}(B + B')$, implies $P_j(\cos \nu) = \cos j\nu$. Corollary 6.5.1 now follows from Theorem 6.5.4.
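Theorem 6.5.4 and Corollary 6.5.1 can be checked numerically for a small case. The sketch below assumes NumPy; the helper name `successive_difference_A1` and the choice $T = 7$ are illustrative. It forms $A_1$ of the successive-difference model, generates $A_2 = 2A_1^2 - I$, and confirms that the cosine vectors (54) are characteristic vectors of both, with roots $\cos(\pi s/T)$ and $\cos(2\pi s/T)$.

```python
import numpy as np

def successive_difference_A1(T):
    """A_1 with 1/2 on the adjacent diagonals and 1/2 in the two diagonal corners."""
    A1 = 0.5 * (np.eye(T, k=1) + np.eye(T, k=-1))
    A1[0, 0] = A1[-1, -1] = 0.5
    return A1

T = 7                                    # illustrative sample size
A1 = successive_difference_A1(T)
A2 = 2 * A1 @ A1 - np.eye(T)             # second-order matrix from the polynomial 2x^2 - 1
t = np.arange(1, T + 1)

worst = 0.0
for s in range(T):
    x = np.cos((2 * t - 1) * np.pi * s / (2 * T))     # the vector (54)
    worst = max(
        worst,
        np.max(np.abs(A1 @ x - np.cos(np.pi * s / T) * x)),
        np.max(np.abs(A2 @ x - np.cos(2 * np.pi * s / T) * x)),
    )
print(worst < 1e-12)

# First row of A_2: the lag-two entry plus one extra 1/2 next to the corner
print(A2[0, :4])
```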

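COROLLARY 6.5.2. The orthogonal matrix that diagonalizes $A_j$ of Corollary 6.5.1 is

(55) $\sqrt{\frac{2}{T}}\begin{bmatrix} \frac{1}{\sqrt{2}} & \cos\frac{\pi}{2T} & \cos\frac{2\pi}{2T} & \cdots & \cos\frac{(T-1)\pi}{2T} \\ \frac{1}{\sqrt{2}} & \cos\frac{3\pi}{2T} & \cos\frac{6\pi}{2T} & \cdots & \cos\frac{3(T-1)\pi}{2T} \\ \vdots & \vdots & \vdots & & \vdots \\ \frac{1}{\sqrt{2}} & \cos\frac{(2T-1)\pi}{2T} & \cos\frac{(2T-1)2\pi}{2T} & \cdots & \cos\frac{(2T-1)(T-1)\pi}{2T} \end{bmatrix}.$

All of the matrices are diagonalized by the same orthogonal matrix. In terms of the transformed variables, the $j$th quadratic form is

(56) $Q_j = \sum_{s=1}^{T} \cos\frac{\pi j(s-1)}{T}\, z_s^2, \qquad j = 0, 1, \ldots, p.$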

The matrices developed in this section are closely related to the matrices used in the variate-difference method (Section 3.4). Let

Sr = J / W = ^c«)y'y»

(57)

(-1

… 0 and $\gamma_1 = \cdots = \gamma_i = 0$, the joint distribution of $r_1^*, \ldots, r_i^*$ is independent of $\gamma_0$ and $Q_0^*$. (See Theorem 6.7.2.) The definitions of the optimum tests (13) and (21) can then be given in terms of $r_i^*$ conditional on $r_1^*, \ldots, r_{i-1}^*$. The theorems of Section 6.4 on optimum multiple-decision procedures can similarly be phrased in terms of $Q_0^*, \ldots, Q_p^*$ when $\mathscr{E}y_t = \mu$ and $\varepsilon$ is a characteristic vector of $A_0, \ldots, A_p$.

6.6.2. Certain Regression Functions

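More general mean value functions can be included in the treatment of nonzero means. Suppose

(27) $\mathscr{E}y = \sum_{\nu=1}^{m} \alpha_\nu v_{s_\nu},$

where $v_1, \ldots, v_T$ are characteristic vectors,

$A_j v_t = \lambda_{jt} v_t, \qquad t = 1, \ldots, T, \quad j = 0, 1, \ldots, p,$

and $s_1, \ldots, s_m$ is a subset of $1, \ldots, T$. The vectors are orthogonal: $v_s' v_t = \delta_{st}$. Then the exponent in the normal density is $-\tfrac{1}{2}$ times

$\sum_{j=0}^{p} \gamma_j \Big(y - \sum_{\nu=1}^{m} \alpha_\nu v_{s_\nu}\Big)' A_j \Big(y - \sum_{\nu=1}^{m} \alpha_\nu v_{s_\nu}\Big) = \sum_{j=0}^{p} \gamma_j Q_j - 2\sum_{\nu=1}^{m} \alpha_\nu \Big(\sum_{j=0}^{p} \gamma_j \lambda_{j s_\nu}\Big) y' v_{s_\nu} + \sum_{\nu=1}^{m} \alpha_\nu^2 \sum_{j=0}^{p} \gamma_j \lambda_{j s_\nu}.$

Thus $Q_0, \ldots, Q_p$ and $y'v_{s_1}, \ldots, y'v_{s_m}$ form a sufficient set of statistics for the parameters $\gamma_0, \ldots, \gamma_p$ and $\alpha_1, \ldots, \alpha_m$. An equivalent sufficient set of statistics is

$Q_j^* = \Big(y - \sum_{\nu=1}^{m} a_\nu v_{s_\nu}\Big)' A_j \Big(y - \sum_{\nu=1}^{m} a_\nu v_{s_\nu}\Big), \qquad j = 0, 1, \ldots, p,$

and

(31) $a_\nu = \dfrac{v_{s_\nu}' y}{v_{s_\nu}' v_{s_\nu}}, \qquad \nu = 1, \ldots, m.$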

Note that the least squares estimates (31) of $\alpha_\nu$, $\nu = 1, \ldots, m$, obtained by minimizing $\big(y - \sum_{\nu=1}^{m} \alpha_\nu v_{s_\nu}\big)'\big(y - \sum_{\nu=1}^{m} \alpha_\nu v_{s_\nu}\big)$, are also the Markov estimates, that is, the best linear unbiased estimates, because the $v_{s_\nu}$ are characteristic vectors of $\sum_{j=0}^{p} \gamma_j A_j$. (See Section 2.4.)

As an example, consider $A_0 = I, A_1, \ldots, A_p$ of the circular model of Section 6.5.2. The characteristic roots of $A_j$ are $\cos 2\pi sj/T$, $s = 1, \ldots, T$; and the characteristic vectors have $t$th components $\cos 2\pi st/T$ and $\sin 2\pi st/T$, $s = 1, \ldots, \tfrac{1}{2}(T - 2)$ if $T$ is even or $\tfrac{1}{2}(T - 1)$ if $T$ is odd; $(1, 1, \ldots, 1)'$ corresponds to the root $1 = \cos 2\pi Tj/T$, and if $T$ is even $(-1, 1, \ldots, 1)'$ corresponds to the root $(-1)^j = \cos 2\pi \tfrac{1}{2}Tj/T$. These characteristic vectors are the sequences that were used in the periodic trends studied in Chapter 4. Suppose that

(32) $\mathscr{E}y_t = \mu_t = \alpha_0 + \sum_{h=1}^{q} \Big[\alpha(\nu_h) \cos\frac{2\pi \nu_h}{T} t + \beta(\nu_h) \sin\frac{2\pi \nu_h}{T} t\Big],$

where $\nu_1, \ldots, \nu_q$ form a subset of the integers $1, \ldots, \tfrac{1}{2}(T - 2)$ if $T$ is even or $\tfrac{1}{2}(T - 1)$ if $T$ is odd, or suppose that


if $T$ is even. The index $m$ in (27) corresponds to $2q + 1$ for (32) and $2q + 2$ for (33). Then the least squares estimates of the coefficients are given by

(34)

(36) if T is even. Let

or

if T is even. The jth-order circular serial correlation coefficient using residuals from the fitted periodic trend is

:=I

or

(40)

*

rj =

h=l

h-1

:=I

T

if T is even. [See R. L. Anderson and T. W. Anderson (1950).]


Best tests and multiple-decision procedures can be phrased for the cases considered in this subsection.

Example 4.1 in Chapter 4 gave monthly receipts of butter at five markets over three years. The fitted trend $m_t$ based on twelve trigonometric functions (including the constant) was given in Table 4.1. The circular serial correlation of first order based on the residuals is

(41) $r_1^* = \dfrac{(2.4)(-0.6) + (-0.6)(-4.6) + \cdots + (1.6)(2.5) + (2.5)(2.4)}{(2.4)^2 + (-0.6)^2 + \cdots + (1.6)^2 + (2.5)^2} = \dfrac{232.18}{474.51} = 0.489.$

The computation in (41) is carried out in rounded-off residuals for the sake of easier reading.

6.7. DISTRIBUTIONS OF SERIAL CORRELATION COEFFICIENTS

6.7.1. The General Problem

The various serial correlation coefficients discussed in earlier sections are ratios of quadratic forms, say $s_j = P_j/P_0$, $j = 1, \ldots, p$, where $P_j = y'B_j y$, $j = 0, 1, \ldots, p$. We are interested in their distributions when $y$ has a multivariate normal distribution with covariance matrix $\Sigma$. The general discussion will be carried out in broad terms because of applications to other problems. In the applications in this chapter the density has in the exponent the matrix $\Sigma^{-1} = \sum_{j=0}^{p} \gamma_j A_j$. In the simplest cases $B_j = A_j$; in some other cases

(1) $B_j = \Big(I - \tfrac{1}{T}\varepsilon\varepsilon'\Big) A_j \Big(I - \tfrac{1}{T}\varepsilon\varepsilon'\Big),$

where $\varepsilon = (1, 1, \ldots, 1)'$, and then

(2) $P_j = (y - \bar{y}\varepsilon)' A_j (y - \bar{y}\varepsilon).$

In still other cases $B_j$ is such that $P_j$ is a quadratic form in the residuals of $y$ from a more general regression function; that is,

(3) $B_j = \big(I - V^*(V^{*\prime}V^*)^{-1}V^{*\prime}\big) A_j \big(I - V^*(V^{*\prime}V^*)^{-1}V^{*\prime}\big),$

(4) $P_j = \big(y - V^*(V^{*\prime}V^*)^{-1}V^{*\prime}y\big)' A_j \big(y - V^*(V^{*\prime}V^*)^{-1}V^{*\prime}y\big),$

where $V^*$ is a matrix with $T$ rows and rank equal to the number of columns. When $B_j = A_j$ we shall write $Q_j$ for $P_j$ and $r_j$ for $s_j$. When $B_j$ is given by (1) we shall write $Q_j^*$ for $P_j$ and $r_j^*$ for $s_j$.

THEOREM 6.7.1. The joint distribution of $P_0, \ldots, P_p$ given by (4) does not depend on $\beta$ when $y$ is distributed according to $N(V^*\beta, \Sigma)$.

PROOF. Let $y = u + V^*\beta$; then $u$ is distributed according to $N(0, \Sigma)$ and

(5) $y - V^*(V^{*\prime}V^*)^{-1}V^{*\prime}y = \big(I - V^*(V^{*\prime}V^*)^{-1}V^{*\prime}\big)(u + V^*\beta) = \big(I - V^*(V^{*\prime}V^*)^{-1}V^{*\prime}\big)u,$

(6) $P_j = \big(u - V^*(V^{*\prime}V^*)^{-1}V^{*\prime}u\big)' A_j \big(u - V^*(V^{*\prime}V^*)^{-1}V^{*\prime}u\big).$

Q.E.D.
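The invariance asserted by Theorem 6.7.1 holds realization by realization, not only in distribution, as (5) shows. A minimal numerical sketch follows, assuming NumPy; the regressor matrix `V_star`, the seed, and the coefficient vectors are arbitrary illustrative choices. The quadratic form in the least squares residuals is computed for the same disturbance $u$ under three different values of $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)                   # arbitrary seed
T, m = 9, 2
V_star = rng.standard_normal((T, m))             # illustrative full-rank regressors
A1 = 0.5 * (np.eye(T, k=1) + np.eye(T, k=-1))    # any symmetric A_j would serve
u = rng.standard_normal(T)                       # one realization of the disturbance

def P(y, A, V):
    """Quadratic form (4) in the least squares residuals of y on the columns of V."""
    resid = y - V @ np.linalg.solve(V.T @ V, V.T @ y)
    return float(resid @ A @ resid)

betas = (np.zeros(m), np.array([3.0, -5.0]), np.array([100.0, 0.25]))
values = [P(u + V_star @ beta, A1, V_star) for beta in betas]
print(np.allclose(values, values[0]))
```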

If $\Sigma^{-1} = \sum_{j=0}^{p} \gamma_j A_j$ and $B_j = A_j$ ($P_j = Q_j$ and $s_j = r_j$), then the conditional density of $Q_i$, given $Q_0, \ldots, Q_{i-1}$, is

(7) $h_i(Q_i \mid Q_0, \ldots, Q_{i-1}; \gamma_i) = \dfrac{e^{-\frac{1}{2}\gamma_i Q_i}\, k_i(Q_0, \ldots, Q_i)}{m_i(Q_0, \ldots, Q_{i-1}; \gamma_i)},$

where

(8) $m_i(Q_0, \ldots, Q_{i-1}; \gamma_i) = \displaystyle\int_{-\infty}^{\infty} e^{-\frac{1}{2}\gamma_i Q_i}\, k_i(Q_0, \ldots, Q_i)\, dQ_i$

(assumed to be positive); thus it does not depend on $\gamma_0, \ldots, \gamma_{i-1}$. Equivalently, the conditional distribution of $r_i$, given $Q_0$ and $r_1, \ldots, r_{i-1}$, does not depend on $\gamma_0, \ldots, \gamma_{i-1}$. It is this conditional distribution of $r_i$ that will be used in testing the null hypothesis that the order of dependence is less than $i$ (that is, $\gamma_i = 0$). It will be convenient to obtain the conditional distribution on the simple assignment of parameter values of $\gamma_0 = 1$, $\gamma_1 = \cdots = \gamma_{i-1} = 0$. This conditional distribution when $\gamma_i = 0$ does not depend on $Q_0$ (because $r_1, \ldots, r_i$ are invariant with respect to scale transformations $y_t \to cy_t$). The following lemma and theorem will be useful.

LEMMA 6.7.1. The conditional density of $\bar{Q}_i$, given $\bar{Q}_0 = a_0, \ldots, \bar{Q}_{i-1} = a_{i-1}$, where $\bar{Q}_0 = c^2 Q_0, \ldots, \bar{Q}_i = c^2 Q_i$, is the conditional density of $Q_i$, given $Q_0 = a_0, \ldots, Q_{i-1} = a_{i-1}$, with $\gamma_i$ replaced by $\gamma_i/c^2$, namely $h_i(\bar{Q}_i \mid \bar{Q}_0, \ldots, \bar{Q}_{i-1}; \gamma_i/c^2)$.

PROOF. The quadratic form $\bar{Q}_j = c^2 y'A_j y = (cy)'A_j(cy)$ can be written $\bar{Q}_j = x'A_j x$, $j = 0, \ldots, i$, where $x = cy$ has the density with exponent equal to $-\tfrac{1}{2}$ times $\sum_{j=0}^{p} \gamma_j (c^{-1}x)'A_j(c^{-1}x) = \sum_{j=0}^{p} (\gamma_j/c^2)\bar{Q}_j$. Thus $\bar{Q}_0, \ldots, \bar{Q}_i$ have the joint density of $Q_0, \ldots, Q_i$ with $\gamma_j$ replaced by $\gamma_j/c^2$, $j = 0, \ldots, i$; the lemma follows.


THEOREM 6.7.2. The conditional density of $r_i = Q_i/Q_0$, given $Q_0$ and $r_j = Q_j/Q_0$, $j = 1, \ldots, i - 1$, is

(9) $h_i(r_i \mid 1, r_1, \ldots, r_{i-1}; \gamma_i Q_0).$

If $\gamma_i = 0$, this conditional density does not depend on $Q_0$.

PROOF. The conditional density of $r_i$, given $Q_0 = a_0$ and $r_j = b_j$, $j = 1, \ldots, i - 1$, is that of $c^2 Q_i$ for $c^2 = 1/Q_0$, given $c^2 Q_0 = 1$ and $r_j = c^2 Q_j = b_j$ (with $\gamma_i$ replaced by $\gamma_i/c^2 = \gamma_i Q_0$), by Lemma 6.7.1. This gives the first part of the theorem. If $\gamma_i = 0$, it will be seen that the conditional density (9) does not depend on $Q_0$ by substitution of the arguments of (9) into the definitions (7) and (8).

If $\gamma_1 = \cdots = \gamma_i = 0$, then Theorem 6.7.2 implies that the joint conditional density of $r_1, \ldots, r_i$ given $Q_0$ [the product of $i$ conditional densities of the form of (9)] does not depend on $Q_0$. Thus, the conditional density of $r_i$ given $Q_0$ does not depend on $Q_0$. It should be noted that Lemma 6.7.1 and Theorem 6.7.2 hold similarly for $Q_j^*$ and $r_j^*$.

6.7.2. Characteristic Functions

One approach to finding the distributions of $P_0, \ldots, P_p$ or $s_1, \ldots, s_p$ is to use the characteristic function of $P_0, \ldots, P_p$.

THEOREM 6.7.3. If $y$ is distributed according to $N(0, \Sigma)$, the characteristic function of $P_0, \ldots, P_p$ is

(10) $\mathscr{E} e^{i(t_0 P_0 + \cdots + t_p P_p)} = \dfrac{1}{\sqrt{|I - 2i(t_0 B_0 + \cdots + t_p B_p)\Sigma|}}.$

PROOF. For purely imaginary $t_0, \ldots, t_p$ sufficiently small, (10) can be obtained from the fact that the integral of $e^{-\frac{1}{2}y'Ay}$ is $|A|^{-\frac{1}{2}}(2\pi)^{\frac{1}{2}T}$; to justify (10) for real $t_0, \ldots, t_p$ we reduce the multivariate integral to a product of univariate integrals. For fixed real $t_0, \ldots, t_p$ let $C$ be a matrix such that

(11) $C'\Sigma^{-1}C = I,$

(12) $C'(t_0 B_0 + \cdots + t_p B_p)C = D,$

where $D$ is diagonal. (If $A$ is positive definite and $B$ is symmetric, there is a matrix $C$ such that $C'AC = I$ and $C'BC$ is diagonal. See Problem 30.) Then (when $y = Cz$)

(13) $\mathscr{E} e^{i(t_0 P_0 + \cdots + t_p P_p)} = \mathscr{E} e^{i z'Dz} = \prod_{s=1}^{T} \mathscr{E} e^{i d_{ss} z_s^2} = \prod_{s=1}^{T} \dfrac{1}{\sqrt{1 - 2i d_{ss}}},$

because $1/\sqrt{1 - 2it}$ is the characteristic function of $z_s^2$ ($\chi^2$ with 1 degree of freedom) when $z_s$ is distributed according to $N(0, 1)$. Using (11) and (12) again, we derive from (13)

(14) $\mathscr{E} e^{i(t_0 P_0 + \cdots + t_p P_p)} = \dfrac{1}{\sqrt{\prod_{s=1}^{T}(1 - 2i d_{ss})}} = \dfrac{1}{\sqrt{|I - 2iD|}} = \dfrac{1}{\sqrt{|C'| \cdot |\Sigma^{-1} - 2i(t_0 B_0 + \cdots + t_p B_p)| \cdot |C|}},$

which is equal to (10). Q.E.D.
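The simultaneous reduction used in the proof can be carried out numerically. The following sketch assumes NumPy; the matrices `Sigma` and `Bmat` are arbitrary illustrative choices, with `Bmat` standing for $t_0 B_0 + \cdots + t_p B_p$. It constructs a matrix $C$ satisfying (11) and (12) from a Cholesky factor and a symmetric eigendecomposition, then compares the determinant in (10) with the product $\prod_s (1 - 2i d_{ss})$. Determinants rather than their square roots are compared, to avoid any question of which branch of the complex square root is taken.

```python
import numpy as np

rng = np.random.default_rng(2)                 # arbitrary seed
T = 5
G = rng.standard_normal((T, T))
Sigma = G @ G.T + T * np.eye(T)                # illustrative positive definite covariance
H = rng.standard_normal((T, T))
Bmat = 0.5 * (H + H.T)                         # stands for t_0 B_0 + ... + t_p B_p

L = np.linalg.cholesky(Sigma)                  # Sigma = L L'
d, W = np.linalg.eigh(L.T @ Bmat @ L)          # L' B L = W diag(d) W'
C = L @ W                                      # then C' Sigma^{-1} C = I and C' B C = diag(d)

lhs = np.linalg.det(np.eye(T) - 2j * Bmat @ Sigma)
rhs = np.prod(1 - 2j * d)
print(np.allclose(C.T @ np.linalg.solve(Sigma, C), np.eye(T)),
      np.allclose(C.T @ Bmat @ C, np.diag(d)),
      np.isclose(lhs, rhs))
```

This construction of $C$ is one convenient choice and is not necessarily the one intended in Problem 30.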

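COROLLARY 6.7.1. If $y$ is distributed with density $K \exp\big[-\tfrac{1}{2}\sum_{j=0}^{p} \gamma_j Q_j\big]$, the characteristic function of $Q_0 = y'A_0 y, \ldots, Q_p = y'A_p y$ is

$\mathscr{E} e^{i(t_0 Q_0 + \cdots + t_p Q_p)} = \dfrac{\big|\sum_{j=0}^{p} \gamma_j A_j\big|^{\frac{1}{2}}}{\big|\sum_{j=0}^{p} (\gamma_j - 2it_j) A_j\big|^{\frac{1}{2}}}.$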

In principle, the characteristic function can be inverted to find the joint distribution of $P_0, P_1, \ldots, P_p$, though in practice it cannot be carried out in all cases. It can be used to obtain the joint distribution of $s_1, \ldots, s_p$ by use of

(16) $\Pr\{s_1 \le w_1, \ldots, s_p \le w_p\} = \Pr\{P_1 - w_1 P_0 \le 0, \ldots, P_p - w_p P_0 \le 0\}.$

The characteristic function of $P_1 - w_1 P_0, \ldots, P_p - w_p P_0$ is obtained from Theorem 6.7.3 by letting $t_0 = -w_1 t_1 - \cdots - w_p t_p$.

6.7.3. Canonical Forms

As we have observed in Section 6.5, if $v_t$ is the $t$th normalized characteristic vector of $B$ and $\phi_t$ the corresponding characteristic root, then

(17) $V'BV = \Phi, \qquad V'V = I,$

where $\Phi$ is a diagonal matrix with $\phi_1, \ldots, \phi_T$ as diagonal elements and $V = (v_1, \ldots, v_T)$. (If the orthogonality conditions $v_t'v_s = 0$, $t \neq s$, are not necessarily satisfied, linear combinations of the $v_t$'s corresponding to common roots can be defined as the characteristic vectors so as to satisfy them.) If the columns of $V$ are the characteristic vectors of $B_0, \ldots, B_p$ with characteristic roots $\phi_{01}, \ldots, \phi_{0T}$; $\phi_{11}, \ldots, \phi_{1T}$; $\ldots$; $\phi_{p1}, \ldots, \phi_{pT}$, respectively, then

(18) $V'B_j V = \Phi_j, \qquad V'V = I, \qquad j = 0, 1, \ldots, p,$

where $\Phi_j$ is diagonal with $\phi_{j1}, \ldots, \phi_{jT}$ as its diagonal elements. Then when $B_0 = I$ the quadratic forms can be written

(19) $P_0 = \sum_{t=1}^{T} z_t^2, \qquad P_j = \sum_{t=1}^{T} \phi_{jt} z_t^2, \qquad j = 1, \ldots, p,$

where $y = Vz$. A necessary and sufficient condition for $B_1, \ldots, B_p$ to have a common set of characteristic vectors is

(20) $B_j B_k = B_k B_j, \qquad j, k = 1, \ldots, p.$

(See Problem 31.) We are particularly interested in the case where $\Sigma^{-1} = \sum_{j=0}^{p} \gamma_j A_j$, $A_0 = I$, and

(21) $B_j = [I - V^*V^{*\prime}] A_j [I - V^*V^{*\prime}], \qquad j = 0, 1, \ldots, p,$

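where $V^*$ is a matrix whose columns are columns of $V$, namely the characteristic vectors of $A_j$ corresponding to roots which are the diagonal elements of the diagonal matrix $\Lambda_j^*$, $j = 0, 1, \ldots, p$. Suppose that the roots and vectors are numbered in such a way that $V^*$ consists of the last $m$ columns of $V$ and $\Lambda_j^*$ is the lower right-hand corner of $\Lambda_j$. It is assumed that $V'A_j V = \Lambda_j$, where $\Lambda_j$ is diagonal with diagonal elements $\lambda_{j1}, \ldots, \lambda_{jT}$, and $V'V = I$. Then

(22) $V'V^* = \begin{pmatrix} 0 \\ I \end{pmatrix},$

(23) $V'(I - V^*V^{*\prime}) = V' - \begin{pmatrix} 0 \\ I \end{pmatrix} V^{*\prime},$

(24) $V'B_j V = \operatorname{diag}(\lambda_{j1}, \ldots, \lambda_{j,T-m}, 0, \ldots, 0).$

Thus (for $y = Vz$)

(25) $P_j = y'B_j y = z'V'B_j V z = \sum_{t=1}^{T} \lambda_{jt} z_t^2 - \sum_{t=T-m+1}^{T} \lambda_{jt} z_t^2 = \sum_{t=1}^{T-m} \lambda_{jt} z_t^2.$

If the distribution of $y$ is $N(V^*\beta, \Sigma)$, where $\Sigma^{-1} = \sum_{j=0}^{p} \gamma_j A_j$, then the density of $z = V^{-1}y$ is

$K \exp\left\{-\tfrac{1}{2}\left[\sum_{t=1}^{T-m} \Big(\sum_{j=0}^{p} \gamma_j \lambda_{jt}\Big) z_t^2 + \sum_{t=T-m+1}^{T} \Big(\sum_{j=0}^{p} \gamma_j \lambda_{jt}\Big)(z_t - \beta_t)^2\right]\right\},$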

where the components of p have been numbered from T m 1 to T. In the case of the circular model E = (1, . . . , 1)' is a characteristic vector


of $A_j$ corresponding to the characteristic root 1 (the $T$th root), $j = 0, 1, \ldots, p$. The canonical form of $Q_j^*$ with residuals from the sample mean is

(28) $Q_j^* = \sum_{t=1}^{T-1} \cos\dfrac{2\pi jt}{T}\, z_t^2, \qquad T = 2, 3, \ldots.$

In the case of the mean square successive difference model $\varepsilon$ is also a characteristic vector of $A_j$ corresponding to the root 1 and

(29) $Q_j^* = \sum_{t=1}^{T-1} \cos\dfrac{\pi jt}{T}\, z_t^2, \qquad T = 2, 3, \ldots.$

In the case of the third model $\varepsilon$ is not a characteristic vector of $A_j$, $j = 1, \ldots, p$, and a reduction of $Q_j^*$ to a weighted sum of $T - 1$ squares is not possible.

In general we may have $p + 1$ quadratic forms $P_0, \ldots, P_p$ defining $p$ serial correlations $s_1 = P_1/P_0, \ldots, s_p = P_p/P_0$ and a covariance matrix $\Sigma$. It is always possible to find a matrix $C$ such that $C'\Sigma^{-1}C = I$ [equivalently $C^{-1}\Sigma(C')^{-1} = I$] and $C'B_jC$ is diagonal for some $j$. Unless special relations hold among these matrices, however, only one $B_j$ can be diagonalized by some one matrix $C$. Thus if $\Sigma$ is transformed to $I$ and $s_j = x'(C'B_jC)x/[x'(C'B_0C)x]$, then either the numerator or denominator quadratic form may be a weighted sum of squares, but not necessarily both. The following is a corollary to Theorem 6.7.3:

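COROLLARY 6.7.2. If $y$ is distributed according to $N(V^*\mu, \Sigma)$, where $\Sigma^{-1} = \sum_{j=0}^{p} \gamma_jA_j$ and the columns of $V^*$ are characteristic vectors of $A_j$ corresponding to characteristic roots $\lambda_{jt}$, $t = T - m + 1, \ldots, T$, $j = 0, 1, \ldots, p$, the characteristic function of $P_j^* = (y - V^*b)'A_j(y - V^*b)$, $j = 0, 1, \ldots, p$, where $b$ is the least squares estimate of $\mu$, is

$\mathcal{E}\exp\left(i\sum_{j=0}^{p} t_jP_j^*\right) = \prod_{t=1}^{T-m} \dfrac{\left(\sum_{j=0}^{p}\gamma_j\lambda_{jt}\right)^{1/2}}{\left[\sum_{j=0}^{p}(\gamma_j - 2it_j)\lambda_{jt}\right]^{1/2}}.$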

6.7.4. Distributions of Serial Correlation Coefficients with Paired Roots Under Independence

To carry out the procedures derived in Sections 6.3 and 6.4 we need the conditional distribution of $r_i$, given $Q_0$ and $r_1, \ldots, r_{i-1}$. In particular, to determine a test of the null hypothesis $\gamma_1 = 0$ (given $\gamma_j = 0$, $j > 1$) at a specified


significance level we want the distribution of $r_1$ when $\gamma_1 = 0$; this distribution does not depend on $\gamma_0$. We shall therefore consider the (marginal) distribution of a single serial correlation coefficient when $\gamma_1 = \cdots = \gamma_p = 0$. Since $A_0 = I$, the observations are independent. The distribution of a serial correlation coefficient can be expressed simply when the characteristic roots occur in pairs. Inasmuch as the distribution may be valid for other serial correlations, we shall give a general treatment and hence shall drop the index. Later we shall discuss the conditional distribution of $r_i$, given $r_1, \ldots, r_{i-1}$. Suppose $T - m = 2H$, and that there are $H$ different roots

(31) $\lambda_{jh} = \lambda_{j,T-m+1-h} = \nu_h, \qquad h = 1, \ldots, H.$

For instance, in the case of the first-order circular serial correlation with residuals from the mean, $m = 1$, $T$ is odd, and $\nu_h = \cos 2\pi h/T$. Let

(32) $x_h = z_h^2 + z_{T-m+1-h}^2, \qquad h = 1, \ldots, H.$

Then

(33) $r = \dfrac{\sum_{h=1}^{H} \nu_hx_h}{\sum_{h=1}^{H} x_h}.$

When $\gamma_j = 0$, $j > 0$, the $z_t$'s are independently distributed with variance 1 ($\gamma_0 = 1$). Then $x_1, \ldots, x_H$ are independently, identically distributed, each according to a $\chi^2$-distribution with 2 degrees of freedom. The density of $x_h$ is $\frac{1}{2}e^{-\frac{1}{2}x_h}$, and the joint density of $x_1, \ldots, x_H$ is

(34) $2^{-H}\exp\left(-\tfrac{1}{2}\sum_{h=1}^{H} x_h\right), \qquad x_h \ge 0, \quad h = 1, \ldots, H,$

and 0 elsewhere. The surfaces of constant density are hyperplanes in the positive orthant

(35) $\sum_{h=1}^{H} x_h = c, \qquad x_h \ge 0, \quad h = 1, \ldots, H,$

for $c > 0$. Thus the conditional distribution of $x_1, \ldots, x_H$ given (35) is uniform on that $(H - 1)$-dimensional regular tetrahedron. (If $H = 3$, the distribution is uniform on the equilateral triangle.)
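The consequence of this uniformity, that the ratio $r$ does not depend on the total $\sum x_h$, can be illustrated by simulation. The following is only a rough sketch: the roots in `nu`, the seed, and the sample size are hypothetical choices, and a near-zero sample correlation is a necessary consequence of independence, not a proof of it.

```python
import math
import random


def sample_ratio_and_total(nu, rng):
    """Draw x_h ~ chi-square(2) and return (r, total) for weights nu."""
    # A chi-square variable with 2 d.f. is exponential with mean 2,
    # i.e. it has density (1/2) * exp(-x/2).
    x = [rng.expovariate(0.5) for _ in nu]
    total = sum(x)
    r = sum(v * xh for v, xh in zip(nu, x)) / total
    return r, total


def sample_correlation(pairs):
    """Plain sample correlation coefficient of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs) / n
    var_b = sum((b - mean_b) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)            # fixed seed so the run is repeatable
nu = [0.8, 0.1, -0.6]             # hypothetical roots, H = 3
pairs = [sample_ratio_and_total(nu, rng) for _ in range(20000)]
corr = sample_correlation(pairs)  # should be close to 0
```

Because $r$ is a weighted average of the roots with nonnegative weights $x_h/\sum x_h$, every simulated value also lies between the smallest and the largest root.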


The conditional probability is

(36) $\Pr\left\{r \ge R \,\Big|\, \sum_{h=1}^{H} x_h = c\right\} = \Pr\left\{\sum_{h=1}^{H} \nu_h\dfrac{x_h}{c} \ge R \,\Big|\, \sum_{h=1}^{H} \dfrac{x_h}{c} = 1\right\},$

where $x_1/c, \ldots, x_H/c$ are conditionally uniformly distributed on the $(H - 1)$-dimensional tetrahedron $\sum_{h=1}^{H} (x_h/c) = 1$ and $x_h/c \ge 0$, $h = 1, \ldots, H$. (This is a special case of Theorem 6.7.2.) Thus the conditional probability (36) is independent of $c$; for convenience we can take $c = 1$. This shows that $r$ is distributed independently of $\sum_{h=1}^{H} x_h$. We shall, therefore, find the probability

(37) $\Pr\{r \ge R\} = \Pr\left\{\sum_{h=1}^{H} \nu_hx_h \ge R\right\},$

where $x_1, \ldots, x_H$ are uniformly distributed on

(38) $\sum_{h=1}^{H} x_h = 1, \qquad x_h \ge 0, \quad h = 1, \ldots, H.$

The probability (37) is the $(H - 1)$-dimensional volume of the intersection of (38) and

(39) $\sum_{h=1}^{H} \nu_hx_h \ge R$

relative to the $(H - 1)$-dimensional volume of the regular tetrahedron (38). The vertices of the regular tetrahedron (38) are $X_1, \ldots, X_H$, where all of the coordinates of $X_j$ are 0, except the $j$th which is 1. The hyperplane $\sum_{h=1}^{H} \nu_hx_h = R$ intersects an edge $X_iX_j$ ($x_i + x_j = 1$, $x_k = 0$, $k \ne i$, $k \ne j$, $i \ne j$) or its extension beyond the positive orthant in the point $W_{ij}$ with coordinates

(40) $x_i = \dfrac{R - \nu_j}{\nu_i - \nu_j}, \qquad x_j = \dfrac{\nu_i - R}{\nu_i - \nu_j}, \qquad x_k = 0, \quad k \ne i, \; k \ne j, \; i \ne j.$

(Note that $W_{ij} = W_{ji}$.) It will be convenient to number the roots so

(41) $\nu_H < \nu_{H-1} < \cdots < \nu_2 < \nu_1.$

[Figure 6.1. Regions when … $< R \le \nu_1$; the plane shown is $\nu_1x_1 + \nu_2x_2 + \nu_3x_3 = R$.]

… 0 in $U_m$; thus

(48) $U_m \subset U_{m-1}.$

LEMMA 6.7.4.

(49) $S = U_1 \cup U_3 \cup \cdots \cup U_m - (U_2 \cup U_4 \cup \cdots \cup U_{m-1}), \qquad m \text{ odd},$

(50) $S = U_1 \cup U_3 \cup \cdots \cup U_{m-1} - (U_2 \cup U_4 \cup \cdots \cup U_m), \qquad m \text{ even}.$

PROOF. Relations (46) and (48) indicate that the set in parentheses on the right-hand side of (49) or (50) is contained in the set outside parentheses, and hence the indicated subtraction is possible. Since $S \subset U_1$ and $S \cap U_j = \emptyset$, $j > 1$, $S$ is contained in the right-hand side of (49) or (50). On the other hand any point in the set on the right-hand side is in $S$, for a point in the right-hand side must be in some one set $U_{2j-1}$ (because $U_1, U_3, \ldots$ are disjoint) and it must not be in any $U_{2j}$; because of (46) and (48) such a point must be in the part of $U_1$ disjoint to $U_2$ and hence it is in $S$. This proves the lemma.


LEMMA 6.7.5. The $(H - 1)$-dimensional volume of $U_i$ relative to (38) is

(51) $(-1)^{i-1}\prod_{j=1,\, j \ne i}^{H} \dfrac{\nu_i - R}{\nu_i - \nu_j}.$

PROOF. The coordinates of the vertex $W_{ij}$ ($j \ne i$) of $U_i$ are (40). The length of the edge $X_iW_{ij}$ relative to the length of the edge $X_iX_j$ is $-(\nu_i - R)/(\nu_i - \nu_j)$ for $j < i$ and is $(\nu_i - R)/(\nu_i - \nu_j)$ for $j > i$. Lemma 6.7.5 then follows from Lemma 6.7.3.

THEOREM 6.7.4.

$\Pr\{r \ge R\} = 1$ for $R \le \nu_H$,

(52) $\Pr\{r \ge R\} = \sum_{i=1}^{m} \dfrac{(\nu_i - R)^{H-1}}{\prod_{j=1,\, j \ne i}^{H} (\nu_i - \nu_j)}, \qquad \nu_{m+1} \le R \le \nu_m, \quad m = 1, \ldots, H - 1,$

$\Pr\{r \ge R\} = 0$ for $\nu_1 \le R$.

PROOF. The theorem follows from Lemmas 6.7.4 and 6.7.5 and (45).

COROLLARY 6.7.3. The density of $r$ is

(53) $(H - 1)\sum_{i=1}^{m} \dfrac{(\nu_i - r)^{H-2}}{\prod_{j=1,\, j \ne i}^{H} (\nu_i - \nu_j)}, \qquad \nu_{m+1} \le r \le \nu_m, \quad m = 1, \ldots, H - 1,$

and is 0 outside $(\nu_H, \nu_1)$.
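A small numerical sketch may help in reading the result. Combining Lemma 6.7.4 (alternating unions of the sets $U_i$) with Lemma 6.7.5 (their relative volumes) gives $\Pr\{r \ge R\}$ as the sum of $(\nu_i - R)^{H-1}/\prod_{j \ne i}(\nu_i - \nu_j)$ over the roots $\nu_i$ that exceed $R$; the code below assumes that form. The function name `prob_r_at_least` and the example roots are hypothetical.

```python
def prob_r_at_least(R, nu):
    """Pr{r >= R} for distinct paired roots nu (assumed form of the theorem)."""
    nu = sorted(nu, reverse=True)      # nu[0] > nu[1] > ... > nu[H-1]
    H = len(nu)
    if R <= nu[-1]:
        return 1.0                     # R at or below the smallest root
    if R >= nu[0]:
        return 0.0                     # R at or above the largest root
    total = 0.0
    for i, vi in enumerate(nu):
        if vi <= R:                    # only roots above R contribute
            break
        denom = 1.0
        for j, vj in enumerate(nu):
            if j != i:
                denom *= vi - vj
        total += (vi - R) ** (H - 1) / denom
    return total


# H = 2: r is uniform on (nu_2, nu_1), so the probability is linear in R.
p_two_roots = prob_r_at_least(0.0, [1.0, -1.0])
# H = 3 with symmetric roots: the distribution is symmetric about 0.
p_three_roots = prob_r_at_least(0.0, [1.0, 0.0, -1.0])
```

For $H = 2$ the formula reduces to $(\nu_1 - R)/(\nu_1 - \nu_2)$, the uniform distribution on $(\nu_2, \nu_1)$, which is a quick check that the signs and exponents are right.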

The distribution of the serial correlation when there is one single root in addition to double roots can be derived from (52). It will be convenient to derive a more general result from which the distribution may be obtained. We let $\mu_1, \ldots, \mu_M, \mu_{M+1}$ be arbitrary weights.

THEOREM 6.7.5. Let

(54) i-i

j-1

where z l r . . . ,z M + L are independently distributed, each according to N ( 0 , 1). Then


= Pr {s

2T

+ (T - p ~ + ~ ) ~ ) ,

where $u = \chi_L^2/\chi_M^2$ has the distribution of $LF_{L,M}/M$. The last step in (56) is a result of the fact that $s$ is distributed independently of $\sum_{i=1}^{M} z_i^2$.

COROLLARY 6.7.4. I f L = 1 and M = 2 H ,

To evaluate (57) when s is r of (52) we use a special case of the following lemma:

LEMMA6.7.6.

- c I ( M i - L ) - l (c

PROOF. Upon the substitution (59)

+ d)-".

c+d u w = -c l+u

into the left-hand side of (58) we obtain

which gives the right-hand side of (%!),because B(W, 30.

COROLLARY 6.7.5.

the integral is the Beta function


THEOREM 6.7.6. Let

r=

h-1 7

2H+1

h=l

where $z_1, \ldots, z_{2H+1}$ are independently distributed, each according to $N(0, 1)$, and … elliptic, hyperelliptic and more complicated integrals are involved. The distribution for $T - 1$ odd can be obtained from the distribution for $T - 1$ even with the help of Corollary 6.7.4. In the case of the serial correlation (79) $\lambda_t = \cos \pi t/T$, $t = 1, \ldots, T - 1$. The distribution of $r_1^*$ is symmetric because $\lambda_{T-t} = -\lambda_t$, $t = 1, \ldots, T - 1$. The roots are paired (positive and negative values) except when $T$ is even and then $\lambda_{\frac{1}{2}T} = 0$. The range of values of $r_1^*$ is $(-\cos \pi/T, \cos \pi/T)$.

6.7.7. Moment Generating Functions and Moments

(84) $\mathcal{E}r_1^{*h} = \mathcal{E}Q_1^{*h}Q_0^{*-h}.$

These moments of $Q_1^*$ and $Q_0^*$ can be found from the characteristic function or moment generating function of $Q_0^*$ and $Q_1^*$ (the moments of positive order by differentiation and the moments of negative order by integration). Under the null hypothesis of independence ($\gamma_1 = 0$), however, the deduction of the


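moments is accomplished effectively by virtue of the fact that $r_1^*$ and $Q_0^*$ are independent (Theorem 6.7.2).

LEMMA 6.7.8. If $r_1^* = Q_1^*/Q_0^*$ and $\gamma_1 = 0$,

(85) $\mathcal{E}r_1^{*h}Q_0^{*g} = \mathcal{E}r_1^{*h}\,\mathcal{E}Q_0^{*g}.$

THEOREM 6.7.9. If $r_1^* = Q_1^*/Q_0^*$ and $\gamma_1 = 0$,

(86) $\mathcal{E}r_1^{*h} = \dfrac{\mathcal{E}Q_1^{*h}}{\mathcal{E}Q_0^{*h}}.$

PROOF. This follows from Lemma 6.7.8 by letting $g = h$.

If $r_1 = Q_1/Q_0$ and $\gamma_1 = 0$, then similarly $\mathcal{E}r_1^h = \mathcal{E}Q_1^h/\mathcal{E}Q_0^h$. Usually $Q_0^*$ has the canonical form $\sum_{t=1}^{T-1} z_t^2$, and $Q_0^*$ has the $\chi^2$-distribution with $T - 1$ degrees of freedom. Similarly $Q_0$ has the canonical form $\sum_{t=1}^{T} z_t^2$, and $Q_0$ has the $\chi^2$-distribution with $T$ degrees of freedom. The moments are

$\mathcal{E}Q_0^{*h} = \dfrac{2^h\,\Gamma[\frac{1}{2}(T - 1) + h]}{\Gamma[\frac{1}{2}(T - 1)]}, \qquad -\tfrac{1}{2}(T - 1) < h,$

(87) $\mathcal{E}Q_0^h = \dfrac{2^h\,\Gamma(\frac{1}{2}T + h)}{\Gamma(\frac{1}{2}T)}, \qquad -\tfrac{1}{2}T < h.$

To find $\mathcal{E}Q_1^h$ it is convenient to use the moment generating function. If $Q_1 = y'A_1y$, where $A_1$ is an arbitrary symmetric matrix with characteristic roots $\lambda_1, \ldots, \lambda_T$, and $A_0 = I$, then from Corollaries 6.7.1 and 6.7.2

(88) $\mathcal{E}e^{\theta Q_1} = |I - 2\theta A_1|^{-\frac{1}{2}} = \prod_{t=1}^{T} (1 - 2\theta\lambda_t)^{-\frac{1}{2}}$

for $\theta$ sufficiently small in absolute value. (See Problem 29.) We shall evaluate (88) in the first three specific models defined in Section 6.5.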

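LEMMA 6.7.9. The determinant $D_T = |I - 2\theta A_1|$, where $A_1$ has $\frac{1}{2}$'s in the diagonals immediately above and below the main diagonal and 0's elsewhere, is

(89) $D_T = \dfrac{1}{\sqrt{1 - 4\theta^2}}\left[\left(\dfrac{1 + \sqrt{1 - 4\theta^2}}{2}\right)^{T+1} - \left(\dfrac{1 - \sqrt{1 - 4\theta^2}}{2}\right)^{T+1}\right], \qquad T = 1, 2, \ldots.$

PROOF. Expansion of $D_T$ by elements of the first row gives

(90) $D_T = \begin{vmatrix} 1 & -\theta & 0 & \cdots & 0 & 0 \\ -\theta & 1 & -\theta & \cdots & 0 & 0 \\ 0 & -\theta & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -\theta \\ 0 & 0 & 0 & \cdots & -\theta & 1 \end{vmatrix} = D_{T-1} + \theta\begin{vmatrix} -\theta & -\theta & 0 & \cdots & 0 & 0 \\ 0 & 1 & -\theta & \cdots & 0 & 0 \\ 0 & -\theta & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -\theta \\ 0 & 0 & 0 & \cdots & -\theta & 1 \end{vmatrix} = D_{T-1} - \theta^2D_{T-2}, \qquad T = 3, 4, \ldots,$

where $D_1 = 1$. Thus $D_T$ satisfies the difference equation

(91) $D_T - D_{T-1} + \theta^2D_{T-2} = 0,$

which has the associated polynomial equation

(92) $x^2 - x + \theta^2 = 0,$

whose roots are $\frac{1}{2}(1 \pm \sqrt{1 - 4\theta^2})$. Thus the solution for $\theta^2 \ne 1/4$ is

(93) $D_T = c_1\left(\dfrac{1 + \sqrt{1 - 4\theta^2}}{2}\right)^T + c_2\left(\dfrac{1 - \sqrt{1 - 4\theta^2}}{2}\right)^T, \qquad T = 1, 2, \ldots.$

The constants $c_1$ and $c_2$ can be determined by $D_1 = 1$ and $D_2 = 1 - \theta^2$ for $\theta \ne 0$. This proves the lemma, which is needed for the model of Section 6.5.4.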
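The recursion in the proof is easy to check numerically. The sketch below iterates $D_T = D_{T-1} - \theta^2D_{T-2}$ from $D_1 = 1$, $D_2 = 1 - \theta^2$ and compares the result with the closed-form solution of the difference equation; the function names are hypothetical, and the closed form is evaluated only for $\theta^2 < 1/4$ so that the roots are real.

```python
import math


def d_recursive(T, theta):
    """D_T from D_T = D_{T-1} - theta^2 * D_{T-2}, D_1 = 1, D_2 = 1 - theta^2."""
    if T == 1:
        return 1.0
    prev, cur = 1.0, 1.0 - theta ** 2
    for _ in range(T - 2):
        prev, cur = cur, cur - theta ** 2 * prev
    return cur


def d_closed_form(T, theta):
    """Solution of the difference equation, assuming theta^2 < 1/4."""
    s = math.sqrt(1.0 - 4.0 * theta ** 2)
    a, b = (1.0 + s) / 2.0, (1.0 - s) / 2.0   # roots of x^2 - x + theta^2 = 0
    return (a ** (T + 1) - b ** (T + 1)) / s


theta = 0.3
max_gap = max(abs(d_recursive(T, theta) - d_closed_form(T, theta))
              for T in range(1, 11))
```

With $a + b = 1$ and $ab = \theta^2$ the closed form gives $D_1 = a + b = 1$ and $D_2 = (a + b)^2 - ab = 1 - \theta^2$, which fixes the two constants in the general solution.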


LEMMA 6.7.10. The determinant $\Delta_T = |I - 2\theta A_1|$, where $A_1$ is the matrix defining the first-order circular serial correlation coefficient, is

(94) $\Delta_T = \left[\left(\dfrac{\sqrt{1 + 2\theta} + \sqrt{1 - 2\theta}}{2}\right)^T - \left(\dfrac{\sqrt{1 + 2\theta} - \sqrt{1 - 2\theta}}{2}\right)^T\right]^2, \qquad T = 2, 3, \ldots.$

PROOF. Expansion of $\Delta_T$ by elements of the first row and then by elements of the first column of cofactors gives

$\Delta_T = \begin{vmatrix} 1 & -\theta & 0 & \cdots & 0 & -\theta \\ -\theta & 1 & -\theta & \cdots & 0 & 0 \\ 0 & -\theta & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -\theta \\ -\theta & 0 & 0 & \cdots & -\theta & 1 \end{vmatrix} = D_{T-1} - 2\theta^2D_{T-2} - 2\theta^T, \qquad T = 3, 4, \ldots.$

Substitution from (89) gives (94). For $T = 2$, $a_{12} = a_{21} = 1$, and, hence, $\Delta_2 = 1 - 4\theta^2$.
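As a numerical cross-check of the circular determinant, one can multiply out $1 - 2\theta\lambda_t$ over the roots $\lambda_t = \cos 2\pi t/T$ of the circular matrix and compare with a closed form written as the square of a difference of two $T$th powers. This is a sketch only: the closed form used below is the one assumed here for (94), and the function names are hypothetical.

```python
import math


def delta_from_roots(T, theta):
    """Product of 1 - 2*theta*cos(2*pi*t/T) over the T circular roots."""
    prod = 1.0
    for t in range(1, T + 1):
        prod *= 1.0 - 2.0 * theta * math.cos(2.0 * math.pi * t / T)
    return prod


def delta_closed_form(T, theta):
    """Squared difference of T-th powers, assuming |theta| < 1/2."""
    p = (math.sqrt(1 + 2 * theta) + math.sqrt(1 - 2 * theta)) / 2
    q = (math.sqrt(1 + 2 * theta) - math.sqrt(1 - 2 * theta)) / 2
    return (p ** T - q ** T) ** 2


theta = 0.2
worst = max(abs(delta_from_roots(T, theta) - delta_closed_form(T, theta))
            for T in range(2, 10))
```

Since $p^2$ and $q^2$ are the two roots $\frac{1}{2}(1 \pm \sqrt{1 - 4\theta^2})$ and $pq = \theta$, the square expands to the sum of the $T$th powers of those roots minus $2\theta^T$; for $T = 2$ this is $1 - 4\theta^2$.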

LEMMA 6.7.11. The determinant $C_T = |I - 2\theta A_1|$, where $A_1$ is the matrix based on successive differences, is

(96) $C_T = \dfrac{1 - 2\theta}{\sqrt{1 - 4\theta^2}}\left[\left(\dfrac{1 + \sqrt{1 - 4\theta^2}}{2}\right)^T - \left(\dfrac{1 - \sqrt{1 - 4\theta^2}}{2}\right)^T\right], \qquad T = 2, 3, \ldots.$

PROOF. Expansion of $C_T$ (by elements of the first row and next by elements of the last row of one cofactor and the first column of the other) gives

$C_T = \begin{vmatrix} 1 - \theta & -\theta & 0 & \cdots & 0 & 0 \\ -\theta & 1 & -\theta & \cdots & 0 & 0 \\ 0 & -\theta & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & -\theta \\ 0 & 0 & 0 & \cdots & -\theta & 1 - \theta \end{vmatrix} = (1 - \theta)^2D_{T-2} - 2(1 - \theta)\theta^2D_{T-3} + \theta^4D_{T-4}, \qquad T = 5, 6, \ldots.$

Substitution from (89) and rearrangement of terms give (96) for $T = 5, 6, \ldots$. Direct calculation verifies (96) for $T = 2, 3, 4$. Another form of the determinant, obtained by expanding the two $T$th powers in (96) according to the binomial theorem, is

$C_T = \dfrac{1 - 2\theta}{2^{T-1}}\sum_{k=0}^{H} \binom{T}{2k + 1}(1 - 4\theta^2)^k,$

where $H = \frac{1}{2}(T - 1)$ if $T$ is odd and $H = \frac{1}{2}T - 1$ if $T$ is even.


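LEMMA 6.7.12. If $Q_1^* = (y - \bar{y}\varepsilon)'A_1(y - \bar{y}\varepsilon)$, $A_0 = I$, $\varepsilon$ is a characteristic vector of $A_1$ corresponding to a characteristic root 1, $\gamma_0 = 1$, and $\gamma_1 = 0$, then

$\mathcal{E}e^{\theta Q_1^*} = \dfrac{(1 - 2\theta)^{\frac{1}{2}}}{|I - 2\theta A_1|^{\frac{1}{2}}}$

for $\theta$ sufficiently small in absolute value.

PROOF. This lemma is a special case of Corollary 6.7.2.

THEOREM 6.7.10. The moment generating function of $Q_1^*$ for the circular serial correlation coefficient when $\gamma_0 = 1$ and $\gamma_1 = 0$ is

$\mathcal{E}e^{\theta Q_1^*} = \dfrac{\sqrt{1 - 2\theta}}{\left(\dfrac{\sqrt{1 + 2\theta} + \sqrt{1 - 2\theta}}{2}\right)^T - \left(\dfrac{\sqrt{1 + 2\theta} - \sqrt{1 - 2\theta}}{2}\right)^T}, \qquad T = 2, 3, \ldots,$

for $\theta$ sufficiently small in absolute value.

THEOREM 6.7.11. The moment generating function of $Q_1^*$ based on successive differences when $\gamma_0 = 1$ and $\gamma_1 = 0$ is

(102) $\mathcal{E}e^{\theta Q_1^*} = \dfrac{(1 - 4\theta^2)^{\frac{1}{4}}}{\left[\left(\dfrac{1 + \sqrt{1 - 4\theta^2}}{2}\right)^T - \left(\dfrac{1 - \sqrt{1 - 4\theta^2}}{2}\right)^T\right]^{\frac{1}{2}}}, \qquad T = 2, 3, \ldots,$

for $\theta$ sufficiently small in absolute value.

It will be observed that (102) is the same as $1/\sqrt{D_{T-1}}$, which is the moment generating function of $Q_1 = \sum_{t=2}^{T-1} y_ty_{t-1}$ defined in Section 6.5.4 for $T - 1$ observations. The moments of the circular $Q_1$ can be found from the moment generating function

(103) $\mathcal{E}e^{\theta Q_1} = \left(\dfrac{1 + \sqrt{1 - 4\theta^2}}{2}\right)^{-\frac{1}{2}T}\left[1 - \left(\dfrac{\sqrt{1 + 2\theta} - \sqrt{1 - 2\theta}}{\sqrt{1 + 2\theta} + \sqrt{1 - 2\theta}}\right)^T\right]^{-1}, \qquad T = 2, 3, \ldots,$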


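for $\theta$ sufficiently small in absolute value. Laplace (1829) has shown that if $u$ is the root $1 + \sqrt{1 - c}$ of $u^2 - 2u + c = 0$, then

(104) $u^{-n} = \dfrac{1}{2^n}\left[1 + n\dfrac{c}{4} + \dfrac{n(n + 3)}{2!}\left(\dfrac{c}{4}\right)^2 + \dfrac{n(n + 4)(n + 5)}{3!}\left(\dfrac{c}{4}\right)^3 + \cdots\right].$

The derivation of this expansion is not straightforward and will not be given here. If we let $c = 4\theta^2$ and $n = \frac{1}{2}T$, we have

(105) $\left(\dfrac{1 + \sqrt{1 - 4\theta^2}}{2}\right)^{-\frac{1}{2}T} = 1 + \tfrac{1}{2}T\theta^2 + \dfrac{\frac{1}{2}T(\frac{1}{2}T + 3)}{2!}\theta^4 + \dfrac{\frac{1}{2}T(\frac{1}{2}T + 4)(\frac{1}{2}T + 5)}{3!}\theta^6 + \cdots.$

Since

$\dfrac{\sqrt{1 + 2\theta} - \sqrt{1 - 2\theta}}{\sqrt{1 + 2\theta} + \sqrt{1 - 2\theta}} = \dfrac{1 - \sqrt{1 - 4\theta^2}}{2\theta} = \theta + \theta^3 + \cdots,$

the second term in the square brackets in (103) is a power series starting with $\theta^T$ and higher powers. Thus the square brackets contain 1 + powers of $\theta$ at least $T$. The moments of $Q_1$ to order $T - 1$ are

(107) $\mathcal{E}Q_1^n = \dfrac{d^n}{d\theta^n}\left(\dfrac{1 + \sqrt{1 - 4\theta^2}}{2}\right)^{-\frac{1}{2}T}\Bigg|_{\theta = 0}, \qquad n = 0, 1, \ldots, T - 1.$

(See Problem 51.) From (105) and (107) we obtain

(108) $\mathcal{E}Q_1^{2k-1} = 0, \qquad \mathcal{E}Q_1^{2k} = \dfrac{\frac{1}{2}T\,\Gamma(\frac{1}{2}T + 2k)\,(2k)!}{\Gamma(\frac{1}{2}T + k + 1)\,k!},$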

l < k j & T , T = 2 , 3 ,..., O 0. Thus

I

+

+&

(9) g?(r:,

* * * 9

r:

I

PI, * * *

9

+

+---

pi)

= A: IAoI-t(2r)f‘T-’)K*a ( Y O ,

x (1

+

+-- +

+

2

~ 0 ~ * 1 *,

-&tT-I)

g?(r:,

pjr;)

~o~i)~~?T-l)

9

. . . , rr I 0,. . . ,o)

j=1

= A; lAol 4 (2T”T-”K’ i

i

(1,

PI,

*

-i(T-ll

gf(r:,

* 9

pi)

. .-. , r:

10,. . . , O), .

since

t(T-1)

= Yo

+ + + PiAil 4 J2o + + + PJi

1Ao

~ 1 A l ~

1

+ z:f=l

* * *

~ * * 1*

Moments can be found from (9) by expanding (1 p,r;)-*fT-l) into a series in pjr:) and integrating with respect to r:, . . . , r: for small p , k

(x:fZl

SEC. 6.11. SOME MAXIMUM LIKELIHOOD ESTIMATES

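Another approach to the distribution of the serial correlation coefficients involves the canonical form. Let

$r^* = \dfrac{\sum_{t=1}^{T-1} \lambda_tz_t^2}{\sum_{t=1}^{T-1} z_t^2},$

where $z_1, \ldots, z_{T-1}$ are independently normally distributed with means 0 and variances $1/(\gamma_0 + \gamma_1\lambda_1), \ldots, 1/(\gamma_0 + \gamma_1\lambda_{T-1})$, respectively. Then

(12) $u_t = \sqrt{\gamma_0 + \gamma_1\lambda_t}\; z_t$

is distributed according to $N(0, 1)$. We write

$r^* = \dfrac{\sum_{t=1}^{T-1} \dfrac{\lambda_t}{1 + \rho\lambda_t}u_t^2}{\sum_{t=1}^{T-1} \dfrac{1}{1 + \rho\lambda_t}u_t^2},$

where $\rho = \gamma_1/\gamma_0$ for the case of $r^* = r_1^*$. The distributions of Section 6.7 may be used with $\lambda_t - R$ replaced by $(\lambda_t - R^*)/(1 + \rho\lambda_t)$. For example, if $\lambda_t = \lambda_{T-t} = \nu_h$, $h = 1, \ldots, H = \frac{1}{2}(T - 1)$, (52) may be rewritten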

+

-

Y,+~

< R*
… $r = 0$. (20)

If $r > 0$, the real roots of (20) are $\hat\beta = -a$ and $\hat\beta = -1/a$ for some $a > 0$ if $T$ is even and $-a$, $-1/a$, and 1 if $T$ is odd; if $r = 0$, the real roots are 0 if $T$ is even and 0 and 1 if $T$ is odd; if $r < 0$, the real roots are $a$ and $1/a$ for $a > 0$ if $T$ is even and $a$, $1/a$, and 1 if $T$ is odd and $-(1 - 2/T) < r$, but only 1 if $-\cos \pi/T < r \le -(1 - 2/T)$. Since the same process is defined for reciprocal values of $\beta$ (and adjusted $\sigma$) we take the root between $-1$ and 1. (The root 1 does not give the absolute maximum of $L$.) It can be shown that asymptotically $\hat\beta = -r$ is a root of the second equation (because $r^T \to 0$ and $\hat\beta^T \to 0$ in probability). (See Problem 85.) It can also be shown that the likelihood ratio criterion for testing $\beta = 0$ is

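The hypothesis is rejected if (21) is less than a suitably chosen constant. It may be noted that each root of (20) is a function of $r$, but not a rational function of $r$. The method based on maximum likelihood (using $\hat\beta$) may be approximated by the approach given in previous sections (using $r$). [Dixon (1944) treated these problems.] We can write (12) as $(I + \beta B')y = u$, which is equivalent to

(22) $y = (I + \beta B')^{-1}u.$

Since $B^T = I$, we have

(23) $(I + \beta B')^{-1} = \sum_{t=0}^{T-1} (-\beta)^tB'^t + (-\beta)^T(I + \beta B')^{-1}$

and, equivalently,

(24) $(I + \beta B')^{-1} = \dfrac{1}{1 - (-\beta)^T}\sum_{t=0}^{T-1} (-\beta)^tB'^t.$

Then the covariance matrix of $y$ is

(25) $\mathcal{E}yy' = (I + \beta B')^{-1}\,\mathcal{E}uu'\,(I + \beta B)^{-1} = \dfrac{\sigma^2}{[1 - (-\beta)^T]^2}\sum_{r=-(T-1)}^{T-1}\sum_{s \in S_r} (-\beta)^{2s+r}B^r,$

where $S_r = \{0, 1, \ldots, T - 1 - r\}$ for $r \ge 0$ and $S_r = \{-r, -r + 1, \ldots, T - 1\}$ for $r < 0$, since $B'^a = B^{-a}$. Thus (since $B^{-a} = B^{T-a}$)

(26) $\mathcal{E}yy' = \dfrac{\sigma^2}{(1 - \beta^2)[1 - (-\beta)^T]}\sum_{h=0}^{T-1} [(-\beta)^h + (-\beta)^{T-h}]B^h.$

The covariance between $y_t$ and $y_{t+h}$ is

(27) $\sigma(h) = \mathcal{E}y_ty_{t+h} = \dfrac{\sigma^2[(-\beta)^h + (-\beta)^{T-h}]}{(1 - \beta^2)[1 - (-\beta)^T]}, \qquad h = 0, 1, \ldots, T - 1.$

The correlation, $[(-\beta)^h + (-\beta)^{T-h}]/[1 + (-\beta)^T]$, is made up of two terms, one depending on the time between $t$ and $t + h$ and the other between $t$ and $t + h - T$. The matrix (26) is the inverse of $[(1 + \beta^2)I + \beta(B + B')]/\sigma^2$.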
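The closing statement, that the covariance matrix is the inverse of $[(1 + \beta^2)I + \beta(B + B')]/\sigma^2$, can be checked numerically for a small $T$. The sketch below assumes the covariances $\sigma(h) = \sigma^2[(-\beta)^h + (-\beta)^{T-h}]/\{(1 - \beta^2)[1 - (-\beta)^T]\}$, which is the form consistent with the correlation quoted above, and takes $B$ to be the circular shift matrix; the function names and the values $T = 5$, $\beta = 0.4$ are hypothetical.

```python
def circular_covariance(T, beta, sigma2=1.0):
    """Matrix with entries sigma(h), h = (i - j) mod T (assumed closed form)."""
    c = -beta
    scale = sigma2 / ((1.0 - c * c) * (1.0 - c ** T))
    return [[scale * (c ** ((i - j) % T) + c ** (T - (i - j) % T))
             for j in range(T)] for i in range(T)]


def circular_precision(T, beta, sigma2=1.0):
    """[(1 + beta^2) I + beta (B + B')] / sigma^2 for the circular shift B."""
    m = [[0.0] * T for _ in range(T)]
    for i in range(T):
        m[i][i] += (1.0 + beta * beta) / sigma2
        m[i][(i + 1) % T] += beta / sigma2     # contribution of B
        m[i][(i - 1) % T] += beta / sigma2     # contribution of B'
    return m


def matmul(a, b):
    """Plain matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


T, beta = 5, 0.4
product = matmul(circular_precision(T, beta), circular_covariance(T, beta))
cov = circular_covariance(T, beta)
lag_one_correlation = cov[0][1] / cov[0][0]
```

The product should be the identity matrix up to rounding, and the lag-one correlation should equal $[(-\beta) + (-\beta)^{T-1}]/[1 + (-\beta)^T]$, showing the two terms "around the circle" in either direction.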

6.12. Discussion

The first-order serial correlation coefficients give methods for testing the null hypothesis that $y_1, \ldots, y_T$ are independently distributed against certain alternatives that they are serially correlated; under the assumption of normality such tests can be carried out at specified significance levels. To test higher order dependence partial serial correlations may be used. In Chapter 10 we shall discuss tests of independence and tests of order of dependence when $\mathcal{E}y_t = f(t)$ is some nontrivial trend.


REFERENCES

Section 6.2. T. W. Anderson (1963).
Section 6.3. T. W. Anderson (1948), (1963), Kendall and Stuart (1961), Lehmann (1959).
Section 6.4. T. W. Anderson (1963).
Section 6.5. T. W. Anderson (1963), Watson and Durbin (1951).
Section 6.6. R. L. Anderson and T. W. Anderson (1950).
Section 6.7. R. L. Anderson (1941), (1942), Dixon (1944), Koopmans (1942), Laplace (1829), von Neumann (1941).
Section 6.8. T. W. Anderson (1948), Dixon (1944), Hannan (1960), Hart (1942), Hart and von Neumann (1942), Koopmans (1942), Rubin (1945), United States Bureau of the Census (1955), Young (1941).
Section 6.9. T. W. Anderson (1958), Daniels (1956), Hannan (1960), Jenkins (1956), Quenouille (1949a), (1949b), Watson (1956).
Section 6.10. Leipnik (1947), Madow (1945).
Section 6.11. Dixon (1944), Koopmans (1942).

PROBLEMS 1. (Sec.6.2.) Show that for the sequence yl,. . . ,yT from a stationary Gaussian process of order p the following functions of the observations form a sufficient set of statistics: T-P

2 1'

YtYt-p.

t=P+l

. .

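2. (Sec. 6.2.) Let $y_1, \ldots, y_T$ be a sequence with a covariance matrix $\Sigma_T = [\sigma(i - j)]$ from a stationary process generated by the moving average $y_t = v_t + \alpha_1v_{t-1}$, where $\mathcal{E}v_t = 0$, $\mathcal{E}v_t^2 = \sigma^2$, and $\mathcal{E}v_tv_s = 0$, $t \ne s$. Then $\sigma(0) = \sigma^2(1 + \alpha_1^2)$ and $\sigma(1) = \sigma^2\alpha_1 = \sigma(0)\rho$, say, and $\sigma(h) = 0$, $h > 1$. Show

$\Sigma_2^{-1} = \dfrac{1}{(1 - \rho^2)\sigma(0)}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix},$

$\Sigma_3^{-1} = \dfrac{1}{(1 - 2\rho^2)\sigma(0)}\begin{pmatrix} 1 - \rho^2 & -\rho & \rho^2 \\ -\rho & 1 & -\rho \\ \rho^2 & -\rho & 1 - \rho^2 \end{pmatrix},$

$\Sigma_4^{-1} = \dfrac{1}{(1 - 3\rho^2 + \rho^4)\sigma(0)}\begin{pmatrix} 1 - 2\rho^2 & -\rho(1 - \rho^2) & \rho^2 & -\rho^3 \\ -\rho(1 - \rho^2) & 1 - \rho^2 & -\rho & \rho^2 \\ \rho^2 & -\rho & 1 - \rho^2 & -\rho(1 - \rho^2) \\ -\rho^3 & \rho^2 & -\rho(1 - \rho^2) & 1 - 2\rho^2 \end{pmatrix}.$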

3. (Sec. 6.2.) Show that $\Sigma_T^{-1}$ defined in Problem 2 is the product of $\sigma^{T-1}(0)/|\Sigma_T|$ and a matrix whose elements are polynomials in $\rho$. Show that all powers of $\rho$ from 0 to $T - 1$ appear in these polynomials and that there are $T$ linearly independent polynomials. [Hint: An element of the inverse is the ratio of the corresponding cofactor and the determinant.]

4. (Sec. 6.2.) Assume that $y_1, \ldots, y_T$ are $T$ successive values from a Gaussian process defined as in Problem 2. Show that the dimensionality of the minimal sufficient set of statistics for $\sigma(0)$ and $\rho$ is $T$. [Hint: Use the result of Problem 3 to show that the exponent in the density is a factor times $\sum_{j=0}^{T-1} \rho^jP_j$, where the $P_j$'s are linearly independent quadratic forms. If $Q_1, \ldots, Q_S$ ($S \le T - 1$) is another sufficient set of statistics, the exponent is a factor times $\sum_j f_j(\rho)R_j$, the $R_j$'s are functions of the $Q_i$'s, and the two exponents are identical.]

5. (Sec. 6.2.) Let $\Sigma_T$ be defined as in Problem 2 and

$a_{ij} = [-\rho\sigma(0)]^{j-i}\,|\Sigma_{i-1}| \cdot |\Sigma_{T-j}|/|\Sigma_T|, \qquad i \le j.$

Show A = [ai,] = C?l. [Hint: Compute the cofactor of the i , j t h element by the Laplacian expansion using the first i - I columns and then the last T - j columns. An alternative method is to apply Lemma 6.7.9 to show that lCil satisfies a secondorder difference equation and to obtain a solution for )Zil. Then verify C,A = I. Verification of the diagonal terms (except the first and last) uses the solution for I Cil ; verification of the off-diagonal terms uses the second-order difference equation directly.] 6. (Sec. 6.2.) Show that the solution of (13) forp = I is u2 = 2 ( y 0 f d m )/yf and PI = ( y o f -d )In. 7. (Sec. 6.3.1.) Prove Theorem 6.3.2, namely, that (i)

i-1. J-1 ..

.

I

identically in yo. . . . , yi (for which (ii)

.

hi(Qo,. . , Q, yo, . . . , yi)g(Qo, . . , Q i ) d Q , . . . dQi = 0

Zj=,yjQj is positive definite) implies

g(Qo,.

..

3

QJ

=0


I .. .

.

almost everywhere relative to the densities hi(Qo,, . ,Qi YO. ,Vf). [Hint: Let g(Q0, . . , Q J =f(Q,, . ,Qt) -g-(Qo. Qi), Where 8’ and 8- are nonnegative. Then show

.

.- .

..

jm. jm exp [-4 2 i

(iii)

*

EJ-00

. . ., Qt)ki(Qo, -

~jQj]g+tQo.

--m

-00

..

Im [-8.2 i

*

--m

exp

9

~jQj]g-(Qo,

.- -

* *

Qi)ki(Qo*

*

-

9

9

Qi) dQo *

Qi)

. dQi a

~ Q * o. * dQi.

Deduce (ii) from (iii) by the theory of characteristic functions.] 8. (Sec.6.3.1.) Show that (4) implies (3). 9. (Sec. 6.3.1.) Show that (3) implies (4) almost everywhere. [Hint: Use the property of completeness as proved in Problem 7 with

10. (Sec. 6.3.1.) The Neyman-Pearson Fundamental Lemma. Let f,(x) and fl(x) be density functions on some Euclidean space, and let R be any (measurable) set

such that

sjidx) ti% = E ,

6)

R

where dx indicates Lebesgue integration in the space and 0 5

E

5 1. Prove that if

R* = { x IfiW > kh(4)

(ii)

for some k and satisfies (i), then (iii)

[Hint: The contribution of the integral over R* n R is the same to both sides of (iii). Compare the integrals offl@) over R* n K and K* n R to the integrals off,(%).] 11. (Sec. 6.3.1.) Let yr,=ai+eij,

i-1,

..., M , j = l , ...,N ,

where the ai’s and eft’s are independentlydistributed, the distribution of a, is N(0, a:), and the distribution of eij is N(0, u3. Give the joint density of the yr,’s in the form of (1). Find the uniformly most powerful similar test of the hypothesis u: = 0 against the alternatives u: > 0 at significance level a. [Hint: The covariance matrix of yI1, . . ,ylN, y,,, . . . ,YMN is btfMN 0:fM @ EE’, where E = (1, I , . . . , I)’ is of N components and @ denotes the Kronecker product, and the inverse is of the form 6fMN CfJf @ E d . ]

. +

+


12. (Sec. 6.3.2.) A generalized Neyman-Pearson Fundamental Lemma. Let fo(x) and h(z)be density functions on some Euclidean space and fi(x) another function, and let R be any (measurable) set such that

where d z indicates Lebesgue integration, 0 I; E I; 1, and c is a constant. Prove that if (iii)

R* = {x

If * @)

> kOJl(4 + k l f l ( 4

for some ko and k, and satisfies (i) and (ii), then

[Hint: The contribution of the integral over R* n R is the same to both sides of (iv). Compare the integrals off,(x) over R* n K and K* n R to the integrals of koJl(x) kljd4.1 13. (Sec. 6.3.2.) Prove that R = (- w , c) cannot satisfy (16) and (20). [Hint: Show that (16) and (20) are then equivalent to (16) and

+

where (ii)

m

IC a S_mQQi(Qi

I Qo,

* * *

9

Qi-1;

~ 1 ~dQi.1 ')

14. (Sec. 6.3.2.) Let o(h) = a(O)p,, for a stationary process. Show that the partial correlation between yt and Yf+* "holding yf+, fixed" is

Show that the partial correlation between y, and yt+s "holding Yf+, and ~

t

fixed" + ~ is

15. (Sec. 6.5.2.) Prove that Bt has all elements 0 except1above the main diagonal and T -j below the main diagonal, which are 1, 1= 0, 1, . . , T - 1. 16. (Sec. 6.5.2.) Prove that BT = I.

.


17. (See. 6.5.2.) Prove Theorem 6.5.2 by the method used to prove Theorem 6.5.4. 18. (Sec. 6.5.3.) Using A, given by (41), find A,, A,, and A, according to (18). 19. (See. 6.5.3.) Verify that (55) is orthogonal by using sums evaluated in Chapter 4. 20. (Sec. 6.5.3.) Prove C , and C; differ only in the elements for s, t 5 rand (T s l), (T - t 1) I r. 21. (Sec. 6.5.3.) Write out C: explicitly. 22. (See. 6.5.3.) Prove (61). [Hint: C1= 2(1 - A,) and A, is a polynomial in A, of degree].] 23. (See. 6.5.4.) Prove Corollary 6.5.4. 24. (Sec. 6.6.1.) Let x be distributed according to N ( ~ EX). , Show that an unbiased linear estimate in = y'x of p must satisfy Y'E = 1. Show that for such an estimate to be independent of the residuals x - me, it must be Markov; that is,

+

+

25. (See. 6.6.1.) Show that when x is distributed according to N ( ~ E C ,) Z and x - ik are independent if and only if e is a characteristic vector of 2. 26. (Sec. 6.6.1.) State Theorems 6.4.1 and 6.4.2when &yr = p and E is a characteristic vector of A,, . . . ,A,. 27. (Sec. 6.7.1.) Let y for T = 4 have the density ~ ( y , ,yl)e-l ( YoQo+YlQ

6)

I ),

+

Y: - (Y: Y:), W y 0 , n) = (ri where Qo = Zt=,y t and Q , = Y: and lyll < yw (a) Show that the joint density of Qu and Q, is 4

2

+

- r4/(27d0

(ii)

+

+

[Hint: (yo + y,)(yf y:) and ( y o - y,)(y: y:) are independently distributed as x1 with 2 degrees of freedom.] (b) Show that the joint density of Qo and r = Ql/Qo is

(iii)

Irl 5;

1.

(c) Show that the marginal density of Qo when y1 Z 0 is

(d) Find the conditional density of r given Q,. (e) Find the marginal density of t . 28. (See. 6.7.1.) Let x have densityf(x, 6), where 6 E R. Suppose Q is a sufficient statistic for 6, and suppose there is a group of transformations g on x with a corresponding group g on Q. Suppose that there is some value qu of Q such that for every value ql of Q there is a transformationg such that gq, = qO. Then every statistic r that is invariant with respect to the group of transformations is distributed independently of Q and 6.


29. (Sec. 6.7.2.) Prove (10) for purely imaginary t o , .

. . ,t , by using the fact that

30. (Sec. 6.7.2.) Show that if A is positive definite and B is symmetric there exists a nonsingular matrix C such that

C'BC = D ,

C'AC = I,

where D is diagonal and the diagonal elements of D are the roots of IB - dAl = 0. Show that the columns of C are characteristic vectors of A-lB. [Hint: Let E be a matrix so E'AE = I; take C = E 6 , where 6 is an orthogonal matrix that diagonalizes E'BE.] 31. (Sec. 6.7.3.) Show that AjAk = AkA,, j , k = 1,. ,p, is a necessary and sufficient condition for the existence of an orthogonal matrix 2 such that Z ' A j Z = A,, diagonal,j = 1, . . . ,p , where A,, . . . ,A,, are symmetric matrices. [Hint: Z'AjZ = Aj is equivalent to A j = ZAjZ'. To prove sufficiency, assume A,A, = A,A, and use A , = ZAZ'.] 32. (Sec. 6.7.3.) Let A, be an arbitrary positive definite matrix and let A,, . . . ,A, be symmetric matrices. Show that if there exists a nonsingular matrix C such that its columns are the characteristic vectors of A;'A,, . . . , Ai'A, and are properly normalized, then C'A,C = Aj, C'A,C = I , j = 1 , . . ,p,

..

.

where Aj is diagonal with the roots of IAj - dA,I = 0 as diagonal elements. Prove that a necessary and sufficient condition for the above is that

j . k = 1,...,p.

AjA,'Ak = AkA,'A,,

[Hint: See Problems 30 and 31.1 33. (Sec.6.7.4.) Prove h*(y,, . . . ,Y H - ~ )= k:h(yl, . ,yrf-,). [Hint: Let Z,, Z,, and Z: be the points of intersection of the line parallel to V,V, through (y,, . . . ,yH-,) and the faces V,V 3 .. * V,, V, V 3 . . . V,, and V,' V 3 . * * V,, respectively, and let W be the point of intersection of V3V, . . V, and the plane through V, V, and the line. Consider the triangles V,V,W . VlV,' W,Z1Z2W,and Z,Z," W.] 34. (Sec. 6.7.4.) Indicate U,,U,,and U3 for R < v3 (in barycentric coordinates with H = 3). 35. (Sec. 6.7.4.) Show that Theorem 6.7.4 can be stated

..

-

ZH (Vi

Pr {r < R} =

i=m+l

(vi

i=1

j#i

[Hint: Use Lemma 6.7.7.1

- R)-

, ~ ~ + ~ < R j v , m , ,= l , 10j)

...,H - I .


36. (Sec. 6.7.4.) Prove Theorem 6.7.4 by induction by applying Theorem 6.7.5. [Hint: Assume (52) for H = H* - 1 and use Theorem 6.7.5 with M = 2(H* - 1) and L = 2 to verify (52) for H = H*.I 37. (Sec. 6.7.4.) Prove Theorem 6.7.4 by inverting (69). [Hinr: Show that u = (vj - R)xj has the density for u 2 0.

zEl

m

/(u)

3

4,pullvr-R) 9-1

(Vj

- R)H-n

fi

t-1

… and $\pi < \theta_j < 2\pi$ if $B_j < 0$. If $A_j$ and $B_j$ are normally distributed, then $R_j^2$ is proportional to a chi-square variable with 2 degrees of freedom and $\theta_j$ is uniformly distributed between 0 and $2\pi$ (by the symmetry of the normal distribution) and is independent of $R_j^2$. The stationarity of the process may be seen from this definition by noting that

$y_{t+s} = \sum_j R_j\cos[\lambda_jt - (\theta_j - \lambda_js)],$

where $(\theta_j - \lambda_js)$ is distributed uniformly over an interval of length $2\pi$; thus $y_t$ and $y_{t+s}$ have the same distribution. If the $A_j$'s and $B_j$'s are not normally distributed, $\{y_t\}$ is not necessarily stationary in the strict sense.

This example is important because in a sense every weakly stationary stochastic process with finite variance can be approximated by a linear combination, such as the right-hand side of (13).
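The role of the uniform phase can be seen in a small numerical sketch. Assuming a single component written as $R\cos(\lambda t - \theta)$ with $\theta$ uniform on $(0, 2\pi)$ (this notation for the example is an assumption here), the average of the product of the component at times $t$ and $t + s$ over the phase should be $\frac{1}{2}\cos\lambda s$ for $R = 1$, whatever $t$ is. The function name and the values of $\lambda$, $t$, $s$ are hypothetical.

```python
import math


def phase_average(lam, t, s, n=4096):
    """Average of cos(lam*t - th) * cos(lam*(t+s) - th) over th uniform on (0, 2*pi)."""
    total = 0.0
    for k in range(n):
        th = 2.0 * math.pi * (k + 0.5) / n    # midpoint rule on (0, 2*pi)
        total += math.cos(lam * t - th) * math.cos(lam * (t + s) - th)
    return total / n


lam, s = 0.7, 3
averages = [phase_average(lam, t, s) for t in (0, 1, 5, 12)]
expected = 0.5 * math.cos(lam * s)            # depends on s only, not on t
```

The product equals $\frac{1}{2}[\cos\lambda s + \cos(\lambda(2t + s) - 2\theta)]$, and the second term averages to zero over a full period of $\theta$, which is why the covariance depends only on the lag.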

Example 7.4. Moving average. Let $\ldots, v_{-1}, v_0, v_1, \ldots$ be a sequence of independently and identically distributed random variables, and let $\alpha_0, \alpha_1, \ldots, \alpha_q$ be $q + 1$ numbers. Then the moving average

(24) $y_t = \alpha_0v_t + \alpha_1v_{t-1} + \cdots + \alpha_qv_{t-q}, \qquad t = \ldots, -1, 0, 1, \ldots,$

[Figure 7.1. Coordinates of trigonometric functions.]

SEC. 7.2. DEFINITIONS OF STATIONARY STOCHASTIC PROCESSES

is a stationary stochastic process. If $\mathcal{E}v_t = \nu$ and $\operatorname{Var} v_t = \sigma^2$, then

(25) $\mathcal{E}y_t = \nu(\alpha_0 + \alpha_1 + \cdots + \alpha_q),$

(26) $\operatorname{Cov}(y_t, y_{t+s}) = \sigma^2(\alpha_0\alpha_s + \cdots + \alpha_{q-s}\alpha_q), \quad s = 0, \ldots, q; \qquad = 0, \quad s = q + 1, \ldots,$

and $\{y_t\}$ is stationary in the wide sense. For $\{y_t\}$ defined by (24) to be stationary in the wide sense the only conditions necessary on the $v_t$'s are that they have the same mean and variance and be mutually uncorrelated. We can also define infinite moving averages. The infinite sum

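(27) $\sum_{s=0}^{\infty} \alpha_sv_{t-s}$

means the random variable, say $y_t$, if it exists, such that

(28) $\lim_{n \to \infty} \mathcal{E}\left(y_t - \sum_{s=0}^{n} \alpha_sv_{t-s}\right)^2 = 0;$

if

(29) $\sum_{s=0}^{\infty} \alpha_s^2 < \infty,$

such a random variable exists (Corollary 7.6.1). Then (27) is said to converge in the mean or in quadratic mean. The doubly infinite sum

(30) $\sum_{s=-\infty}^{\infty} \alpha_sv_{t-s}$

is defined similarly as the random variable $y_t$ such that

(31) $\lim_{m, n \to \infty} \mathcal{E}\left(y_t - \sum_{s=-m}^{n} \alpha_sv_{t-s}\right)^2 = 0;$

a sufficient condition is

(32) $\sum_{s=-\infty}^{\infty} \alpha_s^2 < \infty.$

The conditions (29) and (32) are sufficient if the $v_t$'s are uncorrelated with common mean and variance, but are not necessarily independent.

Example 7.5. Autoregressive process. In Chapter 5 we considered a process satisfying a stochastic difference equation

(33) $\sum_{r=0}^{p} \beta_ry_{t-r} = u_t,$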

CH. 7. STATIONARY STOCHASTIC PROCESSES

where the $u_t$'s are independently and identically distributed with mean 0. If the roots of the associated polynomial equation

$$\sum_{r=0}^{p} \beta_r x^{p-r} = 0 \tag{34}$$

are less than 1 in absolute value, (33) can be inverted to yield

$$y_t = \sum_{s=0}^{\infty} \delta_s u_{t-s}, \qquad t = \dots, -1, 0, 1, \dots, \tag{35}$$

where the right-hand side converges in the mean. Thus $\{y_t\}$ is stationary in the strict sense. If the $u_t$'s are uncorrelated and have common mean and variance (but are not necessarily independently and identically distributed), the process is stationary in the wide sense. It was shown in Section 5.2 that the covariance sequence satisfies the pure difference equation

$$\sum_{r=0}^{p} \beta_r \sigma(s - r) = 0, \qquad s = 1, 2, \dots. \tag{36}$$

If the roots of the associated polynomial equation (34) are distinct, say $x_1, \dots, x_p$, then

$$\sigma(h) = \sum_{j=1}^{p} c_j x_j^{h}, \qquad h = 1 - p, 2 - p, \dots, -1, 0, 1, \dots, \tag{37}$$

and the constants $c_1, \dots, c_p$ must satisfy

$h = 1, \dots, p - 1,$

where $\mathcal{E}u_t^2 = \sigma^2$.
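The inversion of the difference equation into a one-sided moving average, and the pure difference equation satisfied by the covariances, can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the author's procedure: the function names and the coefficients in `beta` are hypothetical, the $u_t$ are taken to be uncorrelated with mean zero and variance `sigma2`, and the infinite sum is truncated after `n` terms.

```python
def ar_weights(beta, n):
    """First n weights delta_s in y_t = sum_s delta_s * u_{t-s}.

    Matching coefficients of u_{t-s} in sum_r beta[r] * y_{t-r} = u_t gives
    sum_{r=0}^{min(s,p)} beta[r] * delta[s-r] = 1 if s == 0 else 0.
    """
    p = len(beta) - 1
    delta = []
    for s in range(n):
        acc = 1.0 if s == 0 else 0.0
        for r in range(1, min(s, p) + 1):
            acc -= beta[r] * delta[s - r]
        delta.append(acc / beta[0])
    return delta


def ar_autocov(beta, sigma2, h, n=500):
    """Truncated covariance sigma(h) = sigma2 * sum_s delta_s * delta_{s+|h|}."""
    delta = ar_weights(beta, n)
    h = abs(h)
    return sigma2 * sum(delta[s] * delta[s + h] for s in range(n - h))


# Hypothetical coefficients: x**2 - 0.5*x + 0.06 = 0 has roots 0.2 and 0.3,
# both less than 1 in absolute value.
beta = [1.0, -0.5, 0.06]
for s in (1, 2, 3):
    residual = sum(beta[r] * ar_autocov(beta, 1.0, s - r) for r in range(len(beta)))
    print(s, residual)  # should be numerically close to zero
```

For a first-order case with `beta = [1.0, -0.5]` the weights are the powers of 0.5, so the truncated covariances can be compared with the familiar geometric form.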

Example 7.6. Autoregressive process with moving average residual. The two previous models can be combined; that is, we can take a moving average as the disturbance $u_t$ of the stochastic difference equation. The model is

$$\sum_{r=0}^{p} \beta_r y_{t-r} = \sum_{j=0}^{q} \alpha_j v_{t-j}, \tag{40}$$

where the $v_t$'s are independently and identically distributed. Then $\{y_t\}$ is stationary in the strict sense. If the $v_t$'s are uncorrelated and have common mean and variance, then $\{y_t\}$ is stationary in the wide sense.
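As a rough numerical companion to the combined model, the process can be written as a one-sided moving average in the innovations and its covariances computed from the weights. This is a sketch under stated assumptions, not a construction taken from the text: the names `arma_weights` and `arma_autocov`, the coefficient values, and the truncation length `n` are hypothetical, and the roots of the autoregressive polynomial are assumed to be less than 1 in absolute value.

```python
def arma_weights(beta, alpha, n):
    """First n weights psi_s in y_t = sum_s psi_s * v_{t-s}.

    Matching coefficients of v_{t-s} on both sides of
    sum_r beta[r] * y_{t-r} = sum_j alpha[j] * v_{t-j} gives
    sum_{r=0}^{min(s,p)} beta[r] * psi[s-r] = alpha[s] (zero for s > q).
    """
    p, q = len(beta) - 1, len(alpha) - 1
    psi = []
    for s in range(n):
        acc = alpha[s] if s <= q else 0.0
        for r in range(1, min(s, p) + 1):
            acc -= beta[r] * psi[s - r]
        psi.append(acc / beta[0])
    return psi


def arma_autocov(beta, alpha, sigma2, h, n=500):
    """Truncated covariance sigma2 * sum_s psi_s * psi_{s+|h|}."""
    psi = arma_weights(beta, alpha, n)
    h = abs(h)
    return sigma2 * sum(psi[s] * psi[s + h] for s in range(n - h))


# Hypothetical first-order example: y_t - 0.5*y_{t-1} = v_t + 0.4*v_{t-1}.
beta, alpha = [1.0, -0.5], [1.0, 0.4]
print([arma_autocov(beta, alpha, 1.0, h) for h in range(4)])
```

With `beta = [1.0]` the weights reduce to `alpha`, so the same function reproduces the finite moving average case.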


Example 7.7. Autoregressive process with superimposed error. Let $y_t$ satisfy the stochastic difference equation (33) and define

$$z_t = y_t + w_t, \tag{41}$$

where the $w_t$'s are independently and identically distributed, independently of the $y_t$'s. If $\{y_t\}$ is stationary in the strict sense, the process $\{z_t\}$ is stationary in the strict sense. If $\{y_t\}$ is stationary in the wide sense and the $w_t$'s are only uncorrelated with common mean and variance, $\{z_t\}$ is stationary in the wide sense.

Example 7.8. An extreme example of a process that is stationary in the wide sense but not in the strict sense is

$$y_t = \cos tw, \qquad t = 1, 2, \dots, \tag{42}$$

where $w$ is uniformly distributed in the interval $(0, 2\pi)$. Then $\mathcal{E}y_t = 0$, $\mathcal{E}y_t^2 = \tfrac{1}{2}$, and $\mathcal{E}y_t y_s = 0$, $t \neq s$. These random variables are uncorrelated although dependent functionally and statistically.
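The moments quoted in this example, and the failure of strict stationarity, can be checked by averaging over $w$. The sketch below is illustrative only: the helper names are hypothetical, and the choice of the third-order moment $\mathcal{E}y_t^2 y_{t+1}$ as a witness is an assumption of this illustration, not something stated in the text. A strictly stationary process would give the same value of that moment for every $t$.

```python
import math


def mean_over_w(f, n=4096):
    """Average f(w) for w uniform on (0, 2*pi) using the midpoint rule.

    For trigonometric polynomials of degree below n the rule is exact
    up to rounding error.
    """
    return sum(f(2.0 * math.pi * (k + 0.5) / n) for k in range(n)) / n


def y(t, w):
    """The process of the example: y_t = cos(t * w)."""
    return math.cos(t * w)


def second_moment(t, s):
    """E[y_t * y_s]; depends only on whether t equals s."""
    return mean_over_w(lambda w: y(t, w) * y(s, w))


def third_moment(t):
    """E[y_t**2 * y_{t+1}]; varies with t, so the process is not strictly stationary."""
    return mean_over_w(lambda w: y(t, w) ** 2 * y(t + 1, w))


print(mean_over_w(lambda w: y(3, w)))            # mean
print(second_moment(2, 2), second_moment(2, 5))  # variance and a covariance
print(third_moment(1), third_moment(2))          # differ between t = 1 and t = 2
```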

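Example 7.9. Another process stationary in the wide sense related to Example 7.3 is

$$y_t = \sum_{j=1}^{q} \sigma_j \sqrt{2} \cos(\lambda_j t - u_j), \tag{43}$$

where $u_1, \dots, u_q$ are independently uniformly distributed in the interval $(0, 2\pi)$. Then $\mathcal{E}y_t = 0$ and

$$\begin{aligned}
\mathcal{E}y_t y_s &= 2 \sum_{i,j=1}^{q} \sigma_i \sigma_j \, \mathcal{E}\cos(\lambda_i t - u_i) \cos(\lambda_j s - u_j) \\
&= 2 \sum_{j=1}^{q} \sigma_j^2 \int_0^{2\pi} \cos(\lambda_j t - u_j) \cos(\lambda_j s - u_j) \, du_j/(2\pi) \\
&= 2 \sum_{j=1}^{q} \sigma_j^2 \int_0^{2\pi} (\cos \lambda_j t \cos u_j + \sin \lambda_j t \sin u_j)(\cos \lambda_j s \cos u_j + \sin \lambda_j s \sin u_j) \, du_j/(2\pi) \\
&= \sum_{j=1}^{q} \sigma_j^2 \cos \lambda_j (t - s).
\end{aligned} \tag{44}$$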
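The covariance of this sum of cosines with independent uniform phases depends on $t$ and $s$ only through $t - s$, which can be confirmed by averaging the product $y_t y_s$ over a grid of phases. The following sketch is an illustration under assumptions: the function names and the amplitudes and frequencies are hypothetical, and the closed form used for comparison, $\sum_j \sigma_j^2 \cos \lambda_j (t - s)$, follows from the product-to-sum identity for cosines.

```python
import math
from itertools import product


def cov_by_quadrature(sigmas, lambdas, t, s, n=16):
    """E[y_t * y_s] by averaging over a midpoint grid of phases in (0, 2*pi).

    The integrand has degree at most 2 in each phase, so n = 16 points
    per phase give the average exactly up to rounding error.
    """
    q = len(sigmas)
    grid = [2.0 * math.pi * (k + 0.5) / n for k in range(n)]

    def y(time, u):
        return sum(sig * math.sqrt(2.0) * math.cos(lam * time - uj)
                   for sig, lam, uj in zip(sigmas, lambdas, u))

    total = sum(y(t, u) * y(s, u) for u in product(grid, repeat=q))
    return total / n ** q


def cov_formula(sigmas, lambdas, t, s):
    """Closed form: sum_j sigma_j**2 * cos(lambda_j * (t - s))."""
    return sum(sig ** 2 * math.cos(lam * (t - s))
               for sig, lam in zip(sigmas, lambdas))


sigmas, lambdas = [1.0, 0.5], [0.7, 2.1]  # hypothetical values, q = 2
for t, s in [(3, 1), (10, 8), (5, 5)]:
    print(t, s, cov_by_quadrature(sigmas, lambdas, t, s),
          cov_formula(sigmas, lambdas, t, s))
```

The pairs (3, 1) and (10, 8) share the lag 2 and therefore give the same covariance.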

u;,}. We define Ιτ(λ) as 1

(44)

2πι

- à Σ*^""~"

Then (45)

£Ιτ(λ)

= —

27Г

У