English Pages 255 [246] Year 2005
Springer Finance
Springer Finance is a new programme of books aimed at students, academics and practitioners working on increasingly technical approaches to the analysis of financial markets. It aims to cover a variety of topics, not only mathematical finance but foreign exchanges, term structure, risk management, portfolio theory, equity derivatives, and financial economics.
Risk-Neutral Valuation: Pricing and Hedging of Financial Derivatives. Bingham, N. H. and Kiesel, R. ISBN 1-85233-001-5 (1998)
Visual Explorations in Finance with Self-Organizing Maps. Deboeck, G. and Kohonen, T. (Editors). ISBN 3-540-76266-3 (1998)
Mathematical Models of Financial Derivatives. Kwok, Y.-K. ISBN 3-981-3083-25-5 (1998)
Mathematics of Financial Markets. Elliott, R. J. and Kopp, P. E. ISBN 0-387-98533-0 (1999)
Efficient Methods for Valuing Interest Rate Derivatives. Pelsser, A. ISBN 1-85233-304-9 (2000)
Credit Risk Valuation: Methods, Models and Applications. Ammann, M. ISBN 3-540-67805-0 (2001)
Credit Risk: Modelling, Valuation and Hedging. Bielecki, T. R. and Rutkowski, M. ISBN 3-540-67593-0 (2001)
Mathematical Finance – Bachelier Congress 2000 – Selected Papers from the First World Congress of the Bachelier Finance Society, held in Paris, June 29 – July 1, 2000. Geman, H., Madan, D. S., Pliska, R. and Vorst, T. (Editors). ISBN 3-540-67781-X (2001)
Exponential Functionals of Brownian Motion and Related Processes. Yor, M. ISBN 3-540-65943-9 (2001)
Financial Markets Theory: Equilibrium, Efficiency and Information. Barucci, E. ISBN 3-85233-469-X (2003)
Financial Markets in Continuous Time. Dana, R.-A. and Jeanblanc, M. ISBN 3-540-41722-9 (2003)
Weak Convergence of Financial Markets. Prigent, J.-L. ISBN 3-540-4233-8 (2003)
Incomplete Information and Heterogeneous Beliefs in Continuous-time Finance. Ziegler, A. ISBN 3-540-00344-4 (2003)
Stochastic Calculus Models for Finance: Volume 1: The Binomial Asset Pricing Model. Shreve, S. E. ISBN 3-540-40101-6 (2004)
Irrational Exuberance Reconsidered: The Cross Section of Stock Returns. Külpmann, M. ISBN 3-540-14007-7 (2004)
Credit Risk Pricing Models: Theory and Practice. Schmid, B. ISBN 3-540-40466-X (2004)
Empirical Techniques in Finance. Bhar, R. and Hamori, S. ISBN 3-540-25123-5 (2005)
Ramaprasad Bhar Shigeyuki Hamori
Empirical Techniques in Finance With 30 Figures and 30 Tables
Professor Ramaprasad Bhar School of Banking and Finance The University of New South Wales Sydney 2052 Australia E-mail: [email protected] Professor Shigeyuki Hamori Graduate School of Economics Kobe University Rokkodai, Nada-Ku, Kobe 657-8501 Japan E-mail: [email protected]
Mathematics Subject Classification (2000): 62-02, 62-07
Cataloging-in-Publication Data Library of Congress Control Number: 2005924539
ISBN 3-540-25123-5 Springer Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springeronline.com © Springer-Verlag Berlin Heidelberg 2005 Printed in Germany The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Cover design: design & production Production: Helmut Petri Printing: Strauss Offsetdruck SPIN 11401841
Printed on acid-free paper – 43/3153 – 5 4 3 2 1 0
To Rajiv, Mitra, Hitoshi, Makoto and Naoko
Acknowledgements
We have benefited greatly from the support of many people in writing this volume. Special thanks are due to Martina Bihn for excellent editorial guidance. We would like to thank the students at the University of New South Wales and Kobe University for many helpful comments and suggestions. Finally, we would like to thank our family members, Rajiv, Mitra, Hitoshi, Makoto and Naoko. Without their warm-hearted support, we could not have finished writing this volume. Our research is in part supported by a grant-in-aid from the Japan Society for the Promotion of Science. Sydney, Australia Kobe, Japan
Ramaprasad Bhar Shigeyuki Hamori
Table of Contents
1 Introduction
2 Basic Probability Theory and Markov Chains
2.1 Random Variables
2.2 Function of Random Variable
2.3 Normal Random Variable
2.4 Lognormal Random Variable
2.5 Markov Chains
2.6 Passage Time
2.7 Examples and Exercises
References
3 Estimation Techniques
3.1 Models, Parameters and Likelihood - An Overview
3.2 Maximum Likelihood Estimation and Covariance Matrix of Parameters
3.3 MLE Example - Classical Linear Regression
3.4 Dependent Observations
3.5 Prediction Error Decomposition
3.6 Serially Correlated Errors - Overview
3.7 Constrained Optimization and the Covariance Matrix
3.8 Examples and Exercises
References
4 Non-Parametric Method of Estimation
4.1 Background
4.2 Non-Parametric Approach
4.3 Kernel Regression
4.4 Illustration 1 (EViews)
4.5 Optimal Bandwidth Selection
4.6 Illustration 2 (EViews)
4.7 Examples and Exercises
References
5 Unit Root, Cointegration and Related Issues
5.1 Stationary Process
5.2 Unit Root
5.3 Dickey-Fuller Test
5.4 Cointegration
5.5 Residual-based Cointegration Test
5.6 Unit Root in a Regression Model
5.7 Application to Stock Markets
References
6 VAR Modeling
6.1 Stationary Process
6.2 Granger Causality
6.3 Cointegration and Error Correction
6.4 Johansen Test
6.5 LA-VAR
6.6 Application to Stock Prices
References
7 Time Varying Volatility Models
7.1 Background
7.2 ARCH and GARCH Models
7.3 TGARCH and EGARCH Models
7.4 Causality-in-Variance Approach
7.5 Information Flow between Price Change and Trading Volume
References
8 State-Space Models (I)
8.1 Background
8.2 Classical Regression
8.3 Important Time Series Processes
8.4 Recursive Least Squares
8.5 State-Space Representation
8.6 Examples and Exercises
References
9 State-Space Models (II)
9.1 Likelihood Function Maximization
9.2 EM Algorithm
9.3 Time Varying Parameters and Changing Conditional Variance (EViews)
9.4 GARCH and Stochastic Variance Model for Exchange Rate (EViews)
9.5 Examples and Exercises
References
10 Discrete Time Real Asset Valuation Model
10.1 Asset Price Basics
10.2 Mining Project Background
10.3 Example 1
10.4 Example 2
10.5 Example 3
10.6 Example 4
Appendix
References
11 Discrete Time Model of Interest Rate
11.1 Preliminaries of Short Rate Lattice
11.2 Forward Recursion for Lattice and Elementary Price
11.3 Matching the Current Term Structure
11.4 Immunization: Application of Short Rate Lattice
11.5 Valuing Callable Bond
11.6 Exercises
References
12 Global Bubbles in Stock Markets and Linkages
12.1 Introduction
12.2 Speculative Bubbles
12.3 Review of Key Empirical Papers
12.4 New Contribution
12.5 Global Stock Market Integration
12.6 Dynamic Linear Models for Bubble Solutions
12.7 Dynamic Linear Models for No-Bubble Solutions
12.8 Subset VAR for Linkages between Markets
12.9 Results and Discussions
12.10 Summary
References
13 Forward FX Market and the Risk Premium
13.1 Introduction
13.2 Alternative Approach to Model Risk Premia
13.3 The Proposed Model
13.4 State-Space Framework
13.5 Brief Description of Wolff/Cheung Model
13.6 Application of the Model and Data Description
13.7 Summary and Conclusions
Appendix
References
14 Equity Risk Premia from Derivative Prices
14.1 Introduction
14.2 The Theory behind the Modeling Framework
14.3 The Continuous Time State-Space Framework
14.4 Setting Up The Filtering Framework
14.5 The Data Set
14.6 Estimation Results
14.7 Summary and Conclusions
References
Index
About the Authors
1 Introduction
This book offers the opportunity to study and experience advanced empirical techniques in finance and in financial economics generally. It is not only suitable for students with an interest in the field; it is also highly recommended for academic researchers as well as researchers in industry. The book focuses on the contemporary empirical techniques used in the analysis of financial markets and on how these are implemented using actual market data. With an emphasis on implementation, the book concentrates on strategies for rigorously combining finance theory and modeling technology to extend existing work in the literature. The main aim of this book is to equip readers with an array of tools and techniques that will allow them to explore financial market problems from a fresh perspective. In this sense it is not another volume on econometrics. Of course, the traditional econometric methods are still valid and important; this book brings in other related modeling topics that support a more in-depth exploration of finance theory and its application in practice. As the analysis of derivatives shows, modern finance theory requires a sophisticated understanding of stochastic processes. Actual data analysis also requires new statistical tools that can address the unique aspects of financial data. To meet these new demands, this book explains diverse modeling approaches with an emphasis on their application in finance. This book has been written for anyone with a general knowledge of the finance discipline, an interest in its principles and a good mathematical aptitude. Throughout the book, we have therefore focused more on presenting a comprehensible discussion than on the rigors of mathematical derivation. We have also made extensive use of actual data to promote understanding of the topics, and we have used standard software tools and packages to implement the various algorithms.
Readers with a computer programming orientation will benefit greatly from the available program code. We have illustrated the implementation of various algorithms using contemporary data as appropriate, working in the Excel spreadsheet (Microsoft Corporation), EViews (Quantitative Micro Software) or GAUSS (Aptech Systems Inc.) environments. The program code and data will be made available through one of the authors' websites (www.bhar.id.au), with appropriate reference to the chapters in the book. We have implemented the routines using the software package versions currently in use, so that most readers should be able to experiment with them almost immediately. We sincerely hope that readers will use this code, extend its capabilities and thus contribute to the empirical finance field in future.¹

Besides this introductory chapter, the book comprises thirteen other chapters, which are briefly described below. Chapter 2 reviews the basic probability and statistical techniques commonly used in quantitative finance. It also briefly covers Markov chains and the concept of first passage time. Chapter 3 is devoted to estimation techniques. Since most empirical models contain several unknown parameters, as suggested by the underlying theory, the issue of inferring these parameters from available data is of paramount importance; without these parameters the model is of little use in practice. In this chapter we mainly focus on the maximum likelihood approach to model estimation. We discuss different ways to specify the likelihood function, and these become useful in later chapters. We devote considerable attention to placing restrictions on model parameters. As most commercially available optimization routines automatically produce the covariance matrix of the parameters at the point of convergence, we include a careful analysis of how to translate this into the covariance matrix of the constrained parameters that are of interest to the user. This translated covariance matrix is used to make inferences about the statistical significance of the estimated parameters. In chapter 4 we cover the essential elements of non-parametric regression models and illustrate the principles with examples.
Here we make use of the kernel density function routines available in the software package EViews. Chapters 5 and 6 then review stationary and nonstationary time series models. The former chapter discusses unit roots, cointegration and related issues; the latter covers multivariate time series models such as VAR, VECM and LA-VAR. Chapter 7 reviews time varying volatility models, such as ARCH, GARCH, T-GARCH and E-GARCH. Since these models have dominated the literature over the last several years, we emphasize applications rather than theory. We extend this topic to include causality in variance and demonstrate its efficacy with an application to commodity futures contracts. Chapter 8 is devoted to explaining state-space models and their application to several time series. We have attempted to demystify the concepts underlying unobserved components in a dynamical system and to show how these can be inferred by applying the filtering algorithm. The filtering algorithm and its various refinements dominate the engineering literature. In this book we restrict ourselves to the problem settings most familiar to researchers in finance and economics. Some of the examples in this chapter make use of the limited facility available in EViews to estimate such models. In chapter 9 we take up the estimation of state-space models in greater detail, by maximizing the prediction error decomposition form of the likelihood function. This analysis gives the reader insight into how the various model parameters feed into the adaptive filtering algorithm and thus constitute the likelihood function. It becomes clear that for any reasonable practical system the likelihood function is highly non-linear in the model parameters, so its optimization is a very complex problem. Although standard numerical optimization techniques work in most situations, there are cases where an alternative method based on Expectation Maximization (EM) is preferable. For the benefit of readers we give a complete GAUSS program to implement the EM algorithm. We sincerely hope that readers will experiment with this code structure to enhance their understanding of this very important algorithm. In the next two chapters, 10 and 11, we move away from state-space systems and take up the practical issues in modeling stochastic processes in a discrete time framework. We will, however, return to more challenging modeling exercises using state-space models in the succeeding chapters.

¹ Datastream is a trademark of THOMSON FINANCIAL. EViews is a trademark of Quantitative Micro Software. Excel is a trademark of Microsoft Corporation. GAUSS is a trademark of Aptech Systems, Inc.
In chapter 10 we discuss the discrete-time stochastic nature of real asset problems. This approach is suitable for resource-based valuation exercises where there might be several embedded options available to the investor. After describing the basic issues in financial options valuation and real asset options valuation, we illustrate the approach using a mining problem. We structure the development in this chapter and the next following the excellent book by D. G. Luenberger on Investment Science (Luenberger DG (1997) Investment science. Oxford University Press, New York). We do, however, add our own interpretation of the issues as well as other illustrations. In addition, we include the relevant implementations in Excel spreadsheets. In chapter 11, we maintain the same discrete time theme and take up the issues in modeling interest rates and securities contingent on the term structure of interest rates. We explain the elegant algorithm of forward recursion in a fashion similar to that in D. G. Luenberger's Investment Science. There are several illustrations as well as the spreadsheets of the implementation. We hope that readers will take full advantage of these spreadsheets and develop them further to suit their own research needs. In chapter 12 we highlight recent advances in inferring the speculative component in aggregate equity prices. The topic area of this chapter should be very familiar to most researchers and students in finance. However, it may not be as obvious how to extend the standard present value models to infer the unobserved speculative price component. The implementation of the models in this chapter relies on an understanding of the contents of chapters 8 and 9. We not only infer the speculative components; we extend the analysis to investigate whether these components are related across different markets. The last two chapters, 13 and 14, deal with the important issue of the risk premium in asset prices. Chapter 13 covers foreign exchange prices and chapter 14 deals with the equity market risk premium. Both of these chapters require some understanding of stochastic processes. Most students and researchers will be familiar with estimating the risk premium in a regression-based approach, which makes it a backward-looking estimation process. We, however, exploit the rich theoretical structures in the derivatives market that connect the probability distribution in the risk-neutral world to that in the real world. This leads to a convenient mechanism, with a minimum of assumptions, for uncovering the market's belief about the likely risk premium in these markets. Since the methodology is based on derivative securities, the inferred risk premium is necessarily forward-looking. These two chapters rely entirely on the unobserved component modeling framework introduced in chapters 8 and 9.
The associated GAUSS programs would be of immense benefit to all readers and would help them enhance their skills in this area as well.
2 Basic Probability Theory and Markov Chains
2.1 Random Variables

Suppose X is a random variable that can take on any one of a finite number of specific values, e.g., $x_1, x_2, \ldots, x_n$. Also assume that there is a probability $p_i$ that represents the relative chance of an occurrence of $x_i$. In this case, $p_i$ satisfies $\sum_{i=1}^{n} p_i = 1$ and $p_i \geq 0$ for each i. Each $p_i$ can be thought of as the relative frequency with which $x_i$ will occur when an experiment to observe X is carried out a large number of times. If the outcome variable can take on any real value in an interval, e.g., the temperature of a room, then a probability density function $p(\xi)$ describes the probability. Since the random variable can take on a continuum of values, the probability density function has the following interpretation:
$$p(\xi)\,d\xi = \text{prob.}(\xi \le X \le \xi + d\xi).$$
Equations (2.28) can be solved by any of several well-known methods. The solution answers our question about the distribution of passengers in the long run. The implications of these equations are very important in practice: they reveal the states, and the probabilities, towards which a process will tend. Thus, if we compare two approaches that lead to two different Markov chains, we can study the long-run effects of these methods. We next consider a more efficient solution approach (Kim and Nelson 1999) to a system of equations such as that given in (2.28). From equation (2.27) we obtain

$$(I_M - P)\,\pi = 0_M, \tag{2.29}$$

where $I_M$ is the identity matrix of order M (i.e., the order of the probability transition matrix P) and $0_M$ is the M × 1 vector of zeros. The condition that the steady-state probabilities sum to one is captured by

$$\mathbf{1}_M'\,\pi = 1, \tag{2.30}$$
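where $\mathbf{1}_M = [1\ 1\ 1\ \cdots\ 1]'$. Combining equations (2.29) and (2.30), we get

$$\begin{bmatrix} I_M - P \\ \mathbf{1}_M' \end{bmatrix} \pi = \begin{bmatrix} 0_M \\ 1 \end{bmatrix}, \tag{2.31}$$

or, writing A for the (M + 1) × M matrix on the left-hand side,

$$A\pi = \begin{bmatrix} 0_M \\ 1 \end{bmatrix}. \tag{2.32}$$

By multiplying both sides of (2.32) by $(A'A)^{-1}A'$ we obtain

$$\pi = (A'A)^{-1}A' \begin{bmatrix} 0_M \\ 1 \end{bmatrix}. \tag{2.33}$$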
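As a minimal sketch of how (2.33) can be evaluated outside Excel or GAUSS, the Python/NumPy function below builds the stacked matrix A and solves the normal equations. It assumes, for illustration, that P is arranged so that each row sums to one (as in the switching matrix (2.38) below); the balance condition π' = π'P is then entered as $I_M - P'$. If P is arranged column-wise instead, drop the transpose. The function name `steady_state` is hypothetical.

```python
import numpy as np


def steady_state(P):
    """Steady-state probabilities pi = (A'A)^{-1} A' [0_M; 1], as in (2.33).

    Assumes P is row-stochastic (each row sums to one), so the balance
    condition pi' = pi' P enters the stacked matrix A as I_M - P'.
    """
    P = np.asarray(P, dtype=float)
    M = P.shape[0]
    # A stacks the balance equations on top of the adding-up row 1_M'
    A = np.vstack([np.eye(M) - P.T, np.ones((1, M))])
    # Right-hand side [0_M; 1]
    rhs = np.zeros(M + 1)
    rhs[-1] = 1.0
    # Solve (A'A) pi = A' rhs rather than forming the inverse explicitly
    return np.linalg.solve(A.T @ A, A.T @ rhs)


# Switching matrix (2.38) from the hedge fund example
P = [[0.1, 0.3, 0.6],
     [0.1, 0.5, 0.4],
     [0.1, 0.3, 0.6]]
print(steady_state(P))
```

Using `np.linalg.solve` on $A'A$ gives the same π as the explicit inverse in (2.33) but is numerically better behaved.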
2.6 Passage Time

Economists are often required to ascertain the time needed to reach a particular state. Given a wealth process, for example, how long will it take to reach the bankrupt state, i.e., the state without any wealth? Assume that we are in state i and let $f_{ij}(n)$ be the probability of a first transition from state i to state j in n steps. This is the probability of reaching state j at step n without having passed through state j in the prior transitions. For a transition in one step, the transition matrix gives this probability. For a transition in two steps, it equals the probability of reaching state j in two steps without having reached it in one step. That is, it equals the two-step transition probability less the probability of a first passage in one step multiplied by the probability of remaining in state j for the next step. This can be represented by
$$f_{ij}(2) = p_{ij}(2) - f_{ij}(1)\,p_{jj}, \tag{2.34}$$

and in general,

$$f_{ij}(n) = p_{ij}(n) - \left[f_{ij}(1)\,p_{jj}(n-1) + f_{ij}(2)\,p_{jj}(n-2) + \cdots + f_{ij}(n-1)\,p_{jj}\right]. \tag{2.35}$$
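The recursion (2.35) translates directly into code. The sketch below (Python with NumPy; `first_passage_probs` is a hypothetical helper, and states are indexed from 0 rather than 1) computes $f_{ij}(n)$ from the n-step matrices $P^n$, whose (i, j) entries are the n-step probabilities $p_{ij}(n)$, assuming a row-stochastic P.

```python
import numpy as np


def first_passage_probs(P, i, j, n_max):
    """Return [f_ij(1), ..., f_ij(n_max)] via recursion (2.35).

    States are 0-indexed. p_ij(n) is the (i, j) entry of the matrix power P^n.
    """
    P = np.asarray(P, dtype=float)
    powers = [P]                       # powers[m - 1] holds P^m
    for _ in range(n_max - 1):
        powers.append(powers[-1] @ P)
    f = np.zeros(n_max + 1)            # f[0] is unused
    for n in range(1, n_max + 1):
        # paths that first hit j at step k < n and are at j again after n - k more steps
        earlier = sum(f[k] * powers[n - k - 1][j, j] for k in range(1, n))
        f[n] = powers[n - 1][i, j] - earlier
    return f[1:]


P = [[0.1, 0.3, 0.6],
     [0.1, 0.5, 0.4],
     [0.1, 0.3, 0.6]]
# First passage from fund 2 (index 1) to fund 1 (index 0)
print(first_passage_probs(P, 1, 0, 5))
```

With the switching matrix (2.38), every row has 0.1 in the first column, so the first passage probabilities into fund 1 come out geometric, $0.1 \times 0.9^{n-1}$.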
When the states i and j communicate (i.e., when it is possible to switch from state i to state j in a finite number of transitions), we can compute the expectation of this passage time. This expectation is defined by

$$\mu_{ij} = \sum_{n=0}^{\infty} n\, f_{ij}(n). \tag{2.36}$$

With some further analysis we can show that the mean first passage time can be obtained by solving the set of equations given by

$$\mu_{ij} = 1 + \sum_{k} p_{ik}\,\mu_{kj}. \tag{2.37}$$

Note here that if k = j, then $\mu_{kj} = \mu_{jj} = 0$.
Now consider the Hedge Fund Market Share example described by Tapeiro (2000). The current market positions of a hedge fund and its two main competitors are 12%, 40% and 48%, respectively. Based on industry data, clients switch from one fund to another for various reasons (fund performance falls below expectations, etc.). Here, the fund switching matrix is estimated to be

$$P = \begin{bmatrix} 0.1 & 0.3 & 0.6 \\ 0.1 & 0.5 & 0.4 \\ 0.1 & 0.3 & 0.6 \end{bmatrix}. \tag{2.38}$$
Our aim is to find the mean first passage time for clients in funds 2 and 3 to fund 1. This is given by the following system of equations from (2.37):

$$\mu_{21} = 1 + \mu_{21} \times 0.5 + \mu_{31} \times 0.4, \qquad \mu_{31} = 1 + \mu_{21} \times 0.3 + \mu_{31} \times 0.6. \tag{2.39}$$

Solving these, we get $\mu_{21} = \mu_{31} = 10$ months. Other interesting applications of the first passage time include calculating the time to bankruptcy, the first time cash attains a given level, etc. Continuing with the same example of hedge funds, determine the long-term market share of each of these funds.
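The system (2.39) is small enough to solve by hand, but the same calculation works for any target state j: deleting row and column j from P gives a matrix Q, and (2.37) with $\mu_{jj} = 0$ becomes $(I - Q)\mu = \mathbf{1}$. A minimal NumPy sketch, with the hypothetical function name `mean_first_passage` and 0-based state indices:

```python
import numpy as np


def mean_first_passage(P, j):
    """Mean first passage times mu_kj into state j from every state k != j.

    Implements (2.37) with mu_jj = 0: mu_kj = 1 + sum_{l != j} p_kl mu_lj,
    i.e. (I - Q) mu = 1 with Q equal to P minus row and column j.
    States are 0-indexed.
    """
    P = np.asarray(P, dtype=float)
    others = [k for k in range(P.shape[0]) if k != j]
    Q = P[np.ix_(others, others)]
    mu = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    return dict(zip(others, mu))


P = [[0.1, 0.3, 0.6],
     [0.1, 0.5, 0.4],
     [0.1, 0.3, 0.6]]
# Funds 2 and 3 (indices 1 and 2) to fund 1 (index 0), as in (2.39)
print(mean_first_passage(P, 0))
```

For the matrix in (2.38) this should reproduce $\mu_{21} = \mu_{31} = 10$ up to floating-point rounding.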
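2.7 Examples and Exercises

Example 2.1: Suppose that x is a continuous random variable with density p(x), and let $E[x] = \mu$. Under these conditions, show that $\mathrm{Var}[x] = E[x^2] - \mu^2$.

$$\begin{aligned} \mathrm{Var}[x] &= E[(x-\mu)^2] = E[x^2 - 2\mu x + \mu^2] = \int_{-\infty}^{\infty} (x^2 - 2\mu x + \mu^2)\,p(x)\,dx \\ &= \int_{-\infty}^{\infty} x^2 p(x)\,dx - 2\mu \int_{-\infty}^{\infty} x\,p(x)\,dx + \mu^2 \int_{-\infty}^{\infty} p(x)\,dx \\ &= E[x^2] - 2\mu\mu + \mu^2 = E[x^2] - \mu^2. \end{aligned} \tag{2.40}$$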
Example 2.2: Given a normally distributed variable x with mean $\mu$ and variance $\sigma^2$, find the mean and variance of the random variable defined by $y = a + bx$, where a and b are constants (b ≠ 0). The density of y is

$$h(y) = \frac{1}{|b|}\, p\!\left(\frac{y-a}{b}\right) = \frac{1}{|b|\,\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y-(a+b\mu)}{b\sigma}\right)^{2}\right]. \tag{2.41}$$

This shows that the random variable y has a mean of $a + b\mu$ and a variance of $b^2\sigma^2$.

Exercise 2.1: Use equation (2.33) and the transition matrix from the airline problem to compute the steady-state probabilities. Attempt this problem in Excel or GAUSS.
References
Kim C-J, Nelson CR (1999) State-space models with regime switching: classical and Gibbs-sampling approaches with applications. The MIT Press, Cambridge
Ross SM (2000) Introduction to probability models, 7th edn. Harcourt Academic Publishers, London
Tapeiro CS (2000) Applied stochastic models and control for finance and insurance. Kluwer Academic Publishers, Dordrecht
3 Estimation Techniques
3.1 Models, Parameters and Likelihood - An Overview

When we speak about the probability of observing events, we are implicitly assuming some kind of model, even in the simple case of tossing a coin. In the case of tossing a coin, the model gives a certain and fixed probability for a particular outcome. This model has one parameter, $\theta$, representing the probability that the coin will land on heads. If the coin is fair, then $\theta = 0.5$. Given specific parameter values for the model, we can speak about the probability of observing an event. In this simple case, if $\theta = 0.5$, then the probability that the coin will land on heads on any one toss is also 0.5. This simple example does not appear to provide us with very much: we merely seem to be calling what was previously a simple probability the parameter of a model. As we shall see, however, this way of thinking provides a very useful framework for expressing more complex problems. In the real world, very few things have absolute, fixed probabilities. Many of the aspects of the world with which we are familiar are not truly random. Take, for instance, the probability of becoming a millionaire. Say that the proportion of millionaires in a population is 10%. If we know nothing else about an individual, we would say that the probability of this individual becoming a millionaire is 0.10. In mathematical notation, p(M) = 0.10, where M denotes the event of being a millionaire. We know, however, that certain people are more likely than others to become millionaires. For example, a strong academic background such as an MBA may greatly increase one's chances of becoming a millionaire. The probability above is essentially an average probability, taken across all individuals both with and without an MBA. The notion of conditional probability allows us to incorporate other potentially important variables, such as an MBA, into statements about the probability of an individual becoming a millionaire.
Mathematically, we write p(X | Y), meaning the probability of X conditional on Y or given Y. In our example, we could write,
$$p(M \mid \text{with MBA}) \tag{3.1}$$

and

$$p(M \mid \text{without MBA}). \tag{3.2}$$
Whether or not these two values differ is an indication of the influence of an MBA on an individual's chances of becoming a millionaire. Now we are in a position to introduce the concept of likelihood. If the probability of an event X dependent on model parameters $\theta$ is written $p(X \mid \theta)$, then we would talk about the likelihood $L(\theta \mid X)$, that is, the likelihood of the parameters given the data. For most sensible models, we find that certain data are more probable than other data. The aim of maximum likelihood estimation is to find the parameter value(s) that make the observed data most likely. The likelihood of the parameters given the data is defined to be equal to the probability of the data given the parameters (technically, they are proportional to each other, but this does not affect the principle). If we were in the business of making predictions based on a set of solid assumptions, then we would be interested in probabilities: the probabilities of certain outcomes occurring or not occurring. In the case of data analysis, however, all the data have already been observed. Once they have been observed they are fixed; no 'probabilistic' part to them remains (the word data comes from the Latin word meaning 'given'). We are much more interested in the likelihood of the model parameters that underlie the fixed data.

Probability: knowing parameters → prediction of outcome,
Likelihood: observation of data → estimation of parameters.
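To make the distinction concrete for the coin example of section 3.1, the sketch below treats a fixed sequence of tosses as given and evaluates the log-likelihood $\ln L(\theta \mid \text{data})$ over a grid of $\theta$ values. The $\theta$ that makes the observed data most likely is the sample proportion of heads. The data and the function name are illustrative assumptions, not from the text.

```python
import numpy as np


def coin_log_likelihood(theta, tosses):
    """ln L(theta | data) for independent tosses coded 1 = heads, 0 = tails."""
    tosses = np.asarray(tosses)
    heads = tosses.sum()
    tails = tosses.size - heads
    return heads * np.log(theta) + tails * np.log(1.0 - theta)


# Hypothetical observed data: 7 heads in 10 tosses
data = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1]
grid = np.linspace(0.01, 0.99, 99)              # candidate values of theta
loglik = coin_log_likelihood(grid, data)        # vectorized over the grid
theta_hat = grid[np.argmax(loglik)]
print(theta_hat)                                # the sample share of heads, up to grid spacing
```

The data are held fixed while $\theta$ varies, which is exactly the reversal of roles summarized in the two lines above.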
3.2 Maximum Likelihood Estimation and Covariance Matrix of Parameters

A statistical model with the parameter vector $\theta$ of dimension k specifies a joint distribution for a vector of observations $y_T = [y_1, y_2, \ldots, y_T]'$:

$$\text{Joint density function: } p(y_T \mid \theta). \tag{3.3}$$
The joint density is, therefore, a function of $y_T$ given $\theta$. In econometric work, we know the $y_T$ vector, or the sample data, but we do not know the parameter vector $\theta$ of the underlying statistical model. In this sense, the joint density in equation (3.3) is a function of $\theta$ given $y_T$. We call it the likelihood function:

$$\text{Likelihood function: } L(\theta \mid y_T). \tag{3.4}$$

This is functionally equivalent to equation (3.3). Different values of $\theta$ result in different values of the likelihood function (3.4), which represents the likelihood of observing the data given the parameter vector. In the maximum likelihood method we choose the parameter estimates that maximize the probability of having generated the observed sample, which we do by maximizing the log of the likelihood function:

$$\hat\theta_{ML} = \arg\max_{\theta}\, \ln L(\theta \mid y_T). \tag{3.5}$$
Maximizing the log of the likelihood function instead of the likelihood function itself allows us to compute the covariance matrix, $\mathrm{Cov}[\hat\theta_{ML}]$, of the maximum likelihood estimate $\hat\theta_{ML}$ directly. The negative of the expected matrix of second derivatives of the log-likelihood function gives the information matrix, which summarizes the amount of information in the sample, i.e.,

$$I(\theta) = -E\left[\frac{\partial^2 \ln L(\theta \mid y_T)}{\partial\theta\,\partial\theta'}\right]. \tag{3.6}$$
The inverse of the information matrix provides the lower bound for the covariance matrix of an unbiased estimator $\hat\theta$, a result known as the Cramér-Rao inequality. The maximum likelihood estimator has been shown to have the following asymptotic distribution:

$$\sqrt{T}\,(\hat\theta_{ML} - \theta) \xrightarrow{d} N\!\left(0, H^{-1}\right), \quad \text{where} \quad -\frac{1}{T}\,E\left[\frac{\partial^2 \ln L(\theta \mid y_T)}{\partial\theta\,\partial\theta'}\right] \to H = \lim_{T\to\infty} \frac{1}{T}\, I(\theta). \tag{3.7}$$
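When the expectation in (3.6) is hard to evaluate, a common substitute is the observed information, i.e., the negative Hessian of the log-likelihood evaluated at $\hat\theta_{ML}$; its inverse then estimates $\mathrm{Cov}[\hat\theta_{ML}]$. The sketch below is an illustration under that assumption, not the book's routine: it uses a central-difference Hessian (hypothetical helper `numerical_hessian`, step size chosen ad hoc) on simulated i.i.d. normal data, for which the exact answer, diag($\sigma^2/T$, $2\sigma^4/T$), is known.

```python
import numpy as np


def numerical_hessian(f, x, h=1e-4):
    """Central-difference approximation to the Hessian of scalar f at x."""
    x = np.asarray(x, dtype=float)
    k = x.size
    steps = np.eye(k) * h
    H = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            ea, eb = steps[a], steps[b]
            H[a, b] = (f(x + ea + eb) - f(x + ea - eb)
                       - f(x - ea + eb) + f(x - ea - eb)) / (4.0 * h * h)
    return H


def normal_loglik(params, y):
    """ln L(mu, s2 | y) for i.i.d. N(mu, s2) observations."""
    mu, s2 = params
    T = y.size
    return -0.5 * T * np.log(2.0 * np.pi * s2) - np.sum((y - mu) ** 2) / (2.0 * s2)


rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=500)       # simulated data
theta_hat = np.array([y.mean(), y.var()])          # closed-form MLE of (mu, s2)
H = numerical_hessian(lambda p: normal_loglik(p, y), theta_hat)
cov = np.linalg.inv(-H)                            # inverse observed information
print(np.sqrt(np.diag(cov)))                       # asymptotic standard errors
```

The same pattern applies to any log-likelihood a numerical optimizer has maximized: evaluate the Hessian at the optimum and invert its negative.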
3.3 MLE Example - Classical Linear Regression

The classical theory of maximum likelihood estimation (MLE) is based on a situation in which each of T observations is drawn independently from the same distribution. As the observations are drawn independently, the joint density function is given by

$$L(y_1, y_2, \ldots, y_T \mid \theta) = \prod_{t=1}^{T} p(y_t; \theta), \tag{3.8}$$
where $p(y_t;\theta)$ is the probability density function of $y_t$. The models encountered in econometrics rarely conform to this pattern. Nonetheless, the main results for maximum likelihood estimators remain valid in most cases. Consider the linear regression model in matrix notation, where the disturbances are uncorrelated, each having mean zero and a constant, finite variance:

$$y = X\beta + \varepsilon, \tag{3.9}$$

where the error vector $\varepsilon = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T]'$ has the properties

$$E[\varepsilon] = 0, \qquad \mathrm{Var}[\varepsilon] = E[\varepsilon\varepsilon'] = \sigma^2 I_T. \tag{3.10}$$
Assuming the parameter vector $\theta = [\beta', \sigma^2]'$, we obtain the following log-likelihood function of the system in (3.9) with normally distributed errors:

$$\ln L(\theta) = -\frac{T}{2}\ln 2\pi - \frac{T}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta). \tag{3.11}$$
While the likelihood function in equation (3.11) can be maximized numerically, in this case we can show that

$$\hat\beta = (X'X)^{-1}X'y \quad \text{and} \quad \hat\sigma^2 = \frac{(y - X\hat\beta)'(y - X\hat\beta)}{T}. \tag{3.12}$$
To find the covariance matrix we need the information matrix, which in this case turns out to be

$$I(\theta) = \begin{bmatrix} \sigma^{-2} X'X & 0 \\ 0 & T/(2\sigma^4) \end{bmatrix}. \tag{3.13}$$
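A minimal NumPy sketch of (3.12) and (3.13), assuming simulated data (the coefficients 0.5 and 2.0 and the noise scale are arbitrary choices for illustration; `regression_mle` is a hypothetical name). The ML estimate of $\sigma^2$ divides the residual sum of squares by T, and inverting the block-diagonal information matrix gives $\sigma^2(X'X)^{-1}$ for $\hat\beta$ and $2\sigma^4/T$ for $\hat\sigma^2$, evaluated here at the estimates.

```python
import numpy as np


def regression_mle(X, y):
    """ML estimates (3.12) for y = X beta + eps and their covariance from (3.13)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    T = y.size
    beta = np.linalg.solve(X.T @ X, X.T @ y)       # (X'X)^{-1} X'y
    resid = y - X @ beta
    sigma2 = resid @ resid / T                     # ML divides by T, not T - k
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)     # beta block of I(theta)^{-1}
    var_sigma2 = 2.0 * sigma2 ** 2 / T             # sigma^2 block of I(theta)^{-1}
    return beta, sigma2, cov_beta, var_sigma2


rng = np.random.default_rng(1)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])          # intercept + regressor
y = X @ np.array([0.5, 2.0]) + rng.normal(scale=0.3, size=T)   # simulated data
beta, sigma2, cov_beta, var_sigma2 = regression_mle(X, y)
print(beta, np.sqrt(np.diag(cov_beta)), np.sqrt(var_sigma2))
```

Because (3.13) is block diagonal, the two blocks can be inverted separately, which is why $\hat\beta$ and $\hat\sigma^2$ are asymptotically uncorrelated.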
3.4 Dependent Observations

In time series applications the observations are typically dependent. Here we explore how to construct the likelihood function in this situation. The function cannot be written in the form of equation (3.8). Instead, we express the joint density in terms of conditional densities. When there are two observations, the joint density can be written as

$$p(y_2, y_1) = p(y_2 \mid y_1)\,p(y_1). \tag{3.14}$$

The first term on the right-hand side is the probability density function of $y_2$ conditional on observing $y_1$. Similarly, for three observations we can write

$$p(y_3, y_2, y_1) = p(y_3 \mid y_2, y_1)\,p(y_2, y_1), \tag{3.15}$$

and with the help of equation (3.14),
$$p(y_3, y_2, y_1) = p(y_3 \mid y_2, y_1)\,p(y_2 \mid y_1)\,p(y_1). \tag{3.16}$$

Thus, in general,

$$L(y_T; \theta) = \prod_{t=2}^{T} p(y_t \mid y_{t-1}, \ldots, y_1)\; p(y_1). \tag{3.17}$$
As an example, consider the first-order autoregressive model

$$y_t = \phi y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim \mathrm{NID}(0, \sigma^2). \qquad (3.18)$$
In this case, the distribution of $y_t$ conditional on $y_{t-1}$ can be specified as normal with mean $\phi y_{t-1}$ and variance $\sigma^2$. Under this interpretation, the log likelihood function can be expressed as

$$\ln L(y; \theta) = -\frac{T-1}{2}\ln 2\pi - \frac{T-1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=2}^{T}(y_t - \phi y_{t-1})^2 + \ln p(y_1). \qquad (3.19)$$
All that remains is to consider the treatment of the initial condition, as reflected by the distribution of $y_1$. When this first observation is fixed, as is often the case, it does not enter the likelihood function to be maximized, and the term $\ln p(y_1)$ can be dropped from equation (3.19). A similar analysis is possible for other forms of the model in equation (3.18).
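When $y_1$ is treated as fixed, maximizing (3.19) reduces to regressing $y_t$ on $y_{t-1}$. The following NumPy sketch (simulated data with a hypothetical $\phi = 0.6$) computes the conditional estimates in closed form and evaluates the conditional log likelihood so that the maximum can be checked.

```python
import numpy as np

def ar1_conditional_loglik(phi, sigma2, y):
    """Equation (3.19) without ln p(y_1), i.e. treating y_1 as fixed."""
    e = y[1:] - phi * y[:-1]          # y_t - phi * y_{t-1}, t = 2..T
    n = e.size                        # T - 1 terms
    return -n / 2 * np.log(2 * np.pi) - n / 2 * np.log(sigma2) - e @ e / (2 * sigma2)

def ar1_conditional_mle(y):
    """Closed-form maximizer: OLS of y_t on y_{t-1}, residual variance divided by T - 1."""
    phi_hat = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
    e = y[1:] - phi_hat * y[:-1]
    return phi_hat, e @ e / e.size

rng = np.random.default_rng(1)
T, phi_true = 1000, 0.6               # hypothetical values for the simulation
y = np.zeros(T)
for t in range(1, T):
    y[t] = phi_true * y[t - 1] + rng.normal()
phi_hat, sigma2_hat = ar1_conditional_mle(y)
print(phi_hat, sigma2_hat, ar1_conditional_loglik(phi_hat, sigma2_hat, y))
```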
3.5 Prediction Error Decomposition

The mean of the conditional distribution of $y_t$, $E[y_t \mid y_{t-1}, \ldots, y_1]$, is the optimal predictor of $y_t$ in the sense that it minimizes the prediction mean square error. The variance of the corresponding prediction error,

$$v_t = y_t - E[y_t \mid y_{t-1}, \ldots, y_1], \qquad (3.20)$$

is the same as the conditional variance of $y_t$, that is,

$$\operatorname{Var}[v_t] = \operatorname{Var}[y_t \mid y_{t-1}, \ldots, y_1]. \qquad (3.21)$$
The likelihood function in equation (3.17) can be expressed in terms of the prediction errors. This operation, known as the "prediction error decomposition," is highly relevant for normally distributed observations. Let us write the conditional variance as

$$\operatorname{Var}[v_t] = \sigma^2 f_t, \qquad t = 1, 2, \ldots, T, \qquad (3.22)$$

where $\sigma^2$ is a parameter. The prediction error decomposition yields

$$\ln L(y; \theta) = -\frac{T}{2}\ln 2\pi - \frac{T}{2}\ln\sigma^2 - \frac{1}{2}\sum_{t=1}^{T}\ln f_t - \frac{1}{2\sigma^2}\sum_{t=1}^{T}\frac{v_t^2}{f_t}. \qquad (3.23)$$
For the AR(1) model considered earlier, we can easily see that $f_t = 1$ for all $t$. For more complicated time series models, we can compute the prediction error decomposition form of the likelihood function by putting the model in state-space form (i.e., in the form of a dynamic linear model) and applying the Kalman filter. The prediction error form also offers a powerful approach for handling estimation issues in multivariate models. Further details can be found in Harvey (1990).
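A minimal sketch of (3.23), assuming NumPy: the function takes prediction errors $v_t$ and scale factors $f_t$ from whatever filter produced them (for the AR(1) case, $f_t = 1$). The second half checks numerically that it reproduces the AR(1) conditional log likelihood; the parameter values are hypothetical.

```python
import numpy as np

def prediction_error_loglik(v, f, sigma2):
    """Log likelihood (3.23) from prediction errors v_t with Var[v_t] = sigma2 * f_t."""
    v = np.asarray(v, dtype=float)
    f = np.asarray(f, dtype=float)
    n = v.size                        # number of prediction errors (T in (3.23))
    return (-n / 2 * np.log(2 * np.pi) - n / 2 * np.log(sigma2)
            - 0.5 * np.sum(np.log(f)) - np.sum(v ** 2 / f) / (2 * sigma2))

# AR(1) check: conditional on y_1, v_t = y_t - phi * y_{t-1} and f_t = 1 for all t
rng = np.random.default_rng(2)
y = rng.normal(size=200)              # the identity holds for any data
phi, sigma2 = 0.5, 1.3                # hypothetical parameter values
v = y[1:] - phi * y[:-1]
ll_pe = prediction_error_loglik(v, np.ones_like(v), sigma2)
ll_ar1 = -v.size / 2 * np.log(2 * np.pi * sigma2) - v @ v / (2 * sigma2)  # (3.19) without ln p(y_1)
print(ll_pe, ll_ar1)
```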
3.6 Serially Correlated Errors - Overview

A basic observation on prices in financial markets is that large returns tend to be followed by further large returns of either sign. This, in turn, implies that the volatility of asset returns tends to be serially correlated. Here, volatility refers to the conditional variance of asset returns. Various econometric tests have been developed to infer the extent of this correlation in volatility. The most significant development for capturing this serial correlation has been the specification of a class of time-varying volatility models known as ARCH (autoregressive conditional heteroskedasticity) or GARCH (generalized ARCH). This topic is discussed extensively in chapter 7. Here we only outline the essential elements for the maximum likelihood estimation of such models. In this case, we write the conditional variance as

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p}\alpha_i\varepsilon_{t-i}^2. \qquad (3.24)$$
This represents an ARCH process of order $p$, implying that the current variance is determined by the last $p$ surprises (disturbances). In practice, this approach may introduce a large number of parameters to be estimated, depending on the value of $p$. One way to simplify the model is to introduce lagged values of the conditional variance itself. This suggests

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{q}\beta_j\sigma_{t-j}^2. \qquad (3.25)$$
This is referred to as a GARCH(p,q) model. In the GARCH(1,1) case, the coefficient $\alpha$ measures the extent to which a volatility shock today feeds through into the next period's volatility, and $\alpha + \beta$ measures the rate at which this effect decays. Focusing on the GARCH(1,1) formulation, we can show by successive substitution that

$$\sigma_t^2 = \frac{\alpha_0}{1-\beta} + \alpha\sum_{j=1}^{\infty}\beta^{j-1}\varepsilon_{t-j}^2 \;\ldots$$

$$K(u) \ge 0, \qquad \int K(u)\,du = 1. \qquad (4.4)$$
Though $K(x)$ is a probability density function, it plays no probabilistic part in the subsequent analysis. On the contrary, it serves merely as a convenient method for computing a weighted average. In no case does it imply, for example, that $X$ is distributed according to $K(x)$. If that were the case, it would become a parametric approach. By rescaling the kernel with respect to a variable $h > 0$, we can change the spread by varying $h$ if we define

$$K_h(u) = \frac{1}{h}K(u/h), \qquad \int K_h(u)\,du = 1. \qquad (4.5)$$
We can now define the weight function to be used in the weighted average as

$$w_{t,T}(x) = K_h(x - X_t)/g_h(x), \qquad (4.6)$$

$$g_h(x) = \frac{1}{T}\sum_{t=1}^{T}K_h(x - X_t). \qquad (4.7)$$
If $h$ is very small, the averaging will be done with respect to a rather small neighborhood around each of the $X_t$'s. If $h$ is very large, the averaging will be over a large neighborhood of the $X_t$'s. The degree of averaging is therefore controlled by the smoothing parameter $h$, also known as the bandwidth. Substituting equations (4.6) and (4.7) into equation (4.3) yields

$$\hat m_h(x) = \frac{1}{T}\sum_{t=1}^{T}w_{t,T}(x)Y_t = \frac{\sum_{t=1}^{T}K_h(x - X_t)Y_t}{\sum_{t=1}^{T}K_h(x - X_t)}. \qquad (4.8)$$
This is known as the Nadaraya-Watson kernel estimator $\hat m_h(x)$ of $m(x)$. Under certain regularity conditions on the shape of the kernel and the magnitude and behavior of the weights, as the sample size grows (with the bandwidth shrinking at a suitable rate), $\hat m_h(x) \to m(x)$ asymptotically. This convergence property holds for a large class of kernels. One of the most popular kernels is the Gaussian kernel, defined by

$$K_h(x) = \frac{1}{h\sqrt{2\pi}}e^{-x^2/(2h^2)}. \qquad (4.9)$$
In analyzing different examples with the EViews package, we make use of this Gaussian kernel.
4.4 Illustration 1 (EViews)

To illustrate the efficacy of kernel regression in capturing nonlinear relations, consider the smoothing technique for an artificially generated dataset using Monte Carlo simulation. Let $\{X_t\}$ denote a sequence of 500 observations that take on values between 0 and $2\pi$ at evenly spaced increments, and let $\{Y_t\}$ be related to $\{X_t\}$ through the following nonlinear relation:

$$Y_t = \sin(X_t) + 0.5\varepsilon_t, \qquad (4.10)$$

where $\{\varepsilon_t\}$ is a sequence of IID pseudorandom standard normal variates. Using the simulated data pairs $\{X_t, Y_t\}$, we attempt to estimate the conditional expectation $E[Y_t \mid X_t] = \sin(X_t)$ by kernel regression. We apply the Nadaraya-Watson estimator (4.8) with a Gaussian kernel to the data, and vary the bandwidth parameter $h$ among $0.1\hat\sigma_x$, $0.3\hat\sigma_x$ and $0.5\hat\sigma_x$, where $\hat\sigma_x$ is the sample standard deviation of $\{X_t\}$. By varying $h$ in units of standard deviation we are effectively normalizing the explanatory variable. The kernel estimator can be plotted for each bandwidth. The plots show that the kernel estimator is too choppy when the bandwidth is too small: for very low bandwidth, the information is too sparse to recover $\sin(X_t)$, and while the estimator succeeds in picking up the general nature of the function, it shows local variations due to noise. As the bandwidth increases, these variations are smoothed out. At intermediate bandwidth, for example, the local noise is largely removed and the general appearance of the estimator is quite appealing. At still higher bandwidth, the noise is completely removed but the estimator fails to capture the genuine profile of the sine function. In the limit, the kernel estimator approaches the sample average of $\{Y_t\}$, and all the variability with respect to $\{X_t\}$ is lost.
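The Monte Carlo experiment above can be sketched in Python with NumPy as a stand-in for the EViews procedure (the random seed, and the reading of the middle bandwidth as $0.3\hat\sigma_x$, are assumptions). Instead of plots, it reports each fit's root-mean-square error against the true $\sin(X_t)$.

```python
import numpy as np

def nadaraya_watson(x_eval, X, Y, h):
    """Nadaraya-Watson estimate (4.8) at points x_eval, Gaussian kernel (4.9), bandwidth h."""
    u = (x_eval[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / (h * np.sqrt(2 * np.pi))   # K_h(x - X_t)
    return (K @ Y) / K.sum(axis=1)                         # weighted average of the Y_t

rng = np.random.default_rng(0)
T = 500
X = np.linspace(0.0, 2 * np.pi, T)              # evenly spaced on [0, 2*pi]
Y = np.sin(X) + 0.5 * rng.standard_normal(T)    # relation (4.10)
sigma_x = X.std(ddof=1)

rmse = {}
for c in (0.1, 0.3, 0.5):
    m_hat = nadaraya_watson(X, X, Y, c * sigma_x)
    rmse[c] = np.sqrt(np.mean((m_hat - np.sin(X)) ** 2))
    print(f"h = {c} * sigma_x: RMSE against sin(X) = {rmse[c]:.3f}")
```

The normalizing constant of the Gaussian kernel cancels in the ratio (4.8), so it affects neither the fit nor the bandwidth comparison.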
This experiment may be carried out with other kernel functions provided by EViews as well. EViews also allows automatic selection of the bandwidth. This brings us to the topic of optimal bandwidth selection.
4.5 Optimal Bandwidth Selection

Choosing the proper bandwidth is critical in kernel regression. Among the several methods available for bandwidth selection, the most common is called cross-validation. This method chooses the bandwidth that minimizes the weighted-average squared error of the kernel estimator. For a sample of $T$ observations $\{X_t, Y_t\}_{t=1}^{T}$, let

$$\hat m_{h,j}(X_j) = \frac{1}{T}\sum_{t \ne j}w_{t,j}(X_j)Y_t. \qquad (4.11)$$
This is the kernel estimator based on the dataset with the $j$th observation deleted, evaluated at the $j$th value $X_j$. The cross-validation function $CV(h)$ is defined as

$$CV(h) = \frac{1}{T}\sum_{t=1}^{T}\left[Y_t - \hat m_{h,t}(X_t)\right]^2\delta(X_t), \qquad (4.12)$$
where $\delta(X_t)$ is a non-negative weight function required to reduce the effect of edges of the boundary (for additional information, see Härdle (1990)). The function $CV(h)$ is called cross-validation since it validates the success of the kernel estimator in fitting $\{Y_t\}$ across $T$ subsamples $\{X_\tau, Y_\tau\}_{\tau \ne t}$, each with one observation omitted. The optimal bandwidth is the bandwidth that minimizes this function.
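Leave-one-out cross-validation can be sketched as follows, assuming NumPy and the simulated sine data of Illustration 1. The weight $\delta(X_t)$ here is a hypothetical choice that simply drops the 5% of observations nearest each boundary; EViews' automatic bandwidth selection may use a different rule. Deleting observation $t$ amounts to zeroing the diagonal of the kernel matrix.

```python
import numpy as np

def cv_score(h, X, Y, trim=0.05):
    """Cross-validation function CV(h) of (4.12) for a Gaussian-kernel Nadaraya-Watson fit."""
    u = (X[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u ** 2)            # normalizing constant cancels in the ratio
    np.fill_diagonal(K, 0.0)             # leave observation t out when predicting Y_t
    m_loo = (K @ Y) / K.sum(axis=1)      # m_{h,t}(X_t) as in (4.11)
    lo, hi = np.quantile(X, [trim, 1 - trim])
    delta = ((X >= lo) & (X <= hi)).astype(float)   # hypothetical edge-trimming weight
    return np.mean((Y - m_loo) ** 2 * delta)

rng = np.random.default_rng(0)
X = np.linspace(0.0, 2 * np.pi, 500)
Y = np.sin(X) + 0.5 * rng.standard_normal(500)
grid = np.linspace(0.02, 2.0, 100) * X.std(ddof=1)   # candidate bandwidths
scores = np.array([cv_score(h, X, Y) for h in grid])
h_opt = grid[np.argmin(scores)]
print(f"CV-optimal bandwidth: {h_opt:.3f}")
```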
4.6 Illustration 2 (EViews)

This example will acquaint you with the use of EViews for applying both non-parametric and parametric estimation procedures in modeling the short-term interest rate. Academic researchers and practitioners have been investigating ways to model the behavior of the short-term interest rate for many years. The short-term interest rate plays a very important role in financial economics. To cite just one use, it is crucial in the pricing of fixed income securities and interest rate contingent claims. The following example deals with a method to model the mean and variance of changes in the short-term interest rate. The behavior of the short-term interest rate is generally represented by

$$dr_t = M(r_t)\,dt + V^{1/2}(r_t)\,dW_t, \qquad (4.13)$$
where $r_t$ is the interest rate, $dW_t$ is standard Brownian motion, $M(\cdot)$ is the conditional mean function of $dr_t$, and $V(\cdot)$ is the conditional variance function of $dr_t$. In estimating the model in equation (4.13), a non-parametric method does not need to specify the functions $M(\cdot)$ and $V(\cdot)$. As part of the exercise, a standard parametric form may also be estimated for equation (4.13). The volatility estimated by both methods may then be compared. The standard non-parametric regression model is

$$Y_t = f(X_t) + v_t, \qquad (4.14)$$
where $Y_t$ is the dependent variable, $X_t$ is the independent variable, and $v_t$ is IID with mean zero and finite variance. The aim is to obtain a non-parametric estimate of $f(\cdot)$ using the Nadaraya-Watson estimator. The conditional mean and variance of the interest rate changes can be defined as

$$M(r_t) = E[Y_{1t} \mid X_t = x], \qquad (4.15)$$

$$V(r_t) = E[Y_{2t} \mid X_t = x] - M(r_t)^2, \qquad (4.16)$$

where $X_t = r_t$, $Y_{1t} = \Delta r_t$ and $Y_{2t} = (\Delta r_t)^2$. Estimates of the conditional means $E[Y_{1t} \mid X_t = x]$ and $E[Y_{2t} \mid X_t = x]$ are obtained from the following nonparametric regressions:

$$Y_{1t} = f_1(X_t) + v_{1t}, \qquad (4.17)$$

$$Y_{2t} = f_2(X_t) + v_{2t}. \qquad (4.18)$$
The conditional means in equations (4.17) and (4.18) can be estimated using the estimator in equation (4.8). This provides the estimates of $M(r_t)$ and $V(r_t)$ in equations (4.15) and (4.16), respectively. In this process, we use a Gaussian kernel and the optimal bandwidth suggested by Silverman (available in EViews). The variance estimate obtained from this non-parametric method may be compared with that from a parametric specification. A popular parametric specification for the short-term interest rate is the following GARCH-M (GARCH-in-Mean) model:
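$$\Delta r_t = a_0 + a_1 r_{t-1} + a_2 V_t + \varepsilon_t, \qquad \varepsilon_t \mid I_{t-1} \sim N(0, V_t), \qquad (4.19)$$

$$V_t = \beta_0 + \beta_1 r_{t-1} + \beta_2 V_{t-1} + \beta_3\varepsilon_{t-1}^2, \qquad (4.20)$$

where $I_{t-1}$ denotes the information set at time $t-1$.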
We encourage you to study the two different variance estimates: the non-parametric one obtained from equation (4.16) and the parametric one obtained from equation (4.20). Both of these methods have been applied in practice for different applications. The dataset for this exercise is described in the next section.
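The non-parametric half of this exercise can be sketched in NumPy. Since the dataset is not reproduced here, the code simulates a hypothetical short-rate path with volatility proportional to the level, regresses the change in the rate on the previous period's level (an assumption about timing), and forms $V = E[Y_2 \mid x] - M^2$ as in (4.16). The bandwidth uses a common Silverman-type rule of thumb, which may not match EViews' implementation exactly. The GARCH-M model (4.19)-(4.20) needs a numerical likelihood maximizer and is not shown.

```python
import numpy as np

def nadaraya_watson(x_eval, X, Y, h):
    """Gaussian-kernel estimate of E[Y | X = x]; the normalizing constant cancels."""
    u = (x_eval[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u ** 2)
    return (K @ Y) / K.sum(axis=1)

def silverman_bandwidth(X):
    """Rule of thumb 0.9 * min(sd, IQR / 1.34) * T**(-1/5); EViews' own rule may differ."""
    q75, q25 = np.percentile(X, [75, 25])
    return 0.9 * min(X.std(ddof=1), (q75 - q25) / 1.34) * X.size ** (-0.2)

# Hypothetical short-rate path: mean reversion toward 5%, volatility proportional to the level
rng = np.random.default_rng(3)
T = 3000
r = np.empty(T)
r[0] = 0.05
for t in range(1, T):
    r[t] = r[t - 1] + 0.02 * (0.05 - r[t - 1]) + 0.1 * r[t - 1] * rng.standard_normal()

X = r[:-1]                    # conditioning level (previous period's rate)
Y1 = np.diff(r)               # Y_1t: change in the rate
Y2 = Y1 ** 2                  # Y_2t: squared change
h = silverman_bandwidth(X)
grid = np.linspace(*np.percentile(X, [10, 90]), 25)
M_hat = nadaraya_watson(grid, X, Y1, h)                 # estimate of (4.15)
V_hat = nadaraya_watson(grid, X, Y2, h) - M_hat ** 2    # estimate of (4.16)
print(np.column_stack([grid, M_hat, V_hat])[::6])
```

In small samples the difference in (4.16) can occasionally turn slightly negative, so in practice it is worth checking before taking square roots.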
4.7 Examples and Exercises

Exercise 4.1: The empirical model for equity returns in the CAPM (capital asset pricing model) framework most commonly adopted by researchers is given by

$$r_t - r_{f,t} = \alpha + \beta(r_{m,t} - r_{f,t}) + \varepsilon_t. \qquad (4.21)$$
The left-hand side represents the excess return on the asset and the right-hand side is a linear function of the excess return on the market. The aim of this exercise is to explore, using nonparametric kernel regression, whether such a linear relation holds for a given dataset. The dataset consists of weekly Japanese market data spanning January 1990 to December 2000. It contains data on the excess return of the banking sector as well as the excess return of the total market. You may also examine the above relationship using a GARCH(1,1) structure for the residual term.
References

Hafner CM (1998) Nonlinear time series analysis with applications to foreign exchange rate volatility. Physica-Verlag, Berlin
Härdle W (1990) Applied nonparametric regression. Cambridge University Press, Cambridge
Hart JD (1997) Nonparametric smoothing and lack-of-fit tests. Springer, Berlin
Niizeki MK (1998) Empirical tests of short-term interest rate models: a nonparametric approach. Applied Financial Economics 8: 347-352
Pagan A, Ullah A (1999) Nonparametric econometrics. Cambridge University Press, Cambridge
Scott DW (1992) Multivariate density estimation. John Wiley & Sons, New York
Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, New York
5 Unit Root, Cointegration and Related Issues
5.1 Stationary Process

A stochastic process $\{y_t\}$ is weakly stationary or covariance stationary if it satisfies the following conditions:¹

1. $E[y_t]$ is a constant;
2. $V[y_t]$ is a constant;
3. $\operatorname{Cov}[y_t, y_{t-s}]$ is a function of $s$, but not of $t$, where $s = \pm 1, \pm 2, \ldots$.

In other words, a stochastic process $\{y_t\}$ is said to be weakly stationary (or covariance stationary) if its mean, variance and autocovariances are unaffected by changes of time. Note that the covariance between observations in the series is a function only of how far apart the observations are in time. Since the covariance between $y_t$ and $y_{t-s}$ is a function of $s$, we can define the autocovariance function:

$$\gamma(s) = \operatorname{Cov}[y_t, y_{t-s}]. \qquad (5.1)$$
Equation (5.1) shows that for $s = 0$, $\gamma(0)$ is equivalent to the variance of $y_t$. Further, the autocorrelation function (ACF), or correlogram, between $y_t$ and $y_{t-s}$ is obtained by dividing $\gamma(s)$ by the variance $\gamma(0)$ as follows:

$$\rho(s) = \frac{\gamma(s)}{\gamma(0)}. \qquad (5.2)$$

¹ Enders (1995) and Hamilton (1994) are good references for understanding time series models. The explanation in chapters 5 and 6 relies on them.
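The properties of a stationary stochastic process are summarized by its autocorrelation function.

Example 5.1: white noise process. The simplest stationary stochastic process is called the white noise process, $u_t$. This process has the following properties:

$$E[u_t] = 0, \qquad (5.3)$$

$$\operatorname{Var}[u_t] = \sigma^2, \qquad (5.4)$$

$$\operatorname{Cov}[u_t, u_{t-s}] = \gamma(s) = 0, \quad s \ne 0, \qquad (5.5)$$

$$\rho(s) = \begin{cases} 1 & \text{for } s = 0, \\ 0 & \text{for } s \ne 0. \end{cases} \qquad (5.6)$$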
A sequence $u_t$ is a white noise process if each value in the sequence has a mean of zero, a constant variance and no autocorrelation.

Example 5.2: MA(1) process. The first-order moving average process, i.e., the MA(1) process, is written as follows:

$$y_t = \mu + u_t + \theta u_{t-1}, \qquad (5.7)$$
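where $u_t$ is a white noise process. Thus,

$$E[y_t] = \mu, \qquad (5.8)$$

$$\operatorname{Var}[y_t] = (1 + \theta^2)\sigma^2, \qquad (5.9)$$

$$\operatorname{Cov}[y_t, y_{t-s}] = \gamma(s) = \begin{cases} \theta\sigma^2 & \text{for } s = \pm 1, \\ 0 & \text{for } s = \pm 2, \pm 3, \ldots, \end{cases} \qquad (5.10)$$

$$\rho(s) = \begin{cases} 1 & \text{for } s = 0, \\ \dfrac{\theta}{1 + \theta^2} & \text{for } s = \pm 1, \\ 0 & \text{for } s = \pm 2, \pm 3, \ldots. \end{cases} \qquad (5.11)$$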
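The MA(1) moments (5.8) to (5.11) can be checked by simulation. This NumPy sketch uses hypothetical values $\mu = 1$, $\theta = 0.5$ and $\sigma^2 = 1$, for which (5.11) gives $\rho(1) = 0.5/1.25 = 0.4$; the sample autocorrelations use the usual $1/T$ autocovariance estimator.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat(s) = gamma_hat(s) / gamma_hat(0), s = 0..max_lag."""
    d = y - y.mean()
    T = d.size
    gamma = np.array([d[s:] @ d[:T - s] / T for s in range(max_lag + 1)])
    return gamma / gamma[0]

rng = np.random.default_rng(4)
mu, theta, T = 1.0, 0.5, 20000          # hypothetical parameter values, sigma^2 = 1
u = rng.standard_normal(T + 1)
y = mu + u[1:] + theta * u[:-1]         # MA(1) process (5.7)
rho = sample_acf(y, 3)
rho1_theory = theta / (1 + theta ** 2)  # (5.11)
print(rho, rho1_theory)
```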
Generally speaking, the MA(q) process is written as follows:

$$y_t = \mu + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q}. \qquad (5.12)$$
Note that finite moving average processes are stationary regardless of the parameter values.

Example 5.3: AR(1) process. The first-order autoregressive process, i.e., the AR(1) process, is written as

$$y_t = \mu + \phi y_{t-1} + u_t, \qquad (5.13)$$
where $u_t$ is a white noise process. If $|\phi| \ldots$

$$\ldots\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} u_{1,t} \\ u_{2,t} \end{bmatrix}. \qquad (6.5)$$
Note that the right-hand side of equation (6.1) contains only predetermined variables and that the error terms are serially uncorrelated with constant variance. Hence, each equation in the system can be estimated using OLS (ordinary least squares). Moreover, OLS estimators are consistent and asymptotically efficient.¹ In a VAR, long lag lengths quickly consume degrees of freedom. If the lag length is $p$, each of the $n$ equations contains $np$ coefficients plus the intercept term. It is therefore important to select the appropriate lag length. To check the lag length, we can use the multivariate version of the AIC or SBIC:

$$\mathrm{AIC} = \log|\hat\Sigma| + \frac{2}{T}N, \qquad (6.6)$$

$$\mathrm{SBIC} = \log|\hat\Sigma| + \frac{\log T}{T}N, \qquad (6.7)$$

where $|\hat\Sigma|$ is the determinant of the variance-covariance matrix of the residuals, $T$ is the number of usable observations, and $N$ is the total number of parameters estimated in all equations. If each equation in an $n$-variable VAR has $p$ lags and an intercept, each of the $n$ equations has $np$ lagged regressors and an intercept. Thus, there are $N = n^2p + n$ parameters. We can determine the appropriate lag length by choosing the specification with the lowest value of the AIC or SBIC.

¹ If some of the VAR equations have regressors not included in the others (including the possibility of different lag lengths), the system is called a near-VAR. In this case, SUR (seemingly unrelated regressions) can provide efficient estimates of the VAR coefficients.
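Lag selection with (6.6) and (6.7) can be sketched as follows, assuming NumPy and a simulated bivariate VAR(2) with hypothetical coefficients. One practical detail not stated above: each candidate $p$ should be estimated on the same observations (here, those after the first $p_{max}$), otherwise the criteria are not comparable.

```python
import numpy as np

def var_information_criteria(Y, p, p_max):
    """AIC (6.6) and SBIC (6.7) for a VAR(p) with intercept, estimated equation by equation by OLS.

    Every p uses observations p_max, ..., T-1 so that the criteria are comparable across p.
    """
    T_full, n = Y.shape
    T = T_full - p_max
    blocks = [np.ones((T, 1))] + [Y[p_max - i:T_full - i] for i in range(1, p + 1)]
    X = np.hstack(blocks)                        # regressors [1, y_{t-1}', ..., y_{t-p}']
    Z = Y[p_max:]
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)    # all n equations at once
    U = Z - X @ B
    logdet = np.linalg.slogdet(U.T @ U / T)[1]   # log |Sigma_hat|
    N = n * n * p + n                            # total number of parameters
    return logdet + 2 * N / T, logdet + np.log(T) * N / T

# Hypothetical bivariate VAR(2) used only for illustration
rng = np.random.default_rng(7)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[-0.3, 0.0], [0.2, -0.2]])
Y = np.zeros((600, 2))
for t in range(2, 600):
    Y[t] = A1 @ Y[t - 1] + A2 @ Y[t - 2] + rng.standard_normal(2)

p_max = 6
ic = {p: var_information_criteria(Y, p, p_max) for p in range(1, p_max + 1)}
p_aic = min(ic, key=lambda p: ic[p][0])
p_sbic = min(ic, key=lambda p: ic[p][1])
print("AIC picks", p_aic, "SBIC picks", p_sbic)
```

Because its penalty grows with $\log T$, SBIC tends to choose shorter lags than AIC in samples of this size.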
6.2 Granger Causality

Granger (1969) developed a method to analyze the causal relationship among variables systematically. In the Granger approach to causality from $y_{1,t}$ to $y_{2,t}$, we determine whether the past values of $y_{1,t}$ can help to explain the current $y_{2,t}$. We begin by defining three information sets:

$$I_{1,t} = \{y_{1,t}, y_{1,t-1}, \ldots\}, \qquad I_{2,t} = \{y_{2,t}, y_{2,t-1}, \ldots\}, \qquad I_t = \{y_{1,t}, y_{1,t-1}, \ldots, y_{2,t}, y_{2,t-1}, \ldots\}.$$
The information set $I_{1,t}$ consists of the history of $y_{1,t}$, the information set $I_{2,t}$ consists of the history of $y_{2,t}$, and the information set $I_t$ consists of the histories of both $y_{1,t}$ and $y_{2,t}$. We say that $y_{1,t}$ Granger-causes $y_{2,t}$ if

$$E[y_{2,t} \mid I_{2,t-1}] \ne E[y_{2,t} \mid I_{t-1}]. \qquad (6.8)$$
Equation (6.8) implies that $y_{1,t}$ Granger-causes $y_{2,t}$ ($y_{1,t} \to y_{2,t}$) if $y_{1,t}$ helps in the prediction of $y_{2,t}$. If $y_{1,t}$ does not help in the prediction of $y_{2,t}$, then $y_{1,t}$ does not Granger-cause $y_{2,t}$ ($y_{1,t} \nrightarrow y_{2,t}$). Similarly, we say that $y_{2,t}$ Granger-causes $y_{1,t}$ if

$$E[y_{1,t} \mid I_{1,t-1}] \ne E[y_{1,t} \mid I_{t-1}]. \qquad (6.9)$$
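To explain the testing procedure concretely, let us consider the following bivariate VAR(p) process:

$$y_{1,t} = \phi_{10} + \sum_{i=1}^{p}\phi_{11}(i)y_{1,t-i} + \sum_{i=1}^{p}\phi_{12}(i)y_{2,t-i} + u_{1,t},$$

$$y_{2,t} = \phi_{20} + \sum_{i=1}^{p}\phi_{21}(i)y_{1,t-i} + \sum_{i=1}^{p}\phi_{22}(i)y_{2,t-i} + u_{2,t}, \qquad (6.10)$$

where $u_{i,t}$ ($i = 1, 2$) is a disturbance term. Then, $y_{1,t}$ does not Granger-cause $y_{2,t}$ if

$$\phi_{21}(1) = \phi_{21}(2) = \cdots = \phi_{21}(p) = 0. \qquad (6.11)$$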
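To analyze whether $y_{1,t}$ Granger-causes $y_{2,t}$, we carry out the following hypothesis test:

$$H_0: \phi_{21}(1) = \phi_{21}(2) = \cdots = \phi_{21}(p) = 0,$$

$$H_A: \phi_{21}(1) \ne 0 \text{ or } \phi_{21}(2) \ne 0 \text{ or } \cdots \text{ or } \phi_{21}(p) \ne 0. \qquad (6.12)$$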
Rejection of the null hypothesis ($H_0$) implies that some of the coefficients on the lagged $y_{1,t}$'s are statistically significant, whereas failure to reject the null hypothesis indicates that none of the coefficients on the lagged $y_{1,t}$'s is statistically significant. In the former case $y_{1,t}$ Granger-causes $y_{2,t}$, while in the latter $y_{1,t}$ may not Granger-cause $y_{2,t}$. This can be tested using the F test or the asymptotic chi-square test.² For the bivariate model, we have four cases to consider.
² The F-statistic is

$$F = \frac{(RSS - USS)/p}{USS/(T - 2p - 1)} \sim F(p, T - 2p - 1),$$

where $T$ is the sample size, RSS is the restricted residual sum of squares and USS is the unrestricted residual sum of squares. It can also be shown that $pF \xrightarrow{d} \chi^2(p)$.
Case 1: $y_{1,t} \to y_{2,t}$, but $y_{2,t} \nrightarrow y_{1,t}$. In this case, we have one-way causality running from $y_{1,t}$ to $y_{2,t}$.
Case 2: $y_{2,t} \to y_{1,t}$, but $y_{1,t} \nrightarrow y_{2,t}$. In this case, we have one-way causality running from $y_{2,t}$ to $y_{1,t}$.
Case 3: $y_{1,t} \to y_{2,t}$ and $y_{2,t} \to y_{1,t}$. Here we obtain feedback between $y_{1,t}$ and $y_{2,t}$.
Case 4: $y_{1,t} \nrightarrow y_{2,t}$ and $y_{2,t} \nrightarrow y_{1,t}$. Here we obtain no causal relationship between $y_{1,t}$ and $y_{2,t}$.

While Granger causality measures precedence and information content, note that it does not by itself measure causality in the more common sense of the word.
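The F test for (6.12) can be sketched with NumPy least squares. The function below is a hypothetical helper, not an EViews command: it compares the restricted and unrestricted regressions for one equation of (6.10), taking $T$ as the number of usable observations after lagging. The simulated series are hypothetical, with $y_{1,t}$ driving $y_{2,t}$ but not the reverse.

```python
import numpy as np

def lag_matrix(x, p):
    """Columns x_{t-1}, ..., x_{t-p} for t = p, ..., len(x) - 1."""
    n = x.size
    return np.column_stack([x[p - i:n - i] for i in range(1, p + 1)])

def granger_f_test(cause, effect, p):
    """F statistic for H0: phi_21(1) = ... = phi_21(p) = 0 in the 'effect' equation of (6.10)."""
    z = effect[p:]
    T = z.size                                   # usable observations
    ones = np.ones((T, 1))
    X_u = np.hstack([ones, lag_matrix(effect, p), lag_matrix(cause, p)])  # unrestricted
    X_r = np.hstack([ones, lag_matrix(effect, p)])                        # lagged 'cause' dropped

    def rss(X):
        b, *_ = np.linalg.lstsq(X, z, rcond=None)
        e = z - X @ b
        return e @ e

    RSS, USS = rss(X_r), rss(X_u)
    df = (p, T - 2 * p - 1)
    return ((RSS - USS) / df[0]) / (USS / df[1]), df

rng = np.random.default_rng(11)
n_obs = 500
x = np.zeros(n_obs)
y = np.zeros(n_obs)
for t in range(1, n_obs):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.3 * y[t - 1] + 0.5 * x[t - 1] + rng.standard_normal()

F_xy, df_xy = granger_f_test(x, y, p=2)   # H0: x does not Granger-cause y
F_yx, df_yx = granger_f_test(y, x, p=2)   # H0: y does not Granger-cause x
print(F_xy, F_yx, df_xy)
```

If SciPy is available, `scipy.stats.f.sf(F_xy, *df_xy)` converts the statistic into a p-value.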
6.3 Cointegration and Error Correction

As discussed in the last chapter, the principal feature of cointegrated variables is that deviations from long-run equilibrium influence their time paths. Thus, the deviation from the long-run relationship influences the short-run dynamics. Here, let all variables in $y_t = [y_{1,t}, y_{2,t}]'$ be $I(1)$. Consider the following VAR(2) model:

$$y_t = \Phi_0 + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + u_t, \qquad (6.13)$$
where $u_t = [u_{1,t}, u_{2,t}]'$ is a vector of disturbance terms. Then it holds that

$$\begin{aligned} \Delta y_t &= \Phi_0 + \Phi_1 y_{t-1} - y_{t-1} + \Phi_2 y_{t-2} + u_t \\ &= \Phi_0 + \Phi_1 y_{t-1} - y_{t-1} + \Phi_2 y_{t-1} - \Phi_2 y_{t-1} + \Phi_2 y_{t-2} + u_t \\ &= \Phi_0 + \Pi y_{t-1} + \Gamma\Delta y_{t-1} + u_t, \end{aligned} \qquad (6.14)$$

where $\Pi = \Phi_1 + \Phi_2 - I$ and $\Gamma = -\Phi_2$. If we solve equation (6.14) for $\Pi y_{t-1}$, we obtain

$$\Pi y_{t-1} = \Delta y_t - \Phi_0 - \Gamma\Delta y_{t-1} - u_t. \qquad (6.15)$$
Since all variables are $I(1)$, their first differences are stationary, so the right-hand side of equation (6.15) is stationary and the left-hand side must also be stationary. Here we must consider two cases that satisfy this requirement.

Case 1: $\Pi = 0$. In the first case, $\Pi$ is a zero matrix. Under this condition, equation (6.14) becomes

$$\Delta y_t = \Phi_0 + \Gamma\Delta y_{t-1} + u_t. \qquad (6.16)$$
This corresponds to the standard VAR in first differences.

Case 2: $\Pi y_{t-1} \sim I(0)$. In the second case, the product of $\Pi$ and $y_{t-1}$ is stationary. This implies that a linear combination of $y_{1,t}$ and $y_{2,t}$ is stationary, and thus that these two variables are cointegrated. In this case, $\Pi y_{t-1}$ can be expressed as

$$\Pi y_{t-1} = \alpha\beta' y_{t-1} = \alpha EC_{t-1}, \qquad (6.17)$$

where $\alpha = [\alpha_1, \alpha_2]'$, $\beta = [\beta_1, \beta_2]'$ and $EC_{t-1} = \beta_1 y_{1,t-1} + \beta_2 y_{2,t-1}$. The vector $\alpha$ is called the adjustment vector and $\beta$ is the cointegrating vector. Substituting equation (6.17) into equation (6.14) yields

$$\Delta y_t = \Phi_0 + \alpha EC_{t-1} + \Gamma\Delta y_{t-1} + u_t. \qquad (6.18)$$
The term $EC_{t-1}$ is called the error correction term and equation (6.18) is called the vector error correction model (VECM). The cointegrating relation is interpreted as the long-run equilibrium; that is, the relationship $\beta_1 y_{1,t} + \beta_2 y_{2,t} = 0$ is satisfied in the long run. The error correction term is interpreted as the deviation from the long-run equilibrium. Equation (6.18) is called an error correction model because $y_t$ moves in such a way that it adjusts to the past error ($EC_{t-1}$). Thus, estimating $y_t$ as a VAR in first differences is inappropriate if $y_t$ has an error-correction representation, because the VAR in first differences (6.16) omits the term $\alpha EC_{t-1}$.
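The algebra leading from (6.13) to (6.18) can be verified numerically. This NumPy sketch picks hypothetical values for $\alpha$, $\beta$ and $\Gamma$, builds $\Phi_1$ and $\Phi_2$ from $\Pi = \Phi_1 + \Phi_2 - I$ and $\Gamma = -\Phi_2$, and checks that the levels VAR and the VECM give the same $\Delta y_t$. It also simulates the system so that the stationarity of $EC_t$ can be compared with the wandering levels.

```python
import numpy as np

# Hypothetical cointegrated system with EC_t = y_{1,t} - y_{2,t}
alpha = np.array([-0.2, 0.1])            # adjustment vector
beta = np.array([1.0, -1.0])             # cointegrating vector
Gamma = np.array([[0.3, 0.0], [0.0, 0.2]])
Pi = np.outer(alpha, beta)               # Pi = alpha beta', rank one
Phi2 = -Gamma                            # from Gamma = -Phi2
Phi1 = Pi + np.eye(2) - Phi2             # from Pi = Phi1 + Phi2 - I
Phi0 = np.zeros(2)

def delta_from_var(y_lag1, y_lag2, u):
    """Delta y_t implied by the VAR(2) in levels (6.13)."""
    return Phi0 + Phi1 @ y_lag1 + Phi2 @ y_lag2 + u - y_lag1

def delta_from_vecm(y_lag1, y_lag2, u):
    """Delta y_t from the VECM (6.18), with EC_{t-1} = beta' y_{t-1}."""
    return Phi0 + alpha * (beta @ y_lag1) + Gamma @ (y_lag1 - y_lag2) + u

rng = np.random.default_rng(8)
Y = np.zeros((2000, 2))
for t in range(2, 2000):
    Y[t] = Y[t - 1] + delta_from_vecm(Y[t - 1], Y[t - 2], rng.standard_normal(2))
ec = Y @ beta                            # error correction term EC_t
print("var(EC):", ec.var(), "var(y1):", Y[:, 0].var())
```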
6.4 Johansen Test

As we discussed in the last chapter, the Engle-Granger test has several limitations. Johansen and Juselius (1990) developed an alternative approach to test for cointegration. Consider the following general form of equation (6.14):

$$\Delta y_t = \Pi y_{t-1} + \sum_{i=1}^{p-1}\Gamma_i\Delta y_{t-i} + u_t. \qquad (6.19)$$
We can use the rank of $\Pi$ to determine whether or not the variables in $y_t$ are cointegrated: the rank of $\Pi$ is equal to the number of independent cointegrating vectors. The number of cointegrating vectors can be obtained by checking the significance of the characteristic roots of $\Pi$. We know that the rank of a matrix is equal to the number of its characteristic roots that differ from zero. If the variables in $y_t$ are not cointegrated, $\operatorname{rank}(\Pi) = 0$ and all of these characteristic roots equal zero. There is a single cointegrating vector if $\operatorname{rank}(\Pi) = 1$. There are multiple cointegrating vectors if $1 < \operatorname{rank}(\Pi) < n$, where $n$ is the number of variables. Two testing procedures can be used to find the number of cointegrating vectors, denoted by $r$. The first procedure, the "trace test," uses a test statistic written as $\lambda_{trace}$. It tests the null hypothesis that the number of distinct cointegrating vectors is less than or equal to $r$ against a general alternative. The hypotheses are thus:

$$\begin{aligned} &H_0: r = 0, && H_A: r > 0, \\ &H_0: r \le 1, && H_A: r > 1, \\ &H_0: r \le 2, && H_A: r > 2, \\ &\;\;\vdots \end{aligned} \qquad (6.20)$$
The second procedure, the "maximum eigenvalue test," uses a statistic written as $\lambda_{max}$. It tests the null hypothesis that the number of cointegrating vectors is $r$ against the alternative of $r + 1$ cointegrating vectors. The hypotheses are:

$$\begin{aligned} &H_0: r = 0, && H_A: r = 1, \\ &H_0: r = 1, && H_A: r = 2, \\ &\;\;\vdots \end{aligned} \qquad (6.21)$$
We apply these tests sequentially, starting from $H_0: r = 0$. The testing sequence terminates the first time $H_0$ is not rejected, and the corresponding value of $r$ is taken as the number of cointegrating vectors. Osterwald-Lenum (1992) refines the critical values of the $\lambda_{trace}$ and $\lambda_{max}$ statistics originally calculated by Johansen and Juselius (1990).
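A compact version of the Johansen computation, assuming NumPy, is sketched below. Residuals from regressing $\Delta y_t$ and $y_{t-1}$ on a constant and lagged differences give moment matrices $S_{ij}$; the eigenvalues $\hat\lambda_i$ of $S_{11}^{-1}S_{10}S_{00}^{-1}S_{01}$ then yield $\lambda_{trace}(r) = -T\sum_{i=r+1}^{n}\ln(1-\hat\lambda_i)$ and $\lambda_{max}(r) = -T\ln(1-\hat\lambda_{r+1})$. The sketch treats the constant as unrestricted, which is only one of several deterministic specifications; critical values used for comparison (such as those in Table 6.1) must correspond to the same specification. It is not a substitute for a tested library implementation, and the simulated data are hypothetical.

```python
import numpy as np

def johansen_statistics(Y, p):
    """Trace and maximum-eigenvalue statistics for (6.19) with an unrestricted constant.

    Y: (T, n) array of levels; p: lag order of the VAR in levels (p - 1 lagged differences).
    Returns eigenvalues (descending), lambda_trace[r] and lambda_max[r] for r = 0..n-1.
    """
    dY = np.diff(Y, axis=0)
    k = p - 1
    Z0 = dY[k:]                                   # Delta y_t
    Z1 = Y[k:-1]                                  # y_{t-1}
    T = Z0.shape[0]
    lagged = [dY[k - i:dY.shape[0] - i] for i in range(1, k + 1)]
    Z2 = np.hstack([np.ones((T, 1))] + lagged)    # constant and lagged differences

    def resid(A):                                 # partial out the short-run terms
        b, *_ = np.linalg.lstsq(Z2, A, rcond=None)
        return A - Z2 @ b

    R0, R1 = resid(Z0), resid(Z1)
    S00, S11, S01 = R0.T @ R0 / T, R1.T @ R1 / T, R0.T @ R1 / T
    M = np.linalg.solve(S11, S01.T @ np.linalg.solve(S00, S01))
    lam = np.sort(np.clip(np.linalg.eigvals(M).real, 0.0, 1.0 - 1e-12))[::-1]
    log1m = np.log(1.0 - lam)
    trace = -T * np.cumsum(log1m[::-1])[::-1]     # -T * sum_{i > r} ln(1 - lambda_i)
    max_eig = -T * log1m
    return lam, trace, max_eig

# Hypothetical cointegrated pair: both series share the random walk w
rng = np.random.default_rng(9)
w = np.cumsum(rng.standard_normal(500))
Y_coint = np.column_stack([w + rng.standard_normal(500), w + rng.standard_normal(500)])
lam, trace, max_eig = johansen_statistics(Y_coint, p=2)
print(lam, trace, max_eig)
```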
6.5 LA-VAR

As the foregoing discussion makes clear, integration and cointegration must be tested before specifying the model. The VAR model in first differences is used when the variables are integrated of order one and have no cointegration between them, while the VECM is used when the variables are integrated of order one and are cointegrated (Fig. 6.1). However, the standard approach to testing economic hypotheses conditioned on the testing of a unit root and cointegration may suffer from severe pretest bias. Toda and Yamamoto (1995) developed the LA-VAR (lag-augmented VAR) to overcome this problem. Their approach is appealing because it remains applicable regardless of whether the VAR process is stationary, integrated, or cointegrated. Suppose that a two-dimensional vector $y_t = [y_{1,t}, y_{2,t}]'$ is generated by the VAR(k) model as follows:
Fig. 6.1 Specification of the model (flowchart; recoverable labels: $y_{1,t} \sim I(1)$, $y_{2,t} \sim I(1)$; VAR in First Differences)
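$$y_{1,t} = \phi_{10} + \sum_{i=1}^{k}\phi_{11}(i)y_{1,t-i} + \sum_{i=1}^{k}\phi_{12}(i)y_{2,t-i} + u_{1,t},$$

$$y_{2,t} = \phi_{20} + \sum_{i=1}^{k}\phi_{21}(i)y_{1,t-i} + \sum_{i=1}^{k}\phi_{22}(i)y_{2,t-i} + u_{2,t}. \qquad (6.22)$$

To analyze whether $y_{1,t}$ Granger-causes $y_{2,t}$, we carry out the following hypothesis test:

$$H_0: \phi_{21}(1) = \phi_{21}(2) = \cdots = \phi_{21}(k) = 0,$$

$$H_A: \phi_{21}(1) \ne 0 \text{ or } \phi_{21}(2) \ne 0 \text{ or } \cdots \text{ or } \phi_{21}(k) \ne 0. \qquad (6.23)$$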
Next, we consider estimating a VAR formulated in levels by ordinary least squares (OLS), as follows:
$$y_{1,t} = \phi_{10} + \sum_{i=1}^{p}\phi_{11}(i)y_{1,t-i} + \sum_{i=1}^{p}\phi_{12}(i)y_{2,t-i} + u_{1,t},$$

$$y_{2,t} = \phi_{20} + \sum_{i=1}^{p}\phi_{21}(i)y_{1,t-i} + \sum_{i=1}^{p}\phi_{22}(i)y_{2,t-i} + u_{2,t}, \qquad (6.24)$$

where $p$ is equal to the true lag length ($k$) plus the maximum order of integration considered possible for the process ($d_{max}$). Note that the order of integration of the process should not exceed the true lag length of the model ($d_{max} \le k$). Since the true coefficients $\phi_{21}(k+1), \ldots, \phi_{21}(p)$ are zero, they are not included in the restriction in (6.23). We can test the null hypothesis using an asymptotic chi-square distribution with $k$ degrees of freedom. As noted by Toda and Yamamoto (1995), however, the LA-VAR method is inefficient in terms of power and should not totally replace conventional hypothesis testing methods, which are conditional on unit root and cointegration tests. Therefore, both methods can be used to assess the robustness of the empirical results.
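The Toda-Yamamoto procedure can be sketched with NumPy: fit the $y_{2,t}$ equation of (6.24) with $p = k + d_{max}$ lags, then form a Wald statistic from only the first $k$ coefficients on lagged $y_{1,t}$. The helper name and the simulated data (two I(1) series in which $y_{1,t}$ drives $y_{2,t}$) are hypothetical. The statistic is compared with the $\chi^2(k)$ distribution; for $k = 2$, the 5% critical value is 5.991.

```python
import numpy as np

def la_var_wald(cause, effect, k, d_max):
    """Toda-Yamamoto Wald statistic for H0: phi_21(1) = ... = phi_21(k) = 0 in (6.24).

    The 'effect' equation is fitted in levels with p = k + d_max lags of both variables;
    only the first k lags of 'cause' are restricted. Compare with chi-square(k).
    """
    p = k + d_max
    n = effect.size
    cols = [np.ones(n - p)]
    cols += [cause[p - i:n - i] for i in range(1, p + 1)]    # cause lags 1..p
    cols += [effect[p - i:n - i] for i in range(1, p + 1)]   # effect lags 1..p
    X = np.column_stack(cols)
    z = effect[p:]
    b, *_ = np.linalg.lstsq(X, z, rcond=None)
    e = z - X @ b
    s2 = e @ e / (z.size - X.shape[1])               # residual variance
    cov_b = s2 * np.linalg.inv(X.T @ X)              # OLS covariance matrix
    idx = np.arange(1, k + 1)                        # positions of phi_21(1), ..., phi_21(k)
    rb = b[idx]
    return rb @ np.linalg.solve(cov_b[np.ix_(idx, idx)], rb)

rng = np.random.default_rng(10)
n_obs = 500
y1 = np.cumsum(rng.standard_normal(n_obs))         # I(1), not driven by y2
y2 = np.zeros(n_obs)
for t in range(2, n_obs):
    y2[t] = y2[t - 1] + 0.5 * (y1[t - 1] - y1[t - 2]) + rng.standard_normal()

W_12 = la_var_wald(y1, y2, k=2, d_max=1)   # H0: y1 does not Granger-cause y2
W_21 = la_var_wald(y2, y1, k=2, d_max=1)   # H0: y2 does not Granger-cause y1
print(W_12, W_21)
```

The extra $d_{max}$ lags are estimated but never tested; their role is only to restore the standard chi-square limit when the series are integrated.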
6.6 Application to Stock Prices

This section applies the cointegration test to stock price data from the USA and Japan. The prices are measured as the logarithms of the prices at the end of each month over a sample period from December 1969 to March 2004. The data are taken from the Morgan Stanley Capital International Index. This is the same data used in the last chapter. Table 6.1 shows the empirical results. In the last chapter, we found that both US and Japanese stock prices are I(1) variables. Thus, the Johansen test is used to test whether the two stock price indices have a cointegrating relation. The cointegration test statistics are the trace test statistic ($\lambda_{trace}$) and the maximum eigenvalue statistic ($\lambda_{max}$). The null hypothesis holds that there is no cointegrating relation, and the alternative hypothesis holds that there is a cointegrating relation. For every lag length, both statistics lie well below their 5% critical values, so the null hypothesis of no cointegration between US and Japanese stock prices cannot be rejected. Thus, we specify the model as a VAR in first differences and carry out the Granger causality test. Table 6.2 shows the F-test statistics and their corresponding p-values. As the table shows, US stock prices Granger-cause Japanese stock prices at the 5% level for all three lag lengths, whereas Japanese stock prices do not Granger-cause US stock prices.
Table 6.1 Cointegration test

                       λ_trace    λ_max
Lag = 1                   7.92     7.43
Lag = 3                   8.53     7.90
Lag = 6                   8.24     7.23
5% critical value        15.41    14.07
Table 6.2 Granger causality test

Null hypothesis                       lag = 1          lag = 3          lag = 6
USA does not Granger-cause Japan      9.628 (0.002)    3.603 (0.014)    2.271 (0.036)
Japan does not Granger-cause USA      0.064 (0.800)    0.299 (0.826)    1.206 (0.302)

Numbers in parentheses are p-values.
References

Enders W (2004) Applied econometric time series, 2nd edn. John Wiley & Sons, New York
Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37: 424-438
Hamilton JD (1994) Time series analysis. Princeton University Press, Princeton
Johansen S, Juselius K (1990) Maximum likelihood estimation and inference on cointegration with applications to the demand for money. Oxford Bulletin of Economics and Statistics, 52: 169-210
Osterwald-Lenum M (1992) A note with quantiles of the asymptotic distribution of the maximum likelihood cointegration rank test statistics. Oxford Bulletin of Economics and Statistics, 54: 461-472
Toda H, Yamamoto T (1995) Statistical inference in vector autoregressions with possibly integrated processes. Journal of Econometrics, 66: 225-250
7 Time Varying Volatility Models
7.1 Background

Since the seminal work by Engle (1982), the ARCH model has reached a remarkable level of sophistication.¹ The ARCH model has become one of the most prevalent tools for characterizing changing variance. Consider the basic nature of the forecasting problem. When the volatility of stock returns is constant, the confidence interval for the stock return is a function of the sample variance or sample standard deviation. Here, volatility refers to the conditional variance of asset returns. Yet the shocks that affect stock returns are also likely to affect the volatility of those returns, so the variance will not be constant. For this reason, constructing a reasonably accurate confidence interval for forecasting requires an understanding of how volatility behaves in relation to stock returns. The ARCH process explicitly recognizes the difference between unconditional and conditional variance and allows the latter to change over time as a function of past errors. Data have also shown that percentage changes in stock prices have fatter tails than those predicted by stationary normal distributions (Kon 1984). The ARCH model recognizes the temporal dependence in the second moment of stock returns and implies a leptokurtic distribution for the unconditional errors from the stock-return-generating process. Under the well-documented phenomenon known as volatility clustering, a period of increased (decreased) volatility is frequently followed by a period of high (low) volatility that persists for some time. As the ARCH model takes this high persistence of volatility into account, it has often been extended to more complex models that characterize variance changing as a function of time. Bollerslev (1986) extended it into the GARCH (generalized ARCH) model; Glosten, Jagannathan, and Runkle (1993) and Zakoian (1994) extended it to the TGARCH (threshold GARCH) model; and Nelson (1991) extended it to the EGARCH (exponential GARCH) model. This chapter focuses on the ARCH-type modeling approach and the causality technique developed by Cheung and Ng (1996).

¹ A survey article by Bollerslev, Chou, and Kroner (1992) cited more than 300 papers applying ARCH, GARCH, and other related models. ARCH and GARCH models were shown to successfully model time-varying volatility in financial time-series data.
7.2 ARCH and GARCH Models
We begin with a brief review of the ARCH family of statistical models. The ARCH model was originally designed by Engle (1982) to model and forecast the conditional variance. The process allows the conditional variance to change over time as a function of past errors while the unconditional variance remains constant. Let the variable $y_t$ follow the AR(k) process

$$y_t = a_0 + \sum_{i=1}^{k}a_i y_{t-i} + \varepsilon_t, \qquad (7.1)$$

$$\varepsilon_t = \sigma_t z_t, \qquad (7.2)$$
where $z_t$ is independent and identically distributed (i.i.d.) with $E[z_t] = 0$ and $E[z_t^2] = 1$, and $z_t$ and $\sigma_t$ are statistically independent. It thus holds that

$$\operatorname{Var}_{t-1}[y_t] = E_{t-1}\left[(y_t - E_{t-1}(y_t))^2\right] = E_{t-1}[\varepsilon_t^2]. \qquad (7.3)$$
Thus, the conditional variance of $y_t$ is equal to the conditional variance of $\varepsilon_t$. The kurtosis ($K$) of $\varepsilon_t$ is defined as follows:

$$K[\varepsilon_t] = E[\varepsilon_t^4]/(E[\varepsilon_t^2])^2. \qquad (7.4)$$
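Suppose $z_t$ has a normal distribution. Since the kurtosis of a normal distribution is equal to 3, we have $K(z_t) = E[z_t^4]/(E[z_t^2])^2 = 3$. For $\varepsilon_t$, it holds that

$$\begin{aligned} K(\varepsilon_t) &= \frac{E[\sigma_t^4 z_t^4]}{(E[\sigma_t^2 z_t^2])^2} \\ &= \frac{E[\sigma_t^4]E[z_t^4]}{(E[\sigma_t^2])^2(E[z_t^2])^2} \\ &= \frac{3E[\sigma_t^4]}{(E[\sigma_t^2])^2} \\ &\ge 3, \end{aligned} \qquad (7.5)$$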
where the second equality follows from the independence of $\sigma_t$ and $z_t$, and the inequality in the fourth line is implied by Jensen's inequality.² Equation (7.5) shows that the distribution of $\varepsilon_t$ has fatter tails than the normal distribution as long as $\sigma_t$ is not constant (Campbell, Lo and MacKinlay, 1997). This is consistent with the evidence that percentage changes in stock prices have fatter tails than those predicted by a stationary normal distribution (Kon 1984). The ARCH(p) model is specified as follows:
G^,=(iy-{-Y,l=i^i^ii^ ^ - 0 ' a i > 0 .
(7.6)
The conditional variance at time t depends on two factors: a constant ( o ) and past news about volatility taken as the squared error from the past (the ARCH term, i.e.,^^ a-s^ ). The p of the ARCH(p) refers to the number of ARCH terms in equation (7.4). The condition o > 0, a^ >0 guarantees the non-negativity of variance. As equation (7.6) clearly shows, the conditional variance is the weighted average of the squared values of past errors. For the ARCH model, it holds that Var^_JyJ = Et_i[s^] = afE^_Jzf] = af, where a^ is the conditional variance of y^ and is called volatility. ^ Let X be a random variable with mean EpC], and let g(») be a convex function. Here, Jensen's inequality implies E[g(X)] > g(E[X]). For example: g(X)=X^ is convex, hence EP(^]>(E[X])\
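The fat-tail result in (7.5) can be checked numerically. The sketch below is an illustration, not the authors' code: it assumes an ARCH(1) process with normal $z_t$ and hypothetical parameter values $\omega = 0.1$, $\alpha = 0.5$, and computes the sample analogue of the kurtosis (7.4). For ARCH(1) with normal innovations the population kurtosis is $3(1-\alpha^2)/(1-3\alpha^2)$ when $3\alpha^2 < 1$, which equals 9 for $\alpha = 0.5$, so the sample value should sit well above 3, while the kurtosis of $z_t$ itself stays close to 3.

```python
import numpy as np


def simulate_arch1(omega: float, alpha: float, n: int, seed: int = 0) -> np.ndarray:
    """Simulate eps_t = sigma_t * z_t with sigma_t^2 = omega + alpha * eps_{t-1}^2.

    ARCH(1) special case of (7.6) with z_t ~ i.i.d. N(0, 1). The parameter
    values used below are illustrative assumptions, not estimates from the text.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    eps = np.empty(n)
    sigma2 = omega / (1.0 - alpha)  # start at the unconditional variance
    for t in range(n):
        eps[t] = np.sqrt(sigma2) * z[t]
        sigma2 = omega + alpha * eps[t] ** 2  # conditional variance for period t + 1
    return eps


def kurtosis(x: np.ndarray) -> float:
    """Sample analogue of (7.4), K = E[x^4] / (E[x^2])^2, for a zero-mean series."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** 4) / np.mean(x ** 2) ** 2)


eps = simulate_arch1(omega=0.1, alpha=0.5, n=100_000)
z = np.random.default_rng(1).standard_normal(100_000)
print(f"kurtosis of z_t:   {kurtosis(z):.2f}")    # close to 3
print(f"kurtosis of eps_t: {kurtosis(eps):.2f}")  # well above 3
```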
The GARCH model developed by Bollerslev (1986) is an extension of the ARCH model. The ARCH(p) process specifies the conditional variance solely as a linear function of past sample variances, whereas the GARCH(p,q) process allows lagged conditional variances to enter as well. This corresponds to a kind of adaptive learning mechanism. The variance dynamics are thus specified as follows:
$$\sigma_t^2 = \omega + \sum_{i=1}^{p}\alpha_i\varepsilon_{t-i}^2 + \sum_{i=1}^{q}\beta_i\sigma_{t-i}^2, \qquad \omega > 0,\ \alpha_i \ge 0,\ \beta_i \ge 0. \qquad (7.7)$$
The conditional variance at time t depends on three factors: a constant ($\omega$), past news about volatility taken as the squared errors from the past (the ARCH term, i.e., $\sum_{i=1}^{p}\alpha_i\varepsilon_{t-i}^2$), and past forecast variances (the GARCH term, i.e., $\sum_{i=1}^{q}\beta_i\sigma_{t-i}^2$). The (p,q) in GARCH(p,q) refers to p ARCH terms and q GARCH terms. The conditions $\omega > 0$, $\alpha_i \ge 0$, $\beta_i \ge 0$ guarantee the non-negativity of the variance.² This specification is intuitive, since the variance at time t is predicted as a weighted average of a long-term average (the constant), the forecast variance from the previous period (the GARCH term), and the information about volatility observed in the previous period (the ARCH term).
Example 7.1: GARCH(1,1) Model
Let us consider the simple GARCH(1,1) model:³
$$\sigma_t^2 = \omega + \alpha\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2. \qquad (7.8)$$
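A minimal sketch of the GARCH(1,1) recursion (7.8), assuming the shocks $\varepsilon_t$ are given and using a hypothetical starting value $\sigma_1^2$. It also checks the repeated substitution of (7.8) into itself in its finite-sample form: after n substitutions, $\sigma_{n+1}^2 = \omega\sum_{j=0}^{n-1}\beta^j + \alpha\sum_{i=1}^{n}\beta^{i-1}\varepsilon_{n+1-i}^2 + \beta^n\sigma_1^2$. As $n \to \infty$ with $0 \le \beta < 1$, the first term tends to $\omega/(1-\beta)$ and the last term vanishes. The parameter values are illustrative assumptions.

```python
import numpy as np


def garch11_variance(eps, omega, alpha, beta, sigma2_1):
    """Conditional variances from the GARCH(1,1) recursion (7.8).

    eps[t] holds eps_{t+1}; the returned array holds sigma_1^2, ..., sigma_{n+1}^2,
    so the last entry is the one-step-ahead variance after the final shock.
    """
    eps = np.asarray(eps, dtype=float)
    sigma2 = np.empty(len(eps) + 1)
    sigma2[0] = sigma2_1
    for t in range(1, len(eps) + 1):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


def garch11_substituted(eps, omega, alpha, beta, sigma2_1):
    """sigma_{n+1}^2 written as a distributed lag of past squared shocks."""
    eps = np.asarray(eps, dtype=float)
    n = len(eps)
    weights = beta ** np.arange(n)        # beta^(i-1) for i = 1, ..., n
    sq_recent_first = eps[::-1] ** 2      # eps_n^2, eps_{n-1}^2, ..., eps_1^2
    return omega * weights.sum() + alpha * weights @ sq_recent_first + beta ** n * sigma2_1


rng = np.random.default_rng(0)
shocks = rng.standard_normal(500)         # any shock sequence works for this algebraic identity
omega, alpha, beta = 0.05, 0.08, 0.90     # hypothetical values with alpha + beta < 1
recursive = garch11_variance(shocks, omega, alpha, beta, sigma2_1=1.0)[-1]
substituted = garch11_substituted(shocks, omega, alpha, beta, sigma2_1=1.0)
print(recursive, substituted)             # equal up to floating-point rounding
```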
As equation (7.8) shows, the GARCH(1,1) model includes one ARCH term ($\varepsilon_{t-1}^2$) and one GARCH term ($\sigma_{t-1}^2$). If equation (7.8) is lagged by one period and substituted for the lagged variance on the right-hand side, an expression with two lagged squared errors and a two-period lagged variance is obtained. By successively substituting for the lagged conditional variance, the following expression is found:
$$\sigma_t^2 = \frac{\omega}{1-\beta} + \alpha\sum_{i=1}^{\infty}\beta^{i-1}\varepsilon_{t-i}^2. \qquad (7.9)$$

² Nelson and Cao (1992) show that inequality constraints less severe than those commonly imposed are sufficient to keep the conditional variance non-negative. In the GARCH(2,1) case, for example, $\omega > 0$, $\alpha_1 \ge 0$, $\beta_1 \ge 0$, and $\beta_1\alpha_1 + \alpha_2 \ge 0$ are sufficient to ensure $\sigma_t^2 > 0$, so that $\alpha_2$ may be negative.
³ The parameter subscripts are not necessary for the GARCH(1,1), TGARCH(1,1), and EGARCH(1,1) models and are suppressed for the remainder of this section.

[…]

7.3 TGARCH and EGARCH Models

[…]
$$\sigma_t^2 = \omega + \sum_{i=1}^{p}(\alpha_i + \gamma_i D_{t-i})\varepsilon_{t-i}^2 + \sum_{i=1}^{q}\beta_i\sigma_{t-i}^2, \qquad \omega > 0,\ \alpha_i \ge 0,\ \beta_i \ge 0,\ \gamma_i \ge 0, \qquad (7.12)$$
where the dummy variable $D_{t-i}$ is equal to 0 for a positive shock ($\varepsilon_{t-i} > 0$) and 1 for a negative shock ($\varepsilon_{t-i} < 0$). Provided that $\gamma_i > 0$, the TGARCH model generates higher values of $\sigma_t^2$ for a negative shock ($\varepsilon_{t-i} < 0$) than for a positive shock of equal magnitude. As with the ARCH and GARCH models, the parameters of the conditional variance are subject to non-negativity constraints.
Example 7.2: TGARCH(1,1) Model
As a special case, the TGARCH(1,1) model is given as
$$\sigma_t^2 = \omega + (\alpha + \gamma D_{t-1})\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2. \qquad (7.13)$$
In this case, equation (7.13) becomes
$$\sigma_t^2 = \omega + \alpha\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2 \qquad (7.14)$$
for a positive shock ($\varepsilon_{t-1} > 0$), and
$$\sigma_t^2 = \omega + (\alpha + \gamma)\varepsilon_{t-1}^2 + \beta\sigma_{t-1}^2 \qquad (7.15)$$
for a negative shock ($\varepsilon_{t-1} < 0$). Thus, the presence of a leverage effect can be tested with the null hypothesis $\gamma = 0$; the impact is asymmetric if $\gamma \ne 0$. An alternative way of describing the asymmetry in variance is the EGARCH (exponential GARCH) model proposed by Nelson (1991). The EGARCH(p,q) model is given by
$$\log(\sigma_t^2) = \omega + \sum_{i=1}^{p}\big(\alpha_i|z_{t-i}| + \gamma_i z_{t-i}\big) + \sum_{i=1}^{q}\beta_i\log(\sigma_{t-i}^2), \qquad (7.16)$$
where $z_t = \varepsilon_t/\sigma_t$. Note that the left-hand side of equation (7.16) is the log of the conditional variance. The log form of the EGARCH(p,q) model ensures the non-negativity of the conditional variance without the need to constrain the coefficients of the model. The asymmetric effect of positive and negative shocks is represented by the inclusion of the term $\gamma_i z_{t-i}$. If $\gamma_i > 0$, volatility tends to rise (fall) when the lagged standardized shock, $z_{t-i} = \varepsilon_{t-i}/\sigma_{t-i}$, is positive (negative). The persistence of shocks to the conditional variance is given by $\sum_{i=1}^{q}\beta_i$. Since negative coefficients are not precluded, the EGARCH model allows for the possibility of cyclical behavior in volatility.
Example 7.3: EGARCH(1,1) Model
As a special case, the EGARCH(1,1) model is given as follows:
$$\log(\sigma_t^2) = \omega + \alpha|z_{t-1}| + \gamma z_{t-1} + \beta\log(\sigma_{t-1}^2). \qquad (7.17)$$
Equation (7.17) becomes
$$\log(\sigma_t^2) = \omega + (\alpha + \gamma)|z_{t-1}| + \beta\log(\sigma_{t-1}^2) \qquad (7.18)$$
for a positive shock ($z_{t-1} > 0$), and
$$\log(\sigma_t^2) = \omega + (\alpha - \gamma)|z_{t-1}| + \beta\log(\sigma_{t-1}^2) \qquad (7.19)$$
for a negative shock ($z_{t-1} < 0$). Thus, the presence of a leverage effect can be tested with the null hypothesis $\gamma = 0$; the impact is asymmetric if $\gamma \ne 0$. Furthermore, the sum of α and β governs the persistence of volatility shocks in the GARCH(1,1) model, whereas only the parameter β governs the persistence of volatility shocks in the EGARCH(1,1) model.
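The asymmetry in (7.14)-(7.15) and (7.18)-(7.19) can be seen by feeding a positive and a negative shock of equal size into one-step updates. This is a sketch with hypothetical parameter values. For the EGARCH case, γ < 0 is chosen so that a negative shock raises volatility more than a positive one, which is the leverage-type pattern.

```python
import numpy as np


def tgarch11_next_var(eps_prev, sigma2_prev, omega, alpha, gamma, beta):
    """One step of (7.13); D_{t-1} = 1 for a negative shock and 0 otherwise."""
    eps_prev = np.asarray(eps_prev, dtype=float)
    d = np.where(eps_prev < 0, 1.0, 0.0)
    return omega + (alpha + gamma * d) * eps_prev ** 2 + beta * sigma2_prev


def egarch11_next_logvar(z_prev, logvar_prev, omega, alpha, gamma, beta):
    """One step of (7.17) for log(sigma_t^2); no sign restrictions are needed."""
    z_prev = np.asarray(z_prev, dtype=float)
    return omega + alpha * np.abs(z_prev) + gamma * z_prev + beta * logvar_prev


shocks = np.array([1.0, -1.0])  # positive and negative shock of equal magnitude

# Hypothetical TGARCH(1,1) parameters with gamma > 0
tg = tgarch11_next_var(shocks, sigma2_prev=1.0, omega=0.05, alpha=0.05, gamma=0.10, beta=0.90)
# Hypothetical EGARCH(1,1) parameters with gamma < 0
eg = egarch11_next_logvar(shocks, logvar_prev=0.0, omega=-0.1, alpha=0.2, gamma=-0.1, beta=0.95)

print(tg)          # approximately 1.0 and 1.1: the negative shock adds gamma * eps^2
print(np.exp(eg))  # variances after +1 and -1; their log difference is 2 * gamma
```

In the TGARCH case the asymmetry is additive in the variance, while in the EGARCH case it is additive in the log variance, which is why the EGARCH parameters need no sign constraints.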
7.4 Causality-in-Variance Approach
Cheung and Ng (1996) developed a testing procedure for causality-in-mean and causality-in-variance. This test is based on the residual cross-correlation function (CCF) and is robust to distributional assumptions. Their procedure to test for causality-in-mean and causality-in-variance consists of two steps. The first step involves the estimation of univariate time-series models that allow for time variation in both conditional means and conditional variances. The second step constructs the residuals standardized by their conditional standard deviations and the squares of these standardized residuals. The CCF of the standardized residuals is used to test the null hypothesis of no causality-in-mean, while the CCF of the squared standardized residuals is used to test the null hypothesis of no causality-in-variance.
In the vein of Cheung and Ng (1996) and Hong (2001), let us summarize the two-step procedure of testing causality. Suppose that there are two stationary time series, $X_t$ and $Y_t$, and let $I_{1,t}$, $I_{2,t}$ and $I_t$ be three information sets defined by $I_{1,t} = (X_t, X_{t-1}, \ldots)$, $I_{2,t} = (Y_t, Y_{t-1}, \ldots)$ and $I_t = (X_t, X_{t-1}, \ldots, Y_t, Y_{t-1}, \ldots)$. Y is said to cause X in mean if
$$E[X_t \mid I_{1,t-1}] \ne E[X_t \mid I_{t-1}]. \qquad (7.20)$$
Similarly, X is said to cause Y in mean if
$$E[Y_t \mid I_{2,t-1}] \ne E[Y_t \mid I_{t-1}]. \qquad (7.21)$$
Feedback in mean occurs if Y causes X in mean and X causes Y in mean. On the other hand, Y is said to cause X in variance if
$$E[(X_t - \mu_{X,t})^2 \mid I_{1,t-1}] \ne E[(X_t - \mu_{X,t})^2 \mid I_{t-1}], \qquad (7.22)$$
where $\mu_{X,t}$ is the mean of $X_t$ conditioned on $I_{1,t-1}$. Similarly, X is said to cause Y in variance if
$$E[(Y_t - \mu_{Y,t})^2 \mid I_{2,t-1}] \ne E[(Y_t - \mu_{Y,t})^2 \mid I_{t-1}], \qquad (7.23)$$
where $\mu_{Y,t}$ is the mean of $Y_t$ conditioned on $I_{2,t-1}$. Feedback in variance occurs if X causes Y in variance and Y causes X in variance. Causality-in-variance is of interest in its own right, since it is directly related to volatility spillover across different assets or markets. As the concept defined in equations (7.20) through (7.23) is too general to test empirically, additional structure is required to make the general causality concept applicable in practice. Suppose $X_t$ and $Y_t$ can be written as
$$X_t = \mu_{X,t} + \sqrt{h_{X,t}}\,\varepsilon_t, \qquad (7.24)$$
$$Y_t = \mu_{Y,t} + \sqrt{h_{Y,t}}\,\zeta_t, \qquad (7.25)$$
where $\varepsilon_t$ and $\zeta_t$ are two independent white noise processes with zero mean and unit variance. For the causality-in-mean test, we have the standardized innovations
$$\varepsilon_t = (X_t - \mu_{X,t})/\sqrt{h_{X,t}}, \qquad (7.26)$$
$$\zeta_t = (Y_t - \mu_{Y,t})/\sqrt{h_{Y,t}}. \qquad (7.27)$$
Since both $\varepsilon_t$ and $\zeta_t$ are unobservable, we have to use their estimates, $\hat\varepsilon_t$ and $\hat\zeta_t$, to test the hypothesis of no causality-in-mean. Next, the sample cross-correlation coefficient at lag k, $\hat r_{\varepsilon\zeta}(k)$, is computed from the consistent estimates of the conditional mean and variance of $X_t$ and $Y_t$. This gives us
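$$\hat r_{\varepsilon\zeta}(k) = c_{\varepsilon\zeta}(k)\big/\sqrt{c_{\varepsilon\varepsilon}(0)\,c_{\zeta\zeta}(0)}, \qquad (7.28)$$
where $c_{\varepsilon\zeta}(k)$ is the k-th lag sample cross-covariance given by
$$c_{\varepsilon\zeta}(k) = \frac{1}{T}\sum_{t}(\hat\varepsilon_t - \bar\varepsilon)(\hat\zeta_{t-k} - \bar\zeta), \qquad k = 0, \pm1, \pm2, \ldots, \qquad (7.29)$$
and, similarly, $c_{\varepsilon\varepsilon}(0)$ and $c_{\zeta\zeta}(0)$ are defined as the sample variances of $\hat\varepsilon_t$ and $\hat\zeta_t$, respectively. Causality in the mean of $X_t$ and $Y_t$ can be tested by examining $\hat r_{\varepsilon\zeta}(k)$, the univariate standardized residual CCF. Under regularity conditions, it holds that
$$\sqrt{T}\,\hat r_{\varepsilon\zeta}(k_i) \xrightarrow{L} N(0,1), \qquad i = 1, 2, \ldots, m, \qquad (7.30)$$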
where $\xrightarrow{L}$ denotes convergence in distribution. We can test the null hypothesis of no causality-in-mean using this test statistic. To test for a causal relationship at a specified lag k, we compute $\sqrt{T}\,\hat r_{\varepsilon\zeta}(k)$. If the absolute value of the test statistic is larger than the critical value of the standard normal distribution, we reject the null hypothesis. For the causality-in-variance test, let $u_t$ and $v_t$ be the squares of the standardized innovations, given by
$$u_t = (X_t - \mu_{X,t})^2/h_{X,t} = \varepsilon_t^2, \qquad (7.31)$$
$$v_t = (Y_t - \mu_{Y,t})^2/h_{Y,t} = \zeta_t^2. \qquad (7.32)$$
Since both $u_t$ and $v_t$ are unobservable, their estimates, $\hat u_t$ and $\hat v_t$, have to be used to test the hypothesis of no causality-in-variance.
Next, the sample cross-correlation coefficient at lag k, $\hat r_{uv}(k)$, is computed from the consistent estimates of the conditional mean and variance of $X_t$ and $Y_t$. This gives us
$$\hat r_{uv}(k) = c_{uv}(k)\big/\sqrt{c_{uu}(0)\,c_{vv}(0)}, \qquad (7.33)$$
where $c_{uv}(k)$ is the k-th lag sample cross-covariance given by
$$c_{uv}(k) = \frac{1}{T}\sum_{t}(\hat u_t - \bar u)(\hat v_{t-k} - \bar v), \qquad k = 0, \pm1, \pm2, \ldots, \qquad (7.34)$$
and, similarly, $c_{uu}(0)$ and $c_{vv}(0)$ are defined as the sample variances of $\hat u_t$ and $\hat v_t$, respectively. Causality in the variance of $X_t$ and $Y_t$ can be tested by examining the squared standardized residual CCF, $\hat r_{uv}(k)$. Under regularity conditions, it holds that
$$\sqrt{T}\,\hat r_{uv}(k_i) \xrightarrow{L} N(0,1), \qquad i = 1, 2, \ldots, m. \qquad (7.35)$$
We can test the null hypothesis of no causality-in-variance using this test statistic. To test for a causal relationship at a specified lag k, we compute $\sqrt{T}\,\hat r_{uv}(k)$. If the absolute value of the test statistic is larger than the critical value of the standard normal distribution, we reject the null hypothesis.
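The second step of the Cheung-Ng procedure can be sketched as follows. The code assumes that the standardized residuals $\hat\varepsilon_t$ and $\hat\zeta_t$ from the first-step univariate models (for example, AR-GARCH fits) are already available as equal-length arrays; the first-step estimation is not shown. The same function gives the levels CCF of (7.28)-(7.30) and, with `squares=True`, the squared-residual CCF of (7.33)-(7.35). Following (7.29), a positive k pairs $\hat\varepsilon_t$ with $\hat\zeta_{t-k}$, so a significant value at k > 0 points to Y causing X, and one at k < 0 to X causing Y. The cut-off 1.96 is the two-sided 5% critical value of the standard normal distribution. The demonstration data are synthetic.

```python
import numpy as np


def ccf(a, b, k: int) -> float:
    """Sample cross-correlation between a_t and b_{t-k}, as in (7.28)-(7.29)."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    T = len(a)
    if k >= 0:
        c_ab = np.sum(a[k:] * b[: T - k]) / T     # pairs (a_t, b_{t-k})
    else:
        c_ab = np.sum(a[: T + k] * b[-k:]) / T    # pairs (a_t, b_{t+|k|})
    return float(c_ab / np.sqrt(np.mean(a * a) * np.mean(b * b)))


def cheung_ng_ccf_test(e_x, e_y, max_lag: int, squares: bool = False, crit: float = 1.96):
    """Return {k: (sqrt(T) * r(k), |stat| > crit)} for k = -max_lag, ..., max_lag.

    e_x, e_y: standardized residuals of X and Y from the first-step models.
    squares=False tests causality-in-mean; squares=True uses u_t = e_x^2 and
    v_t = e_y^2 and tests causality-in-variance.
    """
    u = np.asarray(e_x, dtype=float)
    v = np.asarray(e_y, dtype=float)
    if squares:
        u, v = u ** 2, v ** 2
    T = len(u)
    out = {}
    for k in range(-max_lag, max_lag + 1):
        stat = np.sqrt(T) * ccf(u, v, k)
        out[k] = (stat, abs(stat) > crit)
    return out


# Synthetic illustration (not the gold futures data): zeta_t loads on eps_{t-1},
# so X causes Y in mean with a one-period delay.
rng = np.random.default_rng(42)
T = 5000
eps = rng.standard_normal(T)
zeta = rng.standard_normal(T)
zeta[1:] = 0.3 * eps[:-1] + np.sqrt(1 - 0.3 ** 2) * zeta[1:]

levels = cheung_ng_ccf_test(eps, zeta, max_lag=5)
print(levels[-1])  # large statistic at k = -1: X leads Y
squared = cheung_ng_ccf_test(eps, zeta, max_lag=5, squares=True)
```

Because each lag is tested separately against 1.96, roughly one lag in twenty will appear significant by chance under the null, which is worth keeping in mind when reading a table of many lags.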
7.5 Information Flow between Price Change and Trading Volume
Many researchers have studied the interaction between price and trading volume in financial assets.⁴ The importance of this relationship stems from the widespread belief that the arrival of new information induces trading in asset markets. The trading volume is thought to reflect information about aggregate changes in investor expectations. Researchers might also be interested in the prospect of devising profitable technical trading rules based on strong relationships they observe between price and trading volume. The ability to better forecast price movements in the futures market might also help improve hedging strategies.⁵

⁴ The content of this section is based on the following paper, with permission from the journal: Bhar R, Hamori S (2004) Information flow between price change and trading volume in gold futures contracts. International Journal of Business and Economics, 3: 45-56.

This section attempts to characterize the interaction between the percentage price change and trading volume in gold futures contracts. Gold futures are an interesting market to investigate, since events in other markets, for example equities, are generally expected to influence the trading of gold. When the equity market underperforms, speculative trading in the gold market is likely to rise. If this occurs, the rising short sales of gold will be transacted chiefly in the futures market, owing to the relative difficulty of taking short positions in the physical market. In combination, these changes could lead to different patterns of information flow between the percentage price change and trading volume in gold futures as opposed to the other commodity futures contracts mentioned before.
This section uses daily data on the gold futures price and trading volume from January 3, 1990 to December 27, 2000. The continuous series of futures data are obtained from Datastream and represent NYMEX daily settlement prices. The percentage return is calculated as $y_t = (P_t - P_{t-1}) \times 100 / P_{t-1}$, where $P_t$ is the futures price at time t. Thus, the percentage price change is obtained for the period between January 4, 1990 and December 27, 2000. We model the dynamics of the percentage price change and trading volume using the following AR-GARCH process. The simplicity of the AR structure in the mean equation justifies its use for the single time series here. The GARCH effect in the variance process is well known for most futures contracts, particularly at the daily frequency.
$$y_t = a_0 + \sum_{i=1}^{p_1} a_i y_{t-i} + \varepsilon_t, \qquad \varepsilon_{t\mid t-1} \sim N(0, \sigma_t^2), \qquad (7.36)$$
$$\sigma_t^2 = \omega + \sum_{i=1}^{p_2}\alpha_i\varepsilon_{t-i}^2 + \sum_{i=1}^{p_3}\beta_i\sigma_{t-i}^2, \qquad (7.37)$$
where $y_t$ is the percentage price change ($R_t$) or trading volume ($V_t$). Equation (7.36) shows the conditional mean dynamics and is specified as the AR($p_1$) model. Here, $\varepsilon_t$ is the error term with conditional variance $\sigma_t^2$. Equation (7.37) shows the conditional variance dynamics and is specified as the GARCH($p_2$, $p_3$) model, where $p_2$ and $p_3$ are the numbers of ARCH terms and GARCH terms, respectively.

⁵ Other recent studies focusing on these issues include Fujihara and Mougoue (1997), Moosa and Silvapulle (2000), and Kocagil and Shachmurove (1998).

The results from fitting the AR-GARCH model to the percentage price change and trading volume are reported in Table 7.1. The Schwarz Bayesian information criterion (SBIC) and diagnostic statistics are used to choose the final models from various possible AR-GARCH specifications. The maximum likelihood estimates confirm that the percentage price change and trading volume exhibit significant conditional heteroskedasticity. The lag order of the AR part in the mean equation (7.36) is set at five for the price data and ten for the trading volume data. The GARCH(2,1) model is chosen for the percentage price change, while the GARCH(1,1) model is chosen for the trading volume. For the price data, the coefficient of the GARCH term is 0.966 and the corresponding standard error is 0.009, indicating substantial persistence. For the trading volume data, the coefficient of the GARCH term is relatively small, 0.908, and its corresponding standard error is 0.035, indicating less persistence than in the price data. Q(20) and Q²(20) are the Ljung-Box statistics with 20 lags for the standardized residuals and their squares. These statistics, calculated from the first 20 autocorrelation coefficients of the standardized residuals and their squares, indicate that the null hypothesis of no autocorrelation cannot be rejected for either the price change or the trading volume. This suggests that the selected specifications explain the data quite well. The cross correlations computed from the standardized residuals of the AR-GARCH models of Table 7.1 are given in Table 7.2. The "lag" refers to the number of days that the trading volume data lags behind the percentage price change data.
The "lead" refers to the number of days that the percentage price change data lags behind the trading volume data. The significance of a statistic in the "lag" column implies that the trading volume causes the percentage price change. Similarly, the significance of a statistic in the "lead" column implies that the percentage price change causes the trading volume. Cross correlation statistics under the "Levels" columns are based on the standardized residuals themselves and are used to test for causality in the mean. Cross correlation statistics under the "Squares" columns are based on the squares of the standardized residuals and are used to test for causality in the variance.
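Estimating (7.36)-(7.37) by maximum likelihood amounts to maximizing the conditional Gaussian log-likelihood over $(a_0, \ldots, a_{p_1}, \omega, \alpha_1, \ldots, \alpha_{p_2}, \beta_1, \ldots, \beta_{p_3})$. The function below is a sketch of that objective only, under two assumptions that are not from the text: the first $p_1$ observations serve as pre-sample values, and the pre-sample $\varepsilon^2$ and $\sigma^2$ are set to the mean squared residual. The numerical optimizer, the SBIC model search and the Bollerslev-Wooldridge robust standard errors behind Table 7.1 are not shown, and the parameter values in the usage line are hypothetical.

```python
import numpy as np


def ar_garch_loglik(y, a, omega, alpha, beta):
    """Conditional Gaussian log-likelihood of an AR(p1)-GARCH(p2, p3) model.

    a     = [a_0, a_1, ..., a_p1]     mean equation (7.36)
    alpha = [alpha_1, ..., alpha_p2]  ARCH terms in (7.37)
    beta  = [beta_1, ..., beta_p3]    GARCH terms in (7.37)
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    p1, p2, p3 = len(a) - 1, len(alpha), len(beta)
    n = len(y)

    # Residuals eps_t = y_t - a_0 - sum_i a_i y_{t-i} for t = p1, ..., n - 1
    eps = y[p1:] - a[0]
    for i in range(1, p1 + 1):
        eps -= a[i] * y[p1 - i : n - i]

    # Variance recursion (7.37); m pre-sample slots hold the mean squared residual
    m = max(p2, p3)
    init = np.mean(eps ** 2)
    eps2 = np.concatenate([np.full(m, init), eps ** 2])
    sigma2 = np.concatenate([np.full(m, init), np.empty(len(eps))])
    for t in range(m, len(sigma2)):
        arch = sum(alpha[i] * eps2[t - 1 - i] for i in range(p2))
        garch = sum(beta[j] * sigma2[t - 1 - j] for j in range(p3))
        sigma2[t] = omega + arch + garch

    s2 = sigma2[m:]
    return float(-0.5 * np.sum(np.log(2.0 * np.pi * s2) + eps ** 2 / s2))


rng = np.random.default_rng(0)
y = rng.standard_normal(1000)
print(ar_garch_loglik(y, a=[0.0, 0.1], omega=0.05, alpha=[0.1], beta=[0.85]))
```

A general-purpose optimizer would be run on the negative of this function, with the non-negativity constraints of (7.37) imposed through bounds or a reparameterization.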
Table 7.1 AR-GARCH model for percentage price change and trading volume
$y_t = a_0 + \sum_{i=1}^{p_1} a_i y_{t-i} + \varepsilon_t, \quad \varepsilon_{t\mid t-1} \sim N(0, \sigma_t^2)$

Parameter        Percentage price change       Trading volume
                 Estimate       SE             Estimate       SE
a0               -0.016         0.012          3.069**        0.296
a1               -0.033         0.025          0.368**        0.021
a2               -0.013         0.021          0.090**        0.020
a3               -0.045         0.024          0.077**        0.022
a4                0.017         0.024          0.065**        0.023
a5                0.034         0.021          0.037          0.022
a6                                             -0.006         0.021
a7                                             -0.006         0.020
a8                                             -0.016         0.020
a9                                              0.046*        0.021
a10                                             0.043*        0.021
ω                 0.001         0.001          0.014*         0.007
α1                0.177**       0.062          0.041**        0.014
α2               -0.141*        0.060
β1                0.966**       0.009          0.908**        0.035
Log likelihood   -2974.926                     2111.105
Q(20)             15.978                       16.79
  p-value         0.718                        0.667
Q²(20)            18.30                        22.070
  p-value         0.568                        0.337

* indicates significance at the 5% level. ** indicates significance at the 1% level. Bollerslev-Wooldridge (1992) robust standard errors are used to calculate the t-values. Q(20) and Q²(20) are the Ljung-Box statistics with 20 lags for the standardized residuals and their squares.
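The p-values reported for Q(20) and Q²(20) in Table 7.1 come from the χ² distribution with 20 degrees of freedom. The sketch below computes the Ljung-Box statistic $Q(m) = T(T+2)\sum_{k=1}^{m}\hat\rho_k^2/(T-k)$ and its p-value without a statistics library, using the closed form of the χ² survival function that holds for an even number of degrees of freedom. This is an implementation choice for the example; in practice a library routine such as SciPy's `scipy.stats.chi2.sf` would normally be used. Applied to the reported statistics, it reproduces, for example, the p-value 0.718 for Q(20) = 15.978, which indicates that 20 degrees of freedom were used.

```python
import math

import numpy as np


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even number of degrees of freedom")
    lam = x / 2.0
    term = math.exp(-lam)
    total = term
    for j in range(1, df // 2):
        term *= lam / j
        total += term
    return total


def ljung_box(x, m: int):
    """Ljung-Box Q(m) for the autocorrelations of x at lags 1..m, with its chi2(m) p-value."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    T = len(x)
    denom = np.sum(x * x)
    lags = np.arange(1, m + 1)
    rho = np.array([np.sum(x[k:] * x[: T - k]) / denom for k in lags])
    q = T * (T + 2) * np.sum(rho ** 2 / (T - lags))
    return float(q), chi2_sf_even_df(float(q), m)


# Reproduce two p-values reported in Table 7.1 (20 degrees of freedom)
print(round(chi2_sf_even_df(15.978, 20), 3))  # Q(20), price change: 0.718
print(round(chi2_sf_even_df(22.070, 20), 3))  # Q^2(20), trading volume: 0.337
```

To apply `ljung_box` to the diagnostics of an AR-GARCH fit, pass the standardized residuals for Q(20) and their squares for Q²(20).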
Table 7.2 Cross correlation analysis for the levels and squares of the standardized residuals
Columns: k; Levels (Lag R&V(-k), Lead R&V(+k)); Squares (Lag R&V(-k), Lead R&V(+k))
0.019 0.223** 0 0 1 1 -0.015 0.025 -0.009 -0.006 2 0.022 2 0.017 0.008 -0.038* 0.039* 3 0.038* 3 0.001 -0.007 4 0.037 4 0.034 0.001 -0.015 0.022 5 -0.010 5 0.015 0.005 6 0.005 -0.020 6 0.009 -0.001 0.039* 7 0.007 7 -0.025 -0.025 0.032 8 0.001 8 -0.007 -0.036 0.014 -0.002 9 0.001 9 0.006 0.000 0.003 10 0.024 10 0.016
* indicates significance at the 5% level. ** indicates significance at the 1% level.
The empirical results of cross correlations in Table 7.2 reveal a complex and dynamic causation pattern between the percentage price change and the trading volume. For instance, the feedback effects in the means involve a high-order lag structure. Trading volume causes the mean percentage price change at lag three at the 5% significance level. The percentage price change causes the mean of the trading volume at lags three and seven at the 5% significance level. Further, there is evidence of strong contemporaneous causality-in-variance and mild lagged causality-in-variance when moving from the percentage price change to the trading volume, but not vice versa: the percentage price change causes the variance of the trading volume at lag two at the 5% significance level. These results show that a proper account of conditional heteroskedasticity can have significant implications for the study of price change and trading volume spillovers. The information flows between the price change and trading volume influence not only their mean movements but also their volatility movements in this market.
References
Bhar R, Hamori S (2004) Information flow between price change and trading volume in gold futures contracts. International Journal of Business and Economics, 3: 45-56
Bollerslev T (1986) Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31: 307-327
Bollerslev T, Chou RY, Kroner KF (1992) ARCH modeling in finance. Journal of Econometrics, 52: 5-59
Campbell JY, Lo AW, MacKinlay AC (1997) The econometrics of financial markets. Princeton University Press, Princeton, New Jersey
Cheung Y-W, Ng LK (1996) A causality-in-variance test and its application to financial market prices. Journal of Econometrics, 72: 33-48
Christie AA (1982) The stochastic behavior of common stock variances: value, leverage and interest rate effects. Journal of Financial Economics, 10: 407-432
Enders W (2004) Applied econometric time series, 2nd edn. John Wiley & Sons, New York
Engle RF (1982) Autoregressive conditional heteroskedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50: 987-1008
Fujihara RA, Mougoue M (1997) An examination of linear and nonlinear causal relationships between price variability and volume in petroleum futures markets. Journal of Futures Markets, 17: 385-416
Glosten LR, Jagannathan R, Runkle DE (1993) On the relation between the expected value and the volatility of the nominal excess return on stocks. Journal of Finance, 48: 1779-1801
Hong Y (2001) A test for volatility spillover with application to exchange rates. Journal of Econometrics, 103: 183-224
Kocagil AE, Shachmurove Y (1998) Return-volume dynamics in futures markets. Journal of Futures Markets, 18: 399-426
Kon SJ (1984) Models of stock returns: a comparison. Journal of Finance, 39: 147-165
Moosa IA, Silvapulle P (2000) The price-volume relationship in the crude oil futures market: some results based on linear and nonlinear causality testing. International Review of Economics and Finance, 9: 11-30
Nelson DB (1991) Conditional heteroskedasticity in asset returns: a new approach. Econometrica, 59: 347-370
Nelson DB, Cao CQ (1992) Inequality constraints in the univariate GARCH model. Journal of Business and Economic Statistics, 10: 229-235
Schwert GW (1989) Why does stock market volatility change over time? Journal of Finance, 44: 1115-1153
Zakoian J-M (1994) Threshold heteroskedastic models. Journal of Economic Dynamics and Control, 18: 931-955
8 State-Space Models (I)
8.1 Background
Many problems that arise in the analysis of data in such diverse fields as air pollution, economics, or sociology require the researcher to work with incompletely specified noisy data. For example, one might mention pollution data where values can be missing on days when measurements were not made, or economic data where several different sources provide partially complete data relating to some given series of interest. We therefore need general techniques for interpolating sections of data where there are missing values and for constructing reasonable forecasts of future values. A very general model that subsumes a whole class of special cases of interest, in much the same way that linear regression does, is the state-space model (SSM) or dynamic linear model (DLM). This was introduced by Kalman (1960) and Kalman and Bucy (1961). The model was originally intended for aerospace-related research, but it has found immense application in economics. This approach typically analyzes dynamic time series models that involve unobserved components. Potential applications in econometrics that involve unobserved variables include, for example, permanent income, expectations, and the ex ante real interest rate. Before we introduce the basic idea in the formulation of a state-space model, we will review the concepts of classical regression, in both the univariate and multivariate cases.
8.2 Classical Regression
We start this discussion in the time series context by assuming some output or dependent time series, say, $x_t$, $t = 1, 2, \ldots, n$, that is being influenced by a collection of possible input or independent series, say, $z_{t1}, z_{t2}, \ldots, z_{tq}$, where we consider the inputs as fixed and known. This relationship is expressed through the linear regression model
$$x_t = \beta_1 z_{t1} + \beta_2 z_{t2} + \cdots + \beta_q z_{tq} + w_t, \qquad (8.1)$$
where $\beta_1, \beta_2, \ldots, \beta_q$ are unknown fixed regression coefficients and $w_t$ is the random error or noise. This is normally assumed to be white noise with mean zero and variance $\sigma_w^2$. The linear model above can be described more conveniently using matrix and vector notation as
$$x_t = \beta' z_t + w_t, \qquad (8.2)$$
where $\beta = [\beta_1, \beta_2, \ldots, \beta_q]'$ and $z_t = [z_{t1}, z_{t2}, \ldots, z_{tq}]'$. When the noise term $w_t$ is normally distributed, it can be shown that an estimate of the coefficients of interest is given by
$$\hat\beta = (Z'Z)^{-1}Z'X, \qquad (8.3)$$
when the rank of $Z'Z$ is q. In equation (8.3) we define $Z = [z_1, z_2, \ldots, z_n]'$ and $X = [x_1, x_2, \ldots, x_n]'$. Similarly, an estimate of the variance of the noise is given by
$$\hat\sigma_w^2 = \frac{1}{n-q}\sum_{t=1}^{n}\big(x_t - \hat\beta' z_t\big)^2. \qquad (8.4)$$
It is often necessary to explore the statistical significance of the estimated coefficients, and this is done with the help of the diagonal elements of the covariance matrix of the coefficients, given by
$$\widehat{\mathrm{cov}}(\hat\beta) = \hat\sigma_w^2(Z'Z)^{-1}. \qquad (8.5)$$
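The estimates in (8.3) and (8.5) translate directly into a few lines of NumPy. This is a sketch on simulated data with hypothetical coefficients. It solves the normal equations rather than forming $(Z'Z)^{-1}$ for the point estimate, which is numerically preferable, and it uses the degrees-of-freedom-corrected divisor n - q for the noise variance.

```python
import numpy as np


def ols(Z, x):
    """Return beta_hat (8.3), the noise-variance estimate, and cov(beta_hat) (8.5)."""
    Z = np.asarray(Z, dtype=float)
    x = np.asarray(x, dtype=float)
    n, q = Z.shape
    ZtZ = Z.T @ Z                                  # must have rank q
    beta_hat = np.linalg.solve(ZtZ, Z.T @ x)       # (Z'Z)^{-1} Z'X without an explicit inverse
    resid = x - Z @ beta_hat
    s2 = float(resid @ resid) / (n - q)            # noise variance with divisor n - q
    cov_beta = s2 * np.linalg.inv(ZtZ)             # (8.5)
    return beta_hat, s2, cov_beta


rng = np.random.default_rng(0)
n = 200
Z = np.column_stack([np.ones(n), rng.standard_normal(n)])    # intercept + one regressor
x = Z @ np.array([1.0, 2.0]) + 0.5 * rng.standard_normal(n)  # hypothetical beta = (1, 2)
beta_hat, s2, cov_beta = ols(Z, x)
t_stats = beta_hat / np.sqrt(np.diag(cov_beta))  # significance via the diagonal of (8.5)
print(beta_hat, s2, t_stats)
```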
To understand the capabilities of state-space models, a basic understanding of the multivariate time series regression technique is required. We next discuss this technique in the classical regression context. Suppose that, instead of a single output variable $x_t$, a collection of p output variables $y_{t1}, y_{t2}, \ldots, y_{tp}$ exist that are related to the inputs as
$$y_{ti} = \beta_{i1}z_{t1} + \beta_{i2}z_{t2} + \cdots + \beta_{iq}z_{tq} + w_{ti}, \qquad (8.6)$$
for each output variable $i = 1, 2, \ldots, p$. We assume that the noise terms $w_{ti}$ are correlated through the identifier i but uncorrelated through t. We denote $\mathrm{cov}(w_{si}, w_{tj}) = \sigma_{ij}$ for $s = t$ and 0 otherwise. In matrix notation, let $y_t = [y_{t1}, y_{t2}, \ldots, y_{tp}]'$ be the vector of outputs, and let $B = \{\beta_{ij}\}$, $i = 1, 2, \ldots, p$, $j = 1, 2, \ldots, q$, be a $p \times q$ matrix containing the regression coefficients; then
$$y_t = Bz_t + w_t. \qquad (8.7)$$
Here, the $p \times 1$ vector process $w_t$ is a collection of independent noises with covariance matrix $E\{w_t w_t'\} = \Sigma_w$, the $p \times p$ matrix containing the covariances $\sigma_{ij}$. The maximum likelihood estimator in this case (similar to the univariate case) is
$$\hat B = Y'Z(Z'Z)^{-1}, \qquad (8.8)$$
where $Z' = [z_1, z_2, \ldots, z_n]$ and $Y' = [y_1, y_2, \ldots, y_n]$. The noise covariance matrix is estimated by
$$\hat\Sigma_w = \frac{1}{n-q}\sum_{t=1}^{n}(y_t - \hat B z_t)(y_t - \hat B z_t)'. \qquad (8.9)$$
Again, the standard errors of the estimated coefficients are given by
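$$\mathrm{se}(\hat\beta_{ij}) = \sqrt{\hat\sigma_{ii}\,c_{jj}}, \qquad (8.10)$$
where $\hat\sigma_{ii}$ is the i-th diagonal element of $\hat\Sigma_w$ and $c_{jj}$ is the j-th diagonal element of $\big(\sum_{t=1}^{n} z_t z_t'\big)^{-1} = (Z'Z)^{-1}$.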
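The multivariate estimates (8.8)-(8.10) can be sketched the same way. The example assumes Z is stored as an n × q array and Y as an n × p array (rows are time points), uses the divisor n - q for the noise covariance, and follows $B = \{\beta_{ij}\}$ with i indexing outputs and j indexing inputs. The standard error of $\hat\beta_{ij}$ therefore combines the i-th noise variance with the j-th diagonal element of $(Z'Z)^{-1}$. The simulated coefficients and noise covariance are hypothetical.

```python
import numpy as np


def multivariate_regression(Z, Y):
    """B_hat = Y'Z(Z'Z)^{-1} (p x q), Sigma_w_hat (p x p), and se(beta_hat_ij) (p x q)."""
    Z = np.asarray(Z, dtype=float)   # n x q inputs, row t is z_t'
    Y = np.asarray(Y, dtype=float)   # n x p outputs, row t is y_t'
    n, q = Z.shape
    C = np.linalg.inv(Z.T @ Z)       # (Z'Z)^{-1}, diagonal elements c_jj
    B_hat = Y.T @ Z @ C              # (8.8)
    W = Y - Z @ B_hat.T              # row t is (y_t - B_hat z_t)'
    Sigma_w = W.T @ W / (n - q)      # (8.9)
    se = np.sqrt(np.outer(np.diag(Sigma_w), np.diag(C)))  # se[i, j] = sqrt(sigma_ii * c_jj)
    return B_hat, Sigma_w, se


rng = np.random.default_rng(1)
n = 300
Z = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])  # q = 3 inputs
B_true = np.array([[1.0, 0.5, -0.3], [0.0, 1.5, 0.2]])          # hypothetical, p = 2 outputs
noise = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.4], [0.4, 0.5]], size=n)
Y = Z @ B_true.T + noise
B_hat, Sigma_w, se = multivariate_regression(Z, Y)
print(B_hat.round(2))
```

Each row of $\hat B$ coincides with a separate univariate least-squares regression of one output on the inputs; the multivariate formulation adds the cross-equation noise covariance $\hat\Sigma_w$.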
8.3 Important Time Series Processes
In this section we review the structure of the most commonly encountered univariate and multivariate time series processes: autoregressive (AR), autoregressive moving average (ARMA), vector autoregressive (VAR) and vector autoregressive moving average (VARMA) processes. These processes become very useful when discussing state-space models. An AR process of order p, i.e., AR(p), for a univariate series $x_t$ is represented as
$$x_t = \mu + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t. \qquad (8.11)$$
Here μ is a constant. It is instructive to note that, by suitably defining some vector quantities, it is possible to represent equation (8.11) as the regression equation examined before. For example, let $\phi = [\phi_1, \phi_2, \ldots, \phi_p]'$ and $X_{t-1} = [x_{t-1}, x_{t-2}, \ldots, x_{t-p}]'$; then
$$x_t = \mu + \phi' X_{t-1} + w_t. \qquad (8.12)$$
There are, however, some technical difficulties with this representation when compared to the regression equation (8.2), since $z_t$ was assumed fixed, whereas here $X_{t-1}$ is not fixed. We will not, however, pursue these technical issues further. An alternative to the autoregressive model is one in which $x_t$ on the left-hand side is assumed to be determined by a linear combination of the white noise $w_t$ on the right-hand side. This gives rise to the moving average model of order q, MA(q), represented as
$$x_t = w_t + \theta_1 w_{t-1} + \cdots + \theta_q w_{t-q}, \qquad (8.13)$$
where there are q lags in the moving average and $\theta_1, \theta_2, \ldots, \theta_q$ are parameters that determine the overall pattern of the process. A combination of these two models has also found many useful applications. This is referred to as the autoregressive moving average, or ARMA(p,q), model, with autoregressive order p and moving average order q. The general structure is
$$x_t = \mu + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \cdots + \theta_q w_{t-q}. \qquad (8.14)$$
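The regression form (8.12) suggests estimating an AR(p) model by least squares on lagged values. The sketch below simulates an ARMA process from the recursion (8.14) with standard normal $w_t$, assuming zero pre-sample values and a burn-in period (both illustrative choices), and then regresses $x_t$ on $[1, x_{t-1}, \ldots, x_{t-p}]$. The coefficients used are hypothetical.

```python
import numpy as np


def simulate_arma(mu, phi, theta, n, seed=0, burn=500):
    """Simulate (8.14) with w_t ~ N(0, 1); the first `burn` values are discarded."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    w = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(n + burn):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * w[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x[t] = mu + ar + w[t] + ma
    return x[burn:]


def fit_ar_ols(x, p):
    """Least-squares fit of (8.12): returns [mu_hat, phi_hat_1, ..., phi_hat_p]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lags = [x[p - i : n - i] for i in range(1, p + 1)]  # column i holds x_{t-i}
    X = np.column_stack([np.ones(n - p)] + lags)
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef


x = simulate_arma(mu=1.0, phi=[0.5, -0.3], theta=[], n=20_000)
print(fit_ar_ols(x, p=2).round(3))  # roughly [1.0, 0.5, -0.3]
```

The regressors here are lagged values of the series itself rather than fixed inputs, which is exactly the technical difference from (8.2) noted in the text; least squares nevertheless recovers the AR coefficients in large samples.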
[Fig. 8.1 US stock price and stock return. Panels: US Stock Price; Stock Return. Horizontal axis: Jan-80 to Jan-98]
Another important concept in time series modeling is stationarity. A time series is strictly stationary if the probabilistic behavior of $x_{t_1}, x_{t_2}, \ldots, x_{t_k}$ is identical to that of the shifted set $x_{t_1+h}, x_{t_2+h}, \ldots, x_{t_k+h}$ for any collection of time points $t_1, t_2, \ldots, t_k$, for any number $k = 1, 2, \ldots$, and for any shift $h = 0, \pm1, \pm2, \ldots$. This means that all multivariate distribution functions for subsets of variables must agree with their counterparts in the shifted set for all values of the shift parameter h. For a practical understanding of stationary and non-stationary series, Fig. 8.1 shows the US stock price over the period January 1980 to December 1998, together with the stock return, i.e., the percentage change in the stock price, over the same period. It should be visually clear that the stock price series is non-stationary, whereas the return series appears stationary; there are, however, statistical tests that can be employed to establish this formally. A typical way to convert a non-stationary series to a stationary one is to take consecutive differences of the values. In a sense, computing the percentage change in the stock price does just that.
We next discuss some examples of time series processes involving multiple time series. In dealing with economic variables, the value of one variable is often related not only to its own past values but also to past values of other variables. For instance, household consumption expenditures may depend on variables such as income, interest rates, and investment expenditures. If all these variables are related to consumption expenditures, it makes sense to use their possible additional information content in forecasting consumption expenditures. Consider a possible relation between the growth rate of GNP, money demand (M2) and an interest rate (IR). The following VAR(2) model may express the relationship between these variables:
$$\begin{bmatrix} GNP_t \\ M2_t \\ IR_t \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0.7 & 0.1 & 0 \\ 0 & 0.4 & 0.1 \\ 0.9 & 0 & 0.8 \end{bmatrix} \begin{bmatrix} GNP_{t-1} \\ M2_{t-1} \\ IR_{t-1} \end{bmatrix} + \begin{bmatrix} -0.2 & 0 & 0 \\ 0 & 0.1 & 0.1 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} GNP_{t-2} \\ M2_{t-2} \\ IR_{t-2} \end{bmatrix} + \begin{bmatrix} w_{1t} \\ w_{2t} \\ w_{3t} \end{bmatrix}, \qquad (8.15)$$
where the noise sources may have the following covariance structure:
$$\Sigma_w = \begin{bmatrix} 0.26 & 0.03 & 0 \\ 0.03 & 0.09 & 0 \\ 0 & 0 & 0.81 \end{bmatrix}. \qquad (8.16)$$
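A quick numerical check of the example VAR(2) in (8.15)-(8.16) is sketched below. It stacks the system into its companion (VAR(1)) form, checks stability (all eigenvalues inside the unit circle), computes the implied unconditional mean $(I - A_1 - A_2)^{-1}\nu$, where ν, $A_1$ and $A_2$ are our labels for the intercept vector and the two coefficient matrices, and simulates a path with Gaussian noise having the covariance matrix (8.16). The Gaussian noise and the burn-in length are assumptions of the example.

```python
import numpy as np

nu = np.array([2.0, 1.0, 0.0])                   # intercepts for (GNP, M2, IR)
A1 = np.array([[0.7, 0.1, 0.0],
               [0.0, 0.4, 0.1],
               [0.9, 0.0, 0.8]])
A2 = np.array([[-0.2, 0.0, 0.0],
               [0.0, 0.1, 0.1],
               [0.0, 0.0, 0.0]])
Sigma_w = np.array([[0.26, 0.03, 0.0],
                    [0.03, 0.09, 0.0],
                    [0.0, 0.0, 0.81]])
k = len(nu)

# Companion form: [y_t; y_{t-1}] = F [y_{t-1}; y_{t-2}] + [nu + w_t; 0]
F = np.block([[A1, A2], [np.eye(k), np.zeros((k, k))]])
max_modulus = float(np.max(np.abs(np.linalg.eigvals(F))))  # < 1 means stable

# For a stable VAR(2) the unconditional mean solves (I - A1 - A2) mu = nu
mu = np.linalg.solve(np.eye(k) - A1 - A2, nu)


def simulate_var2(n, seed=0, burn=500):
    """Simulate (8.15) with w_t ~ N(0, Sigma_w), starting from the mean mu."""
    rng = np.random.default_rng(seed)
    w = rng.multivariate_normal(np.zeros(k), Sigma_w, size=n + burn)
    y = np.tile(mu, (n + burn, 1))
    for t in range(2, n + burn):
        y[t] = nu + A1 @ y[t - 1] + A2 @ y[t - 2] + w[t]
    return y[burn:]


print(round(max_modulus, 3), mu)
path = simulate_var2(1000)
```

For these coefficients the largest eigenvalue modulus is about 0.91, so the system is stable, and the unconditional mean works out to (6.875, 14.375, 30.9375).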
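As with the ARMA model for univariate series, the structure of a VARMA(2,1) model for a bivariate system has the following appearance:
$$\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} 0.3 \\ 0.5 \end{bmatrix} + \begin{bmatrix} 0.5 & 0.1 \\ 0.4 & 0.5 \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0.25 & 0 \end{bmatrix} \begin{bmatrix} y_{1,t-2} \\ y_{2,t-2} \end{bmatrix} + \begin{bmatrix} w_{1t} \\ w_{2t} \end{bmatrix} + \begin{bmatrix} 0.6 & 0.2 \\ 0 & 0.3 \end{bmatrix} \begin{bmatrix} w_{1,t-1} \\ w_{2,t-1} \end{bmatrix}. \qquad (8.17)$$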
8.4 Recursive Least Squares
An estimator of the parameter vector in the classical regression model is computed from the whole of the given sample. Referring to equation (8.3), it is clear that such an estimate requires the inversion of a product matrix. Suppose that, instead of all n observations being available at the same time, we have access to data up to time t, and we compute an estimate of the parameters using that available series. We then need a mechanism to update the estimate using newly available data. There are some advantages to generating regression estimates this way. For example, (a) it enables us to track changes in the parameters over time, and (b) it reduces the computational task by not requiring large matrix inversions. This section outlines this procedure, which will help in understanding the nature of state-space models (discussed in the next section) in a more dynamic setting. The regression estimator based on the first t observations may be written as in equation (8.3), but using only data up to time t:
$$\hat\beta_t = (Z_t'Z_t)^{-1}Z_t'X_t. \qquad (8.18)$$
Once the (t+1)-th observation becomes available, $\hat\beta_{t+1}$ may be obtained as
$$\hat\beta_{t+1} = \hat\beta_t + K_{t+1}\big(x_{t+1} - z_{t+1}'\hat\beta_t\big). \qquad (8.19)$$
Note that the expression in parentheses in the above equation is the forecast error based on the regression estimates of the parameters, $\hat\beta_t$. This is referred to as the recursive residual, and when multiplied by the gain vector $K_{t+1}$, it represents the necessary adjustment to the parameter estimates. We can compute the gain vector as
$$K_{t+1} = (Z_{t+1}'Z_{t+1})^{-1}z_{t+1}. \qquad (8.20)$$
When we have processed all n observations, the final estimate of the parameters will be equal to the usual regression estimate discussed earlier. Now let us simplify equation (8.20) so that we do not need to invert the product matrix as each observation becomes available. We adopt $P_t = (Z_t'Z_t)^{-1}$ to simplify the exposition. We will use the following matrix inversion lemma (Maybeck 1979): for any three compatible matrices,
$$(A + BC)^{-1} = A^{-1} - A^{-1}B(I + CA^{-1}B)^{-1}CA^{-1}. \qquad (8.21)$$
For our problem, we identify $A = Z_t'Z_t$, $B = z_{t+1}$, $C = B'$; then
$$P_{t+1} = P_t - \frac{P_t z_{t+1} z_{t+1}' P_t}{1 + z_{t+1}' P_t z_{t+1}}. \qquad (8.22)$$
Note that the denominator in the above equation is a scalar, so no matrix inversion is required. As each new observation becomes available, we update the estimate of the parameter vector via equation (8.19) and, at the same time, update $P_t$ via equation (8.22) efficiently, without the need for any additional inversion. Also, we can use the final value of $P_t$ to obtain the covariance of the parameter estimates as given in equation (8.5). Harvey (1990) gives additional information about the usefulness of the recursive residuals in model identification.
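The recursions (8.18)-(8.22) can be sketched in Python with NumPy as below. This is a minimal illustration, not the author's GAUSS code: the function name `rls`, the use of the first `n_init` rows for the starting estimate, and the simulated data are assumptions made for the example. The final check illustrates the statement above that, once all n observations are processed, the recursive estimate coincides with the full-sample regression estimate, and that $P_t$ equals $(Z'Z)^{-1}$ without any further inversion.

```python
import numpy as np


def rls(Z, x, n_init):
    """Recursive least squares following (8.18)-(8.22).

    Z: (n, k) regressor matrix; x: (n,) dependent variable.
    The first n_init rows give the starting estimate (8.18); the remaining
    rows are absorbed one at a time without any further matrix inversion.
    Returns the final estimate, the final P_t and the recursive residuals.
    """
    Z0, x0 = Z[:n_init], x[:n_init]
    P = np.linalg.inv(Z0.T @ Z0)          # P_t = (Z_t' Z_t)^{-1}: the only inversion
    beta = P @ Z0.T @ x0                  # (8.18)
    residuals = []
    for z_new, x_new in zip(Z[n_init:], x[n_init:]):
        Pz = P @ z_new
        denom = 1.0 + z_new @ Pz          # scalar denominator of (8.22)
        P = P - np.outer(Pz, Pz) / denom  # (8.22); uses the symmetry of P
        K = P @ z_new                     # gain (8.20): (Z_{t+1}' Z_{t+1})^{-1} z_{t+1}
        e = x_new - z_new @ beta          # recursive residual
        beta = beta + K * e               # (8.19)
        residuals.append(e)
    return beta, P, np.array(residuals)


# Usage on simulated data (hypothetical coefficients 1.0, 0.5, -2.0).
rng = np.random.default_rng(0)
n, k = 200, 3
Z = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
x = Z @ np.array([1.0, 0.5, -2.0]) + 0.1 * rng.normal(size=n)

beta_rls, P_rls, rec_resid = rls(Z, x, n_init=10)
beta_ols = np.linalg.lstsq(Z, x, rcond=None)[0]
print(np.allclose(beta_rls, beta_ols))              # True
print(np.allclose(P_rls, np.linalg.inv(Z.T @ Z)))   # True
```

The recursive residuals returned in `rec_resid` are the quantities Harvey (1990) uses for model identification.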
8.5 State-Space Representation

The SSM in its basic form retains a VAR(1) structure for the state equation,

$$y_t = \Gamma y_{t-1} + w_t, \tag{8.23}$$
where the state equation determines the rule for generating the states $y_{i,t}$ from the past states $y_{j,t-1}$, $j = 1, 2, \ldots, p$, for $i = 1, 2, \ldots, p$ and time points $t = 1, 2, \ldots, n$. For completeness, we assume that the $w_t$ are $p \times 1$ independent and identically distributed zero-mean normal vectors with covariance $Q$. The state process is assumed to have started with the initial value given by the vector $y_0$, drawn from a normal distribution with mean vector $\mu_0$ and $p \times p$ covariance matrix $\Sigma_0$. The state vector itself is not observed; instead, a linear transformation of it is observed with additive noise. Thus, the measurement equation is given by,

$$z_t = A_t y_t + v_t. \tag{8.24}$$
In this sense, the $q \times 1$ vector $z_t$ is observed through the $q \times p$ measurement matrix $A_t$ together with the $q \times 1$ Gaussian white noise $v_t$, with covariance matrix $R$. We also assume that the two noise sources in the state and the measurement equations are uncorrelated. The model arose originally in the space-tracking area, where the state equation defines the equations of motion for the position of a spacecraft with location $y_t$, and $z_t$ reflects information that can be observed from a tracking device, such as velocity and height. The next step is to make use of the Gaussian assumptions and produce estimates of the underlying unobserved state vector given the measurements up to a particular point in time. In other words, we would like to find $y_{t|t-1} = E[y_t \mid z_{t-1}, z_{t-2}, \ldots, z_1]$ and the covariance matrix $P_{t|t-1} = E[(y_t - y_{t|t-1})(y_t - y_{t|t-1})']$. This is achieved by using the Kalman filter, and the basic system of equations is described below. Given the initial conditions $y_{0|0} = \mu_0$, $P_{0|0} = \Sigma_0$, and observations made at times $1, 2, 3, \ldots, T$,

$$y_{t|t-1} = \Gamma y_{t-1|t-1}, \tag{8.25}$$
$$P_{t|t-1} = \Gamma P_{t-1|t-1} \Gamma' + Q, \tag{8.26}$$
$$y_{t|t} = y_{t|t-1} + K_t\left(z_t - A_t y_{t|t-1}\right), \tag{8.27}$$
where the Kalman gain matrix is

$$K_t = P_{t|t-1} A_t' \left[A_t P_{t|t-1} A_t' + R\right]^{-1}, \tag{8.28}$$
and the covariance matrix $P_{t|t}$ after the $t$th measurement has been made is

$$P_{t|t} = \left[I - K_t A_t\right] P_{t|t-1}. \tag{8.29}$$
Equation (8.25) forecasts the state vector for the next period given the current state vector. Using this one-step-ahead forecast of the state vector, it is possible to define the innovation vector as

$$\nu_t = z_t - A_t y_{t|t-1}, \tag{8.30}$$

and its covariance as

$$\Sigma_t = A_t P_{t|t-1} A_t' + R. \tag{8.31}$$
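A compact NumPy sketch of one filter pass, equations (8.25)-(8.31), is given below. It is an illustration under stated assumptions rather than the GAUSS implementation used in the book: the function name and argument layout are hypothetical, the measurement matrix is held fixed over time, and the inverse of the innovation covariance is formed explicitly for readability.

```python
import numpy as np


def kalman_filter(z, Gamma, A, Q, R, mu0, Sigma0):
    """Kalman filter (8.25)-(8.31) for y_t = Gamma y_{t-1} + w_t, z_t = A y_t + v_t.

    z: (T, q) observations. A is taken as time invariant here for brevity.
    Returns filtered states y_{t|t}, covariances P_{t|t}, one-step predictions
    y_{t|t-1}, P_{t|t-1}, innovations nu_t and their covariances Sigma_t.
    """
    T, q = z.shape
    p = Gamma.shape[0]
    y_filt, P_filt = np.zeros((T, p)), np.zeros((T, p, p))
    y_pred, P_pred = np.zeros((T, p)), np.zeros((T, p, p))
    nu, Sig = np.zeros((T, q)), np.zeros((T, q, q))
    y, P = np.asarray(mu0, float), np.asarray(Sigma0, float)   # y_{0|0}, P_{0|0}
    for t in range(T):
        yp = Gamma @ y                           # (8.25)
        Pp = Gamma @ P @ Gamma.T + Q             # (8.26)
        nu_t = z[t] - A @ yp                     # innovation (8.30)
        S_t = A @ Pp @ A.T + R                   # its covariance (8.31)
        K = Pp @ A.T @ np.linalg.inv(S_t)        # Kalman gain (8.28)
        y = yp + K @ nu_t                        # (8.27)
        P = (np.eye(p) - K @ A) @ Pp             # (8.29)
        y_pred[t], P_pred[t], nu[t], Sig[t] = yp, Pp, nu_t, S_t
        y_filt[t], P_filt[t] = y, P
    return y_filt, P_filt, y_pred, P_pred, nu, Sig


# Usage: a constant level observed with noise (Gamma = 1, Q = 0) and a
# near-diffuse prior; the last filtered level should equal the sample mean.
rng = np.random.default_rng(1)
z = 3.0 + rng.normal(size=(50, 1))
one = np.eye(1)
out = kalman_filter(z, one, one, Q=np.zeros((1, 1)), R=one,
                    mu0=np.zeros(1), Sigma0=1e8 * one)
print(abs(out[0][-1, 0] - z.mean()) < 1e-6)   # True
```

With $Q = 0$ and a very large $\Sigma_0$, the filter reproduces the Bayesian posterior mean of a constant level, which is essentially the sample mean; this makes a convenient correctness check.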
Since in finance and economic applications all the observations are available, it is possible to improve the estimates of the state vector based upon the whole sample. This is referred to as the Kalman smoother, and it starts with initial conditions at the last measurement point, i.e. $y_{T|T}$ and $P_{T|T}$. The following set of equations describes the smoother algorithm:
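$$y_{t-1|T} = y_{t-1|t-1} + J_{t-1}\left(y_{t|T} - y_{t|t-1}\right), \tag{8.32}$$

$$P_{t-1|T} = P_{t-1|t-1} + J_{t-1}\left(P_{t|T} - P_{t|t-1}\right) J_{t-1}', \tag{8.33}$$

where

$$J_{t-1} = P_{t-1|t-1} \Gamma' \left[P_{t|t-1}\right]^{-1}. \tag{8.34}$$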
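The backward recursion (8.32)-(8.34) can be sketched as follows. To keep the block self-contained it repeats a compact filter pass that stores the quantities the smoother needs; the function names `filter_pass` and `rts_smoother` and the simulated local-level data are illustrative assumptions, not part of the text.

```python
import numpy as np


def filter_pass(z, Gamma, A, Q, R, mu0, Sigma0):
    """Kalman filter (8.25)-(8.29); stores y_{t|t}, P_{t|t}, y_{t|t-1}, P_{t|t-1}."""
    T, p = z.shape[0], Gamma.shape[0]
    yf, Pf = np.zeros((T, p)), np.zeros((T, p, p))
    yp, Pp = np.zeros((T, p)), np.zeros((T, p, p))
    y, P = mu0, Sigma0
    for t in range(T):
        yp[t] = Gamma @ y
        Pp[t] = Gamma @ P @ Gamma.T + Q
        K = Pp[t] @ A.T @ np.linalg.inv(A @ Pp[t] @ A.T + R)
        y = yp[t] + K @ (z[t] - A @ yp[t])
        P = (np.eye(p) - K @ A) @ Pp[t]
        yf[t], Pf[t] = y, P
    return yf, Pf, yp, Pp


def rts_smoother(yf, Pf, yp, Pp, Gamma):
    """Backward pass (8.32)-(8.34), starting from y_{T|T} and P_{T|T}.

    Row t (0-based) stores time t+1, so yp[t] is the prediction for that time.
    """
    ys, Ps = yf.copy(), Pf.copy()
    for t in range(len(yf) - 1, 0, -1):
        J = Pf[t - 1] @ Gamma.T @ np.linalg.inv(Pp[t])        # (8.34)
        ys[t - 1] = yf[t - 1] + J @ (ys[t] - yp[t])           # (8.32)
        Ps[t - 1] = Pf[t - 1] + J @ (Ps[t] - Pp[t]) @ J.T     # (8.33)
    return ys, Ps


# Usage: a random-walk level observed with noise (hypothetical data).
rng = np.random.default_rng(2)
T = 100
level = np.cumsum(0.3 * rng.normal(size=T))
z = (level + rng.normal(size=T)).reshape(T, 1)
I1 = np.eye(1)
yf, Pf, yp, Pp = filter_pass(z, I1, I1, 0.09 * I1, I1, np.zeros(1), 10.0 * I1)
ys, Ps = rts_smoother(yf, Pf, yp, Pp, I1)
# Using the whole sample can only reduce the state variance.
print(np.all(Ps[:, 0, 0] <= Pf[:, 0, 0] + 1e-12))   # True
print(np.isclose(ys[-1, 0], yf[-1, 0]))             # True: starts at y_{T|T}
```

The smoother only reads stored filter output, which is why the filtered quantities must be kept during the forward pass.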
It should be clear from equations (8.32)-(8.34) that, to implement the smoothing algorithm, the quantities $y_{t|t}$ and $P_{t|t}$ generated during the filter pass must be stored. It is worth pointing out here that in the EViews implementation of ARMA models in the state-space framework, the measurement error is constrained to be zero; in other words, $R = 0$ for such models. The description of the filtering and smoothing algorithms above assumes that the system parameters are known. In fact, we want to determine these parameters, and this is achieved by maximizing the innovation form of the likelihood function. The one-step-ahead innovation and its covariance matrix are defined by equations (8.30) and (8.31), and since the innovations are assumed to be independent and conditionally Gaussian, the log likelihood function (without the constant term, and multiplied by 2, which does not change the maximizing parameter values) is given by,
$$\log L(\Theta) = -\sum_{t=1}^{T} \log\left|\Sigma_t(\Theta)\right| - \sum_{t=1}^{T} \nu_t(\Theta)'\, \Sigma_t^{-1}(\Theta)\, \nu_t(\Theta). \tag{8.35}$$
In this expression, $\Theta$ is used specifically to emphasize the dependence of the log likelihood function on the parameters of the model. Once the function is maximized with respect to the parameters of the model, the smoothing step can start using those estimated parameters. Different numerical approaches may be taken to carry out the maximization of the log likelihood function. The computational complexity and other numerical issues are beyond the scope of this book; however, some intuition on these matters will be given in a later chapter. To summarize this adaptive algorithm, we have given a schematic in Fig. 8.2. This should help clarify the flow of the filter process as observations are processed sequentially.
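To make (8.35) concrete, the sketch below accumulates its two sums during a filter pass for a scalar AR(1) state observed with noise, and then maximizes it by a crude grid search over the autoregressive parameter, with the variances held at their true values. The model, the stationary initialization of $\mu_0$ and $\Sigma_0$, and the grid search are illustrative assumptions; in practice a numerical optimizer would be used instead of a grid.

```python
import numpy as np


def loglik_ar1_plus_noise(z, phi, q, r):
    """Log likelihood in the form (8.35) for the scalar model
    y_t = phi*y_{t-1} + w_t, z_t = y_t + v_t  (Gamma = phi, A = 1).
    The state starts from its stationary distribution (requires |phi| < 1).
    """
    y, P = 0.0, q / (1.0 - phi ** 2)     # assumed mu_0 and Sigma_0
    ll = 0.0
    for zt in z:
        yp = phi * y                     # (8.25)
        Pp = phi ** 2 * P + q            # (8.26)
        nu = zt - yp                     # innovation (8.30)
        S = Pp + r                       # its variance (8.31)
        ll -= np.log(S) + nu ** 2 / S    # one term of each sum in (8.35)
        K = Pp / S                       # (8.28)
        y = yp + K * nu                  # (8.27)
        P = (1.0 - K) * Pp               # (8.29)
    return ll


# Simulated data with a hypothetical true phi = 0.8.
rng = np.random.default_rng(3)
n, phi_true, q, r = 2000, 0.8, 1.0, 0.5
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal(scale=np.sqrt(q))
z = y + rng.normal(scale=np.sqrt(r), size=n)

grid = np.arange(0.50, 0.96, 0.01)
values = [loglik_ar1_plus_noise(z, phi, q, r) for phi in grid]
phi_hat = grid[int(np.argmax(values))]
print(round(phi_hat, 2))
```

The maximizing value should lie close to the true 0.8; the same function of $\Theta$ is what a numerical optimizer would be handed in a real application.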
8.6 Examples and Exercises

Example 8.1: State-Space Representation of a VARMA(2,1) Model

Consider the example given in equation (8.17) and assume that the series have been adjusted for their means. This implies that the constant vector would be zero. In order to put this model in state-space form, we need to define the system matrices as follows:

$$z_t = \begin{bmatrix} x_{1,t} \\ x_{2,t} \end{bmatrix}, \qquad y_t = \begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{1,t-1} \\ x_{2,t-1} \\ w_{1,t} \\ w_{2,t} \end{bmatrix}, \tag{8.36}$$

where $y_t$ is the state vector and $z_t$ is the measurement vector in this setup. The state transition matrix and the state noise vector may be defined as

$$\Gamma = \begin{bmatrix} 0.5 & 0.1 & 0 & 0 & 0.6 & 0.2 \\ 0.4 & 0.5 & 0.25 & 0 & 0 & 0.3 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \qquad w_t = \begin{bmatrix} w_{1,t} \\ w_{2,t} \\ 0 \\ 0 \\ w_{1,t} \\ w_{2,t} \end{bmatrix}. \tag{8.37}$$
It is now clear that the measurement equation would not have any noise term and the measurement matrix is,
$$A = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix}. \tag{8.38}$$

Fig. 8.2 Filter schematic: starting from $y_{t|t}$, $P_{t|t}$, the state dynamics give $y_{t+1|t}$, $P_{t+1|t}$; the prediction error $\nu_{t+1}$ is accumulated into the prediction error form of the likelihood function; the updating equations give $y_{t+1|t+1}$, $P_{t+1|t+1}$, and the cycle repeats for $t+2$.
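The construction in (8.36)-(8.38) can be checked numerically: with the same shocks, the state recursion $y_t = \Gamma y_{t-1} + w_t$ followed by $z_t = A y_t$ should reproduce the VARMA(2,1) recursion exactly. The sketch below assumes that (8.17) has the form $x_t = \Phi_1 x_{t-1} + \Phi_2 x_{t-2} + w_t + \Theta_1 w_{t-1}$ (equation (8.17) is not reproduced in this section) and reads $\Phi_1$, $\Phi_2$ and $\Theta_1$ off the first two rows of $\Gamma$ in (8.37); the names `Phi1`, `Phi2`, `Theta1` and `G` are hypothetical.

```python
import numpy as np

# Coefficient blocks read from the first two rows of Gamma in (8.37).
Phi1 = np.array([[0.5, 0.1], [0.4, 0.5]])
Phi2 = np.array([[0.0, 0.0], [0.25, 0.0]])
Theta1 = np.array([[0.6, 0.2], [0.0, 0.3]])


def varma21_state_space(Phi1, Phi2, Theta1):
    """State y_t = [x_t; x_{t-1}; w_t] with transition Gamma, noise G w_t, z_t = A y_t."""
    k = Phi1.shape[0]
    I, O = np.eye(k), np.zeros((k, k))
    Gamma = np.block([[Phi1, Phi2, Theta1],
                      [I,    O,    O],
                      [O,    O,    O]])
    A = np.hstack([I, O, O])      # picks x_t out of the state; no measurement noise
    G = np.vstack([I, O, I])      # w_t enters rows 1-2 and rows 5-6 of the state
    return Gamma, A, G


Gamma, A, G = varma21_state_space(Phi1, Phi2, Theta1)
rng = np.random.default_rng(42)
T = 50
w = rng.normal(size=(T, 2))

# Direct VARMA(2,1) recursion with zero pre-sample values.
x = np.zeros((T, 2))
for t in range(T):
    x1 = x[t - 1] if t >= 1 else np.zeros(2)
    x2 = x[t - 2] if t >= 2 else np.zeros(2)
    w1 = w[t - 1] if t >= 1 else np.zeros(2)
    x[t] = Phi1 @ x1 + Phi2 @ x2 + w[t] + Theta1 @ w1

# State-space recursion driven by the same shocks.
y = np.zeros(6)
z = np.zeros((T, 2))
for t in range(T):
    y = Gamma @ y + G @ w[t]
    z[t] = A @ y
print(np.allclose(x, z))   # True
```

Because the measurement equation carries no noise, $R = 0$ here, which matches the EViews convention for ARMA models mentioned earlier.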
If the matrix elements were not known and had to be estimated, this state-space representation of the model could be estimated by the maximum likelihood method discussed earlier. For several other examples, see Lutkepohl (1993).

Example 8.2: Signal Extraction

Consider the quarterly earnings of Johnson & Johnson (data obtained from Shumway and Stoffer (2000)) shown in Fig. 8.3. It seems likely that the series has a trend component and, superimposed on that, seasonal (quarterly) variations in earnings. Our aim in this example is to apply the state-space approach to extract the seasonal signal as well as the trend component of earnings. The upward bend in the curve could not be removed by a functional transformation of the series, e.g. taking logarithms and/or square or cube roots. We therefore model the series in its original form and represent it as

$$z_t = T_t + S_t + v_t, \tag{8.39}$$
where $T_t$ is the trend component and $S_t$ is the seasonal component. We let the trend component grow exponentially; in other words,

$$T_t = \phi T_{t-1} + w_{1,t}, \tag{8.40}$$
where $\phi > 1$. We assume that the seasonal (quarterly) components are such that over the year they sum to zero, up to white noise. Thus,

$$S_t + S_{t-1} + S_{t-2} + S_{t-3} = w_{2,t}. \tag{8.41}$$
Fig. 8.3 Quarterly earnings
Table 8.1 Trend and seasonal component model of quarterly earnings

Parameter     Estimate     Std Error
$\phi$        1.03508      0.14767
$r_{11}$      1.23e-10     1.82e-11
$q_{11}$      0.01971      0.00291
$q_{22}$      0.04927      0.00727
Equations (8.39)-(8.41) may now be put into state-space form with the following definitions:
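$$z_t = \begin{bmatrix} 1 & 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} T_t \\ S_t \\ S_{t-1} \\ S_{t-2} \end{bmatrix} + v_t, \qquad v_t \sim N(0, r_{11}). \tag{8.42}$$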
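The state equation may be written as

$$\begin{bmatrix} T_t \\ S_t \\ S_{t-1} \\ S_{t-2} \end{bmatrix} = \begin{bmatrix} \phi & 0 & 0 & 0 \\ 0 & -1 & -1 & -1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} T_{t-1} \\ S_{t-1} \\ S_{t-2} \\ S_{t-3} \end{bmatrix} + \begin{bmatrix} w_{1,t} \\ w_{2,t} \\ 0 \\ 0 \end{bmatrix}, \tag{8.43}$$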
and the state noise covariance matrix is

$$Q = \begin{bmatrix} q_{11} & 0 & 0 & 0 \\ 0 & q_{22} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \tag{8.44}$$
This structure of the state noise covariance matrix implies that the noise sources for the trend and the seasonal components are independent of each other. In this problem we have to estimate the four parameters $\{\phi, r_{11}, q_{11}, q_{22}\}$. In addition, the estimation algorithm defined earlier provides the smoothed estimates of the states, i.e. the trend and the seasonal components. We used the maximum likelihood estimation method with a numerical optimization procedure in GAUSS, and Table 8.1 summarizes the parameters of the model along with their standard errors. We also include two graphs (Fig. 8.4 and Fig. 8.5). The first one shows the original series with the trend component superimposed; note how well the trend line tracks the actual earnings series. The second graph shows the seasonal component only; it is clear that the seasonal component fluctuates more toward the end of the sample period.
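A minimal NumPy sketch of the system matrices (8.42)-(8.44), with a short simulation confirming that the second row of $\Gamma$ enforces the seasonal restriction (8.41). The parameter values are illustrative, chosen to be of the same order as the estimates in Table 8.1; the starting state and the function name are hypothetical. The matrices could be passed to any Kalman filter and smoother implementing (8.25)-(8.34) to obtain the smoothed trend and seasonal components.

```python
import numpy as np


def trend_seasonal_system(phi, r11, q11, q22):
    """System matrices (8.42)-(8.44); state = [T_t, S_t, S_{t-1}, S_{t-2}]'."""
    Gamma = np.array([[phi, 0.0, 0.0, 0.0],
                      [0.0, -1.0, -1.0, -1.0],   # S_t = -(S_{t-1}+S_{t-2}+S_{t-3}) + w_2t
                      [0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])
    A = np.array([[1.0, 1.0, 0.0, 0.0]])         # earnings = trend + season (+ noise)
    Q = np.diag([q11, q22, 0.0, 0.0])
    R = np.array([[r11]])
    return Gamma, A, Q, R


# Illustrative values of the same order as Table 8.1.
Gamma, A, Q, R = trend_seasonal_system(phi=1.035, r11=1e-10, q11=0.0197, q22=0.0493)

rng = np.random.default_rng(4)
T = 84
states = np.zeros((T, 4))
w2 = np.zeros(T)
y = np.array([0.5, 0.0, 0.0, 0.0])                # hypothetical starting trend level
for t in range(T):
    w = np.sqrt(np.diag(Q)) * rng.normal(size=4)  # last two rows carry no noise
    y = Gamma @ y + w
    states[t], w2[t] = y, w[1]
z = states @ A.T + np.sqrt(R[0, 0]) * rng.normal(size=(T, 1))

# Four consecutive seasonal effects sum to the white noise w_2t, as in (8.41).
S = states[:, 1]
season_sums = S[3:] + S[2:-1] + S[1:-2] + S[:-3]
print(np.allclose(season_sums, w2[3:]))   # True
```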
Fig. 8.4 Earnings: actual and estimated trend (series: Earnings, Trend)

Fig. 8.5 Estimated seasonal earnings
Example 8.3: Coincident Indicator

In the classical factor analysis approach, it is assumed that a set of $q$ observed variables $z_t$ depends linearly on $p < q$ unobserved common factors $f_t$ and possibly on individual factors $u_t$. This structure may be expressed as

$$z_t = L f_t + u_t, \tag{8.45}$$
where the matrix $L$ is referred to as the factor loading matrix. The main objective of such an approach is to infer the unobserved factors as well as the factor loading matrix for practical use. The state-space modeling framework is very useful in dealing with this situation. In that context, equation (8.45) may be viewed as the measurement equation, and we need to define the dynamics of the unobserved factors. This is where the flexibility of the framework is most appreciated. The exact nature of the system dynamics will, however, depend upon the economic background in which the model is proposed. There have been several applications where the factors are assumed to have ARCH effects, and the parameters of the ARCH process might in turn depend upon a hidden Markov process. A good reference for such details is Khabie-Zeitoune et al. (2000). Another interesting application is given in Chauvet and Potter (2000). The authors attempt to build a coincident financial indicator from a set of four macroeconomic variables and explore the usefulness of the extracted indicator in forecasting financial market conditions. These authors also allow the dynamics of the latent factor (the coincident indicator) to be influenced by a hidden Markov process that switches states according to some transition probability matrix. To estimate such a model, the authors use the state-space framework with the added complexity of the driving hidden Markov process. In this example we refrain from that extra complexity; to gain insight into the versatility of state-space models, we use a simpler factor structure and attempt to show its usefulness.

The dynamic factor approach in Chauvet and Potter (2000) starts with a preliminary analysis of four variables that reflect public information about the state of the financial market. These are the excess return on the market index, the price/earnings ratio at the index level, the short-term interest rate, and a proxy variable for market volatility.
The squared excess return is used for this proxy, but several other choices are possible. The authors suggest that these variables should be transformed to make them stationary. When applying this model to the Australian data, we need to take the first differences of the price/earnings ratio and the short-term interest rate to make these two variables stationary. We use symbols consistent with our description of the state-space model earlier. We assume a straightforward dynamic for the unobserved factor (representing the state of the financial market):
$$y_t = \phi_0 + \phi_1 y_{t-1} + w_t, \qquad w_t \sim N(0, \sigma^2). \tag{8.46}$$
The observation vector of the four variables identified above is assumed to have been generated by contributions from this unobserved factor with varying degrees of sensitivity. Any unexplained part is captured by individual noise terms, each with its own variance, and these noise terms are uncorrelated. This leads us to define the measurement equation as follows:

$$\begin{bmatrix} z_{1,t} \\ z_{2,t} \\ z_{3,t} \\ z_{4,t} \end{bmatrix} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{bmatrix} y_t + \begin{bmatrix} v_{1,t} \\ v_{2,t} \\ v_{3,t} \\ v_{4,t} \end{bmatrix}, \qquad v_{i,t} \sim N(0, \sigma_i^2). \tag{8.47}$$
In the above measurement equation the observed variables, z-^ (i = 1, 2, 3, 4) represent excess retum, market volatility, change in price/eaming ratio and change in short-term interest rate, in that order. The parameters ßj's measure the sensitivity of the individual observed variable with the unobserved financial market coincident indicator. The state-space model given by the equations (8.46) and (8.47) can now be estimated given the observations of the four variables in the measurement equation. This will be achieved by numerically maximizing the log likelihood function discussed earlier. The parameters of interest are, ((|)o,(|)i,a^,ßi,ß2,ß3,ß4,af,a2,a3,04) and at the same time we would be able to infer about the unobserved component i.e. financial market coincident indicator. We apply this to the Australian monthly data covering the period August 1983 to June 2001. This is implemented using a GAUSS program and below we just summarize the correlations between the inferred coincident indicator and each of the four observed variables used in the model.
Table 8.2 Correlations between coincident indicator and others (columns include Excess Return, Market Volatility and Change in P/E)
These correlations compare very well with those obtained for the U.S. market by Chauvet and Potter (2000), although those authors use a more elaborate setup that includes a hidden Markov process driving both the mean and the variance of the unobserved component. Our aim has been to demonstrate the usefulness of the state-space approach for factor analytic models; those interested should refer to the paper by Chauvet and Potter (2000), where the authors use the coincident indicator to forecast the state of the financial markets out of sample. It is conceivable that such an approach could be extended to constructing a leading indicator, thus making it useful for portfolio allocation.

Exercise 8.1: AR Model and State-Space Form (EViews)

The objective in this exercise is to appreciate the modeling flexibility offered by the state-space approach. In this exercise, you model the real dividend data from the aggregate US equity market covering the period January 1951 to December 1998. As a straightforward application of a time series model, first apply an AR(3) structure to these data and estimate the parameters. Next, put the AR(3) model in state-space form and estimate its parameters. Compare the estimation results.

Exercise 8.2: Time Varying Beta (EViews)

The market model of equity returns is adopted for empirical work in the CAPM framework. The systematic risk of the equity portfolio is captured by beta. Although in many cases this quantity is considered time invariant, many articles treat it as time varying and offer different approaches to estimating such a time-varying beta. In this exercise, you treat beta as an unobserved state variable and apply the state-space methodology to estimate it from the observed equity returns. The data contain banking sector returns from Japan, and the Nikkei index return is the proxy for the market portfolio. In order to understand the difference between a constant beta and a time-varying beta, you should also estimate beta using the simple linear regression approach.
Exercise 8.3: Stochastic Regression (EViews)

This exercise explores the relationship between the interest rate and inflation. The three-month interest rate, $y_t$, and the quarterly inflation rate, $z_t$, from the second quarter of 1976 to the first quarter of 2001 (for Australia) are to be analyzed using the stochastic regression equation given below:

$$y_t = \alpha + \beta_t z_t + v_t, \tag{8.48}$$
where $\alpha$ is a fixed constant, $\beta_t$ is a stochastic regression coefficient, and $v_t$ is white noise with variance $\sigma_v^2$. Your task is to formulate this as a state-space problem where the variable $\beta_t$ is assumed to have a constant mean, $b$, and the associated error term has variance $\sigma_w^2$. In other words, $\beta_t$ may be expressed as

$$(\beta_t - b) = \phi(\beta_{t-1} - b) + w_t. \tag{8.49}$$
How would you forecast next quarter's three-month interest rate from this state-space model? Also, comment on how well this model performs for the Australian market.
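One possible approach to the forecasting question, sketched under explicit assumptions: the parameters $\alpha$, $b$, $\phi$, $\sigma_v^2$ and $\sigma_w^2$ are treated as already estimated, and the inflation rate $z_{T+1}$ for the forecast quarter is assumed to be available (in practice it would itself have to be forecast). Because $\beta_t$ is the state, the measurement matrix in (8.24) becomes the scalar $A_t = z_t$, which changes every period. The function names are hypothetical.

```python
import numpy as np


def stochastic_regression_filter(y, z, alpha, b, phi, s2v, s2w, beta0, P0):
    """Scalar Kalman filter for (8.48)-(8.49); the state is beta_t and the
    time-varying measurement 'matrix' is A_t = z_t."""
    beta, P = beta0, P0
    for yt, zt in zip(y, z):
        beta_pred = b + phi * (beta - b)      # state prediction from (8.49)
        P_pred = phi ** 2 * P + s2w
        S = zt ** 2 * P_pred + s2v            # innovation variance
        K = P_pred * zt / S                   # Kalman gain
        beta = beta_pred + K * (yt - alpha - zt * beta_pred)
        P = (1.0 - K * zt) * P_pred
    return beta, P


def forecast_next(beta_T, P_T, z_next, alpha, b, phi, s2v, s2w):
    """One-step forecast of y_{T+1} and its variance, given z_{T+1}."""
    beta_next = b + phi * (beta_T - b)
    y_hat = alpha + beta_next * z_next
    var = z_next ** 2 * (phi ** 2 * P_T + s2w) + s2v
    return y_hat, var


# Sanity check: with phi = 1, no coefficient noise, z_t = 1 and a diffuse
# prior, the filtered coefficient is just the sample mean of y.
y = np.array([2.0, 2.5, 1.5, 3.0, 2.0])
z = np.ones_like(y)
beta_T, P_T = stochastic_regression_filter(y, z, alpha=0.0, b=0.0, phi=1.0,
                                           s2v=1.0, s2w=0.0, beta0=0.0, P0=1e8)
print(round(beta_T, 6))   # 2.2
y_hat, var = forecast_next(beta_T, P_T, z_next=1.0, alpha=0.0, b=0.0,
                           phi=1.0, s2v=1.0, s2w=0.0)
```

The forecast variance combines the uncertainty about $\beta_{T+1}$ with the measurement noise $\sigma_v^2$, which is one way to judge how informative the model is for the Australian data.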
References

Chauvet M, Potter S (2000) Coincident and leading indicators of the stock market. Journal of Empirical Finance, 7: 87-111
Harvey AC (1989) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge
Harvey AC (1990) The econometric analysis of time series. The MIT Press, Cambridge
Kalman RE (1960) A new approach to linear filtering and prediction problems. Journal of Basic Engineering, Transactions ASME, Series D, 82: 35-45
Kalman RE, Bucy RS (1961) New results in linear filtering and prediction theory. Journal of Basic Engineering, Transactions ASME, Series D, 83: 95-108
Khabie-Zeitoune D, Salkin G, Christofides N (2000) Factor GARCH, regime switching and the term structure. Advances in Quantitative Asset Management, vol 1, Kluwer Academic Publishers, Dordrecht
Kim C-J, Nelson CR (1999) State-space models with regime switching: classical and Gibbs-sampling approaches with applications. The MIT Press, Cambridge
Lutkepohl H (1993) Introduction to multiple time series analysis. Springer-Verlag, Berlin
Maybeck PS (1979) Stochastic models, estimation and control, vol 1. Academic Press, New York and London
Shumway RH, Stoffer DS (2000) Time series analysis and its applications. Springer Texts in Statistics, New York
Tanizaki H (1993) Nonlinear filters: estimation and applications. Lecture Notes in Economics and Mathematical Systems, Springer-Verlag, Berlin
Wells C (1996) The Kalman filter in finance. Kluwer Academic Publishers, Dordrecht
9 State-Space Models (II)
9.1 Likelihood Function Maximization

We discussed earlier that, to estimate the unknown parameters of a model cast into state-space form, we need to maximize the prediction error form of the likelihood function. In this section we review some of the intricacies of function optimization and indicate some of the software implementations of these procedures. Further details in this context are normally covered in courses on numerical techniques. In order to focus our attention on the topic, we reproduce the likelihood function developed in the previous chapter; all symbols retain their original meaning:
$$\log L(\Theta) = -\sum_{t=1}^{T} \log\left|\Sigma_t(\Theta)\right| - \sum_{t=1}^{T} \nu_t(\Theta)'\, \Sigma_t^{-1}(\Theta)\, \nu_t(\Theta). \tag{9.1}$$
The state process is assumed to have started with the initial value given by the vector $y_0$, drawn from a normal distribution with mean vector $\mu_0$ and $p \times p$ covariance matrix $\Sigma_0$. In carrying out this optimization we need to specify these prior quantities. When we are modeling stationary series, we may be able to initialize the mean state vector using the sample means of the observation variables or any other knowledge we have about the system. For non-stationary series we usually initialize with the first set of observations. For finance and economic applications we normally initialize the prior covariance matrix with large but finite diagonal elements. This reflects the fact that the information about the prior mean vector is diffuse, i.e. it comes from a widely dispersed distribution. The initial specification of the covariance matrix $\Sigma_0$ is especially important, since the state prediction error covariance $P_{t|t-1}$ is not only dependent on the observation data but is also partially determined by $\Sigma_0$. Under certain conditions it is possible to demonstrate that the system has limited memory, i.e. the value of $\Sigma_0$ is quickly forgotten as more data are processed. It is, therefore, sometimes important to investigate this empirically. Further details about this topic are available in Jazwinski (1970) and Bhar and Chiarella (1997).

Software products like EViews and Excel have a limited number of optimization routines incorporated. On the other hand, in the GAUSS programming environment there are several choices available to suit many applications. Although for most users a detailed understanding of these routines is not a priority, it helps in practice to have some knowledge of the internal workings of these algorithms. Most commercial implementations of optimization routines offer different tuning parameters, which are used to control the progress of the algorithm. The functions we encounter in practice are highly non-linear in the several parameters to be estimated. Therefore, there is no guarantee that a sensible result can be obtained by simply running these programs; the choice of starting values and other tuning parameters is critical for obtaining meaningful results.

Here we briefly describe the logic of one of the most common algorithms, known as the Newton-Raphson technique; the discussion focuses on minimizing a given function. When we are maximizing a function, as in the case of the above likelihood function, we simply define the negative of that function to be minimized. To simplify the exposition, we denote the function to be minimized (the negative of equation (9.1) in our case) simply as $L(\Theta)$, where the parameter vector $\Theta$ has $n$ elements. The algorithm requires gradient information about the objective function. The gradient vector, $g$, is defined as
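$$g = \begin{bmatrix} \dfrac{\partial L}{\partial \theta_1} & \dfrac{\partial L}{\partial \theta_2} & \cdots & \dfrac{\partial L}{\partial \theta_n} \end{bmatrix}'. \tag{9.2}$$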
The $n \times n$ symmetric matrix of second-order partial derivatives of $L$ is known as the Hessian matrix and is denoted by $H$, where
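$$H = \begin{bmatrix} \dfrac{\partial^2 L}{\partial \theta_1^2} & \dfrac{\partial^2 L}{\partial \theta_1 \partial \theta_2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_1 \partial \theta_n} \\ \dfrac{\partial^2 L}{\partial \theta_2 \partial \theta_1} & \dfrac{\partial^2 L}{\partial \theta_2^2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_2 \partial \theta_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 L}{\partial \theta_n \partial \theta_1} & \dfrac{\partial^2 L}{\partial \theta_n \partial \theta_2} & \cdots & \dfrac{\partial^2 L}{\partial \theta_n^2} \end{bmatrix}. \tag{9.3}$$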
Assuming the function $L$ is sufficiently differentiable, we can approximate $L$ near a minimum, for a small change in the parameter vector, as
$$L(\Theta + \Delta\Theta) \approx L(\Theta) + g'\Delta\Theta + \tfrac{1}{2}\, \Delta\Theta' H\, \Delta\Theta. \tag{9.4}$$
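Our objective is to determine the elements of $\Delta\Theta$, i.e. $\Delta\theta_i$, $i = 1, 2, \ldots, n$, so that we can change these elements from their current values in order to move toward a minimum. We can write equation (9.4) in expanded form as

$$L_{\min} = L(\Theta) + \sum_{i=1}^{n} \frac{\partial L}{\partial \theta_i} \Delta\theta_i + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \Delta\theta_i \frac{\partial^2 L}{\partial \theta_i \partial \theta_j} \Delta\theta_j, \tag{9.5}$$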
and, to determine $\Delta\Theta$, we treat the gradient and the Hessian as constant and partially differentiate equation (9.5) with respect to each element $\Delta\theta_j$, for $j$ from 1 to $n$. Setting these results to zero gives
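$$\frac{\partial L}{\partial \theta_j} + \sum_{i=1}^{n} \Delta\theta_i \frac{\partial^2 L}{\partial \theta_i \partial \theta_j} = 0, \qquad j = 1, 2, \ldots, n, \tag{9.6}$$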
as the first-order condition for a minimum. In matrix notation, equation (9.6) is simply

$$g = -H\, \Delta\Theta, \tag{9.7}$$

or

$$\Delta\Theta = -H^{-1} g, \tag{9.8}$$
as the approximation to the required movement to the minimum $\Theta_{\min}$ from a nearby point $\Theta$. As you will notice from the above analysis, the procedure requires computation of the gradient and the Hessian matrix. During the optimization these quantities have to be calculated many times; therefore, much effort has gone into developing faster algorithms to compute them. In practice, it is hardly possible to compute the partial derivatives analytically for complex likelihood functions. In that situation they are computed using numerical methods, e.g. a forward finite-differencing scheme. Another problem often encountered in practice is that of ill-conditioned matrices, and special actions are required to ensure that such situations are avoided. In this book we will not be able to delve into this topic in greater detail.

9.2 EM Algorithm

EM stands for Expectation Maximization. In addition to the Newton-Raphson technique, Shumway and Stoffer (2000) suggest this procedure, which has been found to be more robust to arbitrary initial values, although researchers report that it is somewhat slower than the Newton-Raphson method. The Research Department of the Bank of Canada appears to have adopted a mixture of these two approaches: the apparent robustness of the EM algorithm is used to get started on the complex optimization problem, and after a while the values generated for the parameters are used to initialize the Newton-Raphson method. They have reported some success with this hybrid mechanism. In this section we give an overview of this algorithm. The GAUSS code I have developed is available for those who are interested.
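Since the hybrid scheme above ends with Newton-Raphson iterations, a minimal sketch of the step (9.8) from Section 9.1 may be useful. It approximates the gradient (9.2) and the Hessian (9.3) numerically, using central differences rather than the forward differences mentioned in Section 9.1, and solves $H\Delta\Theta = -g$ instead of forming $H^{-1}$ explicitly. It has none of the safeguards (line searches, Hessian modifications for ill-conditioning) that commercial routines add; the test function is a hypothetical example with its minimum at $(0, 2)$.

```python
import numpy as np


def num_grad(f, theta, h=1e-5):
    """Central-difference approximation to the gradient (9.2)."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        g[i] = (f(theta + e) - f(theta - e)) / (2.0 * h)
    return g


def num_hess(f, theta, h=1e-4):
    """Central-difference approximation to the Hessian (9.3)."""
    n = theta.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(theta + ei + ej) - f(theta + ei - ej)
                       - f(theta - ei + ej) + f(theta - ei - ej)) / (4.0 * h * h)
    return H


def newton_raphson(f, theta0, tol=1e-8, max_iter=50):
    """Minimize f by repeating the step (9.8): Delta Theta = -H^{-1} g."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        g = num_grad(f, theta)
        H = num_hess(f, theta)
        step = np.linalg.solve(H, -g)    # solve H step = -g rather than invert H
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


def f(th):
    """Hypothetical smooth objective with its minimum at (0, 2)."""
    return np.exp(th[0]) - th[0] + (th[1] - 2.0) ** 2


theta_min = newton_raphson(f, [1.0, 0.0])
print(np.allclose(theta_min, [0.0, 2.0], atol=1e-6))   # True
```

For a likelihood, `f` would be the negative of (9.1) evaluated by a filter pass, and a poor starting value could send the iterations away from the optimum, which is exactly where the EM starting phase helps.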
Referring to the state-space model developed in the previous chapter, assume that the states $Y_n = \{y_0, y_1, \ldots, y_n\}$, in addition to the observations $Z_n = \{z_1, z_2, \ldots, z_n\}$, are observed; then we would consider $\{Y_n, Z_n\}$ as the complete data. Under the normality assumption, the joint likelihood (without the constant term) would then be

$$\begin{aligned} -2\ln L_{Z,Y}(\Theta) ={}& \ln|\Sigma_0| + (y_0 - \mu_0)'\Sigma_0^{-1}(y_0 - \mu_0) \\ &+ n\ln|Q| + \sum_{t=1}^{n} (y_t - \Gamma y_{t-1})' Q^{-1} (y_t - \Gamma y_{t-1}) \\ &+ n\ln|R| + \sum_{t=1}^{n} (z_t - A_t y_t)' R^{-1} (z_t - A_t y_t). \end{aligned} \tag{9.9}$$
Thus, if we had the complete data, we could use the relevant theory from the multivariate normal distribution to obtain the maximum likelihood estimate (MLE) of $\Theta$. Without the complete observations, the EM algorithm gives us an iterative method for finding the MLE of $\Theta$ based on the incomplete data, $Z_n$, by successively maximizing the conditional expectation of the complete-data likelihood. Let us write, at iteration $j$ ($j = 1, 2, \ldots$),

$$Q\left(\Theta \mid \Theta^{(j-1)}\right) = E\left\{-2\ln L_{Z,Y}(\Theta) \mid Z_n, \Theta^{(j-1)}\right\}. \tag{9.10}$$
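Equation (9.10) is referred to as the expectation step. Given the current value of the parameters, $\Theta^{(j-1)}$, we can use the smoothing algorithm described earlier to compute the required conditional expectations. This results in

$$\begin{aligned} Q\left(\Theta \mid \Theta^{(j-1)}\right) ={}& \ln|\Sigma_0| + \operatorname{tr}\left\{\Sigma_0^{-1}\left[P_0^n + (y_0^n - \mu_0)(y_0^n - \mu_0)'\right]\right\} \\ &+ n\ln|Q| + \operatorname{tr}\left\{Q^{-1}\left[S_{11} - S_{10}\Gamma' - \Gamma S_{10}' + \Gamma S_{00}\Gamma'\right]\right\} \\ &+ n\ln|R| + \operatorname{tr}\left\{R^{-1}\sum_{t=1}^{n}\left[(z_t - A_t y_t^n)(z_t - A_t y_t^n)' + A_t P_t^n A_t'\right]\right\}, \end{aligned} \tag{9.11}$$

where $y_t^n = y_{t|n}$ and $P_t^n = P_{t|n}$ denote the smoothed state and covariance, $P_{t,t-1}^n$ denotes the lag-one smoothed covariance, and

$$S_{11} = \sum_{t=1}^{n}\left(y_t^n y_t^{n\prime} + P_t^n\right), \tag{9.12}$$

$$S_{10} = \sum_{t=1}^{n}\left(y_t^n y_{t-1}^{n\prime} + P_{t,t-1}^n\right), \tag{9.13}$$

and

$$S_{00} = \sum_{t=1}^{n}\left(y_{t-1}^n y_{t-1}^{n\prime} + P_{t-1}^n\right). \tag{9.14}$$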
Equations (9.11)-(9.14) are evaluated under the current value of the parameters, $\Theta^{(j-1)}$. Minimization of equation (9.11) with respect to the parameters is equivalent to the usual multivariate regression approach and constitutes the maximization step at iteration $j$. This results in

$$\Gamma^{(j)} = S_{10} S_{00}^{-1}, \tag{9.15}$$

$$Q^{(j)} = n^{-1}\left(S_{11} - S_{10} S_{00}^{-1} S_{10}'\right), \tag{9.16}$$

$$R^{(j)} = n^{-1}\sum_{t=1}^{n}\left[(z_t - A_t y_t^n)(z_t - A_t y_t^n)' + A_t P_t^n A_t'\right]. \tag{9.17}$$
In this procedure the initial mean and covariance cannot be estimated simultaneously. In practice we fix the covariance matrix $\Sigma_0$ and use the estimator

$$\mu_0^{(j)} = y_0^n, \tag{9.18}$$
obtained from the minimization of equation (9.11). The overall procedure simply alternates between the Kalman filtering and smoothing recursions and the multivariate normal maximum likelihood estimators given by equations (9.15)-(9.18). Each EM iteration is guaranteed not to decrease the likelihood, so the algorithm converges, although possibly to a local rather than a global maximum. To summarize, the EM algorithm steps are:
• Initialize the process with starting parameter values, $\Theta^{(0)} = \{\mu_0, \Gamma, Q, R\}$, and fix $\Sigma_0$.
• For each $j = 1, 2, \ldots$:
    (a) Compute the incomplete-data likelihood, $-2\ln L_Z(\Theta^{(j-1)})$ (see equation (9.1)).
    (b) E-step: for the current value of the parameters, $\Theta^{(j-1)}$, compute the smoothed estimates $y_t^n$, $P_t^n$ and $P_{t,t-1}^n$, and use these to compute $S_{11}$, $S_{10}$ and $S_{00}$.
    (c) M-step: update the estimates of $\mu_0$, $\Gamma$, $Q$ and $R$ using equations (9.15)-(9.18), i.e. obtain $\Theta^{(j)}$.
• Repeat the above steps until convergence is achieved.
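The M-step is plain linear algebra once the E-step has delivered the smoothed moments. The sketch below implements (9.12)-(9.18) for a time-invariant measurement matrix. It assumes the smoothed states $y_t^n$, covariances $P_t^n$ and lag-one covariances $P_{t,t-1}^n$ are supplied by a smoother; the lag-one covariance recursion is not given in this chapter (see Shumway and Stoffer (2000)), so it is not reproduced here. The usage example is only an algebraic sanity check: with the states treated as known (all smoothed covariances set to zero), (9.15)-(9.17) reduce to ordinary regression estimates. The function name is hypothetical.

```python
import numpy as np


def em_m_step(z, A, ys, Ps, Plag):
    """M-step (9.12)-(9.18) for a time-invariant measurement matrix A.

    z:    (n, q) observations z_1..z_n
    ys:   (n+1, p) smoothed states y_t^n for t = 0..n
    Ps:   (n+1, p, p) smoothed covariances P_t^n for t = 0..n
    Plag: (n, p, p) lag-one smoothed covariances P_{t,t-1}^n for t = 1..n
    """
    n = z.shape[0]
    S11 = sum(np.outer(ys[t], ys[t]) + Ps[t] for t in range(1, n + 1))              # (9.12)
    S10 = sum(np.outer(ys[t], ys[t - 1]) + Plag[t - 1] for t in range(1, n + 1))    # (9.13)
    S00 = sum(np.outer(ys[t - 1], ys[t - 1]) + Ps[t - 1] for t in range(1, n + 1))  # (9.14)
    Gamma = S10 @ np.linalg.inv(S00)                                                # (9.15)
    Q = (S11 - Gamma @ S10.T) / n                                                   # (9.16)
    resid = z - ys[1:] @ A.T
    R = (resid.T @ resid + sum(A @ Ps[t] @ A.T for t in range(1, n + 1))) / n       # (9.17)
    mu0 = ys[0].copy()                                                              # (9.18)
    return Gamma, Q, R, mu0


# Sanity check with known states: an AR(1) state observed with noise.
rng = np.random.default_rng(5)
n = 500
y = np.zeros(n + 1)
for t in range(1, n + 1):
    y[t] = 0.9 * y[t - 1] + rng.normal()
z = (y[1:] + 0.5 * rng.normal(size=n)).reshape(n, 1)
ys = y.reshape(n + 1, 1)
zeros = np.zeros((n + 1, 1, 1))
Gamma, Q, R, mu0 = em_m_step(z, np.eye(1), ys, zeros, zeros[1:])

phi_ols = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
print(np.isclose(Gamma[0, 0], phi_ols))                                  # True
print(np.isclose(Q[0, 0], np.mean((y[1:] - phi_ols * y[:-1]) ** 2)))     # True
print(np.isclose(R[0, 0], np.mean((z[:, 0] - y[1:]) ** 2)))              # True
```

In a full EM loop this function would be called once per iteration, after a filter and smoother pass under $\Theta^{(j-1)}$.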
9.3 Time Varying Parameters and Changing Conditional Variance (EViews)

In the usual linear regression models the variance of the innovation is assumed constant. However, many financial and economic time series exhibit changing conditional variance. The main modeling approach for this is ARCH (autoregressive conditional heteroscedasticity). In this class of models the current conditional variance depends on past squared innovations. It has been suggested that uncertainty about the future arises not simply because of future random innovations but also because of uncertainty about the current parameter values and about the model's ability to relate the present to the future. In the fixed parameter regression framework, the model's parameters are constant and hence may not capture these dynamics properly. In other words, changing parameters might be able to explain the observed time-varying conditional variance in regression models. In this section we describe such an attempt in modeling the quarterly monetary growth rate in the US over the sample period 1959:3 to 1985:4. The relevant article for this section is Kim and Nelson (1989), and the whole exercise should be carried out in EViews using the data set provided.
First we consider the model in the fixed parameter context. The model equation is given by

$$\Delta M_t = \beta_0 + \beta_1 \Delta r_{t-1} + \beta_2 \mathrm{INF}_{t-1} + \beta_3 \mathrm{SURP}_{t-1} + \beta_4 \Delta M_{t-1} + \varepsilon_t, \tag{9.19}$$
where $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$. The variables are defined as follows: $\Delta M$ is the quarterly growth rate of M1 money in the US, $\Delta r$ is the change in the three-month interest rate, $\mathrm{INF}$ is the rate of inflation, and $\mathrm{SURP}$ is the budget surplus. Your task is to estimate this model and test for an ARCH effect in the residuals; the test should indicate that an ARCH effect has to be incorporated in the model. Again using EViews, apply an ARCH(1) specification to the model in equation (9.19) and construct the variance series. This should depict how the uncertainty evolved over time. Using a Chow test, it can be shown that the parameters in equation (9.19) are not stable over the sample period analyzed. This brings us to the alternative specification of a time-varying parameter model. A natural way to model the time variation of the parameters is to specify them as random walks. Note that once we make the parameters in equation (9.19) time varying, we can no longer estimate the model using the standard maximum likelihood method, since we have no observations of the parameters. In other words, given the random walk dynamics of the parameters, we want to infer the parameters from the observations of the different variables in the model. The easiest way to deal with this situation is to model the observations and the unobserved components in the state-space framework. We describe the model structure below, and your task is to set up and estimate the model in EViews using a state-space object:

$$\Delta M_t = \beta_{0,t} + \beta_{1,t} \Delta r_{t-1} + \beta_{2,t} \mathrm{INF}_{t-1} + \beta_{3,t} \mathrm{SURP}_{t-1} + \beta_{4,t} \Delta M_{t-1} + \varepsilon_t, \tag{9.20}$$
$$\beta_{i,t} = \beta_{i,t-1} + v_{i,t}, \qquad v_{i,t} \sim N(0, \sigma_{v_i}^2), \tag{9.21}$$
where $i = 0, 1, 2, 3, 4$. We also assume that the innovation processes for the parameters and for the observation are uncorrelated. Once the model has been put in state-space form, it can be estimated using the Kalman filter. The recursive nature of the filter gives insight into how different policy regimes force revisions of the estimates, which is not possible in the fixed parameter regression framework. While estimating this model in EViews you should also analyze the importance of the specification of the prior values of the state variables and their covariance. Once the estimation is complete, you should generate the forecast error variance from the estimated components. Your next task is to compare this forecast error variance with the ARCH variance developed earlier. In an ARCH model, changing uncertainty about the future is focused on the changing conditional variance of the disturbance term in the regression equation. In the time-varying parameter model, uncertainty about the current regression coefficients contributes to the changing conditional variance of monetary growth. This is well represented by the Kalman filter equation for the conditional forecast error variance (8.31), which has two components: the first captures the variance associated with the uncertain parameters (states), and the second is the variance of the measurement disturbance $\varepsilon_t$. It will be interesting to plot the two variance series, one from the ARCH estimation and the other from the time-varying parameter model, in the same graph. You should notice that around 1974 the variance from the time-varying parameter model is higher than the one from the ARCH model. This suggests that during the oil price shock, uncertainty about future monetary policy was higher than that suggested by the ARCH model. In that period, an unusually high inflation rate caused the conditional variance to be unusually high as well.
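A minimal sketch of the filter for the random-walk coefficient model (9.20)-(9.21), written as the general recursions (8.25)-(8.31) with $\Gamma = I$, $Q = \operatorname{diag}(\sigma_{v_0}^2, \ldots, \sigma_{v_4}^2)$ and a time-varying measurement matrix $A_t$ equal to the row of lagged regressors. It also returns the two components of the forecast error variance discussed above. This is not the EViews state-space object; the function name and the simulated two-regressor data are assumptions for illustration.

```python
import numpy as np


def tvp_regression_filter(y, X, sigma2_eps, sigma2_v, beta0, P0):
    """Kalman filter for random-walk coefficients (9.20)-(9.21).

    State: beta_t (k x 1) with Gamma = I and Q = diag(sigma2_v);
    measurement: y_t = x_t' beta_t + eps_t, i.e. A_t = x_t'.
    Returns filtered coefficients and the two components of the one-step
    forecast error variance (8.31): x_t' P_{t|t-1} x_t and sigma2_eps.
    """
    T, k = X.shape
    Q = np.diag(sigma2_v)
    beta = np.asarray(beta0, dtype=float)
    P = np.asarray(P0, dtype=float)
    betas = np.zeros((T, k))
    var_params = np.zeros(T)
    for t in range(T):
        x = X[t]
        P_pred = P + Q                          # (8.26) with Gamma = I
        var_params[t] = x @ P_pred @ x          # parameter-uncertainty component
        S = var_params[t] + sigma2_eps          # forecast error variance (8.31)
        K = P_pred @ x / S                      # (8.28)
        beta = beta + K * (y[t] - x @ beta)     # (8.27); prediction is beta_{t-1|t-1}
        P = P_pred - np.outer(K, x @ P_pred)    # (8.29)
        betas[t] = beta
    return betas, var_params, np.full(T, sigma2_eps)


# Simulated data with hypothetical constant coefficients 0.5 and 1.5.
rng = np.random.default_rng(6)
T = 120
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=T)

# With no coefficient drift and a near-diffuse prior, the last filtered
# coefficients coincide with OLS on the whole sample.
betas, var_params, var_eps = tvp_regression_filter(
    y, X, sigma2_eps=1.0, sigma2_v=np.zeros(2), beta0=np.zeros(2), P0=1e6 * np.eye(2))
ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(betas[-1], ols, atol=1e-4))   # True
```

Setting positive entries in `sigma2_v` lets the coefficients drift, and `var_params` then tracks how much of the forecast error variance comes from parameter uncertainty, the quantity to compare with the ARCH variance.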
9.4 GARCH and Stochastic Variance Model for Exchange Rate (EViews)

Another alternative to ARCH in modeling changing conditional variance is to model the variance as an unobserved stochastic process. In this section we describe such an approach, suggested by Harvey, Ruiz and Shepherd (1994), and as an application you will apply this technique in EViews to exchange rate data. Before formulating the stochastic variance (SV) model, we first state the specification of the GARCH(1,1) model that is usually applied to exchange rate data. Let $r_t$ denote the return from the spot exchange rate, i.e. the log price difference series; then, with $\Omega_{t-1}$ denoting the information available at time $t-1$, the GARCH(1,1) model is

$$r_t = a_0 + a_1 r_{t-1} + \varepsilon_t, \quad \varepsilon_t \mid \Omega_{t-1} \sim N(0,\sigma_t^2), \qquad (9.22)$$

$$\sigma_t^2 = \beta_0 + \beta_1 \varepsilon_{t-1}^2 + \beta_2 \sigma_{t-1}^2, \qquad (9.23)$$
where $\sigma_t^2$ is the conditional variance, which depends on the past squared innovation as well as the past conditional variance. The mean equation (9.22) is in this case a simple AR(1) process; in other cases it could include other explanatory variables. Since the model is formulated in terms of one-step-ahead prediction errors, maximum likelihood estimation is straightforward. The dynamics of a GARCH model show up in the autocorrelation function (ACF) of the squared observations. If the parameters $\beta_1$ and $\beta_2$ sum to close to one, the ACF decays very slowly, implying a slowly changing conditional variance. In practice it can also be quite difficult to establish the conditions under which $\sigma_t^2$ is always positive.

Stochastic variance models are the natural discrete-time versions of the continuous-time models on which much of modern finance theory has been developed. Their main difficulty is maximum likelihood estimation; on the other hand, they can be generalized to the multivariate case in a natural way. The estimation of the SV model is based on a quasi-maximum likelihood procedure, and the model has been shown to capture exchange rate data quite well. The SV model, like the GARCH(1,1) model, is characterized by a Gaussian white noise process multiplied by a time-varying scale factor $\sigma_t$; in the SV model, however, this factor is itself a stochastic process. In order to ensure a positive variance, it has been suggested that we model the log of the variance. Taking $r_t$ to be demeaned, so that $r_t = \sigma_t\varepsilon_t$ with $\varepsilon_t \sim N(0,1)$, and defining $h_t = \ln\sigma_t^2$ and $y_t = \ln r_t^2$, equation (9.22) can be re-expressed as

$$y_t = h_t + \ln\varepsilon_t^2. \qquad (9.24)$$
Equation (9.24) is the observation equation, and the stochastic variance $h_t$ is an unobserved state process. In its basic form the volatility process follows an autoregression,

$$h_t = \phi_0 + \phi_1 h_{t-1} + w_t, \qquad (9.25)$$

where $w_t$ is Gaussian white noise with variance $\sigma_w^2$. Together, equations (9.24) and (9.25) make up the stochastic variance model. If $\varepsilon_t^2$ had a log-normal distribution, equations (9.24)-(9.25) would form a Gaussian state-space model. Unfortunately $y_t = \ln r_t^2$ is rarely normal, so we keep the ARCH normality assumption for $\varepsilon_t$. In that case $\ln\varepsilon_t^2$ is distributed as the log of a chi-squared random variable with one degree of freedom; such a variable has a mean of $-1.27$ and a variance of $\pi^2/2$. Although various approaches to estimating the stochastic variance model have been proposed in the literature, we will investigate an approximate Kalman filter, which you will be able to implement in EViews and apply to a given exchange rate series. You may also check whether a specification other than equation (9.25) fits the data better; an alternative to equation (9.25) would be a random walk specification. The estimation method suggested in Harvey, Ruiz and Shepherd (1994) is a quasi-maximum likelihood (QML) method computed using the Kalman filter. The following state-space form may be adopted for implementing the SV model:

$$y_t = h_t + \xi_t, \qquad (9.26)$$

$$h_t = \phi_0 + \phi_1 h_{t-1} + \eta_t \quad \text{or} \quad h_t = h_{t-1} + \eta_t, \qquad (9.27)$$

where $\xi_t = \ln\varepsilon_t^2$. Its mean of $-1.27$ can be subtracted from $y_t$ beforehand or absorbed into the level of $h_t$, so that the measurement error can be treated as having zero mean.
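The constants quoted above for $\ln\varepsilon_t^2$ can be checked by simulation. This Python/NumPy sketch draws standard normal variates and compares the sample moments of $\ln\varepsilon_t^2$ with the exact values $-(\gamma + \ln 2) \approx -1.2704$, where $\gamma$ is Euler's constant, and $\pi^2/2 \approx 4.93$; the sample size and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.standard_normal(1_000_000)      # epsilon_t ~ N(0, 1)
log_eps2 = np.log(eps**2)                 # ln(epsilon_t^2): log of a chi-squared(1) variable

exact_mean = -(np.euler_gamma + np.log(2.0))   # about -1.2704
exact_var = np.pi**2 / 2.0                      # about 4.9348

sample_mean = log_eps2.mean()
sample_var = log_eps2.var()
print(f"mean: {sample_mean:.4f} (exact {exact_mean:.4f})")
print(f"variance: {sample_var:.4f} (exact {exact_var:.4f})")
```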
Although the Kalman filter can be applied to (9.26)-(9.27), it will only yield minimum mean square linear estimators of the state and observation rather than minimum mean square estimators. Moreover, since the model is not conditionally Gaussian, the exact likelihood cannot be obtained from the resulting prediction errors. Nevertheless, estimates can be obtained by treating $\xi_t$ (after removing its mean) as $N(0,\sigma_\xi^2)$ and maximizing the resulting quasi-likelihood function. Harvey, Ruiz and Shepherd (1994) discuss the performance of these estimators and give further references to the relevant literature. Before you attempt the model in EViews, the following notes should be helpful:

a. Initializing the state variable: when the state variable has an autoregressive structure, the sample variance of the observation series may be used as the prior variance. When the state variable has a random walk structure, the prior variance may be set to a large but finite positive number.

b. Avoiding numerical problems with the data: remember that $y_t = \ln r_t^2$ and $r_t = \ln p_t - \ln p_{t-1}$, where $p_t$ is the observed exchange rate on date $t$, so $r_t$ may be zero in some cases. This creates a numerical problem in generating $y_t$, since the log of zero is undefined. To avoid this problem, first generate the series $r_t$ and then subtract the mean of $r_t$ from each of its elements.
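The QML procedure and notes a and b can be sketched outside EViews as follows. This is a Python/SciPy sketch under stated assumptions, not the Harvey, Ruiz and Shepherd code: it writes the measurement equation as $y_t = -1.27 + h_t + \xi_t^*$, i.e. (9.26) with the mean of $\ln\varepsilon_t^2$ subtracted explicitly, uses the AR(1) state of (9.27), estimates $\sigma_\xi^2$ freely, and reparameterizes $\phi_1 = \tanh(a)$ to keep the state stationary. The function name `sv_qml_negloglik`, the prior mean choice and the simulated returns are hypothetical.

```python
import math

import numpy as np
from scipy.optimize import minimize

LOG_CHI2_MEAN = -1.2704   # E[ln chi^2_1], subtracted so the measurement error has zero mean


def sv_qml_negloglik(params, y):
    """Negative quasi-log-likelihood of y_t = LOG_CHI2_MEAN + h_t + xi_t,
    h_t = phi0 + phi1 h_{t-1} + eta_t, with xi_t treated as N(0, s2xi)."""
    phi0, a, log_s2eta, log_s2xi = params
    phi1 = math.tanh(a)                        # keeps |phi1| < 1
    s2eta, s2xi = math.exp(log_s2eta), math.exp(log_s2xi)
    h = float(np.mean(y)) - LOG_CHI2_MEAN      # prior mean (assumption): sample mean net of -1.27
    P = float(np.var(y))                       # prior variance: sample variance of y (note a)
    nll = 0.0
    for yt in np.asarray(y, dtype=float).tolist():
        h = phi0 + phi1 * h                    # prediction step
        P = phi1 * phi1 * P + s2eta
        v = yt - LOG_CHI2_MEAN - h             # one-step-ahead prediction error
        F = P + s2xi
        nll += 0.5 * (math.log(2.0 * math.pi * F) + v * v / F)
        K = P / F                              # updating step
        h += K * v
        P *= 1.0 - K
    return nll


# Simulated returns (hypothetical parameters, not exchange rate estimates).
rng = np.random.default_rng(7)
n, phi1_true, s_eta, mean_h = 1500, 0.95, 0.2, math.log(1e-4)
h = np.empty(n)
h[0] = mean_h
for t in range(1, n):
    h[t] = mean_h * (1 - phi1_true) + phi1_true * h[t - 1] + s_eta * rng.standard_normal()
r = np.exp(h / 2) * rng.standard_normal(n)

r = r - r.mean()                               # note b: demean before taking logs
y = np.log(r**2)

x0 = np.array([0.1 * (y.mean() - LOG_CHI2_MEAN), math.atanh(0.9), math.log(0.1), math.log(np.pi**2 / 2)])
res = minimize(sv_qml_negloglik, x0, args=(y,), method="Nelder-Mead",
               options={"maxiter": 3000, "xatol": 1e-5, "fatol": 1e-5})
phi1_hat = math.tanh(res.x[1])
```

For the random walk alternative in (9.27), one would fix $\phi_0 = 0$, $\phi_1 = 1$ and, following note a, start from a large prior variance instead of the sample variance.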
9.5 Examples and Exercises

Example 9.1: EM Algorithm (Data from Shumway and Stoffer (2000))

Measurements of average temperature and salt level in an agricultural field at equally spaced intervals are shown in Fig. 9.1. Our aim is to explore the relationship between these two variables using a suitable state-space form and to estimate such a model using the EM algorithm. We assume that the bivariate vector of temperature and salt level is measured with uncorrelated measurement errors. Thus,

$$\begin{bmatrix} y_{1t} \\ y_{2t} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_{1t} \\ x_{2t} \end{bmatrix} + \begin{bmatrix} v_{1t} \\ v_{2t} \end{bmatrix}, \qquad (9.28)$$

where the state vector is assumed to have the following dynamic:
0). The log-linear approximation of (12.11) can be written as follows:

$$q = k + \psi E_t p_{t+1} + (1-\psi)d_t - p_t, \qquad (12.12)$$

where $q$ is the required log gross return rate, $\psi$ is the average ratio of the stock price to the sum of the stock price and the dividend, $k = -\ln(\psi) - (1-\psi)\ln(1/\psi - 1)$, $p_t = \ln(P_t)$, and $d_t = \ln(D_t)$. The general solution to (12.12) is given by

$$p_t = \frac{k-q}{1-\psi} + (1-\psi)\sum_{i=0}^{\infty}\psi^i E_t[d_{t+i}] + b_t \equiv p_t^f + b_t, \qquad (12.13)$$

where $b_t$ satisfies the homogeneous difference equation

$$E_t[b_{t+1}] = \frac{1}{\psi}\,b_t. \qquad (12.14)$$
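That (12.13) solves (12.12), with or without a bubble term, can be checked numerically. This Python sketch assumes, purely for illustration, that log dividends follow a random walk with drift $\mu$, so that $E_t[d_{t+i}] = d_t + i\mu$ and the no-bubble price has the closed form $p_t^f = (k-q)/(1-\psi) + d_t + \mu\psi/(1-\psi)$. The helper names and all numerical values are hypothetical.

```python
import math


def k_const(psi):
    """k = -ln(psi) - (1 - psi) ln(1/psi - 1), the constant in (12.12)."""
    return -math.log(psi) - (1.0 - psi) * math.log(1.0 / psi - 1.0)


def fundamental_price(d_t, psi, q, mu, n_terms=5000):
    """No-bubble price p_t^f of (12.13), assuming (hypothetically) that log
    dividends follow a random walk with drift mu, so E_t[d_{t+i}] = d_t + i*mu.
    The infinite sum is truncated after n_terms terms."""
    pv = sum(psi**i * (d_t + i * mu) for i in range(n_terms))
    return (k_const(psi) - q) / (1.0 - psi) + (1.0 - psi) * pv


def pricing_residual(p_t, expected_p_next, d_t, psi, q):
    """q minus the right-hand side of (12.12); zero for any solution."""
    return q - (k_const(psi) + psi * expected_p_next + (1.0 - psi) * d_t - p_t)


psi, q, mu, d_t = 0.96, 0.08, 0.01, 0.5               # hypothetical values
pf_t = fundamental_price(d_t, psi, q, mu)
pf_next = fundamental_price(d_t + mu, psi, q, mu)     # E_t[p_{t+1}^f]: p^f is linear in d
closed_form = (k_const(psi) - q) / (1.0 - psi) + d_t + mu * psi / (1.0 - psi)

b_t = 0.2                                             # bubble with E_t[b_{t+1}] = b_t / psi, as in (12.14)
res_no_bubble = pricing_residual(pf_t, pf_next, d_t, psi, q)
res_bubble = pricing_residual(pf_t + b_t, pf_next + b_t / psi, d_t, psi, q)
```

Both residuals vanish, which is the sense in which the bubble is not pinned down by (12.12) alone and has to be identified from its own dynamics.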
In equation (12.13), the no-bubble solution $p_t^f$ is determined exclusively by dividends, while $b_t$ can be driven by events extraneous to the market and is referred to as a rational speculative bubble. Once the stock price equation, the parametric bubble process and the dividend process are written in state-space form, the bubble is treated as an unobserved state vector that can be estimated with the Kalman filter. The author finds a statistically significant estimate of the innovation variance of the bubble process. During the 1960s bull market the bubble accounts for between 40% and 50% of actual stock prices. Negative bubbles are found during the 1919-1921 bear market, in which the bubble explains between 20% and 30% of the decline in stock prices.

12.3.5 Wu (1995)

The same model has also been used to estimate the unobserved bubble component of the exchange rate and to test whether it is significantly different from zero. Using the monetary model of exchange rate determination, the solution for the exchange rate is the sum of two components. The first component, called the fundamental solution, is a function of the observed market fundamental variables. The second component is an unobserved process, which satisfies the monetary model and is called the stochastic bubble. The monetary model, the market fundamental process and the bubble process are expressed in state-space form, with the bubble treated as a state variable. The Kalman filter can then be used to estimate the state variable. The author finds no significant estimate of a bubble component at any point in the period 1974-1988. Similar results were obtained for the sub-sample 1981 through 1985, during which the US dollar appreciated most drastically and a bubble would have been more likely to occur.
12.4 New Contribution

The purpose of our study is to search empirically for bubbles in national stock markets using state-of-the-art methodology such as Wu (1995, 1997), with emphasis on the U.S., Japan, Germany and the United Kingdom. We focus on the post-war period in these four countries, as opposed to Wu (1997), which concentrates only on U.S. annual data series dating back to 1871. All data are monthly returns of the S&P 500, Nikkei 225, DAX-30 and FT-100 indexes ranging from January 1951 to December 1998, that is, 576 observations. All data, provided by Global Financial Data, are converted to real values using the corresponding CPI measures. In order to establish the soundness of our methodology, we attempted to reproduce the results of Wu (1997) using annual U.S. data (also obtained from Global Financial Data) covering the period 1871-1998. Although we employ an unobserved component modeling approach similar to Wu (1997), our implementation of the state-space form (or Dynamic Linear Model, DLM; see chapter 8) is quite different from that of Wu. We treat both the dividend process and the bubble process as part of the unobserved components, i.e. the state vector. The state equations also include their own system errors, which are assumed to be uncorrelated. The measurement vector in this case contains the price and the realized dividend without any measurement errors. The advantage of this way of modeling is that the comparison with the no-bubble solution becomes much more straightforward. Wu (1997) had to resort to an alternative method (GMM) of estimating the no-bubble solution, and model adequacy tests are not performed there; moreover, the precise moment conditions used in the GMM estimation are not reported. In our approach, on the other hand, we are able to subject both the bubble and the no-bubble solutions to a battery of diagnostic tests applicable to state-space systems. In the following subsections we describe in detail the mathematical structures of our models and the estimation strategies. Once bubbles are confirmed empirically, we proceed to test linkages between the four markets in terms of both the fundamental price and the bubble price series. In this context we adopt a subset VAR methodology (Lütkepohl 1993, p. 179). The approach builds in the causal relations between the series, which gives us the opportunity to analyze potential global contagion among these national equity markets through the speculative component of prices.
The potential existence of global linkages among equity markets would further decrease the expected benefits of global diversification. The literature on global integration and diversification is briefly reviewed next.
12.5 Global Stock Market Integration

During the past thirty years, world stock markets have become more integrated, primarily because of financial deregulation and advances in computer technology. Financial researchers have examined various aspects of this evolution. For example, the early studies by Grubel (1968), Levy and Sarnat (1970), Grubel and Fadner (1971), Agmon (1972, 1973), Ripley (1973), Solnik (1974), and Lessard (1973, 1974, 1976) investigated the benefits of international portfolio diversification. While some studies, such as Solnik (1976), were exclusively theoretical, extending the capital asset pricing model to a world economy, others, such as Levy and Sarnat (1970), used both theory and empirical testing to confirm the existence of financial benefits from international diversification. Similar benefits were also confirmed by Grubel (1968), Grubel and Fadner (1971), Ripley (1973), Lessard (1973, 1974, 1976), Agmon (1972, 1973), Makridakis and Wheelwright (1974), and others, who studied the relations among equity markets in various countries. Specifically, Agmon (1972, 1973) investigated the relationships among the equity markets of the U.S., United Kingdom, Germany and Japan, while Lessard (1973) considered a group of Latin American countries. By 1976, eight years after the pioneering work of Grubel (1968), enough knowledge had accumulated on this subject to induce Panton, Lessig and Joy (1976) to offer a taxonomy. It seems reasonable to argue that although these studies used different methodologies and diverse data from a variety of countries, their main conclusions confirmed that correlations among national stock market returns were low and that national speculative markets were largely responding to domestic economic fundamentals. Theoretical developments in continuous-time stochastic processes and arbitrage theory were quickly incorporated into international finance. Stulz (1981) developed a continuous-time model of international asset pricing, while Solnik (1983) extended arbitrage theory to an international setting. Adler and Dumas (1983) integrated international portfolio choice and corporate finance.
Empirical research also continued, for example Hilliard (1979), Maldonado and Saunders (1981), Christofi and Philippatos (1987), Philippatos, Christofi and Christofi (1983), Grauer and Hakansson (1987), Schollhammer and Sand (1987), Wheatley (1988), Eun and Shim (1989), von Furstenberg and Jeon (1989), Becker, Finnerty and Gupta (1990), Fisher and Palasvirta (1990), French and Poterba (1991) and Harvey (1991). These numerous studies employ more recent methodologies and larger databases than the earlier studies to test for interdependencies between the time series of national stock market returns. The underlying issue remains the empirical assessment of how much integration exists among national stock markets. In contrast to earlier results, and despite some reservations, several of these new studies find a high and statistically significant level of interdependence between national markets, supporting the hypothesis that global stock markets are becoming more integrated. In comparing the results of the earlier studies with those of the more recent ones, one could deduce that greater global integration implies fewer benefits from international portfolio diversification. If this is true, how can one explain the ever-increasing flow of large sums of money invested in international markets? To put it differently, while Tesar and Werner (1992) confirm the home bias in the globalization of stock markets, why are increasing amounts of funds invested in non-home equity markets? For instance, about 10% of all trading in U.S. equities currently takes place outside the United States. The June 14, 1993 issue of Barron's reported that US investors had tripled their ownership of foreign equities over the previous five years, from 63 billion to over 200 billion US dollars in 1993. The analysis of the October 19, 1987 stock market crash may offer some insight into this question. Roll (1988, 1989), King and Wadhwani (1990), Hamao, Masulis and Ng (1990) and Malliaris and Urrutia (1992) confirm that almost all stock markets fell together during the October 1987 crash, despite the existing differences among national economies, while no significant interrelationships seem to exist for the periods before and after the crash. Malliaris and Urrutia (1997) also confirm the simultaneous fall of national stock market returns because of the Iraqi invasion of Kuwait in July 1990. This evidence supports the hypothesis that certain global events, such as the crash of October 1987 or the invasion of Kuwait in July 1990, tend to move world equity markets in the same direction, thus reducing the effectiveness of international diversification. On the other hand, in the absence of global events, national markets are dominated by domestic fundamentals, and international investing increases the benefits of diversification.
Exceptions exist, as in the case of regional markets, such as the European stock markets reported in Malliaris and Urrutia (1996). Longin and Solnik (2001) distinguish between bear and bull markets in international equity markets and find that correlation increases in bear markets, but not in bull markets.
12.6 Dynamic Linear Models for Bubble Solutions

Our starting point in this approach is equations (12.13) and (12.14) described earlier. As our preliminary investigations reveal that both the log real price and the log real dividend series are non-stationary, we choose to work with the first-differenced series. Thus, equation (12.13) becomes

$$\Delta p_t = \Delta p_t^f + \Delta b_t, \qquad (12.15)$$

where $\Delta p_t^f = (1-\psi)\sum_{i=0}^{\infty}\psi^i E_t[d_{t+i}] - (1-\psi)\sum_{i=0}^{\infty}\psi^i E_{t-1}[d_{t-1+i}]$. Assuming the following parametric representation of equation (12.14),

$$b_{t+1} = \frac{1}{\psi}\,b_t + \varepsilon_{t+1}, \quad \varepsilon_t \sim N(0,\sigma_\varepsilon^2), \qquad (12.16)$$

the change in the bubble follows

$$\Delta b_{t+1} = \frac{1}{\psi}\,(b_t - b_{t-1}) + \varepsilon_{t+1} - \varepsilon_t. \qquad (12.17)$$
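Differencing the bubble process $b_{t+1} = b_t/\psi + \varepsilon_{t+1}$ gives the recursion $\Delta b_{t+1} = (b_t - b_{t-1})/\psi + \varepsilon_{t+1} - \varepsilon_t$. The following Python/NumPy sketch simulates the bubble with hypothetical values of $\psi$ and $\sigma_\varepsilon$ and confirms that the differenced series satisfies this recursion exactly; it also makes visible how fast a bubble with $1/\psi > 1$ grows.

```python
import numpy as np

psi, sigma_eps, n = 0.96, 0.1, 200        # hypothetical values
rng = np.random.default_rng(3)
eps = sigma_eps * rng.standard_normal(n)

# Simulate b_{t+1} = b_t / psi + eps_{t+1}, starting from b_0 = eps_0.
b = np.empty(n)
b[0] = eps[0]
for t in range(1, n):
    b[t] = b[t - 1] / psi + eps[t]

db = np.diff(b)                           # db[j] = b_{j+1} - b_j
# Recursion for t = 1..n-2: Delta b_{t+1} = (b_t - b_{t-1}) / psi + eps_{t+1} - eps_t
rhs = (b[1:-1] - b[:-2]) / psi + eps[2:] - eps[1:-1]
print("largest |b_t| in the sample:", np.abs(b).max())
```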
In order to express the fundamental component of the price, $\Delta p_t^f$, in terms of the dividend process, we fit an AR model whose order is chosen to minimize the Akaike information criterion (AIC). We find that an AR(1) model is sufficient for the Japanese data, whereas AR(3) models are needed for the other three countries. The infinite sums in the expression for $\Delta p_t^f$ may be expressed in terms of the parameters of the dividend process once we note the following conditions:

• The differenced log real dividend series is stationary; therefore the infinite sum converges.
• Any finite-order AR process can be expressed in companion form (a VAR of order 1) by using extended state variables, i.e. suitable lags of the original variables (Campbell, Lo and MacKinlay 1997, p. 280).
• Using demeaned variables, the VAR(1) process can easily be used for multi-period-ahead forecasts (Campbell, Lo and MacKinlay 1997, p. 280).
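The three conditions above can be illustrated numerically. Stacking $\Delta X_t = [\Delta d_t, \Delta d_{t-1}, \Delta d_{t-2}]'$ with companion matrix $\Phi$ gives $E_t[\Delta X_{t+i}] = \Phi^i \Delta X_t$, so the discounted sum of forecasts collapses to $e'(I - \psi\Phi)^{-1}\Delta X_t$ with $e' = [1\ 0\ 0]$. In this Python/NumPy sketch the AR(3) coefficients, $\psi$ and the current state are hypothetical, and the closed form is compared with a long truncated sum.

```python
import numpy as np


def companion(phis):
    """Companion (VAR(1)) matrix of an AR(p) process with coefficients phi_1..phi_p."""
    p = len(phis)
    Phi = np.zeros((p, p))
    Phi[0, :] = phis                 # first row: the AR coefficients
    Phi[1:, :-1] = np.eye(p - 1)     # sub-diagonal shifts the lags down
    return Phi


phis = [0.3, 0.1, 0.05]              # hypothetical AR(3) coefficients
psi = 0.96                           # hypothetical discount ratio
Phi = companion(phis)
spectral_radius = np.max(np.abs(np.linalg.eigvals(Phi)))   # < 1 for a stationary process

X_t = np.array([0.02, -0.01, 0.005])  # hypothetical demeaned [dd_t, dd_{t-1}, dd_{t-2}]
e = np.array([1.0, 0.0, 0.0])         # selects dd_t

# E_t[X_{t+i}] = Phi^i X_t, so sum_i psi^i E_t[dd_{t+i}] = e'(I - psi Phi)^{-1} X_t.
closed = e @ np.linalg.solve(np.eye(3) - psi * Phi, X_t)

# Brute-force check: accumulate the discounted forecasts term by term.
brute, forecast = 0.0, X_t.copy()
for i in range(2000):
    brute += psi**i * (e @ forecast)
    forecast = Phi @ forecast
```

Solving the linear system rather than forming $(I - \psi\Phi)^{-1}$ explicitly is the usual numerically safer choice; for a 3 by 3 matrix either works.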
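Assuming the demeaned log real dividend process has the following AR(3) representation,

$$\Delta d_t = \phi_1\Delta d_{t-1} + \phi_2\Delta d_{t-2} + \phi_3\Delta d_{t-3} + \varepsilon_{d,t}, \quad \varepsilon_{d,t} \sim N(0,\sigma_d^2),$$

the companion form may be written as

$$\Delta X_t \equiv \begin{bmatrix} \Delta d_t \\ \Delta d_{t-1} \\ \Delta d_{t-2} \end{bmatrix} = \begin{bmatrix} \phi_1 & \phi_2 & \phi_3 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \Delta d_{t-1} \\ \Delta d_{t-2} \\ \Delta d_{t-3} \end{bmatrix} + \begin{bmatrix} \varepsilon_{d,t} \\ 0 \\ 0 \end{bmatrix} = \Phi\,\Delta X_{t-1} + u_t. \qquad (12.18)$$

$$\ldots\,\Phi\,(I - \psi\Phi)^{-1}\,\Delta X_t + \Delta b_t \qquad (12.22)$$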
where $e' = [1\ 0\ 0]$. Equation (12.22) represents the measurement equation of the DLM, and we need to define a suitable state equation for the model. An examination of equations (12.17) and (12.19) suggests that the following state equation adequately represents the dynamics of the dividend and the bubble processes:
$$\begin{bmatrix} \Delta d_t \\ \Delta d_{t-1} \\ \Delta d_{t-2} \\ \vdots \end{bmatrix} = \cdots$$

$\ldots = \lambda\sigma_S$, or

$$\mu = (r - r_f) + \lambda\sigma_S. \qquad (13.2)$$
Thus, under the historical measure $Q$, equation (13.1) can be rewritten as

$$dS = (r - r_f + \lambda\sigma_S)S\,dt + \sigma_S S\,dW(t), \quad \text{under } Q. \qquad (13.3)$$

Alternatively, under the risk-neutral measure $\tilde{Q}$ the last equation becomes

$$dS = (r - r_f)S\,dt + \sigma_S S\,d\tilde{W}(t), \qquad (13.4)$$

where $\tilde{W}(t) = W(t) + \int_0^t \lambda(u)\,du$.
We recall that under $Q$ the process $\tilde{W}(t)$ is not a standard Wiener process, since $E[d\tilde{W}(t)] = \lambda\,dt \neq 0$ in general. However, Girsanov's theorem allows us to obtain the equivalent measure $\tilde{Q}$ under which $\tilde{W}(t)$ does become a standard Wiener process. The measures $Q$ and $\tilde{Q}$ are related via the Radon-Nikodym derivative. Using standard arguments for pricing derivative securities (see, for example, Hull (1997), chapter 13), the forward price at time $t$ for a contract maturing at $T\,(>t)$ is

$$F(t,T) = \tilde{E}_t(S_T). \qquad (13.5)$$
But from equation (13.4), by Itô's lemma,

$$d\left[S(t)e^{-(r-r_f)t}\right] = \sigma_S S(t)e^{-(r-r_f)t}\,d\tilde{W}(t), \qquad (13.5')$$

so that under $\tilde{Q}$ the quantity $S(t)e^{-(r-r_f)t}$ is a martingale, and it follows immediately that $\tilde{E}_t(S_T) = S_t e^{(r-r_f)(T-t)}$, i.e.

$$F(t,T) = S_t e^{(r-r_f)(T-t)}. \qquad (13.6)$$
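The martingale argument behind (13.6) can be checked by Monte Carlo. This Python/NumPy sketch, with hypothetical inputs, draws $S_T$ from the exact solution of (13.4) under the risk-neutral measure, $S_T = S_0\exp\{(r - r_f - \sigma_S^2/2)T + \sigma_S\tilde{W}(T)\}$, and compares the sample mean with $S_0 e^{(r-r_f)T}$.

```python
import numpy as np

S0, r, rf, sigma_s, T = 1.50, 0.05, 0.03, 0.10, 0.5   # hypothetical inputs
rng = np.random.default_rng(42)
W_T = np.sqrt(T) * rng.standard_normal(1_000_000)     # W~(T) under the risk-neutral measure

# Exact solution of (13.4) for the spot rate at T.
S_T = S0 * np.exp((r - rf - 0.5 * sigma_s**2) * T + sigma_s * W_T)

mc_forward = S_T.mean()                        # Monte Carlo estimate of E~_0(S_T), eq. (13.5)
formula_forward = S0 * np.exp((r - rf) * T)    # eq. (13.6) with t = 0
print(mc_forward, formula_forward)
```

The $-\sigma_S^2/2$ term in the exponent is exactly what makes the discounted spot rate a martingale; dropping it would bias the simulated forward upward.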
If the maturity date of the contract is a constant period $\tau$ ahead, then (13.6) may be written as

$$F(t,t+\tau) = S_t e^{(r-r_f)\tau}. \qquad (13.7)$$

Then from (13.3), (13.4) and (13.7), a straightforward application of Itô's lemma gives the stochastic differential equation for $F$ under $\tilde{Q}$ and $Q$. Thus, under $\tilde{Q}$,

$$dF(t,\tau) = (r - r_f)F(t,\tau)\,dt + \sigma_S F(t,\tau)\,d\tilde{W}(t), \qquad (13.8)$$

whilst under $Q$,

$$dF(t,\tau) = (r - r_f + \lambda\sigma_S)F(t,\tau)\,dt + \sigma_S F(t,\tau)\,dW(t), \qquad (13.9)$$

with $F(0,\tau) = S_0 e^{(r-r_f)\tau}$. We now assume that under the historical measure $Q$ the market price of risk, $\lambda$, follows the mean-reverting stochastic process

$$d\lambda = \kappa(\bar{\lambda} - \lambda)\,dt + \sigma_\lambda\,dW(t), \qquad (13.10)$$
where $\bar{\lambda}$ is the long-term average of the market price of risk and $\kappa$ defines the speed of mean reversion. Here we assume that the same noise process drives both the spot exchange rate and the market price of risk. It would of course also be possible to consider a second, independent Wiener process driving the stochastic differential equation for $\lambda$; however, we leave investigation of this issue for future research. It should be pointed out that, when discretised, the stochastic differential equation (13.10) becomes a low-order ARMA-type process of the kind reported in Wolff (1987) and Cheung (1993). The parameters in equation (13.10) may be estimated from the data using the Kalman filter, as pointed out earlier. Since we have one forward price, $F(t,\tau)$, we have a system of three stochastic differential equations. These are (under the measure $Q$)

$$dS = (r - r_f + \lambda\sigma_S)S\,dt + \sigma_S S\,dW(t), \qquad (13.11a)$$
$$d\lambda = \kappa(\bar{\lambda} - \lambda)\,dt + \sigma_\lambda\,dW(t), \qquad (13.11b)$$

$$dF(t,\tau) = (r - r_f + \lambda\sigma_S)F(t,\tau)\,dt + \sigma_S F(t,\tau)\,dW(t), \qquad (13.11c)$$

where $S(0) = S_0$, $\lambda(0) = \lambda_0$ and $F(0,\tau) = S_0 e^{(r-r_f)\tau}$. It should be noted that the information contained in equations (13.11a)-(13.11c) is also contained in the pricing relationship

$$F(t,\tau) = S_t e^{(r-r_f)\tau}. \qquad (13.12)$$
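The remark that a discretised (13.10) becomes a low-order ARMA-type process can be illustrated in its simplest form. Sampling the mean-reverting process exactly at interval $\Delta$ gives an AR(1), $\lambda_{t+\Delta} = \bar{\lambda} + e^{-\kappa\Delta}(\lambda_t - \bar{\lambda}) + \eta_t$ with $\mathrm{Var}(\eta_t) = \sigma_\lambda^2(1 - e^{-2\kappa\Delta})/(2\kappa)$. This Python/NumPy sketch simulates that AR(1) with hypothetical parameters and recovers $\kappa$ and $\bar{\lambda}$ by ordinary least squares; it treats $\lambda$ as observed, whereas in the model $\lambda$ is latent and would have to be filtered.

```python
import numpy as np

kappa, lam_bar, sigma_lam, dt = 2.0, 0.3, 0.5, 1.0 / 12.0   # hypothetical values
n = 200_000
a = np.exp(-kappa * dt)                                      # AR(1) coefficient
sd_eta = sigma_lam * np.sqrt((1.0 - a**2) / (2.0 * kappa))   # exact innovation s.d.

rng = np.random.default_rng(5)
lam = np.empty(n)
lam[0] = lam_bar
shocks = sd_eta * rng.standard_normal(n)
for t in range(1, n):
    lam[t] = lam_bar + a * (lam[t - 1] - lam_bar) + shocks[t]

# OLS of lambda_{t+1} on lambda_t: slope estimates exp(-kappa dt), intercept (1 - slope) lam_bar.
slope, intercept = np.polyfit(lam[:-1], lam[1:], 1)
kappa_hat = -np.log(slope) / dt
lam_bar_hat = intercept / (1.0 - slope)
```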
To estimate the parameters in the filtering framework, however, we choose to work with equation (13.11c). From equation (13.3), using $s(t) = \ln S(t)$, we can write the spot price at time $t+\tau$ as

$$s(t+\tau) = s(t) + \left(r - r_f - \frac{\sigma_S^2}{2}\right)\tau + \sigma_S\int_t^{t+\tau}\lambda(u)\,du + \sigma_S\int_t^{t+\tau}dW(u). \qquad (13.13)$$
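From equation (13.13) we can write the expected value of $s(t+\tau)$ as

$$E_t[s(t+\tau)] = s_t + \left(r - r_f - \frac{\sigma_S^2}{2}\right)\tau + \sigma_S E_t\left[\int_t^{t+\tau}\lambda(u)\,du\right]. \qquad (13.14)$$

The calculations outlined in the appendix then allow us to write

$$E_t[s(t+\tau)] = s(t) + \left(r - r_f - \frac{\sigma_S^2}{2}\right)\tau + \sigma_S\left[(\lambda(t) - \bar{\lambda})\frac{1 - e^{-\kappa\tau}}{\kappa} + \bar{\lambda}\tau\right]. \qquad (13.15)$$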
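The bracketed term in (13.15) is the integral over $[t, t+\tau]$ of the conditional mean of the mean-reverting process, $E_t[\lambda(u)] = \bar{\lambda} + (\lambda(t) - \bar{\lambda})e^{-\kappa(u-t)}$. This Python/NumPy sketch evaluates (13.15) in closed form and checks it against (13.14) with that conditional mean integrated numerically by the trapezoidal rule; the function name and all inputs are hypothetical.

```python
import numpy as np


def expected_log_spot(s_t, lam_t, tau, r, rf, sigma_s, kappa, lam_bar):
    """E_t[s(t + tau)] from (13.15)."""
    integral = lam_bar * tau + (lam_t - lam_bar) * (1.0 - np.exp(-kappa * tau)) / kappa
    return s_t + (r - rf - 0.5 * sigma_s**2) * tau + sigma_s * integral


# Hypothetical inputs.
s_t, lam_t, tau, r, rf, sigma_s, kappa, lam_bar = np.log(1.5), 0.6, 0.25, 0.05, 0.03, 0.1, 2.0, 0.3
closed = expected_log_spot(s_t, lam_t, tau, r, rf, sigma_s, kappa, lam_bar)

# Check against (13.14): integrate E_t[lambda(u)] over [t, t + tau] (shifted so t = 0)
# with the trapezoidal rule.
u = np.linspace(0.0, tau, 100_001)
f = lam_bar + (lam_t - lam_bar) * np.exp(-kappa * u)
h = u[1] - u[0]
integral_num = h * (f.sum() - 0.5 * (f[0] + f[-1]))
numeric = s_t + (r - rf - 0.5 * sigma_s**2) * tau + sigma_s * integral_num
```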
The above equation may also be expressed (via equation (13.7)) as

$$E_t[s(t+\tau)] = f(t,\tau) - \frac{\sigma_S^2}{2}\tau + \sigma_S \cdots$$