
Financial Econometrics

PRINCETON SERIES IN FINANCE

SERIES EDITORS

Darrell Duffie Stanford University

Stephen Schaefer London Business School

Finance as a discipline has been growing rapidly. The numbers of researchers in academia and industry, of students, and of methods and models have all proliferated in the past decade or so. This growth and diversity manifest themselves in the emerging cross-disciplinary as well as cross-national mix of scholarship now driving the field of finance forward. The intellectual roots of modern finance, as well as the branches, will be represented in the Princeton Series in Finance. Titles in the series will be scholarly and professional books, intended to be read by a mixed audience of economists, mathematicians, operations research scientists, financial engineers, and other investment professionals. The goal is to provide the finest cross-disciplinary work in all areas of finance by widely recognized researchers in the prime of their creative careers.

Financial Econometrics PROBLEMS, MODELS, AND METHODS

Christian Gourieroux
Joann Jasiak

Princeton University Press Princeton and Oxford

Library of Congress Cataloging-in-Publication Data
Gourieroux, Christian, 1949-
Financial econometrics / Christian Gourieroux and Joann Jasiak.
p. cm. - (Princeton series in finance)
Includes index.
ISBN 0-691-08872-1
1. Econometrics. 2. Finance-Statistical methods. 3. Finance-Mathematical models. I. Jasiak, Joann, 1963- II. Title. III. Series.
HB139 .G685 2001
330'.01'5195-dc21
Copyright © 2001 by Princeton University Press
Published by Princeton University Press, 41 William Street, Princeton, New Jersey 08540
In the United Kingdom: Princeton University Press, 3 Market Place, Woodstock, Oxfordshire OX20 1SY
All Rights Reserved

ISBN 0-691-08872-1
British Library Cataloging-in-Publication Data is available
This book has been composed in New Baskerville
Printed on acid-free paper.
www.pup.princeton.edu
Printed in the United States of America
10 9 8 7 6 5 4 3
ISBN-13: 978-0-691-08872-3 (cl.)
ISBN-10: 0-691-08872-1 (cl.)
Library of Congress Control Number: 2001036264

Contents

Preface
1 Introduction
2 Univariate Linear Models: The AR(1) Process and Its Extensions
3 Multivariate Linear Models: VARMA Representation
4 Simultaneity, Recursivity, and Causality Analysis
5 Persistence and Cointegration
6 Conditional Heteroscedasticity: Nonlinear Autoregressive Models, ARCH Models, Stochastic Volatility Models
7 Expectation and Present Value Models
8 Intertemporal Behavior and the Method of Moments
9 Dynamic Factor Models
10 Dynamic Qualitative Processes
11 Diffusion Models
12 Estimation of Diffusion Models
13 Econometrics of Derivatives
14 Dynamic Models for High-Frequency Data
15 Market Indexes
16 Management of Extreme Risks
References
Index

Preface

The aim of econometrics is to use data, statistical inference methods, and structural or descriptive modeling to address practical economic problems. The development of econometric methods in finance is quite recent and has been paralleled by the fast expansion of financial markets and the increasing variety and complexity of financial products. While some material covered in the book is now well established, we gave much consideration to ongoing research. The objective of this book is to report on the current state of scientific advancement and to point out the accomplishments and failures of econometric methods applied to finance. Given the progress of financial econometrics and its wide scope, the content of the book necessarily reflects our subjective choice of the matters of interest. Therefore, we devote the next paragraphs to the motivations for the adopted approach and the array of problems presented in the text.

Past versus Future

It is conceivable to review the theory and practice of financial econometrics in chronological order. Such an approach seems a priori quite natural and insightful. However, its potential pitfall is to put too much emphasis on techniques developed in the past and adapted to an environment that no longer exists. Therefore, we focus on the methods related to ongoing research, as well as on those that seem to us relevant for future advances. As a consequence, the reader may feel that some topics have been given much attention at the expense of others. For example, the arbitrage pricing theory (APT), which examines the possibility of making sure gains, has not been given extensive coverage, as it requires a set of restrictive conditions practically never fulfilled in the real world. Instead, APT is discussed as a particular case of a factor model with additional dynamic features, and we give a brief overview of its limitations. A similar argument motivated the limited coverage of autoregressive conditionally heteroscedastic (ARCH) models, designed to accommodate time-varying risks. The advent of the ARCH model marked the last two decades of financial econometrics and was a breakthrough in the way econometricians model and evaluate the returns and risks on assets. Nevertheless, due to their restrictive form, ARCH models fail to account for several empirical features, such as asymmetric responses of volatility to rising and falling asset prices, and they postulate a deterministic relationship between risk and past returns. We believe that future research interests will likely shift toward the family of stochastic volatility models and general nonlinear models, which accordingly were given more attention in the text.

Statistics versus Finance

Most finance textbooks illustrate the subjects of primary interest, of both theoretical and practical nature, by data-based empirical results. A converse approach would describe the statistical models and methods of analysis along with their financial applications. Neither of these two outlines has been followed in the present text. We believe that an adequate treatment of financial econometrics consists of a well-balanced synthesis of financial theory and statistical methodology. Therefore, we put much effort into the embodiment of our vision of an optimal blend of these two aspects of financial econometrics. As a consequence, some theoretical results and estimation methods are dispersed among several chapters. For example, the Capital Asset Pricing Model (CAPM) is discussed under various headings, such as portfolio management in application to risk control and the equilibrium model to highlight CAPM-based derivative pricing. For the same reason, the generalized method of moments appears in the analysis of actuarial models, intertemporal optimization, and derivative pricing.

Case Study versus Empirical Illustration

There currently is a tendency to provide textbooks with computer disks that contain a set of financial series used for empirical illustrations. In our work, we employ a variety of data sampled at frequencies that range from intraday to monthly, comprising time series that represent both European and North American markets for stocks, bonds, and foreign currencies. Our purpose is to convince the reader that econometric methods need to be adapted to the problem and data set under study on a case-by-case basis. In this way, we try to convey the message that there does not exist a true model for each market that yields the best approximation of its price dynamics, captures most of the evidenced stylized facts, is valid at different frequencies, and provides a unifying framework for portfolio management, derivative pricing, forecasting, and risk control. One has to remember that a statistical model is a simplified image of reality, which is much too complex to be described exactly. Therefore, an econometrician is aware that a model is necessarily misspecified. Since specification errors differ depending on the subject under study, models designed for examining various problems may not be compatible. For the same reason, one has to interpret with caution the stylized facts reported in the empirical literature. These stylized facts are based on inference from imperfect models and are to a great extent influenced by the adopted research methodology. Indeed, we have seen in the past that even a very slight improvement of a model may produce evidence that contradicts commonly recognized empirical regularities. For example, volatility persistence is significantly reduced when trading volumes are included in the conditional variance equation. As well, in contrast to a common belief, asset prices do not follow random walks when nonlinear patterns are accounted for in the dynamic specification. In particular, so-called financial puzzles often result from a narrow interpretation of inference based on misspecified models.

Descriptive Diagnostics versus Testing Theory

The above remarks lead to the conclusion that a crucial task of financial econometrics is to eradicate specification errors or at least keep them under control. There exist two instruments to this end: graphical diagnostics, including residual plots, autocorrelograms, tracking errors, and the like; and statistical tests for comparing various hypotheses. We have clearly given priority to the first type of instrument, for two reasons. First, graphical diagnostic methods are easier for practitioners to understand than test statistics and thus help avoid human errors in interpreting the outcomes. The focus on graphical diagnostics also eliminates the heavy mathematics involved in the theory of tests. Second, the theory of tests relies on a comparison of the test statistic to a critical value that presumes correct specification of the hypothetical model. The critical values, and hence the outcomes, of a test are extremely sensitive to omitted nonlinearities, heteroscedasticity, or poor data.


Topics Covered

Chapters 1-13 outline the econometric methods and models readily applicable to forecasting financial data, to portfolio management, and to derivative pricing. Although this material lends itself to a majority of financial applications, the reader needs to be aware that econometric models are plagued by specification errors. The last three chapters of the book deal directly with misspecification problems and offer an array of solutions. Some of the specification errors concern
• microeconomic aspects such as investor heterogeneity, noncompetitive behavior of investors, market organization, and market mechanism effects;
• the computation of market indexes and the responsibility delegated to scientific committees, which decide on their composition;
• the treatment of extreme risks for risk control and determining minimal capital requirements.
Given its scope, the book is intended as a text for graduate students in statistics, mathematics, economics, and business who are interested in financial applications. It was developed from our lectures at York University; Montreal University; Schulich School of Business in Canada; INSEAD; ENSAE; Paris I, VI, VII, and IX Universities in France; and Geneva and Lausanne Universities in Switzerland. It is useful both at the master's level, because of its emphasis on the practical aspects of financial modeling and statistical inference, and at the doctoral level, for which detailed mathematical derivations of the deeper results are included together with the more advanced financial problems concerning high-frequency data or risk control. By establishing a link between practical questions and the answers provided by financial and statistical theory, the book also addresses applied researchers employed by banks and financial institutions. Instructors who use the book might find it difficult to cover all the included material. It is always better to have some extra topics in the book that instructors can choose from depending on their interests or needs. A one-semester graduate course with an explicit focus on discrete-time analysis might cover Chapters 1-4, 6, and possibly elements of Chapter 14 when the emphasis of the course lies on statistical analysis, or Chapters 8, 13, and 15 when the main field of interest is derivative pricing. Instructors who teach continuous-time modeling might consider Chapters 1 and 10-13. The material for a course in financial macroeconometrics might include Chapters 1-5, 7, and possibly 9. At an advanced (doctoral) level, the following topics can be used for lectures:

1. High-frequency data analysis (Chapter 14)
2. Market and risk management (Chapters 14 and 16)
3. Factor models (Chapter 9)
4. Stochastic discount factor models (Chapters 8, 11, and 13)

We expect the ongoing progress in financial econometrics to be driven further by future advances in the domains of martingales and nonlinear time series; parametric and nonparametric estimation methods, including simulation-based methods and methods of moments; numerical analysis for solving diffusion equations; integral approximations for pricing derivatives; and, in economics, better explanations of market mechanisms.

Acknowledgment

This book would not have been written without the help of some of our colleagues, students, and collaborators. We are grateful to Alain Monfort, Darrell Duffie, Frank Diebold, and an anonymous referee for helpful comments. We are also indebted to Jon Cockerline, the director of Research Services of the Toronto Stock Exchange, and George Tauchen for data and valuable collaboration; and to our former students Serge Darolles, Gaelle LeFol, Christian Robert, and Xingnong Zhu for their assistance in empirical research. As well, we thank Bonny Stevenson and Fanda Traore for technical support in preparing the text. Finally, we thank each other for the commitment, dedication, and mutual support during the time of writing this book.

Financial Econometrics

1 Introduction

1.1 Assets and Markets

1.1.1 Markets

Financial markets comprise markets for stocks, bonds, currencies, and commodities. During the last decade, these markets have grown remarkably fast in the number and volume of daily concluded transactions. Their expansion was paralleled by substantial qualitative improvements. The supply of financial products has increased in size, and several new and sophisticated products have been developed. As well, trading on major stock exchanges has become much faster due to computerized order-matching systems that enhance market transparency and accelerate operations.

Financial markets satisfy various commercial and productive needs of firms and investors. For instance, the forward markets for futures on commodities ensure the purchases and future deliveries of goods at prices fixed in advance. Their activity reduces uncertainty in transactions and creates a safe environment for developing businesses. Stock markets essentially satisfy the demand of national and international companies for external funds. The possibility of issuing equity tradable on domestic markets and abroad offers easy access to many investors and allows diversification of shareholders. As for investors, the market value of stocks provides information on the performance of various companies and helps efficient investment decisions to be made.

Financial markets also serve some purely financial purposes: lending, risk coverage, and refinancing. In particular, bonds issued by the Treasury, various states, or companies represent the demand of these institutions for loans. The use of organized markets to collect external funds has several advantages: It allows for a direct match between borrowers and lenders; it extends the number of potential lenders by splitting the requested amount into so-called bonds or notes; it facilitates the diversification of investments; and it allows financing of very risky plans with low probabilities of repayment (junk bonds and emerging markets). Moreover, the experience of past decades shows that the development of organized markets has contributed to significant growth of pension funds by providing sustained returns in the middle and long run.

Financial assets are also used by investors for coverage against various risks; in financial terminology, this is risk hedging. For example, a European firm that exports its production to the United States and receives its payments in US dollars within six months following a shipment may wish to cover against the risk of a decrease in the exchange rate between the US dollar and the Euro. Similarly, an institution that provides consumption loans indexed on the short-term interest rate may need to seek insurance against a future decline of this rate. The demand for coverage of diverse types of risk has generated very specific products called derivatives, such as options written on exchange rates or interest rates.

Finally, we need to emphasize the role of secondary markets. A standard credit contract involves a borrower and a lender; the lender is entitled to receive in the future regular payments of interest and capital until the expiry date. Secondary financial markets provide the initial lender an opportunity to sell the rights to future repayments to a secondary lender. The trade of repayment rights is widely used by credit institutions as an instrument of refinancing. A related type of transaction involving mortgages is called securitization, which allows a bank or an institution that specializes in mortgages to create financial assets backed by a pool of individual mortgages and to trade them on the market.
The assets created in the process of securitization are called mortgage-backed securities (MBS).

1.1.2 Financial Assets

Financial assets are defined as contracts that give the right to receive (or the obligation to provide) monetary cash flows. Typically, such a contract specifies the dates, conditions, and amounts of future monetary transfers. It has a market price and can be exchanged whenever there are sufficient potential buyers and sellers. The acquisition of a financial asset can be summarized as a sequence of monetary cash flows, including the purchasing price. It is graphically represented by a bar chart with a horizontal axis that measures the times between consecutive payments and a vertical axis that measures the amounts of cash flows. The cash flows take positive values when they are received and negative values otherwise.

Figure 1.1 Bar Chart

Figure 1.1 shows that, unlike standard real assets, financial assets need not exist physically. Instead, most financial assets are recorded and traded by computer systems. Below are some examples of financial assets and the associated bar charts.

Zero-Coupon Bond (or Discount Bond)

A zero-coupon bond, or discount bond, is an elementary financial asset. A zero-coupon bond (Figure 1.2) with maturity date T provides one monetary unit (i.e., $1) at date T. At date t, with t ≤ T, this zero-coupon bond has a residual maturity of H = T - t and a price of B(t,H) = B(t, T - t). The zero-coupon bond allows for monetary transfers between the dates t and T.
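A minimal sketch of this object (not from the book): under a hypothetical flat interest rate r, the price of the zero-coupon bond is B(t,H) = (1 + r)^(-H), so the price rises toward $1 as the residual maturity H shrinks.

```python
# Illustrative sketch (assumption: a flat interest rate r, which the book
# does not impose): price at t of $1 delivered at t + H.

def zero_coupon_price(r, H):
    """B(t, H) = (1 + r) ** -H under a flat rate r (hypothetical)."""
    return (1.0 + r) ** -H

# The price increases toward 1 as the residual maturity H falls.
prices = [zero_coupon_price(0.04, H) for H in (10, 5, 1, 0)]
assert prices == sorted(prices)   # monotonically increasing as H decreases
assert prices[-1] == 1.0          # at maturity the bond pays exactly $1
```

The flat-rate assumption is only for illustration; the book's B(t,H) is a market price that varies with both t and H.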

Figure 1.2 Zero-Coupon Bond


Figure 1.3 Coupon Bond

Coupon Bond

Coupon bonds are similar to loans with fixed interest rates and constant, regular repayment of interest. The contract specifies the lifetime of the loan (or maturity) and the interest payments (or coupons) and states the method of capital repayment. The capital is usually repaid at the terminal date (or in fine). The coupon bond has a market price at any date after the issuing date 0. Figure 1.3 displays the bar chart at issuing date 0. If the coupon bond is traded at any date t between 0 and the maturity date T, the bar chart needs to be redrawn. The reason is that the sequence of residual cash flows is altered, since some payments prior to t have already been made. Therefore, intuitively, the price p_t differs from the issuing price p_0.

Stocks

Stocks are assets that represent equity shares issued by individual companies. They give shareholders the power to voice their opinion on the

Figure 1.4 Stock Indefinitely Held



Figure 1.5 Stock Sold at t + 1

policy of the firm via their voting rights and to receive a part of the firm's profits (dividends). If we disregard the value of the right to vote, the current price S_t of a stock is equivalent to the sequence of future dividends, the amounts and payment dates of which are not known at t. Figure 1.4 provides the bar chart representing an indefinitely held stock, whereas Figure 1.5 provides the bar chart of a stock sold at t + 1.

Buying and Selling Foreign Currency

To demonstrate transactions that involve buying and selling foreign currency, let us denote by X_t the exchange rate between the US dollar and the Euro at date t. We can buy 1 Euro at t for X_t dollars and sell it at t + 1 for X_{t+1} dollars. The bar chart of Figure 1.6 differs from the one that illustrates a zero-coupon bond because the future exchange rate (i.e., the amount of the cash flow at t + 1) is not known at date t.

Forward Asset

Let us consider a simple asset, such as an IBM stock. A forward buy contract of this stock at date t and maturity H represents a commitment of a trader


Figure 1.6 Buying and Selling of Foreign Currency


to buy the stock at t + H at a predetermined price. Therefore, the buyer starts receiving the dividends after t + H. The existence of forward assets allows stripping the sequence of stock-generated cash flows before and after t + H (Figure 1.7).

Options

Options are contingent assets that give the right to make a future financial transaction, as described in the following example. A European call on IBM stock with maturity T and strike K gives the opportunity to buy an IBM stock at T at a predetermined price K. The cash flow received by the buyer at T is

F_T = \max(S_T - K, 0) = (S_T - K)^+,

Figure 1.7 Stripping of a Stock (bar charts: buying a stock; forward contract; residual strip)


where S_T is the price at T of the IBM stock. Therefore, this cash flow is uncertain and depends on the future value S_T. It is equal to S_T - K if S_T > K and is zero otherwise. These two outcomes are illustrated in Figure 1.8. Here, C_t(T,K) denotes the price at t of the European call.
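The two outcomes of the call payoff can be sketched directly (an illustrative snippet, not from the book; the numerical values are hypothetical):

```python
# Terminal cash flow of a European call: F_T = max(S_T - K, 0) = (S_T - K)^+.

def call_payoff(s_T, strike):
    """Cash flow received by the call buyer at maturity T."""
    return max(s_T - strike, 0.0)

# The two outcomes illustrated in Figure 1.8 (hypothetical numbers):
assert call_payoff(120.0, 100.0) == 20.0  # S_T > K: exercise, receive S_T - K
assert call_payoff(80.0, 100.0) == 0.0    # S_T <= K: the call expires worthless
# The payoff never exceeds S_T itself, since (S_T - K)^+ <= S_T for K >= 0.
assert all(call_payoff(s, 100.0) <= s for s in (80.0, 100.0, 120.0))
```

The last assertion previews the price inequality discussed below: an asset whose cash flows are systematically smaller should have a lower price.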

1.2 Financial Theory

Financial theory describes the optimal strategies of portfolio management, risk hedging, and the diffusion of newly tailored financial assets. Recently, significant progress has been made in the domain of market microstructure, which explores the mechanisms of price formation and market regulation. In this section, we focus attention on the theoretical aspects of dynamic modeling of asset prices. We review some basic theoretical concepts, not all of which are structural.

1.2.1 Actuarial Approach

Figure 1.8 European Call

The actuarial approach assumes a deterministic environment and emphasizes the concept of the fair price of a financial asset. As an illustration, let us consider at date 0 a stock that provides future dividends d_1, d_2, ..., d_T at predetermined dates 1, 2, ..., T. In a deterministic environment, the stock price has to coincide with the discounted sum of future cash flows:

S_0 = \sum_{t=1}^{T} d_t B(0,t),    (1.1)

where B(0,t) is the price of the zero-coupon bond with maturity t. Moreover, if the short-term interest rate r_0 is assumed to be constant at all maturities, the above formula becomes

S_0 = \sum_{t=1}^{T} \frac{d_t}{(1 + r_0)^t}.    (1.2)
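As a quick numerical sketch of formulas (1.1) and (1.2) (the dividend stream and rate below are hypothetical, not from the book):

```python
# Fair price of a stock as the discounted sum of future dividends.

def fair_price_constant_rate(dividends, r0):
    """Formula (1.2): S_0 = sum_{t=1..T} d_t / (1 + r0)**t."""
    return sum(d / (1.0 + r0) ** t for t, d in enumerate(dividends, start=1))

def fair_price_zero_coupons(dividends, discount_bonds):
    """Formula (1.1): S_0 = sum_{t=1..T} d_t * B(0, t)."""
    return sum(d * b for d, b in zip(dividends, discount_bonds))

dividends = [5.0, 5.0, 5.0]   # d_1, d_2, d_3 (hypothetical)
r0 = 0.05
# With a flat rate, B(0,t) = 1/(1+r0)^t, so the two formulas agree.
bonds = [(1.0 + r0) ** -t for t in range(1, 4)]
assert abs(fair_price_constant_rate(dividends, r0)
           - fair_price_zero_coupons(dividends, bonds)) < 1e-12
```

This only reproduces the deterministic calculus; as the text explains next, it ignores the uncertainty about future dividends and rate variation.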

Formulas (1.1) and (1.2) are the essential elements of the actuarial calculus. However, they are in general not confirmed by empirical evidence. The reason is that formulas (1.1) and (1.2) do not take into account the uncertainty about future dividends and the time variation of the short-term interest rate. Some ad hoc extensions of the actuarial formulas have been proposed in the literature to circumvent this difficulty in part. For instance, the literature on expectation models has come up with the formula

S_0 = \sum_{t=1}^{T} \frac{E_0(d_t)}{(1 + r_0)^t},    (1.3)

in which future dividends are replaced by their expectations evaluated at date 0. However, the pricing formula (1.3) again disregards the uncertainty about future dividends. Intuitively, the larger this uncertainty, the greater the risk on future cash flows. Hence, the observed price will likely include a risk premium to compensate investors for bearing risk. An alternative extension assumes the existence of a deterministic relationship between derivative prices (e.g., an option written on a stock) and the price of an underlying asset (e.g., the stock). This approach is known as the complete market hypothesis, which underlies, for instance, the well-known Black-Scholes formula. The existence of deterministic relations between asset prices is also not confirmed by empirical research. Essentially, the merit of concepts such as the fair price or a deterministic relationship between prices lies more in their theoretical appeal than in their empirical relevance.

1.2.2 Absence of Arbitrage Opportunity

Let us consider two financial assets; the first provides systematically, at predetermined dates, cash flows of smaller amounts than the second. Naturally, we would expect the first asset to have a lower price.


For instance, the price C_t(T,K) of a European call with maturity T and strike K, written on an underlying asset with price S_t, should be less than S_t: its cash flow at the maturity date, (S_T - K)^+, is indeed less than S_T. The inequality between prices is a consequence of the absence of arbitrage opportunity (AAO), which assumes the impossibility of achieving a sure, strictly positive gain from a zero initial endowment. Thus, the AAO principle suggests imposing deterministic inequality restrictions on asset prices.

1.2.3 Equilibrium Models

In the approach of equilibrium models, market prices arise as outcomes of the equilibrium of aggregate asset demand and supply. The equilibrium models are rather complicated due to the presence of assumptions on investor behavior and the traded volumes involved in the analysis. Various equilibrium models can be distinguished with respect to the assumptions on individual behavior. The basic differences among them can be briefly outlined as follows. The standard Capital Asset Pricing Model (CAPM) assumes the existence of a representative investor. The equilibrium condition concerns only a limited number of financial assets. The Consumption-Based Capital Asset Pricing Model (CCAPM) instead supposes joint equilibrium of the entire market of financial assets and of a market for a single consumption good. Market microstructure theory focuses on the heterogeneity of economic agents by distinguishing different categories of investors. This classification is based on access to information about the market and therefore makes a distinction between the informed investors, the uninformed investors (the so-called liquidity traders), and the market makers. Microstructure theory also explains the transmission of information between these groups during the process of convergence toward equilibrium.

1.2.4 Predictions

The efficiency of portfolio management and risk control depends on the accuracy of several forecasted variables, such as asset prices and their time-varying variance (called the volatility). A significant part of financial theory relies on the random walk hypothesis, which assumes that the history of prices contains no information useful for predicting future returns. In practice, however, future returns can often be inferred from past prices and volumes, especially when nonlinear effects are accounted for. There exist various methods to examine nonlinear temporal dependence, such as technical analysis or the time series analysis of autoregressive conditionally heteroscedastic (ARCH) processes.


1.3 Statistical Features

Statistical methods for the estimation and forecasting of prices, returns, and traded volumes exploit the informational content of past observations. We give below a few insights on the various types of variables used in statistical analysis and on the selection of sampling schemes and methodology.

1.3.1 Prices

Many financial time series represent prices of financial assets. It is important to understand the nature of the available data before proceeding to statistical analysis. The mechanisms of financial markets do not differ substantially from those of standard goods markets (see Chapter 14 for more details). The trades are generated by buyers and sellers, whose demand and supply are matched directly by computer systems or by an intermediary. On some stock markets, called order driven, the prices offered by traders who wish to buy or sell (i.e., the quotes) are displayed on computer screens accessible to the public. The quotes are ranked starting with the best bid (proposed buy price) and the best ask (proposed sell price). This type of market includes the Toronto Stock Exchange (TSE) and the Paris Bourse (PB), for example. On other stock markets, such as the New York Stock Exchange (NYSE) and the National Association of Securities Dealers Automated Quotation (NASDAQ), asks and bids are determined by market makers and include their commissions. The price at which assets are effectively exchanged can therefore be equal to the bid, the ask, or even a different amount, especially in the presence of market makers. Accordingly, the price records may contain the bids, asks, and/or traded prices. Also, prices per share depend not only on the exchanged assets and times of trade, but also on the traded quantities (volume) and individual characteristics of investors, and they may eventually include the commission of an intermediary. Moreover, in particular cases, the publicly displayed prices may differ from the true trading prices. Therefore, even on well-organized financial markets for which information is accurate and available online in real time, it is important to know the genuine content of price records. In particular, we have to consider the following questions:

1. Do the available data contain the true trading prices, quotes, or proxies for trading prices computed as geometric averages of bids and asks?
2. Empirical analysis may occasionally concern separately the buyer-initiated (ask) or seller-initiated (bid) trades. In such cases, only sequences of ask and bid prices (signed transactions) need to be extracted from the records.
11

Statistical Features

3. Do the prices include transaction costs or commissions of intermediaries? Are they corrected for the tax transfers effectuated by either the buyer or seller? 4. Is the market sufficiently liquid to eliminate noncompetitive effects in price formation? This issue arises in the empirical analysis of infrequently traded assets. 5. Have the prices been adjusted for inflation to facilitate their comparison at different dates? This question is especially important for bonds with coupon payments that commonly are discounted. 1.3.2 Frequency of Observations

Recent expansion of financial markets has entailed increasing numbers and frequencies of trades due to the implementation of electronic ordermatching systems. Until the early 1980s, data on prices were registered daily at either market openings or market closures. Accordingly, daily traded volumes were also recorded. Therefore, a sample spanning, for example, four years of asset trading would amount to about 1,000 daily observed prices (there are about 250 working days per year). The electronic systems now allow instantaneously updated records to be kept of all transactions. They register on computer screens all movements that reflect all changes in the list of queued orders (called the order book) and have an accuracy of a fraction of one second. Therefore, the size of data files comprising the so-called tick-by-tick data or highfrequency data may be extremely large. A four-year sample may contain more than 1 million records on trades of a liquid stock or more than 3 million records on exchange rates. Since transaction records are made at various times and are not necessarily integer multiples of a time unit such as one day, the timing of trades requires particular consideration. It is important to distinguish the price data indexed by transaction counts from the data indexed by time of associated transactions. Empirical evidence suggests that the price dynamics in calendar time and in transaction time differ significantly. The comparison of both sampling scales provides insights into the trading activity of an asset and its liquidity. 1.3.3 Definition of Returns

Time series of asset prices display a growing tendency in the long run. Occasionally, however, price series may switch from upward to downward movements and vice-versa in the short or middle run. For this reason, prices of the same asset sampled at different periods of time may exhibit unequal means. Since this feature greatly complicates statistical inference, it needs to be eliminated. A simple approach consists in transforming the prices into returns, which empirically display more stationary behavior.

Let us consider a financial asset with price p_t at date t that produces no dividends. Its return over the period (t, t+H) is defined as

    r(t, t+H) = (p_{t+H} - p_t) / p_t.                         (1.4)

The return depends on time t and the horizon H. Very often, statistical analysts investigate returns at a fixed unitary horizon:

    r(t, t+1) = (p_{t+1} - p_t) / p_t,                         (1.5)

which in general display more regular patterns than the initial series of prices. In theoretical or econometric analysis, the above formula is often replaced by the following approximation. Let us suppose the unitary horizon and a series of low-value returns; we obtain

    r̃(t, t+1) = log p_{t+1} - log p_t = log(p_{t+1}/p_t)
              = log(1 + (p_{t+1} - p_t)/p_t)
              ≈ (p_{t+1} - p_t)/p_t = r(t, t+1).

The returns defined in (1.5) are used by banks, various financial institutions, and investors in financial markets. The differences of price logarithms conventionally represent the returns examined by researchers. However, it is important to note that

    r̃(t, t+1) = log(1 + r(t, t+1)) ≈ r(t, t+1) - r(t, t+1)²/2,

when we consider the expansion at order two. Therefore, the approximation r̃(t, t+1) undervalues the true return and may induce a significant bias due to replacing the theoretical definition of returns in (1.5) by the approximation.
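To make the size of this bias concrete, here is a short numerical check (our illustrative sketch, not from the text; the function names are ours):

```python
import math

# Simple net return r(t, t+1) of (1.5) versus the log-return
# approximation r~(t, t+1) = log(p_{t+1}) - log(p_t).
def simple_return(p_t, p_next):
    return (p_next - p_t) / p_t

def log_return(p_t, p_next):
    return math.log(p_next) - math.log(p_t)

p_t, p_next = 100.0, 110.0          # a 10% price increase
r = simple_return(p_t, p_next)      # 0.10
r_log = log_return(p_t, p_next)     # log(1.1), about 0.0953

# Second-order expansion: r~ is roughly r - r**2 / 2, so the log
# return undervalues the simple return by about r**2 / 2.
print(r - r_log)                    # about 0.0047
print(r * r / 2)                    # 0.005
```

The discrepancy is negligible for daily returns but becomes material for large price moves.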

1.3.4 Historical and Dynamic Analysis

The distributional properties of returns provide valuable insights on their future values. The analysis can be carried out in two frameworks. The static (historical) approach consists of computing marginal moments, such as the marginal mean and variance, from a sample of past returns and using these statistics as indicators of future patterns. The dynamic approach concerns the conditional distribution and conditional moments, such as the conditional mean and variance. These are assumed to vary in time, so that at each date t, new estimates need to be computed conditional on past observations. The conditioning is necessary whenever there are reasons to believe that the present returns, to some extent, are determined by the past ones. By the same argument, future returns depend on present and past returns as well, and their values can be used for forecasting.

Historical Approach

The historical approach explores the marginal distribution of returns. For instance, let us consider the series of returns on a single asset y_t = r(t, t+1). The expected return is evaluated from the data on past returns by

    Ê(y_t) = (1/T) Σ_{t=1}^{T} y_t = ȳ_T,

whereas the variance of the return is approximated by

    V̂(y_t) = (1/T) Σ_{t=1}^{T} (y_t - ȳ_T)².

The historical approach can be refined by applying rolling estimators. Implicitly, this procedure assumes that marginal distributions of returns vary in time. It is implemented by introducing a window of a fixed length K and approximating the expected return at t by the rolling average:

    Ê(y_t) = (1/K) Σ_{k=0}^{K-1} y_{t-k} = (1/K)(y_t + y_{t-1} + ... + y_{t-K+1}).
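These estimators are straightforward to code; the sketch below (our illustration, with hypothetical data) computes the full-sample mean and variance and the rolling mean with window K:

```python
# Historical (full-sample) and rolling estimators of the expected return.
def historical_mean(y):
    return sum(y) / len(y)

def historical_variance(y):
    # divides by T, matching the formula in the text
    m = historical_mean(y)
    return sum((x - m) ** 2 for x in y) / len(y)

def rolling_means(y, K):
    # rolling average over the last K observations, for t = K-1, ..., T-1
    return [sum(y[t - K + 1 : t + 1]) / K for t in range(K - 1, len(y))]

returns = [0.01, -0.02, 0.015, 0.003, -0.007, 0.012]   # hypothetical sample
print(historical_mean(returns))
print(historical_variance(returns))
print(rolling_means(returns, K=3))
```

Each step of the rolling window adds the newest observation and drops the oldest one, as described next.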

On the transition from t to t+1, the approximation of the expected return is updated by adding a new observation y_{t+1} and deleting the oldest one, y_{t-K+1}.

Conditional Distribution

The analysis of the marginal distributions of returns is adequate for processes with a history that provides no information on their current values. In general, the expected values and variances of returns are partly predictable from the past. This property is called temporal dependence and requires a dynamic approach, which consists of updating the conditional moments in time by conditioning them on past observations. Very often, the analysis is limited to the first- and second-order conditional moments:

    E(y_t | y_{t-1}, y_{t-2}, ...)  and  V(y_t | y_{t-1}, y_{t-2}, ...),

where (y_{t-1}, y_{t-2}, ...) denotes the information available at date t-1. Although the conditional moments are more difficult to approximate, in practice they yield more accurate forecasts.

Horizon and Observation Frequency

The conditional distribution may be used for predicting future returns at various horizons and sampling frequencies. While the predictions of future returns may not always be improved by conditioning on the past, the conditional expectations often yield better outcomes than the historical expectations. For illustration, we discuss below the prediction accuracy in computing the conditional variance of prices, called the price volatility.

Let us first assume that prices are observed at integer-valued dates. The price volatilities at date t can be computed at one, two, or more units of time ahead:

    V(p_{t+1} | p_t, p_{t-1}, p_{t-2}, ...)   at horizon 1,
    V(p_{t+2} | p_t, p_{t-1}, p_{t-2}, ...)   at horizon 2,
    V(p_{t+H} | p_t, p_{t-1}, p_{t-2}, ...)   at horizon H.

This approach allows examination of the dependence of volatility on the forecast horizon (the so-called term structure of volatilities). If prices are observed every two units of time and t is even, the volatility at horizon 2 is

    V(p_{t+2} | p_t, p_{t-2}, p_{t-4}, ...).

It differs from the previously given volatility at horizon 2 in terms of the content of the conditioning set, for which observations at odd dates are omitted. The above discussion suggests that price volatility is a complex notion comprised of the effects of time, horizon, and sampling frequency.

1.3.5 Nonlinearity

The complexity of financial time series has motivated research on statistical methods that allow accommodation of nonlinear dynamics. The nonlinear patterns result from the specificity of financial products and the complexity of strategies followed by investors. We give below some insights on the nature of nonlinearities encountered in theory and/or documented by empirical research.

Nonlinearity of the Variable to Be Predicted

Let us provide two examples of the nonlinearity of the variable to be predicted. First, market risk is related to the volatility of returns, commonly approximated by squared returns. Therefore, the variable to predict is a power function of the asset price. Second, there exist derivative assets with definitions that involve nonlinear transformations of the prices of underlying assets. For instance, the pricing formula of a European call is based on an expectation of (S_T - K)^+, which is a nonlinear transform of the stock price.

Nonlinearity of the Relationships between Prices
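As an illustration of the second point, the quantity to predict for a European call involves the nonlinear payoff (S_T - K)^+; a minimal Monte Carlo sketch (our own toy example, under an arbitrary lognormal price model, not a pricing formula from the text) is:

```python
import math
import random

def call_payoff(s_T, strike):
    # (S_T - K)^+: a nonlinear transform of the terminal stock price
    return max(s_T - strike, 0.0)

random.seed(0)
s0, strike, sigma = 100.0, 105.0, 0.2     # arbitrary illustrative values
# crude estimate of E[(S_T - K)^+] under a toy lognormal model
draws = [s0 * math.exp(-0.5 * sigma ** 2 + sigma * random.gauss(0, 1))
         for _ in range(100_000)]
print(sum(call_payoff(s, strike) for s in draws) / len(draws))
```

Because the payoff is kinked at the strike, its expectation cannot be recovered from the mean and variance of S_T alone.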

Even though prices of a derivative and of an underlying asset do not generally satisfy a deterministic relationship, they likely are randomly and nonlinearly related. For instance, the price of a European call C(t,K) and the price S_t satisfy nonlinear inequality constraints due to the requirement of the AAO.

Nonlinearity with Respect to Parameters

Empirical evidence suggests that both the marginal and the conditional return distributions feature departures from normality. Essentially, research has documented the asymmetry of distributions and fat tails, implying a high probability of observing extreme returns. For this reason, standard analysis based on linear regression models, which involves the first two moments only, may be insufficient or even misleading in many financial applications.

Nonlinearity of the Dynamics

The observed dynamics of returns feature several nonlinear patterns. By looking at a trajectory of returns sampled daily or at a higher frequency, one can easily observe time-varying dispersion of returns around the mean or, equivalently, their time-varying variance (volatility). The first observation of this type was made by Mandelbrot in the early 1960s, who empirically found that large returns (positive or negative) have a tendency to be followed by large returns and that small returns have a tendency to be followed by small ones of either sign. This phenomenon is known as volatility clustering and points out not only the variation, but also the persistence of volatility. During the last twenty years, estimation and prediction of volatility dynamics have been given considerable attention and have resulted in a large body of literature on models with conditional heteroscedasticity. Technically, future squared returns are represented as functions of past squared returns, and nonlinearity arises from the presence of power functions. In more recent developments, temporal dependence in volatility has been associated with regime switching, which means that episodes of high or low returns are explained by movements of a latent variable that admits a finite number of discrete states.

Nonlinearity of the Financial Strategies

The myopic or intertemporal optimizations of investors for dynamic portfolio management, hedging, and risk control are nonlinear with respect to the expected future evolution of prices. Then, at equilibrium, the behavior of investors induces nonlinear effects on future prices.
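The recursion just described, in which today's variance is driven by yesterday's squared return, can be illustrated with a minimal ARCH(1)-type simulation (the parameter values are arbitrary and this is only a sketch; conditional heteroscedasticity models are treated in Chapter 6):

```python
import math
import random

random.seed(1)
omega, alpha = 0.1, 0.8               # arbitrary illustrative parameters
eps_prev_sq = omega / (1 - alpha)     # start at the unconditional variance
returns = []
for _ in range(1000):
    sigma2 = omega + alpha * eps_prev_sq           # variance depends on the
    eps = math.sqrt(sigma2) * random.gauss(0, 1)   # past squared return
    returns.append(eps)
    eps_prev_sq = eps * eps

# Large absolute returns tend to cluster: successive squared returns
# are positively correlated even though the returns themselves are not.
print(len(returns))   # 1000
```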

2 Univariate Linear Models: The AR(1) Process and Its Extensions

IN THIS CHAPTER, we introduce elementary time series models for estimation and forecasting of financial data. We begin the discussion with a simple autoregressive model that provides a good fit to various series of returns defined as logarithmic price changes sampled monthly or at a lower frequency. The so-called AR(1) (first-order autoregressive) model is extended to a general class of autoregressive moving average (ARMA) models later in the text.

Technically, two sets of basic constraints are imposed on time series for feasibility of inference and forecasting. The first one requires the time invariance of the first two moments of the marginal distribution. The second one concerns the dynamics and assumes the same type of temporal dependence across the sample. A time series that satisfies these conditions is called second-order stationary or simply stationary. Statistical analysis of time series consists of exploiting temporal dependence in stationary processes to build forecasts of their future values based on information available for past and present observations. This predictability property characterizes dynamic processes in the class of ARMA processes.

There also exist processes that do not exhibit any relationship among their past, present, and future realizations. They are called white noise processes. A weak white noise is defined as a sequence of uncorrelated variables of mean zero and variance σ². It is an elementary time series, usually denoted WN(0,σ²); it appears in this chapter as a building block of more complex structures.


In the first section, we introduce the autoregressive process of order 1 and study its temporal dependence by means of dynamic multipliers and autocorrelations. Statistical inference is discussed in the second section; we introduce various tests of the white noise hypothesis as well. Section 2.3 presents the effects of modifications in the sampling frequency on the autoregressive representation and introduces the continuous time analogue of the AR(1) model. The limiting case of a unit root process is covered in Section 2.4 in relationship to the so-called martingale hypothesis. In the last section, we discuss the class of ARMA processes.

It has to be emphasized that linear ARMA models represent processes with time-varying conditional means and implicitly assume constant conditional variances. This assumption is not satisfied by financial series, which typically feature time-varying variances (volatility). Models that take into account the time-varying volatility are discussed in Chapter 6. In practice, however, such models are often applied to residuals of ARMA models estimated in a first step. Therefore, the reader needs to be aware of the simplifying assumption made in this chapter. This approach entails some consequences with respect to the variances of estimators, performances of test statistics, and validity of critical values and prediction intervals.

2.1 Definition and Dynamic Properties

2.1.1 The Autoregressive Process and Its Moving Average Representation

DEFINITION 2.1: The series (y_t, t ∈ Z) follows an autoregressive process of order 1, denoted AR(1), if and only if it can be written as

    y_t = ρ y_{t-1} + ε_t,

where (ε_t, t ∈ Z) is a weak white noise with variance V(ε_t) = σ², and ρ is a real number of absolute value strictly less than 1. The coefficient ρ is called the autoregressive coefficient.

The dynamics of the autoregressive model are very straightforward. The current value of the series (y_t) is determined by two components. The first one represents the past effect and is determined by the history of the process. The relevant history is limited, however, to the last realization y_{t-1} only, and the impact of this variable is attenuated to some extent by the autoregressive coefficient |ρ| < 1. The second component can be viewed as a random shock that occurs at time t. It is called the innovation and is not observable. By solving the autoregressive equation recursively, the current value y_t can be expressed in terms of the current and lagged shocks.

PROPOSITION 2.1: The autoregressive process of order 1 can be written as

    y_t = ε_t + ρ ε_{t-1} + ρ² ε_{t-2} + ...

This is the (infinite) moving average (MA(∞)) representation of the AR(1) process, and ρ^h is the moving average coefficient of order h.

The moving average coefficients can be interpreted as follows. Let us consider a "transitory shock" δ(ε_0) at time 0 (say) that adds up to the initial innovation, transforming ε_0 into ε_0 + δ(ε_0). In consequence, the future values of the process y_h accordingly become y_h + δ(y_h), where

    δ(y_0) = δ(ε_0), δ(y_1) = ρ δ(ε_0), ..., δ(y_h) = ρ^h δ(ε_0), ...

The moving average coefficient ρ^h alters the impact of the additional shock on future values of y. Since |ρ| < 1, the shock effect (called the multiplier effect) decreases asymptotically to 0:

    lim_{h→∞} ρ^h = lim_{h→∞} δ(y_h)/δ(ε_0) = 0,

and ultimately dies out (Figure 2.1).
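The multiplier effect can be checked directly by simulating two AR(1) paths that share the same innovations except for an extra shock δ at time 0 (our sketch with arbitrary values, in the spirit of Figure 2.1):

```python
import random

random.seed(2)
rho, delta, T = 0.8, 1.0, 10
eps = [random.gauss(0, 1) for _ in range(T + 1)]

def ar1_path(shocks, rho):
    # y_t = rho * y_{t-1} + e_t, started at y = 0 before the first shock
    y, path = 0.0, []
    for e in shocks:
        y = rho * y + e
        path.append(y)
    return path

base = ar1_path(eps, rho)
bumped = ar1_path([eps[0] + delta] + eps[1:], rho)

# The gap between the two paths at horizon h is exactly rho**h * delta.
for h in range(T + 1):
    assert abs((bumped[h] - base[h]) - rho ** h * delta) < 1e-9
```

Raising ρ toward 1 makes the gap die out more slowly, which is the persistence effect discussed below.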

REMARK 2.1: The previous results can be extended by including a constant term in the autoregressive model: y_t = c + ρ y_{t-1} + ε_t. Equivalently, the autoregressive equation can be written as y_t - m = ρ(y_{t-1} - m) + ε_t, where m = c/(1 - ρ), to show that the above results hold for the demeaned process (y_t - m).

[Figure 2.1: Multipliers, ρ = 0.5, 0.8, 0.9]

2.1.2 The First- and Second-Order Moments

The condition |ρ| < 1 ensures the existence of the first- and second-order marginal moments of (y_t). It also guarantees their time invariance, which is necessary for the second-order stationarity of the process defined in the following proposition.

PROPOSITION 2.2: The autoregressive process of order 1 is such that

(i) E(y_t) = 0, ∀t;

(ii) Cov(y_t, y_{t-h}) = σ² ρ^|h| / (1 - ρ²), ∀t,h; in particular, V(y_t) = σ² / (1 - ρ²);

(iii) ρ(t,h) = ρ^|h|, ∀t,h;

(iv) (y_t) is second-order stationary.

PROOF:

(i) We have E(y_t) = Σ_{l=0}^{∞} ρ^l E(ε_{t-l}) = 0.

(ii) Let us assume that h ≥ 0. The autocovariances are defined by

    Cov(y_t, y_{t-h}) = Cov( Σ_{l=0}^{∞} ρ^l ε_{t-l}, Σ_{k=0}^{∞} ρ^k ε_{t-h-k} )
                      = Σ_{l=0}^{∞} Σ_{k=0}^{∞} ρ^{l+k} Cov(ε_{t-l}, ε_{t-h-k})
                      = Σ_{k=0}^{∞} ρ^{h+2k} V(ε_{t-h-k})    (since the white noise is uncorrelated)
                      = ρ^h σ² / (1 - ρ²).

(iii) Therefore, the autocorrelations are power functions of the autoregressive coefficient: ρ(t,h) = Cov(y_t, y_{t-h}) / V(y_t) = ρ^|h|.

(iv) Finally, since E(y_t) and Cov(y_t, y_{t-h}) do not depend on the time index t, the process satisfies the condition of second-order stationarity. QED

Proposition 2.2 implies that the mean and variance of an AR(1) process remain constant in time. This follows from Proposition 2.2 (i) and (ii) since, for h = 0, the covariance of y_t with itself is equal to the variance of y_t. This stationarity condition concerns the marginal distribution of the process. The dynamic aspect of stationarity concerns the behavior of autocovariances at h ≠ 0. The covariances between the realizations of a stationary time series separated by h units of time are functions of the distance in time h only. They do not depend on the timing of observations or on their indexes in the sample. For example, in a sample of daily observations on market returns, the covariances of two consecutive returns have to be constant over the whole sample, no matter how long it is and how many days it spans. As well, the covariances of each pair of observations separated by h units of time (say one, two, or three days) have to be time invariant.

The formula of autocovariances indicates that the marginal variance of (y_t) is a function of both σ² and ρ. As a function of ρ, it increases with |ρ| and tends to infinity when ρ approaches the limiting values +1, -1 (Figure 2.2).

[Figure 2.2: Marginal Variance as a function of ρ]

The autocorrelations are obtained by dividing the autocovariances by the variance of y_t. The sequence of autocorrelations, considered a function of integer-valued lags h, is called the autocorrelation function (ACF) (Figure 2.3). For stationary processes, it decreases exponentially to 0. The rate of decay is slow when the absolute value of ρ is large and is fast in the opposite case. Like autocovariances, the autocorrelations describe the memory of a time series in terms of temporal dependence between realizations separated by a varying number h of time units.

As mentioned, the autoregressive parameter can be viewed as the persistence measure of an "additional transitory shock." This effect is observable from Figures 2.2 and 2.3. An increase of the autoregressive parameter ρ results in higher autocorrelations and stronger persistence of past shocks. It also has an immediate effect on the marginal variance V(y_t) = σ²/(1 - ρ²), since this expression is an increasing function of |ρ|.

The persistence effect of ρ can be observed in simulated trajectories of autoregressive processes. Figures 2.4 and 2.5 display various AR(1) paths generated from a Gaussian white noise with unitary variance. More precisely, we consider independent drawings u_t^s, t = -1000, ..., T, from the standard normal distribution. The simulated path for given values of the parameters ρ and σ² is defined recursively by y_t^s(ρ,σ²) = ρ y_{t-1}^s(ρ,σ²) + σ u_t^s, t = -999, ..., T, with the initial condition y_{-1000}^s(ρ,σ²) = 0, and formed by observations with nonnegative time indexes t ≥ 0. The stretch of simulations was initiated at the origin -1000, far away from 0, to eliminate at t = 0 a too strong effect of an arbitrary initial condition.
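The simulation scheme just described can be sketched as follows (our code; the burn-in from t = -1000 discards the influence of the arbitrary initial condition):

```python
import random

def simulate_ar1(rho, sigma, T, burn_in=1000, seed=0):
    # y_t = rho * y_{t-1} + sigma * u_t, u_t ~ N(0,1), started at 0 far in
    # the past so that observations t = 0, ..., T are near-stationary.
    rng = random.Random(seed)
    y = 0.0
    path = []
    for t in range(-burn_in, T + 1):
        y = rho * y + sigma * rng.gauss(0, 1)
        if t >= 0:
            path.append(y)
    return path

path = simulate_ar1(rho=0.9, sigma=1.0, T=200)
print(len(path))   # 201 observations with nonnegative time index
```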


... the unit root hypothesis is rejected at the 5% level if T(1 - ρ̂_T) > 14.1 for the test based on ρ̂_T, and is rejected if τ̂_T > 2.86 for the test based on τ̂_T.

PROPOSITION 2.7: If (y_t) is an I(1) process with drift, then:

(i) ρ̂_T tends asymptotically to 1, and the estimated drift to the true drift;

(ii) the rates of convergence are 1/T;

(iii) the tests of the unit root hypothesis based on ρ̂_T and τ̂_T at the 5% level consist of rejecting the unit root hypothesis if T(1 - ρ̂_T) > 21.8 and rejecting the unit root hypothesis if τ̂_T > 3.41, respectively.

The testing procedures introduced in Propositions 2.6 and 2.7 are known as Dickey-Fuller tests (Dickey and Fuller 1979, 1981). The nonstationarity of the process (y_t) explains a faster rate of convergence of the OLS estimator of ρ compared to the typical square-root-of-T convergence shown in Section 2.2. Indeed, the variance of ρ̂_T asymptotically tends to 0 at rate 1/T²; that is, it vanishes with the increasing sample size at a faster rate than in a stationary process. This is called the superconsistency.
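The statistic T(1 - ρ̂_T) is simple to compute; the sketch below (our code, using the no-constant OLS regression of y_t on y_{t-1} and the 5% critical value 14.1 quoted for the no-drift case) illustrates it on a simulated random walk:

```python
import random

def unit_root_stat(y):
    # OLS slope of y_t on y_{t-1} (no constant), then T * (1 - rho_hat)
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    rho_hat = num / den
    T = len(y) - 1
    return T, rho_hat, T * (1 - rho_hat)

random.seed(3)
y = [0.0]
for _ in range(500):                  # simulate a pure random walk
    y.append(y[-1] + random.gauss(0, 1))
T, rho_hat, stat = unit_root_stat(y)
# Under the unit root, the statistic rarely exceeds the 5% critical
# value 14.1 of the no-drift Dickey-Fuller test.
print(rho_hat, stat)
```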

2.4.4 The Martingale Hypothesis

The I(1) hypothesis is related to the so-called martingale hypothesis, which plays a crucial role in asset pricing.

DEFINITION 2.3: A process (y_t, t ∈ N) is a martingale if and only if E_t(y_{t+1}) = y_t, ∀t ≥ 0, where E_t denotes the conditional expectation given the information (y_t, y_{t-1}, ...). Equivalently, this condition can be written as

    y_{t+1} = y_t + ε_{t+1},                    (2.14)

where the process (ε_t, t ≥ 0) satisfies

    E_t(ε_{t+1}) = 0, ∀t.                       (2.15)

Here, ε is called a martingale difference sequence. It is easy to check that a martingale difference sequence automatically has mean zero and is uncorrelated. Therefore, whenever the variance of ε_t is time independent, the process (ε_t, t ≥ 0) is a weak white noise, and the martingale process (y_t, t ≥ 0) is integrated of order 1. Due to imposing (2.15) instead of E(ε_t) = 0, Cov(ε_t, ε_{t-h}) = 0, ∀h ≠ 0, the martingale condition is stronger than the I(1) condition.

The martingale condition of prices implies that the best (nonlinear) prediction of the future price is the current price. The current price conveys all information that can help to predict y_{t+1} and can be directly used as a predictor. This property is also valid for predictions at larger horizons h. Indeed, by the law of iterated expectations, we get

    E_t(y_{t+h}) = y_t, ∀h ≥ 0.                 (2.16)

The role of martingales in finance becomes clear when we consider some complex financial strategies. Let us consider a risk-free asset with a price that is constant and equal to 1 and a risky asset with a price p_t that follows a martingale process. Let W_0 denote an initial endowment. It can be invested in both assets, and the portfolio allocations can be regularly updated. Let us denote by ... W_0, which is contradictory. This explains the equivalence of the terms efficient market hypothesis and martingale hypothesis. The market is efficient if even a skilled investor has no sure advantage.

2.5 The Autoregressive Moving Average Processes

The AR(1) model belongs to a wide class of models that represent conditional mean dynamics. This class of ARMA models combines the autoregressive and moving average patterns.

The Wold Theorem

The Wold theorem plays a central role in time series analysis. It implies that the dynamics of any second-order stationary process can be arbitrarily well approximated by a moving average model.


DEFINITION 2.4: The process (y_t) is second-order stationary (or weakly stationary) if

(i) its mean is time independent, E(y_t) = m;

(ii) the autocovariance Cov(y_t, y_{t-h}) = γ(h) depends only on the absolute value of the difference of the time indexes.

Under some mild regularity conditions, a second-order stationary process can always be expressed as a linear function of current and past values of a weak white noise.

PROPOSITION 2.9, WOLD THEOREM: Any second-order stationary process (y_t, t ∈ Z) [such that lim_{h→∞} LE(y_t | y_{t-h}) = E(y_t)] can be written as

    y_t = m + ε_t + a_1 ε_{t-1} + ... + a_h ε_{t-h} + ... = m + Σ_{h=0}^{∞} a_h ε_{t-h}  (with a_0 = 1),

where (ε_t, t ∈ Z) is a weak white noise, and the coefficients are square summable, that is, they satisfy Σ_{h=0}^{∞} a_h² < +∞.

Thus, any process can be written as a moving average of order infinity, possibly with a constant, under a regularity condition that requires that the very distant past has no impact on the current value.

2.5.2 Various Representations

Autoregressive Moving Average Model

The rationale for an ARMA representation of second-order stationary processes is approximation of the above infinite-order moving average by a model with a finite number of parameters. In the rest of this section, we assume for simplicity that all processes have zero mean. In the case when the mean m is different from zero, y_t can always be transformed into y_t - m, so that all results hold for the demeaned process.

DEFINITION 2.5: A second-order stationary process (y_t, t ∈ Z) is an ARMA(p,q) process of autoregressive order p and moving average order q if it can be written as

    y_t = φ_1 y_{t-1} + ... + φ_p y_{t-p} + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q},

where φ_p ≠ 0, θ_q ≠ 0, and (ε_t, t ∈ Z) is a weak white noise.

The coefficients φ_i (i = 1, ..., p) and θ_j (j = 1, ..., q) are the autoregressive and moving average coefficients, respectively. The description of the dynamics of the process can be simplified by introducing the lag operator L, such that

    L y_t = y_{t-1}, ∀t.                        (2.20)

The ARMA process can be written as

    Φ(L) y_t = Θ(L) ε_t,                        (2.21)

where the autoregressive and moving average lag polynomials are Φ(L) = 1 - φ_1 L - ... - φ_p L^p and Θ(L) = 1 + θ_1 L + ... + θ_q L^q.

[...]

    y_{t+h} = Φ^h y_t + ε_{t+h} + Φ ε_{t+h-1} + ... + Φ^{h-1} ε_{t+1}.

By substituting for each term its best linear predictor and observing that LE(ε_{t+k} | y_t, y_{t-1}, ...) = 0, ∀k > 0, we obtain

    LE(y_{t+h} | y_t, y_{t-1}, ...) = Φ^h y_t. QED

Thus, the dependence of the linear forecast on the past is limited to the most recent observation only.

REMARK 3.2: When (y_t) represents a vector of price series of various financial assets, the current prices contain all information that can help forecast future prices from the VAR(1) model. (It is a form of
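The h-step forecast formula above amounts to applying the coefficient matrix h times to the current observation; a minimal sketch for the bivariate case (our code, with an arbitrary illustrative Φ) is:

```python
def mat_vec(A, v):
    # product of a square matrix A with a vector v
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def var1_forecast(Phi, y_t, h):
    # best linear forecast of y_{t+h} in a VAR(1): Phi applied h times to y_t
    y = list(y_t)
    for _ in range(h):
        y = mat_vec(Phi, y)
    return y

Phi = [[0.5, 0.1],
       [0.2, 0.3]]          # arbitrary illustrative coefficient matrix
print(var1_forecast(Phi, [1.0, 2.0], h=1))   # Phi y_t, about [0.7, 0.8]
print(var1_forecast(Phi, [1.0, 2.0], h=3))   # Phi^3 y_t
```

Only the most recent observation y_t enters the forecast, in line with the result just proved.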


    (Id + ΘL)^{-1} Y_t = ε_t
    <=> Y_t - Θ Y_{t-1} + Θ² Y_{t-2} + ... + (-1)^k Θ^k Y_{t-k} + ... = ε_t
    <=> Y_t = ε_t + Θ Y_{t-1} - Θ² Y_{t-2} + ... + (-1)^{k+1} Θ^k Y_{t-k} + ...

We find the best linear forecast of Y_{t+1} computed at time t:

    LE(Y_{t+1} | Y_t, Y_{t-1}, ...) = Θ Y_t - Θ² Y_{t-1} + ... + (-1)^{k+1} Θ^k Y_{t+1-k} + ...,

and note that this expression involves the whole past of the process.

3.2 Estimation of Parameters

Among VARMA models, the pure autoregressive process is exceptionally easy to estimate. Under standard conditions, the VAR parameters can be approximated by ordinary least squares applied to the system equation by equation. This follows from a general result on the so-called SUR model. The model is presented below, along with some remarks on the SUR representation of VAR(p) processes.

3.2.1 Seemingly Unrelated Regressions

Let us consider two multivariate series Y_t = (Y_{1t}, ..., Y_{nt})' and X_t = (X_{1t}, ..., X_{Kt})' of respective dimensions n and K. We assume that these two subsets of series satisfy a linear system

    Y_{1t} = b_{11} X_{1t} + ... + b_{1K} X_{Kt} + ε_{1t},
    ...
    Y_{nt} = b_{n1} X_{1t} + ... + b_{nK} X_{Kt} + ε_{nt},

where ε_t = (ε_{1t}, ..., ε_{nt})' is a weak white noise with components uncorrelated with the X variables. Therefore, the model corresponds to n regressions with different dependent variables and identical explanatory variables. This model also admits a vector representation

    Y_t = B X_t + ε_t,                          (3.18)

using conventional notation.


Let us now denote by (X_t, Y_t), t = 1, ..., T, the observations available on the two sets of variables. In this setup, the matrix of regression parameters B and the variance-covariance matrix Ω of the noise term can be estimated by the quasi-maximum likelihood method (QML). As mentioned, in this approach, we build the likelihood function as if the error terms were normally distributed. The quasi-likelihood function is

    L = -(T/2) log det Ω - (1/2) Σ_{t=1}^{T} (Y_t - B X_t)' Ω^{-1} (Y_t - B X_t).    (3.19)

There exists an explicit solution that maximizes this expression with respect to the regression coefficient B and the variance of the noise Ω (Zellner 1962):

PROPOSITION 3.4:

(i) The QML estimator of B is equivalent to the ordinary least squares (OLS) estimator computed separately from each equation. Consider the equation numbered i and denote by Y^(i) = (Y_{i1}, ..., Y_{iT})' the vector of observations on the ith endogenous variable, by X the matrix of observations of the explanatory variables, and by b_i = (b_{i1}, ..., b_{iK})' the ith row of B. We get

    b̂_i = (X'X)^{-1} X' Y^(i).

(ii) These estimators are consistent and asymptotically normal. Their asymptotic covariances are

    Cov_as(b̂_i, b̂_j) = ω_{ij} (X'X)^{-1},

where ω_{ij} is the (i,j)th element of Ω.

(iii) A QML estimator of ω_{ij} is

    ω̂_{ij} = (1/T) Σ_{t=1}^{T} ε̂_{it} ε̂_{jt},

where ε̂_{it} = Y_{it} - X_t' b̂_i is the OLS residual of equation number i.
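Proposition 3.4(i) can be illustrated with a bare-bones computation (our sketch; the helper solves the normal equations for K = 2 regressors without a matrix library, and the data are hypothetical):

```python
def ols_2reg(y, x1, x2):
    # OLS of y on (x1, x2), no constant: solve the 2x2 normal equations
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    sy1 = sum(a * c for a, c in zip(x1, y))
    sy2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * sy1 - s12 * sy2) / det
    b2 = (s11 * sy2 - s12 * sy1) / det
    return b1, b2

# Two "seemingly unrelated" equations sharing the same regressors:
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 1.0, 0.0]
y1 = [2.1, 3.9, 6.1, 8.0]        # roughly 2 * x1
y2 = [0.9, 2.1, 2.9, 4.1]        # roughly 1 * x1
print(ols_2reg(y1, x1, x2))      # equation-by-equation estimation
print(ols_2reg(y2, x1, x2))
```

Because both equations use identical regressors, running each regression separately reproduces the joint QML solution.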

3.2.2 Application to Vector Autoregressive Models

It is easy to see that the VAR models fit into the SUR framework. Let us consider, for instance, a bivariate VAR(2) process:

    Y_t = Φ_1 Y_{t-1} + Φ_2 Y_{t-2} + ε_t.

This model satisfies the definition of the SUR model with explanatory variables X_t = (Y_{t-1}', Y_{t-2}')' on the right-hand side (rhs). Hence, the autoregressive coefficients can be estimated by ordinary least squares: in the first equation, we run the regression of Y_{1t} on X_t, and in the second equation, we regress Y_{2t} on X_t.

3.3 Joint Analysis of Intraday Prices and Volumes

3.3.1 Estimation of a Vector Autoregressive Representation

The raw data consist of daily closing values of the Standard and Poor's (S&P) composite stock index and daily volumes of shares traded on the New York Stock Exchange (NYSE). Since the market index is not directly traded, these data have to be interpreted as aggregate summaries of stock prices and market activity. The S&P 500 is a value-weighted average of prices of common stocks, most of which are traded on the NYSE. Before 1957, it included 90 stocks; it was broadened to 500 stocks on March 1, 1957. The set of stocks included in the composite index is regularly updated, and so are the associated weights (see Chapter 15 for more details on market indexes). The volume data originate from the S&P Security Price Index Record. They are obtained by aggregating traded volumes of various stocks, with weights depending on stock prices. The returns are computed by differencing the log-price index, and the volumes are transformed into logarithms. As well, the raw data have been preliminarily filtered to eliminate some trend and seasonal effects (see Gallant, Rossi, and Tauchen 1992). Figure 3.5 shows the evolution of the return and volume series over the period 1950-1995.

[Figure 3.5: The Return and Volume Series, Daily S&P 500]

We first compute the joint autocorrelogram of the volume and the return series (Figure 3.6). The marginal autocorrelogram of the return series is typical for autoregressive processes of a low order (likely 1), whereas the marginal autocorrelogram of volumes features a high degree of persistence. The cross correlograms show a strong impact of lagged volumes on current returns.

[Figure 3.6: Autocorrelations and Cross Correlations, Returns and Volumes]

Table 3.1 Estimation of VAR(1)

Volume equation
  Valid cases: 10,874        Degrees of freedom: 10,871
  R2: 0.672                  Rbar2: 0.672
  F(2,10871): 11,141.586     Probability of F: 0.000
  Total SS: 940.889          Residual SS: 308.510

  Variable     Estimate     SE           t            Prob > |t|
  Constant     1.760186     0.052783     33.347463    .000
  vol_{t-1}    0.815828     0.005520     147.804709   .000
  r_{t-1}      0.000116     0.000019     5.988261     .000

Return equation
  Valid cases: 10,874        Degrees of freedom: 10,871
  R2: 0.023                  Rbar2: 0.022
  F(2,10871): 125.636        Probability of F: 0.000
  Total SS: 75,858,454.453   Residual SS: 74,144,677.935

  Variable     Estimate      SE           t            Prob > |t|
  Constant     -54.794754    25.876226    -2.117571    .034
  vol_{t-1}    5.998239      2.705923     2.216707     .027
  r_{t-1}      0.146702      0.009529     15.394699    .000

  Note: SS = sum of squares; SE = standard error; t = t-value.


Thus, when we consider the univariate series of returns, we get the impression that past market history is not relevant, whereas a joint analysis of volume and returns shows that lagged volumes can improve linear predictions of future returns. We show below that volumes also help to predict nonlinear features of return dynamics, such as the volatility. Some residual seasonal effects can also be observed. We estimate in Table 3.1 a VAR(1) representation with a constant term. The estimated autoregressive coefficient matrix of the volume-return series is

cf, = (0.8158 0.0001) 5.9982 0.1467· The eigenvalues of this matrix are 0.8167 and 0.1458. The first eigenvalue is large, indicating strong persistence of the volume series. 3.3.2
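As a quick numerical check, the eigenvalues quoted above can be recomputed from the point estimates of Table 3.1 (a minimal numpy sketch):

```python
import numpy as np

# Estimated autoregressive matrix of the (volume, return) VAR(1) from Table 3.1.
phi = np.array([[0.8158, 0.0001],
                [5.9982, 0.1467]])

# Both eigenvalues lie inside the unit circle, so the estimated VAR(1) is
# stationary; the larger one (~0.82) reflects the persistence of the volumes.
eigenvalues = np.sort(np.linalg.eigvals(phi).real)[::-1]
print(eigenvalues.round(4))  # -> [0.8167 0.1458]
```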

3.3.2 Intraday Seasonality

In this section, we provide some insights on intraday regularities (called intraday seasonalities) observed in hourly volume and return data. The existence of such regularities implies that standard assumptions of a constant autoregressive coefficient and constant innovation variance underlying the VARMA models (see the preceding section) are not satisfied empirically. We consider the stock of the Bank of Montreal traded on the Toronto Stock Exchange (TSE). The trading day is divided into hourly subperiods, beginning from the opening at 09:00 until the market closure at 16:00. The hourly returns and volumes averaged over several days are plotted in Figures 3.7 and 3.8. We observe a typical U shape for the activity curve. It shows that traded volumes are high after the opening and before the closure. They decrease during the lunch period.

Figure 3.7 Hourly Average Returns, Bank of Montreal

Figure 3.8 Hourly Average Volumes, Bank of Montreal

Intraday seasonal effects also affect dynamics. Table 3.2 gives, from the tick-by-tick data, the estimated VAR(1) coefficients as a function of the hour of the day for October 1, 1998. The joint dynamics features strong intraday seasonality. It is revealed by the eigenvalues of the matrices of autoregressive coefficients (Table 3.3). Depending on the hour, the eigenvalues are real, positive or negative, or even complex, implying the presence of cyclical movements.

Table 3.2 Estimation of VAR(1) in Hourly Subsamples

Volume
  Hour    Variable     Estimate     SE          t           Prob > |t|
  09:00   Constant     15.664305    1.773152     8.834157   .000
          vol_{t-1}    -0.302834    0.132568    -2.284358   .026
          r_{t-1}      -0.179635    0.337359    -0.532474   .597
  10:00   Constant     13.574141    1.286492    10.551281   .000
          vol_{t-1}    -0.177135    0.102461    -1.728811   .088
          r_{t-1}      -1.074599    0.409970    -2.621169   .010
  11:00   Constant      8.727931    1.403468     6.218834   .000
          vol_{t-1}     0.119328    0.120632     0.989194   .326
          r_{t-1}       0.486321    0.762924     0.637443   .526
  12:00   Constant      7.513878    1.332435     5.639210   .000
          vol_{t-1}     0.066224    0.148097     0.447170   .657
          r_{t-1}      -0.002416    0.394244    -0.006128   .995
  13:00   Constant      8.422337    1.656767     5.083597   .000
          vol_{t-1}     0.158842    0.143872     1.104051   .276
          r_{t-1}      -1.348326    0.711568    -1.894866   .065
  14:00   Constant     10.998610    1.480554     7.428713   .000
          vol_{t-1}    -0.084764    0.130608    -0.648992   .519
          r_{t-1}      -0.380188    0.394810    -0.962965   .340
  15:00   Constant     12.527214    1.584273     7.907231   .000
          vol_{t-1}    -0.002404    0.102579    -0.023438   .981
          r_{t-1}      -0.115100    0.449949    -0.255806   .799

Returns
  Hour    Variable     Estimate     SE          t           Prob > |t|
  09:00   Constant     -0.166868    0.702866    -0.237410   .813
          vol_{t-1}     0.004239    0.052549     0.080676   .936
          r_{t-1}      -0.183590    0.133727    -1.372870   .176
  10:00   Constant     -0.381711    0.329440    -1.158667   .250
          vol_{t-1}     0.063129    0.026238     2.406025   .018
          r_{t-1}      -0.124897    0.104983    -1.189681   .238
  11:00   Constant      0.175233    0.216023     0.811178   .420
          vol_{t-1}    -0.017378    0.018568    -0.935906   .353
          r_{t-1}      -0.287463    0.117430    -2.447953   .017
  12:00   Constant     -0.462773    0.481662    -0.960784   .342
          vol_{t-1}     0.053552    0.053535     1.000315   .322
          r_{t-1}      -0.272268    0.142515    -1.910451   .062
  13:00   Constant     -0.616406    0.338113    -1.823076   .075
          vol_{t-1}     0.048714    0.029361     1.659111   .104
          r_{t-1}      -0.124255    0.145217    -0.855654   .397
  14:00   Constant      0.233328    0.465058     0.501718   .618
          vol_{t-1}    -0.021466    0.041025    -0.523224   .603
          r_{t-1}      -0.340070    0.124014    -2.742187   .008
  15:00   Constant     -0.497027    0.337962    -1.470660   .145
          vol_{t-1}     0.026622    0.021882     1.216594   .227
          r_{t-1}      -0.324222    0.095984    -3.377854   .001

Table 3.3 Eigenvalues in Hourly Subsamples

  Hour     Eigenvalues
  09:00    -0.296, -0.190
  10:00    -0.151 ± 0.259 i
  11:00     0.097, -0.265
  12:00     0.066, -0.271
  13:00     0.017 ± 0.214 i
  14:00    -0.056, -0.368
  15:00    -0.012, -0.314

3.4 Mean-Variance Efficiency

The VAR model is a useful tool for portfolio management. This section describes the fundamentals of this approach for practical implementations. We begin with theoretical remarks on efficient portfolio selection and next study an empirical example involving a VAR(1) model.

3.4.1 Efficient Portfolios

Suppose the existence of a finite number of securities indexed by i, i = 0, ..., n. The security 0 is risk-free and has a price equal to 1 at date t, while its value at t+1 is 1 + r_t, where r_t is the risk-free rate. The other securities are risky and have prices p_{i,t}, i = 1, ..., n, t = 1, ..., T. They pay no dividends. A portfolio is described by an allocation vector (a_0, a_1, ..., a_n)' = (a_0, a')' (say) of quantities a_i of the various securities; it defines the portfolio allocation. The portfolio is characterized by an acquisition cost at date t of a_0 + a'p_t = w_t and a value at date t+1 of a_0(1 + r_t) + a'p_{t+1} = w_{t+1}. At time t, this future value is partly unknown. Its expectation is μ_t(a_0, a) = E_t w_{t+1} = a_0(1 + r_t) + a'E_t p_{t+1}; its variance is η_t²(a_0, a) = V_t w_{t+1} = a'V_t p_{t+1} a.

In the mean-variance approach (Markowitz 1952, 1976; Roy 1952; Sharpe 1963), the investor selects the composition of the portfolio at time t by taking into account his initial budget constraint and tries to maximize the expected value while minimizing the risk (i.e., the variance). Since these objectives are contradictory, the investor compromises and selects the portfolio with a balanced trade-off between the conditional mean and the variance. The investor's optimization objective is

    max_{a_0, a}  μ_t(a_0, a) - (A/2) η_t²(a_0, a),    (3.20)

subject to

    a_0 + a'p_t = w,

where w is the initial endowment at time t, and A is a positive scalar that measures the investor's risk aversion. From the budget constraint, we can


derive the quantity of the risk-free asset: a_0 = w - a'p_t. Next, after substituting this expression into the criterion function, the objective is to maximize with respect to the allocation a in the risky assets:

    max_a  w(1 + r_t) + a'[E_t p_{t+1} - p_t(1 + r_t)] - (A/2) a'V_t p_{t+1} a.

Let us denote by Y_{t+1} = p_{t+1} - p_t(1 + r_t) the excess gain on the risky assets, that is, the gain corrected for the return on the risk-free asset. Since r_t and p_t belong to the information set available at time t, the expression to be maximized becomes

    max_a  a'E_t Y_{t+1} - (A/2) a'V_t Y_{t+1} a.    (3.21)

The objective function is concave in a, and the optimal allocation satisfies the first-order condition

    E_t Y_{t+1} = A (V_t Y_{t+1}) a,

or

    a_t* = (1/A) (V_t Y_{t+1})^{-1} E_t Y_{t+1}.    (3.22)

PROPOSITION 3.5: The solutions of the mean-variance optimization, that is, the mean-variance efficient portfolio allocations, consist of allocations in risky assets proportional to

    (V_t Y_{t+1})^{-1} E_t Y_{t+1},

where Y_{t+1} is the excess gain: Y_{t+1} = p_{t+1} - (1 + r_t) p_t. The corresponding quantity of the risk-free asset is a_{0,t}* = w - a_t*' p_t. The initial budget has an effect only on the allocation in the risk-free asset. The quantities of risky assets diminish when the risk aversion coefficient A increases.
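Proposition 3.5 is easy to illustrate numerically. The sketch below uses hypothetical conditional moments, prices, endowment, and risk aversion (none of these numbers come from the text) and checks the first-order condition (3.22) together with the budget constraint:

```python
import numpy as np

# Illustrative setup: two risky assets with assumed conditional moments.
A = 2.0                               # risk aversion coefficient
w = 100.0                             # initial endowment
p = np.array([50.0, 20.0])            # current prices of the risky assets
EY = np.array([1.5, 0.8])             # E_t Y_{t+1}, expected excess gains
VY = np.array([[4.0, 1.0],
               [1.0, 2.0]])           # V_t Y_{t+1}, conditional variance

a_star = np.linalg.solve(VY, EY) / A  # efficient risky allocation (3.22)
a0_star = w - a_star @ p              # risk-free quantity from the budget constraint
```

Doubling A halves `a_star` while leaving its direction (V_t Y_{t+1})^{-1} E_t Y_{t+1} unchanged, which is exactly the proportionality statement of the proposition.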

3.4.2 Efficiency Frontier

The stochastic properties of efficient portfolios are summarized by their first- and second-order conditional moments. These are

    μ_t*(A, w) = a_{0,t}*(1 + r_t) + a_t*' E_t p_{t+1}
               = w(1 + r_t) + a_t*'[E_t p_{t+1} - (1 + r_t) p_t]
               = w(1 + r_t) + (1/A)(E_t Y_{t+1})'(V_t Y_{t+1})^{-1}(E_t Y_{t+1}),

    η_t*²(A, w) = η_t²(a_{0,t}*, a_t*) = a_t*' V_t(p_{t+1}) a_t*
                = (1/A²)(E_t Y_{t+1})'(V_t Y_{t+1})^{-1}(E_t Y_{t+1}).

When w is fixed and A, A > 0, varies, the moments are related by

    η_t*²(A, w) = (1/P_t)[μ_t*(A, w) - w(1 + r_t)]²,    (3.23)

where P_t = (E_t Y_{t+1})'(V_t Y_{t+1})^{-1}(E_t Y_{t+1}) measures the relative magnitude of the expected excess gain with respect to risk. P_t is called the Sharpe performance of the set of assets (Sharpe 1963; Lintner 1965).

Let us now introduce the mean-variance representation of portfolios. Each portfolio is represented by a bidimensional vector with components that are the conditional mean and variance of its future value. From (3.23), the set of efficient portfolios forms a semiparabola, which is tangent to the vertical axis at the risk-free portfolio, in which the whole budget is invested in the risk-free asset. All other portfolios are situated below this semiparabola, which justifies the term efficiency frontier (Figure 3.9). The efficiency frontier shifts upward when the Sharpe performance increases. The mean-variance representation, that is, the efficiency frontier and the location of portfolios with respect to the frontier, depend on time t through the price history.
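The frontier relation (3.23) can be traced out by varying the risk aversion coefficient A at a fixed endowment. The moments below are illustrative (not taken from the text); each efficient portfolio's (mean, variance) pair is checked to lie on the semiparabola:

```python
import numpy as np

# Illustrative conditional moments of the excess gains.
EY = np.array([1.5, 0.8])
VY = np.array([[4.0, 1.0],
               [1.0, 2.0]])
w, r = 100.0, 0.01

P = EY @ np.linalg.solve(VY, EY)      # Sharpe performance of the set of assets

for A in (0.5, 1.0, 2.0, 5.0):
    mu = w * (1 + r) + P / A          # conditional mean of the efficient portfolio
    eta2 = P / A**2                   # its conditional variance
    # Each (mu, eta2) point satisfies the frontier equation (3.23).
    assert np.isclose(eta2, (mu - w * (1 + r))**2 / P)
```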

3.4.3 Expected Utility

In a special case, the mean-variance approach can be interpreted in terms of expected utility. Let us consider an exponential utility function U(w) = -exp(-Aw), where the parameter A is positive. The absolute risk aversion

    -U''(w)/U'(w) = A

is independent of the wealth w. For this reason, this utility function features constant absolute risk aversion (CARA). An investor may maximize his expected utility under the budget constraint. The optimization objective is

    max_{a_0, a}  E_t U(w_{t+1}),

subject to

    a_0 + a'p_t = w.

After eliminating the allocation in the risk-free asset through the budget constraint, the objective becomes

    max_a  -E_t[exp(-A(w(1 + r_t) + a'Y_{t+1}))].

When the vector of excess gains is conditionally Gaussian, the objective function is equivalent to the moment-generating function of a Gaussian variable and can be expressed in terms of the first- and second-order conditional moments. We get

    E_t[-exp(-A w_{t+1})] = -exp{-A[E_t w_{t+1} - (A/2) V_t w_{t+1}]}.

This optimization is equivalent to the mean-variance optimization

    max_{a_0, a}  μ_t(a_0, a) - (A/2) η_t²(a_0, a),

solved in Section 3.4.1.

Figure 3.9 The Efficiency Frontier
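The equivalence relies on the moment-generating function of a Gaussian variable, E[exp(tX)] = exp(t μ_X + t²σ_X²/2). A small Monte Carlo sketch (with made-up values of A, μ, and σ) confirms the closed form used above:

```python
import numpy as np

# If w_{t+1} ~ N(mu, sigma^2), then E[-exp(-A w_{t+1})] equals
# -exp(-A(mu - (A/2) sigma^2)), so maximizing expected CARA utility is the
# mean-variance trade-off mu - (A/2) sigma^2. Illustrative numbers:
rng = np.random.default_rng(0)
A, mu, sigma = 1.5, 2.0, 0.7

draws = rng.normal(mu, sigma, size=1_000_000)
mc = np.mean(-np.exp(-A * draws))                        # Monte Carlo estimate
closed_form = -np.exp(-A * (mu - 0.5 * A * sigma**2))    # Gaussian mgf formula
assert abs(mc - closed_form) < 1e-2
```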

3.4.4 Vector Autoregressive Processes of Returns

The general theory of efficient portfolios can be applied to the case of an excess gain process with a VAR(1) representation that includes a constant:

    Y_{t+1} = μ + Φ Y_t + ε_{t+1},    (3.24)

where ε_t is conditionally centered, E_t ε_{t+1} = 0, and conditionally homoscedastic, V_t ε_{t+1} = Ω. The first- and second-order conditional moments of the excess gain are

    E_t Y_{t+1} = μ + Φ Y_t,    V_t Y_{t+1} = Ω.

Therefore, the efficient allocations are

    a_t* = (1/A) Ω^{-1}(μ + Φ Y_t)    (3.25)
         = (1/A) Ω^{-1} μ + (1/A) Ω^{-1} Φ Y_t.

They depend on the price history through the current values Y_t. These allocations are generated by a limited number of basic portfolios, Ω^{-1}μ and the columns of Ω^{-1}Φ. The number of independent generating portfolios, often called the benchmark portfolios, is equal to the rank of [μ, Φ].

The investor has to update his or her portfolio regularly by taking into account the available information. It is interesting to compare this behavior with the behavior of an investor who fails to perform the updating. Such an investor will select an allocation based on the marginal moments:

    ᾱ* = (1/A) (V Y_{t+1})^{-1} E Y_{t+1}.

As a result, the investor obtains a static portfolio allocation, equivalent to the expected value of the dynamic allocation, ᾱ* = E a_t*. In the (conditional) mean-variance representation, this set of "marginally" efficient portfolios (i.e., determined by the marginal moments) is represented by a semiparabola located below the efficiency frontier and tangent to it at the risk-free portfolio (Figure 3.10).

Figure 3.10 Marginally Efficient Portfolios and the Frontier

Indeed, the equation of the subefficient semiparabola is

    η_t² = (1/P̃_t)[μ_t - w(1 + r_t)]²,

where

    1/P̃_t = (E Y_{t+1})'(V Y_{t+1})^{-1} (V_t Y_{t+1}) (V Y_{t+1})^{-1}(E Y_{t+1}) / [(E Y_{t+1})'(V Y_{t+1})^{-1}(E_t Y_{t+1})]².

From the Cauchy-Schwarz inequality, we get

    [(E Y_{t+1})'(V Y_{t+1})^{-1}(E_t Y_{t+1})]²
      = [(E Y_{t+1})'(V Y_{t+1})^{-1} (V_t Y_{t+1}) (V_t Y_{t+1})^{-1}(E_t Y_{t+1})]²
      ≤ [(E Y_{t+1})'(V Y_{t+1})^{-1} (V_t Y_{t+1}) (V Y_{t+1})^{-1}(E Y_{t+1})] [(E_t Y_{t+1})'(V_t Y_{t+1})^{-1}(E_t Y_{t+1})],

or, equivalently, P̃_t ≤ P_t.

The two semiparabolas overlap if and only if the Cauchy-Schwarz condition is satisfied with an equality. This arises when the two portfolios (V_t Y_{t+1})^{-1} E_t Y_{t+1} and (V Y_{t+1})^{-1} E Y_{t+1} are proportional.
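The inequality P̃_t ≤ P_t can also be checked numerically for arbitrary marginal and conditional moments — a sketch with all vectors and matrices drawn at random:

```python
import numpy as np

# Check that the "marginal" Sharpe performance P_tilde never exceeds the
# conditional one P (the Cauchy-Schwarz argument above), for random moments.
rng = np.random.default_rng(1)
n = 3
for _ in range(100):
    EY_m = rng.normal(size=n)                       # marginal mean E Y_{t+1}
    EY_c = rng.normal(size=n)                       # conditional mean E_t Y_{t+1}
    M = rng.normal(size=(n, n))
    V_m = M @ M.T + n * np.eye(n)                   # marginal variance (PD)
    M = rng.normal(size=(n, n))
    V_c = M @ M.T + n * np.eye(n)                   # conditional variance (PD)

    P = EY_c @ np.linalg.solve(V_c, EY_c)           # conditional Sharpe performance
    u = np.linalg.solve(V_m, EY_m)                  # (V Y)^{-1} E Y
    P_tilde = (u @ EY_c) ** 2 / (u @ V_c @ u)       # marginal Sharpe performance
    assert P_tilde <= P + 1e-10
```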

3.5 Summary

In this chapter, we examined linear dynamic models for two or more time series. A multivariate setup can be used, for example, for short-term forecasting of portfolio returns or for joint analysis of asset returns and volumes. Most results in the multivariate framework arise as extensions of their univariate analogues. Thus, a scalar mean of a univariate series is replaced by a vector of means of the individual series. The counterpart of a scalar variance of a univariate series is a symmetric, positive semidefinite matrix of variances of the component series and their covariances. Finally, the autocovariance function, which remains an essential tool of analysis in a multivariate setup, becomes a sequence of square matrices that captures serial correlation of the components, as well as lagged interactions between them. By analogy, VARMA and VAR models arise as multivariate analogues of ARMA and AR models for which, instead of scalar coefficients, we find matrices of autoregressive and/or moving average coefficients. These issues are covered in Section 3.1.

Section 3.2 is devoted to estimation of the parameters of the VAR model. Our interest in the VAR is motivated by its relatively simple structure and empirically evidenced good fit to financial data. In the special case of identical right-hand-side variables in all equations of the reduced form, the estimation procedure simplifies to least squares applied separately to each equation in the model. An empirical application of a VAR model to high-frequency return and volume data is given in Section 3.3. It reveals strong intraday variation of the parameters estimated from hourly samples due to intraday seasonal effects.

Section 3.4 highlights the use of the VAR in determining the mean-variance efficient portfolios. We have shown the advantage of using the VAR-based forecasts of the conditional means of returns in dynamic updating of portfolio allocations. The VAR-based strategy outperforms a static approach that relies on the marginal means of returns, which disregards the dynamic aspect.
The structure of serial correlations and lagged interactions in a multivariate model is quite complex. We documented this complexity in our empirical example illustrating the return-volume relationship. Sometimes, however, it remains unclear whether the current values of both variables are determined simultaneously or instead are subject to a leader-follower type of behavior. To investigate this issue, we need to uncover the existence of causal relationships, which are examined in the next chapter.

4 Simultaneity, Recursivity, and Causality Analysis

IN THE PREVIOUS CHAPTERS, we focused our attention on dynamic models for univariate or vector time series with current values that depend on their past. In this chapter, we introduce systems of equations that emphasize both feedback and simultaneity effects, which arise when a current value of a time series simultaneously determines and is determined by a current value of another time series. This class of models, called simultaneous equations, is widely used in economics, especially for modeling supply-and-demand equilibria. In finance, simultaneous equation models provide a convenient framework to study, for example, jointly or interdependently determined asset prices and volumes. This leads to the dynamic Capital Asset Pricing Model (CAPM), which provides an explicit formula for the trade-off between risk and expected returns on assets. The nature of dynamic interactions between variables in simultaneous equation models can be explored further using causality analysis. It is aimed at distinguishing variables that at date t determine other variables in the system from those that respond to the system with a lag.

In the first section, we introduce the structural model, discuss its dynamics, and study the properties of ordinary least squares (OLS) estimators. For clarity of exposition, we consider models involving only two endogenous series and one exogenous series. In the second section, we present the CAPM equilibrium model of asset prices and derive the equilibrium condition of asset demand and supply. Various procedures for testing the equilibrium hypothesis are also provided. In the third section, we explain the concept of causality and show how causal relations between variables can be modeled in a vector autoregressive (VAR) framework. We apply the causality analysis to an empirical study of the relation between high-frequency returns and volumes.


4.1 Dynamic Structural Model

We denote by (Y_t, t ∈ Z) and (X_t, t ∈ Z) the two endogenous time series of interest and by (Z_t, t ∈ Z) the exogenous one. The series are assumed jointly weakly stationary. For inference, we use three information sets, 𝒴_t, 𝒳_t, and 𝒵_t, which represent all available information contained in the current and past values of Y, X, and Z, respectively.

4.1.1 Structural, Reduced, and Final Forms

Structural Form

Structural models represent interactions between variables implied by economic or financial theory. In general, we distinguish two sets of variables with respect to their role in the model. Variables with current values that are simultaneously determined by the system are called endogenous. They do not necessarily appear on the left-hand side of equations, as their current or past values may determine some other endogenous processes. Among explanatory variables on the right-hand side of an equation in the system, we distinguish the exogenous variables, which are given or determined outside the system, and the lagged endogenous variables.

The form of a structural model is often based on some equilibrium conditions. These conditions can entail a quite complex structure involving interactions of the current values of endogenous processes with the lagged values of endogenous and exogenous processes, as well as instantaneous feedback effects between the current values of various endogenous variables. A typical structural form is

    Y_t = -a_{12} X_t + b_{11} Y_{t-1} + b_{12} X_{t-1} + c_{10} Z_t + c_{11} Z_{t-1} + u_{1,t},
    X_t = -a_{21} Y_t + b_{21} Y_{t-1} + b_{22} X_{t-1} + c_{20} Z_t + c_{21} Z_{t-1} + u_{2,t},    (4.1)

where the error terms u_{1,t}, u_{2,t} form a bivariate weak white noise and are uncorrelated with Z_t, Z_{t-1}, X_{t-1}, and Y_{t-1}. These equations jointly determine two variables, X_t and Y_t. For this reason, the system in (4.1) is called a simultaneous equation model. The model may be rewritten using vector notation:

    A (Y_t, X_t)' = B (Y_{t-1}, X_{t-1})' + C_0 Z_t + C_1 Z_{t-1} + u_t,    (4.2)

where A = (1, a_{12}; a_{21}, 1), B = (b_{11}, b_{12}; b_{21}, b_{22}), C_0 = (c_{10}, c_{20})', C_1 = (c_{11}, c_{21})', and u_t = (u_{1,t}, u_{2,t})'.
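The passage from the structural form (4.2) to its reduced form only requires inverting A. A sketch with illustrative (hypothetical) parameter values:

```python
import numpy as np

# Reduced-form coefficients implied by the structural form (4.2).
# All numerical values below are made up for illustration.
a12, a21 = 0.4, -0.3
A = np.array([[1.0, a12],
              [a21, 1.0]])              # simultaneity matrix
B = np.array([[0.5, 0.1],
              [0.2, 0.6]])              # coefficients on lagged endogenous variables
C0 = np.array([0.8, -0.2])              # coefficients on current Z_t
C1 = np.array([0.1, 0.3])               # coefficients on lagged Z_{t-1}

# A is invertible here (det = 1 - a12*a21 != 0), so the equilibrium is unique.
A_inv = np.linalg.inv(A)
B_tilde, C0_tilde, C1_tilde = A_inv @ B, A_inv @ C0, A_inv @ C1
```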

EXAMPLE 4.1, THE EQUILIBRIUM SYSTEM: A classical example of a simultaneous equation model is the demand-supply equilibrium model. Let us consider an asset and introduce the corresponding aggregate demand and supply functions at date t. They depend on the current price, an exogenous variable Z_t, and eventually lagged price and exchanged quantity. They are given by

    d_t = a_1 p_t + b_{11} p_{t-1} + b_{12} q_{t-1} + c_1 Z_t + u_{1,t},
    s_t = a_2 p_t + b_{21} p_{t-1} + b_{22} q_{t-1} + c_2 Z_t + u_{2,t},

assuming linearity with respect to prices. At equilibrium, the demand and supply are equal, and their common value determines the traded quantity:

    q_t = d_t = s_t.

By introducing explicitly the equilibrium condition into the system, we obtain a bivariate simultaneous equation model with current price and traded quantity as the endogenous variables:

    q_t = a_1 p_t + b_{11} p_{t-1} + b_{12} q_{t-1} + c_1 Z_t + u_{1,t},
    q_t = a_2 p_t + b_{21} p_{t-1} + b_{22} q_{t-1} + c_2 Z_t + u_{2,t}.

EXAMPLE 4.2, EQUILIBRIUM AND ABSENCE OF ARBITRAGE OPPORTUNITY: There is a link between the notion of equilibrium and the condition of the absence of arbitrage opportunity. Let us consider two risk-free assets with respective risk-free returns r_{1,t} and r_{2,t}, say. We denote by s_{1,t} and s_{2,t}, respectively, the finite exogenous supplies of these assets. The total demand of investors is intuitively infinite for the asset with the higher return and thus is degenerate: the demand for asset 1 is +∞ when r_{1,t} > r_{2,t}, is -∞ when r_{1,t} < r_{2,t}, and is indeterminate in the regime r_{1,t} = r_{2,t} = r_t, where both risk-free returns are equal. The investors try to benefit from the arbitrage opportunity by leveraging the demanded quantities. The equilibrium condition, that is, the equality of demand and the finite supply of each asset, implies the equality r_{1,t} = r_{2,t} of the risk-free returns. Therefore, at equilibrium, the system is degenerate, with a deterministic restriction on asset returns.

Reduced Form

We have seen from previous examples that current values of the endogenous variables Y_t and X_t may be obtained as simultaneous outcomes of an equilibrium condition. The equilibrium is unique whenever the system (4.2) admits a unique solution. The condition ensuring its uniqueness is the invertibility of the A matrix. Under this restriction, there exists a reduced form of the simultaneous equation model in which current values of endogenous variables are represented as functions of lagged endogenous and predetermined variables and of current values of exogenous variables and error terms:

    (Y_t, X_t)' = A^{-1} B (Y_{t-1}, X_{t-1})' + A^{-1} C_0 Z_t + A^{-1} C_1 Z_{t-1} + A^{-1} u_t.    (4.3)

It can be rewritten as

    (Y_t, X_t)' = B̃ (Y_{t-1}, X_{t-1})' + C̃_0 Z_t + C̃_1 Z_{t-1} + ũ_t,    (4.4)

where B̃ = A^{-1} B, C̃_0 = A^{-1} C_0, and C̃_1 = A^{-1} C_1 are the reduced-form parameters, and ũ_t = A^{-1} u_t is the reduced-form error term. Note that the error term is uncorrelated with all explanatory variables appearing on the right-hand side of the system in (4.4).

REMARK 4.1: If C_0 = C_1 = 0, expression (4.4) is simply a VAR(1) representation of the bivariate process (Y_t, X_t)'. By including current and lagged values of an exogenous process, we obtain a so-called ARMAX model (X for exogenous) (Hannan 1970). Therefore, the reduced form is a VARX representation of the bivariate process.

EXAMPLE 4.3, THE EQUILIBRIUM SYSTEM: Let us consider the equilibrium model introduced in Example 4.1. The bivariate system can be solved with respect to p_t, q_t. We get

    q_t = 1/(a_2 - a_1) {(a_2 b_{11} - a_1 b_{21}) p_{t-1} + (a_2 b_{12} - a_1 b_{22}) q_{t-1} + (a_2 c_1 - a_1 c_2) Z_t + a_2 u_{1,t} - a_1 u_{2,t}},

    p_t = 1/(a_2 - a_1) {(b_{11} - b_{21}) p_{t-1} + (b_{12} - b_{22}) q_{t-1} + (c_1 - c_2) Z_t + u_{1,t} - u_{2,t}}.

Even though the error terms of the demand and supply functions are uncorrelated, Cov(u_{1,t}, u_{2,t}) = 0, the equilibrium price and quantity, in general, are conditionally correlated due to the equilibrium condition. Indeed, we get

    Cov_{t-1}[q_t, p_t] = 1/(a_2 - a_1)² Cov_{t-1}[a_2 u_{1,t} - a_1 u_{2,t}, u_{1,t} - u_{2,t}]
                        = (a_2 V u_{1,t} + a_1 V u_{2,t}) / (a_2 - a_1)².
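The nonzero conditional covariance can be confirmed by simulation. With independent shocks, direct computation gives Cov_{t-1}(q_t, p_t) = (a_2 σ_1² + a_1 σ_2²)/(a_2 - a_1)²; the sketch below (with made-up slopes and shock variances) checks this against a Monte Carlo estimate of the shock components:

```python
import numpy as np

# Monte Carlo check: independent demand/supply shocks still induce correlation
# between the equilibrium price and quantity. a1 < 0 (demand slope),
# a2 > 0 (supply slope); all numbers are illustrative.
rng = np.random.default_rng(2)
a1, a2 = -0.8, 1.2
s1, s2 = 0.5, 0.3
n = 1_000_000

u1 = rng.normal(0.0, s1, n)
u2 = rng.normal(0.0, s2, n)
q = (a2 * u1 - a1 * u2) / (a2 - a1)   # shock component of equilibrium quantity
p = (u1 - u2) / (a2 - a1)             # shock component of equilibrium price

cov_mc = np.mean(q * p)
cov_theory = (a2 * s1**2 + a1 * s2**2) / (a2 - a1)**2
assert abs(cov_mc - cov_theory) < 1e-3
```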


Final Form

By recursive substitution, lagged values of endogenous processes may be eliminated from the reduced form. This yields the final form of the structural model, in which current values of endogenous processes appear as functions of current and lagged exogenous variables and error terms. The final form is

    (Y_t, X_t)' = (I - B̃ L)^{-1} (C̃_0 + C̃_1 L) Z_t + (I - B̃ L)^{-1} ũ_t,

where L is the lag operator, or, in explicit form,

    (Y_t, X_t)' = A^{-1} C_0 Z_t + (A^{-1} C_1 + A^{-1} B A^{-1} C_0) Z_{t-1} + ...
                + (A^{-1} B)^{h-1} (A^{-1} C_1 + A^{-1} B A^{-1} C_0) Z_{t-h} + ...
                + A^{-1} u_t + ... + (A^{-1} B)^h A^{-1} u_{t-h} + ...

4.1.2 From a Structural Model to a Time Series Model

None of the specifications presented above belongs to the class of models discussed in Chapter 3, in which present values of vector autoregressive moving average (VARMA) processes were determined by their own past. The difference is due to the presence of current and lagged exogenous variables on the right-hand side (rhs) of equations in the system. Unless future values of these exogenous variables are known, a simultaneous equations model cannot be used to make predictions. Even the final form is not appropriate for forecasting at large horizons. To see that, consider the prediction of (Y_{T+2}, X_{T+2})' evaluated at time T. From (4.4), we infer that

    (Y_{T+2}, X_{T+2})' = C̃_0 Z_{T+2} + (C̃_1 + B̃ C̃_0) Z_{T+1} + B̃ C̃_1 Z_T + B̃² (Y_T, X_T)' + ũ_{T+2} + B̃ ũ_{T+1}.

Therefore, for I_T denoting all information available at time T, we get

    E[(Y_{T+2}, X_{T+2})' | I_T] = C̃_0 E(Z_{T+2} | I_T) + (C̃_1 + B̃ C̃_0) E(Z_{T+1} | I_T) + B̃ C̃_1 Z_T + B̃² (Y_T, X_T)',

whenever I_T includes current and past values of the relevant variables. Since the structural model contains no information about the dynamics of the exogenous variables, it is not possible to evaluate E(Z_{T+1} | I_T) and E(Z_{T+2} | I_T). A straightforward remedy to this problem consists of adding


yet another equation to describe the dynamics of the exogenous processes. The initial structural model now becomes

    A (Y_t, X_t)' = B (Y_{t-1}, X_{t-1})' + C_0 Z_t + C_1 Z_{t-1} + u_t,
    Z_t = D Z_{t-1} + v_t,    (4.5)

where the error term v_t is uncorrelated with u_t and the lagged values of other processes. The system can be rewritten as

    (Y_t, X_t)' = A^{-1} B (Y_{t-1}, X_{t-1})' + (A^{-1} C_1 + A^{-1} C_0 D) Z_{t-1} + ũ_t + C̃_0 v_t,
    Z_t = D Z_{t-1} + v_t,

or, equivalently,

    (Y_t, X_t, Z_t)' = ( A^{-1} B    A^{-1} C_1 + A^{-1} C_0 D ) (Y_{t-1}, X_{t-1}, Z_{t-1})' + ( ũ_t + C̃_0 v_t )
                       ( 0           D                        )                                ( v_t           ).    (4.6)

This is a VAR(1) representation of the joint process (Y_t, X_t, Z_t)'. Some features of a structural model are still preserved under this specification. For example, the constraints on the structural parameters A, B, C_0, and C_1 may imply restrictions on the autoregressive matrix. Moreover, the autoregressive matrix has to include a subset of zeros in the lower left corner to accommodate the dynamics of the predetermined Z variable. Note that model (4.6) represents causal relations between variables in the system. If we disregard the noise effect at time t, Z_t is determined by Z_{t-1}, and (Y_t, X_t)' are jointly determined by (Y_{t-1}, X_{t-1})', Z_t, Z_{t-1}, and so on (Figure 4.1).

Figure 4.1 The Causal Chain
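The stacked system (4.6) can be made concrete by building the companion matrix and checking the two-step-ahead prediction formula derived earlier. All parameter values below are illustrative:

```python
import numpy as np

# Illustrative structural parameters, with a scalar AR(1) for the exogenous Z.
A = np.array([[1.0, 0.4], [-0.3, 1.0]])
B = np.array([[0.5, 0.1], [0.2, 0.6]])
C0 = np.array([[0.8], [-0.2]])
C1 = np.array([[0.1], [0.3]])
D = np.array([[0.7]])

Ai = np.linalg.inv(A)
Bt, C0t, C1t = Ai @ B, Ai @ C0, Ai @ C1      # reduced-form coefficients

# Companion matrix of (Y_t, X_t, Z_t)': note the zeros in the lower-left block.
Phi = np.block([[Bt, C1t + C0t @ D],
                [np.zeros((1, 2)), D]])

state = np.array([0.3, -0.1, 0.5])           # (Y_T, X_T, Z_T)'
forecast2 = np.linalg.matrix_power(Phi, 2) @ state

# Same forecast from the reduced form with E(Z_{T+1}|I_T) = D Z_T and
# E(Z_{T+2}|I_T) = D^2 Z_T substituted into the two-step formula:
yx, z = state[:2], state[2:]
ez1, ez2 = D @ z, D @ D @ z
direct = (C0t @ ez2 + (C1t + Bt @ C0t) @ ez1 + Bt @ C1t @ z
          + np.linalg.matrix_power(Bt, 2) @ yx)
assert np.allclose(forecast2[:2], direct)
```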

4.1.3 The Properties of the Ordinary Least Squares Estimators

The structural, reduced, and final forms, as well as the extended structural form (4.6), define various linear functions of the explanatory (rhs) variables. Therefore, it seems natural to use OLS for estimation of these linear models without taking into account structural constraints that they may need to satisfy. However, we have to remember that the least squares method yields consistent estimators of the parameters only in equations with error terms that are uncorrelated with the explanatory variables. This condition is satisfied by the reduced and final forms only.

PROPOSITION 4.1: The reduced-form parameters B̃, C̃_0, and C̃_1 can be consistently estimated by OLS applied to (4.4).

The system in (4.4) is a seemingly unrelated regressions (SUR) model, discussed in Chapter 3. In the absence of structural constraints on the coefficients B̃, C̃_0, and C̃_1, the same explanatory (rhs) variables are present in all equations. In this particular case, we know from Zellner (1962) that OLS estimators obtained from separate least squares estimation of each equation are equivalent to the generalized least squares estimator applied to the entire SUR model in (4.4).

The estimation of structural form (4.1) is more complicated due to simultaneity. Although u_{1,t} and u_{2,t} are assumed uncorrelated with the lagged values of endogenous processes and the current and lagged values of exogenous processes, they may still remain correlated with the current values of the endogenous processes that appear among the explanatory (rhs) variables.

PROPOSITION 4.2: The OLS estimators of the structural coefficients in the first equation of (4.1) are consistent if and only if Cov(X_t, u_{1,t}) = 0.

This condition can be written in terms of structural parameters. Let us introduce the linear prediction errors of Y_t and X_t given X_{t-1} and Y_{t-1}:

    ε_{1,t} = Y_t - LE(Y_t | X_{t-1}, Y_{t-1}),
    ε_{2,t} = X_t - LE(X_t | X_{t-1}, Y_{t-1}).

We deduce from (4.1) that

    u_{1,t} = ε_{1,t} + a_{12} ε_{2,t}

(absorbing the exogenous terms into the predictions). Since u_{1,t} is uncorrelated with X_{t-1} and Y_{t-1}, the condition for the consistency of the OLS estimators becomes

    Cov(X_t, u_{1,t}) = 0  ⟺  Cov(ε_{2,t}, u_{1,t}) = 0.


COROLLARY 4.1: The OLS estimators of the first structural equation are consistent if and only if

    Cov(ε_{2,t}, u_{1,t}) = 0,

or, equivalently, if

    Cov(ε_{1,t}, ε_{2,t}) + a_{12} V ε_{2,t} = 0.

The consistency condition involves both the structural parameter a_{21} and the slope coefficient from a regression of u_{2,t} on u_{1,t}. In particular, this condition is satisfied when

    a_{21} = Cov(u_{1,t}, u_{2,t}) = 0.

In this case, X_t is a function of its own lags and of the shock u_{2,t}, which is uncorrelated with the shock u_{1,t} from the equation defining Y_t. Hence, X_t is determined prior to Y_t (or predetermined). We obtain a causal chain in which the variables are recursively determined (Figure 4.2).
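The simultaneity problem can be demonstrated by simulation. The sketch below uses a stripped-down static version of (4.1) — no lags and no exogenous variable, with hypothetical coefficients — and shows that OLS on the first equation recovers -a_{12} in the recursive case a_{21} = 0 but not otherwise:

```python
import numpy as np

# OLS on the first structural equation: consistent only when X_t is
# uncorrelated with u_{1t}. Illustrative coefficients and shock variances.
rng = np.random.default_rng(3)
a12, s1, s2 = 0.5, 1.0, 1.0
n = 500_000

def ols_slope(a21):
    u1 = rng.normal(0.0, s1, n)
    u2 = rng.normal(0.0, s2, n)
    det = 1.0 - a12 * a21
    y = (u1 - a12 * u2) / det             # solved equilibrium values
    x = (u2 - a21 * u1) / det
    return np.sum(x * y) / np.sum(x * x)  # OLS of y on x (no intercept)

# Recursive case a21 = 0: OLS recovers the structural coefficient -a12.
assert abs(ols_slope(0.0) - (-a12)) < 0.01
# Simultaneous case a21 != 0: OLS is biased away from -a12.
assert abs(ols_slope(0.8) - (-a12)) > 0.1
```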

4.2 The Capital Asset Pricing Model

In this section, we present an equilibrium model of asset prices widely known as the CAPM and interpret its various representations. The CAPM provides an explicit formula for the trade-off between risk and expected returns and shows the important role of the market portfolio.

4.2.1 Derivation of the Capital Asset Pricing Model

The CAPM was derived independently by Sharpe (1964), Lintner (1965), and Mossin (1966). It assumes an optimizing behavior of investors and an equilibrium condition of asset supply and demand. We consider the framework introduced in Section 3.4, with one risk-free asset and n risky assets. We also denote by Y_{t+1} = p_{t+1} - (1 + r_t) p_t the vector of excess gains on risky assets. Suppose that there are M investors, i = 1, ..., M,

Figure 4.2 The Causal Chain in the Recursive Scheme


who possess the same information at time t. Their behavior obeys the mean-variance optimization described in Section 3.4. The traders are characterized by absolute individual risk aversion coefficients A_i, i = 1, ..., M, and their total demand is (see Proposition 3.5)

    α_t^d = Σ_{i=1}^M α*_{i,t} = Σ_{i=1}^M (1/A_i) [V_t(Y_{t+1})]^{-1} E_t(Y_{t+1})
          = (1/A) [V_t(Y_{t+1})]^{-1} E_t Y_{t+1},

where A, equal to the harmonic average [Σ_{i=1}^M 1/A_i]^{-1} of the individual risk aversion coefficients, may be interpreted as the absolute risk aversion coefficient of a representative investor.

In addition, let us denote by α_t^s = Z_t b the asset supply and assume that it is linearly driven by a set of exogenous regressors Z_t with the sensitivity parameter b. The exogeneity assumption is justified if the supply shocks are due to the issue of new asset shares at a frequency assumed to be much lower than the trading frequency of investors. It has to be emphasized that, in the literature, exogenous supply is traditionally assumed to be time independent, α_t^s = b_0 (say). This is a strong assumption that needs to be tested since it eliminates from the analysis all dynamics due to the quantity effects.

The equilibrium condition implies

    α_t^d = α_t^s
    ⟺ (1/A)(V_t Y_{t+1})^{-1} E_t Y_{t+1} = α_t^s
    ⟺ E_t Y_{t+1} = A (V_t Y_{t+1}) Z_t b,    (4.7)

so that

    Y_{t+1} = A (V_t Y_{t+1}) Z_t b + ε_{t+1},

where the expectation error ε_{t+1} satisfies the martingale difference condition E_t ε_{t+1} = 0. At equilibrium, the expected excess gain depends on three components:

• The representative absolute risk aversion coefficient A: If this coefficient increases, the expected excess gain has to grow to compensate investors with a higher risk aversion.
• The risk effect V_t Y_{t+1}: A higher risk implies a higher reward for risk bearing; the relation between the expected excess gain and the volatility V_t Y_{t+1} determines the risk premium.
• The supply Z_t b: When the exogenous supply increases, the expected


price also has to increase to attract investors and to ensure the matching of demand and supply.

REMARK 4.2: A similar equilibrium condition can be derived in terms of excess returns. Let us denote the excess returns by

    Ỹ_{t+1} = [diag(p_t)]^{-1} [p_{t+1} - (1 + r_t) p_t] = [diag(p_t)]^{-1} Y_{t+1}.

The total demand becomes

    (1/A) [V_t(Ỹ_{t+1})]^{-1} E_t(Ỹ_{t+1}),

and the equilibrium condition

    (1/A) [V_t(Ỹ_{t+1})]^{-1} E_t(Ỹ_{t+1}) = d_t^s,

where d_t^s = diag(p_t) α_t^s gives the value of the allocation vector.
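The equilibrium condition (4.7) can be illustrated with made-up numbers: fixing the conditional variance and the supply, the implied expected excess gain makes the mean-variance demand exactly absorb the supply:

```python
import numpy as np

# Illustrative values for the equilibrium condition (4.7).
A = 2.0                                   # representative risk aversion
VY = np.array([[0.04, 0.01],
               [0.01, 0.09]])             # conditional variance of excess gains
supply = np.array([10.0, 5.0])            # alpha_t^s = Z_t b

EY = A * VY @ supply                      # expected excess gain at equilibrium
demand = np.linalg.solve(VY, EY) / A      # (1/A)(V_t Y_{t+1})^{-1} E_t Y_{t+1}
assert np.allclose(demand, supply)        # demand matches the exogenous supply
```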

4.2.2 Case of Vector Autoregressive Net Gain of Order 1

The previous structural CAPM can be nested in a dynamic VAR representation. Let us assume that the excess gains are conditionally homoscedastic,

    V_t Y_{t+1} = Ω,  say,    (4.8)

and that the predetermined supply is a function of excess gains, that is, the issuing of new shares depends on the observed excess gains:

    Z_t b = b_0 + γ' Y_t.    (4.9)

The equilibrium condition becomes

    Y_{t+1} = A Ω b_0 + A Ω γ' Y_t + ε_{t+1},    (4.10)

which is an autoregressive model of order 1. The underlying equilibrium model implies no restrictions on the parameters of the VAR representation. In fact, the number of parameters in the descriptive VAR representation is (3/2) n(n+1) [n for the constant term, n² for the autoregressive matrix, and n(n+1)/2 for the variance-covariance matrix of the error term], and it is less than the number of structural parameters, (3/2) n(n+1) + 1 [n for b_0, n² for γ, n(n+1)/2 for the variance-covariance matrix, and 1 for the risk aversion coefficient].
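The parameter bookkeeping can be verified mechanically (a pure-Python sketch of the counts stated above):

```python
# Counting parameters for the VAR representation implied by (4.10):
# n + n^2 + n(n+1)/2 = (3/2) n(n+1), one fewer than the structural
# parameterization (b_0, gamma, Omega, and the risk aversion A).
for n in range(1, 10):
    var_params = n + n**2 + n * (n + 1) // 2        # constant, AR matrix, error covariance
    structural = n + n**2 + n * (n + 1) // 2 + 1    # b_0, gamma, Omega, A
    assert var_params == 3 * n * (n + 1) // 2
    assert structural == var_params + 1
```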

4.2.3 Static Supply and Market Portfolio

For illustration, let us consider a static asset supply z_t^b = b_0, as is traditional in the literature. The relationship between the number of parameters in the VAR representation of the excess gains [now equal to n + n(n + 1)/2] and the number of structural parameters is different when the total number of shares a^s = b_0 is known. In such a case, there are n(n + 1)/2 + 1 independent parameters, and the equilibrium condition implies n - 1 constraints on the autoregressive parameters. These implicit constraints can be found easily. Let us introduce the market portfolio, with allocation in risky assets a_m = b_0. This portfolio generates an excess gain equal to ỹ_{m,t+1} = (a_m)' ỹ_{t+1} = b_0' ỹ_{t+1}.

PROPOSITION 4.3: In the affine regression of ỹ_{t+1} on ỹ_{m,t+1},

ỹ_{t+1} = c + β ỹ_{m,t+1} + u_{t+1},

with E(u_{t+1}) = 0, Cov(u_{t+1}, ỹ_{m,t+1}) = 0, the constant term c is equal to 0.

PROOF: From (4.10) with γ = 0, we have ỹ_{t+1} = A Ω b_0 + ε_{t+1}, and we deduce E(ỹ_{t+1}) = A Ω b_0, V(ỹ_{t+1}) = V(ε_{t+1}) = Ω. In the regression of Proposition 4.3, the coefficient vector β is equal to

β = Cov(ỹ_{t+1}, ỹ_{m,t+1}) / V(ỹ_{m,t+1}) = Cov(ỹ_{t+1}, b_0' ỹ_{t+1}) / V(b_0' ỹ_{t+1}) = Ω b_0 / (b_0' Ω b_0).

The constant term is equal to

c = E(ỹ_{t+1}) - β E(ỹ_{m,t+1}) = A Ω b_0 - [Ω b_0 / (b_0' Ω b_0)] A b_0' Ω b_0 = 0.

QED

Therefore, the initial static model

ỹ_{t+1} = A Ω b_0 + ε_{t+1}

can also be written as

ỹ_{t+1} = β ỹ_{m,t+1} + u_{t+1},
ỹ_{m,t+1} = A b_0' Ω b_0 + ε_{m,t+1}, with ε_{m,t+1} = b_0' ε_{t+1}.

This is a kind of "recursive" form, with the excess gain of the market portfolio being determined from the second equation. On substituting into the first subsystem, it yields the excess gains of the basic assets. The reader should be aware that this interpretation is misleading, since ỹ_{t+1} and ỹ_{m,t+1} = b_0' ỹ_{t+1} are simultaneously determined. The following proposition may be viewed as the reciprocal of Proposition 4.3.

PROPOSITION 4.4: Let us consider the static model defined by (4.10) with γ = 0, and introduce a portfolio with allocations a. The excess gain of this portfolio is ỹ_{t+1}(a) = a' ỹ_{t+1}. The constant term in the regression

ỹ_{t+1} = c(a) + β(a) ỹ_{t+1}(a) + u_{t+1}(a)

is equal to 0 if and only if the portfolio a is proportional to b_0, that is, if it is a mean-variance efficient portfolio.

PROOF: The regression coefficients are

β(a) = Cov(ỹ_{t+1}, ỹ_{t+1}(a)) / V(ỹ_{t+1}(a)) = Ω a / (a' Ω a),

c(a) = E(ỹ_{t+1}) - β(a) E[ỹ_{t+1}(a)] = A Ω b_0 - [Ω a / (a' Ω a)] A a' Ω b_0.

The constant term c(a) vanishes if and only if

Ω b_0 = [Ω a / (a' Ω a)] a' Ω b_0 ⟺ a = [a' Ω a / (a' Ω b_0)] b_0 ⟺ a is proportional to b_0,

which completes the proof. QED

REMARK 4.3: We obtain similar properties for excess returns instead of excess gains if the total value of the supplied assets d_t^s is constant (see Remark 4.1). Since d_t^s = diag(p_t) a_t^s and prices are not constant, only one of the static assumptions on a_t^s and d_t^s can be satisfied.

4.2.4

Test of the Capital Asset Pricing Model Hypothesis

As shown in Section 4.2.2, a complete analysis of the equilibrium condition requires the specification of the exogenous supply and includes a joint study of prices and volumes. In the literature, the following three implications of the equilibrium condition are emphasized:

1. The efficiency of the market portfolio;
2. The fact that the cross-sectional variation of expected excess returns is entirely captured by the betas;
3. The nonnegativity of the market risk premium.

We discuss below the procedures for testing the first two equilibrium implications. They all require a precise choice of the risk-free asset and of the market portfolio. Usually, a 1-month T-bill is used as a proxy for the risk-free asset, and a market index is the proxy for the market portfolio (see Chapter 15). However, this choice is conventional, and other proxies can be selected. It is important to realize, however, that the test results can depend significantly on the choice of proxies and can become invalid in the presence of heteroscedasticity (see Chapter 6). As a framework, we use regressions of excess gains and excess returns associated with the CAPM, with the same notation Y for both.

Efficiency of the Market Portfolio

This testing procedure is based on Propositions 4.3 and 4.4. We consider the regressions of the excess gains (or excess returns) Y_{j,t} of the assets on the market excess gains (or returns) Y_{m,t}:

Y_{j,t} = c_j + β_j Y_{m,t} + u_{j,t}, j = 1, ..., n, t = 1, ..., T. (4.11)

The market portfolio is mean-variance efficient if and only if

H_0 = {c_1 = c_2 = ... = c_n = 0}. (4.12)

Model (4.11) is a SUR model with parameters that can be estimated by OLS. Standard software can provide the estimate ĉ = (ĉ_1, ..., ĉ_n)' and the estimated variance-covariance matrix of the estimators V̂_ĉ. At this point, it is advantageous to use a heteroscedasticity-adjusted variance estimator for V̂_ĉ to take into account the possible conditional heteroscedasticity of the error term u_{j,t} (see, e.g., Gibbons 1982; Jobson and Korkie 1982; Kandel and Stambaugh 1987). The test of the null hypothesis H_0 is based on the Wald statistic

ξ_W = ĉ' (V̂_ĉ)^{-1} ĉ, (4.13)

which measures the distance of ĉ from 0. It may be proven that, under the null hypothesis, this statistic asymptotically follows a chi-square distribution with n - 1 degrees of freedom: ξ_W ~ χ²(n - 1). Therefore, the testing procedure consists of accepting the efficiency hypothesis if ξ_W < χ²_{95%}(n - 1), and rejecting it otherwise, where χ²_{95%}(n - 1) is the 95th percentile of the χ²(n - 1) distribution.

REMARK 4.4: Let us explain why the number of degrees of freedom is n - 1 instead of n. The CAPM is a degenerate SUR model. Indeed, the explanatory variable Y_{m,t} = a'Y_t is a linear combination of the endogenous variables Y_t. It implies that, under the null hypothesis H_0, u_t = (Id - βa')Y_t. It may be checked easily that Id - βa' is not of full rank, and that the variance-covariance matrix of u_t is not invertible. Nevertheless, it is possible to apply a standard estimation technique and to correct only the number of degrees of freedom to n - 1 instead of n (see Gourieroux et al. 1997, Section IV.3.4).
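A minimal sketch of the Wald test (4.13) on simulated data in which the efficiency hypothesis holds by construction. For simplicity, the market proxy here is an exogenous index and the errors are homoscedastic, so the statistic would be compared with a χ²(n) critical value; in the degenerate case of Remark 4.4, n - 1 degrees of freedom apply instead. All numerical values are invented:

```python
import numpy as np

# Wald efficiency test (4.13) on simulated data in which the CAPM intercepts
# c_j are 0 by construction; every numerical value is invented for the sketch.
rng = np.random.default_rng(1)
n, T = 5, 2000
betas = rng.uniform(0.5, 1.5, size=n)
y_m = 0.05 + 0.2 * rng.standard_normal(T)      # market excess returns
Y = y_m[:, None] * betas[None, :] + 0.1 * rng.standard_normal((T, n))

X = np.column_stack([np.ones(T), y_m])         # same regressors for every asset
XtX_inv = np.linalg.inv(X.T @ X)
coef = XtX_inv @ X.T @ Y                       # OLS equation by equation
resid = Y - X @ coef
c_hat = coef[0]                                # estimated intercepts c_j

# Estimated variance of c_hat (homoscedastic case): (X'X)^{-1}_{11} * Sigma_u
Sigma_u = resid.T @ resid / (T - 2)
V_c = XtX_inv[0, 0] * Sigma_u
xi_w = float(c_hat @ np.linalg.inv(V_c) @ c_hat)   # Wald statistic (4.13)
print(xi_w)   # compared with a chi-square critical value
```

Since the null hypothesis is true in this simulation, the statistic stays in the bulk of the chi-square distribution and the efficiency hypothesis is accepted.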

Cross-Sectional Regressions

Cross-sectional regressions were first developed by Blume and Friend (1973) and Fama and MacBeth (1973). Under the CAPM hypothesis, we deduce from Proposition 4.3 that the marginal expectation of the excess gain satisfies an exact linear relationship with the beta:

μ_j = E(Y_{j,t}) = λ β_j, j = 1, ..., n, (4.14)

where λ = E[Y_{m,t}] > 0. This is called the security market line. When the series (Y_{j,t}) is stationary, in the first step we can approximate μ_j and β_j by their historical counterparts:

μ̂_{j,T} = (1/T) Σ_{t=1}^T Y_{j,t}, β̂_{j,T} = Σ_{t=1}^T (Y_{j,t} - μ̂_{j,T})(Y_{m,t} - μ̂_{m,T}) / Σ_{t=1}^T (Y_{m,t} - μ̂_{m,T})². (4.15)

To test the null hypothesis H_0 = {∃λ > 0: μ_j = λ β_j, ∀j}, in the second step we run a cross-sectional regression of μ̂_{j,T} on β̂_{j,T} for j = 1, ..., n:

μ̂_{j,T} = a_0 + a_1 β̂_{j,T} + v_{j,T}, j = 1, ..., n. (4.16)

Then, given the OLS estimators â_0 and â_1 and the associated residuals v̂_{j,T}, we check whether (1) â_0 is statistically nonsignificant, (2) â_1 is significantly positive, and (3) the residuals are close to 0. When we compare regression model (4.16) with the constraint in expression (4.14), we see that, under the CAPM hypothesis, the error term is equal to

v_{j,T} = μ̂_{j,T} - μ_j - λ(β̂_{j,T} - β_j).

Therefore, regression (4.16) involves two variables, μ_j and β_j, observed with measurement errors. A large body of the financial literature on regression model (4.16) has focused on the approximation of the explanatory variable β_j by β̂_{j,T} (see, e.g., the discussion in Huang and Litzenberger 1988). It is a common belief that the OLS estimators are not consistent and have to be replaced by instrumental variable estimators (see Chapter 8). A typical approach, proposed by Fama and MacBeth (1973), consists of dividing the assets into subsets. Let us partition the set of assets into subsets J_1, ..., J_K of sample size J (say). Then, we aggregate the μ̂_{j,T} and β̂_{j,T} within the subsets to get

μ̄_{k,T} = (1/J) Σ_{j∈J_k} μ̂_{j,T}, β̄_{k,T} = (1/J) Σ_{j∈J_k} β̂_{j,T},

and run the OLS regression of μ̄_{k,T} on β̄_{k,T}. The aim is to reduce the error in the variable β̄_{k,T} - β̄_k by collecting assets into portfolios. Alternative bias correction methods have also been proposed by Litzenberger and Ramaswamy (1979) and Shanken (1985). Despite the common belief, the OLS estimators are consistent. Indeed, the OLS estimator of a_1 is

â_{1,T} = [Σ_{j=1}^n β̂_{j,T} μ̂_{j,T} - (1/n)(Σ_{j=1}^n β̂_{j,T})(Σ_{j=1}^n μ̂_{j,T})] / [Σ_{j=1}^n β̂²_{j,T} - (1/n)(Σ_{j=1}^n β̂_{j,T})²].

When T tends to infinity, this expression tends to

lim â_{1,T} = [Σ_{j=1}^n β_j μ_j - (1/n)(Σ_{j=1}^n β_j)(Σ_{j=1}^n μ_j)] / [Σ_{j=1}^n β_j² - (1/n)(Σ_{j=1}^n β_j)²] = λ,

under the CAPM constraint (4.14). In fact, for large T, the error in the variable β̂_{j,T} - β_j is small, and there is no need for bias correction. The argument given above is based on asymptotic theory and requires a large number of observations, such as T ≥ 300. In finite samples, the aggregation procedure proposed by Fama and MacBeth (1973) may increase the accuracy of the estimators, although this effect has not been thoroughly examined. In any case, it is sensitive with respect to the partition J_1, ..., J_K. Finally, note that it is necessary to take into account the heteroscedasticity of the error term v_{j,T} while computing the variance of the OLS estimators, aggregates, and test statistics. Under the CAPM hypothesis, we get heteroscedastic variances of the errors v_{j,T}

and a nondiagonal covariance matrix of the error terms.
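The two-pass procedure can be sketched on data simulated under the security market line (4.14); the parameter values below are arbitrary illustrations, and, consistent with the consistency argument above, the plain OLS slope recovers λ:

```python
import numpy as np

# Two-pass cross-sectional regression (4.15)-(4.16) on data simulated under
# the security market line mu_j = lambda * beta_j; all values are invented.
rng = np.random.default_rng(2)
n, T = 50, 5000
lam = 0.06                                     # market risk premium lambda
betas = rng.uniform(0.2, 1.8, size=n)
y_m = lam + 0.25 * rng.standard_normal(T)      # market excess returns
Y = y_m[:, None] * betas[None, :] + 0.2 * rng.standard_normal((T, n))

# First pass: historical means and betas, asset by asset, as in (4.15)
mu_hat = Y.mean(axis=0)
dm = y_m - y_m.mean()
beta_hat = (Y - mu_hat).T @ dm / (dm @ dm)

# Second pass: cross-sectional OLS of mu_hat on a constant and beta_hat (4.16)
Z = np.column_stack([np.ones(n), beta_hat])
a0, a1 = np.linalg.lstsq(Z, mu_hat, rcond=None)[0]
print(a0, a1)   # a0 should be near 0, a1 near lambda
```

For a large T, the measurement error in β̂_{j,T} is small, and the intercept and slope estimates land near 0 and λ without any errors-in-variables correction.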

4.3

Causality

The aim of causality theory is to describe dynamic interactions between time series and to reveal their independent movements. To simplify the exposition, we consider a bivariate VAR process:


Y_{1,t} = μ_1 + φ_{1,1} Y_{1,t-1} + φ_{1,2} Y_{2,t-1} + ε_{1,t},
Y_{2,t} = μ_2 + φ_{2,1} Y_{1,t-1} + φ_{2,2} Y_{2,t-1} + ε_{2,t}, (4.17)

where the weak white noise (ε_{1,t}, ε_{2,t})' admits a variance-covariance matrix

Ω = | ω_{1,1}  ω_{1,2} |
    | ω_{1,2}  ω_{2,2} |.

We also introduce the additional regressions of each series on its own past,

Y_{1,t} = μ̃_1 + φ̃_1 Y_{1,t-1} + ε̃_{1,t},
Y_{2,t} = μ̃_2 + φ̃_2 Y_{2,t-1} + ε̃_{2,t}, (4.18)

and the following regressions that account for the effects of the other contemporaneous endogenous components:

Y_{1,t} = m_1 + ψ_{1,0} Y_{2,t} + ψ_{1,1} Y_{1,t-1} + ψ_{1,2} Y_{2,t-1} + u_{1,t},
Y_{2,t} = m_2 + ψ_{2,0} Y_{1,t} + ψ_{2,1} Y_{1,t-1} + ψ_{2,2} Y_{2,t-1} + u_{2,t}. (4.19)

4.3.1 The Noncausality Hypotheses

The noncausality hypotheses are defined in terms of linear predictions. We say, first, that Y_2 does not Granger cause Y_1 if and only if the best linear prediction of Y_{1,t} given Y_{1,t-1} and Y_{2,t-1} does not depend on Y_{2,t-1}. In the VAR(1) framework, this hypothesis is equivalent to the nullity constraint on φ_{1,2}:

H_{2→1} = {φ_{1,2} = 0}.
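The Granger noncausality hypotheses can be tested by comparing the residual variances of the unrestricted regressions (4.17) and the restricted regressions (4.18). A sketch on a simulated bivariate system (all coefficients invented; Y_1 causes Y_2, but Y_2 does not cause Y_1):

```python
import numpy as np

# Testing Granger noncausality by comparing residual variances of the
# unrestricted (4.17) and restricted (4.18) regressions. The simulated
# system below is invented: Y1 Granger causes Y2, but not the reverse.
rng = np.random.default_rng(3)
T = 5000
y1 = np.zeros(T); y2 = np.zeros(T)
for t in range(1, T):
    y1[t] = 0.5 * y1[t - 1] + rng.standard_normal()              # phi_{1,2} = 0
    y2[t] = 0.3 * y2[t - 1] + 0.4 * y1[t - 1] + rng.standard_normal()

def resid_var(y, regressors):
    X = np.column_stack([np.ones(len(y))] + regressors)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.mean((y - X @ b) ** 2)

# H_{1->2}: restricted vs unrestricted regression for Y2 (large: rejected)
v_unr = resid_var(y2[1:], [y1[:-1], y2[:-1]])
v_res = resid_var(y2[1:], [y2[:-1]])
stat_1to2 = (T - 1) * np.log(v_res / v_unr)

# H_{2->1}: same comparison for Y1 (small: accepted)
w_unr = resid_var(y1[1:], [y1[:-1], y2[:-1]])
w_res = resid_var(y1[1:], [y1[:-1]])
stat_2to1 = (T - 1) * np.log(w_res / w_unr)
print(stat_1to2, stat_2to1)
```

Each statistic is approximately χ²(1) under the corresponding noncausality hypothesis, so the first one is far in the rejection region while the second is not.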

It may also be characterized by comparing regressions (4.17) and (4.18), since

H_{2→1} = {φ_{1,2} = 0} = {V(ε_{1,t}) = V(ε̃_{1,t})}. (4.20)

Both residual variances coincide if and only if the regressor Y_{2,t-1} is not relevant, that is, has no explanatory power. Second, similarly, the process Y_1 does not Granger cause Y_2 if and only if the best linear prediction of Y_{2,t} given Y_{1,t-1} and Y_{2,t-1} does not depend on Y_{1,t-1}. In the above framework, this hypothesis is

H_{1→2} = {φ_{2,1} = 0} = {V(ε_{2,t}) = V(ε̃_{2,t})}. (4.21)

Third, we can introduce a test for the absence of an instantaneous relationship (or simultaneity) between the two processes. We say that Y_1 does not instantaneously cause Y_2 if and only if the best linear predictor

of Y_{2,t} given Y_{1,t} and Y_{1,t-1}, Y_{2,t-1} does not depend on Y_{1,t}. This hypothesis can be expressed by the following equivalent formulas:

H_{1↔2} = {ψ_{2,0} = 0} = {V(u_{2,t}) = V(ε_{2,t})}
        = {ψ_{1,0} = 0} = {V(u_{1,t}) = V(ε_{1,t})} (4.22)
        = {ω_{1,2} = 0} = {det V[(ε_{1,t}, ε_{2,t})'] = V(ε_{1,t}) V(ε_{2,t})}.

In particular, under this hypothesis, both processes play a symmetric role: Y_1 does not instantaneously cause Y_2 if and only if Y_2 does not instantaneously cause Y_1. Fourth, when the three noncausality hypotheses are satisfied, the VAR(1) representation (4.17) simplifies to

Y_{1,t} = μ_1 + φ_{1,1} Y_{1,t-1} + ε_{1,t},
Y_{2,t} = μ_2 + φ_{2,2} Y_{2,t-1} + ε_{2,t},

where V(ε_{1,t}) = ω_{1,1}, V(ε_{2,t}) = ω_{2,2}, and Cov(ε_{1,t}, ε_{2,t}) = 0. We get distinct evolutions of the two processes, with, in particular, μ̃_1 = μ_1.

φ(L) u_t = θ(L) ε_t,

where (ε_t) is a weak white noise and the roots of φ and θ lie outside the unit circle, (Y_t) satisfies the autoregressive integrated moving average representation with orders p, 1, q, denoted ARIMA(p,1,q):

(1 - L) φ(L) Y_t = θ(L) ε_t, t ≥ p + 1.


It differs from the standard ARMA representation by the presence of a unit root in the autoregressive polynomial.

EXAMPLE 5.2, MULTIDIMENSIONAL PROCESS: Let us consider a multivariate autoregressive representation of order 1 of the process (Y_t):

Φ(L) Y_t = (Id - Φ L) Y_t = ε_t.

We assume that the eigenvalues of the autoregressive matrix Φ have a modulus strictly less than 1, except for one of them, which is equal to 1. This condition can be written using the determinant of Φ(L):

det Φ(L) = (1 - L) P(L),

where P(L) is a lag polynomial of degree n - 1 with roots lying outside the unit circle. Let us now introduce the adjoint matrix Ψ(L) of Φ(L), defined by

Ψ(L) Φ(L) = det Φ(L) Id = (1 - L) P(L) Id.

Ψ(L) is a matrix of lag polynomials of degree n - 1, which may be decomposed into

Ψ(L) = Ψ(1) + (1 - L) Ψ̃(L).

By inverting the autoregressive representation, we get

Y_t = Φ(L)^{-1} ε_t = [Ψ(L) / det Φ(L)] ε_t = {[Ψ(1) + (1 - L) Ψ̃(L)] / [(1 - L) P(L)]} ε_t,

or

Y_t = Ψ(1) [1 / ((1 - L) P(L))] ε_t + [Ψ̃(L) / P(L)] ε_t.

The first term of the decomposition is integrated of order 1, whereas the second term is stationary. The presence of the nonstationary component in the decomposition of (Y_t) implies that (Y_t) is I(1). Indeed, we have

(1 - L) Y_t = [Ψ(1) / P(L)] ε_t + (1 - L) [Ψ̃(L) / P(L)] ε_t,

which is stationary.

5.1.2 Fractional Process

In this section, the notion of integrated processes is extended by introducing a unit root of a fractional order. We restrict our attention to univariate time series.


DEFINITION 5.2: The process (Y_t) is ARIMA(p,d,q) if (1 - L)^d Y_t is a stationary ARMA(p,q) process, where

(1 - L)^d = 1 + Σ_{j=1}^∞ [d(d - 1)...(d - j + 1) / j!] (-L)^j = Σ_{j=0}^∞ [Γ(-d + j) / Γ(-d)] (L^j / j!),

and the Γ function is defined by Γ(v) = ∫_0^∞ exp(-x) x^{v-1} dx. We consider below the fractional order d taking values in [0,1].

For d = 0, we obtain a stationary ARMA process, whereas we get a nonstationary ARIMA(p,1,q) process for d = 1. By allowing d to take any value between 0 and 1, we close the gap between stationary and nonstationary processes. Proposition 5.1 provides the limiting value of d that separates the regions of stationary and nonstationary dynamics (see, e.g., Granger 1980; Hosking 1981).

PROPOSITION 5.1: The fractional process is (asymptotically) stationary if and only if d < 1/2.

When this condition is satisfied, we can derive the limiting behavior of the autocorrelation function for large lags. We get

ρ(h) ~ const · h^{2d-1}, for large h. (5.3)

For d < 1/2, we have 2d - 1 < 0, and the autocorrelations tend to 0 at a hyperbolic rate, much slower than the geometric rate associated with standard ARMA models. The persistence increases with the fractional order d. The asymptotic expansion given in (5.3) suggests a crude estimation method for the fractional order d in the stationary case. It is the two-step procedure outlined below:

1. Estimate the autocorrelations by computing their empirical counterparts ρ̂(h), h varying.
2. Recognize that expression (5.3) implies that

log |ρ(h)| ≈ α + (2d - 1) log h, for large h,

and estimate its empirical counterpart by regressing log |ρ̂(h)| on 1 and log h, for large h.

The estimator of the fractional order is defined by d̂ = (1 + β̂)/2, where β̂ is the ordinary least squares (OLS) estimator of the coefficient on log h.
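The two-step procedure can be sketched as follows. The fractional noise is generated through a truncated moving average form of (1 - L)^{-d}; the sample size, truncation point, and lag range are arbitrary choices made only for the illustration:

```python
import numpy as np
from math import gamma

# Two-step estimation of d from the log-autocorrelogram regression.
# The fractional noise is simulated via a truncated MA(infinity) form of
# (1-L)^{-d}; T, m, and the lag range are ad hoc choices for the sketch.
rng = np.random.default_rng(4)
d_true, T, m = 0.3, 50_000, 1000

# MA coefficients psi_j = Gamma(j + d) / (Gamma(d) j!), computed recursively
psi = np.empty(m)
psi[0] = 1.0
for j in range(1, m):
    psi[j] = psi[j - 1] * (j - 1 + d_true) / j
# sanity check of the recursion against the Gamma-function formula
assert abs(psi[10] - gamma(10 + d_true) / (gamma(d_true) * gamma(11))) < 1e-9

x = np.convolve(rng.standard_normal(T + m), psi, mode="valid")[:T]

# Step 1: empirical autocorrelations; Step 2: regress log|rho(h)| on log h
x0 = x - x.mean()
lags = np.arange(20, 200)
rho = np.array([x0[h:] @ x0[:-h] / (x0 @ x0) for h in lags])
slope = np.polyfit(np.log(lags), np.log(np.abs(rho)), 1)[0]
d_hat = (1 + slope) / 2
print(d_hat)   # crude estimate of d (true value is 0.3 here)
```

As the text warns, the estimator is crude: the slope of the log-log regression is noisy, so d̂ only roughly recovers the true fractional order even in large samples.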


This estimator is not very accurate, and its asymptotic properties are difficult to derive (see, e.g., Geweke and Porter-Hudak 1982). More advanced estimation methods can be found in the literature (see, e.g., Robinson 1992). However, the asymptotic properties of fractional order estimators are known only for (ε_t) being a (Gaussian) strong white noise. Obviously, this assumption is violated by financial data. Moreover, the fractional order is sensitive to nonlinear transformations of time series. For example, while returns may appear uncorrelated, their squares typically display a long range of temporal dependence.

5.1.3 Persistence and Nonlinearity

Nonlinear Transformation

Let us illustrate the dependence of the fractional order of integration on nonlinear transformations. We consider the univariate process defined by

y_t = sgn(ε_t) |x_t|,

where (1 - L)^d x_t = ε_t, (ε_t) is a Gaussian strong white noise, and sgn(ε_t) = +1 if ε_t ≥ 0 and -1 otherwise. Both |y_t| and (y_t²) feature long-range persistence, since they depend on the fractional process x_t. However, sgn(y_t) = sgn(ε_t) is a strong white noise without memory. The strong dependence of the fractional order on nonlinear transformations of the data is well documented in return series (see, e.g., Ding, Engle, and Granger 1993). Figure 5.1 shows the autocorrelation function

Figure 5.1 Transformed Autocorrelogram (sample autocorrelograms of absolute returns raised to the powers 1, 1.5, 2, and 3, for lags up to 50)

y_t = μ + φ_1 y_{t-1} + ... + φ_p y_{t-p} + σ ε_t.

Let us now assume that the innovation process (ε_t, t ∈ Z) is a strong white noise with mean zero and unitary variance. Then, the conditional mean of the AR(p) can be written as a linear function of the lagged y's,

E(y_t | I_{t-1}) = μ + φ_1 y_{t-1} + ... + φ_p y_{t-p},

and its conditional variance remains constant in time:

V(y_t | I_{t-1}) = σ².

The nonlinear autoregressive models extend the previous specification in two respects. First, they express the conditional mean as a nonlinear function of the past of the process. Second, they accommodate the volatility movements by conditioning the variance on lagged observations. A nonlinear autoregressive model can be written as

y_t = μ(y_{t-1}, ..., y_{t-p}) + σ(y_{t-1}, ..., y_{t-p}) ε_t, (6.1)

where (ε_t, t ∈ Z) is a strong white noise with E(ε_t) = 0 and V(ε_t) = 1. Depending on the functional forms of the drift function μ and the volatility function σ, various extensions of the general specification (6.1) can be derived. For example, the drift and volatility functions may be a priori constrained to belong to a given parametric class under a semiparametric setup. Otherwise, the drift and volatility functions may be left unconstrained under a nonparametric approach.

6.1.2

Leptokurtosis

Table 6.1 Marginal Kurtosis

Sampling Frequency   Asset                        Period         Kurtosis
5 minutes            Bank of Montreal             October 1998     7.00
10 minutes           Bank of Montreal             October 1998     6.34
20 minutes           Bank of Montreal             October 1998     6.89
Daily                S&P 500 (adjusted)           1950-1992        1.48
Monthly              FTA Index                    1965-1990       15.96
Quarterly            Yield on 20-year UK gilts    1952-1988        5.93
Quarterly            Yield on 91-day UK gilts     1952-1988        4.22
Daily                Pound/dollar exchange rate   1974-1982        8.40

FTA: Financial Times Actuaries, all shares.

In the early 1960s, Mandelbrot (1963) and Fama (1963) documented empirical evidence on the heavy tails of the marginal distributions of asset returns. This finding had crucial implications for risk management, since thick-tailed distributions entail the frequent occurrence of extremely valued observations (see Chapter 16). The thickness of the tails is measured by the marginal kurtosis:

k = E(y_t - E y_t)^4 / (V y_t)², (6.2)

which is the ratio of the centered fourth-order moment to the squared variance. It is known that the kurtosis of a normally distributed variable is equal to 3. In Table 6.1, we report the empirical marginal kurtosis computed for various asset returns. Table 6.1 shows that the tails of the return distributions are heavier than the tails of the normal. Distributions sharing this property are called leptokurtic. Our evidence confirms that returns feature departures from normality: the normality assumption is violated by the marginal distributions of returns. Let us now explain how conditionally heteroscedastic models may reconcile this stylized fact with the normality assumption, by distinguishing between the marginal and conditional kurtosis. To illustrate this approach, let us consider an autoregressive process of order 1 with a Gaussian white noise:

y_t = μ(y_{t-1}) + σ(y_{t-1}) ε_t, ε_t ~ IIN(0, 1). (6.3)

The conditional distribution of y_t given y_{t-1} is normal and has a conditional kurtosis equal to 3. Let us now compute the statistic

C(y_t) = E(y_t - E y_t)^4 - 3 (V y_t)².


It is equal to zero if the marginal kurtosis is equal to 3. By substituting expression (6.3) for y_t, we get

C(y_t) = E{[μ(y_{t-1}) - Eμ(y_{t-1})] + σ(y_{t-1}) ε_t}^4 - 3 [E{[μ(y_{t-1}) - Eμ(y_{t-1})] + σ(y_{t-1}) ε_t}²]²
       = C[μ(y_{t-1})] + 6 E{σ²(y_{t-1}) [μ(y_{t-1}) - Eμ(y_{t-1})]²} - 6 E[μ(y_{t-1}) - Eμ(y_{t-1})]² E σ²(y_{t-1}) + 3 V σ²(y_{t-1}),

where C[μ(y_{t-1})] denotes the C-type moment of the variable μ(y_{t-1}). Therefore, the marginal kurtosis of y_t depends on the marginal kurtosis of μ(y_{t-1}) and on the variation of the volatility σ²(y_{t-1}). In particular, when μ(y_{t-1}) is constant (for instance, equal to 0), the formula becomes

C(y_t) = 3 V σ²(y_{t-1}). (6.4)

Therefore, conditionally Gaussian errors in (6.3) are compatible with a leptokurtic marginal distribution of returns, and the thickness of tails is related to the dispersion of volatility.
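The effect described by (6.4) is easy to reproduce by simulation: with a zero drift and conditionally Gaussian errors, any variation of σ²(y_{t-1}) produces a marginal kurtosis above 3. The ARCH(1)-type volatility function below is an arbitrary illustration:

```python
import numpy as np

# Conditionally Gaussian errors with a time-varying volatility produce a
# leptokurtic marginal distribution, as implied by (6.4). The ARCH(1)-type
# volatility function and its coefficients are invented for the sketch.
rng = np.random.default_rng(5)
T = 300_000
y = np.zeros(T)
for t in range(1, T):
    sigma2 = 0.5 + 0.3 * y[t - 1] ** 2        # varying sigma^2(y_{t-1})
    y[t] = np.sqrt(sigma2) * rng.standard_normal()

k = np.mean((y - y.mean()) ** 4) / np.var(y) ** 2
print(k)   # marginal kurtosis, above the Gaussian value 3
```

Every conditional distribution in this simulation is exactly normal (kurtosis 3), yet the marginal kurtosis clearly exceeds 3 because the volatility σ²(y_{t-1}) is dispersed.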

6.1.3

Parametric and Nonparametric Inference

A standard estimation method for dynamic models of non-Gaussian series is the quasi (pseudo) maximum likelihood, applied as if the errors were conditionally Gaussian. For clarity of exposition, we consider below a nonlinear autoregressive process of order 1.

Parametric Model

The model is defined by the equation

y_t = μ(y_{t-1}; θ) + σ(y_{t-1}; θ) ε_t,

where the drift and volatility functions depend on the unknown parameter θ. The quasi-maximum likelihood estimator (QMLE) of θ is obtained by maximizing the log-likelihood function computed as if the errors (ε_t, t ∈ Z) were Gaussian:

θ̂_T = Arg max_θ L_T(θ) = Arg max_θ Σ_{t=1}^T log l(y_t | y_{t-1}; θ), (6.5)

with


log l(y_t | y_{t-1}; θ) = -(1/2) log 2π - (1/2) log σ²(y_{t-1}; θ) - [y_t - μ(y_{t-1}; θ)]² / [2 σ²(y_{t-1}; θ)]. (6.6)

PROPOSITION 6.1: If (ε_t, t ∈ Z) is a strong white noise with E(ε_t) = 0 and V(ε_t) = 1, then the QML estimator θ̂_T is consistent, lim_{T→∞} θ̂_T = θ, and asymptotically normal:

√T (θ̂_T - θ) → N(0, J^{-1} I J^{-1}),

where

J = E_θ{-∂² log l(y_t | y_{t-1}; θ) / ∂θ ∂θ'}, I = E_θ{[∂ log l(y_t | y_{t-1}; θ) / ∂θ] [∂ log l(y_t | y_{t-1}; θ) / ∂θ']}.

Although the two matrices I and J are generally different, they coincide when the error term ε_t is Gaussian, that is, when the QML is identical to the maximum likelihood (ML). Other estimation methods may also be used, such as two-step least squares. Let us assume that the drift and volatility functions depend on distinct subsets of parameters:

y_t = μ(y_{t-1}; θ_1) + σ(y_{t-1}; θ_2) ε_t. (6.7)

The approach consists of the two steps outlined below. In the first step, θ_1 is estimated by nonlinear ordinary least squares. The estimator is

θ̂_{1,T} = Arg min_{θ_1} Σ_{t=1}^T [y_t - μ(y_{t-1}; θ_1)]².

In the second step, the first-step residuals

û_{t,T} = y_t - μ(y_{t-1}; θ̂_{1,T}), t = 1, ..., T,

are computed, and θ_2 is estimated by nonlinear least squares applied to the squared residuals. The estimator of θ_2 is

θ̂_{2,T} = Arg min_{θ_2} Σ_{t=1}^T [û²_{t,T} - σ²(y_{t-1}; θ_2)]².

In practice, a one-step QML method is preferred for two reasons. First, it is in general more accurate. Second, in financial applications, the existence of a risk premium implies that the parameters of the volatility equation are likely to appear in the conditional mean.
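The two-step least squares of (6.7) can be sketched for a linear drift and an ARCH(1)-type volatility function; both the functional forms and the true parameter values are invented for the illustration:

```python
import numpy as np

# Two-step least squares of (6.7): a linear drift estimated first by OLS,
# then ARCH(1)-type volatility parameters fitted by least squares on the
# squared residuals. All true parameter values are invented.
rng = np.random.default_rng(6)
T = 200_000
phi0, c0, a0 = 0.5, 0.3, 0.4
y = np.zeros(T)
for t in range(1, T):
    sigma2 = c0 + a0 * y[t - 1] ** 2
    y[t] = phi0 * y[t - 1] + np.sqrt(sigma2) * rng.standard_normal()

# Step 1: least squares for the drift (here linear, so plain OLS)
X = np.column_stack([np.ones(T - 1), y[:-1]])
b = np.linalg.lstsq(X, y[1:], rcond=None)[0]
u2 = (y[1:] - X @ b) ** 2                     # squared first-step residuals

# Step 2: least squares of the squared residuals on the volatility regressors
Z = np.column_stack([np.ones(T - 1), y[:-1] ** 2])
c_hat, a_hat = np.linalg.lstsq(Z, u2, rcond=None)[0]
print(b[1], c_hat, a_hat)   # compare with 0.5, 0.3, 0.4
```

Both steps are consistent, as the text indicates, although a one-step QML would typically be more accurate.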


Nonparametric Model

Let us now consider the autoregressive specification

y_t = μ(y_{t-1}) + σ(y_{t-1}) ε_t,

where the drift and volatility functions are a priori left unconstrained, except for the positivity restriction on the volatility function. They are considered as two functional parameters that can be estimated by a kernel-based quasi-maximum likelihood method (Pagan and Schwert 1990; Härdle and Vieu 1992). A kernel is a weighting function used for the local analysis of μ and σ. It is defined by a real function K that integrates to 1: ∫ K(u) du = 1. A standard example is the Gaussian kernel (Figure 6.1), defined by

K(u) = (1/√(2π)) exp(-u²/2).

The kernel-based QML estimators of μ and σ are defined by

[μ̂(y), σ̂(y)] = Arg max_{μ,σ} Σ_{t=1}^T [1/(T h_T)] K[(y_{t-1} - y)/h_T] log l(y_t | y_{t-1}; μ, σ)
             = Arg max_{μ,σ} Σ_{t=1}^T [1/(T h_T)] K[(y_{t-1} - y)/h_T] [-(1/2) log σ² - (y_t - μ)² / (2σ²)], (6.8)

Figure 6.1 Gaussian Kernel


where h_T, called the bandwidth, tends to 0 when T tends to infinity. The estimators have the explicit forms

μ̂_T(y) = Σ_{t=1}^T K[(y_{t-1} - y)/h_T] y_t / Σ_{t=1}^T K[(y_{t-1} - y)/h_T], (6.9)

σ̂²_T(y) = Σ_{t=1}^T K[(y_{t-1} - y)/h_T] y_t² / Σ_{t=1}^T K[(y_{t-1} - y)/h_T] - [μ̂_T(y)]². (6.10)

They can be interpreted as a weighted empirical mean and variance, with weights

K[(y_{t-1} - y)/h_T] / Σ_{τ=1}^T K[(y_{τ-1} - y)/h_T]

depending on the point y and on the number of observations. The asymptotic properties of the estimators follow from general properties of kernel M-estimators (see, e.g., Gourieroux, Monfort, and Tenreiro 2000).
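The explicit forms (6.9) and (6.10) are simple weighted moments; a sketch on simulated data with known drift and volatility functions (the functional forms, bandwidth, and sample size are all ad hoc choices for the illustration):

```python
import numpy as np

# Weighted-moment forms (6.9)-(6.10) of the kernel estimators, evaluated at
# a point, for data simulated with a known drift and volatility. Functional
# forms, bandwidth, and sample size are invented for the sketch.
rng = np.random.default_rng(7)
T, h = 50_000, 0.15
y = np.zeros(T)
for t in range(1, T):
    mu = 0.5 * np.tanh(y[t - 1])              # true drift mu(y)
    s2 = 0.2 + 0.1 * y[t - 1] ** 2            # true squared volatility
    y[t] = mu + np.sqrt(s2) * rng.standard_normal()

def kernel_fit(point):
    w = np.exp(-0.5 * ((y[:-1] - point) / h) ** 2)   # Gaussian kernel weights
    w /= w.sum()
    mu_hat = w @ y[1:]                               # weighted mean, (6.9)
    s2_hat = w @ y[1:] ** 2 - mu_hat ** 2            # weighted variance, (6.10)
    return mu_hat, s2_hat

mu0, s20 = kernel_fit(0.0)
print(mu0, s20)   # compare with mu(0) = 0 and sigma^2(0) = 0.2
```

Evaluating the pair of weighted moments on a grid of points would trace out the whole drift and volatility curves.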

PROPOSITION

I,

1,

00,

00

VEl

=

(ii) They are pointwise asymptotically normal: ...JThr [ar - a]

where

~ N [0, J(yr~~ryrl JK2(v)dv ].

a=(dry) ~(y) ), 2 )( ) _ E [ _ a 10g l(YIIYI_I; Y aaaa'

a) 1YI-I-Y -] ,

fly) is the value of the marginal probability density function (Pdf) of YI evaluated at y.

The formula of the asymptotic variance is similar to the formula derived in the parametric approach. A distinctive feature is the presence of a multiplicative factor j(~) JK2(v)dv and the expressions of information matrices I and J conditioned on YI_I = y.

124

Conditional Heteroscedasticity 6.1.4

Asymmetric Reactions

The nonlinear autoregressive specification allows for an asymmetric response of volatility to the sign of price changes (the so-called leverage effect, see Section 6.3.2). Indeed, stock market investors seem to react differently depending on whether the asset prices rise or fall. Moreover, these reactions may generate price discontinuities due to psychological factors, such as traders' preferences for round numbers or the slowdown of market indexes before exceeding thresholds of integer multiples of 100 (say). To examine these issues, Gourieroux and Monfort (1992) introduced a nonlinear autoregressive model with thresholds called the Qualitative Threshold Autoregressive Conditionally Heteroscedastic (QTARCH) Model. The specification, including two lags, is

Yt =

LL ,

(6.11)

J

where Ai, i varying, are fixed intervals defining the partition of the real line, and p.

(6.16)

128

Conditional Heteroscedasticity

o

100

200

300 a1 = 0.0

400

500

o

100

200

300 a1 = 0.2

400

500

o

100

200

300 a1 = 0.8

400

500

Figure 6.4

Squares of Simulated Paths of AReR( 1) Model

This is an ARMA representation for the transformed process (y~, t E Z) of an autoregressive order max(p,q) and a moving average order p. However, we need to be cautious about applying standard software designed for estimation of ARMA models to the series (y~, t E Z). Indeed, the innovation (u t ) of squared returns is, in general, conditionally heteroscedastic and also conditionally non-Gaussian (due to the positivity of y~). Therefore, standard software packages often provide erroneous standard errors of estimators and erroneous test statistics and prediction intervals. ARMA-GARCH

The ARCH and GARCH models introduced above represent processes with zero conditional means. Let us now examine processes with path dependent means, whose errors follow an ARCH or a GARCH process (ARMA-GARCH; Weiss 1984). For example, we can consider a linear regression model with GARCH errors:

Autoregressive Conditionally Heteroscedastic Models

{

129

Yt=Xtb + Ue.

where (u t) is a GARCH,

or an ARMA model with GARCH errors: {

(L)yt = 9(L)ue.

where (u t ) is a GARCH.

ARCH-M

The ARCH-M model (Engle, Lilien, and Robbins 1987) accounts for the risk premium by introducing the volatility into the conditional mean equation. For this reason the process is called ARCH-M for ARCH effect in the mean. This model can be written as {

Yt = x/J + ccr~ + Ue.

where (u t) is a GARCH and cr~ = V(u t I~t-l).

The coefficient c can be interpreted as the unitary price of risk. 6.2.2

How to Detect the Autoregressive Conditionally Heteroscedastic Effect

Before estimating the GARCH-type models, it is necessary to test for the presence of conditional heteroscedasticity. A straightforward approach follows. 1. We select a specification for the conditional mean by considering either a linear regression model Yt=xtb+ue.

or an ARMA model (L )Yt = 9(L )ue.

2.

where (u" t E Z) is assumed to be a weak white noise. The QML estimators of the parameters of the conditional mean are consistent (even if their standard errors produced by computer software are not correct). The first-step estimators can be used to approximate the error terms by either or A

cb(L)

u t = 8(L)Yt'

residuals, where the hat indicates that the parameters were replaced by the estimated values.

130

Conditional Heteroscedasticity

3.

We analyze the ARMA properties of the series (u;, t E Z) byexamining the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the approximating series (u~, t = 1, ... , T) to see if the standard identification test reveals statistically significant autoregressive or moving average patterns.

The following is the rationale for this approach: If we are interested in the evolution of risk (in general, measured by the second-order conditional moment), it is insufficient to examine the standard autocorrelogram p(y; h) = Cov(Yt>Yt-h)/VYt.

We also have to examine temporal dependence between squared errors: p2(U, h) = Cov(u~, U~h)/VU~

Cov{ [Yt - E(Yt 11t-I)]2[Yt-h - E(Yt--h 11t-h-d 2}

(6.17)

V[(Yt - E(Yt 11t-I»2] This expression is called the autocorrelogram of order 2. 6.2.3

Statistical Inference

Two·Step Estimation Method The quadratic specification of the volatility equation can be estimated using a software package designed for estimation of ARMA models. Let us consider an ARMA model with GARCH errors: (6.18) where cr~ = Vt-IUt satisfies cr~ = c + a,(L)u~ + ~(L)cr~ ~u~ = c + [a,(L) + ~(L)]u~ + 11t - ~(L)11t.

(6.19)

In the first step, the autoregressive and moving average parameters

and e are estimated using a computer package for estimation of ARMA

models. Then, we evaluate the associated residuals u~. In the second step, the ARMA representation in (6.19) is estimated by again applying ARMA software to the series of squared residuals u~. Although this approach is not very accurate, it is easy to implement and provides consistent estimators. They may be used later as initial values in an algorithm optimizing the quasi likelihood.

Q}.tasi·Maximum Likelihood The parameter estimators are derived by optimizing a log-likelihood function computed as if the innovations (u t ) were conditionally Gaussian. The

Autoregressive Conditionally Heteroscedastic Models

131

asymptotic properties of the method are similar to the properties given in Proposition 6.1. Generally, the quasi-likelihood function has a complicated analytical expression and its values have to be computed numerically. 6.2.4

Integrated Generalized Autoregressive Conditionally Heteroscedastic Models

Very often, the estimated parameters of the volatility equation are such that l:~1 d; + l:.f-I ~j = 1, that is, they suggest the presence of a unit root in the volatility equation. This is evidence of the so-called volatility persistence (see, e.g., Poterba and Summers 1986). However the interpretation of a unit root in the volatility equation is very different from its interpretation in the conditional mean equation. As shown below, it implies an infinite marginal variance of Yt without violating the stationarity property, whereas a unit root in the mean equation leads to a nonstationary process (see Chapter 5). To understand this phenomenon, we consider an example of the GARCH(l,l) model studied by Nelson (1990). The process is defined by l(YI IYI-I) =N(O,~),

where cr~ = c + ~~_I + a.Y~h a. ~ 0, and ~ ~ O. Equivalently, the volatility equation can be written

~ = c + (~ + aZ~l)cr~h

(6.20)

where (ZI) is a reduced Gaussian white noise. Therefore, the volatility follows a nonlinear autoregressive process of order 1 with a stochastic autoregressive coefficient. Its dynamic properties can be analyzed by examining the impulse response function, which measures the effect of a transitory shock to cr~ on future volatilities. The shock effect may quickly dissipate or persist. In our study, we distinguish the shock effect in average across many admissible trc:yectories and along a single path. The effect of an initial shock at horizon t is

o(do)

o(cri) = n (~+ aZ~_I)o(do). t=1
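Before computing the two multipliers analytically, a small Monte Carlo sketch (an illustration with arbitrary parameter values, not taken from the book) previews the contrast between them in the integrated case β + α = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 0.3, 0.7                       # integrated case: beta + alpha = 1
n_paths, t_max = 100_000, 200

Z2 = rng.standard_normal((n_paths, t_max)) ** 2
factors = beta + alpha * Z2                  # random coefficients beta + alpha Z^2

# On average a volatility shock never dies out: E(beta + alpha Z^2) = beta + alpha = 1
print("mean factor:", factors.mean())                       # ~ 1.0

# Yet along a single trajectory the multiplier collapses, since E log(beta+alpha Z^2) < 0
path_mult = np.cumprod(factors, axis=1)
print("E log factor:", np.log(factors).mean())              # negative
print("median multiplier at t=200:", np.median(path_mult[:, -1]))   # ~ 0
```

The average multiplier stays at 1 forever, while the median trajectory of the shock effect is essentially extinguished, which is exactly the distinction drawn below.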

Average Multiplier

For the average multiplier, we get

E[δ(σ_t²)] = E[∏_{τ=1}^t (β + α Z_{τ−1}²)] δ(σ_0²)
= ∏_{τ=1}^t E(β + α Z_{τ−1}²) δ(σ_0²)     (6.21)
= [E(β + α Z²)]^t δ(σ_0²)
= (β + α)^t δ(σ_0²),

since the (Z_t) are independent with E(Z²) = 1.

Let (y_t) be the process to be predicted and I_t = (x_t, x_{t−1}, ...) the information available at date t. The predictions h steps ahead depend on the date t and the horizon h. The prediction of y_{t+h} made at time t is

_tŷ_{t+h} = g_{t,h}(x_t, x_{t−1}, ...),     (7.1)

whereas the prediction errors are

_tε̂_{t+h} = y_{t+h} − _tŷ_{t+h}.     (7.2)

In this framework, the prediction functions g_{t,h} are doubly indexed by the prediction origin t and the prediction horizon h, and so are the processes of predictions and prediction errors. Depending on the approach to prediction making, the following scenarios can be considered:

1. A fixed forecast origin t and a varying prediction horizon h yields the term structure of predictions. It shows how a prediction made at time t depends on the horizon h (or term).

2. A varying (increasing) prediction origin and a varying (shrinking) prediction horizon for predicting the same future value y_T illustrates the prediction updating for y_T. It displays the evolution of predictions of the terminal value y_T, where T = t + h is fixed, whereas t and h = T − t are jointly varying.

3. A fixed prediction horizon and a varying prediction origin result in a sequence of predictions at a fixed horizon h. This type of prediction is often made "in-sample" (as opposed to "out-of-sample" predictions) to assess their accuracy. The approach consists of comparing the true observed values of a series (y_t) to a sequence of in-sample predictions. For example, for h = 1, we would compare (y_t) to (_{t−1}ŷ_t).

Table 7.1  Approaches to Prediction Making

Date of                 Variable to Be Predicted
Prediction   y_T     y_{T+1}       y_{T+2}       y_{T+3}       y_{T+4}
T            y_T     _Tŷ_{T+1}     _Tŷ_{T+2}     _Tŷ_{T+3}     _Tŷ_{T+4}
T+1          y_T     y_{T+1}       _{T+1}ŷ_{T+2} _{T+1}ŷ_{T+3} _{T+1}ŷ_{T+4}
T+2          y_T     y_{T+1}       y_{T+2}       _{T+2}ŷ_{T+3} _{T+2}ŷ_{T+4}

The described approaches to prediction making are summarized in Table 7.1. In prediction updating, it is important to distinguish between the increasing sequence of information sets I_t = (x_t, x_{t−1}, ...), which comprise the whole history of x and grow with each arriving observation, and the sequence of information sets effectively used for repeated predicting. Intuitively, it may become technically cumbersome to build predictions based on the entire history of x. Instead, we can select a prediction function based, for example, only on the last (i.e., the most recent) observation on x. In this case, we disregard the remaining available information, and each prediction update uses an information set containing the same number of elements (i.e., one element in our example). Thus, unlike the sequence of information sets comprising the growing history of x, the sequence of information sets effectively used for predicting is not necessarily increasing. Moreover, the information sets may contain endogenous components, that is, lagged values of the process of interest y. Let us introduce an additional process z and decompose the process x into x_t = (y_{t−1}, z_t). The predictions take the form

_tŷ_{t+h} = g_{t,h}(y_{t−1}, y_{t−2}, ...; z_t, z_{t−1}, ...),     (7.3)

that is, they include an autoregressive component representing the dependence on the lagged values y_{t−1}, y_{t−2}, ..., and an exogenous component z_t, z_{t−1}, .... Finally, there exist stationary expectation schemes, for which the prediction function g_{t,h} = g_h depends on the prediction horizon but is independent of the prediction origin t. To this category belongs the adaptive scheme discussed below.

7.1.2 Adaptive Scheme

The adaptive scheme was the first prediction scheme introduced in the economic literature, and it remains popular among practitioners. It was originally proposed by Fisher (1930a, 1930b) (see also Arrow 1959; Nerlove 1958; Friedman 1957 for historical applications). The adaptive scheme consists of simple prediction updating with a fixed horizon:

_tŷ_{t+h} = λ_h _{t−1}ŷ_{t+h−1} + (1 − λ_h) y_t.     (7.4)

The prediction made at time t is a weighted average of the previously made prediction at horizon h and the last available observation of the process. The smoothing coefficient λ_h may depend on the horizon and is assumed to lie between 0 and 1. Since the smoothing coefficient is time invariant, the expectation scheme is stationary. In the limiting case λ_h = 0, we obtain the naive expectation scheme _tŷ_{t+h} = y_t, for which the prediction is equal to the current value of the process. For the horizon h = 1, the adaptive scheme admits the following equivalent interpretations. An equivalent form of equation (7.4) is the error correction mechanism:

_tŷ_{t+1} = _{t−1}ŷ_t + (1 − λ_1)(y_t − _{t−1}ŷ_t).     (7.5)

When the prediction error at time t is positive, the prediction is adjusted to a larger value using the adjustment coefficient (1 − λ_1). The larger λ_1 is, the smaller is the necessary correction. Predictions can also be recursively substituted into (7.4) to obtain the extrapolation formula

_tŷ_{t+1} = (1 − λ_1) y_t + (1 − λ_1) λ_1 y_{t−1} + ... + (1 − λ_1) λ_1^p y_{t−p} + ...,     (7.6)

where the prediction is expressed as a geometrically weighted average of current and lagged observations of the process. This approach is called exponential smoothing.
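A minimal implementation of this smoothing recursion at horizon h = 1 (a hypothetical helper function, with illustrative values only):

```python
def adaptive_forecasts(y, lam):
    """One-step adaptive predictions (7.4): pred[t+1] = lam*pred[t] + (1-lam)*y[t].

    pred[t] is the forecast of y[t] made at date t-1; pred[0] is initialized at y[0].
    """
    pred = [y[0]]
    for obs in y[:-1]:
        pred.append(lam * pred[-1] + (1 - lam) * obs)
    return pred

# On a constant series the scheme is a perfect foresight (cf. Example 7.1 below).
print(adaptive_forecasts([5.0] * 6, lam=0.4))   # [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
```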

Any expectation scheme unrelated to the dynamics of the series of interest is expected to have undesirable properties, such as bias or poor accuracy. The performance of the adaptive scheme depends on the process to be predicted. The following examples illustrate this issue.

EXAMPLE 7.1: Let us consider a constant series y_t = c. By applying (7.6), we get

_tŷ_{t+1} = (1 − λ_1) Σ_{j=0}^∞ λ_1^j c = c = y_{t+1}.

This is a perfect foresight with a zero expectation error.

EXAMPLE 7.2: Let us consider a deterministic exponential growth y_t = c ρ^t, with ρ > 1. We get

_tŷ_{t+1} = λ_1 _{t−1}ŷ_t + (1 − λ_1) c ρ^t.

This is a linear recursive equation with solutions of the type

_tŷ_{t+1} = (1 − λ_1) c ρ^{t+1} / (ρ − λ_1) + A λ_1^{t+1},

where A is an arbitrary constant. For large t, we get approximately

_tŷ_{t+1} ≈ (1 − λ_1) c ρ^{t+1} / (ρ − λ_1) = [(1 − λ_1)/(ρ − λ_1)] y_{t+1},

and observe that the relative expectation error is constant.

EXAMPLE 7.3: If y_t = ε_t − θ ε_{t−1} admits a moving average representation of order 1, the "optimal" prediction is

_tŷ_{t+1} = −θ y_t − θ² y_{t−1} − θ³ y_{t−2} − ... − θ^{p+1} y_{t−p} − ....

Thus, the weights of the adaptive and the optimal prediction schemes are different, especially for a negative moving average coefficient.

REMARK 7.1: The efficiency of the adaptive scheme depends on the smoothing parameter. The expectation error can be reduced by adequately setting its value. In particular, for a fixed horizon h, the smoothing parameter can be estimated by

λ̂_h = arg min_λ Σ_{t=t_0}^T [y_t − _{t−h}ŷ_t(λ)]²,

where _{t−h}ŷ_t(λ) is computed from (7.4), the values of y_t are set to 0 for negative dates, and t_0 is chosen sufficiently large to alleviate the effect of this truncation.
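The truncated least squares criterion of the remark can be minimized over a grid of λ values. The sketch below uses invented parameter values and a simulated ARIMA(0,1,1) series, a process for which Section 7.1.4 shows that some λ in (0, 1) is optimal; horizon h = 1 is assumed.

```python
import numpy as np

def smoothing_loss(y, lam, t0):
    """Sum over t >= t0 of squared one-step errors of the adaptive scheme (7.4)."""
    pred, loss = y[0], 0.0
    for t in range(1, len(y)):
        if t >= t0:
            loss += (y[t] - pred) ** 2
        pred = lam * pred + (1 - lam) * y[t]
    return loss

# simulate y_t - y_{t-1} = eps_t - theta*eps_{t-1}, an ARIMA(0,1,1)
rng = np.random.default_rng(2)
eps = rng.standard_normal(5001)
theta = 0.6
y = np.cumsum(eps[1:] - theta * eps[:-1])

grid = np.linspace(0.01, 0.99, 99)
lam_hat = grid[np.argmin([smoothing_loss(y, g, t0=100) for g in grid])]
print("estimated smoothing coefficient:", lam_hat)   # close to theta = 0.6
```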

7.1.3 Rational Scheme

The rational scheme yields the most accurate predictions among all admissible ones. It relies on the following proposition:

PROPOSITION 7.1: Let y be a variable to be predicted and x denote the variables used for predicting. The solution to the minimization

min_g E[y − g(x)]²

exists and is equal to the conditional expectation of y given x: ŷ = E(y | x).

The conditional expectation, or equivalently the rational expectation, can be viewed as a projection. Therefore, it satisfies the orthogonality condition, admits a variance decomposition, and obeys the law of iterated expectations, explained below.

PROPOSITION 7.2: The (rational) expectation error is orthogonal (uncorrelated) to any function of the conditioning variables:

E[(y − ŷ) g(x)] = E{[y − E(y | x)] g(x)} = 0, ∀g.

In particular, by choosing a constant function, we find that

E(y − ŷ) = 0 ⇔ Ey = Eŷ ⇔ Eε = 0,


that is, the rational expectations are unbiased. Moreover, the condition of Proposition 7.2 implies the absence of correlation between the expectation error and the conditioning variables.

PROPOSITION 7.3, VARIANCE DECOMPOSITION: The rational expectations satisfy the following equality:

Vy = Vŷ + Vε.

PROOF: We get

Vy = V[ŷ + (y − ŷ)] = Vŷ + V(y − ŷ) + 2 Cov(ŷ, y − ŷ) = Vŷ + V(y − ŷ),

because the prediction error is uncorrelated with the prediction, which is a function of the conditioning variables. QED

PROPOSITION 7.4, LAW OF ITERATED EXPECTATIONS: Let us assume a partition of the set of conditioning variables into two subsets x = (x_1, x_2). We have

ŷ_1 = E(y | x_1) = E[E(y | x) | x_1] = E(ŷ | x_1).

The prediction of y based on the variable x_1 alone can thus be derived in two steps. We first compute the expectation with respect to the full information set x = (x_1, x_2), which yields ŷ. In the second step, we compute the expectation of ŷ with respect to x_1.

Propositions 7.2-7.4 can be extended to a dynamic framework. At this point, we need to define the martingale process.

DEFINITION 7.1: Let I_t = (x_t, x_{t−1}, ...), t varying, be an increasing sequence of information sets.

(i) A process M = (M_t, t ∈ Z) is a martingale with respect to (I_t) if and only if

E(M_{t+1} | I_t) = E(M_{t+1} | x_t, x_{t−1}, ...) = M_t, ∀t.

(ii) A process η = (η_t, t ∈ Z) is a martingale difference sequence with respect to (I_t) if and only if

E(η_{t+1} | I_t) = E(η_{t+1} | x_t, x_{t−1}, ...) = 0, ∀t.

A martingale is a stochastic process for which the rational and naive expectations coincide. The martingales and martingale difference sequences are related. Indeed, it is easily checked that, if (M_t) is a martingale, then (η_t = M_t − M_{t−1})


is a martingale difference sequence. Conversely, if (η_t) is a martingale difference sequence, then (M_t = Σ_{τ=0}^t η_τ) is a martingale.

Let us now consider the process of interest (y_t), the sequence of increasing information sets I_t = (x_t, x_{t−1}, ...), and the associated rational expectations _tŷ_{t+h} = E(y_{t+h} | I_t). We can introduce the rational expectation errors

_tε̂_{t+h} = y_{t+h} − E(y_{t+h} | I_t),     (7.7)

and the subsequent (rational) updating errors

ε_t^h = E(y_{t+h} | I_t) − E(y_{t+h} | I_{t−1}).     (7.8)

By applying the law of iterated expectations to y_{t+h}, the updating error can be rewritten as

ε_t^h = E(y_{t+h} | I_t) − E[E(y_{t+h} | I_t) | I_{t−1}],

that is, as the one-step-ahead prediction error on the prediction E(y_{t+h} | I_t). It follows that the updating errors are uncorrelated with all elements of the information set I_{t−1} = (x_{t−1}, x_{t−2}, ...).

PROPOSITION 7.5: For any fixed horizon h, the sequence of updating errors (ε_t^h, t varying) is a martingale difference sequence.

In general, the prediction errors (_tε̂_{t+h}, t varying, h fixed) do not form a martingale difference sequence, except for h = 1. This becomes obvious when the expectation error at horizon h is written in terms of expectation errors at horizon 1. We get

_tε̂_{t+h} = y_{t+h} − E(y_{t+h} | I_t)
= y_{t+h} − E(y_{t+h} | I_{t+h−1})
+ E(y_{t+h} | I_{t+h−1}) − E(y_{t+h} | I_{t+h−2})
+ ...
+ E(y_{t+h} | I_{t+1}) − E(y_{t+h} | I_t),

or

_tε̂_{t+h} = Σ_{i=0}^{h−1} ε_{t+h−i}^i.     (7.9)

EXAMPLE 7.4: To illustrate the differences between the expectation errors and the updating errors, we consider a Gaussian AR(1) process

y_t = ρ y_{t−1} + ε_t,

with a Gaussian white noise (ε_t) and |ρ| < 1. If the information set contains the lagged values of y, the rational expectation at horizon h is

_tŷ_{t+h} = E(y_{t+h} | y_t, y_{t−1}, ...) = ρ^h y_t.

The prediction errors are

_tε̂_{t+h} = y_{t+h} − ρ^h y_t = ε_{t+h} + ρ ε_{t+h−1} + ... + ρ^{h−1} ε_{t+1},

and the updating errors are ε_t^h = ρ^h ε_t; they all are functions of the white noise process.

The next proposition describes the dynamics of predictions of the value of y_T made at different origins t. It arises as a crucial element in the discussion of martingale properties of spot and forward prices on financial markets (Samuelson 1965).

PROPOSITION 7.6: The rational expectation sequence [_tŷ_T = E(y_T | I_t), t varying] is a martingale.

PROOF:

It is a direct consequence of the law of iterated expectations:

E(_{t+1}ŷ_T | I_t) = E[E(y_T | I_{t+1}) | I_t] = E(y_T | I_t) = _tŷ_T. QED

Let us now comment briefly on Samuelson's argument. If y_T denotes the spot price of an asset at time T, we can expect that the forward price at date t with residual maturity T − t is equal to y*_{t,T} = E(y_T | y_t, y_{t−1}, ...). We deduce that the sequence of forward prices is a martingale. This property is a direct consequence of the law of iterated expectations. It differs from the efficient market hypothesis, which assumes that the sequence of spot prices (y_t) is a martingale. In practice, the information set includes the lagged values of y. Therefore, the in-sample predictions become perfect foresights: _tŷ_T = y_T, ∀T ≤ t. The out-of-sample predictions (with t < T) become more accurate when t increases and the prediction origin approaches T, since

V(_tŷ_T) = V(_{t−1}ŷ_T) + V(_tŷ_T − _{t−1}ŷ_T) ≥ V(_{t−1}ŷ_T),

by the variance decomposition equation; the direction of this inequality is reversed for the expectation errors:

V(y_T − _tŷ_T) ≤ V(y_T − _{t−1}ŷ_T).


In conclusion, under the optimal expectation scheme, the prediction accuracy is improved when the predictions are made at dates closer to maturity.
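These properties can be checked by simulation on a Gaussian AR(1) as in Example 7.4 (a sketch with arbitrary values ρ = 0.8 and terminal date T = 10):

```python
import numpy as np

rng = np.random.default_rng(3)
rho, T_idx, n = 0.8, 10, 200_000

# n independent AR(1) trajectories y_t = rho*y_{t-1} + eps_t, started in steady state
y = rng.standard_normal(n) / np.sqrt(1 - rho ** 2)
paths = []
for t in range(T_idx + 1):
    paths.append(y.copy())
    y = rho * y + rng.standard_normal(n)
paths = np.array(paths)              # paths[t] = cross-section of y_t
yT = paths[T_idx]

# rational prediction of y_T made at origin t: rho^(T-t) * y_t
for t in (2, 5, 8):
    pred = rho ** (T_idx - t) * paths[t]
    print(t, round(np.var(pred), 2), round(np.var(yT - pred), 2))
# V(prediction) increases with t, V(error) decreases with t, and the two sum to V(y_T)
```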

REMARK 7.2: Section 7.1.3 is entirely focused on rational expectations. We do not analyze in detail suboptimal expectations, although some results established in Propositions 7.2-7.4 may hold for them as well. At this point, researchers need to be cautioned against a commonly committed error, by which suboptimal expectations are reported and mistakenly interpreted as rational ones. This happens when the set of solutions of the optimization problem in Proposition 7.1 is restricted to affine functions of x, yielding as a result a theoretical linear regression of y on x: ŷ = LE(y | x). In the literature, this suboptimal outcome is often erroneously viewed as the rational expectation. However, we must realize that, although the basic Propositions 7.2, 7.3, and 7.4 can easily be extended to this framework, a linear regression does not yield martingale prediction errors, which is an important feature in financial theory (see, e.g., Section 2.4.4).
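The preceding warning is easy to illustrate numerically: when E(y | x) is nonlinear, the linear regression LE(y | x) is strictly less accurate. A sketch with an arbitrary quadratic design, for illustration only:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x = rng.standard_normal(n)
y = x ** 2 + 0.5 * rng.standard_normal(n)     # E(y|x) = x^2 is nonlinear in x

# linear projection LE(y|x) = a + b*x fitted by OLS
a, b = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0]

mse_rational = np.mean((y - x ** 2) ** 2)     # ~ 0.25, the pure noise variance
mse_linear = np.mean((y - (a + b * x)) ** 2)  # ~ 2.25 = Var(x^2) + 0.25
print(mse_rational, mse_linear)
```

The linear projection satisfies analogs of Propositions 7.2-7.4 but leaves predictable structure in its errors, which is the point of the remark.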

7.1.4 Optimality of the Adaptive Scheme

For some specific dynamic processes, the adaptive scheme may become equivalent to the rational one. This class of processes has been described by Muth (1961). Let us consider the adaptive scheme of predictions at horizon 1 for an unspecified process y:

_tŷ_{t+1} = λ_1 _{t−1}ŷ_t + (1 − λ_1) y_t.     (7.10)

Let us also introduce the expectation errors at horizon 1:

ε̂_t^1 = y_t − _{t−1}ŷ_t.

Equation (7.10) can be written as

y_{t+1} − ε̂_{t+1}^1 = λ_1 (y_t − ε̂_t^1) + (1 − λ_1) y_t
⇔ y_{t+1} = y_t + ε̂_{t+1}^1 − λ_1 ε̂_t^1.     (7.11)

The adaptive scheme yields rational expectations if and only if _{t−1}ŷ_t = E(y_t | x_{t−1}, x_{t−2}, ...), that is, if and only if (ε̂_t^1) is a martingale difference sequence.

PROPOSITION 7.7: The adaptive scheme is equivalent to the rational one for a sequence of information sets I_t = (x_t, x_{t−1}, ...), t varying, if and only if the process y is an autoregressive integrated moving average ARIMA(0,1,1) with a martingale difference error term.

In the past, for quite a long time, the adaptive scheme was a very popular tool of analysis. The reason was that ARIMA(0,1,1) models, featuring both a stochastic trend and a short-term temporal dependence, fit a significant number of monthly sampled time series. At that time, this type of data was commonly used in empirical research; this was before data sampled at higher frequencies became available.

7.2 Price Dynamics Associated with the Capital Asset Pricing Model

In Section 4.2, we derived the constraints on price dynamics implied by the CAPM. In this section, we further investigate the equilibrium condition to reveal that there exists a multiplicity of price dynamics compatible with the CAPM. Among the CAPM-compatible processes are, for example, the conditionally heteroscedastic processes presented in Chapter 6. To simplify the presentation, we consider a single risky financial asset. Let us recall the CAPM-implied restriction

a_t^s = (1/A) (V_t y_{t+1})^{−1} E_t y_{t+1},     (7.12)

where y_t is the excess gain on the risky asset, and E_t and V_t denote the conditional expectation and variance given an information set I_t, respectively. (We denote the excess gain by y_t rather than Y_t to emphasize that it is a scalar.) The supply (a_t^s) is assumed adapted with respect to (I_t); that is, a_t^s is a function of I_t.

7.2.1 The Multiplicity of Equilibrium Dynamics

Let us introduce ξ_t, a positive function of the information available at date t, and a martingale difference sequence (η_t) with respect to (I_t) with unitary conditional variance, V_t η_{t+1} = 1.

PROPOSITION 7.8: The processes (y_t) satisfying the equilibrium condition in (7.12) are such that

y_{t+1} = A a_t^s ξ_t² + ξ_t η_{t+1},     (7.13)

where (ξ_t) and (η_t) are a positive adapted sequence and a conditionally standardized martingale difference sequence, respectively.

PROOF: (1) For the sufficient condition, if y_{t+1} = A a_t^s ξ_t² + ξ_t η_{t+1}, where ξ_t is positive adapted and η_t is a conditionally standardized martingale difference sequence, by taking the conditional expectation on both sides we get

E_t y_{t+1} = E_t(A a_t^s ξ_t² + ξ_t η_{t+1}) = A a_t^s ξ_t².

p_{j,t} = E_t(p_{j,t+1}) E_t(M_{t+1}) + Cov_t(p_{j,t+1}, M_{t+1}), ∀j,

or, if there is a risk-free asset,

p_{j,t} = [1/(1 + r_f)] E_t p_{j,t+1} + Cov_t(p_{j,t+1}, M_{t+1}), ∀j.     (8.16)

This is a decomposition of the price into a standard discounted present value and a risk premium. The risk premium can have any sign. If the asset price evolution is positively correlated with the discount factor (i.e., loosely speaking, with consumption growth), at equilibrium, investors need to pay a premium. Intuitively, the more the assets are demanded for intertemporal transfers, the higher their price. The pricing formula in (8.16) shares some similarities with the pricing formula derived from the CAPM (see Chapter 4). Indeed, the optimization provides an explicit expression of the risk premium, linked with the market portfolio return in the CAPM and with consumption growth in the CCAPM. They both point out the presence of an underlying factor and the opportunity to decompose the risk into a systematic and an idiosyncratic component. Moreover, in the CCAPM, a riskier asset does not necessarily have a higher price. For instance, let us consider an asset with a gross return, that is,


p_{0,t+1}/p_{0,t} = 1 + r_f + ξ_{t+1}, where ξ_{t+1} is conditionally uncorrelated with the stochastic discount factor: Cov_t(ξ_{t+1}, M_{t+1}) = 0. Then, the pricing formula in (8.16) implies E_t[p_{0,t+1}/p_{0,t}] = 1 + r_f, that is, the same return as for the risk-free asset. In some sense, the component ξ is an idiosyncratic risk, while only the systematic risk (i.e., a risk component conditionally correlated with the stochastic discount factor) can be compensated. In particular, in the CCAPM framework, all discounted asset prices (1 + r_f)^{−t} p_{j,t}, j varying, can be martingales if the returns include only idiosyncratic components.

8.1.4 Special Utility Functions

For illustration, let us present the Euler conditions associated with the exponential and power utility functions.

Exponential Utility Function

Let us consider the utility function U(c) = −exp(−Ac), where A is positive. Its derivative is dU/dc (c) = A exp(−Ac), and the Euler condition becomes

1 = E_t[δ (p_{j,t+1}/p_{j,t}) (q_t/q_{t+1}) exp(−A(C_{t+1} − C_t))], ∀j.     (8.17)

It involves the prices of the financial assets, the inflation rate, and the increment of the consumption level.

Power Utility Function

Let us now consider a representative agent with the power utility function

U(c) = (c^{1−γ} − 1)/(1 − γ),

where γ is the coefficient of relative risk aversion. When γ approaches 1, the utility function is close to the logarithmic utility function U(c) = log c. The marginal utility is dU/dc (c) = c^{−γ}, and this expression is also valid in the limiting case γ = 1. By substituting into the Euler condition, we get

1 = E_t[(p_{j,t+1}/p_{j,t}) (q_t/q_{t+1}) δ (C_{t+1}/C_t)^{−γ}], ∀j,     (8.18)

an expression that involves the rate of increase of the aggregate consumption. The practical advantage of the power utility function is that it combines the various growth rates in the following expression:

1 = E_t{δ exp[log(p_{j,t+1}/p_{j,t}) − log(q_{t+1}/q_t) − γ log(C_{t+1}/C_t)]}, ∀j.     (8.19)

Log-Normal Model

A priori, various asset price dynamics are compatible with the Euler conditions (8.19) based on a power utility function (see Chapter 7 for the discussion of the multiplicity of solutions in rational expectation models). Let us restrict our attention to joint conditional log-normal distributions of the various growth rates p_{j,t+1}/p_{j,t}, j = 0, 1, ..., J, q_{t+1}/q_t, and C_{t+1}/C_t. Then, we infer from the moment-generating function of a Gaussian variable X, that is, E(exp X) = exp(EX + ½ VX), that

1 = δ exp{E_t[log(p_{j,t+1}/p_{j,t}) − log(q_{t+1}/q_t) − γ log(C_{t+1}/C_t)] + ½ V_t[log(p_{j,t+1}/p_{j,t}) − log(q_{t+1}/q_t) − γ log(C_{t+1}/C_t)]}, j = 1, ..., J.

By taking the logarithms on both sides, we obtain

E_t[log(p_{j,t+1}/p_{j,t})] = −log δ + E_t log(q_{t+1}/q_t) + γ E_t log(C_{t+1}/C_t) − ½ V_t[log(p_{j,t+1}/p_{j,t}) − log(q_{t+1}/q_t) − γ log(C_{t+1}/C_t)], ∀j.     (8.20)

Therefore, the (jointly) conditional log-normal model is compatible with the restriction implied by the CCAPM if and only if the first- and second-order conditional moments of the various growth rates are related by (8.20).
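The only probabilistic ingredient of this derivation is the Gaussian moment-generating function E(exp X) = exp(EX + ½VX), which a quick Monte Carlo can confirm; the mean and volatility below are hypothetical values for a log growth rate.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sig = 0.02, 0.15                   # illustrative moments of X = a log growth rate
x = mu + sig * rng.standard_normal(1_000_000)

lhs = np.exp(x).mean()                 # Monte Carlo estimate of E exp(X)
rhs = np.exp(mu + 0.5 * sig ** 2)      # exp(EX + VX/2)
print(lhs, rhs)                        # both ~ 1.0317
```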

As for the CAPM (see Chapter 6), condition (8.20) can be tested under a joint ARCH in mean (ARCH-M) specification for log(p_{j,t+1}/p_{j,t}), j = 1, ..., n, log(q_{t+1}/q_t), and log(C_{t+1}/C_t). The multivariate specification involves at least three series: the returns of a market index (if n = 1), the inflation rate, and consumption growth.

REMARK 8.4: For a risk-free asset, equation (8.20) becomes

log(1 + r_f) = −log δ + E_t log(q_{t+1}/q_t) + γ E_t log(C_{t+1}/C_t) − ½ V_t[log(q_{t+1}/q_t) + γ log(C_{t+1}/C_t)].

The equilibrium risk-free rate is a decreasing function of δ. If the investor is impatient (δ is small), the risk-free rate increases to compensate for the investor's lack of interest in the future and to ensure the existence of a market for the risk-free asset. It is an increasing function of the expected consumption growth, since the risk-free asset is


used for intertemporal transfers. It decreases with the volatility of consumption growth, which makes these transfers less efficient ex ante.

8.1.5 Mean-Variance Frontier

We derive from the Euler conditions

1 = E_t[(p_{j,t+1}/p_{j,t}) M_{t+1}], ∀j,

a mean-variance frontier, which is an analog of the efficiency frontier derived under the standard Markowitz approach (see Chapter 4). Let us consider a portfolio allocation at time t, a_t say; we get

a_t′ p_t = E_t[a_t′ p_{t+1} M_{t+1}]
⇔ W_t(a_t) = E_t[W_{t+1}(a_t) M_{t+1}]
⇔ 1 = E_t[(W_{t+1}(a_t)/W_t(a_t)) M_{t+1}]
⇔ 1 = Cov_t[W_{t+1}(a_t)/W_t(a_t), M_{t+1}] + E_t[W_{t+1}(a_t)/W_t(a_t)] E_t M_{t+1}
⇔ (1 − E_t[W_{t+1}(a_t)/W_t(a_t)] E_t M_{t+1})² = Cov_t²[W_{t+1}(a_t)/W_t(a_t), M_{t+1}].

By applying the Cauchy-Schwarz inequality, we obtain

(1 − E_t[W_{t+1}(a_t)/W_t(a_t)] E_t M_{t+1})² ≤ V_t[W_{t+1}(a_t)/W_t(a_t)] V_t M_{t+1}

⇔ [E_t W_{t+1}(a_t) − (1 + r_f) W_t(a_t)]² / V_t W_{t+1}(a_t) ≤ V_t M_{t+1} / (E_t M_{t+1})²,     (8.21)

since E_t M_{t+1} = 1/(1 + r_f).

This inequality defines a mean-variance frontier, that is, an upper bound on the ratio of the squared excess gain of the portfolio to its volatility. However, it is important to note that, in general, (8.21) is satisfied with a strict inequality. The frontier can only be reached if the Cauchy-Schwarz relation holds as an equality, that is, if there exists an allocation a_t* such that M_{t+1} and W_{t+1}(a_t*)/W_t(a_t*) are proportional, with a proportionality factor that is a function of the information at time t. It is easy to verify that this condition is equivalent to the existence of a portfolio allocation a_t* such that M_{t+1} = W_{t+1}(a_t*). Such a portfolio is called a numeraire portfolio. A necessary condition for interpreting the stochastic discount factor as a portfolio value is


W_t(a_t*) = E_t[W_{t+1}(a_t*) M_{t+1}] = E_t[M_{t+1}²].

If a_t* is time independent, the condition becomes

E_t[M_{t+1}²] = M_t.

REMARK 8.5: For a power utility function and conditionally uncorrelated dynamics of q_t/q_{t+1} and (C_{t+1}/C_t)^{−γ}, the slope of the mean-variance frontier is the bound V_t(M_{t+1})/(E_t M_{t+1})² in (8.21).

It is intuitively higher if the economy is more volatile in terms of the inflation rate or consumption growth, and if the investor is more risk averse, through the γ coefficient. In such an environment, the assets are more attractive.

The efficiency frontier based on the stochastic discount factor M_t corresponding to the CCAPM is not necessarily attainable. Thus, a natural question to address is the following one: Does another stochastic discount factor M*_t exist that is attainable and satisfies the pricing formula

1 = E_t[(p_{j,t+1}/p_{j,t}) M*_{t+1}], ∀j?     (8.22)

The answer is straightforward. It was first given by Hansen and Jagannathan (1991). We assume a risk-free asset and an information set that includes the asset prices. Let us define M*_{t+1} = LE_t(M_{t+1} | p_{t+1}) as the best approximation of M_{t+1} by a linear function of p_{j,t+1}, j = 0, 1, ..., J, with coefficients that depend on the information at time t. It follows from the definition of linear regression (see Chapter 7) that

E_t[p_{j,t+1} M_{t+1}] = E_t[p_{j,t+1} LE_t(M_{t+1} | p_{t+1})] = E_t[p_{j,t+1} M*_{t+1}].

Therefore, we have the pricing formula

p_{j,t} = E_t[p_{j,t+1} M*_{t+1}], ∀j.

Moreover, M*_{t+1} can be written as

M*_{t+1} = a_{0,t} (1 + r_f) + Σ_{j=1}^J a_{j,t} p_{j,t+1},     (8.23)

that is, it is a portfolio value. As a by-product, the conditional variance of the attainable stochastic discount factor,

V_t(M*_{t+1}) = Cov_t(M_{t+1}, p_{t+1}′) [V_t(p_{t+1})]^{−1} Cov_t(p_{t+1}, M_{t+1}),     (8.24)


where p_{t+1} includes the prices of the risky assets only, is smaller than the conditional variance of the initial stochastic discount factor: V_t(M*_{t+1}) ≤ V_t(M_{t+1}). It provides a lower bound for the volatility of stochastic discount factors.
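A numerical sketch of this Hansen-Jagannathan construction can be built on a toy one-period economy with made-up numbers; the point is only that the projected discount factor M* prices the traded payoffs identically while having a smaller variance.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

f = rng.standard_normal(n)                                  # common factor
M = np.clip(0.95 - 0.2 * f + 0.1 * rng.standard_normal(n), 0.05, None)
p1 = 1.0 + 0.5 * f + 0.3 * rng.standard_normal(n)           # risky payoffs
p2 = 1.0 - 0.2 * f + 0.4 * rng.standard_normal(n)

rf = 1.0 / M.mean() - 1.0                                   # risk-free rate implied by M
X = np.column_stack([np.full(n, 1.0 + rf), p1, p2])

# M* = linear projection of M on the payoffs (the risk-free payoff acts as intercept),
# hence M* is itself an attainable portfolio payoff
coef = np.linalg.lstsq(X, M, rcond=None)[0]
M_star = X @ coef

print(np.mean(p1 * M), np.mean(p1 * M_star))                # identical prices
print(M.var(), M_star.var())                                # Var(M*) <= Var(M)
```

Equality of prices is exact by the orthogonality of least squares residuals to the regressors, and the variance ranking is the volatility lower bound of the text.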

8.2 The Nonexpected Utility Hypothesis

8.2.1 Risk Aversion and Intertemporal Substitution

The individual preferences concern consumption plans across various states of the world and at various dates. There are two conceptually distinct aspects of preferences: (1) the attitude toward the variation in consumption across different states of the world (at a given date) and (2) the attitude toward the variation in consumption between different time periods (in the absence of risk). The time-separable utility function defines the following intertemporal utility:

U_t = E_t[Σ_{j=0}^∞ δ^j U(C_{t+j})].

The risk aversion is usually measured by the relative risk aversion coefficient A(C) = −C U″(C)/U′(C). The effect of intertemporal substitution can be evaluated by considering a stationary deterministic environment. In a deterministic environment, risky assets are not used at all, while all transfers are performed by means of a risk-free asset only. Moreover, the budget constraints at different dates can be aggregated into a single intertemporal budget constraint. Let us assume a constant pattern of income R, a constant risk-free rate r_f, and a constant inflation rate β. The individual optimization objective at date t becomes

max_C Σ_{j=0}^∞ δ^j U(C_{t+j}),

subject to Σ_{j=0}^∞ q_t [β^j/(1 + r_f)^j] C_{t+j} = Σ_{j=0}^∞ R/(1 + r_f)^j = R (1 + r_f)/r_f.

The first-order conditions are

δ^j U′(C_{t+j}) − λ q_t β^j/(1 + r_f)^j = 0, ∀j ≥ 0,

where λ is a Lagrange multiplier. They imply

δ U′(C_{t+j})/U′(C_{t+j−1}) = β/(1 + r_f), ∀j,

⇔ log U′(C_{t+j}) − log U′(C_{t+j−1}) = −log[(1 + r_f)/β] − log δ.


If the time unit is rather small and C_{t+j} ≈ C_{t+j−1}, the left-hand side can be expanded:

C_{t+j−1} [d log U′/dC](C_{t+j−1}) (log C_{t+j} − log C_{t+j−1}) = −log[(1 + r_f)/β] − log δ

⇔ log C_{t+j} − log C_{t+j−1} = −[U′(C_{t+j−1})/(C_{t+j−1} U″(C_{t+j−1}))] {log[(1 + r_f)/β] + log δ}.     (8.25)

The elasticity of intertemporal substitution is defined as the derivative of the log of the planned consumption growth with respect to the log of the real interest rate. From formula (8.25), this elasticity is approximately equal to the inverse of the relative risk aversion coefficient A(C_{t+j−1}). Therefore, the additive specification of expected utility is considered very restrictive. It implicitly assumes that the only admissible combinations are either (1) a high risk aversion and a low intertemporal substitutability or (2) a low risk aversion and a high intertemporal substitutability. As a consequence, more flexible specifications are preferred, even at the expense of being less parsimonious. In particular, we are interested in utility functions that allow for a separate analysis of risk aversion and intertemporal substitution.

8.2.2 Recursive Utility

Let us consider the standard intertemporal additive expected utility. At date t, we get

V_t = E_t[Σ_{j=0}^∞ δ^j U(C_{t+j})].     (8.26)

Let us also distinguish the current consumption and the future consumption plan:

V_t = U(C_t) + δ E_t[Σ_{j=0}^∞ δ^j U(C_{t+j+1})]
= U(C_t) + δ E_t E_{t+1}[Σ_{j=0}^∞ δ^j U(C_{t+j+1})]     (8.27)
= U(C_t) + δ E_t V_{t+1}.

Thus, the intertemporal utility at time t is the sum of the current utility U(C_t) and the discounted expected future intertemporal utility. This recursive definition of V_t can be extended in two respects. First, we can relax the assumption of additivity of the present and future utilities.


Second, we can summarize the distribution of the future random utility by an indicator that includes a risk premium. Indeed, in (8.27), this distribution is summarized by the conditional expectation, which assumes a "risk-neutral" individual. Kreps and Porteus (1978), Epstein and Zin (1989), and Weil (1989) introduced a recursive intertemporal utility function. It is defined by

V_t = W(C_t, μ_t),     (8.28)

where μ_t is a certainty equivalent of the future intertemporal utility V_{t+1} evaluated at t, and W is an aggregator function, which aggregates the current consumption with a summary of the future to determine the current utility. The certainty equivalent is often assumed of the type

μ_t = (E_t[V_{t+1}^α])^{1/α}, if 0 < α < 1,
μ_t = exp(E_t log V_{t+1}), if α = 0,     (8.29)

whereas the aggregator is a constant elasticity of substitution (CES) function:

W(C, μ) = [(1 − β)C^ρ + β μ^ρ]^{1/ρ}, if 0 < ρ < 1,
W(C, μ) = exp[(1 − β) log C + β log μ], if ρ = 0.

Let z_t be a variable defined as a function of the information at date t, denoted I_t. We deduce from (8.34) that

E[z_t g(y_{t+1}; θ)] = E[z_t E_t g(y_{t+1}; θ)]

= 0.

This is a marginal moment condition for the product of functions z_t g(y_{t+1}; θ). The variable z_t is called the instrumental variable, or instrument. Then, we select K instruments, z_{1,t}, ..., z_{K,t}, and replace the initial set of L conditional restrictions by a set of KL marginal moment conditions:

E[z_{k,t} g_l(y_{t+1}; θ)] = 0, ∀k = 1, ..., K, l = 1, ..., L.     (8.35)

8.3.2 The Estimation Method

For ease of exposition, we assume L = 1, so that the set of marginal moment restrictions becomes

E[z_t g(y_{t+1}; θ)] = 0,     (8.36)

where z_t = (z_{1,t}, ..., z_{K,t})′. The method of moments consists of approximating the previous moment conditions and solving the approximate set of equations for θ to obtain an estimator. Three cases can be distinguished, depending on the number K of instruments and the dimension p of the parameter vector.

1. Underidentification. If the number of instruments is too small, K < p, there are not enough relations to determine a unique value of the parameter. The parameters are unidentified.

2. Exact identification. When K = p, there exists in general a unique solution.

3. Overidentification. When K > p, there are more equations than unknown parameters, and we cannot find an exact solution of the system. We will find instead an approximate solution.

Let us assume that (z_t, y_{t+1}) is a stationary process. The theoretical expectation in (8.36) can be estimated by its empirical counterpart:

1. Underidentification. If the number of instruments is too small, K < p, there are not enough relations to determine a unique value of the parameter. The parameters are unidentified. 2. Exact identification. When K = p, there exists in general a unique solution. 3. Overidentification. When K > p, there are more equations than unknown parameters, and we cannot find an exact solution of the system. We will find instead an approximate solution. Let us assume that (ZhY;+I) is a stationary process. The theoretical expectation in (8.36) can be estimated by its empirical counterpart: 1

E[Ztg(Y;+1;9)] =

T

rL Ztg(Y;+1;9).

(8.37)

1=1

Let us introduce a symmetric positive definite matrix Q of dimension (K,K). DEFINITION 8.1: A moment estimator of9 based on the estimating equations in (8.36), the instruments z" and the weighting matrix Q is the solution of

For a given set of conditional restrictions, we obtain a multiplicity of moment estimators. They depend on the selected instruments, their number, and the weighting matrix. These estimators are consistent, but their efficiency depends on Q and z. PROPOSITION

normal:

where

8.1: The moment estimator of 9 is consistent, asymptotically

Intertemporal Behavior and the Method of Moments

190

L(O) =

{E[~(Yt+l;S)Z:J QE [i(Yt+l;S)ZtJf

E[~(Yt+l;S)Z:J o Vas [Jy~ Ztg{Yt+l;S) JQE [i(Yt+l;S)J

{E[~(Yt+l;S)ZtJ QE [i(Yt+l;S)ZtJr where Vas denotes the limiting variance for large T.
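The definitions above translate directly into code. The sketch below is an invented example, not the book's: a linear moment condition E[z_t (y_t − θ x_t)] = 0 with K = 3 instruments, one parameter, and heteroscedastic but serially independent errors (so the long-run variance reduces to its contemporaneous term). It runs the two-step procedure: a first step with Ω = Id, estimation of the optimal weighting matrix, and a second, efficient step.

```python
import numpy as np

rng = np.random.default_rng(7)
T, theta0 = 20_000, 1.5

x = rng.standard_normal(T) ** 2
y = theta0 * x + (1.0 + 0.5 * x) * rng.standard_normal(T)   # heteroscedastic errors
Z = np.column_stack([np.ones(T), x, x ** 2])                # K = 3 instruments, p = 1

def gbar(theta):
    """Sample moment vector (1/T) sum_t z_t g(y_t; theta), with g = y - theta*x."""
    return Z.T @ (y - theta * x) / T

def objective(theta, W):
    m = gbar(theta)
    return m @ W @ m

grid = np.linspace(0.0, 3.0, 601)

# step 1: identity weighting
th1 = grid[np.argmin([objective(t, np.eye(3)) for t in grid])]

# step 2: efficient weighting Omega* = {(1/T) sum_t z_t z_t' g(y_t; th1)^2}^{-1}
u = y - th1 * x
W_opt = np.linalg.inv(Z.T @ (Z * (u ** 2)[:, None]) / T)
th2 = grid[np.argmin([objective(t, W_opt) for t in grid])]

print("first step:", th1, " second step:", th2)   # both near theta0 = 1.5
```

The grid search stands in for a numerical optimizer; with a linear g, a closed-form solution also exists, but the grid keeps the sketch generic.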

For a given set of instruments, there exists an optimal choice of n. This choice provides the most efficient moment estimator given the instrument z. It is called the generalized moment estimator (see Hansen 1982). Its properties are described below. PROPOSITION 8.2: There exists a weighting matrix 0* that optimizes the efficiency of the moment estimator

The associated estimator (i.e., the GMM estimator) has the asymptotic variancecovariance matrix

In practice, the optimal weighting matrix is unknown and has to be estimated. We get

V_as[ (1/√T) Σ_{t=1}^{T} z_t g(y_{t+1}; θ) ] = Σ_{k=−∞}^{+∞} E[ z_t z_{t−k}' g(y_{t+1}; θ) g(y_{t+1−k}; θ) ].

Under the conditional moment restrictions, the moment contributions z_t g(y_{t+1}; θ) are serially uncorrelated, so that only the k = 0 term remains and a consistent estimator is

Ω̂* = [ (1/T) Σ_{t=1}^{T} z_t z_t' g(y_{t+1}; θ̂)² ]^{-1},

where θ̂ is a moment estimator computed with a simple weighting matrix, such as Ω = Id. Even with an optimal choice of the weighting matrix, the accuracy of the GMM estimator still depends on the selected instruments. The estimators may not be fully efficient if the instruments are selected in an inappropriate way.

EXAMPLE 8.3: For instance, let us assume a one-dimensional parameter θ in a stochastic discount factor model. We can use in the estimation a single "bad" instrument z_t = 1. The estimating restriction is simply the marginal pricing formula

E[ (p_{t+1}/p_t) m(x_{t+1}; θ) − 1 ] = 0.

Intuitively, a constant instrument does not contain much information on the dynamics of the discount factor, and the estimator of θ is not very efficient. It is preferable to introduce several instruments, such as z_{1,t} = 1, z_{2,t} = p_t/p_{t−1}, z_{3,t} = x_t, and z_{4,t} = (p_t/p_{t−1})², including the lagged values of the processes and their squares.
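To make the two-step procedure concrete, here is a minimal sketch on a toy overidentified moment condition; the data-generating process, instruments, and parameter values are hypothetical, not taken from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
T, theta_true = 20000, 0.5
# Hypothetical toy model: y_{t+1} = theta + error, instruments z_t = (1, x_t)',
# so g(y; theta) = y - theta and the system is overidentified (K = 2 > p = 1).
x = rng.normal(size=T)
y = theta_true + rng.normal(size=T) * (1.0 + 0.5 * np.abs(x))  # heteroscedastic noise
Z = np.column_stack([np.ones(T), x])

def moment_estimator(W):
    # minimize m(theta)' W m(theta), with m(theta) = (1/T) Z'(y - theta);
    # since m(theta) = m0 - theta * A, the minimizer is (A'WA)^{-1} A'W m0
    A = Z.mean(axis=0)
    m0 = Z.T @ y / T
    return (A @ W @ m0) / (A @ W @ A)

theta1 = moment_estimator(np.eye(2))          # first step: Omega = Id
u = (y - theta1)[:, None] * Z                 # moment contributions z_t g(y_t; theta1)
omega_star = np.linalg.inv(u.T @ u / T)       # estimated optimal weighting matrix
theta2 = moment_estimator(omega_star)         # second step: two-step GMM estimate
```

Both steps are consistent; the second step is the more efficient one when the errors are heteroscedastic, as here.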

8.3.3 Implementation

The Euler conditions and associated GMM estimators can be used to analyze the individual behavior of investors and to study asset prices. These two types of applications require different data. We discuss the approaches in relation to the CCAPM model with a power utility function.

Individual Behavior

We show in Section 8.1.2 that the Euler conditions are valid at the individual level. Let us consider a set of individuals i = 1, ..., n, with portfolios invested in the same assets j = 1, ..., J. The Euler conditions for an individual i are

E_t[ (p_{j,t+1}/p_{j,t}) δ_i (C_{i,t+1}/C_{i,t})^{−γ_i} − 1 ] = 0,   j = 1, ..., J,   (8.38)

where C_{i,t} is the consumption at t of individual i, and all individuals have the same information, consisting, for example, of asset and consumption good prices. Thus, if individual consumption data are available, we can apply a GMM approach based on the conditional moments corresponding to individual i. We find δ̂_i and γ̂_i, which approximate the individual subjective


discount factor and relative risk aversion coefficient, respectively. Then, the individuals can be compared in terms of the pairs (δ̂_i, γ̂_i), i = 1, ..., n, to derive a segmentation of the population of investors into homogeneous groups with similar preferences. We can also compute for any individual the empirical differences

(1/T) Σ_{t=1}^{T} [ (p_{j,t+1}/p_{j,t}) δ̂_i (C_{i,t+1}/C_{i,t})^{−γ̂_i} − 1 ] = ξ̂_{i,j}   (say).

If these differences ξ̂_{i,j}, j = 1, ..., J, are close to 0, the hypothesis of optimal behavior of individual i cannot be rejected. It is rejected otherwise. Thus, we are able to identify the optimally behaved individuals in the sample.

Asset Price Analysis

We can also follow a macroeconomic approach, which implicitly assumes a single representative investor, to find a pricing formula of the type

1 = E_t[ (p_{j,t+1}/p_{j,t}) δ (C_{t+1}/C_t)^{−γ} ],   j = 1, ..., J.   (8.39)

In this approach, we focus on possible model misspecification before estimating the parameters δ and γ and using the estimated values to price more complex assets. The idea is to apply the GMM method with different sets of instruments: z_t^l = (z_{1,t}^l, ..., z_{K_l,t}^l)', l = 1, ..., L, say. We derive the associated GMM estimators (δ̂_l, γ̂_l), l = 1, ..., L. Then, by comparing these estimators, we find (1) which instruments are included in the information set and (2) which among the instruments are the most informative. This approach is described in detail in Chapter 13, in which we also explain how the estimated stochastic discount factors can be used to price derivative assets.
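The empirical Euler discrepancies described above can be computed in a few lines. The sketch below uses simulated consumption growth, hypothetical preference parameters, and a gross return constructed so that the Euler condition holds exactly at the true parameters; it then shows that the discrepancy is far from 0 at misspecified values:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000
delta, gamma = 0.98, 2.0                       # hypothetical preference parameters
g = np.exp(rng.normal(0.02, 0.02, T))          # simulated consumption growth C_{t+1}/C_t
# gross return built so that R_{t+1} * delta * g_{t+1}^{-gamma} = 1 holds exactly
R = 1.0 / (delta * g ** (-gamma))

def euler_error(d, c):
    # empirical Euler discrepancy xi = (1/T) sum_t [ R_{t+1} d g_{t+1}^{-c} - 1 ]
    return np.mean(R * d * g ** (-c) - 1.0)

xi_true = euler_error(delta, gamma)            # ~ 0 at the true parameters
xi_wrong = euler_error(0.90, 5.0)              # far from 0 at misspecified ones
```

In practice, the same function would be evaluated on observed returns and consumption data, and a formal test would account for the sampling variability of ξ̂.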

8.4 Summary

This chapter generalized the CAPM to an intertemporal equilibrium model, called the consumption-based CAPM. An important feature of this approach is the presence of a behavioral equation defining consumer preferences. A consumer is supposed to hold assets to maximize utility, measured in units of a consumption good, and to take into account consumption in future periods. The first-order conditions of the maximization problem, called the Euler conditions, can be exploited empirically using the generalized method of moments. The behavior of individuals can be described by various forms of the utility function. In particular, investors vary with respect to their risk aversion and the discount rate of future states. The GMM relies on moment conditions that involve instrumental variables. Among the problems often encountered in empirical research is the choice of instruments, which need to be orthogonal to the transformation involved in the Euler equation to ensure good performance of the GMM estimator.

9 Dynamic Factor Models

IN CHAPTERS 3 AND 5, we investigated linear models of returns (prices) on various financial assets. We have pointed out several aspects of joint dynamics, such as feedback effects and cointegration. In this chapter, we present yet another specification that involves multiple time series, called a factor model. Under this approach, the dynamics of returns on a set of assets is explained by a common effect of a limited number of variables, called factors, which may or may not be observed. In the first section, we present the models with observable factors and discuss their role in empirical finance. We show that factor models can be used to construct benchmark portfolios or to diversify investments. Linear factor models with unobservable factors are examined in Section 9.2. The approach is extended to nonlinear factor models in Section 9.3 and to Markov models with finite dimensional dependence in Section 9.4. It is important to mention that a significant number of factor models are considered in the financial literature in a static framework. A common practice consists of performing a singular value decomposition of a sample-based variance-covariance matrix of returns. This approach implicitly assumes independent identically distributed (i.i.d.) returns and disregards temporal dependence and dynamic nonlinearities, such as conditional heteroscedasticity. Essentially, these factors are not dynamic. Genuine dynamic factor models have been introduced into the literature quite recently. In this chapter, we focus on the factor model specifications and factor interpretations. Empirical applications of factor models to liquidity analysis (see Chapter 14) and extreme risk dynamics (see Chapter 16) are also described. In the last section, we explain how to introduce cross-sectional factors and discuss in detail the arbitrage pricing theory (APT).

9.1 Linear Factor Models with Observable Factors

9.1.1 The Model

We first examine the model in which factors are simple linear functions of some observable variables. More precisely, a multivariate process (Y_t) of dimension n is represented by a system of seemingly unrelated equations:

Y_t = B X_t + u_t,   (9.1)

where B is an (n,L) matrix, X_t is an L-dimensional vector of observable explanatory variables, and u_t is an n-dimensional error term such that

E(u_t | X_t) = 0,   V(u_t | X_t) = Σ.   (9.2)

The number L of explanatory variables may be greater than or less than the dimension n of the vector of endogenous series. The unknown parameters B and Σ can be estimated by ordinary least squares.

Y_{2,t} = ε_{2,t},
Y_{3,t} = ε_{3,t} + ε_{2,t−1} + 0.5 ε_{3,t−3}.

This process admits a VMA representation of order 3, but clearly Y_{2,t} = (0,1,0) Y_t is a white noise direction, whereas Y_{1,t} = (1,0,0) Y_t has an MA(1) representation. Intuitively, the moving average orders are not equal in all directions; therefore, the vector moving average representation has orders (3,1,0). More precisely, it is possible to define the multivariate orders q_1 ≤ q_2 ≤ ... ≤ q_n = q of a VMA process by the following method (see Gourieroux and Peaucelle 1992; Vahid and Engle 1997).

Let us consider the matrices

A_h = Γ(h+1) Γ(0)^{-2} Γ(h+1)' + ... + Γ(q) Γ(0)^{-2} Γ(q)',   h = 1, ..., q−1.   (9.20)

We obtain a decreasing sequence of nonnegative symmetric matrices. Therefore, the spaces E_h = Ker A_h, h = 1, ..., q−1, form an increasing sequence of vector spaces. Moreover, α belongs to E_h if and only if α'Γ(h+1) = ... = α'Γ(q) = 0, that is, if and only if the linear combination α'Y_t has a moving average representation of order less than or equal to h. Let us denote by q_j, j = 1, ..., J, the increasing sequence of indexes such that E_{q_j} ≠ E_{q_j − 1}; assume by convention that E_{−1} = {0}; and denote by n_j, j = 1, ..., J, the differences between dimensions, n_j = dim E_{q_j} − dim E_{q_j − 1}. Then, the process admits a moving average representation of multivariate order (q_1, ..., q_1, q_2, ..., q_2, ..., q_J, ..., q_J), where each order q_j is repeated n_j times.

In practice, the analysis of multivariate moving average orders is performed for the following purpose. Let us assume that q_J is rather large, whereas q_1 is small and close to 0, and let us denote by α the direction associated with the order q_1. The linear combination α'Y_t responds to a transitory shock only during a very short time, since the multiplier effect vanishes after the lag q_1 + 1, whereas the shock has an effect for q_J lags on each component. Therefore, temporary shocks to α'Y_t are short lived, and α'Y_t defines a stable relation between the variables, called the codependence direction. To conclude the discussion, let us point out that codependence directions of stationary time series are the analogues of cointegration directions of nonstationary time series. This technique is used for exhibiting stable relations between stationary series, such as relative purchasing power parity, which implies that the relative increment of the exchange rate between two currencies is approximately equal to the difference between the inflation rates of the two countries.
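The white noise direction in the example above can be verified by simulation. Only the equations for Y_2 and Y_3 were specified explicitly, so the sketch simulates those two components and checks their autocovariances:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100000
e = rng.normal(size=(3, T))
# the moving average system of the example: Y2 is white noise, Y3 is MA(3)
Y2 = e[1]
Y3 = e[2].copy()
Y3[1:] += e[1][:-1]            # + eps_{2,t-1}
Y3[3:] += 0.5 * e[2][:-3]      # + 0.5 eps_{3,t-3}

def autocov(x, h):
    x = x - x.mean()
    return np.mean(x[h:] * x[:-h]) if h > 0 else np.mean(x * x)

# (0,1,0)' Y_t is a white noise direction: its autocovariances are ~ 0
acv2 = [autocov(Y2, h) for h in (1, 2, 3)]
# Y3 has a nonzero autocovariance at lag 3 (theoretical value 0.5)
acv3_3 = autocov(Y3, 3)
```

The empirical autocovariances of Y_2 are all close to 0, while Y_3 keeps a memory of three lags, in line with the multivariate orders (3,1,0).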

9.3 Nonlinear Factor Models

9.3.1 The Measurement and Transition Equations

These models are generally defined by a state-space representation involving nonlinear measurement and transition equations. The measurement equation explains how the variables of interest depend on lagged values of the process, unobservable factors, and an additional strong white noise. Typically, this equation is

Y_t = a(Y_{t−1}, F_t, u_t; θ),   (9.21)

where the (multivariate) error term (u_t) is assumed to be i.i.d. with a known distribution. The transition equation describes the factor dynamics:

F_t = b(F_{t−1}, ε_t; θ),   (9.22)

where (ε_t) is a strong white noise with known distribution, independent of (u_t). The functions a and b are known up to a parameter θ. The system (9.21)-(9.22) implicitly assumes that, except for the effect of Y_{t−1}, the entire dynamics is completely determined by the current factor F_t. It is easy to check that the system can be alternatively represented in terms of conditional distributions. The measurement equation specifies the conditional distribution of Y_t given the factor and process histories, which depends on the past through F_t, Y_{t−1} only:

l(y_t | f_t, y_{t−1}) = g(y_t | f_t, y_{t−1}; θ).   (9.23)

The transition equation specifies the conditional distribution of F_t given F_{t−1}, Y_{t−1}, which depends on the past through F_{t−1} only:

l(f_t | f_{t−1}, y_{t−1}) = π(f_t | f_{t−1}; θ).   (9.24)

From these conditional distributions, we deduce

l(y_t, f_t | f_{t−1}, y_{t−1}) = l(y_t | f_t, f_{t−1}, y_{t−1}) l(f_t | f_{t−1}, y_{t−1})
                               = g(y_t | f_t, y_{t−1}; θ) π(f_t | f_{t−1}; θ),

and recursively find the distribution of the bivariate process (Y_t, F_t).

EXAMPLE 9.2, STOCHASTIC MEAN AND VOLATILITY MODEL: Let (y_t) be the return series of a financial asset. The stochastic mean and volatility "parameters" m_t and σ_t can be introduced into the return equation

y_t = m_t + σ_t u_t,

where (u_t) is independent identically N(0,1) distributed (IIN(0,1)). As well, we can impose a dynamic structure on the stochastic parameters m_t and σ_t, such as the Gaussian vector autoregression

(m_t, log σ_t)' = c + A (m_{t−1}, log σ_{t−1})' + ε_t,

where (ε_t) is IIN(0, Σ). This specification extends the stochastic volatility model with m_t = 0 (see Section 6.3). This is a factor representation, in which the factor components F_t = (m_t, σ_t)' have straightforward financial interpretations. Moreover, the transition equation allows us to study various effects, such as the impact of log σ_{t−1} on m_t, reflected by the corresponding autoregressive coefficient.

The joint distribution can be decomposed in the following way (Lancaster 1968).

PROPOSITION 9.2: If

∫∫ f(y_t, y_{t−1})² / [f(y_t) f(y_{t−1})] dy_t dy_{t−1} < ∞,

then

f(y_t, y_{t−1}) = f(y_t) f(y_{t−1}) { 1 + Σ_{j=1}^{∞} λ_j φ_j(y_t) ψ_j(y_{t−1}) }.

Here Y_t = 1 if the price change at t is positive, and Y_t = 0 otherwise. The analysis can show, for example, that sufficient information is contained in the most recent state and the duration of that state. While the initial state space contains 2⁵ = 32 different patterns that correspond to various admissible sequences of ups and downs, the reduced state space includes only 10 states. The set corresponding to an up state of duration 1 consists of the following eight patterns:

up, down, up, up, up
up, down, up, up, down
up, down, up, down, up
up, down, up, down, down
up, down, down, up, up
up, down, down, up, down
up, down, down, down, up
up, down, down, down, down

Dynamic Qualitative Processes


10.5 Summary

This chapter discussed the qualitative representation of financial data. The approach consists of dividing the value space of a process into distinct states and assigning a dummy variable to each of them. Accordingly, the dynamics of the new process consists of transitions between the various states. This approach can be applied to asset returns, which admit states of high and low values, for example, or to squared returns, which approximate the volatility process and in which regimes of moderate, high, and low volatility can be distinguished. On one hand, this approach can be criticized for omitting information contained in the initial series. On the other hand, the transformation of a series into a qualitative process allows isolation of the nonlinear dynamic features that are essential to the subject of research. In this way, all valuable information is retained, and unwanted patterns are eliminated. Volatility and return transitions between various regimes are a topic of ongoing research. Recently, it has been revealed, for example, that regime switches are a potential cause of spurious long memory in return volatility, mentioned in Chapter 5. As well, the interstate transition dynamics leads to jump processes that may be considered an additional source of randomness in continuous time price processes, examined in Chapter 11. The random times spent by a qualitative process in various states form a stochastic process with interesting features. They can be modeled using dynamic duration models, which are discussed in Chapter 14 in the context of modeling the times between trades on stock markets.

Appendix 10.1: Autoregressive Moving Average Representation of the Process (Y_t)

The matrix autocovariance of the multivariate process (Z_t) at order h, Γ(h) = Cov(Z_t, Z_{t−h}), is a linear transformation of P^h. Therefore, since the transition matrix P may be diagonalized, every covariance γ_{j,k}(h) = Cov(Z_{j,t}, Z_{k,t−h}) is a linear combination of the powers of the eigenvalues:

γ_{j,k}(h) = Σ_{i=1}^{J} a_{j,k}^{(i)} λ_i^h.

Since Y_t is a linear transform of Z_t, the same property holds for its autocovariance:

γ(h) = Cov(Y_t, Y_{t+h}) = Σ_{i=1}^{J} a(i) λ_i^h   (say).

The existence of a linear ARMA representation for the (Y_t) process is a consequence of this form.
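The eigenvalue form of the autocovariance can be checked on a simple two-state chain; the transition matrix below is an illustrative choice. Its nonunit eigenvalue is 0.6, and the exact autocovariances of the state indicator decay at exactly that rate:

```python
import numpy as np

# two-state Markov chain; the second eigenvalue drives the autocovariance decay
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
lam2 = sorted(np.linalg.eigvals(P).real)[0]   # eigenvalues are 1 and 0.6
pi = np.array([0.75, 0.25])                   # stationary distribution (pi P = pi)

def gamma(h):
    # exact autocovariance of Y_t = 1{state = 0} at lag h for the stationary chain:
    # gamma(h) = P(S_t = 0, S_{t-h} = 0) - pi_0^2 = pi_0 (P^h)_{00} - pi_0^2
    Ph = np.linalg.matrix_power(P, h)
    return pi[0] * Ph[0, 0] - pi[0] ** 2

ratios = [gamma(h + 1) / gamma(h) for h in range(1, 5)]   # all equal to lam2
```

A geometric autocovariance of this kind is exactly what an ARMA process with autoregressive root 0.6 produces, consistent with the appendix.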

11 Diffusion Models

CONTINUOUS TIME MODELS that assume that asset prices follow stochastic differential equations (SDEs) were introduced into the literature by Bachelier (1900), Working (1934), and Osborne (1959). This approach, however, was not pursued in finance until the 1970s. At that time, it was recognized that continuous time models provide a convenient framework for determining the prices of derivative assets under the so-called complete market hypothesis. In this chapter, we present the continuous time models and explain their application to derivative pricing. In particular, we derive and discuss the well-known Black-Scholes formula (Black and Scholes 1973). The theoretical concepts are further extended in Chapters 12 and 13, in which we examine estimation of continuous time models and computation of derivative prices.

11.1 Stochastic Differential Equations

This section contains a comprehensive overview of stochastic differential equations, along with basic insights into their practical implementation. In some sense, we introduce the class of continuous time processes along the same lines as the class of autoregressive moving average (ARMA) processes. We first introduce an elementary continuous time process, called Brownian motion, with increments that behave like a Gaussian standard white noise. Next, we define other, more complex, continuous time processes using Brownian motion as an elementary building block.

11.1.1 Brownian Motion

Brownian motion was first defined by Wiener (1923, 1924) and has since also been known as the Wiener process. Let us denote it by (W_t, t ∈ R_+).

Since Brownian motion is an extension of a Gaussian random walk to the continuous time framework, let us recall at this point some basic properties of the Gaussian random walk. A standardized Gaussian random walk is a discrete time process satisfying

y_t = y_{t−1} + ε_t,   t ≥ 1,   (11.1)

where y_0 = 0 and ε_t, t ≥ 1, is a sequence of independent variables with standard normal distribution N(0,1). The moving average representation of the random walk is

y_t = Σ_{τ=1}^{t} ε_τ,   t ≥ 1.   (11.2)
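Suitably rescaled, the random walk behaves like Brownian motion: after dividing time by n and space by √n, the terminal value y_n/√n is approximately N(0,1), as W_1 should be. A minimal simulation sketch (sample sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 1000, 2000
# Gaussian random walk y_t = y_{t-1} + eps_t, simulated reps times
eps = rng.normal(size=(reps, n))
paths = eps.cumsum(axis=1)
W1 = paths[:, -1] / np.sqrt(n)      # rescaled value at "time 1"
mean_W1, var_W1 = W1.mean(), W1.var()   # should be ~ 0 and ~ 1
```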

11.2 Diffusion Models with Explicit Solutions

For the geometric Brownian motion dy_t = μ y_t dt + σ y_t dW_t, the solution satisfies

log y_t = log y_0 + t(μ − σ²/2) + σ W_t   ⟺   y_t = y_0 exp[t(μ − σ²/2)] exp(σ W_t).   (11.12)

From this analytical expression, we deduce the distributional properties of the solution. For instance, the conditional distribution of y_t given y_0 is a log-normal distribution with parameters log y_0 + t(μ − σ²/2) and σ²t. Its conditional mean is

E(y_t | y_0) = y_0 exp[t(μ − σ²/2)] exp(σ²t/2) = y_0 exp(μt).   (11.13)
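Formula (11.13) can be checked by Monte Carlo simulation of the exact solution; the parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, y0, t, reps = 0.1, 0.3, 1.0, 2.0, 200000
# exact solution (11.12): y_t = y0 exp[(mu - sigma^2/2) t + sigma W_t], W_t ~ N(0, t)
W = rng.normal(0.0, np.sqrt(t), reps)
y = y0 * np.exp((mu - 0.5 * sigma ** 2) * t + sigma * W)
mc_mean = y.mean()
exact_mean = y0 * np.exp(mu * t)    # formula (11.13): y0 exp(mu t)
```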

The trajectories of the solution feature an exponential trend. The conditional variance is given by

V(y_t | y_0) = E(y_t² | y_0) − [E(y_t | y_0)]²
             = y_0² exp[t(2μ − σ²)] E(exp 2σW_t) − y_0² exp(2μt)
             = y_0² exp(2μt) [exp(σ²t) − 1],   (11.14)

or, equivalently, V(y_t | y_0) = E(y_t | y_0)² [exp(σ²t) − 1]. The last equation implies an exponentially increasing ratio of the variance to the squared mean. Figure 11.2 displays a simulated path of a geometric Brownian motion with μ = 0.5 and σ = 1.

Figure 11.2 Simulated Path of a Geometric Brownian Motion

11.2.2 The Ornstein-Uhlenbeck Process

The stochastic differential equation for the Ornstein-Uhlenbeck (OU) process is

dy_t = (φ − λ y_t) dt + σ dW_t.   (11.15)

To solve this equation, it is useful to consider first the deterministic equation corresponding to (11.15) without noise, that is, under the constraint σ = 0. This equation,

dy_t = (φ − λ y_t) dt,   (11.16)

is an ordinary differential equation that is linear in y_t. Therefore, its general solution admits the form

y_t = k exp(−λt) + φ/λ,   (11.17)

where k is an arbitrary constant. To solve the initial stochastic differential equation, we introduce the change of variable y_t → ξ_t, where

y_t = ξ_t exp(−λt) + φ/λ.   (11.18)

By Ito's lemma, the second-order correction term is not required, since the relation between y_t and ξ_t is linear. We get

dy_t = dξ_t exp(−λt) − λ ξ_t exp(−λt) dt
     = dξ_t exp(−λt) − λ (y_t − φ/λ) dt
     = dξ_t exp(−λt) + (φ − λ y_t) dt.

From (11.15), we deduce

σ dW_t = dξ_t exp(−λt)   ⟺   dξ_t = σ exp(λt) dW_t.

Therefore, for any pair of dates t' < t, we can express ξ_t as a function of ξ_{t'} and of the noise realizations between t' and t:

ξ_t = ξ_{t'} + ∫_{t'}^{t} σ exp(λu) dW_u,   t' < t.

A similar relation is derived from (11.18) for the initial process:

y_t = ξ_t exp(−λt) + φ/λ
    = [ξ_{t'} + ∫_{t'}^{t} σ exp(λu) dW_u] exp(−λt) + φ/λ
    = [y_{t'} exp(λt') − (φ/λ) exp(λt')] exp(−λt) + φ/λ + σ exp(−λt) ∫_{t'}^{t} exp(λu) dW_u
    = y_{t'} exp[−λ(t − t')] + (φ/λ)[1 − exp(−λ(t − t'))] + σ ∫_{t'}^{t} exp[−λ(t − u)] dW_u.

PROPOSITION 11.3: The solutions of the stochastic differential equation dy_t = (φ − λ y_t) dt + σ dW_t satisfy, for any t' < t,

y_t = exp[−λ(t − t')] y_{t'} + (φ/λ)[1 − exp(−λ(t − t'))] + σ ∫_{t'}^{t} exp[−λ(t − u)] dW_u.

Several important results can be inferred from this expression. They are given below as corollaries.

COROLLARY 11.1: For an OU process, y_t − exp[−λ(t − t')] y_{t'} is independent of y_τ, τ ≤ t'.

PROOF: Indeed, y_t − exp[−λ(t − t')] y_{t'} is a function of the infinitesimal increments of the Brownian motion on (t', t), whereas the values y_τ, τ ≤ t', depend on the values of the Brownian motion prior to t'. The corollary is then a consequence of the independent increments property of Brownian motion. QED

We now explicitly describe the relation when t' = t − 1. We get

y_t = (exp −λ) y_{t−1} + φ (1 − exp −λ)/λ + σ ∫_{t−1}^{t} exp[−λ(t − u)] dW_u.

The variables σ ∫_{t−1}^{t} exp[−λ(t − u)] dW_u, t varying, are Gaussian and independent, with mean 0 and variance given by

σ² ∫_{t−1}^{t} exp[−2λ(t − u)] du = (σ²/2λ)(1 − exp −2λ).

Therefore, we can write

y_t = (exp −λ) y_{t−1} + φ (1 − exp −λ)/λ + σ [(1 − exp −2λ)/(2λ)]^{1/2} ε_t,   (11.19)

where (ε_t, t ∈ Z) is a Gaussian white noise with unit variance. We get the following corollary:

COROLLARY 11.2: For λ > 0, the discrete time process (y_t, t ∈ Z) is a Gaussian autoregressive process of order 1, with mean φ/λ, autoregressive coefficient exp(−λ), and innovation variance σ²(1 − exp −2λ)/(2λ).

This result can be extended to prove that, for λ > 0, the continuous time process (y_t, t ∈ R) is also stationary and Gaussian. Thus, the OU process is the continuous time analog of the Gaussian AR(1) autoregressive process (see also Chapter 2).
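Corollary 11.2 suggests an exact simulation scheme for the OU process at integer dates: iterate the Gaussian AR(1) recursion (11.19). A sketch with illustrative parameters φ = λ = 2 and σ = 1, so that the stationary mean is 1 and the stationary standard deviation is 0.5:

```python
import numpy as np

rng = np.random.default_rng(5)
phi, lam, sigma = 2.0, 2.0, 1.0
T = 100000
rho = np.exp(-lam)                                   # AR coefficient exp(-lambda)
c = phi * (1 - rho) / lam                            # intercept phi (1 - e^{-lam}) / lam
s = sigma * np.sqrt((1 - np.exp(-2 * lam)) / (2 * lam))  # innovation std, eq. (11.19)
y = np.empty(T)
y[0] = phi / lam                                     # start at the stationary mean
eps = rng.normal(size=T)
for t in range(1, T):
    y[t] = rho * y[t - 1] + c + s * eps[t]           # exact AR(1) discretization of OU
emp_mean = y.mean()                                  # ~ phi / lam = 1
emp_rho = np.corrcoef(y[1:], y[:-1])[0, 1]           # ~ exp(-lam) ~ 0.135
```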

11.2.3 The Cox-Ingersoll-Ross Model

The model of Cox, Ingersoll, and Ross (1985) was introduced to represent the dynamics of the short-term interest rate. The stochastic differential equation is

dy_t = (a − b y_t) dt + σ √y_t dW_t.   (11.20)

While the drift is a linear function of y_t, as in the Ornstein-Uhlenbeck process, the volatility is a square root of y_t. For this reason, the CIR (Cox-Ingersoll-Ross) process is often called a square root process.

PROPOSITION 11.4: Let us denote c(t) = (σ²/4b)[1 − exp(−bt)]. Then, the conditional Laplace transform of the process x_t = y_t/c(t) is given by

E[exp(−λ x_t) | y_0] = (2λ + 1)^{−2a/σ²} exp[ −(λ/(2λ + 1)) 4y_0 b/(σ²(exp bt − 1)) ].

The conditional distribution of x_t = y_t/c(t) given y_0 is a noncentered chi-square distribution with δ = 4a/σ² degrees of freedom and noncentrality parameter

ζ = 4y_0 b / (σ²(exp bt − 1)).

Let us recall that the noncentered chi-square distribution χ²(δ, ζ) admits the density function

f(x) = (1/2) exp[−(x + ζ)/2] (x/ζ)^{δ/4 − 1/2} I_{δ/2 − 1}(√(ζx)),   (11.21)

where I_ν(x) is the modified Bessel function

I_ν(x) = (x/2)^ν Σ_{n=0}^{∞} (x/2)^{2n} / [n! Γ(ν + n + 1)].   (11.22)

This result is valid when δ and ζ are positive or, equivalently, if a > 0 and b > 0. Then, the CIR process is stationary and takes positive values; the additional condition 2a/σ² ≥ 1 ensures that the process never reaches 0. The marginal distribution is derived by letting t → ∞. We see that

4b y_t / σ² ~ χ²(4a/σ²).   (11.23)

The first- and second-order conditional moments are easily derived from the second-order expansion of the log-Laplace transform. Indeed, it is easily checked that

log E[exp(−λ x_t) | y_0] = −λ E(x_t | y_0) + (λ²/2) V(x_t | y_0) + o(λ²).   (11.24)

We find

log E[exp(−λ x_t) | y_0] = −(2a/σ²) log(2λ + 1) − (λ/(2λ + 1)) ζ.

Therefore, we get

E(x_t | y_0) = 4a/σ² + ζ,   V(x_t | y_0) = 2(4a/σ²) + 4ζ.   (11.25)

The conditional mean and variance are both linear functions of ζ, and thus of y_0. The marginal moments correspond to the limiting case t = +∞:

E(y_t) = a/b,   V(y_t) = a σ² / (2b²).   (11.26)
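The marginal moments (11.26) can be verified by drawing directly from the marginal distribution (11.23); the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
a, b, sigma = 0.04, 0.5, 0.1
# marginal distribution of the CIR process: 4b y / sigma^2 ~ chi^2(4a/sigma^2)
df = 4 * a / sigma ** 2                 # = 16 degrees of freedom here
y = rng.chisquare(df, size=500000) * sigma ** 2 / (4 * b)
emp_mean, emp_var = y.mean(), y.var()
th_mean = a / b                         # = 0.08, formula (11.26)
th_var = a * sigma ** 2 / (2 * b ** 2)  # = 0.0008, formula (11.26)
```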

11.3 Approximation of Diffusion Models

When a diffusion equation does not admit a closed-form solution, it is generally also impossible to find an analytical formula for the conditional distribution of y_t given y_0, say. The knowledge of this distribution is, however, necessary for derivative asset pricing and estimation. In this section, we present various approximations of a diffusion model, such that the conditional distribution of the approximated model is easy to derive and close to the true one.

11.3.1 Euler Discretization

The Euler discretization approach consists of replacing the initial diffusion model

dy_t = μ(y_t) dt + σ(y_t) dW_t,   y_0 given,   (11.27)

by a recursive equation in discrete time. More precisely, let us introduce a small time interval δ and consider the process (ỹ_t^{(δ)}, t ∈ R_+) such that

ỹ_t^{(δ)} = y_{[t/δ]}^{(δ)},

where [x] denotes the greatest integer less than or equal to x, and

y_n^{(δ)} = y_{n−1}^{(δ)} + δ μ(y_{n−1}^{(δ)}) + σ(y_{n−1}^{(δ)}) √δ ε_n^{(δ)},   (11.28)

where ε_n^{(δ)}, n varying, is a Gaussian standard white noise.

This equation represents the Euler discretized version of the continuous time diffusion (11.27). A transition from date n − 1 to date n takes time δ; dy_t has been replaced by the corresponding increment Δy_n^{(δ)}, dt by δ, and dW_t, which is N(0, δ), has been standardized, that is, divided by δ^{1/2}. This discretization is interesting to us because of the following proposition:

PROPOSITION 11.5: The conditional distribution of ỹ_t^{(δ)} given y_0 tends to the conditional distribution of y_t given y_0 when δ tends to 0.

11.3.2 Binomial Tree

Approximation (11.28) has been derived by a time discretization that preserved the normality of the noise. It is also possible to discretize both time and the value space of the noise. A binomial tree is obtained when the noise admits only two values, denoted u_δ and d_δ, for up and down movements, respectively (see Section 11.4). The approximated model is

y_n^{(δ)} = y_{n−1}^{(δ)} + ε_n^{(δ)},

where

ε_n^{(δ)} = u_δ(y_{n−1}^{(δ)}) with probability p_δ(y_{n−1}^{(δ)}),
ε_n^{(δ)} = d_δ(y_{n−1}^{(δ)}) with probability 1 − p_δ(y_{n−1}^{(δ)}).

The admissible values of the shock may depend on the lagged price, as does the probability of an up movement. The proposition below gives a condition for the convergence in distribution of ỹ_t^{(δ)} to y_t that is easy to interpret (Stroock and Varadhan 1979; Nelson and Ramaswamy 1990).

PROPOSITION 11.6: The conditional distribution of ỹ_t^{(δ)} given y_0 tends to the conditional distribution of y_t given y_0, when δ tends to 0, if the two processes have asymptotically the same instantaneous drift function and volatility function, and if the instantaneous third-order moment tends to 0.

For instance, we can set the average drift and volatility over a period of time of length δ equal to their continuous time counterparts. The conditions are

(1/δ) E[ y_n^{(δ)} − y_{n−1}^{(δ)} | y_{n−1}^{(δ)} ] = μ(y_{n−1}^{(δ)}),
(1/δ) V[ y_n^{(δ)} | y_{n−1}^{(δ)} ] = σ²(y_{n−1}^{(δ)}),

or, equivalently (omitting the argument y_{n−1}^{(δ)}),

p_δ u_δ + (1 − p_δ) d_δ = δ μ,
p_δ u_δ² + (1 − p_δ) d_δ² − (δμ)² = δ σ².

We get a system of two equations with three unknowns, p_δ, u_δ, and d_δ. Now, we need to select the admissible solutions with trajectories that are sufficiently smooth to ensure that the limiting path of y_t is continuous (see Section 11.6.2 for the discontinuous case). This is the reason for introducing an additional condition on the instantaneous third moment, which may be written as

(1/δ) E[ (y_n^{(δ)} − y_{n−1}^{(δ)})³ | y_{n−1}^{(δ)} ] → 0,   when δ → 0.

In particular, we can introduce the third equation

p_δ(y_{n−1}^{(δ)}) [u_δ(y_{n−1}^{(δ)})]³ + [1 − p_δ(y_{n−1}^{(δ)})] [d_δ(y_{n−1}^{(δ)})]³ = 0.

(.,') 100 1 + cr(Yn-I ' ) 10 1 Em , Yn" = Yn-I + ~\yn-I

n= 1,.,.,

where E~, n = 0, 1, ... , is a sequence of independent drawings from the standard normal distribution, and Y~ = 0 (say). Next, we plot the values Y~ against the associated times t = nll00 and link the points by segments of a straight line. This approach can be repeated several times, producing a number of different trajectories. In particular, we may consider another sequence of drawings n varying, and generate another simulated path y~', s varying. Such a repetition is called a replication. As an illustration, we have drawn 200 simulated paths of the continuous time process with length N = 1000 [= t = 10],

e:;,

dYt = 2(1 - Yt)dt + dWt. We display in Figure 11.3 the estimated marginal distribution of YIO inferred from the repeated simulations. Note that it is an approximation ofa normal distribution with mean 1 and standard error 0.5 [see (11.19)].

Diffusion Models

256

.

>-CX)

(.)

~ 0 ::I

tTco•

(I)

.:::0

C\I

ci

o ci

L -_ _~_ _ _ _ _ _~_ _ _ _ _ _~_ _ _ _ _ _~_ _ _ _ _ _~_ _~

0.0

0.5

1.0 y[10]

1.5

2.0

Figure 11.3 Monte-Carlo Approximation of Marginal Distribution

The accuracy of this approximation depends on the selected time unit B. The smaller the time unit, the more accurate is the approximation to a continuous time trajectory. We provide in Figure 11.4 the approximations obtained with B= 11100, B= 1110, and B= 1. Their comparison shows clearly that the accuracy of the approximation is very sensitive to the selected B, which needs to be fIxed at a sufficiently small value. Sometimes, the accuracy of the estimated marginal distribution may be improved by introducing either dependent replications or a different Euler discretization scheme. For instance, instead of independent drawings of the sequence (£1, ... , £fv), for varying 5, we can perform a single drawing and produce the additional ones by reverting the sign of a subset of components. We still obtain drawings in the same distribution; however, they are dependent. This simulation technique is called antithetic. We can also perform the simulation under a different Euler discretization. Recall that the Euler discretization depends on a preliminary transformation applied to the data. If g is a one-to-one transformation, we deduce from Ito's formula

with the following Euler discretization:

J

(5)] _

gLYn

J

(5) ] _ [

gLYn-1 -

~ (5)] [(5)] ! ~ [ (5) ] -2[ (5) ]] I: ~ [ (5)] [(5)] _ ~ (5) ay [Yn-l /l Yn-l + 2 al Yn-l U Yn-l U ay Yn-l 0" Yn-I'V 0 E;; •

257

Approximation of Diffusion Models ~

.-----------------------------------------,

zs the cdf of the standard nor-

mal distribution. PROOF: See Appendix 11.2 11. 4.3

General Result

Pricing Formula The previous approach can be extended to risky assets with more complex dynamics. We still consider a risk-free asset with a risk-free rate rand a risky asset with a price (S,) that follows a Markov process. The market is complete if there exists a unique pricing formula for any derivative written on S. Let us consider a derivative asset providing the cash flow g(ST) at date T and denote by C(t,T - t,g) its price at date t, t ~ T. It is possible to show that the pricing formula can only be of the type C(t, T - t,g) = exp - r(T - t)Ex[g(ST) ISa, where

1t

is a unique modified distribution of the process (S,).

(11.35)

Derivative Pricing in Complete Market

267

The conditional risk-neutral density, that is, the probability density function (pdf) of ST given Se. denoted 1tT-ls 1St), is directly related to prices of digital options. Let us introduce the derivative associated with the cash flow g(ST) = l(s,!+d5)(ST). From (11.35), the price of this digital option is approximately equal to exp - r(T - t)1tT_ls ISt)ds. Therefore, 1tT-ls ISt)ds is a normalized price of a digital option, that is, its price divided by the price of the zero-coupon bond with residual maturity T - t. Determination of the Risk-Neutral Probability

Let us assume that the asset price satisfies a stochastic differential equation (l1.36) There exist three methods of computing the risk-neutral probability. The first one approximates the continuous time dynamics by an appropriate binomial tree and computes the risk-neutral probability associated with this tree. This approach is frequently used by practitioners. The second approach is based on a theorem about martingales that characterizes the change of probability measure that leads to a martingale price process. This theorem, called the Girsanov theorem, is given below in the framework of asset prices determined by (11.36) (see, e.g., Karatzas and Shreve 1988, p. 184; Pham and Touzi 1996). PROPOSITION 11.14, GIRSANOV THEOREM: The change of probability measure is such that E"[g(ST) 1St]

= E[exp{-

where ~ =!leSt), probability.

0",

r ~ ~,rs,

dW, -

~

r (~ ~,rs,

fd't}g(ST)

IS}

=O"(S,), and E is the expectation with respect to the historical

Finally, the risk-neutral probability may also be derived by solving partial differential equations. More precisely, let us fix the maturity T and the cash flow g, and denote by G(t) = C(t, T − t; g) the price at t. The function G will satisfy a partial differential equation. The coefficients of this equation are independent of T and g, that is, of the derivative to be priced. The effect of the derivative is taken into account only by the terminal condition G(T) = g(S_T). The differential equation satisfied by the option price is derived by arbitrage conditions (Merton 1973b). The idea is to construct a risk-free self-financed portfolio based on both the asset and the derivative. By the absence of arbitrage, its return will be equal to the risk-free rate.

Diffusion Models


Let us denote by a(t) and a_g(t) the allocations in the basic asset and the derivative at date t. They are selected depending on the price history. The self-financing condition is

a(t)S(t + dt) + a_g(t)G(t + dt) = a(t + dt)S(t + dt) + a_g(t + dt)G(t + dt),

for small dt. This is equivalent to

da(t)S(t) + da_g(t)G(t) = 0,  ∀t.   (11.37)

Let us now assume that the derivative price is a function of the current value of the asset, G(t) = G[S_t, t], say. Then, by Ito's lemma, we get

dG(t) = μ_g dt + σ_g dW_t,   (11.38)

where

μ_g = (∂G/∂S)μ + ∂G/∂t + (1/2)(∂²G/∂S²)σ²,   (11.39)

σ_g = (∂G/∂S)σ.   (11.40)

The value V(t) of the self-financed portfolio is also a function of S(t) and satisfies a stochastic differential equation. We get

dV(t) = da(t)S(t) + da_g(t)G(t) + a(t)dS(t) + a_g(t)dG(t)
      = a(t)dS(t) + a_g(t)dG(t),  by the self-financing condition,
      = (aμ + a_g μ_g)dt + (aσ + a_g σ_g)dW_t.

The portfolio is riskless if aσ + a_g σ_g = 0, ∀t. Under this condition, its expected return is equal to the risk-free return: aμ + a_g μ_g = r(aS + a_g G). We deduce the condition

[μ(t) − S(t)r]/σ(t) = [μ_g(t) − G(t)r]/σ_g(t),   (11.41)

that is, the equality of the instantaneous excess performance of the two risky assets. By substituting the expressions of the instantaneous drift μ_g and volatility σ_g, we deduce

(μ − Sr)/σ = [(∂G/∂S)μ + ∂G/∂t + 0.5(∂²G/∂S²)σ² − Gr] / [(∂G/∂S)σ],

or, equivalently,

∂G/∂t + rS(∂G/∂S) + (1/2)σ²(∂²G/∂S²) − rG = 0,   (11.42)


which is a second-order linear parabolic partial differential equation of G. It is important to note that the coefficients of this equation depend on neither the transformation g nor the maturity T. The solution G[S, t] is subject to the boundary conditions

G[s, T] = g(s),  ∀s,

and

G[0, t] = 0,  ∀t ≤ T.

The second condition means that the market for the derivative cannot exist when the market for the underlying asset is eliminated.
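As an illustration (not part of the original text), the valuation equation (11.42) can be solved numerically with an explicit finite-difference scheme, stepping backward from the terminal condition and imposing the two boundary conditions above. The sketch below prices a European call this way and compares the result to the closed-form Black-Scholes price; the grid sizes and parameter values are arbitrary choices.

```python
import numpy as np
from math import log, sqrt, exp, erf

def bs_call(S, K, r, sigma, tau):
    # Closed-form Black-Scholes call, used here only as a benchmark
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return S * Phi(d1) - K * exp(-r * tau) * Phi(d2)

def pde_call(S0, K, r, sigma, T, Smax=300.0, M=150, N=2000):
    """Explicit scheme for dG/dt + r S dG/dS + 0.5 sigma^2 S^2 d2G/dS2 - r G = 0,
       with G(S, T) = (S - K)^+ and G(0, t) = 0."""
    dS, dt = Smax / M, T / N
    S = np.linspace(0.0, Smax, M + 1)
    G = np.maximum(S - K, 0.0)                  # terminal condition G(S, T) = g(S)
    for n in range(N):                          # step backward in calendar time
        tau = (n + 1) * dt                      # time to maturity after this step
        Gi, Gp, Gm = G[1:-1], G[2:], G[:-2]
        dG = (Gp - Gm) / (2 * dS)               # central difference for dG/dS
        d2G = (Gp - 2 * Gi + Gm) / dS**2        # central difference for d2G/dS2
        G[1:-1] = Gi + dt * (r * S[1:-1] * dG
                             + 0.5 * sigma**2 * S[1:-1]**2 * d2G - r * Gi)
        G[0] = 0.0                              # the derivative dies with the asset
        G[-1] = Smax - K * np.exp(-r * tau)     # deep in the money approximation
    return float(np.interp(S0, S, G))
```

The time step must be small enough for the explicit scheme to be stable (dt of order dS²/(σ²S_max²)); an implicit scheme would remove this constraint at the cost of a linear solve per step.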

11.5 Derivative Pricing in Incomplete Markets

The possibility of pricing derivatives without ambiguity in complete markets is due to the dimension of the underlying dynamic model. Indeed, for the binomial tree, the number of admissible states one period ahead equals the number of assets; similarly, the movements of the log-normal process are driven by two shocks, dt and dW_t, a number equal to the number of assets. In this section, we discuss the incomplete market, in which the number of tradable assets is less than the number of shocks. We first extend the tree by considering a double binomial tree. By using this simple approach, we describe the problem of multiple admissible pricing formulas. Then, we provide the general pricing formula and apply it to a stochastic volatility model.

11.5.1 A Double Binomial Tree

We introduce a tree in which two binary movements can take place between consecutive trading dates. We focus on the pricing at horizon 1, where 1 denotes the first trading date after date 0. Between 0 and t = 1/2, we can have up or down movements with probabilities p and 1 − p, respectively, and independently between t = 1/2 and 1, the same type of movements with the same weights. Using the notation of Section 11.4.1, the price evolution between 0 and 1 is defined as

S_1 = (1 + u)²S_0,  with probability p²,
S_1 = (1 + u)(1 + d)S_0,  with probability 2p(1 − p),
S_1 = (1 + d)²S_0,  with probability (1 − p)².


Moreover, we assume a risk-free asset, which may be traded at integer-valued dates. Its price at date 0 is 1, whereas its price at date 1 is (1 + r)². We have three digital options (Arrow securities) with maturity 1, associated with the movements (up, up), (up, down) or (down, up), and (down, down). Let us denote by c(1,1), c(1,0), and c(0,0) their admissible prices. By the no arbitrage argument, we get for the risky asset

S_0 = (1 + u)²S_0 c(1,1) + (1 + u)(1 + d)S_0 c(1,0) + (1 + d)²S_0 c(0,0),

and for the risk-free asset

1 = (1 + r)²[c(1,1) + c(1,0) + c(0,0)].

We get two linear equations for the three unknown digital prices, which have to be solved under the price nonnegativity constraint:

{ 1 = (1 + u)²c(1,1) + (1 + u)(1 + d)c(1,0) + (1 + d)²c(0,0),
  1/(1 + r)² = c(1,1) + c(1,0) + c(0,0),   (11.43)

where c(1,1) ≥ 0, c(1,0) ≥ 0, and c(0,0) ≥ 0. This system admits an infinity of solutions. Moreover, with any solution c(1,1), c(1,0), c(0,0), we can associate a risk-neutral probability π(1,1), π(1,0), π(0,0), where π(i,j) = (1 + r)²c(i,j). The admissible price of a contingent asset written on the risky asset S_1 and delivering at date 1 the cash flow g(S_1) is

c(g) = g((1 + u)²S_0)c(1,1) + g((1 + u)(1 + d)S_0)c(1,0) + g((1 + d)²S_0)c(0,0)
     = [1/(1 + r)²][g((1 + u)²S_0)π(1,1) + g((1 + u)(1 + d)S_0)π(1,0) + g((1 + d)²S_0)π(0,0)]
     = [1/(1 + r)²] E^π(g(S_1) | S_0).

Thus, we get an infinite number of admissible contingent prices when the digital prices vary under constraints (11.43). The price multiplicity can be handled in various ways. For instance, we can search for a subset of admissible prices, that is, prices that do not allow for an arbitrage among the risk-free asset, the risky asset, and the derivative g. The interval is given by [c_min(g), c_max(g)], where c_min(g) and c_max(g) are the minimal and maximal admissible prices, respectively:

c_min(g) = min_{c ∈ C} c(g),  c_max(g) = max_{c ∈ C} c(g),

where C denotes the set of digital prices satisfying (11.43).
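Because (11.43) imposes two linear equations on three nonnegative digital prices, the admissible set C is a one-parameter family, and the bounds can be computed by a simple scan. The sketch below (parameter values are illustrative, not from the text) does this for a call payoff, treating c(1,0) as the free parameter:

```python
import numpy as np

# Tree and payoff parameters (illustrative choices)
u, d, r, S0, K = 0.10, -0.05, 0.02, 100.0, 100.0
A, B, C2 = (1 + u)**2, (1 + u) * (1 + d), (1 + d)**2
total = 1.0 / (1 + r)**2                     # c(1,1) + c(1,0) + c(0,0)

def payoff(s):                               # derivative g: a call struck at K
    return max(s - K, 0.0)

prices = []
for x in np.linspace(0.0, total, 20001):     # scan the free price c(1,0) = x
    # solve the two linear equations (11.43) for c(1,1) and c(0,0)
    c11 = (1.0 - B * x - C2 * (total - x)) / (A - C2)
    c00 = (total - x) - c11
    if c11 >= 0.0 and c00 >= 0.0:            # keep nonnegative solutions only
        prices.append(payoff(A * S0) * c11 + payoff(B * S0) * x + payoff(C2 * S0) * c00)

c_min, c_max = min(prices), max(prices)
```

Any single risk-neutral probability, such as the one of Example 11.3, produces a price inside [c_min, c_max].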


We also can study all admissible prices and try to interpret their expressions.

EXAMPLE 11.3: It is easy to check that the digital prices at horizon 2 of the binomial tree (see Section 11.4.1)

c(1,1) = π²/(1 + r)²,  c(1,0) = 2π(1 − π)/(1 + r)²,  c(0,0) = (1 − π)²/(1 + r)²,

where π = (r − d)/(u − d), are solutions of the system in (11.43).

EXAMPLE 11.4: More generally, we can price differently the first and second bifurcations of the tree, (π₁, 1 − π₁) and (π₂, 1 − π₂), respectively. The associated digital prices are

c(1,1) = π₁π₂/(1 + r)²,  c(1,0) = [π₁(1 − π₂) + π₂(1 − π₁)]/(1 + r)²,  c(0,0) = (1 − π₁)(1 − π₂)/(1 + r)².

Conditions (11.43) are satisfied if and only if

(1 + r)² = (1 + u)²π₁π₂ + (1 + u)(1 + d)[π₁(1 − π₂) + π₂(1 − π₁)] + (1 + d)²(1 − π₁)(1 − π₂)
         = [(1 + u)π₁ + (1 + d)(1 − π₁)][(1 + u)π₂ + (1 + d)(1 − π₂)].

We get an infinite number of pricing probabilities of this type. The differences arise from different pricing of the zero coupon at horizon 1/2, with rates r₁ and r₂ at the first and second subperiods, respectively, where (1 + r)² = (1 + r₁)(1 + r₂).

11.5.2 General Result

In an incomplete market framework, the derivative pricing formula is not unique. We still can write the relation

C(t, T − t, g) = exp[−r(T − t)] E^π[g(S_T) | I_t],   (11.44)

where I_t is the information available to investors at date t, but we need to be aware that there exists an infinite number of admissible risk-neutral probabilities. This set of risk-neutral probabilities can be described case by case using any of the three methods considered in Section 11.4.3. The multiplicity of pricing formulas is related to the number of independent shocks that affect the price of the underlying asset. Let us consider the special case of a tradable risky asset and a tradable derivative. We cannot construct a self-financed portfolio containing this asset and derivative that would be insensitive (immune) to two or more independent shocks.

As an illustration, let us consider a model of the type

dS_t = μ(S_t, η_t)dt + σ(S_t, η_t)dW_1(t),   (11.45)

where the drift and volatility are driven by an additional factor satisfying a stochastic differential equation

dη_t = a(S_t, η_t)dt + b(S_t, η_t)dW_2(t),   (11.46)

with another Brownian motion. We can reapply the argument of Section 11.4.3 with a derivative price of the type G(t) = G(S_t, η_t, t), a function of the two types of information on the price and the latent factor. We assume that the investor observes both S_t and η_t and may use this information to update the portfolio. The value of the self-financed portfolio satisfies the equation

dV(t) = (aμ + a_g μ_g)dt + (aσ + a_g σ_{1,g})dW_1(t) + a_g σ_{2,g} dW_2(t),

using obvious notation. We cannot choose nondegenerate allocations a and a_g to jointly eliminate the effects of the shocks dW_1 and dW_2. However, we obtain restrictions on the derivative prices if we include two tradable derivatives in the portfolio, with cash flows g(S_T) and f(S_T), say. The portfolio value satisfies

dV(t) = (aμ + a_g μ_g + a_f μ_f)dt + (aσ + a_g σ_{1,g} + a_f σ_{1,f})dW_1(t) + (a_g σ_{2,g} + a_f σ_{2,f})dW_2(t).

The absence of arbitrage condition implies

{ a_g σ_{2,g} + a_f σ_{2,f} = 0,
  aσ + a_g σ_{1,g} + a_f σ_{1,f} = 0,
⇒ aμ + a_g μ_g + a_f μ_f = r(aS + a_g G + a_f F),

or, equivalently,

μ − rS = λ₁σ,   (11.47)

μ_g − rG = λ₁σ_{1,g} + λ₂σ_{2,g},  μ_f − rF = λ₁σ_{1,f} + λ₂σ_{2,f},   (11.48)

where λ₁ and λ₂ are multipliers that depend on the information I_t. The system has a simple interpretation. Indeed, at any date, the instantaneous expected excess returns capture the risk premia associated with the two types of risk, that is, dW_1 and dW_2. The multipliers λ₁ and λ₂ provide the prices of these risks. If S is the only traded asset, λ₁ = (μ − rS)/σ is defined unambiguously, whereas λ₂ can be arbitrarily selected. The multiplicity of pricing formulas is due to this arbitrary multiplier λ₂. If there also exists a derivative (g, say) traded on the market, λ₂ is determined by the observed derivative price.


It is also possible to apply the Girsanov theorem to the bidimensional system in (11.45) and (11.46) (see Pham and Touzi 1996). Let us first make the two Brownian motions orthogonal and write

{ dS_t = μ(S_t, η_t)dt + σ(S_t, η_t)[(1 − ρ²(S_t, η_t))^{1/2} dW*_{1t} + ρ(S_t, η_t)dW*_{2t}],   (11.49)
  dη_t = a(S_t, η_t)dt + b(S_t, η_t)dW*_{2t},

where W*_{2t} = W_{2t}, and W*_{1t}, W*_{2t} are independent Brownian motions.

PROPOSITION 11.15, GIRSANOV THEOREM: The admissible changes of probability compatible with the differential system (11.49) are such that

E^π(g(S_T) | I_t) = E[exp{−∫_t^T λ_τ dW*_{1τ} − (1/2)∫_t^T λ_τ² dτ} exp{−∫_t^T ν_τ dW*_{2τ} − (1/2)∫_t^T ν_τ² dτ} g(S_T) | I_t],

where the processes (λ_t) and (ν_t) satisfy the constraint

[λ_t(1 − ρ_t²)^{1/2} + ν_t ρ_t]σ_t = μ_t − rS_t.

The multipliers λ_t and ν_t can be interpreted as the path-dependent premia with respect to the two sources of uncertainty W*_1 and W*_2.

11.5.3 Stochastic Volatility Model

The Model

Stochastic volatility models in continuous time were introduced by Hull and White (1987) and Scott (1987) and later extended by many authors (see, e.g., Follmer and Schweizer 1991). They admit the following structure:

dS_t = μ(t, S_t, σ_t)S_t dt + σ_t S_t dW_{1t},
df(σ_t) = a(t, σ_t)dt + b(t, σ_t)dW_{2t},

where (W_{1t}) and (W_{2t}) are independent Brownian motions. Thus, the model allows the stochastic volatility σ_t to influence the drift.

EXAMPLE 11.5: The initial model introduced by Hull and White (1987) is

dS_t = μS_t dt + σ_t S_t dW_{1t},
dσ_t² = aσ_t² dt + bσ_t² dW_{2t}.

The price equation resembles a geometric Brownian motion except for the varying volatility. The volatility equation does not ensure, however, the nonnegativity of σ_t². Therefore, the second equation is often


replaced by an Ornstein-Uhlenbeck process for the log volatility. It corresponds, for instance, to the limit of the ARCH-type process derived by Nelson (1990; see Section 11.3.4). The model becomes

dS_t = μS_t dt + σ_t S_t dW_{1t},
d log σ_t = a_0(a_1 − log σ_t)dt + b dW_{2t}.
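A minimal simulation sketch of this log-volatility model follows (all parameter values are arbitrary). The price is simulated through its logarithm, using d log S_t = (μ − σ_t²/2)dt + σ_t dW_{1t}, which also guarantees positivity of the simulated price and volatility paths.

```python
import numpy as np

def simulate_sv(mu=0.05, a0=2.0, a1=np.log(0.2), b=0.5,
                S0=100.0, T=1.0, n=2000, seed=0):
    """Euler scheme for
       d log S_t     = (mu - sigma_t^2 / 2) dt + sigma_t dW1_t,
       d log sigma_t = a0 (a1 - log sigma_t) dt + b dW2_t,
       with independent Brownian motions (illustrative parameter values)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    logS = np.empty(n + 1)
    logsig = np.empty(n + 1)
    logS[0], logsig[0] = np.log(S0), a1          # start volatility at its mean
    dW1 = rng.normal(0.0, np.sqrt(dt), n)
    dW2 = rng.normal(0.0, np.sqrt(dt), n)
    for i in range(n):
        sig = np.exp(logsig[i])
        logS[i + 1] = logS[i] + (mu - 0.5 * sig**2) * dt + sig * dW1[i]
        logsig[i + 1] = logsig[i] + a0 * (a1 - logsig[i]) * dt + b * dW2[i]
    return np.exp(logS), np.exp(logsig)

S, vol = simulate_sv()
```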

A Pricing Formula

Let us assume a Black-Scholes type of price equation

dS_t = μS_t dt + σ_t S_t dW_t.   (11.50)

We can intuitively derive a pricing formula for the stochastic volatility model from the Black-Scholes formula (see Proposition 11.11). Let us restrict our attention to European call options. In the standard Black-Scholes formula, the volatility is constant, so that σ√(T − t) is the cumulated volatility between the current date t and maturity T. When volatility is time varying, this cumulated volatility becomes [∫_t^T σ_u² du]^{1/2}.

We get

log S_{(n+1)δ} − log S_{nδ} = log ε_{nδ},

log S_t − log S_0 = Σ_{n=0}^{[t/δ]−1} log ε_{nδ}
                  = [t/δ] log(1 + δu) + [log(1 + d/λ) − log(1 + δu)]N_{t,δ},

where N_{t,δ} is the sum of [t/δ] independent Bernoulli variables with parameter p_δ. Proposition 11.16 follows directly from a (functional) convergence theorem to a Poisson process.

PROPOSITION 11.16: Let us assume that δ → 0, p_δ → 0, and p_δ/δ → λ > 0; then the process log S_t − log S_0 weakly converges to the process

[ut + log(1 + d/λ)N_t(λ)],

where N_t(λ) is a Poisson process with constant intensity λ.

Thus, by changing the condition on the probability p_δ, the binomial tree can converge to the solution of a pure differential equation with jump:

d log S_t = u dt + log[1 + d/λ]dN_t(λ).   (11.54)

11.6.3 Risk-Neutral Probability

The corresponding risk-neutral probability can be derived by considering the limit of the binomial tree corrected for the risk (see Appendix 11.1


for the standard case). The risk-neutral probabilities of the two states, 1 − π_δ and π_δ, are given by

π_δ = δu/(δu − d/λ).   (11.55)

This probability is such that

lim_{δ→0} π_δ/δ = lim_{δ→0} u/(δu − d/λ) = −λu/d.   (11.56)

This directly implies the following proposition.

PROPOSITION 11.17: Let us assume r = 0, δ → 0, p_δ → 0, and p_δ/δ → λ > 0; then, under the risk-neutral probability, the process (log S_t − log S_0) weakly converges to the process

[ut + log(1 + d/λ)N_t(−λu/d)],

where N_t(−λu/d) is a Poisson process with intensity −(λu/d).

Thus, for a zero risk-free rate, the risk-neutral probability is derived by modifying the intensity of the Poisson process and keeping unchanged the infinitesimal drift and the jump sizes. The risk-neutral probability is unique, which corresponds to the complete market hypothesis.
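This risk-neutral correction can be checked numerically. The sketch below (with illustrative parameters u, d < 0, and λ, none taken from the text) verifies that, when the jump multiplies the price by 1 + d/λ and the intensity is set to −λu/d, the price process with r = 0 is a martingale, both analytically and by Monte Carlo.

```python
import numpy as np

# Historical parameters of the jump model (illustrative values); d < 0
u, d, lam = 0.10, -0.40, 2.0
lam_rn = -lam * u / d            # risk-neutral intensity -(lambda u / d)
jump = 1.0 + d / lam             # price multiplied by (1 + d/lambda) at each jump

# Analytic martingale check for r = 0 and S_0 = 1:
# E[S_t] = exp(u t) E[jump^{N_t}] = exp(u t) exp(lam_rn t (jump - 1)) = 1
t = 1.0
analytic = np.exp(u * t) * np.exp(lam_rn * t * (jump - 1.0))

# Monte Carlo confirmation: draw the Poisson jump counts directly
rng = np.random.default_rng(0)
N = rng.poisson(lam_rn * t, size=200_000)
mc_mean = (np.exp(u * t) * jump**N).mean()
```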

11.7 Summary

This chapter introduced basic concepts in derivative asset pricing and continuous time modeling. Continuous time asset price processes provide a convenient framework for pricing the derivatives given the information on past asset prices. In general, price processes in continuous time display either smooth trajectories and are consequently modeled using diffusion processes or else feature jumps and need to be modeled by processes with jumps. The price process underlying the Black-Scholes approach to derivative pricing follows a simple diffusion, called the geometric Brownian motion. The analysis of a geometric Brownian motion can be carried out using a tree representation under the assumption that transitions between the nodes of the tree are instantaneous. This method allowed us to derive the Black-Scholes formula of derivative pricing. It has to be emphasized that the Black-Scholes formula relies on a strong and unrealistic assumption of constant price volatility, implied by the fixed parameters of the geometric Brownian motion. In Chapter 6, we showed extensive


empirical evidence contradicting this assumption. Despite this shortcoming, the Black-Scholes formula is widely used in practice. A basic continuous time model accommodating time-varying price volatility is known as the Hull-White stochastic volatility model. Its discrete time analog was introduced in Chapter 6. However, relaxing the assumption of constant volatility results in an incomplete market and an infinity of admissible derivative prices. The determination of derivative prices requires knowledge of the distribution of the underlying continuous time price process. Therefore, in the next chapter, we present various estimation procedures for diffusion processes.

Appendix 11.1: Black-Scholes Risk-Neutral Probability

We follow an approach by which the continuous time model is approximated by a sequence of binomial trees. The transition between the nodes takes time δ. We consider successively these approximations for the historical and risk-neutral dynamics.

Historical Dynamics

We consider a binomial tree (see Section 11.4.1) with the distance between subsequent nodes defined by time δ. The probabilities of up and down movements are p and 1 − p, respectively. An up movement of price results in the next price value

S_{t+δ} = S_t[1 + δμ + σ√δ √((1 − p)/p)],

whereas a down movement yields the price

S_{t+δ} = S_t[1 + δμ − σ√δ √(p/(1 − p))].

To apply Proposition 11.7, let us consider the drift and volatility per time unit. We get

(1/δ){E[S_{t+δ} | S_t] − S_t} = S_t μ,  (1/δ)V[S_{t+δ} | S_t] = S_t² σ².

When δ tends to 0, the drift and volatility per time unit converge to S_t μ and S_t² σ², respectively. Thus, the limit of the sequence of binomial trees is the diffusion equation

dS_t = μS_t dt + σS_t dW_t,

that is, the geometric Brownian motion, which underlies the Black-Scholes approach.

Risk-Neutral Dynamics

If r is the interest rate per time unit, the interest rate over a short time period of length δ is approximately rδ. Let us now consider the risk-neutral binomial tree associated with the historical binomial tree. The up and down movements are

S_{t+δ} = S_t[1 + δμ + σ√δ √((1 − p)/p)],  with probability π_δ,
S_{t+δ} = S_t[1 + δμ − σ√δ √(p/(1 − p))],  with probability 1 − π_δ,

where π_δ is chosen so that the expected return over the period equals the risk-free return rδ:

π_δ = p − [(μ − r)/σ] √(p(1 − p)) √δ.

Let us compute the drift and volatility per time unit for the risk-neutral probability. We get, for instance,

(1/δ){E^π[S_{t+δ} | S_t] − S_t} = (1/δ)π_δ S_t[δμ + σ√δ √((1 − p)/p)] + (1/δ)(1 − π_δ)S_t[δμ − σ√δ √(p/(1 − p))] = rS_t,

for small δ. Similarly, we get

(1/δ)V^π[S_{t+δ} | S_t] = S_t²σ² + o(1).


We deduce that the risk-neutral continuous time model associated with the historical geometric Brownian motion is the limit of this tree, that is,

dS_t = rS_t dt + σS_t dW_t.
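The construction can be verified numerically. The sketch below uses the standard per-period risk-neutral probability π_δ = p − [(μ − r)/σ]√(p(1 − p))√δ (restated here as an assumption) and checks that the per-time-unit drift and volatility of the corrected tree approach rS and σ²S² as δ shrinks; all parameter values are illustrative.

```python
import numpy as np

# Illustrative parameters for the historical tree and the risk-free rate
mu, r, sigma, p, S = 0.10, 0.05, 0.20, 0.5, 100.0

for delta in [1e-2, 1e-3, 1e-4]:
    up = S * (1 + delta * mu + sigma * np.sqrt(delta) * np.sqrt((1 - p) / p))
    down = S * (1 + delta * mu - sigma * np.sqrt(delta) * np.sqrt(p / (1 - p)))
    # risk-neutral probability matching the expected return r*delta (assumed form)
    pi = p - (mu - r) / sigma * np.sqrt(p * (1 - p)) * np.sqrt(delta)
    mean = pi * up + (1 - pi) * down          # E^pi[S_{t+delta} | S_t]
    var = pi * (1 - pi) * (up - down)**2      # V^pi[S_{t+delta} | S_t]
    drift = (mean - S) / delta                # per time unit -> r S
    vol2 = var / delta                        # per time unit -> sigma^2 S^2
```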

Appendix 11.2: Black-Scholes Price of a Call

Under the risk-neutral probability π, the price process satisfies the stochastic differential equation

d log S_t = (r − σ²/2)dt + σdW_t.

By integrating, we get log S_{t+H} − log S_t = (r − σ²/2)H + σ√H u, where u is a standard normal variable. Then, S_{t+H} = S_t exp[(r − σ²/2)H] exp(σ√H u). The Black-Scholes price of the call is

C(t; H, K) = exp(−rH)E^π[(S_{t+H} − K)⁺ | S_t]
           = exp(−rH)E[(S_t exp[(r − σ²/2)H] exp(σ√H u) − K)⁺ | S_t]
           = E[(S_t exp(−σ²H/2) exp(σ√H u) − K exp(−rH))⁺ | S_t].

The integrand is positive if and only if

S_t exp(−σ²H/2) exp(σ√H u) − K exp(−rH) ≥ 0

⟺ σ√H u ≥ log[K exp(−rH)/S_t] + σ²H/2

⟺ u ≥ −x_t,  where x_t = log[S_t/(K exp(−rH))]/(σ√H) − σ√H/2.

Therefore, we get

C(t; H, K) = E[(S_t exp(−σ²H/2) exp(σ√H u) − K exp(−rH)) 1_{u ≥ −x_t} | S_t]
           = S_t exp(−σ²H/2) ∫_{−x_t}^{+∞} exp(σ√H u)φ(u)du − K exp(−rH) ∫_{−x_t}^{+∞} φ(u)du
           = S_t Φ(x_t + σ√H) − K exp(−rH)Φ(x_t),

where φ and Φ denote the pdf and the cdf of the standard normal distribution.
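As a quick numerical sanity check (not part of the original appendix), one can simulate S_{t+H} under the risk-neutral distribution and compare the Monte Carlo value of exp(−rH)E^π[(S_{t+H} − K)⁺ | S_t] with the closed form above; the parameter values are arbitrary.

```python
import numpy as np
from math import log, sqrt, exp, erf

def bs_call_price(S, K, r, sigma, H):
    # C = S Phi(x + sigma sqrt(H)) - K exp(-rH) Phi(x),
    # with x = log[S/(K exp(-rH))]/(sigma sqrt(H)) - sigma sqrt(H)/2
    Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
    x = log(S / (K * exp(-r * H))) / (sigma * sqrt(H)) - 0.5 * sigma * sqrt(H)
    return S * Phi(x + sigma * sqrt(H)) - K * exp(-r * H) * Phi(x)

# Monte Carlo version of exp(-rH) E^pi[(S_{t+H} - K)^+ | S_t]
S, K, r, sigma, H = 100.0, 105.0, 0.03, 0.25, 1.0
rng = np.random.default_rng(42)
u_draws = rng.standard_normal(400_000)
S_H = S * np.exp((r - 0.5 * sigma**2) * H + sigma * np.sqrt(H) * u_draws)
mc = exp(-r * H) * np.maximum(S_H - K, 0.0).mean()
closed = bs_call_price(S, K, r, sigma, H)
```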

ŝ_T² = (1/T) Σ_{t=1}^T (Δ log p_t − m̂_T)².   (12.4)

The ML estimators of the drift and volatility parameters are derived from their relations to the mean and variance parameters. They are given by

μ̂_T = m̂_T + σ̂_T²/2,  σ̂_T² = ŝ_T².   (12.5)

It is known that the asymptotic variances of m̂_T and ŝ_T² are such that

V_asy(m̂_T) = σ²/T,  V_asy(ŝ_T²) = 2σ⁴/T,  Cov_asy(m̂_T, ŝ_T²) = 0.

We infer the asymptotic variance-covariance matrix of the pair (μ̂_T, σ̂_T²):

V_asy(μ̂_T) = σ²/T + σ⁴/(2T),  V_asy(σ̂_T²) = 2σ⁴/T,  Cov_asy(μ̂_T, σ̂_T²) = σ⁴/T.

We observe that the estimators of μ and σ² are correlated.

Effect of the Sampling Frequency

The previously given properties were derived by assuming an interval of one unit of time between consecutive observations. Let us examine what happens when data are sampled at a shorter interval h instead. We get

Δ_h log p_t = (μ − σ²/2)h + σ√h ε_t^{(h)},   (12.6)

where (ε_t^{(h)}) is a standard Gaussian white noise. As before, we compute the estimators m̂_{h,T} and ŝ_{h,T}² of the mean and variance of Δ_h log p_t based on T observations sampled at interval h. The estimators of μ and σ² are now given by

μ̂_{h,T} = m̂_{h,T}/h + σ̂_{h,T}²/2,  σ̂_{h,T}² = ŝ_{h,T}²/h.

Their asymptotic variances are modified accordingly:

V_asy(σ̂_{h,T}²) = (1/h²)V_asy(ŝ_{h,T}²) = (1/h²)(2h²σ⁴/T) = 2σ⁴/T,

V_asy(μ̂_{h,T}) = (1/h²)(σ²h/T) + (1/4)(2σ⁴/T) = σ²/(hT) + σ⁴/(2T).

The variance of the volatility parameter depends only on the number T of observations and does not depend on the sampling frequency h. On the contrary, the variance of the drift parameter depends on both h and T. In the limiting case, when T increases and h decreases so that hT → 1, we have

lim_{T→∞, h→0} V_asy(μ̂_{h,T}) = σ².

Therefore, the drift parameter cannot be consistently estimated even from an infinite number of observations. This result is easy to explain. The drift parameter represents the trend effect and can be recovered only if the observations span a long period. However, the condition h T = 1 can be satisfied even when the observations are separated by h and recorded over the interval [0, 1]. In that case, we observe the trend effect only within a bounded interval, which does not convey sufficient information.
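The unit-interval estimators (12.4)–(12.5) are easy to reproduce on simulated data. The sketch below (arbitrary parameter values, not from the text) simulates unit-interval log returns of a geometric Brownian motion and checks that μ and σ² are recovered:

```python
import numpy as np

mu, sigma, T = 0.10, 0.20, 100_000
rng = np.random.default_rng(1)
# unit-interval log returns: Delta log p_t = (mu - sigma^2/2) + sigma eps_t
dlogp = (mu - 0.5 * sigma**2) + sigma * rng.standard_normal(T)

m_hat = dlogp.mean()                      # sample mean of the log returns
s2_hat = ((dlogp - m_hat)**2).mean()      # (12.4)
sigma2_hat = s2_hat                       # (12.5)
mu_hat = m_hat + 0.5 * sigma2_hat         # (12.5)
```

Note that this confirms the volatility result only; by the discussion above, a precise drift estimate relies on the span hT being large, not on the number of observations alone.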

12.1.2 Ornstein-Uhlenbeck Process

Exact Discretization

The dynamics of the process is described by the stochastic differential equation

dy_t = (φ − λy_t)dt + σdW_t   (12.7)
     = λ(μ − y_t)dt + σdW_t,   (12.8)

where (W_t) is a standard Brownian motion. This equation has a simple discrete time counterpart (see Section 11.2.2):

y_t = μ[1 − exp(−λ)] + exp(−λ)y_{t−1} + σ[(1 − exp(−2λ))/(2λ)]^{1/2} ε_t,   (12.9)

where (ε_t) is a standardized Gaussian white noise. This equation corresponds to a Gaussian AR(1) (autoregressive process of order 1) representation for the process (y_t, t ∈ Z) (see also Chapter 2).

Estimation of Diffusion Models


Maximum Likelihood Estimators

We can easily apply the ML method to the autoregressive model in (12.9). Let us first reparametrize the autoregressive representation as

y_t = μ(1 − ρ) + ρy_{t−1} + ηε_t,   (12.10)

where ρ = exp(−λ). The ML estimators of the parameters μ, ρ, and η are asymptotically independent and equivalent to

μ̂_T = (1/T) Σ_{t=1}^T y_t = ȳ_T,  ρ̂_T = Σ_t (y_t − ȳ_T)(y_{t−1} − ȳ_T) / Σ_t (y_{t−1} − ȳ_T)²,  η̂_T² = (1/T) Σ_t ε̂_t²,   (12.11)

where the residuals are defined by ε̂_t = y_t − ȳ_T − ρ̂_T(y_{t−1} − ȳ_T). Their asymptotic variances are given by

V_asy(μ̂_T) = η²/[T(1 − ρ)²],  V_asy(ρ̂_T) = (1 − ρ²)/T,  V_asy(η̂_T²) = 2η⁴/T.

From the ML estimators of the parameters μ, ρ, and η, we easily infer the ML estimators of the parameters of interest

λ̂_T = −log ρ̂_T,  σ̂_T² = [−2 log ρ̂_T/(1 − ρ̂_T²)] η̂_T².   (12.12)

Their asymptotic variances are derived by the δ-method. For instance, we get

V_asy(λ̂_T) = [∂(−log ρ)/∂ρ]² V_asy(ρ̂_T) = (1/ρ²)(1 − ρ²)/T = (1/T)[1 − exp(−2λ)]/exp(−2λ).
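The estimation chain (12.10)–(12.12) can be sketched on simulated data as follows (arbitrary parameter values; ρ̂_T is computed as the least squares autoregression coefficient):

```python
import numpy as np

lam, mu, sigma, T = 0.5, 1.0, 0.3, 200_000
rho = np.exp(-lam)
eta = sigma * np.sqrt((1 - np.exp(-2 * lam)) / (2 * lam))

rng = np.random.default_rng(2)
y = np.empty(T + 1)
y[0] = mu                                   # start at the stationary mean
eps = rng.standard_normal(T)
for t in range(T):                          # exact discretization (12.9)-(12.10)
    y[t + 1] = mu * (1 - rho) + rho * y[t] + eta * eps[t]

yb = y.mean()
y0, y1 = y[:-1], y[1:]
rho_hat = np.sum((y1 - yb) * (y0 - yb)) / np.sum((y0 - yb)**2)
res = y1 - yb - rho_hat * (y0 - yb)
eta2_hat = (res**2).mean()

lam_hat = -np.log(rho_hat)                              # (12.12)
sigma2_hat = -2 * np.log(rho_hat) / (1 - rho_hat**2) * eta2_hat
```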

The estimators λ̂_T and σ̂_T² are asymptotically correlated since they both depend on ρ̂_T. This is a consequence of time aggregation.

12.1.3 Cox-Ingersoll-Ross Process

The CIR process satisfies the stochastic differential equation

dy_t = (a − by_t)dt + c√(y_t) dW_t.   (12.13)


In Section 11.2.3, we derived the conditional distribution of y_t given y_{t−1} by exploiting its Laplace transform. Recall that it is a noncentral chi-square distribution, up to a scale factor. Expressions (11.21) and (11.22) of the conditional probability density function (pdf) of y_t can be used to build the log-likelihood function, which requires an adequate truncation of the Bessel function. The resulting ML estimators have no explicit expressions, and the optimization of the log-likelihood function has to be performed numerically.

12.2 Method of Moments and Infinitesimal Generator

When the likelihood function cannot be computed explicitly, we can use less efficient estimators obtained by optimizing an alternative criterion that admits an analytical form. A natural candidate for an optimization criterion arises from the moment conditions associated with a stochastic differential equation. In this section, we review various methods proposed in the literature in the context of unidimensional diffusion processes. Note that this framework implies a complete market and a unique pricing formula for the derivatives, which is not compatible with stochastic volatility, for example.

12.2.1 Moment Conditions

The moment conditions studied by Hansen and Scheinkman (1995) are based on the following proposition:

PROPOSITION 12.1: Let us consider a unidimensional process (y_t) that satisfies the stochastic differential equation

dy_t = μ(y_t)dt + σ(y_t)dW_t,

and let us introduce the infinitesimal generator A associated with this equation. It transforms a function φ of y into

Aφ(y) = lim_{h→0} (1/h) E[(φ(y_{t+h}) − φ(y_t)) | y_t = y]
      = [dφ(y)/dy] μ(y) + (1/2)[d²φ(y)/dy²] σ²(y).

Then, for a large set of functions φ, ψ, the following moment conditions are satisfied:

(i) E[Aφ(y_t)] = 0, ∀φ.
(ii) E[Aφ(y_{t+1})ψ(y_t) − φ(y_{t+1})Aψ(y_t)] = 0, ∀φ, ψ.


By definition, the infinitesimal generator represents the infinitesimal drift of a transformed process. The differential expression of the generator follows from Ito's formula applied to φ(y_t). The moment conditions given in Proposition 12.1 concern the marginal moments of nonlinear functions of y_t (condition i) and cross moments of nonlinear functions of y_t and y_{t+1} (condition ii). This set of moment conditions seems to be quite large since the functions φ, ψ are not constrained (except for the second-order differentiability restriction). As an illustration, we detail the moment conditions for the exponential functions of y: φ(y) = exp(−ay), ψ(y) = exp(−by). For condition (i), we get

E{exp(−ay_t)[μ(y_t) − (a/2)σ²(y_t)]} = 0,  ∀a;

for condition (ii),

E{a exp(−ay_{t+1} − by_t)[μ(y_{t+1}) − (a/2)σ²(y_{t+1})] − b exp(−by_{t+1} − ay_t)[μ(y_{t+1}) − (b/2)σ²(y_{t+1})]} = 0,  ∀a, b.
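Condition (i) is easy to check by simulation. For an Ornstein-Uhlenbeck process with drift λ(m − y) and constant volatility s, the generator applied to φ(y) = y² is Aφ(y) = 2yλ(m − y) + s², whose stationary expectation vanishes. The sketch below verifies this both analytically and on a simulated stationary path (parameter values are arbitrary):

```python
import numpy as np

# Ornstein-Uhlenbeck: mu(y) = lam (m - y), sigma(y) = s, phi(y) = y^2,
# so A phi(y) = 2 y lam (m - y) + s^2.
lam, m, s, T = 0.5, 1.0, 0.3, 200_000
rho = np.exp(-lam)
eta = s * np.sqrt((1 - np.exp(-2 * lam)) / (2 * lam))

# Analytic check: E[y] = m, E[y^2] = m^2 + s^2/(2 lam), so E[A phi(y)] = 0
analytic = 2 * lam * m * m - 2 * lam * (m * m + s * s / (2 * lam)) + s * s

# Simulation check via the exact discretization of the process
rng = np.random.default_rng(3)
y = np.empty(T)
y[0] = m
eps = rng.standard_normal(T)
for t in range(1, T):
    y[t] = m * (1 - rho) + rho * y[t - 1] + eta * eps[t]

A_phi = 2.0 * y * lam * (m - y) + s**2       # A phi evaluated on the sample
moment = A_phi.mean()                        # should be close to 0
```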

12.2.2 Identification

A method of moments provides estimators that may not be fully efficient. Therefore, it is important to check whether the above set of moment conditions is sufficiently informative about the drift and volatility functions μ and σ. In fact, some identification problems may arise. For example, let us consider the drift and volatility parameterized as follows:

μ(y; θ) = θ₀ μ*(y; θ₁),  σ²(y; θ) = θ₀ σ*²(y; θ₁),  with θ = (θ₀, θ₁′)′.

The conditions (i) and (ii) can be simplified with respect to θ₀. Thus, this parameter cannot be identified from the previously given moment conditions. By considering all admissible functions, it may be shown that the marginal distribution of y_t is identifiable up to a scale factor. Yet, the set of conditions (ii) does not allow us to identify the joint distribution of (y_t, y_{t+1}) up to a scale factor. To illustrate the identification problems, let us consider the following diffusion model:


dy_t = (α + βy_t)dt + σy_t^γ dW_t,

which has been proposed by Chan et al. (1992) for the short-term interest rate. This specification encompasses a number of well-known continuous time models (see Broze, Scaillet, and Zakoian 1995). Its dynamics depends on four parameters: α, β, σ, and γ. The condition (i) for the exponential functions is

E[exp(−ay_t)(α + βy_t − (a/2)σ²y_t^{2γ})] = 0,  ∀a.

The parameters that are identifiable from conditions (i) are only γ, α/σ², and β/σ².

12.2.3 Method of Moments

The method of moments is a widely used econometric procedure (see Hansen 1982; or Chapter 8). We present its implementation for restrictions of type (i). Let us select a priori n functions φ_i, i = 1, ..., n. The moment conditions

E[Aφ_i(y_t)] = E{[dφ_i(y_t)/dy] μ(y_t; θ) + (1/2)[d²φ_i(y_t)/dy²] σ²(y_t; θ)} = 0,  i = 1, ..., n,   (12.14)

are satisfied for the true value of the parameter. We assume that some identifying restrictions are imposed to eliminate the effect of the scale factor (for instance, θ₀ = 1; see 12.2.2). A moment estimator is a parameter value for which conditions (12.14) match their empirical counterparts. Let us denote

g_{i,T}(θ) = (1/T) Σ_{t=1}^T {[dφ_i(y_t)/dy] μ(y_t; θ) + (1/2)[d²φ_i(y_t)/dy²] σ²(y_t; θ)},  i = 1, ..., n,

and g_T(θ) = [g_{1,T}(θ), ..., g_{n,T}(θ)]′. The moment estimator is derived as a solution of the optimization

θ̂_T = argmin_θ g_T(θ)′ Ω g_T(θ),   (12.15)


where Ω is a positive definite matrix of weights. When this matrix is an identity matrix, the optimization yields

θ̂_T = argmin_θ Σ_{i=1}^n [g_{i,T}(θ)]².   (12.16)

EXAMPLE 12.1: Let us consider the model of Chan et al. with the identifying constraint α = 1. The optimization criterion is

Σ_{i=1}^n {(1/T) Σ_{t=1}^T exp(−a_i y_t)[1 + βy_t − (a_i/2)σ²y_t^{2γ}]}²,

where a_1, ..., a_n are given real numbers. This is a quadratic objective with respect to the parameters β and σ², and hence it may first be concentrated (i.e., optimized) with respect to these parameters. The concentrated objective function depends only on the parameter γ, whose value may be numerically found by a grid search method.

12.2.4 Spectral Decomposition of the Infinitesimal Generator

As shown in Section 12.2.2, the previous method does not allow identification of all the parameters of interest. To circumvent this drawback, it has been proposed to analyze the properties of the infinitesimal generator by considering its spectral decomposition (see Demoura 1993; Hansen, Scheinkman, and Touzi 1998; Darolles, Florens, and Gourieroux 1998; Chen, Hansen, and Scheinkman 1999). Moreover, this approach provides nonparametric estimators of the drift and volatility functions.

Under weak regularity conditions, the infinitesimal operator admits a spectral decomposition, that is, a sequence of eigenvalues and eigenfunctions (λ_j, φ_j), j ≥ 1 (say), such that

1. Aφ_j(y) = λ_j φ_j(y), j ≥ 1.
2. λ_j, j ≥ 1, are nonpositive real numbers.

Let us now rank the eigenvalues and denote by λ₁ the largest eigenvalue and by λ₂ the second largest one. The conditions defining the pairs (λ_j, φ_j), j = 1, 2, are

{ Aφ₁(y) = λ₁φ₁(y),
  Aφ₂(y) = λ₂φ₂(y),   (12.17)

or, equivalently,

{ [dφ₁(y)/dy] μ(y) + (1/2)[d²φ₁(y)/dy²] σ²(y) = λ₁φ₁(y),
  [dφ₂(y)/dy] μ(y) + (1/2)[d²φ₂(y)/dy²] σ²(y) = λ₂φ₂(y).   (12.18)

This is a bivariate system with respect to the drift and volatility. Therefore, it is equivalent to estimate the functions μ and σ, or to estimate the two first pairs (λ₁, φ₁) and (λ₂, φ₂) and next solve the system in (12.18). In fact, it is possible to estimate (λ₁, φ₁) and (λ₂, φ₂) nonparametrically from discrete time observations. The approach is based on the prediction interpretation of the infinitesimal generator

Aφ(y) = lim_{h→0} (1/h) E[(φ(y_{t+h}) − φ(y_t)) | y_t = y],   (12.19)

which represents the infinitesimal drift of the transformed process. Next, we note that the spectral decomposition of A is related to the spectral decomposition of the conditional expectation operator T, which associates with the function φ the function

Tφ(y) = E[φ(y_{t+1}) | y_t = y].   (12.20)

The eigenfunctions of the conditional expectation operator coincide with the eigenfunctions of the infinitesimal generator, whereas its eigenvalues are given by λ̄_j = exp λ_j.

REMARK 12.1: The system in (12.18) gives a possibility of identifying nonparametrically the drift and volatility functions. Alternatively, Ait-Sahalia (1996) has proposed the use of the marginal density function for identifying nonparametrically the volatility function under the assumption of a restrictive affine form of the drift function.

12.2.5 Nonlinear Canonical Decomposition

The spectral decomposition of the expectation operator is related to the nonlinear canonical decomposition of the joint pdf of (y_t, y_{t−1}). Let us denote by f(y_t, y_{t−1}) and f(y_t) the joint and marginal pdf, respectively. If

∫∫ [f(y_t, y_{t−1})/(f(y_t)f(y_{t−1}))]² f(y_t)f(y_{t−1}) dy_t dy_{t−1} < +∞,

then the joint pdf can be decomposed as

f(y_t, y_{t−1}) = f(y_t)f(y_{t−1})[1 + Σ_{j=1}^∞ λ̄_j φ_j(y_t)ψ_j(y_{t−1})],   (12.21)

where the canonical correlations λ̄_j, j varying, are nonnegative, and the canonical directions φ_j and ψ_j, j varying, satisfy the constraints

E[φ_j(y_t)] = E[ψ_j(y_t)] = 0,  ∀j,
V[φ_j(y_t)] = V[ψ_j(y_t)] = 1,  ∀j,
cov[φ_j(y_t), φ_k(y_t)] = 0,  ∀j ≠ k,
cov[ψ_j(y_t), ψ_k(y_t)] = 0,  ∀j ≠ k.

In particular, we deduce

E[φ_j(y_t) | y_{t−1}] = λ̄_j ψ_j(y_{t−1}),  ∀j.

If (y_t) is a Markov process of order 1, we get

E[φ(y_t) | y_{t−1}] = E[φ(y_t)] + Σ_{k=1}^∞ λ̄_k cov[φ(y_t), φ_k(y_t)] ψ_k(y_{t−1}).   (12.22)

The unidimensional diffusion processes are reversible, that is, their distributional properties are identical in the ordinary and reverse times (Revuz and Yor 1990). This implies that the current and lagged canonical directions are identical: φ_j = ψ_j, ∀j (up to a change of sign). Thus, the nonlinear canonical decomposition becomes

f(y_t, y_{t−1}) = f(y_t)f(y_{t−1})[1 + Σ_{j=1}^∞ λ̄_j φ_j(y_t)φ_j(y_{t−1})].   (12.23)

Moreover, the conditional expectation operator is such that

Tφ_j = λ̄_j φ_j,  ∀j.   (12.24)

This suggests that the spectral elements of the conditional expectation operator can be derived from the nonlinear canonical decomposition of the bivariate distribution.

12.2.6 Estimation of the Spectral Decomposition

Two types of nonparametric estimation methods have been proposed in the literature. The sieve method approximates the conditional expectation operator on a finite-dimensional subspace and requires the spectral decomposition of this approximated operator. The second approach relies on a nonlinear canonical decomposition of a kernel-based estimator of the joint pdf f(y_t, y_{t−1}).

Sieve Method

The first step approximation consists of projecting the functions on a finite-dimensional space (Chen, Hansen, and Scheinkman 1999). Let us consider the finite-dimensional space of stepwise functions

Method of Moments and Infinitesimal Generator


φ(y) = Σ_{k=1}^K b_k 1_{[a_k, a_{k+1})}(y),    (12.25)

where [a_k, a_{k+1}), k = 1, ..., K, is a given partition of the real line. We can approximate the conditional expectation operator by projecting these functions on the space generated by 1_{[a_k, a_{k+1})}(y_{t-1}), k = 1, ..., K. Its empirical counterpart is easily obtained by estimating by ordinary least squares (OLS) the seemingly unrelated regressions (SUR) model:

Z(t) = B Z(t-1) + u_t,

(12.26)

where Z(t) = [1_{[a_1,a_2)}(y_t), ..., 1_{[a_K,a_{K+1})}(y_t)]'. Then, we perform the spectral decomposition of the estimated matrix of coefficients B̂ to find the first eigenvalues and the associated eigenvectors. If ĉ_1, ..., ĉ_K are the estimated eigenvectors, the corresponding eigenfunctions are approximated by φ̂_k(y_t) = ĉ_k' Z(t) (see Chen, Hansen, and Scheinkman 1999 for the choice of the partition and the asymptotic properties of the estimator).

Kernel-Based Estimation

This approach was introduced by Darolles, Florens, and Gourieroux (1998). These authors proposed the application of the nonlinear canonical analysis to a kernel-based estimator of the bivariate pdf. The estimated pdf is

f̂(y, y') = (1/(T h²)) Σ_{t=1}^T K((y − y_t)/h) K((y' − y_{t-1})/h),

(12.27)

where K is a univariate kernel, and h is the bandwidth. We refer to Darolles et al. 1998 for the optimal choice of the bandwidth h and the asymptotic properties of the estimated canonical correlations and canonical directions.
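The two estimation approaches can be sketched in a few lines. The function names and implementation details below are illustrative rather than taken from the text: the first function estimates the SUR model (12.26) by OLS on indicator variables and diagonalizes the estimated coefficient matrix; the second implements the kernel estimator (12.27) with a Gaussian kernel.

```python
import numpy as np

def sieve_spectral(y, bins):
    """Sieve estimation: OLS of the SUR model Z(t) = B Z(t-1) + u_t, where Z(t)
    stacks the indicators 1_{[a_k, a_{k+1})}(y_t), then spectral decomposition
    of the estimated coefficient matrix (eq. 12.26)."""
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([(y >= bins[k]) & (y < bins[k + 1])
                         for k in range(len(bins) - 1)]).astype(float)
    Zt, Zlag = Z[1:], Z[:-1]
    # equation-by-equation OLS; with indicator regressors, B is the empirical
    # transition matrix between the cells of the partition
    B = np.linalg.lstsq(Zlag, Zt, rcond=None)[0].T
    eigval, eigvec = np.linalg.eig(B)
    order = np.argsort(-np.abs(eigval))
    return eigval[order], eigvec[:, order]

def kernel_joint_pdf(y, h):
    """Kernel-based estimator (12.27) of the joint pdf f(y_t, y_{t-1});
    returns a function (u, v) -> estimated density at (u, v)."""
    yt, ylag = np.asarray(y[1:]), np.asarray(y[:-1])
    K = lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

    def f_hat(u, v):
        return np.mean(K((u - yt) / h) * K((v - ylag) / h)) / h ** 2
    return f_hat
```

With indicator regressors, B̂ is the empirical transition matrix between the cells of the partition, so its largest eigenvalue is 1, associated with the constant eigenfunction; the canonical correlations and directions can then be extracted from the decomposition of f̂.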

12.2.7 Applications

A Monte Carlo Study

We first provide an illustration of the previous approach based on simulated realizations of an Ornstein-Uhlenbeck process, which satisfies the stochastic differential equation [see (12.8)]

dy_t = λ(μ − y_t) dt + σ dW_t.

This equation admits a solution that is a Gaussian process. It can be shown that the canonical analysis of the covariance operator associated with the discrete time process (y_t, t ∈ Z) leads to the canonical correlations


λ_i = exp(−iλ),  i = 1, 2, ....    (12.28)

The corresponding canonical variates are φ_i(y) = H_i[(y − μ)/σ_y], i = 1, 2, ..., where σ_y² is the variance of y_t, and H_i are the normalized Hermite polynomials:

H_i(x) = ((−1)^i/√(i!)) (d^i φ(x)/dx^i) (1/φ(x)).    (12.29)

In particular, the first Hermite polynomials are

H_0(x) = 1,  H_1(x) = x,  H_2(x) = (x² − 1)/√2,  H_3(x) = (x³ − 3x)/√6.    (12.30)

As an example, we simulate a path (y_t, t = 1, 2, ..., T) of length T = 250 of the process with parameter values μ = 0, λ = 0.8, and σ = 0.5. It is plotted in Figure 12.2. We apply the kernel-based canonical analysis to the artificially generated data.
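A path of the Ornstein-Uhlenbeck process can be simulated without discretization bias using its exact conditional Gaussian transition; the sketch below (function name and defaults are ours) uses the parameter values of the experiment:

```python
import numpy as np

def simulate_ou(T=250, mu=0.0, lam=0.8, sigma=0.5, delta=1.0, seed=0):
    """Exact simulation of dY_t = lam*(mu - Y_t) dt + sigma dW_t at step delta."""
    rng = np.random.default_rng(seed)
    a = np.exp(-lam * delta)                        # autoregressive coefficient
    s = sigma * np.sqrt((1 - a ** 2) / (2 * lam))   # conditional std over one step
    y = np.empty(T)
    y[0] = rng.normal(mu, sigma / np.sqrt(2 * lam)) # draw from the stationary law
    for t in range(1, T):
        y[t] = mu + a * (y[t - 1] - mu) + s * rng.normal()
    return y
```

The first-order sample autocorrelation of the simulated path should then be close to the first canonical correlation λ_1 = exp(−λ) ≈ 0.449.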

Ψ(σ; k; H; r) = Φ(x) − k exp(−rH) Φ(x − σ√H),    (13.5)

where

x = − log[k exp(−rH)]/(σ√H) + (1/2) σ√H,    (13.6)

and Φ denotes the standard normal cdf.

Thus, the volatility parameter σ appears both in the dynamic specification of the underlying asset price and in the relations between the derivative prices, whereas the drift parameter μ appears in the dynamic equation only.
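A minimal implementation of the pricing function Ψ in (13.5)-(13.6) — the standard Black-Scholes call formula expressed in the moneyness-strike k = K/S — can be sketched as follows (function names are ours):

```python
import math

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_psi(sigma, k, H, r):
    """Normalized Black-Scholes call price Psi(sigma; k; H; r), eqs. (13.5)-(13.6),
    with moneyness-strike k = K/S."""
    x = -math.log(k * math.exp(-r * H)) / (sigma * math.sqrt(H)) \
        + 0.5 * sigma * math.sqrt(H)
    return Phi(x) - k * math.exp(-r * H) * Phi(x - sigma * math.sqrt(H))

def bs_call(S, sigma, K, H, r):
    """Call price C = S * Psi(sigma; K/S; H; r)."""
    return S * bs_psi(sigma, K / S, H, r)
```

The normalized price Ψ lies between 0 and 1 and is increasing in the volatility, a property used repeatedly below.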

Analysis Based on the Black-Scholes Model

13.1.1 Inference from the Price of the Underlying Asset

Let us suppose that the assumptions of the Black-Scholes model are all satisfied. Given a sample of observations S_1, ..., S_T on the price of the underlying asset, we can estimate the drift and volatility parameters; next, by applying the Black-Scholes formulas in (13.2) and (13.3), we can approximate the derivative prices.

Estimation of the Drift and Volatility Parameters

By Ito's formula, we know that

d log S_t = (μ − σ²/2) dt + σ dW_t,

which implies

Δ log S_t = log S_t − log S_{t-1} = (μ − σ²/2) + σ ε_t,    (13.7)

where (ε_t) is a standard Gaussian white noise. We find the expression of the maximum likelihood (ML) estimators of the mean v = μ − σ²/2 and variance σ² of the log-differenced prices (see Section 12.1.1). They are equal to

v̂_T = (1/T) Σ_{t=1}^T Δ log S_t = (1/T)(log S_T − log S_0),    (13.8)

σ̂²_T = (1/T) Σ_{t=1}^T (Δ log S_t − v̂_T)².    (13.9)

These two estimators are independent and asymptotically normal, with asymptotic variances V_as(v̂_T) = σ²/T and V_as(σ̂²_T) = 2σ⁴/T.

Determination of a Current Option Price

Our objective is to evaluate the option price at the present time T. The current option price is defined by the formula

C(T; H; k) = S_T Ψ(σ; k; H; r_T),

(13.10)

where the risk-free rate is usually replaced by a 1-month interest rate evaluated at T or the interest rate at horizon H, if available. Since S_T and r_T are both observable, the derivative price can be approximated by

Ĉ(T; H; k) = S_T Ψ(σ̂_T; k; H; r_T).    (13.11)
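The estimators (13.8)-(13.9) are straightforward to compute from a price sample; a sketch (function name is ours):

```python
import numpy as np

def estimate_drift_vol(S):
    """ML estimators (13.8)-(13.9) from a price sample S_0, ..., S_T.

    Returns (v_hat, sig2_hat): estimates of v = mu - sigma^2/2 and sigma^2,
    per unit of time between two observations."""
    dlog = np.diff(np.log(np.asarray(S, dtype=float)))
    T = len(dlog)
    v_hat = dlog.mean()                       # (1/T) * sum of log-returns
    sig2_hat = ((dlog - v_hat) ** 2).mean()   # ML variance (no df correction)
    # asymptotic variances: V(v_hat) ~ sigma^2/T, V(sig2_hat) ~ 2*sigma^4/T
    return v_hat, sig2_hat
```

Substituting σ̂_T = √(sig2_hat) into the Black-Scholes formula then yields the plug-in price (13.11).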

Econometrics of Derivatives


It is a counterpart of the Black-Scholes option price (13.10), with the unknown volatility replaced by its maximum likelihood estimator. By applying the δ-method and using the formula of the asymptotic variance of σ̂²_T, we get a 95% prediction interval for the option price:

[ Ĉ(T; H; k) ± 2 S_T (∂Ψ/∂σ)(σ̂_T; k; H; r_T) σ̂_T/√(2T) ].    (13.12)

The length of the prediction interval depends on the sensitivity ∂Ψ/∂σ of the option price with respect to the volatility.
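A numerical sketch of the interval (13.12), combining the Black-Scholes vega with the asymptotic standard error of σ̂_T implied by V_as(σ̂²_T) = 2σ⁴/T (function name is ours):

```python
import math

def bs_prediction_interval(S, sigma_hat, K, H, r, T):
    """95% prediction interval (13.12) for the Black-Scholes call price when the
    volatility is replaced by its ML estimator based on T returns (delta method)."""
    phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    k = K / S
    x = -math.log(k * math.exp(-r * H)) / (sigma_hat * math.sqrt(H)) \
        + 0.5 * sigma_hat * math.sqrt(H)
    price = S * (Phi(x) - k * math.exp(-r * H) * Phi(x - sigma_hat * math.sqrt(H)))
    vega = S * phi(x) * math.sqrt(H)           # S * dPsi/dsigma
    se_sigma = sigma_hat / math.sqrt(2.0 * T)  # from V(sig2_hat) ~ 2*sigma^4/T
    half = 2.0 * vega * se_sigma
    return price - half, price + half
```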

Prediction of a Future Call Price

Our objective is now to predict at T the option price corresponding to a future date t. Let us consider a European call with maturity T_0 > T and strike K. At date t > T, prior to maturity, its Black-Scholes price is

C(t; T_0 − t; K) = S_t Ψ[σ; K/S_t; T_0 − t; r].    (13.13)

At the prediction origin T, the risk-free rate, assumed to be constant in the Black-Scholes model, can be replaced by the current value r_T. However, the future price of the underlying asset also has to be predicted given all available information. We know that

log S_t = log S_T + μ(t − T) + σ √(t − T) ε,    (13.14)

where ε is a standard normal variable independent of log S_T, and t ≥ T. Therefore, conditional on the current price S_T, the future price S_t has a log-normal distribution with mean log S_T + μ(t − T) and variance σ²(t − T). Conditional on S_T, the option price admits a distribution that is derived from the distribution of S_t by the Black-Scholes transformation (13.13). The explicit form of the conditional distribution of the option price is easily obtained when t is equal to the option maturity T_0. Indeed, we have C(T_0; 0; K) = (S_{T_0} − K)^+, and the conditional distribution of the option price is a truncated log-normal distribution. It admits a point mass at 0 with probability

P[S_{T_0} − K < 0 | S_T] = P[log S_{T_0} − log K < 0 | S_T]
= P[log S_T − log K + μ(T_0 − T) + σ√(T_0 − T) ε < 0]
= Φ[ (−log S_T + log K − μ(T_0 − T)) / (σ√(T_0 − T)) ],

and a continuous part for strictly positive values. Depending on the location of the mode of the log-normal distribution with respect to the strike, we get one of the two patterns displayed in Figure 13.1.

Figure 13.1 Conditional Distributions of (S_{T_0} − K)^+

When t is strictly less than the maturity T_0, the analytical expression of the conditional distribution is difficult to derive. In contrast, this distribution is easily obtained by simulations. Let us consider S independent drawings ε^s, s = 1, ..., S, from the standard normal distribution. Using formula (13.14), we generate simulated values of the future price of the underlying asset:

log S_t^s = log S_T + μ̂_T (t − T) + σ̂_T √(t − T) ε^s,  s = 1, ..., S,    (13.15)

after replacing the unknown parameters by their estimators. Next, using formula (13.13), the simulated future option prices are

C^s(t; T_0 − t; K) = S_t^s Ψ(σ̂_T; K/S_t^s; T_0 − t; r_T),  s = 1, ..., S.    (13.16)

The conditional distribution of the future option price is well approximated by the empirical distribution of the simulated values C^s(t; T_0 − t; K). It is important to note that, to determine this conditional distribution, we need to estimate the drift parameter. Indeed, the parameter μ is necessary for predicting the future option price even though it does not appear explicitly in the Black-Scholes formula. Some conditional distributions of future option prices are displayed in Figure 13.2.
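The simulation scheme (13.15)-(13.16) can be sketched as follows (function name and defaults are ours):

```python
import math
import numpy as np

def simulate_future_call_prices(S_T, mu_hat, sig_hat, r_T, T, t, T0, K,
                                n_sims=10000, seed=0):
    """Simulated conditional distribution of the future call price C(t; T0-t; K):
    draw S_t^s given S_T by (13.15), then apply Black-Scholes as in (13.16)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_sims)
    # eq. (13.15): simulated future underlying prices
    S_t = S_T * np.exp(mu_hat * (t - T) + sig_hat * math.sqrt(t - T) * eps)
    # eq. (13.16): Black-Scholes price at each simulated S_t^s
    H = T0 - t
    Phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
    k = K / S_t
    x = -np.log(k * math.exp(-r_T * H)) / (sig_hat * math.sqrt(H)) \
        + 0.5 * sig_hat * math.sqrt(H)
    return S_t * (Phi(x) - k * math.exp(-r_T * H) * Phi(x - sig_hat * math.sqrt(H)))
```

The empirical distribution of the returned vector approximates the conditional distribution of the future option price.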

13.1.2 The Incompatibility between the Black-Scholes Model and Statistical Inference

The previous section leaves us with an impression that, from a statistical point of view, the Black-Scholes model is a simple and convenient tool

Figure 13.2 Conditional Distributions of Future Option Prices

for analysis. Recall that, from a sample of asset prices, we estimated the parameters μ and σ of the geometric Brownian motion and substituted σ̂ into the option pricing formula. Statistical inference breaks down, however, when option prices are observed as well. Let us further assume that at date t we have a complete sample of observations on the price S_t of the asset, the risk-free rate r_t, and two option prices with characteristics H_1, k_1 and H_2, k_2, respectively. From the Black-Scholes formula,

C(t; H_1, k_1) = S_t Ψ(σ; k_1, H_1, r_t),
C(t; H_2, k_2) = S_t Ψ(σ; k_2, H_2, r_t).

The true value of the volatility should arise as the solution of this bivariate nonlinear system. Yet, in practice, the observed prices S_t, r_t, C(t; H_1, k_1), and C(t; H_2, k_2) are such that the system has no solution; as a consequence, the Black-Scholes model is immediately rejected by the data with probability 1. The incompatibility stems from the unrealistic assumption of the Black-Scholes model of unique derivative prices, which implies a deterministic relationship between the asset price and the option price. In reality, such deterministic relationships do not exist. Statistical inference makes sense only in the presence of a source of random variation or, more precisely, of an error term, transforming the deterministic relationship into a stochastic one. Therefore, the Black-Scholes formula can be viewed instead as an approximate pricing formula:


C(t; H_1, k_1) = S_t Ψ(σ; k_1, H_1, r_t) + η_{1t},
C(t; H_2, k_2) = S_t Ψ(σ; k_2, H_2, r_t) + η_{2t},

where η_{1t} and η_{2t} are error terms. One has to realize, however, that the presence of the additional noises (η_{1t}) and (η_{2t}), for instance, results in an incomplete market framework (see Chapter 11), for which derivative prices are no longer unique.

13.1.3 Implied Volatilities

Definition

Nevertheless, the Black-Scholes formula remains a valuable tool for comparing option prices. Especially, it may be used for a preliminary correction of the derivative price for the effects of residual maturity, moneyness-strike, and current asset price. This first-step correction leads to the so-called implied volatility. Thus, in spite of its name, an implied volatility is essentially a normalized option price. Let us consider an observed option price C(t;H;k) (say) of a call with residual maturity H and moneyness-strike k. Since the function Ψ in the Black-Scholes formula is a one-to-one function of volatility, there exists a unique volatility value such that

C(t; H, k) = S_t Ψ[σ_BS(t; H; k); k; H, r_t],

(13.17)

where σ_BS(t; H; k) is the implied volatility associated with the derivative with an observed price. It can essentially be viewed as a corrected option price, for which the correction pertains to the aforementioned effects. This explains why, on derivative markets, options are often quoted in terms of their implied volatilities to facilitate price comparison. If the assumptions of the Black-Scholes model were all satisfied, all implied volatilities would be equal and would coincide with the constant historical volatility σ. In fact, in the Black-Scholes world, all contingent claims are generated either by the risky and risk-free assets or by an option and the risk-free asset. All options are equivalent; hence, their corrected prices are equal as well.

Comparison of Volatilities

In the real world, implied volatilities vary depending on the date, maturity, and strike. To illustrate this feature, Figure 13.3 shows various volatilities that correspond to the Hang Seng index of the Hong Kong stock exchange, covering the period January 15, 1997, to January 12, 1999. The data contain (1) the historical volatility and (2) the implied volatility.

Figure 13.3 Evolution of Historical and Implied Volatilities

Historical volatility consists of daily observations, each computed from a sample of daily data on 10 consecutive trading days. For implied volatility, the daily implied volatilities are derived by averaging the Black-Scholes implied volatilities of the six most frequently traded call and put options written on the index. Of course, the selection of these six derivatives is endogenous and varies in time. The implied volatility series is less volatile than the historical series. This is a stylized fact, which can partly justify the interpretation of implied volatility as an expectation of future volatilities (see the variance bounds discussed in Chapter 8). Such an interpretation results from a reasoning based on an equilibrium model with rational expectations rather than the Black-Scholes model. Since the derivatives are introduced to hedge against the volatility risk, their demand and supply have to be functions of the expected future volatilities. Therefore, the equilibrium prices and the normalized equilibrium prices (i.e., the implied volatilities) also depend on these expectations. Additional information on the joint dynamics of the historical and implied volatilities is provided by their autocorrelation functions (ACF), shown in Figure 13.4.


Figure 13.4 Autocorrelogram of Historical and Implied Volatilities

the average of expected future volatilities until the maturity, we again end up with the overlapping argument- However, the overlapping alone does not explain the entire values of implied autocorrelations, as seen by comparing them with the autocorrelations of historical volatilities_ The temporal dependence revealed by the joint autocorrelogram of historical and implied volatilities suggests a bivariate autoregressive structure of the volatilities and implied volatilities, and likely the possibility to outperform the predictions based on generalized autoregressive conditional heteroscedastic models by introducing implied volatilities among the regressors_ This stylized fact was noted first by Lamoureux and Lastrapes (1993). Table 13.1 shows the estimated coefficients from various ARCH regressions that include implied volatilities among the regressors. The ARCH-type models are

where

Econometrics of Derivatives

326

Table 13.1 Implied Volatility in ARCH Regression

Constant 0.0004 (4.58) -0.0011 (-4.52) -0.0027 (-5.70) -0.0007 (-3.01) -0.0018 (-3.77)

2

Y'-I 0.3903 (9047)

This particular pattern is called the volatility smile, and it is discussed further below. However, this typical pattern is not observed for all derivatives, as shown in Figure 13.5, in which the volatility smiles for the dollar/yen options are displayed. As expected, the asymmetry is more difficult to detect in exchange rates. Indeed, the definition of the exchange rate depends on the selected currency of reference. A call option with dollars as the basic currency is similar to a put option with yen as the basic currency. Thus, there is no reason to justify an asymmetry to the right rather than to the left or vice versa. More generally, for any given date t, we can plot the implied volatilities as functions of both the maturity and moneyness-strike to obtain the so-called implied volatility surface (Figure 13.6). This surface illustrates the dependence of the smile effect on the maturity. Alternatively, this dependence can be observed by plotting the smiles associated with various residual maturities. We provide in Figure 13.7 a

Figure 13.5 Volatility Smiles, U.S. Dollar/Japanese Yen, September 1, 1997

Figure 13.6 Implied Volatility Surface, U.S. Dollar/Japanese Yen, September 1, 1997

Figure 13.7 Set of Volatility Smiles, S&P

set of such figures for different dates and options on the Standard and Poor's (S&P) index.

13.1.4 Reconstitution of the Implied Volatility Surface

In practice, at each date t, there exists a very limited number of liquid European calls (or puts) with prices that can be considered competitive.


Their characteristics are denoted by (H_j, k_j), j = 1, ..., J_t. Their observed prices are C(t; H_j, k_j), and the implied volatilities are σ_BS(t; H_j, k_j), j = 1, ..., J_t. These data can be used to infer an acceptable competitive price of a newly created derivative, such as a call with characteristics (H, k), or to price a derivative that is not actively traded on the market. Equivalently, this information may serve to recover the price surface

(H, k) → C(t; H, k),

or the implied volatility surface

(H, k) → σ_BS(t; H, k).
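Each observed option price must first be inverted into its implied volatility. Since Ψ is increasing in the volatility, a simple bisection suffices; the sketch below assumes the standard Black-Scholes Ψ, and the function names are ours:

```python
import math

def implied_vol(C_obs, S, K, H, r, tol=1e-8):
    """Invert the Black-Scholes formula (13.17) by bisection: find sigma_BS such
    that S * Psi(sigma; K/S; H; r) matches the observed call price C_obs."""
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def price(sig):
        x = -math.log((K / S) * math.exp(-r * H)) / (sig * math.sqrt(H)) \
            + 0.5 * sig * math.sqrt(H)
        return S * (Phi(x) - (K / S) * math.exp(-r * H) * Phi(x - sig * math.sqrt(H)))

    lo, hi = 1e-6, 5.0          # the price is increasing in sigma on this bracket
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if price(mid) < C_obs:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Applying this inversion to each liquid option yields the scattered points σ_BS(t; H_j, k_j) to which the smoothing methods below are applied.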

There exist various heuristic approaches to fitting a smooth surface to a set of observed points (see the references at the end of this book). In finance, the variable represented by the surface is, in general, the implied volatility, for the following reasons. First, under the Black-Scholes framework, we should observe a flat surface, which is easy to recover. Moreover, fewer restrictions are imposed on the volatility surface than on the price surface [for instance, C(t;H,k) is a decreasing function of k]. Below, we outline the methods that can be applied for obtaining the implied volatility surface (H and k varying) or simply the smile (k varying, fixed maturity). Finally, note that there exists a large number of databases that include "complete" volatility surfaces. These surfaces have generally been estimated by one of the approaches described below. Thus, it is important to know the initial observed prices and the method of smoothing before using these volatility surfaces for further analysis.

Regression Approach

A natural idea is to introduce a parametric specification for the implied volatility surface:

σ_BS(t; H, k) = a(H, k; θ), say,    (13.18)

where θ is a vector parameter. Usually the selected parametric specification includes a constant function. Then, we approximate the implied volatility surface by a(H_j, k_j; θ̂_t), where

θ̂_t = Arg min_θ Σ_{j=1}^{J_t} [σ_BS(t; H_j, k_j) − a(H_j, k_j; θ)]².    (13.19)

The price surface is approximated by

Ĉ(t; H, k) = S_t Ψ[a(H, k; θ̂_t); k, H, r_t].    (13.20)

This approach is designed to reproduce an implied volatility surface at a fixed time, that is, under a cross-sectional approach. As a consequence,


the volatility surfaces are computed daily by practitioners. Since the parameter θ̂_t is reestimated each day as well, a time series of varying parameters is generated as an additional output.

EXAMPLE 13.1: Let us introduce a polynomial approximation of the implied volatility surface. For a polynomial of degree 2, we get

a(H, k; θ) = a_0 + a_1 H + a_2 k + a_3 H² + a_4 H k + a_5 k²,

where θ = (a_0, a_1, a_2, a_3, a_4, a_5)'. Furthermore, for a fixed maturity H, we obtain a parabolic smile. The level and curvature of the smile may depend on the maturity. Empirically, polynomials of higher degrees are required to capture asymmetric smiles. The fact that the estimated volatility surface is not flat is an indication of misspecification in the Black-Scholes model. The parametric specification in (13.18) extends the standard Black-Scholes set of pricing formulas:

C(t; H, k) = S_t Ψ[a(H, k; θ); k, H, r_t].

The sensitivities of the option price with respect to the arguments H and k are also modified. More precisely, we have

C(t;H,k) = SI'II[a(H,k;9);k,H,rl ]. The sensitivities of the option price with respect to the arguments H and k are also modifIed. More precisely, we have

∂C/∂H (t; H, k) = S_t (∂Ψ/∂σ)(∂a/∂H) + S_t (∂Ψ/∂H) = V_t (∂a/∂H) + Θ_t,

where V_t and Θ_t denote the first derivatives of the Black-Scholes formula with respect to the volatility and the maturity, respectively. They are called vega and theta.
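The least-squares fit (13.19) of the quadratic specification of Example 13.1 reduces to an ordinary linear regression; a sketch (function name is ours):

```python
import numpy as np

def fit_quadratic_vol_surface(H, k, vols):
    """Least-squares fit (13.19) of the degree-2 specification of Example 13.1:
    a(H,k;theta) = a0 + a1*H + a2*k + a3*H^2 + a4*H*k + a5*k^2.

    H, k, vols : arrays of maturities, moneyness-strikes, implied volatilities."""
    H, k, vols = map(np.asarray, (H, k, vols))
    X = np.column_stack([np.ones_like(H), H, k, H ** 2, H * k, k ** 2])
    theta, *_ = np.linalg.lstsq(X, vols, rcond=None)
    surface = lambda HH, kk: theta @ np.array([1.0, HH, kk, HH ** 2, HH * kk, kk ** 2])
    return theta, surface
```

Higher-degree terms can be appended as extra columns of the design matrix to capture asymmetric smiles.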

Regressogram

The implied volatility surface can also be recovered by nonparametric methods. For instance, we can approximate the implied volatility associated with (H, k) by computing a weighted average of the observed volatilities, using as weights some functions of the derivative characteristics. For illustration, let us introduce a Gaussian kernel and two bandwidths η_H and η_k for the maturity and moneyness-strike, respectively. The approximated implied volatility can be defined as

σ̂_BS(t; H, k) = [ Σ_{j=1}^{J_t} K((H − H_j)/η_H) K((k − k_j)/η_k) σ_BS(t; H_j, k_j) ] / [ Σ_{j=1}^{J_t} K((H − H_j)/η_H) K((k − k_j)/η_k) ].    (13.21)

This is the so-called regressogram, or Nadaraya-Watson estimator, of the implied volatility function.
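A direct implementation of (13.21) (function name is ours; the Gaussian kernel is used up to a normalizing constant, which cancels in the ratio):

```python
import numpy as np

def regressogram_vol(H_obs, k_obs, vol_obs, eta_H, eta_k):
    """Nadaraya-Watson estimator (13.21) of the implied volatility surface with a
    Gaussian kernel and bandwidths eta_H (maturity) and eta_k (moneyness-strike).
    Returns a function (H, k) -> smoothed implied volatility."""
    H_obs, k_obs, vol_obs = map(np.asarray, (H_obs, k_obs, vol_obs))
    K = lambda x: np.exp(-0.5 * x ** 2)   # Gaussian kernel, constant dropped

    def sigma_hat(H, k):
        w = K((H - H_obs) / eta_H) * K((k - k_obs) / eta_k)
        return np.sum(w * vol_obs) / np.sum(w)
    return sigma_hat
```

By construction, the estimate is a convex combination of the observed implied volatilities, so it always lies between their minimum and maximum.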


Hermite Polynomial Expansions

Other nonparametric approximations of the price surface are based on polynomial expansions. These methods allow for a choice among various possible bases of polynomials. For instance, Jarrow and Rudd (1982) considered Edgeworth expansions, whereas Madan and Milne (1994) proposed the introduction of Hermite polynomials. Let us describe the latter approach after first introducing the definition and properties of Hermite polynomials. The formula of a Hermite polynomial involves a derivative of the standard normal density. As we know, the probability density function (pdf) φ of the standard normal distribution admits derivatives of any order. These derivatives are written as products of φ and polynomials of increasing degrees. The Hermite polynomial of order k is defined by

H_k(x) = ((−1)^k/√(k!)) (d^k φ(x)/dx^k) (1/φ(x)).    (13.22)

For example, the Hermite polynomials of low orders are

H_0(x) = 1,  H_1(x) = x,  H_2(x) = (x² − 1)/√2,  H_3(x) = (x³ − 3x)/√6.

The sequence of Hermite polynomials forms a basis of orthonormal functions with respect to the standard normal distribution φ:

∫ H_k(x)² φ(x) dx = 1, ∀ k,
∫ H_k(x) H_l(x) φ(x) dx = 0, ∀ k ≠ l.    (13.23)

Therefore, any function g can be decomposed on the orthonormal basis as

g(x) = Σ_{l=0}^∞ g_l H_l(x),  where g_l = ∫ g(x) H_l(x) φ(x) dx.    (13.24)
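The normalized Hermite polynomials can be evaluated through the standard three-term recursion He_{j+1}(x) = x He_j(x) − j He_{j-1}(x) of the probabilists' polynomials, followed by the normalization by √(k!); a sketch (function name is ours):

```python
import math

def hermite(k, x):
    """Normalized Hermite polynomial H_k(x) = He_k(x)/sqrt(k!), orthonormal with
    respect to the standard normal density, as in (13.23)."""
    if k == 0:
        return 1.0
    he_prev, he = 1.0, float(x)                  # He_0 and He_1
    for j in range(1, k):
        he_prev, he = he, x * he - j * he_prev   # He_{j+1} = x*He_j - j*He_{j-1}
    return he / math.sqrt(math.factorial(k))
```

For instance, hermite(2, x) evaluates (x² − 1)/√2 and hermite(3, x) evaluates (x³ − 3x)/√6, matching the low-order polynomials listed above.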

Let us now describe an application of the Hermite expansion to the Black-Scholes formula. Under the Black-Scholes model, the price of a derivative with residual maturity H and cash flow g(S_{t+H}) is

C(t; H, g) = exp(−rH) E^{Π_BS}[g(S_{t+H}) | S_t] = exp(−rH) ∫ g(S_t, H, σ², r; u) φ(u) du,    (13.25)

where u is a standard normal variable. Instead of using a Gaussian distribution, Madan and Milne (1994) assumed a nonparametric specification of the risk-neutral density of the shock u. Let us pursue their approach. We get


C(t; H, g) = exp(−rH) E^π g(S_t, H, σ², r; u), say,
= exp(−rH) ∫ g(S_t, H, σ², r; u) π(u) du.    (13.26)

Madan and Milne (1994) considered the Hermite expansion of the ratio of the latent risk-neutral density and the Black-Scholes Gaussian density:

π(u) = [ Σ_{k=0}^∞ c_k H_k(u) ] φ(u),

and truncated this expansion at a finite number of terms, say k = K:

π(u) ≈ [ Σ_{k=0}^K c_k H_k(u) ] φ(u).    (13.27)

The pricing formula now becomes

C(t; H, g) = exp(−rH) Σ_{k=0}^K c_k ∫ g(S_t, H, σ², r, u) H_k(u) φ(u) du
= exp(−rH) Σ_{k=0}^K c_k γ_k(g, S_t, H, σ², r), say.    (13.28)

The parameters r, σ², and c_k, k = 0, ..., K, are approximated by calibration, that is, by substituting the observed derivative prices into the following criterion:

(r̂, σ̂², ĉ_k) = Arg min_{r, σ², c_k} Σ_{j=1}^{J_t} { C(t; H_j, k_j) − exp(−rH_j) Σ_{k=0}^K c_k γ_k(g_j, S_t, H_j, σ², r) }².

The calibration is performed directly on prices, instead of implied Black-Scholes volatilities, to maintain the linearity of the approximation formula with respect to the parameters c_k, k = 0, ..., K, in the expansion. Once an implied volatility or a call price surface has been recovered, it is common to display several surfaces jointly. The most interesting ones are (1) the price surface for the European calls, (2) the implied volatility surface, and (3) the implied state price densities. Let us explain how the implied state price densities can be inferred from the price surface of the European calls. The price of a European call is

C(t; H, K) = exp(−rH) E^π[(S_{t+H} − K)^+ | S_t]
= exp(−rH) ∫ (s − K)^+ π_H(s | S_t) ds,


where π_H(s | S_t) denotes the conditional risk-neutral density of S_{t+H} given S_t. By differentiating both sides of the pricing formula with respect to the strike, we obtain (Breeden and Litzenberger 1978)

C(t, H, K) = exp(−rH) ∫_K^∞ (s − K) π_H(s | S_t) ds,

∂C(t, H, K)/∂K = − exp(−rH) ∫_K^∞ π_H(s | S_t) ds,

∂²C(t, H, K)/∂K² = exp(−rH) π_H(K | S_t).
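The Breeden-Litzenberger relation can be implemented by a second difference of the call price surface in the strike; in the sketch below (function name is ours), the relation can be checked against Black-Scholes prices, for which the risk-neutral density is log-normal:

```python
import math

def state_price_density(call_price, K, H, r, dK=1e-2):
    """Recover the state price density at strike K from a call price surface via
    a second difference in the strike: pi_H(K | S_t) = exp(rH) * d^2 C / dK^2."""
    d2C = (call_price(K + dK) - 2.0 * call_price(K) + call_price(K - dK)) / dK ** 2
    return math.exp(r * H) * d2C
```

In practice, call_price would be the smoothed surface K → Ĉ(t; H, K) obtained by one of the methods of Section 13.1.4.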

Thus, the computation of the second-order derivative of the European call price with respect to the strike yields the family of state price densities as functions of the residual maturity.

13.1.5 Asymmetric Smile and Stochastic Volatility

The presence of the smile effect reveals various misspecifications of the Black-Scholes model. Let us take a closer look at the assumption of constant volatility. To explore its effect, we modify the formula by introducing a heterogeneous volatility. The pricing formula becomes

C(t; H, k) = S_t ∫ Ψ(σ, k, H; r) f(σ) dσ,    (13.29)

where f is the volatility distribution under the risk-neutral probability. The implied volatility σ_BS(t; H, k) is the solution of

Ψ[σ_BS(t; H, k), k, H; r] = ∫ Ψ(σ, k, H; r) f(σ) dσ.

PROPOSITION 13.1: Under the condition r = 0, we have σ_BS(t, H, 1/k) = σ_BS(t, H, k).

PROOF: We have

Ψ(σ, k, H, 0) = Φ[ −log k/(σ√H) + σ√H/2 ] − k Φ[ −log k/(σ√H) − σ√H/2 ],

and

Ψ(σ, 1/k, H, 0) = Φ[ log k/(σ√H) + σ√H/2 ] − (1/k) Φ[ log k/(σ√H) − σ√H/2 ]
= 1 − Φ[ −log k/(σ√H) − σ√H/2 ] − (1/k) + (1/k) Φ[ −log k/(σ√H) + σ√H/2 ]
= 1 − (1/k) + (1/k) Ψ(σ, k, H, 0).

Let us denote by σ_BS(k) the implied volatility. We deduce

Ψ[σ_BS(1/k), 1/k, H, 0] = ∫ Ψ(σ, 1/k, H, 0) f(σ) dσ
= 1 − (1/k) + (1/k) ∫ Ψ(σ, k, H, 0) f(σ) dσ
= 1 − (1/k) + (1/k) Ψ[σ_BS(k), k, H, 0]
= Ψ[σ_BS(k), 1/k, H, 0].

Therefore, σ_BS(1/k) = σ_BS(k) by the uniqueness of the implied volatility. QED

Therefore, O'BS(Ilk) = O'Bs(k) by the uniqueness of implied volatility. QED In particular, the implied volatility is an even function of the log moneyness-strike log k. This implies a zero derivative at moneyness

dO'ai l ) = 0 and an asymmetric smile when the implied volatility is plotted against k (and not log k).

13.2 Parameterized Pricing Formulas

In Chapters 8 and 11, we derived the pricing formulas for derivative assets based on either the equilibrium or absence of arbitrage conditions. In both cases, the price of the derivative was expressed as the conditional expectation of the discounted future cash flows with respect to a modified probability measure. Equivalently, it can be written as the conditional expectation of the future cash flow with respect to the historical probability measure after introducing a stochastic discount factor. In the first section, we give a general review of this class of models, for which the stochastic discount factor admits a parametric specification. In the second section, we discuss the compatibility between the discount factor models and statistical inference.

13.2.1 Stochastic Discount Factor Models

We consider European derivatives backed on an underlying asset with price (S_t). The European derivative with residual maturity H and cash flow g(S_{t+H}) at date t + H has a price C(t; H, g). We assume that the price satisfies a stochastic discount factor model

C(t; H, g) = E[M_{t,t+H} g(S_{t+H}) | I_t],    (13.30)


where I_t denotes the information set of the representative investor, and M_{t,t+H} is the discount factor for the period [t, t+H]. The information set includes various variables y*, such as prices, macroeconomic variables, and volatility factors. They are called state variables. The discount factor depends on the history of these variables until the maturity date t + H. We assume a parametric specification of the discount factor:

M_{t,t+H} = M(H; y*_{t+H}; α).    (13.31)

Under this parametric specification, we get

C(t; H, g) = E[M(H; y*_{t+H}; α) g(S_{t+H}) | I_t],

(13.32)

where the conditional expectation is taken with respect to the historical probability. To obtain the expression of the derivative price, we have to compute the conditional expectation after specifying the conditional distribution of y*_{t+H}, S_{t+H} given y*_t. For ease of exposition, we assume that the price of the underlying asset belongs to the information set and that the process followed by the state variables is Markov. The model contains a parametric specification of the state variable transitions. Let β denote the associated parameter. It is a vector that may possibly share some common components with the α parameter. By computing the conditional expectation in (13.32), we get a parametric specification of the derivative price as a function of the state variables y*_t:

C(t; H, g) = γ(H, g; y*_t; α, β), say.    (13.33)

EXAMPLE 13.2, CONSUMPTION-BASED CAPM: During the period (t, t+1), between two consecutive portfolio updatings, the discount factor is given in (8.13), where q_t is a consumer price index, and C_t is the aggregate consumption of physical goods. The pricing formula

p_t = E_t[p_{t+1} M_{t,t+1}]

is easily extended to larger horizons. For instance, we have

p_t = E_t[p_{t+1} M_{t,t+1}] = E_t[E_{t+1}(p_{t+2} M_{t+1,t+2}) M_{t,t+1}],

by iterated expectation.


We deduce that M_{t,t+2} = M_{t,t+1} M_{t+1,t+2}, and, more generally,

M_{t,t+H} = ∏_{h=0}^{H−1} M_{t+h,t+h+1}.
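Anticipating the power utility case discussed next, the one-period pricing formula p_t = E_t[M_{t,t+1} p_{t+1}] can be approximated by simulation once draws of consumption growth are available under the historical distribution. A minimal sketch with M_{t,t+1} = δ (C_{t+1}/C_t)^{−γ} (the function name, the parameter values, and the omission of the consumer price index are our simplifications):

```python
import numpy as np

def ccapm_price(payoffs, cons_growth, delta=0.98, gamma=2.0):
    """One-period CCAPM price with power utility: p_t = E_t[M_{t,t+1} * payoff],
    where M_{t,t+1} = delta * (C_{t+1}/C_t)^(-gamma).

    payoffs, cons_growth : simulated draws under the historical distribution."""
    m = delta * np.asarray(cons_growth, dtype=float) ** (-gamma)  # discount factor
    return float(np.mean(m * np.asarray(payoffs, dtype=float)))
```

A payoff positively correlated with consumption growth is priced below its discounted expectation, reflecting the risk correction embedded in the discount factor.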

Next, we select a power utility function and write the discount factor accordingly. Under the Consumption-Based Capital Asset Pricing Model (CCAPM), the information set of the investor includes the consumer price index, the consumption variable, and the prices of tradable assets. As shown in Section 8.1, the CCAPM is a semiparametric model that cannot provide derivative prices unless it is completed by a transition equation for the state variable (under the historical probability).

EXAMPLE 13.3, RECURSIVE UTILITY: Let us consider the Epstein-Zin model introduced in Section 8.2. The discount factor is

M_{t,t+H} = ∏_{h=0}^{H−1} [ δ (C_{t+h+1}/C_{t+h})^{−(1−ρ)} ]^{α/ρ} (W_{t+h}/W_{t+h+1})^{1−α/ρ} (q_{t+h+1}/q_{t+h})^{−α/ρ}
= δ^{αH/ρ} (C_{t+H}/C_t)^{−(1−ρ)α/ρ} (W_t/W_{t+H})^{1−α/ρ} (q_{t+H}/q_t)^{−α/ρ}.

The discount factor now depends on the evolution of the market portfolio value. Consequently, a market index has to be introduced among the state variables. By analogy to Example 13.2, the model also has to be completed by the transition equation of the state variable to provide derivative prices.

EXAMPLE 13.4, DERIVATIVE PRICING IN CONTINUOUS TIME: Let us consider an underlying asset with a price that satisfies the diffusion equation dS_t = μ(S_t) dt + σ(S_t) dW_t. We saw in Section 11.4.3 that the discount factor is given by

M_{t,t+H} = exp(−rH) exp[ − ∫_t^{t+H} ((μ(S_τ) − r)/σ(S_τ)) dW_τ − (1/2) ∫_t^{t+H} ((μ(S_τ) − r)/σ(S_τ))² dτ ],

when the derivative prices are assumed to depend on S_t only, and the state variable is y*_t = S_t. In contrast to the examples corresponding to equilibrium conditions, the dynamics of the state variable S_t and of the discount factor


are now jointly specified. The parameterization involves the drift and volatility functions.

EXAMPLE 13.5, STOCHASTIC VOLATILITY MODEL: For a stochastic volatility model of the type

dS_t = μ S_t dt + σ_t S_t dW_t^s,
df(σ_t) = a(σ_t) dt + b(σ_t) dW_t^σ,

we derived in Section 11.5.3 the discount factor

M_{t,t+H} = exp(−rH) exp{ − (μ − r) ∫_t^{t+H} dW_τ^s/σ_τ − (1/2)(μ − r)² ∫_t^{t+H} dτ/σ_τ² } exp{ − ∫_t^{t+H} ν_τ dW_τ^σ − (1/2) ∫_t^{t+H} ν_τ² dτ }.

To obtain a fully parametric model, we still have to introduce parametric specifications of the drift and volatility functions that appear in the volatility equation, as well as a parametric specification of the volatility premium (\nu_t). The parameters of the volatility premium are not necessarily related to the parameters that characterize the dynamics of the state variable Y_t = (S_t, \sigma_t) under the historical probability. This allows selection of the most appropriate pricing formula in an incomplete market framework, in which an infinite number of pricing formulas are a priori admissible.

In the above examples, the conditional expectation that appears in the definition of the derivative price cannot be computed analytically in general and needs to be approximated by simulations (see Section 13.5 on Monte Carlo methods). Also note that the discount factor models can be simplified if the sampling dates of the asset price process and the discount factor horizons are multiples of a fixed time unit, conventionally set equal to 1. Indeed, we can always write

M_{t,t+H} = \prod_{h=0}^{H-1} M_{t+h,t+h+1},

so that only the discount factors over a unitary period need to be specified. Simple models arise when

M_{t,t+1} = \exp m(Y_t; \alpha), say.   (13.34)

Then, we get


C(t;H,g) = E\left[ \prod_{h=0}^{H-1} \exp m(Y_{t+h};\alpha) \, g(S_{t+H}) \,\Big|\, Y_t \right] = \gamma(H,g;Y_t;\alpha,\beta), \quad t, H \in N,   (13.35)

whereas the transition function also represents the dynamics at discrete dates. It is denoted by

f(Y_{t+1} | Y_t; \beta).   (13.36)

The complete model is a nonlinear state-space model with transition equation (13.36) and measurement equation (13.35) (see Chapter 9).

REMARK 13.1: The discrete time formula of the discount factor can also be used to approximate a continuous time pricing formula. For instance, the discount factor of the stochastic volatility model can be approximated by

M_{t,t+H} = \exp(-rH) \exp\left\{ -(\mu - r) \sum_{\tau=t}^{t+H-1} \frac{\varepsilon_\tau}{\sigma_\tau} - \frac{1}{2}(\mu - r)^2 \sum_{\tau=t}^{t+H-1} \frac{1}{\sigma_\tau^2} \right\} \exp\left\{ \sum_{\tau=t}^{t+H-1} \nu_\tau \varepsilon_\tau^\sigma - \frac{1}{2} \sum_{\tau=t}^{t+H-1} \nu_\tau^2 \right\},

where (\varepsilon_\tau) and (\varepsilon_\tau^\sigma) are independent sequences of independent identically distributed (i.i.d.) standard Gaussian variables.

13.2.2 Compatibility with Statistical Inference

Let us now reconsider the question of compatibility between the stochastic discount factor model and statistical inference (see Section 13.1.2). For expositional convenience, we assume that, at any discrete date t = 1, ..., T, the econometrician observes

• the price S_t of the underlying asset;
• the price C_t of one derivative, for instance, of an at-the-money European call with residual maturity H and cash flow g(S_{t+H}) = (S_{t+H} - S_t)^+;
• other variables X_t, which are not asset prices.

The whole vector of observations is denoted by y_t = (S_t, C_t, X_t')'. We can now distinguish different cases depending on the respective dimensions of the vector of observations y_t and the vector of state variables Y_t.

1. If the number of state variables is strictly less than the number of observed variables, a deterministic relationship between the observed variables is spuriously created, which is at odds with available data. Therefore, the stochastic discount factor model is rejected with probability 1. This is the situation discussed in Section 13.1.2 for the Black-Scholes model.
2. If the number of state variables is equal to the number of observed variables, in general we can recover the state variables from the observed ones. Moreover, the likelihood function is directly derived from the likelihood function associated with the state variables.
3. If the number of state variables is strictly larger than the number of observed ones, we have to integrate out some unobservable state variables to derive the observable likelihood function. Then, we can apply a filter to recover the unknown states.

13.3 Statistical Inference

Let us now discuss the estimation of parameters from observations on the underlying asset, a derivative, and possibly some other variables X. We already presented consistent estimation methods for selected subsets of parameters. For instance, in the CCAPM, the preference parameters can be estimated from the Euler conditions by the Generalized Method of Moments (GMM) (see Section 8.3). In the continuous time asset price models, the parameters of the diffusion equation can be estimated from the observations on (S_t) only (see Chapter 12). In this section, we apply a maximum likelihood approach that jointly uses the information on the underlying asset and the derivative. We expect this method to be more efficient.

13.3.1 The Hull-White Model

Let us consider the stochastic volatility model in Example 13.5 with an Ornstein-Uhlenbeck log-volatility process:

d \log \sigma_t = a_0 (a_1 - \log \sigma_t) dt + b \, dW_t^\sigma,

and a constant volatility premium \nu. The observed prices are S_t and the price of an at-the-money call C_t = \gamma(S_t, \sigma_t; \theta), where

\gamma(S_t, \sigma_t; \theta) = E[M_{t,t+H} (S_{t+H} - S_t)^+ | S_t, \sigma_t],

M_{t,t+H} = \exp(-rH) \exp\left\{ -(\mu - r) \int_t^{t+H} \frac{dW_\tau}{\sigma_\tau} - \frac{1}{2}(\mu - r)^2 \int_t^{t+H} \frac{d\tau}{\sigma_\tau^2} \right\},

and the parameter vector is \theta = (\beta, \alpha), with \beta = (\mu, a_0, a_1, b) and \alpha = (r, \nu). The parameter vector \beta characterizes the state variable dynamics, whereas the parameter \alpha is associated specifically with the pricing formula. We denote by f(S_{t+1}, \sigma_{t+1} | S_t, \sigma_t; \beta) the transition function of the state variables and by \tilde f(S_{t+1} | S_t; \beta) the transition function of the price (S_t) only.

Marginal Maximum Likelihood

A consistent estimator of \beta is the ML estimator based on the observed price of the underlying asset only:

\hat\beta_T = \arg\max_\beta \sum_{t=1}^{T} \log \tilde f(S_{t+1} | S_t; \beta),

whenever \beta is identifiable from this partial information. In this optimization, \tilde f generally has to be computed numerically.

Global Maximum Likelihood

The pair (S_t, C_t) satisfies a one-to-one relationship with the pair (S_t, \sigma_t):

C_t = \gamma(S_t, \sigma_t; \theta) \iff \sigma_t = \gamma^*(S_t, C_t; \theta),

where \gamma^* is the inverse of \gamma with respect to the volatility. The transition equation for prices is derived from the Jacobian formula

f^*(S_{t+1}, C_{t+1} | S_t, C_t; \theta) = f[S_{t+1}, \gamma^*(S_{t+1}, C_{t+1}; \theta) | S_t, \gamma^*(S_t, C_t; \theta); \beta] \left| \frac{\partial \gamma^*(S_{t+1}, C_{t+1}; \theta)}{\partial C_{t+1}} \right|.

The maximum likelihood estimator of \theta = (\beta, \alpha) is

\hat\theta_T = \arg\max_\theta \sum_{t=1}^{T} \log f^*(S_{t+1}, C_{t+1} | S_t, C_t; \theta).

In this optimization, both \gamma^* and f(S_{t+1}, \sigma_{t+1} | S_t, \sigma_t) involve multiple integrals and have to be computed numerically (see Section 13.5).

Two-Step Method

A consistent estimation method, which is less efficient although easier to implement, relies on the marginal ML estimator of \beta and estimates \alpha as

\hat\alpha_T = \arg\max_\alpha \sum_{t=1}^{T} \log f^*(S_{t+1}, C_{t+1} | S_t, C_t; \alpha, \hat\beta_T).

The various estimation methods presented in this section are maximum likelihood methods. Thus, the corresponding estimators feature the standard asymptotic properties of ML estimators. However, they cannot be applied directly, since the pricing function \gamma, its inverse, and its derivative have no explicit form. For this reason, the pricing formulas have to be approximated by simulations before the optimizations are performed. We define the maximum simulated likelihood estimator

\hat\theta_T^S = \arg\max_\theta \sum_{t=1}^{T} \log f^{*S}(S_{t+1}, C_{t+1} | S_t, C_t; \theta),

where f^{*S} is deduced from f^* by replacing the pricing function \gamma and its inverse \gamma^* with their simulated counterparts, and \gamma^{*S} is the inverse of a Monte Carlo approximation of \gamma.

13.3.2 General Case

In the general case, the number of factors is strictly greater than the number of observed prices. Let us consider a sample of regularly spaced observations on the prices of K derivatives. The K-dimensional vector of derivative prices can be written as

C_t = \gamma(Y_t; \theta),   (13.37)

where the factor process satisfies the transition

f(Y_{t+1} | Y_t; \theta).   (13.38)

The likelihood function of this nonlinear factor model has no analytical expression for two reasons: First, we have to integrate out the unobservable factors, which requires computing integrals of dimension (L - K)T, where L is the number of factors. Second, the \gamma function is a conditional expectation without an analytical expression.

The estimation methods for the parameter \theta all involve simulations (see Sections 9.3, 12.4, and 12.5). Let us, for instance, consider the method of simulated moments (MSM), with a number of moments set equal to the parameter dimension p. We denote by b(C_t, C_{t-1}) the p-dimensional price transformation, which is used for moment calibration. Let us introduce the simulated paths of the factor process corresponding to the transition function f(Y_{t+1} | Y_t; \theta) and the value \theta of the parameter. They are denoted by Y_t^s(\theta), t = 1, ..., T, s = 1, ..., S. Let us also denote by \gamma^{S^*} an approximation of \gamma obtained by a Monte Carlo method based on S^* replications (see Section 13.5). Then, the MSM estimator of \theta is the solution of the set of calibrating equations

\frac{1}{T} \sum_{t=1}^{T} b(C_t, C_{t-1}) = \frac{1}{TS} \sum_{s=1}^{S} \sum_{t=1}^{T} b[\gamma^{S^*}(Y_t^s(\theta); \theta), \gamma^{S^*}(Y_{t-1}^s(\theta); \theta)].   (13.39)

The method involves two sets of simulations: the first one to approximate \gamma, and the second one to generate the factors. The MSM estimator is consistent when both T and S^* tend to infinity (not necessarily S).
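The calibration idea behind (13.39) can be sketched numerically. The one-factor model, the pricing function, and the single matched moment below are hypothetical illustrations, not the specification of the text:

```python
import numpy as np

def simulate_factors(theta, T, rng):
    # hypothetical one-factor transition: AR(1) around the level theta
    y = np.empty(T)
    y[0] = theta
    for t in range(1, T):
        y[t] = theta + 0.5 * (y[t - 1] - theta) + 0.1 * rng.standard_normal()
    return y

def gamma_mc(y, n_rep, rng):
    # Monte Carlo pricing function: E[max(Y', 0) | Y = y] for a toy payoff
    return np.maximum(y + 0.1 * rng.standard_normal(n_rep), 0.0).mean()

rng = np.random.default_rng(0)
theta0, T = 1.0, 200

# "observed" derivative prices generated under the true parameter
y_obs = simulate_factors(theta0, T, rng)
c_obs = np.array([gamma_mc(y, 500, rng) for y in y_obs])

def criterion(theta):
    # one calibrating equation of type (13.39): match the mean price
    rng_sim = np.random.default_rng(1)        # common random numbers
    y_sim = simulate_factors(theta, T, rng_sim)
    c_sim = np.array([gamma_mc(y, 500, rng_sim) for y in y_sim])
    return (c_obs.mean() - c_sim.mean()) ** 2

grid = np.linspace(0.5, 1.5, 21)
theta_hat = grid[int(np.argmin([criterion(th) for th in grid]))]
print(theta_hat)
```

Both layers of simulation appear here: gamma_mc plays the role of the approximation gamma^{S*}, and simulate_factors generates the factor paths.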

13.4 Stochastic Risk-Neutral Probability

We have discussed various procedures that allow approximation of the latent risk-neutral probability. Some heuristic approaches, described in Section 13.1.4, are designed to provide an outcome at a fixed date. Therefore, the computations have to be performed daily. The reason is that heuristic methods do not account for latent factors that describe the dynamics of the state price densities. Thus, they are appropriate for a cross-sectional analysis, but not for prediction making.

The dynamic parametric specifications of the risk-neutral density were introduced in Section 13.2. They contain a limited number of latent factors that allow for statistical inference on a limited number of derivative prices. However, the parametric models cannot be used when the number of derivatives increases in time, since the order condition (see Section 13.2.2) is no longer satisfied. One can argue that an appropriate specification needs to include the state prices (i.e., Arrow-Debreu prices) as the unobserved latent variables. Thus, the number of latent error terms has to be equal to the number of admissible states, that is, infinite. Equivalently, the risk-neutral probability could be assumed to be stochastic to accommodate the effect of the large amount of information unavailable to the econometrician. Such an approach was first introduced by Clement, Gourieroux, and Monfort (2000). We describe below the method of cross-sectional analysis of the state price density at a given residual maturity H; the original paper contains an extension to a dynamic framework.

13.4.1 Stochastic Model for the State Price Density

We assume a state price density (for the residual maturity H). The price of a European derivative with cash flow g(S_{t+H}) is

C(t,H,g) = \int g(s) \, dQ_t(s),   (13.40)

where dQ_t(s) = Q_t(s + ds) - Q_t(s) is the price of the digital option providing $1 at t + H if S_{t+H} belongs to [s, s + ds]. This price includes the discounting and the risk correction (i.e., the risk-neutral density). Due to insufficient information, the econometrician does not know the state price density exactly. The missing information can be accommodated by assuming a random state price measure. For a generic element w of the probability space, we get

C(t,H,g,w) = \int g(s) \, dQ_t(s,w).   (13.41)

Then, the derivative prices also depend on the generic element w and therefore are stochastic. This leads us to a specification with latent variables, in which we define (1) the distribution of the latent prices, that is, the distribution of the stochastic state price density [dQ_t(s), s varying], and (2) the links between the observed derivative prices and the latent prices defined by (13.41).

Due to the linearity of measurement equation (13.41), the first- and second-order moments of the derivative prices are easy to derive from the first- and second-order moments of the state price density. Let us define

E_w \, dQ_t(s,w) = dm_t(s),   (13.42)
Cov_w[dQ_t(s,w), dQ_t(s',w)] = \Gamma_t(ds,ds'), \quad s \neq s',   (13.43)
V_w[dQ_t(s,w)] = C_t(ds),   (13.44)

where the expectation and variance are evaluated with respect to the distribution of the generic element of the probability space. We deduce

E_w C(t,H,g;w) = E_w\left[ \int g(s) dQ_t(s,w) \right] = \int E_w[g(s) dQ_t(s,w)] = \int g(s) \, dm_t(s).   (13.45)

Similarly, we get

Cov_w[C(t,H,g,w), C(t,H,\tilde g,w)] = \int\int g(s) \tilde g(s') \Gamma_t(ds,ds') + \int g(s) \tilde g(s) C_t(ds).   (13.46)

13.4.2 Gamma Model

At this point, we can introduce a tractable specification of the stochastic valuation measure. Clement et al. (2000) proposed a gamma specification for at least the three following reasons:

1. It leads to a clear-cut factorization of the distribution into components corresponding to the zero-coupon price and the risk-neutral probability.
2. It is close to a deterministic valuation formula, such as that of Black and Scholes.
3. The estimation and computation steps are easily performed by simulation-based inference methods.


DEFINITION 13.1: The random measure Q_t is a gamma measure if and only if:

(i) it has independent increments, that is, the variables Q_t([s_1,s_2)), ..., Q_t([s_{n-1},s_n)) are independent for any n, s_1 < s_2 < ... < s_n;
(ii) the variable Q_t([s_1,s_2)) follows a gamma distribution with parameters \nu_t[s_1,s_2) and \lambda_t, where \nu_t is a positive deterministic measure on R_+ and \lambda_t is a positive real number.

Therefore, the distribution of the random measure is characterized by a real number \lambda_t and a deterministic measure \nu_t. Let us recall that the gamma distribution with parameters \nu and \lambda has the density

f(y) = \frac{\lambda^\nu}{\Gamma(\nu)} \exp(-\lambda y) y^{\nu - 1} 1_{y > 0},

mean \nu/\lambda, and variance \nu/\lambda^2. We deduce the first- and second-order moments of the random measure under the gamma specification:

dm_t(s) = \frac{1}{\lambda_t} d\nu_t(s), \quad C_t(ds) = \frac{1}{\lambda_t^2} d\nu_t(s), \quad \Gamma_t(ds,ds') = 0,   (13.47)

where the last equality results from the property of independent increments. Therefore, the second-order properties of derivative prices are

E_w C(t,H,g;w) = \frac{1}{\lambda_t} \int g(s) \, d\nu_t(s),   (13.48)

V_w C(t,H,g;w) = \frac{1}{\lambda_t^2} \int g^2(s) \, d\nu_t(s),   (13.49)

Cov_w[C(t,H,g;w), C(t,H,\tilde g;w)] = \frac{1}{\lambda_t^2} \int g(s) \tilde g(s) \, d\nu_t(s).   (13.50)

Equations (13.48)-(13.50) suggest the interpretations of the parameters \nu_t(\cdot) and \lambda_t. For instance, d\nu_t(s)/\lambda_t is the average digital option price, whereas \lambda_t measures the ex-ante uncertainty about this price. In the limiting case \lambda_t \to +\infty, we get a deterministic formula for derivative pricing that corresponds to the standard complete market framework.

EXAMPLE 13.6: As an illustration, we can extend the standard Black-Scholes model by allowing for stochastic state prices. We assume

\nu_t(ds) = \lambda_t \exp(-rH) \pi_{BS}(ds),

where \pi_{BS} is the Black-Scholes risk-neutral probability of S_{t+H} given S_t. Thus, we get

E_w C(t;H,g,w) = \exp(-rH) \int g(s) \pi_{BS}(ds),

that is, the Black-Scholes formula written as an expectation. The accuracy of the derivative prices is measured by

Cov_w[C(t,H,g,w), C(t,H,\tilde g,w)] = \frac{1}{\lambda_t} \exp(-rH) \int g(s) \tilde g(s) \pi_{BS}(ds).

It depends on the Black-Scholes price of the derivative whose cash flow is equal to the product g(S_{t+H}) \tilde g(S_{t+H}). Thus, the introduction of a stochastic state price allows us to evaluate the accuracy of the Black-Scholes formula by estimating the parameter \lambda_t.
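The moment formulas (13.48)-(13.49) can be checked by simulating a discretized gamma random measure. The state grid, the measure nu_t, and the precision lambda_t below are illustrative choices, not calibrated values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretized gamma random measure on a state grid s_1 < ... < s_n:
# independent increments with shape dnu_i and rate lam (scale 1/lam).
s = np.linspace(0.0, 2.0, 41)
dnu = np.full(40, 0.05)          # nu_t mass per cell (illustrative)
lam = 50.0                       # lambda_t: ex-ante precision

mid = 0.5 * (s[:-1] + s[1:])
g = np.maximum(mid - 1.0, 0.0)   # call-type cash flow with strike 1

n_draws = 100_000
dQ = rng.gamma(shape=dnu, scale=1.0 / lam, size=(n_draws, 40))
C = dQ @ g                       # C(t,H,g,w) = sum_s g(s) dQ(s,w)

mean_theory = (g * dnu).sum() / lam            # (13.48)
var_theory = (g ** 2 * dnu).sum() / lam ** 2   # (13.49)
print(C.mean(), mean_theory, C.var(), var_theory)
```

Increasing lam shrinks the dispersion of the simulated prices, in line with the interpretation of lambda_t as the ex-ante precision of the state prices.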

13.5 Monte Carlo Methods

In general, derivative prices do not admit analytical expressions and have to be computed numerically. In this section, we consider approximations obtained by simulations. They are used not only to predict the derivative prices, but also to build the likelihood function when option prices are observed (see Section 13.3). We first recall various classical techniques to compute an integral by Monte Carlo experiments. Then, we discuss their implementation for option pricing. In particular, we explain how to use jointly the historical and risk-neutral densities in numerical computations.

13.5.1 The Approach

Let us consider an integral

I = \int a(z) dz,   (13.51)

which can be multidimensional. Although the function a is known, we cannot compute the integral analytically and find the value of I. A Monte Carlo method consists in introducing a known pdf f and rewriting I as follows:

I = \int \frac{a(z)}{f(z)} f(z) dz = E_f\left[ \frac{a(Z)}{f(Z)} \right].   (13.52)

Thus, the integral I is an expectation, which can be approximated by an empirical average.

DEFINITION 13.2: A Monte Carlo estimator of I is

\hat I_S = \frac{1}{S} \sum_{s=1}^{S} \frac{a(z_s)}{f(z_s)},

where z_s, s = 1, ..., S, are independent drawings from the distribution f.

This estimator is an unbiased estimator of I, with a variance equal to

V(\hat I_S) = \frac{1}{S} V_f\left[ \frac{a(Z)}{f(Z)} \right].   (13.53)


Its variance depends on the selected distribution f, called the importance function. Intuitively, optimal accuracy is achieved when a/f is constant, that is, when f is proportional to a. However, the coefficient of proportionality 1/\int a(z)dz = 1/I is unknown. Nevertheless, accurate approximations can still be obtained when a and f have similar patterns.

There exist various possibilities to improve the general Monte Carlo approach. First, let us select a symmetric distribution f. We can double the number of simulated realizations by considering z_s as well as the same vector with the opposite sign, -z_s. The estimator becomes

\hat I_S^a = \frac{1}{2S} \sum_{s=1}^{S} \left[ \frac{a(z_s)}{f(z_s)} + \frac{a(-z_s)}{f(-z_s)} \right].   (13.54)

This is a method involving antithetic variables. The estimator is unbiased, with a variance given by

V(\hat I_S^a) = \frac{1}{4S} V_f\left[ \frac{a(Z)}{f(Z)} + \frac{a(-Z)}{f(-Z)} \right].   (13.55)

The use of antithetic variables is intended to reduce the variance of the estimator and to enhance its efficiency. Second, sometimes we know an approximation a_0 of a that is easy to integrate. Then, we can write

I = \int a(z)dz = \int a_0(z)dz + \int [a(z) - a_0(z)]dz = I_0 + \int [a(z) - a_0(z)]dz,

and estimate the value of I by

\hat I_S^0 = I_0 + \frac{1}{S} \sum_{s=1}^{S} \frac{a(z_s) - a_0(z_s)}{f(z_s)}.   (13.56)

The estimator is unbiased, with variance

V(\hat I_S^0) = \frac{1}{S} V_f\left[ \frac{a(Z) - a_0(Z)}{f(Z)} \right].   (13.57)
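The three estimators of this section can be compared on a toy integral. The integrand a, the Gaussian importance function f, and the approximation a_0 below are chosen for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: I = integral of (1 + z) exp(-z^2) dz = sqrt(pi)
a = lambda z: (1.0 + z) * np.exp(-z ** 2)
f = lambda z: np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)  # N(0,1) pdf

S = 100_000
z = rng.standard_normal(S)

I_plain = np.mean(a(z) / f(z))                          # Definition 13.2
I_anti = np.mean(0.5 * (a(z) / f(z) + a(-z) / f(-z)))   # antithetic, (13.54)

a0 = lambda z: np.exp(-z ** 2)   # easy-to-integrate approximation of a
I0 = np.sqrt(np.pi)              # known value of the integral of a0
I_cv = I0 + np.mean((a(z) - a0(z)) / f(z))              # control term, (13.56)

print(I_plain, I_anti, I_cv, np.sqrt(np.pi))
```

Here a/f has a large antisymmetric component, so the antithetic estimator removes it exactly, and a - a0 integrates to zero, so the control-variate correction term has mean zero.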

13.5.2 Application to Option Pricing

Let us consider a risky asset with a price (y_t) that satisfies the diffusion equation

dy_t = \mu(y_t)dt + \sigma(y_t)dW_t.   (13.58)

We changed the notation for the asset price from S_t to y_t to avoid confusion with the number S of replications. The corresponding risk-neutral model is

dy_t = r y_t dt + \sigma(y_t)dW_t.   (13.59)

For computational ease, the continuous time processes in (13.58) and (13.59) are replaced by their Euler-discretized versions at a short time interval \delta. For \delta = 1, these discretized models are

y_t = y_{t-1} + \mu(y_{t-1}) + \sigma(y_{t-1})\varepsilon_t,   (13.60)
y_t = y_{t-1} + r y_{t-1} + \sigma(y_{t-1})\varepsilon_t.   (13.61)

They correspond to the transitions

p_{t|t-1} = \frac{1}{\sigma(y_{t-1})} \varphi\left[ \frac{y_t - y_{t-1} - \mu(y_{t-1})}{\sigma(y_{t-1})} \right],   (13.62)

\pi_{t|t-1} = \frac{1}{\sigma(y_{t-1})} \varphi\left[ \frac{y_t - y_{t-1} - r y_{t-1}}{\sigma(y_{t-1})} \right],   (13.63)

under the historical and risk-neutral probabilities, respectively. Let us now consider the pricing at t of a European option with residual maturity H and cash flow g(y_{t+H}). Its price is given by

C(t;H,g) = \exp(-rH) E_\pi[g(y_{t+H}) | y_t].   (13.64)

The conditional expectation is an integral:

I = E_\pi[g(y_{t+H}) | y_t] = \int g(y_{t+H}) \prod_{h=1}^{H} \pi_{t+h|t+h-1}(y_{t+h} | y_{t+h-1}) \prod_{h=1}^{H} dy_{t+h}.

This integral admits different equivalent expressions:

I = \int g(y_{t+H}) \prod_{h=1}^{H} \frac{\pi_{t+h|t+h-1}(y_{t+h} | y_{t+h-1})}{p_{t+h|t+h-1}(y_{t+h} | y_{t+h-1})} \prod_{h=1}^{H} p_{t+h|t+h-1}(y_{t+h} | y_{t+h-1}) \prod_{h=1}^{H} dy_{t+h}   (13.65)

= \int g(y_{t+H}) \prod_{h=1}^{H} m_{t+h|t+h-1}(y_{t+h} | y_{t+h-1}) \prod_{h=1}^{H} p_{t+h|t+h-1}(y_{t+h} | y_{t+h-1}) \prod_{h=1}^{H} dy_{t+h},

where m_{t+h|t+h-1} = \pi_{t+h|t+h-1}/p_{t+h|t+h-1}, which involves the stochastic discount factor. I can also be defined in terms of future shocks under either the risk-neutral probability or the historical one. Let us denote by z = (\varepsilon_{t+1}, ..., \varepsilon_{t+H})' the standard Gaussian shocks. We get

I = \int l(y_t, z) \prod_{h=1}^{H} \varphi(\varepsilon_{t+h}) \prod_{h=1}^{H} d\varepsilon_{t+h},

where \varphi denotes the standard Gaussian density and l(y_t, z) expresses the cash flow g(y_{t+H}) as a function of y_t and the future shocks.

a_t - b_t + \lambda'\gamma_t > 0. This condition is satisfied if a_t - b_t \geq 0. Let us now examine the expressions of the bid and ask prices under a rational expectation hypothesis.

ASSUMPTION A.14.4: The market maker knows the distribution of p_{t+1}, \eta_{t+1} conditional on the information I_t and the behavior of traders.

Then, the ask and bid prices satisfy

a_t = E[p_{t+1} | I_t, Z_{t+1} = +1], \quad b_t = E[p_{t+1} | I_t, Z_{t+1} = -1],

or, equivalently, if \beta_t > 0,

a_t = E[p_{t+1} | I_t, \eta_{t+1} > (a_t - ...

The survivor function of the intertrade duration \tau is

S(y) = P[\tau > y], \quad y \in R_+.   (14.24)

This is a decreasing function with limiting values S(0) = 1, S(+\infty) = 0. It satisfies

S(y) = 1 - F(y) = \int_y^{+\infty} f(\tau) d\tau.   (14.25)

The hazard function, or intensity, \lambda provides the instantaneous probability of occurrence of a trade after a time y when no trade has yet arrived; it is defined by

\lambda(y) = \lim_{dy \to 0} \frac{1}{dy} P[y < \tau \leq y + dy \,|\, \tau > y] = \frac{f(y)}{S(y)}.

The family of Burr distributions includes

• the Weibull distributions, in the limiting case a \to 0;
• the exponential distribution, when a \to 0 and \beta = 1;
• the log-logistic distribution.

14.4.2 Activity and Coactivity Measures

In this section, we examine the trading intensity (rate), defined as the number of transactions concluded in a fixed time interval. The rationale for this approach is a duality between the modeling of duration dynamics and the modeling of transaction rate dynamics. However, from the perspective of a financial analyst interested in asset liquidity, the outcomes of both approaches are equivalent.

The One-Asset Case

Before analyzing the temporal dependence of the transaction rate, it is important to eliminate potential sources of nonstationarity. For this reason, we need to investigate intraday periodicities, called intraday seasonalities. Gourieroux, Jasiak, and Le Fol (1999) introduced a periodic model in which the transaction rate depends on the time of day:

P[one trade between t and t + dt, day m] = \lambda(t)dt + o(dt),   (14.28)
P[strictly more than one trade between t and t + dt, day m] = o(dt),   (14.29)

where the time t is measured since the market opening on a given day m. Thus, \lambda depends on the time of day, but not on the day itself, which is the pertaining periodicity condition. Moreover, it does not depend on the entire trade history. Let us denote by N_m(t) the number of transactions on day m up to time t. It is known that

E N_m(t) = \int_0^t \lambda(u) du = \Lambda(t) (say),   (14.30)

where \Lambda is the so-called cumulated hazard function. A consistent estimator of the function \Lambda is obtained from trade observations on M consecutive days for a large M:

\hat\Lambda(t) = \frac{1}{M} \sum_{m=1}^{M} N_m(t).   (14.31)

Figure 14.11 Trading Rate, Alcatel (estimated intraday trading intensity plotted against intraday time in seconds, 10:00-17:00).

Since the counting process t \to N_m(t) is not differentiable, an estimator of the hazard function can only be derived after preliminary smoothing. A kernel estimator of the trading intensity \lambda is

\hat\lambda(t) = \frac{1}{M h_M} \sum_{m=1}^{M} \sum_{n=1}^{N_m} K\left( \frac{t - d_n(m)}{h_M} \right),   (14.32)

where d_n(m) is the time of the nth trade on day m, N_m is the total number of transactions on day m, K is a kernel, and h_M is the bandwidth. The estimated trading rate for the Alcatel data is presented in Figure 14.11. The trading rate is time dependent and features characteristic seasonal patterns: a low trade intensity shortly after the market opening and shortly before the closure; a minimal trade intensity at lunch time; a high intensity at the opening of foreign markets, that is, the London Stock Exchange around 10:30 and the New York Stock Exchange (NYSE) around 15:30. The activity pattern is generally market dependent and varies across stock exchanges. In particular, an M-shaped intensity is obtained for liquid assets traded on the Paris Bourse, while a U-shaped intensity is reported on the NYSE.

The Multiasset Framework

The previous analysis can be extended to a multiasset framework. For convenience, we consider two assets, 1 and 2. We define various periodic trading rates:

P[one trade between t and t + dt, day m, and this is a trade of asset 1 only] = \lambda_{11}(t)dt + o(dt),
P[one trade between t and t + dt, day m, and this is a trade of asset 2 only] = \lambda_{22}(t)dt + o(dt),
P[two trades between t and t + dt, day m, and these are trades of both assets 1 and 2] = \lambda_{12}(t)dt + o(dt),
P[no trade between t and t + dt, day m] = 1 - \lambda_{11}(t)dt - \lambda_{22}(t)dt - \lambda_{12}(t)dt + o(dt).

These various rates are related to counting processes that measure the numbers of trades before t on day m for assets 1 and 2. They are denoted by N_m^1(t) and N_m^2(t), respectively. Between t and t + dt, the increments of the counting processes, that is, dN_m^1(t) = N_m^1(t + dt) - N_m^1(t) and dN_m^2(t) = N_m^2(t + dt) - N_m^2(t), are either 0 or 1 if dt is sufficiently small. We deduce that

E[dN_m^1(t)] = \lambda_1(t)dt = [\lambda_{11}(t) + \lambda_{12}(t)]dt,
E[dN_m^2(t)] = \lambda_2(t)dt = [\lambda_{22}(t) + \lambda_{12}(t)]dt,
V[dN_m^1(t)] = \lambda_1(t)dt, \quad V[dN_m^2(t)] = \lambda_2(t)dt,
Cov[dN_m^1(t), dN_m^2(t)] = \lambda_{12}(t)dt.

Thus, we can approximate the instantaneous trade occurrence at t by the variance-covariance matrix

\frac{1}{dt} V\begin{pmatrix} dN_m^1(t) \\ dN_m^2(t) \end{pmatrix} = \begin{pmatrix} \lambda_1(t) & \lambda_{12}(t) \\ \lambda_{12}(t) & \lambda_2(t) \end{pmatrix}.   (14.33)

This matrix measures the instantaneous liquidity risk and is similar to the volatility-covolatility matrix that represents the risk on returns. Its diagonal elements give the marginal trading rates of each asset (called activity measures). The off-diagonal element \lambda_{12} is called the coactivity measure. It measures the intensity of simultaneous trades of both assets. Figure 14.12 shows the activity-coactivity measures for the stocks Alcatel and St. Gobain traded on the Paris Bourse. The coactivity has been multiplied by 2 \cdot 10^6 to allow for a comparison.
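The activity and coactivity measures can be recovered from binned trade counts, as the moment relations above suggest. The three independent Poisson streams below (asset-1-only, asset-2-only, joint trades) are an illustrative data-generating process with constant rates:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical constant rates: asset-1-only, asset-2-only, joint trades
lam11, lam22, lam12 = 40.0, 30.0, 10.0
M, n_bins = 2000, 100            # days, intraday bins on [0, 1]
dt = 1.0 / n_bins

# bin counts of the two observable counting processes dN^1, dN^2
c1 = rng.poisson(lam11 * dt, (M, n_bins))
c2 = rng.poisson(lam22 * dt, (M, n_bins))
c12 = rng.poisson(lam12 * dt, (M, n_bins))
d1, d2 = c1 + c12, c2 + c12

lam1_hat = d1.mean() / dt        # activity of asset 1: lam11 + lam12
lam2_hat = d2.mean() / dt        # activity of asset 2: lam22 + lam12
cov = np.mean((d1 - d1.mean()) * (d2 - d2.mean()))
lam12_hat = cov / dt             # coactivity, off-diagonal of (14.33)
print(lam1_hat, lam2_hat, lam12_hat)
```

The covariance of the binned increments identifies the coactivity, while their means identify the marginal activities, exactly as in the matrix (14.33).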

14.4.3 Autoregressive Conditional Duration

The Models

Figure 14.12 Activity and Coactivity Measures, Alcatel-St. Gobain (intraday activities of Alcatel and Saint Gobain and their coactivity, plotted against intraday time in seconds, 10:00-17:00).

In the spirit of ARCH models introduced for time series of returns, Engle and Russell (1997, 1998a, 1998b) developed a similar specification for series of durations. The basic idea consists of introducing a path-dependent time deformation such that the durations expressed on the new time scale are i.i.d. More precisely, let us consider the sequence of observed durations \tau_n, n = 1, .... We denote by

\psi_n = a(\tau_{n-1}, \tau_{n-2}, ...),   (14.34)

a time scale function of the lagged durations. Then, we assume that

\varepsilon_n = \tau_n / \psi_n,   (14.35)

is a sequence of i.i.d. variables with the same distribution f. Expressions (14.34) and (14.35) define the family of autoregressive conditional duration (ACD) models. The ACD models differ by the assumptions on the functional form of the time deformation and on the distribution of the errors.

EXAMPLE 14.5: Engle and Russell proposed a type of ARMA dynamics for \psi_n:

\psi_n = \omega + \sum_{i=1}^{p} \alpha_i \tau_{n-i} + \sum_{j=1}^{q} \beta_j \psi_{n-j},

with \omega > 0, \alpha_i \geq 0, \beta_j \geq 0.
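A minimal simulation of an ACD(1,1) with unit-exponential errors illustrates the recursion of Example 14.5; the parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# ACD(1,1): psi_n = w + a * tau_{n-1} + b * psi_{n-1}, tau_n = psi_n * eps_n
w, a, b = 0.1, 0.2, 0.7          # arbitrary values with a + b < 1
N = 100_000
eps = rng.exponential(1.0, N)    # i.i.d. unit-exponential errors
tau = np.empty(N)
psi = w / (1.0 - a - b)          # start at the stationary mean
for n in range(N):
    tau[n] = psi * eps[n]
    psi = w + a * tau[n] + b * psi

# stationary mean of durations: E[tau] = w / (1 - a - b) = 1
print(tau.mean())
```

The simulated durations are serially dependent through psi_n, while the rescaled durations tau_n / psi_n are i.i.d. by construction, in line with (14.35).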

Appendix 14.2 Global or Sequential Matching

LEMMA 14.2: The price p^* lies between p_1 and p_2.

PROOF: The price p^* corresponds to a zero global excess demand:

\Delta(p^*) = 0,

where \Delta(p) = D_1(p) + D_2(p) - S_1(p) - S_2(p) is a decreasing function. The price p_2 is such that

\Delta^*(p_2) = 0,

where \Delta^*(p) = D_2(p) + (D_1(p) - V_1)^+ - S_2(p) - (S_1(p) - V_1)^+ is a decreasing function. Moreover, the price p_1 gives a null excess demand D_1 - S_1 or, equivalently, it solves

\delta(p_1) = 0,

where \delta(p) = \Delta(p) - \Delta^*(p) = \min(0, D_1 - V_1) - \min(0, S_1 - V_1). This last function is decreasing, with positive values for p < p_1 and negative values otherwise. We can also note that \Delta(p) = \Delta^*(p) + \delta(p).

Let us now consider the case p_1 < p_2 (the reasoning is similar in the opposite case). We get

\Delta(p_1) = \Delta^*(p_1) + \delta(p_1) = \Delta^*(p_1) > 0,
\Delta(p_2) = \Delta^*(p_2) + \delta(p_2) = \delta(p_2) < 0.

By the mean value theorem, we deduce that p^* \in [p_1, p_2]. QED
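The lemma can be illustrated numerically with hypothetical linear demand and supply curves; the function Delta* below follows the definitions used in the proof:

```python
# Hypothetical linear curves for the two order categories
D1 = lambda p: 10.0 - p
S1 = lambda p: p
D2 = lambda p: 8.0 - p
S2 = lambda p: 0.5 * p

def root(f, lo=0.0, hi=20.0):
    # bisection for a decreasing function f
    for _ in range(100):
        m = 0.5 * (lo + hi)
        lo, hi = (m, hi) if f(m) > 0 else (lo, m)
    return 0.5 * (lo + hi)

p1 = root(lambda p: D1(p) - S1(p))           # category-1 equilibrium
V1 = D1(p1)                                  # volume exchanged at p1
p_star = root(lambda p: D1(p) + D2(p) - S1(p) - S2(p))   # global matching
p2 = root(lambda p: D2(p) + max(D1(p) - V1, 0.0)
                    - S2(p) - max(S1(p) - V1, 0.0))      # Delta*(p2) = 0
print(p1, p_star, p2)   # p_star lies between p1 and p2
```

With these curves, p1 = 5, p2 = 5.2, and the global matching price p* = 36/7 indeed falls between them.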

LEMMA 14.3: V^* \leq V_1 + V_2, and if p_1 < p_2, any seller served at T by the global matching is also served by the sequential procedure.

PROOF: It is sufficient to prove the second part of the lemma, of which the first part is a direct consequence. Let us now consider the sell orders in the sequential procedure.

• If the limiting price is p < p_1 and the order enters in the first subperiod, it is filled at T_1.
• If the limiting price is p < p_1 and the order enters in the second subperiod, it is filled at T.
• If the limiting price is between p_1 and p_2, it is filled at T.

The result follows. QED

15 Market Indexes

STOCK MARKET INDEXES play an important role in finance. They summarize the joint evolution of multiple assets, provide proxies for market portfolios, and consequently allow testing of structural models such as the Capital Asset Pricing Model (CAPM). Derivatives written on stock indexes are among the most actively traded assets on stock markets. In the first section, we recall standard definitions of price indexes. The concept of stock market indexes is borrowed from consumption theory. Indeed, the first indexes ever built were the consumer price indexes. They were introduced into the literature at the end of the nineteenth century to examine the effect of gold mine discoveries on inflation (see, e.g., Laspeyres 1864; Jevons 1863, 1865; Kramar 1886; Nicholson 1887). Two widely used index formulas are the so-called Laspeyres and Paasche indexes; we compare their properties. The second section is devoted to the use and description of stock market indexes. In Section 15.3, we explain why a linear dynamic factor model fails to accommodate asset prices and market indexes simultaneously. Finally, in Section 15.4, we discuss the endogenous selection of weights and assets included in market indexes.

15.1 Price Indexes

15.1.1 Basic Notions

The use of stock market indexes follows from the tradition of computing standard consumer price indexes (see, e.g., Laspeyres 1864; Jevons 1865; Paasche 1870; Schumpeter 1905; Fisher 1922; Konus 1939). Therefore, it is insightful to recall the definitions of consumer price indexes before introducing the stock market indexes.


Let us consider two states, corresponding to two distinct dates, denoted 0 and 1. The consumption bundle includes n goods, with quantities q_0 = (q_{10}, ..., q_{n0})' in state 0 and q_1 = (q_{11}, ..., q_{n1})' in state 1. The associated price vectors are p_0 = (p_{10}, ..., p_{n0})' and p_1 = (p_{11}, ..., p_{n1})'. The modification of consumer expenditure between the two states is

\frac{W_1}{W_0} = \frac{p_1'q_1}{p_0'q_0} = \frac{p_1'q_1}{p_1'q_0} \cdot \frac{p_1'q_0}{p_0'q_0} = P_{1/0}(q) L_{1/0}(p).   (15.1)

The second factor of this multiplicative decomposition captures the price effect for an unchanged bundle of goods, whereas the first factor measures the quantity effect evaluated at the same price levels. The price and quantity effects differ essentially in terms of weights, which may correspond to either state 0 or state 1. By convention, an index with weights that correspond to state 0 is called the Laspeyres index. When the weights correspond to state 1, it is called the Paasche index. We distinguish four different indexes underlying decomposition (15.1), namely, a Laspeyres index for prices

L_{1/0}(p) = \frac{p_1'q_0}{p_0'q_0},   (15.2)

a Paasche index for prices

P_{1/0}(p) = \frac{p_1'q_1}{p_0'q_1},   (15.3)

a Laspeyres index for quantities

L_{1/0}(q) = \frac{p_0'q_1}{p_0'q_0},   (15.4)

and a Paasche index for quantities

P_{1/0}(q) = \frac{p_1'q_1}{p_1'q_0}.   (15.5)

Thus, the decomposition of the relative expenditure modification can be written as

\frac{W_1}{W_0} = L_{1/0}(p) P_{1/0}(q) = P_{1/0}(p) L_{1/0}(q).   (15.6)
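Formulas (15.2)-(15.6) are straightforward to compute; the price and quantity data below are illustrative:

```python
import numpy as np

# Illustrative two-state data for n = 3 goods
p0 = np.array([1.0, 2.0, 4.0]); q0 = np.array([10.0, 5.0, 2.0])
p1 = np.array([1.2, 2.0, 5.0]); q1 = np.array([9.0, 6.0, 2.0])

L_p = (p1 @ q0) / (p0 @ q0)      # Laspeyres price index (15.2)
P_p = (p1 @ q1) / (p0 @ q1)      # Paasche price index (15.3)
L_q = (p0 @ q1) / (p0 @ q0)      # Laspeyres quantity index (15.4)
P_q = (p1 @ q1) / (p1 @ q0)      # Paasche quantity index (15.5)

W_ratio = (p1 @ q1) / (p0 @ q0)  # W_1 / W_0
print(L_p * P_q, P_p * L_q, W_ratio)   # both products equal W_1/W_0, (15.6)
```

Both decompositions in (15.6) reproduce the expenditure ratio exactly, whichever state supplies the weights.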

15.1.2 Fixed-Base versus Chain Index

A multiplicative decomposition involving more than two states (i.e., dates) is rather complex. However, it has to be determined whenever a consumption pattern over a sequence of dates t = 0, 1, 2, ... is examined. In such a case, several alternative approaches can be adopted.

Fixed Base

We can select a benchmark state (date), set by convention at t = 0, that is used to define the weights of the Laspeyres indexes at all future dates. Then, we have

\frac{W_t}{W_0} = L_{t/0}(p) P_{t/0}(q) = \frac{p_t'q_0}{p_0'q_0} \cdot \frac{p_t'q_t}{p_t'q_0}.

A set of coherent measures of price evolutions is defined by the ratio

I_t = \frac{p_t'q_0}{p_0'q_0}, \quad t = 0, 1, ...,   (15.7)

which is called the index value with base 0. The corresponding evolution of the index between t and t + h is

I_{t+h|t} = \frac{I_{t+h}}{I_t}.

w:Wt-I = Ctlt-I (P)Ptlt-M) t

p~qt-I p~qt

=P~-Iqt-I p~qt-I ' and to measure the price evolution between t - 1 and t by

p;qt-I I tlt- I = p----'------- . t-lqt-I The price index is finally defined by

(15.8)

(15.9)

with 10 = 1, by convention.


Alternatively, the composite indexes can be based on the Paasche formula, for which temporal consistency is ensured by the chain index

$$I_t^* = \prod_{\tau=1}^{t} I_{\tau/\tau-1}^*, \qquad \text{where } I_{t/t-1}^* = \frac{p_t'q_t}{p_{t-1}'q_t}.$$

Even though it seems more natural to select weights corresponding to date t - 1 rather than weights corresponding to a future date t, we have to remember that decomposition formula (15.6) represents a mixture of the Laspeyres and Paasche indexes. Generally, when alternative states correspond to different dates, Laspeyres price indexes are selected. For international comparisons for which the states correspond to different countries, both indexes can be considered.
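As a numerical illustration (a sketch with invented prices and quantities, not data from the book), the fixed-base Laspeyres index (15.7) and the chain index (15.8)-(15.9) can be computed as follows:

```python
import numpy as np

# prices[t, i] and quantities[t, i] for dates t = 0, 1, 2 and goods i = 1, 2
prices = np.array([[1.0, 2.0],
                   [1.1, 2.2],
                   [1.3, 2.1]])
quantities = np.array([[10.0, 5.0],
                       [ 9.0, 6.0],
                       [ 8.0, 7.0]])

def fixed_base_index(prices, q0):
    """I_t = p_t'q_0 / p_0'q_0 (Laspeyres with base 0)."""
    return prices @ q0 / (prices[0] @ q0)

def chain_index(prices, quantities):
    """I_t = prod_{tau<=t} p_tau'q_{tau-1} / p_{tau-1}'q_{tau-1}, I_0 = 1."""
    links = np.array(
        [prices[t] @ quantities[t - 1] / (prices[t - 1] @ quantities[t - 1])
         for t in range(1, len(prices))])
    return np.concatenate(([1.0], np.cumprod(links)))

I_fixed = fixed_base_index(prices, quantities[0])
I_chain = chain_index(prices, quantities)
print(I_fixed)   # both series start at 1 by construction
print(I_chain)
```

The two indexes coincide at $t = 1$ but diverge afterward, because the chain index reweights with the most recent bundle at each link.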

15.1.3 Comparison of Laspeyres and Paasche Indexes

The selection of weights may have a significant impact on the measure of price evolution. A comparison between the Laspeyres and Paasche indexes provides insights on the direction and magnitude of these effects. We consider again the two-state setup.

PROPOSITION 15.1:

(i) $\mathcal{L}_{1/0}(p) = \sum_{i=1}^{n} \alpha_{i0}\,\dfrac{p_{i1}}{p_{i0}}$, where $\alpha_{i0} = p_{i0}q_{i0}/p_0'q_0$;

(ii) $\mathcal{P}_{1/0}(p) = \left(\sum_{i=1}^{n} \alpha_{i1}\,\dfrac{p_{i0}}{p_{i1}}\right)^{-1}$, where $\alpha_{i1} = p_{i1}q_{i1}/p_1'q_1$.

This proposition is easily proved. It provides an interpretation of the composite Laspeyres index as an arithmetic average of elementary price changes with weights corresponding to the budget shares $\alpha_{i0}$ in state 0. Similarly, the composite Paasche index is a harmonic average of elementary price changes with weights corresponding to the budget shares $\alpha_{i1}$ in state 1. Proposition 15.2 is directly implied by the Jensen inequality.

PROPOSITION 15.2: If the budget shares are invariant, $\alpha_{i0} = \alpha_{i1}$, $\forall i$, then $\mathcal{L}_{1/0}(p) \geq \mathcal{P}_{1/0}(p)$.

Even if the budget shares are not invariant, this inequality is often satisfied in practice. This is due to Proposition 15.3.

PROPOSITION 15.3:

$$\mathcal{L}_{1/0}(p) - \mathcal{P}_{1/0}(p) = -\mathcal{L}_{1/0}(q)^{-1}\,\mathrm{cov}_{\alpha_0}\!\left(\frac{p_{i1}}{p_{i0}}, \frac{q_{i1}}{q_{i0}}\right),$$

where $\mathrm{cov}_{\alpha_0}$ denotes the covariance computed using the weights $\alpha_{i0}$, $i = 1, \ldots, n$.

PROOF: We get

$$\mathcal{L}_{1/0}(p) - \mathcal{P}_{1/0}(p) = \sum_{i=1}^{n} \alpha_{i0}\frac{p_{i1}}{p_{i0}} - \frac{\sum_{i=1}^{n} \alpha_{i0}\,\dfrac{p_{i1}}{p_{i0}}\,\dfrac{q_{i1}}{q_{i0}}}{\sum_{i=1}^{n} \alpha_{i0}\,\dfrac{q_{i1}}{q_{i0}}} = -\left(\sum_{i=1}^{n} \alpha_{i0}\frac{q_{i1}}{q_{i0}}\right)^{-1} \mathrm{cov}_{\alpha_0}\!\left(\frac{p_{i1}}{p_{i0}}, \frac{q_{i1}}{q_{i0}}\right).$$

QED

Standard consumption theory implies that this quantity is nonnegative. Indeed, if the elementary price ratio $p_{i1}/p_{i0}$ is higher for good $i$, the good is perceived as expensive, and its consumption decreases. Therefore, we expect $q_{i1}/q_{i0}$ to be a decreasing function of $p_{i1}/p_{i0}$. Hence, the covariance is negative, and the difference $\mathcal{L}_{1/0}(p) - \mathcal{P}_{1/0}(p)$ is positive.
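These propositions are easy to verify numerically. The sketch below (invented two-state data) checks Proposition 15.1(i) and the covariance identity of Proposition 15.3:

```python
import numpy as np

p0 = np.array([1.0, 2.0, 4.0]); q0 = np.array([10.0, 5.0, 2.0])
p1 = np.array([1.5, 2.2, 4.1]); q1 = np.array([ 7.0, 6.0, 2.5])

L_p = p1 @ q0 / (p0 @ q0)      # Laspeyres price index (15.2)
P_p = p1 @ q1 / (p0 @ q1)      # Paasche price index (15.3)
L_q = p0 @ q1 / (p0 @ q0)      # Laspeyres quantity index (15.4)

alpha0 = p0 * q0 / (p0 @ q0)   # budget shares in state 0
x = p1 / p0                    # elementary price changes
y = q1 / q0                    # elementary quantity changes

# Proposition 15.1(i): L_p is the alpha0-weighted arithmetic mean of x
assert np.isclose(L_p, alpha0 @ x)

# Proposition 15.3: L_p - P_p = -L_q^{-1} cov_alpha0(x, y)
cov = alpha0 @ (x * y) - (alpha0 @ x) * (alpha0 @ y)
assert np.isclose(L_p - P_p, -cov / L_q)
print(L_p, P_p)
```

With these data, quantities fall for the goods whose prices rise most, so the weighted covariance is negative and the Laspeyres index exceeds the Paasche index.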

15.2 Market Indexes

15.2.1 The Use of Market Indexes

Market indexes are designed for different purposes. They can be used as measures of asset price evolutions, benchmarks for evaluating the performance of portfolio management, supports of derivatives, and economic indicators. Since these various functions are not entirely compatible, indexes with various characteristics need to be designed.

Measure of Asset Price Evolution

The indicator needs to allow for a clear and quick interpretation of the sign and size of price changes. It has to be computationally simple and evaluated in practice from a limited sample of assets that respond quickly to shocks (i.e., are highly liquid). The weights may be set equal for all assets in the sample or else may depend on the current importance of assets in terms of their capitalization, that is, of the total capital corresponding to the issued shares.

Benchmark for Portfolio Management

Since a market index is often interpreted as the value of an efficient portfolio, it seems natural to assess the performance of a portfolio manager with respect to the performance of the market portfolio. We comment


below on the consequences of this practice on the composition of the market index. First, in general, a portfolio manager adopts a dynamic strategy that involves frequent updating of the portfolio allocation. It is natural to compare the performance of this manager to the performance of the efficient dynamic portfolio that admits a priori time-varying allocations. Thus, it is preferable to select a benchmark with time-varying allocations (i.e., a chain index) instead of a Laspeyres index, which has a fixed allocation that corresponds to a static management scheme. This is the case when the market index is weighted by current capitalization. Second, to assess personal performance, the manager takes into account the total return on his or her portfolio, including the modification of the portfolio value and the cash flows received during the management period. Let us consider a stock index that possibly accounts for the dividends:

$$\mathcal{L}_{t/t-1}(p) = \frac{\sum_{i=1}^{n} a_{i,t-1}\,p_{i,t}}{\sum_{i=1}^{n} a_{i,t-1}\,p_{i,t-1}}, \qquad (15.10)$$

$$\mathcal{L}_{t/t-1}(p,d) = \frac{\sum_{i=1}^{n} a_{i,t-1}\,(p_{i,t} + d_{i,t})}{\sum_{i=1}^{n} a_{i,t-1}\,p_{i,t-1}}, \qquad (15.11)$$

$$\mathcal{L}_{t/t-1}(p,d^*) = \frac{\sum_{i=1}^{n} a_{i,t-1}\,(p_{i,t} + d_{i,t}^*)}{\sum_{i=1}^{n} a_{i,t-1}\,p_{i,t-1}}, \qquad (15.12)$$

where $a_{i,t-1}$ is the quantity of asset $i$ held at date $t-1$, $d_{i,t}$ is the dividend received between $t-1$ and $t$, and $d_{i,t}^*$ is the dividend immediately reinvested in asset $i$. Therefore, it is useful to distinguish the price index (i.e., without dividends) from the return index (including dividends) since the changes of the latter are always larger. The return index is an adequate benchmark for evaluating portfolio performance, even though in practice portfolio managers use price indexes, which are easier to outperform. Third, finally note that market indexes are computed ex post, that is, after observing the prices. Such ex post performance measures have to be distinguished from ex ante measures, which take into account potential risk. Examples of such measures are the Sharpe performance coefficient (see Chapter 3) and the (conditional) Value at Risk (see Chapter 16).
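A one-period sketch (with invented holdings, prices, and dividends) makes the gap between the price index (15.10) and the return index (15.11) concrete:

```python
import numpy as np

a  = np.array([100.0, 50.0])     # quantities a_{i,t-1}
p0 = np.array([10.0, 20.0])      # prices p_{i,t-1}
p1 = np.array([10.2, 20.5])      # prices p_{i,t}
d  = np.array([0.30, 0.00])      # dividends d_{i,t} paid between t-1 and t

base = a @ p0
price_index  = a @ p1 / base             # (15.10): ignores dividends
return_index = a @ (p1 + d) / base       # (15.11): includes dividends

print(price_index, return_index)
```

Whenever any dividend is positive, the return index change exceeds the price index change, which is why a price index is the easier benchmark to outperform.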

Support of Derivatives

Options and futures on market indexes are among the most frequently traded derivatives. They were initially introduced as hedging instruments against market risk and are now also used by speculators, who try to benefit from transitory mispricing of these derivatives. The market indexes have to be updated very frequently (for instance, every 30 seconds)


to allow for frequent trading of the derivatives. They also have to be sufficiently volatile because otherwise the risk on the market index would not be large enough, and the derivatives would be useless. As well, the way the selection and updating of weights is implemented has to prevent perfect arbitrage opportunities.

Economic Indicator

The stock indexes summarize the values of companies and reflect the underlying economic fundamentals. In this regard, they provide useful inputs into macroeconomic studies and national accounting. They can be compared to other price indexes, like consumer price indexes, which generally are Laspeyres indexes. For this type of application, the weights of Laspeyres market indexes need to ensure sectorial representativeness.

15.2.2 Main Stock Market Indexes

US Indexes

The most commonly traded market indexes include those computed by Standard and Poor's (S&P), the Dow Jones Company, and the major stock exchanges. The main indexes are weighted by current capitalization. The composite New York Stock Exchange (NYSE) and the NASDAQ (National Association of Securities Dealers Automated Quotation) are comprehensive indexes, including all assets quoted by these institutions. The composite NYSE includes about 1,600 stocks. The S&P 500 includes 500 stocks of the NYSE, representing about 80% of the capitalization. The S&P 100 includes the 100 most important ones. The S&P 400 Midcap includes 400 assets that do not belong to the S&P 500: 246 from the NYSE, 141 from the NASDAQ, and 13 from the AMEX (American Stock Exchange). The last three indexes support their own derivatives. The Dow Jones is an index with equal weights. It includes 30 stocks that represent about 25% of the NYSE.

UK Indexes

The main indexes in the United Kingdom are jointly computed by the Financial Times and the London Stock Exchange (LSE). The FT-SE 100 (called Footsie) includes 100 assets that represent about 70% of the capitalization of the LSE. It is weighted by current capitalization and supports options and forward contracts. The FT-30 includes the 30 most important stocks, whereas the FT-Actuarial-All Shares is a general index. The latter index includes not only 650 stocks, but also bonds. It represents about 80% of the total capitalization.


Japanese Indexes

For a long time, the Nikkei was a market index with equal weights. The weights are now related to capitalization. It includes about 225 stocks, for 70% of the total capitalization of the Tokyo Stock Exchange. The TOPIX index is a comprehensive index for the first section of the Tokyo Stock Exchange (about 1,100 stocks). It is weighted by capitalization. Both indexes support their own derivatives.

French Indexes

French indexes are computed and diffused by the Paris Bourse (Euronext). The CAC 40 (CAC stands for Cotation Assistée en Continu, i.e., continuous quotation) is continuously computed and diffused every 30 seconds. It supports derivatives and includes 40 important stocks. The general indexes SBF 120 and SBF 250 (Société des Bourses Françaises) are computed daily. They include 120 and 250 assets, respectively, and take into account sectorial representativeness. The "indice second marché" is an index for emerging markets. It is computed daily and is a comprehensive index of the secondary market, with a highly volatile composition due to the variation in the companies that join or quit the index. Table 15.1 provides some insights on the dynamics of an emerging market.

Canadian Indexes

The Canadian indexes are managed by the Toronto Stock Exchange (TSE). The TSE 35 and TSE 100 include 35 and 100 of Canada's largest corporations, respectively. They support derivatives such as the TIPS 35 and TIPS

Table 15.1 The Size of the Paris Secondary Market (1983-1991)

Year | Number of Stocks (End of December)
1983 | 43
1984 | 72
1985 | 127
1986 | 180
1987 | 258
1988 | 286
1989 | 298
1990 | 295
1991 | 288

(For each year, the original table also reports the number of arriving and quitting stocks and their percentages of total capitalization.)


100, which are portfolios that mimic the indexes. The TSE 300 composite index is the general index and includes 14 groups of companies.

15.2.3 Market Index and Market Portfolio

The test of the CAPM hypothesis is usually performed by regressing the asset returns (or gains) on the market portfolio return (or value change). This regression may include an intercept that either is constant (Proposition 4.3) or depends on lagged variables (Proposition 6.4). Under the CAPM hypothesis, the intercept is equal to 0. Market indexes are often used as proxies for the value of the market portfolio. This approximation may result in misleading conclusions of the CAPM hypothesis tests. This argument is known in the literature as Roll's critique (Roll 1977). It is mainly due to the structure of market indexes with asset components that are time varying, whereas the market portfolio is theoretically based on a fixed set of assets. Contrary to the market portfolio, the weights in market indexes may be cap constrained. As well, the market indexes may or may not include dividends. A formal discussion of this problem is given below in the framework of an error-in-variable model. Under the CAPM hypothesis, the regression model

$$Y_t = c + b\,Y_{m,t} + u_t,$$

where $Eu_t = 0$, $\mathrm{Cov}(u_t, Y_{m,t}) = 0$, admits a zero intercept: $c = 0$. Let us assume that we approximate the market portfolio return $Y_{m,t}$ by the rate of increase of a market index $I_{m,t}$, say, and consider the regression

$$Y_t = c^* + b^*\,I_{m,t} + u_t^*,$$

where $Eu_t^* = 0$, $\mathrm{Cov}(u_t^*, I_{m,t}) = 0$. We get

$$c^* = EY_t - \frac{\mathrm{Cov}(Y_t, I_{m,t})}{V(I_{m,t})}\,E(I_{m,t}) \qquad (15.13)$$

$$= \frac{\mathrm{Cov}(Y_t, Y_{m,t})}{V(Y_{m,t})}\,E(Y_{m,t}) - \frac{\mathrm{Cov}(Y_t, I_{m,t})}{V(I_{m,t})}\,E(I_{m,t}). \qquad (15.14)$$

In general, this coefficient is different from 0. For instance, in the error-in-variable model,

$$I_{m,t} = Y_{m,t} + \eta_t,$$

where $E\eta_t = 0$, $\mathrm{Cov}(\eta_t, Y_{m,t}) = 0$, $\mathrm{Cov}(\eta_t, Y_t) = 0$, the intercept is equal to


$$c^* = \frac{\mathrm{Cov}(Y_t, Y_{m,t})}{V(Y_{m,t})}\,E(Y_{m,t}) - \frac{\mathrm{Cov}(Y_t, Y_{m,t})}{V(Y_{m,t}) + V(\eta_t)}\,E(Y_{m,t}) \qquad (15.15)$$

$$= \mathrm{Cov}(Y_t, Y_{m,t})\,E(Y_{m,t})\left[\frac{1}{V(Y_{m,t})} - \frac{1}{V(Y_{m,t}) + V(\eta_t)}\right] \neq 0. \qquad (15.16)$$

The approximation of the market portfolio by a market index leads to a spurious rejection of the true CAPM hypothesis.
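The spurious-intercept effect can be reproduced by simulation. In the sketch below (all parameter values are illustrative), the asset return satisfies the CAPM relation with a zero intercept, yet regressing it on a noisy index proxy yields a clearly nonzero intercept, as in (15.15)-(15.16):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000
beta = 1.2

Ym = 0.05 + 0.04 * rng.standard_normal(T)      # market portfolio return
Y = beta * Ym + 0.02 * rng.standard_normal(T)  # CAPM relation with c = 0
Im = Ym + 0.03 * rng.standard_normal(T)        # market index = noisy proxy

def ols_intercept(x, y):
    """Intercept of the OLS regression of y on x."""
    b = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
    return y.mean() - b * x.mean()

c_true = ols_intercept(Ym, Y)    # approximately zero
c_proxy = ols_intercept(Im, Y)   # biased away from zero
print(c_true, c_proxy)
```

The measurement noise attenuates the slope toward zero, and the lost explanatory power is absorbed by the intercept, so a true CAPM relation looks rejected.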

15.3 Price Index and Factor Model

By selecting a fixed portfolio to construct the Laspeyres index, we induce a lack of representativeness of the asset price dynamics in the long run. We first consider this problem when the relative asset price changes are i.i.d. Then, the analysis is extended to a dynamic model including factors.

15.3.1 Limiting Behavior of a Price Index

Let us consider the price changes $Y_{i,t} = p_{i,t}/p_{i,t-1}$, $i = 1, \ldots, n$, and assume that the vectors $(Y_{1,t}, \ldots, Y_{n,t})'$, $t$ varying, are independent with identical distribution. We denote by $\mu$ and $\Omega$ the mean and variance-covariance matrix of $(\log Y_{1,t}, \ldots, \log Y_{n,t})'$, respectively. The price at date $t$ can be written as

$$p_{i,t} = p_{i,0}\,Y_{i,1} \cdots Y_{i,t} = p_{i,0}\exp\Big\{\sum_{\tau=1}^{t} \log Y_{i,\tau}\Big\} = p_{i,0}\exp(t\mu_i)\exp\Big\{\sqrt{t}\,\Big[\frac{1}{\sqrt{t}}\sum_{\tau=1}^{t} (\log Y_{i,\tau} - \mu_i)\Big]\Big\}.$$

Therefore, for large $t$, we can apply the central limit theorem, yielding the approximation

$$p_{i,t} \simeq p_{i,0}\exp(t\mu_i)\exp(\sqrt{t}\,u_i), \qquad (15.17)$$

where the vector $u = (u_1, \ldots, u_n)'$ is Gaussian with mean 0 and variance-covariance matrix $\Omega$. The above approximation can be used to derive an asymptotic absence of arbitrage opportunity (AAO) condition. Let us consider an investor with a fixed arbitrage portfolio allocation in assets $i$ and $j$, where $\mu_i > \mu_j$. The allocation satisfies

$$a_i\,p_{i,0} + a_j\,p_{j,0} = 0, \qquad a_i > 0,$$

and the portfolio value at $t$ is

$$W_t = a_i\,p_{i,t} + a_j\,p_{j,t} \simeq a_i\,p_{i,0}\exp(t\mu_i + \sqrt{t}\,u_i) + a_j\,p_{j,0}\exp(t\mu_j + \sqrt{t}\,u_j),$$

for large $t$. We deduce that

$$P[W_t > 0] = P\Big[u_i - u_j > \frac{1}{\sqrt{t}}\log\Big(-\frac{a_j\,p_{j,0}}{a_i\,p_{i,0}}\Big) - \sqrt{t}\,(\mu_i - \mu_j)\Big],$$

which tends to 1 when $t$ tends to infinity. With a zero initial endowment, this static portfolio asymptotically ensures a positive gain with probability 1. We obtain the following condition of no asymptotic arbitrage opportunity:

PROPOSITION 15.4: A necessary condition for asymptotic AAO is $\mu_i = E \log Y_{i,t} = \mu$, independent of the asset.

Under this condition, we have $\lim_{t \to \infty} P[W_t > 0] = 1/2$.

REMARK 15.1: If there exists a risk-free asset with a constant rate $r$, the condition implies $E \log Y_{i,t} = \log(1 + r)$, and by the Jensen inequality,

$$E Y_{i,t} = E(\exp \log Y_{i,t}) > \exp(E \log Y_{i,t}) = 1 + r.$$

Therefore, the condition is compatible with the existence of a risk premium.

From the above analysis of basic price evolutions, we can easily infer the asymptotic behavior of a Laspeyres price index. Let us consider the evolution of the price index between $t - 1$ and $t$. We get

$$\frac{\mathcal{L}_{t/0}(p)}{\mathcal{L}_{t-1/0}(p)} = \sum_{i=1}^{n} \left\{\frac{q_{i,0}\,p_{i,t-1}}{\sum_{j=1}^{n} q_{j,0}\,p_{j,t-1}}\right\} Y_{i,t} \simeq \sum_{i=1}^{n} \left\{\frac{q_{i,0}\,p_{i,0}\exp(t\mu_i + \sqrt{t}\,u_i)}{\sum_{j=1}^{n} q_{j,0}\,p_{j,0}\exp(t\mu_j + \sqrt{t}\,u_j)}\right\} Y_{i,t}, \qquad (15.18)$$

where the variables $Y_{i,t}$, $i = 1, \ldots, n$, are independent of the asymptotic variables $u_i$, $i = 1, \ldots, n$. The relative change in the price index is a weighted average of the asset price changes $Y_{i,t}$, $i = 1, \ldots, n$, with stochastic weights. Let us now discuss the choice of weights by distinguishing two cases. In the first case, if the condition of asymptotic AAO is not satisfied, there is an asset, asset 1, say, with the highest mean $\mu_1$. Then, the change of the price index tends to be driven by the change in the price of asset 1. Asymptotically, the Laspeyres index does not take into account the price movements of the whole set of assets.


In the second case, if the means $\mu_i$, $i = 1, \ldots, n$, are equal, we get, asymptotically,

$$\frac{\mathcal{L}_{t/0}(p)}{\mathcal{L}_{t-1/0}(p)} \simeq \sum_{i=1}^{n} 1_{(u_i > u_j,\ \forall j \neq i)}\, Y_{i,t}. \qquad (15.19)$$

The index is no longer comprehensive for large $t$. However, the prevailing asset is now randomly selected, with the probability of any asset being drawn depending on the volatility-covolatility matrix $\Omega$.
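A small Monte Carlo sketch (with invented volatilities and horizon) illustrates the concentration effect behind (15.19): even when the means of the log price changes are equal across assets, the normalized Laspeyres weights typically end up dominated by a single, randomly selected asset.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 5, 20_000
sigma = np.array([0.01, 0.02, 0.03, 0.04, 0.05])   # asset-specific volatilities

# log price relatives with a common mean, so the AAO condition holds
log_y = -0.0005 + sigma * rng.standard_normal((t, n))
log_p = log_y.cumsum(axis=0)                 # log(p_{i,t} / p_{i,0})

# Laspeyres weights, taking q_{i,0} p_{i,0} = 1 for every asset
w = np.exp(log_p[-1] - log_p[-1].max())      # subtract max for numerical stability
w /= w.sum()
print(np.round(w, 4))   # the weights typically concentrate on one asset
```

Because the log prices diverge at rate $\sqrt{t}$, the exponential weights of all but the "luckiest" asset are driven toward zero, exactly as in the asymptotic expansion above.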

15.3.2 The Effect of Factors

The discussion can be extended to a dynamic factor representation of the relative changes of asset prices. Let us consider a linear factor model (see Chapter 9):

$$Y_{i,t} = p_{i,t}/p_{i,t-1} = a_i'F_t + \varepsilon_{i,t}, \qquad (15.20)$$

where $(F_t)$ is strongly stationary and independent of the strong white noise $(\varepsilon_t) = (\varepsilon_{1,t}, \ldots, \varepsilon_{n,t})'$. Under the necessary condition of asymptotic AAO,

$$E \log(a_i'F_t + \varepsilon_{i,t}) = \mu, \qquad \text{independent of } i, \qquad (15.21)$$

the change of the index is

$$\frac{\mathcal{L}_{t/0}(p)}{\mathcal{L}_{t-1/0}(p)} \simeq \Big[\sum_{i=1}^{n} 1_{(u_i > u_j,\ \forall j \neq i)}\,a_i'\Big]F_t + \sum_{i=1}^{n} 1_{(u_i > u_j,\ \forall j \neq i)}\,\varepsilon_{i,t}.$$

The Laspeyres index also satisfies a linear factor model with the same factors $(F_t)$, but with stochastic coefficients instead. Therefore, only linear factor representations with stochastic coefficients provide a coherent specification of both asset price and composite index dynamics.

REMARK 15.2: The above result may not be valid for other types of indexes. To understand this point, let us consider a Paasche chain index. We get

$$\frac{\mathcal{P}_{t/0}(p)}{\mathcal{P}_{t-1/0}(p)} = \frac{\sum_{i=1}^{n} q_{i,t}\,p_{i,t}}{\sum_{i=1}^{n} q_{i,t}\,p_{i,0}} \Bigg/ \frac{\sum_{i=1}^{n} q_{i,t-1}\,p_{i,t-1}}{\sum_{i=1}^{n} q_{i,t-1}\,p_{i,0}}$$

$$= \sum_{i=1}^{n} \left\{\frac{q_{i,t-1}\,p_{i,t-1}}{\sum_{j=1}^{n} q_{j,t-1}\,p_{j,t-1}}\right\} \frac{q_{i,t}}{q_{i,t-1}}\,\frac{p_{i,t}}{p_{i,t-1}}\;\left[\sum_{i=1}^{n} \left\{\frac{q_{i,t-1}\,p_{i,0}}{\sum_{j=1}^{n} q_{j,t-1}\,p_{j,0}}\right\} \frac{q_{i,t}}{q_{i,t-1}}\right]^{-1}.$$

The asymptotic expansion of the relative change involves both the price and quantity dynamics. Hence, the factors driving the changes of the Paasche index are not only price factors, but also quantity factors.

15.4 Endogenous Selectivity

In a standard consumer price index, the set of goods included in the index is invariant. This explains why we denoted the goods by $i = 1, \ldots, n$ without allowing for their dependence on either the date or the environment. Indexes of financial assets often become endogenously modified over time, leading to a change of their interpretation. Typically, as seen in Section 15.3, the choice of a fixed portfolio implies a lack of representativeness in the long run. Various schemes of endogenous selection of weights can be considered for the sake of representativeness or liquidity. Indeed, the index components have to be liquid for at least two reasons. The observed asset prices have to be competitive. Moreover, since several indexes support derivatives, any asset in the index has to be liquid enough to allow for arbitrage strategies.

15.4.1 Selection of the Weights

The decisions on substantial and sudden modifications of the index composition are usually taken by scientific committees, whose meeting dates are never announced in advance to avoid speculative interventions on the markets. These committees may follow various policy rules, which are often based on a (static or dynamic) analysis of the underlying weights

$$\pi_{i,t} = p_{i,t}\,q_{i,t} \Big/ \sum_{j=1}^{N_t} p_{j,t}\,q_{j,t}, \qquad (15.22)$$

where $i = 1, \ldots, N_t$, and $N_t$ is the total number of assets traded at period $t$. We describe below two approaches that allow transformation of the weights; these are often followed in practice.

Index with a Fixed Number of Assets

The main market indexes usually include a fixed number of assets, such as the S&P 100 on the NYSE, which includes 100 assets; the CAC 40 on the Paris Bourse, which includes 40 assets; and so on. The assets can be selected by considering the most important capitalizations. Let us consider a Laspeyres index with underlying weights $\pi_{i,t-1}$, $i = 1, \ldots, N_t$. We classify the assets by decreasing capitalization,

$$\pi_{i_1,t-1} \geq \pi_{i_2,t-1} \geq \cdots,$$

and denote by $i_j$ the index of the asset with rank $j$. Then, the price index with a given number $N^0$ of assets is such that

$$\mathcal{L}_{t/t-1}^{0}(p) = \frac{\sum_{j=1}^{N^0} \pi_{i_j,t-1}\,p_{i_j,t}/p_{i_j,t-1}}{\sum_{j=1}^{N^0} \pi_{i_j,t-1}}. \qquad (15.23)$$


The evolution of this restricted index can differ from the evolution of an unrestricted index based on the set of all assets. Indeed, between $t-2$ and $t-1$, an asset may be removed from the selected basket if its weight diminishes significantly. It is sufficient that its price diminishes with respect to other asset prices. In contrast, a new asset may be included when its price increases significantly. Therefore, we get an endogenous selection scheme in which the price change of the newly included asset is generally above the average. Thus, we can expect the value of the restricted index $\mathcal{L}_{t/t-1}^{0}(p)$ to be greater than that of the unrestricted one.

Index with Cap

Let us consider an index that includes a given sample of assets:

$$\mathcal{L}_{t/t-1}(p) = \sum_{i=1}^{N} \pi_{i,t-1}\,p_{i,t}/p_{i,t-1},$$

where $\sum_{i=1}^{N} \pi_{i,t-1} = 1$. We now illustrate a nonlinear transformation of the weights, which is often performed in practice when some weights become too large. The reason is that such an underlying portfolio may not be sufficiently diversified. This is a typical situation on the Helsinki Stock Exchange, on which the largest stock represents about 50% of the market. To prevent this effect, an upper bound (a cap) on the weight values, 10%, say, can be imposed. The new weight bundle is derived by the following recursive algorithm.

1. The weights are ranked by decreasing value, and the weights larger than 10% are set equal to 10%; then, the other weights are transformed by a scale factor to make the total sum to 1. We get a new set of weights $\pi_{i,t-1}^{(1)}$, $i = 1, \ldots, N$.

2. The previous approach is reapplied to the set $\pi_{i,t-1}^{(1)}$, $i = 1, \ldots, N$, and so on, until all weights are less than or equal to 10%. This final set of weights is used to construct the index.

To illustrate this technique, let us consider $N = 20$ and the initial weights (in %)

$$20,\; 9,\; 8,\; 6,\; 6,\; 6,\; 6,\; 3, \ldots, 3.$$

In the first step, we truncate the weight 20 to get

$$10,\; 9,\; 8,\; 6,\; 6,\; 6,\; 6,\; 3, \ldots, 3;$$

then, we rescale by $9/8$ to get

$$10,\; 81/8,\; 9,\; 54/8, \ldots, 54/8,\; 27/8, \ldots, 27/8.$$

The second weight is larger than 10. Therefore, we apply the method again to get

$$10,\; 10,\; \frac{100 - 20}{100 - 10 - 81/8}\cdot 9,\; \ldots,$$

which is the final set of weights, since $\dfrac{100 - 20}{100 - 10 - 81/8}\cdot 9 < 10$.
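The capping algorithm is easy to implement; the sketch below reproduces the numerical example above (weights in percent, cap at 10).

```python
import numpy as np

def cap_weights(w, cap=10.0):
    """Iteratively truncate weights above `cap` and rescale the uncapped
    weights so that the total is preserved, as in the algorithm above."""
    w = np.asarray(w, dtype=float).copy()
    total = w.sum()
    capped = np.zeros(w.size, dtype=bool)
    while w.max() > cap + 1e-9:
        capped |= (w > cap)                    # cap any weight above the bound
        w[capped] = cap
        free = total - cap * capped.sum()      # mass left for the other weights
        w[~capped] *= free / w[~capped].sum()  # rescale to preserve the total
    return w

initial = np.array([20, 9, 8, 6, 6, 6, 6] + [3] * 13, dtype=float)
final = cap_weights(initial)
print(np.round(final, 4))
```

Each pass caps at least one additional weight, so the loop terminates; after two passes on this example, the first two weights sit at 10 and the third equals $(100-20)/(100-10-81/8)\cdot 9 \approx 9.014$, matching the computation in the text.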

15.4.2 Hedonic Index for Bonds

The construction of price indexes requires regularly observed asset prices from a time-invariant sample of assets. This condition is not always satisfied, especially when we consider bonds or derivative assets. Indeed, liquid bonds (or derivatives) at date $t-1$ and date $t$ generally do not have the same structure of cash flows. To describe this problem, we first consider an index for bonds in a complete market framework and then present the extensions to the incomplete market framework.

Complete Market

Let us consider a fixed portfolio of bonds, indexed by $i = 1, \ldots, n$, with quantities $q_{i,0}$, $i = 1, \ldots, n$. At date $t-1$, they pay cash flows $(f_{i,0}, f_{i,1}, \ldots, f_{i,H})$, $i = 1, \ldots, n$, at future dates $t, t+1, \ldots, t+H$ and have prices $p_{i,t-1}$, $i = 1, \ldots, n$. At date $t$, the bond prices are $p_{i,t}$, and the residual future cash flows are $(f_{i,1}, \ldots, f_{i,H})$, $i = 1, \ldots, n$. The modification of the Laspeyres index between $t-1$ and $t$ is

$$\frac{\mathcal{L}_{t/0}(p)}{\mathcal{L}_{t-1/0}(p)} = \frac{\sum_{i=1}^{n} q_{i,0}\,p_{i,t}}{\sum_{i=1}^{n} q_{i,0}\,p_{i,t-1}}. \qquad (15.24)$$

Under the complete market hypothesis, the bond prices can be written in terms of zero-coupon prices:

$$p_{i,t-1} = \sum_{h=0}^{H} f_{i,h}\,B(t-1,h), \qquad p_{i,t} = \sum_{h=1}^{H} f_{i,h}\,B(t,h-1) = \sum_{h=0}^{H} f_{i,h+1}\,B(t,h),$$

setting by convention $f_{i,H+1} = 0$, $\forall i$. By substituting into equation (15.24), we get

$$\frac{\mathcal{L}_{t/0}(p)}{\mathcal{L}_{t-1/0}(p)} = \frac{\sum_{i=1}^{n} q_{i,0}\big[\sum_{h=0}^{H} f_{i,h+1}\,B(t,h)\big]}{\sum_{i=1}^{n} q_{i,0}\big[\sum_{h=0}^{H} f_{i,h}\,B(t-1,h)\big]} = \frac{\sum_{h=0}^{H}\big(\sum_{i=1}^{n} q_{i,0}\,f_{i,h+1}\big)B(t,h)}{\sum_{h=0}^{H}\big(\sum_{i=1}^{n} q_{i,0}\,f_{i,h}\big)B(t-1,h)}$$

$$= \frac{\sum_{h=0}^{H}\big(\sum_{i=1}^{n} q_{i,0}\,f_{i,h}\big)B(t,h)}{\sum_{h=0}^{H}\big(\sum_{i=1}^{n} q_{i,0}\,f_{i,h}\big)B(t-1,h)} \times \frac{\sum_{h=0}^{H}\big(\sum_{i=1}^{n} q_{i,0}\,f_{i,h+1}\big)B(t,h)}{\sum_{h=0}^{H}\big(\sum_{i=1}^{n} q_{i,0}\,f_{i,h}\big)B(t,h)}.$$

We get the decomposition

$$\frac{\mathcal{L}_{t/0}(p)}{\mathcal{L}_{t-1/0}(p)} = \frac{\mathcal{L}_{t/0}(B)}{\mathcal{L}_{t-1/0}(B)}\,e_t(s), \qquad (15.25)$$

where $\mathcal{L}_{t/0}(B)/\mathcal{L}_{t-1/0}(B)$ is the modification of a Laspeyres index on zero-coupon bonds, and $e_t(s)$ is a residual term that measures a structural effect. Indeed, bond $i$, with the same name at dates $t-1$ and $t$, is not the same financial asset since its maturity has decreased, and the cash-flow pattern has been modified. Therefore, the index computed from bonds is a value index, which does not take into account the intermediate coupon payments. It still responds to the structural effect. Note that only the Laspeyres index computed from zero-coupon bonds has an interpretation as a price index.

Incomplete Market

Generally, the zero-coupon bonds are not actively traded on the market, and thus the associated prices are not observed. Therefore, the exact price index cannot be computed, but it may be approximated instead by means of a dynamic model of the term structure. More precisely, by using such a model, we derive date-by-date predictions of the underlying zero-coupon prices, $\hat{B}(t,h)$, say. Then, the modification of the index is approximated by

$$\frac{\hat{\mathcal{L}}_{t/0}(B)}{\hat{\mathcal{L}}_{t-1/0}(B)} = \frac{\sum_{h=0}^{H}\big(\sum_{i=1}^{n} q_{i,0}\,f_{i,h}\big)\hat{B}(t,h)}{\sum_{h=0}^{H}\big(\sum_{i=1}^{n} q_{i,0}\,f_{i,h}\big)\hat{B}(t-1,h)}. \qquad (15.26)$$

This approach is related to the theory of hedonic price indexes (see, e.g., Griliches 1961; Rosen 1974), introduced for consumption goods of varying quality. The idea underlying the hedonic index is to price the various qualities, or characteristics, and to consider next a consumption good as a portfolio of characteristics with a price that is inferred from the individual characteristic prices. The bond characteristics are various admissible maturities, and the cash flows represent the quantities of these characteristics.
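Decomposition (15.25) can be verified numerically. In the sketch below, the zero-coupon curves, cash flows, and quantities are all invented for illustration:

```python
import numpy as np

H = 4
B_prev = 0.96 ** np.arange(H + 1)    # B(t-1, h), h = 0, ..., H
B_curr = 0.95 ** np.arange(H + 1)    # B(t, h)

# cash-flow patterns (f_{i,0}, ..., f_{i,H}) and basket quantities q_{i,0}
f = np.array([[0.0, 5.0, 5.0, 5.0, 105.0],     # coupon bond
              [0.0, 0.0, 0.0, 0.0, 100.0]])    # zero-coupon-like bond
q0 = np.array([3.0, 2.0])

f_next = np.roll(f, -1, axis=1)      # residual pattern (f_{i,h+1})
f_next[:, -1] = 0.0                  # f_{i,H+1} = 0 by convention

p_prev = f @ B_prev                  # p_{i,t-1} = sum_h f_{i,h} B(t-1,h)
p_curr = f_next @ B_curr             # p_{i,t}   = sum_h f_{i,h+1} B(t,h)

g, g_next = q0 @ f, q0 @ f_next      # aggregate cash flows per maturity

value_ratio = (q0 @ p_curr) / (q0 @ p_prev)        # left side of (15.25)
zc_ratio = (g @ B_curr) / (g @ B_prev)             # zero-coupon Laspeyres part
struct_effect = (g_next @ B_curr) / (g @ B_curr)   # structural effect e_t(s)
print(value_ratio, zc_ratio * struct_effect)       # identical, by (15.25)
```

Replacing `B_curr` by model-based predictions gives the hedonic approximation (15.26), in which each bond is priced as a portfolio of maturity characteristics.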

15.5 Summary

A market index is commonly perceived as an empirical counterpart of the theoretical concept of the market portfolio, a key ingredient of the CAPM. It is important to realize, however, that market indexes have various structures and compositions. As a consequence, not all market indexes provide good approximations of the market portfolio. Section 15.1 introduced basic definitions of consumer price indexes, including the well-known Paasche and Laspeyres indexes, extended in Section 15.2 to market indexes. We discussed the practical use of market indexes with


reference to their design and gave a brief overview of major stock market indexes. The following section illustrated the use of factor models to study market indexes. We showed that factor variables can efficiently account for the price and quantity effects, whenever random coefficients are introduced. More information on factor models can be found in Chapter 9. The last section concerned the choice of weights in constructing market indexes. This topic was investigated under the complete and incomplete market hypotheses.

16 Management of Extreme Risks

IN PREVIOUS CHAPTERS, risk on financial assets was measured by the conditional second-order moment representing the volatility. Recall that volatility-based risk underlies the mean-variance portfolio management rules and justifies the use of generalized autoregressive conditionally heteroscedastic (GARCH) and stochastic volatility models for predicting future risks. The variance, however, provides a correct assessment of risk only under some specific conditions, such as the normality of the conditional distribution of returns and a constant absolute risk aversion of investors. It also adequately represents small risks. In particular, a measure of a small market risk on an asset of value $X$ is based on the expected utility $E(U(X))$, which can be expanded into $U(E(X)) + 0.5\,U''[E(X)]\,V(X)$, revealing the importance of the first two moments. Such risks can be evaluated at short horizons from price processes with continuous trajectories. In contrast, the conditional variance shows poor performance as a measure of occasionally occurring extreme risks. This chapter introduces the models, risk measures, and optimal risk management rules under the presence of extreme risks. The market risk is inherently related to the probability of occurrence of extreme events, that is, very large negative or positive returns. For any random return variable, the probability of extremely valued observations is reflected by the size of the tails of the distribution. The benchmark for heavy tails is determined by the normal distribution. Normally distributed data are characterized by a relatively low probability of extreme realizations, and the tails of their distribution taper off quickly. Indeed, a standard normal variable admits values greater than 1.96 in absolute value only with 5% probability. In statistics, a standard measure of tails is the kurtosis, defined as the ratio $E[X^4]/[E[X^2]]^2$ for a zero-mean variable $X$.


It can be estimated by computing

$$\hat{k} = \frac{1}{T}\sum_{t=1}^{T} x_t^4 \Bigg/ \left[\frac{1}{T}\sum_{t=1}^{T} x_t^2\right]^2.$$

Whenever this quantity exceeds 3 (i.e., the theoretical kurtosis of a standard normal), we say that the data feature excess kurtosis, or that their distribution is leptokurtic, that is, has heavy tails. There also exist other, more sophisticated scalar measures of the thickness of tails. Among them, we distinguish a classical measure called the tail index. It is introduced in the first section, along with the estimator of the tail index originally proposed by Hill (1975). In Section 16.1, we review thick-tailed distributions commonly used in statistical analysis and introduce dynamic models that allow for path-dependent tails. The Value at Risk (VaR) is defined in Section 16.2. The VaR measures the maximum loss on a portfolio incurred within a fixed period with a given probability. It is used by bank regulators to define the minimum capital banks are required to hold to hedge against market risk on asset portfolios. We also show that the VaR can be interpreted as a conditional quantile and present selected quantile estimators. In Section 16.3, we analyze the sensitivity of the VaR to changes in portfolio allocations and study portfolio management when the VaR is used as a measure of risk instead of the volatility. Finally, in Section 16.4, we explain why standard constant absolute risk aversion (CARA) utility functions are not adapted to infrequent extreme risks and introduce a class of utility functions for extreme risk analysis.
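As a minimal sketch of this estimator (simulated data; the Student t distribution and its degrees of freedom are arbitrary illustrative choices):

```python
import numpy as np

def sample_kurtosis(x):
    """(1/T) sum x_t^4 / [(1/T) sum x_t^2]^2 for a demeaned series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2

rng = np.random.default_rng(2)
T = 100_000
k_gauss = sample_kurtosis(rng.standard_normal(T))          # close to 3
k_student = sample_kurtosis(rng.standard_t(df=7, size=T))  # leptokurtic
print(k_gauss, k_student)
```

The Student t draws produce a kurtosis estimate well above the Gaussian benchmark of 3, signaling heavy tails.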

16.1 Distributions with Heavy Tails

Empirical evidence suggests that the marginal and conditional distributions of returns feature heavy tails and therefore often admit extreme values. We show in Figure 16.1 the empirical distribution of returns on the Alcatel stock traded on the Paris Bourse in August 1996, sampled at 1-minute intervals. The density is centered at 4.259E-6, and its variance and standard deviation are 9.771E-7 and 0.000989, respectively. The distribution is slightly asymmetric, with a skewness coefficient of -0.00246. The high kurtosis of 5.329689 is due to the presence of heavy tails stretched between the extreme values of -0.00813 and 0.007255. Of the probability mass, 90% is concentrated between -0.0017 and 0.001738. The interquartile range 0.000454 is 100 times smaller than the overall range 0.01538. The kurtosis of real-time data is generally lower than the kurtosis of data sampled regularly in calendar time, which is approximately 10 for this return series. This is a standard effect of random time deformation on the empirical distribution (see Chapter 14). The shape of the tails, although kernel smoothed in Figure 16.1, suggests the presence of local irregularities in the rate of tail decay. Indeed, slight

[Figure 16.1 Density of Returns, Alcatel]

lobes can easily be distinguished in both tails. They may be due to the discreteness of prices and the preference of investors for round numbers (see Chapter 14). In the right tail, we observe a higher probability of returns taking values between 0.0012 and 0.0013 compared to the probability of those of a slightly smaller size (i.e., between 0.0010 and 0.0012). In the left tail, we recorded relatively more returns between -0.0014 and -0.0012 than those with marginally higher values.

16.1.1 Tail Index

for largey.

In general, distributions can be compared in terms of the asymptotic properties of the survivor function as y tends to +∞. Below, we give some examples of analytical expressions of survivor functions for selected distributions.
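As a numerical illustration of this comparison (our own sketch, not taken from the text), the survivor function of a Student t distribution dominates that of the standard normal far in the tail, and the ratio of the two grows without bound because a power-law decay is slower than an exponential one:

```python
import numpy as np
from scipy import stats

# Survivor functions 1 - F(y) at increasingly large thresholds y.
ys = np.array([2.0, 4.0, 6.0, 8.0])
surv_gauss = stats.norm.sf(ys)       # sf = survival function = 1 - cdf
surv_t = stats.t.sf(ys, df=3)        # Student t with 3 degrees of freedom

# The t distribution admits large values "more often": its survivor
# function is larger, and the gap widens as y grows.
ratio = surv_t / surv_gauss
```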

Management of Extreme Risks


Gaussian Distribution. Let us denote by φ and Φ the pdf and cdf, respectively, of the standard normal distribution.

$$\frac{1}{T}\sum_{t=1}^{T} \mathbf{1}\left[-a'Y_t - \widehat{VaR}(a,\alpha) > 0\right] = \alpha,$$

where 1[.] is an indicator function. The nonparametric approach is commonly implemented by banks. It is called historical simulation because it is based on the historical distribution. In reality, it involves no simulation at all. A serious limitation of this method is due to the discreteness of the T observed portfolio values, which implies that the above equation, in


general, has no closed-form solution. It is thus preferable to smooth the estimator of the historical cdf. The smoothing yields the VaR as a continuous function of the portfolio allocation. For instance, we can replace the indicator function $\mathbf{1}_{y>0}$ by $\Phi(y/h)$, where $\Phi$ is the cdf of the standard normal distribution, and h is a bandwidth. The smoothed estimated VaR is defined as the solution to

$$\frac{1}{T}\sum_{t=1}^{T} \Phi\left[\left(-a'Y_t - \widehat{VaR}(a,\alpha)\right)/h\right] = \alpha. \qquad (16.12)$$

In practice, this equation is solved numerically. The starting value of the algorithm can be set equal to the VaR derived under the normality assumption. When T is large and h is close to 0, the estimated VaR is asymptotically equivalent to

$$\sqrt{T}\left[\widehat{VaR}(a,\alpha) - VaR(a,\alpha)\right] \approx \frac{1}{g_a\left[VaR(a,\alpha)\right]} \frac{1}{\sqrt{T}} \sum_{t=1}^{T} \left\{\mathbf{1}\left[-a'Y_t - VaR(a,\alpha) > 0\right] - \alpha\right\},$$

where $g_a$ denotes the pdf of $-a'Y_t$.
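A minimal sketch of this numerical solution follows, with simulated Gaussian portfolio gains so that the answer is known in closed form; the bandwidth, scale, and sample size are arbitrary illustrative choices:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

rng = np.random.default_rng(0)
T, alpha, h = 50_000, 0.05, 0.001
gains = rng.normal(0.0, 0.01, size=T)      # stand-in for the a'Y_t series

def smoothed_gap(var):
    # (1/T) sum_t Phi[(-a'Y_t - VaR)/h] - alpha, cf. equation (16.12)
    return norm.cdf((-gains - var) / h).mean() - alpha

# Starting value from the normality assumption: -mean + z_{1-alpha} * std.
var0 = -gains.mean() + norm.ppf(1 - alpha) * gains.std()
var_hat = brentq(smoothed_gap, var0 - 0.01, var0 + 0.01)
```

Here the exact VaR is 0.01 × Φ⁻¹(0.95) ≈ 0.0164, and the smoothed estimate should land close to it up to sampling noise and smoothing bias.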

REMARK: Let us denote by $\mu_t$ and $\Omega_t$ the conditional mean and variance, respectively, of $\Delta p_{t+1}$. The VaR is given by the solution of

$$P_t\left[a'\Delta p_{t+1} + VaR_t(a,\alpha) < 0\right] = \alpha.$$

Value at Risk Efficient Portfolios

Under conditional normality, we get

$$VaR_t(a,\alpha) = -a'\mu_t + \Phi^{-1}(1-\alpha)\left[a'\Omega_t a\right]^{1/2}.$$

16.3.2 The Optimization Problem

A portfolio selection rule based on the probability of failure (also called the safety first criterion) was initially proposed by Roy (1952) (see Levy and Sarnat 1972; Arzac and Bawa 1977; Jansen, Koedijk, and de Vries 1998 for applications). Let us consider a given budget w to be allocated at time t among n risky assets and a risk-free asset, with a risk-free interest rate r. The budget constraint at time t is

$$a_0 + a'p_t = w,$$

where $a_0$ is the amount invested in the risk-free asset and a the allocation in the risky assets. The portfolio value at the next date is

$$W_{t+1}(a) = a_0(1+r) + a'p_{t+1} = w(1+r) + a'\left[p_{t+1} - (1+r)p_t\right] = w(1+r) + a'Y_{t+1},$$

where $Y_{t+1}$ is the excess gain. The required reserve amount for this portfolio is defined by

$$P_t\left[W_{t+1}(a) + R_t(a_0,a,\alpha) < 0\right] = \alpha. \qquad (16.19)$$

It can be written in terms of the quantile of the risky part of the portfolio, denoted by $VaR_t(a,\alpha)$. This quantile is different from $VaR_t(a_0,a,\alpha) = w + R_t(a_0,a;\alpha)$. More precisely, we get

$$R_t(a_0,a,\alpha) + w(1+r) = VaR_t(a,\alpha),$$

where $VaR_t(a,\alpha)$ satisfies

$$P_t\left[a'Y_{t+1} + VaR_t(a,\alpha) < 0\right] = \alpha. \qquad (16.20)$$

We define the VaR efficient portfolio as a portfolio with an allocation vector that solves the constrained optimization

$$\max_a E_t\left[W_{t+1}(a)\right] \quad \text{subject to} \quad R_t(a_0,a,\alpha) \le R^0,$$

where $R^0$ is a benchmark reserve level. This optimization is equivalent to

$$\max_a a'E_t Y_{t+1} \quad \text{subject to} \quad VaR_t(a,\alpha) \le VaR^0,$$

where $VaR^0 = R^0 + w(1+r)$.

PROPOSITION 16.2: A VaR efficient portfolio satisfies the first-order condition

$$E_t Y_{t+1} = \lambda^* E_t\left[Y_{t+1} \mid a^{*\prime} Y_{t+1} = -VaR^0\right],$$

where the Lagrange multiplier $\lambda^*$ is determined by the constraint $VaR_t(a^*,\alpha) = VaR^0$. Thus, the first-order condition implies proportionality between the expected excess gain and its expectation evaluated on the boundary of the risk constraint.
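This program can also be solved directly with a generic constrained optimizer. The sketch below assumes Gaussian excess gains, for which the VaR of the risky part has the closed form $-a'\mu + \Phi^{-1}(1-\alpha)(a'\Omega a)^{1/2}$; the moment values and the benchmark level are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

mu = np.array([0.02, 0.01])                     # E_t Y_{t+1} (hypothetical)
Omega = np.array([[0.04, 0.01], [0.01, 0.09]])  # V_t Y_{t+1} (hypothetical)
alpha, var_cap = 0.05, 0.05                     # alpha and benchmark VaR^0
z = norm.ppf(1 - alpha)

def var_gauss(a):
    # Closed-form Gaussian VaR of the risky part of the portfolio.
    return -a @ mu + z * np.sqrt(a @ Omega @ a)

# Maximize the expected excess gain subject to VaR(a, alpha) <= VaR^0.
res = minimize(lambda a: -(a @ mu), x0=np.array([0.1, 0.1]), method="SLSQP",
               constraints=[{"type": "ineq",
                             "fun": lambda a: var_cap - var_gauss(a)}])
a_star = res.x   # the VaR constraint binds at the optimum
```

Because the objective is linear in the allocation, the solver pushes the portfolio until the VaR constraint binds, consistent with the first-order condition of Proposition 16.2.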

16.3.3 Estimation of Value at Risk Efficient Portfolios

Let us assume i.i.d. excess gains Y. For any allocation a, the VaR can be estimated as the solution of (see Section 16.2.3)

$$\frac{1}{T}\sum_{t=1}^{T} \Phi\left[\left(-a'Y_t - \widehat{VaR}(a,\alpha)\right)/h\right] = \alpha,$$

where h is a bandwidth. Then, we can solve the problem of optimal allocation by applying a Gauss-Newton algorithm. The optimization in a neighborhood of an allocation $a^{(p)}$ becomes

$$\max_a a'EY_{t+1} \quad \text{subject to}$$
$$VaR(a^{(p)},\alpha) + \frac{\partial VaR}{\partial a'}(a^{(p)},\alpha)\left[a - a^{(p)}\right] + \frac{1}{2}\left[a - a^{(p)}\right]'\frac{\partial^2 VaR}{\partial a \partial a'}(a^{(p)},\alpha)\left[a - a^{(p)}\right] \le VaR^0.$$

This objective function admits the solution

$$a^{(p+1)} = a^{(p)} - \left[\frac{\partial^2 VaR}{\partial a \partial a'}(a^{(p)},\alpha)\right]^{-1} \frac{\partial VaR}{\partial a}(a^{(p)},\alpha) + \left[\frac{2\left[VaR^0 - VaR(a^{(p)},\alpha)\right] + Q(a^{(p)},\alpha)}{EY_{t+1}'\left[\frac{\partial^2 VaR}{\partial a \partial a'}(a^{(p)},\alpha)\right]^{-1} EY_{t+1}}\right]^{1/2} \left[\frac{\partial^2 VaR}{\partial a \partial a'}(a^{(p)},\alpha)\right]^{-1} EY_{t+1},$$

with

$$Q(a^{(p)},\alpha) = \frac{\partial VaR}{\partial a'}(a^{(p)},\alpha)\left[\frac{\partial^2 VaR}{\partial a \partial a'}(a^{(p)},\alpha)\right]^{-1}\frac{\partial VaR}{\partial a}(a^{(p)},\alpha).$$


Next, the theoretical recursion is replaced by its empirical counterpart, in which the expectation $EY_{t+1}$ is replaced by the empirical mean $\bar{Y}_T$, whereas the VaR and its derivatives are replaced by their corresponding kernel estimates.
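The empirical counterpart can be sketched as follows: the expectation is replaced by the sample mean and the VaR by its kernel-smoothed estimate. For brevity, a generic constrained optimizer stands in here for the Gauss-Newton recursion above; data, bandwidth, and benchmark level are all simulated or hypothetical:

```python
import numpy as np
from scipy.optimize import brentq, minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
T, alpha, h, var_cap = 5_000, 0.05, 0.002, 0.05
# Simulated i.i.d. excess gains Y_t for two risky assets.
Y = rng.multivariate_normal([0.02, 0.01], [[0.04, 0.01], [0.01, 0.09]], size=T)
y_bar = Y.mean(axis=0)                    # empirical counterpart of EY_{t+1}

def var_hat(a):
    # Kernel-smoothed VaR: solve (1/T) sum_t Phi[(-a'Y_t - v)/h] = alpha in v.
    g = Y @ a
    gap = lambda v: norm.cdf((-g - v) / h).mean() - alpha
    bound = np.abs(g).max() + 1.0         # gap > 0 at -bound, < 0 at +bound
    return brentq(gap, -bound, bound)

# Maximize the sample mean gain subject to the estimated VaR constraint.
res = minimize(lambda a: -(a @ y_bar), x0=np.array([0.05, 0.05]),
               method="SLSQP",
               constraints=[{"type": "ineq",
                             "fun": lambda a: var_cap - var_hat(a)}])
```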

16.4 Utility Functions for Extreme Risks

In this section, we introduce a specification of infrequent extreme risks and show that an optimizing behavior based on expected CARA utility functions leads to zero demand for risky assets. Next, we introduce a class of utility functions that allows a strictly positive demand for risky assets.

16.4.1 A Drawback of Constant Absolute Risk Aversion Utility Functions

Let us assume that the distribution of excess asset returns is a mixture of two normal distributions

$$Y \sim \alpha N\left(m, \frac{1}{\alpha}\Omega_1\right) + (1-\alpha) N\left(m, \frac{1}{1-\alpha}\Omega_2\right), \qquad (16.21)$$

with the same mean and weights α and 1 − α. The variance-covariance matrices are normalized according to the weights. Intuitively, if α is close to 0 (or to 1), there is an infrequently occurring regime of very large variance, that is, an extreme risk. However, when the regime is unknown, the mean is equal to

$$\alpha m + (1-\alpha)m = m,$$

whereas the variance is

$$\alpha \frac{1}{\alpha}\Omega_1 + (1-\alpha)\frac{1}{1-\alpha}\Omega_2 = \Omega_1 + \Omega_2.$$

Therefore, the first- and second-order moments do not depend on the weights. Let us now consider an investor with an absolute risk aversion coefficient A = 1. If the investor follows the standard mean-variance approach, the allocation of the efficient portfolio is

$$a^* = \left(\Omega_1 + \Omega_2\right)^{-1} m. \qquad (16.22)$$
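The invariance of the first two moments with respect to the weight can be checked by simulation (our own sketch; the weights and matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
m = np.array([0.01, 0.02])
Omega1 = np.array([[0.02, 0.00], [0.00, 0.03]])
Omega2 = np.array([[0.05, 0.01], [0.01, 0.08]])

def mixture_moments(alpha, n=200_000):
    # Draw from alpha * N(m, Omega1/alpha) + (1-alpha) * N(m, Omega2/(1-alpha)).
    regime = rng.random(n) < alpha
    draws = np.where(regime[:, None],
                     rng.multivariate_normal(m, Omega1 / alpha, n),
                     rng.multivariate_normal(m, Omega2 / (1 - alpha), n))
    return draws.mean(axis=0), np.cov(draws.T)

# Whatever the weight, mean and variance stay close to m and Omega1 + Omega2,
# even when the second regime is rare and has a very large variance.
mean_a, cov_a = mixture_moments(0.95)
mean_b, cov_b = mixture_moments(0.50)
```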

Alternatively, if an investor maximizes an expected CARA utility function, the optimal allocation solves

$$\hat{a} = \arg\max_a -E\left[\exp(-a'Y)\right] = \arg\max_a -\Big\{\alpha \exp\Big(-a'm + \frac{1}{2\alpha}a'\Omega_1 a\Big)$$
2"