SpringerBriefs in Statistics JSS Research Series in Statistics Makoto Takahashi · Yasuhiro Omori · Toshiaki Watanabe
Stochastic Volatility and Realized Stochastic Volatility Models
JSS Research Series in Statistics
Editors-in-Chief
Naoto Kunitomo, The Institute of Mathematical Statistics, Tachikawa, Tokyo, Japan
Akimichi Takemura, The Center for Data Science Education and Research, Shiga University, Hikone, Shiga, Japan
Series Editors
Genshiro Kitagawa, Meiji Institute for Advanced Study of Mathematical Sciences, Nakano-ku, Tokyo, Japan
Shigeyuki Matsui, Graduate School of Medicine, Nagoya University, Nagoya, Aichi, Japan
Manabu Iwasaki, School of Data Science, Yokohama City University, Yokohama, Kanagawa, Japan
Yasuhiro Omori, Graduate School of Economics, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
Masafumi Akahira, Institute of Mathematics, University of Tsukuba, Tsukuba, Ibaraki, Japan
Masanobu Taniguchi, School of Fundamental Science and Engineering, Waseda University, Shinjuku-ku, Tokyo, Japan
Hiroe Tsubaki, The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan
Satoshi Hattori, Faculty of Medicine, Osaka University, Suita, Osaka, Japan
Kosuke Oya, School of Economics, Osaka University, Toyonaka, Osaka, Japan
Taiji Suzuki, School of Engineering, University of Tokyo, Tokyo, Japan
The current research of statistics in Japan has expanded in several directions in line with recent trends in academic activities in the area of statistics and statistical sciences over the globe. The core of these research activities in statistics in Japan has been the Japan Statistical Society (JSS). This society, the oldest and largest academic organization for statistics in Japan, was founded in 1931 by a handful of pioneer statisticians and economists and now has a history of about 80 years. Many distinguished scholars have been members, including the influential statistician Hirotugu Akaike, who was a past president of JSS, and the notable mathematician Kiyosi Itô, who was an earlier member of the Institute of Statistical Mathematics (ISM), which has been a closely related organization since the establishment of ISM. The society has two academic journals: the Journal of the Japan Statistical Society (English Series) and the Journal of the Japan Statistical Society (Japanese Series). The membership of JSS consists of researchers, teachers, and professional statisticians in many different fields including mathematics, statistics, engineering, medical sciences, government statistics, economics, business, psychology, education, and many other natural, biological, and social sciences. The JSS Series of Statistics aims to publish recent results of current research activities in the areas of statistics and statistical sciences in Japan that otherwise would not be available in English; they are complementary to the two JSS academic journals, both English and Japanese. Because the scope of a research paper in academic journals inevitably has become narrowly focused and condensed in recent years, this series is intended to fill the gap between academic research activities and the form of a single academic paper. 
The series will be of great interest to a wide audience of researchers, teachers, professional statisticians, and graduate students in many countries who are interested in statistics and statistical sciences, in statistical theory, and in various areas of statistical applications.
Makoto Takahashi Faculty of Business Administration Hosei University Chiyoda-ku, Tokyo, Japan
Yasuhiro Omori Faculty of Economics University of Tokyo Bunkyo-ku, Tokyo, Japan
Toshiaki Watanabe Graduate School of Social Data Science Hitotsubashi University Kunitachi, Tokyo, Japan
ISSN 2191-544X ISSN 2191-5458 (electronic) SpringerBriefs in Statistics ISSN 2364-0057 ISSN 2364-0065 (electronic) JSS Research Series in Statistics ISBN 978-981-99-0934-6 ISBN 978-981-99-0935-3 (eBook) https://doi.org/10.1007/978-981-99-0935-3 © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
The aim of this book is to introduce recent developments in stochastic volatility models for asset returns such as returns of stocks, foreign currencies, and interest rates. It is intended for researchers who are interested in Bayesian analysis of financial time series. In financial markets, it is well known that financial time series often exhibit a behavior called volatility clustering and that the volatility of the asset return changes randomly with high persistence. The stochastic volatility model describes such dynamic structures of the unobserved volatilities. We first introduce the basic stochastic volatility model and then extend it to various volatility models. Noting that realized volatility computed from high-frequency data has recently been used to estimate true volatility, we also discuss joint modeling of the asset return and realized volatility, which is called realized stochastic volatility.
Chapter 1 summarizes the topics covered in this book. We describe the various stochastic volatility models covered in this book including the basic model, the asymmetric stochastic volatility model, the stochastic volatility model with generalized hyperbolic (GH) skewed Student’s t error, and joint modeling of stochastic volatility and realized volatility.
Chapter 2 discusses the basic stochastic volatility model and describes an estimation method using Markov chain Monte Carlo (MCMC) simulation. In stochastic volatility models, there are as many latent volatilities as asset returns, and we need to integrate them out to compute the likelihood function. Since it is difficult to obtain the likelihood numerically, we use the Bayesian approach and implement MCMC simulation to facilitate statistical inference of model parameters. The simple MCMC sampling algorithm for the latent log volatilities is called a single-move sampler, which samples one variable given other variables.
However, it is known to be inefficient in the sense that it produces highly autocorrelated MCMC samples. To overcome this difficulty, we describe two major efficient MCMC estimation methods: the mixture sampler and the multi-move (or block) sampler. The former transforms the model into a linear state-space model and approximates the error distribution using a mixture of normal distributions. Given the mixture normal distribution, the volatilities are generated all at once. The latter method generates a block of state
variables given other parameters and latent variables. Using real data in empirical studies, we will show that they are highly efficient.
Chapter 3 introduces the asymmetric stochastic volatility model, which extends the basic model to allow for correlation between the return and log volatility errors. It has long been recognized in stock markets that a decrease in today’s return is followed by an increase in tomorrow’s volatility. This phenomenon is called the “leverage effect” or “asymmetry.” The efficient MCMC algorithms given in Chap. 2 are extended to incorporate such a negative correlation between the errors for the leverage effect.
Chapter 4 discusses the stochastic volatility model with GH skew Student’s t error. Financial time series data such as stock returns and foreign exchange returns are known to have several properties that depart from a normality assumption. Major characteristics of return distributions for financial variables are skewness, heavy-tailedness, and volatility clustering with leverage effects. These properties are crucial not only for describing return distributions but also for asset allocation, option pricing, forecasting, and risk management. Further, to visualize a leverage effect or volatility asymmetry, we introduce a news impact curve in the context of the stochastic volatility models, which measures how new information affects the return volatility.
In Chap. 5, we consider an additional piece of information, namely the realized volatility, for the measurement equation. The realized volatility is the sum of squared intraday returns over a certain interval such as a day; it has recently attracted the attention of financial economists and econometricians as an accurate measure of true volatility. In the real market, however, the presence of non-trading hours and market microstructure noise in transaction prices may cause bias in the realized volatility.
On the other hand, daily returns are less subject to noise and may therefore provide additional information about the true volatility. From this viewpoint, modeling realized volatility and daily returns simultaneously based on the well-known stochastic volatility model is proposed. The GH skew Student’s t error is considered, as in Chap. 4, and we investigate the predictive performance of the realized stochastic volatility model with respect to one-day-ahead volatility, value at risk, and expected shortfall forecasts.
Finally, we would like to thank the members of the JSS-Springer Editorial Committee and the editors at Springer for their patience and support. Financial support from the Ministry of Education, Culture, Sports, Science and Technology of the Japanese Government through Grant-in-Aid for Scientific Research (Nos. 19H00588 and 20H00073), the Hitotsubashi Institute for Advanced Study, and the Innovation Management Research Center at Hosei University is also gratefully acknowledged. Additionally, we would like to thank Editage (www.editage.com) for English language editing.
Tokyo, Japan
December 2022
Makoto Takahashi Yasuhiro Omori Toshiaki Watanabe
Contents
1 Introduction
  1.1 Research Background
  1.2 Summary of Topics
  References
2 Stochastic Volatility Model
  2.1 Introduction
  2.2 Single-Move Sampler for the Symmetric SV Model
    2.2.1 Generation of θ = (μ, φ, σ_η²)
    2.2.2 Generation of h
  2.3 Mixture Sampler
    2.3.1 Reformulation of the Measurement Equation
    2.3.2 MCMC Algorithm
    2.3.3 Correcting for Misspecification
  2.4 Multi-move Sampler
  2.5 Auxiliary Particle Filter
  2.6 Empirical Study
  2.7 Appendix
    2.7.1 Simulation Smoother
    2.7.2 Augmented Kalman Filter
  References
3 Asymmetric Stochastic Volatility Model
  3.1 Introduction
  3.2 Single-Move Sampler for the Asymmetric SV Model
    3.2.1 Generation of (μ, φ, σ_η², ρ)
    3.2.2 Generation of h
  3.3 Mixture Sampler
    3.3.1 Reformulation of the Measurement Equation
    3.3.2 MCMC Algorithm
    3.3.3 Correcting for Misspecification
  3.4 Multi-move Sampler
  3.5 Auxiliary Particle Filter
  3.6 Empirical Study
  3.7 Appendix
    3.7.1 Simulation Smoother
    3.7.2 Augmented Kalman Filter
  References
4 Stochastic Volatility Model with Generalized Hyperbolic Skew Student’s t Error
  4.1 Introduction
  4.2 Generalized Hyperbolic Skew Student’s t Distribution
  4.3 SV Model with GH Skew Student’s t Error
  4.4 MCMC Estimation
    4.4.1 Generation of (μ, φ, σ_η, ρ)
    4.4.2 Generation of (ν, β)
    4.4.3 Generation of λ and h
  4.5 News Impact Curve: Simulation-Based Method
    4.5.1 Simulation Example
  4.6 Empirical Study
  References
5 Realized Stochastic Volatility Model
  5.1 Introduction
  5.2 Realized Volatility
  5.3 Realized Stochastic Volatility Model
  5.4 RSV Model with GH Skewed Student’s t Error
  5.5 MCMC Estimation
    5.5.1 Generation of (μ, φ, σ_η, ρ, ν, β) and λ
    5.5.2 Generation of ξ and σ_u
    5.5.3 Generation of h
  5.6 Evaluation of Forecasts
    5.6.1 Volatility, VaR, and ES Forecasts
    5.6.2 Loss Functions for Volatility
    5.6.3 A Joint Loss Function for VaR and ES
    5.6.4 Testing Relative Forecast Performance
  5.7 EGARCH and Realized EGARCH Models
  5.8 Empirical Study
    5.8.1 Estimation Results
    5.8.2 Prediction Results
  References
Chapter 1
Introduction
Abstract This chapter provides a brief background on the developments in stochastic volatility models, including Markov chain Monte Carlo simulation for the estimation of model parameters, and summarizes the topics covered in the subsequent chapters.
1.1 Research Background
In finance and related fields, the term volatility refers to the dispersion of financial asset returns and represents financial risk. It is measured as either the standard deviation or the variance of returns but cannot be directly observed because it changes stochastically over time. Thus, modeling and forecasting time-varying financial volatility is one of the most important topics in finance for the prediction of return distributions and for risk management.
Time-varying volatility has traditionally been estimated and forecast via two classes of time series models using financial returns: the generalized autoregressive conditional heteroskedasticity (GARCH) models [15, 25] and the stochastic volatility (SV) model [26, 46, 51]. These models are consistent with volatility clustering (high persistence in volatility) and have been extended to accommodate the phenomenon called volatility asymmetry or the leverage effect, i.e., the negative correlation between today’s return and tomorrow’s volatility observed in stock markets [14, 19]. Early works related to such extensions in the GARCH framework include the exponential GARCH [41], GJR [27], and asymmetric power GARCH [23] models, while those for the SV model are [31, 39, 54].
Among the various econometric models in the literature, the SV model has been one of the most popular, alongside the GARCH models. Past empirical studies have shown that the SV model outperforms other models (including GARCH models) with respect to goodness of fit, forecasting volatilities, and evaluating the value at risk (a measure widely used for risk management). However, unlike the GARCH models, it is difficult to evaluate the likelihood of the SV model analytically and hence to estimate the model parameters using the maximum likelihood method. Thus, Bayesian methods for estimating the parameters using the Markov chain Monte Carlo (MCMC) technique have been developed in the literature.
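To make volatility clustering and the leverage effect concrete, the following is a minimal simulation sketch. The text above does not state the model equations, so the specification used here is an assumption: the commonly used form y_t = exp(h_t/2) ε_t, h_{t+1} = μ + φ(h_t − μ) + η_t, with Corr(ε_t, η_t) = ρ. The function name `simulate_sv` and all default parameter values are illustrative only.

```python
import numpy as np

def simulate_sv(n, mu=-9.0, phi=0.97, sigma_eta=0.15, rho=-0.4, seed=0):
    """Simulate returns y and log volatilities h from a basic SV model.

    Assumed form (not given in the surrounding text):
        y_t     = exp(h_t / 2) * eps_t
        h_{t+1} = mu + phi * (h_t - mu) + eta_t
    where (eps_t, eta_t) is bivariate normal with Corr = rho and
    Var(eta_t) = sigma_eta**2. Setting rho = 0 gives the symmetric model.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    z = rng.standard_normal(n)
    # eta_t is correlated with eps_t; rho < 0 mimics the leverage effect
    eta = sigma_eta * (rho * eps + np.sqrt(1.0 - rho**2) * z)

    h = np.empty(n)
    # Draw the initial log volatility from the stationary distribution
    h[0] = mu + sigma_eta / np.sqrt(1.0 - phi**2) * rng.standard_normal()
    for t in range(n - 1):
        h[t + 1] = mu + phi * (h[t] - mu) + eta[t]

    y = np.exp(h / 2.0) * eps
    return y, h
```

With φ close to one, the simulated log volatility h is highly persistent, which is what produces clusters of large and small absolute returns in y. Only y would be observed in practice; h is latent, which is why the likelihood requires integrating over all h_t.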
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Takahashi et al., Stochastic Volatility and Realized Stochastic Volatility Models, JSS Research Series in Statistics, https://doi.org/10.1007/978-981-99-0935-3_1
There are two major efficient MCMC estimation methods, efficient in the sense that they substantially reduce the autocorrelations of MCMC samples. One method is the mixture sampler [33, 44], which transforms the model into a linear state-space model and approximates the error distribution using a mixture of normal distributions. The mixture sampler is fast and highly efficient, but its use is limited to models that can be transformed into a linear state-space form. The other method is the multi-move (or block) sampler [45, 47, 55], which generates a block of state variables given other parameters and latent variables. In contrast to the mixture sampler, it can be applied to the model directly, without first transforming it into a linear state-space form.
Alongside the development of efficient estimation methods, SV models have been extended to capture the characteristics observed in financial returns such as daily equity returns and exchange rates. The empirical return distribution is more peaked and has heavier tails than the normal distribution. Such a heavy tail or leptokurtosis can be partly explained by time-varying volatility, but the return distribution conditional on the volatility may still be leptokurtic. To describe the heavy tail in the SV context, Student’s t distribution is often used for the error distribution [13, 18, 37, 44, 56]. Moreover, the return distribution may also be skewed, and several types of skew Student’s t distributions, such as the generalized hyperbolic (GH) skew Student’s t distribution [1], have been used for the error distribution [36, 40].
Recently, thanks to the availability of high-frequency data that include prices and other trade information within a day, a model-free volatility estimator, called realized volatility (RV), has become popular in finance. Basically, a daily RV is defined as the sum of squared intraday returns over a day.
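The definition of daily RV translates directly into a few lines of code. The sketch below assumes that intraday returns are log returns computed from a sequence of prices observed within one day; the function name `realized_volatility` is hypothetical, and no correction for microstructure noise or non-trading hours is attempted.

```python
import numpy as np

def realized_volatility(intraday_prices):
    """Daily RV as the sum of squared intraday log returns.

    Assumption: `intraday_prices` holds the prices observed within a
    single trading day, in time order (e.g. sampled every 5 minutes).
    """
    p = np.asarray(intraday_prices, dtype=float)
    r = np.diff(np.log(p))       # intraday log returns
    return float(np.sum(r ** 2))
```

For example, a price path whose log prices are 0, 1, 0 has intraday returns 1 and −1, so its RV is 2, whereas a constant price path has an RV of 0.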
Early studies have shown its theoretical and empirical properties such as consistency under some conditions and strong persistence [3, 5–7, 10, 11]. To forecast time-varying volatility for financial risk management, one must model the dynamics of RV. Many researchers, including [5–7], have documented that daily RV may follow a long-memory process, so they use autoregressive fractionally integrated moving average (ARFIMA) models (see [12] for long-memory and ARFIMA models). A more widely used model for RV dynamics is the heterogeneous autoregressive (HAR) model proposed by [20]. It is not a long-memory model but is known to approximate the long-memory process well, and several extensions have been proposed [4, 21, 22]. Although the ARFIMA and HAR models have been shown to significantly improve volatility forecasts relative to GARCH and SV models, the RV is subject to a bias caused by market microstructure noise and non-trading hours. To mitigate or correct the bias, various RV measures have been proposed [8, 9, 32, 57, 58]. Such RV measures have been analyzed extensively in the literature [2, 38, 53]. For modeling and forecasting volatility while taking into account the bias in RV, two classes of hybrid models have been proposed. One is the realized stochastic volatility (RSV) model based on the SV model [24, 35, 48], and the other is the realized GARCH (RGARCH) model based on the GARCH model [28, 29]. By jointly modeling the RV and daily return, which is less subject to microstructure noise, these
hybrid models can consider the bias in RV as a model parameter. Further, accounting for other empirical characteristics in RVs and daily returns, several extensions of the RSV model [42, 43, 49, 52] and the RGARCH model [16, 28] have been proposed. These volatility models’ forecasting abilities have been compared within each class, e.g., GARCH [30], RGARCH [17], and RSV [49], but a comprehensive comparison across different classes of volatility models is rare. A few exceptions are [34] and [50]. In particular, Takahashi et al. [50] documented that models with RV outperform those without it and that the RSV model performs better than the HAR and REGARCH models.
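As an illustration of the HAR model for RV dynamics mentioned earlier in this section, the following sketch regresses daily RV on its previous-day value and on its averages over the previous 5 and 22 days, using ordinary least squares. The window lengths (5 and 22 trading days), the use of RV in levels rather than logs, and the function names `har_design`, `fit_har`, and `forecast_har` are assumptions made for illustration, not details taken from the text.

```python
import numpy as np

def har_design(rv):
    """Build the HAR regressors and targets from a daily RV series.

    Each row is [1, RV_{t-1}, mean(RV_{t-5..t-1}), mean(RV_{t-22..t-1})]
    and the matching target is RV_t.
    """
    rv = np.asarray(rv, dtype=float)
    rows, target = [], []
    for t in range(22, len(rv)):
        rows.append([1.0, rv[t - 1], rv[t - 5:t].mean(), rv[t - 22:t].mean()])
        target.append(rv[t])
    return np.array(rows), np.array(target)

def fit_har(rv):
    """Estimate the four HAR coefficients by least squares."""
    X, y = har_design(rv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def forecast_har(rv, beta):
    """One-day-ahead RV forecast from the most recent 22 observations."""
    rv = np.asarray(rv, dtype=float)
    x = np.array([1.0, rv[-1], rv[-5:].mean(), rv[-22:].mean()])
    return float(x @ beta)
```

Averaging over several horizons lets this simple regression mimic slowly decaying autocorrelations without being a long-memory model in the formal sense.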
1.2 Summary of Topics
This book introduces recent developments in stochastic volatility models. For the estimation of model parameters, we focus on Bayesian methods using MCMC simulation. The models and estimation methods are further illustrated in applications to exchange rates and stock indices.
Chapter 2 describes a basic SV model without the leverage effect and an efficient Bayesian estimation method using MCMC. Several efficient sampling algorithms for latent volatilities, such as the mixture sampler and the multi-move sampler, are described in detail. Parameter estimation is illustrated in an empirical study using daily returns of the Japanese yen/US dollar exchange rate.
In Chap. 3, the basic SV model is extended to accommodate the leverage effect and an efficient Bayesian estimation method is described. In particular, the mixture sampler and the multi-move sampler for latent volatilities are developed for the extension. The empirical study presents the estimation results for daily returns of the US stock index, Dow Jones Industrial Average (DJIA).
Chapter 4 further extends the SV model to incorporate the heavy tail and skewness. In particular, the SV model with the GH skew Student’s t error is described, and its efficient MCMC estimation method is presented. To visualize the leverage effect as a function of multiple parameters in the extended model, we also introduce a simulation method to calculate a news impact curve in the context of SV models. The extended model and estimation method are applied to daily returns of the Japanese stock index, Nikkei 225 (N225), as well as to the DJIA.
In Chap. 5, the RSV model is introduced with a brief description of RV. Similar to the SV model, the RSV model is extended with the GH skew Student’s t error. An efficient MCMC estimation method is also developed to estimate the models. In the empirical study, the SV and RSV models, as well as the EGARCH and REGARCH models, are applied to the DJIA and N225 and compared with respect to one-day-ahead forecasts of volatility, value at risk (VaR), and expected shortfall (a measure for assessing financial tail risk).
References
1. Aas, K., Haff, I.H.: The generalized hyperbolic skew Student’s t-distribution. J. Fin. Econometrics 4(2), 275–309 (2006). https://doi.org/10.1093/jjfinec/nbj006
2. Aït-Sahalia, Y., Mykland, P.A.: Estimating volatility in the presence of market microstructure noise: a review of the theory and practical considerations. In: Andersen, T.G., Davis, R.A., Kreiß, J.P., Mikosch, T. (eds.) Handbook of Financial Time Series, pp. 577–598. Springer, Berlin (2009)
3. Andersen, T.G., Bollerslev, T.: Answering the skeptics: yes, standard volatility models do provide accurate forecasts. Int. Econ. Rev. 39(4), 885–905 (1998)
4. Andersen, T.G., Bollerslev, T., Diebold, F.X.: Roughing it up: including jump components in the measurement, modeling, and forecasting of return volatility. Rev. Econ. Stat. 89(4), 701–720 (2007)
5. Andersen, T.G., Bollerslev, T., Diebold, F.X., Ebens, H.: The distribution of realized stock return volatility. J. Fin. Econ. 61(1), 43–76 (2001). https://doi.org/10.1016/S0304-405X(01)00055-1
6. Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P.: The distribution of realized exchange rate volatility. J. Am. Stat. Assoc. 96(453), 42–55 (2001). https://doi.org/10.1198/016214501750332965
7. Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P.: Modeling and forecasting realized volatility. Econometrica 71(2), 579–625 (2003)
8. Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N.: Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica 76(6), 1481–1536 (2008). https://doi.org/10.3982/ECTA6495
9. Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N.: Realized kernels in practice: trades and quotes. Econometrics J. 12(3), C1–C32 (2009). https://doi.org/10.1111/j.1368-423X.2008.00275.x
10. Barndorff-Nielsen, O.E., Shephard, N.: Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. J. R. Stat. Soc. B 63(2), 167–241 (2001)
11. Barndorff-Nielsen, O.E., Shephard, N.: Econometric analysis of realized volatility and its use in estimating stochastic volatility models. J. R. Stat. Soc. B 64(2), 253–280 (2002)
12. Beran, J.: Statistics for Long-Memory Processes, 1st edn. Chapman & Hall (1994)
13. Berg, A., Meyer, R., Yu, J.: Deviance information criterion for comparing stochastic volatility models. J. Bus. Econ. Stat. 22(1), 107–120 (2004). https://doi.org/10.1198/073500103288619430
14. Black, F.: Studies of stock market volatility changes. In: Proceedings of the American Statistical Association, Business and Economic Statistics Section, pp. 177–181 (1976)
15. Bollerslev, T.: Generalized autoregressive conditional heteroskedasticity. J. Econometrics 31(3), 307–327 (1986)
16. Borup, D., Jakobsen, J.S.: Capturing volatility persistence: a dynamically complete realized EGARCH-MIDAS model. Quan. Financ. 19(11), 1839–1855 (2019). https://doi.org/10.1080/14697688.2019.1614653
17. Chen, C.W., Watanabe, T., Lin, E.M.: Bayesian estimation of realized GARCH-type models with application to financial tail risk management. Econometrics Stat. (2021). https://doi.org/10.1016/j.ecosta.2021.03.006
18. Chib, S., Nardari, F., Shephard, N.: Markov chain Monte Carlo methods for stochastic volatility models. J. Econometrics 108(2), 281–316 (2002). https://doi.org/10.1016/S0304-4076(01)00137-3
19. Christie, A.A.: The stochastic behavior of common stock variances: value, leverage, and interest rate effects. J. Fin. Econ. 10, 407–432 (1982)
20. Corsi, F.: A simple approximate long memory model of realized volatility. J. Fin. Econometrics 7(2), 174–196 (2009)
21. Corsi, F., Fusari, N., La Vecchia, D.: Realizing smiles: options pricing with realized volatility. J. Fin. Econ. 107(2), 284–304 (2013). https://doi.org/10.1016/j.jfineco.2012.08.015
22. Corsi, F., Mittnik, S., Pigorsch, C., Pigorsch, U.: The volatility of realized volatility. Econometric Rev. 27(1–3), 46–78 (2008). https://doi.org/10.1080/07474930701853616
23. Ding, Z., Granger, C.W., Engle, R.F.: A long memory property of stock market returns and a new model. J. Empirical Financ. 1(1), 83–106 (1993). https://doi.org/10.1016/0927-5398(93)90006-D
24. Dobrev, D.P., Szerszen, P.J.: The Information Content of High-Frequency Data for Estimating Equity Return Models and Forecasting Risk. FRB Working Paper 2010-45 (2010)
25. Engle, R.F.: Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50(4), 987–1008 (1982)
26. Ghysels, E., Harvey, A.C., Renault, E.: Stochastic volatility. In: Maddala, G.S., Rao, C.R. (eds.) Handbook of Statistics, vol. 14, pp. 119–191. Elsevier (1996). https://doi.org/10.1016/S0169-7161(96)14007-4
27. Glosten, L.R., Jagannathan, R., Runkle, D.E.: On the relation between the expected value and the volatility of the nominal excess return on stocks. J. Financ. 48(5), 1779–1801 (1993). https://doi.org/10.1111/j.1540-6261.1993.tb05128.x
28. Hansen, P.R., Huang, Z.: Exponential GARCH modeling with realized measures of volatility. J. Bus. Econ. Stat. 34(2), 269–287 (2016)
29. Hansen, P.R., Huang, Z., Shek, H.: Realized GARCH: a joint model of returns and realized measures of volatility. J. Appl. Econometrics 27(6), 877–906 (2012)
30. Hansen, P.R., Lunde, A.: A forecast comparison of volatility models: does anything beat a GARCH(1,1)? J. Appl. Econometrics 20(7), 873–889 (2005). https://doi.org/10.1002/jae.800
31. Harvey, A.C., Shephard, N.: Estimation of an asymmetric stochastic volatility model for asset returns. J. Bus. Econ. Stat. 14(4), 429–434 (1996). https://doi.org/10.2307/1392251
32. Jacod, J., Li, Y., Mykland, P.A., Podolskij, M., Vetter, M.: Microstructure noise in the continuous case: the pre-averaging approach. Stochast. Process. Appl. 119(7), 2249–2276 (2009). https://doi.org/10.1016/j.spa.2008.11.004
33. Kim, S., Shephard, N., Chib, S.: Stochastic volatility: likelihood inference and comparison with ARCH models. Rev. Econ. Stud. 65, 361–393 (1998)
34. Koopman, S.J., Jungbacker, B., Hol, E.: Forecasting daily variability of the S&P 100 stock index using historical, realised and implied volatility measurements. J. Emp. Financ. 12(3), 445–475 (2005)
35. Koopman, S.J., Scharth, M.: The analysis of stochastic volatility in the presence of daily realized measures. J. Fin. Econometrics 11(1), 76–115 (2013). https://doi.org/10.1093/jjfinec/nbs016
36. Leão, W.L., Abanto-Valle, C.A., Chen, M.H.: Bayesian analysis of stochastic volatility-in-mean model with leverage and asymmetrically heavy-tailed error using generalized hyperbolic skew Student’s t-distribution. Stat. Interface 10(4), 529–541 (2017)
37. Liesenfeld, R., Jung, R.C.: Stochastic volatility models: conditional normality versus heavy-tailed distributions. J. Appl. Econometrics 15(2), 137–160 (2000). https://doi.org/10.1002/(SICI)1099-1255(200003/04)15:23.0.CO;2-M
38. Liu, L.Y., Patton, A.J., Sheppard, K.: Does anything beat 5-minute RV? A comparison of realized measures across multiple asset classes. J. Econometrics 187(1), 293–311 (2015)
39. Melino, A., Turnbull, S.M.: Pricing foreign currency options with stochastic volatility. J. Econometrics 45(1), 239–265 (1990). https://doi.org/10.1016/0304-4076(90)90100-8
40. Nakajima, J., Omori, Y.: Stochastic volatility model with leverage and asymmetrically heavy-tailed error using GH skew Student’s t-distribution. Comput. Stat. Data Anal. 56(11), 3690–3704 (2012). https://doi.org/10.1016/j.csda.2010.07.012
41. Nelson, D.B.: Conditional heteroskedasticity in asset returns: a new approach. Econometrica 59(2), 347–370 (1991)
42. Nugroho, D.B., Morimoto, T.: Realized non-linear stochastic volatility models with asymmetric effects and generalized Student’s t-distributions. J. Japan. Stat. Soc. 44(1), 83–118 (2014)
43. Nugroho, D.B., Morimoto, T.: Box-Cox realized asymmetric stochastic volatility models with generalized Student’s t-error distributions. J. Appl. Stat. 43(10), 1906–1927 (2016)
44. Omori, Y., Chib, S., Shephard, N., Nakajima, J.: Stochastic volatility with leverage: fast and efficient likelihood inference. J. Econometrics 140(2), 425–449 (2007). https://doi.org/10.1016/j.jeconom.2006.07.008
6
1 Introduction
45. Omori, Y., Watanabe, T.: Block sampler and posterior mode estimation for asymmetric stochastic volatility models. Comput. Stat. Data Anal. 52(6), 2892–2910 (2008). https://doi.org/10. 1016/j.csda.2007.09.001 46. Shephard, N.: Statistical aspects of ARCH and stochastic volatility. In: Cox, D.R., Hinkley, D.V., Barndorff-Nielsen, O.E. (eds.) Time Series Models in Econometrics, Finance and Other Fields, pp. 1–67. Chapman & Hall, New York (1996) 47. Shephard, N., Pitt, M.K.: Likelihood analysis of non-Gaussian measurement time series. Biometrika 84, 653–667 (1997) 48. Takahashi, M., Omori, Y., Watanabe, T.: Estimating stochastic volatility models using daily returns and realized volatility simultaneously. Comput. Stat. Data Anal. 53(6), 2404–2426 (2009). https://doi.org/10.1016/j.csda.2008.07.039 49. Takahashi, M., Watanabe, T., Omori, Y.: Volatility and quantile forecasts by realized stochastic volatility models with generalized hyperbolic distribution. Int. J. Forecasting 32(2), 437–457 (2016). https://doi.org/10.1016/j.ijforecast.2015.07.005 50. Takahashi, M., Watanabe, T., Omori, Y.: Forecasting daily volatility of stock price index using daily returns and realized volatility. Econometrics Stat. (2021). https://doi.org/10.1016/j.ecosta. 2021.08.002 51. Taylor, S.J.: Modelling Financial Time Series. Wiley, New York (1986) 52. Trojan, S.: Regime Switching Stochastic Volatility with Skew, Fat Tails and Leverage using Returns and Realized Volatility Contemporaneously. Discussion Paper Series No. 2013-41, Department of Economics, School of Economics and Political Science, University of St. Gallen (2013) 53. Ubukata, M., Watanabe, T.: Pricing Nikkei 225 options using realized volatility. Jpn. Econ. Rev. 65(4), 431–467 (2014). https://doi.org/10.1111/jere.12024 54. Watanabe, T.: A non-linear filtering approach to stochastic volatility models with an application to daily stock returns. J. Appl. Econometrics 14(2), 101–121 (1999). 
https://doi.org/10.1002/ (SICI)1099-1255(199903/04)14:23.0.CO;2-A 55. Watanabe, T., Omori, Y.: A multi-move sampler for estimating non-gaussian time series models: comments on Shephard & Pitt (1997). Biometrika 91(1), 246–248 (2004) 56. Yu, J.: On leverage in a stochastic volatility model. J. Econometrics 127(2), 165–178 (2005). https://doi.org/10.1016/j.jeconom.2004.08.002 57. Zhang, L.: Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach. Bernoulli 12(6), 1019–1043 (2006) 58. Zhang, L., Mykland, P.A., Aït-Sahalia, Y.: A tale of two time scales: determining integrated volatility with noisy high-frequency data. J. Am. Stat. Assoc. 100(472), 1394–1411 (2005)
Chapter 2
Stochastic Volatility Model
Abstract It is well known in financial markets that return volatility changes randomly with high persistence. This chapter considers a statistical model called a stochastic volatility (SV) model to describe those dynamics in financial time series such as stock returns and foreign exchange returns. In particular, the basic SV model without leverage effects or asymmetry is considered; we call this a symmetric SV model. We describe an efficient Bayesian method using Markov chain Monte Carlo for the estimation of symmetric SV models.
2.1 Introduction

We first introduce the basic SV model. Let $y_t$ and $h_t$ denote the daily asset return and the log volatility at time $t$. The response, $y_t$, is observed, while $h_t$ is the unobserved latent variable. The SV model is given by

$$y_t = \exp(h_t/2)\,\epsilon_t, \quad \epsilon_t \sim \text{i.i.d. } N(0, 1), \quad t = 1, \ldots, n, \tag{2.1}$$

$$h_{t+1} = \mu + \phi(h_t - \mu) + \eta_t, \quad \eta_t \sim \text{i.i.d. } N(0, \sigma_\eta^2), \quad t = 1, \ldots, n-1, \tag{2.2}$$

$$h_1 \sim N\big(\mu, \sigma_\eta^2/(1-\phi^2)\big), \tag{2.3}$$

where $N(a, b)$ denotes a normal distribution with mean $a$ and variance $b$. The disturbances, $\epsilon_t$ and $\eta_t$, are assumed to be independent in the basic SV model, which we call the symmetric SV model. When $\epsilon_t$ and $\eta_t$ are correlated, the SV model is called the asymmetric SV model or the SV model with leverage effects and will be discussed in Chap. 3. We assume a stationary first order autoregressive process for $h_t$ by setting $|\phi| < 1$, and a corresponding stationary distribution for the initial latent volatility $h_1$.

In symmetric SV models, the model parameter is $\boldsymbol{\theta} = (\mu, \phi, \sigma_\eta^2)'$. However, since there are many latent variables, it is difficult to obtain the likelihood function by integrating them out numerically. Thus, for statistical inference of model parameters, we take the Bayesian approach and implement Markov chain Monte Carlo (MCMC) simulation.

There are two major efficient MCMC estimation methods in the sense that they substantially reduce autocorrelations of MCMC samples. One method is the mixture sampler proposed by Kim et al. [9]. This method transforms the model into a linear state-space model and approximates the error distribution using a mixture of normal distributions. The mixture sampler is fast and highly efficient, but its use is limited to models that can be transformed into a linear state-space form. Although the sampler is based on the approximation of the measurement equation, we describe how we correct the approximation errors using the posterior resampling of the MCMC samples or by introducing the pseudotarget density where our posterior density is its marginal density. The other method is the block sampler or the multi-move sampler proposed by Shephard and Pitt [14] and Watanabe and Omori [15], which generates a block of state variables given other parameters and latent variables. In contrast to the mixture sampler, it can be applied to the model directly, without the need to transform it into a linear state-space form.

The rest of this chapter is organized as follows. In Sect. 2.2, we first describe the single-move sampler where we sample one $h_t$ at a time given $\{h_s\}_{s \neq t}$ and parameters. It is a simple sampling method, but is known to be inefficient since it produces highly autocorrelated MCMC samples. Next, we discuss the mixture sampler in Sect. 2.3, and the multi-move sampler in Sect. 2.4. Further, the particle filter is introduced to compute the likelihood of the SV model in Sect. 2.5. Finally, an empirical study is presented in Sect. 2.6.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Takahashi et al., Stochastic Volatility and Realized Stochastic Volatility Models, JSS Research Series in Statistics, https://doi.org/10.1007/978-981-99-0935-3_2
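To make the data-generating process in (2.1)–(2.3) concrete, the following minimal Python sketch simulates a path of log volatilities and returns. The function name `simulate_sv` and the parameter values are illustrative assumptions, not part of the original text.

```python
import math
import random


def simulate_sv(n, mu, phi, sigma_eta, seed=0):
    """Simulate (y, h) from the symmetric SV model (2.1)-(2.3)."""
    if not abs(phi) < 1.0:
        raise ValueError("stationarity requires |phi| < 1")
    rng = random.Random(seed)
    # (2.3): h_1 is drawn from the stationary distribution of the AR(1) process.
    h = [rng.gauss(mu, sigma_eta / math.sqrt(1.0 - phi ** 2))]
    # (2.2): h_{t+1} = mu + phi * (h_t - mu) + eta_t for t = 1, ..., n - 1.
    for _ in range(n - 1):
        h.append(mu + phi * (h[-1] - mu) + rng.gauss(0.0, sigma_eta))
    # (2.1): y_t = exp(h_t / 2) * eps_t with eps_t ~ N(0, 1).
    y = [math.exp(h_t / 2.0) * rng.gauss(0.0, 1.0) for h_t in h]
    return y, h


# Illustrative parameter values (an assumption, not estimates from the text).
y, h = simulate_sv(n=1000, mu=-1.0, phi=0.95, sigma_eta=0.2, seed=1)
```

Only `y` would be available in practice; `h` is returned here so that a sampler can be checked against the true latent path.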
2.2 Single-Move Sampler for the Symmetric SV Model

For the prior distribution of $\boldsymbol{\theta}$, we assume

$$\mu \sim N(\mu_0, \sigma_0^2), \quad (\phi + 1)/2 \sim \text{Beta}(a, b), \quad \sigma_\eta^2 \sim \text{IG}(n_0/2, s_0/2),$$

where $\text{Beta}(a, b)$ denotes a beta distribution with parameters $(a, b)$, and $\text{IG}(a/2, b/2)$ denotes an inverse gamma distribution with the probability density function

$$\pi(\sigma_\eta^2) \propto (\sigma_\eta^2)^{-(a/2+1)} \exp\left(-\frac{b}{2\sigma_\eta^2}\right).$$
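Let $\pi(\boldsymbol{\theta})$ denote the prior density of $\boldsymbol{\theta}$. Further, let $f(\mathbf{y} \mid \mathbf{h})$ and $f(\mathbf{h} \mid \boldsymbol{\theta})$ denote the densities corresponding to (2.1), and (2.2)–(2.3), respectively, given by

$$f(\mathbf{y} \mid \mathbf{h}) = \prod_{t=1}^{n} (2\pi)^{-1/2} \exp\left\{-\frac{1}{2}\left[h_t + y_t^2 \exp(-h_t)\right]\right\},$$

$$f(\mathbf{h} \mid \boldsymbol{\theta}) = (2\pi\sigma_\eta^2)^{-n/2} \sqrt{1-\phi^2}\, \exp\left\{-\frac{1}{2\sigma_\eta^2}\left[(1-\phi^2)(h_1-\mu)^2 + \sum_{t=1}^{n-1}\big(h_{t+1} - \mu(1-\phi) - \phi h_t\big)^2\right]\right\},$$

where $\mathbf{y} = (y_1, \ldots, y_n)'$ and $\mathbf{h} = (h_1, \ldots, h_n)'$. Then, the joint posterior density function is given by

$$\pi(\mathbf{h}, \boldsymbol{\theta} \mid \mathbf{y}) \propto f(\mathbf{y} \mid \mathbf{h})\, f(\mathbf{h} \mid \boldsymbol{\theta})\, \pi(\boldsymbol{\theta})$$

$$\propto (1+\phi)^{a-1/2} (1-\phi)^{b-1/2} (\sigma_\eta^2)^{-(n_1/2+1)} \exp\left\{-\frac{s_0}{2\sigma_\eta^2} - \frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right\}$$

$$\times \exp\left\{-\frac{1}{2}\sum_{t=1}^{n}\left[h_t + y_t^2 \exp(-h_t)\right]\right\}$$

$$\times \exp\left\{-\frac{1}{2\sigma_\eta^2}\left[(1-\phi^2)(h_1-\mu)^2 + \sum_{t=1}^{n-1}\big(h_{t+1} - \mu(1-\phi) - \phi h_t\big)^2\right]\right\},$$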
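where $n_1 = n + n_0$. The MCMC algorithm consists of six blocks. Let $\boldsymbol{\theta}_{\backslash\mu}$ denote $\boldsymbol{\theta}$ excluding $\mu$, and define $\boldsymbol{\theta}_{\backslash\phi}$, $\boldsymbol{\theta}_{\backslash\sigma_\eta^2}$ in a similar manner.

Step 1. Initialize $\boldsymbol{\theta} = (\mu, \phi, \sigma_\eta^2)'$, and $\mathbf{h}$.
Step 2. Generate $\phi \sim \pi(\phi \mid \boldsymbol{\theta}_{\backslash\phi}, \mathbf{h}, \mathbf{y})$.
Step 3. Generate $\sigma_\eta^2 \sim \pi(\sigma_\eta^2 \mid \boldsymbol{\theta}_{\backslash\sigma_\eta^2}, \mathbf{h}, \mathbf{y})$.
Step 4. Generate $\mu \sim \pi(\mu \mid \boldsymbol{\theta}_{\backslash\mu}, \mathbf{h}, \mathbf{y})$.
Step 5. Generate $\mathbf{h} \sim \pi(\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{y})$.
Step 6. Go to Step 2.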
The details of the algorithm are described below.
2.2.1 Generation of $\boldsymbol{\theta} = (\mu, \phi, \sigma_\eta^2)'$

Step 2. Generation of $\phi$. The full conditional posterior density of $\phi$ is

$$\pi(\phi \mid \cdot) \propto (1+\phi)^{a-1/2} (1-\phi)^{b-1/2} \exp\left\{-\frac{(\phi-\mu_\phi)^2}{2\sigma_\phi^2}\right\},$$

where

$$\mu_\phi = \frac{\sum_{t=1}^{n-1}(h_{t+1}-\mu)(h_t-\mu)}{\sum_{t=2}^{n-1}(h_t-\mu)^2}, \quad \sigma_\phi^2 = \frac{\sigma_\eta^2}{\sum_{t=2}^{n-1}(h_t-\mu)^2}.$$

Let $\text{TN}_{(-1,1)}(\mu_\phi, \sigma_\phi^2)$ denote the normal distribution with mean $\mu_\phi$ and variance $\sigma_\phi^2$ truncated over the interval $(-1, 1)$. We apply the Metropolis–Hastings (MH) algorithm to generate $\phi$. Given the current sample $\phi_x$, we generate a candidate $\phi_y \sim \text{TN}_{(-1,1)}(\mu_\phi, \sigma_\phi^2)$ and accept it with probability

$$\min\left\{\frac{(1+\phi_y)^{a-1/2}(1-\phi_y)^{b-1/2}}{(1+\phi_x)^{a-1/2}(1-\phi_x)^{b-1/2}},\ 1\right\}.$$

Step 3. Generation of $\sigma_\eta^2$. The full conditional posterior distribution of $\sigma_\eta^2$ is the inverse gamma distribution as follows.

$$\sigma_\eta^2 \mid \cdot \sim \text{IG}(n_1/2, s_1/2),$$

where

$$s_1 = s_0 + (1-\phi^2)(h_1-\mu)^2 + \sum_{t=1}^{n-1}\big(h_{t+1} - \mu(1-\phi) - \phi h_t\big)^2.$$

Step 4. Generation of $\mu$. The full conditional posterior distribution of $\mu$ is a normal distribution as follows.

$$\mu \mid \cdot \sim N(\mu_1, \sigma_1^2),$$

where

$$\mu_1 = \sigma_1^2 \left\{\frac{\mu_0}{\sigma_0^2} + \frac{(1-\phi^2) h_1}{\sigma_\eta^2} + \frac{1-\phi}{\sigma_\eta^2}\sum_{t=1}^{n-1}(h_{t+1} - \phi h_t)\right\},$$

$$\sigma_1^2 = \left\{\frac{1}{\sigma_0^2} + \frac{1-\phi^2 + (n-1)(1-\phi)^2}{\sigma_\eta^2}\right\}^{-1}.$$
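Steps 2–4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `update_theta` is hypothetical, the hyperparameter defaults are illustrative values, the truncated normal candidate is obtained by simple rejection from the untruncated normal, and the inverse gamma draw uses the reciprocal of a gamma variate.

```python
import math
import random


def update_theta(h, mu, phi, sig2, rng,
                 mu0=0.0, sig02=1000.0, a=20.0, b=1.5, n0=0.02, s0=0.02):
    """One pass of Steps 2-4 given h; returns the updated (mu, phi, sig2)."""
    n = len(h)
    # Step 2: MH update of phi with a TN_(-1,1)(mu_phi, sig_phi^2) proposal.
    den = sum((h[t] - mu) ** 2 for t in range(1, n - 1))
    mu_phi = sum((h[t + 1] - mu) * (h[t] - mu) for t in range(n - 1)) / den
    sd_phi = math.sqrt(sig2 / den)
    for _ in range(1000):  # rejection sampling from the truncated normal
        cand = rng.gauss(mu_phi, sd_phi)
        if abs(cand) < 1.0:
            log_ratio = ((a - 0.5) * (math.log1p(cand) - math.log1p(phi))
                         + (b - 0.5) * (math.log1p(-cand) - math.log1p(-phi)))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                phi = cand
            break
    # Step 3: sig2 | . ~ IG(n1 / 2, s1 / 2), drawn as 1 / Gamma(shape, scale).
    s1 = s0 + (1.0 - phi ** 2) * (h[0] - mu) ** 2 + sum(
        (h[t + 1] - mu * (1.0 - phi) - phi * h[t]) ** 2 for t in range(n - 1))
    sig2 = 1.0 / rng.gammavariate((n0 + n) / 2.0, 2.0 / s1)
    # Step 4: mu | . ~ N(mu1, sig1^2).
    prec = 1.0 / sig02 + ((1.0 - phi ** 2) + (n - 1) * (1.0 - phi) ** 2) / sig2
    lin = (mu0 / sig02 + (1.0 - phi ** 2) * h[0] / sig2
           + (1.0 - phi) / sig2 * sum(h[t + 1] - phi * h[t] for t in range(n - 1)))
    mu = rng.gauss(lin / prec, math.sqrt(1.0 / prec))
    return mu, phi, sig2


# Demo on a simulated AR(1) path (the true values are illustrative assumptions).
rng = random.Random(1)
true_mu, true_phi, true_sd = -1.0, 0.95, 0.2
h = [rng.gauss(true_mu, true_sd / math.sqrt(1.0 - true_phi ** 2))]
for _ in range(999):
    h.append(true_mu + true_phi * (h[-1] - true_mu) + rng.gauss(0.0, true_sd))

mu, phi, sig2 = sum(h) / len(h), 0.9, 0.1
draws = []
for it in range(300):
    mu, phi, sig2 = update_theta(h, mu, phi, sig2, rng)
    if it >= 100:  # discard the first 100 passes as burn-in
        draws.append((mu, phi, sig2))
post_mean = [sum(d[k] for d in draws) / len(draws) for k in range(3)]
```

In the full sampler `h` is itself updated in Step 5; holding it fixed here isolates the three parameter updates.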
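2.2.2 Generation of $\mathbf{h}$

Step 5. Generation of $\mathbf{h}$. The full conditional posterior density of $\mathbf{h}$ is

$$\pi(\mathbf{h} \mid \cdot) \propto \exp\left\{-\frac{1}{2}\sum_{t=1}^{n}\left[h_t + y_t^2\exp(-h_t)\right]\right\} \times \exp\left\{-\frac{1}{2\sigma_\eta^2}\left[(1-\phi^2)(h_1-\mu)^2 + \sum_{t=1}^{n-1}\big(h_{t+1} - \mu(1-\phi) - \phi h_t\big)^2\right]\right\}.$$

The single-move sampler generates $h_t$ given other $h_s$ ($s \neq t$) for $t = 1, \ldots, n$. When $1 < t < n$,

$$\pi(h_t \mid \cdot) \propto \exp\left\{-\frac{1}{2}\left[h_t + y_t^2\exp(-h_t)\right]\right\} \times \exp\left\{-\frac{1}{2\sigma_\eta^2}\left[\big(h_t - \mu(1-\phi) - \phi h_{t-1}\big)^2 + \big(h_{t+1} - \mu(1-\phi) - \phi h_t\big)^2\right]\right\}$$

$$\propto \exp\left\{-\frac{1}{2}\left[y_t^2\exp(-h_t) + \frac{(h_t - m_t)^2}{s_t^2}\right]\right\},$$

where

$$m_t = s_t^2\left\{-\frac{1}{2} + \frac{\mu(1-\phi)^2 + \phi(h_{t-1} + h_{t+1})}{\sigma_\eta^2}\right\}, \quad s_t^2 = \frac{\sigma_\eta^2}{1+\phi^2}.$$

When $t = 1$,

$$\pi(h_1 \mid \cdot) \propto \exp\left\{-\frac{1}{2}\left[h_1 + y_1^2\exp(-h_1)\right]\right\} \times \exp\left\{-\frac{1}{2\sigma_\eta^2}\left[(1-\phi^2)(h_1-\mu)^2 + \big(h_2 - \mu(1-\phi) - \phi h_1\big)^2\right]\right\}$$

$$\propto \exp\left\{-\frac{1}{2}\left[y_1^2\exp(-h_1) + \frac{(h_1 - m_1)^2}{s_1^2}\right]\right\},$$

where

$$m_1 = s_1^2\left\{-\frac{1}{2} + \frac{\mu(1-\phi) + \phi h_2}{\sigma_\eta^2}\right\}, \quad s_1^2 = \sigma_\eta^2.$$

When $t = n$,

$$\pi(h_n \mid \cdot) \propto \exp\left\{-\frac{1}{2}\left[h_n + y_n^2\exp(-h_n)\right] - \frac{1}{2\sigma_\eta^2}\big(h_n - \mu(1-\phi) - \phi h_{n-1}\big)^2\right\}$$

$$\propto \exp\left\{-\frac{1}{2}\left[y_n^2\exp(-h_n) + \frac{(h_n - m_n)^2}{s_n^2}\right]\right\},$$

where

$$m_n = s_n^2\left\{-\frac{1}{2} + \frac{\mu(1-\phi) + \phi h_{n-1}}{\sigma_\eta^2}\right\}, \quad s_n^2 = \sigma_\eta^2.$$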
Therefore, we can apply the MH algorithm using $N(m_t, s_t^2)$ as a proposal distribution. Alternatively, noting that

$$\exp\left\{-\frac{y_t^2}{2}\exp(-h_t)\right\} \le \exp\left\{-\frac{y_t^2}{2}(1 - h_t)\right\},$$

we can use $N(m_t + y_t^2 s_t^2/2,\ s_t^2)$ as a proposal distribution of the accept–reject algorithm: multiplying the right-hand side by the Gaussian factor $\exp\{-(h_t - m_t)^2/(2 s_t^2)\}$ and completing the square in $h_t$ shifts the mean by $y_t^2 s_t^2/2$.

However, it is well known that the MCMC samples produced by the single-move sampler are highly autocorrelated and that we need to iterate the MCMC a huge number of times to conduct statistical inference. In this sense, the single-move sampler for sampling $\mathbf{h}$ is inefficient, and we need to consider a more efficient sampling algorithm. Below, we introduce two major efficient samplers. The first is called a mixture sampler since it is based on a normal mixture approximation [9, 11]. The second sampler is called a multi-move sampler since it divides $\mathbf{h}$ into multiple blocks and samples one block given other blocks [12, 14, 15].
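One sweep of the single-move update with the $N(m_t, s_t^2)$ proposal can be sketched as below. Because the proposal equals the Gaussian factor of the full conditional, the MH acceptance ratio reduces to the ratio of the $\exp\{-(y_t^2/2)\exp(-h_t)\}$ terms. The function name `single_move_sweep` and the demo values are illustrative assumptions.

```python
import math
import random


def single_move_sweep(y, h, mu, phi, sig2, rng):
    """Update h_1, ..., h_n in place, one at a time; return the acceptance rate."""
    n, accepted = len(h), 0
    for t in range(n):
        if t == 0:                      # t = 1 in the text's indexing
            s2 = sig2
            m = s2 * (-0.5 + (mu * (1.0 - phi) + phi * h[1]) / sig2)
        elif t == n - 1:                # t = n
            s2 = sig2
            m = s2 * (-0.5 + (mu * (1.0 - phi) + phi * h[n - 2]) / sig2)
        else:                           # 1 < t < n
            s2 = sig2 / (1.0 + phi ** 2)
            m = s2 * (-0.5 + (mu * (1.0 - phi) ** 2
                              + phi * (h[t - 1] + h[t + 1])) / sig2)
        cand = rng.gauss(m, math.sqrt(s2))
        # Independence MH: the Gaussian factor cancels with the proposal, leaving
        # only the exp{-(y_t^2 / 2) exp(-h_t)} term in the acceptance ratio.
        log_acc = -0.5 * y[t] ** 2 * (math.exp(-cand) - math.exp(-h[t]))
        if log_acc >= 0.0 or rng.random() < math.exp(log_acc):
            h[t] = cand
            accepted += 1
    return accepted / n


# Demo with illustrative values: constant true log volatility, 20 sweeps.
rng = random.Random(2)
mu, phi, sig2 = -1.0, 0.95, 0.04
y = [math.exp(mu / 2.0) * rng.gauss(0.0, 1.0) for _ in range(200)]
h = [mu] * 200
rates = [single_move_sweep(y, h, mu, phi, sig2, rng) for _ in range(20)]
```

Each $h_t$ moves only within the narrow band allowed by its two neighbours, which is the source of the high autocorrelation discussed in the text.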
2.3 Mixture Sampler

2.3.1 Reformulation of the Measurement Equation

This section describes sampling $\mathbf{h}$ based on a normal mixture approximation as in [9] and [11]. We consider the transformation of the measurement Eq. (2.1),

$$y_t^* = \log y_t^2 = h_t + \epsilon_t^*, \quad \epsilon_t^* = \log \epsilon_t^2, \tag{2.4}$$

and $y_t^*$ is a linear process (e.g., [8]) with an i.i.d. error $\epsilon_t^*$ in (2.4) that follows a $\log\chi_1^2$ density

$$f(\epsilon_t^*) = \frac{1}{\sqrt{2\pi}} \exp\left\{\frac{\epsilon_t^* - \exp(\epsilon_t^*)}{2}\right\}, \quad \epsilon_t^* \in \mathbb{R}. \tag{2.5}$$

Kim et al. [9] introduced the idea of accurately approximating this distribution using a mixture of normal distributions selected to ensure that moments up to a certain order are equal. In the Bayesian MCMC context, the resulting approximation error can be corrected by reweighting the sequences sampled from the posterior distribution, as we discuss below. The mixture approximation has the form of

$$g(\epsilon_t^*) = \sum_{i=1}^{K} p_i\, f_N(\epsilon_t^* \mid m_i, v_i^2), \quad \epsilon_t^* \in \mathbb{R}, \tag{2.6}$$

where $f_N(\epsilon_t^* \mid m_i, v_i^2)$ denotes the density function of a normal distribution with mean $m_i$ and variance $v_i^2$. Kim et al. [9] determined the constants $m_i$ and $v_i^2$ on the basis of $K = 7$ components. These values are reproduced in the first block of columns in Table 2.1. Omori et al. [11] obtained a tighter approximation based on $K = 10$, which is given in the second block of Table 2.1.
Table 2.1 Selection of $(p_i, m_i, v_i^2)$

        KSC (K = 7) (a)                     K = 10 (b)
  i     p_i       m_i         v_i^2        p_i       m_i         v_i^2
  1     0.04395     1.50746   0.16735      0.00609     1.92677   0.11265
  2     0.24566     0.52478   0.34023      0.04775     1.34744   0.17788
  3     0.34001    -0.65098   0.64009      0.13057     0.73504   0.26768
  4     0.25750    -2.35859   1.26261      0.20674     0.02266   0.40611
  5     0.10556    -5.24321   2.61369      0.22715    -0.85173   0.62699
  6     0.00002    -9.83726   5.17950      0.18842    -1.97278   0.98583
  7     0.00730   -11.40039   5.79596      0.12047    -3.46788   1.57469
  8                                        0.05591    -5.55246   2.54498
  9                                        0.01575    -8.68384   4.16591
 10                                        0.00115   -14.65000   7.33342

(a) Determined by Kim et al. [9]
(b) Better approximation by Omori et al. [11]
Figure 2.1 shows the differences between the approximate and true densities of $\log\chi_1^2$, $\chi_1^2$, and $\sqrt{\chi_1^2}$ (for the range from the 1st percentile to the 99th percentile) for the two mixtures. Further, Fig. 2.2 shows the approximate and true densities for the three cases ($\log\chi_1^2$, $\chi_1^2$, $\sqrt{\chi_1^2}$). It is clear that we have a much better approximation using $K = 10$ components.

Let $s_t \in \{1, 2, \ldots, K\}$ denote the component indicator of the mixture normal distributions in (2.6). Given the component $s_t$, we can express the distribution of $\epsilon_t^*$ as

$$\epsilon_t^* \mid s_t = m_{s_t} + v_{s_t} z_{1t}, \quad z_{1t} \sim \text{i.i.d. } N(0, 1).$$

Given $\mathbf{s} = (s_1, \ldots, s_n)'$, we obtain the approximate SV model as the linear Gaussian state-space model:

$$y_t^* = m_{s_t} + h_t + v_{s_t} z_{1t}, \tag{2.7}$$

$$h_{t+1} = \mu(1-\phi) + \phi h_t + \sigma_\eta z_{2t}, \tag{2.8}$$

where $\mathbf{z}_t = (z_{1t}, z_{2t})' \sim N(\mathbf{0}, \mathbf{I}_2)$.

Assuming the prior distribution for $\boldsymbol{\theta} = (\mu, \phi, \sigma_\eta^2)'$ as in Sect. 2.2, we can implement MCMC simulation using the above approximate model.
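The accuracy of the two mixtures can be checked numerically. The sketch below evaluates the exact density (2.5) and the mixture density (2.6) with the constants of Table 2.1 on a grid and records the largest absolute difference; the grid range and the names `KSC7`, `K10`, and `max_err` are illustrative assumptions.

```python
import math

# (p_i, m_i, v_i^2) from Table 2.1.
KSC7 = [
    (0.04395, 1.50746, 0.16735), (0.24566, 0.52478, 0.34023),
    (0.34001, -0.65098, 0.64009), (0.25750, -2.35859, 1.26261),
    (0.10556, -5.24321, 2.61369), (0.00002, -9.83726, 5.17950),
    (0.00730, -11.40039, 5.79596),
]
K10 = [
    (0.00609, 1.92677, 0.11265), (0.04775, 1.34744, 0.17788),
    (0.13057, 0.73504, 0.26768), (0.20674, 0.02266, 0.40611),
    (0.22715, -0.85173, 0.62699), (0.18842, -1.97278, 0.98583),
    (0.12047, -3.46788, 1.57469), (0.05591, -5.55246, 2.54498),
    (0.01575, -8.68384, 4.16591), (0.00115, -14.65000, 7.33342),
]


def log_chi2_density(e):
    """Exact density (2.5) of eps* = log(eps^2) with eps ~ N(0, 1)."""
    return math.exp(0.5 * (e - math.exp(e))) / math.sqrt(2.0 * math.pi)


def mixture_density(e, table):
    """Normal mixture approximation (2.6)."""
    return sum(p * math.exp(-0.5 * (e - m) ** 2 / v2) / math.sqrt(2.0 * math.pi * v2)
               for p, m, v2 in table)


# Grid on [-8, 2], roughly the 1st to 99th percentile range of log chi-square(1).
grid = [-8.0 + 0.01 * k for k in range(1001)]
max_err = {name: max(abs(mixture_density(e, tab) - log_chi2_density(e)) for e in grid)
           for name, tab in (("KSC7", KSC7), ("K10", K10))}
```

Both mixtures also reproduce the mean of the $\log\chi_1^2$ distribution, approximately $-1.2704$, through $\sum_i p_i m_i$.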
Fig. 2.1 Difference between the approximate and true densities (from the 1st percentile to the 99th percentile) of $\log\chi_1^2$ (top), $\chi_1^2$ (middle), and $\sqrt{\chi_1^2}$ (bottom). Left panels (a), (c), (e): KSC ($K = 7$); right panels (b), (d), (f): $K = 10$

Fig. 2.2 Approximate and the true densities of $\log\chi_1^2$ (left), $\chi_1^2$ (middle), and $\sqrt{\chi_1^2}$ (right). Legend: True, New ($K = 10$), KSC ($K = 7$)
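2.3.2 MCMC Algorithm

Let $\mathbf{y}^* = (y_1^*, \ldots, y_n^*)'$. The simple way to generate $\boldsymbol{\theta} = (\mu, \phi, \sigma_\eta^2)'$ from its posterior distribution is as follows.

1. Initialize $\mathbf{s}$, $\mathbf{h}$, and $\boldsymbol{\theta}$.
2. Generate $\mathbf{s} \mid \mathbf{h}, \boldsymbol{\theta}, \mathbf{y}^*$.
3. Generate $(\mathbf{h}, \boldsymbol{\theta}) \mid \mathbf{s}, \mathbf{y}^*$ by
   a. Sampling $\phi \mid \boldsymbol{\theta}_{\backslash\phi}, \mathbf{h}$.
   b. Sampling $\sigma_\eta^2 \mid \boldsymbol{\theta}_{\backslash\sigma_\eta^2}, \mathbf{h}$.
   c. Sampling $\mu \mid \boldsymbol{\theta}_{\backslash\mu}, \mathbf{h}$.
   d. Sampling $\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{s}, \mathbf{y}^*$.
4. Go to Step 2.

Step 2. Generation of $\mathbf{s}$. For $t = 1, \ldots, n$, we generate $s_t$ from the discrete posterior distribution

$$\Pr(s_t \mid \cdot) \propto p_{s_t}\, f_N(y_t^* \mid m_{s_t} + h_t, v_{s_t}^2), \quad s_t = 1, \ldots, K,$$

where $f_N(\cdot \mid m, v)$ denotes a density of $N(m, v)$.

Step 3a–3c. Generation of $\boldsymbol{\theta} = (\mu, \phi, \sigma_\eta^2)'$. Sampling algorithms for $\boldsymbol{\theta}$ are the same as in Sect. 2.2.1.

Step 3d. Generation of $\mathbf{h}$. Using the simulation smoother proposed by de Jong and Shephard [3] or Durbin and Koopman [6] for the linear Gaussian state-space model, Eqs. (2.7) and (2.8), we can generate $\mathbf{h}$ all at once. This substantially improves the sampling efficiency of the MCMC algorithm. See Sect. 2.7.1 for the details of the simulation smoother.

Further, we can improve the efficiency by replacing Step 3 as below.

3'. Generate $(\mathbf{h}, \boldsymbol{\theta}) \mid \mathbf{s}, \mathbf{y}^*$ by
   a. Sampling $(\phi, \sigma_\eta^2) \mid \mathbf{s}, \mathbf{y}^*$.
   b. Sampling $\mu \mid \boldsymbol{\theta}_{\backslash\mu}, \mathbf{s}, \mathbf{y}^*$.
   c. Sampling $\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{s}, \mathbf{y}^*$.

Step 3'a. Generation of $(\phi, \sigma_\eta^2)'$. To sample $\boldsymbol{\vartheta} = \boldsymbol{\theta}_{\backslash\mu} = (\phi, \sigma_\eta^2)'$ from the approximate posterior distribution $\pi^*(\boldsymbol{\vartheta} \mid \mathbf{s}, \mathbf{y}^*) \propto g(\mathbf{y}^* \mid \boldsymbol{\vartheta}, \mathbf{s})\,\pi(\boldsymbol{\vartheta})$ using the MH algorithm, where

$$g(\mathbf{y}^* \mid \boldsymbol{\vartheta}, \mathbf{s}) = \int\!\!\int \prod_{t=1}^{n} f_N(y_t^* \mid m_{s_t} + h_t, v_{s_t}^2)\, f(\mathbf{h} \mid \boldsymbol{\theta})\, \pi(\mu)\, d\mathbf{h}\, d\mu,$$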
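we evaluate $g(\mathbf{y}^* \mid \boldsymbol{\vartheta}, \mathbf{s})$ using an augmented Kalman filter as shown in Sect. 2.7.2, and compute the posterior mode $\hat{\boldsymbol{\vartheta}}$. Define $\boldsymbol{\vartheta}_*$ and $\boldsymbol{\Sigma}_*$ as

$$\boldsymbol{\vartheta}_* = \hat{\boldsymbol{\vartheta}}, \quad \boldsymbol{\Sigma}_*^{-1} = -\left.\frac{\partial^2 \log \pi(\boldsymbol{\vartheta} \mid \mathbf{s}, \mathbf{y}^*)}{\partial\boldsymbol{\vartheta}\,\partial\boldsymbol{\vartheta}'}\right|_{\boldsymbol{\vartheta} = \hat{\boldsymbol{\vartheta}}}.$$

Generate a candidate $\boldsymbol{\vartheta}_y$ from the distribution $N(\boldsymbol{\vartheta}_*, \boldsymbol{\Sigma}_*)$ truncated over $R = \{\boldsymbol{\vartheta} : |\phi| < 1, \sigma_\eta^2 > 0\}$. Let $\boldsymbol{\vartheta}_x$ denote the current point of $\boldsymbol{\vartheta}$. We accept the candidate $\boldsymbol{\vartheta}_y$ with probability

$$\alpha(\boldsymbol{\vartheta}_x, \boldsymbol{\vartheta}_y \mid \mathbf{s}, \mathbf{y}^*) = \min\left\{\frac{\pi(\boldsymbol{\vartheta}_y \mid \mathbf{s}, \mathbf{y}^*)\, f_{\text{TN}}(\boldsymbol{\vartheta}_x \mid \boldsymbol{\vartheta}_*, \boldsymbol{\Sigma}_*)}{\pi(\boldsymbol{\vartheta}_x \mid \mathbf{s}, \mathbf{y}^*)\, f_{\text{TN}}(\boldsymbol{\vartheta}_y \mid \boldsymbol{\vartheta}_*, \boldsymbol{\Sigma}_*)},\ 1\right\},$$

where $f_{\text{TN}}$ denotes the density of the truncated normal distribution for the proposal above. If the candidate $\boldsymbol{\vartheta}_y$ is rejected, we take the current value $\boldsymbol{\vartheta}_x$ as the next draw.

Remark. If it is difficult to obtain the mode $\hat{\boldsymbol{\vartheta}}$ and generate from the truncated normal distribution, consider a reparameterization such as $\log\{(1+\phi)/(1-\phi)\}$ and $\log\sigma_\eta^2$. Note that we need to include the Jacobian term in the MH algorithm for the transformed parameters.

Step 3'b. Generation of $\mu$. Generate $\mu \mid \boldsymbol{\theta}_{\backslash\mu}, \mathbf{s}, \mathbf{y}^* \sim N(c_1, C_1)$, where $c_1$ and $C_1$ are computed using the byproducts of the augmented Kalman filter (see Sect. 2.7.2).

Step 3'c. Generation of $\mathbf{h}$. The sampling algorithm is the same as in Step 3d.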
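Step 2 of the algorithm above, the draw of the mixture indicators, is the only step specific to the mixture sampler that needs no state-space machinery. A minimal sketch follows, using the $K = 10$ constants of Table 2.1 and working on the log scale to avoid underflow; the function name `sample_indicators` is an assumption, and the returned indices are zero-based, so index $k$ corresponds to $s_t = k + 1$.

```python
import math
import random

# (p_i, m_i, v_i^2) for K = 10 from Table 2.1.
K10 = [
    (0.00609, 1.92677, 0.11265), (0.04775, 1.34744, 0.17788),
    (0.13057, 0.73504, 0.26768), (0.20674, 0.02266, 0.40611),
    (0.22715, -0.85173, 0.62699), (0.18842, -1.97278, 0.98583),
    (0.12047, -3.46788, 1.57469), (0.05591, -5.55246, 2.54498),
    (0.01575, -8.68384, 4.16591), (0.00115, -14.65000, 7.33342),
]


def sample_indicators(y_star, h, rng):
    """Draw s_t with Pr(s_t = i) proportional to p_i f_N(y*_t | m_i + h_t, v_i^2)."""
    s = []
    for ys, ht in zip(y_star, h):
        log_w = [math.log(p) - 0.5 * math.log(v2) - 0.5 * (ys - ht - m) ** 2 / v2
                 for p, m, v2 in K10]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]  # largest weight is 1
        s.append(rng.choices(range(len(K10)), weights=weights)[0])
    return s


# Illustrative inputs: y*_t = log(y_t^2) and a current draw of h.
rng = random.Random(0)
y_star = [-1.2, 0.3, -3.5, -0.8]
h = [-1.0, -0.9, -1.1, -1.0]
s = sample_indicators(y_star, h, rng)
```

Given `s`, each observation has a known Gaussian error with mean $m_{s_t}$ and variance $v_{s_t}^2$, which is what allows $\mathbf{h}$ to be drawn in one block by a simulation smoother.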
2.3.3 Correcting for Misspecification

2.3.3.1 Posterior Resampling

In our approach, we approximate the true density $f(\epsilon_t^*)$ with our convenient mixture density $g(\epsilon_t^*)$. Thus, the draws from our MCMC procedure

$$\mathbf{h}^j,\ \boldsymbol{\theta}^j, \quad j = 1, 2, \ldots, M,$$

are from the approximate posterior density $\pi^*(\mathbf{h}, \boldsymbol{\theta} \mid \mathbf{y}^*) \propto g(\mathbf{y}^* \mid \mathbf{h})\, f(\mathbf{h} \mid \boldsymbol{\theta})\, \pi(\boldsymbol{\theta})$, where

$$g(\mathbf{y}^* \mid \mathbf{h}) = \prod_{t=1}^{n}\left\{\sum_{i=1}^{K} p_i\, f_N(y_t^* \mid m_i + h_t, v_i^2)\right\}.$$

To produce draws from the correct posterior density $\pi(\mathbf{h}, \boldsymbol{\theta} \mid \mathbf{y})$, we simply reweight the sampled draws. Define

$$\epsilon_t^{*j} = y_t^* - h_t^j.$$

Then, we compute the weights

$$w_j^* = \prod_{t=1}^{n} \frac{f(\epsilon_t^{*j})}{g(\epsilon_t^{*j})}, \quad j = 1, 2, \ldots, M,$$

where $f$ and $g$ are defined in (2.5) and (2.6), respectively, and let

$$w_j = \frac{w_j^*}{\sum_{i=1}^{M} w_i^*}.$$

We can now produce a sample from $\pi(\mathbf{h}, \boldsymbol{\theta} \mid \mathbf{y})$ by resampling the sampled variates with weights $w_j$. Furthermore, posterior moments can be computed via weighted averaging of the MCMC draws. As shown in [11], the variance of these weights is small, and the effect of reweighting is modest. This is because our approximation is highly accurate [9, 11].
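The reweighting step can be sketched as follows. Products over $t$ are accumulated as sums of logs and normalized with the log-sum-exp device, since $w_j^*$ itself can underflow for long series. The function names and the toy inputs are illustrative assumptions; the mixture constants are those of Table 2.1 with $K = 10$.

```python
import math

# (p_i, m_i, v_i^2) for K = 10 from Table 2.1.
K10 = [
    (0.00609, 1.92677, 0.11265), (0.04775, 1.34744, 0.17788),
    (0.13057, 0.73504, 0.26768), (0.20674, 0.02266, 0.40611),
    (0.22715, -0.85173, 0.62699), (0.18842, -1.97278, 0.98583),
    (0.12047, -3.46788, 1.57469), (0.05591, -5.55246, 2.54498),
    (0.01575, -8.68384, 4.16591), (0.00115, -14.65000, 7.33342),
]
LOG_2PI = math.log(2.0 * math.pi)


def log_f(e):
    """Log of the exact log chi-square(1) density (2.5)."""
    return -0.5 * LOG_2PI + 0.5 * (e - math.exp(e))


def log_g(e):
    """Log of the K = 10 mixture density (2.6), via log-sum-exp."""
    terms = [math.log(p) - 0.5 * (LOG_2PI + math.log(v2)) - 0.5 * (e - m) ** 2 / v2
             for p, m, v2 in K10]
    top = max(terms)
    return top + math.log(sum(math.exp(x - top) for x in terms))


def resampling_weights(y_star, h_draws):
    """Normalized weights w_j for draws h^1, ..., h^M of the log volatilities."""
    log_w = [sum(log_f(ys - ht) - log_g(ys - ht) for ys, ht in zip(y_star, h_j))
             for h_j in h_draws]
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [wj / total for wj in w]


# Toy inputs (illustrative): n = 4 observations and M = 3 draws of h.
y_star = [-1.2, 0.3, -3.5, -0.8]
h_draws = [[-1.0, -0.9, -1.1, -1.0], [-0.5, -0.6, -0.7, -0.8], [-1.5, -1.4, -1.3, -1.2]]
w = resampling_weights(y_star, h_draws)
```

A weighted posterior mean of any function of the draws is then `sum(wj * value_j)`, and `random.choices(draws, weights=w, k=M)` gives a resampled set.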
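2.3.3.2 Exact Sampling Using a Pseudotarget Density

To generate $(\mathbf{h}, \boldsymbol{\theta})$ from the true conditional posterior density $\pi(\mathbf{h}, \boldsymbol{\theta} \mid \mathbf{y})$, we define the pseudotarget density as discussed in [4] and implement the MCMC algorithm. Define the pseudotarget density

$$\tilde{\pi}(\mathbf{h}, \boldsymbol{\theta}, \mathbf{s} \mid \mathbf{y}) = \pi(\mathbf{h}, \boldsymbol{\theta} \mid \mathbf{y}) \times q(\mathbf{s} \mid \mathbf{h}, \boldsymbol{\theta}, \mathbf{y}^*),$$

$$q(\mathbf{s} \mid \mathbf{h}, \boldsymbol{\theta}, \mathbf{y}^*) = \prod_{t=1}^{n} \frac{p_{s_t}\, g(y_t^* \mid h_t, \boldsymbol{\theta}, s_t)}{\sum_{i=1}^{10} p_i\, g(y_t^* \mid h_t, \boldsymbol{\theta}, s_t = i)},$$

where we set $g(y_t^* \mid h_t, \boldsymbol{\theta}, s_t) = f_N(y_t^* \mid m_{s_t} + h_t, v_{s_t}^2)$, and the marginal density $\pi(\mathbf{h}, \boldsymbol{\theta} \mid \mathbf{y})$ is our target density. The MCMC algorithm is implemented in four blocks.

1. Initialize $\mathbf{h}$ and $\boldsymbol{\theta}$.
2. Generate $(\boldsymbol{\theta}, \mathbf{s}) \mid \mathbf{h}, \mathbf{y} \sim \tilde{\pi}(\boldsymbol{\theta}, \mathbf{s} \mid \mathbf{h}, \mathbf{y})$.
   a. Generate $\boldsymbol{\theta} \mid \mathbf{h} \sim \pi(\boldsymbol{\theta} \mid \mathbf{h}, \mathbf{y})$ as below. Note that $\pi(\boldsymbol{\theta} \mid \mathbf{h}, \mathbf{y}) \propto \sum_{\mathbf{s}} \tilde{\pi}(\mathbf{h}, \mathbf{s} \mid \boldsymbol{\theta}, \mathbf{y})$. We consider the transformation

$$\boldsymbol{\vartheta} = (\vartheta_1, \vartheta_2, \vartheta_3)' = \big(\mu,\ \log\{(1+\phi)/(1-\phi)\},\ \log\sigma_\eta^2\big)',$$

   where

$$\left|\frac{d\boldsymbol{\vartheta}}{d\boldsymbol{\theta}}\right| = \frac{2}{(1-\phi^2)\,\sigma_\eta^2}, \quad \phi = \frac{\exp(\vartheta_2) - 1}{\exp(\vartheta_2) + 1}, \quad \sigma_\eta^2 = \exp(\vartheta_3).$$

   To sample from the conditional posterior distribution $\pi(\boldsymbol{\vartheta} \mid \mathbf{h}, \mathbf{y}) = \pi(\boldsymbol{\theta} \mid \mathbf{h}, \mathbf{y}) \times |d\boldsymbol{\theta}/d\boldsymbol{\vartheta}|$ using the MH algorithm, compute the posterior mode $\hat{\boldsymbol{\vartheta}}$. Define $\boldsymbol{\vartheta}_*$ and $\boldsymbol{\Sigma}_*$ as

$$\boldsymbol{\vartheta}_* = \hat{\boldsymbol{\vartheta}}, \quad \boldsymbol{\Sigma}_*^{-1} = -\left.\frac{\partial^2 \log \pi(\boldsymbol{\vartheta} \mid \mathbf{h}, \mathbf{y})}{\partial\boldsymbol{\vartheta}\,\partial\boldsymbol{\vartheta}'}\right|_{\boldsymbol{\vartheta} = \hat{\boldsymbol{\vartheta}}},$$

   and generate a candidate $\boldsymbol{\vartheta}_y$ from the distribution $N(\boldsymbol{\vartheta}_*, \boldsymbol{\Sigma}_*)$. Let $\boldsymbol{\vartheta}_x$ denote the current point of $\boldsymbol{\vartheta}$. We accept the candidate $\boldsymbol{\vartheta}_y$ with probability

$$\alpha(\boldsymbol{\vartheta}_x, \boldsymbol{\vartheta}_y \mid \mathbf{h}, \mathbf{y}) = \min\left\{\frac{\pi(\boldsymbol{\vartheta}_y \mid \mathbf{h}, \mathbf{y})\, f_N(\boldsymbol{\vartheta}_x \mid \boldsymbol{\vartheta}_*, \boldsymbol{\Sigma}_*)}{\pi(\boldsymbol{\vartheta}_x \mid \mathbf{h}, \mathbf{y})\, f_N(\boldsymbol{\vartheta}_y \mid \boldsymbol{\vartheta}_*, \boldsymbol{\Sigma}_*)},\ 1\right\},$$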
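   where $f_N$ denotes the density of the normal distribution for the proposal above. If the candidate $\boldsymbol{\vartheta}_y$ is rejected, we take the current value $\boldsymbol{\vartheta}_x$ as the next draw. Finally, transform $\boldsymbol{\vartheta}$ back to $\boldsymbol{\theta}$.
   b. Generate $\mathbf{s} \mid \mathbf{h}, \boldsymbol{\theta}, \mathbf{y}^* \sim q(\mathbf{s} \mid \mathbf{h}, \boldsymbol{\theta}, \mathbf{y}^*)$.
3. Generate $\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{s}, \mathbf{y} \sim \tilde{\pi}(\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{s}, \mathbf{y})$.
   a. Propose a candidate $\mathbf{h}^\dagger = (h_1^\dagger, \ldots, h_n^\dagger)'$ using a simulation smoother for the linear Gaussian state-space model:

$$y_t^* = m_{s_t} + h_t + (v_{s_t}, 0)\,\mathbf{u}_t^*, \quad t = 1, \ldots, n,$$

$$h_{t+1} = \mu(1-\phi) + \phi h_t + (0, \sigma_\eta)\,\mathbf{u}_t^*, \quad t = 1, \ldots, n-1,$$

$$h_1 \sim N\big(\mu, \sigma_\eta^2/(1-\phi^2)\big), \quad |\phi| < 1, \quad \mathbf{u}_t^* \sim \text{i.i.d. } N(\mathbf{0}, \mathbf{I}_2).$$

   Thus, $\mathbf{h}^\dagger$ is a sample from

$$\pi^*(\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{s}, \mathbf{y}^*) = \frac{1}{m(\mathbf{y}^* \mid \boldsymbol{\theta}, \mathbf{s})} \prod_{t=1}^{n} g(y_t^* \mid h_t, \boldsymbol{\theta}, s_t) \times g(\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{s}),$$

   where

$$g(\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{s}) = \prod_{t=1}^{n-1} f_N\big(h_{t+1} \mid \mu + \phi(h_t - \mu), \sigma_\eta^2\big) \times f_N\big(h_1 \mid \mu, \sigma_\eta^2/(1-\phi^2)\big),$$

   and $m(\mathbf{y}^* \mid \boldsymbol{\theta}, \mathbf{s})$ is a normalizing constant given by

$$m(\mathbf{y}^* \mid \boldsymbol{\theta}, \mathbf{s}) = \int \prod_{t=1}^{n} g(y_t^* \mid h_t, \boldsymbol{\theta}, s_t) \times g(\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{s})\, d\mathbf{h}.$$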
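   b. Given the current value $\mathbf{h}$, accept the candidate $\mathbf{h}^\dagger$ with probability

$$\min\left\{1,\ \frac{\tilde{\pi}(\mathbf{h}^\dagger \mid \boldsymbol{\theta}, \mathbf{s}, \mathbf{y})\, \pi^*(\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{s}, \mathbf{y}^*)}{\tilde{\pi}(\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{s}, \mathbf{y})\, \pi^*(\mathbf{h}^\dagger \mid \boldsymbol{\theta}, \mathbf{s}, \mathbf{y}^*)}\right\}$$

$$= \min\left\{1,\ \frac{\pi(\mathbf{h}^\dagger \mid \boldsymbol{\theta}, \mathbf{y})\, q(\mathbf{s} \mid \mathbf{h}^\dagger, \boldsymbol{\theta}, \mathbf{y}^*)\, \pi^*(\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{s}, \mathbf{y}^*)}{\pi(\mathbf{h} \mid \boldsymbol{\theta}, \mathbf{y})\, q(\mathbf{s} \mid \mathbf{h}, \boldsymbol{\theta}, \mathbf{y}^*)\, \pi^*(\mathbf{h}^\dagger \mid \boldsymbol{\theta}, \mathbf{s}, \mathbf{y}^*)}\right\}$$

$$= \min\left\{1,\ \frac{q(\mathbf{s} \mid \mathbf{h}^\dagger, \boldsymbol{\theta}, \mathbf{y}^*) \prod_{t=1}^{n} f_N\big(y_t \mid 0, \exp(h_t^\dagger)\big)\, g(y_t^* \mid h_t, \boldsymbol{\theta}, s_t)}{q(\mathbf{s} \mid \mathbf{h}, \boldsymbol{\theta}, \mathbf{y}^*) \prod_{t=1}^{n} f_N\big(y_t \mid 0, \exp(h_t)\big)\, g(y_t^* \mid h_t^\dagger, \boldsymbol{\theta}, s_t)}\right\}$$

$$= \min\left\{1,\ \frac{\prod_{t=1}^{n} f_N\big(y_t \mid 0, \exp(h_t^\dagger)\big) \sum_{i=1}^{10} p_i\, g(y_t^* \mid h_t, \boldsymbol{\theta}, s_t = i)}{\prod_{t=1}^{n} f_N\big(y_t \mid 0, \exp(h_t)\big) \sum_{i=1}^{10} p_i\, g(y_t^* \mid h_t^\dagger, \boldsymbol{\theta}, s_t = i)}\right\}.$$

4. Go to Step 2.

Remark. The exact sampling using a pseudotarget density is available in the R package ASV for stochastic volatility models, which estimates model parameters and computes the logarithm of the likelihood and the marginal likelihood (see, e.g., https://sites.google.com/view/omori-stat/english/software/asv-r).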
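The final form of the acceptance probability in Step 3b depends only on $\mathbf{y}$, the current $\mathbf{h}$, and the candidate $\mathbf{h}^\dagger$, so it is easy to evaluate on the log scale. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the candidate is taken as given rather than drawn by a simulation smoother, and all $y_t$ are assumed nonzero so that $y_t^* = \log y_t^2$ is finite.

```python
import math

# (p_i, m_i, v_i^2) for K = 10 from Table 2.1.
K10 = [
    (0.00609, 1.92677, 0.11265), (0.04775, 1.34744, 0.17788),
    (0.13057, 0.73504, 0.26768), (0.20674, 0.02266, 0.40611),
    (0.22715, -0.85173, 0.62699), (0.18842, -1.97278, 0.98583),
    (0.12047, -3.46788, 1.57469), (0.05591, -5.55246, 2.54498),
    (0.01575, -8.68384, 4.16591), (0.00115, -14.65000, 7.33342),
]
LOG_2PI = math.log(2.0 * math.pi)


def log_normal_pdf(y, var):
    """Log density of N(0, var) evaluated at y."""
    return -0.5 * (LOG_2PI + math.log(var)) - 0.5 * y * y / var


def log_mixture(e):
    """log sum_i p_i f_N(e | m_i, v_i^2), computed with log-sum-exp."""
    terms = [math.log(p) - 0.5 * (LOG_2PI + math.log(v2)) - 0.5 * (e - m) ** 2 / v2
             for p, m, v2 in K10]
    top = max(terms)
    return top + math.log(sum(math.exp(x - top) for x in terms))


def log_accept_ratio(y, h_cur, h_prop):
    """Log of the ratio inside min{1, .} in Step 3b (last line of the display)."""
    total = 0.0
    for yt, hc, hp in zip(y, h_cur, h_prop):
        ys = math.log(yt * yt)  # y*_t; requires y_t != 0
        total += (log_normal_pdf(yt, math.exp(hp)) - log_normal_pdf(yt, math.exp(hc))
                  + log_mixture(ys - hc) - log_mixture(ys - hp))
    return total


# Toy inputs (illustrative values only).
y = [0.5, -1.2, 0.1, 0.8]
h_cur = [-1.0, -0.9, -1.1, -1.0]
h_prop = [-0.8, -1.0, -1.3, -0.9]
accept_prob = math.exp(min(0.0, log_accept_ratio(y, h_cur, h_prop)))
```

Since $\log f_N(y_t \mid 0, e^{h_t})$ equals $\log f(y_t^* - h_t)$ from (2.5) up to a term free of $h_t$, this log ratio is the difference of $\sum_t \{\log f - \log g\}$ evaluated at the candidate and at the current value, the same quantity that drives the resampling weights of Sect. 2.3.3.1. When the mixture is accurate the ratio is close to one and almost every candidate is accepted.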
2.4 Multi-move Sampler

As discussed in Sect. 2.2, the single-move sampler that samples $h_t$ given other $h_j$'s ($j \neq t$) is known to produce highly autocorrelated posterior samples. To reduce such autocorrelations, we alternatively introduce the multi-move sampler or a block sampler [14, 15] where we sample a vector $\mathbf{h}_{s+1:s+m} = (h_{s+1}, \ldots, h_{s+m})'$ given other $h_j$'s ($j \le s$, $j \ge s+m+1$) instead of sampling one $h_t$ at a time given other $h_j$'s. We first divide $\mathbf{h}$ into $K+1$ blocks: $\mathbf{h}_{k_i+1:k_{i+1}}$, $i = 0, 1, \ldots, K$, where $k_0 \equiv 0 < k_1 < \cdots < k_K < k_{K+1} \equiv n$. For the selection of $(k_1, \ldots, k_K)$, Shephard and Pitt [14] propose stochastic knots using

$$k_i = \text{int}[n \times (i + U_i)/(K+2)], \quad i = 1, \ldots, K,$$

where $U_i$ is an i.i.d. uniform random variable over the interval $(0, 1)$. This is expected to reduce the possible high autocorrelations for the posterior samples around the boundaries $h_{k_i}$.

Remark. We choose a tuning parameter $K$, so that it substantially reduces the autocorrelations among posterior samples.

Suppose we aim to sample a vector $\mathbf{h}_{s+1:s+m}$ given other $h_j$'s ($j \le s$, $j \ge s+m+1$). We note that it is equivalent to sampling a vector of disturbances $\boldsymbol{\eta}_{s:s+m-1} = (\eta_s, \ldots, \eta_{s+m-1})'$ given $h_s$ and $h_{s+m+1}$ where $\mathbf{h}_{s+1:s+m}$ is a linear combination of $\boldsymbol{\eta}_{s:s+m-1}$. Since sampling $\boldsymbol{\eta}_{s:s+m-1}$ is known to be more efficient than sampling $\mathbf{h}_{s+1:s+m}$, we consider sampling $\boldsymbol{\eta}_{s:s+m-1}$. Further, let us denote $\eta_t = \sigma_\eta z_t$, $z_t \sim N(0, 1)$ and consider the log conditional posterior density of $\mathbf{z}_{s:s+m-1} = (z_s, \ldots, z_{s+m-1})'$,

$$\log \pi(\mathbf{z}_{s:s+m-1} \mid h_s, h_{s+m+1}, \cdot) = \text{const} - \frac{1}{2}\sum_{t=s}^{s+m-1} z_t^2 + \sum_{t=s+1}^{s+m} l(h_t),$$

where

$$l(h_t) = -\frac{h_t}{2} - \frac{y_t^2}{2}\exp(-h_t) - \frac{\big(h_{s+m+1} - \mu(1-\phi) - \phi h_{s+m}\big)^2}{2\sigma_\eta^2}\, I(t = s+m < n),$$

and $I(A) = 1$ if $A$ is true and $0$ otherwise. To sample from this conditional posterior density of $\mathbf{z}_{s:s+m-1}$, we construct the proposal density as follows. Using Taylor expansion of $l(h_t)$ around the conditional mode $\hat{h}_t$ of $h_t$ (where $\hat{\mathbf{h}}_{s+1:s+m}$ corresponds to the conditional mode $\hat{\mathbf{z}}_{s:s+m-1}$ of $\mathbf{z}_{s:s+m-1}$),

$$l(h_t) \approx l(\hat{h}_t) + l'(\hat{h}_t)(h_t - \hat{h}_t) + \frac{1}{2}\, l''(\hat{h}_t)(h_t - \hat{h}_t)^2 = \text{const} - \frac{(\hat{y}_t - h_t)^2}{2 v_t^2},$$

where

$$v_t^2 = -\frac{1}{l''(\hat{h}_t)}, \quad \hat{y}_t = \hat{h}_t - \frac{l'(\hat{h}_t)}{l''(\hat{h}_t)},$$

$$l'(h_t) = -\frac{1}{2} + \frac{y_t^2}{2}\exp(-h_t) + \frac{\big[h_{s+m+1} - \mu(1-\phi) - \phi h_{s+m}\big]\phi}{\sigma_\eta^2}\, I(t = s+m < n),$$

$$l''(h_t) = -\frac{y_t^2}{2}\exp(-h_t) - \frac{\phi^2}{\sigma_\eta^2}\, I(t = s+m < n),$$
we obtain

$$\log \pi(\mathbf{z}_{s:s+m-1} \mid h_s, h_{s+m+1}, \cdot) \approx \text{const} - \frac{1}{2}\sum_{t=s}^{s+m-1} z_t^2 - \sum_{t=s+1}^{s+m} \frac{(\hat{y}_t - h_t)^2}{2 v_t^2} \equiv \text{const} + \log \pi^*(\mathbf{z}_{s:s+m-1} \mid h_s, h_{s+m+1}, \cdot).$$

The density $\pi^*(\mathbf{z}_{s:s+m-1} \mid h_s, h_{s+m+1}, \cdot)$ is the posterior density of $\mathbf{z}_{s:s+m-1}$ for the linear Gaussian state-space model

$$\hat{y}_t = h_t + \epsilon_t^*, \quad \epsilon_t^* \sim N(0, v_t^2), \quad t = s+1, \ldots, s+m, \tag{2.9}$$

$$h_{t+1} = \mu(1-\phi) + \phi h_t + \sigma_\eta z_t, \quad z_t \sim N(0, 1), \quad t = s+1, \ldots, s+m-1, \tag{2.10}$$

$$h_{s+1} = \begin{cases} \mu(1-\phi) + \phi h_s + \sigma_\eta z_s, & z_s \sim N(0, 1), & \text{if } s \ge 1, \\ \mu + \sigma_{\eta 0}\, z_0, & z_0 \sim N(0, 1), & \text{if } s = 0, \end{cases} \tag{2.11}$$

where $\sigma_{\eta 0}^2 = \sigma_\eta^2/(1-\phi^2)$. Thus, we sample $\mathbf{z}_{s:s+m-1}$ using the MH algorithm from its posterior distribution in five steps and compute $\mathbf{h}_{s+1:s+m}$ recursively.

1. Compute the conditional mode $\hat{\mathbf{h}}_{s+1:s+m}$.
   a. Let $\hat{\mathbf{h}}_{s+1:s+m} = \mathbf{h}_{s+1:s+m}$ where $\mathbf{h}_{s+1:s+m}$ denotes the current sample of $\mathbf{h}_{s+1:s+m}$.
   b. Compute $\hat{\mathbf{y}}_{s+1:s+m} = (\hat{y}_{s+1}, \ldots, \hat{y}_{s+m})'$ and $\mathbf{v}^2_{s+1:s+m} = (v_{s+1}^2, \ldots, v_{s+m}^2)'$.
   c. Implement the disturbance smoother given by Koopman [10] (see Sect. 2.7.1) to obtain the posterior mode $\hat{\mathbf{z}}_{s:s+m-1}$.
   d. Compute $\hat{\mathbf{h}}_{s+1:s+m}$ recursively using $\hat{\mathbf{z}}_{s:s+m-1}$.
   e. Repeat Steps b–d until convergence (it will usually converge after several repetitions).
2. Compute $\hat{\mathbf{y}}_{s+1:s+m}$ and $\mathbf{v}^2_{s+1:s+m}$ using $\hat{\mathbf{h}}_{s+1:s+m}$.
3. Generate a candidate $\mathbf{z}^\dagger_{s:s+m-1} \sim \pi^*(\mathbf{z}_{s:s+m-1} \mid h_s, h_{s+m+1}, \cdot)$ using the simulation smoother [3, 6] (see Sect. 2.7.1) for the state-space model with (2.9), (2.10), and (2.11).
4. Accept $\mathbf{z}^\dagger_{s:s+m-1}$ with probability

$$\min\left\{1,\ \frac{\pi(\mathbf{z}^\dagger_{s:s+m-1} \mid h_s, h_{s+m+1}, \cdot)\, \pi^*(\mathbf{z}_{s:s+m-1} \mid h_s, h_{s+m+1}, \cdot)}{\pi(\mathbf{z}_{s:s+m-1} \mid h_s, h_{s+m+1}, \cdot)\, \pi^*(\mathbf{z}^\dagger_{s:s+m-1} \mid h_s, h_{s+m+1}, \cdot)}\right\},$$

   where $\mathbf{z}_{s:s+m-1}$ denotes the current sample from $\pi(\mathbf{z}_{s:s+m-1} \mid h_s, h_{s+m+1}, \cdot)$.
5. Compute $\mathbf{h}_{s+1:s+m}$ recursively using the new sample $\mathbf{z}_{s:s+m-1}$.

Remark. Step 1 is equivalent to the Newton method to optimize the log-likelihood function.
Remark. If we replace $l''(h_t)$ with $E_{y_t \mid h_t}[l''(h_t)]$ in (2.4), the algorithm becomes a special case of the multi-move sampler for the asymmetric stochastic volatility model described in Chap. 3. Further, Step 1 is equivalent to the Fisher scoring method to optimize the log-likelihood function.
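Two ingredients of the multi-move sampler are simple enough to sketch directly: the stochastic knots and the pseudo-observations $(\hat{y}_t, v_t^2)$ of (2.9) obtained from $l'$ and $l''$. The function names are hypothetical, duplicate knots are dropped so that no block is empty (an added safeguard, not stated in the text), and the block routine assumes $y_t \neq 0$ so that $l''(h_t) < 0$; the smoothers of Sect. 2.7.1 are not reproduced.

```python
import math
import random


def stochastic_knots(n, K, rng):
    """Blocks from knots k_i = int[n (i + U_i) / (K + 2)], i = 1, ..., K.

    Returns (s, m) pairs so that each block is h_{s+1}, ..., h_{s+m}.
    """
    knots = sorted({int(n * (i + rng.random()) / (K + 2)) for i in range(1, K + 1)})
    knots = [k for k in knots if 0 < k < n]
    edges = [0] + knots + [n]
    return [(edges[i], edges[i + 1] - edges[i]) for i in range(len(edges) - 1)]


def pseudo_observations(y, h_hat, h_next, mu, phi, sig2):
    """Return (y_hat, v2) for one block expanded around h_hat.

    y and h_hat hold the m block values; h_next is h_{s+m+1}, or None when
    s + m = n (the last block has no successor term).
    """
    y_hat, v2 = [], []
    m = len(h_hat)
    for j in range(m):
        d1 = -0.5 + 0.5 * y[j] ** 2 * math.exp(-h_hat[j])   # l'(h_t)
        d2 = -0.5 * y[j] ** 2 * math.exp(-h_hat[j])         # l''(h_t)
        if j == m - 1 and h_next is not None:               # t = s + m < n
            d1 += phi * (h_next - mu * (1.0 - phi) - phi * h_hat[j]) / sig2
            d2 -= phi ** 2 / sig2
        v2.append(-1.0 / d2)
        y_hat.append(h_hat[j] - d1 / d2)
    return y_hat, v2


# Illustrative use: n = 1686 observations and K = 100 knots (assumed values).
blocks = stochastic_knots(1686, 100, random.Random(0))
y_hat, v2 = pseudo_observations([0.3, -0.7, 1.1], [-1.0, -1.1, -0.9], -1.2,
                                mu=-1.0, phi=0.95, sig2=0.04)
```

Redrawing the knots at every MCMC iteration moves the block boundaries, so no $h_t$ is always updated conditionally on a fixed neighbour.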
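2.5 Auxiliary Particle Filter

This section describes how to compute the likelihood $f(\mathbf{y} \mid \boldsymbol{\theta})$ numerically,

$$f(\mathbf{y} \mid \boldsymbol{\theta}) = \int f(\mathbf{y} \mid \mathbf{h})\, f(\mathbf{h} \mid \boldsymbol{\theta})\, d\mathbf{h},$$

which is necessary to obtain the marginal likelihood, $f(\mathbf{y}) = \int f(\mathbf{y} \mid \boldsymbol{\theta})\, \pi(\boldsymbol{\theta})\, d\boldsymbol{\theta}$, Bayes factors, and goodness-of-fit statistics. Filtering and associated computations are carried out using particle filter methods (see, e.g., [9]). We describe the auxiliary particle filter [13] for the SV model, which is known to give a more accurate estimate of the likelihood. Let $\mathbf{Y}_t = (y_1, \ldots, y_t)'$. The simple particle filter is obtained as a special case by setting $q(h_t^i \mid \mathbf{Y}_{t+1}, \boldsymbol{\theta}) = \hat{f}(h_t^i \mid \mathbf{Y}_t, \boldsymbol{\theta})$ (or equivalently setting $f(y_{t+1} \mid \mu_{t+1}^i) \equiv 1$) in Step 2.

1. Compute $\hat{f}(y_1 \mid \boldsymbol{\theta})$ and $\hat{f}(h_1^i \mid y_1, \boldsymbol{\theta}) = \pi_1^i$ for $i = 1, \ldots, I$ as below.
   (a) Generate $h_1^i \sim f(h_1 \mid \boldsymbol{\theta})$ $(= N(\mu, \sigma_\eta^2/(1-\phi^2)))$.
   (b) Compute

$$\hat{f}(h_1^i \mid y_1, \boldsymbol{\theta}) = \pi_1^i = \frac{w_i}{\sum_{j=1}^{I} w_j}, \quad w_i = f(y_1 \mid h_1^i), \quad W_i = F(y_1 \mid h_1^i),$$

$$\hat{f}(y_1 \mid \boldsymbol{\theta}) = \bar{w}_1 = \frac{1}{I}\sum_{i=1}^{I} w_i, \quad \hat{F}(y_1 \mid \boldsymbol{\theta}) = \bar{W}_1 = \frac{1}{I}\sum_{i=1}^{I} W_i,$$

   where $f(y_1 \mid \boldsymbol{\theta})$ and $F(y_1 \mid \boldsymbol{\theta})$ are the marginal density function and the marginal distribution function of $y_1$ given $\boldsymbol{\theta}$, respectively. Let $t = 1$.
2. Compute $\hat{f}(y_{t+1} \mid \mathbf{Y}_t, \boldsymbol{\theta})$ and $\hat{f}(h_{t+1}^i \mid \mathbf{Y}_{t+1}, \boldsymbol{\theta}) = \pi_{t+1}^i$ for $i = 1, \ldots, I$ as below. Define the importance function

$$q(h_t^i \mid \mathbf{Y}_{t+1}, \boldsymbol{\theta}) = \frac{f(y_{t+1} \mid \mu_{t+1}^i)\, \hat{f}(h_t^i \mid \mathbf{Y}_t, \boldsymbol{\theta})}{\sum_{j=1}^{I} f(y_{t+1} \mid \mu_{t+1}^j)\, \hat{f}(h_t^j \mid \mathbf{Y}_t, \boldsymbol{\theta})}, \quad \mu_{t+1}^i = (1-\phi)\mu + \phi h_t^i.$$

   (a) Sample $h_t^i \sim q(h_t \mid \mathbf{Y}_{t+1}, \boldsymbol{\theta})$.
   (b) Generate $h_{t+1}^i \mid h_t^i, \boldsymbol{\theta} \sim f(h_{t+1} \mid h_t^i, \boldsymbol{\theta})$ $(= N((1-\phi)\mu + \phi h_t^i, \sigma_\eta^2))$.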
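   (c) Compute

$$\hat{f}(h_{t+1}^i \mid \mathbf{Y}_{t+1}, \boldsymbol{\theta}) = \pi_{t+1}^i = \frac{w_i}{\sum_{j=1}^{I} w_j},$$

$$w_i = \frac{f(y_{t+1} \mid h_{t+1}^i)\, f(h_{t+1}^i \mid h_t^i, \boldsymbol{\theta})\, \hat{f}(h_t^i \mid \mathbf{Y}_t, \boldsymbol{\theta})}{f(h_{t+1}^i \mid h_t^i, \boldsymbol{\theta})\, q(h_t^i \mid \mathbf{Y}_{t+1}, \boldsymbol{\theta})} = \frac{f(y_{t+1} \mid h_{t+1}^i)\, \hat{f}(h_t^i \mid \mathbf{Y}_t, \boldsymbol{\theta})}{q(h_t^i \mid \mathbf{Y}_{t+1}, \boldsymbol{\theta})},$$

$$W_i = \frac{F(y_{t+1} \mid h_{t+1}^i)\, \hat{f}(h_t^i \mid \mathbf{Y}_t, \boldsymbol{\theta})}{q(h_t^i \mid \mathbf{Y}_{t+1}, \boldsymbol{\theta})},$$

$$\hat{f}(y_{t+1} \mid \mathbf{Y}_t, \boldsymbol{\theta}) = \bar{w}_{t+1} = \frac{1}{I}\sum_{i=1}^{I} w_i, \quad \hat{F}(y_{t+1} \mid \mathbf{Y}_t, \boldsymbol{\theta}) = \bar{W}_{t+1} = \frac{1}{I}\sum_{i=1}^{I} W_i.$$

3. Increment $t$ and go to 2.

It can be shown that as $I \to \infty$, $\bar{w}_{t+1} \xrightarrow{p} f(y_{t+1} \mid \mathbf{Y}_t, \boldsymbol{\theta})$, and $\bar{W}_{t+1} \xrightarrow{p} F(y_{t+1} \mid \mathbf{Y}_t, \boldsymbol{\theta})$, where $\xrightarrow{p}$ denotes the convergence in probability. It therefore follows that

$$\sum_{t=1}^{n} \log \bar{w}_t \xrightarrow{p} \sum_{t=1}^{n} \log f(y_t \mid \mathbf{Y}_{t-1}, \boldsymbol{\theta}).$$

Thus, $\sum_{t=1}^{n} \log \bar{w}_t$ is a consistent estimate of the conditional log-likelihood and can be used as an input in the marginal likelihood calculation using the method given by [1]. Likewise the sequence of $\bar{W}_t$ and its reflected version $2\,|\bar{W}_t - 1/2|$ can be used to check for model fit as these are approximately i.i.d. standard uniform if the model is correctly specified. This diagnostic was introduced into econometrics by Kim et al. [9], and diagnostic checking of this type has been further popularized by Diebold et al. [5].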
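Steps 1–3 can be sketched as follows for the density weights $w_i$; the distribution-function weights $W_i$ used for diagnostics are omitted. The function names, the number of particles, and the simulated data are illustrative assumptions. The code uses the identity $\hat{f}(h_t^i \mid \mathbf{Y}_t, \boldsymbol{\theta})/q(h_t^i \mid \mathbf{Y}_{t+1}, \boldsymbol{\theta}) = \sum_j f(y_{t+1} \mid \mu_{t+1}^j)\pi_t^j / f(y_{t+1} \mid \mu_{t+1}^i)$, which follows from the definition of $q$.

```python
import math
import random


def normal_pdf(y, var):
    """Density of N(0, var) evaluated at y."""
    return math.exp(-0.5 * y * y / var) / math.sqrt(2.0 * math.pi * var)


def apf_loglik(y, mu, phi, sig2, n_particles, rng):
    """Estimate log f(y | theta) with the auxiliary particle filter of Sect. 2.5."""
    I, sd = n_particles, math.sqrt(sig2)
    # Step 1: particles from the stationary distribution, weighted by f(y_1 | h_1).
    h = [rng.gauss(mu, sd / math.sqrt(1.0 - phi ** 2)) for _ in range(I)]
    w = [normal_pdf(y[0], math.exp(hi)) for hi in h]
    total = sum(w)
    loglik = math.log(total / I)
    pi = [wi / total for wi in w]
    for t in range(1, len(y)):
        # Step 2: first-stage weights f(y_{t+1} | mu_{t+1}^i) * pi_t^i.
        mu_next = [(1.0 - phi) * mu + phi * hi for hi in h]
        g = [normal_pdf(y[t], math.exp(m)) * p for m, p in zip(mu_next, pi)]
        g_total = sum(g)
        # (a) resample ancestors from q, then (b) propagate through (2.2).
        idx = rng.choices(range(I), weights=g, k=I)
        h = [rng.gauss(mu_next[j], sd) for j in idx]
        # (c) w_i = f(y_{t+1} | h_{t+1}^i) * pi_t^i / q^i, with pi / q = g_total / f(y | mu).
        w = [normal_pdf(y[t], math.exp(hi)) * g_total
             / normal_pdf(y[t], math.exp(mu_next[j])) for hi, j in zip(h, idx)]
        total = sum(w)
        loglik += math.log(total / I)
        pi = [wi / total for wi in w]
    return loglik


# Demo on simulated data (parameter values are illustrative assumptions).
rng = random.Random(3)
mu, phi, sig2 = -1.0, 0.95, 0.04
h_t = rng.gauss(mu, math.sqrt(sig2 / (1.0 - phi ** 2)))
y = []
for _ in range(200):
    y.append(math.exp(h_t / 2.0) * rng.gauss(0.0, 1.0))
    h_t = mu + phi * (h_t - mu) + rng.gauss(0.0, math.sqrt(sig2))
loglik_true = apf_loglik(y, mu, phi, sig2, 300, random.Random(4))
loglik_wrong = apf_loglik(y, 3.0, phi, sig2, 300, random.Random(4))
```

The estimate is random for finite $I$, so comparisons across parameter values are more stable when the same random seed is reused, as in the two calls above.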
2.6 Empirical Study

To illustrate the MCMC estimation of the symmetric SV model, we fit the model to the daily returns, $y_t$, of the Japanese yen/US dollar exchange rate from January 6, 2015 to November 29, 2021. Let $p_t$ denote the exchange rate at time $t$, and define the daily return at time $t$ as $y_t = (\log p_t - \log p_{t-1}) \times 100$. The time series plot of $y_t$ ($t = 1, \ldots, 1686$) is given in Fig. 2.3. Flat priors are assumed, reflecting that there is no prior information: $\mu \sim N(0, 1000)$, $\sigma_\eta^2 \sim \text{IG}(0.01, 0.01)$, $\phi \sim U(-1, 1)$.
Fig. 2.3 Time series plot of the Japanese yen/US dollar exchange rate returns

Fig. 2.4 Traceplot (left) and sample autocorrelation functions (right) for μ, φ, and ση
We may instead use more informative prior distributions, for example, $\mu \sim N(0, 1)$ and $(\phi + 1)/2 \sim \mathrm{Beta}(20, 1.5)$, as in past empirical studies. The exact mixture sampler is used to generate the $h_t$'s from their conditional distribution, but similar results are obtained using other multi-move sampling algorithms with flat priors. The MCMC algorithm is iterated 50,000 times after discarding 5,000 samples as a burn-in period. (The computational results are generated using R and Ox, version 8.02.) The acceptance rates of the MH algorithms for $h$ and $\theta$ are found to be high, at 85.3% and 80.6%, respectively.
Fig. 2.5 Traceplot (left) and sample autocorrelation functions (right) for h500, h1000, and h1500

Table 2.2 MCMC estimation results

| Parameter | Mean | SD | 95%CI | IF | CD |
|---|---|---|---|---|---|
| μ | −1.623 | 0.125 | (−1.865, −1.367) | 13 | 0.92 |
| φ | 0.943 | 0.021 | (0.896, 0.976) | 167 | 0.52 |
| ση | 0.251 | 0.049 | (0.166, 0.360) | 209 | 0.51 |
| h500 | −0.653 | 0.410 | (−1.438, 0.178) | 4 | 0.14 |
| h1000 | −1.870 | 0.420 | (−2.680, −1.016) | 5 | 0.27 |
| h1500 | −2.096 | 0.434 | (−2.917, −1.211) | 5 | 0.90 |

Mean: posterior mean. SD: standard deviation. 95%CI: 95% credible interval. IF: inefficiency factor. CD: p-values of the convergence diagnostic test given by Geweke [7].
As shown in Figs. 2.4 and 2.5, the chains mix well, and the distributions of the MCMC samples seem to converge to the posterior distributions. The sample autocorrelation functions decay and vanish after lag 400 for $\mu$, $\phi$, and $\sigma_\eta$. Moreover, they vanish very quickly for $h_{500}$, $h_{1000}$, and $h_{1500}$, which indicates that the proposed sampling algorithm is highly efficient. The summary statistics and estimated posterior densities are given in Table 2.2 and Fig. 2.6, respectively. The posterior mean of the autoregressive parameter $\phi$ is estimated to be large, at 0.943, indicating high persistence in the log volatility. Estimates of this size are typical of those found in previous empirical studies. The inefficiency factor (IF) is defined as $1 + 2\sum_{g=1}^{\infty} \rho(g)$, where $\rho(g)$ is the autocorrelation function at lag $g$ and is estimated using the sample autocorrelation function.
Fig. 2.6 Estimated posterior densities for μ, φ and ση

Fig. 2.7 Estimated posterior median and 95% credible intervals for the log volatilities h_t
This is interpreted as the ratio of the numerical variance of the posterior mean from the chain to the variance of the posterior mean from hypothetical uncorrelated draws, and the effective sample size (ESS) is obtained as the posterior sample size divided by the inefficiency factor. The smaller the inefficiency factor, the closer the MCMC sampling is to uncorrelated sampling. The IFs range from 13 to 209 (the ESSs range from 240 to 3950) for $\mu$, $\phi$, and $\sigma_\eta$, and from 4 to 5 (the ESSs range from 9410 to 12968) for $h_{500}$, $h_{1000}$, and $h_{1500}$, which implies that our sampling algorithm works quite efficiently. The CD column reports the convergence diagnostic, defined as the p-value of the two-sided test of whether the mean of the first 10% of the MCMC samples is equal to that of the last 50% [7]. All p-values are greater than 0.05, so there is no strong evidence against convergence of the distribution of the MCMC samples to the posterior distribution. Finally, Fig. 2.7 shows the estimated posterior median and 95% credible intervals for the log volatilities $h_t$. The volatilities were found to be high in the year 2016, and they decreased toward the end of the year 2019. The high peak early in the year 2020 corresponds to the coronavirus disease (COVID-19) outbreak.
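As a rough illustration of the inefficiency factor and the effective sample size, the following Python sketch sums the sample autocorrelations up to a truncation lag. The truncation lag `max_lag` and the function names are assumptions made for this example; the text does not state which truncation or window was used for Table 2.2.

```python
import numpy as np

def inefficiency_factor(draws, max_lag=500):
    """Estimate 1 + 2 * sum_{g=1}^{max_lag} rho(g) from a chain of MCMC draws.

    Truncating the infinite sum at max_lag is an assumption of this sketch.
    """
    x = np.asarray(draws, dtype=float)
    x = x - x.mean()
    n = x.size
    var = np.dot(x, x) / n
    # Sample autocorrelation at lags 1, ..., max_lag.
    rho = [np.dot(x[:-g], x[g:]) / (n * var) for g in range(1, max_lag + 1)]
    return 1.0 + 2.0 * np.sum(rho)

def effective_sample_size(draws, max_lag=500):
    """Posterior sample size divided by the inefficiency factor."""
    return len(draws) / inefficiency_factor(draws, max_lag)
```

For an AR(1) chain with coefficient 0.5, the theoretical inefficiency factor is $(1 + 0.5)/(1 - 0.5) = 3$, which gives a convenient sanity check for the estimator.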
2.7 Appendix

2.7.1 Simulation Smoother

Consider a state-space model

$$y_t = X_t\beta + Z_t\alpha_t + G_t u_t, \quad t = 1, \dots, n, \qquad (2.12)$$

$$\alpha_{t+1} = W_t\beta + T_t\alpha_t + H_t u_t, \quad t = 0, 1, \dots, n-1, \qquad (2.13)$$

$$\alpha_0 \equiv 0, \quad u_t \sim N(0, I_{p+q}),$$

where $y_t$ is a $p \times 1$ observed response vector at time $t$, $\alpha_t$ is a $q \times 1$ state variable vector, and $X_t, Z_t, W_t, T_t, G_t, H_t$ $(t = 1, \dots, n)$, $W_0$, $H_0$, and $\beta$ are constant vectors or matrices with appropriate dimensions. It is well known that sampling the disturbance vector $u = (u_0', \dots, u_{n-1}')'$ leads to less autocorrelated MCMC samples of $\alpha = (\alpha_1', \dots, \alpha_n')'$, and we consider the simulation smoothers proposed by [3] and [6]. In the approximate state-space form of the SV model for the mixture sampler, $y_t$ corresponds to $y_t^*$, where we set

$$Z_t = 1, \quad T_t = \phi, \quad \alpha_t = h_t, \quad G_t = (v_{s_t}, 0), \quad H_t = (0, \sigma_\eta), \quad X_t = (m_{s_t}, 0), \quad W_t = (0, 1), \quad t = 1, \dots, n,$$

$$W_0 = (0, 1/(1-\phi)), \quad H_0 = \left(0, \sigma_\eta/\sqrt{1-\phi^2}\right), \quad \beta = (1, (1-\phi)\mu)'.$$

Algorithm by de Jong and Shephard (1995) [3]

Suppose that we want to obtain the posterior sample of $F_t u_t$. By setting $F_t = H_t$, the posterior sample of the disturbance vector is obtained for the state equation (2.13). We first implement the Kalman filter for $t = 1, \dots, n$.

1. Let $a_1 = W_0\beta$ and $P_1 = H_0 H_0'$.
2. For $t = 1, 2, \dots, n$, compute

$$a_{t+1} = W_t\beta + T_t a_t + K_t e_t, \qquad P_{t+1} = T_t P_t L_t' + H_t J_t',$$

where

$$e_t = y_t - X_t\beta - Z_t a_t, \quad D_t = Z_t P_t Z_t' + G_t G_t', \quad K_t = (T_t P_t Z_t' + H_t G_t')D_t^{-1}, \quad L_t = T_t - K_t Z_t, \quad J_t = H_t - K_t G_t,$$

and save $\{e_t, D_t, J_t, L_t\}_{t=1}^{n}$.

Next, we generate $u$ using the simulation smoother.

3. Set $r_n = 0$ and $U_n = O$. For $t = n, n-1, \dots, 1$, compute

$$C_t = F_t(I - G_t' D_t^{-1} G_t - J_t' U_t J_t)F_t', \qquad \kappa_t \sim N(0, C_t),$$

$$V_t = F_t(G_t' D_t^{-1} Z_t + J_t' U_t L_t),$$

$$r_{t-1} = Z_t' D_t^{-1} e_t + L_t' r_t - V_t' C_t^{-1}\kappa_t,$$

$$U_{t-1} = Z_t' D_t^{-1} Z_t + L_t' U_t L_t + V_t' C_t^{-1} V_t,$$

to obtain $\{C_t, \kappa_t, r_t, U_t, V_t\}_{t=0}^{n}$ and save

$$\eta_t = F_t(G_t' D_t^{-1} e_t + J_t' r_t) + \kappa_t, \quad t = 0, 1, \dots, n,$$

where $G_0 = O$, $\kappa_0 \sim N(0, C_0)$, and $C_0 = F_0(I - H_0' U_0 H_0)F_0'$.
4. The obtained set of vectors $(\eta_0, \eta_1, \dots, \eta_n)$ is the posterior sample of $(F_0 u_0, F_1 u_1, \dots, F_n u_n)$.
5. Setting $F_t = H_t$, compute $\alpha_{t+1} = W_t\beta + T_t\alpha_t + \eta_t$ for $t = 0, \dots, n-1$ to obtain the posterior sample of $\alpha$.

Algorithm by Durbin and Koopman (2002) [6]

Durbin and Koopman [6] proposed an efficient sampling algorithm to sample $u$ based on the disturbance smoother given by Koopman [10]. We first describe the disturbance smoother in four steps.

1. Implement the Kalman filter for $t = 1, \dots, n$ as in Steps 1 and 2 in de Jong and Shephard's algorithm above.
2. Set $r_n = 0$, $U_n = O$.
3. For $t = n, n-1, \dots, 1$, compute

$$r_{t-1} = Z_t' D_t^{-1} e_t + L_t' r_t, \qquad U_{t-1} = Z_t' D_t^{-1} Z_t + L_t' U_t L_t.$$

4. Compute the posterior mean and covariance matrix of $u_t$ as

$$E[u_t|y] = G_t' D_t^{-1} e_t + J_t' r_t, \qquad \mathrm{Var}[u_t|y] = I - G_t' D_t^{-1} G_t - J_t' U_t J_t,$$

for $t = 0, 1, \dots, n$, where $y = (y_1', y_2', \dots, y_n')'$ and $G_0 = O$.

Using the above disturbance smoother, we implement the simulation smoother as below.

1. Implement the disturbance smoother to obtain $E[u_t|y]$ for $t = 0, 1, \dots, n$.
2. Generate a new disturbance $u_t^+$ for $t = 0, 1, \dots, n$ and obtain the new response $y^+ = (y_1^{+\prime}, \dots, y_n^{+\prime})'$ using (2.12) and (2.13).
3. Implement the disturbance smoother to obtain $E[u_t^+|y^+]$ for $t = 0, 1, \dots, n$.
4. Compute

$$\eta_t = F_t(E[u_t|y] + u_t^+ - E[u_t^+|y^+]), \quad t = 0, 1, \dots, n.$$

5. Setting $F_t = H_t$, compute $\alpha_{t+1} = W_t\beta + T_t\alpha_t + \eta_t$ for $t = 0, \dots, n-1$ to obtain the posterior sample of $\alpha$.
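To make the Durbin and Koopman recursion concrete, here is a minimal Python sketch specialised to the scalar approximate SV state-space form ($Z_t = 1$, $T_t = \phi$, $H_t G_t' = 0$), with $\mu$ treated as known. All function names are hypothetical, and the scalar simplifications ($H_t J_t' = \sigma_\eta^2$, so that $E[H_t u_t|y] = \sigma_\eta^2 r_t$ for $t \geq 1$ and $P_1 r_0$ for $t = 0$) are worked out for this special case only; it is a sketch under these assumptions rather than a general implementation.

```python
import numpy as np

def kalman_filter(ystar, m, v2, mu, phi, sig):
    """Kalman filter for y*_t = m_t + h_t + v_t z_1t, h_{t+1} = (1-phi) mu + phi h_t + sig z_2t."""
    n = len(ystar)
    a, P = mu, sig**2 / (1.0 - phi**2)       # a_1 = W_0 beta, P_1 = H_0 H_0'
    e, D, L = np.empty(n), np.empty(n), np.empty(n)
    for t in range(n):
        e[t] = ystar[t] - m[t] - a
        D[t] = P + v2[t]
        K = phi * P / D[t]                   # H_t G_t' = 0 in this model
        L[t] = phi - K
        a = (1.0 - phi) * mu + phi * a + K * e[t]
        P = phi * P * L[t] + sig**2          # T P L' + H J'
    return e, D, L

def smoothed_eta(ystar, m, v2, mu, phi, sig):
    """E[H_t u_t | y] for t = 0, ..., n-1 via the disturbance smoother."""
    e, D, L = kalman_filter(ystar, m, v2, mu, phi, sig)
    n = len(ystar)
    r = np.zeros(n + 1)                      # r[t] stores r_t, with r_n = 0
    for t in range(n, 0, -1):
        r[t - 1] = e[t - 1] / D[t - 1] + L[t - 1] * r[t]
    eta = sig**2 * r[:n]                     # sigma_eta^2 * r_t for t >= 1
    eta[0] = sig**2 / (1.0 - phi**2) * r[0]  # P_1 * r_0 for the initial state
    return eta

def states_from_eta(eta, mu, phi):
    """alpha_{t+1} = W_t beta + T_t alpha_t + eta_t, starting from alpha_0 = 0."""
    h = np.empty(len(eta))
    h[0] = mu + eta[0]
    for t in range(1, len(eta)):
        h[t] = (1.0 - phi) * mu + phi * h[t - 1] + eta[t]
    return h

def draw_states(ystar, m, v2, mu, phi, sig, rng):
    """One posterior draw of h = (h_1, ..., h_n) given y*, s and theta."""
    n = len(ystar)
    eta_hat = smoothed_eta(ystar, m, v2, mu, phi, sig)
    # Generate new disturbances and the implied pseudo-responses y^+.
    eta_plus = sig * rng.standard_normal(n)
    eta_plus[0] /= np.sqrt(1.0 - phi**2)
    h_plus = states_from_eta(eta_plus, mu, phi)
    y_plus = m + h_plus + np.sqrt(v2) * rng.standard_normal(n)
    eta_plus_hat = smoothed_eta(y_plus, m, v2, mu, phi, sig)
    return states_from_eta(eta_hat + eta_plus - eta_plus_hat, mu, phi)
```

Here `m` and `v2` hold $m_{s_t}$ and $v_{s_t}^2$ for the current indicators. A useful check is that `states_from_eta(smoothed_eta(...))` reproduces the exact conditional mean of $h$ computed from the joint normal distribution.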
2.7.2 Augmented Kalman Filter

To find $\vartheta = \theta_{\setminus\mu} = (\phi, \sigma_\eta^2)'$ that maximizes $\pi^*(\vartheta|s, y^*)$, we need to evaluate a likelihood $g(y^*|\vartheta, s)$. In the following, we assume that $s$ and $\vartheta$ are fixed. Based on [2], we conduct an augmented Kalman filter and calculate the likelihood function. Consider a state-space model given by (2.12) and (2.13) and suppose

• $\beta = b + B\mu$. In our approximate state-space model, $b = (1, 0)'$, $B = (0, 1-\phi)'$.
• $\mu$ and the $u_t$'s are uncorrelated.

When $\mu$ is fixed, the Kalman filter is the recursion

$$e_t = y_t - X_t\beta - Z_t a_{t|t-1}, \qquad D_t = Z_t P_{t|t-1} Z_t' + G_t G_t',$$

$$K_t = (T_t P_{t|t-1} Z_t' + H_t G_t')D_t^{-1}, \qquad L_t = T_t - K_t Z_t,$$

$$a_{t+1|t} = W_t\beta + T_t a_{t|t-1} + K_t e_t, \qquad P_{t+1|t} = T_t P_{t|t-1} L_t' + H_t J_t',$$

for $t = 1, \dots, n$, where $J_t = H_t - K_t G_t$ and $a_{1|0} = W_0\beta$, $P_{1|0} = H_0 H_0'$. Further, we consider the additional equations

$$f_t = y_t - X_t b - Z_t a^*_{t|t-1}, \qquad a^*_{t+1|t} = W_t b + T_t a^*_{t|t-1} + K_t f_t,$$

$$F_t = X_t B - Z_t A^*_{t|t-1}, \qquad A^*_{t+1|t} = -W_t B + T_t A^*_{t|t-1} + K_t F_t,$$

for $t = 1, \dots, n$, where $a^*_{1|0} = W_0 b$ and $A^*_{1|0} = -W_0 B$. Note that

$$e_t = f_t - F_t\mu, \qquad a_{t+1|t} = a^*_{t+1|t} - A^*_{t+1|t}\mu.$$

Then, the log-likelihood given $\mu$ and $\vartheta$ is

$$\log f(y|\mu, \vartheta) = -\frac{1}{2}\left\{ n\log 2\pi + \sum_{t=1}^{n}\log|D_t| + \sum_{t=1}^{n} f_t' D_t^{-1} f_t - 2\mu\sum_{t=1}^{n} F_t' D_t^{-1} f_t + \mu^2\sum_{t=1}^{n} F_t' D_t^{-1} F_t \right\}.$$

On the other hand, the posterior distribution of $\mu$ given $y$ and $\vartheta$ is $N(c_1, C_1)$, where

$$c_1 = C_1\left(\sigma_0^{-2}\mu_0 + \sum_{t=1}^{n} F_t' D_t^{-1} f_t\right), \qquad C_1 = \left(\sigma_0^{-2} + \sum_{t=1}^{n} F_t' D_t^{-1} F_t\right)^{-1}.$$

Thus, we obtain the log-likelihood of $y$ as

$$\log f(y|\vartheta) = \log f(y|\mu, \vartheta) + \log\pi(\mu) - \log\pi(\mu|y, \vartheta)$$

$$= -\frac{1}{2}\left\{ n\log 2\pi + \sum_{t=1}^{n}\log|D_t| + \log\sigma_0^2 - \log|C_1| + \sum_{t=1}^{n} f_t' D_t^{-1} f_t + \sigma_0^{-2}\mu_0^2 - C_1^{-1}c_1^2 \right\}.$$
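The augmented Kalman filter above can be illustrated with a short Python sketch for the scalar approximate SV model. The function name and argument names are hypothetical; the scalar simplifications used here ($X_t b = m_{s_t}$, $X_t B = 0$, $W_t b = 0$, $W_t B = 1 - \phi$, $W_0 b = 0$, $W_0 B = 1$, and $H_t G_t' = 0$) follow from the matrices of the approximate state-space form and are stated as assumptions of this sketch.

```python
import numpy as np

def augmented_kf_loglik(ystar, m, v2, phi, sigma_eta, mu0, sigma0_sq):
    """Return (log f(y*|vartheta), c1, C1) with mu ~ N(mu0, sigma0_sq) integrated out.

    m[t] and v2[t] are the mixture mean and variance selected by s_t.
    """
    n = len(ystar)
    P = sigma_eta**2 / (1.0 - phi**2)        # P_{1|0} = H_0 H_0'
    a_star, A_star = 0.0, -1.0               # a*_{1|0} = W_0 b, A*_{1|0} = -W_0 B
    log_det = s_ff = s_Ff = s_FF = 0.0
    for t in range(n):
        f = ystar[t] - m[t] - a_star         # f_t = y_t - X_t b - Z_t a*_{t|t-1}
        F = -A_star                          # F_t = X_t B - Z_t A*_{t|t-1}
        D = P + v2[t]
        K = phi * P / D
        log_det += np.log(D)
        s_ff += f * f / D
        s_Ff += F * f / D
        s_FF += F * F / D
        a_star = phi * a_star + K * f
        A_star = -(1.0 - phi) + phi * A_star + K * F
        P = phi * P * (phi - K) + sigma_eta**2
    C1 = 1.0 / (1.0 / sigma0_sq + s_FF)
    c1 = C1 * (mu0 / sigma0_sq + s_Ff)
    loglik = -0.5 * (n * np.log(2.0 * np.pi) + log_det + np.log(sigma0_sq)
                     - np.log(C1) + s_ff + mu0**2 / sigma0_sq - c1**2 / C1)
    return loglik, c1, C1
```

The returned `c1` and `C1` are the mean and variance of the normal posterior of $\mu$ given $y^*$ and $\vartheta$, so the same pass through the data could also supply a draw of $\mu$.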
References

1. Chib, S.: Marginal likelihood from the Gibbs output. J. Am. Stat. Assoc. 90(432), 1313–1321 (1995)
2. de Jong, P.: The diffuse Kalman filter. Ann. Stat. 19, 1073–1083 (1991)
3. de Jong, P., Shephard, N.: The simulation smoother for time series models. Biometrika 82(2), 339–350 (1995)
4. Del Negro, M., Primiceri, G.E.: Time varying structural vector autoregressions and monetary policy: a corrigendum. Rev. Econ. Stud. 82, 1342–1345 (2015)
5. Diebold, F.X., Gunther, T.A., Tay, A.S.: Evaluating density forecasts with applications to financial risk management. Int. Econ. Rev. 39, 863–883 (1998)
6. Durbin, J., Koopman, S.J.: A simple and efficient simulation smoother for state space time series analysis. Biometrika 89(3), 603–616 (2002)
7. Geweke, J.: Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In: Bernardo, J.M., Berger, J.O., Dawid, A., Smith, A. (eds.) Bayesian Statistics, vol. 4, pp. 169–193. Oxford (1992)
8. Harvey, A.C., Ruiz, E., Shephard, N.: Multivariate stochastic variance models. Rev. Econ. Stud. 61, 247–264 (1994)
9. Kim, S., Shephard, N., Chib, S.: Stochastic volatility: likelihood inference and comparison with ARCH models. Rev. Econ. Stud. 65, 361–393 (1998)
10. Koopman, S.J.: Disturbance smoother for state space models. Biometrika 80(1), 117–126 (1993)
11. Omori, Y., Chib, S., Shephard, N., Nakajima, J.: Stochastic volatility with leverage: fast and efficient likelihood inference. J. Econometrics 140(2), 425–449 (2007). https://doi.org/10.1016/j.jeconom.2006.07.008
12. Omori, Y., Watanabe, T.: Block sampler and posterior mode estimation for asymmetric stochastic volatility models. Comput. Stat. Data Anal. 52(6), 2892–2910 (2008). https://doi.org/10.1016/j.csda.2007.09.001
13. Pitt, M.K., Shephard, N.: Filtering via simulation: auxiliary particle filters. J. Am. Stat. Assoc. 94(446), 590–599 (1999)
14. Shephard, N., Pitt, M.K.: Likelihood analysis of non-Gaussian measurement time series. Biometrika 84, 653–667 (1997)
15. Watanabe, T., Omori, Y.: A multi-move sampler for estimating non-Gaussian time series models: comments on Shephard & Pitt (1997). Biometrika 91(1), 246–248 (2004)
Chapter 3
Asymmetric Stochastic Volatility Model
Abstract It has long been recognized in stock markets that there is a negative correlation between today’s return and tomorrow’s volatility [1, 3]. This phenomenon is called the “leverage effect” or “asymmetry.” This chapter extends the SV model to the asymmetric SV (ASV) model and describes an efficient Bayesian method using Markov chain Monte Carlo (MCMC) for the estimation of ASV models (for alternative GARCH-class models, see, e.g., GJR [13], EGARCH [17], and APGARCH [8] models).
3.1 Introduction

We first define the ASV model. Let $y_t$ and $h_t$ denote the daily asset return and the log volatility at time $t$. We observe the response, $y_t$, but $h_t$ is the unobserved latent variable. The ASV model is defined as

$$y_t = \exp(h_t/2)\epsilon_t, \quad t = 1, \dots, n, \qquad (3.1)$$

$$h_{t+1} = \mu + \phi(h_t - \mu) + \eta_t, \quad t = 1, \dots, n-1, \qquad (3.2)$$

$$h_1 \sim N(\mu, \sigma_\eta^2/(1-\phi^2)), \qquad (3.3)$$

$$(\epsilon_t, \eta_t)' \sim N(0, \Sigma), \quad \Sigma = \begin{pmatrix} 1 & \rho\sigma_\eta \\ \rho\sigma_\eta & \sigma_\eta^2 \end{pmatrix}, \qquad (3.4)$$

where we assume a stationary first-order autoregressive process for $h_t$ by setting $|\phi| < 1$ and a corresponding stationary distribution for the initial latent volatility $h_1$. The correlation coefficient $\rho$ is introduced to incorporate the leverage effect or asymmetry. The leverage effect refers to the increase in volatility following a drop in equity returns and, in this model, corresponds to a negative correlation between $\epsilon_t$ and $\eta_t$ (e.g., [1, 17, 24]). It is worth noting that the model in [14], where $\epsilon_t$ and $\eta_{t-1}$ are correlated, is distinct and different. The above model given by Eqs. (3.1), (3.2), (3.3), and (3.4) is appealing because it is an Euler approximation to the log-normal Ornstein–Uhlenbeck SV model with leverage. Thus, the methods we develop in this paper, combined with those of [10, 11, 22], can be used to fit the corresponding continuous time model with discretely sampled data.

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Takahashi et al., Stochastic Volatility and Realized Stochastic Volatility Models, JSS Research Series in Statistics, https://doi.org/10.1007/978-981-99-0935-3_3

Letting $Y_{t-1} = (y_1, \dots, y_{t-1})'$, another distinction is that a model with correlated $\epsilon_t$ and $\eta_{t-1}$ implies that $y_t|Y_{t-1}$ can be skewed, while models that correlate $\epsilon_t$ and $\eta_t$ have symmetric $y_t|Y_{t-1}$ unless $\epsilon_t$ itself is skewed. On the other hand, in the alternative specification, $\rho$ has two roles: leverage and skewness. In our view, the use of a single parameter to model two effects is unappealing because it makes the parameter difficult to interpret. Another downside of correlating $\epsilon_t$ and $\eta_{t-1}$ is that $y_t$ is no longer a martingale difference sequence. A more desirable way of introducing skewness in the distribution of $y_t|Y_{t-1}$ is by modeling $\epsilon_t$ as asymmetric within the setup of (3.4). This allows the SV model to maintain the martingale difference property paralleling the GARCH literature and that on time-changed Lévy processes and Lévy-based SV models. Yu [24] provides further discussion of some of these issues alongside empirical evidence that the model in (3.4) is better supported in a real data example.

The model parameter is $\theta = (\mu, \phi, \sigma_\eta^2, \rho)'$. Since there are many latent variables, it is difficult to obtain the likelihood function by integrating them out numerically. Thus, for the statistical inference of model parameters, we apply the Bayesian approach and implement MCMC simulation.

The rest of this chapter is organized as follows. In Sect. 3.2, we describe the single-move sampler that samples one $h_t$ at a time given $\{h_s\}_{s \neq t}$ and parameters. As pointed out in Chap. 2, it is known to be inefficient since it produces highly autocorrelated MCMC samples. Next, in Sect. 3.3, we discuss the mixture sampler for the ASV model and describe how to correct the approximation errors. Section 3.4 describes the multi-move sampler and, in Sect. 3.5, the particle filter is introduced to compute the likelihood of the ASV model.
Finally, an empirical study is presented in Sect. 3.6.
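For readers who want to experiment with the ASV model in (3.1) to (3.4), the following Python sketch simulates returns and log volatilities. The function name `simulate_asv` and its arguments are chosen for this illustration only.

```python
import numpy as np

def simulate_asv(n, mu, phi, sigma_eta, rho, seed=0):
    """Simulate (y_1..y_n, h_1..h_n) from the ASV model (3.1)-(3.4)."""
    rng = np.random.default_rng(seed)
    # Covariance matrix of (epsilon_t, eta_t) in (3.4).
    cov = np.array([[1.0, rho * sigma_eta],
                    [rho * sigma_eta, sigma_eta**2]])
    shocks = rng.multivariate_normal(np.zeros(2), cov, size=n)
    eps, eta = shocks[:, 0], shocks[:, 1]
    h = np.empty(n)
    # Stationary initial distribution (3.3).
    h[0] = mu + sigma_eta / np.sqrt(1.0 - phi**2) * rng.standard_normal()
    for t in range(n - 1):
        h[t + 1] = mu + phi * (h[t] - mu) + eta[t]   # state equation (3.2)
    y = np.exp(h / 2.0) * eps                        # measurement equation (3.1)
    return y, h
```

With a negative `rho`, a negative return shock at time $t$ tends to be followed by a higher log volatility at time $t+1$, which is the leverage effect described above.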
3.2 Single-Move Sampler for the Asymmetric SV Model

For $\theta$, we assume the prior distributions

$$\mu \sim N(\mu_0, \sigma_0^2), \quad (\phi+1)/2 \sim \mathrm{Beta}(a, b), \quad \sigma_\eta^2 \sim IG(n_0/2, s_0/2), \quad \rho \sim U(-1, 1).$$

Then, the joint posterior density function is given by

$$\pi(h, \theta|y) \propto f(y, h|\theta)\pi(\theta)$$

$$\propto (1+\phi)^{a-1/2}(1-\phi)^{b-1/2}(1-\rho^2)^{-(n-1)/2}(\sigma_\eta^2)^{-(n_1/2+1)} \times \exp\left\{-\frac{s_0}{2\sigma_\eta^2} - \frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right\}$$

$$\times \exp\left\{-\frac{1}{2}\sum_{t=1}^{n}\left[h_t + y_t^2\exp(-h_t)\right] - \frac{(1-\phi^2)(h_1-\mu)^2}{2\sigma_\eta^2}\right\}$$

$$\times \exp\left\{-\frac{1}{2\sigma_\eta^2(1-\rho^2)}\sum_{t=1}^{n-1}\left[h_{t+1} - \mu(1-\phi) - \phi h_t - \bar y_t\right]^2\right\},$$

where $h = (h_1, \dots, h_n)'$, $y = (y_1, \dots, y_n)'$, $n_1 = n_0 + n$, and $\bar y_t = \rho\sigma_\eta y_t\exp(-h_t/2)$. The MCMC algorithm basically consists of seven blocks as follows.

Step 1. Initialize $\theta = (\mu, \phi, \sigma_\eta^2, \rho)'$ and $h$.
Step 2. Generate $\phi \sim \pi(\phi|\theta_{\setminus\phi}, h, y)$.
Step 3. Generate $\sigma_\eta^2 \sim \pi(\sigma_\eta^2|\theta_{\setminus\sigma_\eta^2}, h, y)$.
Step 4. Generate $\mu \sim \pi(\mu|\theta_{\setminus\mu}, h, y)$.
Step 5. Generate $\rho \sim \pi(\rho|\theta_{\setminus\rho}, h, y)$.
Step 6. Generate $h \sim \pi(h|\theta, y)$.
Step 7. Go to Step 2.
3.2.1 Generation of $(\mu, \phi, \sigma_\eta^2, \rho)$

Step 2. Generation of $\phi$. We apply the MH algorithm to generate $\phi$. The full conditional posterior density of $\phi$ is

$$\pi(\phi|\cdot) \propto (1+\phi)^{a-1/2}(1-\phi)^{b-1/2}\exp\left\{-\frac{(\phi-\mu_\phi)^2}{2\sigma_\phi^2}\right\},$$

where

$$\mu_\phi = \frac{\sum_{t=1}^{n-1}(h_{t+1} - \mu - \bar y_t)(h_t - \mu)}{\rho^2(h_1-\mu)^2 + \sum_{t=2}^{n-1}(h_t-\mu)^2}, \qquad \sigma_\phi^2 = \frac{\sigma_\eta^2(1-\rho^2)}{\rho^2(h_1-\mu)^2 + \sum_{t=2}^{n-1}(h_t-\mu)^2}.$$

Given the current sample $\phi_x$, we generate a candidate $\phi_y \sim TN_{(-1,1)}(\mu_\phi, \sigma_\phi^2)$ and accept it with probability

$$\min\left\{\frac{(1+\phi_y)^{a-1/2}(1-\phi_y)^{b-1/2}}{(1+\phi_x)^{a-1/2}(1-\phi_x)^{b-1/2}}, 1\right\}.$$
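The MH step for $\phi$ can be sketched as follows in Python. The function name `draw_phi` is hypothetical, and the truncated normal candidate is generated by simple rejection of draws outside $(-1, 1)$, which is an implementation choice assumed for this sketch (it is adequate when most of the proposal mass lies inside the interval).

```python
import numpy as np

def draw_phi(phi_x, h, y, mu, sigma_eta, rho, a, b, rng):
    """One MH update of phi; returns (new value, accepted flag)."""
    y_bar = rho * sigma_eta * y[:-1] * np.exp(-h[:-1] / 2.0)   # y-bar_t, t = 1..n-1
    dev = h - mu
    denom = rho**2 * dev[0]**2 + np.sum(dev[1:-1]**2)          # t = 2..n-1 in the sum
    mu_phi = np.sum((dev[1:] - y_bar) * dev[:-1]) / denom
    sd_phi = np.sqrt(sigma_eta**2 * (1.0 - rho**2) / denom)
    # Candidate from N(mu_phi, sd_phi^2) truncated to (-1, 1), by rejection.
    while True:
        phi_y = rng.normal(mu_phi, sd_phi)
        if -1.0 < phi_y < 1.0:
            break
    # Log of the acceptance ratio: only the non-Gaussian factor remains.
    log_ratio = ((a - 0.5) * (np.log1p(phi_y) - np.log1p(phi_x))
                 + (b - 0.5) * (np.log1p(-phi_y) - np.log1p(-phi_x)))
    if np.log(rng.uniform()) < min(0.0, log_ratio):
        return phi_y, True
    return phi_x, False
```

Because the Gaussian part of the full conditional coincides with the proposal, it cancels in the ratio, and the candidate is always accepted when $a = b = 1/2$.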
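Step 3. Generation of $\sigma_\eta^2$. The full conditional posterior density of $\sigma_\eta^2$ is

$$\pi(\sigma_\eta^2|\cdot) \propto (\sigma_\eta^2)^{-(n_1/2+1)}\exp\left\{-\frac{s_0 + (1-\phi^2)(h_1-\mu)^2}{2\sigma_\eta^2}\right\} \times \exp\left\{-\frac{1}{2\sigma_\eta^2(1-\rho^2)}\sum_{t=1}^{n-1}\left[h_{t+1} - \mu(1-\phi) - \phi h_t - \bar y_t\right]^2\right\}.$$

Consider the transformation $\tau = \log\sigma_\eta^2$ with the density $f(\tau) = \pi(\sigma_\eta^2|\cdot)\,\sigma_\eta^2$, $\sigma_\eta^2 = \exp(\tau)$. Let $\hat\tau$ denote the mode of $f(\tau)$. Given the current value $\tau = \log\sigma_\eta^2$, we propose a candidate $\tau^\dagger = \log\sigma_\eta^{2\dagger} \sim N(\hat\tau, \sigma^2(\hat\tau))$, where $\sigma^2(\tau) = -\{\partial^2\log f(\tau)/\partial\tau^2\}^{-1}$, and accept it with probability

$$\min\left\{1, \frac{f(\tau^\dagger)\, f_N(\tau|\hat\tau, \sigma^2(\hat\tau))}{f(\tau)\, f_N(\tau^\dagger|\hat\tau, \sigma^2(\hat\tau))}\right\}.$$

Step 4. Generation of $\mu$. The full conditional posterior density of $\mu$ is a normal distribution given by $\mu|\cdot \sim N(\mu_1, \sigma_1^2)$, where

$$\mu_1 = \sigma_1^2\left\{\frac{\mu_0}{\sigma_0^2} + \frac{(1-\phi^2)h_1}{\sigma_\eta^2} + \frac{1-\phi}{\sigma_\eta^2(1-\rho^2)}\sum_{t=1}^{n-1}(h_{t+1} - \phi h_t - \bar y_t)\right\},$$

$$\sigma_1^2 = \left\{\frac{1}{\sigma_0^2} + \frac{1-\phi^2}{\sigma_\eta^2} + \frac{(n-1)(1-\phi)^2}{\sigma_\eta^2(1-\rho^2)}\right\}^{-1}.$$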
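The Gibbs draw of $\mu$ from its normal full conditional is simple enough to write down directly. The Python sketch below uses hypothetical function names (`mu_conditional`, `draw_mu`) and takes the prior mean and variance of $\mu$ as `mu0` and `sigma0_sq`.

```python
import numpy as np

def mu_conditional(h, y, phi, sigma_eta, rho, mu0, sigma0_sq):
    """Mean and variance (mu_1, sigma_1^2) of the full conditional of mu."""
    n = len(h)
    s2 = sigma_eta**2
    y_bar = rho * sigma_eta * y[:-1] * np.exp(-h[:-1] / 2.0)
    precision = (1.0 / sigma0_sq + (1.0 - phi**2) / s2
                 + (n - 1) * (1.0 - phi)**2 / (s2 * (1.0 - rho**2)))
    var1 = 1.0 / precision
    mean1 = var1 * (mu0 / sigma0_sq + (1.0 - phi**2) * h[0] / s2
                    + (1.0 - phi) / (s2 * (1.0 - rho**2))
                    * np.sum(h[1:] - phi * h[:-1] - y_bar))
    return mean1, var1

def draw_mu(h, y, phi, sigma_eta, rho, mu0, sigma0_sq, rng):
    """One Gibbs draw of mu from N(mu_1, sigma_1^2)."""
    mean1, var1 = mu_conditional(h, y, phi, sigma_eta, rho, mu0, sigma0_sq)
    return rng.normal(mean1, np.sqrt(var1))
```

Since the log posterior is exactly quadratic in $\mu$, the pair $(\mu_1, \sigma_1^2)$ can be checked numerically against second differences of the log kernel.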
Step 5. Generation of $\rho$. The full conditional posterior distribution of $\rho$ is

$$\pi(\rho|\cdot) \propto (1-\rho^2)^{-(n-1)/2} \times \exp\left\{-\frac{1}{2\sigma_\eta^2(1-\rho^2)}\sum_{t=1}^{n-1}\left[h_{t+1} - \mu(1-\phi) - \phi h_t - \bar y_t\right]^2\right\}.$$

Consider the transformation $r = \log\{(1+\rho)/(1-\rho)\}$ with the density

$$f(r) = \pi(\rho|\cdot) \times \frac{1-\rho^2}{2}, \qquad \rho = \frac{\exp(r)-1}{\exp(r)+1}.$$

Let $\hat r$ denote the mode of $f(r)$. Given the current value $r = \log\{(1+\rho)/(1-\rho)\}$, we propose a candidate $r^\dagger = \log\{(1+\rho^\dagger)/(1-\rho^\dagger)\} \sim N(\hat r, \sigma^2(\hat r))$, where $\sigma^2(r) = -\{\partial^2\log f(r)/\partial r^2\}^{-1}$, and accept it with probability

$$\min\left\{1, \frac{f(r^\dagger)\, f_N(r|\hat r, \sigma^2(\hat r))}{f(r)\, f_N(r^\dagger|\hat r, \sigma^2(\hat r))}\right\}.$$
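The updates of $\sigma_\eta^2$ (through $\tau$) and $\rho$ (through $r$) share the same structure: find the mode of a log density, build a normal proposal from the curvature at the mode, and apply an MH correction. The following generic Python sketch illustrates this pattern. The function names are hypothetical, and locating the mode by Newton steps with finite-difference derivatives is an assumption made for brevity; analytical derivatives or another optimizer would work equally well.

```python
import numpy as np

def second_diff(log_f, x, eps=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (log_f(x + eps) - 2.0 * log_f(x) + log_f(x - eps)) / eps**2

def find_mode(log_f, x0, eps=1e-4, max_iter=100):
    """Newton iterations on log_f using finite differences (assumed approach)."""
    x = x0
    for _ in range(max_iter):
        grad = (log_f(x + eps) - log_f(x - eps)) / (2.0 * eps)
        hess = second_diff(log_f, x, eps)
        if hess >= 0.0:
            raise ValueError("log density is not locally concave")
        step = -grad / hess
        x += step
        if abs(step) < 1e-8:
            break
    return x

def laplace_mh_step(log_f, x_cur, rng, x0=0.0):
    """One independence MH step with proposal N(mode, -1 / (second derivative))."""
    mode = find_mode(log_f, x0)
    var = -1.0 / second_diff(log_f, mode)
    cand = rng.normal(mode, np.sqrt(var))
    log_q = lambda x: -0.5 * (x - mode)**2 / var     # proposal log kernel
    log_alpha = log_f(cand) - log_f(x_cur) + log_q(x_cur) - log_q(cand)
    if np.log(rng.uniform()) < log_alpha:
        return cand, True
    return x_cur, False
```

To use it for $\rho$, `log_f` would evaluate $\log\pi(\rho|\cdot) + \log\{(1-\rho^2)/2\}$ at $\rho = (e^r - 1)/(e^r + 1)$; for $\sigma_\eta^2$, it would evaluate $\log\pi(\sigma_\eta^2|\cdot) + \tau$ at $\sigma_\eta^2 = e^\tau$.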
3.2.2 Generation of $h$

Step 6. Generation of $h$. The full conditional posterior density of $h$ is

$$\pi(h|\cdot) \propto \exp\left\{-\frac{1}{2}\sum_{t=1}^{n}\left[h_t + y_t^2\exp(-h_t)\right] - \frac{(1-\phi^2)(h_1-\mu)^2}{2\sigma_\eta^2}\right\} \times \exp\left\{-\frac{1}{2\sigma_\eta^2(1-\rho^2)}\sum_{t=1}^{n-1}\left[h_{t+1} - \mu(1-\phi) - \phi h_t - \bar y_t\right]^2\right\}.$$

The simple way to sample from the full conditional distribution of $h$ is to generate $h_t$ given the other $h_s$ $(s \neq t)$ for $t = 1, \dots, n$, which is called a single-move sampler. For example, we may compute the conditional mode of $h_t$ and use the Laplace approximation to obtain the normal distribution as the proposal distribution. However, as discussed in Chap. 2, the single-move sampler for sampling $h$ is inefficient. Thus, we extend the two efficient sampling algorithms for the symmetric SV model described in Chap. 2 to those for the asymmetric SV model in this chapter: the mixture sampler [18] and the multi-move sampler [19].
3.3 Mixture Sampler

3.3.1 Reformulation of the Measurement Equation

As in Chap. 2, we consider the transformed measurement equation

$$y_t^* = \log y_t^2 = h_t + \epsilon_t^*, \qquad \epsilon_t^* = \log\epsilon_t^2, \qquad (3.5)$$

with an i.i.d. error $\epsilon_t^*$ in (3.5) that follows a $\log\chi_1^2$ density

$$f(\epsilon_t^*) = \frac{1}{\sqrt{2\pi}}\exp\left\{\frac{\epsilon_t^* - \exp(\epsilon_t^*)}{2}\right\}, \quad \epsilon_t^* \in \mathbb{R},$$

and approximate it using a mixture of $K$ normal distributions. As discussed, the mixture approximation has the form of
Table 3.1 Selection of $(p_i, m_i, v_i^2, a_i, b_i)$

| i | p_i | m_i | v_i^2 | a_i | b_i |
|---|---|---|---|---|---|
| 1 | 0.00609 | 1.92677 | 0.11265 | 1.01418 | 0.50710 |
| 2 | 0.04775 | 1.34744 | 0.17788 | 1.02248 | 0.51124 |
| 3 | 0.13057 | 0.73504 | 0.26768 | 1.03403 | 0.51701 |
| 4 | 0.20674 | 0.02266 | 0.40611 | 1.05207 | 0.52604 |
| 5 | 0.22715 | −0.85173 | 0.62699 | 1.08153 | 0.54076 |
| 6 | 0.18842 | −1.97278 | 0.98583 | 1.13114 | 0.56557 |
| 7 | 0.12047 | −3.46788 | 1.57469 | 1.21754 | 0.60877 |
| 8 | 0.05591 | −5.55246 | 2.54498 | 1.37454 | 0.68728 |
| 9 | 0.01575 | −8.68384 | 4.16591 | 1.68327 | 0.84163 |
| 10 | 0.00115 | −14.65000 | 7.33342 | 2.50097 | 1.25049 |
$$g(\epsilon_t^*) = \sum_{i=1}^{K} p_i f_N(\epsilon_t^*|m_i, v_i^2), \quad \epsilon_t^* \in \mathbb{R},$$
where $f_N(\epsilon_t^*|m_i, v_i^2)$ denotes the density function of a normal distribution with mean $m_i$ and variance $v_i^2$. Table 3.1 shows $(p_i, m_i, v_i^2, a_i, b_i)$ from [18] with $K = 10$. However, in the ASV model, $\epsilon_t$ is correlated with $\eta_t$ in the state equation as

$$h_{t+1} = \mu(1-\phi) + \phi h_t + \eta_t, \qquad \eta_t|\epsilon_t \sim N(\rho\sigma_\eta\epsilon_t, \sigma_\eta^2(1-\rho^2)),$$

for $t = 1, \dots, n-1$. Noting that

$$\epsilon_t = d_t\exp(\epsilon_t^*/2), \qquad d_t = \begin{cases} 1, & \text{if } y_t > 0, \\ -1, & \text{if } y_t \leq 0, \end{cases}$$

we obtain

$$\eta_t|d_t, \epsilon_t^* \sim N(d_t\rho\sigma_\eta\exp(\epsilon_t^*/2), \sigma_\eta^2(1-\rho^2)). \qquad (3.6)$$

The main complication here is that $d_t$ is not ignorable when $\rho \neq 0$. Thus, Omori et al. [18] have offered the novel strategy of approximating the bivariate conditional density of $(\epsilon_t^*, \eta_t)|d_t$ with the density

$$f(\epsilon_t^*, \eta_t|d_t) = f(\epsilon_t^*|d_t) f(\eta_t|\epsilon_t^*, d_t) = f(\epsilon_t^*) f(\eta_t|\epsilon_t^*, d_t). \qquad (3.7)$$
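As a quick numerical illustration of the mixture approximation $g(\epsilon_t^*)$ to the $\log\chi_1^2$ density, the following Python sketch evaluates both densities using the $(p_i, m_i, v_i^2)$ columns of Table 3.1. The function and array names are chosen for this example only.

```python
import numpy as np

# (p_i, m_i, v_i^2) columns of Table 3.1, K = 10.
P = np.array([0.00609, 0.04775, 0.13057, 0.20674, 0.22715,
              0.18842, 0.12047, 0.05591, 0.01575, 0.00115])
M = np.array([1.92677, 1.34744, 0.73504, 0.02266, -0.85173,
              -1.97278, -3.46788, -5.55246, -8.68384, -14.65000])
V2 = np.array([0.11265, 0.17788, 0.26768, 0.40611, 0.62699,
               0.98583, 1.57469, 2.54498, 4.16591, 7.33342])

def log_chi2_density(e):
    """True density f(e) = exp((e - exp(e)) / 2) / sqrt(2 pi)."""
    e = np.asarray(e, dtype=float)
    return np.exp(0.5 * (e - np.exp(e))) / np.sqrt(2.0 * np.pi)

def mixture_density(e):
    """Mixture approximation g(e) = sum_i p_i N(e | m_i, v_i^2)."""
    e = np.atleast_1d(np.asarray(e, dtype=float))[:, None]
    comp = np.exp(-0.5 * (e - M)**2 / V2) / np.sqrt(2.0 * np.pi * V2)
    return comp @ P

grid = np.arange(-40.0, 10.0, 0.01)
max_abs_diff = np.max(np.abs(log_chi2_density(grid) - mixture_density(grid)))
print("largest pointwise difference on the grid:", max_abs_diff)
```

The mixture mean $\sum_i p_i m_i$ can also be compared with the known mean of the $\log\chi_1^2$ distribution, which is approximately $-1.2704$.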
We approximate $f(\epsilon_t^*)$ using the mixture of normal distributions as in Chap. 2, and further consider the approximation of $f(\eta_t|\epsilon_t^*, d_t)$ using the conditional normal distribution as

$$g(\epsilon_t^*, \eta_t|d_t) = \sum_{i=1}^{K} p_i f_N(\epsilon_t^*|m_i, v_i^2) f_N(\eta_t|m_{\eta i}, v_{\eta i}^2), \qquad (3.8)$$

where

$$m_{\eta i} = d_t\rho\sigma_\eta\exp(m_i/2)\left[a_i + b_i(\epsilon_t^* - m_i)\right], \qquad v_{\eta i}^2 = \sigma_\eta^2(1-\rho^2),$$

and $(a_i, b_i)$ are constants to be determined. In other words, we utilize a mixture of bivariate Gaussian densities to approximate the distribution of $\epsilon_t^*, \eta_t|d_t$. The remaining question is the determination of the density $f_N(\eta_t|m_{\eta i}, v_{\eta i}^2)$ to well approximate the density of $\eta_t|\epsilon_t^*, d_t$ in the $i$-th component of the mixture distribution. Due to the form of (3.6), this amounts to approximating $\exp(\epsilon_t^*/2)\exp(-m_i/2)$ by $a_i + b_i(\epsilon_t^* - m_i)$, given $\epsilon_t^* \sim N(m_i, v_i^2)$. We find the values of $(a_i, b_i)$ by considering the mean square norm and setting

$$(a_i, b_i) = \arg\min_{a,b} E\left[\left(\exp(\epsilon_t^*/2)\exp(-m_i/2) - a - b(\epsilon_t^* - m_i)\right)^2\right],$$

where the solutions to this minimization problem are given by

$$a_i = \exp(v_i^2/8), \qquad b_i = v_i^{-1}E[z_t\exp(v_i z_t/2)] = \frac{\exp(v_i^2/8)}{2},$$

with $z_t \sim N(0, 1)$, for $i = 1, 2, \dots, K$. The implied values of $(a_i, b_i)$ are given in Table 3.1.

To investigate how well (3.8) approximates (3.7), we give results for $\rho = -0.3$, $-0.6$, and $-0.9$ in Fig. 3.1. It shows $f$ and $g$ for $\eta_t|\epsilon_t^*, d_t = 1$ evaluated with $\epsilon_t^*$ set at its 25th, 50th, and 75th percentiles. Likewise, Fig. 3.2 shows $f$ and $g$ for $\epsilon_t^*|\eta_t, d_t = 1$ evaluated with $\eta_t = -0.67\sigma_\eta, 0, 0.67\sigma_\eta$. The results suggest that the approximation is quite good, as it is very hard to detect any difference between the true densities $f$ and the approximations $g$. Further, Fig. 3.3 shows the marginal density of $\eta_t$ given $d_t = 1$. It is clear that the true conditional joint density given $d_t$ is well approximated by the stated bivariate normal mixture.

Fig. 3.1 Conditional density of $\eta_t$ given $d_t = 1$ and $\epsilon_t^* = \log\chi_1^2(0.25), \log\chi_1^2(0.5), \log\chi_1^2(0.75)$ (left, center, right) for $\rho = -0.3, -0.6, -0.9$ (top, middle, bottom); each panel compares the true density with the approximation

Using the mixture approximation to the density of $\epsilon_t^*$ and introducing the mixture component indicator $s_t \in \{1, 2, \dots, K\}$, we can express

$$\epsilon_t^*|s_t = m_{s_t} + v_{s_t}z_{1t},$$

$$\eta_t|\epsilon_t^*, s_t, d_t = d_t\rho\sigma_\eta(a_{s_t} + b_{s_t}v_{s_t}z_{1t})\exp(m_{s_t}/2) + \sigma_\eta\sqrt{1-\rho^2}\,z_{2t},$$

where $(z_{1t}, z_{2t})' \sim$ i.i.d. $N(0, I_2)$. Given $s = (s_1, \dots, s_n)'$, we have that the ASV model can be approximated by the linear Gaussian state-space form

$$y_t^* = m_{s_t} + h_t + (v_{s_t}, 0)\,z_t,$$

$$h_{t+1} = \mu(1-\phi) + d_t\rho\sigma_\eta a_{s_t}\exp(m_{s_t}/2) + \phi h_t + \left(d_t\rho\sigma_\eta b_{s_t}v_{s_t}\exp(m_{s_t}/2),\ \sigma_\eta\sqrt{1-\rho^2}\right)z_t,$$

where $z_t = (z_{1t}, z_{2t})' \sim$ i.i.d. $N(0, I_2)$. Assuming the prior distribution for $\theta = (\mu, \phi, \sigma_\eta^2, \rho)'$ as in the previous section, we can implement MCMC simulation.
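The closed forms $a_i = \exp(v_i^2/8)$ and $b_i = \exp(v_i^2/8)/2$ for the least-squares constants can be checked numerically. The Python sketch below computes the two expectations that define the least-squares solution by Gauss-Hermite quadrature (using NumPy's `hermegauss`, whose weight function is $\exp(-x^2/2)$) and compares them with the $(a_i, b_i)$ columns of Table 3.1. The function name `ls_coefficients` and the quadrature degree are choices made for this illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# v_i^2, a_i, b_i columns of Table 3.1.
V2 = np.array([0.11265, 0.17788, 0.26768, 0.40611, 0.62699,
               0.98583, 1.57469, 2.54498, 4.16591, 7.33342])
A_TABLE = np.array([1.01418, 1.02248, 1.03403, 1.05207, 1.08153,
                    1.13114, 1.21754, 1.37454, 1.68327, 2.50097])
B_TABLE = np.array([0.50710, 0.51124, 0.51701, 0.52604, 0.54076,
                    0.56557, 0.60877, 0.68728, 0.84163, 1.25049])

def ls_coefficients(v2, deg=60):
    """Least-squares fit of exp(v z / 2) by a + b * (v z), with z ~ N(0, 1)."""
    x, w = hermegauss(deg)
    w = w / np.sqrt(2.0 * np.pi)          # weights now give E[.] under N(0, 1)
    v = np.sqrt(v2)
    target = np.exp(v * x / 2.0)
    a = np.sum(w * target)                # a = E[exp(v z / 2)]
    b = np.sum(w * x * target) / v        # b = E[z exp(v z / 2)] / v
    return a, b

for v2, a_tab, b_tab in zip(V2, A_TABLE, B_TABLE):
    a, b = ls_coefficients(v2)
    print(f"v2={v2:.5f}  a={a:.5f} (table {a_tab:.5f})  b={b:.5f} (table {b_tab:.5f})")
```

The intercept is the mean of the regressand and the slope is the covariance with the centred regressor divided by its variance, which is why only these two expectations are needed.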
3.3.2 MCMC Algorithm

Let $y^* = (y_1^*, \dots, y_n^*)'$ and $d = (d_1, \dots, d_n)'$. The simple way to generate $\theta = (\mu, \phi, \sigma_\eta^2, \rho)'$ from its posterior distribution is as follows.

1. Initialize $s$, $h$ and $\theta$.
2. Generate $s|h, \theta, y^*, d$.
3. Generate $(h, \theta)|s, y^*, d$ by
   a. Sampling $\theta_{\setminus\mu}|s, y^*, d$.
   b. Sampling $\mu|\theta_{\setminus\mu}, s, y^*, d$.
   c. Sampling $h|\theta, s, y^*, d$.
4. Go to Step 2.

Fig. 3.2 Conditional density of $\epsilon_t^*$ given $d_t = 1$ and $\eta_t = -0.67\sigma_\eta, 0, 0.67\sigma_\eta$ (left, center, right) for $\rho = -0.3, -0.6, -0.9$ (top, middle, bottom) with $\sigma_\eta = 1$ in this example; each panel compares the true density with the approximation

Fig. 3.3 Marginal density of $\eta_t$ given $d_t = 1$ for $\rho = -0.3, -0.6, -0.9$ (left, middle, right) with $\sigma_\eta = 1$ in this example; each panel compares the true density with the approximation

Step 2. Generation of $s_t$. For $t = 1, \dots, n$, we generate $s_t$ from the posterior probability mass function

$$\pi(s_t|h, \theta, y^*, d) \propto \frac{p_{s_t}}{v_{s_t}}\exp\left\{-\frac{(y_t^* - h_t - m_{s_t})^2}{2v_{s_t}^2} - \frac{(h_{t+1} - \bar h_{s_t})^2}{2\sigma_\eta^2(1-\rho^2)}\right\},$$

where $s_t = 1, \dots, 10$, $p_{s_t}$ is given in Table 3.1, and

$$\bar h_{s_t} = \mu(1-\phi) + \phi h_t + d_t\rho\sigma_\eta\exp(m_{s_t}/2)\left[a_{s_t} + b_{s_t}(y_t^* - h_t - m_{s_t})\right].$$
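The generation of the mixture indicators can be vectorised over $t$. The following Python sketch is one way to do this, using the constants of Table 3.1. The function names are hypothetical, the indicators are returned as 0-based indices, and for $t = n$ (where $h_{n+1}$ is not available) only the measurement part of the kernel is used, which is an assumption consistent with the treatment of the final observation in the definition of $g$ in Sect. 3.3.3.2.

```python
import numpy as np

# Table 3.1 constants (K = 10).
P = np.array([0.00609, 0.04775, 0.13057, 0.20674, 0.22715,
              0.18842, 0.12047, 0.05591, 0.01575, 0.00115])
M = np.array([1.92677, 1.34744, 0.73504, 0.02266, -0.85173,
              -1.97278, -3.46788, -5.55246, -8.68384, -14.65000])
V2 = np.array([0.11265, 0.17788, 0.26768, 0.40611, 0.62699,
               0.98583, 1.57469, 2.54498, 4.16591, 7.33342])
A = np.exp(V2 / 8.0)          # a_i = exp(v_i^2 / 8)
B = A / 2.0                   # b_i = exp(v_i^2 / 8) / 2

def indicator_probs(ystar, h, d, mu, phi, sigma_eta, rho):
    """Posterior probabilities of s_t = i, returned as an (n, K) array."""
    resid = (ystar - h)[:, None] - M                      # y*_t - h_t - m_i
    log_p = np.log(P) - 0.5 * np.log(V2) - 0.5 * resid**2 / V2
    h_bar = (((1.0 - phi) * mu + phi * h)[:, None]
             + d[:, None] * rho * sigma_eta * np.exp(M / 2.0) * (A + B * resid))
    # Transition part, available only for t = 1, ..., n-1.
    log_p[:-1] -= 0.5 * (h[1:, None] - h_bar[:-1])**2 / (sigma_eta**2 * (1.0 - rho**2))
    log_p -= log_p.max(axis=1, keepdims=True)             # stabilise before exp
    prob = np.exp(log_p)
    return prob / prob.sum(axis=1, keepdims=True)

def draw_indicators(prob, rng):
    """Draw s_t (0-based) for every t by inverting the discrete CDF."""
    u = rng.uniform(size=(prob.shape[0], 1))
    s = (u > np.cumsum(prob, axis=1)).sum(axis=1)
    return np.minimum(s, prob.shape[1] - 1)               # guard against rounding
```

Given the draws, `M[s]`, `V2[s]`, `A[s]`, and `B[s]` supply $m_{s_t}$, $v_{s_t}^2$, $a_{s_t}$, and $b_{s_t}$ for the linear Gaussian state-space form.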
Step 3a. Generation of $(\phi, \sigma_\eta^2, \rho)$. We sample $\vartheta = \theta_{\setminus\mu} = (\phi, \sigma_\eta^2, \rho)'$ from the approximate posterior density $\pi^*(\vartheta|s, y^*, d) \propto g(y^*|\vartheta, s, d)\pi(\vartheta)$ using the MH algorithm, where

$$g(y^*|\vartheta, s, d) = \int \prod_{t=1}^{n} f_N(y_t^*|h_t + m_{s_t}, v_{s_t}^2) \times f_N(h_{t+1}|\bar h_{s_t}, \sigma_\eta^2(1-\rho^2)) \times \pi(\mu)\, dh\, d\mu.$$

We evaluate $g(y^*|\vartheta, s, d)$ using an augmented Kalman filter as shown in Sect. 3.7.2 and compute the posterior mode $\hat\vartheta$. Define $\vartheta^*$ and $\Sigma^*$ as

$$\vartheta^* = \hat\vartheta, \qquad \Sigma^{*-1} = -\left.\frac{\partial^2\log\pi(\vartheta|s, y^*)}{\partial\vartheta\,\partial\vartheta'}\right|_{\vartheta = \hat\vartheta},$$

and generate a candidate $\vartheta_y$ from the distribution $N(\vartheta^*, \Sigma^*)$ truncated over $R = \{\vartheta : |\phi| < 1, \sigma_\eta^2 > 0, |\rho| < 1\}$. Let $\vartheta_x$ denote the current point of $\vartheta$. We accept the candidate $\vartheta_y$ with probability

$$\alpha(\vartheta_x, \vartheta_y|s, y^*) = \min\left\{\frac{\pi(\vartheta_y|s, y^*)\, f_{TN}(\vartheta_x|\vartheta^*, \Sigma^*)}{\pi(\vartheta_x|s, y^*)\, f_{TN}(\vartheta_y|\vartheta^*, \Sigma^*)}, 1\right\},$$

where $f_{TN}$ denotes the density of the normal distribution truncated over the above region for the proposal. If the candidate $\vartheta_y$ is rejected, we take the current value $\vartheta_x$ as the next draw. When the Hessian matrix is not negative definite (or when $|\hat\rho| \approx 1$), we take a flat proposal with $\vartheta^* = \hat\vartheta$ and $\Sigma^* = c_0 I$ using some constant $c_0$.

Remark. If it is difficult to obtain the mode $\hat\vartheta$ and generate from the truncated normal distribution, consider a reparameterization such as $(\log\{(1+\phi)/(1-\phi)\}, \log\sigma_\eta^2, \log\{(1+\rho)/(1-\rho)\})$. Note that we need to include the Jacobian term in the MH algorithm to evaluate the posterior densities of the transformed parameters.

Step 3b. Generation of $\mu$. Generate $\mu|\theta_{\setminus\mu}, s, y^* \sim N(c_1, C_1)$, where $c_1$ and $C_1$ are computed using the byproducts of the augmented Kalman filter (see Sect. 3.7.2).

Step 3c. Generation of $h$. We sample $h$ using the simulation smoother [5, 9] (see Sect. 3.7.1).
3.3.3 Correcting for Misspecification

3.3.3.1 Posterior Resampling

In our approach, we approximate the true bivariate density $f(\epsilon_t^*, \eta_t|d_t, \theta)$ with our convenient mixture density $g(\epsilon_t^*, \eta_t|d_t, \theta)$. Thus, the draws from our MCMC procedure,

$$(h^j, \theta^j), \quad j = 1, 2, \dots, M,$$

are from the approximate posterior density $g(h, \theta|y^*, d)$. To produce draws from the correct posterior density $f(h, \theta|y^*, d)$, we simply reweight the sampled draws. Define

$$\epsilon_t^{*j} = y_t^* - h_t^j, \qquad \eta_t^j = (h_{t+1}^j - \mu^j) - \phi^j(h_t^j - \mu^j).$$

Then, we compute the weights

$$w_j^* = \prod_{t=1}^{n}\frac{f(\epsilon_t^{*j}, \eta_t^j|d_t, \theta^j)}{g(\epsilon_t^{*j}, \eta_t^j|d_t, \theta^j)}, \quad j = 1, 2, \dots, M,$$

and let

$$w_j = \frac{w_j^*}{\sum_{i=1}^{M} w_i^*}.$$

We can now produce a sample from $\pi(h, \theta|y^*, d)$ by resampling the sampled variates with weights $w_j$. Furthermore, posterior moments can be computed via weighted averaging of the MCMC draws. As shown in [18], the variance of these weights is small, and the effect of reweighting is modest.
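The reweighting step can be sketched in Python as follows. The function names are hypothetical, and the sketch uses only the terms $t = 1, \dots, n-1$ of the product, for which $\eta_t^j$ can be formed from the sampled $h^j$; how the final observation is handled is an assumption of this example rather than something stated in the text.

```python
import numpy as np

# Table 3.1 constants (K = 10).
P = np.array([0.00609, 0.04775, 0.13057, 0.20674, 0.22715,
              0.18842, 0.12047, 0.05591, 0.01575, 0.00115])
M = np.array([1.92677, 1.34744, 0.73504, 0.02266, -0.85173,
              -1.97278, -3.46788, -5.55246, -8.68384, -14.65000])
V2 = np.array([0.11265, 0.17788, 0.26768, 0.40611, 0.62699,
               0.98583, 1.57469, 2.54498, 4.16591, 7.33342])
A = np.exp(V2 / 8.0)
B = A / 2.0

def norm_logpdf(x, mean, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean)**2 / var)

def log_f_true(eps, eta, d, sigma_eta, rho):
    """log f(eps*, eta | d, theta): log chi^2_1 density times the normal in (3.6)."""
    log_fe = -0.5 * np.log(2.0 * np.pi) + 0.5 * (eps - np.exp(eps))
    mean = d * rho * sigma_eta * np.exp(eps / 2.0)
    return log_fe + norm_logpdf(eta, mean, sigma_eta**2 * (1.0 - rho**2))

def log_g_mix(eps, eta, d, sigma_eta, rho):
    """log g(eps*, eta | d, theta): bivariate normal mixture in (3.8)."""
    e = eps[:, None]
    m_eta = d[:, None] * rho * sigma_eta * np.exp(M / 2.0) * (A + B * (e - M))
    comp = (np.log(P) + norm_logpdf(e, M, V2)
            + norm_logpdf(eta[:, None], m_eta, sigma_eta**2 * (1.0 - rho**2)))
    mx = comp.max(axis=1)
    return mx + np.log(np.exp(comp - mx[:, None]).sum(axis=1))

def log_weight(ystar, h, d, mu, phi, sigma_eta, rho):
    """log w*_j for one MCMC draw (h, theta), using t = 1, ..., n-1."""
    eps = (ystar - h)[:-1]
    eta = (h[1:] - mu) - phi * (h[:-1] - mu)
    return np.sum(log_f_true(eps, eta, d[:-1], sigma_eta, rho)
                  - log_g_mix(eps, eta, d[:-1], sigma_eta, rho))

def resample(log_w, rng):
    """Normalise the weights and resample draw indices with replacement."""
    w = np.exp(log_w - np.max(log_w))
    w /= w.sum()
    return rng.choice(len(w), size=len(w), p=w), w
```

Given the normalised weights `w`, a posterior mean under the exact model can also be estimated by the weighted average of the stored draws without resampling.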
3.3.3.2 Exact Sampling Using a Pseudotarget Density

To generate $(h, \theta)$ from the true conditional posterior density $\pi(h, \theta|y)$, we alternatively define the pseudotarget density as discussed in [6] and implement the MCMC algorithm. Define the pseudotarget density

$$\tilde\pi(h, \theta, s|y) = \pi(h, \theta|y) \times q(s|h, \theta, y^*, d),$$

where

$$q(s|h, \theta, y^*, d) = \prod_{t=1}^{n}\frac{p_{s_t}\, g(y_t^*, h_{t+1}|h_t, \theta, s_t, d)}{\sum_{i=1}^{10} p_i\, g(y_t^*, h_{t+1}|h_t, \theta, s_t = i, d)},$$

$$g(y_t^*, h_{t+1}|h_t, \theta, s_t, d) = \begin{cases} f_N(y_t^*|m_{s_t} + h_t, v_{s_t}^2)\, f_N(h_{t+1}|\bar h_{s_t}, \sigma_\eta^2(1-\rho^2)), & t < n, \\ f_N(y_t^*|m_{s_t} + h_t, v_{s_t}^2), & t = n, \end{cases}$$

$$\bar h_{s_t} = \mu(1-\phi) + \phi h_t + d_t\rho\sigma_\eta\exp(m_{s_t}/2)\left[a_{s_t} + b_{s_t}(y_t^* - h_t - m_{s_t})\right],$$

and the marginal density $\pi(h, \theta|y)$ is our target density. The MCMC algorithm is implemented in four blocks.

1. Initialize $h$ and $\theta$.
2. Generate $(\theta, s)|h, y \sim \tilde\pi(\theta, s|h, y)$.
   a. Generate $\theta|h \sim \pi(\theta|h, y)$ as below. Note that $\pi(\theta|h, y) \propto \sum_s \tilde\pi(h, s|\theta, y)$. We consider the transformation $\vartheta = (\mu, \log\{(1+\phi)/(1-\phi)\}, \log\sigma_\eta^2, \log\{(1+\rho)/(1-\rho)\})'$, where

$$\left|\frac{d\vartheta}{d\theta}\right| = \frac{4}{(1-\phi^2)(1-\rho^2)\sigma_\eta^2}, \quad \phi = \frac{\exp(\vartheta_2) - 1}{\exp(\vartheta_2) + 1}, \quad \sigma_\eta^2 = \exp(\vartheta_3), \quad \rho = \frac{\exp(\vartheta_4) - 1}{\exp(\vartheta_4) + 1}.$$

   To sample from the conditional posterior density $\pi(\vartheta|h, y) = \pi(\theta|h, y) \times |d\theta/d\vartheta|$ using the MH algorithm, compute the posterior mode $\hat\vartheta$. Define $\vartheta^*$ and $\Sigma^*$ as

$$\vartheta^* = \hat\vartheta, \qquad \Sigma^{*-1} = -\left.\frac{\partial^2\log\pi(\vartheta|h, y)}{\partial\vartheta\,\partial\vartheta'}\right|_{\vartheta = \hat\vartheta},$$

   and generate a candidate $\vartheta_y$ from the distribution $N(\vartheta^*, \Sigma^*)$. Let $\vartheta_x$ denote the current point of $\vartheta$. We accept the candidate $\vartheta_y$ with probability

$$\alpha(\vartheta_x, \vartheta_y|h, y) = \min\left\{\frac{\pi(\vartheta_y|h, y)\, f_N(\vartheta_x|\vartheta^*, \Sigma^*)}{\pi(\vartheta_x|h, y)\, f_N(\vartheta_y|\vartheta^*, \Sigma^*)}, 1\right\},$$

   where $f_N$ denotes the density of the normal distribution above. If the candidate $\vartheta_y$ is rejected, we take the current value $\vartheta_x$ as the next draw.
   b. Generate $s|h, y^* \sim q(s|h, \theta, y^*, d)$.
3. Generate $h|\theta, s, y \sim \tilde\pi(h|\theta, s, y)$.
   a. Propose a candidate $h^\dagger = (h_1^\dagger, \dots, h_n^\dagger)'$ using a simulation smoother for the linear Gaussian state-space model:

$$y_t^* = m_{s_t} + h_t + (v_{s_t}, 0)\,u_t^*, \quad t = 1, \dots, n,$$

$$h_{t+1} = \mu(1-\phi) + d_t\rho\sigma_\eta a_{s_t}\exp(m_{s_t}/2) + \phi h_t + \left(d_t\rho\sigma_\eta b_{s_t}v_{s_t}\exp(m_{s_t}/2),\ \sigma_\eta\sqrt{1-\rho^2}\right)u_t^*, \quad t = 1, \dots, n-1,$$

$$h_1 \sim N(\mu, \sigma_\eta^2/(1-\phi^2)), \quad |\phi| < 1, \qquad u_t^* \sim \text{i.i.d. } N(0, I_2).$$

   Thus, $h^\dagger$ is a sample from

$$\pi^*(h|\theta, s, y^*, d) = \frac{\prod_{t=1}^{n} g(y_t^*, h_{t+1}|h_t, \theta, s_t, d) \times f_N(h_1|\mu, \sigma_\eta^2/(1-\phi^2))}{m(y^*|\theta, s)},$$

   where $m(y^*|\theta, s)$ is a normalizing constant given by

$$m(y^*|\theta, s) = \int \prod_{t=1}^{n} g(y_t^*, h_{t+1}|h_t, \theta, s_t, d) \times f_N(h_1|\mu, \sigma_\eta^2/(1-\phi^2))\, dh.$$

   b. Let

$$f(y_t, h_{t+1}|h_t, \theta) = \begin{cases} f_N(y_t|0, \exp(h_t))\, f_N(h_{t+1}|m_h, v_h), & t < n, \\ f_N(y_t|0, \exp(h_t)), & t = n, \end{cases}$$

   where $m_h = \mu(1-\phi) + \phi h_t + \rho\sigma_\eta y_t\exp(-h_t/2)$ and $v_h = \sigma_\eta^2(1-\rho^2)$. Given the current value $h$, accept the candidate $h^\dagger$ with probability

$$\min\left\{1, \frac{\tilde\pi(h^\dagger|\theta, s, y)\,\pi^*(h|\theta, s, y^*, d)}{\tilde\pi(h|\theta, s, y)\,\pi^*(h^\dagger|\theta, s, y^*, d)}\right\}$$

$$= \min\left\{1, \frac{\pi(h^\dagger|\theta, y)\, q(s|h^\dagger, \theta, y^*, d)\,\pi^*(h|\theta, s, y^*, d)}{\pi(h|\theta, y)\, q(s|h, \theta, y^*, d)\,\pi^*(h^\dagger|\theta, s, y^*, d)}\right\}$$

$$= \min\left\{1, \frac{q(s|h^\dagger, \theta, y^*, d)\prod_{t=1}^{n} f(y_t, h_{t+1}^\dagger|h_t^\dagger, \theta)\, g(y_t^*, h_{t+1}|h_t, \theta, s_t, d)}{q(s|h, \theta, y^*, d)\prod_{t=1}^{n} f(y_t, h_{t+1}|h_t, \theta)\, g(y_t^*, h_{t+1}^\dagger|h_t^\dagger, \theta, s_t, d)}\right\}$$

$$= \min\left\{1, \frac{\prod_{t=1}^{n} f(y_t, h_{t+1}^\dagger|h_t^\dagger, \theta)\sum_{i=1}^{10} p_i\, g(y_t^*, h_{t+1}|h_t, \theta, s_t = i, d)}{\prod_{t=1}^{n} f(y_t, h_{t+1}|h_t, \theta)\sum_{i=1}^{10} p_i\, g(y_t^*, h_{t+1}^\dagger|h_t^\dagger, \theta, s_t = i, d)}\right\}.$$

4. Go to Step 2.

Remark. The exact sampling using a pseudotarget density is available in the R package ASV for stochastic volatility models, which estimates model parameters and computes the logarithm of the likelihood and the marginal likelihood (see, e.g., https://sites.google.com/view/omori-stat/english/software/asv-r).
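3.4 Multi-move Sampler
This section extends the multi-move sampler for the symmetric SV model in Chap. 2 to that for the asymmetric SV model. Based on [18–20], we consider sampling $h$ and $(\mu, \sigma_\eta^2, \rho)$ using the following reparameterization:

$$\Sigma = \begin{pmatrix} \sigma_\epsilon^2 & \rho\sigma_\epsilon\sigma_\eta \\ \rho\sigma_\epsilon\sigma_\eta & \sigma_\eta^2 \end{pmatrix} = \begin{pmatrix} \exp(\mu) & \rho\sigma_\eta\exp(\mu/2) \\ \rho\sigma_\eta\exp(\mu/2) & \sigma_\eta^2 \end{pmatrix}, \quad \alpha_t = h_t - \mu, \quad t = 1, \ldots, n,$$

and assume the prior distribution for $\Sigma$ as $\Sigma \sim IW(\nu_0, S_0)$, where its probability density function is given by

$$\pi(\Sigma) \propto |\Sigma|^{-(\nu_0+3)/2} \exp\left\{-\frac{1}{2}\,\mathrm{tr}\left(S_0^{-1}\Sigma^{-1}\right)\right\}.$$

Our corresponding ASV model is now given by

$$y_t = \exp(\alpha_t/2)\,\epsilon_t, \quad t = 1, \ldots, n,$$
$$\alpha_{t+1} = \phi\alpha_t + \eta_t, \quad t = 1, \ldots, n-1,$$
$$\alpha_1 \sim N(0, \sigma_\eta^2/(1-\phi^2)),$$

where $|\phi| < 1$, $\alpha_t = h_t - \mu$, and

$$\begin{pmatrix} \epsilon_t \\ \eta_t \end{pmatrix} \sim N(0, \Sigma), \quad \Sigma = \begin{pmatrix} \sigma_\epsilon^2 & \rho\sigma_\epsilon\sigma_\eta \\ \rho\sigma_\epsilon\sigma_\eta & \sigma_\eta^2 \end{pmatrix}.$$

Let $\alpha = (\alpha_1, \ldots, \alpha_n)'$. Then, we implement the MCMC simulation in five steps.
Step 1. Initialize $\theta = (\phi, \Sigma)$ and $\alpha$.
Step 2. Generate $\phi \sim \pi(\phi \mid \Sigma, \alpha, y)$.
Step 3. Generate $\Sigma \sim \pi(\Sigma \mid \phi, \alpha, y)$.
Step 4. Generate $\alpha \sim \pi(\alpha \mid \phi, \Sigma, y)$.
Step 5. Go to Step 2.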
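To make the reparameterized model concrete, the following sketch simulates $(y_t, \alpha_t)$ from the ASV model with $\sigma_\epsilon = \exp(\mu/2)$. It is a minimal illustration using NumPy; the function name and the parameter values in the usage line are assumptions, not part of the original text.

```python
import numpy as np

def simulate_asv(n, mu, phi, sigma_eta, rho, seed=0):
    """Simulate y_t = exp(alpha_t / 2) eps_t, alpha_{t+1} = phi alpha_t + eta_t."""
    rng = np.random.default_rng(seed)
    sigma_eps = np.exp(mu / 2.0)
    # Covariance matrix Sigma of (eps_t, eta_t)
    cov = np.array([
        [sigma_eps**2, rho * sigma_eps * sigma_eta],
        [rho * sigma_eps * sigma_eta, sigma_eta**2],
    ])
    shocks = rng.multivariate_normal(np.zeros(2), cov, size=n)
    eps, eta = shocks[:, 0], shocks[:, 1]
    alpha = np.empty(n)
    # Stationary initial state
    alpha[0] = rng.normal(0.0, sigma_eta / np.sqrt(1.0 - phi**2))
    for t in range(n - 1):
        alpha[t + 1] = phi * alpha[t] + eta[t]
    y = np.exp(alpha / 2.0) * eps
    return y, alpha

# Hypothetical parameter values, chosen only for illustration
y, alpha = simulate_asv(1000, mu=-0.4, phi=0.95, sigma_eta=0.3, rho=-0.7)
```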
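Step 2. Generation of $\phi$. Let $\pi(\phi)$ denote the prior density function of $\phi$ given by $\pi(\phi) \propto (1+\phi)^{a-1}(1-\phi)^{b-1}$, where we assume $(\phi+1)/2 \sim \text{Beta}(a, b)$. The logarithm of the full conditional posterior density of $\phi$ is

$$\log \pi(\phi \mid \cdot) = \text{const} + \log \pi(\phi) + \frac{1}{2}\log(1-\phi^2) - \frac{\alpha_1^2(1-\phi^2)}{2\sigma_\eta^2} - \frac{1}{2(1-\rho^2)\sigma_\eta^2} \sum_{t=1}^{n-1} \left[\alpha_{t+1} - \phi\alpha_t - \rho\sigma_\eta\sigma_\epsilon^{-1} y_t \exp(-\alpha_t/2)\right]^2.$$

To sample from the full conditional posterior distribution, we apply the MH algorithm. Given the current sample $\phi_x$, we generate a candidate $\phi_y \sim TN_{(-1,1)}(\mu_\phi, \sigma_\phi^2)$, where

$$\mu_\phi = \frac{\sum_{t=1}^{n-1} \left[\alpha_{t+1} - \rho\sigma_\eta\sigma_\epsilon^{-1} y_t \exp(-\alpha_t/2)\right]\alpha_t}{\rho^2\alpha_1^2 + \sum_{t=2}^{n-1}\alpha_t^2}, \quad \sigma_\phi^2 = \frac{\sigma_\eta^2(1-\rho^2)}{\rho^2\alpha_1^2 + \sum_{t=2}^{n-1}\alpha_t^2},$$

and accept it with probability

$$\min\left\{\frac{(1+\phi_y)^{a-1/2}(1-\phi_y)^{b-1/2}}{(1+\phi_x)^{a-1/2}(1-\phi_x)^{b-1/2}},\ 1\right\}.$$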
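The update of $\phi$ described in Step 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the truncated normal draw uses a plain rejection loop (adequate when $\mu_\phi$ lies well inside $(-1, 1)$), and all function names are hypothetical.

```python
import math
import numpy as np

def draw_truncated_normal(rng, mean, sd, low=-1.0, high=1.0):
    """Draw from N(mean, sd^2) restricted to (low, high) by simple rejection."""
    while True:
        x = rng.normal(mean, sd)
        if low < x < high:
            return x

def update_phi(phi_x, alpha, y, sigma_eta, sigma_eps, rho, a, b, rng):
    """One MH update of phi given alpha (length n) and y (length n)."""
    # alpha_{t+1} - rho * sigma_eta / sigma_eps * y_t * exp(-alpha_t / 2), t = 1..n-1
    resid = alpha[1:] - rho * sigma_eta / sigma_eps * y[:-1] * np.exp(-alpha[:-1] / 2.0)
    # rho^2 alpha_1^2 + sum_{t=2}^{n-1} alpha_t^2
    denom = rho**2 * alpha[0] ** 2 + np.sum(alpha[1:-1] ** 2)
    mu_phi = np.sum(resid * alpha[:-1]) / denom
    sd_phi = sigma_eta * math.sqrt((1.0 - rho**2) / denom)
    phi_y = draw_truncated_normal(rng, mu_phi, sd_phi)

    def log_weight(phi):
        # log of (1 + phi)^(a - 1/2) (1 - phi)^(b - 1/2)
        return (a - 0.5) * math.log1p(phi) + (b - 0.5) * math.log1p(-phi)

    log_ratio = log_weight(phi_y) - log_weight(phi_x)
    if rng.uniform() < math.exp(min(0.0, log_ratio)):
        return phi_y
    return phi_x
```

The Gaussian part of the conditional density cancels against the proposal, so only the factor $(1+\phi)^{a-1/2}(1-\phi)^{b-1/2}$ enters the acceptance ratio.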
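Step 3. Generation of $\Sigma$. The logarithm of the full conditional probability density is given by

$$\log \pi(\Sigma \mid \cdot) = \text{const} - \log\sigma_\eta - \frac{\alpha_1^2(1-\phi^2)}{2\sigma_\eta^2} - \frac{\nu_1+3}{2}\log|\Sigma| - \frac{1}{2}\,\mathrm{tr}\left(S_1^{-1}\Sigma^{-1}\right) - \log\sigma_\epsilon - \frac{y_n^2}{2\sigma_\epsilon^2\exp(\alpha_n)},$$

where

$$\nu_1 = \nu_0 + n - 1, \quad S_1^{-1} = S_0^{-1} + \sum_{t=1}^{n-1} z_t z_t', \quad z_t = \left(y_t\exp(-\alpha_t/2),\ \alpha_{t+1} - \phi\alpha_t\right)'.$$

To sample from the full conditional posterior distribution, we apply the MH algorithm. Given the current sample $\Sigma_x$, we generate a candidate $\Sigma_y \sim IW(\nu_1, S_1)$ and accept it with probability

$$\min\left\{\frac{\sigma_{\eta,y}^{-1}\sigma_{\epsilon,y}^{-1}\exp\left\{-\dfrac{\alpha_1^2(1-\phi^2)}{2\sigma_{\eta,y}^2} - \dfrac{y_n^2}{2\sigma_{\epsilon,y}^2\exp(\alpha_n)}\right\}}{\sigma_{\eta,x}^{-1}\sigma_{\epsilon,x}^{-1}\exp\left\{-\dfrac{\alpha_1^2(1-\phi^2)}{2\sigma_{\eta,x}^2} - \dfrac{y_n^2}{2\sigma_{\epsilon,x}^2\exp(\alpha_n)}\right\}},\ 1\right\}.$$

Step 4. Generation of $\alpha$. The log-likelihood of $y_t$ given $\alpha_t$, $\alpha_{t+1}$ and other parameters excluding the constant term is

$$l_t = -\frac{\alpha_t}{2} - \frac{(y_t - \mu_t)^2}{2\sigma_t^2},$$

where

$$\mu_t = \begin{cases} \rho\sigma_\epsilon\sigma_\eta^{-1}(\alpha_{t+1} - \phi\alpha_t)\exp(\alpha_t/2), & t = 1, \ldots, n-1, \\ 0, & t = n, \end{cases} \quad \sigma_t^2 = \begin{cases} (1-\rho^2)\sigma_\epsilon^2\exp(\alpha_t), & t = 1, \ldots, n-1, \\ \sigma_\epsilon^2\exp(\alpha_n), & t = n. \end{cases}$$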
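To sample $\alpha$ from its conditional posterior distribution, we divide it into multiple blocks and generate one block, for example, $\tilde{\alpha} \equiv (\alpha_{s+1}, \ldots, \alpha_{s+m})'$, given other blocks. We first sample $\tilde{\eta} \equiv (\eta_s, \ldots, \eta_{s+m-1})'$ and then obtain $\alpha_t$ ($t = s+1, \ldots, s+m$) recursively, since it is known to be more efficient than sampling the state $\tilde{\alpha}$ (e.g. [9]). To sample $\tilde{\eta}$ from its full conditional density, $f(\eta_s, \ldots, \eta_{s+m-1} \mid \alpha_s, \alpha_{s+m+1}, y_s, \ldots, y_{s+m})$, we consider the proposal density $f^*$ obtained using the second-order Taylor expansion of the log posterior density around the mode $\hat{\tilde{\eta}}$ as below.

$$\log f(\eta_s, \ldots, \eta_{s+m-1} \mid \alpha_s, \alpha_{s+m+1}, y_s, \ldots, y_{s+m}) = \text{const} - \frac{1}{2}\sum_{t=s}^{s+m-1}\eta_t^2 + L$$
$$\approx \text{const} - \frac{1}{2}\sum_{t=s}^{s+m-1}\eta_t^2 + \hat{L} + \left.\frac{\partial L}{\partial\tilde{\eta}'}\right|_{\tilde{\eta}=\hat{\tilde{\eta}}}(\tilde{\eta} - \hat{\tilde{\eta}}) + \frac{1}{2}(\tilde{\eta} - \hat{\tilde{\eta}})'\, E\left[\frac{\partial^2 L}{\partial\tilde{\eta}\,\partial\tilde{\eta}'}\right]_{\tilde{\eta}=\hat{\tilde{\eta}}}(\tilde{\eta} - \hat{\tilde{\eta}})$$
$$= \text{const} - \frac{1}{2}\sum_{t=s}^{s+m-1}\eta_t^2 + \hat{L} + \hat{d}'(\tilde{\alpha} - \hat{\tilde{\alpha}}) - \frac{1}{2}(\tilde{\alpha} - \hat{\tilde{\alpha}})'\hat{Q}(\tilde{\alpha} - \hat{\tilde{\alpha}})$$
$$= \text{const} + \log f^*(\eta_s, \ldots, \eta_{s+m-1} \mid \alpha_s, \alpha_{s+m+1}, y_s, \ldots, y_{s+m}),$$

where

$$L = \sum_{t=s}^{s+m} l_t - \frac{(\alpha_{s+m+1} - \phi\alpha_{s+m})^2}{2\sigma_\eta^2}\, I(s+m < n),$$
$$d = (d_{s+1}, \ldots, d_{s+m})', \quad d_t = \frac{\partial L}{\partial\alpha_t}, \quad t = s+1, \ldots, s+m,$$
$$Q = -E\left[\frac{\partial^2 L}{\partial\tilde{\alpha}\,\partial\tilde{\alpha}'}\right] = \begin{pmatrix} A_{s+1} & B_{s+2} & 0 & \cdots & 0 \\ B_{s+2} & A_{s+2} & B_{s+3} & \ddots & \vdots \\ 0 & B_{s+3} & A_{s+3} & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & B_{s+m} \\ 0 & \cdots & 0 & B_{s+m} & A_{s+m} \end{pmatrix},$$
$$A_t = -E\left[\frac{\partial^2 L}{\partial\alpha_t^2}\right], \quad t = s+1, \ldots, s+m,$$
$$B_t = -E\left[\frac{\partial^2 L}{\partial\alpha_t\,\partial\alpha_{t-1}}\right], \quad t = s+2, \ldots, s+m, \quad B_{s+1} = 0.$$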
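Here $\hat{d}$, $\hat{L}$, and $\hat{Q}$ are $d$, $L$, and $Q$ evaluated at $\tilde{\alpha} = \hat{\tilde{\alpha}}$ (equivalently, $\tilde{\eta} = \hat{\tilde{\eta}}$). It is easy to find $d_t$, $A_t$, and $B_t$ as

$$d_t = -\frac{1}{2} + \frac{(y_t - \mu_t)^2}{2\sigma_t^2} + \frac{y_t - \mu_t}{\sigma_t^2}\frac{\partial\mu_t}{\partial\alpha_t} + \frac{y_{t-1} - \mu_{t-1}}{\sigma_{t-1}^2}\frac{\partial\mu_{t-1}}{\partial\alpha_t} + \kappa(\alpha_t),$$
$$A_t = \frac{1}{2} + \sigma_t^{-2}\left(\frac{\partial\mu_t}{\partial\alpha_t}\right)^2 + \sigma_{t-1}^{-2}\left(\frac{\partial\mu_{t-1}}{\partial\alpha_t}\right)^2 + \kappa'(\alpha_t),$$
$$B_t = \sigma_{t-1}^{-2}\frac{\partial\mu_{t-1}}{\partial\alpha_{t-1}}\frac{\partial\mu_{t-1}}{\partial\alpha_t},$$

where

$$\frac{\partial\mu_t}{\partial\alpha_t} = \begin{cases} \dfrac{\rho\sigma_\epsilon}{\sigma_\eta}\left\{-\phi + \dfrac{\alpha_{t+1} - \phi\alpha_t}{2}\right\}\exp\left(\dfrac{\alpha_t}{2}\right), & t < n, \\ 0, & t = n, \end{cases} \quad \frac{\partial\mu_{t-1}}{\partial\alpha_t} = \begin{cases} 0, & t = 1, \\ \dfrac{\rho\sigma_\epsilon}{\sigma_\eta}\exp\left(\dfrac{\alpha_{t-1}}{2}\right), & t > 1, \end{cases}$$
$$\kappa(\alpha_t) = \begin{cases} \dfrac{\phi(\alpha_{t+1} - \phi\alpha_t)}{\sigma_\eta^2}, & t = s+m < n, \\ 0, & \text{otherwise}, \end{cases} \quad \kappa'(\alpha_t) = \begin{cases} \dfrac{\phi^2}{\sigma_\eta^2}, & t = s+m < n, \\ 0, & \text{otherwise}. \end{cases}$$

[…]

we define

$$\mu_\lambda = E[\lambda_t] = \frac{\nu}{\nu-2}, \quad \sigma_\lambda^2 = \text{Var}[\lambda_t] = \frac{2\nu^2}{(\nu-2)^2(\nu-4)}.$$

Assuming that $\mu_w = -\beta\mu_\lambda$ for $E[w_t] = 0$, and $\nu > 4$ for the finite variance of $w_t$, we have

$$w_t = \beta\bar{\lambda}_t + \sqrt{\lambda_t}\, z_t, \quad z_t \sim N(0, 1), \quad \lambda_t \sim \text{i.i.d. } IG(\nu/2, \nu/2), \qquad (4.1)$$

where $\bar{\lambda}_t = \lambda_t - \mu_\lambda$. Then, the moments of $w_t$ are (see Appendix A in [1] for the derivation)

$$E[w_t] = 0, \quad \text{Var}[w_t] = \frac{2\beta^2\nu^2}{(\nu-2)^2(\nu-4)} + \frac{\nu}{\nu-2} = \beta^2\sigma_\lambda^2 + \mu_\lambda,$$
$$\text{Sk}[w_t] = \frac{2\beta\sqrt{\nu(\nu-4)}}{\left[2\beta^2\nu + (\nu-2)(\nu-4)\right]^{3/2}}\left[3(\nu-2) + \frac{8\beta^2\nu}{\nu-6}\right], \qquad (4.2)$$
$$\text{Ku}[w_t] = 3 + \frac{6}{\left[2\beta^2\nu + (\nu-2)(\nu-4)\right]^2}\left[(\nu-2)^2(\nu-4) + \frac{16\beta^2\nu(\nu-2)(\nu-4)}{\nu-6} + \frac{8\beta^4\nu^2(5\nu-22)}{(\nu-6)(\nu-8)}\right]. \qquad (4.3)$$

Thus, the skewness and kurtosis of $w_t$ are jointly determined by the combination of parameter values of $\beta$ and $\nu$. When $\beta$ is nonzero, kurtosis exists for $\nu > 8$.

Fig. 4.1 Density function of the GH skew Student's t distribution

To illustrate that the shape of the distribution depends on the values of $\nu$ and $\beta$, we present the density function of $w_t$ in Fig. 4.1 (see [1] for the expression of the density function). We observe from Fig. 4.1a that a lower value of $\beta$ implies more negative skewness (left-skewness) as well as heavier tails; $\beta = 0$ corresponds to a symmetric Student's t density. On the other hand, as shown in Fig. 4.1b, the density becomes less skewed and has lighter tails as $\nu$ becomes larger. Additionally, when $\nu \to \infty$ or $\lambda_t = 1$ for all $t$, it becomes a normal density irrespective of the value of $\beta$. Hence, we may interpret $\nu$ and $\beta$ as parameters governing the tail heaviness and the skewness, respectively, but we need to note that the overall shape of the distribution is determined by the combination of their values.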
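The moment formulas (4.2) and (4.3) are easy to evaluate numerically. The sketch below is a hypothetical helper (its name and the restriction to $\nu > 8$ are choices made here for simplicity) that writes the $\beta^2\nu$ cross term of the kurtosis as in Aas and Haff [1] with $\delta^2 = \nu$.

```python
import math

def gh_skew_t_moments(beta, nu):
    """Variance, skewness, and kurtosis of w_t; restricted here to nu > 8."""
    if nu <= 8:
        raise ValueError("this sketch assumes nu > 8 so that all terms are finite")
    mu_lam = nu / (nu - 2.0)
    var_lam = 2.0 * nu**2 / ((nu - 2.0) ** 2 * (nu - 4.0))
    var = beta**2 * var_lam + mu_lam
    c = 2.0 * beta**2 * nu + (nu - 2.0) * (nu - 4.0)
    skew = (2.0 * beta * math.sqrt(nu * (nu - 4.0)) / c**1.5
            * (3.0 * (nu - 2.0) + 8.0 * beta**2 * nu / (nu - 6.0)))
    kurt = 3.0 + 6.0 / c**2 * (
        (nu - 2.0) ** 2 * (nu - 4.0)
        + 16.0 * beta**2 * nu * (nu - 2.0) * (nu - 4.0) / (nu - 6.0)
        + 8.0 * beta**4 * nu**2 * (5.0 * nu - 22.0) / ((nu - 6.0) * (nu - 8.0))
    )
    return var, skew, kurt

# With beta = 0 the values reduce to those of a Student's t distribution
var0, skew0, kurt0 = gh_skew_t_moments(0.0, 10.0)
```

For $\beta = 0$ the kurtosis collapses to $3 + 6/(\nu - 4)$, the familiar Student's t value, which gives a quick sanity check on the implementation.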
4.3 SV Model with GH Skew Student's t Error
Following [20], we extend the ASV model in Sect. 3.1 using a mixture representation of the GH skew Student's t distribution (4.1) as follows:

$$y_t = \exp(h_t/2)\,\epsilon_t, \quad t = 1, \ldots, n, \qquad (4.4)$$
$$h_{t+1} = \mu + \phi(h_t - \mu) + \eta_t, \quad t = 1, \ldots, n-1, \qquad (4.5)$$
$$h_1 \sim N(\mu, \sigma_\eta^2/(1-\phi^2)),$$

where $|\phi| < 1$ and

$$\epsilon_t = \frac{\beta\bar{\lambda}_t + \sqrt{\lambda_t}\, z_t}{\sqrt{\beta^2\sigma_\lambda^2 + \mu_\lambda}}, \quad \begin{pmatrix} z_t \\ \eta_t \end{pmatrix} \sim N(0, \Sigma), \quad \Sigma = \begin{pmatrix} 1 & \rho\sigma_\eta \\ \rho\sigma_\eta & \sigma_\eta^2 \end{pmatrix}.$$

This model is the same as the one proposed in [20] except for the standardization term $\sqrt{\beta^2\sigma_\lambda^2 + \mu_\lambda}$, which leaves the variance of the return as $\exp(h_t)$. Figure 4.2 shows the density function of $\epsilon_t$ for some values of $\nu$ and $\beta$. We observe that the standardized density takes higher values around 0 as $\nu$ decreases and/or $|\beta|$ increases. The correlation between $\epsilon_t$ and $\eta_t$ is a function of $\rho$, $\nu$, and $\beta$:

$$\text{Corr}[\epsilon_t, \eta_t] = \frac{E\left[\sqrt{\lambda_t}\right]}{\sqrt{\beta^2\sigma_\lambda^2 + \mu_\lambda}}\,\rho. \qquad (4.6)$$

Since $E[\sqrt{\lambda_t}]/\sqrt{\beta^2\sigma_\lambda^2 + \mu_\lambda}$ is positive, a negative value of $\rho$ implies the leverage effect or volatility asymmetry. This model reduces to the SV model with standardized Student's t error when $\beta = 0$, and it reduces to the SV model with a standard normal error when $\nu \to \infty$ irrespective of the value of $\beta$. We call the SV models with GH skew-t, Student's t, and normal distributions the SV-ST, SV-T, and SV-N models, respectively.
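The leverage measure in (4.6) can be evaluated in closed form. The sketch below relies on the standard inverse gamma moment $E[\lambda^m] = b^m\,\Gamma(a-m)/\Gamma(a)$ for $\lambda \sim IG(a, b)$, which is not stated in the text and is used here as an assumption-free property of that distribution; the function name is hypothetical.

```python
import math

def leverage_correlation(rho, nu, beta):
    """Corr[eps_t, eta_t] = rho * E[sqrt(lambda_t)] / sqrt(beta^2 var_lam + mu_lam)."""
    a = nu / 2.0  # lambda_t ~ IG(nu / 2, nu / 2)
    # E[sqrt(lambda)] = sqrt(nu / 2) * Gamma((nu - 1) / 2) / Gamma(nu / 2), via log-gamma
    e_sqrt_lam = math.exp(0.5 * math.log(a) + math.lgamma(a - 0.5) - math.lgamma(a))
    mu_lam = nu / (nu - 2.0)
    var_lam = 2.0 * nu**2 / ((nu - 2.0) ** 2 * (nu - 4.0))
    return rho * e_sqrt_lam / math.sqrt(beta**2 * var_lam + mu_lam)
```

By Jensen's inequality $E[\sqrt{\lambda_t}]^2 \le \mu_\lambda$, so the scaling factor never exceeds one and the implied correlation is always closer to zero than $\rho$ itself.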
Fig. 4.2 Density function of the standardized GH skew Student’s t distribution
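4.4 MCMC Estimation
The model parameter is $\theta = (\mu, \phi, \sigma_\eta, \rho, \nu, \beta)'$. For the statistical inference of the parameters, we again take the Bayesian approach and implement MCMC simulation. For $\theta$, we assume the prior distributions

$$\mu \sim N(\mu_0, \sigma_0^2), \quad (\phi+1)/2 \sim \text{Beta}(a, b), \quad \sigma_\eta^2 \sim IG\left(\frac{n_0}{2}, \frac{s_0}{2}\right),$$
$$(\rho+1)/2 \sim \text{Beta}(a_\rho, b_\rho), \quad \nu \sim G(n_\nu, s_\nu)\, I(\nu > \nu^*), \quad \beta \sim N(\beta_0, \sigma_\beta^2),$$

where $\nu^* = 4$ for the SV-T model and $\nu^* = 8$ for the SV-ST model. Then, the joint posterior density function is given by

$$\pi(h, \lambda, \theta \mid y) \propto f(y, h, \lambda \mid \theta)\,\pi(\theta)$$
$$\propto (1+\phi)^{a-1/2}(1-\phi)^{b-1/2}(1+\rho)^{a_\rho-(n+1)/2}(1-\rho)^{b_\rho-(n+1)/2}$$
$$\times (\sigma_\eta^2)^{-(n_1/2+1)}(\beta^2\sigma_\lambda^2 + \mu_\lambda)^{n/2}\exp\left\{-\frac{1}{2}\sum_{t=1}^{n}\left[\log(\lambda_t) + h_t + \bar{y}_t^2\right]\right\}$$
$$\times \exp\left\{-\frac{1}{2\sigma_\eta^2}\left[s_0 + (1-\phi^2)(h_1 - \mu)^2\right]\right\}$$
$$\times \exp\left\{-\frac{1}{2\sigma_\eta^2(1-\rho^2)}\sum_{t=1}^{n-1}\left[h_{t+1} - \mu(1-\phi) - \phi h_t - \rho\sigma_\eta\bar{y}_t\right]^2\right\}$$
$$\times \exp\left\{\frac{n\nu}{2}\log\frac{\nu}{2} - n\log\Gamma\left(\frac{\nu}{2}\right) - \frac{1}{2}\sum_{t=1}^{n}\left[(\nu+2)\log(\lambda_t) + \frac{\nu}{\lambda_t}\right]\right\}$$
$$\times \exp\left\{-\frac{(\mu-\mu_0)^2}{2\sigma_0^2} - \frac{(\beta-\beta_0)^2}{2\sigma_\beta^2} - s_\nu\nu\right\},$$

where $h = (h_1, \ldots, h_n)'$, $\lambda = (\lambda_1, \ldots, \lambda_n)'$, $y = (y_1, \ldots, y_n)'$, $n_1 = n_0 + n$, and

$$\bar{y}_t = \frac{\sqrt{\beta^2\sigma_\lambda^2 + \mu_\lambda}\; y_t\exp(-h_t/2) - \beta(\lambda_t - \mu_\lambda)}{\sqrt{\lambda_t}}. \qquad (4.7)$$

Following [20], we implement the MCMC algorithm below:
Step 1. Initialize $\theta = (\mu, \phi, \sigma_\eta, \rho, \nu, \beta)'$, $\lambda$, and $h$.
Step 2. Generate $\phi \sim \pi(\phi \mid \theta_{\backslash\phi}, \lambda, h, y)$.
Step 3. Generate $\mu \sim \pi(\mu \mid \theta_{\backslash\mu}, \lambda, h, y)$.
Step 4. Generate $(\sigma_\eta, \rho) \sim \pi(\sigma_\eta, \rho \mid \theta_{\backslash(\sigma_\eta,\rho)}, \lambda, h, y)$.
Step 5. Generate $\nu \sim \pi(\nu \mid \theta_{\backslash\nu}, \lambda, h, y)$.
Step 6. Generate $\beta \sim \pi(\beta \mid \theta_{\backslash\beta}, \lambda, h, y)$.
Step 7. Generate $\lambda \sim \pi(\lambda \mid \theta, h, y)$.
Step 8. Generate $h \sim \pi(h \mid \theta, \lambda, y)$.
Step 9. Go to Step 2.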
4.4.1 Generation of $(\mu, \phi, \sigma_\eta, \rho)$
The full conditional density of each parameter in $(\mu, \phi, \sigma_\eta^2, \rho)$ is similar to that of the ASV model in Sect. 3.2.1. Thus, we sample $\mu$ and $\phi$ via the same sampling procedure described in Sect. 3.2.1 by replacing $y_t\exp(-h_t/2)$ with $\bar{y}_t$ defined in (4.7). For $\sigma_\eta$ and $\rho$, we follow [20] and sample them jointly via the MH algorithm.

Step 2. Generation of $\phi$. We apply the MH algorithm to generate $\phi$. The full conditional posterior density of $\phi$ is

$$\pi(\phi \mid \cdot) \propto (1+\phi)^{a-1/2}(1-\phi)^{b-1/2}\exp\left\{-\frac{(\phi - \mu_\phi)^2}{2\sigma_\phi^2}\right\},$$

where

$$\mu_\phi = \frac{\sum_{t=1}^{n-1}(h_{t+1} - \mu - \rho\sigma_\eta\bar{y}_t)(h_t - \mu)}{\rho^2(h_1 - \mu)^2 + \sum_{t=2}^{n-1}(h_t - \mu)^2}, \quad \sigma_\phi^2 = \frac{\sigma_\eta^2(1-\rho^2)}{\rho^2(h_1 - \mu)^2 + \sum_{t=2}^{n-1}(h_t - \mu)^2}.$$

Given the current sample $\phi_x$, we generate a candidate $\phi_y \sim TN_{(-1,1)}(\mu_\phi, \sigma_\phi^2)$ and accept it with probability

$$\min\left\{\frac{(1+\phi_y)^{a-1/2}(1-\phi_y)^{b-1/2}}{(1+\phi_x)^{a-1/2}(1-\phi_x)^{b-1/2}},\ 1\right\}.$$

Step 3. Generation of $\mu$. The full conditional posterior distribution of $\mu$ is a normal distribution given by $\mu \mid \cdot \sim N(\mu_1, \sigma_1^2)$, where

$$\mu_1 = \sigma_1^2\left\{\frac{\mu_0}{\sigma_0^2} + \frac{(1-\phi^2)h_1}{\sigma_\eta^2} + \frac{1-\phi}{\sigma_\eta^2(1-\rho^2)}\sum_{t=1}^{n-1}(h_{t+1} - \phi h_t - \rho\sigma_\eta\bar{y}_t)\right\},$$
$$\sigma_1^2 = \left\{\frac{1}{\sigma_0^2} + \frac{1-\phi^2}{\sigma_\eta^2} + \frac{(n-1)(1-\phi)^2}{\sigma_\eta^2(1-\rho^2)}\right\}^{-1}.$$
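Step 4. Generation of $(\sigma_\eta, \rho)$. The full conditional posterior distribution of $(\sigma_\eta, \rho)$ is

$$\pi(\sigma_\eta, \rho \mid \cdot) \propto (1+\rho)^{a_\rho-(n+1)/2}(1-\rho)^{b_\rho-(n+1)/2}(\sigma_\eta^2)^{-(n_1/2+1)}$$
$$\times \exp\left\{-\frac{1}{2\sigma_\eta^2}\left[s_0 + (1-\phi^2)(h_1 - \mu)^2\right]\right\}$$
$$\times \exp\left\{-\frac{1}{2\sigma_\eta^2(1-\rho^2)}\sum_{t=1}^{n-1}\left[h_{t+1} - \mu(1-\phi) - \phi h_t - \rho\sigma_\eta\bar{y}_t\right]^2\right\}.$$

Consider the transformation $\omega = (\omega_1, \omega_2)'$, where $\omega_1 = \log\sigma_\eta$ and $\omega_2 = \log(1+\rho) - \log(1-\rho)$, with the density $f(\omega) = \pi(\sigma_\eta(\omega_1), \rho(\omega_2) \mid \cdot)\,|J(\sigma_\eta(\omega_1), \rho(\omega_2))|$, where

$$\sigma_\eta(\omega_1) = \exp(\omega_1), \quad \rho(\omega_2) = \frac{\exp(\omega_2) - 1}{\exp(\omega_2) + 1},$$

and $|J(\cdot)|$ is the Jacobian for the transformation, that is,

$$|J(\sigma_\eta(\omega_1), \rho(\omega_2))| = \left|\frac{\partial(\sigma_\eta, \rho)}{\partial\omega'}\right| = \frac{\sigma_\eta(1-\rho^2)}{2}.$$

Let $\hat{\omega}$ denote the mode of $f(\omega)$. Given the current value $\omega = (\log\sigma_\eta, \log(1+\rho) - \log(1-\rho))'$, we propose a candidate $\omega^\dagger = (\log\sigma_\eta^\dagger, \log(1+\rho^\dagger) - \log(1-\rho^\dagger))' \sim N(\omega_*, \Sigma_*)$, where

$$\omega_* = \hat{\omega} + \Sigma_*\left.\frac{\partial\log f(\omega)}{\partial\omega}\right|_{\omega=\hat{\omega}}, \quad \Sigma_*^{-1} = -\left.\frac{\partial^2\log f(\omega)}{\partial\omega\,\partial\omega'}\right|_{\omega=\hat{\omega}}.$$

Then, we accept the candidate $\omega^\dagger$ with probability

$$\min\left\{1,\ \frac{f(\omega^\dagger)\, f_N(\omega \mid \omega_*, \Sigma_*)}{f(\omega)\, f_N(\omega^\dagger \mid \omega_*, \Sigma_*)}\right\}.$$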
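4.4.2 Generation of $(\nu, \beta)$
We implement MH sampling for $\nu$ and $\beta$ separately.

Step 5. Generation of $\nu$. The full conditional density of $\nu$ is

$$\pi(\nu \mid \theta_{\backslash\nu}, \lambda, h, y) \propto (\beta^2\sigma_\lambda^2 + \mu_\lambda)^{n/2}\exp\left\{-\frac{1}{2}\sum_{t=1}^{n}\bar{y}_t^2 - \frac{\nu}{2}\sum_{t=1}^{n}\left[\log(\lambda_t) + \frac{1}{\lambda_t}\right]\right\}$$
$$\times \exp\left\{-\frac{1}{2\sigma_\eta^2(1-\rho^2)}\sum_{t=1}^{n-1}\left[h_{t+1} - \mu(1-\phi) - \phi h_t - \rho\sigma_\eta\bar{y}_t\right]^2\right\}$$
$$\times \exp\left\{\frac{n\nu}{2}\log\frac{\nu}{2} - n\log\Gamma\left(\frac{\nu}{2}\right) - s_\nu\nu\right\}.$$

Since it is not easy to sample from this density, we follow [20] and apply the MH algorithm based on a normal approximation of the density around the mode. Consider the transformation $v = \log\nu$ with the density $f(v) = \pi(\nu(v) \mid \cdot)\,\nu(v)$, $\nu(v) = \exp(v)$. Let $\hat{v}$ denote the mode of $f(v)$. Given the current value $v = \log\nu$, we propose a candidate $v^\dagger = \log\nu^\dagger \sim N(\hat{v}, \sigma^2(\hat{v}))\, I(\nu^\dagger > \nu^*)$, where $\sigma^2(v) = -\{\partial^2\log f(v)/\partial v^2\}^{-1}$, and $\nu^* = 4$ for the SV-T model and $\nu^* = 8$ for the SV-ST model. Then, we accept $v^\dagger$ with probability

$$\min\left\{1,\ \frac{f(v^\dagger)\, f_N(v \mid \hat{v}, \sigma^2(\hat{v}))}{f(v)\, f_N(v^\dagger \mid \hat{v}, \sigma^2(\hat{v}))}\right\}.$$

Alternatively, we can adopt the efficient AR-MH sampling for $\nu$ of the SV-T model proposed in [28].

Step 6. Generation of $\beta$. The full conditional density of $\beta$ is

$$\pi(\beta \mid \theta_{\backslash\beta}, \lambda, h, y) \propto (\beta^2\sigma_\lambda^2 + \mu_\lambda)^{n/2}\exp\left\{-\frac{1}{2}\sum_{t=1}^{n}\bar{y}_t^2 - \frac{(\beta - \beta_0)^2}{2\sigma_\beta^2}\right\}$$
$$\times \exp\left\{-\frac{1}{2\sigma_\eta^2(1-\rho^2)}\sum_{t=1}^{n-1}\left[h_{t+1} - \mu(1-\phi) - \phi h_t - \rho\sigma_\eta\bar{y}_t\right]^2\right\}.$$

Again, it is not easy to sample from this density, and we apply the MH algorithm based on a normal approximation of the density around the mode. Let $\hat{\beta}$ denote the mode of $\pi(\beta \mid \cdot)$. Given the current value $\beta$, we propose a candidate $\beta^\dagger \sim N(\hat{\beta}, \sigma^2(\hat{\beta}))$, where $\sigma^2(\beta) = -\{\partial^2\log\pi(\beta \mid \cdot)/\partial\beta^2\}^{-1}$, and accept $\beta^\dagger$ with probability

$$\min\left\{1,\ \frac{\pi(\beta^\dagger \mid \cdot)\, f_N(\beta \mid \hat{\beta}, \sigma^2(\hat{\beta}))}{\pi(\beta \mid \cdot)\, f_N(\beta^\dagger \mid \hat{\beta}, \sigma^2(\hat{\beta}))}\right\}.$$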
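The mode-based normal approximation used for $\nu$ and $\beta$ follows a generic pattern: locate the mode of a log density, take the negative inverse curvature as the proposal variance, and run an independence MH step. The sketch below illustrates this for a scalar parameter with numerical derivatives; it is an assumption-laden illustration (finite differences, Newton iterations, hypothetical function names) and omits the truncation $I(\nu^\dagger > \nu^*)$.

```python
import math
import numpy as np

def normal_approximation(log_f, x0, h=1e-4, max_iter=100, tol=1e-8):
    """Return (mode, variance) of a normal approximation to exp(log_f)."""
    x = float(x0)
    for _ in range(max_iter):
        grad = (log_f(x + h) - log_f(x - h)) / (2.0 * h)
        hess = (log_f(x + h) - 2.0 * log_f(x) + log_f(x - h)) / h**2
        if hess >= 0:
            raise ValueError("log density is not locally concave")
        step = -grad / hess  # Newton step towards the mode
        x += step
        if abs(step) < tol:
            break
    hess = (log_f(x + h) - 2.0 * log_f(x) + log_f(x - h)) / h**2
    return x, -1.0 / hess

def independence_mh_step(log_f, x_cur, mode, var, rng):
    """One MH step with the fixed proposal N(mode, var)."""
    x_new = rng.normal(mode, math.sqrt(var))

    def log_q(x):
        return -0.5 * (x - mode) ** 2 / var

    log_ratio = (log_f(x_new) - log_q(x_new)) - (log_f(x_cur) - log_q(x_cur))
    if rng.uniform() < math.exp(min(0.0, log_ratio)):
        return x_new
    return x_cur
```

When the target is close to normal, the acceptance rate of such a step is close to one, which is in line with the high acceptance rates for $\nu$ and $\beta$ reported in the empirical section.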
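4.4.3 Generation of $\lambda$ and $h$
Step 7. Generation of $\lambda$. The full conditional density of $\lambda_t$ is

$$\pi(\lambda_t \mid \theta, h, y) \propto g(\lambda_t) \times \lambda_t^{-\{(\nu+1)/2+1\}}\exp\left\{-\frac{\nu}{2\lambda_t}\right\},$$

where

$$g(\lambda_t) = \exp\left\{-\frac{\bar{y}_t^2}{2} - \frac{\left[h_{t+1} - \mu(1-\phi) - \phi h_t - \rho\sigma_\eta\bar{y}_t\right]^2}{2(1-\rho^2)\sigma_\eta^2}\, I(t < n)\right\}.$$

Following [20], we use the MH algorithm. Specifically, we generate a candidate $\lambda_t^\dagger \sim IG((\nu+1)/2, \nu/2)$ and accept it with probability $\min\{g(\lambda_t^\dagger)/g(\lambda_t), 1\}$.

Step 8. Generation of $h$. First, we rewrite the SV-ST model (4.4)–(4.5) as

$$y_t = \left(\beta\bar{\lambda}_t + \sqrt{\lambda_t}\, z_t\right)\exp(\alpha_t/2)\,\gamma, \quad t = 1, \ldots, n,$$
$$\alpha_{t+1} = \phi\alpha_t + \eta_t, \quad t = 0, \ldots, n-1,$$

where

$$\alpha_t = h_t - \mu, \quad \bar{\lambda}_t = \lambda_t - \mu_\lambda, \quad \gamma = \frac{\exp(\mu/2)}{\sqrt{\beta^2\sigma_\lambda^2 + \mu_\lambda}}.$$

The log-likelihood of $y_t$ given $\alpha_t$, $\alpha_{t+1}$ and other parameters excluding the constant term is

$$l_t = -\frac{\alpha_t}{2} - \frac{(y_t - \mu_t)^2}{2\sigma_t^2},$$

where

$$\mu_t = \begin{cases} \left[\beta\bar{\lambda}_t + \rho\sigma_\eta^{-1}\sqrt{\lambda_t}\,(\alpha_{t+1} - \phi\alpha_t)\right]\exp(\alpha_t/2)\,\gamma, & t = 1, \ldots, n-1, \\ \beta\bar{\lambda}_t\exp(\alpha_t/2)\,\gamma, & t = n, \end{cases}$$
$$\sigma_t^2 = \begin{cases} (1-\rho^2)\lambda_t\exp(\alpha_t)\,\gamma^2, & t = 1, \ldots, n-1, \\ \lambda_n\exp(\alpha_n)\,\gamma^2, & t = n. \end{cases}$$

Using this log-likelihood, we can sample the latent variable $(\alpha_1, \ldots, \alpha_n)$ efficiently via the multi-move sampler introduced in Sect. 3.4.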
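The update of a single $\lambda_t$ in Step 7 can be sketched as below. The code assumes the definition of $\bar{y}_t$ in (4.7) and draws the inverse gamma candidate as the reciprocal of a gamma variate; the function name and argument layout are hypothetical, and `h_next=None` is used here to signal $t = n$.

```python
import math
import numpy as np

def update_lambda_t(lam_x, y_t, h_t, h_next, mu, phi, sigma_eta, rho, nu, beta, rng):
    """One MH update of lambda_t; pass h_next=None for the last observation."""
    mu_lam = nu / (nu - 2.0)
    var_lam = 2.0 * nu**2 / ((nu - 2.0) ** 2 * (nu - 4.0))
    scale = math.sqrt(beta**2 * var_lam + mu_lam)

    def log_g(lam):
        # y_bar_t from (4.7), viewed as a function of lambda_t
        y_bar = (scale * y_t * math.exp(-h_t / 2.0) - beta * (lam - mu_lam)) / math.sqrt(lam)
        out = -0.5 * y_bar**2
        if h_next is not None:
            resid = h_next - mu * (1.0 - phi) - phi * h_t - rho * sigma_eta * y_bar
            out -= resid**2 / (2.0 * (1.0 - rho**2) * sigma_eta**2)
        return out

    # lambda ~ IG((nu + 1) / 2, nu / 2)  <=>  1 / lambda ~ Gamma(shape, rate = nu / 2)
    lam_y = 1.0 / rng.gamma(shape=(nu + 1.0) / 2.0, scale=2.0 / nu)
    log_ratio = log_g(lam_y) - log_g(lam_x)
    if rng.uniform() < math.exp(min(0.0, log_ratio)):
        return lam_y
    return lam_x
```

Because the proposal equals the $\lambda_t^{-\{(\nu+1)/2+1\}}\exp\{-\nu/(2\lambda_t)\}$ factor of the target, only the ratio of $g$ values remains in the acceptance probability.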
4.5 News Impact Curve: Simulation-Based Method
In this section, we explain the news impact curve (NIC), which measures how new information affects the return volatility.¹ Reference [14] defines the NIC in the context of GARCH models, where $\sigma_t^2 = \exp(h_t)$ is known at $t-1$ and a change of $y_t = \sigma_t\epsilon_t$ is solely due to a change of the disturbance $\epsilon_t$. This feature of GARCH models justifies the news impact function with lagged conditional variances fixed at $\sigma^2$ and implies the well-known U-shaped NIC.² Under SV models, however, $\sigma_t^2$ is a latent variable, and hence, it is unknown at $t-1$. For example, consider the ASV model:

$$y_t = \exp(h_t/2)\,\epsilon_t, \quad \epsilon_t \sim N(0, 1),$$
$$h_{t+1} = \mu + \phi(h_t - \mu) + \eta_t, \quad \eta_t \sim N(0, \sigma_\eta^2),$$

where $|\phi| < 1$ for the stationary process and $\rho = \text{Corr}[\epsilon_t, \eta_t]$ captures the volatility asymmetry. Following [29], we first define the news impact function for SV models as a relation between $y_t$ and $h_{t+1}$. In contrast to GARCH models, a change of $y_t$ is due to a change of either $\epsilon_t$ or $h_t$, or both. This implies a stochastic relation between $y_t$ and $h_{t+1}$ instead of the deterministic relation in GARCH models. This relation can be expressed as a conditional expectation of $h_{t+1}$,

$$E[h_{t+1} \mid y_t] = \mu + \phi(E[h_t \mid y_t] - \mu) + E[\eta_t \mid y_t] = \mu + \phi(E[h_t \mid y_t] - \mu) + \rho\sigma_\eta y_t E[\exp(-h_t/2) \mid y_t]. \qquad (4.8)$$

Replacing the conditional expectations with unconditional expectations yields the following news impact function,³

$$E[h_{t+1} \mid y_t] \approx \mu + \rho\sigma_\eta\exp\left\{-\frac{\mu}{2} + \frac{\sigma_\eta^2}{8(1-\phi^2)}\right\} y_t.$$

If $\rho = 0$, this is a flat line. If $\rho < 0$, this is a downward sloping line.

¹ This section is partially based on Chap. 2 in [24].
² For example, the GARCH(1,1) model specifies the volatility as $\sigma_{t+1}^2 = \omega + \beta\sigma_t^2 + \alpha y_t^2$, where it is assumed that $\omega > 0$, $\beta \ge 0$, and $\alpha \ge 0$ to assure that the volatility $\sigma_t^2$ is always positive and $|\alpha + \beta| < 1$ to guarantee that the volatility is stationary. The news impact function is then $\sigma_{t+1}^2 = \omega + \beta\sigma^2 + \alpha y_t^2$.
³ See, e.g., Yu [29] and Asai and McAleer [3] for other approximation methods.
Such a monotonic news impact line is due to ignoring the dependence between $y_t$ and $h_t$ and replacing the conditional expectations with the unconditional ones. If $E[h_t \mid y_t]$ is increasing in the absolute return $|y_t|$, the conditional expectation in (4.8) implies a non-monotonic NIC. The joint movement of $h_t$ and $y_t$ (or $\epsilon_t$) is incorporated by generating $h_t$, $y_t$, and $h_{t+1}$ from the joint distribution via the Bayesian MCMC scheme [25]. Specifically, the following procedure gives $M$ sets of $(\epsilon_t, h_t, y_t, h_{t+1})$.
(i) Set parameters $(\phi, \mu, \rho, \sigma_\eta)$.
a. Generate $h$ from its stationary distribution, $h \sim N(\mu, \sigma_\eta^2/(1-\phi^2))$, and $\epsilon$ from the standard normal distribution, $\epsilon \sim N(0, 1)$.
b. Compute the daily return, $y = \exp(h/2)\,\epsilon$, and the one-day-ahead volatility forecast, $\hat{h} = \mu + \phi(h - \mu) + \rho\sigma_\eta\epsilon$.
(ii) Repeat Step (i) $M$ times.

There are several ways to implement Steps (i) and (ii). We can repeat Step (i) by simply fixing the parameters at the estimated values such as the posterior means of the parameter samples generated using the MCMC method. We can also implement Step (i) with different parameter values each time. For example, we can generate $y$ and $\hat{h}$ for each parameter sample generated in the MCMC estimation of the SV models. This method requires the MCMC estimation scheme but enables us to take the parameter uncertainty into account.

Given the $M$ sets of $(\epsilon, h, \hat{h})$, we can estimate $\hat{h}$ with respect to an arbitrary pair $(\bar{\epsilon}, \bar{h})$ using the following Nadaraya–Watson kernel regression,

$$\hat{h}(\bar{\epsilon}, \bar{h}) = \frac{\sum_{i=1}^{M} k\left(\dfrac{\epsilon_i - \bar{\epsilon}}{B_\epsilon}\right) k\left(\dfrac{h_i - \bar{h}}{B_h}\right)\hat{h}_i}{\sum_{i=1}^{M} k\left(\dfrac{\epsilon_i - \bar{\epsilon}}{B_\epsilon}\right) k\left(\dfrac{h_i - \bar{h}}{B_h}\right)}, \qquad (4.9)$$

where $k$ is the Gaussian kernel and $B_\epsilon$ and $B_h$ are the bandwidths. On the other hand, the conventional method estimates the one-day-ahead volatility as

$$\tilde{h}(\bar{\epsilon}, \bar{h}) = \mu + \phi(\bar{h} - \mu) + \rho\sigma_\eta\bar{\epsilon}.$$
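The simulation procedure and the kernel regression (4.9) can be sketched together as follows. This is a minimal NumPy illustration with fixed parameter values (the first of the implementation variants described above); the function names are hypothetical, and the parameter values in the usage lines are those quoted for the simulation example in Sect. 4.5.1.

```python
import numpy as np

def simulate_nic_sample(M, mu, phi, sigma_eta, rho, seed=0):
    """Generate M sets of (eps, h, y, h_hat) with fixed parameters."""
    rng = np.random.default_rng(seed)
    h = rng.normal(mu, sigma_eta / np.sqrt(1.0 - phi**2), size=M)
    eps = rng.standard_normal(M)
    y = np.exp(h / 2.0) * eps
    h_hat = mu + phi * (h - mu) + rho * sigma_eta * eps
    return eps, h, y, h_hat

def nadaraya_watson(eps_bar, h_bar, eps, h, h_hat, b_eps, b_h):
    """Kernel regression estimate of h_hat at (eps_bar, h_bar) with Gaussian kernels."""
    # Product of two Gaussian kernels; normalizing constants cancel in the ratio
    w = np.exp(-0.5 * ((eps - eps_bar) / b_eps) ** 2 - 0.5 * ((h - h_bar) / b_h) ** 2)
    return np.sum(w * h_hat) / np.sum(w)

def conventional_nic(eps_bar, h_bar, mu, phi, sigma_eta, rho):
    """Conventional one-day-ahead volatility at (eps_bar, h_bar)."""
    return mu + phi * (h_bar - mu) + rho * sigma_eta * eps_bar

eps, h, y, h_hat = simulate_nic_sample(20000, mu=0.0, phi=0.97, sigma_eta=0.2, rho=-0.45)
grid = [nadaraya_watson(e, 0.0, eps, h, h_hat, 0.7, 0.5) for e in np.linspace(-2, 2, 101)]
```

Evaluating `nadaraya_watson` over a grid of $\bar{\epsilon}$ for several fixed $\bar{h}$ gives curves of the kind compared with the conventional line in the simulation example.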
Contrary to the conventional method, which results in a monotonic news impact line, the simulation method can produce a non-monotonic NIC as in GARCH models. It is also applicable to various SV extensions such as the SV-T, SV-ST, and realized SV models introduced in Chap. 5.⁴

4.5.1 Simulation Example
In this subsection, we illustrate the simulation method via 20,000 simulated sets of $(\epsilon, h, r, \hat{h})$ with parameter values $\phi = 0.97$, $\mu = 0$, $\sigma_\eta = 0.2$, and $\rho = -0.45$, where $r$ denotes the daily return (written $y$ above). Given the 20,000 sets of $(\epsilon, h, r, \hat{h})$, we implement the Nadaraya–Watson kernel regression (4.9) with 101 grid points of $\bar{\epsilon} \in \{-2 + 0.04i\}_{i=0}^{100}$ and $\bar{h} \in \{-1 + 0.02i\}_{i=0}^{100}$ and bandwidths $B_\epsilon = 0.7$ and $B_h = 0.5$.

Figure 4.3 shows the estimated impact of today's news on tomorrow's volatility with today's volatility fixed at various levels. When today's volatility is fixed at a low level, the NIC according to the simulation-based method, $\hat{h}(\bar{\epsilon}, \bar{h})$, is a concave-up-decreasing function, and it lies above the conventional method $\tilde{h}(\bar{\epsilon}, \bar{h})$. When today's volatility is fixed around the mean, $\mu = 0$, $\hat{h}(\bar{\epsilon}, \bar{h})$ becomes an approximately linear decreasing function and crosses $\tilde{h}(\bar{\epsilon}, \bar{h})$ at some point. On the contrary, when today's volatility is fixed at a high level, $\hat{h}(\bar{\epsilon}, \bar{h})$ is a concave-down-decreasing function, and it lies below $\tilde{h}(\bar{\epsilon}, \bar{h})$. The news impact strongly depends on today's volatility level, and its shape can differ with different states of today's volatility. This result further supports the importance of considering the joint action of today's news shock and today's volatility when estimating the NIC.

Figure 4.4 shows the NIC against today's return $r_t$ with today's volatility fixed at various levels. Contrary to the monotonic decreasing function in the conventional method, the simulation-based method provides a U-shaped curve comparable to the NIC for GARCH-type models. Moreover, the difference between $\hat{h}(\bar{\epsilon}, \bar{h})$ and $\tilde{h}(\bar{\epsilon}, \bar{h})$ becomes larger as today's volatility level moves away from its mean $\mu = 0$.
Fig. 4.3 Estimated impact of today's news shock $\epsilon_t$ on tomorrow's volatility $h_{t+1}$ with today's volatility $h_t$ fixed at various levels using the simulation-based method (solid line) and the conventional method (dotted line). Panels: $h_t = -0.80, -0.60, -0.40, -0.20, 0.00, 0.20, 0.40, 0.60, 0.80$

Fig. 4.4 NIC against today's return $r_t$ with today's volatility $h_t$ fixed at various levels using the simulation-based method (solid line) and the conventional method (dotted line). Panels: $h_t = -0.80, -0.60, -0.40, -0.20, 0.00, 0.20, 0.40, 0.60, 0.80$

⁴ Alternatively, Catania and Proietti [9] have proposed a framework to derive the NIC that takes into account the three different sources of variation in the conditional volatility process: (i) the amount of leverage; (ii) the fatness of the conditional return distribution's tail; and (iii) the volatility of the volatility component [7, 8]. Additionally, Wang and Mykland [27] have provided nonparametric estimation for a class of stochastic measures of the leverage effect with high-frequency data.

4.6 Empirical Study
In this section, we apply the SV models to daily (close-to-close) returns of two stock indices, the Dow Jones Industrial Average (DJIA) and the Nikkei 225 (N225). The DJIA data are obtained from the Oxford-Man Institute's Realized Library,⁵ and the N225 data are constructed from the Nikkei NEEDS-TICK dataset.⁶ The time period covered is June 2009 to September 2019 for both indices. Specifically, the DJIA sample contains 2596 trading days from June 1, 2009 to September 27, 2019, and the N225 sample contains 2532 trading days from June 1, 2009 to September 30, 2019.

Figure 4.5 presents the time series plots and histograms of the daily returns for DJIA and N225. We observe that daily returns vary substantially around 0 for both series. Additionally, the returns take larger values in the negative direction and appear to be negatively skewed. Table 4.1 presents the descriptive statistics of the daily returns. The means of the daily returns are statistically significantly different from 0 for DJIA but not for N225. Since the magnitude of the mean is small, we do not adjust the mean of the daily returns for both series in the subsequent analyses. The p-value of the LB [19] statistic, adjusted for heteroskedasticity following [13] to test the null hypothesis of no autocorrelation up to 10 lags, indicates that the null hypothesis is not rejected. This allows us to estimate the models using the daily returns without any adjustment for the autocorrelation. The kurtosis of daily returns shows a leptokurtic distribution, as is commonly observed in financial returns, and the skewness of daily returns is significantly negative for both series. Consequently, the Jarque–Bera (JB) statistic rejects normality. The excess kurtosis and negative skewness are expected to be captured by the GH skew Student's t distribution.

Fig. 4.5 Time series plots (top) and histograms (bottom) of the daily returns in percentage points for DJIA (left) and for N225 (right)

⁵ The website https://realized.oxford-man.ox.ac.uk/ is closed as of December 2022.
⁶ See Ubukata and Watanabe [26] for the construction of the N225 data.
Table 4.1 Descriptive statistics of the daily returns in percentage points for DJIA and N225

| | Mean | SD | Skew | Kurt | Min | Max | JB | LB |
|---|---|---|---|---|---|---|---|---|
| DJIA | 0.044 (0.018) | 0.895 | −0.448 (0.048) | 6.675 (0.096) | −5.562 | 4.857 | 0.00 | 0.52 |
| N225 | 0.033 (0.026) | 1.321 | −0.544 (0.049) | 8.101 (0.097) | −11.153 | 7.426 | 0.00 | 0.55 |

Standard errors in parentheses
JB: p-value of the Jarque–Bera statistic for testing the null hypothesis of normality
LB: p-value of the LB [19] statistic adjusted for heteroskedasticity following [13] to test the null hypothesis of no autocorrelation up to 10 lags
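Statistics of the kind reported in Table 4.1 can be computed along the following lines. This is a sketch under stated assumptions rather than the authors' procedure: it uses the asymptotic standard errors $\sqrt{6/n}$ and $\sqrt{24/n}$ for skewness and kurtosis under normality, which appear consistent with the parenthesized values in the table, and the Jarque–Bera p-value from the $\chi^2(2)$ survival function $\exp(-\text{JB}/2)$. The heteroskedasticity-adjusted LB statistic is not covered.

```python
import math
import numpy as np

def describe_returns(r):
    """Mean, SD, skewness, kurtosis, and Jarque-Bera p-value of a return series."""
    r = np.asarray(r, dtype=float)
    n = r.size
    mean = r.mean()
    sd = r.std(ddof=1)
    c = r - mean
    m2 = np.mean(c**2)
    skew = np.mean(c**3) / m2**1.5
    kurt = np.mean(c**4) / m2**2
    jb = n / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)
    return {
        "mean": mean, "se_mean": sd / math.sqrt(n), "sd": sd,
        "skew": skew, "se_skew": math.sqrt(6.0 / n),
        "kurt": kurt, "se_kurt": math.sqrt(24.0 / n),
        "min": r.min(), "max": r.max(),
        "jb_pvalue": math.exp(-jb / 2.0),  # chi-square(2) survival function
    }
```

For example, with $n = 2596$ the standard errors are $\sqrt{6/2596} \approx 0.048$ and $\sqrt{24/2596} \approx 0.096$.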
We estimate the SV-N, -T, and -ST models with the following priors:

$$\mu \sim N(0, 100), \quad (\phi+1)/2 \sim \text{Beta}(20, 1.5), \quad (\rho+1)/2 \sim \text{Beta}(1, 2),$$
$$\sigma_\eta^2 \sim IG(2.5, 0.025), \quad \nu \sim G(5, 0.5), \quad \beta \sim N(0, 1),$$

where we use informative priors reflecting past empirical studies. As in Chap. 3, the multi-move sampler is used to generate the $h_t$'s from their conditional distribution with the number of blocks $K + 1 = 90$. The MCMC algorithm is iterated 50,000 times after discarding 5000 samples as a burn-in period.⁷ For all models and data, the acceptance rates of $\phi$ and $(\sigma_\eta, \rho)$ are both higher than 97%, and those in the multi-move sampler for $h$ are higher than 80%. The acceptance rates of $\nu$ and $\lambda$ in the SV-T model are 98.7% and 86.4% for DJIA, respectively, and 98.7% and 91.3% for N225, respectively. Further, those of $\nu$, $\lambda$, and $\beta$ in the SV-ST model are 98.4%, 73.1%, and 99.9% for DJIA, respectively, and 98.7%, 82.2%, and 99.8% for N225, respectively.

Figure 4.6 shows the traceplots, sample autocorrelation functions, and density estimates of the MCMC samples for the SV-ST model applied to DJIA. The chains mix well, and the sample autocorrelation functions decay and vanish after about 300 lags for $\mu$, $\phi$, $\sigma_\eta$, and $\rho$. For $\nu$ and $\beta$, the chains also mix adequately, but the sample autocorrelation functions decay more slowly and vanish only after about 600 lags. Nevertheless, the sample paths appear to be stable for all parameters and the sampling algorithm works fine.

Table 4.2 summarizes the estimation results for DJIA. The posterior means of $\phi$ indicate high persistence of volatility. Additionally, the posterior means of $\rho$ are negative, and the 95% credible intervals do not contain 0, confirming the leverage effect or volatility asymmetry. The posterior means of $\nu$ are around 15, indicating the error distribution's heavy tail. The posterior mean of $\beta$ is also negative, and its 95% credible interval does not contain 0. This provides strong evidence of skewness in the data.
These results are consistent with previous studies. Using the MCMC samples, we compute the leverage effect defined in (4.6) for the SV-T and SV-ST models. Recall that the leverage effect for the SV-N model is simply equal to $\rho$. Although the values of $\rho$ are different among the three models, we observe that the leverage effects are largely the same. We also compute the skewness and kurtosis, defined in (4.2) and (4.3), respectively. We observe significant excess kurtosis for both the SV-T and SV-ST models and significant negative skewness for the SV-ST model.

Figure 4.7 shows the standardized and unstandardized density functions of $\epsilon_t$ for the SV-N, -T, and -ST models, calculated from the posterior means for DJIA. Reflecting the significant excess kurtosis and negative skewness for the SV-ST model, its returns are more likely to take large negative values. We also observe that the densities of the SV-T and -ST models are higher than that of the SV-N model around 0 because of the standardization.

Figure 4.8 shows the NIC, a plot of tomorrow's volatility, $h_{t+1}$, against today's return $r_t$, estimated via the simulation method in Sect. 4.5 using the posterior means of the SV-N, -T, and -ST models for DJIA. Today's volatility, $h_t$, is fixed at three levels, $\mu - 2\sigma_\eta$, $\mu$, and $\mu + 2\sigma_\eta$, whose values are calculated from the posterior means for the SV-ST model. We use 101 grid points and a bandwidth of 1.0 for all cases. The three models' NICs are largely the same for the three levels of $h_t$, except for the SV-T model at higher returns.

⁷ Again, the computational results are generated using Ox (version 8.20).

Fig. 4.6 Traceplots (left), sample autocorrelation functions (center), and density estimates (right) of the MCMC samples for the SV-ST model applied to DJIA. Panels (top to bottom): $\mu$, $\phi$, $\sigma_\eta$, $\rho$, $\nu$, $\beta$
Table 4.2 MCMC estimation results of the SV-N, -T, and -ST models for DJIA

| Parameter | Quantity | SV-N | SV-T | SV-ST |
|---|---|---|---|---|
| μ | Mean [SD] | −0.434 [0.090] | −0.351 [0.092] | −0.434 [0.085] |
| | 95% CI | (−0.605, −0.250) | (−0.526, −0.162) | (−0.595, −0.264) |
| | IF | 10 | 17 | 17 |
| φ | Mean [SD] | 0.936 [0.009] | 0.945 [0.008] | 0.946 [0.007] |
| | 95% CI | (0.918, 0.952) | (0.928, 0.959) | (0.931, 0.959) |
| | IF | 91 | 77 | 103 |
| ση | Mean [SD] | 0.346 [0.028] | 0.315 [0.025] | 0.332 [0.024] |
| | 95% CI | (0.295, 0.408) | (0.269, 0.367) | (0.286, 0.380) |
| | IF | 200 | 143 | 181 |
| ρ | Mean [SD] | −0.697 [0.041] | −0.754 [0.037] | −0.836 [0.034] |
| | 95% CI | (−0.769, −0.610) | (−0.821, −0.678) | (−0.900, −0.766) |
| | IF | 116 | 97 | 170 |
| ν | Mean [SD] | | 13.445 [3.017] | 16.915 [3.619] |
| | 95% CI | | (8.843, 20.344) | (11.211, 25.431) |
| | IF | | 177 | 435 |
| β | Mean [SD] | | | −0.987 [0.263] |
| | 95% CI | | | (−1.588, −0.550) |
| | IF | | | 322 |
| Leverage | Mean [SD] | −0.697 [0.041] | −0.737 [0.035] | −0.757 [0.028] |
| | 95% CI | (−0.769, −0.610) | (−0.801, −0.663) | (−0.805, −0.697) |
| Kurtosis | Mean [SD] | | 3.702 [0.229] | 4.184 [0.556] |
| | 95% CI | | (3.367, 4.239) | (3.583, 5.268) |
| Skewness | Mean [SD] | | | −0.502 [0.098] |
| | 95% CI | | | (−0.718, −0.331) |
| Log-ML | Estimate (SE) | −2922.81 (0.23) | −2917.39 (0.28) | −2895.03 (1.80) |

Mean [SD]: posterior mean with standard deviation in brackets
95% CI: 95% credible interval
IF: inefficiency factor
Log-ML: log-marginal likelihood with standard error in parentheses
Fig. 4.7 Standardized (left) and unstandardized (right) return densities of the SV-N, -T, and -ST models for DJIA
Fig. 4.8 NIC for DJIA with today's volatility $h_t$ fixed at $\mu - 2\sigma_\eta$ (left), $\mu$ (center), and $\mu + 2\sigma_\eta$ (right), calculated from the posterior means of the SV-ST model

Fig. 4.9 Traceplots (left), sample autocorrelation functions (center), and density estimates (right) of the MCMC samples for the SV-ST model applied to N225. Panels (top to bottom): $\mu$, $\phi$, $\sigma_\eta$, $\rho$, $\nu$, $\beta$
To compare the fit of the models, we also compute the SV models' log-marginal likelihood (log-ML). The posterior density of the parameters is evaluated using the method given by Chib [10] and Chib and Jeliazkov [11] through 5000 additional reduced MCMC runs. The likelihood component is estimated using the auxiliary particle filter given by Pitt and Shephard [22] with 5000 particles, which is replicated 10 times to obtain its standard error. The log-ML for the SV-ST model is significantly higher than those of the SV-N and SV-T models, while that for the SV-T model is significantly higher than that of the SV-N model. This shows that incorporating both skewness and a heavy tail improves the goodness of fit.
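The model comparison can be made explicit through log Bayes factors, which are differences of log-marginal likelihoods. The short sketch below uses the DJIA log-ML estimates reported in Table 4.2; the dictionary and function names are illustrative only.

```python
# Log-marginal likelihood estimates for DJIA as reported in Table 4.2
log_ml_djia = {"SV-N": -2922.81, "SV-T": -2917.39, "SV-ST": -2895.03}

def log_bayes_factor(model_a, model_b, log_ml=log_ml_djia):
    """Log Bayes factor of model_a against model_b (positive favors model_a)."""
    return log_ml[model_a] - log_ml[model_b]

st_vs_t = log_bayes_factor("SV-ST", "SV-T")
t_vs_n = log_bayes_factor("SV-T", "SV-N")
```

Both differences are positive and large relative to the reported standard errors, which is what the ranking SV-ST, SV-T, SV-N rests on.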
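Table 4.3 MCMC estimation results of the SV-N, -T, and -ST models for N225

| Parameter | Quantity | SV-N | SV-T | SV-ST |
|---|---|---|---|---|
| μ | Mean [SD] | 0.302 [0.079] | 0.362 [0.084] | 0.315 [0.081] |
| | 95% CI | (0.148, 0.460) | (0.202, 0.533) | (0.159, 0.479) |
| | IF | 6 | 19 | 16 |
| φ | Mean [SD] | 0.925 [0.013] | 0.940 [0.012] | 0.944 [0.011] |
| | 95% CI | (0.898, 0.949) | (0.914, 0.962) | (0.921, 0.964) |
| | IF | 125 | 174 | 166 |
| ση | Mean [SD] | 0.315 [0.032] | 0.272 [0.031] | 0.273 [0.029] |
| | 95% CI | (0.254, 0.382) | (0.211, 0.339) | (0.220, 0.332) |
| | IF | 203 | 276 | 242 |
| ρ | Mean [SD] | −0.586 [0.047] | −0.647 [0.049] | −0.699 [0.045] |
| | 95% CI | (−0.672, −0.488) | (−0.734, −0.544) | (−0.783, −0.606) |
| | IF | 70 | 67 | 98 |
| ν | Mean [SD] | | 13.527 [3.290] | 16.340 [3.544] |
| | 95% CI | | (8.588, 21.433) | (10.546, 24.189) |
| | IF | | 177 | 372 |
| β | Mean [SD] | | | −0.725 [0.223] |
| | 95% CI | | | (−1.204, −0.345) |
| | IF | | | 219 |
| Leverage | Mean [SD] | −0.586 [0.047] | −0.632 [0.047] | −0.654 [0.041] |
| | 95% CI | (−0.672, −0.488) | (−0.716, −0.533) | (−0.728, −0.570) |
| Kurtosis | Mean [SD] | | 3.705 [0.247] | 3.777 [0.273] |
| | 95% CI | | (3.344, 4.308) | (3.418, 4.389) |
| Skewness | Mean [SD] | | | −0.388 [0.090] |
| | 95% CI | | | (−0.577, −0.225) |
| Log-ML | Estimate (SE) | −4003.44 (0.13) | −3999.94 (0.44) | −3988.12 (0.85) |

See Table 4.2 for details of the reported quantities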
Fig. 4.10 Standardized (left) and unstandardized (right) return densities of the SV-N, -T, and -ST models for N225
Fig. 4.11 NIC for N225 with today’s volatility h_t fixed at μ − 2ση (left), μ (center), and μ + 2ση (right), calculated from the posterior means of the SV-ST model
Table 4.3 and Figs. 4.9, 4.10, and 4.11 present the MCMC estimation results for N225. We observe that the results are qualitatively the same as those for DJIA. Reflecting the larger standard deviation of returns presented in Table 4.1, the posterior means of μ are higher than those for DJIA. Given that the larger volatility is expected to partly capture the excess kurtosis and negative skewness of the returns, the kurtosis and negative skewness for the SV-ST model are weaker than they are for DJIA. However, the SV-ST model outperforms the SV-T model, which dominates the SV-N model. Thus, it is still important to employ the GH skew-t distribution for modeling financial returns and volatility.
References
1. Aas, K., Haff, I.H.: The generalized hyperbolic skew Student’s t-distribution. J. Finance Econometrics 4(2), 275–309 (2006). https://doi.org/10.1093/jjfinec/nbj006
2. Abanto-Valle, C.A., Lachos, V.H., Dey, D.K.: Bayesian estimation of a skew-Student-t stochastic volatility model. Methodol. Comput. Appl. Prob. 17(3), 721–738 (2015)
3. Asai, M., McAleer, M.: Multivariate stochastic volatility, leverage and news impact surfaces. Econometrics J. 12(2), 292–309 (2009). https://doi.org/10.1111/j.1368-423X.2009.00284.x
4. Azzalini, A., Capitanio, A.: Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. B 65(2), 367–389 (2003)
5. Barndorff-Nielsen, O., Kendall, D.G.: Exponentially decreasing distributions for the logarithm of particle size. Proc. R. Soc. Lond. A. Math. Phys. Sci. 353(1674), 401–419 (1977). https://doi.org/10.1098/rspa.1977.0041
6. Berg, A., Meyer, R., Yu, J.: Deviance information criterion for comparing stochastic volatility models. J. Bus. Econ. Stat. 22(1), 107–120 (2004). https://doi.org/10.1198/073500103288619430
7. Bollerslev, T., Patton, A.J., Quaedvlieg, R.: Exploiting the errors: a simple approach for improved volatility forecasting. J. Econometrics 192(1), 1–18 (2016). https://doi.org/10.1016/j.jeconom.2015.10.007
8. Bollerslev, T., Tauchen, G., Zhou, H.: Expected stock returns and variance risk premia. Rev. Financ. Stud. 22(11), 4463–4492 (2009). https://doi.org/10.1093/rfs/hhp008
9. Catania, L., Proietti, T.: Forecasting volatility with time-varying leverage and volatility of volatility effects. Int. J. Forecast. 36(4), 1301–1317 (2020). https://doi.org/10.1016/j.ijforecast.2020.01.003
10. Chib, S.: Marginal likelihood from the Gibbs output. J. Am. Stat. Assoc. 90(432), 1313–1321 (1995)
11. Chib, S., Jeliazkov, I.: Marginal likelihood from the Metropolis-Hastings output. J. Am. Stat. Assoc. 96(453), 270–281 (2001)
12. Chib, S., Nardari, F., Shephard, N.: Markov chain Monte Carlo methods for stochastic volatility models. J. Econometrics 108(2), 281–316 (2002). https://doi.org/10.1016/S0304-4076(01)00137-3
13. Diebold, F.X.: Empirical Modeling of Exchange Rate Dynamics. Springer, Berlin (1988)
14. Engle, R.F., Ng, V.K.: Measuring and testing the impact of news on volatility. J. Finan. 48(5), 1749–1778 (1993). https://doi.org/10.1111/j.1540-6261.1993.tb05127.x
15. Fernández, C., Steel, M.F.: On Bayesian modeling of fat tails and skewness. J. Am. Stat. Assoc. 93(441), 359–371 (1998)
16. Kobayashi, G.: Skew exponential power stochastic volatility model for analysis of skewness, non-normal tails, quantiles and expectiles. Comput. Stat. 31(1), 49–88 (2016). https://doi.org/10.1007/s00180-015-0596-4
17. Leão, W.L., Abanto-Valle, C.A., Chen, M.H.: Bayesian analysis of stochastic volatility-in-mean model with leverage and asymmetrically heavy-tailed error using generalized hyperbolic skew Student’s t-distribution. Statist. Interface 10(4), 529–541 (2017)
18. Liesenfeld, R., Jung, R.C.: Stochastic volatility models: conditional normality versus heavy-tailed distributions. J. Appl. Econometrics 15(2), 137–160 (2000). https://doi.org/10.1002/(SICI)1099-1255(200003/04)15:23.0.CO;2-M
19. Ljung, G.M., Box, G.E.P.: On a measure of lack of fit in time series analysis. Biometrika 65(2), 297–303 (1978)
20. Nakajima, J., Omori, Y.: Stochastic volatility model with leverage and asymmetrically heavy-tailed error using GH skew Student’s t-distribution. Comput. Stat. Data Anal. 56(11), 3690–3704 (2012). https://doi.org/10.1016/j.csda.2010.07.012
21. Omori, Y., Chib, S., Shephard, N., Nakajima, J.: Stochastic volatility with leverage: fast and efficient likelihood inference. J. Econometrics 140(2), 425–449 (2007). https://doi.org/10.1016/j.jeconom.2006.07.008
22. Pitt, M.K., Shephard, N.: Filtering via simulation: auxiliary particle filters. J. Am. Stat. Assoc. 94(446), 590–599 (1999)
23. Prause, K.: The Generalized Hyperbolic Model: Estimation, Financial Derivatives, and Risk Measures. Ph.D. thesis, University of Freiburg (1999)
24. Takahashi, M.: Essays in Financial Econometrics. Ph.D. thesis, Northwestern University (2015)
25. Takahashi, M., Omori, Y., Watanabe, T.: News impact curve for stochastic volatility models. Econ. Lett. 120(1), 130–134 (2013). https://doi.org/10.1016/j.econlet.2013.03.001
26. Ubukata, M., Watanabe, T.: Pricing Nikkei 225 options using realized volatility. Japan. Econ. Rev. 65(4), 431–467 (2014). https://doi.org/10.1111/jere.12024
27. Wang, C.D., Mykland, P.A.: The estimation of leverage effect with high-frequency data. J. Am. Stat. Assoc. 109(505), 197–215 (2014). https://doi.org/10.1080/01621459.2013.864189
28. Watanabe, T.: On sampling the degree-of-freedom of Student’s-t disturbances. Stat. Prob. Lett. 52(2), 177–181 (2001). https://doi.org/10.1016/S0167-7152(00)00221-2
29. Yu, J.: On leverage in a stochastic volatility model. J. Econometrics 127(2), 165–178 (2005). https://doi.org/10.1016/j.jeconom.2004.08.002
Chapter 5
Realized Stochastic Volatility Model
Abstract In this chapter, we further extend the SV model by incorporating a model-free volatility estimator called realized volatility. The realized volatility, calculated from returns observed over short intervals such as 5 min, is a consistent estimator of the true volatility under certain conditions. The extended model can be seen as a hybrid model utilizing the information in realized volatility as well as daily returns. We present the MCMC estimation algorithm for this hybrid model, as well as its extension with the generalized hyperbolic skew Student’s t distribution. In an empirical study, we evaluate the models by comparing the one-day-ahead forecasts of volatility and daily return.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 M. Takahashi et al., Stochastic Volatility and Realized Stochastic Volatility Models, JSS Research Series in Statistics, https://doi.org/10.1007/978-981-99-0935-3_5
5.1 Introduction
In the SV models in Chaps. 2–4, the volatility has been considered as a latent variable and estimated using the MCMC method. In recent years, thanks to the availability of high-frequency data that include prices and other trade information within a day, realized volatility (RV) has become popular for volatility estimation. Basically, a daily RV is defined as the sum of squared intraday returns over a day. Early studies have shown its theoretical and empirical properties such as consistency under some conditions and strong persistence [4, 8–10, 17, 18]. See, for example, Andersen et al. [3, 5, 7, 45, 50] for more recent reviews of RV.
To forecast time-varying volatility for financial risk management, we must model the dynamics of RV. Many researchers, including [8–10], have documented that daily RV may follow a long-memory process, so they use autoregressive fractionally integrated moving average (ARFIMA) models (see Beran [19] for long-memory and ARFIMA models). A more widely used model for RV dynamics is the heterogeneous autoregressive (HAR) model proposed in [24], which is not a long-memory model but is known to approximate a long-memory process well. Several extensions, such as HAR with jump components [6], the HAR-GARCH model [26], and the HAR gamma with leverage model [25], have been proposed and applied to option pricing [44].
Although the ARFIMA and HAR models have been shown to significantly improve volatility forecasts relative to GARCH and SV models, the RV is subject to the bias caused by market microstructure noise and non-trading hours. There are some methods for mitigating the bias in RV: multi-scale estimators [65, 66], the preaveraging approach [39], and the realized kernel (RK) [15, 16]. See, for example, [1, 42, 62] for details.
For modeling and forecasting volatility while taking the bias in RV into account, two classes of hybrid models have been proposed. One is the realized SV (RSV) model based on the SV model [29, 40, 57], and the other is the realized GARCH (RGARCH) model based on the GARCH model [36]. Several extensions of RSV models have been proposed: ones with the GH skew Student’s t distribution [58, 61], ones that account for the non-normality of RV [21, 48, 49], ones with long memory [12, 54], and multivariate RSV models [41, 55, 64]. On the other hand, extensions of the RGARCH models include the realized EGARCH (REGARCH) model [35] and the REGARCH model with mixed-data sampling (MIDAS) [22].
In this chapter, we focus on the RSV model and its extension with the GH skew Student’s t distribution. To examine the validity of the models, we compare them with the EGARCH and REGARCH models. In particular, following [58, 59], we evaluate the models by comparing the one-day-ahead forecasts of volatility and daily return.
The rest of this chapter is organized as follows. Section 5.2 introduces the RV and briefly explains its bias with some correction methods. We describe the RSV model and its extension with the GH skew Student’s t distribution in Sects. 5.3 and 5.4, respectively. Section 5.5 explains the MCMC estimation algorithm. Section 5.6 explains how to evaluate the forecast performance. In Sect. 5.7, we introduce the EGARCH and REGARCH models for comparison with the RSV models. These models and methods are illustrated in an empirical study using daily returns and RVs of US and Japanese stock indices, the DJIA and N225, in Sect. 5.8.
5.2 Realized Volatility
We first define the true volatility for day t. Let $p(s)$ denote the logarithm of the price of the asset at time $s$ and assume that it follows the continuous-time diffusion process given by
\[ dp(s) = \mu(s)\,ds + \sigma(s)\,dW(s), \]
where $\mu(s)$, $\sigma^2(s)$, and $W(s)$ are the drift, the volatility, and the Wiener process, respectively. Let $t$ denote the market closing time on day t. Then, the true volatility for day t is defined as the volatility between $t-1$ and $t$:
\[ \mathrm{IV}_t = \int_{t-1}^{t} \sigma^2(s)\,ds, \]
which is also called the integrated volatility. Suppose we have $m$ intraday returns for day t given by $r_{t-1+1/m}, r_{t-1+2/m}, \ldots, r_t$. Then, the RV, the estimator of $\mathrm{IV}_t$, is defined by
\[ \mathrm{RV}_t = \sum_{i=1}^{m} r_{t-1+i/m}^2. \]
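The definition can be illustrated with a minimal sketch that sums squared intraday log returns. The function name `realized_volatility` and the price values are hypothetical and serve only as an example.

```python
import math


def realized_volatility(intraday_log_prices):
    """RV_t as the sum of squared intraday log returns for one day."""
    returns = [b - a for a, b in zip(intraday_log_prices, intraday_log_prices[1:])]
    return sum(r * r for r in returns)


# Hypothetical prices at m + 1 = 5 equally spaced times within one trading day.
log_prices = [math.log(p) for p in (100.0, 100.5, 100.2, 100.9, 100.4)]
rv = realized_volatility(log_prices)
print(rv)
```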
Under some conditions, it can be shown that $\mathrm{RV}_t$ converges to $\mathrm{IV}_t$ as $m \to \infty$. However, in practice, high-frequency data do not satisfy those conditions, and the estimator is known to have bias. Non-trading hours for which we lack intraday observations constitute one source of bias. If we ignore the non-trading hours to compute the RV, we underestimate the true volatility for a whole day (24 h). To deal with the non-trading hours, Hansen and Lunde [37] have proposed adjusting the $\mathrm{RV}_t$ as follows:
\[ \mathrm{RV}_t^{\mathrm{HL}} = c_{\mathrm{HL}}\,\mathrm{RV}_t, \quad c_{\mathrm{HL}} = \frac{\sum_{t=1}^{n} (y_t - \bar{y})^2}{\sum_{t=1}^{n} \mathrm{RV}_t}, \quad \bar{y} = \frac{1}{n}\sum_{t=1}^{n} y_t, \tag{5.1} \]
where $\mathrm{RV}_t$ represents the RV when we ignore the non-trading hours. The average of $\mathrm{RV}_t^{\mathrm{HL}}$ is thus set equal to the sample variance of the daily returns.
Another source of bias is microstructure noise. The actual asset prices are observed with microstructure noise, which results in autocorrelations in the asset returns. The effect of microstructure noise is known to grow as the time interval between observed prices shrinks. The optimal time intervals to compute the RV are investigated in the literature [2, 13, 14, 42]. Alternative estimators such as two- or multi-scale estimators [65, 66] and preaveraging estimators [39] have also been proposed. In the empirical study presented in Sect. 5.8, we use the RK [15] as an unbiased volatility estimator. The RK for day t is defined as
\[ \mathrm{RK}_t = \sum_{h=-H}^{H} k\!\left(\frac{h}{H+1}\right)\gamma_h, \]
where $H$ is the bandwidth, $k(\cdot) \in [0, 1]$ is a non-stochastic weight function, and
\[ \gamma_h = \sum_{i=|h|+1}^{m} r_{t-1+i/m}\, r_{t-1+(i-|h|)/m}. \]
Regarding the choice of $k(\cdot)$, Barndorff-Nielsen et al. [16] have suggested the Parzen kernel function and provided a detailed procedure for choosing $H$. Alternatively, those biases can be incorporated into the two classes of hybrid models: the RSV and RGARCH models. In particular, the RSV model proposed in [57] adjusts the bias of the RV by estimating a model parameter, as we shall see in Sect. 5.3.
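The two corrections above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the Parzen weight is the standard textbook form, and the bandwidth H is passed in directly rather than chosen by the procedure in [16].

```python
def hansen_lunde_scale(daily_returns, rvs):
    """c_HL in (5.1): sum of squared demeaned daily returns over the sum of RVs."""
    n = len(daily_returns)
    ybar = sum(daily_returns) / n
    return sum((y - ybar) ** 2 for y in daily_returns) / sum(rvs)


def parzen(x):
    """Parzen kernel weight (standard form), zero outside [-1, 1]."""
    x = abs(x)
    if x <= 0.5:
        return 1 - 6 * x ** 2 + 6 * x ** 3
    if x <= 1:
        return 2 * (1 - x) ** 3
    return 0.0


def autocov(returns, h):
    """gamma_h: sum of products of intraday returns |h| steps apart."""
    h = abs(h)
    return sum(returns[i] * returns[i - h] for i in range(h, len(returns)))


def realized_kernel(returns, bandwidth):
    """RK_t with Parzen weights k(h / (H + 1)) for h = -H, ..., H."""
    H = bandwidth
    return sum(parzen(h / (H + 1)) * autocov(returns, h) for h in range(-H, H + 1))


# Hypothetical intraday returns for one day.
r = [0.010, -0.020, 0.015, 0.005]
print(realized_kernel(r, 2))
```

With H = 0 the realized kernel collapses to the plain RV, which gives a quick sanity check.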
5.3 Realized Stochastic Volatility Model
We extend the SV model with an additional source of information regarding the true volatility, the RV. In addition to the measurement equation of the daily return $y_t$, we consider that of the log RV, that is, $x_t = \log \mathrm{RV}_t$. Let $h_t$ denote the logarithm of the true volatility, $h_t = \log \mathrm{IV}_t$. Then, the RSV model is defined as
\[ y_t = \exp(h_t/2)\,\epsilon_t, \quad t = 1, \ldots, n, \tag{5.2} \]
\[ h_{t+1} = \mu + \phi(h_t - \mu) + \eta_t, \quad t = 1, \ldots, n-1, \quad |\phi| < 1, \tag{5.3} \]
\[ x_t = \xi + h_t + u_t, \quad t = 1, \ldots, n, \tag{5.4} \]
\[ h_1 \sim N\!\left(\mu, \sigma_\eta^2/(1-\phi^2)\right), \]
\[ (\epsilon_t, \eta_t, u_t)' \sim N(0, \Sigma), \quad \Sigma = \begin{pmatrix} 1 & \rho\sigma_\eta & 0 \\ \rho\sigma_\eta & \sigma_\eta^2 & 0 \\ 0 & 0 & \sigma_u^2 \end{pmatrix}, \tag{5.5} \]
where $u_t$ is assumed to be independent of $(\epsilon_t, \eta_t)$ for simplicity. The parameter $\xi$ corrects the bias in $\log \mathrm{RV}_t$, and $\xi = 0$ implies that there is no bias. If we ignore the non-trading hours to compute RV, we underestimate $\mathrm{IV}_t$ and expect a negative bias. It has been shown that microstructure noise may result in positive or negative bias [38]. Thus, $\xi > 0$ if the bias caused by microstructure noise is positive and dominates the negative bias caused by ignoring non-trading hours to compute the RV, and $\xi < 0$ otherwise. In general, we could replace (5.4) with $x_t = \xi + \psi h_t + u_t$. Since empirical studies have found that this extension does not improve the forecasting performance of the volatilities, we assume $\psi = 1$ as in [57].
Similar to the SV model, the RSV model has been extended with the GH skew Student’s t distribution [48, 49, 58, 61]. In the following, we extend the RSV model in (5.2)–(5.5), referred to as the RSV-N model hereafter, by employing the GH skew Student’s t distribution for $\epsilon_t$.
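To make the structure of (5.2)–(5.5) concrete, the following sketch simulates one path from the RSV-N model. The function name and all parameter values are hypothetical; the initial state is drawn from the stationary distribution of the log-volatility process, which is an assumption of this sketch.

```python
import math
import random


def simulate_rsv_n(n, mu, phi, sigma_eta, rho, xi, sigma_u, seed=0):
    """Simulate daily returns y, log RVs x, and log-volatilities h from RSV-N."""
    rng = random.Random(seed)
    # Initial state from the stationary distribution (assumption of this sketch).
    h = rng.gauss(mu, sigma_eta / math.sqrt(1 - phi ** 2))
    y, x, hs = [], [], []
    for _ in range(n):
        eps = rng.gauss(0, 1)
        # eta has variance sigma_eta^2 and correlation rho with eps.
        eta = sigma_eta * (rho * eps + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))
        u = rng.gauss(0, sigma_u)           # independent of (eps, eta)
        y.append(math.exp(h / 2) * eps)     # (5.2)
        x.append(xi + h + u)                # (5.4)
        hs.append(h)
        h = mu + phi * (h - mu) + eta       # (5.3)
    return y, x, hs


y, x, hs = simulate_rsv_n(1000, mu=0.0, phi=0.95, sigma_eta=0.3,
                          rho=-0.6, xi=-0.4, sigma_u=0.3)
```

In such a simulated path, the average of x_t − h_t should be close to ξ, which is the sense in which ξ measures the bias of log RV.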
5.4 RSV Model with GH Skewed Student’s t Error
Following [58], we extend the RSV-N model in (5.2)–(5.5) by employing the GH skew t distribution. The extended model, referred to as the RSV-ST model hereafter, is given by replacing $\epsilon_t$ and $\eta_t$ in (5.5) with the following mixture representation:
\[ \epsilon_t = \frac{\beta\bar{\lambda}_t + \sqrt{\lambda_t}\,z_t}{\sqrt{\beta^2\sigma_\lambda^2 + \mu_\lambda}}, \quad (z_t, \eta_t, u_t)' \sim N(0, \Sigma), \tag{5.6} \]
where $\Sigma$ is defined as in (5.5), $\bar{\lambda}_t = \lambda_t - \mu_\lambda$, and
\[ \lambda_t \sim \text{i.i.d. } IG\!\left(\frac{\nu}{2}, \frac{\nu}{2}\right), \quad \mu_\lambda = E[\lambda_t] = \frac{\nu}{\nu-2}, \quad \sigma_\lambda^2 = \mathrm{Var}[\lambda_t] = \frac{2\nu^2}{(\nu-2)^2(\nu-4)}, \]
as defined in Section 4.2. Again, $u_t$ is assumed to be normally distributed and independent of $(\epsilon_t, \eta_t)$. As described in Sect. 4.3, the overall shape of the distribution is determined by the combination of $\nu$ and $\beta$. Nevertheless, we may interpret $\nu$ and $\beta$ as parameters governing the tail thickness and skewness, respectively. In fact, when the skewness parameter $\beta = 0$, this model reduces to the RSV model with Student’s t distribution, referred to as the RSV-T model hereafter.
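The mixture representation (5.6) can be checked numerically. The sketch below draws standardized GH skew t errors and is only an illustration: the function name and parameter values are hypothetical, and the inverse gamma draw is obtained as the reciprocal of a gamma variate with shape ν/2 and rate ν/2.

```python
import math
import random


def gh_skew_t_draws(n, nu, beta, seed=0):
    """Draw n standardized GH skew t errors via the normal mixture in (5.6)."""
    rng = random.Random(seed)
    mu_lam = nu / (nu - 2)
    var_lam = 2 * nu ** 2 / ((nu - 2) ** 2 * (nu - 4))
    scale = math.sqrt(beta ** 2 * var_lam + mu_lam)
    draws = []
    for _ in range(n):
        # lambda ~ IG(nu/2, nu/2); gammavariate takes (shape, scale = 1/rate).
        lam = 1 / rng.gammavariate(nu / 2, 2 / nu)
        z = rng.gauss(0, 1)
        draws.append((beta * (lam - mu_lam) + math.sqrt(lam) * z) / scale)
    return draws


eps = gh_skew_t_draws(20000, nu=16.0, beta=-0.7)
mean = sum(eps) / len(eps)
var = sum((e - mean) ** 2 for e in eps) / len(eps)
print(mean, var)
```

The sample mean and variance should be close to 0 and 1, reflecting the standardization by the denominator in (5.6), and a negative β should produce negative sample skewness.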
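5.5 MCMC Estimation
The model parameter is $\theta = (\mu, \phi, \sigma_\eta, \rho, \nu, \beta, \xi, \sigma_u)'$. For statistical inference on the parameters, we again take the Bayesian approach and implement MCMC simulation. We assume the prior distribution for $(\mu, \phi, \sigma_\eta^2, \rho, \nu, \beta)$ as in Sect. 4.4, and assume normal and inverse gamma distributions for $\xi$ and $\sigma_u^2$, respectively. That is,
\[ \mu \sim N(\mu_0, \sigma_0^2), \quad (\phi+1)/2 \sim \mathrm{Beta}(a, b), \quad \sigma_\eta^2 \sim IG(n_0/2, s_0/2), \quad (\rho+1)/2 \sim \mathrm{Beta}(a_\rho, b_\rho), \]
\[ \nu \sim G(n_\nu, s_\nu)\,I(\nu > \nu^*), \quad \beta \sim N(\beta_0, \sigma_\beta^2), \quad \xi \sim N(\xi_0, \sigma_\xi^2), \quad \sigma_u^2 \sim IG(n_u/2, s_u/2), \]
where $\nu^* = 4$ for the RSV-T model and $\nu^* = 8$ for the RSV-ST model. The joint posterior density function of $\Theta = (h, \lambda, \theta)$ is given by
\[
\begin{aligned}
\pi(\Theta \mid x, y) &\propto f(x, y, h, \lambda \mid \theta)\,\pi(\theta) \\
&\propto (1+\phi)^{a-1/2}(1-\phi)^{b-1/2}(1+\rho)^{a_\rho-(n+1)/2}(1-\rho)^{b_\rho-(n+1)/2} \\
&\quad \times (\sigma_\eta^2)^{-(n_1/2+1)}(\beta^2\sigma_\lambda^2+\mu_\lambda)^{n/2} \exp\left\{-\frac{1}{2}\sum_{t=1}^{n}\left[\log(\lambda_t) + h_t + \bar{y}_t^2\right]\right\} \\
&\quad \times \exp\left\{-\frac{1}{2\sigma_\eta^2}\left[s_0 + (1-\phi^2)(h_1-\mu)^2\right]\right\} \\
&\quad \times \exp\left\{-\frac{1}{2\sigma_\eta^2(1-\rho^2)}\sum_{t=1}^{n-1}\left[h_{t+1} - \mu(1-\phi) - \phi h_t - \rho\sigma_\eta\bar{y}_t\right]^2\right\} \\
&\quad \times \exp\left\{\frac{n\nu}{2}\log\frac{\nu}{2} - n\log\Gamma\!\left(\frac{\nu}{2}\right) - \frac{1}{2}\sum_{t=1}^{n}\left[(\nu+2)\log(\lambda_t) + \frac{\nu}{\lambda_t}\right]\right\} \\
&\quad \times \nu^{n_\nu-1}\,I(\nu > \nu^*)\exp\left\{-\frac{(\mu-\mu_0)^2}{2\sigma_0^2} - \frac{(\beta-\beta_0)^2}{2\sigma_\beta^2} - s_\nu\nu\right\} \\
&\quad \times (\sigma_u^2)^{-(\tilde{n}_u/2+1)}\exp\left\{-\sum_{t=1}^{n}\frac{(x_t-\xi-h_t)^2}{2\sigma_u^2} - \frac{s_u}{2\sigma_u^2} - \frac{(\xi-\xi_0)^2}{2\sigma_\xi^2}\right\},
\end{aligned}
\]
where $h = (h_1, \ldots, h_n)'$, $\lambda = (\lambda_1, \ldots, \lambda_n)'$, $y = (y_1, \ldots, y_n)'$, $n_1 = n_0 + n$, $\tilde{n}_u = n_u + n$, and
\[ \bar{y}_t = \frac{\sqrt{\beta^2\sigma_\lambda^2 + \mu_\lambda}\; y_t\exp(-h_t/2) - \beta(\lambda_t - \mu_\lambda)}{\sqrt{\lambda_t}}. \]
Following [58], we implement the MCMC algorithm below:
Step 1. Initialize $\theta = (\mu, \phi, \sigma_\eta, \rho, \nu, \beta, \xi, \sigma_u)'$, $\lambda$, and $h$.
Step 2. Generate $\phi \sim \pi(\phi \mid \theta_{\setminus\phi}, \lambda, h, x, y)$.
Step 3. Generate $\mu \sim \pi(\mu \mid \theta_{\setminus\mu}, \lambda, h, x, y)$.
Step 4. Generate $(\sigma_\eta, \rho) \sim \pi(\sigma_\eta, \rho \mid \theta_{\setminus(\sigma_\eta,\rho)}, \lambda, h, x, y)$.
Step 5. Generate $\nu \sim \pi(\nu \mid \theta_{\setminus\nu}, \lambda, h, x, y)$.
Step 6. Generate $\beta \sim \pi(\beta \mid \theta_{\setminus\beta}, \lambda, h, x, y)$.
Step 7. Generate $\xi \sim \pi(\xi \mid \theta_{\setminus\xi}, \lambda, h, x, y)$.
Step 8. Generate $\sigma_u \sim \pi(\sigma_u \mid \theta_{\setminus\sigma_u}, \lambda, h, x, y)$.
Step 9. Generate $\lambda \sim \pi(\lambda \mid \theta, h, x, y)$.
Step 10. Generate $h \sim \pi(h \mid \theta, \lambda, x, y)$.
Step 11. Go to Step 2.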
5.5.1 Generation of (μ, φ, ση , ρ, ν, β) and λ
Given that u_t is assumed to be independent of ε_t and η_t, the joint posterior density function is the same as in the SV-ST model except for the last term regarding ξ and σu. Hence, we can generate (μ, φ, ση , ρ, ν, β) and λ in exactly the same manner as in the SV-ST model described in Sect. 4.4.
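5.5.2 Generation of ξ and σu
Step 7. Generation of ξ. The conditional posterior distribution of $\xi$ is normal, and we generate $\xi \mid \cdot \sim N(\tilde{m}_\xi, \tilde{s}_\xi^2)$, where
\[ \tilde{m}_\xi = \frac{\sigma_u^{-2}\sum_{t=1}^{n}(x_t - h_t) + s_\xi^{-2} m_\xi}{n\sigma_u^{-2} + s_\xi^{-2}}, \quad \tilde{s}_\xi^{-2} = n\sigma_u^{-2} + s_\xi^{-2}, \]
and $m_\xi = \xi_0$ and $s_\xi^2 = \sigma_\xi^2$ denote the prior mean and variance of $\xi$.
Step 8. Generation of σu. The conditional posterior distribution of $\sigma_u^2$ is an inverse gamma distribution, and we generate $\sigma_u^2 \mid \cdot \sim IG(\tilde{n}_u/2, \tilde{s}_u/2)$, where
\[ \tilde{n}_u = n_u + n, \quad \tilde{s}_u = \sum_{t=1}^{n}(x_t - \xi - h_t)^2 + s_u. \]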
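Steps 7 and 8 are standard conjugate updates and can be sketched as below. The function names and hyperparameter values are hypothetical; the latent h is treated as known here purely for illustration, whereas in the full algorithm it is resampled in Step 10.

```python
import math
import random


def draw_xi(x, h, sigma_u2, m_xi, s_xi2, rng):
    """Step 7: draw xi from its normal conditional posterior."""
    n = len(x)
    precision = n / sigma_u2 + 1 / s_xi2
    mean = (sum(xt - ht for xt, ht in zip(x, h)) / sigma_u2 + m_xi / s_xi2) / precision
    return rng.gauss(mean, math.sqrt(1 / precision))


def draw_sigma_u2(x, h, xi, n_u, s_u, rng):
    """Step 8: draw sigma_u^2 from its inverse gamma conditional posterior."""
    shape = (n_u + len(x)) / 2
    rate = (sum((xt - xi - ht) ** 2 for xt, ht in zip(x, h)) + s_u) / 2
    # If G ~ Gamma(shape, rate), then 1/G ~ IG(shape, rate).
    return 1 / rng.gammavariate(shape, 1 / rate)


# Hypothetical data: x_t = xi + h_t + u_t with xi = -0.5 and sigma_u = 0.3.
rng = random.Random(0)
h = [rng.gauss(0, 1) for _ in range(500)]
x = [-0.5 + ht + rng.gauss(0, 0.3) for ht in h]
xi, sigma_u2 = 0.0, 1.0
for _ in range(100):
    xi = draw_xi(x, h, sigma_u2, m_xi=0.0, s_xi2=1.0, rng=rng)
    sigma_u2 = draw_sigma_u2(x, h, xi, n_u=5.0, s_u=0.5, rng=rng)
print(xi, sigma_u2)
```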
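5.5.3 Generation of h
Step 10. Generation of h. First, we rewrite the RSV-ST model (5.2), (5.3), (5.4), and (5.6) as
\[ y_t = \left(\beta\bar{\lambda}_t + \sqrt{\lambda_t}\,z_t\right)\exp(\alpha_t/2)\,\gamma, \quad t = 1, \ldots, n, \]
\[ \alpha_{t+1} = \phi\alpha_t + \eta_t, \quad t = 0, \ldots, n-1, \]
\[ x_t = c + \alpha_t + u_t, \quad t = 1, \ldots, n, \]
where
\[ \alpha_t = h_t - \mu, \quad \bar{\lambda}_t = \lambda_t - \mu_\lambda, \quad \gamma = \frac{\exp(\mu/2)}{\sqrt{\beta^2\sigma_\lambda^2 + \mu_\lambda}}, \quad c = \xi + \mu. \]
The log-likelihood of $(y_t, x_t)$ given $\alpha_t$, $\alpha_{t+1}$, and the other parameters, excluding the constant term, is
\[ l_t = -\frac{\alpha_t}{2} - \frac{(y_t - \mu_t)^2}{2\sigma_t^2} - \frac{(x_t - c - \alpha_t)^2}{2\sigma_u^2}, \]
where
\[ \mu_t = \begin{cases} \left[\beta\bar{\lambda}_t + \dfrac{\rho\sqrt{\lambda_t}}{\sigma_\eta}(\alpha_{t+1} - \phi\alpha_t)\right]\exp(\alpha_t/2)\,\gamma, & t = 1, \ldots, n-1, \\ \beta\bar{\lambda}_t\exp(\alpha_t/2)\,\gamma, & t = n, \end{cases} \]
\[ \sigma_t^2 = \begin{cases} (1-\rho^2)\lambda_t\exp(\alpha_t)\,\gamma^2, & t = 1, \ldots, n-1, \\ \lambda_n\exp(\alpha_n)\,\gamma^2, & t = n. \end{cases} \]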
Using this log-likelihood, we can sample the latent variable (α1 , . . . , αn ) efficiently via the multi-move sampler introduced in Sect. 3.4. See Appendix A.5 in [58] for details.
5.6 Evaluation of Forecasts
To assess whether the extensions of the SV model are useful, we evaluate their forecasting performance. Adding some steps to the MCMC estimation procedure above allows us to sample future volatility and returns. From the predictive samples of returns, we calculate the value-at-risk (VaR), which is defined as a quantile of the return distribution and is commonly used to quantify the risk associated with a financial portfolio. Given that the VaR ignores important information contained in the tail beyond that quantile, we also compute the expected shortfall (ES), which is defined as the conditional expectation of the return given that it violates the VaR and thus accounts for the tail information. Section 5.6.1 explains how to sample one-day-ahead forecasts of volatility and returns and compute volatility, VaR, and ES forecasts. We evaluate those forecasts of the models by comparing suitable loss or scoring functions. Section 5.6.2 presents two robust loss functions for evaluating volatility forecasts, while Sect. 5.6.3 describes a strictly consistent loss function for evaluating VaR and ES forecasts jointly. In Sect. 5.6.4, we explain how to test relative forecast performance based on these loss functions.
5.6.1 Volatility, VaR, and ES Forecasts
To obtain one-day-ahead volatility, VaR, and ES forecasts, we follow the Bayesian sampling scheme given by Takahashi et al. [58]. Let $n$ be the last observation used in the estimation. Then, for each sample of $(\theta, \lambda, h)$ generated by the MCMC algorithm above, we generate the one-day-ahead log-volatility $h_{n+1}$ and daily return $y_{n+1}$ as follows:
1. Generate $h_{n+1}$ from the normal distribution $N(\mu_{n+1}, \sigma_{n+1}^2)$, where
\[ \mu_{n+1} = \mu + \phi(h_n - \mu) + \rho\sigma_\eta\bar{y}_n, \quad \sigma_{n+1}^2 = (1-\rho^2)\sigma_\eta^2, \]
with $\bar{y}_n$ as defined in Sect. 5.5 (for the RSV-N model, $\bar{y}_n = y_n\exp(-h_n/2)$).
2. Generate $\lambda_{n+1}$ from the inverse gamma distribution $IG(\nu/2, \nu/2)$.
3. Generate $z_{n+1}$ from the standard normal distribution and calculate $y_{n+1}$ as
\[ y_{n+1} = \frac{\beta(\lambda_{n+1} - \mu_\lambda) + \sqrt{\lambda_{n+1}}\,z_{n+1}}{\sqrt{\beta^2\sigma_\lambda^2 + \mu_\lambda}}\exp(h_{n+1}/2). \]
Repeating the above procedure for each MCMC draw of $(\theta, \lambda, h)$, we obtain predictive samples of $h_{n+1}$ and $y_{n+1}$. The one-day-ahead volatility forecast can be computed as a mean or a median of the samples of $h_{n+1}$. One-day-ahead VaR forecasts with probability $\alpha$ at time $n+1$, denoted by $\mathrm{VaR}_{n+1}(\alpha)$, are defined by
\[ \Pr[y_{n+1} < \mathrm{VaR}_{n+1}(\alpha) \mid I_n] = \alpha, \]
where $I_n$ denotes the information set available at time $n$. The corresponding ES forecasts, denoted by $\mathrm{ES}_{n+1}(\alpha)$, are then given as
\[ \mathrm{ES}_{n+1}(\alpha) = E[y_{n+1} \mid y_{n+1} < \mathrm{VaR}_{n+1}(\alpha), I_n]. \]
We can obtain one-day-ahead VaR and ES forecasts as the $\alpha$th quantile of the samples of $y_{n+1}$ and the average of the samples below that quantile, respectively.
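The sampling scheme can be sketched as follows. This is a simplified illustration under stated assumptions: `forecast_draw` and `var_es` are hypothetical names, a single hypothetical parameter draw is reused where the actual scheme would loop over the MCMC output, and the empirical quantile rule (the k-th smallest draw with k = ⌊αN⌋) is one of several reasonable conventions.

```python
import math
import random


def forecast_draw(theta, h_n, ybar_n, rng):
    """One predictive draw of (h_{n+1}, y_{n+1}) given one posterior draw."""
    mu, phi, sigma_eta, rho, nu, beta = theta
    mean = mu + phi * (h_n - mu) + rho * sigma_eta * ybar_n
    h_next = rng.gauss(mean, math.sqrt(1 - rho ** 2) * sigma_eta)
    lam = 1 / rng.gammavariate(nu / 2, 2 / nu)   # lambda ~ IG(nu/2, nu/2)
    mu_lam = nu / (nu - 2)
    var_lam = 2 * nu ** 2 / ((nu - 2) ** 2 * (nu - 4))
    z = rng.gauss(0, 1)
    eps = (beta * (lam - mu_lam) + math.sqrt(lam) * z) / math.sqrt(beta ** 2 * var_lam + mu_lam)
    return h_next, eps * math.exp(h_next / 2)


def var_es(y_samples, alpha):
    """Empirical alpha-quantile (VaR) and mean of the k smallest draws (ES)."""
    ys = sorted(y_samples)
    k = max(1, int(math.floor(alpha * len(ys))))
    return ys[k - 1], sum(ys[:k]) / k


# Hypothetical posterior draw (mu, phi, sigma_eta, rho, nu, beta) and state.
rng = random.Random(0)
theta = (0.0, 0.95, 0.3, -0.6, 16.0, -0.7)
samples = [forecast_draw(theta, h_n=0.0, ybar_n=-1.0, rng=rng) for _ in range(5000)]
var_5, es_5 = var_es([y for _, y in samples], alpha=0.05)
print(var_5, es_5)
```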
5.6.2 Loss Functions for Volatility
When comparing volatility forecasts, it is common to focus on the expected loss computed with the true, latent volatility. In practice, however, we do not observe the true volatility and can only compute the loss using some volatility proxy. The ranking of competing volatility forecasts based on a proxy may differ from the ranking based on the true volatility. In [51], a loss function is defined as robust if the ranking of any two (possibly imperfect) volatility forecasts by expected loss is the same whether it is computed using the true volatility or some conditionally unbiased volatility proxy. Following [51], we use a class of robust loss functions. Let $x$ and $f$ be a volatility proxy and a volatility forecast, respectively. Then, the following class of loss functions, indexed by the scalar parameter $b$, corresponds to the entire subset of robust and homogeneous loss functions,
\[ L(x, f; b) = \begin{cases} \dfrac{x^{b+2} - f^{b+2}}{(b+1)(b+2)} - \dfrac{f^{b+1}(x - f)}{b+1}, & \text{for } b \notin \{-1, -2\}, \\[2ex] f - x + x\log\dfrac{x}{f}, & \text{for } b = -1, \\[2ex] \dfrac{x}{f} - \log\dfrac{x}{f} - 1, & \text{for } b = -2, \end{cases} \]
where the degree of homogeneity is equal to $b + 2$. Given the volatility proxy $x_t$ and volatility forecasts $f_t$ for $t = 1, \ldots, T$, the mean squared error (MSE) and Gaussian quasi-likelihood (QLIKE) loss functions are obtained when $b = 0$ and $b = -2$, respectively, up to additive and multiplicative constants:
\[ \mathrm{MSE} = \frac{1}{T}\sum_{t=1}^{T} L(x_t, f_t; b = 0) = \frac{1}{T}\sum_{t=1}^{T}\frac{(x_t - f_t)^2}{2}, \]
\[ \mathrm{QLIKE} = \frac{1}{T}\sum_{t=1}^{T} L(x_t, f_t; b = -2) = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{x_t}{f_t} - \log\frac{x_t}{f_t} - 1\right). \]
In [52], the QLIKE is shown to have higher power than the MSE for the tests given by Diebold and Mariano [28] and West [63], referred to as DMW tests hereafter.
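The loss family above translates directly into code. The following sketch uses hypothetical function names and made-up proxy and forecast values; it assumes strictly positive inputs, as required by the logarithms.

```python
import math


def robust_loss(x, f, b):
    """Robust, homogeneous loss L(x, f; b) for proxy x > 0 and forecast f > 0."""
    if b == -1:
        return f - x + x * math.log(x / f)
    if b == -2:
        return x / f - math.log(x / f) - 1
    return ((x ** (b + 2) - f ** (b + 2)) / ((b + 1) * (b + 2))
            - f ** (b + 1) * (x - f) / (b + 1))


def mse(proxies, forecasts):
    """Average loss with b = 0, i.e. the mean of (x - f)^2 / 2."""
    return sum(robust_loss(x, f, 0) for x, f in zip(proxies, forecasts)) / len(proxies)


def qlike(proxies, forecasts):
    """Average loss with b = -2."""
    return sum(robust_loss(x, f, -2) for x, f in zip(proxies, forecasts)) / len(proxies)


# Hypothetical volatility proxies and forecasts.
proxy = [1.2, 0.8, 2.5]
forecast = [1.0, 1.0, 2.0]
print(mse(proxy, forecast), qlike(proxy, forecast))
```

Every member of the family is zero when the forecast equals the proxy and positive otherwise.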
5.6.3 A Joint Loss Function for VaR and ES
A measure of a probability distribution, such as VaR and ES, is called elicitable if there is a loss function such that the correct forecast of the measure is the unique minimizer of the expected loss. Such loss functions are called strictly consistent for the measure. Although the ES is not an elicitable risk measure on its own [33], the ES and VaR are jointly elicitable under the following class of loss functions indexed by the functions $G_1$ and $G_2$ [31], referred to as FZ loss functions hereafter,
\[ L_{\mathrm{FZ}}(y, v, e; \alpha, G_1, G_2) = \left[I(y \le v) - \alpha\right]\left[G_1(v) - G_1(y) + \frac{1}{\alpha}G_2(e)\,v\right] - G_2(e)\left[\frac{1}{\alpha}I(y \le v)\,y - e\right] - \mathcal{G}_2(e), \]
where $y$, $v$, and $e$ denote a return, VaR, and ES, respectively, $G_1$ is weakly increasing, $G_2$ is strictly increasing and strictly positive, and $\mathcal{G}_2' = G_2$. To use the FZ loss function for forecast evaluation, we follow [53] and choose $G_1(a) = 0$ and $G_2(a) = -1/a$.¹ The resulting loss function, referred to as the FZ0 loss function hereafter, is
\[ L_{\mathrm{FZ0}}(y, v, e; \alpha) = -\frac{1}{\alpha e}I(y \le v)(v - y) + \frac{v}{e} + \log(-e) - 1. \tag{5.7} \]
The FZ0 loss function is the only FZ loss function generating loss differences that are homogeneous of degree zero [53].
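A minimal sketch of the FZ0 loss in (5.7) is given below. The function name and the numerical inputs are hypothetical; the check on the sign of the ES reflects the requirement that log(−e) be defined.

```python
import math


def fz0_loss(y, v, e, alpha):
    """FZ0 loss in (5.7) for return y, VaR forecast v, and ES forecast e < 0."""
    if e >= 0:
        raise ValueError("the ES forecast must be negative for log(-e) to exist")
    hit = 1.0 if y <= v else 0.0
    return -hit * (v - y) / (alpha * e) + v / e + math.log(-e) - 1


# Hypothetical 5% VaR and ES forecasts evaluated on two realized returns.
print(fz0_loss(0.0, v=-2.0, e=-2.5, alpha=0.05))   # no VaR violation
print(fz0_loss(-3.0, v=-2.0, e=-2.5, alpha=0.05))  # VaR violation
```

A violation of the VaR adds a penalty proportional to the size of the exceedance, scaled by 1/α.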
¹ See Taylor [60] for other FZ loss functions.
5.6.4 Testing Relative Forecast Performance
Although the average losses using the loss functions above are useful for an initial look at the competing models’ forecast performance, they do not reveal whether the loss differences are statistically significant. Hence, we employ conditional and unconditional predictive ability tests as in [32], referred to as GW tests hereafter, on the loss differences. In contrast to the DMW tests, the GW tests mainly focus on rolling window methods and permit unified treatment of nested and non-nested models.
Let $\Delta L_t$ be a loss difference between two models for the forecast target date t. For the GW test of equal unconditional predictive ability, we set the null and alternative hypotheses as follows:
\[ H_0: E[\Delta L_{t+1}] = 0 \text{ for } t = n, n+1, \ldots, n+n_f-1 \quad \text{and} \quad H_A: |E[\Delta\bar{L}_{n_f}]| \ge d > 0 \text{ for all } n_f \text{ sufficiently large}, \]
where $n$ is the number of observations used in estimation, $n_f$ is the number of out-of-sample forecasts, and $\Delta\bar{L}_{n_f}$ is the average of loss differences, i.e., $\Delta\bar{L}_{n_f} = n_f^{-1}\sum_{t=n}^{n+n_f-1}\Delta L_{t+1}$. The test statistic is
\[ t_{n_f} = \frac{\sqrt{n_f}\,\Delta\bar{L}_{n_f}}{\hat{\sigma}_{n_f}}, \tag{5.8} \]
where $\hat{\sigma}_{n_f}^2$ is a heteroskedasticity and autocorrelation consistent (HAC) estimator [11, 47] of the asymptotic variance of $\sqrt{n_f}\,\Delta\bar{L}_{n_f}$. The test statistic $t_{n_f}$ coincides with that proposed in [28] and, under the null hypothesis, is asymptotically distributed as a standard normal distribution. Thus, for a significance level $p$, we reject the null hypothesis of equal unconditional predictive ability when $|t_{n_f}| > z_{p/2}$, where $z_{p/2}$ is the $(1 - p/2)$ quantile of $N(0, 1)$.
For the GW test of equal conditional predictive ability, we consider the null hypothesis
\[ H_{0,h}: E[\Delta L_{t+1} \mid I_t] = 0 \text{ for } t = n, n+1, \ldots, n+n_f-1, \]
where $I_t$ is the information set available at time t. For a given choice of test function $f_t$, which is a $q \times 1$ vector measurable with respect to $I_t$ [56], the null hypothesis is equivalent to stating that
\[ H_{0,h}: E[f_t\,\Delta L_{t+1}] = 0 \text{ for } t = n, n+1, \ldots, n+n_f-1. \]
Let $Z_{t+1} = f_t\,\Delta L_{t+1}$ and $\bar{Z}_{n_f} = n_f^{-1}\sum_{t=n}^{n+n_f-1} Z_{t+1}$. Then, the alternative hypothesis is given by
\[ H_{A,h}: E[\bar{Z}_{n_f}]'E[\bar{Z}_{n_f}] \ge d > 0 \text{ for all } n_f \text{ sufficiently large}. \]
We use the following Wald-type test statistic
\[ T_{n_f} = n_f\,\bar{Z}_{n_f}'\,\hat{\Omega}_{n_f}^{-1}\,\bar{Z}_{n_f}, \]
where $\hat{\Omega}_{n_f} = n_f^{-1}\sum_{t=n}^{n+n_f-1} Z_{t+1}Z_{t+1}'$ is a $q \times q$ matrix that consistently estimates the variance of $Z_{t+1}$. Under the null hypothesis, the test statistic $T_{n_f}$ is asymptotically distributed as a chi-square distribution with $q$ degrees of freedom, denoted by $\chi_q^2$. Thus, for a significance level $p$, we reject the null hypothesis of equal conditional predictive ability when $T_{n_f} > \chi_{q,1-p}^2$, where $\chi_{q,1-p}^2$ is the $(1-p)$ quantile of $\chi_q^2$.
In practice, we need to choose the test function $f_t$ that is thought to help distinguish between the two models’ or methods’ forecast performance. Given our focus on one-day-ahead forecasts, we choose a constant and the first-order lagged loss difference as the test function, i.e., $f_t = (1, \Delta L_t)'$ for $t = n, n+1, \ldots, n+n_f-1$. With this test function, rejecting equal conditional predictive ability implies that the lagged loss differences $\Delta L_t$ predict the one-day-ahead loss differences $\Delta L_{t+1}$.
Let $\hat{b}$ denote the coefficient for a regression of $\Delta L_{t+1}$ on $f_t$. Then, the predicted loss differences $\{\hat{b}'f_t\}_{t=n}^{n+n_f-1}$ are useful for assessing the relative performances of the two models at different times. In the empirical study presented in Sect. 5.8, we summarize relative performance by computing the proportion of times that one model has larger predicted losses than the other as follows:
\[ \frac{1}{n_f}\sum_{t=n}^{n+n_f-1} I(\hat{b}'f_t > 0). \tag{5.9} \]
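Both test statistics can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the procedure used in the empirical study: the function names are hypothetical, the HAC variance uses Bartlett (Newey-West) weights with a user-chosen lag length, the conditional test drops the first loss difference so that its lag is available, and the p-value of the conditional test uses the closed form exp(−T/2) valid for q = 2.

```python
import math
import random


def gw_unconditional(d, lags):
    """Statistic (5.8) and two-sided normal p-value for loss differences d."""
    n = len(d)
    dbar = sum(d) / n
    u = [v - dbar for v in d]
    lrv = sum(v * v for v in u) / n
    for j in range(1, lags + 1):
        weight = 1 - j / (lags + 1)          # Bartlett kernel weight
        lrv += 2 * weight * sum(u[t] * u[t - j] for t in range(j, n)) / n
    t_stat = math.sqrt(n) * dbar / math.sqrt(lrv)
    return t_stat, math.erfc(abs(t_stat) / math.sqrt(2))


def gw_conditional(d):
    """Wald-type statistic with f_t = (1, d_t)' and its chi-square(2) p-value."""
    z = [(d[t + 1], d[t] * d[t + 1]) for t in range(len(d) - 1)]
    n = len(z)
    z1 = sum(a for a, _ in z) / n
    z2 = sum(b for _, b in z) / n
    s11 = sum(a * a for a, _ in z) / n
    s12 = sum(a * b for a, b in z) / n
    s22 = sum(b * b for _, b in z) / n
    det = s11 * s22 - s12 ** 2
    # Quadratic form with the explicit inverse of the 2 x 2 matrix.
    stat = n * (s22 * z1 ** 2 - 2 * s12 * z1 * z2 + s11 * z2 ** 2) / det
    return stat, math.exp(-stat / 2)


# Hypothetical loss differences with a positive mean.
rng = random.Random(0)
d = [0.5 + rng.gauss(0, 1) for _ in range(500)]
print(gw_unconditional(d, lags=5))
print(gw_conditional(d))
```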
5.7 EGARCH and Realized EGARCH Models
In the empirical study below, we evaluate the forecast performance of the RSV models. Given that GARCH models (including ARCH [30], GARCH [20], and their extensions) have also been used to describe time-varying volatility, we examine the validity of the RSV models by comparing them with GARCH models as well as SV models. Among numerous GARCH models,² we choose the EGARCH model proposed in [46], which is represented by
\[ y_t = \exp(h_t/2)\,\epsilon_t, \quad \epsilon_t \sim D(0, 1), \quad t = 1, \ldots, n, \]
\[ h_{t+1} = \omega + \varphi(h_t - \omega) + \tau\epsilon_t + \gamma\left(|\epsilon_t| - E[|\epsilon_t|]\right), \quad t = 1, \ldots, n-1, \quad |\varphi| < 1, \]
where $D(0, 1)$ indicates a distribution with zero mean and unit variance. If $\tau < 0$, this model is consistent with volatility asymmetry. If we assume that $h_1 = \omega$, then evaluating this model’s likelihood is straightforward. In the empirical study, we consider the standard normal and Student’s t distributions for $D(0, 1)$ and estimate these models’ parameters using the QML method. We also compute the one-day-ahead volatility, VaR, and ES forecasts by plugging in the QML estimates of the parameters.
² See, for example, Andersen et al. [5] and references therein.
Incorporating the RV, Hansen et al. [36] and Hansen and Huang [35] extend the GARCH and EGARCH models, respectively. Since the REGARCH model is more general than the RGARCH model, we use the REGARCH model, which is given by
\[ y_t = \exp(h_t/2)\,\epsilon_t, \quad \epsilon_t \sim D(0, 1), \quad t = 1, \ldots, n, \]
\[ h_{t+1} = \omega + \varphi(h_t - \omega) + \tau_1\epsilon_t + \tau_2(\epsilon_t^2 - 1) + \gamma v_t, \quad t = 1, \ldots, n-1, \quad |\varphi| < 1, \]
\[ x_t = \zeta + h_t + \delta_1\epsilon_t + \delta_2(\epsilon_t^2 - 1) + v_t, \quad v_t \sim N(0, \sigma_v^2), \quad t = 1, \ldots, n, \tag{5.10} \]
with $h_1 = \omega$. Originally, Hansen and Huang [35] use the following equation instead of (5.10):
\[ x_t = \zeta + \psi h_t + \delta_1\epsilon_t + \delta_2(\epsilon_t^2 - 1) + v_t, \quad v_t \sim N(0, \sigma_v^2). \]
Since the estimate of $\psi$ is usually close to unity, we do not estimate it and instead assume $\psi = 1$. The parameter $\zeta$ corrects the bias in $\log \mathrm{RV}_t$, and $\zeta = 0$ implies that there is no bias. Similarly to the EGARCH model, we consider the standard normal and Student’s t distributions for $D(0, 1)$. Assuming $h_1 = \omega$, we estimate the parameters in these models using the QML method and compute the one-day-ahead forecasts by plugging in the estimates of the parameters. See [23] for other realized GARCH models with Bayesian estimation methods and applications to volatility, VaR, and ES forecasts.
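Because the log-volatility in the EGARCH model is a deterministic function of past returns once $h_1 = \omega$ is fixed, the likelihood follows from a simple recursion. The sketch below evaluates the Gaussian log-likelihood and returns the one-day-ahead log-volatility; the function name, return series, and parameter values are hypothetical, and the optimization step of QML estimation is omitted.

```python
import math


def egarch_loglik(y, omega, phi, tau, gamma):
    """Gaussian log-likelihood of the EGARCH model with h_1 = omega.

    Returns the log-likelihood and the one-day-ahead log-volatility h_{n+1}.
    """
    h = omega
    mean_abs_eps = math.sqrt(2 / math.pi)   # E|eps| for a standard normal
    loglik = 0.0
    for yt in y:
        eps = yt * math.exp(-h / 2)
        loglik += -0.5 * (math.log(2 * math.pi) + h + eps ** 2)
        h = omega + phi * (h - omega) + tau * eps + gamma * (abs(eps) - mean_abs_eps)
    return loglik, h


# Hypothetical daily returns (in percent) and parameter values.
returns = [0.5, -1.2, 0.3, -0.4, 0.9]
ll, h_next = egarch_loglik(returns, omega=0.0, phi=0.95, tau=-0.1, gamma=0.2)
print(ll, math.exp(h_next))
```

Maximizing this function over the parameters with a numerical optimizer would give the QML estimates in the normal case.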
5.8 Empirical Study

In this section, we apply the RSV models to daily (close-to-close) returns of two stock indices, the DJIA and N225. Following [42], we use a 5-min RV, calculated by ignoring the non-trading hours, among various RV estimators. Data sources are the same as in Sect. 4.6. Again, the DJIA sample contains 2596 trading days from June 1, 2009 to September 27, 2019, and the N225 sample contains 2532 trading days from June 1, 2009 to September 30, 2019. Figure 5.1 presents the time series plots of the 5-min RVs and their logarithms as well as histograms of the logarithms for DJIA and N225. We observe that the RVs vary substantially with occasional clusters and some spikes for both series.³ Reflecting such occasional large RV values, their logarithms appear to be positively skewed.

³ For DJIA, the single largest spike in the RV represents the “upheaval” of August 24, 2015 (see, e.g., The New York Times archived at https://bit.ly/36hYzZt). For N225, the largest spike, which occurred on March 15, 2011, represents the effect of the earthquake on March 11, 2011, in Japan.
5 Realized Stochastic Volatility Model
Fig. 5.1 Time series plots of 5-min RVs (top) and their logarithms (middle) as well as histograms of the logarithms (bottom) for DJIA (left) and N225 (right)
Table 5.1 presents the descriptive statistics of the 5-min RVs and their logarithms. For both DJIA and N225, the LB statistic rejects the null of no autocorrelation, which is consistent with highly persistent volatility, known as volatility clustering. The skewness is significantly positive, and the kurtosis shows that the distribution is leptokurtic. Consequently, the JB statistic rejects the normality of the logarithms of 5-min RVs. This contradicts the normality assumption for the error term $u_t$ in the RSV model (5.5), but we retain the assumption in this book. Incorporating the non-normality into the model would be a good subject for future research.
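The kind of statistics discussed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, the standard errors are the usual asymptotic ones under normality ($\sqrt{6/n}$ for skewness and $\sqrt{24/n}$ for kurtosis), and the LB statistic with its heteroskedasticity adjustment is not reproduced. With $n = 2596$ these formulas give about 0.048 and 0.096, which appears consistent with the DJIA standard errors reported in Table 5.1.

```python
import math
import numpy as np

def describe_log_rv(x):
    """Descriptive statistics and Jarque-Bera test for a series of log RVs."""
    x = np.asarray(x, dtype=float)
    n = x.size
    mean = float(x.mean())
    sd = float(x.std(ddof=1))
    z = (x - mean) / x.std(ddof=0)       # standardize with the population SD
    skew = float(np.mean(z ** 3))
    kurt = float(np.mean(z ** 4))        # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return {
        "mean": mean, "se_mean": sd / math.sqrt(n),
        "sd": sd,
        "skew": skew, "se_skew": math.sqrt(6.0 / n),
        "kurt": kurt, "se_kurt": math.sqrt(24.0 / n),
        "min": float(x.min()), "max": float(x.max()),
        # chi-square(2) survival function has the closed form exp(-x/2)
        "jb_pvalue": math.exp(-jb / 2.0),
    }
```

A skewness estimate more than about two standard errors from 0, or a kurtosis estimate more than two standard errors from 3, is what the text refers to as significant.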
Table 5.1 Descriptive statistics of the logarithms of 5-min RVsᵃ for DJIA and N225

| | Mean | SD | Skew | Kurt | Min | Max | JB | LB |
|---|---|---|---|---|---|---|---|---|
| DJIA | −0.987 (0.021) | 1.048 | 0.317 (0.048) | 3.177 (0.096) | −3.953 | 4.087 | 0.00 | 0.00 |
| N225 | −0.858 (0.017) | 0.844 | 0.589 (0.049) | 4.219 (0.097) | −3.321 | 3.258 | 0.00 | 0.00 |

Standard errors are in parentheses
ᵃ Calculated from squared returns in percentage points
JB: p-value of the Jarque–Bera statistic for testing the null hypothesis of normality
LB: p-value of the [43] statistic adjusted for heteroskedasticity following [27] to test the null hypothesis of no autocorrelation up to 10 lags

5.8.1 Estimation Results

We estimate the RSV-N, -T, and -ST models with the following priors:

$$\mu \sim N(0, 100), \quad (\phi + 1)/2 \sim \mathrm{Beta}(20, 1.5), \quad (\rho + 1)/2 \sim \mathrm{Beta}(1, 2), \quad \sigma_\eta^2 \sim IG(2.5, 0.025),$$
$$\nu \sim G(5, 0.5), \quad \beta \sim N(0, 1), \quad \xi \sim N(0, 1), \quad \sigma_u \sim G(2.5, 0.1),$$

where we use informative priors reflecting past empirical studies. As in Chaps. 3 and 4, the multi-move sampler is used to generate the $h_t$'s from their conditional distribution with the number of blocks $K + 1 = 90$. The MCMC algorithm is iterated 50,000 times after discarding 5000 samples as a burn-in period.⁴ For all models and data, the acceptance rates of $\phi$ and $(\sigma_\eta, \rho)$ are both higher than 97%, and those in the multi-move sampler for $h$ are higher than 86%. The acceptance rates of $\nu$ and $\lambda$ in the RSV-T model are 98.9% and 95.7% for DJIA, respectively, and 98.8% and 98.3% for N225, respectively. Further, those of $\nu$, $\lambda$, and $\beta$ in the RSV-ST models are 73.6%, 85.9%, and 99.8% for DJIA, respectively, and 80.6%, 91.3%, and 99.9% for N225, respectively.

Figure 5.2 shows the traceplots, sample autocorrelation functions, and density estimates of the MCMC samples for the RSV-ST model applied to DJIA. The chains mix well, and the sample autocorrelation functions decay and vanish quickly for $\xi$ and $\sigma_u$. For the other parameters, the chains mix better, and the sample autocorrelation functions decay more quickly than those for the SV-ST model in Chap. 4. Hence, as in [50] and [59], we observe that adding the RV equation (5.4) improves the sampling efficiency because it provides additional information on the latent volatility.

Table 5.2 summarizes the estimation results for DJIA. For the parameters $(\mu, \phi, \sigma_\eta, \rho)$, we confirm similar results as observed in Chap. 4, namely high persistence and asymmetry of volatility. Compared to the SV models, the posterior means of $\rho$ are significantly higher (closer to 0), and consequently, the leverage effects are weaker for the RSV models. We also observe that the inefficiency factors decreased substantially compared to the SV models.
This further supports that the RV improves the sampling efficiency. These results are consistent with [59]. The results for the tail and skewness parameters $\nu$ and $\beta$ are somewhat different compared to the SV models. The posterior means of $\nu$ for the RSV-T and -ST models are larger than those for the SV-T and -ST models, respectively, although not significantly so. The posterior mean of $\beta$ for the RSV-ST model is, though not significantly, somewhat closer to 0. Consequently, the kurtosis and skewness of the error distribution are both weaker than in the SV models. Despite these differences, the standardized and unstandardized density functions of $\epsilon_t$ for the RSV-N, -T, and -ST models shown in Fig. 5.3 are quite similar to those for the SV models.

Figure 5.4 shows the NIC for the RSV-N, -T, and -ST models, estimated as in Chap. 4. The shapes of the NICs are largely the same across the models for the three levels of $h_t$. We also observe that the NIC for the RSV-ST model is located below the other two models, except in the region of high returns. Note that the leverage effect and NIC for the RSV models, as well as the kurtosis and skewness of the error distribution, can be calculated irrespective of $\xi$ and $\sigma_u$ because of the independence assumption on $u_t$.

For the additional parameters $(\xi, \sigma_u)$, we confirm the results observed in previous studies. The posterior means of $\xi$ are negative, and the credible intervals are away from 0 for all models, showing the downward bias of the 5-min RV, due mainly to the non-trading hours. We also observe that the results for $(\xi, \sigma_u)$ are nearly identical across the models.

⁴ Again, the computational results are generated using Ox (version 8.20).

Fig. 5.2 Traceplots (left), sample autocorrelation functions (center), and density estimates (right) of the MCMC samples for the RSV-ST model applied to DJIA (panels for μ, φ, ση, ρ, ν, β, ξ, and σu)
Table 5.2 MCMC estimation results of the RSV-N, -T, and -ST models for DJIA

| | RSV-N | RSV-T | RSV-ST |
|---|---|---|---|
| μ | −0.520 [0.087]ᵃ (−0.687, −0.346)ᵇ 7ᶜ | −0.477 [0.089] (−0.647, −0.299) 9 | −0.526 [0.085] (−0.688, −0.354) 9 |
| φ | 0.927 [0.008] (0.911, 0.942) 18 | 0.931 [0.008] (0.916, 0.945) 20 | 0.930 [0.007] (0.914, 0.944) 15 |
| ση | 0.317 [0.015] (0.289, 0.347) 35 | 0.307 [0.014] (0.281, 0.336) 40 | 0.313 [0.013] (0.288, 0.339) 32 |
| ρ | −0.483 [0.035] (−0.550, −0.414) 23 | −0.511 [0.036] (−0.581, −0.440) 23 | −0.515 [0.034] (−0.582, −0.447) 33 |
| ν | | 19.629 [4.391] (12.738, 29.965) 108 | 20.454 [4.300] (13.805, 30.095) 295 |
| β | | | −0.833 [0.238] (−1.366, −0.438) 145 |
| ξ | −0.258 [0.036] (−0.329, −0.187) 27 | −0.282 [0.039] (−0.359, −0.208) 28 | −0.257 [0.037] (−0.331, −0.186) 42 |
| σu | 0.508 [0.011] (0.486, 0.530) 17 | 0.513 [0.011] (0.491, 0.535) 17 | 0.513 [0.011] (0.492, 0.535) 15 |
| Leverage | −0.483 [0.035]ᵃ (−0.550, −0.414)ᵇ | −0.504 [0.035] (−0.571, −0.433) | −0.484 [0.031] (−0.544, −0.423) |
| Kurtosis | | 3.415 [0.118] (3.231, 3.687) | 3.596 [0.184] (3.324, 4.034) |
| Skewness | | | −0.326 [0.072] (−0.470, −0.190) |
| Log-ML | −5313.78 (0.21)ᵈ | −5318.82 (0.29) | −5302.64 (0.49) |

ᵃ Posterior mean with standard deviation in brackets
ᵇ 95% credible interval
ᶜ Inefficiency factor
ᵈ Log-marginal likelihood with standard error in parentheses
The log-marginal likelihood for the RSV-ST model is significantly higher than those for the RSV-N and RSV-T models, while that for the RSV-N model is significantly higher than that for the RSV-T model. Thus, incorporating the skewness improves the goodness of fit, but incorporating the heavy tail does not. This reflects the larger estimates of ν and significantly negative estimates of β and is in line with [58].
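The comparison of log-marginal likelihoods can be made concrete with a small Python sketch that uses the Log-ML values and standard errors reported in Table 5.2 for DJIA. The variable names are hypothetical, and combining the standard errors as if the three estimates were independent is an assumption made only for this illustration.

```python
import math

# Log-marginal likelihoods and their standard errors from Table 5.2 (DJIA)
log_ml = {"RSV-N": -5313.78, "RSV-T": -5318.82, "RSV-ST": -5302.64}
std_err = {"RSV-N": 0.21, "RSV-T": 0.29, "RSV-ST": 0.49}

best = max(log_ml, key=log_ml.get)  # model with the highest log-ML

# Log Bayes factor of the best model against each model
log_bf = {m: log_ml[best] - v for m, v in log_ml.items()}

# Standard error of each difference, assuming independent estimates
se_diff = {m: math.sqrt(std_err[best] ** 2 + std_err[m] ** 2)
           for m in log_ml if m != best}

for m in se_diff:
    print(f"{best} vs {m}: log BF = {log_bf[m]:.2f} (s.e. {se_diff[m]:.2f})")
```

Under these assumptions the RSV-ST model leads the RSV-N and RSV-T models by about 11.1 and 16.2 log points, respectively, each many standard errors away from zero.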
Fig. 5.3 Standardized (left) and unstandardized (right) return densities of the RSV-N, -T, and -ST models for DJIA
Fig. 5.4 NIC for DJIA with today’s volatility h t fixed at μ − 2ση (left), μ (center), and μ + 2ση (right), calculated from the posterior means of the RSV-ST model
Table 5.3 and Figs. 5.5, 5.6 and 5.7 present the MCMC estimation results for N225. Although some results are qualitatively the same as those for DJIA and for the SV models in Chap. 4, some are quite different. Notably, the 95% credible interval of β contains 0 and the skewness is no longer significant, whereas the estimates of ν are large and the kurtosis has decreased. Consequently, the estimated return densities are similar across the models, and the estimates of log-ML show that the RSV-N model outperforms the other two models. These results appear to contradict the strong excess kurtosis and negative skewness of daily returns observed in Chap. 4. This may be due to the normality assumption on the error terms of log-RVs because their skewness and kurtosis for N225 in Table 5.1 show stronger non-normality than for DJIA. We leave further investigation on this matter to future research.
Table 5.3 MCMC estimation results of the RSV-N, -T, and -ST models for N225

| | RSV-N | RSV-T | RSV-ST |
|---|---|---|---|
| μ | 0.247 [0.073]ᵃ (0.107, 0.391)ᵇ 9ᶜ | 0.273 [0.074] (0.129, 0.422) 9 | 0.168 [0.073] (0.027, 0.314) 6 |
| φ | 0.913 [0.010] (0.893, 0.932) 12 | 0.916 [0.010] (0.896, 0.934) 18 | 0.919 [0.009] (0.900, 0.936) 14 |
| ση | 0.292 [0.014] (0.266, 0.319) 25 | 0.286 [0.013] (0.261, 0.313) 39 | 0.279 [0.012] (0.255, 0.302) 32 |
| ρ | −0.281 [0.037] (−0.353, −0.209) 15 | −0.290 [0.037] (−0.364, −0.217) 18 | −0.295 [0.035] (−0.364, −0.226) 18 |
| ν | | 24.062 [5.108] (15.879, 35.845) 106 | 26.843 [5.729] (17.219, 39.815) 226 |
| β | | | −0.254 [0.220] (−0.718, 0.149) 50 |
| ξ | −1.045 [0.033] (−1.111, −0.981) 34 | −1.069 [0.035] (−1.137, −1.000) 35 | −0.965 [0.032] (−1.029, −0.903) 27 |
| σu | 0.401 [0.011] (0.380, 0.422) 16 | 0.404 [0.011] (0.383, 0.425) 24 | 0.420 [0.010] (0.400, 0.439) 16 |
| Leverage | −0.281 [0.037]ᵃ (−0.353, −0.209)ᵇ | −0.287 [0.037] (−0.360, −0.215) | −0.290 [0.034] (−0.357, −0.223) |
| Kurtosis | | 3.319 [0.082] (3.188, 3.505) | 3.281 [0.071] (3.170, 3.452) |
| Skewness | | | −0.070 [0.058] (−0.186, 0.041) |
| Log-ML | −5926.10 (0.49)ᵈ | −5933.85 (0.60) | −5936.27 (0.66) |

ᵃ, ᵇ, ᶜ, ᵈ See Table 5.2 for details

5.8.2 Prediction Results

Following [58], we compute the volatility, VaR, and ES forecasts using a rolling window estimation scheme with a fixed window size. For DJIA, the fixed window size is 1993, and the last observation dates vary within the period April 28, 2017 to September 26, 2019. For N225, the window size is 1942, and the last observation dates vary within the period April 28, 2017 to September 27, 2019. We compute one-day-ahead forecasts at each estimation. In particular, we generate 15,000 prediction samples and compute the medians and means of volatility forecasts for the SV and RSV models, respectively, as well as the quantiles of return forecasts.⁵ Eventually, we obtain 603 and 590 one-day-ahead forecasts from the beginning of May 2017 to the end of September 2019 for DJIA and N225, respectively.

⁵ We compute the median of volatility forecasts instead of the mean because the means for the SV models are quite large for days with a few outliers in the posterior samples.
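The rolling window scheme and the summary of the prediction samples can be sketched in Python as follows. The function names are hypothetical, and estimating the ES as the average of the simulated returns at or below the VaR quantile is an assumption for illustration rather than the procedure confirmed by the text. The sample and window sizes stated above reproduce the reported forecast counts: 2596 − 1993 = 603 for DJIA and 2532 − 1942 = 590 for N225.

```python
import numpy as np

def rolling_windows(n_obs, window):
    """Yield (start, end) so that observations [start, end) are used for
    estimation and observation `end` is the one-day-ahead forecast target."""
    for end in range(window, n_obs):
        yield end - window, end

def summarize_draws(vol_draws, ret_draws, alpha):
    """Summarize predictive draws of volatility and returns for one day."""
    vol_draws = np.asarray(vol_draws, dtype=float)
    ret_draws = np.asarray(ret_draws, dtype=float)
    var = float(np.quantile(ret_draws, alpha))        # VaR as the alpha-quantile
    es = float(ret_draws[ret_draws <= var].mean())    # mean return beyond the VaR
    return {
        "vol_mean": float(vol_draws.mean()),
        "vol_median": float(np.median(vol_draws)),    # robust to outlying draws
        "var": var,
        "es": es,
    }
```

The median is much less sensitive than the mean to a few very large volatility draws, which is the motivation given in the footnote for using it with the SV models.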
Fig. 5.5 Traceplots (left), sample autocorrelation functions (center), and density estimates (right) of the MCMC samples for the RSV-ST model applied to N225 (panels for μ, φ, ση, ρ, ν, β, ξ, and σu)
Fig. 5.6 Standardized (left) and unstandardized (right) return densities of the RSV-N, -T, and -ST models for N225
5.8.2.1 Volatility Forecasts

To evaluate the volatility forecasts, we calculate their average loss using MSE and QLIKE, as described in Sect. 5.6.2. The MSE and QLIKE losses are robust loss functions: they provide a ranking that is consistent with the one based on the true volatility as long as the volatility proxy is a conditionally unbiased estimator of the true volatility.
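As a rough illustration of the two losses, the following Python sketch uses the squared error for MSE and the normalized form of QLIKE commonly used in the robust-loss literature. The exact expressions of Sect. 5.6.2 are not reproduced here, so these definitions and the function names are assumptions.

```python
import numpy as np

def mse_loss(proxy, forecast):
    """Squared-error loss between the volatility proxy and the forecast."""
    proxy, forecast = np.asarray(proxy, float), np.asarray(forecast, float)
    return (proxy - forecast) ** 2

def qlike_loss(proxy, forecast):
    """QLIKE loss in normalized form; it is zero when forecast == proxy."""
    ratio = np.asarray(proxy, float) / np.asarray(forecast, float)
    return ratio - np.log(ratio) - 1.0

def average_losses(proxy, forecast):
    """Average MSE and QLIKE over the forecast period."""
    return {"MSE": float(np.mean(mse_loss(proxy, forecast))),
            "QLIKE": float(np.mean(qlike_loss(proxy, forecast)))}
```

Rescaling both the proxy and the forecast by a constant c multiplies this squared-error loss by c² and leaves this QLIKE unchanged, so a single badly over-predicted day moves the average MSE far more than the average QLIKE.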
Fig. 5.7 NIC for N225 with today’s volatility h t fixed at μ − 2ση (left), μ (center), and μ + 2ση (right), calculated from the posterior means of the RSV-ST model
To mitigate the bias caused by market microstructure noise, we use the RK with the Parzen kernel function and the bandwidth suggested in [16] as a proxy for the latent volatility. Moreover, considering the bias due to non-trading hours, we follow [37] and adjust the effect of non-trading hours on the RK as in (5.1). Specifically, we compute the adjustment term $c_{HL}$ for the RK at time $t$ as follows:

$$c_{HL} = \frac{\sum_{s=t-n+1}^{t} (y_s - \bar{y})^2}{\sum_{s=t-n+1}^{t} RK_s} \quad \text{and} \quad \bar{y} = \frac{1}{n} \sum_{s=t-n+1}^{t} y_s,$$

where $n$ denotes the fixed window size.

Figure 5.8 shows the volatility forecasts and the adjusted RKs for DJIA. We observe that the forecasts of the SV models exhibit some spikes. In particular, the SV-ST model has considerably over-predicted volatility at the end of 2018 compared to other models. Figure 5.9 shows similar findings for N225.

Fig. 5.8 Volatility forecasts and adjusted RKs of DJIA for all period (top) and subperiods (bottom)

Fig. 5.9 Volatility forecasts and adjusted RKs of N225 for all period (top) and subperiods (bottom)

Table 5.4 presents the average losses of volatility forecasts for DJIA with p-values of the model confidence set (MCS) given by Hansen and Lunde [37].⁶ The RSV-N model provides the lowest MSE and QLIKE. Noting that the MSE and QLIKE values with ∗∗ and ∗ indicate that the models are in the 75% and 90% MCS, respectively, we observe that, in general, the volatility forecasts of the RSV and REGARCH models are superior to those of the SV and EGARCH models. Additionally, the MSE for the SV-ST model is considerably higher than that of the other models, reflecting the over-predicted volatility at the end of 2018. Note that the MSE and QLIKE are loss functions that are homogeneous of degree 2 and 0, respectively, and hence, the QLIKE for the SV-ST model is higher than that of the other models but to a lesser extent.

⁶ Our rolling window estimation scheme satisfies the stationarity assumption of the MCS procedure with bootstrap methods. We calculate the MCS using 1000 bootstrap replications and a block size equal to 10. See Sect. 4.2 in [34] for details.

Table 5.4 Average losses of volatility forecasts for DJIA

| | MSE | pMCS | QLIKE | pMCS |
|---|---|---|---|---|
| SV-N | 0.2132 | 1.00∗∗ | 0.3019 | 0.00 |
| SV-T | 0.2280 | 1.00∗∗ | 0.2990 | 0.00 |
| SV-ST | 0.8092 | 0.47∗∗ | 0.3213 | 0.00 |
| RSV-N | 0.2075 | 1.00∗∗ | 0.2526 | 1.00∗∗ |
| RSV-T | 0.2156 | 1.00∗∗ | 0.2587 | 1.00∗∗ |
| RSV-ST | 0.2137 | 1.00∗∗ | 0.2567 | 1.00∗∗ |
| EGARCH-N | 0.2207 | 1.00∗∗ | 0.3580 | 0.00 |
| EGARCH-T | 0.2359 | 1.00∗∗ | 0.3615 | 0.00 |
| REGARCH-N | 0.2100 | 1.00∗∗ | 0.2767 | 0.25∗∗ |
| REGARCH-T | 0.2133 | 1.00∗∗ | 0.2794 | 0.25∗∗ |

pMCS: MCS p-value
∗∗, ∗: models in 75% and 90% MCS, respectively

The results of the GW tests on the loss differences are summarized in Table 5.5.⁷ The lower triangular elements present the test statistics of the unconditional GW test defined in (5.8), where a positive value indicates that the row model has a higher QLIKE than the column model. On the other hand, the upper triangular elements present the proportion (in percentage) of the predicted losses for the conditional GW test defined in (5.9), where a higher value indicates that the row model has a higher QLIKE than the column model. The numbers with ∗∗ and ∗ indicate the rejection of the GW tests at the 1% and 5% significance levels, respectively. We observe that the RSV models are significantly better than the SV and EGARCH models, while the REGARCH models are significantly better than the EGARCH models. Thus, we confirm that including the RV improves the volatility forecasts. We also observe that the SV models are significantly better than the EGARCH models. These results are consistent with [59]. Although the t statistics (5.8) and the proportions (5.9) show that the forecasting losses are smaller for the RSV models than for the REGARCH models, there is no significant difference among them.

⁷ Given that [52] show that the QLIKE loss has higher power than the MSE in the DMW tests, equivalent to the unconditional GW tests, we only report the results for QLIKE.

Table 5.5 Summary of GW tests on QLIKE of volatility forecasts for DJIA

| | SV-N | SV-T | SV-ST | RSV-N | RSV-T | RSV-ST | EG-N | EG-T | REG-N | REG-T |
|---|---|---|---|---|---|---|---|---|---|---|
| SV-N | | 72 | 0 | 93∗∗ | 90∗∗ | 92∗∗ | 4∗∗ | 3∗∗ | 87 | 87 |
| SV-T | −0.5 | | 10 | 96∗∗ | 94∗∗ | 95∗∗ | 7∗∗ | 4∗∗ | 89 | 88 |
| SV-ST | 1.8 | 2.6∗∗ | | 98∗∗ | 98∗∗ | 97∗∗ | 6∗ | 5∗ | 95 | 95 |
| RSV-N | −3.5∗∗ | −3.8∗∗ | −4.3∗∗ | | 20 | 21 | 5∗∗ | 4∗∗ | 2 | 1 |
| RSV-T | −2.8∗∗ | −3.2∗∗ | −4.0∗∗ | 1.3 | | 64 | 6∗∗ | 5∗∗ | 4 | 2 |
| RSV-ST | −2.9∗∗ | −3.1∗∗ | −4.0∗∗ | 1.0 | −0.3 | | 5∗∗ | 4∗∗ | 6 | 4 |
| EG-N | 4.1∗∗ | 4.4∗∗ | 2.3∗ | 5.2∗∗ | 4.9∗∗ | 4.7∗∗ | | 36 | 95∗∗ | 96∗∗ |
| EG-T | 4.4∗∗ | 4.7∗∗ | 2.3∗ | 5.5∗∗ | 5.1∗∗ | 4.9∗∗ | 0.8 | | 95∗∗ | 96∗∗ |
| REG-N | −1.0 | −1.0 | −1.6 | 1.4 | 1.1 | 1.1 | −3.1∗∗ | −3.4∗∗ | | 0 |
| REG-T | −0.9 | −0.8 | −1.4 | 1.4 | 1.1 | 1.2 | −2.9∗∗ | −3.1∗∗ | 1.2 | |

EG: EGARCH; REG: REGARCH
Lower triangular: test statistic of unconditional GW tests defined in (5.8), where a positive value indicates that the row model has a higher loss than the column model
Upper triangular: proportion in percentage that the row model has higher predicted losses than the column model defined in (5.9) for the conditional GW test
∗∗, ∗: rejection of the GW test at the 1% and 5% significance levels, respectively

It is worthwhile to check the differences in the volatility forecasts between models over time. Figure 5.10 shows the models' cumulative loss differences (CLDs) in comparison to the SV-N model. At each time point, the cumulative QLIKE of each model is subtracted from that of the SV-N model.⁸ Thus, a higher CLD implies higher accuracy. We observe that the values for the RSV and REGARCH models are higher than those for the SV and EGARCH models throughout the prediction period, confirming that the RV improves volatility forecasts. We also observe that the difference between the RSV and REGARCH models can be attributed to the forecast for late 2017.⁹

⁸ The results for the MSE are omitted due to proneness to outliers, which makes model comparison in this regard difficult.

⁹ To be precise, the RK on December 1, 2017 was 1.29, and the SV-N model predicted around 0.24, while the RSV and REGARCH models predicted around 0.19 and 0.07, respectively.
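The cumulative loss differences and the December 1, 2017 example can be illustrated with a short Python sketch. The normalized QLIKE form and the function names are assumptions, and the inputs are the rounded values quoted in the footnote above, so the resulting losses are only indicative.

```python
import numpy as np

def qlike(proxy, forecast):
    """QLIKE loss in normalized form (assumed; zero when forecast == proxy)."""
    ratio = np.asarray(proxy, float) / np.asarray(forecast, float)
    return ratio - np.log(ratio) - 1.0

def cumulative_loss_difference(loss_benchmark, loss_model):
    """CLD path: cumulative benchmark loss minus cumulative model loss,
    so a higher value means the model is more accurate than the benchmark."""
    diff = np.asarray(loss_benchmark, float) - np.asarray(loss_model, float)
    return np.cumsum(diff)

# Rounded values quoted for December 1, 2017 (DJIA)
rk = 1.29
forecasts = {"SV-N": 0.24, "RSV": 0.19, "REGARCH": 0.07}
day_loss = {m: float(qlike(rk, f)) for m, f in forecasts.items()}

# One-day contribution of each model to its CLD against the SV-N benchmark
day_cld = {m: day_loss["SV-N"] - v for m, v in day_loss.items()}
```

With these inputs the single-day QLIKE is roughly 2.7 for SV-N, 3.9 for RSV, and 14.5 for REGARCH. Under these assumptions, that one day lowers the REGARCH CLD by about 12 and the RSV CLD by about 1, which is in line with the gap between the RSV and REGARCH models being attributed to late 2017.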
Fig. 5.10 Cumulative loss differences (QLIKE) in comparison to the SV-N model for DJIA

Table 5.6 Average losses of volatility forecasts for N225

| | MSE | pMCS | QLIKE | pMCS |
|---|---|---|---|---|
| SV-N | 0.4983 | 1.00∗∗ | 0.1970 | 0.00 |
| SV-T | 0.5026 | 1.00∗∗ | 0.2079 | 0.00 |
| SV-ST | 0.6408 | 0.48∗∗ | 0.2169 | 0.00 |
| RSV-N | 0.4962 | 1.00∗∗ | 0.1434 | 0.40∗∗ |
| RSV-T | 0.5011 | 1.00∗∗ | 0.1456 | 0.00 |
| RSV-ST | 0.5134 | 1.00∗∗ | 0.1416 | 1.00∗∗ |
| EGARCH-N | 0.5715 | 0.15∗ | 0.2228 | 0.00 |
| EGARCH-T | 0.5690 | 0.00 | 0.2386 | 0.00 |
| REGARCH-N | 0.5236 | 1.00∗∗ | 0.1551 | 0.00 |
| REGARCH-T | 0.5205 | 1.00∗∗ | 0.1532 | 0.00 |

pMCS: MCS p-value
∗∗, ∗: models in 75% and 90% MCS, respectively
Tables 5.6 and 5.7 summarize the volatility forecasts for N225. The RSV-N model provides the lowest MSE, while the RSV-ST model gives the lowest QLIKE. We confirm that the results are similar to those for DJIA. Again, the MSE for the SV-ST model is considerably higher than that for the other models, reflecting the over-predicted volatility at the end of 2018. The RSV and REGARCH models are significantly better than the SV and EGARCH models; that is, including the RV improves the volatility forecasts. Additionally, there are significant differences within the same class of models with different return distributions. For example, the RSV-N model is significantly better than the RSV-T model, while the REGARCH-T model is significantly superior to the REGARCH-N model. These differences can be seen in Fig. 5.11.
Table 5.7 Summary of GW tests on QLIKE of volatility forecasts for N225 SV SV SV RSV RSV RSV EG EG -N SV-N SV-T SV-ST RSV-N RSV-T RSVST EG-N EG-T REGN REG-T
-T
-ST
18∗∗
0∗∗
-N
-T
-ST
-N
-T
REG
REG
-N
-T
96∗∗
96∗∗
97∗∗
95∗∗
95∗∗
1.8 16 3.5∗∗ 1.6 99∗∗ 99∗∗ ∗∗ ∗∗ ∗∗ ∗∗ −3.7 −3.4 −4.6 0 −3.5∗∗ −3.3∗∗ −4.4∗∗ 4.1∗∗ −3.8∗∗ −3.5∗∗ −4.5∗∗ −0.8 −1.9
96∗∗
2.1∗ 1.0 0.5 4.0∗∗ 2.6∗∗ 2.2∗ −2.8∗∗ −2.7∗∗ −3.8∗∗
6.9∗∗ 6.6∗∗ 4.9∗∗
6.7∗∗ 6.4∗∗ 4.8∗∗
6.7∗∗ 4∗∗ ∗∗ ∗∗ 6.5 2.8 4.6∗∗ −6.4∗∗ −6.0∗∗
−3.0∗∗ −2.8∗∗ −3.9∗∗
4.6∗∗
4.4∗∗
4.2∗∗ −6.5∗∗ −6.1∗∗ −5.8∗∗
99∗∗ 93 99
11 19 23
8∗∗
4∗∗
3∗∗
3∗∗
4∗∗
3∗∗
5∗∗
3∗∗
3∗∗
1∗∗
11∗ 10∗
96∗∗ 94∗∗ 99∗∗
97∗∗ 98∗∗
96∗∗ 95∗∗ 99∗∗ 3∗∗ 5∗∗ 1∗∗ 97∗∗ 98∗∗ 98∗∗
See Table 5.5 for details
Fig. 5.11 Cumulative loss differences (QLIKE) in comparison to the SV-N model for N225
5.8.2.2 VaR and ES Forecasts

Figure 5.12 shows the VaR and ES forecasts of the RSV-ST and REGARCH-T models for DJIA. We observe that the VaR violations (actual returns falling below the VaR forecasts) appear similarly across the models not only in the volatile period, such as early 2018 and late 2019, but also in the serene period. The other models' VaR and ES forecasts move similarly with slightly different levels and are thus omitted for brevity.

Fig. 5.12 VaR and ES forecasts of the RSV-ST (left) and REGARCH-T (right) models for DJIA

Table 5.8 presents the violation rates of VaR forecasts, $\hat{\alpha}$, and the average losses using the FZ0 loss function given by Patton et al. [53] and defined in (5.7). We observe that the SV-ST and EGARCH-T models give violation rates closest to the target rates of α = 1% and 5%, respectively. For the target rate α = 1%, the violation rates are larger than 1% for all the models, which indicates that the actual returns fall below the VaR forecasts more often than they should. For α = 5%, the violation rates are fairly close to 5%. On the other hand, the SV-T and RSV-ST models give the lowest FZ0 loss for α = 1% and 5%, respectively, implying that these models give better forecasts on the tail of the return distribution. We also observe that all the models are in the 75% MCS for α = 1% and 5%, except the SV-ST model for α = 5%, which is in the 90% MCS only.

Table 5.8 Violation rates and FZ0 losses of VaR and ES forecasts for DJIA

| | α̂ (α = 1%) | FZ0 (α = 1%) | pMCS (α = 1%) | α̂ (α = 5%) | FZ0 (α = 5%) | pMCS (α = 5%) |
|---|---|---|---|---|---|---|
| SV-N | 2.32 | 1.0714 | 1.00∗∗ | 4.64 | 0.5965 | 0.98∗∗ |
| SV-T | 1.99 | 1.0483 | 1.00∗∗ | 4.31 | 0.5809 | 1.00∗∗ |
| SV-ST | 1.33 | 1.1224 | 1.00∗∗ | 4.31 | 0.7012 | 0.13∗ |
| RSV-N | 2.99 | 1.2445 | 0.69∗∗ | 5.64 | 0.5584 | 1.00∗∗ |
| RSV-T | 2.82 | 1.1419 | 1.00∗∗ | 5.80 | 0.5465 | 1.00∗∗ |
| RSV-ST | 2.16 | 1.0689 | 1.00∗∗ | 5.64 | 0.5387 | 1.00∗∗ |
| EGARCH-N | 2.65 | 1.2928 | 0.58∗∗ | 4.81 | 0.6226 | 0.40∗∗ |
| EGARCH-T | 1.82 | 1.0797 | 1.00∗∗ | 4.98 | 0.5855 | 1.00∗∗ |
| REGARCH-N | 3.15 | 1.3273 | 0.31∗∗ | 5.47 | 0.5652 | 1.00∗∗ |
| REGARCH-T | 2.99 | 1.3202 | 0.35∗∗ | 5.47 | 0.5621 | 1.00∗∗ |

α̂: violation rate (%)
FZ0: average FZ0 loss
pMCS: MCS p-value
∗∗, ∗: models in 75% and 90% MCS, respectively
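The violation rate and the FZ0 loss can be sketched in Python as follows. The FZ0 expression below is the form commonly attributed to Patton et al., written from memory rather than copied from (5.7), so it should be read as an assumption. It requires negative ES forecasts, and the function names are hypothetical.

```python
import numpy as np

def violation_rate(returns, var_forecasts):
    """Percentage of days on which the return falls below the VaR forecast."""
    returns = np.asarray(returns, float)
    var_forecasts = np.asarray(var_forecasts, float)
    return 100.0 * float(np.mean(returns < var_forecasts))

def fz0_loss(returns, var_forecasts, es_forecasts, alpha):
    """FZ0 loss for joint VaR and ES forecasts (assumed form; requires ES < 0)."""
    y = np.asarray(returns, float)
    v = np.asarray(var_forecasts, float)
    e = np.asarray(es_forecasts, float)
    hit = (y <= v).astype(float)          # 1 on violation days, 0 otherwise
    return -hit * (v - y) / (alpha * e) + v / e + np.log(-e) - 1.0
```

As a consistency check on Table 5.8, 8 violations out of the 603 DJIA forecasts give a rate of about 1.33%, and 14 violations give about 2.32%. Unlike the violation rate, the FZ0 loss grows with the size of the shortfall on violation days, which is why the two criteria can rank the models differently.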
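Table 5.9 Summary of GW tests on FZ0 loss of VaR and ES forecasts for DJIA

α = 1%

| | SV-N | SV-T | SV-ST | RSV-N | RSV-T | RSV-ST | EG-N | EG-T | REG-N | REG-T |
|---|---|---|---|---|---|---|---|---|---|---|
| SV-N | | 94 | 6 | 1 | 2 | 67 | 1 | 23 | 1 | 1 |
| SV-T | −0.4 | | 7 | 0 | 0 | 0 | 2∗ | 2 | 0 | 0 |
| SV-ST | 0.5 | 0.8 | | 2 | 2 | 92 | 2 | 71 | 0 | 0 |
| RSV-N | 1.1 | 1.4 | 0.6 | | 100∗ | 100 | 2 | 99 | 1 | 1 |
| RSV-T | 0.6 | 0.9 | 0.1 | −2.4∗ | | 93 | 2 | 98 | 0∗ | 0∗ |
| RSV-ST | 0.0 | 0.2 | −0.4 | −1.7 | −1.0 | | 1 | 36 | 0 | 0 |
| EG-N | 1.5 | 1.8 | 0.9 | 0.3 | 1.0 | 1.2 | | 100 | 1 | 1 |
| EG-T | 0.1 | 0.4 | −0.3 | −1.0 | −0.4 | 0.1 | −2.1∗ | | 0 | 0 |
| REG-N | 1.4 | 1.7 | 0.9 | 1.3 | 2.0∗ | 1.7 | 0.2 | 1.4 | | 98 |
| REG-T | 1.4 | 1.6 | 0.9 | 1.2 | 2.0∗ | 1.6 | 0.2 | 1.4 | −1.8 | |

α = 5%

| | SV-N | SV-T | SV-ST | RSV-N | RSV-T | RSV-ST | EG-N | EG-T | REG-N | REG-T |
|---|---|---|---|---|---|---|---|---|---|---|
| SV-N | | 96 | 1 | 96 | 97 | 100 | 2 | 80 | 95 | 96 |
| SV-T | −0.6 | | 0 | 99 | 99 | 99 | 2 | 25 | 98 | 98 |
| SV-ST | 1.7 | 2.0∗ | | 100 | 100 | 100 | 98 | 98 | 100 | 100 |
| RSV-N | −1.0 | −0.6 | −2.1∗ | | 100 | 99 | 1 | 1 | 8 | 11 |
| RSV-T | −1.3 | −1.0 | −2.3∗ | −1.8 | | 81 | 1 | 1 | 4 | 5 |
| RSV-ST | −1.7 | −1.2 | −2.4∗ | −1.0 | −0.4 | | 2 | 4 | 9 | 10 |
| EG-N | 0.8 | 1.5 | −1.1 | 1.4 | 1.7 | 1.8 | | 99∗ | 99 | 99 |
| EG-T | −0.4 | 0.2 | −1.7 | 0.6 | 0.8 | 1.0 | −2.3∗ | | 99 | 100 |
| REG-N | −0.8 | −0.5 | −2.0∗ | 0.5 | 1.3 | 1.0 | −1.4 | −0.4 | | 97∗ |
| REG-T | −0.8 | −0.5 | −2.0∗ | 0.3 | 1.1 | 0.9 | −1.5 | −0.5 | −2.5∗ | |

See Table 5.5 for details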
Table 5.9 summarizes the GW tests on the FZ0 loss, where the lower and upper triangular elements represent the same quantities as in Table 5.5. The test statistics for the unconditional GW tests in the lower triangular elements show that the RSV-N and REGARCH models give significantly higher FZ0 losses than the RSV-T model for α = 1%, while the SV-T, RSV, and REGARCH models are significantly better than the SV-ST model for α = 5%. Although there is no significant difference among the RSV models, the EGARCH-T and REGARCH-T models are significantly better than the EGARCH-N and REGARCH-N models, respectively, indicating the importance of a heavy tail in forecasting the return distribution. The proportions of the predicted losses for the conditional GW test in the upper triangular elements confirm that the RSV-T model is significantly better than the RSV-N and REGARCH models for α = 1% and that the EGARCH-T and REGARCH-T models are superior to the EGARCH-N and REGARCH-N models for α = 5%.

Figure 5.13 shows the models' CLDs in comparison to the SV-N model. Again, a higher CLD implies higher accuracy. We observe some spikes, notably in early 2018 and late 2019, that coincide with the large volatility spikes presented in Fig. 5.8. For α = 1%, most models, especially the REGARCH models, performed poorly in early 2018 and became significantly worse than the SV-N model. After the spike, the SV-T and RSV-ST models outperformed and finished better than the SV-N model. For α = 5%, on the other hand, the RSV and REGARCH models performed better than the SV and EGARCH models after July 2018 until the end. Given that the FZ0 loss evaluates the VaR and ES jointly and accounts for the tail information that the violation rate cannot capture, these results imply that the RV, as well as skewness and a heavy tail on daily returns, are somewhat helpful in forecasting the VaR and ES.

Fig. 5.13 Cumulative loss differences (FZ0) in comparison to the SV-N model for DJIA

The prediction results for N225 are presented in Tables 5.10 and 5.11 and Figs. 5.14 and 5.15. We observe that the VaR violations are fairly close to the target rates for all the models. For both α = 1% and α = 5%, the RSV-ST model gives the lowest FZ0 loss. Notably, the RSV models are in the 90% MCS for both α = 1% and α = 5%, but the SV models are not, except for the SV-T model for α = 5%. The unconditional GW tests summarized in the lower triangular elements show that the RSV and REGARCH models outperform the SV and EGARCH models in many cases but that this trend does not hold for the reverse. The conditional GW tests in the upper triangular elements, on the other hand, indicate the superiority of the RSV models to the REGARCH models in some cases. The CLDs show that the RSV and REGARCH models are better than the SV and EGARCH models across the entire period for α = 1%, as are the RSV models almost across the entire period for α = 5%. In particular, the RSV-ST model outperformed the other models. These results suggest that extending the SV model by incorporating a skewed heavy-tailed distribution on daily returns and the RV is effective in terms of generating VaR and ES forecasts.

Table 5.10 Violation rates and FZ0 losses of VaR and ES forecasts for N225

| | α̂ (α = 1%) | FZ0 (α = 1%) | pMCS (α = 1%) | α̂ (α = 5%) | FZ0 (α = 5%) | pMCS (α = 5%) |
|---|---|---|---|---|---|---|
| SV-N | 1.02 | 1.2034 | 0.99∗∗ | 3.90 | 0.8163 | 0.42∗∗ |
| SV-T | 0.85 | 1.2158 | 0.82∗∗ | 3.39 | 0.7973 | 1.00∗∗ |
| SV-ST | 0.68 | 1.2720 | 0.39∗∗ | 4.24 | 0.8846 | 0.00 |
| RSV-N | 1.53 | 1.1447 | 1.00∗∗ | 5.08 | 0.7844 | 1.00∗∗ |
| RSV-T | 1.36 | 1.1292 | 1.00∗∗ | 5.08 | 0.7806 | 1.00∗∗ |
| RSV-ST | 1.19 | 1.1126 | 1.00∗∗ | 4.58 | 0.7685 | 1.00∗∗ |
| EGARCH-N | 1.02 | 1.2818 | 0.51∗∗ | 3.73 | 0.8450 | 0.00 |
| EGARCH-T | 1.02 | 1.2344 | 0.36∗∗ | 3.73 | 0.8292 | 0.18 |
| REGARCH-N | 1.69 | 1.1844 | 1.00∗∗ | 4.41 | 0.7997 | 0.99∗∗ |
| REGARCH-T | 1.02 | 1.1306 | 1.00∗∗ | 4.92 | 0.7959 | 1.00∗∗ |

See Table 5.8 for details

Table 5.11 Summary of GW tests on FZ0 loss of VaR and ES forecasts for N225

α = 1%

| | SV-N | SV-T | SV-ST | RSV-N | RSV-T | RSV-ST | EG-N | EG-T | REG-N | REG-T |
|---|---|---|---|---|---|---|---|---|---|---|
| SV-N | | 1∗∗ | 0 | 100 | 100 | 100 | 1 | 0 | 100 | 100 |
| SV-T | 0.7 | | 0 | 100 | 100 | 100 | 1 | 0 | 100 | 100 |
| SV-ST | 1.1 | 1.1 | | 100 | 100 | 100∗ | 1 | 94 | 99 | 99∗ |
| RSV-N | −1.0 | −1.2 | −1.5 | | 99 | 99 | 0 | 0 | 0 | 98 |
| RSV-T | −1.6 | −1.9 | −2.0∗ | −0.7 | | 99 | 0 | 0 | 1∗ | 11 |
| RSV-ST | −1.9 | −2.0∗ | −2.2∗ | −0.9 | −0.7 | | 1 | 0 | 1 | 0 |
| EG-N | 1.0 | 0.8 | 0.1 | 1.7 | 1.8 | 1.7 | | 99 | 100 | 99 |
| EG-T | 0.9 | 0.8 | −0.7 | 1.5 | 2.1∗ | 2.1∗ | −0.7 | | 100 | 100 |
| REG-N | −0.3 | −0.5 | −1.0 | 1.7 | 1.7 | 1.5 | −1.4 | −0.8 | | 99∗ |
| REG-T | −1.7 | −2.2∗ | −2.7∗∗ | −0.3 | 0.1 | 0.6 | −1.6 | −2.4∗ | −1.0 | |

α = 5%

| | SV-N | SV-T | SV-ST | RSV-N | RSV-T | RSV-ST | EG-N | EG-T | REG-N | REG-T |
|---|---|---|---|---|---|---|---|---|---|---|
| SV-N | | 99 | 1∗∗ | 99 | 100 | 100 | 1 | 2 | 98 | 100 |
| SV-T | −2.0∗ | | 0∗∗ | 100 | 100 | 99 | 1∗ | 2 | 15 | 99 |
| SV-ST | 2.2∗ | 3.0∗∗ | | 99 | 99 | 100∗ | 96∗ | 98∗ | 99 | 99 |
| RSV-N | −1.2 | −0.4 | −2.6∗∗ | | 99 | 98 | 0 | 0 | 0∗∗ | 0 |
| RSV-T | −1.4 | −0.6 | −2.9∗∗ | −0.8 | | 98 | 0 | 0 | 0∗∗ | 0 |
| RSV-ST | −1.9 | −1.1 | −3.1∗∗ | −1.2 | −1.1 | | 0 | 0 | 2∗ | 1 |
| EG-N | 1.2 | 2.2∗ | −1.3 | 2.1∗ | 2.3∗ | 2.4∗ | | 100 | 100 | 100 |
| EG-T | 0.6 | 2.0∗ | −2.0∗ | 1.5 | 1.8 | 2.0∗ | −1.5 | | 100 | 100 |
| REG-N | −0.7 | 0.1 | −2.4∗ | 2.1∗ | 3.2∗∗ | 2.4∗ | −1.8 | −1.2 | | 98 |
| REG-T | −0.9 | −0.1 | −2.6∗∗ | 1.3 | 2.6∗∗ | 2.2∗ | −1.8 | −1.4 | −0.8 | |

See Table 5.5 for details

Fig. 5.14 VaR and ES forecasts of the RSV-ST (left) and REGARCH-T (right) models for N225
5.8.2.3 Summary
Table 5.12 presents the rankings of average losses for volatility, VaR, and ES forecasts. The average ranking in the last column indicates that the RSV models are better ranked than the other models and that the RSV-ST model performs best. In summary, we observe that the RV is useful for volatility, VaR, and ES forecasts. Further, the RSV model is more effective compared to the REGARCH model, and incorporating the GH skewed t distribution also helps improve the forecasts.
Fig. 5.15 Cumulative loss differences (FZ0) in comparison to the SV-N model for N225

Table 5.12 Rankings of average losses for volatility, VaR, and ES forecasts

| | DJIA MSE | DJIA QLIKE | DJIA FZ0 1% | DJIA FZ0 5% | N225 MSE | N225 QLIKE | N225 FZ0 1% | N225 FZ0 5% | Avg |
|---|---|---|---|---|---|---|---|---|---|
| SV-N | 3 | 7 | 3 | 8 | 2 | 6 | 6 | 7 | 5.25 |
| SV-T | 8 | 6 | 1 | 6 | 4 | 7 | 7 | 5 | 5.50 |
| SV-ST | 10 | 8 | 5 | 10 | 10 | 8 | 9 | 10 | 8.75 |
| RSV-N | 1 | 1 | 7 | 3 | 1 | 2 | 4 | 3 | 2.75 |
| RSV-T | 6 | 3 | 6 | 2 | 3 | 3 | 2 | 2 | 3.38 |
| RSV-ST | 5 | 2 | 2 | 1 | 5 | 1 | 1 | 1 | 2.25 |
| EGARCH-N | 7 | 9 | 8 | 9 | 9 | 9 | 10 | 9 | 8.75 |
| EGARCH-T | 9 | 10 | 4 | 7 | 8 | 10 | 8 | 8 | 8.00 |
| REGARCH-N | 2 | 4 | 10 | 5 | 7 | 5 | 5 | 6 | 5.50 |
| REGARCH-T | 4 | 5 | 9 | 4 | 6 | 4 | 3 | 4 | 4.88 |

MSE: mean squared error for volatility forecasts
QLIKE: Gaussian quasi-likelihood for volatility forecasts
FZ0 1%, FZ0 5%: FZ0 loss functions for VaR and ES forecasts with α = 1% and α = 5%, respectively
Avg: average rank across all columns
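The Avg column of Table 5.12 can be reproduced with a few lines of Python. The ranks below are copied from the table, and the variable names are hypothetical.

```python
# Ranks from Table 5.12, ordered as
# DJIA (MSE, QLIKE, FZ0 1%, FZ0 5%) then N225 (MSE, QLIKE, FZ0 1%, FZ0 5%)
ranks = {
    "SV-N":      [3, 7, 3, 8, 2, 6, 6, 7],
    "SV-T":      [8, 6, 1, 6, 4, 7, 7, 5],
    "SV-ST":     [10, 8, 5, 10, 10, 8, 9, 10],
    "RSV-N":     [1, 1, 7, 3, 1, 2, 4, 3],
    "RSV-T":     [6, 3, 6, 2, 3, 3, 2, 2],
    "RSV-ST":    [5, 2, 2, 1, 5, 1, 1, 1],
    "EGARCH-N":  [7, 9, 8, 9, 9, 9, 10, 9],
    "EGARCH-T":  [9, 10, 4, 7, 8, 10, 8, 8],
    "REGARCH-N": [2, 4, 10, 5, 7, 5, 5, 6],
    "REGARCH-T": [4, 5, 9, 4, 6, 4, 3, 4],
}

# Average rank of each model across the eight criteria
avg_rank = {model: sum(r) / len(r) for model, r in ranks.items()}

# Models ordered from best (lowest average rank) to worst
ordering = sorted(avg_rank, key=avg_rank.get)
print(ordering[:3])
```

The three RSV models occupy the first three positions of this ordering, with RSV-ST first, which is the basis for the summary statement that the RSV models are ranked better than the other models.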
References
111
References

1. Aït-Sahalia, Y., Mykland, P.A.: Estimating volatility in the presence of market microstructure noise: a review of the theory and practical considerations. In: Andersen, T.G., Davis, R.A., Kreiß, J.P., Mikosch, T. (eds.) Handbook of Financial Time Series, pp. 577–598. Springer, Berlin (2009)
2. Aït-Sahalia, Y., Mykland, P.A., Zhang, L.: How often to sample a continuous-time process in the presence of market microstructure noise. Rev. Fin. Stud. 18(2), 351–416 (2005). https://doi.org/10.1093/rfs/hhi016
3. Andersen, T.G., Benzoni, L.: Realized volatility. In: Andersen, T.G., Davis, R.A., Kreiß, J.P., Mikosch, T. (eds.) Handbook of Financial Time Series, pp. 555–575. Springer, Berlin (2009)
4. Andersen, T.G., Bollerslev, T.: Answering the skeptics: yes, standard volatility models do provide accurate forecasts. Int. Econ. Rev. 39(4), 885–905 (1998)
5. Andersen, T.G., Bollerslev, T., Christoffersen, P.F., Diebold, F.X.: Financial risk measurement for financial risk management. In: Constantinides, G.M., Harris, M., Stulz, R.M. (eds.) Handbook of the Economics of Finance, vol. 2B, chap. 17, pp. 1127–1220. North Holland, Amsterdam (2013)
6. Andersen, T.G., Bollerslev, T., Diebold, F.X.: Roughing it up: including jump components in the measurement, modeling, and forecasting of return volatility. Rev. Econ. Stat. 89(4), 701–720 (2007)
7. Andersen, T.G., Bollerslev, T., Diebold, F.X.: Parametric and nonparametric volatility measurement. In: Aït-Sahalia, Y., Hansen, L.P. (eds.) Handbook of Financial Econometrics, chap. 2, pp. 67–138. North Holland, Amsterdam (2010)
8. Andersen, T.G., Bollerslev, T., Diebold, F.X., Ebens, H.: The distribution of realized stock return volatility. J. Fin. Econ. 61(1), 43–76 (2001). https://doi.org/10.1016/S0304-405X(01)00055-1
9. Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P.: The distribution of realized exchange rate volatility. J. Am. Stat. Assoc. 96(453), 42–55 (2001). https://doi.org/10.1198/016214501750332965
10. Andersen, T.G., Bollerslev, T., Diebold, F.X., Labys, P.: Modeling and forecasting realized volatility. Econometrica 71(2), 579–625 (2003)
11. Andrews, D.W.K.: Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59(3), 817–858 (1991). https://doi.org/10.2307/2938229
12. Asai, M., McAleer, M., Peiris, S.: Realized stochastic volatility models with generalized Gegenbauer long memory. Econometrics Stat. 16, 42–54 (2020). https://doi.org/10.1016/j.ecosta.2018.12.005
13. Bandi, F.M., Russell, J.R.: Separating microstructure noise from volatility. J. Fin. Econ. 79(3), 367–389 (2006)
14. Bandi, F.M., Russell, J.R.: Microstructure noise, realized volatility, and optimal sampling. Rev. Econ. Stud. 75(2), 655–692 (2008)
15. Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N.: Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica 76(6), 1481–1536 (2008). https://doi.org/10.3982/ECTA6495
16. Barndorff-Nielsen, O.E., Hansen, P.R., Lunde, A., Shephard, N.: Realized kernels in practice: trades and quotes. Econometrics J. 12(3), C1–C32 (2009). https://doi.org/10.1111/j.1368-423X.2008.00275.x
17. Barndorff-Nielsen, O.E., Shephard, N.: Non-Gaussian Ornstein-Uhlenbeck-based models and some of their uses in financial economics. J. R. Stat. Soc. B 63(2), 167–241 (2001)
18. Barndorff-Nielsen, O.E., Shephard, N.: Econometric analysis of realized volatility and its use in estimating stochastic volatility models. J. R. Stat. Soc. B 64(2), 253–280 (2002)
19. Beran, J.: Statistics for Long-Memory Processes, 1st edn. Chapman & Hall (1994)
20. Bollerslev, T.: Generalized autoregressive conditional heteroskedasticity. J. Econometrics 31(3), 307–327 (1986)
21. Bormetti, G., Casarin, R., Corsi, F., Livieri, G.: A stochastic volatility model with realized measures for option pricing. J. Bus. Econ. Stat. 38(4), 856–871 (2020). https://doi.org/10.1080/07350015.2019.1604371
22. Borup, D., Jakobsen, J.S.: Capturing volatility persistence: a dynamically complete realized EGARCH-MIDAS model. Quant. Financ. 19(11), 1839–1855 (2019). https://doi.org/10.1080/14697688.2019.1614653
23. Chen, C.W., Watanabe, T., Lin, E.M.: Bayesian estimation of realized GARCH-type models with application to financial tail risk management. Econometrics Stat. (2021). https://doi.org/10.1016/j.ecosta.2021.03.006
24. Corsi, F.: A simple approximate long memory model of realized volatility. J. Fin. Econometrics 7(2), 174–196 (2009)
25. Corsi, F., Fusari, N., La Vecchia, D.: Realizing smiles: options pricing with realized volatility. J. Fin. Econ. 107(2), 284–304 (2013). https://doi.org/10.1016/j.jfineco.2012.08.015
26. Corsi, F., Mittnik, S., Pigorsch, C., Pigorsch, U.: The volatility of realized volatility. Econometric Rev. 27(1–3), 46–78 (2008). https://doi.org/10.1080/07474930701853616
27. Diebold, F.X.: Empirical Modeling of Exchange Rate Dynamics. Springer-Verlag, Berlin (1988)
28. Diebold, F.X., Mariano, R.S.: Comparing predictive accuracy. J. Bus. Econ. Stat. 20(1), 134–144 (2002). https://doi.org/10.1198/073500102753410444
29. Dobrev, D.P., Szerszen, P.J.: The Information Content of High-Frequency Data for Estimating Equity Return Models and Forecasting Risk. FRB Working Paper 2010-45 (2010)
30. Engle, R.F.: Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50(4), 987–1008 (1982)
31. Fissler, T., Ziegel, J.F.: Higher order elicitability and Osband’s principle. Ann. Stat. 44(4), 1680–1707 (2016). https://doi.org/10.1214/16-AOS1439
32. Giacomini, R., White, H.: Tests of conditional predictive ability. Econometrica 74(6), 1545–1578 (2006)
33. Gneiting, T.: Making and evaluating point forecasts. J. Am. Stat. Assoc. 106(494), 746–762 (2011). https://doi.org/10.1198/jasa.2011.r10138
34. Hansen, P., Lunde, A., Nason, J.: The model confidence set. Econometrica 79(2), 453–497 (2011). https://doi.org/10.3982/ECTA5771
35. Hansen, P.R., Huang, Z.: Exponential GARCH modeling with realized measures of volatility. J. Bus. Econ. Stat. 34(2), 269–287 (2016)
36. Hansen, P.R., Huang, Z., Shek, H.: Realized GARCH: a joint model of returns and realized measures of volatility. J. Appl. Econometrics 27(6), 877–906 (2012)
37. Hansen, P.R., Lunde, A.: A forecast comparison of volatility models: does anything beat a GARCH(1,1)? J. Appl. Econometrics 20(7), 873–889 (2005). https://doi.org/10.1002/jae.800
38. Hansen, P.R., Lunde, A.: Realized variance and market microstructure noise. J. Bus. Econ. Stat. 24(2), 127–161 (2006). https://doi.org/10.1198/073500106000000071
39. Jacod, J., Li, Y., Mykland, P.A., Podolskij, M., Vetter, M.: Microstructure noise in the continuous case: the pre-averaging approach. Stochast. Processes Appl. 119(7), 2249–2276 (2009). https://doi.org/10.1016/j.spa.2008.11.004
40. Koopman, S.J., Scharth, M.: The analysis of stochastic volatility in the presence of daily realized measures. J. Fin. Econometrics 11(1), 76–115 (2013). https://doi.org/10.1093/jjfinec/nbs016
41. Kurose, Y., Omori, Y.: Multiple-block dynamic equicorrelations with realized measures, leverage and endogeneity. Econometrics Stat. 13, 46–68 (2020). https://doi.org/10.1016/j.ecosta.2018.03.003
42. Liu, L.Y., Patton, A.J., Sheppard, K.: Does anything beat 5-minute RV? A comparison of realized measures across multiple asset classes. J. Econometrics 187(1), 293–311 (2015)
43. Ljung, G.M., Box, G.E.P.: On a measure of lack of fit in time series analysis. Biometrika 65(2), 297–303 (1978)
44. Majewski, A.A., Bormetti, G., Corsi, F.: Smile from the past: a general option pricing framework with multiple volatility and leverage components. J. Econometrics 187(2), 521–531 (2015). https://doi.org/10.1016/j.jeconom.2015.02.036
45. McAleer, M., Medeiros, M.C.: Realized volatility: a review. Econometric Rev. 27(1–3), 10–45 (2008)
46. Nelson, D.B.: The Time Series Behaviour of Stock Market Volatility and Returns. Unpublished Ph.D. Thesis, MIT (1988)
47. Newey, W.K., West, K.D.: A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55(3), 703–708 (1987). https://doi.org/10.2307/1913610
48. Nugroho, D.B., Morimoto, T.: Realized non-linear stochastic volatility models with asymmetric effects and generalized Student’s t-distributions. J. Jpn. Stat. Soc. 44(1), 83–118 (2014)
49. Nugroho, D.B., Morimoto, T.: Box-Cox realized asymmetric stochastic volatility models with generalized Student’s t-error distributions. J. Appl. Stat. 43(10), 1906–1927 (2016)
50. Omori, Y., Watanabe, T.: Stochastic volatility and realized stochastic volatility models. In: Upadhyay, S.K., Singh, U., Dey, D.K., Loganathan, A. (eds.) Current Trends in Bayesian Methodology with Applications, 1st edn. Chapman and Hall/CRC, New York (2015)
51. Patton, A.J.: Volatility forecast comparison using imperfect volatility proxies. J. Econometrics 160(1), 246–256 (2011). https://doi.org/10.1016/j.jeconom.2010.03.034
52. Patton, A.J., Sheppard, K.: Evaluating volatility and correlation forecasts. In: Andersen, T.G., Davis, R.A., Kreiß, J.P., Mikosch, T. (eds.) Handbook of Financial Time Series, pp. 801–838. Springer, Berlin (2009)
53. Patton, A.J., Ziegel, J.F., Chen, R.: Dynamic semiparametric models for expected shortfall (and Value-at-Risk). J. Econometrics 211(2), 388–413 (2019)
54. Shirota, S., Hizu, T., Omori, Y.: Realized stochastic volatility with leverage and long memory. Comput. Stat. Data Anal. 76, 618–641 (2014). https://doi.org/10.1016/j.csda.2013.08.013
55. Shirota, S., Omori, Y., Lopes, H.F., Piao, H.: Cholesky realized stochastic volatility model. Econometrics Stat. 3, 34–59 (2017). https://doi.org/10.1016/j.ecosta.2016.08.003
56. Stinchcombe, M.B., White, H.: Consistent specification testing with nuisance parameters present only under the alternative. Econometric Theor. 14(3), 295–325 (1998)
57. Takahashi, M., Omori, Y., Watanabe, T.: Estimating stochastic volatility models using daily returns and realized volatility simultaneously. Comput. Stat. Data Anal. 53(6), 2404–2426 (2009). https://doi.org/10.1016/j.csda.2008.07.039
58. Takahashi, M., Watanabe, T., Omori, Y.: Volatility and quantile forecasts by realized stochastic volatility models with generalized hyperbolic distribution. Int. J. Forecast. 32(2), 437–457 (2016). https://doi.org/10.1016/j.ijforecast.2015.07.005
59. Takahashi, M., Watanabe, T., Omori, Y.: Forecasting daily volatility of stock price index using daily returns and realized volatility. Econometrics Stat. (2021). https://doi.org/10.1016/j.ecosta.2021.08.002
60. Taylor, J.W.: Forecast combinations for value at risk and expected shortfall. Int. J. Forecast. 36(2), 428–441 (2020). https://doi.org/10.1016/j.ijforecast.2019.05.014
61. Trojan, S.: Regime Switching Stochastic Volatility with Skew, Fat Tails and Leverage using Returns and Realized Volatility Contemporaneously. Discussion Paper Series No. 2013-41, Department of Economics, School of Economics and Political Science, University of St. Gallen (2013)
62. Ubukata, M., Watanabe, T.: Pricing Nikkei 225 options using realized volatility. Jpn. Econ. Rev. 65(4), 431–467 (2014). https://doi.org/10.1111/jere.12024
63. West, K.D.: Asymptotic inference about predictive ability. Econometrica 64(5), 1067–1084 (1996). https://doi.org/10.2307/2171956
64. Yamauchi, Y., Omori, Y.: Multivariate stochastic volatility model with realized volatilities and pairwise realized correlations. J. Bus. Econ. Stat. 38(4), 839–855 (2020). https://doi.org/10.1080/07350015.2019.1602048
65. Zhang, L.: Efficient estimation of stochastic volatility using noisy observations: a multi-scale approach. Bernoulli 12(6), 1019–1043 (2006)
66. Zhang, L., Mykland, P.A., Aït-Sahalia, Y.: A tale of two time scales: determining integrated volatility with noisy high-frequency data. J. Am. Stat. Assoc. 100(472), 1394–1411 (2005)