Li-Hsien Sun · Xin-Wei Huang · Mohammed S. Alqawba · Jong-Min Kim · Takeshi Emura
Copula-Based Markov Models for Time Series: Parametric Inference and Process Control
SpringerBriefs in Statistics JSS Research Series in Statistics
Editors-in-Chief Naoto Kunitomo, Economics, Meiji University, Chiyoda-ku, Tokyo, Tokyo, Japan Akimichi Takemura, The Center for Data Science Education and Research, Shiga University, Bunkyo-ku, Tokyo, Japan Series Editors Genshiro Kitagawa, Meiji Institute for Advanced Study of Mathematical Sciences, Nakano-ku, Tokyo, Japan Shigeyuki Matsui, Graduate School of Medicine, Nagoya University, Nagoya, Aichi, Japan Manabu Iwasaki, School of Data Science, Yokohama City University, Yokohama, Tokyo, Japan Yasuhiro Omori, Graduate School of Economics, The University of Tokyo, Bunkyo-ku, Tokyo, Japan Masafumi Akahira, Institute of Mathematics, University of Tsukuba, Tsukuba, Ibaraki, Japan Masanobu Taniguchi, Department of Mathematical Sciences/School, Waseda University/Science & Engineering, Shinjuku-ku, Japan Hiroe Tsubaki, The Institute of Statistical Mathematics, Tachikawa, Tokyo, Japan Satoshi Hattori, Faculty of Medicine, Osaka University, Suita, Osaka, Japan Kosuke Oya, School of Economics, Osaka University, Toyonaka, Osaka, Japan
The current research of statistics in Japan has expanded in several directions in line with recent trends in academic activities in the area of statistics and statistical sciences over the globe. The core of these research activities in statistics in Japan has been the Japan Statistical Society (JSS). This society, the oldest and largest academic organization for statistics in Japan, was founded in 1931 by a handful of pioneer statisticians and economists and now has a history of about 80 years. Many distinguished scholars have been members, including the influential statistician Hirotugu Akaike, who was a past president of JSS, and the notable mathematician Kiyosi Itô, who was an earlier member of the Institute of Statistical Mathematics (ISM), which has been a closely related organization since the establishment of ISM. The society has two academic journals: the Journal of the Japan Statistical Society (English Series) and the Journal of the Japan Statistical Society (Japanese Series). The membership of JSS consists of researchers, teachers, and professional statisticians in many different fields including mathematics, statistics, engineering, medical sciences, government statistics, economics, business, psychology, education, and many other natural, biological, and social sciences. The JSS Series of Statistics aims to publish recent results of current research activities in the areas of statistics and statistical sciences in Japan that otherwise would not be available in English; they are complementary to the two JSS academic journals, both English and Japanese. Because the scope of a research paper in academic journals inevitably has become narrowly focused and condensed in recent years, this series is intended to fill the gap between academic research activities and the form of a single academic paper. The series will be of great interest to a wide audience of researchers, teachers, professional statisticians, and graduate students in many countries who are interested in statistics and statistical sciences, in statistical theory, and in various areas of statistical applications.
More information about this subseries at http://www.springer.com/series/13497
Li-Hsien Sun Graduate Institute of Statistics National Central University Taoyuan, Taiwan
Xin-Wei Huang Institute of Statistics National Chiao Tung University Hsinchu, Taiwan
Mohammed S. Alqawba Department of Mathematics College of Sciences and Arts at Al Rass Qassim University Unayzah, Saudi Arabia
Jong-Min Kim Division of Science and Mathematics University of Minnesota at Morris Morris, MN, USA
Takeshi Emura Department of Information Management Chang Gung University Taoyuan, Taiwan
ISSN 2191-544X ISSN 2191-5458 (electronic) SpringerBriefs in Statistics ISSN 2364-0057 ISSN 2364-0065 (electronic) JSS Research Series in Statistics ISBN 978-981-15-4997-7 ISBN 978-981-15-4998-4 (eBook) https://doi.org/10.1007/978-981-15-4998-4 © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
This book provides statistical methodologies for fitting copula-based Markov chain models to a serially correlated time series. These methods are illustrated through a variety of examples from finance, industry, sports, and other fields. It is our hope that the book serves as an accessible textbook for learning statistical analyses of time series data using copulas for researchers/students in the fields of economics, management, mathematics, statistics, and others. The book can also serve as a research monograph, where each chapter can be read independently. As the subtitle "Parametric inference" suggests, we focus on parametric models based on the normal distribution, t-distribution, normal mixture distribution, Poisson distribution, and others. The book adopts likelihood-based methods as the main statistical tools for fitting the models and develops computing techniques to find the maximum likelihood estimator. Some chapters discuss statistical process control, Bayesian methods, and regression methods. We provide computer codes for most presented statistical methods to help readers analyze their data.

Taoyuan, Taiwan    Li-Hsien Sun
Hsinchu, Taiwan    Xin-Wei Huang
Qassim, Saudi Arabia    Mohammed S. Alqawba
Minnesota, USA    Jong-Min Kim
Taoyuan, Taiwan    Takeshi Emura
Acknowledgements
We thank the series editor, Dr. Shigeyuki Matsui, for his valuable comments on this book. Li-Hsien Sun thanks his former graduate students, Chang-Shang Lee and Wei-Cheng Lin, for their prior contribution to our published articles. He is financially supported by the Ministry of Science and Technology, Taiwan (MOST 108-2118-M-008-002-MY2). Xin-Wei Huang would like to thank the advisor of his master's degree, Dr. Takeshi Emura, who is also an author of this book. He would also like to thank Dr. Jia-Han Shih for his kind help. Mohammed Alqawba thanks his advisor Dr. Norou Diawara for his guidance and valuable comments that led to several published articles. Takeshi Emura thanks his former graduate student, Ting-Hsuan Long, for his prior contribution to our published articles. He is financially supported by the Ministry of Science and Technology, Taiwan (MOST 107-2118-M-008-003-MY3).
Contents
1 Overview of the Book with Data Examples
  1.1 Introduction
  1.2 Copulas and Applications
  1.3 Chemical Process Data
  1.4 S&P 500 Stock Market Index Data
  1.5 Batting Average Data in MLB
  1.6 Stock Price Data of Dow Jones Industrial Average
  1.7 Data on the Count of Arsons
  1.8 Concluding Remarks
  References

2 Copula and Markov Models
  2.1 Introduction
  2.2 Copulas
  2.3 Kendall's Tau
  2.4 Archimedean Copulas
  2.5 Random Number Generation
  2.6 Copula-Based Markov Chain
  Appendix A: The Proof of C̄(u, v) = u + v − 1 + C(1 − u, 1 − v) Being a Copula
  Appendix B: Proofs of Copulas Approaching to the Independence
  Appendix C: Derivations of Kendall's Tau
  Appendix D: Derivation of C_α^[1,1](u, v) Under the Frank Copula
  Appendix E: Derivation of C_ρ^[1,0](u, v) Under the Gaussian Copula
  References

3 Estimation, Model Diagnosis, and Process Control Under the Normal Model
  3.1 Serial Dependence, Statistical Process Control, and Copulas
  3.2 Model and Likelihood
  3.3 Asymptotic Properties
  3.4 Goodness-of-Fit Tests
  3.5 Model Selection
  3.6 Software
  3.7 Data Analysis
    3.7.1 Chemical Process Data
    3.7.2 Financial Data
    3.7.3 Baseball Data
  Appendix: R Codes for Data Analysis
  References

4 Estimation Under Normal Mixture Models for Financial Time Series Data
  4.1 Introduction
  4.2 Models
    4.2.1 Copulas
    4.2.2 Copula-Based Markov Chain
  4.3 Parameter Estimation
    4.3.1 Maximum Likelihood Estimators
    4.3.2 Interval Estimation
    4.3.3 Initial Values
  4.4 Data Generation
  4.5 Data Analysis
  4.6 Conclusions
  Appendix: R Codes
  References

5 Bayesian Estimation Under the t-Distribution for Financial Time Series
  5.1 Introduction
  5.2 Models and Likelihood
    5.2.1 Copula-Based Markov Models
    5.2.2 Non-standardized t-Distribution
    5.2.3 Likelihood
  5.3 Parameter Estimation
    5.3.1 Estimation of Hyperparameters via Resampling
    5.3.2 Metropolis–Hastings Algorithm
  5.4 Data Analysis
  5.5 Conclusions
  Appendix: Moment Estimates
  References

6 Control Charts of Mean by Using Copula Markov SPC and Conditional Distribution by Copula
  6.1 Introduction
  6.2 Copula Method
    6.2.1 Copula and Directional Dependence
    6.2.2 Copula Markov Statistical Process Control Chart
    6.2.3 Control Charts of Mean by Using Copula Conditional Distribution
  6.3 Real Data Analysis
  6.4 Conclusion
  Appendix: R Codes for Data Analysis
  References

7 Copula Markov Models for Count Series with Excess Zeros
  7.1 Introduction
  7.2 Background
    7.2.1 Zero-Inflated Count Regression Models
  7.3 Markov Chain Models
    7.3.1 First-Order Markov Models
    7.3.2 Second-Order Markov Models
    7.3.3 Model Properties
  7.4 Statistical Inference
    7.4.1 Log-Likelihood Functions
    7.4.2 Asymptotic Properties
  7.5 Model Selection and Prediction
  7.6 Arson Data Example
  Appendix A: Trivariate Max-Id Copula Function with Positive Stable LT and Bivariate Gumbel
  Appendix B: R Codes for Data Analysis
  References

Index
Abbreviations
ACF    Autocorrelation Function
AIC    Akaike Information Criterion
AR    Autoregressive
AR(1)    First-Order Autoregressive
ARL    Average Run Length
BA    Batting Average
BIC    Bayesian Information Criterion
CDF    Cumulative Distribution Function
CI    Confidence Interval
CMP    Conway–Maxwell–Poisson (distribution)
CvM    Cramér–von Mises
ERA    Earned Run Average
GARCH    Generalized Autoregressive Conditional Heteroscedasticity
KS    Kolmogorov–Smirnov
LCL    Lower Control Limit
MA    Moving Average
ML    Maximum Likelihood
MLB    Major League Baseball
MLE    Maximum Likelihood Estimator
NB    Negative Binomial (distribution)
NR    Newton–Raphson
pdf    Probability Density Function
SD    Standard Deviation
SE    Standard Error
SPC    Statistical Process Control
UCL    Upper Control Limit
ZICMP    Zero-Inflated Conway–Maxwell–Poisson
ZINB    Zero-Inflated Negative Binomial
ZIP    Zero-Inflated Poisson
Notations
a ∈ A    An element a belonging to a set A
aᵀ    The transpose of a vector a
C(u, v)    The copula function
C^[1,0](u, v) = ∂C(u, v)/∂u    The partial derivative of a copula
C^[0,1](u, v) = ∂C(u, v)/∂v    The partial derivative of a copula
C^[1,1](u, v) = ∂²C(u, v)/∂u∂v    The copula density
M(u, v) = min(u, v)    The Fréchet–Hoeffding upper bound copula
N(μ, σ²)    The normal distribution with mean μ and variance σ²
E[X]    The expectation of X
E[X | Y]    The conditional expectation of X given Y
arg max_u ℓ(u)    The argument that maximizes a function ℓ
I(·)    The indicator function: I(A) = 1 if A is true, or I(A) = 0 if A is false
Pr(A)    The probability of A
Pr(A | B)    The conditional probability of A given B
sgn(·)    A function defined as sgn(x) = −1, 0, or 1 for x < 0, x = 0, or x > 0, respectively
R    Real line or one-dimensional Euclidean space, i.e., R = (−∞, ∞)
Unif(a, b)    The uniform distribution on an interval (a, b)
Var(X)    The variance of X
W(u, v) = max(u + v − 1, 0)    The Fréchet–Hoeffding lower bound copula
φ_α(·)    The generator of Archimedean copulas with parameter α
Φ(·)    The distribution function of N(0, 1), defined as Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−s²/2) ds
Γ(·)    The gamma function, defined as Γ(a) = ∫₀^∞ x^(a−1) exp(−x) dx
→d    Convergence in distribution
→p    Convergence in probability
≈    x ≈ y means x is approximately equal to y
∼    X ∼ F means a random variable X follows a distribution F
:=    x := y means x is defined by y
∀    For any
∃    For some
Chapter 1
Overview of the Book with Data Examples
Abstract This chapter briefly describes the main ideas of the book: time series data and copula-based Markov models for serial dependence. For illustration, we introduce five datasets, namely, the chemical process data, S&P 500 stock market index data, the batting average data in MLB, the stock price data of Dow Jones Industrial Average, and data on the number of arsons. Keywords Copula · Financial time series · Normal distribution · Markov chain · Poisson distribution · Serial dependence · Statistical process control · Time series
1.1 Introduction

This book presents some statistical models and data analytic methods for time series data: the data on a measurement of interest at equally spaced time points (e.g., every 2 h). In time series data, a measurement is recorded together with the time index. Often, two adjacent time points are close to each other so that the two measurements may not be independent. In particular, observed data collected in daily manufacturing processes are serially dependent in the sense that the present sampling condition is dependent on the past ones. Thus, modeling serial dependence in time series plays a critical part in the data analysis, which will be investigated in this book. One of the simplest and most popular approaches for modeling serial dependence is the first-order autoregressive AR(1) model. This traditional model can only deal with linear dependence between two time points and assumes the normal distribution for measurements. What we consider in this book, however, is copula-based Markov chain models that were initially proposed by Darsow et al. (1992), and subsequently applied to different statistical problems by Joe (1997), Chen and Fan (2006), Domma et al. (2009), Long and Emura (2014), Emura et al. (2017), Sun et al. (2018), Lin et al. (2019), Huang and Emura (2019), Huang et al. (2020a, 2020b), Kim et al. (2019), and Zhang et al. (2020). Unlike the AR(1) model that is mainly designed for the normal distribution, the copula models can easily incorporate a variety of marginal distributions, as well as different types of dependence structures, yielding a fairly flexible framework for time series modeling. Readers will see the Clayton copula, Joe copula, Frank copula, and
many others for the options to model serial dependence. Furthermore, readers will see how marginal distributions, such as the t-distribution, normal mixture distribution, zero-inflated Poisson distribution, and more complex distributions, are incorporated to describe the marginal behavior for measurements.
1.2 Copulas and Applications

Chapter 2 of this book contains a quick introduction to copulas and Markov processes, serving as a useful reference, especially for readers who have not studied copulas. Also, it serves as a convenient dictionary for readers who are reading Chaps. 3–7. Many of the important concepts and terms, such as Archimedean copula, Kendall's tau, copula densities, and Markov chain, are clearly defined in Chap. 2. The word "copula" was first used by Sklar (1959), which means "a link, tie, or bond" in Latin. A "copula function" can be used to define a functional structure of two variables. Typically, a copula function has its unique name, such as "the Clayton copula," "the Joe copula," and "the Frank copula," named after their original papers. Different copulas often produce remarkably different structures of dependence. Copulas have been extensively studied by both theoretical and applied researchers since their introduction by Sklar (1959). See Nelsen (2006) and Durante and Sempi (2016) for the extensive coverage of copula theory. Copulas offer an effective tool for constructing a multivariate distribution, which has especially been useful in applied statistical models for survival data and financial time series data. Readers are referred to the books of Joe (1997) and McNeil et al. (2005) for the analysis of time series data, and the books of Emura and Chen (2018) and Emura et al. (2019) for the analysis of survival data. This book provides statistical methodologies for fitting copula-based Markov chain models to a serially correlated time series. The copula-based Markov chain models were initially proposed by Darsow et al. (1992), and subsequently used as statistical methods by many authors. The datasets for illustrating the statistical methods come from a variety of examples in industry, finance, sports, and other fields. Below, we list the data examples of this book.
1.3 Chemical Process Data

Researchers may collect data on a measurement from a chemical process (Box and Jenkins 1990; Bisgaard and Kulahci 2007; Box and Narasimhan 2010). The data consist of a series of chemical concentrations measured every 2 h. Since the two adjacent time points are close to each other, the measurements may be dependent. In general, observed data collected in daily manufacturing processes are often dependent in the sense that the present sampling condition depends on the past ones.
Engineers/researchers often use statistical process control (Mastrangelo and Montgomery 1995; Montgomery 2009) to judge if the concentration level is kept within a reasonable range. Chapter 3 revisits the data in detail.
1.4 S&P 500 Stock Market Index Data

The stock price today may depend on past prices. Financial engineers and investors may be interested in the weekly values of the S&P 500 index consisting of 500 leading companies in leading industries of the U.S. economy. From FRED (Federal Reserve Economic Data) https://research.stlouisfed.org/fred2/series/SP500/downloaddata, we extract the weekly stock price from January 1, 2010 to January 3, 2014 (ending Friday). Our goal is to see if the weekly returns stay within a reasonable range over the period, as well as to find a suitable model for serial dependence. Chapter 3 analyzes the data using copula-based Markov chain models with the normal marginal distribution. It turns out that the present stock price is dependent on the prices of the past 2 weeks, as the second-order Markov model provides the best fit for the serial dependence. Chapter 5 employs the t-distribution to capture the fat-tail behavior of the log return (the difference of the log-scaled stock prices between two time points) using a Bayesian method. For the log return, the normality assumption is often rejected by the Jarque–Bera test owing to the fat tails (Curto et al. 2009).
1.5 Batting Average Data in MLB

Hill and Schvaneveldt (2011) used statistical process control (SPC) to identify the steroid era in major league baseball (MLB). They examined batting averages (BA) from 1993 to 2008 to see if there is any unusual increase/decrease in the BA during the period. Motivated by this finding and following Kim et al. (2019), we extract annual records of BA in MLB from 1980 to 2016 (37 seasons). The goal is to detect if there is an unusual record of MLB statistics by fitting copula-based methods. Chapter 3 analyzes the data using copula-based Markov models. Kim et al. (2019) noticed that there may be some other characteristics of interest in baseball that may affect the performance of the season: ERA (earned run average) for instance. ERA is the mean of earned runs given up by a pitcher per nine innings pitched. Kim et al. (2019) considered the directional dependence of ERA given BA, or BA given ERA by using a copula directional dependence measure proposed by Sungur (2005) and a statistical method proposed by Kim and Hwang (2017). Chapter 6 analyzes the bivariate time series of BA and ERA using copula-based Markov models and copula-based directional dependence models.
1.6 Stock Price Data of Dow Jones Industrial Average

Stock price data with the time index collected for financial and econometric studies are rarely independent due to serial dependence. We consider the weekly stock price of Dow Jones Industrial Average from 2008/1/1 to 2012/1/1 obtained from Yahoo Finance. In order to remove the serial dependence, researchers typically calculate the difference of the log-transformed values between two adjacent time points to obtain the log return (see Chap. 4 for details). Curto et al. (2009) showed that log returns in the stock market follow some heavy-tailed distributions rather than the normal distribution, which agrees with our data analysis of Chap. 4. Here, we adopt the normal mixture distribution to capture the non-normal distribution of the log return. In addition, the normal mixture distribution allows a flexible shape of the distribution, including a bimodal shape and a fat-tailed shape.
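The log-return computation just described can be carried out in a couple of lines of R. The snippet below is our own minimal sketch (not the book's supplementary code) and uses a synthetic price vector in place of the actual Dow Jones series.

```r
# A minimal sketch (ours): weekly log returns from a price series.
# 'price' is a synthetic toy series standing in for weekly closing prices.
set.seed(1)
price <- cumprod(c(100, exp(rnorm(51, mean = 0, sd = 0.02))))  # 52 synthetic prices
log_return <- diff(log(price))  # r_t = log(P_t) - log(P_{t-1})
head(log_return)
```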
1.7 Data on the Count of Arsons

Count time series data may be observed in several applied disciplines such as environmental science, biostatistics, economics, public health, and finance. For analyzing the number of crimes, the Poisson distribution is typically fitted to observed counts (Santitissadeekorn et al. 2020). In some cases, a zero count may occur more often than other counts (Alqawba et al. 2019). For instance, we consider monthly counts of arson in Pittsburgh, USA (to be analyzed in Chap. 7). The data consist of 144 monthly counts of arsons, starting from January 1990 and ending in December 2001. A bar plot of the series shows that the distribution of the time series of the arson counts has more zeros relative to the counts predicted by a Poisson distribution. The time series plot shows frequent occurrence of zeros and some autocorrelation. Overlooking the frequent occurrence of zeros and the serial correlation could lead to false inference. Motivated by these problems, Chap. 7 develops a class of copula-based Markov time series models for zero-inflated counts.
1.8 Concluding Remarks

From the introduction of copulas by Sklar (1959), it took more than 30 years to generate the idea of using copulas to describe serial dependence in a time series (Darsow et al. 1992). It took additional years to develop likelihood-based inference procedures (Joe 1997; Chen and Fan 2006). This book describes our previously developed ideas of likelihood-based inference procedures with various parametric models, including Long and Emura (2014), Emura et al. (2017), Huang and Emura (2019), Sun et al. (2018), Lin et al. (2019), and Kim et al. (2019). The advantage of the copula-based time series models is that the marginal
distribution is separately modeled from the serial dependence structure. In addition, copulas provide flexible dependence models and unified statistical methods, not restricted to a particular type of dependence structure, such as a linear dependence. One can choose any copula that he/she likes from a large pool of existing copulas. One can also choose any specific type of marginal distribution, e.g., the normal mixture distribution and the Poisson distribution. Copulas would continue to be the heart of modeling time series data.
References Alqawba M, Diawara N, Chaganty NR (2019) Zero-inflated count time series models using Gaussian copula. Sequen Anal 38(3):342–357 Bisgaard S, Kulahci M (2007) Quality quandaries: Using a time series model for process adjustment and control. Qual Eng 20(1):134–141 Box GEP, Jenkins G (1990) Time series analysis, forecasting and control. Holden-Day, Inc, New York Box G, Narasimhan S (2010) Rethinking statistics for quality control. Qual Eng 22(2):60–72 Chen X, Fan Y (2006) Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification. J Economet 135(1–2):125–154 Curto J, Pinto J, Tavares G (2009) Modeling stock markets volatility using Garch models with normal, students t and stable Paretian distributions. Stat Pap 50(2):311–321 Darsow WF, Nguyen B, Olsen ET (1992) Copulas and Markov processes. Illinois J Math 36(4):600– 642 Domma F, Giordano S, Francesco PP (2009) Statistical modeling of temporal dependence in financial data via a copula function. Commun Stat Simul Comput 38:703–728 Durante F, Sempi C (2016) Principles of copula theory. Chapman and Hall/CRC Emura T, Long T-H, Sun L-H (2017) R routines for performing estimation and statistical process control under copula-based time series models. Commun Stat Simul Comput 46(4):3067–3087 Emura T, Matsui S, Rondeau V (2019) Survival analysis with correlated endpoints, joint frailtycopula models. JSS Research Series in Statistics, Springer Emura T, Chen YH (2018) Analysis of survival data with dependent censoring, copula-based approaches. JSS Research Series in Statistics, Springer Hill SE, Schvaneveldt SJ (2011) Using statistical process control charts to identify the steroids era in major league baseball: An educational exercise. J Stat Educ 19:1–19 Huang X-W, Emura T (2019) Model diagnostic procedures for copula-based Markov chain models for statistical process control. Commun Stat Simul Comput. https://doi.org/10.1080/03610918. 2019.1602647 Huang X-W, Chen WR, Emura T (2020a). Likelihood-based inference for a copula-based Markov chain model with binomial time series, submitted Huang X-W, Wang W, Emura T (2020b). A copula-based Markov chain model for serially dependent event times with a dependent terminal event, Japanese J Stat Data Sci, in revision Joe H (1997) Multivariate models and multivariate dependence concepts. Chapman and Hall/CRC Kim JM, Baik J, Reller M (2019) Control charts of mean and variance using copula Markov SPC and conditional distribution by copula. Commun Stat Simul Comput. https://doi.org/10.1080/ 03610918.2018.1547404 Kim J-M, Hwang S-Y (2017) Directional dependence via Gaussian copula beta regression model with asymmetric GARCH marginals. Commun Stat Simul Comput 46(10):7639–7653
Lin WC, Emura T, Sun LH (2019) Estimation under copula-based Markov normal mixture models for serially correlated data. Commun Stat Simul Comput. https://doi.org/10.1080/03610918.2019. 1652318 Long T-H, Emura T (2014) A control chart using copula-based Markov chain models. J Chin Stat Assoc 52(4):466–496 Mastrangelo CM, Montgomery DC (1995) SPC with correlated observations for the chemical and process industries. Qual Reliabil Eng Int 11(2):79–89 McNeil AJ, Frey R, Embrechts P (2005) Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press, New York Montgomery DC (2009) Statistical quality control, vol 7. Wiley, New York Nelsen RB (2006) An Introduction to Copulas. Springer Science & Business Media Santitissadeekorn N, Lloyd DJ et al (2020) Approximate filtering of conditional intensity process for Poisson count data: Application to urban crime. Comput Stat Data Anal 144:106850 Sklar M (1959) Fonctions de repartition an dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris 8:229–231 Sun LH, Lee CS, Emura T (2018) A Bayesian inference for time series via copula-based Markov chain models. Commun Stat Simul Comput. https://doi.org/10.1080/03610918.2018.1529241 Sungur EA (2005) A note on directional dependence in regression setting. Commun Stat Theory Methods 34:1957–1965 Zhang S, Zhou QM, Lin H (2020) Goodness-of-fit test of copula functions for semi-parametric univariate time series models. Stat Pap https://doi.org/10.1007/s00362-019-01153-4
Chapter 2
Copula and Markov Models
Abstract This chapter introduces the basic concepts on copulas and Markov models. We review the formal definition of copulas with its fundamental properties. We then introduce Kendall’s tau as a measure of dependence structure for a pair of random variables, and its relationship with a copula. Examples of copulas are reviewed, such as the Clayton copula, the Gaussian copula, the Frank copula, and the Joe copula. Finally, we introduce the copula-based Markov chain time series models and their fundamental properties. Keywords Bivariate distribution · Copula · Kendall’s tau · Archimedean copula · Markov chain · Serial dependence · Time series
2.1 Introduction

Copulas have been extensively studied by both theoretical and applied researchers since their introduction by Sklar (1959). This chapter provides a quick review of the basic concepts on copulas and Markov models that are specifically relevant to the topic of the book: copula-based Markov time series models. More extensive coverage of copula theory can be found in two excellent books: Nelsen (2006) and Durante and Sempi (2016). This chapter contains the following materials. Section 2.2 reviews the formal definition of copulas with their fundamental properties. Section 2.3 introduces Kendall's tau as a measure of dependence for a pair of random variables, and its relationship with a copula. Section 2.4 presents an important class of copulas, called Archimedean copulas. Section 2.5 describes data-generation methods along with examples of copulas, such as the Clayton copula and the Joe copula. Section 2.6 discusses the copula-based Markov chain models for a time series.

Electronic supplementary material The online version of this chapter (https://doi.org/10.1007/978-981-15-4998-4_2) contains supplementary material, which is available to authorized users.
In this chapter, a symbol “Unif(0,1)” represents a uniform distribution on an interval [0,1]. The notation “U~Unif(0,1)” expresses a random variable U following Unif(0,1). Specifically, Pr(U ≤ u) = u for 0 ≤ u ≤ 1.
2.2 Copulas

The word "copula" was first used by Sklar (1959), which means "a link, tie, or bond" in Latin. In the two-dimensional case, a copula is defined as a bivariate distribution function C : [0, 1]² → [0, 1] whose marginal distributions follow Unif(0,1). Formally, a copula is defined as follows:

Definition 2.1 (copula): A bivariate copula is a function C : [0, 1]² → [0, 1] satisfying:
(C1) C(u, 0) = C(0, v) = 0, C(u, 1) = u, and C(1, v) = v for 0 ≤ u, v ≤ 1.
(C2) C(u₂, v₂) − C(u₂, v₁) − C(u₁, v₂) + C(u₁, v₁) ≥ 0 for 0 ≤ u₁ ≤ u₂ ≤ 1 and 0 ≤ v₁ ≤ v₂ ≤ 1.

The condition (C1) states that the marginal distributions follow Unif(0,1). The condition (C2) states that the probability mass on (u₁, u₂] × (v₁, v₂] is nonnegative. If the condition (C2) holds, C is said to be "2-increasing." In fact, any bivariate distribution function is 2-increasing. If C is not 2-increasing, it induces a negative probability and does not give a valid probability model. The following are copulas since they satisfy (C1) and (C2):

(i) Independence copula: Π(u, v) = uv.
(ii) Fréchet–Hoeffding upper bound copula: M(u, v) = min(u, v).
(iii) Fréchet–Hoeffding lower bound copula: W(u, v) = max(u + v − 1, 0).

For any copula C(u, v), we have W(u, v) ≤ C(u, v) ≤ M(u, v) for 0 ≤ u, v ≤ 1. See Nelsen (2006) for the proof. The inequality means that W serves as the lower bound and M serves as the upper bound. The following remarkable result is known as Sklar's theorem (Sklar 1959) that provides a mathematical basis of copula models.
Sklar’s theorem: Let X and Y be two random variables. For any joint distribution function with its marginal distribution functions F(x) = Pr(X ≤ x) and G(y) = Pr(Y ≤ y), there exists a copula C such that Pr(X ≤ x, Y ≤ y) = C{F(x), G(y)}.
(2.1)
The copula C is unique if F and G are continuous. For example, if X and Y are independent, then the copula is Π(u, v) = uv, as its name suggests. If X = Y with probability one, then the copula is M(u, v) = min(u, v). Thus, M is a copula corresponding to perfect positive dependence between X and Y. If X = −Y with probability one, then the copula is W(u, v) = max(u + v − 1, 0). Thus, W is a copula corresponding to perfect negative dependence between X and Y. In all the examples above, the copula is irrelevant to the marginal distributions of X and Y. This means that the copula characterizes the dependence structure between X and Y without being influenced by the structure of the marginal distributions.

In statistical applications, we usually consider a copula that has a parameter describing the degree of dependence. Usually, a positive (negative) parameter value is assigned for positive (negative) dependence while the zero value is assigned for independence. For instance, we consider X and Y that jointly follow a bivariate normal distribution with E[X] = E[Y] = 0, Var[X] = Var[Y] = 1, and Cov(X, Y) = ρ ∈ (−1, 1). One can derive a one-parameter copula corresponding to this distribution as C_ρ(u, v) = Φ_ρ[Φ⁻¹(u), Φ⁻¹(v)], where

Φ_ρ(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} [1 / {2π√(1 − ρ²)}] exp[−(s² − 2ρst + t²) / {2(1 − ρ²)}] ds dt

is the distribution function of the standard bivariate normal distribution, and

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−s²/2) ds
is the distribution function of N(0, 1). This one-parameter copula Cρ (u, v) is called the Gaussian copula (or the normal copula). Unfortunately, the Gaussian copula may not be an attractive choice for data analysis due to its complex form, and inability to produce tail dependence (see Sect. 2.5).
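For readers who want to evaluate the Gaussian copula numerically, the sketch below is our own illustration (not the book's supplementary code); it assumes the mvtnorm package is available for the bivariate normal distribution function.

```r
# A sketch (ours) of evaluating C_rho(u, v) = Phi_rho(Phi^{-1}(u), Phi^{-1}(v)).
# Assumes the mvtnorm package for the bivariate normal CDF.
library(mvtnorm)
gaussian_copula <- function(u, v, rho) {
  sigma <- matrix(c(1, rho, rho, 1), nrow = 2)              # correlation matrix
  pmvnorm(upper = c(qnorm(u), qnorm(v)), corr = sigma)[1]   # Phi_rho at (qnorm(u), qnorm(v))
}
gaussian_copula(0.3, 0.7, rho = 0.5)
```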
Consider a bivariate random vector (U, V) such that the distribution function is a copula C(u, v). In short, Pr(U ≤ u, V ≤ v) = C(u, v). Let F(·) and G(·) be strictly increasing and continuous distribution functions. If one defines two random variables X = F⁻¹(U) and Y = G⁻¹(V), they satisfy Eq. (2.1). This is a reverse statement of Sklar's theorem: a copula and two marginal distribution functions produce a unique bivariate distribution function. Assume that the joint distribution function of X and Y is written as Pr(X ≤ x, Y ≤ y) = C{F(x), G(y)}. Then, the joint survival function is Pr(X > x, Y > y) = 1 − F(x) − G(y) + C{F(x), G(y)}. If we express the right side by the marginal survival functions F̄(x) ≡ 1 − F(x) = Pr(X > x) and Ḡ(y) ≡ 1 − G(y) = Pr(Y > y), we have

Pr(X > x, Y > y) = F̄(x) + Ḡ(y) − 1 + C{1 − F̄(x), 1 − Ḡ(y)} = C̄{F̄(x), Ḡ(y)},

where C̄(u, v) = u + v − 1 + C(1 − u, 1 − v) is also a copula (the proof is given in Appendix A). The last expression gives us a tool for modeling X and Y through a copula and two marginal survival functions. If X and Y are nonnegative random variables, the survival functions are usually easier to handle than the distribution functions. Interested readers are referred to the books of Emura and Chen (2018) and Emura et al. (2019b) for the analysis of survival data with copulas.

Finally, we briefly introduce a trivariate copula. A trivariate copula is a trivariate distribution function whose marginal distributions follow Unif(0, 1). As in Definition 2.1, a trivariate copula is a function C : [0, 1]³ → [0, 1] satisfying requirements similar to (C1) and (C2). The boundary condition (C1) is rewritten as C(u, v, 0) = C(u, 0, w) = C(0, v, w) = 0, C(u, 1, 1) = u, C(1, v, 1) = v, and C(1, 1, w) = w for 0 ≤ u, v, w ≤ 1. The condition (C2) is rewritten as the 3-increasing property, defined as V_C ≥ 0, where

V_C ≡ C(u₂, v₂, w₂) − C(u₂, v₂, w₁) − C(u₂, v₁, w₂) − C(u₁, v₂, w₂) + C(u₂, v₁, w₁) + C(u₁, v₂, w₁) + C(u₁, v₁, w₂) − C(u₁, v₁, w₁)

for 0 ≤ u₁ ≤ u₂ ≤ 1, 0 ≤ v₁ ≤ v₂ ≤ 1, and 0 ≤ w₁ ≤ w₂ ≤ 1.
2.3 Kendall’s Tau Kendall’s tau is a measure of dependence between two variables, which is defined by τ = Pr{(X 1 − X 2 )(Y1 − Y2 ) > 0} − Pr{(X 1 − X 2 )(Y1 − Y2 ) < 0},
Probability of concordance
Probability of discordance
(2.2)
2.3 Kendall’s Tau
11
where (X 1 , Y1 ) and (X 2 , Y2 ) are independent and identically distributed random vectors (Kendall 1948). Here, we have two independent subjects, Subject 1 for (X 1 , Y1 ) and Subject 2 for (X 2 , Y2 ). The two subjects are said to be “concordant” if the larger (smaller) value for X gives the larger (smaller) value for Y. The two subjects are said to be “discordant” if the larger (smaller) value for X gives the smaller (larger) value for Y. By definition, the range of Kendall’s tau is −1 ≤ τ ≤ 1. If X and Y are independent, then τ = 0. If Y = f (X ) for a strictly increasing function f (·), then τ = 1. This is because the pairs (X 1 , Y1 ) and (X 2 , Y2 ) are concordant with probability one. Similarly, if Y = g(X ) for a strictly decreasing function g(·), then τ = −1. Under the copula model (2.1), Kendall’s tau between X and Y can be written as
1
τ =4 0
1
C(u, v)dC(u, v) − 1.
0
Thus, Kendall’s tau is free from the marginal distributions. It is convenient to introduce the following notations of the partial derivatives of a copula: ∂C(u, v) , ∂u ∂C(u, v) , (u, v) = ∂v ∂ 2 C(u, v) . (u, v) = ∂u∂v
C [1,0] (u, v) = C [0,1] C [1,1]
If C is a continuous distribution function (i.e., C has a density function), C [1, 1] is a bivariate density function, called the “copula density.” It holds that C(u, v) =
u v [1, 1] C (s, t)dsdt. If the copula has its density, Kendall’s tau is computed as 0 0
1
τ =4 0
1
C(u, v)C [1,1] (u, v)dudv − 1.
0
For instance, Kendall’s tau under the independence copula is 0 since τ =4 0
1
1 0
uv · 1 · dudv − 1 = 4
1
udu 0
1
vdv − 1 = 1 − 1 = 0.
0
The Fréchet–Hoeffding upper (or lower) bound copula does not have a copula density since the corresponding distribution is degenerate on a line. One can still compute Kendall's tau:

τ_M = 4 ∫₀¹ ∫₀¹ M(u, v) dM(u, v) − 1 = 4 ∫₀¹ u du − 1 = 1,
τ_W = 4 ∫₀¹ ∫₀¹ W(u, v) dW(u, v) − 1 = 4 ∫₀¹ 0 du − 1 = −1.
See Chap. 5 of Nelsen (2006) for the details. Thus, if X = Y with probability one, then Kendall's tau for X and Y is 1. If X = −Y with probability one, then Kendall's tau for X and Y is −1.

Estimation of Kendall's tau for bivariate data is straightforward. Let (Xᵢ, Yᵢ), i = 1, ..., n, be independent samples following the distribution Pr(X ≤ x, Y ≤ y) = C{F(x), G(y)}. Then, the sample Kendall's tau is defined as

τ̂ = {n(n − 1)/2}⁻¹ Σ_{i<j} I{(Xᵢ − Xⱼ)(Yᵢ − Yⱼ) > 0} − {n(n − 1)/2}⁻¹ Σ_{i<j} I{(Xᵢ − Xⱼ)(Yᵢ − Yⱼ) < 0}
  = {n(n − 1)/2}⁻¹ Σ_{i<j} sgn{(Xᵢ − Xⱼ)(Yᵢ − Yⱼ)},

where the first sum estimates the probability of concordance, the second sum estimates the probability of discordance, and sgn(x) = −1 for x < 0, sgn(x) = 0 for x = 0, and sgn(x) = 1 for x > 0. This estimator is consistent and unbiased for τ. However, if the independence assumption does not hold (e.g., (Xᵢ, Yᵢ) and (Xⱼ, Yⱼ) are not independent for i ≠ j), the estimator τ̂ may be inconsistent and biased.
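The sample Kendall's tau is easy to compute directly from its definition; the following is a minimal R sketch of ours that checks the by-hand computation against the built-in cor() function on toy data.

```r
# Sample Kendall's tau from its sign-based definition, checked against cor() (our sketch).
sample_tau <- function(x, y) {
  n <- length(x)
  s <- 0
  for (i in 1:(n - 1))
    for (j in (i + 1):n)
      s <- s + sign((x[i] - x[j]) * (y[i] - y[j]))
  s / choose(n, 2)
}
set.seed(1)
x <- rnorm(100); y <- x + rnorm(100)   # toy positively dependent data
c(by_hand = sample_tau(x, y), built_in = cor(x, y, method = "kendall"))
```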
2.4 Archimedean Copulas

Genest and MacKay (1986) initially studied an important class of copulas called Archimedean copulas, which is defined by C(u, v) = φ⁻¹{φ(u) + φ(v)}, where φ : [0, 1] → [0, ∞] is a generator of the copula having the inverse function φ⁻¹ : [0, ∞] → [0, 1]. To satisfy (C1) and (C2), the generator needs to be continuous and strictly decreasing from φ(0) > 0 to φ(1) = 0, as well as ∂φ(t)/∂t < 0 and ∂²φ(t)/∂t² > 0 for any t ∈ (0, 1). The generator of the independence copula is φ(t) = −log(t), where the inverse function is φ⁻¹(s) = exp(−s). One can easily verify φ⁻¹{φ(u) + φ(v)} = uv. Below are examples of Archimedean copulas.
(iv) Clayton copula (Clayton 1978): C_α(u, v) = max(u^(−α) + v^(−α) − 1, 0)^(−1/α), α ∈ [−1, +∞)\{0};
Generator: φ_α(t) = (t^(−α) − 1)/α; Inverse: φ_α⁻¹(s) = (1 + αs)^(−1/α).
(v) Joe copula (Joe 1993): C_α(u, v) = 1 − {(1 − u)^α + (1 − v)^α − (1 − u)^α(1 − v)^α}^(1/α), α ∈ [1, +∞);
Generator: φ_α(t) = −log{1 − (1 − t)^α}; Inverse: φ_α⁻¹(s) = 1 − {1 − exp(−s)}^(1/α).
(vi) Frank copula (Frank 1979): C_α(u, v) = −(1/α) log[1 + (e^(−αu) − 1)(e^(−αv) − 1)/(e^(−α) − 1)], α ∈ (−∞, 0) ∪ (0, ∞);
Generator: φ_α(t) = −log{(e^(−αt) − 1)/(e^(−α) − 1)}; Inverse: φ_α⁻¹(s) = −(1/α) log{1 + (e^(−α) − 1) exp(−s)}.
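To make the generator representation concrete, the following R sketch (ours, not from the book) builds the Clayton copula from its generator and inverse and checks the result against the closed form in (iv).

```r
# Building the Clayton copula from its Archimedean generator (our sketch):
# C(u, v) = phi^{-1}(phi(u) + phi(v)), with phi(t) = (t^(-alpha) - 1) / alpha.
alpha   <- 2
phi     <- function(t) (t^(-alpha) - 1) / alpha
phi_inv <- function(s) (1 + alpha * s)^(-1/alpha)
C_gen    <- function(u, v) phi_inv(phi(u) + phi(v))
C_closed <- function(u, v) (u^(-alpha) + v^(-alpha) - 1)^(-1/alpha)
c(generator_form = C_gen(0.4, 0.8), closed_form = C_closed(0.4, 0.8))  # should agree
```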
The Clayton and Frank copulas converge to the independence copula when α → 0 (see Appendix B for the proof). The Joe copula becomes the independent copula when α = 1. The Clayton copula allows negative dependence for α ∈ [−1, 0), particularly giving the Fréchet–Hoeffding lower bound copula for α = −1. However, the copula density does not exist on the line u −α +v−α −1 = 0, giving a “singular” distribution for α ∈ [−1, 0). Hence, the Clayton copula is usually not suitable for modeling negative dependence except for some special examples (e.g., Emura et al. 2011). The Clayton copula is also known as the MTCJ (Mardia–Takahashi–Cook– Johnson) copula (Joe 2015). We use the former in this book. Many statistical models and software packages in biostatistics focus on the Clayton copula due to its ease of conducting simulation (e.g., Rotolo et al. 2013), estimation (e.g., Emura et al. 2017b; Rotolo et al. 2018; Emura et al. 2019b, c; Huang et al. 2020b; Wu et al. 2020), feature selection (Emura and Chen 2016; Emura et al. 2019a), and prediction (e.g., Emura et al. 2018). Due to its simplicity, we use the Clayton copula as a main choice throughout the book. However, the Clayton copula is not always the best choice for a given dataset. The issue of the copula model selection is an important issue that will be discussed in Chap. 3 For Archimedean copulas, Genest and MacKay (1986) derived Kendall’s tau as τ =1+4 0
1
φ(t) dt = 1 − 4 ∂φ(t)/∂t
∞
s 0
∂φ −1 (s) ∂s
2 ds.
These expressions substantially simplify the calculation from the double integral to one-dimensional integral. For instance, the Clayton copula gives τα = 1 − 4 0
∞
∂(1 + αs)−1/α s ∂s
2 ds =
α . α+2
14
2 Copula and Markov Models
The Joe copula gives
s 0
4 =1 − 2 α
∞
τα =1 − 4
∞
2 ∂ 1 1 − (1 − e−s ) α −1 ds ∂s
s(1 − e−s ) α −2 e−2s ds. 2
0
The Frank copula gives τα = 1 −
1 α t 4 1− dt . α α 0 et − 1
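The Frank expression involves a one-dimensional integral with no closed form; a minimal R sketch of ours evaluates it with integrate().

```r
# Kendall's tau for the Frank copula (our sketch):
# tau = 1 - (4/alpha) * {1 - (1/alpha) * int_0^alpha t/(e^t - 1) dt}.
frank_tau <- function(alpha) {
  d1 <- integrate(function(t) t / (exp(t) - 1), lower = 0, upper = alpha)$value / alpha
  1 - (4 / alpha) * (1 - d1)
}
frank_tau(5)   # moderately strong positive dependence
frank_tau(1)   # weak positive dependence
```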
See Appendix C for the derivations of Kendall's tau.

Using a generator, one can easily construct a trivariate copula through C(u, v, w) = φ⁻¹{φ(u) + φ(v) + φ(w)}. In the trivariate case, the Clayton copula is defined as

C_α(u, v, w) = (u^(−α) + v^(−α) + w^(−α) − 2)^(−1/α),  α > 0.

The trivariate Clayton copula induces the bivariate Clayton copulas such that

C_α(u, v, 1) = (u^(−α) + v^(−α) − 1)^(−1/α),
C_α(u, 1, w) = (u^(−α) + w^(−α) − 1)^(−1/α),
C_α(1, v, w) = (v^(−α) + w^(−α) − 1)^(−1/α).

Thus, they have the pairwise Kendall's tau τ_uv = τ_uw = τ_vw = α/(α + 2). The trivariate Joe copula is defined as

C_α(u, v, w) = 1 − {(1 − u)^α + (1 − v)^α + (1 − w)^α − (1 − u)^α(1 − v)^α − (1 − v)^α(1 − w)^α − (1 − u)^α(1 − w)^α + (1 − u)^α(1 − v)^α(1 − w)^α}^(1/α).

The trivariate Joe copula induces the bivariate Joe copulas, e.g.,

C_α(u, v, 1) = 1 − {(1 − u)^α + (1 − v)^α − (1 − u)^α(1 − v)^α}^(1/α).
Hence, all the pairwise dependence structures are equivalent under the trivariate Clayton and Joe copulas. This is true for all the trivariate Archimedean copulas. Interestingly, the trivariate Clayton copula also gives the bivariate conditional Clayton copula when one of the three variables is conditioned on; other Archimedean copulas do not have this invariance property (Stoeber et al. 2013).
2.5 Random Number Generation

We introduce a useful method (Sect. 2.9 of Nelsen (2006)) to generate X and Y following Pr(X ≤ x, Y ≤ y) = C{F(x), G(y)}. We first generate U ∼ Unif(0, 1). Given the value U = u, we generate V following Pr(V ≤ v | U = u) = C^[1,0](u, v). To do this, solve the equation W = C^[1,0](u, V) for V, where W ∼ Unif(0, 1) is independent of U. In this way, the pair (U, V) follows Pr(U ≤ u, V ≤ v) = C(u, v). Finally, we obtain X = F⁻¹(U) and Y = G⁻¹(V).

Example 2.1 (Clayton copula): The conditional distribution function is

Pr(V ≤ v | U = u) = C_α^[1,0](u, v) = (u^(−α) + v^(−α) − 1)^(−1/α−1) u^(−α−1).

The data-generating algorithm is derived as follows:
Step 1 Generate two independent variables U and W from Unif(0, 1).
Step 2 Obtain V by V = {(W^(−α/(α+1)) − 1)U^(−α) + 1}^(−1/α).
Step 3 The pair (U, V) follows the Clayton copula.

Upper panels of Fig. 2.1 show the scatter plots of 500 pairs of (U, V) that are generated using the above algorithm (Supplementary material 1 for the R codes). The plot shows the presence of a lower tail dependence (strong dependence at (U, V) ≈ (0, 0)), which is a feature of the Clayton copula. The level of dependence increases as α increases. The copula density of the Clayton copula is

C_α^[1,1](u, v) = (1 + α) u^(−α−1) v^(−α−1) (u^(−α) + v^(−α) − 1)^(−1/α−2).

The contour plots of the copula density (lower panels of Fig. 2.1) agree with the observed features of the (U, V)'s.
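A minimal R version of Steps 1 to 3 (our own sketch; the book's code is in Supplementary material 1) is given below.

```r
# Sketch (ours) of Steps 1-3: sampling (U, V) from the Clayton copula by conditional inversion.
rclayton <- function(n, alpha) {
  U <- runif(n)                                                        # Step 1
  W <- runif(n)
  V <- ((W^(-alpha / (alpha + 1)) - 1) * U^(-alpha) + 1)^(-1/alpha)    # Step 2
  cbind(U = U, V = V)                                                  # Step 3
}
set.seed(1)
uv <- rclayton(500, alpha = 2)
plot(uv, main = "Clayton copula, alpha = 2")   # compare with Fig. 2.1
```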
Fig. 2.1 Scatter plots of (Ui , Vi ), i = 1, . . . , 500, generated from the Clayton copula (upper panels) and the respective contour plots of the copula density Cα[1, 1] (u, v) (lower panels)
Example 2.2 (Joe copula): The conditional distribution function is

Pr(V ≤ v | U = u) = C_α^[1,0](u, v) = {(1 − u)^α + (1 − v)^α − (1 − u)^α(1 − v)^α}^(1/α−1) {1 − (1 − v)^α} (1 − u)^(α−1).

The data-generating algorithm is
Step 1 Generate two independent variables U and W from Unif(0, 1).
Step 2 Solve W = {(1 − U)^α + (1 − V)^α − (1 − U)^α(1 − V)^α}^(1/α−1) {1 − (1 − V)^α} (1 − U)^(α−1) to obtain V.
Step 3 The pair (U, V) follows the Joe copula.
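Unlike the Clayton case, Step 2 has no closed-form inverse, so it can be solved numerically for each observation; the following is our own R sketch using uniroot().

```r
# Sketch (ours) of sampling from the Joe copula: Step 2 is solved numerically with uniroot().
cond_joe <- function(v, u, alpha) {                 # C^[1,0](u, v) for the Joe copula
  A <- (1 - u)^alpha + (1 - v)^alpha - (1 - u)^alpha * (1 - v)^alpha
  A^(1/alpha - 1) * (1 - (1 - v)^alpha) * (1 - u)^(alpha - 1)
}
rjoe <- function(n, alpha) {
  U <- runif(n); W <- runif(n); V <- numeric(n)     # Step 1
  for (i in 1:n)                                    # Step 2: solve W_i = C^[1,0](U_i, V_i)
    V[i] <- uniroot(function(v) cond_joe(v, U[i], alpha) - W[i],
                    interval = c(1e-10, 1 - 1e-10))$root
  cbind(U = U, V = V)                               # Step 3
}
set.seed(1)
uv <- rjoe(500, alpha = 3)
plot(uv, main = "Joe copula, alpha = 3")            # compare with Fig. 2.2
```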
Fig. 2.2 Scatter plots of (Ui , Vi ), i = 1, . . . , 500, generated from the Joe copula (upper panels) and the respective contour plots of the copula density Cα[1, 1] (u, v) (lower panels)
Figure 2.2 shows the scatter plots of 500 pairs of (U, V) generated by the above algorithm (Supplementary material 1 for the R codes). We observe the presence of an upper tail dependence (strong dependence at (U, V) ≈ (1, 1)), which is a feature of the Joe copula. The level of dependence increases as α increases. The copula density is

C_α^[1,1](u, v) = α{(1 − u)^α + (1 − v)^α − (1 − u)^α(1 − v)^α}^(1/α−1) (1 − u)^(α−1) (1 − v)^(α−1) + (α − 1){(1 − u)^α + (1 − v)^α − (1 − u)^α(1 − v)^α}^(1/α−2) {1 − (1 − u)^α}{1 − (1 − v)^α} (1 − u)^(α−1) (1 − v)^(α−1).
The contour plots of the copula density (lower panels of Fig. 2.2) agree with the observed features of the (U, V )’s.
Example 2.3 (Frank copula): The conditional distribution function is

Pr(V ≤ v | U = u) = C_α^[1,0](u, v) = e^(−αu)(e^(−αv) − 1) / {e^(−α) − 1 + (e^(−αu) − 1)(e^(−αv) − 1)}.
Then the data-generating algorithm is
Step 1 Generate two independent variables U and W from Unif(0, 1).
Step 2 Obtain V by V = −(1/α) log[(e^(−αU) − W e^(−αU) + W e^(−α)) / (e^(−αU) − W e^(−αU) + W)].
Step 3 The pair (U, V) follows the Frank copula.

Figure 2.3 shows the scatter plots of 500 pairs of (U, V) (Supplementary material 1 for the R codes). A negative value of α gives negative dependence while a positive value of α gives positive dependence between U and V. The copula density is
Fig. 2.3 Scatter plots of (Ui , Vi ), i = 1, . . . , 500, generated from the Frank copula (upper panels) and the respective contour plots of the copula density Cα[1, 1] (u, v) (lower panels)
C_α^[1,1](u, v) = α(1 − e^(−α)) e^(−αu) e^(−αv) / {e^(−α) − 1 + (e^(−αu) − 1)(e^(−αv) − 1)}².
See Appendix D for the derivation. The contour plots of the copula density (lower panels of Fig. 2.3) agree with the observed features of the (U, V )’s. Example 2.4 (Gaussian copula): The conditional distribution function is
Pr(V ≤ v | U = u) = C_ρ^[1,0](u, v) = Φ[{Φ⁻¹(v) − ρΦ⁻¹(u)} / √(1 − ρ²)].
See Appendix E for the derivation. The data-generating algorithm is Step 1 Generate X and Y from a bivariate normal distribution with E[X ] = E[Y ] = 0, V ar [X ] = V ar [Y ] = 1, and Cov(X, Y ) = ρ ∈ (−1, 1). Step 2 The pair (U, V ) follows the Gaussian copula by setting U = Φ(X ) and V = Φ(Y ).
Fig. 2.4 Scatter plots of (Ui , Vi ), i = 1, . . . , 500, generated from the Gaussian copula (upper panels) and the respective contour plots of the copula density Cρ[1, 1] (u, v) (lower panels)
For Step 1, a variety of simulation methods are available (Supplementary material 1 for the R codes). However, R users may simply use the mvrnorm(.) function from the MASS package. Figure 2.4 shows the scatter plots of 500 pairs of (U, V) (see Supplementary material 1 for the R codes). A negative value of ρ gives negative dependence while a positive value of ρ gives positive dependence. Under the Gaussian copula, Kendall's tau is given by τ = (2/π) arcsin(ρ) (Fang et al. 2002). The copula density is

C_ρ^[1,1](u, v) = [1/√(1 − ρ²)] exp[−{ρ²(Φ⁻¹(u)² + Φ⁻¹(v)²) − 2ρΦ⁻¹(u)Φ⁻¹(v)} / {2(1 − ρ²)}].
The contour plots of the copula density (lower panels of Fig. 2.4) agree with the observed features of (U, V )’s. Some authors suggest using the Gaussian copula for time series modeling (e.g., Alqawba et al. 2019) and biostatistical modeling (Suresh et al. 2019).
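As a quick check of the relation τ = (2/π) arcsin(ρ), one can simulate from the Gaussian copula as in Steps 1 and 2 and compare the sample Kendall's tau with its theoretical value; the sketch below is ours and assumes the MASS package for mvrnorm().

```r
# Sketch (ours): simulate from the Gaussian copula and check tau = (2/pi) * asin(rho).
library(MASS)
rho <- 0.6
set.seed(1)
xy <- mvrnorm(n = 2000, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2))  # Step 1
u <- pnorm(xy[, 1]); v <- pnorm(xy[, 2])                                     # Step 2
c(sample_tau = cor(u, v, method = "kendall"), theory = (2 / pi) * asin(rho))
```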
2.6 Copula-Based Markov Chain

This subsection is a review of Darsow et al. (1992) and Nelsen (2006), who developed and summarized copula-based Markov chain models. We first define the Markov process for the discrete time index:

Definition 2.2 (Markov process) A time series {Y_t : t = 1, ..., n} is called a Markov process if Pr(Y_t ≤ y | Y_{t₁}, ..., Y_{tₘ}) = Pr(Y_t ≤ y | Y_{tₘ}) for t₁ < t₂ < ··· < tₘ < t and m ≥ 2.

For the Markov process, one can verify the Chapman–Kolmogorov equation:

Pr(Y_t ≤ y | Y_{t₁}) = E[Pr(Y_t ≤ y | Y_{t₁}, Y_{t₂}) | Y_{t₁}] = ∫ Pr(Y_t ≤ y | y_{t₂}) d Pr(Y_{t₂} ≤ y_{t₂} | Y_{t₁})
for t1 < t2 < t. This equation means that the transition distribution of t1 → t is determined by the convolution of the two intermediate transition distributions of t1 → t2 and t2 → t. Hence, it suffices to specify the transition distributions for (t − 1) → t to determine all other transition distributions. Thus, we will consider how the transition distribution of Pr(Yt ≤ y|Yt−1 ) is modeled by a copula under the Markov process. The Chapman–Kolmogorov equation is a necessary but not sufficient condition for the time series to be a Markov process (Darsow et al. 1992). Darsow et al. (1992) first introduced the copula-based Markov processes for time series using copulas. For a Markov process {Yt : t = 1, . . . , n}, they considered a bivariate copula model Pr(Yt ≤ yt , Yt−1 ≤ yt−1 ) = Ct, t−1 {G t (yt ), G t−1 (yt−1 )},
(2.3)
where Gt(y) = Pr(Yt ≤ y) is the marginal distribution. The transition distribution function is
Pr(Yt ≤ yt | Yt−1 = yt−1) = Ct,t−1[0,1]{Gt(yt), Gt−1(yt−1)}.
If Yt ∼ Gt(yt) is continuous, the transition density function is derived as
∂Pr(Yt ≤ yt | Yt−1 = yt−1)/∂yt = Ct,t−1[1,1]{Gt(yt), Gt−1(yt−1)} gt(yt),
where gt(y) = dGt(y)/dy is the marginal density. By the Chapman–Kolmogorov equation, the transition density determines the whole probabilistic behavior of the Markov process {Yt : t = 1, . . . , n}. Darsow et al. (1992) considered the operator "∗" called the Markov product, defined by
Cs,x ∗ Cx,t(u, v) = ∫_0^1 Cs,x[1,0](u, t) Cx,t[0,1](t, v) dt,
where Cs,t is the copula for (Ys , Yt ), Cs,x is the copula for (Ys , Yx ), and C x,t is the copula for (Yx , Yt ). Under the copula-based Markov model (2.3), we have Cs,t = Cs,x ∗ C x,t ,
0 ≤ s < x < t ≤ n.
(2.4)
Equation (2.4) implies that the transition copula of s → t is determined by the two intermediate transition copulas of s → x and x → t. Furthermore, if the same copula C is imposed for the one-step transition of (t − 1) → t,
Cs,t = Cs,s+1 ∗ Cs+1,t = C ∗ C ∗ · · · ∗ C ((t − s) times),
which is the (t − s) fold *-product of C(·, ·). In practical applications of the copula-based Markov process, we often specify some parametric forms of C(·, ·) or G(·). Chen and Fan (2006) considered a semiparametric method by specifying the parametric form of C(·, ·), but without specifying the form of G(·); see also Li et al. (2019) and Zhang et al. (2020). Long and Emura (2014), Emura et al. (2017a), and Huang and Emura (2019) considered a fully parametric model with the normal distribution and the Clayton or Joe copula; see also the case of the t-distribution (Sun et al. 2018) and the normal mixture distribution (Lin et al. 2019). Huang et al. (2020a) studied the case of the binomial distribution for G(·) while Huang et al. (2020b) studied the case of the Weibull distribution. Many applications of the copula-based Markov processes are specifically seen in the context of statistical process control (Long and Emura 2014; Emura et al. 2017a; Huang and Emura 2019; Kim et al. 2019; Sonmez and Baray 2019) and finance (Domma et al. 2009; Sun et al. 2018; Lin et al. 2019), but rarely seen in survival analysis except for Huang et al. (2020b).
For all the parametric models mentioned above, modeling the forms of G(·) and C(·, ·) is the essential idea. Darsow et al. (1992) envisioned their ideas as follows: In our approach, one specifies a Markov process by giving all of the marginal distributions and a family of 2-copulas satisfying (3.7).1 Ours is accordingly an alternative approach to the study of Markov processes which is different in principle from the conventional one.
The second sentence in the above statement emphasizes the difference of the copula model from the conditional models, such as the first-order autoregressive AR(1) model Yt = ξ + ρYt−1 + εt ,
t = 1, . . . , n,
where −∞ < ξ < ∞, −1 < ρ < 1, εt ∼iid N(0, τ²), and τ² > 0. In the AR(1) model, the conditional distribution of Yt given Yt−1 is modeled instead of the marginal distribution of Yt, and the variance parameter τ² is not equal to Var(Yt) since the latter depends on the serial dependence parameter ρ. The advantage of the copula-based models is that the marginal distribution of Yt is modeled separately from the serial dependence structure. An example of the copula-based Markov chain models under the Clayton copula is
Pr(Yt ≤ yt, Yt−1 ≤ yt−1) = {G(yt)^{−α} + G(yt−1)^{−α} − 1}^{−1/α},
where Yt ∼ G(yt) = Φ{(yt − μ)/σ}, a normal distribution N(μ, σ²) with mean μ and variance σ². One can generate a time series by the following algorithm:
Step 1 Generate U1 from Unif(0, 1).
Step 2 Generate Wt from Unif(0, 1) and solve the equation
Wt = (Ut^{−α} + Ut−1^{−α} − 1)^{−1/α−1} Ut−1^{−α−1}
to obtain Ut for t = 2, . . . , n.
Step 3 Set Yt = G^{−1}(Ut) for t = 1, . . . , n.
This algorithm has been applied in several studies (Long and Emura 2014; Emura et al. 2017a; Huang and Emura 2019). Figure 2.5 shows the plots of Yt, t = 1, . . . , 500, generated from the above algorithm under α = 2 and α = 8 (see Supplementary material 1 for the R codes). The marginal distribution is the standard normal distribution, so that μ = E(Yt) = 0 and Var(Yt) = σ² = 1. We observe that the time series under α = 8 exhibits a higher degree of serial dependence than that under α = 2. For both time series, the plots are scattered around μ = 0 and bounded in the 3-σ interval [μ − 3σ, μ + 3σ].
¹ (3.7) should be read as (2.4) of our book.
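A minimal R sketch of the generation algorithm above (our own code, assuming a standard normal margin; Step 2 is solved by inverting the conditional distribution in closed form rather than by a numerical root search):

n <- 500; alpha <- 8; mu <- 0; sigma <- 1
U <- numeric(n)
U[1] <- runif(1)                                       # Step 1
for (t in 2:n) {
  W <- runif(1)                                        # Step 2: invert W = (Ut^-a + U[t-1]^-a - 1)^(-1/a-1) * U[t-1]^(-a-1)
  U[t] <- (U[t-1]^(-alpha) * (W^(-alpha/(alpha+1)) - 1) + 1)^(-1/alpha)
}
Y <- qnorm(U, mean = mu, sd = sigma)                   # Step 3: Y_t = G^{-1}(U_t)
plot(Y, type = "l")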
Fig. 2.5 The time series plots for Yt , t = 1, . . . , 500, generated under the Clayton copula model (with α = 2 and α = 8) and the standard normal marginal distribution
Appendix A: The Proof of C̄(u, v) = u + v − 1 + C(1 − u, 1 − v) Being a Copula
We verify the conditions (C1) and (C2) for C̄(u, v). Recall that C(u, v) is a copula by assumption. The condition (C1) holds since
C̄(u, 0) = u + 0 − 1 + C(1 − u, 1) = u − 1 + 1 − u = 0,
C̄(0, v) = 0 + v − 1 + C(1, 1 − v) = v − 1 + 1 − v = 0,
C̄(u, 1) = u + 1 − 1 + C(1 − u, 0) = u + 0 = u,
C̄(1, v) = 1 + v − 1 + C(0, 1 − v) = v + 0 = v.
The condition (C2) holds since
C̄(u2, v2) − C̄(u2, v1) − C̄(u1, v2) + C̄(u1, v1)
= u2 + v2 − 1 + C(1 − u2, 1 − v2) − [u2 + v1 − 1 + C(1 − u2, 1 − v1)] − [u1 + v2 − 1 + C(1 − u1, 1 − v2)] + u1 + v1 − 1 + C(1 − u1, 1 − v1)
= C(1 − u1, 1 − v1) − C(1 − u2, 1 − v1) − C(1 − u1, 1 − v2) + C(1 − u2, 1 − v2)
= C(ū1, v̄1) − C(ū2, v̄1) − C(ū1, v̄2) + C(ū2, v̄2) ≥ 0.
Here, ūi ≡ 1 − ui and v̄i ≡ 1 − vi for i = 1 and 2.
Appendix B: Proofs of Copulas Approaching the Independence Copula
We first consider the Clayton copula. As α → 0,
lim_{α→0} Cα(u, v) = lim_{α→0} exp{log(u^{−α} + v^{−α} − 1)^{−1/α}}
= exp[ lim_{α→0} {−log(u^{−α} + v^{−α} − 1)/α} ]
= exp[ lim_{α→0} {(u^{−α} log u + v^{−α} log v)/(u^{−α} + v^{−α} − 1)} ]   (L'Hopital's rule)
= exp[log u + log v] = uv.
The last expression is the independence copula. An alternative proof is to consider the generator φα(t) = (t^{−α} − 1)/α for α > 0. It follows that
lim_{α→0} φα(t) = lim_{α→0} (t^{−α} − t^{−0})/α = (d/dα) t^{−α} |_{α=0} = −log(t).
The last expression is the generator of the independence copula. Similarly, the Frank copula reduces to the independence copula because
lim_{α→0} φα(t) = lim_{α→0} −log{(e^{−αt} − 1)/(e^{−α} − 1)} = lim_{α→0} −log(t e^{−αt}/e^{−α}) = −log(t).
Appendix C: Derivations of Kendall's Tau
Kendall's tau for the Clayton copula:
τα = 1 − 4 ∫_0^∞ s { (d/ds)(1 + αs)^{−1/α} }² ds = 1 − 4 ∫_0^∞ s { −(1 + αs)^{−1/α−1} }² ds
= 1 − 4 ∫_0^∞ s (1 + αs)^{−2/α−2} ds = 1 − (4/α²) ∫_1^∞ (t − 1) t^{−2/α−2} dt
= 1 − (4/α²) [ −(α/2) t^{−2/α} + {α/(2 + α)} t^{−2/α−1} ]_1^∞
= α/(α + 2).
Kendall's tau for the Joe copula:
τα = 1 − 4 ∫_0^∞ s [ (∂/∂s){1 − (1 − e^{−s})^{1/α}} ]² ds
= 1 − 4 ∫_0^∞ s { (1/α)(1 − e^{−s})^{1/α−1} e^{−s} }² ds
= 1 − (4/α²) ∫_0^∞ s (1 − e^{−s})^{2/α−2} e^{−2s} ds.
Kendall's tau for the Frank copula: Kendall's tau can be derived by
τ = 4 ∫_0^1 ∫_0^1 C(u, v) dC(u, v) − 1
= 4 ∫_0^1 ∫_0^1 C(u, v) C[1,1](u, v) du dv − 1
= 4 ∫_0^1 [ {C(u, v) C[0,1](u, v)}|_{u=0}^{u=1} − ∫_0^1 C[1,0](u, v) C[0,1](u, v) du ] dv − 1
= 4 ∫_0^1 [ v − ∫_0^1 C[1,0](u, v) C[0,1](u, v) du ] dv − 1
= 2 − 4 ∫_0^1 ∫_0^1 C[1,0](u, v) C[0,1](u, v) du dv − 1
= 1 − 4 ∫_0^1 ∫_0^1 C[1,0](u, v) C[0,1](u, v) du dv
after integration by parts. According to Nelsen (1986), under the Frank copula,
Cα[1,0](u, v) = e^{−αu}(e^{−αv} − 1) / {e^{−α} − 1 + (e^{−αu} − 1)(e^{−αv} − 1)},
and similarly
Cα[0,1](u, v) = e^{−αv}(e^{−αu} − 1) / {e^{−α} − 1 + (e^{−αu} − 1)(e^{−αv} − 1)}.
Thus,
τ = 1 − 4 ∫_0^1 ∫_0^1 [ e^{−αu} e^{−αv}(e^{−αu} − 1)(e^{−αv} − 1) / {e^{−α} − 1 + (e^{−αu} − 1)(e^{−αv} − 1)}² ] du dv
= 1 − (4/α) [ 1 − (1/α) ∫_0^α t/(e^t − 1) dt ].
Nelsen (1986) provided the above derivation, but the derivation of the last equality is unclear to us. To prove the last equality, we give a method of numerical verification by R codes.

# left-hand side: the double integral
alpha = 10
InnerFunc = function(u, v) {
  ( exp(-alpha*v) * (exp(-alpha*u)-1) * exp(-alpha*u) * (exp(-alpha*v)-1) ) /
  ( exp(-alpha)-1+(exp(-alpha*u)-1)*(exp(-alpha*v)-1) )^2
}
InnerIntegral = function(x) {
  sapply(x, function(z) { integrate(InnerFunc, 0, 1, v=z)$value })
}
1-4*integrate(InnerIntegral, 0, 1)$value
# right-hand side: the single integral
dfunc = function(t){ t/(exp(t)-1) }
1-4/alpha*(1-1/alpha*integrate(dfunc, 0, alpha)$value)
No matter how we choose the value of “alpha,” the two numerical integrations give the identical value.
Appendix D: Derivation of Cα[1,1](u, v) Under the Frank Copula
Pr(V ≤ v|U = u) = Cα[1,0](u, v) = e^{−αu}(e^{−αv} − 1) / {e^{−α} − 1 + (e^{−αu} − 1)(e^{−αv} − 1)}.
Cα[1,1](u, v) = ∂Cα[1,0](u, v)/∂v
= [ ∂/∂v{e^{−αu}(e^{−αv} − 1)} · {e^{−α} − 1 + (e^{−αu} − 1)(e^{−αv} − 1)} − {e^{−αu}(e^{−αv} − 1)} · ∂/∂v{e^{−α} − 1 + (e^{−αu} − 1)(e^{−αv} − 1)} ] / {e^{−α} − 1 + (e^{−αu} − 1)(e^{−αv} − 1)}²
= [ −αe^{−αu}e^{−αv}{e^{−α} − 1 + (e^{−αu} − 1)(e^{−αv} − 1)} + αe^{−αu}e^{−αv}(e^{−αu} − 1)(e^{−αv} − 1) ] / {e^{−α} − 1 + (e^{−αu} − 1)(e^{−αv} − 1)}²
= α(1 − e^{−α})e^{−αu}e^{−αv} / {e^{−α} − 1 + (e^{−αu} − 1)(e^{−αv} − 1)}².
Appendix E: Derivation of Cρ[1,0](u, v) Under the Gaussian Copula
Let X ≡ Φ −1 (U ) and Y ≡ Φ −1 (V ). Then, X and Y jointly follow a bivariate normal distribution with E[X ] = E[Y ] = 0, V ar [X ] = V ar [Y ] = 1, and Cov(X, Y ) = ρ ∈ (−1, 1). By the property of the bivariate normal distribution, Y |X = x ∼ N (ρx, 1 − ρ 2 ). Thus,
Pr(Y ≤ y|X = x) = ∫_{−∞}^{y} [1/√{2π(1 − ρ²)}] exp{ −(t − ρx)²/(2(1 − ρ²)) } dt.
The expression can be re-expressed as
Pr(V ≤ v|U = u) = ∫_{−∞}^{ {Φ^{−1}(v) − ρΦ^{−1}(u)}/√(1 − ρ²) } (1/√(2π)) exp(−z²/2) dz = Φ( {Φ^{−1}(v) − ρΦ^{−1}(u)}/√(1 − ρ²) ),
where the transformation z = (t − ρx)/√(1 − ρ²) is applied.
References Alqawba M, Diawara N, Chaganty NR (2019) Zero-inflated count time series models using Gaussian copula. Seq Analy 38(3):342–357 Chen X, Fan Y (2006) Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification. J Econometr 135(1–2):125–154 Clayton DG (1978) A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika 65(1):141–151 Darsow WF, Nguyen B, Olsen ET (1992) Copulas and Markov processes. Illinois J Math 36(4):600– 642 Domma F, Giordano S, Francesco PP (2009) Statistical modeling of temporal dependence in financial data via a copula function. Commun Statist Simulat Comput 38:703–728 Durante F, Sempi C (2016) Principles of copula theory: chapman and Hall/CRC Emura T, Wang W, Hung HN (2011) Semi-parametric inference for copula models for dependently truncated data. Statist Sinica 21:349–367 Emura T, Chen YH (2016) Gene selection for survival data under dependent censoring, a copulabased approach. Statist Methods Med Res 25(6):2840–2857 Emura T, Long T-H, Sun L-H (2017a) Routines for performing estimation and statistical process control under copula-based time series models. Commun Statist Simulat Comput 46(4):3067– 3087 Emura T, Nakatochi M, Murotani K, Rondeau V (2017b) A joint frailty-copula model between tumour progression and death for meta-analysis. Statist Methods Med Res 26(6):2649–2666 Emura T, Nakatochi M, Matsui S, Michimae H, Rondeau V (2018) Personalized dynamic prediction of death according to tumour progression and high-dimensional genetic factors: meta-analysis with a joint model. Statist Methods Med Res 27(9):2842–2858 Emura T, Matsui S, Chen HY (2019a) Compound.Cox: univariate feature selection and compound covariate for predicting survival. Comput Methods Prog Biomed 168:21–37 Emura T, Matsui S, Rondeau V (2019b) Survival analysis with correlated endpoints, joint frailtycopula models, JSS research series in statistics, Springer Emura T, Shih JH, Ha ID, Wilke RA (2019c) Comparison of the marginal hazard model and the sub-distribution hazard model for competing risks under an assumed copula. Statist Methods Med Res. http://doi.org/10.1177/0962280219892295 Emura T, Chen YH (2018) Analysis of survival data with dependent censoring, copula-based approaches, JSS research series in statistics, Springer
Fang HB, Fang KT, Kotz S (2002) The meta-elliptical distributions with given marginals. J Multi Analy 82(1):1–16 Frank MJ (1979) On the simultaneous associativity of F(x, y) and x + y – F(x, y). Aequationes Math 19:194–226 Genest C, MacKay RJ (1986) Copules archimédiennes et families de lois bidimensionnelles dont les marges sont données. Canadian J Statist 14(2):145–159 Huang X-W, Emura T (2019) Model diagnostic procedures for copula-based Markov chain models for statistical process control. Commun Statist Simulat Comput. https://doi.org/10.1080/036 10918.2019.1602647 Huang X-W, Chen WR, Emura T (2020a) Likelihood-based inference for a copula-based Markov chain model with binomial time series, submitted Huang X.-W., Wang W, Emura T (2020b). A copula-based Markov chain model for serially dependent event times with a dependent terminal event, Japanese J Stat Data Sci, in revision Joe H (1993) Parametric families of multivariate distributions with given margins. J Multi Analy 46(2):262–282 Joe H (2015) Dependence modeling with copulas. Chapman and Hall/CRC Kendall MG (1948) Rank correlation methods. Griffin, Oxford, England Kim JM, Baik J, Reller M (2019) Control charts of mean and variance using copula Markov SPC and conditional distribution by copula. Commun Statist Simul Comput. https://doi.org/10.1080/ 03610918.2018.1547404 Li F, Tang Y, Wang HJ (2019) Copula-based semiparametric analysis for time series data with detection limits. Canadian J Statist 47(3):438–454 Lin WC, Emura T, Sun LH (2019) Estimation under copula-based Markov normal mixture models for serially correlated data. Commun Statist Simulat Comput. https://doi.org/10.1080/03610918. 2019.1652318 Long T-H, Emura T (2014) A control chart using copula-based Markov chain models. J Chinese Statist Assoc 52(4):466–496 Mastrangelo CM, Montgomery DC (1995) SPC with correlated observations for the chemical and process industries. Qual Reliability Eng Int 11(2):79–89 Nelsen RB (1986) Properties of a one-parameter family of bivariate distributions with specified marginals. Commun Statist Theory Methods 15(11):3277–3285 Nelsen RB (2006) An introduction to copulas: Springer Science & Business Media Rotolo F, Legrand C, Van Keilegom I (2013) A simulation procedure based on copulas to generate clustered multi-state survival data. Comput Methods Prog Biomed 109(3):305–312 Rotolo F, Paoletti X, Michiels S (2018) surrosurv: an R package for the evaluation of failure time surrogate endpoints in individual patient data meta-analyses of randomized clinical trials. Comput Methods Prog Biomed 155:189–198 Stoeber J, Joe H, Czado C (2013) Simplified pair copula constructions—limitations and extensions. J Multi Analy 119:101–118 Suresh K, Taylor JM, Tsodikov A (2019) A Gaussian copula approach for dynamic prediction of survival with a longitudinal biomarker. Biostatistics. https://doi.org/10.1093/biostatistics/kxz049 Sklar M (1959) Fonctions de repartition an dimensions et leurs marges. Publications de l’Institut de Statistique de l’Université de Paris 8:229–231 Sonmez OE, Baray A (2019) On copula based serial dependence in statistical process control. In: Industrial engineering in the big data Era (pp 127–136). Springer, Cham Sun LH, Lee CS, Emura T (2018) A Bayesian inference for time series via copula-based Markov chain models. Commun Statist Simulat Comput. 
https://doi.org/10.1080/03610918.2018.1529241 Wu BH, Michimae H, Emura T (2020) Meta-analysis of individual patient data with semi-competing risks under the Weibull joint frailty-copula model. Comput Statist. https://doi.org/10.1007/s00180-020-00977-1 Zhang S, Zhou QM, Lin H (2020) Goodness-of-fit test of copula functions for semi-parametric univariate time series models. Statistical papers. https://doi.org/10.1007/s00362-019-01153-4
Chapter 3
Estimation, Model Diagnosis, and Process Control Under the Normal Model
Abstract This chapter introduces statistical methods for copula-based Markov models under the normal margin. First, the data structures and the idea of statistical process control are reviewed. The copula-based Markov models and essential assumptions are introduced as well. Next, we derive the likelihood functions under the first-order and the second-order Markov models and define the maximum likelihood estimators (MLEs). We then give the asymptotic properties of the MLEs. We propose goodness-of-fit methods to test the model assumptions based on a given dataset. In addition, a copula model selection method is discussed. We introduce an R package Copula.Markov to implement the statistical methods of this chapter. Finally, we analyze three real datasets for illustration. Keywords Asymptotic theory · Autoregressive model · Copula · Goodness-of-fit test · Maximum likelihood estimation · Markov chain · Normal distribution · Statistical process control
3.1 Serial Dependence, Statistical Process Control, and Copulas Researchers may collect data on a measurement of interest at different time points. The resultant dataset yields a time series dataset, where each measurement is recorded together with the time index. If two adjacent time points are close to each other, the measurements may be dependent. In particular, observed data collected in daily manufacturing processes are often dependent in the sense that the present sampling condition depends on the past ones. Thus, modeling dependence in time series data plays a crucial role in statistical process control (SPC) (Mastrangelo and Montgomery 1995; Montgomery 2009) and other objectives. Let {Y t : t = 1, …, n} be data collected on n different time points. In some cases, an unusually high (or low) value of Y t −1 may influence the next value of Y t (Bisgaard and Kulahci 2007). While the major goal of SPC is to monitor the marginal process parameters (mean and SD of Y t ), the dependence parameter
(e.g., cor(Y t −1, Y t ) largely influences long-term process performance of SPC, such as the average run length (ARL). Control charts provide a tool to detect out-of-control signals in observations {Y t : t = 1, …, n}. The three-sigma control chart √ consists of the center μ = E(Yt ) and the control limits μ ± 3σ , where σ = V ar (Yt ). If the parameters (μ, σ ) are unknown, one can use the estimators of lower control limit (LCL) and upper control limit (UCL), as μˆ − 3σˆ and μˆ + 3σˆ , respectively. Out-of-control signals are detected when Yt > UCL, or Yt < LCL. Control charts usually display the plot of {Y t : t = 1, …, n} together with the control limits (Montgomery 2009). A solid overview of serial dependence models in SPC is found in Wieringa (1999) and Knoth and Schmid (2004), while a concise review is seen in Box and Narasimhan (2010). The literature focuses on the first-order (Markov) models, including a first-order autoregressive AR(1) and a first-order moving average MA(1). These traditional models can only deal with linear dependence between two observations. For instance, the AR(1) model applies a linear structure Yt = ξ + ρYt−1 + εt ,
t = 1, . . . , n,
where −∞ < ξ < ∞, −1 < ρ < 1, εt ∼iid N(0, τ 2 ), and τ 2 > 0. The conditional distribution for Yt |Yt−1 is modeled instead of the marginal distribution for Yt . The variance parameter τ 2 is not equal to Var(Yt ) since the latter depends on the serial dependence parameter ρ. The advantage of the copula-based models is that the marginal distribution for Yt is separately modeled from the serial dependence structure. Long and Emura (2014) considered a copula-based Markov chain model to perform SPC for serially correlated data. In their model, serial dependence between a pair of consecutive observations (Yt−1 , Yt ) is modeled as Pr(Yt ≤ yt , Yt−1 ≤ yt−1 ) = C{G(yt ), G(yt−1 ) },
(3.1)
where C : [ 0, 1 ]2 → [ 0, 1 ] is a copula (Nelsen 2006) and G(y) = Pr(Yt ≤ y) is the marginal (stationary) distribution function. Note that the model (3.1) itself was originally proposed by Darsow et al. (1992), and subsequently applied to different statistical problems by Joe (1997), Chen and Fan (2006), Domma et al. (2009), Sun et al. (2018), Lin et al. (2019), Huang and Emura (2019), Huang et al. (2020a, b), Kim et al. (2019), Sonmez and Baray (2019), Li et al. (2019) and Zhang et al (2020). The computer tools were only recently available through the R package Copula.Markov (Emura et al. 2017a).
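As a concrete illustration of the control limits described above, the following naive R sketch (ours) estimates (μ, σ) by the sample mean and SD (the copula-based estimators of the following sections would replace them) and flags out-of-control signals; Y is an assumed numeric time series vector:

mu.hat <- mean(Y); sigma.hat <- sd(Y)
UCL <- mu.hat + 3 * sigma.hat; LCL <- mu.hat - 3 * sigma.hat
plot(Y, type = "b", ylim = range(c(Y, LCL, UCL)))
abline(h = c(LCL, mu.hat, UCL), lty = c(2, 1, 2))   # control limits and center line
which(Y > UCL | Y < LCL)                            # indices of out-of-control signals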
3.2 Model and Likelihood We first review the normal distribution model for SPC under a copula-based Markov chain model as previously studied by Long and Emura (2014), Emura et al. (2017a), and Huang and Emura (2019). For observations {Y t : t = 1, …, n} we assume the Markov property Pr(Yt ≤ yt |Yt−1 = yt−1 , Yt−2 = yt−2 , . . .) = Pr(Yt ≤ yt |Yt−1 = yt−1 ) ∀t, and a bivariate copula model Pr(Yt ≤ yt , Yt−1 ≤ yt−1 ) = Cα {G(yt ), G(yt−1 ) },
(3.2)
where Cα is a copula (Nelsen 2006), α is the dependence parameter, and G(y) = Φ{(y − μ)/σ}, where
Φ(z) = ∫_{−∞}^{z} (1/√(2π)) exp(−x²/2) dx
is the distribution function of N(0, 1). We have the marginal mean μ = E(Yt) and SD σ = √Var(Yt). The reason for choosing the normal margin is the remarkable popularity of the three-sigma rule μ ± 3σ in SPC. We mainly focus on the one-parameter Clayton copula defined as
Cα(u2, u1) = (u2^{−α} + u1^{−α} − 1)^{−1/α},
(3.3)
where α > 0 is related to Kendall’s tau between Yt−1 and Yt via τ = α/(α + 2); see Chap. 2 for the details. Many statistical models and software packages in biostatistics focus on the Clayton copula due to its ease of conducting simulation (e.g., Rotolo et al. (2013)), estimation (e.g., Emura et al. (2017b), Rotolo et al. (2018), Emura et al. (2019b)), feature selection (Emura and Chen (2016); Emura et al. (2019a)), and prediction (e.g., Emura et al. (2018)). Due to its simplicity, we use the Clayton copula as our illustration of the subsequent methods. However, the Clayton copula is not always the best choice for a given dataset, and hence the issue of model selection will also be discussed. Under the models (3.2) and (3.3), Long and Emura (2014) considered the likelihood function L(μ, σ, α) =
∏_{t=1}^{n} g(yt) × ∏_{t=2}^{n} Cα[1,1]{G(yt), G(yt−1)}
for given {Y t : t = 1, …, n}. The corresponding log-likelihood is given by
ℓ(μ, σ, α) = Σ_{t=1}^{n} log g(yt) + Σ_{t=2}^{n} log Cα[1,1]{G(yt), G(yt−1)},
where g(yt) = ∂G(yt)/∂yt and Cα[1,1](u2, u1) = ∂²Cα(u2, u1)/∂u2∂u1. Under the Clayton copula,
log Cα[1,1](u2, u1) = log(1 + α) − (1 + α) log u1 − (1 + α) log u2 − (1/α + 2) log(u2^{−α} + u1^{−α} − 1), α > 0.
Under the Joe copula Cα(u2, u1) = 1 − {(1 − u2)^α + (1 − u1)^α − (1 − u2)^α(1 − u1)^α}^{1/α},
log Cα[1,1](u2, u1) = log{α − 1 + Aα(u2, u1)} + (α − 1) log(1 − u1) + (α − 1) log(1 − u2) + (1/α − 2) log Aα(u2, u1), α ≥ 1,
where Aα(u2, u1) = (1 − u2)^α + (1 − u1)^α − (1 − u2)^α(1 − u1)^α.
We extend the likelihood function to the second-order Markov chain model. The second-order Markov chain model is more difficult to interpret for SPC users, but it can fit some real datasets well. Consequently, the second-order model is an attractive alternative to the first-order (Markov) model, and it even provides a tool for checking the Markov property. For a sequence {Yt: t = 1, …, n}, the conditional densities under the second-order model are
g(yt | yt−1, . . . , y1) = g(yt | yt−1, yt−2) = ∂Pr(Yt ≤ yt | Yt−1 = yt−1, Yt−2 = yt−2)/∂yt.
Hence, the probabilistic model for the sequence is specified by the joint distribution of three adjacent variables. We impose a trivariate copula function
Pr(Yt ≤ yt, Yt−1 ≤ yt−1, Yt−2 ≤ yt−2) = Cα{G(yt), G(yt−1), G(yt−2)},
where Cα: [0, 1]³ → [0, 1] is a trivariate copula, α is the dependence parameter, and G(y) = Φ{(y − μ)/σ}. Note that μ = E(Yt) and σ = √Var(Yt). For instance, the trivariate Clayton copula is defined as
Cα(u3, u2, u1) = (u3^{−α} + u2^{−α} + u1^{−α} − 2)^{−1/α},
where α > 0 describes the correlation between Yt−2 and Yt−1 , the correlation between Yt−2 and Yt , and the correlation between Yt−1 and Yt . While the model imposes a strong symmetric correlation structure, one important reason of using the trivariate
Clayton copula is its simple mathematical form allowing an explicit data-generation scheme (e.g., Rotolo et al. (2013)). To derive the MLE (μ̂, σ̂, α̂), we use the conditional densities
g(yt | yt−1, yt−2) = [ Cα[1,1,1]{G(yt), G(yt−1), G(yt−2)} / Cα[0,1,1]{1, G(yt−1), G(yt−2)} ] g(yt) for t ≥ 3,
and g(y2 | y1) = Cα[0,1,1]{1, G(y2), G(y1)} g(y2), where
Cα[1,1,1](ut, ut−1, ut−2) = ∂³Cα(ut, ut−1, ut−2)/∂ut∂ut−1∂ut−2,
Cα[0,1,1](ut, ut−1, ut−2) = ∂²Cα(ut, ut−1, ut−2)/∂ut−1∂ut−2.
The likelihood function given by Huang and Emura (2019) is
L(μ, σ, α) = Cα[0,1,1]{1, G(y2), G(y1)} × ∏_{t=3}^{n} [ Cα[1,1,1]{G(yt), G(yt−1), G(yt−2)} / Cα[0,1,1]{1, G(yt−1), G(yt−2)} ] × ∏_{t=1}^{n} g(yt).
The corresponding log-likelihood function is expressed as
ℓ(μ, σ, α) = log Cα[0,1,1]{1, G(y2), G(y1)} + Σ_{t=3}^{n} log Cα[1,1,1]{G(yt), G(yt−1), G(yt−2)} − Σ_{t=3}^{n} log Cα[0,1,1]{1, G(yt−1), G(yt−2)} + Σ_{t=1}^{n} log g(yt).
For instance, the log-likelihood based on the Clayton copula can be expressed as
ℓ(μ, σ, α) = (n − 2) log(1 + 2α) + log(1 + α)
− (1/α + 3) Σ_{t=3}^{n} log{G(yt)^{−α} + G(yt−1)^{−α} + G(yt−2)^{−α} − 2}
+ (1/α + 2) Σ_{t=4}^{n} log{G(yt−1)^{−α} + G(yt−2)^{−α} − 1}
− (α + 1) Σ_{t=1}^{n} log G(yt) + Σ_{t=1}^{n} log g(yt).
The MLE for both the first-order model and the second-order model is defined as
(α̂, μ̂, σ̂) = argmax_{(α, μ, σ) ∈ Θ} ℓ(α, μ, σ).
Here Θ = (0, ∞) × (−∞, ∞) × (0, ∞). When computing the MLE and SE, the constraints on the parameters, such as σ ∈ (0, ∞), have to be taken into account. One can remove the constraints by an appropriate transformation (MacDonald 2014). In the Clayton copula models, we use the transformations S = log(σ) and A = log(α) such that (A, μ, S) ∈ R³, where R ≡ (−∞, ∞). The transformed log-likelihood function is written as
ℓ̃(A, μ, S) = ℓ{exp(A), μ, exp(S)} = ℓ(α, μ, σ).
The MLE of the transformed parameters is
(Â, μ̂, Ŝ) = argmax_{(A, μ, S) ∈ R³} ℓ̃(A, μ, S).
The MLE of the original parameters is obtained by the inverse transformations.
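The following is a minimal R sketch (ours, not the Copula.Markov implementation) of this maximization under the first-order Clayton model with a normal margin, using the transformed parameters A = log(α), S = log(σ) and the subroutine nlm(.); Y is an assumed numeric time series:

negloglik <- function(par, y) {
  alpha <- exp(par[1]); mu <- par[2]; sigma <- exp(par[3])   # back-transform (A, mu, S)
  n  <- length(y)
  u  <- pnorm(y, mean = mu, sd = sigma)                      # u_t = G(y_t)
  lg <- dnorm(y, mean = mu, sd = sigma, log = TRUE)          # log g(y_t)
  u2 <- u[2:n]; u1 <- u[1:(n - 1)]
  lc <- log(1 + alpha) - (1 + alpha) * (log(u1) + log(u2)) -
        (1/alpha + 2) * log(u1^(-alpha) + u2^(-alpha) - 1)   # log C_alpha^[1,1]
  -(sum(lg) + sum(lc))                                       # negative log-likelihood
}
fit <- nlm(negloglik, p = c(log(1), mean(Y), log(sd(Y))), y = Y, hessian = TRUE)
alpha.hat <- exp(fit$estimate[1]); mu.hat <- fit$estimate[2]; sigma.hat <- exp(fit$estimate[3])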
3.3 Asymptotic Properties
For the first-order model, we define the 3 × 3 matrix
J(A, μ, S) ≡ −∂² log g(y2|y1) / ∂(A, μ, S)∂(A, μ, S)ᵀ,
whose (j, k)-th element is −∂² log g(y2|y1)/∂θj∂θk with (θ1, θ2, θ3) = (A, μ, S).
Alternatively, for the second-order model, we define
J(A, μ, S) ≡ −∂² log g(y3|y2, y1) / ∂(A, μ, S)∂(A, μ, S)ᵀ.
We then define I(A, μ, S) ≡ E{J(A, μ, S)} as the information matrix. The observed information matrix is obtained from the second derivatives of the log-likelihood,
J(Â, μ̂, Ŝ) = −∂² ℓ̃(A, μ, S) / ∂(A, μ, S)∂(A, μ, S)ᵀ, evaluated at (A, μ, S) = (Â, μ̂, Ŝ).
In order to examine whether (Â, μ̂, Ŝ) is a maximum of the log-likelihood, one may check the negative definiteness of the Hessian matrix −J(Â, μ̂, Ŝ). That is, all the eigenvalues of the Hessian matrix must be negative (see Theorem 7.7.1 of Khuri (2003, p. 284)). Under some regularity conditions, the result of Billingsley (1961) may be applied to show
J(Â, μ̂, Ŝ)/n →p I(A, μ, S) as n → ∞.
Furthermore, one has the asymptotic normality
√n (Â − A, μ̂ − μ, Ŝ − S)ᵀ →d N(0, I⁻¹(A, μ, S)) as n → ∞.
Hence, for large samples, {I⁻¹(A, μ, S)/n}k,k ≈ {J⁻¹(Â, μ̂, Ŝ)}k,k, where k = 1, 2, 3. Therefore, we have the large-sample approximations:
Â ∼ N(A, {J⁻¹(Â, μ̂, Ŝ)}1,1),
μ̂ ∼ N(μ, {J⁻¹(Â, μ̂, Ŝ)}2,2),
Ŝ ∼ N(S, {J⁻¹(Â, μ̂, Ŝ)}3,3).        (3.4)
One can obtain the standard errors (SEs) for α̂, μ̂, and σ̂ as well as the 95% confidence intervals (CIs) for α, μ, and σ. For instance, we construct the 95% CI for μ by μ̂ ± 1.96 × SE(μ̂), where SE(μ̂) = √{J⁻¹(Â, μ̂, Ŝ)}2,2. Similarly, by applying the delta method to Eq. (3.4), the 95% CI for σ is σ̂ exp{±1.96 × SE(σ̂)/σ̂}, where SE(σ̂) = σ̂ √{J⁻¹(Â, μ̂, Ŝ)}3,3. For the Clayton copula, by applying the delta method to Eq. (3.4), we construct the 95% CI for α by α̂ exp{±1.96 × SE(α̂)/α̂}, where SE(α̂) = α̂ √{J⁻¹(Â, μ̂, Ŝ)}1,1.
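Continuing the nlm(.) sketch given in Sect. 3.2, the SEs and 95% CIs can be read off the observed information of (A, μ, S); because negloglik() is the negative log-likelihood, fit$hessian is exactly J(Â, μ̂, Ŝ):

J.inv <- solve(fit$hessian)                            # inverse observed information
se <- sqrt(diag(J.inv))                                # SEs of (A.hat, mu.hat, S.hat)
ci.mu    <- mu.hat + c(-1, 1) * 1.96 * se[2]
ci.sigma <- sigma.hat * exp(c(-1, 1) * 1.96 * se[3])   # delta method on S = log(sigma)
ci.alpha <- alpha.hat * exp(c(-1, 1) * 1.96 * se[1])   # delta method on A = log(alpha)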
3.4 Goodness-of-Fit Tests
Recall that the aforementioned models rely on some distributional assumptions, such as the normal marginal model and the Clayton copula function. The effect of violating the normality assumption for independent data has been noticed by Albers and Kallenberg (2007). Under the copula-based model for correlated data, the model assumptions are even more stringent. Therefore, we have three model assumptions to be addressed:
(i) Markov property: Pr(Yt ≤ yt | Yt−1 = yt−1, Yt−2 = yt−2, . . .) = Pr(Yt ≤ yt | Yt−1 = yt−1) for ∀t;
(ii) Marginal distribution: G(y) = Φ{(y − μ)/σ} for ∃(μ, σ);
(iii) Copula form: Pr(Yt ≤ yt, Yt−1 ≤ yt−1) = Cα{G(yt), G(yt−1)} for ∃α.
Therefore, the main goal of this section is to present model diagnostic procedures to examine (i)–(iii). To examine (i), we propose a model comparison approach with the second-order Markov chain model. To examine (ii), we propose significance tests based on the Kolmogorov–Smirnov and the Cramér–von Mises statistics with the aid of a parametric bootstrap. We propose to check (iii) by comparing the goodness of fit between the Clayton copula and the Joe copula, and choosing the better one to perform SPC. We provide all the computing codes in the R package Copula.Markov. To develop a goodness-of-fit test procedure, we set a null hypothesis
H0: Pr(Yt ≤ y) = Φ{(y − μ)/σ} for ∃(μ, σ),
against an alternative hypothesis
H1: Pr(Yt ≤ y) ≠ Φ{(y − μ)/σ} for ∀(μ, σ).
Let Gn(y) = Σ_{t=1}^{n} I{Yt ≤ y}/n be the empirical distribution function. If the model is correct, the parametric estimator Φ{(y − μ̂)/σ̂} and the nonparametric estimator Gn(y) converge to the true value (Chen and Fan 2006; Long and Emura 2014). If the model is wrong, the two estimators converge to different values. Thus, we propose a Kolmogorov–Smirnov-type statistic
KS = max_t | Gn(yt) − Φ{(yt − μ̂)/σ̂} |,
and a Cramér–von Mises-type statistic
CvM = n ∫ [Gn(y) − Φ{(y − μ̂)/σ̂}]² dGn(y) = Σ_t [Gn(yt) − Φ{(yt − μ̂)/σ̂}]²,
to detect the departure of the fitted model from the underlying true model.
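A small R helper (our own sketch) that evaluates both statistics for a fitted normal margin:

gof.stats <- function(y, mu.hat, sigma.hat) {
  Gn   <- ecdf(y)                                    # empirical distribution function G_n
  Fhat <- pnorm(y, mean = mu.hat, sd = sigma.hat)    # parametric estimator
  c(KS = max(abs(Gn(y) - Fhat)), CvM = sum((Gn(y) - Fhat)^2))
}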
We suggest a parametric bootstrap method to obtain the P-value of the test.
The Goodness-of-Fit Test with Parametric Bootstrap
Step 1 Generate a Markov time series {Yt(b): t = 1, …, n} under H0 with the estimated parameters (μ̂, σ̂, α̂) for each b = 1, 2, . . . , B.
Step 2 Compute the MLE (μ̂(b), σ̂(b), α̂(b)), the parametric estimator Φ{(y − μ̂(b))/σ̂(b)}, and the nonparametric estimator Gn(b)(y) from the data {Yt(b): t = 1, . . . , n}. Then, compute the corresponding statistic KS(b) or CvM(b) for each b = 1, 2, . . . , B.
Step 3 The P-value of the test is calculated as Σ_{b=1}^{B} I(KS(b) ≥ KS)/B or Σ_{b=1}^{B} I(CvM(b) ≥ CvM)/B. Reject H0 if the P-value is less than a specified significance level P; otherwise, accept H0.
In conjunction with the test results, a graphical diagnostic procedure is useful: plot Φ{(Yt − μ̂)/σ̂} against Gn(Yt). If the plot bends away from the diagonal line, this indicates evidence that the fitted model is not a good choice. Figure 3.1 shows three plots, one for correct data and the other two for contaminated data. We see that the plot lies almost perfectly on the diagonal for the correct data, while the plots bend away from the diagonal line for the contaminated data.
Fig. 3.1 Diagnostic plots made by the correct data (left), 10% contaminated data (center), and 20% contaminated data (right). The data are generated from the Clayton model with the normal margins under μ = 1, σ = 1, and α = 2 (τ = 0.5), randomly replacing 10% (or 20%) of the data by the outlier μ + 3σ = 4
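A sketch of the bootstrap P-values, reusing gof.stats() and negloglik() from the earlier sketches; here Clayton.Markov.DATA(.) is assumed to take the same arguments (n, mu, sigma, alpha) as its second-order counterpart shown in Sect. 3.6:

library(Copula.Markov)
B <- 200
obs <- gof.stats(Y, mu.hat, sigma.hat)            # statistics on the original data
boot <- replicate(B, {
  Yb <- Clayton.Markov.DATA(n = length(Y), mu = mu.hat, sigma = sigma.hat, alpha = alpha.hat)
  fb <- nlm(negloglik, p = c(log(alpha.hat), mu.hat, log(sigma.hat)), y = Yb)
  gof.stats(Yb, fb$estimate[2], exp(fb$estimate[3]))
})
p.KS  <- mean(boot["KS", ]  >= obs["KS"])         # Step 3: bootstrap P-values
p.CvM <- mean(boot["CvM", ] >= obs["CvM"])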
Table 3.1 The rejection rates of the goodness-of-fit tests based on 200 repetitions under μ = 1, σ = 1, and α = 2

Sample size                                n = 300              n = 600              n = 1000
Nominal significance level (P)        0.01  0.05  0.10     0.01  0.05  0.10     0.01  0.05  0.10
Without outliers               KS     0.01  0.03  0.10     0.00  0.03  0.12     0.01  0.04  0.10
                               CvM    0.00  0.04  0.10     0.00  0.01  0.14     0.00  0.04  0.10
Outliers with rate = 10%,      KS     0.00  0.00  0.17     0.02  0.21  0.41     0.08  0.54  0.84
size = μ + 3σ                  CvM    0.00  0.01  0.15     0.01  0.19  0.53     0.10  0.66  0.91
Outliers with rate = 10%,      KS     0.56  0.93  0.98     0.87  0.99  1.00     0.98  1.00  1.00
size = μ + 6σ                  CvM    0.53  0.89  0.97     0.88  1.00  1.00     0.98  1.00  1.00
Outliers with rate = 20%,      KS     0.27  0.86  0.92     0.74  0.97  0.99     0.92  0.99  1.00
size = μ + 3σ                  CvM    0.27  0.88  0.92     0.81  0.93  0.94     0.92  0.95  0.97

KS = Kolmogorov–Smirnov statistic; CvM = Cramér–von Mises statistic
We conducted simulations to examine the type I error rates and power for the proposed goodness-of-fit test. First, we generated data from the Clayton model with the normal margins under μ = 1, σ = 1, and α = 2 (τ = 0.5). For each data generated, we performed the parametric bootstrap tests, and then examined the rejection rates (the number of rejections among 200 repetitions) under three nominal significance levels, P = 0.01, 0.05, and 0.10. As shown in Table 3.1, if the model is correct, the rejection rates are close to the nominal levels. However, if the model is contaminated by randomly replacing 10% of the data by the outlier μ + 3σ = 4, the rejection rates increase. The rejection rates further increase by increasing the location of outliers to μ + 6σ = 7 or by increasing the contamination rates to 20%. In conclusion, the proposed goodness-of-fit test has a desirable type I error and reasonable power rates.
3.5 Model Selection
We propose a model selection method by comparing the first-order and second-order models, and then choosing the one that fits better (i.e., gives a higher maximized log-likelihood).
Table 3.2 The rate of choosing the model between the first-order and second-order Markov models. The data was generated by the Clayton copula and N(1, 1)

                                       n = 300                   n = 600                   n = 1000
True model     Chosen model     τ = 0.2   0.5   0.75      τ = 0.2   0.5   0.75      τ = 0.2   0.5   0.75
First order    First order       0.964  1.000  0.988       0.994  1.000  0.999       1.000  1.000  1.000
               Second order      0.036  0.000  0.012       0.006  0.000  0.001       0.000  0.000  0.000
Second order   First order       0.014  0.001  0.000       0.002  0.000  0.000       0.000  0.000  0.000
               Second order      0.986  0.999  1.000       0.998  1.000  1.000       1.000  1.000  1.000

The rate of choosing the first-order model is Σ_{i=1}^{1000} I(ℓ1 > ℓ2)/1000, and the rate of choosing the second-order model is Σ_{i=1}^{1000} I(ℓ1 < ℓ2)/1000, where ℓk is the log-likelihood under the kth-order model
From the simulations reported in Table 3.2, we see that the method has a high probability of selecting the true model (the model with the higher log-likelihood) if either the first-order or the second-order model is correct. In particular, we observe Pr(ℓ1 > ℓ2 | first-order model) ≥ 0.95, where ℓk is the maximized log-likelihood under the k-th-order Markov model. Even if both models are incorrect, the model with the higher log-likelihood would be regarded as the better model. Table 3.2 shows the performance of the proposed model selection method. It reports the rate of choosing between the first-order and second-order Markov models. The method selects the correct model nearly 100% of the time for n = 1000. Even when the sample size is n = 300, the rate of choosing the correct model exceeds 95%. The results imply the model selection consistency of the proposed method.
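In practice the comparison can be done directly from the fitted objects; the following sketch assumes that the package's fitting functions return the maximized log-likelihood in a $log_likelihood component, as suggested by the printed output in Sect. 3.6:

library(Copula.Markov)
fit1 <- Clayton.Markov.MLE(Y)                     # first-order model
fit2 <- Clayton.Markov2.MLE(Y)                    # second-order model
if (fit1$log_likelihood > fit2$log_likelihood) {
  cat("choose the first-order model\n")
} else {
  cat("choose the second-order model\n")
}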
3.6 Software Currently, the Copula.Markov package provides two options for copulas, the Clayton copula and the Joe copula. Note that the Clayton copula has the lower tail dependence while the Joe copula has the upper tail dependence. Hence, these two copulas capture quite different dependence structures and supplement each other in modeling serial dependence. Comparison between the Clayton copula and the Joe copula leads to a very simple but effective strategy for data analysis.
For the second-order Clayton copula model, we develop an R function Clayton.Markov2.DATA(.) to generate data as well as Clayton.Markov2.MLE(.) to compute (μ̂, σ̂, α̂) from a given dataset. The latter function applies the subroutine nlm(.) to maximize the log-likelihood with the data-driven initial values
( ȳ = (1/n) Σ_{t=1}^{n} yt,  √{ (1/(n − 1)) Σ_{t=1}^{n} (yt − ȳ)² },  2τ0/(1 − τ0) ),
where
τ0 = {2/(n(n − 1))} Σ_{t<t*} sgn{(yt − yt*)(yt+1 − yt*+1)}.
If the algorithm diverges, then it restarts nlm(.) after adding noise from Unif(−D, D) to the initial values, with a user-specified value D > 0. This scheme is called the randomized Newton–Raphson algorithm, which has been applied to various statistical models with many parameters (Emura and Pan 2020; Achim and Emura 2019). Note that τ0 is Kendall's tau after transforming the time series data to the paired data (y1, y2), (y2, y3), . . . , (yn−1, yn). Other R functions, such as Clayton.Markov.DATA(.), Joe.Markov.DATA(.), Clayton.Markov.MLE(.), and Joe.Markov.MLE(.), are constructed in a similar fashion. After installing the Copula.Markov package, one can enter the commands:
set.seed(1)
Y=Clayton.Markov2.DATA(n=1000,mu=0,sigma=1,alpha=8)
Clayton.Markov2.MLE(Y,plot=TRUE)
The first line sets the seed number before generating random numbers. The second line generates the data {Y t : t = 1, …, 1000} that appear in Fig. 3.2. The third line fits the data to the second-order Markov model with the Clayton copula.
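The data-driven initial values described above can also be computed directly; a small sketch (ours), using R's built-in Kendall's tau on the lag-one pairs:

tau0 <- cor(Y[-length(Y)], Y[-1], method = "kendall")     # Kendall's tau of (y_t, y_{t+1}) pairs
init <- c(mu = mean(Y), sigma = sd(Y), alpha = 2 * tau0 / (1 - tau0))
init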
Fig. 3.2 The plot of {Y t : t = 1, …, 1000} generated from the second-order Markov chain model under the trivariate Clayton copula with α = 8 and the marginal distribution G ∼ N(0, 1)
The output is shown below. In the output, $mu, $sigma, and $alpha give the MLEs (μ, ˆ σˆ , α), ˆ and their SEs and 95% CIs. The lower control limit (LCL = μˆ − 3σˆ ) and the upper control limit (UCL = μˆ + 3σˆ ) are given in $Control_Limit. Whether the MLE attains the maximum of the likelihood function or not can be confirmed by checking $Gradient and $Eigenvalue_Hessian. In the example, the gradients are quite close to zero, which means that the likelihood function gives a proper solution. In addition, all the eigenvalues of the Hessian matrix are negative. This guarantees that the MLE attains the local maximum of the log-likelihood (see Theorem 7.7.1 of Khuri (2003, p. 284)).
> Clayton.Markov2.MLE(Y,plot=TRUE)
$mu
    estimate           SE        Lower        Upper
 0.351213287  0.180819819 -0.003193559  0.705620133

$sigma
  estimate         SE      Lower      Upper
0.84711413 0.07211004 0.71693968 1.00092430

$alpha
estimate       SE    Lower    Upper
4.864032 1.208020 2.989441 7.914122

$Control_Limit
    Center      Lower      Upper
 0.3512133 -2.1901291  2.8925557

$out_of_control
[1] 530

$Gradient
[1] -4.348635e-05 -9.454880e-05 -9.170452e-06

$Hessian
          [,1]       [,2]       [,3]
[1,] -754.1947   887.7274  -813.1771
[2,]  887.7274 -2207.5784  1331.2615
[3,] -813.1771  1331.2615 -1013.3597

$Eigenvalue_Hessian
[1]   -10.1214  -393.2452 -3571.7662

$CM.test
[1] 0.2583273

$KS.test
[1] 0.03049542

$log_likelihood
[1] -170.0381
Even though n = 1000 is quite large, the MLEs of μˆ = 0.351 and σˆ = 0.847 are not close to the true values of μ = 0 and σ = 1. This is due to a large sampling variation caused by the strong serial correlation (τ = 0.8), a reasonable phenomenon suggested in Fig. 3.2. For a very large sample size n, the bias vanishes since the MLE is consistent.
The function Clayton.Markov2.MLE(.) draws a control chart, including UCL = μˆ + k σˆ , LCL = μˆ − k σˆ and the center line μˆ (Fig. 3.2). The default is k = 3 (three-sigma control limit), but the user can specify any value k > 0 (k = 1 will be used in Chap. 6). In the output, only one observation falls outside the interval [ LCL, UCL ]. This out-of-control signal is indeed identified from Fig. 3.2. The value k = 3 means that the rate of out-of-control signals is specified at 0.27% at each time point. In addition, we implemented the computation of the goodness-of-fit tests as well as the diagnostic plot under the Clayton copula by Clayton.Markov.GOF(.) and the Joe copula by Joe.Markov.GOF(.).
3.7 Data Analysis This section analyzes three datasets for illustration. The R codes for the analysis are given in Appendix.
3.7.1 Chemical Process Data
We consider the chemical process data (Box and Jenkins 1990; Bisgaard and Kulahci 2007; Box and Narasimhan 2010). The data consist of a series of chemical concentrations {Yt: t = 1, …, 197} measured every 2 h. Engineers use SPC to judge if the concentration level is kept within a reasonable range. First, we applied the first-order Clayton Markov model by Clayton.Markov.MLE(.), and obtained the MLE μ̂ = 17.0732223, σ̂ = 0.4213754, and α̂ = 1.1777489 (Kendall's tau τ̂ = α̂/(α̂ + 2) = 0.37). Control limits were LCL = μ̂ − 3σ̂ = 15.8090961 and UCL = μ̂ + 3σ̂ = 18.3373486. Next, we fitted the data to the second-order Clayton Markov model by using Clayton.Markov2.MLE(.):
> Clayton.Markov2.MLE(Y) $mu estimate SE Lower Upper 17.07094420 0.06868912 16.93631353 17.20557488 $sigma estimate SE Lower Upper 0.41232646 0.03441568 0.35010040 0.48561245 $alpha estimate SE Lower Upper 0.8238138 0.2272330 0.4797747 1.4145578 $Control_Limit Center Lower Upper 17.07094 15.83396 18.30792 $out_of_control [1] "NONE" $Gradient [1] 1.454305e-07 1.643485e-07 2.273737e-09 $Hessian [,1] [,2] [,3] [1,] -406.403140 1.151079 -70.29312 [2,] 1.151079 -448.117270 111.99201 [3,] -70.293121 111.992011 -53.19127 $Eigenvalue_Hessian [1] -12.00858 -412.60245 -483.10065 $CM.test [1] 0.148302 $KS.test [1] 0.07591838 $log_likelihood [1] -59.32751
The outputs show μ̂ = 17.0709442, σ̂ = 0.4123265, and α̂ = 0.8238138 (Kendall's tau τ̂ = α̂/(α̂ + 2) = 0.29). Control limits are LCL = μ̂ − 3σ̂ = 15.83396 and UCL = μ̂ + 3σ̂ = 18.30792. Finally, we compared the log-likelihoods of the two models: ℓ1 = −60.07602 (first order) and ℓ2 = −59.32751 (second order). Hence, we choose the second-order model for SPC. These results suggest that there may be some residual dependence that is not captured by the first-order model; the current chemical concentration may depend on the concentrations observed at the previous two time points. Figure 3.3 depicts a control chart drawn under the second-order model. It shows that all the points are between the LCL and UCL, which implies that the process is in-control. The data clearly exhibit positive serial correlation.
Fig. 3.3 A control chart for chemical concentrations {Y t : t = 1, …, 197} measured every 2 h. The UCL and LCL are computed under the second-order Markov model
3.7.2 Financial Data
We analyze the weekly values of the S&P 500 stock price index, consisting of 500 leading companies in leading industries of the U.S. economy. Data were downloaded from FRED (Federal Reserve Economic Data) https://research.stlouisfed.org/fred2/series/SP500/downloaddata. We extracted weekly data from January 1, 2010 to January 3, 2014 (weekly, ending Friday) and wrote them as {Yt: t = 1, …, 210}. The goal is to show that weekly returns stay within a reasonable range. We first applied the first-order Clayton Markov model by using Clayton.Markov.MLE(.), and obtained the MLE μ̂ = 3.28241124, σ̂ = 27.45415699, and α̂ = 0.04422089 (Kendall's tau τ̂ = α̂/(α̂ + 2) = 0.02). Consequently, control limits were LCL = μ̂ − 3σ̂ = −79.08005974 and UCL = μ̂ + 3σ̂ = 85.64488222. Since Kendall's tau is almost zero, there is some possibility that the first-order Clayton Markov model cannot capture the dependence structure of the data. Next, we applied the first-order Joe Markov model by using Joe.Markov.MLE(.) and obtained the MLE μ̂ = 3.31300, σ̂ = 27.61220, and α̂ = 2.0 (Kendall's tau τ̂ = 1 − (4/α̂²) ∫_0^∞ s(1 − e^{−s})^{2/α̂−2} e^{−2s} ds = 0.36). Consequently, control limits are LCL = μ̂ − 3σ̂ = −79.52359 and UCL = μ̂ + 3σ̂ = 86.14959. It is interesting to see that Kendall's tau is now much larger than that under the Clayton copula. Next, we fit the data to the second-order model by using Clayton.Markov2.MLE(.):
46
3 Estimation, Model Diagnosis, and Process Control …
Clayton.Markov2.MLE(Y) $mu estimate SE Lower 3.2785383 2.1542148 -0.9437226 $sigma estimate 27.234645
Upper 7.5007993
SE Lower Upper 1.390067 24.641961 30.100116
$alpha estimate SE Lower Upper 0.09224491 0.05177391 0.03070318 0.27714144 $Control_Limit Center Lower 3.278538 -78.425396
Upper 84.982473
$out_of_control [1] 84 91 101 $Gradient [1] 5.548172e-07 -5.036709e-05
5.247140e-07
$Hessian [,1] [,2] [,3] [1,] -0.2235298 -1.011451 -0.1203988 [2,] -1.0114510 -398.420392 5.1577411 [3,] -0.1203988 5.157741 -3.3218271 $Eigenvalue_Hessian [1] -0.215098 -3.260379 -398.490272 $CM.test [1] 0.1317453 $KS.test [1] 0.06562425 $log_likelihood [1] -991.992
The outputs show μˆ = 3.2785383, σˆ = 27.234645, and αˆ = 0.09224491 (Kendall’s tau τˆ = α/( ˆ αˆ + 2) = 0.29). Consequently, control limits are LCL = μˆ − 3σˆ = −78.42539612 and UCL = μˆ + 3σˆ = 84.98247281. Again, it is interesting to see that Kendall’s tau is now much larger than that under the first-order Clayton model. Finally, we compared the log-likelihood for the three models: 1 (Clayton) = −993.892, 2 (Clayton) = −991.992, and 1 (Joe) = −1064.618. Hence, we chose the second-order Clayton model. Figure 3.4 depicts a control chart drawn under the second-order Clayton Markov model. It shows that three points are outside the range between the LCL and UCL. Hence, the process is out of control. The data exhibits positive but weak serial correlation.
3.7 Data Analysis
47
Fig. 3.4 A control chart for weekly S&P 500 stock price index from January 1, 2010 to January 3, 2014, denoted as {Y t : t = 1, …, 210}. The UCL and LCL are computed under the second-order Markov model
3.7.3 Baseball Data A set of baseball data was analyzed as an example. The data is available on open data website: https://www.baseball-reference.com/leagues/MLB/bat.shtml. Following Kim et al. (2019), we extract annual records of batting average (BA) in Major League Baseball (MLB) from 1980 to 2016 and write them as {Y t : t = 1,…, 37}. The goal is to examine if there is a large and unusual variation of MLB statistics by fitting our method (Kim et al. 2019). Chapter 6 will discuss more details on the baseball data. We first apply the first-order Clayton Markov model by using Clayton.Markov.MLE(.):
48
3 Estimation, Model Diagnosis, and Process Control …
> Clayton.Markov.MLE(Y) $mu estimate SE Lower Upper 0.261813148 0.002206924 0.257487576 0.266138719 $sigma estimate SE Lower Upper 0.005793096 0.001130732 0.003951528 0.008492909 $alpha estimate SE Lower Upper 1.8253710 1.0024545 0.6221297 5.3557628 $Control_Limit Center Lower Upper 0.2618131 0.2444339 0.2791924 $out_of_control [1] "NONE" $Gradient [1] 0.0017472615
0.0002157212 -0.0000873564
$Hessian [,1] [,2] [,3] [1,] -580992.261 1948.90666 -2412.33131 [2,] 1948.907 -90.98980 33.01026 [3,] -2412.331 33.01026 -20.68419 $Eigenvalue_Hessian [1] -3.041017e+00 -9.207743e+01 -5.810088e+05 $CM.test [1] 0.1555015 $KS.test [1] 0.1502079 $log_likelihood [1] 153.8685
The outputs show the MLE μˆ = 0.261813148, σˆ = 0.005793096, and αˆ = 1.8253710 (Kendall’s tau τˆ = α/( ˆ α+2) ˆ = 0.48). Control limits are LCL = μ−3 ˆ σˆ = 0.2444339 and LCL = μˆ + 3σˆ = 0.2791924. Next, we fitted the first-order Joe Markov model by using Joe.Markov.MLE(.), we obtained μˆ = 0.260683403, σˆ = 0.006095821, and $∞ ˆ e−2s ds = 0.43). αˆ = 2.390078566 (Kendall’s tau τˆ = 1 − 4/αˆ 2 0 s(1 − e−s )2/α−2 Control limits are LCL = μˆ − 3σˆ = −0.242395939 and UCL = μˆ + 3σˆ = 0.278970867.
3.7 Data Analysis
49
Fig. 3.5 The diagnostic plot and goodness-of-fit test for the first-order Markov model under the Clayton copula
Lastly, we fitted the data to the second-order Clayton Markov model by using Clayton.Markov2.MLE(.) and obtained μˆ = 0.261049293, σˆ = 0.005741486, and αˆ = 1.368885059 (Kendall’s tau τˆ = α/( ˆ αˆ + 2) = 0.40). Consequently, control limits are LCL = μˆ − 3σˆ = 0.243824833 and UCL = μˆ + 3σˆ = 0.278273752. We compared the log-likelihood for the three models: 1 (Clayton) = 153.8685,
2 (Clayton) = 152.4118, and 1 (Joe) = 150.7123. Obviously, the first-order Clayton Markov model is chosen as the best model. To confirm the first-order Clayton Markov model as a suitable model for the dataset, we performed model diagnostic and goodness-of-fit test. The model diagnostic plot does not give graphical evidence of rejecting the model (Fig. 3.5). Indeed, the results of the bootstrap goodness-of-fit test (by Clayton.Markov.GOF(.)) showed little evidence for rejecting the model under the Kolmogorov–Smirnov statistics (P-value = 0.59) and Cramér–von Mises statistics (P-value = 0.61).
50
3 Estimation, Model Diagnosis, and Process Control …
Fig. 3.6 A control chart for batting average (BA) in Major League Baseball based on {Y t : t = 1, …, 37}. The UCL and LCL are given under the first-order Clayton model
Figure 3.6 depicts a control chart drawn under the first-order Clayton Markov model. It shows that none of the points are outside the control limit, which implies that the process is in-control. The data exhibits positive serial correlation.
Appendix: R Codes for Data Analysis The following R codes produce the results of our real data analysis in Sect. 3.7.
Appendix: R Codes for Data Analysis
51
install.packages("Copula.Markov") library(Copula.Markov) #Chemical data Y=c(17.0, 16.6, 16.3, 16.1, 17.1, 16.9, 16.8, 17.4, 17.1, 17.0, 16.7, 17.4, 17.2, 17.4, 17.4, 17.0, 17.3, 17.2, 17.4, 16.8, 17.1, 17.4, 17.4, 17.5, 17.4, 17.6, 17.4, 17.3, 17.0, 17.8, 17.5, 18.1, 17.5, 17.4, 17.4, 17.1, 17.6, 17.7, 17.4, 17.8, 17.6, 17.5, 16.5, 17.8, 17.3, 17.3, 17.1, 17.4, 16.9, 17.3, 17.6, 16.9, 16.7, 16.8, 16.8, 17.2, 16.8, 17.6, 17.2, 16.6, 17.1, 16.9, 16.6, 18.0, 17.2, 17.3, 17.0, 16.9, 17.3, 16.8, 17.3, 17.4, 17.7, 16.8, 16.9, 17.0, 16.9, 17.0, 16.6, 16.7, 16.8, 16.7, 16.4, 16.5, 16.4, 16.6, 16.5, 16.7, 16.4, 16.4, 16.2, 16.4, 16.3, 16.4, 17.0, 16.9, 17.1, 17.1, 16.7, 16.9, 16.5, 17.2, 16.4, 17.0, 17.0, 16.7, 16.2, 16.6, 16.9, 16.5, 16.6, 16.6, 17.0, 17.1, 17.1, 16.7, 16.8, 16.3, 16.6, 16.8, 16.9, 17.1, 16.8, 17.0, 17.2, 17.3, 17.2, 17.3, 17.2, 17.2, 17.5, 16.9, 16.9, 16.9, 17.0, 16.5, 16.7, 16.8, 16.7, 16.7, 16.6, 16.5, 17.0, 16.7, 16.7, 16.9, 17.4, 17.1, 17.0, 16.8, 17.2, 17.2, 17.4, 17.2, 16.9, 16.8, 17.0, 17.4, 17.2, 17.2, 17.1, 17.1, 17.1, 17.4, 17.2, 16.9, 16.9, 17.0, 16.7, 16.9, 17.3, 17.8, 17.8, 17.6, 17.5, 17.0, 16.9, 17.1, 17.2, 17.4, 17.5, 17.9, 17.0, 17.0, 17.0, 17.2, 17.3, 17.4, 17.4, 17.0, 18.0, 18.2, 17.6, 17.8, 17.7, 17.2, 17.4) Clayton.Markov.MLE(Y=Y) Clayton.Markov2.MLE(Y=Y) Joe.Markov.MLE(Y=Y) #Financial data Y = c(-11.38, 29.88, -8.95, -44.27, -17.89, -7.68, 9.32, 33.66, -4.68, 34.21, 11.29, 9.91, 6.69, 11.51, 6.27, -2.24, 25.15, -30.59, -75.81, 24.80, 47.99, 1.72, -24.53, 26.72, 25.91, -40.75, -54.18, 55.38, -13.08, 37.78, -1.06, 20.04, -42.39, -7.56, -7.10, 39.92, 5.04, 16.04, 23.08, -2.43, 18.91, 11.04, 6.89, 0.18, 42.59, -26.64, 0.52, -10.33, 35.31, 15.69, 3.51, 12.86, 0.87, 13.86, 21.74, -9.89, -7.01, 34.53, 18.28, 13.86, 23.13, 1.27, -16.87, -25.08, 34.60, 18.61, -4.24, -8.49, 17.70, 26.23, 23.41, -2.43, -4.50, -2.17, -30.94, -29.18, 0.52, -3.05, 71.22, 4.13, 27.66, 28.88, -52.74, -92.09, -20.57, -55.28, 53.25, -2.83, -19.74, 61.78, -79.58, -5.01, 24.04, 69.12, 13.67, 46.84, -31.86, 10.62, -48.20, -56.98, 85.61, 10.91, -35.53, 45.67, -7.73, 20.21, 11.28, 26.29, 0.95, 28.57, -2.26, 18.59, 4.51, 3.89, 1.24, 33.30, -7.06, 11.36, -10.39, 27.82, 8.27, 24.83, -34.26, -15.71, -58.17, 22.60, -39.78, 47.62, 17.18, -7.82, 27.14, -7.48, 2.10, 5.88, 23.31, 5.02, 14.88, 12.29, -7.03, 4.55, 31.34, 27.85,-5.62, -19.48, 20.26, -32.34, 4.60, -21.25, 2.26, 34.35, -19.97, 49.27, 7.03, 1.89, -4.49, 16.57, -27.72, 64.04, 5.58, 13.93, 16.98, 10.21, 4.76, 1.86, -4.19, 2.60, 32.98, 9.52, -3.81, 12.30, -15.91, 35.57, -33.60, 26.99, 32.18, 19.28, 33.77, -17.87, -18.86, 12.64, -16.65, -34.30, 13.85, 25.61, 48.30, 11.90, -0.44, 18.02, -18.2, -35.59, 7.67, -30.53, 22.20, 32.82, 21.92, -18.16, -1.25, 12.70, 41.30, 15.27, 1.87, 8.97, 27.57, 6.58, 1.05, -0.72, -29.77, 43.00, 23.08, -10.03) Clayton.Markov.MLE(Y=Y) Clayton.Markov2.MLE(Y=Y) Joe.Markov.MLE(Y=Y) #BA data Y = c(0.265,0.256,0.261,0.261,0.260,0.257,0.258,0.263,0.254,0.254,0.258,0.256, 0.256,0.265,0.270,0.267,0.270,0.267,0.266,0.271,0.270,0.264,0.261,0.264,0.266, 0.264,0.269,0.268,0.264,0.262,0.257,0.255,0.255,0.253,0.251,0.254,0.255) Clayton.Markov.MLE(Y=Y) Clayton.Markov2.MLE(Y=Y) Joe.Markov.MLE(Y=Y) Clayton.Markov.GOF(Y = Y, method = "nlm")
52
3 Estimation, Model Diagnosis, and Process Control …
References

Achim D, Emura T (2019) Analysis of doubly truncated data: an introduction. JSS Research Series in Statistics, Springer, Singapore
Albers W, Kallenberg WC (2007) Shewhart control charts in new perspective. Seq Anal 26(2):123–151
Billingsley P (1961) Statistical methods in Markov chains. Ann Math Stat 32(1):12–40
Bisgaard S, Kulahci M (2007) Quality quandaries: using a time series model for process adjustment and control. Qual Eng 20(1):134–141
Box G, Narasimhan S (2010) Rethinking statistics for quality control. Qual Eng 22(2):60–72
Box GEP, Jenkins G (1990) Time series analysis, forecasting and control. Holden-Day
Chen X, Fan Y (2006) Estimation and model selection of semiparametric copula-based multivariate dynamic models under copula misspecification. J Econometr 135(1–2):125–154
Darsow WF, Nguyen B, Olsen ET (1992) Copulas and Markov processes. Ill J Math 36(4):600–642
Domma F, Giordano S, Perri PF (2009) Statistical modeling of temporal dependence in financial data via a copula function. Commun Stat Simul Comput 38(4):703–728
Emura T, Chen Y-H (2016) Gene selection for survival data under dependent censoring: a copula-based approach. Stat Methods Med Res 25(6):2840–2857
Emura T, Long T-H, Sun L-H (2017a) R routines for performing estimation and statistical process control under copula-based time series models. Commun Stat Simul Comput 46(4):3067–3087
Emura T, Nakatochi M, Matsui S, Michimae H, Rondeau V (2018) Personalized dynamic prediction of death according to tumour progression and high-dimensional genetic factors: meta-analysis with a joint model. Stat Methods Med Res 27(9):2842–2858
Emura T, Matsui S, Chen H-Y (2019a) compound.Cox: univariate feature selection and compound covariate for predicting survival. Comput Methods Prog Biomed 168:21–37
Emura T, Matsui S, Rondeau V (2019b) Survival analysis with correlated endpoints: joint frailty-copula models. JSS Research Series in Statistics, Springer, Singapore
Emura T, Nakatochi M, Murotani K, Rondeau V (2017b) A joint frailty-copula model between tumour progression and death for meta-analysis. Stat Methods Med Res 26(6):2649–2666
Emura T, Pan CH (2020) Parametric maximum likelihood inference and goodness-of-fit tests for dependently left-truncated data, a copula-based approach. Stat Pap 61:479–501
Huang X-W, Chen W-R, Emura T (2020a) Likelihood-based inference for a copula-based Markov chain model with binomial time series. Under review
Huang X-W, Wang W, Emura T (2020b) A copula-based Markov chain model for serially dependent event times with a dependent terminal event. Japanese J Stat Data Sci (in revision)
Huang X-W, Emura T (2019) Model diagnostic procedures for copula-based Markov chain models for statistical process control. Commun Stat Simul Comput. https://doi.org/10.1080/03610918.2019.1602647
Joe H (1997) Multivariate models and dependence concepts. Chapman and Hall/CRC
Khuri AI (2003) Advanced calculus with applications in statistics. Wiley
Kim J-M, Baik J, Reller M (2019) Control charts of mean and variance using copula Markov SPC and conditional distribution by copula. Commun Stat Simul Comput. https://doi.org/10.1080/03610918.2018.1547404
Knoth S, Schmid W (2004) Control charts for time series: a review. In: Frontiers in statistical quality control 7, pp 210–236. Springer
Li F, Tang Y, Wang HJ (2019) Copula-based semiparametric analysis for time series data with detection limits. Can J Stat 47(3):438–454
Lin WC, Emura T, Sun LH (2019) Estimation under copula-based Markov normal mixture models for serially correlated data. Commun Stat Simul Comput. https://doi.org/10.1080/03610918.2019.1652318
Long T-H, Emura T (2014) A control chart using copula-based Markov chain models. J Chin Stat Assoc 52(4):466–496
MacDonald IL (2014) Does Newton-Raphson really fail? Stat Methods Med Res 23(3):308–311
Mastrangelo CM, Montgomery DC (1995) SPC with correlated observations for the chemical and process industries. Qual Reliab Eng Int 11(2):79–89
Montgomery DC (2009) Statistical quality control, vol 7. Wiley, New York
Nelsen RB (2006) An introduction to copulas. Springer Science & Business Media
Rotolo F, Legrand C, Van Keilegom I (2013) A simulation procedure based on copulas to generate clustered multi-state survival data. Comput Methods Prog Biomed 109(3):305–312
Rotolo F, Paoletti X, Michiels S (2018) surrosurv: an R package for the evaluation of failure time surrogate endpoints in individual patient data meta-analyses of randomized clinical trials. Comput Methods Prog Biomed 155:189–198
Sonmez OE, Baray A (2019) On copula based serial dependence in statistical process control. In: Industrial engineering in the Big Data Era, pp 127–136. Springer
Sun LH, Lee CS, Emura T (2018) A Bayesian inference for time series via copula-based Markov chain models. Commun Stat Simul Comput. https://doi.org/10.1080/03610918.2018.1529241
Wieringa JE (1999) Statistical process control for serially correlated data. Labyrint Publication
Zhang S, Zhou QM, Lin H (2020) Goodness-of-fit test of copula functions for semi-parametric univariate time series models. Stat Pap. https://doi.org/10.1007/s00362-019-01153-4
Chapter 4
Estimation Under Normal Mixture Models for Financial Time Series Data
Abstract We propose an estimation method under a copula-based Markov model for serially correlated data. Motivated by the fat-tailed distribution of financial assets, we select a normal mixture distribution for the marginal distribution. Based on the normal mixture distribution for the marginal distribution and the Clayton copula for serial dependence, we obtain the corresponding likelihood function. In order to obtain the maximum likelihood estimators, we apply the Newton–Raphson algorithm with appropriate transformations and initial values. In the empirical analysis, the stock price of Dow Jones Industrial Average is analyzed for illustration. Keywords Log return · Copula · Normal mixture distribution · Newton–Raphson algorithm · Markov model
4.1 Introduction

Data with a time index collected in financial and econometric studies are rarely independent. For instance, today's stock price may be highly dependent on the previous one. Hence, a variety of time series models have been proposed in the literature to describe serial dependence. Examples include unconditional models such as autoregressive moving-average (ARMA) models. However, financial data usually do not satisfy the assumptions of such unconditional models, including constant variance and normality. Consequently, conditional models such as generalized autoregressive conditional heteroscedasticity (GARCH) models, in which the variance itself is time-dependent, can be applied to the analysis of financial data; see Curto et al. (2009) for instance. Nevertheless, both ARMA and GARCH models are still insufficient to describe the nonlinear dependence that often arises in financial time series.

Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-981-15-4998-4_4) contains supplementary material, which is available to authorized users.
Hence, in order to model nonlinear dependence, we consider copula-based Markov models, which fit financial time series data well; see Chen and Fan (2006). Copula-based models for correlated data have been widely studied. In this chapter, we focus on the estimation problem for the first-order copula-based Markov model, where a copula function is used to model serial dependence. As described in Nelsen (2006), there are several families of copulas, including the Archimedean copulas and elliptical copulas. The Archimedean copulas themselves include many types, such as the Clayton, Gumbel, Frank, and Joe copulas. Compared to autoregressive (AR) models, which produce linear dependence, copula-based models provide a nonlinear dependence structure between variables at consecutive time steps through the copula function. The details are given in Sect. 4.2.

In addition, in real applications, data frequently violate the normality assumption. For instance, Platen and Rendek (2008) showed that log returns in the stock market follow heavy-tailed distributions rather than normal distributions. Hence, as noted by Zangari (1996), the normal mixture distribution can be used to capture the heavy-tail feature of financial data. Moreover, owing to the flexibility of mixture distributions, we can also analyze data with multiple modes.

In this chapter, we study the maximum likelihood estimators (MLEs) under the copula-based Markov model where the marginal distribution is a two-component normal mixture distribution. In order to obtain the MLEs, we first calculate the partial derivatives of the log-likelihood function based on the copula-based Markov model where the marginal distribution follows the normal mixture distribution. We then use the Newton–Raphson method to obtain the MLEs numerically with appropriate transformations and initial values.

In the literature, copula-based Markov models have been studied in several papers. Joe (1997) proposed parametric estimation methods under a copula-based Markov model. Chen and Fan (2006) proposed a semiparametric estimation method. Long and Emura (2014) and Emura et al. (2017) proposed parametric estimation methods under a copula-based Markov model with a normal marginal distribution. Sun et al. (2018) studied parametric estimation under a copula-based Markov model where the marginal distribution follows Student's t-distribution. Huang and Emura (2019) developed model diagnostic procedures under the normal marginal distribution. Lin et al. (2019) developed estimation under a two-component normal mixture model, which will be presented in this chapter. Note that fitting the normal model to heavy-tailed or multimodal data produces biased results and poor fit; see Sect. 4.5 for an illustration.

The chapter is organized as follows. We first introduce the copula-based Markov chain models with a normal mixture marginal distribution in Sect. 4.2. Section 4.3 is devoted to the proposed method: Newton–Raphson algorithms to obtain the MLEs for the parameters of the copula-based Markov chain models. Section 4.4 explains how to generate data from the model. In Sect. 4.5, we give the empirical study using the stock price of the Dow Jones Industrial Average. The concluding remarks are provided in Sect. 4.6.
4.2 Models

The aim of this section is to introduce the proposed model for a time series {Y_t : t = 1, 2, ..., n}.

4.2.1 Copulas

A copula function is a joint distribution function whose marginal distributions are Unif(0, 1). Copula functions are useful to describe the dependence between multiple random variables. In Nelsen (2006), Sklar's theorem shows that for any bivariate distribution function H(y_1, y_2) with marginal distributions F_1(y_1) = H(y_1, ∞) and F_2(y_2) = H(∞, y_2), there exists a copula C : [0, 1]^2 → [0, 1] such that H(y_1, y_2) = C(F_1(y_1), F_2(y_2)).

In this chapter, we focus on the bivariate Clayton copula, which is a member of the Archimedean copulas. A copula C is called Archimedean if it admits the representation C(u_1, u_2) = ψ^{-1}(ψ(u_1) + ψ(u_2)), where ψ : [0, 1] → [0, ∞) is a continuous, strictly decreasing, and convex function and ψ^{-1} : [0, ∞) → [0, 1] is its inverse. The Clayton copula function is defined as

C(u_1, u_2; \alpha) = (u_1^{-\alpha} + u_2^{-\alpha} - 1)^{-1/\alpha} \, I(u_1^{-\alpha} + u_2^{-\alpha} - 1 > 0), \quad \alpha \in [-1, \infty) \setminus \{0\},

where I(·) is the indicator function and α represents the degree of correlation between Y_t and Y_{t-1}. For instance, given t = 2, we have U_1 = F_1(Y_1) and U_2 = F_2(Y_2), whose joint distribution function is C(u_1, u_2; α). Note that α > 0 implies positive correlation between Y_t and Y_{t-1}, and α ∈ [-1, 0) implies negative correlation between the two. The Clayton copula is an Archimedean copula obtained by setting ψ(t) = (t^{-α} - 1)/α, whose inverse is

\psi^{-1}(t) = \{\max(0, \alpha t + 1)\}^{-1/\alpha}.

Kendall's tau can be used to describe the dependence between Y_t and Y_{t-1}; for the Clayton copula it has the explicit form

\tau = \frac{\alpha}{\alpha + 2}, \quad \alpha \in [-1, \infty),

so that τ ∈ [-1, 1]. See Chap. 2 for more details on the Clayton copula.
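As a small illustration (this is our own sketch, not code from the Copula.Markov package), the Clayton copula, its density, and the Kendall's tau mapping can be written in a few lines of R; the function names are hypothetical and the sketch assumes α > 0 so that the indicator is always 1.

# Clayton copula C, its density, and Kendall's tau = alpha / (alpha + 2)
clayton.C <- function(u1, u2, alpha) {
  pmax(u1^(-alpha) + u2^(-alpha) - 1, 0)^(-1 / alpha)
}
clayton.density <- function(u1, u2, alpha) {
  (1 + alpha) * (u1 * u2)^(-1 - alpha) *
    (u1^(-alpha) + u2^(-alpha) - 1)^(-1 / alpha - 2)
}
clayton.tau <- function(alpha) alpha / (alpha + 2)

clayton.C(0.3, 0.7, alpha = 2)   # joint probability C(0.3, 0.7; 2)
clayton.tau(2)                   # Kendall's tau = 0.5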
4.2.2 Copula-Based Markov Chain

The copula-based Markov chain model was introduced by Darsow et al. (1992). Let {Y_t : t = 1, 2, ..., n} be correlated random variables whose serial correlations are determined by the joint distribution function

\Pr(Y_{t-1} \le y_{t-1}, \, Y_t \le y_t) = C(F(y_{t-1}), F(y_t); \alpha), \qquad (4.1)

where C(·, ·; α) is the copula with parameter α and F(·) is the continuous marginal distribution with density f(y) = dF(y)/dy. We further assume that the copula C has the density

C^{[1,1]}(u_1, u_2; \alpha) = \frac{\partial^2}{\partial u_1 \partial u_2} C(u_1, u_2; \alpha).

According to (4.1), the transition density of Y_t given Y_{t-1} = y_{t-1} is

f(y_t \mid y_{t-1}) = C^{[1,1]}(F(y_{t-1}), F(y_t); \alpha) \, f(y_t).

Observe that the nonlinearity of the conditional distribution is driven by the copula density C^{[1,1]}. On the other hand, the AR(1) model is written as Y_t = φY_{t-1} + e_t, where -1 < φ < 1. Assuming that e_t ∼ N(0, σ^2), we obtain

f(y_t \mid y_{t-1}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big( -\frac{(y_t - \phi y_{t-1})^2}{2\sigma^2} \Big).

Hence, the conditional distribution of Y_t is still normal and the conditional mean depends linearly on Y_{t-1}. Unfortunately, such a mean-based model is not convenient for mixture models, which have multiple means. The copula-based models can easily incorporate a mixture distribution by modeling F in (4.1). We now impose the Markov assumption

f(y_t \mid y_{t-1}, \ldots, y_2, y_1) = f(y_t \mid y_{t-1}).
Suppose that the marginal distribution follows a k-component normal mixture distribution whose density is defined as

f(y) = \sum_{i=1}^{k} p_i \frac{1}{\sigma_i} \phi\Big( \frac{y - \mu_i}{\sigma_i} \Big),

where \phi(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2), 0 < p_i < 1, and \sum_{i=1}^{k} p_i = 1. We focus on the two-component normal mixture model with k = 2, that is,

f(y) = p \frac{1}{\sigma_1} \phi\Big( \frac{y - \mu_1}{\sigma_1} \Big) + (1 - p) \frac{1}{\sigma_2} \phi\Big( \frac{y - \mu_2}{\sigma_2} \Big),

where 0 < p < 1, and -∞ < μ_i < ∞ and σ_i^2 > 0 are the mean and variance of the ith component. Figure 4.1 shows densities of the normal mixture distribution. A bimodal density is created by μ_1 ≠ μ_2, while a heavy-tailed density is created by μ_1 = μ_2 and σ_1 ≠ σ_2. The cumulative distribution function is

F(y) = p \, \Phi\Big( \frac{y - \mu_1}{\sigma_1} \Big) + (1 - p) \, \Phi\Big( \frac{y - \mu_2}{\sigma_2} \Big),

where \Phi(x) = \int_{-\infty}^{x} \phi(z) \, dz is the c.d.f. of the standard normal random variable. For each t, the expectation of Y_t is given by

\mu = E(Y_t) = p\mu_1 + (1 - p)\mu_2.
Fig. 4.1 Two particular densities of the normal mixture distribution with k = 2. Red lines with (μ1 , σ1 ) = (1, 0.5) and blue lines with (μ2 , σ2 ) = (−1, 0.5) indicate the component densities and black lines with (μ1 , σ1 , μ2 , σ2 , p) = (1, 0.5, −1, 0.5, 0.5) indicate the mixture densities (left). Red lines with (μ1 , σ1 ) = (0, 1) and blue lines with (μ2 , σ2 ) = (0, 0.5) indicate the component densities and black lines with (μ1 , σ1 , μ2 , σ2 , p) = (0, 1, 0, 0.5, 0.5) indicate the mixture densities (right)
The variance of Y_t is given by

\sigma^2 = Var(Y_t) = E(Y_t^2) - (E(Y_t))^2 = p(1 - p)(\mu_1 - \mu_2)^2 + p\sigma_1^2 + (1 - p)\sigma_2^2.

Hence, the variance is decomposed into two parts: one is the squared difference between μ_1 and μ_2, and the other is the weighted sum of σ_1^2 and σ_2^2.
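The following short R sketch (our own helpers, not package functions) implements the two-component mixture density, c.d.f., mean, and variance described above, and reproduces the bimodal setting of Fig. 4.1 (left).

dmix <- function(y, mu1, mu2, s1, s2, p) {
  p * dnorm(y, mu1, s1) + (1 - p) * dnorm(y, mu2, s2)     # mixture density f(y)
}
pmix <- function(y, mu1, mu2, s1, s2, p) {
  p * pnorm(y, mu1, s1) + (1 - p) * pnorm(y, mu2, s2)     # mixture cdf F(y)
}
mix.mean <- function(mu1, mu2, p) p * mu1 + (1 - p) * mu2
mix.var  <- function(mu1, mu2, s1, s2, p) {
  p * (1 - p) * (mu1 - mu2)^2 + p * s1^2 + (1 - p) * s2^2
}

curve(dmix(x, 1, -1, 0.5, 0.5, 0.5), from = -3, to = 3)   # bimodal case of Fig. 4.1
mix.var(1, -1, 0.5, 0.5, 0.5)                             # 1.25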
4.3 Parameter Estimation

This section proposes an estimation method under the model of Sect. 4.2.
4.3.1 Maximum Likelihood Estimators

The log-likelihood function under the Clayton copula with the normal mixture marginal model is written as

\ell(\alpha, \mu_1, \mu_2, \sigma_1, \sigma_2, p)
= \sum_{t=1}^{n} \log f(y_t) + \sum_{t=2}^{n} \log C^{[1,1]}(F(y_{t-1}), F(y_t); \alpha)
= \sum_{t=1}^{n} \log f(y_t) + \sum_{t=2}^{n} \log \big\{ (1+\alpha) F(y_{t-1})^{-1-\alpha} F(y_t)^{-1-\alpha} \big( F(y_{t-1})^{-\alpha} + F(y_t)^{-\alpha} - 1 \big)^{-1/\alpha - 2} \big\}
= \sum_{t=1}^{n} \log f(y_t) + \sum_{t=2}^{n} \Big[ \log(1+\alpha) - (1+\alpha)\log F(y_{t-1}) - (1+\alpha)\log F(y_t) - \Big(\frac{1}{\alpha}+2\Big) \log\big( F(y_{t-1})^{-\alpha} + F(y_t)^{-\alpha} - 1 \big) \Big],

where {y_t : t = 1, 2, ..., n} are the observed data. The maximum likelihood estimator (MLE) is defined by

(\hat\alpha^{ML}, \hat\mu_1^{ML}, \hat\mu_2^{ML}, \hat\sigma_1^{ML}, \hat\sigma_2^{ML}, \hat p^{ML}) = \hat\theta^{ML} = \arg\max_{\theta} \ell(\theta),

where θ = (α, μ_1, μ_2, σ_1, σ_2, p).
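As a rough illustration of this log-likelihood, the following R sketch evaluates ℓ(θ) for given parameters; the function name and direct use of optim are ours (the chapter itself applies Newton–Raphson on a transformed scale, described below).

loglik.clayton.mix <- function(theta, y) {
  alpha <- theta[1]; mu1 <- theta[2]; mu2 <- theta[3]
  s1 <- theta[4]; s2 <- theta[5]; p <- theta[6]
  f <- p * dnorm(y, mu1, s1) + (1 - p) * dnorm(y, mu2, s2)   # marginal density f(y_t)
  F <- p * pnorm(y, mu1, s1) + (1 - p) * pnorm(y, mu2, s2)   # marginal cdf F(y_t)
  n <- length(y)
  U1 <- F[-n]; U2 <- F[-1]                                    # (F(y_{t-1}), F(y_t))
  copula.term <- log(1 + alpha) - (1 + alpha) * (log(U1) + log(U2)) -
    (1 / alpha + 2) * log(U1^(-alpha) + U2^(-alpha) - 1)
  sum(log(f)) + sum(copula.term)
}
# one possible direct maximization (assuming a sensible starting value 'start'):
# optim(start, loglik.clayton.mix, y = Y, control = list(fnscale = -1))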
Supplementary material 1 provides the first- and second-order derivatives of ℓ(θ) with respect to θ.

In order to calculate the MLE, we use the Newton–Raphson algorithm because the MLE does not have a closed form. It is well known that the Newton–Raphson algorithm is sensitive to the initial values, and it may diverge under a poorly chosen initial value. We first reparameterize the log-likelihood to remove the constraints α > -1, σ_1 > 0, σ_2 > 0, and 0 ≤ p ≤ 1. The log-likelihood function is reparameterized as

\tilde\ell(a, m_1, m_2, s_1, s_2, q) = \ell(\alpha, \mu_1, \mu_2, \sigma_1, \sigma_2, p),

where a = log(α + 1), m_1 = μ_1, m_2 = μ_2, s_1 = log σ_1, s_2 = log σ_2, and q = log(-log p). These transformations reduce the sensitivity of the Newton–Raphson algorithm to the initial values, as discussed in MacDonald (2014). The MLE is

\hat{\tilde\theta}^{ML} = \arg\max_{\tilde\theta} \tilde\ell(\tilde\theta),

where \tilde\theta = (a, m_1, m_2, s_1, s_2, q) and \hat{\tilde\theta}^{ML} = (\hat a^{ML}, \hat m_1^{ML}, \hat m_2^{ML}, \hat s_1^{ML}, \hat s_2^{ML}, \hat q^{ML}). The partial derivatives of \tilde\ell(\tilde\theta) are available by the chain rule; for example, \partial\tilde\ell/\partial a = (\partial\ell/\partial\alpha)(\partial\alpha/\partial a) = (\alpha + 1)\,\partial\ell/\partial\alpha and \partial\tilde\ell/\partial s_1 = (\partial\ell/\partial\sigma_1)(\partial\sigma_1/\partial s_1) = \sigma_1\,\partial\ell/\partial\sigma_1. The iteration of the Newton–Raphson algorithm is then

\tilde\theta_{(k+1)} = \tilde\theta_{(k)} - \left[ \frac{\partial^2 \tilde\ell(\tilde\theta)}{\partial\tilde\theta\,\partial\tilde\theta^{\top}} \right]^{-1} \frac{\partial \tilde\ell(\tilde\theta)}{\partial\tilde\theta} \Bigg|_{\tilde\theta = \tilde\theta_{(k)}}, \qquad (4.2)

where \tilde\theta_{(k)} = (a_{(k)}, m_{1(k)}, m_{2(k)}, s_{1(k)}, s_{2(k)}, q_{(k)}). The algorithm stops if ||\tilde\theta_{(k+1)} - \tilde\theta_{(k)}|| < \varepsilon for some small \varepsilon > 0.

The initial value of the Newton–Raphson algorithm for the parameter α is chosen by

\alpha_0 = \frac{-2\tau_0}{\tau_0 - 1}, \qquad \tau_0 = \binom{n}{2}^{-1} \sum_{i > j} \mathrm{sgn}(Y_j - Y_i)\,\mathrm{sgn}(Y_{j+1} - Y_{i+1}).
Here, sgn(x) is the sign function, taking the value -1, 0, or 1 according to whether x < 0, x = 0, or x > 0. We then obtain the initial values for (μ_1, μ_2, σ_1, σ_2, p) by using the MLEs under the independence assumption, as discussed below. The MLEs are obtained by iterating formula (4.2) until convergence and checking that the Hessian matrix is negative definite. One possible alternative for obtaining the initial values would be the method of moments. However, due to the mixture structure, it is difficult to obtain the initial values by solving the moment equations E(Y_t^k) = n^{-1} \sum_{t=1}^{n} y_t^k, k = 1, 2, ..., 5. See Everitt (1996) for details about the method of moments.
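A minimal R sketch of the initial value α_0 is given below; it uses the base-R Kendall's tau of consecutive pairs (Y_{t-1}, Y_t), which is equivalent to the sum of sign products above up to the normalizing constant, and inverts τ = α/(α + 2). The function name is ours.

alpha.init <- function(y) {
  n <- length(y)
  tau0 <- cor(y[-n], y[-1], method = "kendall")   # Kendall's tau of (Y_{t-1}, Y_t)
  -2 * tau0 / (tau0 - 1)                          # alpha_0 = 2*tau0 / (1 - tau0)
}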
4.3.2 Interval Estimation

For interval estimation in our model, we define the Hessian matrix

J(\tilde\theta) \equiv \frac{\partial^2 \tilde\ell(\tilde\theta)}{\partial\tilde\theta\,\partial\tilde\theta^{\top}},

whose (j, k) entry is the second-order partial derivative of \tilde\ell with respect to the jth and kth components of \tilde\theta = (a, m_1, m_2, s_1, s_2, q). We then let J(\hat a^{ML}, \hat m_1^{ML}, \hat m_2^{ML}, \hat s_1^{ML}, \hat s_2^{ML}, \hat q^{ML}) denote this matrix evaluated at \tilde\theta = \hat{\tilde\theta}^{ML}.

Billingsley (1961) showed that, for the Markov chain model, there exists the exact information matrix I(\tilde\theta) such that

\frac{J(\hat{\tilde\theta}^{ML})}{n} \xrightarrow{p} -I(\tilde\theta), \quad n \to \infty,

under some regularity conditions. Furthermore, he showed the asymptotic normality

\sqrt{n}\,(\hat{\tilde\theta}^{ML} - \tilde\theta) \xrightarrow{d} N(0, I^{-1}(\tilde\theta)), \quad n \to \infty.
Thus, the standard error (SE) is SE[\tilde\theta]_k = \sqrt{[(-J)^{-1}(\hat{\tilde\theta}^{ML})]_{k,k}}. We can then construct 100(1 - α)% confidence intervals (CIs) via normal approximations,

\hat a^{ML} \pm z_{\alpha/2} \cdot SE(\hat a^{ML}), \quad \hat m_1^{ML} \pm z_{\alpha/2} \cdot SE(\hat m_1^{ML}), \quad \hat m_2^{ML} \pm z_{\alpha/2} \cdot SE(\hat m_2^{ML}),
\hat s_1^{ML} \pm z_{\alpha/2} \cdot SE(\hat s_1^{ML}), \quad \hat s_2^{ML} \pm z_{\alpha/2} \cdot SE(\hat s_2^{ML}), \quad \hat q^{ML} \pm z_{\alpha/2} \cdot SE(\hat q^{ML}),

where z_p is the upper p-th quantile of N(0, 1) and SE(\hat a^{ML}) = \sqrt{[(-J)^{-1}(\hat{\tilde\theta}^{ML})]_{1,1}}, ..., SE(\hat q^{ML}) = \sqrt{[(-J)^{-1}(\hat{\tilde\theta}^{ML})]_{6,6}}.

Since the transformed parameters are a = log(α + 1), m_1 = μ_1, m_2 = μ_2, s_1 = log σ_1, s_2 = log σ_2, and q = log(-log p), the original parameters are recovered as α = exp(a) - 1, μ_1 = m_1, μ_2 = m_2, σ_1 = exp(s_1), σ_2 = exp(s_2), and p = exp(-exp(q)), respectively. To obtain the SEs of the estimates of the original parameters, we apply the delta method,

g(\hat{[\tilde\theta]}_k^{ML}) \; \dot\sim \; N\big( g([\tilde\theta]_k), \; SE[\tilde\theta]_k^2 \cdot g'([\tilde\theta]_k)^2 \big),

where g(·) is the transformation function and \hat\theta = (\hat\alpha, \hat\mu_1, \hat\mu_2, \hat\sigma_1, \hat\sigma_2, \hat p). We can then construct the 100(1 - α)% CI for each of the parameters α, μ_1, μ_2, σ_1, σ_2, and p as

g(\hat{[\tilde\theta]}_k^{ML}) \pm z_{\alpha/2} \cdot SE[\tilde\theta]_k \cdot |g'(\hat{[\tilde\theta]}_k^{ML})|.
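As a sketch of these Wald-type intervals (assumptions: the numeric Hessian of the reparameterized log-likelihood at the MLE is available, e.g. from optim(..., hessian = TRUE), and the function name is ours), the standard errors and delta-method CIs can be computed as follows.

ci.from.hessian <- function(est, hessian, level = 0.95) {
  se.t <- sqrt(diag(solve(-hessian)))        # SEs of (a, m1, m2, s1, s2, q)
  z <- qnorm(1 - (1 - level) / 2)
  # back-transformations g and derivatives g' for the delta method
  g  <- list(function(a) exp(a) - 1, identity, identity, exp, exp,
             function(q) exp(-exp(q)))
  dg <- list(exp, function(x) 1, function(x) 1, exp, exp,
             function(q) -exp(q - exp(q)))
  est.o <- mapply(function(gi, e) gi(e), g, est)
  se.o  <- mapply(function(di, e, s) abs(di(e)) * s, dg, est, se.t)
  cbind(estimate = est.o, lower = est.o - z * se.o, upper = est.o + z * se.o)
}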
4.3.3 Initial Values

In order to propose reliable initial values, we maximize the log-likelihood function under the independence working assumption,

\ell^*(\theta^*) = \sum_{t=1}^{n} \log\Big\{ p \frac{1}{\sigma_1} \phi\Big(\frac{y_t - \mu_1}{\sigma_1}\Big) + (1 - p) \frac{1}{\sigma_2} \phi\Big(\frac{y_t - \mu_2}{\sigma_2}\Big) \Big\},

where θ* = (μ_1, μ_2, σ_1, σ_2, p). Referring to Seo and Kim (2012), since the maximizer of this likelihood is still difficult to obtain directly, we apply the EM algorithm. To maximize ℓ*(·), we introduce latent variables Z_1, ..., Z_n such that

Y_t \mid (Z_t = 1) \sim N(\mu_1, \sigma_1^2), \qquad Y_t \mid (Z_t = 2) \sim N(\mu_2, \sigma_2^2),

where P(Z_t = 1) = p ≡ p_1 and P(Z_t = 2) = 1 - p ≡ p_2. Then the log-likelihood for the complete data is written as

\ell^*(\theta^*; Z_1, \ldots, Z_n) = \sum_{t=1}^{n} \sum_{j=1}^{2} I(Z_t = j) \Big[ \log p_j - \frac{1}{2}\log(2\pi\sigma_j^2) - \frac{(y_t - \mu_j)^2}{2\sigma_j^2} \Big].
Now, we apply the EM algorithm. E-step:

Q(\theta^* \mid \theta^{*(k)}) = \sum_{t=1}^{n} \sum_{j=1}^{2} P_{jt}^{(k)} \Big[ \log p_j - \frac{1}{2}\log(2\pi\sigma_j^2) - \frac{(y_t - \mu_j)^2}{2\sigma_j^2} \Big],

where

P_{jt}^{(k)} := \Pr(Z_t = j \mid \theta^{*(k)}) = \frac{ p_j^{(k)} \frac{1}{\sigma_j^{(k)}} \phi\Big( \frac{y_t - \mu_j^{(k)}}{\sigma_j^{(k)}} \Big) }{ p_1^{(k)} \frac{1}{\sigma_1^{(k)}} \phi\Big( \frac{y_t - \mu_1^{(k)}}{\sigma_1^{(k)}} \Big) + p_2^{(k)} \frac{1}{\sigma_2^{(k)}} \phi\Big( \frac{y_t - \mu_2^{(k)}}{\sigma_2^{(k)}} \Big) }.
M-step: We obtain \arg\max_{\theta^*} Q(\theta^* \mid \theta^{*(k)}) componentwise by

p^{(k+1)} = \arg\max_{p} \Big[ \sum_{t=1}^{n} P_{1t}^{(k)} \log p + \sum_{t=1}^{n} P_{2t}^{(k)} \log(1 - p) \Big],

(\mu_1^{(k+1)}, \sigma_1^{(k+1)}) = \arg\max_{\mu_1, \sigma_1} \sum_{t=1}^{n} P_{1t}^{(k)} \Big[ -\frac{1}{2}\log(2\pi\sigma_1^2) - \frac{(y_t - \mu_1)^2}{2\sigma_1^2} \Big],

(\mu_2^{(k+1)}, \sigma_2^{(k+1)}) = \arg\max_{\mu_2, \sigma_2} \sum_{t=1}^{n} P_{2t}^{(k)} \Big[ -\frac{1}{2}\log(2\pi\sigma_2^2) - \frac{(y_t - \mu_2)^2}{2\sigma_2^2} \Big].
Then,

p^{(k+1)} = \frac{1}{n} \sum_{t=1}^{n} P_{1t}^{(k)}, \qquad
\mu_1^{(k+1)} = \frac{ \sum_{t=1}^{n} P_{1t}^{(k)} y_t }{ \sum_{t=1}^{n} P_{1t}^{(k)} }, \qquad
\mu_2^{(k+1)} = \frac{ \sum_{t=1}^{n} P_{2t}^{(k)} y_t }{ \sum_{t=1}^{n} P_{2t}^{(k)} },

\sigma_1^{(k+1)} = \sqrt{ \frac{ \sum_{t=1}^{n} P_{1t}^{(k)} (y_t - \mu_1^{(k+1)})^2 }{ \sum_{t=1}^{n} P_{1t}^{(k)} } }, \qquad
\sigma_2^{(k+1)} = \sqrt{ \frac{ \sum_{t=1}^{n} P_{2t}^{(k)} (y_t - \mu_2^{(k+1)})^2 }{ \sum_{t=1}^{n} P_{2t}^{(k)} } }.
Note that, for simplicity, we set P_{1t}^{(1)} = P_{2t}^{(1)} = 0.5 as the initial values. We iterate the above algorithm until the parameters converge and take the resulting values as our initial values. We then run the Newton–Raphson algorithm from these initial values until the parameters converge. Note that R users may use the nlm(·) or optim(·) functions to directly maximize ℓ*(θ*); see Everitt and Hothorn (2009).
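A minimal R sketch of this EM iteration is given below. It is our own illustration: instead of the symmetric initialization P_{1t} = P_{2t} = 0.5 (which leaves the two components identical after the first M-step), the sketch starts from a small split around the sample mean; the function name and tolerances are hypothetical.

em.mixnorm <- function(y, max.iter = 500, tol = 1e-8) {
  n <- length(y)
  mu1 <- mean(y) - sd(y); mu2 <- mean(y) + sd(y); s1 <- s2 <- sd(y); p <- 0.5
  for (k in seq_len(max.iter)) {
    # E-step: posterior membership probabilities P_{1t}
    d1 <- p * dnorm(y, mu1, s1); d2 <- (1 - p) * dnorm(y, mu2, s2)
    P1 <- d1 / (d1 + d2)
    # M-step: weighted means, standard deviations, and mixing proportion
    p.new <- mean(P1)
    mu1.new <- sum(P1 * y) / sum(P1)
    mu2.new <- sum((1 - P1) * y) / sum(1 - P1)
    s1.new <- sqrt(sum(P1 * (y - mu1.new)^2) / sum(P1))
    s2.new <- sqrt(sum((1 - P1) * (y - mu2.new)^2) / sum(1 - P1))
    if (max(abs(c(p.new - p, mu1.new - mu1, mu2.new - mu2,
                  s1.new - s1, s2.new - s2))) < tol) break
    p <- p.new; mu1 <- mu1.new; mu2 <- mu2.new; s1 <- s1.new; s2 <- s2.new
  }
  c(mu1 = mu1.new, mu2 = mu2.new, sigma1 = s1.new, sigma2 = s2.new, p = p.new)
}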
4.4 Data Generation

Referring to Chap. 2, one can generate {Y_t : t = 1, 2, ..., n} by applying the conditional approach. Let F(y_2 | y_1) = U_2, where U_2 ∼ Unif(0, 1); then

U_2 = F(y_1)^{-(1+\alpha)} \big[ F(y_1)^{-\alpha} + F(y_2)^{-\alpha} - 1 \big]^{-(1/\alpha + 1)}.

By solving the above equation for y_2, we have

y_2 = F^{-1}\Big\{ \big( 1 + ( U_2^{-\alpha/(\alpha+1)} - 1 ) F(y_1)^{-\alpha} \big)^{-1/\alpha} \Big\}, \quad y_1 = F^{-1}(U_1), \; U_1 \sim \mathrm{Unif}(0, 1).

Therefore, we obtain the sequence

y_{t+1} = F^{-1}\Big\{ \big( 1 + ( U_{t+1}^{-\alpha/(\alpha+1)} - 1 ) F(y_t)^{-\alpha} \big)^{-1/\alpha} \Big\}, \quad t = 1, ..., n - 1, \; U_t \sim \mathrm{Unif}(0, 1).

To generate data, we consider a normal mixture marginal distribution with (μ_1, μ_2, σ_1, σ_2, p) = (0.4, 0.8, 0.2, 0.1, 0.5). We focus on strong dependence through α = 2 and α = 5, corresponding to Kendall's tau τ = 0.5 and τ = 0.714, respectively. The parameters of the normal mixture distribution are chosen to produce the bimodal cases shown in Fig. 4.2. Figure 4.3 shows that a larger α yields a more persistent trend in the generated sequence than a smaller α.
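The conditional approach above can be sketched in a few lines of R (our own helpers; qmix is a crude numerical inverse of the mixture c.d.f., not a package function, and n = 500 is only an example).

pmix <- function(y, mu1, mu2, s1, s2, p)
  p * pnorm(y, mu1, s1) + (1 - p) * pnorm(y, mu2, s2)
qmix <- function(u, mu1, mu2, s1, s2, p)
  uniroot(function(y) pmix(y, mu1, mu2, s1, s2, p) - u,
          lower = min(mu1, mu2) - 10 * max(s1, s2),
          upper = max(mu1, mu2) + 10 * max(s1, s2))$root

sim.clayton.mix <- function(n, alpha, mu1, mu2, s1, s2, p) {
  y <- numeric(n)
  y[1] <- qmix(runif(1), mu1, mu2, s1, s2, p)
  for (t in 2:n) {
    u <- runif(1)
    # F(y_t) from the conditional inversion formula above
    Ft <- (1 + (u^(-alpha / (alpha + 1)) - 1) *
             pmix(y[t - 1], mu1, mu2, s1, s2, p)^(-alpha))^(-1 / alpha)
    y[t] <- qmix(Ft, mu1, mu2, s1, s2, p)
  }
  y
}

set.seed(1)
y <- sim.clayton.mix(500, alpha = 2, mu1 = 0.4, mu2 = 0.8,
                     s1 = 0.2, s2 = 0.1, p = 0.5)   # the setting used in Fig. 4.3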
Fig. 4.2 The density of two-component normal mixture distribution with parameters (μ1 , μ2 , σ1 , σ2 ) = (0.4, 0.8, 0.2, 0.1) and p = 0.5 (left) or p = 0.7 (right)
Fig. 4.3 Two datasets generated under (μ1 , μ2 , σ1 , σ2 , p) = (0.4, 0.8, 0.2, 0.1, 0.5) and α = 2 (left) or α = 5 (right)
4.5 Data Analysis

In this empirical study, we use the weekly stock price of the Dow Jones Industrial Average (DJI) from 2008/1/1 to 2012/1/1. The data source is Yahoo Finance, and the data are available from the R package Copula.Markov. We calculate the log return of the stock price as

Y_t = \log\Big( \frac{S_t}{S_{t-1}} \Big),

where S_t is the stock price of the Dow Jones Industrial Average. In Fig. 4.4, we show the stock price, the log return, and the empirical density of the data. The empirical density shows a high peak at zero and symmetric flat tails; this type of density can be approximated by a mixture of two normal densities with different variances. We also show the Q-Q plot in Fig. 4.4 and observe that the lower and upper tails both deviate from the straight line given under the normal model. Namely, the normality assumption may be violated.
Fig. 4.4 The plot of stock price, log return, density of log return, and Q-Q plot

In the next step, we apply the Jarque–Bera test
proposed by Jarque and Bera (1987) (see also Chen et al. 2017; Curto et al. 2009) to check normality of the log returns. The test statistic is defined as

JB = \frac{n}{6} \Big[ S^2 + \frac{1}{4}(K - 3)^2 \Big],

where n is the sample size,

S = \frac{\hat\mu_3}{\hat\sigma^3} = \frac{ \frac{1}{n}\sum_{t=1}^{n} (Y_t - \bar Y)^3 }{ \big( \frac{1}{n}\sum_{t=1}^{n} (Y_t - \bar Y)^2 \big)^{3/2} }

is the sample skewness, and

K = \frac{\hat\mu_4}{\hat\sigma^4} = \frac{ \frac{1}{n}\sum_{t=1}^{n} (Y_t - \bar Y)^4 }{ \big( \frac{1}{n}\sum_{t=1}^{n} (Y_t - \bar Y)^2 \big)^{2} }

is the sample kurtosis. If the data come from a normal distribution, the JB statistic asymptotically follows a chi-squared distribution with two degrees of freedom; S is close to 0 and K is close to 3, so that JB is close to 0. We obtained S = 0.08999, K = 9.06975, and a p-value < 0.001, so normality is rejected at the 1% significance level by the Jarque–Bera test. In addition, the excess kurtosis is 6.11800. For these reasons, we conclude that the data violate the normality assumption due to the fat tails. Therefore, we consider the copula-based Markov chain model with the marginal distribution being a two-component normal mixture.
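The JB statistic above can be computed directly in R; the following sketch (our own helper) returns S, K, JB, and the chi-squared p-value with two degrees of freedom.

jb.test <- function(y) {
  n <- length(y); yc <- y - mean(y)
  S <- mean(yc^3) / mean(yc^2)^(3 / 2)            # sample skewness
  K <- mean(yc^4) / mean(yc^2)^2                  # sample kurtosis
  JB <- n / 6 * (S^2 + (K - 3)^2 / 4)
  c(S = S, K = K, JB = JB, p.value = pchisq(JB, df = 2, lower.tail = FALSE))
}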
We then apply the proposed model to the log return of the stock price of the Dow Jones Industrial Average. We computed the MLEs using the proposed Newton–Raphson algorithm under the copula-based Markov model. The results are shown in Fig. 4.5 and Table 4.1; Table 4.1 gives the estimates and the corresponding 95% CIs.

In order to verify that the proposed estimates are MLEs, we examine the gradient of the log-likelihood function and the Hessian matrix at θ = θ̂. The gradient is

\Big( \frac{\partial\ell}{\partial\alpha}, \frac{\partial\ell}{\partial\mu_1}, \frac{\partial\ell}{\partial\mu_2}, \frac{\partial\ell}{\partial\sigma_1}, \frac{\partial\ell}{\partial\sigma_2}, \frac{\partial\ell}{\partial p} \Big)\Big|_{\theta = \hat\theta} = (-2.9 \times 10^{-11}, \; -7.7 \times 10^{-13}, \; 3.2 \times 10^{-12}, \; 9.8 \times 10^{-14}, \; 5.1 \times 10^{-14}, \; -7.6 \times 10^{-14}),

and the Hessian matrix is

\begin{bmatrix}
-1030.8 & 444.2 & -5647.1 & -69.8 & 11.2 & 6.2 \\
444.2 & -403199 & -313075 & -48.8 & -101.7 & -535.3 \\
-5647.1 & -313075 & -4350936 & -1616.3 & -2085.7 & -1047.4 \\
-69.8 & -48.8 & -1616.3 & -503.5 & -33.7 & 144.1 \\
11.2 & -101.7 & -2085.7 & -33.7 & -278.8 & 153.4 \\
6.2 & -535.3 & -1047.4 & 144.1 & 153.4 & -165.6
\end{bmatrix}.

We observe that the eigenvalues of the Hessian matrix are all negative, namely (-17.4, -368.9, -549.7, -1031.3, -3.7 \times 10^5, -4.3 \times 10^6). Therefore, \hat\theta^{ML} = (\hat\alpha, \hat\mu_1, \hat\mu_2, \hat\sigma_1, \hat\sigma_2, \hat p) is a local maximum. We also confirm the global uniqueness of the MLE by drawing the log-likelihood functions in Fig. 4.5, which show that (\hat\alpha, \hat\mu_1, \hat\mu_2, \hat\sigma_1, \hat\sigma_2, \hat p) attains the global maximum.

In this empirical analysis, we focus on the parameter α, which describes the serial correlation of the data. The result indicates \hat\alpha = -0.00987 and Kendall's tau \hat\tau = -0.005, which suggests that the log returns are nearly independent. We also fit the copula-based Markov model with the marginal normal distribution of Long and Emura (2014) and Emura et al. (2017) to the same data. In particular, we estimate the mean and the standard deviation under the normal mixture distribution and under the normal distribution:

\hat E_{NM}[Y] = -0.00017, \quad \hat{SD}_{NM}[Y] = 0.01736, \qquad \hat E_{N}[Y] = -0.00019, \quad \hat{SD}_{N}[Y] = 0.01734.
Fig. 4.5 The log-likelihood function for the log return data. The vertical line in each figure represents the position of the MLE
Observe that the two sets of estimates are similar to each other. To select the better model, we report the AICs of the two models, the normal model and the normal mixture model, in Table 4.2. The AIC is smaller under the normal mixture model than under the normal model (a smaller AIC indicates a better model). In conclusion, we suggest the normal mixture model for fitting the data.
Table 4.1 The parameter estimates and 95% CIs for the log return of DJI

Parameter   MLE        95% CI
α           −0.00987   (−0.06908, 0.05310)
μ1          −0.00152   (−0.00473, 0.00167)
μ2           0.00066   (−0.00032, 0.00166)
σ1           0.02619   (0.02245, 0.03056)
σ2           0.00769   (0.00594, 0.00995)
p            0.38488   (0.24597, 0.52204)

Table 4.2 The parameter estimates and AIC of the two models

Marginal Dist.   MLE                                                                                   AIC
Normal mixture   α = −0.00987, μ1 = −0.00152, μ2 = 0.00066, σ1 = 0.02619, σ2 = 0.00769, p = 0.38488   −4171.0
Normal           α = 0.01241, μ = −0.00019, σ = 0.01734                                               −3983.7
4.6 Conclusions

This chapter offers a statistical method for performing financial time series analysis with copula-based Markov models. We choose the Clayton copula and a normal mixture marginal distribution to capture the fat-tailed feature of the data. We derive the likelihood function and find the MLEs using the Newton–Raphson method with appropriate transformations and initial values. In the empirical analysis, we find that the normal mixture model fits the data better than the normal model.

There are several directions for future extension. First, it is worth studying other methods for choosing the initial values; for example, given a copula, one can use moment estimates through the invariant measure. In addition, from the financial point of view, it is interesting to consider directional dependence between two time series through two copula-based Markov time series; see Kim and Hwang (2017) and Kim et al. (2019) for instance. One may also extend our two-component mixture model to multi-component mixture models. For instance, Matsui et al. (2005) analyzed the incubation times of N = 76 patients with Creutzfeldt–Jakob disease reported in Japan; their analysis revealed three peaks in the density, namely short (1.4–6.2 years), medium (7.0–11.9 years), and long (12.9–17.6 years) incubation times. This might be a case where a three-component mixture model is useful. One may further consider extensions to survival data, which have recently been developed (e.g., Huang et al. 2020).

Another interesting extension is to introduce the idea of conditional models such as the GARCH model. To this end, we consider a hierarchical model with hyperparameters driven by another Markov process. For instance, assuming that the
marginal distribution of Y_t is normally distributed with mean zero and variance σ^2, we can further assume that σ^2 follows a copula-based Markov process with the conditional density

f(\sigma_t \mid \sigma_{t-1}) = C_\sigma^{[1,1]}(F_\sigma(\sigma_{t-1}), F_\sigma(\sigma_t)) \, f_\sigma(\sigma_t),

where C_\sigma^{[1,1]} is the copula density of a copula C_\sigma and F_\sigma is the marginal distribution function of \sigma_t. Therefore, the conditional density of Y_t is

f_Y(y_t \mid y_{t-1}, \sigma_{t-1}, \sigma_{t-2}) = C_\alpha^{[1,1]}(F(y_{t-1} \mid \sigma_{t-2}), F(y_t \mid \sigma_{t-1})) \, f_Y(y_t \mid \sigma_{t-1}),

where C_\alpha^{[1,1]} and C_\sigma^{[1,1]} are two copula densities with respect to the processes Y and \sigma. Then, by deriving the likelihood function L(\theta \mid \sigma, y), we can obtain the MLEs of the parameters. Furthermore, the proposed model can be applied to statistical process control using control charts; see also Long and Emura (2014). These extensions would be of great interest.
Appendix: R codes

Below are the R codes for implementing the data analysis.

library(Copula.Markov)
data(DowJones)
Y = as.vector(DowJones$log_return)
Clayton.MixNormal.Markov.MLE(y = Y)
References

Billingsley P (1961) Statistical inference for Markov processes. The University of Chicago Press, Chicago
Chen CWS, Zona W, Songsak S, Sangyeol L (2017) Pair trading based on quantile forecasting of smooth transition GARCH models. North Am J Econ Finance 39:38–55
Chen X, Fan Y (2006) Estimation of copula-based semiparametric time series models. J Econ 130(2):307–335
Curto J, Pinto J, Tavares G (2009) Modeling stock markets' volatility using GARCH models with normal, Student's t and stable Paretian distributions. Stat Pap 50(2):311–321
Darsow WF, Nguyen B, Olsen ET (1992) Copulas and Markov processes. Ill J Math 36(4):600–642
Emura T, Long TH, Sun LH (2017) R routines for performing estimation and statistical process control under copula-based time series models. Commun Stat Simul Comput 46(4):3067–3087
Everitt BS (1996) An introduction to finite mixture distributions. Stat Methods Med Res 5:107–127
Everitt BS, Hothorn T (2009) A handbook of statistical analyses using R, 2nd edn. Chapman and Hall/CRC
Huang XW, Emura T (2019) Model diagnostic procedures for copula-based Markov chain models for statistical process control. Commun Stat Simul Comput. https://doi.org/10.1080/03610918.2019.1602647
Huang X-W, Wang W, Emura T (2020) A copula-based Markov chain model for serially dependent event times with a dependent terminal event. Japanese J Stat Data Sci (in revision)
Jarque CM, Bera AK (1987) A test for normality of observations and regression residuals. Int Stat Rev 55(2):163–172
Joe H (1997) Multivariate models and dependence concepts. Chapman & Hall
Kim JM, Baik J, Reller M (2019) Control charts of mean and variance using copula Markov SPC and conditional distribution by copula. Commun Stat Simul Comput. https://doi.org/10.1080/03610918.2018.1547404
Kim J-M, Hwang S-Y (2017) Directional dependence via Gaussian copula beta regression model with asymmetric GARCH marginals. Commun Stat Simul Comput 46(10):7639–7653
Lin WC, Emura T, Sun L-H (2019) Estimation under copula-based Markov normal mixture models for serially correlated data. Commun Stat Simul Comput. https://doi.org/10.1080/03610918.2019.1652318
Long TH, Emura T (2014) A control chart using copula-based Markov chain models. J Chin Stat Assoc 52(4):466–496
MacDonald IL (2014) Does Newton-Raphson really fail? Stat Methods Med Res 23(3):308–311
Matsui S, Sadaike T, Hamada C, Fukushima M (2005) Creutzfeldt-Jakob disease and cadaveric dura mater grafts in Japan: an updated analysis of incubation time. Neuroepidemiology 24:22–25
Nelsen RB (2006) An introduction to copulas, 2nd edn. Springer Series in Statistics, Springer, New York
Platen E, Rendek R (2008) Empirical evidence on Student-t log-returns of diversified world stock indices. J Stat Theory Pract 2(2):233–251
Seo B, Kim D (2012) Root selection in normal mixture models. Comput Stat Data Anal 56(8):2454–2470
Sun L-H, Lee C-S, Emura T (2018) A Bayesian inference for time series via copula-based Markov chain models. Commun Stat Simul Comput. https://doi.org/10.1080/03610918.2018.1529241
Zangari P (1996) An improved methodology for measuring VaR. RiskMetrics Monitor, 2nd quarter, Reuters/J.P. Morgan, pp 7–25
Chapter 5
Bayesian Estimation Under the t-Distribution for Financial Time Series
Abstract This chapter studies Student’s t-distribution for fitting serially correlated observations where serial dependence is described by the copula-based Markov chain. Due to the computational difficulty of obtaining maximum likelihood estimates, alternatively, we develop Bayesian inference using the empirical Bayes method through the resampling procedure. We provide a Metropolis–Hastings algorithm to simulate the posterior distribution. We also analyze the stock price data in empirical studies for illustration. Keywords Clayton copula · Student’s t-distribution · Bayesian inference · Markov chain Monte Carlo · Metropolis–Hastings algorithm
5.1 Introduction

In the analysis of time series data, fitting an appropriate model for serial dependence is important. One popular approach is based on copulas, where a copula function determines the dependence between two successive observations; see Chen and Fan (2006), Long and Emura (2014), Emura et al. (2017), Sun et al. (2018), and Kim et al. (2019). In practice, there are two popular families of copulas: the elliptical family, such as the Gaussian copula and the t-copula, and the Archimedean family, such as the Clayton, Gumbel, Frank, and Joe copulas. Joe (1997), Long and Emura (2014), Emura et al. (2017), and Huang and Emura (2019) considered parameter estimation under the copula-based Markov chain model with normal marginal distributions. The maximum likelihood estimates (MLEs) are much more accurate than the semiparametric estimators of Chen and Fan (2006) when the normal assumption is correct, as shown by Long and Emura (2014). However, Platen and Rendek (2008) showed that log returns in the stock market follow heavy-tailed distributions rather than normal distributions.

Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-981-15-4998-4_5) contains supplementary material, which is available to authorized users.
In biomedical studies, the t-distribution has been employed to deal with heavy-tailed data, as discussed in Wang et al. (2018). Hence, we study the copula-based Markov chain model where the marginal distributions are given by Student's t-distributions. However, the degrees-of-freedom parameter ν is difficult to maximize over, since the log-likelihood function is often monotonically increasing in ν; see Sun et al. (2018) for instance. In addition, the derivatives of the log-likelihood with respect to ν are extremely complicated, so the Newton–Raphson algorithm is difficult to apply. Hence, we propose a Bayesian approach in order to avoid these difficulties. In order to obtain estimates efficiently, Hasting (1970) introduced a Bayesian approach using the Markov chain Monte Carlo (MCMC) method, a conditional simulation methodology that avoids the complex maximization problem. Carter and Kohn (1994) applied the MCMC method to conditionally Gaussian state-space models, and Michael and Nicholas (2002) applied MCMC in finance and economics. In addition, Metropolis et al. (1953) and Hasting (1970) studied the Metropolis–Hastings (MH) algorithm, in which one simulates Markov chains with specified equilibrium distributions. In particular, the Gibbs sampler proposed by Geman and Geman (1984) is a special case of the MH algorithm. Smith and Roberts (1993) discuss the Bayesian approach using the Gibbs sampler and the MCMC method.

The chapter is organized as follows. We introduce the model and likelihood in Sect. 5.2. Section 5.3 discusses the estimation scheme using the Bayesian approach and the Metropolis–Hastings algorithm. Section 5.4 is devoted to the analysis of the S&P 500. The conclusion is provided in Sect. 5.5.
5.2 Models and Likelihood

5.2.1 Copula-Based Markov Models

The aim of this section is to introduce the proposed model and the corresponding likelihood. Darsow et al. (1992) introduced the copula-based Markov chain model. Accordingly, we impose a Markov assumption such that, given the past observations {Y_s : s = 1, 2, ..., t-1}, the current observation Y_t depends solely on the previous observation Y_{t-1}, where the correlation structure is given by a bivariate copula. Referring to Joe (1997) and Nelsen (2006), for any bivariate distribution function H(y_1, y_2) with two marginal distribution functions F_1(y_1) and F_2(y_2), there exists a copula C : [0, 1]^2 → [0, 1] such that H(y_1, y_2) = C(F_1(y_1), F_2(y_2)). For instance, the Clayton copula is written as

C_\alpha(u_1, u_2) = (u_1^{-\alpha} + u_2^{-\alpha} - 1)^{-1/\alpha} \, I(u_1^{-\alpha} + u_2^{-\alpha} - 1 > 0), \quad \alpha \in (-1, \infty) \setminus \{0\}, \qquad (5.1)

where α ∈ (0, ∞) implies positive dependence and α ∈ (-1, 0) implies negative dependence. According to the formula for Kendall's tau, τ = 4 \int_0^1 \int_0^1 C(u_1, u_2) \, dC(u_1, u_2) - 1, with -1 ≤ τ ≤ 1, the integral has an explicit form for the Clayton copula, τ = α/(α + 2).

Based on the copula model, the joint distribution function is written as Pr(Y_{t-1} ≤ y_{t-1}, Y_t ≤ y_t) = H(y_{t-1}, y_t) = C_\alpha(F(y_{t-1}), F(y_t)), where C_\alpha(u_1, u_2) is the copula with the parameter α and F(·) is the continuous marginal distribution. The corresponding conditional density is given by

f_{Y_t | Y_{t-1}}(y_t \mid y_{t-1}) = \frac{C_\alpha^{[1,1]}(F(y_{t-1}), F(y_t)) f(y_t) f(y_{t-1})}{f(y_{t-1})} = C_\alpha^{[1,1]}(F(y_{t-1}), F(y_t)) \, f(y_t),

where f(·) is the density of F(·) and C_\alpha^{[1,1]}(u_1, u_2) is the density of the copula,

C_\alpha^{[1,1]}(u_1, u_2) = \frac{\partial^2 C_\alpha(u_1, u_2)}{\partial u_1 \partial u_2} = (1 + \alpha) u_1^{-(1+\alpha)} u_2^{-(1+\alpha)} \big[ u_1^{-\alpha} + u_2^{-\alpha} - 1 \big]^{-(\frac{1}{\alpha} + 2)}. \qquad (5.2)
5.2.2 Non-standardized t-Distribution

The probability density function (pdf) of Student's t-distribution is written as

f(y) = \frac{\Gamma(\frac{\nu+1}{2})}{\Gamma(\frac{\nu}{2}) \sqrt{\nu\pi}\,\sigma} \Big( 1 + \frac{(y - \mu)^2}{\nu\sigma^2} \Big)^{-\frac{\nu+1}{2}}, \quad y \in (-\infty, \infty), \qquad (5.3)

with the degrees of freedom ν > 0, the location parameter μ ∈ (-∞, ∞), and the scale parameter σ > 0. The mean, variance, and mode of Student's t-distribution are E(Y) = μ for ν > 1, Var(Y) = σ^2 ν/(ν - 2) for ν > 2, and mode(Y) = μ, respectively. Figure 5.1 shows the comparison of the normal pdf and Student's t pdfs with varied parameters (ν = 1, 2, 5, ∞ with μ = 0 and σ = 1).

Fig. 5.1 The comparison of the pdf f_Y(y) between the normal distribution and Student's t-distribution with varied parameters

We observe that if ν ≤ 2, the second moment does not exist owing to the heavy tails. By the binomial theorem, the kth moment of Student's t-distribution is given by

E(Y^k) = E((\mu + \sigma Z)^k) = E\Big( \sum_{l=0}^{k} \binom{k}{l} \mu^l \sigma^{k-l} Z^{k-l} \Big) = \sum_{l=0}^{k} \binom{k}{l} \mu^l \sigma^{k-l} E(Z^{k-l}),

where Z follows the standard Student's t-distribution with the kth moment

E(Z^k) = \begin{cases} 0, & k \text{ odd}, \; 0 < k < \nu, \\ \frac{1}{\sqrt{\pi}\,\Gamma(\frac{\nu}{2})} \Gamma\big(\tfrac{k+1}{2}\big) \Gamma\big(\tfrac{\nu-k}{2}\big) \nu^{k/2}, & k \text{ even}, \; 0 < k < \nu. \end{cases}

In particular, E(Z) = 0, E(Z^2) = Var(Z) = \nu/(\nu - 2), and E(Z^4) = \frac{3\nu^2}{(\nu - 2)(\nu - 4)}. Hence,

E(Y) = E(\mu + \sigma Z) = \mu, \qquad Var(Y) = Var(\mu + \sigma Z) = \sigma^2 Var(Z) = \frac{\sigma^2 \nu}{\nu - 2},

and

E(Y^4) = \mu^4 + 6\mu^2\sigma^2 \frac{\nu}{\nu - 2} + \frac{3\sigma^4\nu^2}{(\nu - 2)(\nu - 4)}.
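The non-standardized t density (5.3) is simply a shifted and scaled version of the standard t density, so it can be evaluated in R as below (a one-line sketch of our own; the function name is hypothetical).

dt.ns <- function(y, nu, mu, sigma) dt((y - mu) / sigma, df = nu) / sigma
curve(dt.ns(x, nu = 2, mu = 0, sigma = 1), from = -4, to = 4)   # cf. Fig. 5.1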
5.2.3 Likelihood

Based on the Clayton copula model (5.2) and the pdf of Student's t-distribution (5.3), given data {y_t : t = 1, 2, ..., n} and denoting U_t = F(y_t), we have the likelihood function

L(\alpha, \nu, \mu, \sigma^2) = \prod_{t=1}^{n} f(y_t) \prod_{t=2}^{n} C_\alpha^{[1,1]}(F(y_{t-1}), F(y_t))
= \prod_{t=1}^{n} \frac{\Gamma(\frac{\nu+1}{2})}{\Gamma(\frac{\nu}{2}) \sqrt{\nu\pi}\,\sigma} \Big( 1 + \frac{(y_t - \mu)^2}{\nu\sigma^2} \Big)^{-\frac{\nu+1}{2}} \prod_{t=2}^{n} (1 + \alpha) U_{t-1}^{-(1+\alpha)} U_t^{-(1+\alpha)} (U_{t-1}^{-\alpha} + U_t^{-\alpha} - 1)^{-\frac{1}{\alpha} - 2}, \qquad (5.4)

and the corresponding log-likelihood function

\ell(\alpha, \nu, \mu, \sigma^2) = n \Big[ \log\Gamma\Big(\frac{\nu+1}{2}\Big) - \log\Gamma\Big(\frac{\nu}{2}\Big) - \frac{1}{2}\log(\nu\pi) - \log\sigma \Big] - \frac{\nu+1}{2} \sum_{t=1}^{n} \log\Big( 1 + \frac{(y_t - \mu)^2}{\nu\sigma^2} \Big)
+ \sum_{t=2}^{n} \Big[ \log(1+\alpha) - (1+\alpha)\log U_{t-1} - (1+\alpha)\log U_t - \Big(\frac{1}{\alpha} + 2\Big) \log(U_{t-1}^{-\alpha} + U_t^{-\alpha} - 1) \Big]. \qquad (5.5)
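The log-likelihood (5.5) can be evaluated in R as in the sketch below (our own helper, not package code); it mirrors the mixture-marginal version of Chap. 4 with the t marginal substituted, and it is the building block later needed for posterior evaluation.

loglik.clayton.t <- function(alpha, nu, mu, sigma, y) {
  z <- (y - mu) / sigma
  logf <- dt(z, df = nu, log = TRUE) - log(sigma)   # log marginal density
  U <- pt(z, df = nu)                               # U_t = F(y_t)
  n <- length(y)
  U1 <- U[-n]; U2 <- U[-1]
  sum(logf) + sum(log(1 + alpha) - (1 + alpha) * (log(U1) + log(U2)) -
                  (1 / alpha + 2) * log(U1^(-alpha) + U2^(-alpha) - 1))
}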
Due to the difficulty of obtaining the maximum of ℓ(α, ν, μ, σ^2) within a reasonable range of the degrees of freedom ν, as discussed in Sun et al. (2018), we propose Bayes estimation of all parameters θ = (α, ν, μ, σ^2) in order to tackle this problem.
5.3 Parameter Estimation

The aim of this section is to propose a Bayes estimator for (α, ν, μ, σ^2) under the following prior distributions. The copula parameter α is uniformly distributed on the interval (α_1, α_2), where α_1 and α_2 are the lower and upper bounds of α. Note that we use α ∈ (α_1, α_2) \ {0} instead of α ∈ (-1, ∞) \ {0} in order to propose a proper prior distribution for α, so that the impropriety issue of the posterior distribution can be avoided; see Hobert and Casella (1996). The degrees of freedom ν follow the chi-squared distribution with k degrees of freedom. The parameter μ is normally distributed with mean m and variance γ^2, written as μ ∼ N(m, γ^2). The parameter σ^2 follows the inverse-gamma distribution with parameters (α′, β′). Hence, (α_1, α_2, k, m, γ^2, α′, β′) is the set of hyperparameters. Under the assumption of independence between all parameters θ = (α, ν, μ, σ^2), the joint prior density is

\pi(\theta) = \frac{1}{\alpha_2 - \alpha_1} \times \frac{1}{2^{k/2}\Gamma(k/2)} \nu^{k/2 - 1} e^{-\nu/2} \times \frac{1}{\sqrt{2\pi}\,\gamma} e^{-\frac{(\mu - m)^2}{2\gamma^2}} \times \frac{\beta'^{\alpha'}}{\Gamma(\alpha')} (\sigma^2)^{-(\alpha' + 1)} e^{-\beta'/\sigma^2}, \qquad (5.6)

where α ∈ (α_1, α_2) \ {0}, ν ∈ (0, ∞), μ ∈ (-∞, ∞), and σ^2 ∈ (0, ∞). However, the accuracy of the Bayesian estimation depends on the choice of hyperparameters. To avoid this, we adopt the empirical Bayes method in the following.
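The joint log-prior (5.6) can be evaluated directly with base-R density functions; the inverse-gamma factor is written out by hand. This is a sketch of our own, and the hyperparameter names a0 and b0 correspond to the inverse-gamma parameters denoted (α′, β′) above.

log.prior <- function(alpha, nu, mu, sigma2,
                      alpha1, alpha2, k, m, gamma2, a0, b0) {
  dunif(alpha, alpha1, alpha2, log = TRUE) +       # uniform prior on alpha
    dchisq(nu, df = k, log = TRUE) +               # chi-squared prior on nu
    dnorm(mu, m, sqrt(gamma2), log = TRUE) +       # normal prior on mu
    a0 * log(b0) - lgamma(a0) -                    # inverse-gamma prior on sigma^2
    (a0 + 1) * log(sigma2) - b0 / sigma2
}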
5.3.1 Estimation of Hyperparameters via Resampling

In many applications of Bayes inference, the hyperparameters are subjectively given by domain knowledge. In this section, we propose a resampling scheme for objectively estimating the hyperparameters using the method of moments, within the framework of the empirical Bayes methods. Note that degrees of freedom ν > 4 ensure the existence of the fourth moment.

The procedure:
1. Since the parameter α depends on the correlation structure of the data, a naive resampling scheme cannot be applied directly. Hence, we preserve the correlation structure by dividing the data into l groups denoted as {Y_1, ..., Y_{n_1}}, {Y_{n_1+1}, ..., Y_{n_2}}, ..., {Y_{n_{l-1}+1}, ..., Y_{n_l}},
where 0 < n_1 < n_2 < ... < n_l = n; that is, there are n_i - n_{i-1} observations in the ith group. Kendall's tau in group i is obtained using

\tilde\tau_{(i)} = \frac{2}{(n_i - n_{i-1})(n_i - n_{i-1} - 1)} \sum_{n_{i-1}+1 \le s < t \le n_i} \mathrm{sgn}(Y_{t+1} - Y_{s+1}) \, \mathrm{sgn}(Y_t - Y_s).