Modeling Time-Varying Unconditional Variance by Means of a Free-Knot Spline-GARCH Model (Gabler Theses) 3658386177, 9783658386177

The book addresses the problem of a time-varying unconditional variance of return processes utilizing a spline function.


English Pages 259 [260] Year 2022


Table of contents :
Foreword
Acknowledgements
Contents
List of Figures
List of Tables
List of Abbreviations
List of Symbols
1 Introduction
1.1 Motivation
1.2 Problem statement
1.3 Outline of the thesis
2 Financial time series
2.1 Definitions and properties
2.2 Stylized facts
2.3 Model specification
2.4 Univariate GARCH models
2.5 Long-range dependence and structural breaks
3 Smoothing long term volatility
3.1 Multiplicative decomposition of the conditional variance function
3.2 Spline functions
3.2.1 Truncated power spline function
3.2.2 B-spline functions
3.3 Model review
3.3.1 Spline volatility models
3.3.2 Spline-GARCH model
3.3.3 B-spline-GARCH model
3.3.4 P-spline GARCH model
4 Free-knot spline-GARCH model
4.1 Optimization
4.2 Estimation methods
4.2.1 Least-squares
4.2.2 Least-squares with free-knots
4.2.3 Jupp transformation
4.2.4 Quasi-maximum-likelihood
4.3 Model selection
4.4 Forecast evaluation
4.5 Starting vector
5 Simulation study
5.1 Previous studies
5.2 Simulation setup
5.2.1 Data generating process
5.2.2 Computational aspects
5.2.3 Sample statistics
5.2.4 Asymptotic statistics
5.2.5 Specification
5.2.6 Starting vectors
5.3 Model selection
5.4 Finite sample properties
6 Empirical study
6.1 Previous studies
6.2 In-sample analysis
6.3 Out-of-sample forecast
7 Conclusion
7.1 Research problems and contributions
7.2 Research questions
7.3 Limitations and future research
7.4 Concluding remarks
References
Appendices
A Standardized Student’s t-distribution
B Derivatives
B.1 Free-knot spline-GARCH model
B.2 P-spline-GARCH model
C Tables
C.1 Simulation study: knots
C.2 Simulation study: Finite sample properties
C.3 Empirical study
D Figures
D.1 Simulation study: distribution of knot selection
D.2 Simulation study: asymptotic distribution estimators
D.3 Empirical study

Gabler Theses

Oliver Old

Modeling Time-Varying Unconditional Variance by Means of a Free-Knot Spline-GARCH Model

The “Gabler Theses” series publishes selected English-language doctoral dissertations written at renowned universities in Germany, Austria, and Switzerland. The works address current topics in business and economics and offer innovative contributions to research and practice.

Oliver Old, Offenbach, Germany. Also published as: Dissertation, Fakultät für Wirtschaftswissenschaften, FernUniversität in Hagen, 2021. First reviewer: Prof. Dr. Hermann Singer. Second reviewer: Prof. Dr. Rainer Baule. Date of disputation: 6 December 2021.

ISSN 2731-3220 ISSN 2731-3239 (electronic) Gabler Theses ISBN 978-3-658-38617-7 ISBN 978-3-658-38618-4 (eBook) https://doi.org/10.1007/978-3-658-38618-4 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Responsible Editor: Marija Kojic This Springer Gabler imprint is published by the registered company Springer Fachmedien Wiesbaden GmbH, part of Springer Nature. The registered company address is: Abraham-Lincoln-Str. 46, 65189 Wiesbaden, Germany

For Nike, Oskar, and Sarah

Foreword

The Ph.D. dissertation of Oliver Old, entitled Modeling time-varying unconditional variance by means of a free-knot spline-GARCH model, treats the topic of nonstationary ARCH models, which may be necessary in applications with long financial time series when the variance is not constant. A classical approach was initiated by Robert Engle, who parametrized the variance in terms of spline functions with fixed knots. This somewhat unrealistic assumption is relaxed in Old’s work, in that he treats the knot locations as free parameters to be estimated from empirical data. In contrast to the work of Engle and Rangel (2008), who used a truncated spline basis, B-splines are used, which lead to numerically more stable algorithms. The parameter estimation procedure utilizes a quasi-Newton algorithm with an analytical score function, which leads to a dramatic performance enhancement. The algorithms are tested in an extensive simulation study and are applied to the Standard & Poor’s 500 index. All algorithms were developed by the author in the Matlab language and do not rely on opaque black-box routines, as is often the case nowadays. I hope that the text finds an interested audience and widespread distribution, and that it will have some impact on the implementation of nonstationary ARCH models.

Prof. i.R. Dr. Hermann Singer
Chair of Applied Statistics and Empirical Social Research
FernUniversität in Hagen, Germany


Acknowledgements

This dissertation was developed during my time as a research assistant at the Chair of Applied Statistics and Methods of Empirical Social Research at the FernUniversität in Hagen. Therefore, I would first and foremost like to thank my supervisor, Prof. Dr. Hermann Singer, for giving me the chance to be part of his chair and to pursue my doctorate. When I started at his chair, I was an eager and willing student who had only a vague idea of the depths of econometrics. Prof. Dr. Singer encouraged and inspired me from day one to explore these depths properly. I have benefited a lot from the insightful discussions in room A 311 in the ESG building and later on Skype due to the corona pandemic. His door was always open and he continuously supported me even when the obstacles seemed insurmountable. This dissertation would not have been possible without his support and inspiration. I am further grateful to him for the opportunity to present my research at the Statistische Woche in Trier 2019 and the CFE conference in London 2020. Furthermore, I would like to thank my second examiner, Prof. Dr. Rainer Baule, for kindly serving on the examination board and for the helpful discussion. I thank Dr. Dominik Ballreich for disentangling the magic of MATLAB and supporting me in critical coding phases. Moreover, I would like to thank my former colleagues at the chair for a great time together. Thanks to Bayram Oruc, Hasan Oruc, Jana Sachno, Daniela Doliwa, and Carina Skeet. After my contract ended, I moved to the Department of Anesthesiology, Intensive-Care Medicine, and Pain Therapy at the University Hospital Frankfurt as a research associate within the EU project ENVISION. I would like to thank the ENVISION team for their understanding and motivation during the challenging final phase of this dissertation. I would like to thank my parents for all their encouragement and their belief in me.
I am grateful to my son Oskar and my daughter Nike for their empathy and understanding when dad preferred the PC screen to playing together. Last but not least, I am deeply indebted to my wife Sarah for her love, infinite patience, and unwavering support in all stages of this dissertation. This endeavor would not have been possible without you, Sarah!



List of Figures

2.1 S&P500 sample - spot-prices and innovations
2.2 S&P500 sample - SACF
2.3 S&P500 sample - (GJR)-GARCH(1,1), unconditional variance in segments
2.4 Simulated GJR-GARCH(1,1) with segments
2.5 Simulated GJR-GARCH(1,1) geometry
3.1 Truncated power spline basis functions
3.2 B-spline basis functions
3.3 S&P500 sample - (GJR)-GARCH(1,1) and BS(K)-(GJR)-GARCH(1,1), l = 2
3.4 S&P500 sample - (GJR)-GARCH(1,1) and BS(K)-GJR-GARCH(1,1), l = 3
3.5 S&P500 GJR-GARCH(1,1) and BS(8)-GARCH(1,1) derivatives
4.1 Simulated BS-GJR-GARCH process, K = 11, l = 3
4.2 Smoothed squared innovations, one path
4.3 Smoothed squared innovations, M paths
4.4 Luo-Kang-Yang procedure
5.1 RMSE model selection criteria, correctly specified model
5.2 RMSE model selection criteria, misspecified model
5.3 Standard deviation GARCH parameters, correctly specified model
5.4 Bias GARCH parameters, correctly specified model
5.5 RMSE GARCH parameters, correctly specified model
5.6 Standard deviation GARCH parameters, misspecified model
5.7 Bias GARCH parameters, misspecified model
5.8 RMSE GARCH parameters, misspecified model
5.9 Coverage probability GARCH parameters, correctly specified model
5.10 Coverage probability GARCH parameters, misspecified model
5.11 Boxplots GARCH parameters, correctly specified model
5.12 Boxplots GARCH parameters, misspecified model
6.1 S&P500 sample - FKS(K)-(GJR)-GARCH(1,1) model, l = 2
6.2 S&P500 sample - FKS(K)-(GJR)-GARCH(1,1) model, l = 3
6.3 S&P500 sample - FKS(K)-(GJR)-GARCH(1,1), basis functions, l = 3
6.4 S&P500 sample separated in full, estimation, validation period
6.5 S&P500 OOS forecast evaluation, full validation period
6.6 S&P500 OOS forecast evaluation, high volatility period
6.7 S&P500 OOS forecast evaluation, low volatility period
D.1 Selection of $\hat{K}$ with AIC, specified as S-GARCH model
D.2 Selection of $\hat{K}$ with BIC, specified as S-GARCH model
D.3 Selection of $\hat{K}$ with HQ, specified as S-GARCH model
D.4 Selection of $\hat{K}$ with GCV, specified as S-GARCH model
D.5 Selection of $\hat{K}$ with AIC, correctly specified (equidistant starting vector)
D.6 Selection of $\hat{K}$ with BIC, correctly specified (equidistant starting vector)
D.7 Selection of $\hat{K}$ with HQ, correctly specified (equidistant starting vector)
D.8 Selection of $\hat{K}$ with GCV, correctly specified (equidistant starting vector)
D.9 Selection of $\hat{K}$ with AIC, correctly specified model
D.10 Selection of $\hat{K}$ with BIC, correctly specified model
D.11 Selection of $\hat{K}$ with HQ, correctly specified model
D.12 Selection of $\hat{K}$ with GCV, correctly specified model
D.13 Selection of $\hat{K}$ with AIC, misspecified model
D.14 Selection of $\hat{K}$ with BIC, misspecified model
D.15 Selection of $\hat{K}$ with HQ, misspecified model
D.16 Selection of $\hat{K}$ with GCV, misspecified model
D.17 Asymptotic normality of $\hat{\alpha}_1$, correctly specified model
D.18 Asymptotic normality of $\hat{\beta}_1$, correctly specified model
D.19 Asymptotic normality of $\hat{\gamma}_1$, correctly specified model
D.20 Asymptotic normality of $\hat{\alpha}_1$, misspecified model
D.21 Asymptotic normality of $\hat{\beta}_1$, misspecified model
D.22 Asymptotic normality of $\hat{\gamma}_1$, misspecified model
D.23 S&P500 sample - FKS(K)-(GJR)-GARCH(1,1) model, l = 1
D.24 S&P500 sample - BS(K)-(GJR)-GARCH(1,1) model, l = 1

List of Tables

2.1 S&P500 sample - description
2.2 S&P500 sample - descriptive statistics
2.3 S&P500 sample - (GJR)-GARCH(1,1) model
2.4 Data generating process, GJR-GARCH(1,1)
3.1 S&P500 sample - BS(K)-(GJR)-GARCH model, l = 2
3.2 Computation time, S-GJR-GARCH(1,1), BS-GJR-GARCH(1,1)
3.3 S&P500 sample - BS(K)-(GJR)-GARCH model, l = 3
4.1 Computation time, different starting vectors
5.1 Simulation Setup
5.2 Data generating processes and simulated knot locations
5.3 Computation time, FKS-GJR-GARCH(1,1)
5.4 Sample statistics of estimated knot locations
5.5 Standardized-residuals statistics, correctly specified model
5.6 Standardized-residuals statistics, misspecified model
5.7 Fraction of full rank Fisher-information matrices, correctly specified model
5.8 Fraction of full rank Fisher-information matrices, misspecified model
6.1 S&P500 sample - BS(K)-(GJR)-GARCH model. t-test volatility persistence.
6.2 S&P500 sample - FKS(K)-(GJR)-GARCH model, l = 1, knot locations
6.3 S&P500 sample - FKS(K)-(GJR)-GARCH model, l = 2, knot locations
6.4 S&P500 sample - FKS(K)-(GJR)-GARCH model, l = 2
6.5 S&P500 sample - FKS(K)-(GJR)-GARCH model, l = 3, knot locations
6.7 S&P500 sample - FKS(K)-(GJR)-GARCH model. t-test volatility persistence.
6.6 S&P500 sample - FKS(K)-(GJR)-GARCH model, l = 3
6.8 S&P500 sample - Diebold-Mariano test with MSPE out-of-sample evaluation
6.9 S&P500 sample - Diebold-Mariano test with QLIKE out-of-sample evaluation
6.10 S&P500 sample - Diebold-Mariano test with MSPE out-of-sample evaluation
6.11 S&P500 sample - Diebold-Mariano test with QLIKE out-of-sample evaluation
6.12 S&P500 sample - Diebold-Mariano test with MSPE out-of-sample evaluation
6.13 S&P500 sample - Diebold-Mariano test with QLIKE out-of-sample evaluation
C.1 Model selection, S-GARCH model
C.2 Model selection with starting vector routine, correctly specified model
C.3 Model selection with equidistant starting vectors, correctly specified model
C.4 Model selection with starting vector routine, misspecified model
C.5 Knot location statistics, DGP 1 ($K_0 = 11$)
C.6 Knot location statistics, DGP 2 ($K_0 = 9$)
C.7 Knot location statistics, DGP 3 ($K_0 = 5$)
C.8 Knot location statistics, DGP 4 ($K_0 = 2$)
C.9 Finite sample properties: Estimator statistics, correctly specified model
C.10 Finite sample properties: Estimator statistics, misspecified model
C.11 Finite sample properties: Volatility persistence
C.12 Kolmogorov-Smirnov-test, correctly specified model
C.13 Kolmogorov-Smirnov-test, misspecified model
C.14 Coverage probability, correctly specified model
C.15 Coverage probability, misspecified model
C.16 S&P500 sample - BS(K)-(GJR)-GARCH model, l = 1
C.17 S&P500 sample - FKS(K)-(GJR)-GARCH model, l = 1
C.18 S&P500 sample - MSPE, full validation period
C.19 S&P500 sample - MSPE, low volatility period
C.20 S&P500 sample - MSPE, high volatility period
C.21 S&P500 sample - QLIKE, full validation period
C.22 S&P500 sample - QLIKE, low volatility period
C.23 S&P500 sample - QLIKE, high volatility period
C.24 S&P500 sample - Diebold-Mariano test, MSPE, full validation period
C.25 S&P500 sample - Diebold-Mariano test, MSPE, low volatility period
C.26 S&P500 sample - Diebold-Mariano test, MSPE, high volatility period
C.27 S&P500 sample - Diebold-Mariano test, QLIKE, full validation period
C.28 S&P500 sample - Diebold-Mariano test, QLIKE, low volatility period
C.29 S&P500 sample - Diebold-Mariano test, QLIKE, high volatility period

List of Abbreviations

ACF: AutoCorrelation Function
AIC: Akaike Information Criterion
ARCH: AutoRegressive Conditional Heteroskedasticity
ARCH-LM: ARCH Lagrange Multiplier
ARMA: AutoRegressive Moving Average
B-spline: Basic-spline
BFGS: Broyden, Fletcher, Goldfarb and Shanno
BIC: Bayesian Information Criterion
BS-GARCH: B-spline-GARCH
CI: confidence interval
cf.: confer (Latin for “compare”)
CP: coverage probability
DCC: Dynamic Conditional Correlation
DGP: Data Generating Process
DM: Diebold and Mariano
EGARCH: Exponential GARCH
e.g.: exempli gratia (Latin for “for example”)
et al.: et alii (Latin for “and others”)
etc.: et cetera (Latin for “and the rest”)
FKS-GARCH: Free-Knot-Spline-GARCH
GARCH: Generalized ARCH
GARCHt: GARCH with Student’s-t distributed DGP
GARCH MIDAS: GARCH MIxed Data Sampling
GCV: Generalized Cross Validation
GDP: Gross Domestic Product
GJR-GARCH: Glosten-Jagannathan-Runkle GARCH
GN: Gauss Newton
GNML: Gauss Newton Marquardt Levenberg
HARCH: Heterogeneous ARCH
HQ: Hannan-Quinn Information Criterion
i.e.: id est (Latin for “that is”)
IGARCH: integrated GARCH
i.i.d.: independent and identically distributed
IS: In-Sample
JB: Jarque-Bera
KS: Kolmogorov-Smirnov
LS: Least Squares
MD: Martingale Difference
MRES: Modified Recursive Expanding Scheme
ML: Maximum Likelihood
MLE: ML Estimator
MTV-GARCH: Multiplicative Time-Varying GARCH
n.d.: negative definite
n.s.d.: negative semi definite
NLS: Nonlinear Least Squares
OLS: Ordinary Least Squares
OOS: Out Of Sample
OPG: Outer Product of Gradients
p.: pagina (Latin for “page”)
pp.: paginae (Latin for “pages”)
p.d.: positive definite
p.s.d.: positive semi definite
pdf: probability density function
PS-GARCH: Penalized Spline GARCH
QML: Quasi-Maximum Likelihood
QMLE: QML Estimator
RMSE: Root Mean Squared Error
SACF: Sample AutoCorrelation Function
S-GARCH: Spline-GARCH
S&P500: Standard & Poor’s composite 500 index
SSR: Sum of Squared Residuals
SST: Sum of Total Squares
TV-GARCH: Time-Varying GARCH
VP: Volatility Persistence
WN: White Noise
w.r.t.: with respect to

List of Symbols

General symbols
$b$: scalar b
$\hat{b}$: estimated b
$\mathbf{b}$: vector b
$\mathbf{B}$: matrix B
$\mathbf{B}'$: transposed matrix B
$\mathbf{B}^{-1}$: inverse of matrix B
$\mathbf{B}^{+}$: Moore-Penrose generalized inverse of matrix B
$\mathbf{b}^{0}$: vector with starting values
$\mathbf{b}_{0}$: vector with true values
$\mathbf{I}$: identity matrix
$1(\cdot)$: indicator variable for a subset $(\cdot)$
$\mathbb{R}$: set of real numbers
$\mathbb{Z}$: set of integer numbers
$D$: open convex set
$\Pr\{\cdot\}$: probability
$F(\cdot)$: empirical distribution
$p(\cdot)$: probability density function
$f(\cdot), g(\cdot)$: arbitrary function
$f'(\cdot), g'(\cdot)$: first derivative of arbitrary functions (scalar)
$f''(\cdot), g''(\cdot)$: second derivative of arbitrary functions (scalar)
$N(\cdot)$: normal distribution
$St(\cdot, v)$: Student’s-t distribution with $v$ degrees of freedom
$\chi^2(v)$: chi-square distribution
$\Gamma(\cdot)$: gamma function
$c$: constant term (several usages)
$\Psi_{t-1}$: information set
$E[\cdot]$: unconditional expectation
$E_{t-1}[\cdot]$: conditional expectation
$E_y[\cdot], E_z[\cdot], E_0[\cdot]$: expectation w.r.t. y, z, true distribution
$Var[\cdot]$: unconditional variance
$Var_{t-1}[\cdot]$: conditional variance
$Cov[\cdot]$: unconditional covariance
$\Sigma$: variance-covariance-matrix
$Avar[\cdot]$: asymptotic variance
$\kappa(\cdot)$: theoretical kurtosis
$\rho_h$: ACF
$Tr(\cdot)$: trace function
$diag(\cdot)$: diagonal function
$dim(\cdot)$: dimension
$\#$: number of elements
$\xrightarrow{p}$: convergence in probability
$\overset{i.i.d.}{\sim}$: i.i.d. variable

Indices and orders
$h$: lag of ACF
$H$: number of lags in (S)ACF
$j$: iterations optimization algorithm
maxIter: maximum number of iterations
$j$: forecast step
$J$: number of forecast steps
$t$: time index
$T$: last time point (in full sample period)
$T^{(E)}$: last time point in estimation sample period
$V$: length validation sample period
$e$: index for validation sample period
$t+j|t$: OOS-forecast j steps ahead, conditional on the information at time $t$
$t|t-1$: in-sample-forecast, conditional on the information at time $t-1$
$t:t+j$: cumulative sum of corresponding variable in range $[t, t+j]$
LFP: low-frequency period
$i$: several usages
$n$: several usages
$m$: replication index
$M$: number of replications
$M_T$: number of converged replications
$K$: number of knots
$p$: ARCH parameter index
$P$: number of ARCH parameters
$q$: GARCH parameter index
$Q$: number of GARCH parameters
$U$: number of AR parameters
$V$: number of MA parameters

Parameters
$\theta$: parameter (general)
$\phi$: ARMA function parameter (general)
$\phi_y$: AR parameter
$\phi_\varepsilon$: MA parameter
$\alpha_0$: intercept short-term volatility function
$\alpha_p$: ARCH parameter
$\beta_q$: GARCH parameter
$\gamma_p$: asymmetric response parameter
$v$: degrees of freedom (Student’s-t distribution)
$\eta_1$: volatility persistence
$w_i$: spline function parameter
$\lambda_i$: Jupp transformed knot
$\omega_i$: parameter of Jupp back-transformation
$e_i$: normalized B-spline knot difference of parameters
$\iota$: smoothing parameter for penalized splines
$\boldsymbol{\theta}$: vector of parameters
$\boldsymbol{\phi}$: vector of ARMA parameters
$\boldsymbol{\alpha}$: vector of conditional variance function parameters
$\mathbf{w}$: vector of spline function parameters
$\boldsymbol{\lambda}$: vector of Jupp transformed knots
$u$: length of parameter vector
$\Phi$: set of mean parameters
$A$: set of conditional variance parameters
$W$: set of spline parameters
$\Lambda$: set of Jupp transformed parameters
$\Theta$: compact parameter set
$U$: maximum number of parameters in set of models

Variables
$a_t$: error term of squared innovation function
$p_t$: spot-price of a financial asset
$y_t$: log-return
$\varepsilon_t$: innovation
$\hat{\varepsilon}_t$: residual
$z_t$: standardized innovation
$\hat{z}_t$: standardized residual
$x_t$: exogenous variable
$\boldsymbol{\varepsilon}$: vector of innovations
$\mathbf{y}$: vector of log-returns

Optimization
$a$: fraction of initial rate of objective function decrease
$b$: parameter controlling the step-length in line-search algorithm
$c$: parameter controlling for cubic step-length
$d$: parameter controlling for quadratic step-length
$\delta$: step length optimization algorithm
low: lower bound backtracking parameter
up: upper bound backtracking parameter
$p_q(\delta)$: quadratic polynomial in line-search sub-iteration
$p_{cu}(\delta)$: cubic polynomial in line-search sub-iteration
funtol: stopping criterion for relative function value
gradtol: stopping criterion for relative gradient
steptol: stopping criterion for relative step-length
macheps: floating point relative accuracy
$m$: correction term model-Hessian
$n$: constant in GNML method
$p(\cdot)$: approximation polynomial
$\Delta p(\cdot)$: gradient of approximation polynomial
$\Delta^2 p(\cdot)$: Hessian of approximation polynomial
$Q(\theta)$: BFGS update
$s$: optimization algorithm’s step (search direction)
$r$: backtracking coefficient line-search algorithm
$y$: difference from new and old gradient in BFGS update

Statistics
$\hat{E}(\cdot)$: sample mean
$\widehat{Var}(\cdot)$: sample variance
$\widehat{Std}(\cdot)$: sample standard deviation
$\widehat{Cov}(\cdot)$: sample covariance
$\hat{\rho}_h$: SACF
$\widehat{SK}(\cdot)$: sample skewness
$\widehat{Kur}(\cdot)$: sample kurtosis
$\widehat{bias}(\cdot)$: sample bias
$\widehat{RMSE}$: sample RMSE
$bias_{\hat{t}(t^0),m}$: bias of estimated knot vector (measured as Euclidean distance)
$bias_{t^0,m}$: bias of starting-knot vector (measured as Euclidean distance)
$bias_{\hat{t}(t^0)}$: average bias of estimated knot vector
$bias_{t^0}$: average bias of starting-knot vector
$bias_{\hat{t}(t^0)}$, $bias_{t^0}$, $\widehat{Std}[bias_{t^0}]$: components of the standardized bias of estimated knot vector, the standardized bias of starting-knot vector, and the standardized sample standard deviation
$Q^{LB}_H$: Ljung-Box statistic
$Q^{McL}_H$: McLeod-Li statistic
$d^{+}_{M_T}, d^{-}_{M_T}, d_{M_T}$: KS statistic
$d_{M_T;1-\alpha}$: KS quantiles
$\widehat{\Pr}_c$: coverage probability
$\widehat{rse}(\hat{\theta})$: estimated robust-standard-error

Mean and variance functions
$h_t$: short-term volatility function
$\tau_t$: long-term volatility function
$\bar{\tau}_{LFP}$: averaged long-term volatility
$\sigma^2$: constant unconditional variance
$\sigma_t^2$: conditional variance
$\tilde{\sigma}_t^2$: proxy variable for conditional variance
$IV_t$: integrated variance
$RV_t$: realized variance
$\nu_t$: short-term volatility function of $z_t$
$m_e$: e-th moment of $z_t$
$\eta_e$: unconditional expectation of $\nu_t^e$
$\mu$: unconditional mean
$\mu_t$: conditional mean
$\mathbf{c}_t$: vector of partial first derivatives of linear parameters (mean function)
$\mathbf{d}_t$: vector of partial first derivatives of linear parameters (variance function)

Spline functions
$a$: left boundary knot
$b$: right boundary knot
$l$: degree of basis function
$B_i^l(t)$: spline basis function
$\mathbf{B}^l(t)$: spline basis
$\tilde{\mathbf{B}}^l(\tilde{t})$: normalized B-spline basis
$t_i$: knot-(location)
$\mathbf{t}$: knot vector
$\tilde{\mathbf{t}}$: normalized knot vector
$\mathbf{t}_I$: inner knot vector
$r_i$: number of knots at site $t_i$
$\xi_i$: break point
$\boldsymbol{\xi}$: vector of break points
$\upsilon_i$: smoothness of spline-function at site $i$
$s(t)$: spline function
$k$: k-th derivative of a spline function
$D_k$: matrix of the k-th difference
$Z$: polynomial for Jupp back transformation
$\zeta_i$: distance of adjacent knots
$E$: distance of adjacent knots matrix Rose algorithm
$F$: distance matrix Rose algorithm
$R$: Rose matrix for solving system of linear equations
$Jump_{l,k}\, t_i$: jump of a spline function with degree $l$ and k-th derivative at $t_i$
dist: interval length of equidistant knot vector
$S_{l,t}$: space of spline functions
$P_{l,t}$: space of polynomials
$C^{l-k}$: continuous derivative up to order $l-k$

Other Symbols
$L_T$: likelihood function
$\ln L_T$: log likelihood function
$\ln L_t$: log likelihood function at time t
$g(\theta)$: gradient vector
$g_t(\theta)$: gradient vector at time t
$H(\theta)$: Hessian matrix
$H_t(\theta)$: Hessian matrix at time t
$H_0$: asymptotic Hessian
$\hat{H}_{\hat{\theta}}$: estimated Hessian matrix
$I_0$: Fisher information
$\hat{I}_{\hat{\theta}}$: estimated Fisher information
$J$: Jacobian matrix
$n_i$: unit normal vector
$B(\cdot)$: Boltzmann entropy
$K(\cdot)$: Kullback-Leibler information quantity
$S$: hat matrix
$P$: projection matrix


$H_0$: null-hypothesis
$H_1$: alternative-hypothesis
$\alpha$: significance level
$\mathrm{simplex}_{K-1}$: simplex
$\overline{\mathrm{simplex}}_{K-1}$: simplex closure
$\mathrm{Loss}(\cdot)$: loss function
$\mathrm{smooth}$: operator for unspecific smoothing algorithm
$w_t$: weight of moving-average smoother


1 Introduction

1.1 Motivation

Arguably, no concept in financial mathematics is as loosely interpreted and as widely discussed as ‘volatility’. A synonym to ‘changeability’, ‘volatility’ has many definitions and is used to denote various measures of changeability.
Shiryaev (1999, p.345)

At the heart of speculative price modeling stands the concept of volatility. Volatility refers to the standard deviation or the variance of a certain financial variable over time. For the purposes of this dissertation, volatility equals the conditional variance. It is a dazzling concept, as volatility is not observable, and the risk or uncertainty of an investment is often derived from it. While risk is typically associated with the lower tail of the distribution, volatility corresponds to both the upper and the lower tail. Therefore, it is reasonable not to equate volatility with risk, but rather with uncertainty (Granger, 2002). Volatility has been gaining momentum at the latest since the publication of the option pricing model by Black and Scholes (1973) and Merton (1973). Unlike the other four variables (stock price, strike price, time to expiration, and interest rate) in their model, volatility is not observable. It has, therefore, to be estimated from the data. “Under the strict assumptions of the Black-Scholes model, implied volatility is interpreted as the market’s estimate of the constant volatility parameter” (Mayhew, 1995). Besides the implied volatility, the traditional estimator is the historical volatility, estimated by the variance of the stock price history. In addition to the determination of the option price already mentioned, volatility is relevant for portfolio theory (like the capital asset pricing model), risk measurement (like value-at-risk or expected shortfall), and the Basle Accord’s provisions. Unlike the models discussed in this dissertation, the Black-Scholes-Merton model was defined in continuous time.
Nevertheless, even in discrete time series models, the conditional variance was long assumed to be constant. To describe the dynamics of a particular variable, the mean function of this variable was the focus of interest. Until the 1970s, the changeability of the variance was mainly thought of as a cross-section phenomenon. However, for financial market instruments like stocks, interest rates, exchange rates, or several economic indicators like the inflation rate, the assumption of a constant conditional variance did not describe the dynamics of the data well. Financial returns, in particular, are poorly fitted by models that assume a homoskedastic conditional variance, see Gouriéroux (1997, pp.1-5). Stylized facts about financial returns like the leptokurtosis of the marginal distribution, the cluster property, and the dependency had long been known and empirically documented, see Mandelbrot (1963); Fama (1965).

It was the seminal work of Engle (1982) that recognized that an autoregressive process of squared returns could adequately explain these stylized facts¹. His Autoregressive Conditional Heteroskedastic (ARCH) model was a game-changer, for which Robert F. Engle was co-awarded the Nobel Prize in economic sciences in 2003². With the exploration of the ARCH model and its generalization by Bollerslev (1986), a model class was created with which the volatility could now be viewed as a time-varying function of its past³. With this model class, “the dynamic volatility era” (Andreou et al., 2001), which covers the last twenty years of the twentieth century, was ushered in. As pure ARCH models are rarely applied nowadays, the model class is referred to by its generalization, GARCH, in the following. The GARCH model can be compared with an AutoRegressive Moving Average (ARMA) model for the mean since, in addition to the moving average component of the ARCH model, the autoregressive component also found its way into the model. This construction made it possible to explain the mentioned stylized facts with fewer parameters to estimate than the ARCH model.

An important stylized fact that both models cannot explain is the asymmetric response of the volatility to the sign of the returns. There are several models which can explain this fact. A well-known representative applied in this dissertation is the Glosten-Jagannathan-Runkle GARCH (GJR-GARCH) model (Glosten et al., 1993). Other models that explain the asymmetric response of the volatility are, for instance, the Exponential GARCH (EGARCH) model (Nelson, 1991), the Power-GARCH model (Ding et al., 1993), or the Threshold-GARCH model (Zakoian, 1994). While Engle originally chose the inflation rate in the United Kingdom as the object of study, it is now impossible to imagine financial market econometrics without models of the GARCH class. To date, many GARCH models have been developed. However, only those that are relevant to this dissertation will be discussed below. A comprehensive overview of various univariate GARCH models can be found in Bollerslev et al. (1992); Bera and Higgins (1993); Bollerslev et al. (1994); Xekalaki and Degiannakis (2010) and of multivariate ARCH models in Bauwens et al. (2006), Lütkepohl (2007, chapter 5), Tsay (2014, chapter 7).

¹ Whereby the structure is more like a moving average model.
² He shared the award with Clive W. J. Granger, who was honored for his contribution to cointegration, see Diebold (2004).

© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2022 O. Old, Modeling Time-Varying Unconditional Variance by Means of a Free-Knot Spline-GARCH Model, Gabler Theses, https://doi.org/10.1007/978-3-658-38618-4_1
Furthermore, GARCH models applied in other research areas where data violates the assumption of constant conditional variance are discussed in Guo et al. (2014) for traffic data, Campbell and Diebold (2005) for temperature forecasting, or Wong et al. (2006) for brain dynamics through an electroencephalogram.
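The asymmetric response of the volatility to the sign of the returns can be illustrated with a one-step GJR-GARCH(1,1) variance recursion. The following is only a minimal sketch: the function name, the parameter names (omega, alpha, gamma, beta), and the numerical values are illustrative assumptions and reproduce neither the notation of section 2.4 nor any estimates of this dissertation.

```python
def gjr_garch_next_variance(omega, alpha, gamma, beta, eps_prev, sigma2_prev):
    """One-step conditional variance of a GJR-GARCH(1,1) recursion.

    gamma is added to the ARCH coefficient only after a negative innovation;
    gamma = 0 gives the symmetric GARCH(1,1) recursion.
    """
    indicator = 1.0 if eps_prev < 0 else 0.0
    return omega + (alpha + gamma * indicator) * eps_prev ** 2 + beta * sigma2_prev


# Illustrative (assumed) parameter values, not estimates from the thesis.
params = dict(omega=0.02, alpha=0.03, gamma=0.10, beta=0.90)
after_loss = gjr_garch_next_variance(eps_prev=-2.0, sigma2_prev=1.0, **params)
after_gain = gjr_garch_next_variance(eps_prev=2.0, sigma2_prev=1.0, **params)
```

With these assumed values, a negative innovation raises the next-period variance more than a positive innovation of the same size, which is the asymmetry the symmetric GARCH model cannot capture.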

Long-term behavior of volatility

Autoregressive models assume that the time series under consideration are (weakly) stationary. The first two unconditional moments are, therefore, time-independent. For GARCH models, it holds that the conditional variance fluctuates around a constant unconditional variance, which is the mean of the volatility process. The stationarity condition is met if the sum of the moving average (ARCH) and autoregressive (GARCH) parameters is less than one. This sum is known as volatility persistence (VP). If one forecasts with this estimated stationary VP, the process is mean-reverting, i.e., it approaches the unconditional variance in the long run. The magnitude of the VP determines the speed of the approach. A higher VP corresponds to a slower mean-reversion. With the wide application of GARCH models, it became apparent that the VP is often nearly one. Lamoureux and Lastrapes (1990) showed that the VP for long return series is higher than for short return series. Engle and Bollerslev (1986) stated that “ARCH and GARCH models for interest rates typically exhibit parameters that are not in the stationary region”. This empirical fact inspired them to model the process with an integrated variance, similar to a random walk model in the mean function. Here, the VP equals one. They called this model the Integrated-GARCH (IGARCH) model.

³ Returns are typically thought to be unpredictable in a perfect market environment (see Fama (1965)). If this is the case, the returns equal the innovations (error term in time series context). However, Engle (1982) focused on a more general mean process, with dynamic and static explanatory variables. It is, therefore, more accurate to speak of squared innovations rather than squared returns.
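The link between the VP and the speed of mean reversion can be made concrete with the multi-step variance forecast of a stationary GARCH(1,1) model. The sketch below assumes the textbook recursion in which the h-step forecast equals the unconditional variance plus VP^(h-1) times the initial deviation; the function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def garch11_forecast(omega, alpha, beta, sigma2_next, horizon):
    """h-step-ahead variance forecasts (h = 1, ..., horizon) of a GARCH(1,1)."""
    vp = alpha + beta  # volatility persistence
    if not 0.0 <= vp < 1.0:
        raise ValueError("covariance stationarity requires alpha + beta < 1")
    uncond = omega / (1.0 - vp)  # constant unconditional variance
    return [uncond + vp ** (h - 1) * (sigma2_next - uncond)
            for h in range(1, horizon + 1)]


def half_life(alpha, beta):
    """Number of periods until half of a variance deviation has decayed."""
    return math.log(0.5) / math.log(alpha + beta)


# Assumed values: VP = 0.95, unconditional variance 1, current forecast 4.
forecasts = garch11_forecast(0.05, 0.05, 0.90, sigma2_next=4.0, horizon=100)
```

Under these assumptions the forecasts decay monotonically towards the unconditional variance, and raising the VP from 0.95 to 0.99 lengthens the half-life from roughly 14 to roughly 69 periods.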


In a comment on their paper, Diebold (1986) first conjectured that shifts in the intercept of the conditional variance equation cause a high VP in GARCH models. This assumption suggested different means in different segments of the volatility process. A high estimated VP corresponds to a long memory of the process if the applied model is correct. Mikosch and Starica (2004) confirmed this conjecture. They demonstrated that changes in the unconditional variance cause the so-called “IGARCH effect” and that the resulting assumption of a long-memory process is spurious. Hillebrand (2005) showed that estimating the parameters over the entire sample period while neglecting structural breaks overestimates the VP. Over the last 40 years, there have been many occurrences associated with a structural break in the time series of stock returns. First and foremost, there were the Asian crisis in the late 1990s, the bursting of the dot-com bubble at the beginning of the 2000s, the financial crisis in 2008, and, most recently, the corona pandemic. However, there are also single events like Black Monday in 1987 or the terrorist attacks in 2001 that show up as structural breaks in time series. At this point, a distinction must be made between structural breaks in the mean process and in the variance process. The events just mentioned are related to a sudden or continuing drop in spot prices and, therefore, correspond to negative returns. A well-known stylized fact is the negative correlation of volatility with returns, the so-called leverage effect (discussed in detail in section 2.2). Therefore, all these events are also breaks in the structure of the variance. It is, therefore, recommended to apply a structural break test, e.g., the cumulative sum of squares test of Inclán and Tiao (1994), to identify such breaks. There are many models within the GARCH framework to address the problem of different segments, a selection of which will be briefly discussed here.
A remedy for this problem would be finding structural breakpoints, dividing the whole sample into homogeneous segments of the volatility process, and estimating a GARCH model within each segment. Hillebrand and Medeiros (2008) followed this procedure. However, this procedure has several major drawbacks. First, the breakpoints have to be determined beforehand using additional breakpoint detection tests. Second, it cannot be ensured that the distance between two breakpoints is large enough to obtain unbiased estimators. Furthermore, different GARCH models have to be evaluated for each segment, which makes the procedure complex. Within this framework, but in a uniform procedure, Mercurio and Spokoiny (2004) and Čižek and Spokoiny (2009) proposed a model with time-varying parameter values. For this, they estimated a GARCH model with constant unconditional variance for a pre-defined period. The range was then divided into sub-ranges, and a likelihood ratio test strategy was used to determine whether the sample period could be described by a single parameter set or had to be divided. If a single event like the 1987 crash occurs without any volatility clustering, models of the GARCH class forecast a far too high volatility over a long time horizon. This inspired Hamilton and Susmel (1994) and Cai (1994) to propose a regime-switching ARCH model (later generalized by Gray (1996) and Haas (2004)). A random unobservable variable represents the volatility regime, and the transition probability of switching between these states is estimated. The state variable is assumed to follow a first-order Markov chain with a sudden transition to the adjacent segment. In the smooth-transition model (González-Rivera, 1998), by contrast, the intercept of the conditional variance equation can shift slowly from one regime to another.
Here, the transition is modeled by a logistic function with a smoothness parameter accounting for the type of transition (from a sudden shift to a smooth transition). A special case of the smooth-transition model is the Time-Varying (TV) GARCH model by Amado et al. (2008). Here, the parameters are a function of time, modeled by a logistic function. This model is embedded in the framework of multiplicative GARCH models, further discussed below.


Models with multiplicatively decomposed volatility

All these models are based on the assumption of an (at least locally) constant unconditional variance process. Early approaches attempted to relax this assumption by modeling additively linked short-term and long-term components, see Ding and Granger (1996), Muller et al. (1997), or Engle et al. (1999). Feng (2004) and van Bellegem and von Sachs (2004) were the first to propose decomposing the conditional variance multiplicatively. One component is the stochastic stationary short-term part, representing the fluctuation around a deterministic non-stationary long-term part. The latter is the time-varying unconditional variance. A standard GARCH model⁴ typically embodies the short-term volatility process. This modular applicability makes multiplicative decomposition models a smart alternative to the more traditional approaches discussed above. Feng chose a non-parametric Nadaraya-Watson kernel estimator for the long-run volatility. The function representing the long-term part should follow the long-term path smoothly. That mitigates the problem of a spurious long memory caused by a single event like the 1987 crash.

In the spirit of these models, Engle and Rangel (2008) introduced their spline-GARCH model (in the following S-GARCH), which is the basis of the proposal in this dissertation. Here, a spline function represents the deterministic long-term part. Before discussing this model further, a brief explanation of a spline function follows to classify the terms used. Spline functions are piecewise polynomials, where their degree determines the smoothness of these polynomials. The sites where the polynomial pieces meet are called knots. Multiple knots may occur at one site. If this is the case, then a distinction between knots and breakpoints has to be made. The spline function consists of spline-basis functions, weighted by parameters. The sum of these weighted spline-basis functions is the spline function, see Lyche et al. (2018, chapter 1), Dierckx (1993, chapter 1), or de Boor (2001, chapters 8-9). For the S-GARCH model, Engle and Rangel (2008) chose quadratic (degree two) truncated power spline bases with equidistant knots. An equidistant knot vector means that the time series is divided into intervals of equal length depending on the number of knots. Their paper intended to explain the sources of volatility, with the presumption that exogenous variables can explain the time-varying pattern of the long-term volatility (non-constant unconditional volatility). These sources are presumed to be macroeconomic variables, typically recorded at a different frequency than the returns. They proposed a two-step procedure for this. First, the long-term volatility was estimated. In a second step, they averaged these function values to the frequency of the exogenous variables. Then they conducted a cross-section regression with the averaged long-term volatility as the dependent variable.

Muller et al. (1997) provided an alternative hypothesis about the source of the long-term volatility. Here, the authors had the idea that different types of market participants influence short-term or long-term volatility. Thus, traders who trade intra-day or day-to-day impact short-term volatility, while traders with a long investment horizon are not interested in intra-day or daily fluctuations. They called their model heterogeneous ARCH (HARCH), which differs from the framework of multiplicative GARCH models. However, there is a model-immanent approach that does not rely on exogenous variables. These variables influence the behavior of traders on the markets in some way. The innovations, therefore, contain every piece of information. This assumption is implied in the model presented in this dissertation.

⁴ In the following, the term standard GARCH refers to any GARCH model with constant unconditional variance.
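The long-term component of a spline-GARCH model with quadratic truncated power bases and equidistant knots can be sketched as follows. The sketch assumes an exponential form with a linear trend and one quadratic truncated power term per knot, without exogenous variables; the names long_term_variance, c, w0, and w, as well as the coefficient values, are illustrative assumptions and do not reproduce the specification given in chapter 3.

```python
import math


def equidistant_knots(T, k):
    """Left endpoints of k intervals of equal length on [0, T]."""
    return [i * T / k for i in range(k)]


def long_term_variance(T, c, w0, w):
    """Assumed form: tau_t = c * exp(w0*t + sum_i w_i * max(t - knot_i, 0)^2)."""
    knots = equidistant_knots(T, len(w))
    tau = []
    for t in range(1, T + 1):
        exponent = w0 * t
        for w_i, knot in zip(w, knots):
            # quadratic truncated power basis function
            exponent += w_i * max(t - knot, 0.0) ** 2
        tau.append(c * math.exp(exponent))
    return tau


# Two knots (at 0 and 500) for T = 1000 observations; assumed coefficients.
tau = long_term_variance(1000, c=1.0, w0=0.0, w=[1e-6, -2e-6])
```

The exponential keeps the long-term variance positive for any real coefficients, and each truncated power term only becomes active to the right of its knot, so the curvature of the function can change at the knots.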


The penalized spline GARCH model (PS-GARCH hereafter) by Brownlees and Gallo (2010) or Feng and Härdle (2020) represents a modified version of the S-GARCH model. Here, the likelihood function has a penalty term, through which a function with many knots can be smoothly estimated. The approach follows the recommendation by Eilers and Marx (1996). Another important representative of this model class is the GARCH-MIDAS (MIxed Data Sampling) model by Engle et al. (2013), with a long-term stochastic component. This model links the sources of long-term volatility directly to the model. The return series is modeled as a high-frequency variable, whereas exogenous variables are modeled as low-frequency variables. The linkage to the exogenous variable in the long-term component is obtained by a so-called beta-weighting scheme, following the MIDAS approach by Ghysels et al. (2007). If the exogenous variable is stationary, then so is the long-term volatility. The statistical properties of the GARCH-MIDAS model were examined by Conrad and Kleen (2020). Finally, the multiplicative TV-GARCH model of Amado et al. (2008); Amado and Teräsvirta (2013, 2017) should not go unmentioned. Here, the long-term component is modeled as a deterministic smooth-transition function. A comprehensive summary of models with a multiplicative decomposition can be found in Amado et al. (2018).

1.2 Problem statement

Besides explaining the economic sources, models with a multiplicative decomposition mitigate the near unit-root VP. This was shown for every single model (in the related publications) and in some simulation studies. For the S-GARCH model, Old (2020) conducted a broad finite sample study to investigate the behavior of the estimators under several conditions⁵. Here, the Standard & Poor’s 500 composite index (hereafter S&P 500) from 1980-2018 was sampled, and an S-GARCH and an S-GJR-GARCH model (with GJR-GARCH as short-term volatility model) with different numbers of knots were estimated. These estimated parameters were then used to simulate and replicate time series of different lengths with different numbers of knots. This simulation setup highlighted that the VP decreases when the number of knots increases. This is especially the case when the ratio of the number of knots to the time series length increases. It was possible to demonstrate that the estimators are also strongly biased for short time series (fewer than 1000 data points). Therefore, biased estimators caused the tremendous decrease in VP for short time series, and this decrease should be treated cautiously. In conclusion, spline-GARCH models are not suitable for short time series. Conversely, smoothing the unconditional variance with a spline function seems to be tailor-made for long time series. The S-GARCH model in the cited working paper was only slightly modified from the original model, and the results also hold for the S-GARCH model in general. In particular, the spline functions were estimated with equidistant knot vectors and with quadratic truncated power spline bases.

⁵ This working paper is not part of this dissertation. However, reference is made to the results of this work at the appropriate place.

The S-GARCH model in its original form has several drawbacks, which the model presented in this dissertation aims to address:
• First of all, the choice of the basis function should be mentioned. The applied truncated power bases could be nearly linearly dependent if two adjacent knots are too close to each other and, therefore, numerically unstable (de Boor, 2001, pp.84-86). However, this problem is unlikely to occur if an equidistant knot vector is chosen and the ratio of the number of knots to the time series length is small.
• Besides the numerical problems with the truncated power bases, an equidistant knot vector is too restrictive. As revealed by Rice (1969), de Boor (1973), and Burchard (1974), the approximation power of spline functions improves when the knot locations are adapted to the data. In other words, the knots should be placed where the data is not smooth.
• The specification of the degree of the spline basis functions should be part of the model selection and not determined in advance.
From this derive the research objectives of this dissertation.
• The main contribution of this dissertation is estimating the knot locations within the framework of a spline-GARCH model as free parameters. Moreover, this dissertation investigates the proposed model in the asymmetric GJR-GARCH and the symmetric GARCH variant. In addition to the assumption of a normally distributed data generating variable, this dissertation also examines the assumption of a Student’s-t distributed variable. The dissertation intends to identify the finite-sample properties of the resulting parameters of the conditional variance equation and the accuracy of the estimated knot locations under the given circumstances with a broad simulation study. These circumstances differ from some well-studied standard GARCH models (see Lumsdaine (1996, inter alia)) by the time-varying unconditional variance. In the case of a deterministic spline function, the long-term component is, furthermore, non-stationary. The proposed free-knot spline-(GJR)-GARCH model (in the following FKS-GARCH or FKS-GJR-GARCH) differs from the S-GARCH model in the choice of basis functions and the free estimation of the knots.
If the knots are freely estimated, the resulting knot locations could be very close to each other or even coincide, particularly if there is a structural break in the volatility process. The proximity of the knots would lead to numerical problems with truncated power bases, as mentioned above. Therefore, the spline function is formed with a Basic-spline (B-spline) basis. B-spline basis functions in the context of a GARCH model have already been investigated by Audrino and Bühlmann (2009), Liu and Yang (2016), or Zhang et al. (2020). A B-spline basis function has a major advantage here. As long as all knots within the range of a B-spline basis function do not coincide in one site, the spline basis remains nonsingular. Thus, the choice of a B-spline function for the free estimation of the parameters proves to be reasonable.

Since the free-estimation approach is rarely applied in the time series setting, this dissertation draws on the rich literature in the field of cross-sectional analysis. In contrast to spline functions with a given knot vector (e.g., an equidistant knot vector), the free estimation of the knot vector is generally a nonlinear problem. In the cross-section theory, it is proposed to solve this problem with a nonlinear least-squares (NLS) approach (de Boor and Rice, 1968b; Jupp, 1978; Eubank, 1984; Gervini, 2006, inter alia). However, estimating parameters in the framework of (nonlinear) GARCH models with least-squares (LS) methods is rather uncommon. Compared to the usually applied (quasi-)maximum likelihood (QML) method, it has some disadvantages. Section 4.2 discusses this issue in more detail. Zhang et al. (2020) circumvent this problem by separating the estimation of the spline parameters (with LS) and the GARCH parameters (with QML). However, their approach does not involve free-knot estimation as the knots are given in advance. On the other hand, cross-section theory rarely applies the QML method. Exceptions are extended linear models (see Stone et al. (1997, inter alia)) or generalized linear models (see Fahrmeir et al. (2001, chapters 2,3)).

Another issue when optimizing a nondecreasing sequence like the knot vector is the so-called “lethargy” problem (Jupp, 1975). Here, a typical Newton-Raphson or Gauss-Newton type optimization routine tends to find coincident knots as optima, even if the true data generating process (DGP hereafter) has no multiple knots at that site. These optima often turn out to be local optima or saddle points. To mitigate the “lethargy” problem of a free-knot optimization routine, Jupp (1978) provided a logarithmic transformation of the knot vector, which transforms the problem from a constrained into an unconstrained optimization problem, with which coincident knots are virtually impossible. Nevertheless, if there is a pronounced structural break, two knots can still lie very close to each other. This dissertation applies the so-called Jupp transformation of the knots, which has two decisive advantages. First, the transformed knot parameters are on the same scale as the others (which would not be the case if the raw knot locations were applied). Second, the well-understood QML method can be employed in a uniform procedure with a single parameter vector. This parameter vector contains the parameters of the mean function, the standard GARCH function, the spline function, and the transformed knots.

As for any parameter estimation, the choice of starting values for estimating optimal knot locations is of utmost relevance. Wold (1974), Jupp (1978), Dierckx (1993, pp.67-68), or Lindstrom (1999) examined different starting values, without giving a universal answer on how to calculate a starting vector with which a free-knot estimation procedure can be conducted for all possible types of data. All these approaches are from the field of cross-section theory.
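How a logarithmic transformation turns the ordered knot vector into an unconstrained parameter vector can be sketched in a few lines. The sketch assumes the log-ratio form of the transformation, kappa_i = ln((t_{i+1} - t_i) / (t_i - t_{i-1})) with boundary knots a and b; the definition used in this dissertation is given in section 4.2.3 and may differ in detail, and the function names are hypothetical.

```python
import math


def jupp_transform(knots, a, b):
    """Map strictly increasing inner knots in (a, b) to unconstrained values."""
    t = [a] + list(knots) + [b]
    if any(t[i + 1] <= t[i] for i in range(len(t) - 1)):
        raise ValueError("knots must be strictly increasing inside (a, b)")
    return [math.log((t[i + 1] - t[i]) / (t[i] - t[i - 1]))
            for i in range(1, len(t) - 1)]


def jupp_inverse(kappa, a, b):
    """Map any real vector back to strictly increasing inner knots in (a, b)."""
    # Relative interval lengths: h_1 = 1 and h_{i+1} = h_i * exp(kappa_i).
    h = [1.0]
    for k in kappa:
        h.append(h[-1] * math.exp(k))
    scale = (b - a) / sum(h)  # rescale so that the intervals fill [a, b]
    knots, position = [], a
    for h_i in h[:-1]:
        position += scale * h_i
        knots.append(position)
    return knots


kappa = jupp_transform([0.2, 0.5, 0.55], a=0.0, b=1.0)
recovered = jupp_inverse(kappa, a=0.0, b=1.0)
```

Because every interval length is a positive multiple of its neighbor, any real-valued vector maps back to strictly increasing knots, so an unconstrained optimizer can never produce coincident or disordered knots, although two knots may still end up very close together.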
To the best of my knowledge, no such starting vector procedures for the field of time series analysis have been developed to date.
• This dissertation investigates appropriate knot-vector starting values for highly fluctuating return data. For this, some known algorithms, modified algorithms, and new procedures are investigated by simulation.
• Another objective of this dissertation is to verify to what extent different model selection criteria can determine the model order in terms of the number of knots. Engle and Rangel (2008) recommend applying the Bayesian Information Criterion (BIC) to select the optimal number of knots. The BIC is known for penalizing models with a high-dimensional parameter vector more strongly than other criteria. Therefore, besides the BIC, the Akaike Information Criterion (AIC), the Hannan-Quinn criterion (HQ), and the Generalized Cross-Validation criterion (GCV) are investigated in a simulation study. To close the gap, this dissertation will furthermore examine how accurate the BIC is when applying the S-GARCH model under a misspecified model (simulated non-equidistant knot vector).
All this leads to five key questions this dissertation aims to answer.
• What are the finite-sample properties of the parameters of the short-term GARCH equation when a spline function is applied to smooth long-term volatility? Moreover, how do the freely estimated knot locations of the spline function affect the estimators of the short-term volatility function?
• How accurate are the estimated knot locations in relation to the applied starting vectors?
• Can the different model selection criteria identify the true number of knots? Furthermore, is it possible to distinguish between a process with constant unconditional variance and a process with time-varying unconditional variance through the model selection criteria?
This dissertation provides a broad simulation study and an empirical application to answer these questions. Other questions typically arise when volatility models are evaluated:
• How can spline-GARCH models in general (in different short-term variants and with different presumptions about the distribution of the DGP) mitigate the spurious long-memory property of standard GARCH models? So, is there a significant drop in VP due to the use of spline-GARCH models? Moreover, can the VP be lowered more by free estimation of the knots than by choosing equidistant knots?
• Is there an improvement in the in-sample (IS) and the out-of-sample (OOS) forecast performance if spline-GARCH models in general, and with freely estimated knots in particular, are applied?
An empirical example answers these questions.
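The statement that the BIC penalizes large parameter vectors more strongly than other criteria can be checked with the common textbook forms of the AIC, BIC, and HQ. The sketch below assumes these unscaled forms based on the maximized log-likelihood; the definitions used in section 4.3 may be scaled differently, the GCV is omitted because it requires the smoother itself, and the function name and inputs are hypothetical.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """AIC, BIC and HQ in their common (assumed) unscaled textbook forms."""
    return {
        "AIC": -2.0 * loglik + 2.0 * n_params,
        "BIC": -2.0 * loglik + n_params * math.log(n_obs),
        "HQ": -2.0 * loglik + 2.0 * n_params * math.log(math.log(n_obs)),
    }


# Penalty per additional parameter for 10,000 observations (loglik set to 0).
penalty = information_criteria(loglik=0.0, n_params=1, n_obs=10_000)
```

For a sample of this size, each additional knot costs 2 under the AIC, about 4.4 under the HQ, and about 9.2 under the BIC, so the BIC tends to select fewer knots than the other two criteria.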

1.3 Outline of the thesis

Chapter 2 deals with the general model specification and the short-term volatility. First, the basics of financial time series, central definitions, and notations relevant to the further course of the dissertation are introduced. This includes the stylized facts of financial return time series such as the leptokurtosis of the unconditional distribution, volatility clustering, the leverage effect and the asymmetric response of the volatility to the sign of the return, the absence of correlation, the non-stationarity of the spot-price series, and some facts about measurement and aggregational Gaussianity of return data. This dissertation illustrates all introduced models with empirical examples. For this purpose, a sample of the S&P500 index from 1980-2020 is employed. This sample is described in detail. Besides the model parameters, some sample statistics, such as two Portmanteau test procedures, the Jarque-Bera test, the ARCH-LM test, and the sample autocorrelation function (SACF), have to be calculated to check the appropriateness of the corresponding model. Section 2.3 introduces these statistics and the model specifications. Section 2.4 defines the symmetric GARCH(P,Q) and the asymmetric GJR-GARCH(P,Q) model applied in this dissertation. Hansen and Lunde (2001, 2005) showed empirically that GARCH models with P = 1 and Q = 1 provide better fits than most of their competitors. This dissertation models the short-term volatility as GARCH(1,1) and GJR-GARCH(1,1) to focus on the issues with the free-knot estimation in chapter 4. This holds across all empirical applications and within the simulation study. Furthermore, an AR(1) model with intercept is estimated for the presented models to account for the low serial autocorrelation in the mean function. Without anticipating the findings in chapter 4, it also makes sense to estimate the mean and the variance function in a uniform procedure for the consecutive models.
Finally, the chapter discusses the problem of estimating long time series with the standard GARCH class models. If there are neglected differences in the regime of the unconditional mean, autoregressive models such as standard GARCH models (and ARMA models for the mean function) estimate parameters in the near unit-root range. Hillebrand (2004, 2005) showed that this is due to the geometry of the estimation procedure. Examples with simulated data and with the S&P500 data illustrate this issue.


Chapter 3 examines the smoothing of the long-term variance by models with a multiplicatively decomposed conditional variance. The first part specifies the structure for all models of this class. The second part concerns the theory and application of spline functions. After a general introduction to spline theory, the spline functions with truncated power bases and B-spline bases, both relevant for this dissertation, are defined and discussed in more detail. The last part of the chapter presents different spline-GARCH models. First, several spline-volatility models, which do not model volatility within a multiplicatively decomposed conditional variance framework, are presented. Then the S-GARCH model is described in more detail, and an S-GARCH model with a B-spline basis (BS-GARCH hereafter) is introduced. Since B-spline bases can represent truncated power bases, the function values of both spline functions are equivalent. Therefore, all models discussed in this dissertation are estimated with B-spline basis functions (chapter 5 provides an exception, applying the original S-GARCH model to study the selection of the number of knots with different information criteria). The B-spline basis functions are used for the BS-GARCH model, the PS-GARCH model, and, from chapter 4 on, the FKS-GARCH model.⁶ For illustrative purposes, the models are estimated with the S&P500 returns. This chapter estimates an example for the BS-GARCH model with different assumptions about the distribution of the DGP, the symmetric response of the variance, and B-spline basis functions of different degrees. Chapter 6 reports the results.

Chapter 4 introduces and investigates the FKS-GARCH model. This chapter examines the entire theory, methodology, and application of this model. Before discussing the actual model, the first part deals with the applied optimization algorithms. In this dissertation, the estimation of all parameters is conducted with unconstrained optimization algorithms. Therefore, this part looks at the general unconstrained optimization environment. It first presents the primary Newton-Raphson method in general and discusses its problems. Then the applied Broyden, Fletcher, Goldfarb, and Shanno (BFGS) quasi-Newton algorithm and the associated line-search procedures are described. The Gauss-Newton algorithm is briefly covered to build a bridge to the models in the cited cross-section literature, as the cross-section field typically estimates spline models with a linear or nonlinear LS method. The second part addresses the objective functions to be optimized. These are the (N)LS and the QML method. This section first discusses the LS and the NLS methods in spline parameter estimation and specifies the Jupp transformation. The central part of the chapter evaluates the QML estimation of the single parameter vector of the FKS-GARCH model. This parameter vector contains all parameters of the mean function, the short-term volatility function (GARCH or GJR-GARCH), the spline function, and the parameters of the Jupp-transformed knot locations. The section initially introduces the fundamental maximum likelihood theory and the inherent assumptions about distribution, model, and regularity. Then the partial first derivatives of the likelihood function and of the mean, volatility, and spline function with respect to (w.r.t.) the parameters and the Jupp-transformed knot locations are derived. These analytical derivatives are crucial for the computation time and the accuracy of the applied optimization algorithm.

⁶ To make notation concise: FKS-GARCH is the general form used when no distinction between the FKS-GARCH and the FKS-GJR-GARCH model is needed.

Furthermore, the moment conditions of the gradient and the Hessian and the resulting asymptotic theory for the use case of the FKS-GARCH model are analytically examined. It can be demonstrated that the factor for the long-term volatility (the exponential spline function) can asymptotically be reduced, and, therefore, the already known asymptotic theory for standard (ARMA)-GARCH models applies. This section also points out that a two-step estimation of the mean and the volatility function could violate the assumption of a block-diagonal (estimated) Fisher information matrix in finite samples. Moreover, the variance and spline-function parameters and the Jupp-transformed knot locations are asymptotically not independent. They therefore have to be estimated jointly, as proposed. Part three of this chapter presents in detail the four model-selection criteria, BIC, AIC, HQ, and GCV, which this dissertation utilizes in the sequel. Part four concerns the theoretical framework of forecast evaluation. The last part of this chapter considers evaluating a starting vector, particularly for estimating the knot locations. This task is not trivial, even for less fluctuating data than financial returns. Regularly occurring local outliers make it difficult for known algorithms to recognize the long-term pattern (which corresponds to the long-term volatility for squared returns) underlying the data. Therefore, the data must be smoothed in advance, and afterwards, the smoothed data is subject to a starting-vector evaluation. For this, two adaptive free-knot procedures, the Multivariate Adaptive Regression Splines (MARS) model by Friedman (1991) and an approach by Luo et al. (2019), the proposals by Dierckx (1993) and Lindstrom (1999), and an ad-hoc procedure were tested through a simulation. The candidates applied in the subsequent analysis are the modified Luo-Kang-Yang procedure and the ad-hoc procedure. These turned out to be superior under the given simulation study setup.

Chapter 5 explores the behavior of the FKS-GARCH model in the framework of different pre-specified simulation setups. For this, four cubic BS-GJR-GARCH models with a different number of knots and different distributions of the knot-vector and a standard GJR-GARCH model are simulated. Each of these models is simulated once with a normal distributed DGP and once with a Student’s-t distributed DGP. Furthermore, each model setup is replicated 1000 times. The following step estimates the parameters for each replication ranging from zero knots (standard GARCH case) to 15 knots, namely, once with a correctly specified distribution and once with a misspecified distribution. The aim of this simulation study is twofold. First, it studies the finite-sample properties of the GARCH estimators under the application of the FKS-GARCH model. Second, it evaluates the model selection criteria. Moreover, it verifies if the true number of knots can be determined with the help of any criterion presented in chapter 4.3. From the asymptotic theory from chapter 4.2.4, some favorable properties of the QML estimators for the FKS-GARCH models, such as consistency and asymptotic normal distribution, can be derived. The simulation study examines how the QML estimators behave in finite samples. For that reason, the study applies different time series lengths and the different specifications as mentioned above. In doing so, it investigates the distribution, the bias, and the variance of the estimators in dependency of time series length and simulated paths. For this purpose, several statistics are computed to verify the assumptions made. The model orders in the range from zero knots (standard GARCH) to 15 knots are estimated in the simulation study. These are 16 different models. As only one model out of 16 is the true model, there are 15 models misspecified. Here, the ability of the criteria to distinguish between a time-varying and a constant unconditional variance is of outstanding importance. 
It turns out that the AIC, BIC, and HQ recognize the simulated standard GARCH model.

However, the BIC often underestimates more complex model orders. The AIC, on the other hand, overestimates models with a few inner knots. The HQ seems to be a reasonable alternative here, as the simulation study suggests. Therefore, the HQ is applied in the empirical part of this dissertation. A critical issue is the selection of the starting vector. Hence, the simulation study investigates three different starting-knot-vector routines. These starting-knot vectors are the equidistant one (which is the fastest to compute), an ad-hoc vector (also very fast to compute), and a modified version of the proposal by Luo et al. (2019). It turns out that, for a few inner knots, all starting vectors achieve good results in terms of deviation from the simulated knot vector. The more complex the DGP, the more complicated it is to find a suitable starting vector. For the models with many simulated knots, more elaborate starting vectors achieved better results. Furthermore, one simulation setup aims to test whether the BIC (which Engle and Rangel (2008) initially recommended) identifies the true number of knots when the S-GARCH model (with quadratic truncated power bases) is applied.
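The different behavior of the criteria can be made tangible with a small sketch. It uses the common textbook forms of AIC, BIC, and HQ; this is an assumption, since the exact (possibly rescaled) definitions of this dissertation are given in chapter 4.3 and may differ. The function name and all numbers are hypothetical and serve only to show how the penalties are ordered for a large sample.

```python
import math

def information_criteria(log_lik, n_params, n_obs):
    """Textbook AIC, BIC and HQ from a maximised log-likelihood (smaller is better)."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    hq = -2.0 * log_lik + 2.0 * n_params * math.log(math.log(n_obs))
    return {"AIC": aic, "BIC": bic, "HQ": hq}

# Hypothetical values: three extra parameters raise the log-likelihood by 8.
small = information_criteria(log_lik=-14000.0, n_params=6, n_obs=10000)
large = information_criteria(log_lik=-13992.0, n_params=9, n_obs=10000)

# For each criterion, does it prefer the larger model?
prefers_large = {name: large[name] < small[name] for name in small}
print(prefers_large)
```

With these illustrative numbers the per-parameter penalty is 2 for the AIC, about 4.4 for the HQ, and about 9.2 for the BIC, so the same likelihood gain is accepted by the AIC and the HQ but rejected by the BIC.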

Chapter 6 reports the results of the estimated conditional variances by different spline-GARCH models. The focus is on the previously theoretically derived FKS-GARCH model and its competitors employing the S&P500 sample. For this purpose, the S&P500 sample is divided into an estimation period (01/02/1980-01/11/2017) and a validation period (01/12/2017-12/31/2020). The IS analysis considers the full period (01/02/1980-12/31/2020). The first part of this chapter reviews previous studies which employed a spline-GARCH model, considering first and foremost the empirical analysis in the Engle and Rangel (2008) paper. In addition, this chapter reviews the empirical S-GARCH model studies by Engle et al. (2013), Goldman and Wang (2015), Goldman and Shen (2017), Silvennoinen and Terasvirta (2017), Conrad and Hartmann (2019), and Old (2020). After that, part two of the chapter examines the IS results of FKS-GARCH models in different variants. These variants are estimated with a GARCH and a GJR-GARCH model for short-run volatility, once each with normally distributed and with Student's-t distributed data generating variables. In order to investigate different degrees of smoothness of the long-term volatility, the FKS-GARCH models are estimated with basis functions of degrees l ∈ {1, 2, 3}. For illustrative purposes, chapters 2 and 3 present the estimates of the competitor models (standard GARCH and BS-GARCH models). These results are discussed and compared to the results of the FKS-GARCH models. For reasons of comparability, the standard GARCH models and the BS-GARCH models were estimated in the same variants as the FKS-GARCH model. The IS analysis includes the evaluation of the VP, the accuracy of the estimated conditional variance (utilizing loss functions), the model selection with the HQ, and the analysis of the standardized residuals.
The latter provides information on how well the applied model represents the previously made assumptions about the distribution of the data generating variable. The results of this chapter are summarized as follows: The sample under consideration demonstrates that the VP decreases significantly when a spline function smooths the long-term volatility. Furthermore, for the FKS-GARCH model, the number of selected knots is smaller than in the case of equidistantly distributed knots. A problem for FKS-GARCH models with l = 3 is that the Fisher information is not of full rank. This occurs since knots were estimated very close to each other (nearly coincident) at different breakpoints. However, this does not affect the other estimators, and, thus, the estimated standard errors of the other parameters remain reliable. In the last part of this chapter, an OOS study investigates the forecast accuracy of the corresponding models. These are evaluated with different loss functions. The study confirms that, under certain circumstances, the FKS-GARCH model produces a better forecast of the conditional variance than the other models considered.


2 Financial time series

Financial assets share several characteristics that have been identified through empirical observation over time. These are independent of the period, location, and financial instrument and are referred to as stylized facts of financial time series. Before discussing some of the most important stylized facts for this dissertation, this chapter introduces the problem of financial time series modeling and briefly reviews the history of financial market research. First, some useful properties are defined.

2.1 Definitions and properties

Let $p_t > 0$ be the observed price of a financial asset at time $t \in \mathbb{Z}$, where $t \geq 1$ is measured in days and $\Psi_{t-1} = \{p_{t-1}, p_{t-2}, p_{t-3}, \ldots\}$ is the information set available to the observer up to time $t-1$. For the sake of notational simplicity, in the following, $E_{t-1}[\cdot]$ refers to the conditional expectation $E[\cdot \mid \Psi_{t-1}]$ and $\mathrm{Var}_{t-1}[\cdot]$ refers to the conditional variance $\mathrm{Var}[\cdot \mid \Psi_{t-1}]$ of a certain process. Regarding the sequence of log-prices

$$\ln p_t = \mu + \ln p_{t-1} + \varepsilon_t, \tag{2.1}$$

$\ln p_t$ is a random walk with drift for $\mu \neq 0$ and without drift for $\mu = 0$. This was the common conjecture about the price process until the 1960s (Shiryaev, 1999, pp. 37-38). Rearranging (2.1),

$$y_t = \ln(p_t / p_{t-1}) = \mu + \varepsilon_t \tag{2.2}$$

is the process of the log-returns $y_t$. In a more general view, the mean process

$$y_t = \mu_t + \varepsilon_t \tag{2.3}$$

is an unspecified difference equation, where the conditional mean $E_{t-1}[y_t] = \mu_t$ is a dynamic (non-)linear function of lagged values of the dependent variable $(y_{t-1}, y_{t-2}, \ldots)$ and possibly exogenous (not necessarily stochastic) independent variables $(x_{1,t}, x_{1,t-1}, \ldots, x_{2,t}, x_{2,t-1}, \ldots)$. Dynamic refers to the lagged dependent variables in the model. A model with no lagged variables is defined as static. $\varepsilon_t$ is a term for the unexpected returns, the so-called innovations. The model is specified in section 2.3. The following definitions follow Brockwell and Davis (2006, chapters 1 and 3) and Shiryaev (1999, chapter 1).
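The relation between the random walk (2.1) and the log-returns (2.2) can be illustrated with a short simulation. This is only a sketch under stated assumptions: the innovations are drawn from a Gaussian distribution, and the function names, the drift, the scale, and the starting price are hypothetical choices for illustration, not values from this dissertation.

```python
import math
import random

def simulate_log_prices(n, mu, sigma, p0=100.0, seed=0):
    """Simulate ln p_t = mu + ln p_{t-1} + eps_t, cf. (2.1), with Gaussian eps_t."""
    rng = random.Random(seed)
    log_prices = [math.log(p0)]
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)  # innovation (Gaussian is an assumption)
        log_prices.append(mu + log_prices[-1] + eps)
    return log_prices

def log_returns(log_prices):
    """y_t = ln(p_t / p_{t-1}) = ln p_t - ln p_{t-1}, cf. (2.2)."""
    return [b - a for a, b in zip(log_prices, log_prices[1:])]

log_p = simulate_log_prices(n=5000, mu=0.0005, sigma=0.01)
y = log_returns(log_p)
mean_y = sum(y) / len(y)  # fluctuates around the drift mu
print(len(y), mean_y)
```

The log-prices wander without settling at a level, whereas the returns fluctuate around the drift, which is the contrast taken up again in section 2.2.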


© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2022 O. Old, Modeling Time-Varying Unconditional Variance by Means of a Free-Knot Spline-GARCH Model, Gabler Theses, https://doi.org/10.1007/978-3-658-38618-4_2

D.1 W.r.t. the first and the second moment of a distribution, a process is weakly stationary if

$$E[|y_t|^2] < \infty \quad \forall t \in \mathbb{Z},$$
$$E[y_t] = \mu \quad \forall t \in \mathbb{Z},$$
$$\mathrm{Cov}[y_t, y_{t+h}] = E[(y_t - \mu)(y_{t+h} - \mu)] = \mathrm{Cov}(h) \quad \forall t, h \in \mathbb{Z},$$
$$\mathrm{Cov}[\varepsilon_t, \varepsilon_{t+h}] = E[\varepsilon_t \varepsilon_{t+h}] = \mathrm{Cov}(h) \quad \forall t, h \in \mathbb{Z},$$

i.e., the mean is constant, the variance is finite, and the covariance only depends on $h$, where $h$ corresponds to a certain lag. For $h = 0$, $\mathrm{Cov}[\varepsilon_t, \varepsilon_t] = \mathrm{Var}[\varepsilon_t] = \mathrm{Var}[y_t] = \sigma^2$ (the variance is unconditionally constant). Therefore, this definition of stationarity is sometimes called covariance stationarity. The autocorrelation function (ACF)

$$\rho_h = \frac{\mathrm{Cov}[\varepsilon_t, \varepsilon_{t+h}]}{\sigma^2} = \frac{\mathrm{Cov}(h)}{\sigma^2} \quad \forall t, h \in \mathbb{Z} \tag{2.4}$$

is the standardized covariance function and describes the memory (or persistence) of the process. $\mu$, $\sigma^2$, and $\rho_h$ are independent of $t$.

D.2 Given that a variable is weakly stationary with $\mu = 0$ and $\rho_h = 0$ $\forall t, h \neq 0$, then the process is called white noise (WN), $\varepsilon_t \sim \mathrm{WN}(0, \sigma^2)$.

D.3 If, in addition to definitions D.1 and D.2,

$$E[|\varepsilon_t|] < \infty, \qquad E_{t-1}[\varepsilon_t] = 0, \quad t \geq 1,$$

a stochastic sequence is called a martingale difference (MD). The condition $E[|\varepsilon_t|] < \infty$ ensures integrability of the absolute values of the process. This expression is closely related to definition D.2. If an innovation is a martingale difference, then no prediction about the future can be made from the (known) process history. The terms are therefore uncorrelated, i.e., $\mathrm{Cov}[\varepsilon_t, \varepsilon_s] = 0$ for $t \neq s$, but not necessarily independent, i.e., in general $E[g(\varepsilon_t) f(\varepsilon_s)] \neq E[g(\varepsilon_t)] E[f(\varepsilon_s)]$ for $t \neq s$ and arbitrary functions $g, f$. The assumption of MD innovations is necessary for any ARCH process.

D.4 If a variable is, furthermore, independent and identically distributed (i.i.d.), then $E[g(\varepsilon_t) f(\varepsilon_s)] = E[g(\varepsilon_t)] E[f(\varepsilon_s)]$ for $t \neq s$ and arbitrary functions $g, f$, i.e., $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$. This is WN in a strict sense. A random walk process as defined above has an innovation series with $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$, where $y_t = \sum_{s=1}^{t} \varepsilon_s$. If the variables are Gaussian distributed, then this is always strict WN, i.e., $\varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$.

D.5 If $F(\varepsilon_t, \ldots, \varepsilon_s)$ and $F(\varepsilon_{t+h}, \ldots, \varepsilon_{s+h})$ $\forall t, h \in \mathbb{Z}$ have the same joint distribution $F$ for all moments, then an innovation series is called strictly stationary.
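As a rough illustration of the ACF in (2.4), the following sketch computes a sample counterpart in plain Python. The normalization (division by the sample size at every lag) is one common convention and an assumption here; the estimator actually used in this dissertation is the SACF introduced in section 2.3 and may differ in detail. The function name is hypothetical.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0, ..., rho_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n  # sample variance (lag-0 autocovariance)
    acf = []
    for h in range(max_lag + 1):
        ch = sum(dev[t] * dev[t + h] for t in range(n - h)) / n
        acf.append(ch / c0)
    return acf

# Small deterministic example.
acf_trend = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)

# Simulated Gaussian white noise: autocorrelations at h != 0 should be near zero.
rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
acf_noise = sample_acf(noise, max_lag=5)
print(acf_trend, acf_noise[1])
```

For white noise as in definition D.2, the values at lags $h \neq 0$ scatter closely around zero, while $\rho_0 = 1$ holds by construction.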

Strict stationarity implies weak stationarity if $E[|\varepsilon_t|^2] < \infty$ $\forall t$. This point is relevant for integrated GARCH processes (see section 2.4). Strict stationarity holds for every weakly stationary Gaussian process.

2.2 Stylized facts

Nonstationarity

An autoregressive model

$$y_t = \mu + \phi y_{t-1} + \varepsilon_t, \tag{2.5}$$

where $\phi$ is an autoregressive parameter, corresponds to a stationary model if $|\phi| < 1$, to a random walk with drift if $|\phi| = 1$, and to a model with exponentially growing variance if $|\phi| > 1$. Based on (2.5), Dickey and Fuller (1979) proposed a test for the hypotheses

$$H_0: |\phi| - 1 = 0, \qquad H_1: |\phi| - 1 < 0,$$

where the null hypothesis is rejected if the process is stationary. The price series of financial assets are typically nonstationary, i.e., the null hypothesis is not rejected, which is in line with the assumption of a random-walk process (Stock, 1994). A (near) unit-root autoregressive process corresponds to a (nearly) perfect correlation between today's price and yesterday's price, which is plausible, as no one expects sharp changes from one day to the next. This holds for the log-prices $\ln p_t$, as conjectured by Samuelson or Kendall, as well as for the prices $p_t$, as conjectured by Bachelier in the early 1900s (see Shiryaev, 1999, pp. 35-46). The log-returns $y_t$, on the other hand, fluctuate around a constant value. The top-left graph in figure 2.1 displays the S&P500 index price series, and the top-right graph the return series. The conjecture of a constant mean for $y_t$ seems appropriate. Therefore, the process reverts to a constant level in the long run; the process is mean-stationary. There are several statistics for testing the hypothesis of a mean-stationary time series. Software applications typically implement, for example, the Dickey-Fuller test or the Kwiatkowski, Phillips, Schmidt, and Shin test, to name two.¹ For this dissertation, the verification of the mean-stationarity assumption plays a minor role. It is assumed that price time series are nonstationary, while first-difference returns and log-returns are always mean-stationary. Nevertheless, for high-frequency returns (at least on a daily basis), the constancy of higher moments is not guaranteed.
Regarding the unequal fluctuation around the constant mean, it is evident that the (conditional) second moment is time-varying to some extent. GARCH class models represent such processes (see section 2.4). However, in standard GARCH models, the unconditional variance is assumed to be constant. Therefore, the variance also reverts to the long-term mean (of the variance process). If that were the case, then the series of log-returns would be weakly stationary. Mandelbrot (1963) was the first to observe that “the tails of the distributions of price changes are so extraordinarily long that the sample second moments typically vary in an erratic fashion” (this finding also includes leptokurtosis, which is discussed below). For different samples of daily cotton returns with $T \in \{2, \ldots, 1300\}$, he found that the variance did not approach a constant level, i.e., the second moment is infinite and the process is therefore not weakly stationary. On the other hand, Cont (2001) demonstrated with 5-minute returns of the S&P500 index that the variance is indeed finite (this may be due to temporal aggregation). Mikosch and Starica (2004) proved that there is a source of nonstationarity due to structural breaks in the conditional variance process. Therefore, it can be assumed that return time series are locally weakly stationary, but not over long periods. Generally speaking, the estimated slope parameters (autoregressive parameters) tend toward a unit root when structural breaks in the process have been neglected. Hillebrand (2004) proved this for the mean process and Hillebrand (2005) for the variance process. This dissertation focuses on this issue. Section 2.5 discusses the problem of structural breaks for the stationarity assumption again in detail.

¹ More information about unit-root tests can be found in Maddala and Kim (2004).

Figure 2.1: S&P 500 Index. Spot-prices $p_t$ (top left), AR(1) model residuals $\hat{\varepsilon}_t$ (top right), histogram of $\hat{\varepsilon}_t$ with Gaussian distribution in red (bottom left), squared residuals $\hat{\varepsilon}_t^2$ (bottom right). The panels mark Black Monday, the Asia crisis, the dotcom bubble, the September 11 attacks, the financial crisis, and the corona pandemic.
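The contrast between a unit-root price process and stationary returns in the autoregressive model (2.5) can be sketched by estimating $\phi$ with ordinary least squares on simulated data. This is only an illustration with hypothetical names and Gaussian innovations; it is not the Dickey-Fuller test itself, which additionally requires non-standard critical values for the test statistic.

```python
import random

def ols_ar1(x):
    """OLS estimates (mu_hat, phi_hat) for x_t = mu + phi * x_{t-1} + eps_t."""
    lagged, current = x[:-1], x[1:]
    n = len(current)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    sxx = sum((a - mean_lag) ** 2 for a in lagged)
    sxy = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    phi_hat = sxy / sxx
    mu_hat = mean_cur - phi_hat * mean_lag
    return mu_hat, phi_hat

rng = random.Random(1)
eps = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Random walk (phi = 1): cumulative sum of the innovations.
walk = []
level = 0.0
for e in eps:
    level += e
    walk.append(level)

_, phi_walk = ols_ar1(walk)  # level series: estimate close to one
_, phi_diff = ols_ar1(eps)   # first differences of the walk: estimate close to zero
print(phi_walk, phi_diff)
```

The level series yields an estimate near one, in line with a non-rejected unit-root hypothesis, whereas the differenced series yields an estimate near zero.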

No autocorrelation

For an i.i.d. process, arbitrary functions of the innovations are uncorrelated. This would also hold for squared and absolute innovations. In reality, however, the picture is often completely different. The innovations, as well as the returns, often appear weakly correlated or uncorrelated at lags $h > 1$, which was first shown empirically by Fama (1965) for a sample of thirty Dow-Jones-Index stocks from 1956-1962. The sample autocorrelation function (SACF, see (2.17)) $\rho_h(|\varepsilon_t|^c)$ with $c \geq 1$, in turn, appears to be significantly larger than zero up to a very large $h$ (Ding et al., 1993; Ding and Granger, 1996). This is also depicted in figure 2.2 for the SACF of the S&P500 returns $y_t$, the residuals of an AR(1) model $\hat{\varepsilon}_t$, and the squared residuals $\hat{\varepsilon}_t^2$. Therefore, the assumption of normally distributed returns or innovations is not tenable. Regarding the autocorrelated returns and residuals, the efficient market theory must be referred to (Fama, 1970, and the references therein). In a strong form, every market participant has the same information set $\Psi_{t-1}$, is rational, and prices follow a fair game, i.e., there are no opportunities for arbitrage, and $y_t$ is an MD (see definition D.3) with $E[y_t] = E_{t-1}[y_t] = 0$. Yet for stock returns (on a daily basis), it often appears that $y_t \approx \varepsilon_t$. This indicates a low serial autocorrelation of $y_t$, as figure 2.2 highlights. Therefore, a semi-strong or weak form of the efficient market theory seems more appropriate. This dissertation accounts for the (weak) autocorrelation through an AR(1) model (see section 2.3) and tests whether the autocorrelations are significant up to a certain number of lags by employing a Portmanteau test (see (2.19), (2.20)).

Figure 2.2: S&P 500 Index. Sample autocorrelation function (SACF) for lags 0 to 400 with 95% homoscedastic confidence band (red). From left to right: $y_t$, $\hat{\varepsilon}_t$, $\hat{\varepsilon}_t^2$

Leptokurtic and asymmetric unconditional distribution

Besides variance and autocovariance, from which the i.i.d. property of the normal distribution could be derived, the kurtosis (standardized fourth moment)

$$\kappa(\varepsilon_t) = \frac{E[\varepsilon_t^4]}{(E[\varepsilon_t^2])^2} \tag{2.6}$$

determines the frequency of large values, i.e., the tails of the distribution. If the examined variable is normally distributed, $\kappa(\varepsilon_t) = 3$; if $\kappa(\varepsilon_t) > 3$, the distribution is leptokurtic, whereas if $\kappa(\varepsilon_t) < 3$, the distribution is platykurtic. The assumption of normally distributed returns was the prevailing view until the 1960s, although Kendall recognized that large values are too common and too many values are clustered around the mean (cf. Fama (1965)). When Mandelbrot (1963) noticed the heavy tails of returns, he proposed a so-called stable Pareto distribution, which is described with parameters for the location, scale, skewness, and height of the tails. This distribution is a limiting distribution for the sum of i.i.d. variables, even if they have no finite second moment. This, in turn, makes the use of common statistical tools based on the definitions D.1-D.4 impossible, which is why the stable Pareto distribution has not really caught on. Cont (2001) recommended choosing a parametric distribution by its “analytical and numerical tractability”. Looking at the histogram of the residuals in the bottom-left graph in figure 2.1, the leptokurtosis of the S&P500 sample under consideration is apparent. Besides a finite variance and a kurtosis of $\kappa(\varepsilon_t) = 3$, a normal distribution is described by its symmetry. Accordingly, the third unconditional moment is $E[\varepsilon_t^3] = 0$. Table 2.2 reveals for the sample skewness that the normal distribution assumption does not apply to the present sample either. A test statistic that uses the sample skewness $\widehat{Sk}(\hat{\varepsilon}_t)$ (see (2.15)) and the sample kurtosis $\widehat{Kur}(\hat{\varepsilon}_t)$ (see (2.14)) to test the normal distribution hypothesis is the Jarque-Bera (JB) test (see (2.16)). This hypothesis is rejected for the returns as well as for the AR(1) residuals $\hat{\varepsilon}_t$ and the squared AR(1) residuals $\hat{\varepsilon}_t^2$, as table 2.2 lists for the S&P500 sample.
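The kurtosis in (2.6) and the idea behind the Jarque-Bera test can be sketched numerically. The sketch below uses the widely used textbook form $JB = \frac{T}{6}\left(Sk^2 + \frac{(Kur - 3)^2}{4}\right)$ with moment estimators divided by $T$; this is an assumption, as the exact estimators of this dissertation are defined in (2.14)-(2.16) and are not reproduced here. The function name and the heavy-tailed example (a scale mixture of normals) are hypothetical.

```python
import random

def moments_and_jb(x):
    """Return sample skewness, sample kurtosis and the textbook JB statistic."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # sample analogue of kappa in (2.6)
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb

rng = random.Random(7)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Heavy-tailed sample: with probability 0.1 the scale is 3 instead of 1.
mixture = [rng.gauss(0.0, 3.0 if rng.random() < 0.1 else 1.0) for _ in range(20000)]

skew_g, kurt_g, jb_g = moments_and_jb(gaussian)
skew_m, kurt_m, jb_m = moments_and_jb(mixture)
print(kurt_g, kurt_m, jb_g, jb_m)
```

The Gaussian sample has a kurtosis close to 3 and a small JB value, while the mixture is clearly leptokurtic and produces a very large JB value, which mirrors the rejection of normality reported for the S&P500 sample.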

Volatility clustering

Looking at the course of the residual time series in figure 2.1, it is relatively easy to see that the variance is not constant over time. Moreover, except for the single event of “Black Monday” (10/19/1987), the property of volatility clustering can be observed. Mandelbrot (1963) stated that “large changes tend to be followed by large changes - of either sign - and small changes tend to be followed by small changes”. Regarding the cluster property of returns, it is obvious that $\mathrm{Var}_{t-1}[\varepsilon_t] \neq \mathrm{Var}[\varepsilon_t]$, and that the marginal distribution of $\varepsilon_t$ is different from the marginal distribution of a Gaussian distributed variable. The bottom-right graph in figure 2.1 depicts the series of squared residuals $\hat{\varepsilon}_t^2$. This graph reveals the volatility clustering even more clearly. Referring to the significant and high autocorrelation of the squared residuals, the basic idea of the ARCH model of Engle (1982) already becomes apparent here (see section 2.4).

Leverage effect

Another important stylized fact of financial time series is that negative news (innovations) affects volatility more strongly than positive news (innovations). According to Black (1976), this is because corporations cannot adjust their fixed costs in the short term and, thus, cannot react to the same extent to declining revenues. Therefore, with falling revenues, the prospects for profit decrease and with them the value of the corporation. This is reflected in a stronger fall in stock prices and rising volatility. Black (1976) called this negative correlation the leverage effect of stock prices. In symmetric GARCH models, only the size but not the sign of $\varepsilon_t$ affects the conditional variance, see section 2.4. Therefore, no asymmetries and no leverage effect can be represented with these models. As discussed in the introduction, several asymmetric GARCH models like the EGARCH model, the Power-GARCH model, the Threshold-GARCH model, and the GJR-GARCH model were developed.
For a long time, the leverage effect and the asymmetric response of the conditional variance to the sign of the innovations were assumed to be two terms for the same phenomenon (Bollerslev, 2008; Zivot, 2009; Xekalaki and Degiannakis, 2010, inter alia). Recently the controversial issue has arisen whether asymmetric GARCH models can even reproduce the leverage effect. McAleer (2014) derived a random coefficient model which could generate an asymmetric GJR-GARCH process. For this random coefficient model, the GJR-GARCH parameters are variances of the random coefficients and, therefore, positively constrained. The leverage effect, a negative correlation between conditional variance and innovations, can only be modeled if the parameters of the response to positive innovations are negative and those with responses to negative innovations are positive. This concurrency is impossible if the parameters of an asymmetric GARCH model are considered variances of a random coefficient process. Therefore, the GJR-GARCH model (as well as the EGARCH model) can represent asymmetries in response to the innovations but not represent the leverage effect when positivity constraints are imposed. Caporin and Costola (2019) defined asymmetric responses as “a GARCH model allows for asymmetry if positive and negative shocks of the same size induce changes in the conditional variance (volatility) of different magnitude“ and the leverage effect as “a GARCH model is coherent with the presence of the leverage effect if negative shocks lead to an increase in the variance while positive shocks lead to a decrease in the variance“. Thus, this dissertation uses the term asymmetric response when applying the GJR-GARCH model.
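The distinction between an asymmetric response and the leverage effect can be made concrete with a one-step variance update. The sketch assumes the commonly used GJR-GARCH(1,1) recursion $h_t = \omega + (\alpha + \gamma \, 1\{\varepsilon_{t-1} < 0\}) \varepsilon_{t-1}^2 + \beta h_{t-1}$; the parameter names and values are hypothetical, and the notation of section 2.4 may differ.

```python
def gjr_garch_next_variance(h_prev, eps_prev, omega, alpha, gamma, beta):
    """One step of an assumed GJR-GARCH(1,1) recursion for the conditional variance."""
    indicator = 1.0 if eps_prev < 0 else 0.0  # extra weight for negative innovations
    return omega + (alpha + gamma * indicator) * eps_prev ** 2 + beta * h_prev

# Hypothetical parameter values, all positive (positivity constraints imposed).
params = dict(omega=0.02, alpha=0.03, gamma=0.10, beta=0.90)

h_after_negative = gjr_garch_next_variance(1.0, -2.0, **params)
h_after_positive = gjr_garch_next_variance(1.0, 2.0, **params)
h_after_zero = gjr_garch_next_variance(1.0, 0.0, **params)
print(h_after_negative, h_after_positive, h_after_zero)
```

Shocks of the same size but opposite sign change the variance by different amounts, which is the asymmetric response. With positive parameters, however, the positive shock still raises the variance relative to no shock, so the leverage effect in the sense of Caporin and Costola (2019) is not reproduced.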

Measurement and aggregational Gaussianity

The basis for the investigation of financial time series are the prices $p_t$, measured at time $t \in \mathbb{Z}$ (in discrete time). This time is often chosen arbitrarily. For example, most applications sample the daily closing price (as applied in the empirical part of this dissertation). However, the process is subject to a dynamic that is not necessarily continuous but occurs in much smaller time intervals. A price is achieved whenever suppliers and demanders reach an equilibrium price in the stock market. The highest frequency here is the so-called trade-by-trade (Rydberg and Shephard, 2003) or ultra-high-frequency (Engle, 2000) level, at which prices are measured in the (micro)second range and not necessarily at equidistant time intervals. The closing price of a stock does not necessarily occur at the closing time of the stock exchange. Consequently, the last traded price does not occur at the same time every day. Therefore, news (innovations) influence stock prices depending on the respective trading frequency of the stocks. This effect entered the literature as the non-synchronous-trading effect (Lo and MacKinlay, 1990). This problem arises because the system of dynamic processes and its measurement in discrete time differ from discrete time-series modeling with equidistant sampling. Here, the parameters are estimated from the measured data in discrete time, thus neglecting the dynamics between two measurement times. This problem increases the further apart the measurement times are. If $\Delta t = (t_i - t_{i-1}) \to 0$, this separation vanishes, and even discrete GARCH models can be transformed into continuous-time models, cf. Singer (1999, chapters 1 and 8). Nelson (1990) proved that in this case, the conditional variance process becomes integrated, i.e., nonstationary (see section 2.5).
If $\Delta t \to \infty$, on the other hand, the distribution of the returns approaches a normal distribution, and, as demonstrated by Diebold (1988) using the example of exchange rate returns, the ARCH pattern of the variance disappears.

2.3 Model specification

This dissertation assumes that the return series $\{y_t\}$ follows an Autoregressive Moving Average (ARMA) process. The ARMA model with a constant $\phi_0$ and without exogenous variables is defined as

\[ y_t = \phi_0 + \phi_{y_1} y_{t-1} + \dots + \phi_{y_U} y_{t-U} + \varepsilon_t + \phi_{\varepsilon_1} \varepsilon_{t-1} + \dots + \phi_{\varepsilon_V} \varepsilon_{t-V}, \tag{2.7} \]

where $\phi = (\phi_0, \phi_{y_1}, \dots)$ is a $(U + V + 1) \times 1$ parameter vector. The innovation series

\[ \varepsilon_t = y_t - \mu_t = \sigma(\Psi_{t-1}) z_t \tag{2.8} \]

is generated by an unobservable random variable $z_t$ with $E[z_t] = 0$ and $E[z_t^2] = 1$, where $z_t$ is stochastically independent of $\sigma(\Psi_{t-1})$ by assumption. $z_t$ is rescaled by $\sigma_t = \sigma(\Psi_{t-1})$, which denotes the conditional standard deviation of $\varepsilon_t$ and is also unobservable. $\sigma^2(\Psi_{t-1})$ is referred to as volatility. $E_{t-1}[y_t] = \mu_t$ is the conditional expectation of the return series and refers to a specific ARMA(U,V) model. If the conditional variance $\sigma_t^2 = \sigma^2$ is constant, then the innovation series $\varepsilon_t = y_t - \mu_t$ is a WN process. Under the assumption of an efficient market, some important properties can be derived for $\varepsilon_t$. First of all, it can be shown that $\varepsilon_t$ is an MD,

\[ E_{t-1}[\varepsilon_t] = E_{t-1}[\sigma_t z_t] = \sigma_t E_{t-1}[z_t] = 0, \tag{2.9} \]

by applying the law of iterated expectations, $E[E_{t-1}[z_t]] = E[z_t] = 0$, and, therefore,

\[ E[\varepsilon_t] = E[\sigma_t] E[z_t] = 0, \tag{2.10} \]

where $z_t$ is stochastically independent of values of $\varepsilon_t$, i.e., $E[g(\varepsilon_t) f(z_t)] = E[g(\varepsilon_t)] E[f(z_t)]$ as well as $E[g(\sigma_t) f(z_t)] = E[g(\sigma_t)] E[f(z_t)]$ for various functions $g, f$. Since $\sigma_t$ depends on the information set $\Psi_{t-1}$, $E_{t-1}[\sigma_t] = \sigma_t$. Given these derivations, the unconditional variance

\[ Var[\varepsilon_t] = E[\varepsilon_t^2] - E[\varepsilon_t]^2 = E[\varepsilon_t^2] = E[\sigma_t^2 E_{t-1}[z_t^2]] = E[\sigma_t^2] E[z_t^2] = E[\sigma_t^2], \tag{2.11} \]

is the same as the expected value of the conditional variance. The conditional variance, in turn, results from

\[ Var_{t-1}[\varepsilon_t] = E_{t-1}[\varepsilon_t^2] = E_{t-1}[\sigma_t^2 z_t^2] = \sigma_t^2 E_{t-1}[z_t^2] = \sigma_t^2, \tag{2.12} \]

where $E[E_{t-1}[z_t^2]] = E[z_t^2] = 1$ and $E[E_{t-1}[\varepsilon_t^2]] = E[\varepsilon_t^2] = E[\sigma_t^2]$, which proves (2.11). For purposes of this dissertation, the conditional variance $\sigma_t^2 = h_t \tau_t$ is multiplicatively decomposed into a long-term component $\tau_t$, which will be introduced in chapter 3, and a short-term component $h_t$, which will be introduced in section 2.4.

Sample properties, statistics and diagnostics

The S&P500 sample is assumed to follow an AR(1) process with $\mu_t = \phi_0 + \phi_{y_1} y_{t-1}$, where $\phi_0$ is the intercept and $\phi_{y_1}$ is a constant autoregressive parameter. Here, the log-returns

\[ y_t = 100 \ln(p_t / p_{t-1}) \tag{2.13} \]

are multiplied by 100 to obtain percentage returns and to stabilize numerical optimization. In the case of financial time series, the marginal distribution of the residual series $\hat\varepsilon_t = y_t - \hat\mu_t$ (or of the series of observed returns $y_t$) often appears leptokurtic, as figure 2.1 and table 2.2 show for the S&P500 sample. Here, the sample statistics indicate that returns are not normally distributed. Accordingly, $\widehat{Kur}(\hat\varepsilon_t) \ge \kappa(z_t) = 3$, where $\kappa(z_t) = 3$ is the kurtosis of a standardized i.i.d. normally distributed variable $z_t$ with $E_{t-1}[z_t] = 0$ and $E_{t-1}[z_t^2] = 1$, and

\[ \widehat{Kur}(\hat\varepsilon_t) = \frac{\frac{1}{T} \sum_{t=1}^{T} (\hat\varepsilon_t - \bar{\hat\varepsilon}_t)^4}{\left( \frac{1}{T} \sum_{t=1}^{T} (\hat\varepsilon_t - \bar{\hat\varepsilon}_t)^2 \right)^2} \tag{2.14} \]

stands for the sample kurtosis of the residual series. Another important assumption about $z_t$ is that $E_{t-1}[z_t^3] = 0$ (symmetric distribution), but for the sample skewness

\[ \widehat{Sk}(\hat\varepsilon_t) = \frac{\frac{1}{T} \sum_{t=1}^{T} (\hat\varepsilon_t - \bar{\hat\varepsilon}_t)^3}{\left( \sqrt{\frac{1}{T} \sum_{t=1}^{T} (\hat\varepsilon_t - \bar{\hat\varepsilon}_t)^2} \right)^3} \tag{2.15} \]

$\widehat{Sk}(\hat\varepsilon_t) < 0$ is often observed, which means that the negative values of the innovations outweigh the positive ones. The Jarque and Bera (1980) statistic

\[ JB = \frac{T - u_\phi}{6} \left( \widehat{Sk}^2(\hat\varepsilon_t) + \frac{(\widehat{Kur}(\hat\varepsilon_t) - 3)^2}{4} \right) \sim \chi^2(2) \tag{2.16} \]

tests whether the observed data come from a normal distribution through $\widehat{Kur}(\hat\varepsilon_t)$ and $\widehat{Sk}(\hat\varepsilon_t)$, where $u_\phi$ is the number of parameters of the mean equation. The JB statistic is asymptotically $\chi^2(2)$ distributed, i.e., with 2 degrees of freedom. The null hypothesis $H_0: \hat\varepsilon_t \sim N(0, \sigma^2)$ is rejected if $JB > \chi^2_{1-\alpha}(2)$, given a significance level $\alpha$. If the assumption about the distribution of $\varepsilon_t$ and the chosen model for the mean process are correct, then $\hat\varepsilon_t$ should be i.i.d. and, therefore, serially independent. For real-world data sets, it is a well-documented phenomenon that the residuals are only weakly serially correlated, whereas the squared and absolute innovations are highly serially correlated (Ding et al., 1993; Ding and Granger, 1996). Table 2.2 also highlights this phenomenon for the S&P500 sample. Here, the SACF is calculated by

\[ \hat\rho_h(\hat\varepsilon_t) = \frac{\sum_{t=1+h}^{T} \hat\varepsilon_t \hat\varepsilon_{t-h}}{\sum_{t=1}^{T} \hat\varepsilon_t^2} \tag{2.17} \]

\[ \hat\rho_h(\hat\varepsilon_t^2) = \frac{\sum_{t=1+h}^{T} (\hat\varepsilon_t^2 - \hat E[\hat\varepsilon_t^2]) (\hat\varepsilon_{t-h}^2 - \hat E[\hat\varepsilon_t^2])}{\sum_{t=1}^{T} (\hat\varepsilon_t^2 - \hat E[\hat\varepsilon_t^2])^2}, \tag{2.18} \]

where $\hat\rho_h$ refers to the sample autocorrelation coefficient of lag $h = 1, \dots, H$. The so-called portmanteau statistic tests whether the residuals are serially uncorrelated. With the autocorrelation function in (2.17) for the Ljung-Box statistic, or in (2.18) for the McLeod-Li statistic (both named after their authors), the statistics

\[ Q_H^{LB} = T (T + 2) \sum_{h=1}^{H} \frac{\hat\rho_h^2(\hat\varepsilon_t)}{T - h} \sim \chi^2(H - u_\phi) \tag{2.19} \]

\[ Q_H^{McL} = T (T + 2) \sum_{h=1}^{H} \frac{\hat\rho_h^2(\hat\varepsilon_t^2)}{T - h} \sim \chi^2(H) \tag{2.20} \]

are computed and the null hypothesis $H_0: \rho_1 = 0, \rho_2 = 0, \dots, \rho_H = 0$ is tested to determine whether the innovations or squared innovations are significantly autocorrelated, where $H$ corresponds to the maximum number of considered lags and

\[ \hat E[\hat\varepsilon_t^2] = \frac{\sum_{t=1}^{T} \hat\varepsilon_t^2}{T} \tag{2.21} \]

is the sample variance.² The null hypothesis is rejected if $Q^{LB} > \chi^2_{1-\alpha}(H - u_\phi)$ for the Ljung-Box test (Ljung and Box, 1978). McLeod and Li (1983) proved that the distribution of the portmanteau statistic for squared residuals is independent of the number of ARMA parameters and, therefore, the null is rejected if $Q^{McL} > \chi^2_{1-\alpha}(H)$ for the McLeod-Li test. Whether one uses $H$ or $H - u$ for the degrees of freedom of the $\chi^2$ distribution makes a difference only in small samples. Nevertheless, two restrictions should be respected in any case. First, $H > u$, which is relevant for high-dimensional problems such as the model presented in this dissertation. Second, $T > H$, which is relevant in small samples. For an $H$ close to $T$, it is recommended to use a weighted version of any portmanteau test, see Fisher and Gallagher (2012). The Q statistic and the $\chi^2$ value increase with a larger $H$. Therefore, the number of lags considered in the analysis should be based on the problem under consideration. There are relatively few recommendations in the literature on how to choose $H$ for the tests. Practitioners such as Hyndman and Athanasopoulos (2018, p. 62) recommended $H = 10$ for non-seasonal data, Burns (2002) recommended choosing $H < 0.05T$, whereas Fan and Yao (2017, p. 75) suggested testing different values of $H$. Engle (2001) investigated the use of GARCH models in practice and found that most practitioners choose $H = 15$. For purposes of this dissertation, where $T$ as well as $u$ are large, 50 lags are considered.

² In table 2.2, $\hat\sigma^2_{ML}$ is a maximum-likelihood estimator of the variance (see 4.2.4).

Engle (1982) recommended testing whether the variance $\sigma^2$ is homoskedastic before further analyses are conducted. His suggestion is originally a Lagrange multiplier (LM) test for the LS residuals

\[ \hat\varepsilon_t = y_t - \hat\phi_0 - \hat\phi_{y_1} y_{t-1} - \dots - \hat\phi_{y_U} y_{t-U} \tag{2.22} \]

of an autoregressive model. For the resulting residuals $\hat\varepsilon_t$, it is tested whether the parameters of an autoregressive model of the squared residuals

\[ \hat\varepsilon_t^2 = \alpha_0 + \alpha_1 \hat\varepsilon_{t-1}^2 + \dots + \alpha_P \hat\varepsilon_{t-P}^2 \tag{2.23} \]

are jointly zero, $H_0: \alpha_1 = 0, \dots, \alpha_P = 0$, which would imply independence of the innovations. Unlike in the original paper by Engle, for purposes of this dissertation the residuals in (2.22) are calculated with maximum-likelihood estimators, which allows greater flexibility in the choice of the mean process. For the ARCH-LM regression in (2.23), on the other hand, a least-squares estimation is conducted. The ARCH-LM statistic is calculated by

\[ \begin{aligned} \text{ARCH-LM} &= T \, \frac{f' d (d' d)^{-1} d' f}{f' f} \sim \chi^2(P) \\ &= T R^2 \quad \text{with} \quad R^2 = 1 - \frac{SSR}{SST}, \end{aligned} \tag{2.24} \]

where

\[ d_t = (1, \hat\varepsilon_{t-1}^2, \hat\varepsilon_{t-2}^2, \dots, \hat\varepsilon_{t-P}^2)', \qquad d = [d_1, \dots, d_T]', \qquad f = \left( \frac{\hat\varepsilon_{P+1}^2}{\hat\sigma^2} - 1, \dots, \frac{\hat\varepsilon_T^2}{\hat\sigma^2} - 1 \right)' \tag{2.25} \]

and the null hypothesis is rejected if $\text{ARCH-LM} > \chi^2_{1-\alpha}(P)$. $R^2$ is the squared multiple correlation between $f$ and $d$, where SSR stands for the sum of squared residuals and SST for the sum of total squares. Asymptotically, the ARCH-LM test corresponds to the McLeod-Li test (Li, 2004, p. 101). Often, not the residuals $\hat\varepsilon_t$ or the squared residuals $\hat\varepsilon_t^2$, but the standardized residuals

\[ \hat z_t = \frac{\hat\varepsilon_t}{\sqrt{\hat\sigma_t^2}} \tag{2.26} \]

are under examination. Here $\hat\sigma_t^2 = \hat\tau_t \hat h_t$, whereas for analyzing the residuals of a mean process with constant conditional variance, both components are assumed to be constant. Therefore, analyzing $\hat z_t$ or $\hat z_t^2$ instead of $\hat\varepsilon_t$ or $\hat\varepsilon_t^2$ does not make any difference if $\hat\sigma_t^2 = \hat\sigma^2$ is constant. Of course, there are many further diagnostic tests, but they are beyond the scope of this dissertation. An in-depth survey of the most crucial time series diagnostic tests is presented in Li (2004).

What the tests applied here show up to this point is that a homoskedastic model for the mean does not describe the data generating process well, as figures 2.1 and 2.2 already suggested. All diagnostic tests (see table 2.2) reject the null hypothesis for all considered significance levels. Therefore, the distributions of the returns and the residuals are non-Gaussian. Moreover, there is significant autocorrelation even for the returns and the residuals of the AR(1) process. This indicates a misspecification of the AR(1) model with i.i.d. innovations, as expected.

Definition             Sample period             Observations (non-trading days adjusted)   Frequency   Source                 Retrieved
S&P500 equity index
  full sample          01/02/1980 - 12/31/2020   10330                                      daily       Refinitiv-Datastream   04/15/2021
  estimation period    01/02/1980 - 01/11/2017   9340                                       daily       Refinitiv-Datastream   04/15/2021
  validation period    01/12/2017 - 12/31/2020   990                                        daily       Oxford-Man Institute   06/06/2021

Table 2.1: Sample

The S&P 500 sample

For the empirical application, a sample is drawn from the S&P 500 index. The index is chosen for reasons of practical importance. The market value of the corporations listed in the S&P500 index represents 80% of the market value of all corporations at the New York Stock Exchange. Therefore, the index is a yardstick for the state of the North American economy, cf. Woolridge and Ghosh (1986) or Lo (2016). Furthermore, the index is chosen for reasons of comparability, given its frequent use in scientific volatility papers, cf. Ding et al. (1993); He and Teräsvirta (1999); Mikosch and Starica (2004); Conrad and Kleen (2020), inter alia. The full sample covers the period from 01/02/1980 to 12/31/2020.
During this period, several events of great importance for the financial market occurred, in particular "Black Monday" in 1987, the Asian crisis, the bursting of the dot-com bubble, and the terrorist attacks in the early 2000s. Furthermore, there were the financial crisis starting in 2008 and, most recently, the corona pandemic, to name the most influential events. The sample is divided into an estimation period (01/02/1980-01/11/2017) and a validation period (01/12/2017-12/31/2020). The full sample includes T = 10330 data points, the estimation period T = 9340, and the last T = 990 data points are used to validate a forecast study in chapter 6. The data for the estimates come from Refinitiv Datastream. The data adjusted for non-trading days comprise T = 10340 returns for the full sample. The data for the validation come from the Oxford-Man Institute of Quantitative Finance (Heber et al., 2009). The two sources differ by ten returns. Therefore, the Refinitiv data set was shortened by these ten returns so that the total length of the validation period is T = 990 and that of the full period is T = 10330. This dissertation employs the data from the full sample to estimate the models presented in each section to illustrate the theory. Table 2.1 summarizes and figure 2.1 displays the sample. Chapter 6 discusses the results.

                      $y_t$              $\hat\varepsilon_t$   $\hat\varepsilon_t^2$
Mean                  0.0342             0.0000                1.2901
Standard deviation    1.1377             1.1359                6.8927
Skewness              -1.1503            -1.2369               50.1081
Kurtosis              28.8728            29.5423               3664.5980
Minimum               -22.8997           -23.2345              0.0000
Maximum               10.9572            10.8543               539.8415
SACF(2)               -0.0554            -0.0008               0.1574
SACF(3)               -0.0110            -0.0154               0.2523
SACF(4)               -0.0033            -0.0065               0.1295
SACF(5)               -0.0288            -0.0302               0.1165
JB                    290399.9923***     305860.0150***        5775048239.6843***
$Q^{LB}_{50}$         214.6745***        173.3284***           -
$Q^{McL}_{50}$        -                  -                     3578.2437***
ARCH-LM(5)            1161.5714***       1071.7992***          -

$\hat\varepsilon_t = y_t - 0.0362^{*} + 0.0564\, y_{t-1}$   (0.0269)
$\hat\sigma^2_{ML} = 1.2902^{***}$   (0.0490)   (0.0676)

Table 2.2: Descriptive statistics for the full sample and AR(1) model under the assumption $\varepsilon_t \sim N(0, \sigma)$. Robust standard errors (4.107), with analytic gradient and numerical Hessian, in parentheses. * p-value < 0.10, ** p-value < 0.05, *** p-value < 0.01
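The diagnostic statistics of this section, (2.14) to (2.20) and (2.24), can be sketched in a few lines of plain Python. The sketch is illustrative and not the implementation used in the dissertation. It assumes residuals with mean zero, computes the ARCH-LM statistic as the number of usable observations times the $R^2$ of the auxiliary regression (2.23), and uses simulated white noise and a simulated ARCH(1) series (with hypothetical parameters 0.5 and 0.5) in place of the S&P500 data. All function names are assumptions made for this example.

```python
import random

def skew_kurt(x):
    """Sample skewness (2.15) and kurtosis (2.14) with 1/T normalisation."""
    n = len(x)
    mean = sum(x) / n
    m2, m3, m4 = (sum((v - mean) ** k for v in x) / n for k in (2, 3, 4))
    return m3 / m2 ** 1.5, m4 / m2 ** 2

def jarque_bera(resid, n_mean_params=0):
    """JB statistic (2.16); compare with a chi-square(2) quantile."""
    skew, kurt = skew_kurt(resid)
    return (len(resid) - n_mean_params) / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

def sacf(x, h, center=0.0):
    """Sample autocorrelation at lag h around `center`, as in (2.17)/(2.18)."""
    num = sum((x[t] - center) * (x[t - h] - center) for t in range(h, len(x)))
    return num / sum((v - center) ** 2 for v in x)

def portmanteau(x, max_lag, center=0.0):
    """T (T + 2) * sum_h rho_h^2 / (T - h), the common form of (2.19) and (2.20)."""
    n = len(x)
    return n * (n + 2) * sum(sacf(x, h, center) ** 2 / (n - h) for h in range(1, max_lag + 1))

def ljung_box(resid, max_lag):
    return portmanteau(resid, max_lag)

def mcleod_li(resid, max_lag):
    sq = [e * e for e in resid]
    return portmanteau(sq, max_lag, center=sum(sq) / len(sq))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def arch_lm(resid, p):
    """(T - p) * R^2 of the least-squares regression (2.23) of e_t^2 on p lags."""
    e2 = [e * e for e in resid]
    rows = [[1.0] + [e2[t - j] for j in range(1, p + 1)] for t in range(p, len(e2))]
    y = e2[p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return len(y) * (1.0 - ssr / sst)

rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(3000)]   # homoskedastic Gaussian innovations
arch, prev = [], 0.0
for _ in range(3000):                                # ARCH(1): h_t = 0.5 + 0.5 * e_{t-1}^2
    h = 0.5 + 0.5 * prev ** 2
    prev = h ** 0.5 * rng.gauss(0.0, 1.0)
    arch.append(prev)

for name, series in (("white noise", noise), ("ARCH(1)", arch)):
    print(name, jarque_bera(series), ljung_box(series, 10),
          mcleod_li(series, 10), arch_lm(series, 5))
```

For the white-noise series, all four statistics should stay in the range of their chi-square reference distributions, whereas the ARCH(1) series should produce large JB, McLeod-Li, and ARCH-LM values, mirroring the pattern reported in table 2.2.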

2.4 Univariate GARCH models

Until the introduction of the ARCH model by Engle (1982), time series analysis was mainly concerned with the modeling of the mean function $\mu_t$, whereas the variance was assumed to be unconditionally and conditionally constant over time. These models could not explain some well-known stylized facts, like the volatility clustering mentioned above, the leptokurtosis of the marginal distribution, and the lack of independence of returns. The idea of Engle (1982) was that the variance derives from an autoregressive function of lagged squared innovations

\[ \varepsilon_t^2 = \alpha_0 + \sum_{p=1}^{P} \alpha_p \varepsilon_{t-p}^2 + a_t, \tag{2.27} \]

where $a_t \sim IID(0, 1)$ and

\[ E_{t-1}[\varepsilon_t^2] = h_t = \alpha_0 + \sum_{p=1}^{P} \alpha_p \varepsilon_{t-p}^2 \tag{2.28} \]

is the conditional mean of the squared innovation process. As $\varepsilon_t = y_t - \mu_t$, $h_t(\varepsilon_{t-1}^2, \varepsilon_{t-2}^2, \dots)$ is the conditional variance of the mean process. To be even more specific, $h_t$ refers to the short-term variance and $\tau_t = 1$ up to this point. If the conditional variance is estimated with (2.28), empirical analyses showed that a large number of lags must be included in the model. Hence, a large number of parameters have to be estimated. To achieve a parsimonious representation of $h_t$, Bollerslev (1986) proposed the GARCH model

\[ h_t^s = \alpha_0 + \sum_{p=1}^{P} \alpha_p \varepsilon_{t-p}^2 + \sum_{q=1}^{Q} \beta_q h_{t-q}, \tag{2.29} \]

where (2.29) can also be considered as an ARCH model with infinite model order. Here, the superscript in $h_t^s$ denotes the symmetric GARCH model. Hansen and Lunde (2001, 2005) found that GARCH models of order P = 1 and Q = 1 are usually superior to other model orders. Therefore, and to focus on the issues discussed in chapter 4, the GARCH model

\[ \begin{aligned} h_t^s &= \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1} \\ h_t^s &= \alpha_0 + \left( \alpha_1 z_{t-1}^2 + \beta_1 \right) h_{t-1} \end{aligned} \tag{2.30} \]

is only considered with P = 1 and Q = 1 in the following. The representation in the second row refers to the process-generating variable $z_t$ and will be useful later to derive the unconditional variance, kurtosis, and autocorrelation function. Given the assumption made in section 2.3 of a standardized i.i.d. random variable $z_t$ with $E[z_t] = 0$ and $E[z_t^2] = 1$, (2.30) is a strong GARCH process following the definitions by Drost and Nijman (1993). In this dissertation, all conditional variance processes are assumed to be strong GARCH processes. To ensure that $h_t$ is positive for all $t$, Bollerslev (1986) recommended constraining the parameters to $\alpha_0 > 0$, $\alpha_1 \ge 0$ and $\beta_1 \ge 0$. In addition to the GARCH(1,1) model in (2.30), this dissertation considers the asymmetric GJR-GARCH(1,1) model. Here, $E_{t-1}[\varepsilon_t^2]$ is represented by $h_t^a = \alpha_0 + (\alpha_1 + \gamma_1 1_{t-1}$ [...] $0$, the exponential spline function is chosen as

\[ \tau_t = \exp\left( w_0 + w_1 \frac{t}{T} + w_2 \left( \frac{t}{T} \right)^2 + \sum_{i=3}^{K+1} w_i \left( \frac{(t - t_{i-2})_+}{T} \right)^2 \right), \tag{3.30} \]

where $\tau_t = \exp(s_2(t))$. To mitigate numerical optimization problems with truncated power spline functions, the basis functions in (3.30) are scaled by $T$ and are, therefore, in line with figure 3.1, following the suggestion by Laurent (2013). As in the empirical analysis in the original paper, the embedded exogenous variable $x_t$ is not taken into account here. Thus, the mollification of $\eta_1$ is achieved merely due to variations in $t$. It should be noted that in the original paper, $h_t$ is only represented by a symmetric GARCH(1,1) model. Old (2020) and the simulation study in this dissertation also take the GJR-GARCH model as defined in equation (3.8) into account. The empirical analysis of the S&P500 sample is not conducted with truncated power bases as in (3.30), but with the equivalent B-spline function (3.31), as each truncated power basis can be represented as a B-spline basis. The resulting values of the spline function are, therefore, identical, see Ruppert et al. (2009, pp. 69-71) or de Boor (2001, chapter IX). In this framework, it is easier to compare GARCH models with B-spline bases to the S-GARCH model. This especially holds for the FKS-GARCH model, see chapter 4. Table 3.1 and figure 3.3 illustrate some of the properties of the S-GARCH model. Compared to the GARCH model with constant unconditional variance, the volatility persistence $\hat\eta_1$ decreases considerably. This decrease is evident with an approximately equal ARCH effect (represented by $\hat\alpha_1$) but a smaller GARCH effect (represented by $\hat\beta_1$) across all four models. This coincides with findings from the original paper. Unlike the original paper, this dissertation considers the mean process (as AR(1)), the GJR-GARCH model (for short-term volatility), and all models also with Student's-t distributed standardized innovations. For the GJR-GARCH models, the $\hat\alpha_1$ values decrease, along with a strong asymmetry effect (represented by $\hat\gamma_1$).
Old (2020) demonstrated that for $h_t^a$, the $\hat\alpha_1$ values decrease with an increasing number of knots. In exceptional cases, they may become tiny and statistically insignificant. Suppose a high $\hat\beta_1$ value indicates different unconditional variances in different segments (which is evident for long-term estimation of standard GARCH models along with the occurrence of structural breaks). In that case, this reveals that the S-GARCH model captures part of the spurious long-memory effect, see section 2.5. A smaller $\hat\beta_1$ value also has consequences for the model-implied kurtosis (2.43). This is indicated by a smaller $\hat\eta_2$ and, therefore, by a smaller kurtosis. However, this also means that the kurtosis exists for Student's-t distributed processes, which is often not the case when applying standard GARCH models. The AR(1) part changes only slightly.

² Engle and Rangel (2008) defined volatility as conditional standard deviation. Therefore, this term is called low-frequency volatility in their paper.

This section concludes with some remarks on the S-GARCH model in its original version. In the present example, the knots/observation ratio seems moderate, which also makes a calculation with (3.30) seem unproblematic. Yet when estimating within a range of $K \in \{1, \dots, 15\}$, as recommended by the authors, and for shorter time series, some of the problems mentioned in section 3.2.1 could occur and, thus, (3.13) could be "ill-conditioned" (Dierckx, 1993, p. 5). This, in turn, can lead to identification problems and problems with the calculation, as described above. Engle and Rangel (2008) recommended using the BIC to select the optimal number of knots. The simulation study in chapter 5 found that for non-equidistantly distributed knots, the BIC rarely finds the correct order. Furthermore, the BIC tends to select too low an order, especially for shorter time series (see 5.3). Old (2020) confirmed some of these theoretical considerations by means of a comprehensive simulation study of the S-GARCH model (section 5.1 discusses the results in more detail). The same working paper presents an LS approach for obtaining good starting values for the spline parameters. Starting values for the truncated power functions are more complex than for B-spline functions. Furthermore, a quadratic spline basis does not seem tailor-made for complex nonlinear problems, as only $k = 2$ derivatives are possible in the given framework. On the other hand, applying a degree less than two for low volatility periods should also be possible. Therefore, it is recommended to include the degree of the basis functions in the model selection procedure. Rangel and Engle (2012) proposed a similar but multivariate factor approach, which will not be discussed further here.
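The exponential quadratic truncated power spline in (3.30) can be evaluated directly, as the following sketch shows. It is an illustration only: the function name `tau_truncated_power`, the knot positions, and the weights are hypothetical, and the weight vector is assumed to be ordered as $(w_0, w_1, w_2, w_3, \dots, w_{K+1})$ with inner knots $(t_1, \dots, t_{K-1})$.

```python
import math

def tau_truncated_power(t, T, knots, w):
    """Evaluate tau_t = exp(s_2(t)) from (3.30).

    w     : (w_0, w_1, w_2, w_3, ..., w_{K+1}), i.e. K + 2 weights
    knots : inner knots (t_1, ..., t_{K-1}), one per weight w_3, ..., w_{K+1}
    All basis functions are scaled by T, so their values stay within [0, 1].
    """
    x = t / T
    s = w[0] + w[1] * x + w[2] * x ** 2
    for w_i, knot in zip(w[3:], knots):
        s += w_i * (max(t - knot, 0.0) / T) ** 2   # truncated power term (t - t_i)_+
    return math.exp(s)

# Hypothetical example with K = 4, i.e. three equidistant inner knots.
T = 1000
knots = [250, 500, 750]
weights = [0.1, 1.0, -1.0, 2.0, -3.0, 4.0]
tau = [tau_truncated_power(t, T, knots, weights) for t in range(1, T + 1)]
print(min(tau), max(tau))
```

The exponential guarantees $\tau_t > 0$ for any weight vector, so no positivity constraints on the spline weights are needed during optimization.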

3 Smoothing long term volatility

[Figure 3.3: twelve time-series panels in four rows and three columns; horizontal axes 1985-2020.]

Figure 3.3: S&P500, 1980-2020. Left column: AR(1)-GARCH models (from top to bottom: GARCH(1,1), GARCHt(1,1), GJR-GARCH(1,1), GJR-GARCHt(1,1)). $\hat h_t$ and $\hat\tau_t$ are only displayed in the range [0, 4]. The estimated parameter values are displayed in table 2.3. AR(1)-BS(K)-(GJR)-GARCH(1,1) with l = 2 and coincident boundary knots (3.19). This model is equivalent to the S-GARCH model by Engle and Rangel (2008). Middle column: $\hat\sigma_t^2$ and $\hat\tau_t$ are only displayed in the range [0, 4]. Right column: $\hat\sigma_t^2$ and $\hat\tau_t$ are displayed in full range (scale of right axis). First row BS(12)-GARCH(1,1) model with $z_t \sim N(0, 1)$, second row BS(7)-GARCHt(1,1) model with $z_t \sim St(0, 1, v)$, third row BS(12)-GJR-GARCH(1,1) with $z_t \sim N(0, 1)$, and fourth row BS(7)-GJR-GARCHt(1,1) model with $z_t \sim St(0, 1, v)$. K was selected with the HQ criterion from a range $K \in \{1, \dots, 15\}$. The mean process is modeled by an AR(1) process with intercept in a uniform estimation procedure. The estimated parameter values are displayed in table 3.1.

l = 2                           AR(1)-BS(12)-GARCH(1,1)   AR(1)-BS(7)-GARCHt(1,1)   AR(1)-BS(12)-GJR-GARCH(1,1)   AR(1)-BS(7)-GJR-GARCHt(1,1)
$\hat\phi_0$                    0.0631*** (0.0124)        0.0689*** (0.0026)        0.0352 (0.0466)               0.0500*** (0.0006)
$\hat\phi_{y_1}$                -0.0023 (0.0097)          -0.0125*** (0.0022)       -0.0013 (0.0620)              -0.0085*** (0.0008)
$\hat\alpha_1$                  0.1023*** (0.0053)        0.0833*** (0.0008)        0.0087 (0.0214)               0.0063*** (0.0002)
$\hat\beta_1$                   0.8499*** (0.0023)        0.8902*** (0.0009)        0.8567*** (0.0176)            0.8799*** (0.0007)
$\hat\gamma_1$                  -                         -                         0.1635*** (0.0396)            0.1532*** (0.0007)
$\hat v$                        -                         5.9687** (2.1304)         -                             6.4479*** (0.0464)
$\hat\eta_1$                    0.9522*** (0.0057)        0.9735*** (0.0012)        0.9471*** (0.0235)            0.9628*** (0.0008)
$\hat\eta_2$                    0.9275                    0.9827                    0.9134                        0.9576
$\sigma^2$                      1                         1                         1                             1
$\hat\kappa(\varepsilon_t)$     3.8658                    18.3002                   3.5663                        9.3874
$\widehat{Kur}(\hat z_t)$       6.3157                    6.9196                    5.9899                        5.9727
$\widehat{Skew}(\hat z_t)$      -0.5437                   -0.5807                   -0.5042                       -0.4963
$\widehat{Var}(\hat z_t)$       0.9751                    0.8710                    0.9281                        0.8235
$Q^{McL}_{50}(\hat z_t^2)$      44.1696                   47.0805                   52.3115                       45.4305
AIC                             27496                     26900                     27200                         26693
BIC                             27626                     27001                     27338                         26801
HQ                              27540                     26934                     27247                         26729
GCV                             42.1652                   42.6228                   42.3027                       42.8162
QLIKE                           0.8206                    0.8301                    0.7918                        0.8013
MSE                             42.0917                   42.5651                   42.2249                       42.7541

Table 3.1: S&P500 (see tables 2.1, 2.2). BS(K)-(GJR)-GARCH model with l = 2, $z_t \sim N(0, 1)$ and $z_t \sim St(0, 1)$. Robust standard errors (4.107), with analytic gradient and numerical Hessian, in parentheses. Number of knots evaluated in a range $K \in \{0, \dots, 15\}$ and optimal $\hat K$ selected by HQ criterion. Model with K = 0 corresponds to a unit-GARCH model with constant unconditional variance. B-spline functions are equivalent to truncated power functions and, therefore, the results in this table are the same as for the related spline-GARCH model of Engle and Rangel (2008). * p-value < 0.10, ** p-value < 0.05, *** p-value < 0.01
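As a plausibility check on the model-selection rows of table 3.1, the gaps between the criteria can be reproduced from the parameter count alone. The sketch below is an assumption-laden illustration: it presumes the textbook definitions AIC = -2 ln L + 2u, BIC = -2 ln L + u ln T, and HQ = -2 ln L + 2u ln ln T (the definitions actually used are given in section 4.3), a sample size of T = 10330, and u = 2 + 2 + (K + l) parameters for the AR(1)-BS(12)-GARCH(1,1) model, that is, two mean parameters, $\alpha_1$ and $\beta_1$, and K + l spline weights. The function name `ic_gaps` is hypothetical.

```python
import math

def ic_gaps(u, T):
    """Return (BIC - AIC, HQ - AIC); the term -2 ln L cancels in both differences."""
    bic_minus_aic = u * (math.log(T) - 2.0)
    hq_minus_aic = u * (2.0 * math.log(math.log(T)) - 2.0)
    return bic_minus_aic, hq_minus_aic

T = 10330
K, l = 12, 2
u = 2 + 2 + (K + l)          # mean equation, (alpha_1, beta_1), spline weights
bic_gap, hq_gap = ic_gaps(u, T)

# Table 3.1, first column: BIC - AIC = 27626 - 27496 and HQ - AIC = 27540 - 27496.
print(round(bic_gap), 27626 - 27496)
print(round(hq_gap), 27540 - 27496)
```

Under these assumptions, the computed gaps agree with the rounded table entries, which suggests that the penalty terms grow with K + l exactly as the length of the weight vector w implies.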

3.3.3 B-spline-GARCH model

The long-term volatility in the form of

\[ \tau_t = \exp\left( \sum_{i=-l}^{K-1} w_i B_{i,\mathbf{t}}^{l}(t) \right) \tag{3.31} \]

is described in the following as the BS-GARCH model (which in this general expression also includes the case of the BS-GJR-GARCH model). As mentioned above, each truncated spline function can be represented by the related B-spline function. Therefore, to overcome the problems with the truncated power basis, it should be mentioned that the S-GARCH model by Engle and Rangel (2008) can also be represented as a B-spline function as in (3.31) with l = 2. Figure 3.4 and table 3.3 display $\hat\tau_t$, estimated by (3.31) with l = 3. $\hat K$ is selected with the HQ criterion, and the spline function is defined with equidistantly distributed knots for illustrative purposes.
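A B-spline version of the long-term component in (3.31) can be sketched with the Cox-de Boor recursion. The code is a minimal illustration and not the estimation code behind figure 3.4: the function names are hypothetical, the knots are assumed to be equidistant on [0, T] with l + 1 coincident knots at each boundary, and the Python index i = 0, ..., K + l - 1 corresponds to i = -l, ..., K - 1 in (3.31).

```python
import math

def bspline_basis(i, l, knots, x):
    """Cox-de Boor recursion: value of the i-th B-spline of degree l at x."""
    if l == 0:
        if knots[i] <= x < knots[i + 1]:
            return 1.0
        # Close the last non-empty interval so that x equal to the right boundary is covered.
        if x == knots[-1] and knots[i] < knots[i + 1] == x:
            return 1.0
        return 0.0
    left = right = 0.0
    if knots[i + l] > knots[i]:
        left = ((x - knots[i]) / (knots[i + l] - knots[i])
                * bspline_basis(i, l - 1, knots, x))
    if knots[i + l + 1] > knots[i + 1]:
        right = ((knots[i + l + 1] - x) / (knots[i + l + 1] - knots[i + 1])
                 * bspline_basis(i + 1, l - 1, knots, x))
    return left + right

def make_knots(T, K, l):
    """Equidistant inner knots on [0, T] with l + 1 coincident knots at each boundary."""
    inner = [T * j / K for j in range(1, K)]
    return [0.0] * (l + 1) + inner + [float(T)] * (l + 1)

def tau_bspline(t, knots, l, w):
    """tau_t = exp(sum_i w_i B_i(t)) as in (3.31); len(w) must equal K + l."""
    return math.exp(sum(w_i * bspline_basis(i, l, knots, t) for i, w_i in enumerate(w)))

# Hypothetical example: cubic spline (l = 3) with K = 4, hence K + l = 7 weights.
T, K, l = 1000, 4, 3
knots = make_knots(T, K, l)
weights = [0.0, 0.4, -0.3, 0.8, 0.1, -0.5, 0.2]
print([round(tau_bspline(t, knots, l, weights), 4) for t in (1, 250, 500, 750, 1000)])
```

Because the basis functions are non-negative and sum to one at every point of [0, T], each value of ln τ_t is a local weighted average of a few neighboring weights, which is one reason why starting values are easier to obtain than for the truncated power basis.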

For the BS-GARCH model, the optimization problem with given knot sequence $\mathbf{t}$ and degree $l$ becomes

\[ \hat\theta = \underset{\theta \in \mathbb{R}^{U+V+1} \times \mathbb{R}^{2P+Q+(1)} \times \mathbb{R}^{K+l}}{\arg\max} \; L_T(\theta), \tag{3.32} \]

where $\theta = (\phi, \alpha, w)$, with the parameters of the mean equation $\phi = (\phi_0, \phi_{y_1}, \dots)$, the short-term volatility equation $\alpha = (\alpha_1, \dots, \alpha_P, \gamma_1, \dots, \gamma_P, \beta_1, \dots, \beta_Q, (v))$, and the parameters of the spline function $w = (w_{-l}, \dots, w_{K-1})$. As reported in table 3.2, the computation of S-GARCH models with a truncated power basis is faster than with a B-spline basis, although the latter requires fewer iterations.

                                      spline-GJR-GARCH   BS-GJR-GARCH
iterations $\hat\theta$               415                230
function evaluations $\hat\theta$     470                244
elapsed time $\hat\theta$             3.982 sec          5.378 sec

Table 3.2: spline-GJR-GARCH(1,1) and BS-GJR-GARCH(1,1) with T = 10330, K = 15 and l = 2 (u = 20). Optimizing (4.45), (4.43) with MATLAB 'fminunc' (BFGS update, Wolfe line-search). All computations are performed on an 'Intel Core i9-9900K 3.60 GHz' PC.

l = 3                           AR(1)-BS(14)-GARCH(1,1)   AR(1)-BS(8)-GARCHt(1,1)   AR(1)-BS(14)-GJR-GARCH(1,1)   AR(1)-BS(8)-GJR-GARCHt(1,1)
$\hat\phi_0$                    0.0629*** (0.0048)        0.0693*** (0.0067)        0.0348*** (0.0117)            0.0504** (0.0011)
$\hat\phi_{y_1}$                -0.0024 (0.0057)          -0.0127*** (0.0055)       0.0016 (0.0031)               -0.0085** (0.0012)
$\hat\alpha_1$                  0.1014*** (0.0036)        0.0834*** (0.0018)        0.0075 (0.0061)               0.0042** (0.0004)
$\hat\beta_1$                   0.8500*** (0.0006)        0.8882*** (0.0005)        0.8560*** (0.0432)            0.8777*** (0.0006)
$\hat\gamma_1$                  -                         -                         0.1647*** (0.0800)            0.1559*** (0.0011)
$\hat v$                        -                         5.9897*** (0.0654)        -                             6.4860*** (0.0800)
$\hat\eta_1$                    0.9514*** (0.0035)        0.9716*** (0.0019)        0.9459*** (0.0056)            0.9599*** (0.0008)
$\hat\eta_2$                    0.9257                    0.9789                    0.9109                        0.9513
$\sigma^2$                      1                         1                         1                             1
$\hat\kappa(\varepsilon_t)$     3.8304                    15.9463                   3.5439                        8.7274
$\widehat{Kur}(\hat z_t)$       6.2953                    6.8296                    5.9364                        5.8786
$\widehat{Skew}(\hat z_t)$      -0.5378                   -0.5596                   -0.4977                       -0.4770
$\widehat{Var}(\hat z_t)$       0.9652                    0.8728                    0.9152                        0.8265
$Q^{McL}_{50}(\hat z_t^2)$      44.5616                   48.7291                   52.4872                       44.6428
AIC                             27500                     26898                     27201                         26690
BIC                             27652                     27013                     27361                         26813
HQ                              27551                     26937                     27255                         26731
GCV                             42.1655                   42.6193                   42.2774                       42.7776
QLIKE                           0.8204                    0.8281                    0.7913                        0.7992
MSE                             42.0789                   42.5533                   42.1874                       42.7072

Table 3.3: S&P500 (see tables 2.1, 2.2). BS(K)-(GJR)-GARCH model with l = 3, $z_t \sim N(0, 1)$ and $z_t \sim St(0, 1)$. Robust standard errors (4.107), with analytic gradient and numerical Hessian, in parentheses. Number of knots evaluated in a range $K \in \{0, \dots, 15\}$ and optimal $\hat K$ selected by HQ criterion. Model with K = 0 corresponds to a unit-GARCH model with constant unconditional variance. * p-value < 0.10, ** p-value < 0.05, *** p-value < 0.01

[Figure 3.4: eight time-series panels in four rows and two columns; horizontal axes 1985-2020.]

Figure 3.4: S&P500, 1980-2020. AR(1)-BS(K)-(GJR)-GARCH(1,1) with l = 3 and coincident boundary knots (3.19). Left-hand side: $\hat\sigma_t^2$ and $\hat\tau_t$ are only displayed in the range [0, 4]. Right-hand side: $\hat\sigma_t^2$ and $\hat\tau_t$ are displayed in full range. First row BS(14)-GARCH(1,1) model with $z_t \sim N(0, 1)$, second row BS(8)-GARCHt(1,1) model with $z_t \sim St(0, 1, v)$, third row BS(14)-GJR-GARCH(1,1) with $z_t \sim N(0, 1)$, and fourth row BS(8)-GJR-GARCHt(1,1) model with $z_t \sim St(0, 1, v)$. K was selected with the HQ criterion from a range $K \in \{1, \dots, 15\}$. The mean process is modeled by an AR(1) process with intercept in a uniform estimation procedure. The estimated parameter values are displayed in table 3.3.

[Figure 3.5: four time-series panels in two rows and two columns; horizontal axes 1985-2020.]

Figure 3.5: S&P500, 1980-2020. $\ln \tau_t$ and derivatives based on BS(8)-GARCH(1,1) model with $z_t \sim N(0, 1)$, l = 3, K = 8, T = 10339 and coincident boundary knots (3.19). The representation in logarithmic form serves the better illustration and is in line with B-spline theory presented in section 3.2.2. The panels (top left, bottom left, top right, bottom right) show $\ln \tau_t$ and its derivatives, where the apostrophe marks the related derivative, see (3.23), (3.24).

3.3.4 P-spline GARCH model

Brownlees and Gallo (2010) were the first to follow the P-spline approach by Eilers and Marx (1996) with B-spline basis functions. They applied a Multiplicative Error Model (MEM) (Engle, 2002b) with a realized-volatility measurement as a proxy variable for volatility for their P-spline MEM model. Feng and Härdle (2020) applied a PS-GARCH model with a truncated power basis to make the comparison to the original S-GARCH model more intuitive. As originally introduced by Eilers and Marx (1996), P-spline functions are based on B-spline basis functions (3.18) and, in the case of the P-spline-MEM model, on the exponential B-spline function (3.31) to smooth realized-volatility terms. In the case of P-spline functions, a penalty is added to the likelihood function. The resulting penalized maximum likelihood (PLM) estimator is

\[ \hat\theta = \underset{\theta}{\arg\max} \left\{ L_T(\theta) - \iota \, w' D_k' D_k w \right\}, \tag{3.33} \]

where $\theta$ contains all parameters from the MEM function and the B-spline function, $w$ is the parameter vector of the B-spline parameters, and $\iota$ is a parameter controlling the smoothness of the fit. For $\iota = 0$, the penalized likelihood function reduces to a common maximum-likelihood function. The term $D_k$ refers to a certain difference matrix of the spline function, see (3.24), (3.25). The term $\iota \, w' D_k' D_k w$ is called "roughness penalty" (Ruppert et al., 2009, p. 66). For $k = 0$, (3.33) equals the ridge regression, see Eilers and Marx (1996).

Since the smoothing of the P-spline function mainly takes place via $\iota$, the recommended choice of the number of knots is very vague. Eilers and Marx (1996) suggested "to use a relatively large number of knots". Ruppert (2002) stated that "because smoothing is controlled by the penalty parameter, ι, the number of knots, K, is not a crucial parameter". He proposed two algorithms to find the right K. The first is the so-called "myopic" algorithm, where $\iota(K)$ is optimized for a given grid of $K \in \{5, 10, \dots, 120\}$ with the GCV criterion. As soon as an $\iota(K)$ is smaller than that of the previous K, the algorithm stops. The second is the so-called "full-search" algorithm. Here, $\iota(K)$ is calculated over a complete grid and the value with the minimum GCV is selected. As default value, he proposed to choose $K = \min(35, T/4)$. Without specific justification, Brownlees and Gallo (2010) applied 20 knots and selected the smoothing parameter $\hat\iota$ over different estimates with a corrected AIC. Feng and Härdle (2020) conducted a simulation study with $K = 10, 20, \dots, 70$ knots and obtained $\hat\iota(\hat K)$ by a so-called "iterative plug-in" algorithm.

The P-spline approach is rarely used with non-uniform knot distributions. First, the difference matrix (3.25) becomes more difficult, and second, there is no need for a precise knot placement if the ratio of knots/observation is high. These advantages seem appealing but somewhat arbitrary regarding the structure of the data. Feng and Härdle (2020) proved that both $\tau_t$ and $\iota$ can be estimated consistently with a P-spline approach. Nevertheless, it should also be considered that the estimated GARCH parameters can be strongly biased for small samples or a high knot/observation ratio, as the simulation studies in chapter 5 and in Old (2020) suggest. Furthermore, ridge-type regressions are known for biased parameter estimates, see Rao et al. (2008, pp. 79-82). Appendix B reports the analytical gradient of the P-spline GARCH function.
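The roughness penalty in (3.33) can be sketched as follows. The example is a minimal illustration under two assumptions that are not taken from the text: the knots are equidistant, so that $D_k$ is the plain k-th order difference operator on the weight vector, and the log-likelihood value passed to `penalized_loglik` is supplied by the caller. All function names are hypothetical.

```python
def difference(w, k):
    """Apply the k-th order difference operator D_k to the weight vector w."""
    d = list(w)
    for _ in range(k):
        d = [d[j + 1] - d[j] for j in range(len(d) - 1)]
    return d

def roughness_penalty(w, k, iota):
    """iota * w' D_k' D_k w, i.e. iota times the sum of squared k-th differences."""
    return iota * sum(v * v for v in difference(w, k))

def penalized_loglik(loglik, w, k, iota):
    """Objective of (3.33): log-likelihood minus the roughness penalty."""
    return loglik - roughness_penalty(w, k, iota)

# Weights that lie on a straight line are not penalised by second differences ...
print(roughness_penalty([1.0, 2.0, 3.0, 4.0], k=2, iota=5.0))
# ... whereas k = 0 penalises the squared weights themselves (ridge-type penalty).
print(roughness_penalty([1.0, -2.0, 3.0], k=0, iota=2.0))
```

With a large number of knots, increasing `iota` shrinks neighboring weights toward a polynomial of degree k - 1, which is how the penalty, rather than the knot placement, controls the smoothness of the fit.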


4 Free-knot spline-GARCH model

Given the assumption that the DGP is a spline function, finding an s^l(t) that approximates the data well and minimizes a loss function is a mathematically difficult problem. The difficulty arises from the choice of the optimal B^l_t in combination with K, l, and t. Solving these problems in a uniform framework is computationally complex. It often leads to unsatisfactory results in terms of convergence, accuracy, and computational burden. Therefore, most approaches try to break the whole problem down into subproblems. Here, the numerical advantages of B-spline bases are so decisive (see 3.2.2) that almost all models cited are based on them. The selection of l is partly a subjective and experience-based decision regarding the smoothness requirement. For instance, Engle and Rangel (2008) opted for l = 2 without theoretical foundation. Most authors (de Boor and Rice, 1968b; Jupp, 1978; Lindstrom, 1999; Gervini, 2006; Brownlees and Gallo, 2010; Luo et al., 2019, inter alia) opted for cubic splines (i.e., l = 3) because of their good smoothing properties. With cubic splines, it is possible to set continuity requirements for the first two derivatives if r_i = 1. The cubic spline function is still continuous for the occurrence of multiple knots up to r_i = 3. Agarwal and Studden (1980) and Stone (1982) derived a criterion for the optimal l and K, depending on the distribution of the data. Besides l, the location of the K − 1 active inner knots in t_I affects the shape of the B-spline function. Therefore, the selection of the location and the number of the knots is of utmost importance. In order to obtain unbiased estimates, the estimates should not vary widely from the true K_0 and the true t_{I0}. Different approaches to obtaining reliable estimates can be distinguished. These approaches can be divided into local gradient-based, adaptive, and knot-removal approaches.
The Jupp transformation approach presented in this dissertation belongs to the group of local gradient-based estimation. This method has improved local optimization and is certainly the most important representative within this group. Dierckx (1993, pp.61-62) proposed the estimation with an additive penalty term. Lindstrom (1999) took a similar approach as Jupp (1978), but multiplied a term to penalize knot locations with too small distances. Her approach mitigates the “lethargy“ problem (see 4.2.2) even more, but has the drawback that additional tuning parameters, which are not part of the estimation, must be selected. Gervini (2006) estimated the optimal distribution of the knot-sequence also with Jupp-transformed knots. He estimated a global μt by applying multiple splines to panel data. Adaptive models deal with optimizing the number and the knot locations simultaneously. Smith (1982) first introduced and Friedman and Silverman (1989); Friedman (1991), and Stone et al. (1997) mainly further developed this approach. Within this model class, spline bases with different K, Btl (t), and w are analyzed and selected with methods like GCV or other information criteria, starting either with the maximum or the minimum model order and gradually adding or omitting basis functions. In recent years, a novel two-stage framework has been established. First, spline functions defined on a (large) initial knot vector are estimated. In the second stage, candidate knots are chosen from the initial knot vector by the specific criterion, and in the last step, these candidate knots are locally adjusted (Kang et al., 2015; Luo et al., 2019). These models do not require additional model selection procedures.


© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2022 O. Old, Modeling Time-Varying Unconditional Variance by Means of a Free-Knot Spline-GARCH Model, Gabler Theses, https://doi.org/10.1007/978-3-658-38618-4_4

Since the methods used in this dissertation deal with local gradient-based estimates, they are discussed in detail here. In the field of regression splines, early explorations by Rice (1969), de Boor (1973), and Burchard (1974) proved that the approximation power of splines with variable knot locations is better than that of splines with equidistant knots. In their approaches, the knots were treated as free parameters and estimated together with all other parameters. Spline parameters were typically estimated by a (linear) ordinary LS (OLS) estimation with a given knot sequence. Treating the knots as free parameters requires K inequality constraints to ensure t_i − t_{i−1} ≥ 0, and the problem is, therefore, no longer linear. Since the parameter vector θ = (w, t_I) is a mixture of linear and nonlinear parameters, solving the objective function of NLS estimation must be split up into a reduced nonlinear problem and a linear problem (Golub and Pereyra, 1973, 2002; Kaufman, 1975). The findings by Rice (1969), de Boor (1973), Burchard (1974), and Jupp (1978) showed for NLS estimation that knot locations tend to coalesce, i.e., t_i − t_{i−1} = 0, and so r_i > 1 for some ξ_i. As this was not necessarily related to possible structural breaks at the respective locations ξ_i, the results appeared biased. Jupp (1975) highlighted for the class of so-called γ-polynomials, which includes spline functions, that this is due to the so-called “lethargy” theorem, which section 4.2.2 discusses. To mitigate this problem, Jupp (1978) suggested a log-transformation of t_I to an unconstrained sequence λ, where the boundary of the constraints is put to infinity. Thus, coincident knot locations are penalized, and the constrained optimization problem changes to an unconstrained optimization problem, which section 4.2.3 presents. Nevertheless, even with the Jupp transformation, the optimal knot locations are not easy to derive.
This applies all the more to very strongly fluctuating return series. The choice of a suitable objective function and optimization method is, therefore, crucial. Jupp (1978) presented estimates for the optimal knot locations using different local derivative-based Newton-Raphson methods like the Davidon, Fletcher, and Powell (DFP), the steepest descent, and the Gauss-Newton-Levenberg-Marquardt (GNLM) methods. The latter is often used, especially for NLS estimation. Section 4.1 investigates different optimization methods. In the regression spline literature mentioned above, parameters are commonly estimated by optimizing OLS or NLS functions. However, there are good reasons not to estimate the proposed FKS-GARCH model with OLS/NLS estimators; therefore, this dissertation estimates the parameters with the QML method. Section 4.2 discusses the objective functions employed in this dissertation. As mentioned above, the number of knots K also affects the smoothness of the spline function. Furthermore, K affects the accuracy of the estimators, as Old (2020) and chapter 5 of this dissertation verified. If K is too small, then the resulting spline function does not adequately reflect the data, and if K is too large, the spline function overfits. Therefore, a good selection criterion for the optimal K has to be chosen. Engle and Rangel (2008) applied the BIC, whereas many other authors in the field of volatility splines opted for knot selection by percentiles (see 3.3). In the field of linear regression splines, Dierckx (1993, p.68) suggested using the root-mean-squared-residuals, which is a (weighted) sum of squared residuals (SSR) divided by [T − K − l − 1], as selection criterion. Many authors (e.g., Mao and Zhao, 2003; Lindstrom, 1999; Gervini, 2006) recommended choosing the optimal K by the GCV criterion of Craven and Wahba (1978). However, there is evidence that the BIC determines the true model order asymptotically, see Lütkepohl (2007, chapter 4).
Section 4.3 discusses different model selection methods in more detail. For LS, and for QML estimation, a good starting vector is critical. This is even more true for the free knot estimation. Section 4.5 explores different approaches.
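The log-transformation of the knot vector mentioned above can be sketched in a few lines. The code assumes the form commonly attributed to Jupp (1978), λ_i = ln[(t_{i+1} − t_i)/(t_i − t_{i−1})] with fixed boundary knots a and b; section 4.2.3 gives the definition actually used in this dissertation, so the functions jupp_transform and jupp_inverse and the example knots are illustrative assumptions only.

```python
import numpy as np

def jupp_transform(knots, a, b):
    """Map ordered inner knots in (a, b) to an unconstrained vector lambda."""
    t = np.concatenate(([a], np.asarray(knots, dtype=float), [b]))
    d = np.diff(t)                      # knot spacings, all strictly positive
    return np.log(d[1:] / d[:-1])       # log-ratios of neighbouring spacings

def jupp_inverse(lam, a, b):
    """Map any real vector lambda back to strictly increasing knots in (a, b)."""
    # spacings relative to the first spacing: d_i / d_0 = exp(lam_1 + ... + lam_i)
    rel = np.exp(np.concatenate(([0.0], np.cumsum(lam))))
    d = (b - a) * rel / rel.sum()       # rescale so that the spacings sum to b - a
    return a + np.cumsum(d)[:-1]

knots = np.array([0.2, 0.5, 0.9])                     # hypothetical inner knots on [0, 1]
lam = jupp_transform(knots, 0.0, 1.0)
knots_back = jupp_inverse(lam, 0.0, 1.0)              # round trip
lam_equi = jupp_transform([0.25, 0.5, 0.75], 0.0, 1.0)
knots_any = jupp_inverse(np.array([5.0, -3.0, 0.1]), 0.0, 1.0)
```

Since every real vector λ maps back to a strictly increasing knot sequence, an unconstrained optimizer can move freely in λ, and coalescing knots correspond to |λ_i| → ∞.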


4.1 Optimization

The parameters of the models discussed in this dissertation are estimated by optimizing an objective function. These are the SSR or the likelihood function L_T. The SSR has to be minimized, and L_T has to be maximized by a certain optimization algorithm. Here and in the following, f(·) or g(·) represent functions in general, where the respective usage results from the context. To describe the basic procedure for optimizing the parameter vector θ, f(θ) is the general denotation of the particular objective functions. The representation of the parameter vector as θ refers to the parameters of the presumed model. θ_0 represents the true (generally unknown) parameter vector and θ̂ the (optimized) estimator. If u > 1, where u is the length of the vector, then g(θ) denotes the (u × 1) gradient vector and H(θ) the (u × u) Hessian matrix. p(θ) denotes an approximation polynomial and Δp(θ) and Δ²p(θ) the related gradient and Hessian. If there is a single argument to be optimized, then the first derivatives are denoted by f′(θ), p′(θ) and the second derivatives by f″(θ), p″(θ). As the objective function under consideration is multidimensional and nonlinear, iterative procedures like Newton-Raphson are necessary and frequently applied. The optimization algorithms described are minimization algorithms. Therefore, θ̂ = arg min −L_T(θ) = arg max L_T(θ) applies for the likelihood function. If the objective function is not linear or not convex at all feasible points, global minimization is not assured. With the Newton-Raphson methods presented in this section, however, a local minimum can be obtained in most cases. This dissertation utilizes only unconstrained minimization algorithms. Hence, only these algorithms are discussed in the following. The procedures presented in this section and the general presentation refer to the monograph by Dennis and Schnabel (1983).¹
Differences to the integrated optimization toolbox of MATLAB (Coleman and Zhang, 2020) are pointed out at the appropriate place. The following definitions D.7-D.9 are oriented on Magnus and Neudecker (2002, pp.116-129).

D.7 The task of an unconstrained minimization algorithm is to find
\[ \min_{\theta \in \mathbb{R}^u} f(\theta), \tag{4.1} \]
the (local) minimum of the objective function f. It is assumed that every vector function f(θ) under consideration is defined on a compact set θ ∈ Θ ⊂ R^u. This can also be considered as unconstrained optimization if the true parameter vector θ_0 lies in the interior of the compact space. If the true parameter vector lies on the boundary of the compact space, then this is a constrained optimization problem. Since this dissertation deals only with unconstrained algorithms, the following definition applies.

D.8 (Bolzano-Weierstraß theorem) Let f : Θ → R with Θ ⊂ R^u compact. If f is continuous in Θ, then there exists a minimum (maximum) for θ ∈ Θ.

This theorem does not imply the application of a constrained algorithm. It merely describes sufficient conditions for the existence of absolute extrema. Magnus and Neudecker (2002, pp.69-70, 118-119) provided the proof of this theorem.

¹ Every presented procedure and the finite differences for the Hessian are programmed in MATLAB following the pseudo-codes in Dennis and Schnabel (1983, Appendix A). Furthermore, the MATLAB optimization toolbox is applied for performance reasons.


D.9 The estimated parameter vector θ̂ is a local minimizer for the objective function f if f : R^u → R with u > 1 is twice differentiable in the open convex set D ⊂ R^u and if θ̂ ∈ D, g(θ̂) = 0 and, furthermore, H(θ̂) exists, is positive definite (p.d.) and non-singular. If f(θ̂) < f(θ) ∀θ ∈ R^u, then θ̂ is a global minimizer for the objective function f.

The open convex set D can also be considered as the neighbourhood to θ̂. The Newton-Raphson methods under consideration are structured in such a way that, starting with an initial vector θ^0, the objective function is approximated by a quadratic Taylor series
\[ p(\theta) = f(\theta^j) + g(\theta^j)'(\theta - \theta^j) + \frac{1}{2!}(\theta - \theta^j)' H(\theta^j)(\theta - \theta^j) \tag{4.2} \]

around the current parameter vector θ^j or initial guess (j = 0 for the start), and requires information about the function value, the slope, and the curvature in each iteration.² The iterations are defined by the superscript index j = 1, ..., maxIter, where maxIter is the maximum number of iterations, which the researcher specifies in advance. Newton-Raphson algorithms converge towards a local minimum, but not necessarily to a global minimum. Therefore, the choice of a good starting vector is necessary to detect a global minimum. If H is p.d. or positive-semi-definite (p.s.d.), then the quadratic function (4.2) is strictly convex or convex, respectively. This fact is necessary and sufficient to determine a descent step within a minimization algorithm which ensures that
\[ f(\theta^{j+1}) < f(\theta^j) \quad \forall j. \tag{4.3} \]

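Differentiating (4.2), setting Δp(θ) = 0, and setting θ^{j+1} to the critical point,
\[ \begin{aligned} \Delta p(\theta) &= g(\theta^j) + H(\theta^j)(\theta - \theta^j) \\ \theta^{j+1} &= \theta^j - H^{-1}(\theta^j)\, g(\theta^j) \\ s^j &= \theta^{j+1} - \theta^j = -H^{-1}(\theta^j)\, g(\theta^j) \end{aligned} \tag{4.4} \]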

determines the step towards the next iteration. s is simply the search direction of the algorithm. If g′s < 0, it is a descent direction. Thus, s in (4.4) is only a descent direction if H is p.d. The different Newton-Raphson methods differ mainly in the configuration of s. Even though s in (4.4) seems appealing in theory, it suffers from two issues in practice. On the one hand, as the functions under consideration are highly nonlinear, global convergence is not guaranteed. In every iteration j, H has to be p.d. to ensure a descent direction. On the other hand, calculating H in every iteration is computationally expensive, in particular with finite differences. With the Newton-Raphson method in (4.4), it is possible that in some iterations H is not p.d., which could terminate the algorithm prematurely. A remedy is the calculation of the so-called “model Hessian” (Dennis and Schnabel, 1983, pp.101-103) H̃(θ^j) = H(θ^j) + m^j I, where m^j > 0 if H(θ^j) is not p.d. and m^j = 0 otherwise. This is achieved by a Cholesky decomposition of H as conducted by algorithm A5.5.1 in Dennis and Schnabel (1983). The model Hessian can also be used as start-Hessian H(θ^0) for quasi-Newton procedures as further described below.

² If a linear function is the subject of investigation, it is sufficient to consider only the linear part of the Taylor series.

Line search algorithm

Calculating θ^{j+1} by (4.4) is called a full-Newton-step, where s is multiplied by a (here omitted) factor δ = 1. Especially with rough surfaces of the objective function, a full-Newton-step could be too large and could skip a possible point in the descent direction. If a full-Newton-step results in f(θ^{j+1}) ≥ f(θ^j) for some j, the step-length δ^j > 0 in
\[ \theta^{j+1} = \theta^j + \delta^j s^j \tag{4.5} \]

has to be adjusted. If δ is chosen by a fixed criterion such as step-halving,³ it is also possible to skip possible points in the descent direction. There are two different reasons for this. On the one hand, δ^j can be too small in relation to the decrease of f(θ^j) → f(θ^{j+1}). On the other hand, the decrease of the function value relative to δ^j can be too small. Therefore, it is recommended to use a line-search algorithm which finds the optimum step-length, where condition (4.3) seems too rigid. The two conditions are
\[ f(\theta^j + \delta^j s^j) \le f(\theta^j) + a\, \delta^j\, g(\theta^j)' s^j \tag{4.6} \]
\[ g(\theta^j + \delta^j s^j)' s^j \ge b\, g(\theta^j)' s^j, \tag{4.7} \]
where (4.6) addresses the problem of too small decreases of the function values and (4.7) addresses the problem of too small steps. It is required that δ > 0, a ∈ [0, 1/2] and b ∈ [a, 1]. If b > a, both conditions can be achieved simultaneously. Line-search algorithms are designed as sub-routines within each iteration, where δ^j is reduced until the conditions (4.6) and (4.7) are met. This reduction is called “backtracking” (Dennis and Schnabel, 1983, pp.126-129). The θ values that fulfill both conditions are in the “permissible region” (Dennis and Schnabel, 1983, p.120). This region is geometrically located between the intersection of condition (4.6) and the tangent of condition (4.7) with
\[ \hat{f}(\delta) = f(\theta^j + \delta s), \tag{4.8} \]

which a line-search algorithm minimizes. To keep the display of the backtracking sub-routine clear, the superscript j is left out here and the sub-iterations have a subscript index. The following line-search procedure refers to algorithms A6.3.1 and A6.3.5 in Dennis and Schnabel (1983), whereas the MATLAB line-search algorithm follows the representations in Fletcher (2013, chapter 2.6). Differences are pointed out. Dennis and Schnabel (1983) started the backtracking sub-routine with the full-Newton-step δ_0 = 1 and the MATLAB optimization toolbox with a sequence of {δ_− = 0, δ_0 = up}, where up is the upper limit of the permissible region. Here, δ_− corresponds to the previous value and δ_0 corresponds to the recent value. A full Newton-step, in contrast, ensures that the best possible convergence could be achieved. Minimizing (4.8) requires condition (4.6), although Dennis and Schnabel (1983) neglected condition (4.7) and gave a lower limit of low = 0.1. The MATLAB optimization toolbox meets both conditions. If f̂(1) in (4.8) does not meet the conditions, the algorithm searches the optimal δ̂ by backtracking
\[ \delta_j = r\, \delta_j, \quad r \in [\mathit{low}, \mathit{up}], \tag{4.9} \]

where 0 < low < up < 1. Given the information about f̂(1) = f(θ^j + s), f̂(0) = f(θ^j), and f̂′(0) = g(θ^j)′s, a quadratic polynomial
\[ p_q(\delta) = \left[\hat{f}(1) - \hat{f}(0) - \hat{f}'(0)\right]\delta^2 + \hat{f}'(0)\,\delta + \hat{f}(0) \tag{4.10} \]
is modeled to find the minimum value of
\[ \hat{\delta} = \frac{-\hat{f}'(0)}{2\left[\hat{f}(1) - \hat{f}(0) - \hat{f}'(0)\right]}, \tag{4.11} \]
for which p_q′(δ̂) = 0 and p_q″(δ̂) > 0 hold. δ̂ ≲ 1/2 for small a and, therefore, up ≈ 1/2 for the first sub-iteration.

³ Here the step length δ^j is halved within a sub-routine until f(θ^{j+1}) < f(θ^j).

If condition (4.6) is not met after the first sub-iteration, modeling a cubic polynomial
\[ p_{cu}(\delta) = c\,\delta^3 + d\,\delta^2 + \hat{f}'(0)\,\delta + \hat{f}(0), \tag{4.12} \]

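where
\[ \begin{bmatrix} c \\ d \end{bmatrix} = \frac{1}{\delta_{-} - \delta_{2-}} \begin{bmatrix} \dfrac{1}{(\delta_{-})^2} & \dfrac{-1}{(\delta_{2-})^2} \\[2ex] \dfrac{-\delta_{2-}}{(\delta_{-})^2} & \dfrac{\delta_{-}}{(\delta_{2-})^2} \end{bmatrix} \begin{bmatrix} \hat{f}(\delta_{-}) - \hat{f}(0) - \hat{f}'(0)\,\delta_{-} \\ \hat{f}(\delta_{2-}) - \hat{f}(0) - \hat{f}'(0)\,\delta_{2-} \end{bmatrix} \]
is possible with the information of the previous two values of δ. Here, δ_− corresponds to the previous value and δ_2− corresponds to the value before that. The minimum of (4.12) is
\[ \hat{\delta} = \frac{-d + \sqrt{d^2 - 3c\,\hat{f}'(0)}}{3c}, \tag{4.13} \]
with p_cu′(δ̂) = 0 and p_cu″(δ̂) > 0. The line-search algorithm of the MATLAB optimization toolbox starts the sequence of sub-iterations with δ_2− = 0 and δ_− = up. Therefore, the cubic approximation (4.12) is already used in the first sub-iteration step. The sub-routine for both algorithms stops if condition (4.6) is met and δ̂ = δ^j.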
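The backtracking idea can be sketched as follows. This is a simplified illustration and neither algorithm A6.3.1 of Dennis and Schnabel (1983) nor the MATLAB routine: it checks only the sufficient-decrease condition (4.6), and it uses the quadratic model in every sub-iteration instead of switching to the cubic model (4.12). The function name backtracking, the default a = 10⁻⁴, and the test function are assumptions made for the example.

```python
import numpy as np

def backtracking(f, x, g, s, a=1e-4, low=0.1, up=0.5, max_sub=30):
    """Return a step length along s that satisfies condition (4.6).

    The curvature condition (4.7) is not checked in this sketch.
    """
    f0 = f(x)
    slope = float(g @ s)                 # f_hat'(0) = g(x)' s, negative for descent
    if slope >= 0:
        raise ValueError("s is not a descent direction")
    delta = 1.0                          # start with the full Newton step
    for _ in range(max_sub):
        f_delta = f(x + delta * s)
        if f_delta <= f0 + a * delta * slope:
            return delta                 # sufficient decrease reached
        # minimizer of the quadratic through f_hat(0), f_hat'(0), f_hat(delta);
        # for delta = 1 this is exactly (4.11)
        delta_q = -slope * delta**2 / (2.0 * (f_delta - f0 - slope * delta))
        # safeguard: shrink by a factor r in [low, up]
        delta = float(np.clip(delta_q, low * delta, up * delta))
    raise RuntimeError("no acceptable step length found")

f = lambda x: float(x @ x)               # hypothetical objective f(x) = x'x
x0 = np.array([1.0])
g0 = 2.0 * x0                            # gradient of f at x0
delta_full = backtracking(f, x0, g0, -0.5 * g0)   # exact Newton step is accepted
delta_cut = backtracking(f, x0, g0, -g0)          # overshooting step is shortened
```

In the second call the full step jumps from x = 1 to x = −1 without any decrease of f, so the quadratic model proposes δ̂ = 1/2, which lands exactly on the minimum.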

Quasi-Newton methods

That H is p.d. can be guaranteed by calculating the model Hessian in each iteration, but this is still computationally intensive, as mentioned above. Therefore, methods were developed that do not calculate the Hessian explicitly but approximate it using the information about parameter vector, gradient, and function value. These methods are called “quasi-Newton”. Two important representatives of the “quasi-Newton” methods are the Davidon, Fletcher, and Powell (DFP) and the Broyden, Fletcher, Goldfarb, and Shanno (BFGS) method. LS problems typically apply the Gauss-Newton (GN) and Gauss-Newton-Levenberg-Marquardt (GNLM) methods. Employing H(θ^j) := I, where I is the (u × u) identity matrix, in (4.5) is called steepest-descent. This dissertation deals essentially with the BFGS algorithm. The BFGS update of H in general terms approximates the information about the curvature Q^j = Q(θ^j) ≈ H(θ^j) at θ^j. Using the BFGS update procedure, the first iteration requires a start matrix for Q(θ^0). The optimization toolbox by MATLAB uses Q^0 = I, which is also possible with algorithm A9.4.1 in Dennis and Schnabel (1983). Then the first iteration is a steepest-descent step, which ensures a symmetric and p.d. starting matrix. A starting matrix that contains more information for the first iteration can be obtained by using the model Hessian. With the BFGS update, the “quasi-Hessian” is computed by
\[ Q^{j+1} = Q^j + \frac{y^j y^{j\prime}}{y^{j\prime} s^j} - \frac{Q^j s^j s^{j\prime} Q^j}{s^{j\prime} Q^j s^j}, \tag{4.14} \]

where y^j = g(θ^{j+1}) − g(θ^j). The BFGS update is p.d. and symmetric in every iteration if Q^0 is p.d. and symmetric. It can be shown that y^{j′}s^j > 0 ⇒ g(θ^{j+1})′s^j > g(θ^j)′s^j, which agrees with line-search condition (4.7). The Newton-Raphson step with BFGS update, thus, results in s^j = −(Q^j)^{−1} g(θ^j). For the SSR, the often used GNLM method chooses
\[ s = \left[J'(\theta^j) J(\theta^j) + n^j I\right]^{-1} J'(\theta^j)\, \hat{\varepsilon}(\theta^j), \tag{4.15} \]
where J(θ^j) is the Jacobian of SSR, n ≥ 0 is a constant, and ε̂(θ^j) is a (T × 1) vector of the LS-residuals. For n = 0, this refers to the basic GN method. Newton-Raphson algorithms are characterized by being locally convergent at different speeds, depending on the procedure (Dennis and Schnabel, 1983; Fletcher, 2013).
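To make the update (4.14) concrete, the following sketch applies one BFGS update on a small quadratic objective and checks its defining properties. The matrix A, the starting point, and the function name bfgs_update are hypothetical; skipping the update when y′s ≤ 0 is a common safeguard and an assumption here, not a statement about the implementation used in this dissertation.

```python
import numpy as np

def bfgs_update(Q, s, y):
    """BFGS update (4.14) of a symmetric quasi-Hessian Q.

    s is the step theta^{j+1} - theta^j, y the change in the gradient.
    """
    ys = float(y @ s)
    if ys <= 0:
        return Q                          # curvature condition violated: keep Q
    Qs = Q @ s                            # Q s; (Q s)(Q s)' = Q s s' Q for symmetric Q
    return Q + np.outer(y, y) / ys - np.outer(Qs, Qs) / float(s @ Qs)

# Hypothetical quadratic objective f(theta) = 0.5 * theta' A theta with gradient A theta.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda theta: A @ theta

theta0 = np.array([1.0, 1.0])
Q0 = np.eye(2)                            # start matrix Q^0 = I
s0 = -np.linalg.solve(Q0, grad(theta0))   # first step is a steepest-descent step
theta1 = theta0 + s0                      # full step, no line search in this sketch
y0 = grad(theta1) - grad(theta0)
Q1 = bfgs_update(Q0, s0, y0)
```

After the update, Q^1 s^0 = y^0 holds (the secant condition), and Q^1 remains symmetric and p.d. because y^{0′}s^0 > 0.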

Stopping criteria

In the previously presented quasi-Newton procedures, it is assumed that the (quasi-)Hessian is always p.d. Therefore, it is implicitly assumed that parameter values for g(θ^{j+1}) = 0 refer to a minimum. In practice, this condition is rarely achieved exactly and the assumption g(θ^{j+1}) ≈ 0 seems more appropriate. Thus, it is necessary to define a threshold gradtol at which the algorithm stops and the minimum is regarded as found. The choice of this threshold depends strongly on the scale of the objective function as well as on the scale of the parameter values. Therefore, it is advisable to apply a relative gradient g(θ)θ/f(θ). For f(θ) ≈ 0 and/or θ ≈ 0, the relative gradient collapses. Here it is a common approach to substitute θ by (|θ| + 1), if the parameter values are in the scale around 1. The MATLAB optimization toolbox employs
\[ \frac{\lVert g(\theta^{j+1}) \rVert_\infty}{\lVert g(\theta^{0}) \rVert_\infty} \le \mathit{gradtol}, \tag{4.16} \]
where the infinity norm refers to the maximum absolute value of the gradient vector. Another termination criterion refers to the step-length δ
\[ \frac{\lVert \theta^{j} - \theta^{j+1} \rVert_\infty}{1 + \lVert \theta^{j} \rVert_\infty} \le \mathit{steptol}, \tag{4.17} \]
where a relative stopping criterion is also employed. The last termination criterion
\[ \frac{\lvert f(\theta^{j}) - f(\theta^{j+1}) \rvert}{1 + \lvert f(\theta^{j}) \rvert} \le \mathit{funtol} \tag{4.18} \]
refers to the decrease of the function value. The MATLAB-implied stopping criteria differ a little from the proposed algorithms in Dennis and Schnabel (1983). Here, the authors suggested using gradtol = macheps^{1/3} and steptol = macheps^{2/3} as default values, which are applied within this dissertation. macheps corresponds to the “floating-point relative accuracy” (Coleman and Zhang, 2020). For MATLAB, macheps = 2^{−52}. The applied default value is funtol = 10^{−6}. The last stopping criterion applied is maxIter, which is selected by the researcher. The simulation study sets this value to maxIter = 1000. If an optimization routine does not converge, in addition to choosing an alternative starting vector, one can also try to obtain an estimator via less stringent tolerance levels (Dennis and Schnabel, 1983; Fletcher, 2013).
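The three relative criteria (4.16)-(4.18) can be collected in one small helper. The sketch below is an illustration in Python rather than the MATLAB code used in this dissertation; the function name converged and the numerical inputs are hypothetical, and the case ‖g(θ^0)‖∞ = 0 is deliberately not handled.

```python
import numpy as np

MACHEPS = np.finfo(float).eps            # 2**-52 in IEEE double precision

def converged(theta_old, theta_new, f_old, f_new, g_new, g_start,
              gradtol=MACHEPS ** (1 / 3), steptol=MACHEPS ** (2 / 3), funtol=1e-6):
    """Evaluate the stopping criteria (4.16), (4.17), and (4.18)."""
    grad_ratio = np.max(np.abs(g_new)) / np.max(np.abs(g_start))
    step_ratio = np.max(np.abs(theta_old - theta_new)) / (1 + np.max(np.abs(theta_old)))
    fun_ratio = abs(f_old - f_new) / (1 + abs(f_old))
    return {
        "gradient": bool(grad_ratio <= gradtol),   # (4.16)
        "step": bool(step_ratio <= steptol),       # (4.17)
        "function": bool(fun_ratio <= funtol),     # (4.18)
    }

# Hypothetical values from two consecutive iterations.
theta_old = np.array([1.0, 2.0])
theta_new = theta_old + 1e-12
g_start = np.array([10.0, -5.0])

status_done = converged(theta_old, theta_new, 3.0, 3.0 - 1e-9,
                        np.array([1e-7, -2e-7]), g_start)
status_running = converged(theta_old, theta_new, 3.0, 3.0 - 1e-9,
                           np.array([1.0, 0.0]), g_start)
```

With these defaults, gradtol is roughly 6 · 10⁻⁶ and steptol roughly 4 · 10⁻¹¹, so the second call still fails the gradient criterion although step and function value have practically stopped changing.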


4.2 Estimation methods

The researcher’s aim is to approximate the DGP through a specific model, given a sample y = {y_1, ..., y_T} drawn from the true distribution p_0(y). In the following, the distribution generated from the observed data by the presumed model is denoted as p(y|θ). The explicit model is not presented here, only the distribution that results from it depending on the model’s inherent parameters, which simplifies the presentation of the problem. The estimators of this presumed model depend on the sample, θ̂(y), and will vary from sample to sample. If the realized values of the samples are independently distributed, it can be inferred that they are drawn from the same DGP. This holds for cross-section data, see Davidson and MacKinnon (1993, pp.113-118). Stock return series, on the other hand, are customarily sampled at the daily closing price. A resample with a different timestamp (e.g., opening prices) is not independent of previous samples. Nevertheless, as in cross-section theory, the true DGP is unknown, or the researcher has little information about it. The true DGP can be seen as an unknown model with unknown parameters θ_0. Of course, a model is just a simplification of reality, no researcher knows the truth, and the “truth” does not necessarily have to be generated by a specific (true) model, which might suggest the use of θ_0. Thus, the assumption about the true model is more theoretical. Therefore, p_0(·) refers to the “truth”, which corresponds to the true model if p_0(·) = p(·|θ_0) (Burnham and Anderson, 2002, pp.16-37). They further stated that the “truth” could be regarded as an infinite-dimensional model and “building the set of candidate models is partially a subjective art; that is why a scientist must be trained, educated, and experienced in their discipline”. Based on the sample, the parameter values that generated the sample are estimated. Frequently used methods are the LS and the ML methods.
The LS method is widely used, especially in the cross-section spline literature, making the presentation necessary. Furthermore, the LS method is applied to estimate the starting vector and to smooth the data (see 4.5) in preparation for the main estimation procedure. With the LS method, no a priori assumption is made about the distribution of the data p(y|θ). With the ML principle, on the other hand, an a priori assumption about p(y|θ) is necessary. The ML principle is not very common for estimating parameters in cross-section spline models (except generalized linear and extended linear models, see chapter 1) but is very common in the time series context. Thus, it will be discussed in more detail. The choice of the objective function also depends on whether the model parameters included are linear or nonlinear. A model is linear in parameters if any first partial derivative ∂f(θ)/∂θ is independent of any parameter (Rao et al., 2008, p.2). So, for example, for a classic linear regression model y_t = Σ_{i=−l}^{K−1} w_i B^l_{i,t}(t), the first derivative ∂y_t(w)/∂w_i = B^l_{i,t}(t) is independent of any w_i. For a GARCH(1,1) model h_t = α_0 + α_1 ε²_{t−1} + β_1 h_{t−1}, the first derivative ∂h_t(α)/∂α = (1, ε²_{t−1}, h_{t−1})′ + β_1 ∂h_{t−1}(α)/∂α is not independent of the parameters. Therefore, a GARCH model is nonlinear in parameters due to its recursive construction. The same holds for the class of ARMA and MA models. The previous section 4.1 reported principles of minimizing a particular objective function value f(θ) in dependence of parameters. This section deals with the objective functions. The parameter vector of the FKS-GARCH model consists, as reported above, of the parameters of the mean function φ, the GARCH parameters α, the parameters of the spline function w, and the parameters of the Jupp-transformed knots λ.
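The recursive derivative of the GARCH(1,1) variance can be checked numerically. The sketch below treats the innovations as given and fixed (no mean or spline component), sets the pre-sample innovation to zero, and uses a constant h_0, which are simplifying assumptions for the illustration; the function name garch_variance_and_gradient and all parameter values are hypothetical.

```python
import numpy as np

def garch_variance_and_gradient(eps, alpha0, alpha1, beta1, h0):
    """Return h_t and dh_t/d(alpha0, alpha1, beta1) for a GARCH(1,1) recursion."""
    T = eps.size
    h = np.empty(T)
    dh = np.empty((T, 3))
    h_prev, dh_prev = h0, np.zeros(3)    # h0 is a constant, so its derivative is zero
    eps2_prev = 0.0                      # pre-sample innovation set to zero
    for t in range(T):
        h[t] = alpha0 + alpha1 * eps2_prev + beta1 * h_prev
        # direct effect (1, eps^2_{t-1}, h_{t-1}) plus the recursive part beta1 * dh_{t-1}
        dh[t] = np.array([1.0, eps2_prev, h_prev]) + beta1 * dh_prev
        h_prev, dh_prev, eps2_prev = h[t], dh[t], eps[t] ** 2
    return h, dh

rng = np.random.default_rng(0)
eps = rng.standard_normal(50)            # hypothetical innovations
params = np.array([0.1, 0.1, 0.8])
h, dh = garch_variance_and_gradient(eps, *params, h0=1.0)

# central finite differences of the last variance h_T w.r.t. each parameter
step = 1e-6
fd = np.empty(3)
for i in range(3):
    up, down = params.copy(), params.copy()
    up[i] += step
    down[i] -= step
    fd[i] = (garch_variance_and_gradient(eps, *up, h0=1.0)[0][-1]
             - garch_variance_and_gradient(eps, *down, h0=1.0)[0][-1]) / (2 * step)

# the same derivative evaluated at a different beta1
_, dh_other = garch_variance_and_gradient(eps, 0.1, 0.1, 0.5, h0=1.0)
```

The analytical recursion agrees with the finite differences, and the derivative changes when β_1 changes, which is exactly the dependence on the parameters that makes the model nonlinear in parameters.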


Regarding the full model
\[ y_t = \mu_t + \varepsilon_t \tag{4.19} \]
\[ \varepsilon_t = \sqrt{h_t \tau_t}\, z_t \tag{4.20} \]
\[ h^s_t = \alpha_0 + \alpha_1 z^2_{t-1} + \beta_1 h_{t-1} \]

h^a_t = α_0 + (α_1 + γ_1 1_{t−1} [...]

[...] 0, α̂_p > 0 (for at least one α̂_p), α̂_p ≥ 0 (for p = 1, ..., P) and β̂_q ≥ 0 (for q = 1, ..., Q), which ensures that ĥ_t is positive and E[ε_t²] < ∞ (see section 2.4). This ensures that the variance or in general θ, and in particular for GARCH models, a finite second moment of the innovations under consideration exists (Bollerslev, 1986, inter alia). With assumption A.2, the boundary of the parameter space is excluded for the true parameter vector. It is important that the presumed model is correct and that no parameter of the presumed model equals zero. If, for example, the true underlying process follows a constant variance process with α_{0,1} = 0 and the presumed model is a GARCH(1,1), then asymptotic normality for α̂_1 cannot be derived (Straumann, 2005, pp.76-77). Assumption A.4 implies the IGARCH case, where E[ε_t²] = ∞. For integrated GARCH models, the estimated parameters are still consistent and asymptotically normally distributed. Given the B-spline property P.4 of a convex hull, the B-spline function s^l(t) is bounded from below and from above if w ∈ W, such that

A.5 max_{i=−l,...,K−1} |w_i| ≤ C_w < ∞.

Furthermore, σ_t² is positive due to τ_t = exp(s^l(t)) and, therefore, it holds that τ_t > 0 ∀t. Assumption A.5 follows the recommendation in Amado and Teräsvirta (2013, 2017) and is just a technical requirement that, in combination with property P.4, ensures that the spline function does not become infinitely large. From the multiplicative decomposition (see 3.1), it can be derived that
\[ \frac{\varepsilon_t}{\sqrt{\tau_t}} = \sqrt{h_t}\, z_t, \tag{4.41} \]
where √h_t z_t and, therefore, ε_t/√τ_t is strictly stationary when assumptions A.1-A.5 hold.

The application of an unconstrained optimization algorithm (see 4.1) bears the risk that theoretical considerations made are not fulfilled in practice, i.e., θ̂ ∈ R^u (where θ_0 ∈ int Θ still holds), with consequences for the existence of a positive variance. This relaxation of the assumption about the parameter space, however, does not change the definition of the QMLE
\[ L_T(\hat{\theta}|y) \ge L_T(\theta|y) \quad \forall \theta \in \Theta, \tag{4.42} \]
if θ̂ ≠ θ (Martin et al., 2013, pp.780-781, p.849; Davidson and MacKinnon, 1993, pp.243-255).


The likelihood function

As shown in equation (4.36), p(y_1, ..., y_T|y_0) is a product of the conditional distributions. To be able to calculate with sums, equations (4.39) and (4.40) are, therefore, logarithmized
\[ \ln L_t(\theta|y) = -\frac{1}{2}\ln(2\pi\sigma_t^2) - \frac{1}{2}\frac{\varepsilon_t^2}{\sigma_t^2} \tag{4.43} \]
\[ \ln L_t(\theta|y) = -\frac{1}{2}\left[\ln \sigma_t^2 + (v+1)\ln\left(1 + \frac{\varepsilon_t^2}{(v-2)\sigma_t^2}\right)\right] + \ln\left(\frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{\pi(v-2)}}\right), \tag{4.44} \]
and the resulting log-likelihood function
\[ \ln L_T(\theta|y) = \sum_{t=1}^{T} \ln L_t(\theta|y) \tag{4.45} \]

can be optimized, when y_0 or h_0 and ε_0² are given (Martin et al., 2013, chapters 1, 2 and pp.769-781).

A.6 h_0 is a positive constant (not a function value resulting from θ).

For practical purposes, Bollerslev (1986) recommended using the in-sample mean h_0 = (1/T) Σ ε̂_t². In this dissertation h_0 = (1/T) Σ y_t², as no LS-residuals of the mean are calculated in advance. Furthermore, from theoretical and practical considerations, it is assumed that y_0 = 0 and ε_0 = 0.⁶ To make the notation clearer, in the following, ln L_T(θ|y) = ln L_T(θ).⁷

Regularity conditions

The ML principle is based on regularity conditions from which the asymptotic properties consistency, normality, and efficiency of an MLE can be derived. Regularity conditions are the existence of
\[ E_0[\ln L_t(\theta)] = \int \ln L_t(\theta)\, L_t(\theta_0)\, dy_t, \tag{4.46} \]
the convergence in probability of
\[ \frac{1}{T}\ln L_T(\theta) \xrightarrow{p} E_0[\ln L_t(\theta)] \tag{4.47} \]

and at least twice continuous differentiability of ln LT (θ) in θ. These regularity conditions belong to θ0 = arg max E0 [ln Lt (θ)], where E0 [ln Lt (θ)] is the (true) log-likelihood function of the population (Martin et al., 2013, p.53)8 . If the likelihood function fullfills these regularity conditions, then evaluating the first and second order derivatives with θ0 yields the mean of the gradient or the Fisher information, which is demonstrated below. The expected loglikelihood function in (4.46) can be regarded as the so-called cross-entropy, and the average log-likelihood is a consistent estimator of the cross-entropy (Bozdogan, 1987), which will be further discussed in section 4.3. Through continuous differentiability, a quadratic Taylor 6

If a higher model order than AR(1)-GARCH(1,1) is chosen, the pre-sampling process has to be defined up to the largest number of considered lags. 7 Some literature (Martin et al., 2013, inter alia) recommends using the average of (4.45), i.e., 1/T ln LT (θ), to estimate QMLE. In the algorithms programmed, the sum is used, which is why this presentation is chosen here. The same applies to the calculation of gradient and Hessian. 8 Here and in the following, E0 refers to the expected value w.r.t. true population DGP.


series approximation around each estimation step in an iterative Newton-Raphson method from section 4.1 and the evaluation of a maximum (minimum of $-\ln L_T$) are guaranteed.

First and second order derivatives
To maximize the log-likelihood function in (4.45), it is necessary for the $u \times 1$ gradient vector, also known as score vector,

$$\left.\frac{\partial \ln L_T(\theta)}{\partial \theta}\right|_{\theta=\theta^*} = \sum_{t=1}^{T} \left.\frac{\partial \ln L_t(\theta)}{\partial \theta}\right|_{\theta=\theta^*} = \sum_{t=1}^{T} g_t(\theta^*) = g(\theta^*) = 0, \qquad (4.48)$$

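to be zero. If the $(u \times u)$ Hessian

$$\left.\frac{\partial^2 \ln L_T(\theta)}{\partial \theta\, \partial \theta'}\right|_{\theta=\theta^*} = \sum_{t=1}^{T} \left.\frac{\partial^2 \ln L_t(\theta)}{\partial \theta\, \partial \theta'}\right|_{\theta=\theta^*} = \sum_{t=1}^{T} H_t(\theta^*) = H(\theta^*) \qquad (4.49)$$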

is furthermore negative definite (n.d.) (p.d. if a minimization algorithm is applied), then $\theta^*$ is a local maximum of the likelihood function. If furthermore (4.94) is fulfilled, $\theta^* = \hat{\theta}$. The Hessian in (4.49) is the observed Fisher information (Martin et al., 2013, chapter 2; Davidson and MacKinnon, 1993, chapter 8). To derive the analytical gradient vector for the model under consideration

$$g(\theta) = -\left(g(\phi), g(\alpha), g(w), g(\lambda)\right), \qquad (4.50)$$

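with

$$g(\phi) = \frac{\partial \ln L_T(\theta)}{\partial \phi} = \sum_{t=1}^{T} \frac{\partial \ln L_t(\theta)}{\partial \phi}$$

$$g(\alpha) = \frac{\partial \ln L_T(\theta)}{\partial \alpha} = \sum_{t=1}^{T} \frac{\partial \ln L_t(\theta)}{\partial \alpha}$$

$$g(w) = \frac{\partial \ln L_T(\theta)}{\partial w} = \sum_{t=1}^{T} \frac{\partial \ln L_t(\theta)}{\partial w}$$

$$g(\lambda) = \frac{\partial \ln L_T(\theta)}{\partial \lambda} = \sum_{t=1}^{T} \frac{\partial \ln L_t(\theta)}{\partial \lambda}, \qquad (4.51)$$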

where the derivatives of φ and α (for GARCH and GJR-GARCH cases) are well documented in Fiorentini et al. (1996) and Levy (2003).
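The first and second order conditions in (4.48) and (4.49) can be checked numerically. The following is a minimal sketch, not the estimation code of this dissertation: it uses a plain Gaussian likelihood with parameters $(\mu, \ln \sigma^2)$ as a stand-in for the FKS-GARCH likelihood, works with the summed (not averaged) negative log-likelihood, and approximates gradient and Hessian by central differences. All function names (`neg_loglik`, `num_gradient`, `num_hessian`) and the simulated data are assumptions made for illustration.

```python
import math
import random


def neg_loglik(theta, y):
    """Summed negative Gaussian log-likelihood; theta = (mu, ln sigma^2)."""
    mu, log_s2 = theta
    s2 = math.exp(log_s2)
    return 0.5 * sum(math.log(2.0 * math.pi * s2) + (yt - mu) ** 2 / s2 for yt in y)


def num_gradient(f, theta, h=1e-5):
    """Central-difference approximation of the gradient (score) vector."""
    g = []
    for i in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2.0 * h))
    return g


def num_hessian(f, theta, h=1e-4):
    """Central-difference approximation of the Hessian matrix."""
    u = len(theta)
    H = [[0.0] * u for _ in range(u)]
    for i in range(u):
        for j in range(u):
            pp, pm, mp, mm = list(theta), list(theta), list(theta), list(theta)
            pp[i] += h; pp[j] += h
            pm[i] += h; pm[j] -= h
            mp[i] -= h; mp[j] += h
            mm[i] -= h; mm[j] -= h
            H[i][j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4.0 * h * h)
    return H


# Hypothetical data: 500 draws from N(0.5, 2^2).
random.seed(0)
y = [random.gauss(0.5, 2.0) for _ in range(500)]

# Closed-form Gaussian MLE, used here as the candidate theta*.
mu_hat = sum(y) / len(y)
s2_hat = sum((yt - mu_hat) ** 2 for yt in y) / len(y)
theta_star = [mu_hat, math.log(s2_hat)]

f = lambda th: neg_loglik(th, y)
g = num_gradient(f, theta_star)   # should be close to the zero vector
H = num_hessian(f, theta_star)    # Hessian of -ln L_T, should be p.d.

# For a 2x2 matrix, positive leading principal minors imply positive definiteness.
is_pd = H[0][0] > 0 and H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0
```

Because the negative log-likelihood is minimized here, the relevant check is positive definiteness, which mirrors the "p.d. if a minimization algorithm is applied" remark above.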


The modifications for the case of the ARMA(1,1)-FKS-(GJR)-GARCH(1,1) model w.r.t. the parameters $\phi$ of the mean function

$$\frac{\partial h^s_t}{\partial \phi} = 2\alpha_1 \frac{\epsilon_{t-1}}{\tau_{t-1}} \frac{\partial \epsilon_{t-1}}{\partial \phi} + \beta_1 \frac{\partial h_{t-1}}{\partial \phi}$$

∂h^a_t/∂φ = 2(α_1 + γ_1 1_{t−1} [...] 0.

However, for comparative reasons, they also estimated an S-GJR-GARCH and an S-GARCH model. All models employed have equidistant knot vectors. They chose an S&P500 sample from 1950-2013 (T = 15853). Goldman and Shen (2017) carried out a similar approach with an S&P500 sample from 2002-2016 (T = 3500). The latter study applied the AIC and the BIC for selecting the order of the spline model. The behavior of the estimators is in line with those in Engle and Rangel (2008) and Old (2020).

Silvennoinen and Teräsvirta (2017) investigated the Polish WIG20 index from 01/1996-03/2015 (T = 4777) by means of the MTV-GJR-GARCH model of Amado et al. (2008); Amado and Teräsvirta (2013, 2017). To compare the forecast accuracy of this model, they additionally estimated an S-GARCH model, once with BIC and once with AIC selected $\hat{K}$. Their empirical analysis partially confirms the results of the simulation study in chapter 5. Thus, a remarkably higher $\hat{K}$ was estimated with the AIC than with the BIC. The comparison of the forecast accuracy of the models (besides S-GARCH and MTV-GJR-GARCH, they estimated the standard GJR-GARCH model) highlighted that the S-GARCH model with a BIC-chosen $\hat{K}$ is the best forecast model. According to all that is known (see section 5 or


Old (2020)), it was to be expected that a high number of knots (AIC case) in combination with a short estimation period would lead to biased estimators. These, in turn, lead to a worse forecast performance. As their paper compared the symmetric S-GARCH model with the asymmetric GJR-GARCH and the MTV-GJR-GARCH models, it is unclear which effect the choice of the short-term volatility function has on the forecast performance. For the forecast of the deterministic component, it holds that $\hat{\tau}_{t+j} = \hat{\tau}_t$, which this dissertation also applies. The authors demonstrated a major disadvantage of this procedure empirically: a $\hat{\tau}_t$ function (no matter which model is used to estimate it) provides better forecasts if its last value is closer to the value of the proxy variable. As the information criteria are evaluated for the IS performance, this could arbitrarily differ.

6.2 In-sample analysis

Four different setups for each model class were estimated to evaluate and compare different conditional variance models. These model classes comprise the standard GARCH models, the BS-GARCH models and the FKS-GARCH models. Thus, the evaluated models are the GARCH and GJR-GARCH, the BS(K)-GARCH and the BS(K)-GJR-GARCH, as well as the FKS(K)-GARCH and the FKS(K)-GJR-GARCH, each with $z_t \sim N(0, 1)$, $z_t \sim St(0, 1, v)$, P = 1, and Q = 1. This and the following section omit the AR(1) model for notational clarity. Nevertheless, the output tables report all parameters. The spline models are estimated with basis functions in three different degrees $l \in \{1, 2, 3\}$. Furthermore, all spline models are estimated and evaluated in a range $K \in \{0, ..., 15\}$, where K = 0 refers to a standard GARCH model. The optimal $\hat{K}$ is selected with the HQ. The output tables only display the estimators and statistics corresponding to these selected models. However, the other information criteria discussed in chapter 4.3 are nonetheless reported for the HQ selected model. This results in 28 different models under investigation: four standard GARCH models, twelve BS-GARCH models, and twelve FKS-GARCH models. The PS-GARCH model is not presented here because even after the application of the "full-search" algorithm (see chapter 3.3.4), the optimal controlling parameter $\hat{\iota}$ equals zero, and, therefore, the results are similar to the BS-GARCH models. After the selection of the optimal model, the IS forecast is evaluated utilizing the QLIKE and the MSE loss functions, see chapter 4.4. As block-diagonality of the Fisher information matrix is only asymptotically achieved (see chapter 4.2.4), the mean and the variance are jointly estimated. The following general findings hold for every estimated model under consideration and will, therefore, be presented in advance.
After the estimation, the adequacy of each model can be evaluated by means of the standardized residuals, i.e., $\hat{z}_t = \hat{\epsilon}_t / \hat{\sigma}_t$. Here, $\hat{\epsilon}_t$ are the residuals of the jointly estimated AR(1) process. For the AR(1) process with homoskedastic variance, it holds that $\hat{z}_t$ has the same distribution as $\hat{\epsilon}_t$. Therefore, the sample statistics of the residuals from the AR(1) process with the assumption of a constant conditional variance (see table 2.2) can be compared with the standardized residuals from the AR(1)-GARCH processes. If the $\hat{z}_t$ series is distributed according to the presumption made for the likelihood function and if $\hat{z}_t$ is independently distributed, then the applied model captures the dynamics of the data well. Nevertheless, for all models it holds that $\hat{z}_t$ is not distributed as presumed in advance. If the AR(1) process with constant conditional variance is considered as a benchmark, then the sample skewness is mitigated from −1.2369 to values around −0.50 after employing any model with heteroskedastic conditional variance. The sample kurtosis of $\hat{z}_t$ drops by a third, and the sample variance is equal to or less than one for most examined models. The sample mean remains unchanged at zero. In particular, the sample skewness and kurtosis reveal a non-normal pattern for the standardized residuals. Moreover, even for the Student's-t models, a KS statistic rejects the assumption of Student's-t distributed $\hat{z}_t$. Still, the McLeod-Li test (see (2.20)) for $\hat{z}_t^2$ reveals no significant autocorrelation up to 50 considered lags.

Another important finding is that the $\hat{\beta}_1$ estimators are smaller for models with a Gaussian likelihood than for models with a Student's-t likelihood. All investigated model classes revealed that the variant with short-term GJR-GARCH volatility and a Student's-t likelihood is the best HQ selected model. All models with a time-varying unconditional variance suggested that $\hat{K}$ is smaller with a Student's-t likelihood than with a Gaussian likelihood. A general pattern for all spline-GARCH models is that a higher $\hat{K}$ lowers the VP more, as Old (2020) showed. The corresponding tables present all these findings. In the following, all model classes are considered individually.
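The residual diagnostics described above can be sketched in a few lines. This is an illustrative sketch only: the McLeod-Li statistic is written in the usual Ljung-Box form applied to the squared series, which is assumed to correspond to (2.20), and the simulated series (an i.i.d. Gaussian series and an ARCH(1) series with hypothetical parameters 0.5 and 0.5) stand in for actual standardized residuals. The function names are chosen for illustration.

```python
import math
import random


def sample_moments(z):
    """Return sample mean, variance, skewness and kurtosis (divisor T)."""
    T = len(z)
    m = sum(z) / T
    var = sum((x - m) ** 2 for x in z) / T
    skew = sum((x - m) ** 3 for x in z) / T / var ** 1.5
    kurt = sum((x - m) ** 4 for x in z) / T / var ** 2
    return m, var, skew, kurt


def mcleod_li(z, lags):
    """Ljung-Box-type Q statistic computed on the squared series z_t^2."""
    s = [x * x for x in z]
    T = len(s)
    m = sum(s) / T
    c0 = sum((x - m) ** 2 for x in s)
    q = 0.0
    for k in range(1, lags + 1):
        ck = sum((s[t] - m) * (s[t - k] - m) for t in range(k, T))
        rho_k = ck / c0                      # sample autocorrelation at lag k
        q += rho_k * rho_k / (T - k)
    return T * (T + 2) * q


random.seed(1)
T = 2000

# i.i.d. series: squared values should show no autocorrelation.
z_iid = [random.gauss(0.0, 1.0) for _ in range(T)]

# ARCH(1) series (hypothetical parameters): squared values are autocorrelated.
e_arch = [0.0]
for _ in range(T):
    h = 0.5 + 0.5 * e_arch[-1] ** 2
    e_arch.append(math.sqrt(h) * random.gauss(0.0, 1.0))
e_arch = e_arch[1:]

q_iid = mcleod_li(z_iid, lags=10)
q_arch = mcleod_li(e_arch, lags=10)
```

Under the null of no autocorrelation, the statistic is compared with a chi-squared quantile with as many degrees of freedom as lags; a well-specified volatility model should move the statistic of $\hat{z}_t^2$ from the `q_arch` regime toward the `q_iid` regime.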

AR(1)-model

Chapter 2 first estimated the mean process with a constant conditional variance. On the one hand, this dissertation does not focus on the mean. On the other hand, it is shown empirically that $y_t \approx \epsilon_t$, since only a small autocorrelation in the first lags is observed, see table 2.2 or figure 2.2. Therefore, this dissertation models the mean process as well. Empirically, there is little difference between an AR(1) and an MA(1), as Nelson (1991) or Lo and MacKinlay (1990) showed. Further considered model orders were an ARMA(1,1) model or a demeaned model (only with intercept in the mean function). Here the demeaned model was rated lower than the other three by AIC, BIC, and HQ. For the AR(1), MA(1), and ARMA(1,1), there were no remarkable differences. Due to the small SACF in the first lags, an AR(1) model is chosen for the mean process throughout all estimated models, even though in some cases, the autoregressive parameter $\phi_y$ is not significant (as in the case of the AR(1) model with constant conditional variance). Unfortunately, the same pattern can be observed for every other ARMA model order. The mean and the variance processes are estimated together for all considered models.

Standard GARCH models

Chapter 2.4 estimated and evaluated four standard GARCH models. Table 2.3 and figure 2.3 report the results. Some typical patterns of standard GARCH models could be observed. First, there is a near-unit-root VP for each setup. The highest $\hat{\eta}_1 = 0.9935$ resulted from the GARCHt model and the lowest $\hat{\eta}_1 = 0.9788$ from the GJR-GARCH model. Second, for both Student's-t distributed models $\hat{\eta}_2 > 1$ and, therefore, the fourth moment is infinite. Third, the GJR-GARCH models have a smaller (model-implied) kurtosis $\hat{\kappa}(\epsilon_t)$ than the GARCH models. The best model chosen with the HQ is the GJR-GARCHt model. This model is also selected by the BIC and the AIC but not by the GCV. This pattern is consistent throughout the subsequently regarded models. For the IS forecast, the GARCH model is the most accurate according to the MSE, and the GJR-GARCH model is the most accurate according to the QLIKE criterion.
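The selection step can be sketched as follows. The sketch assumes the textbook forms of the criteria, AIC = −2 ln L + 2u, BIC = −2 ln L + u ln T, and HQ = −2 ln L + 2u ln(ln T), with u the number of estimated parameters; chapter 4.3 remains authoritative for the exact definitions used in this dissertation. The candidate log-likelihood values and parameter counts are hypothetical, and T = 10330 is taken as the sum of the estimation and validation periods reported in section 6.3.

```python
import math


def information_criteria(loglik, n_params, T):
    """Textbook AIC, BIC and HQ from a maximized log-likelihood."""
    return {
        "AIC": -2.0 * loglik + 2.0 * n_params,
        "BIC": -2.0 * loglik + n_params * math.log(T),
        "HQ": -2.0 * loglik + 2.0 * n_params * math.log(math.log(T)),
    }


def select_by(criterion, candidates, T):
    """Return the candidate name with the smallest value of the criterion."""
    scores = {
        name: information_criteria(loglik, n_params, T)[criterion]
        for name, (loglik, n_params) in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical candidates: name -> (maximized ln L, number of parameters).
candidates = {
    "K=0": (-13500.0, 6),
    "K=5": (-13450.0, 11),
    "K=12": (-13440.0, 18),
}
T = 10330

best_aic, _ = select_by("AIC", candidates, T)
best_bic, _ = select_by("BIC", candidates, T)
best_hq, _ = select_by("HQ", candidates, T)
```

In this made-up example the AIC picks the largest number of knots, while the BIC and the HQ settle on the intermediate specification, which illustrates why a lightly penalizing criterion tends to select a higher $\hat{K}$.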

BS-GARCH models

Table C.16 and figure D.24 (for l = 1), table 3.1 and figure 3.3 (for l = 2), and table 3.3 and figure 3.4 (for l = 3) display the results of the BS-GARCH models. The following paragraph first summarizes those empirical results that are not particularly different between all twelve BS-GARCH models.

Unlike in the standard GARCH case, all considered BS(K)-(GJR)-GARCH models revealed that $\hat{\eta}_2 < 1$. Therefore, all models with Student's-t likelihood functions captured the unconditional leptokurtic pattern of the innovation series to a certain extent. Further exploration of these models showed that all autoregressive estimators of the mean function $\hat{\phi}_y$ resulting from a Gaussian likelihood function are not significant. However, for the Student's-t models, the mean and the variance function parameters are significant at least at an α = 0.05 level. The estimated Fisher information matrix is of full rank for all twelve considered models. That is not surprising, because all knots are distinct. Therefore, all B-spline basis functions meet the Schoenberg-Whitney theorem, see section 3.2.2. Regarding $\hat{z}_t$, it is striking that for the BS-GARCH models with assumed Student's-t distribution, the sample variances are noticeably smaller than one and smaller than in the Gaussian cases. The sample kurtosis and skewness are in the same range as for the standard GARCH models discussed above.

The advantage of models with a spline basis function is that they are easily calculated with different degrees. This makes it easier to meet the smoothness requirements of the process. It is also possible to model piecewise constant spline functions with l = 0. With this approach, the unconditional variance changes segmentwise and is discontinuous with a jump at each knot, and no derivative is possible. Therefore, BS-GARCH models with l = 0 cannot be compared with FKS-GARCH models, where at least the first derivative regarding knots is necessary. Consequently, only spline functions with $l \in \{1, 2, 3\}$ are considered subsequently. The following paragraphs look at the estimated BS-GARCH models in different degrees.

The $\hat{\tau}_t$ functions with l = 1 are very jagged.
These functions adapt well to the long-term trend but are not very smooth. Nevertheless, they have per se one or two spline parameters less than spline functions with $l \in \{2, 3\}$. Therefore, the BS(11)-GJR-GARCHt model is the best HQ model among all BS-GARCH models. Another property is that the regarded models lower the VP remarkably. Again, the GJR-GARCH variant has the lowest VP with $\hat{\eta}_1 = 0.9451$. The highest $\hat{\eta}_1 = 0.9686$ results again from the GARCHt type. All four setups are far from an integrated GARCH model. Furthermore, the BS(14)-GJR-GARCH model has the smallest QLIKE loss function, and the BS(14)-GARCH model has the smallest MSE loss function in this class of models. For the BS(14)-GJR-GARCH model, however, it is worth mentioning that the $\hat{\alpha}_1$ estimator is insignificant. This is a known issue for GJR-GARCH models, as discussed in section 6.1. The presented results demonstrate that B-spline functions with a low smoothness requirement, as with l = 1, exhibit good IS properties.

As mentioned above, the S-GARCH model as proposed by Engle and Rangel (2008) can be estimated with a B-spline basis. Here, the BS(K)-GARCH with l = 2 corresponds to the S-GARCH model. Contrary to the B-spline basis functions with l = 1, the $\hat{\tau}_t$ function is relatively smooth. The best model selected with HQ is the BS(7)-GJR-GARCHt model. The models with presumed Student's-t distributed DGP have a smaller number of knots ($\hat{K} = 7$) than the models with the assumption of a Gaussian distributed DGP ($\hat{K} = 12$). As in the l = 1 case, in addition to all parameters of the mean function, the ARCH estimator $\hat{\alpha}_1$ is also not statistically significant for the BS(12)-GJR-GARCH model. Nevertheless, this variant has the lowest QLIKE loss function for the IS forecast and the smallest $\hat{\eta}_1 = 0.9471$. The BS(12)-GARCH model is most accurate in terms of the MSE.


These findings are similar to the BS(K)-GARCH with l = 3. The IS forecast for the BS(14)-GJR-GARCH model evaluated by QLIKE is the same as for l = 1. The lowest $\hat{\eta}_1 = 0.9459$ arises from the BS(14)-GJR-GARCH model and the highest $\hat{\eta}_1 = 0.9716$ from the BS(7)-GARCHt model. Considering all BS-GARCH models, the lowest VP comes from the BS(14)-GJR-GARCH model with l = 1.

A research question of this dissertation is whether different spline-GARCH models mitigate the VP and if this mitigation is significant. Therefore, a t-test

$$\frac{\hat{\eta}_1(\text{spline-GARCH}) - \hat{\eta}_1(\text{standard GARCH})}{\widehat{\mathrm{rse}}(\hat{\eta}_1(\text{spline-GARCH}))} \qquad (6.1)$$

is applied, where $\widehat{\mathrm{rse}}(\hat{\eta}_1)$ is computed as described in (4.108) and (4.109). Here, the difference of the VPs of the BS-GARCH model and of the corresponding standard-GARCH model is investigated. Table 6.1 displays the results. The VP drop for all BS-GARCH models is statistically significant, except for the BS(12)-GJR-GARCH model with l = 2.

         GARCH    GARCHt    GJR-GARCH    GJR-GARCHt
l = 1    ***      ***       ***          ***
l = 2    ***      ***                    ***
l = 3    ***      ***       ***          ***

Table 6.1: BS-GARCH models. In columns: different variants. In rows: different basis function degrees. Each cell corresponds to the best HQ selected model, see tables C.16, 3.1, 3.3. The benchmark model for the t-test is the corresponding standard GARCH model, see table 2.3. The stars show if the VP drop is significant at a certain level. * p-value < 0.10, ** p-value < 0.05, *** p-value < 0.01

FKS-GARCH models

This section addresses a central research question of this dissertation by means of the S&P500 sample: whether the free estimation of the knots leads to better IS properties than the assumption of equidistant knots. Therefore, the four above-mentioned short-term volatility variants within a range $l \in \{1, 2, 3\}$ are evaluated through an FKS-GARCH model. Table C.17 for l = 1, table 6.4 for l = 2, and table 6.6 for l = 3 summarize the results. Figure D.23 for l = 1, figure 6.1 for l = 2 and figure 6.2 for l = 3 illustrate the associated $\hat{\tau}_t$ and $\hat{\sigma}_t^2$ functions.

Some of the findings from the BS-GARCH models are similar to those with the FKS-GARCH models. Nevertheless, some estimation results are quite different. These are presented here. First, the number of knots selected with the HQ is considerably smaller than in the BS-GARCH case with equidistant knots. That is an expected result, as free estimation places knots in locations where the process changes. Second, $\hat{\eta}_2 < 1$ for all FKS-GARCH models, except for the FKS(3)-GARCHt model with l = 1. Third, with the BS-GARCH approach, the GCV and the MSE select the GARCH variant with Gaussian likelihood as the best model. Here, this pattern is not as stringent as in the BS-GARCH case. Fourth, not all parameters of all Student's-t models are significant at a usual level. Furthermore, not every estimated Fisher information matrix is of full rank. Regarding the last point, if two adjacent knots are too close to each other, then the corresponding columns of the estimated Fisher information are dependent. As the knot optimization is conducted with Jupp-transformed knot parameters, there is no exact coincidence. This is because knot locations are not restricted to lying on a discrete time scale, but they can be placed very close to each other. That is a critical issue for FKS-GARCH models if they do not meet the Schoenberg-Whitney theorem. To recapitulate, the Schoenberg-Whitney theorem holds if in no interval [i, i + l + 1] all knots coincide. If the Schoenberg-Whitney theorem is not met, the design matrix is not of full rank and the resulting spline estimators are not identifiable. However, that is not the case for any of the estimated models within this empirical study. The columns of $\hat{I}_{\hat{\theta}}$ corresponding to the spline, GARCH, and mean estimators are linearly independent throughout. This implies that the resulting estimated standard errors for these estimators are reliable. Tables 6.2, 6.3, and 6.5 detail the knot locations in date format. The dates are calculated from rounded $\hat{t}_i$ values, and, therefore, appear as coincident at certain locations.

One last point that applies to all FKS-GARCH models concerns the sample variance. The resulting sample variance of $\hat{z}_t$ for all FKS-GARCH models is approximately one, which differs from the BS-GARCH case. A sample variance of one corresponds to the presumed likelihood specifications, see (4.43) and (4.44). The sample kurtosis and skewness are in the same range as in the BS-GARCH case.

Again, the optimal model is selected with the HQ. However, contrary to the BS-GARCH case, the knots themselves are now additional parameters. Therefore, the penalty term for the FKS-GARCH model is stricter for the same number of spline basis functions. Nevertheless, the function values of $-\ln L_T$ are sometimes smaller to such an extent that the resulting HQ prefers a free estimation of the knots.

The FKS-GARCH model with l = 1 has a jagged $\hat{\tau}_t$ function, as in the BS-GARCH case.
However, the knots are placed where the data is not smooth due to the free estimation of the knot locations. Therefore, a smaller number of knots is needed. So, for example, for the FKS(K)-GARCHt model, only three knots are selected employing the HQ. The best HQ selected model is the FKS(7)-GJR-GARCHt model. However, following the HQ criterion, the BS(11)-GJR-GARCHt model with equidistant knots would be chosen. The same holds for the IS forecast, where the BS(14)-GJR-GARCH model outperforms its competitor with freely estimated knots using QLIKE and MSE. The lowest VP is ηˆ1 = 0.9508 with the FKS(8)-GJR-GARCH model and the highest VP is ηˆ1 = 0.9829 with the FKS(3)-GARCHt . Due to the small number of knots here, the VP is only slightly reduced. All estimated Fisher information matrices are of full rank. This can easily be verified by looking at the knot locations in table 6.2. For every model setup, except for FKS-GARCHt , knots are placed in the vicinity of the financial crisis in 2008-2009. Moreover, no knot is placed in the vicinity of the “Black Monday“ event, which actually has no impact on long-term volatility. t1

FKS(6)-GARCH FKS(3)-GARCHt FKS(8)-GJR-GARCH FKS(7)-GJR-GARCHt

03/30/1995 05/11/1995 04/10/1995 03/26/1991

t2

12/20/2002 08/31/1998 03/21/2003 03/21/2003

t3

t4

t5

01/19/2004

09/29/2008

11/14/2017

06/21/2004 05/13/2009

04/20/2009 09/14/2017

08/07/2014

t6

03/28/2016

t7

FKS(6)-GARCH FKS(3)-GARCHt FKS(8)-GJR-GARCH FKS(7)-GJR-GARCHt

Table 6.2:

08/02/2017

FKS(K)-(GJR)-GARCH model with l = 1, inner knot locations in date format. tˆi is rounded before it is transformed in date format. The boundary knots are located at t0 = 01/03/1980 and tK = 12/31/2020.
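The statement that Jupp-transformed knots can move very close together without ever coinciding exactly can be illustrated with a short sketch. It assumes the log-ratio form commonly attributed to Jupp (1978), $\kappa_i = \ln[(t_{i+1} - t_i)/(t_i - t_{i-1})]$ for the inner knots; section 4.2.3 remains authoritative for the exact form used in this dissertation. The function names and the numerical knot values are hypothetical.

```python
import math


def jupp_forward(knots):
    """Map an ordered knot vector t_0 < ... < t_K to unconstrained parameters."""
    return [
        math.log((knots[i + 1] - knots[i]) / (knots[i] - knots[i - 1]))
        for i in range(1, len(knots) - 1)
    ]


def jupp_inverse(kappa, t0, tK):
    """Map unconstrained parameters back to knots on [t0, tK]."""
    # Interval widths relative to the first interval: w_i = w_{i-1} * exp(kappa_i).
    widths = [1.0]
    for k in kappa:
        widths.append(widths[-1] * math.exp(k))
    # Rescale so that the widths add up to the length of the sample range.
    scale = (tK - t0) / sum(widths)
    knots = [t0]
    for w in widths:
        knots.append(knots[-1] + scale * w)
    knots[-1] = tK  # remove accumulated rounding error at the boundary
    return knots


# Two inner knots that almost coincide (2.0 and 2.001) survive a round trip.
original = [0.0, 2.0, 2.001, 7.0, 10.0]
kappa = jupp_forward(original)
recovered = jupp_inverse(kappa, original[0], original[-1])

# Any real-valued parameter vector yields strictly increasing knots.
arbitrary = jupp_inverse([-8.0, 5.0, 0.3], 0.0, 10.0)
```

Since every interval width is a positive multiple of an exponential, the inner knots stay strictly ordered for any real parameter vector, so an unconstrained optimizer can be used; coincidences only appear after rounding the estimated locations to dates.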

The τˆt functions of the FKS-GARCH models with l = 2 are much smoother than with l = 1. Nevertheless, the long-term volatility appears peaked at the climax of the financial crisis.

This is because there are multiple knots located around this incisive event, as a look at table 6.3 reveals.

                     t1          t2          t3          t4          t5          t6
FKS(8)-GARCH         03/02/1987  01/08/1993  04/24/1996  09/10/2004  03/09/2009  03/24/2009
FKS(7)-GARCHt        03/03/1995  08/22/1995  09/15/2006  06/07/2007  03/12/2008  11/28/2016
FKS(13)-GJR-GARCH    10/22/1982  10/26/1982  01/26/1984  06/15/1993  12/18/1995  03/15/2002
FKS(7)-GJR-GARCHt    04/02/1991  04/08/1991  12/13/1995  03/07/2006  08/02/2007  02/25/2009

                     t7          t8          t9          t10         t11         t12
FKS(8)-GARCH         09/18/2009
FKS(7)-GARCHt
FKS(13)-GJR-GARCH    07/24/2002  07/26/2002  03/18/2009  03/30/2009  12/24/2009  05/22/2017
FKS(7)-GJR-GARCHt

Table 6.3: FKS(K)-(GJR)-GARCH model with l = 2, inner knot locations in date format. $\hat{t}_i$ is rounded before it is transformed into date format. The boundary knots are located at t0 = 01/03/1980 and tK = 12/31/2020.

Despite this, the estimated Fisher information matrices are not singular, as all knots are far enough apart. The best HQ selected model is the FKS(7)-GJR-GARCHt model. The same HQ value is obtained for this model as for the best BS-GARCH model, the BS(11)-GJR-GARCHt, i.e., HQ = 26720. Moreover, even though the FKS(7)-GJR-GARCHt parameter vector has three more parameters, it has four spline functions less. The best IS forecast arises from the FKS(13)-GJR-GARCH model, with both the QLIKE and the MSE loss function. Furthermore, it has the highest IS accuracy among all models considered in the empirical study of this dissertation and has a smaller VP than all competitors, i.e., $\hat{\eta}_1 = 0.9340$. However, due to its large $\widehat{\mathrm{rse}}(\hat{\eta}_1)$, the VP decrease in absolute values compared to the GJR-GARCH model ($\hat{\eta}_1 = 0.9788$) is noteworthy but not statistically significant (see table 6.7). The FKS(7)-GARCHt model has the highest $\hat{\eta}_1 = 0.9671$ among all FKS-GARCH models with l = 2 (see table 6.4), which is still far away from an integrated process.
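The significance statement for the FKS(13)-GJR-GARCH model can be retraced with the test statistic in (6.1). The sketch below plugs in the estimates and robust standard errors reported in table 6.4 and the standard GARCH benchmarks $\hat{\eta}_1 = 0.9788$ (GJR-GARCH) and $\hat{\eta}_1 = 0.9935$ (GARCHt) quoted in this section. The mapping to stars uses two-sided standard-normal critical values, which is an assumption made here; (4.108) and (4.109) define the procedure actually applied.

```python
def vp_t_statistic(eta_spline, eta_standard, rse_spline):
    """Test statistic (6.1): VP difference divided by the robust standard error."""
    return (eta_spline - eta_standard) / rse_spline


def stars(t_stat):
    """Significance stars from two-sided standard-normal critical values (assumed)."""
    a = abs(t_stat)
    if a > 2.5758:   # p-value < 0.01
        return "***"
    if a > 1.9600:   # p-value < 0.05
        return "**"
    if a > 1.6449:   # p-value < 0.10
        return "*"
    return ""


# FKS(13)-GJR-GARCH (l = 2) against the standard GJR-GARCH model.
t_gjr = vp_t_statistic(0.9340, 0.9788, 0.0623)

# FKS(7)-GARCHt (l = 2) against the standard GARCHt model.
t_garcht = vp_t_statistic(0.9671, 0.9935, 0.0026)
```

The first statistic is roughly −0.72 and therefore receives no star, although the absolute VP drop of 0.0448 is the largest in this comparison; the second is roughly −10.2 and receives three stars, because its standard error is more than twenty times smaller.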


Figure 6.1: S&P500, 1980-2020. AR(1)-FKS(K)-(GJR)-GARCH(1,1) with l = 2 and coincident boundary knots (3.19). Left-hand side: $\hat{\sigma}_t^2$ and $\hat{\tau}_t$ are only displayed in the range [0, 4] or [0, 12]. Right-hand side: $\hat{\sigma}_t^2$ and $\hat{\tau}_t$ are displayed in full range. First row: FKS(8)-GARCH(1,1) model with $z_t \sim N(0, 1)$, second row: FKS(7)-GARCHt(1,1) model with $z_t \sim St(0, 1, v)$, third row: FKS(13)-GJR-GARCH(1,1) with $z_t \sim N(0, 1)$, and fourth row: FKS(7)-GJR-GARCHt(1,1) model with $z_t \sim St(0, 1, v)$. K was selected with the HQ criterion from a range $K \in \{1, ..., 15\}$. The mean process is modeled by an ARMA(1,1) process with intercept in a uniform estimation procedure. The estimated parameter values are displayed in table 6.4.

l = 2            AR(1)-FKS(8)-GARCH(1,1)   AR(1)-FKS(7)-GARCHt(1,1)   AR(1)-FKS(13)-GJR-GARCH(1,1)   AR(1)-FKS(7)-GJR-GARCHt(1,1)
φ̂0              0.0625*** (0.0085)        0.0694*** (0.0083)         0.0341** (0.0135)              0.0504*** (0.0012)
φ̂y1             −0.0015 (0.0036)          −0.0125* (0.0071)          0.0035 (0.0118)                −0.0082* (0.0046)
α̂1              0.1018*** (0.0017)        0.0819*** (0.0027)         −0.0059*** (0.0008)            0.0000 (0.0003)
β̂1              0.8450*** (0.0005)        0.8852*** (0.0018)         0.8525*** (0.0812)             0.8720*** (0.0070)
γ̂1              −                         −                          0.1746*** (0.0393)             0.1606*** (0.0039)
v̂               −                         6.0010*** (0.0624)         −                              6.5634*** (1.5172)
η̂1              0.9468*** (0.0018)        0.9671*** (0.0026)         0.9340*** (0.0623)             0.9524*** (0.0052)
η̂2              0.9171                    0.9687                     0.8856                         0.9351
σ²               1                         1                          1                              1
κ̂(εt)           3.7491                    12.4219                    3.3479                         7.6462
Kur(ẑt)          5.9883                    7.1506                     5.4835                         6.1359
Skew(ẑt)         −0.5013                   −0.6253                    −0.4693                        −0.5418
Var(ẑt)          0.9995                    1.0004                     1.0005                         1.0085
Q^McL_50(ẑt²)    44.0042                   46.7130                    42.3349                        44.6128
AIC              27486                     26890                      27141                          26668
BIC              27638                     27035                      27373                          26821
HQ               27537                     26939                      27219                          26720
GCV              42.1206                   42.6554                    42.0995                        42.6859
QLIKE            0.8191                    0.8287                     0.7836                         0.7965
MSE              41.9496                   42.4904                    41.8391                        42.5125

Table 6.4: S&P500 (see tables 2.1, 2.2). Free-knot(K) spline-(GJR)-GARCH model with l = 2, $z_t \sim N(0, 1)$ and $z_t \sim St(0, 1)$. Robust standard errors (4.107), with analytic gradient and numerical Hessian, in parentheses. Number of knots evaluated in a range $K \in \{0, ..., 15\}$ and optimal $\hat{K}$ selected by HQ criterion. Model with K = 0 corresponds to a unit-GARCH model with constant unconditional variance. * p-value < 0.10, ** p-value < 0.05, *** p-value < 0.01

The simulation study in chapter 5 comprehensively investigated the FKS-GARCH models with l = 3. Spline basis functions of degree l = 3 were selected based on smoothness requirements for nonsmooth data. This implies that there is a smooth transition from one regime to the next. What immediately stands out is that the estimated Fisher information matrices are not of full rank for all four variants, an issue that the simulation study has already shown. That is due to knot multiplicities at several locations. As in the l = 2 case, there is a peak in the $\hat{\tau}_t$ function at the high point of the financial crisis (except for the FKS(6)-GARCHt model). Unlike in the l = 2 case, some knots almost coincide. Thus, the FKS(8)-GARCH model has two knots on 04/24/2009, the FKS(10)-GJR-GARCH model has two knots on 05/08/2009, and the FKS(8)-GJR-GARCHt model has two knots each on 08/03/1992 and 05/14/2009, as table 6.5 details.

                     t1          t2          t3          t4          t5          t6
FKS(8)-GARCH         08/03/1992  08/03/1992  11/15/2001  09/29/2008  04/24/2009  04/24/2009
FKS(6)-GARCHt        10/28/1992  10/28/1992  07/26/2002  02/16/2007  02/16/2007
FKS(10)-GJR-GARCH    04/16/1993  04/19/1993  11/20/1998  01/23/2002  01/23/2002  10/24/2006
FKS(8)-GJR-GARCHt    08/03/1992  08/03/1992  01/30/2003  10/07/2008  05/14/2009  05/14/2009

                     t7          t8          t9
FKS(8)-GARCH         04/29/2010
FKS(6)-GARCHt
FKS(10)-GJR-GARCH    05/08/2009  05/08/2009  11/18/2015
FKS(8)-GJR-GARCHt    06/29/2010

Table 6.5: FKS(K)-(GJR)-GARCH model with l = 3, inner knot locations in date format. $\hat{t}_i$ is rounded before it is transformed into date format. The boundary knots are located at t0 = 01/03/1980 and tK = 12/31/2020.

B-spline basis functions with knot multiplicities are identifiable as long as the Schoenberg-Whitney theorem holds. This is the case for all models examined. Figure 6.3 illustrates the corresponding B-spline basis functions. This figure shows clearly that the basis functions are linearly independent. Another striking issue is that the VP for three out of four models is in the narrow range [0.9433, 0.9479], and the FKS(6)-GARCHt model has only a slightly higher VP with $\hat{\eta}_1 = 0.9690$. With the HQ, the FKS(8)-GJR-GARCHt model is the third-best model among all models this dissertation considers. Only the BS(11)-GJR-GARCHt with l = 1 and the FKS(7)-GJR-GARCHt with l = 2 have a smaller HQ. As in the case with l = 2, the MSE and the QLIKE select models with the GJR-GARCH short-term volatility variant. Globally, the QLIKE value is the second-best among all models under consideration. The drop in VP over all FKS-GARCH models is statistically significant, except for the GJR-GARCH variants with $l \in \{1, 2\}$, as table 6.7 reports.
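How B-spline basis functions behave at repeated knots can be explored with the Cox-de Boor recursion. The following is a minimal sketch under stated assumptions: the knot vector is hypothetical (cubic splines on [0, 10] with coincident boundary knots and a double inner knot at 6), terms with a zero denominator are set to zero by convention, and the function name `bspline_basis` is chosen for illustration. It evaluates the basis and checks non-negativity and the partition of unity; it is not a rank test of a design matrix.

```python
def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of the given degree at x."""
    n_spans = len(knots) - 1
    # Degree 0: indicator of half-open knot spans; empty spans contribute 0.
    B = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n_spans)]
    # Close the last non-empty span on the right so that x = knots[-1] is covered.
    if x == knots[-1]:
        for i in range(n_spans - 1, -1, -1):
            if knots[i] < knots[i + 1]:
                B[i] = 1.0
                break
    # Cox-de Boor recursion, raising the degree one step at a time.
    for d in range(1, degree + 1):
        new = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * B[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - x) / right_den * B[i + 1] if right_den > 0 else 0.0
            new.append(left + right)
        B = new
    return B


# Hypothetical cubic example: boundary knots repeated degree + 1 times,
# inner knots 3, 6, 6, 8 (a double knot at 6).
degree = 3
knots = [0.0] * 4 + [3.0, 6.0, 6.0, 8.0] + [10.0] * 4

grid = [0.0, 2.5, 6.0, 7.3, 10.0]
values = [bspline_basis(x, knots, degree) for x in grid]
sums = [sum(v) for v in values]   # partition of unity: each sum should equal 1
```

With a double knot and degree 3, the number of basis functions is still `len(knots) - degree - 1 = 8`; the double knot only lowers the smoothness of the spline at that location by one order.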

         GARCH    GARCHt    GJR-GARCH    GJR-GARCHt
l = 1    ***      ***                    ***
l = 2    ***      ***                    ***
l = 3    ***      ***       ***          ***

Table 6.7: FKS-GARCH models. In columns: different variants. In rows: different basis function degrees. Each cell corresponds to the best HQ selected model, see tables C.17, 6.4, 6.6. The benchmark model for the t-test is the corresponding standard GARCH model, see table 2.3. The stars illustrate if the VP drop is significant at a certain level. * p-value < 0.10, ** p-value < 0.05, *** p-value < 0.01


Figure 6.2: S&P500, 1980-2020. AR(1)-FKS(K)-(GJR)-GARCH(1,1) with l = 3 and coincident boundary knots (3.19). Left-hand side: $\hat{\sigma}_t^2$ and $\hat{\tau}_t$ are only displayed in the range [0, 4] or [0, 12]. Right-hand side: $\hat{\sigma}_t^2$ and $\hat{\tau}_t$ are displayed in full range. First row: FKS(8)-GARCH(1,1) model with $z_t \sim N(0, 1)$, second row: FKS(6)-GARCHt(1,1) model with $z_t \sim St(0, 1, v)$, third row: FKS(10)-GJR-GARCH(1,1) with $z_t \sim N(0, 1)$, and fourth row: FKS(8)-GJR-GARCHt(1,1) model with $z_t \sim St(0, 1, v)$. K was selected with the HQ criterion from a range $K \in \{1, ..., 15\}$. The mean process is modeled by an ARMA(1,1) process with intercept in a uniform estimation procedure. The estimated parameter values are displayed in table 6.6.

l = 3            AR(1)-FKS(8)-GARCH(1,1)   AR(1)-FKS(6)-GARCHt(1,1)   AR(1)-FKS(10)-GJR-GARCH(1,1)   AR(1)-FKS(8)-GJR-GARCHt(1,1)
φ̂0              0.0629*** (0.0081)        0.0690*** (0.0022)         0.0352 (0.0809)                0.0511*** (0.0004)
φ̂y1             −0.0019 (0.0036)          −0.125*** (0.0017)         0.0020 (0.0032)                −0.0078*** (0.0013)
α̂1              0.1017*** (0.0017)        0.0829*** (0.0008)         0.0087 (0.0248)                0.0011*** (0.0002)
β̂1              0.8445*** (0.0031)        0.8860*** (0.0009)         0.8521*** (0.0047)             0.8662*** (0.0013)
γ̂1              −                         −                          0.1652*** (0.0090)             0.1611*** (0.0012)
v̂               −                         6.0209*** (0.0553)         −                              6.5987*** (0.1326)
η̂1              0.9462*** (0.0040)        0.9690*** (0.0012)         0.9433*** (0.0166)             0.9479*** (0.0011)
η̂2              0.9160                    0.9730                     0.9065                         0.9273
σ²               1                         1                          1                              1
κ̂(εt)           3.7379                    14.2575                    3.5347                         7.4092
Kur(ẑt)          6.0479                    6.8240                     5.5351                         6.0949
Skew(ẑt)         −0.5050                   −0.6031                    −0.4629                        −0.5300
Var(ẑt)          0.9995                    1.0004                     1.0006                         1.0084
Q^McL_50(ẑt²)    45.0184                   49.3360                    54.4749                        47.1558
AIC              27490                     26899                      27201                          26667
BIC              27649                     27036                      27396                          26841
HQ               27543                     26945                      27267                          26726
GCV              42.1119                   42.6718                    42.3357                        42.5657
QLIKE            0.8193                    0.8275                     0.7903                         0.7950
MSE              41.9327                   42.5150                    42.1147                        42.3681

Table 6.6: S&P500 (see tables 2.1, 2.2). Free-knot(K) spline-(GJR)-GARCH model with l = 3, $z_t \sim N(0, 1)$ and $z_t \sim St(0, 1)$. Robust standard errors (4.107), with analytic gradient and numerical Hessian, in parentheses. Number of knots evaluated in a range $K \in \{0, ..., 15\}$, and optimal $\hat{K}$ selected by HQ criterion. Model with K = 0 corresponds to a unit-GARCH model with constant unconditional variance. * p-value < 0.10, ** p-value < 0.05, *** p-value < 0.01


Figure 6.3: S&P500, 1980-2020. AR(1)-FKS(K)-(GJR)-GARCH(1,1) with l = 3 and coincident boundary knots (3.19). FKS(8)-GARCH(1,1) model with zt ∼ N (0, 1) (top left), FKS(6)-GARCHt (1,1) model with zt ∼ St(0, 1, v) (top right), FKS(10)-GJR-GARCH(1,1) with zt ∼ N (0, 1) (bottom left), FKS(8)-GARCHt (1,1) model with zt ∼ St(0, 1, v) (bottom right).


6.3 Out-of-sample forecast

Figure 6.4 illustrates the OOS forecast procedure applied. First, the S&P500 sample is split into an estimation period (01/02/1980-01/11/2017, T^(E) = 9340) and a validation period (01/12/2017-12/31/2020, V = 990), as reported in chapter 2.3. The models under consideration are the same 28 models as for the IS analysis, see section 6.2.


Figure 6.4: S&P500 sample separated into full, estimation, and validation periods. Spot prices p_t (top left figure), log-returns y_t (top right figure), squared log-returns y²_t (bottom left figure). On the bottom right, RV_t (5-minute frequency) and y²_t are compared within the validation period.

However, the applied forecasting schemes for the three model classes (standard GARCH, BS-GARCH, FKS-GARCH) differ for computational reasons.

• For the standard GARCH model, the so-called recursive expanding forecasting scheme (see chapter 4.4) is employed. For this, each of the four standard GARCH models is re-estimated at a daily frequency on the samples t = {1, ..., 9340}, t = {1, ..., 9341}, ..., t = {1, ..., 10270}, with the first date fixed. Here, T − J = 10270, where J = 60 is the largest included forecast step. The resulting estimated cumulative conditional variances (see (4.151))

    σ̂²_{9341:9341+j|9341}, σ̂²_{9342:9342+j|9342}, ..., σ̂²_{10270:10270+j|10270}

are evaluated with the QLIKE(j) (see (4.154)) and MSPE(j) (see (4.155)) loss functions. For the standard GARCH case, it holds that τ_t = 1 and, therefore, σ̂²_t = ĥ_t.

• The BS-GARCH models are evaluated with the modified recursive expanding scheme (see chapter 4.4). First, each of the models is estimated in a range of K ∈ {0, ..., 15} for the estimation period, where K = 0 refers to a standard GARCH model. The best model selected with HQ is chosen. Now, each of these twelve BS-GARCH models is re-estimated at a daily frequency. To keep the BS-GARCH models comparable to the FKS-GARCH models, the inner knots stay fixed and the right boundary knot shifts for each estimate to b = T^(E) + e. Therefore, the BS-GARCH models are estimated with a given knot vector that has equal intervals for [t_0, t_1], ..., [t_{K−2}, t_{K−1}] and an expanded interval [t_{K−1}, t_K = b]. Every 99 days, i.e., after each of the blocks

    t = {1, ..., 9340}, t = {1, ..., 9341}, ..., t = {1, ..., 9438}
    t = {1, ..., 9439}, t = {1, ..., 9440}, ..., t = {1, ..., 9537}
    ...
    t = {1, ..., 10231}, t = {1, ..., 10232}, ..., t = {1, ..., 10270}

each BS-GARCH model is re-estimated and re-evaluated within the range K ∈ {0, ..., 15}, and for the next 99-day interval, this procedure is repeated. This means that a daily re-estimation of all mean, GARCH, and spline parameters with given (expanded) knot vector is conducted. As V = 990, the re-estimation/re-evaluation procedure is conducted ten times until t = V − J.¹ As in the standard GARCH case, the resulting estimated cumulative conditional variances are evaluated with the QLIKE(j) and MSPE(j) loss functions.

• As the daily re-estimation of an FKS-GARCH model is computationally expensive (see table 5.3), the same procedure as for the BS-GARCH models is utilized here. After the initial estimation and evaluation of the best HQ model, the estimated knot vector is applied for a daily re-estimation with a BS-GARCH model (with given knot vector t̂ and given K̂) for a 99-day interval. As in the BS-GARCH case, the rightmost interval is expanded daily, i.e., b = T^(E) + e. The mean parameters, the GARCH parameters, and the spline parameters are, thus, re-estimated daily, and the estimated inner knots stay fixed for 99 days. After each 99-day interval, each FKS-GARCH model is re-estimated in a range K ∈ {0, ..., 15}, and the procedure is repeated until t = V − J.
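The daily expanding windows and the 99-day re-selection blocks described above can be enumerated in a few lines. The following Python sketch is illustrative only: the function name and the tuple layout are assumptions, and no model is actually estimated.

```python
def expanding_window_schedule(first_end=9340, last_end=10270, block=99):
    """Enumerate the daily expanding estimation windows t = {1, ..., end}.

    Returns a list of (window_end, block_index, reselect_K) tuples, where
    reselect_K marks the days on which K is searched again over {0, ..., 15}
    (the first day of every 99-day block). On all other days only the mean,
    GARCH, and spline parameters would be re-estimated with fixed inner knots.
    """
    schedule = []
    for i, end in enumerate(range(first_end, last_end + 1)):
        block_index = i // block
        reselect_K = (i % block == 0)
        schedule.append((end, block_index, reselect_K))
    return schedule


schedule = expanding_window_schedule()
block_starts = [end for end, _, reselect_K in schedule if reselect_K]
print(len(schedule), "daily windows,", len(block_starts), "re-selection dates")
print("first blocks start at", block_starts[:3], "and the last at", block_starts[-1])
```

With the window bounds quoted in the text, this yields ten re-selection dates, which matches the statement that the re-estimation/re-evaluation procedure is conducted ten times.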
To calculate the forecast loss functions, the estimated conditional variances are compared with the five-minute RV_t provided by the Oxford-Man Institute of Quantitative Finance (Heber et al., 2009). The lower right-hand panel of figure 6.4 compares the five-minute realized variances and the daily squared log-returns. The RV_t values are approximately of the same magnitude as the estimated conditional variances and, therefore, less noisy than y²_t. Table C.18 reports the results of the MSPE(j) and table C.21 the results of QLIKE(j). The best model from j = 30 upwards following the MSPE(j) loss function and the best model from j = 10 upwards following the QLIKE(j) loss function is the FKS-GJR-GARCH model with l = 2. This is not surprising, as this is the model which has the best IS forecast evaluation for both loss functions among all models (given the HQ-selected K̂). For the shorter forecast steps, the FKS-GJR-GARCHt with l = 1 (MSPE(j = 5, 10), QLIKE(j = 1)) and the FKS-GARCH model with l = 1 (MSPE(j = 1)) are the best models. Overall, no single competing model is better than the best FKS-GARCH model.

¹ The validation period was set to V = 1000. Since the Oxford-Man Institute data for RV_t covers only 990 observations in this period, the Refinitiv data was adjusted.

Regarding the MSPE(j), all FKS-GARCH models are better than the standard GARCH models at all forecast step lengths, except for the FKS-GARCH model with l = 2. With the QLIKE(j) loss function, there is no such strict pattern. Here, the GJR-GARCH model is the best standard GARCH model, and it outperforms most of the BS-GARCH models at most forecast step lengths. Nevertheless, most of the FKS-GARCH models are better than every competitor at most step lengths, even with the QLIKE(j). Figure 6.5 illustrates the loss functions of the best models for each class. Noteworthy at this point is that most of the competitors outperform the famous standard GARCH(1,1) and BS-GARCH(1,1) models. What is striking is that the best HQ-selected model within each class is always a GJR-GARCHt variant. Nevertheless, only among the BS-GARCH models and the FKS-GARCH models with l = 1 are these models the best OOS models following the MSPE(j). Regarding the QLIKE(j) evaluation, the GJR-GARCHt variant is always the best in forecasting one step ahead and almost always up to j = 5. For larger forecast steps, other variants are more accurate in terms of QLIKE(j).


Figure 6.5: S&P500 OOS forecast evaluation 01/12/2017-12/31/2020 (full validation period, V = 990). On the left-hand side: QLIKE(j). On the right-hand side: MSPE(j).
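As a rough illustration of the two loss functions shown in figure 6.5, the following Python sketch uses common textbook forms of QLIKE and MSPE with a volatility proxy such as the realized variance. The definitions (4.154) and (4.155) are not reproduced in this section, so the normalization of QLIKE, the function names, and the toy numbers are assumptions rather than the exact implementation used in the study.

```python
import math


def qlike(proxy, forecast):
    """Mean QLIKE loss in the normalized form x - log(x) - 1 with x = proxy / forecast.

    The loss is zero for a perfect forecast and penalizes under-prediction of
    the variance more heavily than over-prediction of the same relative size.
    """
    ratios = [p / f for p, f in zip(proxy, forecast)]
    return sum(x - math.log(x) - 1.0 for x in ratios) / len(ratios)


def mspe(proxy, forecast):
    """Mean squared prediction error between the proxy and the forecast."""
    return sum((p - f) ** 2 for p, f in zip(proxy, forecast)) / len(proxy)


# Toy example (hypothetical numbers): one proxy value, forecast too low vs. too high.
rv = [1.0]
loss_under = qlike(rv, [0.5])   # forecast is half of the proxy
loss_over = qlike(rv, [2.0])    # forecast is twice the proxy
print(f"QLIKE under-prediction: {loss_under:.4f}, over-prediction: {loss_over:.4f}")
print(f"MSPE under-prediction: {mspe(rv, [0.5]):.4f}, over-prediction: {mspe(rv, [2.0]):.4f}")
```

The asymmetry of QLIKE is one reason why the two loss functions can rank the same set of models differently.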

To test whether the difference in forecast accuracy (measured by QLIKE(j) and MSPE(j)) between the FKS-GARCH models and the competitors is statistically significant, the DM-test statistic (see (4.157)) is applied pairwise. For the two best competitors of the FKS-GJR-GARCH model with l = 2, tables 6.8 and 6.9 display the p-values of the DM-statistics. With the QLIKE loss function, the FKS-GJR-GARCH model outperforms the GJR-GARCH model significantly (at least at α = 0.10) up to j = 30 and the BS-GJR-GARCH model for j ∈ {5, 10}. With the MSPE loss function, only the one-step-ahead forecast is significantly better than those of the competitor models. Tables C.24 and C.27 list the p-values for each pairwise comparison of an FKS-GARCH model with the corresponding standard GARCH or BS-GARCH model.² A peculiarity is that every FKS-GARCH model significantly outperforms every related standard GARCH model (for both loss functions) at the one-step-ahead horizon. The same holds for the BS-GARCH models with MSPE(1), except for the BS-GARCH and BS-GJR-GARCHt models with l = 1. For the QLIKE loss function, the FKS-GARCH model is not significantly better than any BS-GARCH model, except for the BS-GJR-GARCHt model with l ∈ {1, 2}.

² E.g., the comparison of the GJR-GARCH model (here: model 1) with FKS-GJR-GARCH models in l ∈ {1, 2, 3} (here: model 2) or the comparison of the BS-GARCH model with l = 1 (here: model 1) with the FKS-GARCH model with l = 1 (here: model 2), and so on.

Model                    j = 1     j = 5     j = 10    j = 30    j = 60
BS-GJR-GARCHt (l = 1)    0.0354    0.7224    0.5600    0.4039    0.3952
GJR-GARCH                0.0000    0.1461    0.1894    0.1924    0.1935

Table 6.8: p-values of DM-test (see (4.157)) for S&P500 OOS forecast 01/12/2017-12/31/2020 (full validation period) with MSPE(j) loss functions, see (4.155). Both models are compared with the FKS-GJR-GARCH model with l = 2.

Model                    j = 1     j = 5     j = 10    j = 30    j = 60
BS-GJR-GARCH (l = 1)     0.4527    0.0219    0.0412    0.1129    0.1456
GJR-GARCH                0.0000    0.0043    0.0365    0.0913    0.1346

Table 6.9: p-values of DM-test (see (4.157)) for S&P500 OOS forecast 01/12/2017-12/31/2020 (full validation period) with QLIKE(j) loss functions, see (4.154). Both models are compared with the FKS-GJR-GARCH model with l = 2.

It should be stated that the validation period is relatively short and, thus, the variance is relatively high. Therefore, with some loss functions in some setups, the FKS-GARCH model achieves a higher predictive accuracy in absolute terms, but the differences to many forecasts of many reference models are not statistically significant. Nevertheless, there is an improvement over the standard GARCH models, as expected. Furthermore, the best FKS-GARCH model is significantly better than the best BS-GARCH model, at least for short forecast horizons.

The validation period is mainly characterized by a long tranquil volatility period (01/12/2017-01/31/2020) and a short, highly volatile period (02/01/2020-12/31/2020). The latter phase includes the start of the corona pandemic with the major stock market crash in March 2020. Therefore, these two phases were additionally analyzed separately. First, the high volatility period is highlighted. Figure 6.6 depicts the best models among each class within the high volatility period. The best model evaluated with QLIKE(j) is, again, the FKS-GJR-GARCH model with l = 2. Even though the magnitude of the loss function values is higher than in the full validation period, the FKS-GJR-GARCH model with l = 2 is much more accurate here than any of its competitors. With the MSPE(j), the FKS-GJR-GARCHt model with l = 1 is the best. A DM-test evaluates the pairwise best competitor models of each class, see tables 6.10 and 6.11. It is shown that the relevant FKS-GARCH model is significantly better for j = 1 throughout. Furthermore, even for the forecast horizon j = 5, three out of four loss function evaluations are significantly superior. Tables C.26 and C.29 show the p-values of the pairwise DM comparison for all FKS-GARCH models with the corresponding competitors. For both loss functions, the predictive accuracy for j = 1 is significantly better than that of the compared models (except for three BS-GARCH models with l = 1).
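The pairwise comparisons above rest on the Diebold-Mariano idea of testing whether the mean loss differential between two forecasts is zero. The following Python sketch shows a textbook one-sided variant with a rectangular-kernel long-run variance; it is not a reproduction of (4.157). The one-sided alternative (model 2 is more accurate than model 1), the lag choice h − 1, the function name, and the toy loss series are all assumptions.

```python
import math


def dm_test(loss_1, loss_2, h=1):
    """One-sided Diebold-Mariano test on two series of forecast losses.

    H0: equal expected loss. H1: model 2 has a lower expected loss than model 1.
    Returns the test statistic and the p-value 1 - Phi(statistic).
    """
    d = [a - b for a, b in zip(loss_1, loss_2)]   # loss differential
    n = len(d)
    d_bar = sum(d) / n

    def autocov(k):
        # Sample autocovariance of the loss differential at lag k
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Long-run variance with autocovariances up to lag h - 1
    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if long_run_var <= 0.0:
        raise ValueError("non-positive long-run variance estimate")

    stat = d_bar / math.sqrt(long_run_var / n)
    p_value = 0.5 * math.erfc(stat / math.sqrt(2.0))  # upper tail of N(0, 1)
    return stat, p_value


# Hypothetical loss series: the second model is better on every day.
loss_competitor = [1.0 + 0.1 * (i % 3) for i in range(60)]
loss_fks = [0.5 + 0.05 * (i % 2) for i in range(60)]

stat, p_value = dm_test(loss_competitor, loss_fks)
stat_rev, p_value_rev = dm_test(loss_fks, loss_competitor)
print(f"DM = {stat:.2f}, p = {p_value:.4f}; reversed roles: p = {p_value_rev:.4f}")
```

Under this one-sided convention, a p-value close to one indicates that the competitor (model 1) had the lower average loss, which is one way to read entries such as 0.9908 or 1.0000 in the tables of this section.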

Model                    j = 1     j = 5     j = 10    j = 30    j = 60
BS-GJR-GARCHt (l = 1)    0.0000    0.0534    0.1059    0.1515    0.1609
GARCH                    0.0015    0.0586    0.1230    0.1561    0.1568

Table 6.10: p-values of DM-test (see (4.157)) for S&P500 OOS forecast 02/01/2020-12/31/2020 (high volatility period) with MSPE(j) loss functions, see (4.155). Both models are compared with the FKS-GJR-GARCHt model with l = 1.

Model                    j = 1     j = 5     j = 10    j = 30    j = 60
BS-GARCHt (l = 2)        0.0001    0.1165    0.1486    0.1700    0.1840
GJR-GARCH                0.0001    0.0837    0.1262    0.1645    0.1809

Table 6.11: p-values of DM-test (see (4.157)) for S&P500 OOS forecast 02/01/2020-12/31/2020 (high volatility period) with QLIKE(j) loss functions, see (4.154). Both models are compared with the FKS-GJR-GARCH model with l = 2.

In the low volatility period, the difference between the models is not as clear as during high volatility. Nevertheless, an FKS-GARCH model is the best performing model for j ∈ {30, 60} here. For shorter forecast steps within the low volatility period, the absolute difference is negligible; see figure 6.7. The DM statistic between the best performing models for the QLIKE loss function (see table 6.13) reveals a significant superiority of the FKS-GJR-GARCH model with l = 2 for j ∈ {5, 10, 30}. However, as the right-hand panel of figure 6.7 reveals, the BS-GJR-GARCH model has a lower MSPE(j) up to j = 30 than the FKS-GJR-GARCH model. Therefore, the null hypothesis of the DM-test cannot be rejected for the MSPE(j) in that case. Nevertheless, the predictive accuracy of the FKS-GJR-GARCH model with l = 2 is significantly better than that of the standard GARCH model for the one-step-ahead horizon. Tables C.25 and C.28 exhibit no superiority of the FKS-GARCH models over the corresponding competitors in the low volatility period.

To sum up, especially for the high volatility period, the FKS-GARCH model improves the forecast for shorter horizons significantly and also has noticeably lower loss function values for larger j than the competitors. For periods with low volatility, the comparative advantage diminishes.

Model                    j = 1     j = 5     j = 10    j = 30    j = 60
BS-GJR-GARCH (l = 1)     0.9908    0.9719    0.9173    0.5496    0.1764
GARCH                    0.0000    0.1222    0.1763    0.0707    0.1130

Table 6.12: p-values of DM-test (see (4.157)) for S&P500 OOS forecast 01/12/2017-01/31/2020 (low volatility period) with MSPE(j) loss functions, see (4.155). Both models are compared with the FKS-GJR-GARCH model with l = 2.

Model                    j = 1     j = 5     j = 10    j = 30    j = 60
BS-GJR-GARCH (l = 2)     1.0000    0.0072    0.0121    0.0690    0.1385
GARCH                    0.0000    0.0086    0.0992    0.1016    0.1341

Table 6.13: p-values of DM-test (see (4.157)) for S&P500 OOS forecast 01/12/2017-01/31/2020 (low volatility period) with QLIKE(j) loss functions, see (4.154). Both models are compared with the FKS-GJR-GARCH model with l = 2.


Figure 6.6: S&P500 OOS forecast evaluation 02/01/2020-12/31/2020 (high volatility period, V = 226). On the left-hand side: QLIKE(j). On the right-hand side: MSPE(j).


Figure 6.7: S&P500 OOS forecast evaluation 01/12/2017-01/31/2020 (low volatility period, V = 764). On the left-hand side: QLIKE(j). On the right-hand side: MSPE(j).


7 Conclusion

This chapter discusses and reflects on the contribution and the research results of this dissertation. The S-GARCH model of Engle and Rangel (2008) is the basis for the contributions in this dissertation. Like other models with a multiplicatively decomposed conditional variance (see Amado et al. (2018) for a comprehensive summary), the S-GARCH model mitigates a major drawback of standard GARCH models: the VP in a near unit-root region, i.e., η̂_1 ≈ 1. Here, the favorable properties of a spline function to adapt to the structure of several types of data come into play. The spline function smooths the long-term volatility. Therefore, the unconditional variance of the S-GARCH model is no longer constant. However, with the S-GARCH model, the knots a = t_0, ..., t_K = b are equally spaced and given in advance. As a result, the knots are placed at arbitrary locations, even where the data is already smooth.

This chapter is organized as follows. Section 7.1 reviews the problems associated with the free estimation of knots in a spline-GARCH context and looks at the contributions of this dissertation to solving them. Section 7.2 highlights the key results of this dissertation and answers the initial research questions. Section 7.3 reports some of the limitations of the dissertation and highlights some potential future research. The dissertation ends with some concluding remarks in section 7.4.

© The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2022 O. Old, Modeling Time-Varying Unconditional Variance by Means of a Free-Knot Spline-GARCH Model, Gabler Theses, https://doi.org/10.1007/978-3-658-38618-4_7

7.1 Research problems and contributions

The main contribution of this dissertation was to estimate the knots of the S-GARCH model as free parameters. The idea was that placing the knots at optimal locations (where the data is not smooth) improves the IS and the OOS performance of the model. Still, previous research in the cited literature revealed for the cross-sectional data domain that the estimation of knots as free parameters is associated with some difficulties, which chapter 4 investigated:

• The "lethargy problem", see chapter 4.2.2.
• Estimating all parameters jointly or separately, see chapter 4.2.4.
• The choice of the objective function, see chapter 4.2.
• The choice of the spline basis function, see chapter 3.2.
• Identification of the estimators, see chapter 4.2.4.
• Finding an appropriate starting vector for the optimization routine, see chapter 4.5.

These difficulties had to be explored first in the context of time-series models. Furthermore, a suitable adaptation for the spline-GARCH model class had to be found.

The log-transformation of the knots, first proposed by Jupp (1978) and later adjusted by Lindstrom (1999), alleviated the "lethargy problem". This log-transformation penalizes the proximity of adjacent knots, with the penalty growing to infinity for coincident knots. The Jupp-transformed knots are parameters for the subsequent optimization of the knot locations. Through the application of Jupp-transformed knots, optimizing the knot locations becomes an unconstrained optimization problem. Furthermore, the transformed values are (mostly) on the scale [10^−2, 10^1], like the other parameters. Therefore, the parameter vector consists of the parameters of the conditional mean, the conditional variance equation, the spline function, and the Jupp-transformed knots. All equations are non-linear in the parameters (except if the mean is an AR(U) or the conditional variance is an ARCH(P) process). Furthermore, the Fisher information applies only asymptotically and is only block-diagonal for the parameters of the conditional mean and the conditional variance equations, see chapter 4.2.4. Therefore, a joint estimation of the entire parameter vector is required to obtain consistent estimators. Due to the non-linearity in the parameters and the beneficial asymptotic properties, the QML method is applied.

• This dissertation provided the analytical first derivatives for the proposed FKS-GARCH model as well as for the S-GARCH, BS-GARCH, and PS-GARCH models, see chapter 4.2.4 and appendix B. Furthermore, the asymptotic properties for the QMLE of the FKS-GARCH model are derived, see chapter 4.2.4.

The local identification of the parameters requires a Fisher information of full rank. If multiple knots (nearly) coincide, then the corresponding columns of the Fisher information matrix are linearly dependent. Employing B-spline basis functions instead of truncated basis functions is preferable due to the Schoenberg-Whitney theorem, see 3.2.2. If the conditions of the Schoenberg-Whitney theorem are fulfilled, the spline parameters are only affected by linearly dependent Fisher information columns if all knots within at least one B-spline basis interval coincide (design matrix is non-singular).
With equidistantly distributed knots in combination with B-spline basis functions, this problem cannot occur. However, with freely estimated knots, multiple knots could appear. Therefore, it is reasonable to opt for the B-spline basis function, as truncated power basis functions (as employed for the S-GARCH model) are nearly linearly dependent if at least two knots are very close to each other.

• The FKS-GARCH model consists of B-spline basis functions and is not restricted to a pre-defined degree. Therefore, the FKS-GARCH model can estimate processes with different smoothness requirements.

A broad area within free knot estimation research is finding a good starting vector for optimizing knot locations, see chapter 4.5. To the best of my knowledge, there are currently no procedures available for determining starting vectors for free knot estimation in the time-series data domain.

• This dissertation proposed three starting knot vector procedures: the equidistant, the left-right-central (LRC), and the modified Luo-Kang-Yang (LKY) starting vector.

Dierckx (1993, p.67) assessed the equidistant starting vector as a poor choice. Still, the equidistant starting vector is computationally fast and easy to determine. The LRC starting vector is determined with an ad hoc procedure: the equidistant starting vector is shifted once by half an interval length to the left and once to the right, and the candidate with the lowest SSR is chosen. A more elaborate procedure is the modified LKY. It builds on the so-called unimodality property of splines. Here, all inactive knots are sorted out until the best starting vector is found. This procedure is noticeably slower (see 4.1) but more accurate than the other two, particularly for complex data structures.

Based on these findings, the simulation study and the empirical study in chapters 5 and 6 are conducted.
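The role of the knot transformation discussed in this section can be made concrete with a short Python sketch. It uses the log-ratio form commonly attributed to Jupp (1978), κ_i = log((t_{i+1} − t_i)/(t_i − t_{i−1})) for the inner knots; the adjustment by Lindstrom (1999) is not included. The function names and the example knot vector are assumptions made for illustration.

```python
import math


def jupp_transform(knots):
    """Map an increasing knot vector [a, t_1, ..., t_{K-1}, b] to unconstrained values.

    Each inner knot t_i is replaced by the log-ratio of its two neighbouring
    interval lengths. As two adjacent knots approach each other, the
    corresponding value diverges to plus or minus infinity.
    """
    return [
        math.log((knots[i + 1] - knots[i]) / (knots[i] - knots[i - 1]))
        for i in range(1, len(knots) - 1)
    ]


def jupp_inverse(kappa, a, b):
    """Recover the knot vector on [a, b] from the transformed values."""
    # Interval lengths relative to the first interval
    relative = [1.0]
    for k in kappa:
        relative.append(relative[-1] * math.exp(k))
    scale = (b - a) / sum(relative)   # length of the first interval
    knots = [a]
    for r in relative:
        knots.append(knots[-1] + scale * r)
    knots[-1] = b                     # remove rounding drift at the boundary
    return knots


# Hypothetical knot vector with three inner knots on [0, 10]
knots = [0.0, 1.0, 3.0, 4.0, 10.0]
kappa = jupp_transform(knots)
recovered = jupp_inverse(kappa, knots[0], knots[-1])
print("transformed:", [round(k, 4) for k in kappa])
print("recovered:  ", [round(t, 4) for t in recovered])
```

Any real-valued vector of transformed values maps back to a strictly increasing knot vector inside [a, b], which is why an optimizer can update the transformed values without ordering or boundary constraints.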


7.2 Research questions

The first research question addressed the finite sample properties of the QMLE of spline-GARCH models in general and FKS-GARCH models in particular. For the evaluation of the finite sample properties of the S-GARCH model, chapter 5.1 discussed and reviewed several studies. A large simulation study was carried out to evaluate the finite sample properties of the FKS-GJR-GARCH model, as chapter 5.4 reported. For this, a BS-GJR-GARCH model was simulated for different knot vectors in a range of K_0 ∈ {0, 2, 5, 9, 11}, where K_0 = 0 corresponds to a standard GJR-GARCH model. Each DGP was replicated 1000 times for four different time series lengths, T ∈ {2500, 5000, 10000, 20000}, once with normally distributed and once with Student's-t distributed z_t. Thus, 40000 different paths were simulated and estimated. The analysis of the finite sample properties only assessed the models with the correct number of knots (i.e., K_0 = K̂). In the first part of the analysis, the replications of a correctly specified model (Gaussian simulations and Gaussian likelihood) were estimated. The simulation study with the correctly specified likelihood was useful to study the behavior of the estimators in a perfect environment. Here, the theoretically derived assumptions could be confirmed in principle. However, in real-world applications, the data-generating model and the data-generating distribution are unknown. Therefore, the simulation study with a misspecified model (Student's-t simulations and Gaussian likelihood) is more realistic than the correctly specified variant. These analyses revealed:

• The more knots were included, the greater were the bias and variance of the GARCH estimators.
• The estimators were consistent.
• The moving average estimators α̂_1 (for all simulated knot vectors) were asymptotically normally distributed, without limitation.
• The β̂_1 and γ̂_1 estimators were asymptotically normally distributed for all models with up to four inner knots. For the two variants with K ∈ {9, 11}, there was a small number of (asymptotically disappearing) outliers.

The estimators' bias and variance in the misspecified study were larger than in the correctly specified simulation setup but disappeared asymptotically. Furthermore, the GARCH estimators were also asymptotically normally distributed but with a slower convergence rate. The latter means that even in the largest sample (T = 20000), there were a few outliers. This fact should be taken into account by practitioners.
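The size of the simulation design quoted above follows directly from the factor levels. A minimal Python sketch of the design grid, with variable names chosen only for illustration:

```python
from itertools import product

# Factor levels as stated in the text
knot_counts = [0, 2, 5, 9, 11]               # K_0, where 0 is the standard GJR-GARCH case
sample_sizes = [2500, 5000, 10000, 20000]    # T
innovations = ["normal", "student_t"]        # distribution of z_t
replications = 1000

designs = list(product(knot_counts, sample_sizes, innovations))
total_paths = len(designs) * replications
print(len(designs), "design cells,", total_paths, "simulated paths")
```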

The second research question dealt with the accuracy of the estimated knot locations. In principle, this is the pivot of the FKS-GARCH model. Only if the proposed algorithms lead to accurate results are the basic assumptions of the model correct. The simulation study in chapter 5 addressed this question. To examine this issue, all knot vectors t_0 were distributed non-equidistantly. As chapter 4.5 discussed, the accuracy of the estimated knot locations depends on the starting vector utilized. Therefore, the answer to this question is twofold. On the one hand, there has to be a starting vector t^0 that deviates as little as possible from the true knot vector t_0. On the other hand, even with the best t^0, there must be an improvement through the application of the selected optimization method, i.e., the Newton-Raphson procedure with BFGS update and jointly estimated Jupp-transformed knots.

The proposed modLKY algorithm 1 indeed succeeded in reducing the starting vector bias remarkably. Furthermore, as expected, the closer t^0 was to t_0, the smaller was the estimated bias.¹ However, when estimating only one inner knot (i.e., K = 2), the choice of t^0 did not matter. Moreover, in that case, the estimated knot vector was consistent with a fast convergence rate. This changed when there was more than one inner knot. With a moderate number of inner knots (K ∈ {5, 9}), the equidistant starting vector was a poor choice. However, as the number of inner knots further increased, more knots were in the vicinity of the equidistant knot vector.² Nevertheless, w.r.t. the data, the equidistant knot vector places knots at arbitrary locations, and DGPs with equidistantly distributed knots are implausible. Therefore, the modLKY outperformed the equidistant one for all K (except K = 2, as discussed above). Thus, the simulation and the empirical studies utilized the replication routine with algorithm 3, where modLKY was the first try. The resulting estimated knot locations are less biased throughout than with the equidistant starting vector. The analysis of different starting vectors demonstrated that the more knots were estimated, the greater the resulting bias was.

• The estimated knot locations were consistent with the applied algorithm, but the bias depended on the number of knots and on the starting knot vector applied.

For the true knot vectors, r_i = 1 held, i.e., there were no sites ξ_i with multiple knots. Therefore, the estimated knots should also not coincide. Starting the estimation procedure with the true knot vector always resulted in a t̂ in the vicinity of t_0, and no coincidences appeared. However, the determination of the starting vector has a random component, see chapter 4.5. Hence, every starting vector was different (except the equidistant one). Therefore, there was a variation in the accuracy of the employed starting vectors, which the simulation study in chapter 5 detailed. Moreover, for some t̂ with K ∈ {5, 9, 11}, multiple knots appeared, in particular in the misspecified case. This directly affected the estimated Fisher information, which was no longer of full rank. Hence, the knot locations were not identifiable. As long as the Schoenberg-Whitney theorem (see 3.2.2) was met, this had no further consequences for the spline parameters. Furthermore, as long as the remaining columns of the estimated Fisher information were linearly independent, the resulting standard errors of the corresponding parameters were reliable.

In the subsequent empirical analysis, the estimated Fisher information matrices for the FKS-GARCH models with l ∈ {1, 2} were of full rank. However, for the FKS-GARCH models with l = 3, none of the estimated Fisher information matrices was of full rank. This is congruent with the simulation study, which employed only B-spline basis functions with l = 3. For B-spline functions with l = 1, multiple knots would lead to discontinuous functions of the first derivative and for l = 2, to discontinuous functions of the second derivative. In both cases, the 'likelihood' that the DGP has multiple knots is small. To account for possible structural breaks, most researchers opted for a B-spline basis with l > 2 (see Lindstrom (1999); Gervini (2006); Audrino and Bühlmann (2009, inter alia)). Therefore, it is conjectured that multiple knots prevent convergence for B-spline basis functions with l < 3, but not for l = 3. Here, the algorithm is more flexible and able to adapt to (true) structural breaks. Nevertheless, B-spline basis functions with l < 3 are also vulnerable to local optima and saddle points, as Jupp (1978) demonstrated and as the simulation and the empirical studies in this dissertation revealed.

• With B-spline basis functions of degree l = 3, the knot locations tend to coincide. However, even with l = 3, the other parameters are still identifiable and consistent.

¹ Before conducting the simulation study, a pre-study with the true knot vector as starting vector was made. Here, it was shown that the bias was negligibly small. These results are not reported in this dissertation.
² Given the assumption that the true knots are distributed over the whole range of the data. For functions with extremely one-sidedly distributed knots, as in the case of the Doppler function, for example, the equidistant starting vector is an inappropriate choice even for a high number of knots.

The third question asked whether the discussed model selection criteria (see chapter 4.3) could determine the true K0 and if these criteria could distinguish a process with constant unconditional variance from a process with time-varying unconditional variance. Especially for the latter, sometimes a step-wise variable selection was applied, see Amado and Teräsvirta (2013, inter alia). This question was directly related to the previous one. If the selected ˆ differs from the true number of knots K0 , then the accuracy also suffers. number of knots K For their S-GARCH model, Engle and Rangel (2008) recommended employing the BIC. The BIC penalizes complex model structures more than the other discussed criteria, see chapter 4.3. Nevertheless, to the best of my knowledge, the BIC selection for the S-GARCH model has not been investigated before. Therefore, the simulation study in chapter 5 investigated this issue. • For the S-GARCH model, a distinction between a process with constant unconditional variance and a process with time-varying long-term volatility could be made with the ˆ estimated with the BIC were biased and not consistent, although BIC. However, the K the selection with the BIC was more accurate than with the other three model selection criteria. The following simulation study results refer to the estimations with the FKS-GARCH model. • For the application of the FKS-GARCH model, the choice of the BIC was rather unsuitable. In particular for small T , the BIC underestimated K0 noticeably. Nevertheless, the BIC was consistent and could distinguish between a constant and a time-varying unconditional variance. • The AIC, on the other hand, overestimated K0 , especially for a small number of knots. Furthermore, no distinction between a constant and a time-varying unconditional variance was made with the AIC and the GCV. • The GCV systematically overestimated K0 in all simulation setups. This is due to the MSE related structure of the GCV. 
• The HQ, in turn, could distinguish these processes, was robust for small and large K, and was consistent. If the likelihood was misspecified, the BIC was good if the true number of knots was small.
• The HQ was the best choice regarding all conducted simulations and the related estimations. Thus, the empirical part of this dissertation employed the HQ. However, it is recommended to consider the other options in cases of doubt.

For the real-world example in chapter 6, $K_0$ was unknown. Nevertheless, knots were at least placed at locations associated with structural breaks, such as the financial crisis. With the BIC, the GARCH model with constant unconditional variance was selected in four out of the twelve FKS-GARCH model estimates. With the AIC, on the other hand, the selected $\hat{K}$ was at least as large as with the HQ. Moreover, in most cases, the AIC-selected $\hat{K}$ was larger than with the BIC and HQ. Strikingly, the $\hat{K}$ selected with the AIC was even at least as large as with the GCV. That was not necessarily to be expected considering the results in the simulation study. Furthermore, it is noticeable that none of the model selection criteria selected the maximum number of knots (in the range they were searched) K = 15.
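The different behavior of the criteria follows from how strongly each one penalizes additional parameters. The following is a minimal sketch, assuming the common textbook forms of the AIC, BIC, and HQ (the exact scaling used in chapter 4.3 may differ) and assuming, purely for illustration, that each additional knot adds two parameters to a model with five base parameters. The function names and the log-likelihood values in the usage note are hypothetical.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Textbook AIC, BIC and HQ (smaller is better).

    Assumption: standard forms; the thesis may scale them differently.
    """
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    hq = -2.0 * loglik + 2.0 * n_params * math.log(math.log(n_obs))
    return {"AIC": aic, "BIC": bic, "HQ": hq}


def select_number_of_knots(logliks_by_k, n_obs, base_params=5, params_per_knot=2):
    """Return, per criterion, the number of knots K that minimizes it.

    logliks_by_k maps a candidate K to the maximized log-likelihood.
    base_params and params_per_knot are illustrative assumptions.
    """
    selected = {}
    for name in ("AIC", "BIC", "HQ"):
        def criterion(k, name=name):
            n_params = base_params + params_per_knot * k
            return information_criteria(logliks_by_k[k], n_params, n_obs)[name]
        selected[name] = min(logliks_by_k, key=criterion)
    return selected
```

With hypothetical log-likelihoods such as `{0: -14000.0, 1: -13980.0, 2: -13970.0, 3: -13964.0, 4: -13961.0, 5: -13959.5}` and T = 10000, the per-parameter penalties are ordered AIC < HQ < BIC, so the AIC can never select fewer knots than the HQ, which in turn can never select fewer than the BIC. This ordering is consistent with the empirical pattern described above.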

The fourth research question looked at the ability of spline-GARCH models to mitigate a VP near one. Since the simulation study predetermined the VP, the empirical part of this dissertation investigated this question. For this, the standard GARCH models, BS-GARCH models, and FKS-GARCH models were estimated with the S&P500 index. All models were estimated with symmetric and asymmetric as well as with Gaussian and Student’s-t likelihood functions. Furthermore, since the spline-GARCH models were estimated with l ∈ {1, 2, 3}, there were 28 models under examination. The estimated VP for the standard GARCH models was considered as a benchmark for this research question.
• As expected, the estimated VPs for the four standard GARCH models were in a nearly non-stationary range of [0.9788, 0.9935].
• For all 24 spline-GARCH models considered, the estimated VP was mostly (remarkably) smaller than the minimum $\hat{\eta}_1 = 0.9788$ of the standard GARCH cases (except for the FKS(3)-GARCHt model with l = 1).
• The empirical study in Old (2020) and the empirical study in this dissertation revealed that, at least for the S&P500 index, the VP decreased with an increasing number of knots.
• Utilizing a GJR-GARCH model for the short-term volatility lowered the VP more than utilizing a symmetric GARCH model.
• With a Gaussian likelihood, a more substantial decrease was observed than with a Student’s-t likelihood.
• The decrease of the VP with the free knot estimation was more substantial than with equidistant knots, having the same K. As with the HQ and the BIC, models with fewer included basis functions were often preferred; the decrease was generally in the same range.
• For most FKS-GARCH models, the decrease was statistically significant. However, for the GJR-GARCH variants with Gaussian likelihood, three out of six drops in VP were not statistically significant due to a large rse($\hat{\eta}_1$).
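To give a feeling for what a VP in the range [0.9788, 0.9935] means, the persistence can be translated into a half-life, i.e., the number of periods after which a deviation of the forecast variance from its long-run level has halved. The sketch below assumes the usual definitions for a GARCH(1,1) (VP = α + β) and for a GJR-GARCH(1,1) with symmetric innovations (VP = α + β + γ/2); the thesis may parameterize the VP differently, and the function names are hypothetical.

```python
import math


def volatility_persistence(alpha, beta, gamma=0.0):
    """VP of a GARCH(1,1); with gamma != 0, of a GJR-GARCH(1,1).

    Assumption: symmetric innovations, so the asymmetry term enters with 1/2.
    """
    return alpha + beta + 0.5 * gamma


def half_life(vp):
    """Periods until a shock to the variance forecast has decayed by half."""
    if not 0.0 < vp < 1.0:
        raise ValueError("half-life is only defined for 0 < VP < 1")
    return math.log(0.5) / math.log(vp)
```

Under these assumptions, the benchmark range reported above corresponds to half-lives of roughly 32 trading days at the lower bound and roughly 106 trading days at the upper bound, which illustrates why even a moderate reduction of the VP matters for forecasting.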

The fifth research question focused on the IS and the OOS accuracy of the estimated conditional variances $\hat{\sigma}_t$ utilizing the S&P500 sample. The accuracy was measured by the loss functions QLIKE and MS(P)E.
• Both loss functions revealed that the 24 models with a time-varying unconditional variance outperformed those with a constant unconditional variance from the IS perspective. Since spline functions adjust to the long-term trend and reflect local characteristics of the time series, this result was expected. Nevertheless, there was a difference between models with equidistant and models with freely estimated knots. For the case with l = 1, the IS accuracy of BS-GARCH models was slightly better than that of FKS-GARCH models.
• For the cases with l ∈ {2, 3}, the FKS-GARCH models outperformed all competitors. The best IS model out of all 28 models examined was the FKS-GJR-GARCH model with l = 2.

The OOS performance of a volatility model is of great relevance to practitioners and researchers. Engle and Patton (2007) stated that a “volatility model should be able to forecast volatility”. Furthermore, the forecast ability of standard GARCH models was one of the groundbreaking properties of this model class, see Hansen and Lunde (2005). Nevertheless, due to the long memory pattern of standard GARCH models (high VP), the approach to the long-term (unconditional) variance is slow. Furthermore, squared returns from the distant past impact the forecast, which is rather unfavorable and arbitrary. From another perspective, the local volatility regime at the end of the estimation period has too little influence on the forecast. With spline-GARCH models, the end of the estimation period is locally adapted. Therefore, $\hat{\tau}_{T^{(E)}}$ represents the volatility regime at time $t = T^{(E)}$. Furthermore, the mean reversion is faster than in the standard GARCH case, as spline-GARCH models lower the VP. The unconditional variance $\hat{\tau}_{T^{(E)}}$ is considered as the average variance at the end of the estimation period. Hence, it can be assumed that this average value will also apply to the near future. Thus, the loss is less than with a standard GARCH model. Chapter 4.4 examined these theoretical considerations.
• All 24 spline-GARCH models (BS-GARCH and FKS-GARCH) with both loss functions (QLIKE and MSPE) produced significantly better predictions than the standard GARCH models for the one-step ahead horizon. The DM-test statistic verified the significance. Moreover, the absolute values of the loss functions were smaller than those of the standard GARCH models, even for longer forecast horizons.
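The mean-reversion argument above can be illustrated with a short sketch. It assumes, as is common for multiplicatively decomposed models, a unit-mean GARCH(1,1) short-term component g, so that the j-step-ahead variance forecast is τ · (1 + VP^(j−1) · (g_{T+1} − 1)) with the long-term component τ held at its last estimated value. The function and argument names are hypothetical and not taken from the thesis.

```python
def spline_garch_forecast(tau_last, g_next, vp, horizon):
    """Variance forecasts for j = 1, ..., horizon.

    tau_last : long-term (unconditional) variance at the end of the
               estimation period, held constant over the forecast horizon
    g_next   : one-step-ahead forecast of the unit-mean short-term component
    vp       : volatility persistence of the short-term component
    Assumption: GARCH(1,1)-type short-term dynamics with E[g] = 1.
    """
    if not 0.0 <= vp < 1.0:
        raise ValueError("vp must lie in [0, 1)")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return [tau_last * (1.0 + vp ** (j - 1) * (g_next - 1.0))
            for j in range(1, horizon + 1)]
```

In this sketch, the forecast starts at τ · g_{T+1} and decays geometrically toward τ. A lower VP makes the forecast return to the local level τ faster, which is the mechanism the paragraph above attributes to spline-GARCH models.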
The FKS-GARCH model suggests that free knot estimation represents the local properties of the time series better than the standard GARCH model and better than spline-GARCH models with equidistantly distributed knots. With the proposed model class, the knots should automatically be placed where the data are not smooth and, therefore, enhance $\hat{\tau}_{T^{(E)}}$. This, in turn, would improve the OOS forecast. Therefore, besides the superiority of spline-GARCH models over standard GARCH models in general, I examined whether freely estimated knots improve prediction over models with equidistant knots. In order to not only view this globally over the entire validation period, the validation period was split. Thus, the highly volatile phase of the corona pandemic from 01/31/2020 to 12/31/2020 was explicitly investigated.
• Most FKS-GARCH models surpassed most BS-GARCH models in absolute values in most forecast horizons with both loss functions.
• The best FKS-GARCH model was statistically significantly better than the best BS-GARCH model for j = 1 (MSPE) or j ∈ {5, 10} (QLIKE).
• The comparative advantage of the FKS-GARCH model was striking in the considered high volatility period. For both loss functions, the comparisons with the related models preferred the FKS-GARCH models significantly here.
• In the low volatility period, the benefit of free knot estimation was not clear. However, this was comprehensible since there were no large structural breaks or different regimes in those tranquil phases to which a spline function must be adapted. Therefore, in low volatility periods, the application of a spline-GARCH model has no advantage over standard GARCH models.
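The forecast comparisons above rest on two ingredients: a loss function evaluated per forecast and a DM test on the loss differential. The following sketch assumes the normalized QLIKE form (proxy/forecast − log(proxy/forecast) − 1), which may differ from the definition in chapter 4.4 by constants, and a DM statistic with a rectangular-kernel long-run variance using j − 1 autocovariances and a normal approximation. All function names are hypothetical.

```python
import math


def qlike(proxy, forecast):
    """Normalized QLIKE loss; both arguments must be strictly positive."""
    ratio = proxy / forecast
    return ratio - math.log(ratio) - 1.0


def squared_error(proxy, forecast):
    """Squared prediction error; its sample mean is the MSPE."""
    return (proxy - forecast) ** 2


def diebold_mariano(losses_a, losses_b, horizon=1):
    """DM statistic and two-sided p-value (normal approximation).

    A negative statistic means model A has the smaller average loss.
    Assumption: rectangular kernel with horizon - 1 autocovariances.
    """
    if len(losses_a) != len(losses_b):
        raise ValueError("loss series must have equal length")
    d = [a - b for a, b in zip(losses_a, losses_b)]
    n = len(d)
    mean_d = sum(d) / n

    def autocov(lag):
        return sum((d[t] - mean_d) * (d[t - lag] - mean_d)
                   for t in range(lag, n)) / n

    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0.0:
        raise ValueError("non-positive long-run variance estimate")
    stat = mean_d / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

The sketch also makes the sample-size issue visible: the statistic scales with the square root of the number of forecasts, so a short validation period can leave sizeable differences in average loss statistically insignificant.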

7.3 Limitations and future research

The proposed FKS-GARCH model showed irregularities for some paths with degree l = 3. They appeared due to knot multiplicities. Even though the other parameters were not affected, structural breaks were indicated where no structural breaks occurred, especially in the misspecified simulations and the empirical example. Therefore, the IS and the OOS results of the FKS-GARCH models with l = 3 fell short of the other FKS-GARCH models. Nevertheless, modeling a higher degree B-spline function has advantages over a lower degree one. This is because of smoothness requirements and more continuous derivatives, see equations (3.14), (3.22), (3.23). Hence, the applied FKS-GARCH model with an unconstrained optimization routine did not deliver reliable estimates of all knot locations for B-spline basis functions with l > 2 and K > 5. An obvious remedy would be to require the knot distances to exceed a pre-specified value, $\zeta_i > c$, while remaining in the unconstrained optimization framework. On the one hand, that would lead to fully identifiable knot locations. On the other hand, that procedure would suppress the identification of structural breaks when they occur, notably beyond the penalty of the Jupp transformation. Another remedy could be the Lindstrom (1999) approach with an additional penalty term, which is not part of the optimization routine. Her method has not been applied in the time-series context so far and is left to future research.

The applied McLeod-Li Portmanteau test is not tailored to standard GARCH processes or time-varying GARCH processes. However, it provides reliable results for testing the autocorrelation of squared return series. Since the dissertation did not focus on post-sample analyses, this point was excluded from the start. Other Portmanteau tests were, therefore, not dealt with, as their results were not central to this dissertation.
Nevertheless, it should be mentioned that it is common for standard GARCH processes to use the Li and Mak (1994) test. For the time-varying unconditional variance, a test of Patilea and Raïssi (2014) has recently been proposed. For both, I would like to refer to the literature.

There is a trade-off between the length and characteristics of the estimation period and the length of the validation period. The full sample period for the empirical study was chosen with respect to the simulation results, see chapter 5. For this, the minimum length should be T ≈ 10000. Choosing a more extended period would lead to better estimates. However, this would also include data that are not reasonable for the forecast. For example, the 1970s were much less volatile than the 2000s and 2010s. Consequently, the chosen validation period was relatively short to keep the estimation period at a reasonable length. Therefore, even where the absolute values of the loss functions were remarkably smaller for some forecast horizons, no statistical significance was found for some j. The superiority of spline-GARCH models in general and of the FKS-GARCH model in particular was shown, but this lack of significance would be mitigated if the chosen forecast period were longer. Then some of the notable differences would also be statistically significant.

The S&P500 index was sampled at an equally spaced daily frequency t (daily closing price). This is associated with problems with the microstructure of the financial market that are not considered; see chapter 2.2. For the mean process, an AR(1) model was selected from the class of ARMA models with maximum order U = 1 and V = 1 for practical reasons. In addition, for short-term volatility, the best of two different models, namely the symmetric GARCH(1,1) and the asymmetric GJR-GARCH(1,1) models, was chosen. The order P = 1 and Q = 1 for the GARCH(1,1) model followed the recommendations of Hansen and Lunde (2001, 2005) and was adopted for the GJR-GARCH(1,1) model for the sake of consistency. All these choices were made to focus on modeling long-term volatility. Nevertheless, there are good reasons to review each of these items.

First of all, as there is a difference in the behavior of different types of financial assets, future research should investigate several types of financial assets. Thus, single stocks, exchange rates, interest rates, or inflation rates should be considered besides stock market indices.

The second point concerns the sampling frequency. As discussed above, sampling daily closing prices is arbitrary. Sampling in an equidistant grid leads to some neglected market microstructures, see chapter 2.2. Still, a higher frequency leads to more information about the process, cf. Engle (2000, inter alia).

The third point deals with the mean process. In financial econometrics, the mean is often ignored, i.e., $y_t = \varepsilon_t$, or a simple linear model is applied (as is done here), cf. Andersen et al. (2009). This, however, does not necessarily correspond to the DGP. The same holds for the short-term volatility process. Even if the P = 1 and Q = 1 order is superior with standard GARCH models, this need not hold for models with a time-varying unconditional variance. Therefore, it is recommended to verify different models and different orders for the mean and the short-term variance process.

The application of the FKS-GARCH model improved the IS and OOS results, as discussed above. However, this model is not automatically better than every competitor, regardless of the researcher’s or practitioner’s contributions. Therefore, it is recommended to use the modLKY procedure to find a reliable starting vector. Furthermore, this dissertation recommends employing the HQ for the choice of $\hat{K}$. Both the modLKY and the HQ proved to be robust in several conditions.
Nevertheless, both have to be evaluated for each use case. Moreover, it is left to future research to find more efficient and accurate procedures for model selection and starting vectors. For now, it is recommended to apply the tested model selection criteria and the three starting vector procedures, as they obtained good results, see chapter 5.

Another important issue is the dynamics of the OOS forecast. In this dissertation, $\tau_{T^{(E)}}$ was held constant over the entire forecast horizon J, see chapter 4.4. This improved the OOS forecast, as the unconditional variance is adapted to the last estimation point. Nevertheless, a further improvement would be an OOS dynamic of the spline function.

With the S-GARCH model, Engle and Rangel (2008) originally intended to explain the sources of volatility, as demonstrated in chapters 3.3.2 and 5.1. They also intended to include a (weakly) exogenous variable directly in the long-term function $\tau_t$, see equation (3.28). However, like this dissertation, they did not account for any exogenous sources in the spline function. This dissertation intrinsically follows the reasoning of Muller et al. (1997), where all exogenous information is included in the innovation term. Nevertheless, it is possible to integrate one or more exogenous variables into the model with a B-spline approach. These multivariate B-spline functions are called tensor product splines, see Dierckx (1993, chapter 2) or Schumaker (2007, chapter 12). This method has the advantage of multiplicatively linked B-spline bases, where the knot vector can be determined separately for each B-spline basis. Furthermore, each of the B-spline basis functions can have a different degree. Audrino and Bühlmann (2009) proposed a bivariate tensor product spline function in a spline regression setting, with the two independent variables $h_{t-1}$ and $\varepsilon_{t-1}^2$.

In this dissertation, the univariate framework of a spline-GARCH model with free knots was derived.
Besides the inclusion of exogenous variables, multivariate modeling is the natural extension to this approach. Rangel and Engle (2012) proposed a multivariate factor S-GARCH model. For this, they applied the Dynamic Conditional Correlation (DCC) model of Engle (2002a), in which the multiplicatively decomposed variance $\sigma_t^2$ is included. Here, the conditional variance matrix is a product of the matrices with the univariate conditional variances (spline functions account for the long-term volatility) and the conditional correlation matrix. The advantage of the DCC-GARCH model approach is that it is sufficient to estimate the univariate models. This means that the interaction terms are missing, but the estimators of these models are interpretable, and the estimation process is feasible. Therefore, a DCC-FKS-GARCH model would be a suitable extension to consider more than one return series.
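The matrix product described above can be written as H_t = D_t R_t D_t, where D_t is the diagonal matrix of univariate conditional standard deviations and R_t is the conditional correlation matrix. The sketch below assumes this standard DCC decomposition together with a multiplicative univariate variance σ² = τ · g for each series. It only assembles the covariance matrix for one point in time and does not estimate anything; the function name and inputs are hypothetical.

```python
import math


def conditional_covariance(taus, gs, corr):
    """Assemble H = D R D for one time point.

    taus : long-term variance components, one per series
    gs   : short-term (unit-mean) variance components, one per series
    corr : conditional correlation matrix as a list of rows
    Assumption: sigma_i^2 = tau_i * g_i for each series i.
    """
    sigmas = [math.sqrt(tau * g) for tau, g in zip(taus, gs)]
    n = len(sigmas)
    if len(corr) != n or any(len(row) != n for row in corr):
        raise ValueError("corr must be an n x n matrix")
    return [[sigmas[i] * corr[i][j] * sigmas[j] for j in range(n)]
            for i in range(n)]
```

Because the univariate variances enter only through D_t, each series can be fitted with its own spline-GARCH specification before the correlation dynamics are estimated, which is the feasibility advantage mentioned above.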

7.4 Concluding remarks

This dissertation developed, simulated, and applied a new GARCH-type model. Based on the framework of multiplicatively decomposed conditional variance models, the FKS-GARCH model counteracts several drawbacks of standard GARCH models and the S-GARCH model. A key feature is that the unconditional variance is no longer restricted to being constant. Therefore, the spurious long-memory pattern of standard GARCH models is mitigated without explicitly accounting for different segments. However, in contrast to the S-GARCH model, with the FKS-GARCH model, the knots are not equidistantly placed. Now, the knots are considered as free parameters. Within a uniform estimation procedure, all parameters (including the knots) are jointly estimated. Therefore, the knots are placed at sites where the DGP is not smooth. These sites are typically associated with structural breaks or highly volatile phases. If the estimation process detects such sites, then the IS and the OOS performance are improved.

Nevertheless, free knot estimation is an elaborate procedure. In contrast to the S-GARCH model, the knot vector is not known in advance, which complicates the estimation process. The knots tend to coalesce, in particular for the B-spline basis function with l > 2. Furthermore, the knot locations are not at the same scale as the other parameters. The FKS-GARCH model mitigates the knot multiplicity property by a so-called Jupp transformation. With this, the knot proximities are penalized, with a minimum penalty for equidistantly distributed knots. This method was applied here for the first time in the time-series context, but some difficulties appeared that had already been encountered in the cross-section data domain. What is different here from the cross-section field is the high fluctuation of return data. That makes the identification of the knot locations even harder.
This dissertation suggested smoothing the return data first and then using the smoothed function for the proposed starting vector routine. This starting knot vector routine provided good initial values for the following estimation process. The FKS-GARCH model was employed to model the (un)conditional variance through an unconstrained estimation process. Therefore, all estimators can take any value in the real space. The estimated spline function showed an excellent adaptation to the true unconditional variance by choosing B-spline basis functions and freely estimating the knots. As splines can generally adapt to different data types, the FKS-GARCH model is able to model the long-term variance, even if the true process was not a spline function. This suggests that the FKS-GARCH model is a flexible alternative to existing conditional variance models.
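The role of the Jupp transformation in the unconstrained estimation can be sketched briefly. The code assumes the usual log-ratio form, κ_i = log((t_{i+1} − t_i) / (t_i − t_{i−1})) with boundary points t_0 = a and t_{K+1} = b; the exact variant and penalty used in chapter 4.2.3 may differ, and the function names are hypothetical.

```python
import math


def jupp_transform(knots, a, b):
    """Map ordered interior knots in (a, b) to unconstrained reals."""
    pts = [a] + list(knots) + [b]
    if any(hi <= lo for lo, hi in zip(pts, pts[1:])):
        raise ValueError("knots must be strictly increasing inside (a, b)")
    return [math.log((pts[i + 1] - pts[i]) / (pts[i] - pts[i - 1]))
            for i in range(1, len(pts) - 1)]


def inverse_jupp_transform(kappa, a, b):
    """Recover ordered interior knots from any real-valued vector kappa."""
    # Gap lengths relative to the first gap: gap_i = gap_0 * exp(kappa_1 + ... + kappa_i)
    rel_gaps = [1.0]
    for k in kappa:
        rel_gaps.append(rel_gaps[-1] * math.exp(k))
    scale = (b - a) / sum(rel_gaps)  # the gaps must add up to b - a
    knots, position = [], a
    for gap in rel_gaps[:-1]:
        position += scale * gap
        knots.append(position)
    return knots
```

Any real vector κ maps back to strictly increasing knots inside (a, b), so an unconstrained optimizer can move freely without producing unordered knots. Equidistant knots correspond to κ = 0, and coalescing knots push individual κ_i toward ±∞, which is consistent with the description above that knot proximity is penalized and equidistant knots receive the minimum penalty.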

References

Agarwal, G. G. and Studden, W. (1980), ‘Asymptotic integrated mean square error using least squares and bias minimizing splines’, The Annals of Statistics pp. 1307–1325.
Agresti, A. and Coull, B. A. (1998), ‘Approximate is better than "exact" for interval estimation of binomial proportions’, The American Statistician 52(2), 119.
Alexander, C. (2011), Practical financial econometrics, Vol. 2 of Market risk analysis, reprinted with corr. edn, Wiley, Chichester.
Amado, C., Silvennoinen, A. and Teräsvirta, T. (2008), ‘Modelling conditional and unconditional heteroskedasticity with smoothly time-varying structure’, CREATES Research Paper (2008-8).
Amado, C., Silvennoinen, A. and Teräsvirta, T. (2018), ‘Models with multiplicative decomposition of conditional variances and correlations’, CREATES Research Paper (2018-14).
Amado, C. and Teräsvirta, T. (2013), ‘Modelling volatility by variance decomposition’, Journal of Econometrics 175(2), 142–153.
Amado, C. and Teräsvirta, T. (2017), ‘Specification and testing of multiplicative time-varying garch models with applications’, Econometric Reviews 36(4), 421–446.
Amemiya, T. (1980), ‘Selection of regressors’, International Economic Review 21(2), 331–354.
Andersen, T. G. and Bollerslev, T. (1998), ‘Answering the skeptics: Yes, standard volatility models do provide accurate forecasts’, International Economic Review 39(4), 885–905.
Andersen, T. G., Davis, R. A., Kreiß, J.-P. and Mikosch, T. V. (2009), Handbook of Financial Time Series, Springer Science & Business Media.
Andreou, E., Pittis, N. and Spanos, A. (2001), ‘On modelling speculative prices: The empirical literature’, Journal of Economic Surveys 15(2), 187–220.
Audrino, F. and Bühlmann, P. (2009), ‘Splines for financial volatility’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 71(3), 655–670.
Bao, Y. (2015), ‘Should we demean the data?’, Annals of economics and finance 16(1), 163–171.
Bauwens, L., Dufays, A. and Rombouts, J. V. K. (2011), ‘Marginal likelihood for markov-switching and change-point garch models’, Journal of Econometrics 178, 508–522.
Bauwens, L., Laurent, S. and Rombouts, J. V. (2006), ‘Multivariate garch models: a survey’, Journal of applied econometrics 21(1), 79–109.
Bera, A. K. and Higgins, M. L. (1993), ‘Arch models: properties, estimation and testing’, Journal of economic surveys 7(4), 305–366.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2022 O. Old, Modeling Time-Varying Unconditional Variance by Means of a Free-Knot Spline-GARCH Model, Gabler Theses, https://doi.org/10.1007/978-3-658-38618-4

Berkes, I., Horváth, L. and Kokoszka, P. (2003), ‘Garch processes: structure and estimation’, Bernoulli 9(2), 201–227.
Biais, B., Glosten, L. and Spatt, C. (2005), ‘Market microstructure: A survey of microfoundations, empirical results, and policy implications’, Journal of Financial Markets 8(2), 217–264.
Black, F. (1976), ‘Studies of stock price volatility changes’, Proceedings of the 1976 Meeting of the Business and Economic Statistics Section, American Statistical Association pp. 177–181.
Black, F. and Scholes, M. (1973), ‘The pricing of options and corporate liabilities’, Journal of Political Economy 81(3), 637–654.
Bollerslev, T. (1986), ‘Generalized autoregressive conditional heteroskedasticity’, Journal of Econometrics 31(3), 307–327.
Bollerslev, T. (1987), ‘A conditionally heteroskedastic time series model for speculative prices and rates of return’, The review of economics and statistics 69(3), 542–547.
Bollerslev, T. (2008), ‘Glossary to arch (garch)’, CREATES Research paper (2008-49).
Bollerslev, T., Chou, R. Y. and Kroner, K. F. (1992), ‘Arch modeling in finance’, Journal of Econometrics 52(1-2), 5–59.
Bollerslev, T., Engle, R. F. and Nelson, D. B. (1994), ‘Arch models’, Handbook of Econometrics IV(49), 2959–3038.
Bollerslev, T. and Wooldridge, J. M. (1992), ‘Quasi-maximum likelihood estimation and inference in dynamic models with time-varying covariances’, Econometric Reviews 11(2), 143–172.
Bozdogan, H. (2000), ‘Akaike’s information criterion and recent developments in information complexity’, Journal of mathematical psychology 44(1), 62–91.
Bozdogan, H. (1987), ‘Model selection and akaike’s information criterion (aic): The general theory and its analytical extensions’, Psychometrika 52(3), 345–370.
Brockwell, P. J. and Davis, R. A. (2006), Time Series: theory and methods, Springer Series in Statistics, 2. edn, Springer, New York, NY.
Brooks, C. and Burke, S. P. (2003), ‘Information criteria for garch model selection’, The European Journal of Finance 9(6), 557–580.
Brownlees, C., Engle, R. F. and Kelly, B. T. (2011), ‘A practical guide to volatility forecasting through calm and storm’, Journal of risk 14(2), 3–22.
Brownlees, C. and Gallo, G. M. (2010), ‘Comparison of volatility measures: A risk management perspective’, Journal of financial econometrics: official journal of the Society for Financial Econometrics 8(1), 29–56.
Burchard, H. G. (1974), ‘Splines (with optimal knots) are better’, Applicable Analysis 3(4), 309–319.

Burnham, K. P. and Anderson, D. R. (2002), Model selection and multimodel inference: A practical information-theoretic approach, 2. edn, Springer.
Burns, P. (2002), ‘Robustness of the ljung-box test and its rank equivalent’, Available at SSRN 443560.
Cai, J. (1994), ‘A markov model of switching-regime arch’, Journal of Business & Economic Statistics 12(3), 309–316.
Campbell, S. D. and Diebold, F. X. (2005), ‘Weather forecasting for weather derivatives’, Journal of the American Statistical Association 100(469), 6–16.
Caporin, M. and Costola, M. (2019), ‘Asymmetry and leverage in garch models: a news impact curve perspective’, Applied Economics 51(31), 3345–3364.
Čižek, P. and Spokoiny, V. (2009), Varying coefficient garch models, in ‘Handbook of Financial Time Series’, Springer, pp. 169–185.
Clements, M. P. (2005), Evaluating econometric forecasts of economic and financial variables, Palgrave Macmillan UK.
Coleman, T. F. and Zhang, Y. (2020), Optimization Toolbox: User’s Guide, Natick, MA. URL: https://de.mathworks.com/help/pdf_doc/optim/index.html
Conrad, C. and Hartmann, M. (2019), ‘On the determinants of long-run inflation uncertainty: Evidence from a panel of 17 developed economies’, European Journal of Political Economy 56(3), 233–250.
Conrad, C. and Kleen, O. (2020), ‘Two are better than one: Volatility forecasting using multiplicative component garch–midas models’, Journal of Applied Econometrics 35(1), 19–45.
Cont, R. (2001), ‘Empirical properties of asset returns: stylized facts and statistical issues’, Quantitative Finance 2(1), 223–236.
Corradi, V., Distaso, W. and Swanson, N. R. (2011), ‘Predictive inference for integrated volatility’, Journal of the American Statistical Association 106(496), 1496–1512.
Cox, M. G. (1972), ‘The numerical evaluation of b-splines’, IMA Journal of Applied Mathematics 10(2), 134–149.
Craven, P. and Wahba, G. (1978), ‘Smoothing noisy data with spline functions’, Numerische Mathematik 31(4), 377–403.
Curry, H. B. and Schoenberg, I. J. (1947), ‘On spline distributions and their limits: The pólya distribution functions’, Bulletin of the American Mathematical Society (53), 1114.
Davidson, R. and MacKinnon, J. G. (1993), Estimation and inference in econometrics, Oxford New York.
de Boor, C. (1973), Good approximation by splines with variable knots, in ‘Spline functions and approximation theory’, Springer, pp. 57–72.
de Boor, C. (1978), A practical guide to splines, Vol. 27 of Applied mathematical sciences, Springer, New York.

de Boor, C. (2001), A practical guide to splines, Vol. 27 of Applied mathematical sciences, 1. hardcover print, rev. edn, Springer, New York.
de Boor, C. and Rice, J. R. (1968a), ‘Least squares cubic spline approximation i-fixed knots’, Department of Computer Sciences. Purdue University, Lafayette (CSD TR 20).
de Boor, C. and Rice, J. R. (1968b), ‘Least squares cubic spline approximation, ii-variable knots’, Department of Computer Sciences. Purdue University, Lafayette (CSD TR 21).
Dennis, J. E. and Schnabel, R. B. (1983), Numerical methods for unconstrained optimization and nonlinear equations, Prentice-Hall series in computational mathematics.
Dickey, D. A. and Fuller, W. A. (1979), ‘Distribution of the estimators for autoregressive time series with a unit root’, Journal of the American Statistical Association 74(366), 427–431.
Diebold, F. X. (1986), ‘Modeling the persistence of conditional variances: A comment’, Econometric Reviews 5(1), 51–56.
Diebold, F. X. (1988), Empirical Modeling of Exchange Rate Dynamics, Vol. 303 of Lecture Notes in Economics and Mathematical Systems, Springer, Berlin and Heidelberg.
Diebold, F. X. (2004), ‘The nobel memorial prize for robert f. engle’, The Scandinavian Journal of Economics 106(2), 165–185.
Diebold, F. X. and Mariano, R. S. (1995), ‘Comparing predictive accuracy’, Journal of Business & Economic Statistics 13(3), 253–263.
Dierckx, P. (1993), Curve and surface fitting with splines, Monographs on numerical analysis, 1. publ edn, Oxford University Press.
Ding, Z. and Granger, C. (1996), ‘Modeling volatility persistence of speculative returns: A new approach’.
Ding, Z., Granger, C. W. and Engle, R. F. (1993), ‘A long memory property of stock market returns and a new model’, Journal of Empirical Finance 1(1), 83–106.
Draper, D. (1995), ‘Assessment and propagation of model uncertainty’, Journal of the Royal Statistical Society. Series B (Methodological) 57(1), 45–97.
Drost, F. C. and Nijman, T. E. (1993), ‘Temporal aggregation of garch processes’, Econometrica 61(4), 909.
Dufays, A. (2016), ‘Infinite-state markov-switching for dynamic volatility’, Journal of financial econometrics: official journal of the Society for Financial Econometrics 14(2), 418–460.
Eilers, P. H. C. and Marx, B. D. (1996), ‘Flexible smoothing with b-splines and penalties’, Statistical Science 11(2), 89–121.
Engle, R. F. (1982), ‘Autoregressive conditional heteroscedasticity with estimates of the variance of united kingdom inflation’, Econometrica 50(4), 987–1007.
Engle, R. F. (2000), ‘The econometrics of ultra-high-frequency data’, Econometrica 68(1), 1–22.

Engle, R. F. (2001), ‘Garch 101: the use of arch/garch models in applied econometrics’, The journal of economic perspectives 15(4), 157–168.
Engle, R. F. (2002a), ‘Dynamic conditional correlation: A simple class of multivariate generalized autoregressive conditional heteroskedasticity models’, Journal of Business & Economic Statistics 20(3), 339–350.
Engle, R. F. (2002b), ‘New frontiers for arch models’, Journal of Applied Econometrics 17(5), 425–446.
Engle, R. F. and Bollerslev, T. (1986), ‘Modelling the persistence of conditional variances’, Econometric Reviews 5(1), 1–50.
Engle, R. F. and Gonzalez-Rivera, G. (1991), ‘Semiparametric arch models’, Journal of Business & Economic Statistics 9(4), 345–359.
Engle, R. F., Hendry, D. F. and Richard, J.-F. (1983), ‘Exogeneity’, Econometrica 51(2), 277.
Engle, R. F. and Lee, G. G. (1999), A long-run and short-run component model of stock return volatility, in R. F. Engle, H. White et al., eds, ‘Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W.J. Granger’, Oxford University Press, pp. 475–497.
Engle, R. F. and Mezrich, J. (1996), ‘Garch for groups’, RISK 10(9), 36–40.
Engle, R. F. and Patton, A. J. (2007), What good is a volatility model?, in ‘Forecasting volatility in the financial markets’, Elsevier, pp. 47–63.
Engle, R. F. and Rangel, J. G. (2008), ‘The spline-garch model for low-frequency volatility and its global macroeconomic causes’, Review of Financial Studies 21(3), 1187–1222.
Engle, R. F., White, H. et al. (1999), Cointegration, causality, and forecasting: a Festschrift in Honour of Clive WJ Granger, Oxford University Press.
Engle, R., Ghysels, E. and Sohn, B. (2013), ‘Stock market volatility and macroeconomic fundamentals’, Review of Economics and Statistics 95(3), 776–797.
Eubank, R. L. (1984), ‘Approximate regression models and splines’, Communications in Statistics - Theory and Methods 13(4), 433–484.
Fahrmeir, L., Tutz, G. and Hennevogl, W. (2001), Multivariate statistical modelling based on generalized linear models, 2. edn, Springer, New York.
Fama, E. F. (1965), ‘The behavior of stock-market prices’, The Journal of Business 38(1), 34–105.
Fama, E. F. (1970), ‘Efficient capital markets: A review of theory and empirical work’, The Journal of Finance 25(2), 383.
Fan, J. and Yao, Q. (2003), Nonlinear time series: Nonparametric and parametric methods, Springer Series in Statistics, Springer.
Fan, J. and Yao, Q. (2017), The elements of financial econometrics, Cambridge University Press.

Feller, W. (1945), ‘The fundamental limit theorems in probability’, Bulletin of the American Mathematical Society 51(11), 800–833.
Feng, Y. (2004), ‘Simultaneously modeling conditional heteroskedasticity and scale change’, Econometric Theory 20(3), 563–596.
Feng, Y. and Härdle, W. K. (2020), ‘A data-driven p-spline smoother and the p-spline-garch-models’, Available at SSRN 3714616.
Figlewski, S. (1997), ‘Forecasting volatility’, Financial Markets, Institutions and Instruments 6(1), 1–88.
Fiorentini, G., Calzolari, G. and Panattoni, L. (1996), ‘Analytic derivatives and the computation of garch estimates’, Journal of Applied Econometrics 11(4), 399–417.
Fisher, T. J. and Gallagher, C. M. (2012), ‘New weighted portmanteau statistics for time series goodness of fit testing’, Journal of the American Statistical Association 107(498), 777–787.
Fletcher, R. (2013), Practical Methods of Optimization, 2. edn, Wiley.
Francq, C. and Zakoïan, J.-M. (2004), ‘Maximum likelihood estimation of pure garch and arma-garch processes’, Bernoulli 10(4), 605–637.
Francq, C. and Zakoian, J.-M. (2010), GARCH models: Structure, statistical inference, and financial applications, Wiley.
Friedman, J. H. (1991), ‘Multivariate adaptive regression splines’, The Annals of Statistics 19(1), 1–67.
Friedman, J. H. and Silverman, B. W. (1989), ‘Flexible parsimonious smoothing and additive modeling’, Technometrics 31(1), 3–21.
Gervini, D. (2006), ‘Free-knot spline smoothing for functional data’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(4), 671–687.
Ghysels, E., Sinko, A. and Valkanov, R. (2007), ‘Midas regressions: Further results and new directions’, Econometric Reviews 26(1), 53–90.
Glosten, L. R., Jagannathan, R. and Runkle, D. E. (1993), ‘On the relation between the expected value and the volatility of the nominal excess return on stocks’, The Journal of Finance 48(5), 1779–1801.
Goldman, E. and Shen, X. (2017), ‘Analysis of asymmetric garch volatility models with applications to margin measurement’, Pace University Finance Research Paper (2018/03).
Goldman, E. and Wang, T. (2015), ‘The spline-threshold-garch volatility model and tail risk’.
Golub, G. and Pereyra, V. (1973), ‘The differentiation of pseudoinverses and nonlinear least squares problems whose variables separate’, SIAM Journal on Numerical Analysis 10(2), 413–432.
Golub, G. and Pereyra, V. (2002), Separable nonlinear least squares: the variable projection method and its applications, in ‘Institute of Physics, Inverse Problems’, pp. 1–26.

González-Rivera, G. (1998), ‘Smooth-transition garch models’, Studies in Nonlinear Dynamics & Econometrics 3(2).
Gouriéroux, C. (1997), ARCH Models and Financial Applications, Springer Series in Statistics, Springer, New York.
Granger, C. W. J. (2002), ‘Some comments on risk’, Journal of Applied Econometrics 17(5), 447–456.
Gray, S. F. (1996), ‘Modeling the conditional distribution of interest rates as a regime-switching process’, Journal of Financial Economics 42(1), 27–62.
Guo, J., Huang, W. and Williams, B. M. (2014), ‘Adaptive kalman filter approach for stochastic short-term traffic flow rate prediction and uncertainty quantification’, Transportation Research Part C: Emerging Technologies 43, 50–64.
Haas, M. (2004), ‘A new approach to markov-switching garch models’, Journal of Financial Econometrics 2(4), 493–530.
Hafner, C. M. and Herwartz, H. (2008), ‘Analytical quasi maximum likelihood inference in multivariate volatility models’, Metrika 67(2), 219–239.
Hamilton, J. D. and Susmel, R. (1994), ‘Autoregressive conditional heteroskedasticity and changes in regime’, Journal of econometrics 64(1-2), 307–333.
Han, H. and Kristensen, D. (2015), ‘Semiparametric multiplicative garch-x model: Adopting economic variables to explain volatility’, Toulouse, France: Toulouse School of Economics.
Hannan, E. J. (1980), ‘The estimation of the order of an arma process’, The Annals of Statistics 8(5).
Hannan, E. J. and Quinn, B. G. (1979), ‘The determination of the order of an autoregression’, J. R. Statist. Soc. B 41, 190.
Hansen, P. R. and Lunde, A. (2001), ‘A comparison of volatility models: Does anything beat a garch(1,1)?’. URL: http://www-stat.wharton.upenn.edu/~steele/Courses/434/434Context/GARCH/HansenLunde01.pdf
Hansen, P. R. and Lunde, A. (2005), ‘A forecast comparison of volatility models: does anything beat a garch(1,1)?’, Journal of Applied Econometrics 20(7), 873–889.
Hansen, P. R. and Lunde, A. (2006), ‘Consistent ranking of volatility models’, Journal of Econometrics 131(1/2), 97–121.
Härdle, W. (2004), Nonparametric and semiparametric models, Springer Series in Statistics, Springer, Berlin.
Harrell, Jr., F. E. (2015), Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis, Springer Series in Statistics, 2. edn, Springer-Verlag.
Harvey, D. I., Leybourne, S. J. and Newbold, P. (1997), ‘Testing the equality of prediction mean squared errors’, International Journal of Forecasting 13(2), 281–291.

He, C. and Teräsvirta, T. (1999), ‘Properties of moments of a family of garch processes’, Journal of Econometrics 92(1), 173–192.
Heber, G., Lunde, A., Shephard, N. and Sheppard, K. (2009), ‘Oxford-man institute’s realized library’. URL: https://realized.oxford-man.ox.ac.uk
Hillebrand, E. (2005), ‘Neglecting parameter changes in garch models’, Journal of Econometrics 129(1-2), 121–138.
Hillebrand, E. and Medeiros, M. C. (2008), Estimating and forecasting garch models in the presence of structural breaks and regime switches, in D. E. Rapach and M. E. Wohar, eds, ‘Forecasting in the presence of structural breaks and model uncertainty’, Vol. 3 of Frontiers of Economics and Globalization, Emerald, Bingley, pp. 303–327.
Hillebrand, E. T. (2004), ‘Neglecting parameter changes in autoregressive models’, Louisiana State University Economics Working Paper (2004-04).
Hyndman, R. J. and Athanasopoulos, G. (2018), Forecasting: Principles and practice, 2. edn, OTexts.
Inclán, C. and Tiao, G. C. (1994), ‘Use of cumulative sums of squares for retrospective detection of changes of variance’, Journal of the American Statistical Association 89(427), 913–923.
Jarque, C. M. and Bera, A. K. (1980), ‘Efficient tests for normality, homoscedasticity and serial independence of regression residuals’, Economics Letters 6(3), 255–259.
Jupp, D. L. (1975), ‘The “lethargy” theorem—a property of approximation by γ-polynomials’, Journal of Approximation Theory 14(3), 204–217.
Jupp, D. L. B. (1978), ‘Approximation to data by splines with free knots’, SIAM Journal on Numerical Analysis 15(2), 328–343.
Kang, H., Chen, F., Li, Y., Deng, J. and Yang, Z. (2015), ‘Knot calculation for spline fitting via sparse optimization’, Computer-Aided Design 58, 179–188.
Kass, R. E. and Raftery, A. E. (1995), ‘Bayes factors’, Journal of the American Statistical Association 90(430), 773.
Kaufman, L. (1975), ‘A variable projection method for solving separable nonlinear least squares problems’, BIT 15(1), 49–57.
Lamoureux, C. G. and Lastrapes, W. D. (1990), ‘Persistence in variance, structural change, and the garch model’, Journal of Business & Economic Statistics 8(2), 225.
Laurent, S. (2013), Estimating and forecasting ARCH models using G@rch 7, Timberlake Consultants, London and Union, NJ.
Levy, G. (2003), ‘Analytic derivatives of asymmetric garch models’, The Journal of Computational Finance 6(3), 21–63.
Li, W. K. (2004), Diagnostic checks in time series, Vol. 102 of Monographs on statistics and applied probability, Chapman & Hall/CRC, Boca Raton, Fla.

Li, W. K. and Mak, T. K. (1994), ‘On the squared residual autocorrelations in non-linear time series with conditional heteroskedasticity’, Journal of Time Series Analysis 15, 627–636.
Lindstrom, M. J. (1999), ‘Penalized estimation of free-knot splines’, Journal of Computational and Graphical Statistics 8(2), 333.
Liu, R. and Yang, L. (2016), ‘Spline estimation of a semiparametric garch model’, Econometric Theory 32(4), 1023–1054.
Ljung, G. M. and Box, G. E. P. (1978), ‘On a measure of lack of fit in time series models’, Biometrika 65(2), 297.
Lo, A. W. (2016), ‘What is an index?’, The Journal of Portfolio Management 42(2), 21–36.
Lo, A. W. and MacKinlay, A. C. (1990), ‘An econometric analysis of nonsynchronous trading’, Journal of Econometrics 45(1-2), 181–211.
Lucchetti, R. (1999), ‘Analytic score for multivariate garch models’, Università di Ancona, Dipartimento di Economia, Working Paper Series.
Lumsdaine, R. L. (1995), ‘Finite-sample properties of the maximum likelihood estimator in garch(1,1) and igarch(1,1) models: A monte carlo investigation’, Journal of Business & Economic Statistics 13(1), 1.
Lumsdaine, R. L. (1996), ‘Consistency and asymptotic normality of the quasi-maximum likelihood estimator in igarch(1,1) and covariance stationary garch(1,1) models’, Econometrica 64(3), 575.
Luo, J., Kang, H. and Yang, Z. (2019), ‘Knot calculation for spline fitting based on the unimodality property’, Computer Aided Geometric Design 73, 54–69.
Lütkepohl, H. (2007), New introduction to multiple time series analysis, Springer, Berlin.
Lyche, T., Manni, C., Speleers, H., Kunoth, A., Sangalli, G. and Serra-Capizzano, S., eds (2018), Splines and PDEs: from approximation theory to numerical linear algebra: Cetraro, Italy 2017, Vol. 2219 of Lecture notes in mathematics CIME Foundation subseries, Springer, Cham.
Maddala, G. S. and Kim, I.-M. (2004), Unit roots, cointegration and structural change, Themes in modern econometrics, 6. print edn, Cambridge Univ. Press, Cambridge.
Magnus, J. R. and Neudecker, H. (2002), Wiley series in probability and statistics, revised, reprinted edn, Wiley.
Mandelbrot, B. (1963), ‘The variation of certain speculative prices’, The Journal of Business 36(4), 394–419.
Mao, W. and Zhao, L. H. (2003), ‘Free-knot polynomial splines with confidence intervals’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 65(4), 901–919.
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1995), Multivariate analysis, Probability and mathematical statistics, 10. printing edn, Academic Press, London.

Martin, V., Hurn, S. and Harris, D. (2013), Econometric modelling with time series: Specification, estimation and testing, Themes in modern econometrics, Cambridge Univ. Press, Cambridge.
Martinez, W. L. and Martinez, A. R. (2016), Computational statistics handbook with MATLAB, Chapman & Hall / CRC computer science and data analysis series, 3. edn, CRC Press Taylor & Francis Group, Boca Raton.
Mayhew, S. (1995), ‘Implied volatility’, Financial Analysts Journal 51(4), 8–20.
McAleer, M. (2014), ‘Asymmetry and leverage in conditional volatility models’, Econometrics 2(3), 145–150.
McLeod, A. I. and Li, W. K. (1983), ‘Diagnostic checking arma time series models using squared residuals autocorrelations’, Journal of Time Series Analysis 4, 269–273.
Mercurio, D. and Spokoiny, V. (2004), ‘Statistical inference for time-inhomogeneous volatility models’, The Annals of Statistics 32(2), 577–602.
Merton, R. C. (1973), ‘Theory of rational option pricing’, The Bell journal of economics and management science 4(1), 141–183.
Mikosch, T. and Starica, C. (2004), ‘Nonstationarities in financial time series, the long-range dependence, and the igarch effects’, Review of Economics and Statistics 86(1), 378–390.
Muller, U. A., Dacorogna, M., Dave, R. D., Olsen, R., Pictet, O. V. and von Weizsäcker, J. (1997), ‘Volatilities of different time resolutions – analyzing the dynamics of market components’, Journal of Empirical Finance 4(2-3), 213–239.
Murphy, K. P. (2012), Machine learning: A probabilistic perspective, Adaptive computation and machine learning series, MIT Press, Cambridge, Mass.
Nelson, D. B. (1990), ‘Stationarity and persistence in the garch (1, 1) model’, Econometric theory 6(3), 318–334.
Nelson, D. B. (1991), ‘Conditional heteroskedasticity in asset returns: A new approach’, Econometrica 59(2), 347–370.
Noguchi, K., Aue, A. and Burman, P. (2016), ‘Exploratory analysis and modeling of stock returns’, Journal of Computational and Graphical Statistics 25(2), 363–381.
Old, O. (2020), ‘Finite-sample properties of garch models in the presence of time-varying unconditional variance. a simulation study’, FernUniversität Hagen. Diskussionsbeiträge Fakultät Wirtschaftswissenschaft (519).
Pagan, A. R. and Sabau, H. (1991), ‘On the inconsistency of the mle in certain heteroskedastic regression models’, Estudios Economicos pp. 159–172.
Pagan, A. R. and Schwert, G. W. (1990), ‘Alternative models for conditional stock volatility’, Journal of econometrics 45(1-2), 267–290.
Patilea, V. and Raïssi, H. (2014), ‘Testing second-order dynamics for autoregressive processes in presence of time-varying variance’, Journal of the American Statistical Association 109(507), 1099–1111.

Patton, A. J. (2011), ‘Volatility forecast comparison using imperfect volatility proxies’, Journal of Econometrics 160(1), 246–256.
Pawitan, Y. (2013), In all likelihood: Statistical modelling and inference using likelihood, first published in paperback edn, Clarendon Press and Oxford University Press, Oxford.
Poon, S.-H. and Granger, C. W. J. (2003), ‘Forecasting volatility in financial markets: A review’, Journal of economic literature 41(2), 478–539.
Raftery, A. E. (1995), ‘Bayesian model selection in social research’, Sociological methodology 25, 111–163.
Rangel, J. G. and Engle, R. F. (2012), ‘The factor-spline-garch model for high and low frequency correlations’, Journal of business & economic statistics 30(1), 109–124.
Rao, C. R. (2002), Linear statistical inference and its applications, Wiley series in probability and statistics, 2. paperback edn, Wiley, New York.
Rao, C. R., Heumann, C., Shalabh and Toutenburg, H. (2008), Linear Models and Generalizations: Least Squares and Alternatives, Springer Series in Statistics, 3. extended edn, Springer-Verlag, Berlin, Heidelberg.
Rao, C. R. and Wu, Y. (2001), On model selection, in P. Lahiri, ed., ‘Model selection’, Lecture notes, monograph series / Institute of Mathematical Statistics, JSTOR and Inst. of Math. Statistics, New York, NY and Beachwood, Ohio, pp. 1–57.
Rapach, D. E. and Wohar, M. E., eds (2008), Forecasting in the presence of structural breaks and model uncertainty, Vol. 3 of Frontiers of Economics and Globalization, Emerald, Bingley.
Rice, J. R. (1969), On the degree of convergence of nonlinear spline approximation, in I. J. Schoenberg, ed., ‘Approximations with special emphasis on spline functions : proceedings of a symposium conducted by the Mathematics Research Center’, Publication ... of the Mathematics Research Center, Academic Press, Madison.
Rose, D. J. (1969), ‘An algorithm for solving a special class of tridiagonal systems of linear equations’, Communications of the ACM 12(4), 234–236.
Rothenberg, T. J. (1971), ‘Identification in parametric models’, Econometrica 39(3), 577–591.
Ruppert, D. (2002), ‘Selecting the number of knots for penalized splines’, Journal of Computational and Graphical Statistics 11(4), 735–757.
Ruppert, D., Wand, M. P. and Carroll, R. J. (2009), Semiparametric regression, Vol. 12 of Cambridge series in statistical and probabilistic mathematics, reprinted. edn, Cambridge Univ. Press, Cambridge.
Rydberg, T. H. and Shephard, N. G. (2003), ‘Dynamics of trade-by-trade price movements: decomposition and models’, Journal of financial econometrics 1(1), 2–25.
Schoenberg, I. J. (1946a), ‘Contributions to the problem of approximation of equidistant data by analytic functions: Part a. on the problem of smoothing or graduation. a first class of analytic approximation formulae’, Quarterly of Applied Mathematics 4(1), 45–99.

Schoenberg, I. J. (1946b), ‘Contributions to the problem of approximation of equidistant data by analytic functions: Part b. on the problem of osculatory interpolation. a second class of analytic approximation formulae’, Quarterly of Applied Mathematics 4(2), 112–141.
Schoenberg, I. J. and Whitney, A. (1953), ‘On polya frequency function. iii. the positivity of translation determinants with an application to the interpolation problem by spline curves’, Transactions of the American Mathematical Society 74(2), 246.
Schumaker, L. L. (2007), Spline functions: Basic theory, Cambridge mathematical library, 3. edn, Cambridge University Press, Cambridge.
Schwarz, G. (1978), ‘Estimating the dimension of a model’, The Annals of Statistics 6(2).
Schwetlick, H. and Schütze, T. (1995), ‘Least squares approximation by splines with free knots’, BIT 35(3), 361–384.
Shibata, R. (1976), ‘Selection of the order of an autoregressive model by akaike’s information criterion’, Biometrika 63(1), 117.
Shibata, R. (1986), ‘Consistency of model selection and parameter estimation’, Journal of Applied Probability 23(A), 127–141.
Shiryaev, A. N. (1999), Essentials of stochastic finance: facts, models, theory, Vol. 3 of Advanced series on statistical science & applied probability, World scientific.
Silvennoinen, A. (2006), Essays on autoregressive conditional heteroskedasticity, Stockholm School of Economics, EFI, Economic Research Institute, Stockholm.
Silvennoinen, A. and Teräsvirta, T. (2017), ‘Modelling and forecasting wig20 daily returns’, Central European Journal of Economic Modelling and Econometrics 2017(3).
Silverman, B. W. (1986), Density estimation for statistics and data analysis, Monographs on statistics and applied probability, Chapman & Hall, New York.
Singer, H. (1999), Finanzmarktökonometrie: Zeitstetige Systeme und ihre Anwendung in Ökonometrie und empirischer Kapitalmarktforschung, Vol. 171 of Wirtschaftswissenschaftliche Beiträge, Physica-Verlag HD, Heidelberg.
Smith, P. L. (1982), ‘Curve fitting and modeling with splines using statistical variable selection techniques’, Report NASA 166034.
Song, P. X.-K., Fan, Y. and Kalbfleisch, J. D. (2005), ‘Maximization by parts in likelihood inference’, Journal of the American Statistical Association 100(472), 1145–1158.
Stock, J. H. (1994), Unit roots, structural breaks and trends, in ‘Handbook of econometrics’, North-Holland, Amsterdam, pp. 2739–2841.
Stone, C. J. (1982), ‘Optimal global rates of convergence for nonparametric regression’, The Annals of Statistics 10(4), 1040–1053.
Stone, C. J., Hansen, M. H., Kooperberg, C. and Truong, Y. K. (1997), ‘Polynomial splines and their tensor products in extended linear modeling’, The Annals of Statistics 25(4), 1371–1425.

Straumann, D. (2005), Estimation in Conditionally Heteroscedastic Time Series Models, Vol. 181 of Lecture Notes in Statistics, Springer, Berlin, Heidelberg.
Straumann, D. and Mikosch, T. (2006), ‘Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: A stochastic recurrence equations approach’, The Annals of Statistics 34(5), 2449–2495.
Teräsvirta, T. (2009), An introduction to univariate garch models, in ‘Handbook of Financial Time Series’, Springer, pp. 17–42.
Tsay, R. S. (1984), ‘Order selection in nonstationary autoregressive models’, The Annals of Statistics 12(4), 1425–1433.
Tsay, R. S. (2014), Multivariate time series analysis: With R and financial applications, Wiley series in probability and statistics, Wiley, Hoboken, NJ.
van Bellegem, S. and von Sachs, R. (2004), ‘Forecasting economic time series with unconditional time-varying variance’, International Journal of Forecasting 20(4), 611–627.
Wang, L., Feng, C., Song, Q. and Yang, L. (2012), ‘Efficient semiparametric garch modeling of financial volatility’, Statistica Sinica 22(1).
Weiss, A. A. (1986), ‘Asymptotic theory for arch models: Estimation and testing’, Econometric Theory 2(1), 107–131.
West, K. D. and McCracken, M. W. (1998), Regression-based tests of predictive ability, Vol. 226 of Technical working papers National Bureau of Economic Research, Inc, Cambridge, MA.
White, H. (1982), ‘Maximum likelihood estimation of misspecified models’, Econometrica 50(1), 1–25.
Wold, S. (1974), ‘Spline functions in data analysis’, Technometrics 16(1), 1–11.
Wong, K. F. K., Galka, A., Yamashita, O. and Ozaki, T. (2006), ‘Modelling non-stationary variance in eeg time series by state space garch model’, Computers in biology and medicine 36(12), 1327–1335.
Woolridge, J. R. and Ghosh, C. (1986), ‘Institutional trading and security prices: The case of changes in the composition of the s&p 500 index’, Journal of Financial Research 9(1), 13–24.
Xekalaki, E. and Degiannakis, S. (2010), ARCH Models for Financial Applications, 2. edn, John Wiley & Sons Ltd, Hoboken.
Zakoian, J. M. (1994), ‘Threshold heteroskedastic models’, Journal of Economic Dynamics and Control 18(5), 931–955.
Zhang, Y., Liu, R., Shao, Q. and Yang, L. (2020), ‘Two-step estimation for time varying arch models’, Journal of Time Series Analysis 41(4), 551–570.
Zivot, E. (2009), Practical issues in the analysis of univariate garch models, in ‘Handbook of financial time series’, Springer, Berlin, pp. 113–155.

Appendices

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Fachmedien Wiesbaden GmbH, part of Springer Nature 2022 O. Old, Modeling Time-Varying Unconditional Variance by Means of a Free-Knot Spline-GARCH Model, Gabler Theses, https://doi.org/10.1007/978-3-658-38618-4

A Standardized Student’s t-distribution

Student’s-t distribution PDF:

$$p(x, v, \phi) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\phi\sqrt{\pi v}} \left(1 + \frac{x^2}{v\phi^2}\right)^{-\left(\frac{v+1}{2}\right)} \tag{A.1}$$

Here $\phi$ is a scaling factor and $v$ denotes the degrees of freedom. If $z_t \overset{\text{i.i.d.}}{\sim} St(0, 1, v)$, then the moments $m_e$ of the standardized Student’s-t distribution with $\phi^2 = \frac{v-2}{v}$ are

$$m_1 = E[z_t] = 0 \tag{A.2}$$

$$m_2 = E[z_t^2] = \operatorname{Var}[z_t] = \phi^2 \frac{v}{v-2} = 1 \tag{A.3}$$

$$m_4 = E[z_t^4] = \phi^4 \frac{3v^2}{(v-2)(v-4)} = \frac{3(v-2)}{v-4} \tag{A.4}$$

$$\kappa(z_t) = \frac{m_4}{m_2^2} = \frac{3(v-2)}{v-4}, \tag{A.5}$$

see Fan and Yao (2017, pp. 127–128).
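The moments of the standardized Student’s-t distribution stated above can be checked numerically. The following is a minimal sketch, not part of the original appendix: the function names (`std_t_pdf`, `numeric_moment`, `kurtosis`), the integration bound and the grid size are illustrative assumptions, and the integral is approximated with a plain trapezoidal rule. The second moment requires v > 2 and the fourth moment v > 4.

```python
import math


def std_t_pdf(x, v):
    """Density (A.1) with phi^2 = (v - 2) / v, i.e. unit variance (needs v > 2)."""
    phi = math.sqrt((v - 2.0) / v)
    # Log of the normalising constant Gamma((v+1)/2) / (Gamma(v/2) * phi * sqrt(pi * v)).
    log_c = (math.lgamma((v + 1.0) / 2.0) - math.lgamma(v / 2.0)
             - math.log(phi) - 0.5 * math.log(math.pi * v))
    # Kernel (1 + x^2 / (v * phi^2)) ** (-(v + 1) / 2), evaluated on the log scale.
    return math.exp(log_c - 0.5 * (v + 1.0) * math.log1p(x * x / (v * phi * phi)))


def numeric_moment(order, v, bound=100.0, n=100_000):
    """Trapezoidal approximation of E[z^order] on [-bound, bound] (assumed truncation)."""
    step = 2.0 * bound / n
    total = 0.0
    for i in range(n + 1):
        x = -bound + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * x ** order * std_t_pdf(x, v)
    return total * step


def kurtosis(v):
    """Closed-form kurtosis (A.5); only defined for v > 4."""
    return 3.0 * (v - 2.0) / (v - 4.0)


if __name__ == "__main__":
    v = 8.0  # illustrative degrees of freedom
    print("m2 (numeric):", numeric_moment(2, v))
    print("m4 (numeric):", numeric_moment(4, v))
    print("kurtosis (closed form):", kurtosis(v))
```

Because the variance is one under this scaling, the numerical fourth moment and the closed-form kurtosis should agree up to truncation and discretisation error.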

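B Derivatives

B.1 Free-knot spline-GARCH model

conditional mean equation:

$$y_t = \varphi_0 + \sum_{k=1}^{U} \varphi_k^{y} y_{t-k} + \epsilon_t + \sum_{k=1}^{V} \varphi_k^{\epsilon} \epsilon_{t-k} \tag{B.1}$$

$$\Rightarrow \epsilon_t = y_t - \varphi_0 - \sum_{k=1}^{U} \varphi_k^{y} y_{t-k} - \sum_{k=1}^{V} \varphi_k^{\epsilon} \epsilon_{t-k}$$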
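The rearranged conditional mean equation defines the residuals recursively, since each residual depends on earlier residuals through the moving-average terms. A minimal sketch of this recursion follows. The function name `arma_residuals` and its argument names are hypothetical, and setting all pre-sample observations and residuals to zero is an assumption made here for illustration, not a choice documented in the source.

```python
def arma_residuals(y, phi0, phi_y, phi_eps):
    """Recursively compute eps_t = y_t - phi0 - sum_k phi_y[k] y_{t-k} - sum_k phi_eps[k] eps_{t-k}.

    y       : list of observations y_0, ..., y_{T-1}
    phi0    : intercept
    phi_y   : autoregressive coefficients (lags 1..U)
    phi_eps : moving-average coefficients (lags 1..V)
    Pre-sample values of y and eps are treated as zero (assumption).
    """
    eps = []
    for t in range(len(y)):
        # Autoregressive part: only lags that fall inside the sample contribute.
        ar = sum(c * y[t - k] for k, c in enumerate(phi_y, start=1) if t - k >= 0)
        # Moving-average part: uses residuals already computed in earlier iterations.
        ma = sum(c * eps[t - k] for k, c in enumerate(phi_eps, start=1) if t - k >= 0)
        eps.append(y[t] - phi0 - ar - ma)
    return eps


if __name__ == "__main__":
    # Illustrative ARMA(1,1) example with made-up numbers.
    print(arma_residuals([1.0, 2.0, 0.5], phi0=0.1, phi_y=[0.5], phi_eps=[0.2]))
```

The loop must run forward in time because the moving-average sum at time t needs the residuals from times t-1, ..., t-V.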

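conditional variance equation:

$$h_t^{s} = \left(1 - \sum_{p=1}^{P} \alpha_p - \sum_{q=1}^{Q} \beta_q\right) + \sum_{p=1}^{P} \alpha_p \frac{\epsilon_{t-p}^2}{\tau_{t-p}} + \sum_{q=1}^{Q} \beta_q h_{t-q}$$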
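The first conditional variance equation above (for h_t^s) can be evaluated with a simple forward recursion once the residuals and the long-term component tau_t are available. The sketch below is an illustration only: the function name `unit_garch_variance` and the `presample` argument are hypothetical, and replacing pre-sample values of eps^2/tau and h by 1 (the unconditional level implied by the intercept 1 - sum(alpha) - sum(beta)) is an assumption, not necessarily the initialisation used in the thesis.

```python
def unit_garch_variance(eps, tau, alpha, beta, presample=1.0):
    """Short-term variance h_t = (1 - sum(alpha) - sum(beta))
    + sum_p alpha_p * eps_{t-p}^2 / tau_{t-p} + sum_q beta_q * h_{t-q}.

    eps, tau  : residuals and long-term variance component, same length
    alpha     : ARCH coefficients (lags 1..P)
    beta      : GARCH coefficients (lags 1..Q)
    presample : value used for eps^2/tau and h before the sample starts (assumption)
    """
    omega = 1.0 - sum(alpha) - sum(beta)
    # Squared residuals rescaled by the long-term component.
    scaled = [e * e / s for e, s in zip(eps, tau)]
    h = []
    for t in range(len(eps)):
        arch = sum(a * (scaled[t - p] if t - p >= 0 else presample)
                   for p, a in enumerate(alpha, start=1))
        garch = sum(b * (h[t - q] if t - q >= 0 else presample)
                    for q, b in enumerate(beta, start=1))
        h.append(omega + arch + garch)
    return h


if __name__ == "__main__":
    # Illustrative (1,1) example with made-up numbers.
    print(unit_garch_variance([1.0, 4.0, 0.0], [1.0, 4.0, 1.0], alpha=[0.1], beta=[0.8]))
```

With this intercept the recursion has an unconditional level of one, so the overall scale of the variance is carried by tau_t alone.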

Q P P   2 1 ht a = ⎝1 − αp − βq − γp ⎠ + αp + γp 1t−p