Marco Corazza · Manfred Gilli · Cira Perna · Claudio Pizzi · Marilena Sibillo Editors
Mathematical and Statistical Methods for Actuarial Sciences and Finance eMAF2020
Editors Marco Corazza Department of Economics Ca’ Foscari University of Venice Venice, Italy Cira Perna Department of Economics and Statistics University of Salerno Fisciano, Salerno, Italy
Manfred Gilli Geneva School of Economics and Management (GSEM) University of Geneva Geneva, Switzerland Claudio Pizzi Department of Economics Ca’ Foscari University of Venice Venice, Italy
Marilena Sibillo Department of Economics and Statistics University of Salerno Fisciano, Salerno, Italy
ISBN 978-3-030-78964-0 ISBN 978-3-030-78965-7 (eBook) https://doi.org/10.1007/978-3-030-78965-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume presents a collection of peer-reviewed papers selected from over one hundred and ten presented at the International Conference eMAF2020—Mathematical and Statistical Methods for Actuarial Sciences and Finance. eMAF2020 is the ninth edition of an international biennial series of scientific meetings, started in 2004 on the initiative of the Department of Economics and Statistics of the University of Salerno. The idea behind this series is that the cooperation and cross-fertilization between mathematicians and statisticians working in actuarial sciences and finance could improve the research on these topics. The effectiveness of this idea has been proved by the wide participation in all the editions, which, in order, have been held in Salerno (2004, 2006, 2010 and 2014), in Venice (2008, 2012 and 2020), in Paris (2016) and in Madrid (2018). Originally, the conference was supposed to be physically held in Geneva, in April 2020. But, due to the now sadly famous COVID-19 pandemic, it was temporarily suspended. The Steering Committee of the conference, after considering all possible alternatives, decided to hold the conference remotely (the “e” in eMAF2020 stands for “electronic”). Thus, eMAF2020 was streamed through the Zoom platform offered by the Department of Economics of the Ca’ Foscari University of Venice on September 18, 22 and 25, 2020. Despite the remote format, an unexpectedly large number of participants and presenters attended the virtual rooms of eMAF2020. This volume covers a wide variety of subjects: artificial intelligence and machine learning in finance and insurance, behavioral finance, credit risk methods and models, dynamic optimization in finance, financial data analytics, forecasting dynamics of actuarial and financial phenomena, foreign exchange markets, insurance models, interest rate models, longevity risk, models and methods for financial time series analysis, multivariate techniques for financial markets analysis, pension systems, portfolio selection and management, real-world finance, risk analysis and management, trading systems and others. Of course, both eMAF2020 and this volume would not be possible without the collaboration of the members of the Scientific and Organizing Committees, without the support of the sponsors, namely, the Association for Mathematics Applied to Social and Economic Sciences (AMASES) and Egonon SA—Risk management and
advisor, and without the help of several partners, namely, the Computational and Methodological Statistics working group, the Venice centre in Economics and Risk Analytics for public policies (VERA) of the Ca’ Foscari University of Venice, the Centre of Quantitative Economics of the Ca’ Foscari University of Venice, Springer Nature, and the Società Italiana di Statistica (SIS). To all of them, our thanks. Finally, we are pleased to inform you that the Steering Committee is already working for the next edition in 2022. We look forward to seeing you.

Marco Corazza (Venice, Italy)
Manfred Gilli (Geneva, Switzerland)
Cira Perna (Salerno, Italy)
Claudio Pizzi (Venice, Italy)
Marilena Sibillo (Salerno, Italy)
November 2020
Contents
A Comparison Among Alternative Parameters Estimators in the Vasicek Process: A Small Sample Analysis (Giuseppina Albano, Michele La Rocca, and Cira Perna)
On the Use of Mixed Sampling in Modelling Realized Volatility: The MEM–MIDAS (Alessandra Amendola, Vincenzo Candila, Fabrizio Cipollini, and Giampiero M. Gallo)
Simultaneous Prediction Intervals for Forecasting EUR/USD Exchange Rate (Ilaria Lucrezia Amerise and Agostino Tarsitano)
An Empirical Investigation of Heavy Tails in Emerging Markets and Robust Estimation of the Pareto Tail Index (Joseph Andria and Giacomo di Tollo)
Potential of Reducing Crop Insurance Subsidy Based on Willingness to Pay and Random Forest Analysis (Rahma Anisa, Dian Kusumaningrum, Valantino Agus Sutomo, and Ken Seng Tan)
A Stochastic Volatility Model for Optimal Market-Making (Zubier Arfan and Paul Johnson)
Method for Forecasting Mortality Based on Key Rates (David Atance, Alejandro Balbas, and Eliseo Navarro)
Resampling Methods to Assess the Forecasting Ability of Mortality Models (David Atance, Ana Debón, and Eliseo Navarro)
Portfolio Optimization with Nonlinear Loss Aversion and Transaction Costs (Alessandro Avellone, Anna Maria Fiori, and Ilaria Foroni)
Monte Carlo Valuation of Future Annuity Contracts (Anna Rita Bacinello, Pietro Millossovich, and Fabio Viviano)
A Risk Based Approach for the Solvency Capital Requirement for Health Plans (Fabio Baione, Davide Biancalana, and Paolo De Angelis)
An Application of Zero-One Inflated Beta Regression Models for Predicting Health Insurance Reimbursement (Fabio Baione, Davide Biancalana, and Paolo De Angelis)
Periodic Autoregressive Models for Stochastic Seasonality (Roberto Baragona, Francesco Battaglia, and Domenico Cucina)
Behavioral Aspects in Portfolio Selection (Diana Barro, Marco Corazza, and Martina Nardon)
Stochastic Dominance in the Outer Distributions of the α-Efficiency Domain (Sergio Bianchi, Augusto Pianese, Massimiliano Frezza, and Anna Maria Palazzo)
Formal and Informal Microfinance in Nigeria. Which of Them Works? (Marinella Boccia)
Conditional Quantile Estimation for Linear ARCH Models with MIDAS Components (Vincenzo Candila and Lea Petrella)
Modelling Topics of Car Accidents Events: A Text Mining Approach (Gabriele Cantaluppi and Diego Zappa)
A Bayesian Generalized Poisson Model for Cyber Risk Analysis (Giulia Carallo, Roberto Casarin, and Christian P. Robert)
Implementation in R and Matlab of Econometric Models Applied to Ages After Retirement in Europe (Patricia Carracedo and Ana Debón)
Machine Learning in Nested Simulations Under Actuarial Uncertainty (Gilberto Castellani, Ugo Fiore, Zelda Marino, Luca Passalacqua, Francesca Perla, Salvatore Scognamiglio, and Paolo Zanetti)
Comparing RL Approaches for Applications to Financial Trading Systems (Marco Corazza, Giovanni Fasano, Riccardo Gusso, and Raffaele Pesenti)
MFG-Based Trading Model with Information Costs (Marco Corazza, Rosario Maggistro, and Raffaele Pesenti)
Trading System Mixed-Integer Optimization by PSO (Marco Corazza, Francesca Parpinel, and Claudio Pizzi)
A GARCH-Type Model with Cross-Sectional Volatility Clusters (Pietro Coretto, Michele La Rocca, and Giuseppe Storti)
A Lattice Approach to Evaluate Participating Policies in a Stochastic Interest Rate Framework (Massimo Costabile, Ivar Massabó, Emilio Russo, and Alessandro Staino)
Multidimensional Visibility for Describing the Market Dynamics Around Brexit Announcements (Maria Elena De Giuli, Andrea Flori, Daniela Lazzari, and Alessandro Spelta)
Risk Assessment in the Reverse Mortgage Contract (Emilia Di Lorenzo, Gabriella Piscopo, Marilena Sibillo, and Roberto Tizzano)
Neural Networks to Determine the Relationships Between Business Innovation and Gender Aspects (Giacomo di Tollo, Joseph Andria, and Stoyan Tanev)
Robomanagement™: Virtualizing the Asset Management Team Through Software Objects (Riccardo Donati and Marco Corazza)
Numerical Stability of Optimal Mean Variance Portfolios (Claudia Fassino, Maria-Laura Torrente, and Pierpaolo Uberti)
Pairs-Trading Strategies with Recurrent Neural Networks Market Predictions (Andrea Flori and Daniele Regoli)
Automatic Balancing Mechanism and Discount Rate: Towards an Optimal Transition to Balance Pay-As-You-Go Pension Scheme Without Intertemporal Dictatorship? (Frédéric Gannon, Florence Legros, and Vincent Touzé)
The Importance of Reporting a Pension System’s Income Statement and Budgeted Variances in a Fair and Sustainable Scheme (Anne Marie Garvey, Manuel Ventura-Marco, and Carlos Vidal-Meliá)
Improved Precision in Calibrating CreditRisk+ Model for Credit Insurance Applications (J. Giacomelli and L. Passalacqua)
A Model-Free Screening Selection Approach by Local Derivative Estimation (Francesco Giordano, Sara Milito, and Maria Lucia Parrella)
Markov Switching Predictors Under Asymmetric Loss Functions (Francesco Giordano and Marcella Niglio)
Screening Covariates in Presence of Unbalanced Binary Dependent Variable (Francesco Giordano, Marcella Niglio, and Marialuisa Restaino)
Health and Wellbeing Profiles Across Europe (Aurea Grané, Irene Albarrán, and Roger Lumley)
On Modelling of Crude Oil Futures in a Bivariate State-Space Framework (Peilun He, Karol Binkowski, Nino Kordzakhia, and Pavel Shevchenko)
A General Comovement Measure for Time Series (Agnieszka Jach)
Alternative Area Yield Index Based Crop Insurance Policies in Indonesia (Dian Kusumaningrum, Rahma Anisa, Valantino Agus Sutomo, and Ken Seng Tan)
Clustering Time Series by Nonlinear Dependence (Michele La Rocca and Luca Vitale)
Quantile Regression Neural Network for Quantile Claim Amount Estimation (Alessandro G. Laporta, Susanna Levantesi, and Lea Petrella)
Modelling Health Transitions in Italy: A Generalized Linear Model with Disability Duration (Susanna Levantesi and Massimiliano Menzietti)
Mid-Year Estimators in Life Table Construction (Josep Lledó, Jose M. Pavía, and Natalia Salazar)
Representing Koziol’s Kurtoses (Nicola Loperfido)
Optimal Portfolio for Basic DAGs (Diego Attilio Mancuso and Diego Zappa)
The Neural Network Lee–Carter Model with Parameter Uncertainty: The Case of Italy (Mario Marino and Susanna Levantesi)
Pricing of Futures with a CARMA(p, q) Model Driven by a Time Changed Brownian Motion (Lorenzo Mercuri, Andrea Perchiazzo, and Edit Rroji)
Forecasting Multiple VaR and ES Using a Dynamic Joint Quantile Regression with an Application to Portfolio Optimization (Merlo Luca, Petrella Lea, and Raponi Valentina)
Financial Market Crash Prediction Through Analysis of Stable and Pareto Distributions (Jesus-Enrique Molina, Andres Mora-Valencia, and Javier Perote)
Precision Matrix Estimation for the Global Minimum Variance Portfolio (Marco Neffelli, Maria Elena De Giuli, and Marina Resta)
Deconstructing Systemic Risk: A Reverse Stress Testing Approach (Javier Ojea-Ferreiro)
Stochastic Dominance and Portfolio Performance Under Heuristic Optimization (Adeola Oyenubi)
Big-Data for High-Frequency Volatility Analysis with Time-Deformed Observations (António A. F. Santos)
Parametric Bootstrap Estimation of Standard Errors in Survival Models When Covariates are Missing (Francesco Ungolo, Torsten Kleinow, and Angus S. Macdonald)
The Role of Correlation in Systemic Risk: Mechanisms, Effects, and Policy Implications (Stefano Zedda, Michele Patanè, and Luana Miggiano)
About the Editors
Marco Corazza, Ph.D. in “Mathematics for the Analysis of Financial Markets”, is an associate professor at the Department of Economics of the Ca’ Foscari University of Venice. Among his main research interests are static and dynamic portfolio management theories; trading system models; machine learning applications in finance; bioinspired metaheuristics for optimization; multicriteria methods for economic decision support; nonstandard probability distributions in finance; and port scheduling models and algorithms. He has participated and participates in several research projects, both at the national and international levels. He is an author/coauthor of approximately one hundred thirty scientific publications; some of them have received national and international awards. He is also editor-in-chief of the international scientific journal “Mathematical Methods in Economics and Finance”, editor of Springer books, and has been and is member of the scientific committees of several conferences and of some private companies. He combined academic activity with consulting services. Manfred Gilli is Professor emeritus at the Geneva School of Economics and Management at the University of Geneva, where he has taught numerical methods in economics and finance. He is also a faculty member of the Swiss Finance Institute, a member of the Advisory Board of Computational Statistics and Data Analysis and a member of the editorial board of Computational Economics. He formerly served as president of the Society for Computational Economics. Cira Perna is a full professor of statistics at the Department of Economics and Statistics of the University of Salerno (Italy). Since 2018, she has been elected a member of the Steering Committee of the Italian Statistical Society; since 2019, she has been a member of the Board of Directors of the University of Salerno; since the first edition of the Conference, in 2004, she has been a chair of the international conference MAF and guest editor of the associated international journals; and since 2006, she has been an Editor of the Springer books MAF. Her research work mainly focuses on nonlinear time series, artificial neural network models and resampling techniques. On these topics, she has published numerous papers in national and xiii
international journals. She has participated in several research projects, both at the national and international levels, and she has been a member of several scientific committees of national and international conferences. Claudio Pizzi is an associate professor at the Department of Economics of the Ca’ Foscari University of Venice, where he teaches statistical methods for financial and monetary markets and business statistics. His research is focused mainly on statistical analysis of financial time series, linear and nonlinear models for time series, technical analysis, trading system models, bioinspired metaheuristics for optimization and systemic risk. He has participated in both national and international research projects. He is a member of the editorial board of “Statistical Method and Applications”. Marilena Sibillo is a full professor of Mathematical Methods for Economics, Finance and Actuarial Sciences at the University of Salerno and is currently a contract professor of Financial Mathematics in the 2020/2021 academic year at Luiss University in Rome. In 2012, she was awarded a Highly Commended Award Winner at the Literati Network Awards for Excellence, and since 2013, she has been a Paul Harris Fellow. She had national and international awards related to teaching. Since 2006, she has been an editor of the Springer books MAF and Finance and a guest editor of international journals. Since 2004, she has been chair of the international conference MAF, and since 2016, she has been chair of the UNISActuarial School. She is an author of more than 100 papers mostly published in international journals and books. Her scientific activity mainly deals with risk theory, analysis and control of the interactions between financial and demographic risks, variable annuities, stochastic mortality and innovative pension contracts.
A Comparison Among Alternative Parameters Estimators in the Vasicek Process: A Small Sample Analysis Giuseppina Albano, Michele La Rocca, and Cira Perna
Abstract In this paper we perform a Monte Carlo simulation study to compare the performance of several parameters estimators in the Vasicek process when small sample sizes are available. The aim is to give useful insights to establish which estimator is better when only short time series, with a length between 20 and 200, are observed. Keywords Small samples · Martingale estimating functions · Generalized method of moments
1 Introduction Vasicek is a well known homogeneous diffusion process, often used for the evaluation of life insurance contracts and for modelling short-term interest rates. In such contexts, the data are usually yearly observed and, as a consequence, only small samples are available in the estimation procedures. Therefore, the asymptotic conditions, which ensure the good statistical properties of many parameters estimators, are not appropriate. The aim of this paper is to compare the performances of some alternative estimation procedures in the case of short time series. In particular, we consider three alternative estimators for the parameters: the classical Maximum Likelihood Estimator (MLE), the Generalized method of Moments (GMM), the linear martingale estimating function. G. Albano (B) Dip.di Studi Politici e Sociali, Università di Salerno, Salerno, Italy e-mail: [email protected] M. L. Rocca · C. Perna Dip.di Scienze Economiche e Statistiche, Università di Salerno, Salerno, Italy e-mail: [email protected] C. Perna e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_1
In this context, a small Monte Carlo experiment is implemented in order to investigate which properties of the parameter estimators still remain valid when time series with a length n between 20 and 200 are available. The paper is organized as follows. In Sect. 2 the alternative estimators are briefly reviewed. In Sect. 3 the results of the implemented simulation experiment are reported along with some concluding remarks.
2 Some Relevant Parameters Estimators

Let X_obs = (X_1, ..., X_n) be equally-spaced observations from a Vasicek process {X(t), t ∈ [0, +∞)} described by the following stochastic differential equation (SDE):

d X_t = k(α − X_t) dt + σ d B_t,    (1)

where B_t is a standard Brownian motion. The unknown parameter vector θ = (α, k, σ) can be estimated by means of several approaches (see, for example, [8]). In the following we briefly review those most used in the literature.
2.1 Maximum Likelihood Estimator

Since, for the Vasicek process, the conditional distribution is Normal, the likelihood function of θ = (k, α, σ) is

L(θ) = φ(σ^{−1} √(2k) (X_0 − α)) × ∏_{t=1}^{n} φ(σ^{−1} √(2k (1 − e^{−2kδ})^{−1}) {X_t − X_{t−1} e^{−kδ} − α(1 − e^{−kδ})}),

where φ denotes the standard Normal density and δ is the time step between consecutive observations.
The MLE estimator of θ can be explicitly derived (see, [1, 2, 9] for details).
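For the Vasicek model the transition density is Gaussian, so the conditional ML estimates have a closed form that coincides with ordinary least squares on the implied AR(1) representation. The following Python sketch illustrates this computation; the function name and interface are illustrative, it conditions on the first observation (thus ignoring the stationary factor of the likelihood above), and it assumes a constant time step δ.

```python
import numpy as np

def vasicek_mle(x, delta=1.0):
    """Closed-form (conditional) ML estimates of (k, alpha, sigma) from the exact
    AR(1) representation X_t = c + b*X_{t-1} + eps, eps ~ N(0, v), where
    b = exp(-k*delta), c = alpha*(1 - b), v = sigma^2*(1 - b^2)/(2k)."""
    x = np.asarray(x, dtype=float)
    z, y = x[:-1], x[1:]
    # OLS slope and intercept of y on z coincide with the conditional MLE
    b = np.cov(z, y, bias=True)[0, 1] / np.var(z)
    c = y.mean() - b * z.mean()
    v = np.mean((y - c - b * z) ** 2)
    k = -np.log(b) / delta
    alpha = c / (1.0 - b)
    sigma2 = v * 2.0 * k / (1.0 - b ** 2)
    return k, alpha, np.sqrt(sigma2)
```

Note that in very small samples the fitted autoregressive coefficient b̂ may fall outside (0, 1), in which case k̂ = −log(b̂)/δ is not admissible; this is the same phenomenon observed for n < 50 in the simulations of Sect. 3.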
2.2 Generalized Method of Moments

The generalized method of moments is based on the matching of theoretical moments and sample moments. Let u_i = u(X_i; θ) ∈ R^r, r ≥ 3 (the number of unknown parameters), such that the orthogonality condition holds, i.e. if θ_0 is the true value of the parameter θ:

E[u(X_i; θ)] = 0 only if θ = θ_0.    (2)
Usually, the function u(θ) is the difference between the exact k-th moment and X_i^k for some powers k. Let

g_n(θ) = ∑_{i=1}^{n} u(X_i; θ)

be the sample counterpart of (2). The optimization problem is transformed as

θ̂ = argmin_θ g_n(θ)^T W g_n(θ),

where W is a positive definite matrix of weights. The choice of W as the inverse of the long-run covariance matrix guarantees the smallest asymptotic covariance matrix of the GMM estimator. The covariance matrix is estimated from the data by looking at its sample counterpart. In [4] a two-step procedure is provided to obtain the GMM estimator for the parameters of a general diffusion X(t). The GMM estimators are asymptotically normal. Under additional regularity conditions, the estimator is also consistent (see [4]).
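As an illustration of the approach, the sketch below sets up a first-step GMM estimation in Python with W equal to the identity matrix; the three moment conditions, based on the stationary mean, variance and lag-one autocovariance of the Vasicek process, are an illustrative choice and are not necessarily those used in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_vasicek(x, delta=1.0):
    """First-step GMM (W = identity) for theta = (k, alpha, sigma) with three
    illustrative moment conditions implied by the stationary law of the process:
    mean alpha, variance sigma^2/(2k), lag-1 autocovariance exp(-k*delta)*variance."""
    x = np.asarray(x, dtype=float)

    def g_n(theta):
        k, alpha, sigma = theta
        var = sigma ** 2 / (2.0 * k)
        u1 = x[1:] - alpha
        u2 = (x[1:] - alpha) ** 2 - var
        u3 = (x[1:] - alpha) * (x[:-1] - alpha) - np.exp(-k * delta) * var
        return np.array([u1.sum(), u2.sum(), u3.sum()])

    def objective(theta):
        k, alpha, sigma = theta
        if k <= 0 or sigma <= 0:          # keep the search in the admissible region
            return np.inf
        g = g_n(theta)
        return g @ g                       # quadratic form with W = identity

    theta0 = np.array([0.5, x.mean(), x.std()])
    return minimize(objective, theta0, method="Nelder-Mead").x
```

In a second step the weighting matrix would be replaced by an estimate of the inverse long-run covariance of the moment conditions, as described above.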
2.3 Martingale Estimating Functions

Estimating functions (see [5] for a complete reference) are functions F_n such that F_n(X_obs; θ) = 0. They should be developed ad hoc for each observational model, and there might exist different kinds of them. A relevant case is the martingale estimating function: these are estimating functions G_n(θ) that are also martingales, i.e.

E[G_n(θ) | F_{n−1}] = G_{n−1}(θ),

where F_{n−1} is the filtration defined as σ(X_1, ..., X_{n−1}). Particular forms of estimating functions leading to explicit estimators are the polynomial martingale estimating functions; the linear and quadratic ones rely on the knowledge of the first (and second) conditional moments. In particular we focus on the optimal estimating function proposed in [3]:

G_n(θ) = ∑_{i=1}^{n} [∂_θ b(X_{i−1}, θ) / σ^2] [X_i − E(X_i | X_{i−1})],    (3)

where b(·, θ) denotes the drift of the process.
When conditional moments are known, the estimators are also consistent. In general, this type of estimating function provides estimators that are very robust to model misspecification [7].
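For the Vasicek process the drift is b(x, θ) = k(α − x) and the conditional mean E(X_i | X_{i−1}) = α + (X_{i−1} − α) e^{−kδ} is known in closed form, so (3) can be solved numerically for (k, α); the constant factor 1/σ^2 does not affect the root. The following Python sketch is illustrative (the function name, starting values and the assumption δ = 1 are mine, not the paper's).

```python
import numpy as np
from scipy.optimize import fsolve

def martingale_estimate(x, delta=1.0):
    """Solves G_n(theta) = 0 for theta = (k, alpha) in the linear martingale
    estimating function (3), using the exact conditional mean of the Vasicek
    process; the factor 1/sigma^2 is constant and drops out of the root."""
    x = np.asarray(x, dtype=float)
    xprev, xcurr = x[:-1], x[1:]

    def G(theta):
        k, alpha = theta
        resid = xcurr - (alpha + (xprev - alpha) * np.exp(-k * delta))
        g_k = np.sum((alpha - xprev) * resid)   # component from d b / d k
        g_alpha = np.sum(k * resid)             # component from d b / d alpha
        return [g_k, g_alpha]

    return fsolve(G, [0.5, x.mean()])
```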
3 Simulations and Some Final Remarks

We implement a small Monte Carlo experiment in order to compare the proposed estimators in the case of short time series with a length between 20 and 200. Following [1] we simulate three Vasicek processes, V1, V2 and V3, with parameters (k, α, σ^2) = (0.858, 0.0891, 0.00219) (Model V1), (0.215, 0.0891, 0.0005) (Model V2), (0.140, 0.0891, 0.0003) (Model V3). Since in [9] it is shown that the bias and variance of σ̂^2 are O(n^{−1}), and hence converge to zero much faster than those of α̂ and k̂, in this simulation we assume that σ in (1) is known and we focus only on the estimates of k and α. All simulations are based on 500 runs with sample size n = 20, 30, 50, 100, 200.

In Fig. 1 the boxplots of the MLEs are shown for the parameters α (on the left) and k (on the right). We can observe that for small sample sizes (n = 20 and 30) the range of the estimates of α is very high, decreasing already for n = 50. Anyway, the bias seems not remarkable for all the sample sizes. A similar behaviour can be observed also for k, even if for the models V2 and V3 the variability decreases more slowly as the sample size increases.

In Fig. 2 the results for the GMM estimates are plotted, showing the quite good performance of these estimates, especially for the parameter k. Also in this case the bias seems to be small, although several outliers are present for the considered sample sizes. The variability seems higher for α, in particular for the model V1, in which it becomes smaller only for n = 200.

Figure 3 shows the estimates provided by the martingale estimating function as in (3). They show a high bias for small samples, in particular for k in model V1, but the relative bias remains not so high. Further, the variability seems small already for small sample sizes and it does not substantially decrease as n increases.
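Such replications can be generated with the exact Gaussian transition of the process, which avoids any discretisation bias. The sketch below (in Python; the function name, the default time step δ = 1 and the use of the stationary law for the starting value are illustrative assumptions) simulates one path of Model V1.

```python
import numpy as np

def simulate_vasicek(k, alpha, sigma2, n, delta=1.0, x0=None, rng=None):
    """Exact simulation of n equally spaced observations of a Vasicek process,
    drawing from the Gaussian transition density at each step."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(sigma2)
    b = np.exp(-k * delta)
    sd = sigma * np.sqrt((1.0 - b ** 2) / (2.0 * k))   # conditional std. deviation
    x = np.empty(n)
    # start from the stationary distribution N(alpha, sigma^2/(2k)) unless x0 is given
    x[0] = x0 if x0 is not None else rng.normal(alpha, sigma / np.sqrt(2.0 * k))
    for i in range(1, n):
        x[i] = alpha + (x[i - 1] - alpha) * b + sd * rng.standard_normal()
    return x

# one replication of Model V1 with n = 50
path = simulate_vasicek(k=0.858, alpha=0.0891, sigma2=0.00219, n=50)
```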
Fig. 1 Vasicek model: results of the MLE estimates of the parameters α (on the left) and k (on the right) based on 500 Monte Carlo runs
Fig. 2 Vasicek model: results of the GMM estimates of the parameters α (on the left) and k (on the right) based on 500 Monte Carlo runs
Fig. 3 Vasicek model: results of the martingale estimates of the parameters α (on the left) and k (on the right) based on 500 Monte Carlo runs
Since the MLE and GMM estimators seem to have similar performance, and better than the linear martingale estimator, in Fig. 4 they are plotted together. It is clear that MLE provides better estimates for both parameters in almost all the cases. Clearly, outliers are still present for n = 200, and non-admissible values (negative values of the estimates) are obtained when the sample size is less than 50.
Fig. 4 Vasicek model: results of the estimates of the parameters α (on the left) and k (on the right) based on 500 Monte Carlo runs
References

1. Albano, G., La Rocca, M., Perna, C.: Small sample analysis in diffusion processes: a simulation study. In: Corazza, M., et al. (eds.) Mathematical and Statistical Methods for Actuarial Sciences and Finance, pp. 19–23. Springer, Cham (2018)
2. Albano, G., La Rocca, M., Perna, C.: Small sample properties of ML estimator in Vasicek and CIR models: a simulation experiment. Decis. Econ. Finance 42, 1–15 (2019)
3. Bibby, B.M., Sorensen, M.: Martingale estimation functions for discretely observed diffusion processes. Bernoulli 1(1–2), 17–39 (1995)
4. Hansen, L.P.: Large sample properties of generalized method of moments estimators. Econometrica 50(4), 1029–1054 (1982)
5. Heyde, C.C.: Quasi-likelihood and Its Applications: A General Approach to Optimal Parameter Estimation. Springer, New York (1997)
6. Kessler, M.: Simple and explicit estimating functions for a discretely observed diffusion process. Scand. J. Stat. 27, 65–82 (2000)
7. Iacus, S.M.: Simulation and Inference for Stochastic Differential Equations: With R Examples. Springer Series in Statistics, New York (2008)
8. Sorensen, H.: Parametric inference for diffusion processes observed at discrete points in time: a survey. Int. Stat. Rev. 72(3), 337–354 (2004)
9. Tang, C.Y., Chen, S.X.: Parameter estimation and bias correction for diffusion processes. J. Econ. 149(1), 65–81 (2009)
On the Use of Mixed Sampling in Modelling Realized Volatility: The MEM–MIDAS Alessandra Amendola, Vincenzo Candila, Fabrizio Cipollini, and Giampiero M. Gallo
Abstract When dealing with market activity, different frequency of observation may reveal relevant information of interest to model financial time series. We embed a MIDAS (MI(xed)–DA(ta) Sampling) component in a multiplicative error model (MEM) context (MEM–MIDAS). The proposed specification considers a low frequency component, say monthly, in the conditional expectation of a daily nonnegative process. The empirical application illustrates the performance of the MEM– MIDAS model on the realized volatility of the NASDAQ index, statistically outperforming the standard MEM model and other popular specifications. Keywords Realized volatility · Multiplicative error model · MIDAS
1 Introduction More than forty years have passed since Engle’s pioneering works on modeling the conditional variance as an autoregressive process of observable variables. GARCHtype models are still playing a significant role in the financial econometrics literature. This is mainly due to the fact that this class of models allows to reflect several stylized facts, such as the persistence in the conditional second moments (volatility clustering) and the possibility of taking into account the slow moving or state dependent A. Amendola (B) Department of Economics and Statistics, University of Salerno, Salerno, Italy e-mail: [email protected] V. Candila MEMOTEF Department, Sapienza University of Rome, Rome, Italy e-mail: [email protected] F. Cipollini Department of Statistics “G. Parenti”, University of Florence, Florence, Italy e-mail: [email protected] G. M. Gallo NYU in Florence, Florence, Italy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_2
average volatility level. The latter stylized fact can be suitably accommodated assuming that the dynamic evolution of volatility is driven by two components, a high- and a low-frequency one, which combine additively or multiplicatively. Several suggestions exist in the GARCH literature to model the low frequency component ([1] offer a comprehensive survey of the contributions in this field). For instance, [2] consider a Markov Switching framework and [3] introduce deterministic functions in order to make the unconditional variance time-varying. More recently, in order to address the issue of the low frequency component driven by macro–variables observed at lower frequencies than that of the asset returns (typically daily), the MIDAS (MI(xed)– DA(ta) Sampling, [4]) terms have appeared in the GARCH class of models [5]. A further extension, the Double Asymmetric GARCH–MIDAS (DAGM) was introduced by [6], where a variable available at a low frequency drives the slow moving level of volatility and is allowed to have differentiated effects according to its sign, determining a local time–varying trend around which a GJR–GARCH describes the short run dynamics. New emphasis to the volatility modeling literature has also been given by the advent of ultra-high frequency data, which have contributed to this framework under several aspects. First, high-frequency based volatility estimators, such as the realized variance (RV), stemming from [7], have become an ideal target for evaluating forecasting performances. Second, the high-frequency based estimators provide an ideal tool to deal with intra-daily information. Third, the models using Realized variance, like the Multiplicative Error Model (MEM, [8, 9]), the Heterogeneous Autoregressive Model (HAR, [10]) and Realized GARCH (RGARCH, [11]), were shown to be better capable of exploiting available information than those only based on squared returns. Within the MEM context, the low-frequency component has been estimated in several ways: through regime switching and smooth transition functions [12], by deterministic splines [13] or by a semi-non-parametric vector MEM, where the low-frequency term affecting several assets is obtained non-parametrically [14]. The present contribution fills a gap in the current literature: the inclusion of MIDAS terms within the MEM framework. The proposed MEM–MIDAS assumes that the conditional expectation of a non-negative process, like the realized volatility, observed daily may accommodate a low frequency component, say weekly or monthly. The empirical application highlights the benefits of such approach: with reference to the volatility of the NASDAQ index, for the period 2001–2019, the MEM–MIDAS outperforms in-sample the standard MEM and other popular models. The rest of the paper is organized as follows. Section 2 illustrates the proposed specification while Sect. 3 is devoted to the empirical analysis.
2 The MEM–MIDAS Let {xi,t } be the time series of a discrete time process for the day i, with i = 1, . . . , Nt , with Nt representing the number of days within the low-frequency period t, which may be a week, a month or a quarter, for instance; overall, we have
T low-frequency periods. We assume that x_{i,t}, ∀ i and t, is observed on the set R_{≥0} = {x_{i,t} ∈ R | x_{i,t} ≥ 0}, an assumption which is suitable when realized volatility or high–low range processes are the object of interest x_{i,t}. Conditionally on the information set F_{i−1,t}, at day i of the period t, the simple MEM specifies x_{i,t} as:

x_{i,t} | F_{i−1,t} = μ_{i,t} ε_{i,t},  with i = 1, ..., N_t and t = 1, ..., T,    (1)
where μ_{i,t} is a quantity that, conditionally on F_{i−1,t} and given a parameter vector, evolves deterministically, and ε_{i,t} is the error term. Note that F_{0,t} ≡ F_{N_{t−1}, t−1}.

In line with the MEM literature, we assume that ε_{i,t} | F_{i−1,t} ∼ D^+(1, σ^2), i.i.d., that is, the error term has a unit mean, unknown variance σ^2 and a probability density function defined over a non-negative support. Therefore, independently of the chosen distribution D^+ or the function used to build the evolution of μ_{i,t}, we have that:

E(x_{i,t} | F_{i−1,t}) = μ_{i,t};    (2)
VAR(x_{i,t} | F_{i−1,t}) = σ^2 μ_{i,t}^2.    (3)
In a standard or base MEM(1,1) specification, the conditional expectation of x_{i,t} evolves as:

μ_{i,t} = α_0 + α x_{i−1,t} + β μ_{i−1,t}.    (4)

Assuming that {x_{i,t}} is mean-stationary leads to E(x_{i,t}) = E(μ_{i,t}) ≡ μ. Therefore, the constant α_0 in (4) can be replaced by α_0 = (1 − α − β) μ. A slowly evolving component of volatility can be inserted: the proposed MEM–MIDAS specifies such a low-frequency component as a function of one (or more) additional exogenous variables, labelled as X_t, observed at each period t. Hence, the dynamics of x_{i,t} will depend on two components, labelled as short- and long-run components. The former (still labelled μ_{i,t}) varies with each day i and the latter (denoted by τ_t) with each period t. Therefore, the MEM–MIDAS model is defined as:

x_{i,t} | F_{i−1,t} = μ_{i,t} τ_t ε_{i,t},  with i = 1, ..., N_t and t = 1, ..., T;    (5)
τ_t = exp(m + θ ∑_{k=1}^{K} δ_k(ω) X_{t−k});    (6)
μ_{i,t} = (1 − α − β) μ + α x_{i−1,t}/τ_t + β μ_{i−1,t}.    (7)

The long-run component τ_t in Eq. (6) is a one-sided filter, in the spirit of the MIDAS regression [4], of the past K realizations of the variable X_t. The coefficient m represents the average level of the long-run equation, around which τ_t fluctuates according to the impact of the weighted sum expressed by θ ∑_{k=1}^{K} δ_k(ω) X_{t−k}. X_t can be any macro-economic force potentially driving the volatility x_{i,t}. The only restriction we need to impose is the strict stationarity of X_t. In Eq. (6), θ signals
the impact of such a sum, whose addends are weighted according to suitable weighting functions. In line with the related literature, we use the Beta function:

δ_k(ω) = (k/K)^{ω_1 − 1} (1 − k/K)^{ω_2 − 1} / ∑_{j=1}^{K} (j/K)^{ω_1 − 1} (1 − j/K)^{ω_2 − 1}.    (8)
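As a small numerical illustration, the Beta weights of Eq. (8) and the resulting long-run component (6) can be computed as follows; the function names and the handling of the first K periods (for which not all lags of X_t are available) are illustrative choices, not part of the paper.

```python
import numpy as np

def beta_weights(K, omega2, omega1=1.0):
    """Beta lag polynomial of Eq. (8); with omega1 = 1 and omega2 >= 1 the
    weights decay so that more recent observations receive more weight."""
    k = np.arange(1, K + 1) / K
    w = k ** (omega1 - 1.0) * (1.0 - k) ** (omega2 - 1.0)
    return w / w.sum()

def long_run_component(X, m, theta, omega2, K):
    """tau_t = exp(m + theta * sum_k delta_k(omega) X_{t-k}), computed for the
    periods t for which K lagged values of X are available."""
    X = np.asarray(X, dtype=float)
    w = beta_weights(K, omega2)
    tau = []
    for t in range(K, len(X)):
        lags = X[t - K:t][::-1]            # X_{t-1}, ..., X_{t-K}
        tau.append(np.exp(m + theta * np.dot(w, lags)))
    return np.array(tau)
```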
In posing the constraints ω_1 = 1 and ω_2 ≥ 1, we only focus on the cases in which more emphasis is given to the most recent observations. So far, the parameter space of the proposed MEM–MIDAS consists of Θ_{MEM−M} = {α, β, m, θ, ω_2}, while μ can be replaced by its sample mean. However, the model (as the base MEM) is flexible enough to include some asymmetric terms both in the short- and long-run equations (as in [6], for instance) and/or some additional lags of the variable x_{i,t}. Suppose that x_{i,t} is a volatility measure, such as the realized volatility. If the interest is in evaluating the impact of negative lagged daily returns r_{i−1,t} on today's volatility, the short-run equation of the MEM–MIDAS transforms to:

μ_{i,t} = (1 − α − β − γ/2) μ + (α + γ · 1(r_{i−1,t} < 0)) x_{i−1,t}/τ_t + β μ_{i−1,t}.

J^R_{t+W+1} = 1 if R_{t+W+1} < VaR_{t:t+W}, and 0 otherwise.
From Fig. 2 it is possible to observe how the Gaussian assumption, in all cases, underestimates the extreme losses (VaR 99%), as J is much bigger than 1 − p; the estimation error is much more appreciable for the Mexico and Qatar markets (Fig. 2c, b). Furthermore, the Stable distribution in most cases overestimates the VaR at 99%, with significant evidence for the Qatar market (Fig. 2c). In all cases, for a relevant choice of the window width W, the Pareto distribution provides a clearly better fit.
4 Tail Index Estimation The optimization problem herein proposed is based on a parameter tuning/fitting procedure. In detail, as said in Sect. 3, if VaR at p level reveals to be a good estimator in predicting actual extreme losses, then the number of loss occurrences which fall beyond VaR’s confidence level should be close to 1 − p for a relevant choice of the parameters. We introduce the following optimization problem: arg min W,dn ,u n ,w
∑_{t=1}^{T−W} [J_{t+W} − Pr(L_{T−W} ≥ VaR_p)]^2 / (number of observations),    (7)
where W, d_n, u_n, w are the parameters as defined in Sect. 2. By solving (7) we find the optimal parameter settings, from which we estimate the corresponding optimal α value in Eq. (1). Problem (7) is solved by means of threshold accepting (TA) [9–13], a local-search trajectory method.
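For reference, a minimal, generic threshold accepting loop in Python is sketched below; the candidate-generation function, the threshold sequence and the interface are illustrative placeholders rather than the authors' TAVaR implementation, which additionally uses restarts and the problem-specific objective (7).

```python
import numpy as np

def threshold_accepting(objective, neighbour, x0, thresholds, steps_per_round, rng=None):
    """Generic threshold accepting: a candidate drawn from `neighbour` is accepted
    whenever it does not worsen the objective by more than the current threshold.
    `thresholds` is a decreasing sequence, one value per round."""
    rng = np.random.default_rng() if rng is None else rng
    x, fx = x0, objective(x0)
    best, fbest = x, fx
    for tau in thresholds:
        for _ in range(steps_per_round):
            cand = neighbour(x, rng)
            fcand = objective(cand)
            if fcand - fx <= tau:          # accept small deteriorations
                x, fx = cand, fcand
                if fx < fbest:
                    best, fbest = x, fx
    return best, fbest
```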
Table 1 Monte Carlo simulation results (RB and RRMSE)

                 α = 1.0, T = 350   α = 1.5, T = 150   α = 2.0, T = 80   α = 3.0, T = 100   α = 3.0, T = 80
TAVaR   RB       −0.8891            −0.6721            −0.5200            0.5715             0.2107
        RRMSE     4.0927             6.3498             9.0725           10.8392             9.5750
Hill    RB       −1.1366            −1.2692            −1.6580           −2.3173            −3.2845
        RRMSE     7.6908            11.6588            11.3871           14.7511            15.2885
R-Hill  RB       −0.5499            −0.8209            −0.6950           −1.0849            −2.2796
        RRMSE     4.8580             6.7343            11.9587            5.6050             –
We ran a Monte Carlo simulation in order to test our algorithm and assess the robustness of our estimates in terms of both the relative bias (RB) and the relative root-mean-square error (RRMSE). In detail, the relative bias of an estimator is given by

RB = (100/α) (1/m) ∑_{i=1}^{m} (α̂_i − α),    (8)

where α is the actual value of the Pareto tail index, α̂_i is the estimated one with respect to the i-th (i = 1, ..., m) simulated sample and m is the number of simulations. The relative root-mean-square error is defined as

RRMSE = (100/α) √[(1/m) ∑_{i=1}^{m} (α̂_i − α)^2].    (9)
5 Conclusion In this work, we analyze and compare the performances of VaR based estimators over three classes of density distributions, i.e., Gaussian, Stable and Pareto, and with respect to three different emerging markets: Egypt, Qatar and Mexico. This study is aimed at investigating if well known results in terms of tail behaviour properties of price changes in traditional markets also apply to emerging ones. Accordingly to
results obtained for traditional markets, also for the investigated emerging markets, our results led to the conclusion that power law distributions give the best results and appear to be the most effective with respect to the VaR risk measure. By means of a Threshold Accepting algorithm, we also propose a framework to optimally estimating the Pareto tail index. The herein presented results show that our estimates obtained by the proposed TAVaR algorithm outperform those from the two other compared approaches as they exhibit the smallest root-mean square error.
References

1. Mandelbrot, B.: The variation of certain speculative prices. J. Bus. 36(4), 394–419 (1963)
2. Fama, E.F.: Mandelbrot and the stable paretian hypothesis. J. Bus. 36(4), 420–429 (1963)
3. Lévy, P.: Calcul des Probabilités. F. Rouge, Paris (1925)
4. Pareto, V.: Cours d'Economie Politique. F. Rouge, Lausanne, Switzerland (1896)
5. Jorion, P.: Value at Risk: The New Benchmark for Managing Financial Risk. McGraw-Hill (2006)
6. Champagnat, N., Deaconu, M., Lejay, A., Navet, N., Boukherouaa, S.: An empirical analysis of heavy-tails behavior of financial data: the case for power laws. Working paper or preprint (2013)
7. Weissman, I.: Estimation of parameters and larger quantiles based on the k largest observations. J. Am. Stat. Assoc. 73(364), 812–815 (1978)
8. Campbell, S.D.: A review of backtesting and backtesting procedures. Technical Report (2005)
9. Dueck, G., Scheuer, T.: Threshold accepting: a general purpose optimization algorithm appearing superior to simulated annealing. J. Comput. Phys. 90(1), 161–175 (1990)
10. Gilli, M., Schumann, E.: Optimal enough? J. Heuristics 17(4), 373–387 (2011)
11. Gilli, M., Winker, P.: Heuristic optimization methods in econometrics. In: Belsley, D.A., Kontoghiorghes, E. (eds.) Handbook of Computational Econometrics. Wiley, New York (2009)
12. Moscato, P., Fontanari, J.F.: Stochastic versus deterministic update in simulated annealing. Phys. Lett. A 146(4), 204–208 (1990)
13. Andria, J., di Tollo, G., Lokketangen, A.: Distance measures for portfolio selection. In: Masri, H., Pérez-Gladish, B., Zopounidis, C. (eds.) Financial Decision Aid Using Multiple Criteria: Recent Models and Applications, pp. 113–129 (2018)
14. Dell'Aquila, R., Embrechts, P.: Extremes and robustness: a contradiction? Financial Markets Portfolio Manag. 20(1), 103–118 (2006)
15. Hubert, M., Dierckx, G., Vanpaemel, D.: Detecting influential data points for the Hill estimator in Pareto-type distributions. Comput. Stat. Data Anal. 65(C), 13–28 (2013)
Potential of Reducing Crop Insurance Subsidy Based on Willingness to Pay and Random Forest Analysis Rahma Anisa, Dian Kusumaningrum, Valantino Agus Sutomo, and Ken Seng Tan
Abstract Indonesia has recently piloted a national paddy insurance program. The paddy insurance is heavily subsidized by the government generously covering 80% of the premium. Because the agricultural insurance is still in its infancy in Indonesia, it is imperative to have an insurance scheme that is effective and sustainable. To better understand the existing insurance scheme as well as farmers attitude towards paddy insurance, an extensive survey on the paddy farmers was conducted. Through formal statistical analysis based on the Wilcoxon signed-rank test, farmers’ willingness to pay (WTP) is found to be higher than the current subsidized premium for a satisfactory insurance scheme. This implies that the government subsidy could be lowered from 80% to 72%. Further analysis based on the method of random forest classification allows to identify most important factors for affecting farmers’ WTP. The first four most important factors are willingness to buy crop insurance, choice of insurance plan, satisfaction toward ease of premium administration from previous owned insurance, and farmers’ perception on the priority of insurance company trustworthiness. This is based on criteria of mean decrease of Gini impurity. These analysis and findings provide valuable guidance in revamping the existing paddy insurance program, including exploring the possibility of reducing premium subsidy. Keywords Crop insurance · Indonesia · Random forest classification · Survey data · Willingness to pay R. Anisa (B) Department of Statistics, IPB University, Jl.Meranti Wing 22 Level 4, IPB Campus Dramaga, Bogor, Indonesia 16680 e-mail: [email protected] D. Kusumaningrum · V. A. Sutomo School of Applied STEM, Prasetiya Mulya University, BSD, Tangerang, Indonesia 15339 e-mail: [email protected] V. A. Sutomo e-mail: [email protected] K. S. Tan Division of Banking & Finance, Nanyang Business School, Nanyang Technological University, Nanyang, Singapore e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_5
1 Introduction and Paddy Insurance

Agriculture has been playing an indispensable role in Indonesia's economy, with an estimated 57 million hectares (ha) of agricultural land and about one third of the workforce involved.¹ In particular, in 2018 there were around 13 million agricultural households whose main income came from cultivating paddy [1]. The incomes of these households, therefore, tie directly to the productivity of paddy, which in turn is vulnerable to a number of factors. While there are factors (such as farming method and fertilizer usage) that are within the control of farmers, there are many other external factors (such as pests, diseases, and weather related risks including flood and drought) that are beyond their control. To ensure stability of paddy households' income and minimize disruption in the food supply, the Indonesian government through the Ministry of Agriculture (MoA) introduced paddy crop insurance (Asuransi Usaha Tani Padi/AUTP) in 2012 [2, 3]. The AUTP program began to pilot in selected provinces in Indonesia with Jasindo, a state-owned insurance company, as the designated insurer [4]. The policy protects farmers with land up to 2 ha and covers crop failures caused by floods, droughts, pests and disease. The indemnity depends on the severity of the damaged land area, up to a maximum payout of Rp 6 million/ha. The premium for the insurance policy was determined to be Rp 180,000/ha, but farmers are only required to pay 20% of the full premium (i.e. Rp 36,000/ha) while the remaining 80% is subsidized by the government. Although by now the AUTP program has been in the market for a few years, farmers are more familiar with the insurance scheme, and Jasindo has more relevant data (such as incurred expenses and claim experience) for underwriting the paddy insurance, no changes have been made to the insurance policy and its premium since it was first implemented. For these reasons, the objective of this paper is to provide a better understanding of the AUTP scheme from the perspective of farmers. Through a comprehensive survey conducted on paddy farmers, valuable information can be collected to better understand their perception of paddy insurance. In particular, we analyze farmers' willingness to pay (WTP) using the Wilcoxon signed-rank test. We also investigate factors that affect farmers' WTP using the random forest classification method. These analyses and findings will be extremely useful for updating or re-designing the AUTP program to ensure its sustainability (such as decreasing the financial burden of the government by reducing the subsidy level) and to provide better coverage for the farmers.
2 Survey Data Set The study in this paper is based on the primary data by surveying 625 paddy farmers in West Java, Central Java, East Java, North Sumatera, and West Nusa Tenggara. The survey was conducted in years 2018 and 2019. The survey was very comprehensive 1
https://oxfordbusinessgroup.com/indonesia-2019/agriculturereport_launcher.
which covered a wide array of issues. Here we just extract information of relevance to this paper; i.e. questions related to WTP and its influencing factors: 1. Farmers’ personal description and capacity: (a) age, (b) farming experience, and (c) years spent to pursue formal education. 2. Farmers’ welfare: (a) monthly income per capita, (b) size of farming land, and (c) whether the farmers are poor or not. 3. Farmers’ exposure of knowledge: (a) exposure to AUTP program, (b) interaction with extension worker, and (c) membership in a farmer group. 4. Risk exposure: risk of drought, flood, pest, plant disease, typhoon, and others. 5. Insurance experience: (a) satisfaction of previous insurance product, (b) importance of attributes in an insurance product, and (c) choice of insurance plan.
3 Methodology After we had completed the survey, a considerable effort was spent on data cleaning and data aggregating while ensuring data integrity. Then we proceeded with the analysis as described in the following two procedures, with the first procedure relates to investigating the possibility of reducing insurance premium’s subsidy via farmers’ WTP and the second procedure on identifying the key factors that affect WTP. (i) Willingness to pay analysis a. Explore the data to obtain general description whether there is any indication that we can lower the premium subsidy. This is also required to investigate the distribution of WTP. b. Perform hypothesis testing to investigate the possibility of reducing premium subsidy. The hypothesis that we are testing is H0 : µ ≤ 36, 000
versus  H1: µ > 36,000,
where µ denotes population mean of farmers’ WTP (in Rupiah). If WTP is normally distributed, then the hypothesis testing can be performed using t-test. Otherwise, Wilcoxon signed-rank test will be performed. c. If the null hypothesis is rejected, then it can be concluded that there is sufficient evidence that for the paddy insurance product the farmers are willing to pay higher than the subsidized premium. d. To determine the exact farmers’ WTP, the median of WTP is used as the approximation as it describes the amount at which most farmers are willing to pay. (ii) Factors that affect WTP analysis a. Perform data binning in order to reveal the best relation between the explanatory and the observed variable. b. Perform predictive modeling using random forest method.
Table 1 Survey results of farmers' WTP for paddy crop insurance

Willingness to     Frequency   Willingness to pay (in Rupiah)
join AUTP                      Min.   1st quartile   Median   Mean     3rd quartile   Max.
Yes                541         0      36,000         50,000   58,353   100,000        1,000,000
No                 55          0      0              10,000   13,673   20,000         108,000
c. Evaluate the model accuracy. d. Compute the variable importance based on criteria of mean decrease impurity (corrected). e. Investigate which variables have a large value of variable importance based on both criteria in the preceding step.
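As an illustration of step (i)b above, the one-sided Wilcoxon signed-rank test can be run in Python with scipy; the data file name is a hypothetical placeholder for the surveyed WTP values.

```python
import numpy as np
from scipy.stats import wilcoxon

# hypothetical file holding the surveyed WTP amounts (in Rupiah), one per farmer
wtp = np.loadtxt("wtp_survey.txt")

premium = 36_000
# one-sided signed-rank test of H0: WTP is centred at or below the subsidized
# premium, against H1: it is centred above it; answers equal to exactly 36,000
# are dropped by the default zero_method
stat, p_value = wilcoxon(wtp - premium, alternative="greater")
print(f"test statistic = {stat:.0f}, p-value = {p_value:.2e}")
```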
4 Results and Discussion

4.1 Willingness to Pay

Table 1 summarizes some key survey results of relevance to WTP. The results are highly dependent on farmers' willingness to join the AUTP program. Among the 625 farmers that we surveyed, 86.56% (i.e. 541 farmers) are willing to have paddy insurance. Conditioned on their interest to join, they are willing to pay a substantially higher insurance premium: the median WTP is Rp 50,000, five times higher than that of the farmers who are not interested in insurance. The results in Table 1 are illuminating in that most farmers find the AUTP program appealing and that their median WTP is higher than the current subsidized premium of Rp 36,000. This suggests that the government could potentially reduce its financial commitment to the AUTP program by lowering the current level of premium subsidy. To formally evaluate the statistical significance of reducing the premium subsidy, we conduct a hypothesis test. Given that the distribution of WTP tends to skew to the right (and thus the t-test is not appropriate), we resort to the Wilcoxon signed-rank test. Table 2 shows that both groups of farmers have extremely low p-values. Therefore, the null hypothesis can be rejected at the 5% level of significance, and it is statistically significant that the farmers' WTP exceeds the current subsidized premium. Furthermore, using the median of WTP to approximate farmers' WTP, farmers are willing to pay as much as Rp 50,000/ha to join AUTP if they are satisfied with the insurance product.
Table 2 Wilcoxon signed-rank test results

Farmers                    Test statistics   p-value
Interested to join AUTP    78035             2.2 × 10−16
All farmers                92295             7.6 × 10−15
4.2 Factors That Affect WTP

We now analyze factors that affect WTP by using the random forest (RF) classification with 1000 trees. In order to use the entire data set, imputation is applied to the data. WTP (in Rupiah) is classified into four categories: (i) up to 36,000; (ii) (36,000, 100,000]; (iii) (100,000, 180,000]; and (iv) above 180,000. Most farmers are in the first two categories, with proportions as high as 49% and 48%, respectively. Based on the RF classification method, the accuracy rates for these categories are (i) 65.9%, (ii) 64.1%, (iii) 0%, and (iv) 0%, respectively. These results show that the RF classification method provides reasonably good accuracy at describing WTP up to Rp 100,000. Several criteria are used to evaluate the goodness of fit of the random forest classification: both the out-of-bag error (37.28%) and the accuracy (62.72%) take reasonable values. Variable importance is used in this study in order to gain insight into the factors that affect WTP. The mean decrease of corrected impurity is computed to determine the variables that are most important in classifying the WTP. Figure 1 describes the value of mean decrease impurity (MDI) for each variable (sorted from the largest value). While the graph shows that the decrements level off after the 13th rank, it is of interest to identify the top four most important variables that affect farmers' WTP, namely (1) willingness to join AUTP, (2) preferred insurance plan, (3) farmers' satisfaction towards ease of premium administration of the previously owned insurance product, and (4) farmers' perception of the importance of insurance company trustworthiness.
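A sketch of this workflow with scikit-learn is given below; the input file and column names are hypothetical placeholders, the categorical survey answers are assumed to be already encoded numerically, and scikit-learn reports the plain mean decrease in (Gini) impurity rather than the bias-corrected importance used here.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# hypothetical file with the imputed survey variables, numerically encoded,
# including the four-category target "wtp_class"
survey = pd.read_csv("survey_imputed.csv")
X = survey.drop(columns=["wtp_class"])
y = survey["wtp_class"]

rf = RandomForestClassifier(n_estimators=1000, oob_score=True, random_state=0)
rf.fit(X, y)

print("out-of-bag accuracy:", rf.oob_score_)   # 1 - out-of-bag error

# impurity-based (Gini) variable importance, sorted from the largest value
importance = pd.Series(rf.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False).head(10))
```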
Fig. 1 Line plot for variable importance
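For readers who want to reproduce the spirit of this analysis, the sketch below (not the authors' code, which relies on a bias-corrected impurity measure) fits a 1000-tree random forest on a synthetic data frame whose column names merely mimic the survey variables; scikit-learn reports the plain mean decrease impurity and the out-of-bag score.

```python
# Hypothetical feature names and synthetic labels; for illustration only.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 625
X = pd.DataFrame({
    "willing_to_join_autp": rng.integers(0, 2, n),
    "preferred_insurance_plan": rng.integers(1, 4, n),
    "premium_admin_ease_satisfaction": rng.integers(1, 6, n),
    "insurer_trustworthiness_importance": rng.integers(1, 6, n),
    "farm_size_ha": rng.gamma(2.0, 0.5, n),
})
y = rng.integers(0, 4, n)   # four WTP categories (placeholder labels)

rf = RandomForestClassifier(n_estimators=1000, oob_score=True, random_state=0)
rf.fit(X, y)

print(f"OOB accuracy: {rf.oob_score_:.3f}")
importance = pd.Series(rf.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False).head(4))
```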
5 Conclusion

The objective of this paper is to study farmers' demand behavior toward paddy insurance. By conducting the Wilcoxon signed-rank test on our survey data, we concluded that the farmers' WTP is in the neighbourhood of Rp 50,000, which exceeds the current subsidized premium of Rp 36,000. This result is statistically significant, and the farmers are willing to pay provided the insurance policy is satisfactory. Based on the random forest classification method, we were able to disentangle the factors that affect farmers' WTP. We concluded that the four most important factors were (1) willingness to buy crop insurance, (2) choice of insurance plan, (3) satisfaction with the ease of premium administration of previously owned insurance, and (4) farmers' perception of the priority of insurance company trustworthiness. These results suggest that, for a well-designed insurance scheme, farmers are willing to pay. Hence, these results could provide valuable guidance in revamping the existing AUTP, including exploring the possibility of lowering the premium subsidy.

Acknowledgements This research is part of the applied research program funded by the Risk Management, Economic Sustainability and Actuarial Science Development in Indonesia (READI) project supported by Global Affairs Canada.
References

1. BPS: Results of Inter-Census Agricultural Survey (SUTAS) (2018)
2. Pasaribu, S.M.: Developing rice farm insurance in Indonesia. Agri. Agri. Sci. Procedia 1, 33–41 (2010)
3. Pasaribu, S.M., Sudijanto, A.: Rice crop insurance pilot project: an implementation review. In: JICA Project of Capacity Development for Climate Change Strategies in Indonesia, Jakarta, Indonesia (2013). Accessed 21 Jan 2014
4. Pasaribu, S.M., Sudiyanto, A.: Agricultural risk management: lesson learned from the application of rice crop insurance in Indonesia. In: Kaneko, S., Kawanishi, M. (eds.) Climate Change Policies and Challenges in Indonesia. Springer, Tokyo (2016)
A Stochastic Volatility Model for Optimal Market-Making

Zubier Arfan and Paul Johnson
Abstract The electronification of financial markets and the rise of algorithmic trading have sparked interest from the mathematical community, for the market-making problem in particular. The research presented in this paper solves the classic stochastic control problem for the optimal trading strategy of a market-maker, which is then applied to real limit order book trading data. Models in the literature often assume constant volatility for the asset price, and therefore may not be best suited to intra-day data. A stochastic volatility model is introduced to describe the intra-day price and variance process of the asset. The market-maker's objective function is optimized to find the optimal two-way limit-order quotes. The results show that the stochastic volatility market-making model is better suited to a market-maker pursuing a strategy that returns stable profits.

Keywords Stochastic volatility · Algorithmic trading · Limit order books · Market microstructure
Z. Arfan · P. Johnson
The University of Manchester, Alan Turing Building, Oxford Road, Manchester M13 9PL, UK
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_6

1 Introduction

The research regarding market microstructure in the mathematical community is growing as a result of the electronification of financial markets and the increase in quantitative trading strategies, for market-making (MM) in particular. Quote-driven markets have mostly been replaced by order-driven markets, where anybody can place orders in the limit order book (LOB), so any trader can act as a MM. A MM earns the spread, or rebates, by providing liquidity on an exchange by quoting prices on both sides of the LOB. Regulations and exchange rules determine the parameters within which MM strategies can be used. These challenges and opportunities have sparked interest from the mathematical sciences.
This work fits into the MM model framework that was originally formulated by [1], who combined the utility optimization framework posed by [6] with microstructural features of a LOB, such as the probability of order execution at different depths. This paper was inspired by the research in [5], which built on [1] by providing a generalized, tractable framework for the MM problem that takes into account the inventory control of the MM, with easy-to-compute solutions and analytical approximations. Guéant [5] assumed constant volatility for the asset price, but our investigations into the S&P 500 intra-day LOB data used for this report did not support this assumption. As is the case with most exchange-traded assets, greater volatility is seen during highly active market hours, and more so during the open and close periods.¹ The resulting performance of the strategy was found to be sensitive to the choice of the security's variance, ν. Ching et al. [2] use the Heston model to optimise a mean-variance objective function and show its effectiveness using simulated asset price data. This work contributes to this area of research by combining ideas from [2, 5] to optimise the utility function of a risk-averse MM. Also, the model parameters are calibrated to real high-frequency LOB trading data to be back-tested and compared with the results of a defined naïve strategy and a constant volatility MM model. Assumptions are made in the back-test algorithm to take into account the ambiguity of limit order priority, which is difficult to track in back-testing, and also to be in line with the obligations of a MM. Section 2 describes the problem formulation and model assumptions, and Sect. 3 briefly describes the trading algorithm and priority assumptions for the back-tests. The model performance is compared with that of the naïve strategy and the existing model from [5] in Sect. 4, before Sect. 5 finishes off with some concluding comments.
2 Problem Formulation

Stochastic volatility is incorporated into the MM's problem by using a simple version of the Heston model, as similarly done in [2]. The difference from the usual Heston model is that the asset price is assumed to follow an arithmetic Brownian motion (ABM), rather than a geometric one. This assumption is valid because small trading horizons are considered, and the probability of negative prices occurring during these horizons is very low; this assumption is common in the related MM literature. Further, as opposed to [2], the objective is to maximise the constant absolute risk aversion (CARA) utility function of the MM instead of a mean-variance objective function. This choice is in line with what was done in [5], which is the work that this paper builds upon and allows for a more direct comparison of results, but it also allows for mathematical simplification when choosing ansatz solutions to reduce the problem to fewer dimensions.
¹ The data for the S&P 500 e-mini futures used in this report covered 23-hour trading sessions, with less volatility seen outside the usual (9:30 am to 4:30 pm ET) trading hours.
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space with a filtration $(\mathcal{F}_t,\, t \in [0, T])$; the processes for the state variables are assumed to be as follows:

Price: $dS_t = \sqrt{\nu_t}\, dW_t^{(1)}$,   (1)

Variance: $d\nu_t = \theta(\alpha - \nu_t)\, dt + \xi \sqrt{\nu_t}\, dW_t^{(2)}$,   (2)

Cash flow: $dX_t = (S_t + \delta_t^a)\, dN_t^a - (S_t - \delta_t^b)\, dN_t^b$,   (3)

Inventory: $dq_t = dN_t^b - dN_t^a$,   (4)

$N_t \sim \mathrm{Pois}(\Lambda(\delta))$, where $\Lambda(\delta) = A e^{-k\delta}$,   (5)
where $(W_t^{(1)})_{t\ge 0}$ and $(W_t^{(2)})_{t\ge 0}$ are $\mathcal{F}_t$-measurable Wiener processes correlated with coefficient ρ, with $d\langle S, S\rangle_t = \nu_t\, dt$, $d\langle \nu, \nu\rangle_t = \xi^2 \nu_t\, dt$ and $d\langle \nu, S\rangle_t = \rho\xi\nu_t\, dt$. The $\mathcal{F}_t$-measurable Poisson processes $(N_t^a)_{t\ge 0}$ and $(N_t^b)_{t\ge 0}$ are independent of each other and of $(W_t^{(1)})_{t\ge 0}$ and $(W_t^{(2)})_{t\ge 0}$. The cash flow $X_t$ describes the amount of money the MM has at time t from buying and selling assets, and the inventory $q_t$ is the net number of assets the MM is holding at time t. The control parameter in the problem, $\delta = (\delta^a, \delta^b)$, is the additional compensation the MM asks for when providing liquidity on the bid or ask side of the LOB. Assuming the MM has a CARA utility function, the objective function to maximize is of the form

$\sup_{\delta_t^a, \delta_t^b} \mathbb{E}\left[ -e^{-\gamma\left(X_T + q_T S_T - l(|q_T|)\right)} \right]$,   (6)

where l is a deterministic penalty function. The resulting Hamilton-Jacobi-Bellman partial differential equation (HJB-PDE) is solved using a discrete implicit-explicit finite difference scheme that is assumed to satisfy the strong comparison property. The scheme is consistent, monotone and stable, and therefore converges to the viscosity solution [4]. The waiting time method described in [3] was used to calibrate the parameters A and k of the function Λ(δ) in (5). The stochastic volatility model parameters θ̂, α̂, ξ̂ and ρ̂ were calibrated using the maximum likelihood formulae derived in [7]. The solution for the optimal bid-side strategy δ^b(t, q, ν) is illustrated in Fig. 1. From the graph, it can be seen that, for the calibrated parameters, the optimal choice of δ^b, the distance to quote from the reference price, depends more on ν and q than on the time t. The results shown in the graph seem to make practical sense: for example, given a fixed q, the strategy quotes more conservatively when ν is large, so as to receive a greater premium for taking the trade when there is a higher risk of price changes. Similarly, for a fixed ν, when the MM is holding a large positive inventory q, it will also price further away to reduce the probability of execution or be rewarded by charging a premium for taking on further inventory risk.
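The following sketch is not the authors' numerical scheme (the paper solves the HJB-PDE by finite differences); it only simulates the state dynamics (1)-(5) forward under an arbitrary fixed quoting rule, using the calibrated parameter values reported in Fig. 1. The time step, initial state and quote depth are assumptions made purely for illustration.

```python
# Euler-Maruyama simulation of (1)-(5) under a fixed, illustrative quoting rule.
import numpy as np

rng = np.random.default_rng(0)
theta, alpha, xi, rho = 0.0017, 775.0, 1.622, -0.08   # calibrated values (Fig. 1)
A, k = 1.212, 0.0448
dt, n_steps = 1.0, 390                                 # time units are illustrative
S, nu, X, q = 2450.0, alpha, 0.0, 0                    # assumed initial state
delta_a = delta_b = 0.25                               # fixed one-tick quote depth

for _ in range(n_steps):
    z1 = rng.standard_normal()
    z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal()
    # Order arrivals with intensity Lambda(delta) = A * exp(-k * delta)
    hit_a = rng.poisson(A * np.exp(-k * delta_a) * dt) > 0
    hit_b = rng.poisson(A * np.exp(-k * delta_b) * dt) > 0
    if hit_a:                      # MM sells one unit at S + delta_a
        X += S + delta_a
        q -= 1
    if hit_b:                      # MM buys one unit at S - delta_b
        X -= S - delta_b
        q += 1
    S += np.sqrt(max(nu, 0.0) * dt) * z1
    nu += theta * (alpha - nu) * dt + xi * np.sqrt(max(nu, 0.0) * dt) * z2

print(f"terminal price {S:.2f}, variance {nu:.1f}, cash {X:.2f}, inventory {q}")
```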
36
Z. Arfan and P. Johnson
Fig. 1 The optimal bid-side strategy δ^b for different inventory levels q, when using the calibrated parameters: θ = 0.0017, ξ = 1.622, α = 775, ρ = −0.08, γ = 0.01, A = 1.212, k = 0.0448
3 Trading Algorithm

To compare the strategies, a simple naïve strategy is defined that posts 1 tick away from the reference price throughout the trading day, independent of time and inventory. Also, a constant volatility strategy (CVS) is defined that uses the results from a model similar to that in [5]. The optimal quotes δ(τ, q) for the CVS are a function of the time to maturity τ and the inventory held q_τ. The stochastic volatility strategy (SVS) determines the optimal quotes δ(τ, q, ν) as a function of time, inventory and instantaneous variance ν_τ. The trading algorithm quotes simultaneous orders on both sides of the LOB with orders of comparable size, in this case 1 unit, and re-quotes relative to the reference price every minute. This algorithm structure is in line with recent market-making regulations defined by MiFID II and is considered to be more realistic. To remove ambiguity regarding the priority of the limit orders (LOs) in the back-test, two extreme cases are considered, between which the actual results should lie. The first, optimistic priority (OP), assumes that every quote posted by the MM joins the front of the queue at any price level and hence is executed as soon as the reference price reaches this level. Pessimistic priority (PP) assumes the LO joins the back of the queue at any price level and stays at the back even when new orders join. The liquidity parameters A and k were also calibrated, using the waiting time method from [3], under these two assumptions to remain consistent with the back-tests.
4 Strategy Results

The strategies are back-tested on 20 trading days from July 2017 of level 5 LOB data for S&P 500 e-mini futures, which are traded on the Chicago Mercantile Exchange. The mean profit and standard deviation columns in Table 1 show that both the CVS and SVS outperform the naïve strategy for the OP and PP cases. For the PP case, all
Table 1 Mean number of trades, maximum inventory, mean profit and standard deviations for the CVS and SVS trading strategies. CVS: Constant volatility strategy. SVS: Stochastic volatility strategy

                    Pessimistic priority                          Optimistic priority
Strategy  γ         Trades  Max Inv  Mean profit (¢)  Std (¢)     Trades  Max Inv  Mean profit (¢)  Std (¢)
Naïve     –         186     10       (1,180)          3,398       902     10       20,048           3,747
CVS       0.0001    160     4        (859)            1,342       958     9        20,311           2,755
SVS       0.0001    179     5        (950)            1,016       957     10       19,766           1,710
CVS       0.001     165     3        (627)            744         1029    4        20,425           2,182
SVS       0.001     148     3        (745)            537         939     7        18,923           1,878
CVS       0.01      103     2        (439)            403         725     2        14,934           1,547
SVS       0.01      72      2        (520)            330         829     3        14,259           1,669
profits are negative, implying that it is difficult to profit from such a strategy if the MM acts too slowly, because the probability of making the spread is small when the LOs have low priority. However, it can be seen that for all risk aversion parameters γ, the standard deviation of the profit is reduced when the SVS is applied while the mean profits remain similar to the CVS. In the OP case, the SVS outperforms the CVS for most strategies. For example, when γ = 0.0001 the mean profit was reduced by 3% from 20,311¢ to 19,766¢, but the standard deviation was reduced by almost 40%. For γ = 0.001 a similar improvement was seen for the SVS, although not as great, with a 7% decrease in mean profit and a 14% decrease in standard deviation. Finally, for the highest risk aversion parameter, γ = 0.01, there was no improvement from using the SVS. The strategies in this case were already so risk-averse and priced quotes so conservatively that there was a negligible difference between the two strategies. This is further justified by the lower number of trades seen for higher risk aversion parameters and the tighter inventory control reflected in the smaller maximum inventory held throughout the day. For instance, in the OP case, the maximum inventory held is 3 and 10 units for γ = 0.01 and γ = 0.0001, respectively. The reason maximum inventory reduces for larger risk aversion choices is that the strategy quotes both sides of the book at greater distances from the reference price when inventory is zero, in order to reduce the probability of the MM moving away from the optimal inventory amount of zero; however, when the inventory is unbalanced, the MM prices the opposite side of the book to maximise the probability of execution and quickly rebalance the inventory. This is in contrast to strategies with smaller risk aversion parameters, which price relatively closer to the reference price, increasing the probability of execution but profiting smaller amounts per trade. It appears that by using the SVS, the MM benefits when the risk aversion parameter is small. If choosing between CVS and SVS, the SVS would be a safer choice when there is less certainty about the intra-day variance, since the strategy is a function of time, inventory and variance.
Another interesting observation made during this study was that the SVS is less sensitive to parameter calibration errors than the CVS. This is because the SVS strategy also takes into account the variance of variance, therefore anticipating further volatility, and, for a big enough numerical grid, it can still find an optimal choice for δ. Furthermore, the algorithm requires rounding to the nearest tick size of $0.25, which absorbs most of the calibration error, if any; this may explain the very similar results seen for the SVS. This leaves open questions regarding how sensitive the model may be for asset classes with smaller tick sizes or, alternatively, for assets that have wider bid-ask spreads with lower volume.
5 Conclusion

This work has contributed to the very fruitful and growing area of market-making strategy research by designing, analysing and calibrating a stochastic volatility MM model and back-testing it on real LOB data. It is believed the work stands out in this niche area of finance because it has not only introduced the novelty of a MM model that takes into account stochastic volatility, but has also described how it can be used with real LOB data, with the described model having enough flexibility to allow further extensions that will be investigated in the near future. Since the back-test algorithm structure is independent of the PDE modelling, it would be interesting to introduce into the back-test algorithm some forecasting or signal processing techniques for the variance process. This may enable the MM to choose the optimal δ, from the HJB-PDE solution, for the expected variance over the trading interval rather than the present variance at the time of quoting. There are many exciting ways to build on this practical approach to the MM problem, which therefore remains a growing area of interest for academics and practitioners.
References

1. Avellaneda, M., Stoikov, S.: High-frequency trading in a limit order book. Quantitative Finance 8(3), 217–224 (2008)
2. Ching, W., Gu, J., Siu, T., Yang, Q.: Trading Strategy with Stochastic Volatility in a Limit Order Book Market (2016). arXiv:1602.00358
3. Fernandez-Tapia, J.: Modelling, Optimization and Estimation for the On-line Control of Trading Algorithms in Limit Order Markets. Ph.D. thesis, Paris 6 (2015)
4. Forsyth, P.A., Vetzal, K.R.: Numerical methods for nonlinear PDEs in finance. In: Handbook of Computational Finance, pp. 503–528. Springer, Berlin (2012)
5. Guéant, O.: Optimal market making. Appl. Math. Finance 24(2), 112–154 (2017)
6. Ho, T., Stoll, H.R.: Optimal dealer pricing under transactions and return uncertainty. J. Financial Econ. 9(1) (1981)
7. Wang, X., He, X., Bao, Y., Zhao, Y.: Parameter estimates of Heston stochastic volatility model with MLE and consistent EKF algorithm. Sci. China Inf. Sci. 61(4), 042202 (2018)
Method for Forecasting Mortality Based on Key Rates

David Atance, Alejandro Balbas, and Eliseo Navarro
Abstract We develop a model to construct dynamic life tables based on the idea that the behavior of the whole life table can be explained by a reduced number of factors. These factors are identified with the mortality rates at some specific ages. These key mortality rates and the model parameter estimates are obtained by applying a maximum likelihood criterion under the hypothesis of a binomial distribution of the number of deaths. We develop the single factor version of the model, which is applied to the USA male population. The model is compared with a set of alternative well-known life table models. To test the forecasting ability of the model we apply a battery of tests using out-of-sample data. Despite its simplicity, the outcomes indicate that this model is not outperformed by other more complex mortality models. Another important advantage of this model is that it can be easily implemented to address some longevity risk linked problems in the context of Solvency II.

Keywords Key mortality rate · Forecasting · Mortality modelling · Demography
D. Atance · E. Navarro
Departamento de Economía y Dirección de Empresas, University of Alcala, Alcalá de Henares, Spain
A. Balbas
Departamento de Economía de la Empresa, Universidad Carlos III de Madrid, Madrid, Spain
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_7

1 Introduction

The development of mortality models in order to describe and forecast mortality rates is undoubtedly one of the most important topics in the actuarial literature. This is an essential issue due to its relevance for the accurate pricing of life insurance products and for macroeconomic issues such as the sustainability of public pension systems. Since
the publication of Lee and Carter's seminal paper [10], many authors have extended their model by including additional factors or effects [1–3, 12–14]. In this paper, we propose an innovative dynamic mortality model that assumes that variations of mortality rates are linearly related to a small number of factors. Moreover, we identify these risk factors with some particular mortality rates corresponding to specific ages or "key ages". This model is inspired by former studies of the Term Structure of Interest Rates (TSIR) [7], where interest rates are assumed to depend linearly on changes of a reduced number of interest rates with specific maturities. It is necessary to emphasize the novelty of employing the TSIR methodology to model and forecast mortality rates. In fact, by studying the random behavior of mortality at the key age, one can address many topics in longevity risk management [11].
2 Single Factor Model

Inspired by the literature describing the behavior of the term structure of interest rates and by the similarities between life tables (where mortality rates are assumed to depend on age) and the term structure of interest rates (where interest rates depend on the term to maturity), we seek a tractable model that can be used to build dynamic life tables. In the model proposed by Elton, Gruber and Michaely [7], it is assumed that changes in interest rates with different maturities depend linearly on changes in some key interest rates, identified as those that best explain the whole behavior of the term structure of interest rates. In a similar way, the Single Factor Model assumes that the whole life table can be explained by one mortality rate of a particular age, the "key" mortality rate. In particular, we suppose that:

$\Delta \ln \hat{q}_{x,t} = \alpha^*(x) + b^*(x) \cdot \Delta \ln \hat{q}_{x^*,t} + \varepsilon_{(x,x^*),t}$   (1)
where:
• $\Delta \ln \hat{q}_{x,t}$ is the variation in the logarithm of the crude¹ mortality rate at age x from year t−1 to year t.
• $\Delta \ln \hat{q}_{x^*,t}$ is the change in the logarithm of the mortality rate at the key age x* from t−1 to t. This key age will be chosen to maximize the explanatory power of the model.
• α*(x) is a function that tries to capture the general tendency of a reduction (increment) in mortality rates and is assumed to be independent of the behavior of the key mortality rate $\hat{q}_{x^*,t}$. The value of this term may differ from one age to another, indicating a differential behavior in the reduction of mortality rates over time.
¹ The model could be implemented using graduated mortality rates. Eventually, we decided to use crude mortality rates to avoid data manipulation.
• b*(x) is a function that describes the sensitivity of the logarithm of the mortality rate at age x to changes in the logarithm of the key mortality rate, and it captures changes in the shape of the mortality curve over time.
• $\varepsilon_{(x,x^*),t}$ is a random error term with zero mean and constant variance $\sigma^2_{\varepsilon,(x,x^*)}$.

If we denote by $D_{x,t}$ the number of deaths at age x and period t, which are assumed to be independent, then we can consider $D_{x,t}$ as a binomial random variable:

$D_{x,t} \sim \mathrm{Bi}\left(E_{x,t};\, q_{x,t}\right)$,  $q_{x,t} = \hat{q}_{x,t-1}\, \exp\left\{\alpha^*(x) + b^*(x) \cdot \Delta\ln\left(q_{x^*,t}\right)\right\}$,   (2)

where $E_{x,t}$ is the initial exposure to risk at age x and period t, and $q_{x,t}$ is the mortality rate at exact age x. From (2), if we denote by $d_{x,t}$ the actual number of deaths at age x observed during period t, the log-likelihood function is given by:

$L\left(\theta, x^*; d_{x,t}\right) = \sum_{x,t} \left[ d_{x,t} \ln \hat{q}_{x,t} + \left(E_{x,t} - d_{x,t}\right) \ln\left(1 - \hat{q}_{x,t}\right) + \ln\binom{E_{x,t}}{d_{x,t}} \right]$,   (3)

where θ is a set of parameters that will be used to model and estimate the functions α*(x) and b*(x) as functions of x and the key age x*. To determine the key age x* we proceed as follows. First, for each x* we obtain the set of parameters θ that maximises the function $L(\theta, x^*; d_{x,t})$, and then we obtain the integer age x* that maximizes function (3), that is:

$\max_{x^*} \max_{\theta} L\left(\theta, x^*; d_{x,t}\right)$   (4)
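A minimal sketch of the estimation step (3)-(4) is given below; it is not the authors' code. The crude rates, exposures and deaths are synthetic placeholders, the functional forms follow (5)-(6) with the error term u_x omitted, and the key age is found by profiling the likelihood over a coarse grid of candidate ages.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(0)
n_ages, n_years = 100, 30
ages = np.arange(n_ages)

# Synthetic crude rates Q, exposures E and deaths D (placeholders, not real data)
base = 0.0005 * np.exp(0.085 * ages)
Q = np.clip(base[:, None] * np.exp(rng.normal(0.0, 0.03, (n_ages, n_years))), 1e-6, 0.95)
E = rng.integers(50_000, 100_000, size=(n_ages, n_years - 1))
D = rng.binomial(E, Q[:, 1:])

def neg_loglik(params, x_star):
    a1, a2, a3, b1, b2 = params
    u = ages - x_star
    alpha_star = a1 * u + a2 * u**2 + a3 * u**3              # Eq. (5)
    b_star = b1 * np.exp(-b2 * u**2) + (1.0 - b1)            # Eq. (6), u_x omitted
    dlog_key = np.diff(np.log(Q[x_star]))                    # Delta ln q at the key age
    q = Q[:, :-1] * np.exp(alpha_star[:, None] + b_star[:, None] * dlog_key[None, :])
    q = np.clip(q, 1e-10, 1 - 1e-10)
    ll = (D * np.log(q) + (E - D) * np.log1p(-q)
          + gammaln(E + 1) - gammaln(D + 1) - gammaln(E - D + 1))   # Eq. (3)
    return -ll.sum()

def profiled_loglik(x_star):
    res = minimize(neg_loglik, x0=[0.0, 0.0, 0.0, 0.5, 0.01],
                   args=(x_star,), method="Nelder-Mead")
    return -res.fun

candidate_ages = range(40, 96, 5)                            # coarse grid for Eq. (4)
key_age = max(candidate_ages, key=profiled_loglik)
print("estimated key age:", key_age)
```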
2.1 Functions α*(x) and b*(x)

In this paper, we propose very simple functional forms for α*(x) and b*(x). The first consists of a cubic function:

$\alpha^*(x) = a_1\left(x - x^*\right) + a_2\left(x - x^*\right)^2 + a_3\left(x - x^*\right)^3$,   (5)

and the second consists of adjusting a parametric function, inspired by [6]:

$b^*(x) = \beta_1 \cdot \exp\left\{-\beta_2\left(x - x^*\right)^2\right\} + \left(1 - \beta_1\right) + u_x$   (6)
2.2 Forecasting Mortality Rates

The final step in the process of constructing the dynamic life tables consists of developing a methodology to forecast future mortality rates. According to Eq. (1), and rearranging terms in equation (1), we obtain:

$\ln \hat{q}_{x,t} = \ln \hat{q}_{x,t-1} + \alpha^*(x) + b^*(x) \cdot \Delta\ln \hat{q}_{x^*,t} + \eta_{x,t}$   (7)

where:

• $\Delta\ln \hat{q}_{x^*,t}$ represents the change in the logarithm of the mortality rate corresponding to the key age x* from t − 1 to t or, alternatively, the relative change in the key mortality rate.
• $\eta_{x,t}$ is an error term with mean zero and variance $\sigma^2_\eta$.

We employ an ARIMA time series model to describe the behavior of the key mortality rate, $\ln \hat{q}_{x^*,t}$. Once we determine the ARIMA(p, d, q) model that allows us to forecast the future values of $\ln \hat{q}_{x^*,t}$, the mortality rates at the remaining ages are obtained using Eq. (7).
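The sketch below (again, not the authors' code) illustrates this forecasting step: an ARIMA model is fitted to the log key rate and its projected changes are pushed through Eq. (7). The synthetic inputs, the ARIMA(0,1,1) order and the flat α*(x) and b*(x) profiles are all placeholder assumptions.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
n_ages, n_years, horizon = 100, 30, 10
log_q = (np.log(0.0005 * np.exp(0.085 * np.arange(n_ages)))[:, None]
         - 0.01 * np.arange(n_years)[None, :]
         + rng.normal(0.0, 0.02, (n_ages, n_years)))        # synthetic log crude rates
key_age = 75                                                 # assumed key age
alpha_star = np.full(n_ages, -0.005)                         # placeholder alpha*(x)
b_star = np.ones(n_ages)                                     # placeholder b*(x)

fit = ARIMA(log_q[key_age], order=(0, 1, 1)).fit()           # illustrative order
key_fc = fit.forecast(steps=horizon)                         # projected ln q_{x*,T+h}
d_key = np.diff(np.concatenate([[log_q[key_age, -1]], key_fc]))

log_q_fc = np.empty((n_ages, horizon))
last = log_q[:, -1].copy()
for h in range(horizon):
    last = last + alpha_star + b_star * d_key[h]             # Eq. (7), error term dropped
    log_q_fc[:, h] = last
print(log_q_fc[65, :3])                                      # forecast at age 65
```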
3 Comparison Between the Single Factor Model and Different Mortality Models

Finally, we compare the Single Factor Model (SFM) with different mortality models (the Lee-Carter model [10], the Bi-Factorial Lee-Carter model [1, 12], the Age-Period-Cohort model [4, 13], the Renshaw-Haberman model [13] and the Plat model [14]), analyzing the forecasting power of the models. They have been calibrated using data corresponding to the USA male experience over the 1975–2006 period, employing the 2007–2016 period for out-of-sample testing. Ages covered in this study range from 0 to 99. Data were obtained from the Human Mortality Database [8] (HMD). To compare the performance of the models, we have computed standard metrics of accuracy: the Sum of Squared Errors (SSE) and the Mean Absolute Error (MAE), defined as:

$\mathrm{SSE} = \sum_{x,t} \left( \ln\hat{q}_{x,t} - \ln\hat{q}_{x,t}^{\,SFM,LC,\ldots} \right)^2$,   (8)

$\mathrm{MAE} = \frac{1}{n_d} \sum_{x,t} \left| \ln\hat{q}_{x,t} - \ln\hat{q}_{x,t}^{\,SFM,LC,\ldots} \right|$,   (9)

where $n_d$ is the number of observations (ages) in the out-of-sample period. The outcomes for SSE and MAE for the USA male population are summarized in Table 1.
Table 1 SSE and MAE for USA male population

USA-male   SFM      LC       LC2      APC      RH       PLAT
SSE        9.5253   15.1707  15.6482  14.6278  10.5033  14.9680
MAE        0.0727   0.1007   0.0973   0.0752   0.0758   0.0789
In summary, the SFM is a very simple and tractable model that is highly effective in forecasting future mortality rates compared with other competing models. This is because the SFM is based on studying the dynamics of the mortality curve. Moreover, the SFM is highly flexible in capturing sudden changes in mortality rates and can be easily extended to a multifactorial framework. Its tractability can make this model a very useful tool for valuing life insurance products and for measuring the risk (notably, longevity risk) inherent in such products.
References

1. Booth, H., Maindonald, J., Smith, L.: Applying Lee-Carter under conditions of variable mortality decline. Popul. Stud. 56(3), 325–336 (2002)
2. Brouhns, N., Denuit, M., Vermunt, J.K.: A Poisson log-bilinear regression approach to the construction of projected lifetables. Insurance: Math. Econ. 31(3), 373–393 (2002)
3. Cairns, A.J., Blake, D., Dowd, K., Coughlan, G.D., Epstein, D., Ong, A., Balevich, I.: A quantitative comparison of stochastic mortality models using data from England and Wales and the United States. North Am. Actuarial J. 13(1), 1–35 (2009)
4. Currie, I.: Smoothing and forecasting mortality rates with P-splines. Talk given at the Institute of Actuaries (2006). http://www.ma.hw.ac.uk/iain/research/talks.html
5. Debón, A., Montes, F., Puig, E.: Modelling and forecasting mortality in Spain. Euro. J. Oper. Res. 189(3), 624–637 (2008)
6. Díaz, A., Merrick, J.J., Navarro, E.: Spanish treasury bond market liquidity and volatility pre- and post-European monetary union. J. Bank. Finance 30(4), 1309–1332 (2006)
7. Elton, E.J., Gruber, M.J., Michaely, R.: The structure of spot rates and immunization. J. Finance 45(2), 629–642 (1990)
8. Human Mortality Database: University of California, Berkeley (USA), and Max Planck Institute for Demographic Research, Germany (2018). Available at www.mortality.org; www.humanmortality.de
9. Hyndman, R.: forecast: forecasting functions for time series. R Package Version 1, 11 (2008)
10. Lee, R.D., Carter, L.R.: Modeling and forecasting US mortality. J. Am. Stat. Assoc. 87(419), 659–671 (1992)
11. Li, J.S.H., Luo, A.: Key q-duration: a framework for hedging longevity risk. ASTIN Bull. J. IAA 42(2), 413–452 (2012)
12. Renshaw, A.E., Haberman, S.: Lee-Carter mortality forecasting with age-specific enhancement. Insurance: Math. Econ. 33(2), 255–272 (2003)
13. Renshaw, A.E., Haberman, S.: A cohort-based extension to the Lee-Carter model for mortality reduction factors. Insurance: Math. Econ. 38(3), 556–570 (2006)
14. Plat, R.: On stochastic mortality modeling. Insurance: Math. Econ. 45(3), 393–404 (2009)
Resampling Methods to Assess the Forecasting Ability of Mortality Models

David Atance, Ana Debón, and Eliseo Navarro
Abstract Given the number of mortality models that have been considered in the literature, it is difficult to choose one model to forecast the probabilities of death. In this paper, we use resampling methods to identify the mortality model with the best forecasting ability. These techniques are a statistical tool that allows assessing the predictive performance of different models and they have not yet been used to compare mortality models. We employ four resampling methods to test the forecasting ability of three variations of the original Lee-Carter model in several European countries. The aim of this paper is to compare different mortality models in terms of forecasting ability in the populations studied by applying resampling methods.

Keywords Cross-validation · Lee-Carter model · Forecasting ability · Resampling methods
D. Atance · E. Navarro
Departamento de Economía y Dirección de Empresas, University of Alcala, Alcalá de Henares, Spain
A. Debón
Departamento de Estadística e Investigación Operativa Aplicada y Calidad, Universitat Politècnica de València, Valencia, Spain
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_8

1 Methods to Evaluate the Forecasting Ability of the Models

We apply four statistical learning methods within the class of resampling methods. They consist of randomly splitting the sample into two subsets: the training set and the validation set. The first one is used to fit the model and the second one to evaluate the quality of the fit. Depending on the way these two sets are generated, we have different methods. In this study, we focus on the following methods:
1. Hold-Out.
2. Repeated Hold-Out.
3. Leave-One-Out-CV.
4. K-Fold-CV.
All these resampling methods are adapted to time-series data because we are using dynamic life tables, which are ordered chronologically. The standard procedures of the resampling methods were not applied directly because those methods use non-embedded time series data.
1.1 Hold-Out

This first method, Hold-Out or H-method, consists of randomly splitting the sample into two subsets of data [9] for training and testing just once. In this case, we have adapted this method to time series: according to [2], the division of the sample should be done chronologically, as shown in Fig. 1. The forecasting ability of the model is measured just once, by the goodness of fit of the model in the validation dataset.
1.2 Repeated Hold-Out

A variation of the Hold-Out method, known as Repeated Hold-Out [6], involves repeating the hold-out several times, say b, as can be seen in Fig. 1. For each iteration, the sample is randomly divided into two subsets for training and testing. In this way, the model is tested b times, where b is the number of times that the sample is subdivided, i.e. the number of iterations. The measure of the forecasting ability when applying the Repeated Hold-Out method is the average of the b measures of goodness of fit:

$\text{Repeated Hold-Out}_b = \frac{1}{b}\sum_{j=1}^{b} \text{Measure of Goodness fit}_j$.   (1)
For different iterations, validation sets can share observations; this is not possible in the following cross-validation methods.
1.3 Leave-One-Out-CV

In this method, instead of generating two subsets of similar size, only one observation is used as the validation set, and the remaining observations are used as the training set, as shown in Fig. 1. For more details about this method, Leave-One-Out CV (LOOCV), see [5, 14].
Fig. 1 A schematic display of the employed resampling methods for an embedded time series. Training set, validation set and omitted set are shown in grey, white and black, respectively
This process is repeated n times (where n is the number of observations in the entire sample). A measure of the quality of the forecast is applied in each iteration, and then the accuracy of the forecast is calculated as the average of this measure:

$\mathrm{LOOCV}_n = \frac{1}{n}\sum_{i=1}^{n} \text{Measure of Goodness fit}_i$.   (2)
When applying this method to time series, as in our case, the training set must contain only observations prior to the data to be predicted, so future observations must not be used to build the training set, as can be seen in Fig. 1. Thus, the training set consists of a window with a fixed origin, and in each iteration a new observation is added to it, chronologically.
1.4 K-Fold-CV

An alternative to the LOOCV is the K-fold CV [5]. When applying this method, we first randomly split the sample into k data subsets of similar size. Then, at each iteration, we fix one of these subsets and use the remaining k − 1 data subsets as the training set. The fixed one becomes the validation set, and it is used to measure the forecasting ability of the model. The process is repeated, fixing at each iteration a different validation set and using all the remaining data as the training set. We thus obtain k different measures of the forecasting goodness of fit of the model. As before, the forecasting ability of the model is measured by the average of the measures obtained at each of the k iterations:

$\text{K-fold-CV}_k = \frac{1}{k}\sum_{h=1}^{k} \text{Measure of Goodness fit}_h$.   (3)
When the sample consists of a time series, the method as explained before cannot be applied directly. In this case, the partition of the sample cannot be done randomly: each subset must contain only consecutive data. Also, we can only use previous information for forecasting future observations, as shown in Fig. 1. Thus, when applying this methodology to test the forecasting ability of a mortality model, in the first iteration only the first subset (chronologically ordered) is used to forecast the data of the next subset, which is used as the validation subset. At the next iteration, the first two blocks are used as the training set and the next block (in chronological order) is the validation set, and so on [2].
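A minimal sketch of this chronological splitting logic (not the authors' code) is shown below; k = 4 and the 1990-2013 year range are illustrative and simply mirror the data described in the next section.

```python
import numpy as np

def chronological_kfold(years, k):
    """Yield (training years, validation years) pairs that use only past data."""
    blocks = np.array_split(np.sort(np.asarray(years)), k)
    for h in range(1, k):
        yield np.concatenate(blocks[:h]), blocks[h]

for train, valid in chronological_kfold(np.arange(1990, 2014), k=4):
    print(f"train {train.min()}-{train.max()}  ->  validate {valid.min()}-{valid.max()}")
```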
2 Description of the Data

The data employed in this study consist of the life tables of 30 European countries provided by the Human Mortality Database [7]. In this paper, we study the male populations. The sample period covers the years 1990 to 2013, and the ages range from 0 to 109 years old. The fitting is carried out using the R software [11].
3 Choosing the Optimal Mortality Model

As mentioned before, resampling methods need to quantify the forecasting ability of the model using goodness-of-fit measures on the validation datasets. So, to measure the accuracy of the forecasting ability of the mortality models, we apply non-penalized (SSE, MSE, MAE, MAPE, R²) and penalized (AIC [1], BIC [13]) measures in all the countries studied, for men and women.
We compare the forecasting ability of three mortality models: the original Lee-Carter model [10], the Lee-Carter model with two terms [4, 12], and the Lee-Carter model with two terms and orthogonality constraints on a broad set of parameters [8]. To this end, we apply a spider or radar chart to visually compare the different goodness-of-fit tests in all the studied populations. In Fig. 2, each spoke represents the number of times that each model provides a better result in each measure of goodness of fit.
4 Conclusion

The comparison using the Cobweb Graph leads us to the conclusion that the best model is, in general, the Lee-Carter model. Also, we propose a procedure that can be applied to a life tables database, allowing us to choose the most appropriate model in any geographical area.
Fig. 2 Radar chart applying resampling methods a Hold-Out, b Repeated Hold-Out, c LOOCV and d K-fold CV to show the number of times that each model (LC, LC2 and LC2-O) has better forecasting ability in each country for the male population
References

1. Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19(6), 716–723 (1974)
2. Bergmeir, C., Benítez, J.M.: On the use of cross-validation for time series predictor evaluation. Inf. Sci. 191, 192–213 (2012)
3. Bergmeir, C., Hyndman, R.J., Koo, B.: A note on the validity of cross-validation for evaluating autoregressive time series prediction. Comput. Stat. Data Anal. 120, 70–83 (2018)
4. Booth, H., Maindonald, J., Smith, L.: Applying Lee-Carter under conditions of variable mortality decline. Popul. Stud. 56(3), 325–336 (2002)
5. Burman, P.: A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods. Biometrika 76(3), 503–514 (1989)
6. Forsythe, A., Hartigan, J.: Efficiency of confidence intervals generated by repeated subsample calculations. Biometrika 57(3), 629–639 (1970)
7. Human Mortality Database: University of California, Berkeley (USA), and Max Planck Institute for Demographic Research, Germany (2018). Available at www.mortality.org; www.humanmortality.de. Accessed October
8. Hunt, A., Blake, D.: Identifiability in age/period/cohort mortality models, pp. 9–15. Technical Report, Pensions Institute PI (2015)
9. Lachenbruch, P.A., Mickey, M.R.: Estimation of error rates in discriminant analysis. Technometrics 10(1), 1–11 (1968)
10. Lee, R.D., Carter, L.R.: Modeling and forecasting US mortality. J. Am. Stat. Assoc. 87(419), 659–671 (1992)
11. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2018). https://www.R-project.org/
12. Renshaw, A.E., Haberman, S.: Lee-Carter mortality forecasting with age-specific enhancement. Insurance: Math. Econ. 33(2), 255–272 (2003)
13. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6(2), 461–464 (1978)
14. Shao, J.: Linear model selection by cross-validation. J. Am. Stat. Assoc. 88(422), 486–494 (1993)
Portfolio Optimization with Nonlinear Loss Aversion and Transaction Costs

Alessandro Avellone, Anna Maria Fiori, and Ilaria Foroni
Abstract This proposal puts forth a methodology that can be used to derive optimal asset allocations for general forms of Loss Aversion, explicitly accounting for the real risks associated with large-scale investments. The portfolio problem is solved by a stochastic algorithm based on Particle Swarm Optimization, which permits the inclusion of transaction costs and other constraints faced by investors and fund managers. An empirical study compares the proposed approach to traditional strategies in terms of portfolio composition, downside protection in adverse market conditions and global performance. Keywords Asset allocation · Downside risk · Particle swarm optimization · Cumulative prospect theory
A. Avellone · A. M. Fiori · I. Foroni
Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_9

1 Introduction

Although Mean-Variance (M-V) analysis has been the paradigm for quantitative portfolio management since the 1950s, the recent market instability and the growing complexity of financial instruments demand more flexible tools to handle the real risks associated with large-scale investments. On the one hand, a huge amount of experimental evidence suggests that individuals place stronger emphasis on downside risk and, in particular, on possible shortfalls relative to some personal target, or reference point. In the behavioral finance literature derived from Kahneman and Tversky [11] this attitude is described as Loss Aversion (LA). On the other hand, the increasing use of high-yield bonds, foreign currency and equity derivatives in portfolio management is potentially responsible for changing the portfolio
distribution from symmetric to asymmetric, with a heavier left tail. This arguably invalidates conventional M-V analysis and motivates the development of alternative portfolio selection rules (Sect. 2). Unfortunately, the implementation of optimal asset allocation rules under LA is computationally challenging because the corresponding utility (or value) function is non-differentiable and non-concave [2]. Thus, with a few notable exceptions, the majority of published papers have focused on linear or quadratic LA, solving the optimal asset allocation problem either by linear/quadratic programming or by Monte Carlo simulation for three to five assets. As an alternative to gradient-based optimization techniques, the use of metaheuristic algorithms makes it possible to solve the portfolio selection problem with general forms of LA, subject to realistic constraints. In this work (Sect. 2.1) we consider a metaheuristic global optimization algorithm known as Particle Swarm Optimization (PSO). Introducing a specific constraint-handling mechanism, we apply PSO to a real-world asset allocation problem with LA and transaction costs (Sect. 3). Results of our empirical analysis suggest that the proposed portfolio strategy represents a valuable alternative to traditional M-V analysis.
2 Methodology

We consider a loss-averse Decision Maker (DM) who associates risk with the failure to meet a target (or reference wealth). The DM is not uniformly risk averse, but distinctly more sensitive to losses (i.e. outcomes below the reference level) than to gains (outcomes above the reference level). These elements translate into an S-shaped value function v(z) of wealth deviations z from the reference point, which combines concavity over gains with convexity over losses and is steeper for losses relative to gains. A well-known example is the value function of Cumulative Prospect Theory (henceforth CPT; see [11]), defined as:

$v(z) = \begin{cases} z^\alpha & \text{for } z \ge 0 \\ -\lambda(-z)^\beta & \text{for } z < 0, \end{cases}$   (1)
where α, β are positive parameters and λ > 1 is the LA index. Based on extensive experimental work, Kahneman and Tversky estimated α = β = 0.88, and λ = 2.25 indicating that the pain for losing is 2.25 times stronger than the pleasure for gaining. Let’s now consider a DM with the value function given in (1) and a single-period investment planning horizon, from date t to t + 1. The DM is initially endowed with a sure wealth bt > 0 that can be invested in a set of K risky assets, with (random) return vector X t+1 = [X t+1,1 , ..., X t+1,K ] , subject to transaction costs ct per unit traded. Gains and losses are evaluated with respect to a reference wealth br = bt (1 + r ), where r is the reference return. If wk is the fraction of bt invested in asset k, the deviation between the DM’s terminal wealth and br can be written as:
$Z_{t+1} = B_{t+1} - b_r = b_t\left(1 + \sum_{k=1}^{K} w_k X_{t+1,k} - c_t\right) - b_t(1+r) = b_t\left(w' X_{t+1} - c_t - r\right)$,

where $w = [w_1 \ldots w_K]' \in W = \{w \in \mathbb{R}^K : \sum_{k=1}^{K} w_k = 1 \text{ and } w_k \ge 0 \text{ for all } k\}$, i.e. short-selling is not permitted. Then, the asset allocation problem consists in choosing a vector w* that maximizes the CPT-preference value:

$V(Z_{t+1}) = \mathbb{E}\left[v\left(B_{t+1} - b_r\right)\right] = \mathbb{E}\left\{v\left[b_t\left(w' X_{t+1} - c_t - r\right)\right]\right\}$.   (2)

Using (1), this optimization problem is equivalent to the maximization of the function V* defined as:

$V^*(Z_{t+1}) = \frac{V(Z_{t+1})}{b_t^\alpha} = \mathbb{E}\left[\max\left(w' X_{t+1} - c_t - r,\, 0\right)^\alpha\right] - \lambda b_t^{\beta-\alpha}\, \mathbb{E}\left[\max\left(0,\, r - w' X_{t+1} + c_t\right)^\beta\right] = \mathrm{UPM}_\alpha\left(w' X_{t+1} - c_t;\, r\right) - \lambda b_t^{\beta-\alpha}\, \mathrm{LPM}_\beta\left(w' X_{t+1} - c_t;\, r\right)$,   (3)
where UPMα (·; r ) and LPMβ (·; r ) denote, respectively, the upper partial moment of order α and the lower partial moment of order β of portfolio returns (net of transaction costs) at level r . In investment contexts, natural choices for r are either 0 (status-quo), or a risk-free rate of return. Transaction costs can be specified in different forms, and we refer the reader to [6, 9] for accurate discussions.
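A minimal sketch of evaluating V* in (3) on a scenario matrix is given below; it is not the authors' implementation. The behavioral parameters follow [11], while the scenario matrix, the transaction-cost level and the reference return are illustrative assumptions.

```python
import numpy as np

def cpt_objective(w, scenarios, r=0.0, c=0.005, alpha=0.88, beta=0.88,
                  lam=2.25, b_t=1.0):
    """Empirical V* of Eq. (3): UPM_alpha - lam * b_t^(beta-alpha) * LPM_beta."""
    net = scenarios @ w - c - r                       # w'X_{t+1} - c_t - r
    upm = np.mean(np.maximum(net, 0.0) ** alpha)      # upper partial moment
    lpm = np.mean(np.maximum(-net, 0.0) ** beta)      # lower partial moment
    return upm - lam * b_t ** (beta - alpha) * lpm

rng = np.random.default_rng(0)
scenarios = rng.normal(0.0004, 0.01, size=(500, 20))  # 500 toy historical scenarios
w = np.full(20, 1 / 20)                               # equally weighted portfolio
print(f"V* = {cpt_objective(w, scenarios):.6f}")
```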
2.1 Optimization Algorithm

Although a number of multivariate optimization methods are available to locate the global optimum of a non-concave function, these methods could be computationally too demanding for high-dimensional problems [2]. To circumvent this issue, we implement PSO, which is a metaheuristic algorithm specifically designed for NP-hard optimization and particularly suitable for dealing with solutions encoded as real-valued vectors [4]. Inspired by the social behavior of animals in a swarm, PSO considers a population of candidate solutions (particles) that live within the same ambient (search space). These particles move toward the best food source (the optimum) according to two distinct driving forces: an individual component (representing the member's own knowledge of the already explored search space) and a social component (which accounts for knowledge shared by the swarm as a whole). The attraction towards these two dynamic points forces the whole swarm closer and closer to the best food source, provided that a fitness function measuring the quality of the food source has been correctly defined. After the random generation of an initial swarm, improvement strategies are applied for a sequence of steps until an appropriate criterion is met. In more detail, to solve the constrained CPT portfolio problem with the PSO technique, one needs to define for each particle h (h = 1, . . . , H) a position w_h(i) in the search space W and a velocity vector v_h(i) that displaces the particle at each
iteration step i. The function V* defined in (3) is used as fitness to assess the quality of each position with respect to the optimization objective. The position of the h-th particle at iteration i is then adjusted as follows:

$w_h(i) = w_h(i-1) + v_h(i)$,   (4a)

$v_h(i) = v_h(i-1) + c_1 \cdot r_1 \cdot \left(w_{b_h}(i-1) - w_h(i-1)\right) + c_2 \cdot r_2 \cdot \left(w_g(i-1) - w_h(i-1)\right)$,   (4b)
where c_1 and c_2 are the cognitive and the social parameters of the particle, r_1 and r_2 are random variables, w_{b_h} indicates the best position so far occupied by the h-th particle and w_g the best position so far found by the whole swarm. Equations (4a) and (4b) clearly reveal that the move operators are blind to constraints and frequently return solutions that fall outside the feasible region. For such cases, a constraint-handling strategy needs to be added to supplement the PSO. To this aim, we developed a repairing operator (RO) able to bring back unfeasible individuals into the feasible space. As is well known, the ability of a repairing mechanism to guide the search process to the global solution is strongly linked to the features of the problem under consideration and its optimum. In particular, when boundary constraints on the variables are involved, as happens in our case, the chance of obtaining efficient solutions strongly relies on the boundary-handling method chosen. We thus created a RO able to encompass any boundary-handling technique commonly used in heuristic constrained optimization (we refer to [10] for a review on this topic) and in constrained portfolio selection problems (one of the most used approaches in this field can be found in [1]). As summarily described in Algorithm 1, the RO is embedded into the PSO structure and, if necessary, corrects the positions of the particles after each update.
Algorithm 1: PSO for the constrained loss-averse/CPT portfolio problem

for h ← 1 to H do
    Initialize randomly w_h(0) and v_h(0);
    if w_h(0) ∉ W then apply RO to obtain w̄_h(0) ∈ W; w_h(0) ← w̄_h(0);
    Initialize w_{b_h}(0) = w_h(0);
    Calculate V*(w_h(0));
    if V*(w_h(0)) ≥ V*(w_j(0)) for all j ≠ h then w_g(0) = w_h(0);
end
repeat
    for i ← 1 to I do
        for h ← 1 to H do
            Update v_h(i) using (4b) and w_h(i) using (4a);
            if w_h(i) ∉ W then apply RO to obtain w̄_h(i) ∈ W; w_h(i) ← w̄_h(i);
            Calculate V*(w_h(i));
            if V*(w_{b_h}(i)) ≤ V*(w_h(i)) then w_{b_h}(i) ← w_h(i);
            if V*(w_h(i)) ≥ V*(w_j(i)) for all j ≠ h then w_g(i) = w_h(i);
        end
    end
until the termination criterion is met;
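The sketch below is a simplified, hedged rendering of Algorithm 1 rather than the authors' implementation: the fitness is the empirical V* of (3), the repair operator simply clips weights to [0, w_max] and renormalizes them (which, unlike the authors' RO, may leave the cap slightly violated after renormalization), and the toy scenario matrix and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, H, I = 20, 40, 200                       # assets, particles, iterations (toy)
c1, c2, w_max = 1.5, 1.5, 0.25
scenarios = rng.normal(0.0004, 0.01, size=(500, K))   # illustrative return scenarios

def fitness(w, r=0.0, c=0.005, a=0.88, b=0.88, lam=2.25):
    net = scenarios @ w - c - r                         # empirical V* of Eq. (3)
    return np.mean(np.maximum(net, 0) ** a) - lam * np.mean(np.maximum(-net, 0) ** b)

def repair(w):
    """Simplified RO: clip to [0, w_max], then renormalize towards the simplex."""
    w = np.clip(w, 0.0, w_max)
    s = w.sum()
    return w / s if s > 0 else np.full_like(w, 1.0 / len(w))

pos = np.array([repair(rng.random(K)) for _ in range(H)])
vel = np.zeros((H, K))
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(I):
    r1, r2 = rng.random((H, 1)), rng.random((H, 1))
    vel = vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)   # Eq. (4b)
    pos = np.array([repair(p) for p in pos + vel])                  # Eq. (4a) + RO
    vals = np.array([fitness(p) for p in pos])
    better = vals > pbest_val
    pbest[better], pbest_val[better] = pos[better], vals[better]
    gbest = pbest[pbest_val.argmax()].copy()

print("largest optimal weights:", np.round(np.sort(gbest)[-5:], 3))
```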
3 Results and Discussion

To evaluate the empirical characteristics of CPT-portfolios, the optimization problem (3) was solved for an investment universe consisting of all constituents of the DJ30 index. Based on daily total returns for a period of 12 years (2001–2012), the following rolling-horizon methodology was implemented: (approximately) at the beginning of each year, starting from 02.01.2003, we selected a random sample of 20 assets that became our asset universe for the next year. Using a rolling estimation window of 500 historical scenarios, we derived the optimal CPT-allocation subject to L1-norm transaction costs of 50 bps [9] and ran the strategy for 250 days with daily rebalancing. After 250 days, the asset set was dismissed and replaced by a new random sample of 20 assets from DJ30, which was kept for the next year. The procedure was iterated for 10 years. The yearly performance of the CPT-allocation was compared to that of a traditional M-V investor (refraining from predicting the mean, as recommended in [6], we effectively performed global minimum variance optimization with transaction costs). As shown in Fig. 1, the CPT-strategy based on Kahneman and Tversky's estimates of the behavioral parameters [11] performed globally better than M-V, with higher Sharpe ratios in periods of market downturn (years 2007–08). However, the M-V allocations recovered faster after the global financial crisis (second half of 2009). Interestingly, the daily average turnover was always lower for CPT-portfolios, implying reduced transaction costs relative to M-V and thus a positive effect on the net portfolio performance. Summary statistics reported in Table 1 (average results over 10 years: 2003–2012) suggest that CPT-allocations were able to generate (on average) higher values for the annualized return, Sharpe and Sortino ratios and embedded a positive skewness into the realized distribution of portfolio returns. An interesting relationship between LA and positively skewed wealth deviations from the reference point was observed in [7], who also noticed that alternative choices of behavioral parameters may induce a change in the skewness preferences of CPT investors. We additionally computed the Average Concentration Coefficient (ACC), which is defined by the sample mean of the inverse of the Herfindahl–Hirschman concentration measure and can be roughly interpreted as the number of assets picked by the portfolio strategy [8]. As shown in Table 1, CPT-allocations tended to concentrate on fewer assets relative to M-V, contributing to reducing transaction costs and simplifying portfolio management issues. These findings seem to confirm the outcomes of a CPT-based index tracking strategy recently proposed by [5] and suggest interesting directions for future research. Different types of decision makers could be examined by considering more flexible choices of the reference point, going beyond the status quo and addressing the issue of time-consistent CPT-strategies in light of some recent developments [3]. We are also investigating the impact of different market conditions, other forms of transaction costs (e.g. piecewise-linear, L2-norm) and the inclusion of a cardinality constraint that could be easily handled within our version of the PSO algorithm.
Fig. 1 Performance of CPT-allocations (blue) compared to Minimum Variance portfolios (red), net of transaction costs (short-sales are not allowed). Top line: out-of-sample Sharpe ratios; bottom line: daily turnover, computed by the average fraction of wealth traded in each period [9]. CPT parameters [11]: r = 0, α = β = 0.88, λ = 2.25; portfolio weights are bounded above by 0.25

Table 1 Summary statistics for CPT and minimum variance portfolios, net of transaction costs

        Ann. Ret.  Sharpe   Sortino  ACC      Turnover  Skewness  Kurtosis
CPT     4.6557     0.5291   1.6905   7.0967   0.0329    0.1386    15.8615
M-V     3.7455     0.4665   1.4766   8.1466   0.0403    0.0034    14.9467
References

1. Chang, T.-J., Meade, N., Beasley, J.E., Sharaiha, Y.M.: Heuristics for cardinality constrained portfolio optimisation. Comput. Oper. Res. 27, 1271–1302 (2000)
2. De Giorgi, E., Hens, T.: Making prospect theory fit for finance. Financ. Mark. Portf. Manag. 20, 339–360 (2006)
3. Deng, L., Pirvu, T.A.: Multi-period investment strategies under cumulative prospect theory. J. Risk Financ. Manag. 12, 83 (2019)
4. Eberhart, R., Kennedy, J.: Particle swarm optimization. In: Proceedings of the IEEE Conference on Neural Networks, vol. 4, pp. 1942–1948 (1995)
5. Grishina, N., Lucas, C.A., Date, P.: Prospect theory-based portfolio optimization: an empirical study and analysis using intelligent algorithms. Quant. Financ. 17, 353–367 (2017)
6. Hautsch, N., Voigt, S.: Large-scale portfolio allocation under transaction costs and model uncertainty. J. Econ. 212, 221–240 (2019)
7. Kwak, M., Pirvu, T.A.: Cumulative prospect theory with generalized hyperbolic skewed t distribution. SIAM J. Financ. Math. 9, 54–89 (2018)
8. Mainik, G., Mitov, G., Rüschendorf, L.: Portfolio optimization for heavy-tailed assets: extreme risk index vs Markowitz. J. Empir. Financ. 32, 115–134 (2015)
9. Olivares-Nadal, A., DeMiguel, V.: A robust perspective on transaction costs in portfolio optimization. Oper. Res. 66, 733–739 (2018)
10. Padhye, N., Pulkit, M., Deb, K.: Feasibility preserving constraint-handling strategies for real parameter evolutionary optimization. Comput. Optim. Appl. 62(3), 851–890 (2015)
11. Tversky, A., Kahneman, D.: Advances in prospect theory: cumulative representation of uncertainty. J. Risk Uncertain. 4, 297–323 (1992)
Monte Carlo Valuation of Future Annuity Contracts

Anna Rita Bacinello, Pietro Millossovich, and Fabio Viviano
Abstract In this paper we propose a methodology for valuing future annuity contracts based on the Least-Squares Monte Carlo approach. We adopt, as a first step, a simplified computational framework where just one risk factor is taken into account. We give a brief description of the valuation procedure and provide some numerical illustrations. Furthermore, to test the efficiency of the proposed methodology, we compare our results with those obtained by applying a straightforward and time-consuming approach based on nested simulations.

Keywords LSMC · Life annuities · Longevity risk · Stochastic mortality
1 Introduction Over the 20th century, due to health improvements and medical advances, it has become evident that people tend to live longer and longer. Indeed, the mortality of individuals over time has exhibited many stylized features. In particular, looking at the survival curve for most developed countries around the world, it is immediately clear that mortality levels are decreasing as time passes by, leading to an increase in A. R. Bacinello · P. Millossovich · F. Viviano (B) Department of Business, Economics, Mathematics and Statistics ‘B. de Finetti’, University of Trieste, Piazzale Europa 1, 34127 Trieste, Italy e-mail: [email protected] A. R. Bacinello e-mail: [email protected] P. Millossovich e-mail: [email protected] P. Millossovich Faculty of Actuarial Science and Insurance, The Business School (Formerly Cass), City, University of London, 106 Bunhill Row, London EC1Y 8TZ, UK F. Viviano Department of Economics and Statistics, University of Udine, Via Tomadini 30/A, 33100 Udine, Italy © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_10
individual's life expectancy. As a consequence, life insurance companies and pension providers need to face the so-called longevity risk. The actuarial literature has increasingly focused, in the last decades, on studying and proposing several methods for managing and evaluating this source of risk. The importance of modelling and transferring such a risk is argued in [2]. In particular, it is highlighted how the new longevity-linked capital market instruments could help in facilitating the development of annuity markets and hedging the long-term viability of retirement incomes. As a further consequence, we may recall the non-negligible impact on the liabilities of insurers and pension plans, as studied in [11]. Recently, some attention has been devoted to the valuation of life annuity contracts issued at a distant future time. This problem has many sources of uncertainty, among which the most relevant are future interest rate and mortality levels. In this regard, [5, 7] suggest comonotonic approximations of the life annuity conditional expected present value. Moreover, [4, 6, 9] propose an approach based on a Taylor series approximation of the involved conditional expectation. In this paper, we propose a simulation-based method to evaluate the distribution of future annuity values. In particular, we aim at avoiding the straightforward approach based on nested simulations, which is quite time-demanding, especially in a complex framework. The methodology described in what follows provides an application of the well-established Least-Squares Monte Carlo algorithm (LSMC), originally proposed by [10] for pricing American-type options. The most important advantage of this method is its flexibility to accommodate any type of Markov mortality model, and the possibility to extend it to more complicated frameworks without increasing the complexity of the involved computations. The paper is structured as follows. In the next section we introduce the problem under scrutiny and describe our assumptions and the methodology used to solve it, in Sect. 3 we present a numerical example, and Sect. 4 contains some conclusions.
2 Problem and Methodology
The ever-increasing interest in adequately evaluating life insurance products or retirement incomes at a future time relates to the need of providing a reliable valuation of the cost of life expectancy, and to somehow prevent possible insolvency issues. In this paper, we aim at simulating the distribution of the value of an immediate life annuity contract issued to an individual aged x + T at a future time horizon T. We define the current value at the future time T > 0 of a unitary immediate annuity for an individual then aged x + T as
$$a_{x+T}(T) = \sum_{i=1}^{+\infty} B(T, T+i)\; {}_i p_{x+T}(T), \qquad (1)$$
where $B(T, T+i)$ is the i-year discount factor prevailing at time T > 0 and ${}_i p_y(T)$ is the i-year survival probability for an individual aged y at time T. The quantities $B(T, T+i)$ and ${}_i p_y(T)$ appearing in (1) are both random variables at time 0 (today), and consequently $a_{x+T}(T)$ is random as well. More precisely, these variables are expectations conditional on the information available at time T. To evaluate these conditional expectations we need models describing the stochastic evolution of both interest and mortality rates. Under some circumstances closed-form formulae for computing them are available, for instance when affine processes are used (see [1]), but in general this is not guaranteed. As previously mentioned, a straightforward approach would rely on a simulation-within-simulation procedure, also known as nested simulations; however, since it is quite computationally time-consuming, we are going to propose an application of the LSMC method.
2.1 Framework: Stochastic Mortality Dynamics
Although we have just mentioned that there are at least two sources of uncertainty affecting the value of an annuity, in this paper we assume a constant risk-free rate and adopt a stochastic model only for projecting future mortality levels. To this end, we use the Poisson version of one of the most significant and widely applied stochastic mortality models, i.e. the Lee-Carter model (see [8]). Hence, we assume that the number of deaths at age x and calendar year t, $D_{x;t}$, is Poisson distributed with parameter $E_{x;t} m_{x;t}$, where $E_{x;t}$ and $m_{x;t}$ denote the central exposure and the central death rate, respectively. Moreover, according to [8], we assume that the force of mortality is constant over each year of age and calendar year and equal to the corresponding central death rate $m_{x;t}$, modelled as $\log m_{x;t} = \alpha_x + \beta_x \kappa_t$, where $\alpha_x, \beta_x$ are age-specific parameters and $\kappa_t$ is a period index dictating the decrease over time in $m_{x;t}$. Therefore, by exploiting the fact that $\kappa_t$ is usually modelled as a Markov process, and typically as a random walk with drift, we have
$${}_i p_{x+T} = E\left[\exp\left\{-\left(m_{x+T;T} + \cdots + m_{x+T+i-1;T+i-1}\right)\right\} \,\middle|\, \kappa_T\right],$$
and, within this framework, we can rewrite (1) as
$$a_{x+T}(T) = E\left[\sum_{i=1}^{\omega - T - x} \exp\left\{-\left(i\,r + m_{x+T;T} + \cdots + m_{x+T+i-1;T+i-1}\right)\right\} \,\middle|\, \kappa_T\right], \qquad (2)$$
where ω is the ultimate age and r the constant interest rate.
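As a concrete illustration of this simulation step, the following minimal Python sketch draws random-walk-with-drift paths for the period index and turns them into central death rates through the Lee-Carter relation above. It is not the authors' code: the function names, the argument layout and the idea of passing pre-estimated values of α_x, β_x and of the drift and volatility of κ are illustrative assumptions.

```python
import numpy as np

def simulate_kappa_paths(kappa_last, drift, sigma, horizon, n_paths, seed=0):
    """Simulate the Lee-Carter period index as a random walk with drift."""
    rng = np.random.default_rng(seed)
    shocks = rng.normal(drift, sigma, size=(n_paths, horizon))
    return kappa_last + np.cumsum(shocks, axis=1)

def central_death_rates(alpha, beta, kappa_paths):
    """m_{x,t} = exp(alpha_x + beta_x * kappa_t) for every simulated path.

    alpha, beta : arrays over ages; kappa_paths : array (n_paths, horizon).
    Returns an array of shape (n_paths, horizon, n_ages)."""
    return np.exp(alpha[None, None, :] + beta[None, None, :] * kappa_paths[:, :, None])
```

The simulated death rates can then be plugged into the sums appearing in Eq. (2).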
2.2 Valuation Procedure
The previously introduced framework does not produce a closed-form formula for (2), as typically the central death rates have a lognormal distribution, so each exponent in (2) involves the sum of lognormal variables. Hence, a possible strategy is to evaluate the involved conditional expectation through simulation-based methods. A straightforward approach would rely on a nested simulations procedure. This strategy requires first simulating all relevant risk factors up to time T (outer scenarios); then, for each simulated time-T value of such factors, one would need to simulate forward starting from that particular value (inner simulations), and finally compute conditional expectations by averaging across all inner simulations. It follows that this method can be computationally expensive, in particular when several annuity values (at different times and/or ages) are needed. Therefore, in order to reduce the computational complexity, we propose an alternative methodology based on the LSMC approach and, to check the accuracy of the results, we compare them with those obtained through nested simulations, so that the latter acts as a benchmark for evaluating the efficiency and the accuracy of the LSMC procedure (see [3]). The LSMC approach involves two main steps: firstly, we need to perform simulations of future mortality patterns; then, we use regression across the simulated trajectories in order to obtain estimates of future annuity values. In this way, the conditional expectation is evaluated through regression, taking into account the information available at time T (i.e. the simulated values of the time index parameter $\kappa_T$, exploited as predictor). Moreover, this method allows one to obtain an estimate of the probability distribution of annuity values at the future time horizon T for individuals aged x + T at that date. Finally, a single set of simulations can be used for different ages and time horizons without increasing the computational demand.
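The regression step can be illustrated with a short Python sketch. It assumes that the outer simulation has already produced, for each path, the time-T value of κ_T together with a single pathwise discounted annuity value obtained by continuing that path beyond T; the polynomial basis mirrors the degree-4 choice of Sect. 3. Function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def lsmc_annuity_values(kappa_T, realized_pv, degree=4):
    """Least-Squares Monte Carlo step: regress pathwise discounted annuity
    payoffs on a polynomial basis in the time-T state variable kappa_T.

    kappa_T     : (n_outer,) simulated values of the period index at time T
    realized_pv : (n_outer,) pathwise discounted annuity values, one per
                  outer path, obtained by continuing that path beyond T
    Returns the fitted conditional expectations, i.e. estimates of a_{x+T}(T)."""
    # Design matrix with columns 1, k, k^2, ..., k^degree
    X = np.vander(kappa_T, N=degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, realized_pv, rcond=None)
    return X @ coef
```

The vector returned across outer paths gives the simulated distribution of $a_{x+T}(T)$ that is compared with the nested-simulation benchmark.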
3 A Numerical Example
In this section, we provide an example based on an immediate life annuity issued to an individual aged 40 at different future time horizons T ∈ {10, 20, 30, 40}. In order to simulate future mortality patterns, we fit the Poisson Lee-Carter model to the Italian male population data over the period 1965–2014 and the range of ages 0–90, obtained through the Human Mortality Database. Further, we assume that year 2014 corresponds to the evaluation time 0 (today). The risk-free rate is set at the (constant) level r = 0.03. Moreover, we simulate 10,000 different trajectories of future mortality and, in the nested simulation approach, we further simulate another 10,000 paths starting from each value generated at time T; in total this amounts to 100 million inner simulations. Regarding the basis functions, we use polynomials of degree p = 4.
Table 1 Distribution of annuity values at time horizon T for individuals aged 40 in year 2014 + T

                 Mean    Std dev   Skewness   Kurtosis   10th perc.   90th perc.
T = 10   MC      24.42   0.11      −0.08      2.94       24.27        24.56
         LSMC    24.42   0.11      −0.06      2.91       24.27        24.56
T = 20   MC      24.78   0.14      −0.14      3.04       24.60        24.96
         LSMC    24.78   0.14      −0.11      3.02       24.61        24.96
T = 30   MC      25.11   0.16      −0.20      3.05       24.91        25.30
         LSMC    25.11   0.15      −0.19      3.04       24.91        25.30
T = 40   MC      25.40   0.16      −0.23      3.13       25.20        25.60
         LSMC    25.40   0.16      −0.24      3.05       25.20        25.60
Table 1 reports some statistics of the distributions of future annuity values obtained through the two valuation algorithms. Looking at the results, it immediately emerges that, as the time horizon T increases, the distribution changes. Specifically, its mean increases, which is quite reasonable and in line with the ever-increasing life expectancy registered in the last decades. In addition, its standard deviation increases as well, which implies a more dispersed distribution. This result also seems reasonable, owing to the higher uncertainty caused by the longer time horizon. Furthermore, the distributions tend to be increasingly left-skewed, which implies a longer left tail, with the probability mass concentrated on higher values of the annuity contract. Finally, we see that the kurtosis increases, indicating a heavier-tailed distribution and hence a greater propensity to produce extreme annuity values with respect to the Gaussian case. Concerning the validation procedure, we can see from Table 1 that the LSMC approach provides quite accurate estimates. Moreover, the reliability of the proposed approach is evidenced by the fact that the obtained distribution overlaps substantially with the one produced through nested simulations; this is also confirmed by the Kolmogorov–Smirnov test (KS, see Table 2). In addition, we have constructed the Q-Q plots by considering the distribution obtained through nested simulations as the theoretical one, and these graphs, once again, confirm the goodness of the proposed method in approaching this kind of problem.¹ Finally, for a more comprehensive analysis, we checked whether the LSMC approach tends to over- or under-estimate the quantities of interest. In this regard, in Table 2 we report the frequency with which the LSMC estimates lie inside the 95% confidence interval obtained through the nested simulation procedure or outside it (on the left or on the right, respectively). Looking at this result, we can see that most of the time the LSMC method provides an estimate which lies within that interval. However, even if there is a small signal of an under-estimation effect, which could be due to biases in the regression, we can assess the goodness of the proposed method.
¹ We do not report the distribution graphs and the Q-Q plots for space considerations.
Table 2 Frequency of hitting the confidence intervals (left columns) and KS test (right columns)

          Left (%)   Inside (%)   Right (%)   KS stat.   p-value
T = 10     8.09       87.68        4.23        0.0059     0.9950
T = 20    12.57       80.43        7.00        0.0094     0.7689
T = 30     5.35       87.50        7.15        0.0036     0.9996
T = 40    12.67       83.47        3.86        0.0053     0.9990
4 Conclusions
In this paper we faced the problem of approximating future annuity values. We proposed a methodology based on the LSMC approach which turns out to be quite accurate. Our results highlight the need for developing reliable actuarial models able to capture the source of risk arising from longevity. This is not a negligible aspect, especially for solvency purposes. This paper can be extended in several directions: by assuming more complicated valuation frameworks (e.g., stochastic interest rates), by dealing with other types of life annuity contracts such as variable annuities or equity-indexed products, or by implementing de-risking strategies for pension plans, e.g. Buy-Ins and Buy-Outs, which require an accurate valuation of annuities.
References 1. Biffis, E.: Affine processes for dynamic mortality and actuarial valuations. Insur. Math. Econ. 37, 443–468 (2005) 2. Blake, D., Cairns, A., Coughlan, G., Dowd, K., MacMinn, R.: The new life market. J. Risk Insur. 80, 501–558 (2013) 3. Boyer, M., Stentoft, L.: If we can simulate it, we can insure it: an application to longevity risk management. Insur. Math. Econ. 52, 35–45 (2013) 4. Cairns, A.J.: Modelling and management of longevity risk: approximations to survivor functions and dynamic hedging. Insur. Math. Econ. 49, 438–453 (2011) 5. Denuit, M.: Comonotonic approximations to quantiles of life annuity conditional expected present value. Insur. Math. Econ. 831–838 (2008) 6. Dowd, K., Blake, D., Cairns, A.J.G.: A computationally efficient algorithm for estimating the distribution of future annuity values under interest-rate and longevity risks. N. Am. Actuar. J. 15, 237–247 (2011) 7. Hoedemakers, T., Darkiewicz, G., Goovaerts, M.: Approximations for life annuity contracts in a stochastic financial environment. Insur. Math. Econ. 239–269 (2005) 8. Lee, R.D., Carter, L.R.: Modeling and forecasting U.S. mortality. J. Am. Stat. Assoc. 87, 659–671 (1992) 9. Liu, X.: Annuity uncertainty with stochastic mortality and interest rates. N. Am. Actuar. J. 17, 136–152 (2013) 10. Longstaff, F.A., Schwartz, E.S.: Valuing American options by simulation: a simple least-squares approach. Rev. Financ. Stud. 14, 113–147 (2001) 11. Oppers, S.E., Chikada, K., Eich, F., Imam, P., Kiff, J., Kisser, M., Soto, M., Sun, T.: The financial impact of longevity risk. In: Global Financial Stability Report, Chap. 4. IMF (2012)
A Risk Based Approach for the Solvency Capital Requirement for Health Plans Fabio Baione, Davide Biancalana, and Paolo De Angelis
Abstract The study deals with the assessment of risk measures for Health Plans in order to determine the Solvency Capital Requirement. For the estimation of the individual health care expenditure for several episode types, we suggest an original approach based on a three-part regression model. We propose three Generalized Linear Models (GLM) to assess claim counts, the allocation of each claim to a specific episode type, and the average expenditure severity, respectively. One of the main practical advantages of our proposal is the reduction in the number of regression models compared to a traditional approach, where several two-part models, one for each episode type, are required. As most health plans require co-payments or co-insurance, and considering the non-linearity of the reimbursement function, we adopt a Monte Carlo simulation to assess the health plan costs. The simulation approach provides the probability distribution of the Net Asset Value of the Health Plan and the estimate of several risk measures. Keywords Health · SCR · GLM · Simulation · Risk measure
1 Introduction
The Italian National Health System (SSN) is based on three pillars. In particular, the second is mainly characterized by private group health plans and usually provided through labor agreements. However, the lack of a clearly defined authority and
solvency requirements raises insolvency risk, particularly for self-insured funds. Our aim is to introduce a global actuarial framework that allows the estimation of a short- or medium-term solvency capital requirement. The first step consists of the prediction of the one-year health care expenditure at an individual level for several episode types, with an original approach, alternative to the ones in [1, 2]. Then, considering deductibles, copayments and other limitations working at the single-episode, single-person or family level, we focus on the estimate of the reimbursement amount. Our final goal is the estimate of the density function of the health plan profit (loss) and revenues by a simulation technique, in order to calculate the solvency capital requirement according to several risk measures.
2 Actuarial Framework
We consider a Health Plan (henceforth HP) composed of r policyholders. Let
• i index the i-th policyholder, 1 ≤ i ≤ r;
• j index the j-th branch of health expenditure, 1 ≤ j ≤ J;
• h index the h-th family, 1 ≤ h ≤ H.
In the following we denote by
• N the random variable (r.v.) number of episodes per year;
• T the r.v. branch of the episode requested;
• Y the r.v. expenditure for a single episode;
• Z the r.v. expenditure for a single policyholder per year.
In the classical individual risk model, the expenditure of the i-th policyholder for a specific branch j is
$$Z_{i,j} = \sum_{g=1}^{N_{i,j}} Y_{i,j,g}, \qquad (1)$$
where $Y_{i,j,g}$ is the r.v. expenditure for the i-th insured, j-th branch and g-th episode. The total expenditure of the HP is
$$Z = \sum_{i=1}^{r} \sum_{j=1}^{J} Z_{i,j}. \qquad (2)$$
Under the typical assumption of independence between $N_{i,j}$ and $Y_{i,j,g}$ and identical distribution of $Y_{i,j,g}$, $\forall g = 1, \ldots, N_{i,j}$, then
$$E[Z_{i,j}] = E[N_{i,j}] \cdot E[Y_{i,j}]. \qquad (3)$$
It is worth noting that
$$E[N_{i,j}] = E[N_i] \cdot \mathrm{Prob}[T_i = j \mid N_i > 0], \qquad (4)$$
so the total expenditure for a single insured is
$$E[Z_i] = \sum_{j=1}^{J} E[Z_{i,j}] = \sum_{j=1}^{J} E[N_i] \cdot \mathrm{Prob}[T_i = j \mid N_i > 0] \cdot E[Y_{i,j}]. \qquad (5)$$
It is possible to calculate the expenditure for each family and single branch as
$$Z_{h,j} = \sum_{i \in h} Z_{i,j}, \qquad (6)$$
so that $E[Z_{h,j}] = \sum_{i \in h} E[Z_{i,j}]$ and $E[Z_h] = \sum_{j=1}^{J} E[Z_{h,j}]$. Hence, the expected total expenditure for the health fund is
$$E[Z] = \sum_{h=1}^{H} E[Z_h] = \sum_{i=1}^{r} E[Z_i]. \qquad (7)$$
In order to assess the reimbursement of the HP, we introduce the following notation:
• $f_j$ is the deductible for a single episode for the j-th branch;
• $s_j$ is the coinsurance for a single episode for the j-th branch;
• $M_j$ is the out-of-pocket maximum for a single episode for the j-th branch;
• $M^*_j$ is the out-of-pocket maximum for a family for the j-th branch;
• L is the r.v. reimbursement for a single episode;
• K is the r.v. reimbursement for a single policyholder per year.
Then, the reimbursement for the single episode g of the i-th insured and j-th branch is
$$L_{i,j,g} = \min\left\{ Y_{i,j,g} - \max\left( s_j \cdot Y_{i,j,g};\, f_j \right);\; M_j \right\}. \qquad (8)$$
Following (1), the reimbursement of a single policyholder i for a specific branch j is given by
$$K_{i,j} = \sum_{g=1}^{N_{i,j}} L_{i,j,g}. \qquad (9)$$
Considering that there is an out-of-pocket maximum $M^*_j$ for each branch j that operates at the family level, it is important to assess the r.v. reimbursement in terms of family as follows:
$$K_{h,j} = \min\left\{ \sum_{i \in h} K_{i,j};\; M^*_j \right\}. \qquad (10)$$
Hence, the total reimbursement should be computed only at the family aggregation level, as follows:
$$K = \sum_{h=1}^{H} \sum_{j=1}^{J} K_{h,j}. \qquad (11)$$
Under the typical assumption of independence between $N_{i,j}$ and $L_{i,j,g}$ and identical distribution of $L_{i,j,g}$, $\forall g = 1, \ldots, N_{i,j}$, then
$$E[K_{i,j}] = E[N_{i,j}] \cdot E[L_{i,j}] = E[N_i] \cdot \mathrm{Prob}[T_i = j \mid N_i > 0] \cdot E[L_{i,j}], \qquad (12)$$
whereas the expected reimbursement for the h-th family is
$$E[K_{h,j}] = E\left[ \min\left\{ \sum_{i \in h} K_{i,j};\; M^*_j \right\} \right]. \qquad (13)$$
Hence, the one-year expected total loss for the health fund is
$$E[K] = \sum_{h=1}^{H} \sum_{j=1}^{J} E[K_{h,j}]. \qquad (14)$$
Once the one-year loss of the HP has been defined, it is important to define the contribution for the same time period. Assuming that each policyholder pays a fixed contribution b, the total cash-in amount is $C = b \cdot r$. Hence the r.v. one-year profit is
$$U = C - K \qquad (15)$$
and
$$E[U] = C - E[K]. \qquad (16)$$
It is worth noting that the one-year total amount of contributions is deterministic. The actuarial framework presented refers to the expected values of the r.v.s considered. Hence, for the estimate of the expected value of the r.v. Z, it is sufficient to choose a statistical method of point estimation for $E[N_{i,j}]$, $E[Y_{i,j}]$ and $\mathrm{Prob}[T_i = j \mid N_i > 0]$, such as maximum likelihood or the method of moments. Then, all the expected values of Z for the different aggregations are obtained by summation, since the expected value is a linear operator. It is important to note that Eqs. (8) and (10) introduce a non-linear relation between the random variables involved, due to the effect of the limitations. This implies that, to define the expected value of the r.v. K and consequently U, an estimate of the density function (df) is necessary. Moreover, if we are interested in the assessment of a specific risk measure, we need to consider the properties of the assumed risk measure (e.g. subadditivity, homogeneity, etc.). This implies that also for the calculation of Z, and a fortiori for K and U, it is fundamental to estimate such a density function.
Given the complexity of the stated probability structure, the use of a simulation approach is necessary.
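A stripped-down Python sketch of such a simulation is given below. It covers a single branch, uses common Negative Binomial and Gamma parameters for all policyholders instead of the GLM-fitted individual ones, and omits the family-level cap of Eq. (10); all parameter names and values are placeholders rather than quantities estimated in the paper.

```python
import numpy as np

def simulate_profit(n_pol, contribution, f, s, M, nb_n, nb_p,
                    gam_shape, gam_scale, n_sims=10_000, seed=1):
    """Monte Carlo sketch of the health-plan profit U = C - K for one branch.

    Episode counts are Negative Binomial, episode costs are Gamma, and each
    episode is reimbursed as L = min(Y - max(s*Y, f), M), i.e. Eq. (8) applied
    as written (no explicit floor at zero is added here)."""
    rng = np.random.default_rng(seed)
    C = contribution * n_pol                 # total cash-in amount, C = b * r
    profits = np.empty(n_sims)
    for sim in range(n_sims):
        counts = rng.negative_binomial(nb_n, nb_p, size=n_pol)
        Y = rng.gamma(gam_shape, gam_scale, size=counts.sum())
        L = np.minimum(Y - np.maximum(s * Y, f), M)
        profits[sim] = C - L.sum()
    return profits
```

From the sampled profits, a capital requirement can then be read off, e.g. from a low quantile of the simulated profit distribution.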
3 Numerical Investigation
We set our framework on a database from an Italian HP covering the years 2009 to 2013. The portfolio has r = 53,984 policyholders and H = 24,660 families. The number of observed episodes is 341,494, spread over 21 branches. In order to estimate the df of U we need to start from a specific probabilistic structure for the main r.v.s:
• $N_i$ is Negative Binomial distributed;
• $[T_i \mid N_i > 0]$ is multinomial distributed;
• $Y_{i,j}$ is Gamma distributed.
The subscripts i and j mean that we assume a specific distribution for each policyholder and branch, respectively. To this aim, a possible choice is the introduction of a dependency structure between the response variables and a set of covariates by means of a regression model. Considering the features of the r.v.s previously introduced, Generalized Linear Models (GLM, see [3]) seem an appropriate choice. Considering a vector of covariates $x_i$ for each policyholder, by means of GLM we can estimate the conditional means $E[N|x_i] = E[N_i]$, $E[T_i = j \mid N_i > 0]$ and $E[Y|x_i, j] = E[Y_{i,j}]$. In Fig. 1 we show an example of fitting analysis for the three GLM models, based on the analysis over age of a male for the branch "dental visit". As one can observe in Fig. 1a, the expected number of claims increases with age, whereas in Fig. 1b, for $\mathrm{Prob}[T_i = j \mid N_i > 0]$, a U-shape is observable, as a consequence of the higher demand for orthodontic implants at young ages and of the increase due to dental disease at older ages. The expenditure for a single episode shows only a small increasing trend, and age is not a significant rating factor, as observable in Fig. 1c. As previously stated, the assessment of the expected values of the r.v.s N, T|N and Y is only determinant for the calculation of the expected total expenditure E(Z) (see Eq. 7); therefore, for the estimation of the expected total amount of reimbursement E(K), a simulation approach is necessary. By exploiting the assumed distributions and the capability of the GLM to provide the full set of parameters specifying the conditional density functions, we carry out a Monte Carlo simulation whose final result is the estimate of the density functions of all the involved r.v.s, and especially of the profit r.v. U. The final outcome is a sampled distribution on which it is possible to assess the Solvency Capital Requirement according to several risk measures and/or a specific regulatory framework. In Fig. 2 an example of the sample distribution of the r.v. U is reported.
Fig. 1 Goodness of fitting: (a) Expected number of episodes; (b) Probability of a dental visit; (c) Expected expenditure per episode
Fig. 2 Density function of r.v. U
References 1. Duncan, I., Loginov, M., Ludkovski, M.: Testing alternative regression frameworks for predictive modeling of health care costs. N. Am. Actuar. J. 20(1), 65–87 (2016) 2. Frees, E.W., Lee, G.Y., Rosenberg, M.A.: Predicting the frequency and amount of health care expenditures. N. Am. Actuar. J. 15(3), 377–392 (2011) 3. Mc Cullagh, P.A., Nelder, J.A.: Generalized Linear Models. Taylor & Francis, USA (1989)
An Application of Zero-One Inflated Beta Regression Models for Predicting Health Insurance Reimbursement Fabio Baione, Davide Biancalana, and Paolo De Angelis
Abstract In actuarial practice the dependency between contract limitations (deductibles, copayments) and health care expenditures is measured by the application of the Monte Carlo simulation technique. We propose, for the same goal, an alternative approach based on the Generalized Linear Model for Location, Scale and Shape (GAMLSS). We focus on the estimate of the ratio between the one-year reimbursement amount (after the effect of limitations) and the one-year expenditure (before the effect of limitations). We suggest a regression model to investigate the relation between this response variable and a set of covariates, such as limitations and other rating factors related to health risk. In this way a dependency structure between reimbursement and limitations is provided. The density function of the ratio is a mixture distribution: it has probability mass at 0 and 1 in addition to a continuous density within (0, 1). This random variable does not belong to the exponential family, so an ordinary Generalized Linear Model is not suitable. GAMLSS introduces a probability structure compliant with the density of the response variable; in particular, a zero-one inflated beta density is assumed. The latter is a mixture between a Bernoulli distribution and a Beta distribution. Keywords Health · Beta · GAMLSS · Regression
1 Introduction
Insurance deductibles are a very important contract boundary whose aim is to limit the abuse of reimbursement requests, especially in health insurance policies. Generally, the data set of such contracts contains only censored and/or truncated values, but when the data contain the medical invoices as well as the reimbursement values, it is possible to investigate the effect of deductibles in terms of the ratio of the truncated values and the overall loss, briefly referred to as the indicated deductible relativity (IDR). A standard reference for deductible pricing in actuarial science is [1]. A practical approach to modelling deductible rates in a ratemaking process is the adoption of regression models such as Generalized Linear Models (GLM), see [2–4] among others. However, GLMs are used to predict aggregate claims of the truncated response variable using a log-deductible covariate, and their use is limited to the class of exponential family distributions. Our aim is to propose a different approach to measure the influence of deductibles on health care expenditures by focusing on a regression model where the IDR is the response variable. However, the probability function of the IDR in our framework belongs to a particular mixture distribution named Zero-One Inflated Beta (ZOIB) [5]. As a consequence, we suggest the use of a regression model of the Generalized Linear Model for Location, Scale and Shape (GAMLSS) [6] type.
2 Actuarial Framework
We consider a Health insurance company whose portfolio is allocated to H homogeneous sub-portfolios. Let:
• $Y_i$ be the random variable (r.v.) expenditure for a single episode, i.e. before the application of deductibles;
• $L_i$ be the r.v. reimbursement for a single episode, i.e. after the application of deductibles;
• $x_i$ be the row vector of the design matrix, providing information about policyholder or contract rating factors;
• f the amount of the deductible;
• M the amount of the out-of-pocket maximum.
Hence, we can state:
$$L_i = \min\left\{ I_{(Y_i - f > 0)} \cdot Y_i;\; M \right\}, \qquad (1)$$
where $I_{(Y_i - f > 0)}$ is the indicator r.v. of the event $Y_i - f > 0$. Hence, if the invoice amount is higher than f, the health insurer pays the full expenditure within the limit fixed by M, and 0 otherwise. In order to assess the influence of the deductible and the out-of-pocket maximum on the loss of the company, we focus on the random variable
$R_i = L_i / Y_i$, that is, the proportion of expenditure reimbursed (Indicated Deductible Relativity, IDR). Letting h index the H risk classes and $D_h$ be the $g_h$-element set of policyholders belonging to the h-th risk class, we can state $x_{i \in D_h} = x_h$, $\forall i \in D_h$. Hence, in the following, $E(R|x_{i \in D_h}) = E(R|x_h) = E(R_h)$, $\forall i \in D_h$, is the mean conditional on the $x_h$ vector of risk factors. It is important to note that the r.v. $R_h$ can continuously assume values between zero and one, but $\mathrm{Prob}(R_h = 0) > 0$ and $\mathrm{Prob}(R_h = 1) > 0$. This means that the density function of $R_h$ is a mixture distribution, because $R_h$ is derived from a collection of two random variables: the first one representing the event $(R_h = 0 \cup R_h = 1)$ and the second defining the level of the proportion conditional on the event $(0 < R_h < 1)$. This mixture belongs to the so-called Zero-One Inflated distributions, as the density function of the IDR has probability mass at 0 and 1 in addition to a continuous density within (0, 1). To model the density between (0, 1) we consider a Beta distribution. Then the r.v. $R_h$ is assumed to be Zero-One Inflated Beta distributed. We introduce a mixture between a Bernoulli distribution and a Beta distribution, as proposed in Ospina and Ferrari [5]. Specifically, we assume that the cumulative distribution function (hereafter cdf) of the generic random variable R is:
$$BEINF(r; p_0, p_1, a, b) = (p_0 + p_1)\,\mathrm{Ber}\!\left(r; \tfrac{p_1}{p_0 + p_1}\right) + (1 - p_0 - p_1)\,\mathrm{Beta}(r; a, b), \qquad (2)$$
where $\mathrm{Ber}(\,\cdot\,; \tfrac{p_1}{p_0 + p_1})$ represents the cdf of a Bernoulli random variable with parameter $\tfrac{p_1}{p_0 + p_1}$ and $\mathrm{Beta}(r; a, b)$ is the Beta cdf, whose density function is:
$$beta(r; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, r^{a-1} (1-r)^{b-1}, \qquad 0 < r < 1.$$
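A direct numerical transcription of this density, useful for likelihood-based checks, is sketched below in Python. It assumes, as in the cdf above, that the weight of the continuous component is 1 − p0 − p1, and it only illustrates the ZOIB density itself, not the GAMLSS estimation machinery used in the paper.

```python
import numpy as np
from scipy.special import gammaln

def beinf_logpdf(r, p0, p1, a, b):
    """Log-density of the zero-one inflated Beta: mass p0 at 0, p1 at 1,
    and (1 - p0 - p1) times a Beta(a, b) density on (0, 1)."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    out = np.empty_like(r)
    log_beta_const = gammaln(a + b) - gammaln(a) - gammaln(b)
    inside = (r > 0) & (r < 1)
    out[r == 0] = np.log(p0)
    out[r == 1] = np.log(p1)
    out[inside] = (np.log1p(-(p0 + p1)) + log_beta_const
                   + (a - 1) * np.log(r[inside])
                   + (b - 1) * np.log1p(-r[inside]))
    return out
```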
0, if $\alpha_X > \tfrac{1}{2}$, and infinite, if $\alpha_X < \tfrac{1}{2}$.¹
Stochastic Dominance in the Outer Distributions of the α-Efficiency Domain

Table 1 Financial interpretation of H(t)

H(t)     Stochastic consequence       Investors' belief
> 1/2    Persistence, low variance    Future information will confirm past positions
= 1/2
0 collecting all the epochs of negative (respectively, positive) inefficiencies;
• For each $t \in T^{\alpha}_{\gamma-}$ ($t \in T^{\alpha}_{\gamma+}$), the price X(t + h) is collected, for a set of h trading days ahead with respect to time t;
• The vectors
$$\bar{Y}^{\alpha}_{\gamma-}(h) = \frac{1}{\#(T^{\alpha}_{\gamma-})} \sum_{t \in T^{\alpha}_{\gamma-}} \ln\frac{X(t+h)}{X(t)} \quad \text{and} \quad \bar{Y}^{\alpha}_{\gamma+}(h) = \frac{1}{\#(T^{\alpha}_{\gamma+})} \sum_{t \in T^{\alpha}_{\gamma+}} \ln\frac{X(t+h)}{X(t)}$$
are calculated;
• The conditional distributions of the average log-price variations $N(y) := F_{\bar{Y}^{\alpha}_{\gamma-}(1,\ldots,h_{max})}(y)$ and $P(y) := F_{\bar{Y}^{\alpha}_{\gamma+}(1,\ldots,h_{max})}(y)$ are estimated for some relevant $h_{max}$.
The procedure ensures that the effects revealed by the conditional distributions do not depend on any specific event. Indeed, since the epochs in $T^{\alpha}_{\gamma-}$ (as well as in $T^{\alpha}_{\gamma+}$) can be very far from each other, the consequent prices, the number of traded stocks, the market phases or even the economic cycle can greatly differ. Given the interpretation provided by Table 1, we expect to observe two effects:
• $N(y) \le P(y)$ for all y, with strict inequality at some y (first-order stochastic dominance);
• $\int_{-\infty}^{\infty} (y - \mathbb{E}(y))^2\, dN(y) > \int_{-\infty}^{\infty} (y - \mathbb{E}(y))^2\, dP(y)$ (larger variance for negative inefficiency).
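The construction of the two conditional samples can be sketched in a few lines of Python. The code assumes a daily price series and a same-length series of pointwise regularity estimates H(t) obtained elsewhere; the thresholds default to the 0.47/0.53 values used later for α = 0.05, and the names and the handling of end-of-sample epochs are illustrative choices.

```python
import numpy as np

def conditional_avg_log_variations(prices, H, h_max=21, lo=0.47, hi=0.53):
    """Average log-price variations conditional on negative / positive inefficiency.

    Epochs with H(t) < lo (resp. H(t) > hi) form the negative (positive)
    inefficiency sets; for each h = 1..h_max the mean of log X(t+h)/X(t)
    across the epochs in each set is returned."""
    log_p = np.log(np.asarray(prices, dtype=float))
    T = len(log_p)
    idx_neg = np.where(np.asarray(H) < lo)[0]
    idx_pos = np.where(np.asarray(H) > hi)[0]
    idx_neg = idx_neg[idx_neg + h_max < T]
    idx_pos = idx_pos[idx_pos + h_max < T]
    hs = np.arange(1, h_max + 1)
    y_neg = np.array([np.mean(log_p[idx_neg + h] - log_p[idx_neg]) for h in hs])
    y_pos = np.array([np.mean(log_p[idx_pos + h] - log_p[idx_pos]) for h in hs])
    return y_neg, y_pos
```

The empirical cdfs of the two returned vectors are the estimates of N(y) and P(y).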
4 Application and Discussion of Results The procedure described in Sect. 3 was applied to the analysis of two stock indexes: the U.S. Dow Jones Industrial Average (DJIA), and the U.K. Footsie 100 (FTSE100), both referred to a period of 35 years (from January 29, 1985 to December 31, 2019), resulting in 8802 observations for the DJIA and 8824 observations for the FTSE100. The analysis was performed by setting h from 1 up to 250 trading days and h max to 1, 3, 6 trading months and 1 trading year. The significance level to test for inefficiency
Fig. 1 Time average returns with respect to H and number of days ahead, with $h_{max}$ = 1 trading year (left panel: DJIA; right panel: FTSE 100)
Fig. 2 Two left panels: Conditional distributions of the averaged log-price variations for positive (dotted line) and negative (continuous line) inefficiency, with h max = 1 trading month. Two right panels: Conditional distributions of the averaged log-price variations for positive (dotted line) and negative (continuous line) inefficiency, with h max = 6 trading months
was set at α = 0.05, corresponding to $\Phi^{-1}(\alpha/2) \simeq 0.47$ and $\Phi^{-1}(1-\alpha/2) \simeq 0.53$. The results, reproduced in Figs. 1 and 2, show that in the short term the returns behave as expected. In detail, Fig. 1 displays that up to one trading year both indexes have conditional mean variations generally higher for negative inefficiency than for positive inefficiency. The pattern is even more evident if one looks at the extremal values of the estimated pointwise regularity exponents. An element of deep distinction between the two indexes can be observed as H approaches 1/2: the conditional mean variations continue to be largely positive for the DJIA, whereas they undergo a significant downward correction for the FTSE100. A possible explanation for this effect is the injections of liquidity that the Federal Reserve provided to the U.S. market during the last global financial crisis. As is well documented, this caused the U.S. financial market to rise. Figure 2 confirms the findings above in terms of (at least) first-order stochastic dominance between the distributions of the conditional mean variations up to one trading month ($h_{max}$ = 21 days). Interestingly, as $h_{max}$ increases up to six trading months, the evidence becomes more questionable: negative inefficiency continues to generate (moderate) overreaction for the DJIA, with N(y) still dominating P(y).
Fig. 3 Variance of the distributions of the averaged log-price variations, h max = 21 days
For the FTSE100, instead, N(y) is almost superimposed on P(y). Again, this can be a symptom of the different level of liquidity provided to the two markets. Finally, Fig. 3 displays the behaviour of the variances of negative and positive inefficiency, up to $h_{max}$ = 21 days, which is the largest time horizon for which we find evidence of first-order stochastic dominance for both indices. The data confirm that, at least in the two samples, the larger variance occurs for negative inefficiency.
5 Conclusions and Further Developments
We used the notion of α-efficiency to characterize a well-documented behaviour of stock markets: under/overreaction. Evidence is provided that negative inefficiency generates overreaction, as opposed to positive inefficiency, which generally triggers underreaction. More extensive analyses can be made on individual stocks.
Formal and Informal Microfinance in Nigeria. Which of Them Works? Marinella Boccia
Abstract The aim of this paper is to study whether access to formal/informal credit in Nigeria has an effect on household wellbeing. Using the General Household Panel Survey of the World Bank, quantitative methods are employed in order to evaluate these effects. The empirical results show a positive relationship, especially for the expenditure on non-durable goods. Moreover, the results for total income and for the reduction in poverty and inequality are not surprising. Basically, the improvement in households' living standards is recorded in the same year in which they borrow. Keywords Microfinance · Wellbeing · Econometrics
1 Introduction
"Microfinance institutions have rapidly evolved in the last decade and have been able to create significant income and employment opportunities for the poor in developing countries" [5]. In these countries, rural people, who rely mostly on moneylenders for credit and who may not have access to safe and convenient savings services, could benefit from the development of formal and/or semi-formal microfinance institutions if these are able to reach clients with financial products that are more useful and less expensive [18]. Rural areas, in fact, are characterized by lower costs, given their access to information and social capital and given that people have the possibility to borrow within a group whose members guarantee each other [6]. As to microfinance, according to [19] it may be classified as Formal, Semi-formal and Informal. Informal microfinance is carried out by people themselves; it involves organizations and individuals, and usually not institutions; moreover, external intervention and legal status are excluded from it [6]. Semi-formal microfinance refers to services in-between formal and informal microfinance. It is typically not regulated by finance authorities, even if it may be necessary for the providers to have a license
and be supervised by government agencies [12]. Finally, formal microfinance comprises services provided by public and private banks, insurance and finance businesses under banking regulation and control [19]. Moreover, the incapacity of the formal financial institutions to offer financial services to both the urban and the rural poor has spurred the development of microfinance institutions [3]. For example, in Nigeria informal microfinance operates under different names: esusu or ajo among the Yorubas of Western Nigeria, etoto for the Igbos in the East and adashi in the North for the Hausas [3, 6, 20]. Its most important features are its savings and credit components, the informality of operations and lower interest rates. These informal institutions, which practise traditional microfinance in various forms, are present in all rural areas of Nigeria [16, 20]. Given that, the aim of this paper is to study whether access to formal/informal credit in Nigeria has an effect on household wellbeing. This study uses the General Household Survey¹ (GHS in what follows) of the World Bank (WB in what follows), which contains a set of structured questionnaires for 5,000 households in 2010 (Post Planting Period) and 2011 (Post Harvest Period). The paper is organized as follows: Sect. 2 provides a review of the literature, Sect. 3 describes the most important variables employed in this analysis, Sect. 4 defines the estimation strategy, and Sect. 5 comments on the results and concludes.
2 Related Literature
During the last years the literature has been interested in studying microfinance and, more in detail, in evaluating its effect on poverty reduction and welfare. A number of studies have found a positive relationship between microfinance and consumption and production [10, 13]. However, this evidence is not unanimous, and some studies do not support the positive link according to which microcredit might be a way to alleviate poverty and reduce inequality [4, 21]. The relationship between microcredit/microfinance and welfare has also been investigated employing both experimental [2] and quasi-experimental analyses, for which results differ. As to the results of the experimental studies, they provide robust empirical evidence for a positive relationship between access to finance and household welfare. Moreover, analysing the quasi-experimental designs [7, 8, 11, 14, 17], some studies showed a positive effect in improving consumption [8, 17] and in modifying investment allocations and affecting wage growth and wealth creation [8]. Moreover, greater financial access from banks and other financial institutions is associated with a reduced dependence on household production and increased investment in human capital [1].
National Bureau of Statistics, Federal Republic of Nigeria. Nigeria General Household Survey (GHS), Panel 2010, Wave 1 Ref. NGA_2010_GHSP-W1_v03_M. Dataset downloaded at http:// microdata.worldbank.org/index.php/catalog/1002/download/40238 on 03/04/2019—University of Salerno. The user of the data (Marinella Boccia) acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
3 Inequality and Well-Being
The analysis of wellbeing is related to the distribution of income and the distribution of consumption. In what follows, both income (total income) and expenditure (monthly non-durable expenditure, the variation in monthly non-durable expenditure and equalized expenditure) measures of well-being are employed, as well as the food share. Monthly non-durable expenditure (exp2010; exp2011) includes spending on food away from home, food at home, transport, tobacco, newspapers and other non-durable items; the variation in monthly non-durable expenditure (deltaexp) is the difference between 2011 and 2010 and captures the growth of this outcome over the period considered; equalized monthly expenditure (eqvexp2010; eqvexp2011) is computed by dividing expenditure by the square root of household size; the share of food (sfood2010; sfood2011) is defined as the ratio of food expenditure to total expenditure, while the measure of total income (totinc2010; totinc2011) contains information on the value of sales of non-farm enterprises, farm enterprises, sales of animals, rental income and other income. All variables are deflated using the Consumer Price Index produced by the National Bureau of Statistics (NBS) for 2010 and 2011.
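For concreteness, the construction of these outcome variables can be written as a short pandas sketch. The column names used below are illustrative placeholders rather than the GHS variable names, and whether the outcomes later enter the regressions in levels or in logs is left to the analyst.

```python
import numpy as np
import pandas as pd

def build_wellbeing_measures(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the outcome variables described above from a household-level frame.

    Assumed (illustrative) columns: 'exp2010', 'exp2011' (monthly non-durable
    expenditure), 'food2010', 'food2011' (food expenditure), 'hhsize',
    'totinc2010', 'totinc2011'."""
    out = df.copy()
    out["deltaexp"] = out["exp2011"] - out["exp2010"]
    out["eqvexp2010"] = out["exp2010"] / np.sqrt(out["hhsize"])
    out["eqvexp2011"] = out["exp2011"] / np.sqrt(out["hhsize"])
    out["sfood2010"] = out["food2010"] / out["exp2010"]
    out["sfood2011"] = out["food2011"] / out["exp2011"]
    return out
```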
4 Estimation Strategy
Equations 1 and 2 set out a value-added specification relating changes in wellbeing to access to formal and informal microcredit during 2010. Estimation thus yields the change in well-being that is related to the two forms of credit considered. The analysis was computed considering area fixed effects:
$$Y_i = \alpha_0 + \beta_1 D_i^F + \beta_2 D_i^I + \delta \chi_i + \eta_a + \varepsilon_i \qquad (1)$$
$$Y_i^{2011-2010} = \alpha_0 + \beta_1 D_i^F + \beta_2 D_i^I + \delta \chi_i + \eta_a + \varepsilon_i \qquad (2)$$
In the above specification, $Y_i$ in Eq. 1 indicates the outcomes listed above (monthly non-durable expenditure, equalized expenditure, total income, food share) in 2010 and 2011, while $Y_i^{2011-2010}$ in Eq. 2 denotes the variation in monthly non-durable expenditure; $D_i^F$ and $D_i^I$ are dummy variables indicating, respectively, whether households have access to formal and informal credit in 2010; $\chi_i$ represents a set of household characteristics measured in 2010; $\eta_a$ is an area-specific fixed effect and $\varepsilon_i$ is an error term. The same estimation is replicated for the case in which a female head has access to formal and informal credit. Moreover, an analysis by area was also performed:
$$Y_a = \alpha_0 + \beta_1 D_a^F + \beta_2 D_a^I + \delta \chi_a + \varepsilon_a \qquad (3)$$
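A minimal Python sketch of how Eq. (1) can be estimated with area fixed effects and area-clustered standard errors is reported below, using statsmodels. The dummy, control and area column names are hypothetical and do not correspond to the survey's variable names; the same call applies to Eq. (2) by changing the outcome.

```python
import statsmodels.formula.api as smf

def fit_fe_model(df, outcome):
    """Area fixed-effects estimate of Eq. (1)/(2): outcome regressed on the
    formal- and informal-credit dummies plus household controls, with area
    dummies and standard errors clustered by area.

    Assumes df has no missing values in the columns used."""
    formula = (f"{outcome} ~ formal + informal + hh_members + age_head "
               "+ female_head + years_school + C(area)")
    return smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["area"]})
```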
Table 1 FE estimate of microcredit impact

                    Deltaexp            Exp2010             Exp2011
Formal credit       −0.0279 (0.0626)    0.254*** (0.0554)   0.243*** (0.0569)
Informal credit     −0.0457 (0.0349)    0.120*** (0.0379)   0.0832** (0.0331)
Observations        4,805               4,957               4,815
R-squared           0.170               0.502               0.440
a Effect of formal and informal microcredit on wellbeing variables. The independent variables (whose results are not reported for brevity but are available upon request) are the number of household members, age of the head, average household age, gender of the head, number of spouses, number of adult females, number of adult males, and years of schooling. Fixed effects are by area. Robust standard errors clustered by area are in parentheses. * Significant at 10%; ** significant at 5%; *** significant at 1%
The estimation of the effects of formal and informal access to credit by area is computed with an OLS regression (Eq. 3). In order to do this, an inequality area indicator is considered² together with the poverty rate for 2010 and 2011.³
5 Results and Conclusions
The overall results related to Eqs. 1 and 2 (see Tables 1, 2 and 3) indicate that microcredit has a positive impact on the value of non-durable expenditure, especially for access to formal credit with respect to informal credit: the total monthly non-durable expenditure and the equalized expenditure exhibit positive and significant coefficients, and the effect is higher in the same year in which households borrow. This effect, however, given the negative coefficient of the food share, seems to be driven by the other components of non-durable expenditure. This means that households that have access to formal microcredit, more than those with access to informal credit, allocate resources to improve productive activities, as shown by the increase in total income. Also, the role of women is evident mainly in the context of formal credit. As to the results by area, they are still far from allowing one to argue for an effective reduction of poverty and inequality.⁴
It is represented by the standard deviation of the logged non-durable expenditure for both 2010 and 2011.
Formal and Informal Microfinance in Nigeria. Which of Them Works? Table 2 FE estimate of microcredit impact Outcomes Eqvexp2010 Formal credit Informal credit Observations Rsquared a See
0.239*** (0.0556) 0.114*** (0.0382) 4,957 0.456
Eqvexp2011
Totinc2010
0.226*** (0.0574) 0.0777** (0.0332) 4,815 0.417
0.283* (0.156) −0.0172 (0.0860) 3,047 0.292
Sfood2010
Sfood2011
−0.0499*** (0.0169) −0.00904 (0.00844) 4,957 0.270
−0.00560 (0.0162) −0.00896 (0.00814) 4,815 0.299
Note Table 1
Table 3 FE estimate of microcredit impact Outcomes Totinc2011 Formal Credit Informal Credit Observations R-squared a See
107
0.217 (0.161) 0.0109 (0.0824) 3,065 0.297
Note Table 1
Disclaimer The author (user of the data) acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for the use of the data or for interpretations or inferences based upon such uses.
Conflict of Interest The author declares that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
1. Amendola, A., Boccia, M., Mele, G., Sensini, L.: An assessment of the access to credit-welfare nexus: evidence from Mauritania. Int. J. Bus. Manag. (2017). https://doi.org/10.5539/ijbm.v12n9p77
2. Banerjee, A., Duflo, E., Glennerster, R., Kinnan, C.: The miracle of microfinance? Evidence from a randomized evaluation. Am. Econ. J.: Appl. Econ. (2015). https://doi.org/10.1257/app.20130533
3. CBN (2000 and 2001): Annual Report and Statement of Account. Central Bank of Nigeria, Abuja. http://www.cbn.gov.ng/Documents/cbnannualreports.asp
4. Diagne, A., Zeller, M.: Access to Credit and its Impact on Welfare in Malawi. Research Report 116, International Food Policy Research Institute (2001). https://core.ac.uk/download/pdf/6289693.pdf
5. Iheduru, N.G.: Women Entrepreneurship and Development: The Gendering of Microfinance in Nigeria. Paper presented at the 8th International Interdisciplinary Congress on Women, Makerere University, Kampala, Uganda (2002). http://www.gdrc.org/icm/country/nigeria-women.html
6. Ijaiya, M.A.: Informal microfinance and economic activities of rural dwellers in Kwara South senatorial district of Nigeria. Int. J. Bus. Soc. Sci. 2(15), 136–146 (2011)
7. Kaboski, J.P., Townsend, R.M.: A structural evaluation of a large-scale quasi-experimental initiative. Econometrica (2011). https://doi.org/10.3982/ECTA7079
8. Kaboski, J.P., Townsend, R.M.: The impact of credit on village economies. Am. Econ. J.: Appl. Econ. (2012). https://doi.org/10.1257/app.4.2.98
9. Karlan, D., Zinman, J.: Expanding credit access: using randomized supply decisions to estimate the impacts. Rev. Fin. Stud. (2010). https://doi.org/10.1093/rfs/hhp092
10. Khandker, S.R., Faruqee, R.R.: The impact of farm credit in Pakistan. Agri. Econ. 28, 21–97 (2003). https://doi.org/10.1596/1813-9450-2653
11. Khandker, S.R.: Microfinance and poverty: evidence using panel data from Bangladesh. World Bank Econ. Rev. (2005). https://doi.org/10.1093/wber/lhi008
12. Ledgerwood, J.: Microfinance Handbook: An Institutional and Financial Perspective. World Bank, Sustainable Banking with the Poor project, Washington, D.C. (1999). http://documents.worldbank.org/curated/en/347491468326386001/Microfinance-handbook-an-institutionaland-financial-perspective
13. Mahjabeen, R.: Microfinance in Bangladesh: impact on households, consumption and welfare. J. Policy Model. 30(6), 1083–1092 (2008)
14. Morduch, J.: Does microfinance really help the poor? New evidence from flagship programs in Bangladesh. New York University, NYU Wagner, New York, NY (1998). https://wagner.nyu.edu/files/faculty/publications/1998-Does-MF-really-help-the-poor.pdf
15. Mosley, P., Hulme, D.: Microenterprise finance: is there a conflict between growth and poverty alleviation? World Development (1998). https://doi.org/10.1016/S0305-750X(98)00021-7
16. Otu, M.F., et al.: Informal Credit Market and Monetary Management in Nigeria. Central Bank of Nigeria, Research Department, Occasional Paper No. 29 (2003)
17. Pitt, M., Khandker, S.R.: The impact of group-based credit on poor households in Bangladesh: does the gender of participants matter? J. Polit. Econ. (1998). https://doi.org/10.1086/250037
18. Ravicz, R.: Searching for Sustainable Microfinance. World Bank Research Working Paper No. 1878, Washington D.C. (1998). https://doi.org/10.1596/1813-9450-1878
19. Wilson, T.: Microfinance During and After Armed Conflict: Lessons from Angola, Cambodia, Mozambique and Rwanda. The Springfield Centre for Business in Development, Mountjoy Research Centre, Durham (2001). http://www.microfinancegateway.org/sites/default/files/mfg-en-paper-microfinance-during-and-after-armed-conflictlessons-from-angolacambodia-mozambique-and-rwanda-mar-2002.pdf
20. Yelwa, M., Omoniyi, A.E., Obansa, S.A.J.: Analysis of the relationship between informal financial institutions and poverty alleviation in Nigeria: a multivariate panel data approach. Int. J. Soc. Sci. Human. Invent. (2017). https://doi.org/10.18535/ijsshi/v4i7.13
21. Zeller, M., Schrieder, G., Von Braun, J., Heldhues, F.: Rural Finance for Food Security for the Poor: Implications for Research and Policy. Review 4, International Food Policy Research Institute, Washington D.C. (1997). http://core.ac.uk/download/pdf/6289435.pdf
Conditional Quantile Estimation for Linear ARCH Models with MIDAS Components Vincenzo Candila and Lea Petrella
Abstract Recent financial crises have put an increased emphasis on methods devoted to risk management. Among the plethora of risk measures proposed in the literature, the Value-at-Risk (VaR) still plays a prominent role today. Despite some criticisms, the VaR measures are fundamental in order to adequately set aside risk capital. For this reason, during the last decades the literature has been interested in proposing increasingly accurate VaR models. Recently, the quantile regression approach has been used to directly forecast the VaR measures. We embed the linear AutoRegressive Conditional Heteroscedasticity (ARCH) model with a MIDAS (MI(xed)-DA(ta) Sampling) term in such a quantile regression (QR) framework. The proposed model, named Quantile ARCH-MIDAS (Q–ARCH–MIDAS), allows one to benefit from the information coming from variables observed at different frequencies with respect to that of the variable of interest. Moreover, the QR context brings additional advantages, such as robustness to the presence of outliers and the lack of distributional assumptions. Keywords Value-at-risk · Quantile regression · Mixed-frequency variables
1 Introduction
During the last decades, the financial econometrics literature has paid particular attention to methods for risk management. Among the different risk measures proposed in the literature, the Value-at-Risk (VaR) still plays a leading role today. This is because, despite some criticisms [1], according to the Basel frameworks the VaR measures are fundamental in order to adequately set aside risk capital [2]. The methodology used to obtain the VaR measures can be broadly divided into three main categories: parametric, non-parametric and semi-parametric [3]. The
parametric approach requires the estimation of the volatility of the asset under investigation as a first step. Typically, the GARCH [4] class of models is used. Secondly, the VaR measures are indirectly obtained by considering these volatility estimates and the quantile at a fixed level of the presumed distribution of the asset. Contrary to the parametric approach, the non-parametric technique does not make any distributional assumption concerning the daily returns. The semi-parametric technique specifies the updating dynamics of the model, but does not require any distributional assumptions. Contributions based on the quantile regression (QR) [5, 6] framework belong to the class of semi-parametric methods for the estimation of the VaR measures. Recent works employing QR methods are [7, 8], among others. Within the QR context, the aim of this paper is to investigate the profitability of including a MIDAS (MI(xed)-DA(ta) Sampling) [9] component in the linear AutoRegressive Conditional Heteroscedasticity (ARCH) [10] model. The advantage of using the proposed Quantile ARCH–MIDAS (Q–ARCH–MIDAS) model to estimate the VaR measures is that these values are directly obtained as conditional quantiles of the daily return process, which in turn may depend on some exogenous variables observed at lower frequencies. In this respect, many works ([11, 12], among others) highlight the influence of macro-economic variables (observed monthly or quarterly) on the (daily) variability of assets and commodities. Most of those works use MIDAS terms within the heteroscedasticity class of models, inspired by the seminal paper of [13]. To the best of our knowledge, this is the first time that a MIDAS term is included in a QR framework. Being in the semi-parametric context, there is no need to assume a particular (potentially misspecified) distribution for the returns. Another advantage is the robustness to outliers in the data, which may indeed alter the estimates of many commonly used models (see, for instance, the contribution of [14]). The rest of the paper is organized as follows. Section 2 introduces the proposed Q–ARCH–MIDAS model, while Sect. 3 is dedicated to the Monte Carlo experiment, used to evaluate the finite sample properties of the model. Finally, Sect. 4 is devoted to the empirical application.
2 Q–ARCH–MIDAS Model In the linear regression model, the relationship between a dependent variable yi , at time i, and a set of covariates x i is represented by the following equation:
yi = x i β + u i ,
(1)
where the vector x_i includes an intercept and k − 1 covariates, while the zero-mean i.i.d. error term u_i, with quantile function Q_u(τ), is left with an unspecified distribution. As demonstrated by [5], the τ-th quantile of y_i, conditional on x_i, is:
Q yi (τ |x i ) = x i β(τ ),
(2)
where, in line with [15], the k × 1 vector β(τ) = (β_1 + Q_u(τ), β_2, . . . , β_{k−1}) is obtained by minimizing the following loss function:

\hat{\beta}(\tau) = \arg\min_{\beta \in \mathbb{R}^{k}} \left[ \sum_{i \in \{i :\, y_i \geq x_i'\beta\}} \tau \, |y_i - x_i'\beta| \; + \sum_{i \in \{i :\, y_i < x_i'\beta\}} (1-\tau) \, |y_i - x_i'\beta| \right].

A Bayesian Generalized Poisson Model for Cyber Risk Analysis

[…] θ > 0 and λ ∈ (0, 1) is given by

P(X = x \mid \theta, \lambda) = \frac{\theta\,(\theta + \lambda x)^{x-1}}{x!}\, e^{-\theta - \lambda x}, \quad x = 0, 1, 2, \ldots \qquad (1)
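As a quick computational illustration of Eq. (1) — not part of the original chapter — the following Python sketch evaluates the Generalized Poisson probability mass function on the log scale; the function name and the test values are ours.

```python
import math

def gp_pmf(x: int, theta: float, lam: float) -> float:
    """Generalized Poisson pmf: theta*(theta + lam*x)**(x-1)/x! * exp(-theta - lam*x)."""
    if x < 0:
        return 0.0
    log_p = (math.log(theta)
             + (x - 1) * math.log(theta + lam * x)
             - (theta + lam * x)
             - math.lgamma(x + 1))
    return math.exp(log_p)

# For lam = 0 the GP pmf collapses to the standard Poisson pmf (here Poisson(2) at x = 3).
print(gp_pmf(3, theta=2.0, lam=0.0))   # ~0.1804
print(gp_pmf(3, theta=2.0, lam=0.25))  # over-dispersed case
```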
The GP distribution is part of the general class of Lagrangian distributions (see [6]) and has the following moments

\mu_{(1)} = \frac{\theta}{1-\lambda}, \qquad \mu_{(2)} = \frac{\theta}{(1-\lambda)^{3}}, \qquad (2)

\mu_{(3)} = \frac{1+2\lambda}{\sqrt{\theta(1-\lambda)}}, \qquad \mu_{(4)} = 3 + \frac{1+8\lambda+6\lambda^{2}}{\theta(1-\lambda)}, \qquad (3)
where μ(r ) = E(X − E(X ))r . For λ = 0 one obtains the standard Poisson distribution as special case, whereas for λ ∈ (0, 1) over-dispersion is obtained, that is a
greater variability than would be explained by the standard Poisson model. Following [9] we assume the following independent gamma and beta prior distributions θ ∼ G a(a, b), λ ∼ Be(c, d)
(4)
where a and b are the shape and rate parameters of the gamma distribution. Given a sequence of i.i.d. observations X 1 , . . . , X T from the GP distribution, the joint posterior distribution of the parameters θ and λ is π(θ, λ|X 1 , . . . , X T ) ∝ L(X 1 , . . . , X T |θ, λ)π(θ, λ)
(5)
where L(X 1 , . . . , X T |θ, λ) denotes the likelihood function and π(θ, λ) the joint prior distribution given in Eq. 4. The posterior distribution is not tractable due to lack of conjugacy, thus we propose a Metropolis-Hastings (MH) algorithm to generate random samples from π(θ, λ|X 1 , . . . , X T ) and to approximate the Bayes estimators and all the posterior quantities of interest. At the i-th iteration of the MH sampler a candidate for the parameters is generated from the two independent random walk proposal distributions θ ∗ ∼ G a(a (i) , b(i) ), λ∗ ∼ Be(c(i) , d (i) )
(6)
where b(i) = r/θ (i−1) , a (i) = b(i) θ (i−1) , c(i) = sλ(i−1) , d (i) = s(1 − λ(i−1) ) and θ (i−1) and λ(i−1) denote the previous iteration values of the parameters.
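A self-contained Python sketch of the MH scheme just described is given below. It is only an illustration: the placeholder data, the starting values, the random seed and all function names are ours, and a Hastings correction term is included because the gamma/beta proposal parameters depend on the current state of the chain.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def gp_loglik(x, theta, lam):
    """Log-likelihood of i.i.d. Generalized Poisson observations x (Eq. 1)."""
    x = np.asarray(x)
    return np.sum(np.log(theta) + (x - 1) * np.log(theta + lam * x)
                  - (theta + lam * x) - gammaln(x + 1))

def log_post(x, theta, lam, a=10.0, b=1.0, c=2.0, d=2.0):
    """Log-posterior: GP likelihood plus Ga(a, b) prior on theta and Be(c, d) prior on lambda."""
    if theta <= 0.0 or not (0.0 < lam < 1.0):
        return -np.inf
    return (gp_loglik(x, theta, lam)
            + stats.gamma.logpdf(theta, a, scale=1.0 / b)
            + stats.beta.logpdf(lam, c, d))

def mh_sampler(x, n_iter=4000, r=0.01, s=100.0, seed=0):
    """Metropolis-Hastings with the gamma/beta random-walk proposals of Eq. (6)."""
    rng = np.random.default_rng(seed)
    theta, lam = float(np.mean(x)) + 1e-6, 0.1      # illustrative starting values
    draws = np.empty((n_iter, 2))
    for i in range(n_iter):
        b_i = r / theta; a_i = b_i * theta          # proposal centred on the current theta
        c_i = s * lam;   d_i = s * (1.0 - lam)      # proposal centred on the current lambda
        theta_p = rng.gamma(a_i, 1.0 / b_i)
        lam_p = rng.beta(c_i, d_i)
        if theta_p <= 0.0 or not (0.0 < lam_p < 1.0):
            draws[i] = theta, lam                   # numerically degenerate proposal: reject
            continue
        b_p = r / theta_p; a_p = b_p * theta_p      # reverse-move parameters (Hastings correction)
        c_p = s * lam_p;   d_p = s * (1.0 - lam_p)
        log_ratio = (log_post(x, theta_p, lam_p) - log_post(x, theta, lam)
                     + stats.gamma.logpdf(theta, a_p, scale=1.0 / b_p)
                     + stats.beta.logpdf(lam, c_p, d_p)
                     - stats.gamma.logpdf(theta_p, a_i, scale=1.0 / b_i)
                     - stats.beta.logpdf(lam_p, c_i, d_i))
        if np.log(rng.uniform()) < log_ratio:
            theta, lam = theta_p, lam_p
        draws[i] = theta, lam
    return draws

# Example with simulated counts (placeholder data, not the cyber-attack series of Sect. 3)
counts = np.random.default_rng(1).poisson(3, size=365)
samples = mh_sampler(counts)
print(samples[1000:].mean(axis=0))   # posterior means of (theta, lambda) after burn-in
```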
3 A Cyber Attacks Dataset Our dataset contains observations on the number of cyber threats collected at the daily frequency from the 1st January 2018 to the 31st December 2018. The data source is https://www.hackmageddon.com/. The observations refer to the following classes of threats: cyber crime, cyber espionage, cyber warfare, and hacktivism. The series of the total number of cyber threats is given in Fig. 1. In the following we consider the time series of the total number of cyber events and apply the Bayesian model and the Markov Chain Monte Carlo (MCMC) procedure proposed in Sect. 2. We set the gamma prior hyper-parameters to a = 10, b = 1 and the beta prior hyper-parameters to c = 2 and d = 2. We set the scale to r = 0.01 and the precision to s = 100 in the MH random-walk proposal distributions, run the MCMC for 4,000 iterations and obtain an average acceptance rate of 32%. The top plots of Fig. 2 provide the posterior draws (gray lines) and the progressive averages (black lines). After a graphical inspection of the MCMC draws and of the progressive averages we can see that the MCMC chain enters the high-probability region after about 1,000 iterations. Thus, we choose to discard an initial burn-in sample of 1,000 MCMC iterations and also to reduce dependence in the MCMC draws by removing one
Fig. 1 Time series (left) and histogram (right) of the total number of cyber threats collected at the daily frequency from 1st January 2018 to 31st December 2018. Data source: https://www. hackmageddon.com/
Fig. 2 Top: posterior MCMC draws (gray solid lines) and progressive averages (black dashed lines) for θ (left) and λ (right). Bottom: prior distributions (red solid lines) and kernel density estimates of the posterior distributions (black dashed lines) for θ (left) and λ (right)
sample every two. Thus, the posterior approximation of all quantities of interest is based on 2,000 MCMC samples. Bottom plots show the prior distributions (red solid lines) and kernel density estimates of the posterior distributions (black dashed lines) of λ and θ based on the MCMC samples. The comparison between prior and posterior distribution suggests that the prior has been revised and the data are informative about the value of the parameters. The Bayesian estimates of θ and λ are θˆB = 2.35 and λˆ B = 0.24, respectively. We find substantial evidence of over-dispersion which indicates the standard
Fig. 3 Left: empirical distribution of the data (black), generalized Poisson distribution (blue) at θ = θ̂_B = 2.35 and λ = λ̂_B = 0.25 and the standard Poisson distribution (red) at θ = θ̂_B = 3.14. Right: empirical cumulative distribution of the data (black dots), the high posterior density region at the 95% level for the generalized Poisson distribution (blue) and the standard Poisson (red)
Fig. 4 Sequential inference on a rolling window of 120 observations. Top: sequential estimates of θ (left plot, black solid lines) and λ (right plot, black solid lines) and their 95% HPD regions (gray areas). In both plots horizontal red dashed lines represent whole sample parameter estimates. Bottom: sequential coefficient of variation (left plot) and of the asymmetry μ(3) (right plot, left scale) and kurtosis μ(4) (right plot, right scale)
Poisson model is not well suited for these data. Figure 3 provides a comparison between the GP model, the standard Poisson model and the empirical distribution of the cyber-attack frequency. The left plot suggests that the GP (blue dots) captures the dispersion and fat tails of the empirical distribution better than the Poisson (red dots). The right plot provides the 95% HPD regions of the GP (blue area) and Poisson (red area) cumulative distributions. The empirical cumulative distribution (black dots) is not entirely contained in the HPD region of the Poisson model.
Top plots of Fig. 4 provide the whole sample estimates (horizontal dashed lines), the sequential posterior mean (solid black lines) and sequential 95% HPD region (gray shaded area) over time on a rolling window of 120 observations. We find evidence of substantial temporal fluctuations in the parameters of the GP distribution. More specifically, the sequential estimation of the mean μ(1) (bottom-left plot of Fig. 4) indicates that the expected number of cyber attacks increased from 2.62 in 2017 to 3.85 in 2018 (blue solid line, right axis). In the same plot the estimated coefficient of variation C = μ(2) /μ(1) (red dashed line) increased from 1.15 in April 2017 to 2.03 in December 2018. The estimated values of μ(3) and μ(4) (bottom-right) indicate an increasing degree of asymmetry and tail heaviness in the distribution of the attack frequency.
4 Conclusion Our Bayesian analysis of the cyber-attack frequency provides evidence of over-dispersed data and of a better fit of the generalised Poisson model than of the standard Poisson model. Our sequential analysis confirms the escalation of cyber threats and the increased complexity of the phenomenon.
References
1. EIOPA: Understanding Cyber Insurance: A Structured Dialogue with Insurance Groups (2018). Available at https://eiopa.europa.eu/Publications/Reports/EIOPAUnderstandingcyberinsurance.pdf
2. EIOPA: Cyber Risk for Insurers: Challenges and Opportunities (2019). Available at https://eiopa.europa.eu/Publications/Reports/EIOPA_Cyber_risk_for_insurers_Sept2019.pdf
3. Xu, M., Hua, L., Xu, S.: A vine copula model for predicting the effectiveness of cyber defense early-warning. Technometrics 59(4), 508–520 (2017)
4. Chen, C.W.S., Lee, S.: Generalized Poisson autoregressive models for time series of counts. Comput. Stat. Data Anal. 99, 51–67 (2016)
5. Chen, C.W.S., So, M.K.P., Li, J.C., Sriboonchitta, S.: Autoregressive conditional negative binomial model applied to over-dispersed time series of counts. Stat. Methodology 31, 73–90 (2016)
6. Consul, P.C., Jain, G.C.: A generalization of the Poisson distribution. Technometrics 15(4), 791–799 (1973)
7. Husák, M., Komárková, J., Bou-Harb, E., Čeleda, P.: Survey of attack projection, prediction, and forecasting in cyber security. IEEE Commun. Surv. Tutorials 21(1), 640–660 (2018)
8. Leslie, N.O., Harang, R.E., Knachel, L.P., Kott, A.: Statistical models for the number of successful cyber intrusions. J. Defense Model. Simul. Appl. Methodol. Technol. 1–16
9. Scollnik, D.: On the analysis of the truncated generalized Poisson distribution using a Bayesian method. ASTIN Bull. 28(1), 135–152 (1998)
10. Zhu, F., Li, Q.: Moment and Bayesian estimation of parameters in the INGARCH(1,1) model. J. Jilin Univ. (Science Edition) 47, 899–902 (2009)
Implementation in R and Matlab of Econometric Models Applied to Ages After Retirement in Europe Patricia Carracedo and Ana Debón
Abstract Econometric models such as panel data models are becoming popular due to the software available to implement them. A comparison between the R and Matlab software, detailing the steps needed to obtain the models, can therefore be useful. The steps taken to study the convenience of increasing the complexity of the model are described, and the models are validated by the goodness-of-fit measures R2 and the residual variance σ2. The cost of ages after retirement has been the focus of attention of insurers and Social Security administrations; accordingly, the implementation is carried out with an example of mortality for ages after retirement in Europe. This study aims to improve and compare panel data models using Matlab and R software. These models take into account the temporal and spatial dependence of the data and identify the covariates which explain mortality. The panel data used correspond to the male and female mortality of the aged population of European countries from the Human Mortality Database. Mortality is quantified with the Comparative Mortality Figure, which is the most suitable statistic for the comparison of mortality by sex over space when detailed specific mortality is available for each studied population. The covariates considered in the model are collected from Eurostat and the World Data Bank. Keywords Comparative Mortality Figure · Spatial-panel data models · Europe · Matlab · R
1 Introduction Since the turn of the century, econometrics has focused its studies on the specification and estimation of spatio-temporal panel data models. This interest is due to the greater P. Carracedo (B) Área de ciencias sociales, Universidad Internacional de Valencia, Pintor Sorolla, 21, 46002 Valencia, Spain e-mail: [email protected] A. Debón Centro de Gestión de la Calidad y del Cambio, Universitat Politècnica de València, Camino de Vera, s/n, 46002 Valencia, Spain e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_20
availability of this kind of data and the advances in software technology for data collection and storage. This interest has gone hand in hand with the development of new packages to model spatial dependence over time in panel data. A quick look at Matlab and R might suggest they are relatively similar. They both offer access to math functions, a language, statistics, and a community of users. However, a closer look at the technical capabilities of each one and an assessment of other important factors, such as documentation and quality, could lead to a different conclusion. The main objective of this study is to compare the implementation of spatio-temporal panel data models using Matlab and R software. The contents of the article are structured as follows. In Sect. 2, we describe data, Comparative Mortality Figure, panel data models and statistical methodology to select the best model. Next, in Sect. 3, we present the main results, models are implemented in R and Matlab. Finally, Sect. 4 establishes the main conclusions.
2 Spatio-Temporal Methodology 2.1 Data The database used consists of the dynamic life tables of 26 European countries from the free-access website Human Mortality Database [4] for the period 1995–2012, an age range of 65–110+ and both sexes, male (m) and female (f). Also, we looked for explanatory variables of mortality which were analysable for all of the countries. Information on the four variables was obtained from the free-access website The World Bank Database [6]: growth rate of Gross Domestic Product (GDP), Health expenditure, CO2 and Education expenditure. The first step was to avoid multicollinearity between covariates using the Variance Inflation Factor (VIF). Multicollinearity affects the coefficient values of the predictor variables by inflating their importance; the VIF estimates how much the variance of a regression coefficient is inflated. The formula is quite simple,

\mathrm{VIF}_i = \frac{1}{1 - R_i^2},
where R_i^2 is the R-squared of the regression of X_i (predictor variable) on the other predictor variables in the model [2]. In this paper, the rule was not to use variables with a corresponding VIF > 2. Therefore, no variable has multicollinearity problems for males, but for females health expenditure (VIF = 2.58) produces multicollinearity.
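As an illustration of this screening step, the sketch below computes VIFs with statsmodels on a purely hypothetical covariate matrix; the random data, the 468 = 26 × 18 rows and the column names are placeholders, not the data used in the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(468, 4)),
                 columns=["GDP", "Health", "CO2", "Education"])  # hypothetical covariates
X_const = sm.add_constant(X)  # VIF is computed on the full design matrix
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i + 1) for i in range(X.shape[1])],
    index=X.columns)
print(vif)  # covariates whose VIF exceeds the chosen threshold (2 in this paper) would be dropped
```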
2.2 Comparative Mortality Figure (CMF) Death probabilities of ages 65–110+ from dynamic life tables are transformed into the Comparative Mortality Figure (CMF). The CMF’s are the ratio between the number of expected deaths in the standard population if they had the age mortality rates of the subpopulations studied and the actual number of deaths in the standard population over a period of time [5]:
\mathrm{CMF}_{i,t,s} = \frac{E_{i,t,s}}{O_{c,2012,m}}, \qquad i \in \{1, \ldots, N\}, \; t \in \{1, \ldots, T\} \; \text{and} \; s \in \{m, f\},
where E_{i,t,s} are the expected deaths for each country i, year t and sex s, and O_{c,2012,m} are the deaths observed in the set of European countries c (standard population) for the fixed year 2012 and gender male. The meaning of the CMF is straightforward to interpret:
• CMF > 1 indicates that more deaths were expected than observed: there is a "deaths excess".
• CMF < 1 indicates that more deaths were observed than expected: there is a "deaths deficit".
• CMF = 1 indicates that the number of expected deaths equals the number of deaths observed.
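The CMF itself is a simple ratio; a minimal sketch of ours with hypothetical death counts:

```python
import numpy as np

def cmf(expected_deaths, observed_standard):
    """Comparative Mortality Figure: expected deaths for a country/year/sex divided by
    the deaths observed in the standard population (European countries, males, 2012)."""
    return np.asarray(expected_deaths, dtype=float) / observed_standard

# Hypothetical figures: values above 1 signal a "deaths excess", below 1 a "deaths deficit".
print(cmf([95_000, 120_000], observed_standard=100_000))  # [0.95 1.2]
```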
2.3 Panel Data Panel data combine cross-section and time series data. Specifically, the data from this study are a panel data which combines: spatial dimension (26 countries) and time dimension (18 years).
2.4 Spatio-Temporal Panel Data Models A spatio-temporal model is a regression model that uses the temporal and spatial dimensions of the data to estimate the parameters of interest. Such models capture the unobserved heterogeneity produced by both the spatial and the time dimension. Controlling for this heterogeneity reduces multicollinearity problems among the variables and produces more efficient estimates of the parameters of panel data models [3] than classical models do. There are two main types of models, spatial or non-spatial, depending on whether or not the spatial lag term is included in the model:
132
P. Carracedo and A. Debón
1. Non-spatial models • Ordinary least Squared (OLS): yit = α + xit β + εit • Model with spatial fixed effects (MSFE) yit = α + xit β + μi + εit • Model with time fixed effects (MTFE) yit = α + xit β + νt + εit • Model with spatial and time fixed effects (MSTFE) yit = α + xit β + μi + νt + εit 2. Spatial models • Spatial Lag Model (SLM): yit = α + λ
N
W yjt + xit β + εit
j=1
• Spatial Lag Model with spatial and time fixed effects (SLMSTFE). yit = α + λ
N
W yjt + xit β + μi + νt + εit
j=1
• Spatial Lag Model with spatial random effects (SLMSRE). yit = α + λ
N
W yjt + xit β + φ + νt + εit
j=1
• Spatial Error Model with spatial and time fixed effects (SEMSTFE). yit = α + xit β + μi + νt + u; u = δ
N j=1
W u + εit
2.5 Steps to Select the Best Spatio-Temporal Panel Data Model In this section, we analyse the convenience of increasing the complexity of the model, testing the significance of each new term added. It is important to note that in Matlab these tests are already implemented, while in R they are not. Thus, we have resorted to the calculation of the Snedecor F statistic, whose formula is

F = \frac{(e_r'e_r - e'e)/q}{e'e/(n-k)} \sim F_{q,\,n-k},
where e_r'e_r is the residual sum of squares of the restricted model, e'e the residual sum of squares of the unrestricted model, q the number of restrictions and n − k the degrees of freedom. The steps used to select the best spatio-temporal panel data model are listed below, followed by a short computational sketch of this F statistic:
1. Test whether the spatial and time fixed effects are significant.
2. Study the inclusion of spatial dependence in the dependent variable or in the error term of the non-spatial models.
3. Selection of non-spatial models versus the spatial lag model with fixed effects.
4. Test whether the spatial and time fixed effects are significant in the spatial panel data models.
5. Selection of the spatial lag model with spatial, time, or spatial and time fixed effects.
6. Study whether the effects of the spatial model are to be considered fixed or random.
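The sketch below (with hypothetical residual sums of squares and dimensions) evaluates the F statistic defined above and the corresponding p-value with scipy.

```python
from scipy import stats

def f_test(rss_restricted, rss_unrestricted, q, n, k):
    """F = [(e_r'e_r - e'e)/q] / [e'e/(n - k)], compared with an F(q, n - k) distribution."""
    f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / (n - k))
    p_value = stats.f.sf(f_stat, q, n - k)
    return f_stat, p_value

# Hypothetical comparison of, e.g., pooled OLS (restricted) vs. a model with spatial fixed effects.
print(f_test(rss_restricted=12.4, rss_unrestricted=9.8, q=25, n=468, k=30))
```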
3 Results The results of the steps described in Sect. 2.5 are:
1. For both sexes, we have to include both effects, as they were significant.
2. For both sexes, it is more significant to consider the spatial dependence in the dependent variable than in the error term.
3. For both sexes, spatial lag models provide a higher R2 and a lower σ2 than non-spatial models. For this reason, we include the spatial lag of the dependent variable in the model.
4. For both sexes, the inclusion of spatial effects in the spatial lag model, as well as the spatial lag itself, is significant.
5. For both sexes, the spatial lag model in the dependent variable with spatial and time fixed effects (SLMSTFE) was selected. For males, the goodness-of-fit measures were R2 = 0.9883 and σ2 = 0.0006, and for females R2 = 0.9725 and σ2 = 0.0016.
6. For both sexes, the effects included in the model must be considered to be fixed.
The dependent variable is the log(CMF), and the significant explanatory variables are: GDP, Health expenditure, CO2 and Education expenditure for males and GDP, CO2 and Education expenditure for females.
4 Conclusions In this paper, we use two software packages, Matlab and R. The main advantage of R is that it is free and accessible to everyone, while Matlab is commercial software with a very high price. Nevertheless, millions of engineers and scientists have tested the spatial-temporal algorithms of Matlab; on the contrary, some R functions, such as sphtest and splm in the splm R package, have malfunctions. This study reproduces the code for the selection of spatial panel data models, written by Elhorst [1] in Matlab, for the free software R. According to our results, the spatial lag model with spatial and time fixed effects (SLMSTFE) is selected for males. It is a good model, with R2 and σ2 close to 1 and 0, respectively. Besides, the estimated λ means that the value of the CMF in one country increases by around 58.96% of the mean of the values in its surrounding countries, and four covariates (GDP rate, health expenditure, CO2 and education expenditure) cause variations in the log(CMF) of a country in a year. Regarding females, the same model is selected. It is also a good model but, compared to that for males, with a slightly worse fit. In this case, spatial dependence is lower than in the case of males, and even not significant at 5% (in the final fitting), and only two covariates (GDP and CO2) cause variations. Mortality in women is less complex to explain. In this paper, we consider that spatio-temporal models have a part to play in explaining differentials in mortality at ages after retirement in Europe. Therefore, we should use these results to guide actuarial calculations with life tables, taking into account their high spatial dependence on neighbouring countries. Acknowledgements This work was supported by a grant from the Mapfre Foundation (Ayudas a la investigación Ignacio H. de Larramendi 2017 Seguro y Previsión Social).
References
1. Elhorst, J.P.: Matlab software for spatial panels. Int. Reg. Sci. Rev. 37(3), 389–405 (2014)
2. Gareth, J., et al.: An Introduction to Statistical Learning. Springer, Berlin (2017)
3. Hsiao, C.: Analysis of Panel Data, 3rd edn. Cambridge University Press, Cambridge (2014)
4. Human Mortality Database: University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany) (2016). www.mortality.org or www.humanmortality.de (data downloaded on 12th July 2016)
5. Julious, S., Nicholl, J., George, S.: Why do we continue to use standardized mortality ratios for small area comparisons? J. Public Health 23(1), 40–46 (2001)
6. The World Bank Database: World Development Indicators (2018). http://data.worldbank.org/ (data downloaded on 6th March 2018)
Machine Learning in Nested Simulations Under Actuarial Uncertainty Gilberto Castellani, Ugo Fiore, Zelda Marino, Luca Passalacqua, Francesca Perla, Salvatore Scognamiglio, and Paolo Zanetti
Abstract The Solvency II directive states that, in order to be solvent, insurance undertakings must hold eligible own funds covering the Solvency Capital Requirement (SCR), which is defined as the Value-at-Risk of the NAV probability distribution (PDF in the directive) at a confidence level of 99.5% over a one-year period. The estimation of the SCR requires the evaluation of the NAV (under risk-neutral probabilities) conditional on the economic and actuarial scenarios estimated under real-world probabilities, and involves nested Monte Carlo simulations. This approach usually presents unacceptable computational costs. In this paper we analyse the performance of Machine Learning techniques on some insurance portfolios, considering a multivariate stochastic model for actuarial risks including mortality, lapse and expense risks. The experiments are aimed not only at analysing the performance of these techniques in a large-dimensional risk framework, but also at investigating the variability and robustness of the obtained estimates. Keywords Solvency capital requirement · Life insurance · Support vector machines · Deep neural networks · Actuarial risks G. Castellani · L. Passalacqua Department of Statistics, Sapienza University of Rome, Rome, Italy e-mail: [email protected] L. Passalacqua e-mail: [email protected] U. Fiore · Z. Marino · F. Perla · S. Scognamiglio (B) · P. Zanetti Department of Business and Quantitative Studies, University of Naples Parthenope, Naples, Italy e-mail: [email protected] U. Fiore e-mail: [email protected] Z. Marino e-mail: [email protected] F. Perla e-mail: [email protected] P. Zanetti e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_21
1 Introduction The European Union Directive 2009/138/EC—better known as "Solvency II" [4]—represents a revolution in the regulatory framework, since probability distribution functions officially enter the legal framework and the balance sheet of insurance companies. Insurance companies that adopt the internal model must calculate the Solvency Capital Requirement (SCR) by applying a Value-at-Risk at confidence level 99.5% to the Probability Distribution Forecast (PDF). Nested simulation is, at present, the prevalent approach for SCR estimation. However, this approach can present a prohibitive computational challenge, because reliable estimates may require a very large number of simulations [1]; different strategies have been proposed in the literature in order to reduce the computational burden due to "full" nested simulations. Among these, Least Squares Monte Carlo (LSMC) [10] seems to be the most promising approach and relevant applications can be found in [8, 11]. Only recently have data-driven alternatives based on machine learning been investigated, and interesting discussions are provided in [7, 9]. A regression technique based on Deep Learning Networks (DLN) was successfully applied to the valuation of profit-sharing life insurance portfolios in [5], and a more extensive investigation was presented in [2], in which a good performance of DLN and Support Vector Regression (SVR) compared against the LSMC methodology is shown. This paper investigates the performance of DLN and SVR in a different mathematical framework: a multivariate stochastic model for actuarial risks, including mortality, lapse and expense risks, is developed. The experiments are aimed not only at analysing the applicability of these models in a context of actuarial uncertainty, but also at investigating their variability and robustness in the 0.995-quantile estimation. Section 2 describes the stochastic framework used to model actuarial uncertainty. Section 3 offers a brief description of the methods used and finally Sect. 4 provides some empirical results.
2 Actuarial Modeling The stochastic framework developed in this work considers mortality, lapse and expense risk. Mortality is modelled with a multivariate extension of the log-normal Lee-Carter model. Let m_p(x, t) be the central rate of mortality of an individual of age x at time t, belonging to population p = 1, . . . , n_p; then:

\ln m_p(x,t) = a_p(x) + b_p(x)\, k_p(t) + \sum_{i=1}^{n_w} \alpha_i^{(p)} w_i(t), \qquad k_p(t) = k_p(t-1) + \delta_p + \sum_{i=1}^{n_z} \beta_i^{(p)} z_i(t), \qquad (1)

where w = (w_1, . . . , w_{n_w}) ∼ N(0, Σ_w) and z = (z_1, . . . , z_{n_z}) ∼ N(0, Σ_z) are two independent multivariate Brownian motions with diagonal covariance matrices, while α^{(p)} = (α_1^{(p)}, . . . , α_{n_w}^{(p)}) and β^{(p)} = (β_1^{(p)}, . . . , β_{n_z}^{(p)}) are the factor loadings of the p-th population, determining the structure of dependence between the central rates of mortality of different populations. As in the original Lee-Carter model, the coefficients k_p(t) are supposed to follow an ARIMA(0,1,0) process, i.e. a random walk with drift. Similarly, lapses are modelled with a logistic-normal model in which the lapse rate r_p(a, t) at time t for a policy with a years since inception, belonging to population p = 1, . . . , n_p, is given by the expression:

\mathrm{logit}\, r_p(a,t) = a_p(a) + \sum_{j=1}^{2} b_p^{(j)}(a)\, k_p^{(j)}(t) + \sum_{i=1}^{n_w} \tilde{\alpha}_i^{(p)} w_i(t), \qquad k_p^{(j)}(t) = k_p^{(j)}(t-1) + \delta_p^{(j)} + \sum_{i=1}^{n_z} \tilde{\beta}_i^{(p)} z_i(t), \quad j = 1, 2, \qquad (2)

where \tilde{\alpha}^{(p)} = (\tilde{\alpha}_1^{(p)}, . . . , \tilde{\alpha}_{n_w}^{(p)}) and \tilde{\beta}^{(p)} = (\tilde{\beta}_1^{(p)}, . . . , \tilde{\beta}_{n_z}^{(p)}) are the factor loadings for the lapse risk. Finally, expense risk is modelled assuming a log-normal distribution for the total expenses E_p of each population p = 1, . . . , n_p:

E_p \sim c_p\, z_e, \qquad z_e \sim \mathrm{logN}\!\left(-\frac{\sigma_e^2}{2},\, \sigma_e^2\right), \qquad (3)

where c_p is a coefficient depending on the population p.
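A minimal simulation sketch of the mortality block of Eq. (1) for a single population is given below. All parameter values are invented for illustration, and the common factors are treated as i.i.d. Gaussian period shocks, which is a simplification of the Brownian-motion formulation above.

```python
import numpy as np

def simulate_mortality(a, b, k0, delta, alpha, beta, T, rng):
    """Simulate central death rates m(x, t): log m = a(x) + b(x) k(t) + alpha'w(t),
    with k(t) a random walk with drift driven by the common factors z(t)."""
    n_ages = len(a)
    log_m = np.empty((T, n_ages))
    k = k0
    for t in range(T):
        w = rng.standard_normal(len(alpha))   # common mortality shocks (shared across populations)
        z = rng.standard_normal(len(beta))    # common shocks driving the period index
        k = k + delta + beta @ z              # ARIMA(0,1,0) with drift, as in the text
        log_m[t] = a + b * k + alpha @ w
    return np.exp(log_m)

rng = np.random.default_rng(42)
ages = np.arange(40, 100)
m = simulate_mortality(a=-10.0 + 0.09 * (ages - 40),      # illustrative age profile a(x)
                       b=np.full(ages.size, 0.02),        # illustrative age sensitivities b(x)
                       k0=0.0, delta=-0.5,
                       alpha=np.array([0.01, 0.02]), beta=np.array([0.3, 0.1]),
                       T=30, rng=rng)
print(m.shape)  # (30, 60) simulated central death rates
```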
3 Methods From the algorithmic point of view, the PDF at time t + 1 is obtained by defining a time grid from the present to the run-off time. The stochastic processes of the risk drivers are simulated over the time grid according to the real-world probability measure P up to t + 1 and then, conditional on the values in t + 1, to run-off, according to the risk-neutral probability measure Q. Notice, however, that the determination of prices as conditional expectations requires generating K risk-neutral trajectories for each of the N real-world trajectories (nested Monte Carlo simulation). Let r^{(i)} ∈ R^d be the i-th realisation of the risk drivers and y^{(i)} ∈ R^+ the risk-neutral evaluation of the liabilities in t + 1 conditional on r^{(i)}. Time-consuming inner simulations can be avoided by providing a pricing function such that: f^Q : R^d → R^+,
y = f Q (r).
One computationally efficient strategy could be: (a) to calibrate f^Q using a subset of realisations {r, y}^{(i)}, i = 1, . . . , N_1 (with N_1 ≪ N); (b) to evaluate ŷ = f^Q(r_i) in the remaining
(N − N_1) outer scenarios. The Least Squares Monte Carlo (LSMC) methodology approximates this function as a linear combination of a finite set of basis functions, generally orthogonal polynomials [10]. LSMC represents the state of the art in this class of problems; however, it suffers from the curse of dimensionality, since the number of coefficients strongly depends on the dimensionality of the problem and on the degree of the polynomial regression. One possible solution is the Support Vector Regressor (SVR), a supervised learning technique that maps the data into a high-dimensional feature space in which it performs a linear regression [12]. According to its ε-insensitive formulation, y^{(i)} can be estimated as: ŷ^{(i)}_{SVR} = f_{SVR}(r^{(i)}) = v_0 + v^T ϕ(r^{(i)}),
where ϕ(·) : R^d → R^s is the kernel function that maps the instances into an s-dimensional space (s > d), and v_0 ∈ R and v ∈ R^s are respectively the bias and the weights, calculated by solving a constrained optimisation problem [12]. An alternative approach could be the Deep Learning Network (DLN), which consists of a set of non-linear functions arranged in several layers, named input, hidden and output layers [6]. Let g ∈ N be the number of hidden layers and q ∈ N^g the number of units in each layer, with q_g = 1; the values of the liabilities y^{(i)} can be calculated as follows: ŷ^{(i)}_{DLN} = f_{DLN}(r^{(i)}) = φ_g(W_g^T z_{g−1} + b_g),
(4)
taking into account that z_{g−1} = φ_{g−1}(W_{g−1}^T z_{g−2} + b_{g−1}),
where W_k ∈ R^{q_k × q_{k−1}} and b_k ∈ R^{q_k} are respectively the weight matrices1 and bias vectors for k = 1, 2, . . . , g, calibrated using the backpropagation algorithm, and φ_1, φ_2, . . . , φ_g are the activation functions.
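To make the proxy-modelling idea concrete, the sketch below fits an SVR and a small feed-forward network with scikit-learn on synthetic (r, y) pairs. The data-generating function, the sample sizes and the network settings are illustrative stand-ins for the nested-simulation output, not the DISAR datasets used in the paper; the SVR hyperparameters echo the {γ, ε, C} grid of the tuning procedure described in Sect. 4.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
d, n_outer, n_train = 10, 50_000, 5_000          # illustrative sizes (the paper uses larger samples)
r = rng.normal(size=(n_outer, d))                # outer-scenario risk drivers
y = 1300 + 25 * r[:, 0] - 10 * r[:, 1] ** 2 + rng.normal(scale=5, size=n_outer)  # fake liabilities

# (a) calibrate the proxy pricing function f^Q on N1 << N scenarios ...
svr = make_pipeline(StandardScaler(),
                    SVR(kernel="rbf", gamma=2 ** -7, epsilon=2 ** -4, C=2 ** 3))
dln = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(32, 32, 32), activation="logistic",
                                 max_iter=500, random_state=0))
svr.fit(r[:n_train], y[:n_train])
dln.fit(r[:n_train], y[:n_train])

# (b) ... and evaluate it cheaply on the remaining outer scenarios
y_hat_dln = dln.predict(r[n_train:])
print(np.quantile(y_hat_dln, 0.995))             # proxy-based 99.5% quantile of the liabilities
```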
4 Numerical Results Considering a realistic insurance portfolio with a single-premium policy, some numerical experiments are provided.2 Mortality, lapse and expense risks were modelled as described in Sect. 2 and the number of risks sources was set d = 10. The first part of these experiments analyses, for each method, the out-of-sample accuracy in the approximation of the liabilities distribution. Two empirical distributions were The weight matrix in the first hidden layer has q1 × d elements, so W1T ∈ Rq1 ×d . All datasets used in this work are produced using DISAR® (Dynamic Investment Strategy with Accounting Rules), a commercial computational system that allows to carry out market-consistent valuation of complex cash flows using numerical techniques in a stochastic framework [1, 3], designed to be used in the Italian insurance sector.
1 The weight matrix in the first hidden layer has q_1 × d elements, so W_1^T ∈ R^{q_1 × d}.
2 All datasets used in this work are produced using DISAR® (Dynamic Investment Strategy with Accounting Rules), a commercial computational system that allows one to carry out market-consistent valuations of complex cash flows using numerical techniques in a stochastic framework [1, 3], designed to be used in the Italian insurance sector.
71.7008 70.3172 69.6160
10−4
0.3686 * 0.3611 * 10−4 0.3574 * 10−4
141
KS-distance 0.1088 * 10−1 0.5900 * 10−2 0.4610 * 10−2
simulated considering respectively (N1 = 10, 000, K = 400) and (N = 100, 000, N1 was employed to optimise the hyperparameK = 400). The first sample {r, y}(i) ters and train fˆaQ (·) for each a ∈ {L S MC, SV R, DL N }, the second one was used to measure the out-of-sample performance L(y (i) , fˆaQ (r(i) )). LSMC approximation was built using Laguerre polynomials with degree k = 2, while hyperparameters of SVR and DLN were optimised using a tuning procedure based on a leave-p-out validation with p = 1/5: • among the hyperparameters space H = {G × E × T }, with G = T = {2l , ∀ l ∈ Z : −7 ≤ l ≤ 7} and E = {2l , ∀ l ∈ Z : −4 ≤ l ≤ 0}, the best parameter vector for ˆ = {2−7 , 2−4 , 23 }. SVR seems to be hˆ = {γˆ , , ˆ C} • about DLN, several architectures were tested, varying the depth of the network g ∈ {2, 3, 4}, the number of hidden units qk = {16, 32, 64} and the activation function φ ∈ {linear, sigmoid, tanh}. Analysing the average Mean Square Error (MSE) in the validation set on 10 different runs, the best architecture was obtained using 3 layers with 32 units each and sigmoid activation function. Table 1 lists the performance in terms of MSE, Kullback–Leibler (KL) divergence and Kolmogorov–Smirnov (KS) distance for all three methods. It should be notice that LSMC and SVR were trained on 10, 000 instances while DLN training was carried out on 8, 000 instances since the remaining 2, 000 were employed as validation set to avoid overfitting. All three methods seem to be quite accurate however the approximation obtained using DLN seems to be the most accurate for all considered measures. The second set of experiments investigates the role of K in the 0.995-quantile estimation. Three empirical distributions were simulated setting the number of outer simulations N1 = 10, 000, 000 and varying K ∈ {10, 50, 100}. Each distribution was partitioned into 1, 000 segments with 10, 000 samples: the first segment was used for the calibration phase and in the remaining 999 splits were employed to estimate the 0.995-quantile. Figure 1 shows the empirical distributions of the quantile estimation obtained for Nested MC, LSMC, SVR and DLN varying the number of risk-neutral simulations. The reference values, indicated with the dot lines, represents the quantile estimation obtained using a large size full nested MC with N = 100.000 and K = 1.000. Nested MC estimator is biased since the empirical distribution of the quantile is shifted on left side of the reference value. This bias progressively decreases by increasing the number of inner simulations K . In different way, the quantile empirical distribution produced by LSMC, SVR and DLN quantile estimator seem to be better positioned
142
Fig. 1 Empirical distribution of 0.995 quantile of the FDB distribution for nested MC, LSMC, SVR and DLN
on the reference value. Numerical results show that DLN and SVR are promising alternatives to the LSMC approach: they loosen the curse of dimensionality and show a higher out-of-sample accuracy.
References 1. Casarano, G., Castellani, G., Passalacqua, L., Perla, F., Zanetti, P.: Relevant applications of Monte Carlo simulation in solvency II. Soft Comput. 21(5), 1181–1192 (2017) 2. Castellani, G., Fiore, U., Marino, Z., Passalacqua, L., Perla, F., Scognamiglio, S., Zanetti, P.: An investigation of machine learning approaches in the solvency ii valuation framework. SSRN 3303296 (2018) 3. Castellani, G., Passalacqua, L.: Applications of distributed and parallel computing in the solvency II framework: the DISAR system. In: Euro-Par Workshops, pp. 413–421. Springer (2010) 4. Directive 2009/138/EC of the European Parliament and of the Council of 25 November 2009 on the taking-up and pursuit of the business of insurance and reinsurance (2009) 5. Fiore, U., Marino, Z., Passalacqua, L., Perla, F., Scognamiglio, S., Zanetti, P.: Tuning a deep learning network for solvency II: preliminary results. In: Corazza, M., Durbán, M., Grané, A., Perna, C., Sibillo, M. (eds.) Mathematical and Statistical Methods for Actuarial Sciences and Finance. Springer, Berlin (2018) 6. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016) 7. Hejazi, S.A., Jackson, K.R.: Efficient valuation of SCR via a neural network approach. J. Comput. Appl. Math. 313, 427–439 (2017) 8. Krah, A.-S., Nikoli´c, Z., Korn, R.: A least-squares Monte Carlo framework in proxy modeling of life insurance companies. Risks 6(2), 62 (2018)
9. Krah, A.-S., Nikoli´c, Z., Korn, R.: Machine learning in least-squares Monte Carlo proxy modeling of life insurance companies. Risks 8(1), 21 (2020) 10. Longstaff, F.A., Schwartz, E.S.: Valuing American options by simulation: a simple least-squares approach. Rev. Financ. Stud. 14(1), 113–147 (2001) 11. Teuguia, O.N., Ren, J., Planchet, F.: Internal model in life insurance: application of least squares Monte Carlo in risk assessment (2014) 12. Vapnik, V.: The Nature of Statistical Learning Theory. Springer Science & Business Media, Berlin (2013)
Comparing RL Approaches for Applications to Financial Trading Systems Marco Corazza, Giovanni Fasano, Riccardo Gusso, and Raffaele Pesenti
Abstract In this paper we present and implement different Reinforcement Learning (RL) algorithms in financial trading systems. RL-based approaches aim to find an optimal policy, that is an optimal mapping between the variables describing an environment state and the actions available to an agent, by interacting with the environment itself in order to maximize a cumulative return. In particular, we compare the results obtained considering different on-policy (SARSA) and off-policy (Q-Learning, Greedy-GQ) RL algorithms applied to daily trading in the Italian stock market. We consider both computational issues and practical applications, in an effort to improve previous results while keeping a simple and understandable structure of the models used. Keywords Reinforcement learning · Financial trading systems · Sharpe and Calmar ratios
1 Introduction In this paper, we propose some automated Financial Trading Systems (FTSs) based on a self-adaptive machine learning approach known as Reinforcement Learning (RL). Specifically, we define our FTSs on the basis of the following RL methodologies: M. Corazza (B) · R. Gusso Department of Economics, Ca’ Foscari University of Venice, Sestiere Cannaregio 873, Venice, Italy e-mail: [email protected] R. Gusso e-mail: [email protected] G. Fasano · R. Pesenti Department of Management, Ca’ Foscari University of Venice, Sestiere Cannaregio 873, Venice, Italy e-mail: [email protected] R. Pesenti e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_22
State-Action-Reward-State-Action (SARSA) [1, 9] and Q-Learning (QL) [1, 10], with its development Greedy-GQ [8]. Then, we compare their effectiveness. The considered methodologies concern an agent interacting with an environment. The agent perceives the state of the environment and takes an action; then the environment provides a negative or a positive reward for the action. This iterative process allows the agent to heuristically identify a policy that maximizes a cumulative return over time. In our case, the agent is a FTS, the environment is a financial market and the reward is a measure of financial gain/loss. The FTS has to decide a trading strategy, i.e., when to sell or to buy an asset, or to stay out of the market. Note that the knowledge of a given FTS is not acquired in some preliminary in-sample training phase. Indeed, any action is taken by the considered FTS on the ground of the "experience" it gained up to that moment, through a trial-and-error mechanism based on the rewards it obtained as consequences of its past actions. The application of the above methodologies is justified by the assumption that the Adaptive Market Hypothesis (AMH) [7] holds. Under this perspective, a financial market can be viewed as an evolutionary environment in which different partly rational "species" (e.g., hedge funds, retail investors and others) interact among themselves in order to achieve a satisfactory, not necessarily optimal, level of profitability. The adaptations of these species to the various stimuli are neither instantaneous nor immediately appropriate, and this generally does not imply the efficiency of the financial market. Within this framework, a FTS agent can be seen as possibly able to learn the time-varying dynamics of the financial market, aiming at defining a profitable financial trading policy. Note that the SARSA, QL and Greedy-GQ methodologies are heuristics that cannot guarantee optimal solutions. On the other hand, they can be successfully applied when there is no a priori knowledge of the transition probability matrices of the state of a dynamic environment [6, p. 199], as in the case of the financial market. The remainder of the paper is organized as follows. In the next section, we describe the background of RL theory. In Sect. 3 we introduce our implementations of the FTSs and consider the problem of the description of the financial environment state. In Sect. 4 we analyze the results obtained by applying the developed FTSs to some stocks of the Italian FTSE Mib market.
2 RL Background RL applies to problems where the following elements can be identified: (i) the agent, which is a learning decision maker; (ii) the environment the agent interacts with, in subsequent time steps; (iii) a set of possible actions to choose among at each time step; (iv) a feedback signal, the reward, from the environment. Let us denote by S , A and R respectively the sets of all possible states of the environment, actions and rewards. At each time step t the agent reads a description of the environment current state St ∈ S and selects an action At ∈ A , among the possible ones at the current state. At the subsequent time step t + 1, the agent receives
Fig. 1 Interaction between agent and environment at time steps t and t + 1
both a reward R_{t+1} ∈ R and the description of the new environment state S_{t+1} (see Fig. 1). The next assumption holds. Assumption 2.1 The sets S, A and R have a finite number of distinct elements, with R ⊂ ℝ. Then, the random variables R_t, S_t have a discrete probability distribution conditioned only on the preceding state and action, i.e.

p(s', r \mid s, a) := P\big(S_{t+1} = s', R_{t+1} = r \mid S_t = s, A_t = a\big),
(1)
which expresses the so-called Markov property of the state. At each time t, the agent's objective is to maximize the future rewards. This task is generally achieved by adopting a cumulated discounted return with respect to a discount rate 0 ≤ γ ≤ 1, i.e.

G_t := \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}. \qquad (2)
To reach the above goal, at each time t the agent dynamically defines and updates a policy π(α|ξ), which determines the probability for the agent to choose an action α ∈ A(ξ), given a state ξ ∈ S, in order to maximize the expected value of (2), i.e. maximizing

q_{\pi}(s, a) := E_{\pi}\left[G_t \mid S_t = s, A_t = a\right]. \qquad (3)

Here the expected value E_π is meant to be computed given that the agent selects the policy π after choosing a ∈ A(s). An optimal policy π* such that q_{π*}(s, a) = max_π q_π(s, a) can theoretically be found by solving the following Bellman equation [2]:

q_{\pi^*}(s, a) = \sum_{s' \in S}\; \sum_{r \in R} p(s', r \mid s, a)\left[r + \gamma \max_{a' \in A(s')} q_{\pi^*}(s', a')\right]. \qquad (4)
In principle, Eq. (4) might be solved if the dynamic conditioned probabilities p(s', r|s, a) were known. However, even if this assumption holds, the computational burden often turns out to be too heavy for practical implementation. For the above reason, RL methods would rather determine sub-optimal policies, using information the agent obtains by direct interaction with the environment, with-
out assuming a complete knowledge of the probabilities p(s', r|s, a). Specifically, RL gets this knowledge from sample sequences of actual or simulated states, actions, and rewards. As an example, let Q(S_t, A_t) be the current estimate of q_{π*}(s, a) for the encountered state S_t and chosen action A_t, let R_t represent the computed reward at time t, and let β_t be a step-size parameter. Then SARSA uses the following update rule for Q(S_t, A_t):

Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \beta_t \left[R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)\right].
(5)
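As a toy illustration of update rule (5) — a tabular sketch of ours, whereas the paper works with the feature representation introduced in Sect. 3 — the following Python fragment performs SARSA updates with an ε-greedy behaviour policy on a placeholder environment.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, beta=0.1, gamma=0.95):
    """One SARSA step: Q(S,A) <- Q(S,A) + beta * (R + gamma * Q(S',A') - Q(S,A))."""
    Q[s, a] += beta * (r + gamma * Q[s_next, a_next] - Q[s, a])

def epsilon_greedy(Q, s, rng, eps=0.1):
    """Behaviour policy derived from the current action-value estimates."""
    return int(rng.integers(Q.shape[1])) if rng.uniform() < eps else int(np.argmax(Q[s]))

# Toy run: 3 actions (sell / stay out / buy encoded as 0, 1, 2) and 5 discrete states.
rng = np.random.default_rng(0)
Q = np.zeros((5, 3))
s = 0
a = epsilon_greedy(Q, s, rng)
for _ in range(100):
    s_next, r = int(rng.integers(5)), float(rng.normal())   # placeholder market transition/reward
    a_next = epsilon_greedy(Q, s_next, rng)
    sarsa_update(Q, s, a, r, s_next, a_next)
    s, a = s_next, a_next
```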
3 The FTSs In this section we apply the three methodologies listed in Sect. 1 to the development of automated FTSs operating on Italian FTSE stock market. The source of the data we c database [3], from which we collected daily close prices for used is the Bloomberg five major companies (Enel, Generali, Intesa, Tim, Unicredit) between January 2000 to October 2018. Our aim is to improve the results obtained in [4], while keeping a similar simple structure of both the state space representing the stock market and the trading actions available. Then we assume that at every time step t the trading system can invest all of its current budget at opening or keeping a short/long position on a single stock, or it can close it and stay out of the market. This is formalized by setting A (St ) = A = {−1, 0, 1} for each time t and each state St . Actions are chosen according to a policy derived from the current approximation of the qπ∗ (s, a) function for the selected methodology. As representation of environmental state, we generalize the approach used in [4] by considering features not only for a given number n of past logarithmic returns of the considered stock price, but also for the current performance of the trade in action. Formally, we first consider the vector y(St , At ) ∈ Rn+1 defined by Pt−n+i , for i = 1, . . . , n yi (St , At ) = φ ln Pt−(n+1)+i yn+1 (St , At ) = φ(P L t )
(6) (7)
where P L t = 0 if At−1 = 0, otherwise it is the logarithmic return of the current trade, and φ(x) is the same real-valued logistic function used in [4]. Then, for the actual feature vector x(St , At ) we adopt a block representation commonly used in RL algorithms [5]. That is, the vector y(St , At ) is copied to one of the three slots of a zero vector with |A | · (n + 1) = 3 · (n + 1) elements, according to the following rule:
Comparing RL Approaches for Applications to Financial Trading Systems
⎧ n+1 ⎪ 0n+1 , if At = −1 ⎨y(St , At ) 0 x(St , At ) = 0n+1 y(St , At ) 0n+1 , if At = 0 ⎪ ⎩ n+1 n+1 0 y(St , At ) , if At = 1 0
149
(8)
where 0n+1 is the null vector in Rn+1 . For the reward Rt+1 we considered two choices. The first one, as in [4] is Rt+1 =
μ(gl,t+1 ) σ (gl,t+1 )
(Sharpe Ratio)
(9)
where μ and σ are respectively the sample mean and standard deviation of the rewards calculated over the last l trading days. The second one is Rt+1 =
μ(gl,t+1 ) 1 + max D Dl,t+1
(Calmar Ratio)
(10)
where max D Dl,t+1 is the maximum drawdown, that is the difference between the maximum value of the equity gained by the trading system calculated over the last l trading days and the subsequent minimum value.
4 Results We considered transaction costs required for opening and closing each position, as a percentage rate of 0.15%. We did a first analysis of the performances of the obtained FTSs by running several replications for each FTS, to compare their performance with respect to the choice of the involved step-size parameters, i.e. βt and some others. More specifically, we analyzed the difference in the performance between setting them constant or decreasing over time according to the required conditions to ensure the convergence of the algorithms. Indeed, it is reasonable to assume that the rewards in the stock market do not derive from a stationary probability distribution. In this case it could be argued that possibly there is not a given optimal policy. Consequently, a methodology might perform exploratory actions and learn/correct its trading-policy. So, we first considered several possible values of the step-size parameters, keeping fixed the values for n = 5 and l = 5 and we performed N = 1000 replications for each combination of them and each algorithm with the two reward metrics (9)–(10). Then, we selected the values of the step-size parameters that produce on average the best final equity value, and using them we performed other N = 5000 replications for different values of n and l. Generally, for each stock the annual average return (AAR) obtained by the differently set FTSs is positive. The lowest AAR is for Tim (4.28%) and the highest one is for Unicredit (79.51%). In Table 1 we show the values of the AARs, of the maximal
150
M. Corazza et al.
Table 1 AAR, maximal drawdown (%) and Calmar Ratio for the best FTSs, and B&H AAR Stock Sharpe Calmar Buy & hold Return MaxDD Calmar Return MaxDD Calmar Return (%) (%) ratio (%) (%) ratio (%) Enel Generali Intesa Tim Unicredit
18.57 23.91 54.94 32.58 79.51
41,83 36.84 38.22 30.82 42.07
0.44 0.65 1.44 1.06 1.89
20.35 26.67 51.49 31.27 76.45
40.01 39.76 43.89 36.79 35.38
−2.15 −3.58 −3.27 −11.56 −15.43
0.51 0.67 1.17 0.85 2.16
Table 2 Ratio between AARs using constant step-size parameters and (convergence-driven) decreasing step-size parameters in (5) Unicredit Intesa Tim Sharpe
Calmar
QL SARSA Greedy-GQ QL SARSA Greedy-GQ
3.33 3.06 4.32 3.15 2.85 4.51
1.55 1.43 2.32 1.65 1.65 2.47
1.74 1.66 2.22 2.05 1.98 2.75
drawdown and of the effective Calmar ratio for the FTSs which achieved the best AAR, for each stock and for the two reward metrics. Moreover, for comparative purposes, we also show for each stock the AARs achieved by the simple investment strategy Buy & Hold (B&H). Note that in some cases FTSs which use the Calmar ratio show higher drawdown than FTSs using the Sharpe ratio. This suggests that in RL framework the classical financial measures of risk should be considered with care when used as reward metrics. Note also that for each stock the B&H AAR is negative. Furthermore, we compared the results obtained using the setting with constant step-size parameters, with the ones obtained by imposing convergence-driven decreasing values. The results are shown in Table 2 in terms of the ratio between AARs in the former setting and in the latter. We always get best results with the constant choice of the step-size parameters, which confirms the non-stationarity based hypothesis of the distribution of rewards. We have reported the result only for three of the considered stocks, since for the remaining two ones the average equity obtained with decreasing step-size parameters was lower then the initial capital.
Comparing RL Approaches for Applications to Financial Trading Systems
151
References 1. Barto, A.G., Sutton, R.S.: Reinforcement Learning: An Introduction. The MIT Press, Boston (2018) 2. Bellman, R.E.: Dynamic Programming. Princeton University Press, Princeton (1957) 3. Bloomberg Finance L.P.: https://www.bloomberg.com/professional/product/market-data/ 4. Corazza, M., Sangalli, A.: Q-learning and SARSA: a comparison between two intelligent stochastic control approaches for financial trading. Working Papers, Department of Economics, Ca’ Foscari University of Venice, 15 (2015) 5. Geramifard, A., Dann, C., How, J.P.: Off-policy learning combined with automatic feature expansion for solving large MDPs. In: Proceedings of the 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making, pp. 29–33. Princeton University Press, Princeton (2013) 6. Gosavi, A.: Simulation-Based Optimization. Parametric Optimization Techniques and Reinforcement Learning. Springer, Berlin (2015) 7. Lo, A.W.: Adaptive Markets. Financial Evolution at the Speed of Thought. Princeton University Press, Princeton (2017) 8. Maei, H.R., Szepesvári, C., Bhatnagar, S., Sutton, R.S.: Toward off-policy learning control with function approximation. In: International Conference on Machine Learning (ICML), pp. 719–726. Omnipress, Madison (2010) 9. Rummery, G.A., Niranjan, M.: On-line Q-Learning using connectionist systems. Technical Report CUED/F-INFENG/TR, 166, Engineering Department, Cambridge University (1994) 10. Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn. 8, 279–292 (1992)
MFG-Based Trading Model with Information Costs Marco Corazza, Rosario Maggistro, and Raffaele Pesenti
Abstract Mean Field Games (MFG) theory is a recent branch of Dynamic Games aiming at modelling and solving complex decision processes involving a large number of agents which are each other influencing. The MFG framework has been recently applied to the known optimal trading problem. In the original model (see Cardaliaguet and Lehalle [1]), the Authors consider an optimal trading model where a continuum of homogeneous investors make trades on one single financial instrument. Each participant acts strategically controlling her trading speed given the information she has concerning the behaviour of the others in order to fulfil her goal. This leads to a MFG equilibrium in which the mean field depends on the agents’ actions. In this paper, we present an MFG-based model in which the maximum intensity of the trading speed depends on the available information flow and is modelled as a piecewise linear properly saturated function. Keywords Mean field games · Crowding · Optimal trading · Linear-saturated control
M. Corazza Department of Economics, Ca’ Foscari University of Venice, Sestiere Cannaregio 873, Venezia, Italy e-mail: [email protected] R. Maggistro (B) Department of Economics, Business, Mathematics and Statistics, University of Trieste, Via dell’Università 1, Trieste, Italy e-mail: [email protected] R. Pesenti Department of Management, Ca’ Foscari University of Venice, Sestiere Cannaregio 873, Venezia, Italy e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_23
153
154
M. Corazza et al.
1 Introduction In this paper we present and solve a variant of the problem of optimal (financial) trading inside a Mean Field Game (MFG) proposed in [1]. In particular, the formulation of the original problem belongs to the class of the so-called Extended MFG (see Gomes et al. [3] for more details). Generally speaking, MFG theory studies optimal control problems with infinitely many agents interacting among them (see Lasry and Lions [4] for more details). In the original model, the MFG system describes a financial market arising from the interactions among homogeneous investors, the so-called crowd, which act taking the decision to buy/sell contracts of a given (financial) instrument. In particular, each investor has to buy/sell a number of instrument contracts in a given interval of time [0, T ], e.g., one day or one week or so on, and has to be suitably fast in executing her trading order for avoiding that the effectively traded price are far from her decisional one. At the same time, she has also to take care of the impact on the price movements of her trading. Overall, the instrument price is permanently influenced by the investors’ decisions and, in turn, the investors have to make decisions dealing with a price so affected. This is the day-to-day reality of the market participants. In this paper, the trading speed, which constitutes the control of the generic investor, is lowerly and upperly saturated. In particular, these saturations are monotonic bounded functions of the information. The intuitions underlying this specification are: the minimum and the maximum trading speeds considered by the generic investor depend on the information available to the investor herself; insufficient/excessive information have no influence on the bounds of the trading speed. The remainder of this paper is organized as follows. In the next section we introduce our MFG-based model and derive the MFG equations system. In Sect. 3 we introduced the saturated control and study its impact on the solution of the investigated optimal trading problem.
2 Trading Optimally Within the Crowd

In this section we present our MFG-based model and derive the MFG equations system. Similarly to what is done in [1], let us consider: a continuum of identical investors; an instrument to trade; a time interval [0, T] in which all investors have to buy/sell the given instrument; a quantity Qt of instrument contracts to buy, Qt < 0, or to sell, Qt > 0, by the generic investor; and a wealth Xt of the generic investor, with X0 = 0. Furthermore, let

dQt = ut dt,   dSt = αμt dt + σ dWt   and   dXt = [−ut(St + k1 ut) − k2 ξ] dt

be respectively the dynamics of Qt, of the asset price, and of Xt, where: ut is the trading speed of the generic investor through time; αμt measures the buying/selling
pressure, in which μt is the sum of the trading speeds of all the investors and α > 0; σ is the asset return volatility; Wt is a Wiener process; k1 ut and k2 ξ are respectively the variable and the constant parts of the cost affecting each trade, in which k1, k2 > 0 measure the temporary impact on the market of the investors' decisions, and ξ > 0 is the flow of information available to the generic investor.¹ Note that the control of the generic agent is given by ut and that her state is described by her inventory Qt and her wealth Xt. The value function of the generic investor is

Vt = sup_{u ∈ [um(ξ), uM(ξ)]} E[ XT − θT + QT(ST − A·QT) − φ ∫_t^T Qs² ds ],   (1)
where θT = ξT, while A and φ are risk aversion parameters. Note that this value function depends on, among other things, a running cost quadratic in the inventory Qt. The Hamilton-Jacobi-Bellman (HJB) equation associated with (1) is

∂t V + αμ ∂S V + (1/2)σ² ∂²S V − φq² + sup_u { u ∂q V − [u(s + k1 u) + k2 ξ] ∂X V − λ+(uM(ξ) − u) − λ−(um(ξ) − u) } = 0,   (2)

with terminal condition V(T, x, s, q, θ; μ) = x − θ + q(s − Aq), where λ+, λ− : [0, T] → R are the Lagrange multipliers used to include in (2) the constraints on the control. These multipliers must satisfy the following complementary slackness conditions

λ+(uM(ξ) − u) = λ−(um(ξ) − u) = 0   and   λ+, λ− ≥ 0,

whose role will become clear throughout the paper. At this point, applying the same approach used in [2], we consider the following ersatz V = x − θ + qs + v(t, q; μ), and we obtain that the HJB equation on v is

∂t v + αμq − φq² − k2 ξ + sup_u { u ∂q v − k1 u² − λ+(uM(ξ) − u) − λ−(um(ξ) − u) } = 0,

with terminal condition v(T, q; μ) = −Aq². The associated optimal feedback control is

u*(t, q, λ+, λ−) = (1/(2k1)) [∂q v(t, q) + λ+(t) + λ−(t)].   (3)
¹ The constant part of the cost is introduced for the first time in this paper.
Condition (3) implies that the important quantity for each investor is her current inventory Qt and that the mean field in this framework is the distribution m(t, dq) of the inventories of the investors. Then the trading flow μ at time t is

μt = ∫_q u*(t, q, λ+, λ−) m(t, dq),

and the evolution of the density m(t, dq) is given by ∂t m + ∂q(m u*(t, q, λ+, λ−)) = 0. Therefore, the MFG equations system, made of the backward HJB equation on v coupled with the forward transport equation of m, is

∂t v + αμq − φq² − k2 ξ + u* ∂q v − k1(u*)² − λ+(uM(ξ) − u*) − λ−(um(ξ) − u*) = 0,
∂t m + ∂q(m u*) = 0,
μt = ∫_q u*(t, q, λ+, λ−) m(t, dq),
m(0, dq) = m0(dq),   v(T, q; μ) = −Aq².   (4)
3 Trade Crowding with Saturated Control

In this section we specify the functional form of the optimal control u* and investigate its impact on the solution of the considered optimal trading problem. In particular, the constraint on the control, um ≤ u ≤ uM, and the corresponding slackness conditions justify the fact that (3) may be a linear time-variant piecewise saturated control function, i.e.,

u*(t, q, λ+, λ−) = sat_[um(ξ), uM(ξ)]( (h1(t) − q h2(t)) / (2k1) ),   (5)

where sat_[a,b] : R → R is defined as

sat_[a,b](x) = a if x ≤ a;   x if a < x < b;   b if x ≥ b;

q indicates the number of instrument contracts bought or sold per unit of time; h1(t) and h2(t) are the time-varying coefficients of the linear argument of the function sat_[·,·](·); um and uM are, respectively, the control lower and upper bounds depending on the information flow ξ available to the generic investor. The optimal control (5)
takes value um, respectively uM, when λ− = 0, respectively λ+ = 0, and takes value (h1(t) − q h2(t))/(2k1) when λ− = λ+ = 0. The ideas underlying this specification of u* are: the minimum and the maximum trading speeds that the generic investor considers depend on the information available to the investor herself; information that is insufficient/excessive has no influence on the bounds of the trading speed. At this point, substituting (5) in (3) and integrating with respect to q, we get the following expression of the value function:

v(t, q) = h0(t) + h1(t)q − (q²/2) h2(t)   if λ+(t) = λ−(t) = 0;
v(t, q) = 2k1 q uM(ξ) − ∫_q λ+(t) dq   if λ−(t) = 0;
v(t, q) = 2k1 q um(ξ) − ∫_q λ−(t) dq   if λ+(t) = 0.   (6)
Once the conditions for the continuity of (6) are identified, we can obtain the expressions of the Lagrange multipliers, i.e., λ+(t) = λ−(t) = −q h2(t). So, we can reformulate (6) as follows:

v(t, q) = h0(t) + h1(t)q − (q²/2) h2(t)   if λ+(t) = λ−(t) = 0;
v(t, q) = 2k1 q uM(ξ) + (q²/2) h2(t) + hM(t)   if λ−(t) = 0;
v(t, q) = 2k1 q um(ξ) + (q²/2) h2(t) + hm(t)   if λ+(t) = 0.   (7)
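As an illustration of how the saturation acts on the linear argument of (5), the following Python sketch (not taken from the paper; the numerical values of k1, um, uM, h1(t) and h2(t) are purely illustrative assumptions) evaluates the control for a few inventory levels q.

```python
# Minimal sketch of the saturated control in (5).
# All numerical values below are illustrative assumptions.

def sat(x, a, b):
    """Saturation function sat_[a,b](x): returns a, x or b."""
    if x <= a:
        return a
    if x >= b:
        return b
    return x

def optimal_control(q, h1, h2, k1, u_m, u_M):
    """u*(t, q) = sat_[u_m, u_M]((h1 - q*h2) / (2*k1)) at a fixed time t."""
    return sat((h1 - q * h2) / (2.0 * k1), u_m, u_M)

if __name__ == "__main__":
    k1, u_m, u_M = 0.5, -2.0, 2.0      # assumed cost parameter and speed bounds
    h1, h2 = 0.3, 1.2                  # assumed values of h1(t), h2(t) at a fixed t
    for q in (-3.0, 0.0, 3.0):         # three inventory levels
        print(q, optimal_control(q, h1, h2, k1, u_m, u_M))
```

Depending on the inventory level, the returned value corresponds to one of the three branches appearing in (6)–(7).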
In the remainder of this section we solve (4) using (7) and (5). To this end, it is convenient to set E(t) := E(Qt) = ∫_q q m(t, dq), from which, using the transport equation and an integration by parts, we get

E′(t) = ∫_q q ∂t m(t, q) dq = −∫_q q ∂q( m(t, q) u*(t, q, λ+, λ−) ) dq = ∫_q u*(t, q, λ+, λ−) m(t, dq).   (8)
At this point, we derive the equations system (4) for each functional representation of v obtained in the various branches of (7) and solve the corresponding optimal trading problem. As for the representation of v in the first branch of (7), which corresponds to u* ∈ (um(ξ), uM(ξ)), it is possible to prove, similarly to what is done in [1], that the equations system (4) reduces to the following single differential equation satisfied by E:

2k1 E″(t) + α E′(t) − 2φ E(t) = 0   for t ∈ (0, T),
E(0) = E0,   k1 E′(T) + A E(T) = 0.   (9)
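For completeness, a minimal numerical sketch of how (9) can be handled is reported below. It is an illustration under assumed parameter values, not code from the paper: it exploits the fact that (9) is a linear second-order ODE with constant coefficients, whose characteristic equation is 2k1 r² + αr − 2φ = 0.

```python
# Sketch: closed-form solution of (9) via the characteristic roots.
# 2*k1*E'' + alpha*E' - 2*phi*E = 0,  E(0) = E0,  k1*E'(T) + A*E(T) = 0.
# Parameter values are illustrative assumptions.
import numpy as np

def solve_mean_inventory(k1, alpha, phi, A, E0, T):
    disc = np.sqrt(alpha**2 + 16.0 * k1 * phi)
    r_plus, r_minus = (-alpha + disc) / (4.0 * k1), (-alpha - disc) / (4.0 * k1)
    # Boundary conditions give a 2x2 linear system in the constants c1, c2:
    #   c1 + c2 = E0
    #   (k1*r_plus + A)*exp(r_plus*T)*c1 + (k1*r_minus + A)*exp(r_minus*T)*c2 = 0
    M = np.array([[1.0, 1.0],
                  [(k1 * r_plus + A) * np.exp(r_plus * T),
                   (k1 * r_minus + A) * np.exp(r_minus * T)]])
    c1, c2 = np.linalg.solve(M, np.array([E0, 0.0]))
    return lambda t: c1 * np.exp(r_plus * t) + c2 * np.exp(r_minus * t)

if __name__ == "__main__":
    E = solve_mean_inventory(k1=0.5, alpha=0.1, phi=0.05, A=1.0, E0=10.0, T=1.0)
    print([round(E(t), 4) for t in np.linspace(0.0, 1.0, 5)])
```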
In particular, it is also possible to prove that (9) admits a unique solution for every α > 0, again similarly to what is done in [1, Proposition 3.1]. Now, regarding the representation of v in the second branch of (7), hence when u* = uM(ξ), the backward equation of the equations system (4) can be split into the following three parts:

h2′(t)/2 − φ = 0,   αμ + uM(ξ) h2(t) = 0,   hM′(t) + k1 uM²(ξ) − k2 ξ = 0,   (10)
with terminal condition v(T, q) = −Aq², from which we obtain hM(T) = −2k1 q uM(ξ), h2(T) = −2A. Then, recalling the formulations of the trading flow (2) and of E′(t) (8), we get

μ(t) = ∫_q uM dm(q) = uM = E′(t).   (11)
Accordingly, we can supplement (10) with (11), and collecting all the involved equations we find

φ = 0,   h2(t) = −α,   hM′(t) + k1 uM² − k2 ξ = 0,   E′(t) = uM,   (12)

with boundary conditions hM(T) = −2k1 q uM(ξ), h2(T) = −2A = −α and E(0) = E0.² Note that this equations system has more equations than unknowns and that this is consistent with the expression of v in the second branch of (7), in which the coefficient of the first-order term is constant. Note also that the equations in (12) are decoupled since, once the optimal control u* = uM is fixed, the solution E(t) can be obtained by simply integrating the transport equation, i.e., the last equation of (12).

In conclusion, MFG theory looks like a powerful tool for modelling and solving complex optimal trading problems in non-traditional financial markets. Inside this framework, our future research steps will be: to calibrate our model to real financial market data to check its realism; and to extend our model to take into account two different kinds of investors, namely, the usual continuum of homogeneous ones and a new large trader.
² Similar arguments hold for the representation of v in the last branch of (7), hence when u* = um(ξ).
References

1. Cardaliaguet, P., Lehalle, C.A.: Mean field game of controls and an application to trade crowding. Math. Finan. Econ. 12, 335–363 (2018)
2. Cartea, Á., Jaimungal, S.: Incorporating order-flow into optimal execution. Math. Finan. Econ. 10, 339–364 (2016)
3. Gomes, D.A., Patrizi, S., Voskanyan, V.: On the existence of classical solutions for stationary extended mean field games. Nonlinear Anal. Theory Methods Appl. 99, 49–79 (2014)
4. Lasry, J.-M., Lions, P.-L.: Mean field games. Jpn. J. Math. 2(1), 229–260 (2007)
Trading System Mixed-Integer Optimization by PSO Marco Corazza, Francesca Parpinel, and Claudio Pizzi
Abstract This work concerns the optimization of a Trading System (TS) based on a small set of Technical Analysis (TA) indicators. Usually, in TA the values of the parameters (window lengths and thresholds) of these indicators are fixed by professional experience. Here, we propose to design the parametric configuration according to historical data, optimizing some performance measures subject to proper constraints using a Particle Swarm Optimization-based metaheuristic. In particular, such an optimization procedure is applied to obtain both the optimal parameter values and the optimal weighting of the trading signals from the considered TA indicators, in order to provide an optimal trading decision. The use of a metaheuristic is necessary since the involved optimization problem is strongly nonlinear, nondifferentiable and mixed-integer. The proposed TS is optimized using the daily adjusted closing returns of seven Italian stocks coming from different industries and of two stock market indices.

Keywords Trading Systems · Technical Analysis · Mixed-Integer Optimization · Particle Swarm Optimization
M. Corazza · F. Parpinel (B) · C. Pizzi
Department of Economics, Ca' Foscari University of Venice, Sestiere Cannaregio 873, 30121 Venezia, Italy
e-mail: [email protected]
M. Corazza
e-mail: [email protected]
C. Pizzi
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_24

1 Introduction

In this work we present a simple Trading System (TS) from the perspective of Technical Analysis (TA), following [2] and using the information from financial asset price
series. The signals to buy, or hold, or sell are typically generated by several different TA indicators proposed in the literature, each of them representing one or more aspects of the behaviour of the TS. These indicators usually depend on some parameters (window lengths and thresholds) whose values are fixed in advance according to some professional common-sense rules, substantially regardless of the specific features of the analyzed financial asset. Here, we propose to estimate such parameters through an evolutionary optimization approach, in particular Particle Swarm Optimization (PSO), which searches for the global optimum of a fitness function by mimicking the behaviour of a group of social animals. The use of a metaheuristic is necessary since the involved optimization problem is strongly nonlinear, nondifferentiable and mixed-integer.

The simple TS we consider in this paper is constituted by the following TA indicators: the Relative Strength Index (RSI), the Moving Average Convergence/Divergence (MACD), the Bollinger Bands (BB) and the Know Sure Thing (KST). All of them are often used in financial professional practice. In reference to this TS, and to maximize the capital at the end of the trading period, the global constrained optimization problem consists in determining the optimal parameters characterizing the above four indicators and in determining their optimal linear combination for producing a single operational trading decision.

The remainder of this paper is organized as follows. The next Section briefly presents the considered TS. Sections 3 and 4 introduce, respectively, the global constrained optimization problem to deal with and the PSO-based metaheuristic to solve it. Finally, Sect. 5 presents the results coming from the application of our approach to the daily adjusted closing prices of seven Italian stocks and of two stock market indices. As benchmarks, we consider both the same TS with standard parametrization and the Buy & Hold (B&H) strategy.
2 The TS in Short

As stated in the previous Section, we consider a TS based on the following TA indicators: RSI, MACD, BB and KST. The first three indicators are well known and widely used both in the academic community and in the professional one. They are described in detail in [3] and also used in [2], of which this work is a continuation. For these reasons, we omit to recall them here. Differently, the fourth indicator is a bit less known than the other ones, so we spend some words to present it. This indicator is based on the following four different Rates Of Change (ROC) (see Pring [4]): ROCj = 100 · (P(t)/P(t − tKST,j) − 1), with j = 1, ..., 4, where P(t) is the current closing price and P(t − tKST,j) is the tKST,j-bars-ago closing price. KST is given by the following linear combination of moving averages of the above four ROCs: KST = Σ_{j=1}^{4} wKST,j · MA(ROCj, Gj), where wKST,j is an appropriate weight and MA(ROCj, Gj) indicates the moving average of ROCj computed using a Gj rolling window size. In [4], the following values for the parameters are
suggested: tKST,1 = 10, tKST,2 = 15, tKST,3 = 20, tKST,4 = 30, G1 = 10, G2 = 10, G3 = 10, G4 = 15, wKST,1 = 1, wKST,2 = 2, wKST,3 = 3 and wKST,4 = 4.

Each of the four indicators provides a trading signal. In particular, the latter may be: "−1", namely "Sell or stay short in the market"; "0", namely "Stay out from the market"; "+1", namely "Buy or stay long in the market". In order to produce a single operational trading rule, we propose to aggregate the above trading signals as follows:

S(t) = −1 if Σi ci · signali(t) ∈ [−1, −1/3);
S(t) = 0 if Σi ci · signali(t) ∈ [−1/3, 1/3];
S(t) = +1 if Σi ci · signali(t) ∈ (1/3, 1];   (1)

where S(t) is the operational trading rule at time t, and signali(t) and ci are, respectively, the trading signal at time t and the weight related to the i-th indicator, with i ∈ {RSI, MACD, BB, KST}. Notice that, hereinafter, with the purpose of identifying the parameters related to a given indicator, the acronym of the indicator itself will be written in its subscript.
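A minimal sketch of the KST computation and of the aggregation rule (1) is reported below. It is an illustration, not the authors' code: the use of pandas, the simple rolling mean as MA(·,·) and the equal weights in the usage example are assumptions.

```python
# Sketch of the KST indicator and of the signal aggregation rule (1).
# Moving averages are taken as simple rolling means; this is an assumption.
import numpy as np
import pandas as pd

def kst(price, t_kst=(10, 15, 20, 30), g=(10, 10, 10, 15), w=(1, 2, 3, 4)):
    """KST = sum_j w_j * MA(ROC_j, G_j), with ROC_j = 100*(P(t)/P(t - t_j) - 1)."""
    terms = []
    for t_j, g_j, w_j in zip(t_kst, g, w):
        roc_j = 100.0 * (price / price.shift(t_j) - 1.0)
        terms.append(w_j * roc_j.rolling(g_j).mean())
    return sum(terms)

def aggregate_signals(signals, weights):
    """Aggregation rule (1): map the weighted sum of {-1,0,+1} signals to a decision."""
    s = sum(c * sig for c, sig in zip(weights, signals))
    if s < -1.0 / 3.0:
        return -1           # sell or stay short in the market
    if s > 1.0 / 3.0:
        return 1            # buy or stay long in the market
    return 0                # stay out from the market

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = pd.Series(100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal(300))))
    print(kst(p).dropna().tail(3))
    print(aggregate_signals([1, -1, 0, 1], [0.25, 0.25, 0.25, 0.25]))
```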
3 The Global Constrained Optimization Problem

The task of our global optimization problem consists in maximizing the net capital at the end of the trading period, C(T), with respect to the set of the parameters of the four TA indicators, X, subject to a set of proper constraints on the same parameters. Notice that the term "net" means that the transaction costs are explicitly taken into account. To specify C(T), we denote the transaction costs expressed in percentages with δ and define the net rate of return obtained by the TS from t − 1 to t as:
e(t) = S(t − 1) ln( P(t)/P(t − 1) ) − δ |S(t) − S(t − 1)|,   t = 2, ..., T.
Then, we express the equity line produced by our TS as: C(t) = C(t − 1)[1 + e(t)], t = 1, ..., T, in which C(0) is the starting capital and T is the end of the trading period. Finally, we report the composition of the set of the decision variables: X = {wRSI, bRSI,l, bRSI,u, wMACD, wMACD,f, wMACD,s, wBB, bBB,l, bBB,u, tKST,1, tKST,2, tKST,3, tKST,4, wKST,1, wKST,2, wKST,3, wKST,4, cRSI, cMACD, cBB, cKST}, where w indicates window lengths, b and t
indicate parameters associated to thresholds and c indicates the weights used in the aggregation of the trading signals coming from the four TA indicators.¹ At this point, we are able to formalize the global constrained optimization problem as

max_X C(T)
s.t. bRSI,l ≤ bRSI,u,
     wMACD,f ≤ wMACD,s,
     tKST,1 ≤ tKST,2 ≤ tKST,3 ≤ tKST,4,
     wKST,1 ≤ wKST,2 ≤ wKST,3 ≤ wKST,4,
     wRSI, wMACD, wMACD,f, wMACD,s, wBB, tKST,1, tKST,2, tKST,3, tKST,4 ∈ N+.
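To make the objective explicit, the following sketch (an illustration, not the paper's code; the synthetic price path and the random signals are assumptions) computes the net returns e(t) and the terminal capital C(T) for a given signal path.

```python
# Sketch of the net return e(t) and of the equity line C(t), whose terminal
# value C(T) is the objective of the constrained optimization problem.
import numpy as np

def equity_line(prices, signals, delta=0.0015, c0=1.0):
    """C(t) = C(t-1)*(1 + e(t)) with
       e(t) = S(t-1)*ln(P(t)/P(t-1)) - delta*|S(t) - S(t-1)|."""
    capital = [c0]
    for t in range(1, len(prices)):
        e_t = (signals[t - 1] * np.log(prices[t] / prices[t - 1])
               - delta * abs(signals[t] - signals[t - 1]))
        capital.append(capital[-1] * (1.0 + e_t))
    return np.array(capital)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    p = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal(250)))
    s = rng.choice([-1, 0, 1], size=250)          # arbitrary trading signals
    print(equity_line(p, s)[-1])                   # net terminal capital C(T)
```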
4 The PSO-Based Metaheuristic

PSO is an iterative, bio-inspired, population-based metaheuristic for the solution of global unconstrained optimization problems, whereas our optimization problem is a global constrained one. For this reason, we first briefly introduce the basic PSO and then the implementation adopted in order to take into account the presence of constraints. The basic idea of PSO is to replicate the social behaviour of shoals of fish or of flocks of birds cooperating in the pursuit of a given goal. To this purpose, each member (a particle) of the shoal/flock (the swarm) explores the search area recording its best position reached so far, and it exchanges this information with the neighbors in the swarm. Thus, the whole swarm tends to converge towards the best global position reached by the particles. For dealing with the presence of constraints, different strategies have been proposed in the literature to ensure that feasible positions are generated at any iteration of PSO. However, in this paper we use PSO according to its original intent, that is, as a tool for the solution of unconstrained optimization problems. To this purpose, we reformulate our constrained problem into an unconstrained one using a nondifferentiable penalty function method already applied in the financial context (see Corazza et al. [1]). Such an approach is known as an exact penalty method, where the term "exact" refers to the correspondence between the minimizers of the original constrained problem and the minimizers of the unconstrained (penalized) one. Notice that the choice of PSO as the core of the optimization solver is mainly due to its simplicity of implementation and its ability to quickly converge to a reasonably good solution (see Ratnaweera and Halgamuge [5]).
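The following minimal sketch illustrates the idea of coupling a basic PSO with a penalty reformulation. It is a simplified illustration, not the solver used in the paper: the inertia and acceleration coefficients, the penalty parameter eps and the toy problem are assumptions, and the mixed-integer nature of the actual problem is not handled here.

```python
# Minimal PSO on a penalized objective: maximize f(x) subject to g_i(x) <= 0,
# handled as minimize -f(x) + (1/eps) * sum(max(g_i(x), 0)).
import numpy as np

def pso_penalty(f, constraints, lb, ub, eps=1e-3, n_particles=50, n_iter=75, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(lb)
    x = rng.uniform(lb, ub, size=(n_particles, dim))
    v = np.zeros_like(x)

    def penalized(z):
        return -f(z) + sum(max(g(z), 0.0) for g in constraints) / eps

    pbest, pbest_val = x.copy(), np.array([penalized(z) for z in x])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lb, ub)
        vals = np.array([penalized(z) for z in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

if __name__ == "__main__":
    # Toy problem: maximize -(x-2)^2 subject to x <= 1.5 (optimum at x = 1.5).
    best = pso_penalty(lambda z: -(z[0] - 2.0) ** 2,
                       [lambda z: z[0] - 1.5],
                       lb=np.array([0.0]), ub=np.array([3.0]))
    print(best)
```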
5 Applications and Some Conclusions

To assess the performance of our approach, we compare the in-sample results coming from the TS with the optimized window lengths, thresholds and weights with the

¹ Notice that e(·) is a function of S(·), that S(·) is a function of signali(·), and that signali(·) is a function of the i-th indicator.
in-sample results coming from a TS with standard setting. In particular, we apply both TSs to the time series of important stocks and stock market indices in the period from January 4, 2010 to December 23, 2019. Regarding the Italian stocks, we consider the following ones, coming from different industries: Assicurazioni Generali S.p.A. (AG); Buzzi Unicem S.p.A. (BU); ENEL S.p.A. (EE); ENI S.p.A. (EI); Intesa SanPaolo S.p.A. (IS); STMicroelectronics S.p.A. (ST) and Telecom Italia S.p.A. (TI). Regarding the stock market indices, we consider the Italian FTSE MIB (FM) and the German DAX (DX).

As far as the TS with standard setting is concerned, according to the literature, we use the following values for the parameters: wRSI = 26, bRSI,l = 30, bRSI,u = 70, wMACD = 9, wMACD,f = 12, wMACD,s = 26, wBB = 26, bBB,l = 2 and bBB,u = 2; the values of the parameters from tKST,1 to wKST,4 are reported in Sect. 2. Regarding the aggregation rule of the trading signals at time t coming from the four TA indicators, again according to the literature, we use formula (1) in which c1, c2, c3 and c4 are all set to 1. As concerns the PSO-based metaheuristic, we use the following setting: the number of particles is 50 and the number of iterations is 75. Furthermore, as this solver is stochastic due to, for instance, the random initialization of both the position and the velocity of the particles, we run our approach 100 times for each stock, in order to compute some statistics and evaluate their distribution. Finally, in all the applications we set δ to the realistic value of 0.15%.

In Table 1 we present the in-sample performances achieved by the two TSs. In particular, columns 2 and 3 respectively report the annualized rate of return performed by the TS with standard setting (r) and by the B&H strategy (rB&H); columns 4 and 5 respectively provide the average annualized rate of return performed by the TS with optimized parameter values (r̄) and the associated standard deviation (sr) computed over the 100 runs of our approach; column 6 shows the approximated 95% confidence interval calculated using r̄ and sr ([·, ·]95%)²; columns 7 and 8 respectively present the minimum (rmin) and the maximum (rmax) of the annualized rate of return of the TS with optimized parameter values over the 100 runs of our approach.

Furthermore, in Fig. 1 we present the in-sample violin plots related to the TS with optimized parameter values. We briefly recall that violin plots are an improvement over box-plots, showing the kernel density estimate in a mirrored way instead of quartiles. In the Figure we note that the two stock market indices show lower dispersion and location compared to the other stocks. In Fig. 1, the brown straight line, the red dashed one and the blue dotted one connect the means of the returns obtained with, respectively, the proposed technique, the B&H strategy and the standard setting; comparing the lines, we see that in all the cases the proposed procedure always performs better on average than the other ones: this is evident also if we look at the whole range of simulation results.

We highlight that all the average annualized rates of return obtained by the TS with optimized parameter values are greater than the annualized rates of return achieved

² Notice that we performed our procedure 100 times, so we could apply the central limit theorem, according to which the sample mean is asymptotically normally distributed.
Table 1 In-sample performances achieved by the two TSs and by the B&H strategy

Time series | r (%)  | rB&H (%) | r̄ (%) | sr (%) | [·, ·]95% (%) | rmin (%) | rmax (%)
AG          | −6.30  | 4.91     | 22.48  | 7.96   | [6.87, 38.09] | 0.71     | 41.05
BU          | −5.33  | 10.06    | 21.75  | 8.00   | [6.07, 37.43] | 5.85     | 52.95
EE          | 2.45   | 11.35    | 16.50  | 5.75   | [5.24, 27.76] | 4.32     | 34.01
EI          | −10.15 | 4.11     | 22.51  | 7.48   | [7.85, 37.16] | 0.50     | 37.65
IS          | −16.76 | 4.25     | 28.21  | 10.68  | [7.27, 49.15] | −4.22    | 53.94
ST          | −8.42  | 18.46    | 31.48  | 13.71  | [4.60, 58.35] | 10.64    | 74.11
TI          | −5.88  | −4.11    | 25.38  | 10.44  | [4.92, 45.83] | 0.65     | 44.29
DX          | −2.01  | 8.14     | 13.02  | 5.15   | [2.93, 23.12] | −3.40    | 24.35
FM          | 1.98   | 0.40     | 13.94  | 4.95   | [4.24, 23.65] | −0.73    | 25.29
Fig. 1 In-sample violin plots related to the TS with optimized parameter values
both by the TS with standard setting and by the B&H strategy. This indicates that, also for a simple TA-based TS, parameter optimization plays an important role. Then, no annualized rate of return achieved by the TS with standard setting belongs to the approximated 95% confidence interval. This indicates that, for all the investigated time series, r̄ is statistically different from r at the 5% significance level. Moreover, note also that all rs are lower than the corresponding rmin, which means that the worst result obtained with optimized parameters is generally better than that obtained with the standard setting.

In conclusion, we notice that the first results of this work appear interesting. Of course, they will have to be validated by extending the set of stocks and stock market indices to investigate and, mainly, by performing accurate out-of-sample analyses. Both are targets of our future research.
References

1. Corazza, M., Fasano, G., Gusso, R.: Particle Swarm Optimization with non-smooth penalty reformulation, for a complex portfolio selection problem. Appl. Math. Comput. 224, 611–624 (2013)
2. Corazza, M., Parpinel, F., Pizzi, C.: An evolutionary approach to improve a simple trading system. In: Mathematical and Statistical Methods for Actuarial Sciences and Finance. MAF 2016, pp. 83–95. Springer (2017)
3. Murphy, J.J.: Technical analysis of the financial markets. A comprehensive guide to trading methods and applications. New York Institute of Finance (1999)
4. Pring, M.J.: Technical Analysis Explained, 3rd edn. McGraw-Hill (1991)
5. Ratnaweera, A., Halgamuge, S.K.: Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients. IEEE Trans. Evol. Comput. 8, 240–255 (2004)
A GARCH-Type Model with Cross-Sectional Volatility Clusters Pietro Coretto, Michele La Rocca, and Giuseppe Storti
Abstract In this work we exploit the inhomogeneity of the cross-sectional distribution of realized stock volatilities, and we propose to use it to improve the predictive performance of GARCH-type models. The inhomogeneity is shown to be well captured by a finite Gaussian mixture model plus a uniform component that represents the "noise" generated by abnormal variations in returns. In fact, it is common that in a cross-section of realized volatilities there is a small proportion of stocks showing extreme behavior. The mixture model is used to estimate the probability that, at a given time point, a stock belongs to a specific volatility group. The latter is profitably used for specifying parsimonious state-dependent models for volatility forecasting. We propose novel GARCH-type specifications whose parameters act "cluster-wise" conditional on past information on the volatility clusters. Finally, the empirical performance of the proposed models is assessed by means of an application to a panel of U.S. stocks traded on the NYSE.

Keywords GARCH models · Realized volatility · Model-based clustering · Robust clustering
P. Coretto (B) · M. La Rocca · G. Storti
DISES, University of Salerno, Via Ponte Don Melillo, 84084 Fisciano, SA, Italy
e-mail: [email protected]
M. La Rocca
e-mail: [email protected]
G. Storti
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_25

1 Introduction

In recent years there has been a growing interest in modelling the volatility of large dimensional portfolios, with a particular interest in the dynamic interdependencies
among several assets [2, 13]. However, modelling dynamic interdependencies among several assets would in principle require the estimation of an unrealistically huge number of parameters compared with the available data dimensions. Model parsimony is achieved by introducing severe constraints on the dependence structure of the data (see [20]). Bauwens and Rombouts [3] showed evidence in favor of the hypothesis that there exist cluster-wise dependence structures in the distribution of financial returns. In this paper we exploit the cluster structure of returns at the cross-sectional level to build GARCH-type models where the volatility of each asset depends on past information about the cluster structure of the entire market. Although this does not provide a multivariate volatility model for the entire market, the proposed strategy allows us to build univariate models that parsimoniously use past information on the entire market. In each time period, a robust model-based clustering method identifies K volatility groups based on realized volatility data. Then, for a given asset, its volatility is modeled via a GARCH-type model whose parameters depend on the group structure discovered in previous periods.

The paper is organized as follows: in Sect. 2 we introduce the proposed GARCH-type specification and the supporting clustering method, in Sect. 3 we discuss an application using data from the NYSE, and finally, in Sect. 4 we state some final remarks.
2 Models and Methods

Several papers have provided evidence supporting the idea that the dynamic properties of volatility depend on the level of volatility itself (see, among others, [4, 6]). This stylized fact has important implications for modelling, since the dynamics of assets characterized by similar volatility levels could be jointly modelled in order to produce parsimonious model specifications for large volatility panels. Also, from a time series perspective, it is reasonable to allow the structure of these volatility clusters to vary over time. Clustering methods for financial time series have already been studied in [19], where the autoregressive metric (see [7]) is applied to measure the distance between GARCH processes.

In this paper we assume that the cross-sectional distribution of assets' volatility is a finite mixture where each component represents a group of assets that share a similar risk behaviour. The underlying clusters are recovered using model-based clustering methods (see [1, 17, 18]). Let h_{t,s} be the realized volatility of asset s at time t. Assume that at each t there are G groups of "regular" assets, that the jth group has an approximately Gaussian distribution with mean m_{j,t} and variance v_{j,t}, and that it has an expected size (proportion) π_{j,t} ∈ [0, 1], with Σ_{j=1}^{G} π_{j,t} = 1 − π_{0,t}. In order to account for the fact that there are often assets exhibiting abnormal variations, we assume that there exists an additional "non-regular" group represented by a Uniform distribution supported on the interval [l_t, u_t], and this group has an expected size π_{0,t} ∈ [0, 1]. In
the context of Gaussian mixture models, the addition of a non-regular cluster has been proposed in [1] and extensively studied in, among others, [8, 10, 15]. The cross-sectional model for the assets' volatility is represented by the mixture density

f(h_{t,s}; θ_t) := π_{0,t} 1_{[l_t, u_t]}(h_{t,s}) / (u_t − l_t) + Σ_{j=1}^{G} π_{j,t} φ(h_{t,s}; m_{j,t}, v_{j,t}),   (1)

where φ(·) is the Gaussian density, and the unknown parameter vector is θ_t := (π_{0,t}, l_t, u_t, π_{1,t}, m_{1,t}, v_{1,t}, ..., π_{G,t}, m_{G,t}, v_{G,t})′. θ_t is estimated by MLE [9], while the MLE numerical approximation can be obtained based on the EM algorithm [11, 12]. Define

τ_{t,s,j} := π_{j,t} φ(h_{t,s}; m_{j,t}, v_{j,t}) / f(h_{t,s}; θ_t),   τ_{t,s,0} := [π_{0,t} 1_{[l_t, u_t]}(h_{t,s}) / (u_t − l_t)] / f(h_{t,s}; θ_t).
The previous quantities are called "posterior weights", and they are crucial for the analysis. τ_{t,s,j} is the probability that at time t asset s belongs to the jth cluster (or, for j = 0, to the noise group), conditional on the observed sample. Points are assigned to their cluster by the optimal Bayes classifier, that is, asset s at time t is assigned to group c_{t,s} = arg max_{j=0,...,G} τ_{t,s,j}, where c_{t,s} = 0 means the non-regular group. In addition to performing clustering, the τ_{t,s,j} weights can be seen as a smooth measure of how strongly s is connected to the jth volatility cluster at time t. Therefore, at each time period t the clustering structure of the market is completely described by the weights {τ_{t,s,j}}. This information is then embodied in the following GARCH-type specification. Let r_{t,s} be the return of asset s at time t. The return generating process is specified as

r_{t,s} = μ_s + σ_{t,s} z_{t,s},
σ²_{t,s} = [ω_s + α_s (r_{t−1,s} − μ_s)² + β_s σ²_{t−1,s}]′ τ_{t−1,s},   (2)

where z_{t,s} ∼ IID(0, 1), ω_s = (ω_{s,1}, ..., ω_{s,G})′ > 0, α_s = (α_{s,1}, ..., α_{s,G})′ ≥ 0, β_s = (β_{s,1}, ..., β_{s,G})′ ≥ 0, and τ_{t−1,s} = (τ_{t−1,s,1}, ..., τ_{t−1,s,G})′. In model (2) the classical GARCH(1,1) parameters are specified group-wise, and they are weighted conditionally on the clustering information provided by the vector τ_{t−1,s}. Note that, although (2) is a univariate volatility model, it uses past information on the entire market through τ_{t−1,s}. These cluster membership weights τ_{t−1,s} are an essential state variable that we cannot observe; however, they can be estimated from the data as discussed previously.

We propose a two-step estimation procedure. In the first step we use the cross-sectional distribution of the realized volatility to recover the cluster weights {τ_{t−1,s}} at each time t and for each stock s. These estimated {τ_{t−1,s}} are treated as observed
data at time t. In the second step we use return data to fit model (2) using the quasi-ML method [22], where the parameter vectors ω_s, α_s, β_s are fitted conditional on {τ_{t−1,s}}.
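The two building blocks of the procedure can be sketched as follows. The code is an illustration, not the authors' implementation (the scipy-based density evaluation and all numerical values are assumptions): it computes the posterior weights defined above and one step of the cluster-weighted variance recursion in (2).

```python
# Sketch of the posterior weights of the Gaussian-plus-uniform mixture and of
# the cluster-weighted GARCH(1,1) variance recursion in (2).
# All numerical values are illustrative assumptions.
import numpy as np
from scipy.stats import norm

def posterior_weights(h, pi0, l, u, pi, m, v):
    """tau_j proportional to pi_j * N(h; m_j, v_j); tau_0 to pi0 * Unif[l, u]."""
    dens_noise = pi0 * (1.0 / (u - l)) * ((h >= l) & (h <= u))
    dens_groups = pi * norm.pdf(h, loc=m, scale=np.sqrt(v))
    f = dens_noise + dens_groups.sum()
    return dens_noise / f, dens_groups / f      # (tau_0, (tau_1, ..., tau_G))

def cluster_weighted_garch_var(r_prev, sigma2_prev, tau_prev, mu, omega, alpha, beta):
    """sigma^2_t = [omega + alpha*(r_{t-1} - mu)^2 + beta*sigma^2_{t-1}]' tau_{t-1}."""
    return np.dot(omega + alpha * (r_prev - mu) ** 2 + beta * sigma2_prev, tau_prev)

if __name__ == "__main__":
    tau0, tau = posterior_weights(h=0.02, pi0=0.05, l=0.0, u=0.2,
                                  pi=np.array([0.5, 0.3, 0.15]),
                                  m=np.array([0.01, 0.03, 0.08]),
                                  v=np.array([1e-5, 1e-4, 1e-3]))
    s2 = cluster_weighted_garch_var(r_prev=-0.01, sigma2_prev=2e-4, tau_prev=tau,
                                    mu=0.0, omega=np.array([1e-6, 2e-6, 5e-6]),
                                    alpha=np.array([0.05, 0.08, 0.10]),
                                    beta=np.array([0.90, 0.88, 0.85]))
    print(round(tau0, 4), np.round(tau, 4), s2)
```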
3 Empirical Study

In this Section we compare the performance of the proposed modeling strategy with the standard approach based on parallel estimation (by Gaussian QML) of several constant-parameter univariate GARCH(1,1) models. Differently, our approach entails a panel of GARCH(1,1) models characterized by asset-specific and time-varying parameters, identified using information on the group structure of the volatility panel. Note that for the proposed method G is still a key model hyper-parameter that is not treated as estimable. However, one can perform a small out-of-sample predictive assessment and fix the G which gives the best performance. We did this and fixed G = 3. The comparison with the standard GARCH(1,1) modelling approach is done in terms of forecasting performance.

We consider data for S = 123 assets traded on the NYSE from 11/Aug/1998 to 18/Jul/2008; therefore there are T = 2500 trading days. Returns are sampled every 5 minutes and then aggregated to compute daily realized variances (sum of squared returns over one trading day). Details of the out-of-sample forecasting comparison are as follows:
• a mixed-rolling window scheme is used with window length = 1500 data points (i.e. the last 1000 are left for out-of-sample forecast evaluation);
• the length of the re-estimation interval = 50 days (i.e. 20 re-estimations for each asset); in practice, all the models are re-estimated every 50 observations;
• 1-step-ahead forecasts are computed.

The forecasting performance is assessed based on two well-known scoring measures. Let h_t be the realized volatility at time t, and denote by σ̂²_t the conditional variance predicted by a competing model. The notation avg_j{·} stands for the average of {·} across the index j. Note that forecasts are computed at periods T + j, where T = 1500 is the in-sample fitting period. The following scores are considered:
• FMSE (Forecast Mean Square Error, less-is-better)

FMSE := avg_j { (σ̂²_{T+j} − h_{T+j})² },

• Q-LIKE (less-is-better)

QLIKE := avg_j { log(σ̂²_{T+j}) + h²_{T+j} / σ̂²_{T+j} }.
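A direct transcription of the two scores into code may help fix the notation; the sketch below is an illustration, with a synthetic volatility proxy and a hypothetical forecast series as assumptions.

```python
# Sketch of the two scoring rules used in the forecasting comparison.
import numpy as np

def fmse(sigma2_hat, h):
    """Forecast Mean Square Error: avg_j (sigma2_hat_j - h_j)^2."""
    return np.mean((sigma2_hat - h) ** 2)

def qlike(sigma2_hat, h):
    """Q-LIKE: avg_j [ log(sigma2_hat_j) + h_j^2 / sigma2_hat_j ]."""
    return np.mean(np.log(sigma2_hat) + h ** 2 / sigma2_hat)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    h = np.abs(rng.normal(0.02, 0.005, size=1000))            # proxy volatility series
    sigma2_hat = h * np.exp(rng.normal(0.0, 0.1, size=1000))  # hypothetical forecasts
    print(fmse(sigma2_hat, h), qlike(sigma2_hat, h))
```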
Table 1 Number of times (across the assets) that a given model results the best performer. Values in brackets are the number of times the null hypothesis of equal conditional predictive ability is rejected according to [14]

        | GARCH(1,1) | Proposed Model
FMSE    | 23 (15)    | 100 (94)
QLIKE   | 31 (28)    | 92 (89)
Both these loss functions are robust in the sense of [21]. Results are reported in Table 1 where one can see that the proposed methodology has a strong advantage over the well established GARCH(1,1) modelling.
4 Conclusion and Final Remarks

We have presented a novel approach to forecasting volatility for large panels of assets. Compared to existing approaches, our modelling strategy has some important advantages. First, inference is based on a computationally feasible two-stage procedure where the investigation of the multivariate group structure in the cross-sectional distribution of volatility (stage 1) is separated from the fitting of the dynamic volatility forecasting models (stage 2). Second, for a given stock, the fitted volatility forecasting model has coefficients that are asset-specific and time-varying. Last but not least, the structure of our modelling approach is inherently flexible and can be easily adapted to consider alternative choices of the clustering variables as well as of the second-stage parametric specifications used for volatility forecasting. On the empirical ground, the results presented in Sect. 3 provide strong evidence that the proposed approach is able to improve over the estimation of univariate GARCH-type models, thus confirming the intuition that taking into account the information on group structures can be rewarding in terms of predictive accuracy in volatility forecasting.

The proposed framework is open to several extensions. First, the modelling of volatility spillovers could be made feasible by assuming some degree of sparsity in the system (see, among others, [5]). Furthermore, relying on recent advancements in computational science and computing power, the same approach could be extended to consider the use of Stochastic Volatility models (see, among others, [16]) instead of GARCH models.
References

1. Banfield, J.D., Raftery, A.E.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821 (1993)
2. Barigozzi, M., Brownlees, C., Gallo, G.M., Veredas, D.: Disentangling systematic and idiosyncratic dynamics in panels of volatility measures. J. Econom. 182(2), 364–384 (2014)
3. Bauwens, L., Rombouts, J.: Bayesian clustering of many GARCH models. Econom. Rev. 26, 365–386 (2007)
4. Bauwens, L., Storti, G.: A component GARCH model with time varying weights. Stud. Nonlinear Dyn. Econom. 13, 2 (2009)
5. Billio, M., Casarin, R., Rossini, L.: Bayesian nonparametric sparse VAR models. J. Econom. 212(1), 97–115 (2019)
6. Cai, J.: A Markov model of switching-regime ARCH. J. Bus. Econ. Stat. 12(3), 309–316 (1994)
7. Corduas, M., Piccolo, D.: Time series clustering and classification by the autoregressive metric. Comput. Statist. Data Anal. 52(4), 1860–1872 (2008)
8. Coretto, P., Hennig, C.: A simulation study to compare robust clustering methods based on mixtures. Adv. Data Anal. Classif. 4(2), 111–135 (2010)
9. Coretto, P., Hennig, C.: Maximum likelihood estimation of heterogeneous mixtures of Gaussian and uniform distributions. J. Stat. Plan. Inference 141(1), 462–473 (2011)
10. Coretto, P., Hennig, C.: Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. J. Am. Stat. Assoc. 111(516) (2016)
11. Coretto, P., Hennig, C.: Consistency, breakdown robustness, and algorithms for robust improper maximum likelihood clustering. J. Mach. Learn. Res. 18(142), 1–39 (2017a). http://jmlr.org/papers/v18/16-382.html
12. Coretto, P., Hennig, C.: otrimle: robust model-based clustering. R Package Version 1.1 (2017b)
13. Engle, R.F., Ledoit, O., Wolf, M.: Large dynamic covariance matrices. J. Bus. Econ. Stat. 37(2), 363–375 (2019)
14. Giacomini, R., White, H.: Tests of conditional predictive ability. Econometrica 74(6), 1545–1578 (2006)
15. Hennig, C.: Breakdown points for maximum likelihood estimators of location-scale mixtures. Annal. Stat. 32(4), 1313–1340 (2004)
16. Jacquier, E., Polson, N.G., Rossi, P.E.: Bayesian analysis of stochastic volatility models. J. Bus. Econ. Stat. 20(1), 69–87 (2002)
17. McLachlan, G.J., Basford, K.E.: Mixture Models: Inference and Applications to Clustering. Dekker, New York (1988)
18. McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley, New York (2000)
19. Otranto, E.: Clustering heteroskedastic time series by model-based procedures. Comput. Stat. Data Anal. 52(10), 4685–4698 (2008)
20. Pakel, C., Shephard, N., Sheppard, K.: Nuisance parameters, composite likelihoods and a panel of GARCH models. Stat. Sinica 21(1), 307–329 (2011)
21. Patton, A.J.: Volatility forecast comparison using imperfect volatility proxies. J. Econom. 160(1), 246–256 (2011)
22. Straumann, D., Mikosch, T.: Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: a stochastic recurrence equations approach. Annal. Stat. 34(5), 2449–2495 (2006)
A Lattice Approach to Evaluate Participating Policies in a Stochastic Interest Rate Framework Massimo Costabile, Ivar Massabó, Emilio Russo, and Alessandro Staino
Abstract To achieve an accurate evaluation and management of the risk affecting long-term life insurance contracts, the insurer cannot disregard stochastic dynamics, not only for the company's assets but also for the interest rate. The aim of this paper is to provide a flexible method for evaluating participating policies, life insurance products that combine financial and demographic risks and provide benefits linked to the company's asset returns. Participating policies embedding not only a minimum guaranteed bonus rate but also a surrender option are analyzed. The method is flexible in that it allows the insurer to choose the most appropriate dynamics, both for the interest rate and for the company's asset, among the ones widely diffused in finance. Lattice-based procedures are used to discretize the continuous-time processes and to provide a comprehensive evaluation method.

Keywords Participating policies · Stochastic interest rates · Surrender option · Binomial algorithm · Discrete-time model
M. Costabile · I. Massabó · E. Russo (B) · A. Staino
Department of Economics, Statistics, and Finance, University of Calabria, Ponte Bucci cubo 0/C, 87036 Arcavacata di Rende (CS), Italy
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_26

1 The Framework

In a competitive, frictionless, and perfect life insurance market characterized by stochastic interest rates, we develop a model that is able to evaluate participating policies embedding a surrender option. By considering a simple stylized version of a participating contract, structured as in Grosen and Jorgensen [4], the policyholder makes an initial single-sum deposit with the insurance company and acquires a policy with nominal value S(0) and maturity T, measured on annual basis. The insurance company invests the entire sum, S(0), in risky assets traded in the financial
market that form the company asset portfolio, so that the sum behaviour along time is stochastic and will be described by a stochastic process as specified hereafter. The quantity S(0) represents also the initial balance of the policy-holder's account and of the company's asset, whose stochastic dynamics influences the mechanism used to credit interest to the policy account balance, because the crediting rules follow a scheme linked to each year's market return. Indeed, supposing that T is an integer, the policy-holder's account balance at the k-th anniversary, P(k), with k = 1, ..., T, is computed recursively as P(k) = (1 + rP(k))P(k − 1), or alternatively as P(k) = P(0) ∏_{i=1}^{k} (1 + rP(i)), with P(0) = S(0), where: rP(k) is the interest rate credited to the policy in year k and it is given by

rP(k) = max{ rG, α( B(k − 1)/P(k − 1) − γ ) } = rG + max{ α( B(k − 1)/P(k − 1) − γ ) − rG, 0 } = rG + rB(k);

B(k − 1) = S(k − 1) − P(k − 1) is the bonus reserve at time k − 1; rG is the constant positive guaranteed annual policy interest rate; rB(k) is the bonus interest rate in year k; γ is the constant target for the ratio of bonus reserves to policy reserves; α is the distribution fraction to the policy-holder's account of any excess bonus reserve in each period.

To develop an evaluation model for participating policies working as detailed above, we need to specify the dynamics describing the interest rate process, r, and the insurance company's asset, S, made up of equities of the same kind, which are traded in a frictionless, complete, and arbitrage-free financial market. In detail, under the risk-neutral probability measure¹ Q, we consider the following dynamics:

dr(t) = mr(r(t)) dt + σr(r(t)) dWt^r   and   dS(t) = r(t)S(t) dt + σS S(t) dWt^S.

Here, W^r and W^S are correlated Brownian motions with correlation ρ, and σS is the equity price volatility. In this framework, we compute the participating policy present value at a generic time t, conditioned on the information set Ft generated by W^r and W^S, as

E^Q[ e^{−∫_t^T r(u) du} P(T) | Ft ].

By specifying differently the drift mr(r(t)) and the volatility component σr(r(t)) in the r-process, we may treat the different stochastic interest rate models widely used in the financial and actuarial literature. For instance, we obtain the Cox-Ingersoll-Ross [2] model by imposing mr(r(t)) = δ(θ − r(t)) and σr(r(t)) = σr √r(t), and the Vasicek [6] model by setting mr(r(t)) = δ(θ − r(t)) and σr(r(t)) = σr, where the reversion speed δ, the long-term reversion target θ, and the risk-free rate volatility σr are strictly positive constants.
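The annual crediting mechanism can be sketched as follows; the code is an illustration (the toy path of annual asset values S(k) and the parameter choices are assumptions, and the stochastic dynamics of S is not simulated here).

```python
# Sketch of the annual crediting rule of the participating policy:
# rP(k) = rG + max(alpha*(B(k-1)/P(k-1) - gamma) - rG, 0), B(k-1) = S(k-1) - P(k-1).
# All numerical values are illustrative assumptions.

def credit_policy(S, r_G, alpha, gamma):
    """Return the policy account values P(0..T) given annual asset values S(k)."""
    P = [S[0]]                                        # P(0) = S(0), so B(0) = 0
    for k in range(1, len(S)):
        B_prev = S[k - 1] - P[-1]                     # bonus reserve B(k-1)
        r_B = max(alpha * (B_prev / P[-1] - gamma) - r_G, 0.0)
        r_P = r_G + r_B                               # credited rate, never below r_G
        P.append(P[-1] * (1.0 + r_P))
    return P

if __name__ == "__main__":
    S = [100.0, 112.0, 118.0, 109.0, 130.0]           # toy annual asset values
    print([round(p, 2) for p in credit_policy(S, r_G=0.045, alpha=0.25, gamma=0.1)])
```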
2 The Evaluation Model

In order to establish a discrete-time model for evaluating participating policies, we preliminarily need to discretize the interest rate and the insurance company's asset process. With this aim, we use two lattice techniques for which, assuming that the policy is drawn up at time 0 and matures at time T, we split the interval [0, T] into n intervals of equal length Δt = T/n. We also suppose that n is a multiple of T so
¹ See Harrison and Kreps [5] for further details.
that we have an integer number of steps, equal to n/T, falling between two consecutive annual policy anniversaries, and avoid biases in the policy evaluation that may occur whenever an anniversary falls between two consecutive observation instants. We discretize the interest rate process applying the Costabile and Massabò [1] approach and obtain a computationally simple lattice in which the number of nodes grows linearly with the number of time steps. Then, we discretize the insurance company's asset process by establishing a recombining binomial lattice à la Cox, Ross, Rubinstein [3] that discretizes the diffusion component of the insurance company's asset dynamics, while we take into account the dependence upon the stochastic interest rate in the probability associated with each possible movement.

The participating policy structure is taken into account when working on the S-lattice in order to compute the policy-holder's account value at each discrete observation time iΔt, P(iΔt). With this aim, we recall that the policy-holder makes an initial single-sum deposit with the insurance company to buy a policy with nominal value S(0) and maturity T, and that the insurance company invests the entire sum, S(0), in the financial market, crediting interest on the policy-holder account on an annual basis. Unfortunately, generating each quantity P(iΔt) on the S-lattice is computationally unmanageable according to the policy crediting rules. Indeed, all the trajectories between two consecutive contract anniversaries must be detected to compute the quantities rP(iΔt), rB(iΔt), B(iΔt) at each discrete observation time, which are then used to generate P(iΔt). As a consequence, to keep the evaluation problem computationally tractable, we propose to augment the S-lattice at each time step iΔt by a state vector P(i) containing buckets of representative values for P(iΔt) associated with all nodes (i, j), j = 0, ..., i. By considering a generic contract anniversary, iΔt, the buckets in P(i) at time iΔt are computed step by step.

Step 1. Follow the highest S-trajectory τmax(i), with i consecutive up steps, and compute the maximum policy-holder's account value Pmax(i) recursively as:
• start from inception, where Pmax(0) = S(0) and the bonus reserve is B(0) = 0;
• in correspondence of each anniversary k ∈ [1, T] antecedent to time iΔt, compute rP(k) = max{ rG, α( B((k − 1) n/T)/Pmax((k − 1) n/T) − γ ) } and Pmax(k n/T) = (1 + rP(k)) Pmax((k − 1) n/T), with B(k n/T) = S(k n/T, k n/T) − Pmax(k n/T), for k = 1, ..., ⌊iΔt⌋.

Step 2. Follow the lowest S-trajectory τmin(i), with i consecutive down steps, and compute, similarly to Pmax(i), the minimum policy-holder's account value, Pmin(i), recursively as Pmin(k n/T) = (1 + rP(k)) Pmin((k − 1) n/T), with B(k n/T) = S(k n/T, 0) − Pmin(k n/T).

We also need the policy-holder's account values for the lattice nodes (i, j) falling between two consecutive anniversaries, i.e., i ∈ ((k − 1) n/T, k n/T). With this aim, we impose that the maximum and minimum account values for such lattice nodes are fixed at the same levels registered at the antecedent anniversary, i.e., Pmax(i) = Pmax(⌊iΔt⌋ n/T) and Pmin(i) = Pmin(⌊iΔt⌋ n/T). The reason behind this choice is
linked to the fact that interest is credited to the policy only at each annual anniversary, so that it has no effect in interim periods, thus leaving the policy-holder's account value unchanged between two consecutive anniversaries. At this point, we are left to compute the representative buckets as reported in the following Step 3.

Step 3. Build up the vector P(i) containing the representative values spanning the interval [Pmin(i), Pmax(i)]:
• the first component in P(i) is P(i; 0) = Pmax(i);
• the other representative values are computed as P(i; h) = Pmax(i) e^{−h a √Δt}, where a is a positive parameter that controls the fineness of the grid and h = 1, ..., H(i) − 1, with H(i) being the smallest integer assuring that Pmin(i) ≥ Pmax(i) e^{−H(i) a √Δt};
• the last component in P(i) is P(i; H(i)) = Pmin(i), and the vector P(i) is made up of H(i) + 1 components.

The algorithm for evaluating the participating policy starts by combining the r-lattice and the S-lattice at each time slice iΔt, i = 0, ..., n, to obtain a three-dimensional lattice having the form of a binomial pyramid, where each node presents four branches. The three-dimensional tree, with nodes denoted by (i, j, l), with j = 0, ..., i, and l = 0, ..., i, presents one node at time 0, four nodes at time Δt, nine nodes at time 2Δt, etc. The probability of each branch is computed by adjusting the product of the marginal probabilities associated with the corresponding movements in the lattices approximating r(t) and S(t) to take into account their correlation. In detail, starting from a generic state (i, j, l), we have four possible scenarios: Scenario uu with probability puu; Scenario ud with probability pud; Scenario du with probability pdu; Scenario dd with probability pdd. Transition probabilities puu, pud, pdu, and pdd are computed by solving the linear system induced by the following conditions: the probabilities sum to 1, i.e., puu + pud + pdu + pdd = 1; the constraint upon the marginal probability of the S-process is set up as puu + pud = pS(i, j, l); the constraint upon the marginal probability of the r-process is set up as puu + pdu = pr(i, l); the covariance between the discretized r-process and S-process must equal the covariance between the continuous-time ones, i.e., puu − pud − pdu + pdd = ρ. Solving simultaneously, we obtain: puu = pS(i, j, l) pr(i, l) + ρ/4; pud = pS(i, j, l) qr(i, l) − ρ/4; pdu = qS(i, j, l) pr(i, l) − ρ/4; and pdd = qS(i, j, l) qr(i, l) + ρ/4. Working on this three-dimensional tree, we can operate backward to compute the participating policy value at inception.²
² pS(i, j, l) is the probability associated with an upward movement for the S-process starting from node (i, j) when the r-process is located at node (i, l), and qS(i, j, l) = 1 − pS(i, j, l). Similarly, pr(i, l) is the probability associated with an upward movement for the r-process starting from node (i, l), and qr(i, l) = 1 − pr(i, l).
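The construction of the four branch probabilities from the two marginal up-probabilities and the correlation ρ can be sketched as follows (an illustration; the input values are arbitrary, and in practice one must also check that the resulting probabilities lie in [0, 1]).

```python
# Sketch of the four branch probabilities of the three-dimensional lattice.

def branch_probabilities(p_S, p_r, rho):
    """Joint probabilities (p_uu, p_ud, p_du, p_dd) solving the linear system
    described in the text; their validity must be checked separately."""
    q_S, q_r = 1.0 - p_S, 1.0 - p_r
    p_uu = p_S * p_r + rho / 4.0
    p_ud = p_S * q_r - rho / 4.0
    p_du = q_S * p_r - rho / 4.0
    p_dd = q_S * q_r + rho / 4.0
    return p_uu, p_ud, p_du, p_dd

if __name__ == "__main__":
    probs = branch_probabilities(p_S=0.55, p_r=0.48, rho=-0.3)
    print([round(p, 4) for p in probs], "sum =", round(sum(probs), 4))
```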
3 Numerical Results

Preliminarily, to validate the model, we force the interest rate process to be constant at level r and present a comparison between the results provided by the proposed lattice model (L) and the ones provided by the univariate model of Grosen and Jorgensen (GJ) [4]. In Table 1, we consider a participating contract with parameters: S(0) = 100, B(0) = 0, T = 20 years, σS = 0.15 and rG = 0.045, while r assumes the values 0.08 and 0.04, and γ and α vary among the values reported in Table 1. Each entry in Table 1 presents two values: the first one, at the top, is the value of the contract without the surrender option; the second one, at the bottom, represents the value of the same participating contract embedding a surrender option.

Observing the results reported in Table 1, it is worth evidencing that the L-values are really close to the GJ-values in all the cases, as expected. The L-values are completely coherent with the GJ-values for small values of α and for any value of γ. In particular, when α = 0 (corresponding, as already evidenced in GJ, to the extreme situation where surplus is never distributed, thus making the values assumed by γ redundant), the policyholder receives the minimum guaranteed rate rG per year for the entire period of the contract, which results in a zero-coupon bond with maturity 20 years. Furthermore, a participating contract value without surrender option below par implies that the corresponding contract with surrender option results to be at par, i.e., it is terminated immediately. As a final remark, we analyze the impact of the risk-free interest rate r on the participating policy present value. Decreasing r to 0.04, which is smaller than the minimum guaranteed interest rate rG = 0.045, the policy value increases but the surrender option is never exercised (see Table 1 when r = 0.04), i.e., we obtain the same policy present values with and without the presence of the surrender option. It means that the value of the surrender option decreases when r decreases, that is, staying in the contract becomes more attractive when the risk-free rate is closer and closer to the guaranteed interest rate.

To present a more realistic numerical analysis, in Table 2 we consider a participating contract with time to maturity T = 10 years, evaluated without and with the presence of a surrender option, when the risk-free rate dynamics is described by a Cox-Ingersoll-Ross [2] process with δ = 0.3, σr = 0.1, and θ = 0.08, while the initial rate value r(0) assumes the values 0.08 and 0.04, respectively. The other parameters remain fixed at S(0) = 100, B(0) = 0, σS = 0.15 and rG = 0.045, as γ and α vary among the values reported in the table, while the correlation between the r-process and the S-process is fixed at ρ = −0.3. Each entry in Table 2 presents policy values in two lines, without (at the top) and with the surrender option (at the bottom). To generate benchmark values for comparison, we have implemented Monte Carlo simulations (MC) based on 1,000,000 trials and 250 observations. Standard errors are reported in brackets below each MC price. It is worth noting the accuracy of the proposed model with respect to the MC method in all the analyzed cases.
Table 1 The present values of participating contracts under a constant risk-free rate. The table reports a comparison between the results provided by the proposed lattice model (L) and the ones provided by the univariate model of Grosen and Jorgensen (GJ) [4], for r = 0.08 and r = 0.04, with α ∈ {0, 0.25, 0.5, 0.75, 1} and γ ∈ {0, 0.05, 0.1, 0.15, 0.2, 0.25}. Each entry reports the contract value without (top) and with (bottom) the surrender option. Parameters: S(0) = 100, B(0) = 0, T = 20, σS = 0.15, rG = 4.5%, n = 400, a = 0.01.
Table 2 The present values of participating contracts when interest rates follow a Cox-Ingersoll-Ross [2] process. The table reports a comparison between the results provided by the proposed lattice model (L) and the ones generated through Monte Carlo simulations (MC), with standard errors in brackets, when T = 10 years with n = 250, for r(0) = 0.08 and r(0) = 0.04, with α ∈ {0, 0.25, 0.5, 0.75, 1} and γ ∈ {0, 0.05, 0.1, 0.15, 0.2, 0.25}. Each entry reports the contract value without (top) and with (bottom) the surrender option. Parameters: S(0) = 100, B(0) = 0, T = 10, σS = 0.15, rG = 4.5%, n = 250, a = 0.01, δ = 0.3, θ = 0.08, σr = 0.1, ρ = −0.3.
References
1. Costabile, M., Massabò, I.: A simplified approach to approximate diffusion processes widely used in finance. J. Deriv. 17(3), 65–85 (2010)
2. Cox, J.C., Ingersoll, J.E., Ross, S.A.: A theory of the term structure of interest rates. Econometrica 53(2), 385–407 (1985)
3. Cox, J.C., Ross, S.A., Rubinstein, M.: Option pricing: a simplified approach. J. Financ. Econ. 7, 229–263 (1979)
4. Grosen, A., Jørgensen, P.L.: Fair valuation of life insurance liabilities: the impact of interest rate guarantees, surrender options, and bonus policies. Insur.: Math. Econ. 26, 37–57 (2000)
5. Harrison, M.J., Kreps, D.M.: Martingales and arbitrage in multiperiod securities markets. J. Econ. Theory 20, 381–408 (1979)
6. Vasicek, O.A.: An equilibrium characterization of the term structure. J. Financ. Econ. 5(2), 177–188 (1977)
Multidimensional Visibility for Describing the Market Dynamics Around Brexit Announcements Maria Elena De Giuli, Andrea Flori, Daniela Lazzari, and Alessandro Spelta
Abstract We propose a multivariate procedure based on a multidimensional visibility graph to detect changes in the UK financial system volatility, both before and after the main Brexit events. We aim at recognizing whether external news related to the Brexit process could induce significant "after-shocks" (and also "pre-shocks") in the system by producing dynamic relaxation in the values of the centrality measures, in line with the cascade effects which follow the Omori earthquake law. In particular, the "after-shocks" high volatility cascades dissipate into the market via power-law relaxation, showing the significant market inefficiency in processing Brexit-related news. By contrast, the market is more efficient in processing other categories of events, such as the Bank of England monetary policy announcements. Keywords Brexit · Financial markets · Visibility graphs · Tensor decomposition · Omori law
1 Introduction A granular and systematic analysis of the impact of the Brexit process on the UK financial market is still a missing link in the existing empirical literature. Most of the studies are focused on the economic or political effects of Brexit (see, e.g., Vasilopoulou [11], Hosoe [1]), or on the effects of Brexit on the aggregate financial system (see, e.g., Schiereck et al. [8]). On this latter line of research, our analysis M. E. De Giuli · D. Lazzari · A. Spelta (B) University of Pavia, Pavia, Italy e-mail: [email protected] M. E. De Giuli e-mail: [email protected] D. Lazzari e-mail: [email protected] A. Flori Polytechnic of Milan, Milan, Italy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_27
aims to study the dynamics of the domestic UK financial market before and after the main Brexit-related news. To analyse the dynamic response of the UK market to the exogenous shocks related to the Brexit process, we employ visibility algorithms [2] for mapping the information encoded in time series into a graph. By representing a time series as a graph, one can therefore investigate, through a topological analysis, the inherited structural patterns at different time scales, from microscopic to macroscopic levels [3]. Secondly, we apply a probabilistic tensor decomposition [6, 9] to obtain global centrality measures from the resulting visibility multi-layer network, namely the Multidimensional Volatility Indicator (MVI) and the Type Centrality (TC) score. The MVI obtained from the visibility multi-layer network makes it possible to synthesize the volatility behaviours of the series both in the temporal and in the cross-sectional dimension; the TC score, also provided by the decomposition, contains information on the importance of each asset series. To analyse the dynamic response of the UK aggregate market volatility to exogenous shocks related to the Brexit process, we study the financial fluctuations embedded into the MVI using the Omori law (Omori [5], Utsu [10]). This amounts to studying the behaviour of the MVI associated with time points in the neighbourhood of relevant external events. We aim at recognizing whether external news related to the Brexit process could induce significant "after-shocks" (and also "pre-shocks") in the system by producing dynamic relaxation in the values of the centrality measures, in line with the cascade effects which follow an earthquake energy propagation [4, 7]. We find that Brexit announcements produce financial shocks whose dynamics are described by an analogue of the Omori earthquake law, which follows a power-law decay. We find also that the same law describes the "pre-shocks" behaviour before the date of the Brexit events. Moreover, we observe significant market inefficiency in processing Brexit-related news, in particular if compared to other categories of events, such as Bank of England monetary policy announcements and policy actions.
2 Data and Methodology 2.1 Natural Visibility Graph Lacasa et al. [3] propose to transform a time series into a graph according to a mapping algorithm named Natural Visibility, which links every point of the time series with all those that can be "seen" from the top of the considered point, i.e. if there exists a straight line that connects the data points of the series and does not intersect the height of any other intermediate data [2]. Formally, two time stamps t_a and t_b of a series taking values y_a and y_b will represent connected nodes of the associated visibility graph, i.e. V(a, b) = 1, if every other data point (t_c, y_c) placed between them fulfils the following inequality condition:
V(a, b) = 1   if   y_c < y_b + (y_a − y_b) (t_b − t_c)/(t_b − t_a)   (1)
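A minimal sketch of the Natural Visibility criterion in Eq. (1) is given below: two time stamps a and b are connected if every intermediate point lies strictly below the straight line joining (t_a, y_a) and (t_b, y_b). The naive double loop is kept for clarity; faster divide-and-conquer versions exist but are not used here.

```python
import numpy as np

def natural_visibility_graph(y):
    """Adjacency matrix of the Natural Visibility graph of a series y (Eq. 1)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    V = np.zeros((T, T), dtype=int)
    for a in range(T):
        for b in range(a + 1, T):
            visible = all(
                y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                V[a, b] = V[b, a] = 1
    return V

# Example: visibility graph of a short series
print(natural_visibility_graph([1.0, 0.5, 2.0, 1.2, 0.8, 3.0]))
```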
2.2 Tensor Decomposition and Centrality Measures The probabilistic tensor decomposition results in two centrality scores, namely: the Multidimensional Volatility Indicator (MVI) and the Type Centrality (TC). These scores are associated respectively with the temporal and with the cross-sectional dimension of the multi-layer network. While the first indicates the importance of each time point in terms of magnitude and co-movement among the series, the second contains information on the probability that high-scoring nodes are connected in each layer. Formally, the multi-layer network in which each layer represents the visibility graph associated with one of the K time series of length T can thus be mapped into a third-order tensor V ∈ R^(T×T×K), as we have a 2-dimensional visibility graph for each asset series k ∈ K, the latter representing the third dimension. Therefore, V ∈ R^(T×T×K) represents a third-order tensor obtained by stacking the adjacency matrices of the visibility graphs V_k for k = 1, ..., K. Each element of the tensor v_{ijk} takes value 1 if nodes i and j are connected in the k-th layer, and zero otherwise. To obtain the centrality measures from the visibility multi-layer network, the starting point is the computation of the (bivariate) conditional frequencies H and R for time stamps and asset types respectively. They can be obtained by normalizing the entries of the tensor V as follows:

h_{i|jk} = v_{ijk} / Σ_{i=1}^{T} v_{ijk},   r_{k|ij} = v_{ijk} / Σ_{k=1}^{K} v_{ijk},   i = 1, ..., T,   k = 1, ..., K   (2)

The above quantities can be used to estimate the MVI and TC as:

MVI_i = Σ_{j=1}^{T} Σ_{k=1}^{K} h_{i|jk} MVI_j TC_k,   i = 1, ..., T
TC_k = Σ_{i=1}^{T} Σ_{j=1}^{T} r_{k|ij} MVI_i MVI_j,   k = 1, ..., K   (3)
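A fixed-point sketch of the coupled Eqs. (2)-(3) is shown below: the tensor V stacks the K visibility adjacency matrices, the conditional frequencies H and R are obtained by normalisation, and MVI and TC are iterated until convergence. The uniform initialisation, the normalisation at each step and the stopping rule are our choices; the paper's probabilistic decomposition is not reproduced in detail.

```python
import numpy as np

def mvi_tc(V, n_iter=500, tol=1e-10):
    """MVI (length T) and TC (length K) scores from a (T, T, K) visibility tensor."""
    V = np.asarray(V, dtype=float)
    T, _, K = V.shape
    sum_over_i = V.sum(axis=0, keepdims=True)     # denominator of h_{i|jk}
    sum_over_k = V.sum(axis=2, keepdims=True)     # denominator of r_{k|ij}
    H = np.divide(V, sum_over_i, out=np.zeros_like(V), where=sum_over_i > 0)
    R = np.divide(V, sum_over_k, out=np.zeros_like(V), where=sum_over_k > 0)
    mvi = np.full(T, 1.0 / T)
    tc = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        mvi_new = np.einsum('ijk,j,k->i', H, mvi, tc)     # Eq. (3), first line
        tc_new = np.einsum('ijk,i,j->k', R, mvi, mvi)     # Eq. (3), second line
        mvi_new /= mvi_new.sum()                          # assumes a non-empty network
        tc_new /= tc_new.sum()
        if np.abs(mvi_new - mvi).max() < tol and np.abs(tc_new - tc).max() < tol:
            return mvi_new, tc_new
        mvi, tc = mvi_new, tc_new
    return mvi, tc

# Example with two random symmetric layers
rng = np.random.default_rng(0)
layer = (rng.random((20, 20)) > 0.7).astype(float)
V = np.stack([np.triu(layer, 1) + np.triu(layer, 1).T] * 2, axis=2)
print(mvi_tc(V)[0][:5])
```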
3 Visibility Cascading Dynamics Common patterns of complex interactions, derived from the topology of the visibility graphs and extracted via tensor decomposition, are used to investigate the impact of external shocks on the volatility of the aggregate financial system. In a nutshell, we investigate whether the behaviour of the MVI in the ten trading days following (and preceding) the day of the exogenous news fulfils the so-called Omori law. The Omori
law represents statistical regularities observed for geophysical earthquakes and helps in quantifying how the MVI after-shocks (or pre-shocks) decay with time. In order to study the aggregate market dynamics around an external shock occurring at time T_s, we analyze the daily variation MVI(|t − T_s|) around time T_s. MVI(|t − T_s|) quantifies the value of the centrality measure at time t both before and after a market shock occurring at time T_s. The Omori law describes the dynamics of this measure following a perturbation at time T_s as:

MVI(|t − T_s|) ∼ |t − T_s|^{β_MVI}   (4)
where the parameter β_MVI stands for the Omori power-law exponent, t < T_s corresponds to the period before the main shock, and t > T_s corresponds to the dynamics after the main shock. In the analysis that follows, we focus on CMVI(|t − T_s|), which represents the cumulative value of the Multidimensional Volatility Indicator during the period |t − T_s|. In order to compare the visibility dynamics before and after the news-related event, we separate the centrality values symmetrically around T_s as CMVI_b(t | t < T_s) and CMVI_a(t | t > T_s), where the suffixes b and a stand for before and after the shock, respectively. We define the displaced time as τ = |t − T_s|. We then employ a linear OLS fit on a log-log scale to estimate the Omori power-law exponents β_MVI,a and β_MVI,b.
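The sketch below illustrates this estimation step: around an event at position Ts in the MVI series, the cumulative indicator CMVI(τ) is built separately before and after the shock and the power-law exponents β are estimated by an OLS fit on a log-log scale. The ten-day window mirrors the analysis described in the text; the synthetic series in the example is purely illustrative.

```python
import numpy as np

def omori_exponents(mvi, Ts, window=10):
    """Pre- and after-shock Omori exponents from a log-log OLS fit of CMVI(tau)."""
    mvi = np.asarray(mvi, dtype=float)
    tau = np.arange(1, window + 1)
    cmvi_after = np.cumsum(mvi[Ts + 1: Ts + window + 1])
    cmvi_before = np.cumsum(mvi[Ts - 1: Ts - window - 1: -1])   # moving away from Ts
    def slope(c):
        x, y = np.log(tau), np.log(np.maximum(c, 1e-12))
        beta, _ = np.polyfit(x, y, 1)
        return beta
    return slope(cmvi_before), slope(cmvi_after)

# Example with a synthetic MVI series and an event at position 50
rng = np.random.default_rng(1)
series = rng.random(100)
beta_b, beta_a = omori_exponents(series, Ts=50)
print(beta_b, beta_a)
```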
4 Results To analyse the role that different news play in explaining the rate of occurrence of large fluctuations in the financial system, we investigate the behavior of the MVI in the neighbourhood of relevant events which could affect asset price dynamics. To better understand aggregate market responses to different types of shocks, we classify relevant external news observed in the markets over the 3-year period 2016–2019 into two different categories, namely: news regarding the Brexit events and monetary policy news. We consider the following asset classes, identified by their Bloomberg tickers. The equity class is represented by the FTSE UK Indexes: FTSE100, FTSE250, FTSE350, FTSE All-Share, FTSE Techmark Focus, FTSE AIM All-Share Index; the MSCI UK Industrials Index and the MSCI UK Financials Index. The UK government yield curve is represented by the 2-, 5-, 10-, 30- and 50-year maturities. Finally, we consider the sterling (GBP) cross-rates against the main currencies, i.e. US dollar, Euro, Yen and Swiss Franc. Figure 1, left panel, shows the MVI behavior during the period 2016–2019 together with dashed color lines, which identify the dates of news regarding Brexit announcements (red bars) and monetary policy activities. The latter have been divided into interest rate announcements (green bars) and asset purchasing announcements (blue bars). Notice how the MVI peaks in correspondence with the Brexit referendum, which stormed the whole UK financial market by producing the highest volatil-
Fig. 1 MVI dynamics and Omori Law: The figure shows, in the left panel, the MVI reported against the dates of Brexit-related news (red bars) and of monetary policy actions (interest rate announcements in green, asset purchasing announcements in blue). The right panel reports the log-log plot of the cumulative distribution function of the MVI around the days of the external shocks. Lines are reported with different colors: Brexit events are displayed in red, while interest rate announcements are in green and asset purchasing announcements in blue. The legend provides the values of the Omori exponents for both pre-shocks and after-shocks
ity in the system. Sterling depreciated heavily against major currencies, stock market indices in the UK decreased sharply and asset volatility increased. This backdrop triggered a more aggressive expansionary monetary policy by the Bank of England (BoE), with interest rate cuts and additional non-standard monetary policy measures. Brexit uncertainty surged after the June 2016 referendum and it spiked after the September 2018 Salzburg summit, when the EU did not accept the UK's Brexit proposal, as it increased the chance of a no-deal Brexit, an outcome that could represent a significant shock for the financial markets. The EU and UK did subsequently come to a withdrawal agreement in November 2018, but this was rejected by the UK Parliament. After Brexit was postponed beyond 29 March 2019, the date originally set for the withdrawal, volatility showed alternating phases. Figure 1, right panel, shows the average volatility response embedded into the dynamics of the MVI, obtained by combining the individual Omori responses to the different types of exogenous news. In order to compare the dynamics before and after the announcement, we have separated the MVI dynamics symmetrically around the announcement time. We then employ a linear fit on a log-log scale to determine the Omori power-law exponents before the news and after the news. We observe that the Omori exponent is positive for both the after-shock and pre-shock cases and for both monetary policy and Brexit announcements. Moreover, the lower Omori exponents related to the Brexit announcements with respect to the monetary policy actions suggest a market inefficiency in processing Brexit-related news with respect to announcements of the BoE regarding interest rate adjustments or asset purchasing. Finally, notice that between the two classes of monetary policy actions, asset purchasing has the highest exponent, thus suggesting more efficient market processing. This can be due to the fact that both the dates of the purchases and the quantities of assets purchased by the BoE are known in advance by market participants.
5 Conclusion In our analysis we have approached the Brexit process by means of a multidimensional visibility graph to detect changes in the UK financial market. We have observed significant market inefficiency in processing Brexit-related news, in particular if compared to other categories of events, such as Bank of England monetary policy announcements and policy actions. External news related to the Brexit process implies significant "after-shocks" (and also "pre-shocks") in the system by producing dynamic relaxation in the values of the centrality measures, in line with the cascade effects which follow an earthquake energy propagation. We find that Brexit announcements produce financial shocks whose dynamics are described by an analogue of the Omori earthquake law. Indeed, we find that the Brexit process induces high volatility cascades of "after-shocks", which follow a power-law decay, and that the same law describes the "pre-shocks" behaviour before the date of the Brexit events.
References
1. Hosoe, N.: Impact of border barriers, returning migrants, and trade diversion in Brexit: firm exit and loss of variety. Econ. Modell. 69, 193–204 (2018)
2. Lacasa, L., Flanagan, R.: Time reversibility from visibility graphs of nonstationary processes. Phys. Rev. E 92(2), 022817 (2015)
3. Lacasa, L., Luque, B., Ballesteros, F., Luque, J., Nuno, J.C.: From time series to complex networks: the visibility graph. Proc. Natl. Acad. Sci. 105(13), 4972–4975 (2008)
4. Lillo, F., Mantegna, R.N.: Power-law relaxation in a complex system: Omori law after a financial market crash. Phys. Rev. E 68(1), 016119 (2003)
5. Omori, F.: On the aftershocks of earthquakes. J. Coll. Sci. 7, 111–200 (1894)
6. Pecora, N., Spelta, A.: A multi-way analysis of international bilateral claims. Soc. Netw. 49, 81–92 (2017)
7. Petersen, A.M., Wang, F., Havlin, S., Stanley, H.E.: Market dynamics immediately before and after financial shocks: quantifying the Omori, productivity, and Bath laws. Phys. Rev. E 82(3), 036114 (2010)
8. Schiereck, D., Kiesel, F., Kolaric, S.: Brexit: (not) another Lehman moment for banks? Financ. Res. Lett. 19, 291–297 (2016)
9. Spelta, A., Flori, A., Pammolli, F.: Investment communities: behavioral attitudes and economic dynamics. Soc. Netw. 55, 170–188 (2018)
10. Utsu, T.: A statistical study on the occurrence of aftershocks. Geophys. Mag. 30, 521–605 (1961)
11. Vasilopoulou, S.: UK euroscepticism and the Brexit referendum. Polit. Q. 87(2), 219–227 (2016)
Risk Assessment in the Reverse Mortgage Contract Emilia Di Lorenzo, Gabriella Piscopo, Marilena Sibillo, and Roberto Tizzano
Abstract Changes in the demographic structure of a country and in economic conditions are interconnected causes of new needs that motivate the launch of new financial instruments. One of these is the Reverse Mortgage (RM): a contract in which a homeowner borrows a part or the totality of the future liquidation value of his/her home at the time of his/her death. The paper analyses the contractual details and discusses the impact of the main variables on the lump sum that an elderly homeowner receives at the inception of the contract. The risk factors that influence the pricing of the RM are both strictly demographic, i.e. the life of the contractor, and financial, in particular the evolution of the real estate market and of the financial market. Keywords Personal pension products · Real estate · Reverse mortgage
1 Introduction The development of financial and pension products aims at addressing needs changing in relation to socio-economic and demographic trends; a recent report by EIOPA [5] emphasized the importance of “personal voluntary pension savings”, to improve the sustainability and adequacy of the pension system. An interesting contractual solution that appears to be especially suited for new segments of the pension market E. Di Lorenzo · G. Piscopo (B) · R. Tizzano Department of Economic and Statistical Sciences, University of Naples, via Cintia, 80126 Naples, Italy e-mail: [email protected] E. Di Lorenzo e-mail: [email protected] R. Tizzano e-mail: [email protected] M. Sibillo Department of Economics and Statistics, Campus di Fisciano, University of Salerno, via Ponte Don Melillo, 84084 Fisciano, Italy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_28
considers home ownership as a source of income at the end of one's working life [3]. Thanks to these contracts, known as Reverse Mortgages (RMs), retired people with substantial medical or other welfare expenses are granted access to credit [7]. RMs are designed for house-rich-cash-poor population groups [6], because elderly home owners can receive money in a lump sum or in lifetime instalments, weighted against the value of their home, in which they will be entitled to live until they die [8]. Often RMs offer a non-negative equity guarantee [2]: the home holder receives a lump sum against a proportion of the value of his house; at the borrower's death the heirs have to pay the minimum amount between the contractual proportion of the initial house value accumulated at the interest rate on the loan and the same proportion of the current house value [11].
2 The Model A RM is a financial product whose valuation is influenced by the randomness of the real estate market, the financial market risk and the longevity risk [12]. If the borrower lives longer than expected, the liquidation of the asset is postponed, thus determining a stronger impact of the house price risk and of the interest rate risk. Let us consider a RM with a non-negative equity guarantee. The contract is issued in t = 0 with a home holder aged x. At t = 0 he receives a lump sum LS_0 against a proportion α of the current value of his house H_0; this proportion also represents the amount A_0 of the loan in t = 0. Due to the presence of the guarantee, LS_0 < A_0 = α H_0; in particular, the difference A_0 − LS_0 represents the cost of the guarantee [4]. At the borrower's death, which will happen at a stochastic time, the heirs will receive the amount V_t given by:

V_t = min(A_t, H_t)   (1)

where

A_t = α H_0 exp( ∫_0^t r_s^{RM} ds ) = α H_0 exp( ∫_0^t (r_s + π) ds )   (2)

H_t = α H_0 exp( ∫_0^t r_s^{HV} ds )   (3)
r^{RM} is the interest rate applied by the lender on the reverse mortgage loan, equal to the risk-free rate r_s plus the risk premium π, and r^{HV} is the appreciation rate of the house value. Let γ be the random variable expressing the future lifetime in continuous
time, f_x(t) its probability density function and ω the extreme age; since γ and V_t are independent, the lump sum can be calculated as follows:

LS_0 = ∫_0^{ω−x} f_x(t) E_0(V_t) dt   (4)
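A hedged Monte Carlo sketch of the valuation in Eq. (4) is given below: the guaranteed payoff V_t = min(A_t, H_t) is averaged over simulated interest-rate and house-price paths and over a death-time distribution. The CIR and GBM coefficients marked as assumed, the uniform stand-in for the projected Lee-Carter death probabilities and the discounting convention are illustrative placeholders, not the calibrated values used in the numerical application that follows.

```python
import numpy as np

rng = np.random.default_rng(0)
T, B, steps_per_year = 30, 10_000, 12
n_steps, dt = T * steps_per_year, 1.0 / steps_per_year
H0, alpha, pi, r0 = 10_000.0, 0.7, 0.03, 0.0256
kappa, theta, sigma_r = 0.2, 0.03, 0.05        # assumed CIR parameters
mu_H, sigma_H = 0.01, 0.04                     # assumed GBM parameters for the house index

r = np.full(B, r0)
logH = np.zeros(B)
int_r = np.zeros((B, n_steps + 1))             # cumulated risk-free rate
logH_path = np.zeros((B, n_steps + 1))
for s in range(n_steps):
    z = rng.standard_normal((B, 2))
    int_r[:, s + 1] = int_r[:, s] + r * dt
    logH += (mu_H - 0.5 * sigma_H**2) * dt + sigma_H * np.sqrt(dt) * z[:, 1]
    logH_path[:, s + 1] = logH
    r = np.maximum(r + kappa * (theta - r) * dt
                   + sigma_r * np.sqrt(np.maximum(r, 0.0) * dt) * z[:, 0], 0.0)

omega, x = 100, 70
years = np.arange(1, omega - x + 1)
q = np.full(len(years), 1.0 / len(years))      # placeholder death probabilities f_x(t)

LS0 = 0.0
for t, qt in zip(years, q):
    s = t * steps_per_year
    A_t = alpha * H0 * np.exp(int_r[:, s] + pi * t)      # Eq. (2)
    H_t = alpha * H0 * np.exp(logH_path[:, s])           # Eq. (3)
    V_t = np.minimum(A_t, H_t)                           # Eq. (1)
    LS0 += qt * np.mean(np.exp(-int_r[:, s]) * V_t)      # discounting is an assumption
print(round(LS0, 2))
```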
3 Numerical Application The aim of this section is to calculate the lump sum given to a home holder aged x. For this purpose, we have to take into account the three stochastic risk factors described in the Introduction. As regards the longevity risk [10], the mortality terms in Eq. 4 will be extracted from a projected life table produced by implementing the Lee-Carter (LC) model [10] on the Italian male mortality dataset, downloaded from the Human Mortality Database. We have fitted the LC model on the historical Italian male mortality data from year 1872 to 2014 and then projected the table until 2030. As regards the interest rate, we have calibrated the Cox et al. [1] model to the Italian 10-year government bond daily yield from 2000 to 2018 and have simulated with B paths the evolution of the interest rate up to T years. As regards the evolution of the real estate market, we have used for illustrative purposes the Geometric Brownian motion calibrated on the quarterly variations of the Italian Real Estate Index IPAB diffused by ISTAT (Italian National Institute of Statistics) between 2010 and 2019 and have simulated with B paths the evolution of the house value up to T years. The fitted parameters of the model are summarized in Table 1. Table 2 shows the results: the lump sum and the cost of guarantee are calculated for home holders aged differently at the inception of the contract. Table 1 Parameters of the model
T = 30   ω = 100   r0 = 2.56%   π = 3%   H0 = 10000   α = 0.7   B = 10000
Table 2 Lump sum and cost of guarantee
x     Lump sum     Cost of guarantee
70    43285.25     26714.75
75    47868.96     22131.04
80    52447.95     17552.05
References
1. Cox, J., Ingersoll, J.E., Ross, S.A.: A theory of the term structure of interest rates. Econometrica 53, 385–407 (1985)
2. De la Fuente Merencio, I., Navarro, E., Serna, G.: Estimating the no-negative-equity guarantee in reverse mortgages: international sensitivity analysis. In: New Methods in Fixed Income Modeling. Springer International Publishing (2018). https://doi.org/10.1007/978-3-319-95285-7_13
3. D'Amato, V., Di Lorenzo, E., Haberman, S., Sibillo, M., Tizzano, R.: Pension schemes versus real estate. Ann. Oper. Res. (Springer, US) (2019). https://doi.org/10.1007/s10479-019-03241y
4. Di Lorenzo, E., Piscopo, G., Sibillo, M., Tizzano, R.: Reverse mortgages through artificial intelligence: new opportunities for the actuaries. Decisions Econ. Financ. 44(1), 23–35 (2021)
5. EIOPA: EIOPA's advice on the development of an EU single market for personal pension products (PPP). EIOPA-16/457 (2016)
6. Fornero, E., Rossi, M., Virzi Bramati, M.C.: Explaining why, right or wrong, (Italian) households do not like reverse mortgages. J. Pension Econ. Financ. 15(2), 180–202 (2016)
7. Guerin, J.: Feature: Nobel prize-winning economist Robert Merton. Reverse Rev. (2016). https://www.reversereview.com/magazine/features/feature-nobel-prize-winning-economistrobert-merton.html
8. Hanewald, K., Post, T., Sherris, M.: Portfolio choice in retirement: what is the optimal home equity release product? J. Risk Insur. 83(2), 421–446 (2016)
9. Huang, H.C., Wang, C.W., Miao, Y.: Securitization of crossover risk in reverse mortgage. Geneva Pap. Risk Insur. Issues Pract. (Special Issue on Longevity) 36(4), 622–647 (2011)
10. Lee, R.D., Carter, L.R.: Modeling and forecasting U.S. mortality. J. Am. Stat. Assoc. 87(419), 659–671 (1992)
11. Nakajima, M., Telyukova, I.A.: Reverse mortgage loans: a quantitative analysis. J. Financ. 72(2), 911–950 (2017)
12. Shao, A.W., Hanewald, K., Sherris, M.: Reverse mortgage pricing and risk analysis allowing for idiosyncratic home price risk and longevity risk. Insur.: Math. Econ. 63, 76–90 (2015)
Neural Networks to Determine the Relationships Between Business Innovation and Gender Aspects Giacomo di Tollo, Joseph Andria, and Stoyan Tanev
Abstract Gender aspects of management, innovation and entrepreneurship are gaining more and more importance as cross-cutting issues for researchers, practitioners and decision makers. The extant literature pays growing attention to the hypothesis that there exists a correlation between the gender diversity of corporate boards of directors and the business attitude to innovation. In this paper we introduce a working framework to test the aforementioned hypothesis and to examine the correlation between board diversity and the innovation perception of a business. This framework is based on correlation computation and feed-forward neural networks, and it is used to evaluate whether the gender component may be used to predict the innovation perception of a business. First results for three different economic scenarios are reported and discussed. Keywords Innovation and entrepreneurship · Gender diversity · Corporate boards of directors · Perception of innovation · Feed-forward neural networks
1 Introduction The ongoing discussion about gender balance is gaining growing attention. There seems to be a substantial interest in the potential impact of the diversity of corporate boards of directors, which is one of the most significant issues in corporate G. di Tollo (B) Department of Computer Science, Université du Luxembourg, Maison du Nombre, 6, Avenue de la Fonte, 4364 Esch-sur-Alzette, Luxembourg e-mail: [email protected] J. Andria Dipartimento di Scienze Economiche, Aziendali e Statistiche, University of Palermo, Palermo, Italy e-mail: [email protected] S. Tanev Sprott School of Business, Carleton University, Ottawa, Canada e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_29
governance [13]. There is a stream of publications about gender analysis in top management [11] and about how external factors (i.e., government actions [15]) could have an impact on the ways gender differences affect entrepreneurial activity. In particular, there has been growing attention to the relationship between board diversity and innovation [14]: some of the research studies investigated the link between governance and innovation strategies, especially focusing on the relationship between board demographic characteristics and firm innovation. More specifically, some of the studies attempted to examine the link between board diversity and firm innovation, based on the hypothesis that board heterogeneity can lead to a broader range of ideas, creativity, and eventually a higher level and quality of innovation [8]. Nevertheless, very few studies investigated the effect of such patterns of board diversity on innovation. There are also studies suggesting that stronger female board representation is associated with better firm performance and greater innovation success [12], especially in innovation-intensive industries. In addition, female board representation was found to be positively associated with performance only for firms for which innovation and creativity play a particularly important role [3]. In our paper, we will explore further the logic of this research theme, by examining the correlation between the proportion of women in companies' boards of directors and the articulation of companies' innovativeness on their websites, i.e., the perception of their innovativeness in the way it is articulated on their websites. In order to assess companies' perception of their innovativeness, we used a web search tool that measures the frequency of preliminarily designed sets of words (or logical combinations of words) on the websites of samples of firms from specific business sectors [4, 16]. We performed this operation on three different sets of data: the first encompasses 273 Open Source (OSS) businesses associated with the Eclipse OS Foundation and businesses listed on two websites, Open Source Experts (www.opensourceexperts.com) and the Canadian Companies Capabilities Directory of OS Companies (http://strategis.ic.gc.ca/epic/site/ict-tic.nsf/en/h_it07356e.html). The other two sets of data consist of businesses listed on the FTSE and NASDAQ indices. The first set of data has already been analyzed in the context of the relation between customer value co-creation and innovation [4, 5]. The purpose of this exploration is to understand whether there is a difference in the gender-innovation relation between innovation-based and more traditional businesses. Furthermore, we want to refine our study by assessing whether gender aspects and co-creation activities can be jointly used in predicting the online innovation claims by businesses in different sectors. To this aim, we are using data related to firms' co-creation practices used by di Tollo et al. [4, 5], and we are refining the neural network approach adopted therein by using as input indicators:
• the co-creation metrics introduced by di Tollo et al. [4, 5];
• the proportion of women in the company's board of directors.
The paper is organised as follows: Sect. 2 outlines the main concepts and metrics used in this contribution; Sect. 3 reports the results obtained to assess the relationship between gender and innovation; then we introduce our neural network approach in Sect.
4, before concluding and outlining possible future research in Sect. 5.
Table 1 Main statistics of the innovation perception metrics over two years
               Year 2013                      Year 2019
          Mean   STD    Min   Max        Mean   STD    Min   Max
OSS       0.27   0.18   0     4.82       0.17   0.36   0     3.38
NASDAQ    0.27   0.16   0     3.08       0.11   0.21   0     1.33
FTSE      0.34   0.27   0     13.18      0.01   0.18   0     11.06
2 The Main Concepts: Innovation and Gender Analysis In what follows we outline the main concepts used in the paper and report the corresponding data. Section 2.1 outlines the innovation concept and metrics; Sect. 2.2 outlines the gender components and the corresponding metrics.
2.1 Innovation A common idea in the innovation-related literature is that it is possible to use the frequency of specific keywords on a business's website and related documentation to assess the relevance of the activities represented by those keywords for the business itself [10]. The rationale behind this assertion is that most innovation activities are described online: the more a firm describes a specific activity, the more it deems this activity relevant for its current situation [6]. In this context, [16] has defined a methodology that measures the frequency of firms' online comments about their new products, processes, and services to assess the perception of firms' innovativeness. The keyword (regular expression) used in that contribution is new ∧ (product ∨ service ∨ process ∨ application ∨ solution ∨ feature ∨ release ∨ version ∨ launch ∨ introduction ∨ introduce ∨ (new product) ∨ (new service) ∨ (new process)). In our contribution we quantify the occurrence of this regular expression for businesses belonging to the three specific sets of data (see Introduction), over two distinct years of observations. Statistics about this keyword can be found in Table 1.
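A minimal sketch of this keyword-frequency measurement is given below: given the text scraped from a company website, we count sentences matching the boolean expression new AND (product OR service OR ...). Treating the website as a plain string and normalising by the number of sentences are our simplifications; the actual web search tool of [4, 16] is not reproduced.

```python
import re

TARGETS = ["product", "service", "process", "application", "solution", "feature",
           "release", "version", "launch", "introduction", "introduce",
           "new product", "new service", "new process"]

def innovation_score(text):
    """Share of sentences containing 'new' together with at least one target keyword."""
    sentences = re.split(r"[.!?]", text.lower())
    hits = sum(1 for s in sentences
               if "new" in s and any(t in s for t in TARGETS))
    return hits / max(len(sentences), 1)

print(innovation_score("We launch a new product today. Our history is long."))
```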
2.2 Gender Composition of Board We have computed the ratio between women in the board and total members of the board to define a gender metric, whose results are reported in Table 2.
Table 2 Main statistics of the ratio of women to total members of the board over two years
               Year 2013                      Year 2019
          Mean   STD    Min   Max        Mean   STD    Min   Max
OSS       0.15   0.21   0     0.59       0.17   0.14   0     0.57
NASDAQ    0.21   0.25   0     0.48       0.22   0.09   0     0.5
FTSE      0.22   0.26   0     0.24       0.28   0.09   0     0.25
Table 3 Mutual information between gender component and innovation perception over two years
Set of data    2013    2019
OSS            0.31    0.34
NASDAQ         0.18    0.19
FTSE           0.05    0.12
3 Mutual Information Mutual information is a measure of the amount of information that one random variable has about another variable [2]. In other words, it is a measure of the mutual dependence between the two variables. Originally introduced in information theory, it is nowadays used as a fundamental tool of investigation in many other research areas, such as mathematics, computer science and economics. In our contribution, we use the mutual information to quantify the amount of information about one random variable (i.e., the innovation metric) obtained through observing the other random variable (i.e., the gender metric). We have computed the pairwise mutual information amongst all variables defined in Sects. 2.1 and 2.2, for the two years taken into account. We report the values obtained for years 2013 and 2019 in Table 3. It is interesting to remark that the mutual information between the gender metric and the innovation metric shows an increase over the years, and this is more evident in the set of data reporting innovative businesses (OSS).
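A histogram-based sketch of this computation is shown below, estimating the mutual information between the gender metric and the innovation metric across firms. Ten equal-width bins, natural logarithms and the synthetic data are our choices; the paper does not specify the estimator used.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Plug-in estimate of I(X;Y) from a 2D histogram of the two firm-level metrics."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

rng = np.random.default_rng(0)
gender = rng.random(300)                         # share of women on the board (synthetic)
innovation = 0.5 * gender + 0.5 * rng.random(300)   # innovation perception metric (synthetic)
print(mutual_information(gender, innovation))
```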
4 Our Neural Network Approach In the last decades there has been an increasing need, for businesses facing globalisation and markets with heterogeneous customers, to look for new sources of innovation and competitive differentiation, i.e., by letting end users become an active part in product design, service design and experience exploitation. In this framework, an important source of value is given by the co-creation experience, which results in a close interaction between company and customer and makes the customer an active stakeholder.
In this section we define a neural network approach to predict the perception of innovation of businesses. Such a prediction has already been attempted by di Tollo et al. [4, 5], by using as input the business co-creation components: the rationale is that the frequencies of a specific set of keywords can be used to extract the key components of value co-creation activities and to classify value co-creation practices, through a careful construction of a set of keywords representing the different constitutive dimensions of value co-creation. To this extent, the frequency of use of each of the keywords on companies' websites and news releases is measured, and eventually Principal Component Analysis (PCA) is used to reduce the problem complexity, by identifying emerging groups of keywords (components) that could be associated with specific self-consistent groups of activities [1, 4]. We want to improve this prediction by using, along with the co-creation components used by di Tollo et al. [4, 5], the gender metric introduced in Sect. 2.2. The same neural networks introduced by these former works have been used, with the only modification given by adding an input neuron associated with the gender metric. In all experiments we sample two disjoint sets of observations out of the total number of businesses: the training set (used to estimate the networks' parameters) and the test set (used for assessing their performances). Businesses have been randomly allocated to these two sets, so as to have the training set consisting of 75% of the total businesses, and the test set consisting of 25% of the total businesses. This procedure is repeated 30 times, each time leading to a different sub-sample, and is needed to compute a robust weighted error function over the diverse train-test partitions. As in [4], in order to verify whether there is correlation between the current network output and the innovation metrics and to stress non-linear features between variables, we have computed the cumulative empirical distribution of Spearman's rank-based correlation value between the desired and current network output values over the 30 different partitions of training and test sets. Keeping in mind that a positive rank-based correlation has already been identified, we want to stress that the same holds in our approach, with the difference that the correlation measure reported by di Tollo et al. [4, 5] is greater than 0.85 in 70% of the cases, whereas in our approach that measure is around 0.85 in more than 75% of the cases over the 2013 sets of data. We have also run a pairwise Wilcoxon test between the results by di Tollo et al. [4] and by our approach, over the best solutions for each of the 30 partitions found on all instances, and we have obtained p-values smaller than 0.025, leading us to reject the hypothesis that the distributions from which they are drawn are equivalent. As for the results from year 2019, the correlation measure is greater than 0.85 in almost 80% of the cases: this remark allows us to state that the gender metric can be used jointly with the co-creation components to predict the innovation perception of a business, and that this trend seems to be more and more relevant over the years. Further years of observation are necessary to test this hypothesis, and this is left for further works.
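A sketch of the prediction exercise just described is shown below: a feed-forward network takes the co-creation components plus the gender metric as inputs and the innovation metric as target; the 75/25 split is repeated 30 times and the Spearman correlation between predicted and observed values is recorded on each test set. The network size and the synthetic data are illustrative; they do not reproduce the architecture or the data of [4, 5].

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_firms = 200
X_cocreation = rng.random((n_firms, 4))          # PCA components of co-creation keywords (synthetic)
x_gender = rng.random((n_firms, 1))              # share of women on the board (synthetic)
X = np.hstack([X_cocreation, x_gender])
y = X @ rng.random(5) + 0.1 * rng.standard_normal(n_firms)   # innovation metric (synthetic)

correlations = []
for seed in range(30):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=seed)
    net.fit(X_tr, y_tr)
    rho, _ = spearmanr(y_te, net.predict(X_te))
    correlations.append(rho)

# share of partitions with rank correlation above 0.85
print(np.mean(np.array(correlations) > 0.85))
```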
5 Conclusion In this paper we have set up an experimental framework to assess the correlation (measured by the mutual information) between business innovation perception and gender balance. Based on our results, we can conclude that it is not possible to detect a universal relationship between innovation perception and the gender diversity of the board of directors: traditional businesses (i.e., the ones belonging to traditional financial indices) show a low magnitude of such correlation. Conversely, it is possible to detect a higher positive relationship between innovation perception and gender diversity for businesses that are referred to as innovative: in our example, firms belonging to the OSS set of data show a larger mutual information between these two aggregates. In both cases, it is possible to remark an increasing relationship between the two aggregates over time. Further observations over the years are necessary to corroborate these findings. Furthermore, we can state that the gender metric can be used to improve a neural network approach aimed at innovation perception assessment through co-creation values. Also in this regard, we need further observations over the years to produce a good understanding of this phenomenon. Further research will be devoted to detecting whether a relationship exists between the gender component and the topics a business deals with on the Internet. To this extent, a topic-modeling framework will be implemented to detect the main topics dealt with by a business on its own webpage, but also the topics dealt with by users that comment on the business' activity. Furthermore, a cluster analysis will be performed, taking into account both gender and innovation metrics over different sets of data: for this analysis, a comparison will be performed between traditional approaches (such as X-means) and black-box methods (i.e., Self Organising Maps) in order to detect similarities and differences amongst the obtained clusters and to show different attitudes of innovation and gender aspects over different scenarios.
References
1. Allen, S., Tanev, S., Bailetti, T.: Components of co-creation. Open Sour. Bus. Rev. Online J. Spec. Issue Value Co-creat. (2009)
2. Andria, J., di Tollo, G., Pesenti, R.: A heuristic fuzzy algorithm for assessing and managing tourism sustainability. Soft Comput. (2019)
3. Chen, J., Leung, W.S., Evans, K.P.: Female board representation, corporate innovation and firm performance. J. Empir. Financ. 22, 236–254 (2018)
4. di Tollo, G., Tanev, S., De March, D., Ma, Z.: Neural networks to model the innovativeness perception of co-creative firms. Expert Syst. Appl. 39(16), 12719–12726 (2012)
5. di Tollo, G., Tanev, S., Kassis, S.M., De March, D.: Determining the relationship between co-creation and innovation by neural networks. In: Complexity in Economics: Cutting Edge Research, pp. 49–62. Springer International Publishing (2014)
6. Ferrie, W.J.: Navigating the competitive landscape: the drivers and consequences of competitive aggressiveness. Acad. Manag. J. 44(4), 858–877 (2001)
7. Grisoni, L., Beeby, M.: Leadership, gender and sense-making. Gender Work Organ. 14(3), 191–209 (2007)
8. Hambrick, D.C., Mason, P.A.: Upper echelons: the organization as a reflection of its top managers. Acad. Manag. Rev. 9(2), 193–206 (1984)
9. Hatcher, C.: Refashioning a passionate manager: gender at work. Gender Work Organ. 10(4), 391–412 (2003)
10. Hicks, D., Libaers, D., Porter, L., Schoeneck, D.: Identification of the technology commercialisation strategies of high-tech small firms. Small Bus. Res. Summ. (2006)
11. Hjgaard, L.: Tracing differentiation in gendered leadership: an analysis of differences in gender composition in top management in business, politics and the civil service. Gender Work Organ. 9(1), 15–38 (2002)
12. Liu, Y., Wei, Z., Xie, F.: Do women directors improve firm performance in China? J. Corp. Financ. 28, 169–184 (2014)
13. Mahadeo, J.D., Soobaroyen, T., Hanuman, V.O.: Board composition and financial performance: uncovering the effects of diversity in an emerging economy. J. Bus. Ethics 105(3), 375–388 (2012)
14. Miller, T., Del Carmen Triana, M.: Demographic diversity in the boardroom: mediators of the board diversity-firm performance relationship. J. Manag. Stud. 46(5), 755–786 (2009)
15. Pfefferman, T., Frenkel, M.: The gendered state of business: gender, enterprises and state in Israeli society. Gender Work Organ. 22(6), 535–555
16. Tanev, S., Bailetti, T., Allen, S., Milyakov, H., Durchev, P., Ruskov, P.: How do value co-creation activities relate to the perception of firms' innovativeness? J. Innov. Econ. (1), 131–159 (2011)
Robomanagement TM : Virtualizing the Asset Management Team Through Software Objects Riccardo Donati and Marco Corazza
Abstract In this paper, we describe a Fintech application we have developed in the field of Asset and Portfolio Management. It regards the virtualization of an entire Asset Management Team, including the Asset Manager, the Risk Manager, the Analysts, the Broker, the Compliance and the Auditor. To show in practice the setup process, how our framework works and the achieved results, we use a real case study regarding the active management of an Investment Certificate. We underline the advantages of a virtual investment committee compared to a human one, including cost reduction, full customizability, scalability, human bias elimination, fast response to regime changes and operational risk control. Keywords Fintech · Robo-Manager · Asset Management
1 Introduction The application of technology to Finance, the so-called Fintech, allowed many disruptive innovations such as the virtualization of human traders into Trading Systems, or the virtualization of financial advisors into Robo-Advisors (see for instance [1, 5, 6]). Our application consists in the virtualization of an entire Asset Management Team (AMT), by transforming each involved human subject into a software object. Notice that, to the best of our knowledge, this is one of the first proposals of virtualization of an entire AMT. Before entering into the details, let us see the advantages of this approach. The first point regards costs. At present, there is a strong focus on the costs of investment vehicles, coming from financial regulators and from the investors themselves. R. Donati Redexe, Piazzale G. Giusti 8, 36100 Vicenza, Italy e-mail: [email protected] M. Corazza (B) Department of Economics, Ca' Foscari University of Venice, Sestiere Cannaregio 873, 30121 Venezia, Italy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_30
It is obvious that high costs negatively affect the risk/reward ratio for an investor, compromising the effectiveness of an investment strategy. Our virtual AMT, being actually a software solution, costs a fraction of a human AMT, allowing more balanced cost structures. Another relevant advantage is the full back-testing capability, which of course is not possible in a human committee. Every new investment strategy, or parameter setting, can be back-tested and verified offline, without the psychological pressure coming from the financial markets. Linked to this, there is the elimination of the psychological biases that typically afflict an asset management committee. Finally, a human committee can manage just a few portfolios at a time, while a software framework is easily scalable and is straightforward to interface with data providers and brokers, limiting operational risks. Now, let us get into details, presenting the setup and the management of a real-case investment vehicle: an Investment Certificate managed by one of the customers of Redexe, Egonon SA (Switzerland), with our Robomanagement TM framework. The preliminary phase consists in the setup of the investment vehicle. First of all, the investment philosophy has to be identified. For the aforementioned Certificate, such a vehicle aims to achieve a middle/long term return above inflation through an exposure to the real economy growth, limiting the downside risk properly using portfolio diversification and an active management style. Given this guideline, the client coherently decided to invest in a widely diversified global macro investment universe, ETFs and Futures, listed on regulated markets, allowing long positions only and keeping the financial leverage below 1. A benchmark has been set according to the previous points. It was decided to be "Indice Fideuram Flessibili Netto" (IFFN), an index representing the flexible funds available in Italy. Finally, costs are a major parameter, which directly enters into the model. The Total Expense Ratio of the Certificate is set to be less than 1%. Once these general requirements have been defined, all the other parameters are fine-tuned through back-testing, as we will see in the next section.
2 The Software Objects and Some Results In this section, we analyze all the software objects in the virtual committee and the relationships among them (see Fig. 1). The first software object we present is TimeObject. It is not uncommon to see trading and investing platforms mixing past with future, in- and out-of-sample data, so we decided to create an object that is in charge of isolating each investment committee, cropping out all the information not available before the virtual committee date. In this way, time consistency is not delegated to objects developed for other tasks, which would expose the process to errors, but is guaranteed by a dedicated object. For the Certificate analyzed in this paper, we planned a weekly virtual committee every Monday. The time window width is set to 2 years backward, starting from the previous Friday, and
Fig. 1 Robomanagement TM software objects and the relationships among them
the orders are sent to the RoboBroker the day after the committee. The TimeObject prepares all the data according to this plan. RoboManager is the object in charge of defining the final asset allocation. It considers many different candidate portfolios. Each of the latter is given to all the other objects, collecting their responses. Then, a suitable function, f_RoboManager, is used to evaluate such responses and to assign a rank to any candidate portfolio. In an iterative process, f_RoboManager is maximized and the optimal portfolio is so detected. At the end of this section, we will discuss some further details. RoboRiskManager is the object in charge of measuring the risk of the portfolio and verifying the risk exposure constraints. For our Certificate, the customer decided to limit the maximum historical standard deviation of return to 4.5% yy and to use the Redexe proprietary risk measure RedES TM to estimate the portfolio risk. Let us open a brief parenthesis on such a risk measure. At present, a large part of the financial industry relies on statistics to measure risk and to optimize portfolios which are based on the assumption of normally distributed log-returns. As known, these statistics are often not able to appropriately describe the distribution of financial returns, in particular the tails. In a human committee, the inaccuracy coming from an incorrect risk measure is compensated by the intuition and the experience of the asset manager. Unfortunately, in a quantitative model, it is not easy to simulate such a "human feel" and it is necessary to use more precise and coherent risk measures. So, we developed RedES TM, a risk measure built using Pareto-Lévy Stable distributions and data clustering. All the details can be found in [2, 3]. There are two main reasons to focus on Pareto-Lévy Stable statistics to describe financial log-returns: time scaling invariance and the generalized central limit theorem (see for details [4]). For the purposes of this paper, we just recall that such statistics are able to correctly fit financial data, especially in the tails (see for example Fig. 2), and that our risk measure is able to give a good estimation of risk even when only short time series are available.
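A schematic of the committee loop just described is sketched below: RoboManager proposes candidate weight vectors, queries the other objects and keeps the best feasible candidate according to a scoring function in the spirit of the basic f_RoboManager example given later in the text. The random search over candidates, the class interfaces, the stand-in volatility risk measure and the fee level are illustrative; the production framework and RedES TM are not reproduced.

```python
import numpy as np

class RoboRiskManager:
    def __init__(self, returns): self.returns = returns
    def risk(self, w):                                  # annualized volatility as a stand-in risk measure
        return float(np.std(self.returns @ w) * np.sqrt(52))

class RoboCompliance:
    def feasible(self, w):                              # long-only, fully invested, no leverage
        return w.min() >= 0 and abs(w.sum() - 1) < 1e-9

class RoboBroker:
    def costs(self, w, w_prev, fee=0.001):              # turnover-proportional transaction costs
        return float(fee * np.abs(w - w_prev).sum())

def robo_manager(expected_ret, risk_mgr, compliance, broker, w_prev,
                 n_candidates=5000, seed=0):
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        w = rng.dirichlet(np.ones(len(expected_ret)))   # candidate long-only portfolio
        if not compliance.feasible(w):
            continue
        score = (expected_ret @ w - broker.costs(w, w_prev)) / risk_mgr.risk(w)
        if score > best_score:
            best, best_score = w, score
    return best

rng = np.random.default_rng(1)
weekly_returns = rng.standard_normal((104, 6)) * 0.02   # two years of weekly data, 6 assets
mu = weekly_returns.mean(axis=0) * 52
w0 = np.full(6, 1 / 6)
print(robo_manager(mu, RoboRiskManager(weekly_returns), RoboCompliance(), RoboBroker(), w0))
```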
Fig. 2 Plot of the daily returns distribution of the S&P 500 index, including dividends, from Jan. 3, 2000 to Feb. 24, 2020. The empirical distribution of the log returns is plotted in gray, the dotted line is the best Gaussian fitting and the continuous line is the best Pareto-Lévy Stable fitting
RoboAnalyst is in charge of estimating the expected return of the financial assets to select. The client may freely set up RoboAnalyst using technical indicators, fundamental indicators or any quantitative combination of them. In the real case which is studied, the customer decided to use a simple trend-following model based on two exponential moving averages, AnalystPosition.1 The latter takes value 1 if the faster moving average is above the slower one, 0 otherwise. The lengths of the two moving averages are determined before every virtual committee for each financial asset. To accomplish this, RoboAnalyst sets up an associated trading system which is long of one unit of the analysed financial asset when AnalystPosition = 1 and remains flat when AnalystPosition = 0. The equity line of this associated trading system is assessed by an appropriate utility function (in our case, the Sharpe ratio) which is calculated for each point in the plane (Length_fast_moving_average, Length_slow_moving_average) (see for example Fig. 3).2 The final point is selected according to a procedure that aims not only to maximize the utility function, but also the width of the nearby area having enough points of maximum. RoboCompliance checks all the investment constraints, including the minimum and maximum exposure for each asset class and the bounds for aggregated asset classes, returning to RoboManager a boolean response: feasible/not feasible. For example, in the investigated Certificate, the client decided to invest no more than 25% in developed equity markets and no more than 15% in other equity markets.
The exponential moving average is a well-known technical analysis indicator that provides a suitably weighted average of the prices of a given asset. 2 The expected performance from each financial asset is AnalystPosition · r , where r is the historical average return of the financial asset itself.
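A sketch of this RoboAnalyst calibration step is shown below: for each pair of EMA lengths, the associated long/flat trading system is simulated and its Sharpe ratio recorded, after which the best-scoring pair is retained. The plain Sharpe maximisation and the synthetic price series are simplifications; the robust selection of the maximum (based on the width of the nearby area of maxima) is not reproduced.

```python
import numpy as np
import pandas as pd

def analyst_position(prices, fast, slow):
    """1 when the fast EMA is above the slow EMA, 0 otherwise."""
    ema_fast = prices.ewm(span=fast, adjust=False).mean()
    ema_slow = prices.ewm(span=slow, adjust=False).mean()
    return (ema_fast > ema_slow).astype(float)

def sharpe_of_system(prices, fast, slow):
    """Annualized Sharpe ratio of the associated long/flat trading system."""
    rets = prices.pct_change().fillna(0.0)
    pos = analyst_position(prices, fast, slow).shift(1).fillna(0.0)   # trade on the next bar
    strat = pos * rets
    return strat.mean() / strat.std() * np.sqrt(252) if strat.std() > 0 else -np.inf

rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.standard_normal(500) * 0.01)))

grid = {(f, s): sharpe_of_system(prices, f, s)
        for f in range(5, 50, 5) for s in range(20, 200, 20) if f < s}
best = max(grid, key=grid.get)
print(best, round(grid[best], 2))
```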
Fig. 3 Regarding the MSCI EURO Small Cap Index, we report the Sharpe ratio of the associated trading system in the plane (Length_ f ast_moving_average, Length_slow_moving_average). The white points are points of maximum and the black point is the selected point of maximum
RoboBroker simulates the real investment, including management and performance fees, high water mark lines, transaction costs, spread and slippage. In the studied real case, the parameters are 90 BPS for management fees3, 10% above Eonia plus 200 BPS for performance fees, and 10 BPS of total transaction costs every 100% turnover. RoboBroker returns to RoboManager the impact of all these costs on the candidate portfolio. RoboAuditor works "ex post". It takes the simulated NAV and compares it to a risk cone that has to be defined "a priori" with the customer. For the Certificate, it was decided to use the Pareto-Lévy Stable statistics and its time scaling law. Parameters were set given the target return of 3.5% yy and the target RedES TM 99% 1W of 3.5%. The stability exponent was set to 1.7, the cluster average for the flexible funds, and it is supposed that the distribution is symmetric. The risk cone can be plotted in a chart we call RedCONE TM (see Fig. 4). This tool is useful not only to monitor ex post the management results, but also to represent ex ante an agreement with the investor on risk, on potential return over time and on the best investment time horizon. RoboAuditor returns to RoboManager the quantile reached by the simulated NAV in the risk cone. Now, let us go back to RoboManager: it considers many different candidate portfolios, collecting the responses from the other objects and calculating f_RoboManager. This function is specified according to customer directives. A basic example of such a function may be f_RoboManager = (PortfolioExpectedReturn_RoboAnalysts − Costs_RoboCompliance − r_f) / PortfolioRisk_RoboRiskManager, where r_f is the risk-free rate of return. Of course, this function can be more complex and include
3 BPS stands for "basis points".
Fig. 4 Pareto-Lévy Stable risk cone and NAV. We plot a continuous line on quantile 0.5, dotted lines on quantiles 0.15, 0.3, 0.4, 0.6, 0.7 and 0.85, and solid dark gray areas between quantiles [0.01, 0.05] and [0.95, 0.99]
quantities that have to be set with the client. We cannot disclose the details for the Certificate, to protect the customer's decisions and settings. In the next table we summarize the results achieved by our Robomanagement TM for the Certificate. The back-test starts from Jan. 6, 2014 and it has been operationally implemented in the Certificate since Sept. 1, 2019.

Reward or risk measure              Gross NAV (%)   Net NAV (%)   IFFN (%)
Average return yy                   5.18            4.05          1.45
Standard deviation of return yy     3.49            3.47          3.53
We conclude by noting that, for this kind of application, the usage of software objects is much more significant than a simple programming trick, allowing us to better define, isolate and virtualize the relevant human skills.
Numerical Stability of Optimal Mean Variance Portfolios Claudia Fassino, Maria-Laura Torrente, and Pierpaolo Uberti
Abstract In this paper we study the numerical stability of the closed form solution of the classical portfolio optimization problem. Such explicit solution relies upon a technical symmetric square matrix A, whose dimension is 2 independently from the number of assets considered in the portfolio. We observe that the computation of A involves the problem’s possible sources of instability, which are essentially related to the estimation and the inversion of the covariance matrix and to the relative scaling of the problem constraints equations, namely the budget and the portfolio expected return constraints respectively. We propose a theoretical approach to minimize the condition number of A by rescaling both its rows and columns with an optimal constant. Using this result, we substitute the original ill-posed optimization problem with an equivalent well-posed formulation of it. Finally, through a simple empirical example, we illustrate the validity of the proposed approach showing considerable improvements in the stability of the closed form solution. Keywords Asset allocation · Portfolio optimization · Numerical stability · Matrix conditioning
1 Introduction The mean-variance portfolio model proposed by Harry Markowitz [10] is the starting point of modern portfolio theory; its high popularity relies upon the simplicity of its C. Fassino Dipartimento di Matematica, Università degli Studi di Genova, Via Dodecaneso 35, 16146 Genova, Italy e-mail: [email protected] M.-L. Torrente (B) · P. Uberti Dipartimento di Economia, Università degli Studi di Genova, Via Vivaldi 5, 16126 Genova, Italy e-mail: [email protected] P. Uberti e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_31
interpretation. Despite this, the mean-variance efficient portfolios are characterized by a strongly numerical instability and, consequently, they are difficult to implement in practice. Infinitesimal variations in the model’s inputs, the covariance matrix and the vector of expected returns, may dramatically change the composition of the optimal portfolio. Over the years, the reasons of this instability have been extensively studied in the literature and ascribed to different causes. The model’s input parameters are unknown; consequently, they need to be estimated leading to computational inaccuracy and uncertainty, so that the solved problem could be very different from the original one. In other words, the in-sample mean-variance frontier is a biased estimator of the real efficient frontier, see Kan and Smith [6]. Many authors, see for instance Best and Grauer [1] and Kan and Zhou [7], consider the estimation uncertainty as the principal cause of instability in the model. Others relate the instability to the number of assets in the portfolio and, as a consequence, to the huge number of parameters to be estimated, see Pflug et al. [11]. Moreover, the solution of the optimization problem depends on the inverse of the covariance matrix; when the returns of the assets in the portfolio are almost collinear, the covariance matrix is numerically close to a singular matrix. From this point of view, in order to improve the numerical stability of the model, a bunch of alternative proposals have been discussed in the literature: among the others we recall the Bayesian approach, see Frost and Savarino [5], the shrinkage approach, see Ledoit and Wolf [9], robust optimization techniques, see Kim et al. [8], and Lasso techniques, see Brodie et al. [2]. In this paper we relate the possible sources of numerical instability of the classical portfolio optimization problem to the estimation and the inversion of the covariance matrix and to the relative scaling of the problem constraints equations, namely the budget and the portfolio expected return constraints. We observe that the closed formula for the problem solution relies upon a technical symmetric square matrix A, whose dimension is 2 independently from the number of assets considered in the portfolio (indeed the dimension 2 is the number of linear constraints of the optimization model). Despite its simplicity, the technical matrix A depends both on the covariance matrix and the problem’s constraints; thus, it incorporates the sources of instability of the problem which eventually reveal themselves in the bad conditioning of A. To this aim, we study the 2-norm condition number of A and, in particular, we propose a theoretical approach to minimize it by rescaling both the rows and the columns A by an optimal constant. We highlight that, in the general case, minimizing the scaled 2-norm condition number has been an open question for long time and that, up to now, different approaches have been proposed to solve this problem numerically, whereas an explicit formula is not available (on other hand this kind of problem has been completely solved for other cases—like the 1-norm and the ∞norm; we refer for instance to [3] for a deeper analysis on this topic). In the case in which the ill-conditioning of A exclusively depends on the bad relative scaling of the optimization problem linear constraints, we propose to replace the original problem with an equivalent well-conditioned expression of it. 
On the other hand, when the ill-conditioning (also) depends on some collinearities in the covariance matrix, the proposed rescaled problem is not equivalent to the original one. In this case, through
a simple empirical example, we show the potentially large room for improvement offered by the solution of the well-conditioned problem.
2 Condition Number Optimization
The standard way to evaluate the numerical stability of a problem is by computing its condition number. In particular, when linear systems are involved, the condition number is classically used to quantify the sensitivity of the solution. In the following definition, we recall the notion of condition number of nonsingular square matrices.
Definition 1 Let A ∈ Mat_n(R) be a nonsingular matrix. The (2-norm) condition number of A, denoted by k(A), is defined by k(A) = ||A||_2 ||A^{-1}||_2 = σ_1(A)/σ_n(A), where ||·||_2 is the classical Euclidean norm and σ_1(A) ≥ … ≥ σ_n(A) > 0 are the singular values of A listed, as usual, in non-increasing order.
In general, the problem of minimizing the 2-norm condition number of a given matrix by row and/or column scaling operations has been solved numerically under different approaches, whereas an explicit formula is not available (see for instance [3]). In the following proposition we solve such an optimization problem for the special case of 2 × 2 Hermitian positive definite matrices.
Proposition 1 Let A = (a_ij) ∈ Mat_2(R) be a Hermitian positive definite matrix and let D = diag(d, 1) ∈ Mat_2(R) be a nonsingular diagonal matrix, with d > 0. The minimum of the problem min_{d>0} k(DAD) is attained at d̃ = √(a_22/a_11), so that

min_{d>0} k(DAD) = k(D̃AD̃) = ( √(a_11·a_22) + |a_12| ) / ( √(a_11·a_22) − |a_12| ),

where D̃ = diag(√(a_22/a_11), 1).
Proof The proof, which is essentially based on the computation of the singular values of a 2 × 2 matrix, has been omitted for brevity.
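The following snippet checks Proposition 1 numerically on an arbitrary ill-scaled 2 × 2 symmetric positive definite matrix; the example matrix is ours, not taken from the paper.

```python
# Numerical check of Proposition 1: for a 2x2 symmetric positive definite A,
# the diagonal rescaling D = diag(d, 1) with d = sqrt(a22/a11) minimizes k(DAD).
import numpy as np

def cond2(M):
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

A = np.array([[4.0, 0.3],
              [0.3, 0.04]])          # an arbitrary ill-scaled SPD example
d_opt = np.sqrt(A[1, 1] / A[0, 0])
D = np.diag([d_opt, 1.0])

k_min = (np.sqrt(A[0, 0] * A[1, 1]) + abs(A[0, 1])) / \
        (np.sqrt(A[0, 0] * A[1, 1]) - abs(A[0, 1]))
print(cond2(A), cond2(D @ A @ D), k_min)   # the last two values coincide
```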
3 Robust Portfolio Selection
We use the following notation: let x_i, with i = 1…n, be the weight of the investor's wealth allocated to the ith asset in the portfolio and let x = (x_1, …, x_n)^t be the n-column vector of the weights; the sum of the weights is equal to 1, that is Σ_{i=1}^{n} x_i = 1. Let R_i, with i = 1…n, be the expected return of the ith asset and let R = (R_1, …, R_n)^t be the n-column vector of expected returns. We assume to work with no risk-less asset and that not all the elements of R are equal. Let V be the covariance matrix of the assets which is, by hypothesis, a nonsingular matrix; since V is a covariance matrix, it follows that V is Hermitian and positive definite. Finally, let p be a portfolio with
variance σ_p^2 = x^t·V·x and expected return R_p = x^t·R. In this framework, the portfolio selection problem (we refer to the classical formulation of the problem, see Constantinides and Malliaris [4], Markowitz [10]) is:

minimize σ_p^2 = x^t·V·x   subject to   x^t·1 = 1 and x^t·R = R_p,    (1)

where 1 = (1, …, 1)^t denotes the n-column vector of ones. Since V is positive definite and the linear constraints define a convex set, it follows that problem (1) has the unique solution (see Constantinides and Malliaris [4] for further details) that can be expressed in closed form as:

x = V^{-1} [R 1] A^{-1} (R_p, 1)^t,    (2)

where A ∈ Mat_2(R) is the Hermitian positive definite matrix A = [R 1]^t V^{-1} [R 1]. The matrix A plays a fundamental role in the closed form solution of the portfolio optimization problem. In particular, its numerical instability, which may heavily affect the reliability of the solution x, can be heavily reduced by simply left- and right-multiplying A by a suitable non-singular diagonal matrix D = diag(d, 1), with d > 0. Note that this operation can be thought of as a scalar multiplication of the asset returns by the factor d, that is R^d = d·R and R_p^d = d·R_p. In problem (1) the constraint x^t·R = R_p must then be replaced by x^t·R^d = R_p^d, which is the same as considering the following equivalent version of problem (1):

minimize σ_p^2 = x^t·V·x   subject to   x^t·1 = 1 and x^t·R^d = R_p^d,    (3)

whose unique solution is

x^d = V^{-1} [R^d 1] (A^d)^{-1} (R_p^d, 1)^t,    (4)

where A^d ∈ Mat_2(R) is the Hermitian positive definite matrix A^d = [R^d 1]^t V^{-1} [R^d 1]. In the following result we show that the minimum condition number of A^d can be simply obtained by applying Proposition 1 to A.

Proposition 2 In the framework of the portfolio selection problems (1) and (3), let A and A^d ∈ Mat_2(R) be the Hermitian positive definite matrices defined above. The minimization problem min_{d>0} k(A^d) has the unique solution d̃ = √( 1^t V^{-1} 1 / R^t V^{-1} R ), so that

min_{d>0} k(A^d) = k(A^d̃) = 1 + 2|R^t V^{-1} 1| / ( √((R^t V^{-1} R)(1^t V^{-1} 1)) − |R^t V^{-1} 1| ).

It is immediate to observe that the only possibility to attain the minimum condition number 1 is when A is diagonal, that is when R^t V^{-1} 1 = 0.
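A compact implementation of the closed-form solutions (2) and (4) and of the optimal rescaling constant of Proposition 2 could look as follows; it is a sketch in floating-point arithmetic, with function names chosen by us.

```python
# Closed-form mean-variance solution with optional rescaling of the returns by d.
import numpy as np

def optimal_portfolio(V, R, Rp, d=1.0):
    ones = np.ones(len(R))
    B = np.column_stack([d * R, ones])       # n x 2 matrix [R^d  1]
    Vinv = np.linalg.inv(V)
    A = B.T @ Vinv @ B                       # 2 x 2 technical matrix A^d
    x = Vinv @ B @ np.linalg.solve(A, np.array([d * Rp, 1.0]))
    return x, np.linalg.cond(A)

def optimal_d(V, R):
    # d of Proposition 2: sqrt((1' V^-1 1) / (R' V^-1 R))
    Vinv = np.linalg.inv(V)
    ones = np.ones(len(R))
    return np.sqrt((ones @ Vinv @ ones) / (R @ Vinv @ R))
```

Calling optimal_portfolio(V, R, Rp, d=optimal_d(V, R)) then reproduces the rescaled, better-conditioned problem (3)-(4).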
4 Computational Results
In this section we consider a classical example of mean-variance problem and compare the solution of the original problem (1), computed using the explicit formula recalled in (2), with the solutions of the equivalent problem (3) obtained by formula (4) using the optimal constant d̃ given in Proposition 2 and the suboptimal constant c given in [12], Proposition 3.
First example: we consider a 3-asset portfolio; the returns are randomly generated such that the returns of the third asset are almost collinear to the ones of the first two assets, in order to simulate a case in which the covariance matrix is ill-conditioned and close to singularity. The input data, i.e. the covariance matrix V and the vector R of expected returns, for the portfolio problem are:

V = 10^{-3} · [ 0.0550  0.0594  −0.0044 ; 0.0594  0.1051  −0.0457 ; −0.0044  −0.0457  0.0413 ],   R = 10^{-3} · (0.5145, 0.4495, 0.0649)^t.

The condition numbers of V and A are k(V) = 1.4·10^{15} and k(A) = 5.7·10^{20}. Let R_p = 0.0004 be the portfolio return. The unique solution of the portfolio selection problem (1) is x_0 = (0.7130, 0.0411, 0.2239)^t (note that MatLab returns it together with a warning message stating that this result may be inaccurate¹). If we rescale the problem by the suboptimal constant c = 2.5237·10^3, then A^c has condition number k(A^c) = 8.8·10^{13} and the solution is x_c = (0.6723, 0.0827, 0.2587)^t. On the other hand, if we rescale the problem by the optimal constant d̃ = 2.3270·10^{10}, then k(A^d̃) = 1.6 and the solution is x_d̃ = (0.6926, 0.0610, 0.2464)^t. By a comparison among x_0, x_c and x_d̃ we note that x_d̃ is the only solution that satisfies the constraint x^t·1 = 1 and yields the minimum value of the objective quadratic form x^t·V·x:

x_0^t·V·x_0 = 3.1440·10^{-5}    x_0^t·1 = 0.9779    x_0^t·R − R_p = −1.7178·10^{-7}
x_c^t·V·x_c = 3.1460·10^{-5}    x_c^t·1 = 1.0137    x_c^t·R − R_p = −1.3669·10^{-7}
x_d̃^t·V·x_d̃ = 3.1422·10^{-5}   x_d̃^t·1 = 1        x_d̃^t·R − R_p = −2.5220·10^{-7}

Consequently, x_d̃ is in our opinion the best approximation of the unique solution of problem (1); the componentwise relative errors of the solutions x_0 and x_c with respect to x_d̃ are approximately 3% and 6% respectively.
Second example: we generalize the previous example to the case of N-asset portfolios, with N ∈ {5, 10, 20, 50, 100}: for each value of N, the first N − 1 returns of 1000 matrices are independently randomly generated, whereas the last asset is constructed so that it is almost collinear to the previous ones. We refer to Table 1, which contains the averages of the most important variables.
1 MatLab displays a warning message when the matrix involved in the computations is badly scaled or nearly singular, but performs the calculation regardless.
Table 1 Data of the second example

 N     k(A)      k(A^c)    k(A^d̃)   V(x_0)    V(x_c)    V(x_d̃)   E_0    E_c
 5     6·10^23   1·10^16   2.1      9·10^-7   3·10^-5   3·10^-7   2.03   1.92
 10    8·10^22   6·10^15   1.5      2·10^-7   2·10^-4   8·10^-7   0.97   1.44
 20    3·10^22   4·10^15   1.4      2·10^-7   7·10^-7   3·10^-7   0.74   1.00
 50    2·10^22   4·10^14   1.1      7·10^-8   5·10^-8   8·10^-9   0.63   0.59
 100   9·10^21   1·10^15   1.0      1·10^-8   1·10^-6   6·10^-9   0.52   0.73
The magnitude of the condition numbers k(A), k(A^c) and k(A^d̃) does not depend on the number N of assets, whereas it systematically decreases using the rescaling factors c and d̃. The minimum of the objective function V(x) = x^t·V·x assumes comparable or smaller values at x_d̃ with respect to x_0 and x_c. The componentwise relative errors E_0 and E_c of x_0 and x_c with respect to x_d̃ are, on average, very large and denote a significant variation in the solutions. In particular, the values greater than 1 highlight the presence of possibly unreliable solutions due to the occurrence of matrix singularities.
5 Conclusions
In this paper we identify one of the possible sources of numerical instability of the solution of the portfolio selection model. The literature on the topic focuses on the estimation of the covariance matrix, neglecting the central role of the matrix A. We provide the optimal rescaling for the matrix A and substitute the original ill-conditioned problem with an equivalent problem that is shown to be better conditioned. The numerical experiments confirm that the proposed optimal rescaling makes it possible to manage the numerical ill-conditioning of the original problem, resulting in a more stable optimal portfolio. Moreover, as shown in the example, the results do not depend on the size of the portfolio. Our approach is of potential interest both for academics and for practitioners.
References 1. Best, M.J., Grauer, R.R.: On the sensitivity of mean- variance-efficient portfolios to changes in asset means: some analytical and computational results. Rev. Financ. Stud. 4(2), 315–342 (1991) 2. Brodie, J., Daubechies, I., De Mol, C., Giannone, D., Loris, I.: Sparse and stable Markowitz portfolios. PNAS 106(30), 12267–12272 (2009) 3. Braatz, R.F., Morari, M.: Minimizing the euclidean condition number. SIAM J. Control Optim. 32(6), 1763–1768 (1994) 4. Constantinides, G.M., Malliaris, A.G.: Portfolio theory. In: Jarrow, R., et al. (eds.) Handbooks in OR & MS, Vol. 9 (1995)
5. Frost, P.A., Savarino, J.E.: An empirical Bayes approach to efficient portfolio selection. J. Financ. Quantit. Anal. 21(3), 293–305 (1986) 6. Kan, R., Smith, D.R.: The distribution of the sample minimum-variance frontier. Manag. Sci. 54(7), 1364–1380 (2008) 7. Kan, R., Zhou, G.: Optimal portfolio choice with parameter uncertainty. J. Financ. Quant. Anal. 42(3), 621–656 (2007) 8. Kim, J.H., Kim, W.C., Fabozzi, F.J.: Recent developments in robust portfolios with a worst-case approach. J. Optimiz. Theory App. 161(1), 103–121 (2014) 9. Ledoit, O., Wolf, M.: Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J. Empir. Finance 10(5), 603–621 (2003) 10. Markowitz, H.: Portfolio selection. J. Financ. 7(1), 77–91 (1952) 11. Pflug, GCh., Pichler, A., Wozabal, D.: The 1/N investment strategy is optimal under high model ambiguity. J. Bank. Financ. 36, 410–417 (2012) 12. Torrente, M., Uberti, P.: A rescaling technique to improve numerical stability of portfolio optimization problems, submitted
Pairs-Trading Strategies with Recurrent Neural Networks Market Predictions Andrea Flori and Daniele Regoli
Abstract We propose a deep learning approach to complement investor practices for identifying pairs trading opportunities. We refer to the reversal effect, empirically observed in many pairs of financial assets, consisting in the fact that temporary market deviations are likely to correct and finally converge again, thereby generating profits. Our study proposes the use of Long Short-term Memory Networks (LSTM) to generate predictions on market performances of a large sample of stocks. We note that pairs trading strategies including such predictions can help to improve the performances of portfolios created according to gaps in either prices or returns. Keywords LSTM · Pairs trading · Reversal effect
1 Introduction The reversal effect builds on the observation that stocks that have recently underperformed will probably have a larger market reversal in the future [8, 19, 22]. Past market performances may thus affect the investment attitudes of the investors and poorly performing stocks might experience higher reversals since they become, for example, more volatile thus impacting on the provision of liquidity, or they might suffer from certain market behaviors such as fire sales that can cause excessive drops in their market prices but foster their subsequent rebounds [4, 5, 21]. In efficient markets, returns are memoryless stochastic processes and prices react as soon as new information becomes available, thus preventing investors from exploitA. Flori Department of Management, Economics and Industrial Engineering, Politecnico di Milano, Via Lambruschini 4/B, 20156 Milano, Italy e-mail: [email protected] D. Regoli (B) Big Data Lab, Intesa Sanpaolo S.p.A.,, Corso Inghilterra 3, 10138 Torino, Italy © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_32
ing past information to predict future market dynamics [9]. The investigation of the reversal effect refers to the literature on market anomalies and on the detection of predictive signals. Hence, reversal effect has been investigated as a market anomaly that can be exploited to build investment strategies (see, e.g., in [1, 2, 6]). Typically, this type of market pattern has been studied and tested within the framework of pairs trading, which refers to the apparent profitability of a strategy in which pairs of stocks with similar historical performances start to exhibit diverging market patterns. Investors may try to exploit these short-term relative mispricings by buying the underperforming stocks and selling the overperforming ones. Pairs trading represents, therefore, a zero-cost framework to investigate the emergence of reversal effect, stimulating literature to detect such anomaly and construct related investment strategies (see, e.g., [3, 7, 18, 25]). Several approaches have been proposed to extract valuable information from financial series by means of various machine learning techniques (see, e.g., [12, 13, 15, 23]). Following the literature that exploits a large-scale use of deep learning algorithms to detect market patterns, we propose to study the complex non-linear relationships among financial time series by relying on the Long Short-term Memory Networks (LSTM) that we employ to predict the market performances of a large sample of stocks. In line with previous studies (see, e.g., [10, 14, 15]), we refer to stocks composing the S&P 500 index. Our findings indicate that an LSTM trained on past returns and volumes can provide additional information on future stock performances with respect to traditional indicators for pairs trading. We note that pairs trading strategies based on observed gaps in either prices or returns can obtain better portfolio performances once the probability of their future market deviation is computed by the LSTM and included as an additional criterion in the portfolio building process.
2 Methodology
We employ a cointegration approach to support the theoretical framework of identifying pairs trading opportunities as temporary deviations of mean-reverting processes (see, e.g., [16, 20, 24, 25]). We proceed as follows: (i) at the beginning of each year t we consider the adjusted close price series over the past 3 years for each stock S_t^i present in the S&P500 index at that day; (ii) for each possible couple (S_t^i, S_t^j), we test for cointegration using the Engle-Granger two-step method and the Johansen test, considering as cointegrated (S_t^i ~ S_t^j) only those couples that are cointegrated according to both tests. Finally, for each stock S^i and year t, we define its cointegration group CG_t^i as the set of all stocks cointegrated with S^i. Once the cointegration groups CG_t^i are identified for each stock S^i at the beginning of year t, the set of groups {CG_t^i}_i is maintained for the entire year t.
To capture the probability that the return difference of each stock versus its CG, namely Δr, will increase in the near future, we employ the indicator P(Δr).
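A possible implementation of this double screening with standard statsmodels routines is sketched below; the 5% level and the Johansen lag choice are assumptions, since the text does not specify them.

```python
# Pair-screening sketch: keep a couple only if both the Engle-Granger and the
# Johansen tests point to cointegration.
import numpy as np
from statsmodels.tsa.stattools import coint
from statsmodels.tsa.vector_ar.vecm import coint_johansen

def both_tests_cointegrated(p_i, p_j, level=0.05):
    _, eg_pvalue, _ = coint(p_i, p_j)                       # Engle-Granger two-step test
    jo = coint_johansen(np.column_stack([p_i, p_j]),
                        det_order=0, k_ar_diff=1)           # assumed deterministic term and lag
    johansen_ok = jo.lr1[0] > jo.cvt[0, 1]                  # trace stat vs 95% critical value
    return (eg_pvalue < level) and johansen_ok

def cointegration_group(prices: dict, stock: str) -> set:
    # CG_t^i: all stocks found cointegrated with `stock` over the 3-year window
    return {s for s in prices if s != stock
            and both_tests_cointegrated(prices[stock], prices[s])}
```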
We compute P(Δr) via an LSTM, whose target variable y_τ is the up-down movement of Δr within an h-day horizon. Specifically, y_τ = 1 if Δr increases in h days, zero otherwise. Thus, the LSTM output ŷ_τ is precisely the probability of Δr increasing in an h-day horizon. The predictive framework is organized as follows: we consider windows of 4 years, employing the first 3 years as training set and the last year as test set or live set; then we roll the window yearly, thus determining an out-of-sample time interval of non-overlapping sets. We calibrate parameters during the training set and then we utilize them in the live set to make predictions for the out-of-sample period. Our reference period is from Jan 2000 to Jun 2019. More practically, the input features for the LSTM are multivariate time series. We take sequences roughly 1-year long (τ = 240 days) with: past values of the Δr gap, past trading volumes (V), past returns (r). These features are computed for each stock present in at least one CG, sliding each sequence daily. Namely, for each stock, in the 3-year training window we collect roughly 750 − 240 input sequences of the form:

( Δr_{s−τ+1}^i … Δr_{s−1}^i Δr_s^i ;  V_{s−τ+1}^i … V_{s−1}^i V_s^i ;  r_{s−τ+1}^i … r_{s−1}^i r_s^i ).

For the architecture of the LSTM we build upon [10]; specifically, we choose: an input layer with 3 neurons and input sequences with dimension 240, a single hidden layer with 35 nodes, and a 2-node dense output layer with softmax activation. Given z as the output of a set of neurons, softmax is defined as softmax(z)_i = e^{z_i} / Σ_j e^{z_j}, and ensures that the network output is a set of probabilities. Batch Normalization [17] is applied to rescale the LSTM layer outputs with 2 additional parameters for each node. We rely on early stopping to select the optimal out-of-sample number of epochs: we use the last 10% of training sequences as validation set and we stop the training process when the crossentropy computed on this set does not decrease for 10 consecutive epochs. Crossentropy, or logloss, is defined as − Σ_{i=1}^{N} [ y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i) ] / N, where i = 1, …, N denotes observations (in this case the validation sequences). It is actually equivalent to the negative loglikelihood of the inference problem. The total number of parameters is 5602. Finally, we consider illustrative strategies which equally invest in stocks grouped in percentiles according to selected indicators. We assume a holding period of one day (i.e., h = 1). Returns are linked in time to form series for each percentile portfolio. For each year t, we perform the analysis on the sample of stocks composed of those time series that we have detected as cointegrated with at least one other series at the end of the previous year. We neglect transaction costs.
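A minimal Keras sketch consistent with the architecture described above (240 × 3 input sequences, one LSTM layer with 35 units, Batch Normalization, a 2-unit softmax output, crossentropy loss and early stopping with patience 10) is given below; the optimizer and batch size are not stated in the text and are placeholders.

```python
# LSTM classifier for the up/down movement of the return gap Δr over an h-day horizon.
from tensorflow.keras import layers, models, callbacks

def build_model(seq_len=240, n_features=3, hidden=35):
    model = models.Sequential([
        layers.LSTM(hidden, input_shape=(seq_len, n_features)),
        layers.BatchNormalization(),                  # 2 trainable parameters per node
        layers.Dense(2, activation="softmax"),        # probabilities of (gap down, gap up)
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model

early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.10, callbacks=[early_stop], ...)
```

With these choices the trainable parameter count is 5460 + 70 + 72 = 5602, matching the figure reported above.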
3 Results We assess the predictive capacity of our approach by applying one-way procedures which invest in decile portfolios. We consider as sorting criteria several indicators proposed in the literature. The idea of pairs trading based on deviations in terms of prices refers to [11]. We construct the price gap for each stock i first by normalizing all the prices such that they all have price equal to 1 at the beginning of the training period, and then by taking the difference between its normalized price and the average normalized price of its cointegrated peers. We refer to this criterion as Δp. Following [24], we also consider the residuals of an OLS model where we regress the stock price versus the average price of its cointegrated peers (Δβ p). Then, we consider deviations in terms of returns to determine whether stocks that significantly underperform (overperform) their peers are able to show higher (lower) returns in the near future. Following [3], we compute the difference of the returns between each stock i and its cointerated peers (Δr ), and we also consider the variant (Δβ r ). Table 1 reports also results for our proposed measure P(Δr ) and the Top-Bottom strategies which buy the top performer decile portfolios and sell the worst performer decile portfolios, thus constituting zero-cost investment strategies. The ranking indicator Δp produces an almost monotonic pattern of the annualized returns, with the lowest deciles portfolios that systematically overperform the highest ones. The corresponding Top-Bottom strategy generates a consistent and statistically significant performance (i.e., 11.09%). This result is qualitatively similar to the one obtained by the Top-Bottom strategy based on Δβ p (i.e., 8.08%). Hence, sorting stocks according to price gaps with respect to peers seems to support the emergence of pairs trading opportunities. Note how also the indicators Δr and Δβ r present monotonic patterns and show even higher performances of the Top-Bottom strategies (i.e., 16.42% and 20.86% respectively) compared to the analog in terms of price gaps.
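The decile construction and the Top-Bottom leg can be sketched as follows; the data handling (wide date × stock frames, one-day-ahead returns) is our assumption, and the sign of the Top-Bottom leg depends on which end of the ranking is expected to outperform for a given criterion.

```python
# One-way sort: rank stocks on an indicator each day, form equally weighted decile
# portfolios held for h = 1 day, and build the zero-cost Top-Bottom series.
import pandas as pd

def decile_portfolio_returns(indicator: pd.DataFrame, next_returns: pd.DataFrame) -> pd.DataFrame:
    # indicator and next_returns are dates x stocks frames; next_returns holds the 1-day-ahead return
    deciles = indicator.apply(lambda row: pd.qcut(row, 10, labels=False, duplicates="drop"),
                              axis=1) + 1
    out = {}
    for d in range(1, 11):
        out[d] = next_returns.where(deciles == d).mean(axis=1)   # equally weighted decile return
    out["Top-Bottom"] = out[1] - out[10]   # flip the sign for criteria where decile 10 is the "top"
    return pd.DataFrame(out)
```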
Table 1 One-way sorts. Table shows the raw returns (in percentage) obtained by equally investing in stocks composing decile portfolios defined using as sorting criteria those reported in the first column. The holding period is h = 1 day. Newey-West t-statistics are reported in parenthesis. Data are annualized and refer to the period from January 2003 to June 2019

         1        2        3        4        5        6        7        8        9        10       Top-Bottom
Δp       21.79    15.27    13.37    16.60    13.41    12.27    10.06    8.81     10.84    9.55     11.09
         (3.48)   (3.36)   (3.10)   (3.84)   (3.47)   (3.40)   (2.84)   (2.65)   (3.02)   (2.37)   (2.47)
Δβp      17.70    17.19    19.41    15.15    14.41    12.18    12.49    8.20     7.74     8.75     8.08
         (3.49)   (3.68)   (4.20)   (3.62)   (3.61)   (3.17)   (3.34)   (2.45)   (2.12)   (2.27)   (2.48)
Δr       25.50    19.83    17.20    15.55    14.11    10.59    9.27     6.91     7.95     6.40     16.42
         (4.58)   (4.31)   (4.15)   (4.01)   (3.56)   (2.74)   (2.49)   (2.00)   (2.05)   (1.62)   (4.04)
Δβr      26.62    18.08    21.72    13.94    13.61    10.91    8.89     5.88     10.80    3.27     20.86
         (4.57)   (4.21)   (5.10)   (3.69)   (3.28)   (2.89)   (2.55)   (1.72)   (2.54)   (1.16)   (4.53)
P(Δr)    4.04     7.12     8.58     9.25     10.91    16.24    13.32    21.85    18.31    24.57    18.32
         (1.27)   (1.91)   (2.24)   (2.37)   (2.83)   (3.98)   (3.31)   (4.81)   (4.01)   (5.14)   (4.98)
Table 2 One-way sorts risk-adjusted performances. Table shows the Sharpe ratios (in percentage) obtained by equally investing in stocks composing decile portfolios defined using as sorting criteria those reported in the first column. The holding period is h = 1 day. Data are annualized and refer to the period from January 2003 to June 2019

         1        2        3        4        5        6        7        8        9        10       Top-Bottom
Δp       76.82    64.85    62.10    77.72    63.13    59.26    48.99    44.39    54.65    43.22    88.12
Δβp      70.09    70.52    70.36    70.21    58.82    60.26    40.19    37.85    40.60    56.93    87.21
Δr       97.01    61.13    80.91    77.19    69.07    53.04    46.36    33.34    37.10    25.07    94.98
Δβr      95.40    77.76    104.44   69.97    68.71    56.78    45.45    28.95    49.09    12.36    110.27
P(Δr)    17.33    33.07    40.26    43.87    52.23    76.25    61.91    99.99    81.90    106.59   122.74
Hence, stocks with positive and larger deviations of market returns with respect to peers are more likely to underperform the latter in the near future, and viceversa. Finally, our proposed metric P(Δr ) can sort the raw performances of the decile portfolios, generating an economically and statistically significant performance for the Top-Bottom strategy similar to those of the other sorting criteria. The predictions of the LSTM are thus able to systemically generate valuable information that can be exploited to build profitable Top-Bottom strategies. Then, we check whether these indicators for constructing portfolios are also valuable once the risk dimension is taken into account. Table 2 reports for each sorting criterion the corresponding Sharpe ratios. The monotonic patterns observed in Table 1 are present but appear less pronounced. The performances of Top-Bottom strategies are still positive and high, thus confirming the relevant role of these sorting criteria. Nevertheless, we observe that those portfolios based on Δr and Δβ r are more able to generate higher risk-adjusted Top-Bottom performances than those based on Δp or Δβ p. Interestingly, the Top-Bottom strategy based on P(Δr ) results the best performer once risk is taken into account, again supporting the interest in applying deep learning techniques to detect valuable pairs trading opportunities.
4 Conclusions
This work investigates reversal effects through the lens of market anomaly detection in order to extract meaningful signals for future performances. In recent years, deep learning concepts have been widely applied to identify valuable patterns in financial markets. Here we rely on the application of LSTMs. Specifically, we apply LSTMs to study the market performances of a large sample of stocks, testing their predictive performance within the framework of pairs trading strategies. We show that pairs trading strategies based on temporary deviation gaps in either prices or returns can reach even better portfolio performances once the probability of their future market deviation is included as an additional criterion for constructing portfolios.
References 1. Blackburn, D.W., Cakici, N.: Overreaction and the cross-section of returns: international evidence. J. Emp. Financ. 42, 1–14 (2017) 2. Chan, L.K., Jegadeesh, N., Lakonishok, J.: Momentum strategies. J. Financ. 51(5), 1681–1713 (1996) 3. Chen, H., Chen, S., Chen, Z., Li, F.: Empirical investigation of an equity pairs trading strategy. Manag. Sci. 65(1), 370–389 (2017) 4. Cheng, S., Hameed, A., Subrahmanyam, A., Titman, S.: Short-term reversals: the effects of past returns and institutional exits. J. Financ. Quant. Anal. 52(1), 143–173 (2017) 5. Da, Z., Gao, P.: Clientele change, liquidity shock, and the return on financially distressed stocks. J. Financ. Quant. Anal. 45(1), 27–48 (2010) 6. Da, Z., Liu, Q., Schaumburg, E.: A closer look at the short-term return reversal. Manag. Sci. 60(3), 658–674 (2013) 7. Do, B., Faff, R.: Are pairs trading profits robust to trading costs? J. Financ. Res. 35(2), 261–287 (2012) 8. Fama, E.F.: The behavior of stock-market prices. J. Bus. 38(1), 34–105 (1965) 9. Fama, E.F.: Efficient capital markets: a review of theory and empirical work. J. Financ. 25(2), 383–417 (1970) 10. Fischer, T., Krauss, C.: Deep learning with long short-term memory networks for financial market predictions. Eur. J. Oper. Res. 270(2), 654–669 (2018) 11. Gatev, E., Goetzmann, W.N., Rouwenhorst, K.G.: Pairs trading: performance of a relative-value arbitrage rule. Rev. Financ. Stud. 19(3), 797–827 (2006) 12. Gu, S., Kelly, B., Xiu., D.: Empirical asset pricing via machine learning. Technical report, National Bureau of Economic Research (2018) 13. Heaton, J., Polson, N., Witte, J.H.: Deep learning for finance: deep portfolios. Appl. Stoch. Models Bus. Ind. 33(1), 3–12 (2017) 14. Huck, N.: Pairs selection and outranking: an application to the s&p 100 index. Eur. J. Oper. Res. 196(2), 819–825 (2009) 15. Huck, N.: Pairs trading and outranking: the multi-step-ahead forecasting case. Eur. J. Oper. Res. 207(3), 1702–1716 (2010) 16. Huck, N., Afawubo, K.: Pairs trading and selection methods: is cointegration superior? Appl. Econ. 47(6), 599–613 (2015) 17. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift (2015). arXiv:1502.03167 18. Jacobs, H., Weber, M.: On the determinants of pairs trading profitability. J. Financ. Mark. 23, 75–97 (2015) 19. Jegadeesh, N.: Evidence of predictable behavior of security returns. J. Financ. 45(3), 881–898 (1990) 20. Krauss, C.: Statistical arbitrage pairs trading strategies: review and outlook. J. Econ. Surv. 31(2), 513–545 (2017) 21. Lasfer, M.A., Melnik, A., Thomas, D.C.: Short-term reaction of stock markets in stressful circumstances. J. Bank. Financ. 27(10), 1959–1977 (2003) 22. Lehmann, B.N.: Fads, martingales, and market efficiency. Q. J. Econ. 105(1), 1–28 (1990) 23. Patel, J., Shah, S., Thakkar, P., Kotecha, K.: Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Syst. Appl. 42(1), 259–268 (2015) 24. Rad, H., Low, R.K.Y., Faff, R.: The profitability of pairs trading strategies: distance, cointegration and copula methods. Quant. Financ. 16(10), 1541–1558 (2016) 25. Vidyamurthy, G.: Pairs Trading: quantitative Methods and Analysis, Vol. 217. Wiley (2004)
Automatic Balancing Mechanism and Discount Rate: Towards an Optimal Transition to Balance Pay-As-You-Go Pension Scheme Without Intertemporal Dictatorship? Frédéric Gannon, Florence Legros, and Vincent Touzé Abstract The paper deals with the choice of the public discount rate in the framework of dynamic control applied to a specific pension scheme’s automatic balancing mechanism. We introduce a declining discount rate to address the issue of “intertemporal dictatorship”. Assuming such a time-dependent discount rate permits to solve the conflict between present and political needs to procrastinate and the long-run objective of no dictatorship of the present. We use a smooth-ABM and we detail the theoretical properties of this dynamic control problem to tackle properly this issue. Finally, we apply this ABM to the US Social Security and discuss about the sensitivity of the simulated results to the speed of declining of the public discount rate. Keywords Intertemporal dictatorship · Pension schemes · Automatic balancing mechanisms · Intertemporal discount factor
1 Introduction Pay-as-you-go pension systems are subject to regular parametric reforms to guarantee financial balance over a given horizon. The choice of parametric adjustments can be based on an arbitrary choice which depends on the will of the government in place and the urgency of the problem to be solved. Parametric adjustment can also be automatically based on pre-established rules (Automatic Balancing Mechanism) which define both an intertemporal solvency criterion (calculation method, time F. Gannon Sciences Po-OFCE, Université Le Havre, Le Havre, France e-mail: [email protected] F. Legros ICN-Business School, Nancy, France e-mail: [email protected] V. Touzé (B) Sciences Po-OFCE, Paris, France e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_33
horizon) and the way the parameters must evolve (i.e. pension index, contribution rate). In recent literature [1, 2], some models rely on an intertemporal loss function which specifies the nature of the intertemporal tradeoff. Explicitly, they introduce a public discount rate. A preference for the present is then introduced, which implicitly expresses a preference for procrastination and tends to give less weight to future adjustments than to present ones. However, these models can be calibrated with a public discount factor equal to 0 (e.g. [3]), which prevents them from dealing with the infinite time horizon problem. In this paper, we propose to use the concept of “temporal dictatorship” introduced by [4] into the literature on dynamic programming problems. This implies choosing a law of motion of the discount factor which guarantees neither a dictatorship of the present nor a dictatorship of the future. We use the Smooth-ABM developed in [2]. This leads us to study its theoretical properties in order to tackle this issue correctly. We apply it to the US Pension (Social Security) forecasts and describe how the choice of this law of motion of the discount factor can impact the optimal balancing mechanism.
2 The Concept of Intertemporal Dictatorship Chichilnisky in [4] questions the existence of a criterion that would have neither the arbitrary aspect of an expected sum (high weight put on present generation) nor that of the golden rule criterion (only future generations are taken into account) to identify optimal paths for sustainable development. Her criterion is a complete pre-order on the trajectories of the current criterion (welfare function applied to economic problems) {Ut }t=0,...,+∞ . It must never lead to a dictatorship of both the present (axiom 1) and the future (axiom 2) and it increases with the well-being of each generation. The first axiom means that the criterion must be sensitive to the distant future. A well-being function gives a dictatorship of the present if and only if it is sensitive to a finite number of generations when we compare utility trajectories, i.e. it is insensitive to endless generations. Using a positive discount rate induces a dictatorship of the present in the long run. The second axiom means that the criterion is not only sensitive to the distant future. A welfare function gives a dictatorship of the future if it is only sensitive to an infinite number of generations when we compare utility trajectories, i.e. it is insensitive to a finite number of generations. The long-run golden rule criterion induces a dictatorship of the future. Chichilnisky [4] then seeks the class of criteria which simultaneously checks the non-dictatorship of the present and of the future. This “sustainable” preference criterion exists and can be written as an average of the discounted sum of utilities and the long-run utility:
W = γ · Σ_{t=0}^{+∞} Π_t·U_t + (1 − γ) · lim_{t→+∞} U_t
where γ is a “social” weighting, Π_t = β_t·Π_{t−1} is the intertemporal discount factor with Π_0 = 1, and the current discount factor is denoted β_t. For any given period t, the equivalent public discount rate is equal to δ_t = 1/β_t − 1. In our framework, the maximisation of this criterion can only be solved with a “variable discount rate that goes to zero” (see [5]). In this case, the value of γ does not matter. Consequently, we will use a time-dependent discount factor β_t = e^{−α/(t+1)} with α > 0. This time-dependent discount factor induces a declining discount rate, and the parameter α controls the speed of this decline. It simultaneously guarantees the convergence of β to 1 when t → +∞ and the transversality condition, since Π_t = β_t·Π_{t−1} ⇔ log Π_t = Σ_{i=0}^{t} log β_i = −α·Σ_{i=0}^{t} 1/(i+1). By definition of the log function, −α·log(t + 1) < log Π_t < −α·log t. It is then easy to deduce that lim_{t→+∞} log Π_t = −∞ and therefore that lim_{t→+∞} Π_t = 0.
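A short numerical illustration of this law of motion, with α as the only input, is given below.

```python
# beta_t = exp(-alpha/(t+1)) tends to 1, while the cumulated factor Pi_t = prod beta_i
# still goes to 0 (slowly, like t**-alpha), so no single generation gets all the weight.
import numpy as np

alpha = 0.2
t = np.arange(0, 800)
beta = np.exp(-alpha / (t + 1))
Pi = np.cumprod(beta)
delta = 1.0 / beta - 1.0            # equivalent public discount rate, declining towards 0
print(beta[0], beta[-1], Pi[-1])    # beta_0 < 1, beta_t -> 1, Pi_t -> 0
```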
3 ABM, Infinite Horizon and Time-Dependent Discount Factor Now, we apply this time-dependent discount factor to a dynamic programming model involving pension balancing. In the smooth-ABM framework developed in [2], the intertemporal problem is the following:
min_{{A_t, B_t}_{t=1,…,T}} Σ_{t=1}^{T} Π_t·L(A_t, B_t)
subject to  Σ_{t=1}^{T} A_t·REC_t / (Π_{j=1}^{t} R_j) + F_0 = Σ_{t=1}^{T} B_t·EXP_t / (Π_{j=1}^{t} R_j)
where L(A_t, B_t) = (A_t − 1)^2 + (B_t − 1)^2 is the current loss function to minimize, R_t is the financial interest factor, EXP_t are the pension expenditures, REC_t are the contribution receipts and F_0 is the buffer fund in period 0. A_t and B_t are the two deformation coefficients modifying respectively the payroll tax rate (receipts) and pension benefits (expenditures). The optimal values of these coefficients guarantee the financial sustainability of the pension scheme. For an infinite horizon (T → +∞), the first order conditions of the dynamic programming give the following solutions:

(A_{+∞} − 1) = [ Σ_{t=1}^{+∞} EXP_t/(Π_{j=1}^{t} R_j) − Σ_{t=1}^{+∞} REC_t/(Π_{j=1}^{t} R_j) − F_0 ] / [ Σ_{t=1}^{+∞} Π_{i=t+1}^{+∞}(β_i R_i/G_i^{REC}) · ( REC_t/(Π_{j=1}^{t} R_j) + (EXP_t/REC_t)·EXP_t/(Π_{j=1}^{t} R_j) ) ]

(B_{+∞} − 1) = −(EXP_{+∞}/REC_{+∞})·(A_{+∞} − 1)
where G_t^{REC} and G_t^{EXP} denote respectively the growth factor of the receipts and of the expenditures.
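For a finite horizon T, the same programme can be solved directly with a generic equality-constrained optimizer, which is a convenient way to cross-check the closed-form expressions; the sketch below uses placeholder inputs and is not the authors' code.

```python
# Finite-horizon smooth ABM: min sum_t Pi_t[(A_t-1)^2+(B_t-1)^2] subject to the
# single discounted budget constraint.
import numpy as np
from scipy.optimize import minimize

def solve_smooth_abm(REC, EXP, R, beta, F0):
    T = len(REC)
    Pi = np.cumprod(beta)                     # Pi_t = prod_{i<=t} beta_i
    disc = 1.0 / np.cumprod(R)                # 1 / prod_{j<=t} R_j
    def loss(x):
        A, B = x[:T], x[T:]
        return np.sum(Pi * ((A - 1) ** 2 + (B - 1) ** 2))
    def budget(x):
        A, B = x[:T], x[T:]
        return np.sum(A * REC * disc) + F0 - np.sum(B * EXP * disc)
    res = minimize(loss, x0=np.ones(2 * T), method="SLSQP",
                   constraints=[{"type": "eq", "fun": budget}])
    return res.x[:T], res.x[T:]               # optimal A_t and B_t paths
```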
The current optimal solutions are the following in each period t:

(A_t − 1) = (A_{+∞} − 1)·Π_{i=t+1}^{+∞}(β_i R_i/G_i^{REC}),   (B_t − 1) = (B_{+∞} − 1)·Π_{i=t+1}^{+∞}(β_i R_i/G_i^{EXP}).

At the steady state, when R_t = R*, β_t = β* and

REC_{t+1}/REC_t = EXP_{t+1}/EXP_t = G*,    (1)

the solutions satisfy the following property:

(A_{t+1} − 1)/(A_t − 1) = (B_{t+1} − 1)/(B_t − 1) = β*·R*/G*.    (2)
Whenever β*·R*/G* is strictly less than or greater than 1, the deformation coefficients are not stationary. Such a property is not satisfactory because the coefficients could not be valued in the long run. To be consistent with a stationary economy, the golden rule has to be satisfied in the long run: β*R* = G*, which means that the gap between the interest rate and the growth rate is equal to the public discount rate. Hereafter, we will suppose the long-run interest factor is endogenous and equal to R* = G*/β*, and the convergence speed is denoted σ, such that R_{t+1} = R_t^{1−σ}·(R*)^{σ} with σ = 0.01 and R_1 = 1.02. If we consider a scenario without temporal dictatorship (β_t converges to β* = 1), then R* = G*.
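Read as a forward recursion, the assumed interest-factor path can be generated as follows; the forward reading of the partial-adjustment rule is our interpretation.

```python
# Interest factor converging to the golden-rule level R* = G*/beta* with speed sigma,
# starting from R_1 = 1.02.
import numpy as np

def interest_path(G_star, beta_star, sigma=0.01, R1=1.02, T=800):
    R_star = G_star / beta_star
    R = np.empty(T)
    R[0] = R1
    for t in range(1, T):
        R[t] = R[t - 1] ** (1 - sigma) * R_star ** sigma   # geometric partial adjustment
    return R
```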
4 Application to the US PAYG Pension Scheme
We compare ABM with three scenarios without generational dictatorship. Each scenario is described by a dynamic time path of the discount factor β_t = e^{−α/(t+1)}, where α controls the speed of convergence to 1. We consider three values α = {0.1, 0.2, 0.3}. For the expenditures and the receipts, we use the Social Security Administration forecasts (annual data) computed for the OASDI trust fund in 2019. The upper panels in Fig. 1 present our assumptions in terms of the dynamic time paths of the parameter β_t and of the ratio β_t·R_t/G_t (relative gap with respect to the golden rule) in the three scenarios. The lower panels in Fig. 1 present the dynamic time paths of the optimal solutions A_t and B_t of the S-ABM. Table 1 gives the relative gaps computed for the scenarios α = 0.1 and α = 0.3 relative to the scenario α = 0.2. For our specific values of α, our computations show different initial degrees of procrastination: β_0 = 96.7%, 93.5% or 90.5%. The initial adjustments are sensitive to α, with a small range of variations around A_0 = 1.008 and B_0 = 0.991 computed for the benchmark value α = 0.2. These two immediate adjustments induce respectively a 0.8% contribution increase and a 0.9% pension benefit decrease. The long-run convergence to β_{+∞} = 1 induces similar differences in the long-run adjustments with respect to the benchmark solution (A_{+∞} = 1.129 and B_{+∞} = 0.831). If compared to the benchmark scenario, the relative long-run adjustments are inverted with respect
Fig. 1 Upper left: discount factor β_t. Upper right: convergence of β_t·R_t/G_t. Lower left: receipts A_t. Lower right: expenditures B_t
Table 1 Relative comparison of scenarios with respect to the benchmark scenario α = 0.2

Years                          t = 0 (%)   t = 100−200 (%)   t = 500−600 (%)   t = +∞ (%)
Receipts adjustment A_t
  α = 0.1                      0.3         0.5               −0.2              −0.3
  α = 0.3                      −0.2        −0.4              0.2               0.3
Expenditures adjustment B_t
  α = 0.1                      −0.3        −0.7              0.3               0.5
  α = 0.3                      0.2         0.7               −0.3              −0.6
to the first adjustments: the earlier the adjustment (a higher value of α), the lesser the final correction, and vice versa (a lower value of α).
5 Conclusion This paper deals with the concept of intertemporal dictatorship in a dynamic programming framework applied to a problem of optimal Automatic Balancing Mechanism. We identified long-run consistent assumptions to guarantee the golden rule. We set a
sequence of discount factors with an initial strong present preference and a gradual disappearance of that preference over the infinite horizon. Our simulations show that using a declining discount rate makes it possible to reconcile a small immediate adjustment with the substantial need to balance the pension scheme in the long run. Note that this choice might induce a temporal inconsistency. Consequently, at each future period, it may be appropriate for governments in place to systematically postpone this continuous convergence of the discount factor to 1. Then, such a project should require constitutional locks that prohibit any return of the “intertemporal dictatorship”.
References 1. Haberman, S., Zimbidis, A.: An investigation of the pay-as-you-go financing method using a contingency fund and optimal control techniques. N. Am. Actuar. J. 6(2), 60–75 (2002) 2. Gannon, F., Legros, F., Touzé, V.: Sustainability of pensions schemes: building a smooth automatic balance mechanism with an application to the us social security. OFCE Working Paper Series No. 3 (2016) 3. Godínez-Olivares, H., del Carmen Boado-Penas, M., Pantelous, A.A.: How to finance pensions: optimal strategies for pay-as-you-go pension systems. J. Forecast. 35, 13–33 (2016) 4. Chichilnisky, G.: What is sustainable development. Soc. Choice Welf. 2, 231–257 (1996) 5. Heal, G.M.: Valuing the Future: Economic Theory and Sustainability. Columbia University Press, New York (1998)
The Importance of Reporting a Pension System’s Income Statement and Budgeted Variances in a Fair and Sustainable Scheme Anne Marie Garvey, Manuel Ventura-Marco, and Carlos Vidal-Meliá
Abstract This paper deals with the pension system’s income statement and reporting budgeted variances in a fair and sustainable scheme. We highlight the importance of income statement presentation for monitoring the changes of solvency in a notional defined contribution (NDC) pension scheme with disability and minimum pension benefits. Using the annual report of the Swedish pension system as a benchmark (TSPS, 2019), we extend it by adding the changes for disability pensions, the value of change in the discount rate and the explicit recognition of non-contributory rights (NCRs). Our proposal integrates both contributory and social aspects of public pensions and discloses the real cost of the disability contingency and the redistribution
Funding: Manuel Ventura-Marco and Carlos Vidal-Meliá are grateful for the financial assistance received from the Spanish Ministry of the Economy and Competitiveness (Ministerio de Economía y Competitividad) projects ECO2015-65826-P, RTI2018-097087-B-100 and Generalidad Valenciana (Valencian Government), project AICO/2019/075. Carlos Vidal-Meliá also acknowledges the financial support from the Basque Government (Project IT1336-19). Anne M. Garvey is grateful for the financial assistance received from the Spanish Ministry of Science, Innovation and Universities (MICINN) PID2019-105570GB-I00, also from the Autonomous Community of Madrid project CM/JIM/2019-044 and from the University of Alcalá projects CCG19/CCJJ-042 and UAHEV/1090. Acknowledgements: We would like to thank the seminar participants at the Complutense University of Madrid, the University of Alcalá and the Santander (XXVII Jornadas de ASEPUMA—XV Encuentro Internacional) and Peter Hall for his help with the English text. A. M. Garvey (B) Department of Economics and Management Sciences, University of Alcalá, Alcalá de Henares, Madrid, Spain e-mail: [email protected] M. Ventura-Marco · C. Vidal-Meliá Department of Financial Economics and Actuarial Science, University of Valencia, Valencia, Spain e-mail: [email protected] C. Vidal-Meliá e-mail: [email protected] C. Vidal-Meliá Instituto Complutense de Análisis Económico (ICAE), Complutense University of Madrid, Madrid, Spain © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_34
through minimum pensions. The inclusion of information relating to deviations from budget provides useful information to users for future decision making. Keywords Disability insurance · Minimum pensions · Notional defined contribution · Pension accounting · Sweden · True and fair view
1 Introduction The delivery or payment of social benefits to beneficiaries is one of the primary objectives of most governments and accounts for a large proportion of their expenditure. It is important that the financial statements accurately report pension disbursements and any associated liabilities. The financial reporting method used for most social security systems (SS systems) around the world is based exclusively on cash accounting, which is not a suitable accounting framework for the assets and liabilities allocated to pay for scheduled benefits. Reporting should be based mainly on the accrual accounting principle, the cornerstone of good financial management. This chapter is in line with the ongoing debate on how to recognize and measure the assets and liabilities of SS systems. It highlights the importance of income statement presentation for monitoring the changes of solvency in a notional defined contribution (NDC) pension scheme with disability and minimum pension benefits, and builds on previous work carried out on the so-called “Swedish” open group (SOG) approach and on pension schemes combining permanent disability and retirement within an NDC framework [1]. We also propose the inclusion of budgeted and actual information.
2 Importance of Income Statement Presentation and Budget Variance Information from a Public Accounting Perspective In the case of the public sector pension system, the main objective is not that of a profit-making entity and pension and disability policies have multiple implications for society that go beyond the scope of this chapter. This coincides with standard setter objectives for the public sector. The IPSASB [2], for example, sets out the requirements for general purpose financial reports in the public sector, and the public sector pension system is clearly included among the entities covered by their standards. The reporting of an income statement and the presentation of information relating to budgeted and actual amounts also coincide with the objectives of IPSAS 1 as
Table 1 The income statement for an NDC scheme combining retirement and disability contingencies, with an MPB for the period

FUND ASSETS (Changes)                               LIABILITIES (Changes)
  Pension contributions: C                            New pension credit: C
  Sponsor contributions for NCRs: SC                  Recognition of NCRs: R
  Pension disbursements: −PT                          Pension disbursements: −PT
  Net return on funded capital: D                     Indexation: I
CONTRIBUTION ASSET (Changes)                          Value of change in life expectancy: δe
  Value of change in contribution revenue: δC         Value of change in discount rate: δG
  Value of change in turnover duration: δTD           NET GAIN/LOSS
Total debit side                                    Total credit side

Source: Own, based on TSPS [5] and Vidal-Meliá et al. [6]
regards the presentation of financial statements [3] using accrual-based accounting methods.1 IPSAS 1 offers two possibilities for reporting variations. The first one is the inclusion of extra columns alongside the financial statements including the budgeted and actual figures. The second possibility offered is more general, it involves a disclosure that the actual amounts have not exceeded those budgeted.
3 The Income Statement Model
This section briefly presents the main items of an income statement model for an NDC scheme including retirement, disability and minimum pension benefits (MPB) for both contingencies. This statement is an important piece of the full accounting model for monitoring the solvency of an NDC developed by Garvey et al. [4]. The proposed income statement structure is shown in Table 1, in which for simplicity's sake it is assumed that NCRs are totally guaranteed by the sponsor, that the system's administration costs are financed by general taxation and that inheritance gains arising and distributed are perfectly matched within the year. This profit and loss account follows the model published by the Swedish pension system [5], but some modifications need to be introduced in order to adapt it to the specific NDC model proposed. It incorporates an important item that the benchmark model should also have in order to more accurately compute the annual change in the system's net worth. It should be noted that the “Swedish approach” is an adaptive methodology with an annual review process, which means it is not prospective or predictive, and the
1 The value of the system's contribution asset is the product of the system's turnover duration and the value of the NDC system's contributions made in that period for the retirement and disability contingencies. The interested reader can consult the paper by Garvey et al. [4].
assumption that the system's rules and the economic and demographic conditions prevailing at the time of the valuation remain constant is fully coherent. In contrast to the Swedish model, on the right-hand side we have included an item known as the “Value of change in the discount rate (G)”, δG. The discount rate assumption is the most influential actuarial input affecting both funding ratios and contribution requirements [7]. In our model it does not affect the level of contributions but is very important for valuing pensions in payment and to compute the balance ratio. Either way, its effect on the ABS would be dampened by using a 5-year moving average of wage bill growth or some other reasonable form of smoothing. The main practical implication of introducing this item is that it would increase the volatility of the system's results and could trigger the automatic balance more often. Finally, a net income or loss for the year is accounted for on the right-hand side. As can be seen, with the inclusion of items such as recognition of NCRs and sponsor contributions for NCRs, our income statement proposal integrates both contributory and social aspects of public pensions. From an analytical point of view, the change in net worth can be detailed as follows:

δNW_t^S = NW_t^S − NW_{t−1}^S = δA_t^S − δV_t^S = (A_t^S − A_{t−1}^S) − (V_t^S − V_{t−1}^S) = δBF_t^S + δCA_t^S − δV_t^S,    (1)

where the change in assets splits into the change in the fund asset, δBF_t^S, and the change in the contribution asset, δCA_t^S, and δV_t^S is the change in the liabilities.
As depicted in formula (1), the change in net worth can be determined in the simplest way by comparing the system's assets and liabilities on two consecutive valuation dates. The changes in net worth can be broken down into three main items:

1. Change in the fund/financial asset:

$$
\delta BF_t^S = C_t^S + NCRS_t^S - PT_t^S + r_t \cdot BF_{t-1}^S, \qquad (2)
$$

where $C_t^S$ is the income from ordinary contributions, $NCRS_t^S$ is the income from sponsor contributions for NCRs, $PT_t^S$ is total pension disbursements, and $r_t \cdot BF_{t-1}^S$ is the net return on funded capital.
2. Change in the contribution asset:

$$
\delta NDC\,CA_t^S = NDC\,CA_t^S - NDC\,CA_{t-1}^S = \underbrace{\delta C_t^S}_{\text{Revenue effect}} + \underbrace{\delta TD_t^S}_{\text{Turnover duration effect}} = \underbrace{\left(C_t^S - C_{t-1}^S\right)\cdot\frac{TD_t^S + TD_{t-1}^S}{2}}_{\delta C_t^S} + \underbrace{\frac{C_t^S + C_{t-1}^S}{2}\cdot\left(TD_t^S - TD_{t-1}^S\right)}_{\delta TD_t^S}, \qquad (3)
$$
where $TD_t^S$ is the turnover duration for the system; it can be interpreted as the number of years expected to elapse before the committed liabilities with contributors and pensioners of both the retirement and disability contingencies are completely renewed at the current contribution level.

3. Change in the pension liability:

$$
\delta V_t^S = C_t^S + NCRR_t^S - PT_t^S + I_t^S + \underbrace{\delta e_t^S}_{\text{Life expectancy effect}} + \underbrace{\delta G_t^S}_{\text{Discount rate effect}}, \qquad (4)
$$
where $NCRR_t^S$ is the value of newly awarded NCRs, $I_t^S$ is the indexation effect, $\delta e_t^S$ is the value of the change in the liability to pensioners due to changes in life expectancy, and $\delta G_t^S$ is the value of the change in the liability to pensioners due to changes in the discount rate. The value of the change in the liability to pensioners due to updated information regarding the discount rate and life expectancy can be expressed as follows:

$$
\delta e_t^S + \delta G_t^S = \underbrace{Pen(s)_t^{ui}}_{\substack{\text{Liability to pensioners}\\ \text{with updated information}}} - \underbrace{Pen(s)_t^{nui}}_{\substack{\text{Liability to pensioners}\\ \text{with non-updated information}}} = \underbrace{\left(Pen(s)_t^{ui(e)} - Pen(s)_t^{nui}\right)}_{\delta e_t^S} + \underbrace{\left(Pen(s)_t^{ui} - Pen(s)_t^{ui(e)}\right)}_{\delta G_t^S}, \qquad (5)
$$

where $Pen(s)_t^{ui(e)}$ denotes the liability to pensioners with updated information about life expectancy only.
Finally, the income statement could easily be broken down by contingency in formulas (1)-(5), and doing so discloses the real cost of the disability contingency and the redistribution through minimum pensions.
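To make the bookkeeping in formulas (1)-(4) concrete, the following minimal sketch evaluates each item for purely illustrative figures; all amounts, the 3% return and the function names are placeholders introduced here, not values or code from the paper.

```python
# Minimal sketch of the net-worth change decomposition in formulas (1)-(4).
# All figures are illustrative placeholders (arbitrary monetary units).

def fund_asset_change(C, NCRS, PT, r, BF_prev):
    """Eq. (2): change in the fund/financial asset."""
    return C + NCRS - PT + r * BF_prev

def contribution_asset_change(C_t, C_prev, TD_t, TD_prev):
    """Eq. (3): revenue effect plus turnover-duration effect."""
    revenue_effect = (C_t - C_prev) * (TD_t + TD_prev) / 2
    duration_effect = (C_t + C_prev) / 2 * (TD_t - TD_prev)
    return revenue_effect + duration_effect

def liability_change(C_t, NCRR, PT, I, delta_e, delta_G):
    """Eq. (4): change in the pension liability."""
    return C_t + NCRR - PT + I + delta_e + delta_G

C_t, C_prev = 105.0, 100.0     # contribution revenue in t and t-1
TD_t, TD_prev = 31.5, 32.0     # turnover durations (years)
dBF = fund_asset_change(C=C_t, NCRS=4.0, PT=95.0, r=0.03, BF_prev=180.0)
dCA = contribution_asset_change(C_t, C_prev, TD_t, TD_prev)
dV = liability_change(C_t, NCRR=4.0, PT=95.0, I=20.0, delta_e=6.0, delta_G=3.0)

# Eq. (1): change in net worth = change in assets minus change in liabilities.
dNW = dBF + dCA - dV
print(f"dBF = {dBF:.2f}, dCA = {dCA:.2f}, dV = {dV:.2f}, dNW = {dNW:.2f}")
```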
4 Concluding Comments

There are solid reasons that highlight the importance of the pension system's income statement and the need for PAYG pension systems in general, and NDC schemes in particular, to present one every year. The income statement provides more detailed information on the changes that occur in the system and helps stakeholders and management to make informed decisions. The inclusion of information relating to deviations from the budget provides useful information to users for future decision making and for strategic planning. Finally, with the inclusion of items such as recognition of NCRs and sponsor contributions for NCRs, our income statement proposal integrates both contributory and social aspects of public pensions.
References

1. Pérez-Salamero González, J.M., Ventura-Marco, M., Vidal-Meliá, C.: A "Swedish" actuarial balance sheet for a notional defined contribution pension scheme with disability and minimum pension benefits. Int. Soc. Secur. Rev. 70(3), 79–104 (2017). https://doi.org/10.1111/issr.12143
2. International Public Sector Accounting Standards Board (IPSASB): The conceptual framework for general purpose financial reporting by public sector entities. In: Handbook of International Public Sector Accounting Pronouncements, vol. 1. International Federation of Accountants, International Public Sector Accounting Standards Board, New York, NY. https://www.ipsasb.org/publications/2018-handbook-international-public-sector-accounting-pronouncements-15 (2018)
3. International Public Sector Accounting Standards Board (IPSASB): IPSAS 1: Presentation of financial statements. In: Handbook of International Public Sector Accounting Pronouncements, vol. 1. International Federation of Accountants, International Public Sector Accounting Standards Board, New York, NY. https://www.ipsasb.org/publications/2018-handbook-international-public-sector-accounting-pronouncements-15 (2018)
4. Garvey, A.M., Ventura-Marco, M., Vidal-Meliá, C.: Does the pension system's income statement really matter? A proposal for an NDC scheme with disability and minimum pension benefits. Econ. Res.-Ekonomska Istraživanja 34(1), 292–310 (2021). https://doi.org/10.1080/1331677X.2020.1782246
5. The Swedish Pension System (TSPS): Orange annual report 2018. In: Settergren, O. (ed.) Swedish Pensions Agency (Pensionsmyndigheten), Stockholm. https://www.pensionsmyndigheten.se/other-languages/en/en/publications/2018-handbook-international-public-sector-accounting-pronouncements-15 (2019)
6. Vidal-Meliá, C., Ventura-Marco, M., Pérez-Salamero González, J.M.: Social insurance accounting for a notional defined contribution scheme combining retirement and long-term care benefits. Sustainability 10, 2832 (2018). https://doi.org/10.3390/su10082832
7. Chen, G., Matkin, D.S.: Actuarial inputs and the valuation of public pension liabilities and contribution requirements: a simulation approach. Public Budg. Financ. 37(1), 68–87 (2017). https://doi.org/10.1111/pbaf.12154
Improved Precision in Calibrating CreditRisk+ Model for Credit Insurance Applications

J. Giacomelli and L. Passalacqua
Abstract Credit insurance is a type of non-life insurance which protects the insured against insolvency events that may arise from any of their buyers during the contract period. Dependency amongst claims generated by the underlying buyers is a critical issue in credit insurance modeling, both for pricing and for risk management purposes. Given the similarity between a basket of exposures arising from invoices issued to risky buyers and a portfolio of risky financial obligations, the CreditRisk+ model has been applied to credit insurance several times in the literature, although it was originally developed for the banking sector. In the CreditRisk+ framework, claims are represented by a doubly stochastic process, where the dependency amongst expected default frequencies is described by a multivariate Clayton copula. However, calibrating this dependence structure in credit insurance requires extracting the information embedded in the historical database of an insurance company, given the lack of the financial market data that are available to banks about their debtors. We investigate how to estimate the copula parameters from historical claim time series and how the estimation precision scales with the sampling frequency of the time series used in the calibration.
Keywords CreditRisk+ · Calibration · Default Correlation · Dependence Structure · Credit Insurance
1 Modelling Credit Insurance

Credit insurance, also known as trade credit insurance, gives protection to providers of goods and services. Claims are generated by the violation of a financial
obligation between the insured and a third party, the so-called "buyer", that does not pay its commercial debt or pays (very) late with respect to the due date. A particular feature of credit insurance is that risks might be unknown at subscription time, since the insured may decide to provide goods or services to new buyers during the coverage period. Accordingly, the amount of the premium is computed ex post, on the basis of the effective turnover of the insured and of a premium rate contractually specified.
Credit insurance policies are commonly underwritten as a master agreement (e.g. the so-called "whole turnover" policy). Insured sellers request a specific coverage (credit limit) from the insurer for each buyer against which they want to be protected. The insurer may agree to each request partially or completely, granting to the insured a fraction of or the whole credit limit requested. It is also possible for the insurer to reject the request for coverage. While the protection is active, invoices issued to the covered buyer are guaranteed against the event that the buyer does not pay, up to the granted credit limit. The dynamic management of granted credit limits is a mechanism that allows the insurer to compensate for the initial lack of knowledge of the risks. Further details about credit insurance are available in [1, 2].
Risk modelling in credit insurance presents many analogies with portfolio credit risk modelling in finance, where a wide variety of well-established models is available. However, commercial portfolios present some important differences, namely:
i. differently from the financial case, commercial contracts are not priced on the market, so that a marked-to-market measure of their value is unavailable;
ii. since market values are unavailable, the creditworthiness of the contracts is unknown and the issuer can be found either in a default or in a non-default state (there is no "credit migration"); moreover, this information becomes available only at maturity.
In practice, the above features limit the choice of a model to those that can be fully calibrated using only the time series of claims generated by a credit insurance company. The suitable model is required to be a "portfolio model", not only from the risk management perspective but also for pricing purposes, since the underlying covered risk of a credit policy is multi-name (the buyers). Finally, a factor model is a natural choice to describe the "default correlations" amongst the underlying buyers, because the typical size of an underlying portfolio and the lack of information make an explicit description of the dependence between the claim probabilities associated with each pair of buyers impracticable. On the other hand, the latent variables of a factor model can be used to represent features that are shared amongst sub-portfolios of buyers, such as their industrial sector or the geographical position of their registered offices. All the above considerations make the CreditRisk+ model an excellent choice for risk management in credit insurance. A detailed discussion of the appropriateness of CreditRisk+ also for credit insurance pricing can be found in [3, 4].
2 Fundamentals of the CreditRisk+ Model

CreditRisk+ is a portfolio model developed at Credit Suisse First Boston (CSFB) by Tom Wilde and co-workers, first documented in [5]. It can be classified as a frequency-severity model, cast in a single-period framework, with the peculiarity that frequency is described by a doubly stochastic process while (at least in the original formulation) loss severity is deterministic. However, the latter hypothesis can easily be relaxed at the cost of some additional computational burden. The structure of dependence of default events is described using a factor model framework, where the factors are unobservable ("latent") stochastic "market" variables, whose precise financial/actuarial identification is irrelevant, since the model integrates over all possible realizations.
The structure of the model can be summarised as follows. Let $N$ be the number of different risks in a given portfolio and $\mathbb{1}_i$ the default indicator function of the $i$-th risk ($i = 1, \dots, N$) over the time horizon $(t, T]$. The indicator function $\mathbb{1}_i$ is a Bernoulli random variable that takes the value 1 in case of default, with probability $q_i$, and the value 0 with probability $1 - q_i$. Thus:

$$
\mathbb{E}[\mathbb{1}_i] = q_i, \qquad \mathrm{var}[\mathbb{1}_i] = q_i(1 - q_i), \qquad i = 1, \dots, N. \qquad (1)
$$
The "portfolio loss" $L$ over the reference time horizon $(t, T]$ is then given by:

$$
L = \sum_{i=1}^{N} \mathbb{1}_i\, E_i, \qquad E_i = (EAD)_i\,(LGD)_i, \qquad (2)
$$

where $(EAD)_i$ and $(LGD)_i$ are respectively the Exposure At Default and the Loss Given Default of the $i$-th risk. In order to ease the semi-analytic computation of the distribution of $L$, the model introduces a new set of variables $Y_i$, each replacing the corresponding indicator function $\mathbb{1}_i$ ($i = 1, \dots, N$). The new variables $Y_i$ are supposed to be Poisson-distributed, conditionally on the value assumed by the market latent variables. Let $K$ be the number of latent variables and $\Gamma = (\Gamma_1, \dots, \Gamma_K)$ the $K$-dimensional vector describing the "market", with $\mu_k := \mathbb{E}[\Gamma_k]$ and $\sigma_k^2 := \mathrm{var}[\Gamma_k]$. The latent variables are assumed to be independent and gamma-distributed, with shape and scale parameters denoted by $\alpha_k$ and $\beta_k$ respectively ($k = 1, \dots, K$). Without loss of generality, it can further be assumed that $\mu_1 = \cdots = \mu_K = 1$, so that $\alpha_k = \beta_k^{-1}$ and $\beta_k = \sigma_k^2$. Conditionally on $\Gamma$, the parameter of the Poisson distribution of $Y_i$ is assumed to be:

$$
p_i(\Gamma) = q_i \cdot \left(\omega_{i0} + \sum_{k=1}^{K} \omega_{ik}\,\Gamma_k\right), \qquad (3)
$$
where the factor loadings $\omega_{ik}$ are all non-negative and sum up to unity:

$$
\sum_{k=0}^{K} \omega_{ik} = 1, \qquad \omega_{ik} \ge 0, \qquad i = 1, \dots, N, \quad k = 0, \dots, K, \qquad (4)
$$
so that $q_i$ is the unconditional expected default frequency:

$$
q_i = \mathbb{E}[p_i(\Gamma)] = \int_0^{\infty} \!\!\cdots\! \int_0^{\infty} p_i(\Gamma)\, f(\Gamma)\, d\Gamma_1 \cdots d\Gamma_K, \qquad (5)
$$
and the identity between the expected values of the original Bernoulli variable $\mathbb{1}_i$ and the new Poisson variable $Y_i$ is granted, i.e. $\mathbb{E}[Y_i] = \mathbb{E}[\mathbb{1}_i] = q_i$. According to the above hypothesis, the portfolio loss is now given by $L_Y$:

$$
L_Y = \sum_{i=1}^{N} Y_i \, E_i, \qquad \text{where } Y_i\,|\,\Gamma \sim \mathrm{Poisson}\left(p_i(\Gamma)\right). \qquad (6)
$$
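A minimal Monte Carlo sketch of the mixed Poisson portfolio loss $L_Y$ in (6) may help fix ideas. All parameter values below (number of risks, exposures, factor variances, loadings) are illustrative assumptions, not taken from the paper, and the simulation approach is only an alternative to the semi-analytic recursion mentioned next.

```python
# Monte Carlo sketch of the mixed Poisson portfolio loss L_Y in Eq. (6),
# with illustrative parameter values (N risks, K latent gamma factors).
import numpy as np

rng = np.random.default_rng(0)
N, K = 500, 2                      # number of risks and of latent factors
q = rng.uniform(0.005, 0.03, N)    # unconditional default frequencies q_i
E = rng.uniform(0.5, 2.0, N)       # exposures E_i = EAD_i * LGD_i
sigma2 = np.array([0.4, 0.6])      # variances of the latent factors
W = rng.dirichlet(np.ones(K + 1), N)   # loadings (w_i0, ..., w_iK), rows sum to 1

def simulate_losses(n_sims=20000):
    losses = np.empty(n_sims)
    for s in range(n_sims):
        # Independent gamma factors with mean 1 and variance sigma_k^2.
        gamma = rng.gamma(shape=1.0 / sigma2, scale=sigma2)
        # Conditional Poisson intensities p_i(Gamma), Eq. (3).
        p = q * (W[:, 0] + W[:, 1:] @ gamma)
        Y = rng.poisson(p)
        losses[s] = np.sum(Y * E)
    return losses

L = simulate_losses()
print("E[L_Y] =", L.mean(), " 99.5% quantile =", np.quantile(L, 0.995))
```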
In [5] it is shown how to compute the distribution of $L_Y$ using a recursive method known as "Panjer recursion". In the language of copula functions, the structure of dependence implied by (3) corresponds [6] to a multivariate Clayton copula. The parameters of the copula are the factor loadings $\omega_{ik}$, that can be gathered in a $N \times K$ matrix $W$. Typically it holds $N \gg K$, implying that $W$ is much smaller than the $N \times N$ covariance matrix of the default indicators $\mathbb{1}$. It is easy [7] to show that

$$
\mathrm{cov}\left(Y_i, Y_j\right) = q_i\, q_j \sum_{k=1}^{K} \omega_{ik}\,\omega_{jk}\,\sigma_k^2 + \delta_{ij}\, q_i, \qquad (7)
$$
where $\delta_{ij}$ stands for the Kronecker delta. A scaled and shifted version $A$ of the covariance matrix introduced in Eq. (7) is defined below and used in the next section to address the calibration problem:

$$
A_{ij} := \frac{1}{q_i\, q_j}\left[\mathrm{cov}\left(Y_i, Y_j\right) - \delta_{ij}\, q_i\right]. \qquad (8)
$$
Following this notation, (7) can be written in the equivalent form:

$$
A = W\, \Sigma\, W^{T}, \qquad (9)
$$

where $\Sigma := \mathrm{diag}(\sigma^2)$. As suggested in [7], Eq. (9) allows the calibration of the factor loadings, and thus of the dependence structure, by matching the observed covariance matrix of historical default time series with model values. This can be achieved in practice by the application of Symmetric Non-negative Matrix Factorization (SNMF),
or other equivalent decomposition technique, to the estimator $\hat{A}$, in order to obtain the complete estimated set of model parameters $\hat{W}$, $\hat{\Sigma}$. However, since the model is defined in a single-period framework, with a reference "forecasting" time horizon $(t, T]$ that is typically of 1 year, it is not a priori evident how to use historical time series with a different frequency (e.g. quarterly or monthly) to build $\hat{A}$ in a consistent way. Naively speaking, it is reasonable to expect that the larger the information provided by the historical time series (i.e. the higher the frequency), the better the calibration should be. The questions of how to estimate $A$ at various time scales and how the calibration precision depends on the time series frequency have recently been addressed in [8].
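The following rough sketch illustrates the calibration idea under stated assumptions: it factors a toy matrix $A$ as $B B^T$ with $B \ge 0$ using one simple symmetric NMF multiplicative-update scheme (any equivalent decomposition would do), exploiting the fact that $A = W\Sigma W^T = (W\Sigma^{1/2})(W\Sigma^{1/2})^T$; recovering $W$ and $\Sigma$ separately would additionally require a column normalization, which is omitted here.

```python
# Illustrative symmetric non-negative factorization of a toy A matrix.
import numpy as np

def symmetric_nmf(A, K, n_iter=2000, beta=0.5, seed=0):
    """One simple multiplicative-update scheme for A ~ B B^T with B >= 0."""
    rng = np.random.default_rng(seed)
    B = rng.random((A.shape[0], K))
    for _ in range(n_iter):
        num = A @ B
        den = B @ (B.T @ B) + 1e-12
        B *= (1.0 - beta) + beta * num / den
    return B

# Toy example: build a true A from known loadings and factor variances,
# then check that B B^T approximates A.
rng = np.random.default_rng(1)
H_sectors, K = 6, 2
W_true = rng.dirichlet(np.ones(K + 1), H_sectors)[:, 1:]   # drop the idiosyncratic weight
Sigma_true = np.diag([0.4, 0.7])
A_true = W_true @ Sigma_true @ W_true.T

B = symmetric_nmf(A_true, K)
print("reconstruction error:", np.linalg.norm(B @ B.T - A_true))
```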
3 Time Series Sampling Period and Calibration Precision

Self-consistency of CreditRisk+ at different time scales is a necessary condition for estimating $A$ at a shorter time scale (e.g. a one-month sampling period of the considered historical time series) than the one at which the model is applied (e.g. a one-year projection horizon). Namely, we require that Eq. (3) holds at both the calibration and the projection time scales, together with the distributional hypotheses on the latent variables. Moreover, we require that $W$ remains the same at any time scale and that only $\Sigma$ and $q_i$ are allowed to change. Given an arbitrary partition $\{t \equiv t_0 < \cdots < t_j < \cdots < t_m \equiv T\}$ of the projection interval $(t, T]$, it can be shown [8] that if each interval $(t_{j-1}, t_j]$ ($j = 1, \dots, m$) satisfies the following hypotheses

$$
\frac{q_i(t, T)}{T - t} = \frac{q_i(t_{j-1}, t_j)}{t_j - t_{j-1}} = \text{constant}, \qquad (10)
$$

$$
\Gamma_k^{(j)} \sim \mathrm{Gamma}\left(\frac{T - t}{t_j - t_{j-1}}\,\sigma_k^{-2},\ \frac{t_j - t_{j-1}}{T - t}\,\sigma_k^{2}\right), \qquad (11)
$$

then the CreditRisk+ self-consistency conditions described above are satisfied as well, where $q_i(t', t'')$ is the unconditional expected default frequency of the $i$-th risk evaluated over the interval $(t', t'']$ and $\Gamma_k^{(j)}$ ($k = 1, \dots, K$) are the latent variables that verify the form of Eq. (3) together with $q_i(t_{j-1}, t_j)$.
In this framework, we consider two additional (realistic) hypotheses in order to evaluate $\hat{A}$. The sampling period $\Delta_m := (T - t)/m$ is the same for all the observations / time series considered in the calibration, and the projection horizon measures $m$ times the sampling period length (e.g. monthly historical time series vs a one-year projection horizon implies $m = 12$). Regarding the portfolio, we assume, as in real-life applications, that the buyers can be divided into $H \ll N$ sectors. The $h$-th sector $c_h$ includes $n_h$ buyers, implying the normalization condition $\sum_{h=1}^{H} n_h = N$. Each buyer belonging to the $h$-th sector has
the same dependency $\omega_{hk}$ from the $k$-th latent variable. Hence, the dimensionality of the calibration problem is reduced from $N \times K$ to $H \times K$.
Let $f_h(t)$ be the claim rate historical time series observed for the $h$-th sector using the sampling period $\Delta_m$. Moreover, let $F_h(t)$ be the claim rate historical time series observed for the $h$-th sector using the sampling period $m \times \Delta_m$. Hence, each observation of $F_h$ summarizes the information contained in $m$ observations of $f_h$ by the relation

$$
F_h(t) = 1 - \prod_{j=1}^{m} \left[1 - f_h(t_j)\right]. \qquad (12)
$$
Given any two, not necessarily distinct, sectors $h, h'$, it can be shown [8] that Eqs. (3), (7), (10), (11) and (12) imply the following scale-invariance law

$$
\left[\mathrm{cov}\left(f_h, f_{h'}\right) + s_h\, s_{h'}\right]^{\frac{1}{\Delta_m}} = \text{constant}, \qquad (13)
$$

for each $\Delta_m > 0$, where $s_h := 1 - q_h = 1 - \mathbb{E}[f_h]$. Equations (13) and (7) allow to define $\hat{A}$ both at the time scales $\Delta_1$ (projection horizon) and $\Delta_m$ (sampling period):

$$
A_{hh'} = \frac{1}{Q_h\, Q_{h'}}\left[\mathrm{cov}\left(F_h, F_{h'}\right) - \delta_{hh'}\,\frac{Q_h}{n_h}\right] \qquad (14)
$$

$$
\phantom{A_{hh'}} = \frac{1}{Q_h\, Q_{h'}}\left[\left(\mathrm{cov}\left(f_h, f_{h'}\right) + s_h\, s_{h'}\right)^m - S_h\, S_{h'} - \delta_{hh'}\,\frac{Q_h}{n_h}\right] \qquad (15)
$$

where $S_h := 1 - Q_h = 1 - \mathbb{E}[F_h]$. Typically $n_h \gg 1$ while $Q_h \ll 1$, hence the term $Q_h/n_h$ can be considered approximately equal to zero even when $h = h'$. The form of $A_{hh'}$ in Eq. (14) can be derived from Eq. (8) considering that it holds $\mathrm{cov}(F_h, F_{h'}) = \frac{1}{n_h n_{h'}} \sum_{i \in c_h} \sum_{i' \in c_{h'}} \mathrm{cov}(Y_i, Y_{i'})$ by construction.
The RHS of Eqs. (14) and (15) measure the same quantity $A_{hh'}$. However, the corresponding estimators $\hat{A}^{(1)}_{hh'}$ and $\hat{A}^{(m)}_{hh'}$, obtained by replacing covariances and means with their sample estimators, do not have the same precision. Recalling that a Gamma distribution is approximately normal when the scale parameter is "small" (i.e. $\sigma_k^2 \ll 1$, given that $\mathbb{E}[\Gamma_k] = 1$) and that the sample covariance matrix of a set of normal variables is Wishart distributed, it can be shown [8] that $\mathrm{std}\big[\hat{A}^{(1)}_{hh'}\big] \simeq \sqrt{m}\cdot \mathrm{std}\big[\hat{A}^{(m)}_{hh'}\big]$, implying a considerable reduction of the standard error by choosing a shorter sampling period for the definition of the time series used in the calibration. We have verified numerically that a shorter sampling period leads to a more precise estimator of $A_{hh'}$, even when the normal approximation of the latent variables is no longer applicable. However, the precision gain obtained at shorter sampling periods decreases at increasing $\sigma_k^2$ values.
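A small numerical check of this precision gain can be run under simplifying assumptions (a single latent factor, two identical sectors, binomial claim counts, illustrative parameter values): it estimates $A$ via Eq. (14) on annually aggregated series and via Eq. (15) on the monthly series, and compares the dispersion of the two estimators across replications.

```python
# Rough numerical comparison of the estimators in Eqs. (14)-(15); all values illustrative.
import numpy as np

rng = np.random.default_rng(7)
m, years = 12, 40              # sampling periods per year, length of the history
H, n_h = 2, 2000               # number of sectors and buyers per sector
q_year, sigma2 = 0.04, 0.15    # annual claim frequency and factor variance
omega = 0.8                    # loading of every sector on the single factor

def one_replication():
    # Monthly latent factor: mean 1, variance sigma2/m, consistent with Eq. (11).
    gam = rng.gamma(shape=m / sigma2, scale=sigma2 / m, size=years * m)
    p_month = (q_year / m) * (1 - omega + omega * gam)        # Eq. (3) at the monthly scale
    f = rng.binomial(n_h, np.tile(p_month, (H, 1))) / n_h     # monthly claim rates per sector
    F = 1 - np.prod((1 - f).reshape(H, years, m), axis=2)     # Eq. (12): annual claim rates

    s, S, Q = 1 - f.mean(axis=1), 1 - F.mean(axis=1), F.mean(axis=1)
    cov_f, cov_F = np.cov(f)[0, 1], np.cov(F)[0, 1]
    A1 = cov_F / (Q[0] * Q[1])                                         # Eq. (14), h != h'
    Am = ((cov_f + s[0] * s[1]) ** m - S[0] * S[1]) / (Q[0] * Q[1])    # Eq. (15)
    return A1, Am

res = np.array([one_replication() for _ in range(300)])
print("std of A_hat^(1):", res[:, 0].std(), "  std of A_hat^(m):", res[:, 1].std())
```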
References

1. International Credit Insurance & Surety Association: A Guide to Trade Credit Insurance, 1st edn. Anthem Press, London (2015)
2. Jus, M.: Credit Insurance, 1st edn. Elsevier Academic, Cambridge (2013)
3. Passalacqua, L.: A pricing model for credit insurance. Giornale dell'Istituto Italiano degli attuari LXIX, 87–123 (2006)
4. Passalacqua, L.: Measuring effects of excess-of-loss reinsurance on credit insurance risk capital. Giornale dell'Istituto Italiano degli attuari LXX, 81–102 (2007)
5. Credit Suisse First Boston: CreditRisk+, A Credit Risk Management Framework
6. McNeil, A., Frey, R., Embrechts, P.: Quantitative Risk Management. Princeton University Press, Princeton (2015)
7. Vandendorpe, A., Ho, N., Vanduffel, S., Van Dooren, P.: On the parameterization of the CreditRisk+ model for estimating credit portfolio risk. Insur. Math. Econ. 42, 736–745 (2008)
8. Giacomelli, J., Passalacqua, L.: Calibrating the CreditRisk+ model at different time scales and in presence of temporal autocorrelation. Math. 9(14), 1679 (2021). https://doi.org/10.3390/math9141679
A Model-Free Screening Selection Approach by Local Derivative Estimation

Francesco Giordano, Sara Milito, and Maria Lucia Parrella
Abstract A new model-free screening method, called Derivative Empirical Likelihood Independent Screening (D-ELSIS) is proposed for high-dimensional regression analysis. Without requiring a specific parametric form of the underlying data model, our method is able to identify explanatory variables that contribute to the explanation of the response variable in nonparametric and non-additive contexts. This approach is fully nonparametric and combines the estimation of marginal derivatives by the local polynomial estimator together with the empirical likelihood technique. The proposed method can be applied to variable screening problems emerging from a wide range of areas, from genomic and health science to economics, finance and machine learning. We report some simulation results in order to show that the D-ELSIS screening approach performs satisfactorily. Keywords Screening selection · Nonparametric regression · High-dimension
1 Introduction

The typical aim of statistical analysis with high-dimensional data is to identify the set of relevant variables that contribute to the explanation of a given response variable. Suppose that we have a random sample $\{(X_i, Y_i)\}_{i=1}^{n}$ from the data model

$$
Y_i = m(X_i) + \varepsilon_i, \qquad (1)
$$

where $Y$ is the response variable, $X$ is the vector of $p$ candidate variables, $\varepsilon$ is the error with $E(\varepsilon|X) = 0$ and $m(\cdot)$ is the unknown regression function. Without loss
of generality, we assume $E(Y) = 0$, which implies $E\{m(X)\} = 0$. Let us denote with $M^* = \{1 \le j \le p : \text{the } j\text{-th variable in } X \text{ is relevant for the explanation of } Y\}$ the set of $p^*$ true relevant covariates in model (1). We consider a sparse model, where only a small proportion of the candidate variables contribute to the response ($p^* \ll p$).
In order to identify the $p^*$ relevant covariates in $M^*$ in high-dimensional nonparametric regression with $p$ very large, following the idea in Chang et al. [1], we propose an independence model-free feature screening technique that combines local polynomial (l.p.) regression and the empirical likelihood (e.l.) technique. The new procedure, called Derivative Empirical Likelihood Independence Screening (D-ELSIS), works as follows. First, it uses l.p. regression to estimate the first marginal derivatives of the regression function with respect to all variables in $X$ (so, $p$ derivatives in total). Then, it checks by the e.l. (a nonparametric inference method, see [4]) whether these derivatives are uniformly zero (or not) on the support of each variable. The variables for which the e.l. provides evidence against a uniformly null derivative are chosen as the relevant ones. Through the derivatives, we investigate the marginal contribution of each variable $X_j$ (a single component of the vector $X$) in explaining $Y$, to judge whether it is relevant or not. In fact, keeping the rest of the variables in $X$ fixed, if a variable $X_j$ is not contributing to $Y$ marginally, then its marginal derivative $\partial m(X)/\partial X_j$ is zero for all $x \in \chi_j$, where $\chi_j$ is the support of $X_j$. With this idea in mind, we attempt a feature screening procedure that is able to determine whether $\partial m(X)/\partial X_j \equiv 0$, or not, for each $j = 1, \dots, p$. To the best of our knowledge, no other screening method uses the l.p. estimation of the first marginal derivative for screening purposes (see [1] and the references therein).
The estimator used in D-ELSIS is more complicated than the Nadaraya–Watson estimator used in Chang et al. [1] and it needs some non-trivial theoretical analysis in order to show its consistency. The advantage, however, is the possibility of using D-ELSIS also to identify the non-linear covariates (which cannot be done in [1]). In fact, a variable is linear in (1) when the difference between the marginal derivative of the covariate and its mean is identically zero. In the first step of D-ELSIS we already have an estimate of the marginal derivative; therefore, we only need to add a second step to check whether such a difference is identically zero for all the points in the support.
2 Derivative Estimation by Local Polynomials

Denoting with $m_j(x)$ the marginal relation between $Y$ and $X_j$, its l.p. estimation is derived by means of a weighted least squares regression fitted at each point of interest, $x_0$, using data from its neighbourhood. Suppose that the $(d+1)$-th derivative of $m_j(x_0)$ exists and is continuous. We approximate locally the marginal function $m_j(x)$ using Taylor's expansion by a polynomial of order $d$ [2]. Then, we can estimate the expansion terms using weighted least squares by minimizing the following quantity for $\beta_{jv} := \beta_{jv}(x_0) = m_j^{(v)}(x_0)/v!$:

$$
\sum_{i=1}^{n} \left[ Y_i - \sum_{v=0}^{d} \beta_{jv}(x_0)\,(X_{ij} - x_0)^{v} \right]^2 K_h(X_{ij} - x_0), \qquad (2)
$$
where $h$, called the bandwidth, controls the size of the neighbourhood around $x_0$, $K_h(\cdot)$ controls the weights, with $K_h(x) \equiv K(x/h)/h$ and $K$ a kernel function satisfying $\int K(x)\,dx = 1$, and $X_{ij}$ is the $i$-th observation of the $j$-th explanatory variable. Let $\hat{\beta}_j = (\hat{\beta}_{j1}, \dots, \hat{\beta}_{jd})$ be the solution of (2); then $\hat{m}_j^{(v)}(x_0) = v!\,\hat{\beta}_{jv}$ is an estimator for $m_j^{(v)}(x_0)$ with $v = 0, \dots, d$. In matrix notation, let $X_j$ be the design matrix centred at $x_0$:

$$
X_j = \begin{pmatrix} 1 & (X_{1j} - x_0) & \cdots & (X_{1j} - x_0)^d \\ \vdots & \vdots & \ddots & \vdots \\ 1 & (X_{nj} - x_0) & \cdots & (X_{nj} - x_0)^d \end{pmatrix}, \qquad (3)
$$

$Y = (Y_1, \dots, Y_n)^T$ and $W_j$ a diagonal matrix of weights with diagonal elements $K_h(X_{ij} - x_0)$, for $i = 1, \dots, n$. Then, the local estimate of $m_j^{(v)}(x)$ with a $d$-th degree polynomial is

$$
\hat{m}_j^{(v)}(x; d, h) = v!\, e_{v+1}^{T}\left(X_j^{T} W_j X_j\right)^{-1} X_j^{T} W_j\, Y = v! \sum_{i=1}^{n} W_{i,j,d}(x)\, Y_i, \qquad (4)
$$

for $v = 0, \dots, d$, where $W_{i,j,d}(x) = e_{v+1}^{T}\left(X_j^{T} W_j X_j\right)^{-1} X_j^{T} W_j\, e_i$. Here $e_r$ is the $(d+1) \times 1$ vector having 1 in the $r$-th entry and zeros elsewhere. In general, we need a polynomial of order $d = v + 1$ (see [2]). So, we use the local quadratic estimator, i.e. $d = 2$, with the Epanechnikov kernel, in order to estimate the first derivative ($v = 1$).
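A minimal sketch of the local quadratic estimator (4) of the first marginal derivative with the Epanechnikov kernel is given below; the toy data, the bandwidth and the evaluation point are illustrative assumptions.

```python
# Local quadratic estimate of the first marginal derivative, as in Eq. (4).
import numpy as np
from math import factorial

def epanechnikov(u):
    return 0.75 * (1 - u ** 2) * (np.abs(u) <= 1)

def local_poly_derivative(x0, Xj, Y, h, d=2, v=1):
    """Local polynomial estimate of m_j^(v)(x0) of degree d."""
    u = (Xj - x0) / h
    w = epanechnikov(u) / h                           # kernel weights K_h(X_ij - x0)
    X = np.vander(Xj - x0, N=d + 1, increasing=True)  # design matrix (3)
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ Y)          # weighted least squares
    return factorial(v) * beta[v]

# Toy data: m(x) = sin(2*pi*x), so the first derivative is 2*pi*cos(2*pi*x).
rng = np.random.default_rng(0)
n = 500
Xj = rng.uniform(0, 1, n)
Y = np.sin(2 * np.pi * Xj) + 0.1 * rng.normal(size=n)
x0, h = 0.3, 0.1
print("estimated m'(0.3):", local_poly_derivative(x0, Xj, Y, h))
print("true      m'(0.3):", 2 * np.pi * np.cos(2 * np.pi * x0))
```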
3 The Proposed Procedure

As said, we estimate the first marginal derivative using the local quadratic estimator. To assess $\partial m(X)/\partial X_j\,|_{X_j = x} = 0$ at a given $x$ without distributional assumptions, we construct the following e.l. test-statistic, using the same steps of Chang et al. [1]:

$$
EL_j(x, 0) = \sup_{w} \left\{ \prod_{i=1}^{n} w_i : w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i\, W_{i,j,2}(x)\, Y_i = 0 \right\}. \qquad (5)
$$
Applying the Lagrange multiplier method to solve (5), we obtain the e.l. ratio

$$
l_j(x, 0) = -2\log\{EL_j(x, 0)\} - 2n\log n = 2\sum_{i=1}^{n} \log\{1 + \lambda\, W_{i,j,2}(x)\, Y_i\}, \qquad (6)
$$
where $\lambda$ is the univariate Lagrange multiplier solving $\sum_{i=1}^{n} \frac{W_{i,j,2}(x)\, Y_i}{1 + \lambda\, W_{i,j,2}(x)\, Y_i} = 0$. Since $\sum_{i=1}^{n} W_{i,j,2}(x)\, Y_i$ converges to the first marginal derivative with respect to $X_j$ evaluated at $x$, a large value of $l_j(x, 0)$ is taken as evidence against $\partial m(X)/\partial X_j\,|_{X_j = x} = 0$. Then, $l_j(x, 0)$ is a statistic for testing whether or not (4) has zero mean locally at $x$. For assessing $\partial m(X)/\partial X_j \equiv 0$ uniformly on $\chi_j$, we use

$$
l_j(0) = \sup_{x \in \chi_{jn}} l_j(x, 0)
$$

for each $j = 1, \dots, p$, where $\chi_{jn}$ is a partition of the support $\chi_j$ into several intervals. For feature screening purposes, we sort the $l_j$, for $j = 1, \dots, p$, in decreasing order and we retain those exceeding a threshold $\gamma_n$. In this way, we define the set $\hat{M}_{\gamma_n} = \{1 \le j \le p : l_j \ge \gamma_n\}$ as an estimate of $M^*$. In order to implement the proposed method, we evaluate the statistic $l_j$ using $l_j(0) = \max_{1 \le i \le n} l_j(X_{ij}, 0)$, where $X_{ij}$ is the $i$-th observation of the $j$-th explanatory variable. With this expedient, we can use univariate optimisation to solve (6) by the Lagrange multiplier method.
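The following sketch computes the e.l. ratio (6) for a generic vector $z_i$ playing the role of $W_{i,j,2}(x)\,Y_i$, solving for the univariate Lagrange multiplier by a simple bisection; the tolerance, the toy data and the handling of the degenerate case (zero outside the convex hull of the $z_i$) are implementation choices made here for illustration.

```python
# Empirical likelihood ratio (6) with a univariate Lagrange multiplier.
import numpy as np

def el_log_ratio(z, n_bisect=200):
    """l = 2 * sum log(1 + lambda * z_i), with lambda solving
    sum z_i / (1 + lambda * z_i) = 0 (requires min(z) < 0 < max(z))."""
    z = np.asarray(z, dtype=float)
    if z.min() >= 0 or z.max() <= 0:
        return np.inf                     # zero lies outside the convex hull of the z_i
    lo = (-1 + 1e-8) / z.max()            # keep all 1 + lambda * z_i > 0
    hi = (-1 + 1e-8) / z.min()
    g = lambda lam: np.sum(z / (1 + lam * z))
    for _ in range(n_bisect):             # g is monotonically decreasing in lambda
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2 * np.sum(np.log1p(lam * z))

# Toy check: z centred at zero gives a small statistic, a shifted z a large one.
rng = np.random.default_rng(1)
print(el_log_ratio(rng.normal(0.0, 1, 200)))   # mean zero: small value
print(el_log_ratio(rng.normal(0.8, 1, 200)))   # mean far from zero: large value
```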
4 Simulation Results

Several simulation studies are conducted to investigate the performance of the proposed D-ELSIS method in terms of the following three criteria: (i) the median of the minimum model size (MMS, i.e. the smallest number of selected covariates that includes all the active explanatory variables) over 100 repetitions; (ii) the IQR divided by 1.34 (SD), which is a robust measure of the standard error of the MMS; (iii) the true positive rate in percentage (TPR), which controls the precision by measuring the proportion of actually relevant variables that are correctly identified as such. To calculate the TPR we consider a fixed threshold $\gamma_n$ such that the predicted relevant variables are the first 20. For a method to perform well, the MMS must be equal to the number of true active variables, with a small SD and a high TPR.
We set $n = (500, 750, 1000)$ and $p = (100, n/2, 2n)$. For comparison, we also consider three other screening methods for nonparametric models: the Fused Kolmogorov Filter (FKF) of Mai and Zou [3], the Fused Mean-Variance (FMV) of Yan et al. [5] and the local Empirical Likelihood SIS (ELSIS) of Chang et al. [1]. For the estimation of $h$ in ELSIS and D-ELSIS we use leave-one-out cross-validation. We consider two models in the simulation study, described in the following.
Example 1 (Additive model with uniform covariates) This example is taken from Example 2 of Chang et al. [1]. Data are generated from the model

$$
Y = 5X_1 + 3(2X_2 - 1)^2 + 4\,\frac{\sin(2\pi X_3)}{2 - \sin(2\pi X_3)} + 6\,\big[0.1\sin(2\pi X_4) + 0.2\cos(2\pi X_4) + 0.3(\sin(2\pi X_4))^2 + 0.4(\cos(2\pi X_4))^3 + 0.5(\sin(2\pi X_4))^3\big] + \sigma\varepsilon.
$$

Here the predictors $X_j$ are i.i.d. random variables with $U(0, 1)$ distribution, and $\varepsilon \sim N(0, 1)$ is independent of the $X_j$. In this case we have $p^* = 4$ relevant covariates. We consider four different signal-to-noise ratios by varying $\sigma^2$, as in Chang et al. [1]. The results are in Table 1.

Example 2 (Linear model with correlated normal covariates) Data are generated from the model $Y = X_1 + X_2 + X_3 + X_4 + \varepsilon$, where $X \sim N(0, \Sigma)$ and $\varepsilon \sim N(0, 1)$ is independent of each $X_j$ with $j = 1, \dots, p$. In this model we have $p^* = 4$ relevant linear covariates, all with the same parameter 1. We set the variance-covariance matrix $\Sigma = (\sigma_{kj})$ with $\sigma_{kk} = 1$ and $\sigma_{kj} = \rho$ for $k \ne j$, with $\rho = 0$ and $0.5$: so we consider both the independent case and the correlated case between active and non-active covariates. The results are in Table 2.
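For concreteness, a sketch of the data-generating process of Example 1 is reported below; the sample size, the dimension and the value of $\sigma$ chosen in the call are illustrative.

```python
# Data-generating process of Example 1 (additive model with uniform covariates).
import numpy as np

def generate_example1(n, p, sigma, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(0, 1, size=(n, p))
    g3 = np.sin(2 * np.pi * X[:, 2]) / (2 - np.sin(2 * np.pi * X[:, 2]))
    s4, c4 = np.sin(2 * np.pi * X[:, 3]), np.cos(2 * np.pi * X[:, 3])
    g4 = 0.1 * s4 + 0.2 * c4 + 0.3 * s4 ** 2 + 0.4 * c4 ** 3 + 0.5 * s4 ** 3
    Y = 5 * X[:, 0] + 3 * (2 * X[:, 1] - 1) ** 2 + 4 * g3 + 6 * g4 + sigma * rng.normal(size=n)
    return X, Y

X, Y = generate_example1(n=500, p=100, sigma=1.0)
print(X.shape, Y.shape)   # only the first four covariates are relevant
```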
Table 1 Simulation results from Example 1: for each method (D-ELSIS, ELSIS, FMV, FKF) the table reports the MMS (with SD in parentheses) and the TPR, for $p^* = 4$, $\sigma^2 \in \{1, 1.74, 2, 3\}$ and all combinations of $n \in \{500, 750, 1000\}$ and $p \in \{100, n/2, 2n\}$. [Table entries omitted.]

Table 2 Simulation results from Example 2: for each method (D-ELSIS, ELSIS, FMV, FKF) the table reports the MMS (with SD in parentheses) and the TPR, for $p^* = 4$, $\rho \in \{0, 0.5\}$ and all combinations of $n \in \{500, 750, 1000\}$ and $p \in \{100, n/2, 2n\}$. [Table entries omitted.]
References

1. Chang, J., Tang, C.Y., Wu, Y.: Local independence feature screening for nonparametric and semiparametric models by marginal empirical likelihood. Ann. Stat. 44(2), 515–539 (2016)
2. Fan, J., Gijbels, I.: Local Polynomial Modelling and Its Applications. Monographs on Statistics and Applied Probability, vol. 66. CRC Press, Boca Raton (1996)
3. Mai, Q., Zou, H.: The fused Kolmogorov filter: a nonparametric model-free screening method. Ann. Stat. 43(4), 1471–1497 (2015)
4. Owen, A.B.: Empirical Likelihood. Chapman and Hall/CRC, Boca Raton (2001)
5. Yan, X., Tang, N., Xie, J., Ding, X., Wang, Z.: Fused mean-variance filter for feature screening. Comput. Stat. Data Anal. 122, 18–32 (2018)
Markov Switching Predictors Under Asymmetric Loss Functions

Francesco Giordano and Marcella Niglio
Abstract There is empirical evidence that, in economic and financial domains, forecast generation is often based on asymmetric losses that allow positive and negative forecast errors to be treated differently. This has led to the introduction of predictors that weigh the cost of overprediction and underprediction differently. In this context we focus the attention on the generation of forecasts from nonlinear Markov Switching models using the asymmetric LinEx loss function. After the presentation of the model, we introduce an asymmetric LinEx predictor for a well-defined variant of the Markov Switching structure, generalizing some results given in the literature and focusing the attention on the theoretical formulation of the predictor and on its properties, mainly related to its bias. These results are illustrated in an example that gives evidence of some features of the new predictor.
Keywords Markov switching · Asymmetric predictors · LinEx loss
1 Introduction

The importance given to forecast generation in time series analysis has led researchers to improve existing approaches and to propose new ones that allow the accuracy of the generated predictions to be increased, taking into account empirical needs related to forecast generation or introducing theoretical advances that, starting from the evaluation of the main features of the data generating process, define new predictors. These needs have been perceived even in the nonlinear time series domain, where forecast generation has been widely debated and the evaluation of predictive accuracy has received equally large attention (see, among others, [6]).
The common factor that emerges in many contributions is that in forecast generation and evaluation a key role is assumed by the loss function, whose selection
can be seen as "... a primitive to the forecasting problem" ([2], p. 13) because it defines the cost, in terms of forecast accuracy, of an imperfect forecast generation. In more detail, let $Y_t$, $t \in T$, be a time series of length $T$ and let $\mathcal{L}(\cdot)$ be a loss function; the predictor $\hat{Y}_{T+h}$ of $Y_{T+h}$, for $h = 1, 2, \dots, H$, is obtained by the minimization of:

$$
\min_{\hat{Y}_{T+h}} E\big[\mathcal{L}(Y_{T+h}, \hat{Y}_{T+h})\,\big|\,\mathcal{F}_T\big],
$$
with $\mathcal{F}_T = \{y_T, y_{T-1}, \dots\}$ the information set up to time $T$. A large number of loss functions has been introduced to generate forecasts, whose properties, advantages and drawbacks need to be evaluated case by case (for a broad review see [2, ch. 2], whereas for some general results on the properties see [3]). In this large class of asymmetric loss functions, we focus the attention on the LinEx loss [7], given by the combination of a linear and an exponential loss:

$$
\mathcal{L}(Y_{T+h}, \hat{Y}_{T+h}) = \exp\{a(Y_{T+h} - \hat{Y}_{T+h})\} - a(Y_{T+h} - \hat{Y}_{T+h}) - 1, \qquad (1)
$$

with $a \ne 0$ and such that: for $a > 0$, positive forecast errors (underprediction) have heavier effects on the forecast loss than negative forecast errors (overprediction), which show an approximately linear behavior; on the contrary, with $a < 0$, the forecast loss is approximately linear for positive forecast errors and exponential for negative forecast errors, so giving more weight to overprediction than underprediction.
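A direct implementation of the LinEx loss (1) is straightforward; the grid of errors and the values a = ±1 below are used only for illustration.

```python
# LinEx loss of Eq. (1), evaluated on a small grid of forecast errors.
import numpy as np

def linex(y, y_hat, a):
    e = np.asarray(y) - np.asarray(y_hat)      # forecast error
    return np.exp(a * e) - a * e - 1.0

e = np.linspace(-3, 3, 7)
print(linex(e, 0.0, a=1.0))    # heavier penalty on underprediction (e > 0)
print(linex(e, 0.0, a=-1.0))   # heavier penalty on overprediction (e < 0)
```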
Fig. 1 Comparison of loss functions: LinEx (lx) loss (1) with a = {−1, 1} [straight and dashed line] and square loss (se) [dotted line]
Sect. 3 we give the details of the new predictor and we present an example to compare the asymmetric predictor to a symmetric competitor.
2 Markov Switching Model

The attention in the following will be given to the nonlinear time series structures that belong to the class of Markov Switching (MS) models. The large attention given to these structures is related to their ability to model many features of economic and financial time series (such as business cycles, asymmetries, fat tails), which has led to the extensive application of this class of models in the empirical domain.
Different variants of Markov Switching models have been proposed in the literature [4]. Here we consider the following form: let $Y_t$ be a time series that follows a MS structure given by

$$
Y_t = \mu_{s_t} + \sigma_{s_t} v_t, \qquad (2)
$$

with $v_t$ a sequence of independent and identically distributed (i.i.d.) Gaussian random variables, having $E[v_t] = 0$ and $E[v_t^2] = 1$; $\mu_{s_t}$ is a state-dependent mean with $s_t = i$, for $i = 1, \dots, k$; $v_t$ is independent from $s_t$, for all $t$; $s_t$ is a first order recurrent, homogeneous, stationary and ergodic Markov chain with $k$ states, such that:

$$
P(s_t = j\,|\,s_{t-1} = i, s_{t-2} = r, \dots) = P(s_t = j\,|\,s_{t-1} = i) = p_{ij}, \qquad (3)
$$

for $i, j = 1, \dots, k$. Note that $p_{ij}$, for $i, j = 1, 2, \dots, k$, is the transition probability: in other words, it defines the probability that $s_t = j$ given that $s_{t-1} = i$. These transition probabilities are collected in the transition matrix:

$$
P(s_t | s_{t-1}) = P = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1k} \\ p_{21} & p_{22} & \cdots & p_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ p_{k1} & p_{k2} & \cdots & p_{kk} \end{bmatrix}, \qquad (4)
$$

where $\sum_{j=1}^{k} p_{ij} = 1$, for $i = 1, \dots, k$. The transition matrix $P$ is critical in defining the dynamic structure of this class of models. Let $\xi_t$ be a $(k \times 1)$ vector such that:

$$
\xi_t = \begin{cases} (1, 0, 0, \dots, 0)' & \text{if } s_t = 1, \\ (0, 1, 0, \dots, 0)' & \text{if } s_t = 2, \\ \qquad \vdots & \qquad \vdots \\ (0, 0, 0, \dots, 1)' & \text{if } s_t = k. \end{cases}
$$
where the $j$-th element of $\xi_t$ is 1 if $s_t = j$, for $j = 1, 2, \dots, k$. Hence, following [4], the conditional expectation $E(\xi_t | \xi_{t-1})$ can be rewritten as:

$$
E(\xi_t | \xi_{t-1}) = P'\,\xi_{t-1}. \qquad (5)
$$

The estimation of this conditional expectation has been extensively presented in [4]. In particular, when the states are unobserved, as in the forecasting domain, $\xi_t$ needs to be estimated, such that $\hat{\xi}_{t|t} = [P(s_t = 1 | \mathcal{F}_t), P(s_t = 2 | \mathcal{F}_t), \dots, P(s_t = k | \mathcal{F}_t)]'$. The estimates are then obtained using iterative equations (for more details see [4]), taking advantage of the conditional densities $f(Y_t | s_t = i, \mathcal{F}_{t-1})$, for $i = 1, \dots, k$.
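The following compact sketch, written under the assumption of k = 2 states and illustrative parameter values, simulates model (2) and computes the filtered probabilities $\hat{\xi}_{t|t}$ with the usual iterative (Hamilton-type) filtering equations referred to above.

```python
# Simulate a 2-state Markov Switching series and filter the state probabilities.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
P = np.array([[0.65, 0.35],
              [0.45, 0.55]])          # transition matrix (4), rows sum to 1
mu, sig = np.array([1.0, -1.0]), np.array([1.0, 2.0])
T = 500

# Simulate the Markov chain s_t and the observed series Y_t (model (2)).
s = np.empty(T, dtype=int)
s[0] = 0
for t in range(1, T):
    s[t] = rng.choice(2, p=P[s[t - 1]])
Y = mu[s] + sig[s] * rng.normal(size=T)

# Filtering: xi_{t|t} proportional to f(Y_t | s_t = i) * (P' xi_{t-1|t-1}).
xi = np.full(2, 0.5)                  # initial state probabilities
xi_filt = np.empty((T, 2))
for t in range(T):
    pred = P.T @ xi                   # one-step-ahead state probabilities
    lik = norm.pdf(Y[t], loc=mu, scale=sig)
    xi = pred * lik
    xi /= xi.sum()
    xi_filt[t] = xi

print("average filtered probability of state 1:", xi_filt[:, 0].mean())
```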
3 Asymmetric Predictors

The generation of forecasts from time series models is usually performed by minimizing the expected square loss function. Several advantages can be obtained from its use, among them: the predictors are unbiased; the predictor variance is monotonically increasing as the number of steps ahead increases; the forecast errors $\hat{e}_{t+1|t} = Y_{t+1} - \hat{Y}_{t+1|t}$ are serially uncorrelated; from the analytical point of view, the square function is often (but not always) easier to minimize than other loss functions.
In the presence of Markov Switching models, forecast generation has been faced in different ways. Following [4], under the square loss function and given the information set $\mathcal{F}_t$, the one step ahead forecast is given by:

$$
\hat{Y}_{t+1|t} \equiv E[Y_{t+1} | \theta, \mathcal{F}_t] = \sum_{j=1}^{k} P(s_{t+1} = j | \mathcal{F}_t)\, E[Y_{t+1} | s_{t+1} = j, \theta, \mathcal{F}_t], \qquad (6)
$$
with θ the vector of known parameters. In practice, the conditional expectation (6) is a weighted average of the predictions generated from each of the k regimes with weights given by the probability that st+1 = j, for j = 1, 2, . . . , k (for more details, see [1]). If an asymmetric loss is used to generate forecasts, the predictor assumes a completely different form. In this regard [5] introduce a new predictor, under the LinEx loss (1), for a simpler variant of the MS model (2). In particular, in their MS structure, the mean does not change with the state variable, so μst = μ. This difference between the model considered in [5] and model (2) is not negligible from the dynamic point of view and has heavy impact on the forecast generation. To give theoretical evidence of what stated we can show that:
Proposition 1 Let $Y_t$ be a time series generated from model (2). Assume that $v_{t+1}$ has a Normal distribution independent from $s_{t+1}$, such that when $s_{t+1} = i$, $Y_{t+1} \sim N(\mu_i, \sigma_i^2)$, for $i = 1, 2, \dots, k$. Then the LinEx predictor of $Y_{t+1}$ is:

$$
\hat{Y}^{*}_{t+1|t} = \frac{1}{a}\log\big[\hat{\xi}'_{t|t}\, P\,\omega\big], \qquad (7)
$$

with $\omega = (\omega_1, \omega_2, \dots, \omega_k)'$, $\omega_i = \exp\left(a\mu_i + \frac{a^2}{2}\sigma_i^2\right)$, for $i = 1, 2, \dots, k$.

The bias, which is intrinsic to the predictor (7), can be evaluated considering this further proposition:

Proposition 2 Let $Y_t$ be a time series generated from model (2). Under the assumptions of Proposition 1, the prediction error $\hat{e}^{*}_{t+1|t} = Y_{t+1} - \hat{Y}^{*}_{t+1|t}$ has conditional and unconditional expectation respectively given by:

$$
E[\hat{e}^{*}_{t+1|t}\,|\,\mathcal{F}_t] = \hat{\xi}'_{t|t}\, P\,\mu - \frac{1}{a}\log\big(\hat{\xi}'_{t|t}\, P\,\omega\big), \qquad (8)
$$

$$
E[\hat{e}^{*}_{t+1|t}] = \bar{\xi}'\mu - \frac{1}{a}\log\big(\bar{\xi}'\omega\big), \qquad (9)
$$

with $\mu = (\mu_1, \dots, \mu_k)'$, $\bar{\xi}' = \bar{\xi}' P$ the vector of unconditional probabilities and $\omega$ defined in Proposition 1.
The proofs of Propositions 1 and 2 are omitted because they follow the same lines of [5], with the only difference that in our case the means included in the vector $\mu$ change with the states. The distribution and the bias of the LinEx predictors can be empirically appreciated in the following example, where the predictors (6) and (7) are compared.

Example 1 Let $Y_t$ be a time series generated from model (2) with $\mu_{s_t} = 1$, $\sigma_{s_t} = 1$ if $s_t = 1$, $\mu_{s_t} = -1$, $\sigma_{s_t} = 2$ if $s_t = 2$, transition matrix $P = \begin{bmatrix} 0.65 & 0.35 \\ 0.45 & 0.55 \end{bmatrix}$, vector of unconditional probabilities $\bar{\xi} = (0.5625, 0.4375)'$ and with $v_t$ a Gaussian random variable independent from $s_t$.
Fig. 2 Prediction errors $\hat{e}_{t+1|t} = Y_{t+1} - \hat{Y}_{t+1|t}$ from the LinEx predictor with a = 1 and a = −1 (7) and the conditional expectation (6)
Mean Square Forecast Error, to properly compare the results. The values obtained are 2.850, 6.960 and 1.575 respectively. This highlights that the use of the asymmetric LinEx function increases the empirical risk related to the generated forecasts. The features that emerged from the example suggest using the biased predictor (7) in all contexts where, for different reasons, the agents consider it inappropriate to assign an equivalent cost to positive and negative forecast errors, and where overpredicting or underpredicting future values can be related to a risk management policy.
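As a hedged illustration of the two competing predictors, the sketch below evaluates the conditional expectation (6) and the LinEx predictor (7) for the parameter values of Example 1; the filtered probability vector used in the call is a placeholder, not a value computed in the paper.

```python
# One-step-ahead predictors (6) and (7) for the 2-state MS model of Example 1.
import numpy as np

P = np.array([[0.65, 0.35],
              [0.45, 0.55]])
mu = np.array([1.0, -1.0])
sig = np.array([1.0, 2.0])

def predictor_mse(xi_filt):
    """Eq. (6): weighted average of the regime means."""
    return (P.T @ xi_filt) @ mu

def predictor_linex(xi_filt, a):
    """Eq. (7): (1/a) * log(xi' P omega), omega_i = exp(a*mu_i + a^2*sig_i^2/2)."""
    omega = np.exp(a * mu + 0.5 * a ** 2 * sig ** 2)
    return np.log(xi_filt @ P @ omega) / a

xi = np.array([0.7, 0.3])            # illustrative filtered probabilities at time t
print("MSE predictor   :", predictor_mse(xi))
print("LinEx, a =  1   :", predictor_linex(xi, 1.0))
print("LinEx, a = -1   :", predictor_linex(xi, -1.0))
```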
References

1. Clements, M.P., Krolzig, H.M.: A comparison of the forecast performance of Markov switching and threshold autoregressive models of US GNP. Econ. J. 1, 47–75 (1998)
2. Elliott, G., Timmermann, A.: Economic Forecasting. Princeton University Press, Princeton (2016)
3. Granger, C.W.J.: Outline of forecast theory using generalized cost functions. Span. Econ. Rev. 1, 161–173 (1999)
4. Hamilton, J.D.: Time Series Analysis. Princeton University Press, Princeton (1994)
5. Patton, A.J., Timmermann, A.: Properties of optimal forecasts under asymmetric loss and nonlinearity. J. Econ. 140, 884–918 (2007)
6. Terasvirta, T., Kock, A.B.: Forecasting with nonlinear time series models. CREATES Research Paper No. 2010-1. Available at SSRN. https://ssrn.com/abstract=1531092
7. Varian, H.R.: A Bayesian approach to real estate assessment. In: Studies in Bayesian Econometrics and Statistics in Honor of L. J. Savage, pp. 195–208. North Holland, Amsterdam (1975)
Screening Covariates in Presence of Unbalanced Binary Dependent Variable

Francesco Giordano, Marcella Niglio, and Marialuisa Restaino
Abstract In this contribution we propose a new method to identify the most relevant covariates in a large dataset that (i) is applicable in the presence of regression models where the binary dependent variable is characterized by a very small number of ones relative to zeros, (ii) is not strongly influenced by the correlation between covariates, (iii) is easily applied when the number of predictors increases up to infinity and/or is greater than the sample size. The proposed procedure extends the idea of Sure Independence Screening for the linear regression model to the Generalized Extreme Value regression framework. This technique allows us to define a set of relevant covariates that survive the screening procedure and that, with probability tending to one, includes the true relevant covariates. We validate the proposed procedure by a simulation study and an empirical analysis devoted to the prediction of firms' failure.
Keywords GEV · Screening · Variable selection
1 Introduction

Since the paper of [1], different models able to predict firms' failure and identify the most likely early warning indicators of default have been proposed (for a review see [5]). One of the most commonly used is logistic regression [10], even if it may not be appropriate in the presence of a binary dependent variable with unbalanced ones (events, such as firms' or banks' defaults) and zeros (non-events), for the reasons described in [7] and [8]: (i) it sharply underestimates the probability of rare events, (ii)
the estimators tend to be biased towards the majority class, which is usually the least important, (iii) the bias of the maximum likelihood estimators of the logistic regression parameters in small sample sizes could be amplified in a rare event context [9].
Against this background, a skewed link function given by the inverse of the distribution function of a Generalized Extreme Value (GEV) random variable has been suggested in [3, 13]. It is particularly well suited for predicting firms' failures, as done in this paper, since it approaches one at a slower rate than it approaches zero.
Whichever model is used, a relevant and quite common problem is related to the high number of covariates which potentially influence the event of interest. Hence, it becomes crucial to identify an optimal subset of variables that helps to discern the crucial factors of financial risks. In the literature there exist several variable selection techniques (stepwise regression, lasso, lars, and so on) that have been applied in different domains (see, among others, [11, 12]). Nonetheless, these techniques do not take into account that some of the covariates could be highly correlated (multicollinearity) [2, 4].
Therefore, based on these considerations, this paper proposes a procedure to select the most relevant covariates that (i) is applicable in the presence of regression models where the binary dependent variable is characterized by a very small number of ones relative to zeros (i.e. GEV regression), (ii) is not strongly influenced by the correlation between covariates, (iii) is easily applied when the number of predictors increases up to infinity and/or is greater than the sample size.
In Sect. 2 the screening procedure is briefly presented. Then, it is evaluated through a Monte Carlo study in Sect. 3 and is further validated in Sect. 4 with an application to the failure risk of Italian manufacturing firms.
2 Methodology
Let $Y = (Y_1, Y_2, \dots, Y_n)'$ be a binary response variable modeled by a Bernoulli random variable with probability $P(Y_i = 1) = \pi_i$ and probability $P(Y_i = 0) = 1 - \pi_i$, for $i = 1, \dots, n$. Further, let $X$ be the $(n \times p)$ matrix of covariates and let $\beta = (\beta_1, \beta_2, \dots, \beta_p)'$ be a vector of coefficients. Let $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})'$ be the $p$-dimensional vector for the $i$-th unit.
When $Y_i$ is characterized by a very small number of 1's, the logit link function, traditionally used in the presence of a binary dependent variable, has a number of limitations that have led (see, among others, [3, 13]) to propose the use of a link function based on the Generalized Extreme Value distribution. Following [13], in the present paper we consider the GEV distribution function, where $\pi_i$ is given by:

$$
\pi_i = 1 - \exp\left[-\left(1 - \xi\,\beta' x_i\right)^{-\frac{1}{\xi}}\right] = 1 - GEV(-\beta' x_i), \qquad (1)
$$

with $(1 - \xi\,\beta' x_i) > 0$, $\xi \in \mathbb{R}$ the shape parameter, and a non-canonical link function
$$
\frac{1 - \left[-\ln(1 - \pi_i)\right]^{-\xi}}{\xi} = \beta' x_i. \qquad (2)
$$
As underlined in Sect. 1, the GEV regression is particularly suitable for the analysis of business failure prediction, where the number of firms' defaults is smaller than the number of non-default firms. When the probability of failure needs to be estimated, it is important to identify the subset of the most relevant covariates that influence the event of interest and the associated probability. The choice of the covariates becomes even more important when the number of covariates increases and is greater than the sample size. In this domain, following the Sure Independence Screening introduced by [6], our aim is to propose a GEV-screening procedure that evaluates all the covariates in order to assign them a degree of importance, allowing the most relevant ones to be selected. Shortly, the GEV-screening procedure is based on three main steps. Let x_j be the (n × 1) vector of the jth covariate:

1. define $\sum_{i=1}^{n} \ell_j(\beta_0, \beta_j, \xi; x_{ij}, y_i)$ as the marginal log-likelihood function obtained from the link function (2), where β0 is the intercept, βj is the coefficient of covariate j and the function $\ell_j(\beta_0, \beta_j, \xi; x_{ij}, y_i)$ is:

$$\ell_j(\beta_0, \beta_j, \xi; x_{ij}, y_i) = Y_i \log\!\left[1 - \exp\!\left\{-\left(1 - \xi(\beta_0 + \beta_j x_{ij})\right)^{-1/\xi}\right\}\right] - (1 - Y_i)\left[1 - \xi(\beta_0 + \beta_j x_{ij})\right]^{-1/\xi}, \quad j = 1, \ldots, p; \qquad (3)$$

2. maximize $\sum_{i=1}^{n} \ell_j(\beta_0, \beta_j, \xi; x_{ij}, y_i)$ with respect to β0, βj and ξ, for j = 1, . . . , p;

3. select from {1, . . . , p} the set of covariates whose marginal coefficients satisfy $|\hat{\beta}_j| \geq \gamma_n$, that is, the sub-model $\mathcal{M}_{\gamma_n} = \{1 \leq j \leq p : |\hat{\beta}_j| \geq \gamma_n\}$, where γn is a predefined threshold value such that γn → 0 as n → ∞.

In practice, the GEV-screening procedure is based on the evaluation of the marginal log-likelihood estimators, and it can be shown that $P(\mathcal{K} \subset \mathcal{M}_{\gamma_n}) \to 1$ as n → ∞, where $\mathcal{K}$ is the set of true relevant covariates. From the empirical point of view, the threshold γn is unknown and difficult to estimate. Thus, a tuning parameter, denoted d, can be introduced such that the GEV-screening selects the first d covariates with the largest values of $|\hat{\beta}_j|$, j = 1, 2, . . . , p.
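For illustration, a minimal Python sketch of this three-step screening (not the authors' implementation; the optimiser, its starting values and the function names are assumptions) fits the marginal GEV-link log-likelihood (3) covariate by covariate and keeps the d covariates with the largest absolute marginal coefficients:

```python
import numpy as np
from scipy.optimize import minimize

def gev_marginal_negloglik(params, xj, y):
    """Negative marginal log-likelihood of Eq. (3) for a single covariate xj."""
    beta0, betaj, xi = params
    arg = 1.0 - xi * (beta0 + betaj * xj)
    if np.any(arg <= 0) or xi == 0:          # support constraint (1 - xi*eta) > 0
        return np.inf
    t = arg ** (-1.0 / xi)                   # so that P(Y=1) = 1 - exp(-t), Eq. (1)
    ll = y * np.log1p(-np.exp(-t)) - (1.0 - y) * t
    return -np.sum(ll)

def gev_screening(X, y, d, start=(0.0, 0.0, -0.25)):
    """Rank covariates by |beta_j| from the marginal GEV fits and keep the top d."""
    p = X.shape[1]
    beta_hat = np.zeros(p)
    for j in range(p):
        res = minimize(gev_marginal_negloglik, x0=np.array(start),
                       args=(X[:, j], y), method="Nelder-Mead")
        beta_hat[j] = res.x[1]
    keep = np.argsort(-np.abs(beta_hat))[:d]
    return keep, beta_hat
```

The selected columns can then be passed to a conventional selection method, such as stepwise regression, as done in Sect. 4.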
3 Simulation Study

To evaluate the GEV-screening approach described in Sect. 2, we perform a Monte Carlo study whose artificial data are generated through a GEV regression where the conditional distribution of Yi, given xi, is Bernoulli(πi), for i = 1, 2, . . . , n, with the maximum number of ones equal to n × 0.05. The values of the covariates are generated by a p-variate Normal distribution with null mean vector and correlation matrix with elements ρk,j = ρ, for k ≠ j and k, j = 1, . . . , p. Further, the vector β is given by βj = 2, for j = 1, 2, 3, 4, and βj = 0 for j = 5, . . . , p. We consider two values for the number of units, n = {300, 500}, two values for the correlation among covariates, ρ = {0.0, 0.5}, and three values for the number of covariates, p = {20, n × 0.5, n × 1.5}. For each combination of parameters, we evaluate the GEV-screening procedure over 100 Monte Carlo replications, whose results are summarized by the mean, median and standard deviation of the minimum number of variables required to select the "true" relevant ones. The results, summarized in Table 1, highlight that in the two settings, with uncorrelated and correlated covariates, the median minimum number of selected variables corresponds to the number of relevant covariates, with the exception of only one high-dimensional case with n = 300, p = n × 1.5, ρ = 0.5 (which performs better when the number of units increases to n = 500). The mean values and the variability of the minimum number of selected variables also clearly show the ability of the GEV-screening to detect the relevant variables, with variability that, as expected, increases slightly as p grows. The proposed procedure can be seen as a preliminary step to reduce the dimensionality of the data before applying other variable selection approaches largely used in this framework. Among them we consider stepwise regression, which, if applied directly on the original (and not on the reduced/screened) data, can give misleading results. To give evidence of this, we can look at Table 2 where, using the generated artificial data (see the sketch below for the data-generating mechanism), the results of the variable selection based on forward stepwise regression are evaluated considering the mean and the standard deviation of the number of selected variables and of the number of relevant selected variables.
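The data-generating mechanism of the Monte Carlo study can be mimicked as follows (an illustrative sketch, not the authors' code; the shape parameter ξ, the random seed and the way the 5% cap on ones is enforced are assumptions):

```python
import numpy as np

def simulate_gev_data(n=300, p=20, rho=0.5, xi=-0.25, seed=1):
    """Draw X ~ N_p(0, R) with R[k, j] = rho for k != j, and Y | x ~ Bernoulli(pi)."""
    rng = np.random.default_rng(seed)
    R = np.full((p, p), rho)
    np.fill_diagonal(R, 1.0)
    X = rng.multivariate_normal(np.zeros(p), R, size=n)
    beta = np.zeros(p)
    beta[:4] = 2.0                           # only the first four covariates are relevant
    eta = X @ beta
    arg = np.maximum(1.0 - xi * eta, 1e-12)
    pi = 1.0 - np.exp(-arg ** (-1.0 / xi))   # GEV link of Eq. (1)
    y = rng.binomial(1, pi)
    # keep the response unbalanced: cap the number of ones at n * 0.05
    ones = np.flatnonzero(y == 1)
    cap = int(n * 0.05)
    if ones.size > cap:
        y[rng.choice(ones, ones.size - cap, replace=False)] = 0
    return X, y
```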
Table 1  Mean, median and standard deviation of the minimum number of variables required to select the "true" relevant ones

n      |              300              |              500
ρ      |      0.0      |      0.5      |      0.0      |      0.5
p      | 20   150  450 | 20   150  450 | 20   250  750 | 20   250  750
mean   | 4    4    4.02| 4.17 5.42 6.58| 4    4    4.01| 4.02 4.33 4.47
median | 4    4    4   | 4    4    5   | 4    4    4   | 4    4    4
s.d.   | 0    0    0.14| 0.45 2.72 4.06| 0    0    0.10| 0.20 1.32 1.54
Table 2  Mean (mean-*) and standard deviation (sd-*) of the number of selected variables (*-var) and of the number of relevant selected variables (*-rel)

n        |    300    |    500
ρ        | 0.0 | 0.5 | 0.0 | 0.5
p        | 20   150 | 20   150 | 20   250 | 20   250
mean-var | 3.39 5.99| 1.96 2.08| 3.03 5.00| 1.93 1.99
sd-var   | 0.72 2.98| 0.28 0.27| 0.58 1.83| 0.26 0.10
mean-rel | 2.87 3.02| 1.85 1.85| 2.68 2.66| 1.68 1.72
sd-rel   | 0.65 0.63| 0.41 0.36| 0.49 0.55| 0.47 0.45
As expected, mainly in the presence of correlated covariates, the performance of the stepwise regression is quite poor: in particular, the mean number of selected relevant variables is always less than four (in both the correlated and uncorrelated cases), which could be due to the constrained total number of covariates selected by the stepwise procedure. Finally, note that the high-dimensional case with p = n × 1.5 is not presented in Table 2, given its worse results with respect to the previous cases.
4 Data Analysis

The data used to test the screening procedure in GEV regression refer to a sample of Italian manufacturing firms whose information is collected from the Orbis database for the period 2007–2014. In order to remove the effects of the financial crisis from the data, we limit our attention to the years between 2012 and 2014. When collecting the data, only companies with full information for all years before the occurrence of the bankruptcy are considered. The dependent variable takes value 1 if a firm is bankrupt and zero otherwise. The independent variables are calculated using data from the financial statements of the firms included in the sample. Thus, we compute 122 financial ratios selected as potential predictors according to three criteria: (i) they have a relevant financial meaning in the failure context, (ii) they have been widely used in the failure prediction literature, and (iii) the information needed to calculate these ratios is available [2]. These predictors reflect different aspects of firms' structure: Turnover (24), Liquidity (19), Efficiency (19), Profitability (34), Solvency (26). The number of bankrupted firms is less than 10% for all years under analysis. The relevant covariates are chosen by combining the GEV-screening procedure, described in Sect. 2, with the stepwise technique. In particular, the number of covariates is first reduced by the GEV-screening procedure and then the selection is performed by the stepwise procedure. For the GEV-screening procedure the number of covariates to screen is set equal to d = 10. Furthermore, the variables picked up by the GEV-screening plus stepwise procedure are compared with those selected just using
Table 3  Number of variables selected by the GEV-screening plus stepwise procedure and by the stepwise technique

Year | Stepwise | GEV-screening-stepwise | GEV-screening-stepwise and stepwise
2012 |    2     |           2            |                 1
2013 |    2     |           2            |                 1
2014 |    4     |           3            |                 3
Table 4  Variables selected by the GEV-screening plus stepwise procedure and by the stepwise technique

Year | Stepwise                       | GEV-screening-stepwise
2012 | ind076, ind110                 | ind076, ind019
2013 | ind066, ind060                 | ind066, ind019
2014 | ind062, ind098, ind030, ind041 | ind062, ind098, ind030
ind019 = Logarithm of sales; ind030 = Working capital to net worth; ind041 = Ebitda to total assets; ind060 = Long-term debt plus loan to total assets; ind062 = Net income plus interest paid to shareholders funds plus non-current liabilities; ind066 = Net income to net worth; ind076 = Profit before tax plus interest paid to shareholders funds plus non-current liabilities; ind098 = Total assets to sales; ind110 = Operating cash flow
the stepwise. Table 3 shows the number of variables selected by the stepwise (column 1), by the GEV-screening plus stepwise (column 2), and the number of covariates picked up by both methods (column 3). Table 4 shows the variables selected by each of the two procedures and the variables selected by both procedures. Finally, to test the impact of the selected variables on the default probability, we computed the AUC for the three years under analysis. The combination of GEV-screening plus stepwise almost always outperforms the stepwise in terms of AUC, whose values are, respectively: AUC-stepwise = (0.6756, 0.6298, 0.7286); AUC-GEV-screening-stepwise = (0.7087, 0.6337, 0.7012).
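A rough sketch of the combined GEV-screening plus stepwise pipeline and its AUC evaluation (not the authors' code; it reuses the illustrative gev_screening function above and substitutes scikit-learn's forward selection with a logistic model for the stepwise stage):

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def screen_then_stepwise_auc(X, y, d=10, n_final=3):
    keep, _ = gev_screening(X, y, d)                  # stage 1: GEV screening
    sfs = SequentialFeatureSelector(                  # stage 2: forward stepwise
        LogisticRegression(max_iter=1000),
        n_features_to_select=n_final, direction="forward")
    sfs.fit(X[:, keep], y)
    cols = keep[sfs.get_support()]
    model = LogisticRegression(max_iter=1000).fit(X[:, cols], y)
    return cols, roc_auc_score(y, model.predict_proba(X[:, cols])[:, 1])
```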
References 1. Altman, E.I.: Financial ratios. Discriminant analysis and the prediction of corporate bankruptcy. J. Financ. 23(4), 589–609 (1968) 2. Amendola, A., Giordano, F., Parrella, M.L., Restaino, M.: Variable selection in highdimensional regression: a nonparametric procedure for business failure prediction. Appl. Stoch. Model. Bus. Ind. 33, 355–368 (2017) 3. Calabrese, R., Osmetti, S.: Modelling SME loan defaults as rare events: an application to credit defaults. J. Appl. Stat. 40(6), 1172–1188 (2013) 4. Chong, I.-G., Jun, C.-H.: Performance of some variable selection methods when multicollinearity is present. Chemom. Intell. Lab. Syst. 78, 103–112 (2005)
5. Dimitras, A.I., Zanakis, S.H., Zopounidis, C.: A survey of business failures with an emphasis on prediction methods and industrial applications. Eur. J. Oper. Res. 90(3), 487–513 (1996) 6. Fan, J., Lv, J.: Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc.: Ser. B 70, 849–911 (2008) 7. Kiefer, N.M.: Default estimation and expert information. J. Bus. Econ. Stat. 28(2), 320–328 (2010) 8. King, G., Zeng, L.: Logistic regression in rare events data. Polit. Anal. 9(2), 137–163 (2001) 9. McCullagh, P., Nelder, J.A.: Generalized Linear Models. Chapman Hall, New York (1989) 10. Ohlson, J.: Financial ratios and the probabilistic prediction of bankruptcy. J. Account. Res. 18(1), 109–131 (1980) 11. Tian, S., Yu, Y., Guo, H.: Variable selection and corporate bankruptcy forecasts. J. Bank. Financ. 52, 89–100 (2015) 12. Tian, S., Yu, Y.: Financial ratios and bankruptcy predictions: an international evidence. Int. Rev. Econ. Financ. 51, 510–526 (2017) 13. Wang, X., Dey, D.K.: Generalised extreme value regression for binary response data: an application to b2b electronic payments system adoption. Ann. Appl. Stat. 4(4), 2000–2023 (2010)
Health and Wellbeing Profiles Across Europe Aurea Grané , Irene Albarrán, and Roger Lumley
Abstract The main objective of this paper is to create profiles of older Europeans to better understand differing levels of dependency across Europe. Data comes from the latest wave of the Survey of Health, Ageing and Retirement in Europe (SHARE), carried out in 18 countries and representing over 124 million aged individuals in Europe. By using the information of around 30 variables of mixed type, we design four health composite indices for each respondent: self-perception of health, physical health and nutrition, mental agility and level of dependency. These indices are combined with a collection of descriptive variables via the k-prototypes clustering algorithm, leading to five profiles that segment the dataset into the least to the most individuals at risk of health and well-being. Keywords Ageing · Clustering · Dependency · Wellbeing
1 Motivation Aging, dependency and the distribution of resources for long-term care are issues of vital importance in today’s modern society. Europe, in particular, has undergone a demographic shift, with birth rates declining while longevity increases. In fact, it is estimated that by 2060 over 30% of the European population will be over 65, with 12% of the total population being over 80 [11]. This has led to an increased interest from governing institutions in understanding how and where exactly this change is happening so that policy, both at a national and European level, can be better tailored to meet the needs of this changing situation [16]. Of particular interest is the long-term care needs of older individuals, particularly of those who have disabilities that prevent them from performing certain tasks that relate to daily life. Individuals who require such long-term care are said to be depenA. Grané (B) · I. Albarrán · R. Lumley Universidad Carlos III de Madrid, C/ Madrid, 126, 28903 Getafe, Spain e-mail: [email protected] I. Albarrán e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_39
dent. Dependency is highly complex and intersectional with many factors. These can be physical (age, gender), psychological (depression, mental health issues), cognitive (education), economic (geographical location, access to public services and social relationships family) and more [4]. It is important to view dependency from a holistic approach, understanding all factors that impact on an individual’s level of dependency. Being considered dependent by the state is often the only way an individual can gain access to a variety of public services related to long-term care. However, how an individual is designated ‘dependent’ or not is remarkably heterogeneous across Europe [2, 9]. Long-term care services are essential for older people with dependency and include a range of services, from domestic help, personal care, nursing care, general health care and services such as “meals-on-wheels” [17]. In addition, there is a large reliance on informal care, in particular in Southern European countries, which is affected by national policy [5]. A commonly accepted definition of dependency is that of the R(98) Resolution of the Council of Europe that states dependency is “such state in which people, whom for reason connected to the lack or loss of physical, mental or intellectual autonomy, require assistance and/or extensive help in order to carry out common everyday actions.” Many other definitions exist in other countries or institutions, such as that of the World Health Organization, which states that dependency relates to “any restriction or lack of ability to perform an activity in the manner or within the range considered normal for a human being” [18]. While definitions of who is dependent differ from country to country and institution to institution, it is without doubt that disability increases with age. Given the demographic changes discussed above, it is essential that European countries prepare for the aging population and their increased reliance of public long-term care services. As this change is European wide, understanding who and where the most dependent populations are, can inform government institution and shape how and where any policy work should be directed.
2 Data Description and Methods

The Survey of Health, Ageing and Retirement in Europe (SHARE) is a rich panel database of individuals aged 50 or over in 18 European countries and Israel. Wave 6 took place in 2015 and asked questions ranging from the respondent's financial situation to their self-perception of their health levels [8]. This survey is the only one of its type that collects homogeneous information of this nature. This massive dataset aims to be representative; the 60,020 observations used in this analysis include a weighting variable that scales to represent over 124 million aged individuals in Europe [7]. The data also contain much descriptive information about the respondents, such as their level of education, their marital status and others; see Sect. 2.1. To create the profiles, the survey responses were recoded into binary variables and four indices were created. These were then used in the final clustering algorithm, along with the
Table 1  Descriptive variables included in the analysis and their possible values or categories

Type | Description                                   | Values/Categories
CT   | Country                                       | 18 European countries and Israel
B    | Gender                                        | “Male”, “Female”
CT   | Ages                                          | “55–60”, “61–65”, “66–75”, “76+”
B    | Employment status                             | “Employed”, “Not working”
B    | Marital status                                | “Has no spouse”, “Has a spouse”
B    | Children                                      | “Has no children”, “Has one or more children”
CT   | Education                                     | “No education”, “Primary”, “Secondary”, “University”
B    | Household in financial distress               | “Yes”, “No”
CT   | Household receives benefits or has payments?  | “No benefits and no payments”, “Payments and no benefits”, “Benefits and payments”

B = binary, CT = categorical
categorical data, to create the profiles. The process of creating the indices is described in Sect. 2.2.
2.1 Descriptive Variables

Table 1 contains the socio-economic and demographic information about the respondents.
2.2 Index Creation

In order to cluster the data efficiently, four indices were created from around 30 variables. To encode these variables, papers using the SHARE dataset were reviewed with a view on how to treat each of the pertinent variables. All the included variables were transformed into binary variables. For each variable, the worst case scenario was given the value 1, while the better scenario was given the value 0. To maintain consistency, when a scale was identical for two or more variables, the same cut-off was designated for these variables. For example, this meant that any question where the response options were “Poor”, “Fair”, “Good”, “Very good” or
Table 2  Variables used in creation of each particular index

Index 1 (Self-perception of health): Life satisfaction; Life happiness; Self-perceived health; EURO depression scale; Satisfied doing no activities last year?

Index 2 (Physical health and nutrition): Number of chronic diseases; Number of nights spent in hospital; Living in a nursing home?; Eyesight score; Hearing score; Ever smoked daily?; How often eat fish, meat or poultry; How often eat vegetables; BMI; Max grip strength

Index 3 (Mental agility): Self-rated reading; Self-rated writing; Score of memory test; Score of numeracy test; Score of orientation in time test; Score of words list learning test (both trials); Score of verbal fluency test

Index 4 (Dependency): Global Activity Limitation Indicator (GALI); Number of mobility limitations; Number of difficulties in ADLs; Number of difficulties in IADLs; Physical inactivity
“Excellent”, these variables were transformed in the same manner using the same cut-off point. As with the descriptive variables, binary variables were left unchanged. Once the variables had been recoded to binary values, they were summed and rescaled from 0 to 10. Table 2 shows which variables were used in the creation of which index. A description of the process and reasoning follows.
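As an illustration of this recoding and rescaling, the following Python sketch (not from the paper; the item names, cut-offs and worst-case codings are invented for illustration) builds one 0–10 index from a handful of survey items:

```python
import pandas as pd

def build_index(df, binary_rules):
    """Build a 0-10 index: code 1 for the worst-case answer, 0 otherwise, then rescale.

    binary_rules maps a column name to a function returning 1 (worst) or 0 (better).
    """
    coded = pd.DataFrame({col: df[col].map(rule) for col, rule in binary_rules.items()})
    raw = coded.sum(axis=1)                  # number of "worst case" answers
    return 10 * raw / len(binary_rules)      # rescale to the 0-10 range

# Hypothetical items for the self-perception-of-health index (Index 1)
rules = {
    "self_perceived_health": lambda v: 1 if v in ("Poor", "Fair") else 0,  # shared cut-off
    "life_satisfaction":     lambda v: 1 if v <= 5 else 0,                 # 0-10 scale
    "euro_depression":       lambda v: 1 if v >= 4 else 0,                 # EURO-D score
}
# index1 = build_index(share_wave6, rules)   # share_wave6: a hypothetical DataFrame
```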
2.3 Profile Construction

Once the indices are constructed, we are interested in obtaining health and wellbeing profiles using their information jointly with that contained in the descriptive variables. Since the data is of mixed type, we use the k-prototypes clustering algorithm [12], which combines the well-known k-means [15] and k-modes [13] algorithms, with a computational cost of O((T + 1)kn), where n is the number of observations, k the number of clusters and T the number of iterations. As with any non-hierarchical clustering approach, k must be determined before running the algorithm. Although optimality is a subjective measure that depends on the goal of the analysis, a variety of techniques are available to assist in the decision of the number of clusters to be selected. In this work, we used the “elbow” method.

Fig. 1  Geographical distribution of Profiles 1 to 5 (% of respondents per profile). The darker the color the higher the percentage
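A compact sketch of this clustering step (not from the paper) is given below; it assumes the open-source kmodes package, and the number of clusters, initialisation settings and column indices are placeholders:

```python
from kmodes.kprototypes import KPrototypes

def profile_clustering(X, categorical_cols, k=5):
    """Cluster mixed-type data (numeric indices plus categorical descriptors)."""
    model = KPrototypes(n_clusters=k, init="Cao", n_init=5, random_state=0)
    labels = model.fit_predict(X, categorical=categorical_cols)
    return labels, model.cost_

# "Elbow" heuristic: run for several k and look for the kink in the total cost.
# costs = [profile_clustering(X, cat_cols, k)[1] for k in range(2, 9)]
```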
3 Results

With the clusters created, a profile of the average member of each cluster can be made. To do so, the profiles have been ranked by taking the mean of the mean values for each index. The profiles with higher mean values in the indices can be said to be more disadvantaged than those with lower index scores. A broad summary of each profile follows. Additionally, the country of residence of the respondent was not included in the clustering algorithm and had no impact on the formation of the clusters. It is therefore interesting to see how the profiles are distributed across the continent and gain insight into how areas of need are distributed. This is shown in Fig. 1.

Profile 1: 16% of all respondents; female; 76 years old or older; low education (primary or none); not working; likely lives alone; suffers from multiple limitations in Activities of Daily Living (ADL) or Instrumental Activities of Daily Living (IADL); health-related payments or benefits; in financial distress.

Profile 2: 23% of all respondents; female; equally likely to belong to any age bracket; primary educated; not working; lives with a partner; few limitations in ADL or IADL; health-related payments or benefits.

Profile 3: 11% of all respondents; female; equally likely to belong to any age bracket; secondary educated; not working; lives with a partner; few limitations in ADL or IADL; health-related payments or benefits.

Profile 4: 23% of all respondents; male; 70 years or older; primary educated; not working; lives with a partner; some limitations in ADL or IADL; health-related payments or benefits; likely in financial distress.

Profile 5: 28% of all respondents; male; younger (55–65 years old); secondary or university educated; likely still working; lives with a partner; very unlikely to have limitations in ADL or IADL; no health-related benefits, some payments.
Profile 1 and Profile 4 are the worst off for each gender (female and male respectively), although Profile 4 is still better off than Profile 1. This is supported by the literature [1, 6, 14] which acknowledges the well-researched fact that women both live longer and with a poorer quality of life. These profiles also have a higher share of individuals aged 76 years or more old. For the constructed indices, they both perform badly in index 3, which relates to mental agility. Profile 5 is the least disadvantaged profile and is heavily skewed towards younger individuals; over 50% of the individuals in this profile are aged between 55–65, while those over 76 make up less than 10% of this profile. They are the most ‘male heavy’ group and also the most likely to still be working. Continuing to work in later life has been correlated to positive health outcome [16]. Across all the health indices they score low, indicating good health. These individuals are in the least need for current social assistance. One thing that is immediately of note is the distribution of the most disadvantaged profiles: Profile 1 and Profile 4. There is an over-representation of these profiles in the Southern European countries (Portugal, Spain, Italy, Greece). As mentioned above, there is a strong relationship between education and health outcomes. This is also influenced strongly by geography [3, 10]. Namely, having lower education in the Southern and Western European countries has a more negative impact on your health and long-term care needs than similarly educated individuals in the Northern European countries. This is reinforced by the findings here.
References 1. Acciai, F., Hardy, M.: Depression in later life: a closer look at the gender gap. Soc. Sci. Res. 68, 163–175 (2017) 2. Albarrán, A., Alonso, P., Grané, A.: Profile identification via weighted related metric scaling: an application to dependent Spanish children. J. R. Stat. Soc. Ser. A-Stat. Soc. 178, 1–26 (2015) 3. Avendano, M., Jürges, H., Mackenback, J.P.: Educational level and changes in health across Europe: longitudinal results from SHARE. J. Eur. Soc. Policy 19, 301–316 (2009) 4. Baltes, M.M.: The Many Faces of Dependency in Old Age. Cambridge University Press, New York (1996) 5. Barczyk, D., Kredler, M.: Long-term care across europe and the U.S.: The Role of Informal and Formal Care. Working paper, Universidad Carlos III de Madrid (2019) 6. Bonsang, E., Skirbekk, V., Staudinger, U.M.: As you sow, so shall you reap: gender-role attitudes and late-life cognition. Psychol. Sci. 28, 1201–1213 (2017) 7. Börsch-Supan, A., Brandt, M., Hunkler, C., Kneip, T., Korbmacher, J., Malter, F., Schaan, B., Stuck, S., Zuber, S.: Data resource profile: the survey of health, ageing and retirement in Europe (SHARE). Int. J. Epidemiol. (2013) 8. Börsch-Supan, A.: Survey of health, ageing and retirement in Europe (SHARE) Wave 6. Release version: 7.0.0. SHARE-ERIC. Data set. (2019) 9. Carrino, L., Orso, C.:Eligibility and inclusiveness of long-term care institutional frameworks in Europe: a cross-country comparison. University Ca’ Foscari of Venice, Department of Economics Research Paper Series, Issue 28/WP/2014 (2014) 10. Côté-Sergent, A., Fonseca, R., Strumpf, E.: Comparing the education gradient in chronic disease incidence among the elderly in six OECD countries. CIRANO Working Papers, 2018s-11 (2018)
11. Davies, R.: Older people in Europe EU policies and programmes. European Parliamentary Research Service 140811REV1 (2014) 12. Huang, Z.: Clustering large data sets with mixed numeric and categorical values. In: Proceedings of the First Pacific Asia Knowledge Discovery and Data Mining Conference, pp. 21–34. World Scientific, Singapore (1997) 13. Huang, Z.: Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min. Knowl. Discov. 2, 283–304 (1998) 14. Lima, A.L.B., Espelt, A., Lima, K.C., Bosque-Prous, M.: Activity limitation in elderly people in the European context of gender inequality: a multilevel approach. Ciên. Saúde Coletiva 23, 2991–3000 (2018) 15. Lloyd, S.P.: Least square quantization in PCM. Bell Telephone Laboratories Paper. Published in journal much later; Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (1982) 16. Ney, S.: Active aging policy in Europe: between path dependency and path departure. Ageing Int. 30, 325–342 (2005) 17. Tenand, M.: Equity and efficiency in long-term care policies: empirical evidence from France and the Netherlands. Economies and finances. PSL Research University. NNT: 2018PSLEE026. tel-01871505v2 (2018) 18. WHO: International Classification of Functioning, Disability and Health (ICF). World Health Organization (2002). http://www.who.int/classifications/icf/en/; 6, 60
On Modelling of Crude Oil Futures in a Bivariate State-Space Framework Peilun He, Karol Binkowski, Nino Kordzakhia, and Pavel Shevchenko
Abstract We study a bivariate latent factor model for the pricing of commodity futures. The two unobservable state variables representing the short and long term factors are modelled as Ornstein-Uhlenbeck (OU) processes. The Kalman Filter (KF) algorithm has been implemented to estimate the unobservable factors as well as unknown model parameters. The estimates of model parameters were obtained by maximising a Gaussian likelihood function. The algorithm has been applied to WTI Crude Oil NYMEX futures data. Keywords Kalman filter · Kalman smoother · State-space model · Crude oil futures
1 Introduction In this1 paper, the OU two-factor model is used for modelling of short and long equilibrium commodity spot price levels. Our motivation is driven by the development of a robust KF algorithm which will be used for joint estimation of the model parameters and the state variables. In a different setup, the parameter estimation problem for bivariate OU process using KF has been studied in [6, 8]. In [2] the KF is used to study the effect of stochastic volatility and interest rates on the commodity spot prices using the market prices of long-dated futures and 1 Data
provided by Datascope—https://hosted.datascope.reuters.com.
options. In [9] the Kalman technique has been applied to calibration, jointly with filtering, of partially unobservable processes using particle Markov Chain Monte Carlo approach. The extended KF was developed in [5] for estimation of the state variables in the two-factor model from [10] for the commodity spot price and its convenience yield. In Sect. 2, we will derive the linear partially observable system specific for commodity futures prices developed in the two-factor model, which represents an extension of [11], in the risk-neutral setting. In Sect. 3, the model will be applied to WTI Crude Oil NYMEX futures prices over 2001–2005, 2005–2009 and 2014–2018 time periods.
2 Two-Factor Model with Risk Premium Parameters

We propose the two-factor model of pricing of commodity futures which represents an extension of [11], where the spot price St is modelled as the sum of two unobservable factors χt and ξt,

$$\log(S_t) = \chi_t + \xi_t, \qquad (1)$$

where χt is the short-term fluctuation in prices and ξt is the long-term equilibrium price level. We assume that χt follows an OU equation and its expected value converges to 0 as t → ∞,

$$d\chi_t = (-\kappa \chi_t - \lambda_\chi)\,dt + \sigma_\chi\, dZ_t^{\chi}, \quad \kappa > 0. \qquad (2)$$
The changes in the equilibrium level of ξt are expected to persist and ξt is also assumed to be a stationary OU process

$$d\xi_t = (\mu_\xi - \gamma \xi_t - \lambda_\xi)\,dt + \sigma_\xi\, dZ_t^{\xi}, \quad \gamma > 0, \qquad (3)$$

where $(Z_t^{\chi})_{t \ge 0}$ and $(Z_t^{\xi})_{t \ge 0}$ are correlated standard Brownian motion processes with $E(dZ_t^{\chi}\, dZ_t^{\xi}) = \rho_{\chi\xi}\, dt$; σχ and σξ are the volatilities; γ and κ are the speed of mean-reversion parameters of the χ and ξ processes respectively; (μξ − λξ)/γ is a long-run mean for ξ. In [11], only one factor had a mean-reverting property. In this work, both χt and ξt are modelled as mean-reverting processes. The parameters λχ and λξ in (2) and (3) were introduced as adjustments for the market price of risk. The approach stems from the risk-neutral futures pricing theory developed in [1]. Given the initial values χ0 and ξ0, χt and ξt are jointly normally distributed. Therefore the logarithm of the spot price, which is the sum of χ and ξ, is normally distributed. Hence, the spot price is log-normally distributed and

$$\log[E^*(S_t)] = E^*[\log(S_t)] + \tfrac{1}{2}\,\mathrm{Var}^*[\log(S_t)] = e^{-\kappa t}\chi_0 + e^{-\gamma t}\xi_0 + A(t), \qquad (4)$$
where E*(·) and Var*(·) represent the expectation and variance taken with respect to the risk-neutral distribution, and

$$A(t) = -\frac{\lambda_\chi}{\kappa}\left(1 - e^{-\kappa t}\right) + \frac{\mu_\xi - \lambda_\xi}{\gamma}\left(1 - e^{-\gamma t}\right) + \frac{1}{2}\left[\frac{1 - e^{-2\kappa t}}{2\kappa}\,\sigma_\chi^2 + \frac{1 - e^{-2\gamma t}}{2\gamma}\,\sigma_\xi^2 + 2\,\frac{1 - e^{-(\kappa+\gamma)t}}{\kappa+\gamma}\,\sigma_\chi \sigma_\xi \rho_{\chi\xi}\right]. \qquad (5)$$
Let F0,T be the current market price of the futures contract with maturity T. For eliminating arbitrage, the futures prices must be equal to the expected spot prices at the asset delivery time T. Hence, under the risk-neutral measure, we have log(F0,T) = e^{-κT} χ0 + e^{-γT} ξ0 + A(T). After discretization, we will obtain the following AR(1) dynamics for the bivariate state variable xt

$$x_t = c + G x_{t-1} + w_t, \qquad (6)$$

where

$$x_t = \begin{pmatrix} \chi_t \\ \xi_t \end{pmatrix}, \quad G = \begin{pmatrix} e^{-\kappa \Delta t} & 0 \\ 0 & e^{-\gamma \Delta t} \end{pmatrix}, \quad c = \begin{pmatrix} 0 \\ \dfrac{\mu_\xi}{\gamma}\left(1 - e^{-\gamma \Delta t}\right) \end{pmatrix},$$

and wt is a column vector of uncorrelated normally distributed random variables with E(wt) = 0 and

$$\mathrm{Cov}(w_t) = W = \mathrm{Cov}[(\chi_{\Delta t}, \xi_{\Delta t})] = \begin{pmatrix} \dfrac{1 - e^{-2\kappa \Delta t}}{2\kappa}\,\sigma_\chi^2 & \dfrac{1 - e^{-(\kappa+\gamma)\Delta t}}{\kappa+\gamma}\,\sigma_\chi \sigma_\xi \rho_{\chi\xi} \\ \dfrac{1 - e^{-(\kappa+\gamma)\Delta t}}{\kappa+\gamma}\,\sigma_\chi \sigma_\xi \rho_{\chi\xi} & \dfrac{1 - e^{-2\gamma \Delta t}}{2\gamma}\,\sigma_\xi^2 \end{pmatrix},$$
Δt is the time step between (t − 1) and t. The relationship between the state variables and the observed futures prices is given by

$$y_t = d_t + F_t x_t + v_t, \qquad (7)$$

where

$$y_t = \left(\log(F_{t,T_1}), \ldots, \log(F_{t,T_n})\right)', \quad d_t = \left(A(T_1), \ldots, A(T_n)\right)', \quad F_t = \begin{pmatrix} e^{-\kappa T_1} & \cdots & e^{-\kappa T_n} \\ e^{-\gamma T_1} & \cdots & e^{-\gamma T_n} \end{pmatrix}',$$
vt is an n-dimensional vector of uncorrelated normally distributed random variables, E(vt) = 0, Cov(vt) = V, and T1, . . . , Tn are the futures maturity times. In Sect. 3, we assume that V is a diagonal matrix with non-zero diagonal entries s = (s1², s2², . . . , s2²), i.e. the variance of the error term for the first contract is s1² and s2² for all other remaining contracts. Let ℱt be the σ-algebra generated by the futures contracts up to time t. The prediction errors et = yt − E(yt | ℱt−1) are supposed to be multivariate normally distributed; then the log-likelihood function of y = (y1, y2, . . . , y_{n_T}) can be written as
Fig. 1 WTI Crude Oil futures prices of the first available contract
$$l(\theta; y) = -\frac{n\, n_T}{2} \log 2\pi - \frac{1}{2} \sum_{t=1}^{n_T} \left[\log \det(L_{t|t-1}) + e_t' L_{t|t-1}^{-1} e_t\right], \qquad (8)$$
where the set of unknown parameters θ = (κ, γ , μξ , σχ , σξ , ρχξ , λχ , λξ , s1 , s2 ), n T is the number of time instances, L t|t−1 = Cov(et |Ft−1 ). Given yt , the maximum likelihood estimate (MLE) of θ is obtained by maximising the log-likelihood function from (8). Both quantities et and L t|t−1 are computed within the KF.
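For concreteness, a minimal Kalman-filter sketch showing how e_t, L_{t|t−1} and the log-likelihood (8) can be computed is given below (Python; not the authors' code and without the grid-search robustification mentioned in Sect. 4). The arguments c, G, W, d, F, V are the system quantities defined in Eqs. (6) and (7):

```python
import numpy as np

def kalman_loglik(y, c, G, W, d, F, V, x0, P0):
    """Gaussian log-likelihood (8) for the state-space model (6)-(7).

    y: (nT, n) observations; d: (nT, n); F: (nT, n, 2); x0, P0: initial state mean/cov.
    """
    nT, n = y.shape
    x, P, ll = x0, P0, 0.0
    for t in range(nT):
        # prediction step for the state
        x = c + G @ x
        P = G @ P @ G.T + W
        # prediction error e_t and its covariance L_{t|t-1}
        e = y[t] - (d[t] + F[t] @ x)
        L = F[t] @ P @ F[t].T + V
        Linv = np.linalg.inv(L)
        _, logdet = np.linalg.slogdet(L)
        ll += -0.5 * (n * np.log(2 * np.pi) + logdet + e @ Linv @ e)
        # update (filtering) step
        K = P @ F[t].T @ Linv
        x = x + K @ e
        P = P - K @ F[t] @ P
    return ll
```

The MLE of θ is then obtained by passing this function to a numerical optimiser (e.g. scipy.optimize.minimize) over the parameter vector listed after Eq. (8).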
3 Crude Oil Futures The unknown parameters were estimated2 by maximising the log-likelihood function (8). Then, the state variables were estimated using the KF and Kalman Smoother (KS), [4, 7]. Given all observations until time T and the current time t, t ≤ T , KF only uses the observations up to t, while KS uses all the available observations up to T . In this section, “in-sample” and “out-of-sample” performances of KF and KS are analysed using the RMSE criterion. We used the historical data of WTI Crude Oil NYMEX futures prices over different time intervals from 1996 to 2019. The data comprised the prices of 20 monthly futures contracts with duration up to 20 months. Figure 1 shows the WTI Crude Oil futures prices from 1996 to 2019. It is obvious that the prices dropped dramatically during the Global Financial Crisis (GFC) in 2008. For studying the “in-sample” and “out-of-sample” forecasting performances, the three separate time periods were selected, 01/01/2001 - 01/01/2005, 01/01/2005 - 01/01/2009, and 01/01/2014 - 01/01/2018.
2 The Appendix containing the initial values and parameter estimates along with their standard errors can be found at https://github.com/peilun-he/MAF-Conference-September-2020.
Table 1  RMSE computed over three different time periods using two forecasting methods for each time interval

Period                   2001–2005             2005–2009             2014–2018
Estimation            Filter    Smoother    Filter    Smoother    Filter    Smoother
In-sample       C4    0.003264  0.003268    0.002219  0.002244    0.002180  0.002181
                C9    0.002289  0.002304    0.001612  0.001616    0.001646  0.001668
                C13   0.004155  0.004163    0.003959  0.003986    0.003516  0.003494
Out-of-sample   C14   0.005959  0.005955    0.005569  0.005591    0.005215  0.005181
                C20   0.018585  0.018579    0.018002  0.018006    0.020331  0.020265
Fig. 2 Cross-sectional graphs of the logarithms of futures prices and their forecasts on 4 different days. S F represents the sum of squares of estimation errors using KF; SS represents the sum of squares of estimation errors using KS
Table 1 provides the RMSE over the selected time periods. “In-sample” forecasting performance has been evaluated on the first 13 contracts (C1-C13), while “out-ofsample” performance has been evaluated on the 14th to the 20th contracts (C14-C20). Overall, the “in-sample” forecasting errors were less than “out-of-sample” errors as seen in Table 1. The “out-of-sample” forecasting errors were consistently increasing with respect of maturity times from C14 to C20 contracts. The RMSE are consistent across the three time intervals, even over 2005 - 2009, where the futures prices plummeted during the GFC. In summary, for each specified time period, the RMSE calculated through KF is smaller for short maturity contracts, which provides evidence that the KF performed better in predicting prices for short maturity contracts, whilst KS outperformed KF in the pricing of longer maturity futures. Figure 2 gives the cross-sectional data plots of the logarithms of futures prices and their forecasts obtained by KF and KS on four different days. The plots exhibit the distinct patterns of futures curves, 05/09/2007, 10/10/2006, 14/11/2005 and 20/09/2005.
The horizontal axis represents the number of contracts from 1 to 20 and the logarithm of futures prices are presented on the vertical axis. The RMSE for the curve with backwardation pattern (top left) appears larger than RMSE of that with contango pattern (top right).3
4 Conclusion In this paper, we have developed the two-factor model which can be used for pricing of oil futures and forecasting their term structure which remains a most significant challenge, [3]. The KF algorithm has been robustified by the grid-search add-on which has been implemented to estimate the hidden factors jointly with unknown model parameters. The model has been applied to WTI Crude Oil futures market prices from 1996 to 2019. The model “in-sample” and “out-of-sample” forecasting performances were evaluated using the RMSE criterion. Moreover, we observed that KF gives a better estimate of state vector xt for shorter maturity contracts, while KS performs better for contracts with longer maturities.
References 1. Black, F.: The pricing of commodity contracts. J. Financ. Econ. 3, 167–179 (1976) 2. Cheng, B., Nikitopoulos, C.S., Schlögl, E.: Pricing of long-dated commodity derivatives: do stochastic interest rates matter? J. Bank. Financ. 95, 148–166 (2018) 3. Cortazar, G., Millard, C., Ortega, H., Schwartz, E.S.: Commodity price forecasts, futures prices, and pricing models. Manag. Sci. 65, 4141–4155 (2019) 4. De Jong, P.: Smoothing and interpolation with the state-space model. J. Am. Stat. Assoc. 84, 1085–1088 (1989) 5. Ewald, C.O., Zhang, A., Zong, Z.: On the calibration of the Schwartz two-factor model to WTI crude oil options and the extended Kalman Filter. Ann. Oper. Res. 282, 119–130 (2019) 6. Favetto, B., Samson, A.: Parameter estimation for a bidimensional partially observed OrnsteinUhlenbeck process with biological application. Scand. J. Stat. 37, 200–220 (2000) 7. Harvey, A.C.: Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge (1990) 8. Kutoyants, Y.A.: On parameter estimation of the hidden Ornstein-Uhlenbeck process. J. Multivar. Anal. 169, 248–263 (2019) 9. Peters, G.W., Briers, M., Shevchenko, P., Doucet, A.: Calibration and filtering for multi factor commodity models with seasonality: incorporating panel data from futures contracts. Methodol. Comput. Appl. Probab. 15, 841–874 (2013) 10. Schwartz, E.S.: The stochastic behavior of commodity prices: implications for valuation and hedging. J. Financ. 52, 923–973 (1997) 11. Schwartz, E.S., Smith, J.E.: Short-term variations and long-term dynamics in commodity prices. Manag. Sci. 46, 893–911 (2000)
3 Backwardation represents the situation where the futures prices with shorter maturities are higher than the futures prices with longer maturities, while contango refers to the reverse situation.
A General Comovement Measure for Time Series Agnieszka Jach
Abstract We propose a nonparametric, time-dependent, cross-scale/crossfrequency dependence measure for multivariate stationary and non-stationary time series termed multi-thickness thick pen measure of association, MTTPMA. The building blocks of the measure are the Thick Pen Transform and the Thick Pen Measure of Association. The new measure is simple and visually interpretable. We demonstrate its potential application on synthetic financial contagion data. Keywords Cross-scale · Multiscale · Multivariate · Nonparametric · Time-varying
1 Introduction In some applications there is a need for a general comovement measure for time series; here comovement or codependence is understood in a broad sense. Such a measure needs to satisfy several requirements. Apart from being unconfined to a particular data-generating mechanism, i.e., nonparametric, ideally it should be: (1) applicable to stationary as well as nonstationary sequences; (2) applicable to more than two time series; (3) time-evolving, so that codependence can be monitored in time; (4) capable of capturing comovement with respect to a given time scale, for a range of time scales from small to large; (5) capable of quantifying codependence across different time scales. The last requirement is non-standard in the literature, but its importance has been highlighted by Gorrostieta et al. [5] in the context of an application in neuroscience. The use of cross-term comovement in finance and economics remains largely unexplored. Existing comovement measures such as cross-correlation coefficient or waveletand frequency-based metrics (this includes recently-proposed evolutionary dual frequency coherence (EDC) of [5]) meet some, but not all requirements (1–5). This is also the case with regression-based techniques used, for example, by Barberis et al. A. Jach (B) Hanken School of Economics, Helsinki, Finland e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_41
[1] and Chen et al. [2]. The objective of this article is to propose a general comovement measure that satisfies all requirements (1–5). To do so, we extend the Thick Pen Measure of Association, TPMA, of [4].
2 Thick Pen Transform, Thick Pen Measure of Association, and a New Measure

In this section we recall definitions of the Thick Pen Transform and the Thick Pen Measure of Association of [4] in order to introduce the new measure (more information and examples can be found in Fryzlewicz and Oh [4]). To that end, let X = (X_t)_{t=1}^T be a univariate time series and T the set of thickness parameters (positive constants, e.g., increasing integers), whose elements are denoted by τ_i, i = 1, 2, . . . , |T| (|T| is the cardinality of T). To simplify the notation, let τ represent one of the elements of T (τ = τ_i for some i). To transform X using TPT means to plot X_t versus t using pens of varying thickness τ ∈ T and to assemble all the boundaries obtained in the drawing process, denoting the collection of 2 · |T| sequences of length T by

$$TPT(X) = \left\{ L_t^{\tau}(X),\, U_t^{\tau}(X) \right\}_{t=1,\ldots,T;\ \tau \in \mathcal{T}}. \qquad (1)$$
The lower and upper boundaries of the area marked by a (square) pen of a given thickness τ appearing in Eq. (1) are defined as L τt (X ) = min(X t , X t+1 , . . . , X t+τ ), Utτ (X ) = max(X t , X t+1 , . . . , X t+τ ). They are obtained by simply computing moving-window minima and maxima over blocks of data of length τ + 1 starting at t. Visually, small thickness values bring out small-time-scale features of the data while large thickness values bring out largetime-scale features of the data. By zooming in (small τ ) and zooming out (large τ ), the TPT offers a multiscale representation of the data. In a demonstrative example of Fig. 1 we show three time series (top left) in solid, dashed and dotted, and their respective TPTs (top right) for a single thickness value τ = 1. In the latter, the three pairs of lines (or tubes) correspond to the boundaries of the areas obtained in the process of plotting each time series with a pen of thickness τ = 1. If we consider the proportion of the overlap of the three tubes, we will end up with a multivariate comovement measure called Thick Pen Measure of Association of [4]. T , Formally, if we have K time series X = (X (1) , X (2) , . . . , X (K ) ), X (k) = (X t(k) )t=1 k = 1, 2, . . . , K , and their respective TPTs (time series are assumed to be on the same scale, i.e., normalized) then the Thick Pen Measure of Association is defined as
Fig. 1 Top left: time series X (1) , X (2) , X (3) (solid, dashed, dotted, respectively). Top right: lower and upper boundaries of TPT-transformed time series X (1) , X (2) , X (3) with τ = 1 (solid, dashed, dotted). Bottom left: TPMA of X (1) , X (2) , X (3) for τ = 1, ρtτ (X (1) , X (2) , X (3) ) (thick solid) and time-averaged TPMA, ρ¯ τ (X (1) , X (2) , X (3) ) (thick dashed). Bottom right: MTTPMA of τ X (1) , X (2) , X (3) for τ = (τ (1) , τ (2) , τ (3) ) = (2, 1, 1), ρt (X (1) , X (2) , X (3) ) (thick solid) and timeτ (1) (2) (3) averaged MTTPMA, ρ¯ (X , X , X ) (thick dashed)
$$\rho_t^{\tau}(X^{(1)}, \ldots, X^{(K)}) = \frac{\min_k \big(U_t^{\tau}(X^{(k)})\big) - \max_k \big(L_t^{\tau}(X^{(k)})\big)}{\max_k \big(U_t^{\tau}(X^{(k)})\big) - \min_k \big(L_t^{\tau}(X^{(k)})\big)}. \qquad (2)$$
This measure is restricted to the interval (−1, 1] (the lower end-point is excluded) and has all finite moments. It is time-varying, but can be time-averaged to provide a one-number summary of the multivariate comovement, which we denote by ρ¯ τ (X (1) , . . . , X (K ) ). The last quantity is shown in the bottom left panel of Fig. 1 as a horizontal thick dashed line (Fryzlewicz and Oh [4] refer to a plot of it versus τ as a cross-spectrum) together with the trivariate TPMA, ρtτ (X (1) , X (2) , X (3) ), for τ = 1 (thick solid). At the first time instance t = 1, the value of TPMA is approximately −0.15. Graphically, this value is obtained as follows. It is negative because the limits (L τt , Utτ ) of the three tubes, (−0.5, −0.4), (−0.3, 0.7), (−0.2, 0.8) for solid, dashed and dotted, respectively, are disjoint (in the vertical direction). The value of 0.15 is obtained through dividing 0.2 by 1.3; 0.2 is the length of the longest gap (between solid and dotted tube limits) and 1.3 is the length of (the shortest interval containing) the union of all three tube limits. Obviously, the same answer comes = − 0.2 , from analytical considerations (Eq. (2)), which lead to the ratio −0.4−(−0.2) 0.8−(−0.5) 1.3 τ (k) τ (k) τ (k) because mink (Ut (X )) = −0.4, maxk (L t (X )) = −0.2, maxk (Ut (X )) = 0.8, mink (L τt (X (k) )) = −0.5. At the second time instance t = 2, the value of TPMA is approximately 0.43. It is positive because there no gaps between the three sets of (L τt , Utτ ) ((−0.5, 0.1), (−0.3, 0.1), (−0.2, 0.2), respectively) and the value of 0.43 comes from dividing 0.3 (the length of the overlap) by 0.7. Similar considerations can be repeated for other t’s. The case of ρtτ = 0 corresponds to a zero-length gap. We stress here the fact that in definitions of ρtτ and ρ¯ τ , a single thickness value τ is used to transform all the time series; the calculation can be repeated for other thickness values, τ = τi , i = 1, 2, . . . , |T |, but each time the same τ is used. The new
measure we present next uses K thickness values instead, that is, each series X^(k) is transformed with its own thickness τ^(k). By bringing out features of X^(1) associated with the time scale of (approximately) τ^(1) units, those of X^(2) associated with the time scale of τ^(2) units, . . ., those of X^(K) associated with the time scale of τ^(K) units, and finally combining them, allows us to develop a (τ^(1), τ^(2), . . . , τ^(K))-cross-scale dependence metric.

Multi-thickness TPMA, MTTPMA. We now have all the tools and notation in place to introduce the new measure, which we label the multi-thickness Thick Pen Measure of Association, MTTPMA. Let τ = (τ^(1), τ^(2), . . . , τ^(K)) be a K-dimensional vector of thickness values, such that τ^(k) ∈ T is the thickness used to transform the k-th time series X^(k). Define the MTTPMA as

$$\rho_t^{(\tau^{(1)}, \tau^{(2)}, \ldots, \tau^{(K)})}(X^{(1)}, \ldots, X^{(K)}) = \frac{\min_k \big(U_t^{\tau^{(k)}}(X^{(k)})\big) - \max_k \big(L_t^{\tau^{(k)}}(X^{(k)})\big)}{\max_k \big(U_t^{\tau^{(k)}}(X^{(k)})\big) - \min_k \big(L_t^{\tau^{(k)}}(X^{(k)})\big)}. \qquad (3)$$
3 Application on Synthetic Financial Contagion Data In this section we demonstrate potential application of MTTPMA on synthetic financial contagion data (see Sect. II of [3]); application of MTTPMA on real financial data can be found in Jach and Felixson [6]. Market contagion is described there (universally-accepted definition of market contagion does not exist) as a phe-
A General Comovement Measure for Time Series
283
Fig. 2 Simulated market returns r (1) (= X (1) ) and r (2) (= X (2) ) under contagion
nomenon in which cross-market comovement increases after a shock to one of T , T = 252, follow the markets. Let (daily) returns on market one, r (1) = (rt(1) )t=1 a uniform distribution on an interval from a1 to b1 , U (a1 , b1 ). In the low-volatility scenario, i.e., for t ∈ t L , where t L is a set containing low-volatility days, we put a = a1,low = −1 and b = b1,low = 1, while in the high-volatility scenario (t ∈ t H ), we set a = a1,high = −10 and b = b1,high = 10. Let (daily) returns on market two, T , satisfy rt(2) = 0.2rt(1) + t , where t ∼ U (−2, 2) for all t’s. We r (2) = (rt(2) )t=1 choose the 2nd quarter as the high-volatility period, t H = {64, 65, . . . , 125} (crisis period). The two series are shown in Fig. 2. We begin the analysis by normalizing the data using z-score transform. This makes the two series of returns more similar during the crisis and less similar otherwise (series r (1) becomes much ‘smaller’ for the non-crisis days compared to the raw series, while series r (2) only slightly ‘smaller’). This similarity during the crisis is manifested in the higher values of the TPMA in that period (diagonal subplots of Fig. 3) compared to the non-crisis period, with a smoother line for the thickness of 20 days (bottom right) compared to that of 5 days (top left). The TPMA corresponds to the MTTPMA shown in there with τ (1) = τ (2) , for τ (1) , τ (2) ∈ T , T = {5, 10, 20}. When we look at the cross-scale dependence quantified by the MTTPMA (off-diagonal subplots of Fig. 3), first we (1) notice that the effect is non-symmetrical, with ρt(τ ,5) (r (1) , r (2) ) being more ‘wiggly’ (2) than ρt(5,τ ) (r (1) , r (2) ). The latter, which picks up the crisis’ end-point more clearly, is the one that quantifies comovement of the 5-day features of market 1 returns (the source of the crisis) with the 10-day features of market 2 returns (top middle) and
Fig. 3 MTTPMA of the normalized r (1) (= X (1) ) and r (2) (= X (2) ) for τ (1) , τ (2) ∈ T , T = {5, 10, 20}
284
A. Jach
with the 20-day features of market 2 returns (top right). Similar conclusions apply to ρt(20,10) (r (1) , r (2) ) and ρt(10,20) (r (1) , r (2) ). In general, MTTPMAs with larger thickness values will tend to look less ‘wiggly’ compared to those that use smaller thickness values—this is in line with what one would expect after correlating large-scale features of time series compared to correlating small-scale features of times series. This application of MTTPMA allows one to gain new insights into the comovement of returns from the two markets by providing a measure of cross-term comovement. It is not possible to apply EDC of [5] here due to the lack of iid data replicates.
4 Summary The objective of this article was to propose a general comovement measure for time series. The generality of such measure was reflected in five conditions stated in the introduction. The new measure, MTTPMA, was formulated in terms of the Thick Pen Transform of [4] and obtained as a judicious extension of the Thick Pen Measure of Association of [4], leading to a simple and visually interpreted quantity, in line with its original counterparts. We explained the rationale behind the method using a demonstrative example and showed how MTTPMA can be applied to simulated financial contagion data. We hope that the flexibility and generality of MTTPMA will make it an attractive metric in financial and economic applications. Acknowledgements AJ was supported by the 2018 grant from the Finnish Foundation for Share Promotion (Börsstiftelsen).
References 1. Barberis, N., Shleifer, A., Wurgler, J.: Comovement. J. Financ. Econ. 75, 283–317 (2005) 2. Chen, H., Singal, V., Whitelaw, R.-F.: Comovement revisited. J. Financ. Econ. 121, 624–644 (2016) 3. Forbes, K., Rigobon, R.: No contagion, only interdependence: measuring stock market comovements. J. Financ. 57, 2223–2261 (2002) 4. Fryzlewicz, P., Oh, H.-S.: Thick-pen transformation for time series. J. Roy. Stat. Soc. B Met. 73, 499–529 (2011) 5. Gorrostieta, C., Ombao, H., von Sachs, R.: Time-dependent dual-frequency coherence in multivariate non-stationary time series. J. Time Ser. Anal. 40, 3–22 (2019) 6. Jach, A., Felixson, K.: Short-, long- and cross-term comovement of OMXH25 stocks. Nord. J. Bus. 68, 23–39 (2019)
Alternative Area Yield Index Based Crop Insurance Policies in Indonesia Dian Kusumaningrum, Rahma Anisa, Valantino Agus Sutomo, and Ken Seng Tan
Abstract Starting in 2015, farmers in Indonesia could protect their paddy cultivation by participating in the insurance scheme enacted by the Paddy Plant Business Insurance (PPBI) program. Despite it is heavily subsidized by the government and that it is a Multi-Peril Crop Insurance (MPCI), less than 1% of the total farming land are insured. The lack of demand could be due to the fact that the existing MPCI’s indemnity depends on the damaged farming land, as opposed to the actual paddy productivity. In view of this drawback, this paper considers two other insurance schemes, with one generalizes the existing MPCI by integrating paddy productivity to the indemnity, and the other is an area yield index (AYI) policy. By calibrating to the empirical data, we conduct extensive Monte Carlo studies to assess the efficiency of these schemes. The simulated premium of the existing MPCI is around IDR 250,000. This is higher than current premium and the underprice can be detrimental to the PPBI program’s sustainability. By comparing cost of insurance and tail risk (VaR and TVaR), we conclude that the first scheme is not a viable solution due to its high premium and high tail risk. The AYI, on the other hand, is a viable option. Its premium is comparable to the existing MPCI but with a much smaller tail risk. Furthermore, it has the potential of overcoming many of the drawbacks that are commonly found in MPCI (such as moral hazard and high administrative cost). Keywords Multi-Peril Crop Insurance (MCPI) · Area Yield Index Crop Insurance · Monte Carlo Simulation D. Kusumaningrum (B) · V. A. Sutomo Prasetiya Mulya University, BSD Tangerang, Banten, Indonesia e-mail: [email protected] V. A. Sutomo e-mail: [email protected] R. Anisa IPB University, Bogor, West Java, Indonesia e-mail: [email protected] K. S. Tan Division of Banking & Finance, Nanyang Business School, Nanyang Technological University, Jurong West, Singapore e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_42
285
286
D. Kusumaningrum et al.
1 Introduction As one of the largest rice producing countries in the world, the government of Indonesia is increasingly concern with protecting farmers’ income. Their vulnerabilities could be due to pests, diseases, and weather related risk (such as drought and flood) that severely impact their crop productivity. As a result, Indonesia launched a national insurance program under the Paddy Plant Business Insurance (PPBI) in 2015. The program is administered by the Ministry of Agriculture Indonesia (MoA) with Jasindo, a state-owned insurance company, underwrites the insurance policy. The policy insures small household paddy farmers with farming land up to 2 hectares (ha). While the policy is a Multi-Peril Crop Insurance (MPCI) that covers crop failures caused by floods, droughts, pests and disease, its indemnity is determined by the severity of the damaged farming land. The MPCI premium is set at IDR 180,000/ha but farmer only pays 20% (i.e. 36,000/ha) while the government coves the remaining 80% [1]. Although PPBI’s MPCI is heavily subsidized by the government, its participation rates have been very low, with less than 1% of the total farming land are insured. This is consistent with the survey [2] conducted in 2019 that only 40% of the farmers are covered under PPBI even though more than 70% of the farmers have insurance (life or health). Moreover, farmers’ willingness to pay for paddy insurance is around IDR 50,000, which exceeds the existing MPCI premium. Hence the cost of insurance is not an issue. The lack of demand of MPCI could be related to the underlying features of the insurance. If farmers do not see the value of the existing MPCI, it is unlikely they will pay for the insurance even though it is within their budget. For this reason, the main objective of this paper is to consider other viable insurance schemes. Our proposed schemes are described in the next section. Section 3 presents and discusses the simulation results and Sect. 4 concludes the paper.
2 Developing Alternative Area Yield Index Policy Suppose a particular county in Indonesia comprises m villages, with n_i paddy farmers in village i, where i = 1, 2, ..., m. Our objective is to provide paddy insurance to the farmers in this county. Let us assume that for a particular design of MPCI, its indemnity function admits the following representation:

$$
\text{indemnity} = \left(\frac{D_{ij}}{L_{ij}} \times \mathbf{1}_{\left\{\frac{D_{ij}}{L_{ij}} > 0.75\right\}}\right) \cdot y_{ij} \cdot SI \cdot L_{ij}, \quad i = 1, 2, \dots, m,\; j = 1, 2, \dots, n_i, \tag{1}
$$

where the indicator function 1_{A} equals 1 if A is true and 0 otherwise, and
D_ij is the damaged farming land for farmer j in the i-th village,
L_ij is the total insured farming land for farmer j in the i-th village,
y_ij is the productivity of paddy (measured in tonne/ha) under normal circumstances for farmer j in the i-th village,
SI is the sum insured in Rupiah.
The indicator function ensures that the farmers must suffer a sufficient loss (i.e. exceeding the 75% threshold damage level) before an indemnity can be triggered. From contracting theory, it is desirable to have some minimum threshold level in order to eliminate small claims and reduce administration cost. The sum insured SI represents the compensation to the farmers for each tonne of paddy loss. The variable y_ij measures the actual productivity of paddy (in tonne/ha) if there is no damage to the farming land. Hence multiplying y_ij by D_ij/L_ij captures the actual loss of paddy productivity attributed to the damaged farming land. Further multiplying by SI and L_ij converts the loss of productivity into monetary value while reflecting the size of the insured farming land. It is useful to contrast MPCI (1) with the existing PPBI's MPCI. Broadly speaking, if y_ij is a constant and set to one, then (1) resembles PPBI's MPCI, thus highlighting that the key difference lies in y_ij. In other words, the indemnity for PPBI depends only on the severity of the damaged farming land, while (1) takes into account both the severity of the damaged farming land and the actual paddy productivity. The latter design appears to capture more adequately the actual loss incurred by the farmers. We now discuss our proposed area yield index (AYI) policy for the Indonesian farmers. Unlike MPCI, which is an indemnity-based insurance, AYI is an index-based insurance. This means that its indemnity is linked to a well-defined reference index, which in this case is the yield of a particular "area". The following gives the indemnity of farmer j in village i covered by AYI:

$$
\text{indemnity} = \max(y_c - \bar{y}_i, 0) \cdot SI \cdot L_{ij}, \quad i = 1, 2, \dots, m, \tag{2}
$$
where y_c is the critical area yield index for an area (tonne/ha), and ȳ_i is the (actual) average paddy productivity for the i-th village (tonne/ha). In this design, it is convenient to interpret "area" as the county, so that the critical area yield index y_c is predetermined and is the same for all farmers within the county. Furthermore, ȳ_i is the reference index that determines the indemnity. When the actual productivity of the entire village is good on average, then there will be no compensation from the policy. However, if, on average, the actual productivity of the entire village drops below the critical yield y_c, then the indemnity will be triggered and all farmers within the village will be compensated proportionally (subject to the insured farming land size), irrespective of their actual productivity. Relating the indemnity to the average productivity of the entire village, as opposed to basing it solely on productivity at the individual level, has the advantage of alleviating farmers' moral hazard. Farmers, on the other hand, are exposed to basis risk, i.e. the mismatch between actual loss and indemnity payment. See, for example, [3] for more discussion on AYI policy. To evaluate the above different paddy insurance schemes, we resort to Monte Carlo studies. The assumed models are calibrated to the historical data provided by
MoA, Central Bureau of Statistics of Indonesia (CBS), as well as the proprietary experience data from Jasindo. For example, the AYI policy requires as input y_c, the critical area (average) yield for the entire area (i.e. county). This parameter is estimated historically based on MoA's productivity data over the years 2005 to 2015. Our Monte Carlo studies also take into consideration the farmers' perception surveys and focus group discussions conducted in 2019 in six provinces in Indonesia [2]. Moreover, we assume that the insurance premium is calculated according to the following three commonly adopted premium principles [4]: (i) pure premium; (ii) (loaded) expected premium; (iii) standard deviation premium. From Jasindo's claim data, the safety loading factor for the latter two premium principles is estimated to be 12%. Finally, to gauge their relative efficiency we compare their loss distributions as well as risk measures such as Value at Risk (VaR) and Tail Value at Risk (TVaR) to better evaluate the insurer's risk exposure. The steps for our Monte Carlo studies are summarized as follows:
1. Simulate paddy productivity y_ij for farmer j in the i-th village, where i = 1, 2, ..., m, j = 1, 2, ..., n_i. Here we assume the county has 40 villages, with each village consisting of 100 farmers; i.e. m = 40 and n_i = 100 for all i. Furthermore, the paddy productivity is assumed to be lognormally distributed with mean and variance estimated from the MoA data and with a correlation coefficient of 0.5 between farmers' productivity.
2. From the simulated y_ij in Step 1, calculate the indemnity based on either (1) or (2). Average these indemnities and keep track of the resulting value.
3. Repeating Steps 1 and 2 100,000 times produces a distribution of claims by the farmers (or equivalently the loss exposure to the insurer for underwriting the respective insurance policy). The simulated distribution, in turn, allows us to compute measures such as VaR and TVaR at 90% and 95% confidence levels.
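For illustration only, the following R sketch mimics Steps 1-3 for the AYI design of Eq. (2). It is not the authors' code: the yield parameters mu and sigma, the critical yield yc, the sum insured SI, the land size L and the (reduced) number of replications are placeholders, not the values calibrated to the MoA and Jasindo data, and the standard deviation principle is applied here to the aggregate average claim as one possible reading.

```r
set.seed(1)
m <- 40; n_i <- 100; rho <- 0.5            # villages, farmers per village, correlation
mu <- log(5); sigma <- 0.25                # assumed lognormal parameters for yields
yc <- 4.5; SI <- 3.5e6; L <- 1             # assumed critical yield, sum insured, 1 ha
theta <- 0.12                              # safety loading factor (from the paper)
B <- 10000                                 # replications (the paper uses 100,000)

avg_claim <- replicate(B, {
  z_common <- rnorm(1)                     # one-factor construction: any two farmers'
  z_idio   <- matrix(rnorm(m * n_i), m)    # log-yields have correlation rho
  y <- exp(mu + sigma * (sqrt(rho) * z_common + sqrt(1 - rho) * z_idio))
  y_bar <- rowMeans(y)                     # village-level average productivity
  mean(pmax(yc - y_bar, 0) * SI * L)       # average AYI indemnity per policy, Eq. (2)
})

pure   <- mean(avg_claim)                  # (i) pure premium
loaded <- (1 + theta) * pure               # (ii) loaded expected premium
sd_pr  <- pure + theta * sd(avg_claim)     # (iii) standard deviation premium
VaR95  <- unname(quantile(avg_claim, 0.95))
TVaR95 <- mean(avg_claim[avg_claim >= VaR95])
c(pure = pure, loaded = loaded, sd = sd_pr, VaR95 = VaR95, TVaR95 = TVaR95)
```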
3 Results and Discussions The Monte Carlo procedure described above is applied to both MPCI and AYI schemes. For MPCI, we consider cases with general y_ij and with y_ij = 1. Recall that when y_ij = 1, the resulting MPCI mimics the paddy insurance under the PPBI program. Figure 1 plots the corresponding histograms of claims and Table 1 provides their corresponding premiums, VaR, and TVaR. It is of interest to note that all three loss distributions appear to closely follow a normal distribution. The estimated premiums from MPCI with y_ij = 1 range from IDR 238,608 to IDR 267,242 for the three premium principles. Assuming these results are reasonable, then the existing PPBI's MPCI is underpriced, since it currently charges IDR 180,000. The PPBI's premium was calculated by assuming 3% of the average amount of input costs needed for paddy farming. The assumption of 3% seems arbitrary. Furthermore, the premium has remained at that level since the inception of the PPBI program. Our simulated results, on the other hand, are based on a properly calibrated model.
Fig. 1 Histogram of insurer's loss exposure under three paddy insurance designs: (a) MPCI, (b) MPCI with y_ij = 1, (c) AYI

Table 1 Comparison of premium, VaR, and TVaR under three insurance designs (in Rupiah)

| Insurance design | Pure premium | Loaded premium: Expectation | Loaded premium: Std. Dev. | VaR (90%) | TVaR (90%) | VaR (95%) | TVaR (95%) |
|---|---|---|---|---|---|---|---|
| MPCI | 1,056,508 | 1,183,288 | 1,061,050 | 4,383,784 | 5,113,969 | 4,966,683 | 5,592,307 |
| MPCI with y_ij = 1 | 238,608 | 267,242 | 239,397 | 988,288 | 1,036,278 | 1,042,778 | 2,099,787 |
| AYI | 255,013 | 285,614 | 261,729 | 979,492 | 1,199,017 | 1,136,260 | 1,279,572 |
While the underpricing favours paddy farmers, it raises a concern for the government, especially in view of the PPBI program's sustainability. Recall that one drawback of the PPBI's MPCI is that its indemnity depends only on the damaged farming land. If we also wish to integrate the actual paddy productivity into the existing PPBI program, as we have considered in the indemnity function (1), the resulting insurance policy turns out to be very expensive and with a much higher risk exposure to the insurer. As shown in Table 1, the premium for MPCI is more than IDR 1 million, which translates into 4.5 times more expensive than the PPBI's MPCI. In terms of VaR and TVaR, the risk measures from MPCI are almost five times higher than the corresponding risk measures from PPBI's MPCI (except for TVaR at the 95% confidence level, which is about 2.7 times larger). In view of the high premium and the high risk exposure, these results suggest that integrating paddy productivity into the existing MPCI policy (as stipulated in (1)) is not a viable solution. Let us now consider our proposed AYI policy. The estimated premium for the AYI policy is very close to that from MPCI with y_ij = 1. Despite its premium being slightly higher, it is still well below the level of willingness to pay by the farmers, as reflected in the survey conducted in 2019 (see Anisa et al. [2]). Furthermore, the VaR for both insurance schemes is very similar, although as we increase the confidence level from 90 to 95%, the TVaR for PPBI's MPCI almost doubles, indicating a much higher tail risk. Hence, in terms of tail risk, the proposed AYI is preferred. Also, the AYI has the added advantage that it overcomes a number of drawbacks of PPBI's MPCI, including moral hazard. Furthermore, administratively the AYI has a lower cost.
4 Conclusion This paper attempted to improve the existing paddy insurance by considering two other insurance schemes. The first scheme enhanced the existing PPBI's MPCI by integrating paddy productivity into the indemnity function, while the second scheme was an index-based insurance with area yield as its index. By calibrating the models to the empirical data, extensive Monte Carlo studies were performed. Based on our simulation results, we concluded that the first proposed scheme was not viable given that it was too costly and that it exposed the insurer to a significant tail risk. The alternative AYI policy, on the other hand, was a viable option in that its premium was comparable to the existing PPBI's MPCI and yet with a much smaller tail risk. Moreover, it had less moral hazard and was a more cost-effective program. The downside of AYI was that it introduced basis risk to the farmers. Such risk should not be ignored, though it could be addressed adequately via an optimal construction of "village" and "county". Finally, the mechanism of AYI is vastly different from the existing MPCI that paddy farmers are familiar with. If the government were to adopt AYI, it would require additional education and socialization for the paddy farmers. Acknowledgements The authors acknowledge the funding support from the Risk Management, Economic Sustainability and Actuarial Science Development in Indonesia (READI) project funded by Global Affairs Canada. The authors are grateful to the Ministry of Agriculture Indonesia and Jasindo for sharing their data and knowledge, as well as to Jonathan Hoseana, Ph.D., for proofreading the first draft of this paper.
References
1. Ministry of Agriculture: Directorate general of agricultural infrastructure and facilities, pedoman bantuan premi asuransi usaha tani padi tahun anggaran. Technical report (2017). http://psp.pertanian.go.id/index.php/page/newsdetail/39/PedomanBantuanPremiAsuransiUsahaTaniPadiTA2017.pdf
2. Anisa, R., Kusumaningrum, D., Sutomo, V.A., Tan, K.S.: Potential of reducing crop insurance subsidy based on willingness to pay and random forest analysis. Working Paper (2020)
3. Skees, J.R., Black, J.R., Barnett, B.J.: Designing and rating an area yield crop insurance contract. Am. J. Agric. Econ. 79(2), 430–438 (1997)
4. Porth, L., Tan, K.S., Weng, C.: Optimal reinsurance analysis from a crop insurer's perspective. Agric. Finance Rev. 73(2), 310–328 (2013)
Clustering Time Series by Nonlinear Dependence Michele La Rocca and Luca Vitale
Abstract The problem of time series clustering has attracted growing research interest in the last decade. The most popular clustering methods assume that the time series are only linearly dependent, but this assumption usually fails in practice. To overcome this limitation, in this paper, we study clustering methods applicable to time series with a general (possibly nonlinear) dependence structure. We propose a dissimilarity measure based on the auto distance correlation function which is able to detect both linear and nonlinear dependence structures. Once the pairwise dissimilarity matrix for the time series has been obtained, a standard clustering algorithm, such as a hierarchical clustering algorithm, can be used. Numerical studies based on Monte Carlo experiments show that our method performs reasonably well. Keywords Clustering · Nonlinear time series · Autodistance correlation function
1 Introduction Large amounts of data relating to time series are frequently collected in the field of economics and finance. In this framework, it is often desirable to identify groups of homogeneous time series, using clustering techniques. Homogeneity implies that time series within the group are similar while those between groups have different properties. This analysis can provide valuable information on its own, but it can also be used as a preliminary step in time series modelling and forecasting. On one hand, pooling similar datasets generally leads to better estimates; on the other, when prediction is needed on a large number of time series, one could be interested in implementing a common analysis within each group. Moreover, in financial M. La Rocca (B) · L. Vitale Department of Economics and Statistics, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy e-mail: [email protected] L. Vitale e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_43
applications, clustering is often used as an important building block before further processing, such as portfolio selection, is performed. The aim of this paper is to propose and discuss a novel clustering procedure based on the auto distance correlation function, an effective tool recently introduced to measure (possibly) nonlinear dependence across time series. The basic idea is to base the clustering procedure on similarity/dissimilarity measures which are able to include in the clustering algorithm nonlinear features of time series (such as volatility) that are not usually detected by the standard correlation measures, unless appropriate transformations of the original time series are performed. The paper is organized as follows. In Sect. 2 a dissimilarity measure based on distance autocorrelation is introduced and the clustering procedure to find groups in large vectors of time series is proposed and explained. In Sect. 3 the results of a small Monte Carlo study are reported and discussed. Some concluding remarks, along with possible lines of future research, close the paper.
2 Clustering Based on the Auto Distance Correlation Function The distance covariance function and the auto distance correlation function for univariate and multivariate time series are defined in [1–3] along with their estimators and asymptotic properties under general conditions. Suppose that we have observed a d-vector of stationary time series, namely X_{i,t}: i = 1, 2, ..., d; t = 1, 2, ..., T. Let R_{ii}(h) be the auto distance correlation of time series i at lag h and let R_{ij}(h) be the distance cross-correlation between the time series i and j at lag h. Define:

$$
\mathbf{B}_k =
\begin{bmatrix}
B(0) & B(1) & B(2) & \cdots & B(k) \\
B(-1) & B(0) & B(1) & \cdots & B(k-1) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
B(-k) & B(-k+1) & B(-k+2) & \cdots & B(0)
\end{bmatrix}
$$

where:

$$
B(h) =
\begin{bmatrix}
R_{ii}(h) & R_{ij}(h) \\
R_{ji}(h) & R_{jj}(h)
\end{bmatrix}
$$

with $\hat{R}_{ij}(h)$ being the positive square root of:

$$
\hat{R}^2_{ij}(h) = \frac{\hat{V}^2_{ij}(h)}{\sqrt{\hat{V}^2_{ii}(0)\,\hat{V}^2_{jj}(0)}}.
$$

Note that, in general, $\hat{V}^2_{ij}(h) \neq \hat{V}^2_{ji}(h)$ for $i \neq j$, since they measure different dependence structures between the series $X_{i,t}$ and $X_{j,t}$, for all $i, j = 1, 2, \dots, d$:
$$
\hat{V}^2_{ij}(h) =
\begin{cases}
\dfrac{1}{(T-h)^2} \displaystyle\sum_{t,s=1+h}^{T} A^i_{ts} B^j_{ts}, & 0 \le h \le (T-1) \\[2ex]
\dfrac{1}{(T+h)^2} \displaystyle\sum_{t,s=1}^{T+h} A^i_{ts} B^j_{ts}, & -(T-1) \le h \le 0
\end{cases}
$$
where $A^i_{ts} = a^i_{ts} - \bar{a}^i_{t\cdot} - \bar{a}^i_{\cdot s} + \bar{a}^i_{\cdot\cdot}$ with $a^i_{ts} = |X_{t,i} - X_{s,i}|$, $\bar{a}^i_{t\cdot} = \big(\sum_{s=1}^{n-j} a^i_{ts}\big)/(n-j)$, $\bar{a}^i_{\cdot s} = \big(\sum_{t=1}^{n-j} a^i_{ts}\big)/(n-j)$ and $\bar{a}^i_{\cdot\cdot} = \big(\sum_{t=1}^{n-j}\sum_{s=1}^{n-j} a^i_{ts}\big)/(n-j)^2$. Similarly, define the quantities $b^j_{ts} = |Y_{t,j} - Y_{s,j}|$ to obtain $\bar{b}^j_{t\cdot}$, $\bar{b}^j_{\cdot s}$, $\bar{b}^j_{\cdot\cdot}$ and $B^j_{ts}$ (see [4] for details). A convenient measure of dissimilarity between time series i and j is given by their joint dependency, defined as:

$$
S_{ij}(k) = |\mathbf{B}_k|^{1/(2(k+1))},
$$

where $|\mathbf{B}_k|$ denotes the determinant of the matrix $\mathbf{B}_k$. The dissimilarity measure reaches its largest value, one, when the two series are independent, and will be zero if they are identical. This measure is the analogue of the measure proposed in [5] when the correlation matrix is replaced with a measure of nonlinear dependence, the auto distance correlation matrix. By using simple or partial correlation, nonlinear features such as volatility or other nonlinear behaviour cannot be measured directly. A possible workaround would be looking at the autocorrelation of the absolute values or of the squared residuals of some parametric model fit. But this adds additional complexity in specifying and estimating an appropriate model, which might not be straightforward in the nonlinear case. The use of the auto distance correlation matrix, instead, allows one to deal nicely with both linear and nonlinear time series, with possibly very different dynamic structures. The general clustering procedure can be implemented in two steps:
Step 1: the series are split into groups by their dependence, measured by the determinant of their distance cross-correlation matrix, used as input of an agglomerative hierarchical clustering; this step will select groups of time series that are cross-dependent.
Step 2: inside each group the series are split by putting together series with a similar structure, measured by using auto distance correlation and, again, using these measures as input of agglomerative hierarchical clustering.
In order to select the value of k in the definition of the matrix $\mathbf{B}_k$, we might use some prior information, when available, or we may think of k as a meta-parameter to be identified in a data-driven manner. One possible approach is to fix k as the largest significant lag, that is the last lag that contains additional information about both the cross correlations and the autocorrelations of the two series, by using some resampling procedure based on subsampling or wild bootstrap as detailed in [4].
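As an illustration of Step 1, the following R sketch builds the pairwise dissimilarity matrix S_ij(k) and feeds it to an agglomerative hierarchical clustering. It is not the authors' code: the lagged distance correlations are computed with dcor() from the energy package as a stand-in for the dCovTS estimators cited in [4], and the helper names (lag_dcor, B_mat, S_ij, cluster_series) are ours.

```r
library(energy)   # dcor() used here in place of the dCovTS estimators

lag_dcor <- function(x, y, h) {          # distance correlation between x_t and y_{t+h}
  T <- length(x)
  if (h >= 0) dcor(x[1:(T - h)], y[(1 + h):T]) else dcor(x[(1 - h):T], y[1:(T + h)])
}

B_mat <- function(x, y, h) {             # 2x2 block B(h) of auto/cross distance correlations
  matrix(c(lag_dcor(x, x, h), lag_dcor(y, x, h),
           lag_dcor(x, y, h), lag_dcor(y, y, h)), 2, 2)
}

S_ij <- function(x, y, k) {              # dissimilarity S_ij(k) = |B_k|^(1/(2(k+1)))
  blocks <- lapply(-k:k, function(h) B_mat(x, y, h))
  Bk <- matrix(0, 2 * (k + 1), 2 * (k + 1))         # block-Toeplitz assembly of B_k
  for (r in 0:k) for (cc in 0:k) {
    Bk[(2 * r + 1):(2 * r + 2), (2 * cc + 1):(2 * cc + 2)] <- blocks[[cc - r + k + 1]]
  }
  max(det(Bk), 0)^(1 / (2 * (k + 1)))    # guard against small negative determinants
}

# X: T x d matrix of stationary series; the dissimilarities feed hclust() (Step 1)
cluster_series <- function(X, k = 2, method = "average") {
  d <- ncol(X); D <- matrix(0, d, d)
  for (i in 1:(d - 1)) for (j in (i + 1):d) D[i, j] <- D[j, i] <- S_ij(X[, i], X[, j], k)
  hclust(as.dist(D), method = method)
}
```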
3 Numerical Results The procedure has two steps and so the simulation study has two parts. The first part is used to evaluate the performance of the auto distance correlation function in clustering independent time series; the second part is used to evaluate the performance of the dissimilarity measure based on the auto distance correlation function in clustering dependent series. Here just this latter part will be presented and discussed. The simulation design considers both a linear and a nonlinear scenario. The linear scenario is the same as in [5]. It includes 15 linear time series generated according to the model x_{i,t} = φ_i x_{i,t−1} + ε_{i,t} with φ_i = 0.9 for i = 1, 2, ..., 5 and φ_i = 0.2 for i = 6, 7, ..., 15. Thus, from the univariate structure we have two clear groups. The cross dependency is introduced through the innovations ε_{i,t}, which are Gaussian white noise variables but with a dependence structure ρ(i, j) = E(ε_{i,t} ε_{j,t}) which depends on the group correlation structure. Five different group correlation structures are defined by indicating the non-null cross-correlations. All the other possible cross-correlations are assumed to be zero.
D1: ρ(i, i + 1) = 0.5, i = 1, 2, ..., 9. It includes a group of 10 correlated time series and 5 isolated time series.
D2: ρ(i, i + 1) = 0.5, i = 1, 2, ..., 9 and ρ(i, i + 1) = 0.5, i = 11, 12, ..., 14. It includes two groups of correlated time series.
D3: ρ(i, j) = 0.9 for i = 1, ..., 9 and j = i + 1, ..., 10. It includes a group of 10 strongly correlated and 5 isolated time series.
D4: ρ(i, j) = 0.9 for i = 1, ..., 9 and j = i + 1, ..., 10 and ρ(i, i + 1) = 0.5 for i = 11, ..., 14. It includes a group of 10 strongly correlated and a group of 5 correlated time series.
D5: ρ(i, j) = 0.9 for i = 1, ..., 9 and j = i + 1, ..., 10 and for i = 11, ..., 14 and j = i + 1, ..., 15. It includes two groups of 10 and 5 strongly correlated time series.
The nonlinear scenario has the same group correlation structure as before, but it uses nonlinear time series generated according to the bilinear model x_{i,t} = a_i x_{i,t−1} + b_i x_{i,t−1} ε_{i,t−1} + ε_{i,t}, where b_i = 0.7 for i = 1, 2, ..., 5 and b_i = 0.2 for i = 6, 7, ..., 15, while a_i = 0.2 for i = 1, 2, ..., 15. A minimal simulation sketch of design D1 is given below. For the comparison of the clustering results, we have considered a measure of similarity between two different partitions as defined in [6]. The index returns values from the interval [0, 1], with the upper limit 1 denoting perfect equality of the two partitions. All computations are made in R using the package dCovTS [4]. The means of the similarity index from 1,000 Monte Carlo replicates for the 5 group correlation structures, when T = 100 and T = 300, are reported in Table 1, for the linear scenario, and Table 2, for the nonlinear scenario. For both scenarios, the performance increases when the time series length increases. When there is a strong group correlation structure (designs D3 and D5), clustering is excellent for all clustering methods used, even for small sample sizes. When there is a weaker correlation structure, results may vary across the clustering methods used.
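The R sketch below is one possible implementation of design D1 in the linear scenario; the burn-in length and the use of MASS::mvrnorm for the correlated innovations are our assumptions, not details taken from the paper.

```r
# Simulate design D1: 15 AR(1) series, phi = 0.9 (i = 1..5) and 0.2 (i = 6..15),
# with rho(i, i+1) = 0.5 for i = 1..9 in the innovation covariance matrix.
simulate_D1 <- function(Tlen = 100, phi = c(rep(0.9, 5), rep(0.2, 10))) {
  d <- length(phi)
  Sigma <- diag(d)
  for (i in 1:9) Sigma[i, i + 1] <- Sigma[i + 1, i] <- 0.5
  eps <- MASS::mvrnorm(Tlen + 50, mu = rep(0, d), Sigma = Sigma)   # 50-step burn-in
  X <- matrix(0, Tlen + 50, d)
  for (t in 2:(Tlen + 50)) X[t, ] <- phi * X[t - 1, ] + eps[t, ]
  X[-(1:50), ]
}

X  <- simulate_D1(Tlen = 300)
hc <- cluster_series(X, k = 2, method = "average")   # from the sketch in Sect. 2
plot(hc)
```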
Table 1 Mean values of the similarity index from 1,000 Monte Carlo replicates for the 5 group correlation structures, when T = 100 and T = 300, for the linear models scenario

| Method | D1 (100) | D1 (300) | D2 (100) | D2 (300) | D3 (100) | D3 (300) | D4 (100) | D4 (300) | D5 (100) | D5 (300) |
|---|---|---|---|---|---|---|---|---|---|---|
| Average | 0.81 | 0.91 | 0.65 | 0.68 | 1.00 | 1.00 | 0.75 | 0.76 | 1.00 | 1.00 |
| Centroid | 0.83 | 0.91 | 0.65 | 0.67 | 1.00 | 1.00 | 0.72 | 0.71 | 1.00 | 1.00 |
| Complete | 0.61 | 0.63 | 0.67 | 0.66 | 1.00 | 1.00 | 0.88 | 0.87 | 1.00 | 1.00 |
| Mcquitty | 0.78 | 0.87 | 0.70 | 0.78 | 1.00 | 1.00 | 0.77 | 0.77 | 1.00 | 1.00 |
| Median | 0.88 | 0.92 | 0.65 | 0.67 | 1.00 | 1.00 | 0.72 | 0.72 | 1.00 | 1.00 |
| Single | 0.65 | 0.66 | 0.65 | 0.65 | 1.00 | 1.00 | 0.71 | 0.71 | 1.00 | 1.00 |
| Ward.D | 0.67 | 0.74 | 0.66 | 0.67 | 0.65 | 0.70 | 0.67 | 0.71 | 1.00 | 1.00 |
| Ward.D2 | 0.67 | 0.70 | 0.65 | 0.67 | 0.84 | 1.00 | 0.78 | 0.90 | 1.00 | 1.00 |
Table 2 Mean values of the similarity index from 1,000 Monte Carlo replicates for the 5 group correlation structures, when T = 100 and T = 300, for the nonlinear models scenario

| Method | D1 (100) | D1 (300) | D2 (100) | D2 (300) | D3 (100) | D3 (300) | D4 (100) | D4 (300) | D5 (100) | D5 (300) |
|---|---|---|---|---|---|---|---|---|---|---|
| Average | 0.66 | 0.71 | 0.66 | 0.69 | 1.00 | 1.00 | 0.85 | 0.87 | 1.00 | 1.00 |
| Centroid | 0.77 | 0.85 | 0.65 | 0.65 | 1.00 | 1.00 | 0.74 | 0.75 | 1.00 | 1.00 |
| Complete | 0.52 | 0.56 | 0.65 | 0.64 | 1.00 | 1.00 | 0.88 | 0.87 | 1.00 | 1.00 |
| Mcquitty | 0.65 | 0.72 | 0.69 | 0.79 | 1.00 | 1.00 | 0.85 | 0.87 | 1.00 | 1.00 |
| Median | 0.81 | 0.90 | 0.64 | 0.65 | 1.00 | 1.00 | 0.74 | 0.75 | 1.00 | 1.00 |
| Single | 0.76 | 0.91 | 0.66 | 0.83 | 1.00 | 1.00 | 0.84 | 0.96 | 1.00 | 1.00 |
| Ward.D | 0.42 | 0.45 | 0.71 | 0.82 | 0.77 | 0.82 | 1.00 | 1.00 | 1.00 | 1.00 |
| Ward.D2 | 0.43 | 0.46 | 0.70 | 0.82 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
In this scenario, there is a higher difference between the linear and nonlinear scenarios, with better results for the nonlinear case (see Fig. 1), at least for some of the clustering methods. This might suggest the opportunity to use some clustering procedure that can take into account the different features that time series might exhibit in real data.
4 Concluding Remarks The capacity to successfully cluster large sets of time series is a prevalent domain of research in many fields and particularly in economics and finance. In this paper, a novel procedure for clustering time series that takes into account their nonlinear cross dependency has been presented and discussed. Because of the variety of the structures of the dependence of time series, especially in the nonlinear case, we have avoided model-based clustering and used the distance-based method instead. The
Fig. 1 Mean values of the similarity index from 1,000 Monte Carlo replicates for the 5 group correlation structures, when T = 100 and T = 300 for the linear and the nonlinear models scenario, for alternative grouping structures (D1, D2,. . . , D5) and different linkage methods
procedure is based on pairwise measures involving the determinant of the cross-distance correlation matrices from lag zero to lag k. A small Monte Carlo study has shown the good performance of the clustering algorithm for small sample sizes.
References
1. Fokianos, K., Pitsillou, M.: Consistent testing for pairwise dependence in time series. Technometrics 59, 262–270 (2017)
2. Fokianos, K., Pitsillou, M.: Testing independence for multivariate time series via the auto-distance correlation matrix. Biometrika 105, 337–352 (2018)
3. Zhou, Z.: Measuring nonlinear dependence in time-series, a distance correlation approach. J. Time Ser. Anal. 33, 438–457 (2012)
4. Pitsillou, M., Fokianos, K.: dCovTS: distance covariance/correlation for time series. R J. 8, 324–340 (2016)
5. Alonso, A.M., Pena, D.: Clustering time series by linear dependency. Stat. Comput. 29, 655–676 (2019)
6. Lange, T., Roth, V., Braun, M.L., Buhmann, J.M.: Stability-based validation of clustering solutions. Neural Comput. 16, 1299–1323 (2004)
Quantile Regression Neural Network for Quantile Claim Amount Estimation Alessandro G. Laporta, Susanna Levantesi, and Lea Petrella
Abstract Quantile Regression to estimate the conditional quantile of the claim amount for car insurance policies has already been considered by Heras et al. (An application of two-stage quantile regression to insurance ratemaking. Scand. Actuar. J. 2018, 2018) [1] and others. In this paper, we explore two alternative approaches: the first involves Quantile Regression Neural Networks (QRNN), while the second is an extension of the Combined Actuarial Neural Network (CANN) by Schelldorfer and Wüthrich (Nesting classical actuarial models into neural networks. SSRN, 2019) [2], where we nest the Quantile Regression model into the structure of a neural network (Quantile-CANN). This technique captures additional information with respect to the simple Quantile Regression, representing nonlinear relationships between the covariates and the dependent variable, and involving possible interactions between predictors. To compute the conditional quantile of the total claim amount for a generic car insurance policy, we adopt the two-part model approach discussed by Heras et al. (An application of two-stage quantile regression to insurance ratemaking. Scand. Actuar. J. 2018, 2018) [1]. In a first step, we fit a logistic regression to estimate the probability of a positive claim. Then, conditional on a positive outcome, we use QRNN and Quantile-CANN to estimate the conditional quantile of the total claim amount. The simulation results show that QRNN and Quantile-CANN exhibit an overall better performance in terms of quantile loss function with respect to the classical Quantile Regression. Keywords Quantile regression · Neural networks · Insurance pricing
A. G. Laporta (B) · S. Levantesi Department of Statistical Sciences, Sapienza University of Rome, Piazzale Aldo Moro 5, Rome, Italy e-mail: [email protected] S. Levantesi e-mail: [email protected] L. Petrella MEMOTEF Department, Sapienza University of Rome, Via del Castro Laurenziano 9, Rome, Italy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_44
1 Introduction The use of machine learning for insurance pricing has flourished in recent years and the related literature is increasing accordingly: [3] performs pricing optimization using several machine learning models and [2] employs neural networks to enhance GLM performance. However, such techniques only give an estimation of the expected value of the chosen variable (claim frequency or claim severity), since they are designed to return the pure premium of a specific policy, i.e. the expected value of the total claim amount. Hence, these models, even if they offer insight about the average loss of a policy, are unable to provide the modeler with information about its potential riskiness, e.g. the quantile of the total claim amount. The quantile approach provides information about the whole distribution of the total claim amount, enabling the insurer to soundly gauge the portfolio riskiness by computing specific risk measures (e.g. Value-at-Risk). This approach appears particularly suitable when calculating the premium safety loadings. Modeling the quantile claim amount through Quantile Regression (QR) has already been discussed by Heras et al. [1] and Baione and Biancalana [4]. However, such an approach may fail to account for nonlinear relationships between the covariates and the dependent variable or interactions between predictors. To fill this gap, we propose two different methods to estimate the conditional quantile of the total claim amount for a group of car insurance policies: the first involves Quantile Regression Neural Networks (QRNN), while the second is an extension of the Combined Actuarial Neural Network (CANN) by Schelldorfer and Wüthrich [2], where we nest the Quantile Regression model into the structure of a neural network (Quantile-CANN henceforth). This technique is able to represent additional information incorporated in the data, not captured by the simple QR. In the numerical application, we compare the performances obtained by the different models (QR, QRNN and Quantile-CANN), showing that our models exhibit an overall better performance in terms of quantile loss function with respect to the classical QR. We then exploit the estimates provided by the different models in the calculation of a quantile-based insurance premium, the so-called Quantile Premium Principle (QPP), proposed by Heras et al. [1].
2 Methodology To compute the τ-th quantile of the total claim amount for a generic policy i, we adopt the two-stage approach discussed by Heras et al. [1] as follows:
• Using Logistic Regression we estimate the no-claim probability p_i for each policy according to the set of covariates x defined on the feature space χ. Such probability is involved in the computation of the quantile level τ_i, obtained as τ_i = (τ − p_i)/(1 − p_i). For
this level we have that Q_{S_i|N_i>0}(τ_i|x_i) = Q_{S_i}(τ|x_i), where: S_i is the claim amount for policyholder i, N_i is the number of claims submitted by policyholder i and Q_{S_i|N_i>0}(τ_i|x_i) is the quantile (at level τ_i) of the total claim amount given by the policyholders that reported at least one claim.
• In the second stage we estimate Q_{S_i|N_i>0}(τ_i|x_i) applying QRNN and Quantile-CANN. Hence we obtain the conditional quantile of the total claim amount for each risk group (given by the combination of the different features).
Now, considering the QPP, we compute the premium P_i for a generic policyholder i:

$$
P_i = E(S_i) + \gamma \cdot \big(Q_{S_i|N_i>0}(\tau_i|x_i) - E(S_i)\big) = \gamma \cdot Q_{S_i|N_i>0}(\tau_i|x_i) + (1 - \gamma) \cdot E(S_i) \tag{1}
$$

where γ is the loading factor and E(S_i) the expected value of the total claim amount.
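A minimal, self-contained R sketch of this two-stage construction and of the QPP in Eq. (1) is given below. The data are simulated toys, stage two uses classical quantile regression (package quantreg) as a placeholder for the QRNN and Quantile-CANN models, and the portfolio-level estimate of E(S_i) is a crude stand-in; only the covariate names DriverAge and CarAge are taken from the paper.

```r
library(quantreg)
set.seed(1)
n   <- 5000
dat <- data.frame(DriverAge = sample(18:75, n, replace = TRUE),
                  CarAge    = sample(0:15, n, replace = TRUE))
dat$nclaims <- rpois(n, 0.1)
dat$S <- ifelse(dat$nclaims > 0, rgamma(n, shape = 1.2, scale = 2000) * dat$nclaims, 0)

# Stage 1: no-claim probability p_i from a logistic regression
stage1 <- glm(I(nclaims == 0) ~ DriverAge + CarAge, family = binomial, data = dat)
p_i    <- predict(stage1, type = "response")

tau   <- 0.99
tau_i <- pmin(pmax((tau - p_i) / (1 - p_i), 0.01), 0.99)   # policy-specific level

# Stage 2: Q_{S|N>0}(tau_i | x), here with plain QR instead of QRNN / Quantile-CANN
pos   <- subset(dat, nclaims > 0)
lev   <- round(tau_i, 2)                 # group nearby levels to limit the number of fits
q_hat <- numeric(n)
for (lv in unique(lev)) {
  fit <- rq(S ~ DriverAge + CarAge, tau = lv, data = pos)
  q_hat[lev == lv] <- predict(fit, newdata = dat[lev == lv, ])
}

# Quantile Premium Principle, Eq. (1), with gamma = 0.2 as in the application
gamma <- 0.2
E_S   <- mean(dat$S)
P_i   <- gamma * q_hat + (1 - gamma) * E_S
```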
2.1 Quantile Regression The estimation of the quantile claim amount is usually performed in the QR framework as proposed by Koenker and Bassett [5]. This regression specifies a linear relationship between the claim amount Si and the set of covariates x defined on the feature space χ of dimension q0 ∈ N, expressed as
$$
S_i = x_i' \beta_\tau + \varepsilon_{i\tau}, \quad \text{for all } i = 1, 2, \dots, N, \tag{2}
$$
where β_τ = (β_{τ,1}, β_{τ,2}, ..., β_{τ,p}) ∈ R^{q_0} is the vector of unknown regression parameters. Model (2), with the condition that the τ-th quantile of ε_{iτ} is zero for all i = 1, 2, ..., N, defines a properly specified QR model. The estimated vector $\hat{\beta}_\tau$ of the regression parameters β_τ can be obtained as follows:

$$
\hat{\beta}_\tau = \operatorname*{argmin}_{\beta_\tau} \frac{1}{N} \sum_{i=1}^{N} \rho_\tau\big(S_i - x_i' \beta_\tau\big), \tag{3}
$$
where the quantile loss function is defined as ρ_τ(u) = u(τ − I(u < 0)), with I(·) denoting the indicator function.
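For reference, this check (pinball) loss takes only a couple of lines of base R; the same loss is the natural training criterion for a quantile regression neural network in the spirit of [6] (a minimal sketch, with function names of our choosing).

```r
# rho_tau(u) = u * (tau - 1{u < 0}): the quantile (pinball) loss
rho_tau <- function(u, tau) u * (tau - (u < 0))

# average quantile loss of predicted quantiles q against observed severities s
quantile_loss <- function(s, q, tau) mean(rho_tau(s - q, tau))

quantile_loss(s = c(100, 2500, 800), q = c(900, 900, 900), tau = 0.9)  # toy example
```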
Table 2 Risk groups are given by the combination of DriverAge and CarAge. Q̂(τ_i) denotes the estimated conditional quantile Q̂_{S_i|N_i>0}(τ_i), Abs. dist. the absolute distance |Q̂_{S_i|N_i>0}(τ_i) − Q̂^E_{S_i|N_i>0}(τ_i)| from the empirical quantile, and QPP the quantile premium principle values

| Risk group | Q̂(τ_i): QR | Q̂(τ_i): QRNN | Q̂(τ_i): Q-CANN | Abs. dist.: QR | Abs. dist.: QRNN | Abs. dist.: Q-CANN | QPP: QR | QPP: QRNN | QPP: Q-CANN |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 6,553.6 | 6,724.0 | 6,766.2 | 788.0 | 617.6 | 575.4 | 1,681.4 | 1,715.5 | 1,723.9 |
| 2 | 5,699.9 | 4,620.3 | 5,672.3 | 1,043.1 | 36.5 | 1,015.6 | 1,407.8 | 1,191.9 | 1,402.3 |
| 3 | 5,634.8 | 6,120.3 | 5,848.7 | 588.6 | 103.1 | 374.7 | 1,440.9 | 1,538.0 | 1,483.7 |
| 4 | 4,790.5 | 3,829.8 | 4,668.7 | 1,262.2 | 301.5 | 1,140.4 | 1,184.9 | 992.8 | 1,160.6 |
| 5 | 4,210.2 | 3,461.8 | 4,087.7 | 1,161.3 | 412.9 | 1,038.8 | 1,024.3 | 874.6 | 999.8 |
| 6 | 6,092.0 | 5,915.0 | 5,966.3 | 437.2 | 260.1 | 311.5 | 1,528.8 | 1,493.4 | 1,503.7 |
| 7 | 5,781.5 | 5,675.9 | 5,700.8 | 144.8 | 39.1 | 64.1 | 1,489.3 | 1,468.2 | 1,473.2 |
| 8 | 5,057.8 | 5,470.5 | 5,260.9 | 565.6 | 152.8 | 362.4 | 1,263.8 | 1,346.3 | 1,304.4 |
| 9 | 5,180.8 | 5,020.6 | 5,181.9 | 360.9 | 200.6 | 362.0 | 1,299.1 | 1,267.0 | 1,299.3 |
| 10 | 4,938.7 | 5,328.7 | 4,959.2 | 240.9 | 149.1 | 220.5 | 1,228.4 | 1,306.4 | 1,232.4 |
| 11 | 3,372.0 | 3,183.4 | 3,234.0 | 281.7 | 93.1 | 143.8 | 846.6 | 808.9 | 819.0 |
| 12 | 2,531.7 | 2,651.9 | 2,639.3 | 163.6 | 43.5 | 56.1 | 630.8 | 654.8 | 652.3 |
| 13 | 4,594.3 | 4,381.8 | 4,497.6 | 269.6 | 57.1 | 172.9 | 1,130.1 | 1,087.6 | 1,110.7 |
| 14 | 5,326.1 | 5,526.3 | 5,382.1 | 95.4 | 104.7 | 39.4 | 1,344.1 | 1,384.1 | 1,355.3 |
| 15 | 2,911.8 | 2,942.5 | 2,970.9 | 49.0 | 18.3 | 10.1 | 726.6 | 732.7 | 738.4 |
| 16 | 2,186.5 | 2,327.2 | 2,330.9 | 135.7 | 4.9 | 8.6 | 541.5 | 569.7 | 570.4 |
| 17 | 3,033.4 | 2,782.2 | 2,980.5 | 223.5 | 27.7 | 170.6 | 750.9 | 700.7 | 740.3 |
| 18 | 2,586.1 | 2,674.8 | 2,709.5 | 81.8 | 6.9 | 41.6 | 637.9 | 655.6 | 662.6 |
| 19 | 3,435.1 | 3,121.5 | 3,294.9 | 182.9 | 130.7 | 42.7 | 854.0 | 791.3 | 826.0 |
| 20 | 2,573.4 | 2,575.4 | 2,598.0 | 14.9 | 16.8 | 39.4 | 635.5 | 635.9 | 640.4 |
| 21 | 2,970.4 | 2,991.3 | 2,978.8 | 33.9 | 12.9 | 25.5 | 733.9 | 738.1 | 735.6 |
model, since it is able to detect part of the additional model structure given by the interaction term in Eq. (9) that the classical QR failed to account for. These results are obtained by training the different models on a training set composed of 30,244 policies with at least one claim and considering a validation set of 3,242 policies (used for early stopping), while the out-of-sample results are obtained using a testing set of 14,259 policies. Note that the covariates DriverAge and VehAge are categorized in 7 and 3 classes, respectively, resulting in a total of 21 risk groups. Since the values of the in-sample and out-of-sample losses are quite similar, it seems that the network models were able to successfully avoid overfitting. Table 2 shows, for each risk group i and model, the estimated conditional quantile Q̂_{S_i|N_i>0}(τ_i), the absolute distance from the empirical quantile observed in the testing sample, |Q̂_{S_i|N_i>0}(τ_i) − Q̂^E_{S_i|N_i>0}(τ_i)|, with τ_i computed fixing τ = 0.99, and the QPP values calculated according to Eq. 1, assuming γ = 0.2. In particular, observing the results for the absolute distance, we notice that QRNN and Quantile-CANN almost always display a lower value with respect to the QR model. More specifically, QRNN generally has the lowest absolute distance, followed by the Quantile-CANN. These findings are consistent with the results for the quantile loss functions reported in Table 1. These results are obtained on the first simulated dataset.
References
1. Heras, A., Moreno, I., Vilar-Zanón, J.: An application of two-stage quantile regression to insurance ratemaking. Scand. Actuar. J. 2018, 1–17 (2018)
2. Schelldorfer, J., Wüthrich, M.V.: Nesting classical actuarial models into neural networks. SSRN. https://ssrn.com/abstract=3320525 (2019)
3. Spedicato, G., Dutang, C., Petrini, L.: Machine learning methods to perform pricing optimization. A comparison with standard GLMs. Variance 12, 69–89 (2018)
4. Baione, F., Biancalana, D.: An individual risk model for premium calculation based on quantile: a comparison between generalized linear models and quantile regression. North Am. Actuar. J. 23(4), 573–590 (2019)
5. Koenker, R., Bassett, G.: Regression quantiles. Econometrica: J. Econ. Soc. 46(1), 33–50 (1978)
6. Taylor, J.W.: A quantile regression neural network approach to estimating the conditional density of multiperiod returns. J. Forecast. 299–311 (2000)
Modelling Health Transitions in Italy: A Generalized Linear Model with Disability Duration Susanna Levantesi and Massimiliano Menzietti
Abstract Italy is characterized by population aging produced by both increasing life expectancy and declining fertility rates; as a consequence, the social security system is exposed to an increasing disability risk. The Italian National Institute of Social Security (INPS) protects workers through two economic benefits: invalidity benefits (provided to workers whose working capacity is reduced by at least a third) and disability benefits (provided to workers who are completely and permanently unable to return to any working activity). The aim of this paper is to model health transitions in a multi-state Markov model, developing a Generalized Linear Model including age effects, duration effects and age-duration interaction as covariates explaining health transitions. Keywords Generalized linear models · Multistate health transitions · Invalidity benefits · Disability benefits
1 Introduction The Italian social security system protects the workers registered with obligatory forms of insurance for Disability, Old Age and Survivors, or special packages for independent workers, who are no longer able, completely or in part, to return to working activity, through two economic benefits, invalidity benefits and disability pension, to help ease the state of need and adverse economic position in which they find themselves. The first one is an ordinary invalidity benefit (Assegno Ordinario di Invalidità): an economic benefit, provided upon request, to workers whose working S. Levantesi Department of Statistics, Sapienza University of Rome, Viale Regina Elena, 295G, 00191 Rome, Italy e-mail: [email protected] M. Menzietti (B) Department of Economics, Statistics and Finance Giovanni Anania, University of Calabria, Ponte Pietro Bucci, 87036 Arcavacata di Rende, Cosenza, Italy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 M. Corazza et al. (eds.), Mathematical and Statistical Methods for Actuarial Sciences and Finance, https://doi.org/10.1007/978-3-030-78965-7_45
capacity is reduced by at least a third due to physical or mental illness. The second one is the disability pension (Pensione di Inabilità): an economic benefit, provided upon request, to workers who are completely and permanently unable to return to any working activity. The aim of the paper is to model the health transitions related to disability pensions and invalidity benefits in a multiple state Markov model framework, using a Generalized Linear Model (GLM) including age effects, duration effects and their interaction. The model is applied to the database provided by the Italian National Institute of Social Security (INPS), containing information on active workers and invalidity/disability beneficiaries.
2 Data The data were extracted from two INPS datasets: the actives dataset (containing information relating to active employees) and the beneficiaries dataset (containing information on pensioners receiving invalidity and disability benefits). They allowed the construction of two different datasets: (i) the number of active employees exposed to risk and the number of transitions from the active state, and (ii) the number of invalids/disabled and the number of transitions from the invalid/disabled state. The first database represents the statistical support for the construction of multiple decrement life tables for active lives, where decrements are produced by: invalidity, disability, death and other causes. The second database allows for the construction of multiple decrement life tables for invalid lives, with decrements due to permanent disability, death and revision of the invalidity benefit, and the construction of life tables for disabled lives. The data considered in the analysis refer to the years 2014–2016. The age range considered is 30–60 for active employees and recipients of invalidity benefits and 30–90 for recipients of disability benefits.
3 Methodology Following the main literature on this topic [1–5], we adopt a multi-state continuous time Markov model to describe the health transitions and we model the transition intensities through a GLM as in [2, 5–8].
3.1 A Five-State Time Inhomogeneous Markov Model To represent the states that an individual can assume over time, we consider a nonhomogeneous multistate continuous-time model with five states: 1 = active, 2 = invalid, 3 = disabled, 4 = dead, 5 = withdrawn for other causes (than death). The reactivation from the disability state (3) to the invalidity state (2) is disregarded,
Fig. 1 Graph of the five-states model
since the state of disability is considered permanent. The graph illustrating the nodes (states) and arcs (transitions between states) of the model is shown in Fig. 1. We denote with p^{ij}_x(s, t, z, v) the generic transition probability from state i at time s with duration z to state j at time t with duration v, and with μ^{ij}_x(t, z) the corresponding transition intensity at time t with duration z. We assume that the transition intensities are piecewise constant, μ^{ij}_{x+u}(t + u, z + ζ) = μ^{ij}_x(t, z) = m^{ij}_x(t, z) for 0 ≤ u, ζ < 1, where m^{ij}_x(t, z) are the transition rates corresponding to the transition intensities.
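Under the piecewise-constant assumption, transition probabilities over a year follow from the matrix exponential of the intensity matrix. The R sketch below illustrates this for the five-state graph of Fig. 1; it is not the authors' code, and the numeric intensities are arbitrary placeholders rather than estimates from the INPS data.

```r
library(expm)   # for the matrix exponential expm()

states <- c("active", "invalid", "disabled", "dead", "withdrawn")
Q <- matrix(0, 5, 5, dimnames = list(states, states))
Q["active",  c("invalid", "disabled", "dead", "withdrawn")] <- c(0.002, 0.0005, 0.001, 0.05)
Q["invalid", c("active", "disabled", "dead")]               <- c(0.010, 0.020, 0.005)
Q["disabled", "dead"]                                       <- 0.080
diag(Q) <- -rowSums(Q)      # rows of an intensity matrix sum to zero

P_one_year <- expm(Q)       # one-year transition probability matrix under constant rates
round(P_one_year, 4)
```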
3.2 Generalized Linear Model Specification We model the transition intensities with a GLM approach, which requires the specification of a link function, the linear predictor and the probability distribution. Considering the datasets information (the available years are merged), the variables considered in the model are age, duration and their interaction. The analyses are carried out separately by gender. Let Z = Z_1, ..., Z_n be the set of covariates, x the age and d the duration in the state; we denote by m^{ij}_x(d) the central transition rate from state i to state j at age x with duration d. We consider the following covariates:
• Z_Age is the age of the claimant when a state transition occurs;
• Z_Dur is the duration of the claimant when a state transition occurs.
Then, we add a quadratic and cubic age-effect (Age^2 and Age^3) and possible interactions among age and duration. The GLM is defined by the following equation:

$$
g\left(m^{ij}_{x,d}\right) = \beta^{ij}_0 + \sum_{h=1}^{n} Z_h \cdot \beta^{ij}_h + \epsilon. \tag{1}
$$

where g: R → R is the link function characterizing the model, β^{ij}_1, ..., β^{ij}_n the regression parameters, β^{ij}_0 the intercept and ε the error term. The linear predictor is given by η^{ij}_{x,d} = β^{ij}_0 + Σ_{h=1}^{n} Z_h · β^{ij}_h. Using a log link function, Eq. 1 becomes:

$$
\log\left(m^{ij}_{x,d}\right) = \beta^{ij}_0 + \sum_{h=1}^{n} Z_h \cdot \beta^{ij}_h + \epsilon. \tag{2}
$$
Therefore, the estimated central transition rates are: m̂^{ij}_{x,d} = exp(β^{ij}_0 + Σ_{h=1}^{n} Z_h · β^{ij}_h). We assume that the number of transitions from state i to state j at age x with duration d, N^{ij}_{x,d}, follows a Poisson distribution: N^{ij}_{x,d} ~ Pois(E^{i}_{x,d} m^{ij}_{x,d}), with E^{i}_{x,d} the exposed to risk in state i at age x with duration d. For transitions from the active state, we have to model the following transition rates: from active to invalid, m^{12}_x; from active to disabled, m^{13}_x; from active to dead, m^{14}_x; from active to withdrawn for other causes (than death), m^{15}_x. Due to the dataset limitation, in the case of the active initial state the GLM only considers age-related factors; specifically, we include age covariates up to the cubic effect:

$$
\log\left(\hat{m}^{1j}_{x}\right) = \beta^{1j}_0 + Z_{Age} \cdot \beta^{1j}_1 + Z_{Age^2} \cdot \beta^{1j}_2 + Z_{Age^3} \cdot \beta^{1j}_3 \quad \text{for } j = 2, 3, 4, 5 \tag{3}
$$

For transitions from the invalid state, we have to model the following transition rates: revision of the invalidity benefit, m^{21}_{x,d}; from invalid to disabled, m^{23}_{x,d}; mortality for an invalid person, m^{24}_{x,d}. In this case, we use a GLM with the following covariates: age-effect, quadratic age-effect, duration effect and age-duration interaction:

$$
\log\left(\hat{m}^{2j}_{x,d}\right) = \beta^{2j}_0 + Z_{Age} \cdot \beta^{2j}_1 + Z_{Age^2} \cdot \beta^{2j}_2 + Z_{Dur} \cdot \beta^{2j}_3 + Z_{Age} \cdot Z_{Dur} \cdot \beta^{2j}_4 \quad \text{for } j = 1, 3, 4 \tag{4}
$$

For the transitions from the disabled state, we only have to model the transition from disability to death, m^{34}_{x,d}, due to the permanent nature of the disabled state. The GLM for this transition considers as covariates the age-effect, the quadratic age-effect, the duration effect and the age-duration interaction:

$$
\log\left(\hat{m}^{34}_{x,d}\right) = \beta^{34}_0 + Z_{Age} \cdot \beta^{34}_1 + Z_{Age^2} \cdot \beta^{34}_2 + Z_{Dur} \cdot \beta^{34}_3 + Z_{Age} \cdot Z_{Dur} \cdot \beta^{34}_4 \tag{5}
$$
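As an illustration, a model of the form (4) can be fitted as a standard Poisson GLM with log link and log-exposure offset. The sketch below uses simulated data in place of the INPS records; the data-frame layout, column names and coefficient values are assumptions.

```r
set.seed(1)
cells <- expand.grid(Age = 30:60, Dur = 0:10)
cells$E <- runif(nrow(cells), 100, 1000)                     # central exposures, invalid state
true_rate <- exp(-8 + 0.05 * cells$Age + 0.02 * cells$Dur)   # assumed "true" m^{23}_{x,d}
cells$N <- rpois(nrow(cells), cells$E * true_rate)           # simulated transition counts

fit_23 <- glm(N ~ Age + I(Age^2) + Dur + Age:Dur + offset(log(E)),
              family = poisson(link = "log"), data = cells)

m23_hat <- fitted(fit_23) / cells$E                          # estimated central rates
c(AIC = AIC(fit_23), BIC = BIC(fit_23))                      # criteria used in Sect. 3.3
```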
3.3 Fitting and Model Selection We fit the above described GLMs to the INPS datasets using the maximum likelihood approach to estimate the regression coefficients β, with the log-likelihood function given by:

$$
l(\beta) = \sum_{x} \left\{ N^{ij}_{x,d} \log\left[E^{i}_{x,d}\, m^{ij}_{x,d}(\beta)\right] - E^{i}_{x,d}\, m^{ij}_{x,d}(\beta) \right\} + C \tag{6}
$$

where C is a constant. Separately for each transition rate and gender, we compare different covariate combinations in Eqs. 3, 4 and 5 using a stepwise selection process, where a new covariate is added at each step. The criteria adopted in the
Table 1 AIC, BIC and LRT for transitions from the active state—male population

| Model | m^12_x: AIC | BIC | p(LRT) | m^13_x: AIC | BIC | p(LRT) | m^14_x: AIC | BIC | p(LRT) | m^15_x: AIC | BIC | p(LRT) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| m0 | 3506 | 3508 | − | 325 | 326 | − | 683 | 684 | − | 8943 | 8944 | − |
| m1 | 235 | 238 | | | | | | | | | | |