Statistical Analysis of Operational Risk Data (SpringerBriefs in Statistics) 3030425797, 9783030425791

This concise book for practitioners presents the statistical analysis of operational risk.


English · 93 pages · 2020


Table of contents :
Contents
List of Figures
List of Tables
1 The Operational Risk
1.1 Introduction
1.2 Models for Operational Risk
1.2.1 Basic Indicator Approach
1.2.2 Standardized Approach
1.2.3 Advanced Measurement Approach
1.3 Loss Distribution Approach
1.4 DIPO Consortium
References
2 Identification of the Risk Classes
2.1 Introduction
2.2 Distributional Tests
2.3 Application to DIPO Data
References
3 Severity Analysis
3.1 Introduction
3.2 Mixture of Three-Parameter Log-Normal Distributions
3.3 Extreme Value Theory
3.4 Application to DIPO Data
3.4.1 Mixture of k Log-Normal Distributions
3.4.2 Log-Normal–GPD Distribution
3.4.3 Comparison
References
4 Frequency Analysis
4.1 Introduction
4.2 Mixture of Poisson Distributions
4.2.1 The Poisson Distribution
4.2.2 Finite Poisson Mixture
4.3 Mixture of Negative Binomial Distributions
4.3.1 The Negative Binomial Distribution
4.3.2 Relationship with Poisson Distribution
4.3.3 Maximum Likelihood Estimation
4.3.4 Finite Negative Binomial Mixture
4.4 Application to DIPO Data
References
5 Convolution and Risk Class Aggregation
5.1 Introduction
5.2 Overall Loss Distribution
5.3 Risk Class Aggregation and Copula Functions
5.3.1 Tail Dependence
5.3.2 Elliptical Copulae
5.3.3 Archimedean Copulae
5.4 Value-at-Risk Estimates Considering t-Copula
References
6 Conclusions

SPRINGER BRIEFS IN STATISTICS

Giovanni De Luca Danilo Carità Francesco Martinelli

Statistical Analysis of Operational Risk Data

SpringerBriefs in Statistics

More information about this series at http://www.springer.com/series/8921


Giovanni De Luca Department of Management and Quantitative Sciences Parthenope University of Naples Naples, Italy

Danilo Carità Department of Management and Quantitative Sciences Parthenope University of Naples Naples, Italy

Francesco Martinelli Research Department UBI Banca Milan, Italy

ISSN 2191-544X ISSN 2191-5458 (electronic) SpringerBriefs in Statistics ISBN 978-3-030-42579-1 ISBN 978-3-030-42580-7 (eBook) https://doi.org/10.1007/978-3-030-42580-7 Mathematics Subject Classification (2010): 91B30, 46F10, 62F10, 62H05, 62H30 © The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Contents

1 The Operational Risk  1
  1.1 Introduction  1
  1.2 Models for Operational Risk  2
    1.2.1 Basic Indicator Approach  4
    1.2.2 Standardized Approach  4
    1.2.3 Advanced Measurement Approach  5
  1.3 Loss Distribution Approach  7
  1.4 DIPO Consortium  8
  References  10

2 Identification of the Risk Classes  11
  2.1 Introduction  11
  2.2 Distributional Tests  11
  2.3 Application to DIPO Data  15
  References  17

3 Severity Analysis  19
  3.1 Introduction  19
  3.2 Mixture of Three-Parameter Log-Normal Distributions  20
  3.3 Extreme Value Theory  21
  3.4 Application to DIPO Data  23
    3.4.1 Mixture of k Log-Normal Distributions  23
    3.4.2 Log-Normal–GPD Distribution  37
    3.4.3 Comparison  47
  References  50

4 Frequency Analysis  51
  4.1 Introduction  51
  4.2 Mixture of Poisson Distributions  51
    4.2.1 The Poisson Distribution  51
    4.2.2 Finite Poisson Mixture  52
  4.3 Mixture of Negative Binomial Distributions  53
    4.3.1 The Negative Binomial Distribution  53
    4.3.2 Relationship with Poisson Distribution  54
    4.3.3 Maximum Likelihood Estimation  56
    4.3.4 Finite Negative Binomial Mixture  57
  4.4 Application to DIPO Data  57
  References  69

5 Convolution and Risk Class Aggregation  71
  5.1 Introduction  71
  5.2 Overall Loss Distribution  71
  5.3 Risk Class Aggregation and Copula Functions  73
    5.3.1 Tail Dependence  74
    5.3.2 Elliptical Copulae  75
    5.3.3 Archimedean Copulae  77
  5.4 Value-at-Risk Estimates Considering t-Copula  78
  References  82

6 Conclusions  83

List of Figures

Fig. 2.1: Size of the eight business lines (p. 12)
Figs. 3.1–3.22: Severity distributions for the risk classes BL1, BL2, BL3/ET1–ET7, BL4/ET1, BL4/ET2, BL4/ET367, BL4/ET4, BL4/ET5, BL5, BL6, BL7, BL8/ET1, BL8/ET27, BL8/ET35, BL8/ET4, BL8/ET6 (pp. 26–37)
Figs. 3.23–3.44: Log-normal versus GPD fit for the same risk classes (pp. 38–49)
Figs. 4.1–4.22: Frequency distributions for the risk classes BL1/ET, BL2/ET, BL3/ET1–ET7, BL4/ET1, BL4/ET2, BL4/ET367, BL4/ET4, BL4/ET5, BL5/ET, BL6/ET, BL7/ET, BL8/ET1, BL8/ET27, BL8/ET35, BL8/ET4, BL8/ET6 (pp. 62–69)

List of Tables

Table 2.1: Pooling results using KS and AD tests for business lines 3, 4, 8 and percentages within each of the three business lines (p. 15)
Table 2.2: Operational risk classes and percentages (p. 16)
Table 3.1: Estimated mixtures according to the following criteria: BIC, argmin (p-value > 0.05) (p. 24)
Table 3.2: Estimated mixtures according to the proposed criterion (p. 26)
Table 3.3: GPD estimation on DIPO data (p. 38)
Table 3.4: AD goodness-of-fit comparison between a mixture of k Log-normal distributions and a Log-normal and GPD combination (p. 49)
Table 4.1: Number of components and KS p-values for Poisson and Negative Binomial distributions (p. 59)
Table 4.2: Adjusted KS p-values for Poisson and Negative Binomial mixtures (p. 60)
Table 4.3: Selected distribution and number of components for the risk classes (p. 61)
Table 5.1: Correlation between Severities and Frequencies (p. 72)
Table 5.2: Pearson's linear correlation matrix (p. 79)
Table 5.3: Kendall's rank correlation matrix (p. 80)
Table 5.4: Years with maximum aggregated losses, maximum frequencies, and maximum single loss among risk classes (p. 81)
Table 5.5: Value-at-Risk results using different correlation hypotheses (p. 82)

Chapter 1

The Operational Risk

Abstract  This chapter introduces operational risk and the most popular approaches for its quantification. The focus is on the Loss Distribution Approach, based on the identification of a statistical distribution of losses. The concepts of business line and event type are defined. Finally, the DIPO consortium is presented.
Keywords  Operational losses · Loss distribution approach · DIPO consortium

1.1 Introduction

The new regulatory framework proposed by the Basel Committee on Banking Supervision aims to achieve international convergence in the revision of the supervisory regulations governing the capital adequacy of banks. In drawing up the new scheme, the Committee has tried to define capital requirements that are significantly more sensitive to risk while remaining sound from a conceptual point of view. At the same time, the Committee has taken due account of the particular characteristics of the supervisory and accounting systems currently in force in the individual member countries. A significant innovation of the new scheme is the greater use of risk assessments provided by the banks' internal systems as input for computing capital ratios. It should also be emphasized that the new scheme is aimed at establishing minimum capital levels for internationally active banks. Finally, the new scheme frames the matter of capital adequacy in a flexible way, giving it the ability to evolve over time. This evolution is necessary to ensure that the scheme can follow market developments and the progress in risk management methodologies.
The main risks associated with banking activities are:
• Credit risk: risk of losses due to default or insolvency of the counterparty;
• Market risk: risk of losses due to adverse price movements (rates, exchange rates, prices);
• Liquidity risk: risk of losses due to the impossibility of timely disinvesting positions;


• Country risk: risk of losses due to the economic–political situation of the country of residence of the defaulting debtor;
• Operational risk: risk of losses due to inadequate processes, personnel, or external events;
• Reputational risk: risk of losses due to a worsened perception of the bank's quality on the market;
• Strategic risk: risk of losses linked to changes in the competitive environment.
The new Basel III agreement, which follows Basel II (BCBS 2003b), is an international regulatory standard issued by the Basel Committee on Banking Supervision (BCBS) that requires financial institutions to maintain enough capital reserves to cover the risks incurred by their operations. The Basel III accord is based on three pillars:
• Minimum capital requirements: banks must use specific approaches and methods for measuring the credit, market, and operational risks they could incur and determine the corresponding capital requirements;
• Prudential control of capital adequacy: in the supervision activity, the local supervisory authorities must express a judgment on the adequacy of risk control;
• Information transparency requirements: information on capital allocation techniques and on risk management processes must be disclosed to the market.
The New Agreement defines operational risk as the risk deriving from inadequate or insufficient processes, personnel and internal systems, or from exogenous events. This definition includes legal risk, but not strategic and reputational risk. The methodological lines for the quantification of operational risk are explicitly indicated, whereas in the past a capital requirement was proposed against other risks without any reference to operational risk.

1.2 Models for Operational Risk

In the classification of the main methods proposed in the literature for modeling operational risk, the following subdivision criteria can be adopted:
• the Process Approach, which models operational risk through a description of the processes of the reality to be analyzed;
• the Factor Approach, which assumes a mutual interaction of several existing risk factors, hypothesizing their manifestation across different processes;
• the Actuarial Approach, which refers to the risk theory developed in the actuarial field, in which the risk analysis is carried out through the joint study of two stochastic variables: the number of events (Frequency) and the amount of losses (Severity).
Another classification criterion refers to the type of analysis setup:


• Top-down: these models start from the analysis of loss data at the aggregate level and are independent of the type of structure;
• Bottom-up: these models start from the modeling of operational losses at a disaggregated level (single process or single event).
The recent literature on operational risk has been mainly oriented toward models based on the actuarial approach. These models have strong advantages from both a methodological and an implementation point of view: they can be used at both the top-down and the bottom-up level and are supported by a solid theoretical foundation. In this regard, it is appropriate to introduce two concepts that are central to the topics addressed in the next sections:
• Value-at-Risk (VaR): a measure of risk representing the maximum loss that can occur over a specific time horizon with a certain level of probability;
• Capital-at-Risk (CaR): the amount of capital needed to cope with a loss of a given size (for example, the loss computed as VaR).
The supervisory authorities have established the guidelines that should inspire the actions of banks in the field of operational risk, specifying in a rigorous and precise manner the key concepts of business line and event type. The attribution of losses and risk measures to the pair business line/event type is useful in terms of efficient capital allocation and implicitly suggests the use of a bottom-up approach for the quantification of operational risk. A bank must use the approach most appropriate to its risk profile and degree of sophistication, changing its approach when it is able to develop more sophisticated operational risk measurement systems. Moreover, a bank may be allowed to use different approaches for different sectors of its operations, provided that the following conditions are met:
• all the operational risks connected with the activities carried out by the bank are captured;
• all operations falling within the application of a given approach meet the requirements set for it;
• at the date of application, the Advanced Measurement Approach (described below) is able to capture a significant part of the operational risks;
• a time schedule for the gradual application of the Advanced Measurement Approach to all relevant legal entities/operating lines is submitted to the supervisory authority.
The Basel Committee suggests three approaches:
1. Basic Indicator Approach;
2. Standardized Approach;
3. Advanced Measurement Approach.


1.2.1 Basic Indicator Approach

The Basic Indicator Approach (BIA) is the simplest and most immediate approach. The quantification of the operational risk capital is related to the Gross Income through a multiplicative factor denoted as α. The Capital-at-Risk is given by

CaR = \alpha \cdot GI,

where
• GI represents the Gross Income, i.e., the average annual gross income of the entire bank over the 3 previous financial years. It is defined as net interest income. This measure should
  – be gross of any provision;
  – exclude the profits or losses realized on the sale of securities of the banking book (profits/losses realized on securities held to maturity or fixed portfolios);
  – exclude extraordinary losses, errors, or omissions, as well as income deriving from insurance.
• α is a fixed percentage (15%) established by the Basel Committee.
This approach is extremely simple from an implementation point of view. However, from a methodological point of view, it is clearly not supported by any theoretical model. For this reason, the results are only weakly reliable, given the extreme sensitivity with respect to the only existing parameter and the impossibility of assessing the capacity of gross income to explain the potential losses.
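As a purely numerical illustration (the figures are hypothetical and not taken from the book): a bank whose gross income averages GI = 200 million euro over the last three years would carry a BIA capital charge of

CaR = 0.15 \cdot 200 = 30 million euro.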

1.2.2 Standardized Approach

In the Standardized Approach (SA), the bank's activities are divided into n standardized business lines. For each of them, the capital charge is computed by multiplying the Gross Income (aggregated at the business line level) by a factor β, attributed by the supervisory authorities to each business line. The total Capital-at-Risk is computed as a simple sum of the individual risk capital of each business line, namely,

CaR = \sum_{i=1}^{n} \beta(i) \cdot GI(i),

where
• GI(i) represents the Gross Income, i.e., the average annual gross income over the 3 previous financial years of business line i;
• β(i) is a constant multiplicative factor assigned by the supervisory authorities to business line i.
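To make the two formulas concrete, the short sketch below computes both charges for hypothetical gross income figures. The β coefficients used here (12%, 15%, or 18%, depending on the business line) are the values usually quoted for the eight regulatory business lines; everything else is illustrative and not taken from the book.

```python
# Minimal sketch of the BIA and SA capital charges (gross income in EUR millions is hypothetical).
ALPHA = 0.15  # BIA multiplier set by the Basel Committee

business_lines = {
    "Corporate Finance":      {"gi": 20.0, "beta": 0.18},
    "Trading and Sales":      {"gi": 35.0, "beta": 0.18},
    "Retail Banking":         {"gi": 80.0, "beta": 0.12},
    "Commercial Banking":     {"gi": 45.0, "beta": 0.15},
    "Payment and Settlement": {"gi": 10.0, "beta": 0.18},
    "Agency and Custody":     {"gi":  5.0, "beta": 0.15},
    "Asset Management":       {"gi": 12.0, "beta": 0.12},
    "Retail Brokerage":       {"gi":  8.0, "beta": 0.12},
}

total_gi = sum(bl["gi"] for bl in business_lines.values())

car_bia = ALPHA * total_gi                                              # CaR = alpha * GI
car_sa = sum(bl["beta"] * bl["gi"] for bl in business_lines.values())   # CaR = sum_i beta(i) * GI(i)

print(f"BIA capital charge: {car_bia:.1f}")
print(f"SA capital charge:  {car_sa:.1f}")
```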


1.2.3 Advanced Measurement Approach

The bank can be authorized by the supervisory board to use the Advanced Measurement Approach (AMA). In this case, it can allow for the mitigating effect of insurance contracts on operational risk for an amount not exceeding 20% of its total capital requirement for operational risk, in compliance with defined criteria. A bank authorized to use the AMA cannot go back to adopting one of the simplified approaches (BIA, SA), except on specific advice from the supervisory authority. To face operational risk, a bank must maintain a capital endowment equal to the measure generated by the internal system. The requirements for adopting the approach are both qualitative and quantitative.
The qualitative standards are:
• compliance with the principles of the Committee document Sound practices for the management and supervision of operational risk (BCBS 2003a);
• adoption of adequate systems for the treatment of operational risks in compliance with the minimum requirements established by the Basel Committee in its third and final consultative paper in 2003 (BCBS 2003b);
• integration of the operational risk management system in the day-by-day risk management of the bank;
• validation of the operational risk management system by the Internal Audit, the External Audit, and the Supervisory Authorities.
The quantitative standards are:
• inclusion of the "low-frequency–high-impact" events, i.e., the catastrophic events (also resorting to external system data);
• adoption of a 99.9% confidence level and a time horizon of 1 year (consistent with the internal rating-based approach for credit risk);
• analysis of internal time-series data covering at least 5 years and exhaustive with regard to operational losses (3 years if the bank is adopting the AMA for the first time);
• mapping of operational losses into seven standard "loss-type" categories;
• definition and documentation of the mechanism for integrating significant external (consortium, public) data used in conjunction with the bank's internal data;
• definition and documentation of the procedure for estimating the parameters used in determining the risk profile;
• use of scenario analyses, conducted at a professional level, in order to evaluate the exposure to particularly serious events;
• capability of the methodology to capture the operating context and the internal control system, key factors in defining the operational risk profile.
The incentives for the adoption of the AMA methods, according to the latest indications of the Committee, are:


• the expected reduction of the overall capital requirement, with the same credit assets (volumes) and intermediation margin (the exposure indicator for operational risk);
• the elimination of the capital floor (initially required), i.e., 75% of the requirement obtained with the application of the Standardized Approach;
• the exclusion from the capital requirement of the expected losses (if allowed by the accounting principles in force);
• the possibility to take into account the correlation effects between bottom-up events;
• the possibility of resorting to insurance policies, with the effect of obtaining further capital rebates.
The three most important advanced approaches are the Internal Measurement Approach, the Loss Distribution Approach, and the Scorecard Approach.
The Internal Measurement Approach (IMA) measures the capital charge for each business line/event type combination using aggregated data (total number of events or total loss by event type). To move from these aggregate quantities to a good approximation of the loss distribution, multiplicative factors (a scaling factor exogenous to the model) and the bank risk profile index coefficients are introduced. In order to obtain the total capital charge for the bank, a simple sum of the computed capital charges for each business line/event type combination has to be made, i.e.,

CaR = \sum_{i} \sum_{j} CaR(i, j).

In this formula, CaR(i, j) is the capital charge for business line i and event type j, given by

CaR(i, j) = EL(i, j) \cdot RP(i, j) \cdot \gamma(i, j),

where
• EL(i, j) is the expected loss, computed as

  EL(i, j) = EI(i, j) \cdot PE(i, j) \cdot LGE(i, j),

  where
  – EI(i, j) is an indicator of the exposure to risk;
  – PE(i, j) is the probability of realization of the loss event;
  – LGE(i, j) is the amount of loss recorded as a result of the occurrence of the event.
• RP(i, j) is an adjustment factor aimed at highlighting the differences between the tail of the loss distribution of the individual bank and the tail of the sector-wide distribution at the level of the individual business line. In practice, it tries to capture the characteristics of the distribution of the single bank and that of the sector in order to transform the exogenous scale factor into an internal factor;


• γ(i, j) is a constant, set by the supervisory authorities, used to transform the expected loss into Capital-at-Risk.
The Loss Distribution Approach requires that the bank estimates through its internal data, for each business line/event type combination, the distribution of the severity (impact of the single loss event) and the distribution of the frequency of the events. This methodology is based on the analysis of the empirical distribution of losses; its main features are outlined in the next section.
Finally, in the Scorecard Approach, the bank initially sets a capital absorption for operational risk considering the whole corporate entity or the individual business lines. The bank can change it over time on the basis of appropriate scorecards that seek to capture the risk profiles underlying each business line, taking into account the control and mitigation processes of existing risks. The scorecards are compiled annually and are subject to revision on the basis of data on past losses or on the basis of a number of indicators considered as proxies of operational risk at the business line level.

1.3 Loss Distribution Approach

The Loss Distribution Approach (LDA) is based on the identification of a statistical distribution of losses. According to this approach, the losses are classified according to the business line and/or the event type. This level of disaggregation ensures a granularity sufficient to capture the relevant drivers of operational risk which can affect the tail of the estimated loss distribution. In fact, this methodology is based on the separate analysis of the probability distribution of the number of individual events and of the distribution of their impact in terms of economic loss. After having estimated these distributions, their combination can be obtained through a convolution operation. The risk theory in the actuarial field is based on the joint analysis of two fundamental quantities, both of stochastic nature:
1. number of events (Frequency);
2. amount of losses (Severity).
The LDA includes:
• the determination of the distribution of the number of occurrences (Frequency) of the losses in a specified period of time;
• the determination of the distribution of the amount of the losses (Severity) in the same period;
• the determination, through their aggregation, of a single distribution of losses, from which the VaR is computed by cutting the tail of the distribution at the adopted level of significance.


Using appropriate statistical tools, this methodology suggests the determination of the distributions of frequencies and severities within a certain time frame. It can be developed through the following steps:
1. collection and preliminary analysis of loss data in terms of amount (severity) and in terms of frequency;
2. selection of the frequency distribution and of the severity distribution that best describe the data, and estimation of the parameters of interest;
3. computation of the combined aggregate function (overall loss function) through the convolution operator or using Monte Carlo simulation techniques;
4. computation of the VaR as a percentile of the overall loss function.
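Step 3 above is frequently carried out by Monte Carlo simulation. The following sketch illustrates the idea for a single risk class; the choice of a Poisson frequency with a Log-normal severity and the parameter values are purely hypothetical, not the estimates obtained later in the book.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Hypothetical annual frequency (Poisson) and severity (Log-normal) parameters
lam = 40.0            # expected number of losses per year
mu, sigma = 9.0, 1.8  # severity parameters on the log scale

n_years = 100_000     # number of simulated years
annual_loss = np.empty(n_years)

for i in range(n_years):
    n_losses = rng.poisson(lam)                      # simulate the frequency
    severities = rng.lognormal(mu, sigma, n_losses)  # simulate each loss amount
    annual_loss[i] = severities.sum()                # aggregate loss of the year

# Step 4: VaR as a high percentile of the simulated overall loss distribution
var_999 = np.quantile(annual_loss, 0.999)
print(f"99.9% VaR of the aggregate annual loss: {var_999:,.0f}")
```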

1.4 DIPO Consortium

The "Database Italiano Perdite Operative" (DIPO) is an interbanking activity that started to operate in 2003. Its main purpose is to support the development of operational risk management as well as to create a methodology to collect and exchange information on operational losses suffered by members. Currently, 33 Italian banking groups are associated with DIPO, for 250 legal entities. The activities primarily consist of managing and maintaining the Italian database of operational losses, which has been gathering information since January 2003 on each single event generating an operational risk loss suffered by member banks. Since December 2011, DIPO has been part of ABIServizi S.p.A.
In order to harmonize the collection and classification of loss events, each member can rely on certain instruments devised by DIPO, including:
• the DIPO manual, describing the harmonized procedures according to which members should collect information on loss events and each field of the data structure;
• the definition of gross operating losses;
• the decision tree for the assignment of the event type;
• the criteria regarding the mapping of losses and earning margins on the eight business lines defined at the regulatory level.
Moreover, queries put forth by members on the census/classification of particularly complex events are periodically updated with answers provided by a working group. Census criteria are based on:
• the mapping scheme for business lines considered by Basel III, which in the intention of the regulators has a double application: imputation of components to the eight business lines and census of single loss events for each BL;
• the classification of loss events according to an Event Type decision tree (a methodology that considers the event itself, not its effect) structured in seven main loss types.
In particular, the latest mapping scheme contemplates the following business lines (BCBS 2001a):


1. Corporate Finance (BL1): mergers and acquisitions, underwriting, privatizations, securitization, research, debt (government, high yield), equity, syndications, IPO, secondary private placements;
2. Trading and Sales (BL2): fixed income, equity, foreign exchange, commodities, credit, funding, own position securities, lending and repos, brokerage, debt, prime brokerage;
3. Retail Banking (BL3): retail lending and deposits, banking services, trust and estates, private lending and deposits, banking services, trust and estates, investment advice, merchant/commercial/corporate cards, private labels and retail;
4. Commercial Banking (BL4): project finance, real estate, export finance, trade finance, factoring, leasing, lending, guarantees, bills of exchange;
5. Payment and Settlement (BL5): payments and collections, funds transfer, clearing and settlement;
6. Agency and Custody (BL6): escrow, depository receipts, securities lending (customers), corporate actions, issuer and paying agents;
7. Asset Management (BL7): pooled, segregated, retail, institutional, closed, open, private equity, pooled, segregated, retail, institutional, closed, open;
8. Retail Brokerage (BL8): execution and full service.
Operational losses can arise with different types of effects, which have to be attributed to the originating event type. DIPO identifies the following first-level event types (BCBS 2001b):
1. Internal fraud (ET1): losses due to acts of a type intended to defraud, misappropriate property or circumvent regulations, the law or company policy, excluding diversity/discrimination events, which involve at least one internal party.
2. External fraud (ET2): losses due to acts of a type intended to defraud, misappropriate property, or circumvent the law, by a third party.
3. Employment Practices and Workplace Safety (ET3): losses arising from acts inconsistent with employment, health or safety laws or agreements, from payment of personal injury claims, or from diversity/discrimination events.
4. Clients, Products, and Business Practices (ET4): losses arising from an unintentional or negligent failure to meet a professional obligation to specific clients (including fiduciary and suitability requirements), or from the nature or design of a product.
5. Damage to Physical Assets (ET5): losses arising from loss or damage to physical assets from natural disasters or other events.
6. Business Disruption and System Failures (ET6): losses arising from disruption of business or system failures.
7. Execution, Delivery, and Process Management (ET7): losses from failed transaction processing or process management, and from relations with trade counterparties and vendors.
The consortium collects data with an amount equal to at least €5000 of "Effective Gross Loss." Effective losses are negative income flows characterized by the certainty of the quantification of the amount as reported in the profit and loss account and


attributable to the event, either directly or through management/departmental observations. Direct attribution applies both to the loss and to any potential expenses, invoiced by third parties or related to the settlement of the issue. Hence, the reference criterion is the effective impact on the profit and loss account (including the setting up of provisions), but the record of the loss might not match the effective gross loss. For example, suppose that a bank is robbed, causing a loss of 300 with an indemnity from the insurance company of 200: the amount of the robbery is recorded only with regard to the amount of the excess clause, namely, 100. However, the loss to be reported shall be 300. In addition, DIPO disciplines the reporting and evaluation of amounts associated with legal risk.1
In operational risk management, the Advanced Measurement Approaches (AMA) are largely studied by researchers and financial institutions. In particular, the popular Loss Distribution Approach (LDA) is based on a parametric approach for the estimation of the loss Severity and loss Frequency distributions. The salient features of this book are:
• the comparison of the popular extreme value theory with a mixture approach for the estimation of the loss Severity distribution (in this way, the right tail of the distribution is described by a specific component of the mixture);
• the comparison of the most popular discrete distributions, namely, the Poisson and the Negative Binomial, with a mixture approach to estimate the Frequency distribution;
• the calculation of the Value-at-Risk using copula-based methods, taking into account the mixture approaches mentioned above, allowing us to obtain a lower capital requirement figure compared to the traditional approach.
In particular, for our analysis, we have employed DIPO daily loss data (with an amount equal to at least €5000) that occurred from January 1, 2003 to December 31, 2015.

References

BCBS, Basel Committee on Banking Supervision, The New Basel Capital Accord, Second Consultative Paper (Bank for International Settlements, 2001a)
BCBS, Basel Committee on Banking Supervision, Working Paper on the Regulatory Treatment of Operational Risk (Bank for International Settlements, 2001b)
BCBS, Basel Committee on Banking Supervision, Sound Practices for the Management and Supervision of Operational Risk (Bank for International Settlements, 2003a)
BCBS, Basel Committee on Banking Supervision, The New Basel Capital Accord, Third Consultative Paper (Bank for International Settlements, 2003b)

1 Further details can be found on the DIPO website http://www.dipo-operationalrisk.it.

Chapter 2

Identification of the Risk Classes

Abstract  The chapter shows the procedure for identifying the risk classes, starting from the partition of the losses into business lines. While small business lines represent distinct risk classes, large business lines are divided following a statistical procedure aimed at creating homogeneous subgroups.
Keywords  Business line · Event type · Two-sample test

2.1 Introduction

Our starting point is the correspondence between business line and risk class, that is, each business line represents a risk class. However, under this assumption, we observe that three risk classes, BL3, BL4, and BL8, are highly sizeable and, as a result, scarcely homogeneous. Figure 2.1 shows the number of losses belonging to the eight business lines. A more careful identification of the risk classes could provide a more insightful description. Our aim is to ascertain the possibility of pooling the losses of different event types for BL3, BL4, and BL8 following a statistical procedure. In more detail, given two event types of a specific business line, we test the hypothesis that the losses belong to the same distribution. If we cannot reject this hypothesis, then we can merge the losses and consider them as belonging to the same risk class; on the other hand, if we reject the hypothesis, we have to keep the losses in separate risk classes. These hypothesis tests are conducted using a two-sample test.

2.2 Distributional Tests

In this section, we introduce the Kolmogorov–Smirnov test and the Anderson–Darling test, the most popular distributional tests, which can be used as powerful tools both for comparing a dataset to a specified distribution (goodness-of-fit test) and for comparing two datasets (two-sample test).


[Fig. 2.1 Size of the eight business lines: bar chart of the number of losses for BL1–BL8]

The Kolmogorov–Smirnov (KS) test was introduced by Kolmogorov (1933, 1941) and Smirnov (1939) as a goodness-of-fit test based on the distance between the empirical distribution function of a sample and the cumulative distribution function of the selected distribution (one-sample case). Alternatively, it can be used as a test to compare the empirical distribution functions of two samples (two-sample case). The distribution of the statistic is identified under the null hypothesis that the sample is drawn from the assumed distribution in the one-sample case, or that the samples are drawn from the same distribution in the two-sample case. In the one-sample case, the KS statistic is given by

D_n = \sqrt{n}\,\sup_x |F_n(x) - F(x)|,

where F_n(x) is the empirical cumulative distribution value for a sample of size n, and F(x) is the theoretical cumulative distribution value at x. The null hypothesis H_0: F_n(x) = F(x) is rejected if D_n is larger than the critical value D_\alpha at a given \alpha. This means that a band with a height of D_\alpha is drawn on both sides of the theoretical distribution, and if the empirical distribution falls outside that band at any given point, the null hypothesis is rejected. The two-sample version of the KS test generalizes to

D_{n_1 n_2} = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\,\sup_x |F_{n_1}(x) - F_{n_2}(x)|,    (2.1)

where F_{n_1}(x) and F_{n_2}(x) are the two empirical cumulative distribution values at x, based on datasets of size n_1 and n_2, respectively. The null hypothesis H_0: F_1(x) = F_2(x) is rejected if D_{n_1 n_2} is larger than the critical value D_\alpha at a given \alpha.
The main advantage of the KS test is its sensitivity to the scale and shape of a distribution, although this sensitivity is greater near the core of the distribution itself. Moreover, it is applicable and dependable even for small sample sizes (Lilliefors 1967). Therefore, the two-sample KS test is recommended in the following experimental situations:
• the distribution means or medians are similar but differences in variance or symmetry are suspected;
• the sample sizes are small;
• the differences between the distributions are suspected to affect only the upper or lower tail of the distributions;
• the shift between the two distributions is hypothesized to be small but systematic;
• the two samples are of unequal size.
The KS test is intended for continuous distributions, but it has also been adapted for discrete distributions (Arnold and Emerson 2011). The form of the test statistic is the same as in the continuous case. Consider two non-decreasing functions f and g, where f is a step function with jumps on the set {x_1, \ldots, x_n} and g is continuous. In order to determine the supremum of the difference between these two functions, notice that

\sup_x |f(x) - g(x)| = \max_i \left\{ \max\left( |g(x_i) - f(x_i)|,\; \lim_{x \to x_i} |g(x) - f(x_{i-1})| \right) \right\}
                     = \max_i \max\left( |g(x_i) - f(x_i)|,\; |g(x_i) - f(x_{i-1})| \right).    (2.2)

Computing the maximum over these 2n values (with f equal to F(x) and g equal to F_n(x) as defined above) is clearly the most efficient way to compute the KS test statistic for a continuous null distribution. When the function g is not continuous, however, equality (2.2) does not hold in general, because we cannot replace \lim_{x \to x_i} g(x) with the value g(x_i). If it is known that g is a step function, it follows that for some small \varepsilon,

\sup_x |f(x) - g(x)| = \max_i \left( |g(x_i) - f(x_i)|,\; |g(x_i - \varepsilon) - f(x_{i-1})| \right),

where the discontinuities in g are more than some distance \varepsilon apart. This, however, requires knowledge that g is a step function as well as of the nature of its support (specifically, the break points). As a result, we can implement the KS test statistic for discrete null distributions by requiring the complete specification of the null distribution.
The Anderson–Darling (AD) test was developed in Anderson and Darling (1952, 1954) as an alternative to other goodness-of-fit statistical tests for detecting whether a given sample of data is drawn from a specified probability distribution. The one-sample AD test statistic is non-directional and is computed from the following formula:


A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)\left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right)\right],

where {x_{(1)} < \cdots < x_{(n)}} is the ordered (from the smallest to the largest element) sample of size n, and F(x) is the underlying theoretical cumulative distribution to which the sample is compared. The null hypothesis that {x_{(1)} < \cdots < x_{(n)}} comes from the underlying distribution F(x) is rejected if A^2 is larger than the critical value A^2_\alpha at a given \alpha.
The two-sample AD test, introduced by Darling (1957) and Pettitt (1976), generalizes to the following formula:

A^2 = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1+n_2-1} \frac{\left[(n_1+n_2)\,N_i - n_1 i\right]^2}{i\,(n_1+n_2-i)},

where Z_{(n_1+n_2)} represents the combined and ordered sample obtained from X_{(n_1)} and Y_{(n_2)}, of size n_1 and n_2, respectively, and N_i represents the number of observations in X_{(n_1)} that are equal to or smaller than the ith observation in Z_{(n_1+n_2)}. The null hypothesis that the samples X_{(n_1)} and Y_{(n_2)} come from the same continuous distribution is rejected if A^2 is larger than the corresponding critical value.
The AD test has the advantage of being very sensitive, especially toward differences at the tails of distributions, and it is also recommended for comparisons between samples belonging to continuous distributions. However, critical values have to be computed for each distribution. Anderson and Darling (1954) found that for one set of observations the KS and AD tests provided the same result. Stephens (1974) compared several one-sample goodness-of-fit tests and concluded that the KS and AD tests surpassed the χ² test in power and managed to detect changes in mean better as well. The AD test has the same advantages as the KS test, namely, its sensitivity to the shape and scale of a distribution (Anderson and Darling 1954) and its applicability to small samples (Pettitt 1976). Specifically, the critical values for the AD test rise asymptotically and converge very quickly toward the asymptote (Anderson and Darling 1954; Stephens 1974; Pettitt 1976). In addition, the AD test has two extra advantages over the KS test (Engmann and Cousineau 2011):
1. it is especially sensitive toward differences in the tails of distributions;
2. it is better capable of detecting very small differences, even between large sample sizes.
In summary, in the case of continuous distributions, the AD test should be preferred to the KS test. When dealing with discrete distributions instead, the KS test is more suitable than the AD test, thanks to the modification introduced by Arnold and Emerson (2011).
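Both two-sample tests are available in common statistical software. The snippet below is a minimal illustration on hypothetical samples, using scipy's two-sample KS test and its k-sample Anderson–Darling test (with k = 2); it is not the code used by the authors.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two hypothetical samples of losses for two event types of the same business line
losses_et_a = rng.lognormal(mean=9.0, sigma=1.5, size=400)
losses_et_b = rng.lognormal(mean=9.1, sigma=1.5, size=250)

# Two-sample Kolmogorov-Smirnov test
ks_res = stats.ks_2samp(losses_et_a, losses_et_b)

# k-sample Anderson-Darling test (here k = 2)
ad_res = stats.anderson_ksamp([losses_et_a, losses_et_b])

print(f"KS statistic = {ks_res.statistic:.3f}, p-value = {ks_res.pvalue:.3f}")
print(f"AD statistic = {ad_res.statistic:.3f}, "
      f"significance level = {ad_res.significance_level:.3f}")
```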


2.3 Application to DIPO Data

KS and AD tests have been used to ascertain the possibility of pooling the losses of different event types for each of the three business lines BL3, BL4, and BL8. The pooling process for a single business line consists of the following steps:
1. pairwise evaluation of the event types in order to verify whether they come from the same unknown distribution function F(x), based on the p-value of a two-sample test;
2. assessment of the p-values obtained from the different pairwise comparisons against a given α (0.01 in this case):
   (a) if the biggest p-value obtained is greater than α, the two event types producing that p-value are pooled;
   (b) if the biggest p-value obtained is smaller than α, the process breaks off and no pooling is possible;
3. repetition of steps 1–2 for the six remaining event types (the five original event types plus the pooled one) if case (a) occurred in the previous step.
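A minimal sketch of this pooling loop is given below. The function name, the input structure, and the use of scipy's k-sample Anderson–Darling test are assumptions made for illustration; the book's own implementation is not reported here. Applied to the event types of BL4, a loop of this kind reproduces the behaviour described later in this section (ET3 and ET7 pooled first, then ET6, with no further pooling passing the α = 0.01 cut-off).

```python
from itertools import combinations

import numpy as np
from scipy.stats import anderson_ksamp


def pool_event_types(samples, alpha=0.01):
    """Iteratively pool event types whose losses appear to share a common distribution.

    samples: dict mapping an event-type label to a 1-D array of losses (hypothetical input).
    Returns a dict of (possibly pooled) risk classes for the business line.
    """
    groups = {label: np.asarray(losses) for label, losses in samples.items()}
    while len(groups) > 1:
        # Step 1: pairwise two-sample tests among the current groups
        best_pair, best_p = None, -1.0
        for a, b in combinations(groups, 2):
            p_value = anderson_ksamp([groups[a], groups[b]]).significance_level
            if p_value > best_p:
                best_pair, best_p = (a, b), p_value
        # Step 2: pool the pair with the largest p-value, provided it exceeds alpha
        if best_p <= alpha:
            break  # case (b): no further pooling is possible
        a, b = best_pair
        groups[a + "+" + b] = np.concatenate([groups.pop(a), groups.pop(b)])
        # Step 3: the loop repeats with one group fewer
    return groups
```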

Table 2.1 Pooling results using KS and AD tests for business lines 3, 4, 8 and percentages within each of the three business lines

Kolmogorov–Smirnov           Anderson–Darling
Risk class    Percentage     Risk class    Percentage
BL3/ET1       2.92           BL3/ET1       2.92
BL3/ET26      42.16          BL3/ET2       41.19
BL3/ET3       8.36           BL3/ET3       8.36
BL3/ET4       16.15          BL3/ET4       16.15
BL3/ET5       1.83           BL3/ET5       1.83
–             –              BL3/ET6       0.97
BL3/ET7       28.58          BL3/ET7       28.58
Total         100            Total         100
BL4/ET1       0.51           BL4/ET1       0.51
BL4/ET2       22.74          BL4/ET2       22.74
BL4/ET367     47.03          BL4/ET367     47.03
BL4/ET4       27.53          BL4/ET4       27.53
BL4/ET5       2.19           BL4/ET5       2.19
Total         100            Total         100
BL8/ET1       2.68           BL8/ET1       2.68
BL8/ET27      15.39          BL8/ET27      15.39
BL8/ET35      0.58           BL8/ET35      0.58
BL8/ET4       80.84          BL8/ET4       80.84
BL8/ET6       0.50           BL8/ET6       0.50
Total         100            Total         100

Table 2.2 Operational risk classes and percentages

Risk class    Percentage
BL1           0.10
BL2           3.25
BL3/ET1       1.49
BL3/ET2       20.99
BL3/ET3       4.26
BL3/ET4       8.23
BL3/ET5       0.93
BL3/ET6       0.49
BL3/ET7       14.57
BL4/ET1       0.06
BL4/ET2       2.69
BL4/ET367     5.56
BL4/ET4       3.26
BL4/ET5       0.26
BL5           0.54
BL6           0.40
BL7           0.42
BL8/ET1       0.87
BL8/ET27      5.00
BL8/ET35      0.19
BL8/ET4       26.29
BL8/ET6       0.16
Total         100

At every step of the algorithm, the number of event types decreases by one because two of them are pooled. Pooling can also take place between event types that were themselves pooled previously. If poolings are feasible at every step, the process stops only when all event types are pooled into one, showing that all the event types belonging to a specific business line come from the same distribution. For example, when considering BL4, ET3 and ET7 were pooled at the first step because they showed the biggest p-value among the 21 pairwise comparisons. The new group is named ET37. Then, at the second step, the two-by-two comparisons among ET1, ET2, ET4, ET5, ET6, and ET37 were carried out. During this step, ET6 and ET37 turned out to be compatible, and as a consequence they were pooled into a group named ET367. At the third step, we therefore had ET1, ET2, ET4, ET5, and ET367, but no further pooling was possible. As a result, BL4 is split into BL4/ET1, BL4/ET2, BL4/ET367, BL4/ET4, and BL4/ET5.


In order to compare the two types of pooling, Table 2.1 shows the pooled event types for BL3, BL4, and BL8 according to both the KS and the AD test, together with the percentages. It is clear that the findings are very similar. Using the KS method, we obtain one additional pooling (ET2 and ET6 of BL3), showing that the AD method is more conservative in pooling the event types. The single classes display a satisfactory size. BL4/ET1 turns out to be the risk class with the smallest size (102 observations), but it cannot be pooled with other classes, according to both the KS and the AD test. In conclusion, both tests provide remarkable results. However, having argued in Sect. 2.2 that the AD test is more useful than the KS test for analyzing operational loss data, since it gives a higher weight to the tail of a distribution, we have decided to use the classes pooled according to the AD method for the following analysis. The final risk classes are 22 and are summarized in Table 2.2, together with the respective percentages.

References

T.W. Anderson, D.A. Darling, Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Ann. Math. Stat. 23(2), 193–212 (1952)
T.W. Anderson, D.A. Darling, A test of goodness of fit. J. Am. Stat. Assoc. 49(268), 765–769 (1954)
T.B. Arnold, J.W. Emerson, Nonparametric goodness-of-fit tests for discrete null distributions. R J. 3(2), 34–39 (2011)
D.A. Darling, The Kolmogorov-Smirnov, Cramer-von Mises tests. Ann. Math. Stat. 28(4), 823–838 (1957)
S. Engmann, D. Cousineau, Comparing distributions: the two-sample Anderson-Darling test as an alternative to the Kolmogorov-Smirnoff test. J. Appl. Quant. Methods 6(3), 1–17 (2011)
A.N. Kolmogorov, Sulla determinazione empirica di una legge di distribuzione. G. dell'Istituto Italiano degli Attuari 4, 83–91 (1933)
A.N. Kolmogorov, Confidence limits for an unknown distribution function. Ann. Math. Stat. 12(4), 461–463 (1941)
H.W. Lilliefors, On the Kolmogorov-Smirnov test for normality with mean and variance unknown. J. Am. Stat. Assoc. 62(318), 399–402 (1967)
A.N. Pettitt, A two-sample Anderson-Darling rank statistic. Biometrika 63(1), 161–168 (1976)
N. Smirnov, Sur les écarts de la courbe de distribution empirique. Mat. Sb. 48(1), 3–26 (1939)
M.A. Stephens, EDF statistics for goodness of fit and some comparisons. J. Am. Stat. Assoc. 69(347), 730–737 (1974)

Chapter 3

Severity Analysis

Abstract  In this chapter, the severities of the operational losses are estimated using a non-negative continuous distribution. The Log-normal distribution and, in some cases, the mixture of two or more Log-normal distributions have been studied. The number of components of each mixture has been selected using a proposed procedure which balances goodness-of-fit and parsimony.
Keywords  Log-normal distributions · Mixture of distributions · Goodness-of-fit test

3.1 Introduction

After identifying the risk classes to employ in the analysis, we have determined which statistical distribution properly fits the "Severity" of each observed risk class. The Severity is depicted as a non-negative continuous random variable X that can be described with a probability density function f_X(x, θ), where θ is either the parameter or the vector of parameters. We have decided to adopt the Log-normal distribution for our model since it is extensively used in financial environments for non-negative variables (Soprano et al. 2009). In more detail, we have focused on the three-parameter Log-normal distribution, which fits our data satisfactorily (as shown later), given that they include only losses greater than a given threshold. In addition, we have proved that applying the three-parameter Log-normal distribution to operational loss data is tantamount to applying the Normal distribution to the natural logarithms of the losses, allowing us to carry out our analysis by using functions which are usually available in most statistical software packages. However, for some classes, the three-parameter Log-normal distribution could not lead to an adequate estimation. In such cases, to achieve the required results it is necessary to use a mixture of two or more Log-normal distributions. For this reason, the ideal number of components has to be estimated for each risk class (if a mixture is not required, the number of components amounts to 1). The findings of our analysis are shown in Sect. 3.4.


3.2 Mixture of Three-Parameter Log-Normal Distributions

The three-parameter Log-normal distribution is a positively skewed distribution, useful for modeling continuous positive random variables with support set [γ, +∞) for some γ ≥ 0 (Aristizabal 2012). The probability density function (pdf) of the three-parameter Log-normal distribution is

f(x; \mu, \sigma, \gamma) = \frac{1}{(x-\gamma)\,\sigma\sqrt{2\pi}} \exp\left\{ -\frac{[\ln(x-\gamma)-\mu]^2}{2\sigma^2} \right\},     (3.1)

where x > γ ≥ 0, −∞ < μ < +∞, σ > 0. In (3.1):
• γ is the threshold (or location) parameter that defines the point where the support set of the distribution begins;
• μ is the scale parameter that extends or shrinks the distribution;
• σ is the shape parameter that affects the shape of the distribution.
The (two-parameter) Log-normal distribution is a special case of the three-parameter Log-normal distribution when γ = 0.

As we have mentioned before, we prove with a few simple steps that if X is a random variable with a three-parameter Log-normal distribution with parameters γ, μ and σ, then Y = ln(X − γ) has a Normal distribution with mean μ and variance σ². From (3.1), we define Y = ln(X − γ) and then X = e^Y + γ. Applying the change-of-variable formula

g(y) = f(x)\left|\frac{\partial x}{\partial y}\right| = f(x) \cdot e^{y},     (3.2)

we have

g(y) = e^{y}\,\frac{1}{e^{y}\,\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(y-\mu)^2}{2\sigma^2} \right\} = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(y-\mu)^2}{2\sigma^2} \right\}.     (3.3)

Similarly, it can be shown that if X is a random variable with a mixture of k three-parameter Log-normal distributions, then Y = ln(X − γ) has a mixture of k Normal distributions, each component having mean μ_i and variance σ_i², i = 1, …, k. The pdf of a mixture of k Log-normal distributions with weights p_i, such that \sum_{i=1}^{k} p_i = 1, is given by

f(x) = \sum_{i=1}^{k} p_i\, \frac{1}{(x-\gamma)\,\sigma_i\sqrt{2\pi}} \exp\left\{ -\frac{[\ln(x-\gamma)-\mu_i]^2}{2\sigma_i^2} \right\},
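The equivalence just proved can also be checked numerically. The following R sketch (with arbitrary parameter values, not estimates from the DIPO data) simulates a three-parameter Log-normal sample and verifies that ln(X − γ) behaves as a Normal variable with mean μ and standard deviation σ.

## Numerical check: three-parameter Log-normal vs. Normal on ln(X - gamma).
## All parameter values below are illustrative, not DIPO estimates.
set.seed(1)
gamma <- 5000     # threshold (location) parameter
mu    <- 9.5      # scale parameter
sigma <- 1.8      # shape parameter

x <- gamma + rlnorm(10000, meanlog = mu, sdlog = sigma)   # three-parameter Log-normal sample
y <- log(x - gamma)                                       # shifted logarithmic transform

c(mean(y), sd(y))          # close to (mu, sigma)
shapiro.test(y[1:5000])    # normality check on a subsample (shapiro.test accepts at most 5000 obs)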

3.3 Extreme Value Theory

The traditional approach has considered the analysis of the Severities of the operational loss data after dividing them into two distinct parts:
• high-frequency losses with a low impact (the body of the distribution);
• low-frequency losses with a high impact (the tail of the distribution).
The two parts generally require different distributions for an effective fitting (Soprano et al. 2009). Techniques from Extreme Value Theory (EVT), which deals with the stochastic behavior of the extreme values of a process, usually prove to be beneficial in such a model.

Let X be a random variable with distribution function F and let u be a threshold. The data above the threshold are defined as extreme. The conditional distribution

F_u(y) = P(X - u \le y \mid X > u)

is known as the excess distribution function of X above the threshold u. When u → ∞, for a wide set of distribution classes F, the limit distribution of the excess distribution function is given by the Generalized Pareto Distribution (GPD),

G_{\xi,\beta}(y) = \begin{cases} 1 - \left(1 + \xi \dfrac{y}{\beta}\right)^{-1/\xi} & \text{if } \xi \neq 0 \\ 1 - \exp\left(-\dfrac{y}{\beta}\right) & \text{if } \xi = 0, \end{cases}

where y ≥ 0 if ξ ≥ 0 and 0 ≤ y ≤ −β/ξ if ξ < 0; ξ ∈ R and β > 0 are, respectively, the shape and scale parameters. If ξ > 0, the distribution is characterized by fat tails; in other words, the tail of the distribution decays more slowly than the exponential one. After defining X as a function of Y, X = u + Y, the GPD can be written in terms of X,

G_{\xi,\beta}(x - u) = 1 - \left(1 + \xi \frac{x - u}{\beta}\right)^{-1/\xi}.

The estimate of F(x) is given by

\hat{F}(x) = 1 - \frac{N_u}{n}\left(1 + \xi \frac{x - u}{\beta}\right)^{-1/\xi}

for x > u, where N_u is the number of observations over the threshold and n is the total number of observations. The generic percentile \hat{F}^{-1}(q), with q > F(u), can be obtained through

\hat{F}^{-1}(q) = u + \frac{\beta}{\xi}\left[\left(\frac{n}{N_u}(1 - q)\right)^{-\xi} - 1\right].

The parameters of the GPD are usually estimated using the popular method of maximum likelihood applied to the data above the threshold. The density function of the GPD is given by

g_{\xi,\beta}(y) = \begin{cases} \dfrac{1}{\beta}\left(1 + \xi \dfrac{y}{\beta}\right)^{-1/\xi - 1} & \text{if } \xi \neq 0 \\ \dfrac{1}{\beta}\exp\left(-\dfrac{y}{\beta}\right) & \text{if } \xi = 0. \end{cases}

The choice of the threshold is a critical point. A low threshold allows the researcher to consider a high number of data; however, some of these data are not really extreme. Conversely, a high threshold ensures a low number of data, which are really extreme; however, their limited size can inflate the variance of the estimates. According to the literature (see Soprano et al. 2009), the mean excess function e(u) = E(X − u | X > u) is the most important and widely used tool. In general, a graphical approach is implemented, plotting the estimated mean excess function

\hat{e}(u) = \frac{1}{N_u}\sum_{i=1}^{N_u}(x_i - u)

against u. When the plot of the empirical mean excess function becomes approximately linear above a value of u, the excess data above u are distributed as a GPD. In fact, a GPD with shape parameter ξ and scale parameter β implies a mean excess function which is a linear function of the threshold,

e(u) = \frac{\beta + \xi u}{1 - \xi}.
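A minimal R sketch of this graphical diagnostic is given below; the losses are simulated and the grid of candidate thresholds is arbitrary. Any implementation of the empirical mean excess function yields the same plot (the evir package used later also offers it through meplot).

## Empirical mean excess function e(u) = mean(x[x > u] - u), plotted against the threshold u.
## An approximately linear stretch above some u suggests GPD behaviour of the excesses.
mean_excess <- function(x, thresholds) {
  sapply(thresholds, function(u) mean(x[x > u] - u))
}

set.seed(2)
losses <- 5000 + rlnorm(5000, meanlog = 10, sdlog = 2)    # illustrative losses, not DIPO data
u_grid <- quantile(losses, probs = seq(0.50, 0.99, by = 0.01))

plot(u_grid, mean_excess(losses, u_grid), type = "b",
     xlab = "threshold u", ylab = "mean excess e(u)")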

3.4 Application to DIPO Data

3.4.1 Mixture of k Log-Normal Distributions

Taking advantage of the proofs shown in Sect. 3.2, we have estimated the distribution of the DIPO operational losses by means of a mixture of Log-normal distributions for each of the 22 risk classes shown in Table 2.2. Considering that DIPO collects only loss data greater than or equal to €5000 (see Chap. 1), we have subtracted the threshold value from each class (whose data we denote by X) and applied the logarithmic transformation (so that Y = ln(X − 5000)). At this point, we have estimated a Normal mixture with k components for each class. To this end, we have used the R software, exploiting mclust, a package for classification and parameter estimation based on finite Normal mixture modeling (Fraley et al. 2012).


Table 3.1 Estimated mixtures according to the following criteria: BIC, arg min(p-value > 0.05)

Risk class | Mixture components (BIC) | p-value | Mixture components (p-value > 0.05) | p-value
BL1 | 1 | 0.708 | 1 | 0.708
BL2 | 2 | 0.574 | 2 | 0.574
BL3/ET1 | 1 | 0.965 | 1 | 0.965
BL3/ET2 | 5 | 0.829 | 3 | 0.068
BL3/ET3 | 3 | 0.576 | 1 | 0.077
BL3/ET4 | 2 | 0.055 | 2 | 0.055
BL3/ET5 | 2 | ≈1 | 1 | 0.361
BL3/ET6 | 2 | ≈1 | 1 | 0.303
BL3/ET7 | 2 | 0.349 | 2 | 0.349
BL4/ET1 | 1 | 0.985 | 1 | 0.985
BL4/ET2 | 2 | 0.974 | 2 | 0.974
BL4/ET367 | 2 | 0.708 | 1 | 0.063
BL4/ET4 | 1 | 0.357 | 1 | 0.357
BL4/ET5 | 2 | 0.994 | 1 | 0.267
BL5 | 1 | 0.824 | 1 | 0.824
BL6 | 1 | 0.560 | 1 | 0.560
BL7 | 2 | ≈1 | 1 | 0.554
BL8/ET1 | 1 | 0.920 | 1 | 0.920
BL8/ET27 | 2 | 0.547 | 2 | 0.547
BL8/ET35 | 1 | 0.997 | 1 | 0.997
BL8/ET4 | 2 | 0.086 | 2 | 0.086
BL8/ET6 | 1 | 0.969 | 1 | 0.969

The mclust package chooses the ideal number of components k of the mixture as the model with the highest Bayesian Information Criterion (BIC) among those computed (by default mclust estimates up to nine components). Since this method can occasionally produce a rather high number of components, we have also computed for each class the minimum number of components k such that the null hypothesis of the AD goodness-of-fit test, in its one-sample version defined in Sect. 2.2, is not rejected at the significance level α = 0.05. For each risk class, Table 3.1 shows the number of components suggested by the mclust package according to the BIC, the minimum number of components such that the p-value of the AD test is greater than 0.05, and the related p-values. As expected, the mixtures obtained with the second method have a smaller average number of components per class (1.364) than the BIC-based criterion (1.773 components per class on average). Observing the fourth column in greater detail, it appears that 15 classes out of 22 need only one component, i.e., we can simply use the Normal distribution to estimate the statistical distribution of


our classes without adopting any mixture. Six classes need a two-component mixture, whereas only BL3/ET2 requires a mixture with at least three components. The aforementioned criterion works very satisfactorily in terms of parsimony, but we cannot say the same if we take into account the p-values of the estimates (for example, in five cases the p-value is only slightly above the α value). To obtain more robust p-values while keeping a small number of components, we propose the following selection criterion.

Definition 3.1 Let p(k) be the AD test p-value for a mixture of k components, and let

k_1 = \min\{\, k : p(k) > 0.05 \,\}.

Then, the final number of components is given by

k_2 = k_1 + \sum_{j=1}^{\infty} I\big( p(k_1 + j) - p(k_1 + j - 1) > \upsilon^{\,j} \big),

where I(·) is an indicator function and υ ∈ [0, 1] is a value chosen by the researcher.

In this analysis, we have chosen to set υ = 0.5, so that the number of components is increased with respect to the basic model only if the gain in terms of p-value is noteworthy. However, to avoid a too high number of components, we have set the maximum number of components per class to 3 (as a consequence, for the risk class BL3/ET2 the richer solution is ruled out and the mixture of three components is considered). The results of the application of the selected criterion are shown in Table 3.2. The average number of components (1.545) is almost exactly halfway between those of the models shown in Table 3.1. This means that the criterion accomplishes the goal of providing a kind of "summary" between the BIC and a purely parsimony-based criterion. It is noteworthy that one-half of the classes do not need a mixture for the estimation of their statistical distribution, whereas in the other half only one class (BL3/ET2) requires three components. The p-values are satisfactory. Four classes (BL3/ET2, BL3/ET3, BL3/ET4, BL8/ET4) still have a p-value smaller than 0.10, but, in view of the proposed decision criterion, increasing the number of components of these classes would not achieve a significant improvement of the p-values.

After estimating the distribution of each class, we show the probability density functions for the 22 risk classes computed by means of the mclust package, along with the histogram of each class. As can be seen from Figs. 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11, 3.12, 3.13, 3.14, 3.15, 3.16, 3.17, 3.18, 3.19, 3.20, 3.21, and 3.22, the estimated distribution functions fit the loss data adequately.
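The component-selection loop behind Definition 3.1 can be sketched in R as follows. The simulated log-losses and the use of goftest::ad.test as the one-sample Anderson–Darling test are our assumptions; the text does not state which AD implementation was used.

library(mclust)    # finite Normal mixture estimation (Fraley et al. 2012)
library(goftest)   # one-sample Anderson-Darling test (assumed implementation)

## Distribution function of a fitted univariate Normal mixture (Mclust object)
mix_cdf <- function(q, fit) {
  w <- fit$parameters$pro
  m <- fit$parameters$mean
  s <- sqrt(fit$parameters$variance$sigmasq)
  if (length(s) == 1) s <- rep(s, length(m))   # equal-variance models store a single sigma
  out <- 0
  for (j in seq_along(w)) out <- out + w[j] * pnorm(q, m[j], s[j])
  out
}

## Smallest k such that the AD p-value exceeds 0.05 (the k1 of Definition 3.1)
select_k1 <- function(y, kmax = 9, alpha = 0.05) {
  for (k in 1:kmax) {
    fit <- Mclust(y, G = k)
    pv  <- ad.test(y, null = function(q) mix_cdf(q, fit))$p.value
    if (pv > alpha) return(list(k1 = k, p.value = pv))
  }
  NULL
}

set.seed(3)
y <- c(rnorm(1500, 8.5, 1.0), rnorm(500, 11, 0.8))   # illustrative log-losses, not DIPO data
select_k1(y)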


Table 3.2 Estimated mixtures according to the proposed criterion

Risk class | Mixture components | p-value
BL1 | 1 | 0.708
BL2 | 2 | 0.574
BL3/ET1 | 1 | 0.965
BL3/ET2 | 3 | 0.068
BL3/ET3 | 1 | 0.077
BL3/ET4 | 2 | 0.055
BL3/ET5 | 2 | ≈1
BL3/ET6 | 2 | 0.999
BL3/ET7 | 2 | 0.349
BL4/ET1 | 1 | 0.985
BL4/ET2 | 2 | 0.974
BL4/ET367 | 2 | 0.708
BL4/ET4 | 1 | 0.357
BL4/ET5 | 2 | 0.994
BL5 | 1 | 0.824
BL6 | 1 | 0.560
BL7 | 1 | 0.554
BL8/ET1 | 1 | 0.920
BL8/ET27 | 2 | 0.547
BL8/ET35 | 1 | 0.997
BL8/ET4 | 2 | 0.086
BL8/ET6 | 1 | 0.969

Fig. 3.1 BL1—Severity distribution (histogram of losses on the log scale with the fitted density)

Figs. 3.2–3.22 Severity distributions (histogram of losses on the log scale with the fitted density) for BL2, BL3/ET1, BL3/ET2, BL3/ET3, BL3/ET4, BL3/ET5, BL3/ET6, BL3/ET7, BL4/ET1, BL4/ET2, BL4/ET367, BL4/ET4, BL4/ET5, BL5, BL6, BL7, BL8/ET1, BL8/ET27, BL8/ET35, BL8/ET4, and BL8/ET6, respectively

3.4.2 Log-Normal–GPD Distribution

In order to estimate a GPD on the DIPO data, we first need to set an appropriate body–tail threshold for each risk class. As mentioned in Sect. 3.3, there are no consolidated analytical methods for this purpose, and the standard approach is a qualitative choice based on graphical analysis. However, when choosing the threshold we have imposed a constraint: the number of extreme values has to be equal to or greater than 30 for each class, so that the GPD estimation is based on a sufficient number of observations. Table 3.3 shows, for each class, the quantile used as threshold and the p-value of the AD test. It is evident that the quantiles chosen as thresholds vary remarkably from class to class. As expected, the largest classes have the highest thresholds: the BL8/ET4 class, which includes the highest number of observations (see Table 2.2), has a threshold equal to the 99.85th percentile, whereas the BL4/ET1 class, with the lowest number of observations, has a very low threshold equal to the 65th percentile. The parameters are estimated by means of the evir package (Pfaff et al. 2004), which provides functions for GPD computation; maximum likelihood is used for parameter estimation. The estimates obtained by GPD modeling of the risk classes are satisfactory: the last column of Table 3.3 highlights that the p-values provided by the Anderson–Darling test are greater than 0.7 for every class, the only exception being BL4/ET4, which nevertheless shows a broadly acceptable value (0.336).
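A minimal sketch of this tail estimation with the evir package is shown below on simulated losses; the 95% threshold quantile and all numerical values are illustrative, not those of Table 3.3.

library(evir)   # GPD fitting by maximum likelihood (Pfaff et al. 2004)

set.seed(4)
losses <- 5000 + rlnorm(20000, meanlog = 10, sdlog = 2)   # illustrative losses, not DIPO data

u   <- quantile(losses, 0.95)        # body-tail threshold (illustrative quantile)
fit <- gpd(losses, threshold = u)    # ML estimates of the GPD parameters on the excesses

fit$par.ests                         # shape xi and scale beta
fit$n.exceed                         # number of exceedances (kept >= 30 in the text)
riskmeasures(fit, 0.999)             # GPD-based estimate of the 99.9% quantile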

Table 3.3 GPD estimation on DIPO data

Risk class | Threshold quantile (%) | p-value
BL1 | 81.00 | 0.992
BL2 | 98.80 | 0.741
BL3/ET1 | 98.00 | 0.998
BL3/ET2 | 99.60 | 0.926
BL3/ET3 | 99.30 | 0.754
BL3/ET4 | 99.70 | 0.968
BL3/ET5 | 97.00 | 0.998
BL3/ET6 | 96.10 | 0.978
BL3/ET7 | 99.79 | 0.999
BL4/ET1 | 65.00 | 0.998
BL4/ET2 | 99.00 | 0.933
BL4/ET367 | 99.31 | 0.824
BL4/ET4 | 99.10 | 0.336
BL4/ET5 | 90.00 | 0.890
BL5 | 95.90 | 0.895
BL6 | 92.50 | 0.995
BL7 | 92.00 | 0.988
BL8/ET1 | 95.60 | 0.998
BL8/ET27 | 99.65 | 0.999
BL8/ET35 | 88.00 | 0.964
BL8/ET4 | 99.85 | 0.940
BL8/ET6 | 87.50 | 0.860

Fig. 3.23 BL1—Log-normal versus GPD fit (empirical distribution function of the excess losses with the fitted Log-normal mixture and GPD curves)

Fig. 3.24 BL2—Log-normal versus GPD fit
Fig. 3.25 BL3/ET1—Log-normal versus GPD fit

After obtaining the tail distribution estimates of the 22 risk classes and recalling the Log-normal mixtures previously estimated, we can characterize the whole operational loss distribution by comparing, for each class, the empirical distribution function with the curves of the Log-normal mixture and of the GPD. Figures 3.23, 3.24, 3.25, 3.26, 3.27, 3.28, 3.29, 3.30, 3.31, 3.32, 3.33, 3.34, 3.35, 3.36, 3.37, 3.38, 3.39, 3.40, 3.41, 3.42, 3.43, and 3.44 depict a graphical analysis for each class, taking into account the whole dataset. Each chart shows the empirical distribution function of the data, the Log-normal mixture (red line), and the GPD (green line), which starts from the threshold quantile associated with it. As can be seen, both curves fit the losses very well. In many cases they practically coincide, highlighting that our approach (based on a mixture of k Log-normal distributions) explains the behavior of the extreme data associated with operational risk as effectively as the classical EVT methodology.

We have then compared the traditional approach with what we have proposed up to now. In more detail, the procedure described in Sect. 3.2 considers the estimation of a Log-normal mixture for all loss data in each class, while in Sect. 3.3 we have introduced an analysis of the extreme data only, aimed at understanding whether the estimation of a GPD was possible. Now the abovementioned steps are carried out together instead. In other words, the Severity of each class is obtained by estimating separately the body of the distribution below the threshold u and the remaining part of the distribution, that is, the tail (Soprano et al. 2009). For losses lower than u, we use a simple Log-normal distribution, whereas for losses higher than the threshold EVT is applied. Thresholds are set to the values reported in Table 3.3.

Fig. 3.26 BL3/ET2—Log-normal versus GPD fit
Fig. 3.27 BL3/ET3—Log-normal versus GPD fit
Fig. 3.28 BL3/ET4—Log-normal versus GPD fit
Fig. 3.29 BL3/ET5—Log-normal versus GPD fit
Fig. 3.30 BL3/ET6—Log-normal versus GPD fit
Fig. 3.31 BL3/ET7—Log-normal versus GPD fit

The density function of severity random variable X is then  f X (x) =

4.2.1 The Poisson Distribution

A discrete random variable X follows a Poisson distribution with parameter λ > 0 with probability function given by


P(X = x) = e^{-\lambda}\,\frac{\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots

where e is Euler's number. It is also denoted as X ∼ Po(λ). The Poisson random variable has expectation given by

E[X] = \sum_{x=0}^{\infty} x\, e^{-\lambda}\frac{\lambda^x}{x!} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} = \lambda,

that is, λ is the expected number of events per interval, and variance given by

Var[X] = E[X^2] - E[X]^2 = \sum_{x=0}^{\infty} x^2 e^{-\lambda}\frac{\lambda^x}{x!} - \lambda^2
       = \sum_{x=0}^{\infty} \big(x(x-1) + x\big) e^{-\lambda}\frac{\lambda^x}{x!} - \lambda^2
       = e^{-\lambda}\sum_{x=0}^{\infty} x(x-1)\frac{\lambda^x}{x!} + e^{-\lambda}\sum_{x=0}^{\infty} x\frac{\lambda^x}{x!} - \lambda^2
       = \lambda^2 + \lambda - \lambda^2 = \lambda.

The sample estimate of the Poisson parameter obtained by the maximum likelihood method, \hat{\lambda}_{MLE}, is the average of the n i.i.d. observations x_1, x_2, \ldots, x_n:

\hat{\lambda}_{MLE} = \frac{1}{n}\sum_{i=1}^{n} x_i.

The maximum likelihood estimator of λ is unbiased and also efficient, i.e., its variance achieves the Cramér–Rao lower bound.

4.2.2 Finite Poisson Mixture

A discrete random variable X is defined as a mixture of k Poisson distributions with probabilities p_i (i = 1, \ldots, k) when the following probability mass function applies:

P(X = x) = p_1\,\frac{\lambda_1^x}{x!}e^{-\lambda_1} + p_2\,\frac{\lambda_2^x}{x!}e^{-\lambda_2} + \cdots + p_k\,\frac{\lambda_k^x}{x!}e^{-\lambda_k}, \quad x = 0, 1, 2, \ldots

where \lambda_1, \ldots, \lambda_k are the k positive component means and \sum_{i=1}^{k} p_i = 1.
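As an illustration, the pmf of a finite Poisson mixture can be evaluated in R with a few lines; the weights and component means below are arbitrary.

## Probability mass function of a k-component Poisson mixture.
dpois_mix <- function(x, p, lambda) {
  ## p: mixing weights summing to one; lambda: positive component means
  out <- 0
  for (j in seq_along(p)) out <- out + p[j] * dpois(x, lambda[j])
  out
}

p      <- c(0.7, 0.3)            # illustrative weights
lambda <- c(0.2, 3.0)            # illustrative component means
dpois_mix(0:5, p, lambda)        # P(X = 0), ..., P(X = 5)
sum(dpois_mix(0:50, p, lambda))  # close to 1 over a wide enough support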


4.3 Mixture of Negative Binomial Distributions

4.3.1 The Negative Binomial Distribution

The Poisson distribution involves a constant rate of loss occurrence over time. Actually, the frequency of operational losses is not constant, and the Negative Binomial distribution could be used to model the frequencies of operational losses (Soprano et al. 2009). Consider a sequence of independent Bernoulli trials. Each trial has two potential outcomes, called "success" and "failure". In each trial, the probability of success is p and the probability of failure is (1 − p). We observe this sequence until a predefined number r of failures has occurred. Then the random number of successes, X, will have the Negative Binomial distribution, which is written as X ∼ NB(r, p). The Negative Binomial random variable is thus a two-parameter discrete variable. Its probability function is

P(X = x) = \binom{x + r - 1}{x} p^x (1 - p)^r, \quad x = 0, 1, 2, \ldots     (4.1)

The quantity in parentheses is the binomial coefficient, which counts the number of potential success and failure combinations and is equal to

\binom{x + r - 1}{x} = \frac{(x + r - 1)!}{x!\,(r - 1)!} = \frac{(x + r - 1)(x + r - 2)\cdots r}{x!}.     (4.2)

Equation (4.2) can alternatively be written in the following manner, explaining the name "Negative Binomial":

\frac{(x + r - 1)\cdots r}{x!} = (-1)^x \frac{(-r)(-r - 1)(-r - 2)\cdots(-r - x + 1)}{x!} = (-1)^x \binom{-r}{x}.

The expectation and variance of the Negative Binomial distribution are, respectively, given by

E[X] = r\,\frac{p}{1 - p} \quad \text{and} \quad Var[X] = r\,\frac{p}{(1 - p)^2}.

The definition of the Negative Binomial distribution can be extended to the case of the parameter r taking on a positive real value. Although it is impossible to visualize a non-integer number of "failures", we can still formally define the distribution through its probability mass function. The problem of extending the definition to real-valued (positive) r boils down to extending the binomial coefficient to its real-valued counterpart, based on the Gamma function:

\binom{x + r - 1}{x} = \frac{(x + r - 1)(x + r - 2)\cdots r}{x!} = \frac{\Gamma(x + r)}{x!\,\Gamma(r)}.

Now, after substituting this expression in (4.1), we say that X has a Negative Binomial distribution if it has probability mass function

P(X = x) = \frac{\Gamma(x + r)}{x!\,\Gamma(r)}\, p^x (1 - p)^r, \quad x = 0, 1, 2, \ldots     (4.3)

where r is a positive real number.

In Negative Binomial regression (Hilbe 2011), the distribution is specified in terms of its mean, \mu = \frac{p\,r}{1 - p}, which is then related to explanatory variables as in linear regression or other generalized linear models. From the expression for the mean μ, one can derive p = \frac{\mu}{\mu + r} and 1 - p = \frac{r}{\mu + r}. Substituting these expressions in (4.3) yields a different parametrization of the probability function in terms of μ:

P(X = x) = \frac{\Gamma(x + r)}{x!\,\Gamma(r)} \left(\frac{\mu}{r + \mu}\right)^x \left(\frac{r}{r + \mu}\right)^r, \quad x = 0, 1, 2, \ldots     (4.4)

The variance can then be written as \mu + \frac{\mu^2}{r}. In this context, the parameter r is referred to as the "dispersion parameter", "shape parameter", "clustering coefficient" (Lloyd-Smith 2007), or the "heterogeneity" or "aggregation" parameter (Crawley 2012). This distribution is a generalization of the Poisson distribution; in fact, the presence of two parameters allows flexibility in the shape of the distribution compared to the Poisson.

4.3.2 Relationship with Poisson Distribution

Consider a sequence of Negative Binomial random variables where the parameter r goes to infinity and the probability of success in each trial p goes to zero in such a way as to keep the mean of the distribution constant. Denoting the mean by λ (instead of μ),

\lambda = r\,\frac{p}{1 - p},

the parameter p is given by

p = \frac{\lambda}{\lambda + r}.

Under this parametrization, the probability function is

P(X = x) = \frac{\Gamma(x + r)}{x!\,\Gamma(r)}\, p^x (1 - p)^r = \frac{\lambda^x}{x!} \cdot \frac{\Gamma(r + x)}{\Gamma(r)\,(r + \lambda)^x} \cdot \frac{1}{\left(1 + \frac{\lambda}{r}\right)^{r}}.

Now, if we consider the limit as r → ∞, the second factor converges to one and the third to the exponential function, that is,

\lim_{r \to +\infty} P(X = x) = \frac{\lambda^x}{x!} \cdot 1 \cdot \frac{1}{e^{\lambda}},

which is the mass function of a Poisson-distributed random variable with expected value λ. So the alternatively parametrized Negative Binomial distribution converges to the Poisson distribution, and r controls the deviation from the Poisson. This makes the Negative Binomial distribution suitable as a robust alternative to the Poisson, given that it approaches the Poisson for large r,

\lim_{r \to +\infty} NB\left(r, \frac{\lambda}{\lambda + r}\right) = Po(\lambda),

but with a larger variance than the Poisson for small r.

The Negative Binomial distribution also arises as a continuous mixture of Poisson distributions (i.e., a compound probability distribution) where the mixing distribution of the Poisson rate is a Gamma distribution. In other words, we can consider the Negative Binomial as a Po(λ), where λ is itself a random variable, distributed as a Gamma distribution with shape r and scale θ = p/(1 − p), or correspondingly rate β = (1 − p)/p. In order to illustrate the intuition behind this statement, consider as above two independent Poisson processes, "success" and "failure", with intensities p and 1 − p. Together, the success and failure processes are tantamount to a single Poisson process of intensity 1, where an occurrence of the process is a success if a related independent coin toss comes up heads with probability p; otherwise, it is a failure. If r is a counting number, the coin tosses show that the count of successes before the r-th failure follows a Negative Binomial distribution with parameters r and p. The count is also, however, the count of the success Poisson process at the random time T of the r-th occurrence in the failure Poisson process. The success count follows a Poisson distribution with mean pT, where T is the waiting time for r occurrences in a Poisson process of intensity 1 − p, i.e., T is Gamma-distributed with shape parameter r and intensity 1 − p. The steps at the beginning of the paragraph follow, because λ = pT is Gamma-distributed with shape parameter r and intensity (1 − p)/p. The following formal derivation (which does not depend on r being a counting number) confirms the intuition:


P(X = x) = \int_0^{\infty} f_{Po(\lambda)}(x)\, f_{Gamma\left(r,\, \frac{1-p}{p}\right)}(\lambda)\, d\lambda
         = \int_0^{\infty} \frac{\lambda^x}{x!} e^{-\lambda} \cdot \lambda^{r-1}\, \frac{e^{-\lambda(1-p)/p}}{\Gamma(r)\left(\frac{p}{1-p}\right)^{r}}\, d\lambda
         = \frac{(1-p)^r p^{-r}}{x!\,\Gamma(r)} \int_0^{\infty} \lambda^{r+x-1} e^{-\lambda/p}\, d\lambda
         = \frac{(1-p)^r p^{-r}}{x!\,\Gamma(r)}\, \Gamma(r+x)\, p^{r+x}
         = \frac{\Gamma(r+x)}{x!\,\Gamma(r)}\, p^x (1-p)^r.

Because of this, the Negative Binomial distribution is also known as the Gamma– Poisson (mixture) distribution. Actually, it was originally derived as a limiting case of the Gamma–Poisson distribution (Greenwood and Yule 1920).
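The Gamma–Poisson representation is easy to check by simulation; the following sketch (with arbitrary r and p) draws λ from a Gamma distribution with shape r and scale p/(1 − p), then X | λ from a Poisson distribution, and compares the empirical frequencies with the Negative Binomial pmf. Note that in R's dnbinom the roles of "success" and "failure" are swapped with respect to (4.1), so prob = 1 − p.

## Negative Binomial as a Gamma-Poisson mixture (simulation check, illustrative parameters).
set.seed(6)
r <- 2.5; p <- 0.4
lambda <- rgamma(1e6, shape = r, scale = p / (1 - p))   # mixing Gamma for the Poisson rate
x      <- rpois(1e6, lambda)                            # X | lambda ~ Poisson(lambda)

emp <- as.numeric(table(factor(x, levels = 0:5))) / 1e6
round(cbind(x = 0:5, simulated = emp,
            exact = dnbinom(0:5, size = r, prob = 1 - p)), 4)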

4.3.3 Maximum Likelihood Estimation

The maximum likelihood estimator only exists for samples in which the sample variance is larger than the sample mean (Adamidis 1999). The likelihood function for n i.i.d. observations (x_1, x_2, \ldots, x_n) is

L(r, p) = \prod_{i=1}^{n} f(x_i; r, p),

from which we compute the log-likelihood function

\ell(r, p) = \sum_{i=1}^{n} \ln \Gamma(x_i + r) - \sum_{i=1}^{n} \ln(x_i!) - n \ln \Gamma(r) + \sum_{i=1}^{n} x_i \ln p + n r \ln(1 - p).

To find the maximum, we take the partial derivatives with respect to r and p and set them equal to zero:

\frac{\partial \ell(r, p)}{\partial p} = \frac{1}{p}\sum_{i=1}^{n} x_i - \frac{n r}{1 - p} = 0,     (4.5)

\frac{\partial \ell(r, p)}{\partial r} = \sum_{i=1}^{n} \psi(x_i + r) - n\,\psi(r) + n \ln(1 - p) = 0,     (4.6)

where

\psi(x) = \frac{\Gamma'(x)}{\Gamma(x)}

is the Digamma function. Solving (4.5) for p gives

p = \frac{\sum_{i=1}^{n} x_i}{n r + \sum_{i=1}^{n} x_i}.

Substituting this into (4.6) gives

\frac{\partial \ell(r, p)}{\partial r} = \sum_{i=1}^{n} \psi(x_i + r) - n\,\psi(r) + n \ln\left(\frac{r}{r + \sum_{i=1}^{n} x_i / n}\right) = 0.     (4.7)

Equation (4.7) cannot be solved for r in closed form. If a numerical solution is desired, an iterative technique such as Newton’s method can be used. Alternatively, the Expectation–Maximization algorithm (EM) can be used (Adamidis 1999).
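In practice, the maximization can be carried out numerically; the sketch below uses stats::optim on the (r, μ) parametrization of (4.4) with simulated counts. The use of optim (rather than Newton's method or the EM algorithm mentioned above) and all numerical values are our choices.

## Numerical ML estimation of the Negative Binomial in the (r, mu) parametrization.
## Parameters are log-transformed to keep r > 0 and mu > 0 during the optimization.
nb_negloglik <- function(par, x) {
  r  <- exp(par[1]); mu <- exp(par[2])
  -sum(dnbinom(x, size = r, mu = mu, log = TRUE))
}

set.seed(5)
counts <- rnbinom(4748, size = 0.8, mu = 1.3)   # illustrative daily counts, not DIPO data

fit <- optim(par = c(0, 0), fn = nb_negloglik, x = counts)
exp(fit$par)                                    # estimates of (r, mu)
## MASS::fitdistr(counts, "negative binomial") returns equivalent estimates.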

4.3.4 Finite Negative Binomial Mixture

A discrete random variable X is defined as a mixture of k Negative Binomial distributions (parametrized as in (4.4)) with probabilities p_i (i = 1, \ldots, k) when the following probability mass function applies:

P(X = x) = \sum_{i=1}^{k} p_i\, \frac{\Gamma(x + r_i)}{x!\,\Gamma(r_i)} \left(\frac{\mu_i}{r_i + \mu_i}\right)^x \left(\frac{r_i}{r_i + \mu_i}\right)^{r_i},

where \mu_1, \ldots, \mu_k and r_1, \ldots, r_k are the k component means and dispersion parameters, respectively, with \sum_{i=1}^{k} p_i = 1.

4.4 Application to DIPO Data

Starting from the list of all dates, we have computed how many losses occurred on each single day, i.e., how many times every single date appears in the dataset. To this end, we have computed the frequency distribution of the date variable, i.e., of every day between January 1, 2003 and December 31, 2015 (this is straightforward in R by means of the table function, which takes as input the vector of chronologically ordered dates; a short sketch is given below). Days without any loss greater than 5000 euros are preserved in the dataset with a frequency equal to 0, so that no information is lost over the whole observation period (4748 days).

For each risk class, the following distributions have been applied to the number of losses occurred every day:
• Poisson mixture;
• Negative Binomial mixture.
For the Poisson mixture, we have resorted to the MixtureInf package (Li et al. 2016), which exploits the EM algorithm, whereas we have adopted the flexmix package (Grün and Leisch 2007) to estimate the Negative Binomial mixture parameters. Although flexmix is designed for the computation of finite mixtures of regression models, it suits our purpose because it can regress the response variable on a constant only; as a consequence, no regression model is actually carried out and the mixture is computed directly on the response variable (i.e., the daily loss frequencies). Therefore, for each class, both a mixture of k Poisson distributions and a mixture of k Negative Binomial distributions have been estimated. The ideal number of components is defined by means of a process which resembles the one adopted for the Severity estimation.

Definition 4.2 Let p(k) be the KS test p-value for a mixture of k components, and let

k_1 = \min\{\, k : p(k) > 0.05 \,\}.

Then, the final number of components is given by

k_2 = k_1 + \sum_{j=1}^{\infty} I\big( p(k_1 + j) - p(k_1 + j - 1) > \upsilon^{\,j} \big),

where I(·) is an indicator function and υ ∈ [0, 1] is a value selected by the researcher.

We set again υ = 0.5. In this case, the KS test p-value is used as a benchmark because, as mentioned in Sect. 2.2, it has a revised version designed for discrete distributions, provided by the dgof package (Arnold and Emerson 2011). As for the Severities, we have preferred to set a limit on the number of components in order to avoid estimating an excessive (and unnecessary) number of parameters: no more than five components for the mixture of Poisson distributions and no more than three components for the mixture of Negative Binomial distributions.
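The construction of the daily frequencies described above can be sketched as follows; loss_dates is an illustrative Date vector standing in for the dates recorded in the DIPO dataset.

## Daily number of losses over the whole observation window, keeping zero-loss days.
loss_dates <- as.Date(c("2003-01-02", "2003-01-02", "2003-01-05"))   # toy example

calendar <- seq(as.Date("2003-01-01"), as.Date("2015-12-31"), by = "day")
daily_n  <- table(factor(loss_dates, levels = as.character(calendar)))

length(daily_n)      # 4748 days
sum(daily_n == 0)    # days without any loss above the 5000 euro threshold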


Table 4.1 Number of components and KS p-values for the Poisson and Negative Binomial mixtures

Risk class | Poisson components | Poisson p-value | Neg. Binomial components | Neg. Binomial p-value
BL1 | 1 | ≈1 | 1 | ≈1
BL2 | 3 | 0.470 | 2 | ≈1
BL3/ET1 | 3 | ≈1 | 1 | 0.647
BL3/ET2 | 4 | 0.524 | – | ≈0
BL3/ET3 | 3 | 0.577 | 2 | ≈1
BL3/ET4 | 5 | ≈1 | 2 | 0.989
BL3/ET5 | 2 | 0.965 | 1 | ≈1
BL3/ET6 | 2 | ≈1 | 1 | 0.962
BL3/ET7 | 6 | 0.934 | 2 | 0.085
BL4/ET1 | 1 | ≈1 | 1 | ≈1
BL4/ET2 | 5 | ≈1 | 2 | 0.999
BL4/ET367 | 4 | 0.997 | 2 | 0.976
BL4/ET4 | 3 | 0.999 | 1 | 0.895
BL4/ET5 | 1 | 0.772 | 1 | ≈1
BL5 | 3 | ≈1 | 1 | ≈1
BL6 | 2 | 0.886 | 1 | 0.889
BL7 | 2 | ≈1 | 1 | 0.646
BL8/ET1 | 2 | ≈1 | 1 | 0.985
BL8/ET27 | 4 | ≈1 | 2 | ≈1
BL8/ET35 | 1 | 0.963 | 1 | ≈1
BL8/ET4 | – | ≈0 | 2 | 0.411
BL8/ET6 | 1 | 0.999 | 1 | ≈1

For each risk class, Table 4.1 reports the estimated number of components together with the KS p-value for each distribution. The Poisson mixture fits all classes but BL8/ET4. A more in-depth look at the number of components makes it clear that class BL3/ET7 requires a mixture of as many as six Poisson distributions, since for a smaller number of components we cannot obtain an estimate with a p-value greater than 0.05 (due to the limit established before, this estimate is therefore discarded). As regards the Negative Binomial distribution, BL3/ET2 is the only class without a satisfactory estimate. For most classes one component is enough, and no class requires a mixture of more than two Negative Binomial distributions. Looking at these findings, it is difficult to decide which type of distribution has to be selected for each class, given that both the Poisson and the Negative Binomial p-values are highly satisfactory in most cases. A remarkable difference between the two models lies instead in the number of components to be estimated. For this reason, we have adopted the following decision criterion: for each risk class we consider an "adjusted" p-value, namely the model p-value divided by the number of parameters of the model (this number amounts to 2k − 1 for a k-component Poisson mixture and to 3k − 1 for a k-component Negative Binomial mixture), and we choose the distribution showing the highest adjusted p-value.


Table 4.2 Adjusted KS p-values for the Poisson and Negative Binomial mixtures

Risk class | Poisson mixture | Neg. Binomial mixture
BL1 | ≈1 | 0.500
BL2 | 0.094 | 0.200
BL3/ET1 | 0.200 | 0.324
BL3/ET2 | 0.075 | –
BL3/ET3 | 0.115 | 0.200
BL3/ET4 | 0.111 | 0.198
BL3/ET5 | 0.322 | 0.500
BL3/ET6 | 0.333 | 0.481
BL3/ET7 | 0.085 | 0.017
BL4/ET1 | ≈1 | 0.500
BL4/ET2 | 0.111 | 0.200
BL4/ET367 | 0.142 | 0.195
BL4/ET4 | 0.200 | 0.448
BL4/ET5 | 0.772 | 0.500
BL5 | 0.200 | 0.500
BL6 | 0.295 | 0.445
BL7 | 0.333 | 0.323
BL8/ET1 | 0.333 | 0.493
BL8/ET27 | 0.143 | 0.200
BL8/ET35 | 0.963 | 0.500
BL8/ET4 | – | 0.082
BL8/ET6 | 0.999 | 0.500

Table 4.2 reports the adjusted KS p-values for the Poisson and Negative Binomial mixtures. According to this criterion, we have selected the Poisson mixture for only 8 classes out of 22, whereas for all the other classes we have opted for a Negative Binomial mixture. Nevertheless, since we have decided to take into account no more than five components for the Poisson mixture, we have preferred to estimate a mixture of two Negative Binomial distributions for the BL3/ET7 class. The selected distribution for each risk class is shown in Table 4.3, along with the number of components. It is worth noting that the criterion leads to a limited number of components (and, consequently, to a limited number of parameters to estimate). Apart from the BL3/ET2 class, which requires the estimation of seven parameters, all the other classes have a


Table 4.3 Selected distribution and number of components for the risk classes

Risk class | Distribution | Components
BL1 | Poisson | 1
BL2 | Neg. Binomial | 2
BL3/ET1 | Neg. Binomial | 1
BL3/ET2 | Poisson | 4
BL3/ET3 | Neg. Binomial | 2
BL3/ET4 | Neg. Binomial | 2
BL3/ET5 | Neg. Binomial | 1
BL3/ET6 | Neg. Binomial | 1
BL3/ET7 | Neg. Binomial | 2
BL4/ET1 | Poisson | 1
BL4/ET2 | Neg. Binomial | 2
BL4/ET367 | Neg. Binomial | 2
BL4/ET4 | Neg. Binomial | 1
BL4/ET5 | Poisson | 1
BL5 | Neg. Binomial | 1
BL6 | Neg. Binomial | 1
BL7 | Poisson | 2
BL8/ET1 | Neg. Binomial | 1
BL8/ET27 | Neg. Binomial | 2
BL8/ET35 | Poisson | 1
BL8/ET4 | Neg. Binomial | 2
BL8/ET6 | Poisson | 1

number of parameters that goes from 1 to 5, thus restricting the computational burden of the estimation process. Figures 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7, 4.8, 4.9, 4.10, 4.11, 4.12, 4.13, 4.14, 4.15, 4.16, 4.17, 4.18, 4.19, 4.20, 4.21, and 4.22 depict for each class the Frequency histograms together with the probability mass function (represented by the red line).

Figs. 4.1–4.22 Frequency distributions (histogram of the daily number of events with the fitted probability mass function) for BL1, BL2, BL3/ET1, BL3/ET2, BL3/ET3, BL3/ET4, BL3/ET5, BL3/ET6, BL3/ET7, BL4/ET1, BL4/ET2, BL4/ET367, BL4/ET4, BL4/ET5, BL5, BL6, BL7, BL8/ET1, BL8/ET27, BL8/ET35, BL8/ET4, and BL8/ET6, respectively

References

K. Adamidis, Theory & methods: an EM algorithm for estimating negative binomial parameters. Aust. N. Z. J. Stat. 41(2), 213–221 (1999)
T.B. Arnold, J.W. Emerson, Nonparametric goodness-of-fit tests for discrete null distributions. R J. 3(2), 34–39 (2011)
M.J. Crawley, The R Book, 2nd edn. (Wiley, New York, 2012)
M. Greenwood, G.U. Yule, An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. J. R. Stat. Soc. 83(2), 255–279 (1920)
B. Grün, F. Leisch, Fitting finite mixtures of generalized linear regressions in R. Comput. Stat. Data Anal. 51(11), 5247–5252 (2007)
J.M. Hilbe, Negative Binomial Regression (Cambridge University Press, Cambridge, 2011)
S. Li, J. Chen, P. Li, MixtureInf: inference for finite mixture models (2016). https://CRAN.R-project.org/package=MixtureInf
J.O. Lloyd-Smith, Maximum likelihood estimation of the negative binomial dispersion parameter for highly overdispersed data, with applications to infectious diseases. PLoS One 2(2), e180 (2007)
A. Soprano, B. Crielaard, F. Piacenza, D. Ruspantini, Measuring Operational and Reputational Risk: A Practitioner's Approach (Wiley, New York, 2009)

Chapter 5

Convolution and Risk Class Aggregation

Abstract The chapter shows how to use convolution to estimate the overall loss distribution and the Value-at-Risk of each risk class. To take into account the dependence among risk classes, copula functions are introduced. The Value-at-Risk is finally compared under the hypotheses of independence and dependence among risk classes. Keywords Convolution · Value-at-Risk · Copula function

5.1 Introduction

Once the Severity (X) and the Frequency (N) have been estimated for each risk class, we can compute the loss distribution for the entire period under consideration by means of a method named convolution. To do so, however, Severities and Frequencies have to be independent. Pearson correlation coefficients have been computed for each class to verify this hypothesis, along with the corresponding p-values, in order to assess the significance of the coefficients. Table 5.1 clearly shows that almost all the p-values are very high, meaning that no significant correlation occurs between the Severity (X) and the Frequency (N). Only the BL3/ET2 (0.005) and BL3/ET5 (0.001) classes have a p-value lower than 0.05, but the corresponding correlation coefficients ρ_XN are quite small (−0.043 and 0.094, respectively). Besides, the sizes of these two classes are rather large. For these reasons, we can say that for the risk classes involved in our analysis we accept the null hypothesis of independence between Severities and Frequencies.

5.2 Overall Loss Distribution

Given the hypothesis of independence between the Severity and the Frequency of each class, the overall loss distribution is obtained through convolution (Soprano et al. 2009). The total loss S in the holding time period is given by the random sum of the

5.2 Overall Loss Distribution Given the hypothesis of independence between Severity and Frequency of each class, the overall loss distribution is obtained through convolution (Soprano et al. 2009). The total loss S, in the holding time period, is given by the random sum of the © The Author(s), under exclusive license to Springer Nature Switzerland AG 2020 G. De Luca et al., Statistical Analysis of Operational Risk Data, SpringerBriefs in Statistics, https://doi.org/10.1007/978-3-030-42580-7_5

71

72

5 Convolution and Risk Class Aggregation

Table 5.1 Correlation between Severities and Frequencies Risk class ρX N BL1 BL2 BL3/ET1 BL3/ET2 BL3/ET3 BL3/ET4 BL3/ET5 BL3/ET6 BL3/ET7 BL4/ET1 BL4/ET2 BL4/ET367 BL4/ET4 BL4/ET5 BL5 BL6 BL7 BL8/ET1 BL8/ET27 BL8/ET35 BL8/ET4 BL8/ET6

0.146 0.003 0.027 −0.043 0.030 0.023 0.094 −0.014 0.013 0.131 −0.019 −0.007 0.000 −0.027 0.018 −0.009 0.034 −0.037 −0.002 −0.024 −0.025 −0.049

p-value 0.080 0.900 0.271 0.005 0.116 0.197 0.001 0.729 0.439 0.197 0.465 0.684 0.987 0.605 0.635 0.850 0.455 0.245 0.932 0.694 0.131 0.446

severities X_i of the single losses, that is,

S = \sum_{i=1}^{N} X_i,

where N is the frequency. The probability distribution function of the random variable S is given by

F_S(x) = \sum_{n=0}^{\infty} p(n) \cdot F_X^{*n}(x),

where * is the convolution operator and F_X^{*n}(x) is the n-fold convolution of the distribution function F_X (see also Frachot et al. 2001). In order to evaluate F_S(x), we have implemented a three-step Monte Carlo technique (alternative approaches use Panjer's recursion or Fast Fourier transforms):


1. Step 1: the number n of losses occurred in the holding time period is sampled from the Frequency distribution (mixture of Poisson or Negative Binomial distributions);
2. Step 2: n independent realizations, representing the values of the observed losses, are sampled from the Severity distribution (mixture of Log-normal distributions);
3. Step 3: the overall operational loss is obtained as the sum of the values simulated in Step 2.

The procedure has to be repeated for a high number B of realizations. In particular, to reach a satisfactory level of accuracy, we have considered B = 10 000 000 realizations.
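A minimal R sketch of the three steps for a single risk class is reported below, using a one-component Poisson frequency and a one-component Log-normal severity for brevity; the parameter values and the number of realizations are illustrative (the analysis uses the estimated mixtures and B = 10 000 000).

## Monte Carlo convolution of Frequency and Severity for one risk class.
set.seed(7)
B      <- 1e5                  # simulated scenarios (illustrative; the text uses 1e7)
lambda <- 1.3                  # Poisson frequency parameter
mu     <- 9.5; sigma <- 1.8    # Log-normal severity parameters
thr    <- 5000                 # collection threshold of the severities

total_loss <- replicate(B, {
  n <- rpois(1, lambda)                                  # step 1: number of losses
  if (n == 0) 0 else sum(thr + rlnorm(n, mu, sigma))     # steps 2-3: sample severities and sum
})

quantile(total_loss, 0.999)    # Value-at-Risk of the class at the 99.9% level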

5.3 Risk Class Aggregation and Copula Functions

The Value-at-Risk (VaR) of the loss distribution of a single risk class is the quantile at a given confidence level α. In other words, provided the losses s_j, j = 1, \ldots, B, are sorted in increasing order, s_{[1]} \le s_{[2]} \le \cdots \le s_{[B]}, the VaR with α = 99.9% is obtained as

VaR_{99.9\%} = F_S^{-1}(99.9\%) = \inf\left\{ s_{[j]} : \frac{j}{B} \ge 99.9\% \right\}.

The overall VaR can be straightforwardly computed by summing the Values-at-Risk of the single risk classes. In the presence of H operational risk classes, define VaR(S_h), h = 1, \ldots, H, as the Value-at-Risk computed at the 99.9% level on the loss distribution S_h associated with risk class h; then

VaR(S) = \sum_{h=1}^{H} VaR(S_h),

where S is the distribution of the total loss. In this way, it is implicitly assumed that the S_h, h = 1, \ldots, H, are perfectly dependent. However, this hypothesis is usually deemed unrealistic. Actually, the identification of the dependence structure among risk classes through statistical methods is typically the suggested strategy, and copula functions have a prominent role given their flexibility. A copula function (see Joe 1997; Cherubini et al. 2004 and Nelsen 2006) is a multivariate function with arguments u_1, \ldots, u_n defined on the unit n-cube [0, 1]^n with the following properties:


1. the range of the copula C(u_1, \ldots, u_n) is the unit interval [0, 1];
2. C(u_1, \ldots, u_n) = 0 if any u_i = 0, for i = 1, \ldots, n;
3. C(1, \ldots, 1, u_i, 1, \ldots, 1) = u_i for all u_i ∈ [0, 1].

The important applications of copula functions in data analysis are based on Sklar's theorem, which justifies the role of the copula as a dependence function. Sklar's theorem shows that, for continuous multivariate distributions, the univariate margins can be separated from the dependence structure, which is completely captured by a copula. Let H(x_1, \ldots, x_n) be a multivariate joint distribution function with marginal distribution functions F_i(x_i), i = 1, \ldots, n; then there exists a copula function C such that

C(F_1(x_1), \ldots, F_n(x_n)) = H(x_1, \ldots, x_n).     (5.1)

If the F_i(x_i) are continuous, then the copula C is unique. Conversely, if C is a copula and F_1(x_1), \ldots, F_n(x_n) are distribution functions, then the function H defined above is a joint distribution function with margins F_1(x_1), \ldots, F_n(x_n). As a corollary, let F_1^{-1}(u_1), \ldots, F_n^{-1}(u_n) denote the generalized inverses of the marginal distribution functions; then, for every (u_1, \ldots, u_n) in the unit n-cube, there exists a unique copula C such that

C(u_1, \ldots, u_n) = H(F_1^{-1}(u_1), \ldots, F_n^{-1}(u_n)),

where F_i^{-1}(u_i) denotes the generalized inverse of the marginal distribution function, given by F_i^{-1}(u_i) = \inf\{x_i : F_i(x_i) > u_i\}. Finally, the copula density c(u_1, \ldots, u_n) is defined as

c(u_1, \ldots, u_n) = \frac{\partial^n C(u_1, \ldots, u_n)}{\partial u_1 \cdots \partial u_n}.

5.3.1 Tail Dependence

In order to evaluate the risk due to extreme events, it is necessary to measure the tail dependence, which consists in looking at the concordance between the less probable values of the variables in a bivariate context. Geometrically, this concordance tends to concentrate on the lower and upper quadrant tails of the joint distribution function. Let F_i(x_i) be the marginal distribution function of the random variable X_i (i = 1, 2) and let v be a threshold value; the lower tail dependence coefficient λ_L is defined as the limit, as v tends to zero, of the conditional probability that the distribution function of X_2 does not exceed v, given that the corresponding function for X_1 does not exceed v,

\lambda_L = \lim_{v \to 0^+} P\big(X_2 \le F_2^{-1}(v) \mid X_1 \le F_1^{-1}(v)\big).

For λ_L ∈ (0, 1], X_1 and X_2 are asymptotically dependent on the lower tail; if λ_L is null, X_1 and X_2 are asymptotically independent. For continuous random variables, it is possible to use an alternative and equivalent definition. In fact,

P\big(X_2 \le F_2^{-1}(v) \mid X_1 \le F_1^{-1}(v)\big) = \frac{P\big(X_2 \le F_2^{-1}(v),\, X_1 \le F_1^{-1}(v)\big)}{P\big(X_1 \le F_1^{-1}(v)\big)},

clarifying that the concept of (lower) tail dependence is indeed a copula property (see Joe 1997), that is,

\lambda_L = \lim_{v \to 0^+} \frac{C(v, v)}{v}.     (5.2)

It is straightforward to show that the limit provides the same result if the variables X_1 and X_2 are exchanged. The concept of upper tail dependence is defined in a similar way. Let F_i(x_i) be the marginal distribution function of the random variable X_i (i = 1, 2) and let v be a threshold value; then λ_U is defined as

\lambda_U = \lim_{v \to 1^-} P\big(X_2 > F_2^{-1}(v) \mid X_1 > F_1^{-1}(v)\big).

As

P\big(X_2 > F_2^{-1}(v) \mid X_1 > F_1^{-1}(v)\big) = \frac{1 - P\big(X_2 \le F_2^{-1}(v)\big) - P\big(X_1 \le F_1^{-1}(v)\big) + P\big(X_2 \le F_2^{-1}(v),\, X_1 \le F_1^{-1}(v)\big)}{1 - P\big(X_1 \le F_1^{-1}(v)\big)},

an alternative definition is

\lambda_U = \lim_{v \to 1^-} \frac{1 - 2v + C(v, v)}{1 - v}.

5.3.2 Elliptical Copulae

The family of elliptical copulae includes many popular copula functions, in particular the Gaussian copula and the Student's t-copula. The Gaussian (or Normal) copula is the copula of the multivariate Normal distribution. It is given by

C_G(u_1, \ldots, u_n) = \Phi_R\big(\phi^{-1}(u_1), \ldots, \phi^{-1}(u_n)\big),

where


• \Phi_R is the standard multivariate Normal distribution function with n × n correlation matrix R = \{\rho_{ij}\};
• \phi^{-1} is the inverse of the standard univariate Gaussian distribution function.

The parameters of the Gaussian copula can be easily estimated using the Inference For Margins (IFM) method, a two-step procedure. Given a multivariate variable x, for a sample of size n, the log-likelihood function is given by

\ell(\theta) = \sum_{t=1}^{n} \ell_t(\theta),

where

\ell_t(\theta) = \sum_{i} \log f_i(x_i) + \log c(u_1, \ldots, u_n),

and \theta = [\theta_M\ \theta_C] is the parameter vector, with \theta_M collecting the parameters of the marginals while \theta_C denotes the parameters of the copula function. The exact maximum likelihood method implies \hat{\theta} = \arg\max_{\theta} \ell(\theta). The more popular IFM method reduces the computational burden of the exact maximum likelihood method. In the first step, the n univariate likelihoods of the marginals are separately maximized and the parameter vector \hat{\theta}_M is estimated. In the second step, using \hat{\theta}_M, the copula density is maximized and the parameter vector \hat{\theta}_C is estimated. Under regularity conditions, the IFM estimator verifies the property of asymptotic normality (Joe 1997, 2005). The Gaussian copula does not admit tail dependence in either of the two tails. The Student's t-copula (or t-copula) is defined in a similar way, with the Student's t-distribution replacing the Gaussian distribution. It is defined as

C_t(u_1, \ldots, u_n) = T^{n}_{\nu,R}\big(t_{\nu}^{-1}(u_1), \ldots, t_{\nu}^{-1}(u_n)\big),

where
• T^{n}_{\nu,R} is the multivariate Student's t distribution function with correlation matrix R and ν degrees of freedom;
• t_{\nu}^{-1} is the inverse of the univariate Student's t distribution function with ν degrees of freedom.

The preferred estimation procedure is still the IFM method. Unlike the Gaussian copula, it captures tail dependencies but in a symmetric way, that is, lower and upper tail dependencies are restricted to be equal. For this reason, it is usually considered more suitable to describe operational risk data. If the number of degrees of freedom becomes large, the t-copula behaves like the Gaussian one.
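A sketch of how a t-copula can be fitted to the class losses is given below. The use of the copula package, the use of pseudo-observations in place of the estimated parametric margins, and the simulated data are all our assumptions; the text only states that the IFM approach is applied.

library(copula)   # assumed implementation

set.seed(8)
## Illustrative aggregated losses for three risk classes (the analysis uses 22 classes).
x <- cbind(rlnorm(50, 10, 1.0), rlnorm(50, 11, 0.8), rlnorm(50, 9, 1.2))

u     <- pobs(x)                                          # pseudo-observations on [0, 1]
fit_t <- fitCopula(tCopula(dim = ncol(x), dispstr = "un"), data = u, method = "mpl")

coef(fit_t)       # pairwise correlation parameters and degrees of freedom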


5.3.3 Archimedean Copulae

The Archimedean copulae family (see Joe 1997) can be built starting from the definition of a generator function \varphi : I \to R^+, continuous, decreasing, and convex, such that \varphi(1) = 0. An Archimedean copula C_A can be expressed as

C_A(u_1, \ldots, u_n) = \varphi^{-1}\big(\varphi(u_1) + \cdots + \varphi(u_n)\big).

One of the main properties of bivariate Archimedean copulae is symmetry, that is, C_A(u_1, u_2) = C_A(u_2, u_1). In the class of Archimedean copulae, we can distinguish different copula functions, both one-parameter, such as the Gumbel and the Clayton copula, and multi-parameter, such as the Joe–Clayton copula and any convex combination of two copulae.

The Clayton copula can be written as

C_C(u_1, \ldots, u_n) = \big(u_1^{-\theta} + \cdots + u_n^{-\theta} - n + 1\big)^{-1/\theta},

with θ ≥ 0. The generator is \varphi(t) = t^{-\theta} - 1 and \varphi^{-1}(t) = (1 + t)^{-1/\theta}. The Clayton copula shows only lower tail dependence. For each pair of variables, the lower tail dependence coefficient is given by

\lambda_L = 2^{-1/\theta},

where λ_L is the lower tail dependence coefficient as defined in (5.2). Upper tail dependence is not encountered. The parameter θ can be written as

\theta = -\frac{1}{\log_2 \lambda_L}.

The Gumbel copula was introduced by Gumbel in 1960 and belongs to the Gumbel–Hougaard class. It can be represented as follows:

C_{Gu}(u_1, \ldots, u_n) = \exp\left\{-\big[(-\log u_1)^{\gamma} + \cdots + (-\log u_n)^{\gamma}\big]^{1/\gamma}\right\},

with γ ≥ 1. The generator is \varphi(t) = (-\log t)^{\gamma} and, hence, \varphi^{-1}(t) = \exp(-t^{1/\gamma}), which for γ > 1 is completely monotonic. The Gumbel copula is characterized only by the presence of upper tail dependence, whose coefficient is

\lambda_U = 2 - 2^{1/\gamma}

for any pair of variables; hence

\gamma = \frac{1}{\log_2(2 - \lambda_U)}.

The Joe–Clayton copula belongs to the family BB7 (see Joe 1997) and its expression is

C_{JC}(u_1, \ldots, u_n) = 1 - \left\{1 - \big[(1 - (1 - u_1)^{\kappa})^{-\theta} + \cdots + (1 - (1 - u_n)^{\kappa})^{-\theta} - 1\big]^{-1/\theta}\right\}^{1/\kappa},

with θ ≥ 0 and κ ≥ 1. The generator function and its inverse are given, respectively, by \varphi(t) = [1 - (1 - t)^{\kappa}]^{-\theta} - 1 and \varphi^{-1}(t) = 1 - [1 - (1 + t)^{-1/\theta}]^{1/\kappa}. When κ = 1, we have the Clayton copula. The lower and upper tail dependence coefficients are, respectively,

\lambda_L = 2^{-1/\theta} \quad \text{and} \quad \lambda_U = 2 - 2^{1/\kappa}.

Equivalently, the parameters can be put in relation with these coefficients, obtaining

\theta = -\frac{1}{\log_2(\lambda_L)} \quad \text{and} \quad \kappa = \frac{1}{\log_2(2 - \lambda_U)}.

5.4 Value-at-Risk Estimates Considering t-Copula

Computing the VaR using copula-based methods allows us to take into account diversification effects, which enable a better description of the bank's risk exposure. Diversification effects can have a considerable impact in operational risk modeling. In fact, operational risk classes might be uncorrelated, at least partially. It is extremely unlikely (and it is hardly supported by any empirical evidence) that the most severe operational risk losses occur systematically during the same period (Soprano et al. 2009). The analysis we have performed shows the correlation results among the 22 risk classes, considering loss data with amounts larger than one million euros, aggregated on a yearly basis. Pearson's linear and Kendall's rank correlation coefficients between each pair of risk classes have been evaluated and are reported in Tables 5.2 and 5.3, respectively. As can be seen, the correlation coefficients are low and over one-third of the values are negative. These findings support the use of diversification from an empirical point of view.
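Both correlation matrices can be obtained with base R once the losses above one million euros have been aggregated by year and risk class; yearly_losses below is an illustrative matrix (rows = years, columns = risk classes) standing in for the DIPO aggregates.

## Pearson's linear and Kendall's rank correlations between risk classes.
set.seed(9)
yearly_losses <- matrix(rlnorm(13 * 4, meanlog = 10, sdlog = 1), nrow = 13,
                        dimnames = list(as.character(2003:2015),
                                        c("BL1", "BL2", "BL3/ET1", "BL3/ET2")))

round(cor(yearly_losses, method = "pearson"), 2)   # as in Table 5.2
round(cor(yearly_losses, method = "kendall"), 2)   # as in Table 5.3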

Table 5.2 Pearson's linear correlation matrix (pairwise coefficients between the 22 risk classes)


Table 5.3 Kendall's rank correlation matrix (pairwise coefficients between the 22 risk classes)


Table 5.4 Years with maximum aggregated losses, maximum frequencies, and maximum single loss among risk classes


Table 5.5 Value-at-Risk results using different correlation hypotheses

Method               VaR             EL          UL              DE (%)
Perfect dependence   1 140 215 446   1 551 543   1 138 663 903   0
Student's t-copula   793 548 835     3 926 664   789 622 171     30.404

Table 5.4 reports, for each risk class, the year of occurrence of the maximum aggregated losses, the maximum annual Frequency, and the maximum single loss. In more detail, 2005 and 2008 are the years in which the largest number of classes recorded both their maximum annual losses and their maximum single losses, whereas 2010 is the year in which the largest number of classes recorded their maximum annual Frequency.

The overall VaR depends on both the loss distribution of each risk class and the correlation structure among all classes, specified by the copula function. As explained in this chapter, in our application we have taken the loss distributions estimated through the convolution method and applied the copula function to determine the overall loss distribution and the VaR. Results have been calculated using the Student's t-copula by means of the copula package (Yan 2007). The parameters have been estimated with the iTau-MPL method (Mashal and Zeevi 2002).

Table 5.5 reports the VaR obtained simply as the sum of the single-class VaRs (perfect dependence), compared with the value obtained by applying the Student's t-copula, together with the Diversification Effect (DE). The correlation matrix of the t-copula has been estimated starting from Kendall's correlation matrix (Table 5.3), while the estimated degrees of freedom are 9.951. The diversification effect is sizable and amounts to 30.404%, meaning that the VaR computed using the Student's t-copula is about 30% smaller than the value calculated under the hypothesis of perfect dependence among risk classes.
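A minimal sketch of this aggregation step is given below, assuming per-class annual loss samples already produced by the convolution step. The marginal samples, correlation matrix, and degrees of freedom are illustrative placeholders, not the estimates behind Table 5.5; the empirical quantile transform is one simple way to couple the copula draws with simulated marginals.

```python
# Sketch of copula-based VaR aggregation vs. the perfect-dependence benchmark.
# All inputs (number of classes, marginal samples, correlation, df) are
# illustrative placeholders, not the DIPO estimates.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_classes = 4            # the book uses 22 risk classes; 4 keeps the example small
n_sim = 100_000          # the book uses 1,000,000 Monte Carlo draws
alpha = 0.999            # VaR confidence level

# Placeholder marginal samples of the annual aggregate loss per class,
# e.g. produced by the convolution (Frequency x Severity) step.
marginals = [rng.lognormal(mean=12 + i, sigma=1.0, size=n_sim)
             for i in range(n_classes)]

# Placeholder t-copula parameters (in the book: Kendall-based correlation, df ~ 9.951).
corr = 0.2 * np.ones((n_classes, n_classes)) + 0.8 * np.eye(n_classes)
df = 10

# Sample from the t-copula: correlated t variates mapped to uniforms via the t CDF.
L = np.linalg.cholesky(corr)
z = rng.standard_normal((n_sim, n_classes)) @ L.T
w = rng.chisquare(df, size=(n_sim, 1))
u = stats.t.cdf(z / np.sqrt(w / df), df)

# Map uniforms to the empirical quantiles of each marginal and sum across classes.
total_copula = sum(np.quantile(m, u[:, j]) for j, m in enumerate(marginals))
var_copula = np.quantile(total_copula, alpha)

# Perfect-dependence benchmark: sum of the single-class VaRs.
var_perfect = sum(np.quantile(m, alpha) for m in marginals)

de = 100 * (1 - var_copula / var_perfect)   # diversification effect in %
print(f"VaR (t-copula): {var_copula:,.0f}")
print(f"VaR (perfect dependence): {var_perfect:,.0f}")
print(f"Diversification effect: {de:.1f}%")
```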

References

U. Cherubini, E. Luciano, W. Vecchiato, Copula Methods in Finance (Wiley, 2004)
A. Frachot, P. Georges, T. Roncalli, Loss distribution approach for operational risk. Working paper, Groupe de Recherche Opérationnelle, Crédit Lyonnais (2001)
H. Joe, Multivariate Models and Multivariate Dependence Concepts (CRC Press, Boca Raton, 1997)
H. Joe, Asymptotic efficiency of the two-stage estimation method for copula-based models. J. Multivar. Anal. 94(2), 401–419 (2005)
R. Mashal, A. Zeevi, Beyond correlation: extreme co-movements between financial assets. Unpublished manuscript, Columbia University (2002)
R.B. Nelsen, An Introduction to Copulas, Springer Series in Statistics (Springer, 2006)
A. Soprano, B. Crielaard, F. Piacenza, D. Ruspantini, Measuring Operational and Reputational Risk: A Practitioner's Approach (Wiley, Hoboken, 2009)
J. Yan, Enjoy the joy of copulas: with a package copula. J. Stat. Softw. 21(4), 1–21 (2007)

Chapter 6

Conclusions

Abstract The chapter summarizes the analysis of the operational risk data provided by the DIPO consortium. It is remarked that the use of mixtures of distributions for loss Severities and Frequencies, together with a copula function able to allow for possible dependence among risk classes, can improve the results.

Keywords Operational risk · Mixture of distributions · Copula function

The problem of measuring the operational risk of banking groups has been investigated with an emphasis on the statistical treatment of the data. Loss data have been provided by the Italian operational risk database (DIPO), which collects Effective Gross Losses of at least €5000 suffered by its members. For our analysis, we have used daily loss data recorded from January 1, 2003 to December 31, 2015.

Operational losses are generally classified according to a combination of a business line and an event type. We have proposed a classification based on business lines only. However, for the losses belonging to the larger classes (BL3, BL4, and BL8), we have implemented a statistical procedure to test the hypothesis of equal distribution between losses belonging to different event types, using both the Anderson–Darling and Kolmogorov–Smirnov hypothesis tests. When the hypothesis has been accepted, the two datasets have been merged; otherwise, the two ETs have given rise to two distinct risk classes. Finally, the procedure has produced 22 risk classes, which represent the starting point of our analysis.

Loss amounts have been assumed to be independent realizations of a random variable X identifying the loss Severity. We have also assumed that the number of loss events n is a realization of a random variable N describing the loss Frequency. We have estimated Severity and Frequency separately for each risk class. For the Severity, we have used a mixture of k Log-normal distributions, whereas for the Frequency we have resorted to two discrete distributions: a mixture of k Poisson distributions and a mixture of k Negative Binomial distributions. The choice of the number of components, as well as of the Frequency distribution to be used for each class, has been made according to specific criteria proposed by the authors. Concerning the Severity, a comparison with the traditional Extreme Value Theory (EVT) approach, usually employed to model the right tail of the distribution, has also been carried out.


Findings show that our proposal has outperformed the EVT methodology.

After verifying the hypothesis of independence between Severity and Frequency for each risk class, we have computed the overall loss distribution through convolution, that is, the distribution of the total loss S over the holding period, obtained by aggregating the Severities X of the single losses. Afterward, we have computed the Capital-at-Risk by applying the most widely used risk measure in financial applications, namely the Value-at-Risk (VaR). The VaR, computed at the confidence level α = 99.9%, has been obtained with a Monte Carlo algorithm; to ensure that the distribution of the aggregated losses, and hence the VaR itself, reaches a sufficient level of accuracy, 1 000 000 simulations have been performed. The overall Capital-at-Risk has then been computed:
• summing up the single-class VaRs (perfect dependence hypothesis);
• using copula methods, which take into account diversification effects.

Copulae are multivariate distribution functions with uniform marginals which make it possible to extract the dependence structure from the joint probability distribution function of a set of random variables and, at the same time, to separate it from the univariate marginal behavior. Following a consolidated practice in the literature, we have used a Student's t-copula to model the dependence structure of the operational risk data, since it is able to capture tail dependence. Findings show that the VaR computed using the Student's t-copula is about 30% smaller than the value computed under the hypothesis of perfect dependence among risk classes.
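To make the convolution and Monte Carlo steps summarized above concrete, a minimal single-class sketch follows; the mixture weights and parameters are purely illustrative placeholders and do not correspond to the estimates obtained from the DIPO data.

```python
# Minimal sketch of the single-class convolution (Frequency x Severity) and
# the Monte Carlo VaR at alpha = 99.9%. All mixture parameters are illustrative
# placeholders, not the values estimated in the book.
import numpy as np

rng = np.random.default_rng(42)
n_sim = 100_000        # reduced for speed; the book uses 1,000,000 scenarios
alpha = 0.999

# Two-component Poisson mixture for the annual Frequency N (placeholder values).
freq_weights = np.array([0.7, 0.3])
freq_lambdas = np.array([40.0, 120.0])

# Two-component Log-normal mixture for the Severity X (placeholder values).
sev_weights = np.array([0.8, 0.2])
sev_mu = np.array([9.0, 12.0])
sev_sigma = np.array([1.2, 1.8])

totals = np.empty(n_sim)
for i in range(n_sim):
    # Draw the number of losses from the Poisson mixture.
    comp = rng.choice(len(freq_weights), p=freq_weights)
    n = rng.poisson(freq_lambdas[comp])
    # Draw each Severity from the Log-normal mixture and aggregate.
    comps = rng.choice(len(sev_weights), size=n, p=sev_weights)
    totals[i] = rng.lognormal(sev_mu[comps], sev_sigma[comps]).sum()

var_999 = np.quantile(totals, alpha)
expected_loss = totals.mean()
print(f"VaR(99.9%): {var_999:,.0f}  EL: {expected_loss:,.0f}  UL: {var_999 - expected_loss:,.0f}")
```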