Heavy-Tailed Time Series [1st ed.] — ISBN 9781071607350, 9781071607374

This book aims to present a comprehensive, self-contained, and concise overview of extreme value theory for time series …


English — Pages XIX, 681 [677] — Year 2020



Springer Series in Operations Research and Financial Engineering

Rafał Kulik · Philippe Soulier

Heavy-Tailed Time Series

Springer Series in Operations Research and Financial Engineering

Series Editors: Thomas V. Mikosch, Københavns Universitet, Copenhagen, Denmark; Sidney I. Resnick, Cornell University, Ithaca, USA; Stephen M. Robinson, University of Wisconsin-Madison, Madison, USA

Editorial Board: Torben G. Andersen, Northwestern University, Evanston, USA; Dmitriy Drusvyatskiy, University of Washington, Seattle, USA; Avishai Mandelbaum, Technion – Israel Institute of Technology, Haifa, Israel; Jack Muckstadt, Cornell University, Ithaca, USA; Per Mykland, University of Chicago, Chicago, USA; Philip E. Protter, Columbia University, New York, USA; Claudia Sagastizábal, IMPA – Instituto Nacional de Matemática Pura e Aplicada, Rio de Janeiro, Brazil; David B. Shmoys, Cornell University, Ithaca, USA; David Glavind Skovmand, Københavns Universitet, Copenhagen, Denmark; Josef Teichmann, ETH Zürich, Zürich, Switzerland

The Springer Series in Operations Research and Financial Engineering publishes monographs and textbooks on important topics in the theory and practice of Operations Research, Management Science, and Financial Engineering. The Series is distinguished by high standards in content and exposition, and special attention to timely or emerging practice in industry, business, and government. Subject areas include: linear, integer and non-linear programming, including applications; dynamic programming and stochastic control; interior point methods; multi-objective optimization; supply chain management, including inventory control, logistics, planning and scheduling; game theory; risk management and risk analysis, including actuarial science and insurance mathematics; queuing models, point processes, extreme value theory, and heavy-tailed phenomena; networked systems, including telecommunication, transportation, and many others; quantitative finance: portfolio modeling, options, and derivative securities; revenue management and quantitative marketing; innovative statistical applications such as detection and inference in very large and/or high dimensional data streams; computational economics.

More information about this series at http://www.springer.com/series/3182

Rafał Kulik · Philippe Soulier

Heavy-Tailed Time Series


Rafał Kulik, Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON, Canada

Philippe Soulier, Université Paris X, Nanterre, France

ISSN 1431-8598  ISSN 2197-1773 (electronic)
Springer Series in Operations Research and Financial Engineering
ISBN 978-1-0716-0735-0  ISBN 978-1-0716-0737-4 (eBook)
https://doi.org/10.1007/978-1-0716-0737-4

Mathematics Subject Classification: 60G70, 60G55, 60G52, 62G32, 62M10

© Springer Science+Business Media, LLC, part of Springer Nature 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

Preface

This book is concerned with extreme value theory for stochastic processes whose finite-dimensional distributions are heavy-tailed in the restrictive sense of regular variation. These models include a large variety of widely used time series models and are worth investigating.

Extreme value theory for stochastic processes is nearly as ancient as extreme value theory for i.i.d. data; while the latter started around WWII, the former can be traced back at least to the stream of articles on level crossings by Gaussian processes which led to the monograph [CL67]. This line of research was pursued and extended to non-Gaussian processes, and a first comprehensive treatment of extreme value theory for stochastic processes was given in [LLR83]. In that reference, the focus is on conditions that lead to the same extremal behavior as for i.i.d. data. This is typical of Gaussian sequences, which are extremally independent in the sense that no two observations can be simultaneously extremely large. It is also excessively restrictive, since arguably the typical behavior of time series is the occurrence of exceedences in clusters. Another parallel line of research focused on quantifying the behavior of the sample maxima when it differs from that of an i.i.d. sequence. The extremal index, which quantifies this behavior, was the main object of research. Markov chains and linear processes were thoroughly investigated.

The next milestone was the book [EKM97], which popularized extreme value theory outside its core community. In that book the extremal theory for linear processes was summarized and an incursion was made into the field of non-linear time series such as the ARCH(1) process. Applications to insurance mathematics and ruin theory were given. Since the publication of that book, many more heavy-tailed time series models, in particular non-linear models such as GARCH and bilinear processes, have been thoroughly investigated from the extreme value theory point of view. However, the recent books on extreme value theory are mostly concerned with i.i.d. data: [dHF06] has 6 pages on dependent observations and [BGTS04] contains an overview of extreme value theory for time series, with a section on Markov


chains and their tail chains. [Res07] is essentially concerned with i.i.d. data, except for a chapter on certain long memory models coming from queuing and teletraffic modeling. The statistical side of the literature in extreme value theory for dependent data was concerned with the estimation of the tail index and of the extremal index, which is a measure of the clustering of extremes; important references in this field are [Hsi91b, Hsi93] and [Dre00]. A few articles considered other statistics, such as the tail array sums in [RLdH98] and the extremogram introduced in [DM09].

In 2009 and 2010, two articles were published which (in our opinion) set extreme value theory for heavy-tailed time series on a new course. [BS09], drawing on a few previous references where it was implicitly used, introduced the tail process which, roughly speaking, describes the asymptotic shape of a stationary time series given an extreme event at time zero. The tail process also turned out to be an efficient tool to express extremal quantities such as the extremal index and the asymptotic variances of estimators, and, retrospectively, to obtain more transparent conditions for the convergence of the statistics and the existence of the limiting quantities. [DR10] provided a unified statistical framework for most estimators of extremal quantities, under the name empirical process of cluster functionals. Most of the recent literature in extreme value theory for heavy-tailed time series has been influenced by either or both papers.

The purpose of this book is to present the recent work and revisit the older literature in a unified framework with the tail process at its center. We now give a detailed presentation of the structure and contents of the book.

Structure of this book

• Part I and Appendix B present the fundamentals of regular variation. The main elements of the theory of univariate regular variation and regularly varying random variables are recalled in Chapter 1. There, notation will be fixed and classical results such as the Uniform Convergence Theorem, Potter's bounds, and Breiman's lemma will be found. Chapter 2 will extend the theory to regularly varying random vectors. In this chapter regular variation will be defined in terms of vague# convergence. In Euclidean spaces, or more generally in locally compact second countable spaces, vague# convergence is almost nothing else than vague convergence without bothering about $\infty$. In infinite-dimensional spaces, it is the only way to extend the classical notion. For readers familiar with the classical vague convergence, the switch to vague# convergence will essentially be notational. Appendix B presents the theory of vague# convergence on Polish spaces, adapted from [Kal17] and [BP19] (where it is simply called vague convergence), and the theory of regular variation in metric spaces, adapted from [HL06] (where it is called M0-convergence). We have chosen to use the visual modifier # (which


should not be pronounced) since regular variation is still often defined in terms of vague convergence in $[-\infty, \infty]^d \setminus \{0\}$. The rest of Chapter 2 gives useful tools such as bounds, evaluations of moments, and the transfer of regular variation, and introduces the fundamental dichotomy between extremal dependence and independence. The latter necessitates specific tools, which are introduced in Chapter 3. Chapter 4 extends the results of Chapter 2 on sums of jointly regularly varying random variables to infinite series. This is an essentially technical chapter, but the results are indispensable to study many time series models which can be expressed as infinite series, such as moving averages and solutions of certain stochastic recurrence equations.

Chapter 5 is the central chapter of this book. It introduces the tail process, which will appear ubiquitously in the later chapters, along with its many avatars. The tail process was originally introduced in [BS09] as the weak limit of the finite-dimensional distributions of the time series given an extreme value occurs at time zero. This definition entails that all finite-dimensional distributions of the time series are regularly varying. This is a stronger assumption than in the past literature, which typically only assumed regular variation of the marginal distribution, but it is satisfied by all time series models we know. The tail process can be seen as a model for the clusters of exceedences, which are a typical feature of regularly varying time series, except when the finite-dimensional distributions are extremally independent, in which case the tail process vanishes at all non-zero lags. Even though Chapter 5 is essentially devoted to the tail process, it starts with defining and proving the existence of the tail measure of a regularly varying time series, which is the exponent measure of the time series seen as a random element in the Polish space of $\mathbb{R}^d$-valued sequences endowed with the product topology. For this, the use of the theory of vague# convergence developed in Appendix B is crucial. In that framework, the tail process is defined as a random element whose distribution is the tail measure restricted to a set of measure 1. The two definitions are shown to be equivalent.

In Chapter 5, we also reconcile the two different spectral decompositions of the tail process, namely the spectral tail process Θ introduced in [BS09] and a process denoted by Q which first appeared in [DH95] in the characterization of infinitely divisible point processes. This process had not yet been given a name, so we call it the conditional spectral tail process, since it appears in a conditional spectral decomposition in Section 5.5. In the literature on limit theorems for infinite variance processes, the same objects were expressed in some references in terms of Θ under certain conditions, and in other references in terms of Q under somewhat different conditions. We prove in Chapter 5 that these expressions are indeed equal, under conditions which are essentially compatible with those found in the literature. We also provide tools to obtain many more similar identities. Finally, the candidate extremal index, also introduced in [BS09], is defined in Chapter 5 as a functional of the


tail process. It will be shown later that under some mild conditions it is the true extremal index of the time series under consideration; in particular, it will be shown in Proposition 13.2.6 that it is always the true extremal index of max-stable processes.

• Part II is concerned with limit theorems related to extreme value theory. They are of two sorts: the more probabilistic ones, such as convergence of the partial maxima and partial sum processes, and the more statistically oriented ones, such as the convergence of the tail empirical process and other statistical quantities, and bootstrap procedures.

Chapter 7 studies the convergence of the point process of exceedences and its further refinements, the functional point process of exceedences and the point process of clusters. This problem has a very long history, of which an important milestone is the book [LLR83], where convergence to a Poisson point process is proved under the classical condition D, which allows the use of a blocking method, and D′, which rules out clusters of exceedences. The emphasis on condition D′ and the convergence to a simple Poisson process is probably due to the initial focus on discrete and continuous time Gaussian processes, an important feature of which is the extremal independence of the finite-dimensional distributions, except in the degenerate case of correlation 1. Convergence of the point process of exceedences to a compound Poisson process was proved for specific models (moving averages) in [DR85a] and in a more general framework by [HHL88] under a strengthening of the abovementioned condition D. The definitive step was taken by [DH95], where convergence was not only proved but also related to the theory of infinitely divisible point processes. The only remaining improvement was the introduction of the tail process by [BS09], which made it possible to give an explicit construction of the sequence Q which appears in [DH95]. The final (to this date) refinement was brought by [BPS18], where the point process of exceedences is embedded into the point process of clusters, defined on the space of sequences. This makes it possible to keep track of the order of exceedences and has some implications for records and functional convergence.

As often in asymptotic theory for dependent sequences, a blocking method is employed to obtain the abovementioned results. This method is justified in [LLR83] by condition D, in [HHL88] by a strengthening of the latter condition named Δ, and in [DH95] by condition A(a_n), which is expressed in terms of the Laplace functional. All these conditions are implied by Rosenblatt's strong mixing, hence by the stronger β-mixing, and therefore hold for m-dependent sequences. These temporal weak dependence conditions allow one to break the original sequence into nearly independent blocks, but do not bring information on the extremal behavior within a block. Another important contribution of [DH95] is the introduction of the condition AC(r_n, c_n), called the anti-clustering condition in this book. The sequence r_n is the block size and the sequence c_n is a


threshold, and both sequences must tend to infinity. Heuristically, condition AC(r_n, c_n) means that the number of exceedences over the threshold c_n in a block of length r_n remains bounded (almost surely) and the asymptotic shape of the cluster (the block) is given by the tail process. These heuristics will be developed rigorously in Chapter 6 and are at the core of the results of Chapters 7 and 8.

Chapter 8 treats the important application of the point process of exceedences to the functional convergence of the partial maxima and partial sums to extremal and stable processes by continuous mapping. This application in the case of i.i.d. regularly varying random vectors was developed in the article [Res86] and the ensuing book [Res87]. Early references for convergence to stable laws of sums of dependent random variables include [Dav83] and [DR85a]. [DH95] gave a complete description of the parameters of the limiting stable law in terms of the sequence Q already mentioned. Functional convergence to stable processes is more difficult to obtain than in the finite variance case because of the jumps of the limiting process. It was established in different topologies by what could be called the Croatian school in a series of articles [BKS12], [BK15], [BPS18]. The necessary elements of the theory of weak convergence of stochastic processes, in particular the M1 topology on the space of càdlàg functions, are recalled in Appendix C.

Chapters 9 to 12 address statistical issues. In Chapter 9, we study the tail empirical process, which is the empirical process based only on observations above a high threshold. There is an important literature on tail empirical processes both for i.i.d. and dependent data, the latest article being [Roo09], which gives a complete theory under strong and β-mixing. The main difference of Chapter 9 with respect to the previous literature is the use of a strengthening of condition AC(r_n, c_n), which we call condition S(r_n, u_n), to obtain the convergence of the variance. In order to simplify the arguments for tightness, we use β-mixing. As in most of the related literature, we apply the convergence of the tail empirical process to prove the asymptotic normality of the most famous estimator of the tail index, namely the Hill estimator. The asymptotic variance is expressed in terms of the tail process, and it seems that this had not been done before.

Chapter 10 is essentially an extended presentation of the outstanding article [DR10], which deals with more general statistics, called cluster functionals, of which the tail array sums introduced in [RLdH98] are a special case. In [DR10], convergence of the variances and covariances of the statistics considered is assumed, which gives a greater level of generality, but no guidelines on how to prove these convergences. Building on the previous chapter and following [KSW18], we provide practical conditions for these convergences and express the limiting variances and covariances in terms of the tail process or spectral tail process, using intensively the identities obtained in Chapters 5 and 6. Again, we use β-mixing to simplify the arguments for tightness.


Chapter 11 deals with tail array sums in the case of extremal independence, providing statistical theory for the probabilistic framework developed in Chapter 3. Chapter 12 is concerned with a particular bootstrap method for cluster functionals, building again on [DR10] and several articles that followed.

An important part of the literature in extreme value statistics for i.i.d. data is concerned with bias issues. For instance, the bias of the Hill estimator is controlled by so-called second (or even third)-order conditions, and many bias reduction methods have been developed. For dependent data, this issue is much less relevant and the literature is quasi-nonexistent. We have decided to ignore this issue entirely.

• Part III introduces several classes of regularly varying time series. In Chapter 13, we review some recent and important results of the theory of max-stable processes and rewrite them using the tail process, which was recently proved to determine, up to scaling, the distribution of a max-stable process. We will also show that for max-stable processes, certain conditions used as sufficient conditions in the general theory become necessary and sufficient. This highlights the fact that max-stable processes can be seen as ideal models of heavy-tailed time series. In Chapter 14, we show that nearly all the conditions used in Part II can be checked for regularly varying functions of certain geometrically ergodic Markov chains, thus providing a very large class of models to which the theory applies. In Chapter 15, we consider infinite series with random coefficients, for which we only compute the tail process and the extremal index. We then specialize to moving averages with deterministic coefficients, to which we give a more thorough treatment. In Chapter 16, we study several long memory (or long range dependent) models. Long memory does not have a precise definition. Following [Sam06], it can be heuristically defined as a phase transition in a family of models governed by a parameter, some values of which induce a standard behavior, close to that of i.i.d. data, while other values induce a strikingly non-standard behavior; this phase transition may happen at different frontiers according to the problem considered: partial maxima, partial sums, autocovariances, etc. We illustrate this phase transition on several models and several problems. This part is by nature incomplete and infinitely many other models could have been included, but editorial time and space being finite, we made these choices, we hope not too arbitrarily.


A note on the bibliography

It would only be a very small overstatement to say that this book stems from three articles: [DH95], [BS09], and [DR10], in chronological order, where the deepest and most fruitful ideas concerning the point process of exceedences of dependent time series, the tail process, and the statistical methodology for extremes can be found. Add to that [Kal17] for the theory of vague# convergence, [Res86] for the use of point processes in functional convergence, [SO12] for the introduction of the tail measure, [DK17] for max-stable processes, [JS14] and [MW13] for Markov chains, [HS08] for infinite series, a few of our own articles to prove that we produced something in this field, and that might be the end of it. However, we felt that it would not do, and many authors would legitimately feel unfairly overlooked. Therefore, we have tried at the end of each chapter to provide some historical insight and to cite the most (still in our opinion) relevant references, especially for recent results (i.e. published after 2009), but we have not tried to be exhaustive, because, as Cervantes wrote in the preface of Don Quijote, "De todo esto ha de carecer mi libro, porque ni tengo qué acotar en el margen, ni qué anotar en el fin, ni menos sé qué autores sigo en él, para ponerlos al principio, como hacen todos, por las letras del A B C, comenzando en Aristóteles y acabando en Xenofonte y en Zoilo o Zeuxis, aunque fue maldiciente el uno y pintor el otro."¹

¹ Of all this my book will lack, because I don't have anything to add in the margin or write down at the end, and I am far from knowing which authors I follow in it, to put them at the beginning, as everyone else does, following the alphabetical order, starting with Aristotle and finishing with Xenofonte and Zoilus or Zeuxis, although the former was a sycophant and the latter a painter. Cited in [Sou19], English translation by the author.

How to use this book

This book is not an introductory book on regular variation or extreme value theory or on time series. Readers should be familiar with extreme value theory for i.i.d. data and with the basics of time series. Chapters 1, 2, and 4 contain standard material and can be skipped by readers familiar with extreme value theory. Chapter 3 deals with extremal independence and can be skipped upon first reading, since it will not be referred to until Chapter 11. From Chapter 5 onward, the book should essentially be read linearly. See Figure 0.1 for suggestions of different paths through Chapters 1 to 16.

One novelty of the book is the use of vague# convergence, of which we therefore give an extended presentation in Appendix B. The other appendices essentially contain classical results, used throughout the book, with precise references.


[Figure 0.1: a flow chart of the chapter dependencies, with columns Basics (Chapters 1–4), Tail process (5–6), Probability (7–8), Statistics (9–12), and Models (13–16).]

Fig. 0.1. Dependencies and reading suggestions. A graduate course in extreme value theory for heavy-tailed time series should start with a more or less detailed account of Chapters 1 and 2, depending on the background in extreme value theory, and continue with a rather exhaustive presentation of Chapters 5 and 6, possibly skipping Sections 5.5 and 5.6 in the first place. The course should then proceed with Chapters 7 and 8 if probabilistically oriented, or jump directly to Chapter 9 if statistically oriented. Chapters 10 and 12 contain advanced statistical material and should be read after Chapters 6 and 9. Chapters 3 and 11 deal with some aspects of extremal independence and therefore are essentially independent of Chapter 6, but Chapter 11 uses Chapter 9. Chapter 4 is needed for the study of most models described in the last part of the book, whose chapters are independent from each other.

Most notation will be explained upon first appearance. Let us only mention here a few conventions which will be used throughout the book. Whenever the three letters "cst" appear in an equation, they indicate a constant which should depend on none of the variable parameters surrounding it. The integral of a function $f$ with respect to a measure $\mu$ on a space $E$ is denoted indifferently by $\mu(f)$, $\int_E f\, d\mu$ or $\int_E f(x)\, \mu(dx)$. We denote convergence in distribution of random elements by $\xrightarrow{d}$, except for stochastic processes and random measures, for which we use $\stackrel{w}{\Longrightarrow}$; we write $\xrightarrow{fi.di.}$ for the convergence of the finite-dimensional distributions of a stochastic process. As mentioned above, we have decided to use the visual modifier # to distinguish vague# convergence from the classical vague convergence.

Thanks

This book would not exist without the (direct or indirect) contributions of many friends, colleagues, and students, through discussions, invitations to their institutions or workshops, proofreading (and perhaps refereeing) of articles, and all those informal instances where projects make progress in a researcher's mind. We may cite, alphabetically and non-exhaustively, Bojan Basrak, Clemonell Bilayi-Biakana, Youssouph Cissokho, Clément Dombry, Holger Drees, Enkelejd Hashorva, Gail Ivanoff, Anja Janssen, Thomas Mikosch, Hrvoje Planinić, François Roueff, Parthanil Roy, Gennady Samorodnitsky, Johan Segers, Stilian Stoev, and Olivier Wintenberger. We


also express our gratitude (or perhaps apologies) to the referees, on whom fell the burden of reading our book. They helped correct many typos and mistakes, and we take sole credit for the remaining ones. A lot of the material of this book was first developed in our joint research projects, during visits to each other's departments, and during stays at the wonderful research centers in Banff (BIRS), Luminy (CIRM), Oaxaca (BIRS), and Oberwolfach (MFO). Let all these institutions be gratefully thanked. Finally, we are also grateful for the immeasurable patience of Donna Chernyk from Springer. And there are those to whom we are indebted beyond words.

Ottawa, Canada
Paris, France
February 2020

Rafał Kulik
Philippe Soulier

Contents

Part I Regular Variation

1 Regularly varying random variables ..... 3
 1.1 Regularly varying functions ..... 3
 1.2 Regularly varying sequences ..... 10
 1.3 Regularly varying random variables ..... 11
 1.4 Bounds and moments ..... 14
 1.5 Problems ..... 17
 1.6 Bibliographical notes ..... 21

2 Regularly varying random vectors ..... 23
 2.1 Regularly varying random vectors ..... 25
 2.2 Spectral decomposition ..... 38
 2.3 Conditioning on extreme events ..... 45
 2.4 Problems ..... 51
 2.5 Bibliographical notes ..... 52

3 Dealing with extremal independence ..... 53
 3.1 Hidden regular variation ..... 54
 3.2 Conditioning on an extreme event ..... 60
 3.3 Problems ..... 71
 3.4 Bibliographical notes ..... 72

4 Regular variation of series and random sums ..... 75
 4.1 Series with random coefficients ..... 75
 4.2 Applications ..... 92
 4.3 Problems ..... 94
 4.4 Bibliographical notes ..... 99

5 Regularly varying time series ..... 101
 5.1 The tail measure ..... 103
 5.2 The tail process ..... 106
 5.3 The time-change formula ..... 117
 5.4 From the tail process to the tail measure ..... 120
 5.5 Anchoring maps ..... 127
 5.6 Identities ..... 131
 5.7 Problems ..... 136
 5.8 Bibliographical notes ..... 145

Part II Limit Theorems

6 Convergence of clusters ..... 149
 6.1 What is a cluster? ..... 149
 6.2 Vague# convergence of clusters ..... 153
 6.3 Problems ..... 169
 6.4 Bibliographical notes ..... 171

7 Point process convergence ..... 173
 7.1 Random measures and point processes ..... 174
 7.2 Convergence of the point process of exceedences: the i.i.d. case ..... 185
 7.3 Convergence of the point process of clusters ..... 186
 7.4 Sufficient conditions for point process convergence ..... 191
 7.5 The extremal index ..... 195
 7.6 Problems ..... 200
 7.7 Bibliographical notes ..... 203

8 Convergence to stable and extremal processes ..... 207
 8.1 Convergence of the partial maxima ..... 209
 8.2 Representations of stable laws and processes ..... 210
 8.3 Convergence of the partial sum process ..... 217
 8.4 Multivariate case ..... 225
 8.5 Problems ..... 229
 8.6 Bibliographical notes ..... 231

9 The tail empirical and quantile processes ..... 235
 9.1 Consistency of the tail empirical distribution function ..... 236
 9.2 Functional central limit theorem for the tail empirical process ..... 240
 9.3 Quantile process and intermediate order statistics ..... 251
 9.4 Tail empirical process with random levels ..... 253
 9.5 The Hill estimator ..... 254
 9.6 Problems ..... 260
 9.7 Bibliographical notes ..... 262

10 Estimation of cluster functionals ..... 265
 10.1 Consistency of the empirical cluster measure ..... 267
 10.2 Central limit theorem for the empirical cluster process ..... 278
 10.3 Central limit theorem for feasible estimators ..... 282
 10.4 Examples ..... 290
 10.5 Proof of Theorem 10.2.3 ..... 300
 10.6 Proof of Theorem 10.3.1 ..... 305
 10.7 Bibliographical notes ..... 311

11 Estimation for extremally independent time series ..... 313
 11.1 Finite-dimensional convergence of tail array sums ..... 315
 11.2 Non-parametric estimation of the limiting conditional distribution ..... 318
 11.3 Estimation of the conditional scaling exponent ..... 324
 11.4 Asymptotic equicontinuity ..... 327
 11.5 Bibliographical notes ..... 332

12 Bootstrap ..... 333
 12.1 Conditional multiplier central limit theorem ..... 335
 12.2 Proof of Theorem 12.1.1 ..... 338
 12.3 Bibliographical notes ..... 344

Part III Time Series Models

13 Max-stable processes ..... 349
 13.1 Max-stable vectors ..... 349
 13.2 Max-stable processes ..... 352
 13.3 Anticlustering conditions ..... 359
 13.4 β-mixing ..... 364
 13.5 Examples ..... 365
 13.6 Problems ..... 369
 13.7 Bibliographical notes ..... 370

14 Markov chains ..... 373
 14.1 Regular variation of the stationary distribution ..... 376
 14.2 Regular variation of Markov chains ..... 378
 14.3 Checking conditions ..... 388
 14.4 Regeneration and extremal index ..... 401
 14.5 Extremally independent Markov chains ..... 411
 14.6 Problems ..... 414
 14.7 Bibliographical notes ..... 422

15 Moving averages ..... 425
 15.1 Regular variation of two-sided moving average processes ..... 425
 15.2 Regular variation of one-sided moving average processes ..... 429
 15.3 Moving averages with deterministic coefficients ..... 436
 15.4 Problems ..... 450
 15.5 Bibliographical notes ..... 451

16 Long memory processes ..... 453
 16.1 Long memory moving averages ..... 454
 16.2 Heavy-tailed subordinated Gaussian processes ..... 458
 16.3 Long memory stochastic volatility processes ..... 471
 16.4 Bibliographical notes ..... 485

Appendices

Appendix A Weak convergence ..... 491
 A.1 Measures on metric spaces ..... 491
 A.2 Weak convergence ..... 497

Appendix B Vague# convergence ..... 505
 B.1 Vague# convergence ..... 505
 B.2 Regular variation of measures on Polish spaces ..... 521

Appendix C Weak convergence of stochastic processes ..... 531
 C.1 The space D(I) ..... 531
 C.2 The J1 topology on a compact interval ..... 532
 C.3 The M1 topology ..... 540
 C.4 Asymptotic equicontinuity ..... 545

Appendix D Markov chains ..... 557
 D.1 Generalities ..... 557
 D.2 Atoms and small sets ..... 559
 D.3 Geometric ergodicity and mixing ..... 560

Appendix E Martingales, Central Limit Theorems, Mixing, Miscellanea ..... 563
 E.1 Martingales ..... 563
 E.2 Central limit theorems ..... 564
 E.3 Mixing ..... 569
 E.4 Miscellanea ..... 572

Appendix F Solutions to problems ..... 579

List of conditions ..... 655
References ..... 659
Index ..... 677

Part I

Regular Variation

1 Regularly varying random variables

In this chapter we recall the fundamental results about regular variation of univariate real-valued functions and sequences. Most of the material presented here is standard, and we will state certain results without proof; see Section 1.6 for references. We will apply these results to real-valued random variables with regularly varying tails and prove some of the most famous and useful results, such as Potter's bounds and Breiman's lemma.

1.1 Regularly varying functions

We start with the definition of regular variation.

Definition 1.1.1 A positive function $f$ defined on an interval $[a, \infty)$ is said to be regularly varying at infinity if
\[
\lim_{x\to\infty} \frac{f(tx)}{f(x)} \tag{1.1.1}
\]
exists in $(0, \infty)$ for all $t > 0$.
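To fix ideas, here is a standard example (our illustration, not part of the original text): the function $f(x) = x^{\gamma} \log x$ is regularly varying at infinity, since
\[
\frac{f(tx)}{f(x)} = t^{\gamma}\, \frac{\log x + \log t}{\log x} \longrightarrow t^{\gamma} , \quad x \to \infty ,
\]
for every $t > 0$; the logarithmic factor disappears in the limit. This is the prototype of the slowly varying corrections discussed below.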

Under the single assumption that the function $f$ is measurable, the following theorem characterizes the form of the possible limits and the nature of the convergence. It is known as the Uniform Convergence Theorem.

Theorem 1.1.2 (Uniform Convergence Theorem) Let $f$ be a positive Lebesgue measurable function defined on $[0, \infty)$ such that the limit (1.1.1) exists in $(0, \infty)$ for all $t$ in a set of positive Lebesgue measure. Then there exists $\gamma \in \mathbb{R}$ such that
\[
\lim_{x\to\infty} \frac{f(tx)}{f(x)} = t^{\gamma} , \tag{1.1.2}
\]
for all $t > 0$ and the convergence is uniform on compact subsets of $(0, \infty)$. Moreover,
• if $\gamma > 0$, the convergence is uniform on sets $[0, b]$, $0 \le b < \infty$;
• if $\gamma < 0$, the convergence is uniform on sets $[b, \infty)$, $0 < b < \infty$;
• if $\gamma = 0$, the convergence is uniform on sets $[a, b]$, $0 < a \le b < \infty$.

In the sequel, we will only consider Lebesgue measurable functions. We now introduce terminology and make a few elementary comments.
• If $f$ satisfies (1.1.2), we will say that $f$ is regularly varying at infinity with index $\gamma$.
• If $\gamma = 0$, then the function $f$ will be said to be slowly varying at infinity.
• Consequently, a function $f$ defined on $(0, \infty)$ is regularly varying at infinity with index $\gamma$ if and only if there exists a slowly varying function $\ell$ such that $f(x) = x^{\gamma} \ell(x)$.
• Let $f$ be regularly varying at infinity with index $\gamma$. If $\gamma > 0$, then $\lim_{x\to\infty} f(x) = \infty$. If $\gamma < 0$, then $\lim_{x\to\infty} f(x) = 0$. Nothing can be said in the case $\gamma = 0$.
• The product of two regularly varying functions $f$ and $g$ with indices $\gamma$ and $\delta$ is regularly varying with index $\gamma + \delta$, and $1/f$ is regularly varying with index $-\gamma$.
• If $f$ is regularly varying with index $\gamma$, then for $\delta \ne 0$, $f^{\delta}$ is regularly varying with index $\gamma\delta$. If $\delta > 0$, then $x \mapsto f(x^{\delta})$ is also regularly varying with index $\gamma\delta$.

A monotone regularly varying function need not be continuous, but its jumps are negligible with respect to the size of the function.

Lemma 1.1.3 Let $f$ be a monotone function defined on $[0, \infty)$, regularly varying at infinity with index $\gamma$. For $x \ge 0$, define
\[
\Delta f(x) = \lim_{y\to x,\, y>x} f(y) - \lim_{y\to x,\, y<x} f(y) .
\]
Then $\lim_{x\to\infty} \Delta f(x)/f(x) = 0$.

Conversely, a regularly varying function with a non-zero index admits monotone equivalents.

Proposition 1.1.4 Let $f$ be regularly varying at infinity with index $\gamma \ne 0$. There exist monotone functions $f_1$ and $f_2$ such that $\lim_{x\to\infty} f_1(x)/f(x) = \lim_{x\to\infty} f_2(x)/f(x) = 1$; they can be chosen as
\[
f_1(x) = \inf_{y\ge x} f(y) , \quad f_2(x) = \sup_{y\le x} f(y) , \quad \text{if } \gamma > 0 , \tag{1.1.3}
\]
\[
f_1(x) = \inf_{y\le x} f(y) , \quad f_2(x) = \sup_{y\ge x} f(y) , \quad \text{if } \gamma < 0 .
\]

Proof. Assume first that $\gamma > 0$. Let $f$ be regularly varying with index $\gamma$ and let $f_2$ be defined as in (1.1.3). Then, by the Uniform Convergence Theorem 1.1.2, we have
\[
\lim_{x\to\infty} \frac{f_2(x)}{f(x)} = \lim_{x\to\infty} \frac{\sup_{t\in[0,1]} f(tx)}{f(x)} = \sup_{t\in[0,1]} t^{\gamma} = 1 .
\]
To prove the property for $f_1$, note that $g = 1/f$ is regularly varying with index $-\gamma < 0$, so the Uniform Convergence Theorem 1.1.2 yields in this case
\[
\lim_{x\to\infty} \frac{f_1(x)}{f(x)} = \lim_{x\to\infty} \frac{\inf_{t\ge1} f(tx)}{f(x)} = \lim_{x\to\infty} \frac{g(x)}{\sup_{t\ge1} g(tx)} = \frac{1}{\sup_{t\ge1} t^{-\gamma}} = 1 .
\]
Consider now the case $\gamma < 0$. Then
\[
\lim_{x\to\infty} \frac{f_2(x)}{f(x)} = \lim_{x\to\infty} \frac{\sup_{t\ge1} f(tx)}{f(x)} = \sup_{t\ge1} t^{\gamma} = 1 .
\]
The property for $f_1$ follows by considering again $g = 1/f$. □

The second most important result is the representation theorem.

Theorem 1.1.5 (Representation Theorem) Let $\ell$ be a function defined on $(0, \infty)$, slowly varying at infinity. There exist $x_0 > 0$ and functions $c$ and $\eta$ defined on $[x_0, \infty)$ such that
\[
\lim_{x\to\infty} c(x) = c^* \in (0, \infty) , \qquad \lim_{x\to\infty} \eta(x) = 0 ,
\]
and for all $x \ge x_0$,
\[
\ell(x) = c(x) \exp\left( \int_{x_0}^{x} \frac{\eta(s)}{s}\, ds \right) . \tag{1.1.4}
\]

The functions c and η can be chosen in such a way that η is infinitely differentiable.
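To make the representation concrete, consider the following standard example (our addition): take $\ell(x) = \log x$ and $x_0 = e$. Since $\frac{d}{ds} \log\log s = 1/(s \log s)$,
\[
\log x = \exp(\log\log x) = \exp\left( \int_{e}^{x} \frac{1/\log s}{s}\, ds \right) ,
\]
which is (1.1.4) with $c(x) \equiv 1$ and $\eta(s) = 1/\log s \to 0$.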

An important consequence of this result is that if $f$ is regularly varying at infinity, then it is equivalent to a regularly varying smooth function.

Corollary 1.1.6 Assume that $f$ is a continuously differentiable function which is ultimately positive. If there exists $\gamma \in \mathbb{R}$ such that
\[
\lim_{x\to\infty} \frac{x f'(x)}{f(x)} = \gamma , \tag{1.1.5}
\]
then $f$ is regularly varying at infinity with index $\gamma$. If $f$ is regularly varying at infinity, then there exists a continuously differentiable function $f^{\#}$ that satisfies (1.1.5) and such that $\lim_{x\to\infty} f(x)/f^{\#}(x) = 1$.

Proof. Assume that (1.1.5) holds. Let $x_0$ be such that $|f(x)| > 0$ for $x \ge x_0$. For $s \ge x_0$, define $\eta(s) = s f'(s)/f(s) - \gamma$. Then $\lim_{s\to\infty} \eta(s) = 0$ and, for $x \ge x_0$, $f(x) = \ell(x) x^{\gamma}$ with $\ell$ defined by
\[
\ell(x) = f(x_0)\, x_0^{-\gamma} \exp\left( \int_{x_0}^{x} \frac{\eta(s)}{s}\, ds \right) .
\]
We must only prove that $\ell$ is slowly varying. Fix $t > 0$, $\epsilon > 0$ and $\zeta$ such that $|t^{\zeta} - 1| \vee |t^{-\zeta} - 1| \le \epsilon$. Since $\eta(s) \to 0$, we can choose $x_1 \ge x_0$ such that $|\eta(s)| \le \zeta$ for $s \ge x_1$. Then, for $x$ sufficiently large, we have
\[
\frac{\ell(tx)}{\ell(x)} = \exp\left( \int_{x}^{tx} \frac{\eta(s)}{s}\, ds \right) = \exp\left( \int_{1}^{t} \frac{\eta(xs)}{s}\, ds \right) \le \exp(\zeta |\log t|) = (t \vee t^{-1})^{\zeta} ,
\]
and similarly, $\ell(tx)/\ell(x) \ge (t \vee t^{-1})^{-\zeta}$. This yields
\[
1 - \epsilon \le \liminf_{x\to\infty} \frac{\ell(tx)}{\ell(x)} \le \limsup_{x\to\infty} \frac{\ell(tx)}{\ell(x)} \le 1 + \epsilon .
\]
Since $\epsilon$ is arbitrary, we conclude that $\ell$ is slowly varying. □

Karamata theorem

It is impossible to omit the celebrated Karamata theorem, which we state without proof. We will provide a proof in the context of random variables in Section 1.3.

Theorem 1.1.7 Let $f$ be a locally bounded measurable function on $[1, \infty)$, regularly varying at $\infty$ with index $\alpha$. For $\beta \in \mathbb{R}$,
\[
\lim_{x\to\infty} \frac{\int_{1}^{x} t^{\beta-1} f(t)\, dt}{x^{\beta} f(x)} = \frac{1}{\alpha + \beta} , \quad \beta + \alpha > 0 , \tag{1.1.6a}
\]
\[
\lim_{x\to\infty} \frac{\int_{x}^{\infty} t^{\beta-1} f(t)\, dt}{x^{\beta} f(x)} = -\frac{1}{\alpha + \beta} , \quad \beta + \alpha < 0 . \tag{1.1.6b}
\]
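As a sanity check (our addition, assuming the pure power case), take $f(t) = t^{\alpha}$ with $\alpha + \beta > 0$:
\[
\int_{1}^{x} t^{\beta-1} f(t)\, dt = \frac{x^{\alpha+\beta} - 1}{\alpha+\beta} \sim \frac{x^{\beta} f(x)}{\alpha+\beta} , \quad x \to \infty ,
\]
in accordance with (1.1.6a); the case $\alpha + \beta < 0$ and (1.1.6b) are checked in the same way. The content of the theorem is that a slowly varying factor does not alter these limits.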

We now turn to the asymptotic inverse of a regularly varying function. Recall that the left-continuous inverse of a non-decreasing function $f$ is defined by
\[
f^{\leftarrow}(y) = \inf\{ x : f(x) \ge y \} . \tag{1.1.8}
\]

Proposition 1.1.8 Assume $\gamma > 0$. Let the function $f$ be regularly varying at infinity with index $\gamma$. Then there exists a function $g$, regularly varying at infinity with index $1/\gamma$, such that
\[
\lim_{x\to\infty} \frac{f(g(x))}{x} = \lim_{x\to\infty} \frac{g(f(x))}{x} = 1 . \tag{1.1.9}
\]
The function $g$ can be chosen as the left-continuous inverse of $f$, and if $f(x) = x^{\gamma} \ell(x^{\gamma})$ with $\ell$ slowly varying, then $g(x) = (x \ell^{\#}(x))^{1/\gamma}$, where $\ell^{\#}$ is a slowly varying function such that $\lim_{x\to\infty} \ell(x)\, \ell^{\#}(x \ell(x)) = 1$.

The function $\ell^{\#}$ is called the de Bruijn conjugate of $\ell$. It is unique up to equivalence at $\infty$ and $\lim_{x\to\infty} (\ell^{\#})^{\#}(x)/\ell(x) = 1$.

Proof. By Theorem 1.1.5 and Corollary 1.1.6, the function $f$ is equivalent at infinity to a continuously differentiable function $h$ such that $\lim_{x\to\infty} x h'(x)/h(x) = \gamma$. Thus $h'$ is ultimately positive and $h$ is invertible on some interval $[x_0, \infty)$. Let $g$ be its inverse and $y_0 = h(x_0)$. Then for all $x \ge x_0$, $g(h(x)) = x$ and for all $y \ge y_0$, $h(g(y)) = y$. Moreover, by Corollary 1.1.6, $g$ is regularly varying with index $1/\gamma$, since
\[
\frac{y g'(y)}{g(y)} = \frac{h(g(y))}{g(y)\, h'(g(y))} \to \frac{1}{\gamma} , \quad y \to \infty .
\]
Conversely, since $f$ is equivalent to $h$ at $\infty$ and $g$ is regularly varying, we have, by uniform convergence,
\[
\lim_{x\to\infty} \frac{g(f(x))}{g(h(x))} = \lim_{x\to\infty} \frac{g\big( h(x) f(x)/h(x) \big)}{g(h(x))} = 1 .
\]
Thus $\lim_{x\to\infty} x^{-1} g(f(x)) = 1$. Hence, we proved that $g = h^{\leftarrow}$ fulfills (1.1.9). Now, we prove that $g$ can be chosen as $f^{\leftarrow}$. To prove that $f^{\leftarrow}$ defined in (1.1.8) satisfies (1.1.9), note that by Lemma 1.1.3, we can assume without loss of generality that $f$ is right-continuous. Then, by definition of $f^{\leftarrow}$,
\[
y \le f(f^{\leftarrow}(y)) \le y + \Delta f(f^{\leftarrow}(y)) ,
\]
where $\Delta f(x)$ denotes the jump of $f$ at $x$ (possibly zero). This yields
\[
1 - \frac{\Delta f(f^{\leftarrow}(y))}{f(f^{\leftarrow}(y))} \le \frac{y}{f(f^{\leftarrow}(y))} \le 1 .
\]
By Lemma 1.1.3, the left-hand side tends to one, so $\lim_{y\to\infty} y^{-1} f(f^{\leftarrow}(y)) = 1$, and thus $f^{\leftarrow}$ is equivalent to $g$ at $\infty$. Take $\gamma = 1$. Since $g$ is regularly varying with index 1, it can be expressed as $g(x) = x \ell^{\#}(x)$ with $\ell^{\#}$ slowly varying and by (1.1.9),
\[
1 = \lim_{x\to\infty} \frac{g(x \ell(x))}{x} = \lim_{x\to\infty} \ell(x)\, \ell^{\#}(x \ell(x)) .
\]

For the general case, if $f(x) = x^{\gamma} \ell(x^{\gamma})$ is regularly varying with index $\gamma > 0$, then $x \mapsto f(x^{1/\gamma}) = x \ell(x)$ is regularly varying with index 1 and thus its inverse can be expressed as $x \ell^{\#}(x)$; thus the inverse of $f$ can be expressed as $(x \ell^{\#}(x))^{1/\gamma}$. □
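For example (our illustration): if $\ell(x) = \log x$, one may take $\ell^{\#}(x) = 1/\log x$, since
\[
\ell(x)\, \ell^{\#}(x \ell(x)) = \frac{\log x}{\log(x \log x)} = \frac{\log x}{\log x + \log\log x} \longrightarrow 1 .
\]
Accordingly, the inverse of $f(x) = x \log x$ satisfies $f^{\leftarrow}(x) \sim x/\log x$ at infinity.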

Proposition 1.1.9 (Composition) If the functions $f_1$ and $f_2$ are regularly varying at infinity with respective indices $\gamma_1 \in \mathbb{R}$ and $\gamma_2 \ge 0$ and $\lim_{x\to\infty} f_2(x) = \infty$, then $f_1 \circ f_2$ is regularly varying with index $\gamma_1 \gamma_2$.

Proof. Write $f_i(x) = x^{\gamma_i} \ell_i(x)$, $i = 1, 2$, where the $\ell_i$ are slowly varying functions and $\ell_2$ is positive. Then $f_1(f_2(x)) = \ell_1(x^{\gamma_2} \ell_2(x))\, \ell_2^{\gamma_1}(x)\, x^{\gamma_1\gamma_2}$. The function $\ell_2^{\gamma_1}$ is slowly varying, so we only need to prove that for any $\gamma \ge 0$, the function $\ell$ defined by $\ell(x) = \ell_1(x^{\gamma} \ell_2(x))$ is slowly varying at $\infty$, under the additional assumption that $\lim_{x\to\infty} \ell_2(x) = \infty$ if $\gamma = 0$. For $t, x > 0$, write $t_x = t^{\gamma} \ell_2(tx)/\ell_2(x)$. Then,
\[
\frac{\ell(tx)}{\ell(x)} = \frac{\ell_1(t^{\gamma} x^{\gamma} \ell_2(tx))}{\ell_1(x^{\gamma} \ell_2(x))} = \frac{\ell_1(t_x\, x^{\gamma} \ell_2(x))}{\ell_1(x^{\gamma} \ell_2(x))} .
\]
Since $t_x \to t^{\gamma} > 0$ and $x^{\gamma} \ell_2(x) \to \infty$, we conclude by the Uniform Convergence Theorem 1.1.2 that
\[
\lim_{x\to\infty} \frac{\ell_1(t_x\, x^{\gamma} \ell_2(x))}{\ell_1(x^{\gamma} \ell_2(x))} = 1 .
\]
Thus $\ell$ is slowly varying and this concludes the proof. □

Remark 1.1.10 As particular examples of Proposition 1.1.9, we obtain that
• as already noted, if $f$ is regularly varying with index $\alpha$, then for all $\gamma \ne 0$, $f^{\gamma}$ is regularly varying with index $\alpha\gamma$;
• if $\ell$ is slowly varying and $g$ is regularly varying with $\lim_{x\to\infty} g(x) = \infty$, then $\ell \circ g$ is slowly varying. This is not necessarily true if $g$ is not regularly varying: take for instance $\ell(x) = \log(x)$ and $g(x) = e^x$. ⊕

Finally, we mention that regular variation can be defined at $-\infty$ or at 0 as follows, and the above properties hold, after appropriate adaptation.
• A function $f$ defined on an interval $(-\infty, a)$ will be said to be regularly varying at $-\infty$ with index $\gamma$ if the function $x \mapsto f(-x)$ is regularly varying at $\infty$ with index $\gamma$.
• A function $f$ defined on an interval $(0, a)$ will be said to be regularly varying at 0 with index $\gamma$ if the function $x \mapsto f(1/x)$ is regularly varying at $\infty$ with index $-\gamma$.
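Before turning to sequences, the defining limit (1.1.1) can be checked numerically. The following short Python snippet (our illustration; the function and evaluation points are arbitrary choices) shows both the convergence and its logarithmic slowness:

```python
import numpy as np

gamma = 1.5
f = lambda x: x**gamma * np.log(x)  # regularly varying with index gamma

t = 2.0
for x in [1e2, 1e4, 1e8, 1e16]:
    print(f"x = {x:.0e}:  f(tx)/f(x) = {f(t * x) / f(x):.6f}")
print(f"t**gamma       = {t**gamma:.6f}")
# The ratio approaches t**gamma, but slowly: the slowly varying
# factor log(x) is only negligible on a logarithmic scale.
```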

1.2 Regularly varying sequences

In this section, we define regular variation of sequences and show that a regularly varying sequence can be embedded in a regularly varying function. Recall that the integer part of a real number $x$, denoted by $[x]$, is the largest integer not exceeding $x$.

Definition 1.2.1 A sequence $\{c_n, n \in \mathbb{N}\}$ of positive numbers is said to be regularly varying if there exists a function $\psi$ defined on $(0, \infty)$ such that, for all $t > 0$,
\[
\lim_{n\to\infty} \frac{c_{[nt]}}{c_n} = \psi(t) . \tag{1.2.1}
\]

Proposition 1.2.2 If (1.2.1) holds, then
(i) there exists $\rho \in \mathbb{R}$ such that $\psi(t) = t^{\rho}$;
(ii) the function $f$ defined by $f(x) = c_{[x]}$ is regularly varying with index $\rho$.
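Before the proof, a small numerical illustration of Definition 1.2.1 (ours; the sequence is an arbitrary choice):

```python
import numpy as np

rho = 0.5
c = lambda n: n**rho * np.log(n + 1.0)  # regularly varying sequence, index rho

t = 3.0
for n in [1e3, 1e6, 1e9]:
    print(f"n = {n:.0e}:  c_[nt]/c_n = {c(np.floor(n * t)) / c(n):.6f}")
print(f"psi(t) = t**rho = {t**rho:.6f}")
```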

Proposition 1.2.2 means that results proven for regularly varying sequences are also valid for regularly varying functions and vice versa.

Proof. Let us first prove that $c_{n-1}/c_n \to 1$ as $n \to \infty$. Define $\psi_n(t) = c_{[nt]}/c_n$. By definition, for all $t > 0$, $\psi_n(t) \to \psi(t)$. Define now $g_n(t) = c_{[[nt]/t]}/c_n$. Then $g_n(t) = \psi_{[nt]}(1/t)\, \psi_n(t) \to \psi(1/t)\psi(t)$ as $n \to \infty$. Note now that if $t$ is irrational and $t > 1$, then $n - 1 < [nt]/t < n$, so $[[nt]/t] = n - 1$ and $g_n(t) = c_{n-1}/c_n$. Thus $c_{n-1}/c_n$ converges to the limit $\psi(1/t)\psi(t) \in (0, \infty)$, and this limit can only be 1: otherwise, for $t > 1$, $c_{[nt]}/c_n = \prod_{k=n+1}^{[nt]} c_k/c_{k-1}$ would tend to 0 or to $\infty$, contradicting (1.2.1).

The result will follow by Theorem 1.1.2 once we prove that the function $f(x) = c_{[x]}$ satisfies $\lim_{x\to\infty} f(tx)/f(x) = \psi(t)$ for all $t > 0$. Write
\[
\frac{f(tx)}{f(x)} = \frac{c_{[xt]}}{c_{[x]}} = \frac{c_{[t[x]]}}{c_{[x]}} \times \frac{c_{[xt]}}{c_{[t[x]]}} = \psi_{[x]}(t)\, \frac{c_{[xt]}}{c_{[t[x]]}} .
\]
Since $\psi_{[x]}(t) \to \psi(t)$ as $x \to \infty$, it suffices to prove that $c_{[xt]}/c_{[t[x]]} \to 1$ as $x \to \infty$. Note that $0 \le [xt] - [[x]t] \le t + 1$. Fix $\epsilon > 0$, set $k = [t] + 1$ and choose $n_0$ such that if $n \ge n_0$, $|c_n/c_{n-1} - 1| \le \epsilon$. Then, for $x$ large enough so that $[[x]t] \ge n_0$, we have
\[
(1 - \epsilon)^k \le \frac{c_{[xt]}}{c_{[t[x]]}} = \prod_{n=[[x]t]+1}^{[xt]} \frac{c_n}{c_{n-1}} \le (1 + \epsilon)^k .
\]
Since $\epsilon$ is arbitrary, we obtain that $\lim_{x\to\infty} c_{[xt]}/c_{[[x]t]} = 1$. □

1.3 Regularly varying random variables

Let $F$ be a distribution function on $\mathbb{R}$. It may happen that $\overline{F} = 1 - F$ (the survival function) is regularly varying at $\infty$ or that $F$ is regularly varying at $-\infty$. Since $x \mapsto F(-x)$ and $\overline{F}$ are non-increasing, the index of regular variation of $F$ at $-\infty$ or of $1 - F$ at $\infty$ must be negative or zero. We will exclude the case of a zero index. In order to define regular variation of a random variable, we will also assume that the behavior of $F$ and $1 - F$ is comparable.

Definition 1.3.1 A random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is said to be regularly varying with tail index $\alpha > 0$ if the survival function of $|X|$ is regularly varying at infinity with index $-\alpha$ and the tail balance condition holds, i.e. there exists $p_X \in [0, 1]$ such that
\[
\lim_{x\to\infty} \frac{\mathbb{P}(X > x)}{\mathbb{P}(|X| > x)} = p_X . \tag{1.3.1}
\]
The parameter $p_X$ is called the extremal skewness of (the distribution of) $X$.

If $p_X = 1$, then the left tail is said to be lighter than the right tail. This is the case, in particular, when $X$ is non-negative. If $p_X = 0$, then the right tail is lighter than the left tail. The tail balance condition excludes oscillating or other pathological behavior of the tail. See Problems 1.1 and 1.2. If $p_X > 0$, the survival function $\overline{F}$ of $X$ can be written as $\overline{F}(x) = x^{-\alpha} \ell(x)$, where $\ell$ is a function which is slowly varying at infinity.

Let $X$ be a random variable with distribution function $F$. The quantile function of $F$ (or of $X$) is its left-continuous inverse $F^{\leftarrow}$, defined on $[0, 1]$ by $F^{\leftarrow}(t) = \inf\{x : F(x) \ge t\}$. By Proposition 1.1.8, $\overline{F}$ is regularly varying at infinity with index $-\alpha < 0$ if and only if the function $Q$ defined on $[1, \infty)$ by
\[
Q(t) = F^{\leftarrow}(1 - 1/t) \tag{1.3.2}
\]
is regularly varying at infinity with index $1/\alpha$. The function $Q$ will be called the upper quantile function of $X$, since by definition, for $t > 1$, $Q(t)$ is the quantile of order $1 - 1/t$ of $X$. Because of the tail balance condition, we can also consider the upper quantile function of $|X|$, which will be denoted by $Q_{|X|}$. For $n \ge 1$, define the sequence $a_{X,n}$ by
\[
a_{X,n} = Q_{|X|}(n) . \tag{1.3.3}
\]

Then it holds that
\[
\lim_{n\to\infty} n \mathbb{P}(|X| > a_{X,n}) = 1 , \tag{1.3.4a}
\]
\[
\lim_{n\to\infty} n \mathbb{P}(X > a_{X,n}) = p_X . \tag{1.3.4b}
\]
Furthermore, for all $x > 0$,
\[
\lim_{n\to\infty} n \mathbb{P}(X > a_{X,n}\, x) = p_X\, x^{-\alpha} , \tag{1.3.5a}
\]
\[
\lim_{n\to\infty} n \mathbb{P}(X < -a_{X,n}\, x) = (1 - p_X)\, x^{-\alpha} . \tag{1.3.5b}
\]
These relations lead us to define on $\mathbb{R}^* = \mathbb{R} \setminus \{0\}$ the measures
\[
\nu_n = n \mathbb{P}(a_{X,n}^{-1} X \in \cdot) \quad \text{and} \quad \nu_{\alpha,p}(dy) = \big( p_X \mathbb{1}\{y > 0\} + (1 - p_X) \mathbb{1}\{y < 0\} \big)\, \alpha |y|^{-\alpha-1}\, dy . \tag{1.3.6}
\]

These measures are finite on sets separated from zero, that is, on sets whose complement contains an open interval containing zero. By standard measure-theoretic arguments, the relations (1.3.5) imply that for all measurable sets $A$ such that $\nu_{\alpha,p}(\partial A) = 0$,
\[
\lim_{n\to\infty} \nu_n(A) = \nu_{\alpha,p}(A) . \tag{1.3.7}
\]
The value of $\nu_{\alpha,p}(A)$ is infinite if the set $A$ is not separated from zero. We will rephrase this convergence in the language of vague# convergence in a multivariate setting in Chapter 2. We have $p_X = \nu_{\alpha,p}((1, \infty))$ and if $p_X = 1$, we simply write $\nu_{\alpha}$ for $\nu_{\alpha,p}$; that is, the measure $\nu_{\alpha}$ is defined on $(0, \infty)$ by
\[
\nu_{\alpha}(dy) = \alpha y^{-\alpha-1}\, dy . \tag{1.3.8}
\]
The next lemma justifies an intuitively obvious result: the sum of a random variable with tail index $\alpha$ and one with a lighter tail still has tail index $\alpha$, regardless of the dependence between the two random variables.

Lemma 1.3.2 Let $X$ be a random variable, regularly varying at $\infty$ with tail index $\alpha > 0$ and extremal skewness $p_X$. Let $Y$ be a random variable such that
\[
\lim_{x\to\infty} \frac{\mathbb{P}(|Y| > x)}{\mathbb{P}(|X| > x)} = 0 .
\]
Then
\[
\lim_{x\to\infty} \frac{\mathbb{P}(X + Y > x)}{\mathbb{P}(|X| > x)} = 1 - \lim_{x\to\infty} \frac{\mathbb{P}(X + Y < -x)}{\mathbb{P}(|X| > x)} = p_X .
\]

Proof. Since X is regularly varying, the assumption implies that for every  > 0, P(|Y | > x) P(|Y | > x) P(|X| > x) lim = lim lim x→∞ P(|X| > x) x→∞ P(|X| > x) x→∞ P(|X| > x) = 0 × −α = 0 . Fix  > 0. Then, lim sup x→∞

lim inf x→∞

P(X + Y > x) P(X > (1 − )x) P(|Y | > x) ≤ lim sup + lim sup P(|X| > x) P(|X| > x) x→∞ x→∞ P(|X| > x) −α −α = pX (1 − ) x , P(X + Y > x) P(X > (1 + )x) P(|Y | > x) ≥ lim inf − lim inf x→∞ x→∞ P(|X| > x) P(|X| > x) P(|X| > x) = pX (1 + )−α x−α .

Since  is arbitrary, this proves that the right tail of X + Y is equivalent to the right tail of X. The left tail is obtained similarly. 

14

1 Regularly varying random variables

1.4 Bounds and moments We now state several versions of Potter’s bounds for regularly varying random variables. Proposition 1.4.1 Let X be a non-negative regularly varying random variable with tail index α > 0. For each  ∈ (0, α), there exist finite positive constants such that P(X > x) ≤ cst x−α+ , x > 0 , −α−

P(X > x) ≥ cst x

, x≥.

(1.4.1a) (1.4.1b)

Proof. Set (x) = xα P(X > x). Then  is slowly varying at infinity and locally bounded, and by the Uniform Convergence Theorem 1.1.2, for every  > 0, sup x− (x) < ∞ . x≥1

For  ∈ (0, α) and x ∈ [0, 1], x− (x) = xα− P(X > x) ≤ 1. Thus the function x → x− (x) is uniformly bounded on (0, ∞) and this yields, for all x > 0, P(X > x) = x−α+ x− (x) ≤ x−α+ sup z − (z) = cst x−α+ . z>0

This proves (1.4.1a). For the lower bound, by the Uniform Convergence Theorem 1.1.2, it also holds that for any  > 0, lim x (x) = ∞ .

x→∞

Thus inf x≥ x (x) > 0 and thus (1.4.1b) holds.



We will also use a variant of Potter’s bounds. Proposition 1.4.2 Let X be a non-negative regularly varying random variable with tail index α > 0. For each  > 0, there exists a finite positive constant such that for all x ≥ 1 and y ≥ 0, P(yX > x) ≤ cst (y α− ∨ y α+ ) P(X > x) = cst y α− 1{y < 1} + cst y α+ 1{y ≥ 1} .

(1.4.2) (1.4.3)

1.4 Bounds and moments

15

Proof. Write P(X > x) = x−α (x) with  slowly varying at infinity. If y ≤ 1, for  ∈ (0, α), we have (x/y) (x/y)y  (tx)(tx)− P(yX > x) = yα = y α− ≤ y α− sup . P(X > x) (x) (x) (x)x− t≥1 By the Uniform Convergence Theorem 1.1.2, lim sup

x→∞ t≥1

(tx)(tx)− =1. (x)x−

Furthermore, supz≥1 (z)z − < ∞ and inf x≤x0 (x)x− > 0 for every x0 > 0 thus sup sup x≥1 t≥1

(tx)(tx)− x) (x/y) (tz)(tz) = yα ≤ y α+ sup sup ≤ cst y α+ .  P(X > x) (x) z≥1 0 0 and there exists  > 0 such that E[Y α+ ] < ∞. Then XY is regularly varying with index α and lim

x→∞

P(XY > x) = E[Y α ] . P(X > x)

(1.4.4)

Proof. For x ≥ 1, define the function Gx on (0, ∞) by Gx (y) = P(yX > x)/P(X > x) , y > 0 . By regular variation, limx→∞ Gx (y) = y α and applying Proposition 1.4.2, for any  > 0 and y > 0, Gx (y) ≤ cst(1 ∨ y)α+ . By assumption, E[Y α+ ] < ∞, thus we can apply the dominated convergence theorem, and we obtain that lim

x→∞

which is our result.

P(XY > x) = E[Gx (Y )] = E[Y α ] , P(X > x) 

16

1 Regularly varying random variables

Remark 1.4.4 We have proved slightly more than Breiman’s Lemma, namely that Gx (y) converges to y p in Lp (FY ) for any p < 1 + /α, where FY is the distribution of Y . We also see that the minimal condition is that the collection of functions {Gx , x ≥ 1} must be uniformly integrable with respect to the distribution of Y . ⊕ Remark 1.4.5 If X and Y have the same distribution, then Breiman’s Lemma does not apply. If E[X α ] = ∞, then XY is regularly varying with the same index as X, but has a heavier tail than X. See Problems 1.18 and 1.19. ⊕ Proposition 1.4.6 (Truncated Moments) Let X be a random variable with distribution function F . If F is regularly varying at infinity with index −α, α ≥ 0, then E[X β ] < ∞ if β < α , E[X β ] = ∞ if β > α , and for all t > 0, E[X β 1{X ≤ tx}] α β−α t = , β>α, β x→∞ β−α x F (x) E[X β 1{X > tx}] α β−α t lim = , α>β. x→∞ α−β xβ F (x) lim

(1.4.5a) (1.4.5b)

Moreover, E[X α ] may be finite or infinite, the function x E[X α 1{X ≤ x}] is slowly varying at infinity and E[X α 1{X ≤ x}] =∞. x→∞ xα F (x) lim

→

(1.4.6)

The limits (1.4.5a) and (1.4.5b) entail the following bounds. There exists a constant Cβ such that, for all x ≥ 0, E[X β 1{X ≤ x}] ≤ Cβ xβ F (x) , α < β β

β

E[X 1{X > x}] ≤ Cβ x F (x) , α > β .

(1.4.7a) (1.4.7b)

We prove (1.4.5a) and (1.4.6). The limit (1.4.5b) is obtained in a similar way to (1.4.5a). Proof (Proof of (1.4.5a)). We may assume x ≥ 1. By Fubini’s Theorem,  t E[X β 1{X ≤ tx}] P(tx ≥ X > sx) =β ds . sβ−1 β x P(X > x) P(X > x) 0

1.5 Problems

17

For  > 0, the Uniform Convergence Theorem 1.1.2 yields  t  t P(tx ≥ X > sx) ds = β sβ−1 sβ−1 (s−α − t−α )ds . lim β x→∞ P(X > x)   When  → 0, the latter quantity converges to the required limit. By Potter’s bound (Proposition 1.4.2), for η > 0 such that α + η < β, there exists a constant Cη such that   E[X β 1{X ≤ x}] P(X > sx) = β ds sβ−1 xβ P(X > x) P(X > x) 0   ≤ Cη sβ−1−α−η ds = O(β−α−η ) . 0

This proves that lim lim sup

→0 x→∞

E[X β 1{X ≤ x}] =0. xβ P(X > x)

This concludes the proof.

(1.4.8) 

Proof (Proof of (1.4.6)). Recall that F (x) = x−α (x), where  is slowly varying at ∞. By the Uniform Convergence Theorem 1.1.2 we have, for  ∈ (0, 1],  1 E[X α 1{X ≤ x}] (ux) lim inf = α lim inf du − 1 α x→∞ x→∞ x P(X > x) 0 u(x)  1 (ux) ≥ α lim inf du − 1 x→∞  u(x)  1 du = − 1 = − log() − 1 . u  Since  is arbitrary, this yields (1.4.6).



1.5 Problems 1.1 Let X be a random variable such that for all n ≥ 0, P(X = 2n ) = 2−n−1 . Prove that E[X s ] < ∞ for 0 < s < 1 but E[X] = ∞. Prove that X does not have a regularly varying right tail. Hint: Prove that xP(X > x) does not converge nor tends to ∞. 1.2 For γ, δ ≥ 0 such that γ + δ < 1 and c > 0, prove that the function γ,δ,c defined on [1, ∞) by γ,δ,c (x) = exp{c logγ (x) cos(logδ x)} is slowly varying at infinity. Deduce that if X is a random variable such that for large x, P(X > x) = x−α γ1 ,δ,c (x) and P(X < −x) = x−α γ2 ,δ,c (x) with γ2 = γ1 and c, δ = 0, then X has regularly varying left and right tails but does not satisfy the tail balance condition (1.3.1).

18

1 Regularly varying random variables

1.3 Let X be a non-negative regularly varying random variable with tail index 1. Let an be the 1 − 1/n quantile of X. Prove that the sequence na−1 n is slowly varying. Let L be the function defined on R+ by L(x) = E[X1{X ≤ x}]. Prove that for every t > 0, the sequence L(an t) is slowly varying. 1.4 Let f : R+ → R be a measurable, locally bounded, regularly varying ∞ function at infinity with index −p, 1/2 < p < 1. Prove that 0 |f (x)f (x + y)|dx < ∞ for all y > 0 and ∞  ∞ f (x)f (x + y)dx 0 = lim x−p (1 + x)−p dx . y→∞ yf 2 (y) 0 Let {jj , j ∈ Z}[N] be a sequence of integerssuch that aj = j −p (j) with  ∞ slowly varying and p as before. Prove that i=0 |ai ai+j | < ∞ for all j ≥ 1 and  ∞ ∞ 2 −1 lim (nan ) aj aj+n = x−p (1 + x)−p dx . n→∞

j=0

0

1.5 Let X be a non-negative regularly varying random variable with tail index α > 0. Let g be a non-negative locally bounded function on [0, ∞), regularly varying at infinity with index γ > 0. Prove that g(X) is regularly varying with tail index α/γ. 1.6 Let X be a Pareto random variable with index α, independent of a positive random variable Y . Prove that P(XY > z) = E [(Y /z)α ∧ 1] .

(1.5.1)

1.7 Let X, Y be two i.i.d. regularly varying non-negative random variables with tail index α > 0. Prove that X ∧ Y is regularly varying with tail index 2α. 1.8 Let Z be a non-negative regularly varying random variable with tail index α > 0. Prove that for all  > 0, there exists a constant such that for all x ≥ 1 and y > 0, E[Z β 1{yZ > x}] ≤ cst y α−β (y  ∨ y − ) , β < α , xβ P(Z > x) E[Z β 1{yZ ≤ x}] ≤ cst y α−β (y  ∨ y − ) , β > α . xβ P(Z > x) Prove that if A is a non-negative random variable, independent of Z, such that E[Aα+ ] for some  > 0, then for all x ≥ 1, E[(AZ)β 1{AZ > x}] ≤ cst E[Aα (A− ∨ A )] , β < α , xβ P(Z > x) E[(AZ)β 1{AZ ≤ x}] ≤ cst E[Aα (A− ∨ A )] , β > α . xβ P(Z > x)

1.5 Problems

19

Hint: use conditioning, integration by parts and Potter’s bound. Note that the second bound is actually valid for all x ≥ 0 since the left-hand side is bounded for x ∈ [0, 1]. 1.9 Let X be a non-negative regularly varying random variable with tail index α > 0. Let Y be a random variable, independent of X, and there exists  > 0 such that E[Y α+ ] < ∞. Evaluate E[(XY )β 1{XY ≤ x}] , β>α. x→∞ xβ P(X > x) lim

1.10 Let Z be a continuous regularly varying random variable with tail index α > 1, extremal skewness p ∈ (0, 1) and E[Z] ≤ 0. Prove that there exist a function h : R+ → R+ and a constant Ch > 0 such that E[Z1{−h(x) ≤ Z ≤ x}] = E[Z] , x ≥ 0 , Ch−1 x

≤ h(x) ≤ Ch x , x ≥ Ch .

(1.5.2a) (1.5.2b)

Hint: define G+ (x) = E[Z1{Z > x}], G− (x) = −E[Z1{Z < −x}] and h = G← − ◦ G+ , x ≥ 0 and apply Proposition 1.1.8. Note: if E[Z] ≥ 0, the same type of result holds with truncation by a function h on the right. 1.11 Let X be a non-negative regularly varying random variable with tail index α > 0. Prove that for every p > 0,  ∞ E[logp+ (X/x)] =α logp (t)t−α−1 dt . lim x→∞ F (x) 1 Compute the limiting integral for p > 0. 1.12 Let F be the distribution function of a non-negative random variable X such that E[X] < ∞ and let H be the distribution function defined by  u 1 F (s) ds . H(u) = E[X] 0 If F is regularly varying at infinity with index α > 1 prove that H is regularly with index α − 1 and as u → ∞, ¯ H(u) ∼

uF¯ (u) . E[X](α − 1)

1.13 Let Z be a regularly varying random variable with tail index 1 and extremal skewness pZ . Let an be the 1 − 1/n quantile of |Z|. Prove that for all x, y > 0, lim

n→∞

n (E[Z0 1{|Z0 | ≤ an y}] − E[Z0 1{Z0 | ≤ an x}]) = (2pZ − 1) log(y/x) . an

20

1 Regularly varying random variables

1.14 Let X1 , . . . , Xn be i.i.d. regularly varying random variables with tail index α > 1. Prove that there exists a constant which does not depend on n such that for all q > α and x > 0,

n (Xi − E[Xi ]) > x P i=1

⎧ ⎪ ⎨cst n P(|X1 | > x) ≤ cst n x−2 L(x) ⎪ ⎩ cst {nq/2 x−q + nP(|X1 | > x)}

if α < 2 , if α = 2 with L slowly varying , if α > 2 . (1.5.3)

Hint: Use truncation at level x, apply Markov inequality, Rosenthal inequality (E.1.4), and Proposition 1.4.6. 1.15 Assume that F is regularly varying and let F ∗n be its n-th convolution, prove that for every  > 0, there exists a constant such that F ∗n (x) ≤ cst (1 + )n F¯ (x) for all n ≥ 1. 1.16 Let f be a non-negative measurable locally bounded function on [1, ∞) and define F by  x f (t) dt . F (x) = 1

Prove that if f is ultimately monotone and if F is regularly varying at infinity with index α > 0, then lim

x→∞

xf (x) =α. F (x)

(1.5.4)

If moreover α = 0, prove that f is regularly varying at infinity with index α − 1. 1.17 Let X and Y be independent random variables with Pareto distribution with tail index α > 0 and β > α, respectively. Let F be the distribution function of XY . Prove that 1 − F (x) =

βx−α − αx−β , x≥1. β−α

1.18 Let X and Y be independent random variables with the same Pareto distribution with tail index α > 0. Let F be the distribution function of XY . Prove that 1 − F (x) = αx−α log x + x−α , x ≥ 1 . 1.19 Let X and Y be i.i.d. non-negative regularly varying random variables with tail index α > 0 such that E[Y α ] = ∞. Prove that XY is heavier tailed than X. Hint: apply Fatou’s lemma.

1.6 Bibliographical notes

21

For a function f with bounded variation on R+ , let Lf be the Laplace-Stieltjes transform of f , defined by  ∞ Lf (t) = e−st df (s) . 0

1.20 Let f be a right-continuous non-decreasing function on R+ such that LF (s) < ∞ for s > 0. For α ≥ 0, prove that f is regularly varying at ∞ with index α if and only if Lf is regularly varying at 0 with index −α and then lim

x→∞

Lf (1/x) = Γ (1 + α) . f (x)

(1.5.5)

Hint: Use integration by parts, regular variation, and Potter’s bounds for the direct implication. For the converse, use the continuity theorem A.2.11. 1.21 Let α ∈ (0, 1) and let F be an increasing function on [0, ∞) such that F (0) = 0, limx→∞ F (x) = 1. Prove that 1 − F is regularly varying at ∞ with index −α if and onlyif 1 − LF is regularly varying at 0 with index α. x Hint: Define U (x) = 0 F (t) dt and use Problems 1.16 and 1.20.

1.6 Bibliographical notes The results presented in this chapter are mainly taken from the classical monograph [BGT89]. Most results can also be found in [Res87]. The Uniform Convergence Theorem 1.1.2 is adapted from [BGT89, Theorems 1.2.1, 1.4.1 and 1.5.2]. The Representation Theorem 1.1.5 is [BGT89, Theorem 1.3.1]. Karamata Theorem 1.1.7 is [BGT89, Theorem 1.5.11]. Breiman’s lemma was originally proved in [Bre65] for a Pareto random variable X.

2 Regularly varying random vectors

In this chapter, we define regular variation of finite-dimensional random vectors. To do so, we use the concept of vague# convergence on Rd \ {0} in Section 2.1. A regularly varying random vector is not only characterized by its tail index, but also by a boundedly finite measure on Rd \ {0}, called the exponent measure, which describes the extremal behavior of the vector in given sectors of Rd \ {0}. Using this mode of convergence, we easily obtain conditions for the convergence of expectations and transfer of regular variation under certain mappings. This yields many closure properties for functions of regularly varying vectors, such as products, maxima, and minima. An important feature of multivariate regularly random vectors is the dichotomy between extremal dependence and extremal independence. In the latter case, the exponent measure is useless to describe the extremes of the vector away from the axes. Specific techniques are needed and they will be the object of the next chapter. In Section 2.2, we introduce the polar decomposition of the exponent measure given a certain norm. This will be an essential tool in the study of time series. We give examples of regularly varying vectors and compute their exponent and spectral measures. In Section 2.3, we consider the problem of obtaining limiting conditional distribution of the random vector given an extreme event. When such limits exist, they can be used to define miscellaneous extremal dependence measures which will also be of interest in the context of time series. Notation We introduce here notation that will be used throughout this and later chapters. Regardless of the dimension, we denote by 0 the null vector in Rd or the null sequence in (Rd )Z . We also write ∞ = (∞, . . . , ∞). Vector order is taken

© Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 2

23

24

2 Regularly varying random vectors

componentwise, i.e. for u, v ∈ [−∞, ∞] we write u ≤ v if and only if ui ≤ vi , 1 ≤ i ≤ d. Multivariate intervals are defined as follows: for u, v ∈ [−∞, ∞] such that u ≤ v, [u, v] = {x ∈ [−∞, ∞] | ui ≤ xi ≤ vi , 1 ≤ i ≤ d} . Open and semi-open intervals are defined similarly. For u ∈ RZ , we recall the notation   ui = max ui , ui = min ui . i∈Z

∈Z

i∈Z

i∈Z

n n For d-dimensional vectors x(1) , . . . , x(n) , j=1 x(j) and j=1 x(j) denote componentwise maxima and minima, respectively, that is, n  j=1 n 

ä Ä (j) (j) , x(j) = ∨nj=1 x1 , . . . , ∨nj=1 xd ä Ä (j) (j) . x(j) = ∧nj=1 x1 , . . . , ∨nj=1 xd

j=1

For a set A ⊂ Rd and y > 0, we define the dilated set yA by yA = {ya : a ∈ A} . Given an arbitrary norm on Rd denoted by |·|, we define the open and closed balls and sphere with center x ∈ Rd and radius r > 0 by B(x, r) = {y ∈ Rd , |x − y| < r} , B(x, r) = {y ∈ Rd , |x − y| ≤ r} , S(x, r) = {y ∈ Rd , |x − y| = r} , Sd−1 = S(0, 1) . The operator norm of a matrix A ∈ Rq×d is defined by A =

sup x∈S(0,1)

|Ax| .

For a set A ⊂ Rd , y ∈ Rd and  > 0, we define d(y, A) = inf{|x − y| : x ∈ A} and the -enlargement of A as A = {y ∈ Rd : d(y, A) < }. Non-decreasing sequences which tend to infinity will play an important role in the sequel so we have given them a name: A scaling sequence is a non-negative non-decreasing sequence tending to infinity.

2.1 Regularly varying random vectors

25

2.1 Regularly varying random vectors 2.1.1 Vague# convergence and regular variation on Rd \ {0} Regular variation of d-dimensional random vectors will be defined in terms of vague# convergence of measures on Rd \{0}. We only recall here its definition. See Appendices B.1 and B.2 for more details. Let Rd be endowed with its usual topology (defined by any arbitrary norm |·|), which makes it a Polish space, and the related Borel σ-field. Definition 2.1.1 • A set A is said to be separated from 0 if there exists an open set U such that 0 ∈ U and A ⊂ U c . • The collection of all sets separated from 0 is called a boundedness and denoted B0 . • A measure ν on Rd \{0} is said to be B0 -boundedly finite if ν(A) < ∞ for all Borel measurable sets A separated from 0. • A sequence of B0 -boundedly finite measures {νt , t ∈ T } (with T = N or T = R+ ) on Rd \ {0} converges B0 -vaguely# to a measure ν on v#

Rd \ {0}, written νt −→ ν, if limt→∞ νt (A) = ν(A) for all Borel sets A which are separated from 0 and such that ν(∂A) = 0.

We can now define the regular variation of a random vector. Definition 2.1.2 A d-dimensional random vector X is regularly varying if there exists a non-zero B0 -boundedly finite measure ν X on Rd \ {0}, called an exponent measure of X, and a scaling sequence {cn } such that # on Rd \ the sequence of measures nP(c−1 n X ∈ ·) converges B0 -vaguely {0} to the measure ν X .

The following result is a particular case of Theorem B.2.2. The proof is in Appendix B.2.1. Theorem 2.1.3 The following statements are equivalent: (i) The vector X is regularly varying in the sense of Definition 2.1.2.

26

2 Regularly varying random vectors

(ii) There exist α > 0, a function g : (0, ∞) → (0, ∞) which is regularly varying at infinity with index α and a B0 -boundedly finite non-zero measure ν X such that g(t)P(t−1 X ∈ ·) converges B0 -vaguely# to ν X as t → ∞. If (i) and (ii) hold, the sequence {cn } is regularly varying at infinity with index 1/α and the measure ν X is (−α)-homogeneous, i.e. for y > 0 and A separated from 0, ν X (yA) = y −α ν X (A) .

(2.1.1)

The function g in (ii) can be chosen as 1/P(X ∈ tA) for any set A such that ν X (A) > 0. If (ν X , cn ) and (ν X , cn ) are as in Definition 2.1.2, then there exists a constant ς such that ν X = ςν X ,

cn = ς 1/α . n→∞ cn lim

The positive number α will be called the tail index of X. Exponent measures are not unique but two exponent measures must be proportional to each other. Since ν X is not identically zero, it must hold that ν X (B c (0, r)) > 0, where B c (0, r) is the complement of the open ball centered at zero with radius r. Moreover, the homogeneity property implies that ν X (S(0, r)) = 0 for all r > 0, where S(0, r) = ∂B(0, r) is the sphere centered at zero with radius r. This entails that |X| is regularly varying with tail index α and νX P(x−1 X ∈ ·) v# , −→ c P(|X| > x) ν X (B (0, 1)) where again vague convergence holds on Rd \{0}. If the sequence cn in (2.1.2) is chosen as cn = Q|X | (n), then c

ν X (B (0, 1)) = 1 .

(2.1.2)

The vague convergence and the homogeneity property imply also that for each j = 1, . . . , d and y > 0 we have lim

x→∞

P(|Xj | > xy) = ςj y −α , P(|X| > x)

2.1 Regularly varying random vectors

27

with ςj =

ν X ({x : |xj | > 1}) . ν X ({x : |x| > 1})

It may happen that ςj = 0 for some, but not all, indices j. This means that some of the components may have a lighter tail than the other, but there must exist at least one component with tail index α. For instance, in case d = 2, if X1 is a Pareto random variable and X2 is a standard exponential distribution (not necessarily independent of X1 ), then (X1 , X2 ) is regularly varying. The exponent measure is concentrated on the x axis and ς2 = 0. More generally, a subvector of a regularly varying vector need not be regularly varying, unless the exponent measure puts mass on the subspace where this subvector lies. This will be made precise in Section 2.1.4. However, if all the random variables Xj , j = 1, . . . , d, have the same distribution, then necessarily ς1 = · · · = ςd > 0 and all the subvectors of X are regularly varying with the same tail index. This will be the case when we will consider finite-dimensional distributions of stationary univariate time series. We proceed with two elementary examples of regularly varying random vectors and give their exponent measures. Further examples will be given in Section 2.2.1. Recall that δx denotes the Dirac measure at point x, that is, for a Borel set B, δx (B) = 1 if x ∈ B and δx (B) = 0 if x ∈ B. Example 2.1.4 (Independence) Assume that Z is d-dimensional random vector with i.i.d. regularly varying components satisfying the tail balance condition (1.3.1). Then, the exponent measure ν Z is concentrated on the axes and choosing cn = Q|Z1 | (n) yields ν Z (du1 , . . . , dud ) =

d 

δ0 (du1 ) × · · · × δ0 (dui−1 ) × να,p (dui ) × δ0 (dui+1 ) × · · · × δ0 (dud ) ,

i=1

(2.1.3) where να,p is as in (1.3.6). In particular, for y > 0 and j = 1, . . . , d, ν Z ({x : |xj | > y}) = y −α . This means that only one of the d components Z1 , . . . , Zd can be extremely large at a time. This property will be often used and is referred to as the “single big jump heuristics”.  Example 2.1.5 (Total dependence) Assume that X = (X1 , . . . , X1 ), where X1 is regularly varying with index α. Let ui < 0 and vi > 0, i = 1, . . . , d. Then, again with the choice cn = Q|X1 | (n), ν X ([u, v]c ) = lim nP(X1 < cn ui or X1 > cn vi for some i = 1, . . . , d) n→∞

= (1 − pX )(∧di=1 (−ui ))−α + pX (∧di=1 vi )−α .

28

2 Regularly varying random vectors

Thus, we conclude that the exponent measure is concentrated on the line u1 = · · · = ud . For future reference we also note that ν X ((v, ∞)) = lim nP(X1 > cn vi n→∞

for all i = 1, . . . , d) = pX (∨di=1 vi )−α . (2.1.4) 

We conclude this section by recalling the link between the exponent measure and the convergence of the renormalized maxima. For sequences of i.i.d. random vectors with non-negative components, this convergence is equivalent to multivariate regular variation. This explains the terminology “exponent measure”. Theorem 2.1.6 Let X (j) , j ≥ 1, be independent copies of the regularly varying vector X with non-negative components and exponent measure ν X associated to the scaling sequence {cn }. Then for u ∈ Rd+ \ {0},  n   c (j) X ≤ cn u = e−ν X ([0,u ] ) . (2.1.5) lim P n→∞

j=1

Proof. If X (j) are i.i.d. then  n   (j) P X ≤ cn u = {1 − n−1 nP (X ∈ cn [0, u]c )}n . j=1

The set [0, u]c is separated from 0 in Rd+ \ {0}, therefore lim nP (X ∈ cn [0, u]c ) = ν X ([0, u]c ) .

n→∞



This yields (2.1.5).

Because of the homogeneity property of the exponent measure, the limiting distribution is max-stable, that is, if we let {Y j , j ≥ 1} denote a sequence of i.i.d. random vectors with distribution given by the right-hand side of (2.1.5), then d

max Y j = Y 1 . c−1 n 1≤j≤n

Max-stable distributions and processes will be further considered in Chapter 13.

2.1 Regularly varying random vectors

29

2.1.2 Extremal dependence and independence We start with a discussion of the two-dimensional case. Let (X1 , X2 ) be a regularly varying vector with non-negative components and exponent measure ν X . Then, lim nP(X1 > cn x, X2 > cn x) = ν X ((x, ∞) × (y, ∞)) . n→∞

The assumption that ν X is not zero does not prevent this limit to be nonzero. This will be the case for instance if X1 and X2 are independent. More generally, this will be the case if the exponent measure is concentrated on the axes, that is, on the lines {(x, 0), x > 0} and {(0, y), y > 0}. This means that X1 and X2 cannot be extremely large simultaneously and they are said to be extremally independent. This situation can be generalized to higher dimensions. Definition 2.1.7 Let X 1 , X 2 be two regularly varying and tail equivalent random vectors (not necessarily with the same dimension), i.e. such that 0 < lim inf x→∞

P(|X 1 | > x) P(|X 1 | > x) ≤ lim sup x) x→∞ P(|X 2 | > x)

They are said to be extremally independent if for all s, t > 0, lim

x→∞

P(|X 1 | > sx, |X 2 | > tx) =0. P(|X 1 | > x)

If in addition both vectors are regularly varying and tail equivalent (which implies that they have the same tail index), extremal independence implies joint regular variation. Let μ ⊗ ν denote the product of two measures μ and ν (see Theorem A.1.10). Proposition 2.1.8 Let X 1 , . . . , X h be tail equivalent regularly varying random vectors with respective exponent measures ν X i , i = 1, . . . , h. They are pairwise extremally independent if and only if the vector (X 1 , . . . , X h ) is regularly varying with exponent measure ν X proportional to h  ςi δ 0 ⊗ · · · ⊗ ν X i ⊗ · · · ⊗ δ 0 , (2.1.6) i=1

where ςi , 1 ≤ i ≤ h, are constants such that ςi /ς1 = ν X i (B c (0, 1))/ν X 1 (B c (0, 1)) .

30

2 Regularly varying random vectors

Proof. The only if part is trivial. The proof of the direct part is by induction. We prove the result for h = 2 and the induction step is trivial. Let d1 and d2 be the dimensions of X 1 and X 2 . We must prove that the sequence of measures {ν x } defined by νx =

P(x−1 (X 1 , X 2 ) ∈ ·) P(|X 1 | > x)

is vaguely# relatively compact by applying Lemma B.1.29 and that the measure ν X defined by (2.1.6) with ci = ν X i (B c (0, 1))/ν X 1 (B c (0, 1)) is the only possible limit along a subsequence by applying Lemma B.1.31. Let f be uniformly continuous with support in A1 × Rd2 with A1 separated from zero. Fix  > 0. Then there exists η such that |x2 | ≤ η implies |f (x1 , x2 )−f (x1 , 0)| ≤ . By extremal independence and since A1 is separated from zero, we have E[f (x−1 (X 1 , X 2 ))1{|X 2 | > ηx}] =0. x→∞ P(|X 1 | > x)

lim ν x (f ) = lim

x→∞

Thus E[|f (x−1 (X 1 , X 2 )) − f (x−1 (X 1 , 0))|] P(|X 1 | > x) x→∞ E[|f (x−1 (X 1 , X 2 )) − f (x−1 (X 1 , 0))1{|X 2 | ≤ ηx}] ≤ lim sup P(|X 1 | > x) x→∞ E[|f (x−1 (X 1 , X 2 )) − f (x−1 (X 1 , 0))1{|X 2 | > ηx}] ≤. + lim sup P(|X 1 | > x) x→∞

lim sup

Since  is arbitrary and δ0 ⊗ ν X 2 (f ) = 0, this proves that E[f (x−1 (X 1 , 0)) = ν X 1 ⊗ δ0 (f ) = ν X (f ) . x→∞ P(|X 1 | > x)

lim ν x (f ) = lim

x→∞

If now f (x2 , x2 ) = g(x2 ) with g a continuous function with support separated from zero, then ν X 1 ⊗ δ0 (f ) = 0 and E[f (x−1 (X 1 , X 2 )) E[g(x−1 X 2 )) = lim x→∞ x→∞ P(|X 1 | > x) P(|X 1 | > x) c ν X 2 (B (0, 1)) ν X 2 (g) = c2 δ0 ⊗ ν X 2 (f ) = ν X (f ) . = ν X 1 (B c (0, 1))

lim ν x (f ) = lim

x→∞

This proves that ν X is the only possible limit for the sequence {ν x } along any subsequence. We must now prove that the sequence {ν x } is relatively compact. Define Un = {(x1 , x2 ) : |x1 | + |x2 | > en }. The sets Un , n ∈ Z satisfy the assumptions (i) and (ii) of Lemma B.1.29 and ν x (Un ) =

P(|X 1 | > xen /2) P(|X 2 | > xen /2) P(|X 1 | + |X 2 | > xen ) ≤ + . P(|X 1 | > x) P(|X 1 | > x) P(|X 1 | > x)

Thus conditions (B.1.7a) and (B.1.7b) hold. This proves that the sequence v#

{ν x } is relatively compact and we conclude that ν x −→ ν X .



2.1 Regularly varying random vectors

31

2.1.3 Convergence of expectations Definition 2.1.2 implies that for all measurable sets A ⊂ Rd which are separated from 0 and such that ν X (∂A) = 0, we have −1 1A (y)ν X (dy) . lim nP(cn X ∈ A) = ν X (A) = n→∞

Rd \{0}

By the Portmanteau Theorem B.1.17 for vague# convergence, the indicator function can be replaced by a bounded continuous function with support separated from 0. The next result shows how to extend this convergence to possibly unbounded function, under an appropriate growth condition. Proposition 2.1.9 Let X be a regularly varying random vector in Rd with tail index α and exponent measure ν X associated to the scaling sequence {cn }. Let g be a ν X -almost everywhere continuous map from Rd to R. Assume that ó î lim lim sup nE |g(X/cn )|1{|X |≤cn } = 0 , (2.1.7a) →0 n→∞ ó î lim lim sup nE |g(X/cn )|1{|X |>M cn } = 0 . (2.1.7b) M →∞ n→∞

Then

lim nE[g(X/cn )] = ν X (g) =

n→∞

Rd

g(y)ν X (dy) .

(2.1.8)



Proof. This is a straightforward application of Theorem B.1.22.

We now give practical conditions on the function g that ensure the conditions of Proposition 2.1.9. Corollary 2.1.10 Let X be a regularly varying random vector in Rd with tail index α and exponent measure ν X associated to the scaling sequence {cn }. Let g be a ν X -almost everywhere continuous map from Rd to R. Assume that there exist  > 0 and cst > 0 such that |g(y)| ≤ cst(|y|

α−

where |·| is any norm on Rd . Then, lim nE[g(X/cn )] = ν X (g) =

n→∞

∧ |y|

α+

),

(2.1.9)

g(y)ν X (dy) .

(2.1.10)

Rd

32

2 Regularly varying random vectors

Proof. Let g be a continuous function on Rd . • Assume first that g is bounded and that its support is separated from 0 in Rd . Then (2.1.10) follows by vague# convergence. • Assume now that g satisfies the bound (2.1.9) in a neighborhood of zero, i.e. |g(y)| ≤ C|y|α+

(2.1.11)

for |y| ≤ 1. For η ∈ (0, 1), let hη be a continuous function such that hη (y) = 0 if |y| ≤ η, hη (y) ≤ 1 if η ≤ |y| ≤ 2η and hη (y) = 1 if |y| ≥ 2η. Then ghη is bounded, continuous and its support is separated from 0 in Rd . Condition (2.1.11) implies E[g(X/cn )hη (X/cn )] ≤ E[g(X/cn )] ≤ O(η α+ ) + E[g(X/cn )hη (X/cn )] . Thus, by vague# convergence, we obtain g(y)hη (y)ν X (dy) ≤ lim sup nE[g(X/cn )] n→∞ |y |≥η α+ ≤ O(η )+ g(y)hη (y)ν X (dy) . |y |≥η

Condition (2.1.11) and the homogeneity property of ν X imply that g(y)ν X (dy) ≤ C |y|α+ ν X (dy) = O(η  ) . |y |≤η

|y |≤η

Altogether, this yields (since η < 1)







sup nE[g(X/cn )]

= O(η  ) .

d g(y)hη (y)ν X (dy) − lim n→∞ R

Since η is arbitrary, this yields (2.1.10). • We must finally consider the case of an unbounded function. If |g(y)| ≤ C|y|α− for |y| ≥ 1, then, for M > 0 and  < , Potter’s bounds (Proposition 1.4.2) yield 

E[|X|α− 1{|X |>M } ] = O(M − ) . The homogeneity property of ν X yields |y |>M |y|α− ν X (dy) = O(M − ). Therefore, truncation arguments similar to those used to control the integrals around zero yield the desired result.

2.1 Regularly varying random vectors

33

 Remark 2.1.11 We have written the limits in Proposition 2.1.9 and Corollary 2.1.10 in terms of a sequence cn . They can also be expressed in terms of the continuous variable x. Specifically, conditions (2.1.7) and the limits (2.1.8) and (2.1.10) become ó î E |g(x−1 X)|1{|X |≤x} =0, lim lim sup →0 x→∞ P(|X| > x) ó î E |g(x−1 X)|1{|X |>M x} lim lim sup =0, M →∞ x→∞ P(|X| > x) E[g(x−1 X)] = ν X (g) . lim x→∞ P(|X| > x) In the sequel, we will use both ways of writing these limits according to context.

2.1.4 Transfer of regular variation We next give conditions for a function of a regularly varying vector X to be regularly varying. Proposition 2.1.12 Let X be a regularly varying random vector with tail index α, exponent measure ν X on Rd \ {0} associated to the scaling sequence {cn }. Let g be a continuous map from Rd to Rk such that g(tx) = tγ g(x) for some γ > 0. If the measure ν X ◦g −1 is not identically zero on Rk \ {0}, then g(X) is regularly varying with index −α/γ and exponent measure ν X ◦ g −1 associated to the scaling sequence {cγn }.

Equivalently, under the assumption of Proposition 2.1.12, we have P(x−1 g(X) ∈ ·) = ν X ◦ g −1 . γ x→∞ P(|X| > x) lim

(2.1.12)

# Proof. Let νn = nP(c−1 to n X ∈ ·). By assumption, νn converges vaguely d ν X in R \ {0}. The continuity and homogeneity of g imply that a set A is separated from zero if and only if g −1 (A) is separated from zero. Thus, Theorem B.1.21 implies that νn ◦ g −1 converges vaguely# to ν X ◦ g −1 . It suffices to identify the scaling sequence.

If ν X ◦ g −1 (∂B) = 0, then ∂(g −1 (B)) ⊂ g −1 (∂B) by Lemma A.1.5, hence we have,

34

2 Regularly varying random vectors −1 −1 lim nP(c−γ (B)) = ν X ◦ g −1 (B) . n g(X) ∈ B) = lim nP(cn X ∈ g

n→∞

n→∞

Therefore, cγn is a scaling sequence for νn ◦ g −1 .



We now extend the previous result to random homogeneous maps. Proposition 2.1.13 Let X be a regularly varying random vector with tail index α exponent measure ν X on Rd \ {0} associated to the scaling sequence {cn }. Let G be a random map from Rd to Rd , independent of the vector X, almost surely continuous and such that G(tx) = tγ G(x) for a fixed (deterministic) γ > 0, and ô ñ E sup |G(u)|α/γ+ < ∞ ,

(2.1.13)

|u |≤1

for an  > 0. If moreover the measure E[ν X ◦ G−1 ] is not identically zero on Rd \ {0}, then G(X) is regularly varying with tail index α/γ and exponent measure E[ν X ◦ G−1 ] associated to the scaling sequence {cγn }.

Proof. We complement the proof of Proposition 2.1.12 with a bounded convergence argument. Let A be a measurable set, separated from 0 in Rd . This implies that there exists η > 0 such that if y ∈ A, then |y| > η. Assume also that E[ν X ◦ G−1 (∂A)] = 0. Then, almost surely, it holds that î ó −γ nP(c−γ n G(X) ∈ A) = E nP(cn G(X) ∈ A | G) . By the previous theorem, we know that limn→∞ nP(c−γ n G(X) ∈ A | G) = ν X ◦ G−1 (A), almost surely. Define V = sup|u |≤1 |G(u)| ∨ 1. Then, by homogeneity of G and applying Proposition 1.4.2, we have −γ α/γ+ nP(c−γ . n G(X) ∈ A | G) ≤ nP(cn |X| V > η | G) ≤ CV γ

By Condition (2.1.13), this allows to apply the bounded convergence theorem and conclude the proof.  We now apply Proposition 2.1.13 to various situations. Random linear maps The following result is a multivariate extension of Breiman’s Lemma 1.4.3.

2.1 Regularly varying random vectors

35

Corollary 2.1.14 Let X be a regularly varying random vector with tail index α, exponent measure ν X on Rd \ {0} associated to the scaling sequence {cn }. Let A be a q × d random matrix, independent of X such that ó î (2.1.14) 0 < E Aα+ < ∞ . Then AX is regularly varying with index −α and exponent measure ν A X associated to the scaling sequence {cn } defined by ν A X = E[ν X ◦ A−1 ] ,

(2.1.15)

provided it is not the null measure.

Proof. Apply Proposition 2.1.13 to the map x → Ax which satisfies the assumptions of the theorem with γ = 1 and Condition (2.1.13) implied by (2.1.14).  Example 2.1.15 Let Z1 , . . . , Zd be positive, i.i.d. and regularly varying and let ν Z be the exponent measure as in Example 2.1.4, so that ν Z ((1, ∞) × R+ ) = 1. Applying (2.1.15) with A the identity matrix, we obtain Ä d ä P i=1 Zi > x =d. (2.1.16) lim x→∞ P(Z1 > x)  In Section 2.2.1, we will use Corollary 2.1.14 to obtain the joint regular variad tion of randomly weighted sums of the form j=1 Yj Xj and describe precisely the exponent measure in the case of independent Xj , j = 1, . . . , d. Only a partial converse of Corollary 2.1.14 is true, although it is difficult to find a counterexample. We state it here without proof. Let the usual scalar product of two vectors, u, v ∈ Rd be denoted by u, v. Theorem 2.1.16 Let X be an d-dimensional random vector. Assume that there exist α > 0, a slowly varying function and a function w defined on Rd such that, for all u ∈ Rd , lim

x→∞

P(u, X > x) = w(u) , x−α (x)

36

2 Regularly varying random vectors

and there exists u0 ∈ Rd such that w(u0 ) = 0. If α is not an integer or if there exits an open cone C of Rd such that w(u) = 0 for u ∈ C, then X is regularly varying with tail index α.

Subvectors Another application of Corollary 2.1.14 is that if X = (X1 , . . . , Xd ) is a regularly varying random vector in the sense of Definition 2.1.2, then any subvector of X is regularly varying, with the same scaling sequence, and the exponent measure of the subvector can be deduced from that of X. This is obtained by taking A as the d × d diagonal matrix whose diagonal entries are either ones or zeroes. Corollary 2.1.17 Let X = (X1 , . . . , Xd ) be a regularly varying random vector with exponent measure ν X associated to the scaling sequence {cn }. For j = 1, . . . , d−1, if ν X ((Rj \{0})×Rd−j ) > 0, then the vector X 1,j = (X1 , . . . , Xj ) is also regularly varying with the same scaling sequence cn and its exponent measure is given by ν X 1, j (A) = ν X (A × Rd−j ) ,

(2.1.17)

for all Borel sets A separated from 0 in Rj .

Weighted maxima Corollary 2.1.18 Let X = (X1 , . . . , Xd ) be a regularly varying random vector with tail index α, exponent measure ν X associated to the scaling sequence {cn }. Let A be a q × d random matrix, independent of X such that (2.1.14) holds. Then the vector ä Ä ∨di=1 A1,i Xi , . . . , ∨di=1 Aq,i Xi is regularly varying with tail index α.

Proof. Apply Proposition 2.1.13 to the map G defined by ä Ä G(x) = ∨di=1 A1,i xi , . . . , ∨di=1 Aq,i xi

2.1 Regularly varying random vectors

37

which satisfies its assumptions by the same arguments as in the proof of Corollary 2.1.14.  Example 2.1.19 Let Z1 , . . . , Zd be positive, i.i.d. and regularly varying and let ν Z be the exponent measure as in Example 2.1.4, so that ν Z ((1, ∞) × Rd−1 + ) = 1. Applying Corollary 2.1.18 we have lim

x→∞

P(max{Z1 , . . . , Zd } > x) =d. P(Z1 > x) 

If ν X ((−∞, u]c ) > 0 for some u ∈ Rd+ \ {0} (which is always true if the components of X are non-negative) and for a ∈ Rd+ \ {0}, we obtain that ∨di=1 a−1 i Xi is regularly varying: for y > 0,  d   Xi lim nP > cn y = y −α ν X ([−∞, a]c ) . (2.1.18) n→∞ a i=1 i Minima In the two-dimensional case, the map (x, y) → x ∧ y is homogeneous with degree 1 but does not necessarily satisfy the assumptions of Proposition 2.1.12. If the exponent measure ν X of a bivariate random vector X = (X1 , X2 ) with non-negative components is concentrated on the axes, then ν X ({(x, y) ∈ (0, ∞)2 | x ∧ y > }) = 0 for all  > 0 and we cannot apply Proposition 2.1.12, since the condition that the measure ν X ◦ g −1 is not identically zero on R2 \ {0} is not fulfilled. See Problem 1.7. In the opposite case where ν X ((0, ∞)2 ) > 0, then because of homogeneity, for any  > 0, ν X ({(x, y) ∈ (0, ∞)2 | x ∧ y > }) > 0 and Proposition 2.1.12 applies. Therefore we obtain the following result. Corollary 2.1.20 Let the bivariate random vector X = (X1 , X2 ) in R2+ be regularly varying with tail index α, scaling sequence {cn }, and exponent measure ν X such that ν X ((0, ∞)2 ) > 0. Then X1 ∧ X2 is regularly varying with index −α and for x > 0, lim nP(X1 ∧ X2 > cn x) = x−α ν X ((1, ∞)2 ) .

n→∞

Products The product map does not necessarily satisfy the assumptions of Proposition 2.1.12. If the exponent measure ν X of a bivariate random vector

38

2 Regularly varying random vectors

X = (X1 , X2 ) with non-negative components is concentrated on the axes, then as for the minimum, we cannot apply Proposition 2.1.12, since the condition that the measure ν X ◦ g −1 is not identically zero on Rd \ {0} is not fulfilled. In the opposite case where ν X ((0, ∞)2 ) > 0, then because of homogeneity, for any  > 0, ν X ({(x, y) ∈ (0, ∞)2 | xy > }) > 0 and Proposition 2.1.12 applies, since the map g(x1 , x2 ) = x1 x2 is homogeneous of degree 2. More generally, we can consider products of the form X1γ X2δ for γ, δ > 0 and we obtain the following result. Corollary 2.1.21 Let the bivariate random vector X = (X1 , X2 ) in R2+ be regularly varying with tail index α and exponent measure ν X such that ν X ((0, ∞)2 ) > 0. Then, for all γ, δ > 0, X1γ X2δ is regularly varying with tail index α/(γ + δ).

Proof. The map g defined on R2+ by g(x1 , x2 ) = xγ1 xδ2 is homogeneous with degree δ + γ. Therefore Proposition 2.1.12 proves that X1γ X2δ has tail index α/(δ + γ).  If γ and δ are positive integers, the assumption that X1 and X2 are nonnegative can be removed and the condition on ν X becomes ν X ([R2 \ (R × {0}) ∪ ({0} × R)]) > 0. We will provide in Section 3.2.1 results on the regular variation of the product of an extremally independent random variable.

2.2 Spectral decomposition In one dimension, the regular variation property can be written equivalently as ã Å X ∈ · = y −α Λ , lim nP |X| > aX,n y, n→∞ |X| where Λ = pX δ1 + (1 − pX )δ−1 and aX,n is defined in (1.3.3). In analogy with the above relation, we can write a spectral decomposition in higher dimensions. Theorem 2.2.1 Let |·| be a norm on Rd and let Sd−1 be the associated unit sphere. A vector X in Rd is regularly varying with tail index α if

2.2 Spectral decomposition

39

and only if there exists a constant ς > 0, a probability measure Λ on Sd−1 and a scaling sequence {cn } such that, for all y > 0, Å ã # X v nP |X| > cn y, ∈ · −→ ςy −α Λ , n → ∞ , (2.2.1) |X| where vague# convergence holds on (0, ∞) × Sd−1 .

The probability measure Λ is called the spectral measure of X associated to the norm |·|. Proof (Proof of Theorem 2.2.1). Set E = (0, ∞) × Sd−1 and define the metric dE on E by dE ((s, x), (t, y)) = |s−1 − t−1 | ∨ |x − y| . A set A is bounded in E for the metric dE if A ⊂ (, ∞) × Sd−1 for some  > 0. Therefore, a sequence of boundedly finite measures {νn , n ∈ N} on E converges vaguely# to a measure ν if limn→∞ νn (A) = ν(A) for all sets A such that A ⊂ (, ∞) × Sd−1 for some  > 0 and ν(∂A) = 0. Define the map T : Rd \ {0} → (0, ∞) × Sd−1 by Å ã x T (x) = |x| , . |x|

(2.2.2)

The map T is a bicontinuous bijection and has the property that a set A is separated from zero in Rd if and only if T (A) ⊂ (, ∞) × Sd−1 for some  > 0. Thus, by Theorem B.1.21, a sequence of boundedly finite measures {μn } converges vaguely# to a measure μ on Rd \ {0} if and only if the sequence {μn ◦ T −1 } converges vaguely# to μ ◦ T −1 on (0, ∞) × Sd−1 . Let cn be as in Definition 2.1.2 and define the measure μn on (0, ∞) × Sd−1 by ÅÅ ã ã |X| X , μn (B) = nP ∈B , cn |X| for any Borel set B of (0, ∞) × Sd−1 . Then, by the definition of T , we have v#

−1 μn (B) = nP(c−1 (B)) and thus nP(c−1 n X ∈T n X ∈ ·) −→ ν X if and only if v#

μn −→ ν X ◦ T −1 . We have to identify ν X ◦ T −1 . If C is a Borel subset of Sd−1 , then, for y > 0,

40

2 Regularly varying random vectors

νX



 ™ã Åß y −1 x x ∈C = νX ∈C x : y −1 |x| > 1, −1 x : |x| > y, |x| y |x| ™ã Åß u = νX ∈C yu : |u| > 1, |u| ™ã Åß u −α = y νX ∈C , u : |u| > 1, |u|

where in the last equality we used the homogeneity property (2.1.1) of ν X . Define now c

ς = ν X (B (0, 1) .

(2.2.3)

Then ς = 0 since ν X is non-degenerate and homogeneous and we can define a probability measure on Sd−1 by ™ã Åß u Λ = ς −1 ν X ∈· , (2.2.4) u : |u| > 1, |u| and thus ν X ◦ T −1 = ςνα ⊗ Λ.



The spectral measure does depend on the choice of the norm but we will not make this dependence explicit in the notation. Exponent measures are equal up to a multiplicative constant (depending on the scaling sequence cn ), whereas spectral measures corresponding to different norms are all probability measures. Another important fact is that any probability measure on the unit sphere is the spectral measure of a regularly varying random vector. Lemma 2.2.2 Let Λ be a probability measure on the unit sphere Sd−1 of Rd endowed with an arbitrary norm. Then there exists a regularly varying random vector whose spectral measure is Λ. Proof. Let R be a standard Pareto random variable with index α > 0. Let U be a random variable on Sd−1 with distribution Λ. Then RU is a regularly varying vector with spectral measure Λ. Indeed, since |RU| = R, we obtain by independence ã Å RU ∈ · | |RU| > x = P(U ∈ ·) . P |RU|  In the next result we give the converse relation to (2.2.4), that is, we express the exponent measure ν X in terms of the spectral measure Λ.

2.2 Spectral decomposition

41

Proposition 2.2.3 Assume that X in Rd is regularly varying with tail index α, with exponent measure ν X and spectral measure Λ associated to the norm | · |. Then, for any measurable subset A of Rd \ {0}, we have ∞ Å ã ν X (A) = ς 1A (rλ) Λ(dλ) αr−α−1 dr , (2.2.5) 0

Sd−1

with ς as in (2.2.3). If Θ is a random element on Sd−1 with distribution Λ, then (2.2.5) can be expressed as ∞ P(rΘ ∈ A)αr−α−1 dr . (2.2.6) ν X (A) = ς 0

Proof. Consider the map T given in (2.2.2). Let A be a measurable subset ¶ © d of R , separated from 0. Then T (A) = (r, λ) ∈ (0, ∞) × Sd−1 | rλ ∈ A , so that, using A = T −1 (T (A)), Ķ ©ä ν X (A) = ν X ◦ T −1 (r, λ) ∈ (0, ∞) × Sd−1 | rλ ∈ A Ķ ©ä = ςνα ⊗ Λ (r, λ) ∈ (0, ∞) × Sd−1 | rλ ∈ A ∞ να (dr) 1A (rλ) Λ(dλ) . =ς 0 Sd−1  Example 2.2.4 Let Θ be a random vector on the unit sphere of Rd with c c distribution Λ. If A = B (0, 1) = {x : |x| > 1}, (2.2.5) yields ν X (B (0, 1)) = α ςE [|Θ| ] = ς, which agrees with (2.2.3). Furthermore, let B be a random matrix satisfying the assumptions of Corollary 2.1.14. Then we can define ν B X by ∞ P(x−1 BX ∈ A) = ν B X (A) = lim P(rBΘ ∈ A)αr−α−1 dr . x→∞ P(|X| > x) 0 α

This yields P(|BX| > x) ∼ E [|BΘ| ] P(|X| > x).



These changes of variable formulas show that the exponent measure (associated to any scaling sequence) is concentrated on the axes if and only if the spectral measure (relative to any norm) is concentrated on the intersection of the axes and the unit sphere.

42

2 Regularly varying random vectors

Proposition 2.2.5 Let X be a regularly varying random vector in Rd . It is extremally independent if and only if its spectral measure (relative to any given norm) is concentrated on the intersection of the axes and the unit sphere.

2.2.1 Examples of exponent and spectral measures We proceed with several examples of regularly varying random vectors. In each case we will identify the exponent measure ν X and the spectral measure Λ. Example 2.2.6 (Example 2.1.4 continued) We now identify the spectral measure in the bivariate non-negative case (that is d = 2 and pX = 1). Let ς be as in (2.2.3) and define β1 = ς −1 ν X ({x = (x1 , 0) : x1 > 0, |(x1 , 0)| > 1}) , β2 = ς −1 ν X ({x = (0, x2 ) : x2 > 0, |(0, x2 )| > 1}) . By the homogeneity property of a norm, we have, for x1 , x2 > 0, |(x1 , 0)|

−1

(x1 , 0) = |(1, 0)|

−1

(1, 0), |(0, x2 )|

−1

(0, x2 ) = |(0, 1)|

−1

Hence, defining s1 = |(1, 0)| (1, 0) and s2 = |(0, 1)| for C ∈ S1 , ™ã Åß x ∈C Λ(C) = ς −1 ν X x : |x| > 1, |x|

−1

−1

(0, 1) .

(0, 1), we obtain,

= ς −1 ν X ({x = (0, x2 ) : x2 > 0, |(0, x2 )| > 1, s2 ∈ C}) + ς −1 ν X ({x = (x1 , 0) : x1 > 0, |(x1 , 0)| > 1, s1 ∈ C}) = β1 1{s 1 ∈C} + β2 1{s 2 ∈C} . Thus, the spectral measure is concentrated on the set {s1 , s2 }: Λ = β1 δs 1 +  β2 δs 2 . Example 2.2.7 (Example 2.1.5 continued) The homogeneity property implies that for x = 0 and sign(x) denoting the sign of x, |(x, . . . , x)|

−1

(x, . . . , x) = sign(x) |1|

−1

·1,

where 1 is a vector in Rd with all components equal to 1. Defining s = |1| we have ™ã Åß x ∈C Λ(C) = ς −1 ν X x : |x| > 1, |x| = ς −1 ν X ({x : |x| > 1, ±s ∈ C}) = pX 1{s∈C} + (1 − pX )1{−s∈C} .

−1

1,

2.2 Spectral decomposition

43

That is, Λ = pX δs + (1 − pX )δ−s .



Example 2.2.8 (Finite moving averages) Finite moving averages can be dealt with by applying Corollary 2.1.14. Assume that (Z1 , . . . , Zd ) are i.i.d. random variables with tail index α > 0. For sake of clarity, we assume that Z1 is non-negative. Let ν Z be the exponent measure of (Z1 , . . . , Zd ) as calculated in (2.1.3). Consider non-negative weights ψ1 , . . . , ψd−1 and define d−1 d−1 X1 = i=1 ψi Zi , X2 = i=1 ψi Zi+1 . In order to obtain bivariate regular variation of (X1 , X2 ), consider the map Ψ from Rd+ → [0, ∞)2 defined by d−1  d−1   ψ i zi , ψi zi+1 . Ψ (z1 , . . . , zd ) = i=1

i=1

Applying Corollary 2.1.14, we obtain that the exponent measure ν X of (X1 , X2 ) has the following form. For u1 , u2 > 0, ν X (([0, u1 ] × [0, u2 ])c ) = ν Z ◦ Ψ −1 (([0, u1 ] × [0, u2 ])c ) ã d−1 Å  u1  u2 −α α −α α + ψd−1 u−α , = ψ1 u 1 + 2 ψ ψ i i−1 i=2 or more generally, for a measurable subset A of Rd , separated from 0,  ∞ d−1  ν X (A) = 1A (ψi s, ψi−1 s) + 1A (0, ψd−1 s) αs−α−1 ds . 1A (ψ1 s, 0) + 0

i=2

(2.2.7) Applying (2.2.7) to A = (u1 , ∞) × (u2 , ∞), A = (u1 , ∞) × R+ and A = R+ × (u2 , ∞) we get, respectively, ν X ((u1 , ∞) × (u2 , ∞)) =

ν X ((u1 , ∞) × (0, ∞)) =

ã ã d−1 Å d−1 Å  u1  u2 −α  ψi  ψi−1 α = , ψi ψi−1 u1 u2 i=2 i=2

ã d−1 Å  ψi α i=1

u1

, ν X ((0, ∞) × (u2 , ∞)) =

ã d−1 Å  ψi α i=1

u2

.

(2.2.8) In particular, choosing u1 = u2 = 1, we have P(X1 > x, X2 > x) ν X ((1, ∞) × (1, ∞)) = = lim x→∞ P(X1 > x) ν X ((1, ∞) × R+ ) Combining this with (2.1.16) we obtain

d−1

α

(ψi ∧ ψi−1 ) . d−1 α i=1 ψi (2.2.9)

i=2

44

2 Regularly varying random vectors

ν X ((1, ∞) × (1, ∞))  P (X1 > x, X2 > x) α = = (ψi ∧ ψi−1 ) . x→∞ P(Z1 > x) ν Z ((1, ∞) × R+ ) i=2 d−1

lim

(2.2.10) The form of the exponent measure is a consequence of the single big jump heuristics: only one of the Zj can be extremely large; see Examples 2.1.4 and 2.2.6. If Z1 is large then only X1 can be large, if Zd is large then only X2 can be large, and if one among Z2 , . . . , Zd−1 is large, then X1 and X2 are simultaneously large. These heuristics can be used to derive the higher dimensional exponent and spectral measures. The exponent measure ν X is concentrated on the union of the d + 1 positive half-lines defined by the equations u1 = 0, u2 = 0, and ψi−1 u1 = ψi u2 , i = 2, . . . , d. This implies that the corresponding spectral measure is thus −1 −1 concentrated on the d + 1 points s0 = |(1, 0)| (1, 0), s1 = |(0, 1)| (0, 1), −1 ti = |(ψi , ψi−1 )| (ψi , ψi−1 ), i = 2, . . . , d, and d α ψ1α δs 0 + ψdα δs 1 + i=2 |(ψi , ψi−1 )| δt i Λ= . (2.2.11) d−1 α ψ1α + ψdα + i=2 |(ψi , ψi−1 )| We see that the spectral measure is discrete, whereas the exponent measure is continuous (in the sense that it puts no mass on points), but not absolutely continuous with respect to Lebesgue’s measure.  Example 2.2.9 (Finite moving maxima) Assume that (Z1 , . . . , Zd ) is a vector of i.i.d. non-negative random variables, regularly varying with index −α. Let ψ1 , . . . , ψd−1 be positive real numbers and define X1 = ∨d−1 i=1 ψi Zi , ψ Z . We apply Corollary 2.1.18. Let ν be the exponent meaX2 = ∨d−1 Z i=1 i i+1 sure of the vector (Z1 , . . . , Zd ). As in the previous example, we can obtain the exponent measure ν X of (X1 , X2 ) by considering the transformation Ψ defined on Rd+ by d−1 Ψ (z1 , . . . , zd ) = (∨d−1 i=1 ψi zi , ∨i=1 ψi zi+1 )

and we obtain ν X (([0, u1 ] × [0, u2 ])c ) = ν Z ◦ Ψ −1 (([0, u1 ] × [0, u2 ])c ) ã d−1 Å  u1  u2 −α α + + ψd−1 u−α . = ψ1α u−α 1 2 ψ ψ i i−1 i=2 We see that this is the same exponent measure as the one that arises in the case of finite moving averages considered in Example 2.2.8. Therefore the spectral measure is also given by (2.2.11).  Example 2.2.10 Assume that Z is a vector with i.i.d. regularly varying components with tail index α, satisfying the tail balance condition (1.3.1)

2.3 Conditioning on extreme events

45

(with pX replaced by pZ ). Consider a random vector σ = (σ1 , . . . , σd ) of positive, identically distributed, possibly dependent random variables such that E[σ1α+ ] < ∞ for some  > 0, independent of Z. Define Xj = σj Zj , j = 1, . . . , d. Breiman’s Lemma 1.4.3 implies that, for j = 1, . . . , d P(Xj > x) P(Xj < −x) = pZ E[σ1α ] , lim = (1 − pZ )E[σ1α ] , x→∞ P(|Z1 | > x) P(|Z1 | > x) P(|Xj | > x) = E[σ1α ] . lim x→∞ P(|Z1 | > x)

lim

x→∞

Let ν Z be the exponent measure of Z as in Example 2.1.4. For C ∈ Rd and y ∈ (0, ∞)d , define y −1 · C = {(u1 /y1 , . . . , ud /yd ) : (u1 , . . . , ud ) ∈ C} .

(2.2.12)

Note that if C is separated from zero, then so is y −1 C. By Corollary 2.1.14, we obtain that X is regularly varying and its exponent measure ν X is defined by Ä î äó ν X (C) = E ν Z σ −1 · C . The measure ν X is also concentrated on the axes. Indeed, if C is separated from the axes, then so is y −1 · C for all y ∈ (0, ∞)d and thus ν Z (y −1 · C) = 0  and consequently ν X (C) = 0.

2.3 Conditioning on extreme events For a random variable X an exceedence over the level x is the event {X > x} or {|X| > x}. There are many ways to define exceedences for a random vector X in Rd ; the simplest one is to choose a norm on Rd and consider the events {|X| > x} for large x. But this choice is arbitrary and one can define many other types of exceedences. Consider a set A which is separated from zero. Recall that this implies that there exists  > 0 such that the complementary of A contains the ball centered at zero with radius  (this  depends on the choice of the norm but always exists). Then an exceedence relative to A can be defined as the event X ∈ xA. This implies that |X| > x so that X is large in the usual sense, but also with respect to the specific choice of the event A. Among possible choices of A, let us mention for instance the following ones: ™ ß ™ ß min xi > 1 , {x1 + · · · + xd > 1} , {x1 x2 > 1} , max xi > 1 , 1≤i≤d

1≤i≤d

and any combinations (unions or intersections) of such events can be thought of. In order to use the regular variation property to study the exceedences

46

2 Regularly varying random vectors

X ∈ xA when x tends to infinity, it is necessary to have ν X (A) > 0, (additionally to ν X (∂A) = 0). In that case, regular variation implies that as x→∞ ã Å X d ν X (A ∩ ·) ∈ · | X ∈ xA −→ . (2.3.1) P x ν X (A) Since exponent measures differ only by a multiplicative constant, the ratio in the right-hand side of (2.3.1) does not depend on the particular choice of the exponent measure. Using Proposition 2.2.3, this limit can be expressed in terms of the spectral measure. For all measurable subsets B of Rd ∞ −α−1 αr dr d−1 1A∩B (rλ) Λ(dλ) ν X (A ∩ B) = 0 ∞ −α−1 S . ν X (A) αr dr Sd−1 1A (rλ) Λ(dλ) 0 Instead of dividing by x in (2.3.1), we can choose a norm on Rd and divide by |X|. This yields, for all measurable set B ⊂ Sd−1 such that Λ(∂B) = 0, ∞ −α−1 ã Å αs ds 1A (su)Λ(du) X ∈ B | x ∈ xA = ∞0 −α−1 B lim P . (2.3.2) x→∞ |X| ds Sd−1 1A (su)Λ(du) 0 αs ν X (B ∗ ∩ A) , (2.3.3) = ν X (A) where B ∗ is the cone with base B, that is, B ∗ = {x ∈ Rd | x/|x| ∈ B}. Choosing now A = {x : |x| > 1}, we simply obtain ã Å X ∈ B | |X| > x = Λ(B) . (2.3.4) lim P x→∞ |X| All these expressions are mathematically indifferent but from a statistical point of view, given the problem and data at hand, one may be more practically relevant than the other. A particular case of interest is when d ≥ 2 and the vector X = (X1 , . . . , Xd ) with exponent measure ν X is split into two subvectors X = (X 1 , X 2 ) with X 1 = (X1 , . . . , Xh ), X 2 = (Xh+1 , . . . , Xd ), (h ∈ {1 . . . , d − 1}). Let C be a set separated from 0 in Rh such that ν X (C × Rd−h ) > 0. Then by Corollary 2.1.17, X 1 is regularly varying with exponent measure ν X 1 given by ν X 1 = ν X (· × Rd−h ). For such a set C, we can consider conditioning on the event {X 1 ∈ xC}. Then, for measurable sets D in Rd−h and such that ν X (∂(C × D)) = 0, Å ã X2 ν X (C × D) lim P ∈ D | X 1 ∈ xC = . (2.3.5) x→∞ x ν X 1 (C) Example 2.3.1 Consider a regularly varying vector X = (X1 , . . . , Xd ). Set h = 1. Choose C = {x1 ∈ R : x1 > 1} and assume that ν X 1 (C) > 0 (which

2.3 Conditioning on extreme events

47

means that X1 does not have a lighter tail than the other components), D = Rd−2 × (ud , ∞), with ud > 0. Then ν X ((1, ∞) × Rd−2 × (ud , ∞)) . lim P(Xd > xud | X1 > x) = x→∞ ν X 1 ((1, ∞)) The limit exists since we assumed that the denominator does not vanish. In the case of extremal independence, the limit is zero. In Chapter 3, we will investigate how a change of scaling may induce a non-zero limit in case of extremal independence.  Example 2.3.2 Let d ≥ 3 and set h = 2. Let X = (X1 , . . . , Xd ) be a regularly varying vector with non-negative components. Choose C = {(x1 , x2 ) ∈ × (ud , ∞), ud > 0. Then R2+ : x1 + x2 > 1} and D = Rd−2 + ν X (C × D) . lim P(Xd > xud | X1 + X2 > x) = x→∞ ν (X1 ,X2 ) (C) The denominator vanishes only if X1 + X2 has a lighter tail than the components X3 , . . . , Xd . Excluding this case, the limit is well defined, and is zero in the case of extremal independence.  Example 2.3.3 Let again d ≥ 3 and h = 2. Now we choose C = {(x1 , x2 ) ∈ R2 : x1 > 1, x2 > 1} and D = Rd−2 × (ud , ∞), ud > 0. If ν X 1 ((1, ∞) × (1, ∞)) > 0, then lim P(Xd > xud | X1 > x, X2 > x) x→∞

=

ν X ((1, ∞) × (1, ∞) × Rd−2 × (ud > ∞)) . ν X 1 ((1, ∞) × (1, ∞))

In contrast to Examples 2.3.1 and 2.3.2, the above limit makes sense only in the case of extremal dependence between X1 and X2 , since otherwise  ν X 1 ((1, ∞) × (1, ∞)) = 0. These examples illustrate that the limits which appear therein provide information on the extremal behavior of the vector X only in the case of extremal dependence. The limits of these conditional probabilities for different choices of the set C have been used as measures of extremal dependence, that is, heuristically, as quantifiers of the tendency of some or all components of the vector to be jointly extremely large, or to define certain indices in applications such as risk management. In the following subsections, we introduce some of these quantities which all have the notable feature that they are degenerate for vectors with extremal independence.

2.3.1 Extremal Dependence Measure Let X = (X1 , . . . , Xd ) be a regularly varying random vector. Since f (λ) = d d−1 , applying the weak convergence j=1 λj is continuous and bounded on S in (2.3.4) yields

48

2 Regularly varying random vectors

ã ò ï Å d  X

lim E f λj Λ(dλ) .

|X| > x = x→∞ |X| Sd−1 j=1 In particular, for a bivariate regularly varying random vector (X1 , X2 ), the expression on the right-hand side is called the Extremal Dependence Measure (EDM): λ1 λ2 Λ(dλ) . EDM(X1 , X2 ) = S1

Since the spectral measure depends on the chosen norm, the same applies to EDM. We can interpret EDM as a covariance-like quantity, computed with respect to the spectral measure. The EDM vanishes whenever the spectral measure Λ is concentrated on the axes, that is, in the case of extremal independence. It may happen that EDM(X1 , X2 ) = 0 even though the random vector is not extremally independent. This is the case for examples when Λ puts equal mass 1/4 at the four points (±1, ±1)/|(1, 1)| or if the spectral measure is uniform on the unit sphere. This is comparable to the fact that the covariance of two square integrable vectors can be zero even if the vectors are not independent. Example 2.3.4 Assume that X1 is positive regularly varying random variable. The extremal dependence measure of (X1 , X1 ) is EDM(X1 , X1 ) = |1|−2 .  Example 2.3.5 (Example 2.2.8 continued) Let Z1 , Z2 , Z3 be i.i.d. regularly varying non-negative random variables. For ψ1 , ψ2 > 0, define X1 = ψ1 Z1 + ψ2 Z2 , X2 = ψ1 Z2 + ψ2 Z3 . Recall that in this particular case the spectral measure is concentrated at the 3 points: |(1, 0)|−1 (1, 0) , |(0, 1)|−1 (0, 1) , |(ψ2 , ψ1 )|−1 (ψ2 , ψ1 ) , with masses given in (2.2.11). The extremal dependence measure is thus EDM(X1 , X2 ) =

|(ψ2 , ψ1 )|α−2 ψ1 ψ2 . + ψ2α + |(ψ2 , ψ1 )|α

ψ1α

 That is, the extremal dependence measure depends on the tail index α. This is a common feature of most of measures for extremal dependence.

2.3.2 The extremogram Let X = (X 1 , . . . , X d ) be a regularly varying random vector in Rqd (that is each of the d components is itself a q-dimensional vector) with exponent

2.3 Conditioning on extreme events

49

measure ν X . Let 1 ≤ i < j ≤ d. Let C, D be measurable sets in Rq , both separated from 0. If ν X (Rq(i−1) × C × Rq(d−i) ) > 0, applying (2.3.5) with C replaced with Rq(i−1) × C and D replaced with Rq(j−i−1) × D × Rq(d−j) yields lim P(X j ∈ xD | X i ∈ xC)

x→∞

=

ν X (Rq(i−1) × C × Rq(j−i−1) × D × Rq(d−j) ) . ν X (Rq(i−1) × C × Rq(d−i) )

This justifies the following definition. Definition 2.3.6 Let X = (X 1 , . . . , X d ) be a regularly varying random vector in Rqd with exponent measure ν X . The extremogram ρ(i, j; C, D) is defined for all 1 ≤ i < j ≤ d and measurable sets C, D separated from 0 in Rq such that ν X (Rq(i−1) × C × Rq(d−i) ) > 0, by ρ(i, j; C, D) = lim P (X j ∈ xD | X i ∈ xC) x→∞

=

(2.3.6)

ν X (Rq(i−1) × C × Rq(j−i−1) × D × Rq(d−j) ) . ν X (Rq(i−1) × C × Rq(d−i) )

The restriction i < j is for notational convenience; an extremogram for i > j can be defined similarly. If C = D, the extremogram can be interpreted as a limiting correlogram of extreme events since we can write     −1  cov 1 c−1 n X i ∈ C , 1 cn X j ∈ C ρ(i, j; C, C) = lim  ¶ © ¶ © . n→∞ −1 var(1 c−1 n X i ∈ C )var(1 cn X j ∈ C ) Indeed, for all C, D, separated from zero, lim ncov (1{X i ∈ cn C}, 1{X j ∈ cn D})

n→∞

= lim n {P (X i ∈ cn C, X j ∈ cn D) − P (X i ∈ cn C) P (X j ∈ cn D)} n→∞

= lim nP (X i ∈ cn C, X j ∈ cn D) n→∞

= ν X (Rq(i−1) × C × Rq(j−i−1) × D × Rq(d−j) ) , and for i = j, Ä ¶ ©ä Ä ä lim nvar 1 c−1 = lim nP c−1 n Xi ∈ C n Xi ∈ C

n→∞

n→∞

= ν X (R

q(i−1)

×C ×R

q(d−i)

).

50

2 Regularly varying random vectors

Tail Dependence Coefficient Choosing particular sets C and D in (2.3.6), we can relate the extremogram to certain other extremal dependence measures. For example, if q = 1 and we choose C = D = (1, ∞), then we obtain the tail dependence coefficient τ (i, j) defined by τ (i, j) = ρ(i, j; (1, ∞), (1, ∞)) = lim P(Xj > x | Xi > x) . x→∞

(2.3.7)

Example 2.3.7 (Example 2.2.8 continued.) The expression for the tail dependence coefficient in case of finite moving averages follows from (2.2.9): d−1 (ψi ∧ ψi−1 )α P(X1 > x, X2 > x) ρ(0, 1; (1, ∞), (1, ∞)) = lim = i=2 d−1 . α x→∞ P(X1 > x) i=1 ψi 

2.3.3 Conditional Tail Expectation Let X = (X1 , X2 ) be regularly varying random vector with exponent measure ν X and tail index α. If f : R → R is continuous and |f (x)| = O(|x|α− ) at infinity for some  > 0, then Corollary 2.1.10 implies that ó î ï Å ò ã E f (X2 /x) 1{X1 >x} X2 lim E f | X1 > x = lim x→∞ x→∞ x P(X1 > x) ∞ ∞ f (y 2 )ν X (dy1 , dy2 ) . = 1 −∞ ν X ((1, ∞) × R) If α > 1, then 1 lim E[X2 | X1 > x] = x→∞ x

∞ ∞ 1

y2 ν X (dy1 , dy2 ) . ν X ((1, ∞) × R) −∞

(2.3.8)

The latter quantity plays a special role in finance and insurance under the name Conditional Tail Expectation (CTE), defined as CTE(x) = E[X2 | X1 > x] .

(2.3.9)

If the limit on the right-hand side of (2.3.8) is not zero, then the conditional tail expectation grows at a linear rate x. Example 2.3.8 (Example 2.2.8 continued) Assume additionally that α > 1. Plugging the expression (2.2.7) into the numerator of (2.3.8) and the left-hand side of (2.2.8) in the denominator, we obtain d−1 α i=2 ψi−1 ψiα−1 CTE(x) = lim . d−1 x→∞ x (α − 1) i=1 ψiα 

2.4 Problems

51

2.4 Problems 2.1 Let | · | be the Euclidean norm on Rd . A random vector X ∈ Rd is said to have an elliptical distribution if there exists a positive random variable R, a random variable U uniformly distributed on the sphere Sd−1 , and a (deterministic) matrix A with full rank such that X = RAU . Prove that X is regularly varying if and only if R is regularly varying and give the spectral measure of X. 2.2 Let X, Y , and Z be three i.i.d. non-negative regularly varying random variables with tail index α > 0 such that E[X α ] = ∞. Prove that the vector (XY, XZ) is multivariate regularly varying with tail index α. Give its exponent and spectral measures. 2.3 Let X be a regularly varying Rd -valued random vector with exponent measure ν X and spectral measure Λ. For u = (u1 , . . . , ud ), v = (v1 , . . . , vd ) with ∨di=1 ui < 0 and ∧di=1 vi > 0, compute ν X ([u, v]c ) and ν X ([v, ∞)) in terms of Λ. 2.4 Let Z be a regularly varying random vector with tail index α > 0 and spectral measure ΛZ with respect to a given norm, independent of a random matrix A with full rank which satisfies the assumptions of Corollary 2.1.14. Determine the spectral measure ΛX of X = AZ with respect to the same norm. Hint: let Θ2 be a random vector with distribution ΛZ , independent of −1 A and set V = |AΘZ | AΘZ . 2.5 Let |·|1 , |·|2 be arbitrary norms on R2 . Let S1 , S2 , Λ1 , Λ2 be the associated unit spheres and spectral measures, respectively. Show that f (λ1 , λ2 )Λ1 (dλ) = ς2 ς1−1 f ◦ H −1 (θ)|θ|α 1 Λ2 (dθ) , S1

S2

where H : S1 → S2 is a map defined by H(λ) = λ/|λ|2 . Conclude that λ1 λ2 Λ1 (dλ) = λ1 λ2 |λ|α−2 Λ2 (dλ) . 1 S1

S2

2.6 For d ≥ 1 and h ≥ 2, let X 1 , . . . , X h be d-dimensional random vectors and write X = (X 1 , . . . , X h ). Assume that for every i ∈ {1, . . . , h}, (X i , . . . , X h ) has the same distribution as (X 1 , . . . , X h−i ). We prove that regular variation of X is equivalent to the following conditions: (a) there exists a probability measure μ on Rdh such that P(x−1 X ∈ · | w |X 1 | > x) =⇒ μ; (b) for all t ≥ 1, μ({x ∈ Rdh | |x1 | > t}) = t−α . 1. Prove that |X 1 | is regularly varying.

52

2 Regularly varying random vectors

2. Define the bounded measure νx = (P(|X 1 | > x))−1 P(x−1 X ∈ ·) on Rdh \ {0}. Apply Lemma B.1.29 to prove that the sequence {νx , x ≥ 1} is vaguely# relatively compact. 3. Consider the class F of bounded continuous functions f on Rdh such that • either there exists u > 0 such that f (x1 , . . . , xh ) = 0 if |x1 |d ≤ u; • or f (x1 , . . . , xu ) = f (0, x2 , . . . , xu ) for all x ∈ Rdh . Apply Lemma B.1.31 to prove that there exists at most one measure ν which can be a limit along a subsequence of νx . 4. Conclude that X is regularly varying and that the measure ν is its exponent measure.

2.5 Bibliographical notes This chapter is essentially standard. Most of its results can be found in textbooks on regular variation or extreme value theory such as [Res87], [BGT89], or [dHF06]. The difference with [Res87] is that we use vague# convergence on Rd \ {0} instead of vague convergence on [−∞, ∞]d \ {0}. The idea to define regular variation on a metric space using vague# convergence on a CSMS is due to [HL06] where it is called M0 convergence. This idea was further developed by [LRR14] and [SZM17] among others. In this chapter, where we consider only Euclidean spaces, there is nearly no difference with the classical vague convergence framework found in [Res87] for instance. The main advantage is to avoid compactification of Rd with a point at ∞ and we do not have to deal with the possibility of mass escaping away to ∞. This also unifies the treatment of finite-dimensional vectors in this chapter and infinite-dimensional random elements in Chapter 5 and later ones. We have decided to use the expression extremally independent to qualify regularly varying vectors whose exponent measure is concentrated on the axes, rather than the usual “asymptotic independence” which is undeniably ambiguous. Most results can be found in the classical references mentioned above. In addition, Corollary 2.1.14 was obtained in [BDM02b, Proposition A.1] (where vague convergence in [−∞, ∞]d \ {0} is used). Equivalence between the multivariate regular variation of a random vector and the univariate regular variation of its components is considered in [BDM02a, Theorem 1.1]. Multivariate extensions of Breiman’s lemma are also considered in [FM12]. Extremal dependence measures can be found in [FKS10], [DM09], and [LR12]. Problem 2.6, which will be referred to in Chapter 5, is adapted from [BS09, Theorem 2.1].

3 Dealing with extremal independence

Consider a regularly varying random vector X which is extremally independent. This implies that the extremal dependence measure (Section 2.3.1), the extremogram, and the tail dependence coefficient (Section 2.3.2) vanish and that the conditional tail expectation grows at a rate slower than linear (Section 2.3.3). In this case the exponent measure is useless to study the extremal behavior of the vector. However, there may be some “hidden” extremal behavior. Let us start by two simple illustrative examples. Example 3.0.1 Let Y be a positive regularly varying random variable and let B be an independent Bernoulli random variable. Define the vector X = (X0 , X1 ) = (BY, (1 − B)Y ). Then X is regularly varying with extremal independence, and for any u0 , u1 > 0, P(X0 > u0 , X1 > u1 ) = 0. Thus in this case there is “nothing” away from the axes.  Example 3.0.2 Let X0 and X1 be i.i.d. positive regularly varying random variables. Then, for all u0 , u1 > 0, P(X0 > xu0 , X1 > xu1 ) =0, P(X0 > x) P(X0 > xu0 , X1 > xu1 ) −α lim = u−α , 0 u1 x→∞ P2 (X0 > x) P(X0 > xu0 , X1 > u1 ) = u−α lim 0 P(X1 > u1 ) . x→∞ P(X0 > x) lim

x→∞

(3.0.1a) (3.0.1b) (3.0.1c)

The limit (3.0.1a) confirms the fact that in case of the extremal independence the usual extreme value theory is useless to study events where both components are extremely large. The limits (3.0.1b) and (3.0.1c) show that using different scaling, either in the numerator or in the denominator, may yield non-degenerate limits and evidence the extremal behavior away from the axes.  © Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 3

53

54

3 Dealing with extremal independence

Several concepts have been introduced to deal with extremally independent random vectors. We will only consider hidden regular variation (formalizing the situation in (3.0.1b)) in Section 3.1 and the conditional extreme value framework (formalizing the situation in (3.0.1c)) in Section 3.2.

3.1 Hidden regular variation Throughout this section, d ≥ 2 denotes a fixed integer. We have seen that extremal independence means that no two components can be large at the same time. Hidden regular variation was introduced to deal with events where two components are required to be simultaneously large. Define the cone Cd,2 by  Cd,2 = {x ∈ Rd : |xi ||xj | > 0} . 1≤i 0 ,

where |·| is an arbitrary norm on Rd . We can define a metric d2 on Cd,2 such that bounded sets for this metric are exactly those separated from zero and from at least two axes: −1 d2 (x, y) = [|x − y| ∧ 1] ∨ [h−1 2 (x) − h2 (y)| .

Then d2 is a metric on Cd,2 which induces the usual topology and sets bounded for d2 are sets separated from at least two axes, i.e. there exists  > 0 such that x ∈ A implies h2 (x) > . In the following, we consider vague# convergence on Cd,2 endowed with the boundedness B2 or equivalently with the metric d2 . This means that a sequence of measures μn on Cd,2 converges vaguely# to μ if limn→∞ μn (A) = μ(A) for all Borel sets separated from Hd,2 and such that μ(∂A) = 0.

3.1 Hidden regular variation

55

Definition 3.1.1 Let X be an Rd -valued regularly varying random vector with scaling sequence {cn }. It is said to have hidden regular variation cn = ∞ and a if there exists a scaling sequence {˜ cn } such that limn→∞ cn /˜ ˜ X on Cd,2 , called the hidden exponent measure, non-zero Borel measure ν such that   v# ˜ X , in (Cd,2 , B2 ) . nP c˜−1 (3.1.2) n X ∈ · −→ ν

Note that the definition of hidden regular variation contains the assumption of multivariate regular variation. We now state a result which parallels Theorem 2.1.3 and is again a particular case of Theorem B.2.2. The proof is in Appendix B.2.2. Theorem 3.1.2 Let the random vector X have multivariate regular variation with tail index α and hidden regular variation. Then X is extremally independent, the sequence {˜ cn } is regularly varying at infin˜ X is (−α ˜ )-homogeneous: ity with index 1/˜ α, α ˜ ≥ α and the measure ν for y > 0 and any measurable subset A of Cd,2 , ˜ X (A) . ˜ X (yA) = y −α˜ ν ν

(3.1.3)

If (˜ ν X , c˜n ) are such that (3.1.2) holds, then there exists a constant ς˜ such that ˜ X = ς˜ν ˜ X , ν

c˜n = ς˜1/α˜ . n→∞ c ˜n lim

˜ X is called hidden exponent measure. The positive numThe measure ν ber α ˜ is called the hidden tail index.

Note that the converse is not true, i.e. if X is regularly varying and extremally independent, then it does not necessarily have hidden regular variation. See Example 3.0.1. The hidden exponent measure of a bivariate vector with positive components is characterized by its values on the sets (x, ∞) × (y, ∞) for x, y > 0. As a consequence, hidden regular variation of a bivariate vector with positive components is related to the behavior of the infimum of the components. The following result is a straightforward consequence of the definition of hidden regular variation and complements Corollary 2.1.20.

56

3 Dealing with extremal independence

Proposition 3.1.3 Let X = (X1 , X2 ) be a bivariate random vector with non-negative components, hidden regular variation with hidden tail index ˜ X with respect to the scaling sequence α ˜ , and hidden exponent measure ν c˜n . Then, for a1 , a2 , y > 0,   X1  X2 ˜ X ((a2 , ∞) × (a2 , ∞)) . (3.1.4) lim nP > c˜n y = y −α˜ ν n→∞ a1 a2

Example 3.1.4 Let X1 and X2 be i.i.d. non-negative regularly varying random variables with tail index α > 0. Then the vector (X1 , X2 ) is regularly varying with extremal independence and hidden regular variation with index 2α. Indeed, by independence and marginal regular variation, lim

x→∞

P(X1 > xy1 , X2 > xy2 ) = y1−α y2−α . P2 (X > x)

The hidden exponent measure is the measure on (0, ∞) × (0, ∞) defined by 2 ˜ X (A) = α ν u−α−1 v −α−1 dudv , A

for A separated from 

R2+ \{(0, ∞)×(0, ∞)}

= {(0, ∞)×{0}}∪{{0}×(0, ∞)}.

However, for a higher dimensional vector, hidden regular variation does not give any information about events involving three simultaneous large components. Example 3.1.5 Consider for instance three i.i.d. random variables X0 , X1 , X2 with Pareto distribution with index α. Then the vector (X0 , X1 , X2 ) has hidden regular variation with hidden tail index 2α. Its exponent measure is concentrated on the three half-planes (0, ∞) × (0, ∞) × {0}, (0, ∞) × {0} × (0, ∞), and {0} × (0, ∞) × (0, ∞). However, for x, y, z > 0, lim nP(X0 > n1/2α x, X1 > n1/2α y, X2 > n1/2α z) = 0 .

n→∞

Therefore, hidden regular variation does not give any information on such events and in particular does not yield the regular variation of X0 ∧ X1 ∧ X2 (which in this case is obviously regularly varying with tail index 3α).  Example 3.1.6 (Example 2.2.10 continued) Let Z1 , Z2 be i.i.d. positive, regularly varying random variables with tail index α and let (σ1 , σ2 ) be a vector of identically distributed, positive, possibly dependent random variables such that E[σ1α+ ] < ∞ for some  > 0, independent of Z1 , Z2 . Define Xj = σj Zj , j = 1, 2. Then, for u1 , u2 > 0, we have

3.1 Hidden regular variation

57

P(X1 > u1 x, X2 > u2 x) x→∞ P2 (Z1 > x) E [P(X1 > u1 x, X2 > u2 x | σ1 , σ2 )] = lim x→∞ P2 (Z1 > x)

P(Z1 > xu1 /σ1 ) P(Z2 > xu2 /σ2 ) = lim E | σ 1 , σ2 x→∞ P(Z1 > x) P(Z1 > x) lim

−α = E[σ1α σhα ]u−α . 1 u2

The limit is justified by the regular variation of Z1 , Potter’s bound (1.4.3), and the bounded convergence theorem. By Breiman’s Lemma, P(X1 > x) ∼ E[σ1α ]P(Z1 > x), hence E[σ1α σ2α ] −α −α P(X1 > xu1 , X2 > xu2 ) = u u . x→∞ P2 (X1 > x) (E[σ1α ])2 1 2 lim

We conclude that (X1 , X2 ) has hidden regular variation with hidden tail index 2α.  Example 3.1.7 Assume that Z0 , Z1 , Z2 are independent positive regularly varying random variables with tail index α and with common distribution function F . Let γ ∈ (0, 1) and define Xi = Zi+1 Ziγ , i = 0, 1. Then X0 , X1 are identically distributed and by Breiman’s lemma, P(Xi > x) ∼ E[Z0αγ ]P(Z0 > x) as x → ∞, i = 0, 1. By conditioning on Z1 , we obtain P(X0 > xu0 , X1 > xu1 ) ∞ F¯ ((xu0 /z)1/γ )F¯ (xu1 z −γ )F (dz) = 0 ∞ 1/γ F¯ (u0 w−1/γ )F¯ (x1−γ u1 w−γ )F (xdw) = 0 ∞ 1/γ F¯ (u0 w−1/γ )wαq αw−α−1 dw ∼ F (x)F (x1−γ )u−α 1 0 ∞ −α(1−γ) −α F¯ (w−1/γ )αw−α(1−γ)−1 dw u1 = F (x)F (x1−γ )u0 0 ∞ −α(1−γ) −α 1−γ F¯ (x)xαγ(1−γ)−1 dx = αF (x)F (x )u0 u1 0

−1

= γ(1 − γ)

E[Z

αγ(1−γ)

−α(1−γ) −α u1

]F (x)F (x1−γ )u0

.

The equivalence can be justified by bounded convergence arguments using Potter’s bound, and the expectation is finite since γ(1 − γ) < 1. By the composition theorem for regularly varying functions (see Proposition 1.1.9) the function F (x1−γ ) is regularly varying with index −α(1 − γ). We con˜= clude that (X0 , X1 ) has hidden regular variation with hidden tail index α α(2 − γ). 

58

3 Dealing with extremal independence

Spectral Decomposition As for multivariate regular variation, there exists a spectral decomposition of the hidden spectral measure, between a radial component which is the ˜ The fundamental difference measure να˜ and a hidden spectral measure Λ. between regular variation and hidden regular variation is that the hidden spectral measure may be infinite. In the case of a bivariate vector X = (X1 , X2 ) with positive components, ˜X , hidden regular variation with index −α ˜ , and hidden exponent measure ν choosing for instance the 1 norm |(x1 , x2 )|1 = x1 + x2 , we have the representation (up to a multiplicative constant) ˜ X (A) = ν



˜ α ˜ s−α−1

0

0

1

˜ 1A (su, s(1 − u)) Λ(du) ,

(3.1.5)

for all measurable set A ⊂ (0, ∞)2 and the integral is finite if A is separated ˜ from the axes. It is possible that Λ((0, 1)) = ∞, but choosing A = (1, ∞) × (1, ∞) we obtain that

1 ˜ ˜ α ˜ s−α−1 1{(s[u ∧ (1 − u)] > 1}Λ(du) 0 0

1 ∞ −α−1 ˜ ˜ = 1{(s[u ∧ (1 − u)] > 1}˜ αs Λ(du)

˜ X (A) = ν

0

=

0



1

0

˜ (u ∧ (1 − u))α˜ Λ(du) 0, but satisfies condition (3.1.6) since in that case α ˜ = 2α. Regular variation of products In certain cases, hidden regular variation can be used to prove the regular variation of the product of the components of a bivariate vector with extremally independent components.

3.1 Hidden regular variation

59

Proposition 3.1.8 Let (X1 , X2 ) be a regularly varying random vector with non-negative components, hidden regular variation with index α ˜, ˜ X with respect to the scaling sequence c˜n such hidden exponent measure ν ˜ X ((1, ∞) × (1, ∞)) = 1 and assume moreover that for all y > 0, that ν lim lim sup nP(X1 X2 > c˜2n y, X1 ∧ X2 ≤ ˜ cn ) = 0 .

→0 n→∞

Then

∞∞ 0

0

(3.1.8)

1{uv > 1}˜ ν X (du, dv) < ∞ and

  ˜ lim nP X1 X2 > c˜2n y = y −α/2

n→∞

0



0



1{uv > 1}˜ ν X (du, dv) . (3.1.9)

Proof. By (3.1.8), hidden regular variation, and monotone convergence, we have     lim nP X1 X2 > c˜2n y = lim lim nP X1 X2 > c˜2n y, X1 ∧ X2 > ˜ cn n→∞ →0 n→∞ ∞ ∞ = lim 1{uv > y}˜ ν X (du, dv) (3.1.10) →0  ∞ ∞  1{uv > y}˜ ν X (du, dv) . = 0

0

We must now prove that the latter integral is finite. This is also a consequence of (3.1.8). Write I() for the integral in (3.1.10). Then I() increases as  tends to zero. Therefore it suffices to prove that the function  → I() is bounded. Fix 1 ∈ (0, 1) such that lim sup nP(X1 X2 > c˜2n y, X1 ∧ X2 ≤ c˜n 1 ) ≤ 1. n→∞

Then, for  ∈ (0, 1 ), we have I() = I(1) + I() − I(1) = I(1) + lim nP(X1 X2 > c˜2n y, X1 ∧ X2 ≤ c˜n , X1 ∧ X2 > c˜n ) n→∞

≤ I(1) + lim sup nP(X1 X2 > c˜2n y, X1 ∧ X2 ≤ c˜n 1 ) ≤ I(1) + 1 . n→∞

This proves that I() is bounded and therefore lim→0 I() < ∞. Finally, by ˜ X , we have, setting A = {(u, v) ∈ (0, ∞)2 : the homogeneity property of ν uv > 1}, ∞ ∞ √ ˜ ˜ X (A) . ˜ X ( yA) = y −α/2 ν 1{uv > y}˜ ν X (du, dv) = ν 0

0

This concludes the proof.



60

3 Dealing with extremal independence

In terms of the hidden spectral measure relative to the 1 norm, the integral 1 ˜ ˜ in the right-hand side of (3.1.6) can be expressed as 0 (u(1 − u))α/2 Λ(du) and it is finite if (3.1.8) holds. Example 3.1.9 If X1 and X2 are i.i.d. non-negative random variables with Pareto tails with index α, then the tail index of the product is α but condition (3.1.8) fails to hold and the appropriate scaling sequence for the tail of the  product is (n log(n))1/α . See Problem (1.18).

3.2 Conditioning on an extreme event We consider now a random vector X with values in Rd+h and we want to investigate the situation where only the first d components of X are large and the effect on the other components, without assuming the regular variation of X. The framework for this problem is vague# convergence in the space Rd \{0}× Rh , endowed with the boundedness B0,h of sets separated from {0}×Rh , that is B ∈ B0,h ⇐⇒ ∃ > 0 , B ⊂ Bc (3 =, 0) × Rh . We say that a Borel measure ν on Rd \ {0} × Rh is B0,h -boundedly finite if and only if ν(B) < ∞ for all B ∈ B0,h . A sequence of B0,h -boundedly finite measures νn is said to converge B0,h -vaguely# to ν if limn→∞ νn (A) = ν(A) for all ν-continuity sets A ∈ B0,h . See Appendix B.1 for more details. Definition 3.2.1 (Partial regular variation) Let d ≥ 1 and h ≥ 1 be integers. Let X be a random vector in Rd+h . Set X 0 = (X1 , . . . , Xd ). We say that X has partial regular variation with respect to its d first components if there exist scaling sequences cn , c∗n (j), d + 1 ≤ j ≤ d + h and a non-zero Borel measure μX on (Rd \ {0}) × Rh , such that • μX (· × Rh ) is not the null measure on Rd \ {0}; • there is no i ∈ {1, . . . , h} such that μX is concentrated on (Rd \{0})× Ri−1 × {0} × Rh−i , •

 nP

X0 Xd+h Xd+1 ,··· , ∗ , ∗ cn cn (d + 1) cn (d + h)

in (Rd \ {0} × Rh , B0,h ).



 ∈·

v#

−→ μX ,

(3.2.1)

3.2 Conditioning on an extreme event

61

Comments: • By assumption, the measure ν X 0 defined on Rd \ {0} by ν X 0 (A) = μX (A × Rh ) is not the zero measure and therefore Theorem 2.1.3 implies that X 0 is regularly varying with index −α > 0, exponent measure ν X 0 , and scaling sequence cn which is regularly varying with index 1/α. • The last condition implies that the scaling sequences c∗n (i), i = d + 1, . . . , d + h, have the right order of magnitude and that the limiting conditional distribution of Xd+i /c∗n (d + i) given |X 0 | > cn exists and is non-degenerate. • These assumptions do not imply that the vector X is multivariate regularly varying in Rd+h . Hence the terminology partial regular variation. Theorem 3.2.2 The following statements are equivalent: (i) The vector X satisfies Definition 3.2.1. (ii) There exist α > 0, κd+1 ≥ 0, . . . , κd+h ≥ 0, a function b0 : (0, ∞) → (0, ∞) which is regularly varying at infinity with index α, functions bd+i , . . . , bd+h which are regularly varying at infinity with indices κd+i and a B0,h -boundedly finite measure μX on Rd \ {0} × Rh such that    X 0 Xd+1 Xd+h v# , ,..., b0 (t)P (3.2.2) ∈ · −→ μX , t bd+1 (t) bd+h (t) in (Rd \ {0} × Rh , B0,h ). If (i) and (ii) hold, then the following properties hold: (a) The sequence {cn } is regularly varying at infinity with index 1/α and the function b0 in (ii) can be chosen as 1/P(X ∈ tA) for any set A such that μX (A × Rh ) > 0. (b) For every Borel set C0 ⊂ Rd \ {0} separated from zero and all Borel sets Cd+1 , . . . , Cd+h ⊆ R, μX (tC0 × tκd+1 Cd+1 × · · · × tκd+h Cd+h ) = t−α μX (C0 × Cd+1 × · · · × Cd+h ) . (3.2.3) (c) If (μX , cn , c∗n (d+i), 1 ≤ i ≤ h) and (μX , cn , c∗n  (d+i), 1 ≤ i ≤ h) are as in Definition 3.2.1, then there exist constants ςi > 0, i = 0, . . . , h such that cn = ς0 , n→∞ cn lim

c∗n  (d + i) = ςi n→∞ c∗ n (d + i) lim

62

3 Dealing with extremal independence

and μX (ς0 C0 × ς1 C1 × · · · × ςh Ch ) = μX (C0 × · · · × Ch ) . Proof. By Theorem 2.1.3, we know that (3.2.1) implies that X 0 is regularly varying so we can choose b0 (t) = 1/P(|X 0 | > t) and cn = b← 0 (n). Recall that [x] denotes the greatest integer smaller than or equal to the real number x. Since b0 is regularly varying with index α > 0, we know by Proposition 1.1.8 that b← 0 is regularly varying with index 1/α and by Lemma 1.1.3 that b← b← ([b0 (t)]) 0 (b0 (t)) = lim 0 =1. t→∞ t→∞ t t lim

Let A be the collection of proper differences of subsets A ⊂ Rd \ {0} × Rh of the form A = C0 × B where C0 is a semi-cone of Rd \ {0} separated from zero, and B a Borel subset of Rh . The class A is a π-system and countably generates the open sets Rd \ {0} × Rh by (Lemma A.2.8). Thus the class A is measure determining by Theorem A.1.7. Define, for i = 1, . . . , h, bd+i (t) = c∗[b0 (t)] (d + i) . Let C0 be a bounded semi-cone in Rd \ {0}. For each s > 0 such that sC0 × C1 × · · · × Ch is a continuity set of μX , the convergence  lim nP

n→∞

X0 Xd+1 Xd+h ∈ C1 , . . . , ∗ ∈ Ch ∈ sC0 , ∗ cn cn (d + 1) cn (d + h)



= μX (sC0 × C1 × · · · × Ch ) is locally uniform by monotonicity and Lemma E.4.13. Therefore, if C0 × · · · × Ch is a continuity set for μX , writing n = [b0 (t)], we have   X0 Xd+1 Xd+h ∈ C0 , ∈ C1 , . . . , ∈ Ch lim b0 (t)P t→∞ t bd+1 (t) bd+h (t)  X0 b0 (t) b← ([b0 (t)]) = lim nP C0 , ∈ 0 t→∞ [b0 (t)] cn t  Xd+1 Xd+h ∈ C1 , . . . , ∗ ∈ Ch c∗n (d + 1) cn (d + h) = μX (C0 × C1 × · · · × Ch ). This shows that μX is the only possible vague# limit of the sequence of measures    X 0 Xd+1 Xd+h , ,..., μt = b0 (t)P ∈· . t bd+1 (t) bd+h (t)

3.2 Conditioning on an extreme event

63

If we prove that the collection of measures {μt , t ≥ 1} is tight, then (3.2.2) follows. The argument is similar to the one used in the proof of Theorem B.2.2 and is omitted. To prove that the functions bd+i are regularly varying, we can assume without loss of generality that h = 1 and replacing X 0 by |X 0 | that d = 0 and X0 is non-negative. We assume also that limn→∞ nP(X0 > cn ) = 1. Write then simply c∗n for c∗n (1) and define H(x, y) = μX ((x, ∞) × (−∞, y]) , x > 0, y ∈ R . By (3.2.1), we have, for all but countably many x > 0 and y ∈ R, lim nP(X0 > cn x, X1 ≤ c∗n y) = H(x, y) .

n→∞

Let Fn be the distribution function of X1 given X0 > cn . Then, for a fixed t > 0, we thus have lim F[nt] (y/c∗[nt] ) = H(1, y) .

n→∞

We also have, for each fixed y and by local uniform convergence, lim F[nt] (y/c∗n ) = lim [nt]P(X0 > c[nt] , X1 ≤ c∗n y) n→∞   c[nt] [nt] ∗ nP X0 > cn = lim , X1 ≤ cn y = H(t1/α , y) . n→∞ n cn

n→∞

By the Convergence to Types Theorem A.2.17, this implies that for every t > 0, there exists λt > 0 such that lim

n→∞

c∗[nt] c∗n

= λt .

By Proposition 1.2.2, this proves that the sequence c∗n is regularly varying with index ρ ≥ 0 since cn is non-decreasing. Thus the function b defined by b(t) = c∗[b0 (t)] is regularly varying with index ρα. This shows that (i) =⇒ (ii). If (i) and (ii) hold, we already noted that X 0 is regularly varying so (a) holds. We now prove (b). On one hand, (3.2.2) implies that   X0 Xd+1 Xd+h ∈ C0 , ∈ C1 , . . . , ∈ Ch lim b0 (t)P t→∞ st bd+1 (st) bd+h (st)  X0 b0 (t) b0 (st)P ∈ C0 , = lim t→∞ b0 (st) st  Xd+1 Xd+h ∈ C1 , . . . , ∈ Ch bd+1 (st) bd+h (st) = s−α μX (C0 × Cd+1 × · · · × Cd+h ) .

64

3 Dealing with extremal independence

On the other hand, because of regular variation of the functions bj ,   X0 Xd+1 Xd+h ∈ C0 , ∈ C1 , . . . , ∈ Ch lim b0 (t)P t→∞ st bd+1 (st) bd+h (st)  X0 = lim b0 (t)P ∈ sC0 , t→∞ t  Xd+1 bd+1 (st) bd+h (st) Xd+h ∈ C1 , . . . , ∈ Ch bd+1 (t) bd+1 (t) bd+h (t) bd+h (t) = μX (sC0 × sκd+1 Cd+1 × · · · × sκd+h Cd+h ) . Lastly, (c) follows by a second application of the Convergence to type Theorem A.2.17.  In order to stress the importance of the index of regular variation of the functions bj , we introduce the following terminology. Definition 3.2.3 (Conditional scaling exponent) If (3.2.2) holds, the index κd+j of regular variation of the function bd+j , j = 1, . . . , h, is called the lag j conditional scaling exponent of Xd+j given X 0 is extreme.

Relation to extremal independence As already mentioned, Definition 3.2.1 does not assume regular variation of the vector X. If X is regularly varying, then by the Convergence to Types Theorem, limt→∞ t−1 bd+i (t) < ∞, since otherwise the non-degeneracy assumption would not hold. If limt→∞ t−1 bd+i (t) = 0 for one i, then X is extremally independent. Limiting Conditional Distribution The sequence cn in Definition 3.2.1 is such that μX ((B1c (3 =, 0)) × Rh ) = ν X 0 (B1c (3 =, 0)) = 1. In that case, write   x = X 0 , Xd+1 , . . . , Xd+h X . (3.2.4) x bd+1 (x) bd+h (x) Then, (3.2.2) yields, for all bounded continuous functions f on Rd \ {0} × Rh , x ) | |X 0 | > x] = μ (f 1B c (3=,0)×Rh ) . lim E[f (X X 1

x→∞

(3.2.5)

3.2 Conditioning on an extreme event

65

Let (Υ 0 , Υd+1 , . . . , Υd+h ) be a random vector with distribution μX (· ∩ ((B c (0, 1) × Rh )). Then P(|Υ 0 | > 1) = 1 and |Υ 0 | is a Pareto random variable with tail index α. Define the vector (W 0 , Wd+1 , . . . , Wd+h ) by   Υ0 Υd+1 Υd+h , ,..., W = (W 0 , Wd+1 , . . . , Wd+h ) = . (3.2.6) κ |Υ 0 | |Υ 0 |κd+1 |Υ 0 | d+h Then, W is independent of |Υ 0 | and the convergence (3.2.5) can be reexpressed as x ) | |X 0 | > x] lim E[f (X

x→∞

= E[f (Υ 0 , Υd+1 , . . . , Υd+h )] ∞ = E[f (uW 0 , uκd+1 Wd+1 , . . . , uκd+h Wd+h )]αu−α−1 du .

(3.2.7)

1

This also yields

  X0 Xd+1 Xd+h , ,..., lim E f | |X 0 | > x x→∞ |X 0 | bd+1 (|X 0 |) bd+h (|X 0 |) = E[f (W 0 , Wd+1 , . . . , Wd+h ) .

(3.2.8)

The measure μX can be recovered from the vector (W 0 , W1 , . . . , Wh ). Proposition 3.2.4 Let X satisfy Definition 3.2.1 and assume that μX (B1c (3 =, 0) × Rh ) = 1. Let (W 0 , W1 , . . . , Wh ) be as in (3.2.6). Then, ∞ μX = E[δ(sW 0 ,sκ1 W1 ,...,sκh Wh ) ]αs−α−1 ds . (3.2.9) 0

Examples Example 3.2.5 (Example 2.2.10 and 3.1.6 continued) Recall that the vector X is defined by Xj = σj Zj , j = 0, . . . , h, where Z0 , . . . , Zh are i.i.d. regularly varying positive random variables with index −α and common distribution function FZ , and (σ0 , . . . , σh ) is a vector of positive, possibly dependent random variables such that E[σ0α+ ] < ∞ for some  > 0, and (Z0 , . . . , Zh ) is independent of (σ0 , . . . , σh ). Denote Z j,h = (Zj , . . . , Zh ), σ j,h = (σj , . . . , σh ), j = 1, . . . , h − 1, and define the multiplication of vectors as componentwise. For D ⊂ Rh , by Breiman’s Lemma (Theorem 1.4.3) and Potter’s bounds (Lemma 1.4.2), we have

66

3 Dealing with extremal independence

lim P((X1 , . . . , Xh ) ∈ D | X0 > x)

x→∞

  E P(σ 1,h · Z 1,h ∈ D)F Z (u0 x/σ0 ) = lim x→∞ P(X0 > x) E [P(σ 1,h · Z 1,h ∈ D) σ0α ] . = E[σ0α ]

Hence, we can choose b1 = · · · = bh ≡ 1 to obtain a non-zero limit in (3.2.2). The joint limiting conditional distribution of (X0 , Xj ), j = 1, . . . , h, given that X0 is extreme is thus lim P(X0 > xu0 , Xj ≤ u1 | X0 > x) = u−α 0

x→∞

E [FZ (u1 /σj )σ0α ] . E[σ0α ]

(3.2.10)

Choosing C = {(x0 , x1 ) : x0 + x1 > 1} or C = {(x0 , x1 ) : x0 ∨ x1 > 1}, we obtain, for D ⊆ Rh−1 , lim P((X2 , . . . , Xh ) ∈ D | X0 + X1 > x)

x→∞

=

E [P(σ 2,h · Z 2,h ∈ D) {σ0α + σ1α }] , E[σ0α + σ1α ]

lim P((X2 , . . . , Xh ) ∈ D | X0 ∨ X1 > x)

x→∞

=

E [P(σ 2,h · Z 2,h ∈ D) {σ0α ∨ σ1α }] . E[σ0α ∨ σ1α ]

In the present framework, we cannot consider C = {(x0 , x1 ) : x0 > 1, x1 > 1} as a conditioning set, since ν (X0 ,X1 ) (C) = 0, where ν (X0 ,X1 ) is the exponent  measure of (X0 , X1 ). Example 3.2.6 (Example 3.1.7 continued) We now use different normalizations for X0 and X1 . By conditioning on Z1 , we obtain, P(X0 > xu0 ,X1 > xγ u1 ) ∞ = F Z ((xu0 /z)1/γ ) F Z ((x/z)γ u1 ) FZ (dz) 0 ∞ 1/γ F Z (w−γ u1 ) F Z (u0 w−1/γ ) FZ (xdw) = 0 ∞ F Z (w−γ u1 ) F Z ((u0 /w)1/γ ) αw−α−1 dw . ∼ F¯Z (x) 0

The integral is convergent at zero since γ + 1/γ > 1. Since P(X0 > x) ∼ E[Z0γα ]F Z (x), this yields lim P(X0 > xu0 , X1 > xγ u1 | X0 > x) ∞ F Z (w−γ u1 ) F Z ((u0 /w)1/γ ) αw−α−1 dw . (3.2.11) = 0 E[Z0αγ ]

x→∞

3.2 Conditioning on an extreme event

67

Hence, b1 (x) = xγ and the conditional limiting distribution is not in product form. Since X0 and X2 are independent, b2 (x) = 1. The conditional scaling functions corresponding to different lags are different. Consider now C = {(x0 , x1 ) : x0 ∨ x1 > 1}. Conditioning on Z2 yields, for u0 ≥ 1 and u2 ≥ 0, P(X0 ∨ X1 > xu0 , X2 ≤ xγ u2 ) = P(Z0γ Z1 ∨ Z1γ Z2 > xu0 , Z2γ Z3 ≤ xγ u2 ) ∞ = P(Z0γ Z1 ∨ Z1γ z > xu0 )P(Z3 ≤ (x/z)γ u2 )FZ (dz) 0 ∞ = P((Z0γ Z1 /x) ∨ (wZ1γ ) > u0 )FZ (w−γ u2 )FZ (xdw) . 0

Since limx→∞ P((Z0γ Z1 /x) ∨ wZ1γ > u0 ) = F ((u0 /w)1/γ ) and a probability is bounded by one, by bounded convergence, we obtain P(X0 ∨ X1 > u0 x , X2 ≤ xγ u2 ) x→∞ F Z (x) ∞ = F Z ((u0 /w)1/γ )FZ (w−γ u2 )αw−α−1 dw . lim

0

Taking u2 = ∞, we obtain that P(X0 ∨ X1 > u0 x) ∼ E[Z0αγ ]F Z (x) and lim P(X0 ∨ X1 > u0 x, X2 ≤ xγ u2 | X0 ∨ X1 > x) ∞ 1 = F Z ((u0 /w)1/γ )FZ (w−γ u2 )αw−α−1 dw . E[Z0αγ ] 0

x→∞

Again, the limiting distribution does not have a product form.



Example 3.2.7 (An example with κ = 0 but b not constant) Assume that Z0 , Z1 , Z2 are i.i.d. Pareto random variables with tail index α > 0, that is, F Z (x) = P(Z0 > x) = x−α , x > 1. Consequently, log(Z0 ) has an exponential distribution. Define X0 = Z1 log(Z0 ), X1 = Z2 log Z1 . By Breiman’s Lemma, X0 is regularly varying with index −α and P(X0 > x) ∼ E[{log Z0 }α ] P(Z0 > x) = α1−α Γ (α)P(Z0 > x) as x → ∞. Conditioning on Z1 and the choice b(x) = log(x) yield, for y ≥ 1 and z ≥ 0, P(X0 ≤ xy , X1 ≤ b(x)z | X0 > x)   ∞  z log(x) xy −α−1 1 x < log(Z0 ) ≤ αu P Z2 ≤ du = P P(X0 > x) 1 log(u) u u     z log(x) P(Z1 > x) ∞ FZ = P v −1 < log Z0 ≤ v −1 y αv −α−1 dv . P(X0 > x) 1/x log(xv)

68

3 Dealing with extremal independence

Since log(xv)/ log(x) converges to 1 on (0, ∞) (uniformly on compact sets) and since the distribution function FZ is continuous and bounded, by dominated convergence, we obtain that lim P(X0 ≤ xy , X1 ≤ b(x)z | X0 > x) ∞   FZ (z) = 1−α P v −1 < log Z0 ≤ v −1 y αv −α−1 dv α Γ (α) 0 ∞ FZ (z) = 1−α {e−αw − e−αyw }αwα−1 dv α Γ (α) 0 = FZ (z) {1 − y −α } = FZ (y)FZ (z) .

x→∞

We see that the scaling function is log(x) and the conditional scaling exponent is thus 0.  Example 3.2.8 (A limiting distribution does not necessarily exist) Consider a stationary standard Gaussian process {ξn , n ≥ 0} and define 2 σt = ecξt , with c < 1/2. Assume moreover that |cov(ξ0 , ξn )| < 1 for all n ≥ 1. This is not a stringent assumption since a sufficient condition is that the π process {ξn } has a spectral density f such that −π f (t) dt = 1. In that case, asymptotic independence holds for the bivariate distributions, but a non2 2 trivial conditional limiting distribution ecξh given ecξ0 > x does not exist.  Convergence of expectations We can extend Proposition 3.2.4 to unbounded functions. Recall (3.2.4). Proposition 3.2.9 Let X be a random vector in Rd+h satisfying Definition 3.2.1. Let g be a measurable function defined on Rd+h such that   x )|1{|X 0 | ≤ x} E |g(X =0, (3.2.12) lim lim sup →0 x→∞ P(|X 0 | > x) and either g is bounded or there exists δ > 0 such that   x )|1+δ 1{|X 0 | > x} E |g(X lim sup x) x→∞ Then

0

and



E[|g(sW 0 , sκ1 W1 , . . . , sκh Wh )|]αs−α−1 ds < ∞

(3.2.13)

3.2 Conditioning on an extreme event

x )] = lim E[g(X

x→∞

0



69

E[g(sW 0 , sκ1 W1 , . . . , sκh Wh )]αs−α−1 ds . (3.2.14)

Proof. Let μx be the measure defined on Rd \ {0} × Rh by μx =

x ∈ ·) P(X . P(|X 0 | > x)

(3.2.15)

v#

By assumption, μx −→ μX on (Rd \ {0}) × Rh . Conditions (3.2.12) and (3.2.13) imply those of Theorem B.1.22, thus (3.2.14) holds.  Example 3.2.10 (Example 3.2.5 continued) If α > 1 then for any j = 1, . . . , h we have, integrating (3.2.10), lim E [Xj | X0 > x] =

x→∞

E[Z0 ]E[σj σ0α ] . E[σ0α ]

We see that the CTE here converges to a constant without any normalization. This is coherent with the result obtained in Example 3.2.5.  Example 3.2.11 (Example 3.2.6 continued) If α > 1 > γ, we have, integrating (3.2.11), lim x−γ E[X1 | X0 > x] =

x→∞

γ(α−γ)

] αE[Z0 ]E[Z0 . αγ (α − γ)E[Z0 ]

The growth of the CTE is here at the sublinear rate xγ .



3.2.1 Products of extremally independent random variables It was already mentioned in Section 2.1.4 that if (X0 , X1 ) is a regularly varying random vector with extremal independence, then the possible regular variation of the product X0 X1 is not a consequence of the multivariate regular variation. The heuristic reason is that a product can be big when one of the factor is big or when both are big, so we need to have information on the probability that both factors are big. This information is not given by the exponent measure when there is extremal independence. For simplicity we only consider the case of non-negative components.

70

3 Dealing with extremal independence

Proposition 3.2.12 Let (X0 , X1 ) be partially regularly varying with limiting measure μX such that μX ((1, ∞) × [0, ∞)) = 1 and assume moreover that for all y > 0, lim lim sup

→0 x→∞

P(X0 1{X0 ≤ x}X1 > xb(x)y) =0. P(X0 > x)

Then 

α 1+κ1

E W1









= 0

0



1{uv > 1}μX (du, dv) < ∞

(3.2.16)

(3.2.17)

and   α α P (X0 X1 > xb(x)y) = y − 1+κ1 E W11+κ1 . n→∞ P(X0 > x) lim

(3.2.18)

Proof. By vague# convergence and homogeneity, we have, for all  > 0, ∞ ∞ P (X0 1{X0 > x}X1 > xb(x)y) = 1{uv > y}μX (du, dv) lim x→∞ P(X0 > x)    0 α = E (W1 /y) 1+κ1 ∧ −α . By (3.2.16) and monotone convergence, this yields ∞ ∞ P (X0 X1 > xb(x)y) = lim lim 1{uv > y}μX (du, dv) n→∞ →0  P(X0 > x) 0   α α = y − 1+κ1 E W11+κ1 . We must now prove that the latter quantity is finite. This follows from condition (3.2.16). Indeed, choose  > 0 such that lim sup x→∞

P (X0 1{X0 ≤ x}X1 > xb(x)y) ≤1. P(X0 > x)

Then, for every 0 < η < , we have  ∞ P (X0 1{ηx < X0 ≤ x}X1 > xb(x)y) 1{uv > y}μX (du, dv) = lim x→∞ P(X0 > x) η 0 P (X0 1{X0 ≤ x}X1 > xb(x)y) ≤1. P(X0 > x) ∞ By monotone convergence, this proves that 0 0 1{uv > y}μX (du, dv) < ∞.  ≤ lim sup x→∞

3.3 Problems

71

If X0 and X1 are fully independent, then negligibility condition (3.2.16) holds  under the condition of Breiman’s Theorem 1.4.3 i.e. E[X1α ] < ∞ for some α > α > 0.

3.3 Problems 3.1 In Section 3.1, give the equivalent of (3.1.5) and (3.1.6) for the 2 norm. For a bivariate vector with i.i.d. components, give the equivalent of (3.1.7) for the 2 norm. 3.2 In Section 3.1, give the equivalent of (3.1.5) and (3.1.6) for the ∞ norm. For a bivariate vector with i.i.d. positive components, give the equivalent of (3.1.7) for the ∞ norm. 3.3 Let X0 and X1 independent non-negative random variables. 1. Assume that X0 and X1 are Pareto with indices α and β, respectively. ˜ = α + β. Show that (X0 , X1 ) has hidden regular variation with index α 2. Assume that X0 is regularly varying with index α. Show that (X0 , X1 ) has partial regular variation with b1 (x) = 1. 3. Assume that X0 is regularly varying with index α and E[X1α+δ ] < ∞. Show that (3.2.16) holds. Conclude that X0 X1 is regularly varying with index α. 4. Assume that X0 and X1 have the same Pareto distribution with tail index α. Show that (3.2.16) does not hold. 3.4 Let {E0 , E1 , E2 } be i.i.d. random variables with an exponential distribution with mean 1. For q ∈ (0, 1) and c > 0, define Xi =ec(qEi +Ei+1 ) , i = 0, 1. 1. Prove that (X0 , X1 ) is regularly varying with tail index α = 1/c and is extremally independent. 2. Compute the index of hidden regular variation of (X0 , X1 ). 3. Compute the conditional scaling exponent of X1 given X0 is extreme. 4. Compute the tail index of X0 X1 . Hint: See Example 3.1.7, 3.2.6. 3.5 Let X0 , X1 be random variables. Assume that there exists a function b and a proper distribution function G on R such that for all t > 0 and y ∈ R lim

x→∞

P(X0 > tx, X1 ≤ yb(x)) = t−α G(y) . P(X0 > x)

Prove that b is slowly varying at infinity.

72

3 Dealing with extremal independence

3.6 Let Zi be i.i.d. unit exponential random variables. Fix α > 0, a ∈ (0, 1), and h ≥ 1. Define Yj =

∞ 

−1

aj Zi−j , Xj = eα

Yj

, j = 0, h .

j=0

1. Prove that the vector (X0 , Xh ) is regularly varying with tail index α and extremally independent. 2. Compute the scaling exponent κh and the limiting distribution of Xh suitably normalized given X0 > x when x → ∞. 3. Compute the hidden regular variation index of (X0 , Xh ) and the hidden spectral measure. 4. Compute the regular variation index of X0 Xh . 3.7 Assume that (X, Y ) is a vector of non-negative random variables which satisfies Definition 3.2.1 with conditional scaling exponent κ and let W1 be as in (3.2.6). Let q > 0. Assume that there exists δ > 0 such that   q+δ Y |X>x 0,

q Y 1 αE[W1q ] qκ−α E q 1{X>xs} = s lim . x→∞ P(X > x) b (x) α − qκ 3.8 Let X0 , Y0 be i.i.d. non-negativerandom variables, regularly varying with tail index α > 0 and κ ∈ (0, 1). Define X1 = X0κ + Y0 . 1. Prove that (X0 , X1 ) is regularly varying and extremally independent. 2. Show that the index of hidden regular variation of (X0 , X1 ) is α/κ. 3. Show that the conditional scaling exponent of X1 given X0 is κ.

3.4 Bibliographical notes Hidden regular variation was introduced formally in [Res02] using the cone Cd,2 (E0 in the notation of [Res02]). The index of hidden regular variation α ˜ was anticipated in [LT96] as the tail dependence coefficient η = α/˜ α. In [MR04] hidden regular variation was introduced relative to a given subcone. In particular, besides E0 , the cone E00 = (0, ∞) is used. [MR04, Example 5.1] (cf. Example 3.1.5) gives different indices of regular variation on E0 and E00 . This implies that the minimum is not characterized by the hidden regular

3.4 Bibliographical notes

73

variation on Cd,2 = E0 as erroneously stated in [Res02, Theorem 1 and page 304]. The issue is that the function d(a) that appears in [Res02, Theorem  ˜ X ({x ∈ Rd : di=1 ai xi > 1}), where at 1(ii)] may vanish. Indeed, d(a) = ν least two components of a = (a1 , . . . , ad ) are finite. If d = 3 and a1 = a2 = a3 = 1, then d(a) = 0. Conditioning on an extreme event is related to the old problem of the concomitant, that is, the Y observation corresponding to the maximum of the X observation in a bivariate sample (X1 , Y1 ), . . . , (Xn , Yn ). The earlier references were concerned with elliptical distributions with light-tailed radial component. See for instance [EG81] or [Ber92, Theorem 12.4.1] and (Problem 7.7). For regularly varying random variables, [MRR02] seems to be the first article that deals (implicitly) with partial regular variation by considering vague convergence of vectors of non-negative random variables on the cone (0, ∞) × [0, ∞). A rigorous framework for conditioning on extreme event is [HR07] in the case of both Gumbel and Fr´echet domain of attraction. The name “conditional extreme value” has been used for the condition in Definition 3.2.1. We have avoided this terminology since it is ambiguous: the conditioning is on the variable which is in the domain of attraction of an extreme value distribution, and the limiting conditional distribution of the second one, given the first one is extremely large, is not necessarily an extreme value distribution. This terminology has also been used to describe extreme value theory in presence of covariates. Hidden regular variation and conditioning on extreme event can be studied in a unified manner using regular variation on subcones. The cone E0 gives hidden regular variation while (0, ∞) × [0, ∞) yields conditioning on an extreme event in the sense of Section 3.2. See [Res08, MR11, DMR13]. In [DR11] the authors discuss links between hidden regular variation and conditioning on extreme event. As shown in [DR11, Example 4], it is possible to construct random vectors which have hidden regular variation, but not the other property. The term conditional scaling exponent was introduced in [KS15]. Different issues with conditioning on extreme event are discussed in [DJ17]. These issues are mainly due to compactification with a point at ∞ and do not arise when vague# convergence is used as in the present chapter. In the framework of conditioning on extreme events, the first results on regular variation of products can be found in [MRR02]. Lemma 2.2 therein corresponds to the case κ = 0 and b(x) ≡ 1 in Proposition 3.2.12. Proposition 3.2.12 is adapted from [KS15]. See also [HM11] for a variety of similar results. Risk measures (including conditional expectations) under hidden regular variation and conditional extreme value model are considered in [DFH18] and [KT19], respectively. Problem 3.7 is adapted from [KT19].

4 Regular variation of series and random sums

In this chapter, we will consider infinite weighted sums of i.i.d. regularly varying random vectors. We will provide conditions on the weights, possibly random and possibly dependent of the summands, to ensure the existence and the regular variation of the sum of the series. This will be the main result of the chapter, Theorem 4.1.2. We will apply it to series with deterministic weights and random sums. The results of this chapter will be useful to study certain models in Part III. Throughout this chapter, |·| denotes an arbitrary norm on Rd and · the associated matrix norm.

4.1 Series with random coefficients Let {Z j , j ∈ Z} be a sequence of i.i.d. regularly varying Rd -valued random vectors with tail index α > 0 and let {Aj , j ∈ Z} be a sequence of random d × d matrices. We want to generalize Corollary 2.1.14 to series of the form ∞ 

Ai Z i .

(4.1.1)

i=0

For an infinite series, the dependence structure must be specified and conditions for the summability of the series are also needed. We first introduce the assumption on the dependence. Assumption 4.1.1 (i) {Z j , j ∈ Z}[N] is a sequence of i.i.d. random vectors. (ii) The distribution of Z 0 is multivariate regularly varying with tail index α > 0 and exponent measure ν Z characterized by

© Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 4

75

76

4 Regular variation of series and random sums

P(x−1 Z 0 ∈ ·) v# −→ ν Z , P(|Z 0 | > x) as x → ∞ in Rd \ {0}. (iii) There exists a filtration {Fj , j ∈ Z}[N] such that, for all j ≥ 0, • Aj is Fj measurable and Z j is Fj+1 measurable; • the σ-fields Fj and σ(Z k , k ≥ j) are independent.

Under Assumption 4.1.1, for each fixed j, Aj and Z j are independent. If α+ moreover E[Ai  ] < ∞ for some  > 0, then by Corollary 2.1.14, Ai Z i is regularly varying with exponent measure E[ν Z ◦ A−1 i ], i.e. P(x−1 Ai Z i ∈ ·) v# −→ E[ν Z ◦ A−1 i ]. P(|Z 0 | > x) We thus set νi = E[ν Z ◦ A−1 i ].

(4.1.2)

If ΘZ 0 is a random vector whose distribution is the spectral measure of Z 0 (with respect to the chosen norm) and independent of {Aj , j ∈ Z}, then we have by Corollary 2.1.14 and Proposition 2.2.3 (see also Problem 2.4),  ∞ −α−1 νi (B) = P(rAi ΘZ dr . 0 ∈ B)αr 0

Choosing B = B c (0, 1) yields  ∞    α c −α−1   ] ≤ E[Ai α ] . νi (B (0, 1)) = P(r Ai ΘZ dr = E[Ai ΘZ 0 > 1)αr 0 0

Therefore, if ν by

∞

i=0

α

E [Ai  ] < ∞, we can define a boundedly finite measure

ν=

∞ 

E[ν Z ◦ A−1 i ].

(4.1.3)

i=0

Moreover, ν is not the null measure if and only if

∞ i=0

 α  ] > 0. E[Ai ΘZ 0

4.1 Series with random coefficients

77

Theorem 4.1.2 Let Assumption 4.1.1 hold with E[Z 0 ] = 0 if α > 1. Assume that there exists  ∈ (0, α) such that ∞ 

α−

E[Ai 

+ Ai 

α+

] < ∞ if α ∈ (0, 1) ∪ (1, 2) ,

(4.1.4a)

i=0

⎡ (α+)/(α−) ⎤ ∞  α− ⎦ < ∞ if α ∈ {1, 2} , E⎣ Ai 

(4.1.4b)

i=0

⎡ (α+)/2 ⎤ ∞  2 ⎦ < ∞ if α > 2 . E⎣ Ai 

(4.1.4c)

i=0

Then

∞ i=0

E[Ai α ] < ∞. If moreover ∞   α  >0, E A i ΘZ 0

(4.1.5)

i=0

∞ d then ∞ ν = i=0 νi is a non-zero boundedly finite measure on R \ {0} and i=0 Ai Z i is regularly varying with tail index α and exponent measure ν, i.e. ∞ P(x−1 i=0 Ai Z i ∈ ·) v# −→ ν , (4.1.6) P(|Z 0 | > x) as x → ∞ in Rd \ {0}.

Before proving Theorem 4.1.2, we give several corollaries. In the case α > 1 we can deal with a non-zero expectation. Corollary 4.1.3 Let Assumption 4.1.1 and the conditions (4.1.4) ∞ hold with α > 1 and E[Z 0 ] = 0. If in addition the series SA = i=0 Ai is almost surely convergent and P(|SA E[Z 0 ]| > x) =0, n→∞ P(|Z 0 | > x) lim

then the conclusion of Theorem 4.1.2 still holds.

Proof Write

78

4 Regular variation of series and random sums ∞ 

Ai Z i =

i=0

∞ 

Ai (Z i − E[Z 0 ]) + E[Z 0 ]

i=0

∞ 

Ai .

i=0

Under the assumptions of the corollary, the second series is summable and with a lighter tail than the first one to which we can apply Theorem 4.1.2. To deal easily with two-sided series with random coefficients, we will assume that the weights are independent from the series Z. Assumption 4.1.4 (i) {Z j , j ∈ Z}[Z] is a sequence of i.i.d. random vectors. (ii) The distribution of Z 0 is multivariate regularly varying with tail index α > 0 and exponent measure ν Z . (iii) The sequences {Z j , j ∈ Z}[Z] and {Aj , j ∈ Z}[Z] are independent.

Corollary 4.1.5 Let Assumption 4.1.4 hold. Assume moreover that the conditions (4.1.4) hold for the two-sided series. If α > 1 we assume addi∞ tionally that E[Z 0 ] = 0. Then i=−∞ E[Ai α ] < ∞, ν is a boundedly ∞ finite measure on Rd \ {0} and i=−∞ Ai Z i is regularly varying with tail index α and exponent measure ν (in the sense of (4.1.6)) if (4.1.5) holds.

Proof of Theorem 4.1.2 The proof consists in proving first the convergence of the series, then the regular variation of a finite sum with exponent measure given by (4.1.3) and finally using a truncation argument. Summability of

∞ i=0

E[Ai α ] α

α−

α+

For α ∈ (0, 1) ∪ (1, 2), Ai  ≤ Ai  + Ai  , thus (4.1.4a) implies the required summability. To deal with the other cases, note that for all 0 < b ≤ a and non-negative numbers {xj , j ∈ N}, ∞  j=0

⎛ xaj ≤ ⎝

∞  j=0

⎞a/b xbj ⎠

.

(4.1.7)

4.1 Series with random coefficients

79

Thus, for α = 1, 2 and  ∈ (0, α) we have by H¨ older inequality   ∞ ∞   α E [Ai  ] = E Ai α i=0

i=0

⎡ α ⎤ α− ∞  ⎦ ≤ E⎣ Ai α− i=0

α ⎛ ⎡ α+ ⎤⎞ α+ α− ∞  ⎦⎠ ≤ ⎝E ⎣ Ai α− 2, (4.1.7) and (4.1.4c) yield ∞ 

 E [Ai α ] = E

i=0

∞ 

⎡



Ai α ≤ E ⎣

i=0

Convergence of the series

∞ 

α2 ⎤ Ai 2

⎦ 1, let p  = α −  with  such that p > 1 if α ≤ 2 and p = 2 if n α > 2. Set Sn = i=0 Ai Z i . Then Sn is a martingale and by Burkholder’s inequality (E.1.6), we have for all n E[|Sn |p ] ≤ E[|Z 0 |p ]

∞ 

p

E[Ai  ] < ∞ .

i=0

By the martingale convergence Theorem E.1.5, this implies that Sn converges almost surely. Regular variation, case of a finite sum m For a fixed m, write Sm = i=0 Ai Z i . We will prove first that the vectors Ai Z i are pairwise extremally independent in the sense of Definition 2.1.7, then apply Proposition 2.1.8 to prove that the vector W m = (A0 Z 0 , . . . , Am Z m ) is regularly varying and identify its exponent measure. We will conclude by applying Proposition 2.1.12.

80

4 Regular variation of series and random sums

We prove pairwise extremal independence of A0 Z 0 and A1 Z 1 . The argument for the other pairs is exactly the same. Let FZ denote the distribution function of |Z 0 |. Then, recalling that (A0 , A1 ) is independent of Z 1 , by conditioning and applying Potter’s bound (1.4.3), we obtain, for x ≥ 1, P(|A0 Z 0 | > sx, |A1 Z 1 | > tx) F Z (x)   E 1{|A0 Z 0 | > sx}F Z (tx/ A1  ≤ F Z (x)

α  − ≤ cst E A1  (A1  ∨ A1  )1{|A0 Z 0 | > sx} . By the dominated convergence theorem, this implies lim sup x→∞

P(|A0 Z 0 | > sx, |A1 Z 1 | > tx) F Z (x)

α  − ≤ cst lim sup E A1  (A1  ∨ A1  )1{|A0 Z 0 | > sx} = 0 . x→∞

Thus we can apply Proposition 2.1.8 to show that the vector W m is regularly varying with exponent measure νW m =

m 

δ 0 ⊗ · · · ⊗ νi ⊗ · · · ⊗ δ 0 .

i=0

More precisely, for all continuity sets of the measure ν W m , lim

x→∞

P(W m ∈ xA) = ν W m (A) . P(|Z 0 | > x)

We now apply Proposition 2.1.12 with the map gm defined on Rd(m+1) by gm (x0 , . . . , xm ) = x0 + · · · + xm . −1 . This yields that Sm is regularly varying with exponent measure ν W m ◦ gm d For each i = 0, . . . , m and Borel set A ∈ R , we have −1 δ 0 ⊗ · · · ⊗ ν i ⊗ · · · ⊗ δ 0 ◦ gm (A)

= δ0 ⊗ · · · ⊗ νi ⊗ · · · ⊗ δ0 ({x ∈ Rd(m+1) : x0 + · · · + xm ∈ A}) = νi (A) . m Thus the exponent measure of Sm is ν m = i=0 νi . Truncation argument ∞ Denote S = i=0 Ai Z i . Let B ∈ Rd \ {0} be separated from 0 and such that ν(∂B) = 0, which implies that ν m (∂B) = 0 for all m ≥ 1. For each m ≥ 1, we have, by the preceding part of the proof,

4.1 Series with random coefficients

81

P(Sm ∈ xB) = ν m (B) . x→∞ P(|Z 0 | > x) lim

By the second part of Proposition 4.1.6, we have lim lim sup

m→∞ x→∞

P(|S − Sm | > x) =0. P(|Z 0 | > x)

(4.1.8)

For  > 0, let B c be the complement of B and B  be the -enlargement of B with respect to the chosen norm, i.e. B  = {x ∈ Rd | inf y ∈B |x − y| < }. For  small enough, B  is also bounded away from zero. Then, |P(S ∈ xB) − P(Sm ∈ xB)| ≤ P(Sm ∈ x{B  ∩ (B c ) }) + P(|S − Sm | > x) . Regular variation of Sm implies P(Sm ∈ x{B  ∩ (B c ) }) = ν m (B  ∩ (B c ) ) x→∞ P(|Z 0 | > x) lim

and since ν m (∂B) = 0, lim→0 ν m (B  ∩(B c ) ) = 0. Using (4.1.8), for m large enough and  small enough, this yields     P(S ∈ xB)  lim sup  − ν(B) P(|Z 0 | > x) x→∞    P(S ∈ xB) P(Sm ∈ xB)   = lim lim sup  − m→∞ x→∞ P(|Z 0 | > x) P(|Z 0 | > x)  P(|S − Sm | > x) ≤ lim ν m (B  ∩ (B c ) ) + lim lim sup =0. m→∞ m→∞ x→∞ P(|Z 0 | > x) This proves the regular variation of S with exponent measure ν. We now prove the tail bound which makes the truncation argument valid. Proposition 4.1.6 If the assumptions of Theorem 4.1.2 hold, there exists a constant which does not depend on the sequence {Aj } such that, for all x ≥ 1,     ∞  P  j=0 Aj Z j  > x P(|Z 0 | > x)

⎤ ⎡⎛ α− ⎞ (α−)∧2 ∞ ⎥ ⎢  (α−)∧2 ⎠ Aj  ≤ cst × E ⎣⎝ ⎦ j=0

⎤ ⎡⎛ α+ ⎞ (α−)∧2 ∞  ⎥ ⎢ (α−)∧2 ⎠ Aj  + cst E ⎣⎝ ⎦ . (4.1.9) j=0

82

4 Regular variation of series and random sums

Remark 4.1.7 The conditions (4.1.4) imply that the series in the right-hand side of (4.1.9) is finite. If α > 2,  must be chosen such that α −  > 2 since we only assume (4.1.4c). For any η > 0, the bound (4.1.9) is valid for x ≥ η with a constant depending on η. As a consequence of Proposition 4.1.6, we obtain that there exists  a sequence of positive constants {cm } such that limm→∞ cm = 0 and P(| j=m Aj Z j | > x) ≤ cm P(|Z 0 | > x). Hence,      ∞ P  j=m Aj Z j  > x =0. (4.1.10) lim sup m→∞ x≥1 P(|Z 0 | > x) Propositions 4.1.6 and (4.1.10) will be crucial in the subsequent chapters. It will allow to obtain limit theorems for time series defined as infinite sums by means of m-dependent approximation. ⊕ The proof of Proposition 4.1.6 will be done separately for the cases α ∈ (0, 1), α = 1, α ∈ (1, 2), α = 2 and α > 2. Note first that this is actually a onedimensional problem since it suffice to prove the bound separately for each  ∞ j  term of the form i=0 Aj,j i Z i , for 1 ≤ j, j ≤ d and the superscripts meaning the entries of the matrix and the coordinates of the vectors. Therefore, from now on, we assume that d = 1 and {Aj , j ∈ Z} and {Zj , j ∈ Z} are two sequences of random variables satisfying Assumption 4.1.1 and the summability conditions (4.1.4). Proof (Proof of Proposition 4.1.6: Case 0 < α < 1). We have ∞ P(|S| > x) = P(|S| > x , ∨∞ j=0 |Aj ||Zj | > x) + P(|S| > x , ∨j=0 |Aj ||Zj | ≤ x) ⎛ ⎞ ∞ ∞   P(|Aj ||Zj | > x) + P ⎝ |Aj ||Zj |1{|Aj ||Zj |≤x} > x⎠ . ≤ j=0

j=0

Conditioning on Aj and applying Potter’s bound (1.4.3) yield, for x ≥ 1, ∞  P(|Aj ||Zj | > x)

P(|Z0 | > x)

j=0

≤ cst

∞ 

E[|Aj |α− ∨ |Aj |α+ ] .

(4.1.11)

j=0

Next by Markov inequality, the independence of Aj and Zj , and Problem 1.8, we obtain, for x ≥ 1, ⎛ ⎞ ∞  P⎝ |Aj ||Zj |1{|Aj ||Zj |≤x} > x⎠ j=0 ∞



1 E[|Aj ||Zj |1{|Aj ||Zj |≤x} ] x j=0

≤ cst P(|Z0 | > x)

∞  j=0

E[|Aj |α (|Aj | ∨ |Aj |− )] .

(4.1.12)

4.1 Series with random coefficients

Combining (4.1.11) and (4.1.12) yields for all x ≥ 1,      ∞ ∞ P  j=0 Aj Zj  > x  ≤ cst E[|Aj |α (|Aj | ∨ |Aj |− )] < ∞ . P(|Z0 | > x) j=0

83

(4.1.13)

To conclude, note that ∞ 

⎡⎛ α+ ⎤ ⎞ α− ∞ ⎥ ⎢  E[|Aj |α+ ] ≤ E ⎣⎝ |Aj |α− ⎠ ⎦ .

j=0

j=0



This concludes the proof in the case α < 1.

Proof (Proof of Proposition 4.1.6: Case 1 < α < 2). Assume first that Z0 has a continuous distribution. Since moreover E[Z0 ] = 0, there exists a function h and a constant Ch ≥ 1 such that E[Z0 1{−h(x) ≤ Z0 ≤ x}] = 0 , x ≥ 0 , Ch−1 x

≤ h(x) ≤ Ch x , x ≥ Ch .

(4.1.14a) (4.1.14b)

See Problem 1.10. Fix v ≥ Ch . Then      ∞ P  j=0 Aj Zj  > x





P(|Z0 | > x)      ∞ P  j=0 Aj Zj 1{|Aj | ≤ x/v} > x/2

+

∞  P(|Aj | > x/v)

P(|Z0 | > x) P(|Z0 | > x) j=0      ∞ P  j=0 Aj Zj 1{−h(x/|Aj |) ≤ Zj ≤ x/|Aj |}1{|Aj | ≤ x/v} > x/4 P(|Z0 | > x) ∞  P(Zj ∈ / [−h(x/|Aj |), x/|Aj |], |Aj | ≤ x/v) + P(|Z0 | > x) j=0 +

∞  P(|Aj | > x/v) j=0

P(|Z0 | > x)

= I + II + III .

(4.1.15)

We consider these three terms starting with III. By the lower bound (1.4.1b) and Markov inequality, we have, for  > 0 such that (4.1.4a) holds, and x ≥ 1 III ≤ cst xα+

∞ 

P(|Aj | > x/v) ≤ cst v α+

j=0

∞ 

E[|Aj |α+ ] < ∞ . (4.1.16)

j=0

If |Aj | ≤ x/v, then x/|Aj | ≥ v ≥ Ch ≥ 1 and [−h(x/|Aj |), x/|Aj |] ⊃ [−Ch−1 x/|Aj |, x/|Aj |] ⊃ [−Ch−1 x/|Aj |, Ch−1 x/|Aj |] .

84

4 Regular variation of series and random sums

Therefore, / [−h(x/|Aj |), x/|Aj |]) ≤ P(Ch |Aj ||Zj | > x) . P(Zj ∈ This entails II ≤

∞  P(Ch |Aj Zj | > x) j=0

P(|Z0 | > x)

.

Conditioning on Aj and applying Potter’s bound (1.4.3) yield, for x ≥ 1, II ≤ cst

∞ 

E[|Aj |α (|Aj | ∨ |Aj |− )] .

(4.1.17)

j=0

Set ξj = Aj Zj 1{−h(x/|Aj |) ≤ Zj ≤ x/|Aj |}1{|Aj | ≤ x/v}. If |Aj | ≤ x/v, then x/|Aj | ≥ v, thus we can apply (4.1.14b) and Problem 1.8, which yields, for x ≥ 1, E[|ξj |α+ ] ≤ E[|Aj Zj |α+ 1{|Aj Zj | ≤ Ch x}] ≤ cst E[|Aj |α (|Aj | ∨ |Aj |− )]xα+ P(|Z0 | > x) . Moreover, the predictability framework yields that {ξj } is a martingale difference sequence. Thus, applying Markov inequality and Burkholder inequality (E.1.6) with α + , we obtain I ≤ cst

∞ 

E[|Aj |α (|Aj | ∨ |Aj |− )] .

j=0

Gathering the three terms, we obtain, for x ≥ 1,     ∞  ∞ P  j=0 Aj Zj  > x  ≤ cst E[|Aj |α (|Aj | ∨ |Aj |− )] < ∞ . P(|Z0 | > x) j=0

(4.1.18)

Finally, to remove the assumption that Z0 is continuous, let Z˜0 be a symmetric continuous random variable such that P(|Z0 | > x) lim =1. ˜0 | > x) x→∞ P(|Z Then Z0 + Z˜0 and Z0 − Z˜0 have the same distribution and both have a continuous distribution. Let {Z˜j , j ≥ 0} be a sequence of i.i.d. copies of Z˜0 , independent of {Aj } and {Zj }. Then,   ⎞ ⎞ ⎛ ⎛ ∞  ∞  ∞      Aj Zj  > x⎠ = P ⎝ Aj (Zj + Z˜j ) + Aj (Zj − Z˜j ) > 2x⎠ P ⎝  j=0   j=0  j=0  ⎞ ⎛ ∞    ≤ 2P ⎝ Aj (Zj + Z˜j ) > 2x⎠ . j=0  Therefore, the bound (4.1.18) holds without the additional assumption.



4.1 Series with random coefficients

85

Proof (Proof of Proposition 4.1.6: Case α = 1). This case is a bit more involved. Define the function L on R+ by L(x) = E[|Z0 |1{|Z0 | ≤ x}] . By Proposition 1.4.6, L is slowly varying at infinity. We use a decomposition into four terms:      ∞ P  j=0 Aj Zj  > x

≤ ≤

P(|Z0 | > x)      ∞ P  j=0 Aj Zj 1{|Aj | ≤ x/v} > x/2

+

∞  P(|Aj | > x/v)

P(|Z0 | > x) P(|Z0 | > x) j=0     ∞ P  j=0 Aj Zj 1{|Aj Zj | ≤ x}1{|Aj | ≤ x/v} > x/4

P(|Z0 | > x) ∞ ∞  P(|Aj Zj | > x, |Aj | ≤ x/v)  P(|Aj | > x/v) + + P(|Z0 | > x) P(|Z0 | > x) j=0 j=0      ∞ P  j=0 Aj {Zj 1{|Aj Zj | ≤ x} − L(x/|Aj |}1{|Aj | ≤ x/v} > x/8 ≤ P(|Z0 | > x)     ∞ P  j=0 Aj L(x/|Aj |)1{|Aj | ≤ x/v} > x/8 + P(|Z0 | > x) ∞ ∞  P(|Aj Zj | > x, |Aj | ≤ x/v)  P(|Aj | > x/v) + + P(|Z0 | > x)) P(|Z0 | > x)) j=0 j=0 = I + I  + II + III .

The terms II and III are exactly the same as in the case 1 < α < 2 therefore we omit them. Consider first I  . By the Uniform Convergence Theorem 1.1.2, if |Aj | ≤ x/v, we have, for any  > 0, L(x/|Aj |)(x/|Aj |)− ≤ sup L(z)(z)− < ∞ . z≥v

Thus, by Markov inequality, 

   ∞ ∞ E P |Aj |1− 1{|Aj | ≤ x/v} > cst x1− ) |Aj |1− j=0 j=0 I ≤ ≤ cst 1+ . P(|Z0 | > x) x P(|Z0 | > x) By the bound (1.4.1b), the denominator is bounded away from zero for x ≥ 1, thus ⎤ ⎡ ∞  I  ≤ cstE ⎣ (4.1.19) |Aj |1− ⎦ < ∞ . j=0

To bound I, set ξj = Aj {Zj 1{|Aj Zj | ≤ x} − L(x/|Aj |)}1{|Aj | ≤ x/v}. By independence of Zj and Aj , Jensen’s inequality, we have for q > 1,

86

4 Regular variation of series and random sums

Lq (x/|Aj |) ≤ E[|Zj |q 1{|Aj Zj | ≤ x} | Aj ] . Thus, for  > 0, applying Problem 1.8, we obtain E[|ξj |1+ ] ≤ 2 E[|Aj |1+ (|Zj |1+ 1{|Aj Zj | ≤ x} + L1+ (x/|Aj |)1{|Aj | ≤ x/v}] ≤ 21+ E[|Aj Zj |1+ 1{|Aj Zj | ≤ x}1{|Aj | ≤ x/v}] ≤ cst E[|Aj |(|Aj | ∨ |Aj |− )]x1+ P(|Z0 | > x) . The predictability framework implies that {ξj } is a martingale difference sequence, thus Markov and Burkholder inequalities (E.1.6) applied with p = 1 +  yield I ≤ cst x−1−

∞ 

E[|ξj |1+ ] ≤ cst

j=0

∞ 

E[|Aj |(|Aj | ∨ |Aj |− )] .

(4.1.20)

j=0

Assumption (4.1.4b) and (4.1.7) applied with a = 1 + , b = 1 −  yield ⎡⎛ 1+ ⎤ ⎤ ⎞ 1− ⎡ ∞ ∞ ∞    ⎥ ⎢ E[|Aj |1+ ] = E ⎣ |Aj |1+ ⎦ ≤ E ⎣⎝ |Aj |1− ⎠ ⎦ < ∞ . j=0

j=0

j=0

Gathering (4.1.19), (4.1.20) and the previous bounds for the terms II and III yields    ⎧ ⎡ ⎤ ⎡ ⎤⎫   ∞ ∞ ∞ ⎨  ⎬ P  j=0 Aj Zj  > x  ≤ cst E ⎣ |Aj |1− ⎦ + E ⎣ |Aj |1+ ⎦ . ⎩ ⎭ P(|Z0 | > x) j=0

This concludes the proof if α = 1.

j=0



Proof (Proof of Proposition 4.1.6: Case α = 2). We start again with the decomposition (4.1.15) of the case 1 < α < 2. The terms II and III do not need to be reconsidered since the argument was valid for any α > 0. We only consider the term I. Applying Markov and Burkholder inequalities, we obtain, for q > 2 and v ≥ Ch ,  q/2 ! ∞ 2 2 E j=0 Aj Zj 1{−h(x/|Aj |) ≤ Zj ≤ x/|Aj |}1{|Aj | ≤ x/v} I ≤ cst xq P(|Z0 | > x)  q/2 ! ∞ 2 2 E A Z 1{|A Z | ≤ C x} j j h j=0 j j . ≤ cst xq P(|Z0 | > x) At this stage, we cannot use the conditioning argument as in the previous cases, because of the power q/2. So we need a little bit of magic: we can replace the sequence {Zj } by a sequence {Zˆj } which has the same distribution as {Zj } and is fully independent of the sequence {Aj }! (To define this sequence,

4.1 Series with random coefficients

87

the probability space might need to be enlarged.) We want to apply the bound (E.4.3) of Theorem E.4.10 which will bound the previous expectation purpose, we by the corresponding one with {Z˜j } instead of {Z " j }. For this # 2 2 2 ˆ2 ˆ write Xj = A Z 1{|Aj Zj | ≤ Ch x}, Yj = A Z 1 |Aj Zj | ≤ Ch x and Fˆj = j

j

j

j

Fj+1 ∨ σ(Zˆi , i ≤ j), j ≥ 0. Then Aj is Fˆj−1 -measurable for j ≥ 1. Moreover, for j ≥ 1, Zj and Zˆj have the same distribution and are independent of Fˆj−1 , thus     L Xj | Fˆj−1 = L Aj Zj 1{|Aj Zj | ≤ Ch x} | Fˆj−1    " #  = L Aj Zˆj 1 |Aj Zˆj | ≤ Ch x | Fˆj−1 = L Yj | Fˆj−1 . Thus we can apply Theorem E.4.10, i.e.  " #q/2 ! ∞ 2 ˆ2 ˆ Z Z E A 1 |A | ≤ C x j j h j=0 j j I ≤ cst . xq P(|Z0 | > x) From now on the path is easy to the conclusion. Write V = Applying Jensen’s inequality, we obtain ⎡⎛ ⎞q/2 ⎤ ∞ " #  ⎥ ⎢ A2j Zˆj2 1 |Aj Zˆj | ≤ Ch x ⎠ ⎦ E ⎣⎝ j=0





∞ 

⎢ = E ⎣V q/2 ⎝V −1

∞ j=0

A2j .

⎞q/2 ⎤ " # ⎥ A2j Zˆj2 1 |Aj Zˆj | ≤ Ch x ⎠ ⎦

j=0

⎡ ⎤ ∞ " #  ≤ E⎣ V q/2−1 A2j |Zˆj |q 1 |Aj Zˆj | ≤ Ch x ⎦ . j=0

This yields

∞ I ≤ cst

j=0

" # E V q/2−1 A2j |Zˆj |q 1 |Aj Zˆj | ≤ Ch x xq P(|Z0 | > x)

.

Applying conditioning and Problem 1.8, for q > 2, δ > 0 and x ≥ 1, we have, " #

E A2j |Zˆj |q 1 |Aj Zˆj | ≤ Ch x | Aj ≤ cst |Aj |4−q (|Aj |δ ∨ |Aj |−δ ) . xq P(|Z0 | > x) Taking q = 2 + δ yields ∞  E[V δ/2 |Aj |2−δ (|Aj |δ ∨ |Aj |−δ )] I ≤ cst j=0

≤ cst

∞  j=0

E[V

δ/2

2−2δ

|Aj |

] + cst

∞  j=0

E[V δ/2 |Aj |2 ] .

(4.1.21)

88

4 Regular variation of series and random sums

For the first term in (4.1.21), we have, using again (4.1.7) with a = 2, b = 2 − 2δ, ⎤ ⎡ ⎤ ⎡ δ/2 ∞ ∞ ∞ ∞     E[V δ/2 |Aj |2−2δ ] = E ⎣V δ/2 |Aj |2−2δ⎦ = E ⎣ A2i |Aj |2−2δ⎦ j=0

j=0

i=0

⎡ ⎤ δ/(2−2δ) ∞ ∞   2−2δ 2−2δ ⎦ ≤ E⎣ |Ai | |Aj | i=0

j=0

j=0

⎡ (2−δ)/(2−2δ) ⎤ ∞  ⎦ . = E⎣ |Ai |2−2δ i=0

The second term in (4.1.21) is handled similarly: ⎡ ⎡ (2+δ)/2⎤ (2+δ)/(2−2δ) ⎤ ∞ ∞ ∞    ⎦ ≤ E⎣ ⎦ . E[V δ/2 |Aj |2 ] = E ⎣ A2i |Ai |2−2δ j=0

i=0

i=0

(4.1.22) Choosing δ = /2 with  such that (4.1.4b) holds, gathering the bounds (4.1.16), (4.1.17) for the terms III and II yields      ∞ P  j=0 Aj Zj  > x P(|Z0 | > x) ∞ ∞   ≤ cst E[|Aj |2+ ] + cst E[|Aj |2− ] j=0

⎡

+ cst E ⎣

j=0 ∞ 

⎤ 2−/2 2−

|Ai |2−

⎡ ⎤ 2+/2 2− ∞  ⎦ + cst E ⎣ ⎦ |Ai |2−

i=0

i=0

⎡ 2+ ⎤ 2− ∞ ∞   ⎦ . ≤ cst E[|Aj |2− ] + cst E ⎣ |Ai |2− j=0

i=0

 Proof (Proof of Proposition 4.1.6: Case α > 2). Again, the terms II and III do not need to be reconsidered since the argument was valid for any α > 0. Fix q > α and set r = q/2 > 1, i.e. 2(r − 1) < q ≤ 2r. We start exactly as in the case α = 2. We have by Burkholder inequality

4.1 Series with random coefficients

E I ≤ cst

xq P(|Z0 | > x) q !  " #r 2r ∞ 2 ˆ2 ˆ E Z Z A 1 |A | ≤ C x j j h j=0 j j

= cst

=

89

" #q/2 ! ∞ 2 ˆ2 ˆj | ≤ Ch x Z Z A 1 |A j j=0 j j



cst xq P(|Z0 |

> x)

xq P(|Z0 | > x) ⎡⎛ q ⎤ ⎞ 2r ∞ r " #  $ ⎥ ⎢ E ⎣⎝ A2js Zˆj2s 1 |Ajs Zˆjs | ≤ Ch x ⎠ ⎦ . j1 ,...,jr =0 s=1

Now, the set of indices {j1 , . . . , jr } is split into two parts: {j1 = · · · = jr } and Br = {j1 = · · · = jr }c . In other words, not all indices j1 , . . . , jr in Br are equal. This yields, for a fixed v ≥ Ch , ⎡⎛ q ⎤ ⎞ 2r ∞  cst ⎥ ⎢ E ⎣⎝ I≤ q |Aj |2r |Zj |2r 1{|Aj Zj | ≤ Ch x}1{|Aj | ≤ x/v}⎠ ⎦ x P(|Z0 | > x) j=0 +

cst xq P(|Z0 | > x) ⎡⎛  ⎢ × E ⎣⎝

r $

q ⎤ ⎞ 2r " # ⎥ A2js Zˆj2s 1 |Ajs Zˆjs | ≤ Ch x 1{|Aj | ≤ x/v}⎠ ⎦

j1 ,...,jr ∈Br s=1

= Ia + Ib . For Ia , application of (4.1.7) with a = 2r ≥ b = q yields ⎡ ⎤ ∞  cst E⎣ |Aj |q |Zj |q 1{|Aj Zj | ≤ Ch x}⎦ . Ia ≤ q x P(|Z0 | > x) j=0 Applying conditioning and (since α < q) Problem 1.8, we have for x ≥ 1,

" # E Aqj |Zˆj |q 1 |Aj Zˆj | ≤ Ch x | Aj ≤ cst |Aj |α (|Aj | ∨ |Aj |− ) . xq P(|Z0 | > x) Thus,

⎡ ⎤ ∞  |Aj |α (|Aj | ∨ |Aj |− )⎦ Ia ≤ cstE ⎣ j=0

⎡⎛ ⎡⎛ ⎞(α−)/2 ⎤ ⎞(α+)/2 ⎤ ∞ ∞ ⎥ ⎥ ⎢  2⎠ ⎢  2⎠ Aj Aj ≤ cst E ⎣⎝ ⎦ + cst E ⎣⎝ ⎦ . j=0

j=0

We now provide a bound for Ib . We consider separately the cases when α is or is not an even integer.

90

4 Regular variation of series and random sums

Case α not an even integer. Then we can choose q such that 2(r − 1) < α < q < 2r

(4.1.23)

and such that (4.1.4c) holds with  = q −α. By definition, if (j1 , . . . , jr ) ∈ Br , there exist i ∈ {2, . . . , r} and positive integers a1 , . . . , ai such that a1 + · · · + ai = r (which implies ai ≤ r − 1), such that i

$

E Zˆj21 · · · Zˆj2r = E Zˆ12ai s=1



ai i  r

 r−1

 r−1  $ 2(r−1) E Zˆ1 = E |Zˆ1 |2(r−1) α, limx→∞ xq P(|Z0 | > x) = ∞, thus ⎡⎛ ⎞(α+)/2 ⎤ ∞ ⎥ ⎢  2⎠ Ib ≤ cstE ⎣⎝ Aj ⎦ .

(4.1.24)

j=0

This proves that (4.1.9) holds for α > 2 not an even integer. Case α even integer. In this case, we choose q such that 2(r − 1) = α < ˜r and {j1 , . . . , q < 2r. We further decompose Ib according to {j1 , . . . , j2 } ∈ B ˜ ˜ jr } ∈ Br \ Br , where Br is the subset of Br where exactly r − 1 of the indices

4.1 Series with random coefficients

91

are equal. The latter term is dealt similarly to Ib in the previous case. Con˜r , say Ic . Define L(v) = E[|Z0 |α 1{|Z| ≤ v}]. sider now the sum over the set B Then L is a slowly varying function and Ic cst ≤ q x P(|Z0 | > x) ⎡⎛ q ⎤ ⎞ 2r " #  ⎢ 2 ˆ2 ⎠ ⎥ ˆ ˆα × E ⎣⎝ Aα ⎦ j1 1{|Aj1 | ≤ x/v}Zj1 1 |Aj1 Zj1 | ≤ Ch x Aj2 Zj2 j1 =j2

cst xq P(|Z0 | > x) ⎡⎛ ⎡ q ⎤ ⎤⎞ 2r " #  ⎢ 2 ˆ2 ˆ ˆα ⎦⎠ ⎥ × E ⎣⎝E ⎣ Aα ⎦ j1 1{|Aj1 | ≤ x/v}Zj1 1 |Aj1 Zj1 | ≤ Ch x Aj2 Zj2 | A ≤

j1 =j2



cst xq P(|Z0 | > x)

q ⎤ ⎞ 2r ⎥ ⎢  α E ⎣⎝ Aj1 1{|Aj1 | ≤ x/v}L(Ch x/|Aj1 |)A2j2 ⎠ ⎦ .

⎡⎛

j1 =j2

As in the case α = 1, we fix δ > 0 such that α − δ > 2. For |Aj1 | ≤ x/v, we can apply the Uniform Convergence Theorem 1.1.2 to obtain ⎡⎛ q ⎤ ⎞ 2r  cst ⎢ 2 ⎠ ⎥ E ⎣⎝ Ic ≤ q Aα−δ ⎦ j1 Aj2 x P(|Z0 | > x) j1 =j2





δq 2r

cst x xq P(|Z0 | > x)

⎡⎛ q ⎤ ⎞ q(α−δ) ⎛ ⎞ 2r 4r ∞ ∞   ⎥ ⎢ ⎝ E ⎣⎝ A2j1 ⎠ A2j2 ⎠ ⎦ j1 =0

cst xq(1−δ/(2r)) P(|Z0 |

> x)

j2 =0

⎤ ⎡⎛ ⎞ q(α+2−δ) 4r ∞ ⎥ ⎢  2⎠ Aj E ⎣⎝ ⎦ . j=0

The second bound was obtained by applying H¨ older inequality. For  such that (4.1.4c) holds, set (α + )(α + 2) , q= α+2−δ with δ small enough such that 2r > q > q(1 − (2r)−1 δ) > α. This yields ⎤ ⎡⎛ ⎞ α+ 2 ∞ cst ⎥ ⎢  2⎠ Ic ≤ q(1−δ/(2r)) Aj E ⎣⎝ ⎦ . x P(|Z0 | > x) j=0 This concludes the proof of the last case of (4.1.9).



92

4 Regular variation of series and random sums

4.2 Applications 4.2.1 Series with deterministic coefficients As a first application of Theorem 4.1.2 and Corollary 4.1.3, we consider here one-dimensional series with deterministic weights. In that case we can consider two-sided sequences since the predictability assumptions are automatically satisfied. Corollary 4.2.1 Let {Z j , j ∈ Z} be a sequence of i.i.d. random vectors, regularly varying with tail index α > 0 and exponent measure ν Z . Let {Aj , j ∈ Z} be a sequence of deterministic matrices such that one of the following sets of conditions holds: %  δ there exists δ ∈ (0, α) ∩ (0, 2] such that j∈Z Aj  < ∞ , (4.2.1) E[Z 0 ] = 0 if α > 1 , or there exists δ ∈ (0, α) ∩ (0, 1] such that



δ

Aj  < ∞ .

(4.2.2)

j∈Z

 Then the series j∈Z Aj Z j is regularly varying with tail index α and exponent measure ν in the sense of (4.1.6).

Proof This result follows from Theorem 4.1.2 and Corollary 4.1.3 since for deterministic weights, the conditions in (4.1.4) boil down to the present ones.  Example 4.2.2 Consider the univariate case. Let {Zj , j ∈ Z} be a sequence of i.i.d. random variables regularly varying with tail index α and extremal of real numbers which satisfy skewness pZ and {ψj , j ∈ Z} be a sequence  either (4.2.1) or (4.2.2). Then the series X = j∈Z ψj Zj is regularly varying with tail index α and extremal skewness pX given by  α α j∈Z {pZ (ψj )+ + (1 − pZ )(ψj )− }  pX = (4.2.3) α j∈Z |ψj | and lim

x→∞

     P  j∈Z ψj Zj  > x P(|Z0 | > x)

=



|ψj |α .

j∈Z



4.2 Applications

93

4.2.2 Randomly stopped sums Let {Z j , j ∈ Z} be a sequence of random vectors satisfying the assumptions of Theorem 4.1.2. Let Gj , j ≥ 0 be a σ-field such that σ(Z 0 , . . . , Z j ) ⊂ Gj for j ≥ 0 and let N be an integer-valued random variable. Set Aj = 1{1 ≤ j ≤ N }Id , j ≥ 0. Then ∞  i=0

Ai Z i =

N 

Zi .

i=1

Two cases of randomly stopped sums are of interest. The case where the number of terms is independent of the summands and the case where it is a stopping time. (i) If N is independent of {Zj }, set Fj = Gj−1 ∨ σ(N ). (ii) If N is a {Gj }-stopping time then set F0 = {∅, Ω} and Fj = Gj−1 for j ≥ 1. Then {j ≤ N } = {N ≤ j − 1}c ∈ Gj−1 = Fj by definition of a stopping time and Zk is independent of Fj if k ≥ j. Then Assumption 4.1.1 holds and for every r > 0, ∞ 

(1{1≤i≤N } )r = N .

i=0

Thus the summability conditions (4.1.4) hold if there exists  > 0 such that E[N ] < ∞ if α ∈ (0, 1) ∪ (1, 2) ,  1+  < ∞ if α = 1 , E N

E N α/2+ < ∞ if α ≥ 2 .

(4.2.4a) (4.2.4b) (4.2.4c)

Under these assumptions, if α > 1, we have by Theorem E.4.8 N   E Z i = E[N ]E[Z 0 ] . i=1

Corollary 4.2.3 Let {Z j , j ∈ Z} be a sequence of i.i.d. regularly varying random vectors with tail index α > 0, exponent measure ν Z and such that E[Z 0 ] = 0 if α > 1. Let N be either independent of the sequence {Z j } or a stopping time with respect to a filtration which contains the natural filtration of the sequence {Z j }. Assume that the moment condiN tions (4.2.4) hold. Then j=1 Z j is regularly varying with tail index α and exponent measure E[N ]ν Z .

94

4 Regular variation of series and random sums

We now consider the case of a non-zero expectation. Corollary 4.2.4 Let {Z j , j ∈ Z} be a sequence of i.i.d. regularly varying random vectors with tail index α > 0, exponent measure ν Z . Let N be either independent of the sequence {Z j } or a stopping time with respect to the natural filtration of the sequence {Z j }. Assume that E[Z 0 ] = 0. If E[N ] < ∞ and P(N > x) =0, n→∞ P(|Z 0 | > x) lim

(4.2.5)

N then j=1 Z j is regularly varying with tail index α and exponent measure E[N ]ν Z .

4.3 Problems 4.1 Let {Zj , j ∈ Z} be a sequence of i.i.d. regularly varying random variables with tail index α > 0 and extremal skewness pZ and such that E[Z0 ] = 0 if α > 1. Let {ψj , j ∈ Z} be a sequence of real numbers and assume that there exists δ ∈ (0, α) ∩ (0, 2] such that  |ψj |δ < ∞. j∈Z

For h ∈ Z, define Xh =

 j∈Z

ψj Zh−j .

1. For h ∈ Z, assume that there exists j ∈ Z such that ψj ψj+h = 0. Prove that the vector (X0 , Xh ) is regularly varying and determine its spectral measure. 2. Deduce that for all h ∈ Z, X0 Xh is regularly varying with tail index α/2 and P(|X0 Xh | > x)  = |ψj ψj+h |α/2 . x→∞ P(Z02 > x) lim

j∈Z

4.2 Let {Zj , j ∈ Z} be a sequence of i.i.d. regularly varying random variables with tail index α and extremal skewness pZ and {Cj,i , j ∈ Z, i ∈ Z} be a collection random variables, independent of {Zj , j ∈ Z} and stationary with respect to j. Assume that the conditions of Corollary 4.1.5 hold and define Xj = i∈Z Cj,i Zj−i , j ∈ Z. 1. Prove that

4.3 Problems

95

& ' P(X0 > x) α = pZ E[|C0,i |α 1{C0,i > 0}] + qE[|C0,i |1{C0,i < 0}] . x→∞ P(|Z0 | > x) lim

i∈Z

Assume from now on that pZ = 1 and P(C0,i ≥ 0) = 1 for all i ∈ Z and ˜ j , j ∈ Z} be defined by ci = (E[|C0,i |α ])1/α . Let the process {X  ˜j = X ci Zj−i , j ∈ Z . i∈Z

2. Prove that the coefficients ci satisfy the conditions of Corollary 4.2.1. ˜ 0 > x)/P(X0 > x) = 1. 3. Prove that limx→∞ P(X n 4. Let Sn = j=1 Xj . Prove that ⎞α ⎤ ⎡⎛ n   P(Sn > x) = E ⎣⎝ Cj,j+i ⎠ ⎦ . lim x→∞ P(Z0 > x) j=1 i∈Z

n ˜ 5. Let S˜n = j=1 X j . Prove that

⎛ ⎞α n P(S˜n > x)  ⎝ = ci+j ⎠ . lim x→∞ P(Z0 > x) j=1 i∈Z

6. Conclude that for n ≥ 2, limx→∞ P(Sn > x)/P(S˜n > x) = 1 (unless α = 1). 4.3 Let {Zj , j ∈ Z} be a sequence of non-negative regularly varying random α/2 variables with tail index α > 0. Let c > 0 be such that cα/2 E[Z0 ] < 1. Prove that the series,  i−1 ∞  $ 2 ci Zj−k Zj−i , Xj = Zj + i=1

k=1

is well defined and cα/2 P(X0 > x) = . 2 α/2 x→∞ P(Z > x) 1 − cα/2 E[Z0 ] 0 lim

4.4 Assume that {Zj , j ∈ Z} be a sequence of i.i.d. regularly varying random variables with tail index α = 1 and extremal skewness pZ . Let {ψj , j ∈ Z} be a sequence of real numbers such that  |ψj |δ < ∞ , (4.3.1) j∈Z

 with δ ∈ (0, 1) and define X = j∈Z ψj Zj . Let aZ,n and aX,n be scaling sequences such that limn→∞ nP(|Z0 | > aZ,n ) = limn→∞ nP(|X| > aX,n ) = 1. Prove that

96

4 Regular variation of series and random sums

lim

n→∞

⎛ ⎞ ⎛ ⎞  n ⎝ E[X1{|X| ≤ aX,n }] − ⎝ ψj ⎠ E[Z0 1{Z0 ≤ aZ,n }]⎠

aX,n

j∈Z

2pZ − 1  =− ψj log(|ψj |) + (2pX − 1) log ψ1 . (4.3.2) |ψ|1 j∈Z

4.5 Assume that {Zj , j ∈ Z} be a sequence of i.i.d. regularly varying random variables with tail index α = 2 and E[Z0 ] = 0. Let {ψj , j ∈ Z} be a sequence of real numbers such that  ψj2 < ∞ . (4.3.3) j∈Z



 Define X0 = j∈Z ψj Zj , X = j∈Z ψj Zj+ . Let aZ,n and aX,n be scaling sequences such that limn→∞ nP(|Z0 | > aZ,n ) = limn→∞ nP(|X| > aX,n ) = 1. Prove that ' & n ( E[X0 X 1 |X0 X | ≤ a2X,n ] lim n→∞ a2 X,n ⎛ ⎞ ⎞  ' & −⎝ ψj ψj+ ⎠ E[Z02 1 Z02 ≤ a2Z,n ]⎠ j∈Z

=−

1 |ψ|21



ψj ψj+ log(|ψj ψj+ |) + (2pX0 X − 1) log ψ21 , (4.3.4)

j∈Z

where pX0 X = lim

x→∞

P(X0 X > x) . P(|X0 X | > x)

4.6 (Another proof of Corollary 4.2.4 when α < 1) Let N be independent of the sequence {Xj , j ≥ 1} of i.i.d. random variables with tail index α ∈ (0, 1). Assume that E[N ] < ∞. Prove that P(S > x) ∼ E[N ] P(X0 > x) . Hint: Let ψ(t) = E[e−tX0 ] and χ(t) = E[e−tS ] be the Laplace transforms of X0 and S, respectively. Show that χ(t) = E[ψ N (t)] and that 1 − χ(t) ∼ E[N ]{1 − ψ(t)} when t → 0. Then use Problem 1.21. 4.7 Let {Zj , j ∈ Z} be a sequence of i.i.d. non-negative random variables with tail index α > 0 and let N be an integer-valued random variable, independent N of {Zj , j ∈ Z}. Let S = j=1 Zj . If P(Z0 > x) is regularly varying at infinity and if there exists z > 1 such that E[z N ] < ∞, then P(S > x) ∼ E[N ] P(Z0 > x) . Hint: Use Problem 1.15 and the dominated convergence theorem.

4.3 Problems

97

4.8 (Ruin probability) Let {Xj , j ∈ Z} be a sequence of i.i.d. random variables representing the claim sizes and assume that the claim arrivals form a homogeneous Poisson process N with intensity λ. Let u be the initial capital of the insurance company, and let c be the premium. The risk process is defined by N (t)

S(t) = u + ct +



Xj .

j=1

The ruin probability is the probability that S be negative at some time t: ψ(u) = P(∃t > 0 : S(t) < 0 | S(0) = u) . If μ = E[X0 ] < ∞, then a necessary and sufficient condition for ψ(u) < 1 is c > λμ. Define then the safety loading ρ by c −1. ρ= λμ Let F denote the distribution of X0 and let H be the distribution function defined by  1 u F (s) ds . H(u) = μ 0 The Pollaczek-Khinˇcin formula gives the following expression for the ruin probability: ψ(u) =

∞ ρ  ∗n (1 + ρ)−n H (u) , 1 + ρ n=0

where H ∗n is the n-th fold convolution of H. 1. Let {Yj , j ∈ Z} be a sequence of i.i.d. random variables with distribution H and let K be an integer-valued random variable such that P(K = k) = ρ(1 + ρ)−k−1 , k ≥ 0. Prove that ⎛ ⎞ K  ψ(u) = P ⎝ Yj > u⎠ . j=1

2. Apply Corollary 4.2.4 and Problem 1.12 to prove that ψ(u) ∼

uF¯ (u) . ρμ(α − 1)

In the next problem we consider the situation opposite to Corollary 4.2.4, that is, the tail of the number of terms in the sum is heavier than the tail of the summands.

98

4 Regular variation of series and random sums

4.9 Let {Zj , j ∈ Z}[N] be i.i.d. non-negative random variables with finite mean μ and there exists p > 1 such that E[Z0p ] < ∞. Let N be an integervalued random variable and define S=

N 

Zj .

j=1

Assume that N is regularly varying with tail index α > 1 and that x1−p = o(P(N > x)). We will prove that lim

x→∞

P(S > μx) =1. P(N > x)

1. For  > 0, let k be the largest integer such that k ≤ (1 − )x. Prove that k P(S > x) ≤ P(N > (1 − )x) + P( i=1 (Zi − 1) > x). 2. For  > 0, let k be the smallest integer such that k ≥ (1 + )N . Prove that k P(S > x) ≥ P(N > (1 + )x) − P( i=1 (Zi − 1) ≤ −x). 3. Conclude by applying the inequality (1.5.3). 4.10 Let {Zj , j ∈ Z} be a sequence of i.i.d. non-negative random variables, N independent of the integer-valued random variable N . Let S = i=1 Zi . We will prove that if S is regularly varying with tail index α ∈ (0, 1) and E[N ] < ∞, then P(Z0 > x) ∼ P(S > x)/E[N ]. Let ψ(t) = E[e−tZ0 ] and χ(t) = E[e−tS ] be the Laplace transforms of Z0 and S, respectively. Let H(z) = E[z N ], z ∈ [0, 1]. Note that H is left-differentiable at 1 and H  (1) = E[N ]. 1. Prove that χ = H ◦ ψ. 2. Prove that for every  > 0, there exist t0 > 0 such that for 0 ≤ t ≤ t0 , {1 − ψ(t)}H  (1 − ) ≤ 1 − χ(t) ≤ {1 − ψ(t)}H  (1) . Hint: prove and use the convexity of H and the continuity of ψ. 3. Apply Problem 1.21 to conclude. 4.11 The L´evy-Khinˇcin Theorem states that an infinitely divisible distribution has a characteristic function that never vanishes and is given by  σ2 2 t + (eitx − 1 − itx1{|x| ≤ 1}) ν(dx) , (4.3.5) log φ(t) = ibt − 2 where b, σ ∈ R and ν is a measure on R such that  +∞ (x2 ∧ 1) ν(dx) < ∞ . −∞

4.4 Bibliographical notes

99

The following result is well known. The infinitely divisible distribution F with L´evy measure ν has a regularly varying right tail (resp. left tail) if and only if the function x → ν(x, ∞) (resp. x → ν(−∞, −x)) is regularly varying at +∞, in which case they have the same index of regular variation. We will prove the direct implication for all α > 0 and the converges for α ∈ (0, 1). Let X be a random variable with characteristic function given by (4.3.5) and assume without loss of generality that b = σ = 0. Let X0 , X+ , and X− be independent random variables with respective characteristic functions φ0 , φ+ , and φ− given by  {eitx − 1 − itx} ν(dx) , log φ0 (t) = |x|≤1  log φ− (t) = {eitx − 1} ν(dx) , x1 d

1. Prove that X = X0 + X+ + X− . 2. Prove that X0 has finite moments of all orders. 3. Prove that X+ can be expressed as X+ =

N+ 

Zi ,

i=1

where N+ is a Poisson random variable with mean θ+ = ν((1, ∞)) and Zi −1 are i.i.d. random variables with distribution θ+ ν restricted to (1, ∞). 4. Provide a similar expression for X− . 5. Apply Corollary 4.2.3 to prove the if part. 6. Apply Problem 4.10 to prove the only if part in the case α ∈ (0, 1).

4.4 Bibliographical notes The main result of this chapter, Theorem 4.1.2, is due to [HS08]. The case of a univariate series with deterministic coefficients has a long history. The conditions of Corollary 4.2.1 were first obtained in [MS00]. The models in Problem 4.2 and Problem 4.3 are considered in [Kul06] and [DR96] for instance. For Problem 4.8, see for instance the monograph [Mik09] on insurance mathematics. Problem 4.9 and extensions can be found in [RS08]. The converse to Corollary 4.2.4 which is obtained in Problem 4.10 can be extended to all α > 0; see [FGAMS06, Proposition 4.8].

5 Regularly varying time series

This chapter introduces the central object of this book: we consider regularly varying time series, that is, sequences of d-dimensional random vectors indexed by Z, whose finite-dimensional distributions are all regularly varying. Time series can be thought of as random elements in the space (Rd )Z of Rd valued sequences. Thus, by analogy with the definition of regular variation in Chapter 2, we could try to define regular variation of a time series as the regular variation of its distribution in the space of sequences. For this, we must endow (Rd )Z with a suitable topology and define vague# convergence on (Rd )Z \ {0}, with 0 denoting the sequence with identically vanishing coordinates. If this is possible, this infinite-dimensional regularly varying random vector will have an exponent measure. If this construction is coherent with the previous definition, the finite-dimensional projections of this infinitedimensional exponent measure should coincide with the exponent measures of the finite-dimensional distributions of the time series. It turns out that these two definitions are indeed equivalent. This will be shown rigorously in Section 5.1, where the infinite-dimensional exponent measure will be introduced as the tail measure ν in 5.1.2. An extremely important feature is that the tail measure is shift-invariant if the time series is stationary. Exponent measures are defined up to scaling, the tail measure can be normalized in such a way that ν({x ∈ (Rd )Z : |x0 | > 1}) = 1 and we can define an (Rd )Z -valued random element Y whose distribution is ν restricted to the latter set. Similarly to the dual definition of the tail measure, we can see that the tail process Y is the distributional limit (in the sense of convergence of the finite-dimensional distributions) of the original time series X, given that |X 0 | is large. Since the tail measure is (−α)-homogeneous, a change of variable shows that −1 the process defined by Θ = |Y 0 | Y is independent of |Y 0 |. This process is called the spectral tail process. The advantage of the tail and spectral

© Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 5

101

102

5 Regularly varying time series

tail processes over the tail measure is that they can be explicitly computed. Examples will be given in Section 5.2.2. The tail and spectral tail processes of a stationary time series are not stationary, but they inherit a very important property from the shift-invariance and homogeneity of the tail measure: the time-change formula. It will be stated and proved in Section 5.3. In passing from the tail measure to the tail process, it may seem that some information was lost. However, in Section 5.4 we will prove that the tail measure can be fully recovered from the tail process. More precisely, we will prove that given a random element Y which satisfies the time-change formula and such that P(|Y 0 | > 1) = 1, a unique homogeneous shift-invariant measure ν can be defined such that Y is the tail process related to ν. Therefore the relation between the tail process and the tail measure is one to one. In the case where the tail process almost surely tends to zero at ±∞, we will obtain a representation of the tail measure in terms of a sequence Q, called the conditional spectral tail process, whose distribution is that of (Y ∗ )−1 Y under a tilted law (see Definition 5.4.6). This tilted law is well defined only when the number of exceedences of the tail process over the level 1 is not almost surely infinite, that is, P(E(Y ) < ∞) > 0 with E(Y ) = j∈Z 1{|Y j | > 1}. In that case, ϑ = E[E −1 (Y )] > 0 and we call the latter quantity the candidate extremal index. This name will be justified in Chapter 7 where ϑ will be related to the classical extremal index of extreme value theory. The distribution of the sequence Q may seem difficult to compute, but in Section 5.5, we will introduce a family of maps defined on (Rd )Z , called anchoring maps. Heuristically, these maps force the tail process to achieve a certain feature at a given time. For instance, the first exceedence above the level one is an anchoring map. It turns out that the distribution of (Y ∗ )−1 Y conditionally on any anchoring map to be equal to zero is (up to shift) the distribution of Q. The sequence Q and the candidate extremal index will play a key role in all the limit theory developed in Part II. In particular, in Chapter 7, the classical extremal index will be introduced and conditions will be given for these indices to be equal. The sequence Q will also appear in Chapter 8 in the stable limits of the partial sum process and in the extremal processes which are limits of the partial maxima process. It will also appear in Chapters 9 and 10 in the limiting variance of the statistics considered there. However, the forward spectral tail process {Θj , j ≥ 0} is often easier to compute explicitly. Therefore it is useful to obtain expressions in terms of the forward spectral tail process for the quantities which are expressed in terms of the conditional spectral tail process Q. In Section 5.6 we will provide tools to establish such identities which will be useful in Chapters 8 to 10 to obtain

5.1 The tail measure

103

expressions of limits in terms of the tail process and to construct statistical procedures.

Notation Throughout this chapter, we choose an arbitrary norm |·| on Rd . For a sequence x = (xi )i∈Z ∈ (Rd )Z and i ≤ j ∈ Z, we write xi,j = (xi , . . . , xj ), x∗i,j = maxi≤k≤j |xk | if i ≤ j, x∗i,j = 0 if i > j, and x∗ = x∗−∞,∞ . The backshift operator B is defined on (Rd )Z by (Bx)i = xi−1 , i ∈ Z if x = {xi , i ∈ Z} and B k is its k-th iterate. We denote by 0 (Rd ) the space of sequences x = {xj , j ∈ Z} ∈ (Rd )Z such that lim|j|→∞ |xj | = 0. The sequences in 0 (Rd ) are bounded and 0 (Rd ) endowed with the uniform norm x∞ = x∗ = supj∈Z |xj | is a Banach space.  q For q > 0, we define xq = ( j∈Z |xj | )1/q and q (Rd ) = {x ∈ (Rd )Z , xq < ∞}. The spaces q (Rd ) endowed with the norm  · q are also Banach spaces for q ≥ 1.

5.1 The tail measure We start with a formal definition of regular variation for a time series. Definition 5.1.1 A sequence X = {X j , j ∈ Z} of Rd -valued random vectors is said to be regularly varying with tail index α if all its finitedimensional distributions are regularly varying with tail index α and the same scaling sequence.

This definition entails that all the finite-dimensional distributions are regularly varying with the same tail index. It also implies that if we denote the distribution function of |X 0 | by F0 and define the sequence {an } by an = F0← (1 − 1/n), then, for all i ≤ j ∈ Z, there exists a non-zero boundedly finite measure ν i,j on Rj−i+1 \ {0} endowed with its Borel σ-field such that   v# nP a−1 n X i,j ∈ · −→ ν i,j , on Rj−i+1 \{0} as n tends to infinity. Here vague# convergence on Rj−i+1 \{0} is defined as in Section 2.1. Equivalently, P(x−1 X i,j ∈ ·) v# −→ ν i,j , P(|X 0 | > x)

(5.1.1)

104

5 Regularly varying time series

as x → ∞. For i = j, we will simply write ν i for ν i,i , which is the exponent measure of X i . The choice of the scaling sequence an or, equivalently, (5.1.1) guarantees that ν 0 ({y 0 ∈ Rd : |y 0 | > 1}) = 1 .

(5.1.2)

For all pairs of integers i < j, we extend the measure ν i,j to (Rd )j−i+1 by setting ν i,j ({0}) = 0. The collection of measures ν i,j satisfies a consistency condition but they are infinite and therefore we cannot use the classical Kolmogorov extension theorem. Instead, we will use the extension Theorem B.1.33. For i < j ∈ Z, let pi,j be the projection of (Rd )Z onto (Rd )j−i+1 defined by pi,j (x) = xi,j , x ∈ (Rd )Z . Recall that a measure μ on a measurable cone (E, E) is (−α)-homogeneous if μ(tA) = t−α μ(A) for all t > 0 and A ∈ E. Theorem 5.1.2 Let X = {X j , j ∈ Z} be a regularly varying Rd -valued time series with tail index α. There exists a unique σ-finite measure ν on (Rd )Z endowed with the product σ-field, called the tail measure of X, such that (i) ν({0}) = 0; (ii) ν({y ∈ (Rd )Z : |y 0 | > 1}) = 1; (iii) ν({y ∈ (Rd )Z : |y i | > 1}) < ∞ for all i ∈ Z; (iv) ν is (−α)-homogeneous; (v) ν i,j = ν ◦ p−1 i,j for all i ≤ j ∈ Z. If X is stationary then ν is shift-invariant.

Proof. The existence of a measure ν on (Rd )Z such that (i), (iii), and (v) hold is a consequence of Theorem B.1.33; see Appendix B.2.3 for details. The consistency property (v) and (5.1.2) yield ν({y ∈ (Rd )Z : |y 0 | > 1}) = ν 0 ({y 0 ∈ Rd : |y 0 | > 1}) = 1 , i.e. (ii) holds. To prove the homogeneity, consider for t > 0 the measure ν t defined by ν t (C) = tα ν(tC) for all Borel sets C of (Rd )Z \ {0}. For i ≤ j ∈ Z and a Borel set A ⊂ Rj−i+1 \ {0}, since ν i,j is homogeneous, we have α ν t ◦ p−1 i,j (A) = t ν i,j (tA) = ν i,j (A) .

5.1 The tail measure

105

This implies that ν t = ν since ν is uniquely determined by its projections. We now assume that the time series X is stationary and prove that ν is shiftinvariant. The stationarity of X implies that ν i+k,j+k = ν i,j for all i ≤ j and k ∈ N. Define ν  = ν ◦ B. Then, for a Borel set A ⊂ Rj−i+1 \ {0}, −1 −1 ν  ◦ p−1 i,j (A) = ν ◦ B ◦ pi,j (A) = ν ◦ pi+1,j+1 (A) = ν i+1,j+1 (A) = ν i,j (A) .

Since ν is uniquely determined by its projections, this proves that ν = ν  . The tail measure ν is entirely determined by its finite-dimensional projections ν i,j , which are the exponent measures of the vectors (X i , . . . , X j ), i ≤ j ∈ Z. Thus the tail measure is entirely determined by the time series X. Alternatively, we can consider the time series X as a random element in (Rd )Z endowed with the product topology and the Borel σ-field relative to the product topology. The product topology is metrized by the metric     2−|j| xj − y j  ∧ 1 , d∞ (x, y) = j∈Z

where |·| is an arbitrary norm on Rd . We say that a subset A of (Rd )Z is separated from 0 if and only if there exists  > 0 such that d∞ (x, 0) >  for all x ∈ A. The collection of sets separated from zero is a boundedness denoted by B (∞) (see Appendix B.1.1). It is important to note that a set A is in B (∞) if there exist  > 0 and k ∈ N such that x ∈ A =⇒ sup|j|≤k |xj | > . To be separated from the null sequence 0 in the supnorm is not sufficient. In particular, the set {x ∈ (Rd )Z : x∗ > 1} is not in B (∞) . We say that a measure ν is B (∞) -boundedly finite on (Rd )Z \ {0} if ν(A) < ∞ for all sets A separated from 0 and that a sequence of measures νn converges B (∞) -vaguely# to ν in (Rd )Z \ {0} if limn→∞ νn (A) = ν(A) for all Borel subsets A of (Rd )Z separated from zero, which are continuity sets of ν. d Z We can now define the measure nP(a−1 n X ∈ ·) on (R ) \ {0}. Applying # Proposition B.1.35 we obtain the vague convergence in (Rd )Z \ {0} of this measure to the tail measure ν.

Corollary 5.1.3 Let X = {X j , j ∈ Z} be an Rd -valued time series. The following statements are equivalent: (i) X is regularly varying in the sense of Definition 5.1.1. (ii) There exists a non-decreasing sequence {an } and a non-zero B (∞) boundedly finite measure ν on (Rd )Z \ {0} such that v#

nP(a−1 n X ∈ ·) −→ ν in (Rd )Z \ {0} endowed with the boundedness B (∞) .

106

5 Regularly varying time series

If either condition holds, then ν is the tail measure of the time series X.

5.2 The tail process From now on we assume that the time series X is stationary. Set E0 = {y ∈ (Rd )Z : |y 0 | > 1}). Since by definition ν(E0 ) = 1, we can define a random element Y with values in E0 and distribution η defined by η = ν(· ∩ E0 ) .

(5.2.1)

This means that for every non-negative measurable function H on (Rd )Z ,  E[H(Y )] =

 H(y)η(dy) =

(Rd )Z

(Rd )Z

H(y)1{|y 0 | > 1}ν(dy) . (5.2.2)

The process Y is called the tail process. It is important to note that the tail measure ν is shift-invariant because of stationarity, but the measure η is not since we have arbitrarily given a particular role to the time index 0. Consequently the tail process Y is not stationary. Note also that the tail measure depends on the chosen norm only up to a scaling constant, whereas the tail process depends more strongly on the norm. Applying Corollary 5.1.3 and (5.2.2), we obtain, for every bounded or nonnegative functional H on (Rd )Z , continuous with respect to the product topology, E[H(x−1 X)1{|X 0 | > x}] x→∞ P(|X 0 | > x) = ν(H1E0 ) = E[H(Y )] .

lim nE[H(a−1 n X)1{|X 0 | > an }] = lim

n→∞

The penultimate equality is justified by the homogeneity of ν which implies that E0 is a continuity set for ν. Since weak convergence for the product topology is equivalent to the weak convergence of finite-dimensional distributions, (see Lemma A.2.16) this is equivalent to   d L x−1 (X i , . . . , X j ) | |X 0 | > x −→ (Y i , . . . , Y j ) , as x → ∞ for all i ≤ j ∈ Z.

(5.2.3)

5.2 The tail process

107

By definition of the tail process and homogeneity of the tail measure, we have, for y > 0  1{|y 0 | > y}1{|y 0 | > 1}ν(dy) P(|Y 0 | > y) = (Rd )Z  = 1{|y 0 | > y ∨ 1}ν(dy) = (y ∨ 1)−α . (5.2.4) (Rd )Z

This shows that |Y 0 | is a Pareto random variable and is greater than 1 with probability 1. On the contrary, for j = 0, Y j is unconstrained and might even be zero with positive probability. In particular Y j = 0 a.s. for all j = 0 if all the bidimensional distributions of the series {X j } are extremally independent. If d = 1, Y0 has a two-sided Pareto distribution with skewness pX = limx→∞ P(X0 > x)/P(|X0 | > x), that is, for y ≥ 1, P(Y0 > y) = pX y −α , P(Y0 < −y) = (1 − pX )y −α . If d ≥ 1, |Y 0 | is independent of Y 0 /|Y 0 | whose distribution is the spectral measure of X 0 by Theorem 2.2.1. The spectral tail process It turns out that |Y 0 | is actually independent of the whole process Y / |Y 0 |. So we define, for j ∈ Z, Θj =

Yj . |Y 0 |

(5.2.5)

The sequence Θ = {Θj , j ∈ Z} is called the spectral tail process. By definition, Θ0 is concentrated on the sphere Sd−1 and its distribution is the spectral measure of X 0 in the sense of Theorem 2.2.1. If d = 1, Θ0 is concentrated on {−1, 1} and P(Θ0 = 1) = p = 1 − P(Θ0 = −1). Proposition 5.2.1 The spectral tail process Θ defined by (5.2.5) is independent of |Y 0 | and for all bounded or non-negative measurable functions H on (Rd )Z ,  ∞ E[H(Y )] = E[H(rΘ)]αr−α−1 dr . (5.2.6) 1

Proof. Let S be the space defined by S = {y ∈ (Rd )Z : |y 0 | = 1}. Then Θ is concentrated on S. Define the set E∗ = {y ∈ (Rd )Z : |y 0 | > 0} and the one-to-one map ψ by

108

5 Regularly varying time series

ψ : (0, ∞) × S → E∗ (r, θ) → rθ . Define the measure σ on S by σ(B) = ν ◦ ψ((1, ∞) × B) , for every measurable set B ⊂ S. Similarly to the finite-dimensional case (see Theorem 2.2.1), the homogeneity of ν implies that the measure ν ◦ ψ on (0, ∞) × S is a product measure, i.e. ν ◦ ψ((r, ∞) × B) = r−α σ(B) . For a non-negative measurable function H on (Rd )Z , this yields   ∞ H(y)1{|y 0 | > 0}ν(dy) = H(rθ)σ(dθ)αr−α−1 dr . (Rd )Z

0

(5.2.7)

S

−1

For H of the form H(y) = f (|y 0 |)g(|y 0 | y), the definition of Y and Θ and (5.2.7) yield  −1 E[f (|Y 0 |)g(Θ)] = f (|y 0 |)g(|y 0 | y)1{|y 0 | > 1}ν(dy) (Rd )Z ∞

 =

αf (r)r−α−1 dr

 g(θ)σ(dθ). S

1

This proves that |Y 0 | has a Pareto distribution and is independent of Θ whose distribution is σ. Therefore, the identity (5.2.7) can be re-expressed as   ∞ H(y)1{|y 0 | > 0}ν(dy) = E[H(rΘ)]αr−α−1 dr . (5.2.8) (Rd )Z

0

This yields (5.2.6).



A straightforward consequence of (5.2.5) and the independence of Θ and |Y 0 | is the identity

α  |Θs | ∧1 , (5.2.9) P(|Y s | > x) = E x which is obtained by applying (5.2.6) with H(y) = 1{|y s | > 1}. The existence of the tail process is equivalent to the regular variation of the time series. Moreover, because of stationarity, it is sufficient to prove the existence of the forward tail process {Y j , j ∈ N} to prove regular variation.

5.2 The tail process

109

Theorem 5.2.2 Let X be a stationary sequence and α ∈ (0, ∞). The following statements are equivalent: (i) X is regularly varying with tail index α. (ii) There exists a norm |·| such that |X 0 | is regularly varying and there exists a process {Y j , j ∈ Z} such that P(|Y 0 | > y) = y −α for all y ≥ 1 and (5.2.3) holds for all i ≤ j ∈ Z. (iii) There exists a norm |·| such that |X 0 | is regularly varying and there exists a process {Y j , j ∈ N} such that P(|Y 0 | > y) = y −α for all y ≥ 1 and (5.2.3) holds for i = 0 and all j ∈ N. If any of these conditions holds, the distribution of the process Y is the tail measure restricted to the set E0 .

Proof. The implication (i) =⇒ (ii) is already proved. To prove the converse, it suffices to prove that the vectors (X s , . . . , X t ) are regularly varying for all s ≤ t ∈ Z. Let H be a continuous function on Rt−s+1 \{0} such that H(x) = 0 if x∗ ≤  for some  > 0. Decomposing the event {X ∗s,t > x} according to which component is first above the level x, applying stationarity and the assumption (ii), we obtain E[H(x−1 X s,t )] x→∞ P(|X 0 | > x)

t  E[H(x−1 X s,t )1 X ∗s,i−1 ≤ x 1{|X i | > x}] = lim x→∞ P(|X 0 | > x) i=s ∗

t P(|X 0 | > x)  E[H(x−1 B i X s,t)1 X s−i,−1 ≤ x 1{|X 0 | >x}] = lim x→∞ P(|X 0 | > x) P(|X 0 | >x) i=s lim

= −α

t 



E[H(B i Y s,t )1 Y ∗s−i,−1 ≤ 1 ] .

i=s



The indicator 1 Y ∗s−i,−1 ≤ 1 creates no difficulty for passing to the limit since |Y 0 | is a Pareto random variable independent of the spectral tail process Θ. Indeed, (ii) implies that for a bounded continuous map H on (Rd ) and y > 1, lim E[H(|X 0 |

x→∞

−1

X)1{|X 0 | > xy} | |X 0 | > x]

= lim E[H(|X 0 |

−1

x→∞

= y −α E[H(|Y 0 |

−1

X) | |X 0 | > xy]

Y )] .

P(|X 0 | > xy) P(|X 0 | > x)

110

5 Regularly varying time series

Thus P(|Y j | = ) = 0 for all j ∈ Z and all  > 0. For a Borel set A ⊂ Rt−s+1 \ {0} such that x ∈ A implies x∗ > , define ν s,t (A) = −α

t 

P(B i Y s,t ∈ A, Y ∗s−i,−1 ≤ 1) .

(5.2.10)

i=s

Since sets separated from 0 are measure determining (see Theorem B.1.17), this defines a boundedly finite measure on Rt−s+1 \ {0}. This proves that X s,t is regularly varying and that its exponent measure is ν s,t . The implication (ii) =⇒ (iii) is trivial and (iii) implies (i) by Problem 2.6. If (ii) holds, then the distribution of Y s,t is η s,t for all s ≤ 0 ≤ t and thus the distribution of Y is η which is our claim.  Remark 5.2.3 This result also shows that there is a one-to-one correspondence between the tail measure and the tail process since the tail measure is determined by its finite-dimensional projections by Theorem 5.1.2. This will be discussed in greater generality in Section 5.4. ⊕ The existence of the spectral tail process can be seen as an alternative definition of regular variation of a time series. Proposition 5.2.4 Let {X j , j ∈ Z} be a stationary Rd -valued sequence such that |X 0 | is regularly varying with tail index α > 0. Then the following conditions are equivalent: (i) {X j , j ∈ Z} is regularly varying. (ii) There exists a process {Θj , j ∈ Z} such that for all s ≤ t ∈ Z and all continuity sets A ⊂ (Rd )t−s+1 of the distribution of (Θs , . . . , Θt ), we have



Xs Xt ,..., lim P ∈ A | |X 0 | > x = P((Θs , . . . , Θt ) ∈ A) . x→∞ |X 0 | |X 0 | (5.2.11) (iii) There exists a process {Θj , j ∈ N} such that for all t ∈ N and all continuity sets A ⊂ (Rd )t+1 of the distribution of (Θ0 , . . . , Θt ), we have



X0 Xt ,..., lim P ∈ A | |X 0 | > x = P((Θ0 , . . . , Θt ) ∈ A) . x→∞ |X 0 | |X 0 | (5.2.12)

5.2 The tail process

111

If these conditions hold, then the tail process {Y j , j ∈ Z} of {X j , j ∈ Z} is given by Y j = Y Θj where Y is a Pareto random variable with tail index α, independent of the sequence Θ.

Proof. The implication (i) ⇒ (ii) is a consequence of Proposition 5.2.1 and Theorem 5.2.2. To prove the converse, it suffices to prove that (ii) implies (ii) of Theorem 5.2.2. By regular variation of |X 0 |, we have, for  > 1,



Xs Xt ,..., lim P ∈ A, |X 0 | > x | |X 0 | > x x→∞ |X 0 | |X 0 | = −α P((Θs , . . . , Θt ) ∈ A) . −1

This proves that conditionally on |X 0 |, (x−1 |X 0 | , |X 0 | X s,t ) converges weakly to a vector (Y, Θs,t ) where Y is independent of Θs,t and has a Pareto distribution. By the continuous mapping theorem, this proves that (ii) of Theorem 5.2.2 holds. Similarly, (iii) implies (iii) of Theorem 5.2.2 hence implies (i). 

5.2.1 Determination of the tail process by approximations It may happen that the original time series can be approximated by a sequence of tail equivalent time series. If the sequence of tail processes of the approximations converges (in the sense of finite-dimensional distributions) then the limit is the tail process of the original series. Proposition 5.2.5 Let X be a stationary time series such that X 0 is regularly varying with tail index α. Let X (m) , m ≥ 1, be stationary regularly varying time series with respective tail process Y (m) . Assume that for j ∈ Z,     (m)  P X j − X j  > x =0. (5.2.13) lim lim sup m→∞ x→∞ P(|X 0 | > x) Then, for every u > 0, lim lim sup

m→∞ x→∞

   (m)  P(X 0  > xu) P(|X 0 | > x)

= u−α . f i.di.

(5.2.14)

Furthermore, there exists a process Y such that Y (m) −→ Y if and only if X is regularly varying and Y is the tail process of X.

112

5 Regularly varying time series

Remark 5.2.6 For (5.2.14) to hold, it suffices that (5.2.13) holds for j = (m) 0. If the distribution of (X j , X j ) does not depend on j, (which holds if (m)

{(X j , X j ), j ∈ Z} is stationary for every m ≥ 1), then (5.2.13) can be stated simply with j = 0. ⊕ Proof. We will use repeatedly the following inequality. For a metric space (E, d), we denote by Ac the complement of A and by A the -enlargement of A. Then, for all subsets A and elements x, y of E, we have |1A (x) − 1A (y)| ≤ 1A ∩(Ac ) (x)1A ∩(Ac ) (y) + 1{d(x, y) > } .

(5.2.15)

Note that lim→0 1A ∩(Ac ) (x) = 1∂A (x) for all x ∈ E, where ∂A is the topological frontier of A, that is, the set of points which are 0 of   at distance  (m)  (m) c = X i . Applying both A and A . For clarity, write Xi = |X i | and Xi (5.2.15) with E = (0, ∞) and A = (1, ∞), x = X (i) /x, y = X/x and the assumption (5.2.13), we have, for  > 0,   (m) E[|1 Xi > x − 1{Xi > x}|] lim lim sup m→∞ x→∞ P(X0 > x) P(x(1 − ) ≤ Xi ≤ x(1 + )) ≤ lim sup P(X0 > x) x→∞ (m)

P(|Xi − Xi | > x) m→∞ x→∞ P(X0 > x) P(x(1 − ) ≤ Xi ≤ x(1 + )) ≤ lim sup P(X0 > x) x→∞ + lim lim sup

(m)

P(|X i − X i | > x) m→∞ x→∞ P(X0 > x) −α −α ≤ (1 − ) − (1 + ) . + lim lim sup

Since  is arbitrary, this yields   (m) E[|1 Xi > x − 1{Xi > x}|] lim lim sup =0. m→∞ x→∞ P(X0 > x)

(5.2.16)

The latter implies     P(X (m) > x)   0 lim lim sup  − 1 = 0 . m→∞ x→∞  P(X0 > x) 

(5.2.17)

This proves (5.2.14). We will now prove the stated equivalence. For s ≤ t ∈ Z and Borel sets As , . . . , At , we have,

5.2 The tail process

113

  t   t        (m) (m) {X i ∈ xAi } ∩ {X0 > x}) − P {X i ∈ xAi } ∩ {X0 > x}  P   i=s i=s   t  t     (m)  ≤ E  1xAi (X i ) − 1xAi (X i ) 1{X0 > x}   i=s i=s      (m) + E 1{X0 > x} − 1 X0 > x  ≤

t      (m)  E 1xAi (X i ) − 1xAi (X i ) 1{X0 > x} i=s

     (m) + E 1{X0 > x} − 1 X0 > x  .

(5.2.18)

n n n To obtain (5.2.18), we have used the bound | i=1 ai − i=1 bi | ≤ i=1 |ai − bi |, valid for all n ≥ 1 and all real numbers ai , bi ∈ [0, 1]. Applying (5.2.15) to the term first term in (5.2.18) yields, for each i ∈ Z and  > 0,     (m)  E 1xAi (X i ) − 1xAi (X i ) 1{X0 > x}       (m)  ≤ E |1Ai ∩(Aci ) (X i /x)1{X0 > x} + P X i − X i  > x . Assume now that X is regularly varying with tail process Y . Then, for  such that Ai ∩ (Aci ) is a continuity set of Y i (there are only countably many  which fail to have this property), we have, applying (5.2.13),      (m) E 1Ai (X i /x) − 1Ai (X i /x) 1{X0 > x} lim lim sup m→∞ x→∞ P(|X 0 | > x)   ≤ E |1Ai ∩(Aci ) (Y i ) . (5.2.19) Since  is arbitrary, this yields      (m) E 1Ai (X i /x) − 1Ai (X i /x) 1{X0 > x} lim lim sup =0. m→∞ x→∞ P(|X 0 | > x)

(5.2.20)

Combining (5.2.16), (5.2.17), (5.2.18), and (5.2.20), we obtain    t    lim lim sup P {X i ∈ xAi } | X0 > x) m→∞ x→∞  i=s  t     (m) (m) −P {X i ∈ xAi } | X0 > x  = 0 . (5.2.21)  i=s

f i.di.

This proves that Y (m) −→ Y .

114

5 Regularly varying time series f i.di.

Conversely, assume that there exists a process Y such that Y (m) −→ Y . Then, |P(X i ∈ xAi , s ≤ i ≤ t | X0 > x) − P(Y i ∈ Ai , s ≤ i ≤ t)|     (m) (m) ≤ P(X i ∈ xAi , s ≤ i ≤ t | X0 > x) − P(X i ∈ xAi , s ≤ i ≤ t | X0 > x)     (m) (m) (m) + P(X i ∈ xAi , s ≤ i ≤ t | X0 > x) − P(Y i ∈ Ai , s ≤ i ≤ t)     (m) + P(Y i ∈ Ai , s ≤ i ≤ t) − P(Y i ∈ Ai , s ≤ i ≤ t) .   (m) Replacing 1{X0 > x} by 1 X0 > x in the computations that lead to (5.2.19) yields       (m) (m) E 1Ai (X i /x) − 1Ai (X i /x) 1 X0 > x lim sup P(|X 0 | > x) x→∞     (m)   P(X i − X i  > x) (m) −α . (5.2.22) ≤ E |1Ai ∩(Aci ) (Y i ) +  lim sup P(X0 > x) x→∞ (m)

Thus, for sets Ai which are continuity sets of Y i for all m and Y i , letting m tend to ∞, we obtain    t    lim lim sup P {X i ∈ xAi } ∩ {X0 > x}) m→∞ x→∞  i=s  t     (m) (m) −P {X i ∈ xAi } ∩ {X0 > x}   i=s



t 

  E |1Ai ∩(Aci ) (Y i ) .

i=s (m)

Since  is arbitrary, the fact that Y is the tail process of X (m) and the f i.di. assumption that Y (m) −→ Y , we obtain lim P(∩ti=s {X i ∈ xAi } | X0 > x) = P(Y i ∈ Ai , s ≤ i ≤ t) .

x→∞

(m)

The class of Borel sets Ai ⊂ Rd which are continuity sets of all Y i and Y i for all i, m ≥ 1, is convergence determining by Lemma A.2.9. By assumption, (m) d Y 0 −→ Y 0 , thus |Y 0 | has a Pareto distribution. By Theorem 5.2.2, this proves that X is regularly varying and Y is the tail process of X. 

5.2.2 Examples We now give several examples of tail and spectral tail processes.

5.2 The tail process

115

Example 5.2.7 Assume that {Xj , j ∈ Z} is a sequence of i.i.d. regularly varying random variables with tail index α and skewness pX . Then the tail process is Yj = 0, j = 0 and Θj = 0 for j = 0, and P(Θ0 = 1) = pX = 1−P(Θ0 = −1). More generally, if the sequence is m-dependent and regularly  varying then Yj = 0 for |j| > m. Example 5.2.7 can be extended to any multivariate extremally independent time series. Proposition 5.2.8 If the finite-dimensional distributions of the regularly varying time series {X j , j ∈ Z} are extremally independent, then the tail process and the spectral process are identically zero at all lags j = 0.

The next example describes another degenerate case. Example 5.2.9 Assume that {Xj } is a sequence of regularly varying random variables such that Xj = X0 for all j ∈ Z. Then the tail process is Yj = Y0 ,  j ∈ Z and Θj = Θ0 for j ∈ Z. Example 5.2.10 (MA(1) process) Consider the MA(1) process {Xj , j ∈ Z} defined by Xj = Zj + φZj−1 , where φ = 0 and {Zj , j ∈ Z} is an i.i.d. sequence of regularly random variables with tail index α > 0 and skewness pZ ∈ [0, 1]. The process {Xj , j ∈ Z} is stationary and regularly varying with tail index α. Indeed, stationarity is straightforward and the process is 1-dependent so the only property to be checked is the regular variation of the vector (X0 , X1 ). It is a consequence of Corollary 2.1.14 since ⎞ ⎛

Z−1 X0 φ10 ⎝ Z0 ⎠ . = X1 0φ1 Z1 Since the process is 1-dependent, Yj = 0 if |j| > 1. By the “single big jump heuristics” (see Example 2.1.4), |X0 | is extremely large because either |Z0 | or |Z−1 | is large. Formally, for x > 0 P(|X0 | > x) ∼ P(|Z0 | > x) + P(|φ||Z−1 | > x) ∼ (1 + |φ|α )P(|Z0 | > x) . Let H be a bounded continuous function on R3 . Then,

(5.2.23)

116

5 Regularly varying time series

E[H(x−1 (X−1 , X0 , X1 ) | |X0 | > x] E[H(x−1 Z−1 , x−1 φZ−1 , 0)1{|φZ−1 | > x}] (1 + |φ|α )P(|Z0 | > x) −1 E[H(0, x Z0 , x−1 φZ0 )1{|Z0 | > x}] + (1 + |φ|α )P(|Z0 | > x) α ˆ Z, ˆ 0)] + E[H(0, Z, ˜ φZ)] ˜ |φ| E[H(φ−1 Z, , → α 1 + |φ|



with Z˜ a two-sided Pareto random variable with skewness pZ and Z˜ a twosided Pareto random variable with skewness pφZ . Let b be a Bernoulli random variable with mean (1 + |φ|α )−1 , independent of ˆ This yields Y0 = bZ˜ + (1 − b)Z, ˆ Y1 = bφY0 , Y−1 = (1 − b)φ−1 Y0 , Z˜ and Z. and Yj = 0 for all j = 0. The skewness of X0 is given by α pZ (1 + φα + ) + (1 − pZ )φ− . (5.2.24) pX = P(Y0 > 0) = α 1 + |φ| This example will be extended in Chapter 15 to more general models.



Since the tail process is defined as a limit, different processes can have the same tail process. Example 5.2.11 (MMA(1) process) Consider the MMA(1) process {Xj } defined by Xj = max(Zj , φZj−1 ), j ∈ Z, where {Zj } is a sequence of i.i.d. non-negative random variables with (right) tail index α > 0 and φ > 0. Stationarity is again obvious and by 1-dependence, we only need to check the joint regular variation of (X0 , X1 ). It follows from Corollary 2.1.18. Since the Zj are i.i.d., the tail process is given by P(Y1 > y) = lim P(X1 > xy | X0 > x) x→∞

P(Z1 ∨ (φZ0 ) > xy, Z0 ∨ (φZ−1 ) > x) x→∞ P(Z0 ∨ (φZ−1 ) > x) P(Z0 > x{(y/φ) ∨ 1}) = lim x→∞ P(Z0 > x) + P(Z0 > x/φ) {(y/φ) ∨ 1}−α (φ/y)α ∧ 1 = = = P(φbY0 > y) , 1 + φα 1 + φα = lim

with b a Bernoulli random variable with mean 1/(1 + φα ), and Y0 has a one-sided Pareto distribution. The distribution of Y−1 is obtained by similar computations and we see that the tail process has the same form as the one  obtained in Example 5.2.10 with Y0 one sided. Example 5.2.12 (AR(1) process) Let {Zj , j ∈ Z} be a sequence of i.i.d. non-negative regularly varying random variables with tail index α > 0. Consider the recursion

5.3 The time-change formula

117

Xj+1 = ρXj + Zj+1 , with ρ ∈ (0, 1). The stationary solution {Xj , j ∈ Z} can be expressed as Xj =

∞ 

ρi Zj−i , j ∈ Z .

i=0

By Corollary 4.2.1, the tail of the stationary solution is regularly varying and P(X0 > x) ∼ (1 − ρ)−α P(Z0 > x) , x → ∞ . Moreover, for j ≥ 1 and k ∈ Z, Xj+k =

j−1 

ρi Zk+j−i + ρj Xk .

i=0

By the “one big jump heuristics,” for j ≥ 1, if X0 and one of X1 , . . . , Xj are simultaneously extremely large, then X0 is solely responsible for this exceedence. By causality, if j < 0 and if X0 and one of X−j , . . . , X−1 is large, then this past large value is responsible for the exceedence of X0 . Formally, we can describe the tail process as follows. Let {bj } be a sequence of i.i.d. Bernoulli random variables with mean ρα . Then, for each j ≥ 1, conditionally on X0 > x, d

x−1 (X−j , . . . , Xj ) −→ Y0 (b1 . . . bj ρ−j , . . . , b1 b2 ρ−2 , b1 ρ−1 , 1, ρ, . . . , ρj ) , where Y0 is a Pareto random variable with tail index α. Again, this example will be extended in Chapter 15 to more general models.  Example 5.2.13 Let {X j , j ∈ Z} be a stationary regularly varying sequence with tail process {Y j , j ∈ Z} and the spectral tail process {Θj , j ∈ Z}. Let q q > 0. Then the sequence {|X j | , j ∈ Z} is regularly varying with tail process q q  {|Y j | , j ∈ Z} and spectral tail process {|Θj | , j ∈ Z}.

5.3 The time-change formula We now state an important identity which will be used several times to compute probabilities or moments related to the tail process. Theorem 5.3.1 (Time-change formula) Let ν be a measure on (Rd )Z which satisfies conditions (i) to (iv) of Theorem 5.1.2 and let Y and Θ

118

5 Regularly varying time series

be defined by (5.2.2) and (5.2.5). Let H be a bounded or non-negative measurable functional on (Rd )Z . Then, for j ∈ Z and t > 0, E[H(tB j Y )1{t|Y −j | > 1}] = tα E[H(Y )1{|Y j | > t}] , 

Θ j α . E[H(B Θ)1{|Θ−j | = 0}] = E H |Θj | |Θj |

(5.3.1a) (5.3.1b)

The quantity inside the expectation in the right-hand side of (5.3.1b) is understood to be 0 when |Θj | = 0. Proof. Applying (5.2.2), the shift-invariance and homogeneity of ν yield E[H(tB j Y )1{t |Y −j | > 1}]  

 = H(tB j y)1{|y 0 | > 1}1 t y −j  > 1 ν(dy) d Z (R ) 

  = H(ty)1 ty j  > t 1{|ty 0 | > 1} ν(dy) d Z (R )  α =t H(x)1{|xj | > t}1{|x0 | > 1} ν(dx) (Rd )Z

= t E[H(Y )1{|Y j | > t}] . α

This proves (5.3.1a). To prove (5.3.1b), assume first that H is a bounded measurable functional, homogeneous with degree 0. Applying then (5.2.6), the left-hand side of (5.3.1a) becomes E[H(tB j Y )1{|Y −j | > 1/t}] = E[H(B j Θ)1{|Y −j | > 1/t}] . By bounded convergence, the last term converges to the left-hand side of (5.3.1b) as t tends to infinity. Applying now (5.2.6) to the right-hand side of (5.3.1a), we obtain   ∞ 1{r |Θj | > t}αr−α−1 dr tα E[H(Y )1{|Y j | > t}] = tα E H(Θ) 1

α  |Θj | = tα E H(Θ) ∧1 t α

= E [H(Θ) (|Θj | ∧ t) ] . The last term converges to the right-hand side of (5.3.1b) as t tends to infinity and thus (5.3.1b) holds when H is homogeneous with degree 0. If H is not homogeneous degree zero, define the function g on (Rd )Z    with

  −1 by g(y) = H y j  y 1 y j  > 0 . The function g is homogeneous with

5.3 The time-change formula

119

degree 0 and applying the previously proved identity we obtain (using |Θ0 | = 1) E[H(|Θj |

−1

α

α

Θ) |Θj | ] = E[g(Θ) |Θj | ] = E[g(B j Θ)1{|Θ−j | = 0}] = E[H(|Θ0 |

−1

B j Θ)1{|Θ−j | = 0}]

= E[H(B j Θ)1{|Θ−j | = 0}] .



Remark 5.3.2 We have proved the time-change formula by means of the tail measure and without relation to the original time series. This means that if ν is a shift-invariant measure on (Rd )Z satisfying conditions (i) to (iv) of Theorem 5.1.2 and if Y is defined as a random element with distribution η (defined in (5.2.1)), then Y satisfies the time-change formula (5.3.1a). However, the time-change formula can be also proved by using the definition of the tail process as a limit. See Problem 5.10. ⊕ Remark 5.3.3 In the proof, we have shown that (5.3.1a) implies (5.3.1b). These identities are actually equivalent. See Problem 5.12. ⊕ Remark 5.3.4 If H is homogeneous with degree 0, then (5.3.1b) reads E[H(B j Θ)1{|Θ−j | = 0}] = E [H(Θ)|Θj |α ] .

(5.3.2)

If H is homogeneous with degree α, then (5.3.1b) reads E[H(B j Θ)1{|Θ−j | = 0}] = E[H(Θ)1{Θj = 0}] .

(5.3.3) ⊕

Since the spectral tail process is not necessarily bounded (except for Θ0 ), it is important to know the moments of Θj . The time-change formula allows to prove that E[|Θj |α ] is finite.

α

Corollary 5.3.5 For every j ∈ Z, E[|Θj | ] = P(|Θ−j | = 0). If P(Θj = 0) = 1 then P(Θ−j = 0) = 1.

Proof. Apply (5.3.1b) with H ≡ 1.



The second statement of the corollary has an intuitive meaning. Since Θj can be interpreted as the extremal dependence between X 0 and X j , by stationarity, if this dependence vanishes, then so does the dependence between X −j and X 0 which is measured by Θ−j .

120

5 Regularly varying time series

5.4 From the tail process to the tail measure From now on we will forget the original time series and consider the two main objects that we have defined: the tail measure and the tail process. We will only consider shift-invariant tail measures and tail processes which satisfy the time-change formula. Definition 5.4.1 1. A tail measure on (Rd )Z is a measure which satisfies conditions (i)–(iv) of Theorem 5.1.2. 2. A tail process with tail index α is a random element Y on (Rd )Z such that P(|Y 0 | > 1) = 1 and which satisfies the time-change formula (5.3.1a). 3. An α-spectral tail process is a random element Θ on (Rd )Z such that P(|Θ0 | > 0)=1 and which satisfies the time-change formula (5.3.1b).

A measure which satisfies conditions (i) to (iv) of Theorem 5.1.2 is boundedly finite with respect to the boundedness B (∞) (defined above Corollary 5.1.3), that is, ν(A) < ∞ for a measurable set A such that x ∈ A implies sup|j|≤k |xj | >  for  > 0 and k ∈ Z depending on A. A tail process Y is defined without reference to a tail measure. When its distribution is given by (5.2.2), we will call it the tail process associated to the tail measure ν. The purpose of this section is to show that every tail process is associated to a tail measure. If Y is a tail process with tail index α, applying the time-change formula (5.3.1a) with j = 0 and t > 1 yields E[H(|Y 0 |

−1

Y )1{|Y 0 | > t}] = t−α E[H(|Y 0 |

−1

Y )1{t |Y 0 | > 1}]

= t−α E[H(|Y 0 |

−1

Y )] .

−1

This proves that |Y 0 | is independent of |Y 0 | Y which is an α-spectral tail process. Conversely, if Θ is an α-spectral tail process and Y is a Pareto random variable with tail index α, independent of Θ, then Y Θ is a tail process with tail index α. Note that when Y is defined in reference to a tail measure as −1 in Section 5.2, the independence between |Y 0 | and |Y 0 | Y is a consequence of the spectral decomposition of the tail measure (cf. Proposition 5.2.1). Here this independence property is a consequence of the time-change formula. Given a shift-invariant tail measure ν, we already know that we can define a tail process as a random element with distribution η defined in (5.2.1). For a tail measure and a tail process obtained from a stationary time series as in Sections 5.1 and 5.2, we have already seen that the tail process determines

5.4 From the tail process to the tail measure

121

the tail measure since the latter is determined by its finite-dimensional projections, see Remark 5.2.3. We will show in the next section how to obtain this equivalence in the present abstract framework. Our first result is that the tail process associated to a shift-invariant tail measure ν completely determines the tail measure. Let {wj , j ∈ Z} be a sequence of non-negative real numbers and define the map Ew on (Rd )Z by 

  (5.4.1) wj 1 y j  > 1 . Ew (y) = j∈Z

The sequence {wj , j ∈ Z} can be chosen in such a way that  ν({Ew = ∞}) = 0. If the sequence {wj , j ∈ Z} is summable, then Ew (y) ≤ j∈Z wj for all y. If ν is supported on the subset of (Rd )Z which consists of sequences with finitely many exceedences over 1, then the constant sequence wj ≡ 1 can be chosen. In the latter case we simply write E. Since ν is boundedly finite in E = (Rd )Z \ {0}, it is characterized by its value on bounded or non-negative measurable functions H with bounded support in E, that is, such that H(y) = 0 if y ∗ ≤  for one  > 0. For such functions, we show how to recover the tail measure from the tail process. Theorem 5.4.2 Let ν be a shift-invariant (−α)-homogeneous tail measure and let Y be the tail process associated to ν. Let {wj , j ∈ Z} be a sequence of positive real numbers such that ν({Ew = ∞}) = 0. Then, for every bounded or non-negative measurable map H on (Rd )Z such that H(y) = 0 if y ∗ ≤  for some  > 0 (depending on H),   H(B j Y ) −α wj  E ν(H) = . (5.4.2) Ew (B j Y ) j∈Z

   Proof. Fix  > 0 and define Ew (y) = Ew (−1 y) = j∈Z wj 1 y j  >  . The homogeneity of tail measure implies that ν({Ew = ∞}) = 0 for all  > 0. If H(y) = 0 for y such that y ∗ ≤ , multiplying and dividing by Ew (y) and applying the homogeneity and shift-invariance properties of the tail measure and (5.2.2), we obtain

122

5 Regularly varying time series

ν(H) = ν(H1{Ew > 0}) = =



wj −α

j∈Z



 E



 j∈Z

 wj

E

H(y)   1 y j >  ν(dy)  Ew (y)

H(y)   1 y j > 1 ν(dy) Ew (y)

H(B j y) 1{|y 0 | > 1}ν(dy) j E Ew (B y) j∈Z   H(B j Y ) −α wj  E = . Ew (B j Y ) =

wj −α

j∈Z

 If P(E(Y ) < ∞) = 1, then it is possible to let w be the constant sequence equal to 1, and E being shift-invariant, (5.4.2) becomes   H(B j Y ) −α  E ν(H) = . (5.4.3) E(Y ) j∈Z

This is not a completely explicit construction. We will next show how to obtain a fully explicit representation under an additional restriction and a general construction will be given in Problem 5.36. As an immediate and important consequence, we obtain that shift-invariant ν-null sets are also null sets with respect to the distribution of the tail process. We say that a subset A of (Rd )Z is shift-invariant if A = BA and homogeneous if rA = A for all r > 0. Lemma 5.4.3 Let A be a shift-invariant homogeneous measurable set. Then ν(A) = 0 if and only if P(Y ∈ A) = 0. Proof. Since P(Y ∈ A) = ν(A ∩ {|y 0 | > 1}), the direct implication is trivial. Conversely, fix a summable series {wj , j ∈ Z} of positive numbers. For n ≥ 1, set An = A ∩ {y : Ew (ny) > 0}. Since A is homogeneous, nAn = {y : n−1 y ∈ An } = {y : n−1 y ∈ A, Ew (y) > 0} = A ∩ {Ew > 0} . Applying the homogeneity of the tail measure, (5.4.2) and the shift-invariance of A yield     1nAn (B j Y ) 1A (Y ) α wj E w E ν(An ) = nα ν(nAn )=nα =n . j Ew (B j Y ) Ew (B j Y ) j∈Z

j∈Z

Thus P(Y ∈ A) = 0 implies ν(An ) = 0 for all n ≥ 1. Since ν({0}) = 0, this further yields

5.4 From the tail process to the tail measure

123

ν(A) = ν(A \ {0}) = sup ν(An ) = 0 . n≥1

 The quantity E[E −1 (Y )] will play an important role and deserves a name. Definition 5.4.4 Let Y be a tail process. Its candidate extremal index ϑ is defined by  1 ϑ=E . (5.4.4) E(Y )

Obviously, ϑ ≤ 1 and ϑ > 0 if P(E(Y ) < ∞) > 0. In particular, we have P(Y ∈ 0 (Rd )) > 0 =⇒ ϑ > 0 .

(5.4.5)

Theorem 5.4.5 Let Y be a tail process with tail index α. Assume that ϑ > 0. Then for every bounded or non-negative shift-invariant measurable map H and x > 1,        H YY ∗ H YY ∗ 1{Y ∗ > x} −α =x E . (5.4.6) E E(Y ) E(Y )

Proof. Let H be a bounded non-negative shift-invariant measurable map on (Rd )Z . Then, applying the time-change formula (5.3.1a), we have, for x ≥ 1,     H( YY ∗ )1{Y ∗ > x} H( YY ∗ )1{Y ∗ > x} =E 1{E(Y ) < ∞} E E(Y ) E(Y )  

 H( YY ∗ )1 Y ∗−∞,j−1 ≤ x 1{|Y j | > x} = E E(Y ) j∈Z   ∗

Y  H( )1 Y ≤ 1 1{|Y | > 1} ∗ −j −∞,−1 Y = x−α E E(Y ) j∈Z 

∗ Y −α =x E H 1 Y −∞,−1 ≤ 1 1{E(Y ) < ∞} . Y∗ Note that we used the fact that the map y → H((y ∗ )−1 y) is zero-homogenous while applying the time-change formula. Since P(Y ∗ > 1) = 1, taking x = 1 yields

124

5 Regularly varying time series

 

∗ H( YY ∗ ) Y =E H E 1 Y −∞,−1 ≤ 1 1{E(Y ) < ∞} . E(Y ) Y∗ 



Combining these identities yields (5.4.6).

Since Y ∗ is the supnorm of the sequence Y , the process (Y ∗ )−1 Y would deserve the name spectral tail process but it is already taken. This process will be important and therefore it will be convenient to give it a name and we introduce the following definition.

Definition 5.4.6 Let Y be a tail process with positive candidate extremal index. The conditional spectral tail process Q is a random sequence with the distribution of (Y ∗ )−1 Y under the probability PE defined by PE (A) = ϑ−1 E[E −1 (Y )1A ] . That is, for every bounded or non-negative measurable map H,  H((Y ∗ )−1 Y ) ϑE[H(Q)] = E . E(Y )

(5.4.7)

The adjective conditional will be justified in Section 5.5. An important consequence of this definition and Theorem 5.4.5 is that for every bounded or non-negative shift-invariant measurable map H,    ∞ H(Y ) H(Y ∗ (Y /Y ∗ )) E[H(rQ)]αr−α−1 dr . (5.4.8) E =E =ϑ E(Y ) E(Y ) 1 Applying (5.4.8) with H = E yields   α   ϑ E Qj  = ϑ j∈Z

j∈Z



1



  P(r Qj  > 1)αr−α−1 dr

E(Y ) =E = P(E(Y ) < ∞) . E(Y )

Thus, in general, we have ϑ

  α  E Qj  ≤ 1 .

(5.4.9)

j∈Z

If P(E(Y ) < ∞) = 1, then ϑ

  α  E Qj  = 1 . j∈Z

We can complement Lemma 5.4.3.

(5.4.10)

5.4 From the tail process to the tail measure

125

Lemma 5.4.7 Assume that P(E(Y ) < ∞) = 1 and let A be a shift-invariant homogeneous measurable subset of (Rd )Z . Then ν(A) = 0 if and only if P(Q ∈ A) = 0. Proof. By Lemma 5.4.3, it suffices to prove that P(Y ∈ A) = 1 if and only if P(Q ∈ A) = 1. Since we have assumed that P(E(Y ) < ∞) = 1, (5.4.8) yields  E(Y )1{Y ∈ A} P(Y ∈ A) = E E(Y )  ∞ E[E(rQ)1{Q ∈ A}]αr−α−1 dr =ϑ 1   α =ϑ E[Qj  1{Q ∈ A}] . j∈Z

Since (5.4.10) holds, the stated equivalence follows.



We can now prove a very important result. Recall that α (Rd ) is the set of  d Z sequences y ∈ (R ) such that j∈Z |y j |α < ∞. Theorem 5.4.8 Let Y be a tail process with tail index α. The following conditions are equivalent: (i) P(E(Y ) < ∞) = 1; (ii) P(Y ∈ 0 (Rd )) = 1; (iii) P(Y ∈ α (Rd )) = 1.

Proof. Condition (iii) implies ii which in turn implies (i). Conversely, assume that (i) holds. Then (5.4.10) holds and implies that P(Q ∈ α (Rd )) = 1. Since α (Rd ) is homogeneous and shift-invariant, Lemma 5.4.7 implies that  P(Y ∈ α (Rd )) = 1. Since the tail process determines the tail measure (by Theorem 5.4.2) and since a shift-invariant homogeneous set is null for the tail measure if and only if it is null for the tail process (by Lemma 5.4.3), we can rewrite these equivalences in terms of the tail measure. Theorem 5.4.9 Let ν be a shift-invariant (−α)-homogeneous tail measure. Then the following statements are equivalent:

126

5 Regularly varying time series

(i ) ν ({E = ∞}) = 0; (ii ) ν(c0 (Rd )) = 0; (iii ) ν(cα (Rd )) = 0.

Finally, we prove that given a tail process which is concentrated on the set 0 (Rd ) of sequences tending to 0 at ±∞, we can build a tail measure ν with associated tail process Y . The general case (without this restriction) is done in Problem 5.36. Theorem 5.4.10 Let Y be a tail process with tail index α such that P(Y ∈ 0 (Rd )) = 1 and Q be the conditional spectral tail process. Define the measure ν on (Rd )Z by  ∞ ν=ϑ E[δrB j Q ]αr−α−1 dr . (5.4.11) j∈Z

0

Then ν is the unique shift-invariant (−α)-homogeneous tail measure whose tail process is Y .

Proof. The measure ν is obviously shift-invariant and homogeneous and by definition,   α ν({|y 0 | > 1}) = ϑ E[Qj  ] = 1 . j∈Z

This ν is a tail measure and we must check that its associated tail process is Y . Since by definition Q∗ = 1, δrB j Q (0) = 0 for all j ∈ Z and r > 0 thus ν({0}) = 0. For a non-negative measurable map H, we have by the definition of ν, the fact that P(Q∗ = 1) = 1 and (5.4.8),  H(y)1{|y 0 | > 1}ν(dy) E  ∞

  E[H(rB j Q)1 r Qj  > 1 ]αr−α−1 dr =ϑ 0

j∈Z



⎤ ⎡   

E⎣ H(rB j Q)1 r Qj  > 1 ⎦ αr−α−1 dr



=ϑ 1

 =E

j∈Z

j∈Z

H(B j Y )1{|Y j | > 1} E(Y )

 .

5.5 Anchoring maps

127

Applying now the time-change formula (5.3.1a), we obtain   j∈Z H(Y )1{|Y −j | > 1} H(y)1{|y 0 | > 1}ν(dy) = E = E[H(Y )] . E(Y ) E This proves that Y is the tail process associated to ν.



In view of (5.4.11), we introduce a new measure ν ∗ on (Rd )Z which will be useful for limit theorems. Definition 5.4.11 (Cluster measure) Let ν be a (−α)-homogeneous tail measure such that ν(c0 (Rd )) = 0. The cluster measure ν ∗ associated to ν is the measure on 0 (Rd ) defined by  ∞  ∞  δ rY  Y ∗ E[δrQ ]αr−α−1 dr = E ν∗ = ϑ αr−α−1 dr . (5.4.12) E(Y ) 0 0

The terminology will be justified in Chapter 6. The measure ν ∗ is boundedly finite on (Rd )Z \ {0}, puts no mass at 0, and is (−α)-homogeneous. Furthermore, (5.4.11) becomes  ν ∗ ◦ Bj . (5.4.13) ν= j∈Z

In conclusion, the extremal behavior of a regularly varying time series whose tail process tends almost surely to zero can be described by five equivalent objects:

• the tail measure ν; • the cluster measure ν ∗ related to the tail measure by (5.4.13); • the tail process Y related to the tail measure by (5.2.2) and (5.4.3); • the spectral tail process Θ, defined by Θ = |Y 0 |

−1

Y;

• the conditional spectral tail process Q defined in Definition 5.4.6.

5.5 Anchoring maps The sequence Q can be redefined as (Y ∗ )−1 Y conditioned on certain events. This was already done implicitly in the proof of Theorem 5.4.5 for the special

128

5 Regularly varying time series

event {Y ∗−∞,−1 ≤ 1}. This is how it will appear in limits in Chapter 6 and subsequent chapters. The following class of maps will play a key role:

Definition 5.5.1 (Anchoring map) A measurable map A : (Rd )Z → Z∪{−∞, ∞} is called an anchoring map if it has the following properties:   (Ai) if A(y) = j ∈ Z, then y j  ≥ |y 0 | ∧ 1; (Aii) A(By) = A(y) + 1.

Example 5.5.2 – The time of the first exceedence above 1, denoted by A1 , defined by   A1 (y) = inf{j : y j  > 1} with the convention that inf ∅ = ∞ and A1 (y) = −∞ if there are infinitely many exceedences over 1 in Z− (the set of negative integers). – Similarly, the time of the last exceedence over 1 (equal to −∞ if there are no exceedences and ∞ if there is a first one and infinitely many) is an anchoring map. – The infargmax functional, defined by inf arg max(y) = inf{j : y ∗−∞,j = y ∗ }, is an anchoring map. – A random combination of anchoring maps is an anchoring map. For instance, A(y) could be the first exceedence if there is exactly one and the second one if there are more than one. This may not be very useful but shows that there are infinitely many such maps.  If a sequence has at least one and finitely many exceedences over 1, then it has a first and a last exceedence and its infargmax is in Z. We next prove that the converse is almost surely true for a tail process and obtain a representation of the sequence Q in terms of certain anchoring maps. Theorem 5.5.3 Let Y be a tail process. For every anchoring map, P(A(Y ) ∈ Z, E(Y ) = ∞) = 0 .

(5.5.1)

P(E(Y ) < ∞, A(Y ) ∈ / Z) = 0 .

(5.5.2)

Assume that

5.5 Anchoring maps

129

Then, for every non-negative or bounded shift-invariant measurable map H,  H(Y ) E = E[H(Y )1{A(Y ) = 0}] . (5.5.3) E(Y )

In (5.5.3), we use the convention map H, H H = 1{0 < H < ∞}.

0 0

=

∞ ∞

= 0 which implies that for a positive

Proof. Let A be an anchoring map and H a shift-invariant non-negative map. Since P(|Y 0 | > 1) = 1, A(Y ) = j implies |Y j | > 1. Applying the time-change formula (5.3.1a), we obtain E[H(Y )]  E[H(Y )1{A(Y ) = j}] + E[H(Y )1{A(Y ) ∈ / Z}] = j∈Z

=

 j∈Z

=



E [H(Y )1{A(Y ) = j}1{|Y j | > 1}] + E[H(Y )1{A(Y ) ∈ / Z}] 

 E H(Y )1 A(B j Y ) = j 1{|Y −j | > 1} + E[H(Y )1{A(Y ) ∈ / Z}]

j∈Z

=



E [H(Y )1{A(Y ) = 0}1{|Y −j | > 1}] + E[H(Y )1{A(Y ) ∈ / Z}]

j∈Z

= E[H(Y )E(Y )1{A(Y ) = 0}] + E[H(Y )1{A(Y ) ∈ / Z}] .

(5.5.4)

Taking H(y) = 1{E(y) = ∞} yields P(E(Y ) = ∞) = E[E(Y )1{E(Y ) = ∞}1{A(Y ) = 0}] + P(E(Y ) = ∞, A(Y ) ∈ / Z) . The first term in the right-hand side of the above equality can only be 0 or ∞. Since it is less than one, it is equal to zero and we conclude that P(E(Y ) = ∞) = P(E(Y ) = ∞, A(Y ) ∈ / Z) . Thus P(A(Y ) ∈ Z, E(Y ) = ∞) = 0. This proves the first claim. Then, for a bounded measurable shift-invariant map, (5.5.4) yields

(5.5.5)

130

5 Regularly varying time series

 E

 H(Y ) H(Y )1{E(Y ) < ∞} =E E(Y ) E(Y ) = E[H(Y )1{E(Y ) < ∞}1{A(Y ) = 0}]  H(Y )1{E(Y ) < ∞}1{A(Y ) ∈ / Z} +E E(Y ) = E[H(Y )1{A(Y ) = 0}]  H(Y )1{E(Y ) < ∞}1{A(Y ) ∈ / Z} . +E E(Y ) 

If (5.5.2) holds, the last term vanishes and this proves (5.5.3).

We obtain a new representation of the candidate extremal index and of the tail measure. Corollary 5.5.4 For every anchoring map A which satisfies (5.5.2), ϑ = P(A(Y ) = 0) . If P(Y ∈ 0 (Rd ) = 1, then  ∞ ν=ϑ E[δr(Y ∗ )−1 B j Y | A(Y ) = 0]αr−α−1 dr.

(5.5.6)

(5.5.7)

0

j∈Z

Note that (5.5.7) is true since for every non-negative map H, the map  j j∈Z H ◦ B is shift-invariant so we can use (5.5.3). This representation is not true for ν ∗ . However, for a non-negative or bounded shift-invariant measurable map H,  ∞ ν ∗ (H) = ϑ E[H(r(Y ∗ )−1 Y ) | A(Y ) = 0]αr−α−1 dr . (5.5.8) 0

If there exists  > 0 such that H(x) = 0 if x∗ ≤ , (5.5.8) becomes ν ∗ (H) = ϑ−α E[H(Y ) | A(Y ) = 0] .

(5.5.9)

We can also add one equivalent condition to Theorem 5.4.8. Corollary 5.5.5 Let Y be a tail process with tail index α. The following conditions are equivalent: (i) there exists an anchoring map A such that P(A(Y ) ∈ Z) = 1: (ii) P(E(Y ) < ∞) = 1.

5.6 Identities

131

Proof. If (ii) holds, then the first exceedence over 1, the last one and the infargmax satisfy (i). Conversely, if (i) holds, then P(E(Y ) = ∞) = 0 by (5.5.5). 

5.6 Identities As we will see in Chapter 6, the conditional spectral tail process Q appears naturally in limits. However, it is difficult to compute explicitly applying the definition or (5.5.3). It is therefore of interest to obtain expressions of the quantities introduced in the previous section in terms of the spectral tail process Θ and since the forward tail process is often easier to compute than the backward tail process, it will be even more convenient to obtain expressions of these quantities in terms of the forward tail process only. Lemma 5.6.1 Let Y be a tail process with tail index α such that P(E(Y ) < ∞) > 0. Let Θ and Q be the associated spectral and conditional spectral tail processes. Let H be a non-negative shift-invariant α-homogeneous functional. Then,  H(Θ) 1{|Θ| < ∞} = ϑE[H(Q)] . (5.6.1) E α α |Θ|α Note that both sides of (5.6.1) are simultaneously finite or infinite. α

Proof. Since |Y |α < ∞ implies E(Y ) < ∞ and P(|Q|α < ∞) = 1 by (5.4.9), we have by the homogeneity of H and (5.4.8),   H(Y )E(Y ) H(Θ) 1{|Y | 1{|Θ| < ∞} = E < ∞} E α α α α |Θ|α |Y | E(Y )  ∞ α H(Q)E(rQ) =ϑ E αr−α−1 dr α |Q|α 1 

    ∞ H(Q)1 r Qj  > 1 =ϑ E αr−α−1 dr α |Q| α j∈Z 1   α   H(Q) Qj  =ϑ E = ϑE[H(Q)] . α |Q|α j∈Z

 Taking H(x) = (x∗ )α yields the following corollary:

132

5 Regularly varying time series

Corollary 5.6.2 Let Θ be an α-spectral tail process. Then,  ∗ α (Θ ) 1{|Θ| < ∞} . ϑ=E α α |Θ|α

(5.6.2)

Let Y be the tail process associated to Θ. If P(E(Y ) < ∞) = 1, then  ∗ α (Θ ) ϑ=E . (5.6.3) α |Θ|α

We now provide a result under ad hoc assumptions which will be useful to derive identities between the sequence Q and the forward tail process. Recall the notation xi,∞ = (xi , xi+1 , . . . ) for x ∈ (Rd )Z and i ∈ Z, |x|p =  p ( j∈Z |xj | )1/p for p > 0 and α [p](Rd ) = {x ∈ (Rd )Z : |x|p < ∞}. Lemma 5.6.3 Let H : H → R be a shift-invariant, α-homogeneous measurable function defined on a shift-invariant, homogeneous measurable subset H of (Rd )Z such that P(Θ ∈ H | Θ ∈ α ) = 1 and P(Θn,∞ ∈ H | Θ ∈ α ) = 1 for all n ∈ Z. Assume moreover (i) E[|H(Θ0,∞ ) − H(Θ1,∞ )| | Θ ∈ α ] < ∞; (ii) P(limn→∞ H(Θn,∞ ) = 0 | Θ ∈ α ) = 1; (iii) P (limn→∞ H(Θ−n,∞ ) = H(Θ) | Θ ∈ α ) = 1. Then P(Q ∈ H) = 1, E[|H(Q)|] < ∞ and ϑE[H(Q)] = E[{H(Θ0,∞ ) − H(Θ1,∞ )}1{Θ ∈ α }] .

(5.6.4)

−α

Proof. Since the function x → |x|α {H(x0,∞ )−H(x1,∞ )} is 0-homogeneous and equal to 0 whenever |x0 | = 0, by the time-change formula (5.3.2), we have   |H(Θ−j,∞ ) − H(Θ−j+1,∞ )| E 1{Θ ∈  } α α |Θ|α j∈Z   α |H(Θ0,∞ ) − H(Θ1,∞ )| = E |Θj | 1{Θ ∈ α } α |Θ|α j∈Z

= E[|H(Θ0,∞ ) − H(Θ1,∞ )|1{Θ ∈ α }] . Consequently, P

 = 1 and |H(Θ ) − H(Θ )| < ∞ | Θ ∈  −j,∞ −j+1,∞ α j∈Z



5.6 Identities

133

⎡ ⎤  H(Θ−j,∞ ) − H(Θ−j+1,∞ ) E⎣ 1{Θ ∈ α }⎦ α |Θ|α j∈Z   H(Θ−j,∞ ) − H(Θ−j+1,∞ ) = E 1{Θ ∈  } α α |Θ|α j∈Z

= E[{H(Θ0,∞ ) − H(Θ1,∞ )}1{Θ ∈ α }] . On the other hand, assumptions (ii) and (iii) and the dominated convergence theorem ensure that  H(Θ−j,∞ ) − H(Θ−j+1,∞ ) 1{Θ ∈ α } α |Θ|α j∈Z

= lim

n→∞

= lim

n→∞

 −n 1, these quantities may no longer be finite. In view of the bounds αxα−1 ≤ (1 + x)α − xα ≤ α(1 + x)α−1 ,

(5.6.7)

we will use the following convention: ⎧ if α < 1 , ⎨0 α α 1 if α=1, for x = +∞ , (1 + x) − x = ⎩ +∞ if α > 1 .

(5.6.8)

Proposition 5.6.5 Let Y be a tail process with tail index α such that P (Y ∈ 0 ) = 1. Then  α α α  ϑE[|Q|1 ] = E {|Θ0,∞ |1 − |Θ1,∞ |1 } . (5.6.9) If α > 1, the following conditions are equivalent: α

E [|Q|1 ] < ∞ ,   α−1 E |Θ|1 1. Applying this identity and the time-change formula (5.3.2), we obtain 

 ∞ α α α  |Θj,∞ |1 − |Θj+1,∞ |1 |Θ|1 E E 1{|Θj | = 0} α = α |Θ|α |Θ|α j=−∞  ∞ α α  |Θ0,∞ |1 − |Θ1,∞ |1 α = E |Θ | j α |Θ|α j=−∞  α α  = E {|Θ0,∞ |1 − |Θ1,∞ |1 } . Applying Lemma 5.6.1 proves (5.6.9) and applying the inequality (5.6.7), we conclude that (5.6.10a) and (5.6.10b) are equivalent. 

5.6 Identities

135

Proposition 5.6.6 Let Y be a tail process with tail index α such that P (Y ∈ 0 ) = 1. Let H be an α-homogeneous, shift-invariant functional such that |H(x) − H(y)| ≤ cstx − yα α .

(5.6.11)

Then E[|H(Q)|] < ∞, E[|H(Θ0,∞ ) − H(Θ1,∞ )|] < ∞ and ϑE[H(Q)] = E[H(Θ0,∞ ) − H(Θ1,∞ )] .



Proof. The bound (5.6.11) implies the conditions of Lemma 5.6.3.

Example 5.6.7 We obtain the following identities for d = 1. For x ∈ R, define x α = |x|α sign(x). Then  α ϑ E[(Qj )α (5.6.12a) + ] = E[(Θ0 )+ ] = P(Θ0 = 1) = pX , j∈Z

ϑ



α E[(Qj )α − ] = E[(Θ0 )+ ] = P(Θ0 = −1) = 1 − pX ,

(5.6.12b)

j∈Z

ϑ



α

α

E[Qj ] = E[Θ0 ] = E[Θ0 ] = 2pX − 1 ,

(5.6.12c)

j∈Z

where pX is the extremal skewness of the distribution of X0 . Other identities with α = 1 will be given in Problems 5.33 and 5.34.  Theorem 5.6.8 Let Y be a tail process with tail index α such that P (Y ∈ 0 ) = 1 and (5.6.10a) holds. Let K be a non-negative 1homogeneous, shift-invariant functional defined on 1 and Lipschitz continuous with respect to the 1 norm, i.e. for all x, y ∈ 1 , |K(x) − K(y)| ≤ cst · |x − y|1 .

(5.6.13)

Then E[|K α (Θ0,∞ ) − K α (Θ1,∞ )|] < ∞ and ϑE[K α (Q)] = E[K α (Θ0,∞ ) − K α (Θ1,∞ )] .

(5.6.14)

Proof. Apply Lemma 5.6.3 to the function H α which satisfies its assumptions in view of (5.6.13) and the following bounds: for all a, b ≥ 0, ( |a − b|α if α ≤ 1 , α α (5.6.15) |a − b | ≤ α−1 if α > 1 . α|a − b|(a ∨ b)

136

5 Regularly varying time series

   Example 5.6.9 Assume that P lim|j|→∞ |Y j | = 0 = 1 and (5.6.10a) holds. Then ⎞α ⎤ ⎡⎛ ⎞α ⎛ ⎞α ⎤ ⎡⎛ ∞ ∞     Qj ⎠ ⎦ = E ⎣⎝ (5.6.16) |Θj |⎠ − ⎝ |Θj |⎠ ⎦ . ϑE ⎣⎝ j=0

j∈Z

j=1

For d = 1, we have ⎡ ⎛ ⎞⎤ ⎡ ⎛ ⎞ ⎛ ⎞⎤ ∞ ∞    ϑE ⎣h ⎝ Qj ⎠⎦ = E ⎣h ⎝ Θj ⎠ − h ⎝ Θj ⎠⎦ , j∈Z

⎡ ⎛ ϑE ⎣h ⎝sup i∈Z



⎞⎤

j=0





Qj ⎠⎦ = E ⎣sup h ⎝

j≤i

i≥0

i  j=0



j=1



Θj ⎠ − sup h ⎝ i≥0

i 

(5.6.17a) ⎞⎤

Θj ⎠⎦ ,

j=1

(5.6.17b) α with the function h being defined on R by h(x) = xα + , h(x) = x− or h(x) = α |x| . Note that the above limits can be zero if {Xj , j ∈ Z} is not non-negative. Asmentionedafter Lemma 5.6.3, it is not assumed in these identities that α ∞ < ∞. For many time series, this relation does not hold.  E j=0 |Θ|

5.7 Problems 5.1 Prove that the space 0 (Rd ) endowed with  · ∞ is a Banach space. Hint: prove that a Cauchy sequence in 0 (Rd ) is convergent by showing that its limit for the uniform norm is the sequence of the pointwise limits. 5.2 Prove that if A is shift-invariant in (Rd )Z , then ν(A) ∈ {0, ∞}. 5.3 Define τ|X | (h) = limx→∞ P(|X h | > x | |X 0 | > x). Express τ|X |(h) in terms of the tail and spectral tail processes. 5.4 If X is regularly varying with tail index α > 0, prove that for all β < α and h ∈ Z, lim x−β E[|X h | | |X 0 | > x] = E[|Y h | ] . β

β

x→∞

5.5 If α > 1, define CTE|X | (h) = limx→∞ x−1 E[|X h | | |X 0 | > x]. Express this quantity in terms of the tail and spectral tail processes.

5.7 Problems

137

Calculation of Y , Θ, ϑ and Q 5.6 Consider the MA(1) and MMA(1) processes of Examples 5.2.10 and 5.2.11. 1. Compute ϑ; 2. Compute τ|X| (h); 3. Compute CTE|X| (h); 4. Determine (up to shift) the conditional spectral tail process Q (use the first exceedence map); 5. Verify (5.4.10) by direct calculation. 5.7 Consider the AR(1) process of Example 5.2.12. 1. Show that ϑ = 1 − ρα . 2. Compute τX (h) and CTEX (h). 3. Show that the conditional spectral tail process Q is defined (up to shift) as follows: Qj = ρj , j ≥ 0, Qj = 0, j ≤ −1. (Use the first exceedence functional.) 4. Verify (5.4.10) by direct calculation. Tail process under transformation 5.8 (Generalization of Example 5.2.13.) Let {X j , j ∈ Z} be a regularly varying d-dimensional stationary time series with tail process {Y j , j ∈ Z}. Let g : Rd → Rk be a continuous homogenous map, i.e. there exists β > 0 such that g(tx) = tβ g(x) for all x ∈ Rd and t > 0. Assume furthermore that |g(x)| > 1 implies |x| > 1. Prove that if P(|g(Y 0 )| > 1) > 0, then the tail process of the sequence {g(X j ), j ∈ Z} is {g(Y j ), j ∈ Z}, conditionally on |g(Y 0 )| > 1. 5.9 Let {X j , j ∈ Z} be a regularly varying d-dimensional stationary time series. For h ≥ 1 define a Rd(h+1) valued stationary regularly varying time )j = (X j , . . . , X j+h ), j ∈ Z. Let {Y j , j ∈ Z} be the tail process series by X of {X j , j ∈ Z} associated to a norm |·| on Rd . Let {Y* j , j ∈ Z} be the tail )j , j ∈ Z} and a norm  ·  on Rd(h+1) which has process associated with {X the property that (5.7.1) |x0 | > 1 =⇒ (x0 , . . . , xh ) > 1 . Define the maps π0 : Rd(h+1) → Rd and Π0 : (Rd(h+1) )Z → (Rd )Z by π0 ((x0 , . . . , xh )) = x0 and xj , j ∈ Z}) = {π0 (* xj ), j ∈ Z} . Π0 ({*

138

5 Regularly varying time series

) = X. Given a functional Φ : (Rd )Z → R, With this notation, we have Π0 (X) d(h+1) * : (R * = Φ ◦ Π0 , i.e. we define a functional Φ )Z by Φ * x) = Φ(Π0 (* * ∈ (Rd(h+1) )Z . Φ(* x)) , x 1. Write Y* j = (Y* j,0 , . . . , Y* j,h ). Prove that P(Y* 0,0 > 1) = lim

x→∞

2. Prove that

P(|X 0 | > x) . P((X 0 , . . . , X h ) > x)

    * Y* ) | Y* 0,0  > 1 = E[Φ(Y )] . E Φ(

The time-change formula 5.10 Prove the time-change formula (5.3.1a) by using the definition of the tail process as a limit. Hint: consider a bounded or non-negative measurable function H on (Rd )Z which depends only on finitely many coordinates. 5.11 Let Θ be a random element on (Rd )Z which satisfies the time-change formula (5.3.1b) and such that P(|Θ0 | > 0) = 1. Prove that P(|Θ0 | = 1) = 1. 5.12 Let Y be a random variable with a Pareto distribution with index α and independent of Θ. Define Y = Y Θ. Prove that if Θ satisfies (5.3.1b), then Y satisfies (5.3.1a). 5.13 Let Y be a random element on (Rd )Z which satisfies the time-change formula (5.3.1a) and such that P(|Y 0 | > 0) = 1. Then |Y 0 | has a Pareto −1 distribution with tail index α. Define Θ = |Y 0 | Y . Show that Θ and Y 0 are independent. 5.14 Let φ be a bounded measurable function on Rd such that φ(y 0 ) = 0 if |y 0 | ≤ . Prove that for all j ∈ Z, E[φ(Y 0 )φ(Y j )] = E[φ(Y 0 )φ(Y −j )]. Hint: apply the time-change formula. Relations between ν, ν ∗ , Θ, Y and Q 5.15 Prove that the identity (5.2.8) is wrong if the indicator 1{|y 0 | > 0} is omitted in the left-hand side. Hint: consider a sequence of i.i.d. regularly varying random variables.

5.7 Problems

139



5.16 Assume that i∈Z P(|Θi | = 0) = 0. Prove that in that case (5.2.8) holds without the indicator, i.e.  ∞ E[H(rΘ)]αr−α−1 dr . ν(H) = 0

Hint: for a non-negative measurable function H, apply the time-change for+∞ mula (5.3.3) to the function K(x) = 0 H(rx)αr−α−1 dr, x ∈ (Rd )Z . 5.17 Assume that P(lim|j|→∞ |Θj | = 0) = 1 and ∞ 

α

E[|Θj | ] < ∞

(5.7.2)

j=0

and let H be a shift-invariant measurable functional on (Rd )Z such that there exists η > 0 such that  |H(x)| ≤ cst 1{|xj | > η} . j∈Z

1. Prove that ν ∗ (H) =

 0





 E H(uΘ0,∞ )1 Θ∗−∞,−1 = 0 αu−α−1 du .

+ ∞

 Hint: prove that ν (H) = E  H(vΘ0,∞ )1 vΘ∗−∞,−1 ≤  αu−α−1 du for all  ≤ η and conclude by using (5.7.2) and the dominated convergence theorem. ∞ α 2. Prove that P(Θ∗−∞,−1 = 0) = 0 implies j=0 E[|Θj | ] = ∞. ∗

∞ 5.18 Assume that (5.6.10a) holds and E[( j=0 |Θj |)α ] < ∞. Let K be a shift-invariantfunctional on 1 , homogeneous with degree 1 and such that |K(x)| ≤ cst j∈Z |xj | for all x ∈ 1 . Prove that 

 (5.7.3) ϑE[K α (Q)] = E K α (Θ0,∞ )1 Θ∗−∞,−1 = 0 .

  Hint: Define the sequences Q (u) = {Qj 1 u Qj  >  , j ∈ Z} and Θ (u) = {Θj 1{u |Θj | > }, j ∈ N}. Prove that  ∞ ϑ lim P (uK(Q (u)) > 1) αu−α−1 du = ϑE[K α (Q)] < ∞ . →0



Apply the identities (5.4.6) and (5.2.6) and conclude by the dominated convergence theorem. 5.19 Prove that if

⎞α ⎤ ⎡⎛ ∞  |Θj |⎠ ⎦ < ∞ , E ⎣⎝ j=0

140

then

5 Regularly varying time series

α ⎤ α ⎡ ⎤ ⎡ ∞      ∗  

Qj  ⎦ = E ⎣ Θj  1 Θ−∞,−1 = 0 ⎦ . E ⎣  j=0  j∈Z  

(5.7.4)

Hint: Apply Problem 5.18.   5.20 Assume that P lim|j|→∞ |Y j | = 0 = 1. Let H be a 0-homogeneous, bounded, shift-invariant functional on (Rd )Z . Prove that   α E[H(Y )] = ϑ E[H(Q) Qj  ] . j∈Z

Hint: Apply Theorem 5.4.10 to H(y)1{|y 0 | > 1}.   5.21 Assume that P lim|j|→∞ |Y j | = 0 = 1. Let K be a 1-homogeneous, non-negative shift-invariant measurable functional on (Rd )Z . Prove that     α E[ K(Q) ∧ Qj  ] . P(K(Y ) > 1) = ϑ j∈Z

Hint: Apply Theorem 5.4.10 to 1{K(y) > 1}1{|y 0 | > 1}.   5.22 Assume that P lim|j|→∞ |Y j | = 0 = 1. Let H be a shift-invariant functional such that H depends only on the coordinates that are larger than 1. Then ν ∗ (H) = E[H(Y 0,∞ ) − H(Y 1,∞ )] .

(5.7.5)

Hint: write H(Y 0,∞ ) as a telescoping sum. Apply the time-change formula (5.3.1a). 5.23 Assume that P(lim|j|→∞ |Y j | = 0) = 1 and let H be a bounded measurable functional which depends only on coordinates greater than 1 and such that E[|H(Y )||H(Y 0,∞ ) − H(Y 1,∞ )|] < ∞. Prove that ν ∗ (H 2 ) < ∞ and ν ∗ (H 2 ) = E[H(Y ){H(Y 0,∞ ) − H(Y 1,∞ )}] . Hint: write H(Y 0,∞ ) as a telescoping sum. Apply the time-change formula (5.3.1a). 5.24 Assume that P(lim|j|→∞ |Y j | = 0) = 1 and let H, H  be bounded functionals on (Rd )Z such that H  (x) = 0 if x∗ ≤ 1 and E[|H(Y )||H  (Y 0,∞ )− H  (Y 1,∞ )|] < ∞. Prove that ν ∗ (HH  ) = E[H(Y ){H  (Y 0,∞ ) − H  (Y 1,∞ )}] . Hint: write H(Y 0,∞ ) as a telescoping sum. Apply the time-change formula (5.3.1a).

5.7 Problems

141

5.25 Assume that P(lim|j|→∞ |Y j | = 0) = 1. Let H be a shift-invariant bounded or non-negative measurable functional. Prove that ν ∗ (HE) = E[H(Y )] . Deduce that ν ∗ (E 2 ) =



P(|Y j | > 1)

j∈Z

and for s, t > 0, ν ∗ (Es Et ) = s−α



P(|Y j | > t−1 s) ,

j∈Z

where Es (x) = E(x/s). Hint: use Problem 5.24. 5.26 Let φ : Rd → R be a non-negative measurable function. Define the  functions Hφ on 0 (Rd ) by Hφ (x) = j∈Z φ(xj ). Prove that ν ∗ (Hφ ) = ν 0 (φ).  ∞ 5.27 Prove that j∈Z P(|Y j | > 1) and j=1 P(|Y j | > 1) are simultaneously finite or infinite and that 

P(|Y j | > 1) = 1 + 2

∞ 

P(|Y j | > 1) .

j=1

j∈Z

Hint: apply the time-change formula (5.3.1a) to each of the summands. 5.28 Let Y be a tail process such that P(lim|j|→∞ |Y j | = 0) = 1 and let Q be the associated conditional tail process. Prove that d

Y = Y B T (|Q0 |

−1

Q) ,

with Y a Pareto random variable and T an integer-valued random variable such that (Q, T ) is independent of Y and joint distribution given by  α E[g(Q, T )] = ϑ E[g(Q, j) |Q−j | ] . j∈Z

Hint: use the representation (5.4.11). 5.29 Assume that P(lim|j|→∞ |Y j | = 0) = 1 and define the process Z as Y conditioned on the first exceedence over 1 to happen at time zero, ∗ i.e.  Y −∞,−1 ≤ 1. Define the exceedence functional E on 0 by E(x) = j∈Z 1{|xj | > 1} and let Ti (x) be (i+1)-th exceedence of x above the level 1.

142

5 Regularly varying time series

1. Prove the dual relationships:

⎡ ⎤ E(Z )−1 ∞     E H(B −j Z)1{|Z j | > 1} = ϑE ⎣ H(B −Ti (Z ) Z)⎦ , E[H(Y )] = ϑ j=0

 ϑE[H(Z)] = E

i=0 −T1 (Y )

H(B Y) E(Y )

(5.7.6a)

.

(5.7.6b)

2. Prove that E[E(Y )] = E[E 2 (Z)] and P(E(Y ) = k) = ϑkP(E(Z) = k) . This shows that the distribution of the number of exceedences of the tail process Y is the size-biased distribution corresponding to the distribution of the number of exceedences of the process Z. ∞ 3. Set E0 (x) = j=0 1{|xj | > 1}. Apply (5.7.6a) to prove that P(E0 (Y ) = k) = ϑP(E(Z) > k) . 5.30 Assume that P(lim|j|→∞ |Y j | = 0) = 1. Let A and A be two anchoring A A maps and assume that A is 0-homogeneous.  Let Q and Q be two ran dom elements in 0 (Rd ) such that L QA = L (Y ∗ )−1 Y | A(Y ) = 0 and       A d L QA = L (Y ∗ )−1 Y | A (Y ) = 0 . Prove that QA = B −A(Q ) QA . ˜ = 5.31 Let H be a bounded or non-negative functional on 0 (Rd ). Set H  j j∈Z Uj H ◦ B with Uj (x) = 1{|xj | > 1}. Prove that  ˜ 2) = ν ∗ (H E[H(Y )H(B −j Y )1{|Y j | > 1}] . (5.7.7) j∈Z

˜ Hint: Apply (5.4.13) and the shift-invariance of H. 5.32 Let X be a univariate regularly varying time series with tail index α > 0, tail process Y such that P(lim|j|→∞ |Yj | = 0) = 1, candidate extremal index ϑ and conditional spectral tail process Q. Prove that for all h ≥ 0,  P(Xj Xj+h > x) α/2 = ϑ lim E[(Qj Qj+h )+ ] . x→∞ P(X02 > x) j∈Z

Identities for α = 1 5.33 Let d = 1 and consider a non-negative time series with tail index α = 1 and whose tail process satisfies P(lim|j|→∞ Yj = 0) = 1.

5.7 Problems

143

1. Prove that (regardless of whether these quantities are finite or infinite),     = E[log(|Θ|1 ] . ϑ E Qj log Q−1 j |Q|1 j∈Z

Hint: use the definition of the conditional spectral tail process Q with the infargmax functional and apply the time-change formula (5.3.1b).  2. Write Vj = k≥j Θk for all j ∈ Z. Prove that for every j ≥ 1 0 ≤ E [log(V−j ) − log(V−j+1 )] = E[Θj {log(V0 ) − log(V1 )}] . 3. Prove that if y ≥ x > 0, then 0 ≤ log(y) − log(x) ≤ (y − x)/x. 4. Use the previous results to obtain that for all n ≥ 1, 0 ≤ E [log(V−n )] ≤ E[log(V0 )] + 1 . 5. Conclude that the following conditions are equivalent:     1} < ∞ for all j ∈ Z is σ-finite and uniquely determined by its restrictions to the sets {y j  > 1}, j ∈ Z.

5.36 Let Θ be a random element with values in (Rd )Z defined on a probability space (Ω, F, P), such that P(|Θ0 | = 1) = 1 and which satisfies the timechange formula (5.3.1b). For two sequences q ∈ RZ and y ∈ (Rd )Z we define the componentwise product q · y = {qj y j , j ∈ Z}. 1. Show that one can choose a sequence q ∈ [0, ∞)Z such that P(inf arg max(q· α B i Θ) ∈ Z) = 1 for all i ∈ Z. Hint: use the fact that E[|Θj | ] ≤ 1 for all j ∈ Z. 2. Fix one such sequence q and define the measure ν q on (Rd )Z by  ∞

νq = E[δrB j Θ 1 inf arg max(q · B j Θ) = j ]αr−α−1 dr . j∈Z

0

Prove that ν q ({0}) = 0 and that ν q is homogeneous.

5.8 Bibliographical notes

145

3. Use the assumption that P(inf arg max(q·Θ) ∈ Z) = 1 and the time-change formula to obtain that for all h ∈ Z,   ∞ H(y)1{|y h | > 1}ν q (dy) = E[H(rB h Θ)]αr−α−1 dr . (5.7.13) (Rd )Z

1

4. Apply Problem 5.35 to obtain that ν q does not depend on the sequence q. 5. Apply (5.7.13) and Problem 5.35 to prove that ν q is shift-invariant. 6. Conclude that if Θ is the tail process associated to a shift-invariant tail measure ν, then ν q = ν.

5.8 Bibliographical notes The tail process and spectral tail process and the time-change formula were originally introduced in [BS09]. The idea is older and appeared implicitly in the study of regularly varying Markov chains in [Yun98]. The sequence Q appeared implicitly in the seminal paper [DH95]. The existence of the tail measure was first stated (with a sketch of proof) in the unpublished manuscript [SO12] and proved in the context of regular variation on metric spaces in [SZM17]. Anchoring maps were introduced in [BP18]. They also appear in [Has19] under the name 0-pinning functionals. Condition (5.5.2) on anchoring maps is also used in [Has19] in the context of max-stable processes; see Chapter 13. Problem 5.30 is due to Hrvoje Planini´c (personal communication). The equivalence of (ii) and (iii) in Theorem 5.4.8 was originally obtained in a different form (see Corollary 13.2.5) in the context of max-stable processes in [DK17]. It was later proved in [Je19] in terms of the spectral tail process. The rest of this chapter is essentially based on [PS18] and [DHS18]. Some identities of Section 5.6 were partially obtained in [MW14, MW16].

Part II

Limit Theorems

6 Convergence of clusters

When a regularly varying stationary time series is extremally dependent, then the exceedences over high levels will tend to happen in clusters. From a statistical point of view, it is important to identify these clusters in order to make inference on the extremal dependence of the time series. A very common statistical practice is, given n successive observations of the said time series, to split them into mn blocks of size rn (with mn rn ≈ n and mn , rn → ∞), compute a given extreme value statistics on each of the mn blocks, and average these values to obtain an estimator. The simplest example is the number of exceedences over a high threshold cn : in each block the number of exceedences will be recorded, and the average of these numbers will be an estimator of the mean cluster size. We expect that the statistics computed on each block converge to the corresponding statistics computed on the tail process, which can be seen as an asymptotic representation for the block values scaled by cn when the block size rn and the threshold cn both tend to infinity in an appropriate way. In this chapter, building on the results of Chapter 5, we will formalize the latter idea and study the convergence of statistics computed on one block. The problem of convergence of the averaged statistics and their asymptotic distributions will be considered in Chapters 9 and 10.

6.1 What is a cluster? Let X be a regularly varying stationary time series with values in Rd . In this section, we will give sense to the heuristic definition of a cluster as a triangular array (X 1 /cn , . . . , X rn /cn ), with rn , cn → ∞, which converges to the tail process in a certain sense. By definition of the tail process, for each fixed r ∈ N, the distribution of c−1 n X −r,r conditionally on |X 0 | > cn converges weakly to the distribution of

© Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 6

149

150

6 Convergence of clusters

Y −r,r . In order to let r tend to infinity, we must embed all these finite vectors into one space of sequences. By adding zeroes on each side of the vectors d c−1 n X −r,r and Y −r,r we identify them with elements of the space 0 (R ) of d R -valued sequences which tend to zero at infinity. Then Y −r,r converges to Y in 0 if (and only if) Y ∈ 0 almost surely. Since 0 endowed with the uniform norm is a complete separable metric space, weak convergence in 0 is metrizable and we can apply the triangular argument Lemma A.1.3 to obtain the following result. Lemma 6.1.1 Let X be a regularly varying stationary time series with values in Rd and tail process Y and let {cn } be a scaling sequence. If P(lim|j|→∞ |Y j | = 0) = 1, then there exists a scaling sequence {rn } of integers such that the distribution of c−1 n X −rn ,rn conditionally on |X 0 | > cn converges weakly in 0 (Rd ) to the distribution of Y . Moreover, for all x > 0,   lim lim sup P max |X j | > cn x | |X 0 | > cn = 0 . (6.1.1) m→∞ n→∞

m≤|j|≤rn

Proof. As already mentioned, the existence of a scaling sequence {rn } with the required properties is simply a consequence of the metrizability of weak convergence in 0 . The stated convergence implies that for each fixed m and all x > 0,     lim P max |X j | > cn x | |X 0 | > cn = P sup |Y j | > x . n→∞

m≤|j|≤rn

|j|≥m

Since the tail process tends to zero at infinity by assumption, this proves (6.1.1).  The converse of this result is true: if condition (6.1.1) holds, then the tail process vanishes at infinity. It is also easily seen that if (6.1.1) holds for a pair of sequences (rn , cn ) then it holds for the pair (rn , cn u) for all u ≥ 1. However, this will not be enough and we need that (6.1.1) holds for all pairs (rn , cn u) for all u > 0. Therefore we introduce the following definition. Definition 6.1.2 Let {X j , j ∈ Z} be a stationary sequence. Let {cn } be a scaling sequence and let {rn } be a non-decreasing sequence of integers such that limn→∞ rn = ∞. We say that Condition AC(rn , cn ) holds if for every x, y ∈ (0, ∞),   max |X j | > cn x | |X 0 | > cn y = 0 . (AC(rn , cn )) lim lim sup P m→∞ n→∞

m≤|j|≤rn

6.1 What is a cluster?

151

If |X 0 | is regularly varying, then Condition AC(rn , cn ) holds for i.i.d. random vectors {X j , j∈Z} if the sequences {rn , cn , n ∈ N} satisfy limn→∞ rn P(|X 0 | > cn ) = 0. This condition also holds for an m-dependent sequence, i.e. a sequence {X j , j ∈ Z} such that the σ-fields σ(X u , u ≤ t) and σ(X v , v ≥ t + m) are independent for all t ∈ Z. Lemma 6.1.3 Let {X j , j ∈ Z} be an m-dependent stationary sequence such that |X 0 | is regularly varying. Let {cn } be a scaling sequence. Then AC(rn , cn ) holds for every sequence {rn } such that limn→∞ rn P(|X 0 | > cn ) = 0. Proof. By m-dependence, stationarity and regular variation of |X 0 |, if k > m, we have for x > 0,   P max |X j | > cn x | |X 0 | > cn y k≤|j|≤rn   =P max |X j | > cn x k≤|j|≤rn

≤ rn P(|X 0 | > cn x) ∼ x−α rn P(|X 0 | > cn ) → 0 . We now state the main consequence of Condition AC(rn , cn ). Theorem 6.1.4 Let {X j , j ∈ Z} be a stationary regularly varying sequence with tail process {Y j , j ∈ Z}. Let {cn } be a scaling sequence and let {rn } be a non-decreasing sequence of integers such that limn→∞ rn = ∞. If Condition AC(rn , cn ) holds, then   (6.1.2) P lim |Y j | = 0 = 1 . |j|→∞

Moreover, for all x > 0, the distribution of (xcn )−1 X −rn ,rn conditionally on |X 0 | > cn x converges weakly in 0 to the distribution of the tail process Y .

Proof. Fix integers m > k > 0. Then, for  > 0,     max |X j | > cn  | |X 0 | > cn P max |Y j | >  = lim P n→∞ k≤|j|≤m k≤|j|≤m   ≤ lim sup P max |X j | > cn  | |X 0 | > cn . n→∞

By monotone convergence,

k≤j≤rn



152

6 Convergence of clusters

 P

 sup |Y j | > 

|j|>k

This yields by AC(rn , cn ),  lim P

k→∞

 = sup P m>k

 max |Y j | >  .

k≤|j|≤m



sup |Y j | > 

|j|>k



 max |Y j | >  = lim sup P k→∞ m>k k≤|j|≤m   ≤ lim lim sup P max |X j | > cn  | |X 0 | > cn = 0 . k→∞ n→∞

k≤|j|≤rn

By monotone convergence again, this yields P(lim sup|j|→∞ |Y j | > ) = 0 for all  > 0 and the proof is thus concluded. We now prove the weak convergence of the conditional distributions. By the Portmanteau Theorem A.2.3, we only need to prove that lim E[H(c−1 n X −rn ,rn ) | |X 0 | > cn ] = E[H(Y )] ,

(6.1.3)

n→∞

for every bounded function H : (Rd )Z → R, Lipschitz continuous with respect to the uniform norm, that is, such that |H(x) − H(y)| ≤ cst · ( x − y ∞ ∧ 1). Since H is continuous with respect to the uniform norm, by regular variation, we have lim E[H(c−1 n X −m,m ) | |X 0 | > cn ] = E[H(Y −m,m )] .

n→∞

Since H is Lipschitz continuous and bounded and since the tail process tends to zero under condition AC(rn , cn ), it also holds that lim E[H(Y −m,m )] = E[H(Y )] .

m→∞

To conclude, we only need to apply the triangular argument Lemma A.1.4, that is, to prove that   −1  lim lim sup E[H(c−1 n X −rn ,rn ) − H(cn X −m,m ) | |X 0 | > cn ] = 0 . m→∞ n→∞

Fix  > 0. Since H is Lipschitz continuous, applying condition AC(rn , cn ) yields   −1  lim lim sup E[H(c−1 n X −rn ,rn ) − H(cn X −m,m ) | |X 0 | > cn ] m→∞ n→∞







≤ cst  + lim lim sup P m→∞ n→∞

sup m≤|j|≤rn

|X j | > cn | |X 0 | > cn = cst ×  .

Since  is arbitrary, this concludes the proof.



6.2 Vague# convergence of clusters

153

Remark 6.1.5 Lemma 5.4.3 and Theorem 6.1.4 show that AC(rn , cn ) implies that the tail measure ν is supported on 0 (Rd ). Remark 6.1.6 By the mapping theorem, the conditional weak convergence implies that the convergence (6.1.3) holds for all functions H which are almost surely continuous with respect to the distribution of the tail process Y . Since the tail process can be expressed as Y = |Y 0 | Θ, where Y 0 and Θ are independent and the distribution of |Y 0 | is continuous on R+ , functions whose sets of discontinuities are included in union of sets of the form {x∗ = c} or {|xj | = c} for countably many c ≥ 0 and j ∈ Z are almost surely continuous with respect to the distribution of Y . In particular, functions of the type 1{x∗ > 1}, j∈Z 1{|xj | > 1} are almost surely continuous with respect to the distribution of Y . We can now tentatively define a cluster of a regularly varying stationary time series. Definition 6.1.7 (Cluster) For a given scaling sequence {cn }, a cluster at the level cn is a block of consecutive observations of size rn where {rn } is a scaling sequence such that condition AC(rn , cn ) holds.

6.2 Vague# convergence of clusters We now investigate the unconditional convergence of c−1 n X 1,rn . Note first that because of stationarity, the set of indices {1, . . . , rn } is arbitrary. Contrary to the conditional convergence considered in Theorem 6.1.4, where an extreme value was imposed at time 0, a large value in the cluster can happen at any time. Define the measures ν ∗n,rn , n ≥ 1 on 0 as follows. ν ∗n,rn =



1 . E δc−1 n X 1,rn rn P(|X 0 | > cn )

(6.2.1)

Recall we have defined the candidate extremal index in Section 5.4 by ϑ = E[E −1 (Y )] ,

(6.2.2)

and that P(Y ∈ 0 (Rd )) = 1 implies ϑ > 0 (cf. (5.4.5)) and that we introduced in Definition 5.4.11 the cluster measure ν ∗ on 0 by

154

6 Convergence of clusters

ν∗ = ϑ





E[δrQ ]αr−α−1 dr .

(6.2.3)

0

That is, for a bounded or non-negative measurable function H on 0 , E[H(c−1 n X 1,rn )] , rn P(|X 0 | > cn ) ∞ ν ∗ (H) = ϑ E[H(rQ)]αr−α−1 dr .

ν ∗n,rn (H) =

0

We will call ν ∗ (H) the cluster index associated with the functional H. We stress the dependence on rn in the notation but not on the scaling sequence {cn }. We are interested in the convergence of ν ∗n,rn to ν ∗ . However, the convergence of ν ∗n,rn (H) to ν ∗ (H) may hold only for shift-invariant functionals. Therefore, we must introduce the space ˜0 of shift-equivalent sequences. Definition 6.2.1 The space ˜0 is the space of equivalence classes of 0 endowed with the equivalence relation ∼ defined by x ∼ y ⇐⇒ ∃j ∈ Z , B j x = y .

Lemma 6.2.2 The space ˜0 endowed with the metric d˜ defined on ˜0 by ˜ x, y ˜) = d(˜

inf

x∈˜ x ,y ∈˜ y

˜, y ˜ ∈ ˜0 , |x − y|∞ , x

is a complete separable metric space. Proof. See Problem 6.6.



A shift-invariant function H on 0 can be seen as a function on ˜0 and a measure on 0 can be seen as a measure on ˜0 by applying it only to shiftinvariant functions. Recall that for a shift-invariant measurable map H, we have, for ∞ E[H(r(Y ∗ )−1 Y ) | A(Y ) = 0]αr−α−1 dr . ν ∗ (H) = ϑ 0

Cf. (5.5.8). Thus, if we identify the conditional spectral tail process Q to its equivalence class in ˜0 (Rd ), then its distribution is that of (Y ∗ )−1 Y , conditionally on an arbitrary anchoring map being equal to zero. But let it be stressed again that this equality does not hold in 0 (Rd ). See also Problem 5.30.

6.2 Vague# convergence of clusters

155

See We now describe the framework of vague# convergence on ˜0 \ {0}.

Appendix B.1 for more details. We write 0 for the equivalence class of the null sequence 0 and we say that a subset A of ˜0 is separated from zero if ˜ x, 0)

>  for all x ˜ ∈ A. The class of sets there exists  > 0 such that d(˜ separated from zero forms a boundedness denoted by B 0 . By definition of d˜ and since the null sequence is shift-invariant, this simply means that x∗ >  ˜ ∈ A. for any representant x of any x

if Definition 6.2.3 A measure ν is B 0 -boundedly finite on ˜0 \ {0} ν(B) < ∞ for all Borel sets B separated from zero. A sequence of bound converges B 0 -vaguely# to ν edly finite measures {νn , n ∈ N} on ˜0 \ {0} if limn→∞ νn (B) = ν(B) for all sets B separated from 0 and such that ν(∂B) = 0.

Our first result parallels Lemma 6.1.1. Lemma 6.2.4 Let X be a regularly varying stationary time series with values in Rd and tail process Y such that P(lim|j|→∞ |Y j | = 0) = 1. Then for each scaling sequence {cn }, there exists a scaling sequence of integers {rn } such v# that limn→∞ rn P(|X 0 | > cn ) = 0 and ν ∗ −→ ν ∗ in (˜0 \ {0}, B 0 ). n,rn

Proof. The proof is along the same lines as the proof of Lemma 6.1.1. Define ν ∗n,m (H) =

E[H(c−1 n X 1,m )] . mP(|X 0 | > cn )

We will prove that for each fixed m, ν ∗n,m converges B 0 -vaguely# when n → ∞ to a measure ν ∗m which in turn converges to ν ∗ when m → ∞. Metrizability of vague# convergence and Lemma A.1.3 will finish the proof. By definition of regular variation, P(|X 0 | > cn )−1 P(c−1 n X 1,m ∈ ·) converges vaguely# on Rdm \ {0} (endowed with the boundedness of sets separated from 0) to the exponent measure ν 1,m . These measures can be identified to B 0 -boundedly finite measures on ˜0 \ {0} and this yields the first part with ν ∗m = m−1 ν 1,m . Let H be a shift-invariant bounded continuous function on 0 , such that H(x) = 0 if x∗ ≤ . Then, by Remark 5.2.3, we have 1 m   1  ν ∗m (H) = −α E[H(Y 1−j,m−j )1 Y ∗s−j,−1 ≤ 1 ] = gm (s)ds , m j=1 0 with

  gm (t) = −α E H(Y 1−[mt],m−[mt] )1 Y ∗1−[mt],−1 ≤ 1 ,

156

6 Convergence of clusters

(where [x] denotes the smallest integer larger than or equal to the real number x). Since we have assumed that P(lim|j|→∞ |Y j | = 0), the continuity and boundedness of H (with respect to the uniform norm) imply that for every t ∈ (0, 1),   lim gm (t) = −α E[H(Y )1 Y ∗−∞,−1 ≤ 1 ] = ν ∗ (H) , m→∞

by (5.5.9) applied with the first exceedence functional as anchoring map. Since gm is uniformly bounded, we can apply the dominated convergence theorem to obtain the convergence of the integral of gm and this proves that v#

ν ∗m (H) → ν ∗ (H) and ν ∗m −→ ν ∗ . v#

Therefore there exists a sequence rn such that ν ∗n,rn −→ ν ∗ . For this sequence, we have for all t > 0, P(X ∗1,rn > cn t) = ϑt−α . n→∞ rn P(|X 0 | > cn ) lim

If lim supn→∞ rn P(|X 0 | > cn ) > 0, then there exists a positive constant c > 0 such that, along a subsequence, P(X ∗1,rn > cn t) → ct−α for every t > 0. This is a contradiction since the limit could be arbitrarily large and the sequence is a sequence of probabilities, hence is bounded.  The above result gives little information on the sequence rn . The next results shows that any sequence rn such that condition AC(rn , cn ) holds is appropriate. Theorem 6.2.5 Let condition AC(rn , cn ) hold. The sequence of measures ν ∗n,rn , n ≥ 1 converges B 0 -vaguely# on ˜0 \ {0} to ν ∗ .

Proof. We must prove that for all bounded continuous shift-invariant functions H with support separated from 0, E[H(c−1 n X 1,rn )] = ν ∗ (H) . n→∞ rn P(|X 0 | > cn )

lim ν ∗n,rn (H) = lim

n→∞

Since H has a support separated from zero, there exists  > 0 such that H(x) = 0 if x∗ ≤ . Then, applying the stationarity of X and the shiftinvariance of H, we obtain

6.2 Vague# convergence of clusters

ν ∗n,rn (H) =

157

1 rn P(|X 0 | > cn ) rn    ∗   × E H(c−1 n X 1,rn )1 X 1,j−1 ≤ cn  1{|X j | > cn } j=1

=

P(|X 0 | > cn ) 1 P(|X 0 | > cn ) rn rn    ∗   × E H(c−1 n X 1−j,rn −j )1 X 1−j,−1 ≤ cn  | |X 0 | > cn  j=1

=

P(|X 0 | > cn ) P(|X 0 | > cn )



1

gn (s)ds , 0

with

  ∗ gn (s) = E H(c−1 n X 1−[rn s],rn −[rn s] )1 X 1−[rn s],−1 ≤ cn  | |X 0 | > cn  .

  By Theorem 6.1.4, limn→∞ gn (s) = E[H(Y )1 Y ∗−∞,−1 ≤ 1 ] for each s ∈ (0, 1). Moreover, the sequence gn is uniformly bounded, thus by dominated convergence, regular variation of |X 0 | and (5.5.9) (applied with the first exceedence functional as anchoring map), we obtain   lim ν ∗n,rn (H) = −α E[H(Y )1 Y ∗−∞,−1 ≤ 1 ] = ν ∗ (H) . n→∞

 As a first important application of the vague convergence of ν ∗n,rn (under the conditions of Lemma 6.2.4 or Theorem 6.2.5), we obtain the behavior of the maximum in a cluster, which is related to the candidate extremal index ϑ = E[E −1 (Y )]. v#

Corollary 6.2.6 Assume that ν ∗n,rn −→ ν ∗ . Then, for all t > 0, P(X ∗1,rn > cn t) = ϑt−α . n→∞ rn P(|X 0 | > cn ) lim

(6.2.4)

Proof. Apply Theorem 6.2.5 to the shift-invariant and ν ∗ -almost surely con tinuous map H(x) = 1{x∗ > t}. Before giving other applications, we show that condition AC(rn , cn ) holds for sequences which admit a tail equivalent m-dependent approximation.

158

6 Convergence of clusters

Lemma 6.2.7 Let X be a stationary regularly varying time series with tail process Y such that P(Y ∈ 0 (Rd )) = 1. Assume that there exists a sequence (m) of m-dependent sequences X (m) such that {(X j , X j ), j ∈ Z} is stationary and    (m)  P(X 0 − X 0  > x) =0. (6.2.5) lim lim sup m→∞ x→∞ P(|X 0 | > x) Let ν ∗ and ν ∗(m) be the cluster measures of X and X (m) , respectively. Then v#

v#

ν ∗(m) −→ ν ∗ (as m → ∞) and ν ∗n,rn −→ ν ∗ (as n → ∞) for all scaling sequences cn and rn such that lim rn P(|X 0 | > cn ) = 0 .

n→∞

(6.2.6)

Proof. First, we know by Proposition 5.2.5 that (6.2.5) implies that for every u > 0,    (m)  P(X 0  > xu) = u−α . lim lim sup (6.2.7) m→∞ x→∞ P(|X 0 | > x) • Let cn be a scaling sequence and let rn be a non-decreasing sequence of integers such that (6.2.6) holds. In analogy to

E δc−1 X 1,rn n , ν ∗n,rn = rn P(|X 0 | > cn ) we define ν ∗(m) n,rn



E δc−1 X (m)  n 1,rn = .  (m)  rn P(X 0  > cn )

Since X (m) is m-dependent, it satisfies condition AC(rn , cn ), thus we know ∗(m) v #

by Theorem 6.2.5 that for each m, ν n,rn −→ ν ∗(m) . • Since we have assumed that P(Y ∈ 0 (Rd )) = 1, we know by Lemma 6.2.4 that given the sequence cn , there exists a sequence rn [n]0 such that (6.2.6) v#

holds and ν ∗n,r0 −→ ν ∗ . If we furthermore prove n

   ∗(m)  lim lim sup ν n,r0 (H) − ν ∗n,rn0 (H) = 0 ,

m→∞ n→∞

n

v#

then we will have obtained that ν ∗(m) −→ ν ∗ .

(6.2.8)

6.2 Vague# convergence of clusters

159

• We will actually prove more than (6.2.8). We will prove that for all sequences rn such that (6.2.6) holds (hence in particular for the sequence rn0 ),     ∗ (6.2.9) lim lim sup ν ∗(m) (H) − ν (H) =0, n,rn n,rn m→∞ n→∞

for all  > 0 and all shift-invariant bounded Lipschitz continuous maps on 0 \ {0} such that H(x) = 0 if x∗ ≤ 2. Write first −1 E[H(c−1 n X 1,rn ) − H(cn X 1,rn )]    (m)  rn P(X 0  > cn ) ⎞ ⎛ | > c ) P(|X 0  n − 1⎠ ν ∗(m) + ⎝  n,rn (H) . (m)  P(X 0  > cn ) (m)

∗ |ν ∗(m) n,rn (H) − ν n,rn (H)| =

(6.2.10)

Note that ∗(m) ∗ lim sup ν ∗(m) n,rn (H) ≤ cst lim sup ν n,rn ({x > }) n→∞

n→∞ ∗(m)

= cst ν

({x∗ > }) = cst ϑm −α ,

where ϑm ∈ (0, 1] is the candidate extremal index of X (m) . Thus, by (6.2.7) the last term in (6.2.10) will vanish by letting n, then m tend to ∞. Fix η < . By the Lipschitz and support properties of H, we have −1 E[H(c−1 n X 1,rn ) − H(cn X 1,rn )]    (m)  rn P(X 0  > cn ) (m)

   (m)  (m) P(max1≤j≤rn X j − X j  > cn η) P((X 1,rn )∗ > cn )     ≤ cst η + cst  (m)   (m)  rn P(X 0  > cn ) rn P(X 0  > cn )    (m)  (m) P(X 0 − X 0  > cn η) P((X 1,rn )∗ > cn )     + cst . ≤ cst η  (m)   (m)  rn P(X 0  > cn ) P(X 0  > cn )

Letting n, then m tends to zero, applying (6.2.5) and (6.2.7), we obtain −1 E[H(c−1 n X 1,rn ) − H(cn X 1,rn )]   lim lim sup ≤ cst η .  (m)  m→∞ n→∞ rn P(X 0  > cn ) (m)

Since η is arbitrary, we have proved (6.2.9) and consequently the convergence v#

ν ∗(m) −→ ν ∗ as m → ∞.

160

6 Convergence of clusters ∗(m)

v#

• We now know that ν n,rn −→ ν ∗(m) as n → ∞ for all sequences cn , rn v#

which satisfy (6.2.6) and that ν ∗(m) −→ ν ∗ as m → ∞. Since (6.2.9) holds v#

for all such sequences, this proves that ν ∗n,rn −→ ν ∗ as n → ∞.



We now apply Theorem 6.2.5 to further examples. Example 6.2.8 (Mean cluster size) Since ϑ > 0, if AC(rn , cn ) holds, Corollary 6.2.6 yields ⎤ ⎡ rn  1{|X j | > cn } | X ∗1,rn > cn ⎦ lim E ⎣ n→∞

j=1

= lim

E



rn j=1

1{|X j | > cn }

P(X ∗1,rn > cn )

n→∞

= lim

n→∞

rn P(|X 0 | > cn ) = ϑ−1 . P(X ∗1,rn > cn )

Therefore, ϑ can be interpreted as the inverse limiting mean cluster size.  Example 6.2.9 (Cluster size distribution) For m ∈ N consider the functionals ⎫ ⎧ ⎬ ⎨ 1{|xj | > 1} = m , H(x) = 1 ⎭ ⎩ j∈Z

which are shift-invariant, continuous, bounded, and vanish if x∗ ≤ 1. If Condition AC(rn , cn ) holds, Theorem 6.2.5 yields ⎞ ⎛ rn  ∗ lim P ⎝ 1{|X j | > cn } = m | X 1,rn > cn ⎠ n→∞

j=1

# $ = P E(Y ) = m | Y ∗−∞,−1 ≤ 1 % & 1{E(Y ) = m} = ϑ−1 E . E(Y ) That is, the asymptotic distribution of the number of exceedences over the large threshold cn is the number of indices j such that |Y j | is larger than one, conditionally on |Y 0 | being the first exceedence. Let π(m) be the probability in the right-hand side of the last display. Considering the functional E(Y ) = j∈Z 1{|Y j | > 1} and applying (5.5.9), we have ∞  m=1

mπ(m) =

∞ 

mP(E(Y ) = m | Y ∗−∞,−1 ≤ 1) = E[E(Y ) | Y ∗−∞,−1 ≤ 1]

m=1

= ϑ−1 ν ∗ (E) = ϑ−1 ν({|y 0 | > 1}) = ϑ−1 . This relation means that the limit of the mean cluster size is the mean of the limiting cluster size distribution. 

6.2 Vague# convergence of clusters

161

Example6.2.10 (Stop-loss index) Let S > 0 and consider the functional  H(x) = 1 and define the stop-loss index: j∈Z (xj − 1)+ > S ( ' rn P (X − c ) > Sc j n + n j=1 θstoploss (S) = lim n→∞ rn P(X0 > cn ) ⎞ ⎛ ∞  = P ⎝ (Yj − 1)+ > S, Y ∗−∞,−1 ≤ 1⎠ j=0

⎤ ⎡  ∞ (Y − 1) > S 1 j + j=0 ⎦ . = E⎣ E(Y )  Theorem 6.2.5 entails convergence of ν ∗n,rn (H) for bounded continuous shiftinvariant functionals H which vanish in a neighborhood of 0. We will also be interested in functionals which do not satisfy these assumptions or which are not even defined on the whole space 0 . For instance, functionals of interest include ⎧ ⎫ ⎧ ⎫ ⎨ ⎬ ⎨ ⎬  1 xj > 1 , 1 sup xj > 1 . ⎩ ⎭ ⎩ i∈Z ⎭ j≤i

j∈Z

As usual, extension of the convergence of ν ∗n,rn (H) will be obtained under appropriate uniform integrability or moment conditions. The first such extension is concerned with certain unbounded functionals. Proposition 6.2.11 Let AC(rn , cn ) hold. Let H be a shift-invariant continuous functional on (Rd )Z and η > 0 be such that |H(x)| ≤ cst j∈Z 1{|xj | > η}. Then limn→∞ ν ∗n,rn (H) = ν ∗ (H).

Proof. For M ≥ 1, define HM = H ∧ M . Then ν ∗n,rn (HM ) → ν ∗ (HM ) by Theorem 6.2.5. To conclude, we must prove that lim lim sup ν ∗n,rn (H1{H > M }) = 0 .

M →∞ n→∞

By assumption on H, we have rn ν ∗n,rn (H1{H

> M }) ≤ cst

j=1

P(|X j | > cn η,

rn i=1

1{|X i | > cn } > M )

rn P(|X 0 | > cn ) rn P(|X 0 | > cn η, i=−rn 1{|X i | > cn } > M ) . ≤ cst P(|X 0 | > cn )

162

6 Convergence of clusters

Applying Theorem 6.1.4, we obtain lim lim sup ν ∗n,rn (H1{H > M })

M →∞ n→∞



≤ cst lim P ⎝ M →∞



⎞ 1{|Y j | > η} > M ⎠ = 0 ,

j∈Z

since P(lim|j|→∞ |Y j | = 0) = 1 implies that P for all η > 0.

'

( 1{|Y | > η} < ∞ =1 j j∈Z 

If α ≤ 1 and Condition AC(rn , cn ) holds, then we already know by Theorem 5.4.10 that ⎞α ⎤ ⎡⎛   |Qj |⎠ ⎦ ≤ E[|Qj |α ] = ϑ−1 < ∞ . (6.2.11) E ⎣⎝ j∈Z

j∈Z

Thus P(Q ∈ 1 ) = 1. If α > 1, summability of Q does not necessarily hold and if needed must be assumed or proved under an additional assumption.

Definition 6.2.12 Let {X j , j ∈ Z} be a stationary sequence. Let {cn } be a scaling sequence and let {rn } be a non-decreasing sequence of integers such that limn→∞ rn = ∞. We say that Condition ANSJB(rn , cn ) holds if for all η > 0, r n P( j=1 |X j |1{|X j | ≤ cn } > ηcn ) = 0 . (ANSJB(rn , cn )) lim lim sup →0 n→∞ rn P(|X 0 | > cn )

The abbreviation ANSJB stands for Asymptotic Negligibility of Small Jumps. Remark 6.2.13 If α ∈ (0, 1), then ANSJB(rn , cn ) holds. Indeed, by Markov’s inequality and Proposition 1.4.6, rn P( j=1 |X j |1{|X j | ≤ cn } > ηcn ) E[|X 0 |1{|X 0 | ≤ cn }] ≤ rn P(|X 0 | > cn ) ηcn P(|X 0 | > cn ) P(|X 0 | > cn ) . ≤ cst P(|X 0 | > cn ) This yields lim lim sup

→0 n→∞

r n P( j=1 |X j |1{|X j | ≤ cn } > ηcn ) rn P(|X 0 | > cn )

≤ cst lim 1−α = 0 . →0

6.2 Vague# convergence of clusters

163

If α ≥ 1, then Condition ANSJB(rn , cn ) holds for i.i.d. or m-dependent sequences if rn = o(cn ). See Problem 6.4. For general time series, it is not easy to check. ⊕ Proposition 6.2.14 If AC(rn , cn ) and ANSJB(rn , cn ) hold, then ⎡⎛ ⎞α ⎤  E ⎣⎝ (6.2.12) |Qj |⎠ ⎦ < ∞ j∈Z

and

⎡⎛ ⎞α ⎤ rn  P( i=1 |X j | > cn ) = E ⎣⎝ |Qj |⎠ ⎦ . lim n→∞ rn P(|X 0 | > cn )

(6.2.13)

j∈Z

Remark 6.2.15 If AC(rn , cn ) and (6.2.12) hold, then (6.2.13) and ⊕ ANSJB(rn , cn ) are actually equivalent. Proof. By Theorem 6.2.5, we have r n P( j=1 |X j | 1{|X j | > cn } > cn ) lim n→∞ rn P(|X 0 | > cn ) ⎞ ⎛ ∞       Qj  1 r Qj  >  > 1⎠ αr−α−1 dr . (6.2.14) P ⎝r =ϑ 0

j∈Z

By ' monotone convergence, the right-hand side converges as  → 0 to (α ϑE . We must now prove that this quantity is finite and the j∈Z |Qj | convergence (6.2.13) when ANSJB(rn , cn ) holds. Consider the function ⎛ ⎞ ∞    P ⎝r |Qj |1 r|Qj | > ζ > 1⎠ αr−α−1 dr . g(ζ) = ϑ 0

j∈Z

' (α . To It increases when ζ decreases to zero and its limit is ϑE |Q | j j∈Z prove that this quantity is finite, it suffices to prove that the function g is bounded. Fix  > 0 and η ∈ (0, 1). By ANSJB(rn , cn ), there exists ζ such that r n P( j=1 |X j | 1{|X j | ≤ ζcn } > ηcn ) ≤. lim sup rn P(|X 0 | > cn ) n→∞ Fix ζ  < ζ. Starting from (6.2.14) and applying ANSJB(rn , cn ), we obtain

164

6 Convergence of clusters

0 ≤ g(ζ  ) = ϑ





⎛ P ⎝r

0



⎞   |Qj |1 r|Qj | > ζ  > 1⎠ αr−α−1 dr

j∈Z

r n P( j=1 |X j |1{|X j | > cn ζ  } > cn )

= lim

n→∞

rn j=1

P

= lim

n→∞

≤ lim sup

+ lim ∞

≤+ϑ

(

rn P(|X 0 | > cn ) r n P( j=1 |X j | 1{|X i | > cn ζ} > (1 − η)cn )

n→∞



|X j |1{|X j | > cn ζ} + |X j |1{cn ζ ≥ |X j | > cn ζ  } > cn

rn P(|X 0 | > cn ) r n P( j=1 |X j | 1{|X j | ≤ cn ζ} > ηcn )

n→∞

0

≤  + ϑζ

rn P(|X 0 | > cn )

'

rn P(|X 0 | > cn )

⎞       Qj  1 z Qj  > ζ > 1 − η ⎠ αz −α−1 dz P ⎝z ⎛

j∈Z

−α

.

The latter bound holds since the probability inside the integral is zero if z ≤ ζ since |Qj | ≤ 1 for all j. This proves that the function g is bounded in a neighborhood of zero as claimed. By Condition ANSJB(rn , cn ), we finally obtain ( ' rn P |X | > c j n j=1 lim n→∞ rn P(|X 0 | > cn ) ( ' rn P j=1 |X j |1{|X j | > cn } > cn = lim lim →0 n→∞ rn P(|X 0 | > cn ) ⎛ ⎞ ∞    P ⎝r |Qj |1 r|Qj | >  > ⎠ αr−α−1 dr = lim ϑ →0

0

⎡⎛ = ϑE ⎣⎝



j∈Z

⎞α ⎤ |Qj |⎠ ⎦ .

j∈Z

This proves (6.2.13).



We can now consider functionals H of the form 1{K > 1} where K is defined on 1 and satisfies certain regularity condition. Theorem 6.2.16 Let K be a shift-invariant functional defined on 1 such that K(0) = 0 and which is Lipschitz continuous with constant LK , i.e. for all x, y ∈ 1 (Rd ),

6.2 Vague# convergence of clusters

|K(x) − K(y)| ≤ LK

  xj − y j  .

165

(6.2.15)

j∈Z

Assume that AC(rn , cn ) and ANSJB(rn , cn ) hold. Then ∞ P(K(X 1,rn /cn ) > 1) =ϑ P(K(rQ) > 1)αr−α−1 dr < ∞ . lim n→∞ rn P(|X 0 | > cn ) 0 (6.2.16) If K is 1-homogeneous, then the right-hand side of (6.2.16) is equal to α (Q)]. ϑE[K+

Proof. For  > 0, we define the truncation operator T by T (x) = {xj 1{|x j |>} , j ∈ Z} .

(6.2.17)

The operator T is continuous with respect to the uniform norm at every x ∈ 0 such that |xj | =  for all j ∈ Z. Fix η ∈ (0, 1) and ζ > 0. Let LK be as in (6.2.15) and choose  > 0 such that r n P( j=1 |X j | 1{|X j | ≤ cn } > ηcn /LK ) ≤ζ. lim sup rn P(|X 0 | > cn ) n→∞ Set K = K ◦ T . Applying assumption (6.2.15), we obtain P(K(X 1,rn /cn ) > 1) rn P(|X 0 | > cn ) P(K (X 1,rn /cn ) > 1 − η) P(|K(X 1,rn /cn ) − K (X 1,rn /cn )| > η) + ≤ rn P(|X 0 | > cn ) rn P(|X 0 | > cn ) r n P(K (X 1,rn /cn ) > 1 − η) P( i=1 |X j | 1{|X j | ≤ cn } > ηcn /cst) + . ≤ rn P(|X 0 | > cn ) rn P(|X 0 | > cn ) Applying Theorem 6.2.5 to K , this yields lim sup n→∞

P(K(X 1,rn /cn ) > 1) P(K (X 1,rn /cn ) > 1 − η) ≤ lim sup +ζ rn P(|X 0 | > cn ) rn P(|X 0 | > cn ) n→∞ ∞ =ϑ P(K (rQ) > 1 − η)αr−α−1 dr + ζ . 0

Similarly, P(K(X 1,rn /cn ) > 1) ≥ϑ lim inf n→∞ rn P(|X 0 | > cn )

0



P(K (rQ) > 1 + η)αr−α−1 dr − ζ .

166

6 Convergence of clusters

Since K(0) = 0, (6.2.15) implies that |K(x)| ≤ cst j∈Z |xj |, thus for all y > 0, ⎞ ⎛    r Qj  > y/cst⎠ P(K (rQ) > y) ≤ P ⎝ j∈Z

and the latter quantity is integrable (as a function of r) with respect to αr−α−1 dr in view of ANSJB(rn , cn ) and Proposition 6.2.14. By bounded convergence, this yields ∞ ∞ −α−1 lim P(K (rQ) > y)αr dr = P(K(rQ) > y)αr−α−1 dr . →0

0

0

Altogether, we obtain ∞ P(K(X 1,rn /cn ) > 1) P(K (rQ) > 1 + η)αr−α−1 dr − ζ ≤ lim inf n→∞ rn P(|X 0 | > cn ) 0 ∞ P(K(X 1,rn /cn ) > 1) ≤ϑ P(K (rQ) > 1 − η)αr−α−1 dr + ζ . ≤ lim sup rn P(|X 0 | > cn ) n→∞ 0 

Since ζ and η are arbitrary, this proves (6.2.16).

Combining Theorems 5.6.8 and 6.2.16 yields, for 1-homogeneous functionals K satisfying the assumptions of Theorem 6.2.16, P(K(X 1,rn ) > cn ) rn P(|X 0 | > cn ) α α α (Q)] = E[K+ (Θ0,∞ ) − K+ (Θ1,∞ )] . = ϑE[K+

ν ∗ (1{K > 1}) = lim

n→∞

(6.2.18)

Example 6.2.17 (Large deviations index) Let {Xj , j ∈ Z} be an univariate time series and assume that AC(rn , cn ) and (6.2.12) hold. The functionals ⎞ ⎛  K(x) = ⎝ xj ⎠ j∈Z

+

and H(x) = 1{K(x) > 1} yield the large deviations index: '  ( r n P > cn j=1 Xj + θlargedev = lim n→∞ rn P(|X0 | > cn ) ⎞α ⎤ ⎡⎛ ⎞α ⎛ ⎞α ⎤ ⎡⎛ ∞ ∞    = ϑE ⎣⎝ Qj ⎠ ⎦ = E ⎣⎝ Θj ⎠ − ⎝ Θj ⎠ ⎦ . (6.2.19) j∈Z

+

j=0

+

j=1

+



6.2 Vague# convergence of clusters

167

Example 6.2.18 (Ruin index) Another example of interest is obtained by taking ⎛ ⎞  xj ⎠ K(x) = sup ⎝ i∈Z

j≤i

+

with H(x) = 1{K(x) > 1} which yields the so-called ruin index. Assume AC(rn , cn ) and (6.2.12). Theorem 6.2.16 gives ⎡ ⎛ ⎞α ⎤ j  P(max1≤j≤rn i=1 Xi > cn ) = ϑE ⎣sup ⎝ θruin = lim Qj ⎠ ⎦ . n→∞ rn P(|X0 | > cn ) i∈Z j≤i

+

(6.2.20)  Remark 6.2.19 If {Xj } is a non-negative time series, then the expressions in the right-hand side of (6.2.19) and (6.2.20) are non-zero. Otherwise, if the time series is allowed to admit both positive and negative values, these limits can be zero. Consider the sequence defined by Xj = Zj − Zj−1 , where {Zj , j ∈ Z} is a sequence of i.i.d. regularly varying random variables. Then {Xj , j ∈ Z} is stationary and regularly varying and (6.2.19) vanishes (see Problem 5.6). It can also be checked directly that rn P( j=1 Xj > cn ) P(Zrn − Z0 > cn ) lim = lim =0, n→∞ rn P(|X0 | > cn ) n→∞ rn P(|X0 | > cn ) for any sequences {rn }, {cn } diverging to infinity. This example is somewhat degenerate but attention must be paid to this issue in order to avoid trivialities. ⊕ Finite-dimensional approximations of the cluster indices We give another interpretation of the cluster indices. ∞ Theorem 6.2.20 Assume that E[( j=0 |Θj |)α−1 ] < ∞. Let K be a 1homogeneous, shift-invariant functional defined on 1 such that K(0) = 0 and (6.2.15) holds. Then lim lim

k→∞ x→∞

P(K(X 1,k ) > x) α α = E[K+ (Θ0,∞ ) − K+ (Θ1,∞ )] . kP(|X 0 | > x)

168

6 Convergence of clusters

The above result does not require AC(rn , cn ) nor ANSJB(rn , cn ). However, if both conditions are satisfied, then by Proposition 5.6.5 and (6.2.12) of ∞ Proposition 6.2.14 we have E[( j=0 |Θj |)α−1 ] < ∞. Thus, applying both Theorem 6.2.20 and Theorem 6.2.16, we have (cf. (6.2.18)) lim lim

k→∞ x→∞

P(K(X 1,k ) > x) P(K(X 1,rn /cn ) > 1) = lim . n→∞ kP(|X 0 | > x) rn P(|X 0 | > cn )

Proof (Proof of Theorem 6.2.20). Set b0 = 0 and bk = limx→∞ P(K(X 1,k ) > x)/P(|X 0 | > x). Then, k−1 1 bk = {bi+1 − bi } . k k i=0

Thus by Cesaro’s theorem, it suffices to prove that α α lim {bk+1 − bk } = E[K+ (Θ0,∞ ) − K+ (Θ1,∞ )] .

(6.2.21)

k→∞

In order to prove (6.2.21), we express bk+1 − bk in terms of the tail measure. By definition of the tail measure and by stationarity, we have bk = ν 1,k ({K(y 1,k ) > 1}) = ν({K(y 1,k ) > 1}) , bk+1 = ν 0,k ({K(y 0,k ) > 1}) = ν({K(y 0,k ) > 1}) .     Note that 1 K(y 0,k ) > 1 = 1 K(y 1,k ) > 1 if y 0 = 0. Therefore, we can apply (5.2.8) and obtain #    $ 1 K(y 0,k ) > 1 − 1 K(y 1,k ) > 1 1{y 0 = 0}ν(dy) bk+1 − bk = (Rd )Z ∞

=

E [(1{rK(Θ0,k ) > 1} − 1{rK(Θ1,k ) > 1})] αr−α−1 dr

0 α α = E[K+ (Θ0,k ) − K+ (Θ1,k )] . α

Recall that by Corollary 5.3.5, E[|Θj | ] ≤ 1 thus the previous expectation is finite by assumption on K. Since P( j≥0 |Θj | < ∞) = 1 (by the moment ∞ assumption E[( j=0 |Θj |)α−1 ] < ∞) and K is continuous on 1 , we obtain α α α α (Θ0,k ) − K+ (Θ1,k ) = K+ (Θ0,∞ ) − K+ (Θ1,∞ ). If α ≤ 1, the that limk→∞ K+ α α assumption on K implies that |K+ (Θ0,k ) − K+ (Θ1,k )| is uniformly bounded, and therefore the required convergence holds. If α > 1, we have for each k α α |K+ (Θ0,k ) − K+ (Θ1,k )| ≤ cst

∞ 

|Θj |

α−1

.

j=0

Thus the required limit holds by bounded convergence.



6.3 Problems

169

Example 6.2.21 (Large deviation index) Consider the univariate case and the summation functional x → j∈Z xj . Then under the assumptions of Theorem 6.2.20, we have the convergence ⎡⎛ ⎞α ⎛ ⎞α ⎤ k ∞ ∞   P( j=1 Xj > x) = E ⎣⎝ lim lim Θj ⎠ − ⎝ Θj ⎠ ⎦ = θlargedev . k→∞ x→∞ kP(|X0 | > x) j=0 j=1 +

+

 Example 6.2.22 (Ruin'index) Consider again the univariate case and the ( . Under the assumptions of Theorem functional x → supi∈Z j≤i xj +

6.2.20, we have lim lim

k→∞ x→∞

P(max1≤i≤k (X1 + · · · + Xi ) > x) kP(|X0 | > x) ⎛ ⎞α ⎛ ⎞α ⎤ ⎡ i i   Θj ⎠ − sup ⎝ Θj ⎠ ⎦ = θruin . = E ⎣sup ⎝ i≥0

j=0

i≥1

+

j=1

+



6.3 Problems 6.1 Consider the MA(1) process of Example 5.2.10 and Problem 5.6. Calculate the mean cluster size, the cluster size distribution (cf. Example 6.2.9), the stop-loss index (cf. Example 6.2.10), the large deviations index (cf. Example 6.2.17), and the ruin index (cf. Example 6.2.18). 6.2 Consider the AR(1) process of Example 5.2.12 and Problem 5.7 with ρ ∈ [0, 1). Calculate the mean cluster size, the cluster size distribution, the stop-loss index, the large deviations index, and the ruin index. 6.3 Let {cn } be a scaling sequence and {rn } be an intermediate sequence such that AC(rn , cn ) holds. Prove that $ # ϑ = lim P X ∗1,rn ≤ cn | |X 0 | > cn . n→∞

Hint: apply Theorem 6.1.4 6.4 Assume that {Xj , j ∈ Z} is a regularly varying sequence of i.i.d. random variables with tail index α > 1. Prove that rn P( j=1 |Xj |1{|Xj | ≤ cn } > ηcn ) =0, lim lim sup →0 n→∞ rn P(|X0 | > cn )

170

6 Convergence of clusters

for all scaling sequences rn , cn such that rn = o(cn ). Extend it to mdependent sequences. Hint: Use the assumption rn = o(cn ) to center the sum, then use Burkholder’s inequality (E.1.6) if α < 2 and Rosenthal’s inequality (E.1.4) if α ≥ 2. For m-dependent case, split the sum into rn /m sums of terms whose indices are separated by m. 6.5 Let (X, d) be a metric space. Define the distance between x ∈ X and subset B ⊂ X by d(x, B) = inf{d(x, y) : y ∈ B}. Let ∼ be an equivalence ˜ be the induced quotient space. Define a function relation on X and let X ˜ ˜ ˜ d : X × X → [0, ∞) by ˜ x, y˜) = inf{d(x , y  ) : x ∈ x ˜, y  ∈ y˜} , d(˜ ˜ Let (X, d) be a complete separable metric space. Assume for all x ˜, y˜ ∈ X. ˜ and all x, x ∈ x that for all x ˜, y˜ ∈ X ˜ we have d(x, y˜) = d(x , y˜) ,

(6.3.1)

where d(x, y˜) = inf{d(x, y), y ∈ y˜}. 1. Prove that d˜ is a pseudo-metric. Hint: use (6.3.1) to prove the triangle inequality. ˜ is a separable and complete pseudo-metric space. ˜ d) 2. Prove that (X, ˜ 6.6 Consider the space ˜0 endowed with the metric d. ˜ is complete and separable. Hint: 1. Apply Problem 6.5 to prove that (˜0 , d) show that (6.3.1) holds. ˜ x, y ˜ ) = 0 =⇒ x ˜=y ˜. 2. Prove that d˜ is a proper metric, i.e. d(˜ 6.7 Let π(m) be the cluster size distribution, as in Example 6.2.9. Show that ∞  m=1

m2 π(m) = ϑ−1



P(|Y j | > 1) .

j∈Z

6.8 Assume that {Xj , j ∈ Z} is a univariate stationary regularly varying time series. Let cn be such that limn→∞ nP(|X0 | > cn ) = 1 and AC(rn , cn ) holds. –Prove that AC(rn − h, cn ) holds for {(Xj , Xj+1 , . . . , Xj+h ), j ∈ Z}. –Prove that AC(rn − h, c2n ) holds for {(Xj2 , Xj Xj+1 , . . . , Xj Xj+h ), j ∈ Z}.

6.4 Bibliographical notes

171

6.4 Bibliographical notes This chapter is essentially based on [BPS18] and [PS18]. The main ideas on convergence of cluster and cluster indices (or functionals) stem from [BS09] where earlier references on cluster functionals can be found, most notably [Yun00] and [Seg03]. Corollary 6.2.6 is taken from Proposition 4.2 of [BS09], whose proof is the model for the proof of Theorem 6.2.5. Section 6.2 extends results of [MW14, MW16].

7 Point process convergence

In this chapter we will introduce one important tool in the study of the extremes of regularly varying time series, namely the point process of exceedences. The point process of exceedences records the location and value of the observations over a high threshold. Its convergence to a point process is the main object of this chapter. This convergence in turn will be the starting point of the limit theorems for the partial maxima and partial sum processes to be obtained in Chapter 8. In Section 7.1, we give without proof the main elements of the theory of random measures and point processes on a localized Polish space. The Laplace functional is the main tool to characterize the distribution of a random measure and the weak convergence of a sequence of random measures. The random measures which appear in the limit will be marked Poisson point processes which we define in Section 7.1.2 and characterize through their Laplace functionals. In Section 7.2, we first apply these techniques to the point process of exceedences of i.i.d. data. The limit is a Poisson point process on R\{0} with mean measure να,p , that is a simple point process. For a time series with temporal and extremal dependence, extremes happen in clusters, hence it is natural to consider the point process of clusters, which is a point process on the space of sequences. Under suitable conditions, the limit is a Poisson point process on the space of sequences. This will be shown in Section 7.3 under necessary and sufficient conditions on the Laplace functional. Necessary and sufficient conditions are usually unpractical and we give more convenient conditions in Section 7.4. These conditions are tail equivalent m-dependent approximations already used in Chapter 5 and β-mixing. The β-mixing condition is very stringent, but holds for many common time series. We conclude this chapter by introducing the true extremal index which controls, when positive, the behavior of the partial maxima of the time series. We will show that under the conditions which ensure the convergence of the © Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 7

173

174

7 Point process convergence

point process of clusters (or simply of exceedences), it is equal to the candidate extremal index introduced in Chapter 5. This will be the first illustration of the usefulness of the tail process.

7.1 Random measures and point processes Let (E, B) be a localized Polish space endowed with its Borel σ-field E (cf. Appendix B.1.1). Let M(E) (or simply M when there is no risk of confusion) denote the set of B-boundedly finite non-negative Borel measures μ on (E, B), i.e. such that μ(A) < ∞ for all Borel sets A ∈ B. A B-boundedly finite point measure on E is a measure which takes finite integer values on the bounded Borel sets of E. The set of B-boundedly finite point measures on E will be denoted by N (E) (or simply N ). Since E can be expressed as the union of a countable collection of bounded sets (by definition of a localized boundedness, see Definition B.1.1), a point measure ν ∈ N can be expressed as ν=

∞ 

δxj ,

(7.1.1)

j=1

where the elements xj ∈ E are called the points of the measure ν. If ν ∈ N has the form (7.1.1), we write, for f : E → R+ , ν(f ) =

∞ 

f (xj ) .

j=1

Vague# convergence in Rd \{0} was defined in Chapter 2 and in (Rd )Z \{0} in Chapter 5. We now state the definition of vague# convergence in the present framework. See Appendix B.1 for more details. Definition 7.1.1 (Vague# convergence of measures) A sequence {νn } of elements of M is said to converge B-vaguely# to ν, denoted v#

by νn −→ ν, if limn→∞ νn (A) = ν(A) for all bounded continuity sets A of ν.

By the Portmanteau Theorem B.1.17, an equivalent definition is that if limn→∞ νn (f ) = ν(f ) for all Lipschitz continuous functions f with bounded support in E. An important property of vague# convergence of point measures is the convergence of points.

7.1 Random measures and point processes

175

Proposition 7.1.2 (Convergence of points) Let {νn } be a sequence v#

of B-boundedly finite point measures on E such that νn −→ ν0 . Let A be a bounded Borel set of E such that ν0 (∂A) = 0. Denote p = ν0 (A) and let x1 , . . . , xp be the points of ν0 in A. Then, for large enough n, νn (A) = p and the points xn,1 , . . . , xn,p of νn in A can be ordered in such a way that xn,i → xi in E.

Proof. Since A is a bounded continuity set of ν, it holds that ν0 (A) < ∞ and by definition of vague# convergence, limn→∞ νn (A) = ν0 (A). Since {νn (A)} is a sequence of integers, its convergence means that there exists n0 such that νn (A) = ν0 (A) for all n ≥ n0 . Let (x1 , J1 ), . . . , (xk , Jk ) be the points of ν with their multiplicities. This implies that J1 + · · · + Jk = ν0 (A). Since E is a metric space, points are separated and therefore we can find positive radii r1 , . . . , rk such that B(xi , ri ) contains no point of ν other than xi . It suffices then to prove the result for A = B(x1 , r1 ), say, and we omit henceforth the subscript 1. Let J be the multiplicity of x as a point of ν0 . For large n, we have νn (A) = J. Let the points of νn in A be numbered arbitrarily by xn,1 , . . . , xn,J . For every  > 0, ν0 (B(x, )) = J. Thus for large enough n, νn (B(x, )) = J which implies that {xn,1 , . . . , xn,j } ⊂ ν0 (B(x, )). This  proves that limn→∞ xn,i = x for i = 1, . . . , J. As a consequence, we obtain the continuity of the summation of the points within a fixed bounded set, when E is a topological vector space, that is a vector space endowed with a topology with respect to which addition is continuous. For instance Rd with the usual topology is a topological vector space. Let A be a bounded set. The summation functional on A, denoted SA : N → E is defined by  SA (μ) =

xμ(dx) = A

∞ 

xi 1A (xi ) ,

i=1

where the sum is actually finite. Corollary 7.1.3 If E is a topological vector space and A a bounded set of E, then SA : N → E is continuous with respect to the B-vague# topology.

v#

Proof. We must prove that if νn −→ ν, then SA (νn ) → SA (ν). Set k = ν(A) and let x1 , . . . , xk be the points of ν in A. By the convergence of points, for

176

7 Point process convergence

n large enough, νn (A) = k and we can number the points of νn in A by xn,1 , . . . , xn,k with xn,i → xi , i = 1, . . . , k. This yields SA (νn ) =

k  i=1

xn,i →

k 

xi = SA (ν) .

i=1



7.1.1 Random measures The topology of B-vague# convergence is the smallest topology which makes the maps μ → μ(f ) continuous for all bounded continuous functions with bounded support and we denote by B(M) the corresponding Borel sigmafield. Vague# convergence in M is metrizable in such a way that M is a complete separable metric space (CSMS). See Theorem B.1.20. Definition 7.1.4 (Random measure, point process) Let (E, B) be a localized Polish space and (Ω, F, P) be a probability space. A random measure ξ on E is a measurable map from (Ω, F, P) into (M, B(M)). • The application A → E[ξ(A)] is a measure called the mean measure of ξ. A Borel set A is said to be a continuity set of ξ if it is a continuity set of the mean measure of ξ. • The Laplace functional of ξ is the map f → E[e−ξ(f ) ] defined on the set of non-negative measurable functions. • A point process is a random measure with values in N . A point process N is said to be simple if P(N ({x}) ≤ 1) = 1 for all x ∈ E.

We now provide a characterization of the distribution of a random measure. We say that a metric d is compatible with the boundedness B if it induces the topology of E and every set B ∈ B admits a bounded d-enlargement, i.e. there exists  > 0 such that {x ∈ E, d(x, B) < } ∈ B. Theorem 7.1.5 Let (E, B) be a localized Polish space and d be a compatible metric. Let ξ and η be two random measures on E such that one of the following conditions hold: d

(i) (ξ(A1 ), . . . , ξ(Ak )) = (η(A1 ), . . . , η(Ak )) for all k ≥ 1 and all Borel sets A1 , . . . , Ak

7.1 Random measures and point processes

177

d

(ii) ξ(f ) = η(f ) for all bounded Lipschitz continuous functions f with bounded support. (iii) E[e−ξ(f ) ] = E[e−ξ(f ) ] for all bounded d-Lipschitz continuous functions f with bounded support. Then ξ and η have the same distribution.

7.1.2 Poisson point processes Recall that a Poisson random variable N with mean θ is an integer valued random variable such that P(N = n) = e−θ θn /n!. Its moment generating function is given by E[z N ] = eθ(z−1) , z ≥ 0 . The point processes we will be mostly interested in are Poisson point processes. Definition 7.1.6 (Poisson point process) Let (E, B) be a localized Polish space and ν ∈ M. A point process N on E is called a Poisson point process with mean measure ν, abbreviated PPP(ν), if (i) N (A) is a Poisson random variable with mean ν(A) for all bounded Borel sets A; (ii) for all k ∈ N∗ , and all collections A1 , . . . , Ak of disjoint bounded Borel sets, N (A1 ), . . . , N (Ak ) are independent random variables.

The Laplace functional of a Poisson point process can be explicitly computed. Proposition 7.1.7 (Laplace functional of a Poisson point process) Let ν be a boundedly finite measure on a localized Polish space (E, B). A point process N is a Poisson point process on E with mean measure ν if and only if for all bounded continuous functions f with bounded support in E, we have E[e−N (f ) ] = eν(e

−f

−1)

.

178

7 Point process convergence

Proof. For any bounded Borel set A, N (A) is a Poisson random variable with mean ν(A). Thus E[e−tN (A) ] = eν(A)(e

−t

−1)

.

k If f is a step function, i.e. f = i=1 ai 1Ai , where ai are positive real numbers and Ai are pairwise disjoint bounded Borel sets, then E[e−N (f ) ] =

k 

eν(Ai )(e

−ai

−1)

= eν(e

−f

−1)

.

i=1

The generalization to continuous functions with bounded support is obtained by monotone convergence.  Proposition 7.1.7 implies straightforwardly that the properties (i) and (ii) of Definition 7.1.6 can be extended to sets A1 , . . . , An not necessarily bounded but with finite ν-measure. Example 7.1.8 (Homogeneous Poisson point process on R) A homogeneous Poisson point process N with (constant) intensity (or rate) λ > 0 is a point process on R with the following Laplace functional: E[e−N (f ) ] = eλ

∞

−∞

(e−f (x) −1)dx

.

Since the number of points in a compact interval is almost surely finite, the points {Γj , j ∈ Z} of N can be ordered in such a way that · · · ≤ Γ−2 ≤ Γ−1 ≤ Γ0 < 0 ≤ Γ1 ≤ Γ2 ≤ · · · Then, for j ≥ 1, Γj = λ−1 (E1 + · · · + Ej ) where Ei , i ≥ 1 are i.i.d. standard exponential random variables and for j ≤ 0, Γj = −λ−1 (E0 + − · · · + Ej ), where Ej , j ≤ 0 are also i.i.d. standard exponential random variables, inde∞ pendent of {Ej , j ≥ 1}. The point process j=1 δΓj is called a homogeneous Poisson point process with intensity λ on the positive half line.  Example 7.1.9 (A Poisson point process on (0, ∞)) We endow (0, ∞) with the boundedness of sets separated from zero. For α > 0, let να be the measure on (0, ∞) with the density αx−α−1 (cf. (1.3.8)). A Poisson point process on (0, ∞) with intensity measure να can be expressed as N=

∞  j=1

δΓ −1/α , j

where Γj , j ≥ 1 are the points of a unit rate homogeneous Poisson process on [0, ∞). Note that almost surely, there are finitely many points in any interval (, ∞) with  > 0. 

7.1 Random measures and point processes

179

Example 7.1.10 Let μ be a probability measure on a Polish space E. Let N be a Poisson random variable with mean θ and let {Zi , i ∈ N} be i.i.d. random elements on E with distribution μ, independent from N . Define the point process ξ on E by ξ=

N 

δ Zi .

i=1

Then ξ is a PPP on E with mean measure θμ. Indeed, for a non-negative measurable function f ,   −f E[e−ξ(f ) ] = E μ(e−f )N = eθμ(e −1) . If ν is a finite measure, let θ = ν(E) and μ = θ−1 ν. Then the previous constructions yield a PPP(ν).  The previous example can be used to prove the existence of a PPP with a given boundedly finite mean measure. Theorem 7.1.11 Let ν be a non-zero boundedly finite measure on a localized Polish space E. Then there exists a Poisson point process on E with measure ν.

Proof. If ν(E) < ∞ then we can use Example 7.1.10 to prove the existence of a PPP(ν). Let {Bi , i ≥ 1} be a partition of E which consists of bounded measurable sets and such that {B1 ∪ · · · ∪ Bn , n ≥ 1} is a localizing sequence. Thus ν(Bi ) < ∞ for all i ≥ 1 and we can define a sequence {ξi , i ≥ 1} of independent Poisson point processes on Bi with mean measure νi , the restriction of ν to Bi . Define ξ=

∞ 

ξi .

i=1

Then ξ is a PPP(μ). Indeed, let f be a bounded continuous function with bounded support. Then there exists n n such that the support of f is included ) = i=1 ξi (f ) and since e−f − 1 has the same in B1 ∪ · · · ∪ Bn . Then ξ(f  n support as f , ν(e−f − 1) = i=1 νi (e−f − 1). Thus, E[e−ξ(f ) ] =

n 

eνi (e

−f

−1)

=e

n

i=1

νi (e−f −1)

= eν(e

−f

−1)

.

i=1

This proves that ξ is a PPP(ν).



180

7 Point process convergence

A Poisson point process can be sent into another space. Proposition 7.1.12 (Transformation of points) Let (E, BE ), (F, BF ) be two localized Polish spaces and let N be a Poisson point process on E with points {Γj , j ∈ N} and mean measure ν. Let φ be a measurable function from E to F such that the image measure ν ◦φ−1 is BF -boundedly finite.  Then the point process Nφ on F defined by Nφ = j∈N δφ(Γj ) is a Poisson point process on F with mean measure ν ◦ φ−1 on F.

Proof. We can compute the Laplace function of the point process Nφ . Let first A be a bounded Borel set in F and let B = φ−1 (A). Then, ν(B) = νφ (A) < ∞ by assumption and for t ≥ 0, E[e−tNφ (A) ] = E[e−tN (B) ] = eν(B)(e

−t

−1)

−1

= eν◦φ

(A)(e−t −1)

.

As in the proof of Proposition 7.1.7, we extend this equality to step functions and then to all bounded continuous functions with bounded support by monotone convergence to conclude the proof.  Definition 7.1.13 (Marked point process) A marked point process on E is a simple point process ξ on a product space E × F such that P(ξ({s} × F) ≤ 1) = 1 for all s ∈ E. It is called a marked Poisson point process if it is a Poisson point process.

We will be interested in marked Poisson point processes where the marks are independent of the projection ξ(· × F) which is called the base point process. We will further restrict to i.i.d. marks, that is when the marked point process has the representation ξ=

∞ 

δPi ,Q i ,

(7.1.2)

i=1

∞ where N = i=1 δPi is a Poisson point process on E with mean measure ν and Qi , i ≥ 1 are i.i.d. copies of a random element Q with values in a Polish space F and distribution ηQ . The following result is a straightforward application of the characterization of the distribution of a point process by its Laplace functional.

7.1 Random measures and point processes

181

Lemma 7.1.14 A Poisson point process ξ has the representation (7.1.2) if and only if its mean measure is ν ⊗ ηQ and  −ξ(f ) − log E[e ] = {1 − E[e−f (x,Q ) ]}ν(dx) , E

for all bounded continuous functions f : E × F → R+ with bounded support in E × F, i.e. included in a set B × F with B bounded in E. Proof. Assume that f is continuous, non-negative and f (x, y) = 0 if x ∈ /B with B bounded. For x ∈ E, set h(x) = − log E[e−f (x,Q ) ]. Then h is continuous, non-negative with support included in B. Conditioning on the points {Pi }, we obtain   ∞  ∞ log E e− i=1 f (Pi ,Q i ) = log E E e− i=1 f (Pi ,Q i ) | {Pi }

∞   −f (Pi ,Q i ) E e | {Pi } = log E i=1

= log E

∞ 



E e−f (Pi ,Q 0 )

i=1

= log E {e  

−h(x)

E

= F E



e

−h(Pi )

i=1

 =

∞ 

| {Pi }

 − 1}ν(dx) =

E

{E[e−f (x,Q ) ] − 1}ν(dx)

{e−f (x,y ) − 1}ν(dx)ηQ (dy) .

This proves that the point process defined in (7.1.2) is a PPP with mean  measure ν ⊗ ηQ . Example 7.1.15 (Poisson point process on Rd \ {0}) Let |·| be any norm on Rd and let σ be a probability measure on the corresponding unit sphere Sd−1 . Let T : Rd \ {0} → (0, ∞) × Sd−1 be the polar coordinate map −1 defined by T (x) = (|x| , |x| x). Let να be the measure defined in (1.3.8), −α−1 with respect to Lebesgue’s measure on (0, ∞). The i.e. with density αx measure ν = (να ⊗ σ) ◦ T −1 is boundedly finite on Rd \ {0} and we can define a Poisson point process N on Rd \ {0} with mean measure ν by N=

∞  i=1

δΓ −1/α Wi , i

where {Wi , i ∈ N} is a sequence of i.i.d. random variables on Sd−1 with distribution σ, independent from the sequence {Γi , i ∈ N} of points of a unit rate homogeneous Poisson process. To prove this claim, we compute the

182

7 Point process convergence

Laplace functional of N . Let f be a bounded continuous function with support separated from 0. By Lemma 7.1.14 and Proposition 7.1.12 (applied with Q = W ), we have  ∞ −N (f ) − log E[e ]= {1 − E[e−tW0 ]}να (dt) 0   ∞ = {1 − e−tw }να (dt)σ(dw) Sd−1 0  {1 − e−x }ν(dx) . = Rd \{0}



7.1.3 Weak convergence of random measures Definition 7.1.16 (Weak convergence of random measures) Let {ξ, ξn , n ≥ 1} be random measures on (M, B(M)) and let ν be the mean w measure of ξ. We say that ξn converges weakly to ξ, denoted ξn =⇒ ξ, if d

(ξn (A1 ), . . . , ξn (Ak )) −→ (ξ(A1 ), . . . , ξ(Ak )) , for all k = 1, 2, . . . , and all bounded Borel sets A1 , . . . , Ak in E such that ν(∂Ai ) = 0.

The Laplace functional characterizes the weak convergence of random measures. Theorem 7.1.17 Let (E, B) be a localized Polish space and let d be a compatible metric. Let {ξ, ξn , n ≥ 1} be random measures on (M, B(M)). The following properties are equivalent: w

(i) ξn =⇒ ξ; d

(ii) ξn (f ) −→ ξ(f ) for all bounded continuous functions f with bounded support; d

(iii) ξn (f ) −→ ξ(f ) for all bounded Lipschitz continuous functions f with bounded support; (iv) for all bounded continuous functions f with bounded support, lim E[e−ξn (f ) ] = E[e−ξ(f ) ] ;

n→∞

(7.1.3)

7.1 Random measures and point processes

183

(v) for all non-negative d-Lipschitz continuous functions f with bounded support, lim E[e−ξn (f ) ] = E[e−ξ(f ) ] .

n→∞

(7.1.4)

Another consequence is the following criterion for the weak convergence of a sequence of Poisson point processes: Lemma 7.1.18 Let {νn , n ∈ N} be a sequence of boundedly finite measures on a localized Polish space E. Let {Nn , n ∈ N} be a sequence of Poisson point processes such that the mean measure of Nn is νn , n ∈ N. Then w

v#

Nn =⇒ N0 ⇐⇒ νn −→ ν0 . Proof. Let f be a measurable non-negative continuous function with bounded −f support. Since E[e−Nn (f ) ] = eνn (e −1) and e−f − 1 is bounded, continuous and has the same support as f , the stated equivalence is obvious.  We state here a version of the continuous mapping Theorem A.2.5 for random measures. Theorem 7.1.19 (Continuous mapping) Let {ξn , n ∈ N} be a w sequence of random measures in M. Assume that ξn =⇒ ξ0 . Let F be a metric space and let φ : M → F be a Borel measurable map which is almost surely continuous with respect to the distribution of ξ0 . Then d φ(ξn ) −→ φ(ξ0 ).

A doubly indexed sequence of point processes {ξn,i , i, n ≥ 1} is called a null array if lim sup P(ξn,i (B) > 0) = 0 ,

n→∞ i≥1

(7.1.5)

for all bounded measurable sets B. Example 7.1.20 Given a triangular array of random variables {X n,i , n ≥ 1, 1 ≤ i ≤ n}, we can define an array of Dirac point measures {δX n,i , i ≥ 1, n ≥ 1}. The null array condition becomes lim sup P(X n,i ∈ B) = 0 .

n→∞ i≥1

184

7 Point process convergence

 We set ξn = i≥1 ξn,i and we want to characterize the weak convergence of ξn to a Poisson point process when the measures ξn,i are random Dirac masses. Theorem 7.1.21 Let {X n,i , i ≥ 1, n ≥ 1} be a triangular array of rowwise independent random elements in E with distribution νn,i and such that {δX n,i , i ≥ 1, n ≥ 1} is a null array. Let ν be a boundedly finite measure on E and ξ be a PPP on E with mean measure ν. The following statements are equivalent: (i)

∞

(ii)

∞

i=1 δX n,i

w

=⇒ ξ;

v#

νn,i −→ ν.

i=1

Proof. We compute the Laplace functional of ξn = prove that lim E[e−ξn (f ) ] = e−ν(1−e

−f

n→∞

)

∞

i=1 δX n,i .

.

We must (7.1.6)

Let f be a bounded non-negative continuous function with bounded support. Write pn,i = 1 − E[e−f (Xn,i ) ] = νn,i (1 − e−f ). Then ∞ ∞   E[e−ξn (f ) ] = E[e−f (Xn,i ) ] = (1 − pn,i ) . i=1

∞

−f

i=1 ∞

Write νn = i=1 νn,i . Then e−νn (1−e ) = e− i=1 pn,i and ∞ ∞   −f −ξn (f ) ] − e−νn (1−e ) = (1 − pn,i ) − e−pn,i E[e i=1

i=1

∞  (1 − pn,i ) − e−pn,i ≤



i=1 ∞ 

1 2

i=1

p2n,i ≤

1 νn (1 − e−f ) sup pn,i . 2 i≥1

Let B be the support of f . Since δX n,i is a null array, we have sup pn,i ≤ sup P(Xn,i ∈ B) = o(1) . i≥1

v

i≥1

#

Since νn −→ ν and the support of 1 − e−f is also B, we have limn→∞ νn (1 − e−f ) = ν(1 − e−f ) and −f lim E[e−ξn (f ) ] − e−νn (1−e ) = 0 . n→∞

This proves (7.1.6).



7.2 Convergence of the point process of exceedences: the i.i.d. case

185

7.2 Convergence of the point process of exceedences: the i.i.d. case Let {X j , j ∈ Z} be a sequence of random vectors in Rd and {cn } be a scaling sequence. Define the point processes Nn =

n  i=1

δc−1 , Nn = n Xi

∞ 

δn−1 i,c−1 , n Xi

(7.2.1)

i=1

on Rd \ {0} and [0, ∞) × Rd \ {0}, respectively. The point process Nn is called the point process of exceedences and Nn is called the functional point process of exceedences. Note that Nn (A) = Nn ([0, 1] × A), therefore results for Nn imply those for Nn which could be proved directly but without any significant weakening of the assumptions. We will therefore only consider Nn . In order to give a first view of point processes in action, we briefly recall the classical result for the point process of exceedences of a sequence of i.i.d. regularly varying vectors. We consider the Polish space [0, ∞)×Rd \{0} endowed with the boundedness which consists of sets which are included in a set [0, A] × B c (0, ) with A > 0 and  > 0. Theorem 7.2.1 Let {X j , j ∈ Z} be a sequence of i.i.d. regularly varying random vectors with tail index α and let ν 0 be the exponent measure of X 0 associated to the scaling sequence {cn } such that limn→∞ nP(|X 0 | > cn ) = 1. Then Nn converges weakly in N ([0, ∞) × Rd \ {0}) to a Poisson point process N  with mean measure Leb × ν 0 .

An obvious consequence is that Nn converges weakly to a Poisson process N on Rd \ {0} with intensity measure ν 0 . , 1 ≤ i ≤ n. Proof. Consider the array of point processes ξn,i = δn−1 i,c−1 n Xi d Let B be a bounded Borel subset of [0, ∞) × R \ {0}. Then, there exists  > 0 such that for all i ≥ 1, P(ξn,i (B) > 0) ≤ E[ξn,i (B)] ≤ P(|X 0 | > cn ) → 0 . Thus {ξn,i } is a null array. The mean measure of ξn,i is νn,i = δn−1 i ⊗ ∞ v# P(c−1 n X 0 ∈ ·) and by regular variation i=1 νn,i −→ Leb × ν 0 on [0, ∞) × Rd \ {0}. Indeed, let f be a continuous function with a bounded support in [0, ∞) × Rd \ {0}. This implies that there exists c > 0 such that f (t, x) = 0

186

7 Point process convergence

if t > c. Moreover, by Lemma B.1.30, it suffices to consider functions f of the form f (t, x) = g(t)h(x). Since h has bounded support, there exists  > 0 such that h(x) = 0 whenever |x| < . Thus, applying (2.1.8), ∞ 

νn,i (f ) =

i=1

∞ 

i E[f ( ni , X cn )] =

i=1

=

1 nE[h( Xcn1 )] n



 1≤i≤nc

1≤i≤nc

g( ni )

g( ni )E[h( Xcn1 )]  → ν 0 (h)

c 0

g(t)dt = Leb ⊗ ν 0 (f ) .

Therefore we can apply Theorem 7.1.21 to conclude the proof.



7.3 Convergence of the point process of clusters In the case of a time series, exceedences over a high threshold come in clusters, except in the case of extremal independence. Therefore, it is natural to consider the point process of clusters. Let {X j , j ∈ Z} be a stationary regularly varying sequence with tail index α and tail process {Y j , j ∈ Z}. Let {cn } be a scaling sequence and {rn } be an intermediate sequence. Set mn = [n/rn ] and for i ≥ 1, let the i-th rescaled block X n,i be defined by X n,i = c−1 n (X (i−1)rn +1 , . . . , X irn ) . Define the (functional) point process of clusters Nn on [0, ∞) × ˜0 \ {0} by Nn

=

∞ 

δm−1 . n i,X n,i

(7.3.1)

i=1

Assume that P(lim|j|→∞ |Y j | = 0) = 1, which implies that ϑ = P(Y ∗−∞,−1 ≤ 1) > 0, and let Q be the conditional spectral tail process as in Definition 5.4.6. Recall the measures ν ∗n,rn and ν ∗ defined in (6.2.1) and (6.2.3). That is  1 , E δc−1 n X 1,rn rn P(|X 0 | > cn )  ∞ ν ∗ (H) = ϑ E[δrQ ]αr−α−1 dr . ν ∗n,rn =

(7.3.2a) (7.3.2b)

0

By Lemma 6.2.4, given {cn }, the intermediate sequence can be chosen in such v# a way that ν ∗ −→ ν ∗ in ˜0 \ {0}. n,rn

7.3 Convergence of the point process of clusters

187

Theorem 7.3.1 Let {X j , j ∈ Z} be a stationary regularly varying sequence with tail index α and tail process {Y j , j ∈ Z} such that P(lim|j|→∞ |Y j | = 0) = 1. Let {cn } be a scaling sequence such that limn→∞ nP(|X 0 | > cn ) = 1 and let {rn } be an intermediate sequence v# such that ν ∗n,rn −→ ν ∗ in ˜0 \ {0}. Then the following properties are equivalent: (i) the functional point process of clusters Nn converges weakly to a Poisson point process N  on [0, ∞)ט0 \{0} with mean measure Leb⊗ν ∗ ; (ii) for all non-negative continuous functions f with bounded support in [0, ∞) × ˜0 \ {0},

 ∞     −Nn (f ) −f (m−1 i,X ) n,i n − E e lim E e =0. (7.3.3) n→∞

i=1

Remark 7.3.2 By Lemma 7.1.14, if (i) holds, then the limiting point process N  can be expressed as N  =

∞ 

δΓi ,Pi Q (i) ,

(7.3.4)

i=1

∞ where i=1 δΓi ,Pi is a Poisson point process on [0, ∞) × (0, ∞) with mean measure ϑLeb ⊗ να and Q(i) , i ∈ N, are i.i.d. copies of the conditional spectral tail process Q, independent of the previous point process. The Laplace functional of N  can be written as  ∞ ∞  −N  (f ) ] = −ϑ E 1 − e−f (t,uQ ) dt αu−α−1 du . (7.3.5) log E[e 0

0



Proof ( Proof of Theorem 7.3.1). Let {X †n,i , i, n ≥ 1} be an array of row-wise i.i.d. random elements in ˜0 such that X †n,1 has the same distribution as X n,1 for all n ≥ 1. Define ξn,i = δm−1 † . Then {ξn,i } is a null array of point n i,X n,i processes on [0, ∞) × ˜0 \ {0}. The mean measure of ξn,i is δm−1 ⊗ ν ∗n,rn and n i by the same arguments as in the proof of Theorem 7.2.1, ∞ 

v#

δm−1 ⊗ ν ∗n,rn −→ Leb ⊗ ν ∗ n i

i=1

∞ on [0, ∞) × ˜0 \ {0}. By Theorem 7.1.21 this implies that i=1 ξn,i converges weakly to a Poisson point process N  on [0, ∞) × ˜0 \ {0} with mean measure

188

7 Point process convergence

 † ∞ ∞ −1 Leb⊗ν ∗ . Since i=1 E e−f (mn i,X n,i ) is the Laplace functional of i=1 ξn,i , by Theorem 7.1.17, we have ∞   † −1  E e−f (mn i,X n,i ) = E[e−N (f ) ] . lim n→∞

i=1

We can now prove the stated equivalence.    If (i) holds, then limn→∞ E e−Nn (f ) = E[e−N (f ) ] by Theorem 7.1.17 and thus (7.3.3) holds since both terms have the same limit. If (ii) holds then ∞     −1  lim E e−Nn (f ) = lim E e−f (mn i,X n,i ) = E[e−N (f ) ] .

n→∞

n→∞

i=1



Thus (i) holds by Theorem 7.1.17.

We have seen in Section 6.2 that Condition AC(rn , cn ) implies that the tail v#  process vanishes at infinity and that ν ∗n,r −→ ν ∗ on ˜0 \ {0}. n

Corollary 7.3.3 Let {X j , j ∈ Z} be a stationary regularly varying sequence with tail index α. Let {cn } be a scaling sequence and let {rn } be an intermediate sequence such that AC(rn , cn ) and (7.3.3) hold for all bounded Lipschitz continuous functions with bounded support. Then the functional point process of clusters Nn converges weakly to a Poisson point process N  on [0, ∞) × ˜0 \ {0} with mean measure Leb ⊗ ν ∗ .

As a second corollary we also obtain the convergence of the functional point process of exceedences Nn defined in (7.2.1). Corollary 7.3.4 Let {cn } be a scaling sequence and let {rn } be an intermediate sequence such that AC(rn , cn ) holds. If Nn converges weakly to a Poisson point process N  on [0, ∞) × ˜0 \ {0} with mean measure Leb ⊗ ν ∗ , then Nn converges weakly to a Poisson point process N  on [0, ∞) × Rd \ {0} given by N =

∞   i=1 j∈Z

δΓi ,Pi Q (i) , j

where {Γi , Pi , Q(i) , i ≥ 1} are as in Remark 7.3.2.

(7.3.6)

7.3 Convergence of the point process of clusters

189

For a function f : [0, ∞) × Rd → R+ , define Hf on [0, ∞) × ˜0 → R+ by  Hf (t, x) = f (t, xj ) . (7.3.7) j∈Z

Then N  (f ) = N  (Hf ). This implies that the mean measure λ and Laplace functional of N  are given by  ∞  ∞ λ(f ) = ϑ E[f (t, rQj )]αr−α−1 drdt , (7.3.8) j∈Z

log E[e−N



(f )



] = −ϑ

0

0

0

∞



0

  E 1 − e− j∈Z f (t,uQ j ) dt αu−α−1 du , (7.3.9)

for bounded continuous functions f with bounded support in [0, ∞)×Rd \{0}.  ∞ − j∈Z f (t,ux j ) }αu−α−1 du Since f vanishes for small u, the map x → 0 {1−e satisfies the conditions of Lemma 5.6.3, so we can express the Laplace functional in terms of the forward spectral tail process Θ:  ∞ ∞   ∞ −N  (f ) log E[e ] = −ϑ E e− j=0 f (t,uΘj ) 0 0 ∞ −e− j=1 f (t,uΘj ) dt αu−α−1 du . (7.3.10) If the time series {X j } is extremally independent, i.e. if its tail process is trivial, then the limiting point process is simply the Poisson point process with mean measure ν 0 . By considering functions of the form f (t, x) = 1[0,1] (t)g(x), we obtain the convergence of the point process of exceedences Nn → N with N=

∞   i=1 j∈Z

δPi Q (i) ,

(7.3.11)

j

Proof. ( Proof of Corollary 7.3.4). Let f be a bounded continuous function with bounded support in [0, ∞) × Rd \ {0} and let Hf be as in (7.3.7). Since Hf is shift-invariant with respect to the second variable, it can be identified to a function on [0, ∞)× ˜0 and it is bounded away from 0 in ˜0 by assumption on f . We will prove first that d

Nn (Hf ) −→ N  (Hf ) .

(7.3.12)

This convergence does not follow directly from the weak convergence of Nn since Hf is not bounded on 0 (Rd ). We must use a truncation argument. Then d

d

Nn (Hf ∧ M ) −→ N  (Hf ∧ M ) as n → ∞ for all M and N  (Hf ∧ M ) −→ N  (Hf ) as M → ∞. We only need to apply the triangular argument Lemma A.2.10, i.e. to prove that for all  > 0,

190

7 Point process convergence

lim lim sup P(Nn (Hf 1{Hf > M }) > ) = 0 .

(7.3.13)

M →∞ n→∞

Since f has bounded support in [0, ∞) × Rd \ {0}, there exist A, C, δ > 0 such that Nn (Hf 1{Hf > M }) ≤ cst

Am n

Sn,i 1{Sn,i > M/C}

i=1

 rn  with Sn,i = j=1 1 X (i−1)rn +j > cn δ . By Markov inequality and stationarity, and since mn rn ∼ n, we obtain P(Nn (Hf 1{Hf > M }) > ) ≤ cst −1 mn E[Sn,1 1{Sn,1 > M/C}]

 rn  −1 1{|X i | > cn δ} > M/C ≤ cst  nP |X 0 | > cn δ, i=−rn



≤ cst −1 nP |X 0 | > cn δ,

max

m≤|j|≤rn

 |X i | > cn δ

,

with m = M/(4C). This shows that (7.3.13) is a consequence of AC(rn , cn ). This concludes the proof of (7.3.12). P

Finally, we must establish that Nn (f ) − Nn (Hf ) −→ 0. By Lemma B.1.30, it suffices to consider functions of the form f (t, x) = g(t)h(x) where g has bounded support in [0, ∞) (say, [0, A]) and h(x) = 0 if x∗ < δ for some δ > 0. For H : [0, ∞) × ˜0 → R+ we have Nn (H) =

∞ 

H(m−1 n i, X n,i ) .

i=1

Applying this to Hf gives Nn (Hf ) = =

rn ∞  

−1 f (m−1 n i, cn X (i−1)rn +j )

i=1 j=1 ∞ 

rn 

i=1

j=1

g(m−1 n i)

h(c−1 n X (i−1)rn +j ) .

On the other hand Nn (f ) =

∞ 

f (n−1 i, c−1 n X i) =

i=1

=

rn ∞   i=1 j=1

∞ 

g(n−1 i)h(c−1 n X i)

i=1

g(n−1 ((i − 1) + j))h(c−1 n X (i−1)rn +j ) .

7.4 Sufficient conditions for point process convergence

191

Then, Nn (f )−Nn (Hf ) rn ∞   {g(n−1 ((i − 1)rn + j)) − g(i/mn )}h(c−1 = n X (i−1)rn +j ) . i=1 j=1

Since g is continuous and has bounded support in [0, ∞), it is uniformly continuous. Thus for  > 0 and n large enough, we have, for all i ≥ 1 and j = 1, . . . , rn , |g(n−1 ((i − 1)rn + j)) − g(i/mn )| ≤  . This yields |Nn (f ) − Nn (Hf )| ≤ 

rn ∞     1 n−1 ((i − 1)rn + j) < A |h(c−1 n X (i−1)rn +j )| i=1 j=1

= Nn (H) ,  d with H(t, x) = 1{t ≤ A} j∈Z |h(xj )| for x ∈ 0 . Since Nn (H) −→ N  (H) (by the same argument as those that leads to (7.3.12)), this proves that P  Nn (f ) − Nn (Hf ) −→ 0. Remark 7.3.5 If AC(rn , cn ) holds, then it can be shown along the lines of w the proof of Theorem 7.3.1 that Nn =⇒ N  if and only if  [n/r n ]  rn  (7.3.14) lim E e−Nn (f ) − E e− i=1 f (irn /n,X i /cn ) = 0 , n→∞ i=1 for all non-negative Lipschitz continuous functions f with bounded support in [0, ∞) × Rd \ {0}. ⊕

7.4 Sufficient conditions for point process convergence In this section, we consider two types of conditions which can be checked on time series models.

7.4.1 β-mixing time series The β-mixing condition is a relatively stringent but commonly made assumption for time series. See Appendix E.3 for the definition.

192

7 Point process convergence

Lemma 7.4.1 Let {X j , j ∈ Z} be a stationary, regularly varying time series with β-mixing coefficients {βn , n ≥ 1}. Assume that there exist sequences cn , n , rn such that limn→∞ nP(|X 0 | > cn ) = 1, AC(rn , cn ) holds and lim

n→∞

1 n rn n = lim = lim = lim β n = 0 . n→∞ rn n→∞ n n→∞ rn n

(β(rn , n ))

Then (7.3.3) holds for all bounded Lipschitz continuous functions with bounded  support in [0, ∞) × ˜0 \ {0}. Proof. Let {X †j , j ∈ Z} be a sequence such that the blocks X †(i−1)rn +1,irn ,

i ≥ 1 are i.i.d. and have the same distribution as X 1,rn . Write X †n,i = † −1 ˜ c−1 n X (i−1)rn +1,irn and the truncated blocks X n,i = cn X (i−1)rn +1,irn − n

˜ † = c−1 X † and X n,i n (i−1)rn +1,irn − n . Define the following point processes:  = N n

∞  i=1

δ

˜ m−1 n i,X n,i

,

† Nn

=

∞  i=1

δm−1 † n i,X

n,i



 = , N n

∞ 

δm−1 i,X˜ † .

i=1

n

n,i

Let f be a non-negative bounded Lipschitz continuous function with bounded  By Lemma E.3.4, the β-mixing condition β(rn , n ) support in [0, ∞)× ˜0 \{0}. implies that   † n     β = o(1) . E e−Nn (f ) − E e−Nn (f ) ≤ rn n We must now prove that the small blocks are negligible. By stationarity of the blocks and within the blocks and since f is bounded and Lipschitz continuous  there exists  > 0 such that with bounded support in [0, ∞) × ˜0 \ {0},  †  (f ) + N  † (f ) − N  (f ) = O( n )P(X ∗ > cn ) . E Nn (f ) − N 1, n n n n rn This yields     †   †     E e−Nn (f ) − E e−Nn (f ) + E e−Nn (f ) −E e−Nn (f ) ≤ O( rnn )P(X ∗1, n > cn ) . Since Condition AC(rn , cn ) implies AC(n , cn ), we have by Corollary 6.2.6   n n P(X ∗1, n > cn ) = O = o(1) . rn rn  Corollary 7.3.3 and Lemma 7.4.1 together yield the following result.

7.4 Sufficient conditions for point process convergence

193

Corollary 7.4.2 Let {X j , j ∈ Z} be a stationary regularly varying β-mixing time series. Assume that Condition AC(rn , cn ) holds for sequences {rn } and {cn } such that limn→∞ nP(|X 0 | > cn ) = 1 and limn→∞ rn /n = 0. Assume moreover that there exists a sequence n such that n = o(rn ) and β(rn , n ) holds, then the functional point process of clusters Nn converges weakly to a Poisson point process N  on [0, ∞) × ˜0 \ {0} with mean measure Leb ⊗ ν ∗ .

7.4.2 m-dependent approximations Let {X j , j ∈ Z} be a stationary regularly varying time series which admits a (m) sequence of approximations {X j , j ∈ Z} , m ≥ 1, such that for each m the  (defined with the same scaling functional point processes of clusters Nn,m  as n tends to infinity. sequence) converge weakly to a point process N(m) This is the case for instance if X (m) is regularly varying and m-dependent; it then satisfies Condition AC(rn , cn ) (by Lemma 6.1.3) and is β-mixing with geometric rate, thus Corollary 7.3.3 can be applied.

 Assume moreover that N(m) converges weakly to a point process N  on  What more is needed to ensure that Nn converges to N  ? [0, ∞) × ˜0 \ {0}. It turns out that condition (5.2.13) of Proposition 5.2.5 is sufficient if the approximation is jointly stationary with the original sequence.

Theorem 7.4.3 Let {X j , j ∈ Z} be a stationary regularly varying time series. Assume that there exists a sequence of regularly varying time (m) series X (m) , m ≥ 1, such that for each m, {(X j , X j ), j ∈ Z} is stationary and for all  > 0,   (m) P X 0 − X 0 > x =0. (7.4.1) lim lim sup m→∞ x→∞ P(|X 0 | > x) (m)

Let {rn } be an intermediate sequence. Set mn = [n/rn ] and X n,i = (m)

(m)

 c−1 n (X (i−1)rn +1 , . . . , X irn ), i ≥ 1. Define the point process Nn,m by  Nn,m =

∞ 

δm−1 i,X (m) . n

i=1

(7.4.2)

n,i

  Assume that Nn,m converges weakly to a point process N(m) on [0, ∞) ×  ˜  0 \ {0} as n tends to infinity for each m ≥ 1 and that N(m) converges

194

7 Point process convergence

 as m tends to infinity. weakly to a point process N  on [0, ∞) × ˜0 \ {0} Then Nn converges weakly to N  as n tends to infinity.

Proof. We must prove that for all bounded Lipschitz continuous functions f  and for all η > 0, with bounded support in [0, ∞) × ˜0 \ {0}    lim lim sup P |Nn,m (f ) − Nn (f )| > η = 0 . (7.4.3) m→∞ n→∞

d

This will prove by the triangular argument Lemma A.2.10 that Nn (f ) −→ N  (f ) for all bounded Lipschitz continuous functions f with bounded support w  and this will ensure that Nn =⇒ in [0, ∞) × ˜0 \ {0} N  by Theorem 7.1.17. Fix η > 0 and assume without loss of generality that f is 1-Lipschitz with  Then, there exists A,  > 0 such that bounded support in [0, ∞) × ˜0 \ {0}. ∗ f (t, x) = 0 if t > A or x ≤ . Thus, for all ζ > 0,  P(|Nn,m (f ) − Nn (f )| > η)   (m)  ≤ P max |X i − X i | > cn ζ + P(ζNn,m ([0, A] × {x∗ > }) > η) 1≤i≤An   (m)  ≤ cst nP |X 0 − X 0 | > cn ζ + P(Nn,m ([0, A] × {x∗ > }) > η/ζ) .  Applying (7.4.1) and the assumption on Nn,m , we obtain  lim lim sup P(|Nn,m (f ) − Nn (f )| > η) ≤ P(N  ([0, ∞) × {x∗ > }) > η/ζ) .

m→∞ n→∞

Since N  ([0, ∞)×B(0, )c is almost surely finite and ζ is arbitrary, this proves (7.4.3).  Corollary 7.4.4 Let {X j , j ∈ Z} be a stationary regularly varying time series with tail process Y such that P(Y ∈ 0 (Rd )) = 1 and cluster measure ν ∗ . Assume that there exists a sequence of m-dependent regularly varying time series X (m) , m ≥ 1, such that for each m, (m) {(X j , X j ), j ∈ Z} is stationary and (7.4.1) holds for all  > 0. Let {rn } be an intermediate sequence and define Nn as in (7.3.1). Then Nn converges weakly as n tends to infinity to a Poisson point process N  on  with mean measure Leb ⊗ ν ∗ and Nn converges weakly [0, ∞) × ˜0 \ {0} to the point process N  given in (7.3.6).

Remark 7.4.5 The assumption that the m-dependent approximation and the original series are jointly stationary can be relaxed at the cost of a more

7.5 The extremal index

195

complex condition than (7.4.1). From a practical point of view, the current one seems sufficient. It will be fruitfully used in Chapter 15. ⊕  be as in (7.4.2) Proof. Let ν ∗(m) be the cluster measure of X (m) and Nn,m with rn an intermediate sequence. Since AC(rn , cn ) holds for m-dependent sequences which moreover are β-mixing with geometric rates, we can apply  converges weakly to a Poisson point Corollary 7.4.2 to obtain that Nn,m v#

 process N(m) with mean measure Leb⊗ν ∗(m) . By Lemma 6.2.7, ν ∗(m) −→ ν ∗ , w

w

 thus N(m) =⇒ N  by Lemma 7.1.18 and Nn =⇒ N  by Theorem 7.4.3. The proof of the convergence of Nn to N  is a direct proof along the same lines  and does not use AC(rn , cn ).

Remark 7.4.6 We see that for a time series X which is m-dependent approximable in the sense of (7.4.1), the convergence of the point process of clusters holds with any intermediate sequence rn , that is for any sequence such that rn /n → 0. This will be highlighted in Chapter 15.

7.5 The extremal index Let X1 , . . . , Xn be i.i.d. random variables with common distribution function F . Then the distribution function of max1≤i≤n Xi is F n . If the random variables are dependent, the distribution of the sample maximum is not easily obtained. Roughly speaking, the extremal index, if it exists, quantifies the order of magnitude by which the maximum of dependent identically distributed random variables differs from the maximum of i.i.d. random variables with the same distributions. Definition 7.5.1 (The extremal index) A stationary univariate time series {Xj , j ∈ Z} has extremal index θ if for every τ > 0 and increasing sequence {cn },   lim nP(X0 > cn ) = τ ⇐⇒ lim P max Xj ≤ cn = e−θτ . (7.5.1) n→∞

n→∞

1≤j≤n

The extremal index does not necessarily exist and when it does exist it is often not easy to compute explicitly using Definition 7.5.1. If the extremal index exists, the definition implies that it is non-negative. Furthermore, if τ and cn are as in (7.5.1), e−θτ = lim {1 − P( max Xi > cn )} ≥ lim {1 − nP(X1 > cn )} = 1 − τ . n→∞

1≤i≤n

n→∞

196

7 Point process convergence

If such a sequence {cn } exists for all τ > 0 (or at least infinitely many in a neighborhood of zero), then this implies that θ ∈ [0, 1]. By definition, if {Xj† , j ∈ Z} is a sequence of i.i.d. random variables with regularly varying marginal distribution, then its extremal index exists and is equal to 1. If the marginal distribution is the marginal distribution of the stationary sequence {Xj , j ∈ Z}, and if the extremal index exists and is equal to θ, then the limit in the right-hand side of (7.5.1) can be reexpressed as    θ † P max Xj ≤ cn ∼ P max Xj ≤ cn . 1≤j≤n

1≤j≤n

We give two simple examples. Example 7.5.2 Assume that X is a regularly varying random variable and define the time series {Xj } by Xj = X for all j ∈ Z. Note that the tail process is constant Yj = Y0 for all j ∈ Z and thus ϑ = P(Y ∗−∞,−1 ≤ 1) = 0. Furthermore, its extremal index θ is also 0. Indeed, for any scaling sequence {cn },   lim P max Xj ≤ cn = lim P (X ≤ cn ) = 1 . 1≤j≤n

n→∞

n→∞

 Example 7.5.3 Let {Zj , j ∈ Z} be a sequence of i.i.d. random variables with unit Fr´echet distribution. Define the sequence {Xj , j ∈ Z} by Xj = max{Zj , Zj−1 }, j ∈ Z. Then P(X1 ≤ x) = P(Z0 ∨ Z1 ≤ x) = e−2/x . Thus, for every τ > 0, we have lim nP(X1 > 2n/τ ) = lim n{1 − e−τ /n } = τ , n→∞     lim P max Xj ≤ 2n/τ = lim P max Zj ≤ 2n/τ n→∞

n→∞

1≤j≤n

n→∞

= (e

0≤j≤n

−1/(2n/τ ) n

) = e−τ /2 .

If {Xj† } is an i.i.d. sequence with the same marginal distribution as {Xj }, then   † lim P max Xj ≤ 2n/τ = e−τ . n→∞

1≤j≤n

This proves that the extremal index of the sequence {Xj } is 1/2. Recall that we have computed the tail process of this sequence in Example 5.2.11: Y1 = BY0 with B a Bernoulli random variable with mean 1/2 and independent of Y0 and Yj = 0 for all j ≥ 2. Thus

7.5 The extremal index

197

ϑ = P(Y1 ≤ 1) = P(B = 0) = 1/2 = θ. In this case again the extremal index is equal to the candidate extremal index.  In the previous examples of non-negative time series, the extremal index is equal to the candidate extremal index. There is no general relation between the candidate and the true extremal index, except when the former vanishes. Lemma 7.5.4 Let {Xj , j ∈ Z} be a non-negative stationary regularly varying time series. (i) If ϑ = 0, then θ = 0. (ii) If {Xj , j ∈ Z} admits an extremal index θ, then θ ≤ ϑ. Proof. Let cn be such that nP(X0 > cn ) → 1. Denote Mn = max1≤i≤n Xi . Splitting on the first time the level cn is reached and using stationarity, we have, for x > 0, each fixed k ≥ 2 and n ≥ k, P(Mn > cn x) = P(Xn > cn x) +

n−1 

P(Xi > cn x, X ∗i+1,n ≤ cn x)

i=1 n−1 

= P(X0 > cn x) +

P(X0 > cn x, X ∗1,n−i ≤ cn x)

i=1

≤ kP(X0 > cn x) + (n − k)P(X0 > cn x, X ∗1,k ≤ cn x) . Therefore, for each fixed k ≥ 2, letting n → ∞ yields lim sup P(Mn > cn x) ≤ P(Y ∗1,k ≤ 1)x−α .

(7.5.2)

n→∞

Letting k → ∞ and applying (5.5.6) to the last exceedence map which satisfies (5.5.2) and monotone convergence proves that lim sup P(Mn > cn x) ≤ P(Y ∗1,∞ ≤ 1) = ϑx−α . n→∞

Thus ϑ = 0 implies θ = 0. Otherwise, if the extremal index exists, the lim sup −α in the left-hand side of (7.5.2) becomes a limit, equal to 1 − e−θx , whence for all x > 0, −α

1 − e−θx

≤ ϑx−α .

Multiplying both sides by xα and letting x → 0 shows that θ ≤ ϑ.

(7.5.3) 

Under the condition that ensures the convergence of the point process of exceedences to the point process in (7.3.11), we can conclude that the candidate extremal index is indeed the extremal index.

198

7 Point process convergence

Corollary 7.5.5 Let {Xj , j ∈ Z} be a univariate, non-negative, stationary regularly varying time series such that the convergence (7.3.11) holds. Then the extremal index of {Xj , j ∈ Z} exists and equals ϑ.

We will obtain Corollary 7.5.5 as a particular case of the next result. In order to deal with real-valued time series, we introduce the right-tail candidate extremal index:   + ϑ = P sup Yj ≤ 1 | Y0 > 1 (7.5.4a)  =E

j≥1

sup(Θj )α + j≥0



sup(Θj )α + j≥1

 | Θ0 > 0

(7.5.4b)

E[supj∈Z (Qj )α +] . =  α j∈Z E[(Qj )+ ]

(7.5.4c)

Corollary 7.5.6 Let {Xj , j ∈ Z} be a univariate, stationary regularly varying time series such that the convergence (7.3.11) holds. Then the extremal index exists and θ = ϑ+ . In particular, if {Xj , j ∈ Z} is mdependent, then θ = ϑ+ .

Proof. Recall that nP(|X0 | > cn ) → 1. Let c+ n be such that nP(X0 > cn ) → 1. 1/α + Thus, cn ∼ pX cn , where pX is the extremal skewness of X0 . We have for x > 0,     1/α + P max Xi ≤ cn x ∼ P max Xi ≤ pX cn x 1≤i≤n

1≤i≤n

1/α

1/α

= P(Nn ((pX x, ∞) = 0) → P(N ((pX x, ∞)) = 0) ⎛ ⎞ ∞    1/α = P⎝ 1 Pi Qi,j > pX x = 0⎠ ⎛

i=1 j∈Z

⎞ ∞    1/α = P⎝ 1 Pi (Qi,j )+ > pX x = 0⎠ . i=1 j∈Z

Using (7.3.8) and (5.6.12a) yields

7.5 The extremal index

⎛ P⎝

199



∞    1/α 1 Pi (Qi,j )+ > pX x = 0⎠ i=1 j∈Z

  = exp −ϑ



1/α

P(r sup(Qj )+ > xpX )αr−α−1 dr

0





j∈Z



  + −α α −α = exp −ϑp−1 E sup (Q ) x = e−ϑ x . j + X j∈Z

This proves that θ = ϑ+ .



Corollary 7.5.7 Let {Xj , j ∈ Z} be a univariate, stationary regularly varying time series. Assume that there exists a sequence of m-dependent (m) (m) approximations {Xj , j ∈ Z} , m ≥ 1, such that {(Xj , Xj ), j ∈ Z} is (m)

stationary for each m and (7.4.1) holds for all  > 0. Let {Yj be the tail process of

(m) {Xj , j

∈ Z}. If there exists θ ∈ (0, 1] such that

 θ = lim P m→∞

, j ∈ Z}

(m)

max Yi

1≤i≤m

(m)

≤ 1 | Y0

 >1 ,

(7.5.5)

then θ is the extremal index of {Xj , j ∈ Z}.

Proof. For every integer m, we have   P max Xi ≤ cn x 1≤i≤n   (m) = P max Xi ≤ cn x, max |Xi − Xi | ≤ cn  1≤i≤n 1≤i≤n   (m) + P max Xi ≤ cn x, max |Xi − Xi | > cn  1≤i≤n

 ≤P

(m)

max Xi

1≤i≤n

1≤i≤n

  n (m) ≤ cn (x + ) + P(|Xi − Xi | > cn ) . i=1 (m)

(m)

Write θ(m) = P(max1≤i≤m Yi ≤ 1 | Y0 > 1). By Corollary 7.5.6, we (m) know that θ(m) is the extremal index of the sequence X (m) = {Xj , j ∈ Z}. Applying (7.4.1) and (7.5.5), we obtain   (m) lim sup P max Xi ≤ cn x ≤ lim e−θ x = e−θx . n→∞

1≤i≤n

m→∞

For the lower bound, reversing the role of X (m) and X yields

200

7 Point process convergence

 P

(m)

max Xi

1≤i≤n

≤ cn x)  ≤P

  n (m) max Xi ≤ cn (x + ) + P(|Xi − Xi | > cn ) .

1≤i≤n

i=1



The rest of the argument is similar.

7.6 Problems Poisson point processes 7.1 Let E be a localized Polish space, {Un , n ∈ N} a localizing sequence and ν a boundedly finite measure. Set mn = ν(Un ) < ∞. Let {Xn,i , 1 ≤ i ≤ n} be i.i.d. E-valued random variables with distribution m−1 n ν(· ∩ Un ). Let Mn be a random variable with a binomial distribution with parameters n and n−1 mn , independent of {Xn,i , 1 ≤ i ≤ n}. Define the point process Nn by Nn =

Mn 

δXn,i .

i=1 w

Prove that Nn =⇒ N where N is a PPP(ν). 7.2 Let α > 0 and Z be a random element in a Banach space (E,  ·) such α that E[Z ] < ∞. Define a measure on E by  ∞ ν= E[δrZ ]αr−α−1 dr . 0

Prove that ν is boundedly finite on E \ {0} with the boundedness endowed ∞ B0 and that a PPP(ν) can be expressed as i=1 δΓ −1/α Z (i) where {Γi , i ≥ 1} i are the points of a homogeneous PPP on [0, ∞) and Z i , i ≥ 1 are i.i.d. copies of Z. 7.3 (Converse to Proposition 7.1.2) Let {X n,i , 1 ≤ i ≤ n, n ≥ 1} be a for each n ≥ 1, |X n,1 | > triangular array of random vectors in Rd such that n |X n,2 |1= > · · · > |X n,n | > 0. Define Nn = i=1 δX n,i . Let N be a simple point process on Rd \ {0} with a boundedly finite mean measure ν and infinitely many points in any neighborhood of 0. Assume moreover that the point process on (0, ∞) of the norms of the points of N is also simple. Let {Pi , i ≥ 1} be the enumeration of the points of N by decreasing order of their d

norm. Assume that for all k ≥ 1, (X n,1 , . . . , X n,k ) −→ (P1 , . . . , Pk ). Prove w that Nn =⇒ N . Hint: use the Laplace functional and a triangular argument.

7.6 Problems

201

Point process of exceedences 7.4 Let ν be a shift-invariant tail measure with support  in 0 (Rd ) and let Θ ∞ α be the associated spectral tail process and assume that j=0 E[|Θj | ] < ∞  be a random sequence with the distribution and P(Θ∗−∞,−1 = 0) > 0. Let Θ ∗  (i) , i ≥ 1} be i.i.d. copies of Θ.  of Θ conditionally on Θ−∞,−1 = 0 and {Θ ∞ Let i=1 δTi ,Pi be a Poisson point process on [0, 1] × Rd \ {0} with mean  (i) . Prove measure P(Θ∗−∞,−1 = 0) · Leb ⊗ να , independent of the sequences Θ that ∞  δTi ,Pi Θ (7.6.1) N  =  (i) . i=1

is a PPP(ν ∗ ). Hint: Use Problem 5.17 and Problem 7.2. 7.5 Let {Xj , j ∈ Z} be a non-negative regularly varying stationary sequence ¯ be the point process which satisfies Conditions AC(rn , cn ) and (7.3.14). Let N on [0, 1] which counts the number of exceedences over the threshold cn such that limn→∞ nP(X0 > cn ) = 1, that is, for a Borel set A of [0, 1], n  ¯n (A) = N δ ni (A)1{Xi > cn } . i=1

¯ on ¯n converges weakly to a compound Poisson point process N Prove that N [0, 1] which can be expressed as ∞  ¯ = N δTi Ki , i=1

where {Ti , i ∈ N∗ } are the points of a homogeneous Poisson point process on [0, ∞) with rate ϑ and {Kj , j ≥ 1} is a sequence of i.i.d. integer valued random variables such that ⎛ ⎞ ∞  1{Yj > 1} = m | Y ∗−∞,−1 ≤ 1⎠ , m ≥ 1 . P(K1 = m) = P ⎝ j=0

This distribution is the asymptotic cluster size distribution. 7.6 Let Nn be the point process of exceedences of a univariate regularly w varying time series. Assume that Nn =⇒ N where N is a PPP(να ). Give the asymptotic joint distribution of the largest k observations. 7.7 Let (X (0) , X (1) ) be a bivariate vector with non-negative components, regularly varying with tail index α > 0 satisfying Definition 3.2.1 with conditional (0) (1) scaling exponent κ and measure μ as in (3.2.2). Let {(Xj , Xj ), j ∈ Z} be a sequence of i.i.d. vectors whose marginal distribution is the distribution

202

7 Point process convergence

of (X (0) , X (1) ). Let cn be the quantile of order 1 − 1/n of X (0) and Nn be the point process on (0, ∞) × [0, ∞) endowed with the boundedness B of sets separated from the axis x0 = 0, defined by Nn =

n  i=1

(0)

δ X (0) i cn

X

(1)



.

, b(ci

n)

(0)

(n)

Let X(n:k) be the increasing order statistics of X0 , . . . , X0 the concomitant of the i-th order statistic, 1 ≤ i ≤ n.

(1)

and X[n:i] , be

w

1. Prove that Nn =⇒ N where N is a Poisson point process on (0, ∞)×[0, ∞) with mean measure μ. Hint: apply Theorem 7.1.21. 2. Give a representation of N in terms of the random variable W in (3.2.6) and (3.2.9). 3. Apply Proposition 7.1.2 to obtain the asymptotic distribution of the con(1) comitant X[n:n] /b(cn ). Extremal index 7.8 Consider the MA(1) process of Example 5.2.10 and Problem 5.6. Show that # 1∨φα if φ ≥ 0 , α + ϑ = 1+φ 1 if φ < 0 . 7.9 Let ρ ∈ (−1, 1) and {Zj , j ∈ Z} be a sequence of i.i.d. regularly varying random variables with tail index α > 0. Consider the stationary AR(1) prod ∞ j cess defined by the recursion Xj+1 = ρXj + Zj+1 and X0 = j=0 ρ Zj . α Compute the spectral tail process and prove that ϑ = 1 − |ρ| and # 1 − ρα if ρ ≥ 0 , + ϑ = 1 − |ρ|2α if ρ < 0 . 7.10 Let {X j , j ∈ Z} be a regularly varying time series satisfying the assumptions of Corollary 7.5.6. We choose the Euclidean norm on Rd . Let Θ be the spectral tail process. For a fixed u ∈ Rd , assume that P(u, Θ0  > 0) > 0. Define the scaling sequences cn and un such that limn→∞ nP(|X 0 | > cn ) = nP(X 0 , u > un ) = 1. 1. Prove that limn→∞ nP(X 0 , u > cn ) = E[u, Θ0 α + ] > 0 and un ∼ E1/α [u, Θ0 α ]c . + n  α 2. Prove that ϑ j∈Z E[Qj , uα + ] = E[u, Θ0 + ].

7.7 Bibliographical notes

203

3. Prove that the extremal index of {u, X j , j ∈ Z} is given by   α E[supj≥0 Θj , uα E supj∈Z Qj , uα + + − supj≥1 Θj , u+ ] = . ϑ(u) =  α E[Θ0 , uα +] j∈Z E[Qj , u+ ] Autocovariances 7.11 Assume that {Xj , j ∈ Z} is a univariate stationary regularly varying time series. Let cn be such that limn→∞ nP(|X0 | > cn ) = 1, AC(rn , cn ) holds and the point process of clusters Nn converges to the limit N  in (7.3.4). For h ≥ 1, define Nh,n =

n−h 

δc−2 . 2 n (X ,Xi Xi+1 ,...,Xi Xi+h ) i

i=1

Prove that w

Nh,n =⇒

∞  

δPi ((Q(i) )2 ,Q(i) Q(i)

i=1 j∈Z

j

j

(i) (i) j+1 ,...,Qj Qj+h )

,

with {Pi , i ≥ 1} the points of a PPP(ϑνα/2 ) and Q(i) , i ≥ 1, i.i.d. copies of Q, independent of {Pi , i ≥ 1}. Hint: mimic the proof of Corollary 7.3.4 and use Problem 6.8. 7.12 Let the univariate stationary time series X be regularly varying with tail index α > 0 and satisfy Definition 3.2.1. Assume that there exists sequences cn and rn such that rn = o(n) and nP(|X0 | > cn ) → 1, AC(rn , cn ) holds and r  rn

n n n   −f (X n,i ) −f (X n,i ) − E lim E e e =0, n→∞

i=1

i=1

for all bounded functions f with bounded support in R \ {0} × R   continuous i Xi+1 . Let μ be as in Definition 3.2.1 and (W0 , W1 ) be as and X n,i = X , cn b(cn )

in (3.2.8). Prove that Nn converges weakly on R \ {0} × R to a PPP(μ) which can be represented as ∞  N= δ(Pi W (i) ,P 1+κ W (i) ) , i=1 (i)

0

i

1

(i)

where (W0 , W1 ), i ≥ 1, are independent copies of W .

7.7 Bibliographical notes The classical monograph [Res87] deals with i.i.d. sequences and uses the more usual vague convergence on the set [−∞, ∞]d \ {0}. Section 7.1 is essentially

204

7 Point process convergence

based on chapters 1–4 of the monograph [Kal17], complemented by [BP19] and [Zha16]. In particular, the last necessary and sufficient condition for weak convergence of random measures in Theorem 7.1.17 is [BP19, Proposition 4.6]. The study of the convergence of the point process of exceedences has a long story that we will not recall exhaustively here. In the earliest references, the point process of exceedences refers to the point process Nn (· × (1, ∞)) on [0, ∞) which counts the number of exceedences but does not record their values. The convergence of the point process N  is called complete Poisson convergence in [LLR83]. The convergence to a simple Poisson point process was established in [LLR83, Theorem 5.2.1] under two conditions. The first one is a mixing condition tailored to extremes, called Condition D, which is implied by strong mixing and holds for Gaussian stationary processes whose covariance decays sufficiently fast to zero. See [LLR83, Section 4.4]. The second one is the anticlustering Condition D which is related to extremal independence and prevents the clustering of points in the limiting point process. It is also satisfied by Gaussian processes under the same condition. In [HHL88], Condition D is dropped which results in clustering in the limiting point process. The ideas developed in this chapter for the study of the convergence of the point process of exceedences come from the extremely deep paper [DH95]. The sequence Q appears therein in relation to the representation of infinitely divisible point processes. Some results of that paper were translated into the language of the tail process by [BS09] which also introduced the candidate extremal index and proved that it is equal to the extremal index under the anticlustering condition AC(rn , cn ) and (7.3.14) (without the time component). The convergence of the functional point process of exceedences Nn under (7.3.14) was established partially in [BKS12] and the proof was completed in [BT16]. [Kri16] proved that Condition (7.3.14) holds for an alphamixing time series. These ideas were further developed by [BPS18] to prove the convergence of the point process of clusters under conditions AC(rn , cn ) and (7.3.3). The ideas from [DH95] are extended in [DM98b] to point processes of lagged vectors in order to study the limiting behavior of sample covariances. This is the approach which is used in Problem 7.11. [BDM99] applies this general approach to bilinear processes and [BDM02b] to GARCH processes. There are many references which establish the convergence of the functional point process of exceedences Nn for specific models by ad-hoc techniques. See, e.g. [DR85a, DR85b] (infinite order moving averages), [DR96] (bilinear processes), [Kul06] (moving averages with random coefficients), [DM01] and [KS12] (stochastic volatility models, possibly with long memory). The concept of extremal index dates back to 1950s. A summary is given in [LLR83, Section 3.7]. Results under various mixing conditions are given to [O’B87] and [Roo88]. The idea of obtaining the extremal index as the limit

7.7 Bibliographical notes

205

of extremal indices of tail equivalent m-dependent approximation seems to be due to [CHM91] which used condition (7.4.1) and other involved conditions which are replaced here efficiently by the use of the tail process. This technique has been used by [Til18] for shot-noise type processes.

8 Convergence to stable and extremal processes

In this chapter, we consider a real-valued stationary time series {Xj }. We will apply the point process convergence results of Chapter 7 to obtain weak convergence of partial maxima and partial sum processes, the latter when the tail index α ∈ (0, 2). That is we are interested in the weak convergence of the partial maxima and partial sum processes max Xi , t ≥ 0 , Mn (t) = a−1 n 1≤i≤[nt]

Sn (t) =

a−1 n

[nt] 

(Xi − bn ) , t ≥ 0 ,

(8.0.1) (8.0.2)

i=1

where an and bn are suitable normalizing sequences. Denote simply Mn = Mn (1) and Sn = Sn (1). Weak convergence of Mn and Sn follows from the point process convergence by a relatively straightforward application of the continuous mapping theorem (with some care needed to treat the centering term for convergence to a stable law when α ≥ 1). Weak convergence in the space D of the partial maximum process will be established in the M1 topology. The partial sum process may not converge in Skorokhod’s topologies introduced in Appendix C because it can happen that the running maximum of the partial sum process does not converge to the maximum of the limiting process. There are different methods to prove the results of this chapter. Using point process convergence is the most efficient for functional convergence. The running assumption of this chapter will be that the functional point process of exceedences Nn converges weakly to a marked Poisson point process. Assumption 8.0.1 The functional point process of exceedences

© Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 8

207

208

8 Convergence to stable and extremal processes

Nn =

∞  i=1

δ i , Xi

n an

of the stationary time series {Xj , j ∈ Z} converges weakly to a point process N  on [0, ∞) × (R \ {0}) which can be expressed as N =

∞  

δΓi ,Pi Q(i) ,

i=1 j∈Z

j

∞ where i=1 δΓi ,Pi is a Poisson point process on [0, ∞) × (R \ {0}) with mean measure ϑLeb ⊗ να and the sequences Q(i) , i ≥ 1, are i.i.d. copies of a sequence Q such that P(Q∗ = 1) = 1, independent of {Γi , Pi , i ≥ 1}.

The mean measure λ of N  is given by  ∞ ∞ λ(f ) = ϑ E[f (t, rQj )]αr−α−1 drdt , j∈Z

0

(8.0.3)

0

and its Laplace functional is given by   1  ∞     E 1 − e− j∈Z f (s,uQj ) αu−α−1 du ds , log E[e−N (f ) ] = −ϑ s=0

u=0

(8.0.4) for a non-negative f with bounded support in [0, ∞) × (R \ {0}); cf. (7.3.8)– (7.3.9). Choosing f (t, x) = 1[0,1] (t)1{|x| > 1} yields ∞ > λ(f ) = ϑ

 j∈Z

0



P(r|Qj | > 1)αr−α−1 dr = ϑ



E[|Qj |α ] .

j∈Z

This shows that Assumption 8.0.1 implies that  E[|Qj |α ] < ∞ .

(8.0.5)

j∈Z

We already know that this property holds if the sequence Q is the conditional spectral tail process of the time series X, but we have not made this assumption yet. In Section 8.1 we will study the functional convergence of the sample maxima to an extremal process. In Section 8.2 we will recall the definition of stable

8.1 Convergence of the partial maxima

209

laws and processes and provide the series representation which will appear as limits of the partial sum process in the results of Section 8.3. These limits will be first expressed in terms of the sequence Q appearing in Assumption 8.0.1. If we assume moreover that Q is the conditional spectral tail process of X, we will be able to apply the identities of Chapter 5 and Chapter 6 to obtain expressions in terms of the spectral tail process Θ and the forward spectral tail process Y .

8.1 Convergence of the partial maxima In this section, we assume that {Xj , j ∈ Z} is non-negative time series and we let an = F ← (1 − 1/n), where F is the distribution function of X0 . The renormalized partial maximum process Mn is defined by Mn (t) = a−1 max Xi , t ∈ [0, ∞) . n 1≤i≤[nt]

Define also the extremal process M by M (t) = sup Pi , t ∈ [0, ∞) , i:Γi ≤t



where i≥1 δΓi ,Pi is as in Assumption 8.0.1. The finite-dimensional distributions of the process M are given for 0 = t0 < t1 ≤ · · · < tk and x1 ≤ · · · ≤ xk by P(Mn (t1 ) ≤ x1 , . . . , Mn (tk ) ≤ xk ) = e−ϑ

k

−α q=1 (tq −tq−1 )xq

Theorem 8.1.1 If Assumption 8.0.1 holds, then Mn D([0, ∞)) endowed with the M1 topology.

.

w

=⇒ M in

Theorem 8.1.1 implies that

−α −1 lim P an max Xi ≤ x = e−ϑx , n→∞

1≤i≤n

which means that ϑ is the true extremal index of the time series X, introduced in Definition 7.5.1. This was already shown in Corollary 7.5.5. Proof ( Proof of Theorem 8.1.1). Fix t ∈ [0, ∞) and x > 0. Applying Assumption 8.0.1 and noting that [0, t] × (x, ∞) is a continuity set of λ, we obtain

210

8 Convergence to stable and extremal processes



lim P a−1 max X ≤ x = lim P (Nn ([0, t] × (x, ∞)) = 0) i n 1≤i≤[nt]

n→∞

n→∞

= P (N  ([0, t] × (x, ∞)) = 0) ⎞ ⎛ ∞    (i) 1{Γi ≤ t} 1 Pi Qj > x = 0⎠ = P⎝  =P

i=1 ∞ 

j∈Z



1{Γi ≤ t}1{Pi > x} = 0

.

i=1

  (i) The last equality follows from Q∗ = 1 whence j∈Z 1 Pi Qj > x = 0 if ∞ and only if 1{Pi > x} = 0. Since i=1 δTi ,Pi has mean measure Leb ⊗ ϑνα , we obtain that

−α −1 lim P an max Xi ≤ x = e−ϑtx . n→∞

1≤i≤[nt]

The convergence of finite-dimensional distributions is obtained similarly and by the independent increments of the Poisson point process. This yields for 0 = t0 < t1 ≤ · · · < tk and x1 ≤ · · · ≤ xk , lim P(Mn (t1 ) ≤ x1 , . . . , Mn (tk ) ≤ xk )

n→∞

= lim P (Nn ([0, t1 ] × (x1 , ∞)) = 0, . . . , Nn ((tk−1 , tk ] × (xk , ∞)) = 0) n→∞ ∞   =P 1{tq−1 ≤ Γi ≤ tq }1{Pi > x} = 0, 1 ≤ q ≤ k i=1

=

k  q=1

= e−ϑ



P k

∞ 

 1{tq−1 ≤ Γi ≤ tq }1{Pi > x} = 0

i=1

−α q=1 (tq −tq−1 )xq

.

Finally, since the processes Mn are non-decreasing and stochastically continuous, the convergence of finite-dimensional distribution implies weak conver gence in the M1 topology by Corollary C.3.8.

8.2 Representations of stable laws and processes We recall that a random variable Y has a stable distribution S(α, σ, β, c), α ∈ (0, 2), σ > 0, β ∈ [−1, 1] and c ∈ R if its characteristic function can be expressed as E[eizY ] = eψ(z) with

8.2 Representations of stable laws and processes

 −σ α |z|α {1 − iβsgn(z) tan(πα/2)} + icz , ψ(z) = −σ|z|{1 + i π2 βsgn(z) log(|z|)} + icz ,

if α = 1 , if α = 1 .

We will need the following identities:  ∞ (eir − 1)r−1−α dr = Γ (−α)e−iπα/2 , (0 < α < 1) , 0 ∞ (eir − 1 − ir)r−1−α dr = Γ (−α)e−iπα/2 , (1 < α < 2) , 0  ∞ (eir − 1 − ir1(0,1] (r))r−2 dr = −π/2 + ic0 , , 0

with c0 =

∞ 0

211

(8.2.1)

(8.2.2a) (8.2.2b) (8.2.2c)

r−2 (sin(r) − r1{r≤1} )dr.

A L´evy α-stable process {Λ(t), t ∈ R+ } is a stochastically continuous D([0, ∞))-valued process with independent and stationary stable increments. Its law is entirely characterized by the distribution of Λ(1). If the characteristic function of Λ(1) is given by the formula above, it will be referred to as an (α, σ, β, c)-stable L´evy process. We now give series representations of stable distributions and stable L´evy processes. These representations use the point process N  appearing in Assumption 8.0.1, hence it will be convenient to express the limit of the partial sum process of dependent sequences.

8.2.1 The case α ∈ (0, 1) ∞ If α < 1, the series i=1 Pi is almost surely summable. Indeed, the number of indices i such that Pi > 1 is a Poisson random variable with mean ϑ, hence it is almost surely finite and ∞   1  ϑα . E Pi 1{Pi ≤1} = ϑ x · αx−α−1 dx = 1−α 0 i=1 Moreover, the concavity of the function x → xα on R+ and Assumption 8.0.1 yield ⎡⎛ ⎞α ⎤ ∞   E ⎣⎝ |Qj |⎠ ⎦ ≤ E[|Qj |α ] < ∞ . (8.2.3) 

j=1

j∈Z

Thus the series W = j∈Z Qj is summable with finite α-th moment. Denote  (i) Wi = j∈Z Qj . The random variables Wi , i ≥ 1 are i.i.d. with E[|Wi |α ] < ∞. We define the process Λ on [0, 1] by  t  ∞ ∞  xN  (ds, dx) = 1{Ti ≤t} Pi Wi . (8.2.4) Λ(t) = s=0

x=−∞

i=1

212

8 Convergence to stable and extremal processes

Proposition 8.2.1 Assume that P(|W | > 0) = 1. Then Λ is an (α, σ, β, 0)-stable L´evy process with σ α = ϑE[|W |α ]Γ (1 − α) cos(πα/2) and β = (E[W+α ] − E[W−α ])/E[|W |α ].

Proof. The identity (8.0.4) implies, for t ∈ [0, 1] and z > 0,  ∞ log E[eizΛ(t) ] = ϑt E[eizuW − 1]αu−α−1 du 0  ∞ (eiv − 1)αv −α−1 dv = ϑtz α E[W+α ] 0  ∞ α α (e−iv − 1)αv −α−1 dv + ϑtz E[W− ] 0

= −tz α σ α {1 − iβ tan(πα/2)} , with σ and β as in the statement of the proposition and we have used the identity (8.2.2a) in the last line. This proves that the marginal distribution of Λ(t) is as requested for each t. The weak finite-dimensional distributions are obtained similarly.  We also have an approximation result for the truncated process. For > 0, define  t  xN  (ds, dx) . R (t) = |x|≤

s=0

Lemma 8.2.2 R converges uniformly in probability to zero: for all δ > 0,

lim P sup |R (t)| > δ = 0 . →0

0≤t≤1

Proof. By definition, we have, for all t ∈ [0, 1], |R (t)| ≤

∞  i=1

Pi

 j∈Z

(i)

|Qj |1{Pi |Q(i) |≤} . j

 ∞ (i) ∗  The series j∈Z |Qj | are convergent. Denote Wi = i=1 Pi and Wi =  (i)  ∗  j∈Z |Qj |1{Pi |Q(i) |≤} . Then Wi ≤ Wi and for all i, Wi converges to j

zero surely as → 0. By bounded convergence, this implies that ∞ almost  P W converges to zero almost surely as → 0.  i=1 i i

8.2 Representations of stable laws and processes

213

8.2.2 The case α ∈ [1, 2) Let λ be the mean measure of N  , i.e.  ∞  ∞ E[δt,rQj ]αr−α−1 drdt λ=ϑ j∈Z

and for > 0, define  t  s=0 |x|> t s=0 |x|>

Λ (t) =

0

0

xN  (ds, dx) − 

xN (ds, dx) −



t

xλ(ds, dx) ,

|x|> s=0 t s=0  1, we have  t   ∞ xλ(ds, dx) = ϑt E[Qj 1{r|Qj | > }]αr−α dr s=0

|x|>

0

j∈Z

αϑt 

α E[Qj ] 1−α , = α−1

(8.2.6)

j∈Z

and for α = 1,  t  xλ(ds, dx) = ϑt s=0

 1, the summability condition E be assumed.

 j∈Z

(8.2.7) |Qj |

α 

< ∞ must

 α  < ∞ and Proposition 8.2.3 Assume that α ∈ (1, 2), E |Q | j j∈Z  P(| j∈Z Qj | > 0) = 1. Then, as → 0, Λ converges weakly in the uniform topology to an S(α, σ, β, 0)-stable L´evy process Λ with α     σ α = ϑΓ (1 − α) cos(πα/2)E  Qj  , j∈Z   α  E j∈Z Qj α  .  β= (8.2.8)   E  j∈Z Qj 

214

8 Convergence to stable and extremal processes

Proof. Let ν be the measure on R \ {0} defined by  ∞   ν = E δr j∈Z Qj 1{r|Qj |>} αr−α−1 dr .

(8.2.9)

0

Since Q∗ = 1 and lim|j|→∞ |Qj | = 0 almost surely, the series inside the expectation almost surely has a finite number  of terms α  and the measure ν < ∞ by assumption is boundedly finite on R \ {0}. Since E j∈Z |Qj |  and we have defined W = j∈Z Qj , we also have  E[W+α ]

=  E[W−α ] =

0



0 ∞

P (uW > 1) αu−α−1 du ,

P (uW < −1) αu−α−1 du .

It follows by Lebesgue’s dominated convergence theorem that ν converges (as → 0) vaguely# on R \ {0} to the measure Aνα,pW with A = E[|W |α ] , pW = A−1 E[(W )α +] , να,pW (dx) = α{(1 − pW )(−x)−α−1 1{x0} }dx . For α ∈ (1, 2), we have |eizu − 1 − izu| ≤ |z|2 ∨ 1, z, u ∈ R. Thus Corollary 2.1.10 and the identity (8.2.2b) yield  ∞  log E[eizΛ (t) ] = ϑt E[eiz j∈Z rQj 1{r|Qj |>} − 1]αr−α−1 dr 0  ∞ − izϑt rE[Qj 1{r|Qj |>} ]αr−α−1 dr  = ϑt

j∈Z ∞

{eizu − 1 − izu}ν (du)

−∞  ∞

→ ϑtA

0

−∞ α

{eizu − 1 − izu}να,pQ (du)

= −ϑt|z| AΓ (1 − α) cos(πα/2){1 − i sign(z)β tan(πα/2)} . The proof is concluded by applying Theorem E.4.4.



If the sequence Q is the conditional spectral tail process a time series X with spectral tail process Θ, using the results of Chapters 5 and 6, we can give other expressions of the parameters of the limiting stable law. If α = 1, Theorem 5.6.8 and Example 5.6.9 yield

8.2 Representations of stable laws and processes

α  α ⎤ ⎡     ∞ ∞     α    ⎣ E[|W | ] = E  Θj  −  Θj  ⎦ ,  j=0   j=1    α   α ∞ ∞ E − j=0 Θj j=1 Θj α  α   β= .  ∞  ∞   E  j=0 Θj  −  j=1 Θj 

215

(8.2.10a)

(8.2.10b)

The centering in the case α = 1 is more difficult to handle. We introduce a condition on the sequence Q, which is not necessary but allows for explicit expressions. Proposition 8.2.4 Assume that α = 1, P(| 

 j∈Z

Qj | > 0) = 1 and

#$ ! " E |Qj | log |Qj |−1 < ∞ .

(8.2.11)

j∈Z

Then, as → 0, Λ converges weakly in the uniform topology to an S(1, σ, β, c)-stable L´evy process Λ with  π   σ=ϑ E  Qj  , j∈Z 2   E Q j∈Z j  , β =   E  j∈Z Qj    c = ϑc0 E Qj − ϑm1 , j∈Z

c0 given in (8.2.2c) and  ⎤     ⎣⎝ m1 = E[Qj log(|Q−1 Qj ⎠ log  Qj ⎦ . j |)] + E  j∈Z  j∈Z j∈Z 

⎡⎛





(8.2.12)

  Proof. Write W = j∈Z Qj and W ∗ = j∈Z |Qj |. The assumption (8.2.11) implies that E[W ∗ log W ∗ ] < ∞ since W ∗ ≤ log(|Qj |−1 ) for large j. Moreover, this implies that E[|W log(|W |)|] < ∞. Indeed, if |W | ≤ 1 then |W log(|W |)| ≤ 1 and if |W | ≥ 1, then 0 ≤ log(|W |) ≤ log(W ∗ ).  For > 0, set W  (y) = j∈Z Qj 1{y|Qj | > }. Starting as in the proof of Proposition 8.2.3 and using the identity (8.2.7), we have

216

8 Convergence to stable and extremal processes

log E[eizΛ (t) ] (8.2.13)  ∞  = ϑt E[eizyW (y) − 1]y −2 dy − iϑtzE[W ] log( −1 ) 0 ∞  E[eizyW (y) − 1 − izyW  (y)1{y|W  (y)| ≤ 1}]y −2 dy = ϑt 0  ∞ E[W  (y)1{y|W  (y)| ≤ 1}]y −1 dy − iϑtzE[W ] log( −1 ) + iϑtz 0  ∞ = ϑt (eizu − 1 − izu1{|u| ≤ 1})ν (du) (8.2.14) −∞  ∞

+ iϑzt E[W  (y)1{y|W  (y)| ≤ 1}]y −1 dy + E[W ] log( ) . 0

Since ν converges vaguely# to Aν1,pW , applying (8.2.2c) we obtain that the term in (8.2.14) converges to −ϑt|z| π2 A{1 + i π2 sign(z)β log z} + itzβc0 with β = 2pW − 1 as claimed. Since Q∗ = 1, we have W  (y) = 0 if y ≤ . Thus  ∞ dy − E[W ] log( ) − E[W  (y)1{y|W  (y)| ≤ 1}] y 0  1  1 dy dy = + E[W − W  (y)] E[W  (y)1{y|W  (y)| > 1}] y y  0  ∞ dy − E[yW  (y)1{ < y|W  (y)| ≤ 1}] 2 = I + II − III . y 1 The summability condition (8.2.11) justifies the interchange of integrals and series in I:  1  dy I= =− E[Qj 1{y|Qj | ≤ }] E[Qj log(|Qj | ∨ )] y j∈Z  j∈Z  →− E[Qj log(|Qj |)] . j∈Z

   ∗ ∗ Since W ∗ = j∈Z |Qj | and |W (y)|1{y|W (y)| > 1} ≤ W 1{yW > 1} −1 which is integrable on [0, 1] with respect to the measure y dy on (0, 1] by (8.2.11), we obtain by dominated convergence  1 dy = E[W log+ (|W |)] . E[W 1{y|W | > 1}] II → y 0 Since E[yW  (y)1{ < y|W  (y)| ≤ 1}] ≤ 1 and lim E[yW  (y)1{ < y|W  (y)| ≤ 1}] = E[yW 1{y|W | ≤ 1}] .

→0

8.3 Convergence of the partial sum process

217

Thus, by dominated convergence,  ∞ dy III → yE[S1{y|W | ≤ 1}] 2 = E[W log+ (1/|W |)] . y 1 Since log+ (a) − log+ (1/a) = log(a) for all a > 0, we have proved (8.2.12).  Remark 8.2.5 If the time series X is regularly varying and Q is its conditional spectral tail process, then, by the identities (5.6.12) and Problem 5.33, we have β=

2pX − 1  , ϑE[| j∈Z Qj |]

c = c0 (2pX − 1) − ϑm1 , ⎞ ⎞ ⎤ ⎡ ⎛ ⎛ ∞ ∞   ∞ ∞       m1 = E ⎣ Θj log ⎝ Θj ⎠ − Θj log ⎝ Θj ⎠⎦ .  j=0   j=1  j=0 j=1 It must be noted that β = 2pX − 1 unless the time series X is extremally independent. ⊕

8.3 Convergence of the partial sum process In this section we will obtain the convergence of the partial sum process under Assumption 8.0.1 and the assumption that X0 is regularly varying.

8.3.1 Case 0 < α < 1 In that case, the centering bn of the partial sum process in (8.0.2) can be set equal to zero. We define the partial sum process Sn by Sn (t) = a−1 n

[nt] 

Xi

i=1

with an such that limn→∞ nP(|X0 | > an ) = 1. In that case, under Assumption 8.0.1, the series W = summable and  E[|W |α ] ≤ E[|Qj |α ] < ∞ . j∈Z

 j∈Z

Qj is absolutely

218

8 Convergence to stable and extremal processes

Theorem 8.3.1 Let Assumption 8.0.1 hold, X0 be regularly varying with tail index α ∈ (0, 1), P(|W | > 0) = 1. Let an be such that limn→∞ nP(|X0 | > an ) = 1 and set bn = 0. Then the finite dimensional distributions of the partial sum process Sn converge weakly to those of the (α, σ, β, 0)-stable L´evy process defined in (8.2.4), with σ α = θΓ (1 − α) cos(πα/2)E[|W |α ] and β = (E[W+α ] − E[W−α ])/E[|W |α ].

Proof. Define 

t

Sn, (t) = s=0

 |x|≤

xNn (ds, dx) ,

(8.3.1)

t ∞ t  Λ(t) = s=0 x=−∞ xN  (ds, dx) and R (t) = s=0 |x|≤ xN  (ds, dx). We know that R converges in probability uniformly to zero. Moreover, by stationarity and the moment bound (1.4.7a), we have,  n E sup |Sn, (t)| ≤ E[|X0 |1{|X0 |≤an } ] ≤ cst · n F¯|X0 | (an ) . an 0≤t≤1 By regular variation, this yields  lim lim sup E sup |Sn, (t)| ≤ lim cst · 1−α = 0 . →0 n→∞

→0

0≤t≤1

(8.3.2)

Since N  has almost surely a finite number of points in [0, 1] × ([−∞, − ] ∪ [ , +∞]), we can apply Corollary 7.1.3 and the continuous mapping Theorem 7.1.19 to obtain  t  fi.di. xN  (ds, dx) . (8.3.3) Sn (t) − Sn, (t) −→ Λ(t) − R (t) = s=0

|x|>

Proposition 8.2.1 and Lemma 8.2.2, (8.3.2), (8.3.3) and the triangular argument Lemma A.2.10 yield the result. 

8.3.2 Case α ∈ [1, 2) In this case, a suitable centering is needed. In the case α ∈ (1, 2), the expectation of each Xi is finite so the centering can be chosen as bn = E[X1 ]. Therefore we define Sn (t) = a−1 n

[nt]  i=1

(Xi − E[X1 ]) .

(8.3.4)

8.3 Convergence of the partial sum process

219

In the case α = 1, the expectation may be infinite, so the centering is defined as bn = E[X1 1{|X1 | ≤ an }] and Sn (t) = a−1 n

[nt] 

(Xi − E[X1 1{|X1 | ≤ an }]) .

(8.3.5)

i=1

As in the case α < 1, we use a truncation argument. For > 0, define  [nt] a−1 {Xi 1{|Xi | > an } − E[X0 1{|X0 |>an } ]} if α = 1 , n S¯n, (t) = i=1 [nt] {X 1{|X | > a

} − E[X 1 ]} if α = 1 . a−1 i i n 0 {an  an ) = 1 and all δ > 0,  n   Xi 1{|Xi | ≤ an } lim lim sup P  →0 n→∞  i=1     −E[X0 1{|X0 | ≤ an }] > an δ = 0 . (ANSJ(an )) 

This condition is satisfied for i.i.d. or m-dependent regularly varying sequences with tail index α ∈ (0, 2). See Problem 8.1. It must be checked by appropriate means for each time series model. This assumption has very important consequences. The first one is that we can recover the identity (5.6.12c). Lemma 8.3.3 Assume that X0 is regularly varying with tail index α ∈ [1, 2) and extremal skewness pX . Let an be such that limn→∞ nP(|X0 |>an ) = 1. Let Λ be the process defined in (8.2.5). If Assumption 8.0.1 holds and (8.2.11) fi.di. is satisfied if α = 1, then S¯n, −→ Λ if and only if ϑ



α

E[Qj ] = 2pX − 1 .

j∈Z

If Assumption 8.3.2 is satisfied, then (8.3.7) holds. Proof. We start with the case α > 1. Then,

(8.3.7)

220

8 Convergence to stable and extremal processes

 S¯n, (t) =

t

s=0

 |x|>

xNn (ds, dx) − t(α − 1)−1 αϑ

α

E[Qj ] 1−α

j∈Z



+ t(α − 1)−1 α ⎝ϑ





α E[Qj ] − 2pX + 1⎠ 1−α

j∈Z





+ t (α − 1)−1 α(2pX − 1) 1−α −

 [nt] −1 an E[X0 1{|X0 | > an }] t

= Λn, (t) + tζn ( ) + tςn ( ) . Since N  has an almost surely finite number of points on [0, 1] × [ , ∞], Corollary 7.1.3 and the continuous mapping Theorem 7.1.19, (8.2.5), and d

(8.2.6) imply that Λn, (t) −→ Λ (t). By Proposition 8.2.3, we know that d

Λ (t) −→ Λ(t) as → 0 with Λ the α-stable process defined therein. By w regular variation, limn→∞ ςn ( ) = 0 for all > 0. Thus S¯n, =⇒ Λ if and only if (8.3.7) holds. Let us now prove that Assumption 8.3.2 implies (8.3.7). If the equality (8.3.7) does not hold, for every A > 0, we have lim lim sup P(|S¯n, | > A) = 1 .

→0 n→∞

Write S%n, = a−1 n

n 

{Xi 1{|Xi | ≤ an } − E[X0 1{|X0 |≤an } ]} .

i=1

Then for ∈ (0, 0 ], S¯n, = S¯n,0 + S%n,0 − S%n, and (ANSJ(an )) yields that for every δ > 0, there exists 0 such that for all ∈ (0, 0 ], lim sup P(|S%n, | > 1) ≤ δ . n→∞

Thus, for A > 2 and ≤ 0 , lim sup P(|S¯n, | > A) n→∞

≤ lim sup P(|S¯n,0 | > A − 2) + 2δ n→∞ ⎞ ⎛ 

α E[Qj ] − 2pX + 1) 1−α > A − 2⎠ + 2δ . = P ⎝Λ0 (1) + (α − 1)−1 α(ϑ 0 j∈Z

Note that the right-hand side depends on the fixed 0 , thus for A large enough, lim sup P(|S¯n, | > A) ≤ 3δ . n→∞

This is a contradiction, thus (8.3.7) holds.

8.3 Convergence of the partial sum process

221

In the case α = 1, we have  1   ¯ Sn, = xNn (ds, dx) − ϑ E[Qj ] log( −1 ) s=0



+ ⎝ϑ

|x|>





j∈Z

E[Qj ] − 2pX + 1⎠ log( −1 )

j∈Z

+ (2pX − 1) log( −1 ) − na−1 n E[X0 1{an < |X0 | ≤ an }] = Λn, (1) + ζn ( ) + ςn ( ) . The rest of the proof is exactly the same as previously, with (8.2.7) replacing (8.2.6) and Proposition 8.2.4 replacing Proposition 8.2.3.  Remark 8.3.4 The identity (8.3.7) is (5.6.12c) which was obtained in Chapter 5 under the assumption that X is regularly varying and Q is its conditional spectral tail process. Here we have not made such assumptions, rather we assumed the convergence of the point process of exceedences and the asymptotic negligibility condition (ANSJ(an )). Neither set of conditions imply the other, but they are clearly closely related. ⊕ Theorem 8.3.5 Let Assumptions 8.0.1 and 8.3.2 hold, X0 be regularlyvarying with tail index α ∈ [1, 2) and extremal  skewness pX and P(| j∈Z Qj | > 0) = 1. If α > 1, assume that E[( j∈Z |Qj |)α ] < ∞; if α = 1, assume that (8.2.11) holds. Let an be such that limn→∞ nP(|X0 | > an ) = 1. Then the finite-dimensional distributions of the partial sum process Sn converge weakly to those of an (α, σ, β, c)-stable L´evy process Λ with σ, β, and c as in Propositions 8.2.3 and 8.2.4.

Proof. Let S¯n, (t) be as in (8.3.6). By the proof of Lemma 8.3.3, we know that S¯n, (t) converges weakly to Λ (t) for each t. Convergence of the finite dimensional distribution is obtained by the Wold device (Theorem E.2.3) through similar arguments. Proposition 8.2.3 yields the convergence of Λ to the L´evy process Λ in the uniform topology. We conclude by applying Assumption 8.3.2 and the triangular argument Lemma A.2.10. 

8.3.3 Convergence in the J1 or M1 topology In order to prove the weak convergence of the partial sum process in the J1 or M1 topologies, we need a preliminary lemma on the continuity of the summation functional. We cannot use here Corollary 7.1.3 since D endowed with either the J1 or M1 topology is not a topological vector space.

222

8 Convergence to stable and extremal processes

Let N ([0 , ∞)×R\{0}) be the set of point measures ν on [0 , ∞)×R\{0}. Let A ⊂ R \ {0} be a measurable set, separated from zero. Define the summation functional SA : N ([0 , ∞) × R \ {0}) → D([0, ∞)) by SA (ν)(t) =

∞ 

xi 1A (xi )1{ti ≤ t} .

i=1

We are interested in the continuity of SA with respect to either the J1 or the M1 topology. For this we must restrict its domain to suitable subsets of N ([0 , ∞) × R \ {0}). • Let NJ be the set of boundedly finite simple point measures ν on [0 , ∞) × R \ {0} (which means that ν([0, a] × A) < ∞ for all measurable subsets A of R \ {0} separated from 0) such that ν({0} × R \ {0}) = 0 and ν({t} × R \ {0}) ≤ 1. • Let NM be the set of boundedly finite simple point measures ν on [0 , ∞) × R \ {0} which can be expressed as ν=

∞  

δti ,xi,j ,

i=1 j∈Z

and such that for all i ≥ 1, j, j  ∈ Z, ti > 0 and xi,j xi,j  ≥ 0. Obviously, NJ ⊂ NM and a measure in NM can have several points with the same time coordinates, but these points must have all second coordinates of the same sign. Lemma 8.3.6 Let A ⊂ R \ {0} be a measurable set separated from 0. The summation functional SA • restricted to NJ is continuous with respect to the J1 topology; • restricted to NM is continuous with respect to the M1 topology. Proof ( Proof of the continuity with respect to the J1 topology). Fix a measurable set A ⊂ R \ {0} separated from zero. Since addition is continuous with respect to the J1 topology at a pair of functions with no common jumps by Proposition C.2.5, it suffices to prove that for ν0 = δt0 ,x0 with v#

t0 > 0 and x0 ∈ A, if {νn } is a sequence in NJ such that νn −→ ν0 , J

then SA (νn ) →1 SA (ν0 ). By definition of the J1 topology n D([0, ∞)), we can assume without loss of generality that t0 ∈ (0, 1). The result is a trivial consequence of the convergence of points Proposition 7.1.2. Indeed, for large enough n, νn also has a single point (tn , xn ) in [0, 1] × A and tn → t0 and xn → x0 . Thus SA (νn )(t) = xn 1{tn ≤ t} and SA (ν0 )(t) = x0 1{t0 ≤ t} J

therefore S(νn ) →1 S(ν0 ); see Example C.2.2.



8.3 Convergence of the partial sum process

223

Proof ( Proof of the continuity with respect to the M1 topology). Fix a measurable set A ⊂ R \ {0} separated from zero. Since addition is continuous with respect to the M1 topology at a pair of functions with no common jumps of opposite signs by Proposition C.3.5, it suffices to consider as in the previous case a point measure with only one time point, ı.e. a point k δ measure of the form ν0 = t ,x i=1 0 0,i with t0 ∈ (0, 1) and x0,i ∈ A for i = 1, . . . , k and we can assume without loss of generality that A ⊂ (0, ∞). v#

If {νn } is a sequence in NM such that νn −→ ν0 , then by the convergence of points Proposition 7.1.2, for n large enough, the points of νn in [0, 1] × A are {(tn,i , xn,i ), 1 ≤ i ≤ k} with tn,i → t0 and xn,i → x0,i . Then, k SA (ν0 ) = y0 1[t0 ,∞) with y0 = x1 + · · · + xk and SA (νn ) = i=1 xn,i 1[ti ,∞) . Set yn = xn,1 + · · · + xn,k . It is readily checked (see Example C.3.3) that  dM1 (SA (νn ), SA (ν0 )) ≤ (∨ki=1 |tn,i − t0 |) ∨ |y0 − yn | → 0. To state the functional convergence, we need to strengthen Assumption 8.3.2 as follows. Assumption 8.3.7 For all δ > 0,  k    Xi 1{|Xi | ≤ an } lim lim sup P max  →0 n→∞ 1≤k≤n  i=1   k    −E[Xi 1{|Xi | ≤ an }]  > an δ = 0 .  i=1

(ANSJU(an ))

Here U stands for uniform. As for (ANSJ(an )), the condition is satisfied for i.i.d. or m-dependent regularly varying sequences with α ∈ (0, 2), but must be checked for each time series model. The condition guarantees that for α ∈ [1, 2), Sn − S¯n, (with S¯n, defined in (8.3.6)), converges locally uniformly to zero in probability. The condition always holds for α ∈ (0, 1). Theorem 8.3.8 Assume that the conditions of Theorems 8.3.1 and 8.3.5 and Assumption 8.3.7 hold. If P(Qj = 0, j = 0) = 1, then Sn converges to Λ with respect to the J1 topology on D([0, 1]). If P(Qj Qj  ≥ 0, j, j  ∈ Z) = 1, then Sn converges to Λ with respect to the M1 topology on D([0, ∞)).

224

8 Convergence to stable and extremal processes

Proof ( Proof in the case 0 < α < 1.). Let d∗ denote dJ1 or dM1 according to the set of assumptions considered. Using the notation of the proof of Theorem 8.3.1, we can write Sn = Sn, + Sn − Sn, and Λ = Λ + R . Since Sn − Sn, = SA (Nn ) and Λ = SA (N  ) with A = (−∞, − ) ∪ ( , ∞), we have by the continuous mapping Theorem 7.1.19 and Lemma 8.3.6 that w Sn − Sn, =⇒ Λ in the d∗ topology. By Theorem E.4.1, we can assume that d∗ (Sn − Sn, , Λ ) → 0 almost surely. By Lemma 8.2.2, R converges uniformly in probability to 0 as → 0. Combining the previous arguments with the bound (8.3.2) yields that d∗ (Sn , Λ) → 0 almost surely.  Proof ( Proof in the case 1 ≤ α < 2.). Still with S¯n, defined in (8.3.6), Λ in (8.2.5) and A = (−∞, − ) ∪ ( , ∞), we have S¯n, − Λ = SA (Nn ) − SA (N  ) + (·)rn ( ) , 



with the expression of rn ( ) depending on α > 1 or α = 1. As in the proof of Lemma 8.3.3, we have rn → 0 (note that (8.3.7) holds under Assumption 8.3.7). Thus, as in the case α ∈ (0, 1), d∗ (S¯n, , Λ ) → 0 almost surely. The negligibility condition (ANSJU(an )) implies lim→0 limn→∞ d∗ (S¯n, ,  Sn ) = 0 in probability. We conclude by applying Proposition 8.2.3. Example 8.3.9 As a first example, we recover the classical result for a sequence {Zj , j ∈ Z} of i.i.d. regularly varying random variables with tail index α ∈ (0, 2) and extremal skewness pZ . We know that Assumption 8.0.1 holds and (ANSJU(an )) holds if α ≥ 1. Since the tail process vanishes except at 0, we also know that the convergence holds in D([0, ∞)) endowed with the J1 topology. More precisely, let aZ,n be the (1 − 1/n) quantile of |Z0 | and let [nt] SZ,n (t) = i=1 (Zi −bn ) with bZ,n = 0 if α < 1, bZ,n = E[Z0 1{|Z0 | ≤ aZ,n }] if w α = 1 and bZ,n = E[Z0 ] if α > 1. Then a−1 Z,n Sn =⇒ ΛZ in D([0, ∞)) endowed with the J1 topology, with ΛZ an (α, σZ , βZ , cZ ) stable L´evy process with α = Γ (1 − α) cos(πα/2) if α = 1, σZ = π2 if α = 1, βZ = 2pZ − 1, cZ = 0 if σZ  α = 1 and cZ = c0 βZ if α = 1. Example 8.3.10 Consider the MA(1) process {Xj , j ∈ Z} from Example 5.2.10: Xj = Zj + φZj−1 , where {Zj } is a sequence of i.i.d. random variables with tail index α and skewness pZ . Since the time series {Xj } is 1-dependent, conditions AC(rn , cn ) and (7.3.14) hold and Assumption 8.0.1 holds with the sequence Q equal to the conditional spectral tail process which we know from Problem 5.6 to be defined as follows (up to shift): if |φ| ≤ 1, Q0 = Θ0 , / {0, 1}; if |φ| > 1, then Q0 = Θ0 {φ−1 b + 1 − b}, Q1 = φΘ0 and Qj = 0, j ∈ −1 Q1 = bΘ0 , Q−1 = φ (1 − b)Θ. If φ > 0, then the Qj are all of the same sign (that of Θ0 ) hence the partial sum process converges in the M1 topology. If φ < 0, Q0 Q1 < 0 or Q−1 Q0 < 0

8.4 Multivariate case

225

and the convergence does not hold in the M1 topology. To see this, let us consider for simplicity the case α ∈ (0, 1), −1 ≤ φ < 0 and Z0 is positive. Then the partial sum process can be expressed as Sn (t) = a−1 n

[nt] 

Xi = (1 + φ)a−1 n

i=1

[nt] 

Zi + φa−1 n (Z0 − Z[nt] ) .

i=1

Choosing an = FZ−1 (1 − 1/n), the finite dimensional distributions of the partial sum process Sn converge to a L´evy stable process which can be expressed as (1 + φ)Λ and Λ(t) =

∞ 

Pi 1{Ti ≤ t} .

i=1

However, the running maximum of the partial sum process converges (in the M1 topology since it is a sequence of non-decreasing functions) to a process which can be expressed as ∗

M (t) = sup (1 + φ) 0≤s≤t

∞ 

Pi 1{Ti ≤ s} + |φ|Pi 1{Ti = s} .

i=1

To understand the form of the limit, note that each jump Zi of the partial sum process is immediately compensated by a negative jump of size |φ|Zi . At the limit, only a portion (1 − |φ|) of the extreme jumps remains. But the running maximum keeps tracks of the full jump. This is illustrated in Figure 8.1 where we can see (with some good will) that the asymptotic jumps will be of a smaller size than the full initial jumps. In  particular, if φ = −1, the partial sum process converges to zero since j∈Z Qj = 0, but X1 + · · · + Xk = Zk − Z−1 thus −1 max a−1 n (X1 + · · · + X[ns] ) = an ( max Zj − Z−1 ) =⇒ M (t) w

0≤s≤t

1≤j≤[nt]

(with respect to the M1 topology), where M (t) is the extremal process associated to the sequence {Zj }, introduced in Section 8.1. This is illustrated in Figure 8.2. In Figure 8.3 we illustrate the fact that simultaneous jumps at different time points can be of different signs, but at one location they must  be of the same sign for M1 convergence to hold.

8.4 Multivariate case We briefly discuss the multivariate case. Let ·, · and |·| denote the scalar product and Euclidean norm on Rd and Sd−1 be the associated unit sphere.

8 Convergence to stable and extremal processes

0.0

0.5

1.0

1.5

226

0.0

0.2

0.4

0.6

0.8

1.0

0

2

4

6

8

10

12

Fig. 8.1. The running maximum of the partial sum process of the MA(1) process Xj = Zj − .6Zj−1 with Z Pareto with tail index α = .7

0.0

0.2

0.4

0.6

0.8

1.0

Fig. 8.2. The running maximum of the partial sum process of an MA(1) process Xj = Zj − Zj−1 with Z symmetric Pareto with tail index α = .7

For α ∈ (0, 2), a d-dimensional random vector X has an α-stable distribution if its characteristic function can be written as log E[ei z ,X ] =  −σ α E[|z, U |α {1 − sign(z, U ) tan(πα/2)}] + iz, µ , −σE[|z, U |α {1 + i π2 sign(z, U ) log(|z, U |)}] + iz, µ ,

if α = 1 , if α = 1 ,

with µ ∈ Rd , σ > 0 and U a random variable on Sd−1 whose distribution is called the spectral measure of the stable vector X. This representation implies

227

−1.5

−1.0

−0.5

0.0

0.5

8.4 Multivariate case

0.0

0.2

0.4

0.6

0.8

1.0

Fig. 8.3. The running maximum of the partial sum process of an MA(1) process Xj = Zj + .6Zj−1 with Z symmetric Pareto with tail index α = .7

that for each z ∈ Rd , z, X is a stable random vector with characteristic function log E[eit z ,X ] = −|t|α σ α (z){1 − i sign(t)β(z) tan(πα/2)} + itz, µ , with σ α (z) = σ α E[|z, U |α ] and β(z) =

E[z, U  α ] , E[|z, U |α ]

if α = 1 and 2 sign(t)β(z) log(|t|)} π

2 + it z, µ − σE[z, U  log(|z, U |)] , π

log E[eit z ,X ] = −|t|σ(z){1+i

if α = 1, with the same σ(z) and β(z). The converse is true if α ≥ 1, but is only true if α < 1 in the strictly stable case (i.e. when the characteristic function has no drift term). Theorem 8.3.1 and Theorem 8.3.5 are easily generalized to the multivariate case if we rewrite Assumption 8.0.1 in a multivariate setting. We will only deal with convergence of finite dimensional distributions since tightness in a multivariate setting is even more delicate to handle. For simplicity, we will only consider convergence at one point and therefore we make an assumption on the point process of exceedences.

228

8 Convergence to stable and extremal processes

Assumption 8.4.1 Let {X j , j ∈ Z} be a Rd -valued stationary time series and let  {cn } be a scaling sequence. The point process of exceen converges weakly to a point process N with dences Nn = i=1 δc−1 n Xi the representation N=

∞  

δPi Q (i) ,

i=1 j∈Z

j

∞ where i=1 δPi is a Poisson point process on Rd \{0} with mean measure ϑνα and Q(i) , i ≥ 1, are i.i.d. copies of a sequence Q, such that P(Q∗ = 1) = 1, independent of {Pi , i ≥ 1}.

As noted after Assumption 8.0.1, the point process convergence implies  ! α $ E Qj  < ∞ . j∈Z

n Define S n = c−1 n i=1 (X i − bn ) with ⎧ ⎪ ⎨0 bn = E[X 0 1{|X 0 | ≤ cn }] ⎪ ⎩ E[X 0 ]

if α ∈ (0, 1) , if α = 1 , if α ∈ (1, 2) .

If α ≥ 1, we will assume the asymptotic negligibility of small jumps, i.e.   n      lim lim sup P  X i 1{|X i | ≤ cn } − E[X 0 1{|X 0 | ≤ cn }] > cn δ = 0 . →0 n→∞   i=1

(8.4.1) Equivalently, we can assume the negligibility condition separately for each component. Theorem 8.4.2 Assume that X 0 is regularly varying with tail index α ∈ (0, 2)\{1}. Let Assumption 8.4.1 holdand if α > 1 assume moreover that    Qj  < ∞ = 1. Define W =  Qj and (8.4.1) holds and P j∈Z

j∈Z

assume that P(|W | > 0) = 1. Then S n converges weakly to an α-stable distribution with characteristic function eψ with ψ(z) = −ϑΓ (1−α) cos(πα/2) E[|z, W |α {1 − sign(z, W ) tan(πα/2)}] .

(8.4.2)

8.5 Problems

229

This representation implies that the spectral measure of the limiting stable α distribution is E[|W | δ|W |−1 W ]. In the case α < 1, the limiting distribution ∞ has the representation i=1 Pi W i where W i , i ≥ 1, are i.i.d. copies of W . We now turn to the case α = 1. Theorem  8.4.3 Assume that α = 1. Let Assumption 8.4.1 hold. Define W = j∈Z Qj and assume that P(|W | > 0) = 1. Assume furthermore that (8.4.1) holds and  !  $ E Qj  log(|Qj |−1 ) < ∞ . j∈Z

Then S n converges weakly to a 1-stable distribution with characteristic function eψ with ψ(z) =

 

 |z, W | 2 π −ϑ E |z, W | 1 + i sign(z, W ) log 2 π |W | + iz, µ ,

(8.4.3)

with

⎞⎤ ⎡ ⎛         Qj log ⎝ Qj ⎠⎦ + E[Qj log(Qj )] . µ = c0 ϑE[W ] − ϑE ⎣  j∈Z  j∈Z j∈Z

Remark 8.4.4 If the time series X is regularly varying, Q is its conditional spectral tail process and Θ is its spectral tail process, then ⎞ ⎞⎤ ⎡ ⎛ ⎛           µ = c0 E[Θ0 ] − E ⎣ Θj log ⎝ Θj ⎠ − Θj log ⎝ Θj ⎠⎦ j≥0 j≥1   j≥0 j≥1

8.5 Problems 8.1 Prove that (ANSJ(an )) holds for i.i.d. and m-dependent sequences. Hint: in the i.i.d. case, apply Chebyshev inequality; in the m-dependent case, split the sum into m sums. 8.2 Let a ∈ R∗ and X have an (α, σ, β, c)-stable law. Prove that the law of aX is Stable(α, |a|σ, βsign(a), ac) if α = 1 and Stable(1, |a|σ, βsign(a), ac − σ π2 βa log(|a|)) if α = 1.

230

8 Convergence to stable and extremal processes

8.3 Let Λ be a (α, σ, β, 0)-stable random variable and v be a deterministic vector. Give the law of Λv. 8.4 Assume that

⎞α ⎤ ⎡⎛ ∞  |Θj |⎠ ⎦ < ∞ . E ⎣⎝ j=0

Use Problem 5.17 and Problem 5.19 to show  α * + ∞ 2E Θ 1 Θ∗−∞,−1 = 0 j=0 j + α *  β= + − 1 .   ∞ E  j=0 Θj  1 Θ∗−∞,−1 = 0

(8.5.1)

8.5 Consider the MA(1) process Xj = Zj + φZj−1 from Example 5.2.10. Compute the expressions for the scale and the skewness parameters of the stable law in Propositions 8.2.1, 8.2.3, and 8.2.4. 8.6 Consider the AR(1) process Xj = Zj +ρXj−1 of Problem 7.9 with |ρ| < 1. Compute the expressions for the scale and the skewness parameters of the stable law in Propositions 8.2.1, 8.2.3, and 8.2.4. Autocovariances 8.7 Let {Xj , j ∈ Z} be a stationary regularly varying time series with tail index α ∈ (0, 4). Assume that E[X0 ] = 0 if α > 1. Let cn be the (1 − 1/n)quantile of the distribution of |X0 |. Assume that Nh,n =

n−h 

δc−2 2 n (X ,Xj Xj+1 ,...,Xj Xj ) j

i=1 w

=⇒ Nh =

∞ 

δPi ((Q(i) )2 ,Q(i) Q(i)

i=1

j

j

(i) (i) j+1 ,...,Qj Qj+h )

,

where {Pi , i ≥ 1} are the points of a PPP on (0, ∞) with mean measure ϑνα/2 and Q(i) , i ≥ 1 are i.i.d. copies of the conditional spectral tail process Q of {Xj , j ∈ Z}. (Cf. Problem 7.11 for sufficient conditions.)  α/2  2 1. Assume that 0 < α < 2. Prove that E < ∞. Deduce that j∈Z Qj  for all  = 0, . . . , h, P( j∈Z |Qj Qj+ | < ∞) = 1. 2. Deduce the limit distribution of Sh,n = c−2 n

n−h 

(Xi2 , Xi Xi+1 , . . . , Xi Xi+h ) .

i=1

8.6 Bibliographical notes

3. Assume that 2 < α < 4 and E



 j∈Z

Q2j

α/2

231

< ∞. Assume that

E[X0 ] = 0 and set r(j) = cov(X0 , Xj ) = E[X0 Xj ]. Define S¯h,n = a−2 n

n−h 

(Xi2 − r(0), Xi Xi+1 − r(1), . . . , Xi Xi+h − r(h)) .

i=1

Assume that for each  = 0, . . . , h and all δ > 0,  n  * +  lim lim sup P  Xi Xi+ 1 |Xi Xi+ | ≤ c2n →0 n→∞  i=1 ! * +$ # −E X0 X 1 |X0 X | ≤ c2n  > c2n δ = 0 . Deduce the limiting distribution of S¯h,n . 8.8 Let {Xi , i ∈ Z} be a univariate, regularly varying, extremally independent time series with tail index α > 0. Assume that (X0 , X1 ) satisfies Definition 3.2.1 with scaling function b and conditional scaling exponent κ. Let µ be as in Definition 3.2.1 and (W0 , W1 ) be as in (3.2.8). Let {an } be defined by limn→∞ nP(|X0 | > an ) = 1. Assume that Nn =

n−1  i=1

w

δ Xi , Xi+1 =⇒ N , an

b(an )

where N is a PPP on R \ {0} × R with mean measure µ. (See (Cf. Problem 7.12 for sufficient conditions.) Assume also that (3.2.16) holds and α < 1 + κ. Prove that n−1 ∞   1 d Xi Xi+1 −→ Pi Ui , an b(an ) i=1 i=1

where {Pi , i ≥ 1} are the points of a PPP(να/(1+κ) ) and Ui , i ≥ 1 are i.i.d. copies of W0 W1 , independent of {Pi , i ≥ 1}. Give the parameters of the stable distribution of the series. Hint: prove first the convergence of the point process ˜n = n−1 δ(a b(a ))−1 X X N by continuous mapping using the condition n n i i+1 i=1 (3.2.16).

8.6 Bibliographical notes Extremal process Convergence to the extremal process in the i.i.d. case is extensively considered in [Res87, Chapter 4]. For dependent sequences it is studied in the J1 and M1 topologies in [Kri14, Kri16].

232

8 Convergence to stable and extremal processes

Convergence to stable laws One of the earliest references for the weak convergence of the partial sum process to a stable process is [Dav83]. In particular, convergence is shown to hold for regularly varying functions of a Gaussian process whose covariance function decays in such a way that cov(X0 , Xn ) log(n) → 0, only for 0 < α 1 and therefore the parameters of the limit-

8.6 Bibliographical notes

233

ing stable distribution are not explicitly given. This condition is assumed in [DH95, Theorem 3.2] but not the extra assumption (8.2.11) for the case α = 1 hence a non-explicit expression of the drift parameter is provided therein. Multivariate stable laws We refer to [ST94, Chapter 2] for multivariate stable laws. Functional convergence in the vector-valued case was studied by [BK15]; it necessitates to introduce the weak M1 topology ([Whi02, Chapter 12]). Autocovariances Problem 8.7 is adapted from [DM98b, Theorem 3.5].

9 The tail empirical and quantile processes

In this chapter, we consider a stationary univariate regularly varying times {Xj , j, ∈ Z} with tail index α and the distribution function of X0 will be denoted by F . Let F = 1 − F . Let {un } be a scaling sequence such that lim nF (un ) = ∞ .

n→∞

The tail empirical distribution function is the distribution of the extreme observations. It is more convenient to consider the survivor function, that is the empirical probability of exceedences over extremely large thresholds. Formally, we define:

Tn (s) =

 1 1{Xj >un s} , s > 0 . nF (un ) j=1 n

The tail empirical distribution function is the main statistical tool to study the univariate extremal characteristics of a time series.  If Fn denotes the n usual empirical distribution function, i.e. Fn (x) = n−1 j=1 1{Xj ≤x} and F n = 1 − Fn , then we have the identity F n (un s) Tn (s) = . F (un ) Define Tn (s) = E[Tn (s)] =

F (un s) F (un )

and © Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 9

235

236

9 The tail empirical and quantile processes

T (s) = lim Tn (s) = s−α . n→∞

By the Uniform Convergence Theorem 1.1.2, this convergence is uniform on sets [a, ∞) with a > 0. In this chapter, we will study the weak convergence of Tn to T in Section 9.1 and then the convergence of the centered and rescaled tail empirical process defined by

 n (s) = T

» nF (un ){Tn (s) − Tn (s)} , s > 0 .

(9.0.1)

The tail empirical process is the extreme value equivalent of the usual empirical process. For an i.i.d. sequence, the Lindeberg-L´evy Central Limit Theorem  n converge to E.2.2 easily yields that the finite-dimensional distributions of T −α those of a process T which can be expressed as T(s) = W(s ), s > 0, where W is a standard Brownian motion (i.e. a continuous Gaussian process such that cov(W(s), W(t)) = s ∧ t). Under suitable weak dependence assumptions, we may expect convergence  n to a Gaussian process, possibly with dependent increments. We will of T state and prove such a functional central limit theorem for the univariate tail empirical process under a temporal weak dependence condition and a strengthened anticlustering condition which will be introduced in Section 9.2. Additional restrictions on the range of the threshold un will be needed. The functional central limit theorem is stated in Theorem 9.2.10. The limiting process is a Gaussian process with dependent increments. It further implies weak convergence of quantile process, cf. Theorem 9.3.1.  n with appropriate order The next step is to replace un in the definition of T statistics. We will prove in Theorem 9.4.1 that this substitution leads to a different limiting process, a dependent Brownian bridge. In Section 9.5, we will apply the previous results to the famous Hill estimator of the tail index of the marginal distribution.

9.1 Consistency of the tail empirical distribution function We will provide conditions which ensure the weak consistency of the tail empirical distribution function, i.e.

9.1 Consistency of the tail empirical distribution function P Tn (s) −→ T (s) = s−α , s > 0 .

237

(9.1.1)

Since Tn is non-increasing and T is continuous, the convergence is uniform on [s0 , t0 ] for all 0 < s0 < t0 by Dini’s Theorem E.4.12. Proposition 9.1.1 If the regularly varying time series {Xj , j ∈ Z} is m-dependent and if the sequence un satisfies lim un = lim nP(X0 > un ) = ∞ ,

n→∞

n→∞

then (9.1.1) holds.

Proof. By m-dependence, we can write 1  (i) Tn (s) = T (s) + Rn (s) , m i=1 n m

(i) where Rn (s) is a negligible remainder term and Tn (s) are defined by

Tn(i) (s) =

[n/m] ©  ¶ m 1 X(j−1)m+i > un s . nP(X0 > un ) j=1

(i) Since for each i the summands in Tn are i.i.d., we can easily compute the variance:

Ä ä var Tn(i) (s) =

P(X0 > un s) m[n/m] →0. 0 > un ) P(X0 > un )

n2 P(X

P (i) (i) By regular variation, lim→∞ E[Tn (s)] = s−α , thus Tn (s) −→ s−α . This P proves that Tn (s) −→ s−α . 

We now turn to m-dependent approximations. Proposition 9.1.2 Let {Xj , j ∈ Z} be a stationary regularly varying univariate time series. Let un be a sequence such that limn→∞ un = limn→∞ nP(X0 > un ) = ∞. Assume that there exists a sequence of m(m) dependent regularly varying time series {Xj , j ∈ Z}, m ≥ 1, such that (m)

{(Xj , Xj

), j ∈ Z} is stationary for each m ≥ 1 and for all  > 0,

238

9 The tail empirical and quantile processes

lim lim sup

    (m)  P X 0 − X 0  > u n  P(X0 > un )

m→∞ n→∞

=0.

(9.1.2)

Define Tn(m) (s) =

1 (m)

nP(X0

> un )

n ¶ (m) ©  1 Xj > u n s , s > 0 . j=1

P (m) Then for each m ≥ 1, Tn (s) −→ s−α for all s > 0 and (9.1.1) holds.

Proof. By Proposition 5.2.5, we have    P(X (m) > u )    n 0 − 1 = 0 . lim lim sup  m→∞ n→∞  P(X0 > un )  (m)

This yields that limn→∞ nP(X0 s−α by Proposition 9.1.1. (m)

Write vn

(m)

= nP(X0

P (m) > un ) = ∞ for large m and Tn (s) −→

> un ) and vn = nP(X0 > un ). Then, for s > 0

n ¶ (m) ©ä 1 Ä Tn (s) − Tn(m) (s) = 1{Xj > un s} − 1 Xj > un s vn j=1 (m)

+

vn

− vn (m) Tn (s) .

vn

The previous considerations yield that for each η > 0, lim lim sup P(|vn−1 vn(m) − 1|Tn(m) (s) > η) = 0 .

m→∞ n→∞

(m)

For 0 <  < s, we have, splitting according to max1≤j≤n |Xj − Xj greater or smaller than un ,

| being

n ¶ (m) © 1   1{Xj > un s} − 1 Xj > un s  vn j=1



n © 1  ¶ (m) 1 un (s − ) ≤ Xj ≤ un (s + ) vn j=1

+

n © 1  ¶ (m) 1 |Xj − Xj | > un  . vn j=1

9.1 Consistency of the tail empirical distribution function

Therefore, for η > 0,   n 1  ¶ (m) ©  P  1{Xj > un s} − 1 Xj > un s  > η  vn  j=1 Ç (m) å © vn ¶ (m) (m) ≤P Tn (s − ) − Tn (s + ) > η/2 vn n   2   (m)  P Xj − Xj  > un  . + vn η j=1

239

(9.1.3)

P (m) Since Tn (s) −→ T (s), for  sufficiently small, we have Ç (m) å © vn ¶ (m) lim lim sup P Tn (s − ) − Tn(m) (s + ) > η/2 = 0 . m→∞ n→∞ vn

By assumption (9.1.2), the same limit holds for the second term in (9.1.3). Altogether, we have obtained      lim lim sup P Tn(m) (s) − Tn (s) > η = 0 . m→∞ n→∞



This yields (9.1.1) by the triangular argument Lemma A.2.10.

We conclude this section by showing that consistency of the tail empirical distribution function implies the consistency of the intermediate empirical quantile. Let X(n:1) ≤ · · · ≤ X(n:i) ≤ · · · ≤ X(n:n) be the corresponding increasing order statistics. The order statistics can be interpreted as empirical quantiles. If Fn denotes the usual empirical distribution function, then X(n:n−k) = Fn← (1 − k/n). Similarly, the high-order statistics can be interpreted as extreme quantiles. Following the common usage in the extreme value theory literature, we will denote by k an intermediate sequence of integers, the dependence in n will be implicit and we now define the sequence un by un = F ← (1 − k/n). Proposition 9.1.3 If (9.1.1) holds, then X(n:n−k) P −→ 1 . un

Proof. Fix  > 0. Then  P(u−1 n X(n:n−k) > 1 + ) ≤ P(Tn (1 + ) > 1) .

(9.1.4)

240

9 The tail empirical and quantile processes

P If (9.1.1) holds, then Tn (1 + ) −→ (1 + )−α , therefore

 lim sup P(u−1 n X(n:n−k) > 1 + ) ≤ lim P(Tn (1 + ) > 1) = 0 . n→∞

n→∞

We obtain similarly that limn→∞ P(u−1 n X(n:n−k) < 1 − ) = 0. This proves (9.1.4). 

9.2 Functional central limit theorem for the tail empirical process We consider the tail empirical process of a univariate stationary regularly varying time series, that is the centered and normalized tail empirical distribution »  n (s) = nF (un ){Tn (s) − Tn (s)} , s > 0 . T We will first briefly discuss the case of an i.i.d. sequence and then develop a general theory for β-mixing sequences.

9.2.1 The i.i.d. case Define ξn,j (s) = (nF (un ))−1/2 {1{Xj > un s} − P(X0 > un s)} .   n (s) = n ξn,j (s). Since the summands are independent, the finiteThen T j=1 dimensional convergence is obtained by applying the Lindeberg-L´evy central limit Theorem E.2.4. Tightness with respect to the J1 topology of D((0, ∞)) (which means on every compact subintervals [a, b] of (0, ∞)) will be proved by applying Theorems C.2.12 and C.4.5. Theorem 9.2.1 Let {Xj , j ∈ Z} be a sequence of i.i.d. regularly varying random variables with tail index α. Let un be a sequence such that w  n =⇒ T in D((0, ∞)) limn→∞ un = limn→∞ nP(X0 > un ) = ∞. Then T endowed with the J1 topology and T(s) = W(s−α ), s > 0, where W is a standard Brownian motion on R.

Proof. By Theorem E.2.4, in order to prove that the finite-dimensional dis n converge to those of T, we must check the following two tributions of T conditions:

9.2 Functional central limit theorem for the tail empirical process

lim

n→∞

lim n cov(ξn,0 (s), ξn,0 (t)) n→∞ 2 nE[ξn,0 (s)1{|ξn,0 (s)| > }]

241

= (s ∨ t)−α , =0,

for all s, t,  > 0. The second statement holds trivially since |ξn,0 (s)| ≤ (nF (un ))−1/2 , so the indicator is zero for large enough n. The first condition is an immediate consequence of the regular variation of F : P(X0 > un (s ∨ t)) − P(X0 > un s)P(X0 > un s) F (un ) → (s ∨ t)−α .

n cov(ξn,0 (s), ξn,0 (t)) =

In order to prove convergence with respect to the J1 topology, we apply Theorem C.2.15. We must prove that for all 0 < a < b,  n , δ, [a, b]) > ) = 0 , lim lim sup P(w(T

δ→0 n→∞

(9.2.1)

where w denotes the uniform modulus of continuity defined by w(f, δ, [a, b]) =

sup a≤s≤t≤b |s−t|≤δ

|f (t) − f (s)| .

 n indexed by the interval We apply Theorem C.4.5. We consider the process T [a, b]. The Lindeberg condition (i) holds as previously. The random entropy integral condition holds by Corollary C.4.20, since the index set is linearly ordered. We need only to check the L2 continuity condition (ii). In the present context, the random metric dn is defined by d2n (s, t) =

1

n 

nP(X0 > un )

j=1

1{sun < Xj ≤ tun } .

Therefore, we must check that lim lim sup

δ→0 n→∞

sup a≤s≤t≤b |t−s|≤δ

P(sun < X0 ≤ tun ) =0. P(X0 > un )

(9.2.2)

By regular variation, lim

n→∞

P(sun < X0 ≤ tun ) = s−α − t−α . P(X0 > un )

Furthermore, the convergence is uniform on compact intervals of (0, ∞) by the uniform convergence Theorem 1.1.2. Thus (9.2.2) holds.  In the following sections we are going to study the case of β-mixing time series.

242

9 The tail empirical and quantile processes

9.2.2 Dependent data: the blocking method We recall the blocking method which we already used in Section 7.4.1. Let {rn } be an intermediate sequence and set mn = [n/rn ]. The blocking method consists in replacing the original sample (X1 , . . . , Xn ) by a pseudo-sample † † (X1† , . . . , Xn† ) such that the blocks (X(i−1)r , . . . , Xir ), i = 1, . . . , mn , n n +1 are mutually independent and each one has the same distribution as the first block (X1 , . . . , Xrn ). Under the β-mixing assumption, weak convergence  n will be obtained as a consequence of the result for the process results for T †  Tn based on the independent blocks, with the last incomplete block being simply omitted. We define mn  rn ¶ † ©  1 †  Tn (s) = 1 X(i−1)r > u s , (9.2.3a) n +j n nF (un ) i=1 j=1 »  † (s) = nF (un ){T† (s) − E[T† (s)]} . (9.2.3b) T n n n The first task to prove a central limit theorem is usually to compute the variance. By independence and equidistribution of the blocks and by stationarity, we obtain r n  mn † †   var var(Tn (s)) = nF (un s)var(Tn (s)) = 1{Xj > un s} nF (un ) j=1 ⎡ 2 ⎤ rn  mn mn rn2 (F (un s))2 E⎣ 1{Xj > un s} ⎦ − = nF (un ) nF (un ) j=1 =

mn rn F (un s) mn rn2 (F (un s))2 − nF (un ) nF (un ) ã rn Å 2mn rn  j + 1− P(X0 > un s, Xj > un s) . rn nF (un ) j=1

Since mn rn ∼ n and by regular variation, we obtain, provided that the limits below exist,  † (s)) = s−α − s−2α lim rn F (un ) lim var(T n n→∞ ã rn Å  j P(X0 > un s, Xj > un s) . (9.2.4) + 2 lim 1− n→∞ rn F (un ) j=1

n→∞

These computations justify the following assumptions which ensure the existence of the above limits: • We say that Condition R(rn , un ) holds if lim un = ∞ ,

n→∞

lim nF (un ) = ∞ ,

n→∞

lim rn F (un ) = 0 . (R(rn , un ))

n→∞

9.2 Functional central limit theorem for the tail empirical process

243

• We say that Condition S(rn , un ) holds for a time series {Xj , j, ∈ Z} if for all s, t > 0, lim lim sup

m→∞ n→∞

rn  1 P(|X0 | > un s, |Xj | > un t) = 0 . P(|X0 | > un ) j=m

(S(rn , un )) Note that −r n P (|Xj | > un | |X0 | > un ) j=−m

=

−r n

−r n P (|X0 | > un , |X−j | > un ) P (|Xj | > un , |X0 | > un ) = P(|X0 | > un ) P(|X0 | > un ) j=−m j=−m

=

rn rn   P (|X0 | > un , |Xj | > un ) = P(|Xj | > un | |X0 | > un ) . P(|X0 | > un ) j=m j=m

Thus, Condition S(rn , un ) can be written equivalently as a one-sided or twosided sum. Remark 9.2.2 Condition S(rn , un ) holds for m-dependent sequences, for any sequence rn and un such that limn→∞ rn F (un ) = 0. Condition S(rn , un ) implies condition AC(rn , un ) hence it implies that lim|j|→∞ Yj = 0. Assume now that {Xj , j ∈ Z} is regularly varying and let {Yj , j ∈ Z} be the associated tail process.  −1 t | Lemma 9.2.3 If R(rn , un ) and S(rn , un ) hold, then j∈Z P(Yj > s Y0 > 1) < ∞ for all s, t > 0 and   † (s), T  † (t)) = lim cov(T s−α P(Yj > s−1 t | Y0 > 1) . (9.2.5) n n n→∞

j∈Z

If the series is extremally independent then the limiting covariance is the  † (s), T  † (t)) = (s ∨ t)−α . same as for an i.i.d. series, i.e. limn→∞ cov(T n n Proof. Adapting the computations that lead to (9.2.4) and invoking Condition R(rn , un ), we have  † (s), T  † (t)) lim cov(T n n

n→∞

ã rn Å  j 1− n→∞ rn j=1

= (s ∨ t)−α + lim  ×

P(X0 > un s, Xj > un t) P(X0 > un t, Xj > un s) + F (un ) F (un )

 .

(9.2.6)

244

9 The tail empirical and quantile processes

We must prove the existence of the limit in (9.2.6). By regular variation, for each fixed positive integer m, it holds that m−1 Å

lim

n→∞

j 1− rn

j=1

ã

P(X0 > un s, Xj > un t) F (un ) =

m−1 

s−α P(Yj > s−1 t | Y0 > 1) .

j=0

Fix  > 0. By Condition S(rn , un ), we can choose m0 large enough so that for all m ≥ m0 , lim sup n→∞

rn  P(X0 > un s, Xj > un t) ≤. F (un ) j=m

The previous two displays yield, for all m ≥ m0 , r Å ã n  P(X0 > un s, Xj > un t) j  lim sup  1−  r F (un ) n→∞ n j=1



m−1  j=1

   s−α P(Yj > s−1 t | Y0 > 1) ≤  . 

This yields that lim sup n→∞

ã rn Å  P(X0 > un s, Xj > un t) j 1− rn F (un ) j=1 ≤

m 0 −1 

s−α P(Yj > s−1 t | Y0 > 1) +  < ∞ .

j=1

This in turn shows that for all m ≥ m0 m−1  j=1

s−α P(Yj > s−1 t | Y0 > 1) r Å  ã n  P(X0 > un s, Xj > un t)  j  ≤ lim sup  1− +  rn F (un ) n→∞  j=1 ≤

m 0 −1 

s−α P(Yj > s−1 t | Y0 > 1) + 2 < ∞ .

j=1

This proves the stated summability and we finally obtain that

9.2 Functional central limit theorem for the tail empirical process

lim

n→∞

rn Å 

1−

j=1

j rn

ã

245

P(X0 > un s , Xj > un t) F (un ) ∞  s−α P(Yj > s−1 t | Y0 > 1) . = j=1

The second sum in (9.2.6) is dealt with identically and this proves (9.2.5).  The limiting covariance function in (9.2.5) can be expressed in terms of the spectral tail process {Θj , j ∈ Z}. Since P(Y0 > 1) = P(Θ0 = 1), we have  P(Yj > t/s | Y0 > 1) s−α j∈Z −α

=s =



 j∈Z

1



P(rΘj > s−1 t | Θ0 = 1)αr−α−1 dr

E[{(Θj /t) ∧ (1/s)}α + | Θ0 = 1] .

(9.2.7)

j∈Z

Remark 9.2.4 Let {X j , j ∈ Z} be a regularly varying time series with values in Rd . If we consider the tail empirical process based on |X j |, then Lemma 9.2.3 yields  P(|Y j | > 1) < ∞ . j∈Z

Thus, in passing, we have proved that for a  multivariate regularly varying time series, Condition S(rn , un ) implies that j∈Z P(|Y j | > 1) < ∞. ⊕

9.2.3 Central limit theorem In order to justify the blocking method, we again assume that the time series is β-mixing and satisfies condition β(rn , n ) already used in Chapter 7. We recall it here for convenience. Let n and rn be non-decreasing sequences of integer such that 1 n rn n = lim = lim = lim βn = 0 . (β(rn , n )) lim n→∞ n n→∞ rn n→∞ n n→∞ rn

Proposition 9.2.5 Let {Xj , j ∈ Z} be a stationary regularly varying sequence with a continuous marginal distribution F , {un } be a scaling sequence and {rn } be an intermediate sequence. Assume that conditions R(rn , un ), β(rn , n ), and S(rn , un ) hold. Then the finite n converge to those of a Gaussian process dimensional distributions of T T with covariance function defined in (9.2.5).

246

9 The tail empirical and quantile processes

Remark 9.2.6 This result contains the case of an i.i.d. sequence since all its assumptions hold. Condition S(rn , un ) is implied by R(r »n , un ) which holds for any sequence un such that nF (un ) → ∞ and rn = 1/ F (un ). Condition √ β(rn , n ) holds since βn = 0 for all n = 0 and we can choose n = rn . The sequences rn and n do not play any role in the i.i.d. framework. ⊕

Recall that T (s) = s−α . If the sequence {Xj } is extremally independent, we have the following corollary. Corollary 9.2.7 Under the assumptions of Proposition 9.2.5, if moreover the sequence {Xj } is extremally independent, then T has the same distribution as W ◦ T where W is the standard Brownian motion.

Proof (Proof of Proposition 9.2.5). We first prove the central limit theorem  † . Set, for i = 1, . . . , mn for T n Sn,i (s) =

rn ¶ ©  1 X(i−1)rn +j > un s , S n,i (s) = Sn,i (s) − E[Sn,1 (s)] , j=1

(9.2.8) † (s) = Sn,i

rn 

¶ † © † † 1 X(i−1)r > un s , S n,i (s) = Sn,i (s) − E[Sn,1 (s)] . n +j

j=1

(9.2.9) Then the process in (9.2.3b) becomes  † (s) = {nF (un )}−1/2 T n

mn 



S n,i (s)

i=1

and since it is a sum of independent summands, we apply Theorem E.2.4 to obtain the finite-dimensional convergence. We have already proved the convergence of the variance and we must now prove the negligibility condition. † (s) have the same distribution, we must prove that for Since Sn,1 (s) and Sn,1 every , s > 0, »  2   mn E S n,1 (s)1 |S n,1 (s)| >  nF (un ) = 0 . lim n→∞ nF (un ) This condition is proved in Lemma 9.2.8. Thus the finite-dimensional distri † converge to those of G and there only remains to prove that butions of T n   Tn has the same limiting distribution. This is done in Lemma 9.2.9.

9.2 Functional central limit theorem for the tail empirical process

247

Lemma 9.2.8 If Conditions R(rn , un ) and S(rn , un ) hold then for every  > 0 and every s > 0, »  2   1 E S n,1 (s)1 |S n,1 (s)| >  nF (un ) = 0 . lim n→∞ rn F (un ) Proof. It suffices to show the asymptotic negligibility for s = 1. For simplicity, we set Sn = Sn,1 and S n = S n (1). In a first step we note that the centering can be omitted. Indeed,   2 E S n 1{|S |>√nF (u )} n n ⎡ ⎤ 2 r n »    = E⎣ 1{Xj > un } 1 |S n | >  nF (un ) ⎦ (9.2.10a) j=1

+ (rn F (un ))2 P(|S n | > ) rn »    − 2rn F (un ) E[1{Xj > un }1 |S n | >  nF (un ) ] .

(9.2.10b) (9.2.10c)

j=1

  » By omitting the indicator 1 |S n | >  nF (un ) in (9.2.10c) and using the assumption R(rn , un ), we obtain »   2  E S n 1 |S n | >  nF (un ) ⎡ ⎤ 2 rn »    = E⎣ 1{Xj > un } 1 |S n | >  nF (un ) ⎦ + o(rn F (un )) .

(9.2.11)

j=1

Write now ⎡ ⎤ 2 rn »    E⎣ 1{Xj > un } 1 |S n | >  nF (un ) ⎦ = I1 + 2I2 j=1

with I1 =

rn 

»    E 1{Xj > un }1 |S n | >  nF (un ) ,

j=1

I2 =

rn  rn 

»    E 1{Xi > un }1{Xj > un }1 |S n | >  nF (un ) .

i=1 j=i+1

Consider first the term I1 . We have

248

9 The tail empirical and quantile processes

1

I1 ≤ »  nF (un )

rn 

E[1{Xj > un }|S n |]

j=1

n  rn F (un ) 1 E[1{Xj > un }Sn ] + » ≤ »  nF (un ) j=1  nF (un )

r

n  n  1 P(Xi > un , Xj > un ) + o(rn F (un )) = »  nF (un ) j=1 i=1

r

r

r rn n −1  rn F (un ) 2 = » P(Xi > un , Xj > un ) + o(rn F (un )) + »  nF (un )  nF (un ) j=1 i=j+1 n P(X0 > un , Xj > un ) 2rn F (un )  + o(rn F (un )) . ≤ » F (un )  nF (un ) j=1

r

In the last line we have used stationarity. We already know by Lemma 9.2.3 that the sum in the last line has a limit, so it is bounded and this yields I1 = o(rn F (un )). Consider now the term I2 . Fix  > 0 and applying Condition S(rn , un ), we can choose a positive integer such that, for large enough n, rn  P(Xj > un | X0 > un ) ≤  . j=m

Then I2 =

rn  m 

»   E[1{Xi > un }1{Xj > un }1 |S n | >  nF (un ) ]

i=1 j=i+1 rn rn  

+

»   E[1{Xi > un }1{Xj > un }1 |S n | >  nF (un ) ]

i=1 j=i+m

≤ mI1 + rn F (un ) . This proves that lim sup n→∞

I2 ≤ rn F (un )

and since  is arbitrary we have proved that I2 = o(rn F (un )).



Lemma 9.2.9 Assume that conditions R(rn , un ), β(rn , n ), and S(rn , un )  n and those of hold. Then the finite-dimensional limiting distributions of T  † coincide. T n Proof. The first step is to show that the last incomplete block can be removed and also that sub-blocks of length n (called small blocks hereafter) can be

9.2 Functional central limit theorem for the tail empirical process

249

removed at the end of each larger block of length rn . By Lemma 9.2.3, the variance of the last incomplete block is of order O((n − mn rn )F (un )) = o(rn F (un )) and therefore its contribution to the overall sum is negligible. By Lemma E.3.5, there only remains to prove that the variance of the sum over the small independent blocks is negligible. This is true since m   n  n n ¶ † © ¶ † ©   var 1 Xirn −n +j > un s = mn var 1 Xj > u n s i=1 j=1

j=1

= O(mn n F (un )) = o(nF (un )) . 

9.2.4 Functional central limit theorem In order to prove the functional central limit theorem, we must now prove n. the tightness of the sequence of processes T Theorem 9.2.10 Let {Xj , j ∈ Z} be a stationary regularly varying sequence with a continuous marginal distribution F , {un } be a scaling sequence and {rn } be an intermediate sequence. Assume that con n converges weakly ditions R(rn , un ), β(rn , n ), S(rn , un ) hold. Then T to the Gaussian process T in D((0, ∞)) endowed with the J1 topology and the process T has an almost surely continuous version.

Proof. First, we deal with the incomplete block (Xmn rn +1 , . . . , Xn ). Since n − mn rn ≤ rn , we have for any interval [a, b] in (0, ∞),   n    1   sup » {1{Xj > un s} − P(X0 > un s)}    s∈[a,b] nF (u ) n

≤»

j=mn rn +1

1

n 

nF (un ) j=mn rn +1

{1{Xj > un a} − P(X0 > a)} +

2rn P(X0 > un a) » . rn nF (un )

The first term in the right-hand side is negligible by the same argument as in Lemma 9.2.9. The second one vanishes by R(rn , un ). We will apply Theorem C.2.15. We will prove the asymptotic equicontinuity  n with respect to the uniform norm on each interval [a, b] of the process T with 0 < a < b. That is we will prove that for all  > 0,

250

9 The tail empirical and quantile processes

 n , δ, [a, b]) > ) = 0 . lim lim sup P(w(T

(9.2.12)

δ→0 n→∞

The modulus of continuity w was defined in Theorem 9.2.1: w(f, δ, [a, b]) =

sup a≤s≤t≤b |s−t|≤δ

|f (t) − f (s)| .

 n,1 , T  n,2 , T  † and T  † by Define the processes T n,1 n,2  n,1 = {nF (un )}−1/2 T



 n,2 = {nF (un )}−1/2 S n,i , T

1≤i≤mn i odd

 † = {nF (un )}−1/2 T n,1





S n,i ,

1≤i≤mn i even

†  † = {nF (un )}−1/2 S n,i , T n,2

1≤i≤mn i odd





S n,i ,

1≤i≤mn i even



where the block variables S n,i are defined in (9.2.8)–(9.2.9). These processes are sums over blocks separated by a distance rn . Thus we can apply Lemma E.3.4 and we obtain that  † ), L(T  n,i )) ≤ mn βr , i = 1, 2 . dTV (L(T n n,i Let  > 0. Since w(f + g, δ, [a, b]) ≤ w(f, δ, [a, b]) + w(f, δ, [a, b]), we have, by the definition of the total variation distance,  n , δ, [a, b]) > ) P(w(T  n,1 , δ, [a, b]) > /2) + P(w(T  n,2 , δ, [a, b]) > /2) ≤ P(w(T  † , δ, [a, b]) > /2) + P(w(T  † , δ, [a, b]) > /2) + 2mn βr . ≤ P(w(T n n,1 n,2 † Therefore it suffice to prove separately the asymptotic equicontinuity of T n,1  † , or equivalently of T  † since these processes are sums of the same and T n n,2 i.i.d. independent summands. We proceed with this task now.  † is the sum of the rowwise i.i.d. c`adl` Since T ag (hence separable) processes n † Sn,i (defined in (9.2.9)) indexed by [a, b], we can apply Theorem C.4.5 to prove the asymptotic equicontinuity property of Theorem C.2.12. (i) The Lindeberg condition (i) of Theorem C.4.5 holds by Lemma 9.2.8. (ii) Define vn = nF (un ) and the random metric dn on [a, b] by d2n (s, t) = vn−1

mn 

† † {Sn,i (s) − Sn,i (t)}2 .

i=1

In order to prove condition (ii) of Theorem C.4.5, note that for s < t, we have by monotonicity,

9.3 Quantile process and intermediate order statistics

E[d2n (s, t)] =

251

2 2 E[Sn,1 (s)] − E[Sn,1 (t)] E[{Sn,1 (s) − Sn,1 (t)}2 ] ≤ . rn F (un ) rn F (un )

2 (s)] = By Lemma 9.2.3and R(rn , un ), we know that limn→∞ vn−1 E[Sn,1 −α with ς = j∈Z P(Yj > 1 | Y0 > 1). The limiting function being ςs continuous, the convergence is uniform by Dini’s Theorem E.4.12. Thus Item (ii) holds by Lemma C.4.7.

(iii) To prove that condition (iii) of Theorem C.4.5 holds, define the map Es on 0 (R) by  Es (x) = 1{xj > s} , x = (xj , j ∈ Z) . j∈Z

Let G be the class of functions on 0 (R) defined by G = {Es , a ≤ s ≤ b} and Zn be the sequence of processes indexed by G defined by Zn (Es ) = vn−1/2

mn 

† −1/2 Es (u−1 n X (i−1)rn +1,irn ) = vn

i=1

mn 

†  † (s) . Sn,i (s) = T n

i=1

Define the random metric d˜n on G by d˜2n (Es , Et ) = vn−1

mn 

† † −1 2 {Es (u−1 n X (i−1)rn +1,irn ) − Et (un X (i−1)rn +1,irn )} .

i=1

Then d˜n (Es , Et ) = dn (s, t), thus proving the random entropy condition (C.4.5) for [a, b] with respect to the random metric dn is equivalent to proving it for the class G with respect to the random metric d˜n . This change of point of view is needed to apply Lemma C.4.8. Since the class G is linearly ordered, the uniform entropy condition (C.4.9) holds by Corollary C.4.20. Since the envelope function Ea satisfies (C.4.8) by Lemma 9.2.3, the assumptions of Lemma C.4.8 are satisfied and the random entropy condition Item (iii) of Theorem C.4.5 holds. Therefore, we can apply Theorem C.4.5 which implies that (9.2.1) holds. We w   n =⇒ conclude by Theorem C.2.15 that T T in D((0, ∞)) endowed with the  J1 topology and that P(T ∈ C((0, ∞))) = 1.

9.3 Quantile process and intermediate order statistics Let X(n:1) ≤ X(n:2) ≤ · · · ≤ X(n:n) be the order statistics of X1 , . . . , Xn . Our goal is to establish central limit theorem for the so-called intermediate order statistic X(n:n−k) , appropriately standardized, where k (with implicit

252

9 The tail empirical and quantile processes

dependence in n) is an intermediate sequence. We now define the sequence un in terms of k by un = F ← (1 − k/n) and we define the tail empirical quantile ‹n on (0, n/k) by function Q ‹n (t) = T← ([kt]/k) = X(n:n−[kt]) . Q (9.3.1) n un ‹n (1) = u−1 X(n:n−k) . Note that Q n For s0 ∈ (0, 1), define Bn (s0 ) = sup |Tn (t) − T (t)| .

(9.3.2)

t≥s0

By the Uniform Convergence Theorem 1.1.2, Bn (s0 ) converges to zero as n tends to infinity for every s0 > 0. Therefore, it is always theoretically possible (without further assumption) to choose the intermediate sequence k in such √ a way that limn→∞ kBn (s0 ) = 0. Set Q(t) = T ← (t) = t−1/α , t > 0, so that Q (1) = −1/α. Theorem 9.3.1 Let {Xj , j ∈ Z} be a stationary regularly varying sequence with a continuous marginal distribution F . Let k be an intermediate sequence (depending on n) and define un = F ← (1 − k/n). Assume that conditions R(rn , un ), β(rn , n ), S(rn , un ) hold. Assume moreover that there exists s0 ∈ (0, 1) such that √ lim kBn (s0 ) = 0 . (9.3.3) n→∞

Then, ä w √ Ä ‹n − Q =⇒ k Tn − T, Q (T, −Q · T ◦ Q) , −1/α

in D([s0 , ∞)) × D((0, s0 ]) endowed with the product J1 topology. Consequently, © d √ ¶ X(n:n−k) k − 1 −→ α−1 T(1) , (9.3.4) un jointly with the convergence of



k(Tn − T ) to W ◦ T .

Proof. Condition (9.3.3) implies that ä √ Ä  n (s) + o(1) , k Tn (s) − T (s) = T where the o(1) term is deterministic and uniform on [s0 , ∞). This implies that √ k(Tn − T ) converges to T in D([s0 , ∞)). By Vervaat’s Lemma (cf. Theorem C.2.16) this implies that

9.4 Tail empirical process with random levels

253

√ w k(Tn − T, (Tn )← − T ← ) =⇒ (T, −(T ← ) × T ◦ T ← ) −1/α

in D([s0 , ∞)) × D((0, s0

]) endowed with the product J1 topology.



Remark 9.3.2 Using (9.2.5), we see that the limiting variance in (9.3.4) is  α−2 var(T(1)) = α−2 P(Yj > 1 | Y0 > 1) . j∈Z

Under extremal independence, we have seen in Corollary 9.2.7 that T = W◦T , where W is a Brownian motion. This yields the following corollary. Corollary 9.3.3 Under the assumptions of Theorem 9.3.1 and if the sequence {Xj } is extremally independent, then ä w √ Ä ‹n − Q =⇒ k Tn − T, Q (W ◦ T, −Q W) . √ d In particular, k(X(n:n−k) /un − 1) −→ α−1 W(1), jointly with the con√ vergence of k(Tn − T ) to W ◦ T .

9.4 Tail empirical process with random levels The convergence result (9.3.4) for order statistics justifies the substitution made in practice of the deterministic threshold un by a random threshold X(n:n−k) . Let the sequence un be defined by un = F ← (1 − k/n). Define, for s > 0, n © 1 ¶ Tn (s) = 1 Xj > X(n:n−k) s = Tn (sX(n:n−k) /un ) , (9.4.1a) k j=1 √  n (s) = k{Tn (s) − T (s)} . T (9.4.1b) Theorem 9.4.1 Let {Xj , j ∈ Z} be a stationary regularly varying sequence with an absolutely continuous marginal distribution F . Let k be an intermediate sequence and define un = F ← (1 − k/n). Assume that conditions R(rn , un ), β(rn , n ), S(rn , un ), and (9.3.3) hold. Then  n ⇒ T − T × T(1) , T in D([s0 , ∞)) endowed with the J1 topology.

254

9 The tail empirical and quantile processes

Remark 9.4.2 It is important to note that the tail empirical processes with deterministic and random thresholds converge weakly to different limits. Proof. The proof of weak convergence for the tail empirical process with random levels relies on three items: weak convergence of the process with  n , homogeneity of the limiting function T and the bias deterministic levels T condition.  n into three components: We decompose T  n (sX(n:n−k) /un )  n (s) = T T © √ ¶ + k Tn (sX(n:n−k) /un ) − T (sX(n:n−k) /un ) © √ ¶ + k T (sX(n:n−k) /un ) − T (s) .

(9.4.2a) (9.4.2b) (9.4.2c)

P

• By Theorem 9.3.1, X(n:n−k) /un −→ 1. Thus, by Corollary 9.2.7, the right hand side of (9.4.2a) converges to T in D([s0 , ∞)). P

• Since X(n:n−k) /un −→ 1 and since we have chosen s0 ∈ (0, 1), with probability tending to one, we have, for all s ≥ s0 , |Tn (sX(n:n−k) /un ) − T (sX(n:n−k) /un )| ≤ Bn (s0 ). Therefore the right-hand side of (9.4.2b) is oP (1), uniformly with respect to s ≥ s0 . • Since T (s) = s−α , by Theorem 9.3.1 and the delta method, we have © © √ ¶ √ ¶ k T (sX(n:n−k) /un ) − T (s) = s−α k (X(n:n−k) /un )−α − 1 =⇒ s−α T  (1)α−1 T(1) = −T (s)T(1) , w

jointly with Tn (sX(n:n−k) /un ).



In the case of extremal independence, the limit can be expressed in terms of the Brownian bridge B (i.e. the continuous Gaussian process such that cov(B(s), B(t)) = s ∧ t − st, or equivalently B(t) = W(t) − tW(1)).

Corollary 9.4.3 Under the assumptions of Theorem 9.4.1, if moreover w  n =⇒ B ◦ T in the sequence {Xj } is extremally independent, then T D([s0 , ∞)) endowed with the J1 topology.

9.5 The Hill estimator We give here the best-known applications of the previous results which are consistency and asymptotic normality of the Hill estimator.

9.5 The Hill estimator

255

Recall that T (t) = t−α and set γ = 1/α. Then  ∞  ∞ T (t) dt = − γ= log(t)T  (t)dt . t 1 1 Replacing T with its empirical approximation, n © 1 ¶ Tn (s) = 1 Xj > X(n:n−k) s , k j=1

noting that the random function Tn has jumps of size −1/k at t = Xj /X(n:n−k) and recalling that the integration is over t ≥ 1, yields an estimate of γ:  ∞ Tn (t) dt = − = log(t)Tn (dt) t 1 1 n k 1 1 log+ (Xj /X(n:n−k) ) = log(Xj /X(n:n−k) ) . = k j=1 k j=1 

γ n,k



(9.5.1)

This estimator is called the Hill estimator.

9.5.1 Consistency The weak convergence of the tail empirical distribution function implies the consistency of the Hill estimator. P

Theorem 9.5.1 If (9.1.1) holds, then γ n,k −→ γ.

Proof. Note first that  γ n,k =

1 X(n:n−k) un

 ∞  Tn (s) Tn (s) ds + ds . s s 1 P

If (9.1.1) holds, then Proposition 9.1.3 implies that u−1 n X(n:n−k) −→ 1. Thus, the first integral above is oP (1). As explained in Section 9.1, the convergence of Tn to T is uniform on compact intervals of [1, ∞). Therefore, for all t > 0,  t   t Tn (s) T (s) P ds −→ ds. s s 1 1 t Since limt→∞ 1 T (s) s ds = γ, the consistency of the Hill estimator will be obtained if we prove that for all η > 0,

256

9 The tail empirical and quantile processes

Ç lim lim sup P

t→∞ n→∞

Tn (s) ds > η s



t

å =0.

(9.5.2)

Since E[Tn (s)] = nk −1 F (un s) and k = nF (un ), we have ô ñ ∞   ∞ Tn (s) n ∞ F (un s) ds = ds . F (un s)ds = E s k t sF (un ) t t Fix  ∈ (0, α). By Potter’s bound (Proposition 1.4.2), we have ô ñ ∞  ∞ Tn (s) ds ≤ cst E s−α+−1 ds = O(t−α+ ) . s t t 

This proves (9.5.2) by Markov’s inequality.

9.5.2 Asymptotic normality To study its asymptotic behavior, we need to extend the anticlustering conditions S(rn , un ) as well as the bias condition (9.3.3). We will say that the condition Slog (rn , un ) holds if for all s, t > 0, ï Å ã Å ãò rn  X0 Xj 1 lim lim sup E log+ log+ =0. m→∞ n→∞ P(X0 > un ) sun tun j=m (Slog (rn , un )) Note that each term in the above summation is finite and by regular variation (see Problem 1.11), we have  ∞ E[log2+ (X0 /un )] =α log2 (t)t−α−1 dt < ∞ . lim n→∞ F (un ) 1 Theorem 9.5.2 Let {Xj , j ∈ Z} be a stationary regularly varying sequence with an absolutely continuous marginal distribution F . Let k be an intermediate sequence and let the sequence {un } be defined by k = nF (un ). Assume that conditions R(rn , un ), β(rn , n ), S(rn , un ), Slog (rn , un ), (9.3.3) and √  ∞ Tn (t) − T (t) dt = 0 (9.5.3) lim k n→∞ t 1 hold. Then √

d

k { γn,k − γ} −→ γ

 0

1

T(t−γ ) − tT(1) dt . t

9.5 The Hill estimator

257

The limiting distribution is Gaussian with mean zero and variance  γ2 P(Yj > 1 | Y0 > 1) . (9.5.4) j∈Z

If moreover the series {Xj } is extremally independent, then √

d

k { γn,k − γ} −→ N(0, γ 2 ) .

Proof. Note that Tn (t) = Tn (tX(n:n−k) /un ). Therefore  ∞  ∞ dt dt Tn (tX(n:n−k) /un ) = X Tn (t) γ n,k = (n:n−k) t t 1 un  1  ∞ dt dt Tn (t) + X Tn (t) . = (n:n−k) t t 1 u n

This yields √ k( γn,k − γ)  ∞ √  ∞ Tn (t) − T (t) √  1 dt  n (t) dt + k Tn (t) . (9.5.5) = dt + k X T (n:n−k) t t t 1 1 u n

P The middle term goes to zero under Condition (9.5.3). Since Tn (1) −→ 1, applying the joint convergence of Theorem 9.3.1, we expect that a continuous mapping argument yields  ∞ ã √ Å √  n (t) dt − k X(n:n−k) − 1 k( γn,k − γ) ∼ T t un 1  1  ∞ dt dt d −→ T(t) − γT(1) = γ T(t−γ ) − γT(1) , t t 1 0

which is the announced limiting distribution. (Here and below, ∼ means that both sides have the same limiting distribution.) To prove rigorously these approximations, we must prove that  ∞  ∞ dt d  n (t) dt −→ (9.5.6a) T(t) , T t t 1 1 Å ã √  1 X(n:n−k) dt √ Tn (t) ∼ k k X −1 . (9.5.6b) (n:n−k) t un u n

P

For (9.5.6b) set ζn = X(n:n−k) /un . Since ζn −→ 1, we can assume that for n large enough, ζn ∈ (1 − , 1 + ) for some  ∈ (0, 1). Hence, Theorem 9.3.1 implies that

258

9 The tail empirical and quantile processes

√  1 k X

(n:n−k) un

dt Tn (t) = t

√  1 k X

+



T (t)

(n:n−k) un

k

sup t∈(1−ζn ,1+ζn )

=

√  1 k X

(n:n−k) un

T (t)

dt t |Tn (t) − T (t)|(ζn − 1) dt + oP (1) . t

The delta method and T (1) = 1 finish the proof of (9.5.6b). We now turn to the proof of (9.5.6a). For a fixed A > 0, Theorem 9.2.10 yields A   n (t)dt converges weakly to A t−1 T(t)dt as n → ∞. Moreover, that 1 t−1 T 1  ∞ −1 P  † defined in (9.2.3b) and t T(t)dt −→ 0 as A → ∞. Recall the process T n A write  ∞  ∞  n (t) dt , J† =  † (t) dt . Jn,A = T T n n,A t t A A The convergence (9.5.6a) will be proved if we show that for every  > 0, lim lim sup P (|Jn,A | > ) = 0 .

A→∞ n→∞

Application of Lemma E.3.4 with h = 1{|Jn,A | > } implies Ä ä P (|Jn,A | > ) ≤ P |J†n,A | >  + mn βrn , with mn = [n/rn ]. The last term tends to zero as n tends to infinity by β(rn , n ) (recall that n ≤ rn ). Therefore, to finish the proof it suffices to show that ä Ä (9.5.7) lim lim sup var J†n,A = 0 . A→∞ n→∞

By independence of the blocks, we have r  † mn n (X(i−1)r +j /un )∨A Ä † ä   n dt 1 var Jn,A = var t nF (un ) i=1 j=1 A ⎡ ⎤ Å ã 2 rn  X 1 j ⎦ ≤ E⎣ log+ Aun rn F (un ) j=1 ≤2

rn  E[log+ (X0 /(Aun )) log+ (Xj /(Aun ))] j=0

F (un )

We have by regular variation (see Problem 1.11)

.

9.5 The Hill estimator

lim

m−1 

n→∞

j=0

259

E[log2+ (X0 /(Aun ))] E[log+ (X0 /(Aun )) log+ (Xj /(Aun ))] ≤ m lim n→∞ F (un ) F (un )  ∞ =m log2 (t)αt−α−1 dt . A

The latter term vanishes as A → ∞. Also, for each A > 0, rn  E[log+ (X0 /(Aun )) log+ (Xj /(Aun ))] =0 m→∞ n→∞ F (un ) j=m

lim lim

by Slog (rn , un ). This proves (9.5.7) and concludes the proof of the asymptotic normality of the Hill estimator. Finally, we prove the expression (9.5.4) of the limiting variance in terms of the tail process. Let W and B be the processes defined on [0, 1] by W(t) = T(t−1/α ) and B(t) = T(t−1/α ) − tT(1), t ∈ [0, 1]. If the series {Xj } is extremally independent then W is the standard Brownian motion and B is the standard  1Brownian bridge. The limiting distribution of the Hill estimator is Z = α−1 0 B(t)t−1 dt. Recall that the covariance of the process T is given by (9.2.5) and let c∗ denote the autocovariance function of W, i.e.  c∗ (s, t) = s P(Yj > (s/t)1/α | Y0 > 1) . (9.5.8) j∈Z

Since B(t) = W(t) − tW(1), we obtain 



c∗ (s, t) − sc∗ (t, 1) − tc∗ (s, 1) + stc∗ (1, 1) dsdt st 0 0    1 ∗  1 1 ∗ c (s, t) c (s, 1) = γ 2 c∗ (1, 1) + dsdt − 2 ds . st s 0 0 0 1

1

var(Z) = γ 2

Since c∗ (s, t) = c∗ (t, s) = tc∗ (s/t, 1), we further have  0

1

 0

1

 1 t ∗ c∗ (s, t) c (s/t, 1) dsdt = 2 dsdt st s 0 0 0 0  1 1 ∗  1 ∗ c (u, 1) c (u, 1) =2 dudt = 2 du . u u 0 0 0

c∗ (s, t) dsdt = 2 st



1



t

Therefore var(Z) = γ 2 c∗ (1, 1) as announced and reduces to γ 2 in case of extremal independence. 

Remark 9.5.3 In the course of the proof of Theorem 9.5.2, we have proved that

260

9 The tail empirical and quantile processes

 1



 n (s)ds −→ s−1 T d





s−1 T(s)ds .

1

Cf. (9.5.6a). Thus, if we define γ un =

n 1 log (Xj /un ) , k j=1 +

(9.5.9)

we obtain under the assumptions of Theorem 9.5.2 that  1  ∞ √ T(s) dt d ds = γ k{ γun − γ} −→ W(t) , s t 1 0 where W is as above: it is a Gaussian process with covariance function c∗ given in (9.5.8). Thus, the computations for the variance in the previous proof yield   1 1 ∗  1  1 ∗ c (s, t) c (u, 1) −1 −γ dsdt = 2 du . t T(t )dt = var st u 0 0 0 0 In case of extremal independence, the above integral is equal to 1 and the asymptotic variance of γ un is 2γ 2 . ⊕ Remark 9.5.4 Instead of using a deterministic normalization for the sum and a random scaling of the observation as in the Hill estimator, we can define an estimator with a random normalization related to a deterministic scaling. Define n  1 γ un log+ (Xj /un ) = γ un = n . (9.5.10)  1{X > u } j n j=1 Tn (1) j=1 Applying Theorem 9.2.10 and the previous remark, we easily obtain that  1  ∞ √ T(s) W(s) − sW(1) d ds − γT(1) = γ ds . k { γun − γ} −→ s s 1 0 This is the same limit as in Theorem 9.5.2.



Remark 9.5.5 Conditions under which a quantitative estimate of the sequences k which satisfy (9.5.3) are called second-order conditions. These conditions are not really useful in practice since they involve unknown parameters. Examples of such conditions are given in Problems 9.6 and 9.7. ⊕

9.6 Problems 9.1 Consider the MA(1) process of Example 5.2.10. Show that the series in (9.5.4) is equal to ™ ß 1 ∧ φα 1{φ > 0} . α−2 1 + 2 1 + φα

9.6 Problems

261

9.2 Consider the AR(1) process of Example 5.2.12. Show that the series in (9.5.4) is equal to α−2

1 + ρα . 1 − ρα

9.3 Let {Xj , j ∈ Z} be a sequence of i.i.d. random variables with tail F (x) = x−1/γ . Prove directly that √

d

n { γn,k − γ} −→ N(0, γ 2 ) ,

where γ n,k is defined in (9.5.1). 9.4 Let {Xj , j ∈ Z} be a stationary regularly varying sequence with an absolutely continuous marginal distribution F . Let k be an intermediate sequence P (depending on n). Assume that α > 1 and that Tn (s) −→ T (s) for all s > 0. Show that k  X(n:n−j+1) P α “n,k = 1 Θ . −→ k j=1 X(n:n−k) α−1

9.5 Let {Xj , j ∈ Z} be a non-negative stationary regularly varying time series and define (2)

k 1 (log(Xj /X(n:n−k) ))2 , k j=1  −1 (2) γn,k )2 γ n,k − 2( ( γn,k )2 1 1− , =γ n,k + 1 − = γ  + n,k (2) (2) 2 γ n,k 2( γn,k − ( γn,k )2 )

γ n,k = (M )

γ n,k

where γ n,k is defined in (9.5.1). Assume that the conditions of Theorem 9.5.2 hold and √  ∞ Tn (t) − T (t) log(t)dt = 0 . (9.6.1) lim k n→∞ t 1 Prove that   ∞ © d √ ¶ (M ) dt −2 ∞ dt −1 {T(t)−tT(1)} +γ T(t) log(t) k γ 1=n,k − γ −→ (1 − 2γ ) t t 1 1  1  1 W(s) − sW(1) W(s) log(s) ds − ds , = (γ − 2) s s 0 0 with W(s) = T(s−1/α ). Prove that in the i.i.d. case the limiting variance is 1 + γ2.

262

9 The tail empirical and quantile processes

9.6 Let F be a continuous distribution function on R and assume that there exists c, α > 0, and β > α such that   (9.6.2) sup tβ F (t) − ct−α  < ∞ . t≥1

Prove that k = o(nρ ) with ρ = 2(β − α)/(2β − α) is a suitable choice for (9.5.3) to hold. 9.7 Let F be a continuous distribution function on R and assume that there exists c, α, β > 0 such that   (9.6.3) sup tβ F (t) − c log(t)t−α  < ∞ . t≥1

Deduce that k = o(log2 (n)) is a suitable choice (9.5.3) to hold.

9.7 Bibliographical notes The estimation of the tail index (or more generally of the extreme value index) is the statistical problem with the longest history in extreme value theory. For an exhaustive list of reference in the case of i.i.d. random variables, see [dHF06]. The fact that consistency of the tail empirical process implies that of order statistics and the Hill estimators was noticed by [RS98]. Consistency of the tail empirical measure for an AR(p) process with heavy-tailed innovation was obtained by [RS97]. Central and functional central limit theorems for the tail and quantile empirical process of i.i.d. data were obtained, among other, by [CCHM86], [Ein92], [Dre98a, Dre98b]. These references use strong approximation techniques to obtain stronger results such as weighted approximations. [Roo95, Roo09] obtained the functional central limit theorem for the tail empirical process under α and β-mixing. [Dre00, Dre02] strengthened these results by strong approximation techniques for η-mixing sequences. The Hill estimator was introduced in [Hil75]. For i.i.d. data, asymptotic normality of the Hill estimator was proved by [Hal82], [DR84]. [HT85], [CM85] among others. [HW84] proved that the Hill estimator is optimal within certain classes of distributions. For stationary sequences with a regularly varying marginal distribution, the central limit for the Hill estimator was proved under a β-mixing condition (which applies to finite order moving averages) by [Hsi91b]. In [Dre98a, Dre98b], the Hill and other estimators of the tail index are expressed as functionals of the tail quantile process and the limiting theory for the latter induces the limiting theory for these estimators.

9.7 Bibliographical notes

263

The approach to the Hill estimator presented here differs mainly from the literature in the use of the tail process and condition S(rn , un ). This makes both the assumptions and the results arguably more transparent. The βmixing assumption makes the technical parts of the proofs involved with the temporal dependence relatively easy thanks to the coupling property, but some form of mixing is inevitable. It seems that S(rn , un ) first appeared in [Smi92] in the study of the extremal index of Markov chains. A slightly weaker version, called condition R was used in [O’B74], also in relation to the extremal index. The use of this condition in the context of the tail empirical process is due to [KSW18]. Many other estimators of the tail index have been introduced, some with better practical properties than the Hill estimator, some just for the fun of it. Problem 9.5 extends to time series the estimator introduced for i.i.d. random variables by [DEdH89]. Still for i.i.d. data, [CDM85] consider a kernel estimator of the tail index. In [DM98a] its asymptotic normality was proved for infinite order moving averages. An extremely important part of the literature is entirely ignored here: that which is concerned with practical methods for choosing the number of order statistics used in the statistic under consideration, particularly the Hill estimator. Better choices and bias reduction techniques allow to obtain better rates of convergence and confidence intervals. These methods usually rely on second order conditions which are seldom checkable. The only reference we know of that try to define and study really data-driven methods are [DK98] and [BT15]. The choice of the number or order statistics is related to the marginal distribution F , not to dependence structure and hence we do not discuss it here.

10 Estimation of cluster functionals

Let rn be an intermediate sequence and un be a scaling sequence. Let mn = [n/rn ]. In Chapter 6, we introduced the measure ν ∗n,rn defined on (Rd )Z (see (6.2.1)) by ν ∗n,rn =

  1 E δu−1 . (X ,...,X ) 1 rn n rn P(|X 0 | > un )

The empirical counterpart of ν ∗n,rn is given by  ∗n,rn = ν

n  1 δ −1 . nP(|X 0 | > un ) i=1 un (X (i−1)rn +1 ,...,X irn )

m

(10.0.1)

 ∗n,rn is a random element of the set M(˜0 (Rd )) of Borel meaThe measure ν sures on the space ˜0 (Rd ) of shift-equivalent sequences converging to 0 at ±∞ (see Definition 6.2.1), which are finite on sets separated from zero. We  ∗n,rn the empirical cluster measure. Theorem 6.2.5 shows that if will call ν condition AC(rn , un ) (recalled below) holds, the sequence ν ∗n,rn converges vaguely# to a measure ν ∗ in M(˜0 (Rd )), called the cluster measure (see Definition 5.4.11). In this chapter we will investigate the weak convergence of  ∗n,rn to ν ∗ . ν For specific functionals, the empirical cluster measure reduces to the tail  n defined on Rd \ {0} by empirical measure, that is, the random measure λ n = λ

 1 δ −1 . nP(|X 0 | > un ) j=1 un X j n

Indeed, for a function φ defined on Rd , let Hφ be formally defined on (Rd )Z by  φ(xj ) , x = (xj )j∈Z . Hφ (x) = j∈Z

© Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 10

265

266

10 Estimation of cluster functionals

The series has finitely many nonzero terms if x ∈ 0 (Rd ) and if the support of φ is separated from 0 in Rd . Then,  ∗n,rn (Hφ ) = ν =

n  1 Hφ (u−1 n (X (i−1)rn +1 , . . . , X irn )) nP(|X 0 | > un ) i=1

m

m n rn  1 φ(u−1 n Xj) , nP(|X 0 | > un ) j=1

and  n (φ) = ν  ∗n,rn (Hφ ) + Rn (φ) , λ

(10.0.2)

where Rn (φ) =

n  1 φ(u−1 n Xj) nP(|X 0 | > un ) j=m r +1 n n

is an edge effect that will be shown to be negligible. Furthermore, by regular variation of X 0 , we have n] = E[λ

] v# E[δu−1 n X0 −→ ν 0 , P(|X 0 | > un )

in M(Rd ) .

 n to ν 0 in the space Thus, we will investigate the weak convergence of λ d d M(R ) of Borel measures on R which are finite on sets separated from zero. This chapter is parallel to Chapter 9. In Section 10.1, we will investigate the  n to ν 0 will be  ∗n,rn to ν ∗ . The weak convergence of λ weak convergence of ν obtained as a consequence of the previous one. In Section 10.2, we will obtain the central limit theorem for the finitedimensional distributions of the centered and rescaled empirical cluster process   n = nP(|X 0 | > un ){ ν ∗n,rn − E[ ν ∗n,rn ]} G  = nP(|X 0 | > un ){ ν ∗n,rn − ν ∗n,rn } . (10.0.3) By the considerations above, the asymptotic behavior of the empirical cluster process will yield that of the functional tail empirical process   n − E[λ  n ]} .  n = nP(|X 0 | > un ){λ A In Section 10.3, we will obtain a functional version of the previous convergence, which will enable us to prove central limit theorems for feasible estimators, that is estimators where the deterministic threshold un is replaced by an order statistic. Several examples will be studied in detail in Section 10.4.

10.1 Consistency of the empirical cluster measure

267

Throughout this chapter we will use the same assumptions as in Chapter 9 which we recall for convenience: lim un = ∞ ,

n→∞

lim nP(|X 0 | > un ) = ∞ ,

n→∞

lim rn P(|X 0 | > un ) = 0 ,

n→∞

(R(rn , un )) 1 n n = lim = lim βn = 0 . lim n→∞ n n→∞ rn n→∞ rn (β(rn , n )) We will alsouse the anticlustering conditions already introduced. For each x, y > 0,   max |X j | > un x | |X 0 | > un y = 0 , (AC(rn , un )) lim lim sup P m→∞ n→∞

m≤|j|≤rn

lim lim sup

m→∞ n→∞

rn  P(|X 0 | > un x, |X j | > un y) =0. P(|X 0 | > un ) j=m

(S(rn , un ))

10.1 Consistency of the empirical cluster measure Define μ n,rn =

1 δ −1 . nP(|X 0 | > un ) un (X 1 ,...,X rn ) v#

Lemma 10.1.1 If R(rn , un ) and ν ∗n,rn −→ ν ∗ , then  ∗n,rn =⇒ ν ∗ ν w

in M(˜0 (Rd ))

(10.1.1)

if and only if



rn ∗ n =0, lim E[e−ν n,rn (H) ] − E[e−μn,rn (H) ] n→∞

(10.1.2)

for all shift-invariant bounded Lipschitz continuous functions H defined on 0 (Rd ) with support separated from zero. Proof. We will prove that under the assumptions of Lemma 10.1.1,

rn ∗ n = e−ν (H) , (10.1.3) lim E[e−μn,rn (H) ] n→∞

for all shift-invariant bounded Lipschitz continuous functions H with support separated from zero. By Theorem 7.1.17, the convergence (10.1.1) is equivalent to

268

10 Estimation of cluster functionals ∗

lim E[e−ν n,rn (H) ] = e−ν

n→∞



(H)

,

(10.1.4)

for the same functions. If (10.1.3) holds, then (10.1.4) is trivially equivalent to (10.1.2). We now prove (10.1.3) which is equivalent to lim nrn −1 E[1 − e−μn,rn (H) ] = ν ∗ (H) .

n→∞

Fix A > 0. Set vn = nP(|X 0 | > un ). Then,   nrn −1 E {1 − e−μn,rn (H) − μ n,rn (H)} 

 n,rn (H)}1 μ n,rn (|H|) ≤ vn−1 A = nrn −1 E {1 − e−μn,rn (H) − μ 

 n,rn (H)}1 μ n,rn (|H|) > vn−1 A = I + II . + nrn −1 E {1 − e−μn,rn (H) − μ Using the bound |1 − e−x − x| ≤ x2 ex+ for x ∈ R and the fact that H is bounded and there exists  > 0 such that H(x) = 0 if x∗ ≤ , we obtain

I ≤ cst nrn −1 E[ μ2n,rn (H)1 μ n,rn (|H|) ≤ vn−1 A ] ≤

∗ cst P(X 1,rn > un ) μn,rn (|H|)] cst AE[ ≤ = O(vn−1 ) = o(1) , vn rn P(|X 0 | > un ) vn rn P(|X 0 | > un )

for every A > 0 by Corollary 6.2.6 and R(rn , un ). To deal with II, we use the bound |1 − e−x − x| ≤ 2|x|ex+ . Since H is bounded, we obtain, for large enough A,

 2n  E μ n,rn (|H|)1 μ n,rn (|H|) > vn−1 A rn cst E [|H|(X 1,rn /un )1{|H|(X 1,rn /un ) > A}] = 0 . = rn P(|X 0 | > un )   We have thus proved that lim supn→∞ nrn −1 E 1 − e−μn,rn (H) − μ n,rn (H) = 0. Therefore, applying Theorem 6.2.5, we finally obtain  n  n lim E 1 − e−μn,rn (H) = lim E [ μn,rn (H)] = ν ∗ (H) . n→∞ rn n→∞ rn II ≤ cst

This proves (10.1.3).



We now provide conditions for the convergence of the empirical cluster measure. As in Chapters 7 and 9, we consider two types of conditions: β-mixing and m-dependent approximations.

10.1 Consistency of the empirical cluster measure

269

Theorem 10.1.2 Let {X j , j ∈ Z} be a stationary, regularly varying Rd valued time series. Assume that R(rn , un ), AC(rn , un ), and β(rn , n ) hold. Then (10.1.1) holds.

Proof. The proof is based on the blocking method and similar to the proof of Lemma 7.4.1. We only need to prove that β(rn , n ) implies (10.1.2). Let {X †j , j ∈ Z} be a sequence such that the blocks X †(i−1)rn +1,irn , i ≥ 1 are i.i.d. and have the same distribution as X 1,rn = (X 1 , . . . , X rn ). Set again vn = nP(|X 0 | > un ). Additionally to  ∗n,rn (H) = ν

mn 1  H(X (i−1)rn +1,irn /un ) vn i=1

we consider  ∗n,rn −n (H) = ν

mn 1  H(X (i−1)rn +1,irn −n /un ) . vn i=1

Define also  †n,rn (H) = ν  †n,rn −n (H) = ν

mn 1  H(X †(i−1)rn +1,irn /un ) , vn i=1 mn 1  H(X †(i−1)rn +1,irn −n /un ) . vn i=1

Let H be a shift-invariant bounded Lipschitz continuous function with support separated from zero. By Lemma E.3.4, the β-mixing condition β(rn , n ) implies that     ∗ n − ν† (H) β = o(1) . E e−ν n,rn −n (H) − E e n,rn −n ≤ mn βn ≤ rn n We must now prove that the small blocks of length n , {X (i−1)rn +j , X †(i−1)rn +j , j = rn − n + 1, . . . , rn , i = 1, . . . , mn } , are negligible. By stationarity of the blocks and within the blocks and since H is bounded and Lipschitz continuous with respect to the uniform norm and with support separated from zero, there exists  > 0 such that   †  n,rn (H) − ν  ∗n,rn −n (H) + ν  †n,rn −n (H)  ∗n,rn (H) − ν E ν   mn =O P(X ∗1,n > un ) . vn

270

10 Estimation of cluster functionals

Since Condition AC(rn , un ) implies AC(n , un ), we have by Corollary 6.2.6 and β(rn , n ),   n mn 1 n ∗ ∗ P(X 1,n > un ) P(X 1,n > un ) ≤ =O = o(1) . vn n P(|X 0 | > un ) rn rn Thus the small blocks are negligible. This yields     ∗ † lim E e−ν n,rn (H) − E e−ν n,rn (H) = 0 . n→∞

Finally,   † E e−ν n,rn (H) = (E[e−μn,rn (H) ])mn by independence of the blocks X †(i−1)rn +1,rn , i = 1, . . . , mn . Thus, since mn = [n/rn ], (10.1.2) holds.  We now turn to m-dependent approximations. Consider a sequence of regu(m) larly varying m-dependent sequences {X j , j ∈ Z}, m ≥ 1. We write X n,i = u−1 n X (i−1)rn +1,irn , i = 1, . . . , mn (m)

(m)

and ν ∗(m) n,rn =

  1 E δX (m) .

(m) n,1 rn P X 0 > un

(10.1.5)

Theorem 10.1.3 Let {X j , j ∈ Z} be a stationary, regularly varying Rd valued time series with tail process Y such that P(Y ∈ 0 (Rd )) = 1. (m) Assume that R(rn , un ) holds. Let {X j , j ∈ Z}, m ≥ 1, be regularly (m)

varying m-dependent time series such that {(X j , X j ), j ∈ Z} is stationary for all m ≥ 1 and for all  > 0,

(m) P X 0 − X 0 > un  =0. (10.1.6) lim lim sup m→∞ n→∞ P(|X 0 | > un ) Then (10.1.1) holds.

(m)

Proof. Write vn = nP(|X 0 | > un ), vn Proposition 5.2.5 that (10.1.6) implies



(m) = nP X 0 > un . We know by

10.1 Consistency of the empirical cluster measure

lim lim sup |vn /vn(m) − 1| = 0 .

m→∞ n→∞

271

(10.1.7)

Define  ∗(m) ν n,rn =

n  1 δ (m) . nP(|X 0 | > un ) i=1 X n,i

m

Since the time series X (m) is m-dependent, AC(rn , un ) holds for all sequences rn , un satisfying R(rn , un ) by Lemma 6.1.3. Since an m-dependent sequence is also β-mixing with geometric rate, we also have by Theorem 10.1.2 that for every m ≥ 1, vn ∗(m) w ∗(m)  =⇒ ν , n→∞. (10.1.8) ν (m) n,rn vn We will prove that  ∗n,rn (H)| > η) = 0 , ν ∗(m) lim sup lim P(| n,rn (H) − ν m→∞ n→∞

(10.1.9)

for all η > 0 and all shift-invariant bounded Lipschitz continuous functions with support separated from zero. By Theorem 7.1.17 and the triangular argument Lemma A.2.10, this will ensure that (10.1.1) holds. Let  > 0 be such that H(x∗ ) = 0 if x∗ ≤ . Without loss of generality, we assume that H is 1-Lipschitz. Fix ζ ∈ (0, /2). Write X n,i = u−1 n X (i−1)rn +1,irn . Then  ∗n,rn (H)| | ν ∗(m) n,rn (H) − ν mn 1  (m) ≤ |H(X n,i ) − H(X n,i )| vn i=1

  mn 1  (m) (m) |H(X n,i ) − H(X n,i )|1 max |X j − X j | ≤ ζun vn i=1 j=(i−1)rn +1,...,irn   m n 1  (m) (m) |H(X n,i ) − H(X n,i )|1 max |X j − X j | > ζun + vn i=1 j=(i−1)rn +1,...,irn



mn   ζ  (m)∗ 1 |X (i−1)rn +1,irn | > un /2 vn i=1  mn  cst  (m) + 1 max |X j − X j | > ζun vn i=1 j=(i−1)rn +1,...,irn  mn  1  c (m) ∗(m)  1 max |X j − X j | > ζun . = ζ ν n,rn (B (0, /2)) + cst vn i=1 j=(i−1)rn +1,...,irn



By (10.1.7), for arbitrary δ > 0 and n, m large enough (depending on δ), (m) vn /vn ≤ (1 + δ). Hence,

272

10 Estimation of cluster functionals

1 vn

mn 

(m)

|H(X n,i )−H(X n,i )|

i=1

≤ ζ(1 + δ) + cst

vn (m) vn

 ∗(m) ν n,rn (B (0, /2)) c

 mn  1  (m) 1 max |X j − X j | > ζun . vn i=1 j=(i−1)rn +1,...,irn

Thus, for η > 0, applying Markov inequality to the second term, we obtain  ∗n,rn (H)| > η) P(| ν ∗(m) n,rn (H) − ν   n vn ∗(m) c 1 cst  (m)  n,rn (B (0, /2))>η/(2(1 + δ)ζ) + ≤P (m) ν P(|X j −X j |>ζun ) . η vn j=1 vn v#

Applying (10.1.6), (10.1.8), and the convergence ν ∗ (m) −→ ν ∗ (as m → ∞) which follows from Lemma 6.2.7, we obtain  ∗n,rn (H)| > η) lim lim sup P(| ν ∗(m) n,rn (H) − ν

m→∞ n→∞

≤ lim lim sup P( ν ∗(m) n,rn (B (0, /2)) > η/(2(1 + δ)ζ)) c

m→∞ n→∞

c

= P(ν ∗ (B (0, /2)) > η/(2(1 + δ)ζ)) . The right-hand side is equal to zero if ζ is small enough. This proves (10.1.9) and Theorem 10.1.3.  Consistency of the tail empirical measure Recall that we have defined the tail empirical measure n = λ

 1 δ −1 . nP(|X 0 | > un ) j=1 un X j n

 n described in (10.0.2), the following  ∗n,rn and λ Given the relation between ν result is expected.  ∗n,rn =⇒ ν ∗ , Proposition 10.1.4 If R(rn , un ) and AC(rn , un ) hold and ν w  n =⇒ ν 0 . then λ w

 ∗n,rn =⇒ ν ∗ , then Proof. We will prove that if ν w

10.1 Consistency of the empirical cluster measure 

lim E[e−λ n (φ) ] = e−ν 0 (φ) ,

n→∞

273

(10.1.10)

for all bounded Lipschitz continuous functions φ on Rd with support bounded away from zero. Write vn = nP(|X 0 | > un ). Recall that Hφ is defined on 0 (Rd ) by Hφ (x) =  j∈Z φ(xj ). Then 1  n (φ) = ν  ∗n,rn (Hφ ) + λ vn

n 

φ(u−1 n Xj) .

j=mn rn +1

Let the last term be denoted by Rn (φ). Since φ is bounded with support bounded away from zero, there exists  > 0 such that rn P(|X 0 | > un ) →0. nP(|X 0 | > un )

E[Rn (φ)] ≤ cst Thus (10.1.10) is equivalent to ∗

lim E[e−ν n,rn (Hφ ) ] = e−ν 0 (φ) .

n→∞

(10.1.11)

Since ν ∗ (Hφ ) = ν 0 (φ) (see Problem 5.26), we would like to use the weak  ∗n,rn to ν ∗ to conclude (10.1.11). This is not directly possible convergence of ν since Hφ is not bounded, so a truncation argument is needed. Fix M > 0. P

 ∗n,rn (Hφ ∧ M ) −→ ν ∗ (Hφ ∧ M ). Furthermore, by monotone converThen, ν gence, limM →∞ ν ∗ (Hφ ∧ M ) = ν ∗ (Hφ ) = ν 0 (φ). Thus, by the triangular argument Lemma A.2.10, the convergence (10.1.11) will be a consequence of  ∗n,rn (Hφ 1{Hφ > M }) = 0 . lim lim sup ν

M →∞ n→∞

Write Sn = arity,

rn j=1

¯ φ(u−1 n X j ) and Sn =

rn j=−rn

(10.1.12)

φ(u−1 n X j ). Then, by station-

rn −1 mn j=1 E[φ(un X j )1{Sn > M }] n P(|X 0 | > un )

−1 E[φ(un X 0 )1 S¯n > M ] . ≤ P(|X 0 | > un )

E[ ν ∗n,rn (Hφ 1{Hφ > M })] =

Since φ has support bounded away from zero, there exists  > 0 such that

¯ E[φ(u−1 P(|X 0 | > un , S¯n > M ) n X 0 )1 Sn > M ] ≤ cst . P(|X 0 | > un ) P(|X 0 | > un ) Since AC(rn , un ) holds, we have by Theorem 6.1.4,

274

10 Estimation of cluster functionals

⎛ ⎞  ¯ P(|X 0 | > un , Sn > M ) → −α P ⎝ φ(Y j ) > M ⎠ . P(|X 0 | > un ) j∈Z

 Since the sum j∈Z φ(Y j ) is almost surely finite, the last term tends to zero as M → ∞. Thus (10.1.12) holds and the proof is concluded.  Remark 10.1.5 Let φs (x) = 1{|x| > s}, x ∈ Rd , s > 0. Then φs is almost surely continuous with respect to ν 0 and bounded with support separated from 0, thus Proposition 10.1.4 yields  n (φs ) = λ

 1 P 1{|X j | > un s} −→ ν 0 (φs ) = s−α nP(|X 0 | > un ) j=1 n



and we recover consistency result in Proposition 9.1.1. Convergence of intermediate order statistics

Let X be a stationary time series and let F0 denote the distribution function of |X 0 |. Let |X|(n:1) ≤ · · · ≤ |X|(n:n) be the order statistics of the sequence of norms. Let k be an intermediate sequence (implicitly indexed by n as usual) of integers and let un = F0← (1 − k/n). Define  ∗n,rn = ν

mn 1 −1 δ , k i=1 |X |(n:n−k) (X (i−1)rn +1 ,...,X irn )

n  n = 1 −1 δ . λ k j=1 |X |(n:n−k) X j

(10.1.13) (10.1.14)

 ∗n,rn =⇒ ν ∗ , Corollary 10.1.6 If R(rn , un ) and AC(rn , un ) hold and ν w w P  n =⇒  ∗n,rn =⇒ ν ∗ and λ then u−1 ν 0. n |X|(n:n−k) −→ 1, ν w

Proof. The first statement is a consequence of Propositions 9.1.3 and 10.1.4 and Remark 10.1.5. The other statements are consequences of the continuity of the multiplication of points, see Lemma B.2.4.  Consistency of blocks estimator Consider the blocks estimator of a cluster statistic related to a shift-invariant map H on 0 (Rd ):

10.1 Consistency of the empirical cluster measure

 ∗n,rn (H) = ν

1 k

mn 

H(X (i−1)rn +1,irn /|X|(n:n−k) ) .

275

(10.1.15)

i=1

Consistency of the empirical cluster measure and its extensions give consis ∗n,rn (H). tency ν Example 10.1.7 (Estimation of the extremal index) Consider a univariate non-negative regularly varying time series. Let rn and un be sequences such that R(rn , un ) and AC(rn , un ) hold. Then we know from Chapter 6 that ∗ P(X1,r > un ) n . ϑ = lim n→∞ rn P(X0 > un ) Therefore we can define a blocks estimator of the (candidate) extremal index by [n/rn ]  1   ∗ ϑn,k = 1 X(i−1)rn +1,irn > X(n:n−k) . k i=1

 ∗n,rn (H) with H(x) = 1{x∗ > 1} for x ∈ RZ+ . Since H is That is, ϑn,k = ν bounded with bounded support and almost surely continuous with respect to P ν ∗ , we conclude that under the assumptions of Corollary 10.1.6, ϑn,k −→ ϑ.  We now consider functionals that may not have a bounded support in 0 \{0}. For this, we recall condition ANSJB of Definition 6.2.12: for each , s > 0  rn P( j=1 |X j |1{|X j | ≤ un } > sun ) lim lim sup = 0 . (ANSJB(rn , un )) →0 n→∞ rn P(|X 0 | > un )

Corollary 10.1.8 Let K be a shift-invariant functional defined on 1 such that K(0) = 0 and which is Lipschitz continuous with constant LK , i.e. for all x ∈ 1 (Rd ),  xj − y j . (10.1.16) |K(x) − K(y)| ≤ LK j∈Z

 ∗n,rn =⇒ ν ∗ , If R(rn , un ), AC(rn , un ), and ANSJB(rn , un ) hold and ν then w

P

 ∗n,rn (1{K > 1}) −→ ν ∗ (1{K > 1}) , ν  ∗n,rn (1{K ν

P



> 1}) −→ ν (1{K > 1}) .

(10.1.17a) (10.1.17b)

276

10 Estimation of cluster functionals

Proof. We only need to prove (10.1.17a). We proceed similarly to the proof of Theorem 6.2.16. Recall that vn = nP(|X 0 | > un ). Then  ∗n,rn (H) = ν  ∗n,rn (1{K > 1}) = ν

mn

1  1 K(X (i−1)rn +1,irn /un ) > 1 . vn i=1

Let  > 0 be arbitrary. As in (6.2.17) we consider the truncation operator T : T (x) = {xj 1{|x j |>} , j ∈ Z} .  ∗n,rn =⇒ ν ∗ , we have Since ν w

P

 ∗n,rn (1{K ◦ T > 1}) −→ ν ∗ (1{K ◦ T > 1}) .  ∗n,rn (H ◦ T ) = ν ν Fix η ∈ (0, 1) and ζ > 0. Let LK be as in (10.1.16) and choose  > 0 such that r n P( j=1 |X j | 1{|X j | ≤ cn } > ηcn /LK ) ≤ζ. lim sup rn P(|X 0 | > cn ) n→∞ Then  ∗n,rn (1{K > 1}) ν

 ∗n,rn (1{K ◦ T > 1 − η}) ≤ν mn

1  + 1 |K(X (i−1)rn +1,irn /un ) − K ◦ T (X (i−1)rn +1,irn /un )| > η vn i=1  ∗n,rn (1{K ◦ T > 1 − η}) + Rn, . =ν

Similarly  ∗n,rn (1{K > 1}) ≥ ν  ∗n,rn (1{K ◦ T > 1 + η}) + Rn, . ν Using (10.1.16) and ANSJB(rn , un ) gives  rn P( j=1 |X j | 1{|X j | ≤ cn } > ηcn /LK ) lim lim E[Rn, ] ≤ lim lim =0. →0 n→∞ →0 n→∞ rn P(|X 0 | > cn ) Since η is arbitrary, this finishes the proof.



Next, we extend Proposition 10.1.4 to functions φ on Rd that do not necessarily vanish at zero or are unbounded.

10.1 Consistency of the empirical cluster measure

277

Corollary 10.1.9 Let φ : Rd → R be a measurable function such that lim lim sup

→0 n→∞

E[|φ(X 0 /un )|1{|X 0 | ≤ un }] =0, P(|X 0 | > un )

and there exists δ > 0 such that   E |φ|1+δ (X 0 /un ) sup un ) n≥1

(10.1.18)

(10.1.19)

w  n =⇒ If R(rn , un ) and AC(rn , un ) hold and λ ν 0 then P  n (φ) −→ λ ν 0 (φ)

(10.1.20)

P  n (φ) −→ and λ ν 0 (φ).

Proof. Let again vn = nP(|X 0 | > un ). Assume first that φ is bounded. Define φ by φ (x) = φ(x)1{|x| > }. We have n n  1   n (φ) = 1  λ φ(u−1 φ(u−1 n X j ) = λn (φ ) + n X j )1{|X j | ≤ un } vn j=1 vn j=1

 n (φ ) + Rn, (φ) . =λ Since |Rn, (φ)| ≤ Rn, (|φ|) and by (10.1.18), E[|Rn, (φ)|] converges to 0 as w  n =⇒ n → ∞ and then  → 0. This and λ ν 0 imply (10.1.20) for bounded functions φ. We extend it to unbounded functions. For M > 0 define φM by φM (x) = φ(x) ∧ M . Then n 

−1  n (φ) = λ  n (φM ) + 1 λ φ(u−1 n X j )1 φ(un X j ) > M vn j=1

 n (φM ) + Rn,M (φ) . =λ Assumption (10.1.19) gives   E |φ|1+δ (X 0 /un ) 1 → 0 as M → ∞ . E[|Rn,M (φ)|] ≤ δ sup M n≥1 P(|X 0 | > un ) Application of the first part concludes the proof.



278

10 Estimation of cluster functionals

10.2 Central limit theorem for the empirical cluster process Assume that R(rn , un ) and AC(rn , un ) hold and let ν ∗ be the vague# limit of ν ∗n,rn which exists under the latter assumptions by Theorem 6.2.5. Recall that we defined in (10.0.3) the empirical cluster process   n (H) = nP(|X 0 | > un ){ ν ∗n,rn (H) − E[ ν ∗n,rn (H)]} . G Let G be a Gaussian random measure on L2 (ν ∗ ) with mean measure ν ∗ , that is a random process indexed by L2 (ν ∗ ) whose finite-dimensional distributions are Gaussian with zero mean and cov(G(H), G(H )) = ν ∗ (HH ) . We first state a central limit theorem under ad-hoc assumptions. Theorem 10.2.1 Let {X j , j ∈ Z} be a stationary, regularly varying Rd valued time series. Assume that R(rn , un ), AC(rn , un ), and β(rn , n ) hold. Let H be a linear subspace of L2 (ν ∗ ) such that: (BCLT1) For all H ∈ H, limn→∞ ν ∗n,rn (H) = ν ∗ (H). (BCLT2) For all H ∈ H and η > 0,

   lim ν ∗n,rn H 2 1 |H| > η nP(|X 0 | > un ) =0. n→∞

(10.2.1)

(BCLT3) There exist functions Kn : (Rd )n → R such that     H X 1,rn − H X 1,rn −n ≤ Kn (X r − +1,r ) n n n un un and

  E Kn2 (X 1,n ) =0. lim n→∞ rn P(|X 0 | > un )

fi.di.  n −→ Then G G on H.

 n is linear with respect to its argument and H is a linear space, Proof. Since G  n (H) for every H ∈ H. The we only need to prove the weak convergence of G joint convergence is obtained straightforwardly by the Wold device since every

10.2 Central limit theorem for the empirical cluster process

279

linear combination of functions of H is in H. As before, we consider blocks {(i − 1)rn + 1, . . . , irn }, 1 ≤ i ≤ mn , and a pseudo-sample (X †1 , . . . , X †n ) such that the blocks (X †(i−1)rn +1 , . . . , X irn ), i = 1, . . . , mn , are mutually

 †n,rn (H) be independent with the same distribution as (X 1 , . . . , X rn ). Let ν ∗  n,rn (H), but based on i.i.d. blocks. the statistic defined in the same way as ν ∗ † Clearly, E[ ν n,rn (H)] = E[ ν n,rn (H)]. Conditions (BCLT3) and β(rn , n ) allow us to use Lemma E.3.5 with 1 H(x1,rn /un ) , hn,rn (x1,rn ) =  nP(|X 0 | > un ) 1 Kn (x1,n un ) , gn (x1,n ) =  nP(|X 0 | > un ) to conclude that the limiting distributions (if exist) of     †n,rn (H) − E[ nP(|X 0 | > un ) ν ν ∗n,rn (H)] and that of

 ∗

 n,rn (H) − E[ nP(|X 0 | > un ) ν ν ∗n,rn (H)]

coincide. Conditions (BCLT1) and (BCLT2) are the conditions of the multivariate Lindeberg Central Limit Theorem E.2.4 which yields the asymptotic normality of the estimator based on the independent blocks. Note that centering is not needed in Condition (BCLT2) since the convergence of ν ∗n,rn (H) implies that E[H(X 1,rn /un )] = O(rn P(|X 0 | > un )) and rn P(|X 0 | > un ) → 0  by assumption R(rn , un ).

10.2.1 Central limit theorem for tail array sums In this section we consider the tail array sums n  1  n (φ) = λ φ (X j /un ) . nP(|X 0 | > un ) j=1 Write Hφ (x) =



φ(xj ) ,

j∈Z

we have (as in the proof of Proposition 10.1.4)  ∗n,rn (Hφ ) = ν =

[n/rn ]  1 Hφ (X (i−1)rn +1,irn /un ) nP(|X 0 | > un ) i=1 [n/rn ]  1 nP(|X 0 | > un ) i=1

irn 

 n (φ) − Rn (φ) φ(X j /un ) = λ

j=(i−1)rn +1

(10.2.2)

280

10 Estimation of cluster functionals

with n  1 Rn (φ) = φ(X j /un ) . nP(|X 0 | > un ) j=m r +1

(10.2.3)

n n

 n (φ)  ∗n,rn (Hφ ) and the tail array sum λ This shows that the blocks estimator ν are equal up to edge effects. The functionals Hφ are defined on 0 = 0 (Rd ), the set of Rd -valued sequences x = (xj )j∈Z such that lim|j|→∞ |xj | = 0. We also note that the summation functional Hφ is not bounded even if φ is a bounded function. Thus, to study  n (φ), we need some additional conditions. First, the limiting behavior of λ we introduce the appropriate class of functions. Let ψ : Rd → R+ be a measurable function with the following properties: • lim lim sup

→0 n→∞

E[ψ 2 (X 0 /un )1{|X 0 | ≤ un }] =0. P(|X 0 | > un )

• There exists δ > 0 such that   E ψ 2+δ (X 0 /un ) un ) • lim lim sup

m→∞ n→∞

(10.2.4)

(10.2.5)

     rn  X0 Xj 1 E ψ ψ =0. P(|X 0 | > un ) un un |j|=m

(Sψ (rn , un )) Condition (10.2.4) obviously holds if ψ vanishes in a neighborhood of 0 (that is ψ(x) = 0 whenever |x| < , x ∈ Rd ). If ψ is bounded and satisfies (10.2.4), then (10.2.5) holds for all δ > 0 by Proposition 2.1.9. Thus, if ψ is bounded and vanishes in a neighborhood of 0 then both (10.2.5) and (10.2.4) hold. Also, if ψ is bounded and vanishes in a neighborhood of 0 then Sψ (rn , un ) reduces to the classical anticlustering condition S(rn , un ). Definition 10.2.2 For a function ψ : Rd → R+ , Tψ is the linear space of measurable functions φ : Rd → R such that: • |φ| ≤ cst ψ (where cst may depend on φ); • φ is almost surely continuous with respect to ν 0 .

10.2 Central limit theorem for the empirical cluster process

281

The following theorem describes the finite-dimensional convergence of the process:     n (φ)] .  n (φ) − E[λ  n (φ) = nP(|X 0 | > un ) λ (10.2.6) A Recall that we have defined Hφ on 0 (Rd ) by Hφ (x) = 0 (Rd ). Let G be as in Theorem 10.2.1.

 j∈Z

φ(xj ), x ∈

Theorem 10.2.3 Let {X j , j ∈ Z} be a stationary, regularly varying Rd -valued such that R(rn , un ) and β(rn , n ) hold. Let ψ : Rd → R+ be a measurable function which satisfies conditions (10.2.4) and Sψ (rn , un ). Assume that either ψ is bounded or there exists δ ∈ (0, 1] such that (10.2.5) holds and rn =0. (10.2.7) lim n→∞ (nP(|X | > u ))δ/2 0 n Then ν ∗ (Hφ2 ) < ∞ for all φ ∈ Tψ and   fi.di.  n (φ), φ ∈ Tψ −→ {G(Hφ ), φ ∈ Tψ } . A

The proof consists of showing that the remainder term in (10.2.2) is negligible and that the conditions of Theorem 10.2.1 hold for the class {Hφ , φ ∈ Tψ }. It is postponed to Section 10.5. We can represent the limiting variance in terms of the tail process for functions φ which vanish in a neighborhood of zero. Lemma 10.2.4 Assume that φ ∈ Tψ and let  be such that φ(x0 ) = 0 if |x0 | ≤ , x0 ∈ Rd . Then  −α E[φ(Y 0 )φ(Y j )] (10.2.8a) ν ∗ (Hφ2 ) = j∈Z

= −α E[φ2 (Y 0 )] + 2

∞ 

−α E[φ(Y 0 )φ(Y j )] .

(10.2.8b)

j=1 d ¯ Proof. Denote by φ¯ the extension of φ to  0 (R ) by setting φ(x) = φ(x0 ) j ¯ for x = (xj , j ∈ Z). In this notation Hφ = j∈Z φ ◦ B . Since Hφ is shiftinvariant, we have by (5.4.13)   ¯ ◦ Bj ) ν ∗ (Hφ · φ¯ ◦ B j ) = ν ∗ ((Hφ · φ) (10.2.9) ν ∗ (Hφ2 ) = j∈Z

¯ = = ν(Hφ · φ)

 j∈Z

j∈Z

ν(φ¯ · φ¯ ◦ B j ) .

282

10 Estimation of cluster functionals

Since φ(x0 ) = 0 if |x0 | ≤ , by homogeneity of the tail measure and definition of the tail process, we have  j ¯ ¯ φ(x0 )φ(xj )ν(dx) ν(φ · φ ◦ B ) = (Rd )Z  = −α φ(x0 )1{|x0 | > 1}φ(xj )ν(dx) (Rd )Z

−α

=

E[φ(Y 0 )φ(Y j )] .

This proves (10.2.8a). To prove (10.2.8b), we use φ(x0 ) = 0 if |x0 | ≤ 1 and apply the time-change formula (5.3.1a). See Problem 5.14.  Remark 10.2.5 Since  n (φ)] , ν ∗n,rn (Hφ )] = lim E[λ ν ∗ (Hφ ) = lim E[ n→∞

n→∞

we can restate Theorem 10.2.3 as follows:  ∗

fi.di.  n,rn (Hφ ) − E[ nP(|X 0 | > un ) ν ν ∗n,rn (Hφ )] −→ G(Hφ ) , on {Hφ , φ ∈ Tψ } with G as in Theorem 10.2.1.



10.3 Central limit theorem for feasible estimators Let |X|(n:1) ≤ · · · ≤ |X|(n:n) be the order statistics of |X 1 | , . . . , |X n |. Let k be a non-decreasing sequence of integers (the dependence in n being implicit) and define the feasible estimator of the cluster statistic (cf. (10.1.13)) by  ∗n,rn (H) = ν

mn 1 H(X (i−1)rn +1,irn /|X|(n:n−k) ) . k i=1

(10.3.1)

We want to know its asymptotic distribution, that is the limiting distribution of  m  n √ 1 ∗  H(X (i−1)rn +1,irn /|X|(n:n−k) ) − ν (H) Gn (H) = k k i=1 √ ∗

 n,rn (H) − ν ∗ (H) . = k ν

 Recall that the map E is defined on 0 by E(y) = j∈Z 1 y j > 1 . d Z −1 For a map

) , define Hs by Hs (y) = H(s y). In particular, on (R H defined Es (y) = j∈Z 1 y j > s .

10.3 Central limit theorem for feasible estimators

283

Let I = [s0 , t0 ] be an interval of (0, ∞). We say that the class {Hs , s ∈ I} is linearly ordered if either ∀s, s ∈ [s0 , t0 ] , s ≤ s =⇒ Hs (y) ≤ Hs (y) for all y ∈ (Rd )Z , or ∀s, s ∈ [s0 , t0 ] , s ≤ s =⇒ Hs (y) ≥ Hs (y) for all y ∈ (Rd )Z . Theorem 10.3.1 Let {X j , j ∈ Z} be a stationary, regularly varying Rd valued time series. Assume that the distribution F0 of |X 0 | is continuous. Let k be a nondecreasing sequence of integers and define un = F0← (1 − k/n). Assume that R(rn , un ), β(rn , n ), and S(rn , un ) hold. Fix 0 < s0 < 1 < t0 . Let H : (Rd )Z → R be a shift-invariant measurable map such that the class {Hs : s ∈ [s0 , t0 ]} is linearly ordered and satisfies (BCLT1), (BCLT2), and (BCLT3) of Theorem 10.2.1. Assume moreover that √ lim k sup |ν ∗n,rn (Es ) − ν ∗ (Es )| = 0 , (10.3.2a) n→∞

s∈[s0 ,t0 ]

√ lim k sup |ν ∗n,rn (Hs ) − ν ∗ (Hs )| = 0 .

n→∞

(10.3.2b)

s∈[s0 ,t0 ]

d  n (H) −→ G(H − ν ∗ (H)E). The convergence holds jointly for a Then G finite collection of functions H which satisfy the assumptions.

Remark 10.3.2 The first bias condition (10.3.2a) is written similarly to the second one (10.3.2b). It can be simply written as √ P(|X 0 | > un s) −α −s =0. lim k sup n→∞ P(|X | > u ) 0 n s∈[s0 ,t0 ] We have already met this assumption before; see (9.3.3).



Remark 10.3.3 Since the class {Hs , s ∈ [s0 , t0 ]} satisfies (BCLT1), for every s ∈ [s0 , t0 ], the following limit holds: lim ν ∗n,rn (Hs ) = ν ∗ (Hs ) = s−α ν ∗ (H) .

n→∞

(10.3.3)

Since the functions ν ∗n,rn (Hs ) are monotone and the limit is continuous, the convergence is uniform by Dini’s Theorem E.4.12. Therefore, the sequence k can be theoretically chosen such that (10.3.2a) and (10.3.2b) hold. ⊕ Remark 10.3.4 If H is shift-invariant, the limiting distribution is centered Gaussian with variance

284

10 Estimation of cluster functionals

ν ∗ ({H − ν ∗ (H)E}2 ) = ν ∗ (H 2 ) − 2ν ∗ (H)ν ∗ (HE) + (ν ∗ (H))2 ν ∗ (E 2 )  P(|Y j | > 1) . = ν ∗ (H 2 ) − 2ν ∗ (H)E[H(Y )] + (ν ∗ (H))2 j∈Z



Cf. Problem 5.25.

Proof (Sketch of proof of Theorem 10.3.1). Write ζn = |X|(n:n−k) /un . Since k = nP(|X 0 | > un ), we have √ ∗

 n (H) = k ν  n,rn (Hζn ) − ν ∗ (H) G √ ∗



 n,rn (Hζn ) − ν ∗n,rn (Hζn ) + k ν ∗n,rn (Hζn ) − ν ∗ (Hζn ) = k ν (10.3.4) √ + k {ν ∗ (Hζn ) − ν ∗ (H)} . Step 1. We know by Theorem 10.2.1 that the finite-dimensional distributions of the process √ ∗

 n,rn (Hs ) − ν ∗n,rn (Hs )}, s ∈ [s0 , t0 ] { k ν converge weakly to those of the Gaussian process G. If we prove that G is continuous on the class {Hs , s ∈ [s0 , t0 ]} and that the convergence is locally P uniform, since ζn −→ 1, we obtain √ ∗

d  n,rn (Hζn ) − ν ∗n,rn (Hζn ) −→ G(H) . k ν Step 2. The bias condition (10.3.2b) implies that the second term in (10.3.4) tends to zero. Step 3. Similarly to Theorem 9.3.1, we have, jointly with the previous convergence, √

k(ζn−α − 1) −→ −G(E) . d

(10.3.5)

Therefore, by homogeneity of ν ∗ and √

√ d k {ν ∗ (Hζn ) − ν ∗ (H)} = ν ∗ (H) k(ζn−α − 1) −→ −ν ∗ (H)G(E) .

d  n (H) −→ G(H − Since the convergences hold jointly, we conclude that G ∗  ν (H)E).

A rigorous proof is provided in Section 10.6.

10.3 Central limit theorem for feasible estimators

285

Remark 10.3.5 Under the assumptions of Theorem 10.2.1 and the bias con√ ∗  n,rn (H) − ν ∗ (H) converges in distribuditions (10.3.2a)–(10.3.2b), k ν  n . This tion to G(H). This is different from the limiting distribution of G difference was already noted for the tail empirical process and the Hill estimator. Cf. Remarks 9.4.2 and 9.5.3. ⊕  Remark 10.3.6 Recall that E(x) = j∈Z 1{|xj | > 1}. If S(rn , un ) holds, we have mn

 ∗ k ν n,rn (E) = E X (i−1)rn +1,irn /|X|(n:n−k) =

i=1 m n rn 

  1 |X j | > |X|(n:n−k)

j=1

=k−

n 

  1 |X j | > |X|(n:n−k)

i=mn rn +1

 = k + OP ( rn P(|X 0 | > un )) . Therefore,  ∗    √ √  n,rn (H)  ∗n,rn (H) ν ν ∗ ∗  − ν (H) k − ν (H) = k  ∗n,rn (E) ν 1 + OP (k −1 rn P(|X 0 | > un ) √ ∗

 n,rn (H) − ν ∗ (H) + OP (m−1/2 = k ν ) n −→ G(H − ν ∗ (H)E) . d

⊕ ∗    by  n,rn on ˜0 \ {0} and the process G Remark 10.3.7 Define the measure ν n  1 δ X (i−1)rn +1,irn , un j=1 1{|X j | > un } i=1

∗   n,rn = n ν

 n = G

m

√ ∗   n,rn − ν ∗ } . k{ν

Under the assumptions of Theorem 10.2.1 and if moreover k = nP(|X 0 | > un ),   n (H) has the S(rn , un ) and the bias conditions (10.3.2a)–(10.3.2b) hold, G   n (H) is rel n (H). The asymptotic theory of G same limiting distribution as G  atively easier to obtain than that of Gn (H) since it does not require uniform convergence over the class of functions {Hs , s ∈ [s0 , t0 ]} (which is needed to replace the random threshold with na deterministic one). Also, as noted in Remark 10.3.6, the normalization j=1 1{|X j | > un } can be replaced by mn i=1 E(X (i−1)rn +1,irn /un )

286

10 Estimation of cluster functionals

∗   ∗n,rn (H) and ν  n,rn (H). However, there is one important difference between ν  ∗n,rn (H), the minimal condition on k is that k → For the consistency of ν ∗   ∞ and k/n → 0. To deal with ν (H), the minimal requirement is that n,rn

sequence un must satisfy un → ∞ and nP(|X 0 | > un ) → ∞. The latter condition necessitates the knowledge of the marginal distribution. This is  ∗n,rn (H). ⊕ why the common practice is to choose k and use ν Tail array sums with random threshold

Recall that we denote the exponent measure of X 0 by ν 0 . Define the empirical cluster measure with random threshold and the feasible functional tail empirical process by  n  1 Xj  n (φ) = , (10.3.6) φ λ k j=1 |X|(n:n−k) √

 n (φ) − ν 0 (φ) .  n (φ) = k λ (10.3.7) A We consider a non-negative function ψ which satisfies condition Sψ (rn , un ) and Tψ is the function space introduced in Definition 10.2.2. Theorem 10.3.8 Let {X j , j ∈ Z} be a stationary, regularly varying Rd valued time series. Assume that the distribution F0 of |X 0 | is continuous. Let k be a nondecreasing sequence of integers and define un = F0← (1 − k/n). Assume that √ P(|X 0 | > un s) −α −s =0. lim k sup n→∞ P(|X 0 | > un ) s∈[s0 ,t0 ]

Assume that R(rn , un ), β(rn , n ) and S(rn , un ) hold. and Let ψ be a function defined on Rd such that (10.2.4) and Sψ (rn , un ) holds. Assume that either ψ is bounded or there exists δ ∈ (0, 1] such that (10.2.5) and (10.2.7) hold. Let φ ∈ Tψ be such the class {φs , s ∈ [s0 , t0 ]} is linearly ordered and √  n (φs )] − s−α ν 0 (φ)| = 0 . lim k sup |E[λ (10.3.8) n→∞

s∈[s0 ,t0 ]

 n (φ) −→ G(Hφ − ν 0 (φ)E) and the convergence holds jointly for Then A all finite collection of functions in Tψ satisfying the assumptions. d

Proof. Set Rn = k −1/2

n

j=mn rn +1 {φ(|X|(n:n−k)

−1

X j ) − ν 0 (φ)}. Since

10.3 Central limit theorem for feasible estimators

287

 n (Hφ ) + Rn ,  n (φ) = G A P

 n (Hφ ) −→ G(Hφ − ν ∗ (Hφ )E) and Rn −→ 0. The we need only prove that G first statement holds since the class {Hφs , s ∈ [s0 , t0 ]} is linearly ordered and satisfies the assumptions (BCLT1), (BCLT2), and (BCLT3) of Theorem 10.2.1; see Section 10.5. Thus, we can use Theorem 10.3.1. Define, for s ∈ [s0 , t0 ], d

Rn (s) = k −1/2

n 

−1 {φ(u−1 X j ) − s−α ν 0 (φ)} . n s

j=mn rn +1

Since n − mn rn ≤ rn , under R(rn , un ) and Sψ (rn , un ), Theorem 10.2.1 yields −1/2 that Rn (s) = OP (mn ) for all s ∈ [s0 , t0 ]. The convergence can be proved to be uniform by the same arguments as those used in the proof of Theorem P  10.3.1 (see Section 10.6). Thus Rn −→ 0.  n (φ) is ν ∗ ({Hφ − ν 0 (φ)E}2 ). If φ(x) = 0 when The limiting variance of A |x| ≤ 1, then the limiting variance can be expressed as  E[{φ(Y 0 ) − E[φ(Y 0 )]}{φ(Y j ) − E[φ(Y 0 )]}1{|Y j | > 1}] . j∈Z

Multivariate tail array sums with univariate random threshold The previous result can be extended to tail array sums which depend on finite-dimensional marginal distributions. That is, for a fixed h ≥ 1, we define n−h 1  δ X j,j+h , λh,n = k j=1 un n−h 1  δ X j,j+h . λh,n = k j=1 |X |(n:n−k)

 h,n can be seen as a feasible estimator of the expoThe random measure λ  h,n can nent measure ν X 0,h of (X 0 , . . . , X h ). The asymptotic theory for λ be obtained as a particular case of Theorem 10.2.3 by considering the mul!j = (X j , . . . , X j+h ), j ∈ Z} and applying Problem tivariate time series {X  h,n . We need to adapt 5.9. Thus we only consider the asymptotic theory for λ the conditions of Theorem 10.3.8. We consider a function ψ : Rd(h+1) → R+ and we assume that

288

10 Estimation of cluster functionals

ψ(x0,h ) = 0 if |x0 | ≤ 1 ,   E ψ 2+δ (X 0,h /un ) un ) n≥1

 

rn E ψ X 0,h ψ X j,j+h  un un =0. lim lim sup m→∞ n→∞ P(|X 0 | > un )

(10.3.9a) (10.3.9b)

(10.3.9c)

|j|=m

Note that for simplicity, we have replaced (10.2.4) by (10.3.9a). We redefine the relevant function space Th,ψ as the set of functions φ defined on Rd(h+1) which are ν 0,h -almost surely continuous and such that |φ| ≤ cst ψ. Theorem 10.3.9 Assume that R(rn , un ), β(rn , n ), and S(rn , un ) hold. Assume that the distribution function F0 of |X 0 | is continuous. Let k be an intermediate sequence of integers and let un = F0← (1 − k/n). Let ψ be a function which satisfies (10.3.9a) and (10.3.9c). Assume that either ψ is bounded or there exists δ ∈ (0, 1] such that (10.3.9b) and (10.2.7) hold. Let φ ∈ Th,ψ be such that the class {φs , s ∈ [s0 , t0 ]} is linearly ordered and √  h,n (φs )] − s−α ν X 0,h (φ)| = 0 . lim k sup |E[λ (10.3.10) n→∞

Then

with σφ2 =

s∈[s0 ,t0 ]

√ d  h,n (φ) − ν X 0,h (φ)} −→ k{λ N(0, σφ2 ) 

E[{φ(Y 0,h ) − E[φ(Y 0,h )]}{φ(Y j,j+h )

j∈Z

− E[φ(Y 0,h )]}1{|Y j | > 1}] .

(10.3.11)

The convergence holds jointly for all finite collection of functions in Th,ψ satisfying the assumptions.

" by X "j = Proof. Define the regularly varying stationary time series X X j,j+h = (X j , . . . , X j+h ), j ∈ Z, and let Y be the associated tail process. Let ν ∗ be the cluster measure related to Y and G be the Gaussian d h+1 a norm on (R such random measure with mean measure ν ∗ . We choose

)  (0)  j,0 > 1 . Since y | > 1. Denote E ( y ) = j∈Z 1 y that | y 0 | > 1 =⇒ |

P(|X 0 | > un ) = P Y 0,0 > 1 , n→∞ P (|X 0,h | > un ) lim

10.3 Central limit theorem for feasible estimators

289

(cf. Problem 5.9) we have √

−1/2  d  h,n (φ) − ν X 0,h (φ)} −→ k{λ P Y 0,0 > 1 G(Hφ − ν ∗ (Hφ )E (0) ) .

Thus the asymptotic variance of the limiting distribution is σφ2 =

ν ∗ ({Hφ − ν ∗ (Hφ )E (0) }2 )

. P Y 0,0 > 1

By Lemma 10.2.4 and Problem 5.9, we have  ν ∗ (Hφ2 ) =

E[φ(Y 0,h )φ(Y j,j+h )] . P Y 0,0 > 1 j∈Z

Also, again by Problem 5.9, ν ∗ (Hφ ) = P Y 0,0 > 1 E[φ(Y 0,h )] and 

ν ∗ ((E (0) )2 ) = P Y 0,0 > 1 P(|Y j | > 1) . j∈Z

Note that φ(Y j,j+h ) = φ(Y j,j+h )1{|Y j | > 1}. Since Hφ is shift-invariant y | > 1, applying (5.4.13), the time-change forand since | y 0 | > 1 =⇒ | mula (5.3.1a) and Problem 5.9, we obtain    ν ∗ (Hφ E (0) ) = ν(Hφ 1{| y 0 | > 1}) = E Hφ (Y )1 Y 0,0 > 1     = E φ(Y j,j+h )1 Y 0,0 > 1 j∈Z



 = P Y 0,0 > 1 E[φ(Y j,j+h )1{|Y j | > 1}] j∈Z



E[φ(Y 0,h )1{|Y j | > 1}] . = P Y 0,0 > 1 j∈Z

Also σφ2 =



E[φ(Y 0,h )φ(Y j,j+h )]

j∈Z

− 2E[φ(Y 0,h )]



E[φ(Y j,j+h )1{|Y j | > 1}]

j∈Z

+ E2 [φ(Y 0,h )] =





P(|Y j | > 1)

j∈Z

E[{φ(Y 0,h ) − E[φ(Y 0,h )]}φ(Y j,j+h )1{|Y j | > 1}]

j∈Z

− E[φ(Y 0,h )]

 j∈Z

E [{φ(Y j,j+h ) − E[φ(Y 0,h )]}1{|Y j | > 1}] .

290

10 Estimation of cluster functionals

Applying once more the time-change formula (5.3.1a), we have E[{φ(Y j,j+h ) − E[φ(Y 0,h )]}1{|Y j | > 1}] = E[{φ(Y 0,h ) − E[φ(Y 0,h )]}1{|Y −j | > 1}] . This yields (10.3.11).



10.4 Examples In this section, we apply the results of Section 10.3 to several practical examples.

10.4.1 Bounded cluster functionals that vanish around zero Let BCF be the set of bounded shift-invariant functionals H defined on 0 (Rd ), which are almost surely continuous with respect to ν ∗ and such that H(x) = 0 if x∗ ≤  for some  > 0 and for every x ∈ 0 and m > k ≥ 1,

|H(x1,k ) − H(x1,m )| ≤ cst · 1 x∗k+1,m >  . (10.4.1) The next result indicates that (besides mixing and the technical conditions on rn and un ) only AC(rn , un ) is required for central limit theorem. In other words, the central limit theorem holds under the same conditions as consistency, see Theorem 10.1.2. Hereafter, un and k are related by un = F0← (1 − k/n). Corollary 10.4.1 Assume that Conditions R(rn , un ), β(rn , n ), and AC(rn , un ) hold. Then  ∗

fi.di.  n,rn − E[ nP(|X 0 | > un ) ν ν ∗n,rn ] −→ G , on BCF with G as in Theorem 10.2.1. If furthermore S(rn , un ) and (10.3.2a)–(10.3.2b) hold, then √ ∗

fi.di.  n,rn − ν ∗ −→ G(· − ν ∗ (·)E) , k ν on BCF.

Proof. We must check the conditions of Theorem 10.2.1. Condition (BCLT1) holds by Theorem 6.2.5 and since H, H ∈ BCF implies HH ∈ BCF. The

10.4 Examples

291

negligibility condition (BCLT2) holds trivially for bounded functionals since nP(|X 0 | > un ) → ∞. Moreover, since n ≤ rn , we have P(X ∗rn −n +1,rn > un ) = P(X ∗1,n > un ) = O(n P(|X 0 | > un )) by Corollary 6.2.6, hence (rn P(|X 0 | > un ))−1 P(X ∗rn −n +1,rn > un ) = o(1). In view of (10.4.1), this implies Condition (BCLT3).  Example 10.4.2 (Blocks Estimator of the Extremal Index) Assume that d = 1 and that {Xj , j ∈ Z} is a non-negative regularly varying time series. Recall ∗ P(X1,r > un ) n . n→∞ rn P(X0 > un )

ϑ = lim

Under the assumptions of Corollary 10.2.1, we know by Corollary 7.3.3 that ϑ is the extremal index of the series {Xj }. Let F be the distribution function of X0 and consider the pseudo-estimator ϑn =

[n/rn ]    1 ∗ 1 X(i−1)r > un . n +1,irn nF (un ) i=1

(10.4.2)

 ∗n,rn (H) with H(x) = 1{x∗ > 1} for x ∈ RZ+ . Here ν ∗ (H) = That is, ϑn = ν ν ∗ (H 2 ) = ϑ. If R(rn , un ), β(rn , n ), and AC(rn , un ) hold we obtain # fi.di. nF (un ){ϑn − E[ϑn ]} −→ N(0, ϑ) . In case of extremal independence, ϑ = 1, the limiting distribution is N(0, 1). Consider now the data-based estimator of the extremal index ϑ (cf. Example 10.1.7): [n/rn ]  1   ∗  ϑn,k = 1 X(i−1)rn +1,irn > X(n:n−k) . k i=1

(10.4.3)

Here, H(x) = 1{x∗ > 1} = H 2 (x) and the class {Hs , s ∈ [s0 , t0 ]} is linearly ordered. If condition S(rn , un ) and the appropriate bias conditions hold, the estimator is asymptotically normal and the limiting variance is  P(Yj > 1) − ϑ . ν ∗ ({H − ν ∗ (H)E}2 ) = ϑ2 j∈Z

When the time series is extremally independent, ϑ = 1 and ν ∗ (E 2 ) = 1, so that the limit is degenerated. 

292

10 Estimation of cluster functionals

Example 10.4.3 (Blocks Estimator of Stop-Loss Index) For a univariate non-negative time series, consider the stop-loss index θstoploss (η) introduced in Example 6.2.10. We have θstoploss (η) = ν ∗ (H) with   (xj − 1)+ > η . H(x) = 1 j∈Z



Since H(x) = 0 if x ≤ 1, it also holds that

θstoploss (η) = E[H(Y )1 Y ∗−∞,−1 ≤ 1 ] ⎛ ⎞ ∞  = P ⎝ (Yj − 1)+ > η, Y ∗−∞,−1 ≤ 1⎠ . j=0

The pseudo-estimator is  ∗n,rn (H) θstoploss,n (η) = ν

⎧ mn ⎨  1 = nP(X0 > un ) i=1 ⎩ 1

rn 

(Xj − un )+ > ηun

j=(i−1)rn +1

⎫ ⎬ ⎭

.

If R(rn , un ), β(rn , n ) and AC(rn , un ) hold we obtain,  d nP(X0 > un ){θstoploss,n (η) − E[θstoploss,n (η)]} −→ N(0, θstoploss (η)) . We note that in case of extremal independence, Yj = 0 for j = 0, we have θstoploss (η) = P(Y0 > 1 + η) = (1 + η)−α . Consider now the data-based estimator of the stop-loss index: ⎧ ⎫ mn ⎨ rn ⎬   1 θstoploss,n,k (η) = 1 (Xj − X(n:n−k) )+ > ηX(n:n−k) . ⎩ ⎭ k i=1

j=(i−1)rn +1

If S(rn , un ) and the appropriate bias conditions hold, then √ k(θstoploss,n,k (η) − θstoploss (η)) is asymptotically normal and the limiting variance is

  2 θstoploss (η) 1 − 2P (Yj − 1)+ > η + θstoploss (η) P(Yj > 1) . j∈Z

j∈Z

In case of extremal independence the limiting variance is (1 + η)−α (1 − (1 +  η)−α ). Example 10.4.4 (Blocks Estimator of Cluster Size Distribution) Still considering a univariate non-negative time series, the pseudo-estimator of the cluster size distribution (introduced in Example 6.2.9) by

10.4 Examples

π n (m) =

293

[n/rn ]  1 1{E(X n,i ) = m} , nF (un ) i=1

with X n,i = u−1 n (X(i−1)rn +1 , . . . , Xirn ). The functional considered is H = 1{E = m} and π n (m) is proportional to the number of blocks with exactly m exceedences over the threshold un . If R(rn , un ), β(rn , n ), and AC(rn , un ) hold we obtain # fi.di. nF (un ){ πn (m) − E[ πn (m)]} −→ N(0, π(m)) with * + π(m) = lim E[ πn (m)] = ϑP E(Y ) = m | Y ∗−∞,−1 ≤ 1 . n→∞

In case of extremal independence, Yj = 0 for j = 0, we have π(1) = 1 and π(m) = 0 for m > 1. Consider now the data-based version  [n/rn ]  irn

1  π n (m) = 1 1 Xj > X(n:n−k) = m . j=(i−1)rn +1 k i=1 If condition S(rn , un ) and the appropriate bias conditions hold, then the feasible estimator is asymptotically normal with asymptotic variance  π(m) (1 − 2P(E(Y ) = m)) + π 2 (m) P(Yj > 1) . j∈Z

In the case of extremal independence, then π(m) = 0 for all m > 0 and the limiting variance is zero. 

10.4.2 Indicator functionals We now consider functional of the form 1{K > 1} which possibly do not vanish around the origin, but where K satisfies the assumption of Theorem 6.2.16. Let K be the set of cluster functionals H of the form H = 1{K > 1} such that • K is a shift-invariant functional defined on 1 ; • K(0) = 0; • K is Lipschitz with respect to the 1 norm, i.e. for all x ∈ 1 (Rd ),  x j − y j . (10.4.4) |K(x) − K(y)| ≤ cst · |x − y|1 = cst · j∈Z

294

10 Estimation of cluster functionals

Corollary 10.4.5 Assume that R(rn , un ), β(rn , n ), AC(rn , un ), and ANSJB(rn , un ) hold. Then  ∗

fi.di.  n,rn − E[ nP(|X 0 | > un ) ν ν ∗n,rn ] −→ G , on K with G as in Theorem 10.2.1. Assume moreover that K is homogeneous of order 1 and that Conditions S(rn , un ) and (10.3.2a)–(10.3.2b) hold. Then, √ ∗

d  n,rn − ν ∗ −→ G(· − ν ∗ (·)E) , k ν on K.

Proof. As before, we check the conditions of Theorem 10.2.1. Condition (BCLT1) holds by Theorem 6.2.16 and since for H, H ∈ K we have HH ∈ K. Indeed, if H = 1{K > 1} and H = 1{K > 1}, then HH = 1{K ∧ K > 1}. The functional K ∧ K is shift-invariant and K ∧ K (0) = 0. Moreover, noting that |K ∧ K (x) − K ∧ K (y)| ≤ |K(x) − K(y)| + |K (x) − K (y)| , we conclude that (10.4.4) holds for K ∧ K . The negligibility condition (BCLT2) holds since H is bounded. Now, |1{K(x1,rn ) > 1} − 1{K(x1,rn −n ) > 1}| = 1{K(x1,rn ) > 1}1{K(x1,rn −n ) ≤ 1} + 1{K(x1,rn ) ≤ 1}1{K(x1,rn −n ) > 1} . We consider the first pair of indicators in the last line. The events {K(x1,rn ) > 1} and {K(x1,rn −n ) ≤ 1} imply that there exists s > 0 such that K(x1,rn ) − K(x1,rn −n ) > s . Applying the same reasoning to the second pair of indicators and (10.4.4), we have ⎧ ⎫ rn ⎨ ⎬  |xj | > s . |1{K(x1,rn ) > 1} − 1{K(x1,rn −n ) > 1}| ≤ 21 cst ⎩ ⎭ j=rn −n +1

By Proposition 6.2.11, and since n = o(rn ) ⎛ ⎞ rn  P⎝ |X j | > sun ⎠ = O(n P(|X 0 | > un )) . j=rn −n +1

This implies Condition (BCLT3).



10.4 Examples

295

Example 10.4.6 (Blocks Estimator of Large Deviations Index) For a one-dimensional time series, the large deviation index θlargedev introduced in Example 6.2.17 is given by θlargedev = ν ∗ (H) with ⎞ ⎛  H = 1{K > 1} , K(x) = ⎝ xj ⎠ . j∈Z

Thus

⎡⎛ θlargedev = E ⎣⎝

∞ 

⎞α



Θj ⎠ − ⎝

j=0

+

∞ 

+

⎞α ⎤ Θj ⎠ ⎦ .

j=1

+

A pseudo-estimator is given by θlargedev,n

⎧⎛ mn ⎨  1  ∗n,rn (H) = =ν 1 ⎝ nP(|X0 | > un ) i=1 ⎩

irn 

j=(i−1)rn +1

⎞ Xj ⎠ > un

⎫ ⎬ ⎭

.

+

Then  d nP(|X0 | > un ){θlargedev,n − E[θlargedev,n ]} −→ N(0, θlargedev ) In case of extremal independence, θlargedev = P(Θ0 = 1) = p. Consider now the data-based blocks estimator of the large deviation index: ⎧⎛ ⎫ ⎞ mn ⎨ irn ⎬   1 θlargedev,n,k = 1 ⎝ Xj ⎠ > |X|(n:n−k) . ⎩ ⎭ k i=1

j=(i−1)rn +1

+

Under the assumptions of Corollary 10.4.5, √ k(θlargedev,n,k − θlargedev ) is asymptotically centered Gaussian with variance ⎞ ⎛⎛ ⎞   2 Yj ⎠ > 1⎠ + θlargedev P(|Yj | > 1) . θlargedev − 2θlargedev P ⎝⎝ j∈Z

+

j∈Z

In case of extremal independence, θlargedev = P(Θ0 = 1) = p and the limiting variance is then p − 2p2 + p2 = p(1 − p) and vanishes in case of non-negative time series.  Example 10.4.7 (Blocks estimator of the ruin index) For a univariate time series, the ruin index θruin introduced in Example 6.2.18 and 6.2.22 is given by θruin = ν ∗ (H) with

296

10 Estimation of cluster functionals

⎛ ⎞  H = 1{K > 1} , K(x) = sup ⎝ xj ⎠ i∈Z

j≤i

.

+

Thus ⎛



θruin = E ⎣sup ⎝ i≥0

i  j=0

⎞α

⎛ ⎞α ⎤ i  Θj ⎠ − sup ⎝ Θj ⎠ ⎦ . i≥1

j=1

+

+

A pseudo-estimator is given by  ∗n,rn (H) θruin,n = ν

⎧ ⎛ mn ⎨  1 ⎝ 1 max = nP(|X0 | > un ) i=1 ⎩(i−1)rn un

j=(i−1)rn +1

⎫ ⎬ ⎭

.

+

Then  d nP(|X0 | > un ){θruin,n − E[θruin,n ]} −→ N(0, θruin ) . Consider now the data-based blocks estimator of the ruin index: ⎧ ⎫ ⎛ ⎞ j mn ⎨ ⎬   1 ⎝ 1  max Xj ⎠ > |X|(n:n−k) . θruin,n,k = ⎩j =(i−1)rn +1,...,irn ⎭ k i=1

j=(i−1)rn +1

+

√ Under the assumptions of Corollary 10.4.5, k(θruin,n,k − θruin ) is asymptotically centered Gaussian with variance ⎛ ⎞ ⎛ ⎞   2 P(|Yj | > 1) − 2θruin P ⎝sup ⎝ Yj ⎠ > 1⎠ . θruin + θruin j∈Z

i∈Z

j≤i

+



10.4.3 Tail array sums In this section, we provide examples of applications of Theorem 10.3.9. Example 10.4.8 (Conditional tail expectation) Let {Xj , j ∈ Z} be a univariate non-negative time series with tail process {Yj , j ∈ Z} and α > 1. Recall that CTEh (x) was introduced in Section 2.3.3 as CTEX (h) = CTEh (x) = E[Xh | X0 > x] . Then (cf. Problem 5.3)

10.4 Examples

297

lim x−1 CTEh (x) = lim x−1 E[Xh | X0 > x] = E[Yh ] .

x→∞

x→∞

The feasible estimator of E[Yh ] is given by n−h  Xj+h

 h,n = 1 CTE 1 Xj > X(n:n−k) . k j=1 X(n:n−k)

In order to check the conditions of Theorem 10.3.9, let φ be defined on Rh+1 by φ(x0 , . . . , xh ) = xh 1{x0 > 1} and set ψ = φ. Condition (10.3.9a) holds and (10.3.9c) is equivalent to lim lim sup

m→∞ n→∞

rn  E[1{X0 > un }1{Xj > un }Xh Xj+h ] =0. u2n P(X0 > un ) j=m

(10.4.5)

If α > 2, then (10.3.9b) holds. The bias condition (10.3.10) becomes √ E[Xh 1{X0 > sun }] − s−α E[Yh ] = 0 . lim k sup n→∞ P(X0 > un ) s∈[s0 ,t0 ] The class {φs , s ∈ [s0 , t0 ]} is linearly ordered. Thus if conditions R(rn , un ), β(rn , n ), S(rn , un ), and (10.2.7) hold, we can apply Theorem 10.3.9 to obtain √  h,n − E[Yh ]) converges weakly to a centered Gaussian distributhat k(CTE tion with variance  E[{Yh − E[Yh ]}{Yj+h − E[Yh ]}1{Yj > 1}] . j∈Z

 Estimation of the distribution of the forward spectral tail process For simplicity, we consider a univariate stationary regularly varying time series {Xj , j ∈ Z} with tail process {Yj , j ∈ Z} and spectral tail process {Θj , j ∈ Z}. We now consider the estimation of the distribution of the forward spectral tail process. For h ∈ N∗ and y ∈ R, define Υh (y) = P(Θh ≤ y) and n−h

1 Υh,n (y) = 1 |Xj | > |X|(n:n−k) 1{Xj+h ≤ |Xj |y} . k j=1

To describe the limiting distribution, define the functions φ1,y on Rh+1 by φ1,y (x0 , . . . , xh ) = 1{|x0 | > 1}1{xh ≤ |x0 |y} and Hφ1,y on 0 (Rh+1 ) by

298

10 Estimation of cluster functionals

 Hφ1,y (x) = j∈Z φ1,y (xj ). Let H be the Gaussian process defined on R by H(y) = G(Hφ1,y ). By the usual computations,  var(H(y)) = ν ∗ (Hφ21,y ) = E [φ1,y (Y 0,h )φ1,y (Y j,j+h )] =



j∈Z

P (Θh ≤ y, |Yj | > 1, Θj+h ≤ |Θj |y) .

j∈Z

Thus, H can be equivalently defined as the Gaussian process with covariance function  cov(H(y), H(z)) = P (Θh ≤ y, |Yj | > 1, Θj+h ≤ |Θj |z) . j∈Z

Note that H(∞) = G(E) and recall that by Lemma 9.2.3, condition S(rn , un ) implies that  P(|Yj | > 1) < ∞ . var(H(∞)) = j∈Z

Define the function Υh,n on (0, ∞) × R by Υh,n (s, y) =

P(|X0 | > un s, Xh ≤ |X0 |y) , s>0, y∈R. P(|X0 | > un )

Theorem 10.4.9 Let {Xj , j ∈ Z} be a stationary, univariate regularly varying time series. Assume that the distribution functions F0 of |X0 | and Υh of Θh are continuous. Let k be a non-decreasing sequence of integers and set un = F0← (1 − k/n). Fix 0 < s0 < 1 < t0 . Assume that R(rn , un ), β(rn , n ), and S(rn , un ) hold and √ lim sup sup k Υh,n (s, y) − s−α Υh (y) . n→∞ s0 ≤s≤t0 y∈R

Then, √

k(Υh,n − Υh ) −→ H − Υh H(∞) . fi.di.

If moreover there exists τ > 1 such that 

2+τ  r n E 1{|X | > u } j n j=1 sup 1}] . = j∈Z

"j = "j , j ∈ Z} defined by X Proof. Consider the multivariate time series {X  (Xj , . . . , Xj+h ). This time series is regularly varying with tail process Y . See Problem 5.9 for the link between Y and Y . For s > 0 and y ∈ R, define the function φs,y on Rh+1 by φs,y (x) = 1{|x0 | > s}1{xh ≤ |x0 |y} , x = (x0 , . . . , xh ) . Then, recalling (10.2.6) and (10.3.7), √  n (φs,y ) , k(Υh,n (s, y) − E[Υh,n (s, y)]) = A √  n (φ1,y ) . k(Υh,n (y) − Υh (y)) = A The first step of the proof is to obtain the finite-dimensional convergence  n indexed by the class G = {φs,y , s0 ≤ s ≤ t0 , y ∈ R}. Set ψ(x) = of A 1{|x0 | > s0 }. The assumptions of Theorem 10.2.3 are subsumed by those of the present theorem. In particular, S(rn , un ) for the one-dimensional process implies Sψ (rn , un ) for the (h + 1)-dimensional series. Since (cf. Problem 5.9)

P(|X 0 | > un ) = P Y 0,0 > 1 , n→∞ P (|X 0,h | > un ) lim

we have fi.di.  n (φs,y ) −→ A {P(Y 0,0 > 1)}−1/2 G(Hφs,y ) ,

(10.4.8)

where G is as in Theorem 10.2.1. Since ψ is bounded, the assumptions of Thed  n (φ1,y ) −→ orem 10.3.9 hold and A {P(Y 0,0 > 1)}−1/2 G(Hφ1,y −ν ∗ (Hφ1,y )E). The joint convergence for finitely many y follows from the finite-dimensional  n (φs,y ). convergence of A We now prove that last statement of the theorem. Write ζn = |X|(n:n−k) /un and n (s, y) = E[Υh,n (s, y)] − s−α Υh (y) . B Then,  √  √ √ n (ζn , y) + k{ζn−α − 1}Υh (y) . k Υh,n (y) − Υh (y) = Υh,n (ζn , y) + k B (10.4.9)

300

10 Estimation of cluster functionals

Fix two real numbers a < b. For every η > 0 and δ > 0 (small enough),  P sup Υh,n (ζn , y) − Υh,n (1, y) > η y∈[a,b]



≤ P(|ζn − 1| > δ) + P

sup Υh,n (s, y) − Υh,n (1, y) > η

sup

.

1−δ≤s≤1+δ y∈[a,b] P

By Corollary 10.1.6, ζn −→ 1. By Lemma 10.6.1, for every  > 0, we can choose δ in such a way that the last term above is less than , so  lim sup P sup Υh,n (ζn , y) − Υh,n (1, y) > η ≤  . n→∞

y∈[a,b]

Since  is arbitrary, this proves that P sup Υh,n (ζn , y) − Υh,n (1, y) −→ 0 .

(10.4.10)

y∈[a,b]

By (10.4.8), Lemma 10.6.1, and Theorem C.2.12, we know that Υh,n (1, ·) =⇒ w H in (D([a, b]), J1 ). This, with (10.4.10) implies that Υh,n (ζn , ·) =⇒ H in (D([a, b]), J1 ). The other two terms in the right-hand side of (10.4.9) are dealt with as usual by the bias condition and Vervaat’s lemma.  w

10.5 Proof of Theorem 10.2.3 To prove Theorem 10.2.3 we write wn =



nP(|X 0 | > un ). Then,

 n (φ) − E[λ  n (φ)]} = wn { wn {λ ν ∗n,rn (Hφ ) − E[ ν ∗n,rn (Hφ )]} + wn Rn (φ) , (10.5.1) with the term Rn (φ) defined in (10.2.3). We will prove that the remainder term wn Rn is negligible and that Conditions (BCLT1), (BCLT2), (BCLT3) of Theorem 10.2.1 hold on the space fi.di.  n (φ) − E[λ  n (φ)]} −→ G1= (Hφ ). {Hφ , φ ∈ Tψ }. This will imply that wn {λ 1= 1= ∗ Since cov(G (Hφ ), G (Hϕ )) = ν (Hφ Hϕ ) for φ, ϕ ∈ Tψ , this will prove Theorem 10.2.3. In order to prove that Condition (BCLT1) holds, note first that under the continuity condition in Definition 10.2.2, for φ ∈ Tψ , ν ∗ (Hφ ) = lim ν ∗n,rn (Hφ ) = lim n→∞

n→∞

E[φ(X 0 /un )] = ν 0 (φ) . P(|X 0 | > un )

(10.5.2)

10.5 Proof of Theorem 10.2.3

301

Write Vj,n (φ) = φ (X j /un ) and Sn (φ) =

rn 

Vj,n (φ) , S¯n (φ) = Sn (φ) − E[Sn (φ)] .

(10.5.3)

j=1

We now prove the convergence of cov(Sn (φ), Sn (ϕ)) for φ, ϕ ∈ Tψ (extending Lemma 9.2.3 to general tail array sums). Lemma 10.5.1 Assume that R(rn , un ) holds and let ψ be a function which satisfies conditions (10.2.4), (10.2.5), and Sψ (rn , un ). Then for all φ ∈ Tψ , ν ∗ (Hφ2 ) < ∞ and for all ϕ ∈ Tψ ⎡⎛ ⎞⎛ ⎞⎤ rn rn   1 E ⎣⎝ ν ∗ (Hφ Hϕ ) = lim φ (X j /un )⎠ ⎝ ϕ (X j /un )⎠⎦ n→∞ rn P(|X 0 | > un ) j=1 j=1 (10.5.4a) m  E[φ (X 0 /un ) ϕ (X j /un )] = lim lim (10.5.4b) m→∞ n→∞ P(|X 0 | > un ) j=−m ⎛ ⎞ rn rn   1 = lim cov ⎝ φ (X j /un ) , ϕ (X j /un )⎠ . n→∞ rn P(|X 0 | > un ) j=1 j=1 (10.5.4c) Proof (Proof of Lemma 10.5.1). Since Tψ is a linear space and by the identity 2xy = (x+y)2 −x2 −y 2 , it suffices to prove these identities for φ = ϕ. We will simply Vj,n . henceforth omit φ from the notation for Vj,n (φ) and will write ¯ = φ(x0 ). Then Hφ = j∈Z φ¯ ◦ B j . Define the function φ¯ on 0 (Rd ) by φ(x) ¯ φ◦B ¯ j The continuity condition in Definition 10.2.2 means that the function φ· is ν-almost surely continuous. Indeed, by definition of the tail measure and by its shift-invariance, if φ is almost surely continuous with respect to ν 0 , then the functions φ¯ and φ¯ ◦ B j are almost surely continuous with respect to ν. To be more precise, for all j ∈ Z, 0 = ν 0 (Disc(φ)) = ν({x : φ discontinuous at x0 }) = ν({x : φ discontinuous at xj }) . Thus, Conditions (10.2.4), (10.2.5), and Proposition 2.1.9 ensure that for all j ∈ Z, E[V0,n Vj,n ] = ν(φ¯ · φ¯ ◦ B j ) . (10.5.5) P(|X 0 | > un )  We now prove that the series j∈Z ν(|φ¯ · φ¯ ◦ B j |) is summable and for notational clarity, we assume, without loss of generality, that φ is non-negative. Fix η > 0. Applying Sψ (rn , un ) and the fact that φ is bounded by ψ, we can choose m such that, for every R ≥ m lim

n→∞

302

10 Estimation of cluster functionals R 

R 

E[V0,n Vj,n ] ≤η. n→∞ P(|X 0 | > un ) j=m+1

ν(φ¯ · φ¯ ◦ B j ) = lim

j=m+1

∞ j ¯ ¯ This proves that the series j=1 ν(φ · φ ◦ B ) is summable. Fix m ≥ 1. Applying (10.5.5), we have 

2  rn E V ∞ j=1 j,n  j ¯ ¯ ¯ − ν(φ) − 2 lim sup ν(φ · φ ◦ B ) r P(|X | > u ) n→∞ n 0 n j=1 ≤2

∞ 

ν(φ¯ · φ¯ ◦ B j ) + 2 lim sup n→∞

j=m+1

rn 

E[V0,n Vj,n ] . P(|X 0 | > un ) j=m+1

∞ By Condition Sψ (rn , un ) and since the series j=1 ν(φ¯ · φ¯ ◦ B j ) is summable, we can make the right-hand side of the last displayed equation arbitrarily small and this proves that 

2  rn E ∞ j=1 Vj,n  ¯ +2 = ν(φ) lim ν(φ¯ · φ¯ ◦ B j ) . n→∞ rn P(|X 0 | > un ) j=1 Since ν is shift-invariant, it holds that ∞ ∞ −1    ¯ = ¯ . ν(φ¯ · φ¯ ◦ B j ) = ν(φ¯ ◦ B −j · φ) ν(φ¯ ◦ B j · φ) j=1

j=1

Therefore, E lim

n→∞



j=−∞

2 

rn

j=1 Vj,n

rn P(|X 0 | > un )

=



¯ . ν(φ¯ · φ¯ ◦ B j ) = ν(Hφ · φ)

j∈Z

By the identity (10.2.9), this proves (10.5.4a). Finally, E[Sn2 (φ)] rn 2 (E[V0,n (φ)])2 var(Sn (φ)) = − rn P(|X 0 | > un ) rn P(|X 0 | > un ) rn P(|X 0 | > un ) E[Sn2 (φ)] + O(rn P(|X 0 | > un )) . = rn P(|X 0 | > un ) Under condition R(rn , un ), the last term is o(1). This proves (10.5.4c). By stationarity, Lemma 10.5.1 implies that ⎛ n  var(wn Rn (φ)) = var ⎝wn−1  =O

⎞ φ(X j /un )⎠

j=mn rn +1

(n − mn rn ) P(|X 0 | > un ) n P(|X 0 | > un )

 = o(1) .



10.5 Proof of Theorem 10.2.3

303

Thus, the reminder term in (10.5.1) is negligible. Next, we consider the Lindeberg condition (BCLT2). Note that  

 ν ∗n,rn Hφ2 1 |Hφ | > η nP(|X 0 | > un ) = E[Sn2 (φ)1{|Sn (φ)| > ηwn }] . The next two lemmas prove the Lindeberg condition (BCLT2) separately when ψ is bounded or unbounded. Lemma 10.5.2 Assume that R(rn , un ) holds and let ψ be a bounded function which satisfies conditions (10.2.4) and Sψ (rn , un ). Then for all η > 0 and all φ ∈ Tψ ,     1 E Sn2 (φ)1 |Sn (φ)| > η nP(|X 0 | > un ) lim n→∞ rn P(|X 0 | > un )     1 E S¯n2 (φ)1 |S¯n (φ)| > η nP(|X 0 | > un ) = 0 . = lim n→∞ rn P(|X 0 | > un ) Proof. Write for simplicity Sn = Sn (φ), S¯n = S¯n (φ). As a first step we note that the centering can be omitted. By the assumptions on ψ, its boundedness and regular variation, we have E[|V0,n |q ] = O(P(|X 0 | > un )) for all φ ∈ Tψ and q > 0 which implies E[Sn ] = O(rn P(|X 0 | > un )). Since rn = o(n), we have, for large enough n,

1 |S¯n | > ηwn ≤ 1{|Sn | > ηwn /2} . Since η is arbitrary, we can remove the centering from the indicator. Furthermore,     E Sn2 1{|Sn | > ηwn } E S¯n2 1{|Sn | > ηwn } O({E[Sn ]}2 ) = + rn P(|X 0 | > un ) rn P(|X 0 | > un ) rn P(|X 0 | > un )  2  E Sn 1{|Sn | > ηwn } + O(rn P(|X 0 | > un )) . = rn P(|X 0 | > un ) (10.5.6) Hence, under condition R(rn , un ), it suffices to study the first term in (10.5.6), which is developed as I1 + 2I2 with I1 =

n  1 2 E[Vj,n 1{|Sn |>ηwn } ] , rn P(|X 0 | > un ) j=1

I2 =

rn  rn  1 E[Vi,n Vj,n 1{|Sn |>ηwn } ] . rn P(|X 0 | > un ) i=1 j=i+1

r

 By (10.5.4c) and H¨ older inequality, we have E[|Sn |] = O( rn P(|X 0 | > un )). Applying Markov’s inequality and the boundedness of φ, we obtain

304

10 Estimation of cluster functionals

r n I1 ≤

2 j=1 E[Vj,n |Sn |]

ηwn rn P(|X 0 | > un )

 =O

1 wn

E



r n

j=1 |Vj,n |

2 

rn P(|X 0 | > un )

.

By 10.5.1, the fraction in the last term is bounded, hence I1 = O(wn−1 ) = o(1). Fix a positive integer m. Since φ is bounded, we have I2 ≤

rn m   1 E[|Vi,n Vj,n |1{|Sn |>ηwn } ] rn P(|X 0 | > un ) i=1 j=i+1

+

rn i+m   1 E[|Vi,n Vj,n |1{|Sn |>ηwn } ] rn P(|X 0 | > un ) i=m+1 j=i+1

rn rn   1 + E[|Vi,n Vj,n |] . rn P(|X 0 | > un ) i=m+1 j=i+m+1

(10.5.7a)

(10.5.7b)

(10.5.7c)

Since φ is bounded, the terms in (10.5.7a) and (10.5.7b) are each bounded by n  mφ∞ E[Vj,n 1{|Sn |>ηwn } ] = o(1) , rn P(|X 0 | > un ) j=1

r

by the same argument as for I1 . Thus, rn  2 E[V0,n Vi,n ] . I2 = o(1) + P(|X 0 | > un ) i=m+1

(10.5.8)

By Condition Sψ (rn , un ), the last expression in (10.5.8) can be made arbitrarily small by choosing m large enough.  We extend Lemma 10.5.2 to the case of unbounded functions, at the cost of the extra restriction (10.2.7) on the sequence {rn }. Lemma 10.5.3 Assume that R(rn , un ) holds. Let ψ be a measurable nonnegative function which satisfies conditions (10.2.4) and Sψ (rn , un ), (10.2.5), and such that (10.2.7) holds for the same δ ∈ (0, 1]. Then for all η > 0 and all φ ∈ Tψ ,     1 lim E Sn2 (φ)1 |Sn (φ)| > η nP(|X 0 | > un ) n→0 rn P(|X 0 | > un )     1 E S¯n2 (φ)1 |S¯n (φ)| > η nP(|X 0 | > un ) = 0 . = lim n→0 rn P(|X 0 | > un ) (10.5.9) Proof. We follow closely the proof of Lemma 10.5.2 with appropriate modifica tions. Since φ ∈ Tψ , Lemma 10.5.1 implies that E[Sn ] = O( rn P(|X 0 | > un )) and thus the centering can be removed inside the indicator. The calculations

10.6 Proof of Theorem 10.3.1

305

leading to (10.5.6) are still valid in the unbounded case. Since δ ∈ (0, 1], we have by Markov inequality,   rn  1 1 2 I1 = O E[Vj,n |Sn |δ ] wnδ rn P(|X 0 | > un ) j=1     rn  rn  1 rn 1 2 δ =O E[Vi,n Vj,n ] = O = o(1) , wnδ rn P(|X 0 | > un ) i=1 j=1 wnδ by (10.2.5) and (10.2.7). As for I2 , the second term in (10.5.8) is handled again by Condition Sψ (rn , un ). Applying Markov inequality, the term in (10.5.7b) is bounded by rn rn i+    1 rn E[|V0,n |2+δ ] δ , E[Vi,n Vj,n Vk,n ]≤ δ ηwn rn P(|X 0 | > un ) ηwnδ P(|X 0 | > un ) j=i+1 i=+1

k=1

on account of the extended H¨ older inequality with p = q = 2 + δ and r = (2 + δ)/δ. This again is o(1) by (10.2.5) and (10.2.7). The term in (10.5.7a) is treated analogously.  Finally, we check (BCLT3). For a tail array sum, this boils down to computing the variance of the sum over a small block. By Lemma 10.5.1 we have ⎡⎛ ⎞2 ⎤  n ⎥ ⎢  E ⎣⎝ φ(X j /un )⎠ ⎦ = O(n P(|X 0 | > un )) = o(rn P(|X 0 | > un )) . j=1

This proves (BCLT2) and finishes the proof of Theorem 10.2.3.

10.6 Proof of Theorem 10.3.1 Recall from (10.0.3) that we have defined: √  n (H) = k{ ν ∗n,rn (H) − ν ∗n,rn (H)} . G As already mentioned, we know from Theorem 10.2.1 that the finite-dimensional  n converge to those of G. We only need to prove distributions (in L2 (ν ∗ )) of G asymptotic equicontinuity over the relevant class G. As in the proof of Theorem 10.2.1, we consider a pseudo-sample (X †1 , . . . , X †n ) such that the blocks (X †(i−1)rn +1 , . . . , X irn ), i = 1, . . . , mn , are mutually independent with the same distribution as the original block (X 1 , . . . , X rn ). We define the array of rowwise i.i.d. processes Zn,i on L2 (ν ∗ ), 1 ≤ i ≤ mn by † † Zn,i (H) = H(u−1 n (X (i−1)rn +1 , . . . , X irn )) , i = 1, . . . , mn .

306

10 Estimation of cluster functionals

Thanks to Lemma E.3.4 it suffices to prove the asymptotic equicontinuity of  † defined by the process G n  † (H) = k −1/2 G n

mn 

{Zn,i (H) − E[Zn,i (H)} , H ∈ L2 (ν ∗ ) ,

i=1

indexed by the class G equipped with the metric ρ∗ (H, H ) = ν ∗ ({H −H }2 ).  n , but with summation over  n,i , i = 1, 2, be defined in the same way as G Let G  † be defined in analogous even and odd numbered blocks, respectively, and G n,i way based on the independent blocks. Then Lemma E.3.4 yields, for i = 1, 2, ⎛ ⎞ P⎝

sup

H,H  ∈G ρ ∗ (H,H  ) ⎠ |G ⎛ ≤ P⎝

⎞ sup

H,H  ∈G ρ ∗ (H,H  ) ⎠ + 2mn βrn ,

where ρ∗ (H, H ) = ν ∗ ({H − H }2 ). By condition β(rn , n ), mn βrn = o(1).  † indexed Hence, we will apply Theorem C.4.5 to the sequence of processes G n by the class G = {Hs , s ∈ [s0 , t0 ]}. • The Lindeberg condition (i) of Theorem C.4.5 holds because the class G is linearly ordered and by condition (BCLT2) of Theorem 10.2.1. • Define the random metric d2n (H, K) =

mn 1 † † {H(u−1 n (X (i−1)rn +1 , . . . , X irn )) k i=1 † † 2 −K(u−1 n (X (i−1)rn +1 , . . . ,X irn ))} .

(10.6.1)

Then E[d2n (Hs , Ht )] = ν ∗n,rn ({Hs − Ht }2 ). Thus Condition (ii) of Theorem C.4.5 can be rewritten as follows: for every sequence {δn } decreasing to zero, lim

n→∞

sup s,t∈[s0 ,t0 ] |s−t|≤δn

ν ∗n,rn ({Hs − Ht }2 ) = 0 .

(10.6.2)

By assumption, the class {Hs , s ∈ [s0 , t0 ]} satisfies condition (BCLT1). Because of monotonicity, by Dini’s Theorem E.4.12, the convergence of ν ∗n,rn (Hs2 ) to ν ∗ (Hs2 ) = s−α ν ∗ (H 2 ) is uniform on [s0 , t0 ]. Furthermore, for s0 ≤ s ≤ t ≤ (s + δ) ∧ t0 , the linear ordering implies that ν ∗n,rn ({Hs − Ht }2 ) = ν ∗n,rn (Hs2 ) + ν ∗n,rn (Ht2 ) − 2ν ∗n,rn (Hs Ht ) ≤ ν ∗n,r (Hs2 ) − ν ∗n,r (Ht2 ) . n

n

10.6 Proof of Theorem 10.3.1

For s, t ∈ [s0 , t0 ], ∗ ν n,r (Hs2 ) − ν ∗n,r (Ht2 ) n n ≤2

sup

s0 ≤u≤t0

307

∗ ν n,r (Hu2 ) − ν ∗ (Hu2 ) + ν ∗ (H 2 ){s−α − t−α } . n

Fix η > 0. For large enough n, the uniform convergence yields sup ν ∗n,rn (Hs2 ) − ν ∗n,rn (Ht2 ) ≤ η + ν ∗ (H 2 ) sup {s−α − t−α } s0 ≤s,t≤t0 |s−t|≤δn

s0 ≤s,t≤t0 |s−t|≤δn

δn ν ∗ (H 2 ) . ≤ η + αs−α−1 0 This proves that (10.6.2) holds. • Since the class G is linearly ordered, we know by Lemma C.4.8 and Corollary C.4.20 that the random entropy condition (C.4.5) of Theorem C.4.5 holds.  n is asymptotically The conditions of Theorem C.4.5 hold, thus the sequence G equicontinuous and thus converges weakly in (D([s0 , t0 ]), J1 ) to its limit G which has almost surely uniformly continuous paths. • We must now justify the convergence (10.3.5), jointly with the previous  n (s) be the tail empirical process of the sequence {|X j | , j ∈ Z} (as one. Let T defined in (9.0.1)). Write ζn = |X|(n:n−k) /un . By Theorem 9.3.1, assumptions R(rn , un ), β(rn , n ), S(rn , un ) and the bias condition (10.3.2a) imply that √ w  n , k(ζn − 1)) =⇒ (T (T, α−1 T(1)) in D([s0 , ∞)) × R. For s, t > 0 (cf. Problem 5.25),  P(|Y j | > t−1 s) , ν ∗ (Es Et ) = s−α j∈Z

thus the process {T(s), s > 0} has the same distribution as {G(Es ), s > 0}. That is √ w  n (s), k(ζn − 1)) =⇒ (T (G(Es ), α−1 G(E)) , n in D([s0 , ∞) × R. To conclude, we must prove that the convergence of T  n . Since T  n (s) = G  n (Es ) + Rn (s) with holds jointly with that of G Rn (s) =

√ k

n 

(1{|X j | > un s} − s−α ) ,

j=mn rn +1

it suffices to prove that the remainder term Rn converges uniformly to 0. This  n in D. follows from the same arguments that yields the convergence of T

308

10 Estimation of cluster functionals

10.6.1 Proof of asymptotic equicontinuity for Theorem 10.4.9 Lemma 10.6.1 Under the assumptions of Theorem 10.4.9, for all η > 0, ⎞ ⎛ sup Υh,n (s, y) − Υh,n (t, z) > η ⎠ = 0 . lim lim sup P ⎝ sup δ→0 n→∞

s0 ≤s,t≤t0 a≤y,z≤b |y−z|≤δ |s−t|≤δ

Proof. We will prove, by applying Theorem C.4.5, the asymptotic equicontinuity of the sequence of processes  n (φs,y ), (s, y) ∈ [s0 , t0 ] × [a, b]} {A for all 0 < s0 < 1 < t0 and a < b. As in the previous results, it is sufficient  † . That is, let (X † , 1 ≤ to consider the process with independent blocks A n i i ≤ mn rn ) be random variables such that the blocks (X(i−1)rn +1 , . . . , Xirn ), 1 ≤ i ≤ mn , are rowwise i.i.d. with the same distribution as (X1 , . . . , Xrn ). Define the independent block of (h + 1)-dimensional elements

X †(i−1)rn +1,(i−1)rn +h , . . . , X †irn −h+1,irn , 1 ≤ i ≤ mn , X†n,i = u−1 n  † (Hφ ) .  † (φs,y ) = G A n n s,y Set H = {φs,y , (s, y) ∈ [s0 , t0 ] × [a, b]} and define the metric ρh on H by ρh ((s, y), (t, z)) = |s − t| + |y − z| . The class H is totally bounded for ρh . We now check conditions (i), (ii), and (iii) of Theorem C.4.5. Define vn = mn rn P(|X 0 | > un ) and consider the row-wise independent processes ir n −h

Zn,i (s, y) = vn−1/2

† φs,y (u−1 n X j,j+h ) , 1 ≤ i ≤ mn .

j=(i−1)rn +1

(i) The envelope function is Zn,1 H = vn−1/2

r n −h

  1 |Xj† | > un .

j=1

Thus condition (i) of Theorem C.4.5 holds by Lemma 10.5.2. (ii) Define the random metric dn on H by d2n ((s, y), (t, z)) = vn−1

mn  i=1

{Zn,i (s, y) − Zn,u (t, z)}2 .

10.6 Proof of Theorem 10.3.1

309

Using the inequality (a − b)2 ≤ |a2 − b2 | for a, b ≥ 0, we have, for s < t, E[d2n ((s, y), (t, z))] E[{Zn,1 (s, y) − Zn,1 (t, z)}2 ] rn F 0 (un ) E[{Zn,1 (s, y) − Zn,1 (t, y)}2 ] E[{Zn,1 (t, y) − Zn,1 (t, z)}2 ] ≤2 +2 rn F 0 (un ) rn F 0 (un ) 2 E[{Zn,1 (s, ∞) − Zn,1 (t, ∞)} ] E[{Zn,1 (s0 , y) − Zn,1 (s0 , z)}2 ] ≤2 +2 rn F 0 (un ) rn F 0 (un ) 2 2 2 2 (s0 , z)]| E[Zn,1 (s, ∞)] − E[Zn,1 (t, ∞)] |E[Zn,1 (s0 , y)] − E[Zn,1 + . ≤2 rn F 0 (un ) rn F 0 (un ) =

By Lemma 10.5.1, we have lim

n→∞

lim

n→∞

2 E[Zn,1 (s, ∞)]

rn F 0 (un ) 2 E[Zn,1 (s0 , y)] rn F 0 (un )

= s−α , = s−α 0



P(Θh ≤ y, Θj+h ≤ |Θj |y, Yj > 1) .

j∈Z

 Note that the function y → j∈Z P(Θh ≤ y, Θj+h ≤ |Θj |y, Yj > 1) is continuous. Indeed, since the distribution of Θh is continuous, we have, by the time change formula (5.3.1a), P(Θj+h = |Θj |y, Yj > 1) = P(Yj+h = |Yj |y, Yj > 1) = P(Yh = |Y0 |y, Y−j > 1) = P(Θh = y, Y−j > 1) = 0 . By Dini’s Theorem E.4.12, these convergences are uniform, thus Lemma C.4.7 implies that the condition (ii) of Theorem C.4.5 holds. (iii) To prove that the random entropy condition (iii) of Theorem C.4.5 holds,  † built with independent blocks corresponding to we consider the process G n  = {Hφ , (s, y) ∈ [s0 , t0 ] × the one defined in (10.0.3), indexed by the class H s,y ˜  [a, b]}. Define the random metric dn on H by d˜n (Hφs,y , Hφt,z ) = dn (φs,y , φt,z ) . Thus proving that the random entropy condition holds for H with respect to  with respect to the the random metric dn is equivalent to proving it for H ˜ random metric dn . To this purpose, we will apply Lemma C.4.22. Define the function G on 0 (Rh+1 ) by    (0) (0) (h) 1 |xj | > 1 , x = (xj )j∈Z , xj = (xj , . . . , xj ) . G(x) = j∈Z

310

10 Estimation of cluster functionals

k of functions on 0 (Rh+1 ) by (a) For k ≥ 1, we define the class H k = {H1{G ≤ k}, H ∈ H}  . H k is O(log k); see Corollary C.4.18. The VC-dimension of the class H  set Hk = H1{G ≤ k}. Then, (b) We define Gk = G1{G > k}. For H ∈ H,  Hk ∈ Hk and |H − Hk | ≤ Gk . (c) To prove that (C.4.10) holds, and note first that under condition S(rn , un ), Lemma 10.5.1 yields 

2  r n 1{|X | > u s } E m j n 0 n j=1 1  → s−α E[G2 (X†n,i )] = . 0 vn i=1 rn F 0 (un ) Furthermore, under condition (10.4.7), Burkholder’s inequality (E.1.6), for q ∈ (1, 2) 3 2 m mn n    q  † † 2 2 G (Xn,i ) − E[G (Xn,i )] ≤ cst E[|G2 (X†n,i ) − E[G2 (X†n,i )]|q ] E i=1 i=1  mn 2q   † E G(Xn,i ) ≤ cst i=1 rn  1{|Xj | > un s0 })2q ] . = cst mn E[( j=1

Choosing q ∈ (1, 2) such that 2q < 2 + τ with τ as in (10.4.7) yields 3 2 mn   q −1  G2 (X†n,i ) − E[G2 (X†n,i )] = O(vn−q mn rn F 0 (un )) E vn i=1

= O(vn−(q−1) ) = o(1) . We conclude that vn−1 holds.

m n i=1

P

G2 (X†n,i ) −→ s−α 0 . This proves that (C.4.10)

(d) Condition (C.4.11) holds with γ = 1/τ by Markov inequality and (10.4.7). Thus we can apply Lemma C.4.22 to obtain that the random entropy condition (C.4.5) holds. We have checked all the conditions of Theorem C.4.5 and this concludes the proof of the asymptotic equicontinuity. 

10.7 Bibliographical notes

311

10.7 Bibliographical notes From the point of view of statistical inference, there are two major paradigms in extreme value theory: the peaks-over-threshold (POT) method and the block maxima method. The former one consists of fitting a generalized Pareto distribution to the excesses in a sample over a high threshold. Tail array sums fit into this framework since in the estimation procedure (see, e.g. the Hill estimator) only values exceeding a large threshold are considered. The block maxima method consists of fitting an extreme value distribution to a sample of block maxima. More generally, one may consider statistics based on block maxima. One considers disjoint (non-overlapping) blocks {X (i−1)rn +1 , . . . , X irn }, i = 1, . . . , mn = [n/rn ] or sliding (overlapping) blocks, {X i+1 , . . . , X i+rn }, i = 0, . . . , n − rn , which results in different limit theorems. A discussion on a comparison between the POT and the block maxima methods is given in [BZ18]. The statistical inference based on large values involves estimation of finitedimensional characteristics (like the tail index discussed already in the previous chapter, finite-dimensional distributions of the extremogram or the spectral tail process) or requires estimation of extremal characteristics related to clusters (like estimation of the extremal index). For the former, we have already discussed in Chapter 9 the relevant references for the Hill estimator. Some results on tail array sums were obtained in [RLdH98]. Estimation of extremogram was considered in [DM09]. Estimation of the distribution of the spectral tail process is considered in [DSW15]. The authors study a different estimator of the spectral tail process, based on the time change formula. Typical assumptions used in the aforementioned references are: a version of mixing, an anticlustering condition and moment bounds on the number of exceedences over a large threshold. The use of the miscellaneous S assumption is due to [KSW18]. In all these references a version of the POT method is employed with a threshold un chosen such as nF (un ) → ∞. The rate of convergence in the central limit theorems is (nF (un ))1/2 . In [BS18b, BS18a] the authors use the block maxima method to estimate parameters of the limiting extreme value distribution. They use a different threshold as compared to un . The rates of convergence, n/rn , is linked to the block size. Therefore, it is rather hard to compare the POT and the blocks maxima method. Furthermore, one can argue that the blocks maxima method is rather restricted to estimation of the parameters of the limiting distribution of maxima (the tail index, the extremal index). Estimation of cluster functionals can be seen as a problem on a borderline between the POT and the block maxima methods. On one hand, one considers block statistics, on the other hand only statistics that exceed a large threshold contribute to the estimation procedure. See, e.g. the blocks estimator (10.4.2) of the extremal index. The essential reference for cluster functional estimation

312

10 Estimation of cluster functionals

is a seminal paper [DR10]. Historically, statistical inference for cluster indices centered around estimation of the extremal index (in the discussion below we assume that θ = ϑ) and of the limiting cluster distribution π(m). In [Hsi91a] two disjoint blocks estimators of the extremal index are considered, each stemming from different representations. Under the aforementioned typical assumptions, consistency and asymptotic normality are established. The limiting variance agrees with the one in Example 10.4.2. [Hsi93] studies so-called runs estimator. Consistency and asymptotic normality of blocks and runs estimators from [Hsi91a, Hsi93] are considered in [SW94] and [WN98]. Another disjoint blocks estimator is studied in [SW94]. Modifications of the estimator studied in [SW94] are considered in In [RSF09], including both disjoint and sliding blocks versions. They use again different threshold than un . It turns out that the sliding blocks estimator yields a smaller variance. In [Rob09] the problem of estimation of the extremal index θ and the limiting cluster size distribution π is considered. A link between the distribution of the limiting point process of exceedences and the cluster size distribution (see Problem 7.5 and Eq. (1.4) in [Rob09]) is utilized. [Rob08] extends this methodology to multivariate versions of the extremal index. All the estimators of the extremal index discussed above use the POT and the block maxima principles simultaneously. In [BB18] the authors consider estimation of the extremal index using the block maxima methodology. Theorem 3.2 there gives asymptotic normality and comparison of the variance for disjoint and sliding blocks.

11 Estimation for extremally independent time series

Let {Xj , j ∈ Z} be a stationary regularly varying univariate time series. We consider in this chapter extremally independent stationary time series whose finite-dimensional distributions satisfy Definition 3.2.1, which we restate here under the equivalent formulation from Theorem 3.2.2. Recall first that we defined the boundedness B0,h on (R\{0})×Rh which consists of sets separated from {0} × Rh and that we say that a Borel measure ν on (R \ {0}) × Rh is B0,h -boundedly finite if and only if ν(B) < ∞ for all B ∈ B0,h . Recall also that a sequence of B0,h -boundedly finite measures νn converges B0,h -vaguely# to ν if limn→∞ νn (A) = ν(A) for all ν-continuity sets A ∈ B0,h . Assumption 11.0.1 For all h ≥ 1, there exist non-zero B0,h -boundedly finite measures μ0,h on (R \ {0}) × Rh and regularly varying functions bh such that lim

x→∞

and 1 P P(|X0 | > x)



bh (x) =0 x

X0 X1 Xh , ,··· , x b1 (x) bh (x)

(11.0.1) 

 ∈·

v#

−→ μ0,h

(11.0.2)

in (R\{0}×Rh , B0,h ) and for every y0 > 0, the measures μ0,h ([y0 , ∞)×·) and μ0,h ((−∞, −y0 ) × ·) on Rh are not concentrated on a hyperplane.

In addition to the conditions of Theorem 3.2.2, we impose (11.0.1) in this chapter since we focus on extremally independent time series only. The index κh of regular variation of the function bh is called the (lag h) conditional scaling exponent. See Definition 3.2.3. © Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 11

313

314

11 Estimation for extremally independent time series

The exponents κh , h ≥ 1 reflect the influence of an extreme event at time zero on future lags. Even though we expect this influence to decrease with the lag in the case of extremal independence, these exponents are not necessarily monotone decreasing. The measures μ0,h also have some important homogeneity properties: For all Borel sets A0 ⊂ R \ {0}, A1 , . . . , Ah ⊂ R,    h  h   κi −α μ0,h tA0 × (11.0.3) t Ai = t μ0,h Ai . i=1

i=0

As in Section 3.2, let (Υ0 , Υ1 , . . . , Υh ) be a random vector with distribution μ0,h (· ∩ ((−∞, −1) ∪ (1, ∞)) × Rh ). Then |Υ0 | is a Pareto random variable with density αx−α−1 1{x > 1}. Note that by extremal independence, the tail process of X = {Xj , j ∈ Z} is (. . . , 0, Υ0 , 0, . . . ). As in (3.2.6), let   Υ0 Υ1 Υh , W h = (W0 , . . . , Wh ) = ,..., . κ |Υ0 | |Υ0 |κ1 |Υ0 | h Then |Υ0 | and W h are independent. Define the probability measure σ h on {−1, 1} × Rh by   σ h ({} × A) = μ0,h (du0 , uκ0 1 du1 , . . . , uκ0 h duh ) , u0 >1

A

for  ∈ {−1, 1} and A a Borel subset of Rh . It follows that W h is an Rh+1 valued random vector with distribution σ h and for every Borel subset A ⊂ (R \ {0}) × Rh , we have  ∞ μ0,h (A) = P((sW0 , sκ1 W1 , . . . , sκh Wh ) ∈ A) αs−α−1 ds . (11.0.4) 0

See Proposition 3.2.4. Then, as x → ∞,    X0 X1 Xh , ,..., L | |X0 | > x x b1 (x) bh (x) −→ (Υ0 , Υ1 , . . . , Υh ) = (|Υ0 |W0 , |Υ0 |κ1 W1 , . . . , |Υ0 |κh Wh ) . d

(11.0.5)

For every integer h > 0 we denote the distribution function of Υh by Ψh . That is, for all y ∈ R: Ψh (y) = P(Υh ≤ y) = lim P(Xh ≤ bh (x)y | |X0 | > x) x→∞

= μ0,h ({(−∞, −1) ∪ (1, ∞)} × Rh−1 × (−∞, y]) ,

(11.0.6)

for all y ∈ R\{0} since the distribution of Υh is continuous at all points except possibly 0. We note also that the homogeneity property (11.0.3) implies, for all s > 0,

11.1 Finite-dimensional convergence of tail array sums

315

μ0,h ({(−∞, −s) ∪ (s, ∞)} × Rh−1 × (−∞, y]) = s−α μ0,h ({(−∞, −1) ∪ (1, ∞)} × Rh−1 × (−∞, s−κh y]) = s−α Ψh (s−κh y) . Note that bh is defined up to a multiplicative constant. If we replace bh by cbh with c > 0, then Ψh is replaced by Ψh (c·). We will fix the scaling by a moment condition in Section 11.2. We recall the usual assumptions: n , rn , un are scaling sequences such that lim nP(|X0 | > un ) = ∞ ,

lim rn P(|X0 | > un ) = 0 ,

(R(rn , un ))

n n = lim βn = 0 , n→∞ rn rn

(β(rn , n ))

P(|X0 | > un s, |Xj | > un t) =0. P(|X0 | > un ) j=m

(S(rn , un ))

n→∞

lim

1

n→∞ n rn 

lim lim sup

m→∞ n→∞

n→∞

= lim

n→∞

In Section 11.1, we will give results for tail array sums adapted to the extremal independence context, paralleling Section 10.2.1 and Section 10.4.3. These results will be applied in Section 11.2 to the estimation of the limiting conditional distribution Ψh at lag h. We consider only one-dimensional marginal distributions for simplicity. As compared to the estimation of the distribution of the spectral tail process, an additional step is needed: the estimation of the unknown scaling. In Section 11.3, we will study a particular estimator of the conditional scaling exponent based on the behavior of products of extremally independent variables obtained in Proposition 3.2.12.

11.1 Finite-dimensional convergence of tail array sums Let F 0 (x) = P(|X0 | > x). For brevity, we write   Xj Xj+1 Xj+h ,··· , , , i≥0. X n,j = un b1 (un ) bh (un ) This notation is not to be confused with X n,i introduced in Chapter 10 in the proof of Theorem 10.1.3. Let ψ : Rh+1 → R+ be a measurable function such that ψ(x0 , . . . , xh ) = 0 if |x0 | < 1 .

(11.1.1)

Assume furthermore that ψ is bounded or there exists δ > 0 such that

n,0 ) E ψ 2+δ (X un }1{|Xj | > un }] n,0 )ϕ(X n,0 )ϕ(X E[φ(X E[φ(X = . F 0 (un ) F 0 (un ) Thus, (11.1.5) holds by extremal independence. By stationarity we can write, for m ≥ 1,  m   n,0 )ϕ(X n,j )] E[Sn (φ)Sn (ϕ)] |j| E[φ(X = 1− rn rn F 0 (un ) F 0 (un ) j=−m ⎞ ⎛ rn  n,0 )ψ(X n,j )] E[ψ( X ⎠ . +O⎝ F (u ) 0 n j=m+1 By (11.1.4) and (11.1.5), the first term on the right-hand side converges to μ0,h (φϕ). The second term vanishes by assumption Sψ,ei (rn , un ), upon letting n → ∞ and then m → ∞. Finally, R(rn , un ) yields r

n  1 n,0 )]E[ϕ(X n,0 )]| ≤ |E[φ(X F 0 (un ) j=1



n,0 )] E[ψ(X F 0 (un )

and hence the result for the covariances follows.

2 rn F 0 (un ) → 0 

318

11 Estimation for extremally independent time series

The next result is similar to Lemmas 10.5.2 and 10.5.3. The proof is the same since extremal independence plays no role here. Lemma 11.1.4 Let the assumptions of Theorem 11.1.2 hold. Then   

E Sn2 (φ)1 |Sn (φ)| > δ nF 0 (un ) lim =0. (11.1.6) n→∞ rn F 0 (un ) This ends the preliminaries. † Proof (Proof of Theorem 11.1.2). Define mn = [n/rn ] and let {Xn,i ,1 ≤ i ≤ mn , n ≥ 1} be a triangular array of random variables such that the blocks † † {X(i−1)r , . . . , Xir } are independent and each have the same distribution n n +1 as (X1 , . . . , Xrn ). For i = 1, . . . , mn , define   † † † irn  Xj+h Xj Xj+1 † † † ). ,..., , Sn,i , (φ) = φ(X X n,j = n,j un b1 (un ) bh (un ) j=(i−1)rn +1

(11.1.7) The β-mixing property with the rate condition β(rn , n ) and Lemma E.3.5 imply that it suffices to prove weak (finite-dimensional) on Mψ convergence to Mh of the process mn    † † † (φ) = {nF 0 (un )}−1/2 S (φ) − E[S (φ)] . M n,i n,i h,n i=1

Lemma 11.1.3 yields the limiting covariance:   † (ϕ) = μ † (φ), M lim cov M n→∞

h,n

h,n

0,h (φϕ)

.

Lemma 11.1.4 proves the negligibility condition and thus we can invoke Theorem E.2.4. Thus, we obtain convergence of the process {nF 0 (un )}−1/2

mn 

{Sn,i (φ) − E[Sn,i (φ)]} ,

i=1

(with Sn,i defined as in (11.1.7) in terms of the original sequence {Xj , j ∈ Z}). h,n , we can apply the same argument as To conclude the convergence of M for the tail array sums in Chapter 10 and obtain that the reminder term n−h  j=mn rn +1 φ(X n,j ) of the original process is negligible; cf. (10.2.2).

11.2 Non-parametric estimation of the limiting conditional distribution Let s0 ∈ (0, 1) and for s > s0 , set

11.2 Non-parametric estimation of the limiting conditional distribution

φs,y (x0 , . . . , xh ) = 1{|x0 | > s, xh ≤ y} .

319

(11.2.1)

Write simply φs for φs,∞ . In view of (11.0.6), in order to define an estimator of Ψh , we must first consider the infeasible statistic I h,n (s, y) = μ h,n (φs,y ) =

n−h  1 1{|Xj | > un s, Xj+h ≤ bh (un )y} . nF0 (un ) j=1

(11.2.2) Then, Assumption 11.0.1 and the homogeneity property (11.0.3) imply that for all s > 0 and y ∈ R, lim E[I h,n (s, y)] = lim

n→∞

n→∞

P(|X0 | > un s, Xh ≤ b(un )y) F 0 (un )

= μ0,h ((s, ∞) × Rh−1 × (−∞, y]) = s−α Ψh (s−κh y) . We consider the sequence of processes Ih,n defined on (0, ∞) × R by

h,n (φs,y ) = nF0 (un ){I h,n (s, y) − E[I h,n (s, y)]} . Ih,n (s, y) = M Let Ih be the centered Gaussian process on (0, ∞) × R defined by Ih (s, y) = Mh (φs,y ) with covariance cov(Ih (s, y), Ih (t, z)) = (s ∨ t)−α Ψh ((s ∨ t)−κh (y ∧ z)) , s, t > 0 , y, z ∈ R . We note that W(u) = Ih (1, Ψh−1 (u)) , u ∈ (0, 1) ,

(11.2.3)

is a standard Brownian motion on (0, 1). Theorem 11.2.1 Let {Xj , j ∈ Z} be a stationary regularly varying sequence such that Assumption 11.0.1 holds. Assume moreover that fi.di. R(rn , un ), β(rn , n ), and S(rn , un ) hold. Then Ih,n −→ Ih on (0, ∞)×R.

Proof. This is a straightforward application of Theorem 11.1.2: set ψ = φs0 and note that S(rn , un ) implies Sψ,ei (rn , un ) in this case.  Estimation of the scaling function We now need proxies to replace un and bh (un ) which are unknown in order to obtain a feasible statistical procedure. As usual, un will be replaced by

320

11 Estimation for extremally independent time series

an order statistic. To estimate the scaling functions bh we will exploit their representations in terms of conditional mean. We will apply Theorem 11.1.2 with the class of functions {ψs , s ≥ s0 } defined by ψs (x0 , . . . , xh ) = |xh |1{|x0 | > s}

(11.2.4)

and ψ := ψs0 . Therefore, we need appropriate versions of (11.1.2) and Sψ,ei (rn , un ):   2+δ   E  bhX(uhn )  1{|X0 | > s0 un } sup s0 un }1{|Xj | > s0 un }



F 0 (un )

=0.

(Slin,ei (rn , un )) 2 Condition (11.2.5) implies that the sequence (b−1 h (un )Xh ) is uniformly integrable conditionally on |X0 | > un and consequently,   i lim E b−i h (x)|Xh | | |X0 | > x x→∞  ∞ α = |y|i Ψh (dy) = E[|Υ0 |iκh ]E[|Wh |i ] = E[|Wh |i ] < ∞ , i = 1, 2 . α − iκh −∞ (11.2.6)

Therefore Condition (11.2.5) entails α > 2κh . We define  ∞ α ch = |y| Ψh (dy) = E[|Υ0 |κh ]E[|Wh |] = E[|Wh |] . α − κh −∞ As noted following (11.0.6), bh and Ψh are defined up to scaling, so we can and will henceforth assume that ch = 1 for all h ≥ 1. Let k be an integer, define un by k = nF 0 (un ) and let |X|(n:1) ≤ |X|(n:2) ≤ · · · ≤ |X|(n:n) be the order statistics of |X1 |, . . . , |Xn |. An empirical counterpart of (11.2.6) for i = 1 yields bh,n =

n−h " ! 1 |Xj+h |1 |Xj | > |X|(n:n−k) . k j=1

(11.2.7)

Remark 11.2.2 If we had not assumed that ch = 1, then bh,n would be an estimator of ch bh . Furthermore, the moment condition (11.2.5) may seem restrictive. In fact, we could consider a family of estimators bh,n (ζ), where |Xj+h | in (11.2.7) is replaced with |Xj+h |ζ with some ζ > 0. We choose ζ = 1 for simplicity, but all the arguments below can be easily adapted. ⊕

11.2 Non-parametric estimation of the limiting conditional distribution

321

To obtain the asymptotic behavior of this estimator, we will also need two bias conditions:  

 P(|X0 | > un s, Xh ≤ b(un )y)  −α −κh   lim − s Ψh (s sup nF 0 (un )  y) = 0 , n→∞ s≥s0 ,y∈R F 0 (un ) (11.2.8a) 

  E |Xh | 1{|X | > u s} 

0 n   bh (un ) lim sup nF 0 (un )  − sκh −α  = 0 . n→∞ s≥s0 F 0 (un )   (11.2.8b) We will also need the bias condition of Theorem 9.3.1: there exists s0 ∈ (0, 1) such that   √  P(|X0 | > un s)  −α   (11.2.9) lim k sup  −s =0. n→∞ s≥s0 P(|X0 | > un ) Recall that W(u) = Ih (1, Ψh−1 (u)) is a standard Brownian motion and let B be the associated Brownian bridge, i.e. B(u) = W(u) − uW(1) , u ∈ [0, 1] . Proposition 11.2.3 Let {Xj , j ∈ Z} be a stationary regularly varying sequence such that Assumption 11.0.1 holds. Assume that R(rn , un ), β(rn , n ), and S(rn , un ) hold. Assume that there exist δ > 0 and s0 ∈ (0, 1) such that (11.1.3), (11.2.5), (11.2.8a), (11.2.8b), (11.2.9), and Slin,ei (rn , un ) hold. Then      √ √ |X|(n:n−k) bh,n −1 k −1 , k un bh (un )    1 d −→ α−1 W(1), |Ψh−1 (u)|B(du) + α−1 κh W(1) . (11.2.10) 0

Proof. We already know the convergence of the first component of the lefthand side of (11.2.10) by Theorem 9.3.1. We only need to prove the convergence of the second one and the joint convergence is a consequence of Theorem 11.1.2. As a particular case of Theorem 11.1.2 we obtain   fi.di. h,n (ψs ), s > 0 −→ {Mh (ψs ), s > 0} (11.2.11) M with ψs defined in (11.2.4). By Lemma 11.4.1 below and Theorem C.2.15, the convergence holds in D([s0 , ∞)) endowed with the J1 topology and the limiting process has almost surely continuous paths.

322

11 Estimation for extremally independent time series

Write ζn = |X|(n:n−k) /un and θn = bh,n /bh (un ). Thus, by the bias condition (11.2.8b) and the delta method, √ √ √ h,n (ψζ ) + k(E[M h,n (ψζ )] − ζ κh −α ) + k(ζ κh −α − 1) k(θn − 1) = M n

n

n

d

−→ Mh (ψ1 ) + α

−1

n

(κh − α)Mh (φ1 ) ,

(11.2.12)

with φs defined in (11.2.1). By definition of W, (see (11.2.3)), Mh (φ1 ) = W(1). Also,  ∞  1 −1 |y|W(Ψh (dy)) = |Ψh−1 (u)|W(du) . Mh (ψ1 ) = −∞

0

By the normalization ch = 1, we have   1 |Ψh−1 (u)|du = 0

Thus Mh (ψ1 ) − Mh (φ1 ) =

#1 0



−∞

|y|Ψh (y) = 1 .

|Ψh−1 (u)|B(du).



Estimation of the limiting conditional distribution We can now define a data-driven estimator of Ψh :   |X| b (n:n−k) h,n y Ψh,n (y) = I h,n , un bh (un ) =

(11.2.13)

n−h  "  1 ! 1 |Xj | > |X|(n:n−k) 1 Xj+h ≤ bh,n y . k j=1

If we had not assumed that ch = 1, then Ψh,n (y) would be an estimator of Ψh (ch y). The latter function is the rescaled version of Ψh which satisfies the moment condition ch = 1. Assuming that Ψh is differentiable we define the process Λh by  1  |Ψh−1 (u)|B(du) . Λh (y) = B(Ψh (y)) + yΨh (y)

(11.2.14)

0

Theorem 11.2.4 Let the assumptions of Theorem 11.2.1 and Proposition 11.2.3 hold. Assume moreover that the function Ψh is differentiable and that there exists τ > 1 such that  2+τ  r n E 1{|X | > u } j n j=1 1} − α log+ (u) + 1{u > 1})H(du, dv) .

There remains to compute the variance σ 2 of the limiting distribution which is Gaussian with zero mean. Setting h(u, v) = γh−1 log+ (u|v|) − 1{u|v| > 1} − α log+ (u) + 1{u > 1} , and combining with (11.0.4) we obtain  ∞  h2 (s, sκh |Wh |)αs−α−1 ds . σ 2 = (1 + κh )2 E 0

Substituting t = s−α we obtain  ∞ h2 (s,sκh |Wh |)αs−α−1 ds 0  ∞ h2 (t−1/α , t−κh /α |Wh |)dt = 0  ∞   −1 −1 log+ (t−1 |Wh |γh ) − 1 |Wh |γh > t = ! "2  dt − log+ (t−1 ) − 1{t < 1}

0

−1

= ||Wh |γh − 1| . This proves (11.3.5).



11.4 Asymptotic equicontinuity In this section, we prove the asymptotic equicontinuity properties needed in the proofs of Proposition 11.2.3 and Theorem 11.2.4. Recall from (11.2.4) that we have defined, for s > 0, ψs (x0 , . . . , xh ) = |xh |1{|x0 | > s}. Lemma 11.4.1 Under the assumptions of Proposition 11.2.3, for all 0 < h,n (ψs ), s0 ≤ s ≤ t0 } is asymptots0 < t0 < ∞, the sequence of processes {M ically equicontinuous.

328

11 Estimation for extremally independent time series

Proof. Recall the notation   Xj Xj+1 Xj+h ,··· , , X n,j = , j≥0 un b1 (un ) bh (un ) †

. First, and the corresponding one for the independent blocks process: X n,j as in Chapters 9 and 10, it suffices to prove asymptotic equicontinuity for the † built on independent blocks. Define the independent random process M h,n processes Zn,i , 1 ≤ i ≤ mn by Zn,i (φ) = (mn rn P(|X0 | > un ))−1/2

irn 



). φ(X n,j

(11.4.1)

j=(i−1)rn +1

for φ : Rh+1 → R. Then, † (φ) = M h,n

mn 

{Zn,i (φ) − E[Zn,i (φ)]} .

i=1

We apply Theorem C.4.5 to prove that for all η > 0, ⎞ ⎛     † (ψs ) − M † (ψt ) > η ⎠ = 0 . lim lim sup P ⎝ sup M h,n h,n δ→0 n→∞

(11.4.2)

s0 ≤s,t≤t0 |s−t|≤δ

The class G = {ψs , s ∈ [s0 , t0 ]} is totally bounded for the metric ρ(ψs , ψt ) = |t − s|. We must check conditions (i), (ii), and (iii) of Theorem C.4.5. – The asymptotic negligibility condition (C.4.3) holds by Lemma 11.1.4. – Define vn = mn rn P(|X0 | > un ) and the random metric dn on G by ⎧ ⎫2 irn mn ⎨  ⎬   † † 1 ) − ψt (X ) ψs (X . d2n (ψs , ψt ) = n,j n,j ⎩ ⎭ vn i=1

Thus E E[d2n (ψs , ψt )] =

j=(i−1)rn +1

 r n  j=1

2  n,j ) − ψt (X n,j ) ψs (X

rn P(|X0 | > un )

.

Note that for s < t, ψs > ψt thus   2  2  r n r n n,j ) n,j ) ψ ( X ψ ( X E − E s t j=1 j=1 . E[d2n (ψs , ψt )] ≤ rn P(|X0 | > un ) By Lemma 11.1.3, Conditions (11.2.5), Slin,ei (rn , un ) and extremal independence, we have for every s ≥ s0 ,

11.4 Asymptotic equicontinuity

E lim

n→∞



2  r n n,j ) ψ ( X s j=1

rn P(|X0 | > un )

329

= μ0,h (ψs2 ) = s−α+2κh E[Υh2 ] .

Moreover, the convergence is uniform by Dini Theorem E.4.12. Since α > 2κh , the limit is uniformly continuous on [s0 , ∞). Thus we can apply Lemma C.4.7 to prove that (C.4.4) holds. – To prove that the random entropy condition (C.4.5) holds, we will apply Lemma C.4.8 and Corollary C.4.20. For this purpose, instead of the process † indexed by the class G = {ψs , s0 ≤ s ≤ t0 } we must consider the process M h,n n built on the independent blocks G   † † X†n,i = X , 1 ≤ i ≤ mn , (11.4.3) n,(i−1)rn +1 , . . . , X irn indexed by theclass G = {Hψs , s0 ≤ s ≤ t0 } of functions defined on 0 (Rh+1 ) by Hψs (x) = j∈Z ψs (xj ), that is   n (Hψ ) = v −1/2 summn Hψ (X† ) − E[Hψ (X† )] = M † (ψs ) . G n s s s n,i n,i i=1 h,n (11.4.4) The blocks X†n,i are identified to elements of 0 (Rh+1 ) by adding zeros to the left and to the right. The envelope function G of the class G is  G(x) = ψs0 (xj ) , x = (xj )j∈Z ∈ 0 (Rh+1 ) . j∈Z

By definition of ψs0 , the series has finitely many terms. By Lemma 11.1.3, we have  2  r n E j=1 ψs0 (X n,j ) mn E[G2 (Xn,0 )] → μ0,h (ψs20 ) < ∞ . = vn rn P(|X0 | > un ) Thus (C.4.8) of Lemma C.4.8 holds. We define the random semi-metric d˜n on G by d˜2n (Hψs , Hψt ) = vn−1

mn 

{Hψs (X†n,i ) − Hψt (X†n,i )}2

i=1

= vn−1

⎧ rn mn ⎨  i=1



j=1

† ψs (X n,(i−1)rn +j )



⎫2 ⎬

† ψt (X n,(i−1)rn +j )



= d2n (ψs , ψt ) . Thus, proving that the random entropy condition (C.4.5) holds for the class G with respect to the semi-metric dn is equivalent to proving that it holds

330

11 Estimation for extremally independent time series

for the class G with respect to the semi-metric d˜n . The latter property holds by Lemma C.4.8: indeed, (C.4.8) holds and the class G is linearly ordered, thus (C.4.9) holds by Corollary C.4.20. We have checked the conditions of Theorem C.4.5, hence (11.4.2) holds.



Recall from (11.2.1) that φs,y (x0 , . . . , xh ) = 1{|x0 | > s, xh ≤ y}. Fix 0 < s0 < t0 < ∞ and a0 < a1 ∈ R. h,n (φs,y ), s0 ≤ s ≤ t0 , a0 ≤ Lemma 11.4.2 The sequence of processes {M y ≤ a1 } is asymptotically equicontinuous. Proof. The proof is similar to the proof of Lemma 11.4.1, the main difference being in the way the random entropy condition is checked. Again, it † based on suffices to prove asymptotic ρ-equicontinuity for the process M h,n the independent random processes Zn,i , 1 ≤ i ≤ mn defined in (11.4.1). We must check the conditions of Theorem C.4.5 for the class H = {φs,y , s0 ≤ s ≤ t0 , a0 ≤ y ≤ a1 } endowed with the metric ρh (φs,y , φt,z ) = |s − t| + |y − z| for which it is totally bounded. We write as before vn = mn rn P(|X0 | > un ). – The negligibility condition (C.4.3) holds by Lemma 11.1.3. – Define the random metric dn as in the proof of Lemma 11.4.1: ⎧ ⎫2 irn mn ⎨ ⎬     † † 1 ) − φt,z (X ) φs,y (X d2n (φs,y , φt,z ) = . n,j n,j ⎩ ⎭ vn i=1

j=(i−1)rn +1

Using monotonicity and the inequality (a − b)2 ≤ |a2 − b2 | for a, b ≥ 0 and the notation of Lemma 11.1.3, we have E[{Sn (φs,y ) − Sn (φt,z )}2 ] E[d2n (φs,y , φt,z )] = rn P(|X0 | > un ) E[{Sn (φt,y )−Sn (φt,z )}2 ] E[{Sn (φs,y )−Sn (φt,y )}2 ] +2 ≤2 rn P(|X0 | > un ) rn P(|X0 | > un )   2 2 2 2   E[S (φ E[Sn (φs )]−E[Sn (φt )] n s0 ,y )]−E[Sn (φs0 ,z )] +2 . ≤2 rn P(|X0 | > un ) rn P(|X0 | > un ) By Lemma 11.1.3, we have E[Sn2 (φs )] lim = μ0,h (φs )) = s−α , n→∞ rn P(|X0 | > un ) E[Sn2 (φs0 ,y )] −κh = μ0,h (φs0 ,y ) = s−α y) . lim 0 Ψh (s n→∞ rn P(|X0 | > un ) Both convergences are uniform on [s0 , t0 ] and [a0 , a1 ], respectively, so we can apply Lemma C.4.7 to conclude that (C.4.4) holds.

11.4 Asymptotic equicontinuity

331

n defined in – For the random entropy integral, we consider the process G (11.4.4), indexed by the class H = {Hφs,y , s0 ≤ s ≤ t0 , a0 ≤ y ≤ a1 }. The envelope function G of this class is defined by    (0) 1 |xj | > s0 , G(x) = j∈Z (0)

(h)

where we denote x = (xj )j∈Z and xj = (xj , . . . , xj ). We write as before n (Hφ ) = v −1/2 G n

mn    h,n (φ) . Hφ (X†n,i ) − E[Hφ (X†n,i )] = M i=1

We define the random semi-metric d˜n on G by d˜2n (Hφs,y , Hφt,z ) = vn−1

mn 

{Hφs,y (X†n,i ) − Hφt,z (X†n,i )}2 = d2n (φs,y , φt,z ) .

i=1

In order to prove that the random entropy condition (C.4.5) holds for the ˆ we apply Lemma C.4.22. class H endowed with the random metric d, (i) For k ≥ 1, we define the class Hk by Hk = {H1{G ≤ k}, H ∈ H} . The VC-dimension of the class Hk is O(log k); see Lemma 11.4.3. (ii) We define Gk = G1{G > k}. For H ∈ H, set Hk = H1{G ≤ k}. Then, Hk ∈ Hk and |H − Hk | ≤ Gk . (iii) To prove that (C.4.10) holds, note first that under condition S(rn , un ), Lemma 11.1.3 yields  2  r n E mn j=1 1{|Xj | > un s0 } 1  E[G2 (X†n,i )] = . → s−α 0 vn i=1 rn F 0 (un ) Furthermore, under condition (11.2.15), Burkholder’s inequality (E.1.6), for q ∈ (1, 2)  + * m mn n    q    G2 (X†n,i ) − E[G2 (X†n,i )]  ≤ cst E  E[|G2 (X†n,i ) − E[G2 (X†n,i )]|q ]   i=1 i=1  mn 2q     ≤ cst E G(X†n,i ) i=1 rn  1{|Xj | > un s0 })2q ] . = cst mn E[( j=1

332

11 Estimation for extremally independent time series

Choosing q ∈ (1, 2) such that 2q < 2 + τ with τ as in (11.2.15) yields  + * mn    q  −1   † † 2 2 E vn G (Xn,i ) − E[G (Xn,i )]  = O(vn−q mn rn F 0 (un ))   i=1

= O(vn−(q−1) ) = o(1) . We conclude that vn−1 holds.

m n

i=1

P

G2 (X†n,i ) −→ s−α 0 . This proves that (C.4.10)

(iv) Condition (C.4.11) holds with γ = 1/τ by Markov inequality and (11.2.15). Thus we can apply Lemma C.4.22 to obtain that the random entropy condition (C.4.5) holds. We have checked all the conditions of Theorem C.4.5 and this concludes the proof of Lemma 11.4.2.  Lemma 11.4.3 The VC-dimension of the class Hk is O(log k). Proof. For s0 ≤ s ≤ t0 and y ∈ R define gk,(s,y) : (Rh+1 )k → R by gk,(s,y) (x1 , . . . , xk ) =

k 

φs,y (xj ) .

j=1

Let Fk = {gk,(s,y) , s0 ≤ s ≤ t0 ; y ∈ R}. By Example C.4.16, the VC-index of the class Fk is O(log k). Define the map Πk : (Rh+1 )Z → (Rh+1 )k by   (0) Πk (x) = (x(1) , . . . , x(k) )1 |x(k+1) | ≤ s0 , (0)

(h)

where x(i) = (x(i) , . . . , x(i) ), i ≥ 1, are the decreasing order statistics according to the first component. Then  1{Es0 (x) ≤ k} φs,y (xj ) = gk,(s,y) (Πk (x)) . j∈Z

Thus Hk = Fk ◦Πk and by Example C.4.14, VC(Hk ) ≤ VC(Fk ) = O(log(k)). 

11.5 Bibliographical notes This chapter essentially follows [BBKS20] which seems to be the only reference for estimation of a limiting conditional distribution in a heavy-tailed time series context.

12 Bootstrap

In Section 10.3 we obtained a central limit theorem for data-based estimators  ∗n (H) of cluster functionals ν ∗ (H). Theorem 10.3.1 allows us to construct ν confidence intervals for ν ∗ (H) of the form   1 1  ∗n,rn (H) − √ σ(H)zβ/2 , ν  ∗n,rn (H) + √ σ(H)zβ/2 , ν k k where σ 2 (H) is the limiting variance of the estimator and zβ/2 is 1−β/2 quantile of the standard normal random variable. Unfortunately, in most cases, the limiting variance depends on unknown parameters and has a complicated form, often an infinite series, as can be seen in the examples of Section 10.4. Thus constructing estimators of σ 2 (H) seems hardly feasible. Therefore, we will consider the multiplier bootstrap method to obtain confidence intervals. Let {X j , j ∈ Z} be an Rd -valued, stationary regularly varying time series. In Section 10.3, we defined √  ∗   n (H) = k ν  n,rn (H) − ν ∗ (H) G  mn √ i=1 H(X (i−1)rn +1,irn /|X|(n:n−k) ) ∗ − ν (H) . = k k  n (H) converges in distribution Under the assumptions of Theorem  10.3.1, G to G(H − ν ∗ (H)E), where E(x) = j∈Z 1{|xj | > 1} and G was defined at the beginning of Section 10.2. As seen in Remark 10.3.6, we can replace k by the random normalization k ν ∗n,rn (E) and obtain that  ∗ √  n,rn (H) ν ∗ k − ν (H)  ∗n,rn (E) ν also converges weakly to G(H − ν ∗ (H)E). Let now {ξj , j ∈ Z} be a sequence of i.i.d. random variables with E[ξ1 ] = 0 and E[ξ12 ] = 1, independent of {X j , j ∈ Z} and define © Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 12

333

334

12 Bootstrap

 ∗n,rn ,ξ ν

mn 1

= (1 + ξi )δ|X |−1 (X (i−1)rn +1 ,...,X irn )) . (n:n−k) k i=1

Define the multiplier statistics  ∗ √  n,rn ,ξ (H) ν  ∗n,rn (H) ν  Gn,ξ (H) = k − ∗  ∗n,rn ,ξ (E)  n,rn (E) ν ν  mn √ i=1 (1 + ξi )H(X (i−1)rn +1,irn /|X|(n:n−k) ) = k m n i=1 (1 + ξi )E(X (i−1)rn +1,irn /|X|(n:n−k) ) mn i=1 H(X (i−1)rn +1,irn /|X|(n:n−k) ) − m n . (12.0.1) i=1 E(X (i−1)rn +1,irn /|X|(n:n−k) ) We will prove in Theorem 12.1.1 that P   n,ξ (H) ≤ x | F X − P(G(H) ≤ x)| −→ 0 , sup |P G n x∈R

where FnX denotes the sigma algebra generated by {X 1 , . . . , X n }. This also implies that P  n (H) ≤ x)| −→  n,ξ (H) ≤ x | F X − P(G 0. sup |P G n x∈R

If for sufficiently large n, Aβ is an interval of R (depending on {X 1 , . . . , X n }) such that  n,ξ (H) ∈ Aβ | F X = β , P G n

then also √    ∗n,rn (H) − ν ∗ (H) ∈ Aβ P k ν

 ∗  √  n,rn (H) ν ≈P k − ν ∗ (H) ∈ Aβ | FnX ≈ β .  ∗n,rn (E) ν To find an approximation of such set Aβ , we can employ the following Monte Carlo procedure: • Simulate independently ξ(l) = {ξ1,l , . . . , ξmn ,l }, l = 1, . . . , N , where N is a positive integer;  n,ξ(l) (H) using the multiplier vector ξ(l) and • For each l calculate Vl = G the same data set X 1 , . . . , X n each time; • For β ∈ (0, 1) calculate V(N :[N β/2]) and V(N :[N (1−β/2)]) , the β/2 and 1 − β/2 sample quantiles of V1 , . . . , VN .

12.1 Conditional multiplier central limit theorem

335

• The set Aβ can be approximated by (V(N :[N β/2]) , V(N :[N (1−β/2)]) ) . This yields the pivotal bootstrap (1 − β)-confidence interval: (2 ν ∗n,rn (H) − V(N :[N (1−β/2)]) , 2 ν ∗n,rn (H) − V(N :[N β/2]) ) . In Theorem 12.1.1, we will prove a bootstrap version of Theorem 10.3.1, which will be applied to tail array sums and in particular to the Hill estimator.

12.1 Conditional multiplier central limit theorem The following result is the multiplier bootstrap counterpart to Theorem 10.3.1. We will not need the bias conditions (10.3.2a)–(10.3.2b), but we will need one additional consistency assumption. Theorem 12.1.1 Let {X j , j ∈ Z} be a stationary, regularly varying Rd valued time series. Assume that the distribution F0 of |X 0 | is continuous. Let k be a nondecreasing sequence of integers and define un = F0← (1 − k/n). Assume that R(rn , un ), β(rn , n ), and S(rn , un ) hold. Fix 0 < s0 < 1 < t0 . Let H : (Rd )Z → R be a shift-invariant measurable map such that the class {Hs : s ∈ [s0 , t0 ]} is linearly ordered and satisfies (BCLT1), (BCLT2), and (BCLT3) of Theorem 10.2.1. Assume furthermore that for all s ∈ [s0 , t0 ], P

 ∗n,rn (Hs2 ) −→ ν ∗ (Hs2 ) = s−2α ν ∗ (H 2 ) . ν

(12.1.1)

Then, P  n,ξ (H) ≤ x | F X − P(G(H − ν ∗ (H)E) ≤ x)| −→ sup |P G 0 . (12.1.2) n x∈R

Remark 12.1.2 Condition (12.1.1) holds in particular if H is bounded with bounded support in 0 \ {0} (by Theorem 10.1.2 or if H = 1{K > 1}, where K fulfills the assumptions of Corollary 10.1.8 and ANSJB(rn , un ) holds. ⊕ Before proving Theorem 12.1.1, we apply it to tail array sums and to the Hill estimator.

336

12 Bootstrap

12.1.1 Tail array sums In Chapter 10 the limiting behavior of tail array sums  was obtained from that of cluster functionals by considering Hφ (x) = j∈Z φ(xj ) and taking  n,ξ (Hφ ), into account the edge effects. Here, we will consider G ⎧  irn mn √ ⎨ (1 + ξi ) j=(i−1)r φ(X j /|X|(n:n−k) ) i=1 n +1  n,ξ (Hφ ) = k   G   ⎩ mn (1 + ξ ) irn 1 |X | > |X| i j i=1 j=(i−1)rn +1 (n:n−k) ⎫ mn rn ⎬ j=1 φ(X j /|X|(n:n−k) )  .  − mn rn ⎭ i=1 1 |X j | > |X|(n:n−k) Thus, there are no edge effects and the next result follows directly from Theorem 12.1.1. Theorem 12.1.3 Let {X j , j ∈ Z} be a stationary, regularly varying Rd valued time series. Assume that the distribution F0 of |X 0 | is continuous. Let k be a nondecreasing sequence of integers and define un = F0← (1 − k/n). Assume that R(rn , un ), β(rn , n ), and S(rn , un ) hold. Let ψ be a function defined on Rd such that (10.2.4) and Sψ (rn , un ) are satisfied. Assume that either ψ is bounded or there exists δ ∈ (0, 1] such that (10.2.5) and (10.2.7) hold. Assume also that there exists ς ∈ (2, 4] such that ς     rn E  j=1 ψ(X j /un ) un ) n≥1 Let φ ∈ Tψ be such the class {φs , s ∈ [s0 , t0 ]} is linearly ordered. Then P  n,ξ (Hφ ) ≤ x | F X − P(G(Hφ − ν X [0](φ)E) ≤ x)| −→ 0. sup |P G n

x∈R

Proof. The conditions of Theorem 12.1.3 imply that, (BCLT1), (BCLT2), and (BCLT3) are fulfilled, as justified in Section 10.5. We only need to show that (12.1.1) holds. We now prove that (12.1.3) implies it. More precisely, we will prove that ν ∗n,rn (Hφ2 )] = lim ν ∗n,rn (Hφ2 ) = ν ∗ (Hφ2 ) , lim E[

(12.1.4a)

lim P(| ν ∗n,rn (Hφ2 ) −

(12.1.4b)

n→∞ n→∞

n→∞ ν ∗n,rn (Hφ2 )|

> η) = 0 ,

12.1 Conditional multiplier central limit theorem

337

for all η > 0. The first statement, (12.1.4a), follows from Lemma 10.5.1. In view of the mixing assumption, we only need to prove (12.1.4b) for the triangular array of independent blocks {X †1 , . . . , X †n }. Set irn

ζn,i =

φ(X †j /un ) .

j=(i−1)rn +1

Write vn = nP(|X 0 | > un ). Then (assuming without loss of generality that  n 2  †n,rn (Hφ2 ) = vn−1 m n = mn rn ), ν i=1 ζn,i and by the mixing assumption and Burkholder inequality (E.1.6) with p = ς/2 ∈ (1, 2], ν †n,rn (Hφ2 ) − ν ∗n,rn (Hφ2 )| > η) + mn βn P(| ν ∗n,rn (Hφ2 ) − ν ∗n,rn (Hφ2 )| > η) ≤ P(| 2 2 mn E[|ζn,0 − E[ζn,0 ]|p ] + mn βn p vn 2 2 − E[ζn,0 ]|p ] E[|ζn,0 + mn βn ≤ cst p−1 vn rn P(|X 0 | > un )   ς  rn E j=1 ψ(X j /un ) + mn βn → 0 ≤ cst vnp−1 rn P(|X 0 | > un )

≤ cst



by application of (12.1.3) and β(rn , n ).

12.1.2 The Hill estimator The Hill estimator was defined in (9.5.1) by

 n   1

|X j | γ n, k = log 1 |X j | > |X|(n:n−k) . k j=1 |X|(n:n−k) The multiplier version of γ n, k is γ n,k,ξ m n =

  irn (1 + ξ ) log(|X | /|X| )1 |X | > |X| i j j i=1 j=(i−1)rn +1 (n:n−k) (n:n−k)   . m n irn i=1 (1 + ξi ) j=(i−1)rn +1 1 |X j | > |X|(n:n−k)

 n,ξ (Hφ ) with φ(x) = log(|x|)1{|x| > 1} we obtain the result By considering G for the multiplier bootstrap statistics corresponding to the Hill estimator. We recall the condition Slog (rn , un ). For all s, t > 0,      rn

|X 0 | |X j | 1 E log+ log+ =0. lim lim sup m→∞ n→∞ P(|X 0 | > un ) sun tun j=m (Slog (rn , un ))

338

12 Bootstrap

Clearly, (10.2.4) and (10.2.5) hold for any δ > 0. Also, the class {φs , s ∈ [s0 , t0 ]} is linearly ordered. Corollary 12.1.4 Let {X j , j ∈ Z} be a stationary, regularly varying Rd valued time series. Assume that the distribution F0 of |X 0 | is continuous. Let k be a nondecreasing sequence of integers and define un = F0← (1 − k/n). Assume that R(rn , un ), β(rn , n ), and S(rn , un ) and Slog (rn , un ) are satisfied. Assume that there exists δ ∈ (0, 1] such that (10.2.7) holds. Assume also that there exists ς ∈ (2, 4] such that ς     rn E  j=1 log(|X j |/un )1{|X j | > un } un ) n≥1 Then √ P sup |P( k { γn,k,ξ − γ n, k} ≤ x | FnX ) − P(N(0, σ 2 ) ≤ x)| −→ 0 , x∈R

with σ 2 = γ 2

 j∈Z

P(|Y j | > 1 | |Y 0 | > 1).

12.2 Proof of Theorem 12.1.1 The convergence (12.1.2) in Theorem 12.1.1 can be worded as follows: the  n,ξ (H) converges weakly to the (determinsequence of conditional laws of G ∗ istic) law of G(H − ν (H)E), in probability. More generally, for a sequence of d-dimensional random vectors {W n , n ∈ N} and a filtration {Fn , n ≥ 0}, we say that the sequence of conditional distributions of W n given Fn converges weakly to the distribution of W in probability if for all u ∈ Rd , P

E[eiu ,W n | Fn ] −→ E[eiu ,W ] , n → ∞ . Define  n,ξ (H) = k −1/2 G

mn

ξi H(X (i−1)rn +1,irn /un ) .

(12.2.1)

i=1

 n,ξ (H) given F X We will prove the finite-dimensional weak convergence of G n (the natural σ-field of the sequence X) in Lemma 12.2.1. In order to prove The n,ξ (H) and consider the latter  n,ξ in terms of G orem 12.1.1, we will express G indexed by a suitable class of functions as a random element in a complete separable metric space. For such a space (E, d) endowed with its Borel σ-field, we

12.2 Proof of Theorem 12.1.1

339

can use the characterization of weak convergence given in Theorem A.2.4. A distance dBL on the set of probability measures on (E, d) is defined by dBL (μ, ν) =

sup f ∈BL1 (E,d)

|μ(f ) − ν(f )| ,

(12.2.2)

where BL1 (E, d) is the set of bounded Lipschitz functions f on E such that |f |∞ + |f |Lip(d) ≤ 1. Weak convergence of a sequence of probability measures on (E, d) is equivalent to convergence with respect to the metric dBL . Given a sequence of random elements W n with values in E and a filtration Fn , the sequence of conditional laws of W n given Fn converges weakly to the law of W in probability if sup f ∈BL1 (E,d)

P

|E[f (W n ) | Fn ] − E[f (W )]| −→ 0 .

(12.2.3)

Equivalently, if μn denotes the (random) conditional law of W n given Fn and μ the (deterministic) law of W , conditional weak convergence in probability of W n given Fn to W means that P

dBL (μn , μ) −→ 0 . We will apply this characterization to the space D([s0 , t0 ]) endowed with the J1 topology which is metrizable in a way that makes D([s0 , t0 ]) a complete separable metric space. We will denote by doJ1 a metric which makes D([s0 , t0 ]) a complete separable metric space and which has the property that doJ1 (f, g) ≤ f − g ∞ for all f, g ∈ D([s0 , t0 ]). See Appendix C.2. The metrizability of weak convergence will allow us to use a triangular argument in the proof of Proposition 12.2.3 below. Let us now describe more precisely the structure of the proof of Theorem 12.1.1. The proof of Theorem 12.1.1 consists of the following steps:  n,ξ (H) • Lemma 12.2.1 establishes finite-dimensional convergence of G with respect to H in a suitable set, conditionally on X 1 , . . . , X n . • In Lemma 12.2.2 we prove the unconditional asymptotic equicontinuity  n,ξ (H) with respect to H in a suitable function class. of G • Combining conditional convergence in Lemma 12.2.1 with the unconditional asymptotic equicontinuity we obtain the conditional functional convergence of the multiplier process; see Proposition 12.2.3. Lemma 12.2.1 Assume that R(rn , un ), AC(rn , un ), and β(rn , n ) hold. Let H be a linear subspace of L2 (ν ∗ ) such that (BCLT1), (BCLT2), (BCLT3) hold and for all H ∈ H, mn

1 P  ∗n,rn (H 2 ) = H 2 (X (i−1)rn +1,irn /un ) −→ ν ∗ (H 2 ) . ν nP(|X 0 | > un i=1 (12.2.4)

340

12 Bootstrap

Then, conditionally on FnX , the finite-dimensional distributions of the process  n,ξ indexed by H converge weakly to those of G in probability. G  n,ξ is linear with respect to its arguProof. Since H is a linear space and G ment, we only need to prove the requested convergence for one H. For H ∈ H, let Yn,i (H) = H(u−1 n (X (i−1)rn +1 , . . . , X irn )) , i = 1, . . . , mn . This yields, with k = nP(|X 0 | > un ),  n,ξ (H) = k −1/2 G

mn

ξi Yn,i (H) .

i=1

We apply Corollary E.2.6. Conditions (BCLT1) and (BCLT2) imply (E.2.12). Condition (12.2.4) is exactly (E.2.10). This concludes the proof.  Define semi-metric ρ∗ (H, H ) =



ν ∗ ({H − H }2 ) on G.

Lemma 12.2.2 Let H ∈ H be such that Hs ∈ H for s ∈ [s0 , t0 ] and the class  n,ξ indexed by the {Hs , s ∈ [s0 , t0 ]} is linearly ordered. Then the process G ∗ class G = {Hs , Es , s ∈ [s0 , t0 ]} is asymptotically ρ -equicontinuous. Proof. As in the previous chapters, we will use a pseudo-sample (X †1 , . . . , X †n ) such that the blocks (X †(i−1)rn +1 , . . . , X †irn ), i = 1, . . . , mn , are mutually independent with the same distribution as the original block (X 1 , . . . , X rn ), also independent from {ξj , j ∈ Z}. For i = 1, . . . , mn , analogously to Yn,i (H), we define † † † (H) = H(u−1 Yn,i n (X (i−1)rn +1 , . . . , X irn )) .

 n,ξ based on the independent blocks, Also, we define the analogue of G  † (H) = k −1/2 G n,ξ

mn

† ξi Yn,i (H) .

i=1

As seen in the proofs of Theorem 9.2.10 and Theorem 10.3.1, Lemma E.3.4  † . Furshows that it suffices to prove asymptotic equicontinuity for G n,ξ thermore, it can be proved separately for the classes {Hs , s ∈ [s0 , t0 } and {Es , s ∈ [s0 , t0 }. We will only do it for the first one since H is arbitrary. We  † which can be expressed as mn Zn,i apply Theorem C.4.5 to the process G i=1 n,ξ where Zn,i are row-wise i.i.d. random elements defined by Zn,i (H) = k −1/2 ξi H X †(i−1)rn +1,irn /un , 1 ≤ i ≤ mn . We check the conditions of Theorem C.4.5.

12.2 Proof of Theorem 12.1.1

341

• The Lindeberg condition (i) holds. Let H be the envelope of the class G. Since G is linearly ordered and indexed by [s0 , t0 ], H ∈ G. Recall that mn = n/rn and fix A > 0. Then,   mn E[ Zn,1 2G 1 Zn,1 2G > η ]   1 E[ξ12 H 2 (X 1,rn /un )1 ξ12 H(X 1,rn /un ) > ηk ] = rn P(|X 0 | > un )   A2 E[H 2 (X 1,rn /un )1 H 2 (X 1,rn /un ) > ηk/A ] ≤ rn P(|X 0 | > un ) E[H 2 (X 1,rn /un )] E[ξ12 1{|ξ1 | > A}] rn P(|X 0 | > un )    = ν ∗n,rn H 2 1 |H| > η nP(|X 0 | > un )/A   + ν ∗n,rn H 2 E[ξ12 1{|ξ1 | > A}] . +

The first term in the last line tends to 0 by Assumption  (BCLT2).  Since  limA→∞ E[ξ12 1{|ξ1 | > A}] = 0 and limn→∞ ν ∗n,rn H 2 = ν ∗ H 2 by (BCLT1), the second term also tends to zero and the Lindeberg condition (i) of Theorem C.4.5 holds. • Define the random semi-metric mn

˜2 (H, K) = 1 d ξ 2 {H(X (i−1)rn +1,irn /un ) − K(X (i−1)rn +1,irn /un )}2 . n k i=1 i ˜2 (H, K)] = ν ∗ ({H − K}2 ), Condition (ii) of Theorem C.4.5 Since E[d n n,rn holds by the same argument as in the proof of Theorem 10.3.1. • By Lemma C.4.8 and Corollary C.4.20, the linear ordering yields the random entropy condition (C.4.5) of Theorem C.4.5. All the conditions of Theorem C.4.5 are verified so we have proved the stated unconditional asymptotic equicontinuity.  Proposition 12.2.3 Assume that R(rn , un ), S(rn , un ), and β(rn , n ) hold. Let H be a linear subspace of L2 (ν ∗ ) such that (BCLT1), (BCLT2), (BCLT3) hold. Let H ∈ H be such that {Hs , s ∈ [s0 , t0 ]} ⊂ H. Assume that (12.1.1) holds for all s ∈ [s0 , t0 ]. Then the conditional distribution  n,ξ (Es ), s ∈ [s0 , t0 ]} (as random ele n,ξ (Hs ), s ∈ [s0 , t0 ]} and {G of {G ments in D[s0 , t0 ]) given FnX converge jointly weakly in probability to the distribution of the process G. Proof. Let μ be the law of the process {G(Hs ), s ∈ [s0 , t0 ]} considered as a random element in D([s0 , t0 ]) endowed with the J1 topology and suitably

342

12 Bootstrap

 n,ξ (Hs ), s ∈ [s0 , t0 ]} given F X . metrized. Let μn be the conditional law of {G n We will prove that P

dBL (μn , μ) −→ 0 .

(12.2.5)

For this we will use a triangular argument. For δ > 0, let N (δ) = inf{j ∈ N : s0 + jδ/(ν ∗ (H 2 )αs−α−1 ) > t0 }. Define 0 tj = s0 + jδ/(ν ∗ (H 2 )αs−α−1 ) , j = 0, . . . , N (δ) − 1 , tN (δ) = t0 . 0  Recall that ρ∗ (H, H ) = ν ∗ ({H − H }2 ). The α-homogeneity of ν ∗ gives for s ≥ s0 such that s + δ ≤ t0 , 2 (ρ∗ (Hs , Hs+δ )) = ν ∗ ({Hs − Hs+δ }2 ) ≤ |ν ∗ (Hs2 ) − ν ∗ (Hs+δ )| 2

= ν ∗ (H 2 ){s−α − (s + δ)−α } ≤ ν ∗ (H 2 )αs−α−1 δ. 0 Thus, (ρ∗ (Hs , Hs+δ )) ≤ δ . 2

sup s,t∈[tj ,tj+1 ]

(12.2.6)

 δ indexed by {Hs , s ∈ [s0 , t0 ]} by We define the process G n,ξ  n,ξ (Ht ) , s ∈ [tj , tj+1 ) , j = 0, . . . , N (δ) − 1 ,  δ (Hs ) = G G n,ξ j  δ (Ht ) = G  n,ξ (Ht ). Similarly, we define and G 0 0 n,ξ Gδ (Hs ) = G(Htj ) , s ∈ [tj , tj+1 ) , j = 0, . . . , N (δ) − 1 , and Gδ (Ht0 ) = G(Ht0 ). These processes are random elements in D([s0 , t0 ]).  δ given F X and μδ be the (unconditional) Let μδn be the conditional law of G n n,ξ δ law of G . We will prove that for all  > 0, lim P(dBL (μδ , μ) > ) = 0 ,

(12.2.7a)

lim P(dBL (μδn , μδ ) > ) = 0 ,

(12.2.7b)

lim P(lim sup dBL (μδn , μn ) > ) = 0 .

(12.2.7c)

δ→0 n→∞ δ→0

n→∞

By the triangular argument Lemma A.1.4, this will prove (12.2.5). (i) Proof of (12.2.7a). The process G(Hs ), s ∈ [s0 , t0 ] is almost surely uniformly continuous (as shown in the proof of Theorem 10.3.1), thus (12.2.7a) holds.  δ and Gδ are (ii) Proof of (12.2.7b). For a fixed δ > 0, the processes G n,ξ random step functions defined on the same fixed grid. Thus the convergence in  δ (Ht ) to Gδ (Ht ), D([s0 , t0 ]) is equivalent to the joint weak convergence of G i i n,ξ i = 0, . . . , N (δ). Thus (12.2.7b) follows from Lemma 12.2.1.

12.2 Proof of Theorem 12.1.1

343

(iii) Proof of (12.2.7c). For brevity, write BL1 and dBL for BL1 (D([s0 , t0 ]), doJ1 ) and dBL(D([s0 ,t0 ]),doJ ) , respectively. By the definition of the distance dBL and 1 Markov inequality it suffices to prove that !      n,ξ ) | F X ] − E[f (G  δ ) | F X ] = 0 . lim E sup E[f (G n→∞

n

f ∈BL1

n,ξ

n

Since doJ1 is dominated by the uniform distance and by (12.2.6), we have !    δ X X    E[f (Gn,ξ ) | Fn ] E sup E[f (Gn,ξ ) | Fn ] −  f ∈BL1     δ   ≤ E sup Gn,ξ (Hs ) − Gn,ξ (Hs ) ∧ 2 s0 ≤s≤t0

⎡ ≤ E⎣

sup s0 ≤s,t≤t0 ρ ∗ (Hs ,Ht )≤δ

⎤      n,ξ (Ht ) ∧ 2⎦ . Gn,ξ (Hs ) − G

By Lemma 12.2.2 (which establishes the unconditional ρ∗ -equicontinuity of  n,ξ ), we have G ⎛ lim lim sup P ⎝

δ→0 n→∞

⎞ sup s,t∈[s0 ,t0 ] ρ ∗ (Hs ,Ht ) ⎠ = 0 , |G

which implies ⎡ lim lim sup E ⎣

δ→0 n→∞

⎤ sup s,t∈[s0 ,t0 ] ρ ∗ (Hs ,Ht ) 0 will be referred to as its scale parameter since if X has distribution Ψα,c , then c−1 X has distribution Ψα,1 . We will simply write Ψα for Ψα,1 . An α-Fr´echet distribution n (x) = Ψα,c (n−1/α x) for all x ≥ 0 and n ≥ 1. has the property that Ψα,c There are only three univariate max-stable distributions, up to scaling, and location: the Gumbel, (negative) Weibull and Fr´echet distributions. The formal definition of multivariate max-stability is given next. Definition 13.1.1 An Rd -valued random vector X is said to be maxstable or to have a max-stable distribution if for every n ≥ 1 and i.i.d. (i) (i) copies X 1 . . . , X n of X, there exist an > 0, bn ∈ R, i = 1, . . . , d, n ≥ 1, such that   n (d) (d) n  Xj(1) − b(1)  X j − bn n d X= . (13.1.1) ,..., (1) (d) an an j=1 j=1 A probability distribution μ on Rd is said to be max-stable if there exists a max-stable vector with distribution μ.

We will be only interested in max-stable vectors with Fr´echet marginals. Their main property is that they are regularly varying and their distribution is characterized, up to scaling, by its exponent measure. Theorem 13.1.2 A max-stable vector with non-degenerate α-Fr´echet marginals is regularly varying and its distribution is entirely characterized by its exponent measure ν X (relative to the scaling sequence n1/α ): for all x ∈ Rd+ , P(X ≤ x) = e−ν X ([0,x]

c

)

.

(13.1.2)

Conversely, a random vector X with non-negative marginals whose distribution is characterized by (13.1.2) with ν X an α-homogeneous boundedly finite measure on Rd+ \ {0} is max-stable with α-Fr´echet marginals.

Proof. Let G(x) = P(X ≤ x). By definition of max-stability, there exists a scaling sequence an such that for all x ∈ [0, ∞]d ,

13.1 Max-stable vectors

351

G(x) = Gn (an x) . Since the marginal distributions, denoted G1 , . . . , Gd , are non-degenerate αFr´echet by assumption, taking, for instance, x = (x1 , ∞, . . . , ∞) yields G1 (x1 ) = Gn1 (an x) . This implies that an = n1/α . Thus, for all x ∈ Rd+ , nP(n−1/α X ≤ x) = {1 − P(n−1/α X ≤ x)} = n{1 − G(n1/α x)} = n{1 − G1/n (x)} 1

= n{1 − e n log G(x) } → − log G(x) . Consider the sequence {μn , n ≥ 1} of boundedly finite measures on [0, ∞)d \ {0} endowed with the boundedness B0 of sets separated from 0, defined by μn = nP(n1/α X ∈ ·) . This sequence is relatively compact for the vague topology. Indeed, for every x = 0, the previously established convergence yields sup μn ([0, x]c ) < ∞ .

n≥1

Moreover, for  > 0, there exists r > 0 such that if |x| > r (for any norm, the supnorm being here the most convenient), − log G(x) ≤ . Therefore, there exists r0 and n0 such that if r ≥ r0 , c

c

sup μn (B (0, r)) ≤ sup μn (B (0, r0 )) ≤ 2 .

n≥n0

n≥n0

For each fixed n, we have by definition of μn , c

lim μn (B (0, r)) = lim nP(|X| > n1/α r) = 0 .

r→∞

r→∞

Combining the last two arguments yields c

lim sup μn (B (0, r)) = 0 .

r→∞ n≥1

By Lemma B.1.23 (see also Example B.1.28), this proves that the sequence μn is relatively compact. Since the complements of rectangles are measure determining (see Lemma B.1.32), there exists only one limit along subsequences; therefore the sequence of measures μn is convergent and let its limit be denoted ν X . The previous consideration shows that X is regularly varying with exponent measure ν X characterized by − log G(x) = ν X ([0, x]c ), i.e. (13.1.2) holds and thus the distribution of X is entirely characterized by the exponent measure. The converse is straightforward. 

352

13 Max-stable processes

Let X be a max-stable vector with α-Fr´echet marginals and exponent measure ν X . By Theorem 7.1.11, there exists a Poisson point process N with mean measure ν X . Let {P i , i ≥ 1} be the points of such a PPP. Then d

X=

∞ 

Pi .

i=1

In the next section, we will generalize the previous results to max-stable processes with α-Fr´echet marginals.

13.2 Max-stable processes Definition 13.2.1 A time series {X j , j ∈ Z} is max-stable if all its finite dimensional distributions are max-stable.

For simplicity, we consider max-stable processes with univariate marginals. The multivariate extension is straightforward. Let {X j , j ∈ Z} be a stationary max-stable process with non-degenerate α-Fr´echet marginals. By Theorem 13.1.2, its finite-dimensional distributions are regularly varying, thus X is regularly varying in the sense of Definition 5.1.1. Therefore it admits a tail measure ν and we can apply the results of Chapter 5 and in particular those of Section 5.4. Recall that by definition, the tail measure is standardized in such a way that ν({x ∈ RZ+ : x0 > 1}) = 1. Theorem 13.2.2 The following statements are equivalent: (i) X is a regularly varying max-stable process with non-degenerate αFr´echet marginal distributions. (ii) There exists a process Z such that E[Zjα ] ∈ (0, ∞) for all j ∈ Z and d

X=

∞ 

−1/α

Γi

Z (i) .

(13.2.1)

i=1

where {Γi } are the points of a unit rate homogeneous PPP on R and Z (i) , i ∈ N are i.i.d. copies of Z, independent of {Γi }. (iii) There exists a process Z such that E[Zjα ] ∈ (0, ∞) for all j ∈ Z and −α

P(X ≤ x) = e−E[∨j∈Z xj

Zjα ]

.

(13.2.2)

for all x ∈ [0, ∞]Z with only finitely many finite components.

13.2 Max-stable processes

353

The distribution of the process X is determined up to scaling by its tail measure ν and if (ii) holds, then  ∞ E[H(rZ)] −α−1 αr dr . (13.2.3) ν(H) = E[Z0α ] 0 The process X is stationary if and only if for all j ∈ Z and all nonnegative or bounded measurable functionals H on RZ , E[Z0α H(Z0−1 B j Z)] = E[Zjα H(Zj−1 Z)] .

(13.2.4)

Proof of Theorem 13.2.2 Proof (Proof of (ii) =⇒ (iii)). Assume that (13.2.1) holds. Then, Ñ é é é Ñ Ñ (i) (i) ∞ ∞    Z Z j j ≤1 . Γi−α ≤1 =P Γi−α P(X ≤ x) = P x x j j i=1 i=1 j∈Z

j∈Z

 (i) Write Ai = j∈Z x−1 j Zj and let μ be the distribution of A1 . Recall that να ˜ with is the measure with the density αx−α−1 , x > 0. The point process N −1/α points {(Γi , Ai ), i ≥ 1} is a marked PPP with mean measure να ⊗ μ on (0, ∞) × R+ . Set B = {(x, y) ∈ (0, ∞) × [0, ∞) : xy > 1}. Then, Ä ä −1/α − log P(X ≤ x) = − log P ∨∞ Ai ≤ 1 i=1 Γi ˜ (B) = 0) = (να ⊗ μ) (B) = − log P(N  ∞ ó î α . P(xA1 > 1)αx−α−1 dx = E ∨j∈Z x−α = j Zj 0

Thus (13.2.2) holds.



Obviously, (iii) =⇒ (i), so there only remains to prove that (i) =⇒ (ii). Proof (Proof of (i) =⇒ (ii)). A max-stable process with α-Fr´echet marginal distributions has regularly varying finite-dimensional distributions by Theorem 13.1.2, thus it admits a tail measure ν by Theorem 5.1.2. The tail measure satisfies ν({0}) = 0 so it can be considered as a boundedly finite α-homogeneous measure on RZ+ \ {0} endowed with the boundedness B0 of sets separated from the zero sequence (still denoted by 0), for the distance d on RZ+ defined by

354

13 Max-stable processes



d(x, y) =

2−|j| |xj − yj | ∧ 1 ,

j∈Z

This distance has the property that d(0, sx) ≤ d(0, tx) for all 0 < s < t and x ∈ RZ+ . Thus we will apply Theorem B.2.5. To this purpose set pj =

log P(Xj ≤ 1) = ν({x ∈ RZ+ : xj > 1}) . log P(X0 ≤ 1)

(13.2.5)

This is well defined and pj > 0 for all j ∈ Z since by assumption the marginal distributions are non-degenerate and the tail measure is standardized in such a way that ν({x ∈ RZ+ : x0 > 1}) = 1. Let q = {qj , j ∈ Z} be a sequence of positive real numbers such that  α Z j∈Z qj pj < ∞ and τ be the 1-homogeneous map on R+ defined by τ (x) = sup qj xj . j∈Z

Then τ (x) = 0 if and only if x = 0 and   ν({x ∈ RZ+ : τ (x) > 1}) ≤ ν({x ∈ RZ+ : qj xj > 1}) = qjα pj < ∞ , j∈Z

j∈Z

by assumption on the sequence q. On the other hand ν({x ∈ RZ+ : τ (x) > 1}) ≥ ν({x ∈ RZ+ : q0 x0 > 1}) = q0α > 0 . Thus we may and will assume that ν({x ∈ RZ+ : τ (x) > 1}) = 1. This allows ‹ to apply Theorem B.2.5 and there exists an RZ+ -valued random element Z defined on a probability space such that  ∞ ν= E[δrZ ]αr−α−1 dr . 0

By construction



1 = ν({x : x0 > 1}) = 0



¶ © E[1 rZ 0 > 1 ]αr−α−1 dr = E[Z˜0α ] .

‹ so that We define now Z = {log P(X0 ≤ 1)}1/α Z,  ∞ E[δrZ ] −α−1 αr dr . ν= E[Z0α ] 0 Let now {Pi , i ≥ 1} be the points of a PPP(να ) on (0, ∞) and Z (i) , i ≥ 1  ˜ = ∞ Pi Z (i) is max-stable with tail be i.i.d. copies of Z. The process X i=1 measure ν and its marginal distributions are Fr´echet with the same scale parameters as those of X. Thus the finite-dimensional distributions of X

are equal by Theorem 13.1.2, since the exponent measures of the and X finite-dimensional distributions are the projections of the tail measure. Thus d

 X = X.

13.2 Max-stable processes

355

Proof (Proof that stationarity is equivalent to (13.2.4)). Assume that (13.2.4) α holds. Then, applying it to the α-homogeneous function z → ∨j∈Z x−α j zj yields −α

P(B h X ≤ x) = e−E[∨j∈Z xj

B h Zjα ]

−α

= e−E[∨j∈Z xj

Zjα ]

= P(X ≤ x) .

This proves that X is stationary. Conversely, if X is stationary, then its tail measure is shift-invariant and applying (13.2.3) to the function Hj (z) = H(zj−1 z)1{zj > 1} yields E[Z0α H(Z0−1 B j Z)] =

 0



E[H(Z0−1 B j Z)1{rZ0 > 1}]αr−α−1 dr

= E[Z0α ]ν(Hj ◦ B j ) = E[Z0α ]ν(Hj )  ∞ E[H(Zj−1 Z)1{rZj > 1}]αr−α−1 dr = 0

= E[Zjα H(Zj−1 Z)] . This proves the stated equivalence and concludes the proof of Theorem 13.2.2.  Remark 13.2.3 A process Z such that (13.2.1) holds is called a spectral process for X (not to be confused with the spectral tail process). The marginal distribution of Xj is Fr´echet with scale parameter (E[Zjα ])1/α . If X is stationary, then E[Zjα ] = E[Z0α ] for all j ∈ Z. As for the time-change formula (5.3.1b), it suffices in (13.2.4) to consider homogeneous functions with degree 0, in which case (13.2.4) becomes E[Z0α H(B j Z)] = E[Zjα H(Z)] ,

(13.2.6)

or homogeneous functions with degree α (such that H(0) = 0), for which it becomes E[H(B j Z)] = E[H(Z)] .

(13.2.7) ⊕

Remark 13.2.4 Let Z be a spectral process which satisfies (13.2.4) and let ˜ = ZH0 (Z) H0 be shift-invariant and homogeneous with degree zero. Then Z is a spectral process which satisfies (13.2.4). Indeed, let H be a non-negative measurable map. Then, ˜ = E[Z α H α (Z)H((Z0 H0 (Z))−1 B j ZH0 (Z))] E[Z˜0α H(Z˜0−1 B j Z)] 0 0 = E[Z0α H0α (Z)H(Z0−1 B j Z)] = E[Z0α H0α (Z0−1 B j Z)H(Z0−1 B j Z)] . Applying now (13.2.4) to the process Z yields

356

13 Max-stable processes

˜ = E[Z α H α (Z −1 Z)H(Z −1 Z)] E[Z˜0α H(Z˜0−1 B j Z)] j 0 j j = E[Zjα H0α (Z)H(Zj−1 Z)] = E[Zjα H0α (Z)H((Zj H0 (Z))−1 ZH0 (Z))] ˜ . = E[Z˜jα H(Z˜ −1 Z)] j

⊕ The representation (13.2.3) of the tail measure entails a new formulation in terms of the spectral process Z of the equivalences stated in Theorems 5.4.8 and 5.4.9. We use the formulation of Corollary 5.5.5, in terms of the infargmax functional. Corollary 13.2.5 Let X be a stationary regularly varying max-stable process which admits the representation (13.2.1). The following statements are equivalen:. (i ) P(inf arg max(Z) ∈ Z) = 1; Ä ä (ii ) P lim|j|→∞ Zj = 0 = 1; Ä ä α (iii ) P j∈Z Zj < ∞ = 1.

13.2.1 The spectral tail process The spectral tail process Θ can be expressed in terms of the process Z of the representation (13.2.1). Proposition 13.2.6 Let X be a stationary max-stable process whose distribution is given by (13.2.2). Then, for all j ∈ Z E[H(Θ)] =

E[Zjα H(Zj−1 B j Z)] . E[Zjα ]

(13.2.8)

Proof. Let H be a non-negative measurable functional on RZ+ . Then, by (5.2.2), (13.2.3) and the shift invariance of the tail measure, we have, for any j ∈ Z,

13.2 Max-stable processes

357

E[H(Y )] = ν(H1E0 ) = ν(B j H1Ej )  ∞ E[H(rB j Z)1{rZj > 1}] −α−1 αr dr . = E[Z0α ] 0 Thus, defining the 0-homogeneous function K by K(y) = H(y0−1 y), E[H(Θ)] = E[H(Y0−1 Y )] = E[K(Y )]  ∞ E[K(B j Z)1{rZj > 1}] −α−1 αr dr = E[Z0α ] 0 =

E[Zjα H(Zj−1 B j Z)] E[Zjα K(B j Z)] = . E[Z0α ] E[Z0α ]

Since stationarity implies that E[Zjα ] = E[Z0α ] for all j ∈ Z, (13.2.8) is proved.  If H is 0-homogeneous, then (13.2.8) becomes E[H(Θ)] =

E[Zjα H(B j Z)] . E[Z0α ]

(13.2.9)

If H is α-homogeneous, then (13.2.8) becomes E[H(Θ)] =

E[H(B j Z)] . E[Z0α ]

(13.2.10)

Applying (13.2.8) with j = 0 we conclude that if Z0 = 1 almost surely, then the spectral process Z is the spectral tail process.

13.2.2 The infargmax formula The distribution of X is characterized by its tail measure, which can be expressed in term of the spectral tail process. The next result provides an explicit expression. Given two sequences x, y, we define the pointwise multiplication x · y = (xy)j∈Z and the pointwise inverse y −1 = (yj−1 )j∈Z . Proposition 13.2.7 Let X be a stationary max-stable process with spectral tail process Θ. Let y ∈ [0, ∞]Z be a sequence of non-negative numbers of which only finitely many are finite. Then Ä ä  − log P(X ≤ y) = yj−α P inf arg max[(B −j y)−1 · Θ] = 0 . j∈Z

(13.2.11)

358

13 Max-stable processes

Proof. Applying the identity (13.2.9) to the infargmax functional which is 0-homogeneous, we obtain ⎤ ⎡  yj−α Zjα ⎦ − log P(X ≤ y) = E ⎣ = = =



j∈Z

î ¶ ©ó yj−α E Zjα 1 inf arg max[y −1 · Z] = j



j∈Z

Ä ä yj−α P inf arg max[y −1 · B j Θ] = j

j∈Z

Ä ä yj−α P inf arg max[(B −j y)−1 · Θ] = 0 .

 j∈Z



This proves (13.2.11).

We will need the following characterization of independence for max-stable processes. Lemma 13.2.8 Let α > 0 and {Pi , i ≥ 1} be the points of a PPP on (0, ∞) with mean measure να . Let Z (1) , Z (2) be spectral processes and let X (1) and X (2) be the processes defined by (k)

Xj

=

∞ 

(k,i)

Pi Zj

, j∈Z,

(13.2.12)

i=1

where Z (k,i) , i ≥ 1 are i.i.d. copies of Z (k) , k = 1, 2. Assume moreover that P(Z (1) = 0, Z (2) = 0) = 0 .

(13.2.13)

Then X (1) and X (2) are independent and the process X defined by Xj = (1) (2) Xj ∨ Xj , j ∈ Z, is max-stable with spectral process Z (1) + Z (2) . Proof. Without loss of generality, we take α = 1. Condition (13.2.13) implies that Z (1) and Z (2) cannot be simultaneously non-zero (as sequences), thus ⎡ ⎤  Zj(1) Zj(2) ⎦ ∨ − log P(X (1) ≤ x, X (2) ≤ y) = E ⎣ xj yj j∈Z ⎤ ⎤ ⎡ ⎡  Zj(2)  Zj(1) ⎦ + E⎣ ⎦ = E⎣ xj yj j∈Z

= − log P(X

j∈Z

(1)

≤ x) − log P(X (2) ≤ y) .

This proves the independence. Taking x = y proves the second claim.



13.3 Anticlustering conditions

359

Example 13.2.9 Let Z be a spectral process such that Ñ é  0

cn x | X0 > cn y = 0 . (AC(cn , rn )) lim lim sup P m→∞ n→∞

m≤|j|≤rn

We already know (cf. Theorem 6.1.4) that AC(cn , rn ) implies that the tail process tends to zero. For max-stable process, the converse is true and in addition the candidate extremal index is the true extremal index when AC(cn , rn ) holds. Theorem 13.3.1 Let X be a stationary regularly varying max-stable process with spectral tail process Θ. Then the following conditions are equivalent: (i) P(lim|j|→∞ Θj = 0) = 1. (ii) Condition AC(cn , rn ) holds with any choice of cn and rn such that limn→∞ rn P(X0 > cn ) = 0. If these conditions hold, then the extremal index θ of X exists and is equal to the candidate extremal index ϑ and θ > 0.

360

13 Max-stable processes

Proof. Assume that (i) holds and let cn and rn be sequences such that c−α n rn → 0. Then, applying the identity (13.2.2), we obtain P( max Xj > cn , X0 > cn ) = −P(X0 ∨ max Xj > cn ) m≤j≤rn m≤|j|≤rn   +P max Xj > cn + P(X0 > cn ) m≤|j|≤rn −α

= −1 + e−cn

−α

+ 1 − e−cn If c−α n rn → 0, then  c−α E Z0α ∨ max n

m≤|j|≤rn

 Zjα ≤ c−α n



E[Z0α ∨maxm≤|j|≤rn Zjα ]

E[maxm≤|j|≤rn Zjα ]

−α

+ 1 − e−cn

E[Z0α ]

E[Zjα ] ≤ E[Z0α ](2rn + 1)c−α n →0.

0≤|j|≤rn

Thus, using (13.2.8) with j = 0,   P max Xj > cn , X0 > cn m≤|j|≤rn      −α α α α α ∼ cn max Zj − E Z0 ∨ max Zj E[Z0 ] + E m≤|j|≤rn m≤|j|≤rn     −α α α −α α α = cn E Z0 ∧ max Zj = cn E[Z0 ]E 1 ∧ max Θj . m≤|j|≤rn

m≤|j|≤rn

Since P(X0 > x) ∼ E[Z0α ]x−α , this yields by dominated convergence, for every m > 0,   max Xj > cn | X0 > cn = E[(1 ∧ Θ∗m,∞ )α ] . lim P n→∞

m≤|j|≤rn

Since limm→∞ Θ∗m,∞ = 0 by assumption, this proves that AC(cn , rn ) holds. Assume now that AC(cn , rn ) holds and set cn = (nE[Z0α ])1/α . By the identity (13.2.2), we have ã Å α −1 α −α max X ≤ x = e−(nE[Z0 ]) E[max1≤i≤n Zi ]x . (13.3.1) P c−1 i n 1≤i≤n

By stationarity of X, we have

.

13.3 Anticlustering conditions

ï

ò

E max Ziα = 1≤i≤n

=

n ß 

ï

ï

n−1 

n−1 

361

ò™ max Ziα

j≤i≤n

E

j=0

=

ò

E max Ziα − E

j=1 n ß  j=1

=

ï

j+1≤i≤n

ò ï ò™ max Ziα − E max Ziα

0≤i≤n−j

1≤i≤n−j

ï ò n−1 ã   Å E max Ziα − max Ziα = E Z0α − max Ziα 0≤i≤j

1≤i≤j

1≤i≤j

j=0

+

 Å ã  E Z0α 1 − max (Zi /Z0 )α 1≤i≤j

j=0

= E[Z0α ]

n−1 

E

+

Å ã  1 − max Θiα . 1≤i≤j

j=0

+

We used (13.2.7) in the second line and (13.2.8) in the last. By Theorem 6.1.4, AC(cn , rn ) implies (i). Thus by the dominated convergence theorem, we have Å Å ã  ã  lim E 1 − max Θiα = E 1 − max Θiα . j→∞

1≤i≤j

1≤i≤∞

+

+

Thus, by Cesaro’s theorem and (5.6.6) Å ï ò ã  α −1 α α lim (nE[Z0 ]) E max Zi = E 1 − max Θi n→∞

1≤i≤n

1≤i≤∞

=

E[(Θ∗0,∞ )α

+

− (Θ∗1,∞ )α ] = ϑ .

This proves that the extremal index θ exists and θ = ϑ.



13.3.2 The extremal index Using the results of Section 13.3.1 and Example 13.2.9, we can obtain the expression for the extremal index of any max-stable process. We first need a preliminary lemma. Lemma 13.3.2 Let X (1) and X (2) be two mutually independent non-negative stationary processes which admit the extremal indices θ1 and θ2 , respectively. (1) (2) Let X be the process defined by Xj = Xj ∨ Xj , j ∈ Z. Assume that there exists p ∈ [0, 1] such that (1)

P(X0 > x) =p. x→∞ P(X0 > x) lim

Then the extremal index of X is pθ1 + (1 − p)θ2 .

362

13 Max-stable processes

Proof. Note first that by independence (which implies extremal indepen(1) (2) dence), P(X0 > x) ∼ P(X0 > x) + P(X0 > x), whence (2)

P(X0 > x) =1−p. x→∞ P(X0 > x) lim

Let an be the 1 − 1/n quantile of the distribution of X0 . By definition of the extremal index and by independence, for every x ≥ 0, Å ã Å ã Å ã (1) (2) P max Xj ≤ an x = P max Xj ≤ an P max Xj ≤ an 1≤j≤n

1≤j≤n

→e

−(pθ1 +(1−p)θ2 )x−α

1≤j≤n

. 

This proves our claim. Theorem 13.3.3 Let X be a max-stable process with spectral tail process Θ. Then it admits an extremal index θ given by   ∗ α ï ò 1 (Θ ) (13.3.2) θ=ϑ=E =E α 1{|Θ|α < ∞} . E(Y ) |Θ|α

An important consequence is that θ = 0 if and only if P for any spectral process Z.

Ä j∈Z

ä Zjα < ∞ = 0

Proof. The expressions for ϑ in (13.3.2) come from Definition 5.4.4 and Corollary 5.6.2. Assume first that P(Z ∈ α (R+ )) = 0 . Then, by Proposition 13.2.6, ¶ © P(Θ ∈ α (R+ )) = E[Z0α 1 Z0−1 Z ∈ α (R+ ) ] = 0 . This implies that ϑ = 0 by Corollary 5.6.2. By Lemma 7.5.4, this proves that θ = 0. Assume now that P(Z ∈ α (R+ )) > 0 . As in Example 13.2.9, we define Z (1) = Z1{Z ∈ α (R+ }) , Z (2) = Z1{Z ∈ / α (R+ )} .

13.3 Anticlustering conditions

363

Then X = X (1) ∨ X (2) , where X (1) and X (2) are independent and have spectral processes Z (1) and Z (2) , respectively, and P(Z (1) ∈ α (R+ )) = 1. Thus X (1) satisfies AC(cn , rn ) by Theorem 13.3.1 and its extremal index θ1 is equal to its candidate extremal index ϑ1 . The spectral tail process of X (1) is given by E[H(Θ(1) )] =

E[Z0α 1{Z ∈ α (R+ )}H(Z0−1 Z)] . E[Z0α 1{Z ∈ α (R+ )}]

Thus, ñ

(Θ(1) )∗  θ1 = ϑ1 = E  Θ(1) α α

ô

 ∗  Z 1 E 1{|Z| < ∞} α α E[Z0α 1{Z ∈ α (R+ )}] |Z|  ∗α  Θ 1 = E 1{|Θ| < ∞} . α α E[Z0α 1{Z ∈ α (R+ )}] |Θ|α

=

Moreover, (1)

P(X0 > x) = E[Z0α 1{Z ∈ α (R+ )}] . x→∞ P(X0 > x) lim

Thus, by Lemma 13.3.2, the extremal index of X is  ∗  Θ 1{|Θ| < ∞} . θ = E[Z0 1{Z ∈ α (R+ )}]ϑ1 = E α α |Θ|α 

13.3.3 Condition S(rn , un ) We recall condition S(rn , un ) introduced in Chapter 9: lim lim sup

m→∞ n→∞

rn 

P(Xj > un | X0 > un ) = 0 .

(S(rn , un ))

j=m

Similarly to AC(cn , rn ), which for max-stable processes is equivalent to the property that the tail process tends to zero almost surely at infinity, it turns out that condition S(rn , un ) is equivalent to what is simply a consequence of it for general time series, namely, the finite expectation of the number of exceedences of the tail process over the level 1.

364

13 Max-stable processes

Theorem 13.3.4 Let X be a stationary max-stable process with tail process Y . The following statements are equivalent: (i) Condition S(rn , un ) holds for all sequences {rn } and {un } such that limn→∞ rn P(X0 > un ) = 0;  (ii) j∈Z P(Yj > 1) < ∞.

 Proof. We already know that S(rn , un ) implies j∈Z P(Yj > 1) < ∞. Conversely, assume without loss of generality that α = 1. Assume without loss of generality that E[Z0 ] = 1, so that E[Zj ] = E[Z0 ] for all j by stationarity. Using the inequality u − u2 /2 ≤ 1 − e−u ≤ u for u ≥ 0 yields −1

P(X0 > x, Xj > x) = 1 − e−x

−1

− e−x

−1

+ e−x

E[Z0 ∨Zj ]

≤ x−1 (2 − E[Z0 ∨ Zj ]) + x−2 (E[Z1 ∨ Z2 ])2 ≤ x−1 E[Z0 ∧ Zj ] + 4x−2 = x−1 E[Θj ∧ 1] + 4x−2 = x−1 P(Yj > 1) + 4x−2 . Thus if rn u−1 n → 0, rn 

P(Xj > un | X0 > un ) ≤

j=m

=

rn  j=m rn 

P(Yj > 1) + O(rn u−1 n ) P(Yj > 1) + o(1) .

j=m



13.4 β-mixing The β-mixing coefficients of a stationary max-stable process can be bounded very simply in terms of the tail process. We will give the following result without proof. See Section 13.7 for references. Theorem 13.4.1 Let X be a stationary max-stable process with Fr´echet marginal distributions and tail process Y . Let βn = β(σ(Xj , j ≤ 0), σ(Xj , j > n)) be the β mixing coefficients. Then  jP(Yj > 1) . βn ≤ 4 j≥n

13.5 Examples

365

13.5 Examples We consider two important classes of max-stable processes and give an example of a max-stable process whose extremal index is zero.

13.5.1 Subordinated Gaussian max-stable processes Let W be a centered Gaussian process and define σj2 = var(Wj ). Let α > 0, {Pi } be the points of a PPP on (0, ∞) with mean measure να and W (i) , i ≥ 1 be i.i.d. copies of W . Define the max-stable process X by Xj =

∞ 

(i)

Pi e W j

−ασj2 /2

.

i=1

Theorem 13.5.1 If W has stationary increments, then X is stationary and its distribution is entirely determined by α and the variogram γ defined by γj = var(Wj − W0 ), j ∈ Z. Its spectral tail process is 1 eW −W0 − 2 αγ .

2

Proof. We must check (13.2.4). Write Zj = eWj −ασj /2 . We must prove that Z ˜ j be the probability measure with density Z α satisfies (13.2.4). For j ∈ Z, let P j with respect to P. By Thorem E.4.5, the process W has the same distribution ˜ j as the process αcov(Wj , W ) + W under P. Thus we obtain, for any under P bounded or non-negative measurable function H, α

˜ j [H(Z −1 Z)] = E ˜ j [H(eW −Wj − 2 (σ E[Zjα H(Zj−1 Z)] = E j = E[H(eW +αcov(W = E[H(e

2

−σj2 )

2 2 ,Wj )−Wj −ασj2 − α 2 (σ −σj )

j W −Wj − α 2B γ

)]

)]

)] .

In the last line, we used the relation γj = 12 (σj2 +σ02 −2cov(W0 , Wj )). Since W has stationary increments, W − Wj has the same distribution as B j W − W0 . Thus E[Zjα H(Zj−1 Z)] = E[H(eB

j

j W −W0 − α 2B γ

)] .

(13.5.1)

Applying again Theorem E.4.5 (in the reverse direction) with the probability ˜ 0 and the Gaussian process B j W yields measure P E[H(eB

j

j W −W0 − α 2B γ

)] = E[Z0α H(Z0−1 B j Z)] .

This proves (13.2.4), thus X is stationary.

366

13 Max-stable processes

By Proposition 13.2.6, the distribution of the spectral tail process is that of ˜ 0 . Thus, applying (13.5.1) with j = 0, we have Z0−1 Z under P α

E[H(Θ)] = E[Z0α H(Z0−1 Z)] = E[H(eW −W0 − 2 γ )] . The distribution of the Gaussian process W − W0 depends only on γ since cov(Ws − W0 , Wt − W0 ) =

1 (γs + γt − γt−s ) . 2

By Proposition 13.2.7, the distribution of X depends only on α and its spectral tail process, hence on the variogram.  Remark 13.5.2 The spectral tail process does not vanish (P(Θj = 0) = 0 for all j), thus it is a valid spectral process and X has the same distribution ˜ defined by as the process X ˜j = X

∞ 

(i)

Pi e W j

(i)

−W0 −αγj /2

, j∈Z.

i=1

Condition AC(cn , rn ) holds if limt→∞ (Wt − W0 − 12 αγt ) = −∞ almost surely. This is satisfied by the standard Brownian motion or the fractional Brownian motions with Hurst index H ∈ (0, 1). To check condition S(rn , un ) and bound the β-mixing coefficients, we will use the following lemma. Let Φ denote the distribution function of the standard Gaussian law. Lemma 13.5.3 Let X be the max-stable process defined in Theorem 13.5.1 and let Y be its tail process. Then, for all j ∈ Z, √ P(Yj > 1) = 2Φ( α2 γj ) . Proof. Write vj = α2 γj . Then α(Wj − W0 ) is a centered Gaussian random variable with variance vj . Thus α2

P(Yj > 1) = E[eαWj −αW0 − 2 γj ∧ 1]  vj /2 2 du √ −u = eu−vj /2 e 2vj  + Φ( vj /2) 2πv j −∞  vj /2 (u−vj )2 du √ − 2v j  = e + Φ( vj /2) 2πvj −∞  −vj /2 2 du √ −u = e 2vj  + Φ( vj /2) 2πv j −∞ √ √ √ = Φ(− 12 vj ) + Φ( 12 vj ) = 2Φ( 12 vj ) . 

13.5 Examples

367

Example 13.5.4 Let 0 < H < 1 and W be a fractional Brownian motion with Hurst index H, that is, W is a Gaussian process with stationary increments such that W0 = 0 almost surely and γj = E[Wj2 ] = σ 2 j 2H . For H = 1/2, W is simply the standard Brownian motion. Let X be the associated max-stable process. By Lemma 13.5.3, we see that X satisfies Condition S(rn , un ) and is β-mixing with rate  βn ≤ 4 jΦ( 12 ασj H ). j>n

√ 2 2H 1 The equivalence Φ(x) ∼ ( 2πx)−1 e−x /2 yields βn = O(n1−3H e− 2 ασn ).

13.5.2 Max-moving averages Let α > 0 and {ψj , j ∈ Z} be a sequence of non-negative real numbers such that  ψiα < ∞ . i∈Z

Let {εj , j ∈ Z} be a sequence of i.i.d. Fr´echet random variables with tail index α and scaling parameter 1. Let X be the process defined by  ψi εj−i , j ∈ Z . Xj = i∈Z

Theorem 13.5.5 The process  X is stationary, max-stable with tail index α, scaling parameter j∈Z ψjα , and spectral tail process Θj =

ψN +j , j∈Z, ψN

(13.5.2)

with N a random variable with distribution P(N = j) = 

ψjα

i∈Z

ψiα

, j∈Z.

Furthermore, (i) Condition AC(cn , rn ) holds for all sequences cn , rn such that limn→∞ rn c−α n = 0 and the extremal index is maxj∈Z ψjα θ=  α . j∈Z ψj

368

13 Max-stable processes

(ii) Condition S(rn , un ) holds for  all sequences cn , rn such that  α α limn→∞ rn c−α n = 0 if and only if j∈Z k∈Z (ψj ∧ ψk ), a sufficient condition for which being  |j|ψjα < ∞ . j∈Z

(iii) X is β-mixing if

 j∈Z

j 2 ψjα < ∞.

Proof. To check that the spectral tail process is given by (13.5.2), let Z be −1 ψj+N . Since ψk = 0 implies P(N = k) = 0, the process defined by Zj = ψN there is no issue with division by 0. Let Z (i) , i ≥ 1 be i.i.d. copies of Z. Let {Pi , i≥ 1} be the points of a PPP on (0, ∞) with mean measure να . Write S = j∈Z ψjα . Then it is easy to see that d

X=S

∞ 

Pi Z (i) .

i=1

Since Z0 = 1 almost surely, the spectral process Z is the spectral tail process. Condition AC(cn , rn ) holds since Zj → 0 as |j| tends to ∞. The extremal index is given by (cf. Theorem 13.3.3) ñ ô −α α maxj∈Z ψN ψj+N maxj∈Z ψjα   θ=E = −α α . α ψN j∈Z ψj j∈Z ψj+N For condition S(rn , un ), we compute −α α P(Yj > 1) = E[ψN ψj+N ∧ 1] = S −1



α (ψkα ∧ ψj+k ).

k∈Z



The last two claims follow easily.

13.5.3 A max-stable process with extremal index 0 (i)

Let {Zj , Zj , j ∈ Z, i ≥ 1} be i.i.d. random variables with a Fr´echet distribution with tail index β > 1. Let {Pi , i ≥ 1} be the points of a PPP on (0, ∞) with mean measure x−2 dx. Define the max-stable process {Xj , j ∈ Z} by Xj =

∞  i=1

(i)

Pi Zj , j ∈ Z .

13.6 Problems

369

Since {Zj , j ∈ Z} is stationary, it satisfies (13.2.4) thus {Xj , j ∈ Z} is stationary. Since P(Z0 > 0) = 1, the spectral  tail process is {Zj , j ∈ Z}. Since it is an i.i.d. sequence, it holds that P( j∈Z Zjα = ∞) = 1. Thus the extremal index is 0. We can check it directly. Indeed,   n ã Å  −1 −1 Zj . − log P max Xj ≤ nx = n x E 1≤j≤n

j=1

Since the Zj are i.i.d. Fr´echet with tail index β, ∨nj=1 Zj has the same distribution as n1/β Z1 . Thus,   n  Zj = n1/β E[Z1 ] = n1/β Γ (1 − 1/β) . E j=1

Since β > 1, this yields Å ã 1/β−1 Γ (1−1/β)x−1 lim P max Xj ≤ nx = lim en =1. n→∞

1≤j≤n

n→∞

This proves that the extremal index is 0. This example can be generalized. If the spectral process is stationary, then the extremal index must be zero. See Problem 13.4.

13.6 Problems 13.1 Let X be a stationary max-stable process with tail measure supported on 0 (R+ ) and spectral process Z such that E[Z0α ] = 1. ó î 1. Show that θ = E (Z ∗0,∞ )α − (Z ∗1,∞ )α . ó î ∞ ∞ 2. Show that θlargedev = E ( j=0 Zj )α − ( j=1 Zj )α (cf. (6.2.19)). 13.2 Let φ ∈ (0, 1), α > 0 and {εj , j ∈ Z} be an i.i.d. sequence with Fε (x) = α −α e−(1−φ )x . Define the ARMAX(1) process as Xj+1 = max{φXj , εj+1 }. ∞ 1. Show that the stationary solution {Xj , j ∈ Z} is Xj = i=0 φi εj−i and has a unit Fr´echet distribution. 2. Compute the tail and spectral tail process and prove that and θ = 1 − φα . 3. Prove that the process is β-mixing and that conditions AC(cn , rn ), S(rn , un ) hold. 13.3 Let X be a stationary max-stable process with Fr´echet marginal distributions with unit scale parameter, whose tail measure is supported by 0 (R+ ).

370

13 Max-stable processes

Let Θ be the spectral tail process and Q be the sequence with the distribution of Θ conditionally on inf arg max(Θ) = 0 and ϑ = P(inf arg max(Θ) = 0). Let {Ti , Pi , Q(i) , i ≥ 1} be the points of a Poisson point process on Z × (0, ∞) × 0 (R+ ) with mean measure δ ⊗ ϑνα ⊗ P(Q ∈ ·), δ being the counting measure on Z and Q(i) being i.i.d. copes of Q. Show that d

X=

∞ 

Pi B Ti Q(i) .

(13.6.1)

i=1

Such a representation is called an M3 representation. 13.4 This is a generalization of Section 13.5.3. Let {Pi , ∈ Z} be the point of a PPP(να ) on (0, ∞) and let Z (i) be i.i.d. copies of a stationary process Z such that E[Z0α ] = 1. Define X=

∞ 

Pi Z (i) ,

i=1

the supremum being taken componentwise as usual. 1. Prove that X is a stationary max-stable process with α-Fr´echet marginals. 2. Prove that its extremal index is 0.

13.7 Bibliographical notes For a general introduction on max-stable distributions, see, for instance, [Res87, Section5.4]. The representation (13.2.1) of max-stable processes was shown in [Vat84]; see also [Vat85], [dH84] and [GHV90]. The proof given here uses B.2.5 which is adapted from [DHS18]. The equivalence stated in Corollary 13.2.5 was obtained by [DK17] by ergodic theoretic methods; see also the references therein for a much broader survey of the theory of max-stable processes. The presentation of Section 13.2 essentially follows ideas of [DH17] and [Has18]. The M3 representation in terms of the sequence Q (in Problem 13.2) is due to [PS18]. See also [DHS18] for further extensions of these ideas to processes in Polish spaces. Theorem 13.3.1 is also due to [DH17] whereas the equivalence in Theorem 13.3.4 seems to have been unnoticed. Theorem 13.4.1 is [DEM12, Corollary 2.2] expressed in terms of the tail process. In [DEM12], the bound for βn is given in terms of the extremal coeffi−1 cient θ(i, j) which is defined by P(Xi ∨ Xj ≤ x) = e−θ(i,j)x . The extremal coefficient can be expressed in terms of a spectral process {Zj , j ∈ Z} by θ(i, j) = E[Zi ∨ Zj ]. Applying Proposition 13.2.6 to the formula in [DEM12, Corollary 2.2] yields Theorem 13.4.1.

13.7 Bibliographical notes

371

The first example among the family of max-stable processes considered in Section 13.5.1 was introduced by [BR77] who considered the standard Brownian motion. Theorem 13.5.1 is a particular case of the general theory of [KSdH09]. Our proof relies on E.4.5 which can be found in [DEO16, Lemma 1, supplementary material]. The example in Section 13.5.3 and Problem 13.4 were communicated to us by Stilian Stoev. This is a consequence of the representation of max-stable processes in terms of flows, see, for instance, [DK17] for more insight on these flow representations.

14 Markov chains

Many time series models are either Markov chains or functions of a Markov chain. We will use the powerful theory of irreducible Markov chains to check the conditions needed to apply the results of Part II. We start with some wellknown definitions and elementary properties in order to fix the notation. Definition 14.0.1 (Markov kernel) Let (E, E) and (F, F) be two measurable spaces. A function Π : E × F → R is called a transition kernel if • for all x ∈ E, Π(x, ·) is a probability measure on F; • for all A ∈ F, the function x → Π(x, A) is measurable. If moreover E = F, then Π is called a Markov (transition) kernel.

Let Π be a Markov kernel on E and f be a bounded measurable function on E. Then Πf is the measurable function defined on E by  Πf (x) = f (y)Π(x, dy) , x ∈ E . E

If ν is a probability measure on (E, E), then νΠ is the probability measure defined on (E, E) by  νΠ(A) = Π(x, A)ν(dx) , A ∈ E . E

The measure ν ⊗ Π is defined on (E2 , E ⊗2 ) by  ν ⊗ Π(h) = h(x, y)ν(dx)Π(x, dy) . E2

© Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 14

373

374

14 Markov chains

The iterates of the kernel Π n , n ≥ 0 are defined recursively as follows. For n ≥ 0, x ∈ E and A ∈ E,  Π 0 (x, A) = 1A (x) , Π n+1 (x, A) = Π n (y, A)Π(x, dy) . E



The tensor product Π ⊗Π of the two Markov kernels Π and Π  is a transition kernel on E × E ⊗2 by   Π  (y, B)Π(x, dy) , Π ⊗ Π (x, A × B) = A

for all A, B ∈ E. The n-th tensorial power of Π is defined iteratively by Π ⊗1 = Π and Π ⊗(n+1) = Π ⊗n ⊗ Π, n ≥ 1. Definition 14.0.2 (Markov Chain) A stochastic process {Xj , j ∈ N} is called a homogeneous Markov chain on E with respect to the filtration {Fj , j ∈ N} and transition kernel Π if for all j ≥ 0 and all bounded measurable functions f on E, E[f (Xj+1 ) | Fj ] = Πf (Xj ) . The set E is called the state space of the Markov chain. The distribution of X0 is called the initial distribution.

We will only consider homogeneous Markov chains and hereafter homogeneity will always be implicitly assumed. Using the previous notation, it is easily seen that if the initial distribution is ν, then the distribution of Xn is νΠ n and the distribution of (X0 , . . . , Xn ) is ν ⊗ Π ⊗n . We will write Pν , Eν to stress that the initial distribution is ν and if it is a Dirac mass at x, we will simply write Px , Ex . Definition 14.0.3 (Invariant distribution) Let {Xj , j ∈ N be a Markov chain on E with transition kernel Π. A probability measure π on (E, E) is said to be invariant with respect to Π if πΠ = π.

The following result is straightforward and shows the importance of invariant distributions in the theory of Markov chains.

14 Markov chains

375

Theorem 14.0.4 A Markov chain with kernel Π is stationary if and only if its initial distribution is invariant with respect to Π. A stationary Markov chain {Xj , j ∈ N} with invariant distribution π on a Polish space E can be extended to a stationary process {Xj , j ∈ Z} such that for each k ∈ Z, {Xk+j , j ≥ 0} is a Markov chain with kernel Π and invariant distribution π.

Almost all Markov chains have a so-called functional representation. Theorem 14.0.5 Let {Xj , j ∈ N} be a Markov chain with values in (E, E) and assume that E is countably generated. Then there exist • a sequence {εj , j ∈ N} of i.i.d. random elements with values in a measurable space (F, F) independent of X0 ; • a measurable function Φ : E × F → E such that Xj+1 = Φ(Xj , εj+1 ) , j ≥ 0 .

(14.0.1)

The transition kernel Π is given by Π(x, A) = P(Φ(x, ε0 ) ∈ A) , x ∈ E , A ∈ E . d

The chain is stationary if and only if X0 = Φ(X0 , ε1 ).

We will call (14.0.1) a functional representation of the Markov chain. In general, such a representation is not unique and may not be useful. However, Markovian time series models are often defined by such a functional representation and their properties are derived from it. See Theorem D.1.1 for conditions that ensure the existence of an invariant distribution. As in Chapter 13, the main goal of this chapter is to revisit extreme value theory of Markov chains in the light of the tail process. All the properties of a Markov chain are determined by its transition kernel and initial distribution. Regular variation of the chain in the sense of Chapter 5 will depend on the regular variation of the initial distribution and properties of the kernel. In Section 14.1 we will recall one of the main results which allow to establish the regular variation of the initial distribution. In Section 14.2, we will establish the main result of this chapter: we will provide conditions on the kernel which ensure the propagation of the regular variation of the initial distribution to

376

14 Markov chains

the whole chain. These conditions will be expressed either in terms of the kernel (Theorem 14.2.3) or of the functional representation (Theorem 14.2.7). In Section 14.3, we will consider regularly varying time series which can be expressed as functions of geometrically ergodic Markov chains. It turns out that for this class of processes, all the conditions introduced in Chapters 9 and 12 can be checked. This provides a wide class of regularly varying time series to which this theory applies. We will thoroughly study examples in Section 14.3.4 and Section 14.3.6. More examples will be treated as problems in Section 14.6. Certain Markov chains belong to the class of regenerative processes. For such processes, it has long been known that the extremal index, if it exists, can be related to the maximum of the chain over one full regeneration cycle. We will review this theory and give examples in Section 14.4. Finally, in Section 14.5, we will consider extremally independent Markov chains. We will parallel the results of Section 14.2 by replacing the tail process by the hidden tail process.

14.1 Regular variation of the stationary distribution There are several ways to prove that a given distribution is regularly varying. The simplest case is when an explicit expression for it is known, for instance, as a series of regularly varying random vectors and the results of Chapter 4 can be applied. This is, for instance, the case of the AR(1) process and certain solutions of stochastic recurrence equations (SRE) which will be studied exhaustively in Sections 14.3.4 and 14.3.5. The invariant distribution always satisfies a fixed point equation. If the functional representation of the chain is given by (14.0.1), then the distribution d of X0 is invariant if and only if X0 = Φ(X0 , ε1 ). It happens that regular variation of X0 occurs as an implicit consequence of this equation. The theory to establish this property is much more difficult than the theory of Chapter 4, and we will quote without proof the most famous (and very useful) result. References to further results will be given in Section 14.7. Theorem 14.1.1 Let M a real-valued random variable such that the distribution of log |M | given M = 0 is non-arithmetric, i.e. for all c > 0, P(log |M | ∈ {kc, k ∈ Z}) = 0. Assume that there exists α > 0 such that E[|M |α ] = 1 , E[|M |α log+ (|M |)] < ∞ .

(14.1.1)

14.1 Regular variation of the stationary distribution

377

Then m = E[|M |α log(|M |)] ∈ (0, ∞). Let Ψ be a random element in the space of real-valued Borel measurable d functions. Assume that X is a solution to the equation X = Ψ (X), M is independent of (X, Ψ ) and α α α E[|(Ψ (X))α + − (M X)+ |] + E[|(Ψ (X))− − (M X)− |] < ∞ .

(14.1.2)

If P(M ≥ 0) = 1, then lim tα P(X > t) = C+ ,

t→∞

lim tα P(X < −t) = C−

t→−∞

(14.1.3)

with 1 α E[(Ψ (X))α + − (M X)+ ] < ∞ , αm 1 α E[(Ψ (X))α C− = − − (M X)− ] < ∞ . αm C+ =

(14.1.4a) (14.1.4b)

If P(M ≥ 0) < 1, then (14.1.3) holds with C+ and C− replaced by C+ = C− =

1 E[|Ψ (X)|α − |M X|α ] . 2αm

Example 14.1.2 (Stochastic light-tailed innovations)

recurrence

equation

(SRE)

with

d

Consider the equation X = AX + B with A and B non-negative random variables and α > 0 such that E[Aα ] = 1 and E[B α ] < ∞. Then, if {(Aj , Bj ), j ≥ 0} is a sequence of i.i.d. random pairs with the same distribution as (A, B), the series ∞  A0 · · · Aj−1 Bj X0 = B0 + j=1 d

is well defined and is solution to the equation X0 = AX0 + B. Indeed, for q < α, we have E[Aq0 ] < 1 (unless A0 ≡ 1 which we exclude) and if furthermore q ≤ 1, q   ∞ ∞   A0 · · · Aj−1 Bj (E[Aq0 ])j < ∞ . ≤ E[B0q ] E j=1

j=1

If α > 1 and q ∈ [1, α), we have by Minkowski’s inequality  ∞ q  ∞   A0 · · · Aj−1 Bj (E[Aq0 ])j/q < ∞ . E1/q ≤ E1/q [B0q ] j=1

j=1

378

14 Markov chains

This proves that E[X0α−1 ] < ∞ if α > 1. Let us now check (14.1.2) with M = A and Ψ (x) = Ax + B, and (A, B) independent of X0 . If α ≤ 1, E[(AX0 + B)α − (AX0 )α ] ≤ E[B α ] < ∞ . If α > 1, we have E[(AX0 + B)α − (AX0 )α ] ≤ αE[(AX0 + B)α−1 B] Ä ä ≤ α2(α−1)+ E[Aα−1 B]E[X0α−1 ] + E[B α ] Ä ä α−1 1 ≤ α2(α−1)+ (E[Aα ]) α (E[B α ]) α E[X0α−1 ] + E[B α ] < ∞ . In the last line, we applied H¨ older inequality and the property E[X0α−1 ] < ∞. If A satisfies the additional assumptions (related to M therein) of Theorem 14.1.1, then X0 has a power law tail with index α. 

14.2 Regular variation of Markov chains In this section, we provide a sufficient condition on the transition kernel and the initial distribution (even if it is not its invariant distribution) which ensure that the resulting time series is regularly varying and we describe its tail process. It turns out that the forward tail process depends only on the kernel and on the tail index of the regularly varying initial distribution. We need the notion of local uniform convergence. A sequence of functions ft , t ∈ T with T = N or T = R, on a topological space E converges locally uniformly to f at u ∈ E if for every sequence ux in E such that limx→∞ ux = u, it holds that limx→∞ fx (ux ) = f (u). In a locally compact separable Hausdorff space (such as Rd ), local uniform convergence is implied by uniform convergence on compact sets. Assumption 14.2.1 There exists a Markov kernel K on (Rd \ {0}) × B(Rd ) such that lim Π(xu, xA) = K(u, A)

x→∞

(14.2.1)

for all A such that K(u, ∂A) = 0, locally uniformly with respect to every u ∈ Rd \ {0}.

An important and immediate consequence of (14.2.1) is that the kernel K satisfies the following homogeneity property: for all t > 0 and u ∈ Rd \ {0},

14.2 Regular variation of Markov chains

K(tu, tA) = K(u, A) .

379

(14.2.2)

Condition (14.2.1) can be expressed equivalently in terms of the kernel Πx on Rd × B(Rd ) defined by Πx (u, A) = Π(xu, xA) . The assumption of local uniform convergence implies that for every sequence ux in Rd such that limx→∞ ux = u ∈ Rd \ {0} and for every bounded continuous function h on Rd , lim Πx h(ux ) = Kh(u) .

x→∞

(14.2.3)

Let {X j , j ∈ N} be a Rd -valued Markov chain defined on a probability space (Ω, F, P) with kernel Π. Then (14.2.3) can be expressed as lim Exu x [h(X 1 /x)] = Kh(u) ,

x→∞

for all sequences ux → u ∈ Rd \ {0}. Assumption 14.2.2 The distribution of X 0 is multivariate regularly varying with tail index α > 0 and spectral measure Λ0 on Sd−1 .

Theorem 14.2.3 Let {X j , j ∈ N} be a Markov chain on Rd with kernel Π which satisfies Assumptions 14.2.1 and 14.2.2. Assume moreover that either K(u, {0}) = 0

(14.2.4)

lim lim sup sup Π(xu, B c (0, x)) = 0 .

(14.2.5)

for all u ∈ Rd \ {0} or →0 x→∞ |u |≤

In both cases, let K be extended to a kernel on Rd by setting K(0, A) = 1{A} (0). Then the sequence {X j , j ∈ N} is regularly varying and its forward spectral tail process {Θj , j ∈ N} is a Markov chain with kernel K and initial distribution Λ0 and 0 is an absorbing state.

The proof is postponed to Section 14.2.2.

380

14 Markov chains

Remark 14.2.4 Heuristically, (14.2.4) means that the chain cannot fall down from an extreme state to a non-extreme one; if (14.2.4) does not hold, then Condition (14.2.5) allows to extend the local uniform convergence in (14.2.1) to u = 0. It is straightforward to see that (14.2.5) is equivalent to lim lim sup sup Π(xu, (0, ax)) = 0

→0 x→∞ |u |≤

for all a > 0. In both cases, 0 is an absorbing state for the kernel K.



Remark 14.2.5 Rigorously, if the distribution of X 0 is not the invariant distribution, then {X j , j ≥ 0} is not stationary so {Θj , j ∈ N} cannot be called the spectral tail process. However, if the invariant distribution is regularly varying with the same tail index as {X j , j ∈ N}, then {Θj , j ∈ N} is indeed the spectral tail process. ⊕

14.2.1 Regular variation via functional representation Before proving Theorem 14.2.3, we rewrite it and Assumption 14.2.1 when the Markov chain admits the functional representation X j+1 = Φ(X j , εj+1 ) , j ≥ 0 , where {εj , j ∈ N} is a sequence of i.i.d. random variables with values in a measurable space (F, F) and Φ : Rd × F → Rd is a measurable map. Then Assumption 14.2.1 is implied by the following one on the function Φ. Assumption 14.2.6 There exists a function φ : Rd \ {0} × F → Rd such that for all e ∈ F, lim x−1 Φ(xu, e) = φ(u, e) ,

x→∞

locally uniformly with respect to u ∈ Rd \ {0}.

This assumption implies that φ(·, ) is homogeneous with degree 1, that is, φ(tu, ) = tφ(u, ). We can rewrite Theorem 14.2.3 in terms of the functional representation. Theorem 14.2.7 Let {X j , j ∈ N} be a Markov chain on Rd with kernel Π which satisfies Assumptions 14.2.2 and 14.2.6. Assume that either

14.2 Regular variation of Markov chains

381

P(φ(u, ε1 ) = 0) = 0 ,

(14.2.6)

lim lim sup sup P(|Φ(xu, ε1 )| > x) = 0 .

(14.2.7)

for all u ∈ Rd \ {0} or →0 x→∞ |u |≤

In both cases the function φ is extended to Rd by setting φ(0, ·) ≡ 0. Then the sequence {X j , j ∈ N} is regularly varying and its forward spectral tail process {Θj , j ∈ N} is given by the recursion Θj+1 = φ(Θj , εj+1 ) , j ≥ 0 , and Θ0 has distribution Λ0 .

Proof. We only need to check that Assumption 14.2.6 and the conditions of Theorem 14.2.7 imply both Assumption 14.2.1 and the conditions of Theorem 14.2.3. Define the kernel K on Rd \ {0} by K(u, A) = P(φ(u, ε1 ) ∈ A) for u ∈ Rd \{0} and A ∈ B(Rd ). Then, for every bounded continuous function h and every sequence ux which converges to u = 0, Assumption 14.2.6 and the bounded functional convergence yield lim Πx h(ux ) = lim E[h(X 1 /x) | X 0 = xux ] x→∞ î ó = lim E h(x−1 Φ(xux , ε1 )) = E [h(φ(u, ε1 ))] = Kh(u) .

x→∞

x→∞

This proves that Assumption 14.2.1 holds. If P(φ(u, ε1 ) = 0) = 0, then K(u, {0}) = 0. If condition (14.2.7) holds, then lim lim sup sup Π(xu, (0, x)) = lim lim sup sup P(|Φ(xu, ε1 )| > x) = 0 .

→0 x→∞ |u |≤

→0 x→∞ |u |≤

Thus (14.2.5) holds. Finally, a Markov chain with regularly varying initial distribution ν and kernel K has the same distribution as the sequence {Y j , j ∈ N}, where Y 0 has distribution ν and Y j+1 = φ(Y j , εj+1 ) . Since φ is homogeneous with degree 1 and Y j = |Y 0 | Θj , j ≥ 1, the spectral forward tail process {Θj , j ∈ N} satisfies the recursion Θj+1 = φ(Θj , εj+1 ) , j ≥ 0 , and by regular variation Θ0 has distribution Λ0 .



382

14 Markov chains

Example 14.2.8 Consider a Markov chain on [0, ∞) which satisfies the assumptions of Theorem 14.2.7. Let {Wj , j ∈ N∗ } be a sequence of i.i.d. random variables with the same distribution as φ(1, ε1 ). Let Y0 be a Pareto random variable. The homogeneity of φ yields that the forward tail process is given by Yj = Yj−1 Wj = Y0

j 

Wi , j ≥ 1 .

i=1

That is, the forward tail process is a multiplicative random walk. The candidate extremal index is given by    Å ã j j   ϑ = P max Yj ≤ 1 = E max Wiα − max Wiα . j≥1

j≥0

j≥1

i=0

i=1

+



14.2.2 Proof of Theorem 14.2.3 What we must prove is that for every bounded continuous function f on B c (0, 1) × Rdk , lim E[f ((X 0 , . . . , X k )/x) | |X 0 | > x] = E[f (Y Θ0 , · · · , Y Θk )] ,

x→∞

where Y is a random variable with a Pareto distribution with tail index α and {Θj , j ≥ 0} is as in the statement of the theorem. Let λ be the distribution of X 0 and λx be the Radon measure defined on Rd \ {0} by λx (A) =

λ(xA) λ(B c (0, x))

(14.2.8)

for every Borel set A of Rd \ {0}. The regular variation condition implies that λx converges vaguely on Rd \ {0} to να ⊗ Λ0 . We will denote by ν a probability measure on B c (0, 1) which is isomorphic to να ⊗ Λ0 restricted to B c (0, 1). With this notation, what we must prove is then that for every k ≥ 0, λx ⊗ Πx⊗k =⇒ ν ⊗ K ⊗k , w

on B c (0, 1) × Rdk . Equivalently, for every k ≥ 1 and for all (k + 1)-tuples of bounded continuous functions (h0 , . . . , hk ) on B c (0, 1), setting E = Rd ,    lim λx (dx0 )h0 (x0 ) Πx (x0 , dx1 )h1 (x1 ) · · · Πx (xk−1 , dxk )hk (xk ) x→∞ B c (0,1) E E    = ν(dx0 )h0 (x0 ) K(x0 , dx1 )h1 (x1 ) · · · K(xk−1 , dxk )hk (xk ) . B c (0,1)

E

E

14.2 Regular variation of Markov chains

383

This will be proved by induction. For k = 0, this follows from the assumption of regular variation of the initial distribution. Let fx and f be the functions defined on Rkd by fx (x0 , . . . , xk−1 ) = h0 (x0 ) · · · hk−1 (xk−1 )Πx hk (xk−1 ) , f (x0 , . . . , xk−1 ) = h0 (x0 ) · · · hk−1 (xk−1 )Khk (xk−1 ) . ⊗(k−1)

converges The induction assumption for k − 1 implies that λx ⊗ Πx weakly to ν ⊗ K ⊗(k−1) . Therefore, we only need to prove that fx converges locally uniformly to f on a subset F of B c (0, 1) × R(k−1)d such that ν ⊗ K ⊗(k−1) (F ) = 1 and invoke Theorem A.2.6 to conclude the induction. (i) If (14.2.4) holds, then ν ⊗ K ⊗(k−1) (B c (0, 1) × (Rd \ {0})k−1 ) = 1 and by assumption Πx hk converges to Khk locally uniformly on Rd \ {0}. This implies that fx converges locally uniformly to f on B c (0, 1) × (Rd \ {0})k−1 . (ii) If (14.2.5) holds, then let ux be a sequence tending to 0 and let h be a bounded continuous function on Rd . Assume without loss of generality that h(0) = 0. Fix > 0. Then, for large enough x and small enough η,   |h(v)|Π(xux , xdv) + |h(v)|Π(xux , xdv) |Πx h(ux )| ≤ B c (0,η) c

B(0,η)

≤ sup |h(v)| + h ∞ sup Π(u, B (0, ηx)) |v |≤η

|u |≤x

≤ + cst sup Π(u, B c (0, ηx)) . |u |≤x

This yields lim sup |Πx h(ux )| ≤ + cst lim sup sup Π(u, B c (0, ηx)) . x→∞ |u |≤x

x→∞

The right-hand side can be made arbitrarily small by letting → 0. This proves that lim Πx h(ux ) = 0 = h(0) = Kh(0) .

x→∞

This proves that Πx h, hence fx , converges locally uniformly to Kh on Rd . Let us finally prove that {Θj , j ∈ N} is also a homogeneous Markov chain with kernel K. Let {FnY , n ≥ 1}, be the σ-field filtration generated by the tail process {Y j , j ∈ N}. It suffices to prove that for every bounded measurable function F , E[F (Θn+1 ) | FnY ] depends only on Θn . Indeed, define the function G on R∗+ × Rd by G(x, y) = F (y/x). Applying the homogeneity property (14.2.2) of the kernel K yields

384

14 Markov chains

 E[F (Θn+1 ) |

FnY

FnY

] = E[G(|Y 0 | , Y n+1 ) | ]= K(Y n , dv)G(|Y 0 | , v) Rd  K(Y n , |Y 0 | dw)G(|Y 0 | , |Y 0 | w) = d R K(Θn , dw)F (w) . = Rd

14.2.3 The backward tail process To avoid trivialities, we assume from now on that P(Θ−1 = 0) = E[|Θ1 |] = 0. The opposite case corresponds to extremal independence, where the backward and forward tail processes are identically zero at all lags h = 0. By the time-change formula, we can obtain the distribution of the backward spectral tail process {Θj , j ∈ Z− }. Let us determine the distribution of Θ−1 under Assumption 14.2.1. Let H be a bounded measurable function on Sd−1 × Rd . By definition, Λ0 is the distribution of Θ0 . Thus, by the time-change formula Theorem 5.3.1, we have E[H(Θ0 , Θ−1 )1{Θ−1 = 0}] ï Å ò ã Θ1 Θ 0 α , =E H |Θ1 | 1{Θ1 = 0} |Θ1 | |Θ1 |   Å ã v u α , = Λ0 (du) K(u, dv)H |v| . |v| |v| Sd−1 Rd \{0} ‹ on Sd−1 × B(Sd−1 × Rd \ {0}) by Define a submarkovian kernel K  Å ã v u α ‹ KH(u) = , K(u, dv)H |v| , u ∈ Sd−1 , d |v| |v| R for all non-negative or bounded measurable functions H defined on Sd−1 × Rd \ {0}. Then, we have the duality    Å ã v u α ‹ KH(u)Λ , Λ0 (du) K(u, dv)H |v| = 0 (du) , |v| |v| Sd−1 Rd Sd−1 α ‹ or in concise form, with H(u, v) = H(v/ |v| , u/ |v|) |v| for (u, v) ∈ Sd−1 × d R \ {0},

‹ ‹ . = Λ0 ⊗ K(H) Λ0 ⊗ K(H)

(14.2.9)

We also have the identity  E [H(Θ0 , Θ−1 )1{Θ−1 = 0}] =

Sd−1

 Λ0 (du)

Rd

‹ K(u, dv)H(u, v) .

14.2 Regular variation of Markov chains

385

‹ to Rd \ {0} × B(Sd−1 × Rd \ {0}) as follows: for all We extend the kernel K d u ∈ R \ {0} and bounded measurable function F defined on Sd−1 × Rd \ {0},  Å ã Å ã ‹ ‹ u , dv F v , u . KF (u) = K |u| |u| |v| |v| Rd−1 ‹ has the following homogeneity property: for all t > 0, Then K ‹ ‹ K(tu, tdv) = K(u, dv) .

(14.2.10)

The key to our result is the following lemma which extends (14.2.9). Lemma 14.2.9 For a bounded measurable function H on Sd−1 ×(Rd \{0})n , ‹ by define the function H Å ã un u0 α ‹ ,..., H(u0 , . . . , un ) = H |un | . |un | |un | Then ‹⊗n (H) = Λ0 ⊗ K ⊗n (H) ‹ . Λ0 ⊗ K

(14.2.11)

Proof. The identity (14.2.11) is true for n = 1 by definition. Assume that it is true for one n ≥ 1. Let H be a bounded measurable function defined ‹ as above. Then, using the homogeneity on Sd−1 × (Rd \ {0})n and define H property of the kernel K and the change of variable v i = |v 1 | wi , i ≥ 2, we obtain ‹ Λ0 ⊗ K ⊗(n+1) (H)   = Λ0 (dv 0 )

Å

ã v n+1 vn v1 v0 , ,..., , H |v n+1 | |v n+1 | |v n+1 | |v n+1 | Sd−1 Rd(n+1) α × |v n+1 | K(v 0 , dv 1 ) · · · K(v n , dv n+1 )   α Λ0 (dv 0 ) K(v 0 , dv 1 ) |v 1 | × = Sd−1 Rd  Å ã wn+1 w2 v1 v0 ..., , , H |wn+1 | |wn+1 | |v 1 | |wn+1 | |v 1 | |wn+1 | Rdn Å ã v1 α , dw2 K(w2 , dw3 ) · · · K(wn , dwn+1 ) × |wn+1 | K |v 1 |   Å ã v1 v0 α ‹2 ) , = Λ0 (dv 0 ) K(v 0 , dv 1 ) |v 1 | H2 = Λ0 ⊗ K(H |v 1 | |v 1 | Sd−1 Rd ‹ 2) , = Λ0 ⊗ K(H

where the function H2 is defined on Sd−1 × Rd by

386

14 Markov chains

 H2 (u, v) =

Rdn

Å H

ã wn+1 w2 u v ,..., , , |wn+1 | |wn+1 | |wn+1 | |wn+1 | α × |wn+1 | K(u, dw2 ) · · · K(wn , dwn+1 )

‹2 is as in the duality formula (14.2.9). and H Define now the function H ∗ by H ∗ (xn+1 , . . . , x2 , x1 ) =

 Rd

‹ 1 , dv)H(xn+1 , . . . , x1 , v) K(x

‹∗ by duality. Then, applying the homogeneity property (14.2.10) of the and H ‹ and the induction assumption yields kernel K ‹ = Λ0 ⊗ K(H ‹ 2) Λ0 ⊗ K ⊗(n+1) (H)   ‹ K(u, dv) Λ0 (du) = Sd−1 Rd  Å ã wn+1 w2 u v ,..., , , H |wn+1 | |wn+1 | |wn+1 | |wn+1 | Rdn α × |wn+1 | K (u, dw2 ) · · · K(wn , dwn+1 )  Å ã wn+1 w2 u = ,..., , Λ0 (du)H ∗ |wn+1 | |wn+1 | |wn+1 | Sd−1 α × |wn+1 | K (u, dw2 ) · · · K(wn , dwn+1 ) ‹∗ ) = Λ0 ⊗ K ‹⊗n (H ∗ ) = Λ0 ⊗ K ‹⊗(n+1) (H) . = Λ0 ⊗ K ⊗n (H 

This proves (14.2.11) for n + 1 and concludes the induction.

An important consequence of the duality (14.2.11) and the time-change formula is that for every bounded measurable function H on Sd−1 × (Rd \ {0})n , ‹⊗n (H) . E[H(Θ0 , Θ−1 , . . . , Θ−n )1{Θ−n = 0}] = Λ0 ⊗ K

(14.2.12)

‹ Indeed, by Theorem 5.3.1 and by definition of the function H, E[H(Θ0 , Θ−1 , . . . , Θ−n−1 )1{Θ−n−1 = 0}] α = E[H((Θn+1 , . . . , Θ0 )/ |Θn+1 |) |Θn+1 | ] ‹ = Λ0 ⊗ K ‹⊗n (H) . = Λ0 ⊗ K ⊗n (H) ‹ describes the dynamics of the backward tail process Informally, the kernel K ‹ is a Markov kernel, until it reaches 0. If P(Θ−1 = 0) = 0, then the kernel K ‹ K(u, {0}) = 0 and (14.2.12) proves that the backward spectral tail process ‹ never vanishes and is a Markov chain with kernel K.

14.2 Regular variation of Markov chains

387

¯ on In the case where 0 < P(Θ−1 = 0) < 1, we define the Markov kernel K d R by ¯ ‹ K(u, ·) = P(Θ−1 = 0 | Θ0 = u)δ0 + K(u, ·) . ¯ ·). Given Θ0 = u, the distribution of Θ−1 is K(u, The important feature of the backward spectral tail process is that it is also ¯ a Markov chain with kernel K. Theorem 14.2.10 Under the assumptions of Theorem 14.2.3 the backward spectral tail process {Θj , j ≤ −1} is a (reverse) Markov chain with ¯ independent of {Θj , j ≥ 1} conditionally on Θ0 . Moreover, 0 kernel K, is an absorbing state for the backward tail process.

Proof. Let us first prove that 0 is an absorbing state. Indeed, by Theorem 5.3.1, since 1{x = 0} is homogeneous with degree 0, we have for k ≥ 1, α

P(Θ−k−1 = 0, Θ−k = 0) = E[1{Θ1 = 0} |Θk+1 | ] = 0 , since we have already seen in Theorem 14.2.3 that 0 is an absorbing state for the forward tail process. We must prove that for all bounded measurable function H on Sd−1 × Rdn , ¯ ⊗n (H) . E[H(Θ0 , Θ−1 , . . . , Θ−n )] = Λ0 ⊗ K

(14.2.13)

For j = 0, . . . , n − 1, define the function Hj on Sd−1 × (Rd \ {0})j by ¯ j , {0})H(u0 , . . . , uj , 0, . . . , 0) . Hj (u0 , . . . , uj ) = K(u Since 0 is absorbing, we have E[H(Θ0 , . . . , Θ−n )] = E[H(Θ0 , . . . , Θ−n )1{Θ−n = 0}] +

n−1 

E[H(Θ0 , . . . , Θ−j , 0, . . . , 0)1{Θ−j = 0}1{Θ−j−1 = 0}]

j=0

‹⊗n (H) + = Λ0 ⊗ K

n−1  j=0

‹⊗j (Hj ) = Λ0 ⊗ K ¯ ⊗n (H) . Λ0 ⊗ K 

Remark 14.2.11 It must be noted that we have only used in this proof (i) the time-change formula (5.3.1b), (ii) the fact that the forward spectral tail

388

14 Markov chains

process is a Markov chain with kernel K and (iii) the homogeneity property of K, but not the fact that the original time series was a Markov chain. Therefore, if the forward tail process of a stationary time series (not necessarily a Markov chain) is a Markov chain and its kernel satisfies the homogeneity property (14.2.2), then its backward tail process will also be a Markov chain whose kernel is obtained by the duality formula (14.2.9). ⊕ When does the spectral tail process vanish? Since 0 is an absorbing state for both the forward and the backward tail kernel, the sequences {P(Θj = 0), j ≥ 1} and {P(Θ−j = 0), j ≥ 1} are both non-decreasing. By Corollary 5.3.5, we also know that for j = 0, P(Θj = α α α 0) = 1 − E[|Θ−j | ]. Therefore the sequences {E[|Θj | ]} and {E[|Θ−j | ]} are both non-increasing. α

Two situations may occur: E[|Θj | ] = 1 for all j ∈ Z and the tail process α never vanishes, or there exists j = 0 such that E[|Θj | ] < 1 and then the  probability that the tail process vanishes increases for j ≥ j if j > 0 or j  ≤ j if j < 0. Define N+ = inf{j ≥ 1, Θj = 0} and N− = sup{j ≤ −1, Θj = 0}. There is no relation between N+ and N− in general, except if there exists m ≥ 1 such that P(Θm = 0) = 1 which is equivalent to P(Θ−m = 0) = 1. This is the case of an m-dependent time series, for instance. In general, we have the following relations. Since 0 is absorbing, we have, for j ≥ 1, α

P(N+ ≤ j) = P(Θj = 0) = 1 − E[|Θ−j | ] , α P(N− ≥ −j) = P(Θ−j = 0) = 1 − E[|Θj | ] α

α

or equivalently, P(N+ > j) = E[|Θ−j | ] and P(N− < −j) = E[|Θj | ]. Consequently α

(14.2.14a)

α

(14.2.14b)

P(N+ = ∞) = 1 − lim P(Θj = 0) = lim E[|Θ−j | ] j→∞

j→∞

P(N− = −∞) = 1 − lim P(Θ−j = 0) = lim E[|Θj | ] . j→∞

j→∞

14.3 Checking conditions In this section we will prove that functions of geometrically ergodic Markov chains satisfy the conditions used in Part II. We will need the notion of small sets (see Definition D.2.2) and the results of Appendix D.3. We now state our working assumptions.

14.3 Checking conditions

389

Assumption 14.3.1 {Yj , j ∈ Z} is a Markov chain on E with transition kernel Π. There exist a function V : E → [1, ∞), δ ∈ [0, 1), b, c > 0, such that δ + b/c < 1, ΠV ≤ δV + b ,

(14.3.1)

and the level set {V ≤ c} is a small set.

Condition (14.3.1) is called a geometric drift condition. By Theorem D.3.1, Assumption 14.3.1 implies that the chain {Yj } has an invariant distribution, denoted by π, π(V ) < ∞, and is geometrically ergodic. We now consider a function g : E → Rd and define the sequence {X j , j ∈ Z} by X j = g(Yj ). We make the following assumption. Assumption 14.3.2 The sequence {X j , j ∈ Z} is stationary and regularly varying with index α and there exist q, c > 0 such that |g|q ≤ cV , lim sup x→∞

P(V (Y0 ) > x) x)

(14.3.2a) (14.3.2b)

Remark 14.3.3 Under Assumption 14.3.1, we know that π(V ) < ∞, so q must be less than or equal to α. ⊕ Remark 14.3.4 Conditions (14.3.2a) and (14.3.2b) roughly mean that V (Y0 ) q and |X 0 | are tail equivalent. As we will see in examples below, in the case d = 1, a typical choice for the drift function is V (x) = 1+|x|q with q ∈ (0, α). With such a choice, both conditions hold. In the multivariate case, the function g might be of the form g(y) = y1 (the first component of y) and V defined q as V (y) = 1 + |y| . In that case, (14.3.2a) trivially holds and (14.3.2b) holds if the first component of X 0 is tail equivalent to |X 0 |, which is usually the case, but has to be checked for each example. ⊕ An important consequence of Assumption 14.3.2 is that for all r ∈ (0, 1], lim sup x→∞

E [V r (Y0 )1{|X 0 | > x}] x)

(14.3.3)

The main ingredient of the forthcoming proofs will be properties induced by geometric ergodicity. By Theorem D.3.1, for every r ≥ 1, there exist

390

14 Markov chains

β > 1 (which may depend on r) and a constant such that for all measurable 1 functions f such that |f | ≤ V r (which implies π(|f |) < ∞), π(f ) = 0 and ∞ 

1

β j |Π j f (x)| ≤ cst V r (x) .

(14.3.4)

j=0

By Proposition D.3.4, a stationary Markov chain which satisfies Assumption 14.3.1 is β-mixing with geometric rate. In the following subsections, we will check the other conditions used in Part II for time series which satisfy Assumptions 14.3.1 and 14.3.2. This means that the whole theory developed therein is applicable to this wide class of time series models.

14.3.1 Anticlustering conditions Let {X j , j ∈ Z} be a stationary sequence, {un } be a non-decreasing sequence and {rn } be non-decreasing sequences of integers such that limn→∞ un = limn→∞ rn = ∞. We recall here the anticlustering conditions introduced in Chapters 6, 9 and 10. We say that Condition AC(rn , un ) and S(un , rn ) hold if for all y > 0,

max |X j | > un y | |X 0 | > un = 0 , (AC(rn , un )) lim lim sup P m→∞ n→∞

m≤|j|≤rn

lim lim sup

m→∞ n→∞

rn 

P (|X j | > un y | |X 0 | > un y) = 0 .

(S(un , rn ))

|j|=m

Let ψ : Rd → R be a non-negative measurable function. We say that Condition Sψ (rn , un ) holds if for all s, t > 0, lim lim sup

m→∞ n→∞

rn  E [ψ(sX 0 /un )ψ(tX j /un )] =0. P(|X 0 | > un )

(Sψ (rn , un ))

|j|=m

Recall that S(un , rn ) implies AC(rn , cn ) and Sψ (rn , un ) implies S(un , rn ) if ψ vanishes in a neighborhood of zero and is bounded away from zero elsewhere. Theorem 14.3.5 Let Π be a Markov kernel which satisfies Assumptions 14.3.1 and 14.3.2. Let ψ be a non-negative measurable function and > 0 be such that ψ(x) ≤ cst |x|

q/2

1{|x| > }

(14.3.5)

(with q as in Assumption 14.3.2). Then Sψ (rn , un ) holds for all sequences rn , un such that rn P(|X 0 | > un ) = o(1).

14.3 Checking conditions

391

Proof. Let F0 be the distribution function of |X 0 |. By (14.3.2a) and (14.3.5), we have ψ 2 ◦ g ≤ V , thus E[ψ 2 (X 0 /un )] < ∞. Fix an integer L > 0. Then, rn  E [ψ(X 0 /un )ψ(X j /un )]

P(|X 0 | > un )

j=L



rn  E [ψ(X 0 /un ){ψ(X j /un ) − E[ψ(X 0 /un )]} P(|X 0 | > un ) j=L

+

rn (E[ψ(X 0 /un )])2 . P(|X 0 | > un )

The last term goes to zero by (14.3.5) and since rn P(|X 0 | > un ) → 0. Set F0 (un ) = P(|X 0 | > un ). Define the function f on E by f (y) = ψ(g(y)/un ) − E[ψ(X 0 /un )] , y ∈ E .

(14.3.6)

−q/2

Then π(f ) = 0 and |f (y)| ≤ cst un V 1/2 (y) by (14.3.2a) and (14.3.5). Thus, we can apply (14.3.4) with r = 2 and obtain ⎡ ⎤ rn  1 E ⎣ψ(X 0 /un ) ⎦ {ψ(X j /un ) − E[ψ(X j /un )]} F0 (un ) j=L î î ó ó  rn r n f (Yj ) Π j f (Y0 ) E ψ(X 0 /un ) j=L E ψ(X 0 /un ) j=L = = F0 (un ) F0 (un ) î ó r n j E ψ(X 0 /un ) j=L |Π f (Y0 )| ≤ F0 (un ) î ó r n j j E ψ(X /u ) β |Π f (Y )| 0 n 0 j=L ≤ β −L F0 (un ) 1

≤ cst β −L

E[ψ(X 0 /un )V 2 (Y0 )] q/2 un F0 (un )

≤ cst β −L

E[1{|X 0 | > un }V (Y0 )] . uqn F0 (un )

We conclude by applying (14.3.3) and letting L → ∞.



14.3.2 A higher order moment bound In Chapters 10 to 12, we introduced conditions needed to prove the tightness of certain processes in Theorems 10.4.9 and 11.2.4 and the validity of the multiplier bootstrap in Theorem 12.1.3. They can be expressed as follows: there exists ς > 2 such that îÄr äς ó n E j=1 ψ(X j /un ) sup 1} in Theorems 10.4.9 and 11.2.4 and ψ satisfying q/ς ψ(x) ≤ cst |x| 1{|x| > 1} with q as in Assumption 14.3.2. Lemma 14.3.6 If Assumptions 14.3.1 and 14.3.2 hold, then (14.3.7) holds q/3 with ς = 3 for functions ψ(x) = 1{|x| > 1} and ψ(x) ≤ cst |x| 1{|x| > 1} and for all sequences rn , un such that rn P(|X 0 | > un ) = o(1). Proof. Write ξj = ψ(X j /un ). Set again F0 (un ) = P(|X 0 | > un ). By (14.3.2b) and by stationarity, we have ⎡ 3 ⎤ rn  E⎣ ψ(X j /un ) ⎦ j=1

 ≤ cst rn

F0 (un ) +

rn 

E[ξ02 ξi

+

ξ0 ξi2 ]

i=1

+

rn  rn 

 E[ξ0 ξi ξj ]

.

i=1 j=i+1

By a straightforward adaptation of the proof of Theorem 14.3.5, we have r  n  E (ξ0 ξi2 + ξ0 ξi2 ) = O(F0 (un )) . i=1

Thus the two middle terms are of the right order and it suffices to show that r n r n i=1 j=i+1 E[ξ0 ξi ξj ] sup un } − E[V 1/3 (Y0 )1{|X 0 | > un }] . Since |X j | ξj ≤ cst u−q/3 n

q/3

1{|X j | > un } ≤ cst u−q/3 V 1/3 (Yj )1{|X j | > un } , n

we have rn  rn 

E [ξ0 ξi ξj ] ≤ cst u−q/3 n

i=1 j=i+1

rn  rn 

î ó E ξ0 ξi V 1/3 (Yj )1{|X j | > un }

i=1 j=i+1

= cst

u−q/3 n

rn 



E ξ0 ξi

i=1

rn 

 f (Yj )

j=i+1

+ cst u−q/3 rn E[V 1/3 (Y0 )1{|X 0 | > un }] n

rn  i=1

= cst u−q/3 (I + II) . n

E [ξ0 ξi ]

14.3 Checking conditions

393

Again as an application of Theorem 14.3.5, rn 

E [ξ0 ξi ] = O(F0 (un )) .

i=1 q/3

Furthermore, E[V 1/3 (Yj )1{|X 0 | > un }] = O(un F0 (un )), again by (14.3.3). 2 −q/3 Thus un II = O(rn F0 (un )) = o(F0 (un )). For the term I, we use the same technique as in the proof of Theorem 14.3.5: r   r   rn rn n n     j−i E ξ0 ξi f (Yj ) = E ξ0 ξi Π f (Yi ) i=1 j=i+1 i=1 j=i+1   rn rn   ≤ E ξ0 ξi |Π j−i f (Yi )| i=1



≤ cst E ξ0

j=i+1 rn 

ξi V

 1/3

(Yi )

i=1



≤ cst

u−q/3 E n

ξ0

rn 

 V

2/3

(Yi )1{|X 0 | > un }

.

i=1

We now define f2 (y) = V 2/3 (y)1{|g(y)| > un } − E[V 2/3 (Y0 )1{|X 0 )| > un }] and use the same technique. This yields, using (14.3.3) with r = 1/3 and r = 2/3,   r n  2/3 V (Yi )1{|X 0 | > un } E ξ0 i=1



= E ξ0

rn 

 f2 (Yi ) + cst rn E[ξ0 ]E[V 2/3 (Y0 )1{|X 0 | > un }]

i=1

 ≤ E ξ0

rn 

 i

Π f (Y0 ) i=1 u−q/3 rn E[V 1/3 (Y0 )1{|X 0 | n

> un }]E[V 2/3 (Y0 )1{|X 0 | > un }] ó 2 ≤ cst E ξ0 V 2/3 (Y0 ) + cst rn un2q/3 F0 (un ) + cst

î

2

E[V (Y0 )1{|X 0 | > un }] + cst rn un2q/3 F0 (un ) ≤ cst u−q/3 n = O(un2q/3 F0 (un )) . −q/3

Thus un

I = O(F (un )) as needed. This concludes the proof.



394

14 Markov chains

14.3.3 Asymptotic negligibility condition We now consider condition ANSJB(rn , cn ) introduced in Chapter 6 which we recall for convenience. For sequences cn and rn ,  rn P( j=1 |X j |1{|X j | ≤ cn } > ηcn ) =0. (ANSJB(rn , cn )) lim lim sup →0 n→∞ rn P(|X 0 | > cn ) We show that ANSJB(rn , cn ) holds under Assumptions 14.3.1 and 14.3.2 q and under the additional restriction that V ≤ C |g| . We will prove condition ANSJB(rn , cn ) only in the case α ∈ (1, 2) for simplicity. Recall that the condition is satisfied for α ∈ (0, 1]. For the other cases see the references. Lemma 14.3.7 Let Assumptions 14.3.1 and 14.3.2 hold with α ∈ (1, 2) and 1 < q < α. Assume moveover that there exists a constant such that q V ≤ cst |g| . Let rn and cn be sequences such that rn = o(cn ). Then ANSJB(rn , cn ) holds. Proof. Set f (y) = |g(y)|1{|g(y)| ≤ cn }. Condition (14.3.2a) implies that 1 |f | ≤ |g| ≤ cst V q , thus E[|f (Y0 )|] ≤ E[|X 0 |] < ∞ (since α > 1). Since rn = o(cn ), we obtain, for large enough n, r  n  P |X i | 1{|X i | ≤ cn } > cn δ i=1

=P ≤P ≤P

r n  i=1 r n  i=1 r n 

 f (Yi ) > cn δ  {f (Yi ) − E[f (Y0 )]} > cn δ − rn E[|X 0 |]  {f (Yi ) − E[f (Y0 )]} > cn δ/2

i=1 −2 var ≤ 4c−2 n δ

r n 

 f (Yi )

.

i=1 q

Applying Theorem D.3.5 and the assumption V ≤ cst |g| yields r  n  1 var f (Yi ) ≤ cst rn E[|X 0 | 1{|X 0 | ≤ cn }V q (Y0 )] i=1 2

≤ cst rn E[V q (Y0 )1{|X 0 | ≤ cn }] 2

≤ cst rn E[|X 0 | 1{|X 0 | ≤ cn }] . Thus,

14.3 Checking conditions

P

rn i=1

395



2 |X i | 1{|X i | ≤ cn } > cn δ E[|X 0 | 1{|X 0 | ≤ cn }] ≤ cst . rn P(|X 0 | > cn ) c2n P(|X 0 | > cn )

By Proposition 1.4.6, we obtain rn  P i=1 |X i | 1{|X i | ≤ cn } > cn δ lim sup ≤ cst 2−α . rn P(|X 0 | > cn ) n→∞ Letting → 0 concludes the proof.



In the following subsections, we provide examples of models which satisfy the assumptions introduced previously.

14.3.4 AR(1) process with regularly varying innovations Consider the AR(1) process defined by the recursion Xj+1 = ρXj + εj+1 , j ≥ 0 , where ρ ∈ R, {εj , j ∈ N} is a sequence of i.i.d. real-valued random variables independent of a random variable X0 . This means that {Xj , j ∈ N} admits the functional representation (14.0.1) with Φ(x, ε) = ρx + ε. It is thus a Markov chain with kernel Π defined by Π(x, A) = P(ρx + ε1 ∈ A) . Existence of an invariant distribution. The process is stationary if and d only if the distribution of X0 is such that X0 = ρX0 + ε1 . If |ρ| < 1, which δ we assume from now on, and E[|ε0 | ] < ∞ for some δ ∈ (0, 1], the solution to this equation is X0 =

∞ 

ρj ε−j .

(14.3.9)

j=0

By concavity of the function x → xδ , E[|X0 |δ ] ≤

∞ 

ρδj E[|ε0 |δ ] < ∞ .

j=0

Applying Theorem D.1.1, it is easily checked that the series in (14.3.9) is almost surely convergent and is the stationary solution if E[log+ (|ε0 |)] < ∞. Regular variation of the stationary distribution. Assume now that ε0 is regularly varying with tail index α > 0 and extremal skewness pε . By Corollary 4.2.1, X0 is also regularly varying with tail index α and extremal skewness pX = pε if ρ > 0 and pX = (1 + |ρ|α )−1 (pε + (1 − pε )|ρ|α ) if ρ < 0 (see Example 4.2.2).

396

14 Markov chains

Tail process. The functional representation of the Markov chain is given by Φ(x, e) = ρx + e and thus Assumption 14.2.6 is satisfied with φ(u, e) = ρ. The convergence is locally uniform and (14.2.6) holds. The forward spectral tail process is thus Θj = ρj Θ0 with P(Θ0 = 1) = 1 − P(Θ0 = −1) = p0 . Drift condition. We now turn to the drift condition (14.3.1). We assume that there exists q ∈ (0, 1] such that E[|ε0 |q ] < ∞ and set V (y) = 1 + |y|q . Then, for any y ∈ R and x0 > 0 we have using (a + b)q ≤ aq + bq (a, b > 0), ΠV (x) = E[V (ρx + ε0 )] ≤ 1 + |ρ|q |x|q + E[|ε0 |q ] ≤ δV (x) + b ,

(14.3.10)

with δ ∈ (ρq , 1) and b = 1 − δ + E[|ε0 |q ]. Small sets. We assume that ε0 admits a density fε which is continuous and positive on R. Then, for every a > 0 and A ⊂ R,  f (z − ρx)dz ≥ ν(A) , Π(x, A) = P(ρx + ε0 ∈ A) = with =

∞

A

−∞

inf |x|≤a f (z − ρx)dz ∈ (0, 1) and  ν(A) = −1 inf f (z − ρx)dz . A |x|≤a

This proves that all intervals [−a, a] are small sets. See the references for weaker conditions which ensure that compact sets are small sets. Under the assumptions just stated which ensure the drift condition and that compact sets are small, we can apply the results of Part II. For instance, we can apply Corollary 7.5.6 to obtain that the extremal index of {Xj , j ∈ Z} is equal to the right-tail candidate extremal index which was calculated in Problem 7.9: θ = ϑ+ = (1 − ρα )1{ρ ≥ 0} + (1 − |ρ|2α )1{ρ < 0} . The extremal index of {|Xj |, j ∈ Z} is 1 − |ρ|α . Because the β-mixing coefficients decay geometrically fast, the rate conditions R(rn , un ) and β(rn , n ) of Theorem 9.4.1 hold for any sequence k such that k  nδ for some δ ∈ (0, 1). If for such an intermediate sequence the bias condition (9.3.3) is satisfied, then the central limit theorem for the tail empirical process with random threshold (Theorem 9.4.1) holds. Under the appropriate bias condition, we can also obtain the asymptotic normality of the Hill estimator since Slog (rn , un ) holds under Assumptions 14.3.1 and 14.3.2. In Chapter 15, we will use the linear representation of AR(1) instead of the Markovian structure and we will prove that the extremal index exists and is

14.3 Checking conditions

397

equal to the candidate and that the Hill estimator is consistent without using the mixing properties. Recall that mixing is proved using additional assumptions on the noise distribution. However, central limit theorems require mixing.

14.3.5 Stochastic recurrence equation with heavy-tailed innovation Let {(Aj , B j ), j ∈ Z} be a sequence of i.i.d. random pairs in Rd×d ×Rd , independent of a d-dimensional random vector X 0 . Define an Rd -valued process {X j , j ∈ N} recursively by X 0 and X j+1 = Aj+1 X j + B j+1 , j ≥ 0 .

(14.3.11)

We say that {X j , j ∈ N} satisfies a stochastic recurrence equation (SRE). It is a Markov chain with transition kernel Π given by Π(x, ·) = P(A0 x + B 0 ∈ ·) . Existence of an invariant distribution. The chain is stationary if X 0 d satisfies the equation in distribution: X 0 = A1 X 0 + B 1 . By iterating the recursion, it is natural to conjecture that the stationary solution should be given by  i−1  ∞   A−k B −i . (14.3.12) X 0 = B0 + i=1

k=0 q

q

Assume that there exists q ∈ (0, 1] such that E[|B 0 | ] < ∞ and E[ A0 ] < 1 for the associated matrix norm. Then, by concavity of the function x → xq and by independence, q    ∞  i−1 ∞    q q A−k B −i ≤ |B 0 | (E[ A0 ])i < ∞ . E i=1

k=0

i=1

We can apply Theorem D.1.1 to prove the convergence of the series and the existence of an invariant distribution under weaker assumptions. – (D.1.1a) holds with g(A, B) = A . – (D.1.1b) and (D.1.1c) read E[log+ A1 ] < ∞ and E[log A1 ] < 0. – (D.1.1d) holds with x0 = 0 if E[log+ |B 1 |] < ∞. By concavity, E[ A ] < 1 implies E[log( A )] ≤ q −1 log E[ A ] < 0. q

q

398

14 Markov chains

Regular variation of the stationary distribution. We consider here the case where B 0 is heavier tailed than A0 . Specifically, we assume B 0 is regularly varying with tail index α > 0 , α

α+

E[ A0 ] < 1 and there exists > 0 such that E[ A0

] α such that E[ A0 ] < 1. Then for all q  ≤ q, ∞ 

q

E[ A0 · · · Aj−1 ] < ∞ .

j=1

This proves that the conditions of Theorem 4.1.2 hold and X 0 is regularly varying. The tail process. The functional representation is given by Φ(x, (A, B)) = Ax + B, thus Assumption 14.2.6 holds with φ(u, A, B) = Au. If A0 has full rank with probability one, then (14.2.6) holds. In any case, lim sup sup x−1 |Φ(xu, A, b)| ≤ A . x→∞ |u |≤

Therefore (14.2.7) holds. The forward spectral tail process is the multiplicative random walk Θ given by Θ0 and Θj = A j · · · A 1 Θ0 , j ≥ 1 .

(14.3.13)

Note that limj→∞ Θj = 0 by the law of large numbers since E[log( A0 ] < 0. For simplicity, we describe the backward tail process when d = 1 and A0 ≥ 0. In that case, the forward tail process never vanishes if P(A0 = 0) = 0, whereas j P(Θ−j = 0) = 1 − E[|Θj |α ] = 1 − (E[Aα 0 ]) .

(14.3.14)

We apply Theorem 14.2.10 to obtain the backward tail process. Set ρ = E[Aα 0 ] ∈ (0, 1) and let {bj , j ≥ 1} be a sequence of i.i.d. Bernoulli random variables with mean ρ. Let {A˜j , j ∈ N} be a sequence of i.i.d. random variables, independent of {bj , j ≥ 1} such that α E[h(A˜0 )] = ρ−1 E[h(A−1 0 )A0 ] .

The backward tail process is a multiplicative random walk up to its vanishing time:

14.3 Checking conditions

Θ−j =

j 

A˜i bi = 1{j < N }

i=1

j 

399

A˜i ,

i=1

where N = inf{j ≥ 1, bj = 0} has a geometric distribution with mean 1/ρ. This implies that it also holds that limj→−∞ Θj = 0. The backward spectral tail process eventually vanishes by (14.2.14b) since limj→∞ E[|Θj |α ] = j limj→∞ (E[Aα 0 ]) = 0, thus N− is finite with probability one. Still when d = 1 and A0 ≥ 0, B0 ≥ 0, the candidate extremal index is given by    j  α ϑ=E 1 − max Ai . (14.3.15) j≥1

i=1

+

q

Drift condition. For q ∈ (0, α) ∩ (0, 1] such that E[ A ] < 1, define q V (x) = 1 + |x| . Then q

ΠV (x) = Ex [V (X 1 )] = 1 + E[|A0 x + B 0 | ] q q q ≤ 1 + E[ A0 ] |x| + E[|B 0 | ] ≤ δV + b , q

q

with δ = E[ A0 ] and b = 1 − δ + E[|B 0 | ]. Small sets. If B 0 has a positive continuous density on Rd , then arguing as for the AR(1) process, we can prove that compact sets are small. If we only assume that the density is positive in a neighborhood of the origin and A0 has full rank with probability 1, then for every compact set C there exists m ≥ 1 such that (14.3.1) holds. This also proves that the chain is aperiodic by Lemma D.2.4. Under the previous conditions, we can apply Corollary 7.5.6 and we obtain that the extremal index exists and is equal to the candidate extremal index given in (14.3.15). Under the appropriate bias conditions, we can prove that Theorem 9.4.1 holds and the asymptotic normality of the Hill estimator. In Chapter 15, we will prove that the extremal index exists and is equal to the candidate extremal index, and the consistency of the Hill estimator, without the conditions which ensure the minorization. We will use the moving average representation (14.3.12) instead of the Markovian structure.

14.3.6 Stochastic recurrence equation with light-tailed innovation We now consider the equation (14.3.11) for d = 1, under the assumptions of Theorem 14.1.1. Specifically, we assume that {(Aj , Bj ), j ∈ Z} are i.i.d.,

400

14 Markov chains

P(A0 ≥ 0) = 1, the law of log(A0 ) conditionally on A0 > 0 is non-arithmetic and α α E[Aα 0 ] = 1 , E[A0 log+ (A0 )] < ∞ , E[|B0 | ] < ∞ .

(14.3.16)

Note that E[Aα 0 ] = 1 implies (unless A0 ≡ 1) that P(A0 < 1) > 0 and P(A0 > 1) > 0. The first condition is necessary for the existence of the stationary solution, while the second one is necessary for heavy tails of that stationary solution when E[|B0 |α ] < ∞ holds. We have seen in Example 14.1.2 that the series in (14.3.12) is then almost surely absolutely summable and lim xα P(X > x) = c+ ,

x→∞

lim xα P(X < −x) = c− .

x→∞

with α E[(A0 X0 + B0 )α + − (A0 X0 )+ ] , αE[Aα 0 log(A0 )] α E[(A0 X0 + B0 )α − − (A0 X0 )− ] . c− = αE[Aα 0 log(A0 )]

c+ =

Under these assumptions, the stationary time series {Xj , j ∈ Z} is regularly varying and its forward tail process is given by Θj = A1 · · · Aj Θ0 . The main difference with the SRE with heavy-tailed innovations, is that under the present assumptions, E[Θjα ] = 1 for all j ≥ 1, thus the backward spectral tail process never vanishes; cf. (14.2.14b). As previously, we have limj→∞ Θj = 0. The extremal index is given by the same formula as in Section 14.3.5:    j  ϑ=E 1 − max . (14.3.17) Aα i j≥1

i=1

+

The best known examples in the present framework are ARCH(1) and GARCH(1,1) processes. Let a0 > 0 and a1 ≥ 0 and assume that {εj , j ∈ Z} is a sequence of i.i.d. random variables. An ARCH(1) process {Xj , j ∈ Z} satisfies the recursion X j = σ j εj , σj2

= a0 +

2 a1 Xj−1

(14.3.18a) .

(14.3.18b)

Then 2 σj2 = a0 + a1 ε2j−1 σj−1 .

(14.3.19)

14.4 Regeneration and extremal index

401

This is the recursion (14.3.11) with Aj = a1 ε2j−1 , Bj = a0 . Under the assumptions of Theorem 14.1.1 the stationary solution to (14.3.19) is regularly varying with index α/2 given by E[(a1 ε20 )α/2 ] = 1 .

(14.3.20)

If E[|ε0 |α+ ] < ∞ for some > 0, then X0 is regularly varying with index α by Breiman’s Lemma 1.4.3. See also Problem 14.5. A GARCH(1,1) process {Xj , j ∈ Z} satisfies the recursion X j = σ j εj , σj2

= a0 +

(14.3.21a)

2 a1 Xj−1

+

2 b1 σj−1

= a0 + (b1 +

2 a1 ε2j−1 )σj−1

,

(14.3.21b)

with a0 > 0 and a1 ≥ 0 as before and b1 > 0. Define Aj = b1 + a1 ε2j−1 , Bj = a0 , then under the assumptions of Theorem 14.1.1, the stationary solution is regularly varying with tail index α/2 > 0 if E[(b1 + a1 ε20 )α ] = 1 . Again, X0 is regularly varying with tail index α if E[|ε0 |α+ ] < ∞ by Breiman’s Lemma 1.4.3. The drift and minorization conditions are checked in the same way as in the case of heavy-tailed innovations. Under the same conditions, we obtain that the extremal index is equal to the candidate extremal index. Here again, under appropriate bias conditions, the central limit theorem for the Hill estimator can be obtained. However, unlike in the case of the heavy-tailed innovations, stochastic recurrence equations with light-tailed noise do not fit into the framework of Chapter 15.

14.4 Regeneration and extremal index In this section we consider a class of processes which are related to Markov chains. A sequence {Xj , j ∈ N} is a regenerative process if there exists an increasing integer-valued random sequence {Tn , n ∈ N} such that 0 < T0 < T1 < · · · and the (random length) vectors (XTi−1 , . . . , XTi −1 ), i ≥ 1 are i.i.d. and independent of the vector (X0 , . . . , XT0−1 ) which may have a different distribution. It is important to note that the random times Tn need not be measurable with respect to the natural filtration of the process {Xj , j ∈ N}.

402

14 Markov chains

Examples of such processes are functions of Markov chains which admits an accessible positive atom; see Theorem D.2.1. More generally, one may consider 2-dependent regenerative processes, that is, process for which the regeneration cycles are 2-dependent. The theory is more involved in that case and does not lead to tractable extremal properties. Let μ = E[T1 − T0 ], assumed to be finite and by definition μ ≥ 1. If moreover the distribution of T1 − T0 is not restricted to {c, 2c, . . . , } for any integer c > 1, in which case the regenerative process {Xj } is said to be aperiodic, then îT −1 ó 1 E j=T0 f (Xj , Xj+1 , . . . ) E[f (X0 , X1 , . . . )] = . (14.4.1) E[T1 − T0 ] Let {ζj , j ∈ N} be the successive componentwise maxima over each independent cycles, i.e. ζ0 = max Xi , ζj = 0≤i n and Nn = k ⇐⇒ Tk ≤ n < Tk+1 . Then Nn → 1/μ almost surely. Fix δ ∈ (0, 1/μ). Then, for every x > 0,

14.4 Regeneration and extremal index

Å P

ã Å max Xj ≤ x ≤ P

1≤j≤n

max

T0 ≤j δ)

= G[n(1/μ−δ)] (x) + o(1) , where the o(1) term does not depend on x. On the other hand, we have, using (14.4.2), Å ã P max Xj ≤ x 1≤j≤n Å ã ≥P max Xj ≤ x, Nn ≤ n(1/μ + δ) 0≤j≤TNn +1

≥P max ζj ≤ x − P(|Nn /n − 1/μ| > δ) 0≤j≤n(1/μ+δ+1/n)



≥P max ζj ≤ x − P ζ0 > max ζj + o(1) 1≤j≤n(1/μ+δ+1/n)

[n(1/μ+δ+1/n)]

=G

1≤j≤n(1/μ+δ+1/n)

(x) + o(1),

where again the o(1) does not depend on x. If the sequence {cn } satisfies (14.4.3), then Å ã lim sup P max Xj ≤ cn ≤ lim G[n(1/μ−δ)] (cn ) = e−τ (1/μ−δ) , n→∞ 1≤j≤n n→∞ Å ã lim inf P max Xj ≤ cn ≥ lim G[n(1/μ+δ+1/n)] (cn ) = e−τ (1/μ+δ) . n→∞

n→∞

1≤j≤n



Since δ is arbitrary, this proves (14.4.4).

An important consequence of this result is that it gives the extremal index of the sequence {Xj , j ∈ Z} if the stationary distribution can be related to the distribution of the maximum over a cycle. Corollary 14.4.2 If the regenerative sequence {Xj , j ∈ Z} is aperiodic and stationary, then (14.4.2) holds and the following statements are equivalent: (i) the extremal index θ of the stationary regularly varying sequence {Xj , j ∈ Z} exists; (ii) there exists a ≥ 0 such that lim

x→∞

P(ζ1 > x) =a. P(X0 > x)

(14.4.5)

404

14 Markov chains

If either condition holds, then θ = a/μ = lim

x→∞

P(maxT0 ≤i x) . T1 −1 E[ i=T 1{Xi > x}] 0

(14.4.6)

.

Proof. Let x∗ be the right endpoint of G. Then max1≤j≤n ζj converges in probability to x∗ . Therefore, by monotone convergence, (14.4.2) is equivalent to P(ζ0 ≥ x∗ ) = P(ζ0 > x∗ ) = 0. If x∗ = ∞, there is nothing to prove. Assume that x∗ < ∞. Applying the identity (14.4.1), we have for all x > 0, ⎡ ⎤ T 1 −1 P(X0 > x) = μ−1 E ⎣ 1{Xj > x}⎦ . j=T0

Therefore, P(X0 > x) > 0 if and only if P(ζ1 > x) > 0 which implies that the distribution of X0 (under the stationarity assumption) and G have the same right endpoint. By stationarity, the distribution of ζ0 has the same right endpoint as that of X0 , i.e. x∗ . This proves (14.4.2). Let now τ > 0 and cn be such that limn→∞ nP(X0 > cn ) = τ . Then limn→∞ nG(cn ) = aτ . Thus, Theorem 14.4.1 yields Å ã lim P max Xj ≤ cn = e−aτ /μ . n→∞

1≤j≤n

This proves that the extremal index of {Xj } is a/μ.



14.4.1 Examples The next two examples illustrate opposite situations. In the first one, (14.4.5) holds and Corollary 14.4.2 is applicable. Thus, the extremal index exists and the distribution of the maximum over one regeneration cycle is the stationary distribution. The second one deals with a subgeometrically ergodic Markov chain whose extremal index is 0. The third example in this section provides a counterexample to the usual interpretation of the extremal index as the inverse mean cluster size. Example 14.4.3 Let {Vj , j ∈ N∗ } be a sequence of i.i.d. uniform random variables, {Zj , j ∈ N∗ } be a sequence of i.i.d. positive random variables, independent of the previous one and p ∈ (0, 1). Let the Markov chain {Xj , j ∈ N} be defined by X0 ≥ 0, independent of {(Vj , Zj ), j ≥ 1} and the recursion Xj+1 = {Xj + Zj+1 }1{Vj+1 >p} , j ≥ 0 .

14.4 Regeneration and extremal index

405

The chain {Xj } is a randomly killed random walk. The kernel Π of the chain is Π(x, A) = p1A (0) + (1 − p)P(x + Z1 ∈ A) . The state {0} is an atom. The successive return times to zero are defined by (0)

(j)

(j−1)

σ0 = 0 , σ0 = inf{i > σ0

: Xi = 0} , j ≥ 1 .

(1)

We write simply σ0 for σ0 . The distribution of σ0 is geometric with mean (j) 1/p, regardless of the starting point of the chain, hence aperiodic, and σ0 − (j−1) , j ≥ 1 are i.i.d. with the same geometric distribution. Moreover, σ0 is σ0 independent of {Zj }. Thus, the tail of the invariant distribution is given by σ −1  0  P(X0 > x) = pE 1{Z1 +···+Zk >x} k=0

=p =p

=p =p

∞  n=1 ∞  k=0 ∞  k=0 ∞ 

p(1 − p)n−1

P(Z1 + · · · + Zk > x)

k=0

∞ 

P(Z1 + · · · + Zk > x)

p(1 − p)n−1

n=k+1

(1 − p)k P(Z1 + · · · + Zk > x) (1 − p)k−1 P(Z1 + · · · + Zk−1 > x)

k=1

=P

n−1 

σ −1 0 

 Zi > x

.

i=1

On the other hand, since the Zj are non-negative we have   Å G(x) = P =P

max

(1)

(2)

σ0 ≤i x 

Zj > x

=P

ã max (Z1 + · · · + Zi ) > x

1≤i x) .

j=1

Assume that Z1 has tail index α > 0. Applying Corollary 4.2.4, we obtain σ −1  0  1−p P(Z1 > x) . Zj > x ∼ E[σ0 − 1]P(Z1 > x) = P(X0 > x) = P p j=1 Therefore the stationary distribution is in the maximum domain of attraction  of a Fr´echet distribution and the extremal index is 1/E[σ0 ] = p.

406

14 Markov chains

This example was easily studied since the regeneration times were actually independent of the random walk with steps {Xj }. However, what really matters is the fact that the return time has a geometric moment. Example 14.4.4 In this example we construct a regenerative Markov chain, that is, not geometrically ergodic and for which the extremal index is zero. Let {Zj , j ∈ N∗ } be a sequence of i.i.d. positive integer-valued random variables with regularly varying right tail with index β > 1. Define the Markov chain {Xj , j ≥ 0} by the following recursion: ® Xj−1 − 1 if Xj−1 > 1 , Xj = j≥1, if Xj−1 = 1 , Zj and X0 is chosen according to an invariant distribution. Here, {1} is an atom and the successive return times to 1 are regeneration times: (0)

(j)

(j−1)

σ1 = 0 , σ1 = inf{i > σ0 (1)

We write simply σ1 for σ1

: Xi = 1} , j ≥ 1 .

and it is readily checked that σ1 = X0 − 1 and

(j)

σ 1 = X0 − 1 +

j−1 

Zσ(i) +1 . 1

i=1 (j)

(j−1)

Hence, the random variables σ1 − σ1 , j ≥ 2, are i.i.d. with the same distribution as Z0 . Applying the identity (14.4.1), we have for x > 0, ⎡ ⎤ X0 −1+Zσ1 +1  P(X0 > x) = μ−1 E ⎣ 1{X >x} ⎦ j

j=X0

= μ−1 E[(Zσ1 +1 − [x])+ ] = μ−1 E[(Z0 − [x])+ ] , where μ = E[Z0 ] < ∞ since β > 1. Thus, the tail P(X0 > x) of the stationary distribution is regularly varying with index α = β − 1 and is given by P(X0 > x) =

xP(Z0 > x) E[(Z0 − [x])+ ] ∼ , x→∞. E[Z0 ] βE[Z0 ]

(14.4.7)

On the other hand, it is obvious that the tail of the distribution of the cycle maxima is G(x) = P(Z0 > x) = o(F (x)). Thus the extremal index is zero.  The following example provides a counterexample to the usual interpretation of the extremal index as the inverse mean cluster size (see Example 6.2.8). Example 14.4.5 Let {τj , j ∈ N} be a sequence of integer-valued random variables. Define Tj = τ0 + · · · τj , j ≥ 0, T−1 = −∞ and the counting process M by

14.4 Regeneration and extremal index

407

M (t) = j ⇐⇒ Tj−1 ≤ t < Tj , j ≥ 0 . Let {ξj , j ∈ N} be a sequence of i.i.d. positive random variables and define the process {Xn , n ∈ N} by Xn = ξM (n) , that is, Xn = ξj ⇐⇒ Tj−1 ≤ n < Tj . We assume moreover that the pairs (ξj , τj ), j ≥ 1 are i.i.d. and that τ0 is independent of all the other random variables. Under the assumption μ = E[τ1 ] < ∞, the process {Xj , j ∈ N} is regenerative and the Tj are the regeneration times. By construction, the process {Xj , j ∈ N} is constant between regeneration times. The stationary distribution is given by (14.4.1): ⎡ ⎤ T 1 −1 −1 ⎣ 1{Xj ≤ x}⎦ = μ−1 E[τ1 1{ξ1 ≤ x}] . P(X0 ≤ x) = μ E j=T0

It is possible to choose the distribution of τ0 in such a way that M is a stationary renewal process, i.e. the process {M (a+t, b+t], t ≥ 0} is stationary for all 0 ≤ a < b. Specifically, we choose P(τ0 = k) = μ−1 P(τ1 ≥ k) , k ≥ 1 , or equivalently P(τ0 > k) = μ−1 E[(τ1 − k)+ ] , k ≥ 0 . Moreover, the sequence {Xj } is constant during a cycle, thus, for j ≥ 0, max

Tj ≤i 0 .

408

14 Markov chains

This proves that the regeneration times are non-arithmetic and the regeneration process {Xj , j ∈ Z} is aperiodic. Moreover, E[τ1 | ξ1 ] = 1 + a = μ, thus P(X0 ≤ x) = μ−1 E[τ1 1{ξ1 ≤ x}] = P(ξ1 ≤ x) . We now assume that the distribution of ξ1 is in the maximum domain of attraction of the Fr´echet distribution with tail index α > 0. By Corollary 14.4.2, we have P(ζ1 > x) P(ξ1 > x) = μ−1 lim = μ−1 . x→∞ P(X0 > x) x→∞ P(ξ1 > x)

θ = μ−1 lim

We now consider the point process of exceedences Nn defined by Nn =

n 

δa−1 , n Xj

j=1

with an a non-decreasing sequence such that limn→∞ nP(X0 > an ) = 1. Then w Nn =⇒ N , where N is a Poisson point process on (0, ∞) with mean measure −1 μ να . Indeed, let f be a non-negative continuous function with support separated from 0. Nn (f ) =

n 

f (Xj /an ) ≤

j=1

T 0 −1

f (Xj /an ) +

M (n)+1 Ti −1  

j=0



i=1

f (Xj /an )

j=Ti−1

M (n)+1

= τ0 f (ξ0 /an ) +

τi f (ξi /an ) .

i=1

Similarly, 

M (n)

Nn (f ) ≥

τi f (ξi /an ) .

i=1

Since M (n)/n → 1/μ, the required convergence will follow from n −f lim E[e− i=1 τi f (ξi /an ) ] = lim (E[e−τ1 f (ξ1 /an ) ])n = e−να (1−e ) . n→∞

n→∞

Equivalently, we must prove lim n{1 − E[e−τ1 f (ξ1 /an ) ]} = να (1 − e−f ) .

n→∞

For m ≥ 1, write pm = P(m − 1 < ξ1 ≤ m). Then

14.4 Regeneration and extremal index

409

n{1 − E[e−τ1 f (ξ1 /an ) ]} ∞   m−a E[e−f (ξ1 /an ) | m − 1 < ξ1 ≤ m] pm n 1 − = m m=1  a − E[e−(m+1)f (ξ1 /an ) | m − 1 < ξ1 ≤ m] m = n{1 − E[e−f (ξ1 /an ) ]} ∞  apm nE[e−f (ξ1 /an ) − e−(m+1)f (ξ1 /an ) | m − 1 < ξ1 ≤ m] . + m m=1 Since the support of f is separated from zero, there exists > 0 such that the expectation in the last term is zero if m + 1 ≤ an . Thus ∞  pm nE[|e−f (ξ1 /an ) − e−(m+1)f (ξ1 /an ) | |m − 1 < ξ1 ≤ m] m m=1

≤ cst n a−1 n P(ξ1 > an ) → 0 . By regular variation of ξ1 , this yields lim n{1 − E[e−τ1 f (ξ1 /an ) ]} = lim n{1 − E[e−f (ξ1 /an ) ]} = να (1 − e−f ) .

n→∞

n→∞

The convergence of the point process of exceedences confirms that the extremal index is μ−1 although there seems to be a paradox: the extremal index is less than 1, but there is no clustering of extremes since the point process of exceedences converges to a simple Poisson point process. We can also compute the forward tail process. The joint distribution of (X0 , τ0 ) is given by the Ryll-Nardzewski-Slivnyak formula: τ −1  1  −1 1{ξ1 ≤ x}1{τ1 − i = k} P(X0 ≤ x, τ0 = k) = μ E i=0

= μ−1 E [1{ξ1 ≤ x}1{τ1 ≥ k}] . Thus

⎡ P(X0 ≤ x, τ0 > k) = μ−1 E ⎣1{ξ1 ≤ x}



⎤ 1{τ1 ≥ j}⎦

j>k

= μ−1 E [1{ξ1 ≤ x}(τ1 − k)+ ] . This yields P(X0 > x, τ0 > k) = μ−1 E [1{ξ1 > x}(τ1 − k)+ ] . If k ≥ 1,

(14.4.8)

410

14 Markov chains

E[(τ1 − k)+ | m − 1 < ξ1 ≤ m] =



qm,j (j − k) =

j>k

a(m + 1 − k)+ . m

Thus, for fixed k ≥ 1, > 0 and x large, E[(τ1 − k)+ | ξ1 > x] =

 (m + 1 − k)+ a P(m − 1 < ξ1 ≤ m) P(ξ1 > x) m

 ≤ ≥

m≥x

(1+)aP(ξ1 >x) P(ξ1 >x) (1−)aP(ξ1 >x) P(ξ1 >x)

, .

This shows that for all k ≥ 1. lim E[(τ1 − k)+ | ξ1 > x] = a .

x→∞

(14.4.9)

By the regenerative property, X0 and Xk can be simultaneously large only if τ0 > k. Thus, for y0 > 1, y1 > 0, . . . , yk > 0, we have, by (14.4.9), P(X0 > xy0 ,X1 > xy1 , . . . , Xk > xyk ) ∼ P(X0 > x(y0 ∨ · · · ∨ yk ), τ0 > k) = μ−1 E [1{ξ1 > x(y0 ∨ · · · ∨ yk )}(τ1 − k)+ ] ∼ aμ−1 P(ξ1 > x(y0 ∨ · · · ∨ yk )) . This proves that for every k ≥ 1, lim P(X1 > xy0 , . . . , Xk > xyk | X0 > x) = aμ−1 (y0 ∨ · · · ∨ yk )−α .

x→∞

(14.4.10) Thus the tail process has the form Yk = b Y0 , k ≥ 1 , with b a Bernoulli random variable with mean a/μ, independent of Y0 . This means that the forward tail process is constant with probability 1/μ and identically zero from lag 1 with probability 1 − 1/μ. Thus

P sup Yj ≤ 1 = P(b = 0) = 1 − aμ−1 = μ−1 = θ . j≥1

Since P(limj→∞ Yj = 0) < 1, Theorem 6.1.4 implies that condition AC(rn , cn ) cannot hold for any sequences rn and cn . This confirms that the interpretation of the extremal index as the inverse mean cluster size does not hold here. 

14.5 Extremally independent Markov chains

411

14.5 Extremally independent Markov chains In this section, we give conditions on a Markov kernel Π for the associated Markov chain to satisfy Assumption 11.0.1. In order to apply the results of Chapter 11, the drift and minorization conditions are needed. We will not address this issue here since there is nothing specific to the present context and was discussed in the preceding sections of this chapter. Assumption 14.5.1 Let Π be a Markov kernel. There exist a function b which is regularly varying at infinity with index κ ∈ [0, 1) and a Markov kernel K on Rd \ {0} × B(Rd ) such that lim Π(xu, b(x)A) = K(u, A) ,

x→∞

(14.5.1)

for all A such that K(u, ∂A) = 0, locally uniformly with respect to every u ∈ Rd \ {0}.

The restriction κ < 1 is meant to ensure extremal independence of the finitedimensional distributions. Since b is regularly varying, the kernel K has the following homogeneity property: for t > 0, u ∈ Rd \ {0}, K(tu, t·) = tκ K(u, ·) . Define b0 (x) = x, b1 = b and for h ≥ 1, bh = bh−1 ◦ b. Theorem 14.5.2 Let {X j , j ∈ N} be a Markov chain on Rd with kernel Π satisfying Assumption 14.5.1 and initial distribution satisfying Assumption 14.2.2. Assume moreover that either K(u, {0}) = 0

(14.5.2)

lim lim sup sup Π(xu, B c (0, b(x))) = 0 .

(14.5.3)

for all u ∈ Rd or →0 x→∞ |u |≤

In both cases, let K be extended to a kernel on Rd × B(Rd ) by setting K(0, A) = 1{A} (0). Then the sequence {X j , j ∈ N} is regularly varying and extremally independent with conditional scaling exponents κh = κh , h ≥ 1 and the finite-dimensional distributions of the sequence {bj (x)−1 X j , j ≥ 0} converge to those of a Markov chain {Y j , j ∈ N} with kernel K and initial distribution να ⊗ Λ0 .

412

14 Markov chains

Proof. The proof is a line by line adaptation of the proof of Theorem 14.2.3. Let λ be the distribution of X 0 and let λx be defined as in (14.2.8) and define the kernel Πx by Πx (u, A) = Π(xu, b(x)A) . Then we must prove that the measure λx ⊗ ⊗hi=0 Πbi (x) converges weakly to ν ⊗ K ⊗h on B c (0, 1) × Rdh . This is true for h = 0 by regular variation of the initial distribution. Assuming this property is true for h ≥ 0, we have, for a bounded continuous function f on B c (0, 1) × Rd(h+1) , λx ⊗ ⊗hi=0 Πbi (x) (f ) = λx ⊗ ⊗h−1 i=0 Πbi (x) (fx ) with

 fx (u0 , . . . , uh ) =

Rd

f (u0 , . . . , uh+1 )Πbh (x) (uh , duh+1 ) .

⊗h conBy the induction assumption, the sequence of measures λx ⊗ ⊗h−1 i=0 Πx ⊗h verge weakly to ν ⊗ K ; therefore there only remain to prove that fx converges to the function f¯ defined by  f (u0 , . . . , uh )K(uh−1 , duh ) , f¯(u0 , . . . , uh−1 ) = Rd

locally uniformly on the appropriate space. If K(u, {0}) = 0 for all u = 0, then the appropriate space is B c (0, 1) × (Rd \ {0})h−1 and the convergence holds by assumption. On the other hand, if (14.5.3) holds, then, considering without loss hof generality a function f which can be expressed as f (u0 , . . . , uh ) = i=0 fi (ui ), it suffices to prove that Πx fh (ux ) converges locally uniformly on Rd to fh (0) for every sequence ux which converges to 0. This follows from (14.5.3).  As in Section 14.2, we can rewrite Theorem 14.5.2 in terms of the functional representation. Assumption 14.5.3 There exist a function b regularly varying at infinity with index κ > 0 and an application φ : Rd \ {0} × F → Rd such that for every e ∈ F, lim b(x)−1 Φ(xu, e) = φ(u, e) .

x→∞

locally uniformly with respect to u ∈ Rd \ {0}.

14.5 Extremally independent Markov chains

413

This assumption implies that the function φ has the following homogeneity property: for all u ∈ Rd \ {0}, e ∈ F and t > 0, φ(tu, e) = tκ φ(u, e) . Theorem 14.5.4 Let {X j , j ∈ N} be a Markov chain on Rd with functional representation (14.0.1) and initial distribution satisfying Assumption 14.2.2. Assume that Assumption 14.5.3 holds and that either P(φ(u, ε0 ) = 0) = 0

(14.5.4)

for all u ∈ Rd or lim lim sup sup P(|Φ(xu, ε0 )| > b(x)) = 0 .

→0 x→∞ |u |≤

(14.5.5)

Then the sequence {X j , j ∈ N} is regularly varying and extremally independent with conditional scaling exponents κh = κh , h ≥ 1 and the finitedimensional distributions of the sequence {bj (x)−1 X j , j ≥ 0} converge to those of a Markov chain {Υj , j ∈ N} with initial distribution να ⊗ Λ0 and given by the recursion Υj+1 = φ(Υj , εj+1 ).

By analogy to Chapter 5, we may call the sequence Υ the (forward) hidden tail process. Example 14.5.5 In the case of a Markov chain on R+ which satisfies the assumptions of Theorem 14.5.4, we can define Wj = φ(1, εj ) and the hidden forward tail process is then the multiplicative AR(1) process given by the  recursion Υj+1 = Υjκ Wj+1 . Example 14.5.6 (Exponential AR(1)) Let the time series {Xj , j ∈ Z} be defined by Xj = eξj with ξj = ρξj−1 + εj ,

(14.5.6)

where 0 ≤ ρ < 1 and {εj , j ∈ Z} is an i.i.d. sequence such thatE[ε0 ] = 0 and ∞ eε0 has a regularly varying right tail with index α. Let ξj = i=0 ρi εj−i be the stationary solution of the AR(1) equation (14.5.6) and define Xj = eξj , j ∈ Z. Then {Xj , j ∈ Z} is also stationary. Since ρ ∈ [0, 1), we have for all j ≥ 1, j

E[eαρ

0

] x) = P(eξ0 > x) = P(eε0 e

∞ i=1

ρi −i



∼ P(eε0 > x)E eα = P(eε0 > x)

∞ 

> x) ∞ i=1

ρi ε−i



î i ó E eαρ ε−i .

i=1

That is, the marginal distribution of the stationary sequence {Xj , j ∈ Z} has a regularly varying right tail and is tail equivalent to eε0 . The exponential AR(1) satisfies the equation Xj+1 = Xjρ eεj+1 .

(14.5.7)

This corresponds to the functional representation Φ(x, ) = xρ e . Thus, choosing b(x) = xρ we obtain φ(u, ) = e for all u ≥ 0, and hence both Assumptions 14.5.3 and (14.5.4) hold. We can thus apply Theorem 14.5.4. The hidden tail process is a non-stationary exponential AR(1) process Υ defined by  Υj+1 = Υjρ eεj+1 and Υ0 is a standard Pareto random variable. Example 14.5.7 (Switching Exponential AR(1)) Let {Uj , j ∈ Z} be an i.i.d. sequence with uniform marginal distribution on [0, 1], and let {Rj , j ∈ Z} be a sequence of i.i.d. positive random variables, independent of the sequence {Uj , j ∈ Z}. Let ρ > 0, f : [0, ∞) → [0, 1] be a measurable function and define a Markov chain {Xj , j ∈ Z} by X0 and Xj+1 = Rj+1 (Xjρ 1{f (Xj )≤Uj+1 } + 1{f (Xj )>Uj+1 } ) . The transition kernel Π of the chain is defined by Π(x, A) = (1 − f (x))P(xρ R ∈ A) + f (x)P(R ∈ A) . If limx→∞ f (x) = η, then Assumption 14.5.1 holds with K defined by K(u, A) = lim Π(xu, xρ A) = (1 − η)P(R0 ∈ u−ρ A) + ηδ0 (A) , x→∞

for all u > 0 and A ∈ B(Rd ) which is a continuity set of u−ρ R0 . If η > 0, then (14.5.2) does not hold. However (14.5.3) holds. So we can apply Theorem  14.5.2. The conditional scaling exponent at lag 1 is κ1 = ρ.

14.6 Problems 14.1 Consider the SRE of Example 14.1.2. Assume that A, B ≥ 0. Show that for any q = 0,  ∞ E[(AX + B)q − (AX)q ] = q E[(AX + u)q−1 ]P(B > u)du , 0

14.6 Problems

415

where both sides are simultaneously finite or infinite. Conclude that if B is standard exponential, then E[(AX + B)α − (AX)α ] = αE[X α−1 ] . d

Hint: use the fact that AX + B = X. Note that E[(AX + B)α ] = E[X α ] = ∞. 14.2 Assume that {(Aj , Bj ), j ≥ 1} are i.i.d.random vectors of non-negative j random variables. Let Vj = i=1 Ai . Assume that P(A = 0) > 0. Show that ∞ the series i=1 Vi−1 Bi is almost surely finite. 14.3 Let {X j , j ≥ 0} be a Markov chain on Rd with kernel Π. Assume that there exists a locally bounded measurable function V : Rd → [1, ∞) such that ∀r > 0 ,

sup |x|≤r

ΠV (x) 0, a1 ≥ 0 and {εj , j ∈ Z} is a sequence of i.i.d. random variables. 1. Prove that log(a1 )+E[log(ε20 )] < 0 is a sufficient condition for the recursion (14.6.2b) to have a stationary solution. 2. Assume that ε0 has a symmetric Pareto distribution with index β > 0, that is P(ε0 > x) = P(ε0 < −x) = (1/2)x−β , x > 1. Assume that log(a1 ) < −2/β. Show that there exists a unique strictly positive solution α < β/2 to E[(a1 ε20 )α ] = 1 and the conditions of Theorem 14.1.1 are satisfied. 3. Assume that ε0 is standard normal. Assume that a1 ∈ (0, 2 exp(γ)), where γ is the Euler constant. Show that there exists a unique strictly positive solution to E[(a1 ε20 )α ] = 1 and the conditions of Theorem 14.1.1 are satisfied. 4. Let Π be the transition kernel of the chain {Xj }. Assume that ε0 has a density, that is, bounded away from zero on [−τ, τ ] for some τ > 0. Prove that compact sets are small for Π.

416

14 Markov chains q/2

5. Assume that there exists q > 0 such that a1 E[|ε0 |q ] < 1. Prove that (14.3.1) holds for the kernel Π with V (x) = 1 + xq . 14.6 Consider the GARCH(1,1) model X j = σ j εj , σj2

= a0 +

(14.6.3a)

2 a1 Xj−1

+

2 b1 σj−1

= a0 + (b1 +

2 a1 ε2j−1 )σj−1

,

(14.6.3b)

where a0 > 0, a1 , b1 ≥ 0 and {εj , j ∈ Z} is a sequence of i.i.d. random variables. 1. Show that a unique stationary solution to the equation (14.6.3b) exists when E[log(b1 + a1 ε20 )] < 0. 2. Assume that there exists q > 0 such that E[(b1 + a1 ε20 )q ] < 1. Prove that (14.3.1) holds for {σj2 } with V (x) = 1 + xq . 14.7 (Markovian bilinear process) Xj = aXj−1 + bεj Xj−1 + εj , where a, b > 0 and {εj , j ∈ Z} is a sequence of i.i.d. random variables. 1. Show that if E[log(|a + bε0 |)] < 0, then the unique stationary solution exists and is given by   i−1 ∞   d X 0 = ε0 + (a + bεk ) εi . i=1

k=0

2. Assume that a = 0, ε0 has a Pareto distribution with index β > 0, that is, P(ε0 > x) = x−β , x > 1. Assume that log(b) < −1/β. Show that there exists a unique strictly positive solution α to E[(bε0 )α ] = 1 with α < β and the conditions of Theorem 14.1.1 are satisfied. 3. We don’t assume any longer that a = 0. Compute the spectral tail process and the candidate extremal index ϑ. 14.8 (Non-Markovian bilinear process) Xj = aXj−1 + bεj−1 Xj−1 + εj , where a, b > 0 and {εj , j ∈ Z} is a sequence of i.i.d. random variables. 1. Show that a unique stationary distribution exists when E[log(|a+bε0 |)] < 0 and is given by ∞  d X 0 = ε0 + (a + bε1 ) · · · (a + εj+1 )εj+1 . j=1

Hint: Define Uj = (a + bεj )Xj and show that {Uj } fulfills a stochastic recurrence equation.

14.6 Problems

417

2. Conclude that when E[ε0 ] = 0 and E[ε20 ] = 1, then the unique stationary solution exists if a2 + b2 < 1. 14.9 Let {Zj , j ∈ Z} be a sequence of i.i.d. positive regularly varying random variables and let {bj , j ∈ Z} be a sequence of i.i.d. Bernoulli random variables with mean 1 − p ∈ (0, 1), independent of the previous sequence. Let X0 be a random variable independent of all previous ones. For j ≥ 0, define recursively Xj+1 = Xj bj+1 + Zj+1 . 1. Prove that the stationary solution of the equation is given by ∞  b1 · · · bj Zn−j , n ≥ 1 , Xn = Zn + j=1

and that the stationary distribution is that of Z1 + · · · + ZN , where N is an integer valued random variable with a geometric distribution with mean p−1 . 2. Prove that the stationary distribution is regularly varying. 3. Prove that the series {Xj } is regularly varying and compute its spectral tail process and candidate extremal index. 4. Prove that {Xj } is a regenerative process and compute its extremal index. 14.10 Consider the model of Example 14.4.3. Prove that it is geometrically ergodic. Hint: apply Theorem D.3.2 with C = {0}. 14.11 Let {εj , j ∈ Z} be a sequence of i.i.d. regularly varying random variables and {bj , j ∈ Z} be a sequence of i.i.d. Bernoulli random variables with mean p, independent of {Zj , j ∈ Z}. Define Xj+1 = (1− bj+1 )Xj + bj+1 Zj+1 , j ≥ 0. 1. Show that the stationary solution is given by ∞  Xn = bn Zn + bn−i (1 − bn−i+1 ) · · · (1 − bn )Zn−i , n ≥ 1 , i=1

and that the stationary distribution is the distribution of Z0 . 2. Compute the forward spectral tail process and the candidate extremal index ϑ. 3. Show that the process is regenerative and compute the extremal index. 14.12 Let {Vj , j ∈ N} be a sequence of i.i.d. uniform random variables, {εj , j ∈ N} be a sequence of i.i.d. non-negative random variables, independent of the previous one and r : R+ → [0, 1] be a non-decreasing function. Let the Markov chain {Xj , j ∈ N} be defined by X0 , independent of {Vj } and {Zj }, and the recursion

418

14 Markov chains

Xj+1 = {Xj + Zj+1 }1{r(Xj ) ≤ Vj+1 } , j ≥ 0 . Recall that the return times to zero are defined by (1)

(j)

(j−1)

σ0 = σ0 = inf{j ≥ 1 : Xj = 0} , σ0 = inf{i > σ0

: Xi = 0} , j ≥ 1 .

Write μ = E0 [σ0 ], s = 1 − r, W0 = 0 and Wn = Z1 + · · · + Zn , n ≥ 1. 1. Prove that μ=1+

∞  i=1

E

 i−1 

 s(Wj )

j=0

We assume hereafter that μ < ∞. 2. Prove that {0} is an accessible atom and the successive return times to 0 are regeneration times for the chain. Hint: note that Px (σ0 > n) ≤ P0 (σ0 > n) for all x > 0. 3. Prove that the stationary distribution π is given by     σ k−1 ∞ 0 −1   −1 −1 −1 π = μ E0 δ0 + δW k = μ δ0 + μ E δW k s(Wi ) . (14.6.4) k=1

k=1

i=0

4. Find the functional autoregressive representation of {Xj } and prove that Assumption 14.5.3 holds. Hint: set εj = (Vj , Zj ). 5. Assume that the distribution of Z0 is regularly varying with tail index α > 0. Let p = limx→∞ r(x) ∈ (0, 1]. Prove that the stationary distribution is also regularly varying and that P(X0 > x) ∼ p−1 (1 − μ−1 )F Z (x) . 6. Compute the tail process. 7. Show that ϑ = p. 8. Prove that the chain is geometrically ergodic. Hint: check that there exists β ∈ (0, 1 − p) and a constant such that Px (σ0 > n) ≤ P0 (σ0 > n) ≤ cst β n for all x ≥ 0 and apply Theorem D.3.2 with C = {0}. 9. Apply Corollary 14.4.2 to compute the extremal index. Cf. Problem 14.10 and Example 14.4.3 which are the special case. 14.13 Consider the R-valued Markov chain defined by X0 and the recursion Xj+1 = f (Xj ) + Zj+1 , j ≥ 0 ,

14.6 Problems

419

where {Zj , j ≥ 0} is a sequence of i.i.d. random variables with finite mean and f : R → R is a locally bounded continuous function and there exists ρ ∈ (0, 1) such that lim x−1 f (x) = ρ .

|x|→∞

1. Prove that Assumption 14.2.6 and (14.2.6) hold. 2. If the initial distribution is regularly varying with tail index α, compute the forward tail process and the candidate extremal index ϑ. 3. Prove that (14.3.1) holds with V (x) = 1 + |x|. 4. If f is continuous and Z0 has a positive continuous density, prove that compact sets are small sets. Hint: let g be the density of Z0 . Prove that ∞ inf x∈K g(y − f (x))dy > 0 for all compact sets K. −∞ 5. Prove that there exists an invariant distribution, that the chain is geometrically ergodic, and its extremal index is equal to the candidate extremal index. 14.14 Consider the random walk with a reflecting barrier Xj+1 = (Xj + Zj+1 )+ , where {Zj , j ∈ N} is a sequence of i.i.d. random variables, regularly varying with index α > 1 and a negative expectation μ ∈ [−∞, 0). 1. Prove that Assumption 14.2.6 holds. 2. Prove that {0} is an accessible positive atom. Hint: apply Theorem E.4.9. n 3. Show that for each n, Xn has the same distribution as j=0 Sj , where n S0 = 0, Sn = j=1 Zj , n ≥ 1. ∞ 4. Conclude that the unique stationary distribution X∞ = j=0 Sj exists and is regularly varying with index α − 1 (cf. [EKM97, p. 39]). 5. Show that the forward tail chain is constant and the candidate extremal index, hence the extremal index, is 0. (Cf. Lemma 7.5.4.) 14.15 Let π be a probability distribution on Rd with density h with respect to Lebesgue’s measure on Rd . The random walk Hasting-Metropolis algorithm is used to simulate from π when h is known only up to a normalizing constant. Consider the Markov chain on Rd given by X0 and the recursion Xj+1 = Xj + Zj+1 1{Vj+1 ≤ a(Xj , Xj + Zj+1 )} , where {Vj , j ∈ N} is a sequence of i.i.d. uniform random variables and {εj , j ∈ N} is a sequence of i.i.d. random variables with symmetric distribution admitting the density q with respect to Lebesgue’s measure on Rd , and the function a : Rd × Rd → [0, 1] is defined by

420

14 Markov chains

Å a(u, v) = min

ã h(v) ,1 . h(u)

1. Prove that π is the invariant distribution of this Markov chain. 2. Write its functional representation Xj+1 = Φ(Xj , εj+1 ) with εn = (Vj , Zj ) and prove that Assumption 14.2.6 holds. 3. Show that if X0 (whose distribution is π) is regularly varying, then the forward tail chain is constant and the candidate extremal index, hence the extremal index, is 0. (Cf. Lemma 7.5.4.) 14.16 Let {(Aj , Bj ), j ∈ Z} be sequence of i.i.d. pairs of non-negative random variables and assume that A0 satisfies the assumptions of Theorem 14.1.1 and E[B0α ] < ∞. Let {Xj , j ≥ 0} satisfy the recursion Xj+1 = max(Aj+1 Xj , Bj+1 ) , j ≥ 0 . d

1. Prove that a solution to the equation X0 = max(B0 , A0 X0 ) is given by X0 =

∞ 

A1 · · · Ak−1 Bk .

k=1

2. Prove that C+ = limx→∞ P(X0 > x) exists and expressed it in terms of (A0 , B0 ) independent of X0 . Show that C+ > 0 if and only if P(B0 > 0) > 0. 3. Compute the tail process and the candidate extremal index. 14.17 Let {(Aj , Bj , Lj ), j ∈ Z} be sequence of i.i.d. random vectors with of non-negative components. Assume that A0 satisfies the assumptions of Theorem 14.1.1 and E[B0α ] + E[(A0 L0 )α ] < ∞. Let {Xj , j ≥ 0} satisfy the recursion Xj+1 = Bj+1 + Aj+1 max(Lj+1 , Xj ) , j ≥ 0 . d

1. Prove that a solution to the equation X0 = A0 + max(L0 , A0 X0 ) is given by  ∞  k  ∞    B−k M−k ∨ B−j M−j + L−j M−j X0 = k=0

k=0

j=0

with M0 = 1 and M−k = A0 · · · A−k+1 , k ≥ 1. 2. Prove that C+ = limx→∞ P(X0 > x) exists and express it in terms of (A0 , B0 ) independent of X0 . 3. Compute the tail process and the candidate extremal index.

14.6 Problems

421

Extremally independent chains 14.18 Assume that {Xj , j ∈ Z} is a Markov chain on [0, ∞) which fulfills the assumptions of Theorem 14.5.4. Show that î ó α/κ lim P(X1 > b(x) | X0 > x) = E min{1, W1 } x→∞ ñ ® Å ã ´ô X1 α/κ , = lim E min 1, x→∞ b(X0 ) with W1 as in Example 14.5.5. 14.19 Consider the Exponential AR(1) model of Example 14.5.6. 1. Show that the limiting conditional distribution of Xh given X0 > x is  ∞ h h lim P(Xh ≤ xφ y | X0 > x) = P(eξ0,h ≤ v −φ y) αv −α−1 dv , x→∞

1

where for h ≥ 1, ξ0,h =

h−1 i=0

φi ε−i .

2. Let α > 1. Show that ï ò Xh αE[X0 ] î hó . lim E φh | X0 > x = x→∞ x (α − φh )E X0φ 3. Prove that the tail index of X0 Xh is α/(1 + φh ). 14.20 Let {εj , j ∈ Z} be a sequence of i.i.d. non-negative regularly varying random variables with tail index α > 0. Let β ∈ (0, 1), ρ > 0. Let X0 be a random variable independent of {εj , j ∈ Z}. For j ≥ 0, define recursively Xj+1 = ρXjβ + Zj+1 . 1. Prove that if ρβ < 1, then there exists an invariant distribution given by d

X0 = Z0 + ρ(Z1 + ρ(Z2 + ρ(Z3 + · · · )β )β )β . 2. Prove that (14.3.1) holds with V (x) = 1 + xq , q ∈ (0, α) for all values of ρ. 3. Assume that the density of Z0 is non-increasing on [0, ∞) and bounded away from zero on compact sets. Prove that all compact sets are small. 4. Prove that the stationary distribution satisfies P(X0 > x) ∼ P(Z0 > x). 5. Prove that Assumption 14.5.3 holds but (14.5.4) does not hold. 6. Prove that (14.5.1) holds and determine the kernel K but (14.5.2) does not hold.

422

14 Markov chains

7. Show that (14.5.5) holds. 8. Determine the process {Yj , j ≥ 0} in Theorem 14.5.4. 14.21 (β-ARCH) Let {Zj , j ∈ Z} be a sequence of i.i.d. regularly varying random variables with α > 0, such that P(Z0 > 0) > 0. Let β ∈ (0, 1). Let X0 be a random variable independent of {Zj , j ∈ Z}. For j ≥ 0, define recursively Xj = ηj Zj , 2 ηj+1

=ω+

γ(Xj2 )β

(14.6.5a) =ω+

γ(Zj2 )β ηj2β

.

(14.6.5b)

1. Prove that if there exists q ∈ (0, α/2) ∩ (0, 1] such that β q γ q E[Z02qβ ] < 1, there exists an invariant distribution for the chain {ηj2 } given by d

η02 = ω + γZ02β (ω + γ(Z12 )β (ω + γ(Z32 )β (ω + · · · )β )β )β . 2. Prove that (14.3.1) holds for {ηj2 , j ∈ N} with V (x) = 1 + xq for any q < α/(2β) and without restriction on γ. 3. Assume that the density of Z0 is bounded away from 0 on [−τ, τ ] for some τ > 0. Prove that compact sets are small. 4. Let X0 be distributed according to the invariant distribution. Prove that P(X0 > x) = E[η0α ] . x→∞ P(Z0 > x) lim

5. Let Π be the transition kernel of the chain {Xj , j ∈ Z}. Prove that Assumption 14.5.3 and (14.5.5) hold but (14.5.4) does not hold.

14.7 Bibliographical notes For a general introduction to Markov chains, see [MT09] or [DMPS18]. Extreme value theory for Markov chains has a very long history. See, for instance, the references in [Roo88], [Per94] and [JS14]. The earlier references were mainly concerned with the extremal index. In this chapter, our main concern is the regular variation of Markov chains (as processes) and their tail processes, rather than the regular variation of the invariant distribution. The main reference for existence and tails of solutions of stochastic recurrence equations is [Kes73]. Example 14.1.2 is taken from [Gol91] as well as Problems 14.16 and 14.17. The literature on this topic is extremely wide and still growing. References for regular variation of some specific Markoviantype models, such as AR with random coefficients, AR with GARCH errors, threshold AR, and GARCH, include: [dHRRdV89], [BP92], [BK01], [KP04],

14.7 Bibliographical notes

423

[Cli07]. See also [JOC12], [Als16] for recent references. More reference and multivariate extensions are given in [BDM16]. The results of Section 14.2 are based on [JS14]. See also [RZ13a, RZ13b] for related results. Section 14.3 builds on [KSW18] which relies on the application of the geometric drift condition initiated by [MW13]. The link between geometric ergodicity and a positive extremal index was first made by [RRSS06]. The asymptotic negligibility condition ANSJB(rn , cn ) is proved for all α > 0 in [MW13, Theorem 4.1] for atomic chains by regeneration techniques which can be extended to general geometrically ergodic chains. Section 14.4 is essentially inspired from [Roo88]. A reference for regenerative processes and (14.4.1) is [Asm03, Chapter VI]. Example 14.4.5 is due to [Smi88]. For regenerative Markov chains special estimation techniques were developed in [BC06], [BCT09], [BCT13]. Section 14.5 follows [KS15, Section 3]. See also [RZ14]. The stochastic unit-root model of Problem 14.12 is inspired from [GR06] and [Rob07]. The Ryll-Nardzewski-Slivnyak formula (14.4.8) can be found in [BB03, Eq. (1.2.25)]

15 Moving averages

In this chapter we will apply the results of Chapter 4 to define one-sided and two-sided multivariate moving average processes with random coefficients. In Sections 15.1 and 15.2, we will recall the conditions for existence and the regular variation properties. We will compute the tail process and the candidate extremal index. The conditions of existence allow for a m-dependent tail equivalent approximations. Thus the results of Part II which can be obtained by such approximations hold in this context: they include point process convergence, existence of the extremal index and its equality with the candidate extremal index, consistency of the tail empirical measure and the Hill estimator. In order to obtain stronger results, such as convergence to stable laws and asymptotic normality of estimators, we need stronger assumptions. In Section 15.3, we will give such results only in the case of univariate moving average processes with deterministic coefficients.

15.1 Regular variation of two-sided moving average processes Let {Z j , j ∈ Z} be a sequence of i.i.d. regularly varying d-dimensional random vectors with tail index α > 0, {C j,i , j, i ∈ Z} be a sequence of d × d random matrices and μ ∈ Rd . In this chapter we will investigate the extremal behavior of the two-sided moving average process {X j , j ∈ Z} formally defined by Xj = μ +

∞ 

C j,i Z j−i .

(15.1.1)

i=−∞

Conditions for existence of this series were given in Chapter 4. To handle two-sided processes, we need to add an independence assumption between random coefficients and the noise variables Z j . © Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 15

425

426

15 Moving averages

Assumption 15.1.1 (i) {Z j , j ∈ Z} is a sequence of i.i.d. random vectors and the distribution of Z 0 is multivariate regularly varying with tail index α > 0, exponent measure ν Z and E[Z 0 ] = 0 if α > 1. (ii) The sequence (of sequences) {{C j,· }, j ∈ Z} is stationary. (iii) The sequences {Z j , j ∈ Z} and {C j,i , j, i ∈ Z} are mutually independent. (iv) There exists  ∈ (0, α) such that ∞ 

α−

E[C 0,i 

+ C 0,i 

α+

] 2 .

(15.1.2c)

i=−∞



∞ Z α (v) > 0, where ΘZ 0 is a random vector whose j=−∞ E C 0,j Θ0 distribution is the spectral measure of Z 0 with respect to the chosen norm and independent of {C j,i , j, i ∈ Z}. ∞ The conditions (15.1.2) and (v) ensure that the measure ν X = i=−∞ E[ν Z ◦ C −1 0,i ] is boundedly finite and is not the null measure. Under Assumption 15.1.1, we can apply Corollary 4.1.5 and obtain the main result of this section. Theorem 15.1.2 If Assumption 15.1.1 holds, then the series in (15.1.1) is almost surely convergent and the process {X j , j ∈ Z} is stationary and regularly varying with tail index α. The distribution of its spectral tail process {Θj , j ∈ Z} is given by 

α  Z −1 Z C 0,j ΘZ E H C Θ (C ) Θ 0,j s,s+j s∈Z 0 0 0 j∈Z α

E[H(Θ)] = , Z i∈Z E C 0,i Θ0 (15.1.3)

15.1 Regular variation of two-sided moving average processes

427

for all non-negative or bounded measurable functions H on (Rd )Z . The extremal index θ of the series {|X j | , j ∈ Z} exists and is given by

 α   Z α C 0,j ΘZ C E − max Θ s≥1 s,s+j 0 0 j∈Z + α

θ= . (15.1.4) Z i∈Z E C 0,i Θ0

m−1 (m) Proof. Define the sequence X j = i=1−m C j,i Z j−i , j ∈ Z and let an the (1 − 1/n)-quantile of the distribution of |X 0 |. The main ingredient of the proof is the following bound, which follows from Proposition 4.1.6. For every  > 0,   (m) (15.1.5) lim lim sup nP X 0 − X 0 > an = 0 . m→∞ n→∞

In view of (15.1.5), we can apply Proposition 5.2.5 and we only need to compute the tail process of the two-sided 2m-dependent approximation X (m) of X. For clarity of notation, we omit the superscript m and we simply assume that C j,i = 0 if |i| ≥ m for all j ∈ Z, so that the infinite sums considered below are in fact finite sums. Then, P(|X 0 | > x) ∼ P(|Z 0 | > x)

∞ 

α  . E C 0,i ΘZ 0

(15.1.6)

i=−∞

Let H be a non-negative continuous function on (Rd )Z which depends only on coordinates xj , s ≤ j ≤ t for s ≤ 0 ≤ t ∈ Z. The vector X s,t is obtained as a random linear transform of the vector (Z s−(m−1) , . . . , Z t+m−1 ). Therefore, by independence of the sequences {Z j } and {C i,j }, applying (15.1.6), Corollary 2.1.14 Propsition 2.2.3, we obtain E[H(x−1 X)1{|X 0 | > x}] x→∞ P(|Z 0 | > x) ∞  E[H(x−1 (C s,s+i , . . . , C t,t+i )Z 0 )1{|C 0,i Z 0 | > x}] = lim x→∞ P(|Z 0 | > x) i=−∞  ∞ ∞

  −α−1  Z αr = E H(r(C s,s+i , . . . , C t,t+i )ΘZ dr . 0 )1 r C 0,i Θ0 > 1 lim

i=−∞

0

Note that there is only a finite number of non-zero terms in the series. Let Y be a Pareto random variable with tail index α, independent of all other random elements. By Fubini’s theorem and the change of variable u = r C 0,i ΘZ 0 , we have for each i ≥ 0,

428



15 Moving averages

 −α−1  Z E[H(r(C s,s+i , . . . , C t,t+i )ΘZ dr 0 )1 r C 0,i Θ0 > 1 ]αr 0    ∞   Z α Z −1 Z −α−1 = E C 0,i Θ0 H u C 0,i Θ−i (C s,s+i , . . . , C t,t+i )Θ0 ]αu du 1

  α −1 H Y C 0,i ΘZ (C s,s+i , . . . , C t,t+i )ΘZ = E C 0,i ΘZ . 0 0 0 ∞

Summing over i and applying (15.1.6) yields E[H(x−1 X)1{|X 0 | > x}] x→∞ P(|X 0 | > x) 

 −1 ∞ Z α (C s,s+i , . . . , C t,t+i )ΘZ H Y C 0,i ΘZ 0 0 i=−∞ E C 0,i Θ0

= . ∞ Z α i=0 E C 0,i Θ0 lim

Since we have proved that X 0 is regularly varying and the limit above exists, we obtain by Theorem 5.2.2 that the (2m-dependent) series X is regularly varying and its spectral tail process is given by (15.1.3). Letting m tend to infinity proves the general case by Proposition 5.2.5. We next compute the extremal index of the sequence {|X j | , j ∈ Z}. Here (m) (m) (m) we keep the superscript m for clarity. Write Yj = Y j and Θj = (m) (m) (m) (m) Yj /Y0 = Θj , j ∈ Z, and note that by m-dependence, Yj = 0 if |j| > m. By Corollary 7.5.7, we have     (m) (m) θ(m) = ϑ(m) = P max Yj ≤ 1 = P max Yj ≤1 j≥1 1≤j≤m      (m) (m) = 1 − P max Yj >1 =1−E max (Θj )α ∧ 1 . 1≤j≤m

1≤j≤m

Write m =

m 

α  . E C 0,i ΘZ 0

i=−m (m) C k,

Write also = C k, 1{|| ≤ m}. Applying (15.1.3) and using the homogeneity of the supremum functional yields  ∞    (m) Z α (m) Z α (m) −1 = 1 − m E max C j,j+i Θ0 θ ∧ C 0,i Θ0 i=−∞ −1 = m

∞ 

 α    (m) Z α (m) (m) Z α E C 0,i ΘZ − max Θ Θ ∧ C j,j+i 0 C 0,i 0 0

i=−∞

=

−1 m

∞  i=−∞

j≥1

 E

j≥1

α  (m) (m) Z α C 0,i Θ0 − max C j,j+i ΘZ 0 j≥1

 . +

Letting m tend to infinity and applying Corollary 7.5.7 proves (15.1.4).



15.2 Regular variation of one-sided moving average processes

429

Remark 15.1.3 An equivalent expression for the extremal index is α

∞ Z α − maxj≥1 C j,j+i ΘZ 0 i=−∞ E maxj≥0 C j,j+i Θ0 α

θ= . (15.1.7) ∞ E C 0,i ΘZ 0

i=−∞

Since the extremal index is the candidate extremal index, all the alternative expressions given in Chapters 5 and 6 are valid. ⊕ Example 15.1.4 Assume that C j,i = C i for all j, i ∈ Z, that is, X is a linear process with random coefficients. Then (15.1.7) becomes α

∞ Z α − maxj≥i+1 C j ΘZ 0 i=−∞ E maxj≥i C j Θ0 α

θ= ∞ Z i=−∞ E C i Θ0

α E maxj∈Z C j ΘZ 0 .

= ∞ Z α i=−∞ E C i Θ0 

15.2 Regular variation of one-sided moving average processes Let {Z j , j ∈ Z} be a sequence of i.i.d. regularly varying d-dimensional random vectors with tail index α > 0, {C j,i , j ∈ Z, i ∈ N} be a sequence of d × d random matrices and μ ∈ Rd . In this chapter we will investigate the extremal behavior of the causal moving average process {X j , j ∈ Z} formally defined by Xj = μ +

∞ 

C j,i Z j−i .

(15.2.1)

i=0

We modify assumption Assumption 15.1.1 to allow for some form of dependence between the noise and the weights. Assumption 15.2.1 (i) {Z j , j ∈ Z} is a sequence of i.i.d. random vectors and the distribution of Z 0 is multivariate regularly varying with tail index α > 0, exponent measure ν Z and E[Z 0 ] = 0 if α > 1. (ii) The sequence (of sequences) {{C j,· }, j ∈ Z} is stationary and C j,0 = Id for all j ∈ Z. (iii) There exists a filtration {Fj , j ∈ Z} such that

430

15 Moving averages

a) for all j ∈ Z, Z j is Fj measurable; b) for all j ∈ Z, C j,i is Fj -measurable for all i ≥ 0; c) for all j ∈ Z, C j,i is independent of Fj−i for all i ≥ 0. (iv) There exists  ∈ (0, α) such that ∞ 

E[C 0,i 

α−

+ C 0,i 

α+

] x) ∼ P(|Z 0 | > x)



 α  . E C 0,i ΘZ 0

(15.2.5)

i=0

Fix s ≤ 0 ≤ t. In compact form, we can write X s,t =

m−s−1 

B i Z −i ,

i=−t

with B i = (C s,s+i , . . . , C t,t+i ) (understood to be a [(t − s + 1)d] × d matrix) and the convention C k, = 0 if  ≥ m or  < 0 so that the series has a finite number of terms. By Assumption 15.2.1 (iii), for i < j, (B i , B j ) is independent of Fi ; hence the components of the vector W = (B m−s−1 Z s−m+1 , . . . , B −t Z t ) ∈ R(t−s+1)d(t−s+m) are regularly varying with tail index α and pairwise extremally independent in the sense of Definition 2.1.7 (see the argument in the proof of Theorem 4.1.2). By Proposition 2.1.8, W is regularly varying and its exponent measure ν is given by (see also Example 2.2.4 for the expression of the spectral measure), E[h(x−1 W )] = ν(h) x→∞ P(|Z 0 | > x) lim

=

m−s−1   ∞ i=−t

0

−α−1 E[h(0, . . . , yB i ΘZ dy 0 , . . . , 0)]αy

432

15 Moving averages

for any non-negative continuous function h on R(t−s+1)d(t−s+m) . (j)

Let x ∈ R(t−s+1)d(t−s+m) be written as a doubly indexed sequence xk , s ≤ j ≤ t, −t ≤ k ≤ m − s − 1. Let H be a function defined on R(t−s+1)d and let H be the function defined on R(t−s+1)d(t−s+m) by (s)

(s)

(t)

(t)

H(x) = H(x−s + · · · + xm−s−1 , . . . , x−t + · · · + xm−t−1 )   (0) (0) × 1 x0 + · · · + xm−1 > 1 . Identifying xj with B j Z −j we have H(x−1 W ) = H(x−1 X s,t )1{|X 0 | > x}. This yields E[H(x−1 X)1{|X 0 | > x}] = ν(H) x→∞ P(|Z 0 | > x) m−s+1   ∞  −α−1  Z = E[H(rB i ΘZ dr 0 )1 r C 0,i Θ0 > 1 ]αr lim

0

i=−t

=

m−1  ∞ i=0

 −α−1  Z E[H(r(C s,s+i , . . . , C t,t+i )ΘZ dr . 0 )1 r C 0,i Θ0 > 1 ]αr

0

The rest of the Theorem 15.1.2.

proof

remains

unchanged

from

the

proof

of 

Example 15.2.3 As in Section 14.3.5, let {(Aj , Z j ), j ∈ Z} be a sequence of i.i.d. random pairs, where Aj is a d × d matrix and Z j is a d-dimensional random vector. We do not rule out contemporaneous dependence between Ak and Z k for each k. Define C j,0 = Id (the identity matrix) and for i ≥ 1, C j,i = Aj · · · Aj−i+1 . α

α+

Assume that E[A0  ] < 1 and there exists  > 0 such that E[A0  ] < ∞. Defining Fj = σ(Ai , Z i , i ≤ j), Assumption 15.2.1 holds and therefore the process {X j , j ∈ Z} defined by i−1  ∞   Xj = Zj + Aj−l Z j−i , j ∈ Z , (15.2.6) i=1

l=0

is well defined, stationary and regularly varying with tail index α > 0. Note that the matrix product is non-commutative so the order of the product is important, even though the distribution is invariant since the matrices Aj are i.i.d. The process given in (15.2.6) is the stationary solution of the stochastic recurrence equation X j = Aj X j−1 + Z j .

15.2 Regular variation of one-sided moving average processes

433

This process is a Markov chain and was already considered in Chapter 14. k−1 Applying (15.2.3) and the convention i=k = Id for all k ∈ Z, the distribution of the forward spectral tail process is given by E[H(Θs , s ≥ 0)] =   α   −1  ∞ j−1 j−1 j−1 Z Z Z E A Θ H A Θ A Θ , s ≥ 0 −i 0 −i 0 −i 0 j=0 i=0 i=0 i=−s α 

 . ∞ j−1 Z j=0 E i=0 A−i Θ0 In particular, E[H(Θ0 )] ∞ =

  α   −1  j−1 j−1 j−1 Z Z Z j=0 E i=0 A−i Θ0 H i=0 A−i Θ0 i=0 A−i Θ0 

. α ∞ j−1 Z E A Θ −i 0 j=0 i=0 (15.2.7)

This means that the forward tail process is a multiplicative AR(1) process given by Θj = A j · · · A 1 Θ0 , j ≥ 1 , with Θ0 independent of the sequence {Aj , j ≥ 1}. This agrees with our findings in Chapter 14. This yields the bound   j  α α α j E[|Θj | ] ≤ E Ai  = (E[A0  ]) . (15.2.8) i=1

Therefore, ∞ 

α

E[|Θj | ] ≤

j=1

∞ 

α

(E[A0  ])j < ∞ .

j=1

The extremal index of {|X j |, j ∈ Z} can be easily computed:     θ = ϑ = P max |Y j | ≤ 1 = 1 − P max |Y j | > 1 j≥1

j≥1

 j α    = 1 − E 1 ∧ max |Θj | = 1 − E 1 ∧ max A i Θ0 j≥1 j≥1 i=1  j α      =E 1 − max A i Θ0 j≥1 i=1 +  j α  j α       = E 1 ∨ max Ai Θ0 − max Ai Θ 0 . j≥1 j≥1 





α

i=1

i=1

434

15 Moving averages

j−1 To describe the backward tail process, define for brevity Tj = i=0 A−i ΘZ 0, j ≥ 0 and α 

 j−1 α E i=0 A−i ΘZ 0 E[|T | ] α  = ∞ j

 pj = α , j ≥0. k−1 ∞ Z k=0 E[|Tk | ] k=0 E i=0 A−i Θ0 Let N be an integer-valued random variable, defined on the same probability space as Θ, with distribution {pj , j ∈ N}, i.e. P(N = j) = pj , j ≥ 0 and such that   α −1 j−1 A , s ∈ Z ΘZ E[|Tj | H(|Tj | −i 0 )] i=−s ∞ , E[H(Θ)1{N = j}] = α k=0 E[|Tk | ] where for matrices Mi , i ∈ Z, we use the convention ⎧ ⎪  ⎨Mk · · · M if k ≤  ,  Mi = Id if k =  + 1 , ⎪ ⎩ i=k 0 otherwise. Applying (15.2.3), we have Θj = 0 if j < −N and for s < 0, E[H(Θs , . . . , Θ0 ) | N = −s] 

 α −1 E |T−s | H |T−s | (Id, As+1 , As+2 As+1 , . . . , A0 . . . As+1 )ΘZ 0 ∞ = . α E [|T | ] k k=0 We use the convention |x| H(x−1 y) = 0 if x = 0.



Condition S(rn ,un ) We have seen that under either Assumption 15.1.1 or Assumption 15.2.1, the tail process tends almost surely to zero. Thus we expect that given a scaling sequence cn , condition AC(rn , cn ) to hold for some sequence rn such that rn P(|X 0 | > cn ) → 0. Unfortunately, no such result is known under these conditions. We prove here that condition S(rn , un ) (which implies AC(rn , cn ) with cn = un ) holds under an additional assumption on the random weights {C i,j }. Lemma 15.2.4 Suppose that Assumption 15.2.1 holds with  such that α −  > 2 if α > 2. If moreover

15.2 Regular variation of one-sided moving average processes

435

⎤ ⎡⎛ α− ⎞ (α−)∧2 ∞ ∞   ⎥ ⎢ (α−)∧2 ⎠ E ⎣⎝ C j,k  ⎦ j=1

k=j

+

∞ 

⎤ ⎡⎛ α+ ⎞ (α−)∧2 ∞  ⎥ ⎢ (α−)∧2 ⎠ E ⎣⎝ C j,k  ⎦ un ) → 0. (j)

Proof. Write X j

=

j−1 i=0

¯ (j) = X j − X (j) , j ≥ 1. C j,i Z j−i and X j j

(j)

(j)

Then X j is independent of X 0 . Indeed, X j is a measurable function of {C j,0 , . . . , C j,j−1 , Z j , . . . , Z 1 } and by the assumption is independent of F0 . On the other hand, X 0 is F0 -measurable. If X and V are independent real random variables, then for any random variable U , δ ∈ (0, 1) and x, y > 0, P(|X| > x, |U + V | > y) ≤ P(|X| > x)P(|V | > δx) + P(|X| > x, |U | > y(1 − δ)) . ¯ (j) and V = X (j) . Then for δ ∈ (0, 1) We apply this bound to X = X 0 , U = X j j and an integer  < rn , n  1 P(|X 0 | > un , |X j | > un ) P(|Z 0 | > un )

r

j=

≤ rn P(|Z 0 | > un )

P (|X 0 | > un ) sup P (|Z 0 | > un ) j=,...,rn

  (j) P X j > δun

P(|Z 0 | > un ) (15.2.10a)   rn ¯ (j) j= P |X 0 | > un (1 − δ), X j > un (1 − δ) + . P (|Z 0 | > un ) (15.2.10b)

By Proposition 4.1.6,

sup j≥0

  (j) P X j > δun P(|Z 0 | > un )

un ) → 0 as n → ∞, the term in (15.2.10a) vanishes as n → ∞. To bound the term in (15.2.10b), we apply again Proposition 4.1.6. This yields

436

15 Moving averages

  ¯ (j) P |X | > u (1 − δ), X 0 n j > un (1 − δ) j=

r n



P (|Z 0 | > un )   ∞ ¯ (j) > u P (1 − δ) X n j j= P (|Z 0 | > un )

≤ cst

∞ 

cj ,

j=

with

⎤ ⎤ ⎡⎛ ⎡⎛ α− α+ ⎞ (α−)∧2 ⎞ (α−)∧2 ∞ ∞ ⎥ ⎥ ⎢  ⎢  (α−)∧2 ⎠ (α−)∧2 ⎠ cj = E ⎣⎝ C j,k  C j,k  ⎦ + E ⎣⎝ ⎦ k=j

k=j

for large enough j. Since the series proves the lemma.



j=0 cj

is summable by assumption, this 

15.3 Moving averages with deterministic coefficients In this section we focus on sequences of the form  ψi Zj−i , j ∈ Z , Xj =

(15.3.1)

i∈Z

where {Zj , j ∈ Z} is a sequence of i.i.d. regularly varying random variables with tail index α and extremal skewness pZ . If α ≥ 1, we assume that E[Z0 ] = 0. Such sequences are referred to as linear processes or MA(∞) processes. If ψi = 0 for i ≤ −1, then the linear process is called one-sided or causal. For the series (15.3.1) to be well defined, we impose the summability conditions of Corollary 4.2.1, i.e. there exists δ ∈ (0, α) ∩ (0, 2] such that  |ψi |δ < ∞ . (15.3.2) i∈Z

This condition implies that 

|ψi |α < ∞ .

(15.3.3)

i∈Z

Recall that (15.3.3) suffices to define the max-moving average process introduced in Section 13.5.2. If α ∈ (0, 1], then the above condition implies that  |ψi | < ∞ . (15.3.4) i∈Z

15.3 Moving averages with deterministic coefficients

437

If α > 1, it may happen that there exists δ ∈ (1, 2] such that (15.3.2) holds but −c i∈Z |ψi | = ∞. For instance, if there exists c > 0 such that ψj = (1 + |j|) , then (15.3.3) holds if cα > 1, but (15.3.4) holds only if c > 1. MA(∞) processes with non-summable weights will be referred to as long memory or strongly dependent moving averages. Recall that we write ⎞1/α ⎛  ψα = ⎝ |ψi |α ⎠ . j∈Z

Under (15.3.2), we also have by Corollary 4.2.1, P(|X0 | > x) ∼ ψα α P(|Z0 | > x) .

(15.3.5)

Thus, if aZ,n and aX,n are the quantile of order 1 − 1/n of |Z0 | and |X0 |, respectively, we have aX,n ∼ ψα aZ,n . m-dependent tail equivalent approximations By Corollary 4.2.1, we have lim sup

  P |j|>m ψj Zj > x P(|Z0 | > x)

m→∞ x≥1

=0.

(15.3.6)

Thus, (15.1.5) holds under (15.3.2), with the approximating sequence X (m) being the linear process corresponding to the truncated sequence {ψj , |j| ≤ m}, i.e. (m) Xj

=

m 

ψi Zj−i , j ∈ Z .

i=−m

The tail and spectral tail processes Using the tail dependent approximation and Proposition 5.2.5 or applying Theorem 15.1.2, we can compute the tail process. Set −1   α pj = |ψi | |ψj |α . (15.3.7) i∈Z

Then (15.1.3) becomes

438

15 Moving averages

E[H(Θ)] =



pj E[H(|ψj |−1 (ψs+j )s∈Z Θ0Z )] .

(15.3.8)

j∈Z

There is no issue with division by zero since ψj = 0 implies pj = 0. This means that the spectral tail and tail processes can be represented as ψj+N Z Θ , Yj = Θj Y j ∈ Z , (15.3.9) Θj = |ψN | 0 with • P(Θ0Z = 1) = 1 − P(Θ0Z = −1) = pZ , • Y is a Pareto random variable with tail index α, independent of Θ0Z , • N an integer-valued random variable, independent of Θ0Z and Y , such that α P(N = j) = ψ−α α |ψj | , j ∈ Z .

Note that (15.3.9) yields Θ0 = sign(ψN )Θ0Z , so that it also holds that Θj =

ψj+N Θ0 , ψN

but Θ0 and N are not independent. The extremal skewness of X0 is given by pX = P(Θ0 = 1) = P(sign(ψN )Θ0Z = 1) ∞  α = ψ−α {pZ (ψk )α α + + (1 − pZ )(ψk )− } .

(15.3.10)

k∈Z

This expression coincides with (4.2.3) from Example 4.2.2. If the process is causal, i.e. ψi = 0 for i < 0, then the tail and spectral tail processes vanish for j < −N . The conditional spectral tail process {Qj , j ∈ Z} is shift-equivalent to the sequence {(ψ ∗ )−1 ψj Θ0Z , j ∈ Z} with ψ ∗ = supj∈Z |ψj |. Thus, the candidate extremal index and right-tail candidate are given by (see (5.4.10) and (7.5.4c)), ϑ=

(ψ ∗ )α 1 = , α α j∈Z E[|Qj | ] j∈Z |ψj |

(15.3.11a)

α pZ supj∈Z (ψj )α E[supj∈Z (Qj )α +] + + (1 − pZ ) supj∈Z (ψj )− = . ϑ+ = ∞ α α α j∈Z E[(Qj )+ ] i∈Z {pZ (ψi )+ + (1 − pZ )(ψi )− } (15.3.11b)

Point process convergence Let Nn and Nn be the functional point processes of exceedences and clusters, respectively, as defined in (7.2.1) and (7.3.1), i.e.

15.3 Moving averages with deterministic coefficients

Nn =

∞ 

δn−1 i,a−1

X,n Xi

i=1

, Nn =

∞ 

439

δm−1 , n i,X n,i

i=1

with X n,i = a−1 X,n (X(i−1)rn +1 , . . . , Xirn ) and mn = [n/rn ]. Since the condition for m-dependent approximation is satisfied, we can apply Corollary 7.4.4 to obtain convergence of these point processes: Nn =⇒ N = w

Nn =⇒ N = w

∞ 

δTi ,Pi Q (i) i=1 ∞  

,

δTi ,Pi Q(i) , j

i=1 j∈Z

where {(Ti , Pi ), i ≥ 1} are the points of a homogeneous PPP on [0, ∞)×(0, ∞) with mean measure ϑLeb ⊗ να , and Q(i) , i ≥ 1, are independent copies of the sequence Q defined above, independent of {(Ti , Pi ), i ≥ 1}. These convergences hold as long as the condition (15.3.2) is satisfied. This includes long memory moving averages. To obtain a more explicit description of N , we can compute its Laplace functional. By (7.3.9)  ∞ ∞   ∗ −1 Z  E 1 − e− j∈Z f (t,u(ψ ) ψj Θ0 ) dt αu−α−1 du log E[e−N (f ) ] = −ϑ 0 0  ∞ ∞  ∗ −α (1 − e− j∈Z f (t,uψj ) ) dt να,[ pZ ](du) = −ϑ(ψ ) 0 −∞  ∞ ∞  −α − ψα (1 − e− j∈Z f (t,uψj ) ) dt να,[ pZ ](du) 0 −∞  ∞ ∞  −1 =− (1 − e− j∈Z f (t,u ψ α ψj ) ) dt να,[ pZ ](du) . −∞

0

This shows that N can also be represented as ∞  

N =

δTi ,Pi ψ −1 ψj Θ(Z,i) , α

(15.3.12)

0

i=1 j∈Z

∞ (Z,i) where {Θ0 , i ≥ 1} are i.i.d. copies of Θ0Z , independent of i=1 δTi ,Pi which is a Poisson point process on [0, ∞) × (0, ∞) with intensity Leb ⊗ να . Furthermore, Problem 7.11 gives Nh,n =

n−h  i=1

w

δa−2

2 X,n (Xi ,Xi Xi+1 ,...,Xi Xi+h )

=⇒ Nh =

∞   i=1 j∈Z

δPi ((Q(i) )2 ,Q(i) Q(i) j

j

(i) (i) j+1 ,...,Qj Qj+h )

,

440

15 Moving averages

where {P*i , i ≥ 1} are the points of a homogenous PPP on (0, ∞) with mean measure ϑνα/2 . Similarly to (15.3.12), we can write ∞   , (15.3.13) Nh = δP ψ −2 2 α (ψj ,ψj ψj+1 ,...,ψj ψj+h ) i i=1 j∈Z

where now {P*i , i ≥ 1} are the points of a homogenous PPP on (0, ∞) with mean measure να/2 . Sums: convergence to stable laws We define the process ΛZ as the limit of the partial sum process of the i.i.d. sequence {Zj , j ∈ Z}, i.e. {a−1 Z,n

[nt] 

w

(Zi − bZ,n ), t ≥ 0} =⇒ ΛZ ,

(15.3.14)

i=1

in D([0, ∞)) endowed with the J1 topology. We use the centering ⎧ ⎪ if 0 < α < 1 , ⎨0 bZ,n = E[Z0 1{|Z0 | ≤ aZ,n }] if α = 1 , ⎪ ⎩ if 1 < α < 2 . E[Z0 ] Then

+ d

ΛZ (1) =

Stable(α, Γ (1 − α) cos(πα/2), βZ , 0) if α = 1 , Stable(1, π2 , βZ , c0 βZ ) if α = 1 ,

with c0 given in (8.2.2c). We will prove that for all α ∈ (0, 2) \ {1}, {a−1 X,n

[nt] 

fi.di.



(Xi − bX,n ), t ≥ 0} −→ ΛX =

i=1

with the centering + 0   bX,n = ψ E[X0 ] = j∈Z j E[Z0 ]

j∈Z

ψj

ψα

ΛZ ,

if 0 < α < 1 , if 1 < α < 2 .

If α = 1, ΛX is an (α, σ, βX , 0)-stable L´evy process with α j∈Z ψj α , σ = Γ (1 − α) cos(πα/2) ψα α ⎞ ⎛  βX = βZ sign ⎝ ψj ⎠ . j∈Z

The case α = 1 will be treated separately.

(15.3.15)

(15.3.16)

15.3 Moving averages with deterministic coefficients

441

Convergence to stable laws: case 0 < α < 1 In that case (15.3.2) implies that j∈Z |ψj | < ∞. Then, (15.3.15) is a direct application of the convergence of the point process of exceedences (15.3.12) and Theorem 8.3.1. By the summability of points, we obtain the representation ∞  j∈Z ψj (Z,i) ΛX (t) = 1{Ti ≤ t}Pi Θ0 . ψα i=1 If the coefficients ψj are non-negative, then the convergence holds in D([0, ∞)) endowed with the M1 topology. Convergence to stable laws: 1 < α < 2, short memory In this case, the asymptotic negligibility (ANSJU(an )) is needed to apply Theorem 8.3.5 and obtain the weak convergence of the finite-dimensional distributions of the partial sum process to a L´evy stable process. If (ANSJU(an )) is satisfied, then (15.3.15) still holds. Unfortunately, a direct proof of this condition is not available in general, but it does hold for m-dependent sequences. It to use the m-dependent approximation. We assume that is thus natural |ψ | < ∞, i j∈Z j∈Z ψj = 0. We define Sn (t) = a−1 X,n

[nt]  i=1

Sm,n (t) =

a−1 X,n

[nt] 

(Xi − bX,n ) , ⎛ ⎝Xi(m)





ψj bZ,n ⎠ ,

|j|≤m

i=1

m

j=−m ψj = 1/α ΛZ . α j∈Z |ψj |

ΛX,m

(m)

Let aX (m) ,n be the (1 − 1/n) quantile of X0 aX (m) ,n ) = 1. By (15.3.5), aX (m) ,n = lim n→∞ aX,n



 m j=−m



j∈Z

|ψj |α

|ψj



1/α ,

(m)

, that is limn→∞ nP(|X0

|>

aX (m) ,n = 1 . (15.3.18) m→∞ n→∞ aX,n lim lim

By application of Theorem 8.3.5 to the (2m + 1)-dependent process {X (m) j , j ∈ Z}, we know that ⎛ ⎞ [nt]   fi.di. ⎝Xi(m) − a−1 ψj bZ,n ⎠ −→ ΛX,m X (m) ,n i=1

|j|≤m

442

15 Moving averages fi.di.

w

and by (15.3.18) we have Sm,n −→ ΛX,m . It obviously holds that ΛX,m =⇒ ΛX . Thus, by the triangular argument Lemma A.2.10, the convergence of Sn to Λ is equivalent to lim lim sup P(|Sn − Sm,n | > δ) = 0 .

m→∞ n→∞

(15.3.19)

Write Z¯i = Zi 1{|Zi | ≤ aZ,n } − E[Z0 1{|Z0 | ≤ aZ,n }. Then, aX,n a−1 Z,n (Sn − Sm,n ) = a−1 Z,n

n  

ψj (Zi−j − bZ,n )

i=1 |j|>m

= a−1 Z,n

n  

ψj Z¯i−j + a−1 Z,n

i=1 |j|>m



⎝ − na−1 Z,n

n  





ψj Zi−j 1{|Zi−j | > aZ,n }

i=1 |j|>m

ψj ⎠ E[Z0 1{|Z0 | > aZ,n }]

|j|>m



⎝ = Um,n + Vm,n − na−1 Z,n



⎞ ψj ⎠ E[Z0 1{|Z0 | > aZ,n }] .

|j|>m

(15.3.20) If α > 1, by regular variation and the definition of aZ,n , we have lim na−1 Z,n E[Z0 1{|Z0 | > aZ,n }] =

n→∞

By elementary algebra



⎝ var(Um,n ) = a−2 Z,n var

n  

α(2pZ − 1) . α−1

⎞ ψj Z¯i−j ⎠

i=1 |j|>m

¯2 = a−2 Z,n E[Z0 ]

n−1 

(n − |k|)

k=1−n



ψj ψj−k 1{|j − k| > m} .

|j|>m

Thus by Proposition 1.4.6, 2 lim var(Um,n ) = lim na−2 Z,n E[Z0 1{Z0 ≤ aZ,n }]

n→∞

n→∞

n−1  1  (n − |k|) ψj ψj−k 1{|j − k| > m} n→∞ n k=1−n |j|>m ⎛ ⎞2 α ⎝ ⎠ α   ψj ψj−k 1{|j − k| > m}= ψj . = 2−α 2−α

× lim

k∈Z |j|>m

|j|>m

15.3 Moving averages with deterministic coefficients

443

This shows that lim lim var(Um,n ) = 0 .

m→∞ n→∞

Finally, by the point process convergence for i.i.d. sequences (Theorem 7.2.1) and by the continuity of the summation functional in the framework of Corollary 7.1.3, we have ⎞ ⎛ ∞   w (Z,i) Vm,n =⇒ ⎝ ψj ⎠ Pi Θ0 1{Pi > 1} , n → ∞ . |j|>m

i=1

The number of points in the sum is almost surely finite and has a Poisson distribution with mean 1. Altogether, using (15.3.5), we have proved (15.3.19). Thus the convergence (15.3.15) holds. Note for the next case that the treatment of Um,n and Vm,n is still valid in the case α = 1. Convergence to stable laws: α = 1 Here as in the case α < 1, (15.3.3) implies j∈Z |ψj | < ∞. Moreover it also implies (8.2.11). Thus if (ANSJU(an )) holds, then we can apply Theorem 8.3.5 to obtain n  d (Xi − E[X0 1{|X0 | ≤ aX,n }]) −→ Stable(1, σX , βX , cX ) , (15.3.21) a−1 X,n i=1

with σX

βX

cX

 π | π  j∈Z ψj | , (15.3.22a) =ϑ E Qj = j∈Z 2 2 j∈Z |ψj | 

⎞ ⎛ E  ψ j∈Z Qj j  = j∈Z βZ = sign ⎝ = ψj ⎠ βZ , (15.3.22b) | j∈Z ψj | E j∈Z Qj j∈Z ⎫ ⎧ ⎞ ⎛  ⎬ ⎨   βZ = c0 (2pX − 1) − ψj log ⎝ ψj ⎠ + ψj log(|ψj |−1 ) . ⎭ ψ1 ⎩ j∈Z

j∈Z

j∈Z

(15.3.22c) Recall that βX = 2pX − 1 except in the case of extremal independence and by the expression (15.3.10), we have ψj ψ j∈Z j j∈Z 2pX − 1 = βZ = βX . (15.3.23) ψ1 ψ1 As in the case α > 1, we will prove the convergence by m-dependent approximation, since we cannot prove directly that (ANSJU(an )) holds. For this purpose, we need to modify the centering. By Problem 4.4, we have

444

15 Moving averages

lim

n→∞

⎛ ⎞  n ⎝ E[X0 1{|X0 | ≤ aX,n }] − ψj E[Z0 1{Z0 ≤ aZ,n }]⎠

aX,n

j∈Z

βZ  = ψj log(|ψj |−1 ) + (2pX − 1) log ψ1 . ψ1

(15.3.24)

j∈Z

Thus the convergence (15.3.21) is equivalent to ⎛ ⎞ ⎛ ⎞ n   ⎝Xi − ⎝ ψj ⎠ E[Z0 1{|Z0 | ≤ aZ,n }]⎠ a−1 Z,n i=1

j∈Z

d

−→ Stable(1, ψ1 σX , βX , c˜X ) ,

(15.3.25)

with (using (15.3.23)), ⎞ ⎛ ⎞ ⎛   c˜X = c0 (2pX − 1)ψ1 − βZ ⎝ ψj ⎠ log ⎝ ψj ⎠ j∈Z j∈Z ⎞ ⎛ ⎞ ⎛    = c0 βZ ψj − βZ ⎝ ψj ⎠ log ⎝ ψj ⎠ . j∈Z j∈Z j∈Z We will prove the convergence (15.3.25) by the triangular argument. In the first place, by (15.3.5) and Theorem 8.3.5 applied to the i.i.d. sequence {Zj , j ∈ Z}, ⎛ ⎞ ⎞ ⎛ n   ⎝Xi(m) − ⎝ a−1 ψj ⎠ bZ,n ⎠ Z,n i=1



≈⎝



|j|≤m



|j|≤m

ψj ⎠ a−1 Z,n

n 

⎛ (Zi−j − bZ,n ) −→ ⎝ d

i=1



⎞ ψj ⎠ ΛZ (1) .

|j|≤m

(The sign ≈ takes care of edge effects   and is easily rigorously justifiable.) d Λ Obviously, ψ (1) −→ Z |j|≤m j j∈Z ψj ΛZ (1), as m → ∞, which has exactly the distribution in the right-hand side of (15.3.25) (cf. Problem 8.2). In order to apply the triangular argument, we must prove that (15.3.19) still holds in the present context. We use the decomposition (15.3.20), with the simplification that the last term vanishes because of the upper truncation in the centering. The rest of the argument is exactly the same. Autocovariances: convergence to stable laws Assume that that there exists δ ∈ (0, α) ∩ (0, 2] such that

15.3 Moving averages with deterministic coefficients



|ψj |δ < ∞ .

445

(15.3.26)

j∈Z

We note first that since the bivariate distributions (X0 , X ) are jointly regularly varying with extremal dependence, X0 X is regularly varying with index α/2 for all  ≥ 1; see Problem 4.1. Autocovariances: α ∈ (0,2) We start with α ∈ (0, 2). Then, (15.3.26) implies can apply Problem 8.7 to obtain a−2 X,n

n−h 

d

(Xi2 , Xi Xi+1 , . . . , Xi Xi+h ) −→

i=1



j∈Z

ψj2 < ∞. Thus, we

(2)

(ψj2 , . . . , ψj ψj+h )ΛX , (15.3.27)

j∈Z (2)

where ΛX has a totally skewed to the right α/2 stable law and can be represented as (2)

ΛX =

∞ 1  * Pi ψ2α α i=1

with {P*i , i ≥ 1} as in (15.3.13). Autocovariances: α ∈ (2,4) If 2 < α < 4, assume that E[Z0 ] = E[X0 ] = 0 and set r() = cov(X0 , X ) = E[X0 X ]. The latter quantities are well defined and  ψj ψj+ . r() = E[Z02 ] j∈Z

Moreover, assume that for each  = 0, . . . , h and all δ > 0,  n    Xi Xi+ 1 |Xi Xi+ | ≤ a2X,n  lim lim sup P →0 n→∞ i=1    −E[X0 X 1 |X0 X | ≤ a2X,n  ]} > a2X,n δ = 0 . Then again we can apply Problem 8.7 to obtain convergence of a−2 X,n

n−h  i=1

(Xi2 − r(0), Xi Xi+1 − r(1), . . . , Xi Xi+h − r(h))

(15.3.28)

446

15 Moving averages

to a multivariate stable vector. However, again as in the case of partial sums, the asymptotic negligibility condition (15.3.28) is hard to verify and rather one uses an approximation argument. A similar argument that leads to (15.3.20) gives ⎧ ⎫ n n  ⎨ ⎬  P 2 Xi Xi+ − ψj ψj+ Zi−j −→ 0 . a−2 X,n ⎩ ⎭ i=1

(15.3.29)

i=1 j∈Z

Since α ∈ (2, 4), E[Z02 ] < ∞. Moreover, adapting (15.3.14), a−2 Z,n

n 

d

(2)

(Zi2 − E[Z02 ]) −→ ΛZ ,

i=1

where (2)

d

ΛZ (1) = Stable(α/2, Γ (1 − α/2) cos(πα/4), 1, 0) . Thus, the limiting distribution of a−2 X,n

n 

{Xi Xi+ − r()}

i=1

is the same as a−2 X,n

n  

 2  ψj ψj+ Zi−j − E[Z02 ]

i=1 j∈Z

and we obtain a−2 X,n

n−h 

(Xi2 − r(0),Xi Xi+1 − r(1), . . . , Xi Xi+h − r(h))

i=1 d

−→

1  2 (2) (ψj , . . . , ψj ψj+h )xΛZ . ψ2α

(15.3.30)

j∈Z

Autocovariances: α =2 If α = 2 and E[Z0 ] = 0, the approximation (15.3.29) is still valid. Adapting again (15.3.14), a−2 Z,n

n  i=1

(2)

(2)

d

(2)

(Zi2 − bZ,n ) −→ ΛZ ,

where bZ,n = E[Z02 1{|Z0 | < aZ,n }] and

15.3 Moving averages with deterministic coefficients (2)

ΛZ = Stable(1,

447

π , 1, c0 ) , 2

c0 given in (8.2.2c). Thus, the limiting distribution of n    (2) Xi Xi+ − bZ,n a−2 X,n i=1

is the same as a−2 X,n

n  

  (2) 2 ψj ψj+ Zi−j − bZ,n

i=1 j∈Z

and a−2 X,n

n−h  i=1

⎛ ⎝Xi2 −



(2)

ψj2 bZ,n , Xi Xi+1 −

j∈Z



(2)

ψj ψj+1 bZ,n ,

j∈Z

· · · , Xi Xi+h −





ψj ψj+h bZ,n ⎠ (2)

j∈Z

d

−→

1  2 (2) (ψj , . . . , ψj ψj+h )ΛZ . ψ2α

(15.3.31)

j∈Z

(2) As in the case of partial sums, we replace the centering bZ,n { j∈Z ψj ψj+ ,  =   0, . . . , h} with {E[X0 X 1 |X0 X < a2Xn ],  = 0, . . . , h} by using Problem 4.5. The limiting stable random vector will change.

Consistency of the tail empirical process and the Hill estimator Recall that the tail empirical distribution is defined by n  1 * Tn (s) = 1{|Xj |>un s} , s ≥ s0 . nP(|X0 | > un ) j=1

(15.3.32)

If (15.3.2) holds, then an m-dependent tail equivalent approximation is possible, thus Propostion 9.1.2 yields the consistency of the tail empirical distribution: P T*n (s) −→ s−α ,

uniformly on [s0 , ∞), for any s0 > 0. By Proposition 9.1.3 and Theorem 9.5.1, this yields the convergence of the intermediate order statistics and the consistency of the Hill estimator. Again, it is noteworthy that this consistency holds for long memory moving averages since its only assumption, (15.3.2), does not rule out long memory. To go beyond consistency, we need to be able to check condition S(rn , un ) and the β-mixing property.

448

15 Moving averages

Condition S(rn ,un ) For causal linear processes (15.2.9) reduces to ∞ 

i|ψi |δ < ∞

(15.3.33)

i=1

for some δ ∈ (0, α) ∩ (0, 2]. Then Condition S(rn , un ) holds for all sequences r n and un such that rn P(|X0 | > un ) → 0. We know that S(rn , un ) implies ∞ j=1 P(|Yj | > 1) < ∞, but condition (15.3.33) is much stronger than what is needed for the latter summability property to hold. We have ∞  j=1

P(|Yj | > 1) =

∞ 

E[{|Θj | ∧ 1}α ]

j=1 ∞ ∞   1 −1 α |ψ | {(|ψj+k | |ψk | ) ∧ 1}α k α |ψ | i i=0 j=1

= ∞ = ∞

k=0 ∞  ∞ 

1

i=−∞

|ψi |α

k=0 j=1

A sufficient condition for the summability of ∞ 

{|ψj+k | ∧ |ψk |}α . j

P(|Yj | > 1) is

α

j |ψj | < ∞ ,

(15.3.34)

j=1

which is also necessary if the coefficients |ψj | are non-increasing. Cf. Theorem 13.3.4. Note that (15.3.33) rules out long memory for all α whereas (15.3.34) allows it for large α. It is an open problem to prove S(rn , un ) under weaker conditions. Conditions for β-mixing Lemma 15.3.1 Consider a causal MA(∞) process such that (15.3.4) holds and assume that a distribution FZ of Z0 is absolutely continuous with density fZ . Assume moreover that the following conditions hold: (i) There exists K ∈ (0, ∞) such that for all u ∈ R,  ∞ |fZ (u − v) − fZ (v)|dv ≤ K|u| . −∞

(ii) There exist q > 0 and  ∈ (0, q) such that E[|Z0 |q ] < ∞ and

(15.3.35)

15.3 Moving averages with deterministic coefficients ∞ 

⎛ ⎝

∞ 

i=0

|ψj |⎠

0, ψj = 0, j < 0, for one ρ > 0. The summability condition (15.3.36) becomes ρ > 2 + 1/α. The condition on ρ is much more restrictive than the one required for existence of the series. The mixing rate is then βn = O(n−(ρ−1)(α−)/(1+α−)+1 ). If the coefficients ψj decay geometrically fast, then so do the β-mixing coefficients. This is the case of causal invertible ARMA processes. Central limit theorem for the Hill estimator Armed with the β-mixing property and (15.3.33) which implies S(rn , un ) (and excludes long memory), we can apply Theorem 9.2.10 and Theorem 9.5.2, if moreover the rate conditions R(rn , un ) and β(rn , n ) hold. To obtain the asymptotic normality of the Hill estimator (defined in (9.5.1)), we need the bias conditions (9.3.3) and (9.5.3) and the anticlustering condition Slog (rn , un ). Unfortunately, this condition has not yet been proved to hold for moving averages. If it holds, then the asymptotic variance of the Hill estimator for the tail index α based on {Xj , j ∈ Z} is  P(Yj > 1 | Y0 > 1) α−2 j∈Z

=

j∈Z



α α α k∈Z {pZ (ψk )+ ∧ (ψj )+ + (1 − pZ )(ψk )− α α2 j∈Z {pZ (ψj )α + + (1 − pZ )(ψj )− }

∧ (ψj )α −}

.

Example 15.3.2 For the AR(1) process we have ψj = ρj if j ≥ 0 and we obtain 1 + |ρ|2α 1 + |ρ|α if ρ ≥ 0 , if ρ < 0 . α2 (1 − |ρ|α ) α2 (1 − |ρ|2α )  Other cluster functionals By Theorem 10.1.3, the consistency of the cluster empirical measure (cf. (10.0.1)) holds as long as the basic condition (15.3.2) holds. Again, this includes long memory moving averages.

450

15 Moving averages

In order to prove central limit theorems for blocks estimators, we need the β-mixing assumption, rates condition, and miscellaneous bias conditions. For tail array sums, we also need the anticlustering condition Sψ (rn , un ) for which we have no sufficient conditions. For the estimator of the extremal index of {|Xj |, j ∈ Z} considered in Example 10.4.2, we only need condition S(rn , un ), thus we can conclude that the estimator is asymptotically normal with limiting variance  P(|Yj | > 1) − ϑ ϑ2 j∈Z

 =

supj≥0 |ψj |α ∞ α i=0 |ψi |

1 ∞ ∞ 2 + α k=0 j=1 {|ψj+k | ∧ |ψk |} ∞ 1+2 α i=0 |ψi |   supj≥0 |ψj |α − ∞ . α i=0 |ψi |

For an AR(1) process with ρ ∈ (−1, 1) we get   (1 − |ρ|α )2 1 + 2|ρ|α (1 − |ρ|α )−1 − (1 − |ρ|α ) = |ρ|α (1 − |ρ|α ) .

15.4 Problems 15.1 Under the assumptions of Theorem 15.2.2, obtain expressions for dependence coefficient τ|X | (h) and the conditional tail expectation CTE|X | (h) if α > 1. 15.2 Consider the SRE process of Example 15.2.3. Assume that d = 1 and the Aj and Zj are non-negative random variables and P(Aj = 0) = 0. 1. Compute the forward and backward spectral tail processes. 2. Compute the candidate extremal index, the tail dependence coefficient τ|X| (h) and the CTE. 15.3 Let {Xj } be given by (15.3.1) with {Zj } regularly varying with tail index α > 0 and the sequence {ψj } satisfying (15.3.2) with ψj = 0 if j < 0. 1. Compute the tail dependence coefficient and the CTE. 2. For a stationary AR(1) process (i.e. the stationary solution of the AR(1) equation Xj = ρXj−1 + Zj where |ρ| < 1), compute the tail process, the specral tail process, the extremal index, the tail dependence coefficient, and the CTE if α > 1. 15.4 Consider the moving average process of Problem 15.3. Compute the expression for the asymptotic variance of the estimator θ2largedev,n,k of the

15.5 Bibliographical notes

451

large deviation index from Example 10.4.6. Apply it to the AR(1) process with ρ ∈ (0, 1).

15.5 Bibliographical notes The results of Sections 15.1 and 15.2, in such generality, seem to be new. The formulas (15.2.4a), (15.3.11b) for the extremal index were obtained in [CHM91, Proposition 2.1] in the case of a linear process with deterministic coefficients. [MS10] obtained the tail process of a moving average with values in a Banach space with deterministic operator weights. Point process convergence for infinite order moving averages with stable innovations was studied for the first time in [Roo78]. It seems that the Single big jump heuristic was introduced to this area for the first time in that reference. Point process convergence for linear processes with deterministic weights has been established by m-dependent approximation, usually under conditions that rule out long memory, following closely the ideas of the previous reference. See [DR85a]. Using the nearly optimal results of [HS08] on the convergence of series allows the use of m-dependent approximations even in the long memory case. Point process convergence immediately gives converges of partial sums when α ∈ (0, 1). In the case 1 < α < 2, convergence to a stable law is usually concluded directly by m-dependent approximation, instead of using point process convergence. See [DH95]. The case α = 1 seem to have been overlooked in the literature. [AT00, Theorem 2.1 and 2.2] prove convergence of partial sums and partial sums of squares in the M1 topology in the long and short-memory cases, respectively. [Kri19] proves functional convergence for moving averages with random coefficients. Sample covariances for α ∈ (0, 2) are considered in [DR85a] by applying directly point process convergence. See also [DR85b]. The case α ∈ [2, 4) is treated in [DR86]. Consistency of the Hill estimator for moving averages with deterministic weights was proved by [RS97, RS98] under conditions which exclude long memory. The consistency of the Hill estimator for long memory moving averages seems to have been unnoticed. The asymptotic variance for the Hill estimator in case of infinite order moving averages with deterministic weights appears in [Hsi91b, Theorem 4.5] (mdependent case) and [Dre03, Dre08]. In both cases, the asymptotic theory is obtained under β-mixing conditions. The asymptotic variance for the blocks estimator of the extremal index in case of finite order moving averages is given in [Hsi91a, Example 4.7].

452

15 Moving averages

The β-mixing result of Lemma 15.3.1 is from [PT85]. Mixing properties of ARMA models were obtained by [Mok88, Mok90]. We are not aware of references which provide conditions for β-mixing of general moving averages with random coefficients in the framework of Theorem 15.2.2.

16 Long memory processes

In this chapter, we will introduce models for which the standard extreme value theory developed in the previous chapters fail in some aspects. These models belong to the loosely defined class of long memory or long-range dependent processes. The first long-range dependent models were studied for Gaussian processes, so a natural definition is through the covariance function. Longrange dependence was then observed in many other models, including infinite variance processes, and a wider definition was needed. The only common feature of these models can be described as a phase transition. Consider a class of processes indexed by a parameter θ ∈ Θ (finite or infinite dimensional) and that Θ can be split as Θ = Θ1 ∪ Θ2 . For values of the parameter θ ∈ Θ1 , the process behaves, relatively to a certain statistics, as a weakly dependent sequence of even as a sequence of i.i.d. random variables, but for θ ∈ Θ2 , the behavior is markedly different. The first long memory behavior investigated was the convergence of the partial sum process. For weakly dependent processes (as in Chapter 8), the partial sum process will typically converge to a process with independent increments, either with finite or infinite variance. For a long memory process, convergence to a process with dependent increments may hold, depending on the value of a certain parameter. The point process of exceedences and the partial maxima are very robust to long memory effects. For Gaussian processes, the partial sum process may have the long memory behavior while the partial maxima still behave like that of a sequence of i.i.d. Gaussian random variables. In this chapter, we will study three classes of models and show some long memory effects. In Section 16.1, we will consider moving average processes with non-summable coefficients and regularly varying innovations. In Section 16.2, we will recall the classical extreme value theory for Gaussian processes (which are of course not regularly varying) and give its implications for heavytailed time series which can be expressed as instantaneous functions of a long memory Gaussian process. In Section 16.3, we will consider a simple model

© Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4 16

453

454

16 Long memory processes

from financial econometrics: the long memory stochastic volatility process. In each section we will describe more precisely the phase transition which occurs. We have kept this chapter short and omitted or sketched several proofs. The purpose of this chapter is mainly illustrative and serves as a caveat: the standard extreme value theory for time series should not be taken for granted.

16.1 Long memory moving averages We consider a causal moving average {Xj , j ∈ Z}, Xj =

∞ 

ψi Zj−i ,

(16.1.1)

i=0

where {Zj , j ∈ Z} is a sequence of i.i.d. random variables, regularly varying with tail index α ∈ (1, 2) and E[Z0 ] = 0 and {ψj , j ∈ N} is a sequence of real numbers such that ∞ 

|ψj |δ < ∞ ,

(16.1.2)

j=0

for some δ ∈ (0, α), which is necessary for the definition of the series in  (16.1.1), but we now assume that j∈Z |ψj | = ∞. We have already seen in Section 15.3 that the point process of exceedences and the sample maxima converge under the condition (16.1.2). Thus there is no long memory effect. ∞ The non-summability condition j=0 |ψj | = ∞ is not a sufficient characterization of long memory to obtain asymptotic results. We shall make the following assumption. Assumption 16.1.1 Let α > 1. There exists H ∈ (1/α, 1) and a slowly varying function ψ at ∞ such that ψj = ψ (j)j H−1−1/α , j ≥ 1 .

(16.1.3)

For such a H, we have H − 1 − 1/α > −1 (and hence α(H − 1 − 1/α) > −1), thus ∞ it is possible to find δ < α such that (16.1.2) holds. At the same time, j=0 |ψj | = ∞. Under these assumption, we can obtain the limit of the partial sum process of the sequence {Xj , j ∈ Z}. In order to describe the limit, recall from

16.1 Long memory moving averages

455

Chapter 8 that an α-stable L´evy process Λ is a process with stationary independent increments, continuous in probability, such that Λ(0) = 0 a.s. and log E[eizΛ(1) ] = −Γ (1 − α) cos(πα/2)σ α |z|α {1 − iβsign(z) tan(πα/2)} , for z ∈ R, with σ > 0 and β ∈ [−1, 1]. Since Λ has independent increments, ∞ it is possible to define a stochastic integral of the form −∞ f (s)Λ(ds) for all ∞ ∞ functions f ∈ Lα (R), i.e. −∞ |f (s)|α ds < ∞. The integrals −∞ f1 (s)Λ(ds), ∞ . . . , −∞ f (s)Λ(ds) are jointly stable for f1 , . . . , fn ∈ Lα (R). The L´evy fractional stable motion Lα,H is defined as such a stochastic integral:  t H−1/α H−1/α {(t − s)+ − (−s)+ }Λ(ds) . (16.1.4) Lα,H (t) = −∞

It is readily checked that  t |(t − s)H−1/α − (−s)H−1/α |α ds < ∞ , −∞

thus this process is well defined. Furthermore, Lα,H has almost surely continuous sample paths, α-stable marginal distributions, stationary dependent increments and is H-self-similar. This means that for every a > 0, {Lα,H (ct), t ∈ R} = {cH Lα,H (t), t ∈ R} . d

Let aZ,n be the 1 − 1/n quantile of the distribution of |Z0 | and let Sn (t) = [nt] i=1 Xi , t ≥ 0 be the partial sum process. Theorem 16.1.2 Let α ∈ (1, 2) and Assumption 16.1.1 hold. Then w

(H − 1/α)(naZ,n ψn )−1 Sn =⇒ Lα,H in D([0, ∞)) endowed with the J1 topology.

Remark 16.1.3 Contrary to the short-memory case, convergence in the space D endowed with the J1 topology holds. Since the distribution of |Z0 | is regularly varying with tail index α, the scaling sequence aZ,n is regularly varying with index 1/α and the scaling naZ,n ψn of the partial sum process in Theorem 16.1.2 is regularly varying with index H ⊕ We only provide a sketch of proof. See Section 16.4 for a reference to a complete proof.

456

16 Long memory processes

 Proof (Sketch of proof of Theorem 16.1.2). Let Nn = be i∈Z δn−1 i,a−1 Z,n Zi the functional point process of exceedences of the sequence {Zj , j ∈ Z}. Write H − 1/α  Xi = Tn (t) + Rn (t) , naZ,n ψn i=1 [nt]

with [nt] i−1

H − 1/α   ψj Zi−j (16.1.5) naZ,n ψn i=1 j=0 ⎛ ⎞  t  ∞ [nt]−1 [nt]−j   H − 1/α Zj ⎝ ⎠ = ψi = gn (s, t) uNn (ds, du) , nψ a n Z,n s=0 −∞ j=0 i=0

Tn (t) =

(16.1.6) where for s ∈ (0, t) H − 1/α gn (s, t) = nψn



[nt]−[ns]

→ (H − 1/α)

i=0  t−s

ψi uH−1−1/α dy = (t − s)H−1/α ,

0

by regular variation. The second term Rn can be expressed as [nt] ∞

H − 1/α   ψj Zi−j (16.1.7) Rn (t) = naZ,n ψn i=1 j=i ⎛ ⎞  0  ∞ [nt] ∞   Z−j ⎝ H − 1/α = ψi+j ⎠ = hn (s, t) uNn (ds, du) , nψ a n Z,n s=−∞ −∞ j=1 i=1 (16.1.8) where for s < 0, H − 1/α  ψi+[n(−s)] nψn i=1  t → (H − 1/α) (u − s)H−1−1/α du = (t − s)H−1/α − (−s)H−1/α , [nt]

hn (s, t) =

0

by regular variation  t  ∞again. In view of the result of Chapters 7 and 8, recalling that Λ(t) = s=0 u=−∞ uN  (ds, du) it is natural to expect that

16.1 Long memory moving averages fi.di.



Tn (t) −→

t

457

(t − s)H−1/α Λ(ds) ,

0

fi.di.

Rn (t) −→



0

−∞

{(t − s)H−1/α − (−s)H−1/α }Λ(ds) .

The two terms being independent, it follows from this leap of faith that d Rn (t) + Tn (t) −→ Lα,H (t). A rigorous proof will need a small jumps/big jumps decomposition of Nn and Potter’s bounds to apply the dominated convergence theorem. Convergence of finite-dimensional distributions is obtained similarly. The proof of tightness is surprisingly simple. This is a typical feature of long memory. We apply the criterion (C.2.4) which ensures the asymptotic equicontinuity criterion (C.2.9) of Theorem C.2.14. Since P(|Z0 |>aZ,n )∼n−1 , we have by Proposition 4.1.6, for an arbitrarily small > 0, P(|Sn (t) − Sn (s)| > nψn aZ,n )

α−



[nt]−i [nt] ∞

  



1 1



≤ cst n−1 ψj

+ cst n−1

nψn

nψn

j=0 i=0

i=[ns]+1



[nt]−[ns] j=1

α−



ψi+j

.

By H¨older inequality, for 0 ≤ s ≤ t ≤ u, P(|Sn (t) − Sn (s)| > nψn aZ,n , |Sn (t) − Sn (u)| > nψn aZ,n ) ⎛ ⎞α− [nu] [nu]−i   ⎝ 1 ≤ cst n−1 |ψj |⎠ nψn j=0 i=[ns]+1 ⎛ ⎞α− [nu]−[ns] ∞   1 ⎝ + cst n−1 |ψi+j |⎠ nψ n i=0 j=1 ⎛ ⎞α− [nu]−[ns]  1 ≤ cst n−1 ([nu] − [ns]) ⎝ |ψj |⎠ nψn j=0 ⎛ ⎞α− [nu]−[ns] ∞   ⎝ 1 + cst n−1 |ψi+j |⎠ . nψn j=1 i=0 By Potter’s bounds and Karamata Theorem, we obtain, for a small enough , P(|Sn (t) − Sn (s)| > nψn aZ,n ,|Sn (t) − Sn (u)| > nψn aZ,n ) ≤ cst (u − s)(H−1/α−)(α−)+1 .

(16.1.9)

Thus we can apply Theorem C.2.14 and obtain the convergence in D([0, ∞)) endowed with the J1 topology. We see why long memory makes the proof of

458

16 Long memory processes

tightness simple: since H − 1/α > 0, the exponent of (u − s) in (16.1.9) is strictly greater than 1 as needed. For a short-memory moving average, such a simple computation would only yield (u − s).   If the coefficients ψj satisfy (16.1.3) with H < 1/α, then j∈Z |ψj | < ∞ and we can apply the results of Section 15.3: the partial sum process converges to a stable L´evy process with independent increments. The phase transition between long and short memory is thus related to the rate of decay of the coefficients of the moving average and affects the partial sum process.

16.2 Heavy-tailed subordinated Gaussian processes Let {ξj , j ∈ Z} be a stationary centered Gaussian process with correlation function defined by ρn = var(ξ0 )−1 cov(ξ0 , ξn ). Without loss of generality, we assume that var(ξ0 ) = 1.

16.2.1 Extreme value theory for Gaussian processes The standard Gaussian distribution is not regularly varying. It is rapidly varying in the sense that if Φ denotes the c.d.f. of the is standard Gaussian distribution, then lim

x→∞

Φ(ax) =0 Φ(x)

for all a > 1. This is checked by using the equivalent Φ(x) ∼

Φ (x) , x→∞. x

(16.2.1)

An important feature of the Gaussian bivariate distribution is extremal independence: if (X, Y ) ia a bivariate Gaussian vector with standard Gaussian marginal distributions and correlation ρ ∈ (−1, 1), then P(X > x, Y > x) =0. x→∞ P(X > x) lim

Indeed, write Z = Y − ρX. Then Z is independent of X and, for a ∈ (ρ+ , 1), by the property of rapid variation, P(X > x, Y > x) = P(X > x, Z > x − ρX) ≤ P(X > a−1 x) + P(X > x)P(Z > x − (ρa−1 x)+ ) = o(P(X > x)) .

16.2 Heavy-tailed subordinated Gaussian processes

459

This property leads to conjecturing that the extremal index of a Gaussian stationary process should be 1, under possibly some mild restriction on the correlation. To confirm this conjecture, we define the sequences an = (2 log n)−1/2 , √ bn = 2 log n − 2 log(2 2π log n) . Then limn→∞ P(X > bn + an x) = e−x for all x ∈ R and if {ξj , j ∈ Z} is a sequence of i.i.d. standard Gaussian random variables, then for x ∈ R, lim P( max ξi ≤ bn + an x) = e−e

n→∞

1≤i≤n

−x

.

(16.2.2)

We will prove that this limits still holds for a stationary Gaussian process under a very mild restriction on the rate of decay at ∞ of the correlation. For this we will consider the point process of exceedences, defined as Nξ,n =

∞  i=1

δ i , ξi −bn . n

an

Since the Gaussian distribution is in the domain of attraction of the Gumbel law, a centering is needed in addition to the scaling. We state the main result without proof. Theorem 16.2.1. Assume that lim ρn log(n) = 0 .

n→∞

(16.2.3)

Then Nξ,n converges weakly to a Poisson point process with mean measure dte−x dx on [0, ∞) × R and consequently (16.2.2) holds.

This result shows that a long memory effect for the sample maxima and the point process of exceedences may only occur if (16.2.3) does not hold. Different behaviors are then possible. We only quote two results. We need to define a mixed Poisson point process. Let Ξ be a random non-negative measurable function; a mixed Poisson point process with random intensity Ξ is a poisson point process conditionally on Ξ i.e.  ∞ (e−f (x) − 1)Ξ(x)dx . log E[e−N (f ) | Ξ] = −∞

460

16 Long memory processes

Theorem 16.2.2 (i) If lim ρn log(n) = γ ∈ (0, ∞) ,

n→∞

(16.2.4)

then Nξ,n converges weakly to √a mixed Poisson point process on R with random intensity e−x−γ+ 2γζ dx with ζ a standard Gaussian random variable. Consequently,

√ −x−γ 2γζ ]. lim P max ξi ≤ bn + an x = E[e−e n→∞

1≤i≤n

(ii) If ρn and ρn log(n) are eventually monotonic, limn→∞ ρn = 0 and lim ρn log(n) = ∞ ,

n→∞

then

(16.2.5)



 lim P ρ−1/2 ξ − 1 − ρ b max ≤ x = Φ(x) . i n n n

n→∞

1≤i≤n

Gaussian process whose autocovariance function satisfies (16.2.4) or (16.2.5) are called log-correlated. They are rarely encountered in classical time series models, but more frequent in models arising from statistical physics such as Gaussian free fields and branching random walks. We will not discuss them in this chapter.

16.2.2 Limit theory for long memory Gaussian processes Let {ξj , j ∈ Z} be a stationary Gaussian process with standard Gaussian marginal distribution whose covariance function {ρj , j ∈ Z} satisfies ρn = (n)n2H−2 ,

(16.2.6)

with H ∈ (1/2, 1). The best known example is the fractional Gaussian noise ZH which is a Gaussian process with covariance function 1 (|1 + n|2H − 2|n|2H + |n − 1|2H ) , n ∈ Z . 2 The fractional Gaussian noise is the increment process of the fractional Brownian motion BH which is a Gaussian process with covariance cov(BH (s), BH (t)) =

1 (|s|2H − |t − s|2H + |t|2H ) , s, t ∈ R . 2

16.2 Heavy-tailed subordinated Gaussian processes

461

This implies that the Gaussian process {ξj , j ∈ Z} has long memory but still satisfies (16.2.3). Another class of long memory Gaussian processes consist of those which admit a causal representation ∞  ξj = ψj ζj−i , j ∈ Z , (16.2.7) i=0



with {ψj , j ∈ Z} such that j∈Z ψj2 = 1 and {ζj , j ∈ Z} a sequence of i.i.d. standard Gaussian variables. This representation entails ∞  ρj = ψi ψi+j , j ∈ Z . (16.2.8) i=0

In order to obtain long memory, we will assume that ψj is regularly varying at ∞ with index H − 3/2, i.e. there exists a slowly varying function ψ such that ψj = ψ (n)j H−3/2 , j ≥ 1 .

(16.2.9)

Note that this corresponds to (16.1.3) with α = 2. The relations (16.2.8) and (16.2.9) imply that the autocovariance {ρj , j ∈ Z} satisfies (16.2.6) with (n) ∼ cH 2ψ (n) ,

(16.2.10)

where the constant cH is known, cf. Problem 1.4. Hermite polynomials To proceed further, we need to introduce Hermite polynomials. For n ≥ 1, we set 2

Hn (x) = (−1)n ex

/2

dn −x2 /2 e , x∈R. dxn

The first polynomials are easily computed: H0 (x) = 1, H1 (x) = x and H2 (x) = x2 − 1. The Hermite polynomials can also be obtained as the coef2 ficients (in x) of the Taylor series expansion of the function u → exu−u /2 , i.e. 2

exu−u

/2

=

∞  n=0

Hn (x)

un . n!

This definition easily yields the following relation: if a, b are real numbers such that a2 + b2 = 1, then, for all x, y ∈ R, n  n j n−j Hn (ax + by) = Hj (x)Hn−j (y) . (16.2.11) a b j j=0

462

16 Long memory processes

The most important property of Hermite polynomials is that they are orthogonal polynomials with respect to the standard Gaussian distribution, and more generally, if (ξ, ζ) is a Gaussian bivariate vector with standard marginals and correlation ρ, then E[Hj (ξ)] = 0 for all j ≥ 1 and for j, k ≥ 1, cov(Hj (ξ), Hk (ζ)) = k!ρk 1{j = k} .

(16.2.12)

For p > 0, we will say that a measurable function f is in Lp (Φ) if E[|f (ξ)|p ] < ∞. A measurable function f ∈ L2 (Φ) can be expanded in the Hermite basis: f=

∞  cj (f ) j=0

j!

Hj ,

(16.2.13)

with cj (f ) = E[f (ξ)Hj (ξ)] .

(16.2.14)

Note that this definition entails that the Hermite coefficients of a Hermite polynomial Hm are all zero except the m-th one which is m!. The Hermite rank of a function f in Lp (Φ) is the smallest integer m such that cm (f ) = 0. If f ∈ L2 (Φ) has Hermite rank m and if (ξ, ζ) is a Gaussian vector with standard marginals and correlation ρ, then (16.2.12) and (16.2.13) yield cov(f (ξ), f (ζ)) =

∞  c2j (f ) j ρ . j! j=m

(16.2.15)

Thus |cov(f (ξ), f (ζ))| ≤ |ρm |var(f (ξ)) .

(16.2.16)

Let {ξj , j ∈ Z} be a Gaussian process with standard marginals and covariance as in (16.2.6). If f ∈ L2 (Φ) has Hermite rank m ≥ 1, then (16.2.16) yields |cov(f (ξ0 ), f (ξj ))| ≤ var(f (ξ0 ))|(n)|m n−m(2−2H) . This yields that there exists a constant which depends only on the covariance function such that for all n and all functions f with Hermite rank m,    n  n2 ρm if 1 − m(1 − H) > 1/2 , n f (ξi ) ≤ cst var(f (ξ0 )) × var n if 1 − m(1 − H) < 1/2 . i=1 (16.2.17) If 1 − m(1 − H) = 1/2, then   n  f (ξi ) ≤ cst var(f (ξ0 )) × L(n) , var i=1

(16.2.18)

16.2 Heavy-tailed subordinated Gaussian processes

463

with L a slowly varying function. Furthermore, we have the following dichotomy:   = ∞ if m(1 − H) < 1/2 , |cov(f (ξ0 ), f (ξj ))| < ∞ if m(1 − H) > 1/2 . j∈Z In addition to these variance bounds, weak convergence can be proved. In the case where the covariance of {ξj , j ∈ Z} is summable, the limit is the standard Brownian motion. In order to give the limit in the other case, we introduce the family of Hermite processes. For m ∈ N∗ and H ∈ (1 − 1/(2m), 1) and t ≥ 0,    t  m H−3/2 (s − xi )+ ds W(dx1 ) · · · W(dxm ) , Zm,H (t) = ς(m, H) Rm

0 i=1

(16.2.19) where W is a standard Brownian motion and ς(m, H) is a known constant depending only on m and H. See the references for a precise definition of the multiple stochastic integral. The Hermite process of order 2 is also called Rosenblatt process. For all m and H ∈ (1 − 1/(2m), 1), it is a process with stationary dependent increments and self-similar with self-similarity index 1 − m(1 − H) ∈ (1/2, 1). Theorem 16.2.3 Let {ξj , j ∈ Z} be a stationary Gaussian process which satisfies (16.2.6). Let h ∈ L2 (Φ) have Hermite rank m ≥ 1. and consider [nt] the partial sum process Sn (t) = i=1 {h(ξi ) − E[h(ξ0 )]}, t ≥ 0. −m/2

(i) If 1−m(1−H) > 1/2, then n−1 ρn endowed with the J1 topology.

w

Sn =⇒

cm (h) m! Zm,H

in D([0, ∞))

fi.di.

(ii) If 1 − m(1 − H) < 1/2, then n−1/2 Sn −→ Σ(h)W with  cov(h(ξ0 ), h(ξj )) . Σ 2 (h) = j∈Z

We see that the constant ς(m, H) which appears in (16.2.19) is defined in such a way that if h = Hm , then the limit of the partial sum process is simply Zm,H . This is not standard in the literature but this simplifies the statement of the results without loosing any generality. The value of the constant is known but irrelevant.

464

16 Long memory processes

16.2.3 Heavy-tailed subordinated Gaussian processes We now consider a sequence {Xj , j ∈ Z} which can be expressed as Xj = h(ξj ), j ∈ Z, where {ξj , j ∈ Z} is a stationary Gaussian process and h : R → R is a measurable map such that X0 is regularly varying with tail index α > 0. There are many possibilities for this to happen. The simplest one is to define h(x) = 1/Φ(x). Since Φ is continuous, Φ(ξ0 ) is uniformly distributed on [0, 1], thus 1/Φ(ξ0 ) has a Pareto distribution with tail index 1. 2

Example 16.2.4 More generally, we can consider h(x) = ecx . Then √ 2 1  2e− 2 ( log(x)/c) P(h(ξ0 ) > x) = 2P(ξ0 > log(x)/c) ∼  2π log(x)/c  2c x−1/(2c) . = π log(x) Thus the tail index of h(ξ0 ) is (2c)−1 . In order to obtain a real-valued random 2 variable one can take h(x) = sign(x)|x|β ecx . As x → ∞, P(h(ξ0 ) > x) ∼ P(ξ > {log x − β2 log( 1c log x)}/c)  β β 1 1 c ( 1c log x) 4c −1/2c e− 2c {log x− 2 log( c log x)}  √ ∼ = . x 2π log x 2π log(x)/c Taking β = 4c removes the log term in the tail. The left tail is the same by symmetry.  Example 16.2.5 We can also simply consider h(x) = sign(x)|x|−1/α . Then  x−α −x2 /2 e √ dx ∼ (2π)−1/2 x−α . P(h(ξ0 ) > x) = P(0 < ξ < x−α ) = 2π 0 The tail index is α.



In both cases, if (ξ, ζ) are jointly Gaussian with correlation ρ ∈ (−1, 1), then (h(ξ), h(ζ)) are extremally independent. Infinite variance functions of Gaussian processes In order to deal with functions f such that E[f 2 (ξ)] = ∞, we must extend the previous results. Since the Hermite polynomials are in Lq (Φ) for q > 1, the definition of the coefficients cj (f ) can be extended to all functions f in Lp (Φ) for some p > 1 and |cn (f )| ≤ f Lp (Φ) Hn Lp (Φ) .

16.2 Heavy-tailed subordinated Gaussian processes

465

Let (ξ, ζ) be a Gaussian vector with standard marginals and correlation ρ. Let f ∈ Lp (Φ) for some p > 1. If f ∈ L2 (Φ), then (16.2.12) and (16.2.15) yield E[f (ζ)Hk (ξ)] = ck (f )ρk . If f ∈ Lp (Φ) for p ∈ (1, 2), then this relation still holds. Indeed, it is possible to define a sequence of functions fn such that fn ∈ L2 (Φ) and fn → f in Lp (Φ). Then |ck (fn ) − ck (f )| ≤ f − fn Lp (Φ) Hk Lq (Φ) → 0 . Thus, E[f (ζ)Hk (ξ)] = lim E[fn (ζ)Hk (ξ)] = lim ck (fn )ρk = ck (f )ρk . n→∞

n→∞

This yields E[E[f (ζ) | ξ]Hk (ξ)] = E[f (ζ)Hk (ξ)] = ck (f )ρk . √ By Corollary E.4.7 if ρ < p − 1,

(16.2.20)

var(E[f (ζ) | ξ]) ≤ (E[|f (ξ)|p ])2/p < ∞ . Thus we have

∞

k=1

c2k (f ) 2k k! |ρ|

< ∞ and defining

fρ =

∞  cj (f )ρj j=0

j!

Hj ,

we have E[f (ζ) | ξ] = fρ (ξ) and fρ has the same Hermite rank as f . Partial sum process We now consider a measurable function h such that h(ξ0 ) is regularly varying with index α > 0. If h is monotone, then the convergence of the point process of exceedences of {ξj , j ∈ Z} implies that of {Xj , j ∈ Z}. In other situations, there is no rigorously established result. We will simply assume this convergence. We recall the definition of the point process of exceedences of {Xj , j ∈ Z}. Let an be the 1 − 1/n quantile of |X0 | and Nn =

∞  i=1

δ i , Xi . n an

466

16 Long memory processes

Assumption 16.2.6 The sequence {Xj , j ∈ Z} is regularly varying with tail index α > 0 and extremal independence. Furthermore, w

Nn =⇒ N  =

∞ 

δΓi ,Pi ,

i=1

where N  is a Poisson point process with mean measure Leb ⊗ να,p and pX is the extremal skewness of X0 .

Let Λ be the stable L´evy process associated with the point process N  as in Section 8.2. For convenience, we recall that Λ(t) =

∞ 

1{Γi ≤ t}Pi ,

i=1

if α < 1 and Λ is the distributional limit of Λ (t) =  ∞ 1{Ti ≤ t}Pi 1{|Pi | > } − (α − 1)−1 α(2p − 1) −1/α t , i=1 ∞ −1 )t , i=1 1{Ti ≤ t}Pi 1{|Pi | > } − (2p − 1) log(

if α > 1 , if α = 1 ,

as → 0, the convergence being locally uniform. Finally, we define the partial sum process with the appropriate centering: Sn (t) =

[nt] 

(Xi − bn ) ,

i=1

with

⎧ ⎪ ⎨0 bn = E[X0 1{|X0 | ≤ an }] ⎪ ⎩ E[X0 ]

if α ∈ (0, 1) , if α = 1 , if α ∈ (1, 2) .

Theorem 16.2.7 Let {ξj , j ∈ Z} be a Gaussian process with standard marginals which admits the representation (16.2.7) and such that (16.2.9) holds. Let h be a measurable function such that h(ξ0 ) is regularly varying with tail index α ∈ (0, 2) and extremal skewness pX ∈ [0, 1] and Assumption 16.2.6 holds. Let m be the Hermite rank of h. w

(i) If 0 < α < 1, then a−1 n Sn =⇒ Λ;

16.2 Heavy-tailed subordinated Gaussian processes

467

fi.di.

(ii) If α ∈ [1, 2) and 1 − m(1 − H) ≤ 1/2, then a−1 n Sn −→ Λ; w

(iii) If α ∈ (1, 2) and 1/2 < 1 − m(1 − H) < 1/α, then a−1 n Sn =⇒ Λ; −m/2

(iv) If α ∈ [1, 2) and 1 − m(1 − H) > 1/α, then n−1 ρn cm (h) m! Zm,H with Zm,H (t) as in (16.2.19).

w

Sn =⇒

All convergences except in case (ii) hold in D([0, ∞)) endowed with the J1 topology.

Proof. If α ∈ (0, 1), the point process convergence yields the result since the points of N  are summable, and since N  is simple, convergence holds in D([0, ∞)) endowed with the J1 topology by Theorem 8.3.8. In order to consider the next cases, we will use a martingale decomposition of h(ξi ). Define Fj = σ(ζi , i ≤ j). Then Xj is Fj measurable and, for k ∈ N, E[ξj | Fj−k ] =

∞ 

ψi ζj−i ,

i=k

so that var(E[ξj | Fj−k ]) =

∞ 

ψi2 .

i=k

If α > 1, let J be the smallest integer such that ∞ 

ψj2 < α ∧ (1/H) − 1 .

(16.2.21)

i=J

∞ Set then ς 2 = i=J ψj2 and Wj = ς −1 E[ξj | Fk−J ]. Then {Wj , j ∈ Z} is a stationary Gaussian process with standard marginal distribution and cov(W0 , Wj ) = ς −2

∞ 

ψi ψi+j , j ∈ Z .

i=J

Since H ∈ (1/2, 1), this and (16.2.9) yield lim

n→∞

cov(W0 , Wn ) = ζ −2 . ρn

(16.2.22)

Since there exists such that ς 2 < p − 1, Corollary E.4.7 implies that the ∞p < α −1 function hς = i=m (i!) ci (h)ς i Hi is in L2 (Φ), and we have by stationarity that for all k ∈ Z,

468

16 Long memory processes

E[Xk | Fk−J ] = hρ (Wk ) . Define the sequences {Yj , j ∈ Z} and {Zj , j ∈ Z} by Yk = E[Xk | Fk−J ] and Zk = Xk −Yk , k ∈ Z. By Theorem 16.2.3 and (16.2.22), if 1−m(1−H) > 1/2, we have n−1 ρ−m/2 n

[nt]  i=1

w

(Yk − E[X0 ]) =⇒

cm (h) Zm,H (t) , m!

(16.2.23)

in D([0, ∞)) endowed with the J1 topology. Consider now the partial sum process of the sequence {Zj , j ∈ Z}. For > 0, we write a−1 n

[nt] 

Zi = a−1 n

i=1

[nt] 

Xi 1{|Xi | > an } − E[Xi 1{|Xi | > an }]

i=1

+ a−1 n

[nt] 

Xi 1{|Xi | ≤ an } − E[Xi 1{|Xi | ≤ an } | Fi−J ]

i=1

(16.2.24a)  [nt]

− a−1 n

E[Xi 1{|Xi | > an } | Fi−J ] − E[X0 1{|X0 | > an }] .

i=1

(16.2.24b) The point process convergence in Assumption 16.2.6 and Lemma 8.3.3 implies that a−1 n

[nt] 

fi.di.

Xi 1{|Xi | > an } − E[Xi 1{|Xi | > an }] −→ Λ (t) .

i=1

The limiting point process being simple, the convergence holds with respect to the J1 topology by Theorem 8.3.8. To deal with the term in (16.2.24a), define rk, = Xk 1{|Xk | ≤ an } − E[Xk 1{|Xk | ≤ an } | Fk−J ] . Using the fact that for each j = 1, . . . , J, the sequence {r(i−1)J+j, , i ≥ 1} is a square integrable martingale and the maximal inequality for martingales Proposition E.1.2, we obtain

16.2 Heavy-tailed subordinated Gaussian processes

469





   k J k











P max

rk, > an ≤ P max

r(i−1)J+j, > an /J



1≤k≤n

1≤k≤n

i=1 j=1 i=1 ⎡

2 ⎤ k J







≤ J 2 a−2 E ⎣ max

r(i−1)J+j, ⎦ n

1≤k≤n



j=1

≤ cst a−2 n

J  n 

i=1



2 E r(i−1)J+j,



j=1 i=1

≤ cst

2 a−2 n nE[X0 1{|X0 |

≤ an }] .

Thus by regular variation, 



 k





lim sup P max

rk, > an ≤ cst 2−α .

1≤k≤n

n→∞ i=1

For the term in (16.2.24b), Proposition 1.4.6, (16.2.17) and Corollary E.4.7 yield, for p ∈ (1 + ς 2 , α ∧ (1/H)) and 0 < s < t, ⎛ ⎞ [nt]  ⎝ a−2 E[Xi 1{|Xi | > an } | Fi−J ]⎠ n var i=[ns]+1

≤ cst ([nt] − [ns])2 ρ[nt]−[ns] a−2 n var(E[|X0 |1{|X0 | > an } | F−J ]) 2/p ≤ cst ([nt] − [ns])2 ρ[nt]−[ns] a−2 [|X0 |p 1{|X0 | > an }] n E

≤ cst

([nt] − [ns])2 ρ[nt]−[ns] 2−2/p n ρn . n2 ρn

Since p < 1/H, we have n2−2/p ρn → 0. Since n2 ρn is regularly varying with index 2H > 1, we can apply Lemma C.2.11 and obtain the tightness. Thus for every > 0, a−1 n

[nt] 

w

E[Xi 1{|Xi | > an } | Fi−J ] − E[X0 1{|X0 | ≤ an }] =⇒ 0 ,

i=[ns]+1

in D([0, ∞)) endowed with the J1 topology. Altogether we have proved by the triangular argument that a−1 n

[nt] 

w

Zk =⇒ Λ .

(16.2.25)

i=1

Theorem 16.2.3, the bounds (16.2.17) and (16.2.18), and the convergences (16.2.23) and (16.2.25) prove the statements (ii), (iii), and (iv). If α = 1, we write

470

16 Long memory processes

Sn (t) =

[nt] 

Xi 1{|Xi | > an } − E[X0 1{an < |X0 | ≤ an }]

(16.2.26a)

i=1

+

[nt] 

Xi 1{|Xi | ≤ an } − E[X0 1{|X0 | ≤ an }] .

(16.2.26b)

i=1

If 1 − m(1 − H) < 1/2, then (16.2.17) and regular variation yield ⎛ ⎞ [nt]  ⎝ a−2 Xi 1{|Xi | ≤ an } − E[Xi 1{|Xi | ≤ an }]⎠ n var i=1 2 ≤ cst na−2 n E[X0 1{|X0 | ≤ an }] ≤ cst .

This yields the asymptotic negligibility condition (ANSJ(an )) and we obtain fi.di.

that a−1 n Sn −→ Λ by Theorem 8.3.5. If 1 − m(1 − H) ≥ 1/2, we again split the remainder term into two parts by using a martingale decomposition. Write h (x) = h(x)1{|h(x)| ≤ an }. Since n, then h ∈ Lp (Φ) for all p > 1. Let h is bounded for each fixed ∞J be the ∞ smallest integer such that i=J a2j < 1/H − 1. Write then ς 2 = i=J a2j and fix p ∈ (1 + ς 2 , 1/H). Then, by Corollary E.4.7, var(E[h (ξ0 ) | F−J ]) ≤ cst E2/p [|h (ξ0 )|p ] ≤ cst a2n n−2/p .

(16.2.27)

We then write [nt] 

Xi 1{|Xi | ≤ an } − E[X0 1{|X0 | ≤ an }]

i=1

=

[nt] 

Xi 1{|Xi | ≤ an } − E[Xi 1{|Xi | ≤ an } | Fi−J ]

(16.2.28a)

i=1

+

[nt] 

E[Xi 1{|Xi | ≤ an } | Fi−J ] − E[X0 1{|X0 | ≤ an }] .

i=1

(16.2.28b) The term in (16.2.28a) is handled exactly the same way as the term (16.2.24a). For the term in (16.2.28b), we use (16.2.17) and (16.2.27): ⎛ ⎞ [nt]  ⎝ E[Xi 1{|Xi | ≤ an } | Fi−J ] − E[X0 1{|X0 | ≤ an }]⎠ a−2 n var i=1

≤ cst [nt]2 ρ[nt] n2/p = cst

[nt]2 ρ[nt] ρn n2−2/p . n2 ρn

16.3 Long memory stochastic volatility processes

471

Since ρn is regularly varying with index 2H −2 and p < 1/H, ρn n−2/p = o(1) and we obtain that a−1 n

[nt] 

w

E[Xi 1{|Xi | ≤ an } | Fi−J ] − E[X0 1{|X0 | ≤ an }] =⇒ 0 .

i=1



16.3 Long memory stochastic volatility processes Stochastic volatility models were introduced in financial econometrics to model certain empirical features of financial time series, such as heavy tails, leverage, and long memory in volatility. We consider here the simplest of these models, which does not exhibit leverage but has the other two properties. Let {Zj , j ∈ Z} be a sequence of i.i.d. regularly varying random variables with tail index α > 0 and {ξj , j ∈ Z} be a stationary Gaussian process with standard Gaussian marginal distribution. Let finally σ : R → R+ be a measurable function and define Xj = σ(ξj )Zj = σj Zj , j ∈ Z .

(16.3.1)

We assume that there exists > 0 such that E[σ α+ (ξ0 )] < ∞ .

(16.3.2)

Then, by Breiman’s Lemma 1.4.3, X0 is regularly varying with tail index α and P(|X0 | > x) lim = E[σ α (ξ0 )] . x→∞ P(|Z0 | > x) Moreover, all the bivariate marginal distributions are extremally independent. Cf. Example 2.2.10. The behavior of the partial sum process, autocovariances, and tail empirical process will depend on the Hermite rank of the function σ − E[σ(ξ0 )] and on the autocovariance of the Gaussian process {ξj , j ∈ Z}. We will consider the following assumption. Assumption 16.3.1 The autocovariance {ρj , j ∈ Z} satisfies ρj = (j)|j|2H−2 , j = 0 ,

(16.3.3)

with H ∈ (1/2, 1) and  : [0, ∞) → [0, 1] a function slowly varying at ∞.

472

16 Long memory processes

16.3.1 Point process of exceedences and partial sums In order to study the partial sum process, we consider first the point process of exceedences. Let aX,n be the quantile of order 1 − 1/n of the distribution of |X0 | and define as in Chapter 7, Nn

=

∞  i=1

δi,

Xi n aX,n

.

Recall that pZ is the extremal skewness of Z0 and let aZ,n be the quantile of order 1 − 1/n of the distribution of |Z0 |. Theorem 16.3.2 Let Assumption 16.3.1 hold and σ be a continuous function such that E[σ α+ (ξ0 )] < ∞ for some > 0. Then, w

Nn =⇒ N  =

∞ 

δTi ,Pi ,

(16.3.4)

i=1

where {(Γi , Pi ), i ∈ Z} are the points of a PPP on [0, ∞) × R \ {0} with mean measure Leb ⊗ να,ppZ .

Proof. We will compute the Laplace functional. Let f be a non-negative bounded continuous function with bounded support in [0, ∞) × R \ {0}. We may assume that f (t, x) = 0 if t > A with A an integer. Let X be the σ-field generated by the process {ξj , j ∈ Z}. Then, conditioning on X yields  An   −Nn (h) ]=E E[e−f (i/n,σ(ξi )Zi /aX,n ) | X ] . E[e i=1 

Since e−Nn (h) ≤ 1 we can prove the weak convergence of the conditional expectation and the result will follow by application of the dominated convergence theorem. We will prove that An 

P

log E[e−f (i/n,σ(ξi )Zi /aX,n ) | X ] −→ Leb ⊗ να,ppZ (eh − 1) .

i=1

Let νn be the measure on R \ {0} defined by νn = nE[δa−1 Z0 ]. Then, since X,n f is non-negative, bounded with support separated from zero,

16.3 Long memory stochastic volatility processes An 

log E[e−f (i/n,σ(ξi )Zi /aX,n ) | X ] ∼

i=1

An 

473

E[e−f (i/n,σ(ξi )Zi /aX,n ) − 1 | X ]

i=1 An

=−

1 νn (e−f (i/n,σ(ξi )·) − 1) n i=1

= −νn (Hn ) , where Hn is the random function defined by An

Hn (z) =

1  −f (i/n,σ(ξi )z) {e − 1} . n i=1

It is easily checked that under the assumptions on the process {ξj , j ∈ Z} and the function σ,  A P Hn (z) −→ H(z) = E[e−f (t,σ(ξ0 )z) − 1]dt . 0 v#

Since limn→∞ aX,n /aZ,n → E1/α [σ α (ξ0 )], we have νn −→ (E[σ α (ξ0 )])−1 να,ppZ . By homogeneity of να,ppZ we may expect that P

νn (Hn ) −→ (E[σ α (ξ0 )])−1 να,ppZ (H) (16.3.5)  ∞ ∞ = E[e−f (t,σ(ξ0 )z) − 1]dt(E[σ α (ξ0 )])−1 να,ppZ (dz) 0 0  ∞ ∞ (e−f (t,z) − 1)dt(E[σ α (ξ0 )])−1 να,ppZ (dz) = E[σ α (ξ0 )] 0

0

= Leb ⊗ να,ppZ (e−f − 1) . To conclude, we need an argument to justify the convergence (16.3.5). The support of Hn is not separated from zero so we must use a truncation argument. For > 0, define Hn, (z) = Hn (z)1{|z| > } and Kn, = Hn − Hn, . P

The support of Hn, is separated from zero and Hn, −→ H with H (z) = A E[f (t, σ(ξ0 )z)1{|z| > }]dt. The function H is almost surely continuous 0 P

with respect to να,ppZ , thus νn (Hn,z ) −→ (E[σ α (ξ0 )])−1 να,ppZ (H ). The moment condition satisfied by σ implies that lim (E[σ α (ξ0 )])−1 να,ppZ (H ) = να,ppZ (H) .

→0

We now prove that for all δ > 0, lim lim sup P(νn (Kn, ) > δ) = 0 .

→0 n→∞

It suffices to prove that

474

16 Long memory processes

lim lim sup E[νn (Kn, )] = 0 .

→0 n→∞

By definition of Kn and since f is bounded and has support separated from zero, there exists η > 0 such that  A E[νn (Kn, )] = n E[f (t, σ(ξ0 )Z0 /aX,n )1{|Z0 | ≤ aX,n }]dt 0

≤ cst {n{P(σ(ξ0 )Z0 > aX,n η) − nP(Z0 > aX,n )}+ . Thus lim lim sup E[νn (Kn, )] ≤ cst lim {η −α − E[σ α (ξ0 )] −α }+ = 0 .

→0 n→∞

→0

 Partial sum process We now consider the partial sum process Sn (t) =

[nt] 

(Xi − bn ) ,

i=1

with bn as in Chapter 8: ⎧ ⎪ ⎨0 bn = E[X0 1{|X0 | ≤ aX,n }] ⎪ ⎩ E[X0 ]

if α < 1 , if α = 1 , if α > 1 .

We also define the stable process ΛZ associated to the sequence {Zj , j ∈ Z}, i.e. {a−1 Z,n

[nt] 

w

(Zi − bZ,n ), t ≥ 0} =⇒ ΛZ ,

i=1

in D([0, ∞)) endowed with the J1 topology, with the centering ⎧ ⎪ if 0 < α < 1 , ⎨0 bZ,n = E[Z0 1{|Z0 | ≤ aZ,n }] if α = 1 , ⎪ ⎩ if 1 < α < 2 . E[Z0 ] Then

 d

ΛZ (1) =

Stable(α, Γ (1 − α) cos(πα/2), βZ , 0) if α = 1 , Stable(1, π2 , βZ , c0 βZ ) if α = 1 ,

(16.3.6)

16.3 Long memory stochastic volatility processes

475

with c0 given in (8.2.2c). Let cj (σ), j ≥ 0, be the Hermite coefficient of the function σ − E[σ(ξ0 )] and m be its Hermite rank (cf. (16.2.14)). Let Zm,H be the Hermite-Rosenblatt process (cf. (16.2.19)). Theorem 16.3.3 Let Assumption 16.3.1 hold and let m be the Hermite rank of σ − E[σ(ξ0 )]. If α > 1 assume moreover that there exists > 0 such that E[σ 2α+ (ξ0 )] < ∞. (i) If 0 < α ≤ 1 or if 1 < α < 2 and E[Z0 ] = 0, then w

a−1 X,n Sn =⇒ ΛZ . (ii) If 1 < α < 2 and E[Z0 ] = 0, −m/2

• if m(1 − H) < 1/α, then nρn

w

Sn =⇒

cm (σ)E[Z0 ] Zm,H ; m!

w

• if m(1 − H) > 1/α, then a−1 X,n Sn =⇒ ΛZ . In each case, the convergence holds in D([0, ∞)) endowed with the J1 topology.

Proof. If α < 1, then the convergence of the partial sum process follows from the convergence of the point process of exceedences Nn by Theorem 8.3.1 and Theorem 8.3.8. If α ∈ (1, 2), we write Sn (t) =

[nt] 

(Xi − E[X0 ])

i=1

=

[nt] 

(Zi − E[Z0 ])σ(ξi ) + E[Z0 ]

i=1

[nt] 

(σ(ξi ) − E[σ(ξ0 )]) .

i=1

The first term is a martingale; hence we expect the limiting behavior to be as in the weakly dependent case, with normalization a−1 X,n . The second term is the source of long memory. Since E[σ 2 (ξ0 )] < ∞, by Theorem 16.2.3, we know that if m(1 − H) < 1/2, then n−1 ρ−m/2 n

[nt]  i=1

Fix > 0. Then

w

(σ(ξi ) − E[σ(ξ0 )]) =⇒

cm (σ) Zm,H (t) . m!

(16.3.7)

476

16 Long memory processes

[nt]  (Zi − E[Z0 ])σ(ξi ) i=1

=

[nt] 

{Xi 1{|Xi | > aX,n } − E[X0 1{|X0 | > aX,n }]}

i=1



[nt] 

{σ(ξi )E[Z0 1{|Z0 |σ(ξi ) > aX,n } | X ] − E[X0 1{|X0 | > aX,n }]}

i=1

(16.3.8a)  [nt]

+

σ(ξi ){Zi 1{Xi | ≤ aX,n } − E[Z0 1{|Xi | ≤ aX,n } | X ]} .

i=1

(16.3.8b) By the point process convergence and by extremal independence, the identity (8.3.7) holds. Thus by Lemma 8.3.3, we have a−1 X,n

[nt] 

w

{Xi 1{Xi | > aX,n } − E[X0 1{|X0 | > aX,n }]} =⇒ ΛZ, (t) ,

i=1

where ΛZ, is the centered process with jumps greater than (cf. (8.2.5)) and w we know by Proposition 8.2.3 that ΛZ, =⇒ ΛZ as → 0. We will prove that the other two terms are asymptotically negligible, as n → ∞ first then → 0. For the term (16.3.8a) we apply (16.2.17): ⎛ ⎞ [nt]  ⎝ {σ(ξi )E[Z0 1{|Z0 |σ(ξi ) > an }X ] − E[X0 1{X0 | > aX,n }]}⎠ a−2 X,n var i=1

  2 2 2 ≤ cst a−2 X,n [nt] ρ[nt] E σ (ξ0 ) (E[|Z0 |1{|Z0 |σ(ξi ) > aX,n } | X ]) ≤ cst

([nt])2 ρ[nt] 2 ρn E[σ 2 (ξ0 )(na−1 X,n E[|Z0 |1{|Z0 |σ(ξi ) > aX,n } | X ]) ] . n2 ρn

By Problem 1.8, we have α−1+/2 na−1 (ξi ) ∨ 1 . X,n E[|Z0 |1{|Z0 |σ(ξi ) > aX,n } | X ]) ≤ cst σ

This yields 2 2α+ E[σ 2 (ξ0 )(na−1 (ξ0 ) ∨ 1] . X,n E[|Z0 |1{|Z0 |σ(ξi ) > aX,n } | X ]) ] ≤ cstE[σ

Thus we obtain for each t > 0, ⎛ [nt]  −2 ⎝ aX,n var {σ(ξi )E[Z0 1{|Z0 |σ(ξi ) > an } | X ] i=1

 − E[X0 1{|X0 | > aX,n }]}

= O(ρn ) .

16.3 Long memory stochastic volatility processes

477

Since n2 ρn is regularly varying with index 2H > 1, we obtain by Potter’s bound, for s < t, ⎛ [nt]  ⎝ a−2 var {σ(ξi )E[Z0 1{|Z0 |σ(ξi ) > an } | X ] X,n i=[ns]+1

 −

E[X0 1{|X0 | > aX,n }]}

≤ cst (t − s)2H .

This proves the tightness and thus for the term in (16.3.8a) we have a−1 X,n

[nt] 

{σ(ξi )E[Z0 1{|Z0 |σ(ξi ) > an } | X ]

i=1

w

−E[X0 1{|X0 | > aX,n }]} =⇒ 0 , by (C.2.4) and Theorem C.2.14 since ρn → 0 as n → ∞. Consider now the term (16.3.8b). By Doob’s inequality (Proposition E.1.2) ⎡⎛ [nt]  E ⎣⎝ sup σ(ξi ){Zi 1{|Xi | ≤ an } 0≤t≤1 i=1



2

| X⎦

− E[Z0 1{|Xi | ≤ aX,n } | X ]} ≤

n 

σ 2 (ξi )E[Zi2 1{|Xi | ≤ an } | X ] .

i=1

Thus, by regular variation, ⎡⎛ [nt]  −2 ⎣ ⎝ aX,n E sup σ(ξi ){Zi 1{|Xi | ≤ an } 0≤t≤1 i=1

2 ⎤ − E[Z0 1{|Xi | ≤ aX,n } | X ]}



2 2−α . ≤ na−2 X,n E[X0 1{|X0 | ≤ an } | X ] ≤ cst

Finally, for all δ > 0,





[nt] ⎝ lim lim sup P sup

σ(ξi ){Zi 1{|Xi | ≤ an } →0 n→∞ 0≤t≤1 i=1





− E[Z0 1{|Xi | ≤ aX,n } > an,X δ = 0 .

478

16 Long memory processes

Altogether, we have obtained that a−1 X,n

[nt] 

w

(Zi − E[Z0 ])σ(ξi ) =⇒ ΛZ (t) .

(16.3.9)

i=1

Combining (16.3.7) with (16.3.9) gives the limiting behavior of Sn for α ∈ (1, 2), for both cases E[Z0 ] = 0 and E[Z0 ] = 0. If α = 1, we fix > 0 and write Sn (t) =

[nt] 

{Xi 1{|Xi | > an } − E[X0 1{an < |X0 | ≤ aX,n }]}

i=1

+

[nt] 

{σ(ξi )E[Z0 1{|Z0 |σ(ξi ) ≤ aX,n } | X ] − E[X0 1{|X0 | ≤ aX,n }]}

i=1

(16.3.10a) +

[nt] 

σ(ξi ){Zi 1{|Xi | ≤ aX,n } − E[Z0 1{|Z0 |σ(ξi ) ≤ aX,n } | X ]} .

i=1

(16.3.10b) Again, by the point process convergence and extremal independence, (8.3.7) holds and a−1 X,n

[nt] 

fi.di.

{Xi 1{|Xi | > an } − E[X0 1{an < |X0 | ≤ aX,n }]} −→ ΛZ, (t) .

i=1

The term (16.3.10b) is treated in exactly the same way as the corresponding term (16.3.8b). The term (16.3.10a) is slightly different from (16.3.8a). By (16.2.17), we first obtain the bound ⎛ [nt]  −2 ⎝ aX,n var {σ(ξi )E[Z0 1{|Z0 |σ(ξi ) ≤ aX,n } | X ] i=1

 − E[X0 1{|X0 | ≤ aX,n }]}

! 2 " 2 2 ≤ cst a−2 X,n [nt] ρ[nt] E σ (ξ0 )E [|Z0 |1{|Z0 |σ(ξ0 ) ≤ aX,n } | X ] . By regular variation and Proposition 1.4.6, we have E[|Z0 |1{|Z0 |σ(ξ0 ) ≤ aX,n } | X ] = L(an /σ(ξ0 )) , with L(x) = E[|Z0 |1{|Z0 | ≤ x}] is slowly varying at ∞. Since |X0 | is regularly varying with tail index 1, na−1 X,n is a slowly varying function of n. Thus

16.3 Long memory stochastic volatility processes

⎛ ⎝ a−2 X,n var

[nt] 

479

{σ(ξi )E[Z0 1{|Z0 |σ(ξi ) ≤ aX,n } | X ]

i=1

 − E[X0 1{|X0 | ≤ aX,n }]} ≤ cst

[nt]2 ρ[nt] 2 2 ρn n2 a−2 X,n E[σ (ξ0 )L (aX,n /σ(ξ0 )] . n2 ρn

By Potter’s bound, the moment assumption on σ and the regular variation of ρn , we have 2 lim ρn n2 a−2 X,n E[σ (ξ0 )L(an, /σ(ξ0 )] = 0 .

n→∞

The rest of the argument is exactly similar.



Remark 16.3.4 In the case α = 1, we have actually proved the asymptotic negligibility condition (ANSJU(an )). In the case α > 1 and E[Z0 ] = 0, we have not even tried to, since it cannot hold in general in view of the convergence to a non-stable limit.

16.3.2 Tail empirical process and Hill estimator We now consider the tail empirical process and the Hill estimator of the tail index. The dichotomy in the limiting behavior previously observed still exists, but in a different way and with somewhat surprising effects. We assume that the distribution function F of X0 is continuous and for a scaling sequence un , we set k = nF (un ) and for s > 0, n 1 T#n (s) = 1{Xi > un s} . k i=1

In order to state our result, we need to introduce Gn (s, x) =

P(σ(x)Z0 > un s) . P(X0 > un )

(16.3.11)

As in Chapter 9, we write T (s) = s−α . As noted in Remark 1.4.4, we have for every p ≥ 1 such that pα < α + (with as in (16.3.2)) and every s0 > 0, 

p  (16.3.12) lim sup E Gn (s, ξ0 ) − (E[σ α (ξ0 )])−1 σ α (ξ0 )T (s) = 0 . n→∞ s≥s0

This recovers that lim E[Gn (s, ξ0 )] = lim

n→∞

n→∞

P(X0 > un s) = T (s) . P(X0 > un )

480

16 Long memory processes

For each s > 0, let cj,n (s), j ≥ 0 be the Hermite coefficients of the function Gn (s, ·) and let qn (s) be the Hermite rank of the function x → Gn (s, x) − E[Gn (s, ξ0 )], i.e. ∞  cj,n (s) Hj . j!

Gn (s, ·) − E[Gn (s, ξ0 )] =

(16.3.13)

j=qn (s)

Let also cj (σ α ), j ≥ 0 be the Hermite coefficients of σ α and let q be the Hermite rank of σ α − E[σ α (ξ0 )]. The convergence (16.3.12) implies that for all j ≥ 0 and uniformly with respect to s ≥ s0 , lim cj,n (s) =

n→∞

T (s)cj (σ α ) . E[σ α (ξ0 )]

If cj = 0, then cj,n (s) = 0 for large n, uniformly with respect to s ≥ s0 . This convergence implies that lim supn→∞ qn (s) ≤ q but does not imply that qn (s) converges to q. Therefore we make the following slightly restrictive assumption. Assumption 16.3.5 The Hermite rank qn (s) of the function Gn (x, ·) − E[Gn (x, ξ0 )] is equal to the Hermite rank q of the function σ α − E[σ α (ξ0 )] for large enough n and all s > 0. This assumption holds if q = 1. This is the case, for instance, of the usual choice σ(x) = eax for some a > 0. Theorem 16.3.6 Let Assumptions 16.3.1 and 16.3.5 hold with q(1 − H) < 1/2 and assume that there exists > 0 such that E[σ 2α+ (ξ0 )] < ∞ and limn→∞ nF (un ) = ∞. (i) If limn→∞ nF (un )ρqn cq (σ α ) q!E[σ α (ξ0 )] T Zq,H (1).

=

−q/2

∞, then ρn

(ii) If limn→∞ nF (un )ρqn = 0, then



{T#n − E[T#n ]}

w

=⇒

w k{T#n − E[T#n ]} =⇒ W ◦ T .

Both convergences hold in D((0, ∞)) endowed with the J1 topology.

Proof. As before, let X denote the σ-field generated by the process {ξj , j ∈ Z}. Then n P(Xi > un s | X ) Gn (s, ξi ) = P(Xi > un s | X ) = . k P(X0 > un ) We decompose the tail empirical process as T#n − E[T#n ] = In + Jn with

16.3 Long memory stochastic volatility processes

In (s) = Jn (s) =

1 k 1 n

n 

481

{1{Xi > un s} − P(Xi > un s | X )} ,

i=1 n 

{Gn (s, ξi ) − E[Gn (s, ξ0 )]} .

i=1

The process In is a sum of conditionally independent random variables, so we expect convergence to a Gaussian process at the rate k −1/2 . The second process Jn is a the partial sum process of a subordinated long memory Gaussian process, so we expect a dichotomy as in Theorem 16.2.3. We study these two terms separately. Conditionally independent part For the finite-dimensional convergence, we use the conditional LindebergL´evy central limit Theorem E.2.5. We only do the univariate convergence, the extension being straightforward. In the first place, with FZ the distribution function of Z0 , n 1 E[{1{Xi > un s} − P(Xi > un s | X )}2 | X ] k i=1

=

n 1 P(Xi > un s | X ){1 − P(Xi > un s | X )} k i=1

=

1 Gn (s, ξi ){1 − P(Xi > un s | X )}. n i=1

n

Write Qn (s) = n−1

n

i=1

Gn (s, ξi ). Then, applying (16.2.17) yields

var(Qn (s)) ≤ cst ρqn var(Gn (ξ0 )) ≤ cst ρqn E[G2n (s, ξ0 )] . Under the assumption E[σ 2α+ (ξ0 )] < ∞, we have that for each s > 0, Gn (s, ξ0 ) converges to (E[σ α (ξ0 )])−1 σ α (ξ0 )T (s) in quadratic mean, i.e. lim E[|Gn (s, ξ0 ) − (E[σ α (ξ0 )])−1 σ α (ξ0 )T (s)|2 ] = 0 .

n→∞

This yields that limn→∞ E[G2n (s, ξ0 )] = (E[σ α (ξ0 )])−2 E[σ 2α (ξ0 )]T (s) and this in turn proves that limn→∞ var(Qn (s)) = 0. Since lim E[Qn (s)] = lim E[Gn (s, ξ0 )] = T (s) ,

n→∞

n→∞

we conclude that n

Qn (s) =

1 P Gn (s, ξi ) −→ T (s) . n i=1

(16.3.14)

482

16 Long memory processes

Also,

 E

n

1 k E[Gn (s, ξi )P(Xi > un s | X )] = E[G2n (s, ξ0 )] → 0 . n i=1 n

Thus we have proved that n 1 P E[{1{Xi > un s} − P(Xi > un s | X )}2 | X ] −→ T (s) . k i=1

(16.3.15)

This is condition (E.2.6) of Lemma E.2.5. The asymptotic negligibility con√ dition holds trivially: for every and n large enough n so that k > 2, n 1 E[{1{Xi > un s} − P(Xi > un s | X )}2 k i=1 $ √ % 1 |1{Xi > un s} − P(Xi > un s | X )| > k | X ] = 0 .

Thus condition (E.2.7) of Theorem E.2.5 holds and this proves the central √ limit theorem for kIn . We now prove tightness. For 0 < s < t < u, we have by Theorem E.4.11, k 2 E[{In (t) − In (s)}2 {In (u) − In (t)}2 | X ]  n  n   3  {Gn (t, ξi ) − Gn (s, ξi )} {Gn (u, ξi ) − Gn (t, ξi )} ≤ 2 n i=1 i=1 2  n 3  {Gn (u, ξi ) − Gn (s, ξi )} = 3{Qn (u) − Qn (s)}2 . ≤ 2 n i=1 P

We have already seen that Qn (s) −→ T (s). Thus we can apply Lemma C.2.10 and we obtain the asymptotic equicontinuity condition (C.2.3c). Since the limit process W√◦ T is continuous, we obtain by Theorem C.2.14 the weak convergence of kIn to W ◦ T in D((0, ∞)) endowed with the J1 topology. Long memory part By assumption, for each fixed s0 and large enough n, the Hermite rank of Gn (s, ·) is q. Thus, using (16.3.13) n

Jn (s) =

cq,n (s) 1  Hq (ξi ) + Rn (s). q! n i=1

d −q/2 n If 1 − q(1 − H) > 1/2, Theorem 16.2.3 yields that n−1 ρn i=1 Hq (ξi ) −→ α −1 α Zq,H (1). Since cq,n converges uniformly on [s0 , ∞) to (E[σ (ξ0 )]) cq (σ )T ,

16.3 Long memory stochastic volatility processes −q/2

w

483

α

c (σ )

q we obtain that ρn Jn =⇒ q!E[σ α (ξ )] T Zq,H (1) with respect to the uniform 0 topology on [s0 , ∞). Applying (16.2.16), we obtain

2 q var(Rn (s)) = O(n2 ρq+1 n ) = o(n ρn ) .

To prove the tightness, we use the inequality (16.2.17) which yields, for 0 < s < t, var(Jn (s) − Jn (t)) ≤ cst ρqn var(Gn (s, ξ0 ) − Gn (t, ξ0 )) ≤ cst ρqn E[{Gn (s, ξ0 ) − Gn (t, ξ0 )}2 ] . We have already noticed that Gn (s, ξ0 ) converges in quadratic mean to (E[σ α (ξ0 )])−1 σ α (ξ0 )T (s), thus lim E[{Gn (s, ξ0 ) − Gn (t, ξ0 )}2 ] =

n→∞

E[σ 2α (ξ0 )] (T (s) − T (t))2 . (E[σ α (ξ0 )])2

Thus we can apply Lemma C.2.11 and obtain that (C.2.3c) holds all subin−q/2 tervals [a, b] of (0, ∞). Thus, ρn Jn converges weakly in D((0, ∞)) endowed cq (σ α )  with the J1 topology to q! T Zq,H (1). Random threshold Let k be an intermediate sequence and define un = F ← (1 − k/n). Note that T#n (s) = T (s) + In (s) + Jn (s) + Tn (s) − T (s) = T (s) + OP (k −1/2 + ρq/2 n ) + E[Gn (s, ξ0 )] − T (s) . If the bias term E[Gn (s, ξ0 )] − T (s) vanishes, the tail empirical measure T#n is weakly consistent and this implies by Proposition 9.1.3 that X(n:n−k) / P

un −→ 1. As in Chapter 9, we will now replace the deterministic threshold by an order statistic. We define n ( 1 ' T&n (s) = 1 Xi > X(n:n−k+1) s . k i=1

We will now see that the particular structure of the long memory limit entails a surprising cancelation for the tail empirical process with random threshold, under a mild strengthening of the bias condition used in Theorem 9.4.1.

484

16 Long memory processes

Corollary 16.3.7 Under the assumptions of Theorem 16.3.6 and if there exists s0 ∈ (0, 1) such that " ! lim k sup E |Gn (s, ξ0 ) − (E[σ α (ξ0 )])−1 σ α (ξ0 )T (s)|2 = 0 , (16.3.16) n→∞

s≥s0

then √

w

k(T&n − T ) =⇒ B ◦ T ,

where B is a standard Brownian bridge and the convergence holds in D([s0 , ∞)) endowed with the J1 topology.

Proof. For clarity and brevity, we write G = (E[σ α (ξ0 )])−1 σ α . Then,  n )  1 T#n (s) − T (s) = In (s) + T (s) G(ξi ) − 1 n i=1 n

+

1 {Gn (s, ξi ) − G(ξi )T (s)} n i=1

˜ n (s) . = In (s) + T (s)˜Jn (s) + R Write ζn = X(n:n−k) /un . Then, T&n (s) = T#n (sζn ). As mentioned above, the P bias condition (16.3.16) implies that ζn −→ 1. Since 1 = T#n (ζn ) and T (st) = T (s)T (t), we have {T&n (s) − T (s)} = T#n (sζn ) − T (s) = T#n (sζn ) − T (sζn ) − T (s){T#n (ζn ) − T (ζn )} ˜ n (sζn ) = In (sζn ) + T (sζn )˜Jn + R ˜ n (ζn )} − T (s){In (ζn ) + T (ζn )˜ Jn + R ˜ n (sζn ) − T (s)R ˜ n (ζn ) . = In (sζn ) − T (s)In (ζn ) + R −q/2 The two terms with ˜Jn , which converge at the rate ρn , canceled out. Since √ w w P kIn =⇒ W ◦ T and ζn −→ 1, we obtain that In (ζn ·) − T In (ζn ) =⇒ B ◦ T . √ w ˜ n =⇒ 0. If limn→∞ kρqn = 0, the proof of tightness We now prove that k R is similar to the proof of tightness of the term Jn in the proof of Theorem there exists q ∗ > q 16.3.6. Otherwise, since k = o(n) and ρn satisfies (16.3.3),  ∞ q∗ ˜ n (s) = such that limn→∞ kρn = 0. We then write R j=q Bn,j (s)Sn,j with n

Sn,j =

1 cj,n (s) − cj (G)T (s) . Hj (ξi ) , Bn,j (s) = n i=1 j!

16.4 Bibliographical notes

485

Recall that cn,j (s) is the j-th Hermite coefficient of Gn . Thus (16.3.16) yields, for each j ≥ q √ k sup |cj,n (s) − cj (G)G(s)| → 0 . s≥s0

√ q∗ w q/2 Since Sn,j = OP (ρn ), this yields k j=q Bn,j Sn,j =⇒ 0. For the sum of the terms with indices j ≥ q ∗ , we can use the same technique as in the proof ∗  of Theorem 16.3.6 because of the condition kρqn → 0. As in Chapter 9, we now apply the previous results to the Hill estimator. We will state without proof the following result. Let γ = α−1 and recall that the Hill estimator is defined by γ &n =

k−1 1 log X(n:n−i) − log X(n:n−k) . k i=0

Corollary 16.3.8 Under the assumptions of Corollary 16.3.7, √

d

k(& γn − γ) −→ N(0, γ 2 ) .

16.4 Bibliographical notes Long-range dependent Gaussian processes have been studied for at least 50 years. For a complete picture and comprehensive bibliography, see [BFGK13] and [PT17]. Long-range dependence as a phase transition was introduced by [Sam06]. See also [Sam16]. Section 16.1 More about the L´evy fractional stable motion Lα,H can be found in [ST94, Chapters 7 and 12] and [Sam16, Chapter 3]. A proof of Theorem 16.1.2 can be found in Chapter 9 of the latter reference. Fractional ARIMA models with regularly varying innovations are discussed in [KT95, KT96]. Theorem 16.1.2 and similar results were obtained earlier by [KM86, KM88], [AT92], the latter reference also showing that J1 convergence is impossible for moving averages and that M1 convergence holds if the weights are nonnegative. Results on convergence of partial sums and sample covariances in case of long memory linear processes with regularly varying innovations can be found in

486

16 Long memory processes

[HK08]. A comprehensive overview can be found in [BFGK13, Sections 4.3 and 4.4.3]. A related field which is unfortunately untouched here is the extremal theory of non-Gaussian stable processes and fields. See [Roy17] for a review and further references. Section 16.2 The results of Section 16.2.1 on Gaussian processes are taken from [LLR83]. The condition ρn log(n) → 0, introduced in [Ber64], is usually referred to as Berman’s condition. For models which do not satisfy this condition, see, for instance, [Shi15] (branching random walks) and [CCH16] (Gaussian free fields). The Hermite process appeared first in the context of limits of functions of Gaussian long memory processes simultaneously in [DM79] and [Taq79]. See the monograph [PT17] for more on multiple stochastic integrals and Hermite processes. The best reference for Theorem 16.2.3 is [Arc94]. It is perhaps also the best and most elegant paper on long memory Gaussian processes. It is impossible not to pay tribute to the late author by quoting (the also late) S´ andor Cs¨org˝ o’s review in MathSciNet: “This is a brilliant paper. It extends virtually the whole existing asymptotic distribution theory of partial sums of sequences of random variables that are functions of a real stationary Gaussian sequence to the case when the governing Gaussian sequence consists of vectors; in fact it does more. . . . The proofs operate through reduction to Hermite polynomials, a covariance lemma of independent interest, virtuoso use of the diagram formula throughout and, in the long-range dependent case, the Dobrushin-Major method of multiple Wiener-Itˆ o integrals.“ Theorem 16.2.7 is due to [Dav83, Lemma 6] for the case α ∈ (0, 1) and [SH08] for the case α ∈ (1, 2). It must be noted that in both references the convergence of the point process of exceedences is claimed to be proved, but in [Dav83] the function h is implicitly assumed to be non-decreasing and the proof of [SH08, Proposition 2.1] is difficult to follow. It is not obvious that the same proof will work for increasing functions h as in Example 16.2.4 or for functions h with a pole at 0 as in Example 16.2.5. Nevertheless credit must be given where it is due: the idea to use hypercontractivity in [SH08] is brilliant and justifies the paper alone. The case α = 1 of Theorem 16.2.7 is stated without proof in [SH08, Theorem 1.2].

16.4 Bibliographical notes

487

Section 16.3 Extremes of stochastic volatility models were first considered in [BD98]. This section essentially follows [KS11, KS12]. The results in Section 16.3 can be extended in several directions. First, the point process convergence of Theorem 16.3.2 does not require the specific Assumption 16.3.1 on the volatility sequence. It suffices that the volatility sequence {σj , j ∈ Z} is ergodic and the moment condition (16.3.2) holds. The point process convergence can be obtained via m-dependent approximation, see [DM01]. [KS12] allows for dependence between {σj , j ∈ Z} and {Zj , j ∈ Z} in order to account for leverage effects. Partial sums convergence (16.3.3) can be extended to convergence of powers and models with leverage ([KS12, Theorem 4.1]) as well as to limits for sample covariances ([DM01, Theorem 4.1] and [KS12, Theorems 5.2 and 5.3]). In the latter reference, it is shown that the leverage may have a peculiar influence on the limiting behavior of sample covariances. The limiting result for the tail empirical process and the Hill estimator is taken from [KS11]. Extensions in a multivariate setting are given in [KS13], while [BBIK19] allows for leverage and study the tail empirical process, the Hill estimator, and some extensions. A second-order condition is given in [KS11] which ensures the bias condition (16.3.16) and the proof of Corollary 16.3.8 is given there. Stochastic volatility models that allow for heavy tails in volatility are considered in [MR13] and [JD16] (in the short-memory case) and in [KS15]. Some of these models allow for extremal dependence.

Appendices

A Weak convergence

A.1 Measures on metric spaces In this section we recall the main notions concerning measures on metric spaces which are used throughout the book. The essential references are [Bil99] and [Kal17].

A.1.1 Metric spaces Topological spaces A topological space is an ordered pair (E, τ ), where E is a set and τ is a collection of subsets such that: • The empty set belongs to τ ; • A countable union of subsets in τ also belongs to τ ; • The intersection of any finite number of subsets in τ also belongs to τ . The elements of τ are called open sets and τ is called a topology on E. The complementary of a subset A of E is denoted by Ac . A topological space is said to be separable if there exists a countable dense subset. The closure of a set A of a topological space E is the smallest closed set which contains A and is denoted A. The interior of a set A of a topological space E is the largest open set contained in A and is denoted A◦ . The frontier ∂A of a set A is defined by ∂A = A \ A◦ . © Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4

491

492

A Weak convergence

Let A be a subset of a topological space (E, τ ). The trace topology on A is the topology τA defined by τA = {O ∩ A, O ∈ τ }. Metric spaces A semi-metric on E is a map d : E × E → such that d(x, x) = 0 , d(x, y) = d(y, x) , d(x, y) ≤ d(x, z) + d(y, z) . A metric on E is a semi-metric d which moreover satisfies d(x, y) = 0 if and only if x = y. The open ball with center at x and radius  with respect to d is denoted B(x, ): B(x, ) = {y ∈ E : d(x, y) < }. The topology induced by a semi-metric is the smallest topology which contains the open balls. A topological space (E, τ ) is said to be metrizable if there exists a metric on E which induces the topology τ . There exist non-metrizable topological spaces but we will not meet them. A metric space is said to be complete if every Cauchy sequence has a limit. A Polish space is a topological space homeomorphic to a complete separable metric space. A Banach space is a complete normed vector space. Common examples of Polish spaces are the real line and any separable Banach space. Additionally, some metric spaces that are not complete may still be Polish. For instance, an open ball of a separable Banach space is not complete but is Polish. Product spaces Definition A.1.1 (Product topology) Let I be a non-empty set and let (Ei , τi ), i ∈ I be topological spaces. The product topology on E =  i∈I Ei is the smallest topology which makes the projection maps Πi : j∈I Ej → Ei (defined by Πi (x) = xi , x ∈ E) continuous for all i ∈ I.

Lemma A.1.2 Let (Ei , di ), i ∈ N, be metric spaces. Then the product topology is metrizable. A distance which induces the product topology is given by d(x, y) =

∞  i=0

2−i (di (xi , yi ) ∧ 1) , x, y ∈

 i∈N

Ei .

A.1 Measures on metric spaces

493

(i) A sequence {x(n) } of elements of E converge to x ∈ E if and only if (n) limn→∞ xk = xk for all k ∈ N. (ii) If Ei is a Polish space for all i ∈ N, then E endowed with the product topology is a Polish space. Proof. The direct part of (i) trivial so we only prove the only if part. Fix is ∞  > 0. Let k be such that i=k 2−i ≤ /2. By assumption, there exists n0 (n) such that for all n ≥ n0 , di (xi , xi ) ≤ /2 for all i = 0, . . . , k. Then, for n ≥ n0 d(x

(n)

, x) ≤

k 

2−i (di (xi , xi ) ∧ 1) + /2 ≤  . (n)

i=0

This proves that x

(n)

converges to x in E. 

The second statement is [Bou07, Chapitre IX.6.1, Proposition 1]. Triangular arguments

In a metric space convergence of a sequence can be proved by a triangular argument. This means replacing the sequence at hand by a sequence of sequences which converge pointwise to the original sequence. The following technical lemmas are very often used. Lemma A.1.3 Let (E, d) be a metric space. Let y ∈ E, {xn , n ∈ N}, {xm,n , n ∈ N}, m ≥ 1 and {ym , m ≥ 1} be sequences such that lim xm,n = ym ,

n→∞

lim ym = y .

n→∞

Then there exist a non-decreasing sequence {mn } such that mn → ∞ and lim xmn ,n = y .

n→∞

(A.1.1)

Proof. Define doubly indexed sequence w by wm,n = supk≥n d(xm,k , ym ). By assumption, for each fixed n, wm,n is non-increasing with respect to n and tends to zero as n tends to infinity. Set km = inf{n : wm,n ≤ 1/m} . The sequence {km } is non-decreasing by definition. If it is bounded, setting mn = n yields the result. If km → ∞, set mn = sup{m : km ≤ n} . Then mn → ∞, kmn ≤ n and since wm,n is decreasing with respect to n, wmn ,n ≤ wmn ,kmn ≤ 1/mn → 0 . Thus limn→∞ xmn ,n = y.



494

A Weak convergence

Lemma A.1.4 Let (E, d) be a metric space. Let y ∈ E, {xn , n ∈ N}, {xm,n , n ∈ N}, m ≥ 1 and {ym , m ≥ 1} be sequences such that lim xm,n = ym ,

n→∞

lim ym = y ,

n→∞

lim lim sup d(xm,n , xn ) = 0 .

m→∞ n→∞

Then limn→∞ xn = y.

Proof. Writing d(xn , y) ≤ d(xn , xm,n ) + d(xm,n , ym ) + d(ym , y), we obtain, for every m ≥ 1, lim sup d(xn , y) ≤ lim sup d(xn , xm,n ) + lim sup d(xm,n , ym ) + d(ym , y) n→∞

n→∞

n→∞

= lim sup d(xn , xm,n ) + d(ym , y) . n→∞

Letting m tend to infinity in the right-hand side concludes the proof.



Lemma A.1.5 Let E, F be metric spaces and let g : E → F be a continuous function. Then ∂(g −1 (A)) ⊂ g −1 (∂A) for all A ⊂ F. Proof. Let A ⊂ F and x ∈ ∂(g −1 (A)). Since E is there exists a sequences {xn } ⊂ g −1 (A) and {yn } ⊂ (g −1 (A))c such that limn→∞ xn = limn→∞ yn = x. Since g is continuous, we also have limn→∞ g(xn ) = limn→∞ g(yn ) = g(x). / A for all n. Thus g(x) ∈ ∂A.  By construction, g(xn ) ∈ A and g(yn ) ∈

A.1.2 Measures Let E be a set. The set of all subsets of E is denoted by P(E). Let E be a non-empty set. • A π-system A is a subset of P(E) which is stable by finite intersections: for all A, B ∈ A, A ∩ B ∈ A. • A subset E of P(E) is a σ-field if ∅ ∈ E, A ∈ E implies Ac ∈ E and if {Ai , i ≥ 1} is a countable collection of sets in E, then ∪i≥1 Ai ∈ E. A measurable space is a pair (E, E), where E is a non-empty set and E is a σ-field. A subset A of E which belongs to E will be called E-measurable or simply measurable if there is no risk of confusion. • The σ-field generated by a subset A of P(E) is the smallest σ-field which contains A. It is the collection of all countable unions or intersections of sets in A or whose complement is in A.

A.1 Measures on metric spaces

495

• Let E be a topological space. The Borel σ-field of E is the σ-field generated by the open sets of E. Subsets of E in the Borel σ-field are called Borel sets. The Borel σ-field of Rd is the Borel σ-field with respect to the Euclidean metric. Definition A.1.6 Let (E, E) be a measurable space. An application μ defined on E with values in [0, ∞] is a measure if μ(∅) = 0 and for all countable collection ∞ {Ai , i ≥ 1} of pairwise disjoint E-measurable sets, μ(∪∞ i=1 Ai ) = i=1 μ(Ai ). A probability measure on (E, E) is a measure such that μ(E) = 1. The set of probability measures on E is denoted by M1 (E). Let μ be measure on a topological space E endowed with its Borel σ-field. A set A is a continuity set of μ if A is a Borel set and μ(∂A) = 0.

Theorem A.1.7 Let A be a π-system and F be the σ-field generated by A. Two probability measures which are equal on A are equal on F.

Proof. [Bil86, Theorem 3.3].



As a corollary, we obtain a criterion for the equality of two measures. Lemma A.1.8 Let E be a Polish space. Let μ and ν be two measures on the Borel σ-field such that ν(A) = μ(A) for all common bounded continuity sets. Then μ = ν. Proof. Let d be a metric which induces the topology of E. Let μ be a measure and O be a bounded open set. For s > 0 set Os = {x ∈ O, d(x, Oc ) > s}. Note that Os = ∅ for s small enough since every x ∈ O is included in a ball inside O. Also, ∪s t. The function s → μ(Os ) is non-increasing and limt→s,t 0 we write   BLk (E) = f : E → R : sup |f (x)| ≤ k , |f (x) − f (y)| ≤ kd(x, y) , x, y ∈ E . x∈E

(A.2.1) We define the metric dBL on the set of probability measures on E by dBL (μ, ν) =

sup f ∈BL1 (E,d)

|μ(f ) − ν(f )| .

(A.2.2)

Theorem A.2.4 Let (E, d) be a complete separable metric space. Then (M1 (E), dBL ) is a complete separable metric space and if {μn , n ≥ 0} is a sequence of probability measures on E, μn converges weakly to μ ⇐⇒ lim dBL (μn , μ) = 0 . n→∞

(A.2.3)

Proof. [Dud02, Theorem 11.3.3].

Theorem A.2.5 (Continuous mapping) Let E be a metric space endowed with its Borel σ-field. Let {μn , n ∈ N} be a sequence of proba-



A.2 Weak convergence

499

bility measures which converges weakly to a probability measure μ. Let f be a real-valued function which is continuous μ almost everywhere. Then limn→∞ μn (f ) = μ(f ).

Proof. [Bil68, Theorem 5.1]



A sequence of functions {fn , n ∈ N} defined on a set E is said to converge locally uniformly to f on F ⊂ E if limn→∞ fn (xn ) = f (x) for all x ∈ F and all sequences {xn , n ∈ N} such that limn→∞ xn = x. Theorem A.2.6 (Extended continuous mapping) Let E be a metric space endowed with its Borel σ-field. Let {μn , n ∈ N} be a sequence of probability measures which converges weakly to a probability measure μ. Let F ⊂ E be a Borel set such that μ(F) = 1. If {fn } is a sequence of functions which converges locally uniformly on F to a function f , then limn→∞ μn (fn ) = μ(f ).

Proof. [Bil68, Theorem 5.5]



Lemma A.2.7 Let A be a π-system which countably generates the open sets (i.e. every open set is a countable union of sets in A). Let {μn , n ∈ N} be a sequence of probability measures such that limn→∞ μn (A) = μ0 (A) for every A ∈ π. Then μn converges weakly to μ0 . Proof. [Bil99, Theorem 2.2]



Lemma A.2.8 Let (E, d) be a separable metric set. A sufficient condition for a π-system A to countably generate the open sets is that for every x ∈ E and  > 0, there exists A ∈ A such that x ∈ A◦ and A ⊂ B(x, ). Proof. [Bil99, Theorem 2.3]



Note that if A countably generates the open sets it also countably generates the Borel σ-field. Lemma A.2.9 Let M be a countable collection of measures on a Polish space E. Let AM be the class of continuity sets for all μ ∈ M . Then AM is convergence determining.

500

A Weak convergence

Proof. Let d be a metric which induces the topology of E. Let μ be a measure and O be a bounded open set. For s > 0 set Os = {x ∈ O, d(x, Oc ) > s}. Note that Os = ∅ for s small enough since every x ∈ O is included in a ball inside O. Also, ∪s t. The function s → μ(Os ) is non-increasing and limt→s,t 0 lim lim sup P(d(Xm,n , Xn ) > η) = 0 ,

m→∞ n→∞ d

then Xn −→ X.

Proof. [Bil68, Theorem 4.2]



Theorem A.2.11 (Continuity theorem for Laplace transform) Let X, Xn , n ≥ 1 be non-negative random variables such that limn→∞ E[e−tXn ] = d E[e−tX ] for all t ≥ 0. Then Xn −→ X.

A.2 Weak convergence

501

Proof. [Fel71, Theorem XIII.1.2]



A.2.1 Relative compactness and tightness A useful tool to prove convergence of a sequence in a metric space is relative compactness. A sequence in a metric space is said to be relatively compact if from every subsequence a further converging subsequence can be extracted. A relatively compact sequence which admits only one limit along any subsequence converges to this limit. A convenient criterion for the relative compactness of a set of probability measures is tightness. Definition A.2.12 A collection of probability measures {μi , i ∈ I} on a metric space E is said to be tight if for any  > 0 there exists a compact set K of E such that μi (K) > 1 −  for all i ∈ I.

The next result is known as Prokhorov’s theorem. Theorem A.2.13 A tight set of probability measures on a metric space is relatively compact. If moreover the space is Polish, then the converse is true.

Proof. [Bil99, Theorem 5.1]



The converse part of Prokhorov’s theorem implies that a finite set of probability measures on a Polish space is tight. The main use of Prokhorov’s theorem is to prove that a sequence of probability measures on a metric space converges to a given probability measure by showing that the set is tight and all convergent subsequences converge to this said limit.

A.2.2 Product spaces Daniell-Kolmogorov extension Theorem Theorem A.2.14 Let En , n ∈ N ∪ {∞} be Polish spaces endowed with their Borel σ-field En . Assume that for all k, m ∈ N, n ∈ N ∪ {∞} with k ≤ m ≤ n, there exist continuous surjective maps pm,n : En → Em and

502

A Weak convergence

pn : E∞ → En such that pk,m ◦ pm,n = pk,n and pm,n ◦ pn = pm . Assume ∞ furthermore that E∞ = n=1 p−1 n (En ). For n ≥ 1, let νn be a probability measure on En such that νm = νn ◦p−1 m,n for all m ≤ n ∈ N. Then there exists a probability measure ν∞ on E∞ such that νn = ν∞ ◦ p−1 n for all n ∈ N.

Proof. The proof is an adaptation of the proof of [Pol02, Chapter 4, Theorem 53] which is written for product spaces, but the product structure plays no particular role and is replaced here by the surjectivity and consistency of ∞  the maps pm,n and the condition E∞ = n=1 p−1 n (En ). n ∞ Example A.2.15 Assume that En = i=1 Fi and E∞ = i=1 Fi where Fi are Polish spaces and En is endowed with the product topology for all n ∈ N ∪ {∞}. Then the conditions of Theorem A.2.14 hold with pm,n the  canonical projection of En on Em for m < n. Weak convergence in product space (n)

Let X(n) = {Xj , j ∈ N}, n ≥ 1, be a sequence of random elements in a product space E = i∈Z Ei , where all the Ei are Polish spaces and E is endowed with the product topology which makes it a Polish space by Lemma A.1.2. fi.di.

We say that X(n) −→ X, if (n)

(n)

d

(X0 , . . . , Xk ) −→ (X0 , . . . , Xk ) , in

k i=0

Ei endowed with the product topology for all k ∈ N. d

fi.di.

Lemma A.2.16 X(n) −→ X in E if and only if X(n) −→ X. Proof. [vdVW96, Theorem 14.8]

A.2.3 The convergence to Types Theorem Theorem A.2.17 (Convergence to Types) Let {Fn , n ∈ N} be a sequence of probability distribution functions. Let {an , n ∈ N}, {bn , n ∈ N}, {αn , n ∈ N} and {βn , n ∈ N} be sequences of real numbers such that an ≥ 0 and αn > 0 for all n. Let U and V be two probability distribution functions, each taking at least three values, such that



A.2 Weak convergence

lim Fn (an x + bn ) = U (x),

n→∞

503

lim Fn (αn x + βn ) = V (x) ,

n→∞

at all continuity points x of U and V . Then, there exist A > 0 and B ∈ R such that lim an /αn = A,

n→∞

lim (bn − βn )/αn = B ,

n→∞

U (x) = V (Ax + B) .

Proof. [Res87, Proposition 0.2]

(A.2.4) (A.2.5)



The sequences {an } and {αn } are said to have same type or to be equivalent up to type. This result can be expressed in terms of random variables: if (Xn − bn )/an converges weakly to X and (Xn − βn )/αn converges weakly to d

Y then (A.2.4) holds and Y = AX + B.

B Vague# convergence

B.1 Vague# convergence The essential ideas for this section are due to [DVJ08] and [Kal17]. We follow here the presentation of [BP19].

B.1.1 Boundedness There are many concepts of convergence of measures on a Polish space, each characterized by a particular class of test functions. If we test convergence on bounded continuous functions, weak convergence as in the sense of Definition A.2.1 is obtained. If we choose continuous functions with compact support, the classical notion of vague convergence is recovered. It appears that what really matters is the notion of bounded sets, which is not a topological notion, but can be defined intrinsically without reference to a metric. Definition B.1.1 Let E be a set. A boundedness B on E is a collection of subsets of E, called bounded sets, with the following properties: (i) a finite union of bounded sets is bounded; (ii) a subset of a bounded set is a bounded set. • If E is a topological space, a boundedness is said to properly localize E, if there exists a sequence {Un , n ∈ N} of open bounded sets such that U n ⊂ Un+1 for all n ∈ N, ∪∞ n=1 Un = E and for every bounded set B, there exists n ∈ N such that B ⊂ Un . Such a sequence is called a properly localizing sequence.

© Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4

505

506

B Vague# convergence

• A localized Polish space is a pair (E, B) where E is a Polish space and B is a properly localizing boundedness. • The set of bounded continuous functions with B-bounded support is denoted by C(B).

Remark B.1.2 It is important to note that a properly localizing boundedness always contains the compact sets. Indeed, if K is compact, since ⊕ K ⊂ ∪∞ n=1 Un , there exists n such that K ⊂ Un ; hence K is bounded. If E is a metric space, then the metrically bounded sets form a localized boundedness. Most importantly, there is a converse to this elementary property. Theorem B.1.3 Let (E, B) be a localized Polish space. Then there exists a metric d which induces the topology of E such that E is complete and B ∈ B if and only if B is bounded for d.

Proof. [BP19, Theorem 2.4].



For such a metric, if B is bounded, then the -enlargement of B is still bounded. Also, given any x0 ∈ E, a properly localizing sequence is given by {B(x0 , n), n ∈ N}, the sequence of open balls of radius n. We now give the main examples of boundednesses which are used in this book. Example B.1.4 Let E a Polish space and let x0 ∈ E. Set E0 = E \ {x0 }. Consider the boundedness B0 on E0 which consists of the sets which are separated from x0 , i.e. B ∈ B0 if and only if there exists an open set U such that x0 ∈ U and U ⊂ B c . Equivalently, if d is a metric which induces the topology of E, then B ∈ B0 if and only if there exists  > 0 such that x ∈ B =⇒ d(x, x0 ) >  .

(B.1.1)

A properly localizing sequence for this boundedness is {Un , n ∈ N} with c Un = B (x0 , n−1 ). A metric d0 which induces the trace topology of E on E0 and for which E0 is Polish and the sets of B0 are bounded is given by d0 (x, y) = [d(x, y) ∧ 1] ∨ |d−1 (x0 , x) − d−1 (x0 , y)| .

(B.1.2) 

B.1 Vague# convergence

507

Example B.1.5 As a particular example of Example B.1.4, define the boundedness B0 on Rd (endowed with any norm) which consists of sets that are separated from 0. A properly localizing sequence is Un = {x ∈ Rd : |x| > n−1 }. The space Rd \ {0} is complete for the distance d0 (x, y) = [|x − y| ∧ 1] ∨ [|1/ |x| − 1/ |y| |] , x, y ∈ Rd \ {0} . Example B.1.6 More generally, let E a Polish space and let F be a closed subset of E. Let d be a metric which induces the topology of E. Set EF = E\F and let BF be the class of sets B for which there exists  > 0 (depending on B) such that x ∈ B =⇒ d(x, F ) >  .

(B.1.3)

The distance of a point to a set is defined by d(x, F ) = inf{d(x, y), y ∈ F }. Then BF is a boundedness on EF and a metric dF which induces the trace topology of E on EF and the boundedness BF is given by dF (x, y) = [d(x, y) ∧ 1] ∨ |d−1 (x, F ) − d−1 (y, F )| .

(B.1.4)

A properly localizing sequence is {Un , n ∈ N} with Un = {x ∈ F c : d(x, F ) >  n−1 }. Example B.1.7 We have considered boundednesses which admit a properly localizing sequence. Conversely, let E be a Polish space and let {On , n ∈ N} be an increasing sequence of open sets such that On ⊂ On+1 for all n ∈ N. Define O∞ = ∪∞ n=1 On . Then O∞ is an open subset of a Polish space, hence a Polish space itself. In O∞ , we can define a boundedness B, called the boundedness induced by the sequence {On , n ∈ N}, by defining a bounded set as one which is included in one of the On . Then (O∞ , B) is a localized Polish space. We could have defined the boundednesses in the previous examples in this way. In Example B.1.6, we can define On = Un = {x ∈ E : d(x, F ) > n−1 }, O∞ = EF and we see that the boundedness BF is induced by the sequence {On , n ∈ N}.  Subspaces of product spaces Example B.1.8 Let E1 and E2 be two sets endowed with boundednesses B1 and B2 . Consider E = E1 × E2 . Let B∩ be the boundedness defined as follows: a subset of E is B∩ -bounded if it is included in a rectangle B1 × B2 with Bi ∈ Bi , i = 1, 2. Let d1 , d2 be distances on E1 and E2 which induce their respective topology and boundedness. Then, B∩ is induced by the metric d∩ defined on E by d∩ ((x1 , x2 ), (y1 , y2 )) = d1 (x1 , y1 ) + d2 (x2 , y2 ).  Example B.1.9 Let Ei = R \ {0}, i = 1, 2 and consider the boundednesses Bi , i = 1, 2, that consist of sets separated from 0 as in Example B.1.5. Then

508

B Vague# convergence (i)

B∩ consists of sets separated from both axes. Let {Un , n ∈ N}, i = 1, 2, be (1) (2) the corresponding localizing sequences. Then Un = Un × Un , n ∈ N, is localizing for B∩ . Example B.1.10 The space of interest may be a subspace of a product space. Let E1 and E2 be two Polish spaces. Let Gi be a non-empty open subset of Ei , i = 1, 2, each of Gi endowed with a boundedness Bi . Define E = (G1 × E2 ) ∪ (E1 × G2 ) and let B be defined by B ∈ B ⇐⇒ ∃(B1 , B2 ) ∈ B1 × B2 , B ⊂ (B1 × E2 ) ∪ (E1 × B2 ) . (i)

If {Un , n ∈ N} is a localizing sequence in Gi , i = 1, 2, then the sequence {Un , n ∈ N} defined by Un = (Un(1) × E2 ) ∪ (E1 × Un(2) ) is localizing for B. Note that B ∈ B implies neither Π1 (B) ∈ B1 nor Π2 (B) ∈  B2 , i.e. both projections could be unbounded. Example B.1.11 We can further specify Example B.1.10 by considering the framework of Example B.1.6. Let E1 and E2 be two Polish spaces. Let Fi , i = 1, 2 be non-empty closed proper subsets of Ei and let Gi = Ei \ Fi . Let di be a metric which induces the topology of Ei and let each Gi be endowed with the boundedness Bi defined in Example B.1.6, for which a properly localizing (i) sequence is Un = {xi ∈ Gi , d(xi , Fi ) > n−1 }. Define E = (E1 × E2 ) \ (F1 × F2 ) = (E1 × G2 ) ∪ (G1 × E2 ) . Let B be the boundedness defined by B ∈ B ⇐⇒ ∃(B1 , B2 ) ∈ B1 × B2 , B ⊂ (B1 × E2 ) ∪ (E1 × B2 ) . Equivalently, a set B ⊂ E is bounded if and only if there exists  > 0 such that (x1 , x2 ) ∈ B =⇒ d1 (x1 , F1 ) ∨ d2 (x2 , F2 ) >  . The sequence {Un , n ∈ N} defined by Un = (Un(1) × E2 ) ∪ (E1 × Un(2) ) = {(x1 , x2 ) ∈ E1 × E2 : d1 (x1 , F1 ) ∨ d2 (x2 , F2 ) > n−1 } is properly localizing for B. Note again that B ∈ B implies neither Π1 (B) ∈  B1 nor Π2 (B) ∈ B2 , i.e. both projections could be unbounded. Example B.1.12 An important case is given by Ei = Rd and Gi = Rd \ {0}. Then E = R2d \ {0} (since (x1 , x2 ) = 0 ⇐⇒ x1 = 0 or x2 = 0) and the joint boundedness consists of sets separated from 0 in R2d . Is important to note that the projection of a bounded set need not be a bounded set in the corresponding space. 

B.1 Vague# convergence

509

Example B.1.13 Let E1 and E2 be two Polish spaces. Let F1 be a non-empty closed proper subset of E1 and let G1 = E1 \ F1 . Let d1 be a metric which induces the topology of E1 and let G1 be endowed with the boundedness B1 defined in Example B.1.6, for which a properly localizing sequence is (1) Un = {x1 ∈ G1 : d1 (x1 , F1 ) > n−1 }. Define E = G1 × E2 . Let B be the boundedness defined by B ∈ B ⇐⇒ ∃B1 ∈ B1 , B ⊂ B1 × E2 . Equivalently, a set B ⊂ E is bounded if and only if there exists  > 0 such that (x1 , x2 ) ∈ B =⇒ d1 (x1 , F1 ) >  . The sequence {Un , n ∈ N} defined by Un = Un(1) × E2 = {(x1 , x2 ) ∈ E1 × E2 : d1 (x1 , F1 ) > n−1 } is properly localizing for B. Note that here B ∈ B implies Π1 (B) ∈ B1 . On the other hand, if E2 is equipped with a boundedness B2 , then Π2 (B) does  not need to be in B2 . Example B.1.14 A particular case of Example B.1.13 is given by E1 = R, G1 = R\{0} and E2 = Rd . Then E = (R\{0})×Rd and the joint boundedness consists of sets B such that x = (x0 , . . . , xd ) ∈ B if |x0 | >  for some  > 0.  The projection Π1 (B) of a bounded set B is a bounded set in R \ {0}. The existence of a metric which defines the boundedness is useful theoretically, but other metrics will be very useful. Definition B.1.15 Let (E, B) be a localized Polish space. A metric d on E is said to be compatible if it induces the topology of E and if for every bounded set B ∈ B, there exists  > 0 such that the -enlargement of B is still bounded.

B.1.2 Vague# convergence Definition B.1.16 Let (E, B) be a localized Polish space.

510

B Vague# convergence

• A measure ν on E is said to be B-boundedly finite if ν(A) < ∞ for all B-bounded Borel sets A. The set of B-boundedly finite measures on E is denoted MB . • A sequence {νn , n ∈ N} of B-boundedly finite measures converges B-vaguely# to a measure ν if limn→∞ νn (f ) = ν(f ) for all bounded continuous functions f with B-bounded support. • The topology of vague convergence on MB is the smallest topology on MB which makes the maps μ → μ(f ) continuous for all bounded continuous functions f with B-bounded support.

When there is no risk of confusion, we will omit the prefix B. If the boundedness consists of all subsets of E, weak convergence is recovered. If E is a locally compact separable Hausdorff space and the boundedness consists of sets which are included in a compact set, then vague# convergence coincides with the more classical form of vague convergence. We will see that the space of B-boundedly finite measures endowed with the topology of B-vague# convergence is a Polish space. First we state a form of the Portmanteau theorem. Theorem B.1.17 Let {μn , n ∈ N} be a sequence of boundedly finite measures on a localized Polish space (E, B). The following statements are equivalent: v#

(i) νn −→ ν; (ii) limn→∞ νn (A) = ν(A) for all bounded Borel sets such that ν(∂A) = 0; (iii) for all bounded Borel sets, μ(A◦ ) ≤ lim inf μn (A) ≤ lim sup μn (A) ≤ μ(A) ; n

n

(iv) limn→∞ νn (f ) = ν(f ) for all d-Lipschitz continuous functions f with bounded support, for a compatible metric d on (E, B).

Proof. This is essentially [Kal17, Lemma 4.1]. We only need to prove that the class of bounded Lipschitz functions with respect to any compatible metric d can be used as approximating class of the bounded sets. Indeed, for every bounded set A, the functions fn and gn defined by

B.1 Vague# convergence

fn (x) = {nd(x, (A◦ )c )} ∧ 1 ,

511

gn (x) = (1 − nd(x, A))+

are bounded Lipschitz continuous functions. The support of fn is included in A hence is bounded and the support of gn is included in A1/n thus is bounded for large enough n since d is compatible. Furthermore, lim fn = 1A◦ ,

n→∞

lim gn = 1A .

n→∞



Example B.1.18 The characterization (iv) implies that in Examples B.1.4 and B.1.6, vague# convergence is characterized by bounded functions with bounded support which are Lipschitz continuous with respect to the original distance which is compatible (in the sense of Definition B.1.15). This is usually easier than working with the ad hoc metric defined in (B.1.2) and (B.1.3). In the framework of Example B.1.6, if a set B is bounded, then it is included in a set Un = {x ∈ F c : d(x, F ) > n−1 }. Thus, the (2n)−1 enlargement of B is included in U2n , hence is bounded. Corollary B.1.19 Let (E, B) be a localized Polish space. The following statements are equivalent: v#

1. μn −→ μ; 2. there exists a properly localizing sequence {Uk , k ∈ N} such that μ(∂Uk ) = 0 and μn |Uk converge weakly to μ |Uk for all k ∈ N.

Proof. The direct implication only must be proved. Let d be a metric on E which induces the topology of E and for which the boundedness coincides with the metrically bounded sets (see Theorem B.1.3). Then we can choose a properly localizing sequence {Uk , k ≥ 1} such that for each k ≥ 1, Uk ⊂ Uk+1 for  small enough. Take, for instance, Un = B(x0 , n) for an arbitrary x0 ∈ E. Then, for each k ≥ 1, either μ(∂Uk ) = 0 or μ(∂Uk ) = 0 for all but countably many  > 0. Thus if μ(∂Uk ) > 0, Uk can be replaced by Uk for  such that Uk ⊂ Uk+1 and μ(∂Uk ) = 0. This means that for any boundedly finite measure μ, there exists a properly localizing sequence {Uk , k ≥ 1} v#

such that μ(∂Uk ) = 0 for all k ≥ 1. Finally, if μn −→ μ and μ(∂Uk ) = 0, then limn→∞ μn (Uk ) = μ(Uk ). This proves that the restrictions to Uk of the  measures μn converge to the restriction of μ to Uk . Theorem B.1.20 The space MB equipped with the vague# topology is Polish.

512

B Vague# convergence



Proof. [BP19, Theorem 3.1]. Theorem B.1.21 (Mapping theorem) Let E and E be two Polish spaces endowed with boundednesses B and B . Let h : E → E be a measurable map such that h−1 (B ) ∈ B if B ∈ B . Let {νn , n ≥ 1} be a sequence of B-boundedly finite measures on E which converges B-vaguely# to ν. If ν(Dh ) = 0, then νn ◦ h−1 converges B -vaguely# to ν ◦ h−1 on E .



Proof. [HL06, Theorem 2.5]

We next state a continuous mapping theorem for possibly unbounded functions with possibly unbounded support. Theorem B.1.22 Let (E, B) be a localized Polish space with a proper localizing sequence {Un , n ∈ N}. Let {νn , n ∈ N} be a sequence of Bboundedly finite measures on E which converges B-vaguely# to ν. Let h : E → R be a ν-almost surely continuous map such that lim lim sup νn (|h|1{Ukc }) = 0 ,

(B.1.5a)

lim lim sup νn (|h|1{|h| > A}) = 0 .

(B.1.5b)

k→∞ n→∞ A→∞ n→∞

Then ν(|h|) < ∞ and limn→∞ νn (h) = ν(h).

Proof. Note first that the assumptions entail that ν(|h|) < ∞. By Corollary B.1.19, the localizing sequence can be chosen such that ν(∂Uk ) = 0 for all k ≥ 1. Thus h1{Uk } is almost surely continuous with respect to ν for all k ≥ 1. Also, we can choose an increasing sequence Ak such that h1{|h| ≤ Ak } is almost surely continuous with respect to ν. Then, for every k ≥ 1, lim νn (h1{Uk }1{|h| ≤ Ak }) = ν(h1{Uk }1{|h| ≤ Ak }) .

n→∞

By dominated convergence, we also have limk→∞ ν(h1{Uk }1{|h| ≤ Ak }) = ν(h). The conclusion follows from the triangular argument Lemma A.1.4.  We will need a criterion for relative compactness of a set of boundedly finite measures. Lemma B.1.23 Let (E, B) be a localized Polish space. A set of measures M ⊂ MB is relatively compact if and only if

B.1 Vague# convergence

513

(RCi) supμ∈M μ(B) < ∞ for all B ∈ B; (RCii) for every  > 0 and every B ∈ B, there exists a compact set K such that sup μ(B \ K) ≤  .

μ∈M

If {Un , n ∈ N} is a localizing sequence, then it suffices to check (RCi) and (RCii) for each Un instead of all B ∈ B. Proof. [Kal17, Theorem 4.2]



Remark B.1.24 If E is a locally compact topological space (i.e. every point has a relatively compact neighborhood) and the bounded sets are the relatively compact sets, then (RCii) is superfluous. ⊕ Example B.1.25 Consider the space E = (0, ∞] endowed with the usual topology and let B be the class of relatively compact sets, i.e. sets included in a set (, ∞] for one  > 0. Then a set M of boundedly finite measures is relatively compact if and only if supμ∈M μ((, ∞]) < ∞ for all  > 0. This follows immediately from Remark B.1.24.  Example B.1.26 Consider the space E = (0, ∞) endowed with the usual topology and let B be the class of sets bounded away from zero, i.e. sets included in a set (, ∞) for one  > 0. Then a set M of boundedly finite measures is relatively compact if and only if limx→∞ supμ∈M μ((x, ∞)) = 0. The difference with the previous example is that in the present case there exist bounded sets which are not relatively compact, we must check condition (RCii).  Example B.1.27 (Logarithmic tails) (i) Consider a sequences of measures μn on (0, ∞] such that μn ((x, ∞]) = (nx)/(n), where  is a positive non-increasing slowly varying function on (0, ∞). Consider vague convergence relative to the boundedness K of relatively compact sets of (0, ∞]. Then (RCi) holds and {μn } is relatively compact. Actually, μn converges K-vaguely# to the Dirac mass at ∞. Indeed, for every x < y < ∞, limn→∞ μn ((x, y]) = 0 and limn→∞ μn ((x, ∞]) = 1. (ii) Consider the same measures μn defined on (0, ∞) and let B be the class of sets bounded away from zero. Then {μn } is not B-vaguely# relatively compact since (RCii) fails to hold. Indeed: for x>0, supn≥1 μn ((x, ∞))≥1. This illustrates the fact that different boundednesses induce different notions of vague convergence.  Example B.1.28 Let E = Rd \ {0} be endowed with the usual topology and let B be the class of sets separated from zero, i.e. sets included in the complement of a ball centered at zero (for any norm). Then a set M of boundedly finite measures is relatively compact if and only if

514

B Vague# convergence

∀r > 0 ,

sup μ(B c (0, r)) < ∞ ,

(B.1.6a)

μ∈M

lim sup μ(B c (0, r)) = 0 .

(B.1.6b)

r→∞ μ∈M

The condition (B.1.6b) prevents the mass to escape at infinity as in Example B.1.27 (ii).  The following criterion generalizes the previous example and will be useful in locally compact spaces such as complements of a closed cone in a Euclidean spaces. Lemma B.1.29 Let (E, B) be a localized Polish space. Let {Un , n ∈ Z} be a non-decreasing sequence of bounded sets such that (i) For every bounded set B there exists n ∈ Z such that B ⊂ Un ; (ii) for all n ∈ Z, U n \ Un−1 is compact. A set M of B-boundedly finite measures is relatively compact if ∀n ∈ Z , lim

sup μ(Un ) < ∞ ,

(B.1.7a)

sup μ(Un ) = 0 .

(B.1.7b)

μ∈M

n→−∞ μ∈M

Proof. Let B be a bounded set. By assumption, there exists n ∈ Z such that B ⊂ Un . Thus, assumption (B.1.7a) implies sup μ(B) ≤ sup μ(Un ) < ∞ .

μ∈M

μ∈M

That is, (RCi) holds. Note now that for all k, n ∈ Z such that k < n, assumption (ii) implies that U n \ Uk is compact. Thus, for  > 0, by (B.1.7b), we can choose k such that sup μ(B \ (U n \ Uk )) ≤ sup μ(U n \ (U n \ Uk )) = sup μ(Uk ) ≤  .

μ∈M

μ∈M

μ∈M

Thus (RCii) holds and M is B-vaguely# relatively compact. c



In Example B.1.28, setting Un = B (0, e−n ) for n ∈ Z yields a sequence which satisfies the conditions of Lemma B.1.29. Convergence determining classes of functions Let B∩ be the boundedness defined in Example B.1.8.

B.1 Vague# convergence

515

Lemma B.1.30 Let (E1 , B1 ) and (E2 , B2 ) be two localized Polish spaces. Let the product space E1 ×E2 be endowed with the product topology and the boundedness B∩ . Then the class of functions f such that f (x, x ) = f1 (x)f2 (x ) with f1 , f2 bounded continuous functions with bounded support in E1 and E2 , respectively, is convergence determining for the vague# convergence associated to the boundedness B∩ . If d1 and d2 are metrics which induce the topologies and boundednesses of E1 and E2 , respectively, then f1 and f2 can be assumed to be Lipschitz functions. Proof. This follows Theorem B.1.17 and [Kal17, Lemma 4.1].



We now provide a measure determining class of functions for a particular case of Examples B.1.11 and B.1.12. Lemma B.1.31 Let E1 and E2 be Polish spaces. Let (x∗1 , x∗2 ) ∈ E1 × E2 and define E = (E1 × E2 ) \ {(x∗1 , x∗2 )}. Let B be as in Example B.1.11 (with Fi = {x∗i }). Let C be the class of bounded continuous functions f on E such that either (a) there exists a bounded set A1 ∈ B1 such that f has support in A1 × E2 ; (b) there exists a bounded continuous function g defined on E2 with support in B2 such that f (x, y) = g(y) for all (x, y) ∈ E. Then C is measure determining for boundedly finite measures on (E, B). Proof. Let f ∈ C(B). The support of f is included in a set of the form (A1 × E2 ) ∪ (E1 × A2 ) with Ai ∈ Bi , i = 1, 2. For a function f defined on E, set f2 (x1 , x2 ) = f (x∗1 , x2 ) , f1 = f − f2 . / A1 , the support of f2 is included in E1 × A2 , Then f = f1 + f2 and since x∗1 ∈ hence is bounded. Thus f2 ∈ C and f1 is also bounded continuous with bounded support. For n ≥ 1, let φn be defined on E by φn (x1 , x2 ) = {n(d(x1 , x∗1 ) − 1)+ } ∧ 1. Then φn is bounded and Lipschitz continuous, has bounded support U n × E2 with Un = {x1 ∈ E1 : d1 (x1 , x∗1 ) > n−1 }, and increases to 1G1 ×E2 . Thus f1 φn ∈ C for all n ≥ 1 and limn→∞ f1 φn = f1 . Since f1 is bounded with bounded support, this implies the dominated convergence theorem that limn→∞ μ(f1 φn ) = μ(f1 ). This yields limn→∞ {μ(f1 φn ) + μ(f2 )} = μ(f ). Therefore μ is determined by the class C.  Lemma B.1.32 Consider Rd+ \ {0} endowed with the usual topology and the boundedness B0 of sets separated from zero. The class R of complements of rectangles, i.e. R = {[0, x]c , x ∈ Rd+ \ {0}} is measure determining.

516

B Vague# convergence

Proof. Note that R is stable by finite unions and consider the class A which consists of intersections of elements R and proper differences of these intersections with another element of R: A = {A1 ∩ · · · ∩ Ak , (A1 ∩ · · · ∩ Ak ) \ Ak+1 , A1 , . . . , Ak+1 ∈ R, k ∈ N∗ } . Then A is a obviously a π-system and countably generates the open sets. To prove the latter claim, we use Lemma A.2.8. For a ∈ Rd+ \ {0} and b ∈ Rd+ \ {0} such that ai < bi , i = 1, . . . , d, we will prove that the rectangle [a, b] is in A. (i)

(i)

For i = 1, . . . , d, let b(i) = (b1 , . . . , bd ) have all components equal to those (i) (i) of b except the i-th which is ai , i.e. bj = bj if j = i and bi = ai . Then d

[0, b(i) ]c \ [0, b]c =

i=1

d

[0, b(i) ]c ∩ [0, b] = [a, b] .

i=1

Thus, every x0 ∈ Rd+ \ {0} belongs to an element of A which is itself included in an arbitrarily small ball (for any norm, the supnorm being the most convenient here) centered at x0 . This proves our claim by Lemma A.2.8. Write 1 = (1, . . . , 1) and for n ≥ 1, On = [0, n−1 1]c and An = {A ∈ A, A ⊂ On }. Then An is also a π-system and countably generates the open sets of On . Let μ, ν be two boundedly finite measures on Rd+ \ {0}. Then μ(On ) = ν(On ) < ∞ for all n ≥ 1. Thus for each fixed n ≥ 1, the restrictions of μ−1 (On )μ and ν −1 (On )ν to On are probability measures on On which are equal on An . Thus they are equal by Theorem A.1.7. This implies that μ and ν are equal on all bounded sets, hence are equal. 

B.1.3 Extension theorem Let En , n ∈ N∪{∞} be Polish spaces, each endowed with its Borel σ-field En . Assume that for each k < n ∈ N ∪ {∞}, there exists a surjective continuous map Πk,n : En → Ek such that if k < m < n, Πk,m ◦ Πm,n = Πk,n . For n = ∞, we write Πk for Πk,∞ when convenient. Since we will need the extension Theorem A.2.14, we also assume

E∞ =

∞ n=0

Πn−1 (En ) .

(B.1.8)

B.1 Vague# convergence

517

Our goal is to extend Theorem A.2.14, which is for probability measures, to unbounded measures. Therefore we need to restrict the measures to an increasing sequence of sets on which they are uniformly bounded. (n)

For each n ∈ N, we consider an increasing sequence of open sets {Uk , k ∈ N} (n)

(n)

(n)

in En such that Uk ⊂ Uk+1 for all k ∈ N. We define Fn = ∪k∈N Uk . Then Fn is an open set of En , hence is a Polish space. Let B (n) be the boundedness (n) induced by the sequence {Uk , k ∈ N}, that is, the collection of sets which (n) are included in one of Uk (see Example B.1.7). Then (Fn , B (n) ) is a localized (n) Polish space and {Uk , k ∈ N} is a localizing sequence. We make the following crucial assumption:

−1 For all k ∈ N and m < n ∈ N, Πm,n (Uk

(m)

(n)

) ⊂ Uk .

(B.1.9)

−1 −1 This implies that Πm,n (Fm ) ⊂ Fn and if B ∈ B (m) then Πm,n (B) ∈ B (n) . However, it is not assumed that the direct image of a bounded set is bounded nor that Πk,n (Fn ) ⊂ Fk .

Define Bk = Πk−1 (Uk ). By continuity of the maps Πm,n and by (B.1.9), (∞) {Bk , k ≥ 0} is a non-decreasing sequence of open sets of E∞ . By (B.1.9), continuity of the maps Πn and the property Πm,n ◦ Πn = Πm , we also have (∞)

(∞)

Bk

(k)





(k) (k) (k) ⊂ Πk−1 Uk+1 = Πk−1 (Uk ) ⊂ Πk−1 Uk



(k) (k+1) (∞) −1 −1 −1 Πk,k+1 = Πk+1 Uk+1 ⊂ Πk+1 Uk+1 = Bk+1 .

In the second last equality we used the assumption (B.1.9). Therefore, the (∞) sequence {Bk , k ∈ N} induces a boundedness B (∞) on the Polish space (∞) F∞ = ∪∞ . k=1 Bk (k)

Let A be a bounded set in Fk . Then there exists n ≥ k such that A ⊂ Un , hence −1 Πk−1 (A) ⊂ Πk−1 (Un(k) ) = Πn−1 Πk,n (Un(k) ) ⊂ Πn−1 (Un(n) ) = Bn(∞) . (B.1.10)

Thus Πk−1 (B (k) ) ⊂ B (∞) . For each n ∈ N, we consider a measure μn on En such that μn (En \ Fn ) = 0 (n) and μn (Uk ) < ∞ for all k ∈ N. In other words, the restriction of μn to its support Fn is boundedly finite (with respect to the boundedness B (n) ). We assume that the measures μn satisfy the consistency property:

518

B Vague# convergence

−1 μm = μn ◦ Πm,n , m 0 for all k ≥ k0 . From now on, without loss of generality, we will assume that k0 = 0, i.e. c0 > 0. Note that if limm→∞ cm < ∞, then all the measures are uniformly bounded and the result follows from Theorem A.2.14. (m)

Otherwise, for m ≥ k ≥ k0 , set Bk entails, for m ≥ k, (m)

μm (Bk

−1 = Πk,m (Uk ). The consistency property (k)

−1 ) = μm (Πk,m (Uk )) = μk (Uk ) = ck . (k)

(k)

For m ≥ k, let νk,m be the probability measure defined on Em by νk,m = c−1 k μm (· ∩ Bk

(m)

).

−1 The set {νk,m , m ≥ k} inherits the consistency property. Indeed, using Πm,n ◦ −1 −1 Πk,m = Πk,n and (n)

Bk

−1 −1 −1 −1 = Πk,n (Uk ) = Πm,n ◦ Πk,m (Uk ) = Πm,n (Bk (k)

(k)

(m)

),

we have, for k ≤ m < n, and A ∈ Em , −1 −1 νk,n (Πm,n (A)) = c−1 k μn (Πm,n (A) ∩ Bk ) (n)

−1 = c−1 k μn (Πm,n (A ∩ Bk

(m)

)) = c−1 k μm (A ∩ Bk

(m)

) = νk,m (A) .

B.1 Vague# convergence

519

Therefore, by the Daniell-Kolmogorov extension Theorem A.2.14, there exists −1 for all m ≥ k. a probability measure ν (k) on E∞ such that νk,m = ν (k) ◦ Πm (k) (k) (k) −1 Moreover, since Bk = Πk,k (Uk ) = Uk , we have (∞)

ν (k) (Bk

) = ν (k) (Πk−1 (Uk )) = νk,k (Uk ) = c−1 k μk (Uk (k)

(k)

(∞)

This means that ν (k) has support in Bk

(k)

(k)

∩ Bk ) = 1 .

. (∞)

For k ≥ 1, set μ(k) = ck ν (k) . Then μ(k) has support in Bk and for m ≥ k, (∞) the restriction of μ(m) to Bk is μ(k) . Since we assume (B.1.8), it suffices to check this property for B = Πn−1 (A) with A ∈ En . Then, (recalling that (n) (k) −1 Bk = Πk,n (Uk )), we have (∞)

μ(m) (B ∩ Bk

) = μ(m) (Πn−1 (A) ∩ Bk

(∞)

)

= cm ν (m) (Πn−1 (A) ∩ Πk−1 (Uk )) (k)

−1 = cm ν (m) (Πn−1 {A ∩ Πk,n (Uk )}) (k)

−1 = cm νm,n (A ∩ Πk,n (Uk )) = μn (A ∩ Bk ) (k)

(n)

= ck νk,n (A) = ck ν (k) (Πn−1 (A)) = μ(k) (Πn−1 (A)) = μ(k) (B) . (∞)

Define Vk = Bk

(∞)

\ Bk−1 and define a measure μ(∞) on E∞ by (∞)

μ(∞) = μ(0) (· ∩ B0

)+

∞ 

μ(k) (· ∩ Vk ) .

k=1 (∞)

For a measurable set B ⊂ Bn

(∞)

μ(∞) (B) = μ(0) (B ∩ B0

(∞)

, since μ(k) has support in Bk

)+

n 

, we have

μ(k) (B ∩ Vk ) = μ(n) (B) .

(B.1.12)

k=1 (∞)

Therefore μ(∞) (E∞ \ F∞ ) = 0 (recall that F∞ = ∪k≥0 Bk ). Since μn (En \ Fn ) = 0, in order to prove the consistency property, it suffices to consider bounded sets in Fn since they are measure determining. Let A be a bounded measurable set in Fn , then Πn−1 (A) is bounded in F∞ by (B.1.10). Therefore (∞) there exists n ≥ n such that Πn−1 (A) ∈ Bn . Thus, by (B.1.11) and (B.1.12), 



−1 μ(∞) ◦ Πn−1 (A) = μ(n ) (Πn−1 (A)) = μ(n ) (Πn−1  ◦ Πn,n (A)) 

−1 = μ(n ) (Πn,n  (A)) = μn (A) < ∞ .

This proves the consistency property and that μ(∞) is boundedly finite on  (F∞ , B (∞) ). We conclude the proof by setting μ∞ = μ(∞) .

520

B Vague# convergence

Example B.1.34 For i ∈ N, let Ei be a Polish space endowed nwith its Borel σ-field. Then for all n ∈ N ∪ {∞}, the product spaces En = i=0 Ei endowed ∞ with the product topology are also Polish spaces and E∞ = n=0 Πn−1 (En ), where En , n ∈ N∪{∞}, is the Borel σ-field and Πn is the canonical projection. Let Fi be a closed subset of Ei . For a distance di which induces the topology (i) (n) of Ei , let Ok = {x ∈ Ei : di (x, Fi ) > k −1 }. For k, n ∈ N, define sets Uk in En by ⎛ ⎞  n  n i−1 n     (n) (i) (i) ⎝ Uk = En \ (Ok )c = Ej × Ok × Ej ⎠ . i=1

i=1

j=0

j=i+1

(n)

Then {Uk , k ≥ 1} is a localizing sequence in En . For m < n ∈ N, let Πm,n be the projection of En onto Em . Then, for k ∈ N, ⎛ ⎛ ⎞⎞ m i−1 m    (m) (i) −1 −1 ⎝ ⎝ Πm,n (Uk ) = Πm,n Ej × Ok × Ej ⎠⎠

=

m  i=1

⎛ ⎝

i=1 i−1 

j=0

Ej ×

j=i+1

(i) Ok

j=0

n 

×



Ej ⎠ ⊂ Uk

(n)

.

j=i+1 (n)

Thus assumption (B.1.9) holds. However, Πm,n (Uk ) = Em is not necessarily bounded.  Triangular argument For every k ∈ N, let {μn,k , n ∈ N} and μ(k) be measures on Ek with support in Fk , whose restriction to Fk is boundedly finite and such that v#

μn,k −→ μ(k) , n → ∞ ,

in (Fk , B (k) ) .

(B.1.13)

For each n fixed we assume the following consistency: −1 μn,k = μn,m ◦ Πk,m , k 0 and x ∈ E, s · (t · x) = (st) · x; (M3) the map (t, x) → t · x is continuous with respect to the product topology. For A ⊂ E we define the dilated set t · A = {t · x, x ∈ A}. If A ⊂ B, then s · A ⊂ s · B for all s > 0. A set A is called a cone if t · A ⊂ A for all t > 0; it is called a semi-cone if t · A ⊂ A for all t ≥ 1. If A is a semi-cone and s > t, then s · A ⊂ t · A. Indeed, by assumption, st−1 · A ⊂ A (since st−1 > 1) and by associativity of the outer multiplication, s · A = t · ((st−1 ) · A) ⊂ t · A. By continuity, if A is an open set (respectively, closed set), then t · A is also ◦ open (respectively, closed). Also, s · A = s · A and (s · A) = s · A◦ .

522

B Vague# convergence

Let (E, B) be a localized Polish space endowed with a continuous outer multiplication which moreover satisfies the following properties: (B1) If A is bounded, then ∪t≥s (t · A) is bounded for all s > 0. (B2) There exists a bounded open semi-cone C1 such that ∩t>1 (t · C1 ) = ∅, ∪t≤1 (t·C1 ) = E, s · C1 ⊂ t·C1 if s > t and {n−1 ·C1 , n ∈ N} is a localizing sequence. Furthermore, for every x ∈ E and t > 0, there exists s0 such that s · x ∈ / t · C1 if s < s0 . (B3) The set of bounded semi-cones is measure determining. Definition B.2.1 Let μ be a boundedly finite measure on E, {an } be a scaling sequence and set μn = nμ(an ·), n ≥ 1. The measure μ is said to be regularly varying with scaling sequence an if μn converges vaguely# on E to a non-zero boundedly finite measure ν on E.

Theorem B.2.2 Let E be a localized Polish space endowed with a continuous outer multiplication which satisfies conditions (B1), (B2), and (B3). The following statements are equivalent: (i) μ is regularly varying; (ii) there exist α > 0 and a function g : (0, ∞) → (0, ∞), regularly varying at infinity with index α and a non-zero measure ν such that g(t)μ(t·) converges vaguely# to ν. Furthermore, (a) if either condition holds, then the norming sequence {an } in (i) is regularly varying with index α−1 and the limiting measure ν is homogeneous i.e. for all Borel sets A ⊂ E and t > 0, ν(t · A) = t−α ν(A); the function g in (ii) can be chosen as 1/μ(t · A) for any bounded Borel set A such that ν(A) > 0. (b) if there exist two functions g and g and two measures ν and ν such that (ii) holds, then there exists c ∈ (0, ∞) such that g (t) = c , ν = cν . t→∞ g(t) lim

B.2 Regular variation of measures on Polish spaces

523

Proof. We prove that (ii) implies the homogeneity of ν. Set μt = g(t)μ(t·). For s > 0 define the boundedly finite measure νs by νs (A) = sα ν(s · A). Let A ⊂ E be a bounded measurable set such that ν(∂A) = 0 and ν(∂(s · A)) = 0. Then, ν(A) = lim μst (A) = lim t→∞

t→∞

g(st) μt (s · A) = sα ν(s · A) = νs (A) . g(t)

We have proved that ν and νs are equal on their common bounded continuity sets, thus they are equal by Lemma A.1.8. The implication (ii) =⇒ (i) is straightforward: take an = g ← (n). To prove the converse, assume that μ is regularly varying in the sense of Definition B.2.1 with scaling sequence {an , n ∈ N}. Define the non-decreasing function g as follows: for t ≥ 1, let g(t) be the largest integer n such that an ≤ t, i.e. g(t) = n ⇐⇒ an ≤ t < an+1 . v#

Set μt = g(t)μ(t·). We will prove that μt −→ ν as t → ∞. Fix a bounded semi-cone A such that ν(A) > 0 and ν(∂A) = 0. Then, denoting n = g(t) and using the already noted fact that s · A ⊂ t · A if s > t, we obtain n (n + 1)μ(an+1 · A) ≤ g(t)μ(t · A) ≤ nμ(an · A) (B.2.1) n+1 This yields limt→∞ g(t)μ(t · A) = ν(A) for all semi-cones A which are continuity sets of ν. Since the semi-cones are assumed to be measure determining by (B3), this proves that the sequence μt has only one possible limit. If we prove that it is relatively compact, then this will also prove its convergence. Let A be a semi-cone which satisfies condition (B2) and set Uk = k −1 · A. By assumption (B2) and Lemma B.1.23, we need to check the following conditions: (i) for all k ≥ 1, supt≥1 μt (Uk ) < ∞; (ii) for every  > 0 and k ≥ 1, there exists a compact set K such that supt>0 μt (Uk \ K) ≤ . Since the sequence {nμ(an ·), n ∈ N} converges vaguely# , it is relatively compact. Thus, it satisfies conditions (RCi) and (RCii) and for every k ≥ 1, supn≥1 nμ(an · Uk ) < ∞. This, with (B.2.1), implies that (i) holds. Fix  > 0 and k ≥ 1. Since (RCii) is satisfied by the sequence {nμ(an ·), n ∈ N}, there exists a compact set K such that sup nμ(an (U k \ K)) ≤  .

(B.2.2)

n≥1

Since U k is closed and U k \ K = U k \ (U k ∩ K), we can assume without loss ˜ = {s · x, 0 < s ≤ 1, x ∈ K} ∩ U k . Then of generality that K ⊂ U k . Set K

524

B Vague# convergence

˜ ⊂ (U k \ K) and U k \ K ˜ is a semi-cone therefore (B.2.1) and (B.2.2) (U k \ K) yield, for t ≥ 1 and n = g(t), ˜ ≤ nμ(an · (U k \ K)) ≤  . μt (U k \ K)

(B.2.3)

˜ is a compact set. To conclude that (ii) holds, we must still check that K Since E is metrizable, a set A is compact if and only if any sequence with values in A has a converging subsequence. Let {yn , n ∈ N} be a sequence with ˜ Then there exist sequences {sn , n ∈ N} and {xn , n ∈ N} such values in K. that sn ∈ (0, 1], xn ∈ K and sn · xn = yn ∈ U k . Since K is compact, there exists a subsequence {sn(j) , xn(j) , j ∈ N} such that sn(j) → s ∈ [0, 1] and xn(j) → x ∈ K as j → ∞. We shall prove that s > 0. By assumption (B2), c for each x ∈ K, there exists tx > 0 such that tx · x ∈ / U k , i.e. x ∈ (t−1 x · U k) . −1 c This means that K ⊂ ∪x∈K (tx · U k ) . Since K is compact, there exist x1 , . . . , xI such that −1 c c K ⊂ ∪Ii=1 (t−1 xi · U k ) = (t0 · U k )

with t0 = min(tx1 , . . . , txI ). Since t → t · U k is decreasing (recall that each Uk c is semi-cone), this means that for all t < t0 , t · K ⊂ U k . Since xn ∈ K for all / U k for all j ≥ J. n, if sn(j) → 0, then there exists J such that sn(j) · xn(j) ∈ This contradicts the assumption that sn · xn ∈ U k for all n. Consequently s > 0 and sn · xn → s · x by continuity of the outer product. This proves ˜ is compact and thus (ii) holds. We have proved that {μt , t ≥ 1} is that K v#

relatively compact and consequently that μt −→ ν. We now prove that the function g is regularly varying. Let A be a bounded open semi-cone which satisfies assumption (B2). Since the function s → ν(s · A) is decreasing and since s · A ⊂ t · A for s > t, there is a jump at s such that ν(∂(s · A)) > 0. Therefore there are at most countably many such s. For s such that ν(∂(s · A)) = 0, ν(s−1 A) g(st) g(st)μ(sts−1 A) = lim = . t→∞ g(t) t→∞ g(t)μ(tA) ν(A) lim

(B.2.4)

By Theorem 1.1.2, this proves that g is regularly varying with index α > 0 since g is non-decreasing and ∪s 0. Then lim

n→∞

ν(A) g(t) g(t)μ(tA) = lim = . g (t) n→∞ g (t)μ(tA) ν (A)

B.2 Regular variation of measures on Polish spaces

525

Let c denote the ratio in the right-hand side of the last display. Since ν is boundedly finite, c < ∞. If c = 0, then g(t) = o(g (t)) and for any continuity set B of ν, we have g(t) ν(B) = lim g(t)μ(tB) = lim g (t)μ(tB) = 0 . n→∞ n→∞ g (t) This implies that ν is the null measure which contradicts the assumption therefore c ∈ (0, ∞). Furthermore g(t)μ(tA) g (t)μ(tA) ν (A) = c lim =c . t→∞ g(t)μ(tA) t→∞ g(t)μ(tA) ν(A)

1 = lim

We have proved that ν and ν agree on their common continuity sets so are equal by Lemma A.1.8.  Lemma B.2.3 Let B be a separable Banach space with norm |·| and set E = B \ {0} endowed with the induced topology and let B consists of sets separated from zero, i.e. B ∈ B if and only if there exists  > 0 such that B(0, ) ⊂ B c . Then (E, B) is a localized Polish space and satisfies conditions (B1), (B2), and (B3). Proof. The only non-trivial statement is condition (B3). Let A be the set of proper differences of semi-cones i.e. A ∈ A if and only if there exists two semi-cones A1 and A2 such that A1 \ A2 . Then A is a π-system since the intersection and union of two semi-cones is a semi-cone. For x ∈ E, let δ > 0 be such that 0 ∈ / B(x, δ). Set A1 = ∪t≥1 t · B(x, δ) and A2 = A1 \ B(x, δ). Then A1 and A2 are bounded semi-cones and A1 \ A2 = (x, δ). Thus A countably generates the open sets by Lemma A.2.8 and is therefore measure determining by Theorem A.1.7.  Let E be a Polish space and f : E → [0, ∞). For μ ∈ MB and s > 0, let Ts μ be the measure in MB defined by  Ts μ(f ) = f (sx)μ(dx) . E

If μ is a point measure, then Ts μ is the measure with the same points multiplied by s. Lemma B.2.4 Let E be a localized Polish space endowed with a continuous outer multiplication which satisfies conditions (B1), (B2), and (B3). Assume moreover that the outer multiplication is uniformly continuous on bounded sets. The map (μ, s) → Ts μ defined on MB × (0, ∞) is continuous with respect to the topology of vague# convergence. Proof. Let {μn , n ∈ N} be a sequence of measures in M which converge vaguely# to μ and let {sn , n ∈ N} be a sequence of positive real numbers which converge to s > 0.

526

B Vague# convergence

Let f : E → [0, ∞) be a bounded Lipschitz continuous function with bounded support. Define fs by fs (x) = f (s · x). Then Ts μ(f ) = μ(fs ) and Tsn μn (f ) − Ts μ(f ) = μn (fsn ) − μ(fs ) = μn (fsn − fs ) + μn (fs ) − μ(fs ) . By continuity of the outer multiplication, fsn converges pointwise to fs . Moreover, since sn → s, the sequence fsn is uniformly bounded with uniformly bounded support. Thus μ(fsn ) → μ(f ) by bounded convergence. Since f is bounded with bounded support and Lipschitz continuous, and since the outer multiplication is uniformly continuous on bounded sets, there exists a bounded set A such that for every  > 0 and large enough n, |f (sn x) − f (sx)| ≤ |f |Lip d(sn · x, s · x) ≤ 1A (x). This yields lim sup |μn (fsn − fs )| ≤  lim sup μn (A) . n→∞

n→∞

Since  is arbitrary, this proves that lim supn→∞ |μn (fsn − fs )| = 0. 

B.2.1 Proof of Theorem 2.1.3 Let Rd \ {0} be endowed with the boundedness B0 which consists of sets separated from zero. The outer multiplication is the usual one and satisfies (M1), (M2), and (M3). – Condition (B1) holds: if A is separated from zero, there exists  > 0 such that A ⊂ B c (0, ). Thus ∪t>s (t · A) ⊂ B c (0, s) which implies that ∪t>s (t · A) is bounded. c

– The open bounded semi-cone B (0, 1) satisfies condition (B2). – Condition (B3) holds by Lemma B.2.3. Thus Theorem 2.1.3 is a particular case of Theorem B.2.2.

B.2.2 Proof of Theorem 3.1.2 We consider Cd,2 endowed with the boundedness B2 which consists of sets bounded away from at least two axes. That is, using the map h2 defined in (3.1.1), a set B is bounded if and only if there exists  > 0 such that x ∈ B implies h2 (x) > . Equivalently, B is bounded if there exists  > 0 such

B.2 Regular variation of measures on Polish spaces

527

that B ⊂ h−1 2 ((, ∞)). The outer multiplication is the usual one and has the required properties. – Let A be bounded and  > 0 be such that A ⊂ h−1 2 ((, ∞)). Since the function h2 is homogeneous, for s > 0, ∪t>s (t · A) ⊂ h−1 2 ((s, ∞)) . Thus Condition (B1) holds. – Since h is continuous the open bounded semi-cone h−1 2 ((1, ∞)) satisfies condition (B2). – To check Condition (B3), we show that the collection of proper differences of bounded semi-cones is measure determining by applying Lemma A.2.8. If x ∈ Cd,2 , there exists δ > 0 such that h2 (x) > 2δ. Consider, for instance, the supnorm on Rd . Then A1 = ∪t≥1 (t · B(x, δ)) and A2 = A1 \ B(x, δ) are bounded semi-cones and A1 \ A2 = B(x, δ). Thus the semi-cones are measure determining. Thus Theorem 3.1.2 holds a particular case of Theorem B.2.2.

B.2.3 Proof of the existence of the tail measure We show here how to apply the extension Theorem B.1.33 to obtain the existence of the tail measure in Theorem 5.1.2. We consider (Rd )Z endowed with the product topology and the associated Borel σ-field, so that (B.1.8) holds. Set F∞ = (Rd )Z and for n ∈ N, define Fn = (Rd )2n+1 \ {0} endowed with the boundedness B (n) which consists of sets separated from 0 in (Rd )2n+1 \ {0}. We fix an arbitrary norm in Rd . For k, n ≥ 1, we define the sets (n)

Uk

(∞)

Bk

= {x ∈ Fn :

sup |xi | > k −1 } ,

−n≤i≤n

= {x ∈ F∞ :

sup |xi | > k −1 } .

−k≤i≤k

For k ≤ n, the projections Πk,n : Fn → Fk are defined by Πk,n ((x−n , . . . , xn )) = (x−k , . . . , xk ) and Πn : F∞ → Fn by Πn ((xj , j ∈ Z)) = (x−n , . . . , xn ). (n)

These sets {Uk , k, n ≥ 1} satisfy condition (B.1.9) (see Example B.1.34) (∞) and the sequence {Bk , k ∈ N} is a localizing sequence for B (∞) . The exponent measures ν −n,n satisfy the consistency condition (B.1.11), thus Theorem B.1.33 can be applied to prove the existence of the tail measure.

B.2.4 Spectral representation of homogeneous measures Let (E, d) be a complete separable metric space endowed with a continuous outer multiplication (t, x) → t·x and an element 0 with the following property:

528

B Vague# convergence

d(0, s · x) ≤ d(0, t · x) , 0 < s < t , x ∈ E . Let E \ {0} be endowed with the boundedness B0 of sets separated from zero.

Theorem B.2.5 Let ν be a boundedly finite measure on (E \ {0}, B0 ) such that ν(t · A) = t−α ν(A) for all Borel set A and t > 0. Assume there exists a measurable map τ : E → R+ such that (i) τ (x) = 0 if and only if x = 0; (ii) τ is 1-homogeneous; (iii) ν({x ∈ E : τ (x) > 1}) = 1. Then there exists a probability space (Ω, F, P) and an E-valued random element Z such that  ∞ ν= E[δrZ ]αr−α−1 dr . 0

Proof. We write E∗ = E \ {0}. Define Sτ = {x ∈ E : τ (x) = 1} and the map T : E∗ → (0, ∞) × Sτ by T (x) = (τ (x), τ −1 (x) · x) , x ∈ E∗ . The map T is measurable and one-to-one. Let σ be the probability measure on Sτ defined by σ(B) = ν({x ∈ E∗ : τ (x) > 1, τ −1 (x) · x ∈ B}) , for all Borel subsets B of Sτ . Then να ⊗ σ is the image measure of ν by the map T , i.e. να ⊗ σ = ν ◦ T −1 . The product form is a consequence of the homogeneity of ν. Indeed, for a Borel subset B of Sτ and c > 0, we have, by Fubini theorem and by the homogeneity properties of τ and ν,    ν ◦ T −1 ((c, ∞) × B) = 1 τ (x) > c, τ −1 (x) · x ∈ B ν(dx) ∗ E   1 τ (y) > 1, τ −1 (y) · y ∈ B ν(cdy) = E∗    −α =c 1 τ (x) > 1, τ −1 (x) · x ∈ B ν(dx) −α

=c

E∗

σ(B) = να ((c, ∞))σ(B) .

B.2 Regular variation of measures on Polish spaces

529

This proves our claim. Since T is one-to-one, we have a converse formula: for a Borel subset A of E∗ , by definition of σ, we have  ∞ (να ⊗ σ) ◦ T (A) = 1A (cu)σ(du)αc−α−1 dc 0 Sτ   ∞ 1{τ (x) > 1}1A (cτ −1 (x) · x)ν(dx)αc−α−1 dc = E∗ 0   ∞ τ −α (x)1{τ (x) > 1}1A (sx)ν(dx)αs−α−1 ds = E∗



0



(c = sτ (x)) ∞

= E∗

 = E∗

 =

E∗

τ −α (s−1 y)1{τ (y) > s}1A (y)ν(s−1 dy)αs−α−1 ds

0



(y = sx) ∞

τ −α (y)1{τ (y) > s}1A (y)ν(dy)αsα−1 ds   τ (y ) α−1 αs ds τ −α (y)1A (y)ν(dy) = ν(A) . 0

0

Let Z be an Sτ -valued random element defined on a probability space (Ω, F, P), with law σ. Then the previous formula reads exactly as  ∞ ν(A) = P(rZ ∈ A)αr−α−1 dr . 0



C Weak convergence of stochastic processes

C.1 The space D(I) The first example of function space of interest is the space C(I) of continuous functions defined on an interval I, but it is not suitable for the description of processes that have jumps, such as empirical processes or tail empirical processes, and for limits with discontinuities, such as L´evy stable processes. Since these are the main processes we consider, we skip the usual study of this space and directly jump into a larger space. Recall that I ◦ denotes the interior of I. A function f defined on I is said to be c`adl` ag (continue a` droite et limit´ee `a gauche) if f is right continuous and has left limits at all points s ∈ I ◦ and if it is moreover right continuous at the left endpoint of I if it is in I. That is, (i) for all t ∈ I ◦ , lims→t,s>t f (s) = f (t); (ii) for all t ∈ I ◦ , lims→t,sa f (s) = f (a) if a = inf(I) ∈ I; (iv) lims→b,s 0, hence countably many jumps, and is bounded since by definition it must be left-continuous at 1. The space D(I) is the set of all real-valued c` adl` ag functions on I and obviously contains C(I). There exist several useful topologies defined on D(I). In Appendix C.2, we will discuss the most commonly used one called the J1 topology and in Appendix C.3, we will introduce the weaker M1 topology which is useful in the case of discontinuous limits.

© Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4

531

532

C Weak convergence of stochastic processes

C.2 The J1 topology on a compact interval The main reference is [Bil68] and its second edition [Bil99]. There are differences (in particular in numbering) between the two editions so we will give a precise reference for each result we quote. We consider in this section convergence on a compact interval which we set equal to [0, 1] without loss of generality. Recall that  · ∞ denotes the supremum norm of a function f . Definition C.2.1 (J1 topology) For functions f , g in D([0, 1]), define dJ1 (f, g) = inf max{f ◦ u − g∞ , u − Id∞ } ,

(C.2.1)

where Id is the identity map on [0, 1] and the infimum is taken over all applications u, continuous and increasing on [0, 1] and such that u(0) = 0 and u(1) = 1. Then dJ1 is a metric on D([0, 1]). The J1 topology is the topology defined by this metric. The Borel σ-field relative to the J1 topology is denoted D.

The J1 topology has the following properties: (i) The definition implies immediately that dJ1 (f, g) ≤ f − g∞ ; hence uniform convergence entails convergence in the J1 topology, that is, the uniform topology is finer than the J1 topology. Conversely, there exist sequences which converge for the J1 topology, but not for the uniform topology. (ii) An equivalent definition of the convergence of fn to f with respect to the J1 topology is that there exists a sequence {un } of increasing continuous functions from [0, 1] onto [0, 1], with un (0) = 0 and un (1) = 1, such that un converges uniformly to Id and fn ◦ un converges uniformly to f . As an immediate consequence, the supremum functional is continuous with respect to the J1 topology. (iii) Evaluation at a point is not continuous for the J1 topology, that is, the convergence of fn to f with respect to the J1 topology does not imply that limn→∞ fn (t) = f (t) for all t. This latter convergence is true only if f is continuous at t. This means that for a fixed t ∈ [0, 1], the application defined on D([0, 1]) by f → f (t) is continuous at f for the J1 topology if and only if f is continuous at t. (iv) The space D([0, 1]) endowed with the metric dJ1 is separable. For instance, the set of functions which take a rational value at 1 and for some n ≥ 1 a constant rational value on each interval [i/n, (i+1)/n), 0 ≤ i ≤ n−1,

C.2 The J1 topology on a compact interval

533

is dense in D([0, 1]). Unfortunately (D([0, 1]), dJ1 ) is not complete. However, there exists a metric which defines the same topology and for which the space D([0, 1]) is complete. In other words, D([0, 1]) endowed with the topology defined by the metric dJ1 is a Polish space. It is possible to define a metric doJ1 which defines the same topology and for which D([0, 1]) is complete. This metric can also be chosen such that dJ10 (f, g) ≤ f − g∞ . See [Bil99, Equation 12.16]. The goal of the J1 topology is to handle sequences of functions with jumps but it can handle only single jumps. Example C.2.2 Let {(tn , xn ), n ≥ 0} be a sequence in [0, 1] × R such that limn→∞ (tn , xn ) = (t0 , x0 ) and 0 < t0 < 1. Define the functions fn = J

xn 1{tn ≤ t}, n ≥ 0. Then fn →1 f0 . Indeed, it is easy to define a piecewise linear parametrization rn such that rn (t0 ) = tn and fn ◦ rn − f0 ∞ = |xn − x0 |. Example C.2.3 Let fn be the sequence of functions on [0, 1] defined by 1 fn = 1[ 12 − n1 , 12 ) + 1[ 12 ,1] . 2 Then fn converges pointwise to the function 1[ 12 ,1] but not in the J1 topology. Indeed, for any homomorphism u from [0, 1] onto [0, 1] such that u(0) = 0 and u(1) = 1, 1 fn ◦ u − 1[ 12 ,1] ∞ ≥ . 2 Thus dJ1 (fn , 1[ 12 ,1] ) ≥ 1/2.



The J1 distance is related to uniform convergence for continuous functions. Proposition C.2.4 If a sequence {fn } of functions in D([0, 1]) converges to f with respect to the J1 topology, and if f is continuous, then the convergence is uniform. If a sequence of continuous functions converges with respect to the J1 topology, then its limit must be continuous and the convergence is uniform. Proof. There exists a sequence of increasing continuous functions un such that fn ◦ un − f ∞ ∨ un − Id∞ → 0 or equivalently f ◦ u−1 n − fn ∞ ∨ un − Id∞ → 0. If f is continuous, then f ◦ u−1 n converges uniformly. For each n ≥ 1, there exists a continuous function un such that fn ◦ un − f ∞ ≤ 1/n; hence f is continuous as the uniform limit of a sequence of continuous function. 

534

C Weak convergence of stochastic processes

The result above illustrates one limitation of the J1 topology: it cannot handle “jumps at the limit”. This limitation will be partially overcome by the M1 topology. Addition is also not continuous with respect to the J1 topology. This means that it may happen that fn → f and gn → g for the J1 topology, but fn + gn does not converge to f + g. However, continuity of addition may hold under a restriction on f and g. Proposition C.2.5 If the sequences {fn } and {gn } converge to f and g with respect to the J1 topology, and if f and g have no common jumps, then fn + gn converges to f + g.



Proof. [Whi80, Theorem 4.1].

We now define a modulus of continuity for functions in D([0, 1]). For all bounded functions f on [0, 1] define wJ (f, δ) =

sup 0≤t1 ≤s≤t2 ≤1 ,|t2 −t1 | 0 lim lim sup P(|Xn (δ) − Xn (0)| > ) = 0 ,

(C.2.3a)

lim lim sup P(|Xn (1−) − Xn (1 − δ)| > ) = 0 ,

(C.2.3b)

lim lim sup P(wJ (Xn , δ) > ) = 0 ,

(C.2.3c)

δ→0 n→∞ δ→0 n→∞

δ→0 n→∞

536

C Weak convergence of stochastic processes

w

then there exists a D([0, 1])-valued process X such that Xn =⇒ X in D([0, 1]) endowed with the J1 topology.

Proof. This is obtained by combining [Bil99, Theorem 13.2], its corollary (p. 140) and equation (13.8) in the comments after the proof of the corollary.  Condition (C.2.3c) is called asymptotic stochastic equicontinuity in D([0, 1]). The previous results allow to prove the convergence to a D([0, 1])-valued process without knowing the properties of the limiting process characterized by the finite-dimensional distributions. We say that a stochastic process X defined on an interval I is stochastically continuous at t ∈ I if P(X discontinuous at t) = 0 and stochastically continuous on I if it is stochastically continuous at all t ∈ I. Theorem C.2.9 Let {X, Xn , n ≥ 1} be D([0, 1])-valued stochastic processes. Assume that X is stochastically continuous at 1 and that the finite-dimensional distributions of Xn converge to those X. If moreover w (C.2.3c), then (C.2.3c) holds and Xn =⇒ X in D([0, 1]) endowed with the J1 topology.

Proof. [Bil68, Theorem 15.4] or [Bil99, Theorem 13.3].



A criterion to ensure (C.2.3c) is given in [Bil68, Theorem 15.6]. If there exist β > 1/2, γ ≥ 0 and a continuous non-decreasing function g such that, for all 0 ≤ s ≤ t ≤ u ≤ 1 and x > 0, P (|Xn (u) − Xn (t)| ∧ |Xn (t) − Xn (s)| ≥ x) ≤ cst x−γ [g(u) − g(s)]2β , (C.2.4) then (C.2.3c) holds. By a close inspection of the proof of [Bil68, Theorem 15.6] we obtain the following two results. Lemma C.2.10 Assume that there exist β > 1/2, γ ≥ 0, a sequence of σfields Fn , n ≥ 1 and a continuous non-decreasing function g and a sequence of right-continuous non-decreasing random functions gn which converge to g such that, for all 0 ≤ s ≤ t ≤ u ≤ 1 and x > 0, P (|Xn (u) − Xn (t)| ∧ |Xn (t) − Xn (s)| ≥ x | Fn ) ≤ cst x−γ [gn (u) − gn (s)]2β . (C.2.5) Then, (C.2.3c) holds for all  > 0.

C.2 The J1 topology on a compact interval

537

Proof (Sketch of proof ). Let m ≥ 1 be an integer and set δ = A/2m. Following the arguments of the proof of [Bil68, Theorem 15.6], specifically the bound (15.26) on p. 129, for which the assumed continuity of the function F that appears therein is not used, we see that (C.2.5) implies that

−2γ

P(w (Xn , δ) >  | Fn ) ≤ cst 

2m−1 

{gn (iδ) − gn ((i + 2)δ)}2β ,

i=0

where the constant is non-random and depends only on β and γ. Letting n → ∞ yields lim sup P(w (Xn , δ) >  | Fn ) n→∞

≤ cst −2γ

2m−1 

{g(iδ) − g((i + 2)δ)}2β

i=0

≤ cst −2γ g(0) ⎛

max

0≤i≤2m−1

{g((i + 2)δ) − g(iδ)}2β−1 ⎞2β−1

≤ cst −2γ ⎝ sup |g(t) − g(s)|⎠

.

0≤s≤t≤1 |s−t|≤2δ

Since g is continuous, the latter quantity tends to zero as δ → 0. Thus, for every η > 0, we can choose δ > 0 such that lim sup P(w (Xn , δ) >  | Fn ) ≤ η . n→∞

Since the conditional probability is bounded by 1, this yields lim sup P(w (Xn , δ) > ) ≤ 2η . n→∞

This concludes the proof.



Lemma C.2.11 Assume that there exist β > 1/2, γ ≥ 0, n ≥ 1, a sequence of maps gn : [0, 1] × [0, 1] and a monotone continuous function g : [0, 1] → R+ such that, for all 0 ≤ s ≤ t ≤ u ≤ 1 gn (s, t) ∨ gn (t, u) ≤ gn (s, u) , gn (s, u) ≤ gn (s, t) + gn (t, u) , lim gn (s, t) = |g(t) − g(s)| ,

n→∞

and for all x > 0, P (|Xn (u) − Xn (t)| ∧ |Xn (t) − Xn (s)| ≥ x |) ≤ cst x−γ gn2β (s, u). Then, (C.2.3c) holds for all  > 0.

(C.2.6)

538

C Weak convergence of stochastic processes

Proof. This result can be obtained by the same adaptation of the proof as previously, by replacing gn (iδ) − gn ((i − 1)δ) by gn ((i − 1)δ, iδ), the rest of the argument being similar. This is also a particular case of the Theorem of [GGR96].  A sufficient condition for (C.2.6) is that there exists γ > 0 such that for all s < t, E[|Xn (s) − Xn (t)|γ ] ≤ cst gn2β (s, t).

(C.2.7)

This result obviously includes the case gn (s, t) = gn (s) − gn (t) where gn converges to g. Uniform modulus of continuity Let w(f, δ) =

sup |f (t) − f (s)| .

0≤s≤t≤1 |s−t|≤δ

Clearly, wJ (f, δ) ≤ w(f, δ) and w(·, δ) is measurable (for each fixed δ) because it is continuous with respect to the uniform metric (cf. [Bil99, p. 82]). Even though the metric generated by the sup-distance is not appropriate to study weak convergence in D([0, 1]), nothing prevents us from trying to prove asymptotic equicontinuity with respect to w.

Theorem C.2.12 Assume that all the finite-dimensional distributions of a sequence of D([0, 1])-valued stochastic processes {Xn , n ≥ 1} converge weakly. If moreover for each  > 0, lim lim sup P(w(Xn , δ) > ) = 0 ,

δ→0 n→∞

(C.2.8)

then there exists a stochastic process X such that P(X ∈ C([0, 1])) = 1 w and Xn =⇒ X in D([0, 1]) endowed with the J1 topology.

Proof. This is Corollary (p. 142) to [Bil99, Theorem 13.4] upon noting that (C.2.8) implies (C.2.3a), (C.2.3b) and (C.2.3c).  Criteria to establish (C.2.8) will be given in Appendix C.4.

C.2 The J1 topology on a compact interval

539

C.2.1 Non-compact intervals We now consider the case of a non-empty interval I which does not contain both its endpoints such as R, (0, ∞) or [0, ∞). We refer to [Bil99, Section 16]. There is a subtlety in the extension of convergence to non-compact intervals. Take, for example, fn = 1[0,1−n−1 ) . Then fn does not converge to 1[0,1) in D([0, 1), J1 ) but the convergence does hold in D([0, a], J1 ) for all a > 1. The problem is of course the discontinuity at 1. Definition C.2.13 Let {f, fn , n ≥ 1} be functions in D(I). We say that fn → f with respect to J1 topology if the restrictions fK,n of fn to each compact subinterval K ⊂ I whose endpoints are continuity points of f converges to the restriction fK of f to K with respect to the J1 topology on D(K).

We will use the following criteria. For which we introduce the modulis of continuity over an arbitrary compact interval K. Define w(f, δ, K) = sup |f (t) − f (s)| , s,t∈K |t−s| 0 and compact subinterval K ⊂ I, lim lim sup P(wJ (Xn , δ, K) > ) = 0 .

δ→0 n→∞

(C.2.9)

w

Then Xn =⇒ X in D(I) endowed with the J1 topology.

The definition of the J1 topology on D(I) and Theorem C.2.12 yields the following criterion.

540

C Weak convergence of stochastic processes

Theorem C.2.15 Assume that all the finite-dimensional distributions of a sequence of D(I)-valued stochastic processes {Xn , n ≥ 1} converge weakly. If moreover for each  > 0 and compact subinterval K ⊂ I, lim lim sup P(w(Xn , δ, K) > ) = 0 ,

δ→0 n→∞

(C.2.10)

then there exists a stochastic process X such that P(X ∈ C(I)) = 1 and w Xn =⇒ X in D(I) endowed with the J1 topology.

C.2.2 Vervaat’s lemma The next result gives weak convergence for the inverses. It is of a particular importance when one deals with order statistics and can be viewed as a special case of the Continuous Mapping Theorem A.2.5. Theorem C.2.16 (Vervaat’s Lemma) Let I, J be non-empty intervals, {X, Xn , n ≥ 1} be D(I)-valued random elements such that X has almost surely continuous paths. Let g : I → J be a surjective, continuous, and increasing function with positive derivative g . If Xn , n ≥ 1, w have non-decreasing paths, if cn → ∞ and cn (Xn − g) =⇒ X in D(I) endowed with the J1 topology, then ← ← ← cn (Xn − g, X← n − g ) =⇒ (X, −(g ) × X ◦ g ) , w

in D(I) × D(J) endowed with the product J1 topology.

Proof. [Ver72], [dHF06, Lemma A.0.2].



C.3 The M1 topology We have seen the limitations of the J1 topology. The M1 topology overcomes some of them. In the M1 topology, it will be possible for a sequence of continuous functions to converge to a discontinuous function, and although addition will still not be continuous, it will be “more continuous” than in the J1 topology (compare Propositions C.2.5 and C.3.5). In order to define the M1 topology, we must first define the completed graph of a function with jumps. For x, y ∈ R, define the segment x, y as [x∧y, x∨y].

C.3 The M1 topology

541

For f ∈ D([0, 1]), define Γ (f ) = {(t, x) ∈ [0, 1] × R | x ∈ f (t− ), f (t)} . If f does not have a jump at t, then the segment f (t− ), f (t) is just the point f (t). Otherwise, the graph of f is completed by the vertical segment (t, f (t− )), (t, f (t)) of the plane R2 . See Figure C.1. An order is defined on the graph Γ (f ) as follows: write (t, x) ≺ (t , x ) if t < t or t = t and x is closer to f (t− ) than x , i.e. |f (t− ) − x| < |f (t− ) − x |. The points of the completed graph are thus totally ordered. x

x

t

t

Fig. C.1. Graph (left) and completed graph (right) of a c` adl` ag function.

A parametric representation of the completed graph Γ (f ) is a continuous increasing (with respect to the order ≺) map of [0, 1] onto Γ (f ), i.e. a continuous map (r, u) : [0, 1] → [0, 1] × R such that for all t ∈ [0, 1], either u(t) = f (r(t)) or u(t) ∈ f (r(t)− ), f (r(t)) if f has a jump at r(t). The set of parametric representations of the completed graph of f is denoted by R(f ). We can now define the M1 distance and topology.

Definition C.3.1 The distance dM1 is defined for f, g ∈ D([0, 1]) by dM1 (f, g) =

inf

(r,u)∈R(f ),(s,v)∈R(g)

max{r − s∞ , u − v∞ } .

Then dM1 is a metric on D([0, 1]) and the M1 topology is the topology defined by this metric.

The following bound holds: for f, g ∈ D([0, 1]), dM1 (f, g) ≤ dJ1 (f, g) ≤ f − g∞ . This shows that the J1 topology is finer than the M1 topology and that J1 convergence implies M1 convergence. The converse is not true.

542

C Weak convergence of stochastic processes

Example C.3.2 Define, for instance, for each n ≥ 1 the continuous function fn (x) = n(x − 1/2 + 1/n)1[1/2−1/n,1/2) (x) + 1[1/2,1] (x). Then fn converges pointwise to 1[1/2,1] but not for the J1 topology since the limit is discontinuous. But it does converge in the M1 topology. See Figure C.2. In order to see this, consider the following parameterizations (rn , un ) and (sn , vn ) of the completed graphs of fn and f : rn (t) = t , un (t) = fn (t) , n 1 t1[0,1/2−1/n] (t) + 1[1/2−1/n,1/2] (t) + t1[1/2,1] (t) , sn (t) = n−2 2 un = vn = fn . Then rn − sn ∞ = 1/n and un − vn ∞ = 0, hence dM1 (fn , f ) ≤ 1/n.

fn

1 2

0





f

1 n

1 2

1

0

1 2

1

Fig. C.2. M1 but not J1 convergence of fn to f .

This example shows that the M1 topology can handle a sequence of continuous functions converging to a discontinuous one. The M1 topology can also handle multiple jumps. The sequence of functions of Example C.2.3 does not converge to its pointwise limit in the J1 topology but does converge in the M1 topology. Example C.3.3 Let {(tn,1 , xn,1 ), . . . , (tn,k , xn,k ), n ≥ ([0, 1] × (0, ∞))k such that limn→∞ tn,i = t0 ∈ (0, 1) limn→∞ (xn,1 + · · · + xn,k ) = s0 . Define the functions fn n ≥ 0 and f0 = s0 1[t0 ,∞) . Then it is possible to define tations (rn , un ) of fn and (r0 , u0 ) of f0 such that

1} be a sequence in for i = 1, . . . , k and k = i=1 xn,i 1[tn,i ,∞) , parametric represen-

dM1 (fn , f ) ≤ rn − r0 ∞ ∨ un − u0 ∞ = (∨ki=1 |tn,i − t0 |) ∨ |sn − s0 | . M

Thus fn →1 f0 . The M1 topology shares two important properties as the J1 topology.

C.3 The M1 topology

543

Proposition C.3.4 If fn converges to f with respect to the M1 topology, then supt∈[0,1] fn (t) converges to supt∈[0,1] f (t). If f is continuous, then fn converges uniformly to f .

Proof. The first statement is [Whi02, Theorem 13.4.1]. The second statement follows from [Whi02, Theorem 12.4.1 and Lemmas 12.4.2 and 12.5.1].  The following result shows that the M1 topology is weaker than the J1 topology and more useful with respect to addition. Proposition C.3.5 If f and g have no common jumps of opposite signs, then addition is continuous at (f, g) with respect to the M1 topology.



Proof. [Whi02, Theorem 12.7.3]

This result adds flexibility to the corresponding result in the J1 topology for which no common jumps at all are allowed. However, if f and g have a jump at the same location but with opposite signs, then convergence in the M1 topology may not hold. We now introduce a modulus of continuity adapted to the M1 topology. For real numbers y1 , y2 , and y3 , we define the distance of y3 to the segment y1 , y2  by d(y3 , y1 , y2 ) =

inf

t∈y1 ,y2 

|y3 − t| .

For f ∈ D([0, 1]), define wM (f, δ) = sup{d(f (t), f (s), f (u)), 0 ≤ s < t < u ≤ 1 , 0 < u − s ≤ δ} . This modulus of continuity yields a criterion for M1 convergence. Theorem C.3.6 dM1 (fn , f ) → 0 if and only if fn (t) → f (t) on a dense subset containing 0 and 1 and wM (fn , δ) → 0.

Since it is easily seen from the definition that wM (f, δ) = 0 for all δ > 0 for a non-decreasing function f , an obvious corollary of C.3.6 is that if fn is non-decreasing for each n, then the pointwise convergence of fn to f implies M1 convergence.

544

C Weak convergence of stochastic processes

Theorem C.3.7 Assume that the finite-dimensional distributions of the sequence of processes {Xn , n ≥ 1} with values in D([0, 1]) converge to those of a process X with values in D([0, 1]) such that P(X is left-continuous at 1) = 1. If moreover for each  > 0 lim lim sup P(wM (Xn , δ) > ) = 0 ,

δ→0 n→∞

(C.3.1)

w

then Xn =⇒ X in D([0, 1]) endowed with the M1 topology.

Proof. [Whi02, Theorem 12.12.3]



Unfortunately, there exists no convenient criterion to prove weak convergence in the M1 topology. One must resort to the definition of the M1 distance, to Theorem C.3.7 or to continuous mapping arguments to prove weak convergence of a sequence of stochastic processes with respect to the M1 topology. As already mentioned, the only easy case is that of a sequence of non-decreasing random functions, since wM (f, δ) = 0 for all δ > 0 for a non-decreasing function f . Corollary C.3.8 Assume that the finite-dimensional distributions of the sequence of non-decreasing processes {Xn , n ≥ 1} with values in D([0, 1]) converge to those of a process X such that P(X is left-continuous at 1) = w 1. Then Xn =⇒ X in D([0, 1]) endowed with the M1 topology.

C.3.1 Non-compact intervals We now consider the case of intervals I which do not contain their endpoints such as R, (0, ∞) or [0, ∞). The extension is the same as for the J1 topology. Definition C.3.9 Let {f, fn , n ≥ 1} be functions in D(I). We say that fn → f with respect to M1 topology if the restrictions fK,n of fn to each compact subinterval K ⊂ I whose endpoints are continuity points of f converges to the restriction fK of f to K with respect to the M1 topology on D(K).

C.4 Asymptotic equicontinuity

545

C.4 Asymptotic equicontinuity Definition C.4.1 Let (G, ρ) be a semi-metric space. A process {Z(f ), f ∈ G} is said to be separable it there exists a countable subclass G0 ⊂ G and Ω0 ⊆ Ω with P(Ω0 ) = 1 such that for all ω ∈ Ω0 , f ∈ G and  > 0, Z(f, ω) ∈ {Z(g, ω) : g ∈ G0 ∩ B(f, )} , where B(f, ) is the open ball with center f and radius , relative to ρ.

Note that this definition entails that the space (G, ρ) is separable and that G0 (which is countable by assumption) is dense in G. Example C.4.2 All c` adl` ag stochastic processes indexed by an interval of R are separable. An important consequence of the separability is measurability of the suprema. The result follows from the fact that with probability 1, ZG = sup Z(f ) = sup Z(f ) f ∈G

f ∈G0

and the latter supremum is measurable since G0 is countable. Lemma C.4.3 Assume that {Z(f ), f ∈ G} is separable. Then ZG is measurable.

Definition C.4.4 Let (G, ρ) be a semi-metric space. • The covering number N (G, ρ, ) is the minimum number of balls with radius  needed to cover G. The logarithm of the covering number is called its metric entropy. • The space (G, ρ) is said to be totally bounded if N (G, ρ, ) < ∞ for all  > 0.

Let Zn be the centered empirical process indexed by a class G defined by Zn (f ) =

mn 

{Zn,i (f ) − E[Zn,i (f )]} ,

(C.4.1)

i=1

where {Zn,i , n ≥ 1}, i = 1, . . . , mn , are i.i.d. separable, stochastic processes and mn is a sequence of integers such that mn → ∞.

546

C Weak convergence of stochastic processes

Define the random semi-metric dn on G by mn  d2n (f, g) = {Zn,i (f ) − Zn,i (g)}2 , f, g ∈ G .

(C.4.2)

i=1

In general, the (random) covering number N (G, dn , ) does not need to be measurable. Theorem C.4.5 Consider the sequence of processes Zn (f ) =

mn 

{Zn,i (f ) − E[Zn,i (f )]} ,

i=1

where for each n ≥ 1, {Zn,i , i = 1, . . . , mn } are i.i.d. separable stochastic processes. Assume that the semi-metric space (G, ρ) is totally bounded. Assume moreover that: (i) For all η > 0,   lim mn E[Zn,1 2G 1 Zn,1 2G > η ] = 0 .

n→∞

(C.4.3)

(ii) For every sequence {δn } which decreases to zero, lim

n→∞

sup f,g∈G ρ(f,g)≤δn

E[d2n (f, g)] = 0 .

(C.4.4)

(iii) There exists a measurable majorant of N ∗ (G, dn , ) of the covering number N (G, dn , ) such that for every sequence {δn } which decreases to zero, 

δn



P

log N ∗ (G, dn , )d −→ 0 .

(C.4.5)

0

Then {Zn , n ≥ 1} is asymptotically ρ-equicontinuous, i.e. ⎛ ⎞ lim lim sup P ⎝ sup |Zn (f ) − Zn (g)| > η ⎠ = 0 .

δ→0 n→∞

(C.4.6)

f,g∈G ρ(f,g) 0, n ∈ N, (ei )1≤i≤mn ∈ {−1, 0, 1}mn and

C.4 Asymptotic equicontinuity

j ∈ {1, 2}, the supremum m  n    j ei (Zn,i (f ) − Zn,i (g))  = sup   f,g∈G  ρ(f,g) A mn 2 i=1 ξi i=1 i=1 ξi   mn  −1 2 2 2 = P mn ξi G (Yn,i ) > A (vn /mn ) i=1

mn E[G2 (Yn,1 )] ≤ = O(A−2 ) . A2 vn Since N (G, dL2 (Qn ) , ) ≤ N (G, dL2 (Qn ) ,  ) for  ≤ , we have   δ log N (G, dn , )d > η P 0



δ

=P



 −1/2 log N (G, dL2 (Qn ) , sn )d



0

  =P A   ≤P A

δ/A



 −1/2 log N (G, dL2 (Qn ) , sn )d



0 δ/A

 log N (G, dL2 (Qn ) ,



 Qn

(G2 ))d



0

  ≤P A

δ/A 0

+ P(Qn (G2 ) > A2 s−1 n )    sup log N (G, dL2 (P ) , P (G2 ))d > η + O(A−2 ) .

P ∈P

Thus, by (C.4.9)



lim sup P n→∞

δ



 log N (G, dn , )d > η

= O(A−2 ) .

0



Since A is arbitrary, this finishes the proof.

C.4.2 VC and VC-subgraph classes Let C be a class of subsets of a measurable space E. For a finite set A ⊆ E, denote by TraceC (A) the trace of C on A, that is, the collection of all subsets of A obtained by intersection of A with sets C ∈ C. By ΔC (A) we denote the cardinality of TraceC (A). Note that ΔC (A) ≤ 2Card(A) . We say that C shatters A if ΔC (A) = 2Card(A) , that is, if every subset of A is the intersection of A with some set C ∈ C. Let mC (k) =

sup A⊆E:Card(A)=k

ΔC (A)

550

C Weak convergence of stochastic processes

and VC(C) = inf{k : mC (k) < 2k } . Definition C.4.9 (VC-class) A collection of sets C is called a VC-class if VC(C) < ∞, that is, there exists k < ∞ such that C does not shatter any subsets of E of cardinality k. The quantity VC(C) is called the VCindex of the class C. Example C.4.10 Let E = R, C = {(−∞, c) : c ∈ R} and A = {x1 , x2 }, x1 < x2 . Then TraceC (A) = {∅, x1 , {x1 , x2 }} and ΔC (k) = 3 < 4. Thus, the VC-index of C is 2 since no two-point set can be shattered. In general, C = {(−∞, c) : c ∈ Rd } has VC-index d + 1. Example C.4.11 Let E = R, C = {(a, b) : a, b ∈ R} and A = {x1 , x2 , x3 }, x1 < x2 < x3 . Then TraceC (A) = {∅, x1 , x2 , x3 , {x1 , x2 }, {x2 , x3 }, {x1 , x2 , x3 }} and ΔC (k) = 7 < 8. Thus, the VC-index of C is 3. In general, if a, b ∈ Rd , then the VC-index is 2d + 1. Definition C.4.12 (VC-subgraph class) The subgraph of a real-valued function f : E → R is the set Sf = {(x, s) ∈ E × R : s ≤ f (x)} . A class of functions G is VC-subgraph class if C = {Sf , f ∈ G} is a VC-class. Example C.4.13 ([Kos03], Lemma 9.8) The family of indicator functions of the sets in a VC-class C is a VC-subgraph class of functions with the VC-index equal to VC(C). In particular, the class of indicators of half-open intervals (−∞, c] or (c, ∞) in Rd is a VC-subgraph class with VC index d + 1. Example C.4.14 ([Kos08], Lemma 9.9 (vii)) If G is a VC-subgraph class of functions, then so are G + h = {f + h : f ∈ G}, Gg = {f g : f ∈ G}, and G ◦ g = {f ◦ g : f ∈ G} for any function g. Example C.4.15 ([Kos08], Lemma 9.10) If a class of function is linearly ordered, then it is VC-subgraph class with index 2. Example C.4.16 ([Dre15], Example 3.4) For s0 ≤ s ≤ t0 and y ∈ R let φs,y (x0 , . . . , xh ) = 1{|x0 | > s, xh ≤ y} and define gk,(s,y) : (Rh+1 )k → R by gk,(s,y) (x1 , . . . , xk ) =

k  j=1

φs,y (xj ) .

C.4 Asymptotic equicontinuity

551

Let Fk = {gk,(s,y) : s0 ≤ s ≤ t0 , y ∈ R}. The VC-index of the class Fk is O(log k). Indeed, the subgraph of gk,(s,y) equals {(t,(x1 , . . . , xk )) ∈ R × R2kd : t < gk,(s,y) (x1 , . . . , xk )} =

k 

(−∞, j) × {(x1 , . . . , xk ) : gk,(s,y) (x1 , . . . , xk ) = j} = M(s,y) .

j=0

We will show that the collection {M(s,y) , s0 < s < t0 , y ∈ R} is VC-class and calculate its index. (i)

= (yj,1 , yj,2 ), j = 1, . . . , k,

(i) yj

in R. Consider the set M =

Let m, k be positive integers. We denote y j i = 1, . . . , m. That is, we have mk points {(t

(i)

(i) (i) , (y 1 , . . . , y k )), i

(i)

(i)

= 1, . . . , m} of m points in R × R2k .

• If the symmetric difference {(s, ∞)×(−∞, y)}Δ{(s , ∞)×(−∞, y )} does (i) (i) not include any of the points (y 1 , . . . , y k ), i = 1, . . . , m, then M ∩ M(s,y) = M ∩ M(s ,y ) . (i)

• The hyperplanes {x ∈ R2 : xl = yj,l }, l = 1, 2, j = 1, . . . , k, i = 1, . . . , m, divide R2 into at most (mk + 1)2 hypercubes. • If (s, y) and (s , y ) are in the same hypercube, then the symmetric difference {(s, ∞)×(−∞, y)}Δ{(s , ∞)×(−∞, y )} does not include any of the points from M . Thus, M(s,y) and M(s ,y ) can pick out different subsets of M only if the points (s, y) and (s , y ) are in different hypercubes. • Consequently, the collection {M(s,y) , s0 < s < t0 , y ∈ R} can pick out at most (mk + 1)2 subsets of M . Hence, the collection cannot shutter M if (mk + 1)2 < 2m . This implies that m > 3 log(k). • Hence the VC index of the collection {M(s,y) , s0 < s < t0 , y ∈ R} is O(log k).  Let φ : Rh+1 → R and consider a functional Hφ defined on 0 (Rh+1 ) by  (0) (h) Hφ (x) = φ(xj ) , x = (xj )j∈Z , xj = (xj , . . . , xj ) . j∈Z

 = {Hφ , φ ∈ H}, where H is a space of real-valued functions defined on Let H h+1 . Define the function G on 0 (Rh+1 ) by R    (0) (0) (h) G(x) = 1 |xj | > 1 , x = (xj )j∈Z , xj = (xj , . . . , xj ) . j∈Z

552

C Weak convergence of stochastic processes

k of functions on 0 (Rh+1 ) by Define the class H k = {H1{G ≤ k}, H ∈ H}  . H Finally, let gk (x1 , . . . , xk ) =

k 

φ(xj )

j=1

and Fk = {gk : φ ∈ H}. k ) ≤ VC(Fk ). Lemma C.4.17 We have VC(H Proof. Define the map Πk : (Rh+1 )Z → (Rh+1 )k by   (0) Πk (x) = (x(1) , . . . , x(k) )1 |x(k+1) | ≤ 1 , (0)

(h)

where x(i) = (x(i) , . . . , x(i) ), i ≥ 1, are the decreasing order statistics according to the first component. Then  φ(xj ) = gk (Πk (x)) . 1{G(x) ≤ k} j∈Z

k = Fk ◦ Πk and the result follows by [Kos08, Lemma 9.9 (vii)]. Thus H



Combining Example C.4.16 and Lemma C.4.17 yields: Corollary C.4.18 Let φs,y (x0 , . . . , xh ) = 1{|x0 | > s, xh ≤ y}, s0 ≤ s ≤ t0 k = {Hφ 1{G ≤ k}, s0 ≤ s ≤ t0 , y ∈ R}. Then the and y ∈ R. Define H s,y k is O(log(k)). VC-index of the class H The importance of VC-subgraph classes is illustrated by the following lemma, where the bound on the covering number is established. Lemma C.4.19 If G is a VC-subgraph class with measurable envelope function G, then there exists a constant cst such that for every probability measure P and  ∈ (0, 1), N (G, dL2 (P ) , GL2 (P ) ) ≤ cst VC(G)(16e)VC(G) −2(VC(G)−1) . Proof. [GN16, Theorem 3.7.37], [vdVW96, Theorem 2.6.7] Lemma C.4.19 provides a sufficient condition for (C.4.9).



C.4 Asymptotic equicontinuity

553

Corollary C.4.20 If G is a VC-subgraph class with measurable envelope function G, then (C.4.9) holds. In particular, (C.4.9) holds for linearly ordered classes.

C.4.3 Random entropy for classes approximable by VC-subgraph classes If G is not a VC-subgraph class, we can still use the uniform entropy bound of Lemma C.4.19 to prove the random entropy condition (C.4.5) if G can be approximated by VC-subgraph classes. Let {Yn,i , 1 ≤ i ≤ mn } be random elements in a measurable set E and let G be a class of functions on E. Let vn be a non-decreasing sequence and recall that d2n (f, g) =

mn 1  {f (Yn,i ) − g(Yn,i )}2 . vn i=1

Assumption C.4.21 Let {Gk , k ≥ 1} be an a non-decreasing sequence of subclasses of G. Assume that the envelope function G of G is measurable. Assume also that: (i) There exists a constant cstG such that for every k ∈ N∗ , Gk is VCsubgraph class with index not greater than cstG k. (ii) For every k ≥ 1, there exists a measurable function Gk such that for all f ∈ G, there exists fk ∈ Gk satisfying |f − fk | ≤ Gk . (iii) There exists ζ ∈ (0, ∞) such that mn 4  P G2 (Yn,i ) −→ ζ . vn i=1

(C.4.10)

(iv) There exists γ ∈ (0, 1) such that mn 4  G2 (Yn,i ) = OP (k −1/γ ) . vn i=1 k

(C.4.11)

554

C Weak convergence of stochastic processes

Lemma C.4.22 Let {Yn,i , 1 ≤ i ≤ mn } be a triangular array of random elements in a measurable space E. If Assumption C.4.21 holds, then (C.4.5) holds i.e. there exists a measurable majorant N ∗ (G, dn , ) of N (G, dn , ) such that   δ log N ∗ (G, dn , )d > δ = 0 . (C.4.12) lim lim sup P δ→0 n→∞

0

Proof. Define the (random) probability measure Qn on G by Qn =

mn 1  δY . mn i=1 n,i

Define as usual the L2 (Qn ) distance on G by d2L2 (Qn ) (f, g) = m−1 n

mn 

{f (Yn,i ) − g(Yn,i )}2 .

i=1

For  > 0, define 

mn 4  2 Kn () = min k ∈ N : G2k (Yn,i ) < vn i=1 2

 .

Then, for f, g ∈ G and k > Kn (), we have by Assumption C.4.21 (ii), d2n (f, g) ≤ 2d2n (fk , gk ) +

mn 4  2mn 2 2 . G2k (Yn,i ) ≤ dL2 (Qn ) (fk , gk ) + vn i=1 vn 2

This bound implies that  N (G, dn , ) ≤ N

GKn () , dL2 (Qn ) , 



vn 4mn

1/2

 +1.

Set mn 4mn 4  2 ζn = Qn (G ) = G2 (Yn,i ) vn vn i=1

and Jk () = VC(Gk )(16e)VC(Gk ) −(2(VC(Gk )−1)) .

(C.4.13)

C.4 Asymptotic equicontinuity

555

Since Gk ⊂ G, the envelope function of Gk is smaller than G. Thus, by Lemma C.4.19, we obtain for each k     ! 1/2 vn vn ≤ cstJk  N Gk , dL2 (Qn ) ,  4mn 4mn Qn (G2 ) ≤ cstJk (ζn−1/2 ) ≤ cstJk ()(ζn ∨ 1)VC(Gk ) . (C.4.14) Combining (C.4.13) and (C.4.14) yields log N (G, dn , ) ≤ cst + log JKn () () + VC(GKn () ) log(ζn ∨ 1) . By (C.4.10), log(ζn ∨ 1) = OP (1), thus we need to prove that for all η > 0,   δ lim lim sup P log JKn () () d > η = 0 , (C.4.15a) δ→0 n→∞

0



δ

lim lim sup P

δ→0 n→∞



 VC(GKn () )d > η

=0.

(C.4.15b)

0

By assumption (C.4.11), Kn () = OP (−2γ ). Thus, for ξ ∈ (0, 1), A0 can be chosen such that Kn () ≤ A0 −2γ with probability greater than 1 − ξ. Since γ ∈ (0, 1) and VC(Gk ) = O(k) by Assumption C.4.21 (i), this yields, with probability tending to 1, 

δ



 VC(GKn () )d ≤ cst

0

δ

−γ d = O(δ 1−γ ) .

0

Similarly, with probability tending to 1, we have 

δ



log JKn () () d = O(δ 1−γ ) .

0

This proves (C.4.15).



D Markov chains

In this chapter, we use the definitions and notation introduced at the beginning of Chapter 14 in order to present the main results used therein. Our main reference is the monograph [DMPS18].

D.1 Generalities D.1.1 Existence of stationary solution In this section, we provide conditions that guarantee the existence of the invariant distribution for a Markov chain which admits the functional representation (14.0.1). For conciseness, we write Φz (x) = Φ(x, z) and Φz1 ◦ Φz2 (x) = Φ(Φ(x, z2 ), z1 ). Theorem D.1.1 Let (E, d) be a complete, separable metric space endowed with its Borel σ-field and (F, F) be a measurable space. Let ε0 be an F-valued random element and Φ : E × F → E be a measurable map. Let Π be the Markov kernel defined by Π(x, A) = P(Φ(x, ε0 ) ∈ A) . Assume that there exist x0 ∈ E and a measurable function g : F → R+ such that for all (x, y, z) ∈ E × E × F, d(Φ(x, z), Φ(y, z)) ≤ g(z)d(x, y) , E[log+ g(ε0 )] < ∞ , E[log g(ε0 )] ∈ [−∞, 0) , " + # E log d(x0 , Φ(x0 , ε0 )) < ∞ .

(D.1.1a) (D.1.1b) (D.1.1c) (D.1.1d)

Then the kernel Π admits a unique invariant probability measure which  ∞ defined by the almost sure limit is the distribution of Y © Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4

557

558

D Markov chains

 ∞ = lim Φε ◦ · · · ◦ Φε (x) , Y 1 n n→∞

which is independent of the initial value x ∈ E.



Proof. [DMPS18, Theorem 2.1.9].

D.1.2 Return times Y Let {Yj , j ∈ N} be a stochastic process. We write F∞ = σ{Yj , j ∈ N}. A Y functional Y is F∞ -measurable if it can be expressed as Y = F (Y0 , Y1 , . . . ). Y -measurable functionals by The forward shift  is defined on the set of F∞ j Y ◦  = F (Y1 , Y2 , . . . ) and its iterates  , j ≥ 1 are defined recursively in an obvious way and 0 = Id.

For a measurable set A ⊂ E, let τA and σA be the first visit and return times to A, that is, τA = inf{j ≥ 0 : Yj ∈ A} , σA = inf{j > 0 : Yj ∈ A} , with the convention inf ∅ = +∞. The hitting and return time are related by σ A = τA ◦  + 1 .

(D.1.2) (0)

(1)

The successive return times to A are defined recursively by σA = 0, σA = σA and (n)

(n−1)

σA = inf{j > σA

: Yj ∈ A} , n ≥ 2 .

The successive return times are linked by the following relation (n)

(n−1)

σ A = σ A ◦  σA

(n−1)

+ σA

, n≥1.

(D.1.3)

In words, the n-th return time is the first return time after the (n − 1)-th return time. A measurable set A is said to be accessible if Px (σA < ∞) = 1 for all x ∈ E. A set A is said to be Harris recurrent if Px (σA < ∞) = 1 for all x ∈ A and positive recurrent (or simply positive) if Ex [σA ] < ∞ for all x ∈ A.

D.2 Atoms and small sets

559

D.2 Atoms and small sets A measurable set A is an atom for the kernel Π if there exists a probability measure νA such that Π(x, ·) = νA for all x ∈ A. The measure νA is called the exit distribution from the atom. If x ∈ A, then Ex [f (Y1 )] = νA (f ) . Since the distribution of the chain starting from a point x ∈ A does not depend on x but only on A, it is customary to denote this distribution by PA . It is important to note that PA is different from PνA : PA is the distribution of {YσA +j , j ≥ 0} and PνA is the distribution of {YσA +j , j ≥ 1}. In particular, in view of (D.1.2), EA [σA ] = EνA [τA ] + 1 . Define the increasing sequence of random times {Tn , n ∈ N} as follows (n−1)

T 0 = τA + 1 , Tn = τA + σ A

◦ τA + 1 , n ≥ 1 .

The random times Tn are regeneration times for the chain. Theorem D.2.1 Assume that A is an accessible, positive recurrent atom with exit distribution νA . Then the chain admits a unique invariant distribution πA defined by $σ −1 $T −1 % % A 0   1 1 EA Eν πA (f ) = f (Yi ) = f (Yi ) . EA [σA ] EνA [T0 ] A i=0 i=0 (D.2.1) Furthermore, EνA [Tn ] < ∞ for all n ≥ 1, YTn has distribution νA and the random length vectors {YTn , . . . , YTn+1 −1 ), n ≥ 0, are i.i.d. under Pμ for any initial distribution μ.

Proof. [DMPS18, Theorem 6.4.2 and Proposition 6.5.1].



As a consequence of the last statement, for every initial distributions μ and n ≥ 0, since YTn has distribution νA , we also have ⎡ ⎤ Tn+1 −1  1 πA (f ) = Eμ ⎣ f (Yi )⎦ . Eμ [Tn+1 − Tn ] i=Tn

When there are no accessible atoms, regenerative properties can be obtained if an accessible small set exists.

560

D Markov chains

Definition D.2.2 Let m ≥ 1 be an integer,  > 0 and ν be a probability measure on E. A set C ∈ E is an (m, ν)-small set if for all x ∈ C and A ∈ E, Π m (x, A) ≥ ν(A) .

(D.2.2)

If m =  = 1, then C is an atom. A Markov chain which admits an accessible small set is said to be irreducible. Cf. [DMPS18, Definition 9.2.1]. With the help of small sets, we can define aperiodicity. Definition D.2.3 (Aperiodicity) A Markov kernel Π which admits an accessible small set is said to be aperiodic if there exists no integer c > 1 such that {n ∈ N∗ : inf Π n (x, C) > 0} ⊂ {c, 2c, 3c, . . . } . x∈C

A convenient criterion for aperiodicity is provided by the following lemma. Lemma D.2.4 The kernel Π is aperiodic if and only if there exists an increasing countable union of small sets. 

Proof. [DMPS18, Proposition 9.4.11].

D.3 Geometric ergodicity and mixing An important subclass of Markov chains is characterized by the convergence of the iterates of the transition kernel to the invariant distribution at a geometric rate. Theorem D.3.1 Assume that there exists a function V : E → [1, ∞), δ ∈ [0, 1), b > 0 such that ΠV ≤ δV + b .

(D.3.1)

Assume furthermore that there exists c ≥ 1 such that δ + b/c < 1 and the level set {V ≤ c} is a small set. Then there exists a unique invariant

D.3 Geometric ergodicity and mixing

561

distribution π, π(V ) < ∞ and the chain is V -geometrically ergodic: there exist a constant c∗ which depends only on the level set {V ≤ c}, δ and b, and ρ ∈ (0, 1) which depends only on δ and b, such that for all measurable functions f satisfying |f | ≤ V , n ∈ N and x ∈ E, |Π n f (x) − π(f )| ≤ c∗ V (x)ρn .

(D.3.2)

Proof. [DMPS18, Theorem 18.4.3].



A Markov chain which satisfies (D.3.2) is said to be V -geometrically ergodic. We provide another sufficient condition for V -geometric ergodicity in terms of the return time to a small set. Theorem D.3.2 Assume that the kernel Π is aperiodic. Let β > 1 and C be a small set such that Ex [β σC ] < ∞ for all x ∈ E and supx∈C Ex [β σC ] < ∞. Then the chain is V -geometrically ergodic and (D.3.1) holds with V (x) = Ex [β τC ].

Proof. [DMPS18, Theorem 15.2.4].



We will also need the following integrability result. We note that the small set assumption in Theorem D.3.1 is not needed. Lemma D.3.3 Let Π be a Markov kernel on E and V : E → [1, ∞) be a function which satisfies (D.3.1). If π is an invariant distribution for Π, then π(V ) < ∞. Proof. [DMPS18, Theorem 14.1.10]. Geometric ergodicity implies β-mixing with geometric rate. Proposition D.3.4 Let {Yj , j ∈ Z} be a stationary and Markov chain with invariant distribution π and state space (E, E), such that E is countably generated. Then  (D.3.3) βn ≤ dT V (Π n (x, ·), π)π(dx) . E

If the chain is geometrically ergodic, then there exists ρ ∈ (0, 1) such that βn = O(ρn ) . (D.3.4)



562

D Markov chains



Proof. [DMPS18, Theorem F.3.3]. Moment bound

Theorem D.3.5 Let the assumptions of Theorem D.3.1 hold. There exist β > 1 and a constant such that for all measurable functions f satisfying |f | ≤ V and π(f ) = 0 and all x ∈ E, ∞ 

β j |Π j f (x)| ≤ cst V (x) .

(D.3.5)

j=0

If {Yj } is stationary under P, then for all functions f such that |f | ≤ V , we have   n  f (Yi ) ≤ cst nE[|f (Y0 )|V (Y0 )] . (D.3.6) var i=1

Proof. If |f | ≤ V and π(f ) = 0, (D.3.2) implies that (D.3.5) holds for each β ∈ (1, ρ−1 ). To prove (D.3.6), write f¯ = f − E[f (Y0 )]. We have π(f¯) = 0 and since |f | ≤ V and V ≥ 1, it also holds that |f¯| ≤ cstV . Thus we can apply (D.3.5):  n  n   ¯ var f (Yi ) = n E[f (Y0 )f (Y0 )] + 2 (n − i)E[f (Y0 )f¯(Yi )] i=1

i=1

= n E[f (Y0 )f¯(Y0 )] + 2

n  i=1

(n − i)E[f (Y0 )Π i f¯(Y0 )] $

≤ n E[f (Y0 )f¯(Y0 )] + cst n E |f (Y0 )|

n 

% |Π f¯(Y0 )| i

i=1

≤ cst nE[|f (Y0 )|V (Y0 )|] . 

This proves (D.3.6). Remark D.3.6 For r > 1, we have by concavity, ΠV

1 r

1

1

1

≤ (ΠV ) r ≤ (λV + b) r ≤ λ r V

Since the level sets of V 1 V r.

1 r

1 r

1

+ br .

are those of V , we can apply Theorem D.3.5 to ⊕

E Martingales, Central Limit Theorems, Mixing, Miscellanea

E.1 Martingales Definition E.1.1 Let (Ω, F, {Fj , j ∈ N}, P) be a filtered probability space. An adapted sequence {ξj , j ∈ N∗ } of integrable random variables is a martingale difference sequence if for all n ≥ 1, E[ξn | Fn−1 ] = 0 a.s. An adapted sequence {Sj , j ∈ N} of integrable random variables is a martingale if {Sj − Sj−1 , j ≥ 1} is a martingale difference sequence, i.e. if for all n ≥ 1, E[Sn | Fn−1 ] = Sn−1 a.s. If the filtration is not explicitly mentioned, then it is understood to be the natural filtration of the sequence {Sj }. Proposition E.1.2 (Doob’s inequality) Let {Xj , j ∈ N} be sequence of martingale differences. For p ≥ 1,  p %  % $ n $ k     p p     E  Xi  ≤ Xk  . E max  (E.1.1)    1≤k≤n  p − 1 i=1 k=1



Proof. [HH80, Theorem 2.2]. For p = 2, this yields  2 ⎤ k n      E ⎣ max  Xi  ⎦ ≤ 2 E[Xk2 ] .  1≤k≤n  ⎡

i=1

(E.1.2)

k=1

© Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4

563

564

E Martingales, Central Limit Theorems, Mixing, Miscellanea

Proposition E.1.3 (Rosenthal inequality) Let {Xj , j ∈ N} be sequence of martingale differences adapted to the filtration {Fi }. For p ≥ 2, ⎡ p/2 ⎤ n n   ⎦+C E[|Sn |p ] ≤ CE ⎣ E[Xi2 | Fi−1 ] E[|Xi |p ] . (E.1.3) i=1

i=1



Proof. [HH80, Theorem 2.12]. If the random variables Xi are i.i.d., then Rosenthal inequality becomes E[|Sn |p ] ≤ Cnp/2 (E[X12 ])p/2 + CnE[|X1 |p ] .

(E.1.4)

Theorem E.1.4 (Burkholder inequality) Let {Xj , j ∈ N} be sequence of martingale differences adapted to the filtration {Fi }. For every p ≥ 1, there exists a constant C such that for all n ≥ 1, ⎡ p/2 ⎤ n  ⎦ . (E.1.5) Xi2 E[|Sn |p ] ≤ CE ⎣ i=1



Proof. [HH80, Theorem 2.10]. If p ∈ [1, 2], Burkholder inequality becomes E[|Sn | ] ≤ C p

n 

E [|Xi |p ] .

(E.1.6)

i=1

Theorem E.1.5 (Martingale convergence theorem) Let {ξj , j ∈ Z} be a martingale difference sequence and set Sn = ξ1 + · · · + ξn , n ≥ 1. If supn≥1 E[|Sn |] < ∞, then Sn converges almost surely to a random variable S such that E[|S|] < ∞. Proof. [HH80, Theorem 2.5]



E.2 Central limit theorems Theorem E.2.1 (Martingale central limit theorem) Let {Zn,i , 1 ≤ i ≤ n} be a triangular array of random variables and let Fn,i be a triangular array of σ-fields such that Zn,j is Fn,i measurable for all j ≤ i ≤ n. Assume that there exists σ 2 > 0 such that

E.2 Central limit theorems n 

565

P

var(Z2n,i | Fn,i−1 ) −→ σ 2 ,

i=1 n 

P

E[Zn,i | Fn,i−1 ] −→ 0 ,

i=1

∀ > 0 ,

n 

P

E[Z2n,i 1{|Zn,i | > } | Fn,i−1 ] −→ 0 .

i=1

Then Zn =

n 

d

Zn,i −→ N (0, σ 2 ) .

i=1

Proof. [HH80, Theorem 3.2 and Corollary 3.1]; [DCD86, Theorem 2.8.42].  As a corollary, we obtain for independent summand what is often referred to as the Lindeberg-L´evy central limit theorem. Theorem E.2.2 Let {Zn,i , 1 ≤ i ≤ n} be a triangular array of independent random variables with mean zero and finite variance. Assume that there exists σ 2 > 0 such that n  lim var(Z2n,i ) = σ 2 , (E.2.1) n→∞

i=1

and for all  > 0, lim

n→∞

Then

n i=1

n 

E[Z2n,i 1{|Zn,i | > }] = 0 .

(E.2.2)

i=1

d

Zn,i −→ N (0, σ 2 ).

Theorem E.2.3 (The Wold device) Let {Xn , n ∈ N} be a sequence of centered processes indexed by a set T . Let X be a centered Gaussian process indexed by T with covariance function Γ . The finite-dimensional distributions of Xn converge weakly to those of X if and only if for all q ≥ 1, u ∈ Rq and t1 , . . . , tq ∈ T q , ⎞ ⎛ q   d (E.2.3) ui Xn (ti ) −→ N ⎝0, Γ (ti , tj )ui uj ⎠ . i=1

Proof. [Bil86, Theorem 29.4]

1≤i,j≤q



Combining Theorem E.2.2 and Theorem E.2.3 we obtain a multivariate version of the Lindeberg-L´evy central limit theorem.

566

E Martingales, Central Limit Theorems, Mixing, Miscellanea

Theorem E.2.4 Let {Zn,i , 1 ≤ i ≤ n}, n ≥ 1, be a triangular array of independent random processes with mean zero and finite variance, indexed by a set T . Assume that there exists a function Γ on T × T such that n  cov(Zn,i (s), Zn,i (t)) = Γ (s, t) , (E.2.4) lim n→∞

i=1

and for all  > 0 and all t ∈ T , n  lim E[Z2n,i (t)1{|Zn,i (t)| > }] = 0 . n→∞

(E.2.5)

i=1

n Then the finite-dimensional distributions of i=1 Zn,i converge weakly to those of a centered Gaussian process X with covariance function Γ , that is, for each q ∈ N, and t1 , . . . , tq ∈ T we have  n  n   d Zn,i (t1 ), . . . , Zn,i (tq ) −→ (X(t1 ), . . . , X(tq )) . i=1

i=1

Proof. Set Xn (t) = we have

n k=1 q 

Zn,k (t). For every q ≥ 1, u ∈ Rq and t1 , . . . , tq ∈ T ,

uj Xn (tj ) =

j=1

q n  

uj Zn,k (tj ) .

k=1 j=1

q By Theorem E.2.3, it suffices to prove that j=1 uj Xn (tj ) converges weakly to a centered Gaussian random variable with variance 1≤i,j≤q Γ (ti , tj )ui uj . To prove this convergence, we apply Theorem E.2.2. The convergence of the variance is a straightforward consequence q of (E.2.4). To check the asymptotic negligibility condition, write ζn,k = j=1 uj Zn,k (tj ). Then n 

2 E[ζn,k 1{|ζn,k | > }]

k=1

≤q ≤q

q  j=1 q 

u2j u2j

j=1

≤q

E[Z2n,k (tj )1{|ζn,k | > }]

k=1 q 

n 

j  =1

k=1

q  q  n  * j=1

=q

n 

2

j  =1

E[Z2n,k (tj )1{|Zn,k (tj  )| > /(q|uj  |)}]

u2j E[Z2n,k (tj )1{|Zn,k (tj )| > /(q|uj |)}]

k=1

+ +u2j  E[Z2n,k (tj  )1{|Zn,k (tj  )| > /(q|uj  |)}]

q  j=1

u2j

n  k=1

E[Z2n,k (tj )1{|Zn,k (tj )| > /(q|uj |)}] .

E.2 Central limit theorems

567

Assumption (E.2.5) ensures that each term of the latter sum tends to zero as n tends to infinity. Thus Conditions (E.2.1) and (E.2.2) hold.  We will need two extensions of the Lindeberg central limit theorem. Theorem E.2.5 Let {Xn,i , 1 ≤ i ≤ n} be a triangular array of random variables with mean zero and finite variance. Assume that there exists a sequence of σ-algebras {Fn , n ≥ 1} such that for every n, Xn,1 , . . . , Xn,n are conditionally independent given Fn . Assume that there exists σ 2 > 0 such that n 

P

2 E[Xn,i | Fn ] −→ σ 2 ,

(E.2.6)

i=1

and for all  > 0, n 

P

2 E[Xn,i 1{|Xn,i | > } | Fn ] −→ 0 .

(E.2.7)

i=1

Then, E[eit as n → ∞ and

n

n i=1

Xn,i

P

| Fn ] −→ e−σ

2 2

t /2

(E.2.8)

d

i=1

Xn,i −→ N(0, σ 2 ).

We provide a brief proof because we couldn’t locate a reference. Proof. The last statement is simply a consequence of the convergence (E.2.8) and the dominated convergence theorem since the conditional characteristic function is bounded by 1. We prove (E.2.8). By the conditional independence, we must show that n 

P

log E[eizXn,i | Fn ] −→ −

i=1

σ2 2 z . 2

(E.2.9)

We need the following elementary bounds. For  ∈ (0, 1), x3 , x∈R, 6 u2 | log(1 + u) − u| ≤ , u ≥ − . 2(1 − )

|eix − 1 − ix +

x2 2 |



For  ∈ (0, 1), we write A for the event {max1≤i≤n |Xn,i | ≤ }. Since E[Xn,i | Fn ] = 0 almost surely, on the event A , we have   2  izXn,i  2 2 | Fn ] − 1 + z2 E[Xn,i | Fn ] ≤ cst z 3 Xn,i , E[e   2 log E[eizXn,i | Fn ] − E[eizXn,i ] + 1 ≤ cst2 z 4 Xn,i .

568

E Martingales, Central Limit Theorems, Mixing, Miscellanea

Thus, on the event A ,   n n   z2    izXn,i 2 log E[e | Fn ] + E[Xn,i | Fn ]    2 i=1 i=1 ≤

n    log E[eizXn,i | Fn ] + E[eizXn,i − 1 | Fn ] i=1 n 

+

E[|eizXn,i − 1 − izXn,i +

i=1

≤ cst 

n 

z2 2 X | | Fn ] 2 n,i

2 E[Xn,i | | Fn ] .

i=1

By assumption (E.2.6), this proves that for all η > 0, we can choose  small enough such that   n  n   z2    izXn,i 2 lim P  log E[e | Fn ] + E[Xn,i | Fn ] > η, A = 0 . n→∞   2 i=1

i=1

On the other hand, P(Ac | Fn ) ≤

n 

P(|Xn,i | >  | Fn ) ≤ −2

i=1

n 

2 E[Xn,i 1{|Xn,i | > } | Fn ) .

i=1 P

By assumption (E.2.7), this yields that P(Ac | Fn ) −→ 0. Since a conditional probability is bounded by 1, this implies that limn→∞ P(Ac ) = 0 by the dominated convergence theorem. Altogether, we have proved (E.2.9).  Corollary E.2.6 (Conditional multiplier CLT) Let {Zn,i , 1 ≤ i ≤ n} be a triangular array of random variables with finite variance, independent of the i.i.d. sequence of real random variables {ξi , i ≥ 1} with E[ξi ] = 0 and E[ξi2 ] = 1. Let FnZ be the σ-algebra generated by {Zn,i , 1 ≤ i ≤ n}. Assume that there exists σ 2 > 0 such that n  P Z2n,i −→ σ 2 , (E.2.10) i=1

and for all  > 0, n 

P

Z2n,i E[ξi2 1{|ξi Zn,i | > } | FnZ ] −→ 0 .

(E.2.11)

i=1

n Then, conditionally on FnZ , i=1 ξi Zn,i , converges weakly to a normal random variable with mean zero and variance σ 2 , i.e. E[eit

n i=1

ξi Zn,i

P

| FnZ ] −→ e−σ

2 2

t /2

E.3 Mixing

569

as n → ∞. Furthermore, if for all  > 0, n 

P

Z2n,i 1{|Zn,i | > } −→ 0 ,

(E.2.12)

i=1

then (E.2.11) holds. Proof. The first statement is a particular case of (E.2.5) with Xn,i = ξn,i Zn,i and Fn = FnZ . To prove the second statement, note first that if ξ0 is bounded, (E.2.12) implies that n 

P

Z2n,i E[ξi2 1{|ξi Zn,i |>} | FnZ ] −→ 0 ,

i=1

i.e. (E.2.11) holds. Otherwise, fix an arbitrary A > 0. Then, mn 

# " Z2n,i (t)E ξi2 1{|ξi ||Zn,i (t)| > } | FnZ

i=1

≤A

2

mn 

Z2n,i (t)1{|Zn,i (t)|

> /A} +

E[ξ12 1{|ξ1 |

> A}]

i=1

mn 

Z2n,i (t) .

i=1

The first expression in the last line converges in probability to zero as n → ∞ by (E.2.12). By (E.2.10), the sum in the second term is bounded in probability in n and vanishes by letting A → ∞. This proves (E.2.11). 

E.3 Mixing Let (Ω, F, P) be a probability space and let A, B be sub-σ-algebras of F. The mixing coefficients were introduced in [Ros56], [VR59], [Ibr62], and [KR60]. See also [Bra05], [Bra07]. Definition E.3.1 α(A, B) =

sup A∈A,B∈B

1  |P(Ai ∩ Bj ) − P(Ai )P(Bj )| , 2 i=1 j=1 I

β(A, B) = sup

|P(A ∩ B) − P(A)P(B)| ; J

where the supremum is taken over all pairs of finite partitions A1 , . . . , AI and B1 , . . . , BJ of Ω such that Ai ∈ A for each i and Bj ∈ B for each j.

570

E Martingales, Central Limit Theorems, Mixing, Miscellanea

Equivalently, the β-mixing coefficients can be expressed as a total variation distance: β(A, B) = dT V (PA⊗B , PA ⊗ PB ) ,

(E.3.1)

where dT V (·, ·) is the total variation distance between measures (see Appendix A.1.2), PA⊗B is the restriction of the probability measure P to the product sigma-field A × B, PA , PB are the restrictions of P to A and B, respectively. Definition E.3.2 Let {Xj , j ∈ Z} be a stationary sequence and let Fab be the σ-field generated by (Xa , Xa+1 , . . . , Xb ), −∞ ≤ a < b ≤ ∞. The sequence {Xj , j ∈ Z} is said to be 0 , Fn∞ ) = 0; (i) strongly or α-mixing if limn→∞ α(F−∞ 0 (ii) absolutely regular or β-mixing if limn→∞ β(F−∞ , Fn∞ ) = 0.

The following relation between the mixing coefficients holds: 2α(A, B) ≤ β(A, B) .

(E.3.2)

Thus β-mixing implies strong mixing. Absolute regularity allows to compare probability laws of a time series with another sequence based on independent blocks. Lemma E.3.3 Let X1 , . . . , Xm be random elements such that β(σ(X1 , . . . , Xj ), σ(Xj+1 , . . . , Xm )) ≤  , † be independent random variables for all j = 1 . . . , m − 1. Let X1† , . . . , Xm † such that Xi has the same distribution as Xi , i = 1, . . . , m. Then

† ) ≤ m . dT V (L (X1 , . . . , Xm ) , L X1† , . . . , Xm

Furthermore, if ξ1 , . . . ξm are i.i.d. random elements and independent of † ), then (X1 , . . . , Xm ) and (X1† , . . . , Xm

† , ξm ) ) ≤ m . dT V (L ((X1 , ξ1 ), . . . , (Xm , ξm )) , L (X1† , ξ1 ), . . . , (Xm Proof. [Ebe84, Lemma 2] for the first statement and [Bra05, Theorem 5.2] for the second one.  As an immediate corollary, we obtain the following results which validates the blocking method.

E.3 Mixing

571

Lemma E.3.4 Let  ≤ r and m be integers. Let {Ω, F, P) be a probability space on which are defined (i) a stationary Rd -valued sequence {X j , j ∈ Z} with β-mixing coefficients βj , (ii) a sequence {X †j , j ∈ Z} such that the vectors (X †(i−1)r+1 , . . . , X †ir ), i ∈ Z, are mutually independent and each one has the same distribution as (X 1 , . . . , X r ), (iii) a sequence {j , j ∈ Z} of i.i.d. random elements with value in a measurable set (E, E), independent of the previous two sequences. Let Qm and Q†m be the respective laws of {(i , X (i−1)r+1 , . . . , X (i−1)r− ), 1 ≤ i ≤ m} and {(i , X †(i−1)r+1 , . . . , X †(i−1)r− ), 1 ≤ i ≤ m}. Then dT V (Qm , Q†m ) ≤ mβ . The bound on the total variation distance yields that for any bounded measurable function h : Rdm(r−) → R we have |Q(h) − Q† (h)| ≤ mβ h∞ .

(E.3.3)

For n, k ≥ 1, let hn,k : (Rd )k → R be measurable functions. Define Hn and H†n by Hn = H†n =

mn  i=1 mn 

hn,rn (X (i−1)rn +1 , . . . , X irn ) , hn,rn (X †(i−1)rn +1 , . . . , X †irn ) ,

i=1 † † H0n = Hn − E[Hn ] , H†,0 n = Hn − E[Hn ] .

The following lemma will be used to prove finite-dimensional convergence for relevant statistics. Lemma E.3.5 Let {X j , j ∈ Z} be a stationary Rd -valued sequence with βmixing coefficients βj . Let n , mn and rn be non-decreasing sequences of integers such that n /rn → 0. Assume that there exist measurable functions gn such that |hn,rn (X 1 , . . . , X rn ) − hn,rn −n (X 1 , . . . , X rn −n )| ≤ gn (X rn −n +1 , . . . , X rn ) , (E.3.4a) lim mn E[gn2 (X 1 , . . . , X n )] = 0 .

(E.3.4b)

lim mn βn = 0 ,

(E.3.5)

n→∞

If moreover n→∞

0 and H†,0 n converges weakly, then Hn converges weakly to the same distribution.

572

E Martingales, Central Limit Theorems, Mixing, Miscellanea

, n = mn hn,r − (X (i−1)r +1 , . . . , X ir − ) and H , † simProof. Define H n n n n n n i=1 , † ) = o(1), and hence ilarly. Condition (E.3.4b) implies that var(H†n − H n the small blocks are negligible for the process based on independent blocks. † † Therefore H†,0 n = Hn − E[Hn ] converges weakly to the same distribution as †,0 † † , , , Hn = Hn − E[Hn ]. Applying Lemma E.3.4 and (E.3.5), we obtain, for all z ∈ R,   0   izH †,0 E[e n ] − E[eizHn ] ≤ mn βn → 0 , , 0 and since the complex exponential is uniformly bounded by 1. Therefore H n †,0 , Hn have the same limiting distribution. There only remains to prove that , 0 have the same limiting distribution, that is, the small blocks are H0n and H n also negligible for the original process. For every  > 0, applying Lemma E.3.4 to the small blocks (which are separated by more than n for large n since n /rn → 0) and the Bienaym´e-Chebyshev inequality, we have

, 0 | > ) ≤ mn β + P |H†,0 − H , †,0 | >  P(|H0n − H n n n n ≤ mn βn + −2 mn E[gn2 (X 1 , . . . , X n )] → 0 , under conditions (E.3.4b) and (E.3.5).



E.4 Miscellanea E.4.1 Stochastic processes Theorem E.4.1 (Skorokhod’s representation theorem) Let {Zj , j ∈ Z} be a sequence of random elements in a separable metric space (S, ρ) such d ˆ there exists ˆ Fˆ , P), that Zn −→ Z0 . Then on a suitable probability space (Ω, a sequence {ηj , j ∈ Z} of random elements such that L (ηj ) = L (Zj ) for all ˆ j ≥ 0 and ηn → η0 P-almost surely. Proof. [Kal97, Theorem 3.30]



Definition E.4.2 An distribution μ on R is said to be infinitely divisible if for each integer n ≥ 1, there exists a distribution μn on R such that μ∗n n = μ. Theorem E.4.3 (L´ evy-Khinchin representation) A distribution μ on R is infinitely divisible if and only exists b ∈ R, σ 2 > 0 and a measure - ∞if there 2 ν on R such that ν({0}) = 0, ∞ (x ∧ 1)ν(dx) < ∞ and    ∞  ∞ σ 2 t2 + eitx μ(dx) = exp ibt − {eitx − 1 − itx1{|x|≤1} }ν(dx) . 2 −∞ −∞ (E.4.1)

E.4 Miscellanea

Proof. [Fel71, Theorem XVII.2.1] [Kal97, Corollary 13.8]

573



Theorem E.4.4 Let {Xn , n ∈ N} be a sequence of L´evy processes such that Xn (1) converges in distribution to X0 (1). Then there exists a sequence ˜ n has the same distribution as Xn and for all t > 0, ˜ n , n ≥ 1} such that X {X as n → ∞, P ˜ n (s) − X0 (s)| −→ sup |X 0.

0≤s≤t

Consequently, Xn converges weakly in the uniform topology to X0 . 

Proof. [Kal97, Theorem 13.17]

E.4.2 Gaussian processes Theorem E.4.5 Let {Wt , t ∈ T } be a centered Gaussian process defined on a probability space (Ω, F, P). Let Z be a jointly Gaussian random variable ˜ be the probability measure on F with zero mean and variance σ 2 > 0. Let P Z−σ 2 /2 with respect to P, i.e. with density e 2 ˜ P(A) = E[eZ−σ /2 1A ] .

. by Define the process W .t = cov(Z, Wt ) + Wt , t ∈ T . W ˜ is the law of W . under P. Then the law of W under P Proof. For t ∈ T , write ρt = σ −2 cov(Z, Wt ) and Zt = Wt −ρt Z. For t1 , . . . , tn , (Zt1 , . . . , Ztn ) is a centered Gaussian vector, independent of Z under P. (Note that Z = ci Wti for one (or several) ti ∈ T and ci = 0 is not ruled out; in that case Zti = 0 and this causes no difficulty.) Thus, conditioning on ˜ is that of σ 2 + Z (Zt1 , . . . , Ztn ) and noting that the distribution of Z under P under P, we obtain ˜ E[H(Z, Wt1 , . . . , Wtn )] = E[H(Z, Zt1 + ρt1 Z, . . . , Ztn + ρtn Z)eZ−σ

2

/2

]

= E[H(σ + Z, Zt1 + ρt1 (σ + Z), . . . , Ztn + ρtn (σ 2 + Z))] 2

2

= E[H(σ 2 + Z, ρt1 σ 2 + Wt1 , . . . , ρtn σ 2 + Wtn )]. Since cov(Z, Wt ) = ρt σ 2 by definition of ρt , this proves our claim.



574

E Martingales, Central Limit Theorems, Mixing, Miscellanea

Hypercontractivity Let Pt , t ≥ 0, be the Ornstein-Uhlenbeck kernel. That is, for a function f such that E[|f (ξ)|] < ∞ with ξ a standard Gaussian random variable, for all x ∈ R,  Pt f (x) = E[f (e−t x + 1 − e−2t ξ)] . Theorem E.4.6 Let p > 1. If f is a measurable function, then for all q and t such that q ≤ 1 + e2t (p − 1), E1/q [|Pt f (ξ)|q ] ≤ E1/p [|f (ξ)|p ]. Proof. [Bak94, Theorem 1.2].



This property is called hypercontractivity. It is applied in Section 16.2.3 under the following form. Corollary E.4.7 Let (ξ, ζ) be a Gaussian random vector with standard marginals and correlation ρ. If p ∈ (1, 2), 1 + ρ2 ≤ p and f is a measurable function such that E[|f (ξ)|] < ∞, then E[E2 [f (ζ) | ξ]] ≤ (E[|f (ξ)|p ])2/p . Proof. If ρ = 0, then ξ and ζ are independent and E[f (ζ) | ξ] = E[f (ζ)] so there is nothing to prove. If ρ ∈ (0, 1), let t = − log ρ. Then E[f (ζ) | ξ] = Pt f (ξ). We conclude by applying Theorem E.4.6 with q = 2. If ρ ∈ (−1, 0), replace ξ by −ξ. 

E.4.3 Random walks Let {Zn , n ≥ 1} be i.i.d. random variables with negative expectation m ∈ [−∞, 0). This entails that E[(Z0 )+ ] < ∞. Define the random walk by S0 and Sn = S0 + Z1 + · · · + Zn , n ≥ 1. We write Px for the distribution of the random walk started from S0 = x. The natural filtration of the random walk is denoted by F S = {FnS , n ≥ 0} Theorem E.4.8 Let τ be an integer-valued random variable such that E[τ ] < ∞. Assume that either τ is an F S -stopping time or τ is independent of F S . Then Ex [Sτ ] = x + E[Z1 ]E[τ ] . Proof. [Gut88, Theorem I.5.3] for the first case, any probability textbook for the second one. 

E.4 Miscellanea

575

Define T0 = inf{n ≥ 1 : Sn ≤ 0}. Theorem E.4.9 For all x ≥ 0 and r ≥ 1, Ex [T0r ] < ∞ ⇐⇒ E[(Z0 )r+ ] < ∞ . 

Proof. [Gut88, Theorem III.3.1]

E.4.4 More inequalities Theorem E.4.10 Let {(Xi , Yi ), i ∈ N} be an adapted sequence of nonnegative random variables defined on a filtered probability space (Ω, F, {Fn , n ∈ N}, P) such that for all i ≥ 1 and t ≥ 0, P(Xi > t | Fi−1 ) ≤ P(Yi > t | Fi−1 ) .

(E.4.2)

Then, for every q ≥ 1 there exists a constant Cq such that for all n ≥ 1, E[(X1 + · · · + Xn )q ] ≤ Cq E[(Y1 + · · · + Yn )q ] .

(E.4.3) 

Proof. [KW92, Theorem 5.2.1]

Theorem E.4.11 Let (bi , b i ), i = 1, . . . , n be independent bivariate pairs such that for each i, bi and b i are Bernoulli random variables with mean pi and p i , respectively, and bi b i = 0. Then ⎡  n  n  2  n 2 ⎤ n     E⎣ (bi − pi ) (bi − pi ) ⎦ ≤ 3 pi p i . i=1

i=1

i=1

i=1

Proof. [Bil68, Equation (13.18)] for the case where pi = p and p i = p for all i and [KS13, Lemma C.2] for the general case. 

E.4.5 Dini’s theorem Theorem E.4.12 (Dini’s Theorem) Let fn be a sequence of monotone functions defined on R, converging pointwise to a continuous limit f . Then the convergence is uniform on compact subsets of R. If limn→∞ {lims→∞ fn (s)} = lims→∞ f (s) ∈ (0, ∞), then the convergence is uniform on intervals [a, ∞). If fn are random functions and the convergence holds pointwise in probability to the deterministic function f , then the convergence holds uniformly in probability, that is, for all η > 0,  lim P sup |fn (s) − f (s)| > η = 0 . n→∞

a≤s≤b

576

E Martingales, Central Limit Theorems, Mixing, Miscellanea

Proof. It is sufficient to prove the statement in probability. Assume, for instance, that all the functions under consideration are non-decreasing. Fix η > 0 and let a = s0 < s1 · · · < sK = b be such that f (sj ) − f (sj−1 ) ≤ η/2 for j = 1, . . . , K. If f has a finite limit at ∞, it is possible to choose b = ∞. For s ∈ [sj , sj+1 ), we have fn (s) − f (s) ≤ fn (sj+1 ) − f (sj+1 ) + f (sj+1 ) − f (s) ≤ fn (sj+1 ) − f (sj+1 ) + η/2 ≤ max |fn (sk ) − f (sk )| + η/2 , 0≤k≤K

fn (s) − f (s) ≥ fn (sj+1 ) − f (sj+1 ) + f (sj+1 ) − f (s) ≥ fn (sj+1 ) − f (sj+1 ) − η/2 ≥ − max |fn (sk ) − f (sk )| − η/2 . 0≤k≤K

This yields sup |fn (s) − f (s)| ≤ max |fn (sj ) − f (sj )| + η/2 . 0≤j≤K

s≥a

P

Since by assumption fn (sj ) −→ f (sj ) for j = 0, . . . , K (including the case sK = ∞ when the second assumption holds), this yields  lim sup P sup |fn (s) − f (s)| > η n→∞

a≤s≤b

 ≤ lim sup P n→∞

max |fn (sj ) − f (sj )| > η/2

0≤j≤K

Since η is arbitrary, this proves our claim.

=0. 

If the limit is not continuous, then local uniform convergence holds at continuity points. Lemma E.4.13 Let fn be a sequence of monotone functions on a compact interval I with non-empty interior, which converge pointwise to a function f . Then the convergence is locally uniform at continuity points of f . Proof. Let I = [a, b] with a < b. Fix a continuity point x ∈ I and a sequence {xn , n ≥ 1} which converges to x. Fix  > 0. By [Bil99, Lemma 12.1], there exists points a = t0 < t1 < · · · < tK = b such that supti ≤x,y 0, E[X s ] =



2−n−1 2ns =

n≥0



 2(s−1)n−1 =

n≥0

1 2−2s



if s < 1 , if s ≥ 1 .

For x > 0, let nx be the smallest integer such that 2nx > x. Then 2nx −1 ≤ x < 2nx and  xP(X > x) = x 2−n−1 = x2−nx . n≥nx

This quantity oscillates in [1/2, 1) and does not have a limit at ∞. Indeed, take for instance x = xn = 2n/2 , then for n even, xP(X > x) = 1/2 and for n odd, xP(X > x) = 1/2. 1.2 x γ,δ,c (x) = cγ logγ−1 (x) cos(logδ (x)) − cδ logγ+δ−1 (x) sin(logδ (x)) . γ,δ,c (x) Thus limx→∞ x γ,δ,c (x)/γ,δ,c (x) = 0 and this proves that  is slowly varying by Corollary 1.1.6. However, if γ1 = γ2 , the ratio γ1 ,δ,c (x) = exp{c(logγ1 (x) − logγ2 (x)) cos(logδ (x))} γ2 ,δ,c (x) does not have a limit at ±∞. 1.3 The tail of X can be written as P(X > x) = (x(x))−1 with  slowly varying. Let # be the de Bruijn conjugate of  (see Proposition 1.1.8). Then © Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4

579

580

F Solutions to problems

# an = n# (n) and na−1 n = 1/ (n) is slowly varying. By Proposition 1.4.6, the function L is slowly varying, and L(an t) = L(n# (n)t). Thus the second statement follows by Proposition 1.1.9.

1.4 The integrability holds. Indeed, f (x)f (1+x) = L(x)L(1+x)x−p (1+x)−p with L a slowly varying function. As x → ∞, f (x)f (1+x) ∼ L(x)L(1+x)x−2p and 2p > 1. Fix  > 0. Then, -∞  −1 f (x)f (x + y)dx f (xy)f ((1 + x)y) 0 = I dx + J , +  2 yf (y) f 2 (y)  with, applying Proposition 1.4.2 with η > 0 such that p + η < 1 and p − η > 1/2,     |f (xy)f ((1 + x)y)| dx ≤ cst x−p−η dx = O(1−p−η ) = o(1) , I ≤ 2 (y) f 0 ∞ 0 ∞ |f (xy)f ((1 + x)y)| dx ≤ cst J ≤ x−p+η (1 + x)−p+η dx 2 (y) f −1 −1   = O(2p−1−2η ) = o(1) . By the Uniform Convergence Theorem 1.1.2,  lim

y→∞



−1

f (xy)f ((1 + x)y) dx = f 2 (y)



−1

x−p (1 + x)−p dx .



Letting  → 0 proves the first statement. To prove the second statement, define f (t) = a[t] for t ≥ 0. By Proposition 1.2.2, f is regularly varying with index −p and  ∞ ∞  ai ai+j = f (x)f (x + j)dx . i=0

0

Applying the first part proves the second statement. 1.5 By Proposition 1.1.4 there exist increasing functions h1 and h2 such that h1 ≤ g ≤ h2 and limx→∞ h1 (x)/h2 (x) = 1. Therefore, it suffices to prove the result in the case where the function g is increasing. Let g ← be its leftcontinuous inverse which is regularly varying at infinity with index 1/γ by Proposition 1.1.8. Since P(g(X) > x) = P(X > g ← (x)) and the tail function y → P(X > y) is regularly varying with index −α, we obtain that the function x → P(g(X) > x) is regularly varying with index −α/γ by Proposition 1.1.9. 1.6 Conditioning on Y yields P(XY > z) = E[P(X > z/Y | Y )] = E[((z/Y ) ∨ 1)−α ] = E[(Y /z)α ∧ 1] .

F Solutions to problems

581

1.7 Since X and Y are i.i.d. P(X ∧ Y > x) = P2 (X > x) . By Proposition 1.1.9, the function x → P2 (X > x) is regularly varying at infinity with index −2α. This proves that X ∧ Y is regularly varying with tail index 2α. 1.8 By integration by parts, we have E[Z β 1{yZ ≤ x}] P(yZ > x) = −y −β + y −β β x P(Z > x) P(Z > x)



1

βz β−1 0

P((y/z)Z > x) dz . P(Z > x)

Applying Potter’s bound (1.4.3) with  such that β > α +  yields, for all x ≥ 1 and y > 0, E[Z β 1{yZ ≤ x}] xβ P(Z > x)   α−β  − = cst y (y ∨ y ) +

1

βz

β−α−1

−

((y/z) ∨ (y/z) 

 )dz

0

≤ cst y α−β (y  ∨ y − ) . Conditioning on A yields the required bounds. The second bound is valid for all x ≥ 0 since the left-hand side is bounded away from zero for x ∈ [0, 1]. 1.9 By Breiman’s Lemma 1.4.3, XY is regularly varying with tail index α, thus by Proposition 1.4.6, if β > α, E[(XY )β 1{XY ≤ x}] E[(XY )β 1{XY ≤ x}] P(XY > x) = lim β x→∞ x→∞ x P(X > x) xβ P(XY > x) P(X > x) βE[Y α ] . = β−α lim

1.10 Define the functions G+ and G− on R+ by G+ (x) = E[Z1{Z > x}] , G− (x) = −E[Z1{Z < −x}] . Set c+ = E[Z+ ] and c− = E[Z− ]. Then c− ≥ c+ since E[Z] ≤ 0 by assumption. G+ : [0, ∞) → (0, E[Z+ ]] , G− : [0, ∞) → (0, E[Z− ]] . The functions G+ and G− are continuous, non-increasing and G+ (0) = c+ , G− (0) = c− . Since c+ ≤ c− , it is possible to define the function h on R+ by h = G← − ◦ G+ .

582

F Solutions to problems

Since G− is continuous, we have G− ◦ G← − (y) = y for all y ∈ (0, c− ] therefore G− ◦ h(x) = G+ (x), i.e. (1.5.2a) holds. Indeed, E[Z1{−h(x) ≤ Z ≤ x}] = E[Z1{Z ≤ x}] − E[Z1{Z < −h(x)}] = E[Z] − G+ (x) + G− (h(x)) = E[Z] − G+ (x) + G+ (x) = E[Z] . Moreover, by Proposition 1.1.8, limx→∞ x−1 G← − ◦ G− (x) = 1. By assumption on the distribution of Z, we have lim

x→∞

p G+ (x) = ∈ (0, ∞) . G− (x) 1−p

Thus G← ◦ G+ (x) lim − = x→∞ x



p 1−p

1/α

.

This yields (1.5.2b). 1.11 By Fubini theorem,  ∞ E[logp+ (X/x)] P(X > t) dt =p logp−1 (t/x) P(X > x) t F (x) x  ∞ P(X > xt) dt logp−1 (t) =p P(X > x) t  ∞ 1 ∞ p−1 −α dt =α →p log (t)t logp (t)t−α−1 dt . t 1 1 The pointwise convergence is obtained inside the integral by regular variation and the dominated convergence theorem is justified by an appeal to Proposition 1.4.2. The change of variables tα = eu yields   ∞ p −α−1 −p log (t)t dt = α α 1



up e−u du = α−p Γ (p + 1).

0

1.12 Apply Theorem 1.1.7 with β = 1. 1.13 Assume without loss of generality that 0 < x < y. Then, applying the convergence (1.3.7), we have n (E[Z0 1{|Z0 | ≤ an y}] − E[Z0 1{Z0 | ≤ an x}]) an   = u nF (an du) → u ν1,pZ (du) = (2pZ − 1) log(y/x) . x x}]. Then, by Markov inequality and (1.4.7b)   n  Vi > x ≤ nx−1 E[|V0 |] ≤ 2nx−1 E[|X0 |1{|X0 | > x}] P i=1

≤ cst n P(|X1 | > x) . Let q > α. By Rosenthal inequality (E.1.4) and (1.4.7a)  n   P Ui > x i=1

≤ cst nq/2 x−q (E[U02 ])q/2 + cst nx−q E[|U0 |q ] ≤ cst nq/2 x−q (E[X02 1{|X0 | ≤ x}])q/2 + cst nx−q E[|X0 |q 1{|X0 | ≤ x}] ≤ cst nq/2 x−q (E[X02 1{|X0 | ≤ x}])q/2 + cst nP(|X1 | > x) . If α > 2, this yields the last bound in (1.5.3). If α < 2, taking q = 2 yields the first bound in (1.5.3). If α = 2, then E[X12 1{|X1 | ≤ x}] is slowly varying and this yields the middle bound in (1.5.3). 1.15 [AA10, Lemma X.1.8] 1.16 Assume first that f is ultimately non-decreasing. Then, for large enough a < b, we have F (bx) − F (ax) xf (bx) xf (ax) ≤ ≤ . F (x) (b − a)F (x) F (x) Taking a = 1 and b = 1 +  yields lim sup x→∞

(1 + )α − 1 xf (x) ≤ ≤ α(1 + )α . F (x) 

Since  is arbitrary, we obtain that lim supx→∞ xf (x)/F (x) ≤ α. The lower bound is obtained by taking b = 1 and a = 1 − . If f is ultimately nonincreasing, the inequalities are reversed. If α = 0, then αtα f (tx) 1 txf (tx) F (tx) F (x) = lim = = tα−1 . x→∞ f (x) x→∞ t F (tx) F (x) xf (x) αt lim

Thus f ∈ RV∞ (α − 1). 1.17 Conditioning on Y yields, for x ≥ 1,

584

F Solutions to problems





P(X > x/y)βy −β−1 dy  x  ∞ −α −β−1 = (x/y) βy dy + βy −β−1 dy

1 − F (x) =

1

1

x

1 − xα−β βx−α − αx−β + x−β = . = βx−α β−α β−α 1.18 Conditioning on Y yields, for x ≥ 1,  ∞ 1 − F (x) = P(X > x/y)αy −α−1 dy 1  x  ∞ −α −α−1 = (x/y) αy dy + αy −α−1 dy = αx−α log(x) + x−α . 1

x

1.19 Let F be the common distribution function of X and Y . Conditioning on Y yields, for x > 0,  ∞ P(X > x/y) P(XY > x) = F (dy) . P(X > x) P(X > x) 0 By Fatou’s lemma, lim inf x→∞

P(XY > x) ≥ P(X > x)





P(X > x/y) F (dy) x→∞ P(X > x) lim

0 ∞

=

y α F (dy) = E[Y α ] = ∞ .

0

1.20 By integration by parts, we have for all s, x > 0,  ∞ Lf (s/x) f (tx) = dt . se−st f (x) f (x) 0 If f is regularly varying at infinity with index α, as x → ∞, the integrand converges to se−st tα , and  ∞ s e−st tα dt = s−α Γ (1 + α) , 0

so we only need a bounded convergence argument to prove (1.5.5). Since f is non-decreasing, if t ≤ 1, then f (tx)/f (x) ≤ 1. If t > 1, by Proposition 1.4.2, for large enough x, f (tx)/f (x) ≤ 2tα+1 . Thus, for all t ≥ 0, and large enough x, we have f (tx) ≤ 2(t ∨ 1)α+1 . f (x)

F Solutions to problems

585

-∞

Since 0 e−st (t ∨ 1)α+1 dt < ∞ for all s > 0, we can apply the dominated convergence Theorem and (1.5.5) holds, which implies that Lf is regularly varying at 0 with index −α. Conversely, if Lf is regularly varying at 0 with index −α, then lim

x→∞

Lf (s/x) = s−α = Lg(s) Lf (1/x)

with g(x) = xα /Γ (1 + α). The function s → Lf (s/x)/Lf (1/x) is the Laplace transform of the non-decreasing function gx defined by gx (y) =

f (xy) . Lf (1/x)

Thus, Lgx converges to Lg as x → ∞ and thus gx converges pointwise to g by the continuity theorem for Laplace transform. -x 1.21 Denote F¯ = 1 − F and U (x) = 0 F¯ (t) dt. By integration by parts, we obtain that 1 − LF (t) = tLU (t). Thus, by Problem 1.20, 1 − LF is regularly varying at 0 with index α if and only if U is regularly varying at ∞ with index 1 − α. By the monotone density theorem, this is equivalent to F being regularly varying at infinity with index −α.

Chapter 2   2.1 If X is regularly varying, then R is regularly varying since R = A−1 X  and |U | = 1. If R is regularly varying with tail index α > 0, then R |AU | is also regularly varying by Breiman’s Lemma 1.4.3 and P(R |AU | > x) ∼ α −1 E[|AU | ]P(R > x). Set V = |AU | AU . Then, for t > 0 and C ⊂ Sd−1 such α that E[|AU | 1{V ∈ ∂C}] = 0, / 0 P(R |AU | > tx | U ) P(R |AU | > tx, V ∈ C) =E 1{V ∈ C} P(R > x) P(R > x) → t−α E [|AU | 1{V ∈ C}] . α

A dominated convergence argument is obtained by Potter’s bound (1.4.3). This proves that X is regularly varying and its spectral measure Λ is defined by α

Λ=

E [|AU | δV ] . α E [|AU | ]

2.2 For every s, t > 0, (s−1 Y ) ∧ (t−1 Z) has tail index 2α (see Problem 1.7), thus by Breiman’ Lemma 1.4.3,

586

F Solutions to problems

P(XY > sx, XZ > tx) P(X(s−1 Y ∧ t−1 Z) > x) = lim x→∞ x→∞ P(X > x) P(X > x) lim

= E[(s−1 Y ∧ t−1 Z)α ] . By Problem 1.19, we know that the tail of XY is heavier than the tail of X, so XY and XZ are extremally independent and the exponent measure of the vector (XY, XZ) is concentrated on the axes. c

2.3 Write ς = ν X (B (0, 1)). By Proposition 2.2.3, we have ν X ([u, v]c )   =ς



1{rλ ∈ [u, v]c }αr−α−1 dr Λ(dλ)    ∞  d 1 ui vi =ς 1 r> ( 1{λi < 0} + 1{λi > 0}) αr−α−1 dr Λ(dλ) λi λi Sd−1 0 i=1   d −α ui vi =ς 1{λi 0} Λ(dλ) . λi Sd−1 i=1 λi Sd−1

0

Similarly,  ν X ([v, ∞)) = ς

2.4 Define V = |AΘZ |

−1

Sd−1

d 1

(vi−α (λi )α + )Λ(dλ) .

i=1

AΘZ . Then α

ΛX (B) =

E[|AΘZ | 1{V ∈ B}] . α E[|AΘZ | ]

2.5 Let T1 , T2 be the maps defined by (2.2.2), with norms | · |1 , | · |2 , respectively. Likewise, define ς1 , ς2 via (2.2.3). Let ν X be the exponent measure. Thus, ν X ◦ Ti−1 = ςi να ⊗ Λi , i = 1, 2. Define a map H : S1 → S2 by H(λ) = λ/|λ|2 . It is a bijection and H −1 : S2 → S1 becomes H −1 (θ) = θ/|θ|1 . Thus using (2.2.4) we have

F Solutions to problems

587

 f (λ)Λ1 (dλ)    u −1 = ς1 f (λ)ν X ∈ dλ u : |u|1 > 1, |u|1 S1   γ = ς1−1 f (λ)ν X ◦ T2−1 (r, γ) ∈ R+ × S2 : |rγ|1 > 1, ∈ dλ |γ|1 S1   γ = ς2 ς1−1 f (λ)να × Λ2 (r, γ) ∈ R+ × S2 : |rγ|1 > 1, ∈ dλ |γ| 1 S1   γ dθ = ς2 ς1−1 f (θ/|θ|1 )να × Λ2 (r, γ) ∈ R+ × S2 : |rγ|1 > 1, ∈ |γ| |θ| 1 1 S2 = ς2 ς1−1 f ◦ H −1 (θ)|θ|α 1 Λ2 (dθ) . S1

S2

2.6 1. By assumption, we have, for t ≥ 1, lim P(|X 1 | > tx | |X 1 | > x) = μ({x ∈ Rdh : |x1 | > t}) = t−α .

x→∞

Thus, for t ∈ (0, 1), lim

x→∞

P(|X 1 | > tx) 1 = lim x→∞ P(|X 1 | > x | |X 1 | > tx) P(|X 1 | > x) 1 1 = −1 −α = t−α . = lim y→∞ P(|X 1 | > t−1 y | |X 1 | > y) (t )

2. To prove that {νx , x ≥ 1} is relatively compact, we use Lemma B.1.29. We h choose a norm |·|d on Rd and define on Rdh the norm |x| = i=1 |xi |d . Set Un = B c (0, e−n ). Then {Un , n ≥ 1} is a localizing sequence and U n \ Un−1 is compact for all n ∈ Z. Furthermore, if x ∈ Un , there exists i ∈ {1, . . . , h} such that |xi | > e−n /h. Therefore, νx (Un ) ≥

hP(|X 1 |d > xe−n /η) . P(|X 1 |d > x)

This implies that supn≥1 μx (Un ) < ∞ and also lim lim sup νx (Un ) = 0 .

n→∞ x→−∞

Since νx (Un ) converges to zero when n → −∞ locally uniformly with respect to x, this proves conditions (B.1.7a) of Lemma B.1.29. 3. By Lemma B.1.31, we know that the class F of bounded continuous functions f on Rdh such that • either there exists u > 0 such that f (x1 , . . . , xh ) = 0 if |x1 |d ≤ u;

588

F Solutions to problems

• or f (x1 , . . . , xu ) = f (0, x2 , . . . , xu ) for all x ∈ Rdh , is measure determining. We will prove that limx→∞ νx (f ) exists for all f ∈ F. Assume first that f is of the first kind. Then the assumption yields E[f (x−1 X 1 , . . . , x−1 X h )] x→∞ P(|X 1 |d > x)

lim νx (f ) = lim

x→∞

E[f ((ux)−1 uX 1 , . . . , (ux)−1 uX h )] P(|X 1 |d > ux) x→∞ P(|X 1 |d > ux) P(|X 1 |d > x) /  0 P(|X 1 |d > ux) uX 1 uX h = lim E f ,..., | |X 1 |d > ux x→∞ ux ux P(|X 1 |d > x)

= lim

= u−α μ(fu ) , as x → ∞, with fu (x) = u−α f (ux). For f of the second kind, we have, by the stationarity assumption, νx (f ) =

E[f (0, x−1 X 1 , . . . , x−1 X h−1 )] E[f (0, x−1 X 2 , . . . , x−1 X h )] = . P(|X 1 |d > x) P(|X 1 |d > x)

For h = 1, the convergence holds by the fact that X 1 is regularly varying, thus the proof can be concluded by an induction argument. 4. Since {νx , x > 0} is relatively compact and may only have one limit since the class F is measure determining, we conclude that νx converges vaguely# and its limit is the exponent measure of X.

Chapter 3 3.1 The unit sphere for the 2 norm is parameterized by u → (cos u, sin u), ˜ to a measure on [−π, π], we obtain thus, identifying Λ  ∞  π −α−1 ˜ ˜ ˜ X (A) = α ˜s 1A (s cos u, s sin u) Λ(du) . ν 0

−π

Choosing A = (1, ∞)2 yields the condition  ∞  π ˜ ˜ ˜ X (A) = ν 1{s cos u > 1, s sin u > 1}Λ(du) α ˜ s−α−1 ds 0 −π   ∞  π/2   −1 ˜ ˜ = 1 s > (cos u ∧ sin u) ds Λ(du) α ˜ s−α−1 0



0 π/2

˜ (cos u ∧ sin u)α˜ Λ(du) 1}Λ 1

[0,1]



[0,1]

[0,1)



˜ 1 (du) + uα˜ Λ

=

 ˜ ˜ 1{su > 1}Λ2 (du) α ˜ s−α−1 ds

˜ 2 (du) < ∞ . uα˜ Λ [0,1)

This is a necessary condition for a measure on Sd−1 (for the supnorm) to be a hidden spectral measure. For a vector with independent component, we ˜ X ((x, ∞) × (y, ∞)) = x−α y −α . Thus for A ⊂ Sd−1 ∩ (R+ )2 , know that ν  ˜ 1{(1, y/x) ∈ A}αx−α−1 αy −α−1 dxdy Λ(A) = 2 (x,y)∈(1,∞) x>y

 + 

(x,y)∈(1,∞)2 x≤y

1{(x/y, 1) ∈ A}αx−α−1 αy −α−1 dxdy

1{(1, u) ∈ A}αu

= [0,1)

−α−1

 du +

1{(u, 1) ∈ A}αu−α−1 du .

[0,1]

The necessary condition above holds in this case since α ˜ = 2α. 3.3 1. First, (X0 , X1 ) is regularly varying with index α ∧ β and the exponent measure concentrated on both axes (if α = β) or one of the axes (if α = β). Set c˜n = n1/(α+β) . Then nP(X0 > c˜n u, X1 > c˜n v) = nn−α/(α+β) u−α n−β/(α+β) v −β = u−α v −β .

590

F Solutions to problems

2. It is obvious since for u > 0, P(X0 > xu, X1 ≤ v) = u−α FX1 (v) . P(X0 > x) 3. Since b(x) ≡ 1, we have by Proposition 1.4.6 E[X0α+δ 1{X0 ≤ x}]E[Y α+δ ] = αδ −1 δ . x→∞ xα+δ P(X0 > x) lim

Letting  → 0 we conclude (3.2.16). 4. Note first that (3.2.16) cannot hold since it would contradict the tail behavior of X0 X1 in Problem 1.18. We have for  ∈ (0, 1) and x such that x > 1,  x P(X0 1{X0 ≤ x}X1 > x) α = αx P(uX1 > x)u−α−1 du P(X0 > x) 1  x uα u−α−1 du = α log(x) . =α 1

The latter expression converges to ∞ when x → ∞. 3.4 1. Note first that ecE0 is has a Pareto distribution with tail index α = 1/c and by Breiman’s Lemma 1.4.3, P(X0 > x) ∼ α(α − q)−1 x−α . Thus we can apply Example 3.1.7 to prove extremal independence. 2. By Example 3.1.7, the hidden tail index is α ˜ = α(2 − a). 3. Write Zi = ecEi , i = 0, 1, 2. Set κ = q. Conditioning on Z1 , we have, for x > 1, u0 > 1 and u0 > 0, α xα P(Z0q Z1 > xu0 , Z1q Z2 > xκ u1 ) P(X0 > u0 x, X1 > xκ u1 | X0 > x) ∼ α−q  ∞ α α x [(xu0 z −1 )−α/q ∧ 1][(xκ u1 z −q )−α ∧ 1]αz −α−1 dz ∼ α−q 1  ∞ α ∼ [(u0 z −1 )−α/q ∧ 1][(u1 z −q )−α ∧ 1]αz −α−1 dz α − q 1/x  ∞ α → [(u0 z −1 )−α/q ∧ 1][(u1 z −q )−α ∧ 1]αz −α−1 dz . α−q 0 4. X0 X1 = ecqE0 +(1+q)cE1 +cE2 . The heaviest tail corresponds to the term (1 + q)cE1 , thus the tail index of X0 X1 is 1/{(1 + q)c} and by Breiman’s Lemma 1.4.3, P(X0 X1 > x) ∼

1 1 −α/(1+q) = q −2 (1 + q)2 x−α/(1+q) . q x 1 1 − 1+q 1 − 1+q

This could have been obtained by Proposition 3.2.12 but not by Proposition 3.1.8.

F Solutions to problems

591

3.5 The assumption implies that X0 is regularly varying with tail index α. Thus lim

x→∞

P(X0 > tx, X1 ≤ yb(tx)) P(X0 > tx, X1 ≤ yb(tx)) P(X0 > tx) = lim x→∞ P(X0 > x) P(X0 > tx) P(X0 > x) −α = t G(y) .

This implies that for all t > 0, lim

x→∞

P(X0 > x, X1 ≤ yb(tx)) =1. P(X0 > x, X1 ≤ yb(x))

Assume that lim supx→∞ b(tx)/b(x) > 1 for one fixed t > 0. Then, along a subsequence, for some  > 0, P(X0 > x, X1 ≤ (1 + )yb(x)) G(y(1 + )) P(X0 > x, X1 ≤ yb(tx)) ≥ = . P(X0 > x, X1 ≤ yb(x)) P(X0 > x, X1 ≤ yb(x)) G(y) Since G is proper, y can be chosen in such a way that G(y(1 + )) > G(y). This leads to a contradiction. Similarly, if lim inf x→∞ b(tx)/b(x) < 1, then there exists  > 0 such that along a subsequence P(X0 > x, X1 ≤ (1 − )yb(x)) G(y(1 − )) P(X0 > x, X1 ≤ yb(tx)) ≤ = . P(X0 > x, X1 ≤ yb(x)) P(X0 > x, X1 ≤ yb(x)) G(y) Again, y can be chosen in such a way that G(y(1 − )) < G(y). We conclude that b is slowly varying. 3.6 1. Note first that −1

P(eα

Z0

> x) = P(Z0 > α log x) = e−α log x = x−α ,

and for  > 0 such that a(α + ) < α, / ∞

α+ 0   α−1 ∞ aj Z−j j=1 E e = j=1

1 1−

α+ j α a

.

Therefore, by Breiman’s Lemma 1.4.3, X0 is regularly varying with index α and ∞ 3 2 ∞ j  " aα # 1 a Z−j −α −α −α j=1 =x P(X0 > x) ∼ x E X−1 = x E e . 1 − aj j=1 −1

h−1

To prove the extremal independence, set Wh = eα has a Pareto tail with index α thus

h P(X0 > x, Xh > x) = P X0 > x, X0a Wh > x

j=0

aj Zh−j

. Then Wh

∼ cst × x−α E[1{X0 > x}X0αa ] = o(x−α ) . h

592

F Solutions to problems

2. For h ≥ 0 and y ≥ 0,

h h P(Xh ≤ xa y | X0 > x) = P Wh (x−1 X0 )a ≤ y | X0 > x % $ αa−h Wh ∧1 . =1−E y This proves that κh = ah and the limiting distribution G of Xh given X0 > x is given by $ % αa−h Wh ∗ ∧1 , G(y) = P(Wh Z ≤ y) = 1 − E y where Z ∗ is a Pareto random variable with tail index αa−h . The measure μ is given by $ % αa−h Wh −α −α μ((s, ∞) × (−∞, y]) = s − E ∧s . y 3. Since Wh has a Pareto tail with index α and is independent of X0 , we have h

P(X0 > xy, Xh > xz) = P(X0 > xy, X0a Wh > xz) ∼ cst × z −α x−α E[1{X0 > xy}X0αa ] h

∼ cst × (yz)−α x−α(2−a

h

)

.

Thus the index of hidden regular variation is α(2 − ah ). 4. We can compute directly the tail behavior of X0 Xh . By Breiman’s Lemma 1.4.3, we obtain

h P(X0 Xh > x) = P Wh X01+a > x −1 h−1  α h  aj − α ∼ x 1+ah E[Wh1+κ ] = x−α/(1+a ) . 1− 1 + ah j=0 We can also obtain this result by applying Proposition 3.2.12; this requires to α /(1+ah ) ] < ∞. check condition (3.2.16). Fix α ∈ (α, α(1 + ah )). Then E[Wh For  > 0, by Markov inequality, Potter’s bounds (Proposition 1.4.2) and Proposition 1.4.6, we have

F Solutions to problems

593

h

P(X0 1{X0 ≤ x}Xh > x1+a y) h

h

= P(X01+a 1{X0 ≤ x}Wh > x1+a y) / 0 α  h   ≤ y −α /(1+a ) E W 1+ah x−α E[X0α 1{X0 ≤ x}] / 0 α  h  ≤ cst × y −α /(1+a ) E W 1+ah α P(X0 > x) 



≤ cst × y −α /(1+a ) α −α P(X0 > x) . h

This proves that (3.2.16) holds. 3.7 By 3.2.9, (3.3.1) applied with g(x, y) = y q 1{x > 1} implies - ∞Proposition qκ−α ds < ∞. Thus qκ < α and by regular variation of b, that 1 s / q 0 Y 1 E q 1{X>xs} lim x→∞ P(X > x) b (x) / 0 Yq αE[W1q ] qκ−α P(X > xs) bq (xs) E | X > xs = s . = lim x→∞ P(X > x) bq (x) bq (xs) α − qκ 3.8 1. Since κ < 1, X1 is tail equivalent to Y0 by Lemma 1.3.2. For s, t > 0, P(X0 > sx, X1 > tx) = P(X0 > sx, X0κ + Y0 > tx) ≤ P(X0 > sx, Y0 > tx/2) + P(X0 > sx, X0κ > tx/2) = O(P(X0 > x)2 ) + O(P(X0 > x1/κ )) . This proves joint regular variation with extremal independence. 2. For u0 , u1 > 0, P(X0 > xu0 ,X0κ + Y0 > xu1 ) ∼ P(X0 > xu0 , Y0 > xu1 ) + P(X0 > xu0 , X0κ > xu1 ) −α/κ

−α 1/κ ∼ P(X0 > x)2 u−α )u1 0 u1 + P(X0 > x

.

If κ < 1/2, the index of hidden regular variation is 2α and if κ ≥ 1/2, then the index of hidden regular variation is α/κ. 3. For u0 > 1 and , u1 > 0, P(X0 > xu0 , X0κ + Y0 > u1 xκ | X0 > x)  ∞ F (xdv) P(Y0 > (u1 − v κ )xκ ) = P(X > x) u0  ∞ F (xdv) = (1{v κ > u1 } + P(Y0 > xκ (u1 − v κ ))1{u1 > v κ }) P(X > x) u  0∞ 1/κ αv −α−1 dv = (u0 ∨ u1 )−α . → 1/κ

u0 ∨u1

594

F Solutions to problems

Chapter 4 4.1 1. For h ∈ Z, we can write (X0 , Xh ) =

 (ψj , ψj+h )Z−j . j∈Z

Applying Corollary 4.1.5, we obtain, P(x−1 (X0 , Xh ) ∈ ·)  = να,pZ ({u ∈ R : (ψj , ψj+h )u ∈ ·}) . x→∞ P(|Z0 | > x)

ν X = lim

j∈Z

The assumption that there exists j such that ψj ψj+h = 0 ensures that ν X is not the null measure and that X0 and Xh are extremally dependent since, for instance, P(|X0 | ∧ |Xh | > x)  = |ψj |α ∧ |ψj+h |α > 0 . lim x→∞ P(|Z0 | > x) j∈Z

2. Applying Proposition 2.1.12, we obtain that X0 Xh is regularly varying and P(|X0 Xh | > x) = ν X ({(u, v) ∈ R2 : |uv| > 1}) lim x→∞ P(|Z0 |2 > x)  = να,pZ ({u ∈ R : |ψj ψj+h |u2 > 1}) j∈Z

=



|ψj ψj+h |α/2 .

j∈Z

4.2 1. By Corollary 4.1.5, P(Xj > x) lim x→∞ P(|Z0 | > x)  = E[ν Z ({u ∈ R : Cj,i u > 1}) i∈Z

  α pZ E[|Cj,i |α 1{Cj,i > 0}] + (1 − pZ )E[|Cj,i |1{Cj,i < 0}] . = i∈Z

2. The conditions of Theorem 4.1.2 imply that if α ≤ 2, there exists  > 0  α− such that j∈Z E[C0,i ] < ∞. This yields    α− α α− cα− = E[C0,i ] α ≤ E[C0,i ] 2, the conditions of Theorem 4.1.2 imply that j∈Z E[C0,i ] < ∞. This yields    2 α α 2 c2i = E[C0,i ] ≤ E[C0,i ] x)  P(X α = cα E[C0,i ]. i = x→∞ P(Z0 > x) lim

i∈Z

i∈Z

This proves our claim. 4. Sn =

n 

Cj,j+i Z−i .

i∈Z j=1

Thus, applying Corollary 4.1.5 yields the required limit. 5. Follows from the previous limit since the ci are deterministic. 6. Obvious. 4.3 The assumptions of Theorem 4.1.2 are easily checked and this yields ∞

P(X0 > x)  = E[να/2 ({u ∈ R+ : cj Z1 · · · Zj−1 u > 1}) x→∞ P(Z 2 > x) 0 j=1 lim

=

∞ 

E[(cj Z1 . . . Zj−1 )α/2 ] = cα/2

j=1

=

∞ 

α/2

(cα/2 E[Z0

])j

j=0

cα /2 α/2

1 − cα/2 E[Z0

. ]

4.4 Since by definition limn→∞ na−1 X,n X (aX,n ) = 1 and aX,n /aZ,n → ψ1 , we have by the uniform convergence Theorem 1.1.2, n lim {E[X1{|X| ≤ aX,n }] − E[X1{|X| ≤ aZ,n }]} = (2pX − 1) log ψ1 . n→∞ aX,n Thus, (4.3.2) is equivalent to ⎛ ⎞  n ⎝ lim E[X1{|X| ≤ aZ,n }] − ψj E[Z0 1{|Z0 | ≤ aZ,n }]⎠ n→∞ aZ,n j∈Z  = −(2pZ − 1) ψj log(|ψj |) . (F.1) j∈Z

We first prove that for a fixed x > 0, E[|ψj Zj |1{|X| ≤ x}] < ∞ for all j. This is obviously true if ψj = 0 so without loss of generality we assume that ψj = 0. In that case, we write U = ψj Zj and V = X − ψj Zj . Then U and V

596

F Solutions to problems

are independent and tail equivalent with tail index 1 by Corollary 4.2.1. We have E[|U |1{|U + V | ≤ x}] = E[|U |1{−x − V ≤ U ≤ x − V }] = E[Lx (V )] , with Lx (y) = E[|U |1{−x − y ≤ U ≤ x − y}]. The function Lx is slowly varying at ±∞ by Proposition 1.4.6 and V has tail index 1 thus E[Lx (V )] < ∞. For brevity, we henceforth write an for aZ,n . Since the series {ψj , j ∈ Z} is summable, we have ⎛ ⎞   n ⎝ E[X1{|X| ≤ an }] − ψj E[Z0 1{|Z0 | ≤ an }]⎠ = ψj Jn,j , an j∈Z

j∈Z

with Jn,j =

n {E[Zj 1{|X| ≤ an }] − E[Z0 1{|Z0 | < an }]} . an

Consider j such that ψj = 0 and write X (j) = X − ψj Zj = Then X (j) is independent of Zj and

 i∈Z,i=j

ψi Zi .

E[Zj 1{|X| ≤ an }] = E[Zj 1{|ψj Zj | ≤ an (1 − )}1{|X| ≤ an }] + E[Zj 1{|ψj Zj | > an (1 − )}1{|X| ≤ an }]   ≤ E[Z0 1{|ψj Z0 | ≤ an (1 − )}1 |X (j) | ≤ an  ]   + E[Zj 1{|X| ≤ an }1{|ψj Zj | > an (1 − )}1 |X (j) | ≤ an  ] = E[Z0 1{|ψj Z0 | ≤ an (1 − )}] − E[Zj 1{|ψj Zj | ≤ an (1 − )}]P(|X (j) | > an )   + E[Zj 1{|X| ≤ an }1{|ψj Zj | > an (1 − )}1 |X (j) | ≤ an  ] . For each fixed  > 0, we have by the uniform convergence Theorem 1.1.2, lim

n→∞

n (E[Z0 1{|ψj Z0 | ≤ an (1 − )}] − E[Z0 1{Z0 | ≤ an }]) an = (2pZ − 1) log((1 − )/|ψj |).

By Proposition 1.4.6, the function x → a−1 n nE[|ψj Zj |1{|ψj Zj | ≤ an x}] is slowly varying, thus lim

n→∞

n E[|Zj |1{|ψj Zj | ≤ an (1 − )}]P(|X (j) | > an ) = 0 . an

For the last term, note that |x + y| ≤ z and y ≤ z implies |x| ≤ (1 + )z, thus

F Solutions to problems

597

  E[|Zj |1{|X| ≤ an }1{|ψj Zj | > an (1 − )}1 |X (j) | ≤ an  ] ≤ E[|Zj |1{an (1 − ) < |ψj Zj | ≤ an (1 + )}] . This yields lim sup n→∞

  n E[|Zj |1{|X| ≤ an }1{|ψj Zj | > an (1 − )}1 |X (j) | ≤ an  ] an  1+ ≤ log 1−

.

Altogether, we have proved that for each j such that ψj = 0 and for each  > 0, lim sup n→∞

n |E[Zj 1{|X| ≤ an }] + (2pZ − 1) log(|ψj |)| an ≤ log(1 + ) − 2 log(1 − ) .

Since  is arbitrary, we have proved that if ψj = 0, then lim Jn,j = − log ψj .

n→∞

In order to conclude, we must use a dominated convergence argument, for which we need to bound |ψj Jn,j | by a summable series independent of n. For a fixed  ∈ (0, 1) and j such that ψj = 0, the decomposition previously used yields   n 1− 1− E[|Z0 |1 an ( |ψ ∧ 1) < |Z | ≤ a ( ∨ 1) ] |Jn,j | ≤ 0 n | |ψ | j j an # "  n E |Z0 |1 an (1 − )|ψj |−1 < |Z0 | ≤ an (1 + )|ψj |−1 + an   n E[|Z0 |1 |Z0 | ≤ an |ψj |−1 ] P(|X (j) | > an ) = B1 + B2 + B3 . + an  We start with the last term B3 . Let X ∗ = i∈Z |ψi ||Zi |. By Corollary 4.2.1, we know that X ∗ is tail equivalent to |Z0 |. Thus,   n E[|Z0 |1 |Z0 | ≤ an |ψj |−1 ] P(X ∗ > an ) an   −1 ≤ cst a−1 ]. n E[|Z0 |1 |Z0 | ≤ an |ψj |

B3 ≤

Applying Potter’s bound (1.4.1a), B3 ≤ cst a−1 n



an |ψj |−1

P(|Z0 | > t)dt

0

≤ cst

a−1 n



0

an |ψj |−1

δ−1 t−δ dt ≤ cst a−δ ≤ cst |ψj |δ−1 . n |ψj |

598

F Solutions to problems

Consider now the term B1 . Let ψ ∗ = supj∈Z |ψj |. If 1 −  ≤ |ψj |, then B1 ≤ nP(|Z0 | > an (1 − )(ψ ∗ )−1 ) , the latter quantity being uniformly bounded with respect to n since it is convergent. If 1 −  > |ψj |, then the analysis is slightly more involved. In that case, by integration by parts, we have   n B1 = E[|Z0 |1 an < |Z0 | ≤ an |ψj |−1 ] an  |ψj |−1 nP(|Z0 | > an y)dy + nP(|Z0 | > an ) . ≤ 1

The last term is uniformly bounded in n by definition of an . Applying Potter’s bound (1.4.3) with δ ∈ (0, 1) as in (4.3.1) yields 

|ψj |−1

 nP(|Z0 | > an y)dy ≤ cst

1

|ψi |−1

y −δ dy ≤ cst |ψj |δ−1 ,

1

where the constant does not depend on j. Similarly for B2 we have, applying integration by parts and Potter’s bound, if |ψj | < 1,  B2 ≤

1+ |ψj | 1− |ψj |

nP(|Z0 | > an y)dy + |ψj |−1 nP(|Z0 | > an (1 − )|ψj |−1 )

≤ cst |ψj |δ−1 . Altogether, we have obtained that for all j such that |ψj | < 1 (i.e. for all but finitely many j), |ψj Jn,j | ≤ cst |ψj |δ . This is a summable series by assumption, thus we can apply the dominated convergence theorem and obtain (F.1). 4.5 Write X0 X =



ψj ψj− Zj2 +



ψj ψj  + Zj Zj  =: X (1) + X (2) .

j,j  ∈Z j=j 

j∈Z

Applying Problem 4.4 gives ⎛   n lim 2 ⎝E[X (1) 1 X (1) ≤ a2X,n ] n→∞ a X,n −



⎞   ψj ψj+ E[Z02 1 Z02 ≤ a2Z,n ]⎠

j∈Z

=−

1  ψj ψj+ log(|ψj ψj+ |) + (2pX0 X − 1) log ψ21 . |ψ|21 j∈Z

F Solutions to problems

599

Next, lim Jn = lim

n→∞

n→∞

n a2X,n

  (E[X (1) 1 X (1) ≤ a2X,n ]−   E[X (1) 1 |X (1) + X (2) | ≤ a2X,n ]) = 0 .

Furthermore, since E[Zj Zj  ] < ∞ for j = j , we have E[X (2) ] < ∞ and   n lim 2 E[|X (2) |1 |X (2) | < a2X,n ] = 0 . n→∞ a X,n (2) We always have X  > X  0 X(2)  . If Zj are non-negative, then for x > 0, 1{X0 X < x} ≤ 1 X < x and hence   n E[|X (2) |1 |X (1) + X (2) | < a2X,n ] 2 aX,n   n ≤ 2 E[|X (2) |1 |X (2) | < a2X,n ] → 0 . aX,n

4.6 Let ψ denote the Laplace transform of X, i.e. ψ(t) = e−tX0 . Then the Laplace transform ψS of S is given by ψS (t) = E[e−tS ] = E[ψ N (t)] =

∞ 

ψ k (t)P(N = k)

k=0

=1+

∞ 

{ψ k (t) − 1}P(N = k) .

k=1

For  > 0, there exists some t0 such that |1 − ψ(t)| <  if t ≤ t0 . Then, for all k ≥ 1, k(1 − )k−1 {1 − ψ(u)} ≤ 1 − ψ k (u) ≤ k{1 − ψ(u)} . Summing these relations, we obtain {1 − ψ(u)}

∞ 

kP(N = k)(1 − )k−1 ≤ 1 − ψS (t) ≤ {1 − ψ(u)}E[N ] .

k=1

Let the function H be defined by on [0, 1] by H(z) = E[z N ]. Since E[N ] < ∞, H is continuously differentiable on [0, 1] and H (1) = E[N ]. Thus, for any η > 0,  van be chosen small enough so that H (1 − ) ≥ E[N ](1 − η). This yields 1−η ≤

1 − ψS (t) ≤1. E[N ]{1 − ψ(t)}

600

F Solutions to problems

Thus 1 − ψS (t) ∼ E[N ]{1 − ψ(t)}. Since the right tail of X0 is regularly varying with index α ∈ (0, 1), we have, for some function  slowly varying at infinity, that P(X0 > x) ∼ x−α (x), which implies by Problem 1.21 that 1 − ψ(t) ∼ tα (1/t). Thus we have obtained that 1 − ψS (t) ∼ E[N ]tα (1/t) and this in turn implies that P(S > x) ∼ E[N ]x−α (x) = E[N ]P(X0 > x). 4.7 Let  ∈ (0, z − 1) where E[z N ] < ∞. By Problem 1.15, there exists C such that F ∗n (x) ≤ C(1 + )n F¯ (x). Thus, ∞  P(S > x) F ∗n (x) . = P(N = n) ¯ P(X > x) n=1 F (x)

Now, for each n, P(N = n)F ∗n (x)/F¯ (x) converges to nP(N = n) and is bounded by C(1 + )n P(N = n) which is a summable series by assumption. Hence bounded convergence applies and the proof is concluded. 4.8 1. Since H ∗n is the distribution function of Y1 + · · · + Yn , the formula follows by conditioning on K. 2. By Problem 1.12, H is regularly varying at ∞ with index 1 − α. Thus (4.2.5) holds and we can apply Corollary 4.2.4 to conclude. 4.9 k Assume without loss of generality that μ = 1 and fix  > 0. Write Sk = i=1 Zi . 1. Let k1 be the largest integer smaller than or equal to (1 − )x. The random variable N being integer-valued, N ≤ (1 − )x entails N ≤ k1 . Thus, P(S > x) = P(S > x, N ≤ (1 − )x) + P(S > x, N > (1 − )x) ≤ P (Sk1 > x) + P(N > (1 − )x) ≤ P (Sk1 − k1 > x) + P(N > (1 − )x) . 2. Let k2 be the smallest integer larger than or equal to (1 + )x. The random variable N being integer-valued, N > (1 + )x entails N ≥ k2 . Thus, P(S > x) ≥ P(S > x, N > (1 + )x) ≥ P(Sk2 > x, N > (1 + )x) = P(N > (1 + )x) − P(N > (1 + )x, Sk2 ≤ x) ≥ P(N > (1 + )x) − P(Sk2 ≤ x) = P(N > (1 + )x) − P(Sk2 − k2 ≤ x − k2 ) ≥ P(N > (1 + )x) − P(Sk2 − k2 ≤ −x). 3. Applying the inequality (1.5.3) with q = 2p − 2 if p > 2 to bound the terms with Ski in the previous relations yields

F Solutions to problems

601

P(S > x) ≤ P(N > (1 − )x) + cst k1 x−p ≤ P(N > (1 − )x) + cst x1−p , P(S > x) ≥ P(N > (1 + )x) + cst k2 x−p ≥ P(N > (1 + )x) + cst x1−p . By assumption, we have x1−p = o(P(N > x)), thus P(S > x) ≤ (1 − )−α , x→∞ P(N > x) P(S > x) lim inf ≥ (1 + )−α . x→∞ P(N > x)

lim sup

Since  is arbitrary, this concludes the proof. 4.10 1. Since S = have

N u=1

Zi and the Zi are i.i.d. and independent of N , we

χ(t) = E[e−tS ] =

∞ 

E[e−t(Z1 +···+Zk ) ]P(N = k)

k=0

=

∞ 

ψ k (t)P(N = k) = H(ψ(t)) .

k=1

2. For t < 1, H (t) = E[N 2 z N ] > 0, thus H is convex. This yields, for all t ≥ 0, 1 − χ(t) = χ(0) − χ(t) = H(1) − H(ψ(t)) ≤ H (1){1 − ψ(t)} = E[N ]{1 − ψ(t)} . For the lower bound, fix  > 0. Since ψ is continuous on [0, ∞) and ψ(0) = 1, there exists t0 > 0 such that ψ(t) ≥ 1 −  for all t ∈ [0, t0 ]. Thus, by convexity of H, 1 − χ(t) = H(1) − H(ψ(t)) ≥ H (1 − ){1 − ψ(t)} . 3. Since  is arbitrary, this proves that 1 − χ(t) ∼ E[N ]{1 − ψ(t)} as t → 1. By Problem 1.21, since α ∈ (0, 1), Z0 is regularly varying with tail index α ∈ (0, 1) if and only if S is and P(S > x) ∼ E[N ]P(Z0 > x). 4.11 1. Since X0 , X+ and X− are independent, the characteristic function of their sum is φ0 φ+ φ− = φ. 2. Since log φ0 is infinitely many times differentiable at zero, all moments of X0 are finite. 3. By direct computation, E[eitX+ ] = E[(E[eitZ1 ])N ] = eθ+ (E[e = eθ + (

∞ 1

−1 eitz θ+ ν(dz)−1)

itZ1 ]−1)

=e

∞ 1

{eitz −1}ν(dz)

= φ+ (t) .

602

F Solutions to problems

N− 4. Similarly, X− can be represented as X− = i=1 Yi with N− a random variable with a Poisson distribution with intensity θ− = ν((−∞, −1)) and Zi −1 i.i.d. with distribution θ− ν restricted to (−∞, −1). 5. If x → ν(x, ∞) is regularly varying at +∞, then so is X+ , with the same index by Corollary 4.2.4. The same holds for X− and since X+ and X− are independent, the left and right tails of X+ + X− are precisely the tails of X− and X+ . Since X0 is lighter-tailed than X+ + X− , the sum X0 + X+ + X− is tail equivalent to X+ + X− by Lemma 1.3.2. 6. Conversely, if X is regularly varying at infinity, then so is X+ . By Problem 1.21, this implies that Z1 , hence the map x → ν(x, ∞) is regularly varying with the same index as X.

Chapter 5 5.2 Noting that ν({y ∗ ≤ 1}) = 0 and using the shift-invariance of ν and then of A we have    ν(A) = ν(A, y ∗ > 1) = ν(A, y ∗−∞,j−1 ≤ 1, y j  > 1) =



j∈Z

ν(B

j

A, y ∗−∞,−1

≤ 1, |y 0 | > 1)

j∈Z

=



ν(A, y ∗−∞,−1 ≤ 1, |y 0 | > 1) .

j∈Z

This yields ν(A) ∈ {0, ∞}. 5.3 Since Y h = Y Θh , where Y is a Pareto random variable with tail index α > 0 independent of Θh , we obtain by definition of the tail process and Problem 1.6: α

τ|X | (h) = P(|Y h | > 1) = E[|Θh | ∧ 1] . 5.4 Let ν 0,h be the exponent measure of X 0,h . the function g defined on R(h+1)d by g(x0,h ) = |xh | 1{|x0 | > 1} satisfies the conditions of Corollary 2.1.10. Thus x−β E[|X h | | |X 0 | > x] β

E[g(x−1 X 0,h )] → ν 0,h (g) P(|X 0 | > x)  h β |y h | 1{|y 0 | > 1}ν(dy) = E[|Y h | ] . =

=

(Rd )Z

F Solutions to problems

603

5.5 By Problem 5.4, since α > 1 the convergence of expectations holds and CTE|X | (h) = E[|Y h |] = E[Y ]E[|Θh |] =

α E[|Θh |] . α−1

5.6 Since both processes have the same tail process, we compute the required quantities simultaneously. 1. ϑ = P(Y ∗−∞,−1 ≤ 1) = P(|φ−1 (1 − b)Y0 | ≤ 1) = P(b = 1) + P(b = 0){1 − P(|Y0 | > φ)} =

|φ|α |φ|α ∨ 1 1 −α + (1 − |φ| )1{|φ| > 1} = . 1 + |φ|α 1 + |φ|α 1 + |φ|α

2. We use Problem 5.3. For |h| ≥ 2 τ|X| (h) vanish. Otherwise, τ|X| (1) = τ|X| (−1) = P(|Y 1 | > 1) = P(|φbY0 | > 1) =

|φ|α ∧ 1 . 1 + |φ|α

3. Similarly, for |h| ≥ 2, CTE|X| (h) = 0. Otherwise CTE|X| (1) = E[|Y1 |] = E[|φbY0 |] =

α |φ| , 1 + |φ|α α − 1

CTE|X| (−1) = E[|Y−1 |] = E[|φ−1 (1 − b)Y0 |] = |φ|−1

|φ|α α . 1 + |φ|α α − 1

4. If |φ| ≤ 1, the event Y ∗−∞,−1 ≤ 1 is equivalent to b = 1, in which case Y1 = Y ∗ = |Y0 | and Y1 = φY0 . Consequently, conditionally on b = 1, Q0 = Θ0 , Q1 = φΘ0 and Qj = 0 otherwise. If |φ| > 1, the event Y ∗∞,−1 ≤ 1 is equivalent to b = 1, in which case Y ∗ = |Y1 |, Q0 = |φ|−1 Θ0 , Q1 = Θ0 , or b = 0 and Y0 < φ, in which case Y ∗ = Y0 , Q0 = Θ0 , Q−1 = φ−1 Θ0 . 5. The previous computations yields, if |φ| ≤ 1,  E[|Qj |α ] = 1 + |φ|α = ϑ−1 . j∈Z

If |φ| > 1, 

E[|Qj |α ] = 1 + |φ|−α = ϑ−1 .

j∈Z

5.7 Recall from Example 5.2.12 that the tail process is given by

604

F Solutions to problems

 ρj Y0 Yj = b1 . . . bj ρ−j Y0

if j ≥ 0 , if j < 0 ,

where bj , j < 0 are i.i.d. Bernoulli random variables with mean ρα . 1. ϑ = P(maxj≥1 ρj Y0 ≤ 1) = P(ρY0 ≤ 1) = 1 − ρα . 2. For h ≥ 1, τX (h) = P(Yh > 1) = P(ρh Y0 > 1) = ραh . For h < 0, τX (h) = P(Yh > 1) = P(b1 . . . bh ρh Y0 > 1) = P(b1 = 1, . . . , bh = 1) = ρ−αh . α For h ≥ 1, CTEX (h) = E[Yh ] = ρh α−1 . For h < 0, α CTEX (h) = E[Yh ] = ρh ραh . α−1

3. If Y ∗−∞,−1 = 0, then Y ∗ = Y0 and thus Qj = ρj , j ≥ 0, Qj = 0, j ≤ −1.   jα 4. j∈Z E[Qα = (1 − ρα )−1 = ϑ−1 . j]= j≥0 ρ 5.8 For simplicity, we assume that β = 1. Let C = {x ∈ Rd : |g(x)| > 1}. By positive homogeneity and continuity, g(0) = 0 and C is included in B c (0, ) in Rd for some  > 0. Thus

g(X j )  P ∈ A, X ∈ xC 0 x g(X j ) P ∈ A | |g(X 0 )| > x = x P(X 0 ∈ xC)

g(X j ) ∈ A, X ∈ xC P 0 x P(|X 0 | > x) . = P(|X 0 | > x) P(X 0 ∈ xC) By definition, P(X 0 ∈ xC) P(X 0 ∈ xC, |X 0 | > x) = lim lim x→∞ P(|X 0 | > x) x→∞ P(|X 0 | > x) −α =  P(Y 0 ∈ C) = −α P(|g(Y 0 )| > 1) . The latter quantity is positive by assumption. Also,



g(X j ) g(X j ) P ∈ A, X 0 ∈ xC P ∈ A, X 0 ∈ xC, |X 0 | > x x x lim = lim x→∞ x→∞ P(|X 0 | > x) P(|X 0 | > x)

g(X j ) P ∈ A, X ∈ xC, |X | > x 0 0 x = lim x→∞ P(|X 0 | > x) −α =  P(g(Y j ) ∈ A, |g(Y 0 )| > 1) . Altogether, we have obtained  g(X j ) lim P ∈ A | |g(X 0 )| > x x→∞ x If we choose  = 1, the claim is proved.

= P(g(Y j ) ∈ A | |g(Y 0 )| > 1) .

F Solutions to problems

605

5.9 1. The property (5.7.1) yields

 

 .  . P π0 (X 0 ) > x, X 0  > x P(|X 0 | > x)

= lim lim x→∞ P((X 0 , . . . , X h ) > x) x→∞ .0  > x P X 

   = P π0 (Y, 0 ) > 1 = P(|Y, 0,0 | > 1) .

2. Using the definition of the tail process, the previous question and (5.7.1) yield  2 3    3 E Φ(Π0 (Y, ))1 π0 (Y, 0 ) > 1 2 , Y, ) | Y, 0,0  > 1 = E Φ( P(|Y, 0,0 | > 1)    2  3   . .0  > x . E Φ(Π0 (X/x))1 π0 (X 0 /x) > 1 1 X .0>x) P(X = lim x→∞ .0  > x) P(|X 0 | > x) P(X 2  3 . . E Φ(Π0 (X/x))1{|X 0 | > x}1 X 0  > x = lim x→∞ P(|X 0 | > x) 2 3 . E Φ(Π0 (X/x))1{|X 0 | > x} = lim x→∞ P(|X 0 | > x) E [Φ({X j /x, j ∈ Z})1{|X 0 | > x}] = E [Φ(Y )] . = lim x→∞ P(|X 0 | > x) 5.10 For all j ∈ Z, we have by stationarity of X, E[H(B j Y )1{|Y −j | > t}] E[H(x−1 B j X))1{|X −j | > xt}1{|X 0 | > x}] x→∞ P(|X 0 | > x) −1 E[H(x X)1{|X 0 | > xt}1{|X j | > x}] = lim x→∞ P(|X 0 | > x) E[H((xt)−1 tX)1{|X 0 | > xt}1{t |X j | > xt}] P(|X 0 | > xt) = lim x→∞ P(|X 0 | > xt) P(|X 0 | > x) = t−α E[H(tY )1{|tY j | > 1}] .

= lim

Since the distribution of the tail process is characterized by its finitedimensional distributions, this proves the time-change formula (5.3.1a). 5.11 Applying the time-change formula (5.3.1b) with j = 0 yields, for every bounded measurable function H defined on Rd , /  0 Θ0 α E[H(Θ0 )1{|Θ0 | = 0}] = E H . |Θ0 | |Θ0 |

606

F Solutions to problems α

Taking H ≡ 1 yields E[|Θ0 | ] = P(|Θ0 | = 0) = 1, the latter by assumption. Taking H(x) = 1{|x| = 1} yields α

P(|Θ0 | = 1) = E[|Θ0 | ] = 1 . 5.12 If (5.3.1b) holds, then E[H(tB j Y )1{|Y −j | > 1/t}]  ∞ = E[H(rtB j Θ)1{r |Θ−j | > 1/t}]αr−α−1 dr 1 ∞ E[H(rtB j Θ)1{r |Θ−j | > 1/t}1{r |Θ0 | > 1}1{|Θ−j | = 0}]αr−α−1 dr = 1  ∞   −1 −1 α = E[H(rt |Θj | Θ)1 r |Θj | |Θ0 | > 1/t |Θj | ]αr−α−1 dr 1 / ∞ 0 −α −α−1 =t E H(uΘ)1{u |Θj | > t}αu du 1

= t−α E [H(Y )1{|Y j | > t}] , where the last line was obtained by the change of variable u |Θj | = rt. Thus (5.3.1a) holds. 5.13 Taking H ≡ 1 and j = 0 in (5.3.1a) yields, for t > 0, t−α P(t |Y 0 | > 1) = P(|Y 0 | > t) . Letting t tend to 0 yields limt→0 t−α P(t |Y 0 | > 1) = 1. Taking now H(y) = 1{|y 0 | > 1} and j = 0 yields, for t ≤ 1 t−α P(t |Y 0 | > 1) = P(|Y 0 | > 1, |Y 0 | > t) = P(|Y 0 | > 1) . This proves that P(|Y 0 | > 1) = 1 and that |Y 0 | has a Pareto distribution. −1

Furthermore, if we define Θ = |Y 0 | Y , then Θ and |Y 0 | are independent. −1 Indeed, applying (5.3.1a) to y → H(|y 0 | y) yields, for t ≤ 1, E[H(Θ)1{t |Y 0 | > 1}] = tα E[H(Θ)1{t |Y 0 | > t}] = tα E[H(Θ)] . 5.14 Applying the time-change formula (5.3.1a) with t = 1 yields E[φ(Y 0 )φ(Y j )] = E[φ(Y 0 )φ(Y j )1{|Y j | > 1}] = E[φ(Y −j )φ(Y 0 )1{|Y −j | > 1}] = E[φ(Y −j )φ(Y 0 )] . 5.15 Let X be a sequence of i.i.d. non-negative regularly varying random variables. Then, taking H(x) = 1{x1 > 1} yields

F Solutions to problems





607

H(rΘ)αr−α−1 dr = 0 .

0

On the other hand, if we omit the indicator in the left-hand side of (5.2.8), we obtain, by the shift invariance of the tail measure,   1{y1 > 1}ν(dy) = 1{y0 > 1}ν(dy) = 1 . (Rd )Z

(Rd )Z

This proves that the indicator cannot be omitted in (5.2.8) in general. 5.16 The function K being α-homogeneous, we have by the shift-invariance of the tail measure, (5.2.8), the time-change formula (5.3.3) and by the assumption that the spectral tail process never vanishes        H(y)1 y j > 0 ν(dy) = H(B j y)1{|y 0 | > 0}ν(dy) (Rd )Z

(Rd )Z

= E[K(B j Θ)] = E[K(B j Θ)1{|Θ−j | = 0}] = E[K(Θ)1{|Θj | = 0}] = E[K(Θ)]  = H(y)1{|y 0 | > 0}ν(dy). (Rd )Z

Taking H(y) = 1{|y 0 | = 0} yields, for all j ∈ Z, ν({y 0 = 0, y j = 0}) = 0. This in turn yields  ν({y 0 = 0, y j = 0}) = 0 . ν({y 0 = 0, y = 0}) ≤ j∈Z

Thus ν({y 0 = 0}) = ν({y = 0}) = 0 by Theorem 5.1.2 (i), and we conclude that ν({y 0 = 0}) = 0 and thus (5.2.8) holds without the indicator in the left-hand side. 5.17 1. By assumption, H is shift-invariant and H(x) = 0 for x∗ ≤  for all  ≤ η. Therefore, by the definition of ν ∗ , (5.4.8) and (5.2.6),  ∞  ∞ ν ∗ (H) = E[H(rQ)]αr−α−1 dr = −α E[H(rQ)]αr−α−1 dr  " 1 # = −α E [H(Y )] = −α E H(Y )1 Y ∗−∞,−1 ≤ 1  ∞ "  # = −α E H(uΘ)1 uΘ∗−∞,−1 ≤ 1 αu−α−1 du  ∞ 1 "  # E H(vΘ0,∞ )1 vΘ∗−∞,−1 ≤  αv −α−1 dv =  / ∞ 0  ∗  −α−1 =E H(vΘ0,∞ )1 vΘ−∞,−1 ≤  αv dv . 

608

F Solutions to problems

By assumption, we have ∞    1{v|Θj | > η} . |H(vΘ0,∞ )|1 vΘ∗∞,−1 ≤  ≤ cst j=0

The latter function is integrable under condition (5.7.2). Therefore the dominated convergence theorem applies and letting  tend to zero yields  ∞ "  # ν ∗ (H) = E H(vΘ0,∞ )1 Θ∗−∞,−1 = 0 αv −α−1 dv . 0

2. If P(Θ∗−∞,−1 = 0) = 0 and if (5.7.2) holds, then taking H(x) = 1{x∗ > 1} yields  ∞ "  # E H(uΘ0,∞ )1 Θ∗−∞,−1 = 0 αu−α−1 du = 0 . ϑ = ν ∗ (H) = 0

This is a contradiction so (5.7.2) does not hold. 5.18 Define Y (1) = {Y j 1{|Y j | > 1}, j ∈ N}. If α > 1, the assumption on ∞ {Θj , j ∈ Z} implies that E[( j=0 |Θj |)α−1 ] < ∞ and we know that this  α  implies that E[( j∈Z Qj ) ] < ∞. Thus, the assumption on K and the dominated convergence theorem yield  ∞ α P (uK(Q (u)) > 1) αu−α−1 du . ∞ > ϑE[K (Q)] = ϑ lim →0



Applying the identities (5.4.8) and (5.2.6), we obtain  ∞ P (uK(Q (u)) > 1) αu−α−1 du ϑE[K α (Q)] = ϑ lim →0   ∞ −α P (vK(Q1 (v)) > 1) αv −α−1 dv = ϑ lim  →0 1 * + = lim −α P K(Y (1)) > 1 ; Y ∗−∞,−1 ≤ 1 →0  ∞ * + −α P vK(Θ1 (v)) > 1 ; vΘ∗−∞,−1 ≤ 1 αv −α−1 dv = lim  →0  ∞ 1 * + = lim P uK(Θ (u)) > 1 ; uΘ∗−∞,−1 ≤  αu−α−1 du . →0



Note that when passing from the representation with the conditional spectral tail process Q to the sequence Y , the set of indices changed from Z to N since the functional K is applied to Y¯ (1) while imposing that Y ∗−∞,−1 ≤ 1, thus the negative coordinates of Y are put to zero. Since * + lim P uK(Θ (u)) > 1 ; uΘ∗−∞,−1 ≤  →0 + * = P uK(Θ0,∞ ) > 1 ; Θ∗−∞,−1 = 0 ,

F Solutions to problems

609

∞ if E[( j=0 |Θj |)α ] < ∞, then by bounded convergence, we obtain 



* + P uK(Θ0,∞ ) > 1 ; Θ∗−∞,−1 = 0 αu−α−1 du 0   = E[K(Θ0,∞ )1 Θ∗−∞,−1 = 0 ] .

ϑE[K α (Q)] =

This proves (5.7.3).     5.19 Apply Problem 5.18 with K(x) =  j∈Z xj . 5.20 Applying Theorem 5.4.10 to H(y)1{|y 0 | > 1} and the properties of H yields E[H(Y )] = ν(H1{|y 0 | > 1})  ∞    =ϑ E[H(rB j Q)1 rQ−j  > 1 ]αr−α−1 dr j∈Z





0

   α α E[H(Q) Q−j  ] = ϑ E[H(Q) Qj  ] .

j∈Z

j∈Z

5.21 Apply Theorem 5.4.10 to 1{K(y) > 1}1{|y 0 | > 1} yields P(K(Y ) > 1) = ν({K(y) > 1, |y 0 | > 1})  ∞   =ϑ P(rK(Q) > 1, r Qj  > 1)αr−α−1 dr j∈Z





0

 + α * E[ K(Q) ∧ Qj  ] .

j∈Z

5.22 Since H is shift-invariant, applying (5.5.9) (with  = 1) yields "  # ν ∗ (H) = E H(Y )1 Y ∗−∞,−1 ≤ 1 .    By assumption, |H(y j,∞ )−H(y j+1,∞ )| = |H(y j,∞ )−H(y j+1,∞ )|1 y j  > 1 . Applying the time-change formula yields

610

F Solutions to problems

∞ 

"  # E |H(Y j,∞ ) − H(Y j+1,∞ )|1 Y ∗−∞,−1 ≤ 1

j=0

=

=

∞  j=0 ∞ 

"   # E |H(Y j,∞ ) − H(Y j+1,∞ )|1 Y ∗−∞,−1 ≤ 1 1{|Y j | > 1} "   # E |H(Y 0,∞ ) − H(Y 1,∞ )|1 Y ∗−∞,−j−1 ≤ 1 1{|Y −j | > 1}

j=0

⎤ ∞   ∗  1 Y −∞,−j−1 ≤ 1 1{|Y −j | > 1}⎦ = E ⎣|H(Y 0,∞ ) − H(Y 1,∞ )| ⎡

j=0

"

 # = E |H(Y 0,∞ ) − H(Y 1,∞ )|1 Y ∗−∞,0 > 1 = E [|H(Y 0,∞ ) − H(Y 1,∞ )|] . Thus we can apply Fubini’s theorem and obtain E[H(Y 0,∞ ) − H(Y 1,∞ )] ∞  "  # = E {H(Y j,∞ ) − H(Y j+1,∞ )}1 Y ∗−∞,−1 ≤ 1 j=0

 # " = E H(Y 0,∞ )1 Y ∗−∞,−1 ≤ 1 "  # = E H(Y )1 Y ∗−∞,−1 ≤ 1 = ν ∗ (H) . 5.23 By assumption,    |H(y j,∞ ) − H(y j+1,∞ )| = |H(y j,∞ ) − H(y j+1,∞ )1 y j  > 1 . Since H(y) = 0 if y ∗ ≤ 1, we have by (5.5.9),   ν ∗ (H 2 ) = E[H 2 (Y )1 Y ∗−∞,−1 ≤ 1 ]   = E[|H(Y )||H(Y 0,∞ )|1 Y ∗−∞,−1 ≤ 1 ] . Since H is bounded, applying the time-change formula (5.3.1a) yields ν ∗ (H 2 ) ∞    ≤ E[|H(Y )||H(Y j,∞ ) − H(Y j+1,∞ )|1{|Y j | > 1}1 Y ∗−∞,−1 ≤ 1 ] =

j=0 ∞ 

  E[H(Y )|H(Y 0,∞ ) − H(Y 1,∞ )|1{|Y −j | > 1}1 Y ∗−∞,−j−1 ≤ 1 ]

j=0

= E[H(Y )|H(Y 0,∞ ) − H(Y 1,∞ )|] < ∞ . This proves that the series

F Solutions to problems ∞ 

611

  E[|H(Y )||H(Y j,∞ ) − H(Y j+1,∞ )|1{|Y j | > 1}1 Y ∗−∞,−1 ≤ 1 ]

j=0

is summable, and therefore we can apply the dominated convergence theorem to prove the stated equality. 5.24 Applying the time-change formula (5.3.1a), we obtain ν ∗ (HH )

  = E[H(Y )H (Y )1 Y ∗−∞,−1 ≤ 1 ]   = E[|H(Y )||H (Y 0,∞ )|1 Y ∗−∞,−1 ≤ 1 ] ∞    E[|H(Y )||H (Y j,∞ ) − H (Y j+1,∞ )|1{|Y j | > 1}1 Y ∗−∞,−1 ≤ 1 ] ≤ =

j=0 ∞ 

  E[H(Y )|H (Y 0,∞ ) − H (Y 1,∞ )|1{|Y −j | > 1}1 Y ∗−∞,−j−1 ≤ 1 ]

j=0

= E[H(Y )|H (Y 0,∞ ) − H (Y 1,∞ )|] < ∞ . This proves that ν ∗ (HH ) < ∞. 5.25 Note that E(Y 0,∞ )−E(Y 1,∞ ) = 1 almost surely thus applying Problem 5.24 yields ν ∗ (HE) = E[H(Y )] and  ν ∗ (E 2 ) = E[E(Y )] = P(|Y j | > 1) . j∈Z

5.26 Let φ˜ be the function φ extended to (Rd )Z by φ(x) = φ(x0 ). By definition of Hφ , ν ∗ ∗ and (5.4.11),  ∞ ˜ = ν 0 (φ) . ν ∗ (Hφ ) = E[φ(rQj )]αr−α−1 dr = ν(φ) j∈Z

0

The last equality follows from the fact that the exponent measure of a finitedimensional distribution is the projection of the tail measure. 5.27 Applying the time-change formula (5.3.1a) to each of the summands, we obtain   P(|Y j | > 1) = P(|Y j | > 1) . j1

5.28 By (5.4.11), the definition of the tail process and P(Q∗ = 1) = 1, we have

612

F Solutions to problems

E[H(Y )] = ϑ

 j∈Z



j∈Z



    E[H(uB t Q)1 u Q−t  > 1 ]αu−α−1 du



    E[H(uB t Q)1 u Q−t  > 1 ]αu−α−1 du

0

 



1

2 3  α −1 −1 E[Q−t  H(Y B t (|Q0 | Q))] = E H(Y B T (|Q0 | Q)) .

j∈Z

5.29 1. Applying the time-change formula yields  "   # E 1 Y ∗−∞,j−1 ≤ 1 1{|Y j | > 1}H(Y ) E[H(Y )] = j∈Z

=



"   # E 1 Y ∗−∞,−1 ≤ 1 1{|Y −j | > 1}H(B j Y )

j∈Z





" # E 1{|Z −j | > 1}H(B j Z) .

j∈Z

Since |Z j | ≤ 1 for j < 0 by definition of Z, this proves (5.7.6a). Conversely, 0  / H(Y )   ∗ 1 Y −∞,−1 ≤ 1 1{|Y j | > 1} ϑE[H(Z)] = E E(Y ) j∈Z 0  / H(B j Y )   ∗ = 1 Y −∞,−j−1 ≤ 1 1{|Y −j | > 1} E E(Y ) j∈Z 0 / 0  / H(B j Y ) H(B −T1 (Y ) Y ) 1{T1 (Y ) = −j} = E = E . E(Y ) E(Y ) j∈Z

2. Applying (5.7.6b) to the shift-invariant map H = E yields the first identity. Applying (5.7.6a) to the shift-invariant map H(x) = 1{E(x) = k} yields ⎤ ⎡ E(Z )−1  1{E(Z) = k}⎦ = ϑkP(E(Z) = k) . P(E(Y ) = k) = ϑE ⎣ i=0

3. Apply (5.7.6a) to the functional E0 and note that E(Z )−1



  1 E0 (B −Ti (Z ) Z) = k = 1{E0 (Z) ≥ k} = 1{E(Z) ≥ k} .

i=0

5.30 Write Z = (Y ∗ )−1 Y . Then, applying the homogeneity of A and the time-change formula (5.3.1a),

F Solutions to problems

613

ϑE[H(QA )] = E[H(Z)1{A(Z) = 0}]  E[H(Z)1{A(Z) = 0}1{A (Y ) = j}1{|Y j | > 1}] = j∈Z

=



E[H(B j Z)1{A(Z) = −j}1{A (Y ) = 0}1{|Y −j | > 1}]

j∈Z

= E[H(B −A(Y ) Z)1{A (Y ) = 0}] = ϑE[H(B −A(Q

A

)



QA )] .

˜ is shift invariant, we have 5.31 Applying (5.4.13) and noting that H  ˜ 2) = ˜ j H ◦ B j ) = ν(HHU ˜ ν ∗ (H ν ∗ (HU 0) j∈Z

˜ )H(Y )] = = E[H(Y



E[H(Y )H(B −j Y )1{|Y j | > 1}] .

j∈Z

5.32 By definition of the tail measure and the identity (5.4.11), we have lim

x→∞

P(Xj Xj+h > x) = ν({x ∈ 0 (R) : x0 x1 > 1}) P(X02 > x)  ∞ =ϑ P(r2 Qj Qj+h > 1)αr−α−1 dr 0

j∈Z





α/2

E[(Qj Qj+h )+ ] .

j∈Z

5.33 1. Note that x−1 j |x|1 ∈ [1, ∞] for all x ∈ 0 and j ∈ Z. Consider the non-negative shift-invariant functional  + * x → xj log x−1 j |x|1 j∈Z

+ with the convention xj log x−1 j |x|1 = 0 when xj = 0. Applying the definition of the conditional spectral tail process Q with the infargmax functional and the time-change formula (5.3.1b), we obtain  " +# * ϑ E Qj log Q−1 j |Q|1 *

j∈Z

=

 j∈Z

=

 j∈Z

+ # " * E Θj log Θj−1 |Θ|1 1{inf arg max(Θ) = 0} "  # E log (|Θ|1 ) 1 inf arg max(B j Θ) = 0 ⎡



= E [log (|Θ|1 )] ≥ E ⎣log ⎝

∞  j=0

⎞⎤ Θj ⎠⎦ .

614

F Solutions to problems

+# " * and E [log (|Θ|1 )] are simultaneously This proves that E Qj log Q−1 j |Q|1 finite or infinite and that (5.7.8a) implies (5.7.8c).  2. Write Vj = k≥j Θk for all j ∈ Z. By the time-change formula (5.3.2) applied (with  α = 1) to the  0-homogeneous function x → {log(|x0,∞ |1 ) − log(|x1,∞ |1 )}1 |x1,∞ |1 > 0 , we have for all j ≥ 1 0 ≤ E [log(V−j ) − log(V−j+1 )] = E [{log(V−j ) − log(V−j+1 }1{Θ− j = 0})] = E[Θj {log(V0 ) − log(V1 )}] . 3. If y ≥ x > 0, then



y

0 ≤ log(y) − log(x) =

s−1 = ds ≤ (y − x)/x .

x

4. Since V0 − V1 = |Θ0 | = 1, the previous bound yields # " 0 ≤ E [Θj {log(V0 ) − log(V1 )}] ≤ E V1−1 Θj . Summing over j yields, for n ≥ 1, 0 ≤ E [log(V−n )] ≤ E[log(V0 )] +

n  # " E V1−1 Θj ≤ E[log(V0 )] + 1 . j=1

5. If E[log(V0 )] < ∞, then E[log(|Θ|1 )] < ∞ by monotone convergence. Since we already know that (5.7.8a) and (5.7.8b) are equivalent and trivially imply (5.7.8c), we have proved the stated equivalence. 6. We have obtained that E [log(V−j ) − log(V−j+1 )] = E[Θj {log(V0 ) log(V1 )}] for all j ∈ N. If (5.7.8b) holds, summing over j ≥ 1 yields E [log(Θ1 )] =

∞ 

E [log(V−j ) − log(V−j+1 )] + E[log(V0 )]

j=1

= E[V1 {log(V0 ) − log(V1 )}] + E[log(V0 )] = E[V0 log(V0 ) − V1 log(V1 )] . The latter equality uses the fact that Θ0 = 1 and V0 = 1 + V1 . This proves (5.7.9). 5.34 1. By assumption P(Q ∈ H) = 1. Since H is shift invariant and homogeneous, we can apply Lemma 5.4.7 to obtain that P(Θ ∈ H) = 1. 2. The function T is continuous on R and differentiable on R∗ with T (u) = log(|u|) + 1. It is negative and greater than −e−1 on [0, 1] and increasing on

F Solutions to problems

615

[e−1 , ∞]. Thus, if u and v have the same sign, say 0 ≤ u ≤ v, and v − u ≤ 1, then  ≤ e−1 ≤ 1 if v ≤ 1 -v |T (u) − T (v)| = u {1 + log(s)}ds ≤ 1 + log(v) if v > 1 . If u and v have different signs, say v ≤ 0 < u, then the condition |u − v| ≤ 1 implies −1 ≤ v ≤ u ≤ 1. Thus |T (u) − T (v) ≤ 2 sup |s log(s)| = 2e−1 ≤ 1 . s∈[0,1]

This proves the desired bound. 3. We apply Lemma 5.6.3. We have already checked that P(Θ ∈ H) = 1. Since |S0 − S1 | ≤ 1, we can apply the bound on the function T previously obtained and we have ⎛ ⎞ ∞  |H(Θ0,∞ ) − H(Θ0,∞ )| = |T (S0 ) − T (S1 )| ≤ 1 + log ⎝ |Θj |⎠ . j=0

Since we assume that (5.7.8c) holds, this yields E[|H(Θ0,∞ )−H(Θ0,∞ )|] < ∞ and the other two conditions of Lemma 5.6.3 hold by continuity. Thus we can apply Lemma 5.6.3 and obtain (5.7.11). 5.35 For h ∈ Z and n ≥ 1, define Bh,n = {x ∈ (Rd )Z : |xh | > n−1 }. Then (Rd )Z \{0} = ∪h∈Z,n≥1 Bh,n . By assumption, ν(Bh,n ) = nα ν(Bh,1 ) < ∞, thus ν is σ-finite. The collection of sets Bh,n is countable so we can enumerate them as {Di , i ≥ 1}. We can then define C1 = D1 and Ci = Di \ (D1 ∪ · · · ∪ Di−1 ), i ≥ 2. Then (Rd )Z \ {0} is the disjoint union of the measurable sets Ci and each one is such that ν(Ci ) < ∞ since there exists h ∈ Z and n ≥ 1 such that Ci ⊂ Bh,n , or equivalently nCi ⊂ Bh,1 . By homogeneity, knowing ν on the sets Bh,1 , h ∈ Z is thus equivalent to knowing it on all the sets measure such that ν ({0}) = 0 Ci , i ≥ 1. Let ν be another α-homogeneous   and whose restrictions to the sets {y j  > 1} coincide with those of ν. Then ν restricted to each set Ci is also equal to ν, hence, for all measurable set A ∈ (Rd )Z , since by assumption ν({0}) = 0, we have ν(A) = ν(A \ {0}) =

∞ 

ν(A ∩ Ci )

i=1

=

∞ 

ν (A ∩ Ci ) = ν (A) .

i=1

If Ci ⊂ Ah,n , then n(Ci ∩ A) ⊂ Ah,1 . This proves that the restrictions of ν to the sets Ah,1 for all h ∈ Z determine ν.

616

F Solutions to problems

 α 5.36 1. Since E[|Θj | ] ≤ 1 for all j, if j∈Z qj < ∞, then  α E[ qj |Θj−i | ] < ∞ j∈Z

for all i ∈ Z, thus P(lim|j|→∞ qj |Θj−i | = 0) = 1 and this implies that P(inf arg max(q · B i Θ) ∈ Z) = 1. 2. By definition, inf arg max(0) = −∞. Thus  ∞ ν q ({0}) = P(Θ = 0, inf arg max(q · B j Θ = j)αr−α−1 dr = 0 . 0

j∈Z

For t > 0 and a measurable set A ⊂ (Rd )Z ,  ∞ ν q (tA) = P(rB j Θ ∈ tA, inf arg max(q · B j Θ = j)αr−α−1 dr j∈Z

= t−α

 j∈Z

0 ∞

P(rB j Θ ∈ A, inf arg max(q · B j Θ = j)αr−α−1 dr .

0

Thus, ν q is α-homogeneous. 3. Let H be a non-negative measurable map on (R-d )Z . For h ∈ Z, define ∞ a non-negative measurable map Hα by Hα (y) = 0 H(ry)1{r |y h | > 1} −α−1 dr. Since Hα (y) = Hα (y)1{|y h | > 0}, the time-change formula αr (5.3.3) yields  H(y)1{|y h | > 1}ν q (dy) (Rd )Z

=

 j∈Z

=

 j∈Z

=



  E[Hα (B j Θ)1{|Θh−j | > 0}1 inf arg max(q · B j Θ) = j ]   E[Hα (B h Θ)1{|Θj−h | > 0}1 inf arg max(q · B h Θ) = j ]   E[Hα (B h Θ)1 inf arg max(q · B h Θ) = j ] .

j∈Z

In the last line, we used the property that inf arg max(q · B h Θ) = j implies |Θj−h | > 0. Since P(inf arg max(q · B h Θ) ∈ Z) = 1, we further obtain  H(y)1{|y h | > 1}ν q (dy) = E[Hα (B h Θ)] (Rd )Z









= 0

= 1

E[H(rB h Θ)1{r |Θ0 | > 1}]αr−α−1 dr E[H(rB h Θ)]αr−α−1 dr .

F Solutions to problems

617

4. By Problem 5.35, the measure ν q is characterized by its restrictions to the sets {|y h | > 1}. The previous result shows that ν q does not depend on q on these sets, so it does not depend on q at all. 5. For j, h ∈ Z, apply the identity (5.7.13) yields     ν q ◦ B −j (H1{|y h | > 1}) = H ◦ B j (y)1 y h−j  > 1 ν q (dy) (Rd )Z ∞



E[H ◦ B j (rB h−j yΘ)]αr−α−1 dr

= 1 ∞ =

E[H(rB h yΘ)]αr−α−1 dr

1

= ν q (H1{|y h | > 1}) . This shows that ν q and ν q ◦ B −h coincide on the sets {|y h | > 1} and are therefore equal. This proves that ν q is shift-invariant. 6. If Θ is the spectral tail process associated to a B-invariant tail measure ν, then by definition of the spectral tail process,   H(y)1{y h > 1}ν(dy) = H(B h y)1{y 0 > 1}ν(dy) (Rd )Z

(Rd )Z ∞

 =

E[H(B h Θ)]αr−α−1 dr



1

= (Rd )Z

H(y)1{y h > 1}ν q (dy) .

This implies that ν q = ν.

Chapter 6 6.1 We know from Example 5.2.10 that the tail process is (. . . , 0, 0, φ−1 (1 − b)Y0 , Y0 , φbY0 , 0, 0, . . . , ) , and Y0 is a two-sided Pareto random variable with skewness pX given in (5.2.24), P(b = 1) = (1 + |φ|α )−1 and the cluster size distribution is the inverse of the candidate extremal index: ϑ−1 =

1 + |φ|α . |φ|α ∨ 1

– Cluster size distribution. Since the process is 2-dependent, π(m) = 0 for m = 1, 2. For m = 1,

618

F Solutions to problems

π(1) = ϑ−1 P (|φ|b |Y0 | ≤ 1, (1 − b)|Y0 | ≤ |φ|) 1 − |φ|α |φ|α 1{|φ| ≤ 1} + ϑ−1 (1 − |φ|−α ) 1{|φ| > 1} α 1 + |φ| 1 + |φ|α = (1 − |φ|α )1{|φ| ≤ 1} + (1 − |φ|−α )1{|φ| > 1} .

= ϑ−1

There can be two exceedences only if b = 1, thus π(2) = ϑ−1 P (|φ|b |Y0 | > 1, (1 − b)|Y0 | ≤ |φ|) = ϑ−1

|φ|α ∧ 1 = |φ|α ∧ |φ|−α . 1 + |φ|α

– Stop-loss index: ⎛ θstoploss (S) = P ⎝

∞ 

⎞ (Yj − 1)+ > S, Y ∗−∞,−1 ≤ 1⎠

j=0

1 P((Y0 − 1)+ + (φY0 − 1)+ > S) 1 + |φ|α |φ|α + P((Y0 − 1)+ + (φ−1 Y0 − 1)+ > S, |Y0 | < |φ|) . 1 + |φ|α

=

– The large deviations index: ⎞α ⎛ ⎞α ⎤ ⎡⎛ ∞ ∞   θlargedev = E ⎣⎝ Θj ⎠ − ⎝ Θj ⎠ ⎦ = =

j=1 + + α α# E (Θ0 + Θ1 )+ − (Θ1 )+ " α α# E (Θ0 (1 + φb))+ − (φbΘ0 )+ α 

"

= pX

j=0

|φ| 1 α pX ((1 + φ)α + + − φ+ ) 1 + |φ|α 1 + |φ|α  α +(1 − pX )((1 + φ)α − − φ− ) .

If φ ≥ 0, this yields p(1 + φα )−1 (1 + φ)α . – Ruin index; " α α α# θruin = E (Θ0 )+ ∨ (Θ0 (1 + φb))+ − (φbΘ0 )+ .

6.2 The tail process is given by Yj = ρj Y0 1{j ≥ −N } with P(N = n) = (1 − ρα )ραn . We also know from Problem 5.7 that the sequence Q is defined by: Qj = ρj , j ≥ 0 and Qj = 0 otherwise. The mean cluster size is the inverse of the candidate extremal index calculated in Problem 5.7: ϑ = 1 − ρα .

F Solutions to problems

619

– Cluster size distribution: for m ≥ 1, ⎛ ⎞ ∞  1{Yj > 1} = m, Y ∗−∞,−1 ≤ 1⎠ π(m) = ϑ−1 P ⎝ ⎛ = ϑ−1 P ⎝ ⎛ = P⎝

j=0 ∞ 

⎞ 1{Yj > 1} = m, N = 0⎠

j=0 ∞ 



1{Yj > 1} = m⎠

j=0 −(m−1)

= P(ρ – Stop-loss index:



θstoploss (S) = P ⎝

< Y0 ≤ ρ−m ) = ρ(m−1)α − ρmα .

∞  j=0

⎞ (Yj − 1)+ > S, Y ∗−∞,−1 ≤ 1⎠ ⎛

= (1 − ρα )P ⎝

∞ 

⎞ (ρj Y0 − 1)+ > S ⎠ .

j=0

– Large deviations index: ⎞α ⎤ ⎞α ⎡⎛ ⎛ ∞   1 − ρα θlargedev = ϑE ⎣⎝ Qj ⎠ ⎦ = (1 − ρα ) ⎝ ρj ⎠ = . (1 − ρ)α j=0 j∈Z

6.3 By Theorem 6.1.4, since AC(rn , cn ) holds, the distribution of X −rn ,rn conditionally on |X 0 | > cn tends to the distribution of the tail process. Thus + * lim P X ∗1,rn ≤ cn | |X 0 | > cn = P(Y ∗1,∞ ≤ 1) = ϑ . n→∞

¯ n,j = |Xj |1{|Xj | ≤ cn } − mn with mn = E[|X0 |1{|X0 | ≤ cn }]. 6.4 Write X Since α > 1 and rn = o(cn ), we have mn = O(1) and for large enough n, ⎞ ⎛ ⎞ ⎛ rn rn   ¯ n,j > 2ηcn − rn mn ⎠ X |Xj |1{|Xj | ≤ cn } > 2ηcn ⎠ = P ⎝ P⎝ j=1

⎛ ≤ P⎝

j=1 rn 

⎞ ¯ n,j > ηcn ⎠ . X

j=1

Assume first that α ∈ (1, 2). Fix δ ∈ (α, 2). Applying Markov and Burkholder’s inequalities (E.1.6), we obtain

620

F Solutions to problems

 rn P( j=1 |Xj |1{|Xj | ≤ cn }| > 2ηcn ) rn P(|X0 | > cn )



P



rn j=1

¯ n,j > ηcn X

rn P(|X0 | > cn ) ¯ n,0 |δ ] E[|X ≤ cst δ cn P(|X0 | > cn ) E[|X0 |δ 1{|X0 | ≤ cn }] . ≤ cst cδn P(|X0 | > cn )

Letting n → ∞ and apply (1.4.5a), we obtain r n P( j=1 |Xj |1{|Xj | ≤ cn }| > 2ηcn ) = O(δ−α ) . lim sup rn P(|X0 | > cn ) n→∞ Since δ > α, this proves our claim in the case α ∈ (1, 2). Assume now that α ≥ 2. Write Ln = E[X02 1{|X0 | ≤ cn }]. If α > 2, the sequence Ln is bounded and if α = 2, it is a slowly varying function of cn by Proposition 1.4.6. Take δ > 2α − 2. Applying Markov and Rosenthal’s inequalities (E.1.4)

  rn rn ¯ n,j > ηcn P X |Xj |1{|Xj | ≤ cn }| > ηcn ) P( j=1 j=1 ≤ rn P(|X0 | > cn ) rn P(|X0 | > cn ) δ/2 ¯ 2 ])δ/2 + rn E[X ¯δ ] rn (E[X n,0 n,0 ≤ cst cδn rn P(|X0 | > cn ) δ/2

rn (E[X02 1{|X0 | ≤ cn }])δ/2 + rn E[|X0 |δ 1{|X0 | ≤ cn }] cδn rn P(|X0 | > cn )  δ/2−1 rn Ln E[|X0 |δ 1{|X0 | ≤ cn }] . + cst ≤ cst δ/2+1 cn cδn P(|X0 | > cn ) cn P(|X0 | > cn ) ≤ cst

Since Ln is either bounded or a slowly varying function of cn , we obtain as previously r n P( j=1 |Xj |1{|Xj | ≤ cn }| > ηcn ) = O(δ−α ) . lim sup rn P(|X0 | > cn ) n→∞ 6.5 1. To prove that d˜ is a pseudo-metric, the only nontrivial step is to show that d˜ satisfies the triangle inequality, but, that is, implied by Condition (6.3.1). ˜ is complete. Let ˜ d) 2. Separability is easy to check and we prove that (X, ˜ Then we can find a strictly increasing ˜ d). {˜ xn } be a Cauchy sequence in (X, sequence of nonnegative integers {nk } such that ˜ xm , x d(˜ ˜n )
0 and yi  < y∞ /4 for |i| ≥ N . Since B kn x − y∞ → 0 there exists an integer n0 > 0 such that B kn x − y∞ < y∞ /4 for n ≥ n0 . By our assumption, we can find an integer n ≥ n0 such that kn − kn0 + i0 ≥ N , and it follows that 1 3 y∞ < (B kn x)i0  = xkn +i0  = |(B kn0 x)kn −kn0 +i0 | < y∞ , 4 2 which is a contradiction. Hence, the sequence {kn } is bounded. 6.7 Applying (5.5.9) and Problem 5.25 we obtain ∞ ∞   m2 π(m) = m2 P(E(Y ) = m | Y ∗−∞,−1 ≤ 1) m=1

m=1

= E[E 2 (Y ) | Y ∗−∞,−1 ≤ 1]  P(|Y j | > 1) . = ϑ−1 ν ∗ (E 2 ) = ϑ−1 j∈Z

622

F Solutions to problems

6.8 We only prove the second statement. We need to prove that for every δ > 0 (and an arbitrary norm on Rh+1 ),   lim lim sup nP((X02 , X0 X1 , . . . , X0 Xh ) > c2n δ 2 , m→∞ n→∞   2 (Xj , Xj Xj+1 , . . . , Xj Xj+h ) > c2n ) = 0 . max m≤|j|≤rn −h

Taking for simplicity the supnorm, we have, for every x ∈ Rh+1 and b > 0,   2 (x0 , x0 x1 , . . . , x0 xh ) > b2 =⇒ |(x0 , . . . , xh )| > b . Thus   P((X02 , X0 X1 , . . . , X0 Xh ) > c2n δ 2 , max

m≤|j|≤rn −h

  2 (Xj , Xj Xj+1 , . . . , Xj Xj+h ) > c2n )

≤ P(|X 0,h | > cn δ, ≤

h 

max

m≤|j|≤rn −h

P(||Xi | > cn | > δ/h,

i=0

≤ hP(||X0 | > cn | > δ/h,

|X j,j+h | > cn ) max

m−h≤|j|≤rn

max

m−2h≤|j|≤rn

|Xj | > cn )

|Xj | > cn ) .

Thus AC(rn , cn ) for {Xj , j ∈ Z} implies AC(rn − h, c2n ) for the sequence {(Xj2 , Xj Xj+1 , . . . , Xj Xj+h ), j ∈ Z}.

Chapter 7 7.1 Let f be a bounded continuous function with bounded support. Then, /

Mn 0 m mn n n −Nn (f ) −f (Xn,1 ) E[e−f (Xn,1 ) ] + 1 − E[e ] = E E[e ] = n n   n n 1 mn 1 ν(e−f 1Un ) + 1 − = = 1 + ν([e−f − 1]1Un ) . n n n There exists n0 such that the support of f is included in Un for all n ≥ n0 . This yields, for n ≥ n0 ,  n −f 1 E[e−Nn (f ) ] = 1 + ν([e−f − 1]) → eν(e −1) . n This proves that Nn converges weakly to a PPP(ν). 7.2 Let A ⊂ B0 . By definition, there exists  > 0 such that z ∈ A implies z > . If A is moreover measurable,

F Solutions to problems

 ν(A) ≤



623

P(r Z > )αr−α−1 dr = −α E[Z ] < ∞ . α

0

Thus ν is B0 -boundedly finite. The point process with points {Γi , Z i ), i ≥ 1} is a Poisson point process on [0, ∞)×E with mean measure Leb⊗μ, μ being the distribution of the process Z. Let φ : (0, ∞)×E → E be the measurable map defined by φ(t, z) = t−1/α z. Then ν = (Leb ⊗ μ) ◦ φ−1 , thus Proposition 7.1.12 yields the result. 7.3 Let f be a bounded non-negative continuous function with bounded support in Rd \{0}, that is, there exists  > 0 such that f (x) = 0 if |x| ≤ . Since the points X n,i are ordered according to their norm, there exists a greatest index j such that |X j | > . Thus, for a fixed m ≥ 1, 3 2 n 3 2 E e−Nn (f ) = E e− i=1 f (X n,i ) n 

=

j=1 m 

=

2 j 3 E e− i=1 f (X n,i ) 1{|X n,j+1 | ≤  < |X n,j |} 2 j 3 E e− i=1 f (X n,i ) 1{|X n,j+1 | ≤  < |X n,j |} + Sn,m .

j=1

For each fixed m, we have by assumption lim

n→∞

m 

2 j 3 E e− i=1 f (X n,i ) 1{|X n,j+1 | ≤  < |X n,j |}

j=1

=

m 

2 j 3 E e− i=1 f (Pi ) 1{|Pj+1 | ≤  < |Pj |} .

j=1

Furthermore, the assumption on N implies that lim

m→∞

m 

2 j 3 E e− i=1 f (Pi ) 1{|Pj+1 | ≤  < |Pj |} = E[e−N (f ) ] .

j=1

To conclude, we only need to apply the triangular argument Lemma A.1.4, that is to prove that lim lim sup

m→∞ n→∞

n 

2 j 3 E e− i=1 f (X n,i ) 1{|X n,j+1 | ≤  < |X n,j |} = 0 .

j=m+1

This holds since n  j=m+1

2 j 3 E e− i=1 f (X n,i ) 1{|X n,j+1 | ≤  < |X n,j |} ≤ P(|X n,m | > ) .

624

F Solutions to problems

By assumption, limn→∞ P(|X n,m | > ) = P(|Pm | > ) (for all but countably many ) and since there are only finitely many points of N greater than , it follows that limm→∞ P(|Pm | > ) = 0. 7.4 N is a marked PPP so we only need to compute its mean measure, say μ. ∞ α The assumption j=0 E[|Θj | ] < ∞ implies that E[(Θ∗ )α | Θ∗−∞,−1 ] < ∞. Therefore we can apply Problem 7.2. Let f be a bounded continuous function with bounded support in [0, ∞) × 0 (Rd ). Then, by Problem 5.17,  ∞ ∞ −α−1 , μ(f ) = P(Θ∗−∞,1 = 0) E[f (t, rΘ)]αr dr 0 0  ∞ ∞   E[f (t, rΘ)1 Θ∗−∞,1 = 0 ]αr−α−1 dr = ν ∗ (f ) . = 0

0



Thus μ = ν . 7.5 Let Nn be the functional point process of exceedences (cf. (7.2.1)) and N its limit under AC(rn , cn ) and (7.3.14). Let A be a bounded Borel set in [0, ∞) such that Leb(∂A) = 0. Then A × (1, ∞) is bounded in [0, ∞) × (0, ∞) and is a continuity set of Leb ⊗ να , thus w ¯n (A) = Nn (A × (1, ∞)) =⇒ N N (A × (1, ∞)) . ¯ . Let Y be a random variable We can compute the Laplace functional of N with a Pareto distribution with tail index α, independent of the sequence Q. Define a random variable K by  1{Y Qj > 1} . K= j∈Z

Let f be a bounded continuous function with bounded support in [0, ∞) and write f¯(x, y) = f (x)1{y > 1}. Then, using the expression for the Laplace functional of N in (7.3.9) and the fact that Q∗ ≤ 1, we obtain 

¯

¯

log E[e−N (f ) ] = log E[e−N (f ) ]  ∞ ∞  =ϑ E[e− j∈Z f (x)1{rQj >1} − 1]dxαr−α−1 dr 0 ∞ 0 ∞  E[e− j∈Z f (x)1{rQj >1} − 1]dxαr−α−1 dr =ϑ 1 ∞ 0  =ϑ E[e−f (x) j∈Z 1{Y Qj >1} − 1]dx 0 ∞ E[e−Kf (x) − 1]dx . =ϑ 0

∞ Let us now compute the Laplace functional of the point process i=1 δTi Ki where Ki , i ≥ 1 are i.i.d. copies of K. Write ψ(s) = − log E[e−Kf (s) ]. Then, conditioning on T = σ(Ti , i ≥ 1), we have

F Solutions to problems

E[e−

∞

i=1

Ki f (Ti )

2

625

3 |T] %

∞

] = E E[e− i=1 Ki f (Ti ) $∞  =E E[e−Ki f (Ti ) | T ] i=1

=E

$∞ 

% e

−ψ(Ti )

3 2 ∞ = E e− i=1 ψ(Ti ) = E[e−N0 (ψ) ] ,

i=1

where N0 is a homogeneous PPP on [0, ∞) with rate ϑ. Thus  ∞  ∞ ∞ (e−ψ(s) − 1)ds = ϑ E[e−Kf (s) − 1]ds . log E[e− i=1 Ki f (Ti ) ] = ϑ 0

0

  Since the map H(x) = 1 j∈Z 1{xj > 1} = m is shift-invariant, we obtain by (5.4.8), ⎛ P(K = m) = P ⎝ 



⎞ 1{Y Qj > 1} = m⎠

j∈Z ∞

E[H(rQ)]αr−α−1 dr = E[H(Y ) | Y ∗−∞,−1 ≤ 1] ⎞ ⎛  1{Yj > 1} = m | Y ∗−∞,−1 ≤ 1⎠ . = P⎝

=

1

j∈Z

7.6 Let ∞ = x0 > x1 > x2 > · · · > xk . −1 P(c−1 n X(n:n−k+1) ≤ xk , . . . , cn X(n:n) ≤ x1 ) = P(Nn ((xk , ∞)) = k − 1, . . . , Nn ((x2 , ∞)) = 1, Nn ((x1 , ∞)) = 0)

→ P(N ((xk , ∞)) = k − 1, . . . , N ((x2 , ∞)) = 1, N ((x1 , ∞)) = 0) = P(N ((xk , xk−1 ]) = 1, . . . , N ((x2 , x1 ]) = 1, N ((x1 , ∞)) = 0) −α

= e−xk

k 

(x−α − x−α i i−1 ) .

i=1

 7.7 1. Set X n,i =

(0)

Xi cn

X

(1)

, b(cin ) . Then {X n,i , 1 ≤ i ≤ n} is a null array (cf.

proof of Theorem 7.2.1) and by assumption, ⎤ ⎡ ⎢ nE ⎣δ X (0) i cn

(1) X , b(ci ) n

v#  ⎥ −→



μ,

on (0, ∞) × [0, ∞) .

Thus we can apply Theorem 7.1.21 to prove the stated convergence.

626

F Solutions to problems

2. By Proposition 3.2.4, we have μ(A) = N=

∞ 

-∞ 0

P((s, sκ W ) ∈ A)αs−α−1 ds. Thus

δPi ,Piκ Wi ,

i=1

where

∞ i=1

Pi is a PPP(να ) and {Wi , i ≥ 1} are i.i.d. copies of W . (0)

(1)

d

3. By the convergence of points, (X(n:n) /cn , X[n:n] /b(cn )) −→ (Z, Z κ W ) where Z has a Fr´echet distribution with tail index α > 0 and is independent of W . 7.8 Let b be a Bernoulli random variable with success probability (1+|φ|α )−1 . Then the spectral tail process is the sequence {Yj , j ∈ Z} given by (. . . , 0, 0, φ−1 (1 − b)Θ0 , Θ0 , φbΘ0 , 0, 0, . . . , ) , where P(Θ0 = 1) = pX . By (5.6.6), 0 / ϑ = E sup |Θj |α − sup |Θj |α j≥0 α

=

j≥1

|φ| 1 1 ∨ |φ|α + {1 ∨ |φ|α − |φ|α } = . α α 1 + |φ| 1 + |φ| 1 + |φ|α

By (7.5.4b), / 0  1∨φα α α α ϑ = E sup(Θj )+ − sup(Θj )+ | Θ0 > 0 = 1+φ 1 j≥0 j≥1 +

if φ ≥ 0 , if φ < 0 .

7.9 The forward spectral tail process is given by Θj = ρj Θ0 , j ≥ 0 . By (5.6.6), / 0 ϑ = E sup |Θj |α − sup |Θj |α = 1 − |ρ|α . j≥0

j≥1

By (7.5.4b), / ϑ =E +

sup(Θj )α + j≥0



sup(Θj )α + j≥1

7.10 Set p(u) = E[u, Θ0 α + ].

0  1 − ρα | Θ0 > 0 = 1 − |ρ|2α

if ρ ≥ 0 , if ρ < 0 .

F Solutions to problems

627

1. By the representation of the exponent measure ν X in terms of Θ0 (see (2.2.6)), we have  ∞ P(u, rΘ0  > 1)αr−α−1 dr = E[u, Θ0 α lim nP(X 0 , u > cn ) = +] . n→∞

0

The latter quantity is positive by assumption and this relation and the definition of the sequence un imply that un ∼ p(u)1/α cn . 2. Proposirion 5.6.6 yields ϑ





⎣ E[Qj , uα +] = E



E[Θj , uα +−

j≥0

j∈Z



⎤ ⎦ E[Θj , uα + = p(u) .

j≥1

3. Write H(u, x) = {x ∈ Rd : x, u > p(u)1/α x} P( max X i , u ≤ un x) 1≤i≤n

∼ P( max X i , u ≤ p(u)1/α cn x) 1≤i≤n

= P (Nn (H(u, x)}) = 0) ⎛ ⎞ ∞     → P⎝ 1 Pi Qi,j , u > p(u)1/α x = 0⎠  =P

i=1 j∈Z ∞ 

   1/α sup 1 Pi Qi,j , u+ > p(u) x = 0

i=1 j∈Z

  = exp −ϑ

∞ 0

P(r supQj , u+ > p(u)

1/α

x)αr

−α−1

 dr

j∈Z

 / 0 −α −1 α = exp −ϑx p(u) E supQj , u+ . j∈Z

To conclude, we note that by Proposition 5.6.6 again, we have 0 / 0 / α α Θ , u − sup Θ , u = E sup supQj , uα j j + + + . j∈Z

j≥0

j≥1

7.11 Define the map πh : Rh+1 → Rh+1 by πh (x0 , . . . , xh ) = (x20 , x0 x1 , . . . , x0 xh ) . To a map f : Rh+1 → R we associate a map Hf,h : [0, ∞) × RZ → R defined by  f ◦ πh (xj,j+h ) . Hf,h (t, x) = 1[0,1] (t) j∈Z

628

F Solutions to problems

This is well defined on [0, 1] × 0 (R) for functions f which vanish in a neighborhood of zero. Embedding X n,i into 0 (R) by adding zeroes on the left and on the right, we have Nh,n (f ) = Nn (Hf,h ) + N (Hf,h ) = d

∞  i=1

n−h 

f ◦ πh (c−1 n (Xj , . . . , Xj+h )) ,

j=mn rn +1 ∞   (i)

Hf,h (P˜i Q ) =

(i)

(i)

(i)

(i)

(i)

f (Pi ((Qj )2 , Qj Qj+1 , ˙,Qj Qj+h )) ,

i=1 j∈Z

where {P˜i , Q(i) , i ≥ 1} are the points of a PPP(ϑνα,p ⊗ PQ ) and Pi = P˜i2 , so that {Pi2 , Q(i) , i ≥ 1} are as stated. The sum of the terms between mn rn and n − h is not zero if X ∗mn rn ,n > cn  for some  > 0. Under AC(rn , cn ), this event is OP (rn P(|X0 | > cn )) = o(1) by Corollary 6.2.6. d

Thus we are left to prove that Nn (Hf,h ) −→ N (Hf,h ) for all non-negative bounded continuous functions f with bounded support in Rh+1 \ {0}. If f has bounded support in Rh+1 \ {0}, then Hf,h has bounded support in [0, ∞) × 0 (RZ ). Indeed, x∗ ≤  implies |xj xj+ | ≤ 2 for all j ∈ Z and d

 = 0, . . . , h. We cannot directly conclude that Nn (Hf,h ) −→ N (Hf,h ), since Hf,h is unbounded. Thus we must use a truncation argument as in the proof of Corollary 7.3.4. This follows from the fact that AC(rn − h, c2n ) holds for the sequence {(Xj2 , Xj Xj+1 , . . . , Xj Xj+h ), j ∈ Z} which is proved in Problem 6.8. 7.12 The only thing to do is to check that AC(rn , cn ) implies that  rn n lim E[1 − e− i=1 f (X n,i ) ] = μ(f ) . n→∞ rn This is a straightforward adaptation of the proof of Theorem 6.2.5.

Chapter 8 8.1 If the random variables are i.i.d. then by Chebyshev inequality we obtain that the probability in (ANSJ(an )) is bounded by nE[X02 1{X0 ≤ an }] . a2n δ 2 Keeping in mind that limn→∞ nP(|X0 | > an ) = 1 and applying Proposition 1.4.6 we obtain nE[X 2 1{X ≤ an }] E[X 2 1{X ≤ an }] P(|X0 | > an ) = lim 2 2 2 n→∞ n→∞ an δ (an )2 δ 2 P(|X0 | > an ) P(|X0 | > an ) α 2−α  . = δ −2 2−α lim

F Solutions to problems

629

Letting  → 0 proves the claim. The proof for an m-dependent time series is done by splitting the sum into n/m sums with indices separated by m. 8.2 We only do the case α = 1. The characteristic function of aX is given by logE[eit(aX) ] = log E[ei(at)X ] 2 = −σ|at|{1 + i βsign(at) log(|at|)} + ic(at) π 2 = −|a|σ|t|{1 + i βsign(a)sign(t)[log(|a|) + log(|t|)]} + i(ac)t π 2 2 = −|a|σ|t|{1 + i βsign(a)sign(t) log(|t|)} + ita{c − σ β log(|a|)} . π π This is the stated law. 8.3 We compute the characteristic function. If α = 1, log E[eiz ,Λv  ] = log E[eiz ,v Λ ] = −σ α |z, v|α {1 − i βsign(z, v)(tan πα/2)} v α βv α | {1 − i sign(z, )(tan πα/2)} . = −σ α |v| |z, |v| |v| α

This is an α-stable vector with scale parameter σ α |v| and spectral measure −1 concentrated on the vector β |v| v. For α = 1, 2 log E[eiz ,Λv  ] = −σ|z, v|{1 + iβ sign(z, v) log(z, v)} π 2 βv v v |{1 + iβ sign(z,  log(|z, |)} = −σ |v| |z, |v| π |v| |v| 2 − i βσz, v log(|v|) . π Thus Λv is a 1-stable random vector with scale parameter and spectral measure as previously and drift parameter − π2 βσz, v log(|v|). 2 

α 3 ∞ < ∞ implies that 8.4 By Problem 5.17, the condition E |Θ | j j=0 ∗ P(Θ−∞,−1 = 0) > 0 and we conclude by the identity (5.7.4) applied to (8.2.8). 8.5 By Example 5.2.10, we know that the spectral tail process is given by Θ−1 = φ−1 (1 − b)Θ0 and Θ1 = φbΘ0 , with Θ0 = ±1 with probability pX given by pX =

α pZ (1 + φα + ) + (1 − pZ )φ− , 1 + |φ|α

630

F Solutions to problems

P(b = 1) = 1−P(b = 0) = (1+|φ|α )−1 and Θj = 0 if j ∈ / {−1, 0, 1}. The scale parameter, expressed as Γ (1 − α) cos(πα/2)σ0α if α = 1 or πσ0 /2 if α = 1, and the skewness β are given by (cf. (8.2.10a) and (8.2.10b)), α

α

σ0α = E [|Θ0 + Θ1 | − |Θ1 | ] = E [|1 + φb|α − b|φ|α ] =

|φ|α |1 + φ|α |1 + φ|α − |φ|α + = ; 1 + |φ|α 1 + |φ|α 1 + |φ|α

and if φ = −1, " α α# β = 2σ0−α E (Θ0 + Θ1 )+ − (Θ1 )+ − 1 " α α# = 2σ0−α E (Θ0 (1 + φb))+ − (φbΘ0 )+ − 1 " α α# = 2pX σ0−α E (1 + φb)+ − (φb)+ " α α# + 2(1 − pX )σ0−α E (1 + φb)− − (φb)− − 1 =

α 2pX σ0−α {(1 + φ)α 2pX σ0−α |φ|α + − φ+ } + α 1 + |φ| 1 + |φ|α

α 2(1 − pX )σ0−α {(1 + φ)α 2(1 − pX )σ0−α |φ|α − − φ− } + −1 1 + |φ|α 1 + |φ|α α α 2{pX φα (2pX − 1)(1 + φ)α + + (1 − 2pX )(1 + φ)− } − + (1 − pX )φ+ } + = |1 + φ|α |1 + φ|α  α 2{pX φα +(1−p )φ } X − + 2pX − 1 + if φ > −1 , (1+φ)α = 2pX |φ|α 1 − 2pX + |1+φ|α if φ < −1 .

+

If α = 1, the drift parameter is c = c0 (2pX − 1) − E [Θ0 (1 + bφ) log(|1 + bφ|) − Θ0 bφ log(|bφ|)] = c0 (2pX − 1) −

(2pX − 1) {(1 + φ) log(|1 + φ|) − φ log(|φ|)} . 1 + |φ|

8.6 The forward tail process is given by Θj = ρj Θ0 with P(Θ0 = 1) = pX given by pX = pZ 1{ρ ≥ 0} +

pZ + (1 − pZ )|ρ|α 1{ρ < 0} . 1 + |ρ|α

The scale parameter is Γ (1 − α) cos(πα/2)σ0α if α = 1 or πσ0 /2 if α = 1 with α  α ⎤ ⎡ ∞ ∞   2        α 3 1 − |ρ|α −α σ0α = E ⎣ Θj  −  Θj  ⎦ = E |1 − ρ| − ρ(1 − ρ)−1  = . (1 − ρ)α  j=0   j=1  The skewness of the limiting stable law is given by

F Solutions to problems

⎡⎛ β = 2σ0−α E ⎣⎝

∞ 

⎞α

Θj ⎠ − ⎝

j=0

=

2σ0−α (1

−α

− ρ)



+

∞ 

631

⎞α ⎤ Θj ⎠ ⎦ − 1

j=1

+

# " α E[ (Θ0 )α + − (ρΘ0 )+ − 1

α = 2(1 − |ρ|α )−1 {pX (1 − ρα + ) − (1 − pX )ρ− } − 1  if ρ ≥ 0 , 2pX − 1 α = X )|ρ| 2 pX −(1−p − 1 if ρ < 0 . 1−|ρ|α

The drift parameter if α = 1 is c = c0 (2pX − 1) − (2pX − 1){log((1 − ρ)−1 ) − (1 − ρ)−1 ρ log(|ρ|)} . 8.7 1. Since α < 2, α/2 ∈ (0, 1) thus, for every  ≥ 0, ⎡⎛ ⎡⎛ ⎞α/2 ⎤ ⎞α/2 ⎤   ⎢ ⎥ ⎥ ⎢ E ⎣⎝ |Qj Qj+1 |⎠ ⎦ ≤ E ⎣⎝ Q2j ⎠ ⎦ j∈Z

j∈Z

⎡ ⎤   ≤ E⎣ |Qj |α ⎦ = E[|Qj |α ] = ϑ−1 < ∞ . j∈Z

j∈Z

 This implies that P( j∈Z |Qj Qj+ | < ∞) = 1 for all  ≥ 1. 2. Since α/2 < 1, as in Theorem 8.3.1, the limit of Sh,n is obtained by the summation of the points of Nh , i.e. ∞  d Sh,n −→ Pi W i i=1 (0)

(1)

(h)

()

with  W i , i ≥ 1, i.i.d. copies of W = (Wi , Wi , . . . , Wi ) and Wi j∈Z Qj Qj+ ,  = 0, . . . , h. This is an α/2 stable distribution.

=

3. If 2 < α < 4 and the asymptotic negligibility condition holds, then Theod rem 8.4.2 yields that S¯h,n −→ Λ where Λ is an α-stable random vector with characteristic function eψ and ψ given by ψ(z) = ϑΓ (1−α) cos(πα/2)E [|z, W |α {1−i sign(z, W) tan(πα/2)}] . 8.8 Let T : R \ {0} × R → R be defined by T (x, y) = xy. Let μ be the measure defined on R \ {0} by μ(f ) = μ ◦ T −1 (f ). Then μ is boundedly finite on R \ {0}. Indeed, if f is measurable, non-negative, bounded with bounded support on R \ {0}, then there exists  > 0 such that f (x) ≤ cst1{|x| > }, thus  ∞

E[f (r1+κ W0 W1 )]αr−α−1 dr  ∞ ≤ cst P(r1+κ |W1 | > )αr−α−1 dr = α/(1+κ) E[|W1 |α/(1+κ) ] .

μ(f ) =

0

0

632

F Solutions to problems

˜ = ∞ δP U is a PPP(να/(1+κ), ). Thus by Proposition 7.1.12, N i i i=1 To a function f defined on R \ {0}, we associate a function Hf on R \ {0} × R defined by Hf (x, y) = f (xy). Then Hf does not have bounded support on R \ {0} × R. Fix  > 0 and define Hf, (x, y) = Hf (x, y)1{|x| > } and L = Hf − Hf, . Then ˜

E[e−Nn (f ) ] = E[e−Nn (Hf ) ] = E[e−Nn (Hf, )−Nn (L ) ] = E[e−Nn (Hf, ) ] + E[e−Nn (Hf, ) {e−Nn (L ) − 1}] . If f is moreover non-negative, bounded and continuous and has bounded support in R \ {0}, then Hf, is bounded and continuous and has bounded support in R \ {0} × R, thus lim E[e−Nn (Hf, ) ] = E[e−N (Hf, ) ] .

n→∞

Since f has bounded support in R \ {0}, we have, for  small enough, N (Hf, ) =

∞ 

f (Pi1+κ W0 W1 1{Pi > })

i=1

=

∞ 

˜ (f ) . f (Pi1+κ W0 W1 ) = N (Hf ) = N

i=1

Moreover, for  small enough, 0 ≤ E[e−Nn (Hf, ) {1 − e−Nn (L ) }] ≤ E[Nn (L )] ≤ cst nP(X0 X1 > an b(an ), X0 ≤ an )) . Since (3.2.16) holds, we have lim lim sup E[e−Nn (Hf, ) {1 − e−Nn (L ) }] = 0 .

→0 n→∞

By the triangular argument Lemma A.1.4, this proves that ˜

˜

lim E[e−Nn (f ) ] = E[e−N (f ) ]

n→∞ w ˜n =⇒ ˜. and N N

Since α/(1 + κ) < 1 and E[|Ui |α/(1+κ) ] = E[|W1 |α/(1+κ ] < ∞ by Proposition 3.2.12, we can apply the summation  of points as in Theorem 8.3.1 and the ∞ stated convergence holds. The series i=1 Pi Ui is almost surely convergent and has an (α/(1 + κ), σ, β, 0)-stable law with parameters α

σ α = Γ (1 − α) cos(πα/2)E[|W1 | 1+κ ] , E[(W0 W1 ) 1+κ  ] β= . α E[|W1 | 1+κ ] α

F Solutions to problems

633

Chapter 9 9.1 From Example 5.2.10 we know that the tail process is (. . . , 0, 0, φ−1 (1 − b)Y0 , Y0 , φbY0 , 0, 0, . . . , ) , and Y0 is a two-sided Pareto random variable and b is a Bernoulli random variable with mean (1 + |φ|α )−1 . We know that ⎧ ⎫ ∞ ⎨ ⎬   var(T(1)) = P(Yj > 1 | Y0 > 1) = 1 + 2 P(Yj > 1 | Y0 > 1) . ⎩ ⎭ j=1

j∈Z

Thus, the limiting variance in (9.5.4) is   1 ∧ φα α−2 var(T(1)) = α−2 1 + 2 1 + φα if φ > 0 and α−2 var(T(1)) = α−2 if φ < 0. 9.2 From Example 5.2.12 we know that the tail process is Y0 (b1 . . . bj ρ−j , . . . , b1 b2 ρ−2 , b1 ρ−1 , 1, ρ, . . . , ρj ) , where {bj } is a sequence of i.i.d. Bernoulli random variables with mean ρα . Thus 

P(Yj > 1 | Y0 > 1) =

j∈Z



P(Yj > 1) = 1 + 2

∞ 

P(Yj > 1)

j=1

j∈Z

=1+2

1 + ρα ρα = . 1 − ρα 1 − ρα

9.3 The random n variables Zj = log(Xj ), j ≥ 1, are standard exponential and γ n,n = n−1 j=1 Zj which is the MLE of γ. The model is regular and γ 2 is the Cramer-Rao bound for the estimation of γ. 9.4 Since Tn (s) = 0 for s > X(n:n) , integration by parts gives  ∞  ∞  ∞ n,k = − Θ Tn (s)ds = 1 + Tn (s)ds sTn (ds) = Tn (1) + 1



=1+

1 X(n:n−k) un

T,n (s)ds +



1 ∞

1

T,n (s)ds ,

1

where un = F −1 (1 − k/n). By Proposition 9.1.3, consistency of T,n implies P u−1 n X(n:n−k) −→ 1. Thus the first integral above is oP (1). Furthermore, for all t > 0,

634

F Solutions to problems



t

P T,n (s)ds −→

1



t

T (s)ds. 1

-t Since limt→∞ 1 T (s)ds = 1/(α−1), the claimed convergence will be obtained if we prove that for all η > 0,  ∞ T,n (s)ds > η = 0 . lim lim sup P (F.2) t→∞ n→∞

t

Since E[T,n (s)] = nk −1 F (un s) and k = nF (un ), we have / ∞ 0  ∞  F (un s) n ∞ , Tn (s)ds = E ds . F (un s)ds = k F (un ) t t t Fix  ∈ (0, α). By Potter’s bound Proposition 1.4.2, we have / ∞ 0  ∞ , Tn (s)ds ≤ cst E s−α+ ds = O(t−α++1 ) . t

t

Since α > 1, this proves (F.2) by Markov’s inequality. 9.5 By integration by parts, we have  ∞  (2) 2 γ n,k = − (log(t)) Tn (dt) = 2 1



t−1 log(t)Tn (t)dt .

1

By the same technique as for the proof of consistency of the Hill estimator, we have  ∞ (2) P γ n,k −→ 2 t−1 log(t)T (t)dt = 2γ 2 . 1

Similarly to (9.5.5), under the bias condition (9.6.1), we have  ∞ √ (2) , n (t) log(t) dt k{ γn,k − 2γ 2 } = 2 T t 1 √  ∞ Tn (t) − T (t) log(t)dt +2 k t 1  1 √ log(t) +2 k X T,n (t) dt (n:n−k) t un  ∞ , n (t) log(t) dt + o(1) + oP (1) =2 T t 1  1  ∞ dt dt d 2 −→ 2 T(t) log(t) = −2γ W(t) log(t) . t t 1 0 √ √ (2) 2 Write Zn = k( γn,k − γ) and Zn = k( γn,k − 2γ 2 ). Then

F Solutions to problems

 √  (M ) k γ n,k − γ = Zn +

635

√ (2) Zn − 2 k{(γ + k −1/2 Zn )2 − γ 2 } (2)

2( γn,k − ( γn,k )2 ) (2)

Zn − 4γZn + oP (1) 2γ 2  1  1 W(s) − sW(1) W(s) log(s) d ds − ds . −→ (γ − 2) s s 0 0 = Zn +

Under extremal independence, W is the standard Brownian motion and the variance of the first integral is 1. Elementary calculus yields  1 W(s) log(s) var ds = 5 , s 0  1  1 W(s) − sW(1) W(s) log(s) cov ds, ds = −2 . s s 0 0 Thus the asymptotic variance is (γ − 2)2 + 4(γ − 2) + 5 = γ 2 + 1 . 9.6 Let un be the 1 − k/n quantile of F . Since F is continuous, nF (un ) = k. Define Tn (t) = k −1 nF (un t). Then   |Tn (t) − t−α | = k −1 nF (un t) − t−α    −α ≤ k −1 n F (un t) − c(un t)−α  + k −1 n|cu−α n − F (un )|t −β −α ≤ cst k −1 nu−β + t−α ) ≤ cst k −1 nu−β . n (t n t

The last inequality is due to the assumption α < β. The assumption (9.6.2) also implies that un = O((n/k)1/α ), thus √ −1 −β  ∞ −α−1 √  ∞ |Tn (t) − t−α | dt ≤ cst kk nun k t dt t 1 1  2β−α  β−α α k 2(β−α) −1/2+β/α 1−β/α ≤ cst k n = cst . n Thus k = o(nρ ) with ρ = 2(β − α)/(2β − α) yields (9.5.3). 9.7 Let un be the 1 − k/n quantile of F . Since F is continuous, nF (un ) = k. Define Tn (t) = k −1 nF (un t). Then   |Tn (t) − t−α | = k −1 nF (un t) − t−α    ≤ k −1 n F (un t) − c log(un t)(un t)−α    −α −α  + k −1 n c log(un )u−α + ck −1 nu−α log(t) n − F (un ) t n t −β −α + t−α ) + ck −1 nu−α log(t) ≤ cst k −1 nu−β n (t n t −α log(t) . ≤ cst k −1 nu−α n t

636

F Solutions to problems

−1 Applying (9.6.3) again, we have u−α n + O(u−β n log(un ) = k n ), thus −1 −1 (un ) + O(u−β (un )) . k −1 nu−α n = log n log −1 log n and thus (9.5.3) holds. Choosing k = o(log2 (n)) yields k −1 nu−α n ∼α

Chapter 13 13.1 1. By Theorem 13.3.3, Proposition 5.6.4 applied with j = +α and +α * (13.2.10) * 0 to the α-homogeneous map x → x∗0,∞ − x∗1,∞ yields θ = E[(Θ∗0,∞ )α − (Θ∗1,∞ )α ] = E[Z ∗0,∞ )α − (Z ∗1,∞ )α ] . 2. Similarly, θlargedev

⎞α ⎛ ⎞α⎤ ⎡⎛ ⎞α ⎛ ⎞α ⎤ ⎡⎛ ∞ ∞ ∞ ∞     = E ⎣⎝ Θj⎠ − ⎝ Θj⎠ ⎦ = E ⎣⎝ Zj⎠ − ⎝ Zj⎠ ⎦ . j=0

j=1

+

13.2 1. The stationary solution is Xj = P(Xj ≤ x) =

∞ 

j=0

+



+

j=1

+

φi εj−i and

i=1

P(φj εj ≤ x) = e−(1−φ

α

)

∞

j=0

φαj x−α

−α

= e−x

.

j=0

2. Use (13.5.2) to get Θj = ρj 1{j ≥ −N } with N independent of Y0 and P(N = k) = (1 − φα )φαk , k ≥ 0. Thus by Theorem 13.3.3, $ % / 0 φ−αN φ−αN θ = E ∞ = E = 1 − φα . −N α (1 − φα )−1 αj φ φ j=−N 3. By Theorem 13.5.5, conditions AC(cn , rn ) and S(rn , un ) hold and the process is β-mixing with geometric rate. 13.3 This is a consequence of Theorem 5.4.10 and the representation (5.4.11) since the tail measure of the process in the right-hand side of (13.6.1) is  ∞ E[δrB j Q ]αr−α−1 dr . j∈Z

0

13.4 1. Follows from Theorem 13.2.2. 2. By (13.2.9),



P⎝

 j∈Z

⎧ ⎫⎤ ⎨ ⎬ Θjα < ∞⎠ = E ⎣Z0α 1 Zjα < ∞ ⎦ . ⎩ ⎭

Since Z is stationary,





j∈Z

 j∈Z

Zjα = ∞ almost surely. Thus θ = 0.

F Solutions to problems

637

Chapter 14 14.1 Set f (u) = E[(AX + u)q − (AX)q ]. Since B is independent of A and X, integration by parts yields  ∞  ∞ q q f (u)FB (du) = f (u)P(B > u)du E[(AX + B) − (AX) ] = 0 0  ∞ E[(AX + u)q−1 ]P(B > u)du . =q 0

If B has a standard exponential distribution, then  ∞  ∞ q−1 q E[(AX + u) ]P(B > u)du = q E[(AX + u)q−1 ]e−u du 0

0

= qE[(AX + N )q−1 ] = qE[X q−1 ] . 14.2 Let p0 = P(A = 0) > 0. Then    ∞     P  Vi−1 Bi  < ∞ ≥ P(Vi = 0 for some i ≥ 1)   i=1

=

∞ 

P(Ai = 0, |Vi−1 | > 0) =

i=1

∞ 

p0 (1 − p0 )i−1 = 1 .

i=1

14.3 Condition (14.6.1b) implies that there exists δ ∈ (0, 1) and r > 0 such c that for all x ∈ B (0, r), ΠV (x) ≤ δV (x) . Set b = sup|x|≤r V −1 (x)ΠV (x) < ∞. Then, ΠV (x) ≤ δV (x)1{|x| > r} + b1{|x| ≤ r} ≤ δV (x) + b . 14.4 The function t → E[X t ] is equal to 1 at 0 and strictly greater than 1 (possibly infinite) at t0 and is convex since its second derivative is E[log2 (X)X t ]. Thus, either it is strictly above the level 1 on [0, t0 ) if its derivative at zero is non-negative or it first drops below the level 1 if the derivative at zero is negative. The derivative at zero being E[log(X)] < 0, this proves our claim. 14.5 The sequence {σj2 } fulfills the stochastic recurrence equation (14.3.11) with Aj = a1 ε2j−1 and Bj = a0 . 1. The criterion of existence is obtained by applying Theorem D.1.1.

638

F Solutions to problems

2. Since ε20 has a Pareto distribution with tail index β/2, we have E[log(a1 ε20 )] = log(a1 ) + log(ε20 )  ∞ 1 = log(a1 ) + β log(t)t−β/2−1 dt = log(a1 ) + 2/β < 0 , 2 1 by assumption. Thus E[log(a1 ε20 )] < 0 and we can apply Problem 14.4 to prove that there exists a unique solution α < β/2 to the equation E[(a1 ε20 )α ] = 1. Also, E[B0α ] < ∞, thus Theorem 14.1.1 applies. 3. The proof is similar to the Pareto case. 4. Let fε be the density of ε0 and fmin = minx∈[−τ,τ ] fε (x). Let A > 0. We will show that [−A, A] is small. For x ∈ [−A, A] and a measurable set B,  ∞   1 (a0 + a1 x2 )1/2 z ∈ B fε (z)dz Π(x, B) = 0  fε ((a0 + a1 x2 )−1/2 u)du = (a0 + a1 x2 )−1/2 B    2 −1/2 1 |u| ≤ (a0 + a1 x2 )1/2 τ du ≥ fmin (a0 + a1 A ) B   1/2 2 −1/2 fmin 1 |u| ≤ a0 τ du ≥ (a0 + a1 A ) B

= (a0 + a1 A2 )−1/2 fmin Leb(B ∩ [−a0 τ, a0 τ ]) . 1/2

1/2

Thus, (D.2.2) holds with  = 2(a0 + a1 A2 )−1/2 a0 τ fmin , 1/2

ν = (2a0 τ )−1 Leb(· ∩ [−a0 τ, a0 τ ]) . 1/2

1/2

1/2

5. We check conditions (14.6.1a) and (14.6.1b). First, for any r > 0, ΠV (x) 1 + |a0 + a1 x2 |q/2 E[|ε0 |q ] = ≤ 1 + |a0 + a1 r2 |q/2 E[|ε0 |q ] . V (x) 1 + |x|q Thus (14.6.1a) holds. Then ΠV (x) 1 + |a0 + a1 x2 |q/2 E[|ε0 |q ] q/2 = lim = a1 E[|ε0 |q ] < 1 x→∞ 1 + |x|q |x|→∞ V (x) lim

by assumption. This proves (14.6.1b) and the geometric drift condition (14.3.1) holds by Problem 14.3. 2 , therefore the sequence {σj2 } 14.6 Note that σj2 = a0 + (b1 + a1 ε2j−1 )σj−1 fulfills the stochastic recurrence equation with Aj = b1 + a1 ε2j−1 , Bj = a0 .

F Solutions to problems

639

1. Follows from Theorem D.1.1. 2. Let Π be the transition kernel of the Markov chain {σj2 }. For x ≤ r, ΠV (x) 1 + E[{a0 + (b1 + a1 ε20 )x}q ] = ≤ 1 + E[{a0 + (b1 + a1 ε20 )r}q ] . V (x) 1 + xq Thus (14.6.1a) holds. Furthermore, ΠV (x) 1 + E[{a0 + (b1 + a1 ε20 )x}q ] = lim = E[(b1 + a1 ε20 )q ] . x→∞ V (x) x→∞ 1 + xq lim

This proves (14.6.1b) and the geometric drift condition (14.3.1) holds by Problem 14.3. 14.7 Note that Xj = (a + bεj )Xj−1 + εj . Thus, {Xj , j ∈ Z} fulfills the stochastic recurrence equation with Aj = (a + bεj ) and Bj = εj . 1. Follows directly from Section 14.3.5. 2. We have



E[log(bε0 )] = log(b) +



log(t)βt−β−1 dt = log(b) + β −1 < 0 ,

1

by assumption. By Problem 14.4, this proves that there exists a unique solution α < β to the equation E[(bε0 )α ] = 1. j 3. The spectral tail process is Θj = i=1 (a + bεi )Θ0 . Using (14.3.15) we have  % $ j  α . ϑ=E 1 − max (a + bεi ) j≥1

i=1

+

14.8 Note that Xj = (a + bεj−1 )Xj−1 + εj . Thus, {Xj , j ∈ Z} satisfies a recursion of the form (14.3.11), however, (a + bεj−1 , εj ), j ≥ 1, is not an i.i.d. sequence. To overcome this issue, define Uj = (a + bεj )Xj . Then Xj = Uj−1 + εj and Uj = (a + bεj )Uj−1 + (a + bεj )εj . 1. The sequence {Uj } fulfills the stochastic recurrence equation (14.3.11) with Aj = (a + bεj ) and Bj = (a + bεj )εj . The stationary distribution is given by d

U0 =

∞ 

(a + b0 ) · · · (a + bj−1 )(a + bj )j .

j=0

Since Xj = Uj−1 + j , we obtain the form of the stationary distribution.

640

F Solutions to problems

2. By concavity of the function log, E[log((a + bε0 )2 )] ≤ log E[(a + bε0 )2 ] thus E[(a + bε0 )2 ] < 1 implies E[log((a + bε0 )2 )] < 0. If E[ε0 ] = 0 and E[ε20 ] = 1, the latter condition is equivalent to a2 + b2 < 1. 14.9 1. The form of the stationary solution follows from (14.3.12). Since N P(b1 · · · bj = 1) = (1 − p)j , X0 has the same distribution as j=1 Zj with P(N = j) = (1 − p)j−1 , j ≥ 1. 2. Regular variation of the stationary distribution follows from Corollary 4.2.3: P(X0 > x) ∼ E[N ]P(Z0 > x) = p−1 P(Z0 > x) . 3. The form of the forward spectral tail process follows from (14.3.13): Θj = b1 · · · bj , j ≥ 1 and P(Θj = 0) = 1 − (1 − p)j . The formula for the candidate extremal index follows from (14.3.15): / 0 ϑ = E 1 − max b1 · · · bj = E[1 − b1 ] = p . j≥1

4. Let σ1 = inf{j ≥ 1 : bj = 0} and for n ≥ 1, define recursively σn = inf{j > σn−1 : bj = 0} . Then the random times σn , n ≥ 1} are regeneration times for the chain, and their increments have a geometric distribution with mean p−1 . By Corollary 14.4.2, the extremal is given by θ = p lim

x→∞

P(maxσ1 ≤i x) P(X0 > x)

if the limit exists. Since the Zi are positive, we have P( max Xi > x) = P(Xσ2 −1 > x) = P(X0 > x) , σ1 ≤i x) =p =p =p. P(X0 > x) P(X0 > x) P(X0 > x)

14.12 1. By definition, P0 (σ0 > 0) = 1. With the notation introduced above, we have, for n ≥ 1, P0 (σ0 > n) = P(r(0) ≤ V1 , r(W1 ) ≤ V2 , . . . , r(Wn−1 ) ≤ Vn ) % $n−1  s(Wi ) . =E i=0

Thus, μ = E0 [σ0 ] =

∞  n=0

P(σ0 > n) = 1 +

∞  n=1

E

$n−1 

% s(Wi )

.

i=0

2. Since we assume μ < ∞, {0} is a recurrent atom. Since r is non-decreasing, Px (σ0 > n) ≤ P0 (σ0 > n) for all n ∈ N, thus Ex [σ0 ] < ∞ for all x ≥ 0. 3. Since {0} is an accessible atom, the return times to 0 are regeneration times. Thus we can apply (14.4.1) to obtain the expression of the invariant distribution. 4. Φ(x, e) = (x + z)1{r(x) ≤ v} with e = (v, z). Since r is non-decreasing, the limit p = limx→∞ r(x) exists. Thus

642

F Solutions to problems

Φ(xy, e) = y1{p ≤ v} = φ(y, e) . x→∞ x lim

The convergence is uniform on compact sets of (0, ∞) for each e ∈ [0, 1] × [0, ∞). 5. By (14.6.4), −1

P(X0 > x) = μ

∞ 

$  k  k−1 %   E 1 Zi > x s(Wi ) . i=1

k=1

i=0

By the single big jump heuristics, for large x, in each sum there is only one large Zj . Then s(Z1 + · · · + Zi ) ∼ 1 − p for i ≥ j. Thus P(X0 > x) ∼ μ−1 F Z (x)

∞  k 

E

$j−1 

k=1 j=1

= μ−1 F Z (x)

∞ 

E

$j−1 

j=1

=p

−1 −1

μ

F Z (x)

i=0

j=1

E

s(Wi ) (1 − p)k−j %

s(Wi )

i=0 ∞ 

%

$j−1 

∞ 

(1 − p)k−j

k=j

%

s(Wi ) = p−1 μ−1 (μ − 1)F Z (x) .

i=0

To justify this equivalence rigorously, note that for each j ≥ 1, 2   3 $j−1 % ∞ k k−1 E 1 Z > x s(W )   i i i=1 i=0 lim =E s(Wi ) (1 − p)k−j . x→∞ F Z (x) i=0 k=j Moreover, since s is non-increasing, 2 3 2   3 k k−1 k k−1 s(0) E 1{Z > x/k} q=1 s(Zq ) E 1 Z > x s(W ) i i i i=1 i=1 i=0 q=i ≤ F Z (x) F Z (x) P(Z0 > x/k) . ≤ ks(0)(E[s(Z0 ])k−1 P(Z0 > x) Applying Proposition 1.4.2, this yields, for an arbitrary  > 0, x ≥ 1 and k ≥ 1, 2   3 k k−1 E 1 i=1 Zi > x i=0 s(Wi ) ≤ cst k α+1+ (E[s(Z0 ])k−1 . F Z (x) The latter bound does not depend on x and is summable so we can apply the dominated convergence theorem to conclude.

F Solutions to problems

643

6. Since X0 is regularly varying, the spectral tail process satisfies the recursion Θ0 = 1 , Θj+1 = Θj 1{p ≤ Vj+1 } , j ≥ 0 . This shows that Θj = 1{T > j} and Yj = Y0 1{T > j} with P(T > j) = (1 − p)j , j ≥ 0. 7. If p = 1, the tail process is Θj = 0 for j ≥ 1. In any case, ϑ = P(sup Yj ≤ 1) = P(T = 1) = p . j≥1

8. If r is bounded by p < 1, we will prove that the return time to zero started from zero (hence from any point) has a geometric moment. Since s is nonincreasing, for n ≥ 1, $n−1 % $n−1 %   s(Wi ) ≤ s(0)E (Zi ) = s(0){E[s(Z1 )])n−1 . P0 (σ0 > n) = E i=0

i=1

Since E[s(Z0 )] < 1, this proves that E0 [β σ0 ] < ∞ for β > 1 such that βE0 [s(Z0 )] < 1. By Theorem D.3.2, this proves that (14.3.1) holds with C = {0}. 9. We must find an equivalent for P0 (max1≤i≤σ0 −1 Xi > x). Since Z0 is nonnegative, the maximum is achieved at the last point of the cycle. Thus,  max Xi > x = P0 (Z1 + · · · + Zσ0 −1 > x) . P0 1≤i≤σ0 −1

Define Gi = σ(X0 , Z1 , . . . , Zi , V1 , . . . , Vi+1 ). Then, starting from X0 = 0, {σ0 − 1 = i} = {σ0 = i + 1} = {r(0) ≤ V1 } ∩

i

{r(Z1 + · · · + Zj ) ≤ Vj+1 }

j=1

∩ {Vi+1 < r(Z1 + · · · + Zi )} ∈ Gi . Thus σ0 − 1 is a G-stopping times and we can apply Corollary 4.2.4 since σ0 has geometric moments so is lighter-tailed than Z0 and (4.2.5) holds. Thus  P0 max Xi > x = P0 (Z1 + · · · + Zσ0 −1 > x) 0≤i≤σ0 −1

∼ E[σ0 − 1]F Z (x) = (μ − 1)F Z (x) . This yields θ=

1 1 (μ − 1) P0 (max1≤i≤σ0 −1 Xi > x) lim = =p=ϑ. −1 x→∞ μ P(X0 > x) μ p μ−1 (μ − 1)

644

F Solutions to problems

14.13 1. We have Φ(x, z) = f (x) + z and Assumption 14.5.3 holds with φ(u, z) = ρu, thus (14.2.6) also holds. 2. If the initial distribution is regularly varying, then Yj = ρj Y0 . The forward tail process is a deterministic AR(1) process and ϑ = P(sup |Yj | ≤ 1) = P(ρ|Y0 | ≤ 1) = 1 − ρα . j≥1

3. Let Π be the kernel of the chain and set V (x) = 1+|x|. Then by assumption lim

|x|→∞

ΠV (x) 1 + E[|f (x) + Z0 |] = lim =ρ 0, sup |x|≤r

ΠV (x) ≤ 1 + sup |f (x)| + E[|Z0 |] < ∞ . V (x) |x|≤r

Thus we can apply Problem 14.3 and thus (14.3.1) holds. 4. Fix a compact set K. Let g be a density of Z0 . Then  ∞ Π(x, A) = 1A (f (x) + y)g(y)dy  −∞ g(y − f (x))dy ≥ inf g(y − f (x))dy . = A x∈K

A

Since f is continuous, there exists a > 0 such that f (K) ⊂ [−a, a]. Then if y ∈ [−a, a] and x ∈ K, −2a ≤ y − f (x) ≤ 2a . By assumption, there exists  > 0 such that inf |y|≤2a g(y) ≥ . This yields -∞ inf x∈K g(y − f (x)) ≥  for all y ∈ [−a, a], thus −∞ inf x∈K g(y − f (x))dy > 0. 5. The chain is therefore geometrically ergodic, admits an invariant distribution. If the invariant distribution is regularly varying, θ = ϑ. 14.14 1. We have Φ(x, e) = (x + e)+ , thus for u > 0, φ(u, e) = lim

x→∞

(xu + e)+ =u. x

(F.3)

2. By the law of large numbers, limn→∞ Z1 + · · · + Zn = −∞. Thus Px (σ0 < ∞) for all x ≥ 0. Since it is assumed that E[(Z0 )+ ] < ∞, we can apply Theorem E.4.9 to prove that Ex [σ0 ] < ∞ for all x ≥ 0.

F Solutions to problems

645

3. We show first that for each n, Xn has the same distribution as ⎫ ⎧ n n ⎬ ⎨   Zj , Zj . max 0, Zn , Zn + Zn−1 , . . . , ⎭ ⎩ j=2

j=1

Indeed, the statement is trivially true for n = 0, 1. We proceed by induction: Xn+1 = (Xn + Zn+1 )+ ⎧ ⎫⎞ ⎛ n+1 n+1 ⎨   ⎬ = ⎝max Zn+1 , Zn + Zn+1 , Zn+1 + Zn + Zn−1 , . . . , Zj , Zj ⎠ ⎩ ⎭ j=2 j=1 ⎧ ⎫ + n+1 n+1 ⎨   ⎬ = max 0, Zn+1 , Zn + Zn+1 , Zn+1 + Zn + Zn−1 , . . . , Zj , Zj . ⎩ ⎭ j=2

j=1

This concludes the induction step. It is immediate that Xn has the same n distribution as j=0 Sj . 4. By the strong law of large numbers, since E[Z0 ] < 0, Sn → −∞ almost ∞ surely. Thus, X∞ = j=0 Sj < ∞ almost surely. By [EKM97, p. 39],  ∞ −1 P(Z0 > y)dy P(X∞ > x) ∼ (−μ) x

as x → ∞. This yields that X∞ is regularly varying with index α − 1. 5. Since the invariant distribution is regularly varying, (F.3) implies that Yj = Y0 for all j, hence the candidate extremal index is zero. By Lemma 7.5.4, this implies that the extremal index is also 0. 14.15 1. Let f be a nonnegative measurable function. Eπ [f (X1 )] = Eπ [f (X0 + Z1 1{V1 ≤ a(X0 , X0 + Z1 })]   = f (x + z)a(x, x + z)h(x)q(z)dxdz Rd Rd   + f (x){1 − a(x, x + z)}h(x)q(z)dxdz d d  R R f (x + z)1{h(x + z) ≤ h(x)}h(x + z)q(z)dxdz = Rd Rd   + f (x + z)1{h(x + z) > h(x)}h(x)q(z)dxdz Rd Rd   f (x)1{h(x + z) ≤ h(x)}{h(x) − h(x + z)}q(z)dxdz . + Rd

Rd

By the symmetry of q and the translation invariance of Lebesgue’s measure, we have

646

F Solutions to problems





Rd

Rd

f (x + z)1{h(x + z) ≤ h(x)}h(x + z)q(z)dxdz   f (x)1{h(x) ≤ h(x − z)}h(x)q(z)dxdz = d d R R = f (x)1{h(x) ≤ h(x + z)}h(x)q(z)dxdz . Rd

Rd

Similarly,   Rd

Rd

f (x + z)1{h(x + z) > h(x)}h(x)q(z)dxdz   = f (x)1{h(x) > h(x − z)}h(x − z)q(z)dxdz Rd Rd   f (x)1{h(x) > h(x + z)}h(x + z)q(z)dxdz . = Rd

Rd

Thus

 Eπ [f (X1 )] =



Rd

Rd

f (x)h(x)q(z)dxdz = Eπ [f (X0 )] .

This proves that π is invariant. 2. The functional representation is given by Φ(u, v, z) = u + z1{v ≤ a(u, u + z)} . Therefore, Assumption 14.2.6 holds with φ(u, e) = lim x−1 Φ(xu, v, z) = u . x→∞

3. If the invariant distribution is regularly varying, the previous relation implies that Yj = Y0 for all j, hence the candidate extremal index is zero. By Lemma 7.5.4, this implies that the extremal index is also 0. 14.16 [Gol91, Section 5] 14.17 [Gol91, Section 6] 14.18 On one hand P(X1 > b(x) | X0 > x) → P (Y0 > 1, Y1 > 1) = P(Y0 > 1, Y0κ W1 > 1)  ∞ = P(v κ W1 > 1)αv −α−1 dv 1



1

α/κ

P(W1

= 0

2 3 α/κ > r)dr = E min{1, W1 } .

F Solutions to problems

647

On the other hand, for bounded and continuous functions f , /  0 X1 E[f (W1 )] = lim E f | X0 > x x→∞ b(X0 ) so that E

2

3

α/κ min{1, W1 }

$

  X1 = lim E min 1, x→∞ b(X0 )

α/κ

% .

14.19 1. For (y0 , . . . , yh ) ∈ [1, ∞) × Rh , we have h

lim P(X0 ≤ xy0 , X1 ≤ xφ y1 , . . . , Xh ≤ xφ yh | X0 > x) x→∞  y0 h = P(eξ0,1 ≤ v −φ y1 , . . . , eξ0,h ≤ v −φ yh ) αv −α−1 dv . 1

The limiting conditional distribution of Xh given X0 > x is thus  ∞ h h lim P(Xh ≤ xφ y | X0 > x) = P(eξ0,h ≤ v −φ y) αv −α−1 dv . x→∞

1

2. If α > 1, we can apply Proposition 3.2.9 and we obtain / 0 Xh αE[eξ0,h ] αE[X0 ] = lim E φh | X0 > x = . h x→∞ α − φh x (α − φh )E[X0φ ] h

h−1

i

3. The recursion (14.5.6) yields X0 Xh = X01+φ e i=0 φ εh−i and the series h −1 h−1 i i=0 φ εh−i ] < ∞ in the exponential is independent of X0 . Also, E[eα(1+φ ) and by Breiman’s Lemma, we obtain directly that the tail index of X0 Xh is α/(1 + φh ). 14.20 The sequence {Xj , j ≥ 1} is a Markov chain with the functional representation Φ(x, z) = ρxβ +z and transition kernel Π(x, A) = P(ρxβ +Z0 ∈ A). 1. Write Φz (x) = Φ(x, z) = ρxβ + z. Define then, for an arbitrary x ≥ 0, ˆ n (x) by X ˆ n (x) = ΦZ ◦ ΦZ · · · ΦZ (x) = Z0 + ρ(Z1 + ρ(Z2 + · · · + (ρxβ + Zn )β )β )β . X 0 1 n ˆ n (0)} is non-decreasing and Since the Zi are non-negative, the sequence {X since β ∈ (0, 1), for all x ≥ 1 and y ≥ 0, (x + y)β ≤ xβ + βy. Indeed, if fx (y) = (x + y)β and gx (y) = xβ + βy, then fx (0) = gx (0) for each x and moreover gx (y) is concave. This yields

648

F Solutions to problems

ˆ n (0) ≤ Z0 + ρ(Z1 ∨ 1)β + ρ2 β(Z2 ∨ 1 + ρ(Z3 ∨ 1 + · · · )β )β X ≤ Z0 + ρ(Z1 ∨ 1)β + ρ2 β(Z2 ∨ 1)β + ρ3 β 2 (Z3 ∨ 1 + ρ(Z4 + · · · )β ∞  ≤ Z0 + ρj β j−1 (Zj ∨ 1)β . j=1

For q ∈ (0, α ∧ 1], ˆ n (0))q ] ≤ E[Z0 ] + ρq E[(Z0 ∨ 1)qβ ] E[(X

∞ 

(ρβ)q(j−1) < ∞ .

j=1

ˆ which is a solution of the ˆ n (0) converges almost surely to a limit X Thus X d d β β ˆ ˆ n+1 (0). recursion X = ρX + Z0 since ρ(Xn (0)) + Zn+1 = X 2. For q ∈ (0, α), set V (x) = 1 + xq . Then, for r > 0 and x ≤ r, ΠV (x) 1 + E[(ρxβ + Z)q ] = V (x) 1 + xq ≤ 1 + E[(ρrβ + Z)q ] ≤ 1 + 2(q−1)+ (ρq rqβ + E[Z q ]) < ∞ , 1 + E[(ρxβ + Z)q ] 1 + 2(q−1)+ (ρq xqβ + E[Z q ]) ≤ lim =0. q x→∞ x→∞ 1+x 1 + xq lim

Thus (14.3.1) holds by Problem 14.3 for all values of ρ. 3. Let h be the density of Z0 on R+ . For A > 0, x ∈ [0, A] and a measurable subset B of R+ ,   ∞ β β h(u − ρx )du ≥ h(u)1B (u)du = ν(B) , P(ρx + Z0 ∈ B) = ρAβ

B

with  = P(Z0 > ρAβ ) > 0 since Z has an unbounded support (being regularly varying) and ν having the density −1 h on ρAβ . 4. The previous calculations show that there exists an invariant distribution π for all values of ρ and π(V ) < ∞. Thus, E[X0q ] < ∞ for all q ∈ (0, α). Choose q ∈ (βα, α). Then lim sup x→∞ d

Since X0 =

ρX0β

P(X0β > x) E[X0q ] ≤ lim q/β =0. x→∞ x P(Z0 > x) P(Z0 > x) 0

+ Z0 , this implies that P(X0 > x) ∼ P(Z0 > x).

5. If ux → u ≥ 0, then Φ(xux , z) ρxβ uβx + z = lim = ρu . β x→∞ x→∞ x xβ lim

Thus Assumption 14.5.3 holds with φ(u, z) = ρuβ thus (14.5.4) does not hold.

F Solutions to problems

649

6. This previous result proves that Assumption 14.5.1 holds with K(u, B) = δρuβ (B) , so (14.5.2) does not hold. 7. Since Φ(x, z) = ρxβ + z, we have lim lim sup sup P(Φ(xu, Z0 ) > b(x)) = lim lim sup sup P(ρxβ uβ + Z0 > xβ )

→0 x→∞

→0 x→∞

u≤

u≤

≤ lim lim sup P(ρβ + x−β Z0 > 1) →0 x→∞   = lim 1 ρβ > 1 = 0 . →0

8. Applying Theorem 14.5.4, we obtain 2

j

β β Yj = ρYj−1 = ρ1+β Yj−2 = · · · = ρ1+β+···+β Y0β

j+1

.

14.21 The sequence {ηj , j ∈ N} satisfies 2 ηj+1 = ω + γηj2β Zj2β .

Thus, it is a Markov chain with state space [ω, ∞) and functional representation Φ(x, z) = ω + γxβ z 2β and transition kernel Πη (x, A) = P(ω + γxβ Z02β ∈ A). √ 1. Note that replacing Zj by ωZj and ηj by ω −1/2 ηj and γ by ω −1 γ, we can assume that ω = 1. Then, by the same argument as in Problem 14.20, 1 + γ(Z02 )β (1 + γ(Z12 )β (1 + γ(Z32 )β (1 + · · · )β )β )β ∞  ≤ 1 + γ(Z02 )β + γ(Z02 )β (βγ)j−1 (Z12 · · · Zj2 )β . j=1

Thus, if there exists q ≤ 1, q < α/(2β), such that (βγ)q E[Z02qβ ] < 1, then the series is almost surely summable. 2. Note that E[Z02qβ ] < ∞. Set V (x) = 1 + xq . Then, Πη V (x) 1 + E[(ω + γxβ (Z02 )β )q ] = V (x) 1 + xq ≤

1 + 2(q−1)+ ω q + 2(q−1)+ γ q xβq E[(Z02 )qβ ] . 1 + xq

Thus, for r > 0, sup ω≤x≤r

lim

x→∞

Πη V (x) ≤ 1 + 2(q−1)+ ω q + 2(q−1)+ γ q rβq E[(Z02 )qβ ] , V (x) Πη V (x) =0. V (x)

This proves that the drift condition (14.3.1) holds by Problem 14.3.

650

F Solutions to problems

3. Let A > ω. We will prove that [ω, A] is a small set. Let h be the density of (Z02 )β . Then h is bounded away from zero on [0, τ ] and for a measurable set B ⊂ [ω, ∞), x ∈ A Πη (x, B) = P(ω + γxβ Z02β ∈ B)    ∞ u − ω du 1B (ω + γxβ y)h(y)dy = h = γxβ γxβ B 0    inf 0≤y≤τ h(y) ≥ 1 ω ≤ u ≤ ω + γxβ τ du β γA B   inf 0≤y≤τ h(y) ≥ 1 ω ≤ u ≤ ω + γω β τ du = ν(B) , β γA B with =

ω β τ inf 0≤y≤τ h(y) , ν = (γω β τ )−1 Leb(· ∩ [ω, ω + γω β τ ]) . Aβ

4. From the previous points and Theorem D.3.1 we know that there exists a stationary distribution π for {ηj2 , j ∈ Z}. This also implies that there exists a stationary distribution for {Xj , j ∈ Z}. Furthermore π(V ) < ∞, therefore E[η02q ] < ∞ for all α/2 < q < (α/(2β)) ∧ 1. Therefore, we can choose  > 0 such that E[η0α+ ] < ∞. Using Lemma 1.4.3 and since η0 > 0, we have P(X0 > x) = P(η0 Z0 > x) ∼ E[η0α ]P(Z0 > x) 5. If us → u, 1/2 x−β Φ(xux , z) = x−β (ω + γx2β u2β z → γ 1/2 uβ z . x )

Thus Assumption 14.5.3 holds with φ(u, z) = γ 1/2 uβ z and (14.5.4) does not hold. However, lim lim sup sup P(|Φ(xu, Z0 )| ≥ b(x))

→0 x→∞ |u|≤

= lim lim sup P(x−β (ω + γx2β 2β )1/2 |Z0 | ≥ 1) ≤

→0 x→∞ lim P(γ2β |Z0 | →0

Thus (14.5.5) holds.

Chapter 15 15.1 Using (15.1.3) we have

≥ 1) = 0 .

F Solutions to problems

651

α

τ|X | (h) = P(|Y h | > 1) = E[|Θh | ∧ 1]   α # " ∞ Z α   ∧ C 0,j ΘZ 0 j=0 E C h,h+j Θ0 α # " = . ∞ Z E C 0,i Θ  i=0

0

If α > 1, then α E[|Θh |] α−1 3 2    ∞ Z α−1   C h,h+j ΘZ C E Θ 0,j 0 0 j=0 α  α . = ∞ Z   ] α−1 i=0 E[ C 0,i Θ0

CTE|X | (h) = E[|Y h |] =

15.2 1. For j ≥ 1, Θj =

j 

Aj , j ≥ 1 .

i=1

Let p = E[Aα 0 ] and N be an integer-valued random variable, independent of {Aj , j ∈ Z}, with a geometric distribution such that 3 2 j α E i=1 Ai 3 = (1 − p)pj , j ≥ 0 . 2 P(N = j) =  ∞ k α E A k=0 i=1 i Prove that for n ≥ 1, Θ−n = B1 · · · Bn 1{N ≥ n} , where Bi is a sequence of i.i.d. random variables such that E[H(B1 )] =

−1 E[Aα 1 H(A1 )] . α E[A1 ]

2. Applying Problem 15.1, we have for h ≥ 1 $ h %  α τ|X| (h) = E Ai ∧ 1 , CTE|X| (h) = i=1

α (E[A0 ])h . α−1

15.3 1. Applying Problem 15.1, we have ∞ α (|ψi | ∧ |ψi+h |) P(|Yh | > 1) = i=0∞ , α i=0 |ψi | ∞ |ψ |α−1 |ψi+h | α i=0 ∞i CTE|X| (h) = . α α−1 i=0 |ψi |

652

F Solutions to problems

2. For the AR(1) process, we have Θj = ρj (sign(ρ))N Θ0 1{j ≥ −N } , j ∈ Z , where P(Θ0 = 1) = 1 − P(Θ0 = −1) = p, the extremal skewness of X0 , and N is an integer-valued random variable, independent of Θ0 , such that P(N = j) = (1 − |ρ|α )|ρ|jα . The extremal index of the series {|Xj |, j ∈ Z} is θ = 1 − |ρ|α . The tail dependence coefficient τ|X| (h) and CTE|X| (h) become τ|X| (h) = |ρ|αh , CTE|X| (h) =

α|ρ|h . α−1

15.4 The limiting variance is given in Example 10.4.6: ⎞α ⎡⎛ ⎤   2 2 σld = θlargedev − 2θlargedev E ⎣⎝ Θj ⎠ ∧ 1⎦ + θlargedev P(|Yj | > 1) . j∈Z

Here we have

⎡⎛

θlargedev = E ⎣⎝

∞ 

j∈Z

+

⎞α ⎤ ⎛ ∞  Θj ⎠ − ⎝ Θj ⎠ ⎦

j=0

⎞α

+

j=1

+

⎧⎛ ⎞α ⎛ ⎞α ⎫⎤ ∞ ∞ ⎨  ⎬  ⎦ , = E ⎣|ψN |−α ⎝ ψj+N Θ0 ⎠ − ⎝ ψj+N Θ0 ⎠ ⎩ ⎭ ⎡

j=0

+

j=1

+

with N an integer-valued random variable such that α P(N = j) = ψ−α α |ψj | , j ≥ 0 .

For an AR(1) process with ρ ∈ (0, 1) we get P(N = k) = (1 − ρα )ρk , k ≥ 0 , Θj =

ρj+N Θ0 = ρj Θ0 1{j ≥ −N } , j ∈ Z . ρN

In particular, Θj = ρj Θ0 for j ≥ 0. Thus, with p = P(Θ0 = 1), ⎞α ⎛ ⎞α ⎤ ⎡⎛ ∞ ∞   Θj ⎠ − ⎝ Θj ⎠ ⎦ θlargedev = E ⎣⎝ j=0

+

j=1

+

⎧⎛ ⎞α ⎛ ⎞α ⎫ ∞ ∞ ⎨  ⎬  1 − ρα =p ⎝ ρj ⎠ − ⎝ ρj ⎠ . =p ⎩ ⎭ (1 − ρ)α j=0

j=1

F Solutions to problems

Also, ⎛⎛ P ⎝⎝

 j∈Z



Θj = Θ0 ⎞





j≥−N

ρj = Θ0 (1 − ρ)−1 ρ−N , thus

⎡⎛

Yj ⎠ > 1⎠ = E ⎣⎝

j∈Z

653



⎞α

Θj ⎠ ∧ 1⎦ = pE[{(1 − ρ)−α ρ−N α } ∧ 1] = p .

j∈Z

+



+

Lastly, P(|Yj | > 1) = ραj for j ≥ 0 and P(Yj > 1) = P(Y−j > 1) for j < 0 by the time-change formula (5.3.1a), so 

P(|Yj | > 1) =

j∈Z

 j∈Z

ρα|j| =

1 + ρα . 1 − ρα

Altogether, 2 σld

1 − ρα =p (1 − ρ)α

  1 − ρα 1 + ρα 1 − 2p + p . (1 − ρ)α 1 − ρα

List of conditions

We gather here fore easier reference the main conditions under which the results of Chapters 6 to 12 are obtained. Anticlustering conditions • AC(rn , cn ): for all x, y > 0  max |X j | > cn x | |X 0 | > cn y lim lim sup P m→∞ n→∞

=0.

m≤j≤rn

Used from Chapter 6 to Chapter 12. Ensures that the tail process converges to 0 at ∞. • S(rn , un ): for all x, y > 0 lim lim sup

m→∞ n→∞

rn 

P (|X j | > un x | |X 0 | > un y) = 0 .

|j|=m

Used in Chapter 9 and Chapter 10 to ensure convergence of variance of estimators. • Slog (rn , un ): for all x, y > 0,

/  rn  |X 0 | 1 lim lim sup E log+ m→∞ n→∞ P(|X 0 | > un ) xun

 log+

|j|=m

|X j | yun

0 =0.

Used in Chapter 9 and Chapter 11 to ensure convergence of variance of the Hill estimator. • Sψ (rn , un ) lim lim sup

m→∞ n→∞

/  rn  X0 1 E ψ P(|X 0 | > un ) un |j|=m

 ψ

Xj un

0 =0.

Used in Chapter 10 to ensure convergence of variance of tail array sums. © Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4

655

656

List of conditions

• Sψ,ei (rn , un ) lim lim sup

m→∞ n→∞

.n,j = X



2 3 .n,0 )ψ(X .n,j ) rn E ψ(X  |j|=m

F 0 (un )

Xj Xj+1 Xj+h ,··· , , un b1 (un ) bh (un )

=0,

.

Used in Chapter 11 to ensure convergence of variance of tail array sums under extremal independence. • Slin,ei (rn , un ) lim lim sup

m→∞ n→∞ / rn 

×

E

|j|=m

1 P(|X0 | > un )

0 |Xh | |Xj+h | 1{|X0 | > s0 un }1{|Xj | > s0 un } = 0 . bh (un ) bh (un )

Used in Chapter 11 to ensure convergence of variance of the estimator of the conditional scaling function. • Spr (rn , un ) lim lim sup

m→∞ n→∞

rn  P(|X0 Xh | > un bh (un )s0 , |Xj Xj+h | > un bh (un )s0 ) =0. F 0 (un ) |j|=m

Used in Chapter 11 to ensure convergence of variance of the estimator of the conditional scaling exponent. • Slog,pr (rn , un ) lim lim sup

m→∞ n→∞





3 2 |Xj Xj+h | |X0 Xh | rn E log log  + un b(un ) + un b(un ) |j|=m

F 0 (un )

=0.

Used in Chapter 11 to ensure convergence of variance of the estimator of the conditional scaling exponent. Negligibility of small jumps • For non-centered sums within blocks: ANSJB(rn , cn ); For all η > 0, r n P( j=1 |X j |1{|X j | ≤ cn } > ηcn ) lim lim sup =0. →0 n→∞ rn P(|X 0 | > cn ) Used in Chapter 6 to obtain summability and moments of the conditional spectral tail process and in Chapter 10 for defining and estimating certain cluster functionals.

List of conditions

657

• For centered sums: (ANSJ(an )); For all δ > 0,   n      Xi 1{|Xi | ≤ an } − E[X0 1{|X0 | ≤ an }] > an δ = 0 . lim lim sup P  →0 n→∞   i=1

Used in Chapter 8 for the weak convergence of sums to stable laws. • For maximum of centered sums: (ANSJU(an )); For all δ > 0, k        lim lim sup P max  Xi 1{|Xi | ≤ an }−E[Xi 1{|Xi | ≤ an }] >an δ = 0 . →0 n→∞  1≤k≤n  i=1

Used in Chapter 8 for the functional weak convergence of partial sum process to stable processes. Rate conditions • Rate condition R(rn , un ) lim un = ∞ ,

n→∞

lim nF (un ) = ∞ ,

n→∞

lim rn F (un ) = 0 .

n→∞

Condition on the size of blocks rn compared to the threshold un . • Mixing condition β(rn , n ) lim

n→∞

1 n rn n = lim = lim = lim βn = 0 . n→∞ rn n→∞ n n→∞ rn n

Weak temporal dependence condition used from Chapter 7 to Chapter 12. Relates the rate of decay of the mixing coefficients to the size of blocks. Bias conditions • (9.3.3): For s0 ∈ (0, 1): lim

n→∞



   P(|X 0 | > un t)  −α   −t =0. nP(|X 0 | > un ) sup  t≥s0 P(|X 0 | > un )

Used in Chapters 9 and 10 to control bias in the tail empirical process.

References

[AA10] Søren Asmussen and Hansj¨ org Albrecher. Ruin probabilities, volume 14 of Advanced Series on Statistical Science & Applied Probability. World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, second edition, 2010. [Als16] Gerold Alsmeyer. On the stationary tail index of iterated random Lipschitz functions. Stochastic Processes and their Applications, 126(1):209–233, 2016. [Arc94] Miguel A. Arcones. Limit theorems for nonlinear functionals of a stationary Gaussian sequence of vectors. Annals of Probability, 22(4):2242–2274, 1994. [Asm03] Søren Asmussen. Applied probability and queues, volume 51 of Applications of Mathematics (New York). Springer-Verlag, New York, second edition, 2003. Stochastic Modelling and Applied Probability. [AT92] Florin Avram and Murad S. Taqqu. Weak convergence of sums of moving averages in the α-stable domain of attraction. Annals of Probability, 20(1):483–503, 1992. [AT00] Florin Avram and Murad S. Taqqu. Robustness of the r / s statistic for fractional stable noises. Statistical Inference for Stochastic Processes, 3(1):69–83, 2000. [Bak94] Dominique Bakry. L’hypercontractivit´e et son utilisation en th´eorie des semigroupes. In Lectures on probability theory (Saint-Flour, 1992), volume 1581 of Lecture Notes in Math., pages 1–114. Springer, Berlin, 1994. [BB03] Fran¸cois Baccelli and Pierre Br´emaud. Elements of queueing theory, volume 26 of Applications of Mathematics (New York). Springer-Verlag, Berlin, second edition, 2003. [BB18] Betina Berghaus and Axel B¨ ucher. Weak convergence of a pseudo maximum likelihood estimator for the extremal index. Annals of Statistics, 46(5):2307–2335, 10 2018. © Springer Science+Business Media, LLC, part of Springer Nature 2020 R. Kulik and P. Soulier, Heavy-Tailed Time Series, Springer Series in Operations Research and Financial Engineering, https://doi.org/10.1007/978-1-0716-0737-4

659

660

References

[BBIK19] Clemonell Bilayi-Biakana, Gail Ivanoff, and Rafal Kulik. The tail empirical process for long memory stochastic volatility models with leverage. Electronic Journal of Statistics, 2019. Accepted. [BBKS20] Clemonell Bilayi-Biakana, Rafal Kulik, and Philippe Soulier. Statistical inference for heavy tailed series with extremal independence. Extremes, 23(1):1–33, 2020. [BC06] Patrice Bertail and St´ephan Cl´emen¸con. Regenerationbased statistics for harris recurrent markov chains. In Dependence in Probability and Statistics, volume 187 of Lecture Notes in Statistics, pages 3–54. Springer, Berlin, 2006. [BCT09] Patrice Bertail, St´ephan Cl´emen¸con, and Jessica Tressou. Extreme values statistics for markov chains via the (pseudo) regenerative method. Extremes, 12(4), 2009. [BCT13] Patrice Bertail, St´ephan Cl´emen¸con, and Jessica Tressou. Regenerative block-bootstrap confidence intervals for tail and extremal indexes. Electronic Journal of Statistics, 7:1224– 1248, 2013. [BD98] F. Jay Breidt and Richard A. Davis. Extremes of stochastic volatility models. The Annals of Applied Probability, 8(3): 664–675, 1998. [BDM99] Bojan Basrak, Richard A. Davis, and Thomas Mikosch. The sample ACF of a simple bilinear process. Stochastic Processes and their Applications, 83(1):1–14, 1999. [BDM02a] Bojan Basrak, Richard A. Davis, and Thomas Mikosch. A characterization of multivariate regular variation. Annals of Applied Probability, 12(3):908–920, 2002. [BDM02b] Bojan Basrak, Richard A. Davis, and Thomas Mikosch. Regular variation of GARCH processes. 99(1):95–115, Stochastic Processes and their Applications, 2002. [BDM16] Dariusz Buraczewski, Ewa Damek, and Thomas Mikosch. Stochastic models with power-law tails. Springer Series in Operations Research and Financial Engineering. Springer, [Cham], 2016. The equation X = AX + B. [Ber64] Simeon M. Berman. Limit theorems for the maximum term in stationary sequences. Annals of Mathematical Statistics, 35:502–516, 1964. [Ber92] Simeon M. Berman. Sojourns and extremes of stochastic processes. The Wadsworth & Brooks/Cole Statistics/Probability Series. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA, 1992. [BFGK13] Jan Beran, Yuanhua Feng, Sucharita Ghosh, and Rafal Kulik. Long-memory processes. Springer, Heidelberg, 2013.

References

661

[BGT89] Nicholas H. Bingham, Charles M. Goldie, and Jozef L. Teugels. Regular variation. Cambridge University Press, Cambridge, 1989. [BGTS04] Jan Beirlant, Yuri Goegebeur, Jozef Teugels, and Johan Segers. Statistics of extremes. Wiley Series in Probability and Statistics. John Wiley & Sons, Ltd., Chichester, 2004. Theory and applications, With contributions from Daniel De Waal and Chris Ferro. [Bil68] Patrick Billingsley. Convergence of probability measures. New York, Wiley, 1968. [Bil86] Patrick Billingsley. Probability and measure. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York, second edition, 1986. [Bil99] Patrick Billingsley. Convergence of probability measures. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons Inc., New York, second edition, 1999. [BJL16] Raluca Balan, Adam Jakubowski, and Sana Louhichi. Functional convergence of linear processes with heavy-tailed innovations. Journal of Theoretical Probability, 29(2):491–526, 2016. [BJMW11] Katarzyna Bartkiewicz, Adam Jakubowski, Thomas Mikosch, and Olivier Wintenberger. Stable limits for sums of dependent infinite variance random variables. 150(3-4):337–372, Probability Theory and Related Fields, 2011. [BK01] Milan Borkovec and Claudia Kl¨ uppelberg. The tail of the stationary distribution of an autoregressive process with arch(1) errors. Annals of Applied Probability, 11:1220–1241, 2001. [BK15] Bojan Basrak and Danijel Krizmani´c. A multivariate functional limit theorem in weak M1 topology. Journal of Theoretical Probability, 28(1):119–136, 2015. [BKS12] Bojan Basrak, Danijel Krizmani´c, and Johan Segers. A functional limit theorem for dependent sequences with infinite variance stable limits. Annals of Probability, 40(5):2008–2033, 2012. [Bou07] N. Bourbaki. Topologie g´en´erale. Springer-Verlag, Berlin, Heidelberg, 2007. [BP92] Philippe Bougerol and Nico Picard. Stationarity of GARCH processes and of some nonnegative time series. Journal of Econometrics, 52(1-2):115–127, 1992. [BP18] Bojan Basrak and Hrvoje Planini´c. Compound poisson approximation for random fields with application to sequence alignment. arXiv:1809.00723, 2018.

662

References

[BP19] Bojan Basrak and Hrvoje Planini´c. A note on vague convergence of measures. Statistics and Probability Letters, 153:180– 186, 2019. [BPS18] Bojan Basrak, Hrvoje Planini´c, and Philippe Soulier. An invariance principle for sums and record times of regularly varying stationary sequences. Probability Theory and Related Fields, 172(3-4):869–914, 2018. [BR77] Bruce M. Brown and Sidney I. Resnick. Extreme values of independent stochastic processes. Journal of Applied Probability, 14(4):732–739, 1977. [Bra05] Richard C. Bradley. Basic properties of strong mixing conditions. A survey and some open questions. Probability Surveys, 2:107–144, 2005. [Bra07] Richard C. Bradley. Introduction to strong mixing conditions. Vol. 1. Kendrick Press, Heber City, UT, 2007. [Bre65] Leo Breiman. On some limit theorems similar to the arcsine law. Theory of Probability and Applications, 10:323–331, 1965. [BS09] Bojan Basrak and Johan Segers. Regularly varying multivariate time series. Stochastic Processes and their Applications, 119(4):1055–1080, 2009. [BS18a] Axel B¨ ucher and Johan Segers. Inference for heavy tailed stationary time series based on sliding blocks. Electronic Journal of Statistics, 12(1):1098–1125, 2018. [BS18b] Axel B¨ ucher and Johan Segers. Maximum likelihood estimation for the fr´echet distribution based on block maxima extracted from a time series. Bernoulli, 24(2):1427–1462, 05 2018. [BT15] St´ephane Boucheron and Maud Thomas. Tail index estimation, concentration and adaptivity. Electronic Journal of Statistics, 9(2):2751–2792, 2015. [BT16] Bojan Basrak and Azra Tafro. A complete onvergence theorem for stationary regularly varying time series. Extremes, 19(3):549–560, 2016. [BZ18] Axel B¨ ucher and Chen Zhou. A horse racing between the block maxima method and the peak–over–threshold approach. arXiv:1807.00282, 2018. [CCH16] Alberto Chiarini, Alessandra Cipriani, and Rajat Subhra Hazra. Extremes of some Gaussian random interfaces. Journal of Statistical Physics, 165(3):521–544, 2016. [CCHM86] Miklos Cs¨ org˝ o, S´ andor Cs¨ org˝ o, Lajos Horv´ ath, and David Mason. Weighted empirical and quantile processes. Annals of Statistics, 14:31–85, 1986. [CDM85] S´ andor Cs¨ org˝ o, Paul Deheuvels, and David Mason. Kernel estimates of the tail index of a distribution. Annals of Statistics, 13(3):1050–1077, 1985.

References

663

[CHM91] Michael R. Chernick, Tailen Hsing, and William P. McCormick. Calculating the extremal index for a class of stationary sequences. Advances in Applied Probability, 23(4):835– 850, 1991. [CL67] Harald Cram´er and Ross M. Leadbetter. Stationary and related stochastic processes. Sample function properties and their applications. John Wiley & Sons, Inc., New YorkLondon-Sydney, 1967. [Cli07] Daren B. H. Cline. Regular variation of order 1 nonlinear ararch models. Stochastic Process. Appl., 117:840–861, 2007. [CM85] Miklos Cs¨ org˝ o and David Mason. Central limit theorems for sums of extreme values. Mathematical Proceedings of the Cambridge Philosophical Society, 98(3):547–558, 1985. [Dav83] Richard A. Davis. Stable limits for partial sums of dependent random variables. Annals of Probability, 11(2):262–269, 1983. [DCD86] Didier Dacunha-Castelle and Marie Duflo. Probability and statistics. Vol. II. Springer-Verlag, New York, 1986. [DDSW18] Richard A. Davis, Holger Drees, Johan Segers, and Michal Warchol. Inference on the tail process with application to financial time series modelling. Journal of Econometrics, 205(2):508– 525, 2018. [DEdH89] Arnold L.M. Dekkers, John H.J. Einmahl, and Laurens de Haan. A moment estimator for the index of an extreme value distribution. Annals of Statistics, 17(4):1833–1855, 1989. [DEM12] Cl´ement Dombry and Fr´ed´eric Eyi-Minko. Strong mixing properties of max-infinitely divisible random fields. 122(11):3790– Stochastic Processes and their Applications, 3811, 2012. [DEO16] Cl´ement Dombry, Sebastian Engelke, and Marco Oesting. Exact simulation of max-stable processes. Biometrika, 103(2):303–317, 2016. [DFH18] Bikramjit Das and Vicky Fasen-Hartmann. Risk contagion under regular variation and asymptotic tail independence. Journal of Multivariate Analysis, 165:194–215, 2018. [dH84] Laurens de Haan. A spectral representation for max-stable processes. Annals of Probability, 12(4):1194–1204, 1984. [DH95] Richard A. Davis and Tailen Hsing. Point process and partial sum convergence for weakly dependent random variables with infinite variance. Annals of Probability, 23(2):879–917, 1995. [DH17] Krzysztof D¸ebicki and Enkelejd Hashorva. On extremal index of max-stable stationary processes. Probability and Mathematical Statistics, 37(2):299–317, 2017. [dHF06] Laurens de Haan and Ana Ferreira. Extreme value theory. Springer Series in Operations Research and Financial Engineering. Springer, New York, 2006. An introduction.

664

References

[dHRRdV89] Laurens de Haan, Sidney I. Resnick, Holger Rootz´en, and Casper G. de Vries. Extremal behaviour of solutions to a stochastic difference equation with applications to ARCH processes. Stochastic Processes and their Applications, 32(2):213– 224, 1989. [DHS18] Cl´ement Dombry, Enkelejd Hashorva, and Philippe Soulier. Tail measure and spectral tail process of regularly varying time series. Annals of Applied Probability, 28(6):3884–3921, 2018. [DJ89] Manfred Denker and Adam Jakubowski. Stable limit distributions for strongly mixing sequences. Statistics and Probability Letters, 8:477–483, 1989. [DJ94] Andre R. Dabrowski and Adam Jakubowski. Stable limits for associated random variables. Annals of Probability, 22(1):1– 16, 1994. [DJ17] Holger Drees and Anja Janßen. Conditional extreme value models: fallacies and pitfalls. Extremes, 20(4):777–805, 2017. [DK98] Holger Drees and Edgar Kaufmann. Selecting the optimal sample fraction in univariate extreme value estimation. Stochastic Processes and their Applications, 75(2):149– 172, 1998. [DK17] Cl´ement Dombry and Zakhar Kabluchko. Ergodic decompositions of stationary max-stable processes in terms of their spectral functions. Stochastic Processes and their Applications, 127(6):1763–1784, 2017. [DM79] R. L. Dobrushin and P´eter Major. Non-central limit theorems for nonlinear functionals of Gaussian fields. Zeitschrift f¨ ur Wahrscheinlichkeitstheorie und Verwandte Gebiete, 50(1):27–52, 1979. [DM98a] Somnath Datta and William McCormick. Inference for the tail parameters of a linear process with heavy tail innovations. Annals of Institute of Statistical Mathematics, 50(2):337–359, 1998. [DM98b] Richard A. Davis and Thomas Mikosch. The sample autocorrelations of heavy-tailed processes with applications to ARCH. Annals of Statistics, 26(5):2049–2080, 1998. [DM01] Richard A. Davis and Thomas Mikosch. Point process convergence of stochastic volatility processes with application to sample autocorrelation. Journal of Applied Probability, 38A:93– 104, 2001. Probability, statistics and seismology. [DM09] Richard A. Davis and Thomas Mikosch. The extremogram: A correlogram for extreme events. Bernoulli, 38A:977–1009, 2009. Probability, statistics and seismology. [DMC12] Richard A. Davis, Thomas Mikosch, and Ivor Cribben. Towards estimating extremal serial dependence via the bootstrapped extremogram. Journal of Econometrics, 170:142 – 152, 2012.

References

665

[DMPS18] Randal Douc, Eric Moulines, Pierre Priouret, and Philippe Soulier. Markov Chains. Springer Series in Operations Research and Financial Engineering. Springer Nature Switzerland, Cham, 2018. [DMR13] Bikramjit Das, Abhimanyu Mitra, and Sidney I. Resnick. Living on the multidimensional edge: seeking hidden risks using regular variation. Advances in Applied Probability, 45(1):139– 163, 2013. [DR84] Richard A. Davis and Sidney I. Resnick. Tail estimates motivated by extreme value theory. Annals of Statistics, 12(4): 1467–1487, 1984. [DR85a] Richard A. Davis and Sidney I. Resnick. Limit theory for moving averages of random variables with regularly varying tail probabilities. Annals of Probability, 13(1):179–195, 1985. [DR85b] Richard A. Davis and Sidney I. Resnick. More limit theory for the sample correlation function of moving averages. Stochastic Processes and their Applications, 20(2):257– 279, 1985. [DR86] Richard A. Davis and Sidney I. Resnick. Limit theory for the sample covariance and correlation functions of moving averages. Annals of Statistics, 14(2):533–558, 1986. [DR96] Richard A. Davis and Sidney I. Resnick. Limit theory for bilinear processes with heavy-tailed noise. Annals of Applied Probability, 6(4):1191–1210, 1996. [DR10] Holger Drees and Holger Rootz´en. Limit theorems for empirical processes of cluster functionals. Annals of Statistics, 38(4):2145–2186, 2010. [DR11] Bikramjit Das and Sidney I. Resnick. Conditioning on an extreme component: model consistency with regular variation on cones. Bernoulli, 17(1):226–252, 2011. [Dre98a] Holger Drees. A general class of estimators of the extreme value index. Journal of Statistical Planning and Inference, 66:95– 112, 1998. [Dre98b] Holger Drees. On smooth statistical tail functionals. Scandinavian Journal of Statistics, 25:187–210, 1998. [Dre00] Holger Drees. Weighted approximations of tail processes for β-mixing random variables. Annals of Applied Probability, 10(4):1274–1301, 2000. [Dre02] Holger Drees. Tail empirical processes under mixing conditions. In Empirical process techniques for dependent data, pages 325–342. Birkh¨ auser Boston, Boston, MA, 2002. [Dre03] Holger Drees. Extreme quantile estimation for dependent data, with applications to finance. Bernoulli, 9(4):617–657, 2003. [Dre08] Holger Drees. Some aspects of extreme value statistics under serial dependence. Extremes, 11(1):35–53, 2008.

666

References

[Dre15] Holger Drees. Bootstrapping empirical processes of cluster functionals with application to extremograms. arXiv:1511.00420, 2015. [DSW15] Holger Drees, Johan Segers, and Michal Warchol. Statistics for tail processes of Markov chains. Extremes, 18(3):369–402, 2015. [Dud02] Richard M. Dudley. Real analysis and probability, volume 74 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2002. Revised reprint of the 1989 original. [DVJ08] Daryl J. Daley and David Vere-Jones. An introduction to the theory of point processes. Vol. II. Probability and its Applications (New York). Springer, New York, second edition, 2008. General theory and structure. [Ebe84] Ernst Eberlein. Weak convergence of partial sums of absolutely regular sequences. Statistics & Probability Letters, 2(5):291– 293, 1984. [EG81] William F. Eddy and James D. Gale. The convex hull of a spherically symmetric sample. Advances in Applied Probability, 13(4):751–763, 1981. [Ein92] John H. J. Einmahl. Limit theorems for tail processes with application to intermediate quantile estimation. Journal of Statistical Planning and Inference, 32(1):137–145, 1992. [EKM97] Paul Embrechts, Claudia Kl¨ uppelberg, and Thomas Mikosch. Modelling extremal events, volume 33 of Applications of Mathematics. Springer-Verlag, Berlin, 1997. [Fel71] William Feller. An introduction to probability theory and its applications. Vol. II. Second edition. John Wiley & Sons Inc., New York, 1971. [FGAMS06] Gilles Fa¨ y, B´arbara Gonz´ alez-Ar´evalo, Thomas Mikosch, and Gennady Samorodnitsky. Modeling teletraffic arrivals by a Poisson cluster process. Queueing Systems. Theory and Applications, 54(2):121–140, 2006. [FKS10] Vicky Fasen, Claudia Kl¨ uppelberg, and Martin Schlather. High-level dependence in time series models. Extremes, 13(1):1–33, 2010. [FM12] Anne-Laure Foug`eres and Cecile Mercadier. Risk measures and multivariate extensions of breiman’s theorem. Journal of Applied Probability, 49(2):364–384, 2012. [GGR96] Christian Genest, Kilani Ghoudi, and Bruno R´emillard. A note on tightness. Statistics & Probability Letters, 27(4):331–339, 1996. [GHV90] Evarist Gin´e, Marjorie G. Hahn, and Pirooz Vatan. Maxinfinitely divisible and max-stable sample continuous processes. Probability Theory and Related Fields, 87(2):139–165, 1990.

References

667

[GN16] Evarist Gin´e and Richard Nickl. Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge University Press, New York, 2016. [Gol91] Charles M. Goldie. Implicit renewal theory and tails of solutions of random equations. Annals of Applied Probability, 1(1):126–166, 1991. [GR06] Christian Gourieroux and Christian Y. Robert. Stochastic unit root models. Econometric Theory, 22(6):1052–1090, 2006. [Gut88] Alan Gut. Stopped Random Walks. Springer, 1988. [Hal82] Peter Hall. On some simple estimates of an exponent of regular variation. Journal of the Royal Statistical Society. Series B. Methodological, 44(1):37–42, 1982. [Has18] Enkelejd Hashorva. Representations of max-stable processes via exponential tilting. Stochastic Processes and their Applications, 128(9):2952–2978, 2018. [Has19] Enkelejd Hashorva. On extremal index of max-stable random fields. arXiv:, 2019. [HH80] Peter Hall and Chris C. Heyde. Martingale limit theory and its application. Academic Press, New York, 1980. [HHL88] Tailen Hsing, J¨ urg H¨ usler, and Ross M. Leadbetter. On the exceedance point process for a stationary sequence. Probability Theory and Related Fields, 78(1):97–112, 1988. [Hil75] Bruce M. Hill. A simple general approach to inference about the tail of a distribution. Annals of Statistics, 3(5):1163–1174, 1975. [HK08] Lajos Horv´ ath and Piotr Kokoszka. Sample autocovariances of long-memory time series. Bernoulli, 14(2):405–418, 2008. [HL06] Henrik Hult and Filip Lindskog. Regular variation for measures on metric spaces. Publ. Inst. Math. (Beograd) (N.S.), 80(94):121–140, 2006. [HM11] Rajat Subhra Hazra and Krishanu Maulik. Products in Conditional Extreme Value Model. arXiv:1104.1688, 2011. [HR07] Janet E. Heffernan and Sidney I. Resnick. Limit laws for random vectors with an extreme component. Annals of Applied Probability, 17(2):537–571, 2007. [HS08] Henrik Hult and Gennady Samorodnitsky. Tail probabilities for infinite series of regularly varying random vectors. Bernoulli, 14(3):838–864, 2008. [Hsi91a] Tailen Hsing. Estimating the parameters of rare events. Stochastic Processes and their Applications, 37(1):117–139, 1991. [Hsi91b] Tailen Hsing. On tail index estimation using dependent data. Annals of Statistics, 19(3):1547–1569, 1991.

668

References

[Hsi93] Tailen Hsing. Extremal index estimation for a weakly dependent stationary sequence. Annals of Statistics, 21(4):2043– 2071, 1993. [HT85] Erich Haeusler and Jozef L. Teugels. On asymptotic normality of Hill’s estimator for the exponent of regular variation. Annals of Statistics, 13(2):743–756, 1985. [HW84] Peter Hall and A. H. Welsh. Best attainable rates of convergence for estimates of parameters of regular variation. Annals of Statistics, 12(3):1079–1084, 1984. [Ibr62] I. A. Ibragimov. Some limit theorems for stationary processes. Teor. Verojatnost. i Primenen., 7:361–392, 1962. [Jak86] Adam Jakubowski. Principle of conditioning in limit theorems for sums of random variables. Annals of Probability, 14(3):902–915, 1986. [Jak89] Adam Jakubowski. α-stable limit theorems for sums of dependent random vectors. Journal of Multivariate Analysis, 29:219–251, 1989. [Jak93] Adam Jakubowski. Minimal conditions in p-stable limit theorems. Stochastic Processes and their Applications, 44:291– 327, 1993. [Jak97] Adam Jakubowski. Minimal conditions in p-stable limit theorems ii. Stochastic Processes and their Applications, 68:1–20, 1997. [JD16] Anja Janßen and Holger Drees. A stochastic volatility model with flexible extremal dependence structure. Bernoulli, 22(3):1448–1490, 2016. [Je19] Anja Janß en. Spectral tail processes and max-stable approximations of multivariate regularly varying time series. 129(6):1993– Stochastic Processes and their Applications, 2009, 2019. [JK19] Carsten Jentsch and Rafal Kulik. Bootstrapping hill estimator and tail arrays sums for regularly varying time series. Submitted, 2019. [JOC12] Predrag R. Jelenkovi´c and Mariana Olvera-Cravioto. Implicit renewal theory and power tails on trees. Advances in Applied Probability, 44(2):528–561, 2012. [JS14] Anja Janßen and Johan Segers. Markov tail chains. Journal of Applied Probability, 51(4):1133–1153, 2014. [Kal97] Olav Kallenberg. Foundations of modern probability. Probability and its Applications (New York). Springer-Verlag, New York, 1997. [Kal17] Olav Kallenberg. Random Measures, Theory and Applications, volume 77 of Probability Theory and Stochastic Modelling. Springer-Verlag, New York, 2017.

References

669

[Kes73] Harry Kesten. Random difference equations and renewal theory for products of random matrices. Acta Mathematica, 131:207– 248, 1973. [KM86] Yuji Kasahara and Makoto Maejima. Functional limit theorems for weighted sums of i.i.d. random variables. Probab. Theory Relat. Fields, 72(2):161–183, 1986. [KM88] Yuji Kasahara and Makoto Maejima. Weighted sums of i.i.d. random variables attracted to integrals of stable processes. Probability Theory and Related Fields, 78(1):75–96, 1988. [Kos03] Michael R. Kosorok. Bootstraps of sums of independent but not identically distributed stochastic processes. Journal of Multivariate Analysis, 84:299–318, 2003. [Kos08] Michael R. Kosorok. Introduction to empirical processes and semiparametric inference. Springer Series in Statistics. Springer, New York, 2008. [KP04] Claudia Kl¨ uppelberg and Serguei Pergamenchtchikov. The tail of the stationary distribution of a random coefficient AR(q) model. Annals of Applied Probability, 14(2):971–1005, 2004. [KR60] A. N. Kolmogorov and Ju. A. Rozanov. On a strong mixing condition for stationary Gaussian processes. Teor. Verojatnost. i Primenen., 5:222–227, 1960. [Kri14] Danijel Krizmani´c. Weak convergence of partial maxima processes in the M1 topology. Extremes, 17(3):447–465, 2014. [Kri16] Danijel Krizmani´c. Functional weak convergence of partial maxima processes. Extremes, 19(1):7–23, 2016. [Kri19] Danijel Krizmani´c. Functional convergence for moving averages with heavy tails and random coefficients. ALEA. Latin American Journal of Probability and Mathematical Statistics, 16(1):729–757, 2019. [KS11] Rafal Kulik and Philippe Soulier. The tail empirical process for long memory stochastic volatility sequences. Stochastic Processes and their Applications, 121(1):109 – 134, 2011. [KS12] Rafal Kulik and Philippe Soulier. Limit theorems for long memory stochastic volatility models with infinite variance: Partial sums and sample covariances. Advances in Applied Probability, 44(4):1113 – 1141, 2012. [KS13] Rafal Kulik and Philippe Soulier. Estimation of limiting conditional distributions for the heavy tailed long memory stochastic volatility process. Extremes, 16(2):203 – 239, 2013. [KS15] Rafal Kulik and Philippe Soulier. Heavy tailed time series with extremal independence. Extremes, 18(2):273–299, 2015. [KSdH09] Zakhar Kabluchko, Martin Schlather, and Laurens de Haan. Stationary max-stable fields associated to negative definite functions. Annals of Probability, 37(5):2042–2065, 2009.

670

References

[KSW18] Rafal Kulik, Philippe Soulier, and Olivier Wintenberger. The tail empirical process of regularly varying functions of geometrically ergodic markov chains. Stochastic Processes and their Applications, 2018. Accepted. DOI: j.spa.2018.11.014. [KT95] Piotr S. Kokoszka and Murad S. Taqqu. Fractional arima with stable innovations. Stochastic Processes and their Applications, 60:19–47, 1995. [KT96] Piotr S. Kokoszka and Murad S. Taqqu. Infinite variance stable moving averages with long memory. Journal of Econometrics, 73:79–99, 1996. [KT19] Rafal Kulik and Zhigang Tong. Estimation of the expected shortfall given an extreme component under conditional extreme value model. Extremes, 22(1):29–70, 2019. [Kul06] Rafal Kulik. Limit theorems for moving averages with random coefficients and heavy-tailed noise. Journal of Applied Probability, 43(1):245–256, 2006. [KW92] Stanislaw Kwapie´ n and Wojbor A. Woyczy´ nski. Random series and stochastic integrals: single and multiple. Probability and its Applications. Birkh¨ auser Boston, Inc., Boston, MA, 1992. [LLR83] Ross M. Leadbetter, Georg Lindgren, and Holger Rootz´en. Extremes and related properties of random sequences and processes. Springer Series in Statistics. Springer-Verlag, New York, 1983. [LR88] Ross M. Leadbetter and Holger Rootz´en. Extremal theory for stochastic processes. Annals of Probability, 16(2):431–478, 1988. [LR11] Sana Louhichi and Emmanuel Rio. Functional convergence to stable L´evy motions for iterated random Lipschitz mappings. Electronic Journal of Probability, 16:no. 89, 2452–2480, 2011. [LR12] Martin Larsson and Sidney I. Resnick. Extremal dependence measure and extremogram: the regularly varying case. Extremes, 15(2):231–256, 2012. [LRR14] Filip Lindskog, Sidney I. Resnick, and Joyjit Roy. Regularly varying measures on metric spaces: hidden regular variation and hidden jumps. Probability Surveys, 11:270–314, 2014. [LT96] Anthony W. Ledford and Jonathan A. Tawn. Statistics for near independence in multivariate extreme values. Biometrika, 83(1):169–187, 1996. [Mik09] Thomas Mikosch. Non-life insurance mathematics. Universitext. Springer-Verlag, Berlin, second edition, 2009. An introduction with the Poisson process.

References

671

[Mok88] Abdelkader Mokkadem. Mixing properties of ARMA processes. Stochastic Processes and their Applications, 29(2):309– 315, 1988. [Mok90] Abdelkader Mokkadem. Propri´et´es de m´elange des processus autor´egressifs polynomiaux. Annales de l’Institut Henri Poincar´e. Probabilit´es et Statistiques, 26(2):219–260, 1990. [MR04] Krishanu Maulik and Sidney I. Resnick. Characterizations and examples of hidden regular variation. Extremes, 7(1):31–67 (2005), 2004. [MR11] Abhimanyu Mitra and Sidney I. Resnick. Hidden regular variation and detection of hidden risks. Stochastic Models, 27:591– 614, 2011. [MR13] Thomas Mikosch and Mohsen Rezapour. Stochastic volatility models with possible extremal clustering. Bernoulli, 19(5A):1688–1713, 2013. [MRR02] Krishanu Maulik, Sidney I. Resnick, and Holger Rootz´en. Asymptotic independence and a network traffic model. Journal of Applied Probability, 39(4):671–699, 2002. [MS00] Thomas Mikosch and Gennady Samorodnitsky. The supremum of a negative drift random walk with dependent heavy-tailed steps. Annals of Applied Probability, 10(3):1025–1064, 2000. [MS10] Thomas Meinguet and Johan Segers. Regularly varying time series in banach spaces. arXiv:1001.3262, 2010. [MT09] Sean Meyn and Richard L. Tweedie. Markov chains and stochastic stability. Cambridge University Press, Cambridge, second edition, 2009. [MW13] Thomas Mikosch and Olivier Wintenberger. Precise large deviations for dependent regularly varying sequences. 156(3-4):851–887, Probability Theory and Related Fields, 2013. [MW14] Thomas Mikosch and Olivier Wintenberger. The cluster index of regularly varying sequences with applications to limit theory for functions of multivariate Markov chains. Probability Theory and Related Fields, 159(1-2):157– 196, 2014. [MW16] Thomas Mikosch and Olivier Wintenberger. A large deviations approach to limit theorem for heavy-tailed time series. Probability Theory and Related Fields, 166(1-2):233– 269, 2016. [MZ15] Thomas Mikosch and Yuwei Zhao. The integrated periodogram of a dependent extremal event sequence. Stochastic Processes and their Applications, 125(8):3126–3169, 2015. [O’B74] G. L. O’Brien. Limit theorems for sums of chain-dependent processes. Journal of Applied Probability, 11:582–587, 1974.

672

References

[O’B87] George L. O’Brien. Extreme values for stationary and Markov sequences. Annals of Probability, 15(1):281–291, 1987. [Per94] Roland Perfekt. Extremal behaviour of stationary Markov chains with applications. The Annals of Applied Probability, 4(2):529–548, 1994. [Pol02] David Pollard. A user’s guide to measure theoretic probability, volume 8 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2002. [PS18] Hrvoje Planini´c and Philippe Soulier. The tail process revisited. Extremes, 21(4):551–579, 2018. [PT85] Tuan D. Pham and Lanh T. Tran. Some mixing properties of time series models. Stochastic Processes and their Applications, 19(2):297 – 303, 1985. [PT17] Vladas Pipiras and Murad S. Taqqu. Long-range dependence and self-similarity. Cambridge Series in Statistical and Probabilistic Mathematics, [45]. Cambridge University Press, Cambridge, 2017. [Res86] Sidney I. Resnick. Point processes, regular variation and weak convergence. Advances in Applied Probability, 18(1):66–138, 1986. [Res87] Sidney I. Resnick. Extreme values, regular variation and point processes. Applied Probability, Vol. 4,. New York, SpringerVerlag, 1987. [Res02] Sidney I. Resnick. Hidden regular variation, second order regular variation and asymptotic independence. Extremes, 5(4):303–336, 2002. [Res07] Sidney I. Resnick. Heavy-Tail Phenomena. Springer Series in Operations Research and Financial Engineering. Springer, New York, 2007. Probabilistic and statistical modeling. [Res08] Sidney I. Resnick. Multivariate regular variation on cones: application to extreme values, hidden regular variation and conditioned limit laws. Stochastics, 80(2-3):269–298, 2008. [RLdH98] Holger Rootz´en, Ross M. Leadbetter, and Laurens de Haan. On the distribution of tail array sums for strongly mixing stationary sequences. Annals of Applied Probability, 8(3):868– 885, 1998. [Rob07] Christian Y. Robert. Stochastic stability of some statedependent growth-collapse processes. Advances in Applied Probability, 39(1):189–220, 2007. [Rob08] Christian Y. Robert. Estimating the multivariate extremal index function. Bernoulli, 14(4):1027–1064, 11 2008. [Rob09] Christian Y. Robert. Inference for the limiting cluster size distribution of extreme values. Annals of Statistics, 37(1):271– 310, 2009.

References

673

[Roo78] Holger Rootz´en. Extremes of moving averages of stable processes. Annals of Probability, 6(5):847–869, 1978. [Roo88] Holger Rootz´en. Maxima and exceedances of stationary Markov chains. Advances in Applied Probability, 20(2):371– 390, 1988. [Roo95] Holger Rootz´en. The tail empirical process for stationary sequences. Technical report, Chalmers University, 1995. [Roo09] Holger Rootz´en. Weak convergence of the tail empirical process for dependent sequences. Stochastic Processes and their Applications, 119(2):468–490, 2009. [Ros56] Murray Rosenblatt. A central limit theorem and a strong mixing condition. Proc. Nat. Acad. Sci. U. S. A., 42:43–47, 1956. [Roy17] Parthanil Roy. Maxima of stable random fields, nonsingular actions and finitely generated abelian groups: a survey. Indian Journal of Pure and Applied Mathematics, 48(4):513– 540, 2017. [RRSS06] Gareth O. Roberts, Jeffrey S. Rosenthal, Johann Segers, and Bruno Sousa. Extremal indices, geometric ergodicity of markov chains, and mcmc. Extremes, 9(3-4):213–229, 2006. [RS97] Sidney I. Resnick and C˘ at˘ alin St˘ aric˘ a. Asymptotic behavior of Hill’s estimator for autoregressive data. Communications in Statistics. Stochastic Models, 13(4):703– 721, 1997. Heavy tails and highly volatile phenomena. [RS98] Sidney I. Resnick and Catalin St˘ aric˘ a. Tail index estimation for dependent data. Annals of Applied Probability, 8(4):1156– 1183, 1998. [RS08] Christian Y. Robert and Johan Segers. Tails of random sums of a heavy-tailed number of light-tailed terms. Insurance: Mathematics & Economics, 43(1):85–92, 2008. [RSF09] Christian Y. Robert, Johan Segers, and Christopher A. T. Ferro. A sliding blocks estimator for the extremal index. Electronic Journal of Statistics, 3:993–1020, 2009. [RZ13a] Sidney I. Resnick and David Zeber. Asymptotics of Markov kernels and the tail chain. Advances in Applied Probability, 45(1):186–213, 2013. [RZ13b] Sidney I. Resnick and David Zeber. Clustering of Markov chain exceedances. Bernoulli, 19(4):1419–1448, 2013. [RZ14] Sidney I. Resnick and David Zeber. Transition kernels and the conditional extreme value model. Extremes, 17(2):263–287, 2014. [Sam06] Gennady Samorodnitsky. Long range dependence. Foundations R and Trends in Stochastic Systems, 1(3):163–257, 2006. [Sam16] Gennady Samorodnitsky. Stochastic processes and long range dependence. Springer Series in Operations Research and Financial Engineering. Springer, Cham, 2016.

674

References

[Seg03] Johan Segers. Functionals of clusters of extremes. Advances in Applied Probability, 35(4):1028–1045, 2003. [SH08] Allan Sly and Chris Heyde. Nonstandard limit theorem for infinite variance functionals. Annals of Probability, 36(2):796– 805, 2008. [Shi15] Zhan Shi. Branching random walks, volume 2151 of Lecture Notes in Mathematics. Springer, Cham, 2015. Lecture notes from the 42nd Probability Summer School held in ´ ´ e de Probabilit´es de Saint-Flour. Saint Flour, 2012, Ecole d’Et´ [Saint-Flour Probability Summer School]. [Smi88] Richard L. Smith. A counterexample concerning the extremal index. Advances in Applied Probability, 20(3):681–683, 1988. [Smi92] Richard L. Smith. The extremal index for a Markov chain. Journal of Applied Probability, 29(1):37–45, 1992. [SO12] Gennady Samorodnitsky and Takashi Owada. Tail measures of stochastic processes or random fields with regularly varying tails. Unpublished manuscript, 2012. [Sou19] Eliette Soulier. En r´eponse `a Cervant`es. Les listes dans La Arcadia de Lope de Vega (1598). In Olivier Biaggini et Philippe Gu´erin (eds), Entre les choses et les mots. Usages et prestiges des listes, pages 189–204. Presses Universit´e Sorbonne Nouvelle, 2019. [ST94] Gennady Samorodnitsky and Murad S. Taqqu. Stable non-Gaussian random processes. Stochastic Modeling. Chapman & Hall, New York, 1994. Stochastic models with infinite variance. [SW94] Richard L. Smith and Ishay Weissman. Estimating the extremal index. Journal of the Royal Statistical Society. Series B. Methodological, 56(3):515–528, 1994. [SZM17] Johan Segers, Yuwei Zhao, and Thomas Meinguet. Polar decomposition of regularly varying time series in star-shaped metric spaces. Extremes, 20(3):539–566, 2017. [Taq79] Murad S. Taqqu. Convergence of integrated processes of arbitrary Hermite rank. Zeitschrift f¨ ur Wahrscheinlichkeitstheorie und Verwandte Gebiete, 50(1):53–83, 1979. [Til18] Charles Tillier. Extremal index for a class of heavy-tailed stochastic processes in risk theory. In Nonparametric statistics, volume 250 of Springer Proc. Math. Stat., pages 171–183. Springer, Cham, 2018. [TK10a] Marta Tyran-Kami´ nska. Convergence to L´evy stable processes under some weak dependence conditions. Stochastic Processes and their Applications, 120(9):1629–1650, 2010. [TK10b] Marta Tyran-Kami´ nska. Functional limit theorems for linear processes in the domain of attraction of stable laws. Statistics and Probability Letters, 80(11-12):975–981, 2010.

References

675

[TK10c] Marta Tyran-Kami´ nska. Weak convergence to L´evy stable processes in dynamical systems. Stochastics and Dynamics, 10(2):263–289, 2010. [Vat84] Pirooz Vatan. Max-stable and max-infinitely divisible laws on infinite dimensional spaces. Ph.D. thesis, Massachusetts Institute of Technology, 1984. [Vat85] Pirooz Vatan. Max-infinite divisibility and max-stability in infinite dimensions. In Probability in Banach spaces, V (Medford, Mass., 1984), volume 1153 of Lecture Notes in Math., pages 400–425. Springer, Berlin, 1985. [vdVW96] Aad W. van der Vaart and Jon A. Wellner. Weak convergence and empirical processes. Springer, New York, 1996. [Ver72] Wim Vervaat. Functional central limit theorems for processes with positive drift and their inverses. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 23:245–253, 1972. [VR59] V. A. Volkonski˘ı and Yu. A. Rozanov. Some limit theorems for random functions. I. Theor. Probability Appl., 4:178–197, 1959. [Whi80] Ward Whitt. Some useful functions for functional limit theorems. Mathematics of Operations Research, 5(1):67–85, 1980. [Whi02] Ward Whitt. Stochastic-process limits. Springer-Verlag, New York, 2002. [WN98] Ishay Weissman and S. Yu. Novak. On blocks and runs estimators of the extremal index. Journal of Statistical Planning and Inference, 66(2):281–288, 1998. [Yun98] Seokhoon Yun. The extremal index of a higher-order stationary markov chain. Annals of Applied Probability, 8(2):408– 437, 1998. [Yun00] Seokhoon Yun. The distributions of cluster functionals of extreme events in a dth-order markov chain. Journal of Applied Probability, 37(1):29–44, 2000. [Zha16] Yuwei Zhao. Point processes in a metric space. ArXiv e-prints, November 2016.

Index

Symbols
$A$, 112
$A^c$, 112, 491
$B_n(s_0)$, 252
$J_1$ topology, 532
$N(G, \rho, \epsilon)$, 545
$Q(t)$, 12
$\mathrm{BL}_k$, 498
$\mathrm{CTE}_{|X|}(h)$, 136
$\widehat{A}_n$, 279
$\widehat{A}^*_{n,r_n}$, 285
$\widetilde{\nu}_T$, 236
$T_n$, 235
$T_n$, 253
$\widehat{A}_n$, 266, 281
$\widetilde{A}_n$, 286
$\widehat{G}_n$, 282
$I_h$, 319
$\widetilde{T}_n$, 236
$\widetilde{T}^{\dagger}_n$, 242
$a_{X,n}$, 12
$A$, 128
$A_1$, 128
$B$, 254
$\beta$-ARCH, 422
$W$, 236
$Y$, 106
$\bar{A}$, 491
cst, xi
$d_{J_1}$, 532
$d_{M_1}$, 541
$d_{\mathrm{BL}}$, 498
$d_\infty$, 105
$\epsilon$-enlargement, 24, 112
$E$, 121, 141, 251, 282
$E_w$, 121
$M_\psi$, 316
$T_\psi$, 280
inf arg max, 128
$\mathring{A}$, 491
$\int f(x)\,\mu(dx)$, 496
$\int f\,\mathrm{d}\mu$, 496
$\ell_0$, 150
$\mathcal{B}(M)$, 176
$S_{\psi,\mathrm{ei}}(r_n, u_n)$, 316
$S_{\mathrm{lin,ei}}(r_n, u_n)$, 320
$S_{\mathrm{log,pr}}(r_n, u_n)$, 325
$S_{\mathrm{pr}}(r_n, u_n)$, 325
$C_{d,2}$, 54
$M_B$, 509
$w_J$, 534
$w_M$, 543
$\mathcal{M}$, 174
$\mathcal{M}(E)$, 174
$\mathcal{N}$, 174
$\mathcal{N}(E)$, 174
$\mu(f)$, 496
$\mu_{0,h}$, 314
$\sigma_h$, 314
$\nu_{i,j}$, 103
$\otimes$, 496
$N_n$, 186
$N'_n$, 185
$N''_n$, 185
BCF, 290
$K$, 293
$D([0, 1])$, 531
$E^*$, 107
$S$, 107
$M_1$, 495
$\nu$, 104
$\nu^*$, 127
$\nu^*_{n,r_n}$, 153, 265
$\tau_{|X|}(h)$, 136
$\widehat{\lambda}_n$, 265
$\widehat{\lambda}_{h,n}$, 287
$\widetilde{\lambda}_n$, 274
$\widetilde{\lambda}_{h,n}$, 287
$\widehat{\nu}^*_{n,r_n}$, 265
$\widehat{\nu}^{\dagger}_{n,r_n}$, 269
$\widetilde{\nu}^*_{n,r_n}$, 274
$G_n$, 266
$G$, 278
$\widehat{M}_{h,n}$, 316
$M_h$, 316
$\widetilde{\ell}_0$, 154
$\vee, \wedge$, 24
$\widehat{G}_{n,\xi}$, 334
$\gamma_{n,k,\xi}$, 337
$f^{\leftarrow}$, 7
$m$-dependence, 151
$x_\alpha$, 135, 213
$Q_{|X|}$, 12
$\widetilde{T}_n$, 235
$\widehat{\nu}^*_{n,r_n,\xi}$, 333
conditional spectral tail process, 124, 126, 127, 217
VC-class, 550
VC-index, 550
VC-subgraph, 550
Càdlàg, 531
empirical cluster measure, 265
empirical cluster process, 266
tail empirical measure, 265
functional tail empirical process, 266
max-stable
  time series, 352
  vector, 350
ANSJB$(r_n, c_n)$, 162, 163, 165, 275, 294, 394
$S_{\mathrm{log}}(r_n, u_n)$, 256, 325, 326, 337, 338
$S_{\psi}(r_n, u_n)$, 280, 281, 286, 301, 303, 304
$\beta(r_n, \ell_n)$, 192, 245, 248, 249, 252, 253, 256, 266–269, 278, 281, 283, 286, 290–294, 315
$S(r_n, u_n)$, 243, 245, 247–249, 252, 253, 256, 283, 286
AC$(r_n, u_n)$, 267, 268, 269, 272, 274, 275, 277, 278, 290–294
AC$(r_n, c_n)$, 150, 163, 165–167, 188, 193, 224
ANSJ$(a_n)$, 219, 228, 229
ANSJU$(a_n)$, 221, 223, 224
$R(r_n, u_n)$, 242, 243, 245, 247–249, 252, 253, 256, 266–269, 272, 274, 275, 277, 278, 281, 283, 286, 290–294, 315
self-similar, 455
Semi-cone, 521
tail empirical distribution function, 235

A
absolute regularity, 570
anchoring map, 128
AR(1), 116, 137, 169, 202, 230, 260, 261, 395, 450
  extremal index, 396
ARCH(1), 400, 415

B
backshift, 103
bilinear process
  Markovian, 416
  non-Markovian, 416
blocks estimator
  cluster size distribution, 292
  extremal index, 291
  large deviation index, 295
  ruin index, 295
  stop-loss index, 292
Borel
  set, 495
  sigma-field, 495
boundedness, 505
  $B_0$, 200, 506
  localized, 505
Breiman's Lemma, 15

C
candidate extremal index, 123
  right-tail, 198
closure, 491
cluster, 153
cluster measure, 127
  cluster index, 154
  cluster size distribution, 160
  large deviation index, 166, 169
  mean cluster size, 160
  ruin index, 167, 169
  stop-loss index, 161
cluster size distribution, 160
  blocks estimator, 292
compatible metric, 509
complementary, 491
conditional scaling exponent, 64, 313
conditional tail expectation, 50
cone, 521
continuity set, 495
convergence
  vague$^{\#}$, 25, 174, 509
  local uniform, 499
covering number, 545

D
de Bruijn conjugate, 8
dilated set, 24
Dirac measure, 27

E
exponential AR(1), 413, 421
exponent measure, 25
  hidden, 55
extremal dependence measure (EDM), 48
extremal independence, 29
extremal index, 195
  AR(1), 396
  blocks estimator, 291
  candidate, 123
  candidate (right-tail), 198
  MA(∞), 438
  regenerative process, 404
  stochastic recurrence equation, 399
extremal skewness, 12
extremogram, 49

F
Fréchet distribution, 350
  scale parameter, 350
  tail index, 350
frontier, 112, 491
function
  left-continuous inverse, 7
  regularly varying, 4
  slowly varying, 4

G
GARCH(1,1), 401, 416
geometric drift, 389

H
Hermite expansion, 462
Hermite polynomials, 461
Hermite rank, 462
Hermite process, 463
hidden exponent measure, 55
hidden regular variation, 55
hidden tail index, 55
Hill estimator, 255
  AR(1), 260
  MA(1), 260

I
index of regular variation
  of random vector, 26
interior, 491
inverse function, 7

K
Karamata Theorem, 7
killed random walk, 404

L
Laplace functional, 177
large deviation index, 166, 169
  blocks estimator, 295
long memory, 437

M
MA(∞), 436
  extremal index, 438
MA(1), 115, 137, 169, 224, 230, 260
Markov chain, 374
  $V$-geometrically ergodic, 561
  aperiodic, 560
  functional representation, 375
  irreducible, 560
  small set, 560
Markov kernel, 373
mean measure, 176
mean cluster size, 160
measurable space, 494
measure
  boundedly finite, 25
metric, 492
  compatible, 176, 182
metric entropy, 545
mixing
  alpha, 570
  beta, 570
  strong, 570
mixing coefficients, 569
MMA(1), 116, 137
models
  $\beta$-ARCH, 422
  AR(1), 395
  ARCH(1), 400, 415
  bilinear process, 416
  exponential AR(1), 413, 421
  GARCH(1,1), 401, 416
  killed random walk, 404
  MA(∞), 436
  stochastic recurrence equation, 377, 397, 399
  switching exponential AR(1), 414
modulus of continuity, 534
$M_1$ topology, 543
multiplier statistics, 334

P
partial regular variation, 60
point measure, 174
point process, 176
  of exceedances, 185
  of exceedances (functional), 185
  simple, 176
Poisson point process, 177
  intensity measure, 177
Potter's bounds, 14
PPP($\nu$), 177
process
  separable, 545
  stochastically continuous, 536
Prokhorov Theorem, 501

Q
quantile function, 12

R
random measure, 176
regular variation
  composition, 9
  function, 4
  hidden, 55
  inverse, 8
  joint, 103
  multivariate, 25
  of sequences, 10
  partial, 60
relative compactness, 501
Representation Theorem, 6
ruin index, 167, 169
  blocks estimator, 295

S
scaling sequence, 24
semi-metric, 492
  random, 546
separable process, 545
set
  shift-invariant, 122
  homogeneous, 122
sign, 42
single big jump heuristics, 27
slowly varying function
  Karamata Theorem, 7
  Representation Theorem, 6
  Uniform Convergence Theorem, 3
slow variation
  function, 4
space
  $\ell^q(\mathbb{R}^d)$, 103
  $\ell_0(\mathbb{R}^d)$, 103
  Banach, 492
  complete, 492
  localized Polish, 506
  metrizable, 492
  Polish, 492
  separable, 491
  totally bounded, 545
spectral
  decomposition, 38
  measure, 39
  tail process, 107
spectral tail process, 120
stable distribution, 210
  Lévy process, 211
stable process
  Lévy fractional stable motion, 455
  LFSM, 455
stochastic recurrence equation, 397
  extremal index, 399, 432
  with heavy tailed innovations, 397
  with light tailed innovations, 377, 399
stochastic volatility, 44, 56
stop-loss index, 161
  blocks estimator, 292
survival function, 11

T
tail dependence coefficient, 50
tail empirical distribution
  univariate, 235
tail empirical process
  univariate, 236
tail empirical quantile function, 252
tail equivalence, 29
tail index, 11, 26
  hidden, 55
tail measure, 104, 120
tail process, 106, 120
  associated to a tail measure, 120
totally bounded, 545
total variation distance, 497
transition kernel, 373

U
Uniform Convergence Theorem, 3
upper quantile function, 13