MODERN PROBABILITY AND STATISTICS GENERALIZED POISSON MODELS AND THEIR APPLICATIONS IN INSURANCE AND FINANCE
ALSO AVAILABLE IN MODERN PROBABILITY AND STATISTICS
Robustness in Data Analysis: Criteria and Methods, G.L. Shevlyakov and N.O. Vilchevski
Asymptotic Theory of Testing Statistical Hypotheses: Efficient Statistics, Optimality, Power Loss and Deficiency, V.E. Bening
Selected Topics in Characteristic Functions, N.G. Ushakov
Chance and Stability. Stable Distributions and their Applications, V.M. Zolotarev and V.V. Uchaikin
Normal Approximation: New Results, Methods and Problems, V.V. Senatov
Modern Theory of Summation of Random Variables, V.M. Zolotarev
MODERN PROBABILITY AND STATISTICS
Generalized Poisson Models and their Applications in Insurance and Finance
Vladimir E. Bening and Victor Yu. Korolev
Moscow State University
UTRECHT · BOSTON · KÖLN · TOKYO, 2002
VSP BV
P.O. Box 346
3700 AH Zeist
The Netherlands
Tel: +31 30 692 5790
Fax: +31 30 693 2081
[email protected]
www.vsppub.com
© VSP BV 2002
First published in 2002
ISBN 90-6764-366-1
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
Printed in The Netherlands by Ridderprint bv, Ridderkerk.
Contents

Foreword
Preface

1 Basic notions of probability theory
1.1 Random variables, their distributions and moments
1.2 Generating and characteristic functions
1.3 Random vectors. Stochastic independence
1.4 Weak convergence of random variables and distribution functions
1.5 Poisson theorem
1.6 Law of large numbers. Central limit theorem. Stable laws
1.7 The Berry-Esseen inequality
1.8 Asymptotic expansions in the central limit theorem
1.9 Elementary properties of random sums
1.10 Stochastic processes

2 Poisson process
2.1 The definition and elementary properties of a Poisson process
2.2 Poisson process as a model of chaotic displacement of points in time
2.3 The asymptotic normality of a Poisson process
2.4 Elementary rarefaction of renewal processes

3 Convergence of superpositions of independent stochastic processes
3.1 Characteristic features of the problem
3.2 Approximation of distributions of randomly indexed random sequences by special mixtures
3.3 The transfer theorem. Relations between the limit laws for random sequences with random and non-random indices
3.4 Necessary and sufficient conditions for the convergence of distributions of random sequences with independent random indices
3.5 Convergence of distributions of randomly indexed sequences to identifiable location or scale mixtures. The asymptotic behavior of extremal random sums
3.6 Convergence of distributions of random sums. The central limit theorem and the law of large numbers for random sums
3.7 A general theorem on the asymptotic behavior of superpositions of independent stochastic processes
3.8 The transfer theorem for random sums of independent identically distributed random variables in the double array limit scheme

4 Compound Poisson distributions
4.1 Mixed and compound Poisson distributions
4.2 Discrete compound Poisson distributions
4.3 The asymptotic normality of compound Poisson distributions. The Berry-Esseen inequality for Poisson random sums. Non-central Lyapunov fractions
4.4 Asymptotic expansions for compound Poisson distributions
4.5 The asymptotic expansions for the quantiles of compound Poisson distributions
4.6 Exponential inequalities for the probabilities of large deviations of Poisson random sums. An analog of Bernshtein-Kolmogorov inequality
4.7 The application of Esscher transforms to the approximation of the tails of compound Poisson distributions
4.8 Estimates of convergence rate in local limit theorems for Poisson random sums

5 Classical risk processes
5.1 The definition of the classical risk process. Its asymptotic normality
5.2 The Pollaczek-Khinchin-Beekman formula for the ruin probability in the classical risk process
5.3 Approximations for the ruin probability with small safety loading
5.4 Asymptotic expansions for the ruin probability with small safety loading
5.5 Approximations for the ruin probability
5.6 Asymptotic approximations for the distribution of the surplus in general risk processes
5.7 A problem of inventory control
5.8 A non-classical problem of optimization of the initial capital

6 Doubly stochastic Poisson processes (Cox processes)
6.1 The asymptotic behavior of random sums of random indicators
6.2 Mixed Poisson processes
6.3 The modified Pollaczek-Khinchin-Beekman formula
6.4 The definition and elementary properties of doubly stochastic Poisson processes
6.5 The asymptotic behavior of Cox processes

7 Compound Cox processes with zero mean
7.1 Definition. Examples
7.2 Conditions of convergence of the distributions of compound Cox processes with zero mean. Limit laws
7.3 Convergence rate estimates
7.4 Asymptotic expansions for the distributions of compound Cox processes with zero mean
7.5 Asymptotic expansions for the quantiles of compound Cox processes with zero mean
7.6 Exponential inequalities for the probabilities of large deviations of compound Cox processes with zero mean
7.7 Limit theorems for extrema of compound Cox processes with zero mean
7.8 Estimates of the rate of convergence of extrema of compound Cox processes with zero mean

8 Modeling evolution of stock prices by compound Cox processes
8.1 Introduction
8.2 Normal and stable models
8.3 Heterogeneity of operational time and normal mixtures
8.4 Inhomogeneous discrete chaos and Cox processes
8.5 Restriction of the class of mixing distributions
8.6 Heavy-tailedness of scale mixtures of normals
8.7 The case of elementary increments with non-zero means
8.8 Models within the double array limit scheme
8.9 Quantiles of the distributions of stock prices

9 Compound Cox processes with nonzero mean
9.1 Definition. Examples
9.2 Conditions of convergence of compound Cox processes with nonzero mean. Limit laws
9.3 Convergence rate estimates for compound Cox processes with nonzero mean
9.4 Asymptotic expansions for the distributions of compound Cox processes with nonzero mean
9.5 Asymptotic expansions for the quantiles of compound Cox processes with nonzero mean
9.6 Exponential inequalities for the negative values of the surplus in collective risk models with stochastic intensity of insurance payments
9.7 Limit theorems for extrema of compound Cox processes with nonzero mean
9.8 Convergence rate estimates for extrema of compound Cox processes with nonzero mean
9.9 Minimum admissible reserve of an insurance company with stochastic intensity of insurance payments
9.10 Optimization of the initial capital of an insurance company in a static insurance model with random portfolio size

10 Functional limit theorems for compound Cox processes
10.1 Functional limit theorems for non-centered compound Cox processes
10.2 Functional limit theorems for nonrandomly centered compound Cox processes

11 Generalized risk processes
11.1 The definition of generalized risk processes
11.2 Conditions of convergence of the distributions of generalized risk processes
11.3 Convergence rate estimates for generalized risk processes
11.4 Asymptotic expansions for the distributions of generalized risk processes
11.5 Asymptotic expansions for the quantiles of generalized risk processes
11.6 Exponential inequalities for the probabilities of negative values of generalized risk processes

12 Statistical inference concerning the parameters of risk processes
12.1 Statistical estimation of the ruin probability in classical risk processes
12.2 Specific features of statistical estimation of ruin probability for generalized risk processes
12.3 A nonparametric estimator of the ruin probability for a generalized risk process
12.4 Interval estimator of the ruin probability for a generalized risk process
12.5 Computational aspects of the construction of confidence intervals for the ruin probability in generalized risk processes

Bibliography
Index
Foreword

This book is the seventh in the series of monographs 'Modern Probability and Statistics', following the books

• V.M. Zolotarev, Modern Theory of Summation of Random Variables;
• V.V. Senatov, Normal Approximation: New Results, Methods and Problems;
• V.M. Zolotarev and V.V. Uchaikin, Chance and Stability. Stable Distributions and their Applications;
• N.G. Ushakov, Selected Topics in Characteristic Functions;
• V.E. Bening, Asymptotic Theory of Testing Statistical Hypotheses: Efficient Statistics, Optimality, Power Loss, and Deficiency;
• G.L. Shevlyakov and N.O. Vilchevski, Robustness in Data Analysis: Criteria and Methods.

The Russian school of probability theory and mathematical statistics has made a universally recognized contribution to these sciences. Its potential is not only very far from being exhausted, but is still increasing. During the last decade there appeared many remarkable results, methods and theories which undoubtedly deserve to be presented in monographic literature in order to make them widely known to specialists in probability theory, mathematical statistics and their applications.

However, due to recent political changes in Russia followed by some economic instability, it is for the time being rather difficult to organize the publication of a scientific book in Russia. Therefore, a considerable stock of knowledge accumulated during recent years still remains scattered over various scientific journals. To improve this situation somehow, together with the VSP publishing house and, first of all, its director, Dr. Jan Reijer Groesbeek, who readily took up the idea, we present this series of monographs.

The scope of the series can be seen from both the title of the series and the titles of the published and forthcoming books:

• Yu.S. Khokhlov, Generalizations of Stable Distributions: Structure and Limit Theorems;
• A.V. Bulinski and M.A. Vronski, Limit Theorems for Associated Random Variables;
• B.V. Gnedenko and A.N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables, 2nd English edition;
• P.P. Bocharov, C. D'Apice, A.V. Pechinkin and S. Salerno, Queueing Theory;
• M.V. Mikhalevich and I.V. Sergienko, Randomized Methods of Optimization through Preference Relations;
• E.V. Morozov, General Queueing Networks: the Method of Regenerative Decomposition;
• A.N. Chuprunov, Random Processes Observed at Random Times;
• D.H. Mushtari, Probabilities and Topologies on Linear Spaces;
• V.G. Ushakov, Priority Queueing Systems;
• V.Yu. Korolev and V.M. Kruglov, Random Sequences with Random Indices;
• Yu.V. Prokhorov and A.P. Ushakova, Reconstruction of Distribution Types;
• L. Szeidl and V.M. Zolotarev, Limit Theorems for Random Polynomials and Related Topics;

as well as many others.

To ensure a highly qualified international review of the proposed books, we invited well-known specialists to join the Editorial Board. All of them kindly agreed, so now the Editorial Board of the series is as follows:

L. Accardi (University Roma Tor Vergata, Rome, Italy)
A. Balkema (University of Amsterdam, the Netherlands)
M. Csörgő (Carleton University, Ottawa, Canada)
W. Hazod (University of Dortmund, Germany)
V. Korolev (Moscow State University, Russia), Editor-in-Chief
V. Kruglov (Moscow State University, Russia)
M. Maejima (Keio University, Yokohama, Japan)
J. D. Mason (University of Utah, Salt Lake City, USA)
E. Omey (EHSAL, Brussels, Belgium)
K. Sato (Nagoya University, Japan)
J. L. Teugels (Katholieke Universiteit Leuven, Belgium)
A. Weron (Wroclaw University of Technology, Poland)
M. Yamazato (University of Ryukyu, Japan)
V. Zolotarev (Steklov Institute of Mathematics, Moscow, Russia), Editor-in-Chief
We hope that the books of this series will be interesting and useful to both specialists in probability theory, mathematical statistics and those professionals who apply the methods and results of these sciences to solving practical problems. Of course, the choice of authors primarily from Russia is due only to the reasons mentioned above and by no means signifies that we prefer to keep to some national policy. We invite authors from all countries to contribute their books to this series.
V. Yu. Korolev, V. M. Zolotarev, Editors-in-Chief
Moscow, December 2001.
Preface

This book can be regarded as an ode to the Poisson distribution. The analytic properties of this distribution, its capacity for flexible generalization and its many important practical applications demonstrate that this law fully deserves such a solemn attitude.

As is known, ordinary Poisson point processes are very good mathematical models of absolutely chaotic displacement of points on the line, on the plane or in three-dimensional space (that is, of 'discrete chaos'). Therefore, they are widely used in many applied problems to model 'absolutely stochastic' flows of events, that is, of events which are chaotically distributed in time, say, in queueing theory, physics, biology and reliability theory. These models are of special importance for insurance and financial applications, where they describe flows of insurance claims or flows of events at stock exchanges. Classical risk theory is based on the assumption that the process of insurance payments is Poisson.

However, in actual life, as a rule, the observed processes are far from their ideal mathematical models. Therefore, the Poisson models are generalized in various ways in order to create more flexible and adequate models of real phenomena. As P. C. Consul noted in the preface to his book Generalized Poisson Distribution. Properties and Applications, depending on the methods used, researchers have used such expressions as compound, displaced, double, mixed, modified, generalized, Lagrangian and inflated to describe new distributions derived from the Poisson law by the corresponding mechanism of generalization. Nevertheless, without underestimating the role of other ways of generalization, we dare say that the two main and natural directions of generalization of Poisson models are compounding and mixing.

Compound Poisson processes are sums of independent identically distributed random variables summed up to a Poisson process, that is, with the number of summands given by a Poisson process.
These models are widely used in physics, biology and other fields of natural science. They are traditionally of special importance in insurance, where they are used to describe the surplus of an insurance company within the classical models of collective risk theory. Moreover, compound Poisson distributions play a very important role in the theory of limit theorems for sums of independent random variables, where the method of accompanying infinitely divisible compound Poisson laws is well known (see, e.g., (Gnedenko
and Kolmogorov, 1954; Kruglov, 1976)) and has proved to be very efficient.

The other way of generalization of Poisson models is mixing. In actual life, the observed chaos is never homogeneous. Therefore, it is more reasonable to model 'inhomogeneous discrete chaos,' that is, an inhomogeneous chaotic point process, by means of Poisson processes with stochastic intensities. These processes are called doubly stochastic Poisson processes or Cox processes. They appear to be very important mathematical models of real processes since, first, being more general than Poisson processes, they are more flexible and hence more adequate models, and, second, they inherit many favorable analytical properties of Poisson processes.

Some very good books have been published on generalized Poisson models. Along with Handbook of the Poisson Distribution by F. A. Haight, published by Wiley in 1967, we should mention three books by Jan Grandell: Doubly Stochastic Poisson Processes, published by Springer in 1976, Aspects of Risk Theory, published by Springer in 1991, and Mixed Poisson Processes, published by Chapman and Hall in 1997. These books mainly deal with characterization properties of compound and mixed Poisson processes and describe some of their applications in insurance. But the properties of compound Cox processes, that is, of sums of independent identically distributed random variables up to a Cox process, as well as the asymptotic properties of (compound) Cox processes under infinitely increasing (in some sense) cumulative intensity, fell outside the contents of the books mentioned above. Yet they are very important, especially from the point of view of insurance and financial applications, where they provide good asymptotic approximations for basic characteristics such as the distributions of the surplus of an insurance company under risk fluctuations or of increments of stock prices under non-constant intensity of trade.

The proposed book aims at filling this gap.
It describes the present state of the art in the field of compound Cox processes and their applications in insurance and finance. Along with a review of the well-known classical results on compound and mixed Poisson processes and risk theory, it contains many new results recently obtained by the authors. Among the new theoretical results presented in the book we should mention new convergence criteria, convergence rate estimates, asymptotic expansions for quantiles of stochastic processes and many others. These results were scattered over many journal papers and are now collected together. However, some theoretical applications of compound Poisson models remained beyond the limits of this book. For example, we had to omit all the material related to the method of accompanying infinitely divisible distributions (which are compound Poisson). The main idea of the book is to concentrate on generalized Poisson models oriented toward applied problems, first of all, in insurance and finance.

Among the applied problems considered in the book, four deserve special mention. The first of them is the problem of prediction of stock prices. The classical
theory of financial markets assumes that the process of stock prices is the so-called geometric Brownian motion, so that the increments of stock prices should have a log-normal distribution. But the actual distributions observed in practice are considerably more leptokurtic and have heavier tails. Although many alternative models have been proposed to describe this departure from log-normality, there has been no systematic mathematical basis for these considerations. Based on compound Cox processes, it is possible to construct a rather complete mathematical theory describing this phenomenon. The key point is that the intensities of flows of events at stock exchanges are substantially non-constant, so that the flows of events themselves are inhomogeneous. Cox processes turn out to be a very convenient tool for modeling these flows. In particular, the theory based on the asymptotic properties of compound Cox processes can explain how heavy-tailed distributions, for example, stable laws, can occur as the distributions of stock price increments even when the jumps of the stock price process have finite variances. As an alternative to the classical (although rather inadequate) model of geometric Brownian motion, our constructions quite naturally (through functional limit theorems) lead to a model of stock price evolution which can be called geometric Brownian motion in a random medium. Incidentally, the models proposed in the book are rather general and can be successfully used to describe similar phenomena occurring not only in finance, but also in other fields, e.g., the physics of turbulent plasma.

The second problem is connected with the description of the asymptotic behavior of the so-called generalized risk processes. These processes are natural generalizations of the classical risk process with constant premium rate and Poisson flow of claims.
In order to construct a more flexible mathematical model for the surplus of an insurance company it is reasonable to take into account both risk and portfolio fluctuations. It can be shown that under risk fluctuations (non-constant intensity of insurance payments), in reasonable strategies of the insurer the premium rate or, which is in some sense the same, the current size of the portfolio must not be constant. On the other hand, the intensity of payments should be proportional to the current number of insurance contracts in the portfolio, so that the cumulative stochastic intensity of payments should be proportional to the total number of contracts in the portfolio or, which is in some sense the same, to the cumulative premiums. This reasoning quite naturally leads us to the generalized risk processes, which are obtained from classical risk processes by means of a stochastic change of time. In the book, the asymptotic behavior of these processes is investigated in full detail. The results presented in the book provide serious theoretical grounds for the construction of reasonable (asymptotic) approximations for the distribution of the surplus of an insurance company.

The third problem is that of statistical estimation of the probability of ruin for a generalized risk process. Usually, the ruin probability is not estimated directly. There are very many analytical results dealing with the construction of expressions asymptotically equivalent to the ruin probability (under infinitely growing initial capital and/or small safety loading) or of (two-sided) bounds for the ruin probability. According to the traditional approach, the parameters of these asymptotic expressions, say, the Lundberg exponent (see (Grandell, 1991)), or of these bounds are replaced by their statistical estimators, and the resulting expression is considered as the statistical estimator for the ruin probability. However, each of the analytical (asymptotic) estimates requires information on the behavior of the tails of the distributions of claims, whereas in practice this information can hardly be obtained since statistical inference concerning these distributions is based on samples of finite size. Therefore, from the practical viewpoint it is very reasonable to follow another approach and to construct direct statistical estimators for the ruin probability based on the pre-history of the risk process. This problem is essentially new. Only one paper, due to Croux and Veraverbeke, has been devoted to this subject. In the book, developing and generalizing the result of this pathbreaking paper, both point and interval (two-sided) statistical estimators are constructed for the ruin probability for the so-called generalized risk process characterized by both stochastic portfolio and risk fluctuations, that is, by the stochastic growth of the reserve of an insurance company due to insurance premiums and the stochastic intensity of payments.

The fourth problem is related to a non-traditional approach to the determination of optimal values of basic parameters of a risk process, namely, the starting capital of an insurance company and the premium rate. The point is that the probability of non-ruin traditionally used as an optimality criterion sometimes seems to be an unrealistic characteristic of the functioning of an insurance company.
First, in real practice it is useless to plan the activity of an insurance company for an infinite time interval and, second, the parameters of the risk process do not remain unchanged following economic or political changes. Therefore, in the proposed book it is suggested to use a somewhat different optimality criterion related to the total costs of the functioning of an insurance company within a finite time interval. According to one of the possible criteria, the solution is surprising: the optimal starting capital should be negative, that is, one can start an insurance business having debts or taking credits. This problem is a re-formulation of a problem of optimal inventory control also considered in the book.

The methods we use in the book determined the choice of the material presented. We tried to manage without sophisticated results of the theory of stochastic processes and restricted ourselves to the apparatus usually included in all basic university courses of probability theory for applied mathematicians, engineers or physicists. However, for the sake of completeness we included in the book a separate chapter devoted to the simplest functional limit theorems for compound Cox processes, although to prove them we consider the double array limit scheme, unlike the rest of the book, where we deal with the scheme of 'growing' sums. Actually, in the book we mainly deal with
finite-dimensional distributions (more exactly, even one-dimensional distributions) of generalized Poisson processes. This turns out to be enough for a better understanding of what happens in many applied problems in insurance and finance.

We tried to make our book self-contained. For this purpose the main part of the book is preceded by an introductory chapter (Chapter 1) containing a brief account of the basic results of probability theory used in the sequel, so that the reader does not have to refer repeatedly to other sources. We avoided rigorous proofs of the well-known results presented in this chapter, otherwise the volume of the book could become catastrophically large. Only some of the results are supplied with proofs. This was done for methodical purposes when either the results are not traditional for basic courses of probability theory or the proofs themselves are not traditional.

Chapter 2 is devoted to the properties of Poisson processes. To provide grounds for using them as models of 'discrete chaos,' special emphasis is placed on the extremal entropy properties of these processes.

The book also contains some auxiliary results of the asymptotic theory of superpositions of independent stochastic processes, in particular, of random sequences with random indices and random sums, gathered in Chapter 3.

Along with compound Cox processes, some interesting theoretical and applied problems are considered in the book for compound Poisson processes (Chapter 4), namely, those related to convergence rate estimates, Bernshtein-Kolmogorov type inequalities for the probabilities of large deviations and asymptotic expansions for the quantiles. Compound Poisson distributions and processes play a very important role in classical risk theory. The results presented in Chapter 4 can be applied to practical evaluations of basic characteristics within static insurance models (the so-called individual risk models).
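As a concrete illustration of the compound Poisson random sums discussed above, the following Python sketch simulates S = X_1 + ... + X_N, where N is the number of points of a Poisson process with rate λ in the unit interval and the X_i are independent claims. The exponential claim distribution, the parameter values and all function names are illustrative assumptions, not taken from the book. The sketch compares sample moments with the standard identities E S = λ E X and Var S = λ E X².

```python
import random


def poisson_count(rate, rng):
    """Count the points of a Poisson process with intensity `rate` in [0, 1].

    The inter-arrival times of a Poisson process are independent
    exponential variables with mean 1/rate.
    """
    count = 0
    t = rng.expovariate(rate)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(rate)
    return count


def compound_poisson_sample(rate, claim_mean, rng):
    """Draw one value of S = X_1 + ... + X_N with N Poisson(rate).

    Claims are taken to be exponential with mean `claim_mean`
    (an illustrative assumption).
    """
    n = poisson_count(rate, rng)
    return sum(rng.expovariate(1.0 / claim_mean) for _ in range(n))


def sample_moments(rate, claim_mean, n_samples, seed=0):
    """Return the sample mean and unbiased sample variance of S."""
    rng = random.Random(seed)
    xs = [compound_poisson_sample(rate, claim_mean, rng) for _ in range(n_samples)]
    mean = sum(xs) / n_samples
    var = sum((x - mean) ** 2 for x in xs) / (n_samples - 1)
    return mean, var


if __name__ == "__main__":
    rate, claim_mean = 5.0, 2.0
    mean, var = sample_moments(rate, claim_mean, 20000, seed=1)
    # For exponential claims with mean m: E X = m and E X^2 = 2 m^2.
    print("sample mean    :", mean, " theory:", rate * claim_mean)
    print("sample variance:", var, " theory:", rate * 2 * claim_mean ** 2)
```

With these illustrative parameters the theoretical mean is 10 and the theoretical variance is 40; the simulated values should be close to these but will vary with the seed.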
Chapter 5 is devoted to the dynamic insurance models (the so-called collective risk models). The main object of interest is the classical risk process. For the sake of completeness of presentation, the well-known Beekman-Khinchin-Pollaczek formula, which expresses the ruin probability in terms of geometric convolutions, is derived here. Along with the classical asymptotic results due to Lundberg and Cramér, some new estimates are presented concerning the behavior of the ruin probability with large initial capital or with small safety loading. In the latter case, new results are presented giving refined formulas for the ruin probability based on asymptotic expansions. The main part of this chapter formally deals not with Poisson but with geometric random sums (that is, sums of a random number of independent identically distributed random variables with the number of summands being geometrically distributed). However, since the geometric distribution can be represented as both a mixed and a compound Poisson distribution, this material does not fall outside the framework of the book. Moreover, from this point of view the recently published book Geometric Sums: Bounds for Rare Events with Applications. Risk Analysis, Reliability, Queueing by
V. V. Kalashnikov (Kluwer Academic Publishers, 1997) can be regarded as allied. Also, in Chapter 5 the asymptotic approximations are discussed for the distribution of the surplus of an insurance company for general risk processes with non-Poisson flows of claims. A non-classical optimization problem is also considered here (the fourth problem mentioned above). The material of this chapter can be used by both lecturers and students as a basis for a course in collective risk theory.

Chapter 6 deals with doubly stochastic Poisson processes and, in particular, with mixed Poisson distributions. Of methodical interest is the modified Beekman-Khinchin-Pollaczek formula for the ruin probability in the classical risk process presented here. This formula connects the ruin probability with the probability that a certain geometric random sum exceeds the value of the initial capital. But, as is known, the geometric distribution is a mixed Poisson distribution with exponential mixing distribution. Therefore, we can represent the ruin probability as the probability that a mixed Poisson random sum exceeds the value of the initial capital. This trick makes it possible to use the properties of Poisson random sums for the investigation of some properties of ruin probabilities.

The asymptotic behavior of compound Cox processes with zero mean essentially differs from that of compound Cox processes with nonzero expectation, since to obtain non-trivial limit laws we should use different normalizations of the controlling process in these two cases. Therefore, these two cases are considered in two separate chapters of the book (Chapters 7 and 9, respectively). The asymptotic methods of the analysis of compound Cox processes with zero mean developed in Chapter 7 are used in Chapter 8 for the description of the evolution of stock prices (see the first problem mentioned above). Functional limit theorems for compound Cox processes are proved in Chapter 10.
These theorems provide serious grounds for the consideration of geometric subordinated Lévy processes (more exactly, geometric subordinated Brownian motion) as reasonable macro-models of the evolution of stock prices. Chapter 11 deals with generalized risk processes and includes the description of the asymptotic behavior of their various characteristics (see the second problem mentioned above). A separate chapter (Chapter 12) is devoted to statistical inference concerning the ruin probability for generalized risk processes (see the third problem mentioned above).

The book is intended for specialists in applied probability and for those who use models and methods of probability theory to solve practical problems in the fields of insurance and finance. It can be used as a textbook to support courses in applied probability, mathematical risk theory, insurance mathematics and mathematical models of the evolution of financial indexes at graduate and postgraduate levels. The material of the book was the subject of the courses the authors taught at the Faculty of Computational Mathematics and Cybernetics, Moscow State University, and of the lectures the authors gave
in Vologda State Pedagogical University (Russia), Nankai University (Tianjin, China), Xian Northwestern Polytechnical University (China), and Maria Curie-Sklodowska University (Lublin, Poland). In the process of writing this book we felt a non-formal moral support from our colleagues and friends. Our special thanks are to V. Zolotarev for encouraging discussions and to V. Kruglov for stimulating criticism. V. Kalashnikov and S. Shorgin were very kind to look through the draft version of some sections and gave many useful advises. Their support, in part, took the form of their new results which they sacrificed to our book. P. Embrechts drew our attention to a very interesting material on the approximation methods for the distributions of random sums, the fragments of which were included in the text. We also thank A. Balkema, M. Csôrgô, E. Omey and J. Teugels whose recommendations by all means improved the structure of the book and A. Kolchin for computer editing of the manuscript. Our effective work on the book would have been impossible if not a very friendly atmosphere at the Department of Mathematical Statistics of the Moscow State University headed by Yu. Prokhorov. But of course, we felt main inspiration in the tolerant and optimistic attitude of our families to this work. The book contains results of research supported by the Russian Foundation for Basic Research, projects 99-01-00846, 99-01-00847, 00-01-00360, 0001-00657,00-02-17507; Russian Humanitarian Scientific Foundation, project 00-02-00152a; INTAS, project IR 97-537; Committee on Knowledge Extension Research (CKER) of the North American Society of Actuaries. V. E. Bening V. Yu. Korolev Moscow-Savelovo-Zhavoronki,
August, 2001
1
Basic notions of probability theory

This chapter introduces basic notions and theorems of probability theory which will be used extensively throughout the rest of the book. In most cases, the proofs of the statements are omitted. The interested reader can find them in such well-known sources as (Cramér, 1937; Doob, 1953; Gnedenko and Kolmogorov, 1954; Loève, 1977; Feller, 1971; Lukacs, 1970; Neveu, 1965; Petrov, 1975). The reader who is well acquainted with probability theory may skip this chapter and start reading from Chapter 2, returning to Chapter 1 only for reference where necessary.
1.1. Random variables, their distributions and moments

Let $\Omega$ be a non-empty set of elements $\omega$. These elements will be called elementary events or elementary outcomes and the set $\Omega$ will be called the space of elementary outcomes. Let $\mathfrak{A}$ be a set of subsets of the space of elementary outcomes $\Omega$ possessing the following properties:
• $\Omega \in \mathfrak{A}$;
• if $B \in \mathfrak{A}$, then $B^c \in \mathfrak{A}$;
• if $B_i \in \mathfrak{A}$, $i = 1, 2, \ldots$, then $\bigcup_{i=1}^{\infty} B_i \in \mathfrak{A}$ as well.
The set $\mathfrak{A}$ is called a $\sigma$-algebra of events or a Borel field of events and its elements are called events.
The set $\Omega$ together with a $\sigma$-algebra $\mathfrak{A}$ of its subsets constitutes a measurable space $(\Omega, \mathfrak{A})$. It is obvious that the systems of sets
$$\mathfrak{A}_* = \{\emptyset, \Omega\}, \qquad \mathfrak{A}^* = \{B : B \subseteq \Omega\}$$
are $\sigma$-algebras. Here $\mathfrak{A}_*$ is the trivial or 'poorest' $\sigma$-algebra, whereas $\mathfrak{A}^*$ is the 'richest' $\sigma$-algebra consisting of all the subsets of $\Omega$.

A measure (a $\sigma$-additive function of sets) $P$ defined on $\mathfrak{A}$ and normalized by the condition $P(\Omega) = 1$ is called a probability measure or a probability. If $B \in \mathfrak{A}$, then the value $P(B)$ is called the probability of the event $B$. The triple $(\Omega, \mathfrak{A}, P)$ is called a probability space or a probability model.

Any real function $X = X(\omega)$ defined on $\Omega$ maps the set of elementary outcomes into the real line $\mathbb{R}^1 = \mathbb{R}$. Let $A$ be an arbitrary set of real numbers, that is, let $A \subseteq \mathbb{R}$. Define the set $X^{-1}(A) = \{\omega : X(\omega) \in A\}$, which is a subset of $\Omega$ and is called the preimage of the set $A$. Define the class $\mathfrak{B}$ of Borel sets on the real line as the least $\sigma$-algebra containing all open sets (or all intervals). Hence, a Borel set is a set which can be constructed from intervals using the operations of complement and countable union or countable intersection. If $X^{-1}(A) \in \mathfrak{A}$ for any Borel set $A \in \mathfrak{B}$, then the function $X(\omega)$ is called measurable. A real finite measurable function is called a random variable. The simplest example of a non-trivial random variable is the indicator $\mathbf{1}_B(\omega)$ of a set $B \in \mathfrak{A}$, which equals 1 if $\omega \in B$ and 0 otherwise.

Another example of a random variable is a discrete random variable, which takes at most countably many different values $\{x_1, x_2, \ldots\}$. Obviously, the events $B_i = \{\omega : X(\omega) = x_i\}$ are disjoint and $\bigcup_i B_i = \Omega$. Denote $p_i = P(B_i)$. The set $\{(x_i, p_i)\}_{i \ge 1}$ is called the distribution of the discrete random variable $X$. In what follows, for brevity, instead of $P(\{\omega : X(\omega) \in A\})$ we shall write $P(X \in A)$. The distribution of a discrete random variable $X$ completely determines the probabilities for $X$ to fall into any Borel set: if $A \in \mathfrak{B}$, then
$$P(X \in A) = \sum_{i:\, x_i \in A} p_i.$$
Note that the requirement that a random variable must be a measurable function of an elementary outcome guarantees the possibility to consider sets
of the form $\{\omega : X(\omega) \in A\}$ as events whatever the Borel set $A$ is and, hence, the probabilities for a random variable to fall into any Borel set are defined. Therefore, we can consider the probability measure $P_X$ defined on the set $\mathfrak{B}$ of all Borel sets by the relation
$$P_X(A) = P(\{\omega : X(\omega) \in A\}), \quad A \in \mathfrak{B}.$$
This probability measure is called the distribution of the random variable $X$. Thus, any random variable $X$ generates a new probability space $(\mathbb{R}, \mathfrak{B}, P_X)$.

Consider the probability $P(X \in A)$ in the case where $A = (-\infty, x)$. In this case we set
$$F_X(x) \equiv F(x) = P(X < x),$$
the distribution function of $X$. [...] A point $x$ is called a growth point of the distribution function $F$ if $F(x + \delta) - F(x - \delta) > 0$ for any $\delta > 0$. The set of all growth points of $F$ will be called the spectrum of $F$.

Among all measures on $(\mathbb{R}, \mathfrak{B})$, a special role is played by the Lebesgue measure, which assigns to any interval $(a, b)$ a measure equal to its length $b - a$; hence, any one-point set has zero Lebesgue measure, which is quite natural for continuous models.

Incidentally, most textbooks bypass a very interesting question, namely, why a specific $\sigma$-algebra of events must be considered in order to define a probability model (a probability space). It might seem much simpler always to take the set of all subsets of $\Omega$ (or of $\mathbb{R}$, if a measure is to be defined on subsets of the real line) as the set of events. However, in general, this turns out to be impossible. Namely, in 1930 S. M. Ulam proved
his famous theorem that a finite measure defined on all the subsets of a set whose cardinality is that of the continuum is identically equal to zero if it is equal to zero on every subset containing exactly one element (for a brief proof see, e.g., (Oxtoby, 1971, Chapter 5)).

Let $\mu$ be a measure defined on the measurable space $(\mathbb{R}, \mathfrak{B})$. A distribution $P_X$ is called absolutely continuous with respect to $\mu$ if there exists a nonnegative function $p(x)$ such that
$$P_X(A) = \int_A p(y)\,\mu(dy)$$
for any $A \in \mathfrak{B}$ (here the integral is understood in the Lebesgue sense). In this case the function $p(x)$ is called the density of the random variable $X$. A discrete distribution is not absolutely continuous with respect to the Lebesgue measure, but is absolutely continuous with respect to the counting measure, which assigns to any set $A \in \mathfrak{B}$ the value equal to the number of those points of $\{x_i\}$ which fall into $A$. Furthermore,
In what follows, a distribution absolutely continuous with respect to the Lebesgue measure will be called simply absolutely continuous. It can be shown that the distribution of a random variable $X$ is absolutely continuous if $P(X \in A) = 0$ for any set $A \in \mathfrak{B}$ with zero Lebesgue measure. For an absolutely continuous distribution we have $p(x) = F'(x)$.

Since every distribution function is monotone, we can immediately list some general properties of $F(x)$, the proofs of which can be found in textbooks on the theory of real functions (see, e.g., (Natanson, 1961)).

THEOREM 1.1.1. A distribution function $F(x)$ has at most a finite number of jump points at which the jumps are greater than or equal to $\delta > 0$ and, hence, has at most countably many discontinuity points. The derivative $F'(x)$ of $F(x)$ exists at almost all points $x$.

THEOREM 1.1.2. Any distribution function $F(x)$ can be uniquely represented as the sum of three components:
$$F(x) = a_1 F_1(x) + a_2 F_2(x) + a_3 F_3(x),$$
where $a_1$, [...] $> 0$, then $x$ is called a possible value of the random variable $X$.

A random variable $X$ has a lattice distribution if the set of all its possible values has the form $\{b + nh,\ n = 0, \pm 1, \pm 2, \ldots\}$, where $b$ and $h > 0$ are fixed numbers. Here the number $h$ is called the step of the distribution. The greatest step of a lattice distribution is called the span of the distribution.

Consider some special distributions which will often be used in what follows.

THE DEGENERATE DISTRIBUTION. A random variable $X$ has the degenerate distribution concentrated at a point $a \in \mathbb{R}$ if
$$P(X = a) = 1. \qquad (1.1.2)$$
In this case
$$F(x) = \begin{cases} 0, & x \le a, \\ 1, & x > a. \end{cases}$$

THE BINOMIAL DISTRIBUTION. A random variable $X$ has the binomial distribution with parameters $(n, p)$ ($0 < p < 1$, $n \ge 1$) if
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, \ldots, n. \qquad (1.1.3)$$
A binomially distributed random variable describes the number of successes in $n$ Bernoulli trials (independent homogeneous dichotomous trials) with the probability of success in a single trial equal to $p$ (and, respectively, with the probability of failure in a single trial equal to $1 - p$).

THE POISSON DISTRIBUTION. A random variable $X$ has the Poisson distribution with parameter $\lambda > 0$ if
$$P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, \ldots \qquad (1.1.4)$$
The Poisson distribution is a good approximation to the binomial distribution with large $n$ and small $p$. For more detail see Section 1.5.

THE NEGATIVE BINOMIAL DISTRIBUTION (PASCAL DISTRIBUTION). A random variable $X$ has the negative binomial distribution with parameters $n$ and $p$ ($n > 0$, $0 < p < 1$) if
$$P(X = k) = \binom{n + k - 1}{k} p^n (1 - p)^k, \quad k = 0, 1, \ldots$$
If $n$ is a positive integer, then the random variable with the negative binomial distribution describes the number of failures preceding the $n$th success in a sequence of Bernoulli trials.

THE GEOMETRIC DISTRIBUTION is a special case of the negative binomial distribution with $n = 1$.

THE UNIFORM DISTRIBUTION on an interval $[a, b]$ is defined by its density
$$p(x) = \begin{cases} \dfrac{1}{b - a}, & x \in [a, b], \\ 0, & x \notin [a, b]. \end{cases}$$

THE NORMAL (GAUSSIAN) DISTRIBUTION. A random variable $X$ has the normal distribution with parameters $(\mu, \sigma^2)$, $\mu \in \mathbb{R}$, $\sigma^2 > 0$, if its density has the form
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}, \quad x \in \mathbb{R}.$$
The normal distribution with parameters $(0, 1)$ is called standard. In what follows, the standard normal distribution function and its density will be denoted $\Phi(x)$ and $\varphi(x)$, respectively. [...]

THE CAUCHY DISTRIBUTION with location parameter $a \in \mathbb{R}$ and scale parameter $\lambda > 0$ is defined by its density
$$p(x) = \frac{\lambda}{\pi[\lambda^2 + (x - a)^2]}, \quad x \in \mathbb{R}.$$

THE LAPLACE (DOUBLE EXPONENTIAL) DISTRIBUTION. A random variable $X$ has the Laplace distribution with location parameter $a \in \mathbb{R}$ and scale parameter $\lambda > 0$ if its density has the form
$$p(x) = \frac{\lambda}{2} e^{-\lambda|x - a|}, \quad x \in \mathbb{R}.$$
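The discrete distributions listed above are easy to tabulate. The following Python fragment is a minimal sketch, not part of the original text: the function names and the parameter values are illustrative assumptions, and the negative binomial probabilities are written in the standard parametrization with support $k = 0, 1, \ldots$ for integer $n$. It checks that each set of probabilities sums to one and that the geometric distribution is the case $n = 1$.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for the binomial distribution, formula (1.1.3)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for the Poisson distribution, formula (1.1.4)."""
    return exp(-lam) * lam**k / factorial(k)


def negative_binomial_pmf(k, n, p):
    """P(X = k) for integer n: k failures before the n-th success (assumed form)."""
    return comb(n + k - 1, k) * p**n * (1 - p) ** k


# Illustrative parameter values (assumptions, not taken from the text).
binom_total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
poisson_total = sum(poisson_pmf(k, 2.5) for k in range(60))
negbin_total = sum(negative_binomial_pmf(k, 3, 0.4) for k in range(200))

# The geometric distribution is the negative binomial one with n = 1.
geometric = [negative_binomial_pmf(k, 1, 0.4) for k in range(5)]
```

The infinite sums are truncated where the remaining terms are far below double precision, so the totals agree with 1 up to rounding.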
Let $X(\omega)$ be a random variable defined on a probability space $(\Omega, \mathfrak{A}, P)$. Since a probability space is a measurable space with a measure, the notion of an integral can be introduced. If
$$\int_\Omega |X(\omega)|\,dP(\omega) < \infty,$$
then the random variable $X$ is said to have the mean value or the mathematical expectation, which is denoted $EX$. Thus, by definition we have
$$EX = \int_\Omega X\,dP.$$
The equality
$$EX = \int_{-\infty}^{+\infty} x\,dF_X(x)$$
holds. Here on the right-hand side there is a Stieltjes integral. If the random variable $X$ is absolutely continuous and $p(x)$ is its density, then
$$EX = \int_{-\infty}^{+\infty} x\,p(x)\,dx.$$
If $X$ is a discrete random variable with distribution $\{(x_i, p_i)\}_{i \ge 1}$, then
$$EX = \sum_{i \ge 1} x_i p_i.$$
Let $h(x)$ be a Borel function, that is, a real function defined on the real line so that for any $c \in \mathbb{R}$ the set $\{x : h(x) < c\}$ is Borel. Then
$$Eh(X) = \int_\Omega h(X(\omega))\,dP(\omega) = \int_{-\infty}^{+\infty} h(x)\,dF(x)$$
under the condition that at least one of these integrals exists.

Let $X$ and $Y$ be two random variables. Then $E(X + Y) = EX + EY$, provided that any two of the expectations involved in this equality exist.

The mathematical expectations of the random variables $X^s$ and $|X|^s$ are called the $s$th moment and the $s$th absolute moment of the random variable $X$, respectively (or, respectively, the moment of order $s$ and the absolute moment of order $s$ of the random variable $X$):
$$\alpha_s = EX^s = \int_{-\infty}^{+\infty} x^s\,dF(x), \qquad (1.1.7)$$
$$\beta_s = E|X|^s = \int_{-\infty}^{+\infty} |x|^s\,dF(x). \qquad (1.1.8)$$
The central moment $\mu_s$ and the absolute central moment $\nu_s$ of order $s > 0$ of a random variable $X$ are respectively defined by the equalities
$$\mu_s = \int_{-\infty}^{+\infty} (x - \alpha_1)^s\,dF(x), \qquad (1.1.9)$$
$$\nu_s = \int_{-\infty}^{+\infty} |x - \alpha_1|^s\,dF(x). \qquad (1.1.10)$$
A special role is played by the second central moment $\mu_2$, which is called the variance of the random variable $X$ and is denoted $DX$:
$$DX = E(X - EX)^2 = EX^2 - (EX)^2.$$
Note that $DX$ is always defined if $EX$ is defined, but can take the value $+\infty$. The quantity $\sigma = \sqrt{DX}$ is called the mean square deviation of the random variable $X$. Note an important property of the variance: $DX = 0$ if and only if $P(X = EX) = 1$,
that is, in that case the random variable $X$ is constant with probability one. Further, if the variance is finite, then
$$D(aX + b) = a^2 DX, \quad a, b \in \mathbb{R}.$$
In particular, the standardized random variable
$$\frac{X - EX}{\sqrt{DX}}$$
always has zero expectation and unit variance. If the random variable $X$ has the normal distribution with parameters $(\mu, \sigma^2)$, then
$$EX = \mu, \qquad DX = \sigma^2.$$
If $F_X(x)$ is the distribution function of a random variable $X$, then
$$EX = \int_0^\infty (1 - F_X(x))\,dx - \int_{-\infty}^0 F_X(x)\,dx. \qquad (1.1.12)$$
If, furthermore, $P(X > 0) = 1$, then
$$\alpha_s = EX^s = s \int_0^\infty x^{s-1}(1 - F_X(x))\,dx \qquad (1.1.13)$$
for any real $s > 0$.

Now consider some numerical shape and location characteristics of the distribution of a random variable. The distribution $P_X$ of a random variable $X$ is called unimodal if there exists a value $x = a$ such that the distribution function $F_X(x)$ is convex for $x < a$ and concave for $x > a$. Here the number $a$ is called the mode of the distribution. If a random variable $X$ is absolutely continuous, then its modes are those values $x$ at which the density $p_X(x)$ attains its maximum. If $X$ is a discrete random variable and $p_k = P(X = x_k)$, then the values $x_i$ for which
$$P(X = x_i) = \max_k p_k$$
are called its modes.

The median of a random variable $X$ is any number $\operatorname{med} X$ satisfying the conditions
$$P(X \ge \operatorname{med} X) \ge \frac{1}{2}, \qquad P(X \le \operatorname{med} X) \ge \frac{1}{2}.$$
For an absolutely continuous random variable $X$ the median is defined as the value $\operatorname{med} X$ for which
$$\int_{-\infty}^{\operatorname{med} X} p_X(x)\,dx = \int_{\operatorname{med} X}^{\infty} p_X(x)\,dx = \frac{1}{2}.$$
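The expressions of moments through the distribution function, (1.1.12) and (1.1.13), can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the original exposition: the trapezoidal helper `integrate`, the uniform distribution on $[-1, 3]$ and the exponential distribution with $F(x) = 1 - e^{-\lambda x}$, $\lambda = 2$, are all chosen here only for the example.

```python
from math import exp


def integrate(f, a, b, steps=200_000):
    """Composite trapezoidal rule on [a, b] (helper assumed for this sketch)."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


# Uniform distribution on [-1, 3]: F(x) = (x + 1) / 4 on that interval, EX = 1.
def uniform_cdf(x):
    return min(1.0, max(0.0, (x + 1.0) / 4.0))


# Formula (1.1.12): the integral over the negative half-line is subtracted.
uniform_mean = (integrate(lambda x: 1.0 - uniform_cdf(x), 0.0, 3.0)
                - integrate(uniform_cdf, -1.0, 0.0))

# Exponential distribution with rate lam: 1 - F(x) = exp(-lam * x) for x >= 0.
lam = 2.0


def exp_tail(x):
    return exp(-lam * x)


# Formula (1.1.13) with s = 1 and s = 2; the exact values are 1/lam and 2/lam**2.
exp_mean = integrate(exp_tail, 0.0, 40.0)
exp_second_moment = integrate(lambda x: 2.0 * x * exp_tail(x), 0.0, 40.0)
```

The uniform example takes values of both signs, so it also shows why the second integral in (1.1.12) must enter with a minus sign.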
The quantile of order $\alpha$, $\alpha \in (0, 1)$ (the $\alpha$-quantile), of a random variable $X$ is the value $x_\alpha$ for which
$$P(X \le x_\alpha) \ge \alpha, \qquad P(X \ge x_\alpha) \ge 1 - \alpha.$$
If $X$ is an absolutely continuous random variable, then its $\alpha$-quantile satisfies the equation $F_X(x_\alpha) = \alpha$. The median is the quantile of order 1/2.

The mathematical expectation characterizes the center of the distribution of a random variable $X$ in the sense that
$$E(X - x)^2 \ge E(X - EX)^2 = DX$$
for any $x \in \mathbb{R}$. Accordingly, the variance characterizes the dispersion of the distribution around this center. The median characterizes the center of the distribution of a random variable $X$ in the sense that
$$E|X - x| \ge E|X - \operatorname{med} X|$$
for any $x \in \mathbb{R}$. The mathematical expectation is unique if it exists. The median always exists, but sometimes is not uniquely determined.

The dispersion of the distribution can also be characterized by the difference of the quantiles of orders $\alpha > 1/2$ and $1 - \alpha$. In particular, the quantity $x_{3/4} - x_{1/4}$ is called the interquartile range.

If a random variable $X$ has finite moments of orders up to the third (the third included), then the quantity
$$\gamma_1 = E\left( \frac{X - EX}{\sqrt{DX}} \right)^3 = \frac{E(X - EX)^3}{(DX)^{3/2}} = \frac{\mu_3}{\sigma^3}$$
is called the skewness or asymmetry coefficient. If $\gamma_1 < 0$, then the left tail of the distribution is heavier than the right one. If $\gamma_1 > 0$, then vice versa.

If a random variable $X$ has finite moments of orders up to the fourth (the fourth included), then the quantity
$$\gamma_2 = E\left( \frac{X - EX}{\sqrt{DX}} \right)^4 - 3 = \frac{E(X - EX)^4}{(DX)^2} - 3 = \frac{\mu_4}{\sigma^4} - 3$$
is called the excess coefficient or kurtosis of its distribution. If $X$ has the density $p(x)$ and $\gamma_2 > 0$, then $p(x)$ has a sharper peak and heavier tails than the normal density. If $\gamma_2 < 0$, then $p(x)$ has a flatter peak and lighter tails than the normal density.

[...] Let a nonnegative Borel function $g(x)$ satisfy $g(x) \ge M > 0$ for all $x$ from some Borel set $B$. Then for any random variable $X$ we have
$$P(X \in B) \le \frac{Eg(X)}{M}.$$
1.2. Generating and characteristic functions
This theorem immediately follows from the relations
$$Eg(X) = \int_{-\infty}^{+\infty} g(x)\,dF(x) \ge M \int_B dF(x) = M P_X(B).$$
In particular, by setting $g(x) = (x - EX)^2$ we immediately obtain the Chebyshev inequality: for any positive $t$,
$$P(|X - EX| \ge t) \le \frac{DX}{t^2}.$$
[...] we obtain the inequality [...] (1.1.15)

Finally, for $g(x) = e^{tx}$, $t > 0$, $M = e^{ta}$ we have
$$P(X \ge a) \le \frac{Ee^{tX}}{e^{ta}}.$$
[...] if $\alpha > 1$ and $X$ is a nonnegative random variable, then
$$EX \le (EX^\alpha)^{1/\alpha}.$$
[...] the characteristic function $f(t)$ is represented as the series
$$f(t) = \sum_{k=1}^{\infty} e^{itx_k} p_k. \qquad (1.2.17)$$
From the definition of a Lebesgue-Stieltjes integral it follows that any distribution uniquely determines the corresponding characteristic function. It is easy to verify that if $X$ has the normal distribution with parameters $(\mu, \sigma^2)$, then
$$f_X(t) = \exp\left\{ it\mu - \frac{t^2\sigma^2}{2} \right\}.$$
The gamma-distributed random variable $X$ with scale parameter $\lambda$ and shape parameter $\alpha$ has the characteristic function
$$f_X(t) = \left( \frac{\lambda}{\lambda - it} \right)^{\alpha}.$$
In particular, the exponentially distributed random variable $X$ with parameter $\lambda$ has the characteristic function
$$f_X(t) = \frac{\lambda}{\lambda - it}.$$
The Cauchy distribution with density $p(x) = \lambda \pi^{-1} [\lambda^2 + (x - a)^2]^{-1}$ has the characteristic function
$$f(t) = e^{iat - \lambda|t|}.$$
The Laplace (double exponential) distribution with density $p(x) = \frac{\lambda}{2} e^{-\lambda|x - a|}$ has the characteristic function
$$f(t) = \frac{e^{iat}\lambda^2}{\lambda^2 + t^2}.$$
If $X$ is a discrete random variable with generating function $\psi(s)$, then the characteristic function of $X$ has the form
$$f(t) = \psi(e^{it}).$$
In particular, the Poisson distribution with parameter $\lambda$ has the characteristic function $f(t) = \exp\{\lambda(e^{it} - 1)\}$.

The characteristic functions are defined at all real $t$ for all random variables. Now we list the basic properties of characteristic functions.

(1) Any characteristic function satisfies the conditions
$$f(0) = 1, \qquad |f(t)| \le 1.$$
[...] the limit
$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t) e^{-ity}\,dt$$
exists and equals the height of the jump of the distribution function $F(x)$ at the point $x = y$. So, if $F(x)$ is continuous at $y$, then this limit equals zero.

According to Theorem 1.1.2, any distribution function $F(x)$ can be represented as the weighted sum of absolutely continuous, discrete and singular distribution functions. Using this fact we obtain the corresponding representation of a characteristic function
$$f(t) = a_1 f_1(t) + a_2 f_2(t) + a_3 f_3(t), \qquad (1.2.19)$$
where $f_k$, $k = 1, 2, 3$, is the characteristic function of the corresponding component of $F(x)$. Now we will consider the behavior of each of the components in (1.2.19).
1. Since $F_1(x)$ is absolutely continuous,
$$f_1(t) = \int_{-\infty}^{+\infty} e^{itx} F_1'(x)\,dx$$
and hence, by the Riemann-Lebesgue theorem we have
$$f_1(t) \to 0 \quad \text{as } |t| \to \infty. \qquad (1.2.20)$$
Hence it follows that
$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |f_1(t)|^2\,dt = 0.$$
If for all $x$ there exists an absolutely integrable $n$th derivative $F_1^{(n)}(x)$, then by integration by parts it is easy to show that the behavior of $f_1(t)$ at infinity is described by the relation
$$f_1(t) = o(|t|^{1-n}) \quad \text{as } |t| \to \infty. \qquad (1.2.21)$$

2. If by $x_k$ and $p_k$, $k = 1, 2, \ldots$, we respectively denote the discontinuity points and the heights of the jumps of the distribution function $F(x)$ at these points, then
$$a_2 f_2(t) = \sum_{k=1}^{\infty} p_k e^{itx_k}.$$
This expression is the sum of an absolutely convergent trigonometric series and
$$\limsup_{|t| \to \infty} |f_2(t)| = 1. \qquad (1.2.22)$$
Furthermore,
$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |f_2(t)|^2\,dt = \frac{1}{a_2^2} \sum_{k=1}^{\infty} p_k^2. \qquad (1.2.23)$$
Moreover, consider the lattice discrete distribution
$$p_n = P(X = b + nh), \quad n = 0, \pm 1, \pm 2, \ldots$$
Its characteristic function $f(t)$ is represented as the Fourier series
$$f(t) = e^{itb} \sum_{n=-\infty}^{+\infty} e^{itnh} p_n, \qquad (1.2.24)$$
so that $|f(2\pi/h)| = 1$. Conversely, if $|f(t_0)| = 1$ for some $t_0 \ne 0$, then the corresponding distribution is lattice and any two of its discontinuity points differ by a multiple of $2\pi t_0^{-1}$. The span of the lattice distribution is equal to $h$ if and only if $|f(t)| < 1$ for $0 < |t| < 2\pi/h$ and $|f(t)| = 1$ for $t = 2\pi/h$. Hence it follows that if $f(t)$ is the characteristic function of a lattice distribution with span $h$, then for any $\varepsilon > 0$ there exists $0 < q_\varepsilon < 1$ such that $|f(t)| \le q_\varepsilon$ [...] $> 0$, then
$$\limsup_{|t| \to \infty} |f(t)| < 1.$$
Moreover, in this case for an arbitrarily small $\varepsilon > 0$ there exists $q_\varepsilon < 1$ such that $|f(t)| \le q_\varepsilon$ for $|t| \ge \varepsilon > 0$. If $a_1 = 1$, then
$$\lim_{|t| \to \infty} f(t) = 0.$$
If $a_2 = 1$, then
$$\limsup_{|t| \to \infty} |f(t)| = 1.$$
Any characteristic function $f(t)$ satisfies the equality
$$\lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} |f(t)|^2\,dt = \sum_{k=1}^{\infty} p_k^2,$$
where $p_k$ are the heights of the jumps of the distribution function $F(x)$ at its discontinuity points $x_k$, $k = 1, 2, \ldots$

THEOREM 1.2.3. A distribution function $F(x)$ with characteristic function $f(t)$ is absolutely continuous if and only if the function $|f(t)|^2$ is absolutely integrable. Moreover, the density $p(x) = F'(x)$ satisfies the Parseval equality
$$\int_{-\infty}^{\infty} p^2(x)\,dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} |f(t)|^2\,dt. \qquad (1.2.25)$$
Relation (1.2.25) is called the Plancherel equality.

Characteristic functions satisfy the following inequalities.

THEOREM 1.2.4. If $f(t)$ is a characteristic function such that $|f(t)| \le q < 1$ for $|t| \ge c$, $c > 0$, then for $|t|$ [...]

THEOREM 1.2.5. [...] $\delta > 0$ and $\gamma > 0$ such that $|f(t)|$ [...] $\ge 1$, then the characteristic function of this random variable is $k$ times differentiable and, moreover,
$$f_X^{(k)}(0) = i^k \alpha_k = i^k EX^k. \qquad (1.2.26)$$
The converse is true in the following form: if the characteristic function of a random variable $X$ is differentiable $k$ times at the point $t = 0$, then $X$ has all the moments of orders up to $k$ if $k$ is even and up to $k - 1$ if $k$ is odd.

Using the Taylor formula we can show that if a random variable $X$ has the moment $\alpha_k = EX^k$ of some integer order $k \ge 1$, then its characteristic function $f_X(t)$ can be expanded as
$$f_X(t) = 1 + \sum_{j=1}^{k} \frac{\alpha_j}{j!} (it)^j + o(|t|^k), \quad t \to 0. \qquad (1.2.27)$$
For $t$ small enough, the principal branch of $\log f_X(t)$, which tends to zero together with $t$, is representable as
$$\log f_X(t) = \sum_{j=1}^{k} \frac{\kappa_j}{j!} (it)^j + o(|t|^k), \quad t \to 0. \qquad (1.2.28)$$
Here the coefficients $\{\kappa_j(X) = \kappa_j,\ j = 1, 2, \ldots\}$ are called cumulants or semiinvariants of the random variable $X$. The semiinvariants can also be defined by the formula
$$\kappa_j = \frac{1}{i^j}\, l^{(j)}(0), \quad \text{where } l(t) = \log f_X(t). \qquad (1.2.29)$$
For the normal distribution with arbitrary parameters the semiinvariants of all orders beginning with the third are equal to zero. For the Poisson distribution with parameter $\lambda$ the semiinvariants of all orders are equal to $\lambda$. From the formal identity
$$\log\left( 1 + \sum_{j=1}^{\infty} \frac{\alpha_j}{j!} (it)^j \right) = \sum_{j=1}^{\infty} \frac{\kappa_j}{j!} (it)^j$$
we can obtain the following formula connecting the semiinvariant $\kappa_s$ of an arbitrary order $s$ with the moments [...] $\to \infty$. It is easy to see that
$$f_X(t) = Ee^{itX} = \exp\{\lambda(e^{it} - 1)\}. \qquad (1.4.5)$$
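Therefore,
$$f_\lambda(t) = e^{-it\sqrt{\lambda}} f_X(t\lambda^{-1/2}) = \exp\left\{ -it\sqrt{\lambda} + \lambda\left( e^{it\lambda^{-1/2}} - 1 \right) \right\}.$$
But
$$e^{is} = 1 + is + \frac{(is)^2}{2} + o(s^2), \quad s \to 0;$$
therefore, setting $s = t\lambda^{-1/2}$, as $\lambda \to \infty$ we obtain
$$f_\lambda(t) = \exp\left\{ -it\sqrt{\lambda} + \lambda\left( it\lambda^{-1/2} + (it)^2 (2\lambda)^{-1} + o(t^2\lambda^{-1}) \right) \right\} \to e^{-t^2/2}.$$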
•

The following simple theorems often turn out to be useful.

THEOREM 1.4.7. If a sequence of distribution functions $F_1(x), F_2(x), \ldots$ weakly converges to a continuous distribution function $F(x)$, then this convergence is uniform in $x \in \mathbb{R}$.

THEOREM 1.4.8. Let $p(x), p_1(x), p_2(x), \ldots$ be a sequence of densities and
$$p_n(x) \to p(x), \quad n \to \infty,$$
for all real $x$ except, possibly, for a set of values of $x$ of zero Lebesgue measure. Then
$$\int_A p_n(x)\,dx \to \int_A p(x)\,dx$$
uniformly with respect to all Borel sets $A \subseteq \mathbb{R}$.

In what follows, we will often use the following criteria of weak compactness. Let $\{X_\theta,\ \theta \in \Theta\}$ be a family of random variables with distribution functions $F_\theta(x)$ and characteristic functions $f_\theta(s)$, respectively.

THEOREM 1.4.9. The following assertions are equivalent:
(i) the family $\{F_\theta(x),\ \theta \in \Theta\}$ is weakly compact;
(ii) $\lim_{R \to \infty} \sup_{\theta \in \Theta} P(|X_\theta| > R) = 0$;
(iii) the family $\{f_\theta(s),\ \theta \in \Theta\}$ is equicontinuous at the point $s = 0$.

Let $F(x)$ and $G(x)$ be two distribution functions. The Lévy distance (Lévy metric) $L_1(F, G)$ between the distribution functions $F(x)$ and $G(x)$ is defined as
the greatest lower bound of the set of positive numbers $h$ for which $F(x - h) - h \le G(x) \le F(x + h) + h$ for all $x$:
$$L_1(F, G) = \inf\{h : F(x - h) - h \le G(x) \le F(x + h) + h,\ \forall x \in \mathbb{R}\}.$$
The Lévy metric $L_1(F, G)$ can be interpreted as the side of the largest square with sides parallel to the coordinate axes which can be inscribed between the graphs of the distribution functions $F$ and $G$. It is very simple to prove that
(1) $L_1(F, G) = 0 \iff F(x) \equiv G(x)$.
(2) $L_1(F, G) = L_1(G, F)$.
(3) $L_1(F, H) \le L_1(F, G) + L_1(G, H)$.
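The definition of the Lévy distance translates directly into a numerical procedure: the defining condition is monotone in $h$, so the infimum can be located by bisection. The sketch below is only an illustration under stated assumptions: the condition is checked on a finite grid rather than for all real $x$, the helper names are invented here, and the two distribution functions (normal laws with unit variance and means 0 and 0.5) are chosen for the example.

```python
from math import erf, sqrt


def normal_cdf(x, mean=0.0):
    """Distribution function of the normal law with the given mean and unit variance."""
    return 0.5 * (1.0 + erf((x - mean) / sqrt(2.0)))


def levy_condition(F, G, h, grid):
    """Check F(x - h) - h <= G(x) <= F(x + h) + h at every grid point."""
    return all(F(x - h) - h <= G(x) <= F(x + h) + h for x in grid)


def levy_distance(F, G, grid, tol=1e-6):
    """Bisection for the smallest h satisfying the condition on the grid."""
    lo, hi = 0.0, 1.0  # h = 1 always works since 0 <= F, G <= 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if levy_condition(F, G, mid, grid):
            hi = mid
        else:
            lo = mid
    return hi


grid = [-8.0 + 0.002 * i for i in range(8001)]  # grid on [-8, 8] (assumption)
F = normal_cdf
G = lambda x: normal_cdf(x, mean=0.5)

d_fg = levy_distance(F, G, grid)
d_gf = levy_distance(G, F, grid)
```

For this pair the two values agree up to the grid and bisection tolerances, which illustrates the symmetry property (2); both are close to 0.142.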
The Lévy metric metrizes weak convergence: distribution functions $F_n(x)$ weakly converge to a distribution function $F(x)$ if and only if $L_1(F_n, F) \to 0$.

THEOREM 1.4.10. The metric space of one-dimensional distribution functions endowed with the distance $L_1(F, G)$ is complete.

Another example of a widely used distance in the space of distribution functions is the Kolmogorov or uniform metric
$$\rho(F, G) = \sup_{-\infty < x < \infty} |F(x) - G(x)|.$$
Although the Kolmogorov distance is easier to visualize than the Lévy metric, it does not metrize weak convergence. Nevertheless, these two metrics are related by the following evident inequality:
$$L_1(F, G) \le \rho(F, G).$$
Moreover, if the distribution function $G$ has the density $g(x) = G'(x)$, then
$$\rho(F, G) \le \left( 1 + \sup_x g(x) \right) L_1(F, G),$$
see, e.g., (Zolotarev, 1997). It can be shown that although the 'pure' Kolmogorov metric $\rho$ does not metrize weak convergence, it has a 'smoothed' version
$$\rho_H(F, G) = \sup_{-\infty < x < \infty} |(F * H)(x) - (G * H)(x)|$$
[...]
$$\sigma(P_1, P_2) = \inf\{\varepsilon > 0 : P_1(A) \le P_2(A^\varepsilon) + \varepsilon \text{ for any closed } A \in \mathfrak{B}\}.$$
The Lévy-Prokhorov distance between distributions $P_1$ and $P_2$ is defined as
$$L_2(P_1, P_2) = \max\{\sigma(P_1, P_2), \sigma(P_2, P_1)\}.$$
We will assume that the Lévy-Prokhorov distance between random vectors $X$ and $Y$ is defined as the Lévy-Prokhorov distance between the induced probability distributions: $L_2(X, Y) = L_2(P_X, P_Y)$. As is known, the weak convergence of random vectors, that is, that of their distributions, is equivalent to their $L_2$-convergence (see, e.g., (Shiryaev, 1984)).
1.5. Poisson theorem

Consider a sequence of independent trials of the same type (Bernoulli trials), in each of which an event $A$ occurs with probability $p$. Let the random variable $X_k$ be equal to 1 if the event $A$ occurred in the $k$th trial and 0 otherwise. Then the random variables $X_1, X_2, \ldots$ are independent and
$$P(X_k = 1) = p, \qquad P(X_k = 0) = 1 - p = q.$$
Introduce the sum
$$S_n = X_1 + \cdots + X_n,$$
which is equal to the number of occurrences of the event $A$ in the first $n$ trials. It is easy to see that $ES_n = np$, $DS_n = npq$. The well-known de Moivre-Laplace theorem (see the next section) asserts that if the variance $DS_n = npq$ is large enough (which is the case when $n$ is large and $p$ and $q$ are noticeably different from zero and one), then the distribution of the random variable $S_n$ is close to normal. However, often, for example in insurance, one has to deal with 'rare' events $A$ whose probability $p$ of occurrence in a single trial is small (that is, close to zero), whereas the value of $n$ may be moderate.
For example, if $p = 0.02$ and $n = 100$, then $np = 2$ and $npq = 1.96$. In this case the normal approximation for the distribution of the random variable $S_n$ is very inaccurate while the Poisson approximation is much better. Namely, the following result is true.

THEOREM 1.5.1. Let $np = \lambda > 0$ [...]

Furthermore, since [...] as $n \to \infty$,
then for any fixed $s$, $|s| \le 1$, we have
$$\psi_{S_n}(s) \to e^{\lambda(s - 1)}.$$
But the latter expression is nothing but the generating function of the Poisson distribution with parameter $\lambda > 0$. The theorem is proved. •

Theorems 1.5.1-1.5.3 describe the behavior of the number of rarely observed events when the number of trials is large. They are sometimes called theorems on rare events or laws of small numbers. These theorems can be regarded as the mathematical grounds for the use of the Poisson distribution in many applied problems.

Now consider the rate of convergence in the laws of small numbers. For $k = 0, 1, 2, \ldots$ denote
$$\pi_k = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad b_k = P(S_n = k).$$
Denote
$$\Pi = \{\pi_0, \pi_1, \pi_2, \ldots\}, \qquad B = \{b_0, b_1, b_2, \ldots\}.$$
Define the distance of total variation between the distribution of the random variable $S_n$ and the Poisson distribution by the formula
$$\|B - \Pi\| = \sum_{k=0}^{\infty} |b_k - \pi_k|.$$
The following results will be presented without proofs. We restrict ourselves to references to the corresponding sources.

THEOREM 1.5.4. If $p_{n,1} = \ldots = p_{n,n} = p$, $\lambda = np$, then the inequality
$$\|B - \Pi\| \le 2p \min\{2, \lambda\}$$
holds.

For the proof, see (Prokhorov, 1953). In what follows we will use the notation
$$\lambda = p_{n,1} + \cdots + p_{n,n}, \qquad \lambda_2 = p_{n,1}^2 + \cdots + p_{n,n}^2.$$
It is easy to see that
$$ES_n = \lambda, \qquad DS_n = \lambda - \lambda_2. \qquad (1.5.2)$$

THEOREM 1.5.5. The inequality
$$\|B - \Pi\| \le 2 \min\{9, \lambda\} \max_j p_{n,j}$$
holds.
For the proof, see (LeCam, 1960).

THEOREM 1.5.6. The inequality $\|B - \Pi\|$ [...] $\to 0$ entering Theorem 1.5.3 is not necessary for the Poisson approximation to hold for the Poisson-binomial distribution. Actually, as a necessary and sufficient condition we should consider the condition $\lambda_2/\lambda \to 0$, which is equivalent to
$$\frac{DS_n}{ES_n} \to 1$$
by virtue of (1.5.2).
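The rate estimates of this section are easy to examine numerically in the case of identical success probabilities. The following sketch is an illustration only: the function name and the truncation of the series are assumptions made here. It computes the distance $\|B - \Pi\| = \sum_k |b_k - \pi_k|$ between the binomial distribution with parameters $n$, $p$ and the Poisson distribution with $\lambda = np$, and compares it with the bound $2p \min\{2, \lambda\}$ of Theorem 1.5.4.

```python
from math import comb, exp


def total_variation(n, p, extra_terms=100):
    """Sum over k of |b_k - pi_k| for Binomial(n, p) versus Poisson(n * p).

    The series is truncated at k = n + extra_terms (assumption: the remaining
    Poisson tail is negligible for the moderate values of n * p used below).
    """
    lam = n * p
    pi_k = exp(-lam)  # Poisson probability of k = 0
    total = 0.0
    for k in range(n + extra_terms + 1):
        b_k = comb(n, k) * p**k * (1 - p) ** (n - k) if k <= n else 0.0
        total += abs(b_k - pi_k)
        pi_k *= lam / (k + 1)  # recurrence pi_{k+1} = pi_k * lam / (k + 1)
    return total


# The example from the text: p = 0.02, n = 100, so that lambda = np = 2.
n, p = 100, 0.02
distance = total_variation(n, p)
bound = 2 * p * min(2, n * p)

# The same lambda with ten times more trials and a ten times smaller p.
distance_finer = total_variation(1000, 0.002)
```

With $\lambda$ fixed, the distance shrinks together with $p$, in agreement with the form of the bound.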
1.6. Law of large numbers. Central limit theorem. Stable laws

We will begin this section with the formulation of the best-known limit theorem of probability theory, the law of large numbers. First we will present the formulation of this result due to A. Ya. Khinchin.

THEOREM 1.6.1. Let $X_1, X_2, \ldots$ be independent identically distributed random variables with finite expectation $EX_1 = a$. Let $S_n = X_1 + \ldots + X_n$. Then
$$\frac{S_n}{n} \xrightarrow{P} a \qquad (1.6.1)$$
as $n \to \infty$.

The essence of the law of large numbers is that in the asymptotic behavior of the arithmetic mean of independent identically distributed random variables with finite expectation the information concerning the particular form
of the distribution is lost as the number of summands grows, and what remains of this information is only the value of the mathematical expectation.

Later A. N. Kolmogorov proved a stronger version of the law of large numbers. Namely, he showed that the existence of the expectation is necessary and sufficient for (1.6.1) to hold with convergence in probability replaced by convergence with probability one. More precisely, Kolmogorov proved the following result.

THEOREM 1.6.2. (i) If $X_1, X_2, \ldots$ are independent identically distributed random variables with finite expectation $EX_1 = a$, then
$$\frac{S_n}{n} \to a \quad \text{with probability one}. \qquad (1.6.2)$$
(ii) Conversely, if $X_1, X_2, \ldots$ are independent identically distributed random variables such that (1.6.2) holds with some $a \in \mathbb{R}$, then the expectation $EX_1$ exists and equals $a$.

The law of large numbers, the first variants of which were proved as long ago as the beginning of the eighteenth century, plays a fundamental role in probability theory since it explains the empirically observed stability of frequencies, which is the basis of statistical methodology.

Another fundamental limit theorem of probability theory is the central limit theorem, which can be regarded as an attempt to answer the question concerning the rate of convergence in the law of large numbers or, which is the same, concerning the accuracy of approximation of the mathematical expectation by the arithmetic mean of independent identically distributed observations. Indeed, the limit distribution of the arithmetic mean is degenerate, that is, $S_n/n - a \xrightarrow{P} 0$ as $n \to \infty$. If there exists a sequence of positive coefficients $c_n$ which can 'stretch' the quantities $S_n/n - a$ so that the limit distribution for the random variable
$$c_n\left( \frac{S_n}{n} - a \right) \qquad (1.6.3)$$
is non-degenerate and proper (the probability of infinite values is zero), then we can conclude that the difference $S_n/n - a$ behaves asymptotically as $c_n^{-1}$, which means that the rate of convergence (or the accuracy) in the law of large numbers is proportional to $c_n^{-1}$. Another question is what the limit distribution for the random variable (1.6.3) is, if it exists. The answers to these questions are given by the central limit theorem and its generalizations.

Usually the term central limit theorem means any statement concerning the convergence of distributions of sums of small summands to the normal law as the number of summands grows infinitely. The central limit theorem explains the wide applicability of the normal law to approximate the result of
a stochastic experiment influenced by a large number of random factors. By $\Phi(x)$ we will denote the standard normal distribution function,
$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du, \quad x \in \mathbb{R}.$$
We begin the study of the central limit theorem with its simplest version.

THEOREM 1.6.3. Let $X_1, X_2, \ldots$ be independent identically distributed random variables such that $EX_1 = 0$ and $DX_1 = 1$. Then the distribution of the random variable $n^{-1/2}(X_1 + \ldots + X_n)$ weakly converges to the standard normal distribution as $n \to \infty$.
PROOF. Let $f(t)$ be the characteristic function of $X_1$. Denote $\tau = n^{-1/2}$. By virtue of Theorems 1.4.4 and 1.4.5, the statement of the theorem is equivalent to the assertion that for all $t \in \mathbb{R}$ we have
$$f^n(\tau t) \to e^{-t^2/2} \qquad (1.6.4)$$
as $n \to \infty$. By virtue of (1.2.27) we have
$$f(s) = 1 - \frac{s^2}{2} + o(s^2). \qquad (1.6.5)$$
Setting here $s = \tau t$, we obtain
$$f^n(\tau t) = \left( 1 - \frac{1}{2}\tau^2 t^2 + o(\tau^2 t^2) \right)^n.$$
Hence (1.6.4) follows.
•
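The convergence (1.6.4) can be observed directly for a concrete characteristic function. The sketch below is only an illustration: it assumes summands taking the values $\pm 1$ with probability 1/2 each (so that $EX_1 = 0$, $DX_1 = 1$ and $f(t) = \cos t$), and the grid of points $t$ is chosen here for the example.

```python
from math import cos, exp, sqrt


def cf_normalized_sum(t, n):
    """f^n(t / sqrt(n)) for summands equal to +1 or -1 with probability 1/2."""
    return cos(t / sqrt(n)) ** n


ts = [0.1 * i for i in range(-30, 31)]  # grid on [-3, 3] (assumption)

# Largest deviation from the standard normal characteristic function on the grid.
errors = {
    n: max(abs(cf_normalized_sum(t, n) - exp(-t * t / 2.0)) for t in ts)
    for n in (10, 100, 1000, 10000)
}
```

On this grid the deviation decreases roughly in proportion to $1/n$, which reflects the fact that for a symmetric distribution the first correction term in the expansion of $f^n(\tau t)$ is of order $\tau^2$.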
COROLLARY 1.6.1. Let $X_1, X_2, \ldots$ be independent identically distributed random variables such that there exist $EX_1 = a$ and $DX_1 = \sigma^2 \in (0, \infty)$. Then
$$P\left( \frac{S_n - na}{\sigma\sqrt{n}} < x \right) \to \Phi(x)$$
uniformly in $x$ as $n \to \infty$.

Corollary 1.6.1 gives the following answer to the above question. Since
$$\frac{S_n - na}{\sigma\sqrt{n}} = \frac{\sqrt{n}}{\sigma}\left( \frac{S_n}{n} - a \right),$$
if the variance of $X_1$ exists, then the rate of convergence in (1.6.1) is $n^{-1/2}$. From Corollary 1.6.1, in turn, we obtain the well-known de Moivre-Laplace theorem mentioned in Section 1.5.
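The uniform convergence in Corollary 1.6.1 can be illustrated with Bernoulli summands, for which the distribution of $S_n$ is binomial and can be computed exactly. The following sketch rests on stated assumptions: $p = 0.3$, three sample sizes picked for the example, and illustrative function names. It evaluates the largest deviation between the distribution function of the normalized sum and $\Phi(x)$; since the former is a step function, it suffices to compare the two at the jump points from both sides.

```python
from math import comb, erf, sqrt


def std_normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def sup_distance(n, p):
    """sup over x of |P((S_n - np) / sqrt(npq) < x) - Phi(x)| for binomial S_n."""
    q = 1.0 - p
    sigma = sqrt(n * p * q)
    cdf_left = 0.0  # P(S_n < k)
    worst = 0.0
    for k in range(n + 1):
        x = (k - n * p) / sigma
        phi = std_normal_cdf(x)
        pmf = comb(n, k) * p**k * q ** (n - k)
        # Compare just before and just after the jump at the point x.
        worst = max(worst, abs(cdf_left - phi), abs(cdf_left + pmf - phi))
        cdf_left += pmf
    return worst


distances = {n: sup_distance(n, 0.3) for n in (25, 100, 400)}
scaled = {n: d * sqrt(n) for n, d in distances.items()}
```

The values in `scaled` stay of the same order as $n$ grows, which is consistent with the rate $n^{-1/2}$ mentioned above.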
COROLLARY 1.6.2 (de Moivre-Laplace theorem). Let $Z_n$ be the random variable with binomial distribution with parameters $n$ and $p \in (0,1)$. Then
$$\mathsf{P}\left(\frac{Z_n - np}{\sqrt{np(1-p)}} < x\right) \to \Phi(x)$$
uniformly in $x$ as $n \to \infty$.

In what follows we will use the notation introduced above. It is quite natural to expect that if the random variable $X_1$ has a density, then the density of the random variable $\tau S_n$ should converge to the standard normal density $\varphi(x)$ as $n \to \infty$.
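PROOF. By the inversion formula (see Theorem 1.2.8) we have
$$\sup_x |p_n(x) - \varphi(x)| \le \frac{1}{2\pi} \int_{-\infty}^{\infty} \left|f^n(\tau t) - e^{-t^2/2}\right| dt = I_{n1} + I_{n2} + I_{n3}, \tag{1.6.6}$$
where
$$I_{n1} = \frac{1}{2\pi} \int_{|t| \le A} \left|f^n(\tau t) - e^{-t^2/2}\right| dt, \quad A > 0,$$
$$I_{n2} = \frac{1}{2\pi} \int_{A < |t| \le \delta\sqrt{n}} \left|f^n(\tau t) - e^{-t^2/2}\right| dt, \quad \delta > 0,$$
$$I_{n3} = \frac{1}{2\pi} \int_{|t| > \delta\sqrt{n}} \left|f^n(\tau t) - e^{-t^2/2}\right| dt.$$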
2 π Ja 0 such that 2 „—i tl4 /4 for \f(t)\ 0 large enough. Finally, consider I n 3 . By Theorem 1.2.2 for δ > 0 fixed above there exists a qs e (0,1) such that |/(i)|Sy/ñ
Inaíq^y/ñ
(1.6.8)
From this inequality by virtue of integrability of the function |/"(i)| it follows that Inz
0,
ra
00.
Thus, the desired statement follows from (1.6.7).
•
Note that from the proof of this theorem we can obtain a stronger result.

COROLLARY 1.6.3. Let for some integer $k \ge 1$ the function $|f(t)|^k$ be integrable:
$$\int_{-\infty}^{\infty} |f(t)|^k\,dt < \infty.$$
Then the random variable $\tau S_n$ has the density $p_n(x)$ and
$$\sup_x |p_n(x) - \varphi(x)| \to 0, \quad n \to \infty.$$
The only changes in the proof of this result as compared to that of the preceding one are that in inequality (1.6.8) the factor $q_\delta^{n-1}$ is replaced by $q_\delta^{n-k}$ and $|f(t)|$ is replaced by $|f(t)|^k$. Necessary and sufficient conditions of convergence of the densities $p_n(x)$ of normalized sums of independent identically distributed random variables to the standard normal density are contained in the following theorem.

THEOREM 1.6.5. In order that
$$\sup_x |p_n(x) - \varphi(x)| \to 0, \quad n \to \infty,$$
it is necessary and sufficient that there exists an $n_0$ such that the density $p_{n_0}(x)$ is bounded.
Now consider the lattice case. Let $X_1, X_2, \ldots$ be independent identically distributed lattice random variables with common distribution function $F(x)$, characteristic function $f(t)$ and $\mathsf{E}X_1 = 0$, $\mathsf{D}X_1 = 1$. As above, denote $S_n = X_1 + \ldots + X_n$. Assume that the random variables $X_i$ take values of the form
$$b + kh, \quad k = 0, \pm 1, \pm 2, \ldots,$$
and $h$ is the span of the distribution. From (1.2.24) it follows that the function $|f(t)|$ has the period $2\pi/h$ and, hence, is not integrable. However, an analog of Theorem 1.6.5 is the assertion concerning the probabilities of the form
$$p_n(y) = \mathsf{P}(\tau S_n = y), \quad y = \tau(nb + kh), \quad k = 0, \pm 1, \pm 2, \ldots$$
THEOREM 1.6.6. If $F(x)$ is a lattice distribution with span $h$, then, as $n \to \infty$, we have
$$\sup_y \left| \frac{\sqrt{n}}{h}\, p_n(y) - \varphi(y) \right| \to 0.$$
PROOF. From Theorem 1.2.9 it immediately follows that Vñpniy)
= 7γ! e-üTirt)dt, ¿JC J\t\eBn J x
-
rttow
* TSB^S Σ
t. Dn
-
^ r ·
A=1
Even without the assumption of finiteness of moments, it may nevertheless turn out that there exist numbers $a_n$ and $b_n > 0$, $n = 1, 2, \ldots$, such that the distribution function $F_n^*(x)$ of the random variable
satisfies (1.6.11). THEOREM 1.6.10. LetXi.Xi,
··· be a sequence of independent random variables with common distribution function F(x). Sequences of constants an and bn > 0, η = 1,2,... such that the distribution functions F*(x) of the random variables S„ satisfy (1.6.11) exist if and only if lim x2P(LXi| >x) ( [
j^dFXyji
\J\y\°o h(x)
= 1.
(1.6.16)
If there exist sequences of numbers $a_n$ and $b_n > 0$ such that the distribution functions of the sums $S_n^*$ (see (1.6.14)) of independent random summands $X_k$ with common distribution function $F(x)$ weakly converge to a distribution function $G(x)$, then $F(x)$ is said to belong to the domain of attraction of $G(x)$. Thus, condition (1.6.15) is necessary and sufficient for $F(x)$ to belong to the domain of attraction of the standard normal law.

THEOREM 1.6.11. Let $X_1, X_2, \ldots$
be a sequence of independent random variables with distribution functions F\(x),F2(x),..., respectively. Sequences of numbers an and bn > 0, η = 1,2... such that Χχ,Χζ,... satisfy the condition of uniform negligibility in the form lim max P(LX¿| > ebn) = 0
n->oo \ 0, 61 e IR and b 0 and 6 e IR such that G(a\x + 61) * G(fi2X + ¿>2)s G(ax + 6). A characteristic function g(t) is stable if and only if it can be represented as g(t) = exp jioí - φ| α ( l +
fca))
}
where
[tan if α * 1, J log |f I. if α * 1.
Here the parameter α is called the characteristic exponent. All non-degenerate stable distributions are absolutely continuous. The examples of stable distributions are normal (a = 2), Cauchy (a = 1), Lévy (a = 1/2) with the density Pwiix) = ι and the distribution with the density p(x) = ρ y 0. Other examples of stable distributions whose densities are represented through elementary functions are unknown. Stable distributions with characteristic exponents a < 2 have moments of orders δ < a and do not have moments of orders δ > a, that is, stable laws (except the normal one) have heavy tails. All stable laws and also negative binomial, gamma and Poisson distribution are examples of the so-called infinitely divisible laws. A distribution function F(x) with characteristic function f(t) is called infinitely divisible, if for any η > 1 there exists a characteristic function fn{t) such that f(t)^(fn(t))n. Infinitely divisible characteristic functions do not take zero values anywhere.
1.7. The Berry-Esseen inequality

Let $X_1, X_2, \ldots$ be a sequence of independent random variables with finite expectations $\mu_i = \mathsf{E}X_i$ and variances $\sigma_i^2 = \mathsf{D}X_i$, $i \ge 1$. Denote $B_n^2 = \sigma_1^2 + \ldots + \sigma_n^2$ and assume that $0 < B_n^2 < \infty$. The assumption of existence of moments of orders higher than the second makes it possible not only to establish the weak convergence of the distribution functions $F_n^*(x)$ of the normalized sums
$$\frac{1}{B_n} \sum_{i=1}^{n} (X_i - \mu_i)$$
to the standard normal distribution function $\Phi(x)$, but to estimate the rate of this convergence as well.

THEOREM 1.7.1. Assume that for some $\delta \in (0, 1]$ the moments $\mathsf{E}|X_i - \mu_i|^{2+\delta}$, $i \ge 1$, are finite. Then
$$\sup_x |F_n^*(x) - \Phi(x)| \le \frac{C}{B_n^{2+\delta}} \sum_{i=1}^{n} \mathsf{E}|X_i - \mu_i|^{2+\delta}. \tag{1.7.1}$$
In the case $\delta = 1$ inequality (1.7.1) is called the Esseen inequality. Moreover, in this case $C \le 0.7915$ (Shiganov, 1986). If the random variables $X_1, X_2, \ldots$ have identical distributions and $\delta = 1$, then inequality (1.7.1) takes the form
$$\sup_x |F_n^*(x) - \Phi(x)| \le C_0\, \frac{\mathsf{E}|X_1 - \mu|^3}{\sigma^3 \sqrt{n}}, \tag{1.7.2}$$
where $\mu = \mathsf{E}X_1$, $\sigma^2 = \mathsf{D}X_1$. Inequality (1.7.2) is called the Berry-Esseen inequality. It is now known that $C_0 \le 0.7655$ (Shiganov, 1986). The absolute constants $C$ and $C_0$ in inequalities (1.7.1) and (1.7.2) cannot be less than $(2\pi)^{-1/2}$. The lower bound of the constant $C_0$ in the Berry-Esseen inequality is equal to
where the first supremum is taken over all $n$ and all distribution functions $F(x)$ with zero means and finite third moments. The exact value of this bound is unknown. However, it can be shown that
$$\limsup_{n \to \infty} \sup_F \frac{\sqrt{n}\,\sigma^3}{\mathsf{E}|X_1 - \mu|^3} \sup_x |F_n^*(x) - \Phi(x)| = \frac{\sqrt{10} + 3}{6\sqrt{2\pi}} > 0.4097.$$
The Berry-Esseen inequality admits the following refinements and modifications.

1. Let $H_1$ be the set of functions $g(x)$ defined for all real $x$ and satisfying the conditions (i) the function $g(x)$ is nonnegative, even and does not decrease for $x > 0$; (ii) the function $x/g(x)$ does not decrease for $x > 0$. Note that, for example, for $0 < \delta \le 1$ the function $g(x) = |x|^\delta$ is in $H_1$. Let $X_1, X_2, \ldots, X_n$ be independent identically distributed random variables such that $\mathsf{E}X_1 = 0$, $\mathsf{D}X_1 = \sigma^2$ and $\mathsf{E}X_1^2 g(X_1) < \infty$ for some function $g(x) \in H_1$. Then (Katz, 1963)
$$\sup_x |F_n^*(x) - \Phi(x)| \le A\, \frac{\mathsf{E}X_1^2 g(X_1)}{\sigma^2 g(\sigma\sqrt{n})}, \quad A > 0. \tag{1.7.3}$$
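2. Let $X_1, X_2, \ldots, X_n$ be independent identically distributed random variables with $\mathsf{E}X_1 = \mu$, $\mathsf{D}X_1 = \sigma^2$ and $\mathsf{E}|X_1|^{2+\delta} < \infty$ for some $\delta \in (0, 1]$. The following non-uniform estimate holds (see (Nagaev, 1965; Bikelis, 1966)):
$$|F_n^*(x) - \Phi(x)| \le C(\delta)\, \frac{\mathsf{E}|X_1 - \mu|^{2+\delta}}{\sigma^{2+\delta} (1 + |x|)^{2+\delta} n^{\delta/2}}, \tag{1.7.4}$$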
where C( 3 1 . 9 4 J ^ i t L ,
(1.7.5)
3. LetXi.X.. .,Xn be independent identically distributed random variables with EXi = 0, DXi = σ 2 and E|Xi|r < oo for some r > 3. Then CL« where the constant C(r) > 0 is finite and depends only on r. At the same time, in (Mackevicius, 1983) it was shown that if the existence of only the second moment of summands is assumed, then the distributions of normalized sums of independent identically distributed random variables may approach the normal law arbitrarily slow. The accuracy of the normal approximation are considered in full detail in the books (Petrov, 1975; Bhattacharya and Ranga Rao, 1976; Senatov, 1998).
1.8. Asymptotic expansions in the central limit theorem

In this section we define and discuss Edgeworth expansions as approximations to distribution functions of some random variables typically arising in probability theory and mathematical statistics. First, we present some of the main results of the classical theory of Edgeworth expansions for the distribution functions of sums of independent identically distributed random variables. Second, we briefly consider the problem of extending the theory of Edgeworth expansions for sums of independent random variables to more general random variables.
A sequence of distribution functions $F_n(x)$ is said to admit an Edgeworth expansion of order $k \in \mathbb{N}$ if
$$\sup_x |F_n(x) - \Phi_{n,k}(x)| = o(\tau^k), \quad \tau = n^{-1/2}, \tag{1.8.1}$$
with $\Phi_{n,k}(x)$ of the form
$$\Phi_{n,k}(x) = \Phi(x) + \sum_{j=1}^{k} \tau^j Q_j(x) \varphi(x),$$
where $Q_j(x)$ are polynomials. The classical theory of Edgeworth expansions is concerned with sums of independent random variables. The validity of Edgeworth expansions was first proved for distribution functions of sums of independent identically distributed random variables by H. Cramér, see (Cramér, 1937). For further development of this theory, see (Bergström, 1951; Gnedenko and Kolmogorov, 1954; Petrov, 1975; Bhattacharya and Ranga Rao, 1976; Bhattacharya and Denker, 1990). A nice introduction can be found in (Feller, 1971; Hall, 1992). For applications in statistics see, e.g. (Bening, 2000). Let $X_1, \ldots, X_n$ be independent and identically distributed random variables with mean $\mu$ and finite variance $\sigma^2 > 0$. Define the normalized sum $S_n^*$ and its distribution function $F_n^*(x)$ as
s : - , ± x : .
;
«
.
w
1
= p 0 such that for any Τ > 0 suV\F(x)-G(x)\ 0 this expansion remains valid for all |ί| < δ^/η because |/^)|, f ι),. ,.,Χ(ω, tj¡)). The distributions of these random vectors for various k > 1 and t\,...,tk are called finite-dimensional distributions of the process (X(í), t e Τ}. Denote Fh
tjk frι,
...,**) = Ρ({ω: X(o,h) < x1
X(œ,tk) < «λ})·
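The set $\{F_{t_1,\ldots,t_k}(x_1, \ldots, x_k)\}$ is said to satisfy the condition of consistency, if for any $k \ge 1$, any $t_i \in T$, $i = 1, \ldots, k$, any $x_i \in \mathbb{R}$, $i = 1, \ldots, k$, and any $j \in \{1, \ldots, k\}$ we have
$$F_{t_1,\ldots,t_j,\ldots,t_k}(x_1, \ldots, x_{j-1}, \infty, x_{j+1}, \ldots, x_k) = F_{t_1,\ldots,t_{j-1},t_{j+1},\ldots,t_k}(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_k). \tag{1.10.1}$$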
The famous Kolmogorov theorem states that if the set of finite-dimensional distribution functions $\{F_{t_1,\ldots,t_k}(x_1, \ldots, x_k)\}$ satisfies the condition of consistency, then it uniquely defines the distribution of the random process $X$. More precisely, the following statement is true.

THEOREM 1.10.1 (A. N. Kolmogorov). Let $\{F_{t_1,\ldots,t_k}(x_1, \ldots, x_k),\ t_i \in T \subseteq \mathbb{R},\ i = 1, \ldots, k,\ t_1 < \ldots < t_k,\ k \ge 1\}$ be a family of finite-dimensional distribution functions satisfying the condition of consistency (1.10.1). Then there exists a probability space $(\Omega, \mathcal{A}, \mathsf{P})$ and a stochastic process $\{X(t), t \in T\}$ such that
Ρ({ω: X((û,ti) 1, any í¿ e Τ, i = 0,..., η such that to < t\ < ... 0 such that t e T,t + heT,seT,s + heT. A homogeneous stochastic process with independent increments is called a Lévy process. A very important example of a stochastic process with continuous sample paths which can be regarded as a model of continuous chaos is the so-called Wiener process used to describe the coordinate of a particle undergoing Brownian motion. A Wiener process is a stochastic process $\{W(t), t \ge 0\}$ satisfying the following conditions: (i) $W(0) = 0$; (ii) the process $W(t)$ has independent increments; (iii) the process $W(t)$ is homogeneous; (iv) there exist $a \in \mathbb{R}$ and $\sigma^2 > 0$ such that, as $t \to 0$, $\mathsf{E}W(t) = at + o(t)$, $\mathsf{E}W^2(t) = \sigma^2 t + o(t)$, $\mathsf{E}|W(t)|^3 = o(t)$. A Wiener process is an example of a Lévy process. It can be shown that all finite-dimensional distributions of a Wiener process are normal. In particular, $\mathsf{P}(W(t) < x) = \Phi((x - at)/(\sigma\sqrt{t}))$. The parameters $a$ and $\sigma^2$ are called the drift and diffusion coefficients, respectively. Now consider the weak convergence of stochastic processes. For further detail we refer the interested reader to the classical book (Billingsley, 1968). Let $\mathsf{P}, \mathsf{P}_1, \mathsf{P}_2, \ldots$ be probability measures on $\mathcal{D}$. We say that $\mathsf{P}_n$ weakly converge to $\mathsf{P}$ and denote this as $\mathsf{P}_n \Rightarrow \mathsf{P}$, if
as $n \to \infty$ for any real continuous bounded function $f$ on $\mathcal{D}$. Let $\{X_n\}_{n \ge 1}$ be a sequence of stochastic processes. We shall say that the sequence $\{X_n\}_{n \ge 1}$ converges in distribution to a stochastic process $X$ as $n \to \infty$ and denote this as $X_n \Rightarrow X$, if $\mathsf{P}_{X_n} \Rightarrow \mathsf{P}_X$. Sometimes instead of the convergence of stochastic processes in distribution we will speak of their weak convergence. Let $D = D[0,1]$ be the space of real functions defined on $[0,1]$ which are right-continuous and have left-hand limits: (i) for $0 \le t < 1$ the limit $x(t+) = \lim_{s \downarrow t} x(s)$ exists and $x(t+) = x(t)$; (ii) for $0 < t \le 1$ the limit $x(t-) = \lim_{s \uparrow t} x(s)$ exists.
Let $\mathfrak{F}$ be the class of strictly increasing continuous mappings of the interval $[0,1]$ onto itself. Let $f$ be a non-decreasing function on $[0,1]$, $f(0) = 0$, $f(1) = 1$. Set
$$\|f\| = \sup_{s \ne t} \left| \log \frac{f(t) - f(s)}{t - s} \right|.$$
If $\|f\| < \infty$, then the function $f$ is continuous and strictly increasing and hence, it belongs to $\mathfrak{F}$. Define the distance $d_0(x, y)$ in the set $D[0,1]$ as the greatest lower bound of those positive numbers $\varepsilon$ for which $\mathfrak{F}$ contains some function $f$ such that $\|f\| \le \varepsilon$ and
$$\sup_t |x(t) - y(f(t))| \le \varepsilon.$$
It can be demonstrated that the space $D[0,1]$ is complete with respect to the metric $d_0$. The metric space $(D[0,1], d_0)$ is called the Skorokhod space. Everywhere in what follows we will assume that $\mathcal{D} = (D[0,1], d_0)$, that is, we will consider stochastic processes as random elements in $\mathcal{D} = (D[0,1], d_0)$. Let $X, X_1, X_2, \ldots$ be random elements with values in $\mathcal{D}$. Let $T_X$ be the set of points of the interval $[0,1]$ such that $0 \in T_X$, $1 \in T_X$ and for $0 < t < 1$ we have $t \in T_X$ if and only if
$$\mathsf{P}(X(t) \ne X(t-)) = 0.$$
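The following theorem establishes sufficient conditions of the weak convergence of stochastic processes in $\mathcal{D}$.

THEOREM 1.10.2. Let for any $k \in \mathbb{N}$
$$(X_n(t_1), \ldots, X_n(t_k)) \Rightarrow (X(t_1), \ldots, X(t_k)), \quad n \to \infty,$$
for $t_1, \ldots, t_k$ belonging to $T_X$. Let $\mathsf{P}(X(1) \ne X(1-)) = 0$ and let there exist a non-decreasing continuous function $F$ on $[0,1]$ such that for any $\varepsilon > 0$
$$\mathsf{P}(|X_n(t) - X_n(t_1)| \ge \varepsilon,\ |X_n(t_2) - X_n(t)| \ge \varepsilon) \le \varepsilon^{-2\delta} [F(t_2) - F(t_1)]^{2\alpha} \tag{1.10.2}$$
for $t_1 \le t \le t_2$ and $n \ge 1$, where $\delta \ge 0$, $\alpha > 1/2$. Then
$$X_n \Rightarrow X, \quad n \to \infty.$$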
The proof of this theorem can be found, say, in (Billingsley, 1968). Condition (1.10.2) can be replaced by a more restrictive moment condition
$$\mathsf{E}\left[|X_n(t) - X_n(t_1)|^{\delta}\, |X_n(t_2) - X_n(t)|^{\delta}\right] \le [F(t_2) - F(t_1)]^{2\alpha}. \tag{1.10.3}$$
To conclude this section, we consider some elementary properties of superpositions of independent Lévy processes. Let $Y$ and $U$ be arbitrary independent random variables such that (1) $\mathsf{P}(Y = 0) < 1$; (2) $\mathsf{P}(U \ge 0) = 1$, $\mathsf{P}(U = 0) < 1$; (3) $\mathsf{E}Y^2 < \infty$, $\mathsf{E}U^2 < \infty$. Using the random variables $Y$ and $U$, we can define the independent Lévy processes $\{Y(t), t \ge 0\}$ and $\{U(t), t \ge 0\}$ such that (4) $Y(1) \stackrel{d}{=} Y$, $U(1) \stackrel{d}{=} U$; (5) $U(0) = 0$, $\mathsf{P}(U(t) < \infty) = 1$ for any $t > 0$, almost all trajectories of the process $U(t)$ do not decrease and are right-continuous. Consider the random variable $Z = Y(U(1))$. It is not difficult to show that the characteristic function of $Z$ has the form
$$\mathsf{E}e^{isZ} = \mathsf{E}\left[\mathsf{E}e^{isY(1)}\right]^{U(1)}, \quad s \in \mathbb{R}, \tag{1.10.4}$$
(this follows from the fact that, by the definition of a Lévy process, for any $t > 0$ the random variable $Y(t)$ has the same distribution as the sum $Y_{n,1} + \ldots + Y_{n,n}$ of independent random variables such that $Y_{n,j} \stackrel{d}{=} Y(t/n)$, $j \ge 1$, where $n \in \mathbb{N}$ is arbitrary). Relation (1.10.4) holds for any nonnegative random variable $U(1)$ such that $\mathsf{P}(U(1) = 0) < 1$. And if the random variable $U(1)$ is infinitely divisible, as in our case, then the random variable $Z$ is also infinitely divisible. Indeed, in this situation for any $n \ge 2$ there exist independent identically distributed random variables $U_{n,1}, \ldots, U_{n,n}$ providing the representation $U(1) \stackrel{d}{=} U_{n,1} + \ldots + U_{n,n}$ so that
$$\mathsf{E}e^{isZ} = \mathsf{E}\left[\mathsf{E}e^{isY(1)}\right]^{U_{n,1} + \ldots + U_{n,n}} = \left[\mathsf{E}\varphi^{U_{n,1}}(s)\right]^n, \quad s \in \mathbb{R},$$
where 0. For this purpose, consider its characteristic function y/t(s) = EeIsX(i) defined for 69
s e IR. With the use of properties 1, 2, and 3 for an arbitrary h > 0 we obtain ψι+his) = EeisX(t+h)
=
Ej^t+Ki-xiD+xm
= Ee ^ (t) Ee" [X(i+w - X(t)1 = Ee is * (i) Ee^ (/i) = %(«)% 0 is arbitrary, is equal to s _ 1 l ( 0 0, among all homogeneous point processes with one and the same given intensity λ > 0, the Poisson process has the greatest λί-dimensional entropy in the interval (0, t). Other entropy characteristics are discussed with respect to the Poisson process in (Rudemo, 1964; McFadden, 1965b).
2.3. The asymptotic normality of a Poisson process

Let $N(t)$, $t \ge 0$, be a Poisson process with intensity $\lambda > 0$. As we have seen above, $\mathsf{E}N(t) = \mathsf{D}N(t) = \lambda t$. The aim of this section is to establish the asymptotic normality of a Poisson process in the sense that
$$\mathsf{P}\left(\frac{N(t) - \lambda t}{\sqrt{\lambda t}} < x\right) \Rightarrow \Phi(x) \quad \text{as } \lambda t \to \infty. \tag{2.3.1}$$
Actually, this statement has already been proved in Section 1.4 (see Theorem 1.4.6). But here we will prove a somewhat stronger assertion which implies (2.3.1). We will show that the convergence of the distribution functions in (2.3.1) is uniform in $x$ and will construct an estimate of the rate of this uniform convergence. Let $C_0$ be the absolute constant in the Berry-Esseen inequality,
$$0.4097 < \frac{\sqrt{10} + 3}{6\sqrt{2\pi}} \le C_0 \le 0.7655$$
(see Section 1.7).

THEOREM 2.3.1. For any $\lambda > 0$, $t > 0$ we have
$$\sup_x \left| \mathsf{P}\left(\frac{N(t) - \lambda t}{\sqrt{\lambda t}} < x\right) - \Phi(x) \right| \le \frac{C_0}{\sqrt{\lambda t}}. \tag{2.3.2}$$
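The uniform distance on the left-hand side of (2.3.2) can be evaluated numerically for a given value of $\lambda t$. The following is a rough sketch, not part of the original argument: it truncates the Poisson distribution far in the right tail and uses the upper estimate $0.7655$ for $C_0$ quoted in Section 1.7. The names `poisson_sup_distance` and `C0_UPPER` are hypothetical.

```python
import math

C0_UPPER = 0.7655  # upper estimate for the Berry-Esseen constant quoted in Section 1.7

def phi_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def poisson_sup_distance(mu):
    """sup_x |P((N - mu) / sqrt(mu) < x) - Phi(x)| for N ~ Poisson(mu), mu = lambda * t.

    The supremum of |step function - continuous function| is attained at a jump,
    so only left and right limits at the atoms k = 0, 1, 2, ... are inspected.
    Atoms beyond mu + 12 * sqrt(mu) + 20 are ignored (truncation assumption).
    """
    sd = math.sqrt(mu)
    upper = int(mu + 12 * sd + 20)
    cdf = 0.0
    worst = 0.0
    for k in range(upper + 1):
        normal = phi_cdf((k - mu) / sd)
        worst = max(worst, abs(cdf - normal))                        # left limit
        cdf += math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))  # P(N = k)
        worst = max(worst, abs(cdf - normal))                        # right limit
    return worst

for mu in (0.5, 1.0, 5.0, 25.0, 100.0, 400.0):
    print(mu, poisson_sup_distance(mu), C0_UPPER / math.sqrt(mu))
```

For each listed value of $\lambda t$ the computed distance is printed next to the bound $0.7655/\sqrt{\lambda t}$, so the two columns can be compared directly.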
PROOF. Let $n \in \mathbb{N}$ be arbitrary. The characteristic function of the random variable $N(t)$ can be represented as
$$\mathsf{E}\exp\{isN(t)\} = \exp\{\lambda t (e^{is} - 1)\} = \left(\exp\left\{\frac{\lambda t}{n}(e^{is} - 1)\right\}\right)^n, \quad s \in \mathbb{R}.$$
This implies that $N(t) \stackrel{d}{=} N_{n,1} + \ldots + N_{n,n}$, where $N_{n,1}, \ldots, N_{n,n}$ are independent random variables with the same Poisson distribution with parameter $\lambda t/n$ so that $\mathsf{E}N_{n,1} = \mathsf{D}N_{n,1} = \lambda t/n$. Denote the left-hand side of inequality (2.3.2) by $\Delta(\lambda t)$. Then by the Berry-Esseen inequality (see (1.7.2)) we have
$$\Delta(\lambda t) \le \frac{C_0}{\sqrt{n}} \cdot \frac{\mathsf{E}|N_{n,1} - \lambda t/n|^3}{(\mathsf{D}N_{n,1})^{3/2}}. \tag{2.3.3}$$
Since (2.3.3) holds for any $n \in \mathbb{N}$ playing the role of an auxiliary parameter, we can choose $n$ so that $n > \lambda t$. But then $\Delta(\lambda t)$
0 as k the inequality L2((Uk,
Vk), (U, V)) < L2((Uk,
00, k e
Vk), (U'k, V'k)) + L2((U'k,V'k),
%Similarly, (U, V))
and condition (3.4.5) imply L2((Uk, Vk), (U, V)) 0 as k 00, k e 3ti. We apply Theorem 3.3.1 to the sequences and {(Uk, V*)}/fee3Ci and obtain Li(Zk,Z) -> 0 as k 00, k e 3£i, which contradicts the assumption that Li(Zk, Z)> δ for all k e 3Ci. The theorem is thus proved. •
The formulation of Theorem 3.4.1 contains the condition of existence of sequences of numbers $\{a_k\}$, $\{b_k\}$, and $\{d_k\}$, which guarantee the weak compactness of the families of random variables $\{(S_k - a_k)/b_k\}_{k \ge 1}$ and $\{b_{N_k}/d_k\}_{k \ge 1}$. In practice, this condition should not be regarded as too restrictive. It is quite easy to see that the first of these families is weakly compact if there exist moments of the original sequence of any positive order. Indeed, let $0 < \mathsf{E}|S_k - a_k|^\gamma < \infty$ for some $\gamma > 0$ and all $k \ge 1$. Then by the Markov inequality, setting $b_k = (\mathsf{E}|S_k - a_k|^\gamma)^{1/\gamma}$, we find that
$$\lim_{x \to \infty} \sup_k \mathsf{P}\left(\left|\frac{S_k - a_k}{b_k}\right| > x\right) \le \lim_{x \to \infty} \sup_k \frac{\mathsf{E}|S_k - a_k|^\gamma}{(x b_k)^\gamma} = \lim_{x \to \infty} \frac{1}{x^\gamma} = 0, \tag{3.4.13}$$
which by Theorem 1.4.9 means that the sequence {( 0 and all k > 1. Denote bk = (E|Sa — ak\y)Vy and assume that bk oo as k oo. Let {i be a sequence of positive numbers such that dk -» oo (k -> oo) and the family {bNkldk}k¿ι is weakly compact. Convergence (3.4.3) takes place with some real if and only if there exists a weakly compact sequence of triples of random variables {(Y¿, U'k, such that (Y'k, U'k, V'k) e Ύ{Ζ) for each k>l and conditions (3.4.4) and (3.4.5) hold. Furthermore, in many situations it is quite appropriate to replace the condition of the weak compactness of the sequence {Y*} by the condition of its weak convergence. As we will see in Theorem 3.4.4, in this case the formulations are considerably simplified. Finally, if a very important special type of randomly indexed random sequences, random sums, is considered, then the condition of the weak compactness of the sequence {b?jkldk}k>i can be replaced ρ by a quite natural condition Nk -» oo (k -» oo) (see Section 3.6). Theorem 3.4.1 is a complete inversion of Theorem 3.3.1. But situations are possible where either the properties of the terms of the sequence {S*} or those of the indices guarantee that either condition (3.2.1) or (3.3.1) hold. Now we will formulate two corresponding modifications of Theorem 3.4.1. We begin with the situation where (3.3.1) holds. For an arbitrary random variable Ζ and an arbitrary pair ([/, V) with Ρ(Ϊ7 > 0) = 1, we introduce the set °ΙΧ(Ζ I U, V) = {Υ : Ζ = YU + V, Y and (U, V) are independent}. The main distinction of this situation from the one considered in Theorem 3.4.1 is that for some Z, U, and V the set may be empty. Indeed, let, for example, P(V = 0) = 1 and let the random variable U be exponentially distributed. Let Ζ be a positive random variable. Then the condition
$\mathsf{P}(Y > 0) = 1$ is necessary for the representation $Z \stackrel{d}{=} YU$ to take place. In this case for any $x \ge 0$
$$\mathsf{P}(YU > x) = \int_0^\infty e^{-sx}\,dG(s), \tag{3.4.14}$$
where $G(s) = \mathsf{P}(Y^{-1} < s)$. But on the right-hand side of (3.4.14) there is the Laplace-Stieltjes transform of the distribution function $G$. Therefore, if as $Z$ we take a random variable such that the function $Q(s) = \mathsf{P}(Z > s)$ is not completely monotone, then the representation $Z \stackrel{d}{=} YU$ becomes impossible for any random variable $Y$ independent of $U$ because by virtue of the Bernshtein theorem (see, e.g. (Feller, 1971, Chapter XIII)) any Laplace-Stieltjes transform must be completely monotone.

THEOREM 3.4.3. Assume that sequences of numbers $\{a_k\}$, $\{b_k\}$, $\{c_k\}$, $\{d_k\}$ are such that $b_k > 0$, $b_k \to \infty$, $d_k > 0$, $d_k \to \infty$ as $k \to \infty$ and convergence (3.3.1) takes place whereas the family of random variables $\{(S_k - a_k)/b_k\}_{k \ge 1}$ is weakly compact. Convergence (3.4.3) takes place if and only if there exists a weakly compact sequence of random variables $\{Y'_k\}_{k \ge 1}$ such that $Y'_k \in \mathcal{U}(Z \mid U, V)$ for each $k \ge 1$ and condition (3.4.4) holds.
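The example with the exponentially distributed $U$ discussed before Theorem 3.4.3 can be probed numerically. A completely monotone function $Q$ satisfies $(-1)^m \Delta_h^m Q(s) \ge 0$ for all forward differences, so a single violated difference shows that $Q$ cannot be a Laplace-Stieltjes transform. The sketch below is illustrative only: the two-point distribution of $Y$, the uniform distribution of $Z$ and the function names are assumptions, and a finite check of this kind can refute complete monotonicity but cannot prove it.

```python
import math

def alternating_differences_ok(q, s0, h, max_order, tol=1e-12):
    """Check (-1)^m * (m-th forward difference of q at s0, step h) >= 0 for m = 0..max_order."""
    for m in range(max_order + 1):
        diff = sum((-1) ** (m - j) * math.comb(m, j) * q(s0 + j * h)
                   for j in range(m + 1))
        if (-1) ** m * diff < -tol:
            return False
    return True

def mixture_tail(x):
    """P(YU > x) for U ~ Exp(1) and Y equal to 1 or 2 with probability 1/2 each."""
    return 0.5 * math.exp(-x) + 0.5 * math.exp(-x / 2)

def uniform_tail(s):
    """P(Z > s) for Z uniformly distributed on (0, 1)."""
    return min(1.0, max(0.0, 1.0 - s))

print(alternating_differences_ok(mixture_tail, 0.5, 0.25, 6))
print(alternating_differences_ok(uniform_tail, 0.5, 0.25, 3))
```

The tail of the product $YU$ passes the check, as (3.4.14) requires, while the tail of the uniform law fails at the third difference, so under these assumptions a uniformly distributed $Z$ admits no representation $Z \stackrel{d}{=} YU$ with exponential $U$.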
To formulate one more variant of partial inversion of Theorem 3.3.1, for any two random variables $Z$ and $Y$ introduce the set
$$\mathcal{W}(Z \mid Y) = \{(U, V) : Z \stackrel{d}{=} YU + V,\ Y \text{ and } (U, V) \text{ are independent}\},$$
which contains all pairs of random variables $(U, V)$ independent of $Y$ providing the representation $Z \stackrel{d}{=} YU + V$. Using the same arguments as those used to describe the properties of the set $\mathcal{V}(Z)$ we can easily make sure that, first, the set $\mathcal{W}(Z \mid Y)$ is always nonempty and, second, for some $Z$ and $Y$ the set $\mathcal{W}(Z \mid Y)$ can contain more than one element. The following theorem makes it possible to bring the property of identifiability of mixtures of distributions to bear on the investigation of the asymptotic properties of randomly indexed random sequences. Later we will see that this leads to a considerable simplification of formulations.

THEOREM 3.4.4. Assume that (3.2.1) holds with some sequences of positive numbers $\{b_k\}$ ($b_k \to \infty$ as $k \to \infty$), real numbers $\{a_k\}$ and some random variable $Y$. Let $\{d_k\}$ be positive numbers such that $d_k \to \infty$ ($k \to \infty$) and the family $\{b_{N_k}/d_k\}_{k \ge 1}$ is weakly compact. Convergence (3.4.3) to some random variable $Z$ takes place with some real numbers $\{c_k\}$ if and only if there exists a weakly compact sequence of pairs $\{(U'_k, V'_k)\}_{k \ge 1}$ such that $(U'_k, V'_k) \in \mathcal{W}(Z \mid Y)$ for each $k \ge 1$ and (3.4.5) holds.
REMARK 3.4.1. In all statements of this chapter as an indispensable condition, we have
$$b_k \to \infty, \quad d_k \to \infty, \quad k \to \infty. \tag{3.4.15}$$
This condition plays the main role in the proof of Lemma 3.2.1. In other words, we consider the normalization which contracts the sequences under study. This normalization is typical for limit theorems which describe the asymptotic behavior of sums of independent random variables. At the same time, in the asymptotic problems of mathematical statistics the dilative normalization by sequences tending to zero is traditionally used. It can be easily seen that Lemma 3.2.1 and all the other statements of this chapter remain valid if condition (3.4.15) is replaced by the condition
$$b_k \to 0, \quad d_k \to 0, \quad k \to \infty.$$
3.5. Convergence of distributions of randomly indexed sequences to identifiable location or scale mixtures. The asymptotic behavior of extremal random sums

As we have seen in Section 3.3, the limit laws for the random variables $Z_k = (S_{N_k} - c_k)/d_k$ have the form $\mathsf{E}H((x - V)/U)$, where $H(x)$ is the limiting distribution function for the random variables $Y_k = (S_k - a_k)/b_k$. These distributions are mixtures of the distribution function $H$ in which mixing is performed over both location and scale parameters. In this section we will concentrate our attention on the situations where in the mixture approximating the distribution of $Z_k$ either the random variable $U$ or the random variable $V$ is degenerate. We start with the case where $U$ is degenerate. Using the notation introduced above, from representation (3.3.5) we obtain that shift (location) mixtures appear as limit laws for $Z_k$ when $\mathsf{P}(U = b) = 1$ for some $b \ge 0$ (if $b = 0$, then the limit distribution function coincides with $\mathsf{P}(V < x)$). Without loss of generality we can assume that $b = 1$. Denote
$$\mathcal{L}(Z \mid Y) = \{V : Z \stackrel{d}{=} Y + V,\ Y \text{ and } V \text{ are independent}\}.$$
In general, for some $Z$ and $Y$ the set $\mathcal{L}(Z \mid Y)$ may contain more than one element (recall that, speaking in terms of random variables, we actually mean their distributions). The element $V \in \mathcal{L}(Z \mid Y)$ is unique (if it exists) for identifiable families considered below.

THEOREM 3.5.1. Let $b_k \to \infty$, $d_k \to \infty$, $b_{N_k}/d_k \Rightarrow 1$ and $Y_k \Rightarrow Y$ as $k \to \infty$. The convergence
$$Z_k \Rightarrow Z, \quad k \to \infty, \tag{3.5.1}$$
takes place with some real $c_k$, $k \ge 1$, if and only if there exists a weakly compact sequence of random variables $V'_k \in \mathcal{L}(Z \mid Y)$, for which
This assertion reduces to Theorem 3.4.4, because the weak compactness of the sequence $\{b_{N_k}/d_k\}_{k \ge 1}$ is a consequence of its convergence to 1. From representation (3.3.5) we see that scale mixtures of distributions limiting for nonrandomly indexed random sequences appear as limit laws for sequences with random indices when $\mathsf{P}(V = b) = 1$ for some $b \in \mathbb{R}^1$. Without loss of generality we will assume that $b = 0$. For any fixed random variables $Z$ and $Y$ denote
$$\mathcal{S}(Z \mid Y) = \{U : Z \stackrel{d}{=} UY,\ \mathsf{P}(U \ge 0) = 1,\ U \text{ and } Y \text{ are independent}\}.$$
In general, for some $Z$ and $Y$ the set $\mathcal{S}(Z \mid Y)$ may contain more than one element. The random variable $U \in \mathcal{S}(Z \mid Y)$ (if it exists) is unique if the family of mixtures under consideration is identifiable (see below).

THEOREM 3.5.2. Let $b_k \to \infty$, $d_k \to \infty$ and $Y_k \Rightarrow Y$ as $k \to \infty$ where $\mathsf{P}(Y = 0) < 1$. Assume that
$$N_k \stackrel{P}{\to} \infty, \quad k \to \infty, \tag{3.5.2}$$
and the sequences $\{a_k\}$, $\{c_k\}$ and $\{d_k\}$ provide the convergence
$$\frac{a_{N_k} - c_k}{d_k} \Rightarrow 0, \quad k \to \infty. \tag{3.5.3}$$
Convergence (3.5.1) takes place if and only if there exists a weakly compact sequence of random variables $\{U'_k\}_{k \ge 1}$, $U'_k \in \mathcal{S}(Z \mid Y)$, for which
$$L_1\left(\frac{b_{N_k}}{d_k},\ U'_k\right) \to 0, \quad k \to \infty. \tag{3.5.4}$$
We begin with proving the 'only if part. We reduce the proof to Theorem 3.4.4. For this purpose show that the sequence is weakly compact if (3.5.1), (3.5.2), and(3.5.3)hold. Since Sw* -ow* = GSw* — c¿) — (α^ —c¿) and the sequences {( 0 which satisfy the condition PROOF.
P(|Y| > R ) > 5 ,
(3.5.5)
where —R and R are continuity points of the distribution function of the random variable Y. For R defined in such a way and for an arbitrary* > 0, SNk — ojvt
>x
SNk — ClNk
> Ρ
dk
SNk —ONk
>x\
b
dk
-1
SNk — axk
\
>R
Nk SNk — ®Nk >
dk
b
bN„
SNk
>R
N„
aNk
—
>R
bm
oo
/h
— 0>n
bn Sn
>R
On bn
>R
(3.5.6)
(the last equality takes place since any constant is independent of any random variable). Since (S n — an)/bn => Y as η -» e», from (3.5.5) it follows that there exists no = no(R, δ) such that Sn
—
o,n
>R
>
bn
for all η > no. Therefore, continuing (3.5.6) we obtain SN,, — am dk
'
n=n0+1 _
δ
~
2
> δ~
2
Λθ
bNk
bn
(3.5.7)
P[b-^>^)-P(Nk oo, dk -» oo and Y* 1. Assume that (3.5.2) holds. Then Sn
" ~ÜNk => Z, dk
Y as k
k -» oo,
oo with P(Y = 0)
1, we reduce the situation under consideration to Theorem 3.5.2 with = c* = 0. For these centering sequences condition (3.5.3) holds automatically. The theorem is proved. • If the random variables Sk, k > 1, are cumulative sums of independent random variables, then it becomes possible to get rid of the requirement that the random variable Y should be nondegenerate at zero (see Section 3.6). We continue to investigate the conditions of weak convergence of randomly indexed random sequences under the assumption that the sequence with nonrandom indices, being appropriately centered and normalized, weakly converges to some random variable Y with the distribution function H. In this
situation, if the family of location/scale mixtures of the distribution function $H$ is identifiable, then the conditions of convergence take an especially simple form. Recall the definition of identifiable families (Teicher, 1961). Let a function $G(x, y)$ be defined on $\mathbb{R}^1 \times \mathbb{R}^m$, $m \ge 1$, and, moreover, let it be measurable in $y$ for each fixed $x$ and for each fixed $y$ let it be a distribution function as a function of $x$. Let $\mathcal{Q}$ be a family of $m$-dimensional random variables. Denote $\mathcal{F} = \{F(x) = \mathsf{E}G(x, Q) : Q \in \mathcal{Q}\}$. The family $\mathcal{F}$ determined by the kernel $G$ and the set $\mathcal{Q}$ is called identifiable, if from the equality $\mathsf{E}G(x, Q_1) = \mathsf{E}G(x, Q_2)$, $x \in \mathbb{R}$, with $Q_1 \in \mathcal{Q}$, $Q_2 \in \mathcal{Q}$ it follows that $Q_1 \stackrel{d}{=} Q_2$. In the case under consideration $y = (u, v)$, $G(x, y) = H((x - v)/u)$. Therefore, we can use well-known results on the identifiability of location-scale mixtures of distributions. Unfortunately, there are no examples of identifiable general two-parameter families of location-scale mixtures. The examples of such families are known only for the case where $Q$ is a two-dimensional discrete random vector taking a finite number of values. In this case, for example, the family of finite location-scale mixtures of normal laws is identifiable (Teicher, 1963). But identifiable one-parameter location or scale mixtures are well known (in our case they appear when either the random variable $U$ or the random variable $V$ is degenerate). For example, a location mixture of any infinitely divisible distribution (normal as a special case) is identifiable. As examples of identifiable scale mixtures we can mention mixtures generated by the normal kernel with zero mean or by the gamma distribution or by any degenerate (not in zero) distribution. Some necessary and sufficient conditions of identifiability of one-parameter families of mixtures can be found in, e.g. (Tallis, 1969). First, consider the conditions of convergence of the distributions of randomly indexed random sequences to identifiable location mixtures.
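For a discrete mixing vector $Q = (U, V)$ with finitely many values the mixture $\mathsf{E}H((x - V)/U)$ is a finite sum and is easy to evaluate. The sketch below is a hedged illustration, assuming the normal kernel $H = \Phi$ and two hypothetical mixing laws with the same variance of the mixture; it only shows that these two particular mixing laws give different mixtures and is not a proof of identifiability.

```python
import math

def phi_cdf(x):
    """Standard normal distribution function, used here as the kernel H."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mixture_cdf(x, atoms, kernel=phi_cdf):
    """E H((x - V) / U) for a discrete (U, V) given as (probability, u, v) triples, u > 0."""
    return sum(p * kernel((x - v) / u) for p, u, v in atoms)

# Two scale mixing laws (V = 0) giving mixtures with the same variance, equal to 5:
q1 = [(0.5, 1.0, 0.0), (0.5, 3.0, 0.0)]   # U is 1 or 3 with probability 1/2 each
q2 = [(1.0, math.sqrt(5.0), 0.0)]         # U is degenerate at sqrt(5)

for x in (0.0, 1.0, 2.0, 4.0):
    print(x, mixture_cdf(x, q1), mixture_cdf(x, q2))
```

Although both mixtures are symmetric and have equal variances, their distribution functions differ, for instance at $x = 1$, as one expects for an identifiable family of scale mixtures of zero-mean normal laws.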
As we have already noted, these mixtures appear as limits when $b_{N_k}/d_k \Rightarrow b$ as $k \to \infty$ for some $b > 0$. Without loss of generality, assume that $b = 1$.

THEOREM 3.5.4. Let $b_k \to \infty$, $d_k \to \infty$, $b_{N_k}/d_k \Rightarrow 1$ and $Y_k \Rightarrow Y$ as $k \to \infty$ and let, moreover, the family of location mixtures of the distribution function $H$ corresponding to the random variable $Y$ be identifiable. Then convergence (3.5.1) takes place with some real $c_k$, $k \ge 1$, if and only if there exists a random variable $V$ satisfying the conditions:
(1) $Z \stackrel{d}{=} Y + V$, where $Y$ and $V$ are independent;
(2) $(a_{N_k} - c_k)/d_k \Rightarrow V$ as $k \to \infty$.
This assertion reduces to Theorem 3.5.1 if we take into account that in the case under consideration the set $\mathcal{L}(Z \mid Y)$ contains at most one element since the family of location mixtures of the distribution function $H$ is identifiable. Here condition 1 means that the distribution function of the random variable $Z$ belongs to the family of location mixtures of the distribution function $H$. Now turn to the conditions of convergence to identifiable scale mixtures. As we have noted, they appear as limits when $(a_{N_k} - c_k)/d_k \Rightarrow b$ as $k \to \infty$ for some $b \in \mathbb{R}^1$. Without loss of generality we assume that $b = 0$.

THEOREM 3.5.5. Let $b_k \to \infty$, $d_k \to \infty$ and $Y_k \Rightarrow Y$ as $k \to \infty$. Let, moreover, the family of scale mixtures of the distribution function $H$ corresponding to the random variable $Y$ be identifiable. Let conditions (3.5.2) and (3.5.3) hold. Then convergence (3.5.1) takes place with some real $c_k$, $k \ge 1$, if and only if there exists a random variable $U$ satisfying the conditions
(1) $\mathsf{P}(U \ge 0) = 1$;
(2) $Z \stackrel{d}{=} YU$, where $Y$ and $U$ are independent;
(3) $b_{N_k}/d_k \Rightarrow U$ as $k \to \infty$.
This assertion reduces to Theorem 3.5.2, because the condition of identifiability of the family of scale mixtures of the distribution function $H$ means that the latter should be nondegenerate at zero and the set $\mathcal{V}(Z \mid Y)$ contains at most one element. Here condition 2 means that the distribution function of the random variable $Z$ belongs to the family of scale mixtures of the distribution function $H$.

Now we give some corollaries to Theorem 3.5.5.

THEOREM 3.5.6. Let $b_k \to \infty$, $d_k \to \infty$ and $Y_k \Rightarrow Y$ as $k \to \infty$. Let, moreover, the family of scale mixtures of the distribution function $H$ corresponding to the random variable $Y$ be identifiable. Assume that condition (3.5.2) holds. Convergence (3.5.12) takes place if and only if there exists a random variable $U$ satisfying conditions 1-3 of Theorem 3.5.5.

The proof of this assertion is a combination of those of Theorems 3.5.3 and 3.5.5.
To illustrate the potentialities of Theorem 3.5.6, we consider the conditions of convergence of extremal random sums of independent random variables. Let $X_1, X_2, \ldots$ be independent random variables with $\mathsf{E}X_j = 0$, $0 < \mathsf{D}X_j = \sigma_j^2 < \infty$, $j \ge 1$. For a positive integer $k$, set

$$B_k^2 = \sigma_1^2 + \ldots + \sigma_k^2, \quad S_k = X_1 + \ldots + X_k, \quad \underline{S}_k = \min\{S_1, \ldots, S_k\}, \quad \overline{S}_k = \max\{S_1, \ldots, S_k\}.$$
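It is well known that if the random variables $X_1, X_2, \ldots$ satisfy the Lindeberg condition: for any $\alpha > 0$

$$\lim_{k \to \infty} \frac{1}{B_k^2} \sum_{j=1}^{k} \int_{|x| > \alpha B_k} x^2 \, d\mathsf{P}(X_j < x) = 0, \qquad (3.5.13)$$

then, as $k \to \infty$,

$$\mathsf{P}\left(\frac{\overline{S}_k}{B_k} < x\right) \Rightarrow G(x), \quad \mathsf{P}\left(\frac{\underline{S}_k}{B_k} < x\right) \Rightarrow 1 - G(-x), \qquad (3.5.14)$$

where

$$G(x) = 2\Phi(\max\{0, x\}) - 1, \quad x \in \mathbb{R}^1,$$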
is the distribution function of the maximum of the standard Wiener process on the interval $[0, 1]$ (this is one of the manifestations of the invariance principle). The following statement can be regarded as a generalization of (3.5.14) to random sums.

THEOREM 3.5.7. Let $B_k^2 \to \infty$ as $k \to \infty$. Assume that the random variables $N_k$ satisfy condition (3.5.2), whereas the random variables $X_1, X_2, \ldots$ satisfy the Lindeberg condition (3.5.13) and are independent of $N_k$, $k \ge 1$. The existence of a nonnegative random variable $U$ such that

$$\frac{B_{N_k}}{B_k} \Rightarrow U, \quad k \to \infty,$$

is necessary and sufficient for the extremal random sums $\underline{S}_{N_k}$ and $\overline{S}_{N_k}$ to have limit distributions as $k \to \infty$:

$$\mathsf{P}\left(\frac{\overline{S}_{N_k}}{B_k} < x\right) \Rightarrow \overline{F}(x), \quad \mathsf{P}\left(\frac{\underline{S}_{N_k}}{B_k} < x\right) \Rightarrow \underline{F}(x).$$

Furthermore, $\overline{F}(x) = \mathsf{E}G(x/U)$, $\underline{F}(x) = 1 - \mathsf{E}G(-x/U)$, $x \in \mathbb{R}^1$.

PROOF. Consider $\overline{S}_{N_k}$. The assertion concerning the asymptotic behavior of $\underline{S}_{N_k}$ is proved in a similar way. First we observe that the family of scale mixtures of the distribution function $G(x)$ is identifiable. Indeed, if this were not so, then there would exist a distribution function $F(x)$ all of whose growth points are concentrated on the nonnegative semiaxis and two random variables $U_1$ and $U_2$ with different distributions such that

$$F(x) = 2\mathsf{E}\Phi(x/U_1) - 1 = 2\mathsf{E}\Phi(x/U_2) - 1, \quad x \ge 0.$$
But then the distribution function $W(x) = \frac{1}{2}(\operatorname{sign}(x)F(|x|) + 1)$ would be representable both as $W(x) = \mathsf{E}\Phi(x/U_1)$ and as $W(x) = \mathsf{E}\Phi(x/U_2)$, which is impossible because the family of scale mixtures of normal distribution functions with zero means is identifiable (see, e.g., (Teicher, 1961)). Now it remains to refer to Theorem 3.5.6. □

COROLLARY 3.5.1. Under the hypotheses of Theorem 3.5.7,

$$\mathsf{P}\left(\frac{\overline{S}_{N_k}}{B_k} < x\right) \Rightarrow G(x), \quad \mathsf{P}\left(\frac{\underline{S}_{N_k}}{B_k} < x\right) \Rightarrow 1 - G(-x)$$

if and only if

$$\frac{B_{N_k}}{B_k} \Rightarrow 1, \quad k \to \infty.$$

This assertion directly follows from Theorem 3.5.7 because the scale mixtures of the distribution functions $G(x)$ and $1 - G(-x)$ are identifiable.
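The limit law in Theorem 3.5.7 can be illustrated by a small Monte Carlo experiment. The sketch below is only an illustration under assumptions that are not part of the text: the summands are standard normal (so that $B_k = \sqrt{k}$), the random index is the hypothetical two-point variable $N_k \in \{k, 4k\}$ with equal probabilities (so that $B_{N_k}/B_k$ takes the values 1 and 2), and this index is assumed to meet condition (3.5.2), whose exact form is not restated here. The sample sizes and the seed are arbitrary.

```python
import math
import random

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def G(x):
    """Distribution function of the maximum of the standard Wiener process on [0, 1]."""
    return 2.0 * Phi(max(0.0, x)) - 1.0

def max_random_sum(k, rng):
    """One draw of max(S_1, ..., S_{N_k}) / B_k for the assumed two-point index."""
    n = k if rng.random() < 0.5 else 4 * k   # hypothetical index: N_k = k or 4k
    s = 0.0
    running_max = -math.inf
    for _ in range(n):
        s += rng.gauss(0.0, 1.0)             # assumed summands: standard normal
        running_max = max(running_max, s)
    return running_max / math.sqrt(k)        # B_k = sqrt(k) for unit variances

def limit_df(x):
    """E G(x / U) for U equal to 1 or 2 with probability 1/2 each."""
    return 0.5 * G(x) + 0.5 * G(x / 2.0)

rng = random.Random(12345)
k, n_paths, x = 200, 4000, 1.0
draws = [max_random_sum(k, rng) for _ in range(n_paths)]
empirical = sum(d < x for d in draws) / n_paths
print(f"empirical P(max / B_k < {x}) = {empirical:.3f}, E G(x/U) = {limit_df(x):.3f}")
```

For finite $k$ the empirical value is expected to lie slightly above the limit, because the maximum of a discretely observed walk is smaller than that of the limiting Wiener process.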
3.6. Convergence of distributions of random sums. The central limit theorem and the law of large numbers for random sums

In this section we will consider random sequences of a special form. Namely, let $X_1, X_2, \ldots$ be independent random variables. For an integer $k \ge 1$, set

$$S_k = X_1 + \ldots + X_k.$$

Let $N_k$, $k \ge 1$, be integer-valued nonnegative random variables such that $N_k$ and $X_1, X_2, \ldots$ are independent for each $k \ge 1$. The main objects of our interest in this section are the random sums $S_{N_k}$. Recall that in the preceding sections to every random variable $Z$ we put in correspondence the set $\mathcal{V}(Z)$ which contains all triples of random variables $(Y, U, V)$ providing the representation

$$Z \stackrel{d}{=} YU + V$$

in which the random variable $Y$ and the pair $(U, V)$ are independent. Let, as before, $L_1$ and $L_2$ be metrics in the spaces of one- and two-dimensional random variables (or, which is the same in this case, of their distributions), respectively, which metrize weak convergence.

THEOREM 3.6.1. Assume that

$$N_k \xrightarrow{\mathsf{P}} \infty, \quad k \to \infty, \qquad (3.6.1)$$
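and the sequences $\{a_k\}$ and $\{b_k\}$, $b_k > 0$, $b_k \to \infty$, $k \to \infty$, provide the weak compactness of the family $\{(S_k - a_k)/b_k\}_{k \ge 1}$. The convergence

$$\frac{S_{N_k} - c_k}{d_k} \Rightarrow (\text{some}) \ Z, \quad k \to \infty, \qquad (3.6.2)$$

takes place with some positive $\{d_k\}$, $d_k \to \infty$, $k \to \infty$, and real $\{c_k\}$ if and only if there exists a weakly compact sequence of triples of random variables $\{(Y'_k, U'_k, V'_k)\}_{k \ge 1}$ such that $(Y'_k, U'_k, V'_k) \in \mathcal{V}(Z)$ for each $k \ge 1$ and

$$L_1\left(\frac{1}{b_k}\Bigl(\sum_{j=1}^{k} X_j - a_k\Bigr), \ Y'_k\right) \to 0, \quad k \to \infty, \qquad (3.6.3)$$

$$L_2\left(\Bigl(\frac{b_{N_k}}{d_k}, \frac{a_{N_k} - c_k}{d_k}\Bigr), \ (U'_k, V'_k)\right) \to 0, \quad k \to \infty. \qquad (3.6.4)$$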
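PROOF. Comparison of the formulation of this theorem with that of Theorem 3.4.1 leads us to the conclusion that it suffices only to make sure that conditions (3.6.1) and (3.6.2) imply the weak compactness of the family of random variables $\{b_{N_k}/d_k\}_{k \ge 1}$. Let $X'_j$ be a random variable independent of $X_j$ and such that $X'_j \stackrel{d}{=} X_j$. Denote $X_j^{(s)} = X_j - X'_j$ ($X_j^{(s)}$ is the symmetrization of $X_j$). Let $q \in (0, 1)$. The greatest lower bound of $q$-quantiles of the random variable $N_k$ will be denoted $l_k(q)$. Assuming that the random variables $N_k, X_1, X_2, \ldots, X'_1, X'_2, \ldots$ are independent for each $k \ge 1$, we introduce the random variable

$$T_k = \frac{1}{d_k} \sum_{j=1}^{N_k} X_j^{(s)}.$$

Using the symmetrization inequality (see, e.g., (Loève, 1977))

$$\mathsf{P}(|X^{(s)}| \ge x) \le 2\mathsf{P}(|X - a| \ge x/2),$$

valid for any $a \in \mathbb{R}^1$ and $x > 0$, we obtain

$$\mathsf{P}(|T_k| > x) = \sum_{n=1}^{\infty} \mathsf{P}(N_k = n)\, \mathsf{P}\left(\left|\frac{1}{d_k}\sum_{j=1}^{n} X_j^{(s)}\right| > x\right) \le 2\sum_{n=1}^{\infty} \mathsf{P}(N_k = n)\, \mathsf{P}\left(\left|\frac{S_n - c_k}{d_k}\right| > \frac{x}{2}\right) = 2\mathsf{P}\left(\left|\frac{S_{N_k} - c_k}{d_k}\right| > \frac{x}{2}\right)$$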
for any $x > 0$. Hence it follows that

$$\lim_{x \to \infty} \sup_k \mathsf{P}(|T_k| > x) \le 2 \lim_{x \to \infty} \sup_k \mathsf{P}\left(\left|\frac{S_{N_k} - c_k}{d_k}\right| > \frac{x}{2}\right) = 0$$

by virtue of condition (3.6.2). By virtue of Theorem 1.4.9 (ii), this means that the sequence $\{T_k\}_{k \ge 1}$ is weakly compact. Check that

$$C(q) = \sup_k \frac{b_{l_k(q)}}{d_k} < \infty \qquad (3.6.5)$$

for each $q \in (0, 1)$. Using the Lévy inequality […]

[…] is weakly compact at infinity;

(ii) $Z \stackrel{d}{=} Y(t)U(t) + V(t)$ for each $t > 0$; moreover, $Y(t)$ and the pair $(U(t), V(t))$ are independent;
(iv) $L_2\left(\Bigl(\dfrac{B(M(t))}{D(t)}, \dfrac{A(M(t)) - C(t)}{D(t)}\Bigr), \ (U(t), V(t))\right) \to 0$ as $t \to \infty$.
PROOF. Let $\{t_k\}_{k \ge 1}$ be an arbitrary infinitely increasing sequence. For $t$ running along this sequence, Theorem 3.7.1 holds by virtue of Theorem 3.4.1 with $S_k = X(t_k)$, $N_k = M(t_k)$, $a_k = A(t_k)$, $b_k = B(t_k)$, $c_k = C(t_k)$, $d_k = D(t_k)$. But the sequence is arbitrary and the limit random variable $Z$ does not depend on $t$. The theorem is thus proved. □

Theorem 3.7.1 sharpens and extends the famous Dobrushin's lemma (Dobrushin, 1955).

REMARK 3.7.1. In the formulation of Theorem 3.7.1, generally speaking, the condition that the family of triples $\{(Y(t), U(t), V(t))\}_{t \ge 0}$ should be weakly compact at infinity is superfluous, since it holds automatically. Indeed, to see this, no changes are needed in the 'only if' part of the proof. Some minor changes should be made in the 'if' part. Namely, the weak compactness at infinity of the families $\{Y(t)\}_{t \ge 0}$ and $\{U(t)\}_{t \ge 0}$ directly follows from the weak compactness at infinity of the families $\{(X(t) - A(t))/B(t)\}_{t \ge 0}$ and $\{B(M(t))/D(t)\}_{t \ge 0}$ and conditions 2 and 3. Thus, it remains to prove the weak compactness at infinity of the family $\{V(t)\}_{t \ge 0}$. But by the inequality

$$\mathsf{P}(|Y_1 + Y_2| > R) \le \mathsf{P}(|Y_1| > R/2) + \mathsf{P}(|Y_2| > R/2), \qquad (3.7.1)$$

which is valid for any random variables $Y_1$ and $Y_2$ and any $R > 0$, by virtue of condition 1 for an arbitrary $R > 0$ we see that (writing $W(t)$ for the first component $Y(t)$ of the triple)

$$\mathsf{P}(|V(t)| > R) = \mathsf{P}(|W(t)U(t) + V(t) - W(t)U(t)| > R) \le \mathsf{P}(|W(t)U(t) + V(t)| > R/2) + \mathsf{P}(|W(t)U(t)| > R/2) = \mathsf{P}(|Z| > R/2) + \mathsf{P}(|W(t)U(t)| > R/2). \qquad (3.7.2)$$

The first summand on the right-hand side of the latter inequality does not depend on $t$. The family of products $\{W(t)U(t)\}_{t \ge 0}$ of factors that are weakly compact at infinity is itself weakly compact at infinity. Indeed, let $\{t_k\}_{k \ge 1}$ be an arbitrary infinitely increasing sequence of nonnegative numbers. Show that for any $\varepsilon > 0$ there exists $R_0 = R_0(\varepsilon)$ such that

$$\sup_k \mathsf{P}(|W(t_k)U(t_k)| > R) < \varepsilon$$

for all $R > R_0$. For arbitrary $M > 0$ and $R > 0$,

$$\mathsf{P}(|W(t_k)U(t_k)| > R) = \mathsf{P}(|W(t_k)U(t_k)| > R;\ U(t_k) \le M) + \mathsf{P}(|W(t_k)U(t_k)| > R;\ U(t_k) > M) \le \mathsf{P}(U(t_k) > M) + \mathsf{P}(|W(t_k)| > R/M). \qquad (3.7.3)$$
Therefore, by virtue of the weak compactness at infinity of the families $\{W(t)\}_{t \ge 0}$ and $\{U(t)\}_{t \ge 0}$ we can choose $M = M(\varepsilon)$ so large that $\mathsf{P}(U(t_k) > M(\varepsilon)) < \varepsilon/2$ and then choose $R_0 = R_0(\varepsilon)$ so large that $\mathsf{P}(|W(t_k)| > R/M(\varepsilon)) < \varepsilon/2$ for all $R > R_0(\varepsilon)$. The numbers $M(\varepsilon)$ and $R_0(\varepsilon)$ chosen in this way guarantee, by (3.7.3), the required bound. Hence, by choosing $R$ in (3.7.2) large enough we can make the right-hand side arbitrarily small uniformly over any infinitely increasing sequence, which means that the family $\{V(t)\}_{t \ge 0}$ is weakly compact at infinity.
3.8. The transfer theorem for random sums of independent identically distributed random variables in the double array limit scheme

To have an opportunity to trace the changes in the distributions of random sums in the situation where the distributions of summands may change as the distribution of the random index changes, we will consider the simplest version of the transfer theorem for random sums of independent identically distributed random variables in the double array limit scheme.

Let $\{X_{n,j}\}_{j \ge 1}$, $n = 1, 2, \ldots$, be a double array of row-wise independent and identically distributed random variables. Let $N_n$, $n = 1, 2, \ldots$, be positive integer-valued random variables such that for each $n$ the random variable $N_n$ is independent of the sequence $\{X_{n,j}\}_{j \ge 1}$. For natural $k$ denote $S_{n,k} = X_{n,1} + \ldots + X_{n,k}$. For definiteness we will assume that all the distribution functions considered below are right-continuous.

THEOREM 3.8.1. Assume that there exist an infinitely increasing sequence of positive integers $\{m_n\}_{n \ge 1}$ and distribution functions $H(x)$ and $A(x)$ such that

$$\mathsf{P}(S_{n,m_n} < x) \Rightarrow H(x), \quad n \to \infty, \qquad (3.8.1)$$

$$\mathsf{P}(N_n < m_n x) \Rightarrow A(x), \quad n \to \infty. \qquad (3.8.2)$$

Then

$$\mathsf{P}(S_{n,N_n} < x) \Rightarrow [\ldots], \quad n \to \infty, \qquad (3.8.3)$$

[…] 0 the random variables $N_\lambda, X_1, X_2, \ldots$ are independent. As usual, by $S_\lambda$ we denote the Poisson random sum $S_\lambda = X_1 + \ldots + X_{N_\lambda}$. Assume that there exist $\mathsf{E}X_1 = a$ and $\mathsf{D}X_1 = \sigma^2 > 0$. For integer $k \ge 0$ denote $\mathsf{E}X_1^k = \alpha_k$. Of course, $\alpha_0 = 1$, $\alpha_1 = a$ and $\alpha_2 = \sigma^2 + a^2$. The characteristic functions of the random variables $X_1$ and $S_\lambda$ will be denoted $f(t)$ and $h_\lambda(t)$, respectively. It is well known that if $f(t)$ is $r$ times continuously differentiable, then, as $t \to 0$,

$$f(t) = 1 + iat - \frac{\alpha_2 t^2}{2} + (it)^2 \sum_{k=1}^{r-2} \frac{\alpha_{k+2} (it)^k}{(k+2)!} + o(|t|^r). \qquad (4.4.1)$$
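Expansion (4.4.1) can be checked numerically on a distribution whose moments are known in closed form. The sketch below assumes, purely for illustration, that $X_1$ has the exponential distribution with unit mean, for which $\alpha_k = k!$ and $f(t) = 1/(1 - it)$; the choice $r = 4$ and the grid of $t$ values are arbitrary. The ratio of the remainder to $|t|^r$ should decrease as $t \to 0$.

```python
import math

def f_exact(t):
    """Characteristic function of the exponential law with unit mean (assumed example)."""
    return 1.0 / complex(1.0, -t)

def f_expansion(t, r, alpha):
    """Right-hand side of (4.4.1) without the o(|t|^r) term."""
    it = complex(0.0, t)
    head = 1.0 + alpha[1] * it - alpha[2] * t * t / 2.0          # 1 + i a t - alpha_2 t^2 / 2
    tail = sum(alpha[k + 2] * it ** k / math.factorial(k + 2)    # sum over k = 1, ..., r - 2
               for k in range(1, r - 1))
    return head + it ** 2 * tail

r = 4
alpha = [float(math.factorial(k)) for k in range(r + 1)]        # alpha_k = k! for this law
ts = (0.4, 0.2, 0.1)
ratios = [abs(f_exact(t) - f_expansion(t, r, alpha)) / t ** r for t in ts]
for t, ratio in zip(ts, ratios):
    print(f"t = {t}: |remainder| / t^r = {ratio:.5f}")
```

For this particular law the remainder equals $(it)^{r+1}/(1 - it)$ exactly, so the printed ratio is $|t|/\sqrt{1 + t^2}$ up to rounding.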
Recall the following definitions (see Sections 1.1 and 1.8). We say that a random variable $X$ has a lattice distribution if all the numbers $x_n$ such that $\sum_{n \ge 1} \mathsf{P}(X = x_n) = 1$ belong to a set of the form $\{b + nh,\ n = 0, \pm 1, \pm 2, \ldots\}$ for some $b \in \mathbb{R}^1$ and $h > 0$. Here $h$ is called the span of the distribution.

It is well known that the distribution of a random variable $X$ is lattice if and only if there exists $t_0 \ne 0$ such that

$$|\mathsf{E}\exp\{it_0 X\}| = 1. \qquad (4.4.2)$$

Moreover, if (4.4.2) holds for some $t_0 \ne 0$, then as the span of the distribution of $X$ we can take $h = 2\pi/t_0$ (see Chapter 1). We say that a random variable $X_1$ satisfies the Cramér condition (C) if

$$\limsup_{|t| \to \infty} |f(t)| < 1. \qquad (4.4.3)$$
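The lattice criterion (4.4.2) is easy to observe numerically. In the sketch below both distributions are hypothetical examples chosen for illustration: the first is concentrated on the lattice $\{b + nh\}$ with $b = 0.5$ and $h = 2$, the second on the points $0$, $1$, $\sqrt{2}$, which do not fit into any lattice. The modulus of the characteristic function at $t_0 = 2\pi/h$ is computed for the first one, and the largest modulus over a finite grid of nonzero $t$ for the second.

```python
import cmath
import math

def char_fn(values, probs, t):
    """Characteristic function E exp{itX} of a discrete random variable."""
    return sum(p * cmath.exp(1j * t * x) for x, p in zip(values, probs))

# Hypothetical lattice distribution on {b + n h}.
b, h = 0.5, 2.0
lattice_values = [b + n * h for n in range(-1, 3)]   # -1.5, 0.5, 2.5, 4.5
lattice_probs = [0.1, 0.4, 0.3, 0.2]
t0 = 2.0 * math.pi / h
lattice_modulus = abs(char_fn(lattice_values, lattice_probs, t0))

# Hypothetical non-lattice distribution: 1 and sqrt(2) are incommensurable.
nonlattice_values = [0.0, 1.0, math.sqrt(2.0)]
nonlattice_probs = [0.3, 0.3, 0.4]
grid = [0.01 * j for j in range(1, 5001)]            # t in (0, 50]
max_modulus = max(abs(char_fn(nonlattice_values, nonlattice_probs, t)) for t in grid)

print(f"lattice case:     |f(t0)| = {lattice_modulus:.12f}")
print(f"non-lattice case: max |f(t)| on the grid = {max_modulus:.6f}")
```

A finite grid can of course only suggest, not prove, that the modulus never reaches one in the non-lattice case.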
As usual, the standard normal distribution function and its density are denoted $\Phi(x)$ and $\varphi(x)$, respectively. For $k = 0, 1, 2, \ldots$, define the function $H_k(x)\colon \mathbb{R}^1 \to \mathbb{R}^1$ as

$$H_k(x) = (-1)^k \frac{\varphi^{(k)}(x)}{\varphi(x)}.$$

The function $H_k(x)$, $x \in \mathbb{R}^1$, defined in this way is a polynomial of degree $k$ and is called the Hermite polynomial of degree $k$. It is easy to see that

$$H_0(x) = 1, \quad H_1(x) = x, \quad H_2(x) = x^2 - 1, \quad H_3(x) = x^3 - 3x,$$

$$H_4(x) = x^4 - 6x^2 + 3, \quad H_5(x) = x^5 - 10x^3 + 15x, \quad H_6(x) = x^6 - 15x^4 + 45x^2 - 15.$$

Let $m$ be a nonnegative integer and $q_k \in \mathbb{R}^1$, $k = 0, \ldots, m$. Consider the polynomial

$$q(x) = \sum_{k=0}^{m} q_k x^k.$$

Let $H_0(x), \ldots, H_m(x)$ be Hermite polynomials. Set

$$Q(x) = \sum_{k=0}^{m} q_k H_k(x).$$

Then it is easy to see that the function $\psi(t) = q(it)\exp\{-t^2/2\}$ is the Fourier transform of the function $\Psi(x) = Q(x)\varphi(x)$.

[…] $r \ge 3$ is a fixed integer. For a complex $z$, set

$$\tilde{f}(z) = \sum_{k=1}^{r-2} \frac{\alpha_{k+2}\, z^k}{(k+2)!}.$$

Obviously, $\tilde{f}(z)$ is a polynomial of degree $\le r - 2$ with real-valued coefficients; moreover, $\tilde{f}(0) = 0$. From (4.4.1) it follows that, as $t \to 0$,

$$f(t) - 1 - iat + \frac{\alpha_2 t^2}{2} = (it)^2 \tilde{f}(it) + o(|t|^r).$$
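The list of Hermite polynomials above and the Fourier-transform relation between $q(it)\exp\{-t^2/2\}$ and $Q(x)\varphi(x)$ can both be verified with a few lines of code. The sketch below builds the coefficients from the standard three-term recurrence $H_{k+1}(x) = xH_k(x) - kH_{k-1}(x)$, which is a known property of these polynomials rather than something stated in the text, and approximates the Fourier integral by the trapezoidal rule. The coefficients `q`, the point `t`, the integration range and the step are arbitrary illustrative choices.

```python
import cmath
import math

def hermite_coeffs(k_max):
    """Coefficient lists (lowest degree first) of H_0, ..., H_{k_max},
    built from the recurrence H_{k+1}(x) = x H_k(x) - k H_{k-1}(x)."""
    polys = [[1], [0, 1]]
    for k in range(1, k_max):
        shifted = [0] + polys[k]                                   # x * H_k(x)
        prev = polys[k - 1] + [0] * (len(shifted) - len(polys[k - 1]))
        polys.append([s - k * p for s, p in zip(shifted, prev)])
    return polys[: k_max + 1]

def poly_eval(coeffs, x):
    """Horner evaluation; x may be real or complex."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def fourier_of_Q_phi(q, t, half_width=12.0, step=0.01):
    """Trapezoidal approximation of the integral of exp(itx) Q(x) phi(x) dx."""
    herm = hermite_coeffs(len(q) - 1)
    n = int(round(2 * half_width / step))
    total = 0j
    for j in range(n + 1):
        x = -half_width + j * step
        Q = sum(qk * poly_eval(Hk, x) for qk, Hk in zip(q, herm))
        phi = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * cmath.exp(1j * t * x) * Q * phi
    return total * step

q = [1.0, -2.0, 0.0, 0.5]          # illustrative coefficients q_0, ..., q_3
t = 0.7
lhs = fourier_of_Q_phi(q, t)
rhs = poly_eval(q, 1j * t) * math.exp(-t * t / 2.0)
print("H_6 coefficients:", hermite_coeffs(6)[6])
print("numerical transform:", lhs, " q(it) exp(-t^2/2):", rhs)
```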
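For $\lambda > 0$ and a complex $z$, set

$$p_\lambda(z) = \sum_{k=1}^{r-2} \frac{1}{k!} \left[ \frac{z^2}{\alpha_2}\, \tilde{f}\!\left( \frac{z}{\sqrt{\lambda \alpha_2}} \right) \right]^k. \qquad (4.4.4)$$

It is easy to verify that there exist an integer $m \ge 3$ and polynomials $q_k(z)$ with real-valued coefficients, $k = 3, \ldots, m$, not depending on $\lambda$ such that

$$p_\lambda(z) = \sum_{k=3}^{m} q_k(z)\, \lambda^{-k/2 + 1} \qquad (4.4.5)$$

for all $\lambda > 0$ and complex $z$. Moreover, the polynomials $q_k(z)$ are uniquely determined by (4.4.4) and (4.4.5). Let

$$q_k(z) = \sum_{j=3}^{L_k} q_{k,j} z^j \qquad (4.4.6)$$

be the corresponding representation of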