MODERN PROBABILITY AND STATISTICS
Modern Theory of Summation of Random Variables
Vladimir M. Zolotarev
Steklov Mathematical Institute
Moscow, Russia
VSP, Utrecht, The Netherlands, 1997
VSP BV, P.O. Box 346, 3700 AH Zeist, The Netherlands
© VSP BV 1997
First published in 1997
ISBN 90-6764-270-3
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
Printed in The Netherlands by Ridderprint bv, Ridderkerk.
Contents

Foreword
Preface

1 Introduction to the theory of probability metrics
  1.1 Probability metrics
  1.2 Minimal and protominimal metrics
  1.3 Special minimal and protominimal metrics
  1.4 Ideal metrics
  1.5 Relationships between metrics
  1.6 Linearly invariant metrics

2 The nature of limit theorems
  2.1 The history of limit theorems
  2.2 A generalization of the Lindeberg-Feller theorem
  2.3 Natural estimates of convergence rate in the central limit theorem
  2.4 Limit theorems as stability theorems
  2.5 One more mechanism forming the central limit theorem

3 The normalization of random sequences
  3.1 General normalization problems
  3.2 The choice of constants under linear normalization

4 Centers and scatters of random variables
  4.1 The definition and properties of index, centers and scatters
  4.2 An analog of the Chebyshev inequality
  4.3 Centers and scatters in the criteria of L-compactness
  4.4 Centers and scatters in the criteria of L-convergence of random sequences

5 Classical theory of limit theorems for sums of independent random variables
  5.1 Reduced models
  5.2 Infinitely divisible laws
  5.3 Limit theorems of the classical theory
  5.4 The estimates of accuracy of approximations in limit theorems of the classical theory
  5.5 On completeness of classical theory

6 Generalizations of the classical theory of summation of independent random variables
  6.1 The history of the problem
  6.2 Stability of compositions of distributions
  6.3 Limit theorems in a non-classical setting
  6.4 Limit theorems in the generalized summation scheme
  6.5 The method of metric distances
  6.6 Sums of non-independent random variables

Bibliography
Index
Foreword

This book opens a new series of monographs, which we have entitled 'Modern Probability and Statistics'. There are many similar projects at other publishing houses, so we should explain why we decided to launch one more.

The Russian school of probability theory and mathematical statistics has made a universally recognized contribution to these sciences. Its potential is not only far from exhausted but is still growing. During the last decade many remarkable results, methods and theories have appeared which undoubtedly deserve to be presented in monographs, so that they become widely known to specialists in probability theory, mathematical statistics and their applications. However, because of the recent political changes in Russia and the economic instability that followed, it is for the time being rather difficult to organize the publication of a scientific book in Russia. As a result, a considerable stock of knowledge accumulated during recent years remains scattered over various scientific journals. To improve this situation, we present this series of monographs together with VSP International Science Publishers and, first of all, its director, Dr. Jan Reijer Groesbeek, who readily took up the idea.

To ensure highly qualified international refereeing of the proposed books, we invited well-known specialists to join the Editorial Board. All of them kindly agreed, so the Editorial Board of the series is now as follows:

A. Balkema (University of Amsterdam, the Netherlands)
W. Hazod (University of Dortmund, Germany)
V. Kalashnikov (Moscow Institute for Systems Research, Russia)
V. Korolev (Moscow State University, Russia), Editor-in-Chief
V. Kruglov (Moscow State University, Russia)
J. D. Mason (University of Utah, Salt Lake City, USA)
E. Omey (EHSAL, Brussels, Belgium)
K. Sato (Nagoya University, Japan)
M. Yamazato (University of Ryukyu, Japan)
V. Zolotarev (Steklov Mathematical Institute, Moscow, Russia), Editor-in-Chief
The scope of the series can be seen from both the title of the series and the titles of the forthcoming books:

• V. V. Senatov, Normal Approximation: New Results, New Methods, New Problems;
• Yu. S. Khokhlov, Generalizations of Stable Distributions: Structure and Limit Theorems;
• V. E. Bening, Asymptotic Theory of Testing Statistical Hypotheses: Efficient Statistics, Optimality, Deficiency;
• N. G. Ushakov, Selected Topics in Characteristic Functions;
• V. V. Uchaikin and V. M. Zolotarev, Chance and Stability;
• G. L. Shevlyakov and N. O. Vilchevskii, Robust Estimation: Criteria and Methods.

Among the proposals under discussion are the following books:

• V. Yu. Korolev and V. M. Kruglov, Random Sequences with Random Indices;
• E. M. Kudlaev, Decomposable Statistics;
• G. P. Chistyakov, Analytical Methods in the Problem of Stability of Decompositions of Random Variables;
• A. N. Chuprunov, Random Processes Observed at Random Times;
• D. H. Mushtari, Probabilities and Topologies on Linear Spaces;
• E. Danielyan and V. G. Ushakov, Priority Queueing Systems;
• E. V. Morozov, General Queueing Networks: the Method of Regenerative Decomposition;
• V. E. Bening, V. Yu. Korolev, and S. Ya. Shorgin, Compound Doubly Stochastic Poisson Processes and Their Applications in Insurance and Finance,

as well as many others.

Of course, the choice of authors primarily from Russia is due only to the reasons mentioned above and by no means signifies that we prefer to follow some national policy. We invite authors from all countries to contribute their books to this series.

V. Yu. Korolev, V. M. Zolotarev, Editors-in-Chief
Moscow, July 1997.
Preface

The largest part of modern probability theory, and possibly the most important one for applications, consists of the so-called limit theorems. In probability theory, formulae which are both precise and suitable for computation are the exception rather than the rule. This creates a demand for approximations of the probability distributions and related characteristics in which we are interested. These approximations should, on the one hand, be convenient for numerical calculations and, on the other hand, make it possible to guarantee the required accuracy. Providing approximations that satisfy these requirements is the main objective of the limit theorems of probability theory. A very significant place among them is occupied by the limit theorems for sequences of distributions of sums of independent random variables whose values can be taken from a space of a rather general nature (real numbers, vectors, functions, etc.).

For several decades the scheme of summation of independent random variables and the related limit theorems played the central role in probability theory. Although the focus of research has since shifted considerably, various stochastic effects in this scheme still intrigue many scientists.

This long and deep interest in the scheme of summation of independent random variables can easily be explained. Its birth in the first half of the XIX century is mainly due to the creation and development of the theory of measurement errors. It has long been known that in experimental measurements a naturalist faces a whole spectrum of problems caused by the imperfection of the equipment, extraneous 'noise' and the limited capabilities of human perception itself. Even Ulugh Beg and Galileo attached special significance to eliminating measurement errors.
The demands of theoretical and applied astronomy in the XVIII century, related to the development of seafaring, made the problem of eliminating measurement errors very topical. It is clear that the nature of the problem itself calls for the application of probability theory. The errors of experimental measurements of a variable $v$ are made up of many factors which, to a first approximation, can be considered independent. Let $X_1, X_2, \dots, X_n$ be the active factors, and let $f$ be the law of formation of the error $\delta$ in measuring the variable $v$, i.e.,

$$\delta = f(X_1, X_2, \dots, X_n).$$
Knowing almost nothing about the factors $X_j$ except that they are small in absolute value and act independently of each other, we naturally consider them as independent random variables that are small in probability. If we additionally assume that the function $f$ is smooth in all of its variables, then, using one, two, or more terms of the Taylor expansion, we can rewrite $\delta$ in the form

$$\delta = f_0 + \sum_{i} f_i X_i + \sum_{i,j} f_{ij} X_i X_j + \dots$$

As we see, the first linear approximation of $\delta$ is a sum of independent random variables

$$f_1 X_1 + f_2 X_2 + \dots + f_n X_n.$$
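The linearization argument can be checked numerically. The sketch below is only an illustration: the function `f`, its three arguments and the sampling ranges are hypothetical choices that do not come from the text. It compares a smooth error law with the linear part of its Taylor expansion at the origin (here $f_0 = 0$) and reports the worst discrepancy over randomly drawn small factors.

```python
import math
import random


def f(x1, x2, x3):
    # Hypothetical smooth "law of formation" of the error (an assumption).
    return math.sin(x1) + x2 * math.exp(x3) + 0.5 * x1 * x3


def linear_part(x1, x2, x3):
    # First-order Taylor terms of f at the origin:
    # f(0,0,0) = 0, df/dx1 = 1, df/dx2 = 1, df/dx3 = 0.
    return 1.0 * x1 + 1.0 * x2 + 0.0 * x3


def max_linearization_error(scale, trials=10_000, seed=0):
    """Largest |f(X) - linear_part(X)| over independent factors in [-scale, scale]."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        xs = [rng.uniform(-scale, scale) for _ in range(3)]
        worst = max(worst, abs(f(*xs) - linear_part(*xs)))
    return worst


if __name__ == "__main__":
    for scale in (0.1, 0.01):
        print(scale, max_linearization_error(scale))
```

For this particular `f` the discrepancy is of second order in the size of the factors, which is why the sum of independent terms dominates when the factors are small.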
The second factor in favor of the summation scheme was the appearance of a number of applied and theoretical problems which are, to some extent, linear.

The first limit theorem in the history of probability theory is due to the outstanding Swiss mathematician Jacob Bernoulli (1654-1705); it is contained in the fourth part of his book 'Ars Conjectandi'. Unfortunately, this book was not completed, and it was published in Basel only eight years after the death of the author.¹ The Bernoulli theorem, also referred to as the law of large numbers, is interesting not only from the historical viewpoint. The law reflected in this theorem turned out, in its further development, to be one of the most important principles of the universe. Having initiated research on the theory of limit theorems, the law of large numbers remains its most important part.

But the importance of Bernoulli's book is not exhausted by this discovery. Although a very significant stage of the evolution of probability theory in the first half of the XVIII century was motivated by the routine accumulation of knowledge (in particular, works due to Huygens were published in 1657, and the correspondence between Pascal and Fermat in 1679), the Bernoulli book plays a distinguished role. The algebraic notation used and a clearer understanding of probability laid the foundation for the further rapid evolution of probability theory. In the period discussed, several books devoted to the theory and practice of the analysis of random events appeared, due to Montmort² (1708), de Moivre (1718), Simpson (1740), and Bayes³ (1763).

¹ Giving credit to Bernoulli for his discovery as an event of great scientific value, the Russian Academy of Sciences devoted a special session, held on December 1, 1913, to the 200th anniversary of the law of large numbers. Academician A. A. Markov and Professor A. A. Chuprov presented talks on the importance of the law of large numbers, which were then published in (Chuprov and Markov, 1977).

² The first edition of Montmort's book was printed before Bernoulli's book, but Montmort was informed of its contents and, in the preface to his own book, referred to Bernoulli's work as 'magnificent'.

³ The scientific activity of Bayes finished about ten years before his death in 1761.
Nowadays, when probability theory has become a rigorous mathematical discipline and we can objectively estimate the practical value and prospects of this science, the stimulating part played by Bernoulli's book is very evident. Using rather conventional wording, we can consider probability theory as the Earth as it was imagined by the ancient scientists, supported by three whales. The first 'whale' is the notion of independence of random events. Precisely this notion separated probability theory from set theory, measure theory and functional analysis into an autonomous branch and outlined the domain of its applications. The second foundation is the formula of total probability, which lies at the base of the whole variety of direct probabilistic methods. Finally, the third 'whale' is the law of large numbers, which established a connection between theory and practice and filled the skin of mathematical models with the particular numerical wine of the corresponding actual processes. If we removed any 'whale', the complex building of probability theory would surely collapse.

The last 'whale', the law of large numbers in its simplest formulation, is the child of Bernoulli. The first one, in a fuzzy formulation, entered probabilistic practice as early as the XVII century, and the second was undoubtedly in the mathematical atmosphere of that epoch, since de Moivre was acquainted with it. Thus, the initial formation of probability theory as a scientific discipline, which was essentially complete by the middle of the XVIII century, was stimulated by Bernoulli's book, and hence we refer to the date of its publication, 1713, as the birthday of probability theory. The description of the further development of the theory of limit theorems can be found in Section 2.1.

First of all it should be noted (and I must apologize to the readers for this) that the title of the book does not exactly correspond to its contents.
In our opinion, a book with such a title should contain considerably more material than we managed to squeeze into the scheduled volume. Now that we have an opportunity to survey all the concepts and intentions of the classical theory of limit theorems, as well as the results accumulated during more than two hundred years of its development, it is quite natural and very timely to propose a project aiming at the publication of a multi-volume encyclopedia on various aspects of this remarkable theory. Possibly, the series of monographs 'Modern Probability and Statistics' launched by VSP will somehow promote the practical realization of this large and labor-consuming project.

The present book is the first book in this series. According to what has been said above, it should have been entitled, for example, in the following way: 'Reflections on the Theme of New Directions and Methods Which Appeared in the Theory of Summation of Random Variables During the Last Decades as the Continuation of Traditions of the Classical Theory, Some of Its Generalizations, Extensions and Prospects'. However, such a wordy title, as well as an unhurried and calm narration, does not fit the traditions of our impetuous and impatient time.

First of all, this book is addressed to young scientists who have become interested in probability theory as one of the most important mathematical tools of the XXI century. They have every reason for this, since probability theory will be an indispensable component of all future research in information, biological and social sciences. The present book aims at helping young mathematicians who have decided to devote their lives to probability theory or its various applications to enter more easily into the range of ideas, problems and methods of the theory of limit theorems.

Without retelling the contents of the book, we briefly outline the key questions considered in it.

1. Limit theorems for sums of independent random variables without the condition of uniform asymptotic negligibility of summands which is traditional for the classical theory.

2. Limit theorems as theorems of stability of certain decompositions of limit laws. This is a principal point of view, and the corresponding conclusions lead far beyond the limits of the scheme of summation of independent random variables.

3. It is impossible to analyze the stability effects within various probability models by traditional methods such as the method of characteristic functions. New methods should be applied. One of these promising methods, which has already proved its value many times, is the method of metric distances based on the concept of a probability metric. Therefore, much attention in the book is paid to the theory of probability metrics.

4. Centers and scatters can turn out to be a methodically useful novelty. Until recently, these characteristics of random variables, which can successfully replace expectations and variances in problems related to the summation scheme, were rarely used.

5. Probability theory and the closely connected mathematical statistics have every reason to be regarded as the mathematical disciplines nearest to practical applications. However, when one gets acquainted with limit theorems, whose practical importance is often emphasized by mathematicians themselves, this can hardly be seen. The very essence of a refinement of a limit theorem is not only to point at some approximation but also to give an estimate for the error of the approximation. The overwhelming majority of refinements do not meet this requirement: the estimates are often given in a non-numerical form, that is, absolute constants are used without any numerical estimation or, which is even worse, the dependence of the error estimates on the characteristics involved is not given explicitly. This makes the estimates absolutely useless even for orientation. On the other hand, the question of how natural the structure of traditional estimates is, is often bypassed. In other words, it remains beyond consideration how natural it is to involve one or another characteristic of random variables in the estimate. This situation motivated us to include special sections in the book which deal with the problems of effective refinements of limit theorems and with natural convergence rate estimates connecting convergence criteria and convergence rate.

6. When calling for the revision of the classical heritage, we should not restrict ourselves to pure proclamations. Therefore one of the sections of the book is specially set aside for the discussion of how complete the theory of limit theorems is.

The work on the material of this book continued for many years. The intention to write a monograph on new approaches, problems and methods of the theory of limit theorems appeared as long ago as my visit to Australia in 1971, where I gave a semester course on this subject at the University of Sydney. This work finally resulted in the book 'Modern Theory of Summation of Independent Random Variables', published in Russian in 1986 by the Nauka publishing house. However, my interest in all these directions was not exhausted; moreover, it extended to the new theory of probability metrics and its applications in limit theorems. After the Russian book appeared, my knowledge and understanding of the subject continued to evolve, and I began to think of publishing a new, considerably revised and extended version of the monograph. Such an opportunity was given to me by the VSP publishing house, which decided to launch a new series on probability theory, mathematical statistics and their applications.

As compared to the Russian version published in 1986, this book contains essentially new material. The whole of Chapter 2 is new. The new Sections 1.6 and 6.6 were added to Chapters 1 and 6. The rest of the material was thoroughly checked and revised.

I remember with great pleasure the time when I worked on both the first and the second versions of the book.
First of all, my work at the Department of Analysis and Mathematical Statistics of the University of Sydney was very interesting and fruitful. The department was then headed by Professor O. Lancaster, a well-known specialist in the field of mathematical and applied statistics. It would have been difficult for me to prepare the text of the lectures, the prototype of this book, without the help of my colleagues and friends, first of all Dr. Mick O'Neill, my 'guardian angel', and Dr. John Mack, an excellent analyst.

To some extent, each book is a result of collective work. This monograph is not an exception. My students, colleagues and friends helped me in the preparation of the material of this book. V. V. Senatov not only carefully read and edited the Russian text, but also prepared Theorem 6.5.7 for Chapter 6. I. S. Shiganov helped in the preparation of the references and of the part of the book dealing with the problem of estimation of the constant in the Berry-Esseen theorem. The discussion of the manuscript with L. Szeidl was very useful. V. M. Kruglov carefully read the Russian text and pointed out some inaccuracies.

I cannot but think of Boris V. Gnedenko, who was the last classic of the Russian school of probability theory. When the Russian edition of this book of 1986 was in preparation, he showed the keenest interest in it and, after he got acquainted with its contents, gave much useful advice which has not lost its significance. B. V. Gnedenko died in December 1995, and I can no longer express to him my deep gratitude for his attention to my work and for his zeal for the flourishing of the Russian school of probability theory. I bow to the blessed memory of this outstanding mathematician and illustrious person.

V. Yu. Korolev, whose energy in many respects made the publication of this book possible, carefully read the new material and suggested some improvements of the proofs of theorems in Section 2.3. S. G. Sigovtsev and B. M. Shirokov translated and typeset the manuscript. The design of the book and the tedious final computer editing are due to A. V. Kolchin.

My wife Galina Smychnikova selflessly supported all of my work. Moreover, although her profession is very far from mathematics, she gave useful advice concerning the language and style.

I am very thankful to all of them for their help and support.

In its final stages, the work on the material of this book was supported by the Russian Foundation for Basic Research, project 96-01-01920.

V. M. Zolotarev
Moscow, June 1997.
1 Introduction to the Theory of Probability Metrics

A correct setting of a problem concerning the asymptotic approximation of some random variables by others necessarily explains in which sense the closeness of the random variables is understood: as the closeness of realizations of random variables defined on the same probability space, as the closeness of the distributions of these random variables, etc. Usually such an explanation is reduced to a direct or an indirect definition of some topology in the set of objects under consideration. This can be done, in particular, by defining a distance between pairs of elements. If the problem of asymptotic approximation is solved at the qualitative level and the answer is sought within the alternative 'does approach' or 'does not approach', then the method by which the topology was defined is of no importance. However, in the case where we are interested not only in the fact that two random variables approach each other but also in the rate at which they do so, it is necessary to introduce beforehand the notion of a distance between two random variables corresponding to the specified topology.

Information about how to define distances in sets of random variables in a reasonable way, what the properties of such distances are, and how they can be applied to problems of approximation forms the subject of the theory of probability metrics. A part of this information is collected in the present chapter. Though it makes up a comparatively small part of the general theory, even the material presented here will not be used completely in the next chapters of this book. This extension, superfluous at first sight, was made intentionally. The point is that the theory of probability metrics is still a very young branch of probability theory.
By the same token, an acquaintance with the origins of this theory which does not require referring to journals will promote a more harmonious perception of the facts and methods used in the following chapters. Besides, those readers who gain an understanding of the new methods and would like to try them on problems of their own will have some initial base for it.
1.1. Probability metrics

Let $(\Omega, \mathfrak{B}, P)$ denote the probability space where $\Omega$ is the set of numbers from the interval $(0,1)$, $\mathfrak{B}$ is the system of all Borel sets from $\Omega$ and $P$ is the Lebesgue measure of these sets. Consider the set $\mathfrak{X} = \{X(\omega)\}$ of all real-valued $\mathfrak{B}$-measurable functions defined on $\Omega$. In probability theory the functions $X(\omega)$ are interpreted as random variables defined on the same probability space $(\Omega, \mathfrak{B}, P)$. The set of random variables $\mathfrak{X}$ together with the measure $P$ of the basic space generates the space of marginal distributions of these random variables $\mathcal{P}_1 = \{P_X\colon X \in \mathfrak{X}\}$, the space of distributions of pairs of random variables $\mathcal{P}_2 = \{P_{XY}\colon X, Y \in \mathfrak{X}\}$, the space of joint distributions of triples of random variables from $\mathfrak{X}$, etc.

If we are speaking about measures of closeness of two random variables from $\mathfrak{X}$, i.e., about the closeness of two real-valued $\mathfrak{B}$-measurable functions defined on $\Omega$, then it is natural to turn first of all to functional analysis, where a similar question is solved by introducing the notion of a metric and its generalization, the notion of a semimetric. In this connection we should not neglect the specifics of probability theory, where random variables which coincide with probability one are not distinguished in calculations.

Let $\mathfrak{D}$ be some non-empty set from $\mathfrak{X}$ possessing the property that, together with each random variable $X$ belonging to $\mathfrak{D}$, any random variable which coincides with $X$ with probability one also belongs to this set. We shall call sets of this type admissible.

DEFINITION 1.1.1. A metric (semimetric) on an admissible set $\mathfrak{D} \subset \mathfrak{X}$ is a mapping $\mu\colon \mathfrak{D}^2 = \mathfrak{D} \times \mathfrak{D} \to \mathbf{R}_+ = [0, \infty)$ possessing the following three properties.
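For any $X, Y, Z \in \mathfrak{D}$:

(1) (the property of complete identification for a metric) $\mu(X, Y) = 0$ if and only if $P(X = Y) = 1$;

(1A) (the property of incomplete identification for a semimetric) $\mu(X, Y) = 0$ if $P(X = Y) = 1$;

(2) (the property of symmetry) $\mu(X, Y) = \mu(Y, X)$;

(3) (the triangle inequality) $\mu(X, Y) \le \mu(X, Z) + \mu(Z, Y)$.

[…]

Let $\varepsilon' > L(X, Z)$ and $\varepsilon'' > L(Z, Y)$. In accordance with (1.1.8), these conditions imply that for all $x \in \mathbf{R}$ the following inequalities hold:

$$F_X(x - \varepsilon') - \varepsilon' \le F_Z(x) \le F_X(x + \varepsilon') + \varepsilon',$$
$$F_Z(x - \varepsilon'') - \varepsilon'' \le F_Y(x) \le F_Z(x + \varepsilon'') + \varepsilon''.$$

Therefore, for any $x \in \mathbf{R}$,

$$F_X(x - \varepsilon' - \varepsilon'') - \varepsilon' - \varepsilon'' \le F_Z(x - \varepsilon'') - \varepsilon'' \le F_Y(x),$$
$$F_X(x + \varepsilon' + \varepsilon'') + \varepsilon' + \varepsilon'' \ge F_Z(x + \varepsilon'') + \varepsilon'' \ge F_Y(x).$$

Hence $L(X, Y) \le \varepsilon' + \varepsilon''$. After passing to the limits $\varepsilon' \to L(X, Z)$, $\varepsilon'' \to L(Z, Y)$, we obtain the triangle inequality. Thus, $L$ is a simple probability metric in $\mathfrak{D} = \mathfrak{X}$.

EXAMPLE 1.1.8 (Lévy-Prokhorov metric).

$$\pi(X, Y) = \pi(P_X, P_Y) = \inf\{\varepsilon\colon P_X(A) \le P_Y(A^\varepsilon) + \varepsilon,\ P_Y(A) \le P_X(A^\varepsilon) + \varepsilon,\ A \in \mathfrak{B}\}, \tag{1.1.9}$$

where $\mathfrak{B}$ is the system of Borel sets from $\mathbf{R}$ and $A^\varepsilon = \{x\colon |x - y| < \varepsilon,\ y \in A\}$ is the $\varepsilon$-neighborhood of the set $A$. It is a direct analog of the Lévy metric. The fact that the functional $\pi(X, Y)$ is a simple probability metric in $\mathfrak{X}$ is verified in exactly the same way as it was done for the Lévy metric.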
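The two-sided inequalities used in the triangle-inequality argument for the Lévy metric can be turned into a small numerical experiment. The sketch below assumes that $L(X, Y)$ is the infimum of those $\varepsilon > 0$ for which $F_X(x - \varepsilon) - \varepsilon \le F_Y(x) \le F_X(x + \varepsilon) + \varepsilon$ holds for all $x$, which is the form in which (1.1.8) is applied in the proof. The function names, the finite grid that replaces "all $x$", and the choice of two normal laws are illustrative assumptions, so the result is only an approximation.

```python
import math


def normal_cdf(x, mean=0.0):
    # Distribution function of a normal law with unit variance.
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))


def levy_condition(F, G, eps, grid):
    # Checks F(x - eps) - eps <= G(x) <= F(x + eps) + eps at every grid point.
    return all(F(x - eps) - eps <= G(x) <= F(x + eps) + eps for x in grid)


def levy_distance(F, G, grid, tol=1e-6):
    # The condition is monotone in eps and always holds for eps = 1,
    # so the smallest admissible eps can be located by bisection.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if levy_condition(F, G, mid, grid):
            hi = mid
        else:
            lo = mid
    return hi


def uniform_distance(F, G, grid):
    # Grid approximation of sup |F(x) - G(x)|.
    return max(abs(F(x) - G(x)) for x in grid)


if __name__ == "__main__":
    grid = [-8.0 + 0.01 * i for i in range(1601)]
    F = normal_cdf
    G = lambda x: normal_cdf(x, 0.5)
    print(levy_distance(F, G, grid), uniform_distance(F, G, grid))
```

On this example the computed Lévy distance does not exceed the uniform distance between the two distribution functions, as one would expect from the definition.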
EXAMPLE 1.1.9 ($\lambda$-metric).

$$\lambda(X, Y) = \min\Big\{\max\Big[\tfrac12 \max\big(|f_X(t) - f_Y(t)|\colon |t| \le T\big),\ 1/T\Big]\colon T > 0\Big\}. \tag{1.1.10}$$

The functional $\lambda$ obviously satisfies conditions 1S and 2. Let us verify whether or not 3 holds. Since the functions $f_X$, $f_Y$ are continuous everywhere, we can rewrite (1.1.10) as

$$\lambda(X, Y) = \tfrac12 \max\big(|f_X(t) - f_Y(t)|\colon |t| \le T_0\big) = 1/T_0, \tag{1.1.11}$$

where $T_0$ is the positive solution of the equation on the right-hand side of (1.1.11). Let $\lambda(X, Z) = 1/T_1$ and $\lambda(Z, Y) = 1/T_2$, where $T_1$ and $T_2$ are chosen for $\lambda(X, Z)$ and $\lambda(Z, Y)$ according to (1.1.11). If $T_0 \ge \min(T_1, T_2)$, then

$$1/T_0 \le \max(1/T_1, 1/T_2) \le 1/T_1 + 1/T_2.$$

If $T_0 < \min(T_1, T_2)$, then

$$1/T_0 = \tfrac12 \max\big(|f_X(t) - f_Y(t)|\colon |t| \le T_0\big) \le \tfrac12 \max\big(|f_X(t) - f_Z(t)|\colon |t| \le T_0\big) + \tfrac12 \max\big(|f_Z(t) - f_Y(t)|\colon |t| \le T_0\big) \le 1/T_1 + 1/T_2.$$

Therefore, the triangle inequality holds. The functional $\lambda$ is thus a simple probability metric on $\mathfrak{X}$.

EXAMPLE 1.1.10 (compound $\mathcal{L}_p$-metric).
$$\mathcal{L}_p(X, Y) = \big(E|X - Y|^p\big)^{\min(1, 1/p)}, \qquad p > 0. \tag{1.1.12}$$

The special case (compound mean metric) corresponds to $p = 1$:

$$\mathcal{L}_1(X, Y) = E|X - Y|. \tag{1.1.13}$$

If $\mathcal{L}_p(X, Z)$ and $\mathcal{L}_p(Z, Y)$ are finite, then in the case $p < 1$ the triangle inequality is immediately verified, and in the case $p \ge 1$ it follows from Minkowski's inequality. As in the case of the mean metric $\varkappa_1$, the set $\mathfrak{X}$ is partitioned into non-overlapping admissible subsets $\mathcal{A}_X$ within which the functional $\mathcal{L}_p$ is finite. Since $\mathcal{L}_p$ obviously possesses properties 1A and 2, this functional is a compound probability metric on each set $\mathcal{A}_X$.
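What makes this metric compound is that its value depends on the joint distribution of the pair, not only on the two marginal distributions. The sketch below is a hedged illustration: it replaces the expectation in (1.1.12) by an average over paired samples, and the function names and sample sizes are assumptions introduced here.

```python
import random


def l_p_metric(xs, ys, p):
    """Sample version of (E|X - Y|^p)^{min(1, 1/p)} for paired observations."""
    if p <= 0:
        raise ValueError("p must be positive")
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    mean_pow = sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)
    return mean_pow ** min(1.0, 1.0 / p)


def coupling_demo(n=1000, seed=1):
    # Same marginal sample, two different couplings:
    # Y = X (identical pairing) versus Y = a random rearrangement of X.
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = xs[:]
    rng.shuffle(ys)
    return l_p_metric(xs, xs, 2), l_p_metric(xs, ys, 2)


if __name__ == "__main__":
    print(coupling_demo())
```

Both couplings in `coupling_demo` have identical marginal samples, yet the first value is zero and the second is positive, which is exactly the behaviour a simple metric cannot have.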
EXAMPLE 1.1.11 (engineer metric).

$$\mathcal{E}(X, Y) = |EX - EY|. \tag{1.1.14}$$
The domain of definition of the functional $\mathcal{E}$ is the set $\mathfrak{D}$ of the random variables with finite expectations. An immediate verification of conditions 1AS, 2 and 3 shows that $\mathcal{E}$ is a simple probability metric in $\mathfrak{D}$.

EXAMPLE 1.1.12 (uniformly quadratic metric).

$$\rho_*^2(X, Y) = \sup\Big\{\frac{1}{2h}\int \big(P(X \in [-h, 0] + x) - P(Y \in [-h, 0] + x)\big)^2\,dx\colon h > 0\Big\}. \tag{1.1.15}$$

It is not difficult to verify that $\frac1h P(X \in [-h, 0] + x) = p_X(x, h)$ is the distribution density of the random variable $X + \omega h$, where $\omega$ is a random variable independent of $X$ and uniformly distributed on the interval $[-1, 0]$. Therefore, equality (1.1.15) can be rewritten as

$$\rho_*^2(X, Y) = \sup\{h\,\mu_0^2(X + \omega h, Y + \omega h)\colon h > 0\}, \tag{1.1.16}$$

where

$$\mu_0^2(X, Y) = \frac12 \int \big(p_X(x) - p_Y(x)\big)^2\,dx$$

and $p_X$ is the distribution density of a random variable $X$. The functional $\mu_0$ is defined only on the set $\mathfrak{X}_0$ of those distributions which have densities; moreover, $\mathfrak{X}_0$ is partitioned into non-overlapping subsets within which the functional $\mu_0$ is finite, just as was observed in the case of the mean metric $\varkappa_1$. However, the functional $\rho_*$ is defined for any pair of distributions $P_X$, $P_Y$. Since

$$\sup\{|P(X \in [-h + x, x]) - P(Y \in [-h + x, x])|\colon h > 0,\ x \in \mathbf{R}\} \le 1$$

and, in addition,

$$\frac1h \int P(X \in [-h + x, x])\,dx = \int p_X(x, h)\,dx = 1,$$

(1.1.15) immediately implies that $\rho_*(X, Y) \le 1$.

The functional $\rho_*$ is a simple metric on $\mathfrak{X}$. Indeed, the fact that the functional $\mu_0$ is a simple probability metric is almost evident. Only property 3, i.e., the triangle inequality, needs some explanation; this inequality is an elementary corollary of Minkowski's inequality. It is not difficult to transfer properties 1S, 2, 3 from the functional $\mu_0$ to $\rho_*$ by means of relation (1.1.16) between them.
REMARK 1.1.3. Metrics and semimetrics which are not probability ones have not so far found systematic application. They are almost completely absent from the further presentation as well. The appearance of Definition 1.1.1 should be regarded as a natural logical step in discussing the general question of the choice of distances in spaces of random variables. If metrics and semimetrics which are not probability ones found suitable applications, they would probably acquire an independent meaning generalizing the term 'probability metric'. This situation prompts us to use everywhere in what follows, instead of the cumbersome terms 'probability metric', 'simple probability metric' and 'compound probability metric', the simplified terms 'metric', 'simple metric' and 'compound metric', respectively.

If we need to highlight the fact that a compound (simple) metric under consideration possesses property 1 (1S, respectively), then we say that this metric possesses the property of complete identification. Of all the 12 examples of probability metrics presented above, only the engineer metric (Example 1.1.11) does not possess the property of complete identification.

From our point of view, it is undesirable to use the term 'semimetric' for some probability metrics only because they possess the property of incomplete identification 1A. Following that logic, one would also have to call the uniform metric, the Lévy metric, etc., semimetrics, that is, to destroy the established terminological traditions without any necessity.
1.2. Minimal and protominimal metrics

Let $\mathfrak{D} \subset \mathfrak{X}$ be some admissible set and $\mu$ be some probability metric defined on it. As in the previous section, the set $\mathfrak{D}^2$ is partitioned into non-overlapping subsets $\mathfrak{D}_{XY}$, and the set of joint distributions $\mathcal{P}(\mathfrak{D}^2)$ into non-overlapping subsets $\mathcal{P}_{XY}$ respectively. Each $\mathcal{P}_{XY}$ consists of the joint distributions $P_{XY} \in \mathcal{P}(\mathfrak{D}^2)$ corresponding to fixed marginal distributions $P_X$, $P_Y$; therefore $\mathcal{P}_{XY}$, as well as $\mathfrak{D}_{XY}$, is defined for the given $\mathfrak{D}$ by the pair of marginal distributions $(P_X, P_Y)$.

We consider the functional
\[ \hat\mu(P_X, P_Y) = \inf\{\mu(P_{UV}) : P_{UV} \in \mathcal{P}_{XY}\} = \inf\{\mu(U, V) : (U, V) \in \mathfrak{D}_{XY}\}. \tag{1.2.1} \]
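As a small illustration of definition (1.2.1), the following sketch computes the infimum of a compound metric over all joint distributions with fixed marginals in the simplest possible case. Everything in it is an assumption made for illustration: the marginals are taken to be Bernoulli laws, the compound metric is the indicator metric $i(U, V) = P(U \ne V)$, and the helper names `couplings`, `indicator_metric` and `minimal_metric` are hypothetical. The infimum is approximated by a grid search over the one free parameter of the coupling.

```python
from fractions import Fraction


def couplings(p, q, steps=100):
    """Yield joint laws of (U, V) with U ~ Bernoulli(p), V ~ Bernoulli(q).

    A coupling is fixed by t = P(U=1, V=1), which ranges over the
    interval [max(0, p+q-1), min(p, q)].
    """
    lo, hi = max(Fraction(0), p + q - 1), min(p, q)
    for k in range(steps + 1):
        t = lo + (hi - lo) * Fraction(k, steps)
        # joint[(u, v)] = P(U=u, V=v); the marginals are p and q for every t
        yield {(1, 1): t, (1, 0): p - t, (0, 1): q - t, (0, 0): 1 - p - q + t}


def indicator_metric(joint):
    """Compound metric i(U, V) = P(U != V), evaluated on a joint law."""
    return sum(w for (u, v), w in joint.items() if u != v)


def minimal_metric(mu, p, q):
    """Grid approximation of the infimum of mu over couplings with marginals p, q."""
    return min(mu(joint) for joint in couplings(p, q))


p, q = Fraction(3, 10), Fraction(1, 2)
independent = {(u, v): (p if u else 1 - p) * (q if v else 1 - q)
               for u in (0, 1) for v in (0, 1)}

print(indicator_metric(independent))            # value for the independent coupling
print(minimal_metric(indicator_metric, p, q))   # infimum over all couplings
```

The value for the independent coupling depends on the joint distribution, whereas the infimum depends only on the pair of marginals, which is the point of passing from $\mu$ to $\hat\mu$.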
LetX, Y € then the set 9 χ γ is not empty and for each pair (U, V) e the metric μ(υ, V) is finite. Hence the functional μ takes some finite value at each set ^χγ· It is clear that the value of this functional is defined by the pair of distributions (Ρχ,Ργ) and that the domain of its definition is the set & ( % χ c 3>x χ & 1 , where &(3>) = {Ρχ: X e μ(αχ, αγ). Hence
μ(υ,ν) = μ(αχ,αγ),
{U,V) e = °2χ and therefore inf {Gz(®x>: X' e %χ\ = i n f { G z ( @ x : X ' e
°JX},
the second summand is also equal to $G_Z(\mathfrak{D}_X)$. Finally it turns out that $d_Z(X) = 0$. Thus the semimetric $\mu_0(X, Y)$ is exactly the semimetric with the minimal metric $\hat\mu_0 \equiv 0$ which we need.

One property of minimal metrics makes them an especially important tool for obtaining various estimates in approximation problems. It is connected with the following inequality, which we present in the general form, although so far only its particular cases have been used. Let $G(u_1, \dots, u_r; v_1, \dots, v_s)$ be a real-valued function defined for non-negative arguments. We assume that the function $G$
(a) does not increase in each of the variables $u_j$, $1 \le j \le r$;
(b) is right-continuous at all non-negative points $v_l$, $1 \le l \le s$.

Figure 1.1. (Graph $\Gamma$ on the vertices $W_i$, $W'_i$ with edges $\gamma_i$.)

We consider $r + s$ pairs of random variables $(U_j, V_j)$, $1 \le j \le r$; $(X_l, Y_l)$, $1 \le l \le s$, related by the semimetrics $\delta_j = \delta_j(U_j, V_j)$ and the probability metrics $\mu_l = \mu_l(X_l, Y_l)$. From the set of the metrics $\mu_l$ we choose all the compound metrics $\mu_{l_1}, \dots, \mu_{l_m}$, $m \le s$, which correspond to the pairs of random variables $(W_i = X_{l_i}, W'_i = Y_{l_i})$, $1 \le i \le m$. We construct the graph (which may consist of several connected components) whose vertices are the random variables $W_1, W'_1, \dots, W_m, W'_m$ and whose edges are the metrics $\mu_{l_i}$ connecting the pairs $W_i$, $W'_i$. We say that this graph $\Gamma$ is unchangeable if its structure (i.e., the metrics labelling the edges of the graph, their interconnection and the sets on which the metrics are defined) remains invariant. To explain this, we consider the example of a tuple of 6 pairs of random variables $(W_1, W'_1), \dots, (W_6, W'_6)$ related by compound probability metrics $\gamma_i = \mu_{l_i}$, which may be either coinciding or different. The graph $\Gamma$ of these random variables satisfying the additional conditions
\[ W'_1 = W_2 = W_3, \quad W'_3 = W_4 = W_5, \quad W'_5 = W_6, \quad W'_6 = W'_4, \]
is given in Fig. 1.1. We draw attention to one peculiarity of the graph $\Gamma$: it has the cycle $(W_5, W_6, W'_4, W_5)$.

THEOREM 1.2.4. Assume that the inequality
\[ G(\delta_1, \dots, \delta_r; \mu_1, \dots, \mu_s) \ge 0 \tag{1.2.8} \]
is true for any tuples of the random variables $(U_j, V_j)$, $1 \le j \le r$, $(X_l, Y_l)$, $1 \le l \le s$, from the domains of definition of the related fixed semimetrics $\delta_j = \delta_j(U_j, V_j)$ and the probability metrics $\mu_l = \mu_l(X_l, Y_l)$ respectively. If the graph $\Gamma$ constructed according to the above rule is unchangeable and, besides, contains no cycles, then inequality (1.2.8) remains true if we replace the semimetrics $\delta_j$ and the probability metrics $\mu_l$ by the corresponding minimal functionals $\hat\delta_j$ and minimal metrics $\hat\mu_l$; i.e., (1.2.8) implies
\[ G(\hat\delta_1, \dots, \hat\delta_r; \hat\mu_1, \dots, \hat\mu_s) \ge 0. \tag{1.2.9} \]
REMARK 1.2.1. It is impossible to replace the assumption that $\mu_l$ are probability metrics by the more general condition that $\mu_l$ are semimetrics. However, such a replacement is possible for those pairs $(W_j, W'_j)$ which are isolated from the rest of the graph $\Gamma$, since in this case the minimization of $\mu_{l_j}(W_j, W'_j)$ does not require any coordination with the minimization of the other compound metrics $\mu_l$. This situation takes place in particular case 2 below.

The proof of the theorem is based on the same principle of choosing sequences of special tuples of random variables which was already demonstrated in the proofs of Lemma 1.2.1 and Theorem 1.2.3. This rather cumbersomely formulated theorem has found application only in several particular cases. We mention the two most important of them.

1. We met this case in Lemma 1.2.1, when we were proving that the functional $\hat\mu$ satisfies the triangle inequality. Let
\[ G(u_1; v_1, v_2) = v_1 + v_2 - u_1. \]
This function obviously satisfies requirements (a) and (b). Put $(U_1 = X, V_1 = Y)$, $(X_1 = X, Y_1 = Z)$, $(X_2 = Z, Y_2 = Y)$. These pairs are connected by one and the same compound probability metric $\mu$. Then by virtue of Theorem 1.2.4, the validity of the inequality
\[ \mu(X, Z) + \mu(Z, Y) - \mu(X, Y) \ge 0 \]
for any random variables $X, Y, Z \in \mathfrak{D}$ implies the validity of the inequality $\hat\mu(X, Z) + \hat\mu(Z, Y) - \hat\mu(X, Y) \ge 0$.

2. Let $\varphi(x)$, $\psi(x)$ be non-negative functions defined on the semiaxis $x \ge 0$; moreover, let $\varphi(x)$ be non-decreasing in $x$ and $\psi(x)$ be right-continuous at each point $x \ge 0$. Further, let $\delta(X, Y)$ and $\mu(X, Y)$ be compound semimetrics defined on the admissible sets $\mathfrak{D}_\delta$ and $\mathfrak{D}_\mu$ respectively, provided $\mathfrak{D}_\delta \supseteq \mathfrak{D}_\mu$. If for any $X, Y \in \mathfrak{D}_\mu$ the inequality $\varphi(\delta(X, Y))$ …
Jo
which is obviously bounded, satisfies the Lipschitz condition and therefore belongs to We obtain 1/fT(x)d(Fx(x)
-Fy(x))| = | |
fT(x)(Fx(x)-FY(x))dx
= j |ii(x)-Fy(x)|/(|*|
sup | |
fT(x)
d(Fx
- FY
:Τ
>
0J
= κ^Χ,
Y).
This means that within each admissible set on which the metric $\tau_1(X, Y)$ is considered,
\[ \hat\tau_1(X, Y) = \kappa_1(X, Y). \]
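The coincidence of the minimal value of $\tau_1$ with the mean metric $\kappa_1$ can be checked numerically on small samples. The sketch below is only an illustration under stated assumptions: it takes $\tau_1(X, Y) = \mathsf{E}|X - Y|$ and $\kappa_1(X, Y) = \int |F_X(x) - F_Y(x)|\,dx$, treats two samples of equal size as uniform empirical distributions, and restricts the couplings to one-to-one pairings of sample points (for uniform empirical laws of equal size this restriction does not change the minimum). The function names are hypothetical.

```python
from itertools import permutations


def kappa1(xs, ys):
    """Integral of |F_X - F_Y| for two equal-size samples (uniform empirical laws)."""
    n = len(xs)
    pts = sorted(set(xs) | set(ys))
    total = 0.0
    for a, b in zip(pts, pts[1:]):
        fx = sum(x <= a for x in xs) / n   # value of F_X on the interval between a and b
        fy = sum(y <= a for y in ys) / n   # value of F_Y on the same interval
        total += abs(fx - fy) * (b - a)
    return total


def min_expected_distance(xs, ys):
    """Brute-force minimum of E|U - V| over all one-to-one pairings of the samples."""
    n = len(xs)
    return min(sum(abs(x - y) for x, y in zip(xs, perm)) / n
               for perm in permutations(ys))


def quantile_coupling_distance(xs, ys):
    """E|U - V| when the i-th smallest x is paired with the i-th smallest y."""
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


xs = [0.0, 1.0, 3.0, 6.0]
ys = [2.0, 2.5, 4.0, 4.5]
print(kappa1(xs, ys), min_expected_distance(xs, ys), quantile_coupling_distance(xs, ys))
```

In this example all three quantities agree, and the minimizing pairing is the monotone one, which matches quantiles of the two distributions.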
This relation can be immediately generalized, if we replace the random variables Χ, Y in it by some functions of these variables (which naturally requires the corresponding change for the domain of definition Let H(x) be a non-decreasing absolutely continuous function on R with the derivative h(x) = H'(x). Then Kl(H(x),H(Y))
= J\P(H(X) = J \Fxb)
0.
COROLLARY 1.3.2. The second important particular case of Theorem 1.3.1 is a result due to Dobrushin (Dobrushin, 1970). It was established before the expanded variant of the Kantorovich theorem was proved. In Theorem 1.3.1 specify the metric $d(x, y) = d_1(x, y) = I(x \ne y)$. Then (see (1.1.5))
\[ \tau(X, Y; d_1) = \mathsf{E}\, I(X \ne Y) = i(X, Y). \]
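For discrete distributions, the statement of this corollary (the minimal value of $i(X, Y) = P(X \ne Y)$ over couplings equals the total variation distance $\sigma$) can be made concrete by constructing a joint law explicitly. The following sketch is an illustration only: the distributions are given as dictionaries mapping values to exact probabilities, $\sigma$ is computed as $\sup_A |P(X \in A) - P(Y \in A)|$, and the helper names are hypothetical. The construction places as much mass as possible on the diagonal and distributes the remaining mass off the diagonal.

```python
from fractions import Fraction


def total_variation(p, q):
    """sup over sets A of |P(A) - Q(A)|, attained at A = {k : p(k) > q(k)}."""
    keys = set(p) | set(q)
    return sum(max(p.get(k, 0) - q.get(k, 0), 0) for k in keys)


def maximal_coupling(p, q):
    """Joint law with marginals p and q that puts mass min(p(k), q(k)) on each (k, k)."""
    joint = {}
    excess_p, excess_q = [], []          # mass left over after filling the diagonal
    for k in sorted(set(p) | set(q)):
        pk, qk = p.get(k, 0), q.get(k, 0)
        m = min(pk, qk)
        if m > 0:
            joint[(k, k)] = m
        if pk > m:
            excess_p.append([k, pk - m])
        if qk > m:
            excess_q.append([k, qk - m])
    i = j = 0
    # The two leftovers have equal total mass and disjoint supports,
    # so matching them greedily only creates off-diagonal entries.
    while i < len(excess_p) and j < len(excess_q):
        m = min(excess_p[i][1], excess_q[j][1])
        key = (excess_p[i][0], excess_q[j][0])
        joint[key] = joint.get(key, 0) + m
        excess_p[i][1] -= m
        excess_q[j][1] -= m
        if excess_p[i][1] == 0:
            i += 1
        if excess_q[j][1] == 0:
            j += 1
    return joint


p = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
q = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
joint = maximal_coupling(p, q)
mismatch = sum(w for (x, y), w in joint.items() if x != y)
print(mismatch, total_variation(p, q))
```

Exact rational arithmetic is used so that the comparison of $P(X \ne Y)$ with $\sigma$ is not blurred by rounding.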
On the other hand, ζ(Χ, Y;d\) coincides with the distance of total variation σ(Χ,Υ) (see (1.1.2)). Indeed, select in 3-{d\) the set of the functions { / a W = I(x e A): A e 53}. Then obviously ζ(Χ,Υ;άι) > sup{|E/(X e A) - ΕI(Y e A)|: A e 35} = sup{|P(X € A) - P(Y e A)\: A e 05} = σ(Χ, Y). Besides, the condition |/(ac) — /(y)| < 1 defining the set exists a constant c such that { * : 0 < fix) - c < 1 } υ { * : 0 < c - fix)
1/2 and c — fix 1 / 2 or fix — 1) — c > 1/2 a n d c - / ( x 2 ) > 1/2.
In both cases the condition |/(*ι) — /(*2)| ^ 1 is violated. Further we obtain ζ(Χ, Y; d{) = sup 1 1 J (fix) - c) d(Fx(x) - FYix)) v) and with its help introduce the functional WXY(U,V)
$= \inf\{W_{UV}(u, v) : P_{UV} \in \mathcal{P}_{XY}\}$,

which is naturally determined by the pair of real numbers $u$, $v$ and by the pair of marginal distributions $P_X$, $P_Y$ (or, which is the same, by the pair $F_X$, $F_Y$).

LEMMA 1.3.1. For any $u$, $v$ and $F_X$, $F_Y$
\[ W_{XY}(u, v) = \max\{0, F_X(u) - F_Y(v)\}. \]
PROOF. On the one hand, $F_X(u) - F_Y(v) = P(X < u) - P(Y < v) \le P(X < u) - P(X < u, Y < v) =$
P(X u + ε) = 0 for any ε > 0, which means that L{X, Y) = 0. The symmetry of L is evident. Let us verify whether or not the triangle inequality holds. We choose εχ and ε u + ει + ε2} c {X< u,Z > u + ει} u {Z < u + ε, Y > u + ει + ε2},
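we obtain
\[ W_{XY}(u, u + \varepsilon_1 + \varepsilon_2) \le W_{XZ}(u, u + \varepsilon_1) + W_{ZY}(u + \varepsilon_1, u + \varepsilon_1 + \varepsilon_2) \le \varepsilon_1 + \varepsilon_2 \]
for any $u \in \mathbf{R}$. In a similar way we obtain
\[ W_{YX}(u, u + \varepsilon_1 + \varepsilon_2) \le \varepsilon_1 + \varepsilon_2 \]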
for any u e R. But this means that L(X,Y) ε) = 0}.
It is a compound metric. Indeed, since the first two properties of a metric are evident for the functional $\xi$, it suffices to check the triangle inequality. Let $\varepsilon_1 > \xi(X, Z)$ and $\varepsilon_2 > \xi(Z, Y)$. Since the inclusion
\[ \{|X - Y| > \varepsilon_1 + \varepsilon_2\} \subset \{|X - Z| > \varepsilon_1\} \cup \{|Z - Y| > \varepsilon_2\} \]
holds for the random variables specified, we have
\[ P(|X - Y| > \varepsilon_1 + \varepsilon_2) = 0, \]
that is, $\xi(X, Y) \le \varepsilon_1 + \varepsilon_2$. Letting $\varepsilon_1 \to \xi(X, Z)$, $\varepsilon_2 \to \xi(Z, Y)$, we obtain the triangle inequality for $\xi$ as a limit.

Let us represent the metric $\xi(X, Y)$ as
\[ \xi(X, Y) = G(W_{XY}, W_{YX}) = \inf\{\varepsilon : W_{XY}(u, v) I(u + \varepsilon = v) = 0,\ W_{YX}(u, v) I(u + \varepsilon = v) = 0,\ u \in \mathbf{R}\}. \]
This expression corresponds to notation (1.3.6), therefore we are able to apply Theorem 1.3.2 and arrive at ξ(Χ,Υ) = inf {ε: Fx{u) 1 looks like. The answer was found by Dall'Aglio in (Dall'Aglio, 1956) cited above, and in the more general setting, in the recent works (Cambanis, Simons, and Stout, 1976) and (Rachev, 1983). Before we formulate the general result which implies the answer to the question we are interested in, we make one remark. Let μ(Χ, Y) be some compound metric (or even semimetric). We have no reason to expect that in general situation there exists some universal algorithm to construct the minimal metric μ(Χ, Y) basing on given marginal distributions FX and FY. However, it is possible to suggest an upper bound for a minimal metric which pretends to be of good precision in the number of cases. Namely, denote by ω the random variable which is uniformly distributed on the interval (0,1), and introduce the random variables X = Ρ χ 1 ( ω ) , Y = F y 1 ( ω ) ) with the joint Hoeffding distribution function P(X < u, Υ
$\infty$, $\pi(cX, cY) \to \nu(X, Y) = \sup\{P(X \in A, Y \notin A) : A \in \mathfrak{B}\}$.
S. Rachev found a simple proof that the metrics $\pi$ and $\mathcal{K}$ are different. Namely, let $X$, $Y$ be independent and take the values 0 and 1 with probability 1/2 each. It is not hard to calculate that $\nu(X, Y) = 1/4$, $i(X, Y) = 1/2$. Since $\mathcal{K}(cX, cY) \to i(X, Y)$ as $c \to \infty$, for $c$ large enough we have $\pi(cX, cY) < \mathcal{K}(cX, cY)$.

If we assume now that the upper bound in (1.3.14) is attained at some set $A_0 \in \mathfrak{B}$, then $\pi$ coincides with $L$ in which the distribution functions $F_X$, $F_Y$ are transformed in the following way: $F_X \to$ … Therefore the metric $\pi$ in (1.3.4) is minimized by the distribution $F^0_{XY}$ of the 'Hoeffding type':
\[ F^0_{XY}(u, v) = \min(F^0_X(u), F^0_Y(v)). \tag{1.3.15} \]
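The two values quoted in Rachev's example above, $\nu(X, Y) = 1/4$ and $i(X, Y) = 1/2$ for independent $X$, $Y$ taking the values 0 and 1 with probability 1/2 each, can be verified by direct enumeration. The sketch below assumes $\nu(X, Y) = \sup\{P(X \in A, Y \notin A)\}$ and $i(X, Y) = P(X \ne Y)$; since both variables take values in $\{0, 1\}$, it is enough to let $A$ run over the four subsets of $\{0, 1\}$. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations, product

values = (0, 1)
# Joint law of two independent variables, each equal to 0 or 1 with probability 1/2.
joint = {(x, y): Fraction(1, 4) for x, y in product(values, repeat=2)}


def subsets(vals):
    """All subsets of the finite set of values, as tuples."""
    return chain.from_iterable(combinations(vals, r) for r in range(len(vals) + 1))


def nu(joint, vals):
    """Maximum over subsets A of P(X in A, Y not in A)."""
    return max(sum(w for (x, y), w in joint.items() if x in A and y not in A)
               for A in subsets(vals))


def indicator_metric(joint):
    """i(X, Y) = P(X != Y)."""
    return sum(w for (x, y), w in joint.items() if x != y)


print(nu(joint, values), indicator_metric(joint))
```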
In the case where there exists no such set $A_0$, we deal with the limit of a sequence of distributions of type (1.3.15). Since the metrics $\sigma$ and $\pi$ are connected by the limit relation (we will speak about it in more detail in Section 1.5)
\[ \sigma(X, Y) = \lim_{c \to \infty} \pi(cX, cY), \]
we can also speak about the minimizing role of the distributions of the 'Hoeffding type' in the asymptotic sense in this case.

We complete our acquaintance with the special cases of minimal and protominimal metrics by proving two inequalities. They not only illustrate the methods which use minimal metrics, but are also a good tool for constructing quantitative estimates in approximation problems. We consider some measurable mapping $B: \mathbf{R} \to \mathbf{R}$, and assume that for the mapping $B$ there exists an estimate $|B(x) - B(y)|$ … 0 taking the values $g(u) > u$, $u \in (0, 1)$, and $g(0) = 0$.

THEOREM 1.3.5. If the functions $B$ and $g$ are related by inequality (1.3.16), then for any random variables $X, Y \in$ … the inequalities
X(B(X),B(Y)) ε) < ε. Transforming the inequality under the sign of probability, we obtain P(g{% — Y|) >#(ε)) < ε; hence with the help of (1.3.16) we find that P(|S(X) - fl(Y)| >g(e)) < P(g{\X - Y\ >g{e)) < e1 η ...,Υη)) < Σ >ι
(Xj,Yj) of the type
>1
since $i(X, Y) = \mathsf{E}\, I(X \ne Y) = P(X \ne Y)$. The second inequality is obtained from the first one by passing from the metric $i$ to the corresponding minimal metric $\sigma$ (see (1.3.3)). The central question arising in such a passage is that of the consistency of the joint distributions in the whole ensemble of random variables under their transformation. In our case the joint distributions $P_{\mathbf{X}}$, $P_{\mathbf{Y}}$ of the vectors $\mathbf{X}$ and $\mathbf{Y}$ are the initial ones. To carry out the minimization on the right-hand side of (1.3.19), we should select the joint distributions of the pairs of random variables $(X_j, Y_j)$ in an appropriate way. For simplicity, let us assume that the minimizing joint distributions $P_j$ for the distances $i(X_j, Y_j)$ exist, so that $i(P_j) = \sigma(X_j, Y_j)$ (otherwise we should consider a sequence of distributions which achieves the minimum in the limit). If we take a neighboring pair of the random variables, then in Fig. 1.2 we can see the cycle in which the joint distributions $Q_j$, $Q'_j$ of the pairs $(X_j, X_{j+1})$ and $(Y_j, Y_{j+1})$ (horizontal dotted lines) are induced by the given joint distributions $P_{\mathbf{X}}$, $P_{\mathbf{Y}}$, and the joint distributions of the pairs $(X_j, Y_j)$, $(X_{j+1}, Y_{j+1})$ (vertical dotted lines) should be equal to $P_j$ and $P_{j+1}$ respectively. Since for the two pairs of random variables under consideration we have no constraints on the choice of $Q_j$ and $Q'_j$, whereas the choice of $P_j$, $P_{j+1}$ essentially depends on the marginal distributions of the random variables and on which compound metric is minimized, we cannot expect the measures $Q_j$, $Q'_j$, $P_j$, $P_{j+1}$ to be consistent in the general situation. In the particular situation under consideration, where $Q_j$ is equal to the direct product of the marginal distributions of the variables $X_j$ and $X_{j+1}$, and $Q'_j$
is equal to the similar product of the marginal distributions of the variables $Y_j$, $Y_{j+1}$, everything is simplified essentially, since the specified form of $Q_j$ and $Q'_j$ does not preclude any choice of the distributions $P_j$ and $P_{j+1}$. This circumstance allows us to minimize each summand on the right-hand side of (1.3.19) while the distributions $P_{\mathbf{X}}$ and $P_{\mathbf{Y}}$ remain unchanged. Further, whatever value the left-hand side of (1.3.19) takes after the right-hand side has been minimized, it is certainly no less than the minimal value of the left-hand side, which yields inequality (1.3.20).

Figure 1.2. (The cycle formed by the random variables $X_j$, $X_{j+1}$, $Y_j$, $Y_{j+1}$.)
1.4. Ideal metrics

Having a great variety of metrics and a general conception of their classification, structure, properties and interconnections, we rather naturally come to the conclusion that not all metrics are equally suitable for application to the problems under consideration. Some of them can simplify the solution essentially thanks to their specific properties. Although these general arguments give nothing by themselves, once formalized as several statements of principle and supported by the quite well developed theory of probability metrics, they became the basis of the so-called method of metric distances. The main statement of this method is that the structural features of the stochastic model within which the approximation problem is solved should be taken into account together with the properties of the selected metric. The better the structural peculiarities are reflected in the properties of the metric, the easier the problem posed is solved and the more clearly the mechanism and the driving forces of the stochastic model under investigation are seen. Since this approach presumes that the metric is chosen to suit the problem, irrespective of how popular this metric is or how suitable it is for further numerical computations, it becomes necessary to recalculate the obtained estimates in terms of metrics whose choice is dictated by extraneous
reasons. At this stage of the solution we are helped by a special range of questions and related facts, which can be conventionally called the comparison of metrics. Here we will not discuss the method of metric distances in more detail, because we return to it in Chapter 6. A number of papers (Zolotarev, 1976; Zolotarev, 1979b; Zolotarev, 1983a; Senatov, 1977; Zolotarev, 1979a) can be recommended to those who want to get acquainted with its main assertions and examples of its application. The material of this section will serve as a good example of the first of the presented statements, and the material of the next section can partly illustrate the second one (although its true purpose in this book is wider).

In the framework of linear models which we will deal with further on, we consider the scheme of summation of random variables, the scheme of multiplication of random variables, and any generalization of the scheme of summation of the form $X_1 \uplus X_2 \uplus \dots$, where the random variables $X_j$ (of finite or infinite number) are connected by some commutative and associative semigroup binary operation in $\mathfrak{X}$, which we agree to denote by $\uplus$. In fact, we are interested in the case of independent random variables $X_j$, for which the operation $\uplus$ induces the corresponding semigroup operation on the space of distributions and vice versa. The connection between these operations is established by the following relation (the notation $\circ$ is used for the operation in the space of distributions). For any independent random variables $X, Y \in \mathfrak{X}$ with distributions $P_X$, $P_Y$
\[ \int \xi(x)\,(P_X \circ P_Y)(dx) = \mathsf{E} \iint \xi(x \uplus y)\, P_X(dx)\, P_Y(dy), \tag{1.4.1} \]
where $\xi(x)$ is any bounded continuous function on $\mathbf{R}$. The sign of mathematical expectation is needed for the case where the 'summation' $x \uplus y$ of constant numbers gives a random variable (i.e., the operation $\uplus$ itself carries some probabilistic element). Such operations are no longer rare in probability theory. Examples are the Kingman $(\alpha, \beta)$-convolution and the generalized Urbanik convolutions extending this notion (see (Kingman, 1962; Urbanik, 1964; Urbanik, 1984b)), which mostly turn out to be connected with random operations of the specified type in $\mathfrak{X}$.

Considering the generalized $\uplus$-sum $X = X_1 \uplus \dots \uplus X_n$ of independent random variables with the aim of obtaining simple approximations of its distribution $P_X = P_{X_1} \circ \dots \circ P_{X_n}$, it is natural to search for approximations among the distributions of $\uplus$-sums $Y = Y_1 \uplus \dots \uplus Y_n$ of independent random variables whose specific properties make their distributions essentially easier to calculate. This idea actually underlies all the problems of the classical theory of limit theorems, and it remains dominant in its modern development.

What should be taken as a measure of difference between the distributions of such sums? Here the hint is the solution of a similar problem in functional
analysis, where the notion of a norm is introduced, which allows us to define a distance between elements that possesses quasilinear properties. Following this pattern, we will look for metrics which possess similar quasilinear properties. We divide the solution of the problem just posed into two parts: first we consider the operation of ordinary summation of random variables, and then we turn to the general case of a semigroup operation.

Recall that, beginning with this section, we use only the simplified versions of the definition of a probability metric and of its particular case, a simple probability metric. To avoid any misunderstanding, we formulate these variants below.

DEFINITION 1.4.1. A mapping $\mu: \mathfrak{X}^2 \to [0, \infty]$ is called a probability metric on $\mathfrak{X}$ if for any random variables $X, Y, Z \in \mathfrak{X}$ relations 1A, 2 and 3 from Definition 1.1.1 hold. The meaning of these relations in the case where their left-hand side is equal to infinity is that their right-hand side should also be equal to infinity, and for relation 2 the converse is true as well.

DEFINITION 1.4.2. A probability metric $\mu$ on $\mathfrak{X}$ is called simple if it induces the mapping $\mu: \mathcal{P}^1 \times \mathcal{P}^1 \to [0, \infty]$. In this case condition 1A is replaced by condition 1AS. A probability metric $\mu$ on $\mathfrak{X}$ which is not simple is called compound.

1.4.1. The scheme of summation of independent random variables

DEFINITION 1.4.3. A simple metric $\mu$ in $\mathfrak{X}$ is called an ideal metric of order $s$ if it possesses the following two properties.

Property of regularity. For any two random variables $X$, $Y$ and a random variable $Z$ independent of them the inequality μ(X + Z, Y +
Ζ) 0, which possesses the additional properties: (1*) The definition of the metric 7 as a functional on χ SP1 can be extended to the set Μ ' χ Μ ' , where c jW c J i , Μ is the set of measures on R (the extension admits that some values of 7 on the extended set are equal to infinity).
(2*) For any measures m\, m2 e Jt' and any constant c > 0 cy(mi,m 2 ) = 7(cmi,cm 2 ). On the base of p(w) of type (d) in accordance with (1.4.11) we introduce the functional μ(Χ, Y) = lim sup tay(X + tU,Y + tU). i->oo
(1.4.12)
In this connection it is assumed that a > 1 and that U is the sum of independent random variables U = Ua = a>\ + ω α _ι, where ωο = 0 and (üh, h > 0, has the distribution with density A-1 — — / ( O c c ^ n i + fc)). T(h) The distribution of X = tU also has the density PX+tu(x)= [
Ptua(x-
u)dFx(u),
Jx-ßt
where βα~ι = Πα) and PtuSx) = ß(tß)~a[(minGx, /3ί)) α_1 - max(0,a; - ί)) α_1 ]· The latter equality shows that for any a > 1, χ > 0 taPtuM)
χ
α - 1
TT-r. Γ(ο)
t
°°·
The limiting function here is the density of some measure in the space Jt. Hence under the corresponding constraints of moment type on the distribution Fx it follows that for each χ tapx+tua ^ Ja-lFx{x)
= Γ (*~")0 J-00 Γ(α)
Γχ (χ - αΤ~2 = / τν ^Fx{u)du, J-00 Γ(α - 1)
1
dFx{u) t^oo.
(1.4.13)
Here $ a f (up to a constant factor) is the operator of fractional o-fold integration of the function / , which in the case of integer a turns into the ordinary α-fold integration of / . The similar limit is obtained for the distribution function as t —> 00: taFx+tua (x) H> JaFx(x)
for each x.
(1.4.14)
The functions J? a ~ l Fx and ,J?aFx can be considered as the distribution functions of the corresponding measures in the space moreover, :a 1 a ^ J F x ( x ) plays the role of the density with respect to ,9 a Fx . The formal passage to the limit in (1.4.12) with account of (1.4.14) gives the following representation of the metric μ:
Ji\
μ(Χ, Υ) = r(JaFx,
J ~ Fx(x)
JaFY).
=
(1.4.15)
Here we are speaking about a formal passage to the limit, since its justification requires additional assumptions on $\gamma$ (topological, structural, etc.), which we do not intend to go into. Formulae (1.4.13)-(1.4.15) give us interesting relations between metrics. We present some of them as illustrations, without proofs.
For the distance of total variation σ and the mean metric κι lim to(X + tUlt Υ + tUi) = κ^Χ, Υ). 1
ί σ(Χ + tUs, Υ + tUs) = lim tsici(X
lim 3
f->oo
-II
+ tUs-i,
Υ
+ iC7s_i)
(X - u)|S—1d(F (u)-F (u)) Γ(s)
x
Y
dx. (1.4.18)
In the case $s > 1$, integrating by parts in the interior integral in (1.4.18) transforms this metric to
\[ \int \left| \int_{-\infty}^{x} \frac{(x - u)^{s-2}}{\Gamma(s - 1)}\, (F_X(u) - F_Y(u))\, du \right| dx. \]
There is a metric on the right-hand side of (1.4.18), which at least for integer $s$ coincides with the so-called $\zeta_s$-metrics. The latter are defined in the general case by means of a functional of another type. Let us dwell upon their definition and properties.
Let $s > 0$ be some number. It can be uniquely decomposed into the sum $s = m + \alpha$, where $m \ge 0$ is an integer and $0 < \alpha \le 1$. Denote by $\mathcal{F}_s$ the set of real-valued bounded functions on $\mathbf{R}$ which have the $m$th derivative at all points, provided that
\[ |f^{(m)}(x) - f^{(m)}(y)| \le |x - y|^{\alpha}. \tag{1.4.19} \]
The metric $\zeta_s$, $s > 0$, is defined by the equality
\[ \zeta_s(X, Y) = \sup\{|\mathsf{E}(f(X) - f(Y))| : f \in \mathcal{F}_s\}. \tag{1.4.20} \]
It can easily be seen that the functional $\zeta_s$ is a metric. For $s = 0$ the metric $\zeta_s$ is defined by the relation $\zeta_0 = \lim_{s \to 0} \zeta_s$. In this case the set $\mathcal{F}_s$ is transformed into the set of functions determined by the condition $|f(x) - f(y)| \le I(x \ne y)$. This condition precisely corresponds to the definition of the metric $\zeta(X, Y; d_1)$ from Corollary 1.3.2, where we have proved that this metric coincides with the metric $\sigma$. Therefore $\zeta_0 = \sigma$.

THEOREM 1.4.2. The metric $\zeta_s$, $s > 0$, defined by (1.4.20) is an ideal metric of order $s$. In the case of integer $s \ge 1$ the metric $\zeta_s$ is representable in the form (1.4.18).
PROOF. We introduce two families of transformations of functions from the set $\mathcal{F}_s$:
\[ (A_y f)(x) = f(x + y), \qquad (B_c f)(x) = c^{-s} f(cx), \]
where y e R and c > 0. It can easily be seen that both families of transformations do not bring the functions / e S-s out of this set. Moreover, they establish a one-to-one mapping of the set 3·s onto itself. Let Ζ be a random variable independent of X, Y. Then |E(/(X + Z) - f(Y + Z))| < J |E(AyfiX) - Ayf(Y))\ dFz(y). Hence ζ3(Χ
+ Z,Y
+ Z)
0. Let us prove the second part of the theorem. We have
\[ \mathsf{E}(f(X) - f(Y)) = \int f(x)\, d(F_X(x) - F_Y(x)). \tag{1.4.21} \]
We carry out the $s$-fold integration by parts on the right-hand side of (1.4.21) and obtain that
\[ |\mathsf{E}(f(X) - f(Y))| = \left| \int f^{(s)}(x)\, \mathcal{J}^{s-1}(F_X(x) - F_Y(x))\, dx \right| \le \int |f^{(s)}(x)|\, |\mathcal{J}^{s-1}(F_X(x) - F_Y(x))|\, dx. \tag{1.4.22} \]
The function $f$ belongs to $\mathcal{F}_s$, which means that $f^{(s-1)}$ satisfies the Lipschitz condition, and thus it has almost everywhere a derivative which does not exceed one in absolute value, i.e., $|f^{(s)}(x)| \le 1$. If we understand integral (1.4.22) in the Lebesgue sense, then this remark leads us to the estimate
\[ |\mathsf{E}(f(X) - f(Y))| \le \int |\mathcal{J}^{s-1}(F_X(x) - F_Y(x))|\, dx; \]
moreover, equality is attained at the function $f_0$, which is defined (up to an arbitrary polynomial $P_{s-1}(x)$ of degree $s - 1$) by the condition
\[ f_0^{(s)}(x) = \operatorname{sign} \mathcal{J}^{s-1}(F_X(x) - F_Y(x)). \]
Finally this means that
\[ \zeta_s(X, Y) = \int |\mathcal{J}^{s-1}(F_X(x) - F_Y(x))|\, dx. \tag{1.4.23} \]
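For $s = 1$ formula (1.4.23) reduces to $\int |F_X(x) - F_Y(x)|\,dx$, and the two properties of an ideal metric can be checked on small discrete examples. The sketch below is only an illustration under assumptions: distributions are dictionaries mapping integer values to exact probabilities, regularity is understood as $\zeta_1(X + Z, Y + Z) \le \zeta_1(X, Y)$ for $Z$ independent of $X$, $Y$, and homogeneity of order 1 as $\zeta_1(cX, cY) = c\,\zeta_1(X, Y)$ for $c > 0$. The helper names `zeta1`, `convolve` and `scale` are hypothetical.

```python
from fractions import Fraction


def zeta1(p, q):
    """Integral of |F_X - F_Y| for discrete laws p, q given as {value: probability}."""
    pts = sorted(set(p) | set(q))
    total, fx, fy = Fraction(0), Fraction(0), Fraction(0)
    for a, b in zip(pts, pts[1:]):
        fx += p.get(a, 0)      # mass of X accumulated up to and including a
        fy += q.get(a, 0)      # mass of Y accumulated up to and including a
        total += abs(fx - fy) * (b - a)
    return total


def convolve(p, r):
    """Law of the sum of two independent variables with laws p and r."""
    out = {}
    for x, px in p.items():
        for z, pz in r.items():
            out[x + z] = out.get(x + z, 0) + px * pz
    return out


def scale(p, c):
    """Law of c*X when X has law p."""
    return {c * x: w for x, w in p.items()}


half = Fraction(1, 2)
p = {0: half, 2: half}          # X
q = {1: Fraction(1)}            # Y
r = {0: half, 1: half}          # Z, independent of X and Y

print(zeta1(p, q))                              # distance between X and Y
print(zeta1(convolve(p, r), convolve(q, r)))    # distance after adding Z
print(zeta1(scale(p, 3), scale(q, 3)))          # distance after scaling by c = 3
```

In this example adding the independent summand $Z$ halves the distance, while scaling by 3 multiplies it by 3.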
Though the function $f_0$ can be unbounded and therefore not belong to $\mathcal{F}_s$, there always exists a sequence $f_n \in \mathcal{F}_s$, $f_n \to f_0$, $n \to \infty$, for which the integrals converge to integral (1.4.23). This argument should be kept in mind further on.

REMARK 1.4.1. The proof of (1.4.23) may leave the reader unsatisfied, since $f_0$ was reconstructed only up to the polynomial $P_{s-1}(x)$, and this requires some explanation; moreover, we met the same situation earlier while defining the metric $\zeta_s$. The point is that the class $\mathcal{F}_s$ is defined by a condition on the derivatives $f^{(m)}(x)$, which means that, together with every $f$, the function $f + P_m$ also belongs to the class $\mathcal{F}_s$ if $P_m$ is an arbitrary polynomial of degree $m$. The metric $\zeta_s(X, Y)$, which is formed by taking the supremum over the set $\mathcal{F}_s$, is bounded below by the quantity $|\mathsf{E}(f(X) - f(Y))|$ with any function $f \in \mathcal{F}_s$. In
particular, this function can be a polynomial $P_m(x)$ with arbitrary coefficients. Therefore,
\[ \zeta_s(X, Y) \ge \sup\left\{ \left| \sum_{k=1}^{m} a_k\, \mathsf{E}(X^k - Y^k) \right| : a_1, \dots, a_m \in \mathbf{R} \right\}. \]
But the lower estimate is not equal to infinity only in the case where condition (1.4.8) holds; i.e., as in all the cases examined above, this condition turns out to be necessary for the finiteness of the metric under consideration. A sufficient condition for the finiteness of $\zeta_s(X, Y)$ is obtained by adding the condition
\[ \nu_s(X, Y) = \int |x|^s\, |d(F_X(x) - F_Y(x))| < \infty \]
to condition (1.4.8). This fact will be obtained in the next section as a corollary of the upper bound (1.5.41) for the metric $\zeta_s$ expressed in terms of the metric $\nu_s$. In the case of integer $s$ the condition of finiteness of $\nu_s$ can be replaced by the less restrictive condition of finiteness of $\kappa_s$ (this metric appeared in Corollary 1.3.1). As in the case of the metric $\nu_s$, the sufficiency of the condition $\kappa_s < \infty$ is proved by constructing the corresponding estimate (1.5.42).

The semiadditivity of an ideal metric (1.4.4) can be extended to a random number of summands.

THEOREM 1.4.3. Let $X_1, X_2, \dots$, $Y_1, Y_2, \dots$ be sequences of independent random variables and $N$ be an integer-valued random variable independent of the random variables from both sequences. Then for any value of $s > 0$
\[ \zeta_s\left( \sum_{k=1}^{N} X_k,\ \sum_{k=1}^{N} Y_k \right) \le \sum_{k=1}^{\infty} P(N \ge k)\, \zeta_s(X_k, Y_k). \tag{1.4.24} \]
PROOF. With the help of the formula for conditional expectations we obtain
iit^AM-^r-MiPHP))·
After transferring the right-hand side of this equality to the analytical notation of the metric ζ8, we obtain the inequality / Ν
Ν
\
00
\ί=1
ί=1
/
η=1 οο 0, m be an integer and 0 < a < 1. Then for
anyX, Y (1.4.25) The proof of (1.4.25) is reduced to the verification of the fact that (Γ(1 + α)/ γ(1 + s))|x|s e After that the inequality becomes obvious. We should mention that (1.4.25) is non-trivial only in the case where the metric ζ8(Χ, Y) is finite, and this assumes that necessary condition (1.4.8) holds together with some additional assumption like (1.4.9), (1.5.41) or (1.5.42). 1.4.4. For sums of a random number of independent random variables which were considered in Theorem 1.4.3, inequality (1.4.25) gives the following estimate of the difference of the sth absolute moments: COROLLARY
Ν
Σ * ί=1
Ν Σ
γ
ΤΎ1 ί
ί=1
00
R=1
For ordinary $s$th moments with integer $s$ the same inequalities hold as (1.4.25) and the one just presented.

THEOREM 1.4.5. Let $s = m + \alpha > 0$, where $m$ is an integer and $0 < \alpha \le 1$. Let us consider random variables $X$, $Y$ for which $\zeta_s(X, Y) < \infty$, and a random variable $Z$ independent of them. If the distribution of $Z$ has an $r$ times differentiable density $p_Z(x)$ on $\mathbf{R}$, $r \ge 1$, such that $K_r = \operatorname{Var}(p_Z^{(r-1)}) < \infty$, then
\[ \zeta_s(X + Z, Y + Z) \le K_r\, \zeta_{s+r}(X, Y). \tag{1.4.26} \]
PROOF. The property $K_r < \infty$ allows us to construct a sequence of distributions with densities $p(x, n)$, $n \ge 1$, which possess the following properties: $p(x, n)$ has $r$ derivatives, moreover $p^{(r)}(x, n) = a_n\, p_Z^{(r)}(x)\, I(|x| < n)$, where $a_n$ is a positive constant, $p(x, n)$ is equal to zero outside of some interval (which, generally speaking, depends on $n$) and
\[ K_r(n) = \operatorname{Var}(p^{(r-1)}(x, n)) \to K_r, \quad n \to \infty. \]
Let $Z_n$ denote a random variable with distribution density $p(x, n)$ and independent of $X$, $Y$; consider the functions $f \in \mathcal{F}_s$ and introduce the functions
\[ f_n(x) = \mathsf{E} f(x + Z_n) = \int p(y - x, n)\, f(y)\, dy. \tag{1.4.27} \]
The function fn(x) has at least m + r derivatives in all points. Indeed, differentiate expression (1.4.27) r times and then, after the change of the variable χ — y = u, differentiate the expression ff
= (-l)rJp
i r
\u,n)f(x-
u)du
m times more by virtue of the fact that f has m continuous derivatives. Finally we obtain fnm+r)(x)
= (-l)r J /
( x - u)pM(u,n)du.
m )
(1.4.28)
It follows from (1.4.28) that for any x, y l/n m+r) (*) - f r % ) I * / l / m ) ( * - «> - fm)(y - «)| \p(r\u,n) I du 0} of continuous (within the natural metric of R) and bijective measurable mappings of R onto itself. We make the following assumptions concerning the family 2Ϊ: (1) The set of operators & forms a group which is isomorphic to the group of positive numbers with respect to the operation of summation, i.e., TcITcII = TcIcII for any positive c', c"; T\ is a unity (identical) operator; T~l = Tc-i is the operator inverse to Tc. (2) Any mapping Tc e & possesses the property of distributivity with respect to the operation of generalized summation ω, i.e., for any x,y e R Tc(xwy)i(Tcx)w(Tcy), where the equality = is treated in the sense of coincidence of the distributions of the variables which are standing on the left-hand and right-hand sides of the equality (recall that χ wy can be a random variable). (3) In the set 9C there exists an admissible set of random variables possessing the following property: For any sequence Xi,X%,... of independent and identically distributed random variables from 9t1" there exists a sequence of positive numbers C\,C2,... such that for any η > 1. Xx ivX 2 w ... w Xn ί TCnXx.
(1.4.29)
Property 3 is an analog of the definition of strictly stable distributions in the summation scheme. This property has nothing to do with the problem of defining or constructing ideal metrics in the scheme of generalized ^-summation of independent random variables. Its destination here is different—to make the choice of the family & natural for the operation w under consideration. Equality (1.4.29) shows that the mappings T " 1 are capable to compactify the sequence of growing ω-sums of independent random variables T~\X^...wXjiXi. The definition of ideal metric in the scheme of generalized summation will be given in a wider interpretation than the above one. With this aim in view we introduce the class Ψ = { v(x)} of the real functions on the semiaxis χ > 0, which possess the following properties: (a) ψ(χy) < ψ(χ)ψ(γ) for any x,y > 0;
(b) $\psi(x)$ does not decrease on the semiaxis $x > 0$;

(c) for any function $\psi$ there exists a non-negative integer $m$, called the index of $\psi(x)$ ($\operatorname{ind}\psi$), such that $\varphi(x) = \psi(x)x^{-m} \le x$ on the semiaxis $x \ge 1$, $\varphi(x) \ge x$ on the interval $0 < x < 1$, and $\varphi(+0) \ge 0$.

DEFINITION 1.4.4. A simple metric $\mu$ on $\mathfrak{X}$ is called ideal of order $\psi$ (with respect to the pair $(w, \mathfrak{T})$)$^{3}$ if it possesses the properties:

Regularity. For any random variables $X$, $Y$ and any random variable $Z$ independent of them, $\mu(X \,w\, Z,\ Y \,w\, Z) \le \mu(X, Y)$.

Homogeneity. For any $c > 0$ and any random variables $X$, $Y$, $\mu(T_cX, T_cY) \le \psi(c)\,\mu(X, Y)$.
1 The invariance of a metric με with respect to the ω-shift of the random variables Χ, Y by a constant in general case where the semigroup does not contain the inverse elements etc., is not a consequence of the regularity. However, if the inverse elements exist (i.e., the semigroup is a group), then the invariance property will take place. COROLLARY 1.4.6 (corollary of homogeneity). The inequality μ(Χ,Υ)
0, is taken as the normalizing transformation. Equality (1.4.41) shows how the operation ο should be determined on the space Μ f of measures concentrated at the nonnegative semiaxis. The solution of equation (1.4.34) is the distribution function of the measure from M* of the form Hs(x) = max(0,x (s) ). Therefore
The ideal metric is formed on the basis of (1.4.35) with the help of the metric $\gamma = \sigma$:

$$\mu(X, Y) = \sigma(H_s \circ F_X,\ H_s \circ F_Y).$$
1.5. Relationships between metrics

We consider below different types of relationships between metrics: inequalities, representations of metrics in a new analytical form, and topological relations. We begin with the latter.

1.5.1. Topological relations

DEFINITION 1.5.1. The sequence of random variables $\{X_n\}$ from $\mathfrak{X}$ is called $\mu$-convergent to the random variable $X_0$ from $\mathfrak{X}$ if $\mu(X_n, X_0) \to 0$ as $n \to \infty$.
Comparing the topological possibilities of two metrics $\mu_1$ and $\mu_2$, we say that the metric $\mu_1$ is no stronger than $\mu_2$ (or, which is the same, $\mu_2$ is no weaker than $\mu_1$), and write $\mu_1 \preceq \mu_2$, if any $\mu_2$-convergent sequence is $\mu_1$-convergent as well. In the case where $\mu_1 \preceq \mu_2$ and there exist a sequence $\{X_n\}$ and $X_0$ from $\mathfrak{X}$ such that $\mu_1(X_n, X_0) \to 0$, $\mu_2(X_n, X_0) \not\to 0$, we say that the metric $\mu_1$ is strictly weaker than $\mu_2$ (or, which is the same, $\mu_2$ is strictly stronger than $\mu_1$), and write $\mu_1 \prec \mu_2$. We call metrics $\mu_1$, $\mu_2$ which possess the property $\mu_1 \preceq \mu_2$, $\mu_2 \preceq \mu_1$ equivalent and write $\mu_1 \sim \mu_2$. If one of the two relations, $\mu_1 \preceq \mu_2$ or $\mu_2 \preceq \mu_1$, can be established between the metrics $\mu_1$ and $\mu_2$ on $\mathfrak{X}$, then these metrics are called comparable; otherwise they are called incomparable.

DEFINITION 1.5.2. Let $\mu$ be some metric on $\mathfrak{X}$. The set $\mathcal{A} \subset \mathfrak{X}$ is called $\mu$-compact if from any sequence of random variables $X_n \in \mathcal{A}$, $n = 1, 2, \ldots$, it is possible to choose a subsequence $X_{n_j}$ which is $\mu$-convergent as $j \to \infty$ to some random variable $X_0 \in \mathfrak{X}$.
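As a hedged illustration of the relation 'strictly weaker', one may compare the Levy metric $L$ with the uniform metric $\rho$ on degenerate random variables. The example below is ours and is only a sketch: it assumes the customary definition of $\rho$ as the supremum of $|F_X - F_Y|$ and the characterization of $L$ through the inequalities $F_X(x - \varepsilon) - \varepsilon \le F_Y(x) \le F_X(x + \varepsilon) + \varepsilon$.

```latex
\begin{aligned}
&X_n \equiv 1/n,\quad X_0 \equiv 0,\qquad
  F_{X_n}(x) = \mathbf{1}\{x \ge 1/n\},\quad F_{X_0}(x) = \mathbf{1}\{x \ge 0\},\\
&\rho(X_n, X_0) = \sup_x |F_{X_n}(x) - F_{X_0}(x)| = 1
  \quad (\text{take } 0 \le x < 1/n),\\
&L(X_n, X_0) = 1/n \to 0 \quad (n \to \infty).
\end{aligned}
```

Since $L \le \rho$, every $\rho$-convergent sequence is $L$-convergent, so $L \preceq \rho$; the sequence above is $L$-convergent but not $\rho$-convergent, hence $L \prec \rho$.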
We present below a number of facts concerning the comparison of metrics. Some of them are well known; for the rest we refer to (Ganssler, 1971; Zolotarev, 1976; Zolotarev, 1979b; Senatov, 1977; Zolotarev, 1983b).

1. For the metrics from Examples 1.1.1-1.1.12 the following topological relations hold:
π -< κχ,
Χ -Κ i,
Χ -< Τχ.
(1.5.1) (1.5.2)
2. Let us consider two metrics $\mu_1$ and $\mu_2$ on $\mathfrak{X}$ which possess the property of complete identification, i.e., property 1 or property IS in the case of simple metrics $\mu_1$ and $\mu_2$.

PROPOSITION. Assume that the metrics $\mu_1$ and $\mu_2$ are comparable (say, $\mu_1 \preceq \mu_2$) within some set $\mathcal{A} \subset \mathfrak{X}$. Then any two of the three properties given below imply the third one:

(a) the set $\mathcal{A}$ is $\mu_1$-compact;
(b) the set $\mathcal{A}$ is $\mu_2$-compact;
(c) the metrics $\mu_1$ and $\mu_2$ are equivalent on $\mathcal{A}$.

This assertion implies the following criterion of convergence of random sequences. Let $\mu_1$ and $\mu_2$ be metrics possessing the property of complete identification. If $\mu_1 \preceq \mu_2$, then for the sequence $X_n$ to be $\mu_2$-convergent to $X_0$ as $n \to \infty$ it is necessary and sufficient that the following two conditions hold:

(a) $\mu_1(X_n, X_0) \to 0$, $n \to \infty$;
(b) the sequence $\{X_n\}$ forms a $\mu_2$-compact set.
3. Let $\mu_1 \preceq \mu_2$ be some metrics with complete identification on $\mathfrak{X}$. We assume that a set $\mathcal{A} \subset \mathfrak{X}$ is $\mu_1$-compact. Then there exists a non-negative function $\psi(x, \mathcal{A})$ defined on the semiaxis $x > 0$ such that $\psi(+0, \mathcal{A}) = 0$ and for any $X \in \mathfrak{X}$, $Y \in \mathcal{A}$ the inequality $\mu_1(X, Y)$
0 and put
1,
if χ > 0,
and FY(x) = 1 - Fx(-x). Then FY(0) = 1 - Fx(0) = 0, Fx(0) - Fy(0) = 1 and piX, Y) = 1. On the other hand, fxit) - fY(t) = (2πί / c){exp(—|ί|) - exp(-| a,y>4x/ß},
(1.5.6)
where
x[ZM(*) + W(y)]
We do not give the proof of this theorem. In spirit, it follows the path found by Berry and Esseen, and the details can be found in (Zolotarev, 1967a) (though for a narrower class of functions $Q$, a restriction which is inessential) or in (van Beek, 1972). In Chapter 5 we will return to this inequality, because it is the most suitable tool for solving the problem of estimating the absolute constant in the Berry-Esseen theorem. The questions of minimization of the function $I(x, y)$ and the choice of the function $Q$ will also be discussed there, as well as some others.

COROLLARY 1.5.1. A customary particular case of (1.5.6), the Berry-Esseen inequality, appears if we choose the function $(1 - \cos x)/(\pi x^2)$, for which$^{4}$

$^{4}$ The function $(x)_+ = \max(0, x)$.
q(t) = (1 — \t\)+, as Q, and assume that H2 is a distribution function (i.e., a monotone function from and Hi is an absolutely continuous function with the derivative bounded by d(H 1). In this case we have Bu1 = Bh2 = 0 , β = 0(0) = oo and D = ά{Ηχ) < oo, α = 1.7. If we fix any χ > a and denote χ
c 1 = ciOc) = — — [7-ΓΤ7, 2Λ(χ) — x||Q||
c 2 = c2(x) = A(x),
then for any y > 0 (1.5.6) yields p^.ff^icJl π Jo
\h1{t)-h2(t)\-+c2-\.
(1.5.8)
t
y )
We should mention one particular case of (1.5.8), which is obtained if we first let $y$ tend to $\infty$ and then let $x$ tend to $\infty$:

$$\rho(H_1, H_2) \le \frac{1}{\pi}\int_0^\infty |h_1(t) - h_2(t)|\,\frac{dt}{t}. \tag{1.5.9}$$
The metric $\rho$ is one of a few metrics invariant with respect to strictly monotone transformations $H\colon \mathbb{R} \to \mathbb{R}$, i.e.,

$$\rho(H(X), H(Y)) = \rho(X, Y). \tag{1.5.10}$$
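The invariance (1.5.10) can be checked in a few lines. The sketch below is ours and covers only the simplest case, under the assumptions that $\rho$ is the uniform metric $\sup_x |F_X(x) - F_Y(x)|$ and that $H$ is a continuous, strictly increasing map of $\mathbb{R}$ onto itself; the decreasing case is similar but involves left limits of the distribution functions.

```latex
\begin{aligned}
F_{H(X)}(H(x)) &= \mathsf{P}(H(X) \le H(x)) = \mathsf{P}(X \le x) = F_X(x),\\
\rho(H(X), H(Y)) &= \sup_{y \in \mathbb{R}} |F_{H(X)}(y) - F_{H(Y)}(y)|
  = \sup_{x \in \mathbb{R}} |F_{H(X)}(H(x)) - F_{H(Y)}(H(x))|\\
  &= \sup_{x \in \mathbb{R}} |F_X(x) - F_Y(x)| = \rho(X, Y).
\end{aligned}
```

The second equality uses that $y = H(x)$ runs over all of $\mathbb{R}$ when $x$ does.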
It is still the only known metric$^{5}$ which possesses, together with (1.5.10), the property of arbitrarily close $\rho$-approximation of continuous distributions by discrete ones.

1.5.3. Levy metric

Along with (1.1.8), the metric $L$ can be defined by the equality

$$L(X, Y) = \max(\Delta(X, Y), \Delta(Y, X)), \tag{1.5.11}$$

where

$$\Delta(X, Y) = \max(\min(y - x,\ F_X(x) - F_Y(y)) : x, y \in \mathbb{R}).$$

From the geometrical point of view, (1.5.11) means that $L$ is equal to the side of the largest square which can be placed between the graphs of the functions $F_X(x)$ and $F_Y(x)$ (the coincidence of the graph points and the boundary points of the square is admitted). The definition of the metric $L$ is easily extended from the set $\mathfrak{F}$ of distribution functions to the set of all possible non-decreasing functions on $\mathbb{R}$.

$^{5}$ Of course, if we do not take into consideration 'pathological' metrics (see (Ignatov and Rachev, 1986)) and the metrics which are obtained from $\rho$ by means of the transformation $\psi(\rho)$ with the help of a function $\psi$ that is non-decreasing and concave on the non-negative semiaxis.
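The geometric description above suggests a simple numerical procedure. The following is a minimal sketch, not the author's method: it approximates $L$ by bisection on $\varepsilon$, assuming the characterization (1.5.12), $F_X(x - \varepsilon) - \varepsilon \le F_Y(x) \le F_X(x + \varepsilon) + \varepsilon$, and checking it only on a finite grid. The names `levy_distance`, `step_cdf` and the grid are hypothetical choices made for illustration.

```python
def levy_distance(cdf_x, cdf_y, grid, tol=1e-6):
    """Approximate the Levy distance by bisection on eps.

    Uses the criterion: eps is admissible iff
    F_X(x - eps) - eps <= F_Y(x) <= F_X(x + eps) + eps for all x.
    The criterion is checked on the finite grid only, so the result
    is accurate up to roughly the grid step.
    """
    fy = [cdf_y(x) for x in grid]

    def admissible(eps):
        return all(
            cdf_x(x - eps) - eps <= f <= cdf_x(x + eps) + eps
            for x, f in zip(grid, fy)
        )

    lo, hi = 0.0, 1.0  # eps = 1 is always admissible for distribution functions
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if admissible(mid):
            hi = mid
        else:
            lo = mid
    return hi


def step_cdf(a):
    """Distribution function of the constant a (assumed right-continuous)."""
    return lambda x: 1.0 if x >= a else 0.0


# Two degenerate distributions, at 0 and at 0.25 (an illustrative choice).
grid = [-1.0 + 0.001 * k for k in range(3001)]
levy = levy_distance(step_cdf(0.0), step_cdf(0.25), grid)
uniform = max(abs(step_cdf(0.0)(x) - step_cdf(0.25)(x)) for x in grid)
```

For these two point masses the largest square between the graphs has side 0.25, so `levy` should be close to 0.25 (up to the grid step), while the uniform distance on the grid equals 1.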
The Levy metric, which appeared in probability theory as long ago as the uniform metric but did not become popular at that time, is closely connected with the Hausdorff distance between sets on a plane. The latter is defined with respect to some metric $d(a, b)$ on the plane as follows. For any sets $A, B \subset \mathbb{R}^2$,

$$H(A, B) = \max\{S(A, B), S(B, A)\},$$

where $S(A, B) = \sup\{\inf(d(a, b) : b \in B) : a \in A\}$. If we consider the graphs of the functions $F_X$ and $F_Y$ as sets of points on the plane, and take the Minkowski distance $d(a, b) = \max(|x_1 - x_2|, |y_1 - y_2|)$, $a = (x_1, y_1)$, $b = (x_2, y_2)$, as $d(a, b)$, then $H(F_X, F_Y) = L(F_X, F_Y)$. This interesting fact was mentioned in (Penkov and Sendov, 1962), but it is quite probable that Levy came to his definition of the $L$ metric through the Hausdorff distance as well, though there are no direct indications of this in Levy's works.

LEMMA 1.5.1. The metric $L$ is a regular metric with respect to the operations $X + Y$ and $\max(X, Y)$.

PROOF. First of all we note that definition (1.1.8) of the metric $L$ implies the equivalence of the inequalities

$$L(X, Y) \le \varepsilon \iff F_X(x - \varepsilon) - \varepsilon \le F_Y(x) \le F_X(x + \varepsilon) + \varepsilon, \quad x \in \mathbb{R}. \tag{1.5.12}$$

Replace $x$ by $x - y$ in (1.5.12), multiply all parts of the inequalities by $dF_Z(y)$ and integrate over all $y \in \mathbb{R}$. As a result, for any $x \in \mathbb{R}$

$$(F_X * F_Z)(x - \varepsilon) - \varepsilon \le (F_Y * F_Z)(x) \le (F_X * F_Z)(x + \varepsilon) + \varepsilon.$$

Hence $L(F_X * F_Z, F_Y * F_Z) \le \varepsilon$. Now, letting $\varepsilon \to L(X, Y)$, we obtain (1.5.13)
L(X + Z,Y + Z) L): Fx(x) - FY(x) = Fx(x) - FY(x + ε) + FY(x + ε) - FY(x) < ε + QU, Υ), after which we pass to the limit as ε -» L(X, Y). Relation (1.5.15) becomes clear if in definition (1.1.8) of the metric L or in its equivalent (1.5.12) we replaced, Y by cX, cY, respectively, and then pass to the limit as c -» oo. The first one of inequalities (1.5.16) shows that L{cX,cY) approaches p(X, YO staying less than it. THEOREM 1.5.2. Let Χ, Y be some random variables with characteristic
functions $f_X$, $f_Y$, and let $T > 0$. Then

$$L(X, Y) \le \frac{1}{\pi}\int_0^T |f_X(t) - f_Y(t)|\,\frac{dt}{t} + 5.66\,\frac{\log(1 + T)}{T}. \tag{1.5.18}$$
REMARK 1.5.1. At one time there was hope that one could get rid of the logarithmic factor in the second summand of estimate (1.5.18). This hope was based on some particular results. For instance, Yu. Khokhlov and S. Arkhipov (Tver University) obtained such a refinement under the additional assumption that the graphs of the distribution functions $F_X$, $F_Y$ (complemented by the lines of jumps) have at most two points of intersection. However, in May 1985 V. Popov (Institute of Mathematics of the Bulgarian Academy of Sciences) reported at the International Conference on stability problems of stochastic models in Varna that it is impossible to get rid of the multiplier $\ln(1 + T)$ in a general situation$^{6}$.

The proof of the theorem uses the following assertion.

LEMMA 1.5.2. For any random variables $X$, $Y$ independent of $Z$ and any $\delta > 0$ the following inequality (1.5.19)
L(X, Y) L(X + Z,Y + Z) FY(x) < (FY * Fz)(x + δ ) + 1 - ί ζ ( δ ) < (Fx * Fz)(x + δ + ε) + ε + 1 -
Fz(5)
< Fx(x + 2δ + ε) + 1 - Fz(ö) + Fzi-δ)
+ ε.
The lower bound is obtained in a similar way: FY(x) > Fx(x - 2δ - ε) - 1 + Fz(ö) - Fz(-ö)
- ε.
(1.5.20)
According to the definition of the metric $L$, this means that $L(X, Y) \le \xi_Z(\delta) + \varepsilon$. The passage to the limit as $\varepsilon \to L(X + Z, Y + Z)$ gives the desired inequality.

PROOF OF THE THEOREM. In view of (1.5.19), the first inequality (1.5.16) and (1.5.9) we have

$$L(X, Y) \le L(X + Z, Y + Z) + \xi_Z(\delta) \le \rho(X + Z, Y + Z) + \xi_Z(\delta) \le \frac{1}{\pi}\int_0^\infty |f_X - f_Y| \cdot |f_Z|\,\frac{dt}{t} + \xi_Z(\delta). \tag{1.5.21}$$

$^{6}$ See Stability Problems for Stochastic Models, Lecture Notes in Mathematics, 1233, pp. 114-124.
Let $T > 9$ be fixed. We choose as $Z$ in (1.5.21) the normally distributed random variable with zero mean and variance $\sigma^2 = 4T^{-2}\log(1 + T)$, and put $\delta = 2\sqrt{2}\,T^{-1}\log(1 + T)$. In this case, as can be shown by elementary calculations,
therefore ξζ{δ) = 2 9 into account, we see that the right-hand side of this inequality does not exceed log(l + Τ) / (80πΤ). The presented estimates yield the inequality LiX,YulJQT\fx-fY\di+(4V2
+
1 \ log(l + T) ± )
The value $4\sqrt{2} + 1/(80\pi) \approx 5.66$ guarantees that the second summand of the inequality proved for $T > 9$ remains greater than 1 for all $0 < T < 9$.
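The closing remark can be verified numerically. The sketch below is ours: it evaluates the constant $4\sqrt{2} + 1/(80\pi)$ and the second summand $c\,\log(1 + T)/T$ on a grid of $T \in (0, 9]$. Since the Levy metric never exceeds 1, a summand greater than 1 makes the estimate trivially true on that range. The function name and the grid are hypothetical.

```python
import math

# Constant of the second summand, as stated in the text.
c = 4 * math.sqrt(2) + 1 / (80 * math.pi)


def second_summand(t):
    """Second summand c * log(1 + T) / T of the estimate."""
    return c * math.log(1 + t) / t


# log(1 + T) / T decreases for T > 0, so on (0, 9] the smallest value
# is attained at T = 9; the grid below simply confirms this numerically.
values = [second_summand(0.01 * k) for k in range(1, 901)]
smallest = min(values)
```

Here `smallest` is the value at $T = 9$, and it exceeds 1, in agreement with the remark.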
1.5.4. Distance of total variation

The metric $\sigma$ defined by (1.1.2) is one of the strongest metrics used in probability theory. As was mentioned in Section 1.4, $\sigma$ is an ideal metric of order zero with respect to any pair $(w, \mathfrak{T})$, and in this sense it is unique. Among other metrics related to $\sigma$ we mention the weighted distances of total variation, which are known as the $s$th absolute pseudomoments, $s > 0$:

$$\nu_s(X, Y) = \int |x|^s\,|d(F_X(x) - F_Y(x))| = \int |x|^s\,|P_X(dx) - P_Y(dx)|$$
= Vi(X 0, is written as the sth absolute moment. Hence for any s > r > 0 we have ( i v r ( X , F ) ) 1 / r < (iv s (X,Y)) 1 / S . Other ways of specifying the properties of moment type for the absolute pseudomoments are also possible. Let r > 0, s > 0 and ms = vs+r(X, Y) / vr(X, Y). The ratio ms for fixed r and Fx, Fy is an sth absolute moment of some distribution, which means that (logm s ) / s is a non-decreasing convex function on the semiaxis s > 0.
1.5.5. The Levy-Prokhorov metric

The metric $\pi$ defined by (1.1.9) is regular with respect to the operation of summation of random variables, but it is not homogeneous. Aside from (1.1.9), there is one more representation of the metric $\pi$, which is an analog of representation (1.5.11) of the metric $L$:

$$\pi(X, Y) = \inf\{\max(t, v(X, Y; t)) : t > 0\}, \tag{1.5.25}$$
where v{X, Y; t) = sup{P x (A) - Py(A f ): A e t)dx:
Y e
J
tends to zero as t —> oo. REMARK 1.5.3. The random variable X has to belong to the same set or which is determined by it, and it can be replaced by any random variable from the corresponding set. Note that by adding to any finite set from we cannot change the main property of the set S>*, that is, vanishing of the function 5s(t) as ί —> oo, but we can change its rate. Besides, the function 5s(t) does not depend on the random variable X in its definition. The sets defined in such a way give the possibility to answer the question in which situation the metrics π and ics are equivalent within the set Consider first the case s = 1. LEMMA 1.5.3. For any t> 0 and any Χ, Y e LHX, Υ) < κ\(Χ, Y) < 3 \ / 2 ( t + 2 ) L ( X , Υ) + δχ(ί).
(1.5.30)
PROOF. The geometric interpretation of the metric $L(X, Y)$ is the side of the largest square which can be placed between the graphs of the distribution functions $F_X$ and $F_Y$, and the metric $\kappa_1$ is interpreted as the area contained between these graphs (regardless of the sign of the difference $F_X - F_Y$), which is no less than the area $L^2$ of the largest square. Let us validate the second inequality. We split integral (1.1.4) into two parts:

$$\kappa_1(X, Y) = \int_{|x| \le t} |F_X - F_Y|\,dx + \int_{|x| > t} |F_X - F_Y|\,dx = J_1 + J_2.$$

The second integral does not exceed $\delta_1(t)$, and the first one is equal to the area between the graphs of $F_X$ and $F_Y$ on the interval $[-t, t]$. Take the graph of the function $F_Y$, and at the point $x = -t$ place the center of the square whose sides are parallel to the coordinate axes and are equal to $L$. After this, begin to move the center along the graph from $x = -t$ to the point $x = t$. The strip covered by the square contains the second graph, that of $F_X$, which follows from definition (1.1.8) of the metric $L$. Estimating the area of this strip from above, we obviously obtain an estimate of the integral $J_1$. The total length of the curve of the graph of the non-decreasing function $F_Y(x)$ varying from 0 to 1 cannot exceed $2t + 1$ on the interval under consideration. We split the curve of the graph (which includes the vertical sections corresponding to the jumps of $F_Y$) into $n$ equal parts, where $n \ge (2t + 1)/(L\sqrt{2}) > n - 1$. It is not difficult to see that, moving within each of these parts, whose length does not exceed the diagonal of the square, the square covers a strip with area no greater than $3L^2$. Hence

$$J_1 \le 3L^2 n \le 3\sqrt{2}\left(t + \frac{1}{2} + \frac{L}{\sqrt{2}}\right)L \le 3\sqrt{2}\,(t + 2)L.$$
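A worked example may help to see the first inequality of (1.5.30) at work. The sketch below is ours, with an illustrative choice of distributions; it assumes the geometric descriptions used in the proof, namely $\kappa_1$ as the area between the graphs and $L$ characterized by $F_X(x - \varepsilon) - \varepsilon \le F_Y(x) \le F_X(x + \varepsilon) + \varepsilon$.

```latex
\begin{aligned}
&X \sim \mathrm{U}[0, 1],\quad Y = X + a,\quad 0 < a < 1:
  \qquad F_Y(x) = F_X(x - a) \le F_X(x),\\
&\kappa_1(X, Y) = \int_{-\infty}^{\infty} |F_X(x) - F_Y(x)|\,dx
  = \mathsf{E}Y - \mathsf{E}X = a,\\
&F_X(x - \varepsilon) - \varepsilon \le F_X(x - a)\ \text{for all } x
  \iff \varepsilon \ge a/2,
  \qquad\text{so } L(X, Y) = a/2,\\
&L^2(X, Y) = a^2/4 \le a = \kappa_1(X, Y).
\end{aligned}
```

On the linear part of the graphs the left-hand condition reads $x - 2\varepsilon \le x - a$, which gives the threshold $\varepsilon = a/2$; the right-hand condition holds for every $\varepsilon \ge 0$ because $F_Y \le F_X$.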
COROLLARY 1.5.2. Let us consider some set defined above and the set defined by the condition Y e 3>* Y 0, which contradicts the assumption. THEOREM 1.5.4. Lets>
1; then for any $X$, $Y$

$$K^{1+s}(X, Y) \le 2^{s-1}\tau_s(X, Y), \tag{1.5.32}$$

where $K$ is the Ky Fan metric and $\tau_s(X, Y) = \mathsf{E}|X^{(s)} - Y^{(s)}|$ is the protominimal metric for the $\kappa_s$ metric;

$$\pi^{1+s}(X, Y) \le 2^{s-1}\kappa_s(X, Y). \tag{1.5.33}$$
PROOF. Let us find the minimum of a constant c such that for all x,y e R
|x-y| 0 and any Χ , Y we find that rs(X, Y ) > E|X ε) > 21~SE\X - Y|S7(|X - Y| > ε) > 2 1 - S £ S P ( | X - Y | >ε). We choose ε so that P(|X - Y| > ε) > ε; then xs{X, Υ) > 2 1 _ 5 ε 1 + 5 . In accordance with definition (1.1.6) of the Ky Fan metric, we have Υ ) > ε; moreover, ε can be arbitrarily close to the metric Jif. Therefore if we let ε tend to Jif(X, Y), we arrive at (1.5.32). Inequality (1.5.33) is obtained from (1.5.32) by means of the passage to the minimal metrics ts = KS and Μ = it, which is possible due to the particular case 2 of Theorem 1.2.4 (inequalities (1.2.10), (1.2.11)).
THEOREM 1.5.5. Let s > r > 1. For any Χ, Y (±τ Γ (Χ, Y)) ^ < (±τ 5 (Χ, Y)) ^ ,
(1.5.35)
(|κ>(Χ, Υ)) 1 / Γ
0 by the conditions: k(t) = 1 if t < aT, k(t) = 0 if t > Τ and kit) = (1 - t / Τ) / (1 - a) if aT < t < T. The derivative of this function k'(t) exists everywhere with the exception of the four points, and is an odd function which is equal to zero everywhere on the positive semiaxis except for the interval aT < t < Τ where k'(t) = - 1 / ((1 - a)T). According to (1.5.24), PROOF.
f°° 1+a W( 1 - k) < / f|dA'(f)| = . Jo
1 — α
Further, iz = izk + iz( 1 — k), hence using the property of semiadditivity of the functional W we find W{iz) < Wiizk) + W(iz( 1 - k)) = W1 + W2. Put u(t) =
ri/t,
if |fI > aT,
Mt),
if |i| < aT,
where v(t) is some continuous function on the interval |i| < aT with bounded total variation and v(±aT) = ±i!T. Denote the set of all functions with these properties by Ψ . Beurling (Beurling, 1938) proved that min{W(u): ν e Ψ } = •£= = W(u0), ZTCLL
i.e., this minimum is attained at some function vo of the type specified. Therefore, using the fact that 1 — k(t) = 0 in the interval |f| < aT, we can write iz( 1 — k) = Δ«ο(1 — k); hence in view of semimultiplicativity of W we obtain the estimate W2 < W(uo)W(l - Jfe)W(Ai) < π
+ a Var(F FY) x 2 aT 1 — a
1+α
The summand Wi is estimated by means of (1.5.23): 1/2
Wi
4
oi,
Wk>
θ2·
(2.2.15)
For the sake of definiteness it is worth mentioning that the convergence of distributions here and elsewhere is being considered in the Levy metric, i.e., it is weak.
Next we select a subsequence {k"} from the sequence {A'} such that lim Μ Λ " ) =
fei,
lim b2(k") - b2
(2.2.16)
exist (as k" —> oo). Obviously b\ + b2 = 1. Further, due to (2.2.15), we have (Uka *
Φ(* — αχ — 02)
and at the same time 0 k „ * Wik»)Cx) -> (tf * W)(x). Therefore (£7 * W)(x) = Φ(χ — αχ — and b\ - a f , b2 - a\. The distribution U is non-degenerate by virtue of the fact that σ\ > θ3 / 4, which follows from inequalities (2.2.7) and (2.2.13). Drawing on Cramer's theorem on the expansion of the normal law, we find that
On the other hand, due to (2.2.15) and (2.2.16), we have
From this we can conclude that as k" —> oo UUkn, Vk„) = L(Ük", Vk·') < L(Ük,r, U) + uu,
Vk„)
0.
But this contradicts the original assumption (2.2.13). PROOF OF THE THEOREM. SUFFICIENCY. L e t
^
= * Fnj, a„
Φ η ο ΔΛ
8
8
«(δ) < ^ 2 < ^ 2 < ε·
This completes the proof of the necessity of condition 2.

Looking over the proof of the theorem, one might note that the specific choice of the set $\mathfrak{A}_n$ is not essential. In fact, other sets could have been used. In particular, such a set would be $\{j : a_{nj} < \gamma_n\}$, where $\gamma_n$ is a sequence of positive numbers approaching zero. It is true that then we would have had to require that $\Delta_n(\delta) \to 0$ for any sufficiently slowly decreasing sequence $\gamma_n \to 0$. Of course, the presence in the formulation of the theorem of some set over which the summation is taken makes it more complicated, and it would be desirable to free oneself somehow of this peculiarity, which is not natural to the Lindeberg-Feller theorem. It turns out that such a form of the theorem exists. From our point of view, it is more satisfying than the one cited above.

THEOREM 2.2.2 (second version). $L(F_n, \Phi) \to 0$ as $n \to \infty$ if and only if for each positive $\delta$ the following conditions are satisfied:
(1) an = supL(Fnj,nj) -> 0, j (2) Λ„(δ) = W x2d(Fnj - Φη]) j ·Ί*Ι 0 it follows that Λ„(δ) —> 0, Λ η (δ) = Σ
+
Σ = < + < ·
Since the variances of the distributions Fnj and Φ„; coincide,
S
~ Fnj) = Σ
a„
=Τ
/
χ2(1Φν
~
Αη(δ),
α*. / χ2άΦ < f χ2άΦ(χ). J\x\>S!a,y J\x\iSiy/ä^
Hence Κ| 0 such that

$$\rho(F_n, \Phi) \le C\,\frac{\beta_3}{\sigma^3\sqrt{n}}, \quad n \ge 1. \tag{2.3.1}$$

It was also demonstrated that if the $X_j$ are not identically distributed, then the right-hand side of (2.3.1) transforms into $CM_n/B_n^3$, where $M_n = \beta_1 + \ldots + \beta_n$, $\beta_j = \mathsf{E}|X_j|^3$, $j \ge 1$. Further efforts within this problem were connected with attempts to obtain the exact value of the absolute constant $C$ at the Lyapunov fraction in the right-hand side of (2.3.1), or at least good estimates of it. It is now known (Shiganov, 1986) that $C \le 0.7915$ in the general situation and $C \le 0.7655$ for identically distributed summands. We will return to the Berry-Esseen inequality in Section 5.4. However, it was realized that the structure of the Berry-Esseen estimate (2.3.1) is rather unsatisfactory. First, it can reasonably indicate the value of the error only for very large $n$, and second, no lower estimates of $\rho(F_n, \Phi)$ of the same order are known.
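To get a feeling for how conservative the estimate (2.3.1) can be, one may compare it with the exact uniform distance in a case where the latter is computable. The sketch below is ours: it assumes (2.3.1) in its customary form $\rho(F_n, \Phi) \le C\beta_3/(\sigma^3\sqrt{n})$ with $\beta_3$ the third absolute central moment, takes Bernoulli summands as an illustrative choice, and uses the value $C = 0.7655$ quoted above. All function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def uniform_distance_binomial(n, p):
    """sup_x |F_n(x) - Phi(x)| for the standardized sum of n Bernoulli(p) terms.

    F_n is a step function and Phi is continuous and increasing, so the
    supremum is attained at a jump point, either at the value of F_n
    there or at its left limit.
    """
    q = 1.0 - p
    mean, std = n * p, math.sqrt(n * p * q)
    dist, cdf = 0.0, 0.0
    for k in range(n + 1):
        phi = normal_cdf((k - mean) / std)
        dist = max(dist, abs(cdf - phi))   # left limit of F_n at the jump
        cdf += math.comb(n, k) * p**k * q**(n - k)
        dist = max(dist, abs(cdf - phi))   # value of F_n at the jump
    return dist


def berry_esseen_bound(n, p, c=0.7655):
    """Right-hand side of (2.3.1) for Bernoulli(p) summands (assumed form)."""
    q = 1.0 - p
    beta3 = p * q * (p * p + q * q)        # E|X - p|^3 for X ~ Bernoulli(p)
    sigma = math.sqrt(p * q)
    return c * beta3 / (sigma**3 * math.sqrt(n))


n, p = 50, 0.3
distance = uniform_distance_binomial(n, p)
bound = berry_esseen_bound(n, p)
```

Comparing `distance` with `bound` for several `n` and `p` shows the gap between the true error and the estimate for moderate sample sizes, which is the deficiency the text refers to.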
It is evident that the best convergence rate estimates should have the form ν/ι( 0 is sufficient for the central limit theorem while Vn -» 0 is necessary. From (2.3.3) and (2.3.4) it obviously follows that Ψι(ν'η) < ψ2{νΙ) that is, v^' —» 0 => v'n —> 0 as η oo. The latter inequality implies that < νr1^«)) where ψ^ 1 («) is the function inverse to i^i(u). A little later we will see that this inequality means that the metric v" is *heavier' than v^. However, it turns out that the upper estimates (2.3.3) and lower estimates (2.3.4) using one and the same functional, i.e., with v^ = v" can hardly be obtained. Therefore the problem of construction of the estimates of type (2.3.3) and (2.3.4) with different v^ and v" is reasonable. Among all estimates of this type of most interest are those which use equivalent metrics Vn ~ v^' (that is, v^ —» 0 v^' -> 0 as η oo) because they meet the requirements we discussed above. Therefore we will separate the estimates of type (2.3.3) and (2.3.4) which use equivalent metrics v'n ~ v" and call them natural.
2.3.3. Upper estimate of the rate of convergence in the generalized Lindeberg-Feller theorem

The aim of this subsection is the construction of an estimate of the form $\rho(F_n, \Phi) \le \psi(\mu_n)$, where $\psi(u)$ is a nonnegative non-decreasing function defined for $u > 0$ such that $\psi(+0) = 0$.

THEOREM 2.3.2.
condition
(2.3.2)
Let the general conditions of the Bavli-Khinchin model except hold. Then p(Fn, Φ) 1.
(2.3.5)
PROOF. We use the well-known Berry-Esseen inequality: if $f_n$ and $g$ are the characteristic functions of the distributions $F_n$ and $\Phi$, then for any $T > 0$

$$\rho(F_n, \Phi) \le \frac{2}{\pi}\int_0^T |f_n(t) - g(t)|\,\frac{dt}{t} + \frac{24}{\pi T\sqrt{2\pi}}. \tag{2.3.6}$$
In the definition of the functional μ„ the main roles are played by the sum η . /(ε) = 2Σ / Μ I K j i x ) ~ Φη/(*)| dx j= 1 J\X\>E and ε 2 . Since as ε grows from 0 to oo, ε 2 continuously increases and /(ε) continuously decreases, then the equation ε 2 = /(ε) has exactly one solution εο for which μ = εο· To estimate p(Fn, Φ) with the help of (2.3.6) we should estimate |/n(f)—g(i)| where fn(t) = Uj fnj(t), g(t) = Il/g/yW = e x p ( - i 2 / 2). Choose ε so that η . ε2 = 2 Σ / kl IFnjix) j = i ·>\Α>ε
dx
With the account of (2.2.1) we have \fn(t)~g(t)\ ϊ Σ j a
-
~' Σ J
1
-
itX\ I
-
r
*n/x)\dx a
< * ϊ Σ [ * j J
*
/ \βί'Χ ~
K - < b * < * ) \ d x
m i n
( μ Σ y j
Γ x2\Fnj(x) J-ε
- Φη/χ)\άχ
= V Σ 2 j
+ 4 Σ j
+ f ( / V-Zkls« J1*1
f \x\\Fnj(x) - Φη/χ)\άχ J\x\ ψ(μη)
may not exist. This can be due to the fact that the functional μη is too heavy for such an estimate and we should look for some functional vn which is lighter than μη. Now we will try to explain what we mean by "heavy1 and 'light'. When we consider the functionals μη, λη, λ'η and similar to them, we can notice that all of them represent a distance between the sets of distribution functions αζ - (Fni,Fn2,...) and α® = (Φηι,Φη2. ·•·) i n the space of all such sets satisfying the general assumptions of the Bavli—Khinchin model (with the exception of condition (UN)). All of these metrics are mutually equivalent in the sense that μη(αζ,α*)^
0
Λ»(α£,α*) -> 0 λ > £ , α * ) ^ 0.
This is nothing else but the topological equivalence related to the space $\mathfrak{X} = \{X\}$ of random variables or to the space $\mathfrak{F} = \{F\}$ of distribution functions. We have already met with the notion of the topological strength of probability metrics. Namely, $\mu \sim \nu$, that is, the metrics $\mu$ and $\nu$ are topologically equivalent in $\mathfrak{X}$, if for any sequence of random variables $X_0, X_1, X_2, \ldots$ from the set $\mathfrak{X}$ we have

$$\mu(X_n, X_0) \to 0 \iff \nu(X_n, X_0) \to 0, \quad n \to \infty.$$

A metric $\mu$ is weaker than $\nu$ ($\mu \prec \nu$) if

$$\mu(X_n, X_0) \to 0 \Longleftarrow \nu(X_n, X_0) \to 0, \quad n \to \infty, \tag{2.3.7}$$

but not vice versa. (If only (2.3.7) holds, we say that $\mu$ is not stronger than $\nu$, $\mu \preceq \nu$.) It turns out that it is quite natural and reasonable to introduce an additional classification within the sets of equivalent metrics with the help of the concept of the weight of a metric.

$^{5}$ Actually, using the smoothing inequality given in Lemma 12.2 in (Bhattacharya and Ranga Rao, 1976), or inequality (5.4.37), or Theorem 1.6.1 (see Section 5.4) instead of the Berry-Esseen inequality (2.3.6), we can sharpen (2.3.5) by showing that the absolute constant can be reduced at least to 2. (Ed.)
So, let μ and ν be some equivalent metrics in SC and let {Xj : j > 1} and {Y, : j > 0} be two sequences of elements of SC. We say that ν is heavier than μ and μ is no heavier than ν (μ ν) if μ(Χη,Υη) —» 0 0 such that Κ = \ max I/„(f) - g(f)| = 1 / T 0 . |t|sJo This is guaranteed by the structure of the metric λ. Quite similarly we can conclude t h a t there exists Τι > 0 such t h a t Κ = X(logfn, logg(t)) = \ max | log/„(f) - log#(f)| = 1 / T 1 . |i|sTi Recall t h a t ^ ( i ) = e x p ( - f 2 / 2). It is quite evident t h a t if
> T0, then
But if Τι < To, then the lower estimate of λ^ through λ^ will have another form. From Lemma 2.3.3it follows that \ez - 1| > 2\z\ / π if \z\ < π / 2. Therefore putting ζ = log(/„(f) /g(f)) we obtain |/„(f) - g(t)I > -g(t)I log /„(f) - log£(f)| κ hence by virtue of the relation To > T\ it follows t h a t 2 2λη = max |/„(f) - #(f)| > max |/„(f) - g(t)\ > - max^(f)| log/„(f) - log^(f)| |i|sr0 |ί| ) ) J ./|u|>e
0.
(2.4.2)
REMARK 2.4.1. In particular, one can choose $b_{nj}^2 = \sigma_{nj}^2$. In this case Rotar (Rotar, 1975) showed that the union of conditions (i) and (ii) is equivalent to the following: for any $\varepsilon > 0$

$$\sum_j \int_{|u| > \varepsilon} |u|\,|F_{nj}(u) - \Phi_{nj}(u)|\,du \to 0. \tag{2.4.3}$$
In 1968, Macys (Macys, 1968) was able to extend Theorem A to the case of limit distributions in the Linnik class $I_0 \subset \mathfrak{G}$, and in 1969 the problem of constructing limit theorems without the (UN) condition was completely solved (see (Zolotarev, 1969; Zolotarev, 1970c)). Kruglov (Kruglov, 1971a; Kruglov, 1971b) succeeded in extending these results to the multivariate case.

The formulation of the main results of this theory involves certain additional notions and associated facts, plus a slight change in the statement of the problem. Let $G \in \mathfrak{F}$ and let $g$ be the corresponding characteristic function. We introduce

$$\Delta(G) = \sup\{k : \inf(|g(t)| : |t| \le k) > 0\}.$$

We point out two properties of $\Delta(G)$. If $G = G_1 * G_2 * \ldots$, where $G_j \in \mathfrak{F}$, then

$$\Delta(G) = \inf(\Delta(G_j) : j \ge 1). \tag{2.4.4}$$
If the sequence of distribution functions Fn weakly converges to G, then liminf (2.4.5) n-»oo A(Fn) > A(G). The second quantity we need is called the r-center of the distribution function G: C r (G) = - 3 l o g g { r ) , 0
where the function Fn (and similarly Gn) is constructed from the distribution functions Fnj by the rule Fn(u)
=
Σ
J
[ min(l, J — 00
w2)dFnj(w).
We remark that the uniform boundedness of the total variation of $F_n$ is a consequence of the relative compactness of the set $\{F_n\}$ and thus is a necessary condition of weak convergence of $F_n$ to $G$ (in this context see Chapter 6 of this book).

A basic interpretation of limit theorems as a kind of stability theorems is offered by the general model of stability of the so-called pure characterization problems. It amounts to the following. We consider two abstract spaces $\mathfrak{X} = \{x\}$, $\mathfrak{Y} = \{y\}$ in which subsets $\mathfrak{A} \subset \mathfrak{X}$ and $\mathfrak{B} \subset \mathfrak{Y}$ are given. Also given is a mapping $J\colon \mathfrak{X} \to \mathfrak{Y}$, which may or may not be one-to-one, but with the condition that the full preimage of $\mathfrak{Y}$ is $\mathfrak{X}$. The implication

$$(x \in \mathfrak{A},\ Jx \in \mathfrak{B}) \Longrightarrow (x \in \mathfrak{C} = \mathfrak{A} \cap J^{-1}\mathfrak{B}),$$

where $J^{-1}\mathfrak{B}$ is the full preimage of $\mathfrak{B}$, is called the pure problem of characterization of the set $\mathfrak{C}$ by the sets $\mathfrak{A}$ and $\mathfrak{B}$. This model can cover all characterization problems.

Let us further assume that in the spaces $\mathfrak{X}$ and $\mathfrak{Y}$ there are defined distances between the elements, $\mu(x, x')$ and $\nu(y, y')$. By a distance $\mu$ in $\mathfrak{X}$ (similarly, $\nu$ in $\mathfrak{Y}$) we mean any functional $\mu\colon \mathfrak{X} \times \mathfrak{X} \to [0, \infty]$ with the property that $\mu(x, x') = 0 \iff x = x'$. In particular, any metric in $\mathfrak{X}$ can play the role of $\mu$. By the distance between subsets $\mathfrak{D}$, $\mathfrak{D}'$ in $\mathfrak{X}$ we mean, as usual,

$$\mu(\mathfrak{D}, \mathfrak{D}') = \sup\{\inf(\mu(x, x') : x' \in \mathfrak{D}') : x \in \mathfrak{D}\}.$$

Let us put

$$\delta = \mu(x, \mathfrak{A}) + \nu(Jx, \mathfrak{B}), \qquad \lambda = \mu(x, \mathfrak{C}).$$

The pure problem of characterization of the set $\mathfrak{C}$ by the sets $\mathfrak{A}$ and $\mathfrak{B}$, equivalent to the assertion $\delta = 0 \iff \lambda = 0$, is said to be stable (with respect to the chosen distances $\mu$ and $\nu$) if

$$\delta \to 0 \iff \lambda \to 0. \tag{2.4.14}$$
Let us examine Theorem A from the viewpoint of this model. We introduce two spaces: $\mathfrak{Y} = \mathfrak{F}$ and the space $\mathfrak{X}$ whose elements are tuples $x = (V_1, V_2, \ldots)$, where $V_j \in \mathfrak{F}$, $V_1 * V_2 * \ldots \in \mathfrak{F}$ and furthermore

$$\int u\,dV_j(u) = 0, \qquad \sum_j \int u^2\,dV_j(u) = 1.$$

The mapping $J\colon \mathfrak{X} \to \mathfrak{Y}$ is defined by the equality $Jx = V_1 * V_2 * \cdots$, $\mathfrak{A} = \mathfrak{X}$, and the set $\mathfrak{B}$ consists of just the distribution function $\Phi$ of the standard normal law. For $\nu$ and $\mu$ we choose $\nu(G, G') = L(G, G')$, the Levy metric on $\mathfrak{F}$, and

$$\mu(x, x') = \sup\{L(V_j, V_j') : j \ge 1\} + L(V, V'),$$

where

$$V(u) = \sum_j \int_{-\infty}^{u} w^2\,dV_j(w)$$
and the similarly defined V'{u) are distribution functions. In the model under consideration the set to be characterized C = 21 η J - 1 ® = 17_1Q3 is the set of all possible collections of distribution functions (Φβ which the distribution function is decomposed into and which additionally have zero expectations and variances adding up to 1. According to Cramer's theorem, all Φ/ are distribution functions of normal distributions. Thus, we have δ = v(Jx, OO
(2.4.17)
for each u e R. The set {Φ„} is relatively compact. Indeed, if u > 0, then roo /
1 - Φ„(β) = £
r oo dOnjiw)
j J" roο < Σ
b
poo W d«K«;) Ju/b^j
= £ j =
Ju
Ju
W2 άΦ(ιν).
A similar inequality holds for $\Phi_n(u)$ for $u < 0$. Hence $1 - \Phi_n(u) + \Phi_n(-u) \to 0$ as $u \to \infty$ uniformly in $n$. In the case where $\{\Phi_n\}$ is relatively compact, (2.4.17) is equivalent to (see Section 1.5)

$$L(F_n, \Phi_n) \to 0, \quad n \to \infty.$$

Thus, we can represent the criterion of Theorem A in the following metric form: as $n \to \infty$,

$$\delta_n = L(F_n, \Phi) \to 0 \iff \lambda_n = \mu(x_n, z_n) = \sup\{L(F_{nj}, \Phi_{nj}) : j \ge 1\} + L(F_n, \Phi_n) \to 0. \tag{2.4.18}$$
Now let us assume that (2.4.16) does not hold, e.g. that 0 does not imply λ 0. Then there must exist h > 0 and a sequence x (n) = ( i ^ . F ^ , . . . ) e X such that δ (η) = L(F (n) ,Φ) 0, whereΙ* η ) = F(1n) * * ... ε 9 but λΜ =
j-i h > 0.
(2.4.19)
Since μ(χ(η\ J " 1 © ) < μ(χ(η\ζΜ) for any sequence z(n) = ( Φ ^ , Φ ^ , . . . ) e C = _1 J Q3, (2.4.19) contradicts (2.4.18). The violation of (2.4.16) in the other direction is treated similarly. REMARK 2.4.3. The distance μ in the construction of criterion (2.4.16) can be chosen in many ways just as one and the same topology of convergence of the distributions can be defined by structurally different metrics. In the case at hand, using property (2.4.3), instead of distance (2.4.15) we can take in (2.4.16) the distance μ(χ,χ')
$$= \min\Big\{\max\Big[h,\ \sum_j \int_{|u| > h} |u|\,|V_j(u) - V_j'(u)|\,du\Big] : h > 0\Big\}. \tag{2.4.20}$$
REMARK 2.4.4. As is easily seen, distances (2.4.15) and (2.4.20) are metrics in the space $\mathfrak{X}$.

Although Theorem A is only a special case of Theorem B, it has been singled out and specially analyzed so that the mechanism of the considered phenomenon is more evident. In the general situation both the 'plan of attack' and the basic reasoning remain the same. The differences are mostly technical.

We begin with a description of the pure characterization problem with whose stability Theorem B is associated. For each $r > 0$ let us define the set $\mathfrak{F}_r = \{G \in \mathfrak{F} : \Delta(G) > r\}$. If $r < s$, then obviously $\mathfrak{F}_r \supset \mathfrak{F}_s$, and the limit set for $\mathfrak{F}_r$ as $r \to 0$ is $\mathfrak{F}$, and as $r \to \infty$, the set of distribution functions $G$ for which the characteristic function $g(t) \ne 0$, $t \in \mathbb{R}$.

Further, let us fix any $r > 0$ and as $\mathfrak{Y}$ choose the set $\{G \in \mathfrak{F}_r : C_r(G) = 0\}$, and as $\mathfrak{X}$ the set of tuples of centered distribution functions $x = (V_1, V_2, \ldots)$, $C_r(V_j) = 0$, $j \ge 1$, such that $V_1 * V_2 * \ldots \in \mathfrak{F}_r$. The mapping of $\mathfrak{X}$ onto $\mathfrak{Y}$, as before, consists in the convolution of the components of $x$, i.e., $Jx = V_1 * V_2 * \ldots$ Further, we put $\mathfrak{A} = \mathfrak{X}$ and as $\mathfrak{B}$ we choose the set consisting of a single distribution function $G \in \mathfrak{Y}$.

The next step consists in the choice of the distances. As the distance $\nu$ on $\mathfrak{Y}$ we take the Levy metric $L$, and define the distance $\mu$ on $\mathfrak{X}$ as follows. Let $x = (V_1, V_2, \ldots)$ and $x' = (V_1', V_2', \ldots)$ be any tuples in $\mathfrak{X}$, $V' = Jx'$, and let $v_j$, $v_j'$ be the characteristic functions corresponding to the distribution functions $V_j$, $V_j'$. We introduce the functional
q(x,x';T)
= max · Σ(νβ)
- v'jit))
oo a.s. by the strong law of large numbers. Uniformity of the distribution of the vector Y on the sphere Σ„ is nothing else than the uniform distribution of its direction in R". The vector X/ y/n also possesses this property in an asymptotic sense. Of course, the passage from the vector Y to the vector X/ \fn violates the exact characterization conditions of Zinger's theorem, but these deviation levels asymptotically preserve, in the same asymptotic sense, the conclusion of closeness of the distribution of the vector X / y/n to the corresponding vector Ν / y/n with spherically symmetric normal distribution (the components Nj of the vector Ν = (Νι, ...,Nn) are independent and have standard normal distribution). Of course, these plausible considerations need more thorough confirmation, and we will do this now. Let us suppose additionally that E|Xi| ? < oo for some 2 < q < 3, and let us consider the distance between distributions of the vectors X / y/n and Ν / y/n in the sense of the metric Xq, which is defined in the following way. Let Χ', X" be random vectors in R" with characteristic functions f'(t), f "(t),
$t \in \mathbb{R}^n$, and let $p > 0$. We set
$$\lambda_p(X', X'') = \sup_t \|t\|^{-p}\,|f'(t) - f''(t)|.$$
We note the following properties of the functional $\lambda_p$ defined for pairs of $n$-dimensional distributions.
(1) The functional $\lambda_p$ is a metric in the space of all $n$-dimensional distributions; moreover, it is a metric in the broad sense of this term, since the value $\lambda_p = \infty$ is admissible. According to our terminology, $\lambda_p$ is a simple probability metric in the space $\mathfrak{X}_n$ of $n$-dimensional random vectors defined on the same probability space.
(2) For any positive $c$ and any pair of vectors $X', X'' \in \mathfrak{X}_n$, $\lambda_p(cX', cX'') = c^p \lambda_p(X', X'')$.
(3) If the vectors $X_1', X_1''$ are independent of the vectors $X_2', X_2''$, then $\lambda_p(X_1' + X_2', X_1'' + X_2'') \le \lambda_p(X_1', X_1'') + \lambda_p(X_2', X_2'')$.
(4) If $X', X'' \in \mathfrak{X}_{n-1}$ and we substitute these vectors into the metric $\lambda_p$ defined in the space $\mathfrak{X}_n$, we get the same metric defined in the space $\mathfrak{X}_{n-1}$.
(5) If $2 < q < 3$, then the coincidence of the expectations and covariance matrices of the vectors $X', X''$ and the finiteness of the moments $\mathsf{E}|X'|^q$, $\mathsf{E}|X''|^q$ imply $\lambda_q(X', X'') < \infty$.
Now we return to the distance $\lambda_q(X/\sqrt{n}, N/\sqrt{n})$ and represent the vectors $X$ and $N$ as sums of independent vectors belonging to one-dimensional subspaces of the space $\mathfrak{X}_n$: $X = X_1 + \dots + X_n$, $N = N_1 + \dots + N_n$,
X;=(5yXjJ 0, then choosing X = 0 we obtain L(0, Y / 6) 0 => μ(0, Y / 6) 0 as b oo, i.e., μ(0,Y) = 0 for any Y. Due to the triangle inequality, this means that μ(Χ,Υ)
= 0.
Nevertheless, the classical theory of limit theorems avoids this inconsistency: in the general case the summands $X_{nj}$ satisfy conditions which allow us to proceed without scale normalization, confining ourselves to shifts of the sums $Z_n$; and in the special
case where the growing sums are considered, and therefore it is necessary to use a scale normalization, only the variables $Y$ with absolutely continuous distributions appear as the $L$-limits for $\tilde Z_n = (Z_n - a_n)/b_n$, for which $L(\tilde Z_n, Y) \to 0 \iff \rho(\tilde Z_n, Y) \to 0$. But the metric $\rho$ is already invariant with respect to a scale transformation. It is clear that such an approach is rather universal, and in the theory of limit theorems for the generalized sums $Z_n$ the question about normalizing these sums can also be replaced with some special conditions upon the summands $X_{nj}$, but we should acknowledge that this is leaving the problem rather than solving it.
So, having accepted the concept of limit approximation of the transformed sums $\tilde Z_n$ by random variables $\tilde Y_n$ of a special form, and using for this some metric $\mu$ comparable with $L$, consider the peculiarities of each of the two possible cases $\mu \prec L$ and $\mu \succ L$.
If $\mu \succ L$, then the $\mu$-compactness of the set $\{\tilde Z_n\}$ implies the $L$-compactness of this set and, consequently, makes (3.1.3) impossible for $\tilde Z_n$. The criterion of $\mu$-rapprochement of $\tilde Z_n$ and $\tilde Y_n$ under the additional assumption of $\mu$-compactness of the set $\{\tilde Y_n\}$ (if $\tilde Y_n = Y$, then this requirement holds automatically) is constituted, in view of the criterion of equivalence (see Section 1.5), by the condition $L(\tilde Z_n, \tilde Y_n) \to 0$, $n \to \infty$, and by the minimal condition providing the $\mu$-compactness of the sequence $\{\tilde Z_n\}$.
Now let $\mu \prec L$. As in the previous case, we assume that the sequence $\{\tilde Y_n\}$ is $\mu$-compact, which implies that the $\mu$-compactness of the sequence $\{\tilde Z_n\}$ is a necessary condition of the $\mu$-rapprochement of $\tilde Z_n$ and $\tilde Y_n$. Unfortunately, these properties do not lead to the $L$-compactness of the sequence $\{\tilde Z_n\}$ and therefore do not guarantee the absence of effect (3.1.3) for $\tilde Z_n$. A natural strengthening of the above assumption, namely the replacement of the $\mu$-compactness of the sequence $\{\tilde Y_n\}$ by its $L$-compactness, does not help us here.
Nevertheless, we will try to get some orientation in the possible situations. Consider simple metrics $\mu(X, Y) = \mu(F_X, F_Y)$ which allow an extension from the set $\mathfrak{F}$ to the set $\mathfrak{M}_1$ of functions $M(x)$ non-decreasing on $\mathbb{R}$ with total variation $\|M\| \le 1$ and $M(-\infty) = 0$. We say that a metric $\mu$ induces on a set $A \subset \mathbb{R}$ the weak convergence in $\mathfrak{M}_1$ if for any sequence of distribution functions $F_1, F_2, \dots, F_n, \dots$ and some function $M \in \mathfrak{M}_1$ the property $\mu(F_n, M) \to 0$, $n \to \infty$, implies the convergence of $F_n(x)$ to $M(x)$ at every point of continuity of the function $M$ on the set $A$. There are two mutually exclusive possibilities.
(a) The set $A$ is uncountable and contains an uncountable part of itself both on the left-hand and on the right-hand semiaxes outside any bounded interval.
(b) The set $A$ is countable, or being uncountable it does not contain its
uncountable part in at least one of the semiaxes sufficiently far from the origin.
A partial answer to the question we are interested in is given by the following sufficient condition of the $L$-compactness of the sequence $\{\tilde Z_n\}$ which is $\mu$-approaching the sequence $\{\tilde Y_n\}$.
THEOREM 3.1.1. Assume that a metric $\mu$ induces the weak convergence on a set $A$ possessing property (a). If the sequence $\{\tilde Y_n\}$ forms a $\mu$-compact set, then the sequence $\{\tilde Z_n\}$ is an $L$-compact set.
PROOF. Assume that the sequence $\{\tilde Y_n\}$ possesses the stipulated property and $\mu(\tilde Z_n, \tilde Y_n) \to 0$, $n \to \infty$, but at the same time the sequence $\{\tilde Z_n\}$ is not an $L$-compact set. We choose from $\{\tilde Y_n\}$ some subsequence $\{\tilde Y_{n'}\}$ with the property $\mu(\tilde Y_{n'}, Y') \to 0$, $n' \to \infty$, where $Y'$ is some random variable. Further, let $\varepsilon > 0$ be some small number. In the set of the points of continuity of the distribution of $Y'$ we choose $\xi_1 > 0$ and $\xi_2 > 0$ belonging to the set $A$ such that
Ρ(-ξι 0;
G„Cx) = max(0,x ( 1 / 6 n ) ),
Gnhc) = min(0,x ( 1 / w ). After shifting the argument by a constant we obtain the transformations G„Gcw a n ) which are analogs of the linear transformation and are applied in the scheme of multiplication of random variables as successfully as the linear transformation in the summation scheme (Grigorevskii, 1 9 8 6 ) . If W is the semigroup operation of maximum, i.e., X W Y = max(x,y), then the solutions of equation ( 3 . 1 . 1 0 ) among the non-decreasing functions on R are all the functions of this class. EXAMPLE 3 . 1 . 5 .
EXAMPLE 3.1.6. Consider the stochastic semigroup operation defined by the equality ($g^{-1}$ is the function inverse to $g$)
$$X \,\omega\, Y = g^{-1}\bigl(\varepsilon' g(X) + \varepsilon'' g(Y)\bigr), \tag{3.1.11}$$
where $\varepsilon'$ and $\varepsilon''$ are random variables independent of $X$, $Y$ and of each other, distributed by one and the same fixed law, and $g(x)$ is a continuous function on $\mathbb{R}$, strictly increasing from $-\infty$ to $+\infty$. For this operation the transformation $G_n(x)$ satisfying (3.1.10) is the function
$$G_n(x) = g^{-1}\bigl(b_n g(x)\bigr), \qquad b_n > 0.$$
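The following Python sketch illustrates why a transformation of the form $G_n(x) = g^{-1}(b_n g(x))$ is compatible with operation (3.1.11). Equation (3.1.10) is not reproduced in this part of the text, so the sketch assumes that it asks $G_n$ to commute with the operation; the choice $g = \sinh$, the uniform law of $\varepsilon', \varepsilon''$ and all function names are illustrative assumptions only.

```python
import math
import random

def g(x):
    # Assumed example: continuous, strictly increasing from -inf to +inf.
    return math.sinh(x)

def g_inv(y):
    return math.asinh(y)

def op(x, y, e1, e2):
    """Operation (3.1.11) for fixed realizations e1, e2 of eps', eps''."""
    return g_inv(e1 * g(x) + e2 * g(y))

def G(x, b):
    """Candidate transformation G_n(x) = g^{-1}(b_n * g(x)) with b_n = b > 0."""
    return g_inv(b * g(x))

def max_commutation_gap(b, trials=1000, seed=0):
    """Largest observed |G(x op y) - (G(x) op G(y))| over random inputs."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e1, e2 = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
        left = G(op(x, y, e1, e2), b)
        right = op(G(x, b), G(y, b), e1, e2)
        gap = max(gap, abs(left - right))
    return gap

gap = max_commutation_gap(b=2.5)
print(gap)  # only rounding error is expected
```

The identity holds realization by realization because $g(G_n(x)) = b_n g(x)$, so the factor $b_n$ passes through the linear combination inside $g^{-1}$.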
Operation (3.1.11) is an extension of the notion of $\alpha$-convolution, a particular case of the Urbanik generalized convolutions (Urbanik, 1964). It should be especially mentioned that the transformations $G_n$ found as solutions of (3.1.10) do not necessarily $L$-compactify arbitrary sequences of sums $\{Z_n\}$ formed with the help of the operation $\omega$ under consideration. In Example 3.1.4 we met the situation where the linear transformation is not able to $L$-compactify the sequence of growing sums of independent identically distributed random variables, while such compactification was achieved by the power transformation. However, this example, which violates the recommendation to choose $G_n$ as a solution of (3.1.10), becomes an argument in its favor if we look at it from the other side. Namely, Darling (Darling, 1952) already paid attention to the following interesting phenomenon. Let us consider independent random variables $X_1, X_2, \dots$ with common distribution function $F_X$, and the ordinary and generalized sums of these variables
$$Z_n = X_1 + \dots + X_n, \qquad Y_n = X_1 \,\omega\, \dots \,\omega\, X_n = \max(X_1, \dots, X_n), \qquad n = 1, 2, \dots$$
It turns out that in the case where the 'tails' of the distribution $1 - F_X(x)$ and $F_X(-x)$ are slowly varying at infinity and are consistent with each other in a certain sense,
$$\frac{Z_n}{Y_n} \xrightarrow{d} 1, \qquad n \to \infty$$
(for more details concerning this phenomenon, see Section 5.5). Hence it follows that a s n - > oo, L(Z£ m ,Y)^> 0
L(Y< 1M) ,F)-> 0.
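The phenomenon noted by Darling can be observed numerically. The sketch below is only an illustration under simplifying assumptions: it takes nonnegative variables $X = \exp(1/U)$ with $U$ uniform on $(0, 1]$, so that $\mathsf{P}(X > x) = 1/\log x$ for $x \ge e$ is slowly varying, and it works with logarithms to avoid overflow. The exponential law is added as a contrasting case with a light tail; function names and sample sizes are arbitrary choices.

```python
import math
import random
import statistics

def ratio_slowly_varying(n, rng):
    """Z_n / Y_n for X = exp(1/U): P(X > x) = 1/log(x), x >= e."""
    logs = [1.0 / (1.0 - rng.random()) for _ in range(n)]  # log X_i, U in (0, 1]
    m = max(logs)                                          # log of the maximum
    # sum(X_i) / max(X_i) computed as sum(exp(log X_i - log max))
    return sum(math.exp(v - m) for v in logs)

def ratio_exponential(n, rng):
    """Z_n / Y_n for standard exponential summands (light tail)."""
    xs = [rng.expovariate(1.0) for _ in range(n)]
    return sum(xs) / max(xs)

rng = random.Random(12345)
n, trials = 1000, 200
med_sv = statistics.median(ratio_slowly_varying(n, rng) for _ in range(trials))
med_exp = statistics.median(ratio_exponential(n, rng) for _ in range(trials))
print(med_sv, med_exp)
```

In the slowly varying case the largest summand dominates the whole sum, so the typical ratio is practically equal to 1, whereas for the exponential law the sum is much larger than the maximum.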
The choice of the power transformation for the generalized sums Yn, as can be seen from Example 3.1.5, completely conforms to the recommendation, since the transformation Gn(x) = is a solution of equation (3.1.10) for the maximum operation. We agree upon the following abbreviated notation in addition to the symbol = introduced above which denotes the equality of distributions and to the symbol —» denoting the weak convergence of distributions. If X is some random variable, then X always denotes the variable X centered by its median, i.e., X=X-medX, s
and $X^s$ stands for the symmetrization of the variable $X$, i.e.,
$$X^s = X - X',$$
where X' is a random variable independent of X and possessing the same distribution. The presence of indices, asterisks etc. does not change the meaning of this notation.
3.2. The choice of constants under linear normalization

In the scheme of summation of independent random variables the linear transformations $\tilde Z_n = (Z_n - a_n)/b_n$ traditionally dominate. However, even with such a simple transformation we face a difficult question: how should we choose the constants $a_n$ and $b_n$ in order not only to provide the $L$-compactification of the sequence $\{\tilde Z_n\}$, but also to fit the construction of the limit theorems best? The situation where the variables $Z_n$ have finite second, and therefore first, moments is the simplest. In this case the constants in $G_n$ can be specified according to the following rule:
$$a_n = \mathsf{E} Z_n, \qquad b_n^2 = \mathsf{D} Z_n. \tag{3.2.1}$$
Under such a choice of $a_n$, $b_n$, in accordance with Chebyshev's inequality, for every $T > 0$ we have
$$\mathsf{P}\bigl(|\tilde Z_n| > T\bigr) \le T^{-2}, \qquad n \ge 1,$$
which obviously excludes effect (3.1.3) and yields the $L$-compactness of the sequence $\{\tilde Z_n\}$ (see the reference in the beginning of the preceding section). The constants of form (3.2.1) have one more advantage: they can be expressed via the corresponding numerical characteristics (expectations and variances) of the separate summands $X_{nj}$. At the same time, while being a reliable tool for normalizing the growing sums $Z_n$ (i.e., those in which the distributions of the summands $X_{nj}$ are independent of $n$), the constants $a_n$, $b_n$ of the form (3.2.1) sometimes turn out to be unsuitable in the general situation and cannot take into account the necessary shifts and contractions of the variables $Z_n$. Let us present one simple example.

EXAMPLE 3.2.1. Let $X_{n1}, \dots, X_{nn}$, $n = 1, 2, \dots$, be independent random variables of the form
$$X_{nj} = \begin{cases} n + 1/\sqrt{n} & \text{with probability } \tfrac12 (1 - 1/n^2), \\ n - 1/\sqrt{n} & \text{with probability } \tfrac12 (1 - 1/n^2), \\ -n(n^2 - 1) & \text{with probability } 1/n^2. \end{cases}$$
For these random variables we obviously have
$$a_n = n \mathsf{E} X_{n1} = 0, \qquad b_n^2 = n \mathsf{D} X_{n1} = (n^2 - 1)(n^3 + 1/n^2).$$
By means of the asymptotic analysis of the characteristic functions $f_n(t)$ of the random variables $\tilde Z_n$, it is not difficult to make sure that
$$f_n(t) \to 1 \iff \tilde Z_n \xrightarrow{d} 0$$
as $n \to \infty$, i.e., the universal normalization (3.2.1) does not lead to an informative limit approximation in the $L$ metric, though it provides the $L$-compactness of the sequence $\{\tilde Z_n\}$. On the other hand, the same asymptotic analysis of the characteristic functions $f_n^*$ of the random variables $Z_n^* = (Z_n - a_n^*)/b_n^*$ connected with the other choice of the constants, $a_n^* = n^2$ and $b_n^* = 1$, shows that, as $n \to \infty$,
$$f_n^*(t) \to \exp(-t^2/2) \iff Z_n^* = Z_n - n^2 \xrightarrow{d} N,$$
where $N$ is a random variable with the standard normal distribution. It is not difficult to understand the behavior of the distributions of the sums $\tilde Z_n$ and $Z_n^*$. Let us consider independent random variables $V_{n1}, \dots, V_{nn}$, $n = 1, 2, \dots$, which unlike $X_{nj}$ do not take negative values, i.e., $V_{nj} = n - 1/\sqrt{n}$ or $n + 1/\sqrt{n}$ with probability $1/2$ each. It is easily verified that
$$V_n - n^2 = V_{n1} + \dots + V_{nn} - n^2 \xrightarrow{d} N, \qquad n \to \infty.$$
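The computations of Example 3.2.1 are easy to check numerically. The Python sketch below evaluates the mean and variance of one summand directly from its three-point law, compares the variance with the closed form $(n^2 - 1)(n^2 + 1/n^3)$ (so that $b_n^2 = n \mathsf{D} X_{n1} = (n^2 - 1)(n^3 + 1/n^2)$), and simulates $Z_n^* = Z_n - n^2$. The values $n = 400$, the number of trials and the function names are arbitrary illustrative choices.

```python
import math
import random

def moments_X(n):
    """Mean and variance of one summand X_{n1} of Example 3.2.1."""
    s = 1.0 / math.sqrt(n)
    p = 0.5 * (1.0 - 1.0 / n**2)
    law = [(n + s, p), (n - s, p), (-n * (n**2 - 1), 1.0 / n**2)]
    mean = sum(v * w for v, w in law)
    var = sum((v - mean) ** 2 * w for v, w in law)
    return mean, var

def sample_Z_star(n, rng):
    """One realization of Z_n^* = X_{n1} + ... + X_{nn} - n^2."""
    s = 1.0 / math.sqrt(n)
    p_jump = 1.0 / n**2
    total = 0.0
    for _ in range(n):
        u = rng.random()
        if u < p_jump:
            total += -n * (n**2 - 1)          # rare large negative value
        elif u < p_jump + 0.5 * (1.0 - p_jump):
            total += n + s
        else:
            total += n - s
    return total - n**2

n = 400
mean, var = moments_X(n)
b_n = math.sqrt(n * var)                       # scale prescribed by rule (3.2.1)

rng = random.Random(2024)
sample = [sample_Z_star(n, rng) for _ in range(2000)]
inside = sum(abs(z) <= 1.96 for z in sample) / len(sample)
print(mean, var, n**2 / b_n, inside)
```

The scale $b_n$ of rule (3.2.1) is of order $n^{5/2}$, so the typical value $n^2 / b_n$ of $\tilde Z_n$ tends to zero, while the shifted sums $Z_n^*$ fall into $[-1.96, 1.96]$ with a frequency close to the normal value 0.95.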
In this connection $\mathsf{E} V_n = n^2$ and $\mathsf{D} V_n = 1$, i.e., the normalization of the variables $V_n$ is performed in accordance with rule (3.2.1). We see that the main thing in choosing the normalizing constants is to discover some center of concentration of the probability measure and the order of dispersion of the measure around this center. At the same time, in certain situations the expectations and the variances are not able to detect these characteristics, which are essential for constructing the limit distributions, since they give too much weight to small parts of the measure distant from the natural center. This circumstance compels us not to trust the choice of the normalizing constants according to rule (3.2.1), but to search for other, more reliable ways of choosing them. Moreover, the expectations and the variances do not always exist.
It is interesting to observe how the question of the normalization of sums $Z_n$ of the general form is solved in the classical theory of limit theorems. Let us write the normalized sum $\tilde Z_n = (Z_n - a_n)/b_n$ as the sum $\tilde Z_n = \tilde X_{n1} + \dots + \tilde X_{nn} - \tilde a_n$, where $\tilde X_{nj} = X_{nj}/b_n$ and $\tilde a_n = a_n/b_n$. Since we are free to choose the constants $\tilde a_n$, without loss of generality we can assume that $\operatorname{med} X_{nj} = \operatorname{med} \tilde X_{nj} = 0$. The median is not an additive characteristic with respect to summation of independent random variables, therefore the sum need not have zero median. However, this is not the main deficiency of the median. While it defines the point which divides the probability measure into two equal parts (taking no account of the part which they may have in common), the median carries no information about the diffusion of these parts over the corresponding semiaxes. Finally, it may turn out that centering the summands by their medians does not prevent the drift of a non-zero part of the distributions of the sums $Z_n$ to infinity. This situation is
well observed in Example 3.2.1, where $\operatorname{med} X_{nj} = n - 1/\sqrt{n}$ and the resulting shift by means of medians is equal to $a_n = n^2 - \sqrt{n}$ instead of the required shift $a_n^* = n^2$, which actually provides the limit relation $Z_n^* = Z_n - n^2 \xrightarrow{d} N$. Hence the centering of the summands by medians is not sufficient. The situation requires additional corrections which are able to take account of the dispersion of the distributions of the summands $X_{nj}$ centered by medians. Such corrections are the so-called truncated mathematical expectations
$$a_{nj}(\varepsilon) = \mathsf{E}(X_{nj} - \operatorname{med} X_{nj})\, I(|X_{nj} - \operatorname{med} X_{nj}| < \varepsilon),$$
where $I(\cdot)$ is the indicator of an event and $\varepsilon$ is a positive number which may depend on $n$. The resulting correction $a_n$ is of the form
$$a_n = \sum_{j=1}^{n} \bigl(\operatorname{med} X_{nj} + a_{nj}(\varepsilon)\bigr). \tag{3.2.2}$$
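For the three-point law of Example 3.2.1 the correction (3.2.2) can be evaluated directly. The sketch below, in which $\varepsilon = 1$, $n = 100$ and the helper names are illustrative assumptions, shows that the shift by medians alone gives $n^2 - \sqrt{n}$, while adding the truncated expectations brings the shift back to $n^2 - n^{-3/2}$, i.e., practically to the required value $a_n^* = n^2$.

```python
import math

def law_X(n):
    """Three-point law of X_{nj} from Example 3.2.1, sorted by value."""
    s = 1.0 / math.sqrt(n)
    p = 0.5 * (1.0 - 1.0 / n**2)
    return [(-n * (n**2 - 1), 1.0 / n**2), (n - s, p), (n + s, p)]

def median(law):
    """Smallest value whose cumulative probability reaches 1/2."""
    acc = 0.0
    for value, prob in law:
        acc += prob
        if acc >= 0.5:
            return value
    return law[-1][0]

def truncated_mean(law, center, eps):
    """a_{nj}(eps) = E (X - center) I(|X - center| < eps)."""
    return sum((v - center) * w for v, w in law if abs(v - center) < eps)

n, eps = 100, 1.0
law = law_X(n)
med = median(law)
shift_by_medians = n * med                                # identical summands
shift_corrected = n * (med + truncated_mean(law, med, eps))
print(shift_by_medians, shift_corrected)
```

Since the summands are identically distributed, the sum in (3.2.2) reduces to $n$ times the contribution of a single summand.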
The (UN) condition of uniform asymptotic negligibility of the summands $X_{nj}$ (mentioned in Section 2.1) was laid at the base of the classical theory of limit theorems. The (UN) condition does not yield $\operatorname{med} X_{nj} = 0$, $1 \le j \le n$, but it implies that all these medians are close to zero. It turns out that such closeness makes it possible not to center the summands $X_{nj}$ by their medians and to choose the correcting shifts $a_{nj}(\varepsilon)$ and $a_n$ as if the medians of $X_{nj}$ were equal to zero (see (Gnedenko and Kolmogorov, 1954)).
The classical theory does not pose the question of the proper choice of the scale normalization in the general situation, i.e., it is assumed that $b_n = 1$, $n \ge 1$. This question is absorbed by the initial setting of the problem, which contains only the shift and the conditions of convergence of the distributions of the sums $Z_n - a_n = \sum_j X_{nj} - a_n$ whose summands are properly centered. Of course, it is impossible to get rid of the scale transformation of the summands completely. Besides the fact that it is hidden in the (UN) condition, it reveals itself as some special requirements occurring in the criterion of convergence of the sums $Z_n - a_n$. The shift $a_n$ becomes related to the truncated expectations if the (UN) condition holds. In this case it is natural to expect that the latent problem of scale normalization appears in the form of the truncated variances of the summands
$$d_n^2(\varepsilon) = \sum_{j=1}^{n} \mathsf{D} X_{nj} I(|X_{nj}| < \varepsilon).$$
This expectation comes true, and in the criteria of the classical theory of limit theorems for sums of independent random variables we actually meet conditions of the form
$$\lim_{\varepsilon \to 0} \overline{d}^{\,2}(\varepsilon) = \lim_{\varepsilon \to 0} \underline{d}^{\,2}(\varepsilon) = d^2 = \text{const}, \tag{3.2.3}$$
where $\overline{d}^{\,2}(\varepsilon)$ and $\underline{d}^{\,2}(\varepsilon)$ are respectively equal to the upper and the lower limits of the variables $d_n^2(\varepsilon)$ as $n \to \infty$. But this means that, choosing a sequence of numbers $\varepsilon_n$ which does not decrease to zero too fast, in accordance with the behavior of the variables $d_n^2(\varepsilon)$, we can replace condition (3.2.3) with the following equivalent one:
$$\lim_{n \to \infty} d_n^2(\varepsilon_n) = d^2.$$
Now the latent mechanism of the choice of the constants $b_n$ in the normalization of the sums $Z_n$ becomes clear. Let us rewrite the last limit relation as the equality $d_n^2(\varepsilon_n) = d^2 k_n^2$, where $k_n \to 1$ as $n \to \infty$. Since
$$k_n^{-2} d_n^2(\varepsilon_n) = \sum_{j=1}^{n} \mathsf{D}\bigl(k_n^{-1} X_{nj}\bigr) I\bigl(k_n^{-1}|X_{nj}| < k_n^{-1}\varepsilon_n\bigr) = d^2,$$
the choice of the normalizing constant $b_n$ can be considered as the solution of the equation
$$d_n^2(\varepsilon_n) = d^2 \tag{3.2.4}$$
with respect to the variable $b_n$ occurring on its left-hand side. This transcendental equation is rather complicated already in its notation, to say nothing of its solution. So the way of choosing the constants $b_n$ which provides only asymptotic, not exact, equality in (3.2.4) seems to be preferable. This is exactly how it is done in the classical theory.
We should draw the reader's attention to the fact that in Example 3.2.1 considered above the passage from the random variables $X_{nj}$ to the variables $V_{nj}$ and the corresponding change of the constants from $a_n$, $b_n$ to $a_n^*$, $b_n^*$ actually represents the procedure of truncation of random variables. The technique of proofs based on the use of truncated expectations and variances has become one of the methodological foundations of the theory of limit theorems and is successfully used in various applications.
Different types of normalizing constants were used in the construction of limit theorems. However, from the conceptual viewpoint they do not go far from the truncated expectations and variances which we considered. Having no obvious advantages, these innovations, as a rule, did not spread beyond the works of their creators.
Later, we will get acquainted with one interesting construction which is able to correct the way of choosing the factors $b_n$ in the linear transformation of the sum $Z_n$. It turns out to be even more useful for the problem of converting a non-convergent series of independent random variables into a convergent one by shifting this series by a non-convergent numerical series. This information is of a certain interest for us, since further we will have to deal with infinite series of independent random variables. It will also be useful to compare the properties of the characteristic presented below with the distribution scatter which will be considered in the next chapter. All related information is taken from the first part of the book by Ito (Ito, 1969), to which we refer the readers who want to find the proofs of the facts presented below.
DEFINITION 3.2.1. Let $X$ be some random variable with the distribution function $F_X$ and let $X'$ be a random variable independent of it with the same distribution $F_X$. The order of dispersion of $X$ (or of the distribution $F_X$) is the functional
$$\delta(X) = -\log\{\mathsf{E} \exp(-|X - X'|)\}. \tag{3.2.5}$$
This name of the functional $\delta(X)$ is justified by some of its properties, which are similar to the properties of the variance of a random variable³. However, $\delta(X)$ also possesses very valuable specific properties.
(1) For any random variable $X$, $0 \le \delta(X) < \infty$.
(2) For any independent random variables $X$, $Y$, $\delta(X + Y) \ge \delta(X)$.
(3) The order of dispersion is invariant with respect to any shift of $X$ by a constant $c$: $\delta(X + c) = \delta(X)$.
(4) $\delta(X) = 0$ if and only if the distribution of $X$ is degenerate.
(5) Let $f_X$ be the characteristic function of a random variable $X$. Then
$$\delta(X) = -\log\left\{\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{|f_X(t)|^2}{1 + t^2}\,dt\right\}.$$
³The quantity $\exp(-\delta(X))$ connected with $\delta(X)$ is called the order of concentration of the distribution of $X$. By its properties it can be compared with the concentration function $Q(x, X)$.
(6) If $Y_n \xrightarrow{d} Y$ as $n \to \infty$, then $\delta(Y_n) \to \delta(Y)$.
(7) If a sequence of random variables $\{Y_n\}$ satisfies $\delta(Y_n) \to 0$ as $n \to \infty$, then there exists a sequence of constants $\{a_n\}$ for which $Y_n - a_n \xrightarrow{d} 0$ as $n \to \infty$.
(8) $\delta(Y_n) \to \infty$ as $n \to \infty$ if and only if the concentration functions $Q(x, Y_n) \to 0$ as $n \to \infty$ for each $x > 0$.
The following properties are directly connected with sums of independent random variables. Let $X_1, X_2, \dots$ be independent random variables and $Y_n = X_1 + \dots + X_n$. If $n \to \infty$, then $Y_n$ turns into the series $Y = X_1 + X_2 + \dots$ of independent random variables. The value of $Y$ can be interpreted as the limit of the sequence $\{Y_n\}$, which is treated in various senses, for instance
$$Y_n \xrightarrow{d} Y, \qquad Y_n \xrightarrow{P} Y, \qquad Y_n \to Y \ \text{with probability 1}.$$
It is well known in probability theory that for infinite series of independent random variables the notions of convergence in all three senses are equivalent (see (Loeve, 1963; Feller, 1970)). Since $Y_{n+1}$ is equal to the sum of the independent summands $Y_n + X_{n+1}$, by virtue of property 2 we have $\delta(Y_n) \le \delta(Y_{n+1})$. Therefore there always exists a finite or infinite limit
$$\delta = \lim_{n \to \infty} \delta(Y_n). \tag{3.2.6}$$
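For laws with finitely many atoms the order of dispersion (3.2.5) is a finite double sum, so several of the properties above can be checked exactly. The sketch below is an illustration only; the representation of a law as a dictionary `{value: probability}` and the helper names are assumptions made for this example.

```python
import math
from itertools import product

def order_of_dispersion(law):
    """delta(X) = -log E exp(-|X - X'|) for a finite law {value: prob}."""
    expectation = sum(p * q * math.exp(-abs(x - y))
                      for (x, p), (y, q) in product(law.items(), repeat=2))
    return -math.log(expectation)

def convolve(law_a, law_b):
    """Law of the sum of two independent discrete random variables."""
    out = {}
    for (x, p), (y, q) in product(law_a.items(), law_b.items()):
        out[x + y] = out.get(x + y, 0.0) + p * q
    return out

def shift(law, c):
    """Law of X + c."""
    return {x + c: p for x, p in law.items()}

coin = {0: 0.5, 1: 0.5}
partial = coin
deltas = [order_of_dispersion(partial)]
for _ in range(4):                      # Y_2, ..., Y_5 for i.i.d. coin summands
    partial = convolve(partial, coin)
    deltas.append(order_of_dispersion(partial))
print(deltas)                           # a non-decreasing sequence
```

Here `deltas` lists $\delta(Y_1), \dots, \delta(Y_5)$ for growing sums of fair coin tosses; by property 2 the sequence cannot decrease, and for this non-convergent series it keeps growing.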
THEOREM 3.2.1. If for some sequence $\{Y_n\}$ of growing sums of independent random variables the limit $\delta$ is finite, then there exist a sequence of constants $\{c_n\}$ and a random variable $Y$ such that $Y_n - c_n \to Y$ with probability one as $n \to \infty$. Moreover, $\delta(Y) = \delta$. The converse is also true.
The sequence $\{c_n\}$ is called the sequence of centering constants.
THEOREM 3.2.2. Let a sequence $\{Y_n\}$ be such that the corresponding limit $\delta$ is finite. Choose the constants $c_n$ so that
$$\mathsf{E} \arctan(Y_n - c_n) = 0 \tag{3.2.7}$$
(the solution $c_n$ of this equation always exists and is unique). Then the sequence $\{c_n\}$ is a sequence of centering constants. The sequence $\{c_n\}$ defined by equalities (3.2.7) is called the sequence of Doob's centering constants.
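Equation (3.2.7) is easy to solve numerically, because $c \mapsto \mathsf{E}\arctan(Y - c)$ is continuous and strictly decreasing. The sketch below finds Doob's centering constant of a discrete law by bisection; the dictionary representation of the law, the number of bisection steps and the function names are assumptions of this illustration.

```python
import math

def mean_arctan(law, c):
    """E arctan(Y - c) for a finite law {value: prob}."""
    return sum(p * math.atan(y - c) for y, p in law.items())

def doob_center(law, steps=100):
    """Solve E arctan(Y - c) = 0 by bisection between the extreme atoms."""
    lo, hi = min(law), max(law)     # mean_arctan >= 0 at lo and <= 0 at hi
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mean_arctan(law, mid) > 0:
            lo = mid                # the root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

symmetric = {-1.0: 0.5, 1.0: 0.5}
skewed = {0.0: 0.9, 10.0: 0.1}      # mean 1.0, most of the mass at 0

c_sym = doob_center(symmetric)
c_shifted = doob_center({y + 5.0: p for y, p in symmetric.items()})
c_skew = doob_center(skewed)
print(c_sym, c_shifted, c_skew)
```

For the skewed law the constant stays close to the bulk of the mass and well below the mean, since the bounded function $\arctan$ damps the influence of the distant atom.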
If the transformation of a non-convergent series of independent random variables $\sum_j X_j$ into a convergent series, required in Theorem 3.2.1, is possible, then it can be done by means of the corresponding centering of the separate summands. Namely, using the centering of the form
$$\tilde X_j = X_j - (c_j - c_{j-1}), \qquad c_0 = 0,$$
by virtue of Theorem 3.2.1 we conclude that the series $\sum_j \tilde X_j$ converges. Any non-convergent series $\sum_j X_j$ which can be transformed into a convergent one by centering its summands by some constants is referred to as essentially convergent. If $\{c_n\}$ is some sequence of centering constants for the sequence $\{Y_n\}$ and $\{d_n\}$ is a sequence of numbers which has a finite limit $d$ as $n \to \infty$, then the sequence $\{c_n + d_n\}$ is again a sequence of centering constants. It is known (see (Loeve, 1963)) that for each essentially convergent series $\sum_j X_j$ the role of the sequence of centering constants is played by the sums
$$a_n = \sum_{j=1}^{n} \bigl(\operatorname{med} X_j + \mathsf{E}(X_j - \operatorname{med} X_j) I(|X_j - \operatorname{med} X_j| < \varepsilon)\bigr), \qquad n \ge 1,$$
where $\varepsilon > 0$ is some fixed number and $I$ is the indicator of an event. By virtue of what has been said above, for the sequence of Doob's centering constants $\{c_n\}$ and the universal sequence of centering constants $\{a_n\}$ the limit relation $c_n - a_n \to \text{const}$ should hold as $n \to \infty$.
4

Centers and Scatters of Random Variables

Considering the problem of normalization of sums of independent random variables, we use such numerical characteristics as the median, the truncated mathematical expectation, the truncated variance and the order of dispersion, which are able to take account of the necessary information about the properties of the distributions of the sums $Z_n$ and of the summands $X_{nj}$ constituting these sums. Unfortunately, none of them possesses the additivity property which makes it possible to relate the characteristics of $Z_n$ to the corresponding characteristics of $X_{nj}$. In this respect, ordinary expectations and variances are more suitable, but as we have said above, they have some deficiencies which do not allow us to use them in some cases. It is natural to pose the question whether or not there exist characteristics of random variables which possess the main properties of expectations and variances, are at the same time free of their deficiencies, and are thereby able to compete with the characteristics specified above. It turns out that such characteristics do exist; this chapter is devoted to the description of their properties. Recall that by $f_X(t)$ and $Q(x, X)$ we respectively denote the characteristic function and the concentration function of a random variable $X$.
4.1. The definition and properties of index, centers and scatters

Put
$$\Delta(X) = \sup\{h: \min(|f_X(t)|: |t| \le h) > 0\}.$$
On the set of all characteristic functions this functional can take both finite and infinite values.
DEFINITION 4.1.1. The functional $\Delta(X)$ is called the index of a random variable $X$.
Let us get acquainted with the main properties of the index. Some of them are simple consequences of the definition of $\Delta(X)$ or are obtained by means of elementary reasoning, while the others require some explanations. Let $X, X_1, X_2, \dots$ be some sequence of random variables, and let $a \in \mathbb{R}$ and $c > 0$ be constants. Then
(1) $0 < \Delta(X) \le \infty$, $\Delta(a) = \infty$.
(2) $\Delta(-X) = \Delta(X)$.
(3) $\Delta(X + a) = \Delta(X)$.
(4) $\Delta(cX) = \Delta(X)/c$.
(5) If $X_n \xrightarrow{d} 0$ as $n \to \infty$, then $\Delta(X_n) \to \infty$.
(6) If $X_n \xrightarrow{d} X$ and $Z_n \xrightarrow{d} 0$ as $n \to \infty$, then
$$\liminf_{n \to \infty} \Delta(X_n + Z_n) \ge \Delta(X); \tag{4.1.1}$$
moreover, the case of strict inequality is possible. Indeed, let us fix some r < A(X). Then according to the definition of the index inf {|/χ(ί)|: 0 < t < r} = a(r) > 0. We introduce the analogous quantities On(r) for the random variables Xn. Since Xn Λ X, for η large enough we obtain the inequalities an(r) > a(r) / 2. It is not difficult to calculate that Ifxn+zn(t) - /^(ί)| < 2P(|Zn| > ε) + ε, for any positive t and ε. We put εη = 2π(Ζη, 0); then the right-hand side of the last inequality by the definition of the metric X and (1.3.4) does not exceed 3ε„, εη 0 as η oo. Hence it follows that for η large enough the estimate inf {\fxn+zM
:0r,
which obviously implies the limit relation sought for. The possibility of occurrence of the strict inequality is illustrated by the following example with Z„ = 0.
Let us consider the functions fn(t) = max(l - | i | , e x p ( — η = 1,2,... These functions are symmetric, convex and strictly positive on the semiaxis t > 0. Hence they represent characteristic functions of some random variables Xn whose distributions belong to the well-known Polya class. Further, as η oo we obtain fn(t) fx(t) = max(0,1 — |f|), i.e., Xn —» X. Since A(X„) = oo and A(X) = 1, the strict inequality takes place in (4.1.1). (7) If Χι, X2,... are independent random variables, then for any η > 1 dn = Δ(Χι, ...,Xn) = min(Ä(XA):
k 1). n - > 00
Conversely, the latter equality implies the following property of the index: If a series X = X\ + X% + ... of independent random variables converges with probability one, then A(X) = d = mf(A(.Xk): k > 1).
(4.1.2)
Indeed, let us represent the series $X$ as $X = S_n + Z_n$, where $S_n = X_1 + \dots + X_n$. Since the series converges with probability one, $Z_n \to 0$ as $n \to \infty$. Further, $|f_X(t)| = |f_{S_n}(t)| \cdot |f_{Z_n}(t)|$. The first factor does not vanish on the interval $|t| < d_n$, where $d_n \ge d$. In view of property 5, $\Delta(Z_n) \to \infty$ as $n \to \infty$, hence $\Delta(X)$ cannot be less than $d$.
(8) We select from $\mathfrak{f}$ the subset $\mathfrak{f}_0 = \{f_X: f_X(t) \ne 0,\ t \in \mathbb{R}\}$ and denote the corresponding set of random variables by $\mathfrak{X}_0$. Then
$$\Delta(X) = \infty \iff X \in \mathfrak{X}_0.$$
In particular, $\Delta(X) = \infty$ for all random variables with infinitely divisible distributions.
(9) Let $X$ be some random variable and $T > 0$ be a number such that $q = Q(T, X) > 1/2$. Then
$$\Delta(X) \ge T^{-1}\sqrt{4q - 2}. \tag{4.1.3}$$
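For symmetric laws whose real characteristic function changes sign at its first zero, the index can be located numerically. The sketch below scans a grid for the first sign change and refines it by bisection; it is only an illustration (a zero touched without a sign change would be missed), and the grid parameters and function names are assumptions. It checks the index of the Rademacher law ($f(t) = \cos t$), of the uniform law on $[-1, 1]$ ($f(t) = \sin t / t$), the scaling property (4), the index of a sum of independent variables, and the lower bound (4.1.3) for the uniform law with $T = 1.5$, where $Q(T, X) = 0.75$.

```python
import math

def first_zero(cf, t_max=20.0, step=1e-3):
    """First t > 0 where the real function cf changes sign (inf if none)."""
    prev_t, prev_v = 0.0, cf(0.0)
    k = 1
    while k * step <= t_max:
        t = k * step
        v = cf(t)
        if v == 0.0:
            return t
        if prev_v * v < 0.0:
            lo, hi = prev_t, t
            for _ in range(80):                 # bisection refinement
                mid = 0.5 * (lo + hi)
                if cf(lo) * cf(mid) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
        prev_t, prev_v = t, v
        k += 1
    return math.inf

def rademacher_cf(t):
    return math.cos(t)

def uniform_cf(t):
    return 1.0 if t == 0.0 else math.sin(t) / t

def normal_cf(t):
    return math.exp(-0.5 * t * t)

idx_rad = first_zero(rademacher_cf)                          # pi / 2
idx_uni = first_zero(uniform_cf)                             # pi
idx_norm = first_zero(normal_cf)                             # no zeros
idx_scaled = first_zero(lambda t: rademacher_cf(3.0 * t))    # index of 3X
idx_sum = first_zero(lambda t: rademacher_cf(t) * uniform_cf(t))

T, q = 1.5, 0.75                     # Q(1.5, X) = 0.75 for the uniform law
bound = math.sqrt(4.0 * q - 2.0) / T
print(idx_rad, idx_uni, idx_norm, idx_scaled, idx_sum, bound)
```

The bound (4.1.3) is far from sharp here (about 0.67 against the true index $\pi$), which is consistent with its role as a simple sufficient condition for the choice of $r$.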
Proving this inequality we can assume that m e d X = 0, since the index and concentration function are invariant with respect to the shift of the random variable X by any constant. We obtain
The interval $|x| \le T$ contains the interval $I$ of length $T$ such that $\mathsf{P}(X \in I) > 1/2$, since otherwise the condition $\operatorname{med} X = 0$ does not hold. Therefore
$$\int_{|x| > T} dF_X(x) \le 1 - q,$$
which yields the inequality
$$|f_X(t)| \ge 1 - \tfrac12 t^2 T^2 - 2(1 - q),$$
being valid for any values of $t$. Its right-hand side is strictly positive for the values of $t$ whose absolute values are less than $T^{-1}\sqrt{4q - 2}$.
Since the calculation of a finite $\Delta(X)$ is a difficult problem, (4.1.3) is useful because it gives a simple sufficient condition for the choice of $r$. Let us take some distribution function $\Gamma(t)$ concentrated on the interval $0 < t \le 1$. For it, as can be easily seen, (4.1.4) Further, for any random variable $X \in \mathfrak{X}$ and any $0 < r < \Delta(X)$ we introduce the functionals
Since on the interval $0 < t \le r < \Delta(X)$ the function $f_X(t)$ does not vanish, both introduced characteristics of the random variable $X$ take finite values.
DEFINITION 4.1.2. The functionals $C_r(X)$ and $B_r(X)$ defined on the whole set $\mathfrak{X}$ are respectively called the $r$-center and the $r$-scatter of the random variable $X$ (in brief, center and scatter).
Below we will show that to some extent centers and scatters can claim to be universal characteristics of random variables, which due to their properties are able to compete with such well-known characteristics as the median, ordinary and truncated mathematical expectations, Doob's centering constants, the order of dispersion, and ordinary and truncated variances. First of all we should note that though the function $\Gamma$ entering the definition of $C_r$ and $B_r$ is taken from a sufficiently wide class, it does not exhibit its individuality in many properties of centers and scatters. At the same time, it is clear that the more points of growth the function $\Gamma$ has, the more information about the distribution of $X$ is concentrated in $C_r$ and $B_r$. Taking this argument into account, we can restrict ourselves to two extreme cases. The first one corresponds to the function $\Gamma(t)$ which has the only point of growth $t = 1$, and then
$$C_r(X) = \frac{1}{r}\,\Im \log f_X(r), \qquad B_r(X) = -\frac{2}{r^2}\,\Re \log f_X(r);$$
the second one corresponds to the function $\Gamma(t) = t$, for which
$$C_r(X) = \frac{2}{r}\int_0^1 \Im \log f_X(rt)\,dt, \qquad B_r(X) = -\frac{6}{r^2}\int_0^1 \Re \log f_X(rt)\,dt.$$
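The two extreme cases can be evaluated numerically from a characteristic function. The sketch below assumes the formulas $C_r = r^{-1}\Im\log f_X(r)$, $B_r = -2r^{-2}\Re\log f_X(r)$ for the one-point $\Gamma$ and $C_r = 2r^{-1}\int_0^1 \Im\log f_X(rt)\,dt$, $B_r = -6r^{-2}\int_0^1 \Re\log f_X(rt)\,dt$ for $\Gamma(t) = t$; the normalizing constants are those for which a normal law receives exactly its mean and variance. The principal branch of the logarithm is used, which is adequate only for small $r$, and the midpoint rule, the test laws and the function names are assumptions of this illustration.

```python
import cmath

def center_scatter_point(cf, r):
    """Case of Gamma with the single point of growth t = 1."""
    z = cmath.log(cf(r))
    return z.imag / r, -2.0 * z.real / r**2

def center_scatter_uniform(cf, r, m=2000):
    """Case Gamma(t) = t, integrals over (0, 1) by the midpoint rule."""
    im = re = 0.0
    for k in range(m):
        z = cmath.log(cf(r * (k + 0.5) / m))
        im += z.imag
        re += z.real
    return 2.0 * im / (r * m), -6.0 * re / (r**2 * m)

def normal_cf(t, a=1.5, k2=4.0):
    """Characteristic function of the normal law with mean a, variance k2."""
    return cmath.exp(1j * a * t - 0.5 * k2 * t * t)

def exponential_cf(t):
    """Characteristic function of the standard exponential law (EX = DX = 1)."""
    return 1.0 / (1.0 - 1j * t)

c1, b1 = center_scatter_point(normal_cf, 0.5)
c2, b2 = center_scatter_uniform(normal_cf, 0.5)
c_small, b_small = center_scatter_point(exponential_cf, 0.01)

# additivity for a sum of independent variables: product of the two functions
c_sum, b_sum = center_scatter_point(lambda t: normal_cf(t) * exponential_cf(t), 0.5)
c_exp, b_exp = center_scatter_point(exponential_cf, 0.5)
print(c1, b1, c2, b2, c_small, b_small, c_sum - c1 - c_exp, b_sum - b1 - b_exp)
```

For the normal law both variants return the mean 1.5 and the variance 4; for the exponential law with small $r$ the values are close to $\mathsf{E}X = \mathsf{D}X = 1$, and the center and scatter of the sum coincide with the sums of the individual centers and scatters.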
We call these two particular cases light and heavy centers and scatters respectively. Let us get acquainted with some properties inherent in any centers and scatters. Most of them are verified so easily that they do not require any explanations. Let $X, X_1, X_2, \dots$ be some sequence of random variables, and let $a \in \mathbb{R}$ and $k > 0$ be constants; then
(1) $C_r(X)$ and $B_r(X)$ take finite values for any $X \in \mathfrak{X}$ and any $0 < r < \Delta(X)$.
Further, in all properties it is assumed that the value of $r$ in centers and scatters is less than the corresponding value of the index.
(2) $C_r(-X) = -C_r(X)$, $B_r(-X) = B_r(X)$.
(3) $C_r(kX) = k C_{rk}(X)$, $B_r(kX) = k^2 B_{rk}(X)$.
(4) $C_r(a) = a$, $B_r(a) = 0$.
(5) If $Y$ is a random variable distributed by the normal law with mean $a$ and variance $k^2$, then for any $r > 0$
$$C_r(Y) = a, \qquad B_r(Y) = k^2.$$
(6) If random variables $X_1, \dots, X_n$ are independent and $X = X_1 + \dots + X_n$, then for $0 < r < \Delta(X)$ any $r$-centers and $r$-scatters of the random variables $X, X_1, \dots, X_n$ exist; moreover,
$$C_r(X) = \sum_{j=1}^{n} C_r(X_j), \qquad B_r(X) = \sum_{j=1}^{n} B_r(X_j).$$
The first part of the last assertion follows from property (4.1.2) of the index, and the second one, from the fact that the function $f_X(t)$ is equal to the product of the functions $f_{X_j}(t)$, $1 \le j \le n$, for each $t$.
(7) Let $X_n \xrightarrow{d} X$ and $Z_n \xrightarrow{d} 0$ as $n \to \infty$. Then for each $0 < r < \Delta(X)$ any $r$-centers and $r$-scatters of the random variables $X_n + Z_n$ exist, and moreover
$$C_r(X_n + Z_n) \to C_r(X), \qquad B_r(X_n + Z_n) \to B_r(X).$$
The first part of this assertion follows from property (4.1.1) of the index, and the second one, from the fact that $X_n + Z_n \xrightarrow{d} X$ and on the interval $0 < t \le r$ the functions $f_{X_n + Z_n}(t)$ converge uniformly to the function $f_X(t)$.
(8) If $r \to 0$, then
$$C_r(X) \to \mathsf{E}X, \qquad B_r(X) \to \mathsf{D}X$$
in the situation where the limiting values are finite.
I : 0 < f < r I - » 0,
Now we turn to the question how close the centers Cr(X) can be to the median m e d X and to the truncated mean αχ(ε) = EX7(|X| < ε), and the scatters l r ( X ) can be to the truncated variances ά\{ε) = DX/(|X| < ε). We should say from the very beginning that such comparison has any sense only if the random variables X are appropriately centered, say, by medians. (9) Let h\ and hi be constants defined by equality (4Λ.4). Let us choose positive numbers ε, r so that er < 1 / 2 and q = Q(e,X) > 3 / 4 . Then