
MODERN PROBABILITY AND STATISTICS

NORMAL APPROXIMATION: NEW RESULTS, METHODS AND PROBLEMS

ALSO AVAILABLE IN MODERN PROBABILITY AND STATISTICS: Modern Theory of Summation of Random Variables Vladimir M. Zolotarev

MODERN PROBABILITY AND STATISTICS

Normal Approximation: New Results, Methods and Problems

Vladimir V. Senatov
Moscow State University


UTRECHT, THE NETHERLANDS, 1998

VSP BV P.O. Box 346 3700 AH Zeist The Netherlands

Tel:+31 30 692 5790 Fax: +31 30 693 2081 E-mail: [email protected] Home page: http://www.vsppub.com

© VSP BV 1998. First published in 1998. ISBN 90-6764-292-4

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.

Printed in The Netherlands by Ridderprint bv, Ridderkerk.

Contents

Foreword   iii
Preface   v

1 Introduction   1
1.1 Formulation of the problem   1
1.2 Historical aspects; Qualitative effects of transition from the one-dimensional case to multidimensional spaces   3
1.3 On the contents of the book   32
1.4 Some notation and definitions   34

2 Elements of the theory of probability metrics   39
2.1 Basic probability metrics   40
2.2 On distinctions between probability metrics   45
2.3 Special properties of probability metrics   47
2.4 Relations between metrics   52
2.5 Structure of the set of metrics   53
2.6 Estimates in equivalent metrics   61
2.7 The Lévy-Prokhorov metrics; Weak convergence   62
2.8 Uniform metrics   83
2.9 Metrics λ and χ   89
2.10 Ideal metrics   99
2.11 Extremal properties of L and ρ metrics   124

3 Method of characteristic functions; Berry-Esseen theorem   129
3.1 Basic properties of characteristic functions   130
3.2 Estimation of closeness of characteristic functions in CLT   132
3.3 Berry-Esseen inequality   136
3.4 Berry-Esseen theorem   139
3.5 Modifications of Berry-Esseen theorem   140
3.6 One remark about the method of characteristic functions in the multidimensional case   151

4 Method of compositions in the one-dimensional case
4.1 Preliminary remarks
4.2 Smoothing inequality
4.3 Basic estimate
4.4 Speculations
4.5 One modification of the method of compositions
4.6 Convergence rate estimate under the condition β_{2+δ} < ∞, 0 < δ ≤ 1

…

8.2 Derivatives of the normal law   281
8.3 Estimates of the P_n-measure of the sets F(ε) and Φ(ε)   304
8.4 On a bound in the space E_k   331

9 Lower bounds for uniform metrics   337

Conclusion   351
Bibliography   355
Index   363

Foreword

This book is the second in a new series of monographs. We entitled this series 'Modern Probability and Statistics'. There are many similar projects in other publishing houses, so we should clarify why the decision to launch one more was made.

The Russian school of probability theory and mathematical statistics made a universally recognized contribution to these sciences. Its potentialities are not only very far from being exhausted, but are still increasing. During the last decade there appeared many remarkable results, methods and theories which undoubtedly deserve to be presented in monographic literature in order to make them widely known to specialists in probability theory, mathematical statistics and their applications. However, due to recent political changes in Russia followed by some economic instability, it is at present rather difficult to organize the publication of a scientific book in Russia. Therefore, a considerable stock of knowledge accumulated during recent years still remains scattered over various scientific journals. To improve this situation somehow, together with VSP International Science Publishers and, first of all, its director, Dr. Jan Reijer Groesbeek, who took up the idea with readiness, we present this series of monographs.

To provide highly qualified international examination of the proposed books, we invited well-known specialists to join the Editorial Board. All of them kindly agreed, so now the Editorial Board of the series is as follows:

A. Balkema (University of Amsterdam, the Netherlands)
W. Hazod (University of Dortmund, Germany)
V. Kalashnikov (Moscow Institute for Systems Research, Russia)
V. Korolev (Moscow State University, Russia), Editor-in-Chief
V. Kruglov (Moscow State University, Russia)
J. D. Mason (University of Utah, Salt Lake City, USA)
E. Omey (EHSAL, Brussels, Belgium)
K. Sato (Nagoya University, Japan)
M. Yamazato (University of Ryukyu, Japan)
V. Zolotarev (Steklov Mathematical Institute, Moscow, Russia), Editor-in-Chief

Preface

The central limit theorem of probability theory asserts that, under some conditions, the distribution of a sum of random variables is approximated by the normal law. This allows us to consider in actual practice the normal distribution rather than the distribution of a sum of random variables. In other words, we can replace the distribution of a sum of random variables, which is usually very complicated, with the much simpler limiting normal distribution. The natural question arises about the normal approximation errors. The first results concerning the normal approximation errors were due to the Russian mathematician, academician A. M. Lyapunov, who obtained them in his works of 1900-1901. Lyapunov's investigations inspired many scientists to begin the analysis of approximation errors in limit theorems; this theme is still far from completion even today. An impetus to the development of this field was given by the emergence of the famous Berry-Esseen theorem proved at the beginning of the forties. In the second half of the forties, the first works on the normal approximation in multidimensional spaces came to light, whereas the investigations concerning the infinite-dimensional cases began in the sixties.

The importance of the investigations concerning the theory of limit theorems for sums of independent random variables is easily explained. The simple addition operation on independent random variables corresponds to the very complicated operation of convolution of distributions. From the formal viewpoint, the distribution of the sum of n independent random variables distributed by a law P can be immediately expressed as the n-fold convolution P*n of the distribution P. But the calculation of multifold convolutions is very hard, and thus a highly non-trivial mathematical theory appeared which deals with multifold convolutions; it is exactly the theory of limit theorems for sums of independent random variables. This theory goes back to the works of Bernoulli, de Moivre and Laplace, pursued by Poisson, Cauchy, Gauss, Chebyshev, Markov, and Lyapunov. The related list of mathematicians of the twentieth century is much more voluminous. The experience in the field of limit theorems for sums of independent random variables was summed up in the outstanding monograph 'Limit Theorems for Sums of Independent Random Variables' by B. V. Gnedenko and A. N. Kolmogorov, which appeared in 1949 and has since been translated into many languages.

The estimation of approximation errors in limit theorems is a constituent part of the theory of limit theorems. To solve this problem, various approaches were suggested; the most powerful and deservedly popular among them is the method of characteristic functions. It was introduced by Lyapunov and, up to the forties of the twentieth century, remained the basic method of analyzing multifold convolutions of distributions. Then other methods came to light; we mention here two of them: the method of compositions suggested by Bergström at the end of the forties, and the method of metric distances proposed by Zolotarev in the middle of the seventies. Applied to a series of problems of probability theory, the latter method appeared to be no weaker than the method of characteristic functions, and even overpowered it in many cases.

It is clear that the error of approximation can be understood in many senses. So, a considerable part of this monograph is devoted to introducing the reader to the theory of probability metrics. After this, we consider some results based on the method of characteristic functions, then the method of compositions, and finally, the method of metric distances. The last method is the basis of this monograph. The idea of this method consists in the use of metrics that possess some special properties. We consider the case of real-valued random variables, the case of random variables with values in finite-dimensional Euclidean spaces, and the case where the random variables take values in infinite-dimensional real separable Hilbert spaces.

It seems likely that the title of this book intrigued the reader. Indeed, from the common viewpoint, first the problem arises, then some method to solve it, and finally, the result appears. But very often we come up against another situation, where some result (maybe in some special case) is given, then an algorithm to validate it in the general case appears, and finally, one can formulate the whole problem itself, as well as many new problems that occur while proving. Probably, the most natural example of such a situation is the central limit theorem itself. It was first formulated by de Moivre in 1730 for the special case of Bernoulli trials. A century later, Laplace re-invented the central limit theorem and found the integral form of the limiting normal law (whereas de Moivre represented the limiting distribution as a series). Then decades of intensive work of the best mathematicians were required for the central limit theorem to get its modern form. So, new methods were invented (e.g. the method of characteristic functions), and new problems arose: in particular, the problem of estimating the convergence rate. Some of these problems are not solved yet.

Concluding the preface, the author would like to thank those mathematicians whose attention helped to materialize this book. First, the author expresses his deep gratitude to V. M. Zolotarev for his inspiration and his sincere interest in this research. The author is grateful to Yu. S. Khokhlov, who verified the first versions of the proof of convergence rate estimates in the space ℓ_2. The author is deeply indebted to A. V. Kolchin, who kindly agreed to translate and typeset the manuscript and made a series of remarks which helped to improve the presentation. Many thanks are due to V. Yu. Korolev, who made a number of useful remarks and whose energy in many respects made the publication of this book possible. Of course, the above-mentioned persons are not responsible for the inevitable errors that are present in this book with probability close to one. The author hopes that they will not prevent the reader from understanding the presentation.

V. V. Senatov
Moscow, July 1998.

1. Introduction

1.1. Formulation of the problem

The notion of the central limit theorem (in what follows, for the sake of brevity we will write CLT) should be considered as collective: it covers various assertions concerning the behavior of distributions of sums of random variables. The feature which these assertions (theorems) share in common is that only normalized sums are considered, and the limit distribution for them is the normal law. Both independent and non-independent summands can be met; their distributions can be either identical or different (in various senses); the summands can be either real-valued random variables or random vectors belonging to multidimensional spaces. There are many kinds of constraints imposed on the moments of the summands, their smoothness, etc. We consider the simplest, in some sense, formulation of the problem; namely, we will deal with independent identically distributed random variables, which nevertheless can take their values in various spaces (in the same space in the framework of a single problem). We consider real-valued random variables, random variables which take values in the k-dimensional Euclidean space E_k, and random variables which take values in an arbitrary infinite-dimensional real separable Hilbert space. In the last case, since the infinite-dimensional real separable Hilbert spaces are isomorphic to the space ℓ_2, we will speak about ℓ_2-valued random variables only. In what follows, if a definition, assertion, etc., does not depend on the space under consideration, we will speak about random variables in an abstract space H.

Now, let us describe the summation scheme we will deal with. We consider a sequence X_1, X_2, ... of independent identically distributed random variables (r.v.'s) taking their values in a space H. We assume, without loss of generality, that they have zero mean EX_1 = 0 and that E|X_1|^2 < ∞, where |·| is the norm in H. Let P be the distribution of X_1, P_n be the distribution of the normalized sum Z_n = (X_1 + ... + X_n)n^{-1/2}, and let Φ be the normal law with zero mean and the covariance operator coinciding with that of P.


In the one-dimensional case, Φ is the normal distribution with zero mean and the variance equal to the variance of P; the definitions required in the multidimensional case are given in Section 1.4. We always assume, without loss of generality, that the eigenvalues σ_1^2, σ_2^2, ... of the covariance operator B of the distribution P are arranged in descending order: σ_1^2 ≥ σ_2^2 ≥ ..., and let

σ^2 = Σ_{i≥1} σ_i^2 = E|X_1|^2

denote the trace of the operator B. Let Y_1, Y_2, ... stand for independent random variables with distribution Φ. It is well known that under the above assumptions

P_n → Φ as n → ∞.  (1.1.1)

Convergence (1.1.1) is understood in various senses. We will consider metrizable convergence only, and formulate the problem as follows. Let μ be some metric in the space of distributions. How can one estimate μ(P_n, Φ)? The collection of these problems is referred to as the problem of convergence rate estimation in the CLT. Unless otherwise specified, we assume that β_3(P) = E|X_1|^3 < ∞. To apprehend why this rather traditional constraint is so interesting, the reader should refer to Section 1.2. Thus, we consider the scheme of summation of independent identically distributed variables and are concerned with estimating the distance μ(P_n, Φ) under the condition β_3(P) < ∞. We are interested in convergence rate estimates for various metrics and spaces; the estimates obtained will utilize various information on the constituent random variables. It turns out that even in our quite simple summation scheme many questions remain open; we will deal with them below. These problems are closely related to, say, those of the theory of probability metrics. In what follows, while estimating μ(P_n, Φ) we usually fix n and denote, for fixed n, the distribution of the random variable (X_1 + ... + X_j)n^{-1/2} by P_j, 0 ≤ j ≤ n (the distribution P_0 degenerates at zero). If n were not fixed, we should provide two subscripts for the distributions of these random variables. We let P̄_j denote the distribution of (X_1 + ... + X_j)j^{-1/2}. For any n, the distribution Φ can be represented as the distribution of the normalized sum (Y_1 + ... + Y_n)n^{-1/2}. The distributions Φ_j, 0 ≤ j ≤ n, … σ_1^2 ≥ ... ≥ σ_k^2; thus (1.2.5) contains the minimum eigenvalue of the covariance operator. In the finite-dimensional case we can always assume that σ_k > 0, because otherwise the problem is reduced to the same one in a space of lower dimension. We note that many estimates for the class ℭ can be derived from estimates for random variables with identity covariance operator (although here we have no equivalent formulations as in the one-dimensional case), which is due to the invariance of ℭ under nondegenerate linear transformations of the space E_k. In this connection, we stress that the random variable B^{-1/2}X_1 entering into (1.2.4) has the identity covariance operator, and β_3(B^{-1/2}X_1) ≤ β_3(X_1)σ_k^{-3}. It seems likely that due to this factor the problem of the dependence of the estimates on the covariance operator does not intrigue the scientists.
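Before turning to the historical survey, here is a minimal numerical sketch of the problem formulated in Section 1.1 (it is an illustration, not part of the book): it assumes a centred exponential summand law P with unit variance in the one-dimensional case and estimates the uniform distance sup_x |P_n((-∞, x)) - Φ((-∞, x))| by Monte Carlo; the helper name and sample sizes are illustrative choices only.

import numpy as np
from scipy.stats import norm

def uniform_distance(sample, cdf):
    # Kolmogorov-type distance between the empirical d.f. of the sample
    # and a reference continuous d.f. (standard empirical-process formula).
    x = np.sort(sample)
    n = len(x)
    ref = cdf(x)
    emp_hi = np.arange(1, n + 1) / n
    emp_lo = np.arange(0, n) / n
    return max(np.max(emp_hi - ref), np.max(ref - emp_lo))

rng = np.random.default_rng(0)
n, n_samples = 10, 200_000
# P: centred exponential law with unit variance, so beta_3(P) < infinity
x = rng.exponential(1.0, size=(n_samples, n)) - 1.0
z = x.sum(axis=1) / np.sqrt(n)           # normalized sums Z_n with distribution P_n
print(uniform_distance(z, norm.cdf))     # Monte Carlo proxy for rho(P_n, Phi)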


We have said earlier that for the class 𝔖 we are usually able to obtain 'more precise' estimates. This 'precision', as a rule, consists in the phenomenon that the order of c(k) in k in estimates of type (1.2.5) for the class 𝔖 is less than for the class ℭ. … ≥ 1, there exist distributions P (depending on n, which corresponds to the triangular array scheme) such that σ_1^2 = ... = … β_3 σ_k^{-3} n^{-1/2}.

It is easy to construct the corresponding example (Senatov, 1985a) beginning with the fact that the class of all spheres can approximate any half-space H_{e,r} = {x ∈ E_k: (x, e) > r}, where (·, ·) denotes the scalar product in E_k, and

sup_r |P_n(H_{e,r}) - Φ(H_{e,r})| = ρ(P_n^{(e)}, Φ^{(e)}),

where the one-dimensional distributions P_n^{(e)} and Φ^{(e)} are the projections of P_n and Φ onto e. Thus, for ρ(P_n, Φ, 𝔖) to vanish in E_k, the distribution of each component, i = 1, ..., k, of the vector Z_n = (X_1 + ... + X_n)n^{-1/2} over any basis must be approximated by the corresponding normal law in the uniform metric; this approximation automatically becomes uniform in i and in all bases in E_k, while the finiteness of the third moment of the norm of X_1 in ℓ_2 does not guarantee the uniform approximation, which is due to the fact that σ_k → 0 as k → ∞ …

It is clear that the right-hand side of the next to the last inequality bounds Δ_n as well. At the same time, the asymptotic behavior of Δ_n and Δ_n(P, R, a) can vastly differ as n → ∞. Below we will frequently come up against this phenomenon.

Along with the problem of extending the Berry-Esseen estimate to the multidimensional case, some mathematicians considered the specific effects related to the multidimensionality of the space of values of random variables. In this connection we should mention (von Bahr, 1967a). Let us give the estimate from (Senatov, 1986b), which refines one result of that research. For any distribution P with identity covariance operator in E_k,

Δ_n(P, R, a) ≤ (c_1 R)^{-1} β_3 n^{-1/2} + (c_2 β_3 n^{-1/2})^k,

i.e., for k > 2 the leading term (in n) of this estimate, which is of order n^{-1/2}, tends to zero as R → ∞, which is impossible in the one-dimensional case.


Before we turn to uniform (over some system of spheres) estimates in ℓ_2, it is pertinent to dwell on the estimates in weak metrics, i.e., in the Lévy-Prokhorov metric

π(U, V; 𝔄) = inf{ε: V(A) ≤ U(A^ε) + ε, U(A) ≤ V(A^ε) + ε for all A ∈ 𝔄},

where U, V are probability distributions, 𝔄 is some system of Borel sets, and A^ε = {x: d(x, A) < ε} is the ε-neighborhood of a set A. Here d(x, A) is the distance between x and the set A. Upper and lower bounds obtained for the Lévy-Prokhorov metrics were the first to demonstrate the qualitative difference between the finite-dimensional and infinite-dimensional cases. Those bounds, in addition, showed how the information on the distribution used in estimation can affect the order of the resulting bounds. From the definition of the Lévy-Prokhorov metric it follows that π(U, V; 𝔄) ≤ ρ(U, V; 𝔄) for any system 𝔄; therefore, for the systems 𝔖 and ℭ convergence rate estimates in E_k in the metric π can be obtained as corollaries of the corresponding estimates in the metric ρ. But we cannot use this way to estimate π(P_n, Φ; 𝔅). An estimate of π(P_n, Φ; 𝔅) in E_k of order n^{-1/2} for β_3 < ∞ was obtained in (Yurinskii, 1975). Thus, for any 𝔄 ⊂ 𝔅 in E_k for π(P_n, Φ; 𝔄) there exist estimates of order n^{-1/2} as n → ∞ and β_3 < ∞. But they share one common deficiency: if the distribution P degenerates due to, say, σ_k tending to zero, they become too rough, namely their right-hand sides exceed one for σ_k small enough, while the left-hand sides tend to zero as n → ∞, no matter how the (non-degenerate) covariance operator B degenerates (keep in mind that several of its eigenvalues can be not tending to zero). From an estimate given in (Zolotarev, 1976a) it follows that in ℓ_2 we have

π(P_n, Φ; 𝔖) ≤ c ζ_3^{1/4}(P, Φ) n^{-1/8} ≤ c β_3^{1/4} n^{-1/8}.  (1.2.7)
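The mechanism behind a bound of this form can be sketched as follows (a reconstruction of the standard ideal-metric argument, not a quotation from the book; the relation between π and ζ_3 used below is an assumption here and is treated in Chapter 2):

\[
\zeta_3(P_n,\Phi)\;\le\; n\,\zeta_3\bigl(n^{-1/2}X_1,\;n^{-1/2}Y_1\bigr)\;=\;n\cdot n^{-3/2}\,\zeta_3(P,\Phi)\;=\;n^{-1/2}\zeta_3(P,\Phi),
\]

by the semi-additivity and the order-3 homogeneity of the ideal metric ζ_3; combined with an inequality of the form π ≤ c ζ_3^{1/4}, this gives π(P_n, Φ; 𝔖) ≤ c ζ_3^{1/4}(P, Φ) n^{-1/8}.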

Further, there appeared results due to Yurinsky (Yurinskii, 1977a; Yurinskii, 1977b) and Yamukov (Yamukov, 1977), which were then refined in (Senatov, 1984a), and which imply that in the space E_k

π(P_n, Φ; ℭ) ≤ c k^{1/2} β_3 n^{-1/2},  (1.2.8)

π(P_n, Φ; 𝔅) ≤ c k^{1/4} β_3^{1/4} n^{-1/8}.  (1.2.9)

It is known that estimate (1.2.8) is precise with respect to the order in k; the corresponding example is due to V. Yu. Bentkus. As concerns (1.2.7), in (Senatov, 1977a; Senatov, 1977b; Senatov, 1981) it was demonstrated that it cannot be improved with respect to its order in n. Indeed, for any ε > 0 in ℓ_2 there exist distributions P for which

π(P_n, Φ; 𝔖) ≥ c(P) n^{-(1/8+ε)},  c(P) > 0.  (1.2.10)


Instead of the class 𝔖, we can take the class of all half-spaces in ℓ_2. For the Lévy-Prokhorov metric over the system of all Borel sets, a stronger result is true (Senatov, 1977a; Senatov, 1977b; Senatov, 1981): for any ε > 0 in ℓ_2

…  (1.2.11)

In addition,

sup{π(P_n, Φ; 𝔅): supp(P) ⊂ {x ∈ ℓ_2: |x| = 1}} ≥ c > 0,  (1.2.12)

where supp(P) is the support of the measure P. Thus, convergence rate estimates in the central limit theorem for β_3 < ∞ in infinite-dimensional spaces can have a rate of convergence to zero in n differing from n^{-1/2}, in contrast to the finite-dimensional case. Moreover, in (Senatov, 1981) it was noticed that the presence of n^{-1/8} in (1.2.7) for β_3 < ∞ is, in some sense, not due to the infinite dimension of the space of values of the random variables under consideration. More exactly, let D_β, β > 0, be the class of distributions in E^1 such that β_3(P) ≤ β. For each distribution of this class, (1.2.7) yields L(P_n, Φ) ≤ c β^{1/4} n^{-1/8}. … → 0 as k → ∞.


If in the finite-dimensional case we do not take the information on the minimum eigenvalue of the operator B into account, we arrive at the same loss of order with respect to n as in the infinite-dimensional case. Below we will frequently come up against this phenomenon while estimating ρ_d and ρ_R. Besides, we will see that the formulations of problems which are natural in the infinite-dimensional case, being extended to the finite-dimensional case, can lead us to new results which essentially refine and amplify the known ones.

Inequality (1.2.13) demonstrates that the order of estimates (1.2.8) and (1.2.9) in n is correct, and this is due to the fact that they do not take the information on the covariance operator of the distribution P into account. Inequalities (1.2.11) and (1.2.12) confirm that for π(P_n, Φ; 𝔅) even the existence of moments of arbitrarily high order cannot compensate the lack of information on the covariance operator of the distribution P. At the same time, if we know that, say, σ_k^2 = O(k^{-(1+α)}), α > 0, β_3 < ∞, we are able to obtain a bound of the form of n to some power, which was shown in (Senatov, 1977b; Yurinskii, 1977a).

Let us turn to the estimation of ρ(P_n, Φ; 𝔄) in ℓ_2, where as 𝔄 we consider the sets of spheres with either bounded radii or bounded shifts |a| of their centers from the origin. We have noticed that there are no non-trivial estimates of ρ(P_n, Φ, 𝔖) in ℓ_2, hence we usually estimate either ρ_d(P_n, Φ) = sup{Δ_n(P, R, a): |a| ≤ d, R > 0} (the case where d = 0 is allowed), or ρ_R(P_n, Φ) = sup{Δ_n(P, r, a): a ∈ ℓ_2, r ≤ R}. … an estimate of the form

ρ_R(P_n, Φ) ≤ c (σ_1 ... σ_6)^{-(2+δ)/3} β_{2+δ} n^{-δ/2}

was obtained. The last estimates of order n^{-δ/2}, 0 < δ ≤ 1, were presented at a session of the Fifth Vilnius International Conference on Probability Theory and Mathematical Statistics. In what follows, we drop the assumption of the independence of the components of the random vector X_1. Estimate (1.2.21) without the assumption of the independence of the components of the random vector X_1 was proved in (Senatov, 1988b), and (1.2.26), in (Senatov, 1989a). The second term in (1.2.26) can be replaced by β_3^2 (σ_1 ... σ_5)^{-1} σ_6^{-13} n^{-3} (see (Senatov, 1989b)). In (Sazonov et al., 1988b), the estimate

ρ_d(P_n, Φ) ≤ c (σ^3 + d^3) β_3 n^{-1/2} …

was obtained. …

THEOREM 1.2.2. Let e = a/|a|, and let σ_e^2 be the variance of the projection of P onto the subspace {y ∈ ℓ_2: y = te, -∞ < t < ∞}. For any k > 7, s > 3, γ < k/2, β < 1/2, q > 1 and n > 2 we have

Δ_n(P, R, a) ≤ c { θ(P, R, n, k) exp(…) + ν_s(σ n^β, e, P) / (r^s n^{(s-2)/2}) + … },  (1.2.31)


where

θ(P, R, n, k) = min(1, max(…)) · min(1, max(…)) ….

In (1.2.31) r can be changed for r + σ_e.

THEOREM 1.2.3. Let S(R, a) be an arbitrary sphere in ℓ_2, r = |R - |a||. For any k > 7, s > 3, γ < k/2, β < 1/2, q > 1 and n > 2

Δ_n(P, R, a) ≤ c { … + ν_s(σ n^β, P) / (r^s n^{(s-2)/2}) + … },

where c depends on k, q, s, β only, C is an absolute constant, θ and β are the same as in Theorem 1.2.2, and

…

Here r + σ can be substituted for r. 'Non-uniform' estimates containing factors which exponentially decrease as r grows were studied in (Sazonov and Ulyanov, 1991). We stress that for one and the same r the bound given in Theorem 1.2.2 can be much more precise than that of Theorem 1.2.3, because Theorem 1.2.2 utilizes σ_e instead of mere σ, and

sup_e σ/σ_e = ∞

for any non-degenerate distribution P. Let us formulate and discuss several corollaries to Theorems 1.2.1-1.2.3.

COROLLARY 1.2.1. For any k ≥ 1,

Δ_n ≤ c(k) f_k(R),  (1.2.32)

where, for k ≥ 1,

f_k(R) = ∏_{i=1}^{k} min(1, (R^3 β_3 n^{-1/2})^{1/6} / σ_i)  for R ≥ R_*,  (1.2.33)-(1.2.34)

f_k(R) = ∏_{i=1}^{k} min(1, max(…, …))  for R < R_*;  (1.2.35)

here

R_* = …,

and c(k) is some non-decreasing sequence.

COROLLARY 1.2.2. For any 1 ≤ k ≤ 6,

Δ_n ≤ c { (R^3 β_3 n^{-1/2})^{k/6} / (σ_1 ... σ_k) + (β_3 n^{-1/2})^{k/3} … },

where c is a constant. In particular,

Δ_n ≤ c { … n^{-1} / (σ_1 ... σ_6) + … / (σ_1 ... σ_6) }.

COROLLARY 1.2.3. The bound

Δ_n ≤ c { … / (σ_1 ... σ_6) + … / (σ_1 σ_2) + … / (σ_1 ... σ_6) + … / (σ_1 σ_2) + … }

holds. The distributions P_n and Φ can be close because either the number n of summands is large, or the initial distribution P is close to the normal law Φ, which is taken into account in Corollary 1.2.3.

COROLLARY 1.2.4. We consider the sphere S(R, (R + r)e), where r > 0, e belongs to the orthogonal complement ℓ_2 ⊖ E_6 of E_6 in ℓ_2, and |e| = 1. For this sphere

Δ_n ≤ c { … + ν_s(…) / (r^s n^{(s-2)/2}) }

for any s > 3, n > 2, where c depends on s only.


The right-hand side of this inequality does not become worse while the covariance operator degenerates so that the eigenvectors do not change and σ_1^2 ≥ σ_2^2 ≥ ..., where σ_i^2 are the eigenvalues corresponding to the eigenvectors e_i; it becomes better as σ_7 → 0, and the limit value of the right-hand side for σ_7 = 0 is o(n^{-1/2}) as s > 3 and n → ∞. It is not hard to construct bounds which tend to zero as σ_7 → 0, which results only in a slight modification of Theorem 1.2.2.

COROLLARY 1.2.5. For any k = 1, ..., 6,

Δ_n(P, R, a) ≤ c ((σ^3 + |a|^3) β_3 n^{-1/2})^{k/6} / (σ_1 ... σ_k),  (1.2.36)

where c is a constant.

COROLLARY 1.2.6. For any k = 1, ..., 6, β < 1/2, q > 1, and s > 3,

Δ_n(P, R, a) ≤ c { ((σ^3 + |a|^3) β_3 n^{-1/2})^{k/6} / (σ_1 ... σ_k) · exp(-C …/σ^2) + ν_s(σ n^β, P) / ((σ + r)^s n^{(s-2)/2}) + … },

where c depends on q, β, and s only.

Let us consider the function f_k(R) in Corollary 1.2.1. Since the sequence c(k) in (1.2.32) does not decrease, it makes sense to consider only those k for which β_3 σ_k^{-3} n^{-1/2} < 1. Indeed, if β_3 σ_k^{-3} n^{-1/2} ≥ 1, then for R ≥ R_* the last factor in the definition of f_k(R) is equal to one. It is easily seen that for R < R_* this last factor is equal to one as well. In the case where β_3 σ_k^{-3} n^{-1/2} ≥ 1, Corollary 1.2.1 yields a trivial bound. Hence we will assume that β_3 σ_k^{-3} n^{-1/2} < 1. Let, for definiteness, k > 7. In Fig. 1, we show the points where the form of the function f_k(R) changes, and give the values of f_k(R) in the intervals between them. Let us discuss this in more detail. First, let R ≥ R_*. The inequalities

(R^3 β_3 n^{-1/2})^{1/6} / σ_1 ≤ (R^3 β_3 n^{-1/2})^{1/6} / σ_2 ≤ …


… > 0, the function Φ(S(R, a) - x) is twice Fréchet-differentiable, and its second derivative is Lipschitzian with Lipschitz constant not exceeding cR^3 (σ_1 ... σ_6)^{-1}. The smoothness allows us to use the technique of ζ-metrics, which immediately provides us with upper bounds for the random vectors with independent components. Specially constructed examples allow us to integrate lower and upper bounds for such vectors. Then the final results for ρ_R(P_n, Φ) and ρ_d(P_n, Φ) can be obtained by a certain refinement of the technique of construction of upper bounds. We should stress that the question of refining Yurinskii's estimate (1.2.18) leads us to posing a wide class of problems, among which only a few have been solved up to now. In the infinite-dimensional case, the diversity of formulations and the heterogeneity of the obtained results are incomparable with the finite-dimensional case. Moreover, the problems which are formulated in the infinite-dimensional case, after passing to the finite-dimensional field, give us new results which essentially universalize the known ones. As the author believes, this diversity of formulations and results makes it impossible to choose a single problem whose solution gives us a key to understanding the effects existing in the multidimensional estimation.


1.3. On the contents of the book

As we have said, the main idea of this book is that metrics can be used as an apparatus for investigating approximation problems. Therefore, we decided to devote the whole of Chapter 2 to the theory of probability metrics. In this book we give the definitions and basic properties of the most popular probability metrics. The range of questions considered mainly coincides with that of the corresponding chapter of the monograph (Zolotarev, 1997), but the content is vastly different. We give mainly those properties which will be used in what follows; here we trace how the 'internal' properties of metrics yield their 'external' properties. In particular, in Chapter 2 we obtain convergence rate estimates in the CLT in some metrics.

In Chapter 2, we pay much attention to the problems related to weak convergence of distributions, although most convergence rate estimates in the CLT obtained later are connected with uniform distances, which are formally stronger than the weak ones. But the point is that in many situations where we are able to deduce estimates in uniform metrics, the uniform convergence turns out to be equivalent to the weak one. This equivalence is closely related to the 'smoothness' of the limiting normal law. In the same situations but where the 'smoothness' of the limiting law does not take place, there are no convergence rate estimates in the uniform metrics, and we should restrict ourselves to estimates in weak metrics whose orders differ from the traditional ones (see Chapter 6). It can happen that some smoothness of the limiting normal law exists but is not sufficient to obtain estimates of the traditional order n^{-1/2}. The 'smoothness' of distributions can be understood in various senses, which is why we used quotation marks before; further on we will drop them. The smoothness can be understood as the differentiability of the distribution function, the boundedness of some of its derivatives, the existence of the absolutely continuous component, the decrease of the characteristic function with a certain rate, the validity of the Cramer condition, the condition σ(P_n, Φ) → 0 as n → ∞, etc. We should note that the question of the smoothness of distributions in the CLT remains the most obscure up to now. Even a formulation of the problem is not clearly seen. The CLT itself contains no constraints on the smoothness of the involved distributions. In the finite-dimensional case the limiting normal law (if it is not degenerate) becomes smooth automatically (in any reasonable sense). In this case one cannot recognize the role of the smoothness of the limiting law. It seems likely that the smoothness of the limiting law manifested itself in convergence rate estimates in the CLT in the space ℓ_2 in the form of the condition that several eigenvalues of the covariance operator should be separated from zero.

After this excursion, let us turn back to the contents of the book. Since there is a great variety of probability metrics, the question of their comparison naturally arises.


The problems concerning the estimation of metrics via other metrics and the structure of the space of metrics itself arise as well, and are considered in Chapter 2, too.

Chapter 3 is devoted to convergence rate estimation in the CLT by the method of characteristic functions. Here we prove (in the one-dimensional case) the famous Berry-Esseen theorem. Here we also discuss the question why the method of characteristic functions does not provide us with good results in multidimensional spaces. We give several modifications of the Berry-Esseen theorem as well. These are very simple estimates, but they are connected with a very important problem which can be formulated as follows. What information on the initial distribution affects the accuracy of convergence rate estimates in the CLT, and how? We have dwelled on this question in Section 1.2. At present, there exists no formulation of the problem that is anywhere near formal. The author believes that the collection of convergence rate estimates we have gathered is too sparse to perceive this problem in its entirety. Chapter 3 does not depend on the others.

Chapter 4 is devoted to the method of compositions. Here we consider the one-dimensional case; we use pseudomoments and discuss them while estimating the convergence rate. We suggest a modification of the method of compositions which lets us avoid using induction on the number of random summands in the proof of estimates. This modification will then be applied to the estimation in the space ℓ_2, where no one has succeeded in carrying out the mentioned induction. In this chapter, we give the reader a very rich (maybe even too detailed) presentation. To read this chapter, the reader should be acquainted with Section 2.10.

In Chapter 5 we consider the convergence rate estimates in the CLT in finite-dimensional spaces. We estimate the uniform distances over systems of convex sets and over the class of all Borel sets. We use the method of compositions. A qualified person can read this chapter with no references to Chapter 4. Here, as in all subsequent chapters, acquaintance with Section 2.10 is compulsory, because the technique of ideal metrics is used very intensively. In this book, we do not consider estimates over spheres in E_k, although they exist and are of much interest. The point is that the study of this question would result in a very significant increase of the volume of the book. In the author's opinion, the known estimates for the spheres in E_k are not adequate to the information they use: it seems likely that at present we are not able to take full advantage of the information on the spheres on which we compare the distributions in the CLT.

In Chapter 6, we consider the problems of estimation of the convergence rate in weak metrics. In particular, we discuss the following question: how does the geometry of the space H of values of random variables influence the geometry of the space of distributions on H? More exactly, we dwell upon the effect of the growth of dimensionality of the space E_k on the convergence rate estimates in the CLT.


In Chapter 7, we consider estimates for ρ_R(P_n, Φ) (Theorem 7.3.1) in the space ℓ_2. These estimates follow from Theorem 1.2.1 and can be considered as particular cases of this theorem. Their proofs, because we make use of absolute moments instead of pseudomoments and impose additional constraints, are much shorter than the proof of Theorem 1.2.1. Nevertheless, in the proof of Theorem 7.3.1 we take account of the basic ideas of the proof of Theorem 1.2.1. To free the presentation of Theorem 7.3.1 from the proofs of auxiliary assertions, we gather them together in Chapter 8. Chapter 9 contains some examples that illustrate the accuracy of the estimates given in Theorems 1.2.1 and 7.3.1.

In this book, we give many examples. As a rule, they give us information on the accuracy of estimation. Moreover, the author believes that an analysis of some examples can be more helpful than a theorem. When this book was in preparation, we frequently came up against the following dilemma. Suppose we have two results. One of them is simple and clear, but is a special case of the second, whose proof is not so natural. How should we act: first prove the clear result and then generalize it, or immediately prove the second one, deriving the first result as a corollary? We always prefer the first way, although the second one can be shorter. In all chapters we draw the reader's attention to unsolved problems. We compiled the book with the intention of making the individual chapters independent, referring, if necessary, only to definitions and formulations from other chapters. Of course, this results in some repetitions, but we consciously go this way in an effort to make the reading comfortable.

1.4. Some notation and definitions

Hereafter the distribution function F of a real-valued random variable X is F(x) = P(X < x), i.e., the distribution functions are left-continuous. The d.f. of a r.v. X will sometimes be denoted by F_X(x). The degenerate distribution (the distribution degenerate at a point a) is the distribution concentrated at the point a, and its distribution function is

E^a(x) = 0 for x ≤ a,  E^a(x) = 1 for x > a

(let E(x) sometimes stand for the d.f. E^a(x) with a = 0). We say that the degenerate distribution is the distribution with a single growth point. A growth point of a distribution function F is any point x such that F(x + ε) - F(x - ε) > 0 for any ε > 0. The totality of all growth points of a distribution function F is called the support of F and is denoted by supp F.


We recall that a discrete distribution P is a lattice distribution if all its growth points belong to some arithmetic progression a + mh, m = 0, ±1, ..., h > 0. In other words, P is a lattice distribution if

supp P ⊂ {x: x = a + mh, m = 0, ±1, ...},  h > 0.  (1.4.1)

The maximal h for which inclusion (1.4.1) is valid is called the span of the lattice distribution P. The arithmetic progressions on the real axis are sometimes referred to as lattices. We recall that the Lebesgue decomposition is the (unique) representation of an arbitrary distribution function F(x) in the form of a mixture of three distributions

F(x) = p_ac F_ac(x) + p_s F_s(x) + p_d F_d(x),

where F_ac(x) is absolutely continuous, F_s(x) is singular (i.e., F_s(x) is continuous and F_s'(x) = 0 almost everywhere with respect to the Lebesgue measure), and F_d(x) is discrete. … But F_{X_n} do not converge to E^0 in the metric ζ_1, since ζ_1(F_{X_n}, E^0) = ∞.
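The example cut off above can be imitated by a hypothetical one (an illustration, not the book's own example): let X_n take the value n with probability n^{-1/2} and the value 0 otherwise. If one uses the identification of ζ_1 with the L_1 distance between distribution functions (the Kantorovich distance), then ζ_1(F_{X_n}, E^0) = n^{1/2} → ∞, although F_{X_n}(x) → E^0(x) at every point and the uniform distance tends to zero. A short numerical check in Python:

import numpy as np

# Hypothetical X_n: P(X_n = n) = 1/sqrt(n), P(X_n = 0) = 1 - 1/sqrt(n).
def F_n(n, x):
    p = 1.0 / np.sqrt(n)
    return np.where(x <= 0, 0.0, np.where(x <= n, 1.0 - p, 1.0))

def E0(x):
    return np.where(x <= 0, 0.0, 1.0)

for n in (4, 100, 10000):
    x = np.linspace(-1.0, n + 1.0, 200001)
    l1 = np.trapz(np.abs(F_n(n, x) - E0(x)), x)   # L1 distance, behaves like sqrt(n)
    ko = np.max(np.abs(F_n(n, x) - E0(x)))        # uniform distance, equals 1/sqrt(n)
    print(n, round(float(l1), 2), round(float(ko), 4))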

It turns out that one of the most convenient metrics in general problems is

THE LÉVY METRIC

L(F, G) = inf{ε: F(x - ε) - ε ≤ G(x) ≤ F(x + ε) + ε for all x}.

To see the geometric meaning of this metric, let us fix ε > 0 and draw the graphs of the functions F(x + ε) + ε and F(x - ε) - ε, i.e., shift the graph of F(x) to the left and upward by ε, and to the right and downward by ε. The set of the points (x, y) on the plane such that the abscissa x is arbitrary and the ordinate y is chosen so that F(x - ε) - ε ≤ y ≤ F(x + ε) + ε will be called the ε-neighborhood of F in the Lévy metric. Then we can say (with the usual stipulation concerning min and inf) that L(F, G) is the minimum ε for which the graph of G(x) belongs to the ε-neighborhood of F (it is clear that such an ε exists). Besides, it is easily seen that L(F, G) is equal to the side of the largest square (with sides parallel to the coordinate axes) which can be placed between the graphs of the functions F and G; the coincidence of the graph points and the boundary points of the square is admitted. It is easily seen that the convergence in the Lévy metric takes place in both of the above examples. Further we will see that, excluding from consideration somewhat 'pathological' cases, we can refer to the Lévy metric as the weakest probability metric (see Section 2.7). We will discuss the extremal properties of the Lévy metric in Section 2.11.

The above-defined metrics are given on the set of the distribution functions. But it is known that for any random variable X we can take into consideration not only its distribution function F_X(x) but its probability distribution P_X(B) = P(X ∈ B) as well, where B is an arbitrary Borel set on the real axis. The correspondence between the distributions and the distribution functions is one-to-one, and hence it is of no importance whether we speak about distributions or distribution functions. In the case of real-valued random variables, the use of distribution functions is common, because the d.f. F_X is a function of a point whereas P_X is a function of a set, and a function of a point is a simpler object (e.g., we can draw a graph of it). But this advantage is lost after the transition to random variables with values in multidimensional spaces, and it becomes more convenient to use the measures P_X (although we are still able to introduce multi-dimensional distribution functions). In what follows, as we have said in the Introduction, we will consider not only real-valued random variables, but also random variables which take their values from either the finite-dimensional Euclidean spaces E_k or the space ℓ_2. … P = Q is not necessarily true. But, as a rule, the system 𝔄 is rich enough, and the corresponding ρ(·, ·; 𝔄) is a metric. If 𝔄 = 𝔅, we will usually write σ(·, ·) instead of ρ(·, ·; 𝔅). Similarly we introduce

THE LÉVY-PROKHOROV METRIC OVER A SYSTEM OF SETS 𝔄

π(P, Q; 𝔄) = inf{ε: Q(A) ≤ P(A^ε) + ε, P(A) ≤ Q(A^ε) + ε for all A ∈ 𝔄},

where 𝔄 ⊂ 𝔅, A^ε = {x: d(x, A) < ε} is the ε-neighborhood of the set A. Hereafter d(x, A) = inf{d(x, y): y ∈ A} is the distance between the point x and the set A, and d(x, y) is the distance between points in the corresponding space H. These metrics universalize the Lévy metric: in the case of the space E^1 and 𝔄 = {(-∞, x), x ∈ E^1},

π(P, Q; 𝔄) = L(F, G),

where F and G are the distribution functions corresponding to the measures P and Q. In the case 𝔄 = 𝔅 we will sometimes write π(·, ·) rather than π(·, ·; 𝔅) and will omit the words 'over the system of sets 𝔅'. It is clear that in the definition of the Lévy-Prokhorov metric we can vary the metric d in the space of values of random variables as well. In particular, if we take

d(x, y) = d_0(x, y) = 0 for x = y,  d_0(x, y) = 1 for x ≠ y,

then the resulting Lévy-Prokhorov metric coincides with the uniform metric over the corresponding system of sets.
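A small numerical sketch (illustrative, not the book's): on E^1 the Lévy metric can be approximated directly from its definition; the test pair of normal distribution functions below and the bisection tolerance are arbitrary choices. The output also reports the uniform distance sup_x |F(x) - G(x)|, which always dominates L(F, G).

import numpy as np
from scipy.stats import norm

def levy_distance(F, G, grid, eps_max=1.0, tol=1e-6):
    # Smallest eps (found by bisection) with
    # F(x - eps) - eps <= G(x) <= F(x + eps) + eps on the grid.
    def ok(eps):
        return (np.all(F(grid - eps) - eps <= G(grid) + 1e-12)
                and np.all(G(grid) <= F(grid + eps) + eps + 1e-12))
    lo, hi = 0.0, eps_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

grid = np.linspace(-8.0, 8.0, 4001)
F = norm(0.0, 1.0).cdf
G = norm(0.5, 1.2).cdf
print(levy_distance(F, G, grid), float(np.max(np.abs(F(grid) - G(grid)))))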


The questions related to varying the metric d are very interesting but lie outside the field we are considering in this book. We have a feeling that it is not convenient for a reader to deal with a mix of measures and the corresponding distribution functions; in what follows we will use the notions of distribution and distribution function as equivalent. Hence the notations σ(F, G), where F, G are d.f.'s, and L(P, Q), where P and Q are measures, are both admissible. Moreover, for a metric over a set of distributions we will write μ(X, Y), where X, Y are random variables, meaning the distance μ between their distributions, which does not lead to ambiguity. It is well known that, along with the one-to-one correspondence between the distributions and distribution functions, there exists a one-to-one correspondence between the distributions and their characteristic functions. Therefore, as a distance over the set of distributions we can take a metric over the set of characteristic functions. For example, we introduce

THE UNIFORM METRIC OVER THE SET OF CHARACTERISTIC FUNCTIONS

χ(f, g) = sup{|f(t) - g(t)|: -∞ < t < ∞},

where f, g are characteristic functions. But this metric is rarely used. Much more interesting is

THE λ-METRIC

λ(f, g) = min_{T>0} max{ max_{|t|≤T} |f(t) - g(t)|, 1/T }.
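For illustration (a sketch that is not part of the book), the two metrics just introduced can be evaluated numerically for the characteristic function of a normalized sum of uniform random variables against the normal characteristic function; the summand law, grid sizes and truncation are assumptions made for the example only.

import numpy as np

phi_normal = lambda t: np.exp(-0.5 * t ** 2)    # ch.f. of the standard normal law

def phi_sum(t, n):
    # ch.f. of (X_1 + ... + X_n)/sqrt(n) for X_i uniform on [-sqrt(3), sqrt(3)]
    u = np.sqrt(3.0) * t / np.sqrt(n)
    return np.sinc(u / np.pi) ** n               # np.sinc(x) = sin(pi x)/(pi x)

def chi_metric(f, g, t_max=60.0):
    t = np.linspace(-t_max, t_max, 240001)
    return float(np.max(np.abs(f(t) - g(t))))    # proxy for the sup over the whole line

def lambda_metric(f, g, T_grid):
    best = np.inf
    for T in T_grid:
        t = np.linspace(-T, T, 4001)
        best = min(best, max(float(np.max(np.abs(f(t) - g(t)))), 1.0 / T))
    return best

T_grid = np.linspace(0.5, 60.0, 400)
for n in (2, 8, 32):
    f = lambda t, n=n: phi_sum(t, n)
    print(n, chi_metric(f, phi_normal), lambda_metric(f, phi_normal))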


Finally, let us present the basic class of the metrics which we will use in what follows, namely

THE IDEAL METRIC ζ_s

ζ_s(P, Q) = sup{ |∫ f d(P - Q)| : f ∈ F_s },

where s > 0, F_s is the class of the bounded functions f(x), x ∈ H, which are m-fold differentiable (in the Fréchet sense in the multidimensional case), and |f^{(m)}(x) - f^{(m)}(y)| ≤ |x - y|^{s-m} for all x, y ∈ H, where m is the integer defined by the condition m < s ≤ m + 1.
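Two properties of ζ_s explain why it is called ideal; they are the subject of Section 2.10 and are only summarized here (a sketch, not a quotation from the book):

\[
\zeta_s(X+Z,\;Y+Z)\le\zeta_s(X,Y),\qquad \zeta_s(cX,\;cY)=c^{s}\,\zeta_s(X,Y),\quad c>0,
\]

for Z independent of X and Y (regularity) and for positive scaling (homogeneity of order s). Applied to the normalized sums of Section 1.1 they give ζ_s(P_n, Φ) ≤ n^{1-s/2} ζ_s(P, Φ), which for s = 3 already exhibits the rate n^{-1/2}.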