238 97 6MB
English Pages 437 [438] Year 2023
Probability Theory and Stochastic Modelling 104
Sergey Bobkov Gennadiy Chistyakov Friedrich Götze
Concentration and Gaussian Approximation for Randomized Sums
Probability Theory and Stochastic Modelling Volume 104
Editors-in-Chief Peter W. Glynn, Stanford University, Stanford, CA, USA Andreas E. Kyprianou, University of Bath, Bath, UK Yves Le Jan, Université Paris-Saclay, Orsay, France Kavita Ramanan, Brown University, Providence, RI, USA Advisory Editors Søren Asmussen, Aarhus University, Aarhus, Denmark Martin Hairer, Imperial College, London, UK Peter Jagers, Chalmers University of Technology, Gothenburg, Sweden Ioannis Karatzas, Columbia University, New York, NY, USA Frank P. Kelly, University of Cambridge, Cambridge, UK Bernt Øksendal, University of Oslo, Oslo, Norway George Papanicolaou, Stanford University, Stanford, CA, USA Etienne Pardoux, Aix Marseille Université, Marseille, France Edwin Perkins, University of British Columbia, Vancouver, Canada Halil Mete Soner, Princeton University, Princeton, NJ, USA
Probability Theory and Stochastic Modelling publishes cutting-edge research monographs in probability and its applications, as well as postgraduate-level textbooks that either introduce the reader to new developments in the field, or present a fresh perspective on fundamental topics. Books in this series are expected to follow rigorous mathematical standards, and all titles will be thoroughly peer-reviewed before being considered for publication. Probability Theory and Stochastic Modelling covers all aspects of modern probability theory including:
Gaussian processes Markov processes Random fields, point processes, and random sets Random matrices Statistical mechanics, and random media Stochastic analysis High-dimensional probability
as well as applications that include (but are not restricted to): Branching processes, and other models of population growth Communications, and processing networks Computational methods in probability theory and stochastic processes, including simulation Genetics and other stochastic models in biology and the life sciences Information theory, signal processing, and image synthesis Mathematical economics and finance Statistical methods (e.g. empirical processes, MCMC) Statistics for stochastic processes Stochastic control, and stochastic differential games Stochastic models in operations research and stochastic optimization Stochastic models in the physical sciences Probability Theory and Stochastic Modelling is a merger and continuation of Springer’s Stochastic Modelling and Applied Probability and Probability and Its Applications series.
Sergey Bobkov • Gennadiy Chistyakov Friedrich Götze
Concentration and Gaussian Approximation for Randomized Sums
Sergey Bobkov School of Mathematics University of Minnesota Minneapolis, MN, USA
Gennadiy Chistyakov Fakultät für Mathematik Universität Bielefeld Bielefeld, Germany
Friedrich Götze Fakultät für Mathematik Universität Bielefeld Bielefeld, Germany
ISSN 2199-3130 ISSN 2199-3149 (electronic) Probability Theory and Stochastic Modelling ISBN 978-3-031-31148-2 ISBN 978-3-031-31149-9 (eBook) https://doi.org/10.1007/978-3-031-31149-9 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Given a random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) on a probability space (Ω, F , P) with values in the Euclidean space R𝑛 , 𝑛 ≥ 2, define the weighted sums ⟨𝑋, 𝜃⟩ =
𝑛 ∑︁
𝜃 𝑘 𝑋𝑘 ,
𝑘=1
parameterized by points 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) of the unit sphere S𝑛−1 in R𝑛 (with center at the origin and radius one). In general, the distribution functions of weighted sums ⟨𝑋, 𝜃⟩, say 𝐹𝜃 (𝑥) = P{⟨𝑋, 𝜃⟩ ≤ 𝑥}, 𝑥 ∈ R, essentially depend on the parameter 𝜃. On the other hand, a striking observation made by V. N. Sudakov [169] in 1978 indicates that, under mild correlation-type conditions on the distribution of 𝑋, and when 𝑛 is large, most of the 𝐹𝜃 ’s are concentrated around a certain “typical” distribution function 𝐹. Here “most” should be understood in the sense of the normalized Lebesgue measure 𝔰𝑛−1 on S𝑛−1 . A more precise statement can be given, for example, under the isotropy condition E ⟨𝑋, 𝜃⟩ 2 = 1,
𝜃 ∈ S𝑛−1 ,
which frequently appears in many applications. Similar to the classical central limit theorem, Sudakov’s result thus represents a rather general principle of convergence, with various interesting aspects. A related phenomenon was discovered later by Diaconis and Freedman [79] in terms of low-dimensional projections of non-random data (cf. also von Weizsäcker [176]). The phenomenon of concentration of the family {𝐹𝜃 } 𝜃 ∈S𝑛−1 naturally begs the question of closeness of 𝐹𝜃 to 𝐹 for all 𝜃 from a large part of the sphere in terms of standard distances 𝑑 in the space of probability distributions on the real line. A canonical choice would be the Kolmogorov (uniform) distance 𝜌(𝐹𝜃 , 𝐹) = sup |𝐹𝜃 (𝑥) − 𝐹 (𝑥)|. 𝑥
v
vi
Preface
Less sensitive alternatives would be the Lévy distance n o 𝐿(𝐹𝜃 , 𝐹) = inf ℎ ≥ 0 : 𝐹 (𝑥 − ℎ) − ℎ ≤ 𝐹𝜃 (𝑥) ≤ 𝐹 (𝑥 + ℎ) + ℎ for all 𝑥 ∈ R , as well as the distances in 𝐿 𝑝 -norms 1/ 𝑝 ∫ ∞ |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| 𝑝 d𝑥 , 𝑑 𝑝 (𝐹𝜃 , 𝐹) =
𝑝 ≥ 1,
−∞
among which 𝑊 = 𝑑1 and 𝜔 = 𝑑2 are most natural. For a given distance 𝑑, the behavior of the average value 𝑚 = E 𝜃 𝑑 (𝐹𝜃 , 𝐹), as well as the deviation from the mean in spherical probability 𝔰𝑛−1 {𝑑 (𝐹𝜃 , 𝐹) ≥ 𝑚 + 𝑟 }, is of interest as a function of 𝑛 and 𝑟 > 0. In this context the model of independent random variables 𝑋 𝑘 has been intensively studied in the literature. When 𝑋 𝑘 are independent and identically distributed (the i.i.d. case) and have mean zero and variance one, the distribution functions 𝐹𝜃 are known to be close to the standard normal distribution function ∫ 𝑥 2 1 Φ(𝑥) = √ e−𝑦 /2 d𝑦, 2𝜋 −∞ as long as max 𝑘 |𝜃 𝑘 | is small. If the 3-rd absolute moment 𝛽3 = E |𝑋1 | 3 is finite, the Berry–Esseen theorem allows us to quantify this closeness by virtue of the bound 𝜌(𝐹𝜃 , Φ) ≤ 𝑐𝛽3
𝑛 ∑︁
|𝜃 𝑘 | 3 ,
𝑘=1
which holds for some √ absolute constant 𝑐 > 0. Although the right-hand side is greater than or equal to 𝑐𝛽3 / 𝑛 for any 𝜃 ∈ S𝑛−1√, the bound above implies a similar upper bound on average: E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ 𝑐 ′ 𝛽3 / 𝑛. The i.i.d. case inspires the idea that, under some natural moment and correlationtype assumptions, most of the 𝐹𝜃 might also be close to the standard normal law. But, in light of Sudakov’s theorem, this is equivalent to a similar assertion about the typical distribution – a property which is determined by the distribution of the Euclidean norm 1/2 |𝑋 | = 𝑋12 + · · · + 𝑋𝑛2 . Indeed, in general, the typical distribution can be identified as the spherical average ∫ 𝐹 (𝑥) = 𝐹𝜃 (𝑥) d𝔰𝑛−1 (𝜃) ≡ E 𝜃 𝐹𝜃 (𝑥), S𝑛−1
which may be alternatively described as the distribution of |𝑋 | 𝜃 1 , where the first coordinate of a point on the sphere is treated as a random variable independent of 𝑋. (In the sequel, √ E 𝜃 is always understood as the integral with respect to the measure 𝔰𝑛−1 .) Since 𝜃 1 𝑛 is nearly standard normal, 𝐹 will be close to Φ if and only if the random variable 𝑅 2 = 𝑛1 |𝑋 | 2 is approximately 1 in the sense of the weak topology. In
Preface
vii
many situations, this can be verified directly by computing, for example, the variance of 𝑅 2 , while in some others it represents a non-trivial “thin shell” type concentration problem. This book aims to describe the current state of the art concerning Sudakov’s theorem. In particular, using the metrics 𝑑 mentioned above, we will focus on the derivation of various bounds for E 𝜃 𝑑 (𝐹𝜃 , 𝐹) and E 𝜃 𝑑 (𝐹𝜃 , Φ), as well as on large deviation bounds. Our investigations rely on several basic tools. Besides classical techniques of Fourier Analysis (such as Berry–Esseen-type bounds), many arguments rely upon the spherical concentration phenomenon, that is, concentration properties of the measures 𝔰𝑛−1 for growing dimensions 𝑛, including the associated Sobolev-type and infimum-convolution inequalities. Concentration tools are also used for various classes of distributions of 𝑋 when trying to approximate the typical distribution 𝐹 by the standard normal law. In order to facilitate the readability of the presentation of the results related to the Sudakov phenomenon, we decided to make the presentation more self-contained by including these auxiliary techniques in the first three chapters. Thus we describe in a separate part (Part II) some general results on concentration in the setting of Euclidean and abstract metric spaces. Most of this material can also be found in other publications, including the excellent survey and monograph by M. Ledoux [129], [130], and the recent book by D. Bakry, I. Gentil, and M. Ledoux [8]. The spherical concentration is discussed separately in Part III. It is a classical well-known fact (whose importance was first emphasized by V. Milman in the early 1970s) that any mean zero smooth (say, √ Lipschitz) function 𝑓 on the unit sphere 1/ S𝑛−1 has deviations at most of the order 𝑛 with respect to the growing dimension √ 𝑛. Moreover, as a random variable, 𝑛 𝑓 has Gaussian tails under the measure 𝔰𝑛−1 . In addition to this spherical phenomenon, we present recent developments on the so-called second order concentration, which was pushed forward by the authors as an advanced tool in the theory of randomized summation. Roughly speaking, the second order concentration phenomenon indicates that, under proper normalization hypotheses in terms of the Hessian, any smooth 𝑓 on S𝑛−1 orthogonal to all affine functions in 𝐿 2 (𝔰𝑛−1 ) actually has deviations at most of the order 1/𝑛. Moreover, as a random variable, 𝑛 𝑓 has exponential tails under the measure 𝔰𝑛−1 . Part III also contains various bounds on deviations of elementary polynomials under 𝔰𝑛−1 and collects asymptotic results on special functions related to the distribution of the first coordinate on the sphere. These tools are needed to quantify Sudakov’s theorem in terms of several moment and correlation-type conditions, and for various classes of distributions of 𝑋. With this aim, we shall introduce and discuss the following moment type quantities for a parameter 𝑝 ≥ 1, 𝑀 𝑝 = sup
E | ⟨𝑋, 𝜃⟩ | 𝑝
1/ 𝑝
𝜃 ∈S𝑛−1
as well as the variance-type functionals
,
1/ 𝑝 1 𝑚 𝑝 = √ E | ⟨𝑋, 𝑌 ⟩ | 𝑝 , 𝑛
Preface
viii
2 𝑝 1/ 𝑝 √ |𝑋 | 𝜎2 𝑝 = 𝑛 E − 1 , 𝑛
Λ =
sup Var Í
∑︁ 𝑛
𝑎𝑖2𝑗 =1
𝑎 𝑖 𝑗 𝑋𝑖 𝑋 𝑗 ,
𝑖, 𝑗=1
where 𝑌 is an independent copy of 𝑋. For example, 𝑀2 = 𝑚 2 = 1 in the isotropic case, and 𝜎42 = 𝑛1 Var(|𝑋 | 2 ), which can often be estimated via evaluation of the covariances of 𝑋𝑖2 and 𝑋 2𝑗 . The relevance of these functionals will be clarified in various examples; they are also connected with analytic properties of the distribution 𝜇 of 𝑋 expressed in terms of isoperimetric or Poincaré-type inequalities. For example, there is a simple bound Λ ≤ 4/𝜆21 via the spectral gap 𝜆 1 associated to 𝜇. We shall now outline several results on upper bounds for E 𝜃 𝑑 (𝐹𝜃 , 𝐹) and E 𝜃 𝑑 (𝐹𝜃 , Φ) involving these functionals for various distances 𝑑. They are discussed in detail in the remaining Parts IV–VI of this monograph. • Lévy distance. Here the moments 𝑀1 and 𝑀2 will control quantitative bounds on fluctuations of 𝐹𝜃 around the typical distribution 𝐹 in the metric 𝐿 providing polynomial rates with respect to 𝑛. Namely, for some absolute constant 𝑐 > 0 we have log 𝑛 1/3 𝑀1 + log 𝑛 , E 𝜃 𝐿 (𝐹𝜃 , 𝐹) ≤ 𝑐 𝑀22/3 . E 𝜃 𝐿 (𝐹𝜃 , 𝐹) ≤ 𝑐 1/4 𝑛 𝑛 • Kantorovich 𝐿 1 transport distance. Here rates can be improved in terms of the moments 𝑀 𝑝 of higher order. In particular, we have 𝑝−1
E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤ 𝑐 𝑝 𝑀 𝑝 𝑛− 2 𝑝
( 𝑝 > 1),
√ where the constants 𝑐 𝑝 depend on 𝑝 only. However, a classical rate of 1/ 𝑛 from other contexts will not be achievable via these bounds. • Kolmogorov distance. Using the variance-type functionals 𝜎𝑝 , it is possible not only to replace the typical distribution 𝐹 with the normal distribution function Φ, thus proving a law of attraction for 𝐹𝜃 , but also to show a standard rate as well, assuming a finite third moment. Analogously to the classical Berry–Esseen theorem, it is shown that, if E |𝑋 | 2 = 𝑛 and E𝑋 = 𝑎, then 𝐴 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ √ 𝑛 with 𝐴 = 𝑐 (𝑚 33/2 + 𝜎33/2 + |𝑎|) up to some absolute constant 𝑐. Here, one may eliminate the parameter 𝑎, by using elementary bounds 𝑚 3 ≤ 𝑀32 and 𝜎3 ≤ 𝜎4 (the latter requires, however, the finiteness of the 4-th moment). A slightly worse estimate can also be derived under less restrictive moment assumptions. For example, log 𝑛 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ 𝑐 (𝑀22 + 𝜎2 ) √ . 𝑛
Preface
ix
• Trigonometric and other functional models of random variables. Modulo a logarithmic factor, the upper bounds such as log 𝑛 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ 𝑐 √ 𝑛 turn out to be optimal with respect to 𝑛 in many examples of orthonormal systems 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) of functions in 𝐿 2 . These include in particular the trigonometric system of size 𝑛 with components √ 𝑋2𝑘−1 (𝑡) = 2 cos(𝑘𝑡), √ 𝑋2𝑘 (𝑡) = 2 sin(𝑘𝑡), −𝜋 < 𝑡 < 𝜋, 𝑘 = 1, . . . , 𝑛/2 (𝑛 even), with respect to the normalized Lebesgue measure P on Ω = (−𝜋, 𝜋). More precisely, we derive lower bounds such as E 𝜃 𝜌(𝐹𝜃 , Φ) ≥ √
𝑐 𝑛 (log 𝑛) 𝑠
for some positive 𝑐 and 𝑠 independent of 𝑛. A similar bound also holds for the sequence of the first 𝑛 Chebyshev polynomials on the interval Ω = (−1, 1), for the Walsh system on the Boolean cube {−1, 1} 𝑝 (with 𝑛 = 2 𝑝 − 1), for systems of functions 𝑋 𝑘 (𝑡 1 , 𝑡2 ) = 𝑓 (𝑘𝑡1 + 𝑡 2 ) with 1-periodic 𝑓 (such functions 𝑋 𝑘 form a strictly stationary sequence of pairwise independent random variables on the square Ω = (0, 1) × (0, 1) under the restricted Lebesgue measure), and some others. • 𝐿 2 distance. In order to develop lower bounds as above, similar upper and lower bounds will be needed for the 𝐿 2 -distance 𝜔, in combination with upper bounds for the Kantorovich-distance 𝑊. A number of general results in this direction will be obtained under moment and correlation-type assumptions, as in the case of Kolmogorov distance 𝜌. In fact, in the case of 𝜔, the correct asymptotic behavior of E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) will be derived up to the order 1/𝑛2 . For instance, when√the random vector 𝑋 has an isotropic symmetric distribution and satisfies |𝑋 | = 𝑛 a.s. (and thus all 𝜎𝑝 = 0), one has E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ∼
1 E ⟨𝑋, 𝑌 ⟩ 4 𝑛4
with an error term of order 1/𝑛2 , and a similar result holds for the Gaussian limit Φ instead of the typical distribution 𝐹. Here, as before, 𝑌 denotes an independent copy of 𝑋. • Improved rates in the i.i.d. case. Returning to the classical i.i.d. model with E𝑋1 = 0, E𝑋12 = 1, a remarkable result due to Klartag and Sodin [125] which we include in this monograph improves the pointwise Berry–Esseen bound as follows E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐𝛽4 , 𝑛
𝛽4 = E𝑋14 .
x
Preface
In fact, this result holds in the non-i.i.d. situation as well with replacement of 𝛽4 with the arithmetic means of the 4-th moments of 𝑋 𝑘 . This bound can be complemented by corresponding large deviation bounds. Thus, for typical coefficients, the distances 𝜌(𝐹𝜃 , Φ) are at most of order 1/𝑛, which is not true in general when the coefficients are equal to each other! Moreover, we show that, if the distribution of 𝑋1 is symmetric, and the next moment 𝛽5 = E |𝑋1 | 5 is finite, it is possible to slightly correct the normal distribution Φ to obtain a better approximation such as E 𝜃 𝜌(𝐹𝜃 , 𝐺) ≤
𝑐𝛽5 . 𝑛3/2
Here, 𝐺 is a certain function of bounded variation which is determined by 𝛽4 and depends on 𝑛 (but not on 𝜃). • The second order correlation condition. Certainly Sudakov’s theorem begs the question whether or not similar results continue to hold for dependent components 𝑋 𝑘 . This is often the case, although the orthonormal systems mentioned above may serve as counter examples. More precisely, the variance functional Λ = Λ(𝑋) turns out to be responsible for improved rates of normal approximation for 𝐹𝜃 ’s on average and actually for most 𝜃’s. When 𝑋 has an isotropic symmetric distribution, it will be shown by virtue of the second order spherical concentration that E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 log 𝑛 Λ, 𝑛
which thus extends the i.i.d. case modulo a logarithmic factor. The symmetry assumption can be removed at the expense of additional terms reflecting higher order correlations. In particular, in the presence of the Poincaré-type inequality, we have E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 1 log 𝑛 −1 𝜆1 , 𝑛
which may be complemented by corresponding deviation bounds. • Distributions with many symmetries. Special attention will be devoted to the case where the distribution of 𝑋 is symmetric about all coordinate axes and isotropic (which reduces to the normalization condition E𝑋 𝑘2 = 1). The Λ-functional then simplifies, and under the 4-th moment condition, the Berry–Esseen bound “on average” takes the form E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 log 𝑛 max E𝑋 𝑘4 + 𝑉2 , 𝑘 ≤𝑛 𝑛
where 𝑉2 = sup Var(𝜃 1 𝑋12 + · · · + 𝜃 𝑛 𝑋𝑛2 ). 𝜃 ∈S𝑛−1
If additionally the distribution of 𝑋 is invariant under permutations of coordinates, it yields a simpler bound
Preface
xi
E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 log 𝑛 E𝑋14 + 𝜎42 . 𝑛
Here the last term 𝜎42 may be removed in some cases, e.g. when cov(𝑋12 , 𝑋22 ) ≤ 0. These results can be sharpened under some additional assumptions on the shape of the distribution of 𝑋. We include the proof of the following important variant of the central limit theorem due to Klartag [123]: If the random vector 𝑋 has an isotropic, coordinatewise symmetric log-concave distribution, then, for all 𝜃 ∈ S𝑛−1 , 𝜌(𝐹𝜃 , Φ) ≤ 𝑐
𝑛 ∑︁
𝜃 4𝑘
𝑘=1
up to some absolute constant 𝑐. Here, the average value of the right-hand side is of order 1/𝑛. Although the class of log-concave probability distributions is studied in many investigations, their basic properties are discussed in this text as well. In particular, we include the proof of the Brascamp–Lieb inequality, which serves as a main tool in Klartag’s theorem. Finally, in the last chapter we conclude with brief historical remarks on results about randomized variants of the central limit theorem, in which coefficients have a special structure. Acknowledgements. This work started in 2015 during the visit of the first author to the Bielefeld University, Germany, and he is grateful for their hospitality. The authors were supported by the SFB 701 and the SFB 1283/2 2021 – 317210226 at Bielefeld University. The work of the first author was also supported by the Humboldt Foundation, the Simons Foundation, and NSF grants DMS-1855575, DMS-2154001. It is our great pleasure to thank Michel Ledoux for valuable comments on the draft version of the monograph. In Memoriam. Shortly after completion of this book, Gennadiy Chistyakov passed away after a prolonged illness in December 2022. We mourn the loss of our friend and colleague.
Contents
Part I Generalities 1
Moments and Correlation Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Isotropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1 1.2 First Order Correlation Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3 Moments and Khinchine-type Inequalities . . . . . . . . . . . . . . . . . . . . . 6 Moment Functionals Using Independent Copies . . . . . . . . . . . . . . . . 8 1.4 1.5 Variance of the Euclidean Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.6 Small Ball Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 1.7 Second Order Correlation Condition . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2
Some Classes of Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 2.1 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pairwise Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 2.3 Coordinatewise Symmetric Distributions . . . . . . . . . . . . . . . . . . . . . . 2.4 Logarithmically Concave Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.5 Khinchine-type Inequalities for Norms and Polynomials . . . . . . . . . . 2.6 One-dimensional Log-concave Distributions . . . . . . . . . . . . . . . . . . . 2.7 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
23 23 29 30 34 38 43 48
3
Characteristic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Smoothing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Berry–Esseen-type Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 Lévy Distance and Zolotarev’s Inequality . . . . . . . . . . . . . . . . . . . . . . 3.4 Lower Bounds for the Kolmogorov Distance . . . . . . . . . . . . . . . . . . . 3.5 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
51 51 54 57 60 62
4
Sums of Independent Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 Cumulants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Lyapunov Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Rosenthal-type Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
63 63 67 69 xiii
Contents
xiv
4.4 4.5 4.6 4.7 4.8 4.9
Normal Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Expansions for the Product of Characteristic Functions . . . . . . . . . . . Higher Order Approximations of Characteristic Functions . . . . . . . . Edgeworth Corrections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rates of Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
72 74 77 80 83 87
Part II Selected Topics on Concentration 5
Standard Analytic Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 5.1 Moduli of Gradients in the Continuous Setting . . . . . . . . . . . . . . . . . . 91 5.2 Perimeter and Co-area Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 5.3 Poincaré-type Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 5.4 The Euclidean Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 5.5 Isoperimetry and Cheeger-type Inequalities . . . . . . . . . . . . . . . . . . . . 101 Rothaus Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 5.6 5.7 Standard Examples and Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 5.8 Canonical Gaussian Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 5.9 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6
Poincaré-type Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 6.1 Exponential Integrability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 6.2 Growth of 𝐿 𝑝 -norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 6.3 Moment Functionals. Small Ball Probabilities . . . . . . . . . . . . . . . . . . 118 6.4 Weighted Poincaré-type Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . 121 6.5 The Brascamp–Lieb Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 6.6 Coordinatewise Symmetric Log-concave Distributions . . . . . . . . . . . 126 6.7 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7
Logarithmic Sobolev Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 7.1 The Entropy Functional and Relative Entropy . . . . . . . . . . . . . . . . . . 131 7.2 Definitions and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 7.3 Exponential Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 7.4 Bounds Involving Relative Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . 140 7.5 Orlicz Norms and Growth of 𝐿 𝑝 -norms . . . . . . . . . . . . . . . . . . . . . . . 141 7.6 Bounds Involving Second Order Derivatives . . . . . . . . . . . . . . . . . . . 144 7.7 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8
Supremum and Infimum Convolutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 8.1 Regularity and Analytic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 8.2 Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 8.3 Hamilton–Jacobi Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156 8.4 Supremum/Infimum Convolution Inequalities . . . . . . . . . . . . . . . . . . 159 8.5 Transport-Entropy Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 8.6 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
Contents
xv
Part III Analysis on the Sphere 9
Sobolev-type Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 9.1 Spherical Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 9.2 Second Order Modulus of Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 9.3 Spherical Laplacian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175 9.4 Poincaré and Logarithmic Sobolev Inequalities . . . . . . . . . . . . . . . . . 178 9.5 Isoperimetric and Cheeger-type Inequalities . . . . . . . . . . . . . . . . . . . . 181 9.6 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
10 Second Order Spherical Concentration . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 10.1 Second Order Poincaré-type Inequalities . . . . . . . . . . . . . . . . . . . . . . . 185 10.2 Bounds on the 𝐿 2 -norm in the Euclidean Setup . . . . . . . . . . . . . . . . . 188 10.3 First Order Concentration Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . 190 10.4 Second Order Concentration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192 10.5 Second Order Concentration With Linear Parts . . . . . . . . . . . . . . . . . 194 10.6 Deviations for Some Elementary Polynomials . . . . . . . . . . . . . . . . . . 197 10.7 Polynomials of Fourth Degree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200 10.8 Large Deviations for Weighted ℓ 𝑝 -norms . . . . . . . . . . . . . . . . . . . . . . 204 10.9 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 11 Linear Functionals on the Sphere . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 11.1 First Order Normal Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 11.2 Second Order Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210 11.3 Characteristic Function of the First Coordinate . . . . . . . . . . . . . . . . . 212 11.4 Upper Bounds on the Characteristic Function . . . . . . . . . . . . . . . . . . . 215 11.5 Polynomial Decay at Infinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218 11.6 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220 Part IV First Applications to Randomized Sums 12 Typical Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 12.1 Concentration Problems for Weighted Sums . . . . . . . . . . . . . . . . . . . . 223 12.2 The Structure of Typical Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 225 12.3 Normal Approximation for Gaussian Mixtures . . . . . . . . . . . . . . . . . . 227 12.4 Approximation in Total Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229 12.5 𝐿 𝑝 -distances to the Normal Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234 12.6 Lower Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237 12.7 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239 13 Characteristic Functions of Weighted Sums . . . . . . . . . . . . . . . . . . . . . . . . 241 13.1 Upper Bounds on Characteristic Functions . . . . . . . . . . . . . . . . . . . . . 241 13.2 Concentration Functions of Weighted Sums . . . . . . . . . . . . . . . . . . . . 244 13.3 Deviations of Characteristic Functions . . . . . . . . . . . . . . . . . . . . . . . . 245 13.4 Deviations in the Symmetric Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248 13.5 Deviations in the Non-symmetric Case . . . . . . . . . . . . . . . . . . . . . . . . 251
Contents
xvi
13.6 The Linear Part of Characteristic Functions . . . . . . . . . . . . . . . . . . . . 255 13.7 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 14 Fluctuations of Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259 14.1 The Kantorovich Transport Distance . . . . . . . . . . . . . . . . . . . . . . . . . . 259 14.2 Large Deviations for the Kantorovich Distance . . . . . . . . . . . . . . . . . . 264 14.3 Pointwise Fluctuations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266 14.4 The Lévy Distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269 14.5 Berry–Esseen-type Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274 14.6 Preliminary Bounds on the Kolmogorov Distance . . . . . . . . . . . . . . . 278 14.7 Bounds With a Standard Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282 14.8 Deviation Bounds for the Kolmogorov Distance . . . . . . . . . . . . . . . . . 287 14.9 The Log-concave Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289 14.10 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296 Part V Refined Bounds and Rates 15 𝑳 2 Expansions and Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299 15.1 General Approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299 15.2 Bounds for 𝐿 2 -distance With a Standard Rate . . . . . . . . . . . . . . . . . . . 302 15.3 Expansion With Error of Order 𝑛−1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 305 15.4 Two-sided Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307 15.5 Asymptotic Formulas in the General Case . . . . . . . . . . . . . . . . . . . . . 310 15.6 General Lower Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314 16 Refinements for the Kolmogorov Distance . . . . . . . . . . . . . . . . . . . . . . . . . 317 16.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317 16.2 Large Interval. Final Upper Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320 16.3 Relations Between Kantorovich, 𝐿 2 and Kolmogorov distances . . . . 323 16.4 Lower Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326 16.5 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330 17 Applications of the Second Order Correlation Condition . . . . . . . . . . . . 331 17.1 Mean Value of 𝜌(𝐹𝜃 , Φ) Under the Symmetry Assumption . . . . . . . 331 17.2 Berry–Esseen Bounds Involving Λ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 17.3 Deviations Under Moment Conditions . . . . . . . . . . . . . . . . . . . . . . . . . 337 17.4 The Case of Non-symmetric Distributions . . . . . . . . . . . . . . . . . . . . . 340 17.5 The Mean Value of 𝜌(𝐹𝜃 , Φ) in the Presence of Poincaré Inequalities344 17.6 Deviations of 𝜌(𝐹𝜃 , Φ) in the Presence of Poincaré Inequalities . . . 348 17.7 Relation to the Thin Shell Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 351 17.8 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
Contents
xvii
Part VI Distributions and Coefficients of Special Type 18 Special Systems and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357 18.1 Systems with Lipschitz Condition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357 18.2 Trigonometric Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362 18.3 Chebyshev Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364 18.4 Functions of the Form 𝑋 𝑘 (𝑡, 𝑠) = 𝑓 (𝑘𝑡 + 𝑠) . . . . . . . . . . . . . . . . . . . . 365 18.5 The Walsh System on the Discrete Cube . . . . . . . . . . . . . . . . . . . . . . . 367 18.6 Empirical Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 368 18.7 Lacunary Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370 18.8 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374 19 Distributions With Symmetries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375 19.1 Coordinatewise Symmetric Distributions . . . . . . . . . . . . . . . . . . . . . . 375 19.2 Behavior On Average . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378 19.3 Coordinatewise Symmetry and Log-concavity . . . . . . . . . . . . . . . . . . 380 19.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387 20 Product Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389 20.1 Edgeworth Expansion for Weighted Sums . . . . . . . . . . . . . . . . . . . . . . 389 20.2 Approximation of Characteristic Functions of Weighted Sums . . . . . 392 20.3 Integral Bounds on Characteristic Functions . . . . . . . . . . . . . . . . . . . . 394 20.4 Approximation in the Kolmogorov Distance . . . . . . . . . . . . . . . . . . . . 397 20.5 Normal Approximation Under the 4-th Moment Condition . . . . . . . . 400 20.6 Approximation With Rate 𝑛−3/2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402 20.7 Lower Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406 20.8 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409 21 Coefficients of Special Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 21.1 Bernoulli Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411 21.2 Random Sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413 21.3 Existence of Infinite Subsequences of Indexes . . . . . . . . . . . . . . . . . . 414 21.4 Selection of Indexes from Integer Intervals . . . . . . . . . . . . . . . . . . . . . 416 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419 Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
Part I
Generalities
Chapter 1
Moments and Correlation Conditions
In this chapter, we introduce general functionals of an algebraic character associated to probability distributions on R𝑛 with sufficiently many finite moments. These include maximal moments of linear functionals of a given order and the characteristics defined via correlation-type conditions. Such functionals appear to be natural tools in the study of Khinchine-type inequalities, as well as in the estimation of “small ball” probabilities. As usual, R𝑛 denotes the real Euclidean 𝑛-space, endowed with the Euclidean norm |𝑥| = (𝑥12 + · · · + 𝑥 𝑛2 ) 1/2 and the inner product ⟨𝑥, 𝑦⟩ = 𝑥1 𝑦 1 + · · · + 𝑥 𝑛 𝑦 𝑛 , where 𝑥 = (𝑥1 , . . . , 𝑥 𝑛 ), 𝑦 = (𝑦 1 , . . . , 𝑦 𝑛 ) ∈ R𝑛 .
1.1 Isotropy A random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R𝑛 defined on a probability space (Ω, F , P) is said to have finite second moment if the expectation E |𝑋 | 2 is finite (that is, when all components 𝑋𝑖 have finite second moments E𝑋𝑖2 ). Let us start with the following classical notion. Definition 1.1.1 A random vector 𝑋 in R𝑛 with finite second moment is called isotropic, or one says that 𝑋 has an isotropic distribution, if E ⟨𝑋, 𝑎⟩ 2 = |𝑎| 2
for all 𝑎 ∈ R𝑛 .
This definition is frequently used in Convex Geometry, especially for random vectors which are uniformly distributed over a convex body (in which case the body is called isotropic, cf. [144]). In the probabilistic context, the above condition may be rewritten in terms of the components of 𝑋 as E𝑋𝑖 𝑋 𝑗 = 𝛿𝑖 𝑗 ,
𝑖, 𝑗 = 1, . . . , 𝑛,
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_1
3
4
1 Moments and Correlation Conditions
where 𝛿𝑖 𝑗 = 1 {𝑖= 𝑗 } is the Kronecker symbol. Often, one additionally assumes that all 𝑋𝑖 have mean zero, and then the above means that these random variables are non-correlated and have variances one. The property of being isotropic is stable under orthogonal transformations: the random vector 𝑌 = 𝑈 𝑋 is isotropic, whenever 𝑋 is isotropic, and 𝑈 : R𝑛 → R𝑛 is an orthogonal linear operator (or an orthogonal matrix). With a random vector 𝑋 in R𝑛 having finite second moment, we associate a linear operator 𝑅, or 𝑛 × 𝑛 matrix, defined by E ⟨𝑋, 𝑎⟩ 2 = ⟨𝑅𝑎, 𝑎⟩ ,
𝑎 ∈ R𝑛 .
It is a covariance matrix of 𝑋, when 𝑋 has mean zero, but we call it so in the general case as well. Then, the isotropy condition means that 𝑅 is an identity matrix (or, an identity linear operator on R𝑛 ). In general, if 𝑅 is non-degenerate, there always exists a linear map 𝐴 : R𝑛 → R𝑛 such that 𝑌 = 𝐴𝑋 is an isotropic random vector. Indeed, one may take 𝐴 = 𝑅 −1/2 . In many problems, the isotropy condition thus simply represents a normalization only, and often does not lead to the loss of generality. Let us mention a few immediate consequences from isotropy. Proposition 1.1.2 Assume that a random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R𝑛 is isotropic. Then 𝑎) 𝑏) 𝑐) 𝑑)
E𝑋 𝑘2 = 1 for all 𝑘 = 1, . . . , 𝑛. Hence E Í |𝑋 | 2 = 𝑛. We have 𝑛𝑘=1 (E𝑋 𝑘 ) 2 ≤ 1. If 𝑌 is an independent copy of 𝑋, then E ⟨𝑋, 𝑌 ⟩ 2 = 𝑛.
The property 𝑐) follows from Bessel’s inequality, using orthogonality of 𝑋𝑖 as elements in the space 𝐿 2 (Ω, F , P). The last assertion is also obvious: 2
E ⟨𝑋, 𝑌 ⟩ = E
∑︁ 𝑛 𝑖=1
2 𝑋𝑖𝑌𝑖
=
𝑛 ∑︁
(E 𝑋𝑖 𝑋 𝑗 ) 2 = 𝑛.
𝑖, 𝑗=1
Thus, although isotropy does not mean that the 𝑋 𝑘 have mean zero, the expectations E𝑋 𝑘 must be bounded by 1 and must be small on average. The assertion 𝑑), which may be taken as a natural weakening of isotropy, is equivalent to the property that 𝑛 1 ∑︁ 2 𝜆 = 1, 𝑛 𝑘=1 𝑘
where the 𝜆 𝑘 are eigenvalues of the covariance operator 𝑅 of 𝑋. Indeed, if 𝑈 is an orthogonal linear operator on R𝑛 , then the correlation operator of the random e = 𝑈 𝑋 is equal to 𝑅 e = 𝑈 𝑅𝑈 −1 . Moreover, since 𝑅 is symmetric, one can vector 𝑋 e is diagonal, with diagonal elements 𝜆 𝑘 being always find 𝑈 such that the matrix 𝑅 e2 = 𝜆 𝑘 . Since the eigenvalues of 𝑅. In particular, E 𝑋 𝑘
1.2 First Order Correlation Condition
5
𝑛 D E2 ∑︁ e 𝑎 = ⟨𝑅𝑎, 𝑎⟩ = 𝜆 𝑘 𝑎 2𝑘 , E 𝑋,
𝑎 ∈ R𝑛 ,
𝑘=1
we conclude that 𝑛 𝑛 D E2 ∑︁ ∑︁ e 𝑌e = E e2 = E ⟨𝑋, 𝑌 ⟩ 2 = E ⟨𝑈 𝑋, 𝑈𝑌 ⟩ 2 = E 𝑋, 𝜆2𝑘 , 𝜆𝑘 𝑋 𝑘 𝑘=1
𝑘=1
where 𝑌 is an independent copy of 𝑋 and 𝑌e = 𝑈𝑌 . On the other hand, e| 2 = E |𝑋 | 2 = E | 𝑋
𝑛 ∑︁
e2 = E𝑋 𝑘
𝑛 ∑︁
𝜆𝑘 .
𝑘=1
𝑘=1
Í Í By Cauchy’s inequality, 𝑛1 𝑛𝑘=1 𝜆2𝑘 ≥ 𝑛1 ( 𝑛𝑘=1 𝜆 𝑘 ) 2 with equality if and only if all eigenvalues 𝜆 𝑘 are equal to each other. Hence, we have the following general relation, which also allows one to compare 𝑏) with property 𝑑) from Proposition 1.1.2. Proposition 1.1.3 For any random vector 𝑋 in R𝑛 with finite second moment, E ⟨𝑋, 𝑌 ⟩ 2 ≥
2 1 E |𝑋 | 2 , 𝑛
where 𝑌 is an independent copy of 𝑋. Moreover, assuming that E |𝑋 | 2 = 𝑛, we have E ⟨𝑋, 𝑌 ⟩ 2 = 𝑛 if and only if 𝑋 is isotropic.
1.2 First Order Correlation Condition A natural weakening of the isotropy is the following property requiring that the components 𝑋 𝑘 of 𝑋 are almost non-correlated. Definition 1.2.1 We say that a random vector 𝑋 in R𝑛 satisfies a first order correlation condition with constant 𝑀2 if E ⟨𝑋, 𝑎⟩ 2
1/2
≤ 𝑀2 |𝑎|
for all 𝑎 ∈ R𝑛 .
The optimal value of this constant is given by 𝑀2 (𝑋) = sup (E ⟨𝑋, 𝜃⟩ 2 ) 1/2 , 𝜃 ∈S𝑛−1
where the supremum extends over the unit sphere S𝑛−1 = {𝜃 ∈ R𝑛 : |𝜃| = 1}. Representing the maximal eigenvalue of the covariance matrix 𝑅 associated with 𝑋, 𝑀2 (𝑋) = max 𝜆 𝑘 , 𝑘
6
1 Moments and Correlation Conditions
it is invariant under orthogonal transformation of the space and therefore does not depend on the choice of the coordinate system in R𝑛 . Similarly to Proposition 1.1.2, we have: Proposition 1.2.2 Assume that a random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R𝑛 satisfies a first order correlation condition with constant 𝑀2 . Then 𝑎) 𝑏) 𝑐) 𝑑)
E𝑋 𝑘2 ≤ 𝑀22 for all 𝑘 = 1, . . . , 𝑛. Hence E |𝑋 | 2 ≤ 𝑀22 𝑛. Í We have 𝑛𝑘=1 (E𝑋 𝑘 ) 2 ≤ 𝑀22 , that is, |E𝑋 | 2 ≤ 𝑀22 . 𝑀2 (𝑋 − E𝑋) ≤ 𝑀2 (𝑋). To derive 𝑐), note that the vector E𝑋 = (E𝑋1 , . . . , E𝑋𝑛 ) satisfies ⟨E𝑋, 𝜃⟩ 2 ≤ E ⟨𝑋, 𝜃⟩ 2 ≤ 𝑀22 |𝜃| 2 = 𝑀22
for all 𝜃 ∈ S𝑛−1 . Taking the maximum over all 𝜃 yields the desired bound | E𝑋 | ≤ 𝑀2 . For the inequality in 𝑑), just note that, for all 𝑎 ∈ R𝑛 , E ⟨𝑋, 𝑎⟩ 2 ≥ Var(⟨𝑋, 𝑎⟩) = E ⟨𝑋 − E𝑋, 𝑎⟩ 2 . Proposition 1.2.2 is proved. Another useful observation is: Proposition 1.2.3 If the random variables 𝑋1 , . . . , 𝑋𝑛 are non-correlated, with means E𝑋 𝑘 = 0 and variances 𝑣2𝑘 = Var(𝑋 𝑘 ), 𝑣 𝑘 ≥ 0, then 𝑀2 (𝑋) = max 𝑘 𝑣 𝑘 .
1.3 Moments and Khinchine-type Inequalities The functional 𝑀2 (𝑋) has a natural generalization in terms of the 𝑝-th absolute moments of linear functionals of 𝑋. Definition 1.3.1 Given a random vector 𝑋 in R𝑛 and a number 𝑝 ≥ 1, put 𝑀 𝑝 (𝑋) = sup
E | ⟨𝑋, 𝜃⟩ | 𝑝
1/ 𝑝 .
𝜃 ∈S𝑛−1
These quantities are not decreasing in 𝑝. An inequality of the form 𝑀 𝑝 (𝑋) ≤ 𝑀 means that 𝑋 satisfies a 𝑝-th moment condition (with constant 𝑀), in full analogy with the terminology accepted for i.i.d. (that is, independent, identically distributed) random variables. In the case 𝑝 = 2, we are reduced to Definition 1.2.1. However, in the sequel we will be especially interested in 𝑝 = 4 and larger values. Recall that, given a random variable 𝜉, by Hölder’s inequality, the 𝐿 𝑝 -norms ∥𝜉 ∥ 𝑝 = (E |𝜉 | 𝑝 ) 1/ 𝑝 do not decrease. The reverse relations of the form ∥𝜉 ∥ 𝑝 ≤ 𝐶 𝑝,𝑞 ∥𝜉 ∥ 𝑞 ,
0 < 𝑞 < 𝑝,
1.3 Moments and Khinchine-type Inequalities
7
when they are stated for classes of random variables with constants 𝐶 𝑝,𝑞 independent of members of a given class, are commonly called Khinchine-type inequalities. For example, if the random vector 𝑋 is isotropic, we get by the definition that ∥ ⟨𝑋, 𝑎⟩ ∥ 𝑝 ≤ 𝑀 𝑝 (𝑋) ∥ ⟨𝑋, 𝑎⟩ ∥ 2
for any 𝑎 ∈ R𝑛 ,
and the constant here is best possible. This relation represents a Khinchine-type inequality with 𝑝 > 2 and 𝑞 = 2. The quantity 𝑀 𝑝 (𝑋) may be used to control the 𝑝-th moment of the Euclidean length |𝑋 |, according to the following extension of Proposition 1.2.2 in part 𝑏). Proposition 1.3.2 Given 𝑝 ≥ 2, for any random vector 𝑋 in R𝑛 , √ (E |𝑋 | 𝑝 ) 1/ 𝑝 ≤ 𝑀 𝑝 (𝑋) 𝑛. Note that, if 𝑋 is isotropic, there is an opposite inequality √ (E |𝑋 | 𝑝 ) 1/ 𝑝 ≥ (E |𝑋 | 2 ) 1/2 = 𝑛. √ Hence, the rate 𝑛 for the 𝐿 𝑝 -norm of |𝑋 | with respect to 𝑛 is correct as long as 𝑀 𝑝 (𝑋) is of order 1. Proof If 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) is a random vector uniformly distributed in the unit sphere S𝑛−1 , the distribution of the linear functional ⟨𝑎, 𝜃⟩ depends on the Euclidean length |𝑎|, only. In particular, taking 𝑎 = |𝑎|𝑒 1 = (|𝑎|, 0, . . . , 0), we have E 𝜃 | ⟨𝑎, 𝜃⟩ | 𝑝 = |𝑎| 𝑝 E 𝜃 |𝜃 1 | 𝑝 ,
𝑎 ∈ R𝑛 ,
where E 𝜃 denotes the integral over the uniform probability measure 𝔰𝑛−1 on S𝑛−1 . Inserting here 𝑎 = 𝑋, we get E 𝜃 | ⟨𝑋, 𝜃⟩ | 𝑝 = |𝑋 | 𝑝 E 𝜃 |𝜃 1 | 𝑝 . Next, take the expectation with respect to 𝑋 and use E | ⟨𝑋, 𝜃⟩ | 𝑝 ≤ 𝑀 𝑝 (𝑋) to arrive at the upper bound 𝑀 𝑝𝑝 (𝑋) . E |𝑋 | 𝑝 ≤ E 𝜃 |𝜃 1 | 𝑝 Moreover, since E 𝜃 𝜃 12 = 𝑛1 , we get 1 (E 𝜃 |𝜃 1 | 𝑝 ) 1/ 𝑝 ≥ (E 𝜃 |𝜃 1 | 2 ) 1/2 = √ . 𝑛 This proves the proposition.
□
The growth of the functionals 𝑀 𝑝 can be controlled by means of the Orlicz norms, which may be stronger in comparison with 𝐿 𝑝 -norms. In particular, the use of the exponential moments leads to the following:
8
1 Moments and Correlation Conditions
Proposition 1.3.3 For a number 𝜆 > 0, assume that E e | ⟨𝑋, 𝜃 ⟩ |/𝜆 ≤ 2 for all 𝜃 ∈ S𝑛−1 . Then for all 𝑝 ≥ 1, 𝑀 𝑝 (𝑋) ≤ 𝜆𝑝. Proof Put 𝜉 = | ⟨𝑋, 𝜃⟩ |. Using the inequality 𝑥 𝑝 e−𝑥 ≤ ( 𝑝e ) 𝑝 , 𝑥 ≥ 0, we have E 𝜉 𝑝 = 𝜆𝑝 E
𝜉 𝑝
e− 𝜉 /𝜆 e 𝜉 /𝜆 ≤
𝜆
𝜆𝑝 𝑝 e
E e 𝜉 /𝜆 ≤ 2
𝜆𝑝 𝑝 e
< (𝜆𝑝) 𝑝 .
It remains to take the maximum over all 𝜃.
□
1.4 Moment Functionals Using Independent Copies As closely related moment-type functionals, let us also introduce the following. Definition 1.4.1 Given a random vector 𝑋 in R𝑛 and a number 𝑝 ≥ 1, put 1/ 𝑝 1 𝑚 𝑝 (𝑋) = √ E | ⟨𝑋, 𝑌 ⟩ | 𝑝 , 𝑛 where 𝑌 is an independent copy of 𝑋. In particular, as we discussed in Section 1.1, we have 𝑛 1 ∑︁ 2 1/2 𝜆 𝑚 2 (𝑋) = √ 𝑛 𝑘=1 𝑘
in terms of the eigenvalues 𝜆 𝑘 of the covariance matrix associated with 𝑋. These quantities 𝑀 𝑝 are also non-decreasing in 𝑝 and are invariant under linear orthogonal transformations of Euclidean space. The next elementary observation will play an important role in our study of the randomized central limit theorem. Proposition 1.4.2 Given independent random vectors 𝑋 and 𝑌 in R𝑛 , for any 𝑝 ≥ 2, 1/ 𝑝 1 ≤ 𝑀 𝑝 (𝑋) 𝑀 𝑝 (𝑌 ). √ E | ⟨𝑋, 𝑌 ⟩ | 𝑝 𝑛 In particular, 𝑚 𝑝 (𝑋) ≤ 𝑀 𝑝2 (𝑋). Indeed, by Definition 1.3.1, for any particular value of 𝑌 , E𝑋 | ⟨𝑋, 𝑌 ⟩ | 𝑝 ≤ 𝑀 𝑝𝑝 (𝑋) |𝑌 | 𝑝 . Here one should take the expectation with respect to 𝑌 and apply Proposition 1.3.2.
1.4 Moment Functionals Using Independent Copies
9
A possible rate of growth of 𝑚 𝑝 may be controlled, for example, by the exponential moment assumption sup E e | ⟨𝑋, 𝜃 ⟩ |/𝜆 ≤ 2
(𝜆 > 0).
𝜃 ∈S𝑛−1
Here one may combine Proposition 1.4.2 with Proposition 1.3.3 to conclude that 𝑚 𝑝 ≤ (𝜆𝑝) 2 . This bound implies that for some absolute constant 𝑐 > 0, 𝑐 √︁ ⟨𝑋, ⟩ E exp | 𝑌 | ≤ 2. 𝜆𝑛1/4 Another natural, although rough, bound for the functional 𝑚 𝑝 may be given for certain classes of interest like in the following elementary statement. √ Proposition 1.4.3 If the random vector 𝑋 is isotropic, and |𝑋 | ≤ 𝑏 𝑛 a.s. for some 𝑏 > 0, then 𝑚 3 ≤ 𝑏 2/3 𝑛1/6 , 𝑚 4 ≤ 𝑏 𝑛1/4 . To show this, let 𝑌 be an independent copy of 𝑋. Then | ⟨𝑋, 𝑌 ⟩ | ≤ 𝑏 2 𝑛 a.s., while E ⟨𝑋, 𝑌 ⟩ 2 = 𝑛. Hence 1/3 1/3 1 1 𝑚 3 = √ E | ⟨𝑋, 𝑌 ⟩ | 3 ≤ √ E ⟨𝑋, 𝑌 ⟩ 2 · 𝑏 2 𝑛 = 𝑏 2/3 𝑛1/6 𝑛 𝑛 and
1/4 1/4 1 1 𝑚 4 = √ E ⟨𝑋, 𝑌 ⟩ 4 ≤ √ E ⟨𝑋, 𝑌 ⟩ 2 · 𝑏 4 𝑛2 = 𝑏 𝑛1/4 . 𝑛 𝑛
In addition to 𝑚 𝑝 (𝑋), for integers 𝑝 ≥ 1 one may also consider related functionals 1/ 𝑝 1 𝛼 𝑝 (𝑋) = √ E ⟨𝑋, 𝑌 ⟩ 𝑝 , 𝑛 where 𝑌 is an independent copy of 𝑋 (of course, 𝛼 𝑝 = 𝑚 𝑝 for even 𝑝). Once 𝑋 has a finite moment of order 𝑝, 𝛼 𝑝 (𝑋) is well-defined, finite and non-negative. Indeed, writing 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ), 𝑌 = (𝑌1 , . . . , 𝑌𝑛 ), we have E ⟨𝑋, 𝑌 ⟩ 𝑝 =
𝑛 ∑︁
2
E 𝑋𝑖1 . . . 𝑋𝑖 𝑝
≥ 0.
𝑖1 ,...,𝑖 𝑝 =1
The argument can be extended to involve more general functionals. Proposition 1.4.4 Let 𝑋 be a random vector in R𝑛 with finite moment of an integer order 𝑝 ≥ 1, and 𝑌 be its independent copy. For any real number 0 ≤ 𝛼 ≤ 𝑝, E
⟨𝑋, 𝑌 ⟩ 𝑝 ≥ 0, (|𝑋 | 2 + |𝑌 | 2 ) 𝛼
10
1 Moments and Correlation Conditions
where the ratio is defined to be zero if 𝑋 = 𝑌 = 0. In addition, for 𝛼 ∈ [0, 2], E
⟨𝑋, 𝑌 ⟩ 2 |𝑋 | 2 |𝑌 | 2 1 E ≥ . 𝑛 (|𝑋 | 2 + |𝑌 | 2 ) 𝛼 (|𝑋 | 2 + |𝑌 | 2 ) 𝛼
Proof First, let us note that E
| ⟨𝑋, 𝑌 ⟩ | 𝑝 (|𝑋 | |𝑌 |) 𝑝 ≤E = (E |𝑋 | 𝑝−𝛼 ) 2 , 2 2 𝛼 (|𝑋 | |𝑌 |) 𝛼 (|𝑋 | + |𝑌 | )
so, the expectation is finite. Moreover, without loss of generality, we may assume that 0 < 𝛼 ≤ 𝑝 and 𝜉 = |𝑋 | 2 + |𝑌 | 2 > 0 with probability 1. We use the identity ∫ ∞ ∫ ∞ 1/𝛼 −𝛼 exp − 𝜉𝑡 d𝑡 = 𝑐 𝛼 𝜉 where 𝑐 𝛼 = exp − 𝑠1/𝛼 d𝑠, 0
0
which gives 𝑐 𝛼 E ⟨𝑋, 𝑌 ⟩ 𝑝 𝜉 −𝛼 =
∫
∞
E ⟨𝑋, 𝑌 ⟩ 𝑝 exp − 𝜉𝑡 1/𝛼 d𝑡.
0
Writing 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ), 𝑌 = (𝑌1 , . . . , 𝑌𝑛 ), we have E ⟨𝑋, 𝑌 ⟩ 𝑝 exp − 𝜉𝑡 1/𝛼 = E ⟨𝑋, 𝑌 ⟩ 𝑝 exp − 𝑡 1/𝛼 (|𝑋 | 2 + |𝑌 | 2 ) 𝑛 ∑︁ 2 = E 𝑋𝑖1 . . . 𝑋𝑖 𝑝 exp − 𝑡 1/𝛼 |𝑋 | 2 , 𝑖1 ,...,𝑖 𝑝 =1
which shows that the left expectation is always non-negative. Integrating over 𝑡 > 0, this proves the first assertion of the proposition. For the second assertion, write ∫ ∞ 𝑐 𝛼 E ⟨𝑋, 𝑌 ⟩ 2 𝜉 −𝛼 = E ⟨𝑋, 𝑌 ⟩ 2 exp − 𝑡 1/𝛼 (|𝑋 | 2 + |𝑌 | 2 ) d𝑡 ∫0 ∞ = E ⟨𝑋 (𝑡), 𝑌 (𝑡)⟩ 2 d𝑡, 0
where 𝑋 (𝑡) = exp − 𝑡 1/𝛼 |𝑋 | 2 /2 𝑋,
𝑌 (𝑡) = exp − 𝑡 1/𝛼 |𝑌 | 2 /2 𝑌 .
Since 𝑌 (𝑡) represents an independent copy of 𝑋 (𝑡), one may apply Proposition 1.1.3 to these random vectors, which gives E ⟨𝑋 (𝑡), 𝑌 (𝑡)⟩ 2 ≥
1 E |𝑋 (𝑡)| 2 |𝑌 (𝑡)| 2 . 𝑛
1.5 Variance of the Euclidean Norm
Hence ∫
∞
11
∫ 1 ∞ E |𝑋 (𝑡)| 2 |𝑌 (𝑡)| 2 d𝑡 𝑛 0 ∫ 1 ∞ E |𝑋 | 2 |𝑌 | 2 exp − 𝑡 1/𝛼 (|𝑋 | 2 + |𝑌 | 2 ) d𝑡 = 𝑛 0 𝑐𝛼 E |𝑋 | 2 |𝑌 | 2 𝜉 −𝛼 . = 𝑛
E ⟨𝑋 (𝑡), 𝑌 (𝑡)⟩ 2 d𝑡 ≥
0
This proves the second inequality of the proposition.
□
1.5 Variance of the Euclidean Norm For various reasons, one often needs to use correlations of products 𝑋𝑖 𝑋 𝑗 . In fact, already the information about the mutual covariances of 𝑋𝑖2 (i.e., when 𝑖 = 𝑗) has applications to randomized variants of the central limit theorem. A particular functional of significant interest is 𝜎42 (𝑋) =
Var(|𝑋 | 2 ) Var(𝑋12 + · · · + 𝑋𝑛2 ) = 𝑛 𝑛
(𝜎4 ≥ 0).
If E |𝑋 | 2 = 𝑛, the boundedness of this quantity means that the values of 𝑋 are concentrated√around the Euclidean sphere of radius E |𝑋 | (which is typically large and of order 𝑛). This can be illustrated by the following observation. Proposition 1.5.1 For any random vector 𝑋 in R𝑛 such that E |𝑋 | 2 > 0, we have Var(|𝑋 |) ≤
Var(|𝑋 | 2 ) . E |𝑋 | 2
Thus, if E |𝑋 | 2 = 𝑛 (in particular, if 𝑋 is isotropic), then Var(|𝑋 |) ≤ 𝜎42 (𝑋). The first inequality follows from the general relation E𝜉 2 Var(𝜉) ≤ Var(𝜉 2 ), which is valid for any non-negative random variable 𝜉. Indeed, assuming that the second moment 𝑎 2 = E𝜉 2 is finite (𝑎 > 0), we have Var(𝜉 2 ) = E (𝜉 2 − 𝑎 2 ) 2 = E (𝜉 − 𝑎) 2 (𝜉 + 𝑎) 2 ≥ E (𝜉 − 𝑎) 2 · 𝑎 2 ≥ Var(𝜉) · 𝑎 2 . The advantage of Var(|𝑋 |) compared with 𝜎42 is that it is finite under the second moment condition (i.e., when E |𝑋 | 2 < ∞), while the finiteness of 𝜎42 requires the 4-th moment condition. On the other hand, often the latter functional is easier to handle. In addition, similarly to 𝑀4 (𝑋), 𝜎42 (𝑋) allows one to control the 𝐿 4 -norm of |𝑋 |, since, by the very definition and using Proposition 1.2.2,
12
1 Moments and Correlation Conditions
E |𝑋 | 4 = (E |𝑋 | 2 ) 2 + 𝜎42 (𝑋)𝑛 ≤ 𝑀24 (𝑋) 𝑛2 + 𝜎42 (𝑋)𝑛. This bound can be an alternative to the one given in Proposition 1.3.2, i.e., to E |𝑋 | 4 ≤ 𝑀44 (𝑋) 𝑛2 . Let us also mention a simple observation about the behavior of 𝜎4 under shifts. Proposition 1.5.2 For any random vector 𝑋 in R𝑛 , 2 𝜎4 (𝑋 − E𝑋) ≤ 𝜎4 (𝑋) + √ 𝑀22 (𝑋). 𝑛 Proof Let us write 𝑀2 = 𝑀2 (𝑋), 𝜎4 = 𝜎4 (𝑋), and 𝑎 = E𝑋. Since |𝑋 − 𝑎| 2 = |𝑋 | 2 − 2 ⟨𝑋, 𝑎⟩ + |𝑎| 2 , we have Var(|𝑋 − 𝑎| 2 ) = Var(|𝑋 | 2 ) + 4 Var(⟨𝑋, 𝑎⟩) − 4 cov(|𝑋 | 2 , ⟨𝑋, 𝑎⟩). By Proposition 1.2.2, |𝑎| ≤ 𝑀2 , so, Var(⟨𝑋, 𝑎⟩) ≤ E ⟨𝑋, 𝑎⟩ 2 ≤ 𝑀22 |𝑎| 2 ≤ 𝑀24 , and, by Cauchy’s inequality, cov(|𝑋 | 2 , ⟨𝑋, 𝑎⟩) ≤
√︃ √︁ √ Var(|𝑋 | 2 ) E ⟨𝑋, 𝑎⟩ 2 ≤ 𝜎4 𝑛 𝑀22 .
Collecting these bounds, we get that √ √ 2 Var(|𝑋 − 𝑎| 2 ) ≤ 𝜎42 𝑛 + 4𝑀24 + 4𝜎4 𝑛 𝑀22 = 𝜎4 𝑛 + 2𝑀22 , so that
2 2 𝜎42 (𝑋 − 𝑎) ≤ 𝜎4 + √ 𝑀22 . 𝑛
Hence, Proposition 1.5.2 is proved.
□
In the light of Proposition 1.5.1, it is natural to look for conditions that ensure an equivalence of Var(|𝑋 |) and 𝜎42 (𝑋). Proposition 1.5.3 Suppose that E |𝑋 | 2 = 𝑛. 𝑎) If |𝑋 | 2 ≤ 2𝑛 a.s., then 𝜎42 (𝑋) ≤ 16 Var(|𝑋 |). 𝑏) If 𝑀6 (𝑋) is finite, then 𝜎42 (𝑋) ≤ 16 Var(|𝑋 |) +
1 6 𝑀 (𝑋). 4 6
1.5 Variance of the Euclidean Norm
13
Thus, the finiteness/boundedness of 𝑀6 allows us in some sense to reverse the inequality of Proposition 1.5.1. The argument is based on the following lemma from calculus (which will also be needed later on). Lemma 1.5.4 For any random variable 𝜉 ≥ 0 with finite 4-th moment, 1 − E𝜉 ≤
1 E (1 − 𝜉 2 ) + E (1 − 𝜉 2 ) 2 . 2
If 𝜉 2 ≤ 2 a.s., then we also have a lower bound 1 1 E (1 − 𝜉 2 ) + E (1 − 𝜉 2 ) 2 . 2 16 √ Proof By Taylor’s formula for the function 𝑤(𝜀) = 1 − 1 − 𝜀 around zero on the half axis 𝜀 < 1, we have 1 − E𝜉 ≥
𝑤(𝜀) =
1 𝑤 ′′′ (𝜀1 ) 3 1 𝜀 + 𝜀2 + 𝜀 2 8 6
for some point 𝜀1 between zero and 𝜀. Since 𝑤 ′′′ (𝜀) = 38 (1 − 𝜀) −5/2 ≥ 0, we have an upper bound 1 1 𝑤(𝜀) ≤ 𝜀 + 𝜀 2 , 𝜀 ≤ 0. 2 8 Also, 𝑤 ′′′ (𝜀) ≤
3 8
35/2 < 6 for 0 ≤ 𝜀 ≤ 23 , so, 1 1 2 𝑤 ′′′ (𝜀1 ) 3 𝜀 + 𝜀 ≤ 𝜀2 + 𝜀 ≤ 𝜀2 . 8 6 8
Thus, in both cases, 𝑤(𝜀) ≤
1 𝜀 + 𝜀2 , 2
𝜀≤
2 . 3
Let us now prove this inequality for the remaining values 23 ≤ 𝜀 ≤ 1. Since then 7 1 2 6 2 2 𝜀 + 𝜀 ≥ 6 𝜀, while 𝑤(𝜀) ≤ 1, we may assume that 𝜀 ∈ [ 3 , 7 ]. In this interval it is 7 7 2 sufficient to verify that 𝑤(𝜀) ≤ 6 𝜀, i.e., 1 − 𝜀 ≥ (1 − 6 𝜀) . But the latter inequality is fulfilled at the endpoints of the interval and therefore at all inner points 23 < 𝜀 < 67 . Hence, for all 𝜀 ≤ 1, we have 𝑤(𝜀) ≤ 12 𝜀 + 𝜀 2 . Applying this inequality with 𝜀 = 1 − 𝜉 2 , we get a pointwise bound 1−𝜉 ≤
1 (1 − 𝜉 2 ) + (1 − 𝜉 2 ) 2 . 2
Taking the expectation of both sides, the first bound of the lemma follows. Let us now turn to the lower bound. From Taylor’s formula for 𝑤, using again the property 𝑤 ′′′ (𝜀) ≥ 0, we have 𝑤(𝜀) ≥ 12 𝜀 + 18 𝜀 2 for all 𝜀 ≥ 0. In the case 𝜀 ≤ 0, we have 𝑤 ′′′ (𝜀) ≤ 38 . Hence, for some point 𝜀1 between zero and 𝜀 ∈ [−1, 0],
14
1 Moments and Correlation Conditions
𝑤(𝜀) =
1 𝑤 ′′′ (𝜀 ) 1 1 3 |𝜀| 1 1 1 2 1 𝜀 + 𝜀2 + 𝜀 ≥ 𝜀 + 𝜀2 − ≥ 𝜀+ 𝜀 . 2 8 6 2 8 8 6 2 16
Applying the resulting inequality with 𝜀 = 1 − 𝜉 2 and taking the expectation, we obtain the second bound of the lemma. □ Proof (of Proposition 1.5.3) In 𝑎), the random variable 𝜉 = √|𝑋𝑛| satisfies the condition of the second part in Lemma 1.5.4. Hence, applying its lower bound, we get Var(|𝑋 |) = 𝑛 (1 − E𝜉) (1 + E𝜉) 1 2 𝑛 E (1 − 𝜉 2 ) 2 = 𝜎 (𝑋), ≥ 𝑛 (1 − E𝜉) ≥ 16 16 4 thus proving the first assertion. To prove 𝑏), introduce the events 𝐴 = {|𝑋 | 2 ≤ 2𝑛} and 𝐵 = {|𝑋 | 2 > 2𝑛}. In the end of the proof of Lemma 1.5.4 we obtained the lower bound √ 1 1 2 𝜀 , 1− 1−𝜀 ≥ 𝜀+ 2 16
−1 ≤ 𝜀 ≤ 1.
Applying it with 𝜀 = 1 − 𝜉 2 on the set 𝐴 and using E 𝜉 2 = 1, we get 1 1 E (1 − 𝜉 2 ) 2 1 𝐴 ≤ E (1 − 𝜉) 1 𝐴 − E (1 − 𝜉 2 ) 1 𝐴 16 2 1 = E (1 − 𝜉) − E (1 − 𝜉) 1 𝐵 + E (1 − 𝜉 2 ) 1 𝐵 . 2 Here −(1 − 𝜉) +
1 2
(1 − 𝜉 2 ) = − 12 (1 − 𝜉) 2 ≤ 0. Hence
1 Var(|𝑋 |) 1 1 E (1 − 𝜉 2 ) 2 1 𝐴 ≤ E (1 − 𝜉) = ≤ Var(|𝑋 |). 16 𝑛 1 + E𝜉 𝑛 Thus, we have obtained a more general statement in comparison with 𝑎), namely 1 E (|𝑋 | 2 − 𝑛) 2 1 𝐴 ≤ 16 Var(|𝑋 |), 𝑛 which becomes the desired claim when |𝑋 | 2 ≤ 2𝑛. As for the expectation over the complementary set 𝐵, one may involve the moment functionals 𝑀 𝑝 = 𝑀 𝑝 (𝑋) with suitable values 𝑝 ≥ 4. Recall that, by Proposition 1.3.2, E |𝑋 | 𝑝 ≤ 𝑀 𝑝𝑝 𝑛 𝑝/2 . Hence, by Markov’s inequality, P(𝐵) ≤
1 1 E |𝑋 | 𝑝 ≤ 𝑝 𝑝/2 𝑀 𝑝𝑝 . (2𝑛) 𝑝 2 𝑛
Applying Hölder’s inequality with exponents 𝑞 > 1 and 𝑞 ′ =
𝑞 𝑞−1 ,
we then get that
1.5 Variance of the Euclidean Norm
15
1 1 E (|𝑋 | 2 − 𝑛) 2 1 𝐵 ≤ E |𝑋 | 4 1 𝐵 𝑛 𝑛 𝑝 ′ 1 1 4 2 𝑀 𝑝 1/𝑞′ ≤ (E |𝑋 | 4𝑞 ) 1/𝑞 P(𝐵) 1/𝑞 ≤ 𝑀4𝑞 𝑛 𝑛 𝑛 2 𝑝 𝑛 𝑝/2 𝑀 𝑝 1/𝑞′ 𝑝 1− 𝑝 4 = 𝑀4𝑞 𝑛 2𝑞′ . 𝑝 2 The power of 𝑛 is vanishing for 𝑝 = 2𝑞 ′. On the other hand, if we want to minimize the maximal index of the moment functionals, one should choose 𝑝 = 4𝑞. The two constraints are fulfilled for 𝑞 = 3/2, 𝑞 ′ = 3, and thus 𝑝 = 6, in which case 6
𝑀 1/3 1 6 E (|𝑋 | 2 − 𝑛) 2 1 𝐵 ≤ 𝑀64 . 𝑛 26 Thus Proposition 1.5.3 is proved.
□
The normalized variance 𝜎42 is a member of the family of the following variancetype functionals which allow us to control the concentration of |𝑋 | about its mean in the case E |𝑋 | 2 = 𝑛. Definition 1.5.5 Given a random vector 𝑋 in R𝑛 with E |𝑋 | 2 = 𝑛, put 2 𝑝 1/ 𝑝 |𝑋 | − 1 , 𝜎2 𝑝 (𝑋) = 𝑛 E 𝑛 √
𝑝 ≥ 1.
The finiteness of 𝜎2 𝑝 (𝑋) is equivalent to the finiteness of the moment of |𝑋 | of order 2𝑝 (thus explaining the appearance of the index 2𝑝). Note that 𝜎2 𝑝 represents a non-decreasing function of 𝑝, which attains a minimum at 𝑝 = 1 with value 1 𝜎2 (𝑋) = √ E |𝑋 | 2 − 𝑛 . 𝑛 √ At the same time, all these functionals dominate the 𝐿 1 -deviation of |𝑋 | from 𝑛. Proposition 1.5.6 If E |𝑋 | 2 = 𝑛, then √ 𝜎2 𝑝 (𝑋) ≥ 𝜎2 (𝑋) ≥ E |𝑋 | − 𝑛 for all 𝑝 ≥ 1. Moreover, √ 1 2 𝜎2 (𝑋) ≤ Var(|𝑋 |) ≤ 2𝜎2 (𝑋) 𝑛. 4 The first line of inequalities is obvious, and we only need to explain the last line. In terms of the random variable 𝜉 = √|𝑋𝑛| , we have 𝜉 ≥ 0, E𝜉 2 = 1, and Var(|𝑋 |) = 𝑛 Var(𝜉) = 𝑛 (1 − (E𝜉) 2 ) = 𝑛 (1 − E𝜉) (1 + E𝜉), √ while 𝜎2 = 𝑛 E |1 − 𝜉 2 |. By Cauchy’s inequality,
16
1 Moments and Correlation Conditions
(E |1 − 𝜉 2 |) 2 ≤ E (1 − 𝜉) 2 E (1 + 𝜉) 2 = 4 E (1 − 𝜉) E (1 + 𝜉), implying that 𝜎22 ≤ 4 Var(|𝑋 |). For the last bound, just note that E𝜉 ≤ 1, so that Var(|𝑋 |) ≤ 2𝑛 E (1 − 𝜉), while √ √ 𝜎2 = 𝑛 E |1 − 𝜉 | (1 + 𝜉) ≥ 𝑛 E |1 − 𝜉 |.
1.6 Small Ball Probabilities The functionals 𝜎2 𝑝 = 𝜎2 𝑝 (𝑋),
𝑚 𝑝 = 𝑚 𝑝 (𝑋),
𝑀 𝑝 = 𝑀 𝑝 (𝑋)
and
can be used in the problem of bounding probabilities of “small"” balls. We can already see this from the following elementary claim. Proposition 1.6.1 Given a random vector 𝑋 in R𝑛 such that E |𝑋 | 2 = 𝑛, we have, for all 𝑝 ≥ 1 and 0 < 𝜆 < 1, 𝜎2𝑝𝑝 1 . P |𝑋 | ≤ 𝜆𝑛 ≤ (1 − 𝜆) 𝑝 𝑛 𝑝/2 2
In addition, P |𝑋 | 2 ≤ 𝜆𝑛 ≤
Var(|𝑋 |) 4 . 𝑛 (1 − 𝜆) 2
Here, the second inequality sharpens the first one in the range 1 ≤ 𝑝 ≤ 2 up to a 𝜆-dependent factor (when Var(|𝑋 |) is of order 𝜎2𝑝𝑝 ). Proof According to Definition 1.5.5, 𝑝 𝜎2𝑝𝑝 = 𝑛− 𝑝/2 E 𝑛 − |𝑋 | 2 . Hence, by Chebyshev’s inequality, n o P |𝑋 | 2 ≤ 𝜆𝑛 = P 𝑛 − |𝑋 | 2 ≥ (1 − 𝜆) E |𝑋 | 2 ≤
𝜎2𝑝𝑝 𝑛 𝑝/2 (1 − 𝜆) 𝑝 (E |𝑋 | 2 ) 𝑝
=
𝜎2𝑝𝑝 , (1 − 𝜆) 𝑝 𝑛 𝑝/2
thus proving the first inequality of the proposition. To prove the second inequality, put 𝑣2 = Var(|𝑋 |) = 𝑛 − (E |𝑋 |) 2 (𝑣 ≥ 0), √ √ 2 . In the case 𝑣2 ≤ (1 − 𝑐 2 )𝑛 with 𝜆 < 𝑐 < 1, we 𝑛 − 𝑣 so that 𝑣2 ≤ 𝑛 and E |𝑋 | = √ have E |𝑋 | ≥ 𝑐 𝑛, and by Chebyshev’s inequality,
1.6 Small Ball Probabilities
17
√ P |𝑋 | 2 ≤ 𝜆𝑛 = P |𝑋 | ≤ 𝜆𝑛 n √︁ √ o = P E |𝑋 | − |𝑋 | ≥ 𝑛 − 𝑣2 − 𝜆𝑛 n √ o √ ≤ P E |𝑋 | − |𝑋 | ≥ 𝑐 𝑛 − 𝜆𝑛 ≤ n
o
n
o
𝑣2 . √ (𝑐 − 𝜆) 2 𝑛 √ The last ratio is greater than or equal to 1, as long as 𝑣2 ≥ (𝑐 − √𝜆) 2 𝑛, and then the resulting inequality holds automatically. Therefore, for all 𝑐 ∈ ( 𝜆, 1),
√ 𝑣2 ≤ (1 − 𝑐2 ) 𝑛 or 𝑣2 ≥ (𝑐 − 𝜆) 2 𝑛 =⇒ P |𝑋 | 2 ≤ 𝜆𝑛 ≤
𝑣2 . √ (𝑐 − 𝜆) 2 𝑛
In particular, the √latter bound holds true√without any constraint on 𝑣, as long as (1 − 𝑐2 )𝑛 ≥ (𝑐 − 𝜆) 2 𝑛, that is, 2𝑐2 − 2𝑐 𝜆 + 𝜆 − 1 ≤ 0. This quadratic inequality in the 𝑐 takes place in the √ √ variable √ interval [𝑐 0 , 𝑐 1 ] with the right endpoint 𝑐 1 = 1 ( 𝜆 + 2 − 𝜆), 𝜆 < 𝑐 1 < 1. Hence 𝑐 = 𝑐 1 is the best admissible which satisfies 2 𝑣2 √ . And for this value we have value minimizing the ratio 2 (𝑐− 𝜆) 𝑛
√ √ 𝑣2 ( 𝜆 + 2 − 𝜆) 2 𝑣2 4 𝑣2 = ≤ , P |𝑋 | ≤ 𝜆𝑛 ≤ √ 2 2 𝑛 𝑛 (1 − 𝜆) 2 (1 − 𝜆) (𝑐 1 − 𝜆) 𝑛
2
which was required. The proposition is proved.
□
Thus, if the variance functionals 𝜎2 𝑝 are of order 1, the resulting bounds are of order 𝑛− 𝑝/2 . However, they may be considerably sharpened under the convolution operation. Given an independent copy 𝑌 of 𝑋, write |𝑋 − 𝑌 | 2 = |𝑋 | 2 + |𝑌 | 2 − 2 ⟨𝑋, 𝑌 ⟩ . By the first inequality of Proposition 1.6.1 with 𝜆 = 34 , 2𝑝 2𝑝 n n 3 o n 3 o 4 𝜎2 𝑝 3 o P |𝑋 | 2 + |𝑌 | 2 ≤ 𝑛 ≤ P |𝑋 | 2 ≤ 𝑛 P |𝑌 | 2 ≤ 𝑛 ≤ . 4 4 4 𝑛𝑝
Alternatively for the case 𝑝 = 2, the second inequality of the proposition leads to the improvement n 3 o 46 P |𝑋 | 2 + |𝑌 | 2 ≤ 𝑛 ≤ 2 Var2 (|𝑋 |). 4 𝑛 On the other hand, by Markov’s inequality, for any 𝑞 ≥ 1, 𝑞 n 1 o 4𝑞 E | ⟨𝑋, 𝑌 ⟩ | 𝑞 4𝑞 𝑚 𝑞 = . P | ⟨𝑋, 𝑌 ⟩ | ≥ 𝑛 ≤ 4 𝑛𝑞 𝑛𝑞/2
Splitting the event |𝑋 − 𝑌 | 2 ≤ 14 𝑛 to the case where | ⟨𝑋, 𝑌 ⟩ | ≥ where an opposite inequality holds, we have a set inclusion
1 4
𝑛 and to the case
18
1 Moments and Correlation Conditions
n
|𝑋 − 𝑌 | 2 ≤
1 o n 1 o n 3 o 𝑛 ⊂ | ⟨𝑋, 𝑌 ⟩ | ≥ 𝑛 ∪ |𝑋 | 2 + |𝑌 | 2 ≤ 𝑛 . 4 4 4
Hence, we arrive at: Proposition 1.6.2 Let 𝑌 be an independent copy of a random vector 𝑋 in R𝑛 such that E |𝑋 | 2 = 𝑛. Then for all 𝑝, 𝑞 ≥ 1, n 4𝑞 42 𝑝 1 o P |𝑋 − 𝑌 | 2 ≤ 𝑛 ≤ 𝑞/2 𝑚 𝑞𝑞 + 𝑝 𝜎22𝑝𝑝 . 4 𝑛 𝑛 In particular, choosing 𝑞 = 2𝑝, n 𝐶 1 o P |𝑋 − 𝑌 | 2 ≤ 𝑛 ≤ 𝑝 4 𝑛 with constant 𝐶 = 42 𝑝 (𝑚 22 𝑝𝑝 + 𝜎22𝑝𝑝 ). We also have the bound n 4𝑞 46 1 o P |𝑋 − 𝑌 | 2 ≤ 𝑛 ≤ 𝑞/2 𝑚 𝑞𝑞 + 2 Var2 (|𝑋 |). 4 𝑛 𝑛 Recall that, by Proposition 1.4.2, one may replace 𝑚 𝑞 with 𝑀𝑞2 in these bounds. In the most interesting cases 𝑝 = 1, 𝑝 = 32 , 𝑝 = 2 with 𝑞 = 2𝑝, we thus get respectively n 1 o 16 2 (𝑚 2 + 𝜎22 ), P |𝑋 − 𝑌 | 2 ≤ 𝑛 ≤ 4 𝑛 n 1 o 64 P |𝑋 − 𝑌 | 2 ≤ 𝑛 ≤ 3/2 (𝑚 33 + 𝜎33 ), 4 𝑛 n o 256 1 P |𝑋 − 𝑌 | 2 ≤ 𝑛 ≤ 2 (𝑚 44 + 𝜎44 ). 4 𝑛 Choosing 𝑝 = 𝑞 = 2, we also have n 1 o 16 2 256 4 𝑚 + 𝜎 . P |𝑋 − 𝑌 | 2 ≤ 𝑛 ≤ 4 𝑛 2 𝑛2 4 √ Remark 1.6.3 If |𝑋 | = 𝑛 a.s. and thus all 𝜎2 𝑝 = 0, the latter bounds yield n 𝑚2 𝑛o ≤ 𝑐1 2 , P |𝑋 − 𝑌 | 2 ≤ 4 𝑛 4 o n 𝑚 𝑛 ≤ 𝑐 2 24 , P |𝑋 − 𝑌 | 2 ≤ 4 𝑛
𝑐 1 = 16, 𝑐 2 = 256.
In the isotropic case, the second inequality implies the first one, up to an absolute √ factor. Indeed, since |𝑋 | = 𝑛 a.s., we have | ⟨𝑋, 𝑌 ⟩ | ≤ 𝑛2 , so, 𝑚 44 =
1 E ⟨𝑋, 𝑌 ⟩ 4 ≤ E ⟨𝑋, 𝑌 ⟩ 2 = 𝑚 22 𝑛. 𝑛2
1.6 Small Ball Probabilities
19
In fact, both inequalities are optimal in the sense that, up to absolute factors, they are attained for the random vector 𝑋 with the isotropic distribution 𝑛 1 ∑︁ √ 𝛿 , 𝑛 𝑘=1 𝑛 𝑒𝑘
L (𝑋) =
where 𝑒 1 , . . . , 𝑒 𝑛 is an orthonormal basis in R𝑛 . In this example, 𝑚 2 = 1, while 𝑚 44 = 𝑛. Hence
𝑚22 𝑛
=
𝑚44 𝑛2
= 𝑛1 . Similarly,
n 1 𝑛o =P 𝑋 =𝑌 = . P |𝑋 − 𝑌 | 2 ≤ 4 𝑛 Note that the distributions 𝐹𝜃 of linear functionals ⟨𝑋, 𝜃⟩ in this particular example represent empirical measures 𝐹𝜃 = L (⟨𝑋, 𝜃⟩) =
𝑛 1 ∑︁ √ 𝛿 , 𝑛 𝑘=1 𝑛 ⟨𝜃 ,𝑒𝑘 ⟩
√ √ based on 𝑛 “observations” 𝑛 ⟨𝜃, 𝑒 1 ⟩ , . . . , 𝑛 ⟨𝜃, 𝑒 𝑛 ⟩. Let us now extend the bounds of Proposition 1.6.2 for a growing number of convolutions, using the variance functional 𝜎4 = 𝜎4 (𝑋) and the moment functionals 𝑚 2 𝑝 = 𝑚 2 𝑝 (𝑋) with an integer value of 𝑝 ≥ 1. Given independent copies {𝑋 (𝑘) }1≤𝑘 ≤ 𝑝 of the random vector 𝑋 in R𝑛 , consider the sum of the form 𝑆𝑝 =
𝑝 ∑︁
𝜀 𝑘 𝑋 (𝑘)
𝑘=1
with arbitrary coefficients 𝜀 𝑘 = ±1 and write |𝑆 𝑝 | 2 = Σ1 + Σ2 with Σ1 =
𝑝 ∑︁
|𝑋 (𝑘) | 2 ,
Σ2 =
∑︁
D E 𝜀𝑖 𝜀 𝑗 𝑋 (𝑖) , 𝑋 ( 𝑗) .
1≤𝑖≠ 𝑗 ≤ 𝑝
𝑘=1
The second sum contains 𝑝( 𝑝 − 1) < 𝑝 2 terms, and each of them has 𝐿 2 𝑝 -norm
D (𝑖) ( 𝑗) E
𝑋 ,𝑋
2𝑝
D E 2 𝑝 1/2 𝑝 √ = E 𝑋 (𝑖) , 𝑋 ( 𝑗) = 𝑚 2 𝑝 𝑛.
√ Hence, by the triangle inequality, ∥Σ2 ∥ 2 𝑝 ≤ 𝑝 2 𝑚 2 𝑝 𝑛, and by Markov’s inequality, 𝑚 22 𝑝𝑝 1 o E |Σ2 | 2 𝑝 2 2𝑝 ≤ (4𝑝 ) . P |Σ2 | ≥ 𝑛 ≤ 1 4 𝑛𝑝 ( 4 𝑛) 2 𝑝 n
On the other hand, by Proposition 1.5.1 with 𝑝 = 2 and 𝜆 = 12 , we have
20
1 Moments and Correlation Conditions
2𝑝 n n 4𝜎4 1 o 1 o𝑝 P Σ1 ≤ 𝑛 ≤ P |𝑋 | 2 ≤ 𝑛 ≤ . 2 2 𝑛 Hence
n n n 1 o 𝐶 1 o 1 o P |𝑆 𝑝 | 2 ≤ 𝑛 ≤ P |Σ2 | ≥ 𝑛 + P |Σ1 | ≤ 𝑛 ≤ 𝑝 4 4 2 𝑛
with 𝐶 = (4𝑝 2 𝑚 2 𝑝 ) 2 𝑝 + (4𝜎42 ) 𝑝 . Actually, we need such a bound for the sum of 2𝑝 random vectors. Proposition 1.6.4 Let 𝑋 (𝑘) , 𝑌 (𝑘) (1 ≤ 𝑘 ≤ 𝑝) be independent copies of the random vector 𝑋 in R𝑛 such that E |𝑋 | 2 = 𝑛. Then ∑︁ 𝑝 2 1 𝐶 (𝑘) (𝑘) P (𝑋 − 𝑌 ) ≤ 𝑛 ≤ 2 𝑝 4 𝑛 𝑘=1 with constant 𝐶 = 16 𝑝 2 𝑚 4 𝑝
4𝑝
+ 4𝜎42
2𝑝
.
1.7 Second Order Correlation Condition In the context of higher order concentration (see Chapters 13 and 17), a more powerful natural correlation condition emerged which led to the following requirement. Definition 1.7.1 We say that a random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R𝑛 satisfies a second order correlation condition with constant Λ if, for any collection 𝑎 𝑖 𝑗 ∈ R, Var
∑︁ 𝑛 𝑖, 𝑗=1
𝑛 ∑︁ 𝑎 𝑖 𝑗 𝑋𝑖 𝑋 𝑗 ≤ Λ 𝑎 2𝑖 𝑗 . 𝑖, 𝑗=1
In the matrix form, using 𝐴 = (𝑎 𝑖 𝑗 )𝑖,𝑛 𝑗=1 with the Hilbert–Schmidt norm ∥ 𝐴∥ HS = Í 2 1/2 𝑎𝑖 𝑗 , the definition becomes Var ⟨𝐴𝑋, 𝑋⟩ ≤ Λ ∥ 𝐴∥ 2HS . It is sufficient to consider in this inequality only symmetric matrices 𝐴. Indeed, the matrix 𝐵 = 12 ( 𝐴 + 𝐴 ′) is symmetric, where 𝐴 ′ is the transpose of 𝐴, and ⟨𝐴𝑋, 𝑋⟩ = ⟨𝐵𝑋, 𝑋⟩. On the other hand, by the triangle inequality, ∥𝐵∥ HS ≤ ∥ 𝐴∥ HS . Any random vector 𝑋 in R𝑛 with finite 4-th moment satisfies a second order correlation condition for some finite constant, and otherwise Λ = ∞. By the definition, the optimal value Λ = Λ(𝑋) represents the maximal eigenvalue of the covariance matrix associated with the 𝑛2 -dimensional random vector 𝑛 𝑋 (2) = 𝑋𝑖 𝑋 𝑗 − E𝑋𝑖 𝑋 𝑗 𝑖, 𝑗=1 .
1.7 Second Order Correlation Condition
21
That is, in terms of the 𝑀2 -functional, Λ(𝑋) = 𝑀22 𝑋 (2) . For short, let us write 𝑀 𝑝 = 𝑀 𝑝 (𝑋), 𝑚 𝑝 = 𝑚 𝑝 (𝑋), 𝜎42 = 𝜎42 (𝑋), Λ = Λ(𝑋). Putting 𝑎 𝑖 𝑗 = 𝛿𝑖 𝑗 in Definition 1.7.1, we obtain an elementary relation between the two variance functionals. Proposition 1.7.2 For any random vector 𝑋 in R𝑛 , we have 𝜎42 ≤ Λ. There can be a large difference between the values of these two functionals. For example, when |𝑋 | = const, we necessarily have 𝜎42 = 0. The moment functional 𝑀4 can also be controlled in terms of Λ and 𝑀2 . Putting 𝑎 𝑖 𝑗 = 𝜃 𝑖 𝜃 𝑗 , 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 , Definition 1.7.1 leads to Var ⟨𝑋, 𝜃⟩ 2 ≤ Λ. Using E ⟨𝑋, 𝜃⟩ 2 ≤ 𝑀22 , we obtain: Proposition 1.7.3 For any random vector 𝑋 in R𝑛 , we have 𝑀44 ≤ 𝑀24 + Λ. In particular, if 𝑋 is isotropic, then 𝑀44 ≤ 1 + Λ. Recalling Proposition 1.4.2, we also have: Proposition 1.7.4 If 𝑌 is an independent copy of the random vector 𝑋 in R𝑛 , then 2 E ⟨𝑋, 𝑌 ⟩ 4 ≤ 𝑀24 + Λ 𝑛2 . Equivalently, 𝑚 42 ≤ 𝑀24 + Λ. Similarly to 𝑀 𝑝 (𝑋), 𝑚 𝑝 (𝑋) and 𝜎42 (𝑋), the functional Λ(𝑋) is invariant under linear orthogonal transformations of the Euclidean space R𝑛 . Indeed, if 𝑌 = 𝑈 𝑋, where 𝑈 : R𝑛 → R𝑛 is orthogonal, then
Var ⟨𝐴𝑌 , 𝑌 ⟩ = Var 𝑈 −1 𝐴𝑈 𝑋, 𝑋 ≤ Λ ∥𝑈 −1 𝐴𝑈 ∥ 2HS = Λ∥ 𝐴∥ 2HS . Hence, one may always assume that E𝑋𝑖 𝑋 𝑗 = 𝜆𝑖 𝛿𝑖 𝑗 , where the 𝜆𝑖 are eigenvalues of the covariance operator of 𝑋. In that case, Definition 1.7.1 becomes E
∑︁ 𝑛
2 𝑎 𝑖 𝑗 (𝑋𝑖 𝑋 𝑗 − 𝜆𝑖 𝛿𝑖 𝑗 )
≤Λ
𝑖, 𝑗=1
𝑛 ∑︁
𝑎 2𝑖 𝑗 .
𝑖, 𝑗=1
In the isotropic case, all 𝜆𝑖 = 1, and the definition of Λ(𝑋) is further simplified to E
∑︁ 𝑛 𝑖, 𝑗=1
2 𝑎 𝑖 𝑗 (𝑋𝑖 𝑋 𝑗 − 𝛿𝑖 𝑗 )
≤Λ
𝑛 ∑︁
𝑎 2𝑖 𝑗 .
𝑖, 𝑗=1
All the above definitions extend to complex-valued random variables 𝑋𝑖 with complex numbers 𝑎 𝑖 and 𝑎 𝑖 𝑗 (in which case, 𝑎 𝑖2 and 𝑎 𝑖2𝑗 should be replaced with |𝑎 𝑖 | 2 and |𝑎 𝑖 𝑗 | 2 respectively). Note that, if 𝜉 is a complex-valued random variable, its variance is defined by Var(𝜉) = E |𝜉 − E𝜉 | 2 = E |𝜉 | 2 − |E𝜉 | 2 .
22
1 Moments and Correlation Conditions
Proposition 1.7.5 For any isotropic random vector 𝑋 in R𝑛 , we have Λ ≥
𝑛−1 𝑛 .
Proof Applying the inequality from Definition 1.7.1 to the matrix 𝐴 with only one non-zero entry on the (𝑖, 𝑗)-place, we get Var(𝑋𝑖 𝑋 𝑗 ) = E𝑋𝑖2 𝑋 2𝑗 − 𝛿𝑖 𝑗 ≤ Λ. Summing this over all 𝑖, 𝑗 leads to E |𝑋 | 4 − 𝑛 ≤ 𝑛2 Λ. But E |𝑋 | 4 ≥ (E |𝑋 | 2 ) 2 = 𝑛2 , thus proving the proposition. □ When 𝑛 = 1, it is possible that Λ = 0 (which is attained for the ±1 Bernoulli random variables). However, if 𝑛 ≥ 2, we have a universal bound Λ(𝑋) ≥ 12 in the whole class of isotropic probability distributions.
Chapter 2
Some Classes of Probability Distributions
The relevance of the previously introduced moment and correlation-type functionals can be illustrated by examples of some classical classes of probability distributions. In this chapter, these functionals are discussed for product measures (in which case one can also refine upper bounds on “small ball” probabilities), for joint distributions of pairwise independent random variables, and for coordinate-symmetric distributions. We also discuss the class of logarithmically concave measures and include some additional background material which will be needed later on.
2.1 Independence One basic example, which may be used for illustrations of the previously introduced moment-like functionals, corresponds to independent components of 𝑋. Let us clarify the meaning of these functionals in this important case. As before, let 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ). From the very definition it follows that 𝜎42 (𝑋) =
𝑛 1 ∑︁ Var(𝑋 𝑘2 ) 𝑛 𝑘=1
(2.1)
as long as 𝑋1 , . . . , 𝑋𝑛 are independent. We also have: Proposition 2.1.1 Let 𝑛 ≥ 2. If the random variables 𝑋1 , . . . , 𝑋𝑛 are independent and have mean zero, then 1/2 𝑀2 (𝑋) = max E𝑋 𝑘2 , Λ(𝑋) = max Var(𝑋 𝑘2 ), 2𝑎𝑏 , 𝑘
𝑘
where 𝑎 and 𝑏 are the first two maximal numbers in the sequence E𝑋12 , . . . , E𝑋𝑛2 . In particular, Λ(𝑋) ≤ 2 max 𝑘 E𝑋 𝑘4 .
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_2
23
24
2 Some Classes of Probability Distributions
Proof The first equality is obvious. As for the second assertion, put 𝑣𝑖2 = E𝑋𝑖2 and note that the random variables 2 𝑋𝑖(2) 𝑗 = 𝑋𝑖 𝑋 𝑗 − E𝑋𝑖 𝑋 𝑗 = 𝑋𝑖 𝑋 𝑗 − 𝛿𝑖 𝑗 𝑣𝑖
have mean zero. Their covariances are given by (2) 2 E 𝑋𝑖(2) 𝑗 𝑋 𝑘𝑙 = E (𝑋𝑖 𝑋 𝑗 − 𝛿𝑖 𝑗 𝑣𝑖 ) 𝑋 𝑘 𝑋𝑙
= E 𝑋𝑖 𝑋 𝑗 𝑋 𝑘 𝑋𝑙 − 𝛿𝑖 𝑗 𝛿 𝑘𝑙 𝑣𝑖2 𝑣2𝑘 .
(2.2)
Case 1: 𝑖 ≠ 𝑗. The right-hand side of (2.2) is vanishing unless (𝑖, 𝑗) = (𝑘, 𝑙) or (𝑖, 𝑗) = (𝑙, 𝑘). In both these cases it is equal to (2) (2) (2) 2 2 E 𝑋𝑖(2) 𝑗 𝑋𝑖 𝑗 = E 𝑋𝑖 𝑗 𝑋 𝑗𝑖 = 𝑣𝑖 𝑣 𝑗 .
Case 2: 𝑖 = 𝑗. The right-hand side of (2.2) may be non-zero only when 𝑘 = 𝑙. Case 2a): 𝑖 = 𝑗, 𝑘 = 𝑙, 𝑖 ≠ 𝑘. Then the right-hand side of (2.2) is equal to (2) E 𝑋𝑖𝑖(2) 𝑋 𝑘𝑘 = E𝑋𝑖2 E𝑋 𝑘2 − E𝑋𝑖2 E𝑋 𝑘2 = 0.
Case 2b): 𝑖 = 𝑗 = 𝑘 = 𝑙. The right-hand side is equal to E 𝑋𝑖𝑖(2) 𝑋𝑖𝑖(2) = E𝑋𝑖4 − E𝑋𝑖2 E𝑋𝑖2 = Var(𝑋𝑖2 ). (2) Summarizing, we see that the covariance E 𝑋𝑖(2) 𝑗 𝑋 𝑘𝑙 may be non-zero only when (𝑖, 𝑗) = (𝑘, 𝑙) or (𝑖, 𝑗) = (𝑙, 𝑘) for 𝑖 ≠ 𝑗 and when 𝑖 = 𝑗 = 𝑘 = 𝑙, in which case it is equal to (2) (2) (2) 2 2 2 E 𝑋𝑖(2) 𝑗 𝑋𝑖 𝑗 = E 𝑋𝑖 𝑗 𝑋 𝑗𝑖 = (1 − 𝛿𝑖 𝑗 ) 𝑣𝑖 𝑣 𝑗 + 𝛿𝑖 𝑗 Var(𝑋𝑖 ).
Therefore, for any real symmetric matrix (𝑎 𝑖 𝑗 )𝑖,𝑛 𝑗=1 , Var
∑︁ 𝑛
𝑎 𝑖 𝑗 𝑋𝑖 𝑋 𝑗 =
𝑛 𝑛 ∑︁ ∑︁
(2) 𝑎 𝑖 𝑗 𝑎 𝑘𝑙 E 𝑋𝑖(2) 𝑗 𝑋 𝑘𝑙
𝑖, 𝑗=1 𝑘,𝑙=1
𝑖, 𝑗=1
=2
∑︁
𝑎 𝑖2𝑗 𝑣𝑖2 𝑣2𝑗 +
𝑖≠ 𝑗
∑︁
𝑎 2𝑖𝑖 Var(𝑋𝑖2 ).
𝑖
Clearly, the maximum of this variance subject to
Í𝑛 𝑖, 𝑗=1
𝑎 𝑖2𝑗 ≤ 1 is equal to
n o max 2 max 𝑣𝑖2 𝑣2𝑗 , max Var(𝑋𝑖2 ) , 𝑖≠ 𝑗
and the desired assertion follows.
𝑖
□
2.1 Independence
25
Corollary 2.1.2 For i.i.d. random variables 𝑋1 , . . . , 𝑋𝑛 with mean zero, 𝑀2 (𝑋) = E𝑋12
1/2
,
𝜎42 (𝑋) = Var(𝑋12 ),
and if 𝑛 ≥ 2, n 2o Λ(𝑋) = max Var(𝑋12 ), 2 E𝑋12 ≤ 2 E𝑋14 . Let us now turn to the functionals 𝑀 𝑝 with arbitrary 𝑝. Proposition 2.1.3 For independent random variables 𝑋1 , . . . , 𝑋𝑛 with mean zero, max E |𝑋 𝑘 | 𝑝
1/ 𝑝
≤ 𝑀 𝑝 (𝑋) ≤ 𝐶 𝑝 max E |𝑋 𝑘 | 𝑝
1/ 𝑝
𝑘
𝑘
√ for any 𝑝 ≥ 2, where one may put 𝐶 𝑝 = 2 𝑝. In the i.i.d. situation, 𝑀 𝑝 (𝑋) is thus equivalent to the 𝐿 𝑝 -norm (E |𝑋1 | 𝑝 ) 1/ 𝑝 . Proof The lower bound follows from 𝑀 𝑝 ≥ 𝑀2 and Proposition 2.1.1. For the upper bound, we use the symmetrization argument. By Jensen’s inequality, if 𝜂 is an independent copy of a random variable 𝜉 with mean zero and finite 𝑝-th absolute moment, then E |𝜉 | 𝑝 ≤ E |𝜉 − 𝜂| 𝑝 . Let 𝑌 = (𝑌1 , . . . , 𝑌𝑛 ) be an independent copy of the random vector 𝑋 and put 𝑋 ′ = 𝑋 − 𝑌 . Its components 𝑋 𝑘′ = 𝑋 𝑘 − 𝑌𝑘 are independent and have symmetric distributions. Hence ∑︁ 𝑝 𝑛 E | ⟨𝑋, 𝜃⟩ | 𝑝 ≤ E | ⟨𝑋 ′, 𝜃⟩ | 𝑝 = E 𝜃 𝑘 𝜀 𝑘 𝑋 𝑘′ , 𝑘=1
where the 𝜀 𝑘 = ±1 are arbitrary. Here one can take the average over all 𝜀 𝑘 using Khinchine’s inequality ∑︁ 𝑝 ∑︁ 𝑝/2 𝑛 𝑛 E 𝜀 𝑎 𝑘 𝜀 𝑘 ≤ 𝐵 𝑝 𝑎 2𝑘 , 𝑘=1
𝑘=1
which holds with a constant 𝐵 𝑝 > 0 depending on 𝑝 only. This gives 𝑝
E | ⟨𝑋, 𝜃⟩ | ≤
𝐵 𝑝𝑝
E
∑︁ 𝑛
𝜃 2𝑘 𝑋 𝑘′2
𝑝/2
≤ 𝐵 𝑝𝑝 E
𝑘=1
𝑛 ∑︁
𝜃 2𝑘 |𝑋 𝑘′ | 𝑝 ,
𝑘=1
where we also used the convexity of the power function 𝑥 → 𝑥 𝑝/2 together with the assumption 𝜃 12 + · · · + 𝜃 2𝑛 = 1. Maximizing over 𝜃 and using E |𝑋 𝑘′ | 𝑝 ≤ 2 𝑝−1 E |𝑋 𝑘 | 𝑝 (again, by Jensen’s inequality), we arrive at 𝑀 𝑝𝑝 (𝑋) ≤ 2 𝑝−1 𝐵 𝑝𝑝 max E |𝑋 𝑘 | 𝑝 . 𝑘
26
2 Some Classes of Probability Distributions
Finally, let us recall a standard argument providing an explicit bound on 𝐵 𝑝 . Let 𝑍 𝑛 = 𝑎 1 𝜀1 + · · · + 𝑎 𝑛 𝜀 𝑛 with 𝑎 12 + · · · + 𝑎 2𝑛 = 1. For every 𝑡 > 0, we have E 𝜀 e𝑡 |𝑍𝑛 | ≤ 2 E 𝜀 e𝑡 𝑍𝑛 = 2
𝑛 Ö
cosh(𝑎 𝑘 𝑡) ≤ 2
𝑘=1
𝑛 Ö
e (𝑎𝑘 𝑡)
2 /2
= 2 e𝑡
2 /2
.
𝑘=1
Using 𝑥 𝑝 e−𝑥 ≤ ( 𝑝e ) 𝑝 , 𝑥 ≥ 0, this gives E 𝜀 |𝑍 𝑛 | 𝑝 = 𝑡 − 𝑝 E 𝜀 |𝑡𝑍 𝑛 | 𝑝 e−𝑡 |𝑍𝑛 | e𝑡 |𝑍𝑛 | ≤
𝑝 𝑝 e𝑡
E 𝜀 e𝑡 |𝑍𝑛 | ≤ 2
𝑝 𝑝 e𝑡
e𝑡
2 /2
.
√ Here, the (optimal) choice 𝑡 = 𝑝 leads to E 𝜀 |𝑍 𝑛 | 𝑝 ≤ 2 ( 𝑝e ) 𝑝/2 . Hence, a similar √ bound holds true for 𝐵 𝑝 , which yields 𝐶 𝑝 ≤ ( 4e𝑝 ) 𝑝/2 < (2 𝑝) 𝑝 . We thus arrive at the desired upper bound. □ Next, let us also stress that small ball bounds like the one of Proposition 1.6.1 with 𝑝 = 2, n o 𝜎 2 (𝑋) 1 , P |𝑋 | 2 ≤ 𝜆𝑛 ≤ 4 (1 − 𝜆) 2 𝑛
0 < 𝜆 < 1 (where E |𝑋 | 2 = 𝑛),
and even the bounds of Propositions 1.6.2 and 1.6.4 may be considerably sharpened for independent summands with respect to the growing parameter 𝑛. To recall the standard argument, consider the sum 𝑆 𝑛 = 𝜉1 +· · ·+𝜉 𝑛 of independent random variables 𝜉 𝑘 ≥ 0 such that E𝑆 𝑛 = 𝑛 and Var(𝑆 𝑛 ) = 𝜎 2 𝑛. Given a parameter 𝑡 > 0 (to be chosen later on), we have, for any 0 < 𝜆 < 1, E e−𝑡𝑆𝑛 ≥ e−𝜆𝑡 𝑛 P{𝑆 𝑛 ≤ 𝜆𝑛}. On the other hand, every function 𝑢 𝑘 (𝑡) = E e−𝑡 𝜉𝑘 is positive, convex, and admits Taylor’s expansion near zero up to the quadratic form, which implies that 𝑢 𝑘 (𝑡) ≤ 1 − 𝑡 E𝜉 𝑘 +
o n 𝑡2 2 𝑡2 E𝜉 𝑘 ≤ exp − 𝑡 E𝜉 𝑘 + E𝜉 𝑘2 . 2 2
Multiplying these inequalities, we get n 𝑏𝑡 2 o , E e−𝑡𝑆𝑛 ≤ exp − 𝑡𝑛 + 2
𝑏=
𝑛 ∑︁
E𝜉 𝑘2 .
𝑘=1
The two bounds yield n o P{𝑆 𝑛 ≤ 𝜆𝑛} ≤ exp − (1 − 𝜆)𝑛𝑡 + 𝑏𝑡 2 /2 , and after optimization over 𝑡 (in fact, 𝑡 =
1−𝜆 𝑏
𝑛), we arrive at the exponential bound
2.1 Independence
27
n (1 − 𝜆) 2 o P{𝑆 𝑛 ≤ 𝜆𝑛} ≤ exp − 𝑛2 . 2𝑏 Note that 𝑏 = Var(𝑆 𝑛 ) +
𝑛 ∑︁
(E𝜉 𝑘 ) 2 ≤ 𝜎 2 + max (E𝜉 𝑘 ) 2 𝑛, 𝑘
𝑘=1
so
(1 − 𝜆) 2 𝑛 . P{𝑆 𝑛 ≤ 𝜆𝑛} ≤ exp − 2 𝜎 2 + max 𝑘 (E𝜉 𝑘 ) 2
In the case 𝑆 𝑛 = |𝑋 | 2 = 𝑋12 + · · · + 𝑋𝑛2 , thus with 𝜉 𝑘 = 𝑋 𝑘2 , we have 𝜎 2 = 𝜎42 (𝑋), and according to Proposition 2.1.1, 𝑀24 (𝑋) = max (E𝑋 𝑘2 ) 2 = max (E𝜉 𝑘 ) 2 . 𝑘
𝑘
Hence, we get: Proposition 2.1.4 If the random variables 𝑋1 , . . . , 𝑋𝑛 are independent and E |𝑋 | 2 = 𝑛, then for all 0 < 𝜆 < 1, n P{|𝑋 | 2 ≤ 𝜆𝑛} ≤ exp −
o (1 − 𝜆) 2 𝑛 , 2 𝜎42 + 𝑀24
where 𝜎42 = 𝜎42 (𝑋) and 𝑀2 = 𝑀2 (𝑋). In particular, in the i.i.d. case, n (1 − 𝜆) 2 o 𝑛 . P{|𝑋 | 2 ≤ 𝜆𝑛} ≤ exp − 2 E𝑋14 In fact, the property that the probabilities P{|𝑋 | 2 ≤ 𝜆𝑛} decrease exponentially fast does not require the finiteness of the 4-th moment. To show this, let us return to the sum 𝑆 𝑛 = 𝜉1 + · · · + 𝜉 𝑛 of independent random variables 𝜉 𝑘 ≥ 0, assuming that the summands are identically distributed with E𝜉1 = 1, so that E𝑆 𝑛 = 𝑛. As before, given 𝑡 > 0 and 0 < 𝜆 < 1, we have a lower bound on the Laplace transform E e−𝑡𝑆𝑛 ≥ e−𝜆𝑡 𝑛 P{𝑆 𝑛 ≤ 𝜆𝑛}. Let 𝑉 denote the common distribution of 𝜉 𝑘 which is supported on the positive half-axis [0, ∞). The function ∫ ∞ 𝑢(𝑡) = E e−𝑡 𝜉1 = e−𝑡 𝑥 d𝑉 (𝑥), 𝑡 ≥ 0, 0
is positive, convex, non-increasing, and has a continuous, non-decreasing derivative ∫ ∞ 𝑢 ′ (𝑡) = −E 𝜉1 e−𝑡 𝜉1 = − 𝑥e−𝑡 𝑥 d𝑉 (𝑥), 0
28
2 Some Classes of Probability Distributions
with 𝑢(0) = 1, 𝑢 ′ (0) = −1. Now, given 𝑝 ∈ (0, 1), let 𝜅 = 𝜅 𝑝 denote the maximal quantile for the probability measure 𝑥d𝑉 (𝑥) on (0, ∞) of order 𝑝, i.e., the minimal number 𝜅 > 0 such that ∫ 𝑥 d𝑉 (𝑥) ≤ 1 − 𝑝. (𝜅 ,∞)
Using 1 − e−𝑦 ≤ 𝑦 (𝑦 ≥ 0), we have, for all 𝑠 > 0, ∫ ∞ 𝑥(1 − e−𝑠𝑥 ) d𝑉 (𝑥) 1 + 𝑢 ′ (𝑠) = 0 ∫ ∫ −𝑠𝑥 = 𝑥(1 − e−𝑠𝑥 ) d𝑉 (𝑥) 𝑥(1 − e ) d𝑉 (𝑥) + 𝑥>𝜅 0 0 be chosen to satisfy 1+𝜆 E 𝑋12 1 2 ≤ . 𝑋1 >𝜅 2 Then
n (1 − 𝜆) 2 o P{|𝑋 | 2 ≤ 𝜆𝑛} ≤ exp − 𝑛 . 8𝜅 2
2.2 Pairwise Independence
29
2.2 Pairwise Independence A certain weakening of the independence property is the case where the random variables 𝑋1 , . . . , 𝑋𝑛 are pairwise independent. In that case we still have the important identity 𝑛 1 ∑︁ 𝜎42 (𝑋) = Var(𝑋 𝑘2 ). 𝑛 𝑘=1 If additionally these random variables are identically distributed, then 𝜎42 (𝑋) = Var(𝑋12 ), which is thus dimension-free. Proposition 2.1.1 is also partially saved. If 𝑋1 , . . . , 𝑋𝑛 are pairwise independent and have mean zero, then 1/2 𝑀2 (𝑋) = max E𝑋 𝑘2 , 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ). 𝑘
Moreover, if all E𝑋 𝑘2 = 1, then we deal with an isotropic random vector 𝑋. Example. Given a real-valued, 1-periodic, Borel measurable function 𝑓 on the real line, an interesting example of pairwise independent random variables is described by the model 𝑋 𝑘 (𝑡, 𝑠) = 𝑓 (𝑘𝑡 + 𝑠)
(0 < 𝑡, 𝑠 < 1, 𝑘 = 1, 2, . . . )
(2.3)
on the probability space Ω = (0, 1) × (0, 1), which we equip with the Borel 𝜎-algebra F and the Lebesgue measure P. This system is related to the well-studied systems { 𝑓 (𝑘𝑡)} (like in the case of trigonometric systems), but an additional “mixing” argument 𝑠 adds a number of remarkable properties. In particular, the following statement holds. Proposition 2.2.1 {𝑋 𝑘 }∞ 𝑘=1 is a strictly stationary sequence of pairwise independent random variables on Ω. Proof Put 𝜂 𝑘 (𝑡, 𝑠) = 𝑘𝑡 + 𝑠 (mod 1). Since 𝑋 𝑘 = 𝑓 (𝜂 𝑘 ), it is sufficient to show that the 𝜂 𝑘 form a strictly stationary sequence of pairwise independent random variables. Moreover, instead of 𝜂 𝑘 , one may consider the sequence 𝜉𝑘 = 𝜁 𝑧𝑘 ,
𝑘 = 1, 2 . . . ,
where 𝜁 and 𝑧 are independent, complex-valued random variables uniformly distributed on the unit circle 𝑆 1 of the complex plane. Note that 1, if 𝑚 = 0, 𝑚 E𝜁 = for all 𝑚 ∈ Z. 0, if 𝑚 ≠ 0, By independence of 𝜁 and 𝑧, we have, for all integers 𝑚 1 , . . . , 𝑚 𝑁 and ℎ ≥ 0, 𝑚𝑁 𝑚1 𝑚1 +···+𝑚 𝑁 E 𝜉1+ℎ . . . 𝜉𝑁 E 𝑧 (1+ℎ)𝑚1 +···+( 𝑁 +ℎ) 𝑚 𝑁 . +ℎ = E 𝜁
30
2 Some Classes of Probability Distributions
Here, the right-hand side is either equal to 0 or 1, and is equal to 1 if and only if 𝑚1 + · · · + 𝑚 𝑁 = 0
and
(1 + ℎ)𝑚 1 + · · · + (𝑁 + ℎ)𝑚 𝑁 = 0,
that is, if and only if 𝑚 1 + · · · + 𝑚 𝑁 = 0 and 1 · 𝑚 1 + · · · + 𝑁 · 𝑚 𝑁 = 0. The latter description does not depend on ℎ, which implies the strict stationarity by applying Weierstrass’ density theorem. Similarly, for indexes 𝑘 > 𝑙 ≥ 1 and all 𝑛, 𝑚 ∈ Z, one easily verifies that E 𝜉 𝑘𝑛 𝜉𝑙𝑚 = E 𝜉 𝑘𝑛 E 𝜉𝑙𝑚 , which is an equivalent form of independence of 𝜉 𝑘 and 𝜉𝑙 (for 𝑆 1 -valued random □ variables). Proposition 2.2.1 is proved. Being pairwise independent, 𝜉 𝑘 form an almost deterministic sequence in the sense that 𝜉 𝑘 = 𝑔 𝑘 (𝜉1 , 𝜉2 ) for certain measurable functions 𝑔 𝑘 on 𝑆 1 × 𝑆 1 . Moreover, 𝑗 there is a relation 𝜉 𝑘 = 𝑔𝑖, 𝑗,𝑘 (𝜉𝑖 , 𝜉 𝑗 ), 𝑖 ≠ 𝑗, as long as 𝑘− 𝑖− 𝑗 is an integer. Since the function (𝑡, 𝑠) → 𝑘𝑡 + 𝑠 mod(1) is uniformly distributed in (0, 1) as a random variable on Ω, we have ∫ 1 ∫ 1 𝑝 E𝑋 𝑘 = 𝑓 (𝑥) d𝑥, E |𝑋 𝑘 | = | 𝑓 (𝑥)| 𝑝 d𝑥. 0
∫1
0
∫1
Hence, when 0 𝑓 (𝑥) d𝑥 = 0 and 0 𝑓 (𝑥) 2 d𝑥 = 1, the vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) will be isotropic, with components being pairwise independent. The latter insures that, if 𝑓 has finite 4-th moment on (0, 1), then the variance functional 𝜎42 (𝑋)
1 = Var(|𝑋 | 2 ) = 𝑛
∫
1 4
∫
𝑓 (𝑥) d𝑥 − 0
1 2
2
𝑓 (𝑥) d𝑥 0
is finite and does not depend on 𝑛.
2.3 Coordinatewise Symmetric Distributions Another illustrative example is the class of random vectors 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) such that the distributions of (𝜀1 𝑋1 , . . . , 𝜀𝑋𝑛 ) with arbitrary 𝜀 𝑘 = ±1 do not depend on the choice of 𝜀 𝑘 . Equivalently, the distribution of 𝑋 is invariant under reflections (𝑥1 , . . . , 𝑥 𝑛 ) → (𝜀1 𝑥1 , . . . , 𝜀 𝑛 𝑥 𝑛 ) of the space R𝑛 about coordinate axes. In particular, 𝑋 and −𝑋 should have the same distribution, that is, the distribution of 𝑋 should be symmetric about the origin. We call such probability distributions coordinatewise symmetric, although in the literature they are also called distributions with an unconditional basis. This class includes all symmetric product measures on R𝑛 , which corresponds to the case where the components 𝑋 𝑘 are i.i.d. random variables with symmetric distributions on the real line. It is therefore not surprising that many formulas from the previous sections extend to the coordinatewise symmetric distributions. In particular,
2.3 Coordinatewise Symmetric Distributions
31
we have the following elementary assertion, which is a full analog of Proposition 2.1.3 about mean zero independent components. Proposition 2.3.1 If 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) has a coordinatewise symmetric distribution, then 1/2 𝑀2 (𝑋) = max E𝑋 𝑘2 . 𝑘
Moreover, for any 𝑝 ≥ 2, max E |𝑋 𝑘 | 𝑝
1/ 𝑝
≤ 𝑀 𝑝 (𝑋) ≤ 𝐶 𝑝 max E |𝑋 𝑘 | 𝑝
1/ 𝑝
,
𝑘
𝑘
√ where one may take 𝐶 𝑝 = 2 𝑝. The first assertion is obvious, as well as the lower bound in the second one. For an upper bound, using the coordinatewise symmetry, we may write ∑︁ 𝑝 𝑛 E | ⟨𝑋, 𝜃⟩ | 𝑝 = E 𝜃 𝑘 𝜀 𝑘 𝑋 𝑘 , 𝑘=1
where the 𝜀 𝑘 = ±1 are arbitrary. The remaining part of the argument is identical to the one from the proof of Proposition 2.1.3. Recalling Proposition 1.4.2, we obtain as a consequence that 1/ 𝑝 2/ 𝑝 1 𝑚 𝑝 (𝑋) = √ E | ⟨𝑋, 𝑌 ⟩ | 𝑝 ≤ 𝐶 2𝑝 max E |𝑋 𝑘 | 𝑝 𝑘 𝑛 (where 𝑌 is an independent copy of 𝑋). In fact, this inequality can be refined in terms of the mean moments. Let us restrict ourselves to the mean 4-th moment 𝑛
1 ∑︁ E𝑋 𝑘4 , 𝛽¯4 = 𝑛 𝑘=1 which may also be treated as a normalized Lyapunov coefficient (in the central limit theorem). Proposition 2.3.2 If the random vector 𝑋 has a coordinatewise symmetric distribution, then 𝑚 44 (𝑋) ≤ 3 𝛽¯42 . Indeed, we have a general identity E ⟨𝑋, 𝑌 ⟩ 4 =
𝑛 ∑︁
2 E 𝑋𝑖1 𝑋𝑖2 𝑋𝑖3 𝑋𝑖4 .
𝑖1 ,...,𝑖4 =1
The last expectation is non-vanishing if and only if in the sequence 𝑖 1 , 𝑖2 , 𝑖3 , 𝑖4 there are either two distinct numbers which are listed twice, or all of them coincide. Hence, by Cauchy’s inequality,
32
2 Some Classes of Probability Distributions
E ⟨𝑋, 𝑌 ⟩ 4 = 3
∑︁
≤3
∑︁
(E𝑋𝑖2 𝑋 2𝑗 ) 2 +
∑︁
𝑖≠ 𝑗
(E𝑋𝑖4 ) 2
𝑖
E𝑋𝑖4 𝑋 4𝑗 +
∑︁
(E𝑋𝑖4 ) 2 = 3
𝑖
𝑖≠ 𝑗
∑︁
E𝑋𝑖4
2 −2
∑︁
𝑖
(E𝑋𝑖4 ) 2 .
𝑖
It follows that E ⟨𝑋, 𝑌 ⟩ 4 ≤ 3𝑛2 𝛽¯42 . As an important subclass, one may consider random vectors 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) with spherically invariant distributions. The distribution of 𝑋 is then uniquely determined by the distribution of its Euclidean norm |𝑋 |, and in polar coordinates one may write 𝑋 = 𝑟𝜃, assuming that 𝜃 is a random vector uniformly distributed in the unit sphere S𝑛−1 and 𝑟 ≥ 0 is a random variable independent of 𝜃. For example, when 𝑋 is standard normal, 𝑟 2 has a 𝜒2 -distribution with 𝑛 degrees of freedom. In this subclass, the distributions of linear functionals ⟨𝑋, 𝑎⟩ depend on |𝑎| only, and similarly to the i.i.d. case, many assertions about 𝑋 can be stated in terms of 𝑋1 or 𝑟. In particular, 𝑀22 (𝑋) = E𝑋12 , so that 𝑋 is isotropic if and only if E𝑋12 = 1, or equivalently E |𝑋 | 2 = 𝑛. More generally, one may consider coordinatewise symmetric distributions that are invariant under permutations of coordinates, i.e., under mappings (𝑥1 , . . . , 𝑥 𝑛 ) → (𝑥 𝜋 (1) , . . . , 𝑥 𝜋 (𝑛) ) for arbitrary permutations 𝜋 of {1, . . . , 𝑛}. Concerning the functional 𝜎42 (𝑋), the situation is not similar to the i.i.d. case, since the covariances cov(𝑋𝑖2 , 𝑋 2𝑗 ), 𝑖 ≠ 𝑗, may not vanish. More precisely, all such covariances are equal to each other, and we have 𝜎42 (𝑋) =
1 Var(|𝑋 | 2 ) = Var(𝑋12 ) + (𝑛 − 1) cov(𝑋12 , 𝑋22 ). 𝑛
(2.4)
Hence, in order to bound 𝜎42 (𝑋) from above by a dimension-free quantity, one needs a property like 𝑐 cov(𝑋12 , 𝑋22 ) ≤ . 𝑛 The same remark applies to a more complicated Λ-functional, which is often of the same order as 𝜎42 (𝑋). To be more precise, in the coordinatewise symmetric case, Λ(𝑋) may be in essence reduced to the moment-type functional 𝑉2 (𝑋) = sup Var(𝜃 1 𝑋12 + · · · + 𝜃 𝑛 𝑋𝑛2 ) = 𝑀22 (𝑌 ), 𝜃 ∈S𝑛−1
where 𝑌 is a random vector with components 𝑌𝑘 = 𝑋 𝑘2 − E𝑋 𝑘2 . That is, 𝑉2 (𝑋) represents the maximal eigenvalue of the covariance matrix cov(𝑋𝑖2 , 𝑋 2𝑗 )}𝑖,𝑛 𝑗=1 . Proposition 2.3.3 Given a random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) with a coordinatewise symmetric distribution, we have 𝑉2 (𝑋) ≤ Λ(𝑋) ≤ 2 max E𝑋 𝑘4 + 𝑉2 (𝑋). 𝑘
(2.5)
2.3 Coordinatewise Symmetric Distributions
33
If additionally the distribution of 𝑋 is invariant under permutations of coordinates, then 𝜎42 (𝑋) ≤ Λ(𝑋) ≤ 2 E𝑋14 + 𝜎42 (𝑋), (2.6) where the last term 𝜎42 (𝑋) may be removed in the case cov(𝑋12 , 𝑋22 ) ≤ 0 (𝑛 ≥ 2). These assumptions are met, in particular, when the components are independent and have a common symmetric distribution on the line. In this case, (2.6) is consistent with representation for Λ(𝑋) from Proposition 2.1.1. Proof The lower bound on Λ(𝑋) in (2.5) follows from Definition 1.7.1 by choosing the coefficients of the form 𝑎 𝑖 𝑗 = 𝜃 𝑖 𝛿𝑖 𝑗 . In the proof of the second inequality (the upper bound), the argument is similar to the one used in the proof of Proposition 2.1.1. Put 𝑣𝑖2 = E𝑋𝑖2 and define 2 𝑋𝑖(2) 𝑗 = 𝑋𝑖 𝑋 𝑗 − E𝑋𝑖 𝑋 𝑗 = 𝑋𝑖 𝑋 𝑗 − 𝛿𝑖 𝑗 𝑣𝑖 .
As before, the covariances of these mean zero random variables are given by (2) 2 2 E 𝑋𝑖(2) 𝑗 𝑋 𝑘𝑙 = E 𝑋𝑖 𝑋 𝑗 𝑋 𝑘 𝑋𝑙 − 𝛿𝑖 𝑗 𝛿 𝑘𝑙 𝑣𝑖 𝑣 𝑘 .
(2.7)
Case 1: 𝑖 ≠ 𝑗. By the symmetry about the coordinate axes, the right-hand side of (2.7) is vanishing unless (𝑖, 𝑗) = (𝑘, 𝑙) or (𝑖, 𝑗) = (𝑙, 𝑘). In both these cases, it is equal to (2) (2) (2) 2 2 E 𝑋𝑖(2) 𝑗 𝑋𝑖 𝑗 = E 𝑋𝑖 𝑗 𝑋 𝑗𝑖 = E𝑋𝑖 𝑋 𝑗 . Case 2: 𝑖 = 𝑗. Again, the right-hand side in (2.7) is non-zero only when 𝑘 = 𝑙. Case 2a): 𝑖 = 𝑗, 𝑘 = 𝑙, 𝑖 ≠ 𝑘. The right-hand side in (2.7) is equal to (2) E 𝑋𝑖𝑖(2) 𝑋 𝑘𝑘 = E𝑋𝑖2 𝑋 𝑘2 − E𝑋𝑖2 E𝑋 𝑘2 = cov(𝑋𝑖2 , 𝑋 𝑘2 ).
Case 2b): 𝑖 = 𝑗 = 𝑘 = 𝑙. The right-hand side is equal to E 𝑋𝑖𝑖(2) 𝑋𝑖𝑖(2) = E𝑋𝑖4 − E𝑋𝑖2 E𝑋𝑖2 = Var(𝑋𝑖2 ). (2) In both subcases, E 𝑋𝑖𝑖(2) 𝑋 𝑘𝑘 = cov(𝑋𝑖2 , 𝑋 𝑘2 ). Therefore, for any collection of real Í numbers 𝑎 𝑖 𝑗 such that 𝑎 𝑖 𝑗 = 𝑎 𝑗𝑖 , 𝑖, 𝑗 = 1, . . . , 𝑛, and with 𝑖,𝑛 𝑗=1 𝑎 𝑖2𝑗 = 1,
Var
∑︁ 𝑛
𝑎 𝑖 𝑗 𝑋𝑖 𝑋 𝑗 =
𝑛 ∑︁ 𝑛 ∑︁
(2) 𝑎 𝑖 𝑗 𝑎 𝑘𝑙 E 𝑋𝑖(2) 𝑗 𝑋 𝑘𝑙
𝑖, 𝑗=1 𝑘,𝑙=1
𝑖, 𝑗=1
=2
∑︁
𝑎 2𝑖 𝑗 E𝑋𝑖2 𝑋 2𝑗 +
𝑖≠ 𝑗
∑︁
𝑎 𝑖𝑖 𝑎 𝑘𝑘 cov(𝑋𝑖2 , 𝑋 𝑘2 ).
𝑖,𝑘
Here, the first sum on the right-hand side does not exceed ∑︁ ∑︁ max E𝑋𝑖2 𝑋 2𝑗 𝑎 2𝑖 𝑗 ≤ max E𝑋𝑖4 𝑎 𝑖2𝑗 ≤ max E𝑋𝑖4 𝑖≠ 𝑗
𝑖
𝑖
𝑖≠ 𝑗
𝑖≠ 𝑗
34
2 Some Classes of Probability Distributions
(by applying Cauchy’s inequality). As for the second sum, it does not exceed 𝑉2 (𝑋), and we obtain (2.8) Λ(𝑋) ≤ 2 max E𝑋𝑖2 𝑋 2𝑗 + 𝑉2 (𝑋), 𝑖≠ 𝑗
from which (2.5) follows immediately. Turning to the second assertion, we may assume that 𝑛 ≥ 2. The first inequality in (2.6) is general, cf. Proposition 1.7.2. For the upper bound, we note that, for any 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 , Var(𝜃 1 𝑋12
+···+
𝜃 𝑛 𝑋𝑛2 )
=
𝑛 ∑︁
𝜃 𝑖 𝜃 𝑗 cov(𝑋𝑖2 , 𝑋 2𝑗 )
𝑖, 𝑗=1
=
=
∑︁
𝜃 𝑖 𝜃 𝑗 cov(𝑋12 , 𝑋22 ) + Var(𝑋12 )
𝑖≠ 𝑗 𝑛 ∑︁
2 𝜃𝑖
cov(𝑋12 , 𝑋22 ) − cov(𝑋12 , 𝑋22 ) + Var(𝑋12 ).
𝑖=1
Here, in the case cov(𝑋12 , 𝑋22 ) ≥ 0, the last sum is maximized for equal coefficients, and using (2.4), we then get Var(𝜃 1 𝑋12 + · · · + 𝜃 𝑛 𝑋𝑛2 ) ≤ (𝑛 − 1)cov(𝑋12 , 𝑋22 ) + Var(𝑋12 ) = 𝜎42 (𝑋). Hence, (2.8) implies (2.6). In the case where cov(𝑋12 , 𝑋22 ) ≤ 0, we have similarly Var(𝜃 1 𝑋12 + · · · + 𝜃 𝑛 𝑋𝑛2 ) ≤ −cov(𝑋12 , 𝑋22 ) + Var(𝑋12 ) = E𝑋14 − E𝑋12 𝑋22 , which means that 𝑉2 (𝑋) ≤ E𝑋14 − E𝑋12 𝑋22 . Hence, by (2.8), Λ(𝑋) ≤ 2 E𝑋12 𝑋22 + 𝑉2 (𝑋) ≤ E𝑋12 𝑋22 + E𝑋14 ≤ 2 E𝑋14 . Hence, (2.6) follows in this case as well, without the 𝜎42 (𝑋)-functional. Proposition 2.3.3 is proved.
□
2.4 Logarithmically Concave Measures A positive Borel measure 𝜇 on R𝑛 is called logarithmically concave or log-concave, if for all 𝑡, 𝑠 > 0, 𝑡 + 𝑠 = 1, and all non-empty compact subsets 𝐴, 𝐵 of R𝑛 , 𝜇(𝑡 𝐴 + 𝑠𝐵) ≥ 𝜇( 𝐴) 𝑡 𝜇(𝐵) 𝑠 . Here, 𝑡 𝐴 + 𝑠𝐵 = {𝑡𝑥 + 𝑠𝑦 : 𝑥 ∈ 𝐴, 𝑦 ∈ 𝐵}
(2.9)
2.4 Logarithmically Concave Measures
35
denotes the Minkowski average of sets, which is compact for compact 𝐴 and 𝐵. In this case, (2.9) extends to all non-empty Borel sets in R𝑛 (and then it is safer to use the lower measure 𝜇∗ on the left-hand side). If a random vector 𝑋 in R𝑛 has a log-concave distribution, one also says that 𝑋 is log-concave. Log-concave measures form an important class of a great interest, especially in Convex Geometry. They possess a number of remarkable properties resembling the behavior of product measures, and here we briefly mention a few of them. 𝑎) The property of being log-concave is dimension free: If 𝜇 is log-concave on R𝑛 viewed as a linear subset of R𝑚 (𝑚 ≥ 𝑛), then 𝜇 is log-concave on R𝑚 . 𝑏) Stability under linear transformations: If 𝜈 = 𝜇𝑇 −1 is the image of a log-concave measure 𝜇 on R𝑛 under a linear map 𝑇 : R𝑛 → R𝑚 , then the inequality (2.9) continues to hold for 𝜈 on R𝑚 . In particular, if a random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) is log-concave, then all components 𝑋𝑖 , and moreover – all linear functionals ⟨𝑋, 𝑣⟩ are log-concave on the real line. 𝑐) The product 𝜇 = 𝜇1 ⊗ · · · ⊗ 𝜇 𝑘 of log-concave measures 𝜇𝑖 on R𝑛𝑖 is log-concave on R𝑛 of dimension 𝑛 = 𝑛1 + · · · + 𝑛 𝑘 . Combining this property with 𝑏), we also obtain that the class of log-concave measures is preserved under convolutions. 𝑑) The support 𝐾 = supp(𝜇) of a log-concave probability measure 𝜇 on R𝑛 is a closed convex set. In particular, any affine subspace of R𝑛 has 𝜇-measure 0 or 1 (zero-one law). By the dimension of 𝜇 one means the dimension of 𝐾. Proposition 2.4.1 (C. Borell) Any log-concave probability measure 𝜇 on R𝑛 is absolutely continuous with respect to the Lebesgue measure 𝜆 𝐾 on the support set 𝐾 and has density 𝑓 = e−𝑉 for some convex function 𝑉 : 𝐾 → (−∞, ∞]. Conversely, if 𝑉 is convex, the equality d𝜇 = e−𝑉 d𝜆 𝐾 defines a log-concave measure. The proof of the existence of densities for log-concave probability measures may be found in [63], [64]. In dimension one, it is however simple. Assume that the distribution of 𝑋 is not a delta-measure, so that 𝑎 = ess inf 𝑋 < 𝑏 = ess sup 𝑋, and introduce the distribution function 𝐹 (𝑥) = P{𝑋 ≤ 𝑥}, 𝑥 ∈ R. Applying (2.9) to the half-axes, we obtain that log 𝐹 and log(1 − 𝐹) are concave functions on (𝑎, 𝑏). This implies that these functions are locally absolutely continuous, and thus 𝐹 has a density on (𝑎, 𝑏). Moreover, the distribution of 𝑋 may not have a positive mass at 𝑎, or at 𝑏. Note that the density 𝑓 in Proposition 2.4.1 may be defined according to the Lebesgue differentiation theorem by 𝑓 (𝑥) = lim inf 𝜀↓0
𝜇(𝐵(𝑥, 𝜀)) , 𝜆 𝐾 (𝐵(𝑥, 𝜀))
𝑥 ∈ R𝑛 ,
where 𝐵(𝑥, 𝜀) denotes the Euclidean ball of radius 𝜀 centered at 𝑥. Hence, by (2.9) applied to the balls, we have, for all 𝑡, 𝑠 > 0 (𝑡 + 𝑠 = 1) and 𝑥, 𝑦 ∈ R𝑛 , 𝑓 (𝑡𝑥 + 𝑠𝑦) ≥ 𝑓 (𝑥) 𝑡 𝑓 (𝑦) 𝑠 . That is, log 𝑓 must be a concave function, necessarily finite in the relative interior int(𝐾). In other words, 𝑓 must be log-concave. In particular, a probability measure
36
2 Some Classes of Probability Distributions
𝜇 on R𝑛 which is absolutely continuous with respect to the 𝑛-dimensional Lebesgue measure is log-concave if and only if it has a log-concave density. For example, the Lebesgue measure 𝜇 = 𝜆 on R𝑛 and its restriction 𝜇 = 𝜆 𝐾 to any convex body 𝐾 in R𝑛 are log-concave. This also follows from the definition (2.9) by applying the Brunn–Minkowski inequality 𝑛 𝜆(𝑡 𝐴 + 𝑠𝐵) ≥ 𝑡𝜆( 𝐴) 1/𝑛 + 𝑠𝜆(𝐵) 1/𝑛 , (2.10) where 𝐴 and 𝐵 may be arbitrary Borel sets in R𝑛 . Other interesting examples include Gaussian measures, exponential distributions, and their normalized restrictions to arbitrary convex bodies. The sufficiency part in the characterization of log-concave measures may be established by several methods. One of them is the classical bisection method going back to Hadwiger ([73], [63]). A simpler approach relies upon the Prékopa–Leindler theorem, which is natural to include here (cf. also [157], [68]). Proposition 2.4.2 Let 𝑡, 𝑠 > 0, 𝑡 + 𝑠 = 1, and let 𝑢, 𝑣, 𝑤 : R𝑛 → [0, ∞) be Borel measurable functions satisfying 𝑤(𝑡𝑥 + 𝑠𝑦) ≥ 𝑢(𝑥) 𝑡 𝑣(𝑦) 𝑠 Then 𝑤(𝑧) d𝑧 ≥
(2.11)
𝑠 𝑣(𝑦) d𝑦 .
(2.12)
𝑡 ∫
∫
∫
for all 𝑥, 𝑦 ∈ R𝑛 .
𝑢(𝑥) d𝑥
Proof An immediate induction argument reduces the assertion to dimension one. If 𝑛 ≥ 2, fix 𝑥, 𝑦, 𝑧 ∈ R𝑛−1 such that 𝑡𝑥 + 𝑠𝑦 = 𝑧 and consider the functions 𝑢 1 (𝑥 ′) = 𝑢(𝑥, 𝑥 ′), 𝑣1 (𝑦 ′) = 𝑣(𝑦, 𝑦 ′), 𝑤1 (𝑧 ′) = 𝑤(𝑧, 𝑧 ′) in variables 𝑥 ′, 𝑦 ′, 𝑧 ′ ∈ R. Since they satisfy (2.11) on the real line, we obtain (2.12), that is, the relation 𝑤(𝑡𝑥 ˜ + 𝑠𝑦) ≥ 𝑢(𝑥) ˜ 𝑡 𝑣˜ (𝑦) 𝑠 ,
𝑥, 𝑦 ∈ R𝑛−1 ,
(2.13)
for the functions ∫
∞
= 𝑢(𝑥) ˜
𝑢(𝑥, 𝑥 ′) d𝑥 ′,
∫
∞
𝑣˜ (𝑦) = ∫−∞∞
−∞
𝑤(𝑧) ˜ =
𝑣(𝑦, 𝑦 ′) d𝑦 ′, 𝑤(𝑧, 𝑧 ′) d𝑧 ′ .
−∞
The relation (2.13) is of the form (2.11), so that we may apply the induction hypothesis in dimension 𝑛 − 1 to obtain (2.12) for ( 𝑢, ˜ 𝑣˜ , 𝑤), ˜ which, by Fubini’s theorem, is the same as (2.12) in dimension 𝑛 for the original functions 𝑢, 𝑣, 𝑤. Now, let 𝑛 = 1. By a simple truncation, one may assume that 𝑢 and 𝑣 are bounded. Moreover, since both (2.11) and (2.12) do not change when the involved functions are multiplied by positive constants, let ess sup 𝑢 = ess sup 𝑣 = 1. In this case, the sets
2.4 Logarithmically Concave Measures
37
𝐵(𝛼) = {𝑦 ∈ R : 𝑣(𝑦) > 𝛼}, 𝐶 (𝛼) = {𝑧 ∈ R : 𝑤(𝑧) > 𝛼}
𝐴(𝛼) = {𝑥 ∈ R : 𝑢(𝑥) > 𝛼},
are non-empty for all 0 < 𝛼 < 1. The hypothesis (2.11) yields the set inclusion 𝑡 𝐴(𝛼) + 𝑠𝐵(𝛼) ⊂ 𝐶 (𝛼), so, we may apply the one dimensional Brunn–Minkowski inequality (2.10). It gives ∫
∞
∞
∫
−∞
∫
1
𝜆(𝐶 (𝛼)) d𝛼 ≥
𝑤(𝑧) d𝑧 = 0
𝜆(𝐶 (𝛼)) d𝛼 0
∫
1
∫
1
𝜆( 𝐴(𝛼)) d𝛼 + 𝑠
≥𝑡
𝜆(𝐵(𝛼)) d𝛼 ∫ ∞ 𝑡 ∫ ∫ ∞ ∫ ∞ 𝑢(𝑥) d𝑥 =𝑡 𝑣(𝑦) d𝑦 ≥ 𝑢(𝑥) d𝑥 + 𝑠 0
−∞
0
−∞
−∞
which is the desired conclusion (2.12) for 𝑛 = 1.
∞
𝑠 𝑣(𝑦) d𝑦 ,
−∞
□
Let us also note that the inequality (2.10) in dimension one is elementary. Indeed, let 𝐴 and 𝐵 be compact sets such that sup 𝐴 = inf 𝐵 = 0 (otherwise, one may properly shift these sets, since (2.10) is shift-invariant). Then (𝑡 𝐴) ∪ (𝑠𝐵) ⊂ 𝑡 𝐴 + 𝑠𝐵, (𝑡 𝐴) ∩ (𝑠𝐵) = {0}, and thus, by the additivity, 𝜆(𝑡 𝐴 + 𝑠𝐵) ≥ 𝑡𝜆( 𝐴) + 𝑠𝜆(𝐵). As for higher dimensions, the Prékopa–Leindler theorem actually extends the Brunn–Minkowski inequality. The hypothesis (2.11) is fulfilled for the indicator functions 𝑢 = 1 𝐴, 𝑣 = 1 𝐵 , 𝑤 = 1𝑡 𝐴+𝑠𝐵 , and (2.12) then leads to the multiplicative (log-concave) variant of (2.10), 𝜆(𝑡 𝐴 + 𝑠𝐵) ≥ 𝜆( 𝐴) 𝑡 𝜆(𝐵) 𝑠 . 𝑎 𝑏 , 𝑠 = 𝑎+𝑏 , where 𝑎 = 𝜆( 𝐴) 1/𝑛 , 𝑏 = 𝜆(𝐵) 1/𝑛 , and Being applied with weights 𝑡 = 𝑎+𝑏 1 1 ′ ′ with the sets 𝐴 = 𝑎 𝐴, 𝐵 = 𝑏 𝐵, the above relation yields 𝜆( 𝐴 + 𝐵) ≥ (𝑎 + 𝑏) 𝑛 , which is an equivalent variant of (2.10). More generally, given a convex function 𝑉 on an open convex domain in R𝑛 and non-empty Borel sets 𝐴, 𝐵 ⊂ R𝑛 , the hypothesis (2.11) is fulfilled for the functions 𝑢 = e−𝑉 1 𝐴, 𝑣 = e−𝑉 1 𝐵 , 𝑤 = e−𝑉 1𝑡 𝐴+𝑠𝐵 , and then (2.12) yields the inequality (2.9) for the measure d𝜇(𝑥) = e−𝑉 ( 𝑥) d𝑥. This means that 𝜇 is log-concave. One immediate consequence of the Prékopa–Leindler theorem (Proposition 2.4.2) is the following useful observation due to Prékopa [160].
Corollary 2.4.3 If 𝑓 (𝑥, 𝑦) is an integrable log-concave function on R𝑛 × R𝑚 , then the function ∫ 𝑓 (𝑥, 𝑦) d𝑦
𝑔(𝑥) = R𝑚
is log-concave on R𝑛 .
38
2 Some Classes of Probability Distributions
Indeed, the measure 𝜇 with density 𝑓 is log-concave on R𝑛 × R𝑚 . Hence, by property 𝑏), its image 𝜇 ′ under the projection 𝑇 (𝑥, 𝑦) = 𝑥 represents a log-concave measure on R𝑛 . It is absolutely continuous with respect to the Lebesgue measure, and the function 𝑔 appears as the density of 𝜇 ′. Therefore, 𝑔 is log-concave. One remarkable consequence of Corollary 2.4.3 is that the convolution of any two integrable log-concave functions on R𝑛 is log-concave as well. This property was discovered by Davidovich, Korenbljum and Khacet, with a rather straightforward elementary argument, cf. [78]. Now, let us briefly comment on the one-dimensional case (to which we will return later on). According to Borell’s characterization, a probability measure 𝜇 on the real line is log-concave if and only if it is a delta-measure (mass point) or if it has a logconcave density 𝑓 supported on some open interval (𝑎, 𝑏), finite or not. In particular, it must be unimodal: 𝑓 is non-increasing on (𝑎, 𝑥0 ] and is non-decreasing on [𝑥0 , 𝑏) for some 𝑥0 ∈ [𝑎, 𝑏]. The latter can be seen from another useful characterization. With any probability measure 𝜇 which is supported on (𝑎, 𝑏) and has there an almost everywhere positive density 𝑓 , one may associate the function 𝐼 𝜇 ( 𝑝) = 𝑓 (𝐹 −1 ( 𝑝)),
0 < 𝑝 < 1,
(2.14)
where 𝐹 −1 : (0, 1) → (𝑎, 𝑏) is the inverse to the distribution function 𝐹 (𝑥) = 𝜇((𝑎, 𝑥)), 𝑎 < 𝑥 < 𝑏, which is obviously increasing on (𝑎, 𝑏). Note that, up to a shift, any positive continuous function 𝐼 on (0, 1) defines a unique probability measure 𝜇 with the associated function 𝐼 = 𝐼 𝜇 , for example, via the identity ∫ 𝑝 d𝑡 𝐹 −1 ( 𝑝) = 𝑚 + , 𝐼 1/2 (𝑡) where 𝑚 is a median for 𝜇. For example, for the two-sided exponential distribution 𝜈 with density 21 e−| 𝑥 | , the associated function is given by 𝐼 𝜈 ( 𝑝) = min{𝑝, 1 − 𝑝}. Borell’s characterization yields the following useful description. Proposition 2.4.4 A non-degenerate probability measure 𝜇 on the real line is logconcave if and only if it is supported on an open interval, finite or not, and has there a positive continuous density, such that the function 𝐼 𝜇 is concave on (0, 1).
2.5 Khinchine-type Inequalities for Norms and Polynomials The density 𝑓 of an absolutely continuous log-concave probability measure 𝜇 on R𝑛 must decay exponentially fast at infinity, and one may easily derive an upper bound 𝑓 (𝑥) ≤ 𝐶 e−𝑐 | 𝑥 | with positive constants 𝑐 and 𝐶 which do not depend on 𝑥. Therefore, linear functionals over log-concave probability measures have a finite exponential moment. Moreover, the 𝐿 𝑝 -moments turn out to be equivalent to each other, so that Khinchine-type inequalities hold true. In fact, there is a more general
2.5 Khinchine-type Inequalities for Norms and Polynomials
39
observation due to Borell [63] about the 𝐿 𝑝 -norms ∥ 𝑋 ∥ 𝑝 = (E ∥ 𝑋 ∥ 𝑝 ) 1/ 𝑝 for an arbitrary norm ∥ · ∥ on R𝑛 . Proposition 2.5.1 If a random vector 𝑋 in R𝑛 has a log-concave distribution, then ∥ 𝑋 ∥ 𝑝 ≤ 𝐶 𝑝 E ∥ 𝑋 ∥,
𝑝 ≥ 1,
(2.15)
for some positive absolute constant 𝐶. Proof We follow a simple argument by Borell. Putting 𝐴 = {𝑥 ∈ R𝑛 : ∥𝑥∥ < 𝑟} and 𝐵 = R𝑛 \ 𝜆𝐴 (𝜆 > 1), we have a set inclusion 𝜆−1 2 𝐴+ 𝐵 ⊂ R𝑛 \ 𝐴. 𝜆+1 𝜆+1 Hence, by (2.9), 2
𝜆−1
1 − 𝜇( 𝐴) ≥ 𝜇( 𝐴) 𝜆+1 (1 − 𝜇(𝜆𝐴)) 𝜆+1 , or equivalently, P{∥ 𝑋 ∥ ≥ 𝜆𝑟 } ≤ 𝛼
1 − 𝛼 𝜆+1 2
, 𝛼 where 𝛼 = P{∥ 𝑋 ∥ < 𝑟 }. Since (2.15) is homogeneous with respect to 𝑋, we may assume that 𝑟 = 1 is a quantile of ∥ 𝑋 ∥ of order 𝛼 = 2/3. This leads to P{∥ 𝑋 ∥ ≥ 𝜆} ≤
1 − 𝜆−1 2 2 , 3
𝜆 ≥ 1.
By direct integration, ∫ ∞ ∫ 1 1 1 ∞ − 𝜆−1 𝑝 + P{∥ 𝑋 ∥ ≥ 𝜆} d𝜆 𝑝 ≤ + 2 2 d𝜆 3 3 3 1 1 √ ∫ 1 ∞ − 𝜆−1 𝑝 2 2 𝑝 ≤ 2 2 d𝜆 = Γ( 𝑝 + 1). 3 0 3 log 2
E ∥ 𝑋 ∥ 𝑝 1 { ∥𝑋 ∥ ≥1} =
Using Γ( 𝑝 + 1) ≤ 𝑝 𝑝 , we get that E ∥ 𝑋 ∥ 𝑝 ≤ 1 + E ∥ 𝑋 ∥ 𝑝 1 { ∥𝑋 ∥ ≥1} ≤ 1 +
2𝑝 𝑝 2 𝑝 𝑝 ≤ 1+ 𝑝 , log 2 log 2
so that (E ∥ 𝑋 ∥ 𝑝 ) 1/ 𝑝 ≤ (1 + log2 2 ) 𝑝. Finally, since 13 ≤ P{∥ 𝑋 ∥ ≥ 1} ≤ E ∥ 𝑋 ∥, we arrive at the conclusion of the proposition with constant 𝐶 = 3 (1 + log2 2 ) < 12. □
40
2 Some Classes of Probability Distributions
Corollary 2.5.2 If a random variable 𝜉 has a log-concave distribution, then (E |𝜉 | 𝑝 ) 1/ 𝑝 ≤ 𝐶 𝑝 E |𝜉 |,
𝑝 ≥ 1,
(2.16)
for some absolute constant 𝐶. Hence, if a random vector 𝑋 in R𝑛 has a log-concave distribution, then 𝑀 𝑝 (𝑋) ≤ 𝐶 𝑝 𝑀1 (𝑋). The second assertion uses the property that all linear functionals of 𝑋 represent log-concave random variables. In fact, to derive (2.16), elementary convexity-like arguments may be sufficient, and here we mention one of them. First assume that the distribution of 𝜉 is symmetric about zero and is not a mass point. Suppose for normalization that the quantile 𝜅 𝛼 for the distribution function 𝐹 (𝑥) = P{|𝜉 | ≤ 𝑥} of order 𝛼 ∈ (0, 1) satisfies 𝜅 𝛼 = − log(1−𝛼). At this step, we only use the weaker property that 𝑆(𝑥) = − log(1−𝐹 (𝑥)) is a convex function on [0, ∞). Let 𝑏 = sup{𝑥 ≥ 0 : 𝐹 (𝑥) < 1} and let 𝑇 = 𝑆 −1 : [0, ∞) → [0, 𝑏) be the inverse function, which is thus increasing and concave. Note that 𝑇 (𝜅 𝛼 ) = 𝜅 𝛼 and |𝜉 | = 𝑇 (𝜂), where the random variable 𝜂 has a standard exponential distribution with density e−𝑥 (𝑥 ≥ 0). Let 𝑙 = 𝑙 (𝑥) be an affine function majorizing 𝑇 and such that 𝑙 (𝜅 𝛼 ) = 𝑇 (𝜅 𝛼 ). Equivalently, for some 𝑐 ∈ R, 𝑙 (𝑥) = 𝑇 (𝜅 𝛼 ) + 𝑐(𝑥 − 𝜅 𝛼 ) = 𝑐𝑥 + (1 − 𝑐)𝜅 𝛼 ,
𝑥 ≥ 0.
Since 𝑙 (0) ≥ 0, we necessarily have 0 ≤ 𝑐 ≤ 1, and by the convexity of the power function, 𝑙 (𝑥) 𝑝 ≤ 𝑐𝑥 𝑝 + (1 − 𝑐)𝜅 𝛼𝑝 . Hence E |𝜉 | 𝑝 = E 𝑇 (𝜂) 𝑝 ≤ E 𝑙 (𝜂) 𝑝 ≤ 𝑐 E 𝜂 𝑝 + (1 − 𝑐)𝜅 𝛼𝑝 = 𝑐 Γ( 𝑝 + 1) + (1 − 𝑐)𝜅 𝛼𝑝 ≤ max{Γ( 𝑝 + 1), 𝜅 𝛼𝑝 }. It remains to relate the quantile to the first absolute moment of 𝜉. Applying Markov’s inequality, we have 1 − 𝛼 = P{|𝜉 | ≥ 𝜅 𝛼 } ≤ 𝜅1𝛼 E |𝜉 |, hence E |𝜉 | 𝑝 ≤
max{Γ( 𝑝 + 1), 𝜅 𝛼𝑝 } (E |𝜉 |) 𝑝 . ((1 − 𝛼)𝜅 𝛼 ) 𝑝
This inequality is homogeneous with respect to 𝜉 of order 𝑝, and therefore one may drop the normalization condition. Choosing 𝛼 = 1 − 1/𝑒, we arrive at E |𝜉 | 𝑝 ≤ e 𝑝 Γ( 𝑝 + 1) (E |𝜉 |) 𝑝 . Since Γ( 𝑝 + 1) 1/ 𝑝 ≤ 𝑝, we obtain the inequality (2.16) with constant 𝐶 = e. To remove the symmetry assumption, assume that E𝜉 = 0 and apply the previous step to 𝜉 − 𝜉 ′, where 𝜉 ′ is an independent copy of 𝜉. By Jensen’ inequality, we get E |𝜉 | 𝑝 ≤ E |𝜉 − 𝜉 ′ | 𝑝 ≤ e 𝑝 Γ( 𝑝 + 1) (E |𝜉 − 𝜉 ′ |) 𝑝 ≤ (2𝑒) 𝑝 Γ( 𝑝 + 1) (E |𝜉 |) 𝑝 .
2.5 Khinchine-type Inequalities for Norms and Polynomials
41
Finally, in the general one-dimensional case, write 𝜉 = 𝜂 + E𝜉. Then ∥𝜂∥ 1 ≤ 2 ∥𝜉 ∥ 1 , and, by the previous step applied to 𝜂, ∥𝜉 ∥ 𝑝 ≤ ∥𝜂∥ 𝑝 + ∥𝜉 ∥ 1 ≤ 2𝑒 Γ( 𝑝 + 1) 1/ 𝑝 ∥𝜂∥ 1 + ∥𝜉 ∥ 1 ≤ (4𝑒 Γ( 𝑝 + 1) 1/ 𝑝 + 1) ∥𝜉 ∥ 1 , which yields the inequality (2.16) with constant 𝐶 = 4e + 1. Another important feature of the log-concavity is that the 𝐿 1 -norm on the righthand side of (2.15) may be replaced with a smaller quantity, the so-called “𝐿 0 -norm” defined by ∥ 𝑋 ∥ 0 = lim ∥ 𝑋 ∥ 𝑝 = exp E log ∥ 𝑋 ∥ . 𝑝→0
Introduce the distribution function 𝐹 (𝑟) = P{∥ 𝑋 ∥ ≤ 𝑟 },
𝑟 ≥ 0.
The next Khinchine-type inequality is due to Latala [127]. We will obtain it as a consequence of a stronger property following an argument of [20]. Proposition 2.5.3 If the random vector 𝑋 has a log-concave distribution on R𝑛 , e then the function 𝐹 (𝑟) 𝑟 −1/(2e) is non-decreasing in the interval 0 < 𝐹 (𝑟) ≤ e+1 . In particular, E ∥ 𝑋 ∥ ≤ 𝐶 ∥ 𝑋 ∥0 (2.17) for some absolute constant 𝐶. Proof Without loss of generality, assume that the distribution of 𝑋 is not a mass point, so that the interval Δ = {𝑟 : 0 < 𝐹 (𝑟) < 1} is not empty. By the definition (2.9), the function log 𝐹 (𝑟) is concave in Δ, so 𝐹 (𝑟) has a left continuous Radon– Nikodym derivative 𝑓 (𝑟) on Δ. Moreover, using a simple approximation argument, we may assume that the function 𝑓 is continuous on Δ. For 𝑟, 𝑠 ∈ Δ, 𝑠 > 𝑟, consider the set 𝐴 = {𝑥 ∈ R𝑛 : 𝑟 < ∥𝑥∥ ≤ 𝑠}, and denote by 𝐵 = {𝑥 ∈ R𝑛 : ∥𝑥∥ ≤ 1} the unit ball for the given norm. Since for 𝜆 ∈ (0, 12 ], (1 − 𝜆) 𝐴 + 𝜆 (𝑟 𝐵) ⊂ {𝑥 ∈ R𝑛 : (1 − 2𝜆) 𝑟 < ∥𝑥∥ ≤ (1 − 𝜆) 𝑠 + 𝜆𝑟 }, we obtain from (2.9) that 𝐹 ((1 − 𝜆)𝑠 + 𝜆𝑟) − 𝐹 ((1 − 2𝜆)𝑟) ≥ (𝐹 (𝑠) − 𝐹 (𝑟)) 1−𝜆 𝐹 (𝑟) 𝜆 . Here, there is an equality for 𝜆 = 0, and comparing the derivatives of both sides at 𝜆 = 0, we arrive at 𝑓 (𝑠) (𝑟 − 𝑠) + 2 𝑓 (𝑟)𝑟 ≥ (𝐹 (𝑠) − 𝐹 (𝑟)) log
𝐹 (𝑟) . 𝐹 (𝑠) − 𝐹 (𝑟)
42
2 Some Classes of Probability Distributions
Hence, since 𝑟 < 𝑠, 2 𝑓 (𝑟)𝑟 ≥ (𝐹 (𝑠) − 𝐹 (𝑟)) log
𝐹 (𝑟) . 𝐹 (𝑠) − 𝐹 (𝑟)
1 , one can choose 𝑠 ∈ Δ so Now, given 0 < 𝛼 < 1 and 𝑟 ∈ Δ such that 𝐹 (𝑟) < 1+𝛼 that 𝐹 (𝑠) − 𝐹 (𝑟) = 𝛼𝐹 (𝑟), and then the above inequality yields
2 𝑓 (𝑟)𝑟 ≥
𝛼 𝐹 (𝑟), log(1/𝛼)
𝑝 𝛼 that is, 𝐹𝑓 (𝑟) (𝑟) ≥ 𝑟 with constant 𝑝 = 2 log(1/𝛼) . But this is equivalent to the property 𝑝 that the function log 𝐹 (𝑟) − log(𝑟 ) is non-decreasing on the interval 0 < 𝐹 (𝑟) < 1 1+𝛼 . Choosing 𝛼 = 1/e, which maximizes the power 𝑝 = 𝑝(𝛼), the first assertion of the proposition is thus proved. Assuming for normalization that ∥ 𝑋 ∥ has a quantile of order 2/3 at 𝜅 = 1, then e ). Hence 𝐹 (𝑟) ≤ 𝐹 (1) 𝑟 1/(2e) = 23 𝑟 1/(2e) for all 0 < 𝑟 ≤ 1 (since 23 ≤ e+1 ∞
∫ E log ∥ 𝑋 ∥ =
1
∫ log 𝑟 d𝐹 (𝑟) ≥
0
log 𝑟 d𝐹 (𝑟) 0
∫
1
=− 0
𝐹 (𝑟) 2 d𝑟 ≥ − 𝑟 3
∫ 0
1
𝑟 1/(2e) 4e d𝑟 = − . 𝑟 3
Dropping the normalization condition on the quantile, we arrive at the relation ∥ 𝑋 ∥ 0 ≥ exp{− 4e 3 } 𝜅. But, by Proposition 2.5.1, E ∥ 𝑋 ∥ ≤ 𝑐𝜅, which has been derived 2 with 𝑐 = 1 + log 2 . Hence, the inequality (2.17) follows with constant 𝐶 = 𝑐 exp{ 4e 3 }. Proposition 2.5.3 is thus proved. □ Similar dimension-free Khinchine-type inequalities continue to hold for polynomials of bounded degree in place of the norms. Let us mention without proof the following important result (which generalizes Corollary 2.5.2). Proposition 2.5.4 Suppose that the random vector 𝑋 has a log-concave distribution on R𝑛 . For any polynomial 𝑄 in 𝑛 real variables of degree 𝑑, and any 𝑝 ≥ 1, (E |𝑄(𝑋)| 𝑝 ) 1/ 𝑝 ≤ 𝐶 𝑝,𝑑 E |𝑄(𝑋)| with some constant 𝐶 𝑝,𝑑 depending on 𝑝 and 𝑑 only. One immediate consequence of this result concerns the variance-type functionals 2 𝑝 1/ 𝑝 |𝑋 | − 1 , 𝜎2 𝑝 (𝑋) = 𝑛 E 𝑛 √
𝑝 ≥ 1.
Since 𝑄(𝑥) = |𝑥| 2 is a quadratic polynomial on R𝑛 , Proposition 2.5.4 yields:
2.6 One-dimensional Log-concave Distributions
43
Corollary 2.5.5 If a random vector 𝑋 in R𝑛 has a log-concave distribution, then 𝜎2 𝑝 (𝑋) ≤ 𝐶 𝑝 𝜎2 (𝑋) for some constant 𝐶 𝑝 depending on 𝑝 only. This relation holds regardless of the condition E |𝑋 | 2 = 𝑛 which was used in the definition of 𝜎2 𝑝 (𝑋). If 𝑋 is log-concave and isotropic, then 𝑀2 (𝑋) = 1 and all moment functionals 𝑀 𝑝 (𝑋) are bounded by constants depending on 𝑝 only (Corollary 2.5.2). Hence, by Propositions 1.5.1 and 1.5.3, we also have the relations Var(|𝑋 |) ≤ 𝜎42 (𝑋) ≤ 16 Var(|𝑋 |) + 𝐶 for some positive absolute constant 𝐶.
2.6 One-dimensional Log-concave Distributions The description of log-concavity in dimension one, given in Proposition 2.4.4, may be used to establish a number of relations between various moment-like functionals and the value of the density 𝑓 at the median 𝑚. In particular, we have: Proposition 2.6.1 Given a random variable 𝜉 with a log-concave density 𝑓 , 1 1 ≤ 𝑓 (𝑚) 2 ≤ . 12 Var(𝜉) 2 Var(𝜉) An equality in the first inequality is achieved for the uniform distribution on any finite interval, while an equality in the second inequality is achieved for the twosided exponential distribution 𝜈. Proof Let 𝜇 be the distribution of 𝜉. As before, we denote by 𝐹 (𝑥) = P{𝜉 ≤ 𝑥}, 𝑥 ∈ R, the distribution function of 𝜉, and by 𝐹 −1 : (0, 1) → (𝑎, 𝑏) its inverse, where (𝑎, 𝑏) is the supporting interval for 𝜇. By the definition (2.14) of the 𝐼 𝜇 -function, ∫ 𝑝 d𝑡 , 𝑝, 𝑞 ∈ (0, 1), 𝐹 −1 ( 𝑝) − 𝐹 −1 (𝑞) = 𝐼 (𝑡) 𝜇 𝑞 while, by the concavity of 𝐼 𝜇 , 𝐼 𝜇 ( 𝑝) ≥ 2 𝐼 𝜇
1 2
min{𝑝, 1 − 𝑝} = 2 𝑓 (𝑚) 𝐼 𝜈 ( 𝑝),
0 < 𝑝 < 1,
where 𝜈 has a two-sided exponential distribution with density 12 e−| 𝑥 | (𝑥 ∈ R). Since the function 𝐹 −1 has distribution 𝜇 under the Lebesgue measure on (0,1), we get
44
2 Some Classes of Probability Distributions
Var(𝜉) = =
1 2
∫ 1∫
1 2
∫ 1∫
0
0
1
(𝐹 −1 ( 𝑝) − 𝐹 −1 (𝑞)) 2 d𝑝 d𝑞
0 1
∫
0
𝑝
d𝑡 𝐼 𝜇 (𝑡)
𝑞
2
1 2
d𝑝 d𝑞 ≤
∫ 1∫ 0
1 ∫ 𝑝
0
𝑞
d𝑡 2 𝑓 (𝑚) 𝐼 𝜈 (𝑡)
2 d𝑝 d𝑞.
For the same reason, if a random variable 𝜂 has distribution 𝜈, we have 1 Var(𝜂) = 2 Hence
∫ 1∫ 0
1
∫
0
𝑞
𝑝
d𝑡 𝐼 𝜈 (𝑡)
2 d𝑝 d𝑞.
1 1 , Var(𝜂) = (2 𝑓 (𝑚)) 2 2 𝑓 (𝑚) 2
Var(𝜉) ≤
thus proving the second inequality of the proposition. To prove the first inequality, we should estimate the function 𝐼 𝜇 from above. With this aim, let us again note that, by the concavity of 𝐼 𝜇 , for some 𝑐 ∈ R, 𝐼 𝜇 ( 𝑝) ≤ 𝑙 𝑐 ( 𝑝) ≡ 𝐼 𝜇
1 1 +𝑐 𝑝− = 𝑓 (𝑚) + 𝑐 𝑝 − , 2 2 2
1
0 < 𝑝 < 1.
Because 𝐼 𝜇 is non-negative, we have a restriction |𝑐| ≤ 2 𝑓 (𝑚). From this, 1 Var(𝜉) = 2
∫ 1∫
1 2
∫ 1∫
≥
0
1
∫
0
0
𝑝
𝑞 1 ∫ 𝑝
0
𝑞
2 d𝑡 d𝑝 d𝑞 𝐼 𝜇 (𝑡) 2 d𝑡 d𝑝 d𝑞 ≡ 𝑢(𝑐). 𝑙 𝑐 (𝑡)
The function 𝑐 → 1/𝑙 𝑐 (𝑡) is positive and convex, and so is the function 𝑢. Since the latter is also symmetric around zero, we get 𝑢(𝑐) ≥ 𝑢(0) for all admissible 𝑐. Therefore, 1 Var(𝜉) ≥ 2 =
∫ 1∫ 0
1
∫
𝑝
0
𝑞
1 2 𝑓 (𝑚) 2
∫ 1∫ 0
d𝑡 𝑙0 (𝑡)
2 d𝑝 d𝑞
1
( 𝑝 − 𝑞) 2 d𝑝 d𝑞 =
0
1 , 12 𝑓 (𝑚) 2
where the last expression describes the variance of the uniform distribution. Proposition 2.6.2 follows.
□
Here is another interesting application. Proposition 2.6.2 Given a random variable 𝜉 with a log-concave distribution, 1 1 ≤ P{𝜉 ≤ E𝜉} ≤ 1 − . e e
(2.18)
2.6 One-dimensional Log-concave Distributions
45
Both inequalities are sharp, since on the right-hand side there is equality for 𝜉 having a standard exponential distribution. The left inequality may be viewed as a “log-concave” or limit version of the known fact that for any convex body 𝐾 in R𝑛 and any half-space with boundary passing through the centroid of 𝐾, Vol𝑛 (𝐾 ∩ 𝐻) ≥
1 Vol𝑛 (𝐾). e
𝑛 𝑛 ) . This In the space of a fixed dimension 𝑛 the factor 1/e can be replaced with ( 𝑛+1 property was first observed by Grünbaum and Hammer with a proof based on the Schwarz symmetrization, cf. [104].
Proof We use the same notations as in the proof of Proposition 2.6.1. First note that, for any positive concave function 𝐽 ( 𝑝) in 0 < 𝑝 < 1, the function 𝐽 ( 𝑝)/𝑝 is non-increasing, hence 𝐽 ( 𝑝)/(1 − 𝑝) is non-decreasing. Using the concavity of the function 𝐼 = 𝐼 𝜇 , we conclude that, for any 𝑝 0 ∈ (0, 1), 𝐼 ( 𝑝0) 𝐼 ( 𝑝) ≤ for 𝑝 ∈ (0, 𝑝 0 ] 1− 𝑝 1 − 𝑝0
and
𝐼 ( 𝑝0) 𝐼 ( 𝑝) ≥ for 𝑝 ∈ [ 𝑝 0 , 1). 1− 𝑝 1 − 𝑝0
Hence, by Fubini’s theorem, ∫
1
−1
0
1
∫
𝑢
d𝑝 i d𝑢 0 𝑝0 𝐼 ( 𝑝) ∫ 𝑝0 ∫ 1 𝑝 1− 𝑝 = 𝐹 −1 ( 𝑝 0 ) + d𝑝 − d𝑝 𝐼 ( 𝑝) 0 𝑝0 𝐼 ( 𝑝) ∫ 1 ∫ 𝑝0 1 − 𝑝0 𝑝 1 − 𝑝0 −1 ≤ 𝐹 ( 𝑝0) + d𝑝 − d𝑝 𝐼 ( 𝑝 0 ) 𝑝0 𝐼 ( 𝑝0) 0 1 − 𝑝 (1 − 𝑝 0 ) (1 + log(1 − 𝑝 0 )) = 𝐹 −1 ( 𝑝 0 ) + . 𝐼 ( 𝑝0)
𝐹 (𝑢) d𝑢 =
E𝜉 =
∫
h
−1
𝐹 ( 𝑝0) +
Taking 𝑝 0 = 1− 1e , we arrive at E𝜉 ≤ 𝐹 −1 (1− 1e ) which is equivalent to 𝐹 (E𝜉) ≤ 1− 1e . This is exactly the right inequality in (2.18). The left inequality follows from the right one, by applying it to the random variable −𝜉. This proves the proposition. □ Another application deals with Khinchine-type inequalities for the moments of 𝜉 with respect to the restricted measures. Proposition 2.6.3 If a random variable 𝜉 defined on some Lebesgue probability space (Ω, F , P) has a log-concave distribution, then for every measurable set 𝐴 ⊂ Ω, ∫ ∫ P( 𝐴) 2 √︁ P( 𝐴) 3 Var(𝜉), Var(𝜉). |𝜉 | dP ≥ √ 𝜉 2 dP ≥ (2.19) 24 𝐴 𝐴 4 2 Proof We use the previous notations, assuming that 𝜉 has a log-concave density 𝑓 . The assumption that (Ω, F , P) is a Lebesgue space is understood in the sense of
46
2 Some Classes of Probability Distributions
Rokhlin, which means that (Ω, F , P) can be transformed by virtue of a measurepreserving map to the interval (0, 1) with some Borel probability measure 𝜆. Moreover, since the distribution of 𝜉 has no atom, 𝜆 can be assumed to be the Lebesgue measure on (0, 1); an account on these spaces may be found in [161], [59]. Thus, let Ω = (0, 1) be equipped with the Lebesgue measure P, and let 𝛼 = P( 𝐴) > 0. Putting 𝜉 ( 𝑝) = 𝐹 −1 ( 𝑝), one may further assume that 0 < 𝐹 (0) < 1 (otherwise, 𝜉 is either positive or negative, but then the inequalities in (2.19) can be made stronger by adding a constant to 𝜉). In this case, since the function 𝐹 −1 is increasing and continuous, 𝐹 −1 ( 𝑝 0 ) = 0 for some 𝑝 0 ∈ (0, 1). Hence ∫ 𝑝 d𝑢 1 |𝐹 −1 ( 𝑝)| = | 𝑝 − 𝑝 0 |, ≥ sup𝑢 𝐼 (𝑢) 𝑝0 𝐼 (𝑢) which yields ∫
∫
|𝐹 −1 ( 𝑝)| d𝑝 ≥
|𝜉 | dP = 𝐴
𝐴
1 sup𝑢 𝐼 (𝑢)
∫ | 𝑝 − 𝑝 0 | d𝑝. 𝐴
It should also be clear that within the ∫class of all measurable sets 𝐴 ⊂ (0, 1) of Lebesgue measure P( 𝐴) = 𝛼, the integral 𝐴 | 𝑝 − 𝑝 0 | d𝑝 attains its minimum at some interval (𝑐, 𝑑) of length 𝛼 containing 𝑝 0 . The worst situation corresponds to the case 1+𝛼 2 𝑝 0 = 12 and 𝐴 = ( 1−𝛼 2 , 2 ), and then this integral is equal to 𝛼 /4. Hence ∫ |𝜉 | dP ≥ 𝐴
𝛼2 . 4 sup𝑢 𝐼 (𝑢)
It remains to note that, by the concavity of the function 𝐼, we have √︄ 2 sup |𝐼 (𝑢)| ≤ 2𝐼 (1/2) = 2 𝑓 (𝑚) ≤ , Var(𝜉) 𝑢 where we applied Proposition 2.6.1 in the last step. By a similar argument, ∫ ∫ 1 𝜉 2 dP ≥ ( 𝑝 − 𝑝 0 ) 2 d𝑝 (sup𝑢 𝐼 (𝑢)) 2 𝐴 𝐴 ∫ 1+𝛼 2 1 2 𝛼3 1 𝑝 − d𝑝 = ≥ , 2 (sup𝑢 𝐼 (𝑢)) 2 1−𝛼 12 (sup𝑢 𝐼 (𝑢)) 2 2 and then Proposition 2.6.1 gives the second inequality in (2.19). Note also that the first inequality implies the second one by applying the Cauchy inequality, but with a worse constant. □ Finally, let us quantify the assertion on the exponential decay of log-concave densities.
2.6 One-dimensional Log-concave Distributions
47
Proposition 2.6.4 If a random variable 𝜉√︁has a log-concave density 𝑓 with expectation 𝑎 = E𝜉 and standard deviation 𝜎 = Var(𝜉), then for all 𝑥 ∈ R, 𝑓 (𝑥) ≤
𝐶 −| 𝑥−𝑎 |/𝜎 √12 e , 𝜎
where 𝐶 is a universal constant. Proof Let 𝑚 be a median of 𝜉. First we show that 𝑓 (𝑥) ≤ 2e 𝑓 (𝑚) e− 𝑓 (𝑚) |𝑥−𝑚 | ,
𝑥 ∈ R. ∫∞ We may assume that 𝑥 > 0 and 𝑚 = 0 so that P{𝜉 ≥ 0} = 0 𝑓 (𝑥) d𝑥 = 21 . First consider the case 𝑓 (0)𝑥 ≥ 1. We use an upper bound
(2.20)
𝑓 (𝑥) ≤ 𝑀 (𝑥) = sup e−𝑢( 𝑥) , 𝑢
where the supremum is taken over all convex functions 𝑢 : [0, ∞) → (−∞, ∞] such that e−𝑢(0) = 𝑓 (0) and ∫ ∞ 1 (2.21) e−𝑢( 𝑥) d𝑥 ≤ . 2 0 By the convexity, for any such 𝑢, all these constraints will also be fulfilled for the function 𝑢 0 which is linear in [0, 𝑥], is equal to ∞ on (𝑥, ∞), and which has values 𝑢 0 (0) = 𝑢(0), 𝑢 0 (𝑥) = 𝑢(𝑥). Hence, while computing 𝑀 (𝑥), we may restrict ourselves to functions of the form 𝑢 𝛼 (𝑦) = − log 𝑓 (0) + 𝛼𝑦, 0 ≤ 𝑦 ≤ 𝑥 and 𝑢 𝛼 (𝑦) = ∞ for 𝑦 > 𝑥. The range of 𝛼 is determined by (2.21), that is, by the −𝛼𝑥 −𝑦 condition 𝑓 (0) 1−e𝛼 ≤ 12 . Since the function 1−e𝑦 is positive and decreases on R, we may conclude that 𝑀 (𝑥) = 𝑓 (0) e−𝑦
where 𝑦 is the solution of
1 1 − e−𝑦 = . 𝑦 2 𝑓 (0)𝑥
1 1 But, by the assumption, 2 𝑓 (0) 𝑥 ≤ 2 , so, 𝑦 ≥ 1, since otherwise −𝑦 1 1−e 1 Hence 2 𝑓 (0) ≥ 2𝑦 and thus 𝑦 ≥ 𝑓 (0)𝑥, implying 𝑥 = 𝑦
1−e−𝑦 𝑦
≥ 1 − e−1 > 12 .
𝑓 (𝑥) ≤ 𝑀 (𝑥) ≤ 𝑓 (0) e− 𝑓 (0) 𝑥 . To treat the case 𝑓 (0)𝑥 < 1, just recall that, by the concavity of 𝐼 (𝑡) = 𝑓 (𝐹 −1 (𝑡)), we have 𝐼 (𝑡) ≤ 2𝐼 (1/2) = 2 𝑓 (0), implying 𝑓 (𝑥) ≤ 2 𝑓 (0) ≤ 2e 𝑓 (0) e− 𝑓 (0) 𝑥 . In both cases, we obtain the desired estimate (2.20). Now, starting from (2.20), we immediately get |E𝜉 − 𝑚| ≤ 4e/ 𝑓 (𝑚). Hence, applying (2.20) once more, we have 𝑓 (𝑥) ≤ 2e · e4𝑒 𝑓 (𝑚) e− 𝑓 (𝑚) |𝑥−𝑎 | . It remains to apply Proposition 2.6.1, connecting 𝑓 (𝑚) with 𝜎.
□
48
2 Some Classes of Probability Distributions
2.7 Remarks The material in Sections 2.1–2.3 is rather standard. Small ball probability bounds for independent summands like the one in Proposition 2.1.4 have been studied by many authors. Systems of pairwise independent random variables with similar deterministic properties were first constructed by Joffe [110]. The system 𝑋 𝑘 (𝑡, 𝑠) = Ψ(𝑘𝑡 + 𝑠) as above was considered in [46]. Besides Borell’s characterization of log-concave measures (in the sufficiency part of Proposition 2.4.1) and Prékopa’s theorem (Corollary 2.4.3), the Prékopa–Leindler theorem is known to have numerous applications. As a simple example, one can start from a random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R+𝑛 with a log-concave density 𝑓 and note that the condition (2.11) is satisfied by the functions 𝑥 𝑝1 𝑥 𝑝𝑛 𝑦 𝑞1 𝑦 𝑞𝑛 1 𝑛 1 𝑛 𝑢(𝑥) = ... 𝑓 (𝑥), 𝑣(𝑦) = ... 𝑓 (𝑦), 𝑝1 𝑝𝑛 𝑞1 𝑞𝑛 𝑧 𝑟1 𝑧 𝑟𝑛 1 𝑛 𝑤(𝑧) = ... 𝑓 (𝑧), 𝑟1 𝑟𝑛 as long as 𝑝 𝑘 > 𝑟 𝑘 > 𝑞 𝑘 > 0, 𝑟 𝑘 = 𝑡 𝑝 𝑘 + 𝑠𝑞 𝑘 (𝑡, 𝑠 > 0, 𝑡 + 𝑠 = 1). The conclusion (2.12) of Proposition 2.4.2 is then equivalent to the statement that the function 𝑔( 𝑝 1 , . . . , 𝑝 𝑛 ) = E
𝑋 𝑝1 1
𝑋 𝑝𝑛 𝑛
...
𝑝1
𝑝𝑛
is log-concave on R+𝑛 (cf. [27]). In dimension 𝑛 = 1, when 𝑋 has a standard exponential distribution, we obtain in particular that the function Γ( 𝑝 + 1)/𝑝 𝑝 is log-concave in 𝑝 > 0 (a property which was mentioned in the proof of Proposition 1.3.3). Khinchine-type inequalities for norms as in Propositions 2.5.1 and 2.5.3 may further be strengthened to the form ∥ 𝑋 ∥ 𝑝 ≤ 𝐶 𝑝,𝑞 ∥ 𝑋 ∥ 𝑞 ,
𝑝 > 𝑞 > −1,
cf. Guédon [102]. In the class of one-dimensional symmetric log-concave distributions, best constants in Khinchine-type inequalities were found by Karlin, Proschan and Barlow [119] in their study of total positivity. Namely, they proved that (E |𝜉 | 𝑝 ) 1/ 𝑝 ≤ Γ( 𝑝 + 1) 1/ 𝑝 E |𝜉 |,
𝑝 ≥ 1,
as long as the random variable 𝜉 has a symmetric log-concave distribution on the real line. Here, equality is attained for the two-sided exponential distribution. This distribution plays an extremal role in some similar relations involving the values of the density 𝑓 of 𝜉 at zero. In particular, for the same class, K. Ball [10] proves that 𝑓 (0) 𝑝 Γ(𝑞 + 1) 𝑝+1 (E |𝜉 | 𝑝 ) 𝑞+1 ≤ 𝑓 (0) 𝑞 Γ( 𝑝 + 1) 𝑞+1 (E |𝜉 | 𝑞 ) 𝑝+1 whenever 0 ≤ 𝑞 < 𝑝 < ∞.
2.7 Remarks
49
Relations between 𝐿 𝑝 -norms of 𝜉 with 3 different values of 𝑝 have been studied in the context of reverse Lyapunov inequalities, cf. e.g. Borell [62]. The description of log-concave probability distributions on the line as in Proposition 2.4.4 was emphasized in [18]. Let 𝜇 be a non-atomic probability measure with distribution function 𝐹 which is continuous and strictly increasing on the supporting interval. In [18] it was shown that 𝜇 is log-concave if and only if, for any ℎ > 0, the function 𝑅 ℎ ( 𝑝) = 𝐹 (𝐹 −1 ( 𝑝) + ℎ) is concave on (0, 1). The material of Section 2.6 is based on the papers [20] and [25]. Khinchine-type inequalities for polynomials over convex bodies were studied by Bourgain [67]. The more general Proposition 2.5.4 was proved in [22], [23]. In this chapter, we did not discuss in detail the behavior of the variance functional 𝜎42 (𝑋) =
1 Var(|𝑋 | 2 ), 𝑛
when the random vector 𝑋 has a log-concave distribution on R𝑛 . The so-called thin shell (or variance) conjecture asserts that 𝜎4 (𝑋) is bounded by an absolute constant in the entire class of all isotropic log-concave probability measures on R𝑛 , and regardless of the dimension 𝑛. In connection with the central limit theorem this conjecture was raised in [4] and [51]. Nevertheless, some positive results are known, and, for example, Proposition 6.6.2 from Chapter 6 will confirm the thinshell conjecture for coordinatewise symmetric log-concave distributions (Klartag’s theorem). In the general isotropic log-concave case, the first non-trivial bounds were obtained by Klartag [121], [122]. Later, Guédon and E. Milman [101] derived the estimate 𝜎4 (𝑋) ≤ 𝑐𝑛1/3 . Recently, Lee and Vempala [133], [134] developed Eldan’s localization technique ([84]) and showed that 𝜎4 (𝑋) ≤ 𝑐𝑛1/4 . Chen [77] considerably improved this method deriving the estimate 𝜎4 (𝑋) ≤ 𝑐𝑛 𝜀𝑛 with a certain sequence 𝜀 𝑛 → 0. The next sharpening 𝜎4 (𝑋) ≤ 𝑐 (log 𝑛) 𝛼 with 𝛼 = 4 is due to Klartag and Lehec [124]. The preprint [108] contains a further sharpening with 𝛼 = 2.2226, which is the best known result so far. 4 when the random As for lower bounds, it was shown in [51] that 𝜎42 (𝑋) ≥ 𝑛+4 vector 𝑋 is uniformly distributed in a symmetric isotropic convex body (an equality is attained for the Euclidean ball). Let us also mention two results about large and small ball probabilities for isotropic log-concave random vectors 𝑋 in R𝑛 . It was shown by Paouris that, for some absolute constants 𝑟 0 > 0 and 𝑐 > 0, P{|𝑋 | 2 ≤ 𝑟𝑛} ≤ 𝑟 𝑐
√ 𝑛
,
0 ≤ 𝑟 ≤ 𝑟0,
cf. [152], [69]. Whether or not one can sharpen this bound to P{|𝑋 | 2 ≤ 𝑟 0 𝑛} ≤ e−𝑐𝑛
50
2 Some Classes of Probability Distributions
is equivalent to the so-called slicing problem (as was noticed by Bourgain, Klartag and V. Milman). The latter may equivalently be formulated as the property that the density 𝑓 of 𝑋 satisfies sup 𝑥 𝑓 (𝑥) ≥ 𝑐 𝑛 for some absolute constant 𝑐 > 0 (still assuming that 𝑋 has an isotropic log-concave distribution). As was shown by Eldan and Klartag [83], this would be indeed true if the thin shell conjecture were true. That the slicing conjecture is true for coordinatewise symmetric log-concave distributions is a well-known simple fact, cf. e.g. [58]. On the other hand, if 𝑋 is isotropic and is uniformly distributed over a convex body 𝐾 ⊂ R𝑛 , then we have a large deviation bound √ √ P{|𝑋 | ≥ 𝑐𝑟 𝑛} ≤ e−𝑟 𝑛 ,
𝑟 ≥ 1,
where 𝑐 > 0 is an absolute constant (Paouris [151]). √ Thus, an essential part of mass in 𝐾 is contained in the Euclidean ball of radius 𝑐 𝑛. For coordinatewise symmetric convex bodies such bounds were previously studied in [58].
Chapter 3
Characteristic Functions
The moment functionals we discussed before may be explicitly expressed in terms of characteristic functions of linear functionals of a given random vector. However, information on various bounds on characteristic functions and their deviations from the characteristic function of another law on the real line will be needed for a different purpose – to study the Kolmogorov and Lévy distances between the corresponding distribution functions. In this chapter, we describe general tools in the form of smoothing and Berry–Esseen-type inequalities, which allow one to perform the transition from the results about closeness or smallness of Fourier–Stieltjes transforms to corresponding results about the associated functions of bounded variation.
3.1 Smoothing For the Kolmogorov distance we use the standard notation 𝜌(𝐹, 𝐺) = sup |𝐹 (𝑥) − 𝐺 (𝑥)| = ∥𝐹 − 𝐺 ∥, 𝑥
where 𝐹 and 𝐺 may be arbitrary functions of bounded variation on the real line, and where ∥ · ∥ refers to the 𝐿 ∞ -norm. Denote their Fourier–Stieltjes transforms by ∫ ∞ ∫ ∞ 𝑖𝑡 𝑥 e d𝐹 (𝑥), 𝑔(𝑡) = 𝑓 (𝑡) = e𝑖𝑡 𝑥 d𝐺 (𝑥) (𝑡 ∈ R). −∞
−∞
Proposition 3.1.1 Let 𝐹 be a non-decreasing bounded function, and let 𝐺 be a function of bounded variation such that 𝐹 (−∞) = 𝐺 (−∞) = 0. For any 𝑇 > 0, ∫
𝑇
𝜌(𝐹, 𝐺) ≤ −𝑇
∫ 𝑓 (𝑡) − 𝑔(𝑡) d𝑡 + 𝑇 sup 𝑡 𝑥 ∈R 0
2𝜋 𝑇
|𝐺 (𝑥 + 𝑢) − 𝐺 (𝑥)| d𝑢.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_3
(3.1)
51
52
3 Characteristic Functions
Proof Put 𝐴 = 𝐹 − 𝐺 and 𝜀 = 1/𝑇. We use smoothing of the function 𝐴 with the distribution function 𝐻 𝜀 (𝑥) = 𝐻 (𝑥/𝜀). That is, consider the convolution 𝐴 𝜀 = 𝐴 ∗ 𝐻 𝜀 , defined by ∫ ∞
𝐴(𝑥 − 𝑦) d𝐻 𝜀 (𝑦),
𝐴 𝜀 (𝑥) = −∞
by choosing the canonical distribution function with density 𝑝(𝑥) =
d𝐻 (𝑥) 1 sin(𝑥/2) 2 1 − cos 𝑥 = = , d𝑥 2𝜋 𝑥/2 𝜋𝑥 2
𝑥 ∈ R.
It has the triangle characteristic function ℎ(𝑡) = (1 − |𝑡|) + , so that 𝐻 𝜀 has the characteristic function ℎ 𝜀 (𝑡) = (1 − 𝜀|𝑡|) + supported on the interval [−𝑇, 𝑇]. We may assume that the first integral in (3.1) is convergent. Note that 𝐴 𝜀 has the Fourier–Stieltjes transform 𝑎 𝜀 (𝑡) = ( 𝑓 (𝑡) − 𝑔(𝑡)) ℎ 𝜀 (𝑡), which is supported on [−𝑇, 𝑇]. Hence, the function 𝐴 𝜀 is continuous and admits the representation according to the Fourier inversion formula for functions of bounded variation. That is, for all 𝑥, 𝑦 ∈ R, we have e−𝑖𝑡 𝑥 − e−𝑖𝑡 𝑦 𝑎 𝜀 (𝑡) d𝑡. −𝑖𝑡 −𝑇 ∫ 𝑇 −𝑖𝑡 𝑦 Here, since the function 𝑎 𝜀 (𝑡)/𝑡 is integrable, necessarily −𝑇 e 𝑖𝑡 𝑎 𝜀 (𝑡) d𝑡 → 0 as 𝑦 → −∞ (by the Riemann–Lebesgue lemma). Since also ∫
1 2𝜋
𝐴 𝜀 (𝑥) − 𝐴 𝜀 (𝑦) =
𝑇
lim 𝐴 𝜀 (𝑦) = lim 𝐴(𝑦) = 𝐹 (−∞) − 𝐺 (−∞) = 0, 𝑦→−∞
𝑦→−∞
we obtain that 1 𝐴 𝜀 (𝑥) = 2𝜋
∫
𝑇
−𝑇
e−𝑖𝑡 𝑥 𝑎 𝜀 (𝑡) d𝑡, −𝑖𝑡
𝑥 ∈ R.
(3.2)
This equality implies that 1 | 𝐴 𝜀 (𝑥)| ≤ 2𝜋
∫
𝑇
−𝑇
∫ 𝑇 𝑎 (𝑡) 1 𝜀 𝑓 (𝑡) − 𝑔(𝑡) d𝑡 ≤ d𝑡, 𝑡 2𝜋 −𝑇 𝑡
and since 𝑥 is arbitrary, ∥ 𝐴𝜀 ∥ ≤
1 2𝜋
∫
𝑇
−𝑇
𝑓 (𝑡) − 𝑔(𝑡) d𝑡. 𝑡
The next step is to properly bound ∥ 𝐴 𝜀 ∥ from below. Write ∫ ∞ 𝐴 𝜀 (𝑥) = 𝐴(𝑥 − 𝜀𝑦) d𝐻 (𝑦) = 𝐼0 (𝑥) + 𝐼1 (𝑥) −∞
with
(3.3)
3.1 Smoothing
53
∫
∫
𝐼0 (𝑥) =
𝐴(𝑥 − 𝜀𝑦) d𝐻 (𝑦),
𝐼1 (𝑥) =
|𝑦 | ≤𝑟/2
𝐴(𝑥 − 𝜀𝑦) d𝐻 (𝑦), |𝑦 |>𝑟/2
where the parameter 𝑟 > 0 will be chosen later on. Putting h 𝑟 𝑟 i ∫ 𝛾 =1−𝐻 − , = 𝑝(𝑥) d𝑥, 2 2 | 𝑥 | ≥𝑟/2 where 𝐻 is treated as measure, we have |𝐼1 (𝑥)| ≤ 𝛾∥ 𝐴∥ and thus ∥𝐼1 ∥ ≤ 𝛾∥ 𝐴∥. To estimate the integral 𝐼0 (𝑥), introduce the quantity ∫
ℎ
|𝐺 (𝑥 + 𝑢) − 𝐺 (𝑥)| d𝑢,
𝛿(ℎ) = sup 𝑥
ℎ > 0,
0
which appears on the right-hand side of (3.1). The monotonicity of 𝐹 ensures that ∫ 𝐼0 (𝑥) ≥ (𝐹 (𝑥 − 𝜀𝑟/2) − 𝐺 (𝑥 − 𝜀𝑦)) d𝐻 (𝑦) |𝑦 | ≤𝑟/2
= (1 − 𝛾) (𝐹 (𝑥 − 𝜀𝑟/2) − 𝐺 (𝑥 − 𝜀𝑟/2)) ∫ + (𝐺 (𝑥 − 𝜀𝑟/2) − 𝐺 (𝑥 − 𝜀𝑦)) 𝑝(𝑦) d𝑦. |𝑦 | ≤𝑟/2
Since 𝑝(𝑦) ≤
1 2𝜋𝜀
1 2𝜋 ,
1 2𝜋 ∫
the last integral does not exceed in absolute value
∫ |𝐺 (𝑥 − 𝜀𝑟/2) − 𝐺 (𝑥 − 𝜀𝑦)| d𝑦 = |𝑦 | ≤𝑟/2 𝜀𝑟
|𝐺 (𝑥 − 𝜀𝑟/2) − 𝐺 (𝑥 − 𝜀𝑟/2 + 𝑢)| d𝑢 ≤ 0
1 𝛿(𝜀𝑟). 2𝜋𝜀
Hence
1 𝛿(𝜀𝑟). 2𝜋𝜀 By a similar argument, using the monotonicity of the function 𝐹, we also have 𝐼0 (𝑥) ≥ (1 − 𝛾) 𝐴(𝑥 − 𝜀𝑟/2) −
𝐼0 (𝑥) ≤ (1 − 𝛾) 𝐴(𝑥 + 𝜀𝑟/2) +
1 𝛿(𝜀𝑟), 2𝜋𝜀
so that the two bounds yield |𝐼0 (𝑥)| ≥ (1 − 𝛾) max{𝐴(𝑥 − 𝜀𝑟/2), −𝐴(𝑥 + 𝜀𝑟/2)} −
1 𝛿(𝜀𝑟). 2𝜋𝜀
Taking here the supremum over all 𝑥 on the left and then on the right, we arrive at ∥𝐼0 ∥ ≥ (1 − 𝛾) ∥ 𝐴∥ −
1 𝛿(𝜀𝑟). 2𝜋𝜀
54
3 Characteristic Functions
But, by the triangle inequality, | 𝐴 𝜀 (𝑥)| ≥ |𝐼0 (𝑥)| − |𝐼1 (𝑥)| ≥ |𝐼0 (𝑥)| − 𝛾∥ 𝐴∥, so, ∥ 𝐴 𝜀 ∥ ≥ ∥𝐼0 ∥ − 𝛾∥ 𝐴∥ and thus ∥ 𝐴 𝜀 ∥ ≥ (1 − 2𝛾) ∥ 𝐴∥ −
1 𝛿(𝜀𝑟). 2𝜋𝜀
This is a smoothing inequality: In the case 𝛾 < 12 , it allows one to estimate the Kolmogorov distance ∥ 𝐴∥ = 𝜌(𝐹, 𝐺) between the original distributions in terms of the Kolmogorov distance ∥ 𝐴 𝜀 ∥ = 𝜌(𝐹 ∗ 𝐻 𝜀 , 𝐺 ∗ 𝐻 𝜀 ) between the smoothed distributions (at the expense of a potentially small error). Namely, returning to (3.3), the above inequality implies (1 − 2𝛾) ∥ 𝐴∥ −
1 1 𝛿(𝜀𝑟) ≤ 2𝜋𝜀 2𝜋
∫
𝑇
−𝑇
𝑓 (𝑡) − 𝑔(𝑡) d𝑡, 𝑡
or equivalently 1 ∥ 𝐴∥ ≤ 2𝜋 (1 − 2𝛾)
∫
𝑇
−𝑇
𝑓 (𝑡) − 𝑔(𝑡) 1 𝛿(𝜀𝑟) . d𝑡 + 𝑡 2𝜋 (1 − 2𝛾) 𝜀
(3.4)
For example, let us choose 𝑟 = 2𝜋, for which 𝛾 = 1 − 𝐻 ( [−𝜋, 𝜋]) ∫ ∞ ∫ ∞ ∫ sin2 𝑥 1 4 2 2 d𝑥 d𝑥 = 2 . = 𝑝(𝑥) d𝑥 = < 2 2 𝜋 𝜋 𝑥 𝑥 𝜋 𝜋/2 𝜋/2 | 𝑥 |> 𝜋 In this case, 𝜋 1 < = 0.8401... 2𝜋 (1 − 2𝛾) 2 (𝜋 2 − 8) Hence, (3.4) yields the desired inequality (3.1). Proposition 3.1.1 is proved.
□
3.2 Berry–Esseen-type Inequalities If 𝐺 is differentiable and has a bounded derivative on the real line, then (3.1) immediately yields a more familiar inequality due to Esseen (also called the Berry– Esseen inequality). Namely, from Proposition 3.1.1 we immediately get:
3.2 Berry–Esseen-type Inequalities
55
Proposition 3.2.1 Let 𝐹 be a non-decreasing bounded function, and let 𝐺 be a differentiable function of bounded variation such that 𝐹 (−∞) = 𝐺 (−∞) = 0. If |𝐺 ′ (𝑥)| ≤ 𝐴 for all 𝑥, then for any 𝑇 > 0, ∫
𝑇
𝜌(𝐹, 𝐺) ≤ −𝑇
Here, as before, ∫ 𝑓 (𝑡) =
2𝜋 2 𝐴 𝑓 (𝑡) − 𝑔(𝑡) . d𝑡 + 𝑡 𝑇
∞
∫ e𝑖𝑡 𝑥 d𝐹 (𝑥),
(3.5)
∞
e𝑖𝑡 𝑥 d𝐺 (𝑥)
𝑔(𝑡) =
−∞
(𝑡 ∈ R)
−∞
denote the Fourier–Stieltjes transforms of 𝐹 and 𝐺, respectively. Indeed, under the assumptions on 𝐺, for any 𝑥 ∈ R, ∫
2𝜋 𝑇
∫
2𝜋 𝑇
|𝐺 (𝑥 + 𝑢) − 𝐺 (𝑥)| d𝑢 ≤ 𝐴𝑇
𝑇 0
𝑢 d𝑢 = 0
2𝜋 2 𝐴 , 𝑇
so that the last term in (3.1) is bounded by the last term in (3.5). When 𝐺 is a distribution function (not necessarily differentiable), one may also relate the second integral in (3.1) to the concentration function defined by 𝑄 𝐺 (ℎ) = sup (𝐺 (𝑥 + ℎ) − 𝐺 (𝑥−)),
ℎ ≥ 0.
𝑥
Since it is non-decreasing in ℎ, (3.1) yields ∫
𝑇
𝜌(𝐹, 𝐺) ≤ −𝑇
𝑓 (𝑡) − 𝑔(𝑡) d𝑡 + 2𝜋 𝑄 𝐺 (2𝜋/𝑇), 𝑡
(3.6)
which is more general than (3.5), since it may involve discontinuous functions 𝐺. The last term in (3.6) may be further bounded in terms of the characteristic function 𝑔 of 𝐺 by virtue of the following general relation of independent interest. Lemma 3.2.2 Given a distribution function 𝐺 with characteristic function 𝑔, for any 𝜀 > 0, 96 2 ∫ 1/𝜀 𝜀 |𝑔(𝑡)| d𝑡. 𝑄 𝐺 (𝜀) ≤ 2 95 0 We also have 𝑄 𝐺 (2𝜋𝜀) ≤
𝜋2 𝜀 2
∫
1/𝜀
|𝑔(𝑡)| d𝑡. 0
Indeed, applying the last inequality with 𝜀 = 1/𝑇 in (3.6), we arrive at:
56
3 Characteristic Functions
Proposition 3.2.3 If 𝐹 and 𝐺 are distribution functions, then for any 𝑇 > 0, ∫
𝑇
𝜌(𝐹, 𝐺) ≤ −𝑇
∫ 𝑇 𝑓 (𝑡) − 𝑔(𝑡) 𝜋3 |𝑔(𝑡)| d𝑡. d𝑡 + 𝑡 𝑇 0
Proof (of Lemma 3.2.2) Let us start with an obvious general identity ∫ ∞ ∫ ∞ 𝑢(𝑥) ˆ d𝐺 (𝑥) = 𝑢(𝑡)𝑔(𝑡) d𝑡, −∞
(3.7)
−∞
which holds true for any integrable function 𝑢 on the real line with Fourier transform ∫ ∞ 𝑢(𝑥) ˆ = e𝑖𝑡 𝑥 𝑢(𝑡) d𝑡. −∞ 𝑥/2) 2 appears as the Fourier transform of Recall that the function 𝑝(𝑥) = sin(𝑥/2 𝑞(𝑡) = (1 − |𝑡|) + . One may therefore apply (3.7) to 𝑢(𝑡) = 𝑞(𝜀𝑡), in which case 𝑢(𝑥) ˆ = 1𝜀 𝑝( 𝑥𝜀 ). Since both sides of (3.7) are real numbers, we then get
∫
∞
∫
∞
𝑢(𝑡)𝑔(𝑡) d𝑡 = −∞
(1 − 𝜀|𝑡|) + 𝑔(𝑡) d𝑡 ≤
−∞
∫
1/𝜀
|𝑔(𝑡)| d𝑡. −1/𝜀
On the other hand, for any 0 < 𝜅 < 𝜋, the left integral in (3.7) is greater than or equal to ∫ ∫ sin(𝑥/2𝜀) 2 1 1 sin 𝜅 2 d𝐺 (𝑥) ≥ d𝐺 (𝑥) 𝜀 | 𝑥 | ≤2𝜅 𝜀 𝑥/2𝜀 𝜀 𝜅 | 𝑥 | ≤2𝜅 𝜀 1 sin 𝜅 2 𝐺 ([−2𝜅𝜀, 2𝜅𝜀]), = 𝜀 𝜅 where in the last step 𝐺 is treated as a measure, and where we used the property that sin 𝑥 𝑥 is decreasing in 0 ≤ 𝑥 ≤ 𝜋. Hence 𝐺 ( [−2𝜅𝜀, 2𝜅𝜀]) ≤
𝜅 2 ∫ 1/𝜀 𝜀 |𝑔(𝑡)| d𝑡. sin 𝜅 −1/𝜀
Applying it to the shifted distribution functions 𝐺 (𝑥 + 𝑎) with arbitrary 𝑎 ∈ R, the above inequality remains valid for any interval of length 4𝜅𝜀, so that 𝜅 2 ∫ 1/𝜀 𝜀 |𝑔(𝑡)| d𝑡, 𝑄 𝐺 (4𝜅𝜀) ≤ 2 sin 𝜅 0
0 < 𝜅 < 𝜋, 𝜀 > 0.
1 96 Choosing here 𝜅 = 14 and using (4 sin(1/4)) 2 < 95 , we obtain the first inequality of 𝜋 the lemma, and the choice 𝜅 = 2 leads to the second one. □
3.3 Lévy Distance and Zolotarev’s Inequality
57
3.3 Lévy Distance and Zolotarev’s Inequality Another important metric which metrizes the weak topology in the space of probability measures on the line (in contrast with the Kolmogorov metric) is given by the Lévy distance 𝐿(𝐹, 𝐺) = inf ℎ ≥ 0 : 𝐺 (𝑥 − ℎ) − ℎ ≤ 𝐹 (𝑥) ≤ 𝐺 (𝑥 + ℎ) + ℎ for all 𝑥 ∈ R , where 𝐹 and 𝐺 are arbitrary distribution functions. By continuity of 𝐹 and 𝐺 from the right, this infimum is always attained (so, it may be replaced with the minimum). In a more geometric language, 𝐿 describes the size of the side of the largest square enclosed between the graphs of the two distribution functions (on the plane R2 ). In general, 0 ≤ 𝐿 ≤ 1, and there is an elementary relation 𝐿 (𝐹, 𝐺) ≤ 𝜌(𝐹, 𝐺) connecting it with the Kolmogorov distance. Conversely, if 𝐺 has a bounded derivative satisfying |𝐺 ′ (𝑥)| ≤ 𝐴 for all 𝑥 ∈ R, we have an opposite bound 𝜌(𝐹, 𝐺) ≤ (1 + 𝐴) 𝐿 (𝐹, 𝐺). Therefore, when the constant 𝐴 is known to be not large, the problems of the estimation of the Kolmogorov and Lévy distances are in essence equivalent. Otherwise, the Berry–Esseen inequality (3.5) is not that satisfactory. As it turns out, when bounding the weaker distance 𝐿 (𝐹, 𝐺) in terms of the associated characteristic functions ∫ ∞ ∫ ∞ 𝑓 (𝑡) = e𝑖𝑡 𝑥 d𝐹 (𝑥), 𝑔(𝑡) = e𝑖𝑡 𝑥 d𝐺 (𝑥) (𝑡 ∈ R), −∞
−∞
any “smoothness”-type condition imposed on 𝐺 is irrelevant. This may be seen from the following statement due to Zolotarev (which we formulate and prove in a slightly different form). Proposition 3.3.1 Given distribution functions 𝐹 and 𝐺, for any 𝑇 > 0, 𝐿(𝐹, 𝐺) ≤
1 𝜋
∫
𝑇
−𝑇
𝑓 (𝑡) − 𝑔(𝑡) 4 log(1 + 𝑇) . d𝑡 + 𝑡 𝑇
(3.8)
Proof For 𝑥 ∈ R, ℎ ≥ 0, it is sufficient to estimate the quantity L (𝑥, ℎ) = max{𝐹 (𝑥 − ℎ) − 𝐺 (𝑥 + ℎ), 𝐺 (𝑥 − ℎ) − 𝐹 (𝑥 + ℎ)}, in view of the implication sup L (𝑥, ℎ) ≤ 𝑏 + 2ℎ =⇒ 𝐿 (𝐹, 𝐺) ≤ 𝑏 + 2ℎ
(𝑏 ≥ 0).
(3.9)
𝑥
With this aim, one may use smoothing with the help of the normal distribution functions ∫ 𝑥 2 2 1 e−𝑦 /2𝜀 d𝑦 (𝑥 ∈ R) Φ 𝜀 (𝑥) = √ 𝜀 2𝜋 −∞
58
3 Characteristic Functions
with a parameter 𝜀 > 0 which will be chosen as a function of 𝑇. Similarly to the proof of Proposition 3.1.1, for the convolutions 𝐹𝜀 = 𝐹 ∗ Φ 𝜀 and 𝐺 𝜀 = 𝐺 ∗ Φ 𝜀 , consider the deviations ∫ ∞ (𝐹 (𝑥 − 𝜀𝑦) − 𝐺 (𝑥 − 𝜀𝑦)) dΦ(𝑦), 𝐼 (𝑥) = 𝐹𝜀 (𝑥) − 𝐺 𝜀 (𝑥) = −∞
where Φ = Φ1 is the standard normal distribution function. Assuming that the integral in (3.8) is finite, from the Fourier inversion formula (3.2) applied with 𝐴 𝜀 = 𝐹𝜀 − 𝐺 𝜀 , we have ∫ ∞ 1 𝑓 (𝑡) − 𝑔(𝑡) −𝜀 2 𝑡 2 /2 |𝐼 (𝑥)| ≤ d𝑡. (3.10) e 2𝜋 −∞ 𝑡 Now, let us bound the integral 𝐼 (𝑥) from below by splitting the integration into two parts. Namely, given 𝑙 > 0, write 𝐼 (𝑥) = 𝐼0 (𝑥) + 𝐼1 (𝑥) with ∫ 𝐼0 (𝑥) = (𝐹 (𝑥 − 𝜀𝑦) − 𝐺 (𝑥 − 𝜀𝑦)) dΦ(𝑦), |𝑦 | ≤𝑙 ∫ 𝐼1 (𝑥) = (𝐹 (𝑥 − 𝜀𝑦) − 𝐺 (𝑥 − 𝜀𝑦)) dΦ(𝑦). |𝑦 | ≥𝑙
Clearly, |𝐼1 (𝑥)| ≤ 2 (1 − Φ(𝑙)) ≡ 𝛾.
(3.11)
On the other hand, using the monotonicity of both 𝐹 and 𝐺, 𝐼0 (𝑥) ≥ (𝐹 (𝑥 − 𝜀𝑙) − 𝐺 (𝑥 + 𝜀𝑙)) (1 − 𝛾), 𝐼0 (𝑥) ≤ (𝐹 (𝑥 + 𝜀𝑙) − 𝐺 (𝑥 − 𝜀𝑙)) (1 − 𝛾), which imply that |𝐼0 (𝑥)| ≥ (1 − 𝛾) L (𝑥, 𝜀𝑙). Since |𝐼0 (𝑥)| ≤ |𝐼 (𝑥)| + |𝐼1 (𝑥)|, (3.10)–(3.11) yield sup L (𝑥, 𝜀𝑙) ≤ 𝑥
where
∫ 𝐽 (𝜀) =
𝛾 1 𝐽 (𝜀) + , 2𝜋 (1 − 𝛾) 1−𝛾
(3.12)
∞
−∞
𝑓 (𝑡) − 𝑔(𝑡) −𝜀 2 𝑡 2 /2 d𝑡. e 𝑡
In order to estimate this integral, one may split the integration to the regions |𝑡| < 𝑇 and |𝑡| ≥ 𝑇 and make use of the elementary inequality ∫ ∞ 2 2 1 −𝜀 2 𝑡 2 /2 1 e d𝑡 < 2 2 e−𝜀 𝑇 /2 . 𝑡 𝜀 𝑇 𝑇
3.3 Lévy Distance and Zolotarev’s Inequality
59
Since | 𝑓 (𝑡) − 𝑔(𝑡)| ≤ 2, this gives ∫
𝑇
𝐽 (𝜀) ≤ −𝑇
𝑓 (𝑡) − 𝑔(𝑡) 2 2 4 d𝑡 + 2 2 e−𝜀 𝑇 /2 , 𝑡 𝜀 𝑇
so that, by (3.12), 1 sup L (𝑥, 𝜀𝑙) ≤ 2𝜋(1 − 𝛾) 𝑥
∫
𝑇
−𝑇
𝑓 (𝑡) − 𝑔(𝑡) 2 2 2 𝛾 e−𝜀 𝑇 /2 + . d𝑡 + 𝑡 1−𝛾 𝜋(1 − 𝛾) 𝜀 2𝑇 2
Thus, according to the remark (3.9) with ℎ = 𝜀𝑙, if we show that 2 2 2 𝛾 e−𝜀 𝑇 /2 + ≤ 2𝜀𝑙, 2 2 1−𝛾 𝜋(1 − 𝛾) 𝜀 𝑇
(3.13)
then we will get 1 𝐿(𝐹, 𝐺) ≤ 2𝜋(1 − 𝛾) Put 𝜀=
∫
𝑇
−𝑇
1 √︁ 2 log(1 + 𝑇), 𝑇
𝑓 (𝑡) − 𝑔(𝑡) d𝑡 + 2𝜀𝑙. 𝑡
𝑙=
(3.14)
√︁ 2 log(1 + 𝑇). 2
With this choice, using another well-known estimate 1 − Φ(𝑥) ≤ 12 e−𝑥 /2 (𝑥 ≥ 0), we have 𝛾 1 1 , ≤ . 𝛾 = 2 (1 − Φ(𝑙)) ≤ 1+𝑇 1−𝛾 𝑇 √ √ 1 If 𝑇 ≥ e − 1, then 𝑙 ≥ 2, 𝛾 ≤ 2 (1 − Φ( 2)) < 0.16, 𝜋 (1−𝛾) < 0.4, hence the left-hand side of (3.13) does not exceed 0.4 e/(e − 1) 1 1 + + < 𝜋(1 − 𝛾) (1 + 𝑇) log(1 + 𝑇) 𝑇 (1 + 𝑇) log(1 + 𝑇) 1+𝑇 2 log(1 + 𝑇) < = 𝜀𝑙. 1+𝑇 Therefore, the condition (3.13) is fulfilled, and we obtain (3.14), that is, ∫ 𝑇 4 log(1 + 𝑇) 1 𝑓 (𝑡) − 𝑔(𝑡) . 𝐿 (𝐹, 𝐺) ≤ d𝑡 + 2𝜋(1 − 𝛾) −𝑇 𝑡 𝑇 This inequality also holds for 𝑇 < e − 1, since 𝐿 ≤ 1, while the last term is larger 4 > 1. As a result, we obtain (3.8) for all 𝑇 > 0. Proposition 3.3.1 is proved.□ than e−1
60
3 Characteristic Functions
3.4 Lower Bounds for the Kolmogorov Distance In order to obtain lower bounds for the Kolmogorov distance, characteristic functions may be used as well. In particular, we have: Proposition 3.4.1 Let 𝐹 and 𝐺 be functions of bounded variation such that 𝐹 (−∞) = 𝐺 (−∞) and 𝐹 (∞) = 𝐺 (∞), with Fourier–Stieltjes transforms 𝑓 and 𝑔, respectively. Then ∫ 2 1 ∞ 𝜌(𝐹, 𝐺) ≥ √ ( 𝑓 (𝑡) − 𝑔(𝑡)) e−𝑡 /2 d𝑡 . (3.15) 2 2𝜋 −∞ Moreover, for any 𝑇 > 0, ∫ 1 𝑇 𝑡 ( 𝑓 (𝑡) − 𝑔(𝑡)) 1 − d𝑡 . (3.16) 𝜌(𝐹, 𝐺) ≥ 3𝑇 0 𝑇 Proof These bounds are consequences of a more general estimate ∫ ∞ ∫ ∞ ( 𝑓 (𝑡) − 𝑔(𝑡)) 𝑤(𝑡) d𝑡 ≤ sup |𝐹 (𝑥) − 𝐺 (𝑥)| |F (𝑡𝑤(𝑡))| d𝑥, −∞
(3.17)
−∞
𝑥
which holds true whenever the function (1 + |𝑡|)𝑤(𝑡) is integrable. Here ∫ ∞ (F 𝑣(𝑡)) (𝑥) = e𝑖𝑡 𝑥 𝑣(𝑡) d𝑡, 𝑥 ∈ R, −∞
denotes the Fourier transform of 𝑣. To derive (3.17), let 𝐴 : R → R be a function of bounded variation such that 𝐴(−∞) = 𝐴(∞) = 0, with Fourier–Stieltjes transform 𝑎(𝑡). For points 𝑇 > 0 of continuity of 𝐴, define the truncated Fourier–Stieltjes transform ∫
∫
𝑇
𝑇
e𝑖𝑡 𝑥 d𝐴(𝑥) = 𝜀𝑇 (𝑡) − 𝑖𝑡
𝑎𝑇 (𝑡) = −𝑇
e𝑖𝑡 𝑥 𝐴(𝑥) d𝑥,
𝑡 ∈ R,
−𝑇
where we integrated by parts and put 𝜀𝑇 (𝑡) = e𝑖𝑡𝑇 𝐴(𝑇) − e−𝑖𝑡𝑇 𝐴(−𝑇). By Fubini’s theorem, ∫ ∞ ∫ 𝑇 ∫ 𝑇 𝑖𝑡 𝑥 𝐴(𝑥) (F 𝑡𝑤(𝑡)) (𝑥) d𝑥 = 𝐴(𝑥) e 𝑡𝑤(𝑡) d𝑡 d𝑥 −𝑇 −𝑇 −∞ ∫ 𝑇 ∫ ∞ = 𝑡𝑤(𝑡) e𝑖𝑡 𝑥 𝐴(𝑥) d𝑥 d𝑡 −∞ −𝑇 ∫ ∞ =𝑖 (𝜀𝑇 (𝑡) − 𝑎𝑇 (𝑡)) 𝑤(𝑡) d𝑡. −∞
From this it follows immediately that ∫ ∞ ∫ ∫ ∞ 𝑎𝑇 (𝑡) 𝑤(𝑡) d𝑡 ≤ ∥ 𝐴∥ |F (𝑡𝑤(𝑡))| d𝑥 + −∞
−∞
∞
−∞
|𝜀𝑇 (𝑡)| |𝑤(𝑡)| d𝑡,
(3.18)
3.4 Lower Bounds for the Kolmogorov Distance
61
where ∥ 𝐴∥ = sup 𝑥 | 𝐴(𝑥)|. By the definition, 𝜀𝑇 (𝑡) → 0 as 𝑇 → ∞ uniformly over all 𝑡 ∈ R. In addition, 𝑎𝑇 (𝑡) → 𝑎(𝑡) and, using the total variation norm of 𝐴, |𝑎𝑇 (𝑡)| ≤ ∥ 𝐴∥ TV
for all 𝑡 ∈ R.
Hence, letting 𝑇 → ∞ along the points of continuity of 𝐴 and applying the Lebesgue dominated convergence theorem, from (3.18) we obtain that ∫ ∞ ∫ ∞ |F (𝑡𝑤(𝑡))| d𝑥. 𝑎(𝑡) 𝑤(𝑡) d𝑡 ≤ ∥ 𝐴∥ −∞
−∞
The latter becomes the desired bound (3.17) by putting 𝐴 = 𝐹 − 𝐺 and 𝑎 = 𝑓 − 𝑔. 2 For example, the particular choice 𝑤(𝑡) = e−𝑡 /2 in (3.17) leads us to the first lower bound (3.15) of the proposition. To deduce the second bound, we choose the function 𝑤𝑇 (𝑡) = (1 − 𝑡/𝑇) 1 (0,𝑇) (𝑡). First let 𝑇 = 1 and write 𝑤 = 𝑤1 . Then, for all 𝑥 ≠ 0, for 𝑣(𝑡) = 𝑡𝑤(𝑡), we have ∫
1
e𝑖𝑡 𝑥 𝑡 (1 − 𝑡) d𝑡 =
F 𝑣(𝑥) = 0
where 𝑞(𝑥) =
−𝑥e𝑖 𝑥
−e𝑖 𝑥 − 1 2(e𝑖𝑥 − 1) 𝑞(𝑥) + = 3 , 𝑥2 𝑖𝑥 3 𝑥
− 2𝑖 e𝑖 𝑥 − 𝑥 + 2𝑖. Clearly, |𝑞(𝑥)| ≤ 2|𝑥| + 4. On the other hand, ∫ |F 𝑣(𝑥)| ≤
1
𝑡 (1 − 𝑡) d𝑡 = 0
1 , 6
and therefore ∫
∞
∫
4
∫
∞
|F 𝑣(𝑥)| d𝑥 = 2
|F 𝑣(𝑥)| d𝑥 + 2 |F 𝑣(𝑥)| d𝑥 0 4 ∫ ∞ 2𝑥 + 4 31 4 d𝑥 = < 3. ≤ +2 3 3 12 𝑥 4
−∞
In the general case of 𝑇 > 0, for 𝑣𝑇 (𝑡) = 𝑡𝑤𝑇 (𝑡) = 𝑡𝑤(𝑡/𝑇), we have F 𝑣𝑇 (𝑥) = 𝑇 2 · (F 𝑣) (𝑇𝑥), so that ∫ ∞ ∫ ∞ |F 𝑣𝑇 (𝑥)| d𝑥 = 𝑇 |F 𝑣(𝑥)| d𝑥 < 3𝑇 . −∞
Thus Proposition 3.4.1 is proved.
−∞
□
Let also note that the choice 𝑤(𝑡) = (1 − |𝑡|/𝑇) + in (3.17) would lead us to a similar bound ∫ 1 𝑇 |𝑡| ( 𝑓 (𝑡) d𝑡 𝜌(𝐹, 𝐺) ≥ − 𝑔(𝑡)) 1 − . 3.5 −𝑇 𝑇
62
3 Characteristic Functions
3.5 Remarks In a slightly different form, with an arbitrary factor 𝑏 > 21𝜋 in front of the first integral and with corresponding modifications on the right, the inequality (3.5) is due to Esseen [86], who also required the unnecessary assumption 𝐹 (∞) = 𝐺 (∞). See also Petrov [155]. The more general relation (3.1), with implicit universal factors, is due to Fainleib [88], who additionally assumed that both 𝐹 and 𝐺 are distribution functions. There are other related forms of Proposition 3.2.3 such as 1 𝜌(𝐹, 𝐺) ≤ 2𝜋
∫
𝑇
−𝑇
∫ 𝑇 𝑓 (𝑡) − 𝑔(𝑡) 1 (| 𝑓 (𝑡)| + |𝑔(𝑡)|) d𝑡, d𝑡 + 𝑡 𝑇 −𝑇
where 𝑇 > 0, and 𝑓 and 𝑔 are characteristic functions associated to given distribution functions 𝐹 and 𝐺. This bound immediately follows from the representation ∫ 𝑖 1 d𝑡 lim +𝑅 𝐹 (𝑥) = + e−𝑖𝑡 𝑥 𝑓 (𝑡) 2 2𝜋 𝜀↓0 𝜀< |𝑡 | 0), then | 𝑓 (𝑡)| ≥ 𝜎|𝑡| ≤ 1, and in this interval |𝜀(𝑡)| ≤
(4.2) 1 2
3𝑝 𝛽 𝑝 |𝑡| 𝑝 . 𝑝
whenever
(4.3)
The argument is based on the generalized chain rule: If a complex-valued function 𝑦 = 𝑦(𝑡) is defined and has 𝑝 derivatives in some open interval of the real line, and 𝑧 = 𝑧(𝑦) is analytic in the region containing all values of 𝑦, then 𝑘 𝑝 Ö ∑︁ d𝑚 𝑧(𝑦) 1 1 d𝑟 𝑦(𝑡) 𝑟 d𝑝 𝑧(𝑦(𝑡)) , = 𝑝! d𝑡 𝑝 d𝑦 𝑚 𝑦=𝑦 (𝑡) 𝑟=1 𝑘 𝑟 ! 𝑟! d𝑡 𝑟
(4.4)
where 𝑚 = 𝑘 1 + · · · + 𝑘 𝑝 and where the summation runs over all tuples (𝑘 1 , . . . , 𝑘 𝑝 ) of non-negative integers such that 𝑘 1 + 2𝑘 2 + · · · + 𝑝𝑘 𝑝 = 𝑝. This identity has a number of interesting particular cases such as the following ones. Lemma 4.1.2 With summation as before, for any 𝜆 ∈ R and any integer 𝑝 ≥ 1, ∑︁
𝑝 Ö (1 + 𝜆) 𝑝 − 1 1 𝑘𝑟 𝜆 = , (𝑚 − 1)! 𝑘 ! 𝑝 𝑟=1 𝑟 𝑝 ∑︁ Ö 1 𝜆𝑟 𝑘𝑟 = 𝜆𝑝. 𝑘 ! 𝑟 𝑟 𝑟=1
Proof Applying (4.4) with 𝑧(𝑦) = − log(1 − 𝑦), this identity becomes −
𝑝 ∑︁ (𝑚 − 1)! Ö d𝑝 1 1 d𝑟 𝑦(𝑡) 𝑘𝑟 log(1 − 𝑦(𝑡)) . = 𝑝! d𝑡 𝑝 (1 − 𝑦(𝑡)) 𝑚 𝑟=1 𝑘 𝑟 ! 𝑟! d𝑡 𝑟
Let us choose here 𝜆𝑡 = −𝜆 + 𝜆(1 − 𝑡) −1 1−𝑡 with sufficiently small values of 𝑡, so that |𝑦(𝑡)| < 1. In this case, 𝑦(𝑡) =
d𝑟 𝑦(𝑡) = 𝑟! 𝜆(1 − 𝑡) −(𝑟+1) , d𝑡 𝑟 while
(4.5)
4.1 Cumulants
−
65
i d𝑝 d𝑝 h log(1 − 𝑦(𝑡)) log(1 − 𝑡) = − log(1 − (1 + 𝜆)𝑡) d𝑡 𝑝 d𝑡 𝑝 ( 𝑝 − 1)! ( 𝑝 − 1)! + (1 + 𝜆) 𝑝 =− . (1 − 𝑡) 𝑝 (1 − (1 + 𝜆)𝑡) 𝑝
Therefore, (4.5) yields ( 𝑝 − 1)!
h
𝑝 i ∑︁ Ö (𝑚 − 1)! 𝜆 𝑚 (1 + 𝜆) 𝑝 1 1 = 𝑝! . − 𝑝 𝑝 𝑚 𝑝+𝑚 (1 − (1 + 𝜆)𝑡) (1 − 𝑡) (1 − 𝑦(𝑡)) (1 − 𝑡) 𝑘 ! 𝑟=1 𝑟
Putting 𝑡 = 0, we obtain the first identity of the lemma. Finally, let us apply (4.4) with 𝑧(𝑦) = e 𝑦 , when this identity becomes 𝑝
∑︁ Ö 1 1 d𝑟 𝑦(𝑡) 𝑘𝑟 d 𝑝 𝑦 (𝑡) 𝑦 (𝑡) e = 𝑝! . e d𝑡 𝑝 𝑘 ! 𝑟! d𝑡 𝑟 𝑟=1 𝑟
(4.6)
It remains to choose here 𝑦(𝑡) = − log(1 − 𝜆𝑡), so that d𝑟 𝑦(𝑡) = 𝜆𝑟 (𝑟 − 1)! (1 − 𝜆𝑡) −𝑟 , d𝑡 𝑟 while
d𝑝 d 𝑝 𝑦 (𝑡) e = 𝑝 (1 − 𝜆𝑡) −1 = 𝑝! 𝜆 𝑝 (1 − 𝜆𝑡) −( 𝑝+1) . 𝑝 d𝑡 d𝑡 Hence, at the point 𝑡 = 0 the equality (4.6) yields the second identity of the lemma.□ Proof (of Proposition 4.1.1) Given |𝑡| ≤ 𝑡 0 , it is sufficient to estimate the 𝑝-th derivative of log 𝑓 (𝑠) in absolute value uniformly in the interval |𝑠| ≤ |𝑡|. According to the chain rule (4.4) with 𝑧(𝑦) = log 𝑦 and 𝑦 = 𝑓 (𝑠), (log 𝑓 (𝑠)) ( 𝑝) = 𝑝!
𝑝 ∑︁ (−1) 𝑚−1 (𝑚 − 1)! Ö 1 1 (𝑟) 𝑘𝑟 𝑓 (𝑠) , 𝑓 (𝑠) 𝑚 𝑘 ! 𝑟! 𝑟=1 𝑟
(4.7)
where 𝑚 = 𝑘 1 + · · · + 𝑘 𝑠 and the summation runs over all non-negative integers 𝑝 𝑘 1 , . . . , 𝑘 𝑝 such that 𝑘 1 + 2𝑘 2 + · · · + 𝑝𝑘 𝑝 = 𝑝. Using | 𝑓 (𝑟) (𝑠)| ≤ 𝛽𝑟 ≤ 𝛽𝑟/ 𝑝 together with | 𝑓 (𝑠)| ≥ 𝜆1 , Lemma 4.1.2 gives 𝑝 ∑︁ Ö (1 + 𝜆) 𝑝 − 1 1 𝜆 𝑘𝑟 (log 𝑓 (𝑠)) ( 𝑝) ≤ 𝑝! 𝛽 𝑝 ≤ 𝑝! 𝛽 𝑝 . (𝑚 − 1)! 𝑘 ! 𝑟! 𝑝 𝑟=1 𝑟
Hence, by the integral Taylor formula, |𝑡| 𝑝 (1 + 𝜆) 𝑝 − 1 𝑝 ≤ 𝛽𝑝 |𝑡| , |𝜀(𝑡)| ≤ sup (log 𝑓 (𝑠)) ( 𝑝) 𝑝! 𝑝 |𝑠 | ≤ |𝑡 | as required.
66
4 Sums of Independent Random Variables
In the second assertion, we have 𝑓 ′ (0) = 0, | 𝑓 ′′ (𝑡)| ≤ 𝜎 2 . Hence, by the Taylor formula, 𝑡2 𝜎2𝑡 2 1 |1 − 𝑓 (𝑡)| ≤ sup | 𝑓 ′′ (𝑠)| ≤ ≤ 2 2 2 |𝑧 | ≤ |𝑡 | for 𝜎|𝑡| ≤ 1, so that | 𝑓 (𝑡)| ≥ 12 . Proposition 4.1.1 is proved.
□
Each cumulant 𝛾 𝑝 = 𝛾 𝑝 (𝑋) of a random variable 𝑋 is determined by the first moments 𝛼𝑟 = E𝑋 𝑟 , 𝑟 = 1, . . . , 𝑝. More precisely, the chain rule formula (4.7) at 𝑠 = 0 gives the identity 𝛾 𝑝 = 𝑝!
∑︁
(−1) 𝑚−1 (𝑚 − 1)!
𝑝 Ö 1 𝛼𝑟 𝑘𝑟 𝑘 ! 𝑟! 𝑟=1 𝑟
(4.8)
with summation as in (4.4). For example, 𝛾1 = 𝛼1 , 𝛾2 = 𝛼2 − 𝛼12 . Moreover, if 𝛼1 = E𝑋 = 0, 𝜎 2 = E𝑋 2 , then 𝛾1 = 𝛼1 ,
𝛾2 = 𝛼2 = 𝜎 2 ,
𝛾3 = 𝛼3 ,
𝛾4 = 𝛼4 − 3𝛼22 = 𝛽4 − 3𝜎 4 .
The reverse statement is also true. Namely, applying the chain rule formula (4.6) with 𝑦(𝑡) = log 𝑓 (𝑡) at 𝑡 = 0, we arrive at 𝛼 𝑝 = 𝑝!
𝑝 ∑︁ Ö 1 𝛾𝑟 𝑘𝑟 . 𝑘 ! 𝑟! 𝑟=1 𝑟
(4.9)
Using (4.1)–(4.2), the cumulants can be bounded in terms of absolute moments. Indeed, applying (4.2) at 𝑡 = 0 and letting 𝑡 0 → 0, 𝜆 → 1, we get |𝛾 𝑝 | = 𝑝! lim 𝑡→0
|𝜀(𝑡)| ≤ (2 𝑝 − 1) ( 𝑝 − 1)! 𝛽 𝑝 . |𝑡| 𝑝
If 𝑋 has mean zero, the constant on the right-hand side can be improved, as the following statement shows. We follow a simple argument due to Bikjalis [17]. Proposition 4.1.3 If 𝛽 𝑝 < ∞ for some integer 𝑝 ≥ 1, and E𝑋 = 0, then |𝛾 𝑝 | ≤ ( 𝑝 − 1)! 𝛽 𝑝 . Proof Since 𝛾1 = 0 and 𝛾2 = 𝛼2 = 𝛽2 when E𝑋 = 0, the desired bound is immediate for 𝑝 ≤ 2. So, let 𝑝 ≥ 3. Differentiating the identity 𝑓 ′ (𝑡) = 𝑓 (𝑡) (log 𝑓 (𝑡)) ′ near zero 𝑝 − 1 times in accordance with the binomial formula, one gets 𝑝−1
∑︁ d 𝑝−1−𝑟 d𝑟+1 d𝑝 𝑟 𝑓 (𝑡) = 𝐶 𝑓 (𝑡) log 𝑓 (𝑡), 𝑝−1 d𝑡 𝑝 d𝑡 𝑝−1−𝑟 d𝑡 𝑟+1 𝑟=0 where we use the notation 𝐶𝑛𝑘 =
𝑛! 𝑘!(𝑛−𝑘)!
for the binomial coefficients. Equivalently,
4.2 Lyapunov Coefficients
67
𝑝−2 d𝑝 1 d𝑝 d𝑟+1 1 ∑︁ 𝑟 d 𝑝−1−𝑟 log 𝑓 (𝑡) = 𝑓 (𝑡) 𝐶 𝑓 (𝑡) log 𝑓 (𝑡). − d𝑡 𝑝 𝑓 (𝑡) d𝑡 𝑝 𝑓 (𝑡) 𝑟=0 𝑝−1 d𝑡 𝑝−1−𝑟 d𝑡 𝑟+1
At 𝑡 = 0, using 𝛼1 = 0, this identity becomes 𝛾𝑝 = 𝛼𝑝 −
𝑝−3 ∑︁
𝐶 𝑟𝑝−1 𝛼 𝑝−1−𝑟 𝛾𝑟+1 .
𝑟=0
One can now proceed by induction on 𝑝. Since |𝛼 𝑝−1−𝑟 | ≤ 𝛽 (𝑝𝑝−1−𝑟)/ 𝑝 and 𝑝 𝛾𝑟+1 ≤ 𝑟!𝛽 (𝑟+1)/ (the induction hypothesis), we obtain that 𝑝 |𝛾 𝑝 | ≤ 𝛽 𝑝 + 𝛽 𝑝
𝑝−3 ∑︁
𝑝−3
𝐶 𝑟𝑝−1 𝑟!
= ( 𝑝 − 1)! 𝛽 𝑝
𝑟=0
h
i ∑︁ 1 1 + . ( 𝑝 − 1)! 𝑟=0 ( 𝑝 − 1 − 𝑟)!
1 1 The expression in the square brackets ( 𝑝−1)! + ( 2!1 + · · · + ( 𝑝−1)! ) is equal to 1 for 1 𝑝 = 3 and is smaller than 6 + (e − 2) < 1 for 𝑝 ≥ 4, thus proving Proposition 4.1.3.□
4.2 Lyapunov Coefficients Let 𝑋1 , . . . , 𝑋𝑛 be independent random variables on a probability Í space (Ω, F , P) with mean zero and variances 𝑏 2𝑘 = Var(𝑋 𝑘 ) (𝑏 𝑘 ≥ 0) such that 𝑛𝑘=1 𝑏 2𝑘 = 1. Many important properties of the distributions of the sums 𝑆 𝑛 = 𝑋1 + · · · + 𝑋𝑛 rely upon the behavior of moment functionals such as 𝛼𝑝 =
𝑛 ∑︁
E𝑋 𝑘𝑝 ,
𝛽 𝑝,𝑘 = E |𝑋 𝑘 | 𝑝 ,
𝑘=1
𝐿𝑝 =
𝑛 ∑︁
𝛽 𝑝,𝑘 ( 𝑝 ≥ 2 integer).
𝑘=1
The quantity 𝐿 𝑝 is called the Lyapunov ratio or the Lyapunov coefficient of order 𝑝. It is finite if and only if all the 𝑋 𝑘 ’s have finite absolute moments of order 𝑝, in which case 𝛼 𝑝 is well-defined and satisfies |𝛼 𝑝 | ≤ 𝐿 𝑝 . Let us mention a few general properties of these quantities. 1
Proposition 4.2.1 The function 𝑝 → 𝐿 𝑝𝑝−2 is non-decreasing, that is, 𝑞−2
𝐿 𝑞 ≤ 𝐿 𝑝𝑝−2 ,
2 ≤ 𝑞 ≤ 𝑝.
In addition, 𝐿 𝑝−𝑞 𝐿 𝑞 ≤ 𝐿 𝑝−2 ,
2 ≤ 𝑞 ≤ 𝑝 − 2.
68
4 Sums of Independent Random Variables
Proof Let 𝐹𝑘 denote the distribution of 𝑋 𝑘 . Since 𝐿 2 = 1, the equality d𝜇(𝑥) = Í 𝑛 2 𝑘=1 𝑥 d𝐹𝑘 (𝑥) defines a (Borel) probability measure on the real line. Moreover, 𝐿𝑝 =
𝑛 ∫ ∑︁ 𝑘=1
∞
∫
∞
|𝑥| 𝑝−2 d𝜇(𝑥) = E |𝜉 | 𝑝−2 ,
|𝑥| 𝑝 d𝐹𝑘 (𝑥) =
−∞
−∞ 1
1
where 𝜉 is a random variable distributed according to 𝜇. Hence 𝐿 𝑝𝑝−2 = E |𝜉 | 𝑝−2 ) 𝑝−2 is non-decreasing on the axis 𝑝 > 2 (not necessarily for integer values). In the second assertion, we may assume that 2 < 𝑞 < 𝑝 − 2 (real), so that 𝑝 > 4. Let 𝜂 be an independent copy of 𝜉. Using Young’s inequality 𝑥𝑦 ≤ 𝑢1 𝑥 𝑢 + 1𝑣 𝑦 𝑣 (𝑥, 𝑦 ≥ 0, 𝑢, 𝑣 ≥ 1 such that 𝑢1 + 1𝑣 = 1), we get 𝐿 𝑝−𝑞 𝐿 𝑞 = E |𝜉 | 𝑝−𝑞−2 |𝜂| 𝑞−2 𝑝 − 𝑞 − 2 𝑝−4 𝑞 − 2 𝑝−4 ≤E |𝜉 | + |𝜂| = 𝐿 𝑝−2 . 𝑝−4 𝑝−4 The proposition is proved.
□
Proposition 4.2.2 We have 𝐿 𝑝 ≥ 𝑛−
𝑝−2 𝑝
. In particular, 𝐿 4 ≥ 𝑛−1 , 𝐿 5 ≥ 𝑛−3/2 .
Proof Let 𝑝 > 2. By Hölder’s inequality with exponents 𝑢 = 1=
𝑛 ∑︁
𝑏 2𝑘 ≤ 𝑛1/𝑢
𝑘=1
𝑛 ∑︁
𝑏 𝑘𝑝
1/𝑣
≤ 𝑛1/𝑢
𝑛 ∑︁
𝑘=1
1/𝑣 𝛽 𝑝,𝑘
𝑝 𝑝−2
and 𝑣 =
𝑝 2,
= 𝑛1/𝑢 𝐿 1/𝑣 𝑝 .
𝑘=1
Hence 𝐿 𝑝 ≥ 𝑛−𝑣/𝑢 = 𝑛−( 𝑝−2)/ 𝑝 , thus proving the claim.
□
On the other hand, if 𝐿 𝑝 is small, then the variances 𝑏 2𝑘 of 𝑋 𝑘 have to be small as well, uniformly over all 𝑘. 𝑝 Proposition 4.2.3 For any 𝑝 > 2, we have max 𝑘 ≤𝑛 𝑏 𝑘 ≤ 𝐿 1/ 𝑝 .
Indeed, max 𝑏 𝑘 ≤
∑︁ 𝑛
𝑘
𝑏 𝑘𝑝
𝑘=1
1/ 𝑝 ≤
∑︁ 𝑛
1/ 𝑝 𝛽 𝑝,𝑘
𝑝 = 𝐿 1/ 𝑝 .
𝑘=1
Once the Lyapunov coefficient 𝐿 𝑝 is finite, one may introduce the cumulants 𝛾 𝑝,𝑘 = 𝛾 𝑝 (𝑋 𝑘 ) =
d𝑝 log 𝑣 𝑘 (𝑡) 𝑡=0 , 𝑝 𝑝 𝑖 d𝑡
𝑘 = 1, . . . , 𝑛,
where 𝑣 𝑘 = E e𝑖𝑡 𝑋𝑘 denote the characteristic functions of 𝑋 𝑘 . Since the characteristic function of 𝑆 𝑛 is described as the product 𝑓𝑛 (𝑡) = E e𝑖𝑡𝑆𝑛 = 𝑣1 (𝑡) . . . 𝑣𝑛 (𝑡), the 𝑝-th cumulant of 𝑆 𝑛 is given by
4.3 Rosenthal-type Inequalities
69
𝛾 𝑝 = 𝛾 𝑝 (𝑆 𝑛 ) =
𝑛 ∑︁ d𝑝 log 𝑓 (𝑡) = 𝛾 𝑝,𝑘 . 𝑛 𝑡=0 𝑖 𝑝 d𝑡 𝑝 𝑘=1
The first values are 𝛾1 = 0, 𝛾2 = 1. Applying the inequality of Proposition 4.1.3, we immediately obtain a similar relation between the Lyapunov coefficients and the cumulants of the sums. Proposition 4.2.4 For all 𝑝 ≥ 2, the cumulants 𝛾 𝑝 = 𝛾 𝑝 (𝑆 𝑛 ) satisfy |𝛾 𝑝 | ≤ ( 𝑝 − 1)! 𝐿 𝑝 .
4.3 Rosenthal-type Inequalities The Lyapunov coefficients 𝐿𝑝 =
𝑛 ∑︁
E |𝑋 𝑘 | 𝑝
𝑘=1
may also be used to bound absolute moments of the sums 𝑆 𝑛 = 𝑋1 + · · · + 𝑋𝑛 . In particular, we have the following observation due to Rosenthal [162]. As before, the random variables 𝑋1 , . . . , 𝑋𝑛 are assumed to be independent with mean zero. Proposition 4.3.1 If E |𝑆 𝑛 | 2 = 1, then for any real number 𝑝 ≥ 2, E |𝑆 𝑛 | 𝑝 ≤ 𝐶 𝑝 max{𝐿 𝑝 , 1}
(4.10)
for some constant 𝐶 𝑝 depending on 𝑝 only. For even integers 𝑝 ≥ 4, we have the following simple argument. Applying the expression (4.9) to 𝑆 𝑛 and putting 𝑟 ∗ = max(𝑟, 2), Proposition 4.2.4 gives E |𝑆 𝑛 | 𝑝 = 𝛼 𝑝 (𝑆 𝑛 ) = 𝑝! ≤ 𝑝!
𝑝 ∑︁ Ö 1 𝛾𝑟 (𝑆 𝑛 ) 𝑘𝑟 𝑘 ! 𝑟! 𝑟=1 𝑟 𝑝 ∑︁ Ö 1 𝐿 𝑟 ∗ 𝑘𝑟 , 𝑘 ! 𝑟 𝑟=1 𝑟
(4.11)
where the summation is performed over all non-negative integers 𝑘 1 , . . . , 𝑘 𝑝 such that 𝑘 1 + 2𝑘 2 + · · · + 𝑝𝑘 𝑝 = 𝑝. By Proposition 4.2.1, 𝑟−2
𝑟
𝐿 𝑟 ≤ 𝐿 𝑝𝑝−2 ≤ (max(𝐿 𝑝 , 1)) 𝑝 ,
2 ≤ 𝑟 ≤ 𝑝.
Hence, using the second identity of Lemma 4.1.2, the second sum in (4.11) does not exceed 𝑘 1/ 𝑝 𝑝 ∑︁ Ö 1 (max(𝐿 𝑝 , 1)) 𝑟 𝑟 = max(𝐿 𝑝 , 1). 𝑘 ! 𝑟 𝑟=1 𝑟
70
4 Sums of Independent Random Variables
Thus,(4.10) holds true with 𝐶 𝑝 ≤ 𝑝! for 𝑝 = 2, 4, 6, 8, . . . Let us also describe an alternative induction argument restricting ourselves to integer values 𝑝 ≥ 3 (although the argument works for real 𝑝 > 2 as well). We consider the following Rosenthal-type inequality E |𝑆 𝑛 | 𝑝 ≤
𝑝 ∑︁
𝑝−𝑚
𝐴𝑚 ( 𝑝) 𝐿 𝑚 𝐿 2 2 ,
(4.12)
𝑚=2
where the non-negative coefficients 𝐴𝑚 ( 𝑝) will be determined by the induction. In fact, it makes sense to choose 𝐴 𝑝 ( 𝑝) = 1 and 𝐴 𝑝−1 ( 𝑝) = 𝑝. Then (4.12) holds automatically for 𝑛 = 1. Note that this inequality is homogeneous with respect to 𝑋 𝑘 , so, the condition 𝐿 2 = E |𝑆 𝑛 | 2 = 1 is irrelevant. To carry out the induction step, assume that 𝑛 ≥ 2 and 𝐿 2 = 1. By the integral Taylor formula, for all 𝑥, ℎ ∈ R, if 𝑝 is odd, we have 𝑝
|𝑥 + ℎ| =
𝑝−1 ∑︁
∫ 𝐶 𝑙𝑝
sign(𝑥) 𝑥
𝑝−𝑙 𝑙
ℎ + 𝑝ℎ
𝑝
1
sign(𝑥 + 𝑡ℎ) (1 − 𝑡) 𝑝−1 d𝑡. 0
𝑙=0
A similar representation without the sign function also holds when 𝑝 is even. We apply it with independent summands 𝑥 = 𝑆 𝑛,𝑘 = 𝑆 𝑛 − 𝑋 𝑘 and ℎ = 𝑋 𝑘 , 1 ≤ 𝑘 ≤ 𝑛. Taking expectations of both sides and noting that the term corresponding to 𝑙 = 1 is vanishing (due to E𝑋 𝑘 = 0), we get ∑︁ E |𝑆 𝑛 | 𝑝 ≤ E |𝑆 𝑛,𝑘 | 𝑝 + 𝐶 𝑙𝑝 E |𝑆 𝑛,𝑘 | 𝑝−𝑙 E |𝑋 𝑘 | 𝑙 . 2≤𝑙 ≤ 𝑝
Here, for 𝑙 = 𝑝−1, one may use E |𝑆 𝑛,𝑘 | ≤ (E |𝑆 𝑛,𝑘 | 2 ) 1/2 ≤ 1, while for 2 ≤ 𝑙 ≤ 𝑝−2, from the induction hypothesis (4.12) it follows that E |𝑆 𝑛,𝑘 | 𝑝−𝑙 ≤
𝑝−𝑙 ∑︁
𝐴𝑞 ( 𝑝 − 𝑙) 𝐿 𝑞 .
𝑞=2
Plugging this result into the previous bound and summing over all 𝑘 ≤ 𝑛, we arrive at 𝑛 E |𝑆 𝑛 | 𝑝 ≤
𝑛 ∑︁
E |𝑆 𝑛,𝑘 | 𝑝
𝑘=1
+
∑︁
∑︁
𝐶 𝑙𝑝 𝐴𝑞 ( 𝑝 − 𝑙) 𝐿 𝑞 𝐿 𝑙 + 𝑝 𝐿 𝑝−1 + 𝐿 𝑝 .
2≤𝑙 ≤ 𝑝−2 2≤𝑞 ≤ 𝑝−𝑙
To further simplify, recall that 𝐿 𝑞 𝐿 𝑙 ≤ 𝐿 𝑞+𝑙−2 (Proposition 4.2.1), which turns the right-hand side into a linear function with respect to the Lyapunov coefficients:
4.3 Rosenthal-type Inequalities
𝑛 E |𝑆 𝑛 | 𝑝 ≤
𝑛 ∑︁
71
∑︁
E |𝑆 𝑛,𝑘 | 𝑝 +
𝐵𝑚 ( 𝑝) 𝐿 𝑚 + 𝑝 𝐿 𝑝−1 + 𝐿 𝑝 ,
(4.13)
2≤𝑚≤ 𝑝−2
𝑘=1
Í where 𝐵𝑚 ( 𝑝) = 2≤𝑙 ≤𝑚 𝐶 𝑙𝑝 𝐴𝑚−𝑙+2 ( 𝑝 − 𝑙). To bound the first sum in (4.13), again by the induction hypothesis (4.12), we have E |𝑆 𝑛,𝑘 | 𝑝 ≤
𝑝 ∑︁
𝐴𝑚 ( 𝑝)
E |𝑋 𝑗 | 𝑚 ,
𝑗≠𝑘
𝑚=2
so that 𝑛 ∑︁
∑︁
𝑝
E |𝑆 𝑛,𝑘 | ≤ (𝑛 − 1)
𝑘=1
𝑝 ∑︁
𝐴𝑚 ( 𝑝) 𝐿 𝑚 .
𝑚=2
Plugging this into (4.13), we obtain 𝑝
𝑛 E |𝑆 𝑛 | ≤ 𝑛
𝑝 ∑︁
𝐴𝑚 ( 𝑝) 𝐿 𝑚 +
∑︁
(𝐵𝑚 ( 𝑝) − 𝐴𝑚 ( 𝑝)) 𝐿 𝑚
2≤𝑚≤ 𝑝−2
𝑚=2
+ ( 𝑝 − 𝐴 𝑝−1 ( 𝑝)) 𝐿 𝑝−1 + (1 − 𝐴 𝑝 ( 𝑝)) 𝐿 𝑝 . As a result, we arrive at the desired relation (4.12) for 𝑛 summands, as long as 𝐴 𝑝 ( 𝑝) = 1, 𝐴 𝑝−1 ( 𝑝) = 𝑝, and if the remaining coefficients satisfy 𝐵𝑚 ( 𝑝) ≤ 𝐴𝑚 ( 𝑝), that is, ∑︁ (4.14) 𝐴𝑚 ( 𝑝) ≥ 𝐶 𝑙𝑝 𝐴𝑚−𝑙+2 ( 𝑝 − 𝑙), 2 ≤ 𝑚 ≤ 𝑝 − 2. 2≤𝑙 ≤𝑚
The latter is not a condition for 𝑝 = 3. For larger values of 𝑝, one may replace the inequality symbol here with the equality symbol and use it as a recursive formula ∑︁ 𝐶 𝑙𝑝 𝐴𝑚−𝑙+2 ( 𝑝 − 𝑙), 2 ≤ 𝑚 ≤ 𝑝 − 2, 𝐴𝑚 ( 𝑝) = 2≤𝑙 ≤𝑚
with respect to 𝑝. If 𝑝 = 4, this formula is reduced to 𝐴2 (4) = 𝐶42 𝐴2 (2) = 6, and if 𝑝 = 5, it is reduced to 𝐴2 (5) = 𝐶52 𝐴2 (3) = 30
and
𝐴3 (5) = 𝐶52 𝐴3 (3) + 𝐶53 𝐴2 (2) = 20.
Hence, assuming that 𝐿 2 = 1, we get in (4.12) the Rosenthal-type inequalities E |𝑆 𝑛 | 3 ≤ 𝐿 3 + 3 ≤ 4 max{𝐿 3 , 1}, E |𝑆 𝑛 | 4 ≤ 𝐿 4 + 4𝐿 3 + 6 ≤ 11 max{𝐿 4 , 1}, E |𝑆 𝑛 | 5 ≤ 𝐿 5 + 5𝐿 4 + 20𝐿 3 + 30 ≤ 56 max{𝐿 5 , 1}.
72
4 Sums of Independent Random Variables
To handle larger values, one may take 𝐴𝑚 ( 𝑝) = 𝑝! (2 ≤ 𝑚 ≤ 𝑝 − 2) in (4.14) or even better 𝐴𝑚 ( 𝑝) = ( 𝑝 − 1)! (for large 𝑝). The latter choice leads to ∑︁ 𝐿 𝑚 < 𝑝! max{𝐿 𝑝 , 1}. E |𝑆 𝑛 | 𝑝 ≤ 𝐿 𝑝 + 𝑝𝐿 𝑝−1 + ( 𝑝 − 1)! 2≤𝑚≤ 𝑝−2
4.4 Normal Approximation Now, we turn to the problem of normal approximation for distribution functions 𝐹𝑛 (𝑥) = P{𝑆 𝑛 ≤ 𝑥} of the sums 𝑆 𝑛 = 𝑋1 + · · · + 𝑋𝑛 . We assume that the random variables 𝑋1 , . . . , 𝑋𝑛 are independent and have mean zero Í and variances 𝑏 2𝑘 = Var(𝑋 𝑘 ) such that 𝑛𝑘=1 𝑏 2𝑘 = 1. Thus, E𝑆 𝑛 = 0, Var(𝑆 𝑛 ) = 1. The central limit theorem aims to provide natural conditions such that the distribution of 𝑆 𝑛 can be approximated in some sense by the standard Gaussian measure on the real line, that is, with density and distribution function ∫ 𝑥 2 1 𝜑(𝑥) = √ e−𝑥 /2 , Φ(𝑥) = 𝜑(𝑦) d𝑦 (𝑥 ∈ R). −∞ 2𝜋 In terms of the Kolmogorov distance, the normal approximation may be quantified Í by means of the Lyapunov coefficient 𝐿 3 = 𝑛𝑘=1 E |𝑋 𝑘 | 3 . Namely, we have the following well-known variant of the central limit theorem. Proposition 4.4.1 For some absolute constant 𝑐 > 0, 𝜌(𝐹𝑛 , Φ) ≤ 𝑐𝐿 3 .
(4.15)
This assertion is√often referred to as the Berry–Esseen theorem, especially in the i.i.d. case 𝑋 𝑘 = 𝜉 𝑘 / 𝑛 with E 𝜉 𝑘 = 0, E 𝜉 𝑘2 = 1, 𝛽3 = E |𝜉 𝑘 | 3 , when (4.15) becomes 𝑐𝛽3 𝜌(𝐹𝑛 , Φ) ≤ √ . 𝑛 Here, the best constant is unknown, but it is known that one may take 𝑐 = 0.56 ([166]). As a preliminary step towards the proof of Proposition 4.4.1, first let us see that the characteristic function 𝑓𝑛 (𝑡) = E e𝑖𝑡𝑆𝑛 has a Gaussian decay on a large interval, whose length is controlled by 𝐿 3 (of course, this makes sense when 𝐿 3 is small). Lemma 4.4.2 In the interval |𝑡| ≤
1 𝐿3 ,
we have | 𝑓𝑛 (𝑡)| ≤ e−𝑡
2 /6
.
Proof Let 𝑣 𝑘 denote the characteristic function of 𝑋 𝑘 , and let 𝑋 𝑘′ be an independent copy of 𝑋 𝑘 . The random variable 𝑌𝑘 = 𝑋 𝑘 − 𝑋 𝑘′ has a symmetric distribution with
4.4 Normal Approximation
73
second moment 2𝑏 2𝑘 and a non-negative characteristic function |𝑣 𝑘 (𝑡)| 2 . If 𝐿 3 is finite, so are the absolute moments 𝛽3,𝑘 = E |𝑋 𝑘 | 3 , and we have E |𝑌𝑘 | 3 = E (𝑋 𝑘 − 𝑋 𝑘′ ) 2 |𝑋 𝑘 − 𝑋 𝑘′ | ≤ E (𝑋 𝑘2 − 2𝑋 𝑘 𝑋 𝑘′ + 𝑋 𝑘′2 ) (|𝑋 𝑘 | + |𝑋 𝑘′ |) = 2 E |𝑋 𝑘 | 3 + 2 E𝑋 𝑘2 E |𝑋 𝑘 | ≤ 4𝛽3,𝑘 . Hence, expanding the function |𝑣 𝑘 | 2 around zero according to Taylor’s formula up to the 3-rd term, we get |𝑣 𝑘 (𝑡)| 2 ≤ 1 − 𝑏 2𝑘 𝑡 2 +
n o 4 2 𝛽3,𝑘 |𝑡| 3 ≤ exp − 𝑏 2𝑘 𝑡 2 + 𝛽3,𝑘 |𝑡| 3 . 3! 3
Multiplying these inequalities and assuming that |𝑡| ≤
1 𝐿3 ,
we arrive at
o n 2 2 | 𝑓𝑛 (𝑡)| 2 ≤ exp − 𝑡 2 + 𝐿 3 |𝑡| 3 ≤ e−𝑡 /3 , 3 proving the lemma.
□
On a smaller interval, the statement of Lemma 4.4.2 may be sharpened by comparing 𝑓𝑛 (𝑡) with the characteristic function of the standard Gaussian measure. Lemma 4.4.3 In the interval |𝑡| ≤ 𝐿 3−1/3 , we have | 𝑓𝑛 (𝑡) − e−𝑡
2 /2
| ≤ 𝑐𝐿 3 |𝑡| 3 e−𝑡
2 /2
(4.16)
for some absolute constant 𝑐 > 0. Proof Applying Proposition 4.1.1 with 𝑝 = 3 and using E𝑋 𝑘 = 0, we see that the characteristic functions 𝑣 𝑘 (𝑡) = E e𝑖𝑡 𝑋𝑘 admit the representation 𝑡2 𝑣 𝑘 (𝑡) = exp − 𝑏 2𝑘 + 𝜀 𝑘 (𝑡) , 𝑏 𝑘 |𝑡| ≤ 1, 2 with remainder terms such that |𝜀 𝑘 (𝑡)| ≤ 9 𝛽3,𝑘 |𝑡| 𝑝 , cf. (4.3). Multiplying these equalities, we obtain a similar representation 1 2 +𝜀 (𝑡)
𝑓𝑛 (𝑡) = e− 2 𝑡
.
(4.17)
Moreover, according to Proposition 4.2.3, if 𝐿 31/3 |𝑡| ≤ 1, then 𝑏 𝑘 |𝑡| ≤ 1 for all 𝑘 ≤ 𝑛, and therefore (4.17) holds true in this interval with a remainder term satisfying |𝜀(𝑡)| ≤
𝑛 ∑︁
|𝜀 𝑘 (𝑡)| ≤ 9𝐿 3 |𝑡| 3 ≤ 9.
𝑘=1
In particular, |e 𝜀 (𝑡) − 1| ≤ 𝑐 |𝜀(𝑡)| ≤ 9𝑐 𝐿 3 |𝑡| 3 for some absolute constant 𝑐 > 0. Hence, (4.17) implies the desired relation (4.16), thus proving the lemma. □
74
4 Sums of Independent Random Variables
With a certain loss in the bound, both lemmas can be united in one relation | 𝑓𝑛 (𝑡) − e−𝑡
2 /2
| ≤ 𝑐𝐿 3 min{1, |𝑡| 3 } e−𝑡
2 /8
,
|𝑡| ≤
1 . 𝐿3
Proof (of Proposition 4.4.1) Since 𝜌(𝐹𝑛 , Φ) ≤ 1, we may assume that 𝐿 3 ≤ 1 in (4.15). Putting 𝑇0 = 𝐿 3−1/3 and 𝑇 = 𝐿 3−1 , the Berry–Esseen inequality (3.5) of Proposition 3.2.1 applied with 𝐹 = 𝐹𝑛 and 𝐺 = Φ yields ∫
| 𝑓𝑛 (𝑡) − e−𝑡 𝑡
𝑇0
𝜌(𝐹𝑛 , Φ) ≤ 2 0
∫
𝑇
+2 𝑇0
2 /2
| 𝑓𝑛 (𝑡)| + e−𝑡 𝑡
|
d𝑡
2 /2
2𝜋 2 1 d𝑡 + √ . 2𝜋 𝑇
Here, the first integral is bounded by 𝑐 1 𝐿 3 , by Lemma 4.4.3, while the second integral is bounded according to Lemma 4.4.2 by ∫
∞
2 𝑇0
2
2 𝑐3 e−𝑡 /6 d𝑡 ≤ 𝑐 2 e−𝑇0 /6 ≤ 3 = 𝑐 3 𝐿 3 . 𝑡 𝑇0
Here, 𝑐 𝑗 are positive absolute constants. Proposition 4.4.1 is proved.
□
4.5 Expansions for the Product of Characteristic Functions One may wonder whether or not it is possible to sharpen the rate of approximation in Proposition 4.4.1 when replacing the standard normal distribution function by a certain correction. In general this is not possible unless one uses rather non-universal approximations (for example, when all random variables 𝑋 𝑘 are discrete and are identically distributed), but a certain sharpening is indeed possible in a weaker sense, assuming that the Lyapunov coefficient 𝐿 𝑝 is finite (and actually small) for a fixed integer 𝑝 ≥ 4. We now focus on the refinement of the inequality (4.16) of Lemma 4.4.3, which will be used when applying the Berry–Esseen inequality (3.5). Thus, keeping the same notations and assumptions as in the previous section, let us return to the characteristic function 𝑓𝑛 (𝑡) = E e𝑖𝑡𝑆𝑛 of 𝑆 𝑛 = 𝑋1 + · · · + 𝑋𝑛 . As we know from Proposition 4.1.1, in terms of the cumulants 𝛾𝑟 ,𝑘 of 𝑋 𝑘 , the characteristic functions 𝑣 𝑘 (𝑡) = E e𝑖𝑡 𝑋𝑘 admit the representation 𝑣 𝑘 (𝑡) = exp
∑︁ 𝑝−1
𝛾𝑟 ,𝑘
(𝑖𝑡) 𝑟 + 𝜀 𝑘 (𝑡) , 𝑟!
𝑏 𝑘 |𝑡| ≤ 1,
𝑟=2 𝑝
with remainder terms satisfying |𝜀 𝑘 (𝑡)| ≤ 3𝑝 𝛽 𝑝,𝑘 |𝑡| 𝑝 . Here, we used the assumption 𝛾1,𝑘 = E𝑋 𝑘 = 0. Multiplying these equalities, we obtain a similar representation
4.5 Expansions for the Product of Characteristic Functions
𝑓𝑛 (𝑡) = e
−𝑡 2 /2
75
∑︁ 𝑝−1
(𝑖𝑡) 𝑟 exp 𝛾𝑟 + 𝜀(𝑡) 𝑟! 𝑟=3
(4.18)
involving the cumulants 𝛾𝑟 of 𝑆 𝑛 (recall that 𝛾2 = E𝑆 2𝑛 = 1). In view of Proposition 4.2.3, the interval where this equality holds is also controlled in terms of 𝐿 𝑝 . Namely, 𝑝 if 𝐿 1/ 𝑝 |𝑡| ≤ 1, then 𝑏 𝑘 |𝑡| ≤ 1 for all 𝑘 ≤ 𝑛, and therefore |𝜀(𝑡)| ≤
𝑛 ∑︁
|𝜀 𝑘 (𝑡)| ≤
3𝑝 3𝑝 𝐿 𝑝 |𝑡| 𝑝 ≤ . 𝑝 𝑝
𝑘=1
In particular, in this interval, |e 𝜀 (𝑡) − 1| ≤ 𝑐 𝑝 |𝜀(𝑡)| ≤ 𝑐 ′𝑝 𝐿 𝑝 |𝑡| 𝑝 for some constants depending on 𝑝 only. As a result, (4.18) is simplified to 𝑓𝑛 (𝑡) = e−𝑡
2 /2
e𝑄 𝑝−1 (𝑖𝑡) (1 + 𝜀(𝑡)),
|𝜀(𝑡)| ≤ 𝑐 𝑝 𝐿 𝑝 |𝑡| 𝑝 ,
(4.19)
where 𝑄 𝑝−1 (𝑧) =
𝑝−1 ∑︁ 𝛾 𝑝−1 𝛾𝑟 𝑟 𝛾 3 3 𝑧 = 𝑧 +···+ 𝑧 𝑝−1 , 𝑟! 3! ( 𝑝 − 1)! 𝑟=3
𝑧 ∈ C.
That is, we arrive at: 𝑝 Lemma 4.5.1 The representation (4.19) holds in the interval 𝐿 1/ 𝑝 |𝑡| ≤ 1 ( 𝑝 ≥ 3) with a constant 𝑐 𝑝 > 0 depending on 𝑝 only.
Note that, when 𝑝 = 3, the sum in (4.19) is vanishing, and we return to the inequality (4.16) of Lemma 4.4.3. In the next step, assuming that 𝑝 ≥ 4, it is natural to simplify the expression e𝑄 𝑝−1 (𝑖𝑡) (1 + 𝜀(𝑡)) to the form 1 + 𝑅 𝑝−1 (𝑖𝑡) + 𝜀(𝑡) with a certain polynomial 𝑅 𝑝−1 and with a new remainder term, which would be still as small as the Lyapunov coefficient 𝐿 𝑝 . This may indeed be possible on a smaller interval. Namely, write exp{𝑄 𝑝−1 (𝑖𝑡)} =
∞ ∑︁
𝑎 𝑘 (𝑖𝑡) 𝑘
𝑘=0
with coefficients 𝑎𝑘 =
∑︁ 3𝑘1 +···+( 𝑝−1) 𝑘 𝑝−3 =𝑘
𝑘 𝑝−3 𝛾 𝑘1 𝛾 1 𝑝−1 3 ··· . 𝑘 1 ! · · · 𝑘 𝑝−3 ! 3! ( 𝑝 − 1)!
Clearly, all these series are absolutely convergent for all 𝑡. Part of the last infinite series represents the desired polynomial 𝑅 𝑝−1 . Recall that by 𝛾𝑟 we denote the cumulants of 𝑆 𝑛 of orders 𝑟, which are well defined for all 𝑟 = 1, . . . , 𝑝 − 1 as long as the Lyapunov coefficient 𝐿 𝑝−1 is finite.
76
4 Sums of Independent Random Variables
Definition 4.5.2 If 𝐿 𝑝−1 is finite (𝑝 ≥ 4), put 𝑅 𝑝−1 (𝑖𝑡) =
∑︁
𝑘 𝑝−3 𝛾 𝑘1 𝛾 1 𝑝−1 3 ··· (𝑖𝑡) 3𝑘1 +···+( 𝑝−1) 𝑘 𝑝−3 , 𝑘 1 ! . . . 𝑘 𝑝−3 ! 3! ( 𝑝 − 1)!
where the summation runs over all collections of non-negative integers (𝑘 1 , . . . , 𝑘 𝑝−3 ) that are not all zero and such that 𝑑 ≡ 𝑘 1 + 2𝑘 2 + · · · + ( 𝑝 − 3)𝑘 𝑝−3 ≤ 𝑝 − 3. Here the constraint 𝑑 ≤ 𝑝 − 3 has the aim to involve only the terms that might not be small in comparison with 𝐿 𝑝 . Indeed, as we know from Propositions 4.2.4 and 4.2.1, 𝑙−2 𝑝−3 |𝛾𝑙 | ≤ (𝑙 − 1)! 𝐿 𝑙 ≤ (𝑙 − 1)! 𝐿 𝑝−1 ,
3 ≤ 𝑙 ≤ 𝑝 − 1,
(4.20)
which gives 𝛾 𝑘1 𝑘 𝑝−3 𝛾 𝑑 1 𝑝−1 3 𝑝−3 ··· . 𝐿 ≤ 𝑘 3! ( 𝑝 − 1)! 3 1 · · · ( 𝑝 − 1) 𝑘 𝑝−3 𝑝−1 𝑑
(4.21)
𝑑
𝑝−3 Here, 𝐿 𝑝−1 ≤ 𝐿 𝑝𝑝−2 . So, the left product in (4.21) is at least as small as 𝐿 𝑝 in the case 𝑑 ≥ 𝑝 − 2, when 𝐿 𝑝 is small. Of course, this should be justified when comparing e𝑄 𝑝−1 (𝑖𝑡) − 1 and 𝑅 𝑝−1 (𝑖𝑡) on a proper interval of the 𝑡-axis. The index 𝑝 − 1 in 𝑅 𝑝−1 indicates that all cumulants up to 𝛾 𝑝−1 participate in the constructions of these polynomials. In their definition, 𝑖𝑡 is raised to the power
𝑘 = 3𝑘 1 + · · · + ( 𝑝 − 1)𝑘 𝑝−3 = 𝑑 + 2(𝑘 1 + 𝑘 2 + · · · + 𝑘 𝑝−3 ), which may vary from 3 to 3( 𝑝 − 3), with maximum 3( 𝑝 − 3) attainable when 𝑘 1 = 𝑝 − 3 and all other 𝑘 𝑟 = 0. Anyway, deg(𝑅 𝑝−1 ) ≤ 3( 𝑝 − 3). These observations imply a simple boundon the growth of 𝑅 𝑝−1 , which will be needed in the sequel. First, using |𝑡| 𝑘 ≤ max |𝑡| 3 , |𝑡| 3( 𝑝−3) , we have, by (4.21), ∑︁ |𝑅 𝑝−1 (𝑖𝑡)| ≤ max |𝑡| 3 , |𝑡| 3( 𝑝−3)
𝑑 1 1 𝑝−3 . 𝐿 𝑝−1 𝑘 𝑘 1 ! · · · 𝑘 𝑝−3 ! 3 𝑘1 · · · ( 𝑝 − 1) 𝑝−3
Applying the elementary bound ∑︁
1 1 < e1/3 · · · e1/( 𝑝−1) < 𝑝 − 1, 𝑘 1 𝑘 1 ! · · · 𝑘 𝑝−3 ! 3 · · · ( 𝑝 − 1) 𝑘 𝑝−3
we arrive at: Proposition 4.5.3 For all 𝑡 ∈ R, |𝑅 𝑝−1 (𝑖𝑡)| ≤ ( 𝑝 − 1) max |𝑡| 3 , |𝑡| 3( 𝑝−3) max 𝐿 𝑝−1 , 1 .
(4.22)
4.6 Higher Order Approximations of Characteristic Functions
77
4.6 Higher Order Approximations of Characteristic Functions Using the polynomials 𝑅 𝑝−1 from Definition 4.5.2, the approximation of the characteristic functions 𝑓𝑛 given in (4.19) may be simplified in terms of the functions 𝑔 𝑝−1 (𝑡) = e−𝑡
2 /2
(1 + 𝑅 𝑝−1 (𝑖𝑡)).
(4.23)
1 1 Proposition 4.6.1 If 𝐿 𝑝 < ∞ ( 𝑝 ≥ 4), in the interval |𝑡| max 𝐿 𝑝𝑝−2 , 𝐿 𝑝3( 𝑝−2) ≤ 1, we have for some constant 𝑐 𝑝 > 0 depending on 𝑝
2 | 𝑓𝑛 (𝑡) − 𝑔 𝑝−1 (𝑡)| ≤ 𝑐 𝑝 𝐿 𝑝 max |𝑡| 𝑝 , |𝑡| 3( 𝑝−2) e−𝑡 /2 .
(4.24)
This inequality is also true for 𝑝 = 3, when it becomes | 𝑓𝑛 (𝑡) − e−𝑡
2 /2
| ≤ 𝑐𝐿 3 |𝑡| 3 e−𝑡
2 /2
,
that is, the bound (4.16) of Lemma 4.4.3 (recall that 𝑅2 = 0, so that 𝑔2 is the standard normal characteristic function). But, if 𝐿 𝑝 is smaller than 𝐿 3 for 𝑝 ≥ 4 (which is typical), (4.24) is better than (4.16). This may be seen in the i.i.d. case where 𝑋 𝑘 = √1𝑛 𝜉 𝑘 with E 𝜉 𝑘 = 0, E 𝜉 𝑘2 = 1, 𝛽 𝑝 = E |𝜉 𝑘 | 𝑝 < ∞. Then (4.24) becomes | 𝑓𝑛 (𝑡) − 𝑔 𝑝−1 (𝑡)| ≤ 𝑐 𝑝 𝛽 𝑝 𝑛−
𝑝−2 2
2 max |𝑡| 𝑝 , |𝑡| 3( 𝑝−2) e−𝑡 /2 ,
1
which holds on the interval |𝑡| = 𝑂 (𝑛 3( 𝑝−2) ). That is why the function 𝑔 𝑝−1 (𝑡) is often called the corrected normal “characteristic” function of order 𝑝 − 1. Let us mention that an inequality similar to (4.24) is also true for the first 𝑝 − 1 derivatives of 𝑓𝑛 . Namely, in the same interval as in Proposition 4.6.1, d𝑟 2 d𝑟 𝑟 𝑓𝑛 (𝑡) − 𝑟 𝑔 𝑝−1 (𝑡) ≤ 𝑐 𝑝 𝐿 𝑝 max |𝑡| 𝑝−𝑟 , |𝑡| 3( 𝑝−2)+𝑟 e−𝑡 /2 d𝑡 d𝑡
(4.25)
for any 𝑟 = 1, . . . , 𝑝 − 1. Proof (of Proposition 4.6.1) In order to apply (4.19), we need to relate 𝑅 𝑝−1 to the cumulant polynomials 𝑄 𝑝−1 . Recall that 𝑙−2
|𝛾𝑙 | ≤ (𝑙 − 1)! 𝐿 𝑙 ≤ (𝑙 − 1)! 𝐿 𝑝𝑝−2 ,
3 ≤ 𝑙 ≤ 𝑝 − 1,
like in (4.20). Hence, for any complex number 𝑧 in the disc 𝐿 𝑝 |𝑧| 𝑝−2 ≤ 1, |𝑄 𝑚 (𝑧)| ≤
𝑝−1 ∑︁ 1
𝑙−2
𝐿 𝑝𝑝−2 |𝑧| 𝑙
𝑙 𝑙=3 1
= 𝐿 𝑝𝑝−2 |𝑧| 3
𝑝−1 ∑︁ 1
𝑙 𝑙=3
1
𝐿 𝑝𝑝−2 |𝑧|
𝑙−3
1
≤ log( 𝑝 − 1) 𝐿 𝑝𝑝−2 |𝑧| 3 .
(4.26)
78
4 Sums of Independent Random Variables
1 Í 𝑝−1 1 𝑝−2 3 where we used 𝑙=3 𝑙 < log( 𝑝 − 1). Moreover, if additionally 𝐿 𝑝 |𝑧| ≤ 1, we get a uniform bound |𝑄 𝑝−1 (𝑧)| ≤ log( 𝑝 − 1). Now, consider the function
Ψ(𝑤) = e𝑤 −
𝑝−3 ∞ ∑︁ ∑︁ 𝑤 𝑘−( 𝑝−2) 1 𝑘 𝑤 = 𝑤 𝑝−2 𝑘! 𝑘! 𝑘= 𝑝−2 𝑘=0
of the complex variable 𝑤. If |𝑤| ≤ log( 𝑝 − 1), then |Ψ(𝑤)| ≤ |𝑤| 𝑝−2
∞ ∑︁ 1 log( 𝑝 − 1)) 𝑘−( 𝑝−2) ≤ |𝑤| 𝑝−2 . 𝑘! 𝑘= 𝑝−2
Indeed, the last sum, call it 𝑆( 𝑝), does not exceed 𝑝 − 2 − log( 𝑝 − 1) 1 log( 𝑝−1) e − 1 − log( 𝑝 − 1) = ≡ 𝑇 ( 𝑝). (log( 𝑝 − 1)) 𝑝−2 (log( 𝑝 − 1)) 𝑝−2 Clearly, 𝑇 (4) < 1 and 𝑇 (5) < 1. For 𝑝 ≥ 6, we have log 𝑇 ( 𝑝) < log( 𝑝 − 2) − ( 𝑝 − 2) log log( 𝑝 − 1) < 0. log 𝑥
Here, since the function 𝑥 is decreasing for 𝑥 ≥ e, we only need to verify the last inequality for 𝑝 = 6, when it is also true. Thus, 𝑆( 𝑝) < 1 for all 𝑝 ≥ 4. The inequality Ψ(𝑤) ≤ |𝑤| 𝑝−2 can be used with 𝑤 = 𝑄 𝑝−1 (𝑧), 𝑧 = 𝑖𝑡. Since |𝑄 𝑝−1 (𝑖𝑡)| ≤ log( 𝑝 − 1), applying the non-uniform estimate (4.26), we get |Ψ(𝑄 𝑝−1 (𝑖𝑡))| ≤ |𝑄 𝑝−1 (𝑖𝑡)| 𝑝−2 ≤ log 𝑝−2 ( 𝑝 − 1) 𝐿 𝑝 |𝑡| 3( 𝑝−2) , that is, 𝑝−3 𝑄 𝑝 (𝑖𝑡) ∑︁ 1 𝑄 𝑝−1 (𝑖𝑡) 𝑘 ≤ log 𝑝−2 ( 𝑝 − 1) 𝐿 𝑝 |𝑡| 3( 𝑝−2) . − e 𝑘! 𝑘=0
(4.27)
As a result, we are only concerned with the remainder term 𝑝−3 ∑︁ 1 𝑄 𝑝−1 (𝑖𝑡) 𝑘 − 𝑅 𝑝−1 (𝑖𝑡) 𝑟 (𝑖𝑡) = 𝑘! 𝑘=1 𝑝−1 𝑘 𝑝−3 ∑︁ 1 ∑︁ (𝑖𝑡) 𝑙 𝛾𝑙 − 𝑅 𝑝−1 (𝑖𝑡). = 𝑘! 𝑙=3 𝑙! 𝑘=1
Using the polynomial formula, let us write down the double sum (with 𝑧 = 𝑖𝑡) as 𝑝−3 ∑︁
∑︁
𝑘=1 𝑘1 +···+𝑘 𝑝−3 =𝑘
𝑘 𝑝−3 𝛾 𝑘1 𝛾 1 𝑝−1 3 ··· 𝑧3𝑘1 +···+( 𝑝−1) 𝑘 𝑝−3 . 𝑘 1 ! · · · 𝑘 𝑝−3 ! 3! ( 𝑝 − 1)!
4.6 Higher Order Approximations of Characteristic Functions
79
This expression defines 𝑅 𝑝−1 (𝑖𝑡) with the difference that Definition 4.5.2 contains the constraint 𝑘 1 + 2𝑘 2 + · · · + ( 𝑝 − 3)𝑘 𝑝−3 ≤ 𝑝 − 3, while here we have a weaker constraint 𝑘 1 + 𝑘 2 + · · · + 𝑘 𝑝−3 ≤ 𝑝 − 3. Hence, all terms appearing in 𝑅 𝑝−1 (𝑖𝑡) are present in the above double sum, so, 𝑟 (𝑖𝑡) =
∑︁
𝑘 𝑝−3 𝛾 𝑘1 𝛾 1 𝑝−1 3 ··· (𝑖𝑡) 3𝑘1 +···+( 𝑝−1) 𝑘 𝑝−3 𝑘 1 ! · · · 𝑘 𝑝−3 ! 3! ( 𝑝 − 1)!
with summation subject to 𝑘 1 + 𝑘 2 + · · · + 𝑘 𝑝−3 ≤ 𝑝 − 3,
𝑘 1 + 2𝑘 2 + · · · + ( 𝑝 − 3)𝑘 𝑝−3 ≥ 𝑝 − 2. 𝑙−2
Necessarily, all 𝑘 𝑗 ≤ 𝑝 − 3 and at least one 𝑘 𝑗 ≥ 1. Using |𝛾𝑙 | ≤ (𝑙 − 1)! 𝐿 𝑝𝑝−2 , we get |𝑟 (𝑧)| ≤
𝑝−1 ∑︁ Ö 𝑙=3
1
1
𝑘 𝑙−2 ! 𝑙
where 𝑀=
𝑙−2
𝐿 𝑝𝑝−2
𝑘𝑙−2
|𝑧| 𝑁 =
∑︁
𝑁 𝐿𝑀 𝑝 |𝑧|
𝑝−1 Ö 𝑙=3
1
1 𝑘𝑙−2
𝑘 𝑙−2 ! 𝑙
1 (𝑘 1 + 2𝑘 2 + · · · + ( 𝑝 − 3)𝑘 𝑝−3 ), 𝑝−2
𝑁 = 3𝑘 1 + · · · + ( 𝑝 − 1)𝑘 𝑝−3 = (𝑘 1 + 2𝑘 2 + · · · + ( 𝑝 − 3)𝑘 𝑝−3 ) + 2𝑘,
𝑘 = 𝑘 1 + 𝑘 2 + · · · + 𝑘 𝑝−3 .
1
Note that ( 𝑝 − 2)𝑀 = 𝑁 − 2𝑘. If 𝐿 𝑝𝑝−2 |𝑧| ≤ 1, using 1 ≤ 𝑘 ≤ 𝑝 − 2, we have 𝐿 𝑀−1 |𝑧| 𝑁 ≤ |𝑧| 𝑁 −( 𝑝−2) ( 𝑀−1) = |𝑧| ( 𝑝−2)+2𝑘 ≤ max |𝑧| 𝑝 , |𝑧| 3( 𝑝−2) . 𝑝 Hence 𝑝−1 ∑︁ Ö |𝑟 (𝑧)| ≤ 𝐿 𝑝 max |𝑧| 𝑝 , |𝑧| 3( 𝑝−2) 𝑙=3
The latter sum is dominated by exp{
1 𝑙=3 𝑙 }
Í 𝑝−1
1
1 𝑘𝑙−2
𝑘 𝑙−2 ! 𝑙
.
≤ 𝑝 − 1, so
|𝑟 (𝑧)| ≤ ( 𝑝 − 1) 𝐿 𝑝 max |𝑧| 𝑝 , |𝑧| 3( 𝑝−2) . In view of (4.27), this gives the representation e𝑄 𝑝−1 (𝑖𝑡) = 1 + 𝑅 𝑝−1 (𝑖𝑡) + 𝛿(𝑡) with an error term satisfying |𝛿(𝑡)| ≤ 𝑐 𝑝 𝐿 𝑝 max |𝑡| 𝑝 , |𝑡| 3( 𝑝−2)
1 1 for |𝑡| max 𝐿 𝑝𝑝−2 , 𝐿 𝑝3( 𝑝−2) ≤ 1,
,
80
4 Sums of Independent Random Variables
where we denote by 𝑐 𝑝 a constant depending on 𝑝 only. Note that |𝛿(𝑡)| ≤ 𝑐 𝑝 in the indicated interval. Since |𝑄 𝑝−1 (𝑖𝑡)| ≤ log( 𝑝 − 1), we also have |𝑅 𝑝−1 (𝑖𝑡)| ≤ 𝑐 𝑝 + 𝑝. Returning to (4.19), we may write 2 𝑓𝑛 (𝑡) = e−𝑡 /2 1 + 𝑅 𝑝−1 (𝑖𝑡) + 𝛿(𝑡) (1 + 𝜀(𝑡)) 2 2 = e−𝑡 /2 1 + 𝑅 𝑝−1 (𝑖𝑡) + e−𝑡 /2 (1 + 𝑅 𝑝−1 (𝑖𝑡)) 𝜀(𝑡) + (1 + 𝜀(𝑡)) 𝛿(𝑡) . 1
Here, by Lemma 4.5.1, |𝜀(𝑡)| ≤ 𝑐 𝑝 𝐿 𝑝 |𝑡| 𝑝 under the condition 𝐿 𝑝𝑝 |𝑡| ≤ 1. This is 1 1 1 indeed fulfilled, since 𝐿 𝑝𝑝 ≤ max 𝐿 𝑝𝑝−2 , 𝐿 𝑝3( 𝑝−2) . Moreover, 𝐿 𝑝 |𝑡| 𝑝 ≤ 1, so that |𝜀(𝑡)| ≤ 𝑐 𝑝 . Since both 𝜀(𝑡) and 𝛿(𝑡) have been properly bounded, Proposition 4.6.1 is proved. □
4.7 Edgeworth Corrections Again, let 𝑋1 , . . . , 𝑋𝑛 be independent random variables with mean zero and variances Í 𝑏 2𝑘 = Var(𝑋 𝑘 ) such that 𝑛𝑘=1 𝑏 2𝑘 = 1. If the Lyapunov coefficient 𝐿 𝑝 is small for an integer 𝑝 ≥ 4, Proposition 4.6.1 tells us that the characteristic function 𝑓𝑛 (𝑡) of 𝑆 𝑛 = 𝑋1 + · · · + 𝑋𝑛 is close to the function 𝑔 𝑝−1 (𝑡), defined in (4.23), on a relatively long interval. Therefore, it is reasonable to believe that in some sense the distribution of 𝑆 𝑛 is close to the signed (Lebesgue–Stieltjes) measure 𝜇 𝑝−1 on the real line, whose Fourier– Stieltjes transform is exactly 𝑔 𝑝−1 , that is, with ∫ ∞ e𝑖𝑡 𝑥 d𝜇 𝑝−1 (𝑥) = 𝑔 𝑝−1 (𝑡), 𝑡 ∈ R. (4.28) −∞
In order to describe these measures explicitly, let us recall the Chebyshev–Hermite polynomials 𝐻 𝑘 (𝑥) = (−1) 𝑘 (e−𝑥
2 /2
) (𝑘) e 𝑥
2 /2
,
𝑘 = 0, 1, 2, . . . (𝑥 ∈ R).
Equivalently, 𝜑 (𝑘) (𝑥) = (−1) 𝑘 𝐻 𝑘 (𝑥)𝜑(𝑥) in terms of the standard normal density 𝜑. Each 𝐻 𝑘 is a polynomial of degree 𝑘 with leading coefficient 1. For example, 𝐻0 (𝑥) = 1, 𝐻1 (𝑥) = 𝑥,
𝐻2 (𝑥) = 𝑥 2 − 1, 𝐻3 (𝑥) = 𝑥 3 − 3𝑥,
𝐻4 (𝑥) = 𝑥 4 − 6𝑥 2 + 3, 𝐻5 (𝑥) = 𝑥 5 − 10𝑥 3 + 15𝑥.
These polynomials are orthogonal on the real line with weight 𝜑(𝑥) and form a complete orthogonal system in the Hilbert space 𝐿 2 (R, 𝜑(𝑥) d𝑥). By repeated integration by parts (with 𝑡 ≠ 0)
4.7 Edgeworth Corrections
e−𝑡
2 /2
81 ∞
∫
e𝑖𝑡 𝑥 𝜑(𝑥) d𝑥 =
= −∞
1 (−𝑖𝑡) 𝑘
∫
∞
e𝑖𝑡 𝑥 𝜑 (𝑘) (𝑥) d𝑥,
−∞
we have the identity ∫
∞
e𝑖𝑡 𝑥 𝐻 𝑘 (𝑥)𝜑(𝑥) d𝑥 = (𝑖𝑡) 𝑘 e−𝑡
2 /2
.
−∞
Using the inverse Fourier transform, one may therefore write ∫ ∞ 2 1 e−𝑖𝑡 𝑥 (𝑖𝑡) 𝑘 e−𝑡 /2 d𝑡, 𝐻 𝑘 (𝑥) 𝜑(𝑥) = 2𝜋 −∞ which may be taken as another definition of 𝐻 𝑘 . Returning to Definition 4.5.2, we therefore obtain: Proposition 4.7.1 Let 𝐿 𝑝−1 < ∞ for an integer 𝑝 ≥ 4. The measure 𝜇 𝑝−1 with Fourier–Stieltjes transform 𝑔 𝑝−1 has density 𝜑 𝑝−1 (𝑥) = 𝜑(𝑥) + 𝜑(𝑥)
∑︁
𝑘 𝑝−3 𝛾 𝑘1 𝛾 1 𝑝−1 3 ··· 𝐻 𝑘 (𝑥), 𝑘 1 ! · · · 𝑘 𝑝−3 ! 3! ( 𝑝 − 1)!
where 𝑘 = 3𝑘 1 +· · ·+( 𝑝−1)𝑘 𝑝−3 and where the summation runs over all non-negative integers 𝑘 1 , . . . , 𝑘 𝑝−3 not all zero and such that 𝑘 1 + 2𝑘 2 + · · · + ( 𝑝 − 3)𝑘 𝑝−3 ≤ 𝑝 − 3. Recall that the cumulants 𝛾𝑟 of 𝑆 𝑛 are well defined for all 1 ≤ 𝑟 ≤ 𝑝 − 1. If 𝑝 = 3, the above sum is empty, and we put 𝜑2 = 𝜑. Since (𝐻 𝑘−1 (𝑥)𝜑(𝑥)) ′ = −𝐻 𝑘 (𝑥)𝜑(𝑥), the corresponding “distribution” function ∫ 𝑥 Φ 𝑝−1 (𝑥) = 𝜇 𝑝−1 ((−∞, 𝑥]) = 𝜑 𝑝−1 (𝑦) d𝑦, 𝑥 ∈ R, −∞
is described as Φ 𝑝−1 (𝑥) = Φ(𝑥) − 𝜑(𝑥)
∑︁
𝑘 𝑝−3 𝛾 𝑘1 𝛾 1 𝑝−1 3 ··· 𝐻 𝑘−1 (𝑥) 𝑘 1 ! · · · 𝑘 𝑝−3 ! 3! ( 𝑝 − 1)!
with summation as before. Definition 4.7.2 The signed measure 𝜇 𝑝−1 is called the Edgeworth approximation of order 𝑝 − 1 for the distribution of 𝑆 𝑛 (or an Edgeworth correction of the normal law). Thus, 𝜑 𝑝−1 (𝑥) = 𝜑(𝑥) + 𝜑(𝑥)𝑈 𝑝−1 (𝑥), Φ 𝑝−1 (𝑥) = Φ(𝑥) − 𝜑(𝑥)𝑉 𝑝−1 (𝑥), where 𝑈 𝑝−1 and 𝑉 𝑝−1 are polynomials of degree at most 3( 𝑝 − 3) and 3( 𝑝 − 3) − 1, respectively.
82
4 Sums of Independent Random Variables
For the first values, we have 𝛾3 𝐻2 (𝑥), 3! 𝛾32 𝛾3 𝛾4 𝑉4 (𝑥) = 𝐻2 (𝑥) + 𝐻3 (𝑥) + 𝐻5 (𝑥), 3! 4! 2! 3!2 𝛾3 𝛾4 𝛾5 𝐻2 (𝑥) + 𝐻3 (𝑥) + 𝐻4 (𝑥) 𝑉5 (𝑥) = 3! 4! 5! 𝛾33 𝛾32 𝛾3 𝛾4 + 𝐻8 (𝑥). 𝐻 (𝑥) + 𝐻 (𝑥) + 5 6 3! 4! 2! 3!2 3!4 𝑉3 (𝑥) =
Let us briefly describe a few basic properties of these measures. Proposition 4.7.3 The moments of 𝑆 𝑛 and 𝜇 𝑝−1 coincide up to order 𝑝 − 1, or equivalently (𝑟) 𝑓𝑛(𝑟) (0) = 𝑔 𝑝−1 (0), 𝑟 = 0, 1, . . . , 𝑝 − 1. In particular, ∫
∞
𝜇 𝑝−1 (R) =
𝜑 𝑝−1 (𝑥) d𝑥 = 1. −∞
The latter immediately follows from the Fourier transform formula (4.28) applied at 𝑡 = 0. The more general assertion may be obtained as a consequence of (4.25), (𝑟) which gives | 𝑓𝑛(𝑟) (𝑡) − 𝑔 𝑝−1 (𝑡)| = 𝑂 (|𝑡| 𝑝−𝑟 ) as 𝑡 → 0. Proposition 4.7.4 If 𝐿 𝑝−1 < ∞ for an integer 𝑝 ≥ 4, then for some positive constant 𝑐 𝑝 depending on 𝑝, ∫ ∞ (1 + |𝑥|) 𝑝 |𝜇 𝑝−1 (d𝑥) − 𝜇(d𝑥)| ≤ 𝑐 𝑝 max{𝐿 𝑝−1 , 1}, −∞
where 𝜇 is the standard Gaussian measure. In addition, sup |𝜑 𝑝−1 (𝑥) − 𝜑(𝑥)| ≤ 𝑐 𝑝 max{𝐿 𝑝−1 , 1}. 𝑥
Proof Denote by 𝐼 the integral appearing in the first inequality. It describes the weighted total variation distance between the measures 𝜇 𝑝−1 and 𝜇 with weight (1 + |𝑥|) 𝑝 and is explicitly given as ∫ ∞ 𝐼= (1 + |𝑥| 𝑝 ) |𝑈 𝑝−1 (𝑥)| 𝜑(𝑥) d𝑥. −∞
In Proposition 4.7.1, the tuples (𝑘 1 , . . . , 𝑘 𝑝−3 ) from the sum satisfy 1 ≤ 𝑑 ≤ 𝑝−3, where 𝑑 = 𝑘 1 + 2𝑘 2 + · · · + ( 𝑝 − 3)𝑘 𝑝−3 . Hence, by (4.21), 𝐼 is bounded by ∫ ∞ ∑︁ 1 1 (1 + |𝑥| 𝑝 ) |𝐻 𝑘 (𝑥)| 𝜑(𝑥) d𝑥, max{𝐿 𝑝−1 , 1} 𝑘 1 ! . . . 𝑘 𝑝−3 ! 3 𝑘1 . . . ( 𝑝 − 1) 𝑘 𝑝−3 −∞
4.8 Rates of Approximation
83
where 𝑘 = 3𝑘 1 + · · · + ( 𝑝 − 1)𝑘 𝑝−3 may vary from 3 to 3( 𝑝 − 3). Let 𝑍 be a standard normal random variable. Using E 𝐻 𝑘 (𝑍) 2 = 𝑘! we get, by the Cauchy inequality, ∫ ∞ (1 + |𝑥|) 𝑝 |𝐻 𝑘 (𝑥)| 𝜑(𝑥) d𝑥 = E (1 + |𝑍 |) 𝑝 |𝐻 𝑘 (𝑍)| −∞ √︁ √ ≤ E (1 + |𝑍 |) 2 𝑝 𝑘! ≤ 𝑐 𝑝 . Hence 𝐼 does not exceed 𝑐 𝑝 max{𝐿 𝑝−1 , 1}
∑︁
1 1 . 𝑘 1 𝑘 1 ! · · · 𝑘 𝑝−3 ! 3 · · · ( 𝑝 − 1) 𝑘 𝑝−3
The latter sum is smaller than 𝑝 − 1, cf. (4.22), and we obtain the first inequality. The second bound follows from Proposition 4.5.3. Since ∫ ∞ ∫ ∞ 2 𝑖𝑡 𝑥 −𝑡 2 /2 𝜑 𝑝−1 (𝑥) − 𝜑(𝑥) = e (𝑔 𝑝−1 (𝑡) − e ) d𝑡 = e𝑖𝑡 𝑥 𝑅 𝑝−1 (𝑖𝑡) e−𝑡 /2 d𝑡, −∞
−∞
it gives |𝜑 𝑝−1 (𝑥) − 𝜑(𝑥)| ≤ ( 𝑝 − 1) max 𝐿 𝑝−1 , 1
∫
∞
2 max |𝑡| 3 , |𝑡| 3( 𝑝−3) e−𝑡 /2 d𝑡.
−∞
Proposition 4.7.4 is proved.
□
4.8 Rates of Approximation We are prepared to derive a Berry–Esseen-type bound quantifying the approximation of the distribution function 𝐹𝑛 (𝑥) = P{𝑆 𝑛 ≤ 𝑥} by the corrected normal “distribution” function Φ 𝑝−1 . Throughout, we denote by 𝑐 𝑝 a positive constant depending on 𝑝 only, which may vary from place to place. Continuing the setting of the previous section, we have: Proposition 4.8.1 If 𝐿 𝑝 < ∞ ( 𝑝 ≥ 3), then for any 𝛿 ≥ 0, ∫ 𝑐 𝑝 𝜌(𝐹𝑛 , Φ 𝑝−1 ) ≤ 𝐿 𝑝 + 𝛿 + 1 { 𝛿 ≤𝐿3 ≤1}
1/ 𝛿
1/𝐿3
| 𝑓𝑛 (𝑡)| d𝑡. 𝑡
(4.29)
As before, we denote by 𝑓𝑛 (𝑡) the characteristic function of 𝑆 𝑛 . If 𝑝 = 3 and 𝛿 = 𝐿 3 , we return in (4.29) to the Berry–Esseen bound (4.15) in Proposition 4.4.1 (so, this value of 𝛿 is optimal). If 𝑝 ≥ 4, choosing 𝛿 = 𝐿 𝑝 , (4.29) simplifies to ∫ 𝑐 𝑝 𝜌(𝐹𝑛 , Φ 𝑝−1 ) ≤ 𝐿 𝑝 + 1 {𝐿 𝑝 ≤𝐿3 ≤1}
1/𝐿 𝑝
1/𝐿3
| 𝑓𝑛 (𝑡)| d𝑡. 𝑡
84
4 Sums of Independent Random Variables
However, in further applications a slightly different choice of 𝛿 will be more useful. As a preliminary step, here we extend the inequality (4.24), although with a worse Gaussian decay on the right-hand side, to the larger interval in full analogy with Lemma 4.4.2. The next inequality is of independent interest. Proposition 4.8.2 If 𝐿 𝑝 < ∞ ( 𝑝 ≥ 3), then in the interval |𝑡| ≤ 1/𝐿 3 , we have 2 | 𝑓𝑛 (𝑡) − 𝑔 𝑝−1 (𝑡)| ≤ 𝑐 𝑝 𝐿 𝑝 min 1, |𝑡| 𝑝 e−𝑡 /8 .
(4.30)
In the proof, we employ Lemma 4.4.2 in order to bound the characteristic function 𝑓𝑛 (𝑡) outside the interval of Proposition 4.6.1, together with a similar observation about the decay of the corrected normal characteristic function. 1
1
Lemma 4.8.3 In the region |𝑡| max{𝐿 𝑝𝑝−2 , 𝐿 𝑝3( 𝑝−2) } ≥ 1, we have |𝑔 𝑝−1 (𝑡)| ≤ 𝑐 𝑝 𝐿 𝑝 e−𝑡
2 /8
.
Proof We use (4.21), implying that |𝑅 𝑝−1 (𝑖𝑡)| ≤
∑︁
𝑑 1 1 𝑝−2 |𝑡| 𝑘 , 𝐿 𝑝 𝑘 1 ! · · · 𝑘 𝑝−3 ! 3 𝑘1 · · · ( 𝑝 − 1) 𝑘 𝑝−3
where the summation runs over all collections of non-negative integers (𝑘 1 , . . . , 𝑘 𝑝−3 ) that are not all zero and such that 𝑘 = 3𝑘 1 + · · · + ( 𝑝 − 1)𝑘 𝑝−3 ,
𝑑 = 𝑘 1 + 2𝑘 2 + · · · + ( 𝑝 − 3)𝑘 𝑝−3 ≤ 𝑝 − 3.
All tuples that are involved satisfy 1 ≤ 𝑘 ≤ 3𝑑 ≤ 3( 𝑝 − 3). 𝑑
𝑑
2
If 𝐿 𝑝 ≥ 1, then 𝐿 𝑝𝑝−2 ≤ 𝐿 𝑝 , and using |𝑡| 𝑘 e−3𝑡 /8 ≤ 𝑐 𝑝 , we get 𝐿 𝑝𝑝−2 |𝑡| 𝑘 e−𝑡 2 𝑐 𝑝 𝐿 𝑝 e−𝑡 /8 , and the inequality of the lemma follows. 𝑑
If 𝐿 𝑝 ≤ 1, it will be sufficient to bound the products 𝐿 𝑝𝑝−2
−1
|𝑡| 𝑘 e−3𝑡
2 /8
2 /2
≤
uniformly
− 1 𝐿 𝑝 𝑝−2 .
for all admissible tuples. Put 𝑥 = Using the hypothesis |𝑡| ≥ 𝑥 1/3 , let us rewrite every such product and then estimate it as follows: 𝑑
𝐿 𝑝𝑝−2
−1
|𝑡| 𝑘 e−3𝑡
2 /8
= 𝑥 ( 𝑝−2)−𝑑 e−𝑡
2 /4
1
≤ 𝑥 ( 𝑝−2)−𝑑 e− 4 𝑥 3
= (4𝑦) 2
· |𝑡| 𝑘 e−𝑡
2/3
2 /8
· |𝑡| 𝑘 e−𝑡
( ( 𝑝−2)−𝑑) −𝑦
e
2 /8
· (8𝑢) 𝑘/2 e−𝑢 ,
where we changed the variables 𝑥 = (4𝑦) 3/2 , 𝑡 = (8𝑢) 1/2 . Both functions in the variables 𝑦 ≥ 0 and 𝑢 ≥ 0 are bounded by constants depending 𝑝, only. Thus, 𝑑
𝐿 𝑝𝑝−2 |𝑡| 𝑘 e−𝑡 proving the lemma.
2 /2
≤ 𝑐 𝑝 𝐿 𝑝 e−𝑡
2 /8
, □
4.8 Rates of Approximation
85
Proof (of Proposition 4.8.2) We distinguish between several cases. 1
1
Case 1. Moderate interval: |𝑡| max{𝐿 𝑝𝑝−2 , 𝐿 𝑝3( 𝑝−2) } ≤ 1. In this case, the inequality (4.30) follows from (4.24). 1
1
Case 2. Large region: |𝑡| max{𝐿 𝑝𝑝−2 , 𝐿 𝑝3( 𝑝−2) } ≥ 1 with 1 ≤ |𝑡| ≤ 𝐿13 . In this case, we bound both 𝑓𝑛 (𝑡) and 𝑔 𝑝−1 (𝑡) in absolute value by appropriate quantities. First, 2 by Lemma 4.4.2, | 𝑓𝑛 (𝑡)| ≤ e−𝑡 /6 , which is valid for |𝑡| ≤ 𝐿13 . Let us now consider an estimate of the form 2 e−𝑡 /24 ≤ 𝐶 𝑝 𝐿 𝑝 . −
1
If 𝐿 𝑝 ≥ 1, this holds with 𝐶 𝑝 = 1. If 𝐿 𝑝 ≤ 1, then necessarily |𝑡| ≥ 𝑡 0 ≡ 𝐿 𝑝 3( 𝑝−2) , so, one may take 𝐶𝑝 =
n 1 − 2 o 1 2 1 exp − 𝐿 𝑝 3( 𝑝−2) = 𝑡 03( 𝑝−2) e− 24 𝑡0 , 𝐿𝑝 24
which is bounded by a 𝑝-dependent constant 𝑐 𝑝 . As a result, we arrive at the desired upper bound 2 2 2 | 𝑓𝑛 (𝑡)| ≤ e−𝑡 /24 · e−𝑡 /8 ≤ 𝑐 𝑝 𝐿 𝑝 e−𝑡 /8 . A similar bound also holds for the approximating function 𝑔 𝑝−1 (𝑡), by Lemma 4.8.3 whenever |𝑡| ≥ 1, implying (4.30). 1
1
Case 3. Consider the region: |𝑡| max{𝐿 𝑝𝑝−2 , 𝐿 𝑝3( 𝑝−2) } ≥ 1 with |𝑡| ≤ min{1, Necessarily 𝐿 𝑝 ≥ 1. Hence, by Proposition 4.7.4, ∫ ∞ |𝑥| 𝑝 |d𝜇 𝑝−1 (𝑥)| ≤ 𝑐 𝑝 𝐿 𝑝 ,
1 𝐿3 }.
−∞
implying that, for all 𝑡 ∈ R, we have a similar bound on the 𝑝-th derivative of the corrected normal characteristic function, namely ∫ ( 𝑝) ∞ 𝑝 𝑖𝑡 𝑥 𝑔 (𝑡) = 𝑥 e d𝜇 𝑝−1 (𝑥) ≤ 𝑐 𝑝 𝐿 𝑝 . 𝑝−1 −∞
Similarly, by the Rosenthal inequality, 𝑓𝑛( 𝑝) (𝑡) ≤ E |𝑆 𝑛 | 𝑝 ≤ 𝑐 𝑝 𝐿 𝑝 . Since also (𝑟) 𝑓𝑛(𝑟) (0) = 𝑔 𝑝−1 (0) for 𝑟 = 1, . . . , 𝑝 − 1 (Proposition 4.7.3), it follows from the Taylor integral formula that | 𝑓𝑛 (𝑡) − 𝑔 𝑝−1 (𝑡)| ≤ 2𝑐 𝑝 𝐿 𝑝 Proposition 4.8.2 is proved.
|𝑡| 𝑝 . 𝑝! □
Proof (of Proposition 4.8.1) As was mentioned, we may assume that 𝑝 ≥ 4. If 𝐿 𝑝 ≥ 1, there is nothing to prove. Indeed, applying Proposition 4.7.4 (the first inequality for the total variation norm without weight), we conclude that
86
4 Sums of Independent Random Variables
𝜌(𝐹𝑛 , Φ 𝑝−1 ) ≤ 𝜌(𝐹𝑛 , Φ) + ∥𝜇 𝑝−1 − 𝜇∥ TV 𝑝−3
≤ 1 + 𝑐 𝑝 max{𝐿 𝑝−1 , 1} ≤ 1 + 𝑐 𝑝 𝐿 𝑝𝑝−2 ≤ (1 + 𝑐 𝑝 )𝐿 𝑝 . 1
Thus, we may assume that 𝐿 𝑝 ≤ 1, in which case necessarily 𝐿 3 ≤ 𝐿 𝑝𝑝−2 ≤ 1, by Proposition 4.2.1. First let 𝛿 ≤ 𝐿 3 . We are in position to apply Proposition 3.2.1 with 𝐹 = 𝐹𝑛 and 𝐺 = Φ 𝑝−1 . Choosing 𝑇 = 1/𝛿, the inequality (3.5) yields 1/ 𝛿
∫ 𝜌(𝐹𝑛 , Φ 𝑝−1 ) ≤ 2 0
| 𝑓𝑛 (𝑡) − 𝑔 𝑝−1 (𝑡)| d𝑡 + 2𝜋 2 𝐴𝛿, 𝑡
(4.31)
where 𝐴 = sup 𝑥 |Φ′𝑝−1 (𝑥)| = sup 𝑥 |𝜑 𝑝−1 (𝑥)| is bounded by a 𝑝-dependent constant, by Proposition 4.7.4. The bound (4.30) may be used to estimate the integral over the interval 0 ≤ 𝑡 ≤ 1/𝐿 3 , which gives ∫ 0
1/𝐿3
| 𝑓𝑛 (𝑡) − 𝑔 𝑝−1 (𝑡)| d𝑡 ≤ 𝑐 𝑝 𝐿 𝑝 𝑡
∫
∞
min{𝑡 𝑝−1 , 1} e−𝑡
2 /8
d𝑡 = 𝑐 ′𝑝 𝐿 𝑝 .
0 𝑝−3
Next, by Proposition 4.5.3, and using 𝐿 𝑝−1 ≤ 𝐿 𝑝𝑝−2 ≤ 1, we have |𝑔 𝑝−1 (𝑡)| ≤ e−𝑡
2 /2
2 2 + ( 𝑝 − 1) max |𝑡| 3 , |𝑡| 3( 𝑝−3) e−𝑡 /2 ≤ 𝑐 𝑝 e−𝑡 /4 ,
which implies ∫
∞
1/𝐿3
|𝑔 𝑝−1 (𝑡)| 2 d𝑡 ≤ 2𝑐 𝑝 e−1/(4𝐿3 ) ≤ 𝑐 ′𝑝 𝐿 3𝑝−2 ≤ 𝑐 ′𝑝 𝐿 𝑝 𝑡
(where we used Proposition 4.2.1 in the last step). The two integral bounds give ∫ 0
1/ 𝛿
| 𝑓𝑛 (𝑡) − 𝑔 𝑝−1 (𝑡)| d𝑡 ≤ 𝑐 𝑝 𝐿 𝑝 + 𝑡
∫
1/ 𝛿
1/𝐿3
| 𝑓𝑛 (𝑡)| d𝑡. 𝑡
Hence, the right-hand side of (4.31) is bounded by the right-hand side of (4.29). In the case 𝐿 3 < 𝛿, the integral term in (4.29) is not present. Choosing 𝑇 = 1/𝐿 3 in Proposition 3.2.1, the bound (4.30) similarly yields 1/𝐿3
| 𝑓𝑛 (𝑡) − 𝑔 𝑝−1 (𝑡)| d𝑡 + 2𝜋 2 𝐴𝐿 3 𝑡 0 ≤ 𝑐 𝑝 (𝐿 𝑝 + 𝐿 3 ) ≤ 𝑐 𝑝 (𝐿 𝑝 + 𝛿). ∫
𝜌(𝐹𝑛 , Φ 𝑝−1 ) ≤ 2
This corresponds to (4.29) without the integral on the right-hand side. Proposition 4.8.1 is proved.
□
4.9 Remarks
87
4.9 Remarks Assuming that the random variable 𝑋 has mean zero and standard deviation 𝜎, the constant in the inequality (4.3) can be improved for a smaller interval 𝜎|𝑡| ≤ 51 . As was shown in [33], (log 𝑓 (𝑡)) ( 𝑝) ≤ ( 𝑝 − 1)! 𝛽 𝑝 ( 𝑝 ≥ 3), and thus |𝜀( 𝑝)| ≤ 𝑝1 𝛽 𝑝 . This extends the Bikjalis inequality |𝛾 𝑝 | ≤ ( 𝑝 − 1)! 𝛽 𝑝 from Proposition 4.1.3. The factorial growth of the constant is optimal, up to an exponentially growing factor as 𝑝 tends to infinity, which was emphasized by Bulinskii [72] in his study of upper bounds in a more general scheme of random vectors and associated mixed cumulants. To illustrate possible lower bounds, he considered the symmetric Bernoulli distribution assigning the mass 12 to the points ±1. In this case, the characteristic function is 𝑓 (𝑡) = cos 𝑡, and one may use the Taylor expansion log 𝑓 (𝑡) = log cos 𝑡 = −
∞ ∑︁ 22𝑠 (22𝑠 − 1)
(2𝑠)! 𝑠=1
involving Bernoulli numbers 𝐵 𝑝 = even integer values of 𝑝, |𝛾 𝑝 | =
2 (2 𝑝)! (2 𝜋) 2 𝑝
𝐵𝑠
𝑡 2𝑠 , 2𝑠
𝑑2 𝑝 , where 𝑑2 𝑝 =
|𝑡| < Í∞ 𝑛=1
𝜋 , 2
𝑛−2 𝑝 . Thus, for
2𝑝 2 (2 𝑝 − 1) 2 𝑝 (2 𝑝 − 1) 𝐵 𝑝/2 = ( 𝑝 − 1)! 𝑑 ∼ 2 ( 𝑝 − 1)! 𝑝 𝑝 𝜋𝑝 𝜋
To compare with the upper bound, note that in this Bernoulli case, 𝛽 𝑝 = 1 for all 𝑝. The study of the best value 𝐶 𝑝 in Rosenthal’s inequality (4.10) has a long history, and here we only mention some selected results. Define 𝐶 ∗𝑝 to be an optimal constant in (4.10), when additionally assuming that the distributions of 𝑋 𝑘 are symmetric around the origin. By Jensen’s inequality, there is a simple general relation 𝐶 ∗𝑝 ≤ 𝐶 𝑝 ≤ 2 𝑝−1 𝐶 ∗𝑝 , which reduces in essence the study of Rosenthal-type inequalities to the symmetric case. Johnson, Schechtman and Zinn [109] have derived the two-sided bounds 𝑝 7.35 𝑝 ≤ (𝐶 ∗𝑝 ) 1/ 𝑝 ≤ . √ log(max( 𝑝, e)) 2 e log(max( 𝑝, e)) 𝑝 Hence, asymptotically 𝐶 1/ is of order 𝑝/log 𝑝 for growing values of 𝑝. They 𝑝 have also obtained an upper bound with a better numerical factor, (𝐶 ∗𝑝 ) 1/ 𝑝 ≤ √︁ 𝑝/ log max( 𝑝, e), which implies a simple bound 𝐶 𝑝 ≤ (2𝑝) 𝑝 , 𝑝 > 2. As for the best constant in the symmetric case, it was shown by Ibragimov and Sharakhmetov [107] that 𝐶 ∗𝑝 = E |𝜉 − 𝜂| 𝑝 for 𝑝 > 4, where 𝜉 and 𝜂 are independent Poisson random variables with parameter 𝜆 = 12 (cf. also Pinelis [156] for a similar
88
4 Sums of Independent Random Variables
𝑝 description without the symmetry assumption). In particular, (𝐶 ∗𝑝 ) 1/ 𝑝 ∼ e log 𝑝 as 𝑝 tends to infinity. This result easily yields 𝐶 ∗𝑝 ≤ 𝑝! for 𝑝 = 3, 4, 5, . . . Standard sources about Edgeworth expansions for distributions of the sums of independent random variables are the books by Petrov [154], [155], and by Bhattacharya and Ranga Rao [16]. The survey [33] extends several results in this direction to non-integer values of the parameter 𝑝. In particular, the inequalities (4.24) and (4.30) are derived there for 𝑝 = 𝑚 + 𝛼 with integers 𝑚 ≥ 2 and 0 < 𝛼 ≤ 1, with 𝑔 𝑝−1 replaced by 𝑔𝑚 , and for the derivatives of order 𝑟 = 0, 1, . . . , 𝑚.
Part II
Selected Topics on Concentration
Chapter 5
Standard Analytic Conditions
Many interesting classes of probability distributions on R𝑛 may be described by means of various analytic, that is, integro-differential inequalities, usually called Sobolev-type inequalities. Keeping the setting of an abstract metric space, in this chapter we introduce the most frequently used ones known as Poincaré-type and (stronger) Cheeger-type inequalities. The second type appears as a particular case of more general Sobolev-type inequalities which can be used as an equivalent form for isoperimetric inequalities. At the end of the chapter, we briefly describe basic examples of probability distributions on Euclidean space satisfying these inequalities.
5.1 Moduli of Gradients in the Continuous Setting Analytic conditions imposed on a given Borel probability measure 𝜇 on R𝑛 are usually expressed in the form of Sobolev-type inequalities which connect the distribution of an arbitrary smooth function 𝑓 under 𝜇 with the distribution of its gradient ∇ 𝑓 , more often – with the distribution of its Euclidean length, or the modulus of the gradient 1/2 ∑︁ 𝑛 2 (5.1) , |∇ 𝑓 (𝑥)| = |𝜕𝑘 𝑓 (𝑥)| 𝑘=1 𝑓 ( 𝑥) where 𝜕𝑘 𝑓 (𝑥) = 𝜕𝜕𝑥 are partial derivatives of 𝑓 at the point 𝑥. For various reasons, 𝑘 it is convenient to extend such inequalities to larger classes of functions, such as the class of all locally Lipschitz 𝑓 . Moreover, to keep a higher level of generality and thus get more freedom in applications, it is also useful to have results that may be derived from Sobolev-type inequalities in the setting of abstract “continuous” metric spaces rather than on R𝑛 , only. Therefore, let us start with an arbitrary metric space (M, 𝑑) without isolated points – a property which serves well enough the idea of “continuity” of the space. A function 𝑓 on M is said to be Lipschitz on a given nonempty subset 𝐴 of M if, for some constant 𝐿 ≥ 0, we have
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_5
91
92
5 Standard Analytic Conditions
| 𝑓 (𝑥) − 𝑓 (𝑦)| ≤ 𝐿𝑑 (𝑥, 𝑦)
for all 𝑥, 𝑦 ∈ 𝐴.
The optimal value of 𝐿 is called the Lipschitz semi-norm of 𝑓 on 𝐴, and when 𝐴 = M, it will be denoted ∥ 𝑓 ∥ Lip . The function 𝑓 will be called locally Lipschitz if it has a finite Lipschitz semi-norm on every ball in M. When M is locally compact, and every ball is compact in M, the latter is equivalent to the property that any point in M has a neighborhood where 𝑓 has a finite Lipschitz semi-norm. For locally Lipschitz functions 𝑓 on M, the modulus of the gradient may be understood in the generalized sense as the finite function |∇ 𝑓 |(𝑥) = |∇ 𝑓 (𝑥)| = lim sup 𝑦→𝑥
| 𝑓 (𝑥) − 𝑓 (𝑦)| , 𝑑 (𝑥, 𝑦)
𝑥 ∈ M.
(5.2)
One may also involve the operators Δ 𝜀 𝑓 (𝑥) =
sup
| 𝑓 (𝑦) − 𝑓 (𝑥)|,
𝜀 > 0,
𝑑 ( 𝑥,𝑦) < 𝜀
which represent lower semi-continuous (hence Borel measurable) functions, as long as 𝑓 is continuous (as the supremum of a family of lower semi-continuous functions 𝑥 → | 𝑓 (𝑦) − 𝑓 (𝑥)| 1 {𝑑 ( 𝑥,𝑦) < 𝜀 } ). It then follows from (5.2) that, for all 𝑥 ∈ M, |∇ 𝑓 (𝑥)| = lim sup 𝜀→0
h i Δ 𝜀 𝑓 (𝑥) = lim sup 𝑛Δ1/𝑛 𝑓 (𝑥) , 𝜀 𝑛→∞
implying that the function |∇ 𝑓 | is Borel measurable. The definition (5.2) extends by the same formula to complex-valued functions 𝑓 = 𝑢 + 𝑖𝑣, 𝑢 = Re( 𝑓 ), 𝑣 = Im( 𝑓 ), and then we have the obvious relations |∇𝑢| ≤ |∇ 𝑓 |,
|∇𝑣| ≤ |∇ 𝑓 |,
|∇ 𝑓 | 2 ≤ |∇𝑢| 2 + |∇𝑣| 2 .
When 𝑓 is defined on an open set M in R𝑛 (with the Euclidean distance 𝑑) and is differentiable at the point 𝑥 ∈ M, this definition leads to the modulus of the gradient in the usual sense (5.1), and in the complex-valued case we have |∇ 𝑓 (𝑥)| 2 = |∇𝑢(𝑥)| 2 + |∇𝑣(𝑥)| 2 . But, if 𝑓 is not differentiable at 𝑥, it is safer to understand |∇ 𝑓 (𝑥)| according to (5.2). Note that when 𝑓 is locally Lipschitz on the open set M ⊂ R𝑛 , it must be differentiable at almost all points with respect to the Lebesgue measure (by a wellknown theorem of Rademacher). In dimension one, any locally Lipschitz function 𝑓 on M = (𝑎, 𝑏) is absolutely continuous (locally). Conversely, if 𝑓 is (locally) absolutely continuous on that interval, then (5.2) defines the modulus of the Radon– Nikodym derivative of 𝑓 . Another important particular space which will be used intensively is the unit sphere M = S𝑛−1 , where (5.2) defines the length of the spherical gradient of 𝑓 at 𝑥. In some problems/Sobolev-type inequalities, it makes sense to slightly modify the notion of the generalized modulus of gradient. For example, on product spaces M = M1 × · · · × M𝑛 with Euclidean-type metrics
5.1 Moduli of Gradients in the Continuous Setting
𝑑 (𝑥, 𝑦) =
𝑛 ∑︁
93
𝑑 𝑘 (𝑥 𝑘 , 𝑦 𝑘 ) 2
1/2 ,
𝑘=1
where 𝑥 = (𝑥1 , . . . , 𝑥 𝑛 ), 𝑦 = (𝑦 1 , . . . , 𝑦 𝑛 ), 𝑥 𝑘 , 𝑦 𝑘 ∈ M 𝑘 , and where 𝑑 𝑘 is a metric in M 𝑘 , a natural alternative is given by ∇ 𝑓 (𝑥)| = |e
𝑛 ∑︁
|∇ 𝑥𝑘 𝑓 (𝑥)| 2
1/2 .
(5.3)
𝑘=1
Here |∇ 𝑥𝑘 𝑓 (𝑥)| is defined on (M 𝑘 .𝑑 𝑘 ) according to (5.2) for the function 𝑥 𝑘 → 𝑓 (𝑥) with fixed 𝑥𝑖 , 𝑖 ≠ 𝑘. If 𝑓 is locally Lipschitz on M = R𝑛 , we have again | ∇˜ 𝑓 (𝑥)| = |∇ 𝑓 (𝑥)| for all points 𝑥 ∈ R𝑛 where 𝑓 is differentiable (hence, for almost all 𝑥). The definition (5.2) saves many relations from calculus or, at least, it transforms them into corresponding inequalities. For example, for any family of functions ( 𝑓𝑡 )𝑡 ∈𝑇 on M, ∇ sup 𝑓𝑡 (𝑥) ≤ sup |∇ 𝑓𝑡 (𝑥)|. 𝑡 ∈𝑇
𝑡 ∈𝑇
The chain rule formula turns into |∇𝑇 ( 𝑓 )| ≤ |𝑇 ′ ( 𝑓 )| |∇ 𝑓 (𝑥)|, where |𝑇 ′ | may also be understood according to (5.2) on the real line R. It will be called the chain rule inequality. An application of the operator 𝑓 → |∇ 𝑓 | to the function |∇ 𝑓 | leads to the second order modulus of the gradient |∇2 𝑓 (𝑥)| = |∇ |∇ 𝑓 (𝑥)| | = lim sup 𝑦→𝑥
| |∇ 𝑓 (𝑥)| − |∇ 𝑓 (𝑦)| | . 𝑑 (𝑥, 𝑦)
(5.4)
Note that the Lipschitz property ∥ 𝑓 ∥ Lip ≤ 1 implies that |∇ 𝑓 (𝑥)| ≤ 1 for all 𝑥 ∈ M, and the converse is also often true, including the case where M is an open convex subset of R𝑛 (with the induced Euclidean distance). Hence, in this case, the requirement |∇2 𝑓 (𝑥)| ≤ 1 for every 𝑥 ∈ M means that the function |∇ 𝑓 | is Lipschitz. If |∇ 𝑓 | is locally Lipschitz, then 𝑓 is of course locally Lipschitz as well. If 𝑓 is 𝐶 2 -smooth on an open subset M of R𝑛 , we denote by 2 𝑛 𝜕 𝑓 (𝑥) 𝑓 ′′ (𝑥) = 𝜕𝑥𝑖 𝜕𝑥 𝑗 𝑖, 𝑗=1 the 𝑛 × 𝑛 matrix of second order partial derivatives of 𝑓 at 𝑥 (the Hessian of 𝑓 ). In this case, the function |∇ 𝑓 | will be locally Lipschitz, and as is easy to check, |∇2 𝑓 (𝑥)| = |∇ 𝑓 (𝑥)| −1 | 𝑓 ′′ (𝑥)∇ 𝑓 (𝑥)| for all points at which |∇ 𝑓 (𝑥)| > 0. Here the right-hand side should be understood as the operator norm ∥ 𝑓 ′′ (𝑥) ∥ in the case |∇ 𝑓 (𝑥)| = 0. In particular, in all cases
94
5 Standard Analytic Conditions
|∇2 𝑓 (𝑥)| ≤ ∥ 𝑓 ′′ (𝑥) ∥
for all 𝑥 ∈ M.
For example, for the quadratic function 𝑓 (𝑥) =
1 2
2 𝑘=1 𝜆 𝑘 𝑥 𝑘 ,
Í𝑛
𝑥 = (𝑥1 , . . . , 𝑥 𝑛 ),
√︃Í 𝑛 2
4 2 𝑘=1 𝜆 𝑘 𝑥 𝑘
|∇ 𝑓 (𝑥)| = √︃ Í𝑛
2 2 𝑘=1 𝜆 𝑘 𝑥 𝑘
≤ max |𝜆 𝑘 |. 𝑘
5.2 Perimeter and Co-area Inequality Let 𝜇 be a Borel probability measure on the metric space (𝑀, 𝑑). Integrals involving moduli of gradients are closely related to geometric characteristics such as the perimeter of sets (via the co-area formula). Given a non-empty set 𝐴 in M and 𝜀 > 0, denote by 𝐴 𝜀 = {𝑥 ∈ M : 𝑑 (𝑥, 𝑦) < 𝜀 for some 𝑦 ∈ 𝐴} an 𝜀-neighborhood of 𝐴. Being a union of open balls in M, it is an open set as well. The (outer) 𝜇-perimeter of 𝐴 is defined by 𝜇+ ( 𝐴) = lim inf 𝜀→0
𝜇( 𝐴 𝜀 ) − 𝜇( 𝐴) . 𝜀
In the abstract metric probability space setting the co-area formula takes the form of the following inequality. Proposition 5.2.1 For any Lipschitz function 𝑓 on M, ∫ ∫ ∞ |∇ 𝑓 | d𝜇 ≥ 𝜇+ {| 𝑓 | > 𝑡} d𝑡.
(5.5)
0
Proof Since |∇| 𝑓 | | ≤ |∇ 𝑓 |, we may assume that 𝑓 ≥ 0. First, let 𝑓 be bounded and satisfy | 𝑓 (𝑥) − 𝑓 (𝑦)| ≤ 𝐿𝑑 (𝑥, 𝑦) for all 𝑥, 𝑦 ∈ M with some constant 𝐿. Put 𝐴(𝑡) = {𝑥 ∈ M : 𝑓 (𝑥) > 𝑡}, 𝑡 ∈ R, and 𝑓 𝜀 (𝑥) = sup{ 𝑓 (𝑦) : 𝑑 (𝑥, 𝑦) < 𝜀},
𝜀 > 0.
Clearly, {𝑥 ∈ M : 𝑓 𝜀 (𝑥) > 𝑡} = 𝐴(𝑡) 𝜀 , so, these sets are open, which implies that the function 𝑓 𝜀 is lower semi-continuous. We can also write ∫ ∫ ∞ 𝑓 𝜀 d𝜇 = 𝜇( 𝐴(𝑡) 𝜀 ) d𝑡, 0
and with a similar representation for 𝑓 we get ∫ ∫ ∞ 𝑓 𝜀 (𝑥) − 𝑓 (𝑥) 𝜇( 𝐴(𝑡) 𝜀 ) − 𝜇( 𝐴(𝑡)) d𝜇(𝑥) = d𝑡. 𝜀 𝜀 0
5.2 Perimeter and Co-area Inequality
95
By the Lipschitz condition, 0 ≤ 𝑓 𝜀 (𝑥) − 𝑓 (𝑥) ≤ 𝐿𝜀 for all 𝑥 ∈ M, so, the left integrand is bounded by 𝐿. Therefore, using the Lebesgue dominated convergence theorem and Fatou’s lemma, and noting that lim sup 𝜀→0
we have ∫
𝑓 𝜀 (𝑥) − 𝑓 (𝑥) 𝑓 (𝑦) − 𝑓 (𝑥) = lim sup ≤ |∇ 𝑓 (𝑥)|, 𝜀 𝜀 𝑦→𝑥
∫
𝑓𝜀 − 𝑓 d𝜇 lim sup 𝜀 𝜀→0 ∫ ∫ 𝑓𝜀 − 𝑓 𝑓𝜀 − 𝑓 ≥ lim sup d𝜇 ≥ lim inf d𝜇 𝜀→0 𝜀 𝜀 𝜀→0 ∫ ∞ 𝜇( 𝐴(𝑡) 𝜀 ) − 𝜇( 𝐴(𝑡)) = lim inf d𝑡 𝜀→0 𝜀 0 ∫ ∞ ∫ ∞ 𝜇( 𝐴(𝑡) 𝜀 ) − 𝜇( 𝐴(𝑡)) ≥ lim inf d𝑡 = 𝜇+ ( 𝐴(𝑡)) d𝑡. 𝜀→0 𝜀 0 0
|∇ 𝑓 | d𝜇 ≥
Hence, we obtain (5.5). To remove the boundedness assumption, one may use a truncation argument. Define 𝑓𝑟 (𝑥) = max{ 𝑓 (𝑥), 𝑟}, where 𝑟 tends to infinity along the values such that 𝜇{| 𝑓 | = 𝑟 } = 0. Then 𝑓 has a Lipschitz constant at most 𝐿, with |∇ 𝑓𝑟 | = |∇ 𝑓 | on the set { 𝑓 < 𝑟} and |∇ 𝑓𝑟 | = 0 on { 𝑓 > 𝑟}. Hence, by the previous step applied to 𝑓𝑟 , ∫ ∫ 𝑟 |∇ 𝑓 | d𝜇 ≥ 𝜇+ { 𝑓 > 𝑡} d𝑡. { 𝑓 𝜇( 𝐴), then, by the definition of the perimeter, 𝜇+ ( 𝐴) = ∞, so there is nothing to prove. Suppose that 𝜇(clos( 𝐴)) = 𝜇( 𝐴). Since clos( 𝐴 𝜀 ) ⊂ 𝐴 𝜀′ for 𝜀 < 𝜀 ′, one has 𝜇+ ( 𝐴) = lim inf 𝜀→0
𝜇(clos( 𝐴 𝜀 )) − 𝜇( 𝐴) . 𝜀
96
5 Standard Analytic Conditions
Hence there exists a sequence 𝜀 𝑛 ↓ 0 such that 𝜇(clos( 𝐴 𝜀𝑛 )) − 𝜇( 𝐴) → 𝜇+ ( 𝐴) 𝜀𝑛
as 𝑛 → ∞.
Now, take a sequence 𝑐 𝑛 ∈ (0, 1), 𝑐 𝑛 → 0, and let o n 𝑑(𝐴 𝑐𝑛 𝜀𝑛 , 𝑥) , 𝑓𝑛 (𝑥) = 1 − min 1, (1 − 𝑐 𝑛 ) 𝜀 𝑛 where 𝑑 (𝐵, 𝑥) = inf{𝑑 (𝑦, 𝑥) : 𝑦 ∈ 𝐵} denotes the distance from a set 𝐵 in M to a point 𝑥 ∈ M. Any such function 𝑑 (𝐵, 𝑥) has Lipschitz semi-norm at most 1, hence |∇ 𝑓𝑛 (𝑥)| ≤ ∥ 𝑓𝑛 ∥ Lip ≤
1 , (1 − 𝑐 𝑛 ) 𝜀 𝑛
𝑥 ∈ M.
Necessarily 𝑑 ( 𝐴𝑐𝑛 𝜀𝑛 , 𝑥) ≥ (1 − 𝑐 𝑛 ) 𝜀 𝑛 for 𝑥 ∉ 𝐴 𝜀𝑛 . Indeed, in the other case, we have 𝑑 (𝑦, 𝑥) < (1 − 𝑐 𝑛 ) 𝜀 𝑛 for some 𝑦 ∈ 𝐴𝑐𝑛 𝜀𝑛 which means that 𝑑 (𝑎, 𝑦) < 𝑐 𝑛 𝜀 𝑛 for some 𝑎 ∈ 𝐴. Hence, by the triangle inequality, 𝑑 (𝑎, 𝑥) ≤ 𝑑 (𝑎, 𝑦) + 𝑑 (𝑦, 𝑥) < 𝜀 𝑛 , implying 𝑥 ∈ 𝐴 𝜀𝑛 . Thus, 𝑓𝑛 (𝑥) = 0 for 𝑥 ∉ 𝐴 𝜀𝑛 and therefore |∇ 𝑓𝑛 (𝑥)| = 0 on the open set M \ clos( 𝐴 𝜀𝑛 ). On the other hand, 𝑓𝑛 (𝑥) = 1 on the open set 𝐴𝑐𝑛 𝜀𝑛 , where |∇ 𝑓𝑛 (𝑥)| = 0 as well. As a result, 1 𝐴𝑐𝑛 𝜀𝑛 ≤ 𝑓𝑛 ≤ 1 𝐴 𝜀𝑛 so that lim 𝑓𝑛 (𝑥) = lim 1 𝐴 𝜀 (𝑥) = 1clos( 𝐴) (𝑥)
𝑛→∞
𝜀→0
for all 𝑥 ∈ M. On the other hand, ∫ ∫ 𝜇(clos( 𝐴 𝜀𝑛 )) − 𝜇( 𝐴) → 𝜇+ ( 𝐴). |∇ 𝑓𝑛 | d𝜇 = |∇ 𝑓𝑛 | d𝜇 ≤ (1 − 𝑐 ) 𝜀 𝑛 𝑛 clos( 𝐴 𝜀𝑛 )\𝐴𝑐𝑛 𝜀𝑛 Proposition 5.2.2 is proved.
□
5.3 Poincaré-type Inequalities Analytic conditions imposed on a measure are often stated in the form of Poincarétype inequalities. Let us start with a standard variant of such integro-differential relations, assuming that (M, 𝑑) is an arbitrary metric space (without isolated points). Definition 5.3.1 A Borel probability measure 𝜇 on M is said to satisfy a Poincarétype inequality with constant 𝜆1 > 0 if, for any bounded, Lipschitz function 𝑓 on M, ∫ 𝜆1 Var 𝜇 ( 𝑓 ) ≤ |∇ 𝑓 | 2 d𝜇. (5.6) The best value 𝜆1 = 𝜆1 (𝜇) = 𝜆1 (M, 𝑑, 𝜇) in (5.6) is called the Poincaré constant. If 𝑋 is a random element in M with distribution 𝜇, (5.6) may be rewritten in the form
5.3 Poincaré-type Inequalities
97
𝜆1 Var( 𝑓 (𝑋)) ≤ E |∇ 𝑓 (𝑋)| 2 , and then we also write 𝜆1 = 𝜆1 (𝑋). Before giving any examples and comments about properties of such measures, first let us make sure that (5.6) may be extended to a larger class of functions, or in the case of Euclidean space – may be reduced to the class of smooth functions in terms of the usual gradient. Proposition 5.3.2 If the inequality (5.6) holds true for any Lipschitz function 𝑓 supported on some ball in M, then it may be extended to the class of all locally Lipschitz functions 𝑓 on M. More precisely, if 𝑓 is locally Lipschitz, while |∇ 𝑓 | belongs to 𝐿 2 (𝜇), then 𝑓 is in 𝐿 2 (𝜇) as well, and the inequality (5.6) is true with the same constant 𝜆1 . Recall that the property of being locally Lipschitz is understood as having finite Lipschitz semi-norms on every ball in M (which is consistent with the traditional understanding, when M is locally compact, with balls being compact). Also, if 𝑓 is Lipschitz and is supported on some ball in M, then it has to be bounded on M. Note that, in the current setting of an abstract metric space, the inequality (5.6) may be extended to the class of all locally Lipschitz complex-valued functions 𝑓 = 𝑢 + 𝑖𝑣 on M in a weaker form as the relation ∫ ∫ | 𝑓 | 2 d𝜇 ≤ 2 |∇ 𝑓 | 2 d𝜇, 𝜆1 with the assumption
∫
𝑓 d𝜇 = 0 (by applying (5.6) to 𝑢 and 𝑣).
Proof (of Proposition 5.3.2) Assuming that 𝑓 is Lipschitz on every ball and is bounded so that | 𝑓 (𝑥)| ≤ 𝐴, 𝑥 ∈ M, for some constant 𝐴, consider the functions 1 𝑓𝑟 (𝑥) = 𝑓 (𝑥) 1 − 𝑑 (𝑥, 𝑥 0 ) + 𝑟 with a fixed point 𝑥0 ∈ M. They are supported on the closed balls 𝐵(𝑥0 , 𝑟) with center at 𝑥0 and radius 𝑟 > 0, and moreover, |∇ 𝑓𝑟 (𝑥)| ≤ (|∇ 𝑓 (𝑥)| + 𝑟1 𝐴) 1 𝐵( 𝑥0 ,𝑟) . Hence, an application of (5.6) to 𝑓𝑟 leads to the same inequality for 𝑓 as 𝑟 → ∞. Indeed, since 𝑓𝑟 (𝑥) → 𝑓 (𝑥) for all 𝑥 ∈ M, we may apply Fatou’s lemma, which gives ∫∫ 𝜆1 𝜆1 Var 𝜇 ( 𝑓 ) = ( 𝑓 (𝑥) − 𝑓 (𝑦)) 2 d𝜇(𝑥) d𝜇(𝑦) 2 ∫∫ ∫ 𝜆1 2 lim inf ≤ ( 𝑓𝑟 (𝑥) − 𝑓𝑟 (𝑦)) d𝜇(𝑥) d𝜇(𝑦) ≤ lim inf |∇ 𝑓𝑟 | 2 d𝜇. 𝑟→∞ 2 𝑟→∞ On the other hand, assuming that |∇ 𝑓 | ∈ 𝐿 2 (𝜇), the last integral does not exceed ∫ ∫ 2𝐴 𝐴2 2 |∇ 𝑓 | + 2 d𝜇 → |∇ 𝑓 | 2 d𝜇 (𝑟 → ∞). |∇ 𝑓 | + 𝑟 𝑟
98
5 Standard Analytic Conditions
To remove the boundedness assumption, assume that 𝑓 define 𝑓 (𝑥), if n o 𝑓𝑟 (𝑥) = max − 𝑟, min{ 𝑓 , 𝑟} = 𝑟, if if −𝑟,
is locally Lipschitz and | 𝑓 (𝑥)| < 𝑟, 𝑓 (𝑥) ≥ 𝑟, 𝑓 (𝑥) ≤ −𝑟,
where 𝑟 will tend to infinity along the values such that 𝜇{| 𝑓 | = 𝑟 } = 0. In that case, 𝑓𝑟 is locally Lipschitz for any 𝑟 > 0, while |∇ 𝑓𝑟 (𝑥)| = |∇ 𝑓 (𝑥)| 1𝑈𝑟 (𝑥) for 𝜇-almost all 𝑥, where 𝑈𝑟 = {𝑥 : | 𝑓 (𝑥)| < 𝑟} (which is an open set). Hence, an application of (5.6) to 𝑓𝑟 gives ∫ ∫ ∫ ∫ 𝜆1 𝜆1 2 | 𝑓 (𝑥) − 𝑓 (𝑦)| d𝜇(𝑥) d𝜇(𝑦) ≤ | 𝑓𝑟 (𝑥) − 𝑓𝑟 (𝑦)| 2 d𝜇(𝑥) d𝜇(𝑦) 2 𝑈𝑟 𝑈𝑟 2 M M ∫ ∫ 2 = 𝜆1 Var 𝜇 ( 𝑓𝑟 ) ≤ |∇ 𝑓𝑟 | d𝜇 = |∇ 𝑓 | 2 d𝜇. M
𝑈𝑟
Sending 𝑟 → ∞ completes the proof of the proposition.
□
5.4 The Euclidean Setting In the case of the Euclidean space M = R𝑛 , under certain regularity assumptions imposed on the density of the measure 𝜇, the number 𝜆1 is described as the smallest positive eigenvalue of the Sturm–Liouville operator associated to the density. Hence, 𝜆1 is also called the spectral gap. The functional 𝜆1 = 𝜆1 (𝑋), where the random vector 𝑋 is distributed according to 𝜇, is translation invariant, homogeneous of order −2, and is invariant under all orthogonal transformations of the space, i.e., 𝜆1 (𝑎 + 𝑏𝑇 (𝑋)) = 𝑏 −2 𝜆1 (𝑋)
(𝑎 ∈ R𝑛 , 𝑏 ≠ 0, 𝑇 ∈ O (𝑛)).
Many important probability distributions on R𝑛 are known to satisfy Poincarétype inequalities, although the problem of bounding 𝜆1 from below is often nontrivial. Note that an application of (5.6) to linear functionals leads to the simple upper bound 1 𝜆1 (𝑋) ≤ inf . | 𝜃 |=1 Var(⟨𝑋, 𝜃⟩) The aim of this section is to complement Proposition 5.3.2 with the following.
5.4 The Euclidean Setting
99
Proposition 5.4.1 Suppose that a Borel probability measure 𝜇 is absolutely continuous on R𝑛 . If the inequality (5.6) is fulfilled in the class of all 𝐶 ∞ -smooth, compactly supported functions 𝑓 : R𝑛 → R, then it continues to hold for all locally Lipschitz 𝑓 on R𝑛 . Proof By Proposition 5.3.2, we may assume that 𝑓 is compactly supported and has a finite Lipschitz semi-norm 𝐿 = ∥ 𝑓 ∥ Lip . By Rademacher’s theorem, 𝑓 is differentiable at almost all points 𝑥 ∈ R𝑛 , and for every such point, the generalized modulus of the gradient |∇ 𝑓 (𝑥)| represents the Euclidean length of the usual gradient ∇ 𝑓 (𝑥). The same is true of the modified modulus of the gradient ∇ 𝑓 (𝑥)| = |e
lim sup 𝑦1 ,𝑦2 →𝑥, 𝑦1 ≠𝑦2
| 𝑓 (𝑦 1 ) − 𝑓 (𝑦 2 )| . |𝑦 1 − 𝑦 2 |
(5.7)
By the definition, if | e ∇ 𝑓 (𝑥)| < 𝑎, then, for some 𝜀 > 0, sup
n | 𝑓 (𝑦 ) − 𝑓 (𝑦 )| o 1 2 : |𝑦 1 − 𝑥| < 𝜀, |𝑦 2 − 𝑥| < 𝜀, 𝑦 1 ≠ 𝑦 2 < 𝑎. |𝑦 1 − 𝑦 2 |
Hence | e ∇ 𝑓 (𝑥 ′)| < 𝑎 whenever |𝑥 ′ − 𝑥| < 𝜀. This implies lim sup 𝑥′ →𝑥 | e ∇ 𝑓 (𝑥 ′)| ≤ |e ∇ 𝑓 (𝑥)|, which means that the function | e ∇ 𝑓 | is upper semi-continuous. Next, we employ a smoothing argument: Given a non-negative, 𝐶 ∞ -smooth func∫ tion 𝑤 on R𝑛 supported on the Euclidean ball 𝐵(0, 𝑟) and such that 𝑤 d𝑥 = 1, consider the convolutions ∫ 𝑓 𝛿 (𝑥) = 𝑓 (𝑥 − 𝛿𝑧)𝑤(𝑧) d𝑧 ∫ 1 = 𝛿−𝑛 𝑤 𝑥 − 𝑧 𝑓 (𝑧) d𝑧 (𝑥 ∈ R𝑛 , 𝛿 > 0). 𝛿 Every such function 𝑓 𝛿 is compactly supported (since 𝑓 has a compact support) and is 𝐶 ∞ -smooth, so that (5.6) is fulfilled: ∫ 𝜆1 Var 𝜇 ( 𝑓 𝛿 ) ≤ |∇ 𝑓 𝛿 | 2 d𝜇. (5.8) By the Lipschitz assumption, | 𝑓 𝛿 (𝑥) − 𝑓 (𝑥)| ≤ 𝐶𝛿 for some constant 𝐶 independent of 𝑥, so, the variances Var 𝜇 ( 𝑓 𝛿 ) tend to Var 𝜇 ( 𝑓 ) as 𝛿 → 0. To handle the right integrals in (5.8), recall the operators Δ 𝜀 𝑓 (𝑥) =
sup | 𝑓 (𝑦) − 𝑓 (𝑥)|
(𝑥 ∈ R𝑛 , 𝜀 > 0).
|𝑦−𝑥 |< 𝜀
As was stressed, Δ 𝜀 𝑓 is lower semi-continuous. Since | 𝑓 (𝑦 − 𝛿𝑧) − 𝑓 (𝑥 − 𝛿𝑧)| ≤ Δ 𝜀 𝑓 (𝑥 − 𝛿𝑧) for |𝑦 − 𝑥| ≤ 𝜀, we get ∫ ∫ | 𝑓 𝛿 (𝑦) − 𝑓 𝛿 (𝑥)| ≤ | 𝑓 (𝑦 − 𝛿𝑧) − 𝑓 (𝑥 − 𝛿𝑧)| 𝑤(𝑧) d𝑧 ≤ Δ 𝜀 𝑓 (𝑥 − 𝛿𝑧) 𝑤(𝑧) d𝑧
100
5 Standard Analytic Conditions
and therefore
Using
1 1 Δ 𝜀 𝑓 𝛿 (𝑥) ≤ 𝜀 𝜀
∫ Δ 𝜀 𝑓 (𝑥 − 𝛿𝑧) 𝑤(𝑧) d𝑧.
1 𝜀
Δ 𝜀 𝑓 ≤ 𝐿 and applying the dominated convergence theorem, we obtain that h1 ∫ i Δ 𝜀 𝑓 𝛿 (𝑥) ≤ lim sup Δ 𝜀 𝑓 (𝑥 − 𝛿𝑧) 𝑤(𝑧) d𝑧 |∇ 𝑓 𝛿 (𝑥)| = lim sup 𝜀 𝜀 𝜀→0 𝜀→0 ∫ h1 i Δ 𝜀 𝑓 (𝑥 − 𝛿𝑧) 𝑤(𝑧) d𝑧 ≤ lim sup 𝜀 𝜀→0 ∫ ∫ = |∇ 𝑓 (𝑥 − 𝛿𝑧)| 𝑤(𝑧) d𝑧 = |e ∇ 𝑓 (𝑥 − 𝛿𝑧)| 𝑤(𝑧) d𝑧.
The last two integrands coincide for almost all 𝑧, by the Lipschitz property of 𝑓 . One may now use the upper semi-continuity of | e ∇ 𝑓 | and apply the Lebesgue dominated convergence theorem once more. This gives ∫ lim sup |∇ 𝑓 𝛿 (𝑥)| ≤ lim sup |e ∇ 𝑓 (𝑥 − 𝛿𝑧)| 𝑤(𝑧) d𝑧 𝛿→0+ 𝛿→0+ ∫ ≤ lim sup | e ∇ 𝑓 (𝑥 − 𝛿𝑧)| 𝑤(𝑧) d𝑧 = | e ∇ 𝑓 (𝑥)|. 𝛿→0+
As a result, from (5.8) we derive the desired inequality ∫ ∫ 2 e 𝜆1 Var 𝜇 ( 𝑓 ) ≤ | ∇ 𝑓 | d𝜇 = |∇ 𝑓 | 2 d𝜇. Proposition 5.4.1 is proved.
□
Without the assumption that 𝜇 is absolutely continuous, we have thus derived a slightly weaker variant of (5.6), namely, ∫ 𝜆1 Var 𝜇 ( 𝑓 ) ≤ |e ∇ 𝑓 | 2 d𝜇 (5.9) with a modified modulus of gradient defined in (5.7). For various applications, one would like to apply (5.6) to functions of the form 𝑇 ( 𝑓 ) with 𝑓 being Lipschitz. Starting with the relation (5.9), note first that, according to the definition (5.7), we still have a chain rule inequality |e ∇𝑇 ( 𝑓 )| ≤ |( e ∇𝑇) ( 𝑓 )| | e ∇ 𝑓 |, where | e ∇𝑇 | is defined in (5.7) on the real line. Since also | e ∇ 𝑓 (𝑥)| ≤ ∥ 𝑓 ∥ Lip for all 𝑥 ∈ R𝑛 , we arrive at the following distributional inequality on the basis of (5.9).
5.5 Isoperimetry and Cheeger-type Inequalities
101
Proposition 5.4.2 Suppose that a Borel probability measure 𝜇 on R𝑛 admits a Poincaré-type inequality (5.6) in the class of all 𝐶 ∞ -smooth, compactly supported functions on R𝑛 . Then, for any function 𝑓 on R𝑛 with ∥ 𝑓 ∥ Lip ≤ 1 and any locally Lipschitz function 𝑇 on R, ∫ (5.10) |( e 𝜆1 Var 𝜇 (𝑇 ( 𝑓 )) ≤ ∇𝑇) ( 𝑓 )| 2 d𝜇. Here, | e ∇𝑇 | = |𝑇 ′ | as long as 𝑇 has a derivative 𝑇 ′. Remark 5.4.3 Under the assumption of Proposition 5.4.1, the inequality (5.6) may be extended to the class of all complex-valued, locally Lipschitz functions 𝑓 = 𝑢 + 𝑖𝑣 on R𝑛 . Indeed, 𝑓 is differentiable a.e., so that |∇ 𝑓 (𝑥)| 2 = |∇𝑢(𝑥)| 2 + |∇𝑣(𝑥)| 2 for almost all points 𝑥 ∈ R𝑛 . If M is an open subset of R𝑛 with the Euclidean metric, and 𝜇 is absolutely continuous, such a generalization should be modified as follows: If the inequality (5.6) is fulfilled in the class of all 𝐶 ∞ -smooth, Lipschitz functions 𝑓 : M → R, then it continues to hold for all locally Lipschitz 𝑓 : M → C. The same remark applies to many other spaces including the unit sphere M = S𝑛−1 , which we equip with the normalized spherical Lebesgue measure 𝜇 = 𝔰𝑛−1 .
5.5 Isoperimetry and Cheeger-type Inequalities Let us return to the setting of an abstract metric space (M, 𝑑) with a Borel probability measure 𝜇. In some problems it is desirable to use an 𝐿 1 -version of the Poincaré-type inequality such as ∫ ∫ ℎ
| 𝑓 − 𝑚| d𝜇 ≤
|∇ 𝑓 | d𝜇.
(5.11)
It is required to hold with some constant ℎ > 0 for all bounded, Lipschitz functions 𝑓 on M (and then it will hold for all locally Lipschitz 𝑓 like in Proposition 5.3.2). Here 𝑚 = 𝑚( 𝑓 ) denotes a median of 𝑓 under 𝜇, that is, a real number 𝑚 such that 𝜇{ 𝑓 ≤ 𝑚} ≥
1 , 2
𝜇{ 𝑓 ≥ 𝑚} ≥
1 . 2
Note that the first integral in (5.11) does not depend on the choice of the median. The inequality (5.11) is related to problems of an isoperimetric flavor. As in Section 5.2, denote by 𝐴 𝜀 an open 𝜀-neighborhood of a set 𝐴 in M. The isoperimetric problem in (M, 𝑑, 𝜇) amounts finding lower bounds for the quantity 𝜇( 𝐴 𝜀 ) uniformly over all sets 𝐴 ⊂ M of measure 𝜇( 𝐴) ≥ 𝑝 for fixed 0 < 𝑝 < 1 and 𝜀 > 0. As another variant, this problem is stated in terms of the 𝜇-perimeter as the relation 𝜇+ ( 𝐴) ≥ 𝐼 (𝜇( 𝐴)) in the class of all Borel subsets 𝐴 in M. An optimal function 𝐼 = 𝐼 𝜇 in such a relation is called an isoperimetric profile or an isoperimetric function of 𝜇.
102
5 Standard Analytic Conditions
Definition 5.5.1 The quantity ℎ(𝜇) = inf
𝜇+ ( 𝐴) , min{𝜇( 𝐴), 1 − 𝜇( 𝐴)}
where the infimum runs over all Borel sets 𝐴 in M of measure 0 < 𝜇( 𝐴) < 1, is called the Cheeger isoperimetric constant for the metric probability space (M, 𝑑, 𝜇). That is, ℎ = ℎ(𝜇) is an optimal value in the inequality 𝐼 𝜇 ( 𝑝) ≥ ℎ min{𝑝, 1 − 𝑝} with arbitrary 0 < 𝑝 < 1. Proposition 5.5.2 The Cheeger isoperimetric constant is an optimal non-negative value ℎ for which the 𝐿 1 -Poincaré-type inequality (5.11) holds true. Proof In one direction, in order to derive an isoperimetric-type inequality 𝜇+ ( 𝐴) ≥ ℎ min{𝜇( 𝐴), 1 − 𝜇( 𝐴)}
(5.12)
on the basis of (5.11), we may assume that 𝜇(clos( 𝐴)) = 𝜇( 𝐴) (since otherwise 𝜇+ ( 𝐴) = ∞). We apply Proposition 5.2.2 and take a sequence of Lipschitz functions 𝑓𝑛 : M → [0, 1] such that 𝑓𝑛 → 1clos( 𝐴) pointwise and ∫ lim sup |∇ 𝑓𝑛 | d𝜇 ≤ 𝜇+ ( 𝐴). 𝑛→∞
∫ By (5.11), the integral on the left is greater than or equal to ℎ | 𝑓𝑛 − 𝑚 𝑛 | d𝜇, where 𝑚 𝑛 ∈ [0, 1] are medians for 𝑓𝑛 . If 𝑝 = 𝜇( 𝐴) < 1/2, then necessarily 𝑚 𝑛 → 0. Indeed, suppose that 𝑚 𝑛 ≥ 𝑚 > 0 for infinitely many values of 𝑛. By the definition 1 1 of medians, the Ñ sets 𝐴Ð 𝑛 = { 𝑓 𝑛 ≥ 𝑚 𝑛 } have 𝜇-measures at least 2 . Hence, 𝜇(𝐵) ≥ 2 for the set 𝐵 = 𝑛≥1 𝑘 ≥𝑛 𝐴 𝑘 . But, for any point 𝑥 ∈ 𝐵, we have 𝑓𝑛 (𝑥) ≥ 𝑚 𝑛 for infinitely many values of 𝑛 and hence 1clos( 𝐴) (𝑥) ≥ 𝑚, that is, 𝑥 ∈ clos( 𝐴). This means that 𝐵 ⊂ clos( 𝐴), which would imply that 𝜇( 𝐴) ≥ 12 . Thus, by the Lebesgue dominated convergence theorem, ∫ ∫ lim | 𝑓𝑛 − 𝑚 𝑛 | d𝜇 = lim 𝑓𝑛 d𝜇 = 𝜇(clos( 𝐴)) = 𝜇( 𝐴), 𝑛→∞
𝑛→∞
thus proving (5.12). Similarly, if 𝑝 > 1/2, then necessarily 𝑚 𝑛 → 1, so that ∫ ∫ lim | 𝑓𝑛 − 𝑚 𝑛 | d𝜇 = lim (1 − 𝑓𝑛 ) d𝜇 = 1 − 𝜇(clos( 𝐴)) = 1 − 𝜇( 𝐴), 𝑛→∞
𝑛→∞
which also proves (5.12). Finally, in the case 𝑝 = 1/2, take a convergent subsequence 𝑚 𝑛′ → 𝑚 ∈ [0, 1] and note that any 𝑚 ∈ [0, 1] represents a median for the indicator function 1clos( 𝐴) under 𝜇. Hence, ∫ ∫ 1 lim | 𝑓𝑛′ − 𝑚 𝑛′ | d𝜇 = lim | 𝑓𝑛 − 𝑚| d𝜇 = 𝜇(clos( 𝐴)) = 𝜇( 𝐴) = , 𝑛→∞ 𝑛→∞ 2
5.6 Rothaus Functionals
103
and we obtain (5.12) again. For the opposite direction, we may assume that 𝑚 = 0 (in view of the translation invariance of (5.11)). First let |∇ 𝑓 | = 0 on the set { 𝑓 = 0} and define 𝑓 + = max{ 𝑓 , 0},
𝑓 − = max{− 𝑓 , 0},
so that 𝑓 = 𝑓 + − 𝑓 − and | 𝑓 | = 𝑓 + + 𝑓 − . Note that |∇ 𝑓 + | = |∇ 𝑓 | and |∇ 𝑓 − | = |∇ 𝑓 | on the open sets { 𝑓 > 0} and { 𝑓 < 0}, respectively. Hence, applying (5.12) and the co-area inequality of Proposition 5.2.1 to the function 𝑓 + , we get ∫ ∞ ∫ ∫ ∞ 𝜇{ 𝑓 > 𝑡} d𝑡. |∇ 𝑓 | d𝜇 ≥ 𝜇+ { 𝑓 > 𝑡} d𝑡 ≥ ℎ { 𝑓 >0}
Similarly for ∫
𝑓−
0
0
we have ∫ |∇ 𝑓 | d𝜇 ≥
{ 𝑓 𝑡} d𝑡 = ℎ 0
which is the required inequality (5.11). In order to remove the assumption that |∇ 𝑓 | = 0 on the set √ { 𝑓 = 0}, one may apply the previous step to the functions 𝑇𝜀 ( 𝑓 ) with 𝑇𝜀 (𝑥) = 𝜀 2 + 𝑥 2 − 𝜀 (𝜀 > 0). Since in general |∇𝑇𝜀 ( 𝑓 )| ≤ 𝑇𝜀′ (| 𝑓 |) |∇ 𝑓 |, the modulus of the gradient |∇𝑇𝜀 ( 𝑓 )| is vanishing when 𝑓 = 0, hence when 𝑇𝜀 ( 𝑓 ) = 0. Since also 𝑚(𝑇𝜀 ( 𝑓 )) = 0, we conclude that ∫ ∫ 𝑇𝜀′ (| 𝑓 |) |∇ 𝑓 | d𝜇 ≥ ℎ 𝑇𝜀 (| 𝑓 |) d𝜇. Here, 𝑇𝜀′ ≤ 1, and letting 𝜀 → 0, in the limit we arrive at (5.11) for 𝑓 . Thus Proposition 5.5.2 is proved.
□
5.6 Rothaus Functionals One may ask whether or not one may replace the median 𝑚 in (5.11) with more tractable quantities such as the expectation. The answer is affirmative, and moreover, one may consider more general homogeneous functionals including ∫ ∫ + − L ( 𝑓 ) = sup 𝑓 𝑔1 d𝜇 + 𝑓 𝑔2 d𝜇 , (5.13) (𝑔1 ,𝑔2 ) ∈ G
where the supremum runs over some non-empty family G of pairs of 𝜇-integrable functions (𝑔1 , 𝑔2 ) on M. The value L ( 𝑓 ), finite or not, is well defined when 𝑓 + 𝑔1 and
104
5 Standard Analytic Conditions
𝑓 − 𝑔2 are 𝜇-integrable for any couple (𝑔1 , 𝑔2 ) ∈ G. In connection with isoperimetric inequalities, these functionals were introduced by Rothaus [163]. If 𝑔2 = −𝑔1 , the functional becomes ∫ 𝑓 𝑔 d𝜇 L ( 𝑓 ) = sup 𝑔 ∈𝐺
for some family 𝐺, which includes all 𝐿 𝑝 -norms. Another important example is given by the entropy functional L ( 𝑓 ) = Ent 𝜇 (| 𝑓 |) = E 𝜇 | 𝑓 | log | 𝑓 | − E 𝜇 | 𝑓 | log E 𝜇 | 𝑓 |. Proposition 5.6.1 The inequality ∫ |∇ 𝑓 | d𝜇 ≥ L ( 𝑓 )
(5.14)
holds true for all bounded Lipschitz functions 𝑓 on M (and then for all locally Lipschitz 𝑓 for which L ( 𝑓 ) is well defined) if and only if for all Borel sets 𝐴 in M, 𝜇+ ( 𝐴) ≥ max{L (1 𝐴), L (−1 𝐴)}.
(5.15)
Proof The implication (5.14) ⇒ (5.15) is based on the approximation of indicator functions by Lipschitz functions; it is very similar to the analogous step in the proof of Proposition 5.5.2. To argue in the other direction, let us return to the co-area inequality (5.5) and apply it to the functions 𝑓 + and 𝑓 − . It then yields a similar relation ∫
∫ |∇ 𝑓 | d𝜇 ≥
∞
𝜇+ { 𝑓 > 𝑡} d𝑡 +
∫
0
𝜇+ { 𝑓 < 𝑡} d𝑡
(5.16)
−∞
0
(if the property |∇ 𝑓 | = 0 on the set { 𝑓 = 0} is not fulfilled, one may use the transforms 𝑇𝜀 as in the proof of Proposition 5.4.2). To derive (5.14) from (5.15), take a couple (𝑔1 , 𝑔2 ) participating in (5.13). Then, using (5.15)–(5.16) and introducing the sets 𝐴𝑡 = { 𝑓 > 𝑡}, 𝐵𝑡 = { 𝑓 < 𝑡}, we have ∫
∫ |∇ 𝑓 | d𝜇 ≥
∞
∫
0
L (1 𝐴𝑡 ) d𝑡 +
L (−1 𝐵𝑡 ) d𝑡 −∞
0 ∞∫
∫ 0∫ 1 𝐴𝑡 𝑔1 d𝜇 d𝑡 + 1 𝐵𝑡 𝑔2 𝜇 d𝑡 −∞ ∫ ∫0 ∫ ( 𝑓 + 𝑔1 + 𝑓 − 𝑔2 ) d𝜇. = 1 𝐴0 𝑓 𝑔1 d𝜇 + 1 𝐵0 𝑓 𝑔2 d𝜇 = ∫
≥
Note that we have made use of Fubini’s theorem, which was possible due to the integrability of 𝑔1 and 𝑔2 . It remains to take the supremum over all admissible (𝑔1 , 𝑔2 ), and then we arrive at (5.14). The extension of this inequality to locally Lipschitz 𝑓 may be justified similarly to the proof of Proposition 5.3.2. □
5.6 Rothaus Functionals
105
In the particular case L( 𝑓 ) = e ℎ ∥ 𝑓 − E𝜇 𝑓 ∥1 = e ℎ sup
n∫
∫ 𝑓 𝑔 d𝜇 :
𝑔 d𝜇 = 0, |𝑔| ≤ 1
ℎ ≥ 0, (5.14) turns into with parameter e ∫ ∫ e | 𝑓 | d𝜇 ≤ |∇ 𝑓 | d𝜇. ℎ
o
(5.17)
ℎ=e Corollary 5.6.2 The best (largest) value e ℎ(𝜇) in the inequality (5.17), holding for any 𝜇-integrable, locally Lipschitz function 𝑓 on M with 𝜇-mean zero, represents an optimal one in the isoperimetric-type inequality ℎ 𝜇( 𝐴) (1 − 𝜇( 𝐴)), 𝜇+ ( 𝐴) ≥ 2e
𝐴 ⊂ M (Borel)
and is connected with the Cheeger isoperimetric constant ℎ = ℎ(𝜇) via Moreover, e ℎ = ℎ, if 𝐼 𝜇 ( 𝑝)/𝑝(1 − 𝑝) is minimized at 𝑝 = 1/2.
(5.18) 1 2
ℎ≤e ℎ ≤ ℎ.
Proof Indeed, in this case L (1 𝐴) = L (−1 𝐴) = 2 𝜇( 𝐴) (1 − 𝜇( 𝐴)). Hence, by Proposition 5.6.1, the relations (5.17) and (5.18) are equivalent for any fixed value e ℎ ≥ 0. In terms of the isoperimetric profile, one may rewrite (5.18) as the property 𝐼 𝜇 ( 𝑝) ≥ 2e ℎ 𝑝(1 − 𝑝), where the optimal value is described as e ℎ = inf
0< 𝑝0}
ℎ2 4
∫
𝑓 2 d𝜇.
If 𝑓 is not necessarily non-negative, one may apply the previous step to the functions 𝑓 + and 𝑓 − . Since they have median at the origin, we have ∫ ∫ ∫ ∫ ℎ2 ℎ2 |∇ 𝑓 | 2 d𝜇 ≥ 𝑓 2 d𝜇, |∇ 𝑓 | 2 d𝜇 ≥ 𝑓 2 d𝜇, 4 { 𝑓 >0} 4 { 𝑓 0} { 𝑓 0, i.e., with density 𝜆e−𝜆𝑥 (𝑥 > 0), one has 𝜆 1 (𝜇) =
ℎ(𝜇) = 𝜆,
1 2 𝜆 . 4
This example shows that the Cheeger inequality of Proposition 5.6.3 is optimal. To prove this, assume without loss of generality that 𝜆 = 1. Given ∫ 𝑥 a smooth function 𝑓 on the real line with bounded derivative, put 𝑔(𝑥) = 𝑓 (0) + 0 | 𝑓 ′ (𝑦)| d𝑦. Integrating by parts, we have E 𝜇 | 𝑓 − 𝑚( 𝑓 )| ≤ E 𝜇 | 𝑓 − 𝑓 (0)| ≤ E 𝜇 (𝑔 − 𝑔(0)) ∫ ∞ =𝜆 𝑔 ′ (𝑥) e−𝜆𝑥 d𝑥 = E 𝜇 | 𝑓 ′ |. 0
Hence, the inequality (5.11) holds true with ℎ = 1, which can be seen to be optimal (by testing it on exponential functions). Thus, ℎ(𝜇) = 𝜆, and, by Proposition 5.6.3, 𝜆1 (𝜇) ≥ 14 𝜆2 . For the opposite inequality, one may test the inequality
5.7 Standard Examples and Conditions
E𝜇 𝑓 2 − E𝜇 𝑓
107
2
≤
1 E 𝜇 𝑓 ′2 𝜆1
on the functions 𝑓𝑡 (𝑥) = e𝑡 𝑥 with 𝑡 < 𝜆2 . Inserting 𝑓 = 𝑓𝑡 with 𝑡 →
𝜆 2
yields 𝜆1 ≤
𝜆2 4 .
5.7.2 Two-sided exponential distributions. A similar argument leads to the same relations ℎ(𝜇) = 𝜆 and 𝜆 1 (𝜇) = 14 𝜆2 for the symmetric probability measures with densities 12 𝜆e−𝜆| 𝑥 | (𝑥 ∈ R) with parameter 𝜆 > 0. 5.7.3 Tail-type sufficient conditions. Following Borovkov and Utev [66], suppose that the probability measure 𝜇 on R may be decomposed as the mixture of two probability measures, 𝜇 = 𝛼𝜇0 + (1 − 𝛼)𝜇1 , 0 < 𝛼 ≤ 1, where 𝜇0 has density 𝑝 satisfying ∫ ∞ (𝑦 − 𝑥 0 ) 𝑝(𝑦) d𝑦 ≤ 𝑐 𝑝(𝑥) for 𝑥 > 𝑥0 , 𝑥 ∫ 𝑥 − (𝑦 − 𝑥 0 ) 𝑝(𝑦) d𝑦 ≤ 𝑐 𝑝(𝑥) for 𝑥 < 𝑥0 −∞
for some fixed 𝑥0 ∈ R and 𝑐 > 0. Then 𝜆1 (𝜇) ≥ 𝛼/𝑐. ∫ ∞We leave this assertion as an exercise. Note that, when 𝜇0 has mean 𝑥 0 , we have (𝑦 − 𝑥0 ) 𝑝(𝑦) d𝑦 = 0, and the two tail conditions can be united as one inequality −∞ ∫
∞
(𝑦 − 𝑥0 ) 𝑝(𝑦) d𝑦 ≤ 𝑐 𝑝(𝑥) for all 𝑥 ∈ R. 𝑥
5.7.4 Gaussian measures. If 𝜇 is a probability measure with density 𝑝(𝑥) =
2 2 1 √ e−( 𝑥−𝑎) /2𝜎 , 𝜎 2𝜋
𝑥 ∈ R (𝑎 ∈ R, 𝜎 > 0),
the last tail condition is fulfilled with 𝛼 = 1, 𝜇0 = 𝜇, 𝑐 = 𝜎 2 . Hence, by the previous example and also taking into account the general upper bound 𝜆1 (𝑋) ≤ 1/Var(𝑋), we conclude that 𝜆1 (𝜇) = 1/𝜎 2 . 5.7.5 Log-concave measures. Recall that a probability measure 𝜇 on R is logconcave (or logarithmically concave), if it has a log-concave density 𝑝, i.e., such that log 𝑝 is concave. Necessarily, 𝑝 decays at infinity exponentially fast, so 𝜇 has a finite exponential moment. One can show that if 𝜎 2 is the variance of 𝜇, then 1 2 ≤ ℎ2 (𝜇) ≤ 2 , 2 3𝜎 𝜎
1 1 ≤ 𝜆 1 (𝜇) ≤ 2 , 2 12𝜎 𝜎
where the constants in the bounds for the Cheeger constant are optimal (cf. [20]).
108
5 Standard Analytic Conditions
5.7.6 Necessary and sufficient conditions. Any probability measure 𝜇 on R may be decomposed in a unique way as the sum 𝜇 = 𝜇0 + 𝜇1 , where 𝜇0 is an absolutely continuous component with respect to the Lebesgue measure, with some density 𝑝, and where 𝜇1 is an orthogonal component in the sense of Measure Theory. For the property 𝜆1 > 0 (hence for ℎ > 0 as well) it is necessary that 𝑝(𝑥) > 0 almost everywhere on the support interval (𝑎, 𝑏) ⊂ R of 𝜇0 . More precisely, the Cheeger isoperimetric constant admits a simple description ℎ = ess inf 𝑎 0) |∇S 𝑓 | d𝔰𝑛−1 ≤ is equivalent to ∫
𝑓 ΔS2 𝑓 d𝔰𝑛−1 ≥ −(𝑐 + 𝑛 − 2)
∫ 𝑓 ΔS 𝑓 d𝔰𝑛−1 .
(10.3)
The latter can be studied by means of the orthogonal expansions (9.7) and (9.8) in spherical harmonics. Indeed, since ΔS 𝑓 𝑑 = −𝑑 (𝑛 + 𝑑 − 2) 𝑓 𝑑 and hence Δ2S 𝑓 𝑑 = 𝑑 2 (𝑛 + 𝑑 − 2) 2 𝑓 𝑑 , we have the orthogonal expansions ΔS 𝑓 = −
∞ ∑︁ 𝑑=1
𝑑 (𝑛 + 𝑑 − 2) 𝑓 𝑑 ,
ΔS2 𝑓 =
∞ ∑︁
𝑑 2 (𝑛 + 𝑑 − 2) 2 𝑓 𝑑 .
𝑑=1
As a result, (10.3) is reduced to the comparison 𝑑 2 (𝑛+𝑑−2) 2 ≥ (𝑐+𝑛−2)·𝑑 (𝑛+𝑑−2), that is, to the requirement 𝑐 ≤ 𝑑 2 + (𝑑 − 1) (𝑛 − 2). Thus, if we want to involve all 𝐶 2 -smooth functions 𝑓 , the optimal value of 𝑐 is described as the minimum of the right-hand side above over all 𝑑 ≥ 1. The minimum is achieved for 𝑑 = 1, which leads to the optimal value 𝑐 = 1 and thus proves the first relation of the proposition. However, if we require 𝑓 to be orthogonal to all linear functions, then we can only allow the values 𝑑 ≥ 2, and the optimal □ value is 𝑐 = 𝑛 + 2. This gives the second relation of Proposition 10.1.2. Recalling the improved Poincaré inequality (9.11), from Proposition 10.1.2 we immediately obtain: Proposition 10.1.3 For any 𝐶 2 -function 𝑓 on S𝑛−1 with mean zero, ∫ ∫ 1 2 ∥ 𝑓𝑆′′ ∥ 2HS d𝔰𝑛−1 , 𝑓 d𝔰𝑛−1 ≤ 𝑛−1 where equality is attained for all linear functions. Moreover, if 𝑓 is orthogonal to all linear functions with respect to 𝔰𝑛−1 , then ∫ ∫ 1 2 ∥ 𝑓𝑆′′ ∥ 2HS d𝔰𝑛−1 , 𝑓 d𝔰𝑛−1 ≤ 2𝑛(𝑛 + 2) with equality attainable for all quadratic harmonics. To get these relations, one can also apply (10.2) and the representation (9.7) in order to reduce them to spherical harmonics. But for 𝑓 = 𝑓 𝑑 in 𝐻 𝑑 , the inequality of ∫ ∫ the form 𝑐 𝑓 2 d𝔰𝑛−1 ≤ ∥ 𝑓S′′ ∥ 2HS d𝔰𝑛−1 becomes
188
10 Second Order Spherical Concentration
𝑐 ≤ 𝑑 (𝑛 + 𝑑 − 2) 𝑑 (𝑛 + 𝑑 − 2) − (𝑛 − 2) , where the right-hand side is an increasing function of 𝑑. An interesting consequence of the first relation of Proposition 10.1.3 is the statement that the equality 𝑓S′′ = 0 is only possible for constant functions (in contrast with the Euclidean Hessian).
10.2 Bounds on the 𝑳 2 -norm in the Euclidean Setup If 𝑓 is defined and smooth in an open neighborhood of the sphere, and if we prefer to deal with usual Euclidean derivatives (rather than with spherical derivatives), one may use the general relation |∇S 𝑓 | ≤ |∇ 𝑓 |. Then, from the Poincaré inequality (10.1) we have a formally weaker bound ∫ ∫ 1 2 𝑓 d𝔰𝑛−1 ≤ |∇ 𝑓 | 2 d𝔰𝑛−1 . 𝑛−1 In fact, this inequality returns us to (10.1) when it is applied to the functions of the form 𝑓 (𝑥/|𝑥|). Our next step is to estimate the above integral on the right-hand side by involving the second derivative in analogy with Proposition 10.1.2. Proposition 10.2.1 Suppose that a function 𝑓 is defined and 𝐶 2 -smooth in some neighborhood of S𝑛−1 . If 𝑓 is orthogonal to all linear functions with respect to 𝔰𝑛−1 , then ∫ ∫ 4 ∥ 𝑓 ′′ ∥ 2HS d𝔰𝑛−1 . |∇ 𝑓 | 2 d𝔰𝑛−1 ≤ (10.4) 𝑛−1 Proof By the Poincaré inequality, applied to the 𝑖-th partial derivative of 𝑓 , ∫
2
2
∫
(𝜕𝑖 𝑓 (𝜃)) d𝔰𝑛−1 (𝜃) ≤
𝜕𝑖 𝑓 (𝜃) d𝔰𝑛−1 (𝜃) +
1 𝑛−1
which after summation over all 𝑖 ≤ 𝑛 yields ∫ ∫ 2 2 |∇ 𝑓 | d𝔰𝑛−1 ≤ ∇ 𝑓 d𝔰𝑛−1 +
∫ ∑︁ 𝑛
(𝜕𝑖 𝑗 𝑓 (𝜃)) 2 d𝔰𝑛−1 (𝜃),
𝑗=1
1 𝑛−1
∫
∥ 𝑓 ′′ ∥ 2HS d𝔰𝑛−1 .
(10.5)
In order to bound the intermediate Euclidean norm, we recall that the assumption that 𝑓 is orthogonal to all linear functions is equivalent to the property that every function of the form ⟨∇S 𝑓 (𝜃), 𝑣⟩ = ⟨∇ 𝑓 (𝜃), 𝑣⟩ − ⟨∇ 𝑓 (𝜃), 𝜃⟩ ⟨𝑣, 𝜃⟩
10.2 Bounds on the 𝐿 2 -norm in the Euclidean Setup
189
has 𝔰𝑛−1 -mean zero (Proposition 9.3.3). In terms of 𝑢(𝜃) = ⟨∇ 𝑓 (𝜃), 𝜃⟩ , this may be written as the vector equality ∫ ∫ 𝑢(𝜃) 𝜃 d𝔰𝑛−1 (𝜃). ∇ 𝑓 d𝔰𝑛−1 = Hence
2 ∫ ∫ ∫ 𝑢(𝜃)𝑢(𝜃 ′) ⟨𝜃, 𝜃 ′⟩ d𝔰𝑛−1 (𝜃)d𝔰𝑛−1 (𝜃 ′), ∇ d𝔰 𝑓 𝑛−1 =
which does not exceed 1/2 ∫ ∫ 1/2 ∫∫ ⟨𝜃, 𝜃 ′⟩ 2 d𝔰𝑛−1 (𝜃)d𝔰𝑛−1 (𝜃 ′) 𝑢(𝜃) 2 𝑢(𝜃 ′) 2 d𝔰𝑛−1 (𝜃)d𝔰𝑛−1 (𝜃 ′) 1 = √ 𝑛
∫
1 𝑢 2 d𝔰𝑛−1 ≤ √ 𝑛
∫
|∇ 𝑓 | 2 d𝔰𝑛−1 ,
where we used the Cauchy–Schwarz inequality together with |𝑢(𝜃)| ≤ |∇ 𝑓 (𝜃)|. Inserting this bound into (10.5), we get a relation which is solved as ∫ ∫ 𝑐𝑛 1 2 ∥ 𝑓 ′′ ∥ 2HS d𝔰𝑛−1 , 𝑐 𝑛 = |∇ 𝑓 | d𝔰𝑛−1 ≤ √ . 𝑛−1 1 − 1/ 𝑛 √ Clearly, 𝑐 𝑛 ≤ 𝑐 2 = 2+ 2 < 4, thus proving (10.4). Note also that 𝑐 𝑛 → 1 as 𝑛 → ∞. So, the constant 4 may be improved for large values of 𝑛. □ Combining (10.4) with the Poincaré inequality (9.11), we get a second order Poincaré-type inequality in the Euclidean setup, ∫ ∫ 2 ∥ 𝑓 ′′ ∥ 2HS d𝔰𝑛−1 , ( 𝑓 − 𝑚) 2 d𝔰𝑛−1 ≤ 𝑛(𝑛 − 1) assuming that 𝑓 is orthogonal to all linear functions, and where 𝑚 is the mean of 𝑓 with respect to 𝔰𝑛−1 . Here the left integral will not change when it is applied to 𝑓 𝑎 (𝑥) = 𝑓 (𝑥) − 𝑎2 |𝑥| 2 in place of 𝑓 , while the right integral will depend on 𝑎. Hence ∫ ∫ 2 2 ∥ 𝑓 ′′ − 𝑎 I𝑛 ∥ 2HS d𝔰𝑛−1 , ( 𝑓 − 𝑚) d𝔰𝑛−1 ≤ 𝑛(𝑛 − 1) where I𝑛 denotes the 𝑛 × 𝑛 identity matrix. Thus, we arrive at: Proposition 10.2.2 If 𝑓 is 𝐶 2 -smooth in some neighborhood of S𝑛−1 and is orthogonal to all affine functions with respect to 𝔰𝑛−1 , then for any 𝑎 ∈ R, ∫ ∫ 2 ∥ 𝑓 ′′ − 𝑎 I𝑛 ∥ 2HS d𝔰𝑛−1 . 𝑓 2 d𝔰𝑛−1 ≤ 𝑛(𝑛 − 1)
190
10 Second Order Spherical Concentration
Let us finally mention that the constants in Propositions 10.2.1 and 10.2.2 are asymptotically correct with respect to the growing dimension 𝑛. This may be tested on the example of the quadratic functions 𝑓 (𝜃) = ⟨𝜃, 𝑣⟩ 2 − 𝑛1 with unit vectors 𝑣.
10.3 First Order Concentration Inequalities If a function 𝑓 on S𝑛−1 has 𝔰𝑛−1 -mean zero and has Lipschitz semi-norm ∥ 𝑓 ∥ Lip ≤ 1 with respect to the geodesic distance, then, by the Poincaré inequality, its 𝐿 2 -norm on the probability space (S𝑛−1 , 𝔰𝑛−1 ) satisfies ∥ 𝑓 ∥ 𝐿 2 (𝔰𝑛−1 ) ≤ √
1 . 𝑛−1
√ Hence, | 𝑓 | is of order 1/ 𝑛 on a large subset of the sphere. This is part of the spherical concentration phenomenon, although expressed in a weak form, to which we refer as the first order concentration. In fact, as we know, Poincaré-type inequalities provide more – they are responsible for exponential integrability of Lipschitz functions, and, for example, Proposition 6.1.1 provides an exponential bound on the tails, namely √ 𝔰𝑛−1 | 𝑓 | ≥ 𝑟 ≤ 6 e−2 𝑛−1 𝑟 ,
𝑟 ≥ 0.
Equivalently, modulo an absolute constant, we have the relation ∥ 𝑓 ∥ 𝜓1 ≤
√𝑐 𝑛−1
for
e |𝑡 |
the Orlicz norm generated by the Young function 𝜓1 (𝑡) = − 1. Even more information is provided by the logarithmic Sobolev inequality on the unit sphere, cf. Proposition 9.4.2. Since the logarithmic Sobolev constant in this case is 𝜌(𝔰𝑛−1 ) = 𝑛 − 1, one may apply Corollary 7.3.3 and Corollary 7.5.3, which give an essential sharpening in terms of the tail behavior of Lipschitz functions. Proposition 10.3.1 For any 𝑓 : S𝑛−1 → R with ∥ 𝑓 ∥ Lip ≤ 1 and 𝔰𝑛−1 -mean zero, ∫ e𝑡 𝑓 d𝔰𝑛−1 ≤ exp
n
o 𝑡2 , 2(𝑛 − 1)
𝑡 ∈ R.
In particular, for all 𝑟 > 0, 2 𝔰𝑛−1 𝑓 ≥ 𝑟 ≤ e−(𝑛−1)𝑟 /2 ,
2 𝔰𝑛−1 | 𝑓 | ≥ 𝑟 ≤ 2 e−(𝑛−1)𝑟 /2 .
(10.6)
When the Lipschitz semi-norm of the function 𝑓 is too large, one may use the log-Sobolev inequality to bound the 𝐿 𝑝 -norms ∥ 𝑓 ∥ 𝑝 = ∥ 𝑓 ∥ 𝐿 𝑝 (𝔰𝑛−1 ) in terms of the 𝐿 𝑝 -norms of the modulus of the gradient. In particular, from Proposition 7.5.4 we obtain the following assertion (cf. also Remark 7.5.6).
10.3 First Order Concentration Inequalities
191
Proposition 10.3.2 Given a Lipschitz function 𝑓 : S𝑛−1 → C with 𝔰𝑛−1 -mean zero, we have, for any 𝑝 ≥ 2, √︁ 𝑝−1 ∥ 𝑓 ∥𝑝 ≤ √ ∥∇S 𝑓 ∥ 𝑝 . 𝑛−1 When 𝑝 = 2, here we return to the Poincaré inequality on the unit sphere. As a simple example, we get the following bound for linear functions. Corollary 10.3.3 For any 𝑣 ∈ R and 𝑝 ≥ 2, ∫
𝑝
| ⟨𝑣, 𝜃⟩ | d𝔰𝑛−1 (𝜃)
1/ 𝑝
√︂ ≤
𝑝 |𝑣|. 𝑛
Indeed, the function 𝑓 (𝜃) = ⟨𝑣, 𝜃⟩ has a gradient satisfying |∇S 𝑓 ∥ 𝑝 ≤ |∇ 𝑓 | = |𝑣|. Hence, by Proposition 10.3.2, √ ∫ 1/ 𝑝 √︁ 𝑝 − 1 𝑝 𝑝 |𝑣| ≤ √ |𝑣|, 2 ≤ 𝑝 ≤ 𝑛. | ⟨𝑣, 𝜃⟩ | d𝔰𝑛−1 (𝜃) ≤ √ 𝑛 𝑛−1 If 𝑝 > 𝑛, the resulting bound holds true as well, since the 𝐿 𝑝 -norm on the left-hand side does not exceed |𝑣|. Similar concentration inequalities may also be derived from the isoperimetric theorem (Proposition 9.5.2), including the class of functions 𝑓 that are defined and Lipschitz on R𝑛 with power 𝛼 ∈ (0, 1], i.e., satisfying | 𝑓 (𝜃) − 𝑓 (𝜃 ′)| ≤ |𝜃 − 𝜃 ′ | 𝛼
(𝜃, 𝜃 ′ ∈ R𝑛 ).
(10.7)
In this case, we write ∥ 𝑓 ∥ Lip( 𝛼) ≤ 1. Alternatively, one can use the supremumconvolution approach (based on the spherical logarithmic Sobolev inequality) to derive, for example, the following generalization of the exponential bound (10.6). Proposition 10.3.4 Given a function 𝑓 : R𝑛 → R with 𝔰𝑛−1 -mean zero and ∥ 𝑓 ∥ Lip( 𝛼) ≤ 1 (0 < 𝛼 ≤ 1), we have o n 𝑛−1 𝑟 2/𝛼 , 𝔰𝑛−1 𝑓 ≥ 𝑟 ≤ exp − 2
𝑟 ≥ 0.
(10.8)
Proof By (10.7), for all 𝜃, 𝜃 ′ ∈ R𝑛 and 𝑡 > 0, 𝑓 (𝜃 ′) − 𝑓 (𝜃) −
𝛼 2−𝛼 1 1 |𝜃 − 𝜃 ′ | 2 ≤ |𝜃 − 𝜃 ′ | 𝛼 − |𝜃 − 𝜃 ′ | 2 ≤ (𝛼𝑡) 2−𝛼 , 2𝑡 2𝑡 2
so that for the supremum-convolution of 𝑓 we have an upper bound 𝑃𝑡 𝑓 (𝜃) = sup 𝜃 ′ ∈R𝑛
h
𝑓 (𝜃 ′) −
i 𝛼 1 2−𝛼 |𝜃 − 𝜃 ′ | 2 ≤ 𝑓 (𝜃) + (𝛼𝑡) 2−𝛼 . 2𝑡 2
Hence, by Corollary 9.4.3, we obtain an upper bound on the Laplace transform,
192
10 Second Order Spherical Concentration
∫ e𝑡 𝑓 d𝔰𝑛−1
n ∫ o 𝑡 𝑃 𝑛−1 ≤ exp 𝑡 𝑓 d𝔰𝑛−1 o n2 − 𝛼 𝛼 𝛼 2 ≤ exp (𝑛 − 1) − 2−𝛼 𝛼 2−𝛼 𝑡 2−𝛼 . 2
By Markov’s inequality, the latter yields o n 𝛼 𝛼 2 2−𝛼 (𝑛 − 1) − 2−𝛼 𝛼 2−𝛼 𝑡 2−𝛼 . 𝔰𝑛−1 𝑓 ≥ 𝑟 ≤ exp − 𝑟𝑡 + 2 It remains to optimize the right-hand side over all 𝑡 > 0, and then we arrive at the required inequality (10.8). □
10.4 Second Order Concentration Using a standard normal random variable 𝑍, the Gaussian deviation bounds (10.6) may be stated informally as a stochastic dominance | 𝑓 | ⪯ 𝑐 √|𝑍𝑛| . Up to an absolute constant 𝑐 > 0, this may also be written equivalently as ∫ 2 e (𝑛−1) 𝑓 /𝑐 d𝔰𝑛−1 ≤ 2. In fact, under stronger assumptions involving the second derivative 𝑓S′′, one can further strengthen this stochastic dominance with respect to the dimension 𝑛 – for example, to get a comparison such as | 𝑓 | ⪯ 𝑐 ( √𝑍𝑛 ) 2 . Given a real symmetric 𝑛 × 𝑛 matrix 𝐴 = (𝑎 𝑖 𝑗 )𝑖,𝑛 𝑗=1 (e.g. when 𝐴 = 𝑓S′′) , we denote by ∥ 𝐴∥ = max | 𝜃 | ≤1 | 𝐴𝜃| its operator norm, and by ∥ 𝐴∥ HS =
∑︁ 𝑛
|𝑎 𝑖 𝑗 | 2
1/2
𝑖, 𝑗=1
its Hilbert–Schmidt norm. If the entries of 𝐴 are complex numbers, the definition is similar, the only difference being that the supremum should be taken over all 𝜃 ∈ C𝑛 such that |𝜃| ≤ 1 (that is, from the complex unit sphere). Proposition 10.4.1 Assume that a 𝐶 2 -smooth function 𝑓 : S𝑛−1 → R is orthogonal ∫ to all affine functions. If ∥ 𝑓S′′ ∥ ≤ 1 on S𝑛−1 and ∥ 𝑓S′′ ∥ 2HS d𝔰𝑛−1 ≤ 𝑏, then ∫ o n 𝑛−1 | 𝑓 | d𝔰𝑛−1 ≤ 2. exp 2(1 + 𝑏) Proof By Proposition 9.2.1, |∇S2 𝑓 (𝜃)| ≤ ∥ 𝑓S′′ (𝜃) ∥ ≤ 1 for all points 𝜃 ∈ S𝑛−1 , so that we may apply Proposition 7.6.1 on the space M = S𝑛−1 with 𝜇 = 𝔰𝑛−1 . It gives
10.4 Second Order Concentration
∫
193
e (𝑛−1) 𝑓 /2 d𝔰𝑛−1 ≤ exp
n𝑛 − 1
∫
2
o |∇S 𝑓 | 2 d𝔰𝑛−1 .
To bound the last integral, we may apply the second relation of Proposition 10.1.2, and then we get ∫ ∫ n𝑛 − 1 o 𝑛−1 1 𝑓 d𝔰𝑛−1 ≤ ∥ 𝑓S′′ ∥ 2HS d𝔰𝑛−1 ≤ 𝑏. log exp 2 2(𝑛 + 2) 2 Using a similar inequality for the function − 𝑓 , we have ∫ ∫ ∫ 𝑛−1 𝑛−1 𝑛−1 e 2 𝑓 d𝔰𝑛−1 + e− 2 𝑓 d𝔰𝑛−1 ≤ 2e𝑏/2 . e 2 | 𝑓 | d𝔰𝑛−1 ≤ It follows that, for any 𝜆 ≥ 1, ∫ ∫ 𝑛−1 𝑛−1 e 2 | 𝑓 |/𝜆 d𝔰𝑛−1 ≤ e 2
|𝑓|
d𝔰𝑛−1
It remains to note that (2e𝑏/2 ) 1/𝜆 = 2 for 𝜆 = 1 + Proposition 10.4.1 is proved.
1/𝜆
𝑏 log 4
≤ (2e𝑏/2 ) 1/𝜆 .
≤ 1 + 𝑏. □
In the sequel, we will however work with Euclidean derivatives, and another version of Proposition 10.4.1 will be needed. Proposition 10.4.2 Let a real-valued function 𝑓 be defined and 𝐶 2 -smooth in some open neighborhood of S𝑛−1 . Assume that it is orthogonal to all affine functions and satisfies ∥ 𝑓 ′′ − 𝑎 I𝑛 ∥ ≤ 1 on S𝑛−1 together with ∫ ∥ 𝑓 ′′ − 𝑎 I𝑛 ∥ 2HS d𝔰𝑛−1 ≤ 𝑏 (10.9) for some 𝑎 ∈ R and 𝑏 ≥ 0. Then ∫ n exp
o 𝑛−1 | 𝑓 | d𝔰𝑛−1 ≤ 2. 2(1 + 3𝑏)
(10.10)
Proof Suppose that 𝑓 is orthogonal to all linear functions in 𝐿 2 (𝔰𝑛−1 ) and has mean 𝑚. We now apply Corollary 7.6.2 to the function 𝑓 − 𝑚 together with bound (10.4) of Proposition 10.2.1, which gives ∫ ∫ o n𝑛 − 1 ( 𝑓 − 𝑚) d𝔰𝑛−1 ≤ 2 ∥ 𝑓 ′′ ∥ 2HS d𝔰𝑛−1 . log exp 2 Applying it to 𝑓 𝑎 (𝑥) = 𝑓 (𝑥) − 𝑎2 |𝑥| 2 in place of 𝑓 , we get ∫ ∫ n𝑛 − 1 o ( 𝑓 − 𝑚) d𝔰𝑛−1 ≤ 2 ∥ 𝑓 ′′ − 𝑎 I𝑛 ∥ 2HS d𝔰𝑛−1 ≤ 2𝑏. log exp 2
194
10 Second Order Spherical Concentration
Now, assume that 𝑚 = 0 and apply a similar inequality to the function − 𝑓 . Then ∫ 𝑛−1 e 2 | 𝑓 | d𝔰𝑛−1 ≤ 2e2𝑏 . Hence, for any 𝜆 ≥ 1, ∫ ∫ 𝑛−1 𝑛−1 | 𝑓 |/𝜆 2 e d𝔰𝑛−1 ≤ e 2
|𝑓|
d𝔰𝑛−1
1/𝜆
≤ (2e2𝑏 ) 1/𝜆 .
As before, it remains to note that (2e2𝑏 ) 1/𝜆 = 2 for 𝜆 = 1 + Proposition 10.4.2 is proved.
2𝑏 log 2
≤ 1 + 3𝑏. □
10.5 Second Order Concentration With Linear Parts In Propositions 10.4.1–10.4.2 one may also start with an arbitrary 𝐶 2 -smooth function 𝑓 , but apply the hypotheses and the conclusions to the projection 𝑇 𝑓 of 𝑓 onto the orthogonal complement of the space H of all affine functions on the sphere in 𝐿 2 (𝔰𝑛−1 ). This space has dimension 𝑛 + 1, and one may choose for the orthonormal basis in H the canonical functions √ 𝑙0 (𝜃) = 1, 𝑙 𝑘 (𝜃) = 𝑛 𝜃 𝑘 (𝑘 = 1, . . . , 𝑛, 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 ). Therefore, the “affine” part 𝑙 = 𝑇 𝑓 − 𝑓 of 𝑓 is described as the orthogonal projection in 𝐿 2 (𝔰𝑛−1 ) onto H , namely 𝑙 (𝜃) = Proj H ( 𝑓 ) =
𝑛 ∑︁
⟨ 𝑓 , 𝑙 𝑘 ⟩ 𝐿 2 (𝔰𝑛−1 ) 𝑙 𝑘 (𝜃)
𝑘=0
=
𝑛 ∫ ∑︁
𝑓 (𝑥)𝑙 𝑘 (𝑥) d𝔰𝑛−1 (𝑥) 𝑙 𝑘 (𝜃)
𝑘=0
∫
∫ =
𝑓 (𝑥) d𝔰𝑛−1 (𝑥) + 𝑛
𝑓 (𝑥)
𝑛 ∑︁
𝑥 𝑘 𝜃 𝑘 d𝔰𝑛−1 (𝑥).
𝑘=1
In other words, 𝑙 (𝜃) = 𝑚 + ⟨𝑣, 𝜃⟩ with ∫ 𝑚= 𝑓 (𝑥) d𝔰𝑛−1 (𝑥),
∫ 𝑣=𝑛
𝑥 𝑓 (𝑥) d𝔰𝑛−1 (𝑥),
so, 𝑇 𝑓 (𝜃) = 𝑓 (𝜃) − 𝑙 (𝜃). The function ⟨𝑣, 𝜃⟩ will be referred to as a linear part of 𝑓 . For example, if 𝑓 is even, i.e. 𝑓 (−𝜃) = 𝑓 (𝜃) for all 𝜃 ∈ S𝑛−1 , then 𝑇 𝑓 = 𝑓 − 𝑚. In the setting of Proposition 10.4.2, the functions 𝑇 𝑓 and 𝑓 have identical Euclidean second derivatives. Hence, if we want to obtain an inequality similar to (10.10) without the orthogonality assumption (still assuming conditions on the Eu-
10.5 Second Order Concentration With Linear Parts
195
clidean second derivative), we need to verify that the affine part 𝑙 is of order 𝑛1 . This may be achieved by estimating the 𝐿 2 -norm of 𝑙 and using the well-known fact that the linear functions on the sphere behave like Gaussian random variables. If, for definiteness, 𝑓 has mean zero, then ∫∫ 1 ⟨𝑥, 𝑦⟩ 𝑓 (𝑥) 𝑓 (𝑦) d𝔰𝑛−1 (𝑥)d𝔰𝑛−1 (𝑦). ∥𝑙 ∥ 2𝐿 2 = |𝑣| 2 = 𝑛𝐼, where 𝐼 = 𝑛 Therefore, a natural requirement should be a bound 𝐼 ≤ 𝑏 20 /𝑛3 with 𝑏 0 of order 1. This leads to the following generalization of Proposition 10.4.2, which is more flexible in applications. Proposition 10.5.1 Let 𝑓 be a real-valued, 𝐶 2 -smooth function in some open neighborhood of S𝑛−1 . Assume that it has 𝔰𝑛−1 -mean zero and 𝐼 ≤ 𝑏 20 /𝑛3 (𝑏 0 ≥ 0). If ∥ 𝑓 ′′ − 𝑎 I𝑛 ∥ ≤ 1 on S𝑛−1 for some 𝑎 ∈ R, and (10.9) holds, then ∫ o n 𝑛−1 | 𝑓 | d𝔰𝑛−1 ≤ 2. (10.11) exp 2(1 + 𝑏 0 + 3𝑏) Proof Let 𝑙 (𝜃) = ⟨𝑣, 𝜃⟩ be the linear part of 𝑓 as above. First, we need a bound on the Orlicz norm corresponding to the Young function 𝜓1 (𝑡) = e |𝑡 | − 1 in the class of linear functions. Applying Proposition 10.3.1 to 𝑙 and −𝑙, we have ∫ o n 𝑡2 |𝑣| 2 , 𝑡 > 0. e𝑡 |𝑙 | d𝔰𝑛−1 ≤ 2 exp 2(𝑛 − 1) Hence, for 𝜆 > 1, ∫
n e𝑡 |𝑙 |/𝜆 d𝔰𝑛−1 ≤ 2 exp
o 1/𝜆 𝑡2 |𝑣| 2 = 2, 2(𝑛 − 1)
where the last equality holds true for the value 𝜆 = 1 + ∥𝑙 ∥ 𝜓1 ≤
𝜆 𝑡
=
1 𝑡
+
𝑡 |𝑣 | 2 (𝑛−1) log 4 .
𝑡 2 |𝑣 | 2 (𝑛−1) log 4 .
Equivalently,
Optimizing over 𝑡 > 0, we arrive at ∥𝑙 ∥ 2𝜓1 ≤
2 |𝑣| 2 . (𝑛 − 1) log 2
(10.12)
√ Since |𝑣| ≤ 𝑏 0 / 𝑛, we thus have ∥𝑙 ∥ 𝜓1 ≤ 2𝑏 0 /(𝑛 − 1). On the other hand, by Proposition 10.4.2 with the same assumption on the second derivative of 𝑓 , ∫ n 𝑛−1 o |𝑇 𝑓 | d𝔰𝑛−1 ≤ 2, exp 2(1 + 3𝑏) 2 (1 + 3𝑏). Using | 𝑓 | ≤ |𝑇 𝑓 | + |𝑙 | and applying the which means that ∥𝑇 ∥ 𝜓1 ≤ 𝑛−1 triangle inequality in the Orlicz space 𝐿 𝜓1 (𝔰𝑛−1 ), we conclude that
196
10 Second Order Spherical Concentration
∥ 𝑓 ∥ 𝜓1 ≤
2𝑏 0 + 2(1 + 3𝑏) . 𝑛−1
This is an equivalent form of (10.11), so, Proposition 10.5.1 is proved.
□
Let us summarize and extend Propositions 10.2.2, 10.4.2 and 10.5.1 to the class of complex-valued functions. Proposition 10.5.2 Suppose that a complex-valued function 𝑓 is 𝐶 2 -smooth in some open neighborhood of S𝑛−1 and has 𝔰𝑛−1 -mean zero. For any 𝑎 ∈ C, ∫ ∫ 2 2 2 ∥ 𝑓 ′′ − 𝑎 I𝑛 ∥ 2HS d𝔰𝑛−1 , | 𝑓 | d𝔰𝑛−1 ≤ ∥𝑙 ∥ 𝐿 2 + (10.13) 𝑛(𝑛 − 1) where 𝑙 is a linear part of 𝑓 . Moreover, if ∥ 𝑓 ′′ − 𝑎 I𝑛 ∥ ≤ 1 on S𝑛−1 , then ∫ 6 4 ∥ 𝑓 ′′ − 𝑎 I𝑛 ∥ 2HS d𝔰𝑛−1 + . ∥ 𝑓 ∥ 𝜓1 ≤ 4 ∥𝑙 ∥ 𝐿 2 + 𝑛−1 𝑛−1
(10.14)
Proof Let 𝑓0 = Re( 𝑓 ), 𝑓1 = Im( 𝑓 ), 𝑙 0 = Re(𝑙), 𝑙1 = Im(𝑙), and 𝑎 0 = Re(𝑎), 𝑎 1 = Im(𝑎). The inequality (10.13) follows from Proposition 10.2.2 applied to the functions 𝑢 0 = 𝑓0 − 𝑙 0 and 𝑢 1 = 𝑓1 − 𝑙 1 with 𝑎 0 and 𝑎 1 respectively. Turning to the second assertion, first let us apply (10.12) to the linear part of 𝑓 which gives ∥𝑙 𝑘 ∥ 2𝜓1 ≤
2𝑛 4 ∥𝑙 𝑘 ∥ 2𝐿2 ≤ ∥𝑙 𝑘 ∥ 2𝐿2 , (𝑛 − 1) log 2 log 2
𝑘 = 0, 1.
Using |𝑙 | ≤ |𝑙 0 | + |𝑙1 | and ∥𝑙 ∥ 2𝐿 2 = ∥𝑙0 ∥ 2𝐿 2 + ∥𝑙1 ∥ 2𝐿 2 , we then get 2 ∥𝑙 ∥ 𝜓1 ≤ ∥𝑙 0 ∥ 𝜓1 + ∥𝑙1 ∥ 𝜓1 ≤ √︁ ∥𝑙 0 ∥ 𝐿2 + ∥𝑙 1 ∥ 𝐿2 log 2 √ √︃ 2 2 ∥𝑙 0 ∥ 2𝐿2 + ∥𝑙 1 ∥ 2𝐿2 ≤ 4 ∥𝑙 ∥ 𝐿2 . ≤ √︁ log 2 The functions 𝑢 𝑘 (𝑘 = 0, 1) are orthogonal to all linear functions in 𝐿 2 (𝔰𝑛−1 ), and by the assumption, | ⟨( 𝑓 ′′ (𝜃) − 𝑎 I𝑛 )𝑣, 𝑤⟩ | ≤ 1,
𝜃 ∈ S𝑛−1 ,
for all 𝑣, 𝑤 ∈ C𝑛 such that |𝑣| ≤ 1 and |𝑤| ≤ 1. Since 𝑓 ′′ (𝜃) = 𝑢 0′′ (𝜃) + 𝑖𝑢 1′′ (𝜃), it 𝑛−1 . Hence, we may apply the inequality (10.10) follows that ∥𝑢 ′′ 𝑘 − 𝑎 𝑘 I𝑛 ∥ ≤ 1 on S of Proposition 10.4.2, which gives ∫ 2 2 (1 + 3𝑏 𝑘 ), 𝑏 𝑘 = ∥𝑢 𝑘 ∥ 𝜓1 ≤ ∥𝑢 ′′ 𝑘 − 𝑎 𝑘 I𝑛 ∥ HS d𝔰 𝑛−1 , 𝑛−1 so that
10.6 Deviations for Some Elementary Polynomials
197
2 ∥ 𝑓 − 𝑙 ∥ 𝜓1 ≤ ∥𝑢 0 ∥ 𝜓1 + ∥𝑢 1 ∥ 𝜓1 ≤ (2 + 3𝑏 0 + 3𝑏 1 ) 𝑛 − 1 ∫ 4 6 = + ∥ 𝑓 ′′ − 𝑎 I𝑛 ∥ 2HS d𝔰𝑛−1 . 𝑛−1 𝑛−1 It remains to combine the two bounds on the linear and non-linear parts by applying the triangle inequality ∥ 𝑓 ∥ 𝜓1 ≤ ∥ 𝑓 − 𝑙 ∥ 𝜓1 + ∥𝑙 ∥ 𝜓1 . Proposition 10.5.2 is proved. □
10.6 Deviations for Some Elementary Polynomials Let us illustrate the first and second order spherical concentration inequalities on the example of elementary polynomials such as 𝑄 𝑝 (𝜃) =
𝑛 ∑︁
𝑎 𝑘 𝜃 𝑘𝑝 ,
𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 , 𝑝 = 1, 2, . . . ,
𝑘=1
with real coefficients 𝑎 𝑘 . For applications to the randomized versions of the central limit theorem, we are mostly interested in the particular integer values 𝑝 = 3 and 𝑝 = 4. To get an idea about the rate of fluctuations of these functions on the unit sphere with respect to 𝔰𝑛−1 for large values of 𝑛 (with natural normalizations of the coefficients 𝑎 𝑘 ), let us look first at their moments, including the moments of the first coordinate on the sphere. Since 𝑄 𝑝 has a symmetric distribution under 𝔰𝑛−1 when 𝑝 ∫ is odd, necessarily 𝑄 𝑝 d𝔰𝑛−1 = 0. As for even integers, we have: Proposition 10.6.1 For any integer 𝑝 ≥ 1, ∫ 𝑄 2 𝑝 d𝔰𝑛−1 In particular, ∫ 𝜃 12 𝑝 d𝔰𝑛−1 (𝜃) =
𝑛 ∑︁ (2𝑝 − 1)!! = 𝑎𝑘 . 𝑛(𝑛 + 2) · · · (𝑛 + 2𝑝 − 2) 𝑘=1
(2𝑝 − 1)!! (2𝑝 − 1)!! . ≤ 𝑛(𝑛 + 2) · · · (𝑛 + 2𝑝 − 2) 𝑛𝑝
(10.15)
This formula follows from the fact that 𝜃 1 , viewed as a random variable on the 𝑛−3 probability space (S𝑛−1 , 𝔰𝑛−1 ), has density 𝑐 𝑛 (1 − 𝑥 2 ) 2 supported on the interval |𝑥| < 1. Using the beta function, this gives ∫
𝜃 12 𝑝 d𝔰𝑛−1 (𝜃) =
𝐵( 𝑝 + 12 ,
𝑛−1 2 ) 1 𝑛−1 𝐵( 2 , 2 )
which is further simplified to (10.15).
=
Γ( 𝑝 + 12 ) Γ( 𝑛2 ) , Γ( 𝑝 + 𝑛2 ) Γ( 12 )
198
10 Second Order Spherical Concentration
For computation of the mutual covariances between even powers of the spherical coordinates, one may use the identity ∫
𝜃 12 𝑝 𝜃 22 𝑝 d𝔰𝑛−1 (𝜃) =
2 (2𝑝 − 1)!! . 𝑛(𝑛 + 2) · · · (𝑛 + 4𝑝 − 2)
(10.16)
This follows from the relation 𝑍 = |𝑍 | 𝜃 and similar formulas E𝑍12 𝑝 = (2𝑝 − 1)!! and E |𝑍 | 2 𝑝 = 2 𝑝
Γ( 𝑝 + 𝑛2 ) = 𝑛(𝑛 + 2) · · · (𝑛 + 2𝑝 − 2), Γ( 𝑛2 )
where 𝑍 = (𝑍1 , . . . , 𝑍 𝑛 ) is a standard normal random vector in R𝑛 independent of the random vector 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) uniformly distributed over the unit sphere. Let 𝑝 = 1. The functional 𝑄 1 is linear and has a Lipschitz semi-norm ∥𝑄 1 ∥ Lip = (𝑎 21 + · · · + 𝑎 2𝑛 ) 1/2 . In this case, Proposition 10.3.1 with 𝑓 = 𝑄 1 correctly describes a subgaussian behavior of the distribution of 𝑄 1 : If 𝑎 21 + · · · + 𝑎 2𝑛 = 1, then 2 𝔰𝑛−1 |𝑄 1 | ≥ 𝑟 ≤ 2 e−(𝑛−1)𝑟 /2 ,
𝑟 > 0,
which corresponds to the short notation |𝑄 1 | ⪯ 𝑐 √|𝑍𝑛| with 𝑍 ∼ 𝑁 (0, 1). Let now 𝑝 = 3. The functional 𝑄 3 has mean zero, with Lipschitz semi-norm ∥𝑄 3 ∥ Lip = 3 max 𝑘 |𝑎 𝑘 |. In this case, if max 𝑘 |𝑎 𝑘 | ≤ 1, the first order concentration inequality (10.6) provides a similar relation 2 𝔰𝑛−1 |𝑄 3 | ≥ 𝑟 ≤ 2 e−(𝑛−1)𝑟 /18 . However, this conclusion may be considerably strengthened with respect to the growing dimension 𝑛. Since 𝑄 3 is not orthogonal to linear functions, we should appeal to the second order concentration inequality of Proposition 10.5.1, which involves functions with non-zero linear parts. To show that this part is sufficiently small for 𝑄 3 , consider the integral ∫∫ ⟨𝑥, 𝑦⟩ 𝑄 3 (𝑥)𝑄 3 (𝑦) d𝔰𝑛−1 (𝑥)d𝔰𝑛−1 (𝑦) 𝐼𝑄3 = =
∫ ∫ ∑︁ 𝑛
𝑥𝑖 𝑦 𝑖
𝑖=1
=
𝑛 ∑︁ 𝑖=1
𝑎 2𝑖
∫
𝑛 ∑︁
𝑎 𝑗 𝑥 3𝑗
𝑗=1
𝑥𝑖4 d𝔰𝑛−1 (𝑥)
𝑛 ∑︁
𝑎 𝑘 𝑦 3𝑘 d𝔰𝑛−1 (𝑥)d𝔰𝑛−1 (𝑦)
𝑘=1
2 =
𝑛 ∑︁ 9 𝑎2 , 𝑛2 (𝑛 + 2) 2 𝑖=1 𝑖
which corresponds to (10.15) with 𝑝 = 2. If max 𝑘 |𝑎 𝑘 | ≤ 1, then 𝐼𝑄3 ≤ 𝑛93 . Now, the second derivative 𝑄 3′′ (𝜃) = (6 𝑎 𝑖 𝜃 𝑖 𝛿𝑖 𝑗 )𝑖,𝑛 𝑗=1 is a diagonal matrix, for which
10.6 Deviations for Some Elementary Polynomials
∥𝑄 3′′ (𝜃) ∥ = 6 max |𝑎 𝑘 𝜃 𝑘 |, 𝑘
199
2 ∥𝑄 3′′ (𝜃) ∥ HS = 36
𝑛 ∑︁
𝑎 2𝑘 𝜃 2𝑘 .
𝑘=1
Hence, the conditions of Proposition 10.5.1 are fulfilled for the function 𝑓 (𝜃) = 1 6 𝑄 3 (𝜃) with parameters 𝑎 = 0 and ∫ 𝑏=
∥
𝑓 ′′ ∥ 2HS d𝔰𝑛−1
1 = 36
∫
∥𝑄 3′′ (𝜃) ∥ 2HS d𝔰𝑛−1 (𝜃) =
𝑛 1 ∑︁ 2 𝑎 ≤ 1. 𝑛 𝑘=1 𝑘
In addition, ∫∫ 𝐼𝑓 =
⟨𝑥, 𝑦⟩ 𝑓 (𝑥) 𝑓 (𝑦) d𝔰𝑛−1 (𝑥) d𝔰𝑛−1 (𝑦) =
so, one may put 𝑏 0 = 12 . Since
1 2(1+𝑏0 +3𝑏)
1 1 𝐼𝑄 ≤ , 36 3 4𝑛3
≥ 19 , from the inequality (10.11), we get:
Corollary 10.6.2 If max 𝑘 |𝑎 𝑘 | ≤ 1, then ∫ n 𝑛 − 1 o 𝑄 3 d𝔰𝑛−1 ≤ 2. exp 9 In particular, for all 𝑟 ≥ 0, 𝔰𝑛−1 {(𝑛 − 1) |𝑄 3 | ≥ 𝑟 } ≤ 2 e−𝑟/9 .
(10.17)
Thus, 𝑄 3 is of order 1/𝑛 with respect to 𝑛, and there is a subexponential behavior of tails of the distribution, i.e., |𝑄 3 | ⪯ 𝑐 ( √|𝑍𝑛| ) 2 . In particular, we have a bound ∥𝑄 3 ∥ 𝐿 2 ≤ 𝑛𝑐 for some absolute constant 𝑐. Note that, under our conditions, this 1 𝑛 -rate cannot be improved, since by (10.15), Var𝔰𝑛−1 (𝑄 3 ) = ∥𝑄 3 ∥ 2𝐿 2 =
∫
𝑄 23 d𝔰𝑛−1 =
𝑛 ∑︁ 15 𝑎2 , 𝑛(𝑛 + 2)(𝑛 + 4) 𝑘=1 𝑘
(10.18)
which dominates ( 𝑛𝑐 ) 2 when, for example, all 𝑎 𝑘 = 1. In fact, at the expense of a worse decay in 𝑟 on the right-hand side of (10.17), one may weaken the assumption on the coefficients in Corollary 10.6.2. Í Proposition 10.6.3 If 𝑛1 𝑛𝑘=1 𝑎 2𝑘 = 1, then for all 𝑟 > 0, o n 1 𝑟 2/3 . 𝔰𝑛−1 {𝑛 |𝑄 3 | ≥ 𝑟 } ≤ 2 exp − 23 Proof We apply the 𝐿 𝑝 -Poincaré-type inequality of Proposition 10.3.2 to the func𝑝−1 ≤ 2𝑛𝑝 and applying Jensen’s inequality for the power tion 𝑓 = 𝑛𝑄 3 . Using 𝑛−1 function, we get, for any 𝑝 ≥ 2,
200
10 Second Order Spherical Concentration
∥ 𝑓 ∥ 𝑝𝑝 ≤ 𝑛 𝑝
2𝑝 𝑝/2
∥∇𝑄 3 ∥ 𝑝𝑝 ∫ ∑︁ 𝑛 𝑝/2 = 𝑛 𝑝/2 (2𝑝) 𝑝/2 3 𝑝 𝑎 2𝑘 𝜃 4𝑘 d𝔰𝑛−1 (𝜃) 𝑛
𝑘=1
∫ 𝑛 ∑︁ 2 𝑝 𝑝/2 𝑝 1 𝑎 |𝜃 𝑘 | 2 𝑝 d𝔰𝑛−1 (𝜃) ≤ 𝑛 (2𝑝) 3 · 𝑛 𝑘=1 𝑘 ∫ = 𝑛 𝑝 (18 𝑝) 𝑝/2 |𝜃 1 | 2 𝑝 d𝔰𝑛−1 (𝜃). If 𝑝 = 𝑚 is an integer, applying the relation (10.15), we get 𝑚/2 ∥ 𝑓 ∥𝑚 (2𝑚 − 1)!! ≤ 18𝑚/2 2𝑚 𝑚 3𝑚/2 , 𝑚 ≤ (18 𝑚)
√ where we used the bound (2𝑚 − 1)!! ≤ (2𝑚) 𝑚 . Thus, ∥ 𝑓 ∥ 𝑚 ≤ 6 2 𝑚 3/2 . At the expense of a larger absolute factor, this inequality can be extended to all real 𝑝 ≥ 2 in place of 𝑚. Indeed, let 𝑚 be the integer such that 𝑚 ≤ 𝑝 < 𝑚 + 1. Then √ ∥ 𝑓 ∥ 𝑝 ≤ ∥ 𝑓 ∥ 𝑚+1 ≤ 6 2 (𝑚 + 1) 3/2 √ √ ≤ 6 2 ( 𝑝 + 1) 3/2 ≤ 9 3 𝑝 3/2 , √ i.e. ∥ 𝑓 ∥ 𝑝𝑝 ≤ (𝑏 𝑝) 3 𝑝/2 with constant 𝑏 = (9 3) 2/3 = 35/3 . By Markov’s inequality, choosing 𝑝 = 21/31 𝑏 𝑟 2/3 (𝑟 > 0), we get 𝔰𝑛−1 {| 𝑓 | ≥ 𝑟 } ≤
n 𝑝 o (𝑏 𝑝) 3 𝑝/2 = exp − log 2 , 𝑝 𝑟 2
provided that 𝑝 ≥ 2. But, in the case 0 < 𝑝 < 2, the above right-hand side is greater than 1/2, so that we have n 𝑝 o 𝔰𝑛−1 {| 𝑓 | ≥ 𝑟 } ≤ 2 exp − log 2 2 for all 𝑝 > 0 (hence for all 𝑟 > 0). It remains to note that 1 2/3 . Proposition 10.6.3 is proved. 23 𝑟
𝑝 2
log 2 =
10.7 Polynomials of Fourth Degree Similar observations can be made about the polynomials 𝑄 4 (𝜃) =
𝑛 ∑︁ 𝑘=1
𝑎 𝑘 𝜃 4𝑘 ,
𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 .
log 2 24/3 35/3
𝑟 2/3 > □
10.7 Polynomials of Fourth Degree
They have 𝔰𝑛−1 -means
3 𝑛+2
201
𝑎, ¯ where 𝑎¯ =
1 (𝑎 1 + · · · + 𝑎 𝑛 ), 𝑛
and Lipschitz semi-norms ∥𝑄 4 ∥ Lip = 4 max 𝑘 |𝑎 𝑘 |. If max 𝑘 |𝑎 𝑘 | ≤ 1, the first order concentration inequality (10.6) provides the relation n 𝔰𝑛−1 𝑄 4 −
o 2 3 𝑎¯ ≥ 𝑟 ≤ 2 e−(𝑛−1)𝑟 /32 𝑛+2
(𝑟 > 0).
3 This corresponds to the notation |𝑄 4 − 𝑛+2 𝑎| ¯ ⪯ 𝑐 √|𝑍𝑛| , which also implies |𝑄 4 | ⪯ 𝑐 √|𝑍𝑛| (since | 𝑎| ¯ ≤ 1). However, this conclusion may be strengthened with the help of Proposition 10.4.2. Indeed, 𝑄 4 is orthogonal in 𝐿 2 (𝔰𝑛−1 ) to all linear functions, while the second derivative 𝑄 4′′ (𝜃) = (12 𝑎 𝑖 𝜃 𝑖2 𝛿𝑖 𝑗 )𝑖,𝑛 𝑗=1 is a diagonal matrix, for which 𝑛 ∑︁ ∥𝑄 4′′ (𝜃) ∥ = 12 max |𝑎 𝑘 𝜃 2𝑘 |, ∥𝑄 4′′ (𝜃) ∥ 2HS = 144 𝑎 2𝑘 𝜃 4𝑘 . 𝑘
𝑘=1
Hence, the condition of Proposition 10.4.2 is fulfilled for the function 𝑓 (𝜃) = 1 12 𝑄 4 (𝜃) with parameters 𝑎 = 0 and 1 𝑏= 144
∫
∥𝑄 4′′ (𝜃) ∥ 2HS d𝔰𝑛−1 (𝜃) =
𝑛 ∑︁ 3 3 3 𝑎2 ≤ ≤ . 𝑛(𝑛 + 2) 𝑘=1 𝑘 𝑛 + 2 4
As a result, from the inequality (10.10) we obtain that ∫ n𝑛 − 1 3 𝑎¯ o exp 𝑄 4 − d𝔰𝑛−1 ≤ 2, 78 𝑛+2 which yields: Corollary 10.7.1 If max 𝑘 |𝑎 𝑘 | ≤ 1, then for all 𝑟 ≥ 0, o n 3 𝑎¯ 𝔰𝑛−1 (𝑛 − 1) 𝑄 4 − ≥ 𝑟 ≤ 2 e−𝑟/78 . 𝑛+2 In particular, 𝔰𝑛−1 {(𝑛 − 1) |𝑄 4 | ≥ 𝑟 } ≤ 3 e−𝑟/78 .
(10.19)
Here the second bound follows from the first one, by noting that | 𝑎| ¯ ≤ 1, so that 3 𝑎¯ | ≥ 𝑟 − 3(𝑛−1) > 0 for 𝑟 ≥ 3. Hence, by (𝑛 − 1) |𝑄 4 | ≥ 𝑟 implies (𝑛 − 1) |𝑄 4 − 𝑛+2 𝑛+2 Markov’s inequality, 1
𝔰𝑛−1 {(𝑛 − 1) |𝑄 4 | ≥ 𝑟 } ≤ 2e− 78
(𝑟− 3(𝑛−1) 𝑛+2 )
≤ 2e3/78 e−𝑟/78 .
In the case 𝑟 ≤ 3, (10.19) is immediate, since the right-hand side is greater than 1.
202
10 Second Order Spherical Concentration
Thus, 𝑄 4 is also of order 1/𝑛 with respect to 𝑛, and we have a subexponential behavior of tails, i.e., |𝑄 4 | ⪯ 𝑐 ( √|𝑍𝑛| ) 2 . The deviations of 𝑄 4 around its mean are even smaller. To see this, let us look at the variance Var𝔰𝑛−1 (𝑄 4 ) =
𝑛 ∑︁
𝑎 2𝑘 Var𝔰𝑛−1 (𝜃 4𝑘 ) +
∑︁
𝑎 𝑖 𝑎 𝑗 Cov𝔰𝑛−1 (𝜃 𝑖4 , 𝜃 4𝑗 )
𝑖≠ 𝑗
𝑘=1
= Var𝔰𝑛−1 (𝜃 14 )
𝑛 ∑︁
𝑎 2𝑘 + Cov𝔰𝑛−1 (𝜃 14 , 𝜃 24 )
= 𝐴𝑛
𝑎 2𝑘
+ 𝐵𝑛
𝑘=1
𝑎𝑖 𝑎 𝑗
𝑖≠ 𝑗
𝑘=1 𝑛 ∑︁
∑︁
∑︁ 𝑛
2 𝑎𝑘 ,
𝑘=1
where we used the property that the measure 𝔰𝑛−1 is invariant under permutations of coordinates. Here, the coefficients are given by ∫ ∫ 𝐴𝑛 = 𝜃 18 d𝔰𝑛−1 (𝜃) − 𝜃 14 𝜃 24 d𝔰𝑛−1 (𝜃), ∫ ∫ 2 𝐵𝑛 = 𝜃 14 𝜃 24 d𝔰𝑛−1 (𝜃) − 𝜃 14 d𝔰𝑛−1 (𝜃) . To simplify, one may apply the formulas (10.15)–(10.16), from which we find that ∫ ∫ 9 3 4 , 𝜃 14 𝜃 24 d𝔰𝑛−1 (𝜃) = 𝜃 1 d𝔰𝑛−1 (𝜃) = , 𝑛(𝑛 + 2) 𝑛(𝑛 + 2)(𝑛 + 4) (𝑛 + 6) and
∫
𝜃 18 d𝔰𝑛−1 (𝜃) =
105 , 𝑛(𝑛 + 2)(𝑛 + 4) (𝑛 + 6)
so that 𝐴𝑛 =
96 , 𝑛(𝑛 + 2)(𝑛 + 4) (𝑛 + 6)
𝐵𝑛 = −
𝑛2 (𝑛
72 (𝑛 + 3) . + 2) 2 (𝑛 + 4) (𝑛 + 6)
Since 𝐵𝑛 ≤ 0, it follows that Var𝔰𝑛−1 (𝑄 4 ) ≤
96 ¯2 𝑎 𝑛3
𝑛
1 ∑︁ 2 where 𝑎¯2 = 𝑎 . 𝑛 𝑘=1 𝑘
This inequality suggests that the function 𝑄 4 − have some nice applications.
3 𝑎¯ 𝑛+2
(10.20)
is of order 𝑛−3/2 , which should
10.7 Polynomials of Fourth Degree
Proposition 10.7.2 If
1 𝑛
Í𝑛 𝑘=1
203
𝑎 2𝑘 = 1, then for all 𝑟 ≥ 0,
o n n 1√ o 3 𝑎¯ 𝑟 . 𝔰𝑛−1 𝑛3/2 𝑄 4 − ≥ 𝑟 ≤ 2 exp − 𝑛+2 38
(10.21)
3 𝑎¯ For short, this assertion may be written as the dominance |𝑄 4 − 𝑛+2 | ⪯ 𝑐𝑍 4 𝑛−3/2 , where 𝑍 ∼ 𝑁 (0, 1).
Proof We use an argument which is similar to the one from the proof of Proposition 3 𝑎¯ ) together 10.6.3. Applying Proposition 10.3.2 to the function 𝑓 = 𝑛3/2 (𝑄 4 − 𝑛+2 with Jensen’s inequality, we have that, for any 𝑝 ≥ 2, ∥ 𝑓 ∥ 𝑝𝑝 ≤ 𝑛3 𝑝/2
2𝑝 𝑝/2
∥∇𝑄 4 ∥ 𝑝𝑝 𝑛 𝑛 2𝑝 𝑝/2 ∫ ∑︁ 𝑝/2 4𝑝 𝑎 2𝑘 𝜃 6𝑘 d𝔰𝑛−1 (𝜃) = 𝑛3 𝑝/2 𝑛 𝑘=1 ∫ 𝑛 𝑝/2 ∑︁ 2 𝑝 2𝑝 𝑝 1 2 4 ≤𝑛 𝑎 |𝜃 𝑘 | 3 𝑝 d𝔰𝑛−1 (𝜃) 𝑛 𝑛 𝑘=1 𝑘 ∫ = 𝑛3 𝑝/2 (32 𝑝) 𝑝/2 |𝜃 1 | 3 𝑝 d𝔰𝑛−1 (𝜃).
Let us replace 𝑝 with 2𝑚 assuming that 𝑚 ≥ 1 is an integer. By (10.15), we get √ 2 2𝑚 𝑚 ∥ 𝑓 ∥ 2𝑚 2𝑚 ≤ (64 𝑚) (6𝑚 − 1)!! ≤ (48 6 𝑚 ) , √ where we used (2𝑘 − 1)!! ≤ (2𝑘) 𝑘 . Hence ∥ 𝑓 ∥ 2𝑚 ≤ 48 6 𝑚 2 . To extend this inequality to real 𝑝 ≥ 2 (although with a large absolute factor), pick up an integer 𝑚 such that 2𝑚 ≤ 𝑝 < 2(𝑚 + 1). Then √ ∥ 𝑓 ∥ 𝑝 ≤ ∥ 𝑓 ∥ 2(𝑚+1) ≤ 48 6 (𝑚 + 1) 2 √ √ ≤ 48 6 𝑝 2 = (𝑏 𝑝) 2 𝑝 , 𝑏 = (48 6) 1/2 . √ By Markov’s inequality, given 𝑟 > 0 and choosing 𝑝 = 21/41 𝑏 𝑟, we get 𝔰𝑛−1 {| 𝑓 | ≥ 𝑟 } ≤
n 𝑝 o (𝑏 𝑝) 2 𝑝 = exp − log 2 𝑝 𝑟 2
provided that 𝑝 ≥ 2. In the case 0 < 𝑝 < 2, the right-hand side is greater than 1/2, so that n 𝑝 o 𝔰𝑛−1 {| 𝑓 | ≥ 𝑟 } ≤ 2 exp − log 2 2 for all 𝑝 > 0. It remains to note that
𝑝 2
log 2 >
1 38
𝑟 1/2 , thus proving (10.18).
□
204
10 Second Order Spherical Concentration
10.8 Large Deviations for Weighted ℓ 𝒑 -norms In the case where the coefficients 𝑎 𝑘 are bounded, the inequality (10.21) may be further sharpened for relatively small values of 𝑟. Consider the functionals 𝑙 𝑝 (𝜃) =
𝑛 ∑︁
𝑎 𝑘 |𝜃 𝑘 | 𝑝 ,
𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 ,
𝑘=1
with 𝑎 𝑘 ≥ 0, so that 𝑄 𝑝 = 𝑙 𝑝 when 𝑝 is an even integer. Proposition 10.8.1 If all 𝑎 𝑘 ≤ 1, then for all 𝑟 ≥ 1 and 𝑝 > 2, n 𝑝−2 o 𝔰𝑛−1 𝑛 2 𝑙 𝑝 ≥ 𝑐 𝑝 𝑟 ≤ exp − (𝑟𝑛) 2/ 𝑝
(10.22)
√ where one may take 𝑐 𝑝 = ( 𝑝 + 2) 𝑝 . Correspondingly, if 𝑎 1 + · · · + 𝑎 𝑛 ≤ 1, then o n 𝑝 𝔰𝑛−1 𝑛 2 𝑙 𝑝 ≥ 𝑐 𝑝 𝑟 ≤ 2 exp − 𝑟 2/ 𝑝 (10.23) with 𝑐 𝑝 = (6𝑝) 𝑝/2 . Note that (10.22) with 𝑝 = 4 cannot be obtained on the basis of (10.21). Indeed, since | 𝑎| ¯ ≤ 1 and using 𝑟 − 3 ≥ 14 𝑟 for 𝑟 ≥ 4, (10.21) only gives n 3 𝑎¯ 1 √ o 𝔰𝑛−1 {𝑛 |𝑄 4 | ≥ 𝑟} ≤ 𝔰𝑛−1 𝑛3/2 𝑄 4 − ≥ 𝑟 𝑛 𝑛+2 4 n o √ 1/4 ≤ exp − 𝑐 𝑟 𝑛 . Proof For simplicity, we assume that 𝑝 ≥ 3 is an integer. In the first assertion, one may also assume that all 𝑎 𝑘 = 1, in which case 𝑙 𝑝 (𝜃) = |𝜃 1 | 𝑝 + · · · + |𝜃 𝑛 | 𝑝 . We apply the spherical concentration inequality (10.6) to the ℓ 𝑛𝑝 -norm 𝑓 (𝑥) = ∥𝑥∥ 𝑝 =
∑︁ 𝑛
1/ 𝑝 |𝑥 𝑘 |
𝑝
,
𝑥 = (𝑥1 , . . . , 𝑥 𝑛 ) ∈ R𝑛 .
𝑘=1
Since ∥𝑥∥ 𝑝 ≤ ∥𝑥∥ 2 = |𝑥|, it has Lipschitz semi-norm 1, and thus ∫ n o 2 𝔰𝑛−1 𝑓 ≥ 𝑓 d𝔰𝑛−1 + 𝑡 ≤ e−𝑛𝑡 /4 for all 𝑡 ≥ 0, which implies that n
𝑝 𝔰𝑛−1 𝑙 1/ 𝑝
≥
∫
𝑙 𝑝 d𝔰𝑛−1
1/ 𝑝
o 2 + 𝑡 ≤ e−𝑛𝑡 /4 .
(10.24)
10.8 Large Deviations for Weighted ℓ 𝑝 -norms
205
Using the formula (10.15), one may derive the bound ∫ 𝑝−2 𝑙 𝑝 d𝔰𝑛−1 < 𝐴 𝑝 𝑛− 2 , 𝐴 𝑝 = 𝑝 𝑝/2 .
(10.25)
Indeed, for even powers 𝑝 = 2𝑘, 𝑘 = 2, 3, . . . , it implies ∫ 𝑝−2 (2𝑘 − 1)!! < 2 𝑘 𝑘! 𝑛−(𝑘−1) < (2𝑘) 𝑘 𝑛−(𝑘−1) = 𝑝 𝑝/2 𝑛− 2 . 𝑙2𝑘 d𝔰𝑛−1 < 𝑛 𝑘−1 Here, the resulting bound also holds for 𝑝 = 2𝑘 − 1. This can be shown by using 𝑝−2) the property that the function 𝑝 → 𝑙 1/( is non-decreasing in 𝑝 > 2 (which is a 𝑝 2𝑘−3 2𝑘−2 and particular case of Proposition 4.2.1 in view of 𝑙2 = 1). It gives 𝑙 2𝑘−1 ≤ 𝑙2𝑘
∫
∫ 𝑙 2𝑘−1 d𝔰𝑛−1 ≤
𝜃 12𝑘
2𝑘−3 2𝑘−2 d𝔰𝑛−1 (𝜃) 2𝑘−3
2𝑘−3
< (2 𝑘 𝑘! 𝑛−(𝑘−1) ) 2𝑘−2 = (2 𝑘 𝑘!) 2𝑘−2 𝑛− < (2𝑘 − 1)
𝑘 (2𝑘−3) 2𝑘−2
𝑛−
𝑝−2 2
< (2𝑘 − 1)
𝑝−2 2
2𝑘−1 2
𝑛−
𝑝−2 2
,
where we used the simple inequality 2 𝑘 𝑘! < (2𝑘 − 1) 𝑘 . Thus, (10.25) is derived, and returning to (10.24), we obtain a deviation inequality o n 2 √ − 𝑝−2 𝑝 2𝑝 + 𝑡 𝑝 ≤ e−𝑛𝑡 /4 . 𝔰𝑛−1 𝑙 1/ ≥ 𝑛 𝑝 𝑝−2
Here, the choice 𝑡 = 2𝑟 1/ 𝑝 𝑛− 2 𝑝 leads to o n 𝑝−2 n o √ 𝑝 ≥ 𝑝 + 2𝑟 1/ 𝑝 ≤ exp − (𝑟𝑛) 2/ 𝑝 , 𝔰𝑛−1 𝑛 2 𝑝 𝑙 1/ 𝑝 that is,
o n 𝑝−2 n o √ 𝔰𝑛−1 𝑛 2 𝑙 𝑝 ≥ ( 𝑝 + 2𝑟 1/ 𝑝 ) 𝑝 ≤ exp − (𝑟𝑛) 2/ 𝑝 .
√ √ Since 𝑟 ≥ 1, we have ( 𝑝 + 2𝑟 1/ 𝑝 ) 𝑝 ≤ ( 𝑝 + 2) 𝑝 𝑟, and the inequality (10.22) follows. Turning to the second assertion, consider the function 𝑓 = 𝑛 𝑝/2 𝑙 𝑝 . We apply (10.25), which may be rewritten, using 𝐿 𝑝 -norms over the measure 𝔰𝑛−1 , as the bound ∥𝜃 1 ∥ 𝑝 ≤ ( 𝑛𝑝 ) 1/2 . By the triangle inequality in 𝐿 𝑚 with an integer 𝑚 ≥ 1, this gives 𝑛 ∑︁ 𝑝 𝑝/2 ∥ 𝑓 ∥ 𝑚 ≤ 𝑛 𝑝/2 𝑎 𝑘 ∥ |𝜃 𝑘 | 𝑝 ∥ 𝑚 ≤ 𝑛 𝑝/2 ∥𝜃 1 ∥ 𝑚 . 𝑝 ≤ (𝑚 𝑝) 𝑘=1
Hence, by Markov’s inequality, for any 𝑡 > 0, 𝔰𝑛−1 { 𝑓 ≥ 𝑡} ≤ 𝑡 −𝑚 ∥ 𝑓 ∥ 𝑚 𝑚 ≤
(𝑚 𝑝) 𝑝/2 𝑡
𝑚 .
206
10 Second Order Spherical Concentration
Suppose that 𝑡 ≥ 2 (2𝑝) 𝑝/2 and consider the largest integer 𝑚 such that i.e., 𝑚 ≤ 𝑚 𝑡 = 𝑝1 ( 2𝑡 ) 2/ 𝑝 , so that
(𝑚 𝑝) 𝑝/2 𝑡
≤ 12 ,
𝔰𝑛−1 { 𝑓 ≥ 𝑡} ≤ 2−𝑚 . Since 𝑚 𝑡 ≥ 2, necessarily 𝑚 ≥
1 2
𝑚 𝑡 , and the above large deviation bound yields
n log 2 o 𝑚𝑡 𝔰𝑛−1 { 𝑓 ≥ 𝑡} ≤ exp − 2 n n log 2 2/ 𝑝 o 1 2/ 𝑝 o = exp − 𝑡 ≤ exp 𝑡 , − 6𝑝 2𝑝 · 22/ 𝑝
𝑡 ≥ 2 (2𝑝) 𝑝/2 ,
where we used the assumption 𝑝 > 2. For 𝑡 < 2 (2𝑝) 𝑝/2 , the above right-hand side is larger than 12 . So, this inequality is fulfilled automatically with an additional factor 2, that is, n 1 2/ 𝑝 o 𝑡 𝔰𝑛−1 { 𝑓 ≥ 𝑡} ≤ 2 exp − 6𝑝 for all 𝑡 > 0. It remains to change the variable 𝑡 = (6𝑝) 𝑝/2 𝑟, and then we arrive at the desired relation (10.23). Proposition 10.8.1 is proved. □ Using ( 𝐴31/3 + 2) 3 < 33 and ( 𝐴41/4 + 2) 4 < 121, we get in (10.22) for the particular cases 𝑝 = 3 and 𝑝 = 4 that, for all 𝑟 ≥ 1, 𝔰𝑛−1 {𝑛1/2 |𝑙3 | ≥ 33 𝑟} ≤ exp − (𝑟𝑛) 2/3 , (10.26) 1/2 𝔰𝑛−1 {𝑛 |𝑙4 | ≥ 121 𝑟} ≤ exp − (𝑟𝑛) . (10.27)
10.9 Remarks The material of this chapter is mainly based on the papers [36] (Sections 10.1–10.5 and [35] (Sections 10.6–10.8). An interesting question we did not discuss in this chapter concerns the asymptotic behavior of smooth symmetric functions on the 𝑛-sphere for large 𝑛. They may be asymptotically expressed in terms of polynomials of the functions 𝑄 𝑝 of this chapter (with identical coefficients 𝑎 𝑘 = 1). If the symmetric functions are centered in the sense that they do not depend on 𝑄 1 , the approximation resembles Edgeworth-type expansions in the central limit theorem. A typical asymptotic expansion of first order of this type will be an affine function of 𝑄 3 with an error term of order 𝑄 4 to which exponential bounds apply. For details we refer the interested reader to [95], [96] (and there Theorem 2.2 with order 𝑠 = 2). In these papers it is shown that this scheme provides a universal structure for all Edgeworth-type expansions.
Chapter 11
Linear Functionals on the Sphere
This chapter contains a more detailed analysis of distributions of linear functionals with respect to the normalized Lebesgue measure 𝔰𝑛−1 on the unit sphere S𝑛−1 ⊂ R𝑛 (𝑛 ≥ 2). The aim is in particular to quantify the asymptotic normality of these distributions and to include dimensional refinements of such approximation in analogy with Edgeworth expansions (which however we consider up to order 2).
11.1 First Order Normal Approximation By the rotational invariance of the measure 𝔰𝑛−1 , all linear functionals 𝑓 (𝜃) = ⟨𝜃, 𝑣⟩ with |𝑣| = 1 have equal distributions under 𝔰𝑛−1 . Hence, it is sufficient to focus just on the first coordinate 𝜃 1 of the vector 𝜃 ∈ S𝑛−1 viewed as a random variable on the probability space (S𝑛−1 , 𝔰𝑛−1 ). As already mentioned before, this random variable has density Γ( 𝑛2 ) 𝑛−3 𝑐 𝑛 1 − 𝑥 2 + 2 , 𝑥 ∈ R, 𝑐 𝑛 = √ 𝜋 Γ( 𝑛−1 2 ) with respect to the Lebesgue measure on the real line, where 𝑐 𝑛 is a normalizing √ constant. We denote by 𝜑 𝑛 the density of the normalized first coordinate 𝑛 𝜃 1 , i.e., 𝑥 2 𝑛−3 2 𝜑 𝑛 (𝑥) = 𝑐 𝑛′ 1 − , 𝑛 + We have 𝑐 𝑛′ → and
√1 , 2𝜋
𝑐 𝑛′
0, |𝜑 𝑛 (𝑥) − 𝜑(𝑥)| ≤ Moreover, in the interval |𝑥| ≤
𝑐 −𝑥 2 /4 e . 𝑛
1√ 2 𝑛,
|𝜑 𝑛 (𝑥) − 𝜑(𝑥)| ≤
2 𝑐 (1 + 𝑥 4 ) e−𝑥 /2 . 𝑛
As for the normalizing constants in the definition of 𝜑 𝑛 , let us show that 𝑐 𝑛′ < Equivalently, we need to prove that 𝑛 − 1 √︂ 𝑛 𝑛 0 with a log-concave distribution, the function 𝑢(𝑥) = log E 𝜉 𝑥 − 𝑣(𝑥) is concave in 𝑥 > 0, where 𝑣(𝑥) = 𝑥 log 𝑥. If 𝜉 has a standard exponential distribution, we have 𝑢(𝑥) = log Γ(𝑥 + 1) − 𝑣(𝑥), so, by Jensen’s inequality, 1 1 1 1 −𝑣 𝑥− ≥ log Γ(𝑥 + 1) − 𝑣(𝑥) + log Γ(𝑥) − 𝑣(𝑥 − 1) log Γ 𝑥 + 2 2 2 2 for any 𝑥 ≥ 1. Hence, putting Δ𝑣(𝑥) = Γ(𝑥) = Γ(𝑥 + 1)/𝑥, we get
1 2
𝑣(𝑥) +
1 2
𝑣(𝑥 − 1) − 𝑣(𝑥 − 12 ) and using
1 1 1 − log(𝑥 + 1) ≤ Δ𝑣(𝑥) − log(1 + 1/𝑥) ≡ 𝜓(𝑥). log Γ(𝑥 + 1) − log Γ 𝑥 + 2 2 2 Let us show that 𝜓(𝑥) is negative in 𝑥 > 1. We have 𝜓(1) = 0, while 𝜓 ′′ (𝑥) =
𝑃(𝑥) 4𝑥 2 (𝑥
− 1) (𝑥 − 12 ) (𝑥 + 1) 2
,
𝑃(𝑥) = −3𝑥 3 + 6𝑥 2 + 2𝑥 − 1.
By the Taylor integral formula, Δ𝑣(𝑥) → 0 as 𝑥 → ∞, so, 𝜓(∞)√ = 0. The quadratic equation 𝑃 ′ (𝑥) = 0 has two roots 𝑥1,2 = 13 ± 13 3 < 1, so that 𝑃(𝑥) is decreasing for 𝑥 > 1. Since 𝑃(1) = 4 and 𝑃(∞) = −∞, there is a unique point 𝑥0 > 1 such that 𝑃(𝑥) > 0 in 1 ≤ 𝑥 < 𝑥0 and 𝑃(𝑥) < 0 in 𝑥 > 𝑥 0 . Hence, 𝜓 is convex
210
11 Linear Functionals on the Sphere
in [1, 𝑥0 ] and is concave in [𝑥0 , ∞), implying that 𝜓(𝑥) < 0 for 𝑥 > 1. In particular, 1 √ Γ(𝑥 + 1) < Γ 𝑥 + 𝑥 + 1, 2 which is only needed for 𝑥 =
𝑛 2
𝑥 ≥ 1,
− 1.
11.2 Second Order Approximation We now sharpen Proposition 11.1.2 by providing approximations for 𝜑 𝑛 (𝑥) with errors of order 1/𝑛2 by means of a √ suitable modification of the standard normal density. Assuming again that |𝑥| ≤ 12 𝑛 and 𝑛 ≥ 3, let us refine Taylor’s expansion 2 𝑛−3 considered before. Namely, for 𝑝 𝑛 (𝑥) = (1 − 𝑥𝑛 ) 2 with some 0 ≤ 𝜀 ≤ 1, we have 𝑥2 𝑛−3 log 1 − 2 𝑛 2 ∞ 4 𝑛−3 𝑥 𝑥 𝑥 2 3 ∑︁ = + 2+ 2 𝑛 2𝑛 𝑛 𝑘=3 𝑛 − 3 𝑥2 𝑥4 𝑥 6 2𝜀 = + 2+ 3 = 2 𝑛 2𝑛 𝑛 3
− log 𝑝 𝑛 (𝑥) = −
1 𝑥 2 𝑘−3 𝑘 𝑛 𝑥 2 3𝑥 2 𝑥 4 1 3 4 𝑛 − 3 6 − + − 2 𝑥 − 𝑥 𝜀 . 2 2𝑛 4𝑛 𝑛 4 3𝑛
The remainder term 𝛿=− 2
satisfies 𝛿 ≥ − 3𝑥 2𝑛 +
𝑥4 4𝑛
−
1 3 4 𝑛 − 3 6 3𝑥 2 𝑥 4 + − 2 𝑥 − 𝑥 𝜀 2𝑛 4𝑛 𝑛 4 3𝑛
3𝑥 4 4𝑛2
> − 27 64 . Hence
|e− 𝛿 − 1 + 𝛿| ≤ Moreover, using 𝑥 2 ≤
1 4
𝛿2 27/64 e ≤ 𝛿2 . 2
𝑛, we get
1 3 4 1 6 3𝑥 2 𝑥 4 + + 2 𝑥 + 𝑥 2𝑛 4𝑛 𝑛 4 3 3𝑥 2 𝑥 4 1 3 2 1 4 𝑥 2 27 1 2 ≤ + + 𝑥 + 𝑥 = + 𝑥 . 2𝑛 4𝑛 4𝑛 4 3 𝑛 16 3
|𝛿| ≤
Using (𝑎 + 𝑏) 2 ≤ 2𝑎 2 + 2𝑏 2 , we obtain that 𝛿2 ≤
𝑥4 𝑛2
(6 + 29 𝑥 4 ). Hence,
11.2 Second Order Approximation
e𝑥
2 /2
211
𝑝 𝑛 (𝑥) = e− 𝛿 = 1 − 𝛿 + 𝜀1 𝛿2 3𝑥 2 𝑥 4 1 3 4 𝑛 − 3 6 𝑥4 2 = 1+ − + 2 𝑥 − 𝑥 𝜀 + 𝜀2 2 6 + 𝑥 4 2𝑛 4𝑛 𝑛 4 3𝑛 9 𝑛 3𝑥 2 𝑥 4 𝐴 = 1+ − + , 2𝑛 4𝑛 𝑛2
for some |𝜀 𝑗 | ≤ 1. Here 3 𝑛 − 3 6 2 𝑥 𝜀 + 𝑥 4 6 + 𝑥 4 | 𝐴| ≤ 𝑥 4 − 4 3𝑛 9 3 4 1 6 2 4 1 2 4 ≤ 𝑥 + 𝑥 + 𝑥 6 + 𝑥 ≤ 7𝑥 4 + 𝑥 6 + 𝑥 8 ≤ 8𝑥 4 + 𝑥 8 . 4 3 9 3 9 One can summarize. Lemma 11.2.1 If 𝑛 ≥ 3, then in the interval |𝑥| ≤ 𝑝 𝑛 (𝑥) = e−𝑥
2 /2
h
1+
1√ 2 𝑛,
i 6𝑥 2 − 𝑥 4 𝜀 + 2 8𝑥 4 + 𝑥 8 , 4𝑛 𝑛
|𝜀| ≤ 1.
We are now ready to derive a similar expansion for 𝜑 𝑛 (𝑥), expressing its deviations from the standard normal density 𝜑(𝑥) up to the quadratic term 1/𝑛2 . Denoting by 𝑍 a standard normal random variable, from Lemma 11.2.1 we obtain that ∫ ∞ 1 1 1 3 1 𝑝 𝑛 (𝑥) d𝑥 = 1 + 6 E𝑍 2 − E𝑍 4 + 𝑂 2 = 1 + +𝑂 2 . √ 4𝑛 4𝑛 𝑛 𝑛 2𝜋 −∞ √ Here we used the property that 𝑝 𝑛 (𝑥) has a sufficiently fast decay for |𝑥| ≥ 12 𝑛, as indicated in Lemma 11.1.1. Since 𝜑 𝑛 (𝑥) = 𝑐 𝑛′ 𝑝 𝑛 (𝑥) is a density, we conclude that 1 1 √ √ 3 3 + 𝑂 2 , 𝑐 𝑛′ 2𝜋 = 1 − +𝑂 2 . 1 = 𝑐 𝑛′ 2𝜋 1 + 4𝑛 4𝑛 𝑛 𝑛 √ From this, by Lemma 11.2.1, for |𝑥| ≤ 12 𝑛, 1 h i √ 2 3 𝜀 6𝑥 2 − 𝑥 4 2𝜋 e 𝑥 /2 𝜑 𝑛 (𝑥) = 1 − +𝑂 2 + 2 8𝑥 4 + 𝑥 8 1+ 4𝑛 4𝑛 𝑛 𝑛 1 + 𝑥8 3 6𝑥 2 − 𝑥 4 − +𝑂 . = 1+ 4𝑛 4𝑛 𝑛2 Using the 4-th Chebyshev–Hermite polynomial 𝐻4 (𝑥) = 𝑥 4 − 6𝑥 2 + 3, we arrive at: √ √ Proposition 11.2.2 In the interval |𝑥| ≤ 12 𝑛, 𝑛 ≥ 3, the random variable 𝜃 1 𝑛 has a density satisfying h 1 + 𝑥8 i 𝐻4 (𝑥) +𝐶 , 𝜑 𝑛 (𝑥) = 𝜑(𝑥) 1 − 4𝑛 𝑛2 where the quantity 𝐶 is bounded in absolute value by a universal constant.
212
11 Linear Functionals on the Sphere
Taking into account Lemma 11.1.1, we also have a non-uniform Gaussian bound on the whole real line. Proposition 11.2.3 For all 𝑥 ∈ R, 𝑛 ≥ 3, with some universal constant 𝐶 > 0, 𝐻4 (𝑥) 𝐶 −𝑥 2 /4 . 𝜑 𝑛 (𝑥) − 𝜑(𝑥) 1 − ≤ 2e 4𝑛 𝑛
11.3 Characteristic Function of the First Coordinate In the sequel we use the following notation. Definition. Denote by 𝐽𝑛 = 𝐽𝑛 (𝑡) the characteristic function of the first coordinate 𝜃 1 of a random vector 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) which is uniformly distributed on S𝑛−1 . In a more explicit form, for any 𝑡 ∈ R, ∫ ∞ 𝑛−3 𝐽𝑛 (𝑡) = 𝑐 𝑛 e𝑖𝑡 𝑥 (1 − 𝑥 2 )+ 2 d𝑥 −∞ ∫ ∞ √ 𝑥 2 𝑛−3 2 = 𝑐 𝑛′ e𝑖𝑡 𝑥/ 𝑛 1 − d𝑥, 𝑛 + −∞
Γ( 𝑛2 ) 𝑐𝑛 . 𝑐 𝑛′ = √ , 𝑐 𝑛 = √ 𝑛 𝜋 Γ( 𝑛−1 2 )
Note that the equality 𝐽e𝜈 (𝑡) = √
𝑡 𝜈 ∫
1 𝜋 Γ(𝜈 +
1 2)
2
1
1
e𝑖𝑡 𝑥 (1 − 𝑥 2 ) 𝜈− 2 d𝑥
−1
defines the Bessel function of the first kind with index 𝜈 (cf. [11], p. 81). Therefore, 𝐽𝑛 (𝑡) =
1 𝑡 −𝜈 e 1√ 𝜋Γ 𝜈+ 𝐽𝜈 (𝑡) 𝑐𝑛 2 2
with 𝜈 =
𝑛 − 1. 2
Since this relationship will not be needed, we prefer to use the similar notation 𝐽𝑛 as stated in Definition. √ Thus, the characteristic function of the normalized first coordinate 𝜃 1 𝑛 is given by ∫ ∞ √ e𝑖𝑡 𝑥 𝜑 𝑛 (𝑥) d𝑥, 𝜑ˆ 𝑛 (𝑡) = 𝐽𝑛 𝑡 𝑛 = −∞
which is the Fourier transform of the probability density 𝜑 𝑛 . Propositions 11.1.2 and 11.2.3 can be used, respectively, to compare 𝜑ˆ 𝑛 (𝑡) with 2 the Fourier transform e−𝑡 /2 of the standard normal distribution and with the Fourier transform of the “corrected Gaussian measure”, as well as to compare the derivatives of these transforms. Indeed, in general, if we have two integrable functions on the real line, say, 𝑝 and 𝑞, their Fourier transforms
11.3 Characteristic Function of the First Coordinate
∫
213
∞
e𝑖𝑡 𝑥 𝑝(𝑥) d𝑥,
𝑝(𝑡) ˆ =
∞
∫
e𝑖𝑡 𝑥 𝑞(𝑥) d𝑥
𝑞(𝑡) ˆ =
−∞
−∞
satisfy, for all 𝑡 ∈ R, ∞
∫
| 𝑝(𝑥) − 𝑞(𝑥)| d𝑥.
ˆ − 𝑞(𝑡)| ˆ ≤ | 𝑝(𝑡) −∞
In Propositions 11.1.2 and 11.2.3, the density 𝑝 = 𝜑 𝑛 is respectively approximated by the functions and
𝑞(𝑥) = 𝜑(𝑥)
𝐻4 (𝑥) 𝑞(𝑥) = 𝜑(𝑥) 1 − . 4𝑛
Since ∫
∞ 𝑖𝑡 𝑥
e
−𝑡 2 /2
𝜑(𝑥) d𝑥 = e
−∞
∫
∞
,
e𝑖𝑡 𝑥 𝐻4 (𝑥)𝜑(𝑥) d𝑥 = 𝑡 4 e−𝑡
2 /2
,
−∞
the associated Fourier transforms are described as 𝑞(𝑡) ˆ = e−𝑡
2 /2
,
𝑞(𝑡) ˆ = e−𝑡
2 /2
1−
𝑡4 . 4𝑛
Therefore, applying the bounds in these propositions, we respectively get ∫ ∞ ∫ ∞ 2 2 𝐶 𝐶 e−𝑥 /4 d𝑥, | 𝑝(𝑡) e−𝑥 /4 d𝑥. | 𝑝(𝑡) ˆ − 𝑞(𝑡)| ˆ ≤ ˆ − 𝑞(𝑡)| ˆ ≤ 2 𝑛 −∞ 𝑛 −∞ Let us state this once more. Proposition 11.3.1 For all 𝑡 ∈ R, we have 𝐶 √ 2 𝐽𝑛 𝑡 𝑛 − e−𝑡 /2 ≤ , 𝑛 where 𝐶 is an absolute constant. Moreover, √ 𝑡 4 −𝑡 2 /2 𝐶 e ≤ 2. 𝐽𝑛 𝑡 𝑛 − 1 − 4𝑛 𝑛 ∫∞ Returning to the general scheme, let us note that, when −∞ |𝑥| 𝑘 𝑝(𝑥) d𝑥 < ∞ and ∫∞ |𝑥| 𝑘 𝑞(𝑥) d𝑥 < ∞, the functions 𝑝ˆ and 𝑞ˆ are 𝑘 times continuously differentiable, −∞ and their derivatives of the 𝑘-th order are given by ∫ ∞ d𝑘 𝑝(𝑡) ˆ = e𝑖𝑡 𝑥 (𝑖𝑥) 𝑘 𝑝(𝑥) d𝑥 d𝑡 𝑘 −∞ and similarly for 𝑞. Hence
214
11 Linear Functionals on the Sphere
d𝑘 ∫ ∞ d𝑘 ˆ − 𝑘 𝑞(𝑡) ˆ ≤ |𝑥| 𝑘 | 𝑝(𝑥) − 𝑞(𝑥)| d𝑥. 𝑘 𝑝(𝑡) d𝑡 d𝑡 −∞ Since
∫
∞
|𝑥| 𝑘 e−𝑥
2 /4
d𝑥 = 2 𝑘+1 Γ
−∞
𝑘 + 1 ≤ (𝑐𝑘) 𝑘/2 2
for some absolute constant 𝑐 > 0, the statement of Proposition 11.3.1 may be extended to all derivatives. Proposition 11.3.2 For all 𝑘 = 0, 1, . . . and 𝑡 ∈ R, (𝐶 𝑘) 𝑘/2 d𝑘 √ d𝑘 2 , 𝑘 𝐽𝑛 𝑡 𝑛 − 𝑘 e−𝑡 /2 ≤ 𝑛 d𝑡 d𝑡 where 𝐶 is an absolute constant. Moreover, 𝑘 d √ d 𝑘 𝑡 4 −𝑡 2 /2 (𝐶 𝑘) 𝑘/2 . ≤ d𝑡 𝑘 𝐽𝑛 𝑡 𝑛 − d𝑡 𝑘 1 − 4𝑛 e 𝑛2 In particular, taking 𝑘 = 1, we have 𝐶 √ ′ 2 𝐽𝑛 𝑡 𝑛 + 𝑡e−𝑡 /2 ≤ , 𝑛 2 𝐶 √ ′ 𝑡 5 𝑡3 − − 𝑡 e−𝑡 /2 ≤ 2 . 𝐽𝑛 𝑡 𝑛 − 4𝑛 𝑛 𝑛 One may also add a 𝑡-depending factor on the right-hand side. For 𝑡 of order 1, this can be done just by virtue of Taylor’s formula. Indeed, the first three derivatives of the functions √ √ 𝑓𝑛 (𝑡) = 𝐽𝑛 𝑡 𝑛) = E e𝑖𝑡 𝜃1 𝑛 ,
𝑡 4 −𝑡 2 /2 e 𝑔𝑛 (𝑡) = 1 − 4𝑛
are equal at zero. Since also | 𝑓𝑛(4) (𝑡) − 𝑔𝑛(4) (𝑡)| ≤ 𝑛𝐶2 according to Proposition 11.3.2, the Taylor formula refines this proposition for the interval |𝑡| ≤ 1. Corollary 11.3.3 With some absolute constant 𝐶 we have, for all 𝑡 ∈ R, √ 𝑡 4 −𝑡 2 /2 𝐶 e ≤ 2 min{1, 𝑡 4 }, 𝐽𝑛 𝑡 𝑛 − 1 − 4𝑛 𝑛 √ ′ 4𝑡 2 − 𝑡 4 −𝑡 2 /2 𝐶 e ≤ 2 min{1, |𝑡| 3 }. 𝐽𝑛 𝑡 𝑛 + 𝑡 1 + 4𝑛 𝑛 √ The process of expanding the values 𝜑 𝑛 (𝑥) and their Fourier transforms 𝐽𝑛 𝑡 𝑛 in powers of 1/𝑛 can be done in analogy with Edgeworth expansions in the central limit theorem. Let us mention without a detailed derivation the next approximation, that is, the third order approximation (which however will not be needed for our
11.4 Upper Bounds on the Characteristic Function
215
applications). As before, 𝐻4 (𝑥) = 𝑥 4 − 6𝑥 2 + 3 denotes the 4-th Chebyshev–Hermite polynomial. √ √ Proposition 11.3.4 In the interval |𝑥| ≤ 21 𝑛, 𝑛 ≥ 3, the random variable 𝜃 1 𝑛 has a density satisfying h 1 + 𝑥 10 i 𝐻4 (𝑥) 𝑃(𝑥) + 2 +𝐶 , 𝜑 𝑛 (𝑥) = 𝜑(𝑥) 1 − 4𝑛 𝑛 𝑛3 where the quantity 𝐶 is bounded by a universal constant, and where 𝑃 is a (universal) polynomial of degree 8. Applying Lemma 11.1.1, we also have a non-uniform approximation for 𝜑 𝑛 (𝑥) on the whole real line and a corresponding assertion about its Fourier transform. Corollary 11.3.5 For all 𝑥 ∈ R, with some universal constant 𝐶, 𝐻4 (𝑥) 𝑃(𝑥) 𝐶 −𝑥 2 /4 + 2 ≤ 3e . 𝜑 𝑛 (𝑥) − 𝜑(𝑥) 1 − 4𝑛 𝑛 𝑛 Therefore, with some universal polynomial 𝑄 of degree 8, for all 𝑡 ∈ R, √ 𝑡 4 𝑄(𝑡) −𝑡 2 /2 𝐶 + 2 e ≤ 3. 𝐽𝑛 𝑡 𝑛 − 1 − 4𝑛 𝑛 𝑛
11.4 Upper Bounds on the Characteristic Function Although Proposition 11.3.1 provides good approximations for ∫ √𝑛 √ 𝑐𝑛 𝑥 2 𝑛−3 2 𝑖𝑡 𝑥 e d𝑥 𝐽𝑛 𝑡 𝑛 = √ 1 − √ 𝑛 𝑛 − 𝑛 ∫ 1 √ 𝑛−3 = 𝑐𝑛 e𝑖𝑡 𝑥 𝑛 1 − 𝑥 2 2 d𝑥, −1
𝑐𝑛 = √
Γ( 𝑛2 )
,
𝜋 Γ( 𝑛−1 2 )
with respect to the growing dimension 𝑛, it does not say anything √ about the decay of these characteristic functions on large 𝑡-intervals such as |𝑡| ≤ 𝑐 𝑛 (which should be similar to the Gaussian characteristic function). To this aim, let us describe a simple approach based on contour integration, assuming temporarily that 𝑛 ≥ 4. 𝑛−3 The function 𝑧 → (1 − 𝑧2 ) 2 is analytic in the whole complex plane when 𝑛 is odd and in the strip 𝑧 = 𝑥 + 𝑖𝑦, |𝑥| < 1, when 𝑛 is even. Therefore, integrating along the boundary 𝐶 of the rectangle [−1, 1] × [0, 𝑦] with 𝑦 > 0 (slightly modifying the contour in a standard way near the points −1 and 1), we have ∫ √ 𝑛−3 e𝑖𝑡 𝑧 𝑛 1 − 𝑧 2 2 d𝑧 = 0. 𝐶
216
11 Linear Functionals on the Sphere
√ Then we obtain a natural decomposition 𝐽𝑛 (𝑡 𝑛) = 𝑐 𝑛 𝐼1 (𝑡) + 𝐼2 (𝑡) + 𝐼3 (𝑡) , which we consider for 𝑡 > 0, where ∫ 1 √ √ 1 − (𝑥 + 𝑖𝑦) 2 𝑛−3 𝑛−3 2 −𝑡 𝑦 𝑛 2 2 𝐼1 (𝑡) = e 1+𝑦 d𝑥, e𝑖𝑡 𝑥 𝑛 2 1 + 𝑦 −1 ∫ 𝑦 √ √ 𝑛−3 𝐼2 (𝑡) = −e𝑖𝑡 𝑛 e−𝑡 𝑠 𝑛 1 − (1 + 𝑖𝑠) 2 2 d𝑠, 0
𝐼3 (𝑡) = e−𝑖𝑡
√
∫ 𝑛
𝑦
e−𝑡 𝑠
√ 𝑛
1 − (1 − 𝑖𝑠) 2
𝑛−3 2
d𝑠.
0
For 0 ≤ 𝑠 ≤ 𝑦 ≤ 𝛼, √︁ √︁ 1 − (1 + 𝑖𝑠) 2 = 𝑠 𝑠2 + 4 ≤ 𝛼 𝛼2 + 4 ≡ 𝛽. √1 , 6
Choosing 𝛼 =
|𝐼2 (𝑡)| ≤ 𝛽
we have 𝛽 = 56 . Hence, for all 𝑡 > 0,
𝑛−3 2
∫
𝑦
e−𝑡 𝑠
0
√ 𝑛
𝑛−3 1 d𝑠 ≤ √ 𝛽 2 , 𝑡 𝑛
𝑛−3 1 |𝐼3 (𝑡)| ≤ √ 𝛽 2 . 𝑡 𝑛
(11.2)
In order to estimate 𝐼1 (𝑡), we use an elementary identity 1 − (𝑥 + 𝑖𝑦) 2 2 = (1 − 𝑥 2 + 𝑦 2 ) 2 + 4𝑥 2 𝑦 2 = (1 − 2𝑥 2 )(1 + 𝑦 2 ) 2 + 𝑥 2 (𝑥 2 + 6𝑦 2 + 2𝑦 4 )
(𝑥, 𝑦 ∈ R),
which for the region |𝑥| ≤ 1 yields 1 − (𝑥 + 𝑖𝑦) 2 2 ≤ 1 − 2𝑥 2 + 𝑥 2 𝑣(𝑦 2 ), 1 + 𝑦2 Since 𝑣 ′ (𝑧) = assuming that
4−2𝑧 (1+𝑧) 3 𝑧 = 𝑦2
1 + 6𝑧 + 2𝑧2 . (1 + 𝑧) 2
𝑣(𝑧) =
> 0 in 0 ≤ 𝑧 < 2, this function increases in this interval, and ≤ 16 , we have 𝑣(𝑦 2 ) ≤ 𝑣(1/6) =
74 49 .
Hence
1 − (𝑥 + 𝑖𝑦) 2 2 n 24 o 24 2 ≤ 1 − 𝑥 ≤ exp − 𝑥2 . 49 49 1 + 𝑦2 Using this estimate and assuming that 𝑛 is sufficiently large, e.g. 𝑛 ≥ 12, so that 𝑛 − 3 ≥ 34 𝑛, we have ∫
1
−1
|1 − (𝑥 + 𝑖𝑦) 2 | 𝑛−3 2 1+
𝑦2
∫
∞
12 𝑛−3
2
e− 49 2 𝑥 d𝑥 −∞ √︂ √︂ √︂ 49 2𝜋 7 2𝜋 = ≤ . 12 𝑛 − 3 3 𝑛
d𝑥 ≤
11.4 Upper Bounds on the Characteristic Function
217
This upper bound allows us to conclude that √︂ √︂ n 𝑛−3 √ 7 2𝜋 −𝑡 𝑦 √𝑛 7 2𝜋 𝑛 − 3 2o 2 2 |𝐼1 (𝑡)| ≤ e 1+𝑦 ≤ exp − 𝑡𝑦 𝑛 + 𝑦 . 3 𝑛 3 𝑛 2 Choosing here 𝑦 = Recalling that 𝑐 𝑛 =
√𝑡 , the expression in the 𝑛 √︁ √ 𝑐 𝑛′ 𝑛 < 𝑛/(2𝜋), we then
𝑐 𝑛 |𝐼1 (𝑡)| ≤ In the case 𝑡 >
√︁
7 −𝑡 2 /2 e , 3
𝑛/6, we choose 𝑦 = 𝑐 𝑛 |𝐼1 (𝑡)| ≤
√1 6
exponent will be smaller than 𝑡 2 /2. get
0≤𝑡≤
√︁
𝑛/6.
(11.3)
√ 𝑛 1 2 and then −𝑡𝑦 𝑛 + 𝑛−3 2 𝑦 < − 12 − 4 , so that
7 − 𝑛 −1 e 12 4 , 3
𝑡≥
√︁
𝑛/6.
(11.4)
Let us collect these estimates, still assuming that 𝑛 ≥ 12. We use the bound √ 𝑛−3 𝛽 2 ≤ 𝐶 e−𝑛/12 . Here, since 𝛽 < e−1/12 , the optimal constant corresponds √ to 𝑛 = 12, so, √𝐶 = √1 ( 56 ) 9/2 e < 12 . Hence, by (11.2), we have, whenever 𝑡 ≥ 2, 2𝜋
2𝜋
√ 𝑐 𝑛 𝑛−3 1 2 √ 𝛽 2 ≤ √ e−𝑛/12 . 𝑛 2 √︁ √ Combining this bound with (11.3), for the interval 2 ≤ 𝑡 ≤ 𝑛/6 we obtain that 𝑐 𝑛 |𝐼2 (𝑡)| + |𝐼3 (𝑡)|) ≤
7 2 1 𝑐 𝑛 |𝐼1 (𝑡)| + |𝐼2 (𝑡)| + |𝐼3 (𝑡)| ≤ √ e−𝑛/12 + e−𝑡 /2 . 3 2 √︁ Similarly, in the case 𝑡 > 𝑛/6, we use (11.4), leading to 7 1 𝑐 𝑛 |𝐼1 (𝑡)| + |𝐼2 (𝑡)| + |𝐼3 (𝑡)| ≤ √ e−𝑛/12 + e−1/4 e−𝑛/12 < 3 e−𝑛/12 . 3 2 √ √ 2 Finally, if 0 ≤ 𝑡 ≤ 2, then |𝐽𝑛 (𝑡 𝑛)| ≤ 1 < 3 e−𝑡 /2 . One can summarize. Proposition 11.4.1 For all 𝑡 ∈ R, √ 𝐽𝑛 (𝑡 𝑛) ≤ 3 e−𝑡 2 /2 + 3 e−𝑛/12 .
(11.5)
Here the assumption 𝑛 ≥ 12 may be removed, since 3 e−𝑛/12 > 1 for 𝑛 < 12, √︁ while |𝐽𝑛 | ≤ 1. Note that in the region |𝑡| ≤ 𝑛/6, the bound (11.5) yields √ 𝐽𝑛 (𝑡 𝑛) ≤ 6 e−𝑡 2 /2 , which complements the normal approximation given in Proposition 11.3.1.
218
11 Linear Functionals on the Sphere
11.5 Polynomial Decay at Infinity Although the right-hand side of (11.5) is of an exponentially small order with respect √ to 𝑛 for |𝑡| ≥ 𝑛, this inequality√does not reflect properties such as integrability of the characteristic function 𝐽𝑛 (𝑡 𝑛) and polynomial decay in 𝑡 at infinity. Indeed, for example, in the case 𝑛 = 3, 𝜃 1 has a uniform distribution on [−1, 1], so that 𝐽3 (𝑡) = sin𝑡 𝑡 , implying |𝐽3 (𝑡)| ≤ |𝑡1 | . More generally, for 𝑛 ≥ 4 and 𝑡 > 0, let us integrate by parts to get 𝑐 𝑛 (𝑛 − 3) 𝐽𝑛 (𝑡) = 𝑖𝑡
∫
1
e𝑖𝑡 𝑥 𝑥(1 − 𝑥 2 )
𝑛−5 2
d𝑥.
−1
This implies that ∫ 1 𝑛−5 2𝑐 𝑛 (𝑛 − 3) 𝑥(1 − 𝑥 2 ) 2 d𝑥 |𝐽𝑛 (𝑡)| ≤ 𝑡 0 ∫ 1 𝑛−5 𝑐 𝑛 (𝑛 − 3) 2𝑐 𝑛 = (1 − 𝑦) 2 d𝑦 = . 𝑡 𝑡 0 Since 𝑐 𝑛
0 ∫
𝜋
e
−𝑡 sin 𝑢
2 sin 𝑢
𝑛−3 2
𝜋/2
d𝑢 = 2𝑐 𝑛
e−𝑡
sin 𝑢
2 sin 𝑢
𝑛−3 2
d𝑢.
0
0
Using the bounds 𝑢 ≥ sin 𝑢 ≥
2 𝜋
∫ |𝐽𝑛 (𝑡)| ≤ 2𝑐 𝑛
𝑢 in the last integrand, the above is simplified to ∞
2𝑡
e− 𝜋
𝑢
(2𝑢)
𝑛−3 2
d𝑢
0
= 𝑐𝑛
𝜋 𝑛−1 2
Γ
𝑛 − 1
𝑡
2
𝑛 1 𝜋 𝑛−1 2 =√ Γ . 2 𝜋 𝑡
To further simplify, we recall the estimate Γ(𝑥 + 1) ≤ 𝑥 𝑥 for 𝑥 ≥ 1 (cf. Section 1.3). 𝑛−2 𝑛−1 2 < ( 𝑛2 ) 2 for 𝑛 ≥ 4. The resulting bound is clearly This gives Γ( 𝑛2 ) ≤ ( 𝑛−2 2 ) valid for 𝑛 = 3 as well. Thus, we arrive at: Proposition 11.5.3 If 𝑛 ≥ 3 and 𝑡 ≠ 0, then 1 𝜋𝑛 𝑛−1 2 . |𝐽𝑛 (𝑡)| ≤ √ 2 |𝑡| 𝜋
(11.6)
𝑛−1
Thus, if 𝑛 is fixed, 𝐽𝑛 (𝑡) = 𝑂 (𝑡 − 2 ) as 𝑡 → ∞, and as is easy to check, the 𝑛−1 power 𝑛−1 2 is optimal, since it corresponds to the power 2 − 1 in the formula for the density of 𝜃 1 under 𝔰𝑛−1 . √ In addition, (11.6) indicates √ exponential smallness of |𝐽𝑛 (𝑡 𝑛)| with respect to 𝑛 in the region, say, 𝑡 ≥ 𝜋 𝑛. Nevertheless, let us rewrite this bound (almost equivalently) to fit the form of the bound of Proposition 11.4.1. Since √ √ 𝜋 𝑛 3𝑡 2 −1/2 ≤ 1+ 2 , 𝑡 ≥ 𝜋 𝑛, 2𝑡 𝜋 𝑛 from (11.6) we get √ 3𝑡 2 − 𝑛−1 1 4 . |𝐽𝑛 (𝑡 𝑛)| ≤ √ 1 + 2 𝜋 𝑛 𝜋
(11.7)
220
11 Linear Functionals on the Sphere
with the additional feature that it continues to This bound is as good as (11.5), √ hold for the values 0 < 𝑡 < 𝜋 𝑛 as well (modulo absolute constants). To see this, first note that in this region
Also, (1 +
1+
4𝜀 − 𝑛−1 4 𝑛 )
1 − 𝑛−1 𝑡 2 − 𝑛−1 4 4 ≥ 1 + ≥ e−𝑛/12 , 2 3 3𝜋 𝑛
≥ e−𝜀 for all 𝜀 ≥ 0. Hence, choosing 𝜀 = e−𝑡
2 /2
𝑛 ≥ 3. 1 2 2𝑡 ,
we have
𝑡 2 − 𝑛−1 2𝑡 2 − 𝑛−1 4 4 ≤ 1+ 2 . ≤ 1+ 𝑛 3𝜋 𝑛
One may summarize, by combining (11.7) with the bound (11.5). Proposition 11.5.4 If 𝑛 ≥ 3, then for all 𝑡 ∈ R, with some universal constant, 2 − 𝑛−1 √ 4 𝐽𝑛 (𝑡 𝑛) ≤ 6 1 + 𝑐𝑡 . 𝑛
(11.8)
One may take 𝑐 = 1/(3𝜋 2 ). Here, the right-hand side describes the correct polynomial behavior, properly reflecting the role of the dimension 𝑛. In fact, (11.8) holds for 𝑛 = 2 as well.
11.6 Remarks The first order approximation as in Proposition 11.1.2 and the Gaussian bound of Proposition 11.4.1 (with somewhat worse absolute constants) can be found in [38]. A second order approximation was developed in [40]. Note that Proposition 11.5.3 seems to be new.
Part IV
First Applications to Randomized Sums
Chapter 12
Typical Distributions
In this part (Chapters 12–14) we describe various aspects of Sudakov’s theorem. We start with the so-called typical distributions in his theorem and compare them with the standard normal law by means of the variance-type functionals including 𝜎42 (𝑋).
12.1 Concentration Problems for Weighted Sums Consider the model of random vectors 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R𝑛 , 𝑛 ≥ 2, in full generality as introduced in Chapter 1. We denote by 𝐹𝜃 the distribution of the weighted sum ⟨𝑋, 𝜃⟩ =
𝑛 ∑︁
𝜃 𝑘 𝑋𝑘 ,
𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 ,
𝑘=1
as well as the associated distribution function 𝐹𝜃 (𝑥) = P{⟨𝑋, 𝜃⟩ ≤ 𝑥},
𝑥 ∈ R.
Here we return to the striking observation due to Sudakov stating that, when 𝑛 is large, most of 𝐹𝜃 ’s are concentrated about a certain probability distribution 𝐹 with respect to the normalized Lebesgue measure 𝔰𝑛−1 on the unit sphere S𝑛−1 . Now we give a precise formulation as in [169] under the second moment assumption E ⟨𝑋, 𝜃⟩ 2 ≤ 𝑀22 ,
𝜃 ∈ S𝑛−1 ,
(12.1)
and in terms of the (Kantorovich) 𝐿 1 -distance between distribution functions.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_12
223
224
12 Typical Distributions
Theorem 12.1.1 For each 𝛿 > 0, there is an integer 𝑛 𝛿 depending on 𝛿 with the following property. If 𝑛 ≥ 𝑛 𝛿 , one can choose a set Θ ⊂ S𝑛−1 of measure 𝔰𝑛−1 (Θ) ≥ 1 − 𝛿 and a distribution function 𝐹 such that ∫ ∞ 𝑊 (𝐹𝜃 , 𝐹) = |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| d𝑥 ≤ 𝑀2 𝛿 for all 𝜃 ∈ Θ. −∞
Thus, the concentration property of the family {𝐹𝜃 } 𝜃 ∈𝑆 𝑛−1 has a surprisingly universal character, since no requirement on the distribution of 𝑋 beyond the correlationtype condition (12.1) is needed. Let us compare this assertion with the classical i.i.d. model assuming that the components 𝑋 𝑘 are independent, identically distributed, have mean zero and variance one (in particular, 𝑀2 = 1). If the 3-rd absolute moment 𝛽3 = E |𝑋1 | 3 is finite, then, by the non-uniform Berry–Esseen theorem, for all 𝜃 ∈ S𝑛−1 , 𝑊 (𝐹𝜃 , Φ) ≤ 𝑐𝛽3
𝑛 ∑︁
|𝜃 𝑘 | 3 ,
𝑘=1
where 𝑐 > 0 is a universal constant (cf. e.g. [155]). This particular case is consistent with Theorem 12.1.1, which may be stated for all large 𝑛 ≥ 𝑛(𝛿, 𝛽3 ) for a universal set, for example, 𝑛 n ∑︁ √ o |𝜃 𝑘 | 3 ≤ 2/ 𝑛 . Θ = 𝜃 ∈ S𝑛−1 : 𝑘=1
Indeed, on the spaces
(S𝑛−1 , 𝔰
𝑛−1 )
we have convergence in probability
𝑛 √ ∑︁ 4 𝑛 |𝜃 𝑘 | 3 → E |𝑍 | 3 = √ 0 we have, for all 𝑛 ≥ 1, ∫ ∞ 1 + Var(𝜌) . (1 + 𝑥 2 ) |𝐹 (d𝑥) − Φ(d𝑥)| ≤ 𝑐 (12.2) 𝑛 −∞ In particular, ∫
∞
−∞ ∞
(1 + 𝑥 2 ) |𝐹 (d𝑥) − Φ(d𝑥)| ≤ 𝑐
1 + 𝜎42 , 𝑛
∫
1 + 𝜎2 (1 + 𝑥 2 ) |𝐹 (d𝑥) − Φ(d𝑥)| ≤ 𝑐 √ . 𝑛 −∞
Here the positive measure |𝐹 (d𝑥) − Φ(d𝑥)| denotes the variation in the sense of measure theory, and the left integral represents the weighted total variation of 𝐹 − Φ. In particular, we have a similar bound on the usual total variation distance between 𝐹 and Φ. The first assertion of the proposition is stronger than the second one because of the relation Var(𝜌) ≤ Var(𝜌 2 ), cf. Proposition 1.5.1. However, in applications, it might be more convenient to use the second bound (although it requires finiteness of the 4-th moments of the 𝑋 𝑘 ’s). The last inequality uses the variance-type functional
12.3 Normal Approximation for Gaussian Mixtures
227
√ 1 𝜎2 = 𝜎2 (𝑋) = √ E |𝑋 | 2 − 𝑛 = 𝑛 E |𝜌 2 − 1|. 𝑛 √ It follows from the first one due to the relation Var(|𝑋 |) ≤ 2 𝑛 𝜎2 , cf. Proposition 1.5.6. Although it provides worse dependence in 𝑛, it requires finiteness of the 2-nd moments of 𝑋 𝑘 and will be more preferable in obtaining the standard √1𝑛 -rate (recall also that 𝜎2 ≤ 𝜎4 ).
12.3 Normal Approximation for Gaussian Mixtures As a natural step, let us consider separately the question of the normal approximation for general mixtures of normal distributions Φ𝑡 on the line with mean zero and standard deviations 𝑡 ≥ 0. Assuming that an arbitrary non-negative random variable 𝜌 = 𝜌(𝜔) on a probability space (Ω, F , P) is independent of 𝑍 ∼ 𝑁 (0, 1), introduce the distribution function of the random variable 𝜌𝑍, i.e., Φ𝜌 (𝑥) = E Φ𝜌( 𝜔) (𝑥),
𝑥 ∈ R.
(12.3)
In this section we focus on the Kolmogorov distance from Φ𝜌 to Φ. Proposition 12.3.1 If E𝜌 2 = 1, then sup |Φ𝜌 (𝑥) − Φ(𝑥)| ≤ 2.1 Var(𝜌 2 ). 𝑥
Proof One may assume that 𝜌 > 0 a.s. Given 0 < 𝜀 < 1, we split the expectation in (12.3) into the events 𝐴 = {𝜌 2 ≥ 1 − 𝜀} and 𝐵 = {𝜌 2 < 1 − 𝜀}, so that Φ𝜌 (𝑥) − Φ(𝑥) = E (Φ𝜌( 𝜔) (𝑥) − Φ(𝑥)) 1 𝐴 + E (Φ𝜌( 𝜔) (𝑥) − Φ(𝑥)) 1 𝐵 .
(12.4)
Putting 𝜎 2 = Var(𝜌 2 ), 𝜎 ≥ 0, we have, by Chebyshev’s inequality, P(𝐵) = {1 − 𝜌 2 > 𝜀} ≤
𝜎2 . 𝜀2
(12.5)
Using also |Φ𝜌( 𝜔) (𝑥) − Φ(𝑥)| ≤ 12 , which holds for all 𝑥 ∈ R due to the symmetry of the involved distributions, we get the estimate | E (Φ𝜌( 𝜔) (𝑥) − Φ(𝑥)) 1 𝐵 | ≤
𝜎2 2𝜀 2
(12.6)
for the second term on the right-hand side of (12.4). As for the first term, fix a value 𝜌 = 𝜌(𝜔), 𝜔 ∈ Ω, and apply the inversion formula
228
12 Typical Distributions
∫ ∞ 2 2 2 1 e−𝜌 𝑡 /2 − e−𝑡 /2 e−𝑖𝑡 𝑥 d𝑡 2𝜋 −∞ −𝑖𝑡 ∫ ∞ 2 1 sin(𝑡𝑥) −𝜌2 𝑡 2 /2 = e − e−𝑡 /2 d𝑡, 𝜋 0 𝑡
Φ𝜌( 𝜔) (𝑥) − Φ(𝑥) =
which implies E (Φ𝜌 (𝑥) − Φ(𝑥)) 1 𝐴 = Now write e−𝜌
2 𝑡 2 /2
− e−𝑡
∞
∫
1 𝜋
0
2 /2
2 2 2 sin(𝑡𝑥) E e−𝜌 𝑡 /2 − e−𝑡 /2 1 𝐴 d𝑡. 𝑡
= e−𝑡
2 /2
e−(𝜌
2 −1)𝑡 2 /2
−1
and apply the elementary bound 1 2 𝑀 𝛿 e , 2
|e 𝛿 − 1 − 𝛿| ≤ Since 𝛿 ≡ − (𝜌 e−𝜌
2 −1)𝑡 2
2
2 𝑡 2 /2
≤
− e−𝑡
𝜀𝑡 2 2 2 /2
−∞ < 𝛿 ≤ 𝑀 (𝑀 ≥ 0).
on 𝐴, this gives that, for some random |𝛾| ≤ 1, = e−𝑡
2 /2
−
(𝜌 2 − 1) 2 𝑡 4 𝜀𝑡 2 /2 (𝜌 2 − 1)𝑡 2 +𝛾 e . 2 8
Hence, taking the expectation over the set 𝐴 and using 1 𝐴 = 1 − 1 𝐵 , we get for some constant |𝑐 1 | ≤ 1 22 2 𝑡 2 −𝑡 2 /2 𝑐 1 𝑡 4 −(1−𝜀)𝑡 2 /2 E e−𝜌 𝑡 /2 − e−𝑡 /2 1 𝐴 = e E (𝜌 2 − 1) 1 𝐵 + e E (𝜌 2 − 1) 2 1 𝐴 . 2 8 Returning to the inversion formula, this gives E (Φ𝜌 (𝑥) − Φ(𝑥)) 1 𝐴 = 𝛼(𝑥) E (𝜌 2 − 1) 1 𝐵 + 𝑐 2 𝛽 E (𝜌 2 − 1) 2 1 𝐴 with a constant 𝑐 2 such that |𝑐 2 | ≤ 1, where ∫ ∞ 2 1 1 𝛽= 𝑡 3 e−(1−𝜀)𝑡 /2 d𝑡 = 8𝜋 0 4𝜋 (1 − 𝜀) 2 and 𝛼(𝑥) = Clearly, |𝛼(𝑥)| ≤
1 2𝜋
√1 , 2 2 𝜋𝑒
∫ 0
∞
𝑡 sin(𝑡𝑥) e−𝑡
2 /2
2 1 d𝑡 = √ 𝑥 e−𝑥 /2 . 2 2𝜋
which leads to the uniform bound
1 E (Φ𝜌 (𝑥) − Φ(𝑥)) 1 𝐴 ≤ √1 | E (𝜌 2 − 1) 1 𝐵 | + E (𝜌 2 − 1) 2 1 𝐴 . 2 4𝜋 (1 − 𝜀) 2 2𝜋𝑒
12.4 Approximation in Total Variation
229
Next, by Cauchy’s inequality, and recalling (12.5), √︁ 𝜎2 | E (𝜌 2 − 1) 1 𝐵 | ≤ 𝜎 P(𝐵) ≤ , 𝜀 while E (𝜌 2 − 1) 2 1 𝐴 ≤ E (𝜌 2 − 1) 2 = 𝜎 2 . Using these bounds together with the bound (12.6) for the set 𝐵, we get the estimate | E (Φ𝜌 (𝑥) − Φ(𝑥))| ≤ 𝐶 𝜀 𝜎 2 with constant
1 1 1 + √ . + 2 2𝜀 2 2𝜋𝑒 𝜀 4𝜋 (1 − 𝜀) 2 Choosing 𝜀 = 0.65, we have 𝐶 𝜀 = 2.0191... Thus, Proposition 12.3.1 is proved. □ 𝐶𝜀 =
12.4 Approximation in Total Variation At the expense of a larger numerical constant, we now strengthen Proposition 12.3.1 by involving a much stronger distance. Keeping the same assumptions and notations as in the previous section, we prove: Proposition 12.4.1 If E𝜌 2 = 1, then for some absolute constant 𝑐 > 0, ∫ ∞ (1 + 𝑥 2 ) |Φ𝜌 (d𝑥) − Φ(d𝑥)| ≤ 𝑐 Var(𝜌).
(12.7)
−∞
Proof The normal distribution function Φ𝑡 with standard deviation 𝑡 > 0 has density 2 𝜑𝑡 (𝑥) = 1𝑡 𝜑(𝑥/𝑡), where 𝜑(𝑥) = √1 e−𝑥 /2 is the standard normal density. 2𝜋 For a fixed number 𝑥, let us expand the function 𝑢(𝑡) = 𝜑𝑡 (𝑥) according to the Taylor formula up to the quadratic term at the point 𝑡0 = 1. We have 𝑢(𝑡 0 ) = 𝜑(𝑥), 𝑢 ′ (𝑡) = (𝑡 −4 𝑥 2 − 𝑡 −2 ) 𝜑(𝑥/𝑡),
𝑢 ′ (𝑡 0 ) = (𝑥 2 − 1) 𝜑(𝑥),
and 𝑢 ′′ (𝑡) = (𝑡 −7 𝑥 4 − 5𝑡 −5 𝑥 2 + 2𝑡 −3 ) 𝜑(𝑥/𝑡) = 𝑡 −3 𝜓(𝑥/𝑡), where 𝜓(𝑧) = (𝑧4 − 5𝑧 2 + 2) 𝜑(𝑧). Therefore, using the integral Taylor formula ′
𝑢(𝑡) = 𝑢(𝑡0 ) + 𝑢 (𝑡0 ) (𝑡 − 𝑡 0 ) + (𝑡 − 𝑡0 )
2
∫ 0
we get
1
𝑢 ′′ ((1 − 𝑠) + 𝑠𝑡) (1 − 𝑠) d𝑠,
(12.8)
230
12 Typical Distributions
𝜑𝑡 (𝑥) − 𝜑(𝑥) = (𝑡 − 1) (𝑥 2 − 1) 𝜑(𝑥) ∫ 1 −3 (1 − 𝑠) + 𝑠𝑡 𝜓 + (𝑡 − 1) 2 0
𝑥 (1 − 𝑠) d𝑠. (1 − 𝑠) + 𝑠𝑡
We apply this representation with 𝑡 = 𝜉 (𝜔), where 𝜉 is a positive random variable (on some probability space). Thus, Φ 𝜉 has density 𝜑 𝜉 = E 𝜑 𝜉 ( 𝜔) , which can be represented as 𝜑 𝜉 (𝑥) − 𝜑(𝑥) = (𝑥 2 − 1) 𝜑(𝑥) E (𝜉 − 1) ∫ 1 −3 + E (𝜉 − 1) 2 (1 − 𝑠) + 𝑠𝜉 𝜓 0
𝑥 (1 − 𝑠) d𝑠. (1 − 𝑠) + 𝑠𝜉
Putting 𝑅 𝜉 (𝑥) = 𝜑 𝜉 (𝑥) − 𝜑(𝑥) − (𝑥 2 − 1) 𝜑(𝑥) E (𝜉 − 1) and applying Fubini’s theorem, we may conclude that the integral bounded by ∫
∞
(𝜉 − 1)
E −∞
2
1
∫
= 𝑐 0 E (𝜉 − 1)
2
∫
1
−∞
|𝑅 𝜉 (𝑥)| d𝑥 is
𝑥 (1 − 𝑠) d𝑠 d𝑥 (1 − 𝑠) + 𝑠𝜉 −2 (1 − 𝑠) + 𝑠𝜉 (1 − 𝑠) d𝑠
(1 − 𝑠) + 𝑠𝜉 0
∫∞
−3 𝜓
0
with 𝑐 0 = ∫ 1 1−𝑠 0 (1− 𝑠2 ) 2
∫∞ −∞
|𝜓(𝑥)| d𝑥. In particular, if 𝜉 ≥ 21 , the latter integral does not exceed
d𝑠 = 4 log 2 − 2 < 1, so that ∫
∞
|𝑅 𝜉 (𝑥)| d𝑥 ≤ 𝑐 0 E (𝜉 − 1) 2 .
−∞
This implies that for some |𝜃| ≤ 1, ∫ ∞ ∫ |𝜑 𝜉 (𝑥) − 𝜑(𝑥)| d𝑥 = |E𝜉 − 1| −∞
∞
∫
(𝜉 − 1) 2
−∞
∫
∫
∫∞ −∞
1
(1 − 𝑠) + 𝑠𝜉 0
where 𝑐 1 =
∫ 1 ∞
2 −∞ 𝑥
2
𝑥 2 |𝑅 𝜉 (𝑥)| d𝑥 may be bounded from above by
−3
𝑥 2 𝜓
𝑥 (1 − 𝑠) d𝑠 d𝑥 = 𝑐 1 E (𝜉 − 1) 2 , (1 − 𝑠) + 𝑠𝜉
|𝜓(𝑥)| d𝑥. Therefore (without any constraint on 𝜉),
∞
−∞
|𝑥 2 − 1| 𝜑(𝑥) d𝑥 + 𝜃𝑐 0 E (𝜉 − 1) 2 .
−∞
Analogously, the integral E
∞
𝑥 2 |𝜑 𝜉 (𝑥) − 𝜑(𝑥)| d𝑥 = |E𝜉 − 1|
∫
∞
−∞
𝑥 2 |𝑥 2 − 1| 𝜑(𝑥) d𝑥 + 𝜃𝑐 1 E (𝜉 − 1) 2 .
12.4 Approximation in Total Variation
231
The two representations can now be combined to ∫ ∞ (1 + 𝑥 2 ) |𝜑 𝜉 (𝑥) − 𝜑(𝑥)| d𝑥 ≤ 𝑎 |E𝜉 − 1| + 𝑏 E (𝜉 − 1) 2 , −∞
where ∫
∞
(1 + 𝑥 2 ) |𝑥 2 − 1| 𝜑(𝑥) d𝑥,
𝑎=
∫
∞
𝑏= −∞
−∞
1+
1 2 𝑥 |𝜓(𝑥)| d𝑥. 2
Let us derive numerical bounds on these absolute constants. If 𝑍 ∼ 𝑁 (0, 1), then using (1 + 𝑥 2 ) |𝑥 2 − 1| ≤ 1 + 𝑥 4 , we have 𝑎 ≤ 1 + E𝑍 4 = 4. Since |𝜓(𝑥)| ≤ (𝑥 4 + 5𝑥 2 + 2)𝜑(𝑥), we also have 𝑏 ≤ E (𝑍 4 + 5𝑍 2 + 2) +
1 E (𝑍 6 + 5𝑍 4 + 2𝑍 2 ) = 26. 2
Thus, ∫
∞
(1 + 𝑥 2 ) |𝜑 𝜉 (𝑥) − 𝜑(𝑥)| d𝑥 = 4 |E𝜉 − 1| + 26 E (𝜉 − 1) 2 ,
(12.9)
−∞
provided that 𝜉 ≥ 1/2. Now, consider the general case assuming without loss of generality that 𝜌 > 0. Introduce the events 𝐴0 = P{𝜌 < 1/2}, 𝐴1 = {𝜌 ≥ 1/2}, and put 𝛼0 = P( 𝐴0 ), 𝛼1 = P( 𝐴1 ), assuming again without loss of generality that 𝛼0 > 0. Next we split the distribution 𝑄 of 𝜌 into the two components supported on (0, 1/2) and [1/2, ∞) and denote by 𝜌0 and 𝜌1 some random variables distributed respectively as the normalized restrictions of 𝑄 to these regions, so that 𝜌0 < 1/2 and 𝜌1 ≥ 1/2. We thus represent the density of Φ𝜌 as the convex mixture of two densities 𝜑𝜌 = 𝛼0 𝜑𝜌0 + 𝛼1 𝜑𝜌1 ,
(12.10)
where 𝜑𝜌0 (𝑥) =
1 E 𝜑(𝑥/𝜌) 1 {𝜌∈ 𝐴0 } , 𝛼0
𝜑𝜌1 (𝑥) =
1 E 𝜑(𝑥/𝜌) 1 {𝜌∈ 𝐴1 } . 𝛼1
Note that, since E𝜌 2 = 1, we necessarily have E𝜌 ≤ 1. On the other hand, Var(𝜌) = 1 − (E𝜌) 2 = (1 − E𝜌)(1 + E𝜌) ≥ 1 − E𝜌. Hence |E𝜌 − 1| ≤ Var(𝜌), and as a consequence, E (𝜌 − 1) 2 = 2 (1 − E𝜌) ≤ 2 Var(𝜌).
(12.11)
In particular, since 𝜌 < 1/2 implies (𝜌 − 1) 2 > 1/4, we have, by Chebyshev’s inequality, 𝛼0 = P{𝜌 < 1/2} ≤ 8 Var(𝜌). (12.12)
232
12 Typical Distributions
Now, by the previous step (12.9) with 𝜉 = 𝜌1 , ∫ ∞ (1 + 𝑥 2 ) |𝜑𝜌1 (𝑥) − 𝜑(𝑥)| d𝑥 ≤ 4 |E𝜌1 − 1| + 26 E (𝜌1 − 1) 2 .
(12.13)
−∞
On the other hand, ∞
∫
−∞
so that
∫
(1 + 𝑥 2 ) 𝜑𝜌0 (𝑥) d𝑥 = 1 + E𝜌02 ≤
∞
−∞
(1 + 𝑥 2 ) |𝜑𝜌0 (𝑥) − 𝜑(𝑥)| d𝑥 ≤
5 , 4
13 . 4
From (12.10), (12.12) and (12.13), we then get that ∫ ∞ (1 + 𝑥 2 ) |𝜑𝜌 (𝑥) − 𝜑(𝑥)| d𝑥 ≤ 26 Var(𝜌) −∞
+ 4 |E𝜌1 − 1| + 26 E (𝜌1 − 1) 2 .
(12.14)
It remains to estimate the last two expectations. First suppose that Var(𝜌) ≤ 1/16, so that, by (12.12), 𝛼0 ≤ 12 and 𝛼1 ≥ 12 . By the definition, E𝜌1 =
1 1 E𝜌 1 {𝜌∈ 𝐴1 } = E𝜌 − E𝜌 1 {𝜌∈ 𝐴0 } , 𝛼1 𝛼1
so E𝜌1 − 1 =
1 E (𝜌 − 1) − E (𝜌 − 1) 1 {𝜌∈ 𝐴0 } . 𝛼1
By Cauchy’s inequality and applying (12.11) and (12.12), E (𝜌 − 1) 1 {𝜌∈ 𝐴 } ≤ E |𝜌 − 1| 1 {𝜌∈ 𝐴 } 0 0 1/2 1/2 ≤ E (𝜌 − 1) 2 𝛼0 ≤ 4 Var(𝜌).
(12.15)
Hence |E𝜌1 − 1| ≤
1 |E𝜌 − 1| + |E (𝜌 − 1) 1 {𝜌∈ 𝐴0 } | ≤ 10 Var(𝜌). 𝛼1
Similarly, E𝜌12 =
1 1 E𝜌 2 1 {𝜌∈ 𝐴1 } = 1 − E𝜌 2 1 {𝜌∈ 𝐴0 } , 𝛼1 𝛼1
so, using 𝜌 ≤ 1/2 on 𝐴0 and applying (12.15), we have
(12.16)
12.4 Approximation in Total Variation
233
1 E (1 − 𝜌 2 ) 1 {𝜌∈ 𝐴0 } 𝛼1 1 3 = E (1 − 𝜌) (1 + 𝜌) 1 {𝜌∈ 𝐴0 } ≤ E |1 − 𝜌| 1 {𝜌∈ 𝐴0 } 𝛼1 2𝛼1 3 ≤ · 4 Var(𝜌) ≤ 12 Var(𝜌). 2𝛼1
E𝜌12 − 1 =
Writing E (𝜌1 − 1) 2 = (E𝜌12 − 1) − 2 E (𝜌1 − 1) and applying (12.16), the above gives E (𝜌1 − 1) 2 ≤ 12 Var(𝜌) + 20 Var(𝜌) = 32 Var(𝜌). It remains to use this bound together with (12.16) in (12.14), which gives the desired estimate (12.7), i.e., ∫ ∞ (1 + 𝑥 2 ) |𝜑𝜌 (𝑥) − 𝜑(𝑥)| d𝑥 ≤ 𝑐 Var(𝜌) −∞
with constant 𝑐 = 26 + 4 · 10 + 26 · 32 = 898. Finally, in the case Var(𝜌) > 1/16, one may just use ∫ ∞ ∫ ∞ ∫ 2 2 (1 + 𝑥 ) |𝜑𝜌 (𝑥) − 𝜑(𝑥)| d𝑥 ≤ (1 + 𝑥 ) 𝜑𝜌 (𝑥) d𝑥 + −∞
−∞
∞
(1 + 𝑥 2 ) 𝜑(𝑥) d𝑥
−∞
= (1 + E (𝜌𝑍) 2 ) + (1 + E𝑍 2 ) = 4 < 64 Var(𝜌). Proposition 12.4.1 is proved with 𝑐 = 898.
□
Proof (of Proposition 12.2.2) Assuming (without loss of generality) that 𝑛 ≥ 3,√let Φ𝑛 and 𝜑 𝑛 denote respectively the distribution function and density of 𝑍 𝑛 = 𝜃 1 𝑛, where 𝜃 1 is the first coordinate of a random point uniformly distributed on the unit sphere S𝑛−1 . If 𝜌 2 = 𝑛1 |𝑋 | 2 is independent of 𝑍 𝑛 , then, by the definition of the typical distribution, 𝐹 (𝑥) = P{𝜌𝑍 𝑛 ≤ 𝑥} = E Φ𝑛 (𝑥/𝜌), 𝑥 ∈ R, where we may also assume that 𝜌 > 0 a.s. We know from Proposition 11.1.2 that |𝜑 𝑛 (𝑥) − 𝜑(𝑥)| ≤
𝑐 −𝑥 2 /4 e 𝑛
for all 𝑥 ∈ R with some universal constant 𝑐 > 0. Hence ∫ ∞ ∫ ∞ 𝑐 2 (1 + 𝑥 ) |Φ𝑛 (d𝑥) − Φ(d𝑥)| = (1 + 𝑥 2 ) |𝜑 𝑛 (𝑥) − 𝜑(𝑥)| d𝑥 ≤ 𝑛 −∞ −∞
(12.17)
(with some other constant). But, for any fixed value of 𝜌, ∫ ∞ ∫ ∞ (1 + 𝑥 2 ) |Φ𝑛 (d𝑥/𝜌) − Φ(d𝑥/𝜌)|(d𝑥) = (1 + 𝜌 2 𝑥 2 ) |Φ𝑛 (d𝑥) − Φ(d𝑥)|. −∞
−∞
So, taking the expectation with respect to 𝜌 and using Jensen’s inequality, we get
234
12 Typical Distributions
∫
∞
∫
∞
(1 + 𝑥 2 ) |𝐹 (d𝑥) − Φ𝜌 (d𝑥)| = (1 + 𝑥 2 ) |E Φ𝑛 (d𝑥/𝜌) − E Φ(d𝑥/𝜌)| −∞ −∞ ∫ ∞ ≤ E (1 + 𝑥 2 ) |Φ𝑛 (d𝑥/𝜌) − Φ(d𝑥/𝜌)| ∫−∞∞ ∫ ∞ (1 + 𝜌 2 𝑥 2 ) |Φ𝑛 (d𝑥) − Φ(d𝑥)| = ≤ E (1 + 𝑥 2 ) |Φ𝑛 (d𝑥) − Φ(d𝑥)|. −∞
−∞
It remains to apply (12.17) together with Proposition 12.4.1.
□
12.5 𝑳 𝒑 -distances to the Normal Law Proposition 12.4.1 may be used to obtain the (a priori weaker) non-uniform bound i h sup (1 + 𝑥 2 ) |Φ𝜌 (𝑥) − Φ(𝑥)| ≤ 𝑐 Var(𝜌) 𝑥
(cf. Lemma 12.5.4 below). The appearance of the weight 1+𝑥 2 on the left is important in order to control the 𝐿 𝑝 -distances between Φ𝜌 and Φ, since we get ∫
1/ 𝑝
∞ 𝑝
|Φ𝜌 (𝑥) − Φ(𝑥)| d𝑥
≤ 𝑐 Var(𝜌),
𝑝 ≥ 1,
(12.18)
−∞
for some absolute constant 𝑐 > 0. Here Φ𝜌 denotes the distribution function of 𝜌𝑍 with an arbitrary random variable 𝜌 ≥ 0, independent of 𝑍 ∼ 𝑁 (0, 1) and such that E𝜌 2 = 1. In the scheme of the weighted sums ⟨𝑋, 𝜃⟩ with 𝜌 2 = 𝑛1 |𝑋 | 2 , we obtain from Proposition 12.2.2 a similar bound on the normal approximation of the typical distribution function 𝐹 by the standard normal distribution function Φ in all 𝐿 𝑝 distances. Recall that 𝜎2 = √1𝑛 E |𝑋 | 2 − 𝑛 and 𝜎42 = 𝑛1 Var(|𝑋 | 2 ). Proposition 12.5.1 Suppose that E |𝑋 | 2 = 𝑛. With some absolute constant 𝑐 > 0 we have, for all 𝑝 ≥ 1, ∫
1/ 𝑝
∞
|𝐹 (𝑥) − Φ(𝑥)| 𝑝 d𝑥
≤𝑐
1
+ Var(𝜌) .
𝑛
−∞
In particular, ∫
1/ 𝑝
∞
|𝐹 (𝑥) − Φ(𝑥)| 𝑝 d𝑥 −∞
∫
1/ 𝑝
∞ 𝑝
|𝐹 (𝑥) − Φ(𝑥)| d𝑥 −∞
1 + 𝜎2 ≤𝑐 √ , 𝑛 ≤𝑐
1 + 𝜎42 . 𝑛
The particular value 𝑝 = 2 will be considered later on in more detail (in Problem 12.1.2). In this case, we deal with a special important distance
12.5 𝐿 𝑝 -distances to the Normal Law
𝜔(𝐹, 𝐺) =
235
∫
∞
|𝐹 (𝑥) − 𝐺 (𝑥)| 2 d𝑥
1/2 ,
−∞
which is finite, as long as the distribution functions 𝐹 and 𝐺 have finite first absolute moments. Returning to (12.18), we may thus write 𝜔(Φ𝜌 , Φ) ≤ 𝑐 Var(𝜌).
(12.19)
In fact, here the distance may be evaluated explicitly in terms of 𝜌, by using the following elementary identity (which will be needed in the sequel as well). Lemma 12.5.2 For all 𝛼, 𝛼0 ≥ 0, ∫
∞
−∞
2
2
√ √ √ e−𝛼𝑡 − e−𝛼0 𝑡 d𝑡 = 2 𝜋 𝛼 − 𝛼 . 0 𝑡2
Indeed, denoting the left integral by 𝜓(𝛼), we obtain a smooth function on (0, ∞) with derivative √ ∫ ∞ 𝜋 2 𝜓 ′ (𝛼) = − e−𝛼𝑡 d𝑡 = − √ . 𝛼 −∞ Since also 𝜓(𝛼0 ) = 0, after integration of the above equality we get the assertion. Now, let 𝜌 ′ be an independent copy of 𝜌, so as to represent the square of the characteristic function of Φ𝜌 as |E e𝑖𝑡𝜌𝑍 | 2 = |E e−𝜌
2 𝑡 2 /2
| 2 = E e−(𝜌
2 +𝜌′2 ) 𝑡 2 /2
.
Hence, by Plancherel’s theorem, ∫
∞
−∞
∞
1 2𝜋
∫
1 = 2𝜋
∫
(Φ𝜌 (𝑥) − Φ(𝑥)) 2 d𝑥 =
−∞ ∞ −∞
E e𝑖𝑡𝜌𝑍 − e−𝑡 𝑡 E e−(𝜌
2 /2
2 +𝜌′2 ) 𝑡 2 /2
2 d𝑡
− 2 E e−(𝜌 𝑡2
2 +1) 𝑡 2 /2
+ e−𝑡
2
d𝑡.
Applying Lemma 12.5.2 with 𝛼0 = 0, we arrive at the identity √︂ √︂ i 𝜌2 + 1 𝜌 2 + 𝜌 ′2 1 h 2 −E −1 . 𝜔 (Φ𝜌 , Φ) = √ 2 E 2 2 𝜋 As a result, (12.19) can be restated as the relation √︂ √︂ 𝜌2 + 1 𝜌 2 + 𝜌 ′2 −E − 1 ≤ 𝑐 (Var(𝜌)) 2 , 2E 2 2 where 𝑐 is an absolute constant. It is unclear how to obtain such an estimate by a different argument (which would not be based on Proposition 12.4.1). Finally, let us sharpen Proposition 12.5.1 √ under the stronger hypothesis that the distribution of 𝑋 lies on the sphere of radius 𝑛 (that is, when 𝜎2 = 𝜎4 = 0).
236
12 Typical Distributions
Proposition 12.5.3 Let 𝑋 be a random vector in R𝑛 with |𝑋 | 2 = 𝑛 a.s. Then 𝜔2 (𝐹, Φ) =
15 1 + 𝑂 (𝑛−3 ). √ 256 𝜋 𝑛2
Proof The characteristic function of 𝐹 is described as √ 𝑓 (𝑡) = E 𝜃 𝑓 𝜃 (𝑡) = E 𝜃 E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ = E 𝐽𝑛 (𝑡|𝑋 |) = 𝐽𝑛 (𝑡 𝑛),
𝑡 ∈ R,
where 𝐽𝑛 is the characteristic function of the first coordinate 𝜃 1 under the uniform measure 𝔰𝑛−1 on the unit sphere S𝑛−1 . We apply the Plancherel formula and the expansion for 𝐽𝑛 from Corollary 11.3.3, 1 √ 𝑡 4 −𝑡 2 /2 e + 𝑂 2 min(1, 𝑡 4 ) . 𝐽𝑛 𝑡 𝑛 = 1 − 4𝑛 𝑛 This gives ∫ ∞ √ 2 d𝑡 1 |𝐽𝑛 𝑡 𝑛 − e−𝑡 /2 | 2 2 2𝜋 −∞ 𝑡 ∫ ∞ 4 2 d𝑡 1 𝑡 −𝑡 2 /2 = + 𝑂 (𝑛−2 ) min(1, 𝑡 4 ) 2 e 2𝜋 −∞ 4𝑛 𝑡 √ ∫ ∞ 6 1 𝑡 1 15 2𝜋 −𝑡 2 −3 = e d𝑡 + 𝑂 (𝑛 ) = √ + 𝑂 (𝑛−3 ), 2𝜋 −∞ 16 𝑛2 2𝜋 16𝑛2 · 8 2
𝜔2 (𝐹, Φ) =
thus proving the proposition.
□
Finally, let us explain the non-uniform bound mentioned in the beginning of this section. It is a consequence of Proposition 12.4.1, using the following general observation. Lemma 12.5.4 Suppose that the distribution functions 𝐹 and 𝐺 have densities 𝑝 and 𝑞 such that the weighted total variation distance ∫ ∞ 𝐿= (1 + 𝑥 2 ) | 𝑝(𝑥) − 𝑞(𝑥)| d𝑥 −∞
is finite. Then h i sup (1 + 𝑥 2 ) |𝐹 (𝑥) − 𝐺 (𝑥)| ≤ 2𝐿. 𝑥
12.6 Lower Bounds
237
Proof By Fubini’s theorem, ∫ ∞ ∫ ∞ ∫ ∞ 𝑥 |𝑥 (𝐹 (𝑥) − 𝐺 (𝑥))| d𝑥 = ( 𝑝(𝑦) − 𝑞(𝑦)) d𝑦 d𝑥 0 ∫0 ∞ h ∫𝑥 ∞ i ≤ 𝑥 | 𝑝(𝑦) − 𝑞(𝑦)| d𝑦 d𝑥 0 𝑥 ∫ 1 ∞ 2 = 𝑦 | 𝑝(𝑦) − 𝑞(𝑦)| d𝑦. 2 0 A similar bound also holds for the integral over the negative half-axis, so ∫ ∞ 1 |𝑥 (𝐹 (𝑥) − 𝐺 (𝑥))| d𝑥 ≤ 𝐿. 2 −∞ The function 𝐻 (𝑥) = (1 + 𝑥 2 ) (𝐹 (𝑥) − 𝐺 (𝑥)) has a Radon–Nikodym derivative ℎ(𝑥) = 2𝑥 (𝐹 (𝑥) − 𝐺 (𝑥)) + (1 + 𝑥 2 ) ( 𝑝(𝑥) − 𝑞(𝑥)), implying that 𝐻 has a total variation norm satisfying ∫ ∞ ∥𝐻 ∥ TV = |ℎ(𝑥)| d𝑥 −∞ ∫ ∞ ∫ ∞ ≤2 |𝑥 (𝐹 (𝑥) − 𝐺 (𝑥))| d𝑥 + (1 + 𝑥 2 ) | 𝑝(𝑥) − 𝑞(𝑥)| d𝑥 ≤ 2𝐿. −∞
−∞
Since |𝐻 (𝑥)| ≤ ∥𝐻 ∥ TV for all 𝑥, the required inequality follows. As its consequence, we obtain that, for any 𝑝 ≥ 1, ∫ ∞ 1/ 𝑝 ∫ ∞ |𝐹 (𝑥) − 𝐺 (𝑥)| 𝑝 d𝑥 ≤ 2𝐿 −∞
−∞
□
1/ 𝑝 1 d𝑥 . 2 𝑝 (1 + 𝑥 )
The last integral is smaller than 4, so that the on the left is bounded by 8𝐿. This explains the bound (12.18), as well as the first bound in Proposition 12.5.1 via an application of Proposition 12.2.2. 𝐿 𝑝 -norm
12.6 Lower Bounds In a certain sense, the upper bound of Proposition 12.3.1 on the Kolmogorov distance is optimal with respect to the variance of 𝜌 2 . At least, this is the case when 𝜌 is bounded, as the following assertion shows.
238
12 Typical Distributions
Proposition 12.6.1 If E𝜌 2 = 1 and 0 ≤ 𝜌 ≤ 𝑀 a.s., then for the distribution function Φ𝜌 of the random variable 𝜌𝑍, where 𝑍 ∼ 𝑁 (0, 1) is independent of 𝜌, we have sup |Φ𝜌 (𝑥) − Φ(𝑥)| ≥ 𝑥
𝑐 Var(𝜌 2 ), 𝑀5
(12.20)
where 𝑐 > 0 is an absolute constant. Note that the left-hand side is dominated by the total variation ∥Φ𝜌 − Φ∥ TV , while Var(𝜌 2 ) ≥ Var(𝜌). Also, the assumption E𝜌 2 = 1 ensures that 𝑀 ≥ 1. Proof We will need a general lower bound on the Kolmogorov distance Δ = sup |𝐹 (𝑥) − 𝐺 (𝑥)| 𝑥
between distribution functions 𝐹 and 𝐺 in terms of their characteristic functions 𝑓 and 𝑔. Recall that according to Proposition 3.4.1, ∫ 2 1 ∞ ( 𝑓 (𝑡) − 𝑔(𝑡)) e−𝑡 /2 d𝑡 . Δ≥ √ −∞ 2 2𝜋 We apply this bound with 𝐹 = Φ𝜌 and 𝐺 = Φ, in which case, by Jensen’s inequality, 𝑓 (𝑡) = E e−𝜌
2 𝑡 2 /2
≥ e−𝑡
2 /2
= 𝑔(𝑡).
Hence ∫ ∞ 2 2 2 2 1 E e−𝜌 𝑡 /2 − e−𝑡 /2 e−𝑡 /2 d𝑡 sup |Φ𝜌 (𝑥) − Φ(𝑥)| ≥ √ 𝑥 2 2𝜋 −∞ 1 1 1 = E √︁ −√ . (12.21) 2 2 1 + 𝜌2 One can now expand the function 𝑢(𝑡) = (1 + 𝑡) −1/2 near the point 𝑡0 = 1 according to the integral Taylor formula (12.8) up to the quadratic term. Since 𝑢 ′′ (𝑡) = 34 (1 + 𝑡) −5/2 , this gives ∫ 1 −5/2 3 E (𝜌 2 − 1) 2 1 + ((1 − 𝑠) + 𝑠𝜌 2 ) (1 − 𝑠) d𝑠 4 0 ∫ 1 −5/2 3 ≥ Var(𝜌 2 ) 1 + ((1 − 𝑠) + 𝑠𝑀 2 ) (1 − 𝑠) d𝑠 4 0 3 ≥ Var(𝜌 2 ) (1 + 𝑀 2 ) −5/2 , 8
E 𝑢(𝜌 2 ) − 𝑢(1) =
where we used 𝑀 ≥ 1 in the last step. It remains to apply (12.21) and then we arrive at (12.20). Proposition 12.6.1 follows. □
12.7 Remarks
239
Finally, returning to the scheme of the weighted sums ⟨𝑋, 𝜃⟩ (𝜃 ∈ S𝑛−1 ) with typical distribution function 𝐹, let us give an asymptotic analog of Proposition 12.5.3 for the Kolmogorov distance under the same hypothesis that the distribution √ of the random vector 𝑋 in R𝑛 lies on the sphere of radius 𝑛. √ Proposition 12.6.2 If |𝑋 | = 𝑛 a.s., then 𝑐0 𝑐1 ≤ sup |𝐹 (𝑥) − Φ(𝑥)| ≤ 𝑛 𝑛 𝑥 for some numerical constants 𝑐 1 > 𝑐 0 > 0. Proof Since 𝜎42 (𝑋) = 0, the upper bound follows from Proposition 12.2.2. For the lower bound, we use the second lower bound of Proposition 3.4.1, ∫ 1 𝑇 𝑡 ( 𝑓 (𝑡) − 𝑔(𝑡)) 1 − d𝑡 , Δ≥ 3𝑇 0 𝑇 and once more Corollary 11.3.3 with its asymptotic formula for the characteristic function 𝑓 of 𝐹, namely, 𝑡 4 −𝑡 2 /2 e + 𝑂 (𝑛−2 min(1, 𝑡 4 )). 𝑓 (𝑡) = 1 − 4𝑛 Putting 𝑔(𝑡) = e−𝑡
2 /2
and choosing 𝑇 = 1, the above lower bound then yields
Δ = sup |𝐹 (𝑥) − Φ(𝑥)| ≥ 𝑥
𝑐2 𝑐 1 𝑐 1′ − 2 ≥ , 𝑛 𝑛 𝑛
𝑛 ≥ 𝑛0 ,
for some numerical constants 𝑐 1 , 𝑐 2 > 0, 𝑐 1′ ∈ R and some integer 𝑛0 ≥ 1. In the case 𝑛 < 𝑛0 , the statement is obvious, since the distribution 𝐹 is supported on the √ √ interval [− 𝑛0 , 𝑛0 ], so that Δ is bounded away from zero for all 𝑛 < 𝑛0 . Proposition 12.6.2 follows. □
12.7 Remarks Propositions 12.2.2 and 12.4.1 were shown in [37]. A closely related property in the form of a local limit theorem was studied in [51], which contains the following observation. Suppose that a random vector 𝑋 is uniformly distributed in a convex body 𝐾 in R𝑛 with volume Vol𝑛 (𝐾) = 1 and satisfies ∫ ⟨𝑥, 𝜃⟩ 2 d𝑥 = 𝐿 2𝐾 , 𝜃 ∈ S𝑛−1 . E ⟨𝑋, 𝜃⟩ 2 = 𝐾
That is, the random vector 𝑌 = typical distribution 𝐹, we have
1 𝐿𝐾
𝑋 is isotropic. Then, for the density 𝑝 of the
240
12 Typical Distributions
𝑝(𝑥) −
1 √ 𝐿 𝐾 2𝜋
e−𝑥
2 /(2𝐿 2 ) 𝐾
𝜎 𝐿 1 4 𝐾 ≤ 𝐶 2√ + , 𝑥 𝑛 𝑛
√ 0 < 𝑥 ≤ 𝑐 𝑛,
for some absolute constants 𝐶 and 𝑐, where 𝜎42 = 𝜎42 (𝑌 ) =
1 1 Var(|𝑌 | 2 ) = Var(|𝑋 | 2 ). 𝑛 𝑛𝐿 4𝐾
Note that the typical density may also be described as ∫ 𝑝(𝑥) = Vol𝑛−1 𝐾 ∩ 𝐻 𝜃 (|𝑥|) d𝔰𝑛−1 (𝜃),
𝑥 ∈ R,
𝑆 𝑛−1
which expresses the average (𝑛 − 1)-dimensional volume of the section of 𝐾 by the hyperplanes 𝐻 𝜃 (|𝑥|) = {𝑦 ∈ R𝑛 : ⟨𝑦, 𝜃⟩ = |𝑥|} perpendicular to 𝜃 at distance |𝑥| from the origin. For some special bodies the normal approximation as described above has been studied in [71] and [126], cf. also [70].
Chapter 13
Characteristic Functions of Weighted Sums
In order to study deviations of the distribution functions 𝐹𝜃 from the typical distribution 𝐹 by means of the Kolmogorov distance, Berry–Esseen-type inequalities, which we discussed in Chapter 3, will be used. To this end we need to focus first on the behavior of characteristic functions of 𝐹𝜃 .
13.1 Upper Bounds on Characteristic Functions As before, let 𝑋 = (𝑋1 , . . . .𝑋𝑛 ) be a random vector in R𝑛 , 𝑛 ≥ 2. Problem 12.1.2 asking for the concentration of the distribution functions 𝐹𝜃 of the weighted sums ⟨𝑋, 𝜃⟩ around the typical distribution 𝐹 = E 𝜃 𝐹𝜃 may be attacked by the study of the characteristic functions 𝑓 𝜃 (𝑡) = E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ ,
𝑡 ∈ R.
(13.1)
Therefore, we aim to quantify the concentration of 𝑓 𝜃 around the characteristic function 𝑓 of the typical distribution 𝐹 for most directions 𝜃 in terms of the correlationtype functionals introduced in Chapter 1. Recall that, in view of Proposition 12.2.1, the characteristic function of 𝐹 is described as 𝑓 (𝑡) = E 𝜃 𝑓 𝜃 (𝑡) = E 𝜃 E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ = E 𝐽𝑛 (𝑡|𝑋 |),
𝑡 ∈ R,
where 𝐽𝑛 is the characteristic function of the first coordinate 𝜃 1 under the uniform measure 𝔰𝑛−1 on the unit sphere S𝑛−1 . First let us look at the decay of 𝑓 𝜃 (𝑡) at infinity on average. From (13.1), E 𝜃 | 𝑓 𝜃 (𝑡)| 2 = E 𝜃 E e𝑖𝑡 ⟨𝑋−𝑌 , 𝜃 ⟩ = E 𝐽𝑛 (𝑡|𝑋 − 𝑌 |), where 𝑌 is an independent copy of 𝑋. To proceed, let us rewrite the Gaussian-type bound (11.5) of Proposition 11.4.1 as
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_13
241
242
13 Characteristic Functions of Weighted Sums
|𝐽𝑛 (𝑡)| ≤ 3 e−𝑡
2 /2𝑛
+ 3 e−𝑛/12
(13.2)
which gives E 𝜃 | 𝑓 𝜃 (𝑡)| 2 ≤ 3 E e−𝑡
2 |𝑋−𝑌 | 2 /2𝑛
+ 3 e−𝑛/12 .
Splitting the latter expectation into the event 𝐵 = {|𝑋−𝑌 | 2 ≤ 𝜆𝑛} and its complement, we get the following general statement. Lemma 13.1.1 The characteristic functions 𝑓 𝜃 satisfy, for all 𝑡 ∈ R and 𝜆 > 0, 2 1 E 𝜃 | 𝑓 𝜃 (𝑡)| 2 ≤ e−𝜆𝑡 /2 + e−𝑛/12 + P{|𝑋 − 𝑌 | 2 ≤ 𝜆𝑛}, 3
where 𝑌 is an independent copy of 𝑋. In particular, √︁ 2 1 √ E 𝜃 | 𝑓 𝜃 (𝑡)| ≤ e−𝜆𝑡 /4 + e−𝑛/24 + P{|𝑋 − 𝑌 | 2 ≤ 𝜆𝑛}. 3 If E |𝑋 | 2 = 𝑛, the right-hand side of these bounds can be further developed by involving the moment and variance-type functionals 1 𝑚 𝑝 = √ (E | ⟨𝑋, 𝑌 ⟩ | 𝑝 ) 1/ 𝑝 , 𝑛
2 𝑝 1/ 𝑝 |𝑋 | − 1 . = 𝑛 E 𝑛 √
𝜎2 𝑝
Both are non-decreasing functions in 𝑝 ≥ 1. Recall that, by Proposition 1.1.3, 𝑚 2 ≥ 1 due to E |𝑋 | 2 = 𝑛, so that 𝑚 𝑝 ≥ 1 for all 𝑝 ≥ 1. The most interesting special values are 2𝑝 = 2, 3, 4, in which cases 𝜎2 =
1 E | |𝑋 | 2 − 𝑛|, 𝑛1/2
𝜎33/2 =
1 E | |𝑋 | 2 − 𝑛| 3/2 , 𝑛3/4
𝜎42 =
1 Var(|𝑋 | 2 ). 𝑛
In order to estimate the probability of the event 𝐵, let us choose 𝜆 = 1/4. Then, by Proposition 1.6.2, P(𝐵) ≤ 𝐶𝑛− 𝑝 with constant 𝐶 = 42 𝑝 (𝑚 22 𝑝𝑝 + 𝜎22𝑝𝑝 ). From Lemma 13.1.1 we therefore deduce: Lemma 13.1.2 Let the random vector 𝑋 in R𝑛 satisfy E |𝑋 | 2 = 𝑛. If the moment 𝑚 2 𝑝 is finite for 𝑝 ≥ 1, then for some constant 𝑐 > 0, 𝑐 𝑝 E 𝜃 | 𝑓 𝜃 (𝑡)| 2 ≤ 𝑐 𝑝 E 𝜃 | 𝑓 𝜃 (𝑡)| ≤
𝑚 22 𝑝𝑝 + 𝜎22𝑝𝑝 𝑛𝑝 𝑚 2𝑝𝑝 + 𝜎2𝑝𝑝 𝑛 𝑝/2
+ e−𝑡
+ e−𝑡
2 /8
2 /16
,
.
Note that, by Jensen’s inequality, | 𝑓 (𝑡)| ≤ E 𝜃 | 𝑓 𝜃 (𝑡)|. Hence, the characteristic function of the typical distribution shares the same bounds. In fact, the parameter 𝑚 2 𝑝 is not needed. Indeed, by Proposition 1.5.1 with 𝜆 = 12 , if E |𝑋 | 2 = 𝑛, then
13.1 Upper Bounds on Characteristic Functions
243
𝜎2𝑝𝑝 1 o 𝑝 P |𝑋 | ≤ 𝑛 ≤ 2 𝑝/2 . 2 𝑛
n
2
Hence, by Lemma 13.1.1, and using (13.2), | 𝑓 (𝑡)| ≤ E |𝐽𝑛 (𝑡|𝑋 |)| 1 { |𝑋 | ≤√𝑛/2} + E |𝐽𝑛 (𝑡|𝑋 |)| 1 { |𝑋 |>√𝑛/2} ≤2
𝜎2𝑝𝑝
𝑝
𝑛 𝑝/2
+ 3 e−𝑡
2 /4
+ e−𝑛/12 .
Thus, we get: Lemma 13.1.3 Let the random vector 𝑋 in R𝑛 satisfy E |𝑋 | 2 = 𝑛. Then with some constant 𝑐 𝑝 > 0 depending on 𝑝 ≥ 1, for all 𝑡 ∈ R, 𝑐 𝑝 | 𝑓 (𝑡)| ≤
1 + 𝜎2𝑝𝑝 𝑛 𝑝/2
+ e−𝑡
2 /4
.
Therefore, for all 𝑇 > 0, 𝑐𝑝 𝑇
∫
𝑇
| 𝑓 (𝑡)| d𝑡 ≤ 0
1 + 𝜎2𝑝𝑝 𝑛 𝑝/2
+
1 . 𝑇
These upper bounds may be considerably sharpened in the case of independent random variables. Define the average 4-th moment 𝑛
1 ∑︁ E𝑋 𝑘4 . 𝛽¯4 = 𝑛 𝑘=1 Lemma 13.1.4 If 𝑋1 , . . . , 𝑋𝑛 are independent, have mean zero, variance one, and finite moment 𝛽¯4 , then E 𝜃 | 𝑓 𝜃 (𝑡)| 2 ≤ 3e−𝑡
2 /2
¯
+ 6e−𝑛/(16 𝛽4 ) .
As a consequence, E 𝜃 | 𝑓 𝜃 (𝑡)| ≤
√ −𝑡 2 /4 √ −𝑛/(32 𝛽¯ ) 4 3e + 6e .
Proof Let 𝑌 √ = (𝑌1 , . . . , 𝑌𝑛 ) be an independent copy of 𝑋. The random vector e = (𝑋 − 𝑌 )/ 2 is isotropic and has mean zero, so, 𝑀2 ( 𝑋) e = 1, while 𝑋 e = 𝜎42 ( 𝑋)
𝑛 1 ∑︁ 1 e| 2 = 1 Var | 𝑋 Var (𝑋 𝑘 − 𝑌𝑘 ) 2 = ( 𝛽¯4 + 1). 𝑛 4𝑛 𝑘=1 2
Since 𝛽¯4 ≥ 1, we get e + 𝑀 4 ( 𝑋) e = 𝜎42 ( 𝑋) 2
1 ¯ ( 𝛽4 + 1) + 1 ≤ 2 𝛽¯4 . 2
244
13 Characteristic Functions of Weighted Sums
e with 𝜆 = 1 , we may therefore conclude that Applying Proposition 2.1.4 to 𝑋 2 n o e| 2 ≤ 𝜆𝑛 P{|𝑋 − 𝑌 | 2 ≤ 𝑛} = P | 𝑋 n o (1 − 𝜆) 2 ≤ exp − 𝑛 ≤ exp − 𝑛/(16 𝛽¯4 ) . e + 𝑀 4 ( 𝑋) e 2 𝜎42 ( 𝑋) 2 Hence, Lemma 13.1.1, now with 𝜆 = 1, gives E 𝜃 | 𝑓 𝜃 (𝑡)| 2 ≤ 3e−𝑡
2 /2
¯
+ 3e−𝑛/12 + 3e−𝑛/(16 𝛽4 ) ≤ 3e−𝑡
2 /2
¯
+ 6e−𝑛/(16 𝛽4 ) .
It remains to apply Cauchy’s inequality. Lemma 13.1.4 is proved.
□
13.2 Concentration Functions of Weighted Sums The integral appearing in Lemma 13.1.3 provides a well-known upper bound on the concentration function, which in general is defined by 𝑄 𝜉 (ℎ) = 𝑄 𝐹 (ℎ) = sup P{𝑥 ≤ 𝜉 ≤ 𝑥 + ℎ},
ℎ ≥ 0.
𝑥
Here, 𝜉 is an arbitrary random variable with distribution function 𝐹. This quantity gives important information about the behavior of 𝐹 – in particular, about possible concentration of the “mass” around a point. But this coincidence in terminology is irrelevant to the spherical concentration and its consequences. According to Lemma 3.2.2, 96 2 ∫ 1/ℎ ℎ | 𝑓 (𝑡)| d𝑡, 𝑄 𝐹 (ℎ) ≤ 2 95 0
ℎ > 0,
where 𝑓 is the characteristic function of 𝜉. We will use this bound to estimate the concentration function of the weighted sums ⟨𝑋, 𝜃⟩, which thus gives ∫
1/ℎ
𝑄 𝐹𝜃 (ℎ) ≤ 3ℎ
| 𝑓 𝜃 (𝑡)| d𝑡. 0
Applying Lemma 13.1.2 with 𝑝 = 2, we immediately obtain an upper bound on the concentration functions on average, namely E 𝜃 𝑄 𝐹𝜃 (1/𝑇) ≤ 𝑐 𝑚 42 + 𝜎42
1 𝑐 + , 𝑛 𝑇
Here we may choose 𝑇 = 𝑛/(𝑚 42 + 𝜎42 ), and then we get:
𝑇 > 0.
13.3 Deviations of Characteristic Functions
245
Proposition 13.2.1 Given a random vector 𝑋 in R𝑛 with finite 4-th moment and such that E |𝑋 | 2 = 𝑛, put 𝛾 = 𝑚 42 (𝑋) + 𝜎42 (𝑋). For some absolute constant 𝑐 > 0 we have 𝛾 E 𝜃 𝑄 𝐹𝜃
≤
𝑛
𝑐𝛾 . 𝑛
By Markov’s inequality, given 𝑟 > 0, on the part of the unit sphere S𝑛−1 of 𝔰𝑛−1 -measure at least 1 − 2𝑐 𝑟 , we have that 𝛾 𝑄 𝐹𝜃
≤
𝑛
𝑟 𝛾 . 2 𝑛
Let us note that all concentration functions are subadditive, that is, 𝑄(ℎ1 + ℎ2 ) ≤ 𝑄(ℎ1 ) + 𝑄(ℎ2 ) for all ℎ1 , ℎ2 ≥ 0. In particular, 𝑄(𝜆ℎ) ≤ (𝜆 + 1) 𝑄(ℎ) ≤ 2𝜆 𝑄(ℎ), whenever 𝜆 ≥ 1, ℎ ≥ 0. Hence, the above inequality implies 𝛾 𝛾 ≤ 𝜆𝑟 , 𝜆 ≥ 1. 𝑄 𝐹𝜃 𝜆 𝑛 𝑛 Substituting ℎ = 𝜆 𝛾𝑛 , we obtain a partial answer to the question raised after the formulation of Problem 12.1.3. Corollary 13.2.2 Let E |𝑋 | 2 = 𝑛. Given 𝑟 > 0, on the part of the unit sphere of 𝔰𝑛−1 -measure at least 1 − 𝑟𝑐 with an absolute constant 𝑐 > 0, we have 𝑄 𝐹𝜃 (ℎ) ≤ 𝑟 ℎ
for all ℎ ≥
𝑚 42 (𝑋) + 𝜎42 (𝑋) . 𝑛
This statement shows that, when 𝑚 4 and 𝜎4 are bounded, most of the distribution functions 𝐹𝜃 behave like Lipschitz functions whose “almost” Lipschitz semi-norm ∥𝐹𝜃 ∥ Lip 𝜀 ≡ sup
𝑥−𝑦 ≥ 𝜀
𝐹𝜃 (𝑥) − 𝐹𝜃 (𝑦) 𝑥−𝑦
with 𝜀 =
𝑚 42 (𝑋) + 𝜎42 (𝑋) 𝑛
is of order 1.
13.3 Deviations of Characteristic Functions Now we are ready to apply the first order concentration on the sphere to study Problem 12.1.2 in terms of characteristic functions 𝑓 𝜃 (𝑡) with fixed 𝑡 ∈ R, rather than directly for the distributions 𝐹𝜃 . This can be done using the moment functionals 𝑀 𝑝 (𝑋) = sup
E | ⟨𝑋, 𝜃⟩ | 𝑝
1/ 𝑝 .
𝜃 ∈S𝑛−1
As before, let 𝑓 (𝑡) = E 𝜃 𝑓 𝜃 (𝑡) denote the characteristic function of the typical distribution 𝐹. The complex-valued function 𝑢 𝑡 (𝜃) = 𝑓 𝜃 (𝑡) = E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ is smooth
246
13 Characteristic Functions of Weighted Sums
on the whole space R𝑛 and has partial derivatives 𝜕𝑘 𝑢 𝑡 (𝜃) = 𝑖𝑡 E𝑋 𝑘 e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ , 𝜕𝜃 𝑘
𝑘 = 1, . . . , 𝑛,
or in vector form ⟨∇𝑢 𝑡 (𝜃), 𝑣⟩ = 𝑖𝑡 E ⟨𝑋, 𝑣⟩ e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ ,
𝑣 ∈ C𝑛 ,
where we use a standard inner product in the complex Euclidean 𝑛-space. Hence, writing 𝑣 = 𝑎 + 𝑏𝑖, 𝑎, 𝑏 ∈ R𝑛 , we have in terms of 𝑀1 = 𝑀1 (𝑋) | ⟨∇𝑢 𝑡 (𝜃), 𝑣⟩ | ≤ |𝑡| E | ⟨𝑋, 𝑣⟩ | ≤ |𝑡| (E | ⟨𝑋, 𝑎⟩ | + E | ⟨𝑋, 𝑏⟩ |) ≤ 𝑀1 |𝑡| (|𝑎| + |𝑏|). This readily implies uniform bounds on the modulus of the gradient, namely |∇ Re(𝑢 𝑡 (𝜃))| ≤ 𝑀1 |𝑡|, |∇ Im(𝑢 𝑡 (𝜃))| ≤ 𝑀1 |𝑡|, √ so that |∇𝑢 𝑡 (𝜃)| ≤ 2 𝑀1 |𝑡|. By Cauchy’s inequality, we also have | ⟨∇𝑢 𝑡 (𝜃), 𝑣⟩ | 2 ≤ 𝑡 2 E | ⟨𝑋, 𝑣⟩ | 2 ≤ 𝑀2 𝑡 2 (|𝑎| 2 + |𝑏| 2 ) = 𝑀2 𝑡 2 |𝑣| 2 , so that |∇𝑢 𝑡 (𝜃)| ≤ 𝑀2 |𝑡| (where 𝑀2 = 𝑀2 (𝑋)). It is also interesting to get a similar bound holding on average. With this aim, let us square the vector representation and write ⟨∇𝑢 𝑡 (𝜃), 𝑣⟩ 2 = 𝑡 2 E ⟨𝑋, 𝑣⟩ ⟨𝑌 , 𝑣⟩ e𝑖𝑡 ⟨𝑋−𝑌 , 𝜃 ⟩ ,
𝑣 ∈ S𝑛−1 ,
where 𝑌 is an independent copy of 𝑋. Integrating over 𝑣 with respect to 𝔰𝑛−1 , we get |∇𝑢 𝑡 (𝜃)| 2 = 𝑡 2 E ⟨𝑋, 𝑌 ⟩ e𝑖𝑡 ⟨𝑋−𝑌 , 𝜃 ⟩ . One can now integrate over 𝜃 ∈ S𝑛−1 and summarize. Lemma 13.3.1 Given 𝑡 ∈ R, the function 𝑢 𝑡 (𝜃) = E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ satisfies √ |∇𝑢 𝑡 (𝜃)| ≤ 2𝑀1 (𝑋) |𝑡|, |∇𝑢 𝑡 (𝜃)| ≤ 𝑀2 (𝑋) |𝑡| for all 𝜃 ∈ R𝑛 , Moreover, using an independent copy 𝑌 of 𝑋, we have E 𝜃 |∇𝑢 𝑡 (𝜃)| 2 = 𝑡 2 E ⟨𝑋, 𝑌 ⟩ 𝐽𝑛 (𝑡 (𝑋 − 𝑌 )). Recall that 𝐽𝑛 (𝑡) denotes the characteristic function of the first coordinate of a point drawn from the unit sphere S𝑛−1 according to the uniform distribution 𝔰𝑛−1 . Thus, the Lipschitz property of the functions 𝜃 → 𝑓 𝜃 (𝑡) can be expressed in terms of the first absolute moment 𝑀1 (𝑋) of the random vector 𝑋 in R𝑛 . Applying Proposition 10.3.1 separately to the real and imaginary parts of the function
13.3 Deviations of Characteristic Functions
𝑢(𝜃) =
247
1 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)) |𝑡|𝑀1 (𝑋)
(𝑡 ≠ 0),
we then get the subgaussian deviation inequality 2 𝔰𝑛−1 (𝑛 − 1) |Re(𝑢(𝜃))| ≥ 𝑟 ≤ 2e−𝑟 /2 , and similarly for Im(𝑢(𝜃)), valid for any 𝑟 ≥ 0. To unite both, note that |𝑢(𝜃)| ≥ 𝑟 implies 𝑟 𝑟 |Re(𝑢(𝜃))| ≥ √ or |Im(𝑢(𝜃))| ≥ √ . 2 2 As a result, we arrive at: Proposition 13.3.2 If 𝑀1 (𝑋) ≤ 𝑀1 , then for all 𝑡 ∈ R and 𝑟 ≥ 0, 2 2 2 𝔰𝑛−1 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| ≥ 𝑟 ≤ 4 e−(𝑛−1) 𝑟 /(4𝑀1 𝑡 ) . We also have corresponding bounds on the 𝐿 𝑝 -norms ∥ 𝑓 𝜃 (𝑡)− 𝑓 (𝑡) ∥ 𝑝 with respect to the measure 𝔰𝑛−1 . In the cases 𝑝 = 1 and 𝑝 = 2 such bounds can also be derived from the Cheeger-type (Proposition 9.5.1) and Poincaré inequalities on the sphere. Proposition 13.3.3 If 𝑀1 (𝑋) ≤ 𝑀1 , then for all 𝑡 ∈ R, E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤
2 (𝑀1 𝑡) 2 . 𝑛−1
(13.3)
More generally E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 𝑝 ≤
2( 𝑝 − 1) 𝑝/2 𝑛−1
(𝑀1 |𝑡|) 𝑝 ,
𝑝 ≥ 2.
Here, the 𝐿 𝑝 -bounds are based on Proposition 10.3.2. If we choose to use 𝑀2 (𝑋) rather than 𝑀1 (𝑋), the constant 2 may be removed from both inequalities. Moreover, under the additional assumption that the functions 𝑢 𝑡 (𝜃) = 𝑓 𝜃 (𝑡) are orthogonal to all linear functions in 𝐿 2 (𝔰𝑛−1 ), the bound (13.3) may be improved by virtue of the Poincaré-type inequality (9.11), which gives E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤
1 𝑀 2 (𝑋) 𝑡 2 . 2𝑛 2
(13.4)
Thus, the first order moment condition guarantees the standard √1𝑛 -rate of deviations for 𝑓 𝜃 (𝑡) around 𝑓 (𝑡) and hence for the distribution functions 𝐹𝜃 around 𝐹, at least in a weak sense (by applying tools from Fourier Analysis). Sudakov formulated this property under a (stronger) second order moment condition 𝑀2 (𝑋) ≤ 𝑀2 , which includes all isotropic random vectors. Let us state Proposition 13.3.2 once more for this important particular case.
248
13 Characteristic Functions of Weighted Sums
Corollary 13.3.4 If the random vector 𝑋 in R𝑛 is isotropic, then, for all 𝑡 ∈ R and 𝑟 ≥ 0, 2 2 𝔰𝑛−1 (𝑛 − 1) | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| ≥ 𝑟 ≤ 4 e−𝑟 /(4𝑡 ) . Moreover, E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 𝑝 ≤
𝑝 − 1 𝑝/2 𝑛−1
|𝑡| 𝑝 ,
𝑝 ≥ 2.
13.4 Deviations in the Symmetric Case In order to improve the standard rate in the bounds of Proposition 13.3.3 and Corollary 13.3.4 with respect to the dimension for deviations of 𝑢 𝑡 (𝜃) = 𝑓 𝜃 (𝑡) = E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ ,
𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ R𝑛 ,
(13.5)
about the mean characteristic function 𝑓 (𝑡) (under the measure 𝔰𝑛−1 ), we employ the second order concentration phenomenon on the sphere. This requires us to impose a corresponding orthogonality condition and to consider the second partial derivatives of 𝑢 𝑡 (𝜃) with 𝑡 being a real parameter. In turn, the study of the Hessian of 𝑢 𝑡 inspires us to impose a second order correlation condition on 𝑋. Thus, let 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) be an isotropic random vector in R𝑛 satisfying a second order correlation condition with finite constant Λ = Λ(𝑋), that is, 𝑛 𝑛 ∑︁ 2 ∑︁ E 𝑧 𝑗 𝑘 (𝑋 𝑗 𝑋 𝑘 − 𝛿 𝑗 𝑘 ) ≤ Λ |𝑧 𝑗 𝑘 | 2 𝑗,𝑘=1
(13.6)
𝑗,𝑘=1
for all complex numbers 𝑧 𝑗 𝑘 . To apply Proposition 10.2.2 (the second order Poincarétype inequality) and Proposition 10.4.2 (the second order concentration), let us also assume that 𝑋 has a symmetric distribution. In this case, 𝑢 𝑡 (𝜃) are even with respect to 𝜃, i.e., 𝑢 𝑡 (−𝜃) = 𝑢 𝑡 (𝜃), and hence these functions are orthogonal to all linear functions in 𝐿 2 (𝔰𝑛−1 ) for any 𝑡 ∈ R. In order to show that the deviations of 𝑢 𝑡 are of order 1/𝑛, we need to choose a suitable value of 𝑎 ∈ C and develop the operator norm ∥𝑢 𝑡′′ − 𝑎 I𝑛 ∥ and the Hilbert– Schmidt norm ∥𝑢 𝑡′′ − 𝑎 I𝑛 ∥ HS . First note that, by direct differentiation in (13.5), [𝑢 𝑡′′ (𝜃)] 𝑗 𝑘 =
𝜕2 𝑓 𝜃 (𝑡) = −𝑡 2 E 𝑋 𝑗 𝑋 𝑘 e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ . 𝜕𝜃 𝑗 𝜕𝜃 𝑘
Hence, it makes sense to take 𝑎 = −𝑡 2 𝑓 (𝑡) to balance the diagonal elements in the matrix of second derivatives of 𝑢 𝑡 . For all vectors 𝑣, 𝑤 ∈ C𝑛 with complex components, using the canonical inner product in the complex 𝑛-space, we have
𝑢 𝑡′′ (𝜃)𝑣, 𝑤 = −𝑡 2 E ⟨𝑋, 𝑣⟩ ⟨𝑋, 𝑤⟩ e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ .
13.4 Deviations in the Symmetric Case
249
The isotropy assumption ensures that E | ⟨𝑋, 𝑣⟩ | 2 = |𝑣| 2 and similarly for 𝑤. Hence, with this choice of 𝑎, by Cauchy’s inequality,
| (𝑢 𝑡′′ (𝜃) − 𝑎 I𝑛 )𝑣, 𝑤 | ≤ 𝑡 2 E | ⟨𝑋, 𝑣⟩ | | ⟨𝑋, 𝑤⟩ | + |𝑎| | ⟨𝑣, 𝑤⟩ | ≤ 2𝑡 2 , whenever |𝑣| = 1 and |𝑤| = 1. This bound ensures that ∥𝑢 𝑡′′ (𝜃) − 𝑎 I𝑛 ∥ ≤ 2𝑡 2
(𝜃 ∈ R𝑛 , 𝑡 ∈ R).
(13.7)
Now, putting 𝑎(𝜃) = −𝑡 2 𝑓 𝜃 (𝑡), we have ∥𝑢 𝑡′′ (𝜃) − 𝑎(𝜃) I𝑛 ∥ 2HS =
𝑛 ∑︁ 𝑢 ′′ (𝜃) 𝑗 𝑘 − 𝑎(𝜃)𝛿 𝑗 𝑘 2 𝑡 𝑗,𝑘=1
𝑛 2 ∑︁ 𝑧 𝑗 𝑘 𝑢 𝑡′′ (𝜃) 𝑗 𝑘 − 𝑎(𝜃)𝛿 𝑗 𝑘 = sup 𝑗,𝑘=1 𝑛 2 ∑︁ 𝑧 𝑗 𝑘 𝑋 𝑗 𝑋 𝑘 − 𝛿 𝑗 𝑘 e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ = 𝑡 4 sup E 𝑗,𝑘=1 𝑛 ∑︁ 2 𝑧 𝑗 𝑘 (𝑋 𝑗 𝑋 𝑘 − 𝛿 𝑗 𝑘 ) , ≤ 𝑡 4 sup E 𝑗,𝑘=1
Í where the supremum runs over all complex numbers 𝑧 𝑗 𝑘 such that 𝑛𝑗,𝑘=1 |𝑧 𝑗 𝑘 | 2 = 1. But, under this constraint (with complex coefficients), the last expectation does not exceed Λ, according to the hypothesis (13.6). Thus, 2 ∥𝑢 𝑡′′ (𝜃) − 𝑎(𝜃) I𝑛 ∥ HS ≤ Λ𝑡 4
(13.8)
for all 𝜃. On the other hand, by Proposition 13.3.3 in the improved form (13.4), and using the isotropy assumption, we have E 𝜃 ∥ (𝑎(𝜃) − 𝑎) I𝑛 ∥ 2HS = 𝑛𝑡 4 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤
1 6 𝑡 . 2
(13.9)
Combining the last two bounds on the Hilbert–Schmidt norm, we arrive at E 𝜃 ∥𝑢 𝑡′′ (𝜃) − 𝑎 I𝑛 ∥ 2HS ≤ 2Λ𝑡 4 + 𝑡 6 .
(13.10)
One can now involve the second order Poincaré-type inequality of Proposition 10.2.2, which yields 2 E 𝜃 ∥𝑢 𝑡′′ (𝜃) − 𝑎 I𝑛 ∥ 2HS 𝑛(𝑛 − 1) 2 (2Λ𝑡 4 + 𝑡 6 ). ≤ 𝑛(𝑛 − 1)
E 𝜃 | 𝑓𝑡 (𝜃) − 𝑓 (𝑡)| 2 ≤
(13.11)
250
13 Characteristic Functions of Weighted Sums
This inequality already produces the desired 𝑛1 -rate with respect to 𝑛, and in fact, one may control large deviations. In terms of the function 𝑢(𝜃) =
1 ( 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)), 2𝑡 2
𝜃 ∈ R𝑛 , 𝑡 ≠ 0,
the inequalities (13.7) and (13.10) take the form
1
′′
𝑢 (𝜃) + 𝑓 (𝑡) I𝑛 ≤ 1, 2
2 Λ 𝑡2 1
E 𝜃 𝑢 ′′ (𝜃) + 𝑓 (𝑡) I𝑛 ≤ + , HS 2 2 4
which means that the conditions of Proposition 10.4.2 are fulfilled for the real and imaginary parts of 𝑢 with parameter 𝑏 = 12 Λ + 14 𝑡 2 . Applying the exponential bound (10.10) to both real and imaginary parts, we then get ∥ Re( 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)) ∥ 𝜓1 ≤
4𝑡 2 (1 + 3𝑏), 𝑛−1
and similarly for Im( 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)). Hence, by the triangle inequality for the 𝜓1 -norm, 8𝑡 2 (1 + 3𝑏) 𝑛−1 2𝑡 2 2𝑡 2 = (4 + 6Λ + 3𝑡 2 ) ≤ (14 Λ + 3𝑡 2 ), 𝑛−1 𝑛−1
∥ 𝑓 𝜃 (𝑡) − 𝑓 (𝑡) ∥ 𝜓1 ≤
where we used that Λ ≥ 12 . As a result, we arrive at the following conclusion. Proposition 13.4.1 Let 𝑋 be an isotropic random vector in R𝑛 with a symmetric distribution and finite constant Λ. Then, the characteristic functions 𝑓 𝜃 (𝑡) = E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ satisfy, for any 𝑡 ≠ 0, E 𝜃 exp
n
o 𝑛−1 | 𝑓 (𝑡) − 𝑓 (𝑡)| ≤ 2. 𝜃 2𝑡 2 (14Λ + 3𝑡 2 )
However, in further applications it is desirable to improve the dependence on 𝑡 on the left-hand side of this exponential bound, at least, for not too large values of |𝑡|. With this aim, let us apply (13.11) in (13.9) which leads to the improved relation ∥ (𝑎(𝜃) − 𝑎) I𝑛 ∥ 2HS ≤
2 (2Λ𝑡 8 + 𝑡 10 ). 𝑛−1
Together with (13.8) this gives E 𝜃 ∥𝑢 𝑡′′ (𝜃) − 𝑎 I𝑛 ∥ 2HS ≤ 2Λ𝑡 4 +
4 (2Λ𝑡 8 + 𝑡 10 ). 𝑛−1
This quantity is bounded by 𝑐Λ𝑡 4 in the region |𝑡| ≤ 𝑛1/6 . But, one may enlarge it by repeating the argument. Indeed, using the above upper bound, one may apply Proposition 10.2.2 once more, which gives
13.5 Deviations in the Non-symmetric Case
E 𝜃 | 𝑓𝑡 (𝜃) − 𝑓 (𝑡)| 2 ≤
251
2 4 2Λ𝑡 4 + (2Λ𝑡 8 + 𝑡 10 ) . 𝑛(𝑛 − 1) 𝑛−1
Using this in (13.9), we get ∥ (𝑎(𝜃) − 𝑎) I𝑛 ∥ 2HS ≤
4 2 2Λ𝑡 8 + (2Λ𝑡 12 + 𝑡 14 ) , 𝑛−1 𝑛−1
which together with (13.8) yields E 𝜃 ∥𝑢 𝑡′′ (𝜃) − 𝑎 I𝑛 ∥ 2HS ≤ 2Λ𝑡 4 +
4 4 2Λ𝑡 8 + (2Λ𝑡 12 + 𝑡 14 ) . 𝑛−1 𝑛−1
The expression on the right-hand side is bounded by multiples of Λ𝑡 4 in larger regions such as |𝑡| ≤ 𝐴𝑛1/5 . By (13.7), the conditions of Proposition 10.4.2 are therefore fulfilled for the real and imaginary parts of the same function 𝑢 as above with parameter 𝑏 = 𝑐Λ/2. As a result, Proposition 10.4.2 yields: Proposition 13.4.2 Let 𝑋 be an isotropic random vector in R𝑛 with a symmetric distribution and finite constant Λ. Then, the characteristic functions 𝑓 𝜃 (𝑡) = E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ satisfy, for any 𝑡 ≠ 0, |𝑡| ≤ 𝐴𝑛1/5 , E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤
𝑐 Λ𝑡 4 𝑛2
with some constant 𝑐 > 0 depending on the parameter 𝐴 > 0 only. Moreover, n 𝑛 o | 𝑓 (𝑡) E 𝜃 exp − 𝑓 (𝑡)| ≤ 2. 𝜃 𝑐Λ𝑡 2 Continuing the process of repeated applications of (13.9), one can obtain Proposition 13.4.2 for the regions |𝑡| ≤ 𝐴𝑛 𝛼 with any fixed 𝛼 < 14 .
13.5 Deviations in the Non-symmetric Case We now drop the symmetry assumption and extend the bounds of Proposition 13.4.2 to the general case of an isotropic random vector 𝑋 in R𝑛 with a finite constant Λ as in (13.6). This is possible by involving an additional parameter, namely, the 𝐿 2 -norm of the linear part of characteristic functions 𝑢 𝑡 (𝜃) = 𝑓 𝜃 (𝑡) = E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ ,
𝜃 ∈ R𝑛 .
More precisely, consider an orthogonal decomposition 𝑢 𝑡 (𝜃) = 𝑓 (𝑡) + 𝑙 𝑡 (𝜃) + 𝑣𝑡 (𝜃), where 𝑙 𝑡 (𝜃) = 𝑎 1 (𝑡) 𝜃 1 + · · · + 𝑎 𝑛 (𝑡) 𝜃 𝑛 ,
𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ R𝑛 ,
(13.12)
252
13 Characteristic Functions of Weighted Sums
is the orthogonal projection of 𝑢 𝑡 in 𝐿 2 (R𝑛 , 𝔰𝑛−1 ) onto the space of all linear functions (the linear part), and 𝑣𝑡 (𝜃) = 𝑢 𝑡 (𝜃) − 𝑓 (𝑡) − 𝑙 𝑡 (𝜃) is the non-linear part of 𝑢 𝑡 . By the orthogonality, E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 = E 𝜃 |𝑙 𝑡 (𝜃)| 2 + E 𝜃 |𝑣𝑡 (𝜃)| 2 .
(13.13)
To bound the last 𝐿 2 -norm, first note that 𝑢 𝑡 and 𝑣𝑡 have equal second order partial derivatives on R𝑛 . Choosing 𝑎 = −𝑡 2 𝑓 (𝑡) and putting 𝑎(𝜃) = −𝑡 2 𝑓 𝜃 (𝑡) as before, the inequalities (13.7)–(13.8) do not change, so that, for all 𝜃 ∈ R𝑛 and 𝑡 ∈ R, ∥𝑣𝑡′′ (𝜃) − 𝑎 I𝑛 ∥ ≤ 2𝑡 2 ,
∥𝑣𝑡′′ (𝜃) − 𝑎(𝜃) I𝑛 ∥ 2HS ≤ Λ𝑡 4 .
(13.14)
On the other hand, by the isotropy assumption, we have |∇𝑢 𝑡 | ≤ |𝑡| (Lemma 13.3.1), so that, by the Poincaré inequality, E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤
𝑛 2 𝑡 ≤ 2𝑡 2 . 𝑛−1
(13.15)
Hence E 𝜃 ∥(𝑎(𝜃) − 𝑎) I𝑛 ∥ 2HS = 𝑛𝑡 4 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤ 2𝑡 6 .
(13.16)
Combining the last two bounds leads to E 𝜃 ∥𝑣𝑡′′ (𝜃) − 𝑎 I𝑛 ∥ 2HS ≤ 2Λ𝑡 4 + 4𝑡 6 , which, by Proposition 10.2.2, gives E 𝜃 |𝑣𝑡 (𝜃)| 2 ≤
2 2 E 𝜃 ∥𝑣𝑡′′ (𝜃) − 𝑎 I𝑛 ∥ 2HS ≤ (2Λ𝑡 4 + 4𝑡 6 ). 𝑛(𝑛 − 1) 𝑛(𝑛 − 1)
Next, we need to remove the last term 4𝑡 6 for the range |𝑡| ≤ 𝐴𝑛1/5 (by adding a smaller quantity). Using the above bound in (13.13), we have E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤ E 𝜃 |𝑙 𝑡 (𝜃)| 2 +
2 (2Λ𝑡 4 + 4𝑡 6 ), 𝑛(𝑛 − 1)
which, according to the identity in (13.16), gives E 𝜃 ∥ (𝑎(𝜃) − 𝑎) I𝑛 ∥ 2HS ≤ 𝑛𝑡 4 E 𝜃 |𝑙 𝑡 (𝜃)| 2 +
2 (2Λ𝑡 8 + 4𝑡 10 ). 𝑛−1
Combining this with the second inequality in (13.14), we get E 𝜃 ∥𝑣𝑡′′ (𝜃) − 𝑎 I𝑛 ∥ 2HS ≤ 2𝑛𝑡 4 E 𝜃 |𝑙 𝑡 (𝜃)| 2 + 2Λ𝑡 4 +
4 (2Λ𝑡 8 + 4𝑡 10 ). 𝑛−1
13.5 Deviations in the Non-symmetric Case
253
Hence, by Proposition 10.2.2 once more, 2 4 2𝑛𝑡 4 E 𝜃 |𝑙 𝑡 (𝜃)| 2 + 2Λ𝑡 4 + (2Λ𝑡 8 + 4𝑡 10 ) 𝑛(𝑛 − 1) 𝑛−1 4 4𝑡 16 4 = E 𝜃 |𝑙 𝑡 (𝜃)| 2 + Λ𝑡 4 + (Λ𝑡 8 + 2𝑡 10 ), 𝑛−1 𝑛(𝑛 − 1) 𝑛(𝑛 − 1) 2
E 𝜃 |𝑣𝑡 (𝜃)| 2 ≤
so that, by (13.13), 4𝑡 4 E 𝜃 |𝑙 𝑡 (𝜃)| 2 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤ 1 + 𝑛−1 16 4 Λ𝑡 4 + (Λ𝑡 8 + 2𝑡 10 ). + 𝑛(𝑛 − 1) 𝑛(𝑛 − 1) 2 According to the identity (13.16) once more, this gives 4𝑡 4 E 𝜃 |𝑙 𝑡 (𝜃)| 2 E 𝜃 ∥ (𝑎(𝜃) − 𝑎) I𝑛 ∥ 2HS ≤ 𝑛𝑡 4 1 + 𝑛−1 16 4 Λ𝑡 8 + (Λ𝑡 12 + 2𝑡 14 ). + 𝑛−1 (𝑛 − 1) 2 Combining this inequality with the second inequality in (13.14), we get 4𝑡 4 E 𝜃 |𝑙 𝑡 (𝜃)| 2 E 𝜃 ∥𝑣𝑡′′ (𝜃) − 𝑎 I𝑛 ∥ 2HS ≤ 2𝑛𝑡 4 1 + 𝑛−1 32 8 Λ𝑡 8 + (Λ𝑡 12 + 2𝑡 14 ). + 2Λ𝑡 4 + 𝑛−1 (𝑛 − 1) 2 Now, if |𝑡| ≤ 𝐴𝑛1/5 , the coefficient in front of E 𝜃 |𝑙 𝑡 (𝜃)| 2 does not exceed a multiple of 𝑛𝑡 4 . Similarly, in this region the last three terms can be bounded by Λ𝑡 4 up to a numerical factor. Hence, this bound simplifies to 𝑐 E 𝜃 ∥𝑣𝑡′′ (𝜃) − 𝑎 I𝑛 ∥ 2HS ≤ 𝑛𝑡 4 E 𝜃 |𝑙 𝑡 (𝜃)| 2 + Λ𝑡 4 for some constant 𝑐 > 0 depending on the parameter 𝐴, only. Once more by Proposition 10.2.2, it follows that 𝑐 E 𝜃 |𝑣𝑡 (𝜃)| 2 ≤ E 𝜃 |𝑙 𝑡 (𝜃)| 2 +
2 Λ𝑡 4 . 𝑛(𝑛 − 1)
It remains to use this inequality in (13.13) to get a similar upper bound 𝑐 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤ 2 E 𝜃 |𝑙 𝑡 (𝜃)| 2 +
2 Λ𝑡 4 . 𝑛(𝑛 − 1)
To get a stronger deviation inequality in terms of the Orlicz 𝜓1 -norm, note that, by (13.7), the conditions of Proposition 10.5.2 are fulfilled for the same function
254
13 Characteristic Functions of Weighted Sums
𝑢(𝜃) =
1 ( 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)) 2𝑡 2
as in the symmetric case, with − 12 𝑓 (𝑡) in place of 𝑎. Since (13.16) holds for 𝑢 𝑡 as well, this inequality may be rewritten as
2
1
𝑐 E 𝜃 𝑢 ′′ (𝜃) + 𝑓 (𝑡) I𝑛 ≤ 𝑛 ∥𝑙 𝑡 ∥ 2𝐿 2 + Λ. HS 2 Note that the linear part of 𝑢 is 10.5.2 applied to 𝑢 yields
1 𝑙. 2𝑡 2 𝑡
Hence, the inequality (13.14) of Proposition
1
1+Λ 1
+ ∥𝑙 𝑡 ∥ 2𝐿 2 + 2 ∥𝑙 𝑡 ∥ 𝐿 2 . 𝑐 2 ( 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)) ≤ 𝜓1 𝑛 2𝑡 2𝑡 Using once more Λ ≥ 12 , the above is simplified to 𝑐 ∥ 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)) ∥ 𝜓1 ≤
Λ𝑡 2 + ∥𝑙 𝑡 ∥ 𝐿 2 + ∥𝑙 𝑡 ∥ 2𝐿 2 𝑡 2 . 𝑛
Here, the last term on the right-hand side is dominated by the second to last term in the smaller interval |𝑡| ≤ 𝐴𝑛1/6 . Indeed, according to the 𝐿 2 -bound (13.15), 2 |𝑡| 3 ∥𝑙 𝑡 ∥ 𝐿 2 𝑡 2 ≤ ∥ 𝑓 𝜃 (𝑡) − 𝑓 (𝑡) ∥ 𝐿 2 𝑡 2 ≤ √ ≤ 2𝐴3 . 𝑛 Hence ∥𝑙 𝑡 ∥ 2𝐿 2 𝑡 2 ≤ 2𝐴3 ∥𝑙 𝑡 ∥ 𝐿 2 . As a result, we arrive at: Proposition 13.5.1 Given an isotropic random vector 𝑋 in R𝑛 with finite constant Λ, the characteristic functions 𝑓 𝜃 (𝑡) = E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ satisfy, for any |𝑡| ≤ 𝐴𝑛1/5 , 𝑐 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤ E 𝜃 |𝑙 𝑡 (𝜃)| 2 +
Λ𝑡 4 𝑛2
(13.17)
with a constant 𝑐 > 0 depending on 𝐴 > 0 only, where 𝑙 𝑡 (𝜃) is the linear part of 𝑓 𝜃 (𝑡) from the orthogonal decomposition (13.12). Moreover, in the interval |𝑡| ≤ 𝐴𝑛1/6 , 𝑐 ∥ 𝑓 𝜃 (𝑡) − 𝑓 (𝑡) ∥ 𝜓1 ≤ ∥𝑙 𝑡 ∥ 𝐿 2 +
Λ𝑡 2 . 𝑛
(13.18)
If the distribution of 𝑋 is symmetric about the origin, then 𝑙 𝑡 (𝜃) = 0, and we return in (13.17) to the Poincaré-type inequality of Proposition 13.4.2, while (13.18) leads to the large deviation bound. The linear part 𝑙 𝑡 (𝜃) is also vanishing when 𝑋 has mean zero and a constant √ Euclidean norm, i.e. when |𝑋 | = 𝑛 a.s. This will be discussed in the next section.
13.6 The Linear Part of Characteristic Functions
255
13.6 The Linear Part of Characteristic Functions In order to make the bounds (13.17)–(13.18) effective, we need to properly estimate the 𝐿 2 -norm of the linear part 𝑙 𝑡 (𝜃) of 𝑓 𝜃 (𝑡), that is, the integrals 𝐼 (𝑡) = ∥𝑙 𝑡 (𝜃) ∥ 2𝐿 2 (𝔰
𝑛−1 )
= 𝑛 E 𝜃 E 𝜃 ′ ⟨𝜃, 𝜃 ′⟩ 𝑓 𝜃 (𝑡) 𝑓¯𝜃 ′ (𝑡).
Thus, we are looking for conditions and formulas which yield bounds 𝐼 (𝑡) = 𝑂 ( 𝑛12 ). As an equivalent formula, write 𝐼 (𝑡) = 𝑛
𝑛 ∑︁
| E 𝜃 𝜃 𝑘 𝑓 𝜃 (𝑡)| 2 = 𝑛
𝑘=1
𝑛 ∑︁
h i ′ E E 𝜃 E 𝜃 ′ 𝜃 𝑘 𝜃 𝑘′ e𝑖𝑡 ⟨𝑋, 𝜃 ⟩−𝑖𝑡 ⟨𝑌 , 𝜃 ⟩ ,
𝑘=1
where 𝑌 is an independent copy of 𝑋. To compute the inner expectations, introduce the function √ 𝐾𝑛 (𝑡) = 𝐽𝑛 𝑡𝑛 , 𝑡 ≥ 0, where, as before, 𝐽𝑛 denotes the characteristic function of the first coordinate of a point on the unit sphere S𝑛−1 under 𝔰𝑛−1 . By the definition, E 𝜃 e𝑖 ⟨𝑣, 𝜃 ⟩ = 𝐽𝑛 (|𝑣|) = 𝐾𝑛
|𝑣| 2 , 𝑛
𝑣 = (𝑣1 , . . . , 𝑣𝑛 ) ∈ R𝑛 .
Differentiating this equality with respect to the variable 𝑣 𝑘 , we obtain that 𝑖 E 𝜃 𝜃 𝑘 e𝑖 ⟨𝑣, 𝜃 ⟩ =
2𝑣 𝑘 ′ |𝑣| 2 𝐾 . 𝑛 𝑛 𝑛
Let us multiply this by a similar equality −𝑖 E 𝜃 𝜃 𝑘 e−𝑖 ⟨𝑤, 𝜃 ⟩ =
2𝑤 𝑘 ′ |𝑤| 2 𝐾𝑛 , 𝑛 𝑛
and conclude that, for all 𝑣, 𝑤 ∈ R𝑛 , |𝑣| 2 |𝑤| 2 h i 4𝑣 𝑤 ′ 𝑘 𝑘 𝐾𝑛′ 𝐾𝑛′ . E 𝜃 E 𝜃 ′ 𝜃 𝑘 𝜃 𝑘′ e𝑖 ⟨𝑣, 𝜃 ⟩−𝑖 ⟨𝑤, 𝜃 ⟩ = 2 𝑛 𝑛 𝑛 Hence, summing over all 𝑘 ≤ 𝑛, we get 𝑛 ∑︁
2 2 h i 4 ⟨𝑣, 𝑤⟩ ′ ′ |𝑣| ′ |𝑤| 𝐾 𝐾 . E 𝜃 E 𝜃 ′ 𝜃 𝑘 𝜃 𝑘′ e𝑖 ⟨𝑣, 𝜃 ⟩−𝑖 ⟨𝑤, 𝜃 ⟩ = 𝑛 𝑛 𝑛 𝑛 𝑛2 𝑘=1
It remains to make the substitution 𝑣 = 𝑡 𝑋, 𝑤 = 𝑡𝑌 , and to take the expectation over (𝑋, 𝑌 ). Then we arrive at the following expression for the integrals 𝐼 (𝑡).
256
13 Characteristic Functions of Weighted Sums
Lemma 13.6.1 For all 𝑡 ∈ R, the characteristic function 𝑓 𝜃 (𝑡) as a function of 𝜃 on the sphere S𝑛−1 has a linear part whose squared 𝐿 2 (𝔰𝑛−1 )-norm is equal to 𝐼 (𝑡) =
𝑡 2 |𝑋 | 2 𝑡 2 |𝑌 | 2 4𝑡 2 E ⟨𝑋, 𝑌 ⟩ 𝐾𝑛′ 𝐾𝑛′ , 𝑛 𝑛 𝑛
√ 𝑡𝑛 , 𝑡 ≥ 0, and where 𝑌 is an independent copy of 𝑋. √ In particular, if |𝑋 | = 𝑛 a.s., then
where 𝐾𝑛 (𝑡) = 𝐽𝑛
𝐼 (𝑡) =
4𝑡 2 ′2 2 𝐾 (𝑡 ) E ⟨𝑋, 𝑌 ⟩ , 𝑛 𝑛
which is vanishing as long as 𝑋 has mean zero. Note that the property 𝐼 (𝑡) = 0 continues to hold for more general random vectors. In particular, this is the case when the conditional distribution of 𝑋 given that |𝑋 | = 𝑟 has mean zero for any 𝑟 > 0. Now, let us write down an asymptotic formula for the function 𝐾𝑛 and its derivative. We know from Corollary 11.3.3 that √ 𝑡 4 − 4𝑡 2 −𝑡 2 /2 d 𝐽𝑛 (𝑡 𝑛) = −𝑡 1 − e + 𝑂 𝑛−2 min(1, |𝑡| 3 ) . d𝑡 4𝑛 √ Since 𝐽𝑛 (𝑡 𝑛) = 𝐾𝑛 (𝑡 2 ), after differentiation we find that 2𝑡𝐾𝑛′ (𝑡 2 ) =
𝑡 4 − 4𝑡 2 −𝑡 2 /2 d 𝐾𝑛 (𝑡 2 ) = −𝑡 1 − e + 𝑂 𝑛−2 min(1, |𝑡| 3 ) . d𝑡 4𝑛
Changing the variable, we arrive at 𝐾𝑛′ (𝑡) = −
𝑡 2 − 4𝑡 −𝑡/2 1 1− e + 𝑂 𝑛−2 min(1, 𝑡) , 2 4𝑛
𝑡 ≥ 0.
From this 𝐾𝑛′ (𝑡)𝐾𝑛′ (𝑠) =
(𝑡 2 + 𝑠2 ) − 4(𝑡 + 𝑠) −(𝑡+𝑠)/2 1 1− e + 𝑂 𝑛−2 4 4𝑛
uniformly over all 𝑡, 𝑠 ≥ 0, so, 4𝐾𝑛′
𝑡 2 |𝑋 | 2 𝑛
𝐾𝑛′
𝑡 2 |𝑌 | 2
4 𝑡 4 ( |𝑋𝑛2| + = 1−
|𝑌 | 4 ) 𝑛2
𝑐 𝑛2
|𝑌 | 2 2 2 2 𝑛 ) − 𝑡 (|𝑋|2𝑛+|𝑌 | )
4𝑛
𝑛
with a remainder term satisfying |𝜀| ≤ If 𝑋 is isotropic, we get
2
− 4𝑡 2 ( |𝑋𝑛| +
e
up to some absolute constant 𝑐.
𝑡2 𝑐𝑡 2 E | ⟨𝑋, 𝑌 ⟩ | |𝜀| ≤ 3 𝑛 𝑛
√︃ 𝑐𝑡 2 E ⟨𝑋, 𝑌 ⟩ 2 = 5/2 . 𝑛
+𝜀
13.7 Remarks
257 2
2
But, even assuming that E |𝑋 | 2 = 𝑛, one may use E | ⟨𝑋, 𝑌 ⟩ | ≤ E |𝑋 | 2+ |𝑌 | = 𝑛, 2 which leads to a bound of order at most 𝑛𝑡 2 . Hence, from Lemma 13.6.1 we obtain: Lemma 13.6.2 Let 𝑋 be a random vector in R𝑛 such that E |𝑋 | 2 = 𝑛. For all 𝑡 ∈ R, the characteristic function 𝑓 𝜃 (𝑡) as a function of 𝜃 on the unit sphere has a linear part whose squared 𝐿 2 (𝔰𝑛−1 )-norm is represented as 𝑡4 𝑡2 𝐼 (𝑡) = E ⟨𝑋, 𝑌 ⟩ 1 − 𝑛
|𝑋 | 4 𝑛2
+
|𝑌 | 4 𝑛2
− 4𝑡 2 4𝑛
|𝑋 | 2 𝑛
+
|𝑌 | 2 𝑛
e−
𝑡 2 (|𝑋| 2 +|𝑌 | 2 ) 2𝑛
+ 𝑂 (𝑡 2 𝑛−2 ),
where 𝑌 is an independent copy of 𝑋. Moreover, if 𝑋 is isotropic, then the error term may be improved to 𝑂 (𝑡 2 𝑛−5/2 ). Let us apply this result to the inequalities of Proposition 13.5.1, still assuming that 𝑌 is an independent copy of 𝑋. Proposition 13.6.3 Let 𝑋 be an isotropic random vector in R𝑛 with finite constant Λ = Λ(𝑋). For any |𝑡| ≤ 𝐴𝑛1/5 , the characteristic functions 𝑓 𝜃 (𝑡) = E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ satisfy Λ𝑡 4 𝑐 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤ 2 + 𝐼 (𝑡) 𝑛 with some constant 𝑐 > 0 depending on the parameter 𝐴 > 0 only, where 𝐼 (𝑡) = with 𝑅 2 =
𝑡2 (𝑈 2 + 𝑉 2 ) 𝑡 4 − 8𝑅 2 𝑡 2 −𝑅2 𝑡 2 E ⟨𝑋, 𝑌 ⟩ 1 − e + 𝑂 𝑡 2 𝑛−5/2 𝑛 4𝑛
|𝑋 | 2 + |𝑌 | 2 2𝑛
and 𝑈 =
|𝑋 | 2 𝑛 ,𝑉
=
|𝑌 | 2 𝑛 .
Moreover, in the interval |𝑡| ≤ 𝐴𝑛1/6 ,
𝑐 ∥ 𝑓 𝜃 (𝑡) − 𝑓 (𝑡) ∥ 𝜓1 ≤
Λ𝑡 2 √︁ + 𝐼 (𝑡). 𝑛
13.7 Remarks First order concentration bounds on deviations of characteristic functions like the one of Proposition 13.3.2 were derived in [25]. Proposition 13.4.1 is proved in [39], and its extension to the non-linear case has been developed in [40].
Chapter 14
Fluctuations of Distributions
The main theme in this chapter is the various bounds on the deviations of the distribution functions 𝐹𝜃 (𝑥) from the typical distribution function 𝐹 (𝑥) and from the standard normal distribution function Φ(𝑥) at individual points 𝑥 and in several weak metrics, including the 𝐿 1 -distance (the Kantorovich metric), the Lévy and Kolmogorov distances.
14.1 The Kantorovich Transport Distance In dimension one (i.e., on the real line), the Kantorovich transport distance between two probability distributions, say 𝐹 and 𝐺, takes the simple form of the 𝐿 1 -distance ∫ ∞ |𝐹 (𝑥) − 𝐺 (𝑥)| d𝑥, 𝑊 (𝐹, 𝐺) = −∞
where we use the same symbols to denote the distribution functions associated with given measures. This quantity, which defines a metric in the space of all probability measures on the real line with finite first absolute moments, is closely related to the topology of weak convergence (cf. e.g. [174]). As before, given a random vector 𝑋 in R𝑛 (𝑛 ≥ 2), we denote by 𝐹𝜃 (𝑥) = P{⟨𝑋, 𝜃⟩ ≤ 𝑥},
𝑥 ∈ R,
the distribution functions of the weighted sums ⟨𝑋, 𝜃⟩ with 𝜃 ∈ S𝑛−1 and by 𝐹 (𝑥) = E 𝜃 𝐹𝜃 (𝑥) the average (typical) distribution function. In order to deal with the main Problem 12.1.2, we start with the Kantorovich distance for bounding possible fluctuations of 𝐹𝜃 around 𝐹 on average. As a main parameter of the underlying distribution of 𝑋, we use the moment functionals 𝑀 𝑝 = 𝑀 𝑝 (𝑋) = sup
E | ⟨𝑋, 𝜃⟩ | 𝑝
1/ 𝑝 ,
𝑝 ≥ 1.
𝜃 ∈S𝑛−1
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_14
259
260
14 Fluctuations of Distributions
Denote by 𝑞=
𝑝 𝑝−1
the conjugate power. Our aim is to prove the following statement quantifying and generalizing Theorem 12.1.1. Proposition 14.1.1 If 𝑝 > 1, then E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤ 7𝑞𝑀 𝑝
1 . (2𝑛) 1/(2𝑞)
(14.1)
For example, in the case 𝑝 = 2, we get 1
E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤ 14 𝑀2 (2𝑛) − 4 . By Chebyshev’s inequality, it follows that, for any 𝛿 > 0, 14 1 (2𝑛) − 4 . 𝔰𝑛−1 𝑊 (𝐹𝜃 , 𝐹) ≥ 𝑀2 𝛿 ≤ 𝛿 The expression on the right does not exceed 𝛿 as soon as 𝑛 ≥ 144 𝛿−8 , so that Theorem 12.1.1 holds true with 𝑛 𝛿 = [144 𝛿−8 ] + 1. For the proof of Proposition 14.1.1 we need one elementary lemma. Lemma 14.1.2 Given distribution functions 𝐹 and 𝐺, for all real numbers 𝑎 < 𝑏 and any integer 𝑁 ≥ 1, ∫ 𝑎
𝑏
𝑁 ∫ ∑︁ |𝐹 (𝑥) − 𝐺 (𝑥)| d𝑥 ≤ 𝑘=1
2(𝑏 − 𝑎) , (𝐹 (𝑥) − 𝐺 (𝑥)) d𝑥 + 𝑁 𝑎 𝑘−1 𝑎𝑘
where 𝑎 𝑘 = 𝑎 + (𝑏 − 𝑎 ) 𝑁𝑘 . Proof Denote by 𝐼 the collection of all indexes 𝑘 = 1, . . . , 𝑁 such that in the 𝑘-th interval Δ 𝑘 = (𝑎 𝑘−1 , 𝑎 𝑘 ) the function 𝜓(𝑥) = 𝐹 (𝑥) − 𝐺 (𝑥) does not change the sign. The remaining indexes form a complementary set 𝐽 ⊂ {1, . . . , 𝑁 }. Then ∫ ∫ |𝐹 (𝑥) − 𝐺 (𝑥)| d𝑥 = (𝐹 (𝑥) − 𝐺 (𝑥)) d𝑥 Δ𝑘
Δ𝑘
for all 𝑘 ∈ 𝐼. In the case 𝑘 ∈ 𝐽, we obviously have sup |𝜓(𝑥)| ≤ sup (𝜓(𝑥) − 𝜓(𝑦)) ≤ 𝐹 (Δ 𝑘 ) + 𝐺 (Δ 𝑘 ), 𝑥 ∈Δ 𝑘
𝑥,𝑦 ∈Δ 𝑘
where 𝐹 and 𝐺 are treated as probability measures. In this case, ∫ 𝑏−𝑎 . |𝐹 (𝑥) − 𝐺 (𝑥)| d𝑥 ≤ (𝐹 (Δ 𝑘 ) + 𝐺 (Δ 𝑘 ))|Δ 𝑘 |, |Δ 𝑘 | = 𝑁 Δ𝑘
14.1 The Kantorovich Transport Distance
261
Combining these estimates, we can bound the integral
∫
𝑏 𝑎
|𝐹 (𝑥) − 𝐺 (𝑥)| d𝑥 by
∑︁ ∑︁ ∫ (𝐹 (𝑥) − 𝐺 (𝑥)) d𝑥 + (𝐹 (Δ 𝑘 ) + 𝐺 (Δ 𝑘 )) |Δ 𝑘 | Δ𝑘
𝑘 ∈𝐼
𝑘 ∈𝐽
𝑁 ∫ 𝑁 ∑︁ 𝑏 − 𝑎 ∑︁ + 𝐺 − (𝑥) (𝐹 (𝑥)) d𝑥 (𝐹 (Δ 𝑘 ) + 𝐺 (Δ 𝑘 )). 𝑁 𝑘=1 Δ𝑘 𝑘=1
≤ The lemma is proved.
□
As a preliminary step towards Proposition 14.1.1, let us also derive: Lemma 14.1.3 If 𝑀 𝑝 is finite, then, for any bounded, Borel measurable function 𝑤 : R → R, ∫ ∞ 𝜋𝑀 ∫ ∞ 1/𝑞 𝑝 |𝑤(𝑥)| 𝑞 d𝐹 (𝑥) E𝜃 𝑤(𝑥) (𝐹𝜃 (𝑥) − 𝐹 (𝑥)) d𝑥 ≤ √ . (14.2) −∞ −∞ 2𝑛 In particular, for all 𝑎 < 𝑏, ∫ E𝜃
𝑎
𝑏
𝜋𝑀 𝑝 (𝐹 (𝑏) − 𝐹 (𝑎)) 1/𝑞 . (𝐹𝜃 (𝑥) − 𝐹 (𝑥)) d𝑥 ≤ √ 2𝑛
(14.3)
Proof Note that the left integral in (14.2) is well-defined and is finite, once 𝑀1 is finite (which is fulfilled under the 𝑝-th moment assumption 𝑀 𝑝 < ∞). Given a function 𝑔 : R → R, having a bounded continuous derivative 𝑔 ′, put ∫ ∞ ∫ ∞ ∫ ∞ 𝑢(𝜃) = 𝑔 d𝐹𝜃 − 𝑔 d𝐹 = E 𝑔(⟨𝑋, 𝜃⟩) − 𝑔 d𝐹, 𝜃 ∈ R𝑛 . −∞
−∞
−∞
This function is 𝐶 1 -smooth on R𝑛 , has 𝔰𝑛−1 -mean zero on the unit sphere, and we have the identity ⟨∇𝑢(𝜃), 𝑣⟩ = E ⟨𝑋, 𝑣⟩ 𝑔 ′ (⟨𝑋, 𝜃⟩). Using (E | ⟨𝑋, 𝑣⟩ | 𝑝 ) 1/ 𝑝 ≤ 𝑀 𝑝 for 𝑣 ∈ S𝑛−1 and applying Hölder’s inequality, we get 1/𝑞 | ⟨∇𝑢(𝜃), 𝑣⟩ | ≤ 𝑀 𝑝 E |𝑔 ′ (⟨𝑋, 𝜃⟩)| 𝑞 , which implies 1/𝑞 |∇𝑢(𝜃)| ≤ 𝑀 𝑝 E |𝑔 ′ (⟨𝑋, 𝜃⟩)| 𝑞 .
(14.4)
We now integrate this inequality over the unit sphere and apply the 𝐿 1 -Poincaré-type inequality of Proposition 9.5.1, which yields 𝜋 E 𝜃 |𝑢| ≤ √ E 𝜃 |∇𝑢|. 2𝑛
(14.5)
262
14 Fluctuations of Distributions
Since, by Hölder’s inequality, 1/𝑞 1/𝑞 E 𝜃 E |𝑔 ′ (⟨𝑋, 𝜃⟩)| 𝑞 ≤ E 𝜃 E |𝑔 ′ (⟨𝑋, 𝜃⟩)| 𝑞 1/𝑞 ∫ ∫ ∞ |𝑔 ′ | 𝑞 d𝐹𝜃 = E𝜃 =
∞
|𝑔 ′ | 𝑞 d𝐹
1/𝑞 ,
−∞
−∞
from (14.4)–(14.5) we obtain that ∫ ∫ ∞ 𝜋𝑀 𝑝 ∞ ′ 𝑞 1/𝑞 𝑔 d(𝐹𝜃 − 𝐹) ≤ √ E𝜃 |𝑔 | d𝐹 . −∞ −∞ 2𝑛 Integrating by parts on the left and replacing 𝑔 ′ with 𝑤, the above relation becomes ∫ ∞ 𝜋𝑀 ∫ ∞ 1/𝑞 𝑝 |𝑤| 𝑞 d𝐹 , E𝜃 𝑤(𝑥) (𝐹𝜃 (𝑥) − 𝐹 (𝑥)) d𝑥 ≤ √ −∞ −∞ 2𝑛 which is the required inequality (14.2) with an additional assumption that 𝑤 is continuous. But this can easily be relaxed to the condition that 𝑤 is measurable. The lemma is thus proved. □ Proof (of Proposition 14.1.1) Since the inequality (14.1) is homogeneous with respect to 𝑋, we may assume that 𝑀 𝑝 = 1. Applying (14.3) with the intervals Δ 𝑘 = (𝑎 𝑘−1 , 𝑎 𝑘 ) as in Lemma 14.1.2, we get ∫ 𝜋 E𝜃 (𝐹𝜃 (𝑥) − 𝐹 (𝑥)) d𝑥 ≤ √ 𝐹 (Δ 𝑘 ) 1/𝑞 . Δ𝑘 2𝑛 Therefore, by Lemma 14.1.2, for any integer 𝑁 ≥ 1, ∫
𝑏
|𝐹𝜃 (𝑥) − 𝐹 (𝑥)| d𝑥 ≤
E𝜃 𝑎
𝑁 𝜋 ∑︁ 2(𝑏 − 𝑎) +√ 𝐹 (Δ 𝑘 ) 1/𝑞 . 𝑁 2𝑛 𝑘=1
1/𝑞 Í𝑁 By Hölder’s inequality, the last sum does not exceed 𝑁 1/ 𝑝 ≤ 𝑁 1/ 𝑝 , 𝑘=1 𝐹 (Δ 𝑘 ) and we arrive at the estimate ∫ 𝑏 𝜋 2(𝑏 − 𝑎) + √ 𝑁 1/ 𝑝 . E𝜃 |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| d𝑥 ≤ 𝑁 𝑎 2𝑛 In particular, putting 𝑎 = −𝑏, 𝑏 > 0, we have ∫
𝑏
|𝐹𝜃 (𝑥) − 𝐹 (𝑥)| d𝑥 ≤
E𝜃 −𝑏
𝜋 4𝑏 + √ 𝑁 1/ 𝑝 . 𝑁 2𝑛
(14.6)
14.1 The Kantorovich Transport Distance
263
To extend this integral to the whole real line, we use the assumption 𝑀 𝑝 = 1, which, by Markov’s inequality, implies that 𝐹𝜃 {𝑦 : |𝑦| ≥ 𝑥} ≤ 𝑥 − 𝑝 for all 𝑥 > 0 together with a similar bound for 𝐹 after averaging over 𝜃. This gives ∫ ∫ 2𝑏 1− 𝑝 |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| d𝑥 ≤ |𝑥| − 𝑝 d𝑥 = . 𝑝−1 | 𝑥 | ≥𝑏 | 𝑥 | ≥𝑏 Combining this estimate with (14.6), we get ∫ ∞ 𝜋 2𝑏 1− 𝑝 4𝑏 + + √ 𝑁 1/ 𝑝 . E𝜃 |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| d𝑥 ≤ 𝑝 − 1 𝑁 −∞ 2𝑛 Here, the right-hand side is minimized at 𝑏 = ( 𝑁2 ) 1/ 𝑝 , for which 4𝑁 (1− 𝑝)/ 𝑝 4𝑁 (1− 𝑝)/ 𝑝 2𝑏 1− 𝑝 4𝑏 + = 1/ 𝑝 + = 𝑞 21+1/𝑞 𝑁 −1/𝑞 . 𝑝−1 𝑁 2 ( 𝑝 − 1) 21/ 𝑝 Thus, 𝜋 E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤ 𝑞 21+1/𝑞 𝑁 −1/𝑞 + √ 𝑁 1/ 𝑝 . (14.7) 2𝑛 Now, one should optimize the right-hand side of this inequality over all integers 𝑁 ≥ 1. Consider the function of the real variable 𝑥 > 0 𝜓(𝑥) = 𝛼𝑥 −1/𝑞 + 𝛽𝑥 1/ 𝑝 , It has a minimum at 𝑥0 = Note that 𝑥0 > For this value
2 𝜋
√
𝜋 𝛼 = 𝑞 21+1/𝑞 , 𝛽 = √ . 2𝑛
𝛼 𝑝 1 1+1/𝑞 √ = 2 𝑝 2𝑛. 𝛽 𝑞 𝜋
2𝑛 > 1. Therefore, choosing 𝑁 = [𝑥0 ] +1, we have 𝑥0 < 𝑁 ≤ 2𝑥 0 .
𝜓(𝑁) ≤ 𝛼 𝑥0−1/𝑞 + 𝛽 (2𝑥0 ) 1/ 𝑝 = 𝑐 𝑞 (2𝑛) −1/(2𝑞) with constant
2
𝑐 𝑞 = 𝜋 1/𝑞 2−1/𝑞 (2𝑝) 1/ 𝑝 (2 + 21/𝑞 (𝑞 − 1)). Hence, by (14.7), E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤ 𝑐 𝑞 (2𝑛) −1/(2𝑞) . It remains to bound the expression for 𝑐 𝑞 . First, one may use 2 + 21/𝑞 (𝑞 − 1) ≤ 2𝑞 2 and (2𝑝) 1/ 𝑝 ≤ e2/e , so that 𝑐 𝑞 ≤ 2e2/e 𝑢(𝑞)𝑞, where 𝑢(𝑞) = 𝜋 1/𝑞 2−1/𝑞 . This function attains its maximum at 𝑞 0 = (log 4)/(log 𝜋), which gives 𝑐 𝑞 ≤ 2e2/e 𝑢(𝑞 0 ) 𝑞 < 6.7 𝑞, and we arrive at the desired inequality (14.1). □
264
14 Fluctuations of Distributions
14.2 Large Deviations for the Kantorovich Distance Since Proposition 14.1.1 only deals with the average approximation of 𝐹𝜃 by 𝐹 by means of the Kantorovich distance 𝑊, one may wonder how to control deviations for most of the directions 𝜃. To this end one can employ again the first order concentration property on the sphere. We keep the same notations as in the previous section, 𝑝 assuming in particular that 𝑝 > 1 and 𝑞 = 𝑝−1 . Proposition 14.2.1 If 𝑀1 is finite, then for any 𝑟 > 0, 𝔰𝑛−1 {𝑊 (𝐹𝜃 , 𝐹) ≥ 𝑚 + 𝑟 } ≤ e−𝑛𝑟
2 /(4𝑀 2 ) 1
,
𝔰𝑛−1 {𝑊 (𝐹𝜃 , 𝐹) ≤ 𝑚 − 𝑟 } ≤ e−𝑛𝑟
2 /(4𝑀 2 ) 1
,
where 𝑚 = E 𝜃 𝑊 (𝐹𝜃 , 𝐹). In particular, for any 𝑝 ≥ 2,
E 𝜃 𝑊 (𝐹𝜃 , 𝐹)
𝑝
1/ 𝑝
√︁ 2𝑝 ≤ 𝑚 + 𝑀1 √ . 𝑛
Involving other 𝑀 𝑝 -functionals, one can eliminate the quantity 𝑚 in the bound for deviations above the mean. Proposition 14.2.2 If 𝑀 𝑝 is finite ( 𝑝 > 1), then for all 𝑟 ≥ 0, n o 2 𝔰𝑛−1 𝑊 (𝐹𝜃 , 𝐹) ≥ 7𝑞𝑀 𝑝 (2𝑛) −1/(2𝑞) + 𝑀 𝑝 𝑟 ≤ e−𝑛𝑟 /4 .
(14.8)
In particular, o n o n 1 𝔰𝑛−1 𝑊 (𝐹𝜃 , 𝐹) ≥ 8𝑞𝑀 𝑝 (2𝑛) −1/(2𝑞) ≤ exp − 𝑛1/ 𝑝 . 8
(14.9)
Thus, the distances 𝑊 (𝐹𝜃 , 𝐹) may exceed 𝐶𝑞𝑀 𝑝 (2𝑛) −1/(2𝑞) only on a very small part of the unit sphere (when the dimension 𝑛 is large). Note also that the bound (14.8) implies (14.1) up to an absolute factor (in place of the constant 7). Indeed, the non-negative random variable 𝜉 (𝜃) =
1 1 + 𝑊 (𝐹𝜃 , 𝐹) − 7𝑞 (2𝑛) − 2𝑞 𝑀𝑝
has subgaussian tails on (S𝑛−1 , 𝔰𝑛−1 ). More precisely, (14.8) implies that E 𝜃 𝜉 ≤ so that 1 E 𝑊 (𝐹𝜃 , 𝐹) ≤ 7𝑞 (2𝑛) −1/(2𝑞) + E 𝜃 𝜉 ≤ 𝑐 ′ 𝑞𝑛−1/(2𝑞) 𝑀𝑝
√𝑐 , 𝑛
for some positive absolute constants 𝑐 and 𝑐 ′. In this sense, Proposition 14.2.2 sharpens Proposition 14.1.1.
14.2 Large Deviations for the Kantorovich Distance
265
Proof (of Propositions 14.2.1 and 14.2.2) The distance function ∫ ∞ 𝑢(𝜃) = 𝑊 (𝐹𝜃 , 𝐹) = |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| d𝑥 −∞
is Lipschitz on Indeed, according to the classical Kantorovich–Rubinstein theorem, it admits the representation ∫ ∞ ∫ ∞ h∫ ∞ h i i 𝑢(𝜃) = sup 𝑔 d𝐹𝜃 − 𝑔 d𝐹 , 𝑔 d𝐹 = sup E 𝑔(⟨𝑋, 𝜃⟩) − R𝑛 .
−∞
−∞
−∞
where the supremum runs over all functions 𝑔 : R → R with ∥𝑔∥ Lip ≤ 1, cf. e.g. [80] (note that in dimension one as above this representation is elementary). It follows that, for all 𝜃, 𝜃 ′ ∈ R𝑛 , h i 𝑢(𝜃) − 𝑢(𝜃 ′) ≤ sup E 𝑔(⟨𝑋, 𝜃⟩) − E 𝑔(⟨𝑋, 𝜃 ′⟩) . But, using the Lipschitz property, we have |E 𝑔(⟨𝑋, 𝜃⟩) − E 𝑔(⟨𝑋, 𝜃 ′⟩)| ≤ E | ⟨𝑋, 𝜃⟩ − ⟨𝑋, 𝜃 ′⟩ | ≤ 𝑀1 |𝜃 − 𝜃 ′ |. Therefore, ∥𝑢∥ Lip ≤ 𝑀1 , so that, by Proposition 10.3.1, for all 𝑟 ≥ 0, 2 2 𝔰𝑛−1 𝑢 − E 𝜃 𝑢 ≥ 𝑟 ≤ e−𝑛𝑟 /(4𝑀1 ) , 2 2 𝔰𝑛−1 𝑢 − E 𝜃 𝑢 ≤ −𝑟 ≤ e−(𝑛𝑟 /(4𝑀1 ) , which yield the large deviation bounds of Proposition 14.2.1, as well as (14.8), by involving the bound (14.1) on the mean E 𝜃 𝑢 and noting that 𝑀1 ≤ 𝑀 𝑝 . The second bound (14.9) follows from the first one when applying (14.8) with 𝑟 = 𝑞 (2𝑛) −1/(2𝑞) . It remains to explain the moment inequality. Putting 𝜉 = (𝑊 (𝐹𝜃 , 𝐹) − 𝑚) + and using Γ(𝑥 + 1) ≤ 𝑥 𝑥 with 𝑥 = 𝑝/2 ≥ 1, we have ∫ ∞ ∫ ∞ 2 2 e−𝑛𝑟 /(4𝑀1 ) d𝑟 𝑝 𝔰𝑛−1 {𝜉 ≥ 𝑟 } d𝑟 𝑝 ≤ E𝜃 𝜉 𝑝 = 0 0 2𝑀 𝑝 𝑝 𝑝/2 𝑀1 √︁2𝑝 𝑝 2𝑀 𝑝 𝑝 1 1 Γ +1 ≤ √ = ≡ 𝐴. = √ √ 2 2 𝑛 𝑛 𝑛 Thus, ∥𝜉 ∥ 𝑝 = (E 𝜃 𝜉 𝑝 ) 1/ 𝑝 ≤ 𝐴. Since 𝑊 (𝐹𝜃 , 𝐹) ≤ 𝜉 + 𝑚, we conclude, by the triangle inequality, that ∥𝑊 (𝐹𝜃 , 𝐹) ∥ 𝑝 ≤ ∥𝜉 ∥ 𝑝 + 𝑚 ≤ 𝐴 + 𝑚. Both propositions are now proved. □ Note that for growing values of 𝑝 the rate of approximation of 𝐹𝜃 by 𝐹 in the Kantorovich distance 𝑊 as in (14.1) with respect to the growing dimension 𝑛 gets better and approaches the standard √1𝑛 -rate (like in the classical central limit theorem with i.i.d. summands). On the other hand, the moment functionals 𝑀 𝑝 grow as well. To balance both effects, we need to require an appropriate hypothesis on the rate
266
14 Fluctuations of Distributions
of growth of the moments as in the following statement, which provides a nearly standard rate of approximation. Proposition 14.2.3 Suppose that E e | ⟨𝑋, 𝜃 ⟩ |/𝜆 ≤ 2 for all 𝜃 ∈ S𝑛−1 with some 𝜆 > 0. Then log 𝑛 E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤ 𝑐𝜆 √ 𝑛 for some absolute constant 𝑐. Moreover, for all 𝑟 > 0, n o 2 log 𝑛 𝔰𝑛−1 𝑊 (𝐹𝜃 , 𝐹) ≥ 𝑐𝜆 √ + 𝜆𝑟 ≤ e−𝑛𝑟 /4 . 𝑛 The assumption about the moments is expressed here in terms of the Orlicz norms ∥ ⟨𝑋, 𝜃⟩ ∥ 𝜓 generated by the Young function 𝜓(𝑟) = e |𝑟 | − 1. The finiteness of this norm is equivalent to the property that the 𝐿 𝑝 -norms ∥ ⟨𝑋, 𝜃⟩ ∥ 𝑝 grow at worst at a linear rate as 𝑝 grows to infinity. Proof One may assume that 𝜆 = 1. In view of the bound 𝑟 𝑝 ≤ 𝑝 𝑝 e− 𝑝 e𝑟 (𝑟 ≥ 0), the moment hypothesis of the proposition implies that ∥ ⟨𝑋, 𝜃⟩ ∥ 𝑝𝑝 ≤ 𝑝 𝑝 e− 𝑝 E e | ⟨𝑋, 𝜃 ⟩ | ≤ 2𝑝 𝑝 e− 𝑝 , so that 𝑀1 ≤ 𝑀 𝑝 ≤ 2𝑝/e. Applying Proposition 14.1.1, we obtain that E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤
14 1 𝑝𝑞 (2𝑛) 1/(2 𝑝) √ . e 2𝑛
It is sufficient to choose √ here 𝑝 = 2 + log 𝑛, in which case the right-hand side will be bounded by (𝑐 log 𝑛)/ 𝑛. The second assertion of the proposition follows from the first bound of Proposition 14.2.1, by noting that E e | ⟨𝑋, 𝜃 ⟩ | ≥ 1 + E | ⟨𝑋, 𝜃⟩ |, which implies 𝑀1 ≤ 1. □
14.3 Pointwise Fluctuations Let 𝑋 be a random vector in R𝑛 (𝑛 ≥ 2). We now turn to other distances that give more information about possible fluctuations of the distribution functions 𝐹𝜃 (𝑥) = P{⟨𝑋, 𝜃⟩ ≤ 𝑥} about the typical distribution function 𝐹 (𝑥) = E 𝜃 𝐹𝜃 (𝑥). Here, we consider deviations for individual points 𝑥 ∈ R and use only the moment functional 𝑀2 = 𝑀2 (𝑋). As a natural approach to the problem, one may employ the supremum-convolution inequality on the unit sphere described in Corollary 9.4.3. It implies that o n E 𝜃 exp{𝑡𝑢(𝜃)} ≤ exp 𝑡 E 𝜃 𝑃 2𝑡 𝑢(𝜃) , 𝑡 > 0. (14.10) 𝑛
14.3 Pointwise Fluctuations
267
where 𝑢 may be an arbitrary 𝔰𝑛−1 -integrable function on R𝑛 , and h |𝜃 − 𝜃 ′ | 2 i 𝑃𝑠 𝑢(𝜃) = sup 𝑢(𝜃 ′) − , 2𝑠 𝜃 ′ ∈R𝑛
𝑠 > 0,
is the supremum-convolution operator with the quadratic cost function. We apply the inequality (14.10) to the functions of the form 𝑢(𝜃) = 𝐹𝜃 (𝑥) with fixed 𝑥 ∈ R. By Markov’s inequality, for any ℎ > 0, 𝐹𝜃 ′ (𝑥) = P{⟨𝑋, 𝜃 ′⟩ ≤ 𝑥} = P{⟨𝑋, 𝜃⟩ ≤ 𝑥 + ⟨𝑋, 𝜃 − 𝜃 ′⟩} ≤ P{⟨𝑋, 𝜃⟩ ≤ 𝑥 + ℎ} + P{⟨𝑋, 𝜃 − 𝜃 ′⟩ ≥ ℎ} ≤ 𝐹𝜃 (𝑥 + ℎ) +
𝑀22 ℎ2
|𝜃 − 𝜃 ′ | 2
and similarly 𝐹𝜃 ′ (𝑥) ≥ 𝐹𝜃 (𝑥 − ℎ) − Putting ℎ = 2𝑀2
√︃
𝑡 𝑛,
𝑀22 ℎ2
|𝜃 − 𝜃 ′ | 2 .
we then get
𝐹𝜃 ′ (𝑥) ≤ 𝐹𝜃 (𝑥 + ℎ) +
𝑛 |𝜃 − 𝜃 ′ | 2 , 4𝑡
𝐹𝜃 ′ (𝑥) ≥ 𝐹𝜃 (𝑥 − ℎ) −
𝑛 |𝜃 − 𝜃 ′ | 2 . 4𝑡
Hence, by the definition of the supremum-convolution operators, h i 𝑛 |𝜃 − 𝜃 ′ | 2 ≤ 𝐹𝜃 (𝑥 + ℎ), 𝑃 2𝑡 𝑢(𝜃) = sup 𝐹𝜃 ′ (𝑥) − 𝑛 4𝑡 𝜃 ′ ∈R𝑛 i h 𝑛 𝑃 2𝑡 (−𝑢) (𝜃) = sup − 𝐹𝜃 ′ (𝑥) − |𝜃 − 𝜃 ′ | 2 ≤ −𝐹𝜃 (𝑥 − ℎ). 𝑛 4𝑡 𝜃 ′ ∈R𝑛 Let us now rewrite (14.10) in the form o n E 𝜃 exp{𝑡 (𝑢 − E 𝜃 𝑢)} ≤ exp 𝑡 E 𝜃 𝑃 2𝑡 𝑢 − E 𝜃 𝑢 . 𝑛
Replacing 𝑢 with −𝑢, we also have n o E 𝜃 exp{𝑡 (E 𝜃 𝑢 − 𝑢)} ≤ exp 𝑡 E 𝜃 𝑃 2𝑡 (−𝑢) + E 𝜃 𝑢 . 𝑛
For 𝑢(𝜃) = 𝐹𝜃 (𝑥), we have E 𝜃 𝑢 = 𝐹 (𝑥), and similarly, by the above pointwise estimates on the sup-convolutions, E 𝜃 𝑃 2𝑡 𝑢(𝜃) ≤ 𝐹 (𝑥 + ℎ), 𝑛
E 𝜃 𝑃 2𝑛𝑡 (−𝑢) (𝜃) ≤ −𝐹 (𝑥 − ℎ).
Thus, we have obtained the following estimates on the exponential moments for deviations of 𝐹𝜃 (𝑥) around 𝐹 (𝑥).
268
14 Fluctuations of Distributions
Proposition 14.3.1 For any 𝑥 ∈ R and 𝑡 > 0, with ℎ = 2𝑀2
√︃
𝑡 𝑛−1 ,
E 𝜃 exp{𝑡 (𝐹𝜃 (𝑥) − 𝐹 (𝑥))} ≤ exp{𝑡 (𝐹 (𝑥 + ℎ) − 𝐹 (𝑥))}, E 𝜃 exp{𝑡 (𝐹 (𝑥) − 𝐹𝜃 (𝑥))} ≤ exp{𝑡 (𝐹 (𝑥) − 𝐹 (𝑥 − ℎ))}. The two bounds can be united in one relation: For any 𝑡 ∈ R, E𝜃 e
𝑡 (𝐹𝜃 ( 𝑥)−𝐹 ( 𝑥))
n
o
≤ exp |𝑡|(𝐹 (𝑥 + ℎ) − 𝐹 (𝑥 − ℎ)) ,
√︂ ℎ = 2𝑀2
|𝑡| . 𝑛
This already shows that the local behavior of the distribution function 𝐹 near a given point 𝑥 turns out to be responsible for the large deviation behavior at this point of the distribution function 𝐹𝜃 around its mean (as a function of 𝜃 ∈ S𝑛−1 ). For a quantitative statement, let us recall that 𝐹 represents the distribution of 𝜃 1 |𝑋 |, where 𝜃 1 is the first coordinate of the vector 𝜃 on the unit sphere viewed as a random variable independent of 𝑋. Clearly, 𝐹 is absolutely continuous with respect to the Lebesgue measure on the real line, symmetric about the origin√(as a measure) and unimodal. Denoting by 𝜑 𝑛 the density of the random variable 𝜃 1 𝑛, the density of 𝐹 may be written as √ 𝑥 √𝑛 𝑛 𝜑𝑛 , 𝑥 ∈ R. 𝑝(𝑥) = E |𝑋 | |𝑋 | Hence, 𝑝 is even, smooth and non-decreasing on (0, ∞).√Also, 𝐹 has a finite Lipschitz 𝑛 semi-norm ℓ = ∥𝐹 ∥ Lip if and only if the expectation E |𝑋 | is finite, and then √
𝑐0 E
√ 𝑛 𝑛 ≤ ℓ ≤ 𝑐1 E |𝑋 | |𝑋 |
for some absolute constants 𝑐 1 > 𝑐 0 > 0. In addition, using 𝐹 (𝑥 + ℎ) − 𝐹 (𝑥 − ℎ) ≤ 2ℓℎ, the exponential bound on deviations of 𝐹𝜃 (𝑥) yields n |𝑡| 3/2 o E 𝜃 e𝑡 (𝐹𝜃 ( 𝑥)−𝐹 ( 𝑥)) ≤ exp{2ℓℎ |𝑡|} = exp 4𝑀2 ℓ √ . 𝑛 Equivalently, making the substitution 𝑡 = 𝜉 (𝜃) =
𝑛1/3 (4𝑀2 ℓ) 2/3
𝜆, for the function
𝑛1/3 (𝐹𝜃 (𝑥) − 𝐹 (𝑥)) (4𝑀2 ℓ) 2/3
(viewed as a random variable on the sphere), we get a bound on the Laplace transform E 𝜃 exp{𝜆𝜉} ≤ exp{|𝜆| 3/2 },
𝜆 ∈ R.
14.4 The Lévy Distance
269
By Markov’s inequality, for any 𝑟 > 0, 𝔰𝑛−1 {𝜉 ≥ 𝑟} ≤ exp{𝜆3/2 − 𝜆𝑟} = exp{−4𝑟 3 /27}
for 𝜆 =
4 2 𝑟 . 9
Similarly, 𝔰𝑛−1 {𝜉 ≤ −𝑟 } ≤ exp{−4𝑟 3 /27}. Therefore, we arrive at: Corollary 14.3.2 Assume that the density of the typical distribution 𝐹 is bounded by a number ℓ. Then, for any 𝑥 ∈ R and 𝑟 > 0, 𝔰𝑛−1
n
o 𝑛1/3 |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| ≥ 𝑟 ≤ 2 exp{−4𝑟 3 /27}. 2/3 (4𝑀2 ℓ)
In particular, for some absolute constant 𝑐, E 𝜃 |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| ≤ 𝑐 (𝑀2 ℓ) 2/3
1 𝑛1/3
.
14.4 The Lévy Distance Recall that the Lévy distance 𝐿(𝐹𝜃 , 𝐹) between the distribution functions 𝐹𝜃 and 𝐹 is defined as the minimum over all ℎ ∈ [0, 1] such that the inequality 𝐹 (𝑥 − ℎ) − ℎ ≤ 𝐹𝜃 (𝑥) ≤ 𝐹 (𝑥 + ℎ) + ℎ holds for all 𝑥 ∈ R. In general, there is an elementary relation 𝐿 2 ≤ 𝑊 connecting it with the Kantorovich distance. Indeed, if ℎ > 𝐿(𝐹𝜃 , 𝐹), one can find 𝑥 such that 𝐹𝜃 (𝑥) > 𝐹 (𝑥 + ℎ) + ℎ
or
𝐹 (𝑥) > 𝐹𝜃 (𝑥 + ℎ) + ℎ.
In the first case (for definiteness), we then have ∫ 𝑥+ℎ ∫ 𝑊 (𝐹𝜃 , 𝐹) ≥ (𝐹𝜃 (𝑦) − 𝐹 (𝑦)) d𝑦 ≥ 𝑥
𝑥+ℎ
(𝐹𝜃 (𝑥) − 𝐹 (𝑥 + ℎ)) d𝑦 > ℎ2 .
𝑥
Letting ℎ → 𝐿 (𝐹𝜃 , 𝐹) yields the required inequality √︁ 𝐿 (𝐹𝜃 , 𝐹) ≤ 𝑊 (𝐹𝜃 , 𝐹). Now, using Proposition 14.2.1, we may therefore conclude that, for all 𝑟 ≥ 0, n √︁ 𝔰𝑛−1 𝐿(𝐹𝜃 , 𝐹) ≥ 7𝑞𝑀 𝑝
o √︁ 2 1 𝑀 𝑟 ≤ e−(𝑛−1)𝑟 /4 . + 𝑝 1/(4𝑞) (2𝑛)
270
14 Fluctuations of Distributions
In particular (which also follows from Proposition 14.1.1), E 𝐿 (𝐹𝜃 , 𝐹) ≤
√︁ 7𝑞𝑀 𝑝
1 . 𝑛1/(4𝑞)
With respect to 𝑛, the latter bound behaves worse than 𝑛−1/4 for any value of 𝑝 > 1. In fact, in order to reach this rate (modulo a logarithmically growing factor), only the functional 𝑀1 = 𝑀1 (𝑋) is needed. Proposition 14.4.1 For any random vector 𝑋 in R𝑛 with finite first absolute moment, E 𝜃 𝐿(𝐹𝜃 , 𝐹) ≤
𝑀1 + 2 log 𝑛 . 𝑛1/4
Proof Introduce the characteristic functions 𝑓 𝜃 (𝑡) = E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ and 𝑓 (𝑡) = E 𝜃 𝑓 𝜃 (𝑡) (𝑡 ∈ R), corresponding to the distribution functions 𝐹𝜃 of the weighted sums ⟨𝑋, 𝜃⟩ and the typical distribution 𝐹. We apply Zolotarev’s estimate from Proposition 3.3.1: For any 𝑇 > 0, 1 E 𝜃 𝐿 (𝐹𝜃 , 𝐹) ≤ 𝜋
∫
𝑇
−𝑇
E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 4 log(1 + 𝑇) d𝑡 + . |𝑡| 𝑇
Here, according to Proposition 13.3.3, E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| ≤
√ √2 𝑀1 𝑛−1
|𝑡|, so that
√ 4 log(1 + 𝑇) 2 2 𝑀1𝑇 + . E 𝜃 𝐿 (𝐹𝜃 , 𝐹) ≤ √ 𝜋 𝑇 𝑛−1 To bound the above right-hand side, choose 𝑇 = 𝑛1/4 for 𝑛 ≥ 16. Then 𝑇4 log(1+𝑇) < 2𝑛−1/4 log 𝑛 and
√ 2√ 2 𝑇 𝜋 𝑛−1
< 𝑛−1/4 , which implies the desired inequality. If 𝑛 < 16, it
holds automatically, since then
2 log 𝑛 𝑛1/4
> 1, while 𝐿 ≤ 1. Proposition 14.4.1 follows.□
The bound of Corollary 14.3.2 suggests, however, that, by involving the moment functional 𝑀2 , one can further improve the rate in the bound of Proposition 14.4.1. Indeed, this turns out to be possible by virtue of the supremum-convolution approach, as can be seen from the following: Proposition 14.4.2 For any 𝑟 > 0, n 𝔰𝑛−1 𝐿 (𝐹𝜃 , 𝐹) ≥ 𝑟 ≤ 12 𝑀2 𝑟 −3/2 + 1 exp −
o 𝑛 3 𝑟 . 32 𝑀22
(14.11)
In particular, for some absolute constant 𝑐, E 𝜃 𝐿 (𝐹𝜃 , 𝐹) ≤ 𝑐
log 𝑛 1/3 𝑛
𝑀22/3 .
(14.12)
14.4 The Lévy Distance
271
Proof We rewrite the exponential bounds of Proposition 14.3.1 as E 𝜃 e𝑡 (𝐹𝜃 ( 𝑥)−𝐹 ( 𝑥+ℎ)) ≤ 1, E 𝜃 e𝑡 (𝐹 ( 𝑥−ℎ)−𝐹𝜃 ( 𝑥)) ≤ 1, √︃ where ℎ = 2𝑀2 𝑛𝑡 with arbitrary 𝑡 > 0. By Markov’s inequality, for any 𝑟 ≥ 0, 𝔰𝑛−1 𝐹𝜃 (𝑥) − 𝐹 (𝑥 + ℎ) ≥ 𝑟/2 ≤ e−𝑡𝑟 , 𝔰𝑛−1 𝐹 (𝑥 − ℎ) − 𝐹𝜃 (𝑥) ≥ 𝑟/2 ≤ e−𝑡𝑟/2 . Let us recall that the typical distribution is symmetric about the origin and has a unimodal density (in particular, 𝐹 is continuous). Fix an integer 𝑁 ≥ 1 large enough so that 1 − 𝐹 (𝑁 ℎ) = 𝐹 (−𝑁 ℎ) ≤ 𝜀 for a given 𝜀 > 0. For the points 𝑥 𝑘 = 𝑘 ℎ, 𝑘 = 0, ±1, ±(𝑁 + 1), the above bounds yield n 𝑟o ≤ (2𝑁 + 3) e−𝑡𝑟/2 , 𝔰𝑛−1 max [𝐹𝜃 (𝑥 𝑘 ) − 𝐹 (𝑥 𝑘 + ℎ)] ≥ 2 |𝑘 | ≤𝑁 +1 n 𝑟o ≤ (2𝑁 + 3) e−𝑡𝑟/2 . 𝔰𝑛−1 max [𝐹 (𝑥 𝑘 − ℎ) − 𝐹𝜃 (𝑥 𝑘 )] ≥ 2 |𝑘 | ≤𝑁 +1 Hence, for the random variable n o 𝜉 𝑁 = max max [𝐹𝜃 (𝑥 𝑘 ) − 𝐹 (𝑥 𝑘 + ℎ)], max [𝐹 (𝑥 𝑘 − ℎ) − 𝐹𝜃 (𝑥 𝑘 )] |𝑘 | ≤𝑁 +1
|𝑘 | ≤𝑁 +1
we have 𝔰𝑛−1 {𝜉 𝑁 ≥ 𝑟/2} ≤ 2(2𝑁 + 3) e−𝑡𝑟/2 . Now, take any point 𝑥 ∈ [−(𝑁 + 1)ℎ, (𝑁 + 1)ℎ] different from all 𝑥 𝑘 ’s and choose 𝑘 such that 𝑥 𝑘−1 < 𝑥 < 𝑥 𝑘 (−𝑁 ≤ 𝑘 ≤ 𝑁 + 1). Then 𝐹𝜃 (𝑥) − 𝐹 (𝑥 + 2ℎ) ≤ 𝐹𝜃 (𝑥 𝑘 ) − 𝐹 (𝑥 𝑘−1 + 2ℎ) = 𝐹𝜃 (𝑥 𝑘 ) − 𝐹 (𝑥 𝑘 + ℎ) ≤ 𝜉 𝑁 , and similarly 𝐹 (𝑥 − 2ℎ) − 𝐹𝜃 (𝑥) ≤ 𝐹 (𝑥 𝑘 − 2ℎ) − 𝐹𝜃 (𝑥 𝑘−1 ) = 𝐹 (𝑥 𝑘−1 − ℎ) − 𝐹𝜃 (𝑥 𝑘−1 ) ≤ 𝜉 𝑁 . In addition, if 𝑥 > 𝑥 𝑁 +1 = (𝑁 + 1)ℎ, then 𝐹𝜃 (𝑥) − 𝐹 (𝑥 + 2ℎ) ≤ 1 − 𝐹 (𝑁 ℎ) ≤ 𝜀 and 𝐹 (𝑥 − 2ℎ) − 𝐹𝜃 (𝑥) ≤ 1 − 𝐹𝜃 ((𝑁 + 1)ℎ) = (1 − 𝐹 (𝑥 𝑁 )) + (𝐹 (𝑥 𝑁 +1 − ℎ) − 𝐹𝜃 (𝑥 𝑁 +1 )) ≤ 𝜉 𝑁 + 𝜀. Using similar bounds for 𝑥 < −(𝑁 + 1)ℎ, we conclude that the random variable n o 𝜉 = max sup [𝐹𝜃 (𝑥) − 𝐹 (𝑥 + 2ℎ)], sup [𝐹 (𝑥 − 2ℎ) − 𝐹𝜃 (𝑥)] 𝑥
𝑥
272
14 Fluctuations of Distributions
satisfies 𝜉 ≤ 𝜉 𝑁 + 𝜀. Hence, for all 𝑟 > 0, n o 𝑟 𝔰𝑛−1 𝜉 ≥ + 𝜀 ≤ 2(2𝑁 + 3) e−𝑡𝑟/2 . 2 Let us now choose a suitable 𝑁 by using the functional 𝑀2 . Since ∫ ∞ 𝑥 2 d𝐹𝜃 (𝑥) = E ⟨𝑋, 𝜃⟩ 2 ≤ 𝑀22 −∞
for all 𝜃 ∈ S𝑛−1 , the same is true for 𝐹 as well. But if 𝜂 is a random variable distributed according to 𝐹, then, by Markov’s inequality, P{|𝜂| ≥ 𝑁 ℎ} ≤
𝑀22 (𝑁 ℎ) 2
where the last inequality holds true, as long as 𝑁 ≥ such integer, so that 𝑁 ≤ 𝑀ℎ2 𝜀 −1/2 + 1. Inserting 𝑡 = for 𝜉, we then get
≤ 𝜀, 𝑀2 −1/2 . Let 𝑁 be the smallest ℎ 𝜀 𝑛 2 into the deviation bound ℎ 2 4𝑀2
o 4𝑀 n n o 𝑟 𝑛 2 −1/2 𝜀 + 10 exp − 𝑟 ℎ2 . 𝔰𝑛−1 𝜉 ≥ + 𝜀 ≤ 2 2 ℎ 8𝑀2 Now we choose 𝜀 =
1 2
𝑟, in which case the above inequality becomes
𝔰𝑛−1 {𝜉 ≥ 𝑟 } ≤
4𝑀
2
n √ 𝑟 −1/2 2 + 10 exp −
ℎ
o 𝑛 2 𝑟 ℎ . 8𝑀22
Let us replace here ℎ with ℎ/2 and formulate this inequality once more in terms of the functional n o 𝐿 ℎ (𝐹𝜃 , 𝐹) = max sup [𝐹𝜃 (𝑥) − 𝐹 (𝑥 + ℎ)], sup [𝐹 (𝑥 − ℎ) − 𝐹𝜃 (𝑥)] . 𝑥
𝑥
√ Let us also use 8 2 < 12 to simplify the constants. Then, for any 𝑟, ℎ > 0, we have n o 𝑀 n o 𝑛 2 −1/2 2 𝑟 + 1 exp − 𝑟 𝔰𝑛−1 𝐿 ℎ (𝐹𝜃 , 𝐹) ≥ 𝑟 ≤ 12 ℎ . ℎ 32 𝑀22 It is interesting that the probabilities on the right-hand side decay exponentially fast with respect to 𝑛. In order to relate this inequality to the Lévy distance, we recall that 𝐿(𝐹𝜃 , 𝐹) ≤ ℎ ⇐⇒ 𝐿 ℎ (𝐹𝜃 , 𝐹) ≤ ℎ. Hence, in the case ℎ = 𝑟, the above yields the required inequality (14.11). Now, rewrite this inequality in terms of the function 𝜂 = 𝑀2−2/3 𝐿(𝐹𝜃 , 𝐹) as n 𝑛 o 𝑟3 . 𝔰𝑛−1 {𝜂 ≥ 𝑟 } ≤ 12 𝑟 −3/2 + 1 exp − 32
14.4 The Lévy Distance
If 𝑟 ≥ 6𝑎 𝑛 , 𝑎 𝑛 = (
273
log 𝑛 1/3 , 𝑛 )
𝑛 3 𝑟 }≤ it follows that exp{− 32
𝑟 −3/2 + 1 ≤ (6𝑎 𝑛 ) −3/2 + 1 = Hence ∫
∞
∫ 𝔰𝑛−1 {𝜂 ≥ 𝑟 } d𝑟 ≤
6𝑎𝑛
63/2
1 𝑛3
1 3 exp{− 64 𝑟 }, while
√ 𝑛1/2 + 1 < 𝑛. 1/2 (log 𝑛)
∞
𝑛 3 12 12 𝑟 −3/2 + 1 e− 32 𝑟 d𝑟 ≤ 2 𝑛 6𝑎𝑛
∫
∞
1
3
e− 64 𝑟 d𝑟.
0
This shows that E𝜂 ≤ 𝑐𝑎 𝑛 for some absolute constant 𝑐. Proposition 14.4.2 is proved. □ 𝑟 log 𝑛 1/3 , we get a somewhat simpler estiSubstituting in (14.11) 𝑟 with 𝑐𝑀22/3 𝑛 mate for large deviations. Corollary 14.4.3 With some absolute constant 𝑐 > 0, for any 𝑟 ≥ 1, o n log 𝑛 1/3 𝑀22/3 ≤ 𝑛−𝑟 . 𝔰𝑛−1 𝐿(𝐹𝜃 , 𝐹) ≥ 𝑐𝑟 𝑛 The dependence on 𝑟 in the exponent on the right-hand side of (14.11) can be sharpened by involving the functionals 𝑀 𝑝 with 𝑝 > 2. This may be achieved using the Lipschitz property as in the following: Lemma 14.4.4 For all 𝜃, 𝜃 ′ ∈ R𝑛 and 𝑝 ≥ 1, 𝑝
𝑝
𝐿(𝐹𝜃 , 𝐹𝜃 ′ ) ≤ 𝑀 𝑝𝑝+1 |𝜃 − 𝜃 ′ | 𝑝+1 . In particular, for any distribution function 𝐺, 𝑝
𝑝
|𝐿(𝐹𝜃 , 𝐺) − 𝐿 (𝐹𝜃 ′ , 𝐺)| ≤ 𝑀 𝑝𝑝+1 |𝜃 − 𝜃 ′ | 𝑝+1 . Thus, the distance functions 𝜃 → 𝐿 (𝐹𝜃 , 𝐺) belong to the class Lip(𝛼) on the unit 𝑝 and have Lip(𝛼)-seminorms at most 𝑀 𝑝𝑝/( 𝑝+1) . sphere for 𝛼 = 𝑝+1 Proof Let 𝑟 = |𝜃 − 𝜃 ′ | > 0 and put 𝑣 =
1 𝑟
(𝜃 − 𝜃 ′). By Markov’s inequality,
P{| ⟨𝑋, 𝜃⟩ − ⟨𝑋, 𝜃 ′⟩ | > 𝑟𝑡} = P{| ⟨𝑋, 𝑣⟩ | > 𝑡} ≤
𝑀 𝑝𝑝 𝑡𝑝
for any 𝑡 > 0. Hence, for any 𝑥 ∈ R, 𝐹𝜃 ′ (𝑥) = P{⟨𝑋, 𝜃 ′⟩ ≤ 𝑥} = P{⟨𝑋, 𝜃⟩ ≤ 𝑥 + (⟨𝑋, 𝜃⟩ − ⟨𝑋, 𝜃 ′⟩)} ≤ P{⟨𝑋, 𝜃⟩ ≤ 𝑥 + 𝑟𝑡} + P{⟨𝑋, 𝜃⟩ − ⟨𝑋, 𝜃 ′⟩ > 𝑟𝑡} ≤ 𝐹𝜃 (𝑥 + 𝑟𝑡) +
𝑀 𝑝𝑝 . 𝑡𝑝
274
14 Fluctuations of Distributions 𝑝
Choose 𝑡 = 𝑀 𝑝𝑝/( 𝑝+1) 𝑟 −1/( 𝑝+1) so that for ℎ = 𝑟𝑡 =
𝑀𝑝 𝑡𝑝
, we obtain that
𝐹𝜃 ′ (𝑥) ≤ 𝐹𝜃 (𝑥 + ℎ) + ℎ with arbitrary 𝑥. This implies that 𝐿(𝐹𝜃 , 𝐹𝜃 ′ ) ≤ 𝑟𝑡, which is the first assertion. The second assertion follows from the triangle inequality |𝐿 (𝐹𝜃 , 𝐺) − 𝐿(𝐹𝜃 ′ , 𝐺)| ≤ 𝐿 (𝐹𝜃 , 𝐹𝜃 ′ ), thus proving the lemma.
□
We can now combine Proposition 10.3.4 about the deviations of Lip(𝛼) functions and Lemma 14.4.4, where we choose 𝐺 = 𝐹. Proposition 14.4.5 If 𝑀 𝑝 is finite ( 𝑝 ≥ 1), then for any 𝑟 > 0, 𝔰𝑛−1
n
𝑟 2( 𝑝+1)/ 𝑝 . |𝐿 (𝐹𝜃 , 𝐹) − E 𝜃 𝐿 (𝐹𝜃 , 𝐹)| ≥ 𝑟 ≤ 2 exp − (𝑛 − 1) 2𝑀 𝑝2 o
The particular case 𝑝 = 2 leads to the following estimate, which is consistent with (14.11): n o 𝑟3 . 𝔰𝑛−1 |𝐿(𝐹𝜃 , 𝐹) − E 𝜃 𝐿(𝐹𝜃 , 𝐹)| ≥ 𝑟 ≤ 2 exp − (𝑛 − 1) 2𝑀22
14.5 Berry–Esseen-type Bounds The moment functionals 𝑀 𝑝 = 𝑀 𝑝 (𝑋) may also be used to control the Kolmogorov distance 𝜌(𝐹𝜃 , 𝐹) = sup |𝐹𝜃 (𝑥) − 𝐹 (𝑥)|. 𝑥
However, in general it is stronger than the Lévy distance, and the associated topology is stronger than the weak topology in the space of probability distributions on the line. So, in order to get good bounds on 𝜌(𝐹𝜃 , 𝐹), we will need to involve some other characteristics of the underlying distribution of the random vector 𝑋. We are mostly interested in bounding the first and the second moments, E 𝜃 𝜌(𝐹𝜃 , 𝐹) and E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹), in the sense of the uniform distribution 𝔰𝑛−1 on the unit sphere S𝑛−1 . Our approach to the main Problem 12.1.2 will be based on Fourier analysis, namely, on general Berry–Esseen-type bounds, which allow one to transfer bounds on the characteristic functions to bounds on the Kolmogorov distance. First let us state separately one general relation which is adapted to the characteristic functions of linear forms of the random vector 𝑋 in R𝑛 , ∫ ∞ 𝑖𝑡 ⟨𝑋, 𝜃 ⟩ 𝑓 𝜃 (𝑡) = E e = e𝑖𝑡 𝑥 d𝐹𝜃 (𝑥), 𝑡 ∈ R, 𝜃 ∈ S𝑛−1 . −∞
14.5 Berry–Esseen-type Bounds
275
As before, ∫
∞
e𝑖𝑡 𝑥 d𝐹 (𝑥)
𝑓 (𝑡) = E 𝜃 𝑓 𝜃 (𝑡) = −∞
denotes the characteristic function of the typical distribution 𝐹 = E 𝜃 𝐹. Lemma 14.5.1 Given a random vector 𝑋 in R𝑛 , for all 𝑇 ≥ 𝑇0 > 0, ∫
𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤
𝑇0
E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 𝑡 0 ∫ 𝑇 ∫ 𝑇 E 𝜃 | 𝑓 𝜃 (𝑡)| 1 + d𝑡 + | 𝑓 (𝑡)| d𝑡, 𝑡 𝑇 0 𝑇0
(14.13)
where 𝑐 > 0 is an absolute constant. Lemma 14.5.2 Moreover, if 𝑇 ≥ 𝑇0 ≥ 1, then 1
E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 d𝑡 𝑡2 0 ∫ 𝑇0 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 + log 𝑇 d𝑡 𝑡 0 ∫ 𝑇 2 ∫ 𝑇 E 𝜃 | 𝑓 𝜃 (𝑡)| 2 1 + log 𝑇 d𝑡 + 2 | 𝑓 (𝑡)| d𝑡 . 𝑡 𝑇 𝑇0 0 ∫
𝑐 E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹) ≤
(14.14)
Proof By Proposition 3.2.3, for any 𝜃 ∈ S𝑛−1 , ∫
𝑇
𝑐 𝜌(𝐹𝜃 , 𝐹) ≤ 0
| 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 1 d𝑡 + 𝑡 𝑇
∫
𝑇
| 𝑓 (𝑡)| d𝑡,
(14.15)
0
where one may take 𝑐 = 1/𝜋 3 . After taking the expectation of both sides, we get ∫ 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ 0
𝑇
1 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 + 𝑡 𝑇
∫
𝑇
| 𝑓 (𝑡)| d𝑡. 0
In further estimations, when 𝑇 is chosen to be large, it makes sense to divide the second to last integral into two intervals, [0, 𝑇0 ] and [𝑇0 , 𝑇] with 𝑇0 being much smaller than 𝑇. In that case, in the integral over the second interval one may use the bounds E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| ≤ E 𝜃 | 𝑓 𝜃 (𝑡)| + | 𝑓 (𝑡)| ≤ 2 E 𝜃 | 𝑓 𝜃 (𝑡)|, which leads to (14.13). By a similar argument, one may estimate the 𝐿 2 -norm of the Kolmogorov distance. Squaring (14.15), we get 𝑐2 2 𝜌 (𝐹𝜃 , 𝐹) ≤ 2
∫ 0
𝑇
| 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 𝑡
2
1 + 2 𝑇
∫
𝑇
2 | 𝑓 (𝑡)| d𝑡 .
0
276
14 Fluctuations of Distributions
Assuming that 𝑇 > 1, we split integration in the first integral into the intervals [0, 1] and [1, 𝑇]. For the first interval, by Cauchy’s inequality, ∫
1
0
| 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 d𝑡 ≤ 𝑡
1
∫ 0
| 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 d𝑡. 𝑡2
To treat the second interval, consider the probability measure 𝜇 on [1, 𝑇] with density d𝜇 (𝑡) 1 d𝑡 = 𝑡 log 𝑇 . Then, by Cauchy’s inequality, ∫ 1
𝑇
| 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 = log 𝑇 𝑡
∫
𝑇
| 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝜇(𝑡) 1
∫
𝑇
| 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 d𝜇(𝑡)
≤ log 𝑇
1/2
1
∫
𝑇
= log 𝑇 1
| 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 d𝑡 𝑡
1/2 .
Hence, for some absolute constant 𝑐 > 0, ∫ 1 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 2 d𝑡 𝑐 𝜌 (𝐹𝜃 , 𝐹) ≤ 𝑡2 0 ∫ 𝑇 2 ∫ 𝑇 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 1 + log 𝑇 d𝑡 + 2 | 𝑓 (𝑡)| d𝑡 . 𝑡 𝑇 1 0 Without any essential loss, one may extend integration in the second integral to the larger interval [0, 𝑇]. Moreover, taking the expectation over 𝜃, we then get 𝑐 E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹) ≤
1
E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 d𝑡 𝑡2 0 ∫ 𝑇 2 ∫ 𝑇 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 1 + log 𝑇 d𝑡 + 2 | 𝑓 (𝑡)| d𝑡 . 𝑡 𝑇 0 0
∫
Again, one may split the integration in the second integral into two intervals [0, 𝑇0 ] and [𝑇0 , 𝑇] with 1 ≤ 𝑇0 ≤ 𝑇, so as to consider separately sufficiently large values of 𝑡 for which | 𝑓 𝜃 (𝑡)| is small enough (with high probability). More precisely, since 𝑓 (𝑡) = E 𝜃 𝑓 𝜃 (𝑡)
and
| 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤ 2 | 𝑓 𝜃 (𝑡)| 2 + 2 | 𝑓 (𝑡)| 2 ,
we have E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤ 4 E 𝜃 | 𝑓 𝜃 (𝑡)| 2 . This leads to the bound (14.14).
□
One may simplify the bounds of Lemmas 14.5.1–14.5.2 by involving 𝑀 𝑝 together with the moment and normalized variance-type functionals 1 𝑚 𝑝 = √ (E | ⟨𝑋, 𝑌 ⟩ | 𝑝 ) 1/ 𝑝 , 𝑛
2 𝑝 1/ 𝑝 |𝑋 | − 1 = 𝑛 E 𝑛 √
𝜎2 𝑝
14.5 Berry–Esseen-type Bounds
277
(where 𝑌 is an independent copy of 𝑋). Recall that 𝑚 𝑝 ≤ 𝑀 𝑝2 (Proposition 1.4.2). In addition, if E |𝑋 | 2 = 𝑛, then, by Proposition 1.1.3, for any 𝑝 ≥ 2, 𝑚 𝑝 ≥ 𝑚2 ≥
1 E |𝑋 | 2 = 1. 𝑛
From Lemma 14.5.1 we obtain: Lemma 14.5.3 Let 𝑋 be a random vector in R𝑛 with finite moment of order 2𝑝 ( 𝑝 ≥ 1), such that E |𝑋 | 2 = 𝑛. Then with some constant 𝑐 𝑝 depending on 𝑝 only, for all 𝑇 ≥ 𝑇0 ≥ 1, ∫ 𝑇0 d𝑡 𝑐 𝑝 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 𝑡 0 𝑚 2𝑝𝑝 + 𝜎2𝑝𝑝 2 𝑇 1 + 1 + log + + e−𝑇0 /16 . 𝑝/2 𝑇 𝑇 𝑛 0 Proof According to Lemma 13.1.3, for all 𝑇 > 0, 𝑐𝑝 𝑇
∫
𝑇
| 𝑓 (𝑡)| d𝑡 ≤ 0
1 + 𝜎2𝑝𝑝 𝑛 𝑝/2
+
1 . 𝑇
This allows us to estimate the last integral in (14.13). Moreover, by Lemma 13.1.2, ∫
𝑇
𝑐𝑝 𝑇0
E 𝜃 | 𝑓 𝜃 (𝑡)| d𝑡 ≤ 𝑡 ≤
∫
𝑇
𝑇0 𝑚 2𝑝𝑝
𝑚𝑝 + 𝜎𝑝 2𝑝 2𝑝 𝑛 𝑝/2
+ 𝜎2𝑝𝑝 𝑛 𝑝/2
log
2
+ e−𝑇0 /16
d𝑡 𝑡
𝑇 8 −𝑇 2 /16 + e 0 𝑇0 𝑇02
for the second to last integral in (14.13). Since 𝑚 2 ≥ 1 and 𝑇0 ≥ 1, Lemma 14.5.3 follows. □ In the spirit of Berry–Esseen-type bounds, one may also develop lower bounds. In particular, we have the follow general estimate. Lemma 14.5.4 Given a function 𝐺 of bounded variation such that 𝐺 (−∞) = 0 and 𝐺 (∞) = 1, for any 𝑇 > 0, ∫ 𝑇 1 𝑡 E𝜃 ( 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)) 1 − d𝑡 . E 𝜃 sup |𝐹𝜃 (𝑥) − 𝐺 (𝑥)| ≥ 6𝑇 𝑇 𝑥 0 Proof We need the following assertion: Given a complex-valued random variable 𝜉 with finite first absolute moment, we have E |𝜉 − 𝑏| ≥ 12 E |𝜉 − E𝜉 | for any complex number 𝑏. Indeed, by the triangle inequality, |𝜉 − E𝜉 | ≤ |𝜉 − 𝑏| + |𝑏 − E𝜉 | ≤ |𝜉 − 𝑏| + E |𝑏 − 𝜉 |.
278
14 Fluctuations of Distributions
It remains to take the expectations of both sides. We use this inequality with ∫ 𝜉= 0
𝑇
𝑡 𝑓 𝜃 (𝑡) 1 − d𝑡, 𝑇
∫
𝑇
𝑏= 0
𝑡 d𝑡, 𝑔(𝑡) 1 − 𝑇
where 𝑔 is the Fourier–Stieltjes transform of 𝐺. Applying the bound (3.16) of Proposition 3.4.1 on the probability space (S𝑛−1 , 𝔰𝑛−1 ), we then get ∫ 𝑇 1 𝑡 E𝜃 ( 𝑓 𝜃 (𝑡) − 𝑔(𝑡)) 1 − d𝑡 E 𝜃 sup |𝐹𝜃 (𝑥) − 𝐺 (𝑥)| ≥ 3𝑇 𝑇 𝑥 0 ∫ 𝑇 1 𝑡 ≥ E𝜃 ( 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)) 1 − d𝑡 . 6𝑇 𝑇 0 Lemma 14.5.4 is proved.
□
14.6 Preliminary Bounds on the Kolmogorov Distance √ One can use the bounds of Lemmas 14.5.1–14.5.2 to obtain the rate 1/ 𝑛 modulo a logarithmic factor. In addition to the moment functionals 𝑀 𝑝 = 𝑀 𝑝 (𝑋), we involve the variance-type functional 1 𝜎2 = 𝜎2 (𝑋) = √ E |𝑋 | 2 − 𝑛 . 𝑛 Proposition 14.6.1 If a random vector 𝑋 in R𝑛 has finite first moment 𝑀1 , then for some absolute constant 𝑐 > 0 we have 𝑐 E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹) ≤
𝑀12 (log 𝑛) 2 1 + P |𝑋 − 𝑌 | 2 ≤ 𝑛/4 (log 𝑛) 2 + 2 , 𝑛 𝑛
where 𝑌 is an independent copy of 𝑋. As a consequence, if 𝑋 has finite 2-nd moment 𝑀2 and E |𝑋 | 2 = 𝑛, then 𝑐 E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹) ≤ (𝑀24 + 𝜎22 )
(log 𝑛) 2 . 𝑛
A similar bound also holds for the normal distribution function Φ in place of 𝐹. √︁ Proof We apply Lemma 14.5.2 with 𝑇0 = 5 log 𝑛 and 𝑇 = 5𝑛. The first two integrals in (14.14) can be bounded by using the spherical Poincaré inequality, i.e., using the first bound of Proposition 13.3.3, E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤
2𝑀12 2 𝑡 . 𝑛−1
14.6 Preliminary Bounds on the Kolmogorov Distance
279
This gives 1
∫ 0
∫ 𝑇0 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 d𝑡 + log 𝑇 d𝑡 𝑡 𝑡2 0 2𝑀12 25 ≤ 1+ (log(5𝑛)) 2 . 𝑛−1 2
To bound the third and fourth integrals in (14.14), we involve Lemma 13.1.1 with 𝜆 = 1/4, which gives 2 1 E 𝜃 | 𝑓 𝜃 (𝑡)| 2 ≤ e−𝑡 /8 + e−𝑛/12 + P{|𝑋 − 𝑌 | 2 ≤ 𝑛/4}. 3
This implies that, for some constant 𝐶, 1 𝑇
∫
𝑇
| 𝑓 (𝑡)| d𝑡 ≤ 0
1 𝑇
≤𝐶
∫
𝑇
E 𝜃 | 𝑓 𝜃 (𝑡)| d𝑡 ≤ 0
1 𝑇
so that 1 𝑇2
∫
| 𝑓 (𝑡)| d𝑡
∫
𝑇
E 𝜃 | 𝑓 𝜃 (𝑡)| 2
1/2
d𝑡
0
√︁ + P{|𝑋 − 𝑌 | 2 ≤ 𝑛/4} , 2
𝑇
1 𝑇
≤ 2 𝐶2
0
1 2 + P{|𝑋 − 𝑌 | ≤ 𝑛/4} . 𝑛2
Similarly, for some absolute constants 𝑐 > 0 and 𝐶 > 0, ∫
𝑇
𝑐 𝑇0
2 E 𝜃 | 𝑓 𝜃 (𝑡)| 2 𝑇 d𝑡 ≤ e−𝑛/12 + P{|𝑋 − 𝑌 | 2 ≤ 𝑛/4} log + e−𝑇0 /8 𝑡 𝑇0 1 ≤ 𝐶 3 + P{|𝑋 − 𝑌 | 2 ≤ 𝑛/4} log 𝑛 . 𝑛
These bounds prove the first assertion of the proposition. For the second assertion, note that 𝑀22 ≥ 𝑚 2 ≥ 1, so that the last term 1/𝑛2 is dominated by 𝑀24 /𝑛. It remains to recall that, by Proposition 1.6.2, P{|𝑋 − 𝑌 | 2 ≤ 𝑛/4} ≤ 16
𝑀 4 + 𝜎22 𝑚 22 + 𝜎22 ≤ 16 2 . 𝑛 𝑛
This leads to the second bound. Here, 𝐹 may be replaced with the standard normal distribution function Φ due to Proposition 12.2.2. Indeed, it provides the bound on the weighted total variation distance, which implies that 𝜌(𝐹, Φ) ≤ √𝐶𝑛 1+𝜎2 . Hence, by the triangle inequality,
280
14 Fluctuations of Distributions
𝜌 2 (𝐹𝜃 , Φ) ≤ 𝜌(𝐹𝜃 , 𝐹) + 𝜌(𝐹, Φ)
2
≤ 2𝜌 2 (𝐹𝜃 , 𝐹) + 2𝜌 2 (𝐹, Φ) ≤ 2𝜌 2 (𝐹𝜃 , 𝐹) +
4𝐶 2 1 + 𝜎22 . 𝑛
After averaging over 𝜃, Proposition 14.6.1 is thus proved.
□
Thus, in a rather general situation (when 𝑀2 and 𝜎22 are of order one), the √ Kolmogorov distances 𝜌(𝐹𝜃 , 𝐹) and 𝜌(𝐹𝜃 , Φ) are at most of order 1/ 𝑛 on average, modulo a logarithmic factor. In particular, log 𝑛 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ 𝐶 (𝑀22 + 𝜎2 ) √ 𝑛 log 𝑛
for some absolute constant 𝐶. It is interesting that for the rate √𝑛 in this central limit theorem, we do not need moments higher than 2. As for the first inequality of Proposition 14.6.1, it implies that √︃ 1 𝑀1 log 𝑛 + P |𝑋 − 𝑌 | 2 ≤ 𝑛/4 log 𝑛 + . 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ √ 𝑛 𝑛 In fact, this assertion may slightly be refined by using Lemma 14.5.1 (rather than 14.5.2). Namely, with similar arguments as in the proof of Proposition 14.6.1, we then arrive at: Proposition 14.6.2 If a random vector 𝑋 in R𝑛 has finite moment 𝑀1 , then √︁ 𝑀1 log 𝑛 √︃ 1 + P |𝑋 − 𝑌 | 2 ≤ 𝑛/4 log 𝑛 + , 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ √ 𝑛 𝑛 where 𝑐 > 0 is an absolute constant and 𝑌 is an independent copy of 𝑋. √︁ Proof Using the same parameters 𝑇0 = 5 log 𝑛 and 𝑇 = 5𝑛, we now apply Lemma 14.5.1. Again, the first integral in (14.13) can be bounded by virtue of the the first bound of Proposition 13.3.3, which gives √ ∫ 𝑇0 5 2 𝑀1 √︁ E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 ≤ √ log 𝑛. 𝑡 0 𝑛−1 To bound the second and third integrals in (14.13), Lemma 13.1.1 with 𝜆 = 1/4 gives √︁ 2 𝑐 E 𝜃 | 𝑓 𝜃 (𝑡)| ≤ e−𝑡 /16 + e−𝑛/24 + P{|𝑋 − 𝑌 | 2 ≤ 𝑛/4}, which implies that, for some constant 𝐶, 1 𝑇
∫
𝑇
| 𝑓 (𝑡)| d𝑡 ≤ 0
1 𝑇
∫
𝑇
E 𝜃 | 𝑓 𝜃 (𝑡)| d𝑡 ≤ 𝐶 0
1 𝑛
√︁ + P{|𝑋 − 𝑌 | 2 ≤ 𝑛/4}
14.6 Preliminary Bounds on the Kolmogorov Distance
281
and ∫
𝑇
𝑇0
√︁ 2 E 𝜃 | 𝑓 𝜃 (𝑡)| 𝑇 d𝑡 ≤ 𝐶 e−𝑛/24 + P{|𝑋 − 𝑌 | 2 ≤ 𝑛/4} log + 𝐶e−𝑇0 /16 . 𝑡 𝑇0
These bounds prove Proposition 14.6.2.
□
As an illustration of Propositions 14.6.1–14.6.2, let us mention the following consequence for the i.i.d. case. Corollary 14.6.3 If the components of the random vector 𝑋 in R𝑛 are independent, identically distributed, have mean zero and finite second moment, then √︂ log 𝑛 (log 𝑛) 2 , E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹) ≤ 𝑐 , E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ 𝑐 𝑛 𝑛 where the constant 𝑐 depends on the distribution of 𝑋1 only. Indeed, since the Kolmogorov distance is invariant under linear transformations, without loss of generality one may assume that E (𝑋1 − 𝑌1 ) 2 = 1, where 𝑌 = (𝑌1 , . . . , 𝑌𝑛 ) is an independent copy of 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ). But then, by Proposition 2.1.5, applied to the random variables 𝑋 𝑘 − 𝑌𝑘 , we have P{|𝑋 − 𝑌 | 2 ≤ 𝑛/4} ≤ e−𝑐𝑛 , for some constant 𝑐 > 0 depending on the distribution of 𝑋1 only. It remains to apply Propositions 14.6.1–14.6.2. However, one cannot draw a similar conclusion to Corollary 14.6.3 by replacing 𝐹 with the normal distribution function Φ. If E𝑋12 = 1, the second bound in Proposition 14.6.1 is applicable, but it contains the parameter 1 𝜎2 = √ E 𝜉1 + · · · + 𝜉 𝑛 , 𝑛
𝜉 𝑘 = 𝑋 𝑘2 − 1,
which might grow to infinity as 𝑛 → ∞, when 𝜉1 does not have a finite second moment, i.e., when E𝑋14 = ∞. As for the typical distribution, let us recall that it is described as the law 𝐹 = L (|𝑋 |𝜃 1 ), where the first coordinate 𝜃 1 of a point √ 𝜃 on the unit sphere is viewed as being independent of 𝑋. The distribution of 𝜃 1 𝑛 is close to the standard normal law Φ at the rate of order 1/𝑛 (by Proposition 11.1.2). Hence, in the bounds of Corollary 14.6.3, 𝐹 may be replaced with the law 𝐺 = L (|𝑋 | √𝑍𝑛 ), i.e., we still have √︁
log 𝑛 E 𝜃 𝜌(𝐹𝜃 , 𝐺) ≤ 𝑐 √ , 𝑛
E 𝜃 𝜌 2 (𝐹𝜃 , 𝐺) ≤ 𝑐
log2 𝑛 . 𝑛
282
14 Fluctuations of Distributions
Recall that 𝐺 represents the mixture of centered Gaussian measures on the real line with characteristic function ∫ ∞ 𝑛 2 2 2 2 e𝑖𝑡 𝑥 d𝐺 (𝑥) = E e−𝑡 |𝑋 | /2𝑛 = E e−𝑡 𝑋1 /2𝑛 −∞ 𝑛 ∫ ∞ 2 2 = e−𝑡 𝑟 /2𝑛 d𝜈(𝑟) , 𝑡 ∈ R, −∞
where the last two expressions correspond to the i.i.d. case as in Corollary 14.6.3 with a common distribution 𝜈 of the components 𝑋 𝑘 . Equivalently, in this case, 𝐺 serves as the distribution of the normalized sum Σ𝑛 =
𝑋1 𝑍1 + · · · + 𝑋𝑛 𝑍 𝑛 √ 𝑛
with independent random variables 𝑍 𝑘 ∼ 𝑁 (0, 1) that are also independent of 𝑋. Remark 14.6.4 Under the assumptions of Corollary 14.6.3, even if additionally E𝑋1 = 0, E𝑋12 = 1, the normalized sums √1𝑛 (𝑋1 + · · · + 𝑋𝑛 ) (that is, the weighted sums ⟨𝑋, 𝜃⟩ with equal coefficients) have distributions 𝐹𝑛 which do not need be close to the standard normal law Φ at the standard rate. Moreover, as was shown by Matskyavichyus [135], the distances 𝜌(𝐹𝑛 , Φ) may tend to zero at an arbitrarily slow rate. Namely, for any sequence 𝜀 𝑛 → 0, 𝑋1 may have a distribution with mean zero and variance one, such that 𝜌(𝐹𝑛 , Φ) ≥ 𝜀 𝑛 for all 𝑛 large enough. In the construction of [135], this distribution has characteristic function of the form ∫ ∞ 2 2 𝑖𝑡 𝑋1 Ee = e−𝑡 𝑟 /2 d𝜈(𝑟), 𝑡 ∈ R, 0
for some discrete probability measure 𝜈 on (0, ∞). Hence, the distribution of 𝑍 𝑛 is the same as the distribution of Σ𝑛 when 𝑋12 is distributed according to 𝜈. This means that the concentration of the 𝐹𝜃 around their mean 𝐹 = E 𝜃 𝐹𝜃 may be essentially stronger than their concentration around the standard normal law.
14.7 Bounds With a Standard Rate To get rid of the logarithmic term in the bounds of Propositions 14.6.1–14.6.2, one may involve the 3-rd moment assumptions in terms of the moment and variance-type functionals 𝑚 𝑝 , 𝑀 𝑝 , and 𝜎𝑝 of index 𝑝 = 3. They are defined by 1/3 1/3 1 𝑚 3 = √ E | ⟨𝑋, 𝑌 ⟩ | 3 , 𝑀3 = sup E | ⟨𝑋, 𝜃⟩ | 3 , 𝑛 | 𝜃 |=1 3 2 3 23 √ |𝑋 | 2 1 2 3 − 1 = √ E |𝑋 | 2 − 𝑛 2 , 𝜎3 = 𝑛 E 𝑛 𝑛
14.7 Bounds With a Standard Rate
283
where 𝑌 is an independent copy of 𝑋. Let us recall that 𝑚 3 ≤ 𝑀32 (Proposition 1.4.2), while 𝜎32 ≤ 𝜎42 = 𝑛1 Var(|𝑋 | 2 ). Proposition 14.7.1 Let 𝑋 be a random vector in R𝑛 with mean 𝑎 = E𝑋, finite 3-rd moment, and such that E |𝑋 | 2 = 𝑛. Then for some absolute constant 𝑐, 1 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ 𝑐 (𝑚 33/2 + 𝜎33/2 + |𝑎|) √ . 𝑛
(14.16)
1 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ 𝑐 (𝑀33 + 𝜎33/2 ) √ . 𝑛
(14.17)
In particular,
Let us recall that |𝑎| ≤ 𝑀2 , while 𝑚 3 ≥ 𝑚 2 ≥ 1 due to the assumption E |𝑋 | 2 = 𝑛 (Proposition 1.1.3 and Proposition 1.2.2). Hence, in the isotropic case, the term |𝑎| may be removed from (14.16). As for (14.17), this inequality follows from (14.16), since 𝑚 3 ≤ 𝑀32 , while |𝑎| ≤ 𝑀2 ≤ 𝑀3 . In general, the term |𝑎| may not be removed from (14.16). See Remark 14.7.2, where we discuss the sharpness of this inequality with respect to the expectation of 𝑋. Proof (of Proposition 14.7.1) By Lemma 14.5.3 with 𝑝 = 3/2, we have for all 𝑇 ≥ 𝑇0 ≥ 1, ∫ 𝑇0 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ 𝑡 0 𝑚 3/2 + 𝜎 3/2 2 𝑇 1 + 3 3/4 3 1 + log + + e−𝑇0 /16 , 𝑇0 𝑇 𝑛 √︁ where 𝑐 > 0 is an absolute constant. Let us take here 𝑇 = 4𝑛 and 𝑇0 = 4 log 𝑛. Since 𝑚 3 ≥ 1, the last two terms are negligible, and we get the bound ∫ 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ 0
𝑇0
E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| log 𝑛 d𝑡 + (𝑚 33/2 + 𝜎33/2 ) 3/4 . 𝑡 𝑛
To analyze the above integral over the interval [0, 𝑇0 ], we apply the first order Poincaré-type inequality on the unit sphere via the equality of Lemma 13.3.1, E 𝜃 |∇ 𝑓 𝜃 (𝑡)| 2 = 𝑡 2 E ⟨𝑋, 𝑌 ⟩ 𝐽𝑛 (𝑡 (𝑋 − 𝑌 )), which gives 1 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ √ 𝑛
∫
𝑇0
√︁ E ⟨𝑋, 𝑌 ⟩ 𝐽𝑛 (𝑡 (𝑋 − 𝑌 )) d𝑡
0
+ (𝑚 33/2 + 𝜎33/2 )
log 𝑛 . 𝑛3/4
(14.18)
284
14 Fluctuations of Distributions
Next, we apply the bound of Proposition 11.3.1 𝐶 √ 2 𝐽𝑛 𝑡 𝑛 − e−𝑡 /2 ≤ 𝑛 which allows one to replace the 𝐽𝑛 -term with e−𝑡 not exceeding, up to an absolute constant,
2 |𝑋−𝑌 | 2 /2𝑛
at the expense of an error
√ √ 𝑚 33/2 𝑚3 𝑚2 1 √︁ 𝑇0 E | ⟨𝑋, 𝑌 ⟩ | ≤ 3/4 𝑇0 ≤ 3/4 𝑇0 ≤ 3/4 𝑇0 . 𝑛 𝑛 𝑛 𝑛 As a result, the bound (14.18) simplifies to 1 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ √ 𝑛
∫
𝑇0
√︁
0
with 𝐼 (𝑡) = E ⟨𝑋, 𝑌 ⟩ e−𝑡
𝐼 (𝑡) d𝑡 + (𝑚 33/2 + 𝜎33/2 )
2 |𝑋−𝑌 | 2 /2𝑛
log 𝑛 𝑛3/4
(14.19)
√ = E 𝑍 E 𝑋e𝑖𝑡 ⟨𝑋,𝑍 ⟩/ 𝑛 | 2 .
Note that 𝐼 (𝑡) ≥ 0, which follows from the second representation, in which the random vector 𝑍 is independent of 𝑋 and has a standard normal distribution on R𝑛 . Now, focusing on 𝐼 (𝑡), consider the events n 1 o 𝐴 = |𝑋 − 𝑌 | 2 ≤ 𝑛 , 4
n 1 o 𝐵 = |𝑋 − 𝑌 | 2 > 𝑛 4
and split the expectation in the definition of 𝐼 (𝑡) into the sets 𝐴 and 𝐵, so that 𝐼 (𝑡) = 𝐼1 (𝑡) + 𝐼2 (𝑡), where 𝐼1 (𝑡) = E ⟨𝑋, 𝑌 ⟩ e−𝑡
2 |𝑋−𝑌 | 2 /2𝑛
1 𝐴,
𝐼2 (𝑡) = E ⟨𝑋, 𝑌 ⟩ e−𝑡
2 |𝑋−𝑌 | 2 /2𝑛
1𝐵 .
As we know (cf. Proposition 1.6.2), P( 𝐴) ≤ 64
𝑚 33 + 𝜎33 𝑛3/2
.
Hence, applying Hölder’s inequality, as well as the simple inequality 𝑥𝑦 2 ≤ 𝑥 3 + 𝑦 3 (𝑥, 𝑦 ≥ 0) with 𝑥 = 𝑚 3 and 𝑦 = 𝜎3 , we have |𝐼1 (𝑡)| ≤ E | ⟨𝑋, 𝑌 ⟩ | 3 ) 1/3 (P( 𝐴)) 2/3 𝑚 2 + 𝜎32 √ 32 ≤ √ (𝑚 33 + 𝜎33 ). ≤ 𝑚 3 𝑛 · 16 3 𝑛 𝑛
14.7 Bounds With a Standard Rate
285
Now, using E ⟨𝑋, 𝑌 ⟩ = |𝑎| 2 , we represent the second expectation as 2
2
|𝑋−𝑌 | 2
𝐼2 (𝑡) = e−𝑡 E ⟨𝑋, 𝑌 ⟩ e−𝑡 ( 2𝑛 −1) 1 𝐵 2 |𝑋−𝑌 |2 2 2 2 = e−𝑡 E ⟨𝑋, 𝑌 ⟩ e−𝑡 ( 2𝑛 −1) − 1 1 𝐵 − e−𝑡 E ⟨𝑋, 𝑌 ⟩ 1 𝐴 + e−𝑡 |𝑎| 2 . Here, the last√ expectation has already been bounded in absolute value by 32 (𝑚 33 + 𝜎33 )/ 𝑛. To estimate the first one, we use the inequality |e−𝑥 − 1| ≤ |𝑥| e 𝑥0 (𝑥0 ≥ 0, 𝑥 ≥ −𝑥0 ). Since on the set 𝐵 there is a uniform bound 𝑡2
|𝑋 − 𝑌 | 2 7 − 1 ≥ − 𝑡2, 2𝑛 8
we conclude by virtue of Hölder’s inequality that |𝑋 − 𝑌 | 2 2 |𝑋−𝑌 |2 2 E | ⟨𝑋, 𝑌 ⟩ | e−𝑡 ( 2𝑛 −1) − 1 1 𝐵 ≤ 𝑡 2 e7𝑡 /8 E | ⟨𝑋, 𝑌 ⟩ | − 1 2𝑛 3 2 13 |𝑋 − 𝑌 | 2 2 2 3 ≤ 𝑡 2 e7𝑡 /8 E | ⟨𝑋, 𝑌 ⟩ | 3 E . − 1 2𝑛 √ The first expectation on the right-hand side is E | ⟨𝑋, 𝑌 ⟩ | 3 = (𝑚 3 𝑛) 3 . As for the second one, first write 1 |𝑌 | 2 1 |𝑋 − 𝑌 | 2 1 |𝑋 | 2 −1 = −1 + − 1 − ⟨𝑋, 𝑌 ⟩ . 2𝑛 2 𝑛 2 𝑛 𝑛 3
3
3
3
Using the inequality ( 12 𝑥 + 21 𝑦 + 𝑧) 2 ≤ 𝑥 2 + 𝑦 2 + 2𝑧 2 (𝑥, 𝑦, 𝑧 ≥ 0), we then also have |𝑋 − 𝑌 | 2 3 |𝑋 | 2 3 1/2 2 2 2 2 E − 1 ≤ 2 E − 1 + 3/2 E | ⟨𝑋, 𝑌 ⟩ | 3 = 3/4 𝜎33/2 + 𝑚 33/2 , 2𝑛 𝑛 𝑛 𝑛 which gives 3 2 |𝑋 − 𝑌 | 2 2 2 3 E − 1 ≤ √ 𝜎3 + 𝑚 3 . 2𝑛 𝑛 Therefore, 2 |𝑋−𝑌 |2 2 E | ⟨𝑋, 𝑌 ⟩ | e−𝑡 ( 2𝑛 −1) − 1 1 𝐵 ≤ 2𝑡 2 e7𝑡 /8 𝑚 3 (𝑚 3 + 𝜎3 ) ≤ 4𝑡 2 e7𝑡
2 /8
(𝑚 32 + 𝜎32 ),
and, as a result, 2
2 2 e−𝑡 𝐼2 (𝑡) ≤ 32 √ (𝑚 33 + 𝜎33 ) + 4𝑡 2 e−𝑡 /8 (𝑚 32 + 𝜎32 ) + e−𝑡 |𝑎| 2 , 𝑛 2
where the factor e−𝑡 in the first term may be removed without loss of strength.
286
14 Fluctuations of Distributions
Together with the estimate on 𝐼1 (𝑡), we get 2 2 64 𝐼 (𝑡) ≤ √ (𝑚 33 + 𝜎33 ) + 4𝑡 2 e−𝑡 /8 (𝑚 32 + 𝜎32 ) + e−𝑡 |𝑎| 2 , 𝑛
so √︁
𝐼 (𝑡) ≤
2 2 8 (𝑚 33/2 + 𝜎33/2 ) + 2 |𝑡| e−𝑡 /16 (𝑚 3 + 𝜎3 ) + e−𝑡 /2 |𝑎| 𝑛1/4
and 1 √ 𝑛
∫
𝑇0
√︁
𝐼 (𝑡) d𝑡 ≤
0
8𝑇0 𝐶 (𝑚 33/2 + 𝜎33/2 ) + √ (𝑚 3 + 𝜎3 + |𝑎|) 3/4 𝑛 𝑛
for some absolute constant 𝐶. Returning to the bound (14.19), we thus obtain that 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤
log 𝑛 3/2 𝐶 (𝑚 3 + 𝜎33/2 ) + √ (𝑚 3 + 𝜎3 + |𝑎|). 𝑛3/4 𝑛
To simplify it, one may use again the property 𝑚 3 ≥ 1, which implies that 𝑚 3 + 𝜎3 ≤ 2(𝑚 33/2 + 𝜎33/2 ) for all values of 𝜎3 . Thus, for some absolute constant 𝑐 > 0, 𝐶 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ √ (𝑚 33/2 + 𝜎33/2 + |𝑎|). 𝑛 To obtain a similar bound with Φ in place of 𝐹, it remains to recall Proposition √ 2 , where 1 + 𝜎2 may further be bounded by 12.2.2 with its estimate 𝜌(𝐹, Φ) ≤ 𝐶 1+𝜎 𝑛 2(𝑚 33/2 + 𝜎33/2 ). The inequality (14.16) has now been proved.
□
Remark 14.7.2 Let us illustrate the inequality (14.16) on the example where the random vector 𝑋 has a normal distribution with a large mean value. Given a standard normal random vector 𝑍 in R𝑛−1 (which we identify with the space of all points in R𝑛 with zero last coordinate), define 𝑋 = 𝛼𝑍 + 𝜆𝑒 𝑛
with 1 ≤ 𝜆 ≤ 𝑛1/4 , 𝛼2 (𝑛 − 1) + 𝜆2 = 𝑛,
where 𝑒 𝑛 = (0, . . . , 0, 1) is the last unit vector in the canonical basis of R𝑛 (necessarily, |𝛼| < 1). Since 𝑍 is orthogonal to 𝑒 𝑛 , we have |𝑋 | 2 = 𝛼2 |𝑍 | 2 + 𝜆2 and E |𝑋 | 2 = 𝑛. In addition, 𝜎32 ≤ 𝜎42 =
𝛼4 2𝛼4 (𝑛 − 1) Var(|𝑍 | 2 ) = ≤ 2. 𝑛 𝑛
Let 𝑍 ′ be an independent copy of 𝑍, so that 𝑌 = 𝛼𝑍 ′ + 𝜆𝑒 𝑛 is an independent copy of 𝑋. From this, by Jensen’s inequality, 1 1 E ⟨𝑋, 𝑌 ⟩ 4 = 2 E (𝛼2 ⟨𝑍, 𝑍 ′⟩ + 𝜆2 ) 4 2 𝑛 𝑛 8 8 8 ′ 4 ≤ 2 (𝛼 E ⟨𝑍, 𝑍 ⟩ + 𝜆8 ) = 2 (3𝛼8 (𝑛2 + 2𝑛) + 𝜆8 ) ≤ 80. 𝑛 𝑛
𝑚 34 ≤ 𝑚 44 =
14.8 Deviation Bounds for the Kolmogorov Distance
287
Thus, both 𝑚 3 and 𝜎3 are bounded, while the mean 𝑎 = E𝑋 = 𝜆𝑒 𝑛 has the Euclidean length |𝑎| = 𝜆 ≥ 1. Hence, the inequality (14.16) simplifies to 𝑐𝜆 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ √ . 𝑛 Let us see that this bound may be reversed (which would imply that |𝑎| may not be removed from (14.16)). For any unit vector 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ), the linear form 𝑆 𝜃 = ⟨𝑋, 𝜃⟩ has a normal distribution on the real line with mean E𝑆 𝜃 = 𝜆𝜃 𝑛 and variance Var(𝑆 𝜃 ) = 𝛼2 (1 − 𝜃 𝑛 ) 2 < 1. Note that, for the normal distribution function 1 2 Φ 𝜇, 𝜎 2 (𝑥) = Φ( 𝑥−𝜇 𝜎 ) with parameters |𝜇| ≤ 1, 2 ≤ 𝜎 ≤ 1, we have 𝜌(Φ 𝜇, 𝜎 2 , Φ) ≥ |Φ 𝜇, 𝜎 2 (0) − Φ(0)| 2 2 |𝜇| 1 1 ≥ √ = |Φ(−𝜇/𝜎) − Φ(0)| ≥ √ e−𝜇 /2𝜎 |𝜇|. 𝜎 2𝜋 2𝜋𝑒
In our case, since 𝜆 ≤ 𝑛1/4 and 𝛼2 =
√ 𝑛− 𝑛 1 𝑛 − 𝜆2 ≥ ≥ 1− √ , 𝑛−1 𝑛−1 𝑛
we have |E𝑆 𝜃 | ≤ 1 and Var(𝑆 𝜃 ) ≥ 12 on the set Ω = {𝜃 : |𝜃 𝑛 | < enough. It then easily follows that for some constant 𝑐 > 0 E 𝜃 𝜌(𝐹𝜃 , Φ) ≥ √
log 𝑛 √ } 𝑛
with 𝑛 large
𝜆
𝑐𝜆 E |𝜃 𝑛 | 1 { 𝜃 ∈Ω} ≥ √ . 𝑛 2𝜋𝑒
14.8 Deviation Bounds for the Kolmogorov Distance In order to see that the bound (14.17) is valid not only on average, but also for all directions 𝜃 from a large part of the unit sphere, it is natural to complement or strengthen this bound in terms of large deviations. Note that, in general 1 𝐿(𝐹𝜃 , Φ), 𝐿 (𝐹𝜃 , Φ) ≤ 𝜌(𝐹𝜃 , Φ) ≤ 1 + √ 2𝜋 so, it does not matter which of the two distances is used to measure the closeness to the standard normal law. Let us return to Proposition 14.4.5 and combine it with Proposition 14.7.1. Then we immediately obtain:
288
14 Fluctuations of Distributions
Proposition 14.8.1 Suppose that the random vector 𝑋 in R𝑛 satisfies E |𝑋 | 2 = 𝑛, and let 𝑀3 be finite. Then, for all 𝑝 > 0 and 𝑟 > 0, n o n 𝐴 𝑟 2( 𝑝+1)/ 𝑝 o 𝔰𝑛−1 𝐿(𝐹𝜃 , Φ) ≥ √ + 𝑟 ≤ exp − 𝑛 , 𝑛 4𝑀 𝑝2 where 𝐴 depends on 𝑀3 and 𝜎3 , only. One may take 𝐴 = 𝑐 (𝑀33 + 𝜎33/2 ) with some absolute constant 𝑐 > 0. For example, in the isotropic case, that is, with 𝑝 = 2 and 𝑀2 = 1, o n 3 𝐴 𝔰𝑛−1 𝐿 (𝐹𝜃 , Φ) ≥ √ + 𝑟 ≤ e−𝑛𝑟 /4 , 𝑛 so that with high 𝔰𝑛−1 -probability 𝜌(𝐹𝜃 , Φ) ≤ 2𝐴
(log 𝑛) 1/3 . 𝑛1/3
But, this “typical” bound can be improved by choosing larger values of 𝑝, as long as the moments 𝑀 𝑝 do not grow too fast. To this end, assume, for example, that E e | ⟨𝑋, 𝜃 ⟩ |/𝜆 ≤ 2 for all 𝜃 ∈ S𝑛−1 with some 𝜆 > 0. Using 𝑥 𝑝 e−𝑥 ≤ ( 𝑝e ) 𝑝 (𝑥, 𝑝 > 0), we then get ⟨𝑋, 𝜃⟩ 𝑝 𝑝 𝑝 e | ⟨𝑋, 𝜃 ⟩ |/𝜆 ≤ 𝜆 e and hence 𝑀 𝑝𝑝 ≤ 2 ( 𝜆e𝑝 ) 𝑝 , i.e., 𝑀 𝑝 ≤ 21/ 𝑝
𝜆 𝜆𝑝 = 2𝑠 , e e𝑠
𝑠=
1 . 𝑝
From this, if 𝑝 ≥ 1, 2/ 𝑝 e2 𝑟 2( 𝑝+1)/ 𝑝 2 𝑟 2 = 𝑟 ≥ 𝑟 𝑠2 𝑟 2𝑠 2 · 4𝑠 𝜆 2 2𝑀 𝑝2 2𝑀 𝑝2 e2 ≥ 𝑟 2 2 𝑠2 exp − 2𝑠 log(1/𝑟) . 8𝜆
Assuming that 0 < 𝑟 ≤ e−1 , one may choose 𝑠 = 1/log(1/𝑟), and then the latter expression is greater than or equal to 𝑟 2 /(8𝜆2 log2 (1/𝑟)). One may now use this lower bound in the inequality of Proposition 14.8.1, to get the following corollary.
14.9 The Log-concave Case
289
Corollary 14.8.2 Suppose that E |𝑋 | 2 = 𝑛 and E e | ⟨𝑋, 𝜃 ⟩ |/𝜆 ≤ 2 for all 𝜃 ∈ S𝑛−1 with some 𝜆 > 0. Then n o o n 𝐴 𝑟2 𝑛 𝔰𝑛−1 𝐿 (𝐹𝜃 , Φ) ≥ √ + 𝑟 ≤ exp − 16 𝜆2 log2 (1/𝑟) 𝑛 for all 0 < 𝑟 ≤ e−1 , where 𝐴 > 0 depends on 𝜆 and 𝜎3 only. In the inequality e | ⟨𝑋, 𝜃 ⟩ |/𝜆 ≥ 2𝜆1 2 ⟨𝑋, 𝜃⟩ 2 one may take the expectation, which leads to 2 ≥ 2𝜆1 2 E ⟨𝑋, 𝜃⟩ 2 . Taking another expectation with respect to d𝔰𝑛−1 (𝜃), we see that the condition E |𝑋 | 2 = 𝑛 forces 𝜆 to be bounded away from zero: 𝜆 ≥ 12 . Let us choose (log 𝑛) 3/2 . 𝑟 = 4𝜆 √ 𝑛 It follows that 𝑟 ≥ 2
(log 2) 3/2 √ 𝑛
>
√1 , 𝑛
so that log(1/𝑟)
√ + 𝑒𝑟 ≤ 𝔰𝑛−1 𝐿(𝐹𝜃 , Φ) ≥ √ + 𝑟 ≤ 4 . 𝑛 𝑛 𝑛 In the case 𝑟 ≥ 1/e, the left 𝔰𝑛−1 -probability is zero, that is, we arrive at the following: Corollary 14.8.3 Under the assumptions of Corollary 14.8.2, with some constant 𝐴 depending on 𝜆 and 𝜎3 , the inequality 𝜌(𝐹𝜃 , Φ) ≤ 𝐴
(log 𝑛) 3/2 √ 𝑛
holds true for all 𝜃 ∈ S𝑛−1 except for a set of directions of 𝔰𝑛−1 -measure ≤ 𝑛−4 .
14.9 The Log-concave Case Most of the previous observations may be refined under an additional log-concavity assumption on the distribution of 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ). More precisely, we assume here that the random vector 𝑋 is isotropic, has mean zero, and a log-concave distribution on R𝑛 (𝑛 ≥ 2). The aim of this section is to derive the following bounds on the
290
14 Fluctuations of Distributions
fluctuations of the distribution functions 𝐹𝜃 about the typical distribution 𝐹 = E 𝜃 𝐹𝜃 . First, we remove the log 𝑛 term from Proposition 14.2.3 when speaking about the Kantorovich distance. Proposition 14.9.1 For some absolute constant 𝑐 > 0, we have 𝑐 E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤ √ . 𝑛 Moreover, for any 𝑟 > 0, o n 2 𝑐 𝔰𝑛−1 𝑊 (𝐹𝜃 , 𝐹) ≥ √ + 𝑟 ≤ e−𝑛𝑟 /4 . 𝑛 Turning to the Kolmogorov distance, let us note that all weighted sums ⟨𝑋, 𝜃⟩ have a log-concave distribution on the real line with mean zero and variance one. Consequently, they have a bounded 𝐿 𝜓1 -norm, i.e., E e | ⟨𝑋, 𝜃 ⟩ |/𝜆 ≤ 2 for some absolute constant 𝜆 > 0 (cf. Corollary 2.5.2). Corollary 14.8.3 is thus applicable and provides the rate of fluctuation of the 𝐹𝜃 from the standard normal law Φ of order at most √𝐴 (log 𝑛) 3/2 . However, the constant 𝐴 in that statement depends upon the variance 𝑛 functional 𝜎3 (𝑋), about which we cannot say whether or not it is bounded by a universal constant (in the general log-concave isotropic situation). As it turns out, when replacing Φ with 𝐹, there is no need to involve this functional. Proposition 14.9.2 For all 𝑟 > 0, n h i o √ 2 𝔰𝑛−1 sup e𝑐 | 𝑥 | |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| ≥ 𝑟 ≤ 𝐶 𝑛 log 𝑛 e−𝑐𝑛𝑟 𝑥
with some absolute constants 𝐶 > 0 and 𝑐 > 0. In particular, √︁ 𝑐 log 𝑛 . E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ √ 𝑛 By the Borell characterization theorem (Proposition 2.4.1), the distribution 𝜇 of 𝑋 has a log-concave density 𝑝(𝑥) (which is also due to the assumption that 𝑋 is isotropic, so that 𝜇 is full-dimensional). Let us write the points in R𝑛 in the form 𝑥 = (𝑦, 𝑡) with 𝑥 ∈ R𝑛−1 and 𝑡 ∈ R. As a first step of the proof, we need: Lemma 14.9.3 For every 𝑡 ∈ R and every unit vector 𝑙 in R𝑛−1 , ∫ | ⟨𝑙, 𝑦⟩ | 𝑝(𝑦, 𝑡) d𝑦 ≤ 𝐶 e−𝑐 |𝑡 | , R𝑛−1
where 𝐶 and 𝑐 are positive universal constants. Proof The function 𝑙 + (𝑦) = max{⟨𝑙, 𝑦⟩ , 0} is log-concave on R𝑛−1 , and so is the function 𝑙 + (𝑦) 𝑝(𝑦, 𝑡) on R𝑛 . Hence, by the Prékopa theorem (Corollary 2.4.3), the function
14.9 The Log-concave Case
291
∫
𝑙 + (𝑦) 𝑝(𝑦, 𝑡) d𝑦
𝑢(𝑡) = R𝑛−1
is log-concave on R. We consider the linear functions 𝜉 (𝑦, 𝑡) = ⟨𝑙, 𝑦⟩ and 𝜂(𝑦, 𝑡) = 𝑡 as log-concave random variables on the probability space (R𝑛 , 𝜇). By the assumptions, E𝜉 = E𝜂 = 0 and E𝜉 2 = E𝜂2 = 1. Thus, using 𝜉 + = max{𝜉, 0}, we have ∫ ∞ ∫ ∞ ∫ ∞ 𝑡𝑢(𝑡) d𝑡 = E𝜉 + 𝜂, 𝑡 2 𝑢(𝑡) d𝑡 = E𝜉 + 𝜂2 . 𝑢(𝑡) d𝑡 = E𝜉 + , −∞
−∞
−∞
Introduce the log-concave probability density 𝑞(𝑡) =
𝑢(𝑡) , E𝜉 +
𝑡 ∈ R.
Since we need to estimate the function 𝑢(𝑡) from above, we apply Proposition 2.6.4 to 𝑞. Let 𝜁 be a random variable with this density, so that E𝜁 =
E𝜉 + 𝜂 , E𝜉 +
Var(𝜁) =
E𝜉 + 𝜂2 E𝜉 + − (E𝜉 + 𝜂) 2 . (E𝜉 + ) 2
Let 𝜈 denote the normalized restriction of 𝜇 to the half-space 𝐴 = {𝜉 > 0} in R𝑛 . Then the above variance can also be written as ∫ ∞ 1 Var(𝜁) = (𝑡 − 𝑎) 2 𝑢(𝑡) d𝑡 E𝜉 + −∞ ∫ 𝜇( 𝐴) 𝜇( 𝐴) = (𝜂 − 𝑎) 2 𝜉 d𝜈 = ∥(𝜂 − 𝑎) 2 𝜉 ∥ 1 , (14.20) + E𝜉 E𝜉 + ∫ where 𝑎 = E𝜁 and where∫we use the notation ∥𝜓∥ 𝛼 = ( |𝜓| 𝛼 d𝜈) 1/𝛼 with the usual convention ∥𝜓∥ 0 = exp{ log |𝜓| d𝜈}. We have ∥ (𝜂 − 𝑎) 2 𝜉 ∥ 1 ≥ ∥ (𝜂 − 𝑎) 2 𝜉 ∥ 0 = ∥ |𝜂 − 𝑎| ∥ 20 ∥𝜉 ∥ 0 , while, since the measure 𝜈 is log-concave, by Propositions 2.5.1 and 2.5.3, we have ∥𝜓∥ 0 ≥ 𝑐 𝛼 ∥𝜓∥ 𝛼 for every norm 𝜓 up to some constants 𝑐 𝛼 > 0 depending on 𝛼 only. Hence ∥ (𝜂 − 𝑎) 2 𝜉 ∥ 1 ≥ 𝑐 ∥𝜂 − 𝑎∥ 22 ∥𝜉 ∥ 2 (14.21) for some absolute constant 𝑐 > 0. Recall that ∫ 1 (𝜂 − 𝑎) 2 d𝜇 ∥𝜂 − 𝑎∥ 22 = 𝜇( 𝐴) 𝐴
and
∥𝜉 ∥ 22 =
1 𝜇( 𝐴)
∫
𝜉 2 d𝜇. 𝐴
Applying Proposition 2.6.3 to the random variables 𝜂 − 𝑎 and 𝜉, and using Var(𝜂) = Var(𝜉) = 1, we obtain that
292
14 Fluctuations of Distributions
∥𝜂 − 𝑎∥ 22 ≥
1 𝜇( 𝐴) 2 , 24
∥𝜉 ∥ 22 ≥
1 𝜇( 𝐴) 3 . 24
Using these estimates in (14.21), the identity (14.20) then yields Var(𝜁) ≥
𝜇( 𝐴) 7/2 𝑐 . √ 24 24 E𝜉 +
√︁ In addition, E𝜉 + ≤ E𝜉 2 = 1, while, by Proposition 2.6.2, 𝜇( 𝐴) ≥ 1e . Hence 𝜎 2 ≡ Var(𝜁) ≥ 𝑐 for some other constant 𝑐 > 0. + Now, in order to bound the expectation 𝑎 = E𝜁 = EE𝜉𝜉 +𝜂 , just note that |E𝜉 + 𝜂| 2 ≤ E𝜉 2 E𝜂2 = 1, and once more by Propositions 2.6.2–2.6.3, ∫ 1 1 E𝜉 + = 𝜉 d𝜇 ≥ √ 𝜇( 𝐴) 2 ≥ √ . 𝐴 4 2 4 2 e2 √ Thus, |𝑎| ≤ 4 2 e2 . Applying now Proposition 2.6.4, we finally get 𝑞(𝑡) ≤
n |𝑡 − 𝑎| o 𝐶 exp − √ ≤ 𝐶 ′e−𝑐 |𝑡 | 𝜎 𝜎 12
for some positive absolute constants 𝑐, 𝐶, 𝐶 ′ (since we have universal bounds for 𝜎 and 𝑎). It remains to note that 𝑢(𝑡) = 𝑞(𝑡) E𝜉 + ≤ 𝑞(𝑡). ∫ At last, replacing 𝑙 with −𝑙, we get a similar estimate for R𝑛−1 𝑙 − (𝑦) 𝑝(𝑦, 𝑡) d𝑦, where 𝑙 − (𝑦) = max{− ⟨𝑙, 𝑦⟩ , 0}. Lemma 14.9.3 is proved. □ Lemma 14.9.4 For every 𝑡 ∈ R, the function 𝑢 𝑡 (𝜃) = 𝐹𝜃 (𝑡) = P{⟨𝑋, 𝜃⟩ ≤ 𝑡} has a Lipschitz semi-norm on S𝑛−1 satisfying, with some universal constants 𝐶 > 0 and 𝑐 > 0, ∥𝑢 𝑡 ∥ Lip ≤ 𝐶 e−𝑐 |𝑡 | . Proof Without loss of generality, we may assume that the density 𝑝 of 𝑋 is positive on R𝑛 (hence everywhere continuous). It is readily verified that 𝑢 𝑡 is differentiable at all points of R𝑛 except for the origin, and we need to show that the spherical gradient of 𝑢 𝑡 satisfies |∇S 𝑢 𝑡 (𝜃)| ≤ 𝐶 e−𝑐 |𝑡 | for all 𝜃 ∈ S𝑛−1 . Moreover, by the isotropy assumption, it is sufficient to consider the point 𝜃 = 𝑒 𝑛 = (0, . . . , 0, 1). Note that ∑︁ 𝑛−1 1/2 𝜕𝑢 𝑡 (𝑒 𝑛 ) 2 |∇S 𝑢 𝑡 (𝑒 𝑛 )| = = sup | ⟨∇𝑢 𝑡 (𝑒 𝑛 ), 𝑙⟩ |, 𝜕𝜃 𝑘 |𝑙 |=1 𝑘=1 where the supremum is taken over all unit vectors 𝑙 = (𝑙 1 , . . . , 𝑙 𝑛−1 , 0) in R𝑛 with the last coordinate 𝑙 𝑛 = 0.
14.9 The Log-concave Case
293
Fix 𝑘 = 1, . . . , 𝑛 − 1. The two-dimensional random vector (𝑋 𝑘 , 𝑋𝑛 ) has a positive (continuous) log-concave density 𝑝 𝑘 = 𝑝 𝑘 (𝑥 𝑘 , 𝑥 𝑛 ) on R2 . Therefore, using the canonical orthonormal basis 𝑒 1 , . . . , 𝑒 𝑛−1 , 𝑒 𝑛 in R𝑛 , for any 𝜀 > 0, the difference 𝑢 𝑡 (𝑒 𝑛 + 𝜀𝑒 𝑘 ) − 𝑢 𝑡 (𝑒 𝑛 ) = P{𝑋𝑛 + 𝜀𝑋 𝑘 ≤ 𝑡} − P{𝑋𝑛 ≤ 𝑡} may be written in terms of the density as ∫ 0 ∫ 𝑡−𝜀 𝑥𝑘 ∫ 𝑝 𝑘 (𝑥 𝑘 , 𝑥 𝑛 ) d𝑥 𝑛 d𝑥 𝑘 − −∞
∫
∫
−𝑥 𝑘
=𝜀 −∞
∫
∫ 𝑝 𝑘 (𝑥 𝑘 , 𝑡 + 𝜀𝑢) d𝑢 d𝑥 𝑘 − 𝜀
𝑡−𝜀 𝑥 𝑘 ∞ ∫ 0
0
0
∫
𝑡
𝑝 𝑘 (𝑥 𝑘 , 𝑥 𝑛 ) d𝑥 𝑛 d𝑥 𝑘
0
𝑡 0
∞
0
−𝑥 𝑘 ∞
∫ (−𝑥 𝑘 ) 𝑝 𝑘 (𝑥 𝑘 , 𝑡) d𝑥 𝑘 − (𝜀 + 𝑜(𝜀))
= (𝜀 + 𝑜(𝜀))
𝑝 𝑘 (𝑥 𝑘 , 𝑡 + 𝜀𝑢) d𝑢 d𝑥 𝑘
−∞
𝑥 𝑘 𝑝 𝑘 (𝑥 𝑘 , 𝑡) d𝑥 𝑘 . 0
Note that all the integrals are well-defined and finite, since log-concave densities decrease exponentially fast at infinity and, more precisely, they admit exponential upper bounds such as 𝐶 e−𝑐 | 𝑥 | . Thus, letting 𝜀 → 0, we find that ∫ ∞ 𝜕𝑢 𝑡 (𝑒 𝑛 ) =− 𝑥 𝑘 𝑝 𝑘 (𝑥 𝑘 , 𝑡) d𝑥 𝑘 . 𝜕𝜃 𝑘 −∞ ∫ , we arrive at the repreSince, by Fubini’s theorem, 𝑝 𝑘 (𝑥 𝑘 , 𝑥 𝑛 ) = R𝑛−2 𝑝(𝑥) d𝑥d𝑥 𝑘 d𝑥𝑛 sentation ∫ 𝜕𝑢 𝑡 (𝑒 𝑛 ) =− 𝑥 𝑘 𝑝(𝑥1 , . . . , 𝑥 𝑛−1 , 𝑡) d𝑥1 . . . d𝑥 𝑛−1 . 𝜕𝜃 𝑘 R𝑛−1 Hence, in a vector form, given a unit vector 𝑙 in R𝑛−1 , ∫ ⟨∇𝑢 𝑡 (𝑒 𝑛 ), 𝑙⟩ = − ⟨𝑙, 𝑦⟩ 𝑝(𝑦, 𝑡) 𝑑𝑦, R𝑛−1
where we write 𝑦 = (𝑥1 , . . . , 𝑥 𝑛−1 ). It remains to apply Lemma 14.9.3.
□
Proof (of Proposition 14.9.1) By Proposition 9.5.1 (𝐿 1 -Poincaré-type inequality on the unit sphere), applied to the function 𝑢 𝑡 (𝜃) = 𝐹𝜃 (𝑡), we have 𝜋 𝐶 E 𝜃 |𝐹𝜃 (𝑡) − 𝐹 (𝑡)| ≤ √ |∇𝑆 𝑢 𝑡 (𝜃)| ≤ √ e−𝑐 |𝑡 | 𝑛 2𝑛 for some absolute positive constants 𝐶 and 𝑐, using Lemma 14.9.4 in the last step. Integrating this inequality over the variable 𝑡, we then arrive at ∫ ∞ 𝐶′ E 𝜃 𝑊 (𝐹𝜃 , 𝐹) = E 𝜃 |𝐹𝜃 (𝑡) − 𝐹 (𝑡)| d𝑡 ≤ √ , 𝑛 −∞ which is the first assertion of the proposition. To prove the second one, it remains to apply Proposition 14.2.1. □
294
14 Fluctuations of Distributions
Proof (of Proposition 14.9.2) Since every weighted sum ⟨𝑋, 𝜃⟩ with 𝜃 ∈ S𝑛−1 represents a log-concave random variable with mean 𝑎 = 0 and standard deviation 𝜎 = 1, we know from Proposition 2.6.4 that the density 𝑞 𝜃 (𝑥) = 𝐹𝜃′ (𝑥) of ⟨𝑋, 𝜃⟩ satisfies, for all 𝑥 ∈ R, 𝑞 𝜃 (𝑥) ≤ 𝐶1 e− |𝑥 |/4 , (14.22) where 𝐶1 is some positive universal constant. Integrating this inequality over 𝜃, we have a similar bound 𝑞(𝑥) ≤ 𝐶1 e− | 𝑥 |/4 (14.23) for the density 𝑞 of 𝐹 (although 𝑞 does not need to be log-concave). Hence max{𝐹𝜃 (−𝑥), 1 − 𝐹𝜃 (𝑥)} ≤ 4𝐶1 e−𝑥/4 ,
𝑥 ≥ 0,
and similarly for 𝐹 as well. Equivalently, we have e 𝑥/8 max{𝐹𝜃 (−𝑥), 1 − 𝐹𝜃 (𝑥)} ≤ 4𝐶1 e−𝑥/8 and the same for 𝐹, so, e | 𝑥 |/8 |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| ≤ 4𝐶1 𝑟
for all |𝑥| ≥ 8 log(1/𝑟),
where 𝑟 ∈ (0, 12 ] is a fixed number. Put Ω(𝑥) = {𝜃 ∈ S𝑛−1 : e𝑐 | 𝑥 | |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| ≤ 𝐶𝑟},
(14.24)
𝑥 ∈ R,
where 𝑐 and 𝐶 are constants from Lemma 14.9.3 (we may assume that 𝑐 ≤ 1/8). By this lemma, one may apply the spherical concentration inequality (10.6) of Proposition 10.3.1 to the Lipschitz function 𝜃 → 𝐶1 e𝑐 | 𝑥 | (𝐹𝜃 (𝑥) − 𝐹 (𝑥)), which gives 2 𝔰𝑛−1 (Ω(𝑥)) ≥ 1 − 2 e−𝑛𝑟 /4 . We apply this inequality to a sequence of points 𝑥 = 𝑥1 , . . . , 𝑥 𝑁 increasing with step log(1/𝑟) when 𝑟 ↓ 0, 𝑟 and such that 𝑥 1 = −𝑥 𝑁 (the number 𝑁 = 𝑁𝑟 will grow as 𝑐 ′ 𝑟 𝑁 ′ where the constant 𝑐 has to be later specified). As a result, for the set Ω = ∩ 𝑘=1 Ω(𝑥 𝑘 ), we have 2 𝔰𝑛−1 (Ω(𝑡)) ≥ 1 − 2𝑁e−𝑛𝑟 /4 . (14.25) Now, take 𝜃 ∈ Ω so that, for all 1 ≤ 𝑘 ≤ 𝑁, e𝑐 | 𝑥𝑘 | |𝐹𝜃 (𝑥 𝑘 ) − 𝐹 (𝑥 𝑘 )| ≤ 𝐶𝑟.
(14.26)
To extend this estimate to all points 𝑥 ∈ [𝑥 1 , 𝑥 𝑁 ], assume that 𝑥 𝑘 < 𝑥 < 𝑥 𝑘+1 for some 𝑘 = 1, . . . , 𝑁 − 1. Since 𝑥 𝑘+1 − 𝑥 𝑘 = 𝑟, from (14.22)–(14.24) and (14.26) it follows that
14.9 The Log-concave Case
295
|𝐹𝜃 (𝑥) − 𝐹 (𝑥)| ≤ |𝐹𝜃 (𝑥 𝑘 ) − 𝐹 (𝑥 𝑘 )| + |𝐹𝜃 (𝑥) − 𝐹𝜃 (𝑥 𝑘 )| + |𝐹 (𝑥) − 𝐹 (𝑥 𝑘 )| ≤ 𝐶𝑟e−𝑐 |𝑥𝑘 | + 𝑟 sup 𝑞 𝜃 (𝑦) + 𝑟 sup 𝑞(𝑦) 𝑥 𝑘 0. Using the property 𝜌(𝐹𝜃 , 𝐹) ≤ 1, this implies E 𝜃 𝜌(𝐹𝜃 , 𝐹) = E 𝜃 𝜌(𝐹𝜃 , 𝐹) 1 {𝜌(𝐹
√ 𝜃 ,𝐹) 0, 4 (14.27) 𝔰𝑛−1 {𝐿 (𝐹𝜃 , 𝐹) ≥ 𝑟} ≤ 4𝑛3/8 e−𝑛𝑟 /8 . log 𝑛
This bound makes sense for 𝑟 ≥ 𝑐 ( 𝑛 ) 1/4 . Hence, Proposition 14.4.2 and Corollary 14.4.3 provide an improvement of the rate from 𝑛−1/4 to 𝑛−1/3 (modulo logarithmically growing terms with respect to 𝑛). Proposition 14.9.2 was proved in [25] as well. The key Lemma 14.9.3 was actually discovered before in [4] in the case where 𝑋 is uniformly distributed over a convex body (using arguments from Convex Geometry). Proposition 14.9.1, as a consequence of Lemma 14.9.3, was mentioned in [31]. The material of Sections 14.6 and 14.7 is based on [38]. The closeness of most 𝐹𝜃 to Φ may also be studied for stronger metrics, or in stronger senses. An important result in this direction is due to S. Sodin ([167], Theorem 3). Let 𝑋 be a random vector in R𝑛 with mean zero and 𝜓1 -marginals, that is, satisfying E e | ⟨𝑋, 𝜃 ⟩ |/𝜆 ≤ 2 for all 𝜃 ∈ S𝑛−1 with some constant 𝜆 > 0. Suppose that the typical distribution 𝐹 is almost standard normal in the sense that, for some 𝜀 ∈ (0, 𝜀0 ), 1 − 𝐹 (𝑥) ≤ 1 + 𝜀, 0 ≤ 𝑥 ≤ 𝑇 . 1−𝜀 ≤ 1 − Φ(𝑥) Then 𝔰𝑛−1
n
1 − 𝐹 (𝑥) o 𝐶𝑇 8 n o 𝜃 2 6 − 1 ≥ 10 𝜀 ≤ sup exp − 𝑐𝑛𝜀 /𝑇 . 𝑛𝜀 4 0≤𝑥 ≤𝑇 1 − Φ(𝑥)
Here the positive constants 𝜀0 , 𝑐, 𝐶 do not depend on 𝑛 and the distribution of 𝑋.
Part V
Refined Bounds and Rates
Chapter 15
𝑳 2 Expansions and Estimates
We now consider more precise assertions about fluctuations of the distribution functions 𝐹𝜃 (𝑥) of the weighted sums ⟨𝑋, 𝜃⟩ around their means 𝐹 (𝑥). This is possible for the 𝐿 2 -distance defined for arbitrary distribution functions 𝐹 and 𝐺 by ∫
∞
(𝐹 (𝑥) − 𝐺 (𝑥)) 2 d𝑥
𝜔(𝐹, 𝐺) =
1/2 ,
−∞
which is always finite as long as these distributions have finite first absolute moments. The main tool in this investigation will be the Plancherel theorem, which relates this distance to the difference of the corresponding characteristic functions 𝑓 and 𝑔 by virtue of the explicit formula 𝜔(𝐹, 𝐺) =
1 2𝜋
∫
1/2 𝑓 (𝑡) − 𝑔(𝑡) 2 . d𝑡 𝑡
∞
−∞
15.1 General Approximations For an arbitrary random vector 𝑋 in R𝑛 and a weighted sum of its components, we therefore have the identity ∫ ∞ 1 Var 𝜃 ( 𝑓 𝜃 (𝑡)) E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) = d𝑡, (15.1) 2𝜋 −∞ 𝑡2 where as before 𝐹𝜃 (𝜃 ∈ S𝑛−1 ) denotes the distribution function of the weighted sum ⟨𝑋, 𝜃⟩ with characteristic function 𝑓 𝜃 , and where 𝐹 = E 𝜃 𝐹𝜃 denotes the typical distribution with characteristic function 𝑓 (𝑡) = E 𝜃 𝑓 𝜃 (𝑡) (E 𝜃 and Var 𝜃 mean integration and taking the variance over the uniform probability measure 𝔰𝑛−1 on the unit sphere). Recall that Var 𝜃 ( 𝑓 𝜃 (𝑡)) = E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 = E 𝜃 | 𝑓 𝜃 (𝑡)| 2 − | 𝑓 (𝑡)| 2 , © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_15
299
15 𝐿 2 Expansions and Estimates
300
and 𝑓 (𝑡) = E 𝜃 E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ = E𝐽𝑛 (𝑡|𝑋 |), where 𝐽𝑛 is the characteristic function of the first coordinate 𝜃 1 of 𝜃 under 𝔰𝑛−1 . Similarly, if 𝑌 is an independent copy of 𝑋, E 𝜃 | 𝑓 𝜃 (𝑡)| 2 = E 𝜃 E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ E e−𝑖𝑡 ⟨𝑌 , 𝜃 ⟩ = E 𝜃 E e𝑖𝑡 ⟨𝑋−𝑌 , 𝜃 ⟩ = E𝐽𝑛 (𝑡|𝑋 − 𝑌 |). Hence, the Plancherel formula (15.1) becomes ∫ ∞ d𝑡 1 2 E 𝜃 𝜔 (𝐹𝜃 , 𝐹) = E𝐽𝑛 (𝑡|𝑋 − 𝑌 |) − |E𝐽𝑛 (𝑡|𝑋 |)| 2 2 . 2𝜋 −∞ 𝑡
(15.2)
To further simplify this expression (asymptotically with respect to the growing √ dimension 𝑛), let us assume that E |𝑋 | ≤ 𝑏 𝑛 and apply the normal approximation √ 𝐽𝑛 𝑡 𝑛 − e−𝑡 2 /2 ≤ 𝐶 min{1, 𝑡 2 }, 𝑛 where 𝐶 is an absolute constant (cf. Section 11.3). Note that, for any 𝜂 > 0, ∫
∞
𝛿(𝜂) ≡ −∞
min{1, 𝑡 2 𝜂2 } d𝑡 = 2 𝑡2
∫
1/𝜂 2
∫
∞
𝜂 d𝑡 + 2 0
1/𝜂
1 d𝑡 = 4𝜂, 𝑡2
(15.3)
with 𝛿(0) = 0. Hence ∫ ∞ ∫ ∞ n 𝑡 2 |𝑋 − 𝑌 | 2 o d𝑡 𝐶 −𝑡 2 |𝑋−𝑌 | 2 /2𝑛 d𝑡 min 1, E 𝐽𝑛 (𝑡|𝑋 − 𝑌 |) − e 2 ≤ E 𝑛 𝑛 𝑡 𝑡2 −∞ −∞ 4𝐶 |𝑋 − 𝑌 | 8𝐶𝑏 = E √ ≤ . 𝑛 𝑛 𝑛 Similarly, ∫ ∞ ∫ ∞ d𝑡 2 2 2 2 d𝑡 |𝐽𝑛 (𝑡|𝑋 |) − e−𝑡 |𝑋 | /2𝑛 | 2 (E𝐽𝑛 (𝑡|𝑋 |)) 2 − (E e−𝑡 |𝑋 | /2𝑛 ) 2 2 ≤ 2 E 𝑡 𝑡 −∞ −∞ ∫ ∞ n 𝑡 2 |𝑋 | 2 o d𝑡 8𝐶 |𝑋 | 2𝐶 8𝐶𝑏 ≤ E min 1, = E√ ≤ . 𝑛 𝑛 𝑛 𝑛 𝑡2 𝑛 −∞ Using this bound in (15.2), we arrive at the following general approximation. Proposition 15.1.1 √ Let 𝑌 be an independent copy of the random vector 𝑋 in R𝑛 satisfying E |𝑋 | ≤ 𝑏 𝑛 a.s. Then ∫ ∞ 2 d𝑡 𝐶𝑏 2 2 2 2 1 E e−𝑡 |𝑋−𝑌 | /2𝑛 − E e−𝑡 |𝑋 | /2𝑛 + , E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) = 2𝜋 −∞ 𝑛 𝑡2 where 𝐶 is a quantity bounded by an absolute constant (in absolute value).
15.1 General Approximations
301
In order to obtain more precise formulas with a remainder term of a smaller order than 1/𝑛, we need more precise approximations of the function 𝐽𝑛 , such as √ 𝑡 4 −𝑡 2 /2 𝐽𝑛 𝑡 𝑛) = 1 − e + 𝑂 𝑛−2 min{1, 𝑡 2 } , 4𝑛
(15.4)
which holds uniformly on the real line with a universal constant in 𝑂 (note that the remainder term can be made even smaller for small values of 𝑡, cf. Corollary 11.3.3). With this approximation, as before, we have ∫ ∞ 8𝐶𝑏 𝑡 4 |𝑋 − 𝑌 | 4 − 𝑡 2 |𝑋−𝑌 |2 d𝑡 e 2𝑛 2 ≤ 2 . E 𝐽𝑛 (𝑡|𝑋 − 𝑌 |) − 1 − 3 4𝑛 𝑡 𝑛 −∞ 4
𝑡 Next we note that the main term (1− 4𝑛 ) e−𝑡 constant, which implies that
2 /2
in (15.4) is bounded by an absolute
√ √ 𝑡4 𝑠4 −(𝑡 2 +𝑠2 )/2 e 𝐽𝑛 𝑡 𝑛) 𝐽𝑛 𝑠 𝑛) = 1 − 1− + 𝑂 𝑛−2 min{1, 𝑡 2 + 𝑠2 } 4𝑛 4𝑛 𝑡 4 + 𝑠4 −(𝑡 2 +𝑠2 )/2 e = 1− + 𝑂 𝑛−2 min{1, 𝑡 2 + 𝑠2 } . 4𝑛 Hence 𝑡 4 (|𝑋 | 4 + |𝑌 | 4 ) − 𝑡 2 (|𝑋|2 +|𝑌 |2 ) 2𝑛 e |E𝐽𝑛 (𝑡|𝑋 |)| 2 = E 𝐽𝑛 (𝑡|𝑋 |) 𝐽𝑛 (𝑡|𝑌 |) = E 1 − 4𝑛3 n 𝑡 2 (|𝑋 | 2 + |𝑌 | 2 ) o + 𝑂 𝑛−2 E min 1, . 𝑛 Integrating over 𝑡 according to (15.2) and applying (15.3), we obtain the following relation. As before, 𝑌 denotes an independent copy of 𝑋. √ Proposition 15.1.2 If 𝑋 is a random vector in R𝑛 satisfying E |𝑋 | ≤ 𝑏 𝑛, then with some quantity 𝐶 which is bounded by an absolute constant, we have E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) = where ∫ 𝐼=
∞
−∞
𝐶𝑏 1 𝐼+ 2, 2𝜋 𝑛
h 𝑡 4 (|𝑋 | 4 + |𝑌 | 4 ) − 𝑡 2 (|𝑋|2 +|𝑌 |2 ) i d𝑡 𝑡 4 |𝑋 − 𝑌 | 4 − 𝑡 2 |𝑋−𝑌 |2 2𝑛 e 2𝑛 − E 1 − e . E 1− 3 4𝑛 4𝑛3 𝑡2
15 𝐿 2 Expansions and Estimates
302
15.2 Bounds for 𝑳 2 -distance With a Standard Rate In this section our aim is to show that the mean distances E 𝜃 𝜔2 (𝐹𝜃 , 𝐹)
and E 𝜃 𝜔2 (𝐹𝜃 , Φ)
are of order at most 𝑂 (1/𝑛) with involved constants depending only on 𝑚 22 = 𝑚 22 (𝑋) =
1 E ⟨𝑋, 𝑌 ⟩ 2 𝑛
𝜎42 = 𝜎42 (𝑋) =
and
1 Var(|𝑋 | 2 ) 𝑛
(where 𝑌 is an independent copy of 𝑋). In particular, we do not need to involve the functional 𝑀4 . Proposition 15.2.1 Let 𝑋 be a random vector in R𝑛 with E𝑋 = 𝑎 and E |𝑋 | 2 = 𝑛. For some absolute constant 𝑐 ≥ 0, E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤
𝑐𝐴 , 𝑛
(15.5)
where 𝐴 = |𝑎| 2 + 𝑚 22 + 𝜎42 . In particular, if 𝑋 is isotropic, then E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤
𝑐 (1 + 𝜎42 ) . 𝑛
Similar inequalities continue to hold with the standard normal distribution function Φ in place of 𝐹. If 𝑋 is isotropic, then 𝑚 2 = 1, while |𝑎| ≤ 1 (by Bessel’s inequality). Hence, both characteristics 𝑚 2 and 𝑎 may be removed from the parameter 𝐴 in this case (more precisely, one may put 𝐴 = 1 + 𝜎42 ). If 𝑋 is not necessarily isotropic, one may involve the functional 𝑀2 by using |𝑎| ≤ 𝑀2 and 𝑚 2 ≤ 𝑀22 (Propositions 1.4.2 and 1.2.2). Hence, in (15.5) one may take 𝐴 = 𝑀24 + 𝜎42 . In general, it may however happen that 𝑚 2 and 𝜎4 are bounded, while |𝑎| is large. The example in Remark 15.2.2 shows that this parameter cannot be removed. Proof To derive (15.5), introduce the random variable 𝜌2 = By Jensen’s inequality, E e−𝑡 E e−𝑡
2 |𝑋−𝑌 | 2 /2𝑛
|𝑋 − 𝑌 | 2 2𝑛
2 |𝑋 | 2 /2𝑛
− E e−𝑡
≥ e−𝑡
(𝜌 ≥ 0).
2 /2
2 |𝑋 | 2 /2𝑛
, so that
2
≤ E e−𝑡
2 |𝑋−𝑌 | 2 /2𝑛
2
− e−𝑡 .
15.2 Bounds for 𝐿 2 -distance With a Standard Rate
303
By Proposition 15.1.1, E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤
1 E 2𝜋
∫
∞
−∞
e−𝜌
2𝑡2
− e−𝑡
2
𝑡2
d𝑡 +
𝑐 𝑛
for some absolute constant 𝑐 > 0. As we know, the above integral may be evaluated according to the identity of Lemma 12.5.2, ∫
∞
−∞
2
2
√ √ √ e−𝛼𝑡 − e−𝛼0 𝑡 d𝑡 = 2 𝜋 𝛼 − 𝛼 , 0 𝑡2
𝛼, 𝛼0 ≥ 0.
(15.6)
Applying it with 𝛼0 = 1 and 𝛼 = 𝜌 2 , we arrive at the bound √
𝜋 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤ (1 − E𝜌) +
𝑐 𝑛
for some absolute constant 𝑐. Furthermore, let us recall the bound of Lemma 1.5.4, 1 − E𝜌 ≤
1 E (1 − 𝜌 2 ) + E (1 − 𝜌 2 ) 2 . 2
Then we get √
𝜋 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤
1 𝑐 E (1 − 𝜌 2 ) + E (1 − 𝜌 2 ) 2 + . 2 𝑛
Since |𝑋 − 𝑌 | 2 = |𝑋 | 2 + |𝑌 | 2 − 2 ⟨𝑋, 𝑌 ⟩, we may write 1 − 𝜌2 =
𝑛 − |𝑋 | 2 𝑛 − |𝑌 | 2 ⟨𝑋, 𝑌 ⟩ + + , 2𝑛 2𝑛 𝑛
which implies that 1 − E𝜌 2 =
1 1 E ⟨𝑋, 𝑌 ⟩ = |𝑎| 2 , 𝑛 𝑛
𝑎 = E𝑋.
In addition, (1 − 𝜌 2 ) 2 ≤ 2
𝑛 − |𝑋 | 2 2𝑛
+
⟨𝑋, 𝑌 ⟩ 2 𝑛 − |𝑌 | 2 2 +2 , 2𝑛 𝑛2
which gives E (1 − 𝜌 2 ) 2 ≤
E ⟨𝑋, 𝑌 ⟩ 2 𝜎42 + 2𝑚 22 Var(|𝑋 | 2 ) + 2 = . 𝑛 𝑛2 𝑛2
It remains to apply these bounds in (15.7), which leads to √
𝜋 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤
1 2
|𝑎| 2 + 𝜎42 + 2𝑚 22 + 𝑐 . 𝑛
(15.7)
15 𝐿 2 Expansions and Estimates
304
Due to the assumption E |𝑋 | 2 = 𝑛, we necessarily have 𝑚 2 ≥ 1 (cf. Proposition 1.1.3), and the inequality (15.5) follows. √ Finally, in order to replace 𝐹 with Φ, let us recall that 𝜔(𝐹, Φ) ≤ 𝑐 (1+𝜎2 )/ 𝑛 up to an absolute positive constant 𝑐 (Proposition 12.5.1). Squaring this inequality and using 𝜎2 ≤ 𝜎4 , we get 𝜔2 (𝐹, Φ) ≤ 2𝑐2 (1 + 𝜎42 )/𝑛. Proposition 15.2.1 is proved. □ Returning to the last step, let us explain how to estimate 𝜔2 (𝐹, Φ) with arguments developed in the previous and current sections. By the Plancherel formula, ∫ ∞ 2 1 d𝑡 2 | E𝐽𝑛 (𝑡|𝑋 |) − e−𝑡 /2 | 2 2 . (15.8) 𝜔 (𝐹, Φ) = 2𝜋 −∞ 𝑡 √ 2 Applying the approximation |𝐽𝑛 (𝑡 𝑛) − e−𝑡 /2 | ≤ 𝑛𝑐 min{1, |𝑡|} together with the identity (15.3), we conclude that ∫ ∞ ∫ ∞ 2 2 d𝑡 2 2 −𝑡 2 |𝑋 | 2 /2𝑛 d𝑡 𝐽𝑛 (𝑡|𝑋 |) − e−𝑡 |𝑋 | /2𝑛 2 E𝐽𝑛 (𝑡|𝑋 |) − E e 2 ≤E 𝑡 𝑡 −∞ −∞ ∫ ∞ n 𝑡 2 |𝑋 | 2 o d𝑡 8𝑐2 |𝑋 | 2𝑐2 8𝑐2 ≤ 2 E min 1, = 2 E√ ≤ 2 . 𝑛 𝑛 𝑡2 𝑛 𝑛 𝑛 −∞ Hence, up to an error of order at most 𝑂 (𝑛−2 ), one may replace the characteristic 2 2 function 𝑓 (𝑡) = E𝐽𝑛 (𝑡|𝑋 |) of 𝐹 with the characteristic function 𝑔(𝑡) = E e−𝑡 |𝑋 | /2𝑛 of the corresponding mixture of centered Gaussian mixtures on the real line. More precisely, subtracting 𝑔(𝑡) from 𝑓 (𝑡) and adding in (15.8), it follows that ∫
1 𝜋
𝜔2 (𝐹, Φ) ≤
∞
(E e−𝑡
2 |𝑋 | 2 /2𝑛
− e−𝑡
2 /2
𝑡2
−∞
)2
d𝑡 +
8𝑐2 . 𝜋𝑛2
1 |𝑋 | 2 and 𝛼0 = 12 . Again, by Let us remove the square and apply (15.6) with 𝛼 = 2𝑛 1 Lemma 1.5.4 with 𝜉 = √𝑛 |𝑋 |, we then see that the above integral does not exceed
∫
∞
−∞
E e−𝑡
2 |𝑋 | 2 /2𝑛
− e−𝑡
𝑡2
2 /2
d𝑡 =
√ √ 2𝜋 E 1 − |𝑋 |/ 𝑛
1 √ ≤ 2𝜋 Var |𝑋 | 2 = 𝑛 Thus,
√ 2𝜋 2 𝜎4 . 𝑛
√
1 2 𝜎2 𝜔 (𝐹, Φ) ≤ √ 4 + 𝑂 2 . 𝑛 𝜋𝑛 2
Remark 15.2.2 To see that the presence of the parameter 𝑎 in (15.5) is necessary under the condition E |𝑋 | 2 = 𝑛, let us return to the example of the Gaussian random vector described in Remark 14.7.2, 𝑋 = 𝛼𝑍 + 𝜆𝑒 𝑛
with 1 ≤ 𝜆 ≤ 𝑛1/4 , 𝛼2 (𝑛 − 1) + 𝜆2 = 𝑛,
15.3 Expansion With Error of Order 𝑛−1
305
where 𝑍 is a standard normal random vector in R𝑛−1 . As was noticed, 𝜎42 ≤ 2 and 𝑚 22 ≤ 𝑚 42 ≤ 9, while the mean 𝑎 = E𝑋 has length |𝑎| = 𝜆 ≥ 1. Hence, the inequality (15.5), being stated for the normal distribution function in place of 𝐹, simplifies to E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≤
𝑐𝜆2 . 𝑛
Let us show that this bound may be reversed up to an absolute factor. Note that, for any unit vector 𝜃, the linear form 𝑆 𝜃 = ⟨𝑋, 𝜃⟩ has a normal distribution on the line with mean 𝜆𝜃 𝑛 and variance 𝛼2 (1 − 𝜃 2𝑛 ). Consider the normal distribution function 1 2 Φ 𝜇, 𝜎 2 (𝑥) = Φ( 𝑥−𝜇 𝜎 ) with parameters 0 ≤ 𝜇 ≤ 1 and 2 ≤ 𝜎 ≤ 1 (𝜎 > 0). If 𝜇 | 𝑥−𝜇 | 𝑥−𝜇 , then 𝑥−𝜇 𝑥 ≤ 1+𝜎 𝜎 ≤ 𝑥, while 𝜎 ≥ |𝑥|. Hence, the interval [ 𝜎 , 𝑥] has length 𝑥−
2 𝑥 − 𝜇 𝜇 − (1 − 𝜎)𝑥 = ≥ 𝜇 ≥ 𝜇, 𝜎 𝜎 1+𝜎
and the standard normal density 𝜑(𝑦) attains a minimum on it at the left endpoint. It follows that ∫ 𝑥 𝑥 − 𝜇 𝜑(𝑦) d𝑦 ≥ 𝜇 𝜑 , |Φ 𝜇, 𝜎 2 (𝑥) − Φ(𝑥)| = 𝑥−𝜇 𝜎 𝜎 so that 2
𝜔 (Φ 𝜇, 𝜎 2 , Φ) ≥ 𝜇 =
2
∫
𝜇 1+𝜎
𝑥 − 𝜇 2 𝜑
−∞ ∫ − 𝜇 1+𝜎 𝜎𝜇2
2𝜋
d𝑥
𝜎 2
e−𝑦 d𝑦 ≥
−∞
𝜎𝜇2 2𝜋
∫
−1
2
e−𝑦 d𝑦 ≥ 𝑐𝜇2 .
−∞
In our case, 𝜇 = E𝑆 𝜃 = 𝜆𝜃 𝑛 and 𝜎 2 = Var(𝑆 𝜃 ) = 𝛼2 (1 − 𝜃 2𝑛 ). Since 𝜆 ≤ 𝑛1/4 and 2 1 2 𝛼2 = 𝑛−𝜆 𝑛−1 , we have that, for all 𝑛 large enough, 0 ≤ 𝜇 ≤ 1 and 𝜎 ≥ 2 on the set log 𝑛 𝑛−1 Ω𝑛 = {𝜃 ∈ S : 0 ≤ 𝜃 𝑛 < √𝑛 }. It then follows that E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≥ 𝑐𝜆2 E 𝜃 𝜃 2𝑛 1 { 𝜃 ∈Ω𝑛 } ≥
𝑐 ′𝜆 2 . 𝑛
15.3 Expansion With Error of Order 𝒏−1 Let us now sharpen Proposition 15.2.1 by using a more precise asymptotic expansion for the functions 𝐽𝑛 . With this aim, we strengthen the condition E |𝑋 | 2 = 𝑛 to the pointwise assumption |𝑋 | 2 = 𝑛 a.s. In this case, 𝜌2 =
|𝑋 − 𝑌 | 2 = 1 − 𝜉, 2𝑛
𝜉=
⟨𝑋, 𝑌 ⟩ , 𝑛
15 𝐿 2 Expansions and Estimates
306
where 𝑌 is an independent copy of 𝑋. Thus, 𝜉 may take values in [−1, 1], and in addition, E𝜉 2 = 𝑛1 𝑚 22 . Proposition 15.3.1 Let 𝑋 be a random vector in R𝑛 such that |𝑋 | 2 = 𝑛 a.s. Then √
1 1 𝜋 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) = 1 + E 1 − (1 − 𝜉) 1/2 − + 𝑂 (𝑛−2 ). 4𝑛 8𝑛
A similar expression holds true for Φ in place of the typical distribution 𝐹. Proof Let us return to the identity of Proposition 15.1.2, E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) =
1 𝐼 + 𝑂 (𝑛−2 ), 2𝜋
in which the integral 𝐼 may be written in terms of 𝜉 as ∫ ∞ h (1 − 𝜉) 2 𝑡 4 −(1− 𝜉 ) 𝑡 2 𝑡 4 −𝑡 2 i d𝑡 e e . 𝐼=E 1− − 1− 𝑛 2𝑛 𝑡2 −∞ To evaluate this integral, consider the function ∫ ∞ h 𝛼2 𝑡 4 −𝛼𝑡 2 𝑡 4 −𝑡 2 i d𝑡 e e , 𝜓(𝛼) = 1− − 1− 𝑛 2𝑛 𝑡2 −∞
𝛼 ≥ 0.
We have ∞
2𝛼𝑡 2 − 𝛼2 𝑡 4 −𝛼𝑡 2 e d𝑡 𝑛 −∞ ∫ ∞ 𝑠2 − 14 𝑠4 −𝑠2 /2 √ 1 −1/2 1 1+ e d𝑠 = − 𝜋 1 + 𝛼 , = −√ 𝑛 4𝑛 2𝛼 −∞
𝜓 ′ (𝛼) = −
∫
1+
and after integration √ 1 1/2 𝛼 −1 . 𝜓(𝛼) = 𝜓(1) − 2 𝜋 1 + 4𝑛 Also, ∫
∞
𝜓(1) = − −∞
so
√ 𝜋 𝑡 2 −𝑡 2 e d𝑡 = − , 2𝑛 4𝑛
√ h 1 1i 𝜓(𝛼) = 2 𝜋 1 + (1 − 𝛼1/2 ) − . 4𝑛 8𝑛 It remains to insert the value 𝛼 = 1 − 𝜉, since 𝐼 = 𝜓(1 − 𝜉). The last assertion follows from the asymptotic expansion for 𝜔2 (𝐹, Φ) given in Proposition 12.5.3. Proposition 15.3.1 is thus proved. □
15.4 Two-sided Bounds
307
√ |2 Note that in the case |𝑋 | = 𝑛 a.s., the random variable 𝜌 2 = |𝑋−𝑌 satisfies 2𝑛 𝜌 2 ≤ 2 a.s., while E𝜌 2 = 1 − 𝑛1 |E𝑋 | 2 ≤ 1. Hence, we may apply the lower bound of Lemma 1.5.4, which gives 1 − E𝜌 ≥
1 1 1 E (1 − 𝜌 2 ) + E (1 − 𝜌 2 ) 2 ≥ E ⟨𝑋, 𝑌 ⟩ 2 . 2 16 16 𝑛2
Applying the asymptotic expansion of Proposition 15.3.1 (with Φ in place of 𝐹), this leads to the following lower bound on the 𝐿 2 -distance on average. Corollary 15.3.2 Let 𝑋 be a random vector in R𝑛 such that |𝑋 | 2 = 𝑛 a.s. Then √
𝜋 E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≥
1 1 E ⟨𝑋, 𝑌 ⟩ 2 − + 𝑂 (𝑛−2 ), 2 8𝑛 16 𝑛
where 𝑌 is an independent copy of 𝑋. This lower bound shows the importance of isotropy for obtaining bounds on E 𝜃 𝜔2 (𝐹𝜃 , Φ) that are better than 𝑛1 . Note that the boundedness of the moment 𝑀2 does not guarantee better rates. In the isotropic case, we have E ⟨𝑋, 𝑌 ⟩ 2 = 𝑛, but if, for example E ⟨𝑋, 𝑌 ⟩ 2 ≥ 4𝑛, Corollary 15.3.2 yields √
𝜋 E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≥
𝑐 1 − 8𝑛 𝑛2
for some absolute constant 𝑐. Example 15.3.3 Let 𝜉1 , . . . , 𝜉 𝑛/4 (𝑛 = 4, 8, 12, . . . ) be independent Bernoulli random variables taking the values ±1 with probability 12 , and let 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) be the random vector with components 𝑋 𝑘 = 2𝜉 𝑘 for 𝑘 ≤ 𝑛4 and 𝑋 𝑘 = 0 for 𝑘 > 𝑛4 . In this case, |𝑋 | 2 = 𝑛 and 𝑀2 (𝑋) = 2. Also, 2
E ⟨𝑋, 𝑌 ⟩ =
𝑛 ∑︁
(E𝑋 𝑘2 ) 2 = 4𝑛.
𝑘=1
15.4 Two-sided Bounds We will now simplify the statement of Proposition 15.3.1 by involving the 4-th moment condition and the (non-negative) moment functionals 1/ 𝑝 1 𝛼 𝑝 = √ E ⟨𝑋, 𝑌 ⟩ 𝑝 , 𝑛 where 𝑌 is an independent copy of 𝑋. We considered these functionals in Section 1.4 for positive integers 𝑝, and now need them for the particular values 𝑝 = 3 and 𝑝 = 4. Note that 𝛼4 = 𝑚 4 and 𝛼3 ≤ 𝑚 3 .
15 𝐿 2 Expansions and Estimates
308
In the case |𝑋 | 2 = 𝑛 a.s., 𝑚 4 is finite, but in general it may be of order 1, as well as being large in comparison with the dimension 𝑛. On the other hand, if additionally 𝑋 is isotropic, then using ⟨𝑋, 𝑌 ⟩ 4 ≤ 𝑛2 ⟨𝑋, 𝑌 ⟩ 2 , we have E ⟨𝑋, 𝑌 ⟩ 4 ≤ 𝑛3 , i.e., 𝑚 4 ≤ 𝑛1/4 . Proposition 15.4.1 Let 𝑋 be an isotropic random vector in R𝑛 such that |𝑋 | 2 = 𝑛 a.s. and E𝑋 = 0. Then √
𝜋 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) =
1 𝑐 E ⟨𝑋, 𝑌 ⟩ 3 + 4 E ⟨𝑋, 𝑌 ⟩ 4 + 𝑂 (𝑛−2 ) 3 16 𝑛 𝑛
1 ≤ 𝑐 ≤ 3. A similar expansion also holds for 𝐹 replaced with the normal with 100 distribution function Φ. In particular,
E 𝜃 𝜔2 (𝐹𝜃 , Φ) = with 𝑐 1 =
1√ 16 𝜋
and
1 200
𝑐2 𝑐1 𝛼3 + 2 𝛼4 + 𝑂 (𝑛−2 ) 3/2 𝑛 𝑛
< 𝑐 2 < 2.
Since 𝛼4 ≥ 1, we immediately obtain: Corollary 15.4.2 Under the same assumptions, E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤
1 𝑐 √ 3 E ⟨𝑋, 𝑌 ⟩ 3 + 4 E ⟨𝑋, 𝑌 ⟩ 4 , 𝑛 16 𝜋 𝑛
(15.9)
where 𝑐 is an absolute constant. A similar bound also holds for 𝐹 replaced with the normal distribution function Φ. In view of the pointwise bound ⟨𝑋, 𝑌 ⟩ 3 ≤ 𝑛 ⟨𝑋, 𝑌 ⟩ 2 , we have E ⟨𝑋, 𝑌 ⟩ 3 ≤ 𝑛2 . Therefore, the inequality (15.9) implies E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤
𝑐 . 𝑛
But this has been already obtained in the bound (15.5) of Proposition 15.2.1 with 𝜎4 = 0 and 𝑀2 = 1 (and where the assumption about the mean was not used). If the distribution of 𝑋 is symmetric about the origin, then the cubic term 𝛼3 is vanishing, and the upper bound (15.9) is further simplified. In fact, since 𝛼3 ≥ 0 in general, in this case a similar lower bound holds as well (regardless of the symmetry property). Corollary 15.4.3 Let 𝑋 be an isotropic random vector in R𝑛 with mean zero and such that |𝑋 | 2 = 𝑛 a.s. Then for some positive absolute constants 𝑐 𝑗 > 0, E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≥
𝑐1 𝑐2 E ⟨𝑋, 𝑌 ⟩ 4 − 2 . 4 𝑛 𝑛
15.4 Two-sided Bounds
309
Moreover, if the distribution of 𝑋 is symmetric about the origin, then E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤
𝑐3 E ⟨𝑋, 𝑌 ⟩ 4 . 𝑛4
Similar bounds also hold for 𝐹 replaced with Φ. The last lower bound makes sense when E ⟨𝑋, 𝑌 ⟩ 4 has an order larger than 𝑛2 . In this case we obtain two-sided bounds, so that E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ∼
1 E ⟨𝑋, 𝑌 ⟩ 4 . 𝑛4
Let us also recall that, by Proposition 1.4.2 with 𝑝 = 4, (E ⟨𝑋, 𝑌 ⟩ 4 ) 1/4 ≤ Hence the upper bound of Corollary 14.4.3 implies E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤
𝑐𝑀48 𝑛2
√
𝑛 𝑀4 (𝑋) 2 .
,
where the right hand-side is thus of order 𝑛−2 when 𝑀4 is bounded. For the proof of Proposition 15.4.1, first let us further develop the expansion for √ the function 𝑤(𝜀) = 1 − 1 − 𝜀, which was considered in the proof of Lemma 1.5.4, up to the term 𝜀 4 . Lemma 15.4.4 For all |𝜀| ≤ 1, √ 1 1 1 3 𝜀 + 3𝜀 4 . 1 − 1 − 𝜀 ≤ 𝜀 + 𝜀2 + 2 8 16 In addition,
√ 1 1 1 3 𝜀 + 0.01 𝜀 4 . 1 − 1 − 𝜀 ≥ 𝜀 + 𝜀2 + 2 8 16 Proof By Taylor’s formula for the function 𝑤(𝜀) around zero on the half-axis 𝜀 < 1, we have 1 1 3 5 4 𝑤 (5) (𝜀1 ) 5 1 𝜀 + 𝜀 + 𝜀 𝑤(𝜀) = 𝜀 + 𝜀 2 + 2 8 16 128 120 for some point 𝜀1 between zero and 𝜀. Since 𝑤 (5) (𝜀) = 𝑤(𝜀) ≤ Also, 𝑤 (5) (𝜀) ≤
105 32
105 32
1 1 3 5 4 1 𝜀 + 𝜀2 + 𝜀 + 𝜀 , 2 8 16 128
(1 − 𝜀) −9/2 ≥ 0, we have 𝜀 ≤ 0.
39/2 < 461 for 0 ≤ 𝜀 ≤ 23 , so, in this interval 5 4 𝑤 (5) (𝜀1 ) 5 𝜀 + 𝜀 ≤ 3𝜀 4 . 128 120
Thus, in both cases, 𝑤(𝜀) ≤
1 1 3 1 𝜀 + 𝜀2 + 𝜀 + 3𝜀 4 . 2 8 16
15 𝐿 2 Expansions and Estimates
310
This inequality also holds for the remaining values 32 ≤ 𝜀 ≤ 1, since the right-hand side is greater than or equal to 1, while 𝑤(𝜀) ≤ 1. The upper bound of the lemma is thus proved. Now, from Taylor’s formula we also get that √ 1 1 1 3 5 4 𝜀 + 𝜀 , 1 − 1 − 𝜀 ≥ 𝜀 + 𝜀2 + 2 8 16 128 In addition, if −1 ≤ 𝜀 ≤ 0, then 𝑤 (5) (𝜀) ≤
105 32 ,
𝜀 ≥ 0.
so
√ 1 1 1 3 5 4 𝑤 (5) (𝜀1 ) 𝜀 + 𝜀 1+ 𝜀 1 − 1 − 𝜀 = 𝜀 + 𝜀2 + 2 8 16 128 120 105 5 1 1 1 3 1 1 1 3 ≥ 𝜀 + 𝜀2 + 𝜀 + 𝜀4 − 32 ≥ 𝜀 + 𝜀 2 + 𝜀 + 0.01 𝜀 4 . 2 8 16 128 120 2 8 16 Lemma 15.4.4 is proved.
□
Proof (of Proposition 15.4.1) Using the lemma with 𝜀 = 𝜉 = Proposition 15.3.1, we get an asymptotic representation √
⟨𝑋,𝑌 ⟩ 𝑛
and applying
1 1 1 2 1 𝜋 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) = 1 + E𝜉 + E𝜉 3 + 𝑐 E𝜉 4 − + 𝑂 (𝑛−2 ) 4𝑛 8 16 8𝑛
with 0.01 ≤ 𝑐 ≤ 3. If additionally 𝑋 is isotropic, then E ⟨𝑋, 𝑌 ⟩ 2 = 𝑛, i.e. E𝜉 2 = 𝑛1 , and the representation is simplified to √
11 𝜋 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) = 1 + E𝜉 3 + 𝑐 E𝜉 4 + 𝑂 (𝑛−2 ), 4𝑛 16
thus removing the term of order 1/𝑛. As was already noted, E𝜉 3 ≤ Hence, the above is further simplified to √
𝜋 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) =
1 𝑛
and E𝜉 4 ≤ 𝑛1 .
1 E𝜉 3 + 𝑐 E𝜉 4 + 𝑂 (𝑛−2 ), 16
which is exactly the expansion of Proposition 15.4.1.
15.5 Asymptotic Formulas in the General Case Let us now return to the Plancherel formula in the general case, ∫ ∞ E𝐽𝑛 (𝑡|𝑋 − 𝑌 |) − (E𝐽𝑛 (𝑡|𝑋 |)) 2 1 d𝑡. E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) = 2𝜋 −∞ 𝑡2 √ If E |𝑋 | ≤ 𝑏 𝑛, then, according to Proposition 15.1.2, E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) =
𝐶𝑏 1 𝐼+ 2, 2𝜋 𝑛
□
15.5 Asymptotic Formulas in the General Case
311
where 𝐶 is bounded by an absolute constant and ∫ ∞ 𝑡 4 |𝑋 − 𝑌 | 4 − 𝑡 2 |𝑋−𝑌 |2 𝑡 4 (|𝑋 | 4 + |𝑌 | 4 ) − 𝑡 2 (|𝑋|2 +|𝑌 |2 ) d𝑡 2𝑛 2𝑛 𝐼=E 1− e − 1− e . 4𝑛3 4𝑛3 𝑡2 −∞ To evaluate such integrals, consider the functions of the form ∫ ∞ d𝑡 2 2 1 (1 − 𝑟𝑡 4 ) e−𝛼𝑡 /2 − e−𝑡 /2 2 (𝛼 > 0, 𝑟 ∈ R). 𝜓𝑟 (𝛼) = √ 𝑡 2𝜋 −∞ Clearly, 1 𝜓𝑟 (1) = − √ 2𝜋 and 𝜓𝑟′ (𝛼) = −
1 1 √ 2 2𝜋
∫
∫
∞
𝑟𝑡 2 e−𝑡
2 /2
d𝑡 = −𝑟
−∞
∞
(1 − 𝑟𝑡 4 ) e−𝛼𝑡
−∞
2 /2
3𝑟 1 d𝑡 = − √ 1 − 2 . 𝛼 2 𝛼
Hence ∫ 𝜓𝑟 (𝛼) − 𝜓𝑟 (1) =
𝛼
−
1
1 −1/2 3𝑟 −5/2 𝑧 + 𝑧 d𝑧 = (1 + 𝑟) − (𝛼1/2 + 𝑟𝛼−3/2 ), 2 2
and we get 𝜓𝑟 (𝛼) = 1 − (𝛼1/2 + 𝑟𝛼−3/2 ). Here, when 𝛼 and 𝑟 approach zero such that 𝑟 = 𝑂 (𝛼2 ), in the limit it follows that 𝜓0 (0) = 1. Note also that for 𝑟 = 0 the above formula returns us to the identity of Lemma 12.5.2 in the particular case 𝛼0 = 1/2 and with 𝛼/2 in place of 𝛼. From this, 1 1/2 −3/2 1/2 −3/2 √ 𝐼 = E (𝜓𝑟1 (𝛼1 ) − 𝜓𝑟2 (𝛼2 )) = E (𝛼2 + 𝑟 2 𝛼2 ) − E (𝛼1 + 𝑟 1 𝛼1 ), 2𝜋 which we need with 𝛼1 =
|𝑋 − 𝑌 | 2 |𝑋 − 𝑌 | 4 , 𝑟1 = , 𝑛 4𝑛3
𝛼2 =
|𝑋 | 2 + |𝑌 | 2 |𝑋 | 4 + |𝑌 | 4 , 𝑟2 = . 𝑛 4𝑛3
It follows that 𝛼21/2 + 𝑟 2 𝛼2−3/2 = 𝛼11/2 + 𝑟 1 𝛼1−3/2
|𝑋 | 2 + |𝑌 | 2 1/2
1 |𝑋 | 4 + |𝑌 | 4 , 4𝑛 (|𝑋 | 2 + |𝑌 | 2 ) 2 1
1+
𝑛 |𝑋 − 𝑌 | 2 1/2 = 1+ 𝑛 4𝑛
with the assumption that both expressions are equal to zero in the case 𝑋 = 𝑌 = 0. Therefore, we have the following general, although preliminary, observation on the basis of Proposition 15.1.2.
15 𝐿 2 Expansions and Estimates
312
√ Lemma 15.5.1 Let 𝑋 be a random vector in R𝑛 satisfying E |𝑋 | ≤ 𝑏 𝑛. Then with some quantity 𝐶, which is bounded by an absolute constant, we have 𝐶𝑏 1 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) = √ E𝑅 + 2 , 𝑛 2𝜋 where 𝑅=
1 |𝑋 | 4 + |𝑌 | 4 |𝑋 − 𝑌 | 1 (|𝑋 | 2 + |𝑌 | 2 ) 1/2 1+ 1 + − √ √ 4𝑛 (|𝑋 | 2 + |𝑌 | 2 ) 2 4𝑛 𝑛 𝑛
with the assumption that 𝑅 = 0 in the case 𝑋 = 𝑌 = 0. When |𝑋 | 2 = |𝑌 | 2 = 𝑛, this lemma returns us to Proposition 15.3.1. As for the general case, note that 𝑅 ≤ 2 |𝑋√|+𝑛|𝑌 | , so E𝑅 ≤ 2𝑏. Let us give a simpler expression by noting that (|𝑋 | 2 − |𝑌 | 2 ) 2 |𝑋 | 4 + |𝑌 | 4 1 = − , (|𝑋 | 2 + |𝑌 | 2 ) 2 2 2 (|𝑋 | 2 + |𝑌 | 2 ) 2 so that 𝑅=
(|𝑋 | 2 − |𝑌 | 2 ) 2 1 |𝑋 − 𝑌 | 1 1 (|𝑋 | 2 + |𝑌 | 2 ) 1/2 1 + − 1 + . + √ √ 8𝑛 4𝑛 8 𝑛3/2 (|𝑋 | 2 + |𝑌 | 2 ) 3/2 𝑛 𝑛
The first term here is actually of order 𝜎42 (𝑋)/𝑛2 , which follows from the next general observation. Lemma 15.5.2 Let 𝜉 be a non-negative random variable with finite second moment (not identically zero), and let 𝜂 be its independent copy. Then E
(𝜉 − 𝜂) 2 Var(𝜉) 1 { 𝜉 +𝜂>0} ≤ 12 . 3/2 (𝜉 + 𝜂) (E 𝜉) 3/2
Proof By homogeneity, we may assume that E𝜉 = 1. In particular, E |𝜉 − 𝜂| ≤ 2. We have E
(𝜉 − 𝜂) 2 1 { 𝜉 +𝜂>1/2} ≤ 23/2 E (𝜉 − 𝜂) 2 1 { 𝜉 +𝜂>1/2} (𝜉 + 𝜂) 3/2 √ ≤ 23/2 E (𝜉 − 𝜂) 2 = 4 2 Var(𝜉).
Also note that, by Chebyshev’s inequality, P {𝜉 ≤ 1/2} = P {1 − 𝜉 ≥ 1/2} ≤ 4 Var(𝜉) 2 , so P {𝜉 + 𝜂 ≤ 1/2} ≤ P {𝜉 ≤ 1/2} P {𝜂 ≤ 1/2} ≤ 16 Var(𝜉) 2 .
15.5 Asymptotic Formulas in the General Case
Hence, since E
| 𝜉 −𝜂 | 𝜉 +𝜂
313
≤ 1 for 𝜉 + 𝜂 > 0, we have, by Cauchy’s inequality,
√︁ (𝜉 − 𝜂) 2 1 |𝜉 − 𝜂| 1 {0< 𝜉 +𝜂 ≤1/2} ≤ E {0< 𝜉 +𝜂 ≤1/2} (𝜉 + 𝜂) 3/2 √︁ √︁ √ ≤ E |𝜉 − 𝜂| P {𝜉 + 𝜂 ≤ 1/2} ≤ 4 2 Var(𝜉).
It remains to combine both inequalities, which yield E
√ (𝜉 − 𝜂) 2 1 { 𝜉 +𝜂>0} ≤ 8 2 Var(𝜉) ≤ 12 Var(𝜉). 3/2 (𝜉 + 𝜂)
The lemma is proved.
□
Applying it with 𝜉 = |𝑋 | 2 , 𝜂 = |𝑌 | 2 , and assuming that E |𝑋 | 2 = 𝑛, we get that E
𝜎42 (𝑋) Var(|𝑋 | 2 ) (|𝑋 | 2 − |𝑌 | 2 ) 2 Var(|𝑋 | 2 ) = 12 ≤ 12 . ≤ 12 (|𝑋 | 2 + |𝑌 | 2 ) 3/2 (E |𝑋 | 2 ) 3/2 𝑛3/2 𝑛1/2
Hence, from Lemma 15.5.1 we obtain: Proposition 15.5.3 Let 𝑋 be a random vector in R𝑛 satisfying E |𝑋 | 2 = 𝑛. Then, for some quantity 𝑐 bounded by an absolute constant, we have 1 + 𝜎42 1 , E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) = √ E𝑅 + 𝑐 𝑛2 2𝜋 where 𝜎42 =
1 𝑛
Var(|𝑋 | 2 ) and 𝑅=
1 |𝑋 − 𝑌 | 1 (|𝑋 | 2 + |𝑌 | 2 ) 1/2 1+ − √ 1+ . √ 8𝑛 4𝑛 𝑛 𝑛
In this rather general setup (we only assume that E |𝑋 | 2 = 𝑛), this formula √ may be effectively used in many specific situations. For example, when |𝑋 | = 𝑛 a.s., then 𝜎4 = 0, and we return to the setting of the previous two sections. Moreover, all obtained results can be derived from Proposition 15.5.3 in this particular case. Let us now see what simplification can be done in the symmetric case. Define the random variable 2 ⟨𝑋, 𝑌 ⟩ 𝜉= , |𝑋 | 2 + |𝑌 | 2 √︁ assigning to it the zero value, when 𝑋 = 𝑌 = 0. Since |𝑋−𝑌 | = (|𝑋 | 2 +|𝑌 | 2 ) 1/2 1 − 𝜉, we have (|𝑋 | 2 + |𝑌 | 2 ) 1/2 h 1+ √ 𝑛 (|𝑋 | 2 + |𝑌 | 2 ) 1/2 h = 1+ √ 𝑛
𝑅=
1 i 1 √︁ − 1−𝜉 1+ 8𝑛 4𝑛 √︁ 1 1i (1 − 1 − 𝜉 ) − . 4𝑛 8𝑛
15 𝐿 2 Expansions and Estimates
314
One can now apply Lemma 15.4.4, which yields 𝑅=
(|𝑋 | 2 + |𝑌 | 2 ) 1/2 h 1 1 3 1 1 1i 𝜉 + 𝜉2 + 𝜉 + 𝑐𝜉 4 − 1+ √ 4𝑛 2 8 16 8𝑛 𝑛
with 0.01 ≤ 𝑐 ≤ 3. If 𝑋 has a symmetric distribution (about the origin), then the expectation of the terms containing 𝜉 and 𝜉 3 will vanish. As a result, we arrive at the following assertion. Corollary 15.5.4 Let 𝑋 be a random vector in R𝑛 with a symmetric distribution satisfying E |𝑋 | 2 = 𝑛. For some quantities 𝑐 1 , 𝑐 2 bounded by an absolute constant and such that 𝑐 1 is positive and bounded away from zero, we have 1 + 𝜎42 1 (|𝑋 | 2 + |𝑌 | 2 ) 1/2 4 1 1+ E𝑅 + 𝑐 1 E 𝜉 + 𝑐2 , E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) = √ √ 4𝑛 𝑛2 𝑛 2𝜋 where 𝑅=
(|𝑋 | 2 + |𝑌 | 2 ) 1/2 2 1 𝜉 − , √ 𝑛 8 𝑛
𝜉=
2 ⟨𝑋, 𝑌 ⟩ . |𝑋 | 2 + |𝑌 | 2
15.6 General Lower Bounds To derive effective general bounds on E 𝜃 𝜔2 (𝐹𝜃 , 𝐹), let us return to the representation √ of Lemma 15.5.1. According to the remark after the lemma, if E |𝑋 | ≤ 𝑏 𝑛 then we have 1 𝑐𝑏 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) = √ E (𝑅0 + 𝑅1 ) + 2 , 𝑛 2𝜋 where 1 (|𝑋 | 2 − |𝑌 | 2 ) 2 , 8 𝑛3/2 (|𝑋 | 2 + |𝑌 | 2 ) 3/2 1 |𝑋 − 𝑌 | 1 (|𝑋 | 2 + |𝑌 | 2 ) 1/2 1+ − √ 1+ 𝑅1 = √ 8𝑛 4𝑛 𝑛 𝑛 𝑅0 =
with the assumption that 𝑅0 = 0 when 𝑋 = 𝑌 = 0. Here 𝑐 is some quantity bounded by an absolute constant. (Although 𝑅1 has already appeared as the expression 𝑅 in Proposition 15.5.3, we prefer to keep the small summand 𝑅0 .) Recall that in terms ⟩ of 𝜉 = |𝑋2 ⟨𝑋,𝑌 , the expression 𝑅1 admits the lower bound | 2 +|𝑌 | 2 𝑅1 ≥
1 1 3 1i 1 1 (|𝑋 | 2 + |𝑌 | 2 ) 1/2 h 𝜉 + 𝜉2 + 𝜉 + 0.01 𝜉 4 − . 1+ √ 4𝑛 2 8 16 8𝑛 𝑛
The expectation of the terms on the right-hand side containing 𝜉 and 𝜉 3 is nonnegative. This follows from the first inequality of Proposition 1.4.4,
15.6 General Lower Bounds
315
E ⟨𝑋, 𝑌 ⟩ 𝑝 (|𝑋 | 2 + |𝑌 | 2 ) −𝛼 ≥ 0, which holds true for any integer 𝑝 ≥ 1 and any real 0 ≤ 𝛼 ≤ 𝑝. Hence, removing 1 the unnecessary factor 1 + 4𝑛 , up to some absolute constants 𝑐 1 , 𝑐 2 > 0, we get 1 (|𝑋 | 2 + |𝑌 | 2 ) 1/2 2 1 1 𝜉 − E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≥ √ E𝑅0 + √ E √ 𝑛 8 𝑛 2𝜋 2𝜋 2 2 1/2 (|𝑋 | + |𝑌 | ) 𝑏 + 𝑐1 E 𝜉 4 − 𝑐2 2 . √ 𝑛 𝑛
(15.10)
Now, by the second inequality of Proposition 1.4.4, applied with 𝛼 = 3/2, E (|𝑋 | 2 + |𝑌 | 2 ) 1/2 𝜉 2 = 4 E
⟨𝑋, 𝑌 ⟩ 2 |𝑋 | 2 |𝑌 | 2 4 ≥ E . 2 2 3/2 𝑛 (|𝑋 | 2 + |𝑌 | 2 ) 3/2 (|𝑋 | + |𝑌 | )
This gives E
1 4 |𝑋 | 2 |𝑌 | 2 (|𝑋 | 2 + |𝑌 | 2 ) 1/2 2 1 2 2 1/2 𝜉 − ≥ 3/2 E − (|𝑋 | + |𝑌 | ) √ 𝑛 8𝑛 (|𝑋 | 2 + |𝑌 | 2 ) 3/2 8 𝑛 (|𝑋 | 2 − |𝑌 | 2 ) 2 1 = − 3/2 E = −E𝑅0 . 8𝑛 (|𝑋 | 2 + |𝑌 | 2 ) 3/2
Thus, the summand E𝑅0 in (15.6) neutralizes the term containing 𝜉 2 , and we are left with the term containing 𝜉 4 . That is, we arrive at: √ Proposition 15.6.1 Let 𝑋 be a random vector in R𝑛 satisfying E |𝑋 | ≤ 𝑏 𝑛, and let 𝑌 be its independent copy. For some absolute constants 𝑐 1 , 𝑐 2 > 0, we have E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≥ 𝑐 1 E 𝜌 𝜉 4 − 𝑐 2 where 𝜌=
|𝑋 | 2 + |𝑌 | 2 2𝑛
1/2 ,
𝜉=
𝑏 , 𝑛2
2 ⟨𝑋, 𝑌 ⟩ . |𝑋 | 2 + |𝑌 | 2
If |𝑋 | 2 = 𝑛 a.s., this is exactly the lower bound of Corollary 15.4.3. In order to simplify the bound of Proposition 15.6.1 in the basic case where E |𝑋 | 2 = 𝑛, note that E𝜌 2 = 1 and Var(𝜌 2 ) =
𝜎42 2𝑛 .
Using
𝜉 =1−
|𝑋 − 𝑌 | 2 , |𝑋 | 2 + |𝑌 | 2
we have 𝜉 4 ≥ (1 − 𝛼) 4 1 { |𝑋−𝑌 |2 ≤ 𝛼 ( |𝑋 |2 + |𝑌 |2 ) } ≥ (1 − 𝛼) 4 1 { |𝑋−𝑌 |2 ≤ 𝛼𝜆𝑛,
|𝑋 | 2 + |𝑌 | 2 ≥ 𝜆𝑛} ,
0 < 𝛼, 𝜆 < 1.
15 𝐿 2 Expansions and Estimates
316
The inequality |𝑋 | 2 + |𝑌 | 2 ≥ 𝜆𝑛 is equivalent to 𝜌 2 ≥ 𝜆2 , so, o (1 − 𝛼) 4 √ n 𝜆 P |𝑋 − 𝑌 | 2 ≤ 𝛼𝜆𝑛, |𝑋 | 2 + |𝑌 | 2 ≥ 𝜆𝑛 √ 2 (1 − 𝛼) 4 √ 𝜆 P{|𝑋 − 𝑌 | 2 ≤ 𝛼𝜆𝑛} − P{|𝑋 | 2 + |𝑌 | 2 ≤ 𝜆𝑛} . ≥ √ 2
E 𝜌 𝜉4 ≥
But, by Proposition 1.6.1 with 𝑝 = 2, 2 P |𝑋 | 2 + |𝑌 | 2 ≤ 𝜆𝑛 ≤ P |𝑋 | 2 ≤ 𝜆𝑛 ≤
𝜎44 1 . (1 − 𝜆) 4 𝑛2
Hence E 𝜌 𝜉4 ≥
𝜎44 (1 − 𝛼) 4 √ 1 𝜆 P{|𝑋 − 𝑌 | 2 ≤ 𝛼𝜆𝑛} − . √ (1 − 𝜆) 4 𝑛2 2
Choosing, for example, 𝛼 = 𝜆 = 12 , we get E 𝜌 𝜉4 ≥
1 o 𝜎4 1 n P |𝑋 − 𝑌 | 2 ≤ 𝑛 − 42 . 32 4 2𝑛
Hence, from Proposition 15.6.1 we obtain: Proposition 15.6.2 Let 𝑋 be a random vector in R𝑛 satisfying E |𝑋 | 2 = 𝑛, and let 𝑌 be its independent copy. For some absolute constants 𝑐 1 , 𝑐 2 > 0, we have n 1 + 𝜎44 1√ o 𝑛 − 𝑐2 , E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≥ 𝑐 1 P |𝑋 − 𝑌 | ≤ 2 𝑛2 where 𝜎42 =
1 𝑛
Var(|𝑋 | 2 ).
Chapter 16
Refinements for the Kolmogorov Distance
Let 𝑋 be a random vector in R𝑛 with finite 4-th moment and such that E |𝑋 | 2 = 𝑛. According to Proposition 14.7.1, the distributions 𝐹𝜃 of the linear forms ⟨𝑋, 𝜃⟩ satisfy the following bound on the Kolmogorov distance to the standard normal law Φ on average: 1 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ 𝐶 (𝑚 43/2 + 𝜎43/2 + |𝑎|) √ . 𝑛 Here 𝑎 = E𝑋, and as before 𝑚 44 =
1 E ⟨𝑋, 𝑌 ⟩ 4 , 𝑛2
𝜎42 =
1 Var(|𝑋 | 2 ), 𝑛
with 𝑌 being an independent copy of 𝑋. While these quantities may be of the order 1 in many interesting situations, the dependence with respect to 𝑛 in this bound does not seem to be quite satisfactory (although it cannot be improved in some other interesting examples). The aim of this chapter is to describe natural conditions which would ensure that the distance 𝜌(𝐹𝜃 , Φ) does not exceed 1/𝑛 up to a logarithmically growing factor. In particular, we would like to obtain a corresponding analog of Proposition 15.4.1 about the 𝐿 2 -distance 𝜔(𝐹𝜃 , Φ). At the end of the chapter, we also discuss relations between these two distances which will allow us to develop lower bounds with a standard rate of normal approximation.
16.1 Preliminaries √ Assume that |𝑋 | = 𝑛 a.s., so that 𝜎42 = 0 and thus the characteristic function of the typical distribution is given by √ 𝑓 (𝑡) = E 𝐽𝑛 (𝑡|𝑋 |) = 𝐽𝑛 (𝑡 𝑛).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_16
317
318
16 Refinements for the Kolmogorov Distance
In addition, in this case √ Δ𝑛 (𝑡) ≡ E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 = E𝐽𝑛 (𝑡|𝑋 − 𝑌 |) − (𝐽𝑛 (𝑡 𝑛)) 2 . Let us return to the Berry–Esseen-type bound (14.14) of Lemma 14.5.2: For all 𝑇 > 𝑇0 ≥ 1, with some absolute constant 𝑐 > 0, we have ∫ 𝑇0 ∫ 1 Δ𝑛 (𝑡) Δ𝑛 (𝑡) 2 d𝑡 + log 𝑇 d𝑡 𝑐 E 𝜃 𝜌 (𝐹𝜃 , 𝐹) ≤ 2 𝑡 𝑡 0 0 ∫ ∞ 2 ∫ 𝑇 E 𝜃 | 𝑓 𝜃 (𝑡)| 2 1 + log 𝑇 d𝑡 + 2 | 𝑓 (𝑡)| d𝑡 . 𝑡 𝑇 𝑇0 0 The last integral is convergent and is bounded by an absolute constant uniformly over all 𝑛 ≥ 4 in view of the polynomial bound (11.8), 𝑐𝑡 2 − 𝑛−1 4 , | 𝑓 (𝑡)| ≤ 6 1 + 𝑛
𝑡∈R
(holding with 𝑐 = 1/(3𝜋) 2 , Proposition 11.5.4). Hence, the above bound on the Kolmogorov distance simplifies to 1
∫ 𝑇0 Δ𝑛 (𝑡) Δ𝑛 (𝑡) d𝑡 + log 𝑇 d𝑡 𝑐 E 𝜃 𝜌 (𝐹𝜃 , 𝐹) ≤ 2 𝑡 𝑡 0 0 ∫ 𝑇 E 𝜃 | 𝑓 𝜃 (𝑡)| 2 1 + log 𝑇 d𝑡 + 2 . 𝑡 𝑇 𝑇0 ∫
2
As we know from Lemma 13.1.2 with 𝑝 = 2, for some absolute constant 𝑐 > 0, 𝑐 E 𝜃 | 𝑓 𝜃 (𝑡)| 2 ≤
𝑚 44 𝑛2
+ e−𝑡
2 /8
,
so, 𝑚4 2 E 𝜃 | 𝑓 𝜃 (𝑡)| 2 d𝑡 ≤ 24 log 𝑇 + e−𝑇0 /8 . 𝑡 𝑛 𝑇0 √︁ Let us choose here 𝑇 = 4𝑛 and 𝑇0 = 4 log 𝑛. Since 𝑚 4 ≥ 1, we then arrive at the following preliminary general bound. √ Lemma 16.1.1 Let 𝑋 be a random vector in R𝑛 such that |𝑋 | = 𝑛 a.s., and let 𝑌 be an independent copy of 𝑋. Then for some absolute constant 𝑐 > 0, ∫ 4√log 𝑛 ∫ 1 Δ𝑛 (𝑡) (log 𝑛) 2 Δ𝑛 (𝑡) 2 d𝑡 + log 𝑛 d𝑡 + E ⟨𝑋, 𝑌 ⟩ 4 . 𝑐 E 𝜃 𝜌 (𝐹𝜃 , 𝐹) ≤ 2 𝑡 𝑡 𝑛4 0 0 𝑐′
∫
𝑇
Since E ⟨𝑋, 𝑌 ⟩ 4 ≥ 𝑛2 , this bound is fulfilled automatically for 𝑛 = 2 and 𝑛 = 3.
16.1 Preliminaries
319
In order to study the integrals in Lemma 16.1.1, let us first focus on the first integral, additionally assuming that the random vector 𝑋 in R𝑛 is isotropic and has mean zero. We need to develop an asymptotic bound on Δ𝑛 (𝑡) for 𝑡 ∈ [0, 1]. Putting ⟩ 2 𝜉 = ⟨𝑋,𝑌 𝑛 , we have |𝑋 − 𝑌 | = 2𝑛(1 − 𝜉), so that √︁ √ Δ𝑛 (𝑡) = E𝐽𝑛 𝑡 2𝑛(1 − 𝜉) − (𝐽𝑛 𝑡 𝑛 ) 2 . We use the asymptotic formula from Proposition 11.3.1, √ 𝑡 4 −𝑡 2 /2 e + 𝜀 𝑛 (𝑡), 𝐽𝑛 𝑡 𝑛) = 1 − 4𝑛
𝑡 ∈ R,
where 𝜀 𝑛 (𝑡) denotes a quantity of the form 𝑂 𝑛−2 min(1, 𝑡 4 ) with a universal constant in 𝑂 (which may vary from place to place). It implies a similar bound √ 𝑡 4 −𝑡 2 e + 𝜀 𝑛 (𝑡). (𝐽𝑛 𝑡 𝑛)) 2 = 1 − 2𝑛 Since |𝜉 | ≤ 1 a.s., we also have 2 √︁ 𝑡4 𝐽𝑛 𝑡 2𝑛(1 − 𝜉) = 1 − (1 − 𝜉) 2 e−𝑡 (1− 𝜉 ) + 𝜀 𝑛 (𝑡). 𝑛 Hence, subtracting from e𝑡
2𝜉
the linear term 1 + 𝑡 2 𝜉 and adding, one may write
2 𝑡 4 i 𝑡4 (1 − 𝜉) 2 e𝑡 𝜉 − 1 − + 𝜀 𝑛 (𝑡) 𝑛 2𝑛 2 = e−𝑡 E (𝑈 + 𝑉) + 𝜀 𝑛 (𝑡) 2
Δ𝑛 (𝑡) = e−𝑡 E
h
1−
with 𝑡4 1 𝑡4 − (1 − 𝜉) 2 + 1 − (1 − 𝜉) 2 · 𝑡 2 𝜉, 𝑛 2 𝑛 2 𝑡4 𝑉 = 1 − (1 − 𝜉) 2 (e𝑡 𝜉 − 1 − 𝑡 2 𝜉). 𝑛
𝑈=
Using E𝜉 = 0, E𝜉 2 = 0≤𝑡≤1
1 𝑛
E𝑈 = −
and hence E |𝜉 | 3 ≤ E𝜉 2 ≤
1 𝑛,
we find that in the interval
𝑡 4 2𝑡 6 𝑡 6 𝑡4 𝑡4 − 2 + 2 − E𝜉 3 = − + 𝜀 𝑛 (𝑡). 2𝑛 𝑛 𝑛 2𝑛 𝑛
Next write 𝑉 =𝑊−
𝑡4 (1 − 𝜉) 2 𝑊, 𝑛
𝑊 = e𝑡
2𝜉
− 1 − 𝑡 2 𝜉.
Using |e 𝑥 − 1 − 𝑥| ≤ 2𝑥 2 for |𝑥| ≤ 1, we have |𝑊 | ≤ 2𝑡 4 𝜉 2 . Hence, the expected value of the second term in the representation for 𝑉 does not exceed 8𝑡 8 /𝑛2 . Moreover, by
320
16 Refinements for the Kolmogorov Distance
Taylor’s expansion, 𝑊=
1 4 2 1 6 3 𝑡 𝜉 + 𝑡 𝜉 + 𝑅𝑡 8 𝜉 4 , 2 6
𝑅=
∞ 2𝑘−8 ∑︁ 𝑡
𝜉 𝑘−4 ,
𝑘! 𝑘=4
implying that 𝑡4 𝑡6 3 + E𝜉 + 𝑐𝑡 8 E𝜉 4 2𝑛 6 with a quantity 𝑐 bounded by an absolute constant. Summing the two expansions, we arrive at 𝑡6 E (𝑈 + 𝑉) = E𝜉 3 + 𝑐𝑡 8 E𝜉 4 + 𝜀 𝑛 (𝑡), 6 and therefore, ∫ 1 Δ𝑛 (𝑡) d𝑡 ≤ 𝑐 1 E𝜉 3 + 𝑐 2 E𝜉 4 + 𝑂 (𝑛−2 ), 𝑡2 0 E𝑊 =
for some absolute constants 𝑐 1 , 𝑐 2 > 0. Here E𝜉 4 ≥ (E𝜉 2 ) 2 = 𝑛−2 , so the term 𝑂 (𝑛−2 ) may be absorbed by the 4-th moment of 𝜉. Since E𝜉 3 ≥ 0, the bound of Lemma 16.1.1 may be simplified to ∫ 4√log 𝑛 Δ𝑛 (𝑡) (log 𝑛) 2 2 3 4 d𝑡 + E ⟨𝑋, 𝑌 ⟩ 4 . 𝑐 E 𝜃 𝜌 (𝐹𝜃 , 𝐹) ≤ E𝜉 + E𝜉 + log 𝑛 𝑡 𝑛4 0 Here again the last term dominates E𝜉 4 , and we arrive at: 𝑛 Lemma 16.1.2 √ Let 𝑋 be an isotropic random vector in R with mean zero and such that |𝑋 | = 𝑛 a.s., and let 𝑌 be an independent copy of 𝑋. Then for some absolute constant 𝑐 > 0, ∫ 4√log 𝑛 Δ𝑛 (𝑡) 2 d𝑡 + E𝜉 3 + (log 𝑛) 2 E𝜉 4 , 𝑐 E 𝜃 𝜌 (𝐹𝜃 , 𝐹) ≤ log 𝑛 𝑡 0
where 𝜉 =
⟨𝑋,𝑌 ⟩ 𝑛 .
16.2 Large Interval. Final Upper Bound Our next step (which is most important) is to show that the integral in Lemma 16.1.2 is dominated by the expression E𝜉 3 + E𝜉 4 up to logarithmic factors. This integral √ may be expressed in terms of the functions 𝑔𝑛 (𝑡) = 𝐽𝑛 (𝑡 2𝑛) and ∫ 𝜓(𝛼) = 0
𝑇
𝑔𝑛 (𝛼𝑡) − 𝑔𝑛 (𝑡) d𝑡, 𝑡
√︁ with 𝑇 = 4 log 𝑛. Indeed, in terms of 𝜉 =
⟨𝑋,𝑌 ⟩ 𝑛 ,
0≤𝛼≤
we have
√ 2,
16.2 Large Interval. Final Upper Bound
321
√ E𝐽𝑛 (𝑡 2𝑛(1 − 𝜉)) − (𝐽𝑛 (𝑡 𝑛)) 2 d𝑡 𝑡 0 0 √ √ ∫ 𝑇 √︁ 𝐽𝑛 (𝑡 2𝑛) − (𝐽𝑛 (𝑡 𝑛)) 2 = E𝜓 1 − 𝜉 + d𝑡. 𝑡 0 √︁ To proceed, we need to develop a Taylor expansion for 𝜉 → 𝜓 1 − 𝜉 around zero in powers of√𝜉. Recall that 𝑔𝑛 (𝑡) represents the characteristic function of the random variable 2𝑛 𝜃 1 on (S𝑛−1 , 𝔰𝑛−1 ). This already ensures that |𝑔𝑛 (𝑡)| ≤ 1 and √ √ √ |𝑔𝑛′ (𝑡)| ≤ 2𝑛 E |𝜃 1 | ≤ 2𝑛 (E 𝜃 12 ) 1/2 = 2
∫
𝑇
Δ𝑛 (𝑡) d𝑡 = 𝑡
∫
√︁
𝑇
for all 𝑡 ∈ R. Hence |𝑔𝑛 (𝛼𝑡) − 𝑔𝑛 (𝑡)| ≤
√ 2 (𝛼 − 1) |𝑡| ≤ |𝑡|,
so that ∫
1
|𝑔𝑛 (𝛼𝑡) − 𝑔𝑛 (𝑡)| d𝑡 + 𝑡 0 ≤ 2 + 2 log 𝑇 < 4 log 𝑇
∫
|𝜓(𝛼)| ≤
1
𝑇
|𝑔𝑛 (𝛼𝑡) − 𝑔𝑛 (𝑡)| d𝑡 𝑡
(since 𝑇 > 𝑒). In addition, 𝜓(1) = 0 and 𝜓 ′ (𝛼) =
∫
𝑇
𝑔𝑛′ (𝛼𝑡) d𝑡 =
0
1 (𝑔𝑛 (𝛼𝑇) − 1). 𝛼
Therefore, we arrive at another expression ∫ ∫ 𝛼 𝑔𝑛 (𝑇𝑥) − 1 d𝑥 = 𝜓(𝛼) = 𝑥 1 1
𝛼
𝑔𝑛 (𝑇𝑥) d𝑥 − log 𝛼. 𝑥
For |𝜀| ≤ 1, let ∫
(1−𝜀) 1/2
𝑔𝑛 (𝑇𝑥) d𝑥, 𝑥 1 1 𝑢(𝜀) = 𝜓 (1 − 𝜀) 1/2 = 𝑣(𝜀) − log(1 − 𝜀), 2 √︁ so that E 𝜓 1 − 𝜉 = E 𝑢(𝜉). According to the bound of Proposition 11.4.1, 𝑣(𝜀) =
√ 2 |𝐽𝑛 (𝑡 𝑛)| ≤ 3 (e−𝑡 /2 + e−𝑛/12 ), which yields 2
|𝑔𝑛 (𝑡)| ≤ 3 (e−𝑡 + e−𝑛/12 ).
322
16 Refinements for the Kolmogorov Distance
Hence, for −1 ≤ 𝜀 ≤ 21 , up to an absolute constant 𝐶, √ 2
∫ |𝑣(𝜀)| ≤ √1 2
sup √ |𝑔𝑛 (𝑇𝑥)|
≤𝑥 ≤ 2
√1 2
≤ sup√ |𝑔𝑛 (𝑧)| ≤ 3 (e−𝑇
1 d𝑥 𝑥 2 /2
+ e−𝑛/12 ) ≤
𝑧 ≥𝑇/ 2
𝐶 , 𝑛8
√︁ where the last inequality is specialized to the choice 𝑇 = 4 log 𝑛. Using the Taylor expansion for the logarithmic function, we also have − log(1 − 𝜀) ≤ 𝜀 +
1 2 1 3 𝜀 + 𝜀 + 4𝜀 4 , 2 3
𝜀≤
1 . 2
Combining the two inequalities, we get 𝑢(𝜀) ≤
1 1 𝐶 1 𝜀 + 𝜀 2 + 𝜀 3 + 2𝜀 4 + 8 , 2 4 6 𝑛
−1 ≤ 𝜀 ≤
1 . 2
In order to involve the remaining interval 12 ≤ 𝜀 ≤ 1 in an inequality of a similar type, recall that |𝑢(𝜀)| ≤ 3 log 𝑇 for all |𝜀| ≤ 1. Hence, the above inequality will hold automatically by increasing the coefficient 2 in front of 𝜀 4 to a multiple of log 𝑇. As a result, we obtain the desired inequality on the whole segment, i.e., 𝑢(𝜀) ≤
1 1 1 𝑐 𝜀 + 𝜀 2 + 𝜀 3 + 𝑐 log 𝑇 𝜀 4 + 8 , 2 4 6 𝑛
−1 ≤ 𝜀 ≤ 1,
for some absolute constant 𝑐 > 0. Thus, √︁ 1 1 1 𝑐 𝜓 1 − 𝜉 ≤ 𝜉 + 𝜉 2 + 𝜉 3 + 𝑐 log 𝑇 𝜉 4 + 8 , 2 4 6 𝑛 and taking the expectation, we get E𝜓
√︁
1 1 1−𝜉 ≤ + E𝜉 3 + 𝑐 log 𝑇 E𝜉 4 , 4𝑛 6
where the term 𝑐𝑛−8 was absorbed by the 4-th moment of 𝜉. Now, let us turn to the integral √ √ ∫ 𝑇 𝐽𝑛 (𝑡 2𝑛) − (𝐽𝑛 (𝑡 𝑛)) 2 𝐼𝑛 = d𝑡 𝑡 0 and recall the asymptotic formulas √ 𝑡 4 −𝑡 2 𝐽𝑛 𝑡 2𝑛) = 1 − e + 𝜀 𝑛 (𝑡), 𝑛
√ 2 𝑡 4 −𝑡 2 𝐽𝑛 𝑡 𝑛) = 1 − e + 𝜀 𝑛 (𝑡), 2𝑛
16.3 Relations Between Kantorovich, 𝐿 2 and Kolmogorov distances
323
where 𝜀 𝑛 (𝑡) = 𝑂 𝑛−2 min(1, 𝑡 4 ) . After integration, this remainder term will create an error of order at most 𝑛−2 log 𝑇, up to which 𝐼𝑛 is equal to ∫
𝑇
− 0
2 𝑡 4 −𝑡 2 d𝑡 1 1 e =− 1 − (𝑇 2 + 1) e−𝑇 = − + 𝑜(𝑛−15 ). 2𝑛 𝑡 4𝑛 4𝑛
Thus, 𝐼𝑛 = − and therefore ∫ 0
𝑇
1 + 𝑂 (𝑛−2 log 𝑇), 4𝑛
√︁ 1 Δ𝑛 (𝑡) d𝑡 = E 𝜓 1 − 𝜉 + 𝐼𝑛 ≤ E𝜉 3 + 𝑐 log 𝑇 E𝜉 4 . 𝑡 6
One can now apply this estimate in Lemma 16.1.2, and then we eventually obtain: 𝑛 Proposition 16.2.1 √ Let 𝑋 be an isotropic random vector in R with mean zero and such that |𝑋 | = 𝑛 a.s., and let 𝑌 be an independent copy of 𝑋. Then for some absolute constants 𝑐 1 , 𝑐 2 > 0,
E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹) ≤ 𝑐 1 (log 𝑛) E𝜉 3 + 𝑐 2 (log 𝑛) 2 E𝜉 4 , ⟩ where 𝜉 = ⟨𝑋,𝑌 𝑛 . A similar inequality continues to hold for the standard normal distribution function Φ in place of 𝐹.
Although this assertion is mainly aimed at obtaining √ rates of order 1/𝑛 (modulo a logarithmic factor), it easily yields rates of order 1/ 𝑛. Indeed, using |𝜉 | ≤ 1, we have that 1 1 E𝜉 4 ≤ E𝜉 2 = , E |𝜉 | 3 ≤ E𝜉 2 = . 𝑛 𝑛 Hence, from Proposition 16.2.1 we recover a particular case of the more general Proposition 14.6.1 (where 𝑀2 = 1 and 𝜎4 = 0 under the current assumptions). 𝑛 Corollary 16.2.2 √ Let 𝑋 be an isotropic random vector in R with mean zero and such that |𝑋 | = 𝑛 a.s. Then for some absolute constant 𝑐,
E 𝜃 𝜌 2 (𝐹𝜃 , Φ) ≤
𝑐 (log 𝑛) 2 . 𝑛
16.3 Relations Between Kantorovich, 𝑳 2 and Kolmogorov distances Let us now compare the 𝐿 2 and Kolmogorov distances on average, between the distributions 𝐹𝜃 of the weighted sums ⟨𝑋, 𝜃⟩ and the typical distribution 𝐹 = E 𝜃 𝐹𝜃 . Such information will be needed to derive appropriate lower bounds on E 𝜃 𝜌(𝐹𝜃 , 𝐹).
324
16 Refinements for the Kolmogorov Distance
√ Proposition 16.3.1 Let 𝑋 be a random vector in R𝑛 such that |𝑋 | ≤ 𝑏 𝑛 a.s. Then, for any 𝛼 ∈ [1, 2], 𝑏 −𝛼/2 E 𝜃 𝜔 𝛼 (𝐹𝜃 , 𝐹) ≤ 14 (log 𝑛) 𝛼/4 E 𝜃 𝜌 𝛼 (𝐹𝜃 , 𝐹) +
8 . 𝑛4
(16.1)
As will be clear from the proof, at the expense of a larger coefficient in front of log 𝑛, the last term 𝑛−4 can be shown to be of the form 𝑛−𝛽 for any prescribed value of 𝛽. A relation similar to (16.1) is also true for the Kantorovich distance 𝑊 in place of 𝐿 2 . We state it for the case 𝛼 = 1. √ Proposition 16.3.2 Let 𝑋 be a random vector in R𝑛 such that |𝑋 | ≤ 𝑏 𝑛 a.s. Then √︁ 4𝑏 E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤ 10 𝑏 log 𝑛 E 𝜃 𝜌(𝐹𝜃 , 𝐹) + 4 . 𝑛
(16.2)
Proof Put 𝑅 𝜃 (𝑥) = 𝐹𝜃 (−𝑥) + (1 − 𝐹𝜃 (𝑥)) for 𝑥 > 0 and define 𝑅 similarly on the basis of 𝐹. Using (𝐹𝜃 (−𝑥) − 𝐹 (−𝑥)) 2 ≤ 2𝐹𝜃 (−𝑥) 2 + 2𝐹 (−𝑥) 2 , (𝐹𝜃 (𝑥) − 𝐹 (𝑥)) 2 ≤ 2 (1 − 𝐹𝜃 (𝑥)) 2 + 2 (1 − 𝐹 (𝑥)) 2 , we have (𝐹𝜃 (−𝑥) − 𝐹 (−𝑥)) 2 + (𝐹𝜃 (𝑥) − 𝐹 (𝑥)) 2 ≤ 2𝑅 𝜃 (𝑥) 2 + 2𝑅(𝑥) 2 . Hence, given 𝑇 > 0 (to be specified later on), we have ∫ (𝐹𝜃 (𝑥) − 𝐹 (𝑥)) 2 d𝑥 + (𝐹𝜃 (𝑥) − 𝐹 (𝑥)) 2 d𝑥 −𝑇 | 𝑥 | ≥𝑇 ∫ ∞ ∫ ∞ 2 2 ≤ 2𝑇 𝜌 (𝐹𝜃 , 𝐹) + 2 𝑅 𝜃 (𝑥) d𝑥 + 2 𝑅(𝑥) 2 d𝑥.
𝜔2 (𝐹𝜃 , 𝐹) =
∫
𝑇
𝑇
𝑇
It follows that, for any 𝛼 ∈ [1, 2], 𝛼
𝜔 𝛼 (𝐹𝜃 , 𝐹) ≤ (2𝑇) 2 𝜌 𝛼 (𝐹𝜃 , 𝐹) ∫ ∞ 𝛼2 ∫ + 2 𝑅 𝜃 (𝑥) 2 d𝑥 + 2
𝑅(𝑥) 2 d𝑥
𝛼2 ,
𝑇
𝑇
and therefore, by Jensen’s inequality (since
∞
𝛼 2
≤ 1).
𝛼
E 𝜃 𝜔 𝛼 (𝐹𝜃 , 𝐹) ≤ (2𝑇) 2 E 𝜃 𝜌 𝛼 (𝐹𝜃 , 𝐹) ∫ ∞ 𝛼2 ∫ 2 + 2 E 𝜃 𝑅 𝜃 (𝑥) d𝑥 + 2 𝑇
𝑇
∞ 2
𝑅(𝑥) d𝑥
𝛼2 .
16.3 Relations Between Kantorovich, 𝐿 2 and Kolmogorov distances
325
Next, by Markov’s inequality, for any 𝑥 > 0 and 𝑝 ≥ 1, 𝑅 𝜃 (𝑥) 2 ≤
E | ⟨𝑋, 𝜃⟩ | 𝑝 𝑥𝑝
2
E | ⟨𝑋, 𝜃⟩ | 2 𝑝 𝑥2 𝑝
≤
and
E 𝜃 E | ⟨𝑋, 𝜃⟩ | 2 𝑝 . 𝑥2 𝑝 Since 𝑅 = E 𝜃 𝑅 𝜃 , a similar inequality holds true for 𝑅 as well (by applying Cauchy’s inequality). Hence E 𝜃 𝑅 𝜃 (𝑥) 2 ≤
𝛼
E 𝜃 𝜔 (𝐹𝜃 , 𝐹) ≤ (2𝑇)
𝛼 2
𝛼
E 𝜃 𝜌 (𝐹𝜃 , 𝐹) + 2 2 E 𝜃 E | ⟨𝑋, 𝜃⟩ |
∫
2𝑝
∞
1 𝑥2 𝑝
𝑇
𝛼2 d𝑥
.
Here, when 𝜃 is treated as a random vector with distribution 𝔰𝑛−1 , which is independent of 𝑋, we have, by Corollary 10.3.3, E 𝜃 | ⟨𝑋, 𝜃⟩ | 2 𝑝 ≤
2𝑝 𝑝
|𝑋 | 2 𝑝 ,
𝑝 ≥ 1,
𝑛
so that, by the assumption on 𝑋, E 𝜃 E | ⟨𝑋, 𝜃⟩ | 2 𝑝 ≤ (2𝑏 2 𝑝) 𝑝 . From this,
∫
2 2 E𝜃 𝑇
∞
E | ⟨𝑋, 𝜃⟩ | 2 𝑝 d𝑥 𝑥2 𝑝
𝛼2
𝛼
≤
2 2 +1 (2𝑝 − 1)
𝛼 2
(2𝑏 2 𝑝)
𝛼𝑝 2
𝑇−
𝛼(2 𝑝−1) 2
.
Thus, 𝛼
𝛼
E 𝜃 𝜔 (𝐹𝜃 , 𝐹) ≤ (2𝑇)
𝛼 2
2 2 +1
𝛼
E 𝜃 𝜌 (𝐹𝜃 , 𝐹) +
𝛼
𝑇
(2𝑝 − 1) 2
𝛼 2
2𝑏 2 𝑝 𝑇2
𝛼2𝑝 .
√ Let us take 𝑇 = 2𝑏 𝑝, in which case the above inequality becomes √ 𝛼 E 𝜃 𝜔 𝛼 (𝐹𝜃 , 𝐹) ≤ (4𝑏 𝑝) 2 E 𝜃 𝜌 𝛼 (𝐹𝜃 , 𝐹) +
2 𝛼+1 (2𝑝 − 1)
𝛼 2
𝛼𝑝 √ 𝛼 (𝑏 𝑝) 2 2− 2 .
√ To simplify the last term, one can use 𝑝 ≤ 2𝑝 − 1 for 𝑝 ≥ 1 together with 2 𝛼+1 ≤ 8 𝑝 𝛼𝑝 and 2− 2 ≤ 2 2 (since 1 ≤ 𝛼 ≤ 2), which leads to 𝛼 √ 𝛼 E 𝜃 𝜔 𝛼 (𝐹𝜃 , 𝐹) ≤ (4𝑏 𝑝) 2 E 𝜃 𝜌 𝛼 (𝐹𝜃 , 𝐹) + 8𝑏 2 2− 𝑝/2 .
Finally, choosing 𝑝 = 8
log 𝑛 log 2 ,
we arrive at (16.1).
326
16 Refinements for the Kolmogorov Distance
Now, turning to (16.2), we use the same functions 𝑅 𝜃 and 𝑅 as before and write ∫
∫
𝑇
𝑊 (𝐹𝜃 , 𝐹) =
|𝐹𝜃 (𝑥) − 𝐹 (𝑥)| d𝑥 + |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| d𝑥 |𝑥 | ≥𝑇 ∫ ∞ ∫ ∞ ≤ 2𝑇 𝜌(𝐹𝜃 , 𝐹) + 𝑅(𝑥) d𝑥, 𝑅 𝜃 (𝑥) d𝑥 + −𝑇
𝑇
𝑇
which gives ∫
∞
E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤ 2𝑇 E 𝜃 𝜌(𝐹𝜃 , 𝐹) + 2
𝑅(𝑥) d𝑥. 𝑇
By Markov’s inequality, for any 𝑥 > 0 and 𝑝 ≥ 1, 𝑅 𝜃 (𝑥) ≤
E | ⟨𝑋, 𝜃⟩ | 𝑝 , 𝑥𝑝
𝑅(𝑥) = E 𝜃 𝑅 𝜃 (𝑥) ≤
E 𝜃 E | ⟨𝑋, 𝜃⟩ | 𝑝 . 𝑥𝑝
Hence ∫
∞
1 d𝑥. 𝑥𝑝
E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤ 2𝑇 E 𝜃 𝜌(𝐹𝜃 , 𝐹) + 2 E 𝜃 E | ⟨𝑋, 𝜃⟩ | 𝑝 𝑇
Here, one may use once more the bound (E 𝜃 | ⟨𝑋, 𝜃⟩ | 𝑝 ) 1/ 𝑝 ≤ which yields
√︃
𝑝 𝑛
|𝑋 | for 𝑝 ≥ 2,
E 𝜃 E | ⟨𝑋, 𝜃⟩ | 𝑝 = E |𝑋 | 𝑝 E 𝜃 |𝜃 1 | 𝑝 ≤ 𝑏 𝑝 𝑛 𝑝/2 E 𝜃 |𝜃 1 | 𝑝 ≤ 𝑏 2 𝑝 and E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤ 2𝑇 E 𝜃 𝜌(𝐹𝜃 , 𝐹) +
𝑝/2
2 (𝑏 2 𝑝) 𝑝/2 . 𝑝 − 1 𝑇 𝑝−1
√ Let us take 𝑇 = 2𝑏 𝑝, in which case the above inequality becomes √ E 𝜃 𝑊 (𝐹𝜃 , 𝐹) ≤ 4𝑏 𝑝 E 𝜃 𝜌(𝐹𝜃 , 𝐹) + 4𝑏 Here we choose 𝑝 =
4 log 𝑛 log 2
√ 𝑝 −𝑝 2 . 𝑝−1
√
≥ 4. Using
𝑝 𝑝−1
< 1, we arrive at (16.2).
□
16.4 Lower Bounds A lower bound on E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹) which would be close to the upper bound of Proposition 16.2.1 (up to logarithmic terms) may be given with the help of the lower bound on E 𝜃 𝜔2 (𝐹𝜃 , 𝐹). More precisely, this can be done in the case where the quantity 𝑐 1 E𝜉 3 + 𝑐 2 E𝜉 4 asymptotically dominates 𝑛−2 . As was shown in Section 15.4, in the √ isotropic mean zero case with |𝑋 | = 𝑛 a.s., we have an asymptotic expansion
16.4 Lower Bounds
327
√
𝜋 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) =
1 E𝜉 3 + 𝑐 E𝜉 4 + 𝑂 (𝑛−2 ) 16
with 0.01 ≤ 𝑐 ≤ 3. As before, here we use the notation 𝜉=
1 ⟨𝑋, 𝑌 ⟩ , 𝑛
where 𝑌 is an independent copy of 𝑋. Recall that E𝜉 3 ≥ 0. Combining the above expansion with the bound (16.1) of Proposition 16.3.1 with 𝛼 = 2 and 𝑏 = 1, we therefore obtain a lower bound on the Kolmogorov distance. 𝑛 Proposition 16.4.1 √ Let 𝑋 be an isotropic random vector in R with mean zero and such that |𝑋 | = 𝑛 a.s. For some absolute constants 𝑐 𝑗 > 0,
√︁ 𝑐3 log 𝑛 E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹) ≥ 𝑐 1 E𝜉 3 + 𝑐 2 E𝜉 4 − 2 . 𝑛
(16.3)
A similar inequality continues to hold for the normal distribution function Φ in place of 𝐹. The relation (16.2) in Proposition 16.3.1 for the Kantorovich distance 𝑊 (𝐹𝜃 , 𝐹) may be used to answer the following question: Is it possible to sharpen the lower bound (16.3) by replacing E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹) with E 𝜃 𝜌(𝐹𝜃 , 𝐹)? To this end, we will need additional information about moments of 𝜔(𝐹𝜃 , 𝐹) of order higher than 2. Let us start with an elementary general inequality, connecting the three distances. Namely, given arbitrary distribution functions 𝐹 and 𝐺, we have ∫ ∞ 𝜔2 (𝐹, 𝐺) = (𝐹 (𝑥) − 𝐺 (𝑥)) 2 d𝑥 −∞ ∫ ∞ ≤ sup |𝐹 (𝑥) − 𝐺 (𝑥)| |𝐹 (𝑥) − 𝐺 (𝑥)| d𝑥 = 𝜌(𝐹, 𝐺) 𝑊 (𝐹, 𝐺). −∞
𝑥
Thus, putting 𝑊 = 𝑊 (𝐹𝜃 , 𝐹),
𝜔 = 𝜔(𝐹𝜃 , 𝐹),
𝜌 = 𝜌(𝐹𝜃 , 𝐹),
we have 𝜔3 ≤ 𝑊 3/2 𝜌 3/2 and, by Hölder’s inequality with exponents 𝑝 = 4 and 𝑞 = 43 ,
E 𝜃 𝜔3
1/3
1/12 1/4 ≤ E𝜃 𝑊6 E 𝜃 𝜌2 .
By Proposition 14.2.1 with 𝑝 = 6, and using 𝑀1 ≤ 𝑀2 , we have √ 1/6 12 6 E𝜃 𝑊 ≤ E 𝜃 𝑊 + 𝑀2 √ , 𝑛 so that E𝜃 𝜔
3
1/3
√
12 ≤ E 𝜃 𝑊 + 𝑀2 √ 𝑛
1/2
E 𝜃 𝜌2
1/4 .
328
16 Refinements for the Kolmogorov Distance
√ If |𝑋 | ≤ 𝑏 𝑛 a.s., one may apply Proposition 16.3.2 so as to estimate E 𝜃 𝑊. This gives 1/4 1/3 √︁ 𝑏 𝑀2 1/2 E 𝜃 𝜌2 (16.4) E 𝜃 𝜔3 ≤ 𝐶 𝑏 log 𝑛 E 𝜃 𝜌 + 4 + √ 𝑛 𝑛 for some absolute constant 𝐶. In particular, if the parameters 𝑀2 and 𝑏 are of order 1, the 𝐿 3 -norm of 𝜔(𝐹𝜃 , 𝐹) is of the order of the 𝐿 2 -norm of 𝜌(𝐹𝜃 , 𝐹) modulo a logarithmic factor. As an illustration, let us return to Proposition 14.6.1, which under the assumption E |𝑋 | 2 = 𝑛 provides the estimate 1/2 log 𝑛 E 𝜃 𝜌 ≤ E 𝜃 𝜌2 ≤ 𝐶 (𝑀22 + 𝜎2 ) √ , 𝑛 for some absolute constant 𝐶 > 0 in terms of our standard functionals 𝑀22 = sup E ⟨𝑋, 𝜃⟩ 2 , 𝜃 ∈S𝑛−1
1 𝜎2 = √ E | |𝑋 | 2 − 𝑛|. 𝑛
Since necessarily 𝑏 ≥ 1 and 𝑀2 ≥ 1, inserting the last bound in (16.4), we get: √ Lemma 16.4.2 Let 𝑋 be a random vector in R𝑛 such that |𝑋 | ≤ 𝑏 𝑛 a.s. and E |𝑋 | 2 = 𝑛. Then for some absolute constant 𝑐 > 0, 1/3 √ (log 𝑛) 5/4 . 𝑐 E 𝜃 𝜔3 (𝐹𝜃 , 𝐹) ≤ (𝑀23/2 + 𝜎23/4 ) 𝑏 √ 𝑛 Let us now explain how this upper bound can be used to refine the lower bound (16.3). The argument is based on the following general elementary observation. Given a random variable 𝜂, introduce the 𝐿 𝑝 -norms ∥𝜂∥ 𝑝 = (E |𝜂| 𝑝 ) 1/ 𝑝 . Lemma 16.4.3 If 𝜂 ≥ 0 with 0 < ∥𝜂∥ 3 < ∞, then E𝜂 ≥
1 (E 𝜂2 ) 2 . E 𝜂3
(16.5)
Moreover, o 1 ∥𝜂∥ 6 n 1 2 . P 𝜂 ≥ √ ∥𝜂∥ 2 ≥ 8 ∥𝜂∥ 3 2
(16.6)
Thus, in the case where ∥𝜂∥ 2 and ∥𝜂∥ 3 are almost equal (up to not large factors), ∥𝜂∥ 1 will be of a similar order. Moreover, 𝜂 cannot be much smaller than its mean E𝜂 on a large part of the probability space (where it was defined). Proof The relation (16.5) follows from the Cauchy inequality applied to the representation 𝜂2 = 𝜂1/2 𝜂3/2 . To prove (16.6), put 𝑝 = P{𝜂 ≥ 𝑟} for a given number 𝑟 > 0. By Hölder’s inequality with exponents 3/2 and 3, E 𝜂2 1 { 𝜂 ≥𝑟 } ≤ E 𝜂3
2/3
𝑝 1/3 .
16.4 Lower Bounds
329
Hence, choosing 𝑟 =
√1 2
∥𝜂∥ 2 , we get
E 𝜂2 = E 𝜂2 1 { 𝜂 ≥𝑟 } + E 𝜂2 1 { 𝜂 0. Then E 𝜃 𝜔(𝐹𝜃 , 𝐹) ≥
𝑐𝑏 −3/2 𝑀29/2
+
𝜎29/4
𝐷4 √ , (log 𝑛) 15/4 𝑛
where 𝑐 > 0 is an absolute constant. Moreover, o n 𝑐𝑏 −3 𝐷6 1 . 𝔰𝑛−1 𝜔(𝐹𝜃 , 𝐹) ≥ √ 𝐷 ≥ 2𝑛 𝑀29 + 𝜎29/2 (log 𝑛) 15/2 The lower bound obtained for E 𝜃 𝜔(𝐹𝜃 , 𝐹) implies a similar assertion for the Kolmogorov distance. Indeed, by Proposition 16.3.1 with 𝛼 = 1, we have 1 8 √ E 𝜃 𝜔(𝐹𝜃 , 𝐹) ≤ 14 (log 𝑛) 1/4 E 𝜃 𝜌(𝐹𝜃 , 𝐹) + 4 . 𝑛 𝑏 This gives: Corollary 16.4.5 Under the assumptions of Proposition 16.4.4, E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≥
𝑐𝑏 −2 𝑀29/2 + 𝜎29/4
𝐷4 1 , √ − (log 𝑛) 4 𝑛 4𝑛4
where 𝑐 > 0 is an absolute constant. In particular, if 𝑋 is isotropic and |𝑋 | 2 = 𝑛 a.s., then 1 𝑐𝐷 4 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≥ √ − 4. 4 (log 𝑛) 𝑛 4𝑛
330
16 Refinements for the Kolmogorov Distance
Furthermore, √ if 𝐷 is of order 1, we obtain a lower bound on E 𝜃 𝜌(𝐹𝜃 , 𝐹) with rate of the order 1/ 𝑛 modulo a logarithmic factor. Hence, 𝐹 may also be replaced with the standard normal distribution function Φ, since 𝜌(𝐹, Φ) ≤ 𝑐𝑛1 , cf. Proposition 12.2.2. In this connection, let us emphasize that the rates for the normal approximation of 𝐹𝜃 that are better than 1/𝑛 (on average) cannot be obtained under the support assumption as above. Proposition 16.4.6 For any random vector 𝑋 in R𝑛 such that |𝑋 | 2 = 𝑛 a.s., we have E 𝜃 𝜌(𝐹𝜃 , Φ) ≥
𝑐 , 𝑛
where 𝑐 > 0 is an absolute constant. Proof Using the convexity of the distance function 𝐻 → 𝜌(𝐻, Φ) and applying Jensen’s inequality, we get E 𝜃 𝜌(𝐹𝜃 , Φ) ≥ 𝜌(𝐹, Φ). To further estimate the latter distance, one may apply Proposition 3.4.1, which gives, for any 𝑇 > 0, ∫ 1 𝑇 𝑡 −𝑡 2 /2 ( 𝑓 (𝑡) d𝑡 (16.7) 𝜌(𝐹, Φ) ≥ − e ) 1 − . 3𝑇 0 𝑇 Since |𝑋 | 2 = 𝑛 a.s., and recalling Corollary 11.3.3, we have √ 𝑡 4 −𝑡 2 /2 e + 𝑂 𝑛−2 min{1, 𝑡 4 } , 𝑓 (𝑡) = 𝐽𝑛 𝑡 𝑛) = 1 − 4𝑛 which holds uniformly on the real line with a universal constant in 𝑂. With this approximation, choosing 𝑇 = 1, it follows from (16.7) that 𝜌(𝐹, Φ) ≥ 𝑛𝑐 with some absolute constant 𝑐 > 0 for all 𝑛 ≥ 𝑛0 (where 𝑛0 is determined by 𝑐 only). But, a √ √ similar bound also holds for 𝑛 < 𝑛0 since 𝐹 is supported on the interval [− 𝑛, 𝑛].□
16.5 Remarks The material of Chapters 15–16 is based on [41].
Chapter 17
Applications of the Second Order Correlation Condition
We continue to use the notations of the previous chapters and denote by 𝐹𝜃 the distribution (function) of the linear form ⟨𝑋, 𝜃⟩ = 𝜃 1 𝑋1 + · · · + 𝜃 𝑛 𝑋𝑛 of a given random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R𝑛 , 𝑛 ≥ 2. The average value 𝐹 = E 𝜃 𝐹𝜃 with integration on the unit sphere S𝑛−1 over the uniform measure 𝔰𝑛−1 represents the typical distribution of the weighted sums of the components 𝑋 𝑘 of 𝑋. The aim of this chapter is to derive almost 1/𝑛 bounds on the Kolmogorov distance 𝜌(𝐹𝜃 , 𝐹) in the general case, that is, when the distribution of 𝑋 is not necessarily √ supported on the sphere (of radius of 𝑛). To this end we involve the second order concentration inequalities on S𝑛−1 , assuming throughout the chapter that 𝑋 satisfies a second order correlation condition: With some (finite) constant Λ = Λ(𝑋), Var
∑︁ 𝑛
𝑎 𝑗 𝑘 𝑋 𝑗 𝑋𝑘 ≤ Λ
𝑗,𝑘=1
𝑛 ∑︁
𝑎 2𝑗 𝑘
𝑗,𝑘=1
for all 𝑎 𝑗 𝑘 ∈ R. In addition, we always suppose that the distribution of 𝑋 is isotropic, i.e., E 𝑋 𝑗 𝑋 𝑘 = 𝛿 𝑗 𝑘 for all 𝑗, 𝑘. Recall that necessarily Λ ≥ 12 in the isotropic case.
17.1 Mean Value of 𝝆(𝑭𝜽 , 𝚽) Under the Symmetry Assumption Like in Chapter 14, our basic tool will be the Berry–Esseen bounds. Hence, we need appropriate information about the behavior of the characteristic functions 𝑓 𝜃 (𝑡) = E e𝑖𝑡 ⟨𝑋, 𝜃 ⟩ ,
𝜃 ∈ R𝑛 , 𝑡 ∈ R,
treated as smooth functions on the Euclidean space (with 𝑡 serving as a parameter). Preliminary results in this direction on the deviations of 𝜃 → 𝑓 𝜃 (𝑡) on S𝑛−1 about © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_17
331
332
17 Applications of the Second Order Correlation Condition
their means 𝑓 (𝑡) = E 𝜃 𝑓 𝜃 (𝑡) have been already obtained in Section 13.4. In particular, it was shown that, if additionally the distribution of 𝑋 is symmetric about the origin, then for the intervals |𝑡| ≤ 𝐴𝑛1/5 (𝐴 > 0), we have E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 2 ≤
𝑐 Λ𝑡 4 , 𝑛2
where the constant 𝑐 > 0 is determined by 𝐴 only (Proposition 13.4.2). Hence E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| ≤
𝑐√ 2 Λ𝑡 . 𝑛
(17.1)
This estimate is sufficient to control the distances 𝜌(𝐹𝜃 , 𝐹) on average. Indeed, by Propositions 1.7.2 and 1.7.4, in the isotropic case we have 𝑚 42 + 𝜎42 ≤ 1 + 2Λ. Hence, the Berry–Esseen bound of Lemma 14.5.3 specialized to the case 𝑝 = 2 yields, for all 𝑇 ≥ 𝑇0 ≥ 1, ∫ 𝑇0 Λ 𝑇 1 −𝑇 2 /16 E | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 + 1+log + +e 0 , (17.2) 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ 𝑡 𝑛 𝑇0 𝑇 0 √︁ with some absolute constant 𝑐 > 0. Let us choose here 𝑇 = 4𝑛 and 𝑇0 = 4 log 𝑛. 𝑇2 √ 16 log 𝑛 √ Using (17.1), the integral in (17.2) produces the term 𝑛0 Λ = 𝑛 Λ (up to an log 𝑛 absolute factor), the second term on the right-hand side produces 𝑛 Λ, while the last two terms are of the order 1/𝑛. The condition 𝑇0 ≤ 𝐴𝑛1/5 in (17.1) is fulfilled for a suitable absolute constant 𝐴. Thus, we arrive at the following bound. Proposition 17.1.1 Given an isotropic random vector 𝑋 in R𝑛 with a symmetric distribution, we have 𝑐 log 𝑛 Λ, E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ 𝑛 for some absolute constant 𝑐 > 0. A similar inequality continues to hold for the standard normal distribution function Φ replacing 𝐹. This bound is applicable to a large variety of probability distributions on R𝑛 , especially when the parameter Λ can effectively be bounded. For example, one can apply the bound Λ ≤ 4/𝜆1 of Proposition 6.3.3, where 𝜆1 = 𝜆1 (𝑋) is an optimal value in the Poincaré-type inequality 𝜆1 Var(𝑢(𝑋)) ≤ E |∇𝑢(𝑋)| 2 (in the class of all bounded smooth functions 𝑢 on R𝑛 ). In this case, Proposition 17.1.1 leads to:
17.2 Berry–Esseen Bounds Involving Λ
333
Corollary 17.1.2 If 𝑋 is isotropic and has a symmetric distribution satisfying a Poincaré-type inequality on R𝑛 with 𝜆1 > 0, then for some absolute constant 𝑐 > 0, E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 log 𝑛 −1 𝜆1 . 𝑛
In particular, suppose additionally that the distribution of 𝑋 has density 𝑝(𝑥) = e−𝑉 ( 𝑥) ,
𝑥 ∈ 𝐾,
on the supporting (open convex) set 𝐾 ⊂ R𝑛 , where the function 𝑉 is twice continuously differentiable with Hessian satisfying 𝑉 ′′ (𝑥) ≥ 𝜆𝐼 𝑛 , 𝑥 ∈ 𝐾 (in the matrix sense) for some 𝜆 > 0. Then, according to the Bakry–Emery criterion, 𝜆1 ≥ 𝜆 (cf. Proposition 7.2.7). Hence, the bound of Corollary 17.1.2 holds true with 𝜆 in place of 𝜆1 . In fact, the symmetry assumption may be removed from Corollary 17.1.2. This will be discussed in Section 17.5.
17.2 Berry–Esseen Bounds Involving 𝚲 While Proposition 17.1.1 only gives a bound on 𝜌(𝐹𝜃 , 𝐹) in the mean on the unit sphere, one may wonder whether this distance is at most of the same order (log 𝑛)/𝑛 on a large part of the sphere. Of course, by Markov’s inequality, n 𝑐 Λ log 𝑛 o 1 𝑟 ≤ 𝔰𝑛−1 𝜌(𝐹𝜃 , 𝐹) ≥ 𝑛 𝑟
(𝑟 > 0).
This already provides some information about deviations of 𝜌(𝐹𝜃 , 𝐹) above its mean. However, the decay for probabilities such as 1/𝑟 is rather slow for growing 𝑟. One may try to sharpen this large deviation inequality under higher order moment hypotheses on 𝑋 by applying the exponential inequality of Proposition 13.4.2 (in the symmetric case) or the more general 𝜓1 -bounds of Propositions 13.5.1/13.6.3. With this aim, as a preliminary step, let us return to the Berry–Esseen-type inequality ∫ 𝑐 𝜌(𝐹𝜃 , 𝐹) ≤ 0
𝑇
1 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 + 𝑡 𝑇
∫
𝑇
| 𝑓 (𝑡)| d𝑡, 0
which holds true for any 𝑇 > 0 with some universal constant 𝑐 > 0. By Lemma 13.1.3 with 𝑝 = 2 and 𝑇 ≥ 𝑛, we have in the isotropic case (using 𝜎42 ≤ Λ, Λ ≥ 12 ) 𝑐 𝑇
∫
𝑇
| 𝑓 (𝑡)| d𝑡 ≤ 0
1 + 𝜎42 1 1+Λ 1 5Λ + ≤ + ≤ . 𝑛 𝑇 𝑛 𝑛 𝑛
334
17 Applications of the Second Order Correlation Condition
Moreover, the pointwise subgaussian bound of the same lemma yields 𝑐 | 𝑓 (𝑡)| ≤ so that, for all 𝑇 ≥ 𝑇0 ≥ 1, ∫ 𝑐
𝑇
𝑇0
1 + 𝜎42 2 2 3Λ + e−𝑡 /4 ≤ + e−𝑡 /4 , 𝑛 𝑛
2 Λ 𝑇 | 𝑓 (𝑡)| d𝑡 ≤ log + e−𝑇0 /4 . 𝑡 𝑛 𝑇0
Thus, using the Λ-functional, we have: Lemma 17.2.1 Let 𝑋 be an isotropic random vector in R𝑛 . With some absolute constant 𝑐 > 0, for all 𝜃 ∈ S𝑛−1 , ∫ 𝑐 𝜌(𝐹𝜃 , 𝐹) ≤ 0
𝑇0
| 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 + 𝑡
∫
𝑇
𝑇0
| 𝑓 𝜃 (𝑡)| Λ 𝑇 d𝑡 + log , 𝑡 𝑛 𝑇0
(17.3)
√︁ provided that 𝑇 ≥ 2𝑇0 ≥ 4 log 𝑛 with 𝑇 ≥ 𝑛. Our aim will be to show that the right-hand side of (17.3) is of the order Λ/𝑛 up to a logarithmically growing factor for most of 𝜃 ∈ S𝑛−1 rather than just on average. First, let us consider the integrals of | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)|/𝑡 over moderate (that is, not too large) intervals [0, 𝑇0 ] with 𝑇0 ≤ 𝐴𝑛1/6 . If 0 < 𝑡 ≤ 𝑇0 , and the random vector 𝑋 has an isotropic symmetric distribution on R𝑛 , the exponential bound of Proposition 13.4.2 is applicable and may be stated in terms of the Orlicz norm generated by the Young function 𝜓1 (𝑠) = e |𝑠 | − 1. On the space (S𝑛−1 , 𝔰𝑛−1 ), this norm is defined for any measurable function 𝜉 on the unit sphere by n o ∥𝜉 ∥ 𝜓1 = inf 𝜆 > 0 : E 𝜃 𝜓1 (|𝜉 (𝜃)|/𝜆) ≤ 1 . Thus, by Proposition 13.4.2, in the symmetric case 𝑐 ∥ 𝑓 𝜃 (𝑡) − 𝑓 (𝑡) ∥ 𝜓1 ≤
Λ𝑡 2 𝑛
for some absolute constant 𝑐 > 0. To drop the symmetry assumption, we need to involve the linear part 𝑙 𝑡 (𝜃) of the function 𝜃 → 𝑓 𝜃 (𝑡) − 𝑓 (𝑡), that is, its orthogonal projection in 𝐿 2 (R𝑛 , 𝔰𝑛−1 ) to the 𝑛-dimensional subspace of all linear functions on R𝑛 . By Proposition 13.5.1, we then have a more general bound 𝑐 ∥ 𝑓 𝜃 (𝑡) − 𝑓 (𝑡) ∥ 𝜓1 ≤
Λ𝑡 2 √︁ + 𝐼 (𝑡), 𝑛
𝐼 (𝑡) = ∥𝑙 𝑡 ∥ 2𝐿 2 .
17.2 Berry–Esseen Bounds Involving Λ
335
An application of the triangle inequality in the Orlicz space yields
∫ 𝑇0
∫ 𝑇0
d𝑡
d𝑡
𝑐 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| ≤ 𝑐 ∥ 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)∥ 𝜓1 𝑡 𝑡 0 0 𝜓1 ∫ 𝑇0 2 √︁ d𝑡 Λ𝑡 ≤ + 𝐼 (𝑡) 𝑛 𝑡 0 ∫ 𝑇0 √︁ Λ 2 d𝑡 𝑇 + = 𝐼 (𝑡) . 2𝑛 0 𝑡 0 By Markov’s inequality, 𝔰𝑛−1 {|𝜉 | ≥ 𝑟 ∥𝜉 ∥ 𝜓1 } ≤ 2 e−𝑟 (𝑟 > 0). Hence, we get: Lemma 17.2.2 Let 𝑋 be an isotropic random vector in R𝑛 . For all 0 < 𝑇0 ≤ 𝐴𝑛1/6 and 𝑟 > 0, ∫ 𝑇0 ∫ 𝑇0 √︁ | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 𝑟Λ 2 d𝑡 𝔰𝑛−1 𝑐 d𝑡 ≥ 𝑇0 + 𝑟 𝐼 (𝑡) ≤ 2 e−𝑟 𝑡 𝑛 𝑡 0 0 with a constant 𝑐 > 0 depending on the parameter 𝐴 > 0 only. Here, 𝐼 (𝑡) is the squared 𝐿 2 -norm of the linear part of the function 𝑓 𝜃 (𝑡) − 𝑓 (𝑡) in 𝐿 2 (𝔰𝑛−1 ). √︁ In applications, the value 𝑇0 = 𝑇0 (𝑛, 𝑟) may be chosen to be of the order log 𝑛 and depending on 𝑟, while the parameter 𝑇 in (17.3) should be of the order 𝑛. Outside the interval of 𝑡 of moderate size, that is, on the long interval [𝑇0 , 𝑇], both | 𝑓 𝜃 (𝑡)| and | 𝑓 (𝑡)| are expected to be small for most of 𝜃, at least integrally. To study this property, let us consider the integral of the form ∫
𝑇
𝐿 (𝜃) = 𝑇0
| 𝑓 𝜃 (𝑡)| d𝑡, 𝑡
𝑇 ≥ 𝑇0 > 0.
(17.4)
By Hölder’s inequality, for any integer 𝑝 ≥ 1, 𝐿 (𝜃)
2𝑝
2 𝑝−1
≤ log
𝑇 ∫ 𝑇0
𝑇
𝑇0
so that E 𝜃 𝐿(𝜃)
2𝑝
2 𝑝−1
≤ log
𝑇 ∫ 𝑇0
𝑇0
Since | 𝑓 𝜃 (𝑡)| 2 𝑝 = E e𝑖𝑡 ⟨Σ 𝑝 , 𝜃 ⟩ ,
𝑇
Σ𝑝 =
| 𝑓 𝜃 (𝑡)| 2 𝑝 d𝑡, 𝑡 E 𝜃 | 𝑓 𝜃 (𝑡)| 2 𝑝 d𝑡. 𝑡
𝑝 ∑︁
𝑋 (𝑘) − 𝑌 (𝑘) ,
𝑘=1
𝑋 (𝑘) ,
where may write
𝑌 (𝑘)
(1 ≤ 𝑘 ≤ 𝑝) are independent copies of the random vector 𝑋, we E 𝜃 | 𝑓 𝜃 (𝑡)| 2 𝑝 = E𝐽𝑛 (𝑡 Σ 𝑝 ).
336
17 Applications of the Second Order Correlation Condition
Thus, E 𝜃 𝐿 (𝜃) 2 𝑝 ≤ log2 𝑝−1
𝑇 ∫ 𝑇0
𝑇
E𝐽𝑛 (𝑡 Σ 𝑝 )
𝑇0
d𝑡 . 𝑡
Next, given 𝛼 > 0, we split the above expectation into the events 𝐴 𝛼 = |Σ 𝑝 | ≤ √ √ 𝛼 𝑛 , 𝐵 𝛼 = |Σ 𝑝 | > 𝛼 𝑛 , and apply the upper bound of Proposition 11.4.1, √ 𝐽𝑛 (𝑡 𝑛) ≤ 3 e−𝑡 2 /2 + e−𝑛/12 . This gives ∫
𝑇
E𝐽𝑛 (𝑡 Σ 𝑝 ) 1 𝐵 𝛼 𝑇0
∫
e−𝛼
2 𝑡 2 /2
+ e−𝑛/12 d𝑡 𝑡 𝑇0 2 2 𝑇 ≤ 3 e−𝛼 𝑇0 /2 + e−𝑛/12 log . 𝑇0
d𝑡 ≤3 𝑡
𝑇
As for the complementary event, one may just use |𝐽𝑛 | ≤ 1 leading to the bound ∫
𝑇
E𝐽𝑛 (𝑡|Σ 𝑝 |) 1 𝐴 𝛼 𝑇0
𝑇 d𝑡 ≤ P( 𝐴 𝛼 ) log . 𝑡 𝑇0
Hence, we get E 𝜃 𝐿(𝜃) 2 𝑝 ≤ 3 log2 𝑝
𝑇
e−𝛼
2 𝑇 2 /2 0
𝑇0
More specifically, let us choose here 𝑇0 =
2 𝛼
√︁
+ e−𝑛/12 + P( 𝐴 𝛼 ) .
𝑝 log 𝑛 and 𝑇 = 𝑇0 𝑛, which leads to
E 𝜃 𝐿(𝜃) 2 𝑝 ≤ 3 (log 𝑛) 2 𝑝 𝑛−2 𝑝 + e−𝑛/12 + P( 𝐴 𝛼 ) . Using the inequality 𝑥 2 𝑝 e−𝑥 ≤ ( 2𝑒𝑝 ) 2 𝑝 (𝑥 ≥ 0), we have e−𝑛/12 ≤ ( 24𝑒 𝑝 ) 2 𝑝 𝑛−2 𝑝 , and the above bound is simplified. As a result, we get: Lemma 17.2.3 Let 𝑋 (𝑘) , 𝑌 (𝑘) (1 ≤ 𝑘 ≤ 𝑝) be independent copies of a random vector √︁ 𝑋 in R𝑛 . Suppose that 𝐿 (𝜃) is defined in (17.4) for limits of integration 2 𝑇0 = 𝛼 𝑝 log 𝑛 and 𝑇 = 𝑇0 𝑛 with 𝛼 > 0. Then, for some absolute constant 𝑐 > 0, E 𝜃 𝐿(𝜃) 2 𝑝 ≤ (𝑐 log 𝑛) 2 𝑝 𝑝 2 𝑝 𝑛−2 𝑝 + P( 𝐴 𝛼 ) , √ Í𝑝 𝑋 (𝑘) − 𝑌 (𝑘) . where 𝐴 𝛼 = |Σ 𝑝 | ≤ 𝛼 𝑛 , Σ 𝑝 = 𝑘=1
(17.5)
17.3 Deviations Under Moment Conditions
337
17.3 Deviations Under Moment Conditions One may now develop several natural scenarios in Lemma 17.2.3, where the small ball probabilities P( 𝐴 𝛼 ) are used in various models of interest. For example, under the moment assumptions, using the variance functional 𝜎42 = 𝑛1 Var(|𝑋 | 2 ) and assuming that E |𝑋 | 2 = 𝑛, Proposition 1.6.4 provides the bound P( 𝐴1/2 ) ≤ 𝐶 𝑝 𝑛−2 𝑝 ,
𝐶 𝑝 = 16𝑝 2 𝑚 4 𝑝
4𝑝
+ 4𝜎42
2𝑝
,
(17.6)
where 𝑚 4 𝑝 = 𝑚 4 𝑝 (𝑋) and 𝜎42 = 𝜎42 (𝑋). Hence, the growth of 𝑚 𝑞 for large 𝑞 is responsible for large deviations of the integrals 𝐿 (𝜃). To capture the behavior of the moments, one may start, for example, from the exponential moment condition E e | ⟨𝑋, 𝜃 ⟩ |/𝜆 ≤ 2,
𝜃 ∈ S𝑛−1 ,
(17.7)
with a parameter 𝜆 > 0. In analogy with the moment functionals 𝑀 𝑝 = 𝑀 𝑝 (𝑋), here the optimal value is described as the maximal Orlicz norm 𝜆 = sup ∥ ⟨𝑋, 𝜃⟩ ∥ 𝜓1 , 𝜃 ∈S𝑛−1
corresponding to the Young function 𝜓1 (𝑡) = e |𝑡 | − 1 (𝑡 ∈ R). Then the previous results about the long and moderate intervals lead to: Proposition 17.3.1 Let 𝑋 be an isotropic random vector in R𝑛 with a symmetric distribution and a finite constant Λ. If 𝑋 satisfies the moment condition (17.7), then with some absolute constant 𝑐 > 0, for all 𝑟 ≥ 0, o n 𝑐 log 𝑛 (Λ + 𝜆4 ) 𝑟 ≤ 2 exp{−𝑟 1/8 }. 𝔰𝑛−1 𝜌(𝐹𝜃 , 𝐹) ≥ 𝑛
(17.8)
A similar inequality also holds for the normal distribution function Φ in place of 𝐹. For the values 𝑟 = (𝛽 log 𝑛) 9 , 𝛽 > 0, this estimate provides a polynomial bound o n 𝑐𝛽8 (log 𝑛) 9 (Λ + 𝜆4 ) ≤ 2 𝑛−𝛽 𝔰𝑛−1 𝜌(𝐹𝜃 , 𝐹) ≥ 𝑛 with respect to 1/𝑛. In other words, if a number 𝐴 is sufficiently large, then with high 𝔰𝑛−1 -probability 𝜌(𝐹𝜃 , 𝐹) ≤
𝐴 (log 𝑛) 9 (Λ + 𝜆4 ). 𝑛
Technically, the power 1/8 in (17.8) comes from the fact that the constant 𝐶 𝑝 in (17.6) grows at least like 𝑝 8 𝑝 for large values of 𝑝.
338
17 Applications of the Second Order Correlation Condition
Proof If 𝑟 < 1, the right-hand side of (17.8) is greater than 1, while for 𝑟 > 𝑛 and 𝑐 > 2 the 𝔰𝑛−1 -probability on the left-hand side is zero. Hence, one may assume that 1 ≤ 𝑟 ≤ 𝑛. According to Propositions 1.4.2 and 1.3.3, we have 𝑚 𝑝 ≤ 𝑀 𝑝2 (𝑋) ≤ 𝜆2 𝑝 2 for any 𝑝 ≥ 2. Hence, the quantity 𝐶 𝑝 in (17.6) admits the bound 𝐶 𝑝 ≤ 𝑐𝜆2 𝑝 4
4𝑝
+ 4𝜎42
2𝑝
.
Moreover, one may involve the second order correlation condition, which ensures that 𝜎42 ≤ Λ, according to Proposition 1.7.2. By (17.6), this gives P( 𝐴1/2 ) ≤ 𝑐 𝑝 𝜆8 𝑝 𝑝 16 𝑝 + Λ2 𝑝 𝑛−2 𝑝 up to some absolute constant 𝑐 > 0. In view of the isotropy assumption, the parameter 𝜆 is bounded away from zero. Indeed, from (17.7) it follows that, for any 𝜃 ∈ S𝑛−1 , 1+
1 ⟨𝑋, 𝜃⟩ 2 E ≤ E e | ⟨𝑋, 𝜃 ⟩ |/𝜆 ≤ 2, 2 𝜆
and thus 𝜆2 ≥ 12 . Hence, the term 𝜆8 𝑝 𝑝 16 𝑝 dominates the term 𝑝 2 𝑝 which appears √︁ in (17.5). Therefore, being defined for the parameters 𝑇0 = 4 𝑝 log 𝑛 and 𝑇 = 𝑇0 𝑛, the integrals 𝐿(𝜃) of Lemma 17.2.3 satisfy the moment bounds E 𝜃 𝐿 (𝜃) 2 𝑝 ≤ (𝑐 log 𝑛) 2 𝑝 𝜆8 𝑝 𝑝 16 𝑝 + Λ2 𝑝 𝑛−2 𝑝 . Applying Cauchy’s inequality, it follows that E 𝜃 𝐿 (𝜃) 𝑝 ≤
𝑐 log 𝑛 𝑝
𝜆4 𝑝 𝑝 8 𝑝 + Λ 𝑝 ,
𝑛
implying
E 𝜃 𝐿(𝜃) 𝑝
1/ 𝑝
≤
𝑐 log 𝑛 𝑐 log 𝑛 Λ + 𝜆4 𝑝 8 ≤ Λ + 𝜆4 𝑝 8 . 𝑛 𝑛
At this step (choosing a larger constant 𝑐, if necessary), one may assume that 𝑝 ≥ 1 is an arbitrary real number. One can now apply Markov’s inequality which gives o n 𝑝8 𝑝 𝑐e log 𝑛 (Λ + 𝜆4 ) 𝑠 ≤ , 𝑠 ≥ 1. 𝔰𝑛−1 𝐿 (𝜃) ≥ 𝑛 (e𝑠) 𝑝 Choosing here 𝑝 = 𝑠1/8 and replacing 𝑐e with 𝑐, we thus arrive at the inequality
17.3 Deviations Under Moment Conditions
∫
𝑇
𝔰𝑛−1 𝑇0
339
| 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 𝑐 log 𝑛 4 d𝑡 ≥ (Λ + 𝜆 ) 𝑠 ≤ exp{−𝑠1/8 }. 𝑡 𝑛
Another change 𝑠 = 𝑥 8/7 leads to ∫ 𝑇 𝑐 log 𝑛 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 𝔰𝑛−1 d𝑡 ≥ (Λ + 𝜆4 ) 𝑥 8/7 ≤ exp{−𝑥 1/7 }, 𝑡 𝑛 𝑇0 Note that, for this choice of 𝑝, we have √︁ √︁ 𝑇0 = 4𝑠1/16 log 𝑛 = 4𝑥 1/14 log 𝑛,
𝑥 ≥ 1.
𝑇02 𝑥 = 16 𝑥 8/7 log 𝑛.
One may now apply Lemma 17.2.2 with 𝑥 in place of 𝑟. Since 𝐼 (𝑡) = 0 for all 𝑡 ∈ R (by the symmetry assumption), we then get ∫ 𝑇0 𝑐 log 𝑛 8/7 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 ≥ Λ𝑥 ≤ 2 e−𝑥 . 𝔰𝑛−1 𝑡 𝑛 0 This bound holds as long as 𝑇0 ≤ 𝐴𝑛1/6 with a constant 𝑐 > 0 depending on the parameter 𝐴 > 0. Combining the two bounds obtained for the intervals [0, 𝑇0 ] and [𝑇0 , 𝑇], we obtain a similar bound for the whole interval [0, 𝑇], namely ∫ 𝑇 𝑐 log 𝑛 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 ≥ (Λ + 𝜆4 ) 𝑥 8/7 ≤ 3 exp{−𝑥 1/7 }. 𝔰𝑛−1 𝑡 𝑛 0 Replacing 𝑥 = 𝑟 7/8 , we arrive at ∫ 𝑇 𝑐 log 𝑛 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| 4 d𝑡 ≥ (Λ + 𝜆 ) 𝑟 ≤ 3 exp{−𝑟 1/8 }. 𝔰𝑛−1 𝑡 𝑛 0 √︁ The required condition 𝑇0 = 4𝑟 1/16 log 𝑛 ≤ 𝐴𝑛1/6 is fulfilled with an absolute constant 𝐴 > 0 in view of the assumption 𝑟 ≤ 𝑛. √︁ It remains to apply Lemma 17.2.1, in which the condition 𝑇0 ≥ 2 log 𝑛 is fulfilled due to the assumption 𝑟 ≥ 1. Thus, the bound (17.3) of this lemma is applicable and leads to o n 𝑐 log 𝑛 (Λ + 𝜆4 ) 𝑟 ≤ 3 exp{−𝑟 1/8 } 𝔰𝑛−1 𝜌(𝐹𝜃 , 𝐹) ≥ 𝑛 for some absolute constant 𝑐 > 0. As explained before, this bound holds automatically for 0 ≤ 𝑟 ≤ 1 and 𝑟 ≥ 𝑛, hence, it holds true for all 𝑟 ≥ 0. Rescaling the variable 𝑟, we get the required inequality 17.8. Proposition 17.3.1 is proved. □
340
17 Applications of the Second Order Correlation Condition
17.4 The Case of Non-symmetric Distributions In order to extend the bound of Proposition 17.1.1, E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤
𝑐 log 𝑛 Λ, 𝑛
(17.9)
to the case where the distribution of 𝑋 is not necessarily symmetric about the origin, we need to employ more sophisticated results reflecting the size of the linear part of the characteristic functions 𝑓 𝜃 (𝑡) in 𝐿 2 (S𝑛−1 , 𝔰𝑛−1 ) as functions of the variable 𝜃. This may be achieved at the expense of a certain term that has to be added to the right-hand side in (17.9). More precisely, we derive the following bound, denoting by 𝑐 a positive absolute constant which may vary from place to place. Proposition 17.4.1 Let 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) be an isotropic random vector in R𝑛 , and let 𝑌 be an independent copy of 𝑋. Then 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤
1/2 log 𝑛 1/4 ⟨𝑋, 𝑌 ⟩ log 𝑛 Λ+ E √︁ . 𝑛 𝑛 |𝑋 | 2 + |𝑌 | 2
(17.10)
A similar inequality remains to hold for the standard normal distribution function Φ in place of 𝐹. √︁ Here, the ratio ⟨𝑋, 𝑌 ⟩ / |𝑋 | 2 + |𝑌 | 2 is understood to be zero, if 𝑋 = 𝑌 = 0. Moreover, the expectation on the right-hand side of (17.10) is non-negative, which follows from the representation ⟨𝑋, 𝑌 ⟩
E √︁
1 =√ 2 2 𝜋 |𝑋 | + |𝑌 |
∫
𝑛 ∞ ∑︁
E𝑋 𝑘 e−|𝑋 |
2𝑟 2
2
d𝑟.
−∞ 𝑘=1
If the distribution of 𝑋 is symmetric, this expectation is vanishing, and we return in (17.10) to the bound (17.9). Thus, Proposition 17.4.1 represents a generalization of this result. To derive (17.10), we employ Proposition 13.6.3, which yields a second order concentration bound 𝑐 E 𝜃 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| ≤
1 √ 2 √︁ Λ 𝑡 + 𝐼 (𝑡), 𝑛
(17.11)
where 𝐼 (𝑡) is the squared 𝐿 2 (𝔰𝑛−1 )-norm of the linear part of the function 𝜃 → 𝑓 𝜃 (𝑡). As we know, 𝐼 (𝑡) =
𝑡2 (𝑈 2 + 𝑉 2 ) 𝑡 4 − 8𝑅 2 𝑡 2 −𝑅2 𝑡 2 E ⟨𝑋, 𝑌 ⟩ 1 − e + 𝑂 (𝑡 2 𝑛−5/2 ) 𝑛 4𝑛
with 𝑈=
|𝑋 | 2 , 𝑛
𝑉=
|𝑌 | 2 , 𝑛
𝑅2 =
|𝑋 | 2 + |𝑌 | 2 𝑈 +𝑉 = (𝑅 ≥ 0). 2 2𝑛
(17.12)
17.4 The Case of Non-symmetric Distributions
341
As a preliminary step, first let us prove: Lemma 17.4.2 If the random vector 𝑋 in R𝑛 is isotropic, then ∫ 𝑐 0
𝑇0
𝐼 (𝑡) 1 ⟨𝑋, 𝑌 ⟩ Λ2 d𝑡 ≤ E + 2, 𝑛 𝑅 𝑡2 𝑛
(17.13)
√︁ where 𝑇0 = 4 log 𝑛. Proof Introduce the events 𝐴 = {𝑅 ≤ representation (17.12), it follows that ∫
𝑇0
0
1 2}
and 𝐵 = {𝑅 > 12 }. From the asymptotic
∫ 𝑇0 i 1 h 𝐼 (𝑡) −𝑅 2 𝑡 2 ⟨𝑋, ⟩ d𝑡 = E 𝑌 e d𝑡 𝑛 𝑡2 0 ∫ 𝑇0 i 2 2 2 h + 2 E ⟨𝑋, 𝑌 ⟩ 𝑅 2 𝑡 2 e−𝑅 𝑡 d𝑡 𝑛 0 ∫ 𝑇0 h i 1 2 2 1 − 2 E ⟨𝑋, 𝑌 ⟩ (𝑈 2 + 𝑉 2 ) 𝑡 4 e−𝑅 𝑡 d𝑡 + 𝑂 2 . 4𝑛 𝑛 0
After the change of the variable 𝑅𝑡 = 𝑠 (assuming without loss of generality that 𝑅 > 0) and putting 𝑇1 = 𝑅𝑇0 , the above is simplified to ∫
𝑇0
0
∫ 1 h ⟨𝑋, 𝑌 ⟩ 𝑇1 −𝑠2 i 𝐼 (𝑡) d𝑡 = E e d𝑠 𝑛 𝑅 𝑡2 0 ∫ 2 h ⟨𝑋, 𝑌 ⟩ 𝑇1 2 −𝑠2 i + 2E 𝑠 e d𝑠 𝑅 𝑛 0 h ⟨𝑋, 𝑌 ⟩ 𝑈 2 + 𝑉 2 ∫ 𝑇1 i 1 1 4 −𝑠 2 − 2E 𝑠 e d𝑠 + 𝑂 . 𝑅 4𝑛 𝑅4 𝑛2 0
At the expense of a small error, integration here may be extended from the interval [0, 𝑇1 ] to the whole half-axis (0, ∞). To see this, one can use the estimate ∫ ∞ 2 2 (1 + 𝑠2 + 𝑠4 ) e−𝑠 d𝑠 < 𝑐e−𝑇1 /2 (𝑇1 ≥ 0), 𝑇1
together with ⟨𝑋, 𝑌 ⟩ |𝑋 | |𝑌 | |𝑋 | 2 + |𝑌 | 2 ≤ = 𝑅𝑛. ≤ 𝑅 𝑅 2𝑅 By Proposition 1.6.1 (with 𝑝 = 2, 𝜆 = 12 ), we have P{|𝑋 | 2 ≤
(17.14) 𝑛 2}
≤
4𝜎42 𝑛
n n 𝑛 o n 2 𝑛 o 16Λ2 𝑛o ≤ P |𝑋 | 2 ≤ P |𝑌 | ≤ ≤ . P( 𝐴) = P |𝑋 | 2 + |𝑌 | 2 ≤ 2 2 2 𝑛2
≤
4Λ 𝑛 ,
so,
(17.15)
On the other hand, on the complementary set 𝐵, we have 𝑇12 = 16𝑅 2 log 𝑛 > 4 log 𝑛. Since E𝑅 2 = 1, it follows that
342
17 Applications of the Second Order Correlation Condition 2
2
2
E 𝑅 e−𝑇1 /2 = E 𝑅 e−𝑇1 /2 1 𝐴 + E 𝑅 e−𝑇1 /2 1 𝐵 ≤
1 𝑐Λ2 1 P( 𝐴) + 2 E𝑅 ≤ 2 , 2 𝑛 𝑛
where we used the universal lower bound Λ ≥ 12 . Hence, using (17.14), h | ⟨𝑋, 𝑌 ⟩ | ∫ E
𝑅
∞
𝑇1
i 2 2 𝑐Λ2 e−𝑠 d𝑠 ≤ 𝑐𝑛 E𝑅e−𝑇1 ≤ . 𝑛
By a similar argument, h | ⟨𝑋, 𝑌 ⟩ | ∫ E
𝑅
∞
𝑇1
i 2 2 𝑐Λ2 𝑠2 e−𝑠 d𝑠 ≤ 𝑐𝑛 E𝑅 e−𝑇1 /2 ≤ . 𝑛
Using 𝑈 2 + 𝑉 2 4(𝑈 2 + 𝑉 2 ) = ≤ 4, 𝑅4 (𝑈 + 𝑉) 2 we also have E
i h | ⟨𝑋, 𝑌 ⟩ | 𝑈 2 + 𝑉 2 ∫ ∞ 𝑐Λ2 4 −𝑠 2 −𝑇12 /2 𝑠 e d𝑠 ≤ 𝑐𝑛 . E𝑅 e ≤ 𝑅 𝑛 𝑅4 𝑇1
Thus, extending the integration to the whole positive half-axis, we get ∫ 0
𝑇0
𝑐 1 ⟨𝑋, 𝑌 ⟩ 𝑐 2 ⟨𝑋, 𝑌 ⟩ 𝑐 3 h ⟨𝑋, 𝑌 ⟩ 𝑈 2 + 𝑉 2 i 𝐼 (𝑡) d𝑡 = E + 2E − 2E + 𝑂 Λ2 𝑛−2 2 4 𝑛 𝑅 𝑅 𝑅 𝑡 𝑛 𝑛 𝑅
for some positive absolute constants 𝑐 𝑗 . Moreover, using the identity (𝑈 − 𝑉) 2 (𝑈 − 𝑉) 2 𝑈2 + 𝑉 2 = 2 + = 2 + 2 , 𝑅4 2𝑅 4 (𝑈 + 𝑉) 2 ⟩ and recalling that E ⟨𝑋,𝑌 ≥ 0, it follows that for some other positive absolute 𝑅 constants ∫ 𝑇0 Λ2 𝑐 1 ⟨𝑋, 𝑌 ⟩ 𝑐 2 ⟨𝑋, 𝑌 ⟩ (𝑈 − 𝑉) 2 𝐼 (𝑡) d𝑡 ≤ E − E + 𝑂 . (17.16) 𝑛 𝑅 𝑅 (𝑈 + 𝑉) 2 𝑡2 𝑛2 𝑛2 0
To get rid of the last expectation (by showing that it is bounded by a dimension free quantity), first note that, by (17.15), the expression under this expectation is bounded in absolute value by 𝑅𝑛. Hence, applying Cauchy’s inequality and using E𝑅 2 = 1, from (17.14) we obtain that ⟨𝑋, 𝑌 ⟩ ⟨𝑋, 𝑌 ⟩ (𝑈 − 𝑉) 2 √︁ 1 ≤ E E 1 𝐴 ≤ 𝑛 E𝑅 1 𝐴 ≤ 𝑛 P( 𝐴) ≤ 4Λ. (17.17) 𝐴 2 𝑅 𝑅 (𝑈 + 𝑉)
17.4 The Case of Non-symmetric Distributions
343
⟩ Turning to the complementary set, note that | ⟨𝑋,𝑌 𝑅 | ≤ 2 | ⟨𝑋, 𝑌 ⟩ | on 𝐵, while
|𝑈 − 𝑉 | |𝑈 − 𝑉 | (𝑈 − 𝑉) 2 ≤ = ≤ 2 |𝑈 − 𝑉 |. 2 𝑈 +𝑉 (𝑈 + 𝑉) 2𝑅 2 Hence, by Cauchy’s inequality, and using E ⟨𝑋, 𝑌 ⟩ 2 = 𝑛, ⟨𝑋, 𝑌 ⟩ (𝑈 − 𝑉) 2 1 𝐵 ≤ 4 E | ⟨𝑋, 𝑌 ⟩ | |𝑈 − 𝑉 | E 𝑅 (𝑈 + 𝑉) 2 √ √ √ √︁ ≤ 4 𝑛 E (𝑈 − 𝑉) 2 = 4 2 𝜎4 ≤ 4 2Λ. Combining this bound with (17.17), we finally obtain that ⟨𝑋, 𝑌 ⟩ (𝑈 − 𝑉) 2 ≤ 𝑐Λ. E 𝑅 (𝑈 + 𝑉) 2 As a result, we arrive in (17.16) at the bound (17.13). Lemma 17.4.2 is thus proved.□ Proof (of Proposition 17.4.1) Inserting (17.11) into the Berry–Esseen-type bound (17.2), we get that with some absolute constant 𝑐 > 0, for all 𝑇 ≥ 𝑇0 ≥ 1, ∫ 𝑇0 √︁ 𝐼 (𝑡) 2 Λ 𝑇 1 1 2√ d𝑡 + 1 + log + + e−𝑇0 /16 . 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ 𝑇0 Λ + 𝑛 𝑡 𝑛 𝑇0 𝑇 0 √︁ We choose here 𝑇 = 4𝑛, 𝑇0 = 4 log 𝑛, so that log 𝑛 Λ+ 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤ 𝑛
∫
𝑇0
√︁
0
𝐼 (𝑡) d𝑡. 𝑡
Moreover, by Cauchy’s inequality, 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤
√︁ log 𝑛 Λ + 𝑇0 𝑛
∫ 0
𝑇0
𝐼 (𝑡) 1/2 d𝑡 . 𝑡2
Hence, an application of Lemma 17.4.2 yields 𝑐 E 𝜃 𝜌(𝐹𝜃 , 𝐹) ≤
√︁ 1 ⟨𝑋, 𝑌 ⟩ Λ2 1/2 log 𝑛 Λ + 𝑇0 E + 2 . 𝑛 𝑛 𝑅 𝑛
Simplifying the expression on the right-hand side, we arrive at (17.10). Proposition 17.4.1 is proved.
□
344
17 Applications of the Second Order Correlation Condition
17.5 The Mean Value of 𝝆(𝑭𝜽 , 𝚽) in the Presence of Poincaré Inequalities Let us rewrite the bound (17.10) of Proposition 17.4.1 as 𝑐 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ 2
log 𝑛 (log 𝑛) 1/4 ⟨𝑋, 𝑌 ⟩ 1/2 Λ+ E , √ 𝑛 𝑅 𝑛
(17.18)
2
+|𝑌 | where 𝑅 2 = |𝑋 | 2𝑛 (𝑅 ≥ 0), with 𝑌 being an independent copy of 𝑋. As a next step, we are going to simplify this bound by modifying the first term on the right-hand side and removing the second one for a large class of probability distributions 𝜇 of 𝑋 on R𝑛 . As we know, this is possible for symmetric 𝜇 and also √ for the class of probability distributions that are supported on the sphere of radius 𝑛 and have mean zero with log 𝑛 E ⟨𝑋, 𝑌 ⟩ 3 = 0 (in this case, the first term should be modified to 𝑛2 (E ⟨𝑋, 𝑌 ⟩ 4 ) 1/2 , cf. Proposition 16.2.1). Note that, under our standard assumptions,
E𝑅 2 = 1,
Var(𝑅 2 ) =
𝜎42 Λ ≤ . 2𝑛 2𝑛
⟩ Hence, with high probability the ratio ⟨𝑋,𝑌 𝑅 is almost ⟨𝑋, 𝑌 ⟩, which in turn has zero expectation, as long as 𝑋 has mean zero. However, in general it is not clear whether or not this approximation is sufficient to make further simplifications (having also in mind the factor 𝑛−1/2 ). Nevertheless, the approximation 𝑅 2 ∼ 1 is indeed sufficiently strong, for example, in the case where 𝜇 satisfies a Poincaré-type inequality ∫ 𝜆1 Var 𝜇 (𝑢) ≤ |∇𝑢| 2 d𝜇
(in the class of all smooth functions 𝑢 on R𝑛 ). Namely, we have the following generalization of Corollary 17.1.2. Proposition 17.5.1 Let 𝑋 be an isotropic random vector in R𝑛 with mean zero and a positive Poincaré constant 𝜆 1 . Then for some absolute constant 𝑐 > 0, 𝑐 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
log 𝑛 −1 𝜆1 . 𝑛
(17.19)
The proof is based on the following statement of independent interest. We denote by 𝑐 a positive absolute constant which may vary from place to place. Lemma 17.5.2 If the isotropic random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) has mean zero and a positive Poincaré constant 𝜆1 , then E
⟨𝑋, 𝑌 ⟩ 𝑐 ≤ 2 . 𝑅 𝜆1 𝑛
(17.20)
17.5 The Mean Value of 𝜌(𝐹𝜃 , Φ) in the Presence of Poincaré Inequalities
345
As was emphasized in the previous section, the above expectation is always ⟩ non-negative. Here, the ratio ⟨𝑋,𝑌 us understood to be zero in the case 𝑋 = 𝑌 = 0. 𝑅 Applying (17.20) in (17.18) and using Λ ≤ 4/𝜆1 (Proposition 6.3.3), we get an estimate on average 𝑐 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
(log 𝑛) 1/4 1 log 𝑛 1 + √ √ , 𝑛 𝜆1 𝑛 𝜆1 𝑛
thus proving the relation (17.19). Proof (of Lemma 17.5.2) Without loss of generality, assume that 𝑅 > 0 a.s. Put 𝛿𝑛 =
1 . 𝜆1 𝑛
We apply the Poincaré-type inequality ∫∫ ∫∫ 1 |𝑢(𝑥, 𝑦)| 2 d𝜇(𝑥) d𝜇(𝑦) ≤ |∇𝑢(𝑥, 𝑦)| 2 d𝜇(𝑥) d𝜇(𝑦), 𝜆1
(17.21)
with respect to the product measure 𝜇 ⊗ 𝜇, which holds true with the same Poincaré constant as for 𝜇 in the class of all smooth functions 𝑢 on R𝑛 × R𝑛 with (𝜇 ⊗ 𝜇)-mean zero. Moreover, according to Proposition 6.2.1, in this class, for any 𝑝 ≥ 2, ∫∫ ∫∫ 𝑝𝑝 |𝑢(𝑥, 𝑦)| 𝑝 d𝜇(𝑥) d𝜇(𝑦) ≤ |∇𝑢(𝑥, 𝑦)| 𝑝 d𝜇(𝑥) d𝜇(𝑦). (17.22) (2𝜆1 ) 𝑝/2 Let us also recall that, by Proposition 6.3.4 applied to the 2𝑛-dimensional random vector (𝑋, 𝑌 ), the event 𝐴 = {𝑅 ≤ 12 } has probability √ P( 𝐴) ≤ 3 e− 𝜆1 𝑛/2 . By (17.14), we have E
| ⟨𝑋,𝑌 ⟩ | 𝑅
≤ 𝑅𝑛, so that
| ⟨𝑋,𝑌 ⟩ | 𝑅
≤
1 2
𝑛 on the set 𝐴. This implies
𝑛 𝑐 | ⟨𝑋, 𝑌 ⟩ | 3𝑛 −√𝜆1 𝑛/2 1 𝐴 ≤ P( 𝐴) ≤ e ≤ 2 . 𝑅 2 2 𝜆1 𝑛
(17.23)
Similarly, E | ⟨𝑋, 𝑌 ⟩ | 1 𝐴 ≤
𝑛 𝑐 P( 𝐴) ≤ 2 , 4 𝜆1 𝑛
and since 𝑋 has mean zero, for the complementary set 𝐵 = {𝑅 > same bound E ⟨𝑋, 𝑌 ⟩ 1 𝐵 ≤ 𝑐 . 𝜆21 𝑛
1 2}
we have the
346
17 Applications of the Second Order Correlation Condition
Using once more (17.14), on the set 𝐴 we also have 𝑛 𝑐 P( 𝐴) ≤ 2 , 16 𝜆1 𝑛 𝑛 𝑐 4 E | ⟨𝑋, 𝑌 ⟩ | 𝑅 1 𝐴 ≤ P( 𝐴) ≤ 2 . 64 𝜆1 𝑛 E | ⟨𝑋, 𝑌 ⟩ | 𝑅 2 1 𝐴 ≤
Now, consider the function 𝑤(𝜀) = (1 + 𝜀) −1/2 on the half-axis 𝜀 ≥ − 34 . By Taylor’s formula, for some point 𝜀1 between − 34 and 𝜀, 𝑤(𝜀) = 1 −
1 3 15 1 3 𝜀 + 𝜀2 − (1 + 𝜀1 ) −7/2 𝜀 3 = 1 − 𝜀 + 𝜀 2 − 𝛽𝜀 3 2 8 16 2 8
with some 0 ≤ 𝛽 ≤ 40. Putting 𝜀 = 𝑅 2 − 1, we then get on the set 𝐵 ⟨𝑋, 𝑌 ⟩ 1 3 = ⟨𝑋, 𝑌 ⟩ − ⟨𝑋, 𝑌 ⟩ (𝑅 2 − 1) + ⟨𝑋, 𝑌 ⟩ (𝑅 2 − 1) 2 − 𝛽 ⟨𝑋, 𝑌 ⟩ (𝑅 2 − 1) 3 𝑅 2 8 15 5 3 ⟨𝑋, 𝑌 ⟩ − ⟨𝑋, 𝑌 ⟩ 𝑅 2 + ⟨𝑋, 𝑌 ⟩ 𝑅 4 − 𝛽 ⟨𝑋, 𝑌 ⟩ (𝑅 2 − 1) 3 . = 8 4 8 By the independence of 𝑋 and 𝑌 , and due to the mean zero assumption, E ⟨𝑋, 𝑌 ⟩ = E ⟨𝑋, 𝑌 ⟩ 𝑅 2 = 0. Hence, writing 1 𝐵 = 1 − 1 𝐴, we have E
⟨𝑋, 𝑌 ⟩ 15 5 3 = − E ⟨𝑋, 𝑌 ⟩ 1 𝐴 + E ⟨𝑋, 𝑌 ⟩ 𝑅 2 1 𝐴 − E ⟨𝑋, 𝑌 ⟩ 𝑅 4 1 𝐴 𝑅 8 4 8 3 4 2 3 + E ⟨𝑋, 𝑌 ⟩ 𝑅 − 𝛽 E ⟨𝑋, 𝑌 ⟩ (𝑅 − 1) 1 𝐵 . 8
Here, the first three expectations on the right-hand side do not exceed in absolute value a multiple 𝜆12 𝑛 . Thus, using (17.23), we get 1
E
⟨𝑋, 𝑌 ⟩ 𝑐1 3 ≤ 2 + E ⟨𝑋, 𝑌 ⟩ 𝑅 4 + 𝑐 2 E | ⟨𝑋, 𝑌 ⟩ | |𝑅 2 − 1| 3 , 𝑅 𝜆1 𝑛 8
(17.24)
where 𝑐 1 > 0 and 𝑐 2 > 0 are absolute constants. By Cauchy’s inequality, the square of the last expectation does not exceed E ⟨𝑋, 𝑌 ⟩ 2 E (𝑅 2 − 1) 6 = 𝑛 E (𝑅 2 − 1) 6 . In turn, the latter expectation may be bounded by virtue of the inequality (17.22) 1 (|𝑥| 2 + |𝑦| 2 ) − 1. Since applied with 𝑝 = 6 to the function 𝑢(𝑥, 𝑦) = 2𝑛 |∇𝑢(𝑥, 𝑦)| 2 = |∇ 𝑥 𝑢(𝑥, 𝑦)| 2 + |∇ 𝑦 𝑢(𝑥, 𝑦)| 2 =
|𝑥| 2 + |𝑦| 2 , 𝑛2
it gives E (𝑅 2 − 1) 6 ≤
𝑐 E 𝑅6 . 𝑛3
𝜆31
(17.25)
17.5 The Mean Value of 𝜌(𝐹𝜃 , Φ) in the Presence of Poincaré Inequalities
347
On the other hand, the Poincaré-type inequality easily yields the bound E𝑅 6 ≤ 𝑐/𝜆31 . However, in this step a more accurate estimation is required. Write 𝑅 6 = (𝑅 2 − 1) 3 + 3 (𝑅 2 − 1) 2 + 3 (𝑅 2 − 1) + 1, so that E𝑅 6 = E (𝑅 2 − 1) 3 + 3 E (𝑅 2 − 1) 2 + 1.
(17.26)
By (17.21) with the same function 𝑢, we have E (𝑅 2 − 1) 2 ≤
2 E𝑅 2 = 2𝛿 𝑛 , 𝜆1 𝑛
while (17.22) with 𝑝 = 3 gives 3 E |𝑅 2 − 1| 3 ≤ 27 𝛿3/2 𝑛 E |𝑅| .
Putting 𝑥 2 = E𝑅 6 (𝑥 > 0) and using E |𝑅| 3 ≤ 𝑥 (by applying Cauchy’s inequality), we therefore get from (17.26) that 𝑥 2 ≤ 27 𝛿3/2 𝑛 𝑥 + 6𝛿 𝑛 + 1. This quadratic inequality is easily solved to yield 𝑥 ≤ 𝑐 (𝛿 𝑛 + 1) 3/2 . One can now apply this bound in (17.25) to conclude that E (𝑅 2 − 1) 6 ≤
𝑐 (𝛿 𝑛 + 1) 3 . 𝑛3
𝜆31
This implies E ⟨𝑋, 𝑌 ⟩ 2 E (𝑅 2 − 1) 6 ≤
𝑐 (𝛿 𝑛 + 1) 3 , 𝑛2
𝜆31
which allows us to simplify the inequality (17.24) to the form E
⟨𝑋, 𝑌 ⟩ 𝑐1 𝑐2 3 ≤ 2 + 3/2 (𝛿 𝑛 + 1) 3/2 + E ⟨𝑋, 𝑌 ⟩ 𝑅 4 . 𝑅 8 𝜆1 𝑛 𝜆 𝑛 1
(17.27)
We are left with the estimation of the last expectation. Since E ⟨𝑋, 𝑌 ⟩ |𝑋 | 4 = E ⟨𝑋, 𝑌 ⟩ |𝑌 | 4 = 0, it follows that E ⟨𝑋, 𝑌 ⟩ 𝑅 4 =
1 1 2 2 2 2 ⟨𝑋, ⟩ E E |𝑋 | 𝑋 , 𝑌 |𝑋 | |𝑌 | = 2𝑛2 2𝑛2
where the latter expectation is understood in the usual vector sense. That is, in terms of the components in 𝑋, we have E |𝑋 | 2 𝑋 = (𝑎 1 , . . . , 𝑎 𝑛 ),
𝑎 𝑘 = E |𝑋 | 2 𝑋 𝑘 = E (|𝑋 | 2 − 𝑛) 𝑋 𝑘 .
348
17 Applications of the Second Order Correlation Condition
Since the collection {𝑋1 , . . . , 𝑋𝑛 } appears as an orthonormal system in 𝐿 2 (Ω, F , P), the numbers 𝑎 𝑘 represent the (Fourier) coefficients for the projection of the random variable |𝑋 | 2 − 𝑛 onto the span of 𝑋 𝑘 ’s. Hence, by Bessel’s inequality, 𝑛 ∑︁
2
4𝑛 E |𝑋 | 2 𝑋 2 = 𝑎 2𝑘 ≤ |𝑋 | 2 − 𝑛 𝐿 2 (P) = Var(|𝑋 | 2 ) = 𝑛 𝜎42 (𝑋) ≤ . 𝜆1 𝑘=1
Thus, E ⟨𝑋, 𝑌 ⟩ 𝑅 4 ≤
2 . 𝜆1 𝑛
In view of the upper bound 𝜆1 ≤ 1 (cf. Proposition 6.3.1), this expectation is dominated by the first term in (17.27), and we arrive at E
3/2 ⟨𝑋, 𝑌 ⟩ 𝑐 𝑐 1 ≤ 2 + 3/2 +1 . 𝑅 𝜆1 𝑛 𝜆 𝑛 𝜆1 𝑛 1
If 𝜆1 ≥ 𝑛−1 , the first term on the right-hand side dominates the second one, and we obtain the desired inequality (17.20). In the other case, we have 𝜆21 𝑛 ≥ 𝑛, and then (17.20) holds true, since, by (17.14), E thus proved.
⟨𝑋,𝑌 ⟩ 𝑅
1
≤ 𝑛 E𝑅 ≤ 𝑛. Lemma 17.5.2 is □
17.6 Deviations of 𝝆(𝑭𝜽 , 𝚽) in the Presence of Poincaré Inequalities In the next step we complement Proposition 17.5.1 with a large deviation bound in the presence of a Poincaré-type inequality for the distribution of the random vector 𝑋. As we know, for some absolute 𝑐 > 0, √
E e𝑐
𝜆1 | ⟨𝑋, 𝜃 ⟩ |
≤2
for all 𝜃 ∈ S𝑛−1
(cf. Proposition 6.1.1 and the Remarks after it). Hence, the parameter 𝜆 in Proposition 17.3.1 satisfies 𝜆 ≤ 𝑐√1𝜆 , and we are led to the bound 1
n 𝑐 log 𝑛 −2 o 𝜆1 𝑟 ≤ 2 exp{−𝑟 1/8 }, 𝔰𝑛−1 𝜌(𝐹𝜃 , Φ) ≥ 𝑛
𝑟 ≥ 0.
However, here the decay in 𝑟 on the right-hand side can be improved by a more careful estimation of the probability of the events 𝐴 𝛼 from Lemma 17.2.3, while 𝜆−2 1 on the left may be replaced with a smaller quantity 𝜆−1 . More precisely, we establish 1 the following refinement of Proposition 17.5.1.
17.6 Deviations of 𝜌(𝐹𝜃 , Φ) in the Presence of Poincaré Inequalities
349
Proposition 17.6.1 Let 𝑋 be an isotropic random vector in R𝑛 with mean zero, satisfying the Poincaré-type inequality with 𝜆1 > 0. Then with some absolute constant 𝑐 > 0, for all 𝑟 ≥ 0, n √ 𝑐 log 𝑛 −1 o 𝔰𝑛−1 𝜌(𝐹𝜃 , Φ) ≥ 𝜆1 𝑟 ≤ 2 exp − 𝑟 . 𝑛
(17.28)
This inequality readily implies a similar estimate on the mean given in Proposition 17.5.1. For the values 𝑟 = (𝛽 log 𝑛) 2 , 𝛽 > 0, (17.28) provides a polynomial bound n 𝑐𝛽2 (log 𝑛) 3 −1 o 𝜆1 ≤ 2𝑛−𝛽 . 𝔰𝑛−1 𝜌(𝐹𝜃 , Φ) ≥ 𝑛 In other words, if a number 𝐴 is large enough, then with high 𝔰𝑛−1 -probability 𝜌(𝐹𝜃 , Φ) ≤
𝐴 (log 𝑛) 3 −1 𝜆1 . 𝑛
Proof (of Proposition 17.6.1) Let us return to the Berry–Esseen-type√︁bound of Lemma 17.2.1. Using Λ ≤ 4/𝜆1 and assuming that 𝑇 = 𝑇0 𝑛, 𝑇0 ≥ 2 log 𝑛, the bound (7.3) gives, for all 𝜃 ∈ S𝑛−1 , 𝑐 𝜌(𝐹𝜃 , Φ) ≤
log 𝑛 −1 𝜆1 + 𝑛
∫
𝑇0
| 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 + 𝐿(𝜃). 𝑡
0
(17.29)
Here ∫
𝑇
𝐿(𝜃) = 𝑇0
| 𝑓 𝜃 (𝑡)| d𝑡, 𝑡
and 𝑐 > 0 is an absolute constant. In order to estimate 𝔰𝑛−1 -probabilities for large deviations of this term, let us recall that, by Proposition 6.3.5, ∑︁ 𝑝 2 𝑛𝑝 1√ ≤ 3 e− 3 𝜆1 𝑛 , P (𝑋 (𝑘) − 𝑌 (𝑘) ) ≤ 2 𝑘=1 where 𝑋 (𝑘) , 𝑌 (𝑘) (1 ≤ 𝑘 ≤ 𝑝) are independent copies of 𝑋. One may therefore √︁ apply Lemma 17.2.3 with parameter 𝛼 = 𝑝/2 and conclude that for some absolute constant 𝑐 > 0, 1√ (17.30) E 𝜃 𝐿 (𝜃) 2 𝑝 ≤ (𝑐 log 𝑛) 2 𝑝 𝑝 2 𝑝 𝑛−2 𝑝 + e− 3 𝜆1 𝑛 . Note that the integrals 𝐿 (𝜃) were defined according to (17.4) with limits of integration 𝑇0 =
√︁ 2 √︁ 𝑝 log 𝑛 = 8 log 𝑛 𝛼
and 𝑇 = 𝑇0 𝑛,
which is consistent with the choice of the parameters 𝑇0 and 𝑇 in (17.29). Using the inequality e−𝑥 ≤ ( 4e𝑥𝑝 ) 4 𝑝 (𝑥 > 0) and the property 𝜆1 ≤ 1, (17.30) simplifies to
350
17 Applications of the Second Order Correlation Condition
E 𝜃 𝐿 (𝜃) 2 𝑝 ≤
𝑐 log 𝑛 𝑛
𝜆−1 1
2𝑝
𝑝4 𝑝
(for some other constant), i.e., ∥𝐿 ∥ 𝐿 2 𝑝 (𝔰𝑛−1 ) ≤
𝑐 log 𝑛 −1 2 𝜆1 𝑝 . 𝑛
This inequality is readily extended to all real 𝑝 ≥ 1/2. Thus, for any 𝑝 ≥ 1, E 𝜃 𝐿(𝜃) 𝑝 ≤
𝑐 log 𝑛 𝑛
𝜆−1 1
𝑝
𝑝2 𝑝 .
One can now repeat the arguments which were used in the proof of Proposition 17.3.1. First, by Markov’s inequality, for all 𝑟 > 0, n 𝑝2 𝑝 𝑐e log 𝑛 −1 o 𝜆1 𝑟 ≤ . 𝔰𝑛−1 𝐿(𝜃) ≥ 𝑛 (e𝑟) 𝑝 √ Assuming that 𝑟 ≥ 1 and choosing 𝑝 = 𝑟, we thus have for some constant 𝑐 > 0 n √ 𝑐 log 𝑛 −1 o 𝜆1 𝑟 ≤ e− 𝑟 . 𝔰𝑛−1 𝐿(𝜃) ≥ 𝑛
(17.31)
This may be combined with a large deviation bound of Lemma 17.2.2. First, by Lemmas 17.4.2 and √︁ 17.5.2, and repeating the argument in the proof of Proposition 17.4.1 with 𝑇0′ = 4 log 𝑛, we have ∫ 0
𝑇0′
√︁
√︃ ∫ 𝑇0′ 𝐼 (𝑡) 1/2 𝐼 (𝑡) d𝑡 ≤ 𝑇0′ d𝑡 𝑡 𝑡2 0 √︃ 1 ⟨𝑋, 𝑌 ⟩ Λ2 1/2 1 ≤ 𝑐 𝑇0′ E + 2 ≤ 𝑐 ′ (log 𝑛) 1/4 . 𝑛 𝑅 𝜆1 𝑛 𝑛
Hence, using 𝑇0 ≤ 𝑇0′ ≤ 𝐴𝑛1/6 (where 𝐴 > 0 is an absolute constant) and once more Λ ≤ 4/𝜆1 , Lemma 17.2.2 yields ∫ 𝑇0 𝑐 log 𝑛 −1 | 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)| d𝑡 ≥ 𝜆 1 𝑟 ≤ 2 e−𝑟 . 𝔰𝑛−1 (17.32) 𝑡 𝜆1 𝑛 0 Thus, being applied in (17.29), the two inequalities (17.31)–(17.32) yield n √ 𝑐 log 𝑛 −1 o 𝜆 1 𝑟 ≤ 3 e− min{𝑟 , 𝑟 } . 𝔰𝑛−1 𝜌(𝐹𝜃 , Φ) ≥ 𝑛 Rescaling the variable 𝑟, one can now easily obtain the desired inequality (17.28). Proposition 17.6.1 is proved. □
17.7 Relation to the Thin Shell Problem
351
17.7 Relation to the Thin Shell Problem Returning to Proposition 17.5.1 with its bound E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 log 𝑛 −1 𝜆1 , 𝑛
recall that 𝜆1 = 𝜆1 (𝑋) > 0 for any log-concave random vector 𝑋 in R𝑛 . So, one may introduce the positive numbers 𝜆1,𝑛 = inf 𝜆1 (𝑋), 𝑋
2 𝜎4,𝑛 = sup 𝜎42 (𝑋), 𝑋
by taking the infimum and the supremum in the entire class of all isotropic, logconcave probability distributions on R𝑛 with mean zero. Since 𝜎42 (𝑋) = 𝑛1 Var(|𝑋 |) 2 1 2 is related to 𝜆 1 via 𝜎42 (𝑋) ≤ 4/𝜆1 (cf. Proposition 6.3.3), we have 𝜆−1 1,𝑛 ≥ 4 𝜎4,𝑛 . As was shown by Eldan [84], this inequality may be reversed in the form 1
≤ 𝑐 log 𝑛
𝜆1,𝑛
𝑛 𝜎2 ∑︁ 4,𝑘
𝑘 𝑘=1
for some absolute constant 𝑐 > 0. It may be simplified to 1 𝜆1,𝑛
2 ≤ 𝑐𝜎4,𝑛 (log 𝑛) 2 ,
(17.33)
2 is increasing in some sense (cf. Remark 17.7.2 below). since the sequence 𝜎4,𝑛 2 is bounded by an absolute Assuming that the thin shell conjecture stating that 𝜎4,𝑛 constant is true, Proposition 17.5.1 then yields the upper bound
E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 (log 𝑛) 3 𝑛
for any mean zero isotropic random vector 𝑋 in R𝑛 with a log-concave distribution. To refine the relationship between the central limit theorem and the thin-shell problem, let us complement Proposition 17.5.1 with the following general statement involving the maximal 𝜓1 -norm 𝜆 = max 𝜃 ∈S𝑛−1 ∥ ⟨𝑋, 𝜃⟩ ∥ 𝜓1 . Proposition 17.7.1 Let 𝑋 be a random vector in R𝑛 with E |𝑋 | 2 = 𝑛, satisfying the exponential moment condition (17.7). Then for some absolute constant 𝑐 > 0, 𝑐 𝜎42 (𝑋) ≤ 𝑛 (𝜆 log 𝑛) 4 E 𝜃 𝜌(𝐹𝜃 , Φ) +
𝜆4 +1 𝑛4
(17.34)
In the isotropic log-concave case, the linear functionals ⟨𝑋, 𝜃⟩ have log-concave distributions on the real line with second moment 1 for all 𝜃 ∈ S𝑛−1 . Hence the condition (17.7) is fulfilled for some absolute constant 𝜆 (cf. Corollary 2.5.2 or 2.6.4), which simplifies (17.34) to the relation
352
17 Applications of the Second Order Correlation Condition
𝑐 𝜎42 (𝑋) ≤ 𝑛 (log 𝑛) 4 E 𝜃 𝜌(𝐹𝜃 , Φ) + 1. Therefore, the potential property E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 log 𝑛 𝑛
(17.35)
would imply that 𝜎42 (𝑋) ≤ 𝑐 (log 𝑛) 5 .
(17.36)
Note that in the thin shell problem one may assume additionally (without loss of generality) that √ the distribution of 𝑋 is symmetric about zero. Indeed, define 𝑋 ′ = (𝑋 − 𝑌 )/ 2, where 𝑌 is an independent copy of a random vector 𝑋 with an isotropic log-concave distribution on R𝑛 (with mean zero). Then, the distribution of 𝑋 ′ is isotropic, log-concave, and symmetric about zero. Moreover, 1 Var |𝑋 | 2 + |𝑌 | 2 − 2 ⟨𝑋, 𝑌 ⟩ 4𝑛 1 1 1 1 = Var |𝑋 | 2 + Var |𝑌 | 2 + E ⟨𝑋, 𝑌 ⟩ 2 = 𝜎42 (𝑋) + 1. 4𝑛 4𝑛 𝑛 2
𝜎42 (𝑋 ′) =
Hence, if (17.36) is true for the random vector 𝑋 ′, then it holds for 𝑋 as well (with constant 2𝑐 in place of 𝑐). Note also that, starting from the normal approximation such as (17.35) and ap7 plying the inequalities (17.33)–(17.34), we get 𝜆−1 1,𝑛 ≤ 𝑐 (log 𝑛) . Thus, modulo 𝑛-dependent logarithmic factors, the following three assertions are equivalent up to positive constants 𝑐 and 𝛽 (perhaps different in different places): 𝛽 (i) 𝜆−1 1,𝑛 ≤ 𝑐 (log 𝑛) ; 2 (ii) 𝜎4,𝑛 ≤ 𝑐 (log 𝑛) 𝛽 ; (iii) E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ 𝑛𝑐 (log 𝑛) 𝛽 for any isotropic random vector 𝑋 in R𝑛 with a symmetric log-concave distribution.
Proof (of Proposition 17.7.1) In view of the triangle inequality 𝜌(𝐹, Φ) ≤ E 𝜃 𝜌(𝐹𝜃 , Φ), it is sufficient to derive (17.34) for 𝜌(𝐹, Φ) in place of E 𝜃 𝜌(𝐹𝜃 , Φ). As we know, the typical distribution function is represented as 𝐹 (𝑥) = P{|𝑋 | 𝜃 1 ≤ 𝑥}, assuming that the random vector 𝑋 and 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 viewed as a random vector uniformly distributed on the unit sphere are independent. This description yields ∫ ∞ 3 , 𝑥 4 d𝐹 (𝑥) = E |𝑋 | 4 E 𝜃 𝜃 14 = (𝑛2 + 𝜎42 𝑛) 𝑛(𝑛 + 2) −∞ or equivalently ∫
∞
−∞
𝑥 4 d𝐹 (𝑥) −
∫
∞
−∞
𝑥 4 dΦ(𝑥) =
3 (𝜎 2 − 2), 𝑛+2 4
(17.37)
17.7 Relation to the Thin Shell Problem
353
where 𝜎42 = 𝜎42 (𝑋). On the other hand, it follows from (17.7) that ∫
∞
e | 𝑥 |/𝜆 d𝐹 (𝑥) ≤ 2.
(17.38)
−∞
Using 𝑡 2 ≤ 4e−2 e𝑡 (𝑡 ≥ 0) together with the property √ (17.38) implies that 𝜆 ≥ e/ 8, which in turn implies 1 − Φ(𝑥) ≤
1 −𝑥 2 /2 e ≤ e−𝑥/𝜆 , 2
∫∞ −∞
𝑥 2 d𝐹 (𝑥) =
1 𝑛
E |𝑋 | 2 = 1,
𝑥 ≥ 0.
In addition, by Markov’s inequality, 𝐹 (−𝑥) + (1 − 𝐹 (𝑥)) ≤ 2 e−𝑥/𝜆 , so that |𝐹 (−𝑥) − Φ(−𝑥)| + |𝐹 (𝑥) − Φ(𝑥)| ≤ 4 e−𝑥/𝜆 for all 𝑥 ≥ 0. Hence, integrating by parts, we see that, for any 𝑇 ≥ 6𝜆, the absolute value of the left-hand side of (17.37) does not exceed ∫ 𝑇 ∫ ∞ 4 |𝑥| 3 |𝐹 (𝑥) − Φ(𝑥)| d𝑥 + 16 𝑥 3 e−𝑥/𝜆 d𝑥 ≤ 2𝑇 4 𝜌(𝐹, Φ) + 32 𝜆𝑇 3 e−𝑇/𝜆 . −𝑇
𝑇
Choosing 𝑇 = 9𝜆 log 𝑛, we get i h 𝜆4 𝜎42 − 2 ≤ 𝐶𝑛 𝜆4 (log 𝑛) 4 𝜌(𝐹, Φ) + 9 (log 𝑛) 3 , 𝑛 thus proving (17.34).
□
2 . Given two Remark 17.7.2 Let us comment on the monotonicity of the sequence 𝜎4,𝑛 independent isotropic random vectors 𝑋 and 𝑌 with values in R𝑛 and R 𝑘 , respectively, and having log-concave distributions, the random vector 𝑍 = (𝑋, 𝑌 ) is isotropic in R𝑛+𝑘 and has a log-concave distribution as well. Since |𝑍 | 2 = |𝑋 | 2 + |𝑌 | 2 , we have 2 Var(|𝑍 | 2 ) = Var(|𝑋 | 2 ) + Var(|𝑌 | 2 ). Choosing 𝑋 and 𝑌 such that Var(|𝑋 | 2 ) = 𝑛𝜎4,𝑛 2 2 and Var(|𝑌 | ) = 𝑘𝜎4,𝑘 , it follows that 2 𝜎4,𝑛+𝑘 ≥
𝑛 𝑘 𝜎2 + 𝜎2 . 𝑛 + 𝑘 4,𝑛 𝑛 + 𝑘 4,𝑘
(17.39)
2 2 implying that the sequence 𝜎 2 is non-decreasing along In particular, 𝜎4,2𝑛 ≥ 𝜎4,𝑛 4,𝑛 2 ≥ 1 𝜎 2 whenever 𝑛 ≤ 𝑘 ≤ 𝑛. One the powers 𝑛 = 2𝑚 . In addition, by (17.39), 𝜎4,𝑛 2 4,𝑘 2 may now combine these two properties. Given 1 ≤ 𝑘 < 𝑛, let 2𝑚−1 < 𝑛 ≤ 2𝑚 and 2𝑙−1 ≤ 𝑘 ≤ 2𝑙 for some integers 𝑚 ≥ 𝑙 ≥ 1. In the case 𝑙 = 𝑚, the previous bound 2 ≥ 1 𝜎 2 holds true. If 𝑙 < 𝑚, then similarly 𝜎4,𝑛 2 4,𝑘 2 𝜎4,𝑛 ≥
1 2 1 2 1 2 𝜎 𝑚−1 ≥ 𝜎4,2 𝜎 . 𝑙 ≥ 2 4,2 2 4 4,𝑘
354
17 Applications of the Second Order Correlation Condition
2 ≥ This shows that 𝜎4,𝑛
1 4
2 whenever 𝑛 > 𝑘 ≥ 1. In particular, 𝜎4,𝑘
𝑛 𝜎2 ∑︁ 4,𝑘
𝑘 𝑘=1
2 ≤ 4𝜎4,𝑛
𝑛 ∑︁ 1 2 ≤ 4𝜎4,𝑛 (1 + log 𝑛). 𝑘 𝑘=1
17.8 Remarks The material of this chapter is based on the papers [39] and [40].
Part VI
Distributions and Coefficients of Special Type
Chapter 18
Special Systems and Examples
We will now illustrate some general results on the closeness of the distributions 𝐹𝜃 of the weighted sums ⟨𝑋, 𝜃⟩, 𝜃 ∈ S𝑛−1 , to the typical distribution 𝐹 = E 𝜃 𝐹𝜃 on several specific classes of random vectors 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R𝑛 . Most of them can be united in one scheme, where it is possible to develop a common approach to lower bounds on E 𝜃 𝜌(𝐹𝜃 , 𝐹) and E 𝜃 𝜔(𝐹𝜃 , Φ) that are similar to upper bounds (modulo logarithmic terms with respect to the dimension 𝑛).
18.1 Systems with Lipschitz Condition We start with the description of a general situation where a rate of order 𝑛−1/2 is achievable, namely for the 𝐿 2 -distance 𝜔(𝐹𝜃 , 𝐹) on average with respect to 𝜃 and therefore for the Kolmogorov distances 𝜌(𝐹𝜃 , 𝐹) and 𝜌(𝐹𝜃 , Φ) modulo a logarithmic factor. While the upper bounds of order 𝑛−1/2 for these distances have been studied in detail, in this section we focus on conditions that provide similar lower bounds. Definition. Let 𝐿 be a fixed measurable function on the underlying probability space (Ω, F , P). We will say that the system 𝑋1 , . . . , 𝑋𝑛 of random variables on (Ω, F , P), or the random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R𝑛 satisfies a Lipschitz condition with a parameter function 𝐿 if max |𝑋 𝑘 (𝑡) − 𝑋 𝑘 (𝑠)| ≤ 𝑛 |𝐿(𝑡) − 𝐿(𝑠)|,
1≤𝑘 ≤𝑛
𝑡, 𝑠 ∈ Ω.
(18.1)
When Ω is an interval of the real line (finite or not), and 𝐿 (𝑡) = 𝐿𝑡, 𝐿 > 0, this condition means that every function 𝑋 𝑘 in the system has Lipschitz semi-norm at most 𝐿𝑛. As before, we involve the variance functional 𝜎42 = 𝑛1 Var(|𝑋 | 2 ).
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_18
357
358
18 Special Systems and Examples
Proposition 18.1.1 Suppose that E |𝑋 | 2 = 𝑛. If the random vector 𝑋 satisfies the Lipschitz condition with a parameter function 𝐿, then E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≥
𝑐 𝐿 𝑐 0 (1 + 𝜎44 ) − 𝑛 𝑛2
(18.2)
for some absolute constant 𝑐 0 > 0 and with a constant 𝑐 𝐿 > 0 depending on the distribution of 𝐿 only. Moreover, if 𝐿 has a finite second moment, then 𝑐 0 (1 + 𝜎44 ) 𝑐1 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≥ √︁ − 𝑛2 𝑛 Var(𝐿)
(18.3)
for some absolute constant 𝑐 1 > 0. We also have similar estimates when 𝐹 is replaced with the standard normal distribution function Φ. Note that, if 𝑋1 , . . . , 𝑋𝑛 form an orthonormal system in 𝐿 2 (Ω, F , P), and 𝐿 has a finite second moment ∥𝐿∥ 22 = E𝐿 2 , then this moment has to be bounded from below by a multiple of 1/𝑛2 . Indeed, integrating the inequality |𝑋 𝑘 (𝑡) − 𝑋 𝑘 (𝑠)| 2 ≤ 𝑛2 |𝐿(𝑡) − 𝐿(𝑠)| 2 over the product measure P ⊗ P, we obtain a lower bound 𝑛2 Var(𝐿) ≥ Var(𝑋 𝑘 ) = 1 − (E𝑋 𝑘 ) 2 . Í Summing over all 𝑘 ≤ 𝑛 and using 𝑛𝑘=1 (E𝑋 𝑘 ) 2 ≤ 1 (cf. Proposition 1.1.2), we get Var(𝐿) ≥
1 𝑛−1 ≥ 2 𝑛3 2𝑛
(𝑛 ≥ 2).
The Lipschitz condition (18.1) guarantees the validity of the following property. Lemma 18.1.2 Suppose that the random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) satisfies the Lipschitz condition with the parameter function 𝐿, and let 𝑌 be an independent copy of 𝑋. Then n o 𝑐√𝜆 2 , 0 ≤ 𝜆 ≤ 1, P |𝑋 − 𝑌 | ≤ 𝜆𝑛 ≥ 𝑛 where the constant 𝑐 > 0 depends on the distribution of 𝐿 only. Moreover, if 𝐿 has a finite second moment, then √ n o 𝜆 2 P |𝑋 − 𝑌 | ≤ 𝜆𝑛 ≥ √︁ , 0 ≤ 𝜆 ≤ 𝑛2 Var(𝐿). 6𝑛 Var(𝐿) In turn, this lemma is based on the following general observation.
18.1 Systems with Lipschitz Condition
359
Lemma 18.1.3 If 𝜂 is an independent copy of a random variable 𝜉, then for any 𝜀0 > 0, P{|𝜉 − 𝜂| ≤ 𝜀} ≥ 𝑐𝜀, 0 ≤ 𝜀 ≤ 𝜀0 , with √︁ some constant 𝑐 > 0 independent of 𝜀. Moreover, if the standard deviation 𝜎 = Var(𝜉) is finite, then P{|𝜉 − 𝜂| ≤ 𝜀} ≥
1 𝜀, 6𝜎
0 ≤ 𝜀 ≤ 𝜎.
Proof The difference 𝜉 − 𝜂 has a non-negative characteristic function ℎ(𝑡) = |𝜓(𝑡)| 2 , where 𝜓 is the characteristic function of 𝜉. Denoting by 𝐻 the distribution function of 𝜉 − 𝜂, we start with a general identity ∫ ∞ ∫ ∞ 𝑝(𝑥) ˆ d𝐻 (𝑥) = 𝑝(𝑡)ℎ(𝑡) d𝑡, (18.4) −∞
−∞
which is valid for any integrable function 𝑝(𝑡) on the real line with Fourier transform ∫ ∞ 𝑝(𝑥) ˆ = e𝑖𝑡 𝑥 𝑝(𝑡) d𝑡, 𝑥 ∈ R. −∞
Given 𝜀 > 0, here we take a standard pair 𝑝(𝑡) =
𝜀𝑡 1 sin 2 2 , 𝜀𝑡 2𝜋 2
𝑝(𝑥) ˆ =
1 |𝑥| + 1− . 𝜀 𝜀
In this case, ∫
∞
𝑝(𝑥) ˆ d𝐻 (𝑥) ≤ −∞
1 𝜀
∫ d𝐻 (𝑥) = [−𝜀, 𝜀 ]
On the other hand, since the function ∫
∞
𝑝(𝑡)ℎ(𝑡) d𝑡 ≥ −∞
sin 𝑢 𝑢
2 1 2 sin(1/2) 2𝜋
1 P{|𝜉 − 𝜂| ≤ 𝜀}. 𝜀
is decreasing in 0 < 𝑢
0 depends on the distributions of 𝐿 1 and 𝐿 2 only. Proof By the Lipschitz condition (18.5), for any 𝑘 ≤ 𝑛, |𝑋 𝑘 (𝑡1 , 𝑡2 ) − 𝑋 𝑘 (𝑠1 , 𝑠2 )| 2 ≤ 2𝑛2 |𝐿 1 (𝑡1 ) − 𝐿 1 (𝑠1 )| + 2 |𝐿 2 (𝑡2 ) − 𝐿 2 (𝑠2 )| 2 , so, |𝑋 (𝑡 1 , 𝑡2 ) − 𝑌 (𝑠1 , 𝑠2 )| 2 =
𝑛 ∑︁
|𝑋 𝑘 (𝑡 1 , 𝑡2 ) − 𝑋 𝑘 (𝑠1 , 𝑠2 )| 2
𝑘=1
≤ 2𝑛3 |𝐿 1 (𝑡 1 ) − 𝐿 1 (𝑠1 )| 2 + 2𝑛 |𝐿 2 (𝑡 2 ) − 𝐿 2 (𝑠2 )| 2 .
18.1 Systems with Lipschitz Condition
361
Putting 𝐿 1 (𝑡1 , 𝑡2 ) = 𝐿 1 (𝑡1 ) and 𝐿 2 (𝑡1 , 𝑡2 ) = 𝐿 2 (𝑡 2 ), one may treat 𝐿 1 and 𝐿 2 as independent random variables. If 𝐿 1′ is an independent copy of 𝐿 1 , and 𝐿 2′ is an independent copy of 𝐿 2 (and 𝐿 1 , 𝐿 1′ , 𝐿 2 , 𝐿 2′ are independent), we obtain that o n n 𝜆o P |𝑋 − 𝑌 | 2 ≤ 𝜆𝑛 ≥ P 𝑛2 |𝐿 1 − 𝐿 1′ | 2 + |𝐿 2 − 𝐿 2′ | 2 ≤ 2 n o n 𝜆 𝜆o ≥ P 𝑛2 |𝐿 1 − 𝐿 1′ | 2 ≤ P |𝐿 2 − 𝐿 2′ | 2 ≤ 4 4 n o n √ √ o 1 1 = P |𝐿 1 − 𝐿 1′ | ≤ 𝜆 P |𝐿 2 − 𝐿 2′ | ≤ 𝜆 . 2𝑛 2 It remains to apply Lemma 18.1.3.
□
Let us now explain how to apply Lemma 18.1.2 and Lemma 18.1.4 to obtain a general lower bound on the 𝐿 2 -distance ∫ ∞ 1/2 𝜔(𝐹𝜃 , 𝐹) = |𝐹𝜃 (𝑥) − 𝐹 (𝑥)| 2 d𝑥 . −∞
Here we denote by 𝐹𝜃 (𝑥) the distribution function of the linear form ⟨𝑋, 𝜃⟩ with 𝜃 ∈ S𝑛−1 and by 𝐹 (𝑥) = E 𝐹𝜃 (𝑥), 𝑥 ∈ R, the typical distribution function. To this end, let us recall the bound of Proposition 15.6.2, n 1 + 𝜎44 1 o , E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≥ 𝑐 0 P |𝑋 − 𝑌 | 2 ≤ 𝑛 − 𝑐 1 4 𝑛2 valid for some positive absolute constants 𝑐 0 and 𝑐 1 under the assumption that E |𝑋 | 2 = 𝑛. Combining this bound with Lemma 18.1.2, we arrive at the lower bounds (18.2)–(18.3). But, using Lemma 18.1.4, we come to a similar conclusion: Proposition 18.1.5 The statement of Proposition 18.1.1 continues to hold under the Lipschitz condition (18.5). Thus, for some absolute constant 𝑐 0 > 0, E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≥
𝑐 𝑐 0 (1 + 𝜎44 ) − , 𝑛 𝑛2
where the constant 𝑐 > 0 depends on the distributions of 𝐿 1 and 𝐿 2 only. A similar estimate also holds when 𝐹 is replaced with the normal distribution function Φ. The assertion on the normal approximation follows from the inequality 𝜔(𝐹, Φ) ≤ 𝐶 (1 + 𝜎42 )/𝑛, cf. Proposition 12.5.1.
362
18 Special Systems and Examples
18.2 Trigonometric Systems For an illustration of Propositions 18.1.1 and 18.1.5, let us first consider a few examples √where the distribution of the random vector 𝑋 is supported on the sphere of radius 𝑛 (in which case 𝜎42 = 0). The first classical example is described by the trigonometric system 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) with components √ 𝑋2𝑘−1 (𝑡) = 2 cos(𝑘𝑡), √ 𝑋2𝑘 (𝑡) = 2 sin(𝑘𝑡) (−𝜋 < 𝑡 < 𝜋, 𝑘 = 1, . . . , 𝑛/2), assuming that 𝑛 is even. They are treated as random variables on the probability space Ω = (−𝜋, 𝜋) equipped with the normalized Lebesgue measure P (the uniform distribution). Note that the linear forms 𝑛 2 √ ∑︁ ⟨𝑋, 𝜃⟩ = 2 𝜃 2𝑘−1 cos(𝑘𝑡) + 𝜃 2𝑘 sin(𝑘𝑡) ,
𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 ,
𝑘=1
√ represent trigonometric polynomials of degree at most 𝑛2 . The normalization 2 is chosen √ for the convenience only, since in this case 𝑋 is isotropic and satisfies |𝑋 | = 𝑛. By Sudakov’s theorem, the distributions 𝐹𝜃 are almost standard normal for most of 𝜃. And as usual, the main question we study is to determine the rate of approximation of 𝐹𝜃 by Φ for canonical probability metrics. Since for all 𝑘 ≤ 𝑛2 , we have √ 𝑛 |𝑋 𝑘 (𝑡) − 𝑋 𝑘 (𝑠)| ≤ 𝑘 2 |𝑡 − 𝑠| ≤ √ |𝑡 − 𝑠|, 2
𝑡, 𝑠 ∈ Ω,
√ the Lipschitz condition is fulfilled with the function 𝐿 (𝑡) = 𝑡/ 2, in which case Var(𝐿) = 𝜋 2 /6. Hence Proposition 18.1.1 yields the lower bound E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≥
𝑐1 𝑐2 − 2 𝑛 𝑛
for some absolute constants 𝑐 𝑗 > 0. Such an estimate may also be obtained by applying Corollary 15.4.3, which gives E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≥
𝑐 E ⟨𝑋, 𝑌 ⟩ 4 − 𝑂 (𝑛−2 ), 𝑛4
where 𝑌 is an independent copy of 𝑋. On the other hand, according to Proposition 15.2.1, E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≤ 𝑛𝑐 . Hence, we arrive at: Proposition 18.2.1 For the trigonometric system 𝑋 we have for some positive absolute constants 𝑐 1 > 𝑐 0 > 0, 𝑐0 𝑐1 ≤ E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≤ . 𝑛 𝑛
(18.6)
18.2 Trigonometric Systems
363 𝑐′
The inequality 𝑐𝑛1 − 𝑛𝑐22 ≥ 𝑛1 holds true with some absolute constant 𝑐 1′ > 0 for all 𝑛 large enough. So, strictly speaking, the lower bound in (18.6) may only be stated for 𝑛 ≥ 𝑛0 (on the basis of Proposition 18.1.1). It should however be clear that E 𝜃 𝜔2 (𝐹𝜃 , Φ) does not vanish (since trigonometric polynomials are bounded and therefore may not have a normal distribution). Hence, one may involve in (18.6) all values of 𝑛. Applying Proposition 16.4.4 with 𝐷 = 𝑐 0 /𝑛, there are similar bounds for the 𝐿 1 -norm (modulo logarithmic factors). Namely, it gives 𝑐0
𝑐1 √ ≤ E 𝜃 𝜔(𝐹𝜃 , Φ) ≤ √ . 𝑛 (log 𝑛) 𝑛 15 4
We also get an analogous pointwise lower bound on the “essential” part of the unit sphere. It seems however natural that the logarithmic factor could be removed from the left-hand side. A similar statement is also true for the Kolmogorov distance. Combining Proposition 14.6.1 with Corollary 16.4.5, we have: Proposition 18.2.2 For the trigonometric system 𝑋 we have with some positive absolute constants 𝑐 1 > 𝑐 0 > 0, 𝑐0 𝑐 1 log 𝑛 . √ ≤ E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ √ 4 (log 𝑛) 𝑛 𝑛
(18.7)
Analogous results continue to hold for the cosine trigonometric system 𝑋 with components √ 𝑋 𝑘 (𝑡) = 2 cos(𝑘𝑡) (0 < 𝑡 < 𝜋, 𝑘 = 1, . . . , 𝑛) on the probability space Ω = (0, 𝜋) equipped with the normalized Lebesgue measure P. In that case, the linear forms 𝑛 √ ∑︁ ⟨𝑋, 𝜃⟩ = 2 𝜃 𝑘 cos(𝑘𝑡),
𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 ,
𝑘=1
√ represent trigonometric polynomials of degree 𝑛. Due to the normalization 2, the √ distribution of 𝑋 is isotropic in R𝑛 . The property |𝑋 | = 𝑛 is not true anymore. √ However, there is a pointwise bound |𝑋 | ≤ 2𝑛. In addition, the variance functional 𝜎42 = 𝑛1 Var(|𝑋 | 2 ) is bounded by a universal constant for all 𝑛 ≥ 1 (and actually it does not depend on 𝑛). Indeed, write 𝑋 𝑘2 = 2 cos2 (𝑘𝑡) = 1 + 12 (e2𝑖𝑘𝑡 + e−2𝑖𝑘𝑡 ), so that ∑︁ ∑︁ 2 (|𝑋 | 2 − 𝑛) = e2𝑖𝑘𝑡 , 4 (|𝑋 | 2 − 𝑛) 2 = e2𝑖 (𝑘+𝑙)𝑡 . 0< |𝑘 | ≤𝑛
0< |𝑘 |, |𝑙 | ≤𝑛
It follows that 4 Var(|𝑋 | 2 ) =
∑︁ 0< |𝑘 |, |𝑙 | ≤𝑛
E e2𝑖 (𝑘+𝑙)𝑡 =
∑︁ 0< |𝑘 | ≤𝑛, 𝑙=−𝑘
1 = 2𝑛.
364
18 Special Systems and Examples
Hence
1 1 Var(|𝑋 | 2 ) = . 𝑛 2 Now, the Lipschitz condition for the system 𝑋 is fulfilled with the function 𝐿 (𝑡) = √ 2 𝑡. Hence Proposition 18.1.1 yields a similar lower bound E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≥ 𝑐𝑛1 − 𝑛𝑐22 . On the other hand, by Proposition 15.2.1, 𝜎42 =
E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≤
𝑐 (1 + 𝜎42 ) 3𝑐/2 = . 𝑛 𝑛
As for the Kolmogorov distance, one may appeal again to Proposition 14.6.1 and Corollary 16.4.5. Proposition 18.2.3 For the cosine trigonometric system 𝑋 on the interval (0, 𝜋), the bounds (18.6)–(18.7) are also true for some positive absolute constants. Í Let us also note that the sums 𝑛𝑘=1 cos(𝑘𝑡) remain bounded for growing 𝑛 (for any fixed 0 < 𝑡 < 𝜋). Hence the normalized sums √ ∑︁ 𝑛 2 cos(𝑘𝑡), 𝑆 𝑛 (𝑡) = √ 𝑛 𝑘=1 which correspond to ⟨𝑋, 𝜃⟩ with equal coefficients, are convergent to zero pointwise on Ω as 𝑛 → ∞. In particular, they fail to satisfy the central limit theorem.
18.3 Chebyshev Polynomials An example closely related to the cosine trigonometric system is represented by the normalized Chebyshev polynomials. They are defined by √ 𝑋 𝑘 (𝑡) = 2 cos(𝑘 arccos 𝑡) 𝑛 𝑛 i √ h 𝑡 𝑛−2 (1 − 𝑡 2 ) + = 2 𝑡𝑛 − 𝑡 𝑛−4 (1 − 𝑡 2 ) 2 − . . . , 2 4 which we consider for 𝑘 = 1, 2, . . . , 𝑛. These polynomials are orthonormal on the interval Ω = (−1, 1) with respect to the probability measure 1 dP(𝑡) = √ , d𝑡 𝜋 1 − 𝑡2
−1 < 𝑡 < 1,
cf. e.g. [114], [120] (for simplicity, we exclude the constant function 𝑋0 (𝑡) = 1). Similarly to the cosine trigonometric system, for the random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) we find that ∑︁ 4 (|𝑋 | 2 − 𝑛) 2 = exp{2𝑖(𝑘 + 𝑙) arccos 𝑡}. 0< |𝑘 |, |𝑙 | ≤𝑛
18.4 Functions of the Form 𝑋𝑘 (𝑡 , 𝑠) = 𝑓 (𝑘𝑡 + 𝑠)
365
It follows that ∑︁
4 Var(|𝑋 | 2 ) = 1 𝑛
1 = 2𝑛,
0< |𝑘 | ≤𝑛
0< |𝑘 |, |𝑙 | ≤𝑛
so that 𝜎42 =
∑︁
E exp{2𝑖(𝑘 + 𝑙) arccos 𝑡} =
Var(|𝑋 | 2 ) = 12 . Now, for all 𝑘 ≤ 𝑛,
√ |𝑋 𝑘 (𝑡) − 𝑋 𝑘 (𝑠)| ≤ 𝑘 2 | arccos 𝑡 − arccos 𝑠|,
𝑡, 𝑠 ∈ Ω,
which implies that √ the Lipschitz condition for the system 𝑋 is fulfilled with the function 𝐿 (𝑡) = 2 arccos 𝑡. One may also notice that ∫
1
Var(𝐿) = 2
(arcsin 𝑡) 2 dP(𝑡) =
−1
𝜋2 . 6
Hence Proposition 18.1.1 yields the lower bound E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≥
𝑐1 𝑐2 − 2 𝑛 𝑛
for some absolute constants 𝑐 > 0. Together with Proposition 15.2.1, we obtain a full analogue of Proposition 18.2.1, while an application of Proposition 14.6.1 and Corollary 16.4.5 lead to an analogue of Proposition 18.2.2. Proposition 18.3.1 For the system (𝑋1 , . . . , 𝑋𝑛 ) of the first 𝑛 Chebyshev polynomials on (−1, 1), we have with some positive absolute constants 𝑐 1 > 𝑐 0 > 0, 𝑐0 𝑐1 ≤ E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≤ , 𝑛 𝑛 𝑐0 𝑐 1 log 𝑛 . √ ≤ E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ √ (log 𝑛) 4 𝑛 𝑛
18.4 Functions of the Form 𝑿 𝒌 (𝒕, 𝒔) = 𝒇 (𝒌 𝒕 + 𝒔) Let 𝑓 be a real-valued, 1-periodic Borel measurable function on the real line such ∫1 ∫1 that 0 𝑓 (𝑥) d𝑥 = 0 and 0 𝑓 (𝑥) 2 d𝑥 = 1. On the square Ω = (0, 1) × (0, 1), which we equip with the Lebesgue measure P, we consider the system of random variables 𝑋 𝑘 (𝑡, 𝑠) = 𝑓 (𝑘𝑡 + 𝑠)
(0 < 𝑡, 𝑠 < 1).
As we know from Section 2.2, {𝑋 𝑘 }∞ 𝑘=1 represents a strictly stationary sequence of pairwise independent random variables on Ω. Since ∫
1
𝑓 (𝑥) d𝑥 = 0,
E𝑋 𝑘 = 0
E𝑋 𝑘2 =
∫ 0
1
𝑓 (𝑥) 2 d𝑥 = 1,
366
18 Special Systems and Examples
the vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) is isotropic, and its components are pairwise independent. The latter insures that, if 𝑓 has finite 4-th moment on (0, 1), the variance functional ∫ 1 1 2 2 𝜎4 = Var(|𝑋 | ) = 𝑓 (𝑥) 4 d𝑥 − 1 𝑛 0 is finite and does not depend on 𝑛. In addition, if the function 𝑓 has finite Lipschitz semi-norm ∥ 𝑓 ∥ Lip , then for all (𝑡1 , 𝑡2 ) and (𝑠1 , 𝑠2 ) in Ω, |𝑋 𝑘 (𝑡 1 , 𝑡2 ) − 𝑋 𝑘 (𝑠1 , 𝑠2 )| ≤ ∥ 𝑓 ∥ Lip 𝑘 |𝑡1 − 𝑠1 | + |𝑡2 − 𝑠2 | . This means that the Lipschitz condition (18.5) is fulfilled with linear functions 𝐿 1 and 𝐿 2 . Hence, one may apply Proposition 18.1.5, giving the lower bound E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≥
𝑐 (1 + 𝜎44 ) 𝑐𝑓 − . 𝑛 𝑛2
𝑐′
Hence E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≥ 𝑛𝑓 for all 𝑛 ≥ 𝑛0 , where the positive constants 𝑐 𝑓 and 𝑐 ′𝑓 , as well as the integer 𝑛0 ≥ 1, depend on the distribution of 𝑓 only. Since the collection {𝐹𝜃 } 𝜃 ∈S𝑛−1 is separated from the normal distribution function Φ in the weak sense for 𝑛 < 𝑛0 (by the uniform boundedness of the 𝑋 𝑘 ’s), the latter lower bound holds true for all 𝑛 ≥ 1. On the other hand, by Proposition 15.2.1, there is a similar upper bound under the 4-th moment condition, and then we obtain another full analogue of Proposition 18.2.1. Also,√as was used above, Lipschitz functions on (0, 1) are bounded, so that |𝑋 | ≤ 𝑏 𝑛 with 𝑏 = sup 𝑥 | 𝑓 (𝑥)|, and one may apply Corollary 16.4.5. Using Proposition 14.6.1, we thus obtain the analogue of Proposition 18.2.2. Proposition 18.4.1 Assume that a 1-periodic function 𝑓 has finite 4-th moment on ∫1 ∫1 (0, 1) and is normalized so that 0 𝑓 (𝑥) d𝑥 = 0, 0 𝑓 (𝑥) 2 d𝑥 = 1. Then E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≤
𝑐1 , 𝑛
E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 2 log 𝑛 √ 𝑛
for all 𝑛 ≥ 2 with some positive constants 𝑐 𝑗 depending on 𝑓 only. Moreover, if 𝑓 has a finite Lipschitz semi-norm, then E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≥
𝑐0 , 𝑛
E 𝜃 𝜌(𝐹𝜃 , Φ) ≥
𝑐0 √ . (log 𝑛) 4 𝑛
Choosing 𝑓 (𝑡) = cos 𝑡, we obtain the system 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) with 𝑋 𝑘 (𝑡, 𝑠) = cos(𝑘𝑡 + 𝑠), which is closely related to the cosine trigonometric system. The main difference is however the property that the 𝑋 𝑘 ’s are now pairwise independent. Nevertheless, the normalized sums
18.5 The Walsh System on the Discrete Cube
367
𝑛
1 ∑︁ cos(𝑘𝑡 + 𝑠) √ 𝑛 𝑘=1 fail to satisfy the central limit theorem (since they converge pointwise to zero). One can construct other similar examples in the form 𝑓 (𝑘𝑡 + 𝑠).
18.5 The Walsh System on the Discrete Cube Given an integer 𝑑 ≥ 1, consider the functions on the discrete cube Ω = {−1, 1} 𝑑 , Ö 𝑋 𝜏 (𝑡) = 𝑡 𝑘 , 𝑡 = (𝑡1 , . . . , 𝑡 𝑑 ) ∈ Ω, 𝜏 ⊂ {1, . . . , 𝑑}, 𝑘 ∈𝜏
where one puts 𝑋 ∅ = 1. Equipping the discrete cube with the uniform (i.e., normalized counting) measure P, we obtain 2𝑑 random variables which form a complete orthonormal system in 𝐿 2 (Ω, P), called the Walsh system on the discrete cube. Note that each 𝑋 𝜏 with 𝜏 ≠ ∅ is a symmetric Bernoulli random variable: It takes only two values, −1 and 1, each with probability 12 . For simplicity, let us exclude the constant random variable 𝑋 ∅ and consider the random vector 𝑋 = {𝑋 𝜏 } 𝜏≠∅ in R𝑛 of dimension 𝑛 = 2𝑑 − 1 (the ordering of the components does not play any role). As before, we denote by 𝐹𝜃 the distribution function of the linear forms ∑︁ ⟨𝑋, 𝜃⟩ = 𝜃 𝜏 𝑋 𝜏 , 𝜃 = {𝜃 𝜏 } 𝜏≠∅ ∈ S𝑛−1 . 𝜏≠∅
√ Since |𝑋 𝜏 | = 1 and thus |𝑋 | = 𝑛, for the study of the asymptotic behavior of the 𝐿 2 -distance 𝜔(𝐹𝜃 , 𝐹) and 𝜔(𝐹𝜃 , Φ) on average, one may apply the two-sided bounds of Proposition 15.4.1 or Corollaries 15.4.2–15.4.3. Let 𝑌 be an independent copy of 𝑋, which we realize on the product space Ω2 = Ω × Ω with product measure P2 = P × P by Ö Ö 𝑋 𝜏 (𝑡, 𝑠) = 𝑡 𝑘 , 𝑌𝜏 (𝑡, 𝑠) = 𝑠 𝑘 𝑡 = (𝑡 1 , . . . , 𝑡 𝑑 ), 𝑠 = (𝑠1 , . . . , 𝑠 𝑑 ) ∈ Ω. 𝑘 ∈𝜏
𝑘 ∈𝜏
Then the inner product ⟨𝑋, 𝑌 ⟩ =
∑︁ 𝜏≠∅
𝑋 𝜏 (𝑡, 𝑠)𝑌𝜏 (𝑡, 𝑠) = −1 +
𝑑 Ö
(1 + 𝑡 𝑘 𝑠 𝑘 )
𝑘=1
takes only two values, namely 2𝑑 − 1 in the case 𝑡 = 𝑠, and −1 if 𝑡 ≠ 𝑠. Hence
368
18 Special Systems and Examples
E ⟨𝑋, 𝑌 ⟩ 4 = (2𝑑 − 1) 4 P2 (𝑡, 𝑠) ∈ Ω2 : 𝑡 = 𝑠 + P2 (𝑡, 𝑠) ∈ Ω2 : 𝑡 ≠ 𝑠 𝑛4 1 = (2𝑑 − 1) 4 2−𝑑 + (1 − 2−𝑑 ) = + 1− ∼ 𝑛3 𝑛+1 𝑛+1 and E ⟨𝑋, 𝑌 ⟩ 3 = (2𝑑 − 1) 3 2−𝑑 + (1 − 2−𝑑 ) =
1 𝑛3 + 1− ∼ 𝑛2 . 𝑛+1 𝑛+1
As a result, we obtain: Proposition 18.5.1 For the Walsh system 𝑋 on the discrete cube {−1, 1} 𝑑 of size 𝑛 = 2𝑑 − 1, we have with some positive absolute constants 𝑐 1 > 𝑐 0 > 0, 𝑐0 𝑐1 ≤ E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≤ . 𝑛 𝑛 By Proposition 14.6.1 and Corollary 16.4.5, we also have 𝑐0 𝑐 1 log 𝑛 . √ ≤ E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ √ 4 (log 𝑛) 𝑛 𝑛
18.6 Empirical Measures Here is another interesting example leading to a similar rate of normal approximation. Let 𝑒 1 , . . . , 𝑒 𝑛 denote the canonical basis in R𝑛 . Assuming that the random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) takes only 𝑛 values, √ √ 𝑛 𝑒1 , . . . , 𝑛 𝑒 𝑛 , each with √ probability 1/𝑛, the linear form ⟨𝑋, 𝜃⟩ also takes 𝑛 values, namely, √ 𝑛 𝜃 1 , . . . , 𝑛 𝜃 𝑛 , each with probability 1/𝑛, for any 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 . That is, as a measure, the distribution of ⟨𝑋, 𝜃⟩ is described as 𝐹𝜃 =
𝑛 1 ∑︁ √ 𝛿 , 𝑛 𝑘=1 𝑛 𝜃𝑘
√ which represents an empirical measure based on the observations 𝜁 𝑘 = 𝑛 𝜃 𝑘 , 𝑘 = 1, . . . , 𝑛. Each 𝜁 𝑘 is almost standard normal, while jointly they are nearly independent (we have √ already considered in detail its densities 𝜑 𝑛 (𝑥) and characteristic functions 𝐽𝑛 (𝑡 𝑛)). Just taking a short break, let us recall that when 𝑍 𝑘 are standard normal and independent, it is well-known that the empirical measures 𝐺𝑛 =
𝑛 1 ∑︁ 𝛿𝑍 𝑛 𝑘=1 𝑘
18.6 Empirical Measures
369
√ approximate the standard normal law Φ with rate 1/ 𝑛 with respect to the Kolmogorov distance. More precisely, E𝐺 𝑛 = Φ, and there is a subgaussian deviation bound n√ o 2 P 𝑛 𝜌(𝐺 𝑛 , Φ) ≥ 𝑟 ≤ 2e−2𝑟 , 𝑟 ≥ 0 (cf. [136]). In particular, E 𝜌(𝐺 𝑛 , Φ) ≤ √𝑐𝑛 for some absolute constant 𝑐. Note that the characteristic function of 𝐺 𝑛 is given by 𝑔𝑛 (𝑡) = It has mean 𝑔(𝑡) = e−𝑡
2 /2
𝑛 1 ∑︁ 𝑖𝑡 𝑍𝑘 e , 𝑛 𝑘=1
𝑡 ∈ R.
and variance
𝑛 2 1 ∑︁ Var(e𝑖𝑡 𝑍𝑘 ) E 𝑔𝑛 (𝑡) − 𝑔(𝑡) = Var(𝑔𝑛 (𝑡)) = 2 𝑛 𝑘=1 1 2 1 1 = Var(e𝑖𝑡 𝑍1 ) = 1 − |E e𝑖𝑡 𝑍1 | 2 = 1 − e−𝑡 𝑛 𝑛 𝑛
(in view of the independence of 𝑍 𝑘 ). Hence, applying Plancherel’s theorem together with Lemma 12.5.2 with 𝛼 = 0 and 𝛼0 = 1, we also have ∫ ∞ 2 𝑔 (𝑡) − 𝑔(𝑡) 2 1 1 1 − e−𝑡 𝑛 E d𝑡 = √ . d𝑡 = 𝑡 2𝜋𝑛 −∞ 𝑡2 𝑛 𝜋 −∞ √ Thus, on average the 𝐿 2 -distance 𝜔(𝐺 𝑛 , Φ) is of order 1/ 𝑛 as well. √ Similar properties may be expected for the √ random variables 𝜁 𝑘 = 𝑛 𝜃 𝑘 and hence for the random vector 𝑋. Note that |𝑋 | = 𝑛, while E 𝜔2 (𝐺 𝑛 , Φ) =
1 2𝜋
∫
∞
E ⟨𝑋, 𝜃⟩ 2 =
𝑛 1 ∑︁ √ ( 𝑛 𝜃 𝑘 ) 2 = 1, 𝑛 𝑘=1
so that 𝑋 is isotropic. We now involve the asymptotic formula of Proposition 15.3.1, 1 1 1 E 1 − (1 − 𝜉) 1/2 − √ + 𝑂 (𝑛−2 ), E 𝜃 𝜔2 (𝐹𝜃 , Φ) = √ 1 + 4𝑛 𝜋 8𝑛 𝜋 ⟩ where 𝜉 = ⟨𝑋,𝑌 with 𝑌 being an independent copy of 𝑋. By the definition, 𝜉 takes 𝑛 only two values, 1 with probability 𝑛1 , and 0 with probability 1 − 𝑛1 . Hence, the last expectation is equal to 𝑛1 , and we get
7/8 E 𝜃 𝜔2 (𝐹𝜃 , Φ) = √ + 𝑂 (𝑛−2 ). 𝑛 𝜋 Note that in this particular case, the lower bound of Corollary 15.3.2 is useless.
370
18 Special Systems and Examples
As for the Kolmogorov distance, by Proposition 14.6.1 and Corollary 16.4.5, 𝑐0 𝑐 1 log 𝑛 √ ≤ E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ √ (log 𝑛) 4 𝑛 𝑛 for some absolute constants 𝑐 1 > 𝑐 0 > 0. Apparently, both logarithmic terms can be removed. Their appearance here is explained by the use of the Fourier tools (in the form of Berry–Esseen bounds), while the proof of the Dvoretzky–Kiefer–Wolfowitz inequality on 𝜌(𝐺 𝑛 , Φ), [81], is based on entirely different arguments.
18.7 Lacunary Systems 2 An orthonormal sequence of random variables {𝑋 𝑘 }∞ 𝑘=1 in 𝐿 (Ω, F , P) is called a 2 lacunary system of order 𝑝 > 2 if, for any sequence (𝑎 𝑘 ) in ℓ , the series ∞ ∑︁
𝑎 𝑘 𝑋𝑘
𝑘=1
converges in the 𝐿 𝑝 -norm to some element of 𝐿 𝑝 (Ω, F , P). This property is equivalent to the Khinchine-type inequality
E |𝑎 1 𝑋1 + · · · + 𝑎 𝑛 𝑋𝑛 | 𝑝
1/ 𝑝
≤ 𝑀 𝑝 (𝑎 21 + · · · + 𝑎 2𝑛 ) 1/2 ,
(18.8)
which should be valid for all 𝑎 𝑘 ∈ R with some constant 𝑀 𝑝 independent of 𝑛 and the choice of the coefficients 𝑎 𝑘 . For basic properties of such systems we refer the interested reader to [120]. Starting from an orthonormal lacunary system of order 𝑝 = 4, consider the random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ). According to Corollary 15.4.2, if |𝑋 | 2 = 𝑛 a.s. and E𝑋 = 0, then, up to some absolute constant 𝑐 > 0, 𝑐 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤
1 1 E ⟨𝑋, 𝑌 ⟩ 3 + 4 E ⟨𝑋, 𝑌 ⟩ 4 , 3 𝑛 𝑛
(18.9)
where 𝑌 is an independent copy of 𝑋. A similar bound 𝑐 E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹) ≤
log 𝑛 (log 𝑛) 2 E ⟨𝑋, 𝑌 ⟩ 3 + E ⟨𝑋, 𝑌 ⟩ 4 3 𝑛 𝑛4
(18.10)
also holds for the Kolmogorov distance (Proposition 16.2.1). Note that, in the inequality (18.8), the optimal constant is just the moment functional 𝑀 𝑝 = 𝑀 𝑝 (𝑋), while according to Proposition 1.4.2, we have general relations E | ⟨𝑋, 𝑌 ⟩ | 3 ≤ 𝑀36 𝑛3/2 ,
E ⟨𝑋, 𝑌 ⟩ 4 ≤ 𝑀48 𝑛2 .
18.7 Lacunary Systems
371
Hence, the bounds (18.9)–(18.10) lead to the estimates 1
1 8 𝑀 , 𝑛2 4 (log 𝑛) 2 8 log 𝑛 𝑐 E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹) ≤ 3/2 𝑀36 + 𝑀4 , 𝑛2 𝑛
𝑐 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤
𝑛3/2
𝑀36 +
Thus, if 𝑀4 is bounded, both distances are at most of order 𝑛−3/4 on average (modulo a logarithmic factor). Moreover, if Σ3 (𝑛) ≡ E ⟨𝑋, 𝑌 ⟩ 3 =
2
∑︁
E𝑋𝑖1 𝑋𝑖2 𝑋𝑖3
(18.11)
1≤𝑖1 ,𝑖2 ,𝑖3 ≤𝑛
is bounded by a multiple of 𝑛, then these distances are on average at most 1/𝑛 (modulo a logarithmic factor in the case of 𝜌). Indeed, by the lacunary property, we have a Khinchine-type inequality
E𝑌 ⟨𝑋, 𝑌 ⟩ 4
1/4
√ ≤ 𝑀4 |𝑋 | = 𝑀4 𝑛,
so that E ⟨𝑋, 𝑌 ⟩ 4 ≤ 𝑀4 𝑛2 , and one may apply (18.9)–(18.10). For an illustration, on the interval Ω = (−𝜋, 𝜋) with the uniform measure dP(𝑡) = 1 2 𝜋 d𝑡, consider a finite trigonometric system 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) with components √ 𝑋2𝑘−1 (𝑡) = 2 cos(𝑚 𝑘 𝑡), √ 𝑋2𝑘 (𝑡) = 2 sin(𝑚 𝑘 𝑡),
𝑘 = 1, . . . , 𝑛/2,
where 𝑚 𝑘 are positive integers such that 𝑚 𝑘+1 ≥ 𝑞 > 1, 𝑚𝑘 assuming that 𝑛 is even. Then 𝑋 is an isotropic random vector satisfying |𝑋 | 2 = 𝑛 and E𝑋 = 0, and with 𝑀4 bounded by a function of 𝑞 only. For evaluation of the moment Σ3 (𝑛) as in (18.11), one may use the identities cos 𝑡 = E 𝜀 e𝑖 𝜀𝑡 ,
sin 𝑡 =
1 E 𝜀 𝜀 e𝑖 𝜀𝑡 , 𝑖
where 𝜀 is a Bernoulli random variable taking the values ±1 with probability 12 . Let 𝜀1 , 𝜀2 , 𝜀3 be independent copies of 𝜀 which are also independent of 𝑋. Using the property that 𝜀1′ = −𝜀1 𝜀3 and 𝜀2′ = −𝜀2 𝜀3 are independent, the identity for the cosine function implies that, for all integers 1 ≤ 𝑛1 ≤ 𝑛2 ≤ 𝑛3 ,
372
18 Special Systems and Examples
E cos(𝑛1 𝑡) cos(𝑛2 𝑡) cos(𝑛3 𝑡) = E 𝜀 E exp{𝑖(𝜀1 𝑛1 + 𝜀2 𝑛2 + 𝜀3 𝑛3 ) 𝑡} = E 𝜀 𝐼{𝜀1 𝑛1 + 𝜀2 𝑛2 + 𝜀3 𝑛3 = 0} 1 = E 𝜀 𝐼{𝜀1′ 𝑛1 + 𝜀2′ 𝑛2 = 𝑛3 } = 𝐼{𝑛1 + 𝑛2 = 𝑛3 }, 4 where E 𝜀 means the expectation over (𝜀1 , 𝜀2 , 𝜀3 ), and where 𝐼{𝐴} denotes the indicator of the event 𝐴. Similarly, involving also the identity for the sine function, we have E sin(𝑛1 𝑡) sin(𝑛2 𝑡) cos(𝑛3 𝑡) = −E 𝜀 E 𝜀1 𝜀2 exp{𝑖(𝜀1 𝑛1 + 𝜀2 𝑛2 + 𝜀3 𝑛3 ) 𝑡} = −E 𝜀 𝜀1 𝜀2 𝐼{𝜀1 𝑛1 + 𝜀2 𝑛2 + 𝜀3 𝑛3 = 0} 1 = −E 𝜀 𝜀1′ 𝜀2′ 𝐼{𝜀1′ 𝑛1 + 𝜀2′ 𝑛2 = 𝑛3 } = − 𝐼{𝑛1 + 𝑛2 = 𝑛3 }, 4 E sin(𝑛1 𝑡) cos(𝑛2 𝑡) sin(𝑛3 𝑡) = −E 𝜀 E 𝜀1 𝜀3 exp{𝑖(𝜀1 𝑛1 + 𝜀2 𝑛2 + 𝜀3 𝑛3 ) 𝑡} = −E 𝜀 𝜀1 𝜀3 𝐼{𝜀1 𝑛1 + 𝜀2 𝑛2 + 𝜀3 𝑛3 = 0} 1 = E 𝜀 𝜀1′ 𝐼{𝜀1′ 𝑛1 + 𝜀2′ 𝑛2 = 𝑛3 } = 𝐼{𝑛1 + 𝑛2 = 𝑛3 }, 4 E cos(𝑛1 𝑡) sin(𝑛2 𝑡) sin(𝑛3 𝑡) = −E 𝜀 E 𝜀2 𝜀3 exp{𝑖(𝜀1 𝑛1 + 𝜀2 𝑛2 + 𝜀3 𝑛3 ) 𝑡} = −E 𝜀 𝜀2 𝜀3 𝐼{𝜀1 𝑛1 + 𝜀2 𝑛2 + 𝜀3 𝑛3 = 0} 1 = E 𝜀 𝜀2′ 𝐼{𝜀1′ 𝑛1 + 𝜀2′ 𝑛2 = 𝑛3 } = 𝐼{𝑛1 + 𝑛2 = 𝑛3 }. 4 On the other hand, if the sine function appears in the product once or three times, such expectations will be vanishing. They are thus vanishing in all cases where 𝑛1 + 𝑛2 ≠ 𝑛3 , and do not exceed 14 in absolute value for any combination of sine and cosine terms in all cases where 𝑛1 + 𝑛2 = 𝑛3 . Therefore, the moment Σ3 (𝑛) in (18.11) is bounded by a multiple of 𝑇3 (𝑛) where 𝑇3 (𝑛) = card (𝑖1 , 𝑖2 , 𝑖3 ) : 1 ≤ 𝑖 1 ≤ 𝑖2 < 𝑖3 ≤ 𝑛, 𝑚 𝑖1 + 𝑚 𝑖2 = 𝑚 𝑖3 . One can now involve the lacunary assumption. If 𝑞 > 2, the property 𝑖1 ≤ 𝑖 2 < 𝑖3 implies 𝑚 𝑖1 + 𝑚 𝑖2 < 𝑚 𝑖3 , so that 𝑇3 (𝑛) = Σ3 (𝑛) = 0. In the case 1 < 𝑞 ≤ 2, define 𝐴𝑞 to be the (finite) collection of all couples (𝑘 1 , 𝑘 2 ) of positive integers such that 𝑞 −𝑘1 + 𝑞 −𝑘2 ≥ 1. By the lacunary assumption, if 1 ≤ 𝑖1 ≤ 𝑖2 < 𝑖3 ≤ 𝑛, we have 𝑚 𝑖1 + 𝑚 𝑖2 ≤ 𝑞 −(𝑖3 −𝑖1 ) + 𝑞 −(𝑖3 −𝑖2 ) 𝑚 𝑖3 < 𝑚 𝑖3 , as long as the couple (𝑖3 − 𝑖1 , 𝑖2 − 𝑖1 ) is not in 𝐴𝑞 . Hence,
18.7 Lacunary Systems
373
𝑇3 (𝑛) ≤ card (𝑖 1 , 𝑖2 , 𝑖3 ) : 1 ≤ 𝑖 1 ≤ 𝑖2 < 𝑖3 ≤ 𝑛, (𝑖 3 − 𝑖1 , 𝑖3 − 𝑖2 ) ∈ 𝐴𝑞 ≤ 𝑛 card( 𝐴𝑞 ) ≤ 𝑐 𝑞 𝑛 with constant depending on 𝑞 only. Returning to (18.9)–(18.10), we then obtain: Proposition 18.7.1 For the lacunary trigonometric system 𝑋 of an even length 𝑛 and with parameter 𝑞 > 1, E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) ≤
𝑐𝑞 , 𝑛2
E 𝜃 𝜌 2 (𝐹𝜃 , 𝐹) ≤
𝑐 𝑞 (log 𝑛) 2 , 𝑛2
where the constants 𝑐 𝑞 depend on 𝑞 only. Similar inequalities also hold when 𝐹 is replaced with the standard normal distribution function Φ. In this connection one should mention a classical result of Salem and Zygmund concerning distributions of the lacunary sums 𝑆𝑛 =
𝑛 ∑︁
(𝑎 𝑘 cos(𝑚 𝑘 𝑡) + 𝑏 𝑘 sin(𝑚 𝑘 𝑡))
𝑘=1
with an arbitrary prescribed sequence of the coefficients (𝑎 𝑘 ) 𝑘 ≥1 and (𝑏 𝑘 ) 𝑘 ≥1 . As≥ 𝑞 > 1 for all 𝑘 and put sume that 𝑚𝑚𝑘+1 𝑘 𝑣𝑛2 =
𝑛 1 ∑︁ 2 (𝑎 + 𝑏 2𝑘 ) 2 𝑘=1 𝑘
(𝑣𝑛 ≥ 0),
so that the normalized sums 𝑍 𝑛 = 𝑆 𝑛 /𝑣𝑛 have mean zero and variance one under the measure P. It was shown in [164] that the 𝑍 𝑛 are weakly convergent to the standard normal law, i.e., their distributions 𝐹𝑛 under P satisfy 𝜌(𝐹𝑛 , Φ) → 0 as 𝑛 → ∞, 𝑎2 +𝑏2 if and only if 𝑛𝑣2 𝑛 → 0 (in fact, the weak convergence was established on every 𝑛 subset of Ω of positive measure). Restricting to the coefficients 𝜃 2𝑘−1 = 𝑎 𝑘 /𝑣𝑛 , 𝜃 2𝑘 = 𝑏 𝑘 /𝑣𝑛 , Salem–Zygmund’s theorem may be stated as the assertion that 𝜌(𝐹𝜃 , Φ) is small if and only if ∥𝜃 ∥ ∞ = max1≤𝑘 ≤𝑛 |𝜃 𝑘 | is small. The latter condition naturally appears in the central limit theorem about the weighted sums of independent identically distributed random variables. Thus, Proposition 18.7.1 complements this result in terms of the rate of convergence in the mean on the unit sphere. The result of [164] was generalized in [165]; as it turns out there is no need to assume that all 𝑚 𝑘 are integers, and the asymptotic normality is saved for real 𝑚 𝑘 > 1. However, in this more general situation, the rate 1/𝑛 as such that inf 𝑘 𝑚𝑚𝑘+1 𝑘 √ in Proposition 18.7.1 is no longer true (although the rate 1/ 𝑛 is valid). The main reason is that the means √ sin(𝜋𝑚 𝑘 ) √ E𝑋2𝑘−1 = 2 E cos(𝑚 𝑘 𝑡) = 2 𝜋𝑚 𝑘
374
18 Special Systems and Examples
may be non-zero. For example, choosing 𝑚 𝑘 = 2 𝑘 + 21 , we obtain an orthonormal system with E𝑋2𝑘 = 0, while √ 2 2 . E𝑋2𝑘−1 = 𝜋 (2 𝑘+1 + 1) Hence E ⟨𝑋, 𝑌 ⟩ = |E𝑋 | 2 =
𝑛 1 8 ∑︁ →𝑐 2 𝑘+1 𝜋 𝑘=1 (2 + 1) 2
(𝑛 → ∞)
for some absolute constant 𝑐 > 0, where 𝑌 is an independent copy of 𝑋. It can be seen that E ⟨𝑋, 𝑌 ⟩ 2 = 𝑛 + 𝑂 (1), Putting 𝜉 = √
⟨𝑋,𝑌 ⟩ 𝑛
E ⟨𝑋, 𝑌 ⟩ 3 = 𝑂 (𝑛),
E ⟨𝑋, 𝑌 ⟩ 4 = 𝑂 (𝑛2 ).
and applying Proposition 15.3.1, we find that
1 1 𝜋 E 𝜃 𝜔2 (𝐹𝜃 , 𝐹) = 1 + E 1 − (1 − 𝜉) 1/2 − + 𝑂 (𝑛−2 ) 4𝑛 8𝑛 1 1 1 1 𝑐 = 1+ E𝜉 + E𝜉 2 − + 𝑂 (𝑛−2 ) = + 𝑂 (𝑛−2 ). 4𝑛 2 8 8𝑛 2𝑛
A similar asymptotic is also true when 𝐹 is replaced with Φ.
18.8 Remarks The material of this chapter is based on [41] For lacunary trigonometric cosine sequences √ 𝑋𝑛 (𝜔) = 2 cos(𝑛 𝜔), −𝜋 < 𝜔 < 𝜋, Gaposhkin [93] established the following variant of the Berry–Esseen bound. Given a ≥ 𝑞 > 1, sequence 𝑚 𝑘 of natural numbers satisfying the lacunary condition inf 𝑘 𝑚𝑚𝑘+1 𝑘 Í∞ 2 and a family of real numbers {𝑎 𝑛,𝑘 } 𝑛,𝑘 ≥1 such that 𝑘=1 𝑎 𝑛,𝑘 = 1, consider the sums 𝑆𝑛 =
∞ ∑︁
𝑎 𝑛,𝑘 𝑋𝑚𝑘 .
𝑘=1
Thus, E𝑆 𝑛 = 0 and E𝑆 2𝑛 = 1 on the interval Ω = (−𝜋, 𝜋) with the normalized Lebesgue measure P. It was shown that the distribution functions 𝐹𝑛 (𝑥) = P{𝑆 𝑛 ≤ 𝑥} satisfy 𝜌(𝐹𝑛 , Φ) ≤ 𝐶𝑞 sup |𝑎 𝑛,𝑘 | 1/4 , 𝑘 ≥1
where the constants 𝐶𝑞 depend on 𝑞 only.
Chapter 19
Distributions With Symmetries
In this chapter we consider separately another natural example of probability distributions on R𝑛 for which one may potentially obtain rates better than the standard √1 -rate for approximations of 𝐹 𝜃 by the standard normal law (in contrast with most 𝑛 of the examples of the previous chapter). Nevertheless, in the first section we focus on the standard rate and postpone refinements to the next sections. In the last section, we consider the important case of distributions which are coordinatewise symmetric and log-concave and give the proof of a theorem due to Klartag [123].
19.1 Coordinatewise Symmetric Distributions Recall that a random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R𝑛 is said to have a coordinatewise symmetric distribution if it is symmetric under all reflections (𝑥1 , . . . , 𝑥 𝑛 ) → (𝜀1 𝑥1 , . . . , 𝜀 𝑛 𝑥 𝑛 ), 𝜀 𝑘 = ±1, of the space R𝑛 around the coordinate axes. In Section 2.3 we discussed a few properties of such distributions. First, we consider a scheme with explicitly prescribed weights. Let (Ω, F , P) be the underlying probability space on which all random variables 𝑋 𝑘 are defined, and let 𝐹𝑛 (𝑥) = P{𝑆 𝑛 ≤ 𝑥} be the distribution functions of the sums 𝑆 𝑛 = 𝑋1 + · · · + 𝑋 𝑛 . When 𝑋 has a coordinatewise symmetric distribution with a finite second moment, then necessarily E𝑋 𝑘 = 0, so that E𝑆 𝑛 = 0, and E𝑆 2𝑛 = E |𝑋 | 2 =
𝑛 ∑︁
E𝑋 𝑘2 .
𝑘=1
We shall estimate the Kolmogorov distance 𝜌(𝐹𝑛 , Φ) = sup 𝐹𝑛 (𝑥) − Φ(𝑥) 𝑥
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_19
375
376
19 Distributions With Symmetries
in terms of moment-type functionals. The following general observation may be obtained based on the Berry–Esseen theorem. Proposition 19.1.1 If the random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R𝑛 has a coordinatewise symmetric distribution, and E |𝑋 | 2 = 1, then 𝜌(𝐹𝑛 , Φ) ≤ 0.56 E
𝑛 1 ∑︁ 3 |𝑋 | + 2.1 Var(|𝑋 | 2 ), 𝑘 |𝑋 | 3 𝑘=1
(19.1)
where the expression under the expectation sign is understood to be zero in the case |𝑋 | = 0. Moreover, for some absolute constant 𝑐 > 0, 𝜌(𝐹𝑛 , Φ) ≤ 0.56 E
𝑛 1 ∑︁ 3 |𝑋 | + 𝑐 Var(|𝑋 |). 𝑘 |𝑋 | 3 𝑘=1
(19.2)
Since Var(|𝑋 |) ≤ Var(|𝑋 | 2 ) in view of the assumption E |𝑋 | 2 = 1, the inequality (19.2) is stronger and requires only finiteness of the second moment 𝑀2 . On the other hand, the variance Var(|𝑋 | 2 ) is more tractable; in addition, (19.1) contains an explicit numerical constant. Proof Let 𝜉1 , . . . , 𝜉 𝑛 be independent random variables with mean zero and finite 3-rd absolute moments. Putting 𝑟 2 = Var(𝜉1 + · · · + 𝜉 𝑛 ), 𝑟 ≥ 0, we have using the Berry–Esseen Theorem (cf. Proposition 4.4.1) for some absolute constant 𝑐 > 0, sup P{𝜉1 + · · · + 𝜉 𝑛 ≤ 𝑥} − Φ𝑟 (𝑥) ≤ 𝑐𝐿 3 . (19.3) 𝑥
Here 𝐿3 =
𝑛 1 ∑︁ E |𝜉 𝑘 | 3 𝑟 3 𝑘=1
denotes the corresponding Lyapunov ratio, and Φ𝑟 (𝑥) = Φ(𝑥/𝑟) stands for the distribution function of the normal law with mean zero and standard deviation 𝑟. In the case 𝑟 = 0, put 𝐿 3 = 0 and Φ𝑟 (𝑥) = 1 {𝑥 ≥0} . As we mentioned before, (19.3) holds true with 𝑐 = 0.56 according to [166]. We apply this theorem with 𝜉 𝑘 = 𝜀 𝑘 𝑋 𝑘 , where 𝜀1 , . . . , 𝜀 𝑛 are independent Bernoulli random variables taking the values ±1 with probability 1/2, which are also independent of the 𝑋 𝑘 ’s. By the assumption, the random vector (𝜀1 𝑋1 , . . . , 𝜀 𝑛 𝑋𝑛 ) has the same distribution as 𝑋. Hence, for any fixed values of 𝑋 𝑘 = 𝑋 𝑘 (𝜔), 𝜔 ∈ Ω, (19.3) may be applied to the sums 𝑇𝑛 = 𝜀1 𝑋1 + · · · + 𝜀 𝑛 𝑋𝑛 to get sup P 𝜀 {𝑇𝑛 ≤ 𝑥} − Φ |𝑋 ( 𝜔) | (𝑥) ≤ 0.56 𝑥
𝑛 ∑︁ 1 |𝑋 𝑘 (𝜔)| 3 . |𝑋 (𝜔)| 3 𝑘=1
Let us take expectations of both sides with respect to the 𝑋 𝑘 ’s and then insert the expectation on the left-hand side inside the supremum. We thus obtain the distribution function 𝐹𝑛 (𝑥) = E P 𝜀 {𝑇𝑛,𝑋 ≤ 𝑥}, while Φ |𝑋 | = E Φ |𝑋 ( 𝜔) | represents a mixture of
19.1 Coordinatewise Symmetric Distributions
377
Gaussian measures which we considered in Chapter 12. Therefore, we get sup 𝐹𝑛 (𝑥) − Φ |𝑋 | (𝑥) ≤ 0.56 E
𝑥
𝑛 1 ∑︁ 3 E |𝑋 𝑘 | . |𝑋 | 3 𝑘=1
On the other hand, by Proposition 12.3.1, if E |𝑋 | 2 = 1, then sup |Φ |𝑋 | (𝑥) − Φ(𝑥)| ≤ 2.1 Var(|𝑋 | 2 ), 𝑥
and it remains to combine the two bounds to get (19.1). Moreover, by Proposition 12.4.1, sup |Φ |𝑋 | (𝑥) − Φ(𝑥)| ≤ 𝑐 Var(|𝑋 |), 𝑥
and then we obtain (19.2). Proposition 19.1.1 is thus proved.
□
To simplify the bound (19.1), let us derive a simple corollary. Given 0 < 𝜆 < 1, by Chebyshev’s inequality, the event 𝐴 = {|𝑋 | ≤ 𝜆} has probability P( 𝐴) = P{1 − |𝑋 | 2 ≥ 1 − 𝜆2 } ≤ Applying the inequality (𝑥12 + · · · + 𝑥 𝑛2 ) −3/2 E
1 Var(|𝑋 | 2 ). (1 − 𝜆2 ) 2
Í𝑛 𝑘=1
|𝑥 𝑘 | 3 ≤ 1 𝑥 𝑘 ∈ R), we then have
𝑛 1 ∑︁ 1 3 |𝑋 | 1 Var(|𝑋 | 2 ). 𝑘 𝐴 ≤ P( 𝐴) ≤ |𝑋 | 3 𝑘=1 (1 − 𝜆2 ) 2
As for the complementary event 𝐵 = {|𝑋 | > 𝜆}, one may just write 𝑛 𝑛 1 ∑︁ 1 ∑︁ 3 |𝑋 𝑘 | 1 𝐵 ≤ 3 E |𝑋 𝑘 | 3 . E |𝑋 | 3 𝑘=1 𝜆 𝑘=1
Combining these two estimates, from (19.1) we get 𝜌(𝐹𝑛 , Φ) ≤
𝑛 1 0.56 ∑︁ 3 E |𝑋 | + 2.1 + Var(|𝑋 | 2 ). 𝑘 𝜆3 𝑘=1 (1 − 𝜆2 ) 2
Choosing 𝜆 = 0.281/3 , we arrive at: Corollary 19.1.2 If the random vector 𝑋 in R𝑛 has a coordinatewise symmetric distribution with E |𝑋 | 2 = 1, then 𝜌(𝐹𝑛 , Φ) ≤ 2
𝑛 ∑︁ 𝑘=1
E |𝑋 𝑘 | 3 + 6 Var(|𝑋 | 2 ).
(19.4)
378
19 Distributions With Symmetries
19.2 Behavior On Average Given a random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R𝑛 with a coordinatewise symmetric distribution, let us return to the weighted sums ⟨𝑋, 𝜃⟩ = 𝜃 1 𝑋1 + · · · + 𝜃 𝑛 𝑋𝑛 and consider their distribution functions 𝐹𝜃 (𝑥) = P{⟨𝑋, 𝜃⟩ ≤ 𝑥} (𝜃 ∈ S𝑛−1 , 𝑛 ≥ 2). If 𝑋 is isotropic, which is equivalent to the condition E𝑋 𝑘2 = 1, 𝑘 = 1, . . . , 𝑛, then the random vector 𝑋 𝜃 = (𝜃 1 𝑋1 , . . . , 𝜃 𝑛 𝑋𝑛 ) satisfies E |𝑋 𝜃 | 2 = 1. Hence, Corollary 19.1.2 is applicable to 𝑋 𝜃 (in place of 𝑋), and (19.4) gives 𝜌(𝐹𝜃 , Φ) ≤ 2
𝑛 ∑︁
|𝜃 𝑘 | 3 E |𝑋 𝑘 | 3 + 6 Var(|𝑋 𝜃 | 2 ).
(19.5)
𝑘=1
This inequality can be used to control the distances 𝜌(𝐹𝜃 , Φ) on average. To estimate the expectations over 𝜃 on the right-hand side of (19.5), let us represent a standard normal random vector 𝑍 = (𝑍1 , . . . , 𝑍 𝑛 ) in R𝑛 in the form 𝑍 = |𝑍 | 𝜃, where 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) is regarded as a random vector on S𝑛−1 , which is uniformly distributed on the sphere and is independent of |𝑍 |. This gives 4 √ = E |𝑍 𝑘 | 3 = E |𝑍 | 3 E 𝜃 |𝜃 𝑘 | 3 . 2𝜋 Since E |𝑍 | 3 ≥ (E |𝑍 | 2 ) 3/2 = 𝑛3/2 , we get E 𝜃 |𝜃 𝑘 | 3 ≤
√4 2𝜋
𝑛−3/2 . Using 𝑍𝑖2 𝑍 2𝑗 =
|𝑍 | 4 𝜃 𝑖2 𝜃 2𝑗 and E |𝑍 | 4 = 𝑛2 + 2𝑛, we also find that E 𝜃 𝜃 𝑖2 𝜃 2𝑗 =
𝑛2
1 (𝑖 ≠ 𝑗), + 2𝑛
E 𝜃 𝜃 𝑖4 =
𝑛2
3 + 2𝑛
(recall the more general identities (10.15) and (10.16)). Hence E 𝜃 Var(|𝑋 𝜃 | 2 ) = E 𝜃
𝑛 ∑︁
𝜃 𝑖2 𝜃 2𝑗 cov(𝑋𝑖2 , 𝑋 2𝑗 )
𝑖, 𝑗=1
=
𝑛 𝑛 ∑︁ ∑︁ 1 2 2 2 Var(𝑋𝑖2 ) cov(𝑋 , 𝑋 ) + 𝑖 𝑗 𝑛2 + 2𝑛 𝑖, 𝑗=1 𝑛2 + 2𝑛 𝑖=1
=
∑︁ 1 2 2 Var(|𝑋 Var(𝑋𝑖2 ). | ) + 𝑛2 + 2𝑛 𝑛2 + 2𝑛 𝑖=1
𝑛
Thus, from (19.5),
19.2 Behavior On Average
379
E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ √
𝑛 ∑︁
8 2𝜋 𝑛3/2
E |𝑋 𝑘 | 3
𝑘=1 𝑛
+
6 12 ∑︁ 2 Var(|𝑋 Var(𝑋 𝑘2 ). | ) + 𝑛2 + 2𝑛 𝑛2 + 2𝑛 𝑘=1
To further simplify, note that the assumption E𝑋 𝑘2 = 1 implies E |𝑋 𝑘 | 3 ≤ E𝑋 𝑘4 . We also have Var(𝑋 𝑘2 ) ≤ E𝑋 𝑘4 , so that
E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ √
𝑛
8 2𝜋 𝑛3/2
6 12 ∑︁ E |𝑋 𝑘 | 4 + 2 Var(|𝑋 | 2 ). + 2 𝑛 + 2𝑛 𝑘=1 𝑛 + 2𝑛
Introduce the mean 4-th moment 𝑛
1 ∑︁ E |𝑋 𝑘 | 4 . 𝛽¯4 = 𝑛 𝑘=1 √
Using
1 𝑛2 +2𝑛
≤
2−1 𝑛1/2
√ for 𝑛 ≥ 2 and 12 ( 2 − 1) +
√8 2𝜋
(19.6)
< 9, we then arrive at:
Proposition 19.2.1 If an isotropic random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) in R𝑛 has a coordinatewise symmetric distribution, then 6 9 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ √ 𝛽¯4 + 2 Var(|𝑋 | 2 ). 𝑛 𝑛 √ Thus, if E |𝑋 𝑘 | 4 ≤ 𝑐 and 𝜎42 (𝑋) = 𝑛1 Var(|𝑋 | 2 ) ≤ 𝑐 𝑛, we are led to a bound with standard rate, namely 15𝑐 E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ √ . 𝑛 Nevertheless, in many interesting cases these bounds can be improved to get a rate of the order 1/𝑛, up to a logarithmic factor. To this end, let us note that, by the coordinatewise symmetry assumption, we necessarily have E ⟨𝑋, 𝑌 ⟩ 3 = 0, where 𝑌 is an independent copy of 𝑋. Moreover, E ⟨𝑋, 𝑌 ⟩ 4 ≤ 3𝑛2 𝛽¯42 , according to Proposition 2.3.2. Applying Corollary 15.4.2 and Proposition 16.2.1, we get: Proposition 19.2.2 Let 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ) be an isotropic random vector in R𝑛 with a coordinatewise symmetric distribution such that |𝑋 | 2 = 𝑛 a.s. Then for some positive absolute constant 𝑐, 𝑐 E 𝜃 𝜔2 (𝐹𝜃 , Φ) ≤ 2 𝛽¯42 , 𝑛 where 𝛽¯4 is the mean 4-th moment defined in (19.6). In addition, E 𝜃 𝜌 2 (𝐹𝜃 , Φ) ≤
𝑐 (log 𝑛) 2 ¯2 𝛽4 . 𝑛2
380
19 Distributions With Symmetries
In order to get rid of the support assumption on the distribution of 𝑋, we need to involve the variance functional 𝜎42 (𝑋) = 𝑛1 Var(|𝑋 | 2 ), or the functional 𝑉2 (𝑋) = sup Var 𝜃 1 𝑋12 + · · · + 𝜃 𝑛 𝑋𝑛2 , 𝜃 ∈S𝑛−1
which was introduced in Proposition 2.3.3 with the aim to bound Λ = Λ(𝑋). Combining this proposition with Proposition 17.1.1, we obtain a similar rate with respect to the dimension 𝑛. Proposition 19.2.3 Let 𝑋 be an isotropic random vector in R𝑛 with a coordinatewise symmetric distribution. Then for some absolute constant 𝑐, E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 log 𝑛 max E𝑋 𝑘4 + 𝑉2 (𝑋) . 1≤𝑘 ≤𝑛 𝑛
If additionally the distribution of 𝑋 is invariant under permutations of coordinates, E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 log 𝑛 4 E𝑋1 + 𝜎42 (𝑋) , 𝑛
where the last term 𝜎42 (𝑋) may be removed in the case cov(𝑋12 , 𝑋22 ) ≤ 0.
19.3 Coordinatewise Symmetry and Log-concavity We now consider one important class of coordinatewise symmetric distributions on R𝑛 where a 𝑛1 -rate of normal approximation for the distributions 𝐹𝜃 of the weighted sums ⟨𝑋, 𝜃⟩ = 𝜃 1 𝑋1 + · · · + 𝜃 𝑛 𝑋𝑛 is achieved on a large part of the unit sphere. Keeping our standard notations, we have the following (non-randomized) variant of the central limit theorem due to Klartag [123]. Proposition 19.3.1 Suppose that a random vector 𝑋 in R𝑛 has an isotropic, coordinatewise symmetric log-concave distribution. Then, for all 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) in S𝑛−1 , with some universal constant 𝑐 > 0 we have 𝜌(𝐹𝜃 , Φ) ≤ 𝑐
𝑛 ∑︁
𝜃 4𝑘 .
(19.7)
𝑘=1
In particular, E 𝜃 𝜌(𝐹𝜃 , Φ) ≤ 𝑛𝑐 . Moreover, by Proposition 10.8.1 with 𝑝 = 4, √ 𝔰𝑛−1 𝑛𝜌(𝐹𝜃 , Φ) ≥ 𝑐𝑟 ≤ e− 𝑟 𝑛 ,
𝑟 ≥ 1.
Without the log-concavity assumption, the relation (19.7) is no longer true, even when the components 𝑋 𝑘 of 𝑋 are independent and symmetric about the origin (as in the case of Bernoulli 𝑋 𝑘 ’s).
19.3 Coordinatewise Symmetry and Log-concavity
381
The proof requires some preparation. By assumption, the random vector 𝑋 has a coordinatewise symmetric, log-concave density on R𝑛 . In view of the isotropy, we have E𝑋 𝑘2 = 1, and a Khinchine-type inequality for log-concave random variables ˆ 𝑘 = 1, . . . , 𝑛, with some absolute constant. yields the upper bound E𝑋 𝑘4 ≤ 𝐶, Lemma 19.3.2 There exist absolute constants 𝐶, 𝐶˜ ≥ 1 such that, for every 𝑡 ≥ 0, 1 1 − Φ 𝑡 + 2 (1 − Φ(𝑡)) 1/4 ≥ (1 − Φ(𝑡)), 𝐶 1 Φ 𝑡 − 2 (1 − Φ(𝑡)) 1/4 ≥ Φ(−2) ≥ (1 − Φ(𝑡)). 𝐶 If 𝑥 > 0 satisfies | 1𝑥 −
1 𝜑 (𝑡) |
˜ ≤ 4𝐶 (1 − Φ(𝑡)) −3/4 for some 𝑡 ≥ 0, then 𝑥 ≤ 𝐶𝜑(𝑡).
1 This lemma is easily obtained by applying the elementary inequalities 1+𝑥 𝜑(𝑥) ≤ 1 1 − Φ(𝑥) ≤ 𝑥 𝜑(𝑥), 𝑥 ≥ 0. Here, the upper bound is well known. To prove the left one, consider the function 𝑢(𝑥) = 𝜑(𝑥) − (1 + 𝑥)(1 − Φ(𝑥)) on the positive half-axis. We have 𝑢(0) = √1 − 12 < 0, while 𝑢 ′′ (𝑥) = (1 − 𝑥)𝜑(𝑥). Hence, 𝑢 is convex on 2𝜋 [0, 1] and is convex on [1, ∞). Since 𝑢(∞) = 0, it follows that 𝑢(𝑥) ≤ 0 for all 𝑥 ≥ 0. Now, given a random variable 𝑋 with an even, continuously differentiable density 𝑝, we denote by 𝑆 𝜀 = 𝑆 𝜀 (𝑋) the subset of the interval [0, 𝑁] such that
𝑝(𝑥) = 𝜑(𝑥),
𝑥 ∈ 𝑆𝜀,
where a real number 𝑁 is defined by the relation 1 − Φ(𝑁) = 2𝐶 𝐴𝜀 2 . Here 𝐶 > 0 is an absolute constant, 𝐴 ≥ 1 and 0 < 𝜀 < 1 (satisfying 2𝐶 𝐴𝜀 2 < 1). Let Γ denote a random variable with density 𝑞(𝑥) = 𝑐𝑥 −8 sin8 (𝑥/8), where 𝑐 is a normalizing constant. This density is proportional to the Fourier transform of the 8-th convolution power of the indicator function 1 [− 1 , 1 ] , so, the characteristic 8 8 function 𝛾(𝑡) of Γ is vanishing for |𝑡| ≥ 1. Lemma 19.3.3 Let 𝑋 be a random variable, independent of Γ, with an even, continuously differentiable density 𝑝. Given 0 < 𝜀 < 1 and 𝐴 ≥ 1, assume that |P(𝑋 + 𝜀Γ ≤ 𝑥) − Φ(𝑥)| ≤ 𝐴𝜀 2 ,
𝑥 ∈ R.
(19.8)
If there exists an absolute constant 𝐶1 such that | 𝑝 ′ (𝑥 + 𝑢)| ≤ 𝐶1 for all 𝑥 ∈ 𝑆 𝜀 and |𝑢| ≤ (1 − Φ(𝑥)) 1/4 , then for some absolute constant 𝐶2 , |P(𝑋 ≤ 𝑥) − Φ(𝑥)| ≤ 𝐶2 𝐴𝜀 2 ,
𝑥 ∈ R.
(19.9)
Proof By Markov’s inequality, P{|Γ| ≥ (2 E Γ6 ) 1/6 } ≤ 12 . Therefore, for 𝑥 ≥ 𝑁, P 𝑋 + 𝜀Γ ≥ 𝑥 − (2 E Γ6 ) 1/6 𝜀 ≥ P 𝑋 + 𝜀Γ ≥ 𝑥 − (2 E Γ6 ) 1/6 𝜀, |Γ| ≤ (2 E Γ6 ) 1/6 ≥ P 𝑋 ≥ 𝑥, |Γ| ≤ (2 E Γ6 ) 1/6 1 ≥ P{𝑋 ≥ 𝑥}. 2
382
19 Distributions With Symmetries
Since 1 − Φ(𝑥 − (2 E Γ6 ) 1/6 𝜀) ≤ 𝐶𝜀 2 for 𝑥 ≥ 𝑁 with an absolute constant 𝐶 > 0, we obtain from (19.8) and the above lower bound that P(𝑋 ≥ 𝑥) ≤ 2( 𝐴 + 𝐶)𝜀 2 ,
𝑥 ≥ 𝑁,
which implies the required bound (19.9) for 𝑥 ≥ 𝑁, recalling that 1−Φ(𝑁) = 2𝐶 𝐴𝜀 2 . Now, on the interval 0 ≤ 𝑥 ≤ 𝑁, consider the function 𝜓(𝑥) = P(𝑋 ≤ 𝑥) − Φ(𝑥). The inequality (19.9) is fulfilled at 𝑥 = 0 and 𝑥 = 𝑁 (since 𝜓(0) = 0). But, this function attains its minimum and maximum either at the endpoints, or at one of the inner points, where necessarily 𝜓 ′ (𝑥) = 𝑝(𝑥) − 𝜑(𝑥) = 0 (if they exist). Hence, it is sufficient to prove (19.9) for 𝑥 ∈ 𝑆 𝜀 . Note that 1 − Φ(𝑥) ≥ 2𝐶 𝐴𝜀 2 and thus ∫ n o 1 𝜀 4 E Γ4 ≤ 𝐶𝜀 2 𝑞(𝑠/𝜀) d𝑠 = P 𝜀|Γ| ≥ (1 − Φ(𝑥) 1/4 ≤ 𝜀 |𝑠 | ≥ (1−Φ( 𝑥)) 1/4 1 − Φ(𝑥) for some absolute constant 𝐶. On the other hand, by Taylor’s formula, P{𝑋 ≤ 𝑥 + 𝑠} = P{𝑋 ≤ 𝑥} + 𝑝(𝑥)𝑠 +
1 ′ 𝑝 (𝑥 + 𝑎𝑠)𝑠2 , 2
|𝑎| ≤ 1,
while, by the assumption, we have | 𝑝 ′ (𝑥 + 𝑎𝑠)| ≤ 𝐶1 for |𝑠| ≤ (1 − Φ(𝑥)) 1/4 . Hence 1 ∫ ∞ P(𝑋 ≤ 𝑥 + 𝑠) − P(𝑋 ≤ 𝑥) 𝑞(𝑠/𝜀) d𝑠 |P(𝑋 + 𝜀Γ ≤ 𝑥) − P(𝑋 ≤ 𝑥)| = 𝜀 −∞ 1 ∫ 1 ≤ 𝑝(𝑥) 𝑠 + 𝐶1 𝑠2 𝑞(𝑠/𝜀) d𝑠 𝜀 |𝑠 | ≤ (1−Φ( 𝑥)) 1/4 2 ∫ 2 + 𝑞(𝑠/𝜀) d𝑠 𝜀 |𝑠 | ≥ (1−Φ( 𝑥)) 1/4 ∫ 𝐶1 ≤ 𝑠2 𝑞(𝑠/𝜀) d𝑠 + 2𝐶𝜀 2 2𝜀 |𝑠 | ≤ (1−Φ( 𝑥)) 1/4 ˜ 2 ≤ 𝐶1 E (𝜀Γ) 2 + 2𝐶𝜀 2 ≤ 𝐶𝜀 ˜ It remains to note that, for 𝑥 ∈ 𝑆 𝜀 , by (19.8), for an absolute constant 𝐶. |P(𝑋 ≤ 𝑥) − Φ(𝑥)| ≤ ≤ The lemma is proved.
˜ 2 + |P(𝑋 + 𝜀Γ ≤ 𝑥) − Φ(𝑥)| 𝐶𝜀 ˜ 2 + 𝐴𝜀 2 ≤ 𝐶2 𝐴𝜀 2 . 𝐶𝜀 □
19.3 Coordinatewise Symmetry and Log-concavity
383
Lemma 19.3.4 Let 𝑋 be a random variable with an even, positive, continuously differentiable log-concave density 𝑝. Assuming that (19.8) holds, there exists an absolute constant 𝐶3 such that | 𝑝 ′ (𝑥+𝑢)| ≤ 𝐶3 for all 𝑥 ∈ 𝑆 𝜀 and |𝑢| ≤ (1−Φ(𝑥)) 1/4 . Proof Fix 𝑥 ∈ 𝑆 𝜀 and recall that 1 − Φ(𝑥) ≥ 2𝐶 𝐴𝜀 2 . By Markov’s inequality, 𝜀 6 E Γ6 (1 − Φ(𝑥)) 3/2 1 ≤ 𝜀 E Γ6 (1 − Φ(𝑥)). (2𝐶 𝐴) 5/2
n o P 𝜀 |Γ| ≥ (1 − Φ(𝑥)) 1/4 ≤
(19.10)
By Lemma 19.3.2, 1 − Φ(𝑥 + 2 (1 − Φ(𝑥)) 1/4 ) ≥ 𝐶1 (1 − Φ(𝑥)), and therefore, by (19.8), n o P 𝑋 + 𝜀Γ ≥ 𝑥 + 2 (1 − Φ(𝑥)) 1/4 ≥ 1 − Φ(𝑥 + 2(1 − Φ(𝑥)) 1/4 ) − 𝐴𝜀 2 1 (1 − Φ(𝑥)) − 𝐴𝜀 2 𝐶 1 1 1 (1 − Φ(𝑥)) − (1 − Φ(𝑥)) = (1 − Φ(𝑥)). ≥ 𝐶 2𝐶 2𝐶 ≥
Consequently, we get from (19.10) under the legitimate assumption that 𝜀 is smaller than a given absolute constant, n o n o P 𝑋 ≥ 𝑥 + (1 − Φ(𝑥)) 1/4 ≥ P 𝑋 + 𝜀Γ ≥ 𝑥 + 2 (1 − Φ(𝑥)) 1/4 n o 1 (1 − Φ(𝑥)}. − P 𝜀|Γ| ≥ (1 − Φ(𝑥)) 1/4 ≥ 4𝐶 A similar argument, using the second inequality of Lemma 19.3.2, shows that n o n o P 𝑋 ≤ 𝑥 − (1 − Φ(𝑥)) 1/4 ≥ P 𝑋 + 𝜀Γ ≤ 𝑥 − 2 (1 − Φ(𝑥)) 1/4 n o 1 (1 − Φ(𝑥)). − P 𝜀|Γ| ≥ (1 − Φ(𝑥)) 1/4 ≥ 4𝐶 We conclude that, whenever |𝑢| ≤ (1 − Φ(𝑥)) 1/4 , n o 1 (1 − Φ(𝑥)). min P(𝑋 ≥ 𝑥 + 𝑢), P(𝑋 ≤ 𝑥 + 𝑢) ≥ 4𝐶
(19.11)
Now, fix 𝑢 0 ∈ R. Since 𝑝 > 0 and log 𝑝 is concave, one may write the inequality 𝑝(𝑢) ≤ 𝑝(𝑢 0 ) exp
o n 𝑝 ′ (𝑢 ) 0 (𝑢 − 𝑢 0 ) , 𝑝(𝑢 0 )
𝑢 ∈ R.
384
19 Distributions With Symmetries
Therefore, in the case 𝑝 ′ (𝑢 0 ) ≠ 0, we have ∫ 𝑢0 ∫ ∞ ∫ 𝑝(𝑢) d𝑢 ≤ min 𝑝(𝑢) d𝑢, −∞
𝑢0
∞
𝑢0
n | 𝑝 ′ (𝑢 )|(𝑢 − 𝑢 ) o 0 0 𝑝(𝑢 0 ) exp − d𝑢 𝑝(𝑢 0 )
𝑝(𝑢 0 ) 2 = ′ . | 𝑝 (𝑢 0 )| We may thus conclude from (19.11) that −1 | 𝑝 ′ (𝑥 + 𝑢)| ≤ 𝑝 2 (𝑥 + 𝑢) min{P(𝑋 ≥ 𝑥 + 𝑢), P(𝑋 ≤ 𝑥 + 𝑢)} ≤
4𝐶 𝑝 2 (𝑥 + 𝑢). 1 − Φ(𝑥)
(19.12)
Equivalently, |(1/𝑝(𝑥 + 𝑢)) ′ | ≤ 4𝐶/(1 − Φ(𝑥)), which yields
1 4𝐶 1 − (1 − Φ(𝑥)) 1/4 = 4𝐶 (1 − Φ(𝑥)) −3/4 ≤ 𝑝(𝑥 + 𝑢) 𝑝(𝑥) 1 − Φ(𝑥)
for |𝑢| ≤ (1 − Φ(𝑥)) 1/4 . Recall that 𝑝(𝑥) = 𝜑(𝑥) for 𝑥 ∈ 𝑆 𝜀 . Applying the third ˜ inequality of Lemma 19.3.2, we obtain that 𝑝(𝑥 + 𝑢) ≤ 𝐶𝜑(𝑥), and returning to (19.12), we thus get | 𝑝 ′ (𝑥 + 𝑢)| ≤
4𝐶 𝐶˜ 2 𝜑(𝑥) 2 ≤ 𝐶3 1 − Φ(𝑥)
for some absolute constant 𝐶3 . The lemma is proved.
□
Lemma 19.3.5 Let Δ1 , . . . , Δ𝑛 be independent, symmetric Bernoulli random variables. Given 𝜎 > 0 and 𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ R𝑛 , 𝜃 ≠ 0, assume that ∑︁
𝜃 2𝑘 ≤
𝑘: | 𝜃𝑘 |> 𝜎
1 2 |𝜃| . 2
(19.13)
Then with some absolute constant 𝐶 > 0, for all 𝑥 ∈ R, 𝑛 𝑛 n ∑︁ 𝜎 2 ∑︁ o 𝑥 𝜃 4𝑘 + . 𝜃 𝑘 Δ 𝑘 + 𝜎Γ ≤ 𝑥 − Φ ≤ 𝐶 P |𝜃| |𝜃| 2 𝑘=1 |𝜃| 4 𝑘=1
(19.14)
Proof In view of the homogeneity of the inequality (19.14), we may assume that Í |𝜃| = 1. Fix 𝑥 ∈ R and put 𝜀 = ( 𝑛𝑘=1 𝜃 4𝑘 ) 1/2 . The characteristic function of the Í𝑛 Î random variable 𝜉 = 𝑘=1 𝜃 𝑘 Δ 𝑘 + 𝜎Γ is given by E e𝑖𝑡 𝜉 = 𝛾(𝜎𝑡) 𝑛𝑘=1 cos(𝜃 𝑘 𝑡). Therefore, by the Fourier inversion formula, P{𝜉 ≤ 𝑥} − Φ(𝑥) =
1 2𝜋
∫
∞
𝛾(𝜎𝑡)
−∞
𝑛 Ö 𝑘=1
cos(𝜃 𝑘 𝑡) − e−𝑡
2 /2
e𝑖 𝑥𝑡 d𝑡. 𝑖𝑡
19.3 Coordinatewise Symmetry and Log-concavity
385
We need to bound the absolute value of this integral, up to an absolute factor, by 𝜎 2 + 𝜀 2 . First consider the main case 𝜎 ≤ 𝜀 1/2 . Splitting the integration over the regions |𝑡| ≤ 𝜀 −1/2 , 𝜀 −1/2 ≤ |𝑡| ≤ 𝜎 −1 and |𝑡| ≥ 𝜎 −1 , we represent the above integral as the sum 𝐼1 + 𝐼2 + 𝐼3 . 2 4 In order to estimate 𝐼1 , one may use the simple equality e𝑠 /2 cos 𝑠 = e𝑎𝑠 for |𝑠| ≤ 1 with |𝑎| ≤ 1. Since |𝜃 𝑘 | ≤ 𝜀 1/2 for all 𝑘, we have, for |𝑡| ≤ 𝜀 −1/2 , 𝑛 𝑛 Ö n ∑︁ o 2 2 e 𝜃𝑘 𝑡 /2 cos(𝜃 𝑘 𝑡) − 1 = exp 𝑎𝑡 4 𝜃 4𝑘 − 1 ≤ 3𝜀 2 𝑡 4 . 𝑘=1
𝑘=1
The characteristic function 𝛾 of Γ satisfies 1 − 𝑐𝑡 2 ≤ 𝛾(𝑡) ≤ 1 for 0 ≤ 𝑡 ≤ 1 with an absolute constant 𝑐 > 0. We may thus conclude that for |𝑡| ≤ 𝜀 −1/2 𝛾(𝜎𝑡)
𝑛 Ö
2 2 /2
e 𝜃𝑘 𝑡
cos(𝜃 𝑘 𝑡) = 1 + 𝑎 1 𝑐 𝜎 2 𝑡 2 1 + 3𝑎 2 𝜀 2 𝑡 4 ) = 1 + 𝑎 3 4𝑐𝜎 2 𝑡 2 + 3 𝜀 2 𝑡 4 ,
𝑘=1
with some |𝑎 𝑗 | ≤ 1. The latter formula yields 𝜀 −1/2
∫ |𝐼1 | =
e−𝑡
2 /2
𝛾(𝜎𝑡)
cos(𝜃 𝑘 𝑡) − 1
e𝑖 𝑥𝑡
d𝑡
𝑘=1
∞
e−𝑡
≤𝐶
2 2 /2
e 𝜃𝑘 𝑡
𝑖𝑡
−𝜀 −1/2
∫
𝑛 Ö
2 /2
(𝜎 2 𝑡 2 + 𝜀 2 𝑡 4 )
−∞
d𝑡 ≤ 𝐶 ′ (𝜎 2 + 𝜀 2 ) |𝑡|
for some absolute constants 𝐶, 𝐶 ′. Í To bound 𝐼2 , let 𝐽 = {𝑘 ≤ 𝑛 : |𝜃 𝑘 | ≤ 𝜎}, so that 𝑘 ∈𝐽 𝜃 2𝑘 ≥ 12 , by the hypothesis 2 (19.13). Applying cos 𝑠 ≤ e−𝑠 /2 (|𝑠| ≤ 1), we get, for |𝑡| ≤ 𝜎 −1 , 𝑛 𝑛 Ö Ö n 1 ∑︁ o 2 𝜃 2𝑘 ≤ e−𝑡 /4 . cos(𝜃 𝑘 𝑡) ≤ cos(𝜃 𝑘 𝑡) ≤ exp − 𝑡 2 2 𝑘 ∈𝐽 𝑘=1 𝑘 ∈𝐽
Using the bound 1 − Φ(𝑥) ≤ ∫ |𝐼2 | ≤ 2
1 2
e−𝑥 𝜎 −1
𝜀 −1/2
∫
2 /2
Ö 𝑛 2 d𝑡 cos(𝜃 𝑘 𝑡) + e−𝑡 /2 𝑡 𝑘=1
∞
e−𝑡
≤4
(𝑥 > 0), we then obtain that
2 /4
d𝑡 < 8 e−1/(4𝜀) < 64 𝜀 2 .
𝜀 −1/2
The bound for 𝐼3 is easy as well. Since 𝛾(𝜎𝑡) = 0 for |𝑡| ≥ 𝜎 −1 , we have ∫ ∞ 2 2 d𝑡 < 2e−1/(2𝜎 ) < 4𝜎 2 . |𝐼3 | ≤ 2 e−𝑡 /2 𝑡 𝜎 −1 It remains to collect together all bounds for the integrals 𝐼 𝑗 , which leads to (19.14).
386
19 Distributions With Symmetries
In the case 𝜎 > 𝜀 1/2 , the argument is simpler, since then we only need to consider the integrals 𝐼0 and 𝐼3 over the regions |𝑡| ≤ 𝜎 −1 and |𝑡| ≥ 𝜎 −1 respectively. Since the first region is contained in |𝑡| ≤ 𝜀 −1/2 , we have |𝐼0 | ≤ 𝐶 ′ (𝜎 2 + 𝜀 2 ) according to the previous step for the integral 𝐼1 . The bound |𝐼3 | < 4𝜎 2 also holds. □ The lemma is thus proved, and we are ready to return to the setting of Proposition 19.3.1. Proof (of Proposition 19.3.1) Assume that the random variable Γ with a special distribution as above is independent of the random vector 𝑋 = (𝑋1 , . . . , 𝑋𝑛 ). In view of Lemmas 19.3.3–19.3.4, it is sufficient to derive the inequality 𝑛 ∑︁ sup P 𝜀Γ + 𝜃 𝑘 𝑋 𝑘 ≤ 𝑦 − Φ(𝑦) ≤ 𝐶𝜀 2 , 𝑦 ∈R
(𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 ,
𝑘=1
√︃ Í with 𝜀 = 4 𝐶ˆ 𝑛𝑘=1 𝜃 4𝑘 and for some positive absolute constant 𝐶. Let Δ1 , . . . , Δ𝑛 be independent, symmetric Bernoulli random variables that are also independent of 𝑋 and Γ. Fix 𝑦 ∈ R, 𝜃 ∈ S𝑛−1 , and define the function 𝑛 n o ∑︁ 𝑃 𝑦 (𝑥) = P 𝜀Γ + 𝜃 𝑘 𝑥 𝑘 Δ𝑘 ≤ 𝑦 ,
𝑥 = (𝑥 1 , . . . , 𝑥 𝑛 ) ∈ R𝑛 .
𝑘=1
Since the density of 𝑋 is coordinate-wise symmetric, the random variable Í has the same distribution as 𝑛𝑘=1 𝜃 𝑘 𝑋 𝑘 Δ 𝑘 , so that n
P 𝜀Γ +
𝑛 ∑︁
o 𝜃 𝑘 𝑋 𝑘 ≤ 𝑦 = E𝑃 𝑦 (𝑋).
Í𝑛 𝑘=1
𝜃 𝑘 𝑋𝑘
(19.15)
𝑘=1
Í Let 𝐴 denote the collection of all points 𝑥 ∈ R𝑛 such that 12 ≤ 𝑛𝑘=1 𝜃 2𝑘 𝑥 2𝑘 ≤ 32 . By Lemma 19.3.5 applied to the vector (𝜃 1 𝑥 1 , . . . , 𝜃 𝑛 𝑥 𝑛 ) and with 𝜎 = 𝜀, we have 𝑛 𝑛 . ∑︁ 1/2 ∑︁ 𝜃 2𝑘 𝑥 2𝑘 𝜃 4𝑘 𝑥 4𝑘 , 𝑃 𝑦 (𝑥) − Φ 𝑦 ≤ 𝐶 𝜀2 + 𝑘=1
𝑦 ∈ 𝐴.
(19.16)
𝑖=1
Í On the other hand, for the random variable 𝑈 = 𝑛𝑘=1 𝜃 2𝑘 𝑋 𝑘2 , the weighted Poincarétype inequality of Proposition 6.6.1 with 𝑓 (𝑥 1 , . . . , 𝑥 𝑛 ) = 𝜃 12 𝑥12 + · · · + 𝜃 2𝑛 𝑥 𝑛2 (cf. also Proposition 6.6.2) gives Var(𝑈) ≤ 16
𝑛 ∑︁ 𝑘=1
𝜃 4𝑘 E𝑋 𝑘4 ≤ 16 𝐶ˆ
𝑛 ∑︁
𝜃 4𝑘 = 𝜀 2 .
𝑘=1
Since E 𝑈 = 1, Chebyshev’s inequality yields P{|𝑈 − 1| ≥ i.e., P(𝑋 ∉ 𝐴) ≤ 4𝜀 2 . From (19.16) we thus get
1 2}
≤ 4Var(𝑈) ≤ 4𝜀 2 ,
19.4 Remarks
387
𝑛 ∑︁ √ 𝜃 4𝑘 𝑋 𝑘4 ≤ 𝐶 ′ 𝜀 2 , E𝑃 𝑦 (𝑋) − E Φ(𝑦/ 𝑈) ≤ P{𝑋 ∉ 𝐴} + 𝐶 E 𝜀 2 + 𝑘=1
ˆ As a result, in view of (19.15), we only need to where we used the bound E𝑋 𝑘4 ≤ 𝐶. show that for some constant 𝐶 √ |E Φ(𝑦/ 𝑈) − Φ(𝑦)| ≤ 𝐶𝜀 2 . √ And indeed, according to Proposition 12.3.1 with 𝜌 = 𝑈, the expression on the left does not exceed 3 Var(𝜌 2 ) = 3 Var(𝑈) ≤ 3𝜀 2 . The proposition is proved. □
19.4 Remarks The Berry–Esseen bound of Corollary 19.1.2 in the form (19.5) was proved by Goldstein and Shao [97] (with different absolute constants), cf. also an earlier paper by M. Meckes and E. Meckes [140] for similar results where the closeness of 𝐹𝜃 to Φ is given explicitly in terms of 𝜃. Multidimensional projections of log-concave distributions with an unconditional basis, that is, the distributions of random vectors ⟨𝑋, 𝜃 1 ⟩ , . . . , ⟨𝑋, 𝜃 𝑘 ⟩ , 𝜃 𝑖 ∈ S𝑛−1 , 1 ≤ 𝑘 ≤ 𝑛, were considered by M. Meckes [141]. As we have already mentioned, Proposition 19.3.1 is due to Klartag [123].
Chapter 20
Product Measures
In this chapter we shall discuss the classical scheme of sums of independent random variables, which will allow us to sharpen many of the previous results. In particular, the logarithmic factor appearing in the bounds for the Kolmogorov distance in Propositions 17.1.1 and 17.5.1 may be removed (as well as in the deviation bound of Proposition 17.6.1). This is shown using Fourier Analysis, more precisely – a third order Edgeworth expansion for characteristic functions under the 4-th moment condition (cf. Chapter 4), and applying several results from Chapter 10 about deviations of elementary polynomials on the unit sphere. Even better bounds hold when applying a fourth order Edgeworth expansion under the 5-th moment condition.
20.1 Edgeworth Expansion for Weighted Sums Throughout this chapter, let 𝑋1 , . . . , 𝑋𝑛 (𝑛 ≥ 2) be independent random variables with mean zero, variance one, and finite absolute moments 𝛽 𝑝,𝑘 = E |𝑋 𝑘 | 𝑝 of an integer order 𝑝 ≥ 3. As before, we consider the weighted sums 𝑆 𝜃 = 𝜃 1 𝑋1 + · · · + 𝜃 𝑛 𝑋 𝑛 ,
𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 ,
together with their distribution functions 𝐹𝜃 (𝑥) = P{𝑆 𝜃 ≤ 𝑥}. As we know from Chapter 4, every 𝐹𝜃 is approximated by the standard normal law with density and distribution function ∫ 𝑥 1 −𝑥 2 /2 𝜑(𝑥) = √ e , Φ(𝑥) = 𝜑(𝑦) d𝑦 (𝑥 ∈ R). −∞ 2𝜋 Moreover, the rate of this approximation may be controlled by means of the Berry– Esseen inequality 𝑛 ∑︁ 𝜌(𝐹𝜃 , Φ) ≤ 𝑐 |𝜃 𝑘 | 3 𝛽3,𝑘 , 𝑘=1
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_20
389
390
20 Product Measures
where 𝑐 > 0 is an absolute constant, cf. (4.15). Here, the sum represents the Lyapunov coefficient 𝐿 3 for the weighted sequence {𝜃 𝑘 𝑋 𝑘 } 𝑛𝑘=1 and is greater than or equal √ to 1/ 𝑛. To improve this standard rate of approximation (which is however not guaranteed in general), one should consider an Edgeworth correction of the normal law of order 𝑝 − 1 with some 𝑝 ≥ 4. To describe these corrections explicitly, let us recall Definition 4.5.2 (cf. (4.23), Proposition 4.7.1 and Definition 4.7.2. We note that the cumulants for the weighted sum exist up to order 𝑝 and are given by the formula 𝛾𝑙 (𝜃) ≡ 𝛾𝑙 (𝑆 𝜃 ) =
𝑛 ∑︁
𝛾𝑙 (𝑋 𝑘 ) 𝜃 𝑙𝑘 ,
𝑙 = 1, 2, . . . , 𝑝,
𝑘=1
in which every cumulant 𝛾𝑙 (𝑋 𝑘 ) represents a certain polynomial in moments E𝑋 𝑘𝑚 , 𝑚 ≤ 𝑙. In particular, 𝛾1 (𝑋 𝑘 ) = 0 and 𝛾2 (𝑋 𝑘 ) = 1, by the assumption that E𝑋 𝑘 = 0 and Var(𝑋 𝑘 ) = 1. Using these functionals, the Edgeworth correction of order 𝑝−1 for 𝐹𝜃 is defined as a signed Borel measure on the real line with total measure 1, whose Fourier–Stieltjes transform, density and “distribution” function are respectively given by 𝛾 (𝜃) 𝑘1 𝛾 (𝜃) 𝑘 𝑝−3 1 𝑝−1 3 ··· (𝑖𝑡) 𝑘 , 𝑘 1 ! · · · 𝑘 𝑝−3 ! 3! ( 𝑝 − 1)! 𝛾 (𝜃) 𝑘1 𝛾 (𝜃) 𝑘 𝑝−3 ∑︁ 1 𝑝−1 3 ··· 𝐻 𝑘 (𝑥), 𝜑 𝑝−1, 𝜃 (𝑥) = 𝜑(𝑥) + 𝜑(𝑥) 𝑘 1 ! · · · 𝑘 𝑝−3 ! 3! ( 𝑝 − 1)! 𝛾 (𝜃) 𝑘1 𝛾 (𝜃) 𝑘 𝑝−3 ∑︁ 1 𝑝−1 3 Φ 𝑝−1, 𝜃 (𝑥) = Φ(𝑥) − 𝜑(𝑥) ··· 𝐻 𝑘−1 (𝑥) 𝑘 1 ! · · · 𝑘 𝑝−3 ! 3! ( 𝑝 − 1)! 𝑔 𝑝−1, 𝜃 (𝑡) = 𝑔(𝑡) + 𝑔(𝑡)
∑︁
2
(for 𝑡 ∈ R and 𝑥 ∈ R). Here 𝑔(𝑡) = e−𝑡 /2 is the characteristic function of the standard normal law, and 𝐻 𝑘 (𝑥) is the Chebyshev–Hermite polynomial of degree 𝑘 = 3𝑘 1 + · · · + ( 𝑝 − 1)𝑘 𝑝−3 . In each formula, the summation runs over all non-negative integers 𝑘 1 , . . . , 𝑘 𝑝−3 that are not all zero and such that 𝑘 1 + 2𝑘 2 + · · · + ( 𝑝 − 3)𝑘 𝑝−3 ≤ 𝑝 − 3. In particular, 𝛾3 (𝜃) (𝑖𝑡) 3 , 3! 𝛾4 (𝜃) 𝛾3 (𝜃) 2 𝛾3 (𝜃) 6 −𝑡 2 /2 (𝑖𝑡) 3 + (𝑖𝑡) 4 + (𝑖𝑡) , 𝑔4, 𝜃 (𝑡) = e 1+ 3! 4! 2! 3!2 𝑔3, 𝜃 (𝑡) = e−𝑡
2 /2
1+
(20.1)
and 𝛾3 (𝜃) 𝐻2 (𝑥)𝜑(𝑥), 3! 𝛾 (𝜃) 𝛾4 (𝜃) 𝛾3 (𝜃) 2 3 Φ4, 𝜃 (𝑥) = Φ(𝑥) − 𝜑(𝑥) 𝐻2 (𝑥) + 𝐻3 (𝑥) + 𝐻 (𝑥) . 5 3! 4! 2! 3!2
Φ3, 𝜃 (𝑥) = Φ(𝑥) −
(20.2)
20.1 Edgeworth Expansion for Weighted Sums
391
A number of properties of the functions Φ 𝑝−1, 𝜃 which we discussed before in Chapter 4 are controlled by the Lyapunov coefficients. In the case of the weighted sequence 𝜃 1 𝑋1 , . . . , 𝜃 𝑛 𝑋𝑛 , they are defined by 𝐿 𝑝 (𝜃) =
𝑛 ∑︁
𝛽 𝑝,𝑘 |𝜃 𝑘 | 𝑝 ,
𝜃 ∈ S𝑛−1 ,
𝑘=1 𝑝−2
thus representing weighted ℓ 𝑝 -norms on the sphere. Although 𝐿 𝑝 (𝜃) ≥ 𝑛− 2 for all 𝜃 (Proposition 4.2.2), these functions are of the same order for most of 𝜃 on the sphere with respect to the uniform measure 𝔰𝑛−1 . Therefore, we use as basic functionals 𝑛 1 ∑︁ 𝛽 𝑝,𝑘 . 𝛽 𝑝 = max 𝛽 𝑝,𝑘 and 𝛽¯ 𝑝 = 1≤𝑘 ≤𝑛 𝑛 𝑘=1 Clearly, 𝛽 𝑝 ≥ 𝛽¯ 𝑝 ≥ 1
𝐿 𝑝 (𝜃) ≤ 𝛽 𝑝 ≤ 𝑛 𝛽¯ 𝑝 .
and 1
Since 𝛽¯2 = 1, the function 𝑝 → ( 𝛽¯ 𝑝 ) 𝑝−2 is non-decreasing for 𝑝 > 2 in analogy 1 𝑝−2 is non-dewith 𝐿 𝑝 (Proposition 4.2.1). Since each 𝛽2,𝑘 = 1, the function 𝑝 → 𝛽 𝑝,𝑘 1
creasing as well, and the same is thus true for 𝛽 𝑝𝑝−2 . In particular, for all 𝑝 ≥ 4, 1
1
𝛽¯3 ≤ ( 𝛽¯ 𝑝 ) 𝑝−2 ,
𝛽3 ≤ 𝛽 𝑝𝑝−2 ,
2
𝛽¯4 ≤ ( 𝛽¯ 𝑝 ) 𝑝−2 ,
2
𝛽4 ≤ 𝛽 𝑝𝑝−2 .
In terms of these functionals, one may control deviations of the Lyapunov coefficients, as the following statement shows, with a rate depending on the type of the functional. Here and in the sequel, we denote by 𝑐 a positive absolute constant and by 𝑐 𝑝 a positive constant which depends on 𝑝 only. Proposition 20.1.1 For any integer 𝑝 ≥ 3, E 𝜃 𝐿 𝑝 (𝜃) ≤ 𝑝 𝑝/2 𝛽¯ 𝑝 𝑛−
𝑝−2 2
(20.3)
.
Moreover, n 𝑝−2 o 𝔰𝑛−1 𝑛 2 𝐿 𝑝 (𝜃) ≥ 𝑐 𝑝 𝛽¯ 𝑝 𝑟 ≤ 2 exp − 𝑟 2/ 𝑝 , 𝑟 > 0, n 𝑝−2 o 𝔰𝑛−1 𝑛 2 𝐿 𝑝 (𝜃) ≥ 𝑐 𝑝 𝛽 𝑝 𝑟 ≤ exp − (𝑟𝑛) 2/ 𝑝 , 𝑟 ≥ 1.
(20.4) (20.5)
Indeed, by the definition, E 𝜃 𝐿 𝑝 (𝜃) = 𝑛 𝛽¯ 𝑝 E 𝜃 |𝜃 1 | 𝑝 . 𝑝
As was shown in Section 10.8, we have E 𝜃 |𝜃 1 | 𝑝 ≤ 𝑝 𝑝/2 𝑛− 2 , and (20.3) follows. This bound is also implied by (20.4), although with a worse 𝑝-dependent factor. In turn, (20.4) is obtained from the large deviation bound (10.23) on the unit sphere
392
20 Product Measures
Í applied with coefficients 𝑎 𝑘 = 𝛽 𝑝,𝑘 / 𝑛𝑗=1 𝛽 𝑝, 𝑗 . Using 𝐿 𝑝 (𝜃) ≤ 𝛽 𝑝
𝑛 ∑︁
|𝜃 𝑘 | 𝑝 ,
𝑘=1
(20.5) follows from the deviation bound (10.22), cf. Proposition 10.8.1.
20.2 Approximation of Characteristic Functions of Weighted Sums By the independence hypothesis, the characteristic functions of the weighted sums 𝑆 𝜃 have a product structure, 𝑓 𝜃 (𝑡) = E e𝑖𝑡𝑆 𝜃 =
𝑛 Ö
E e𝑖𝑡 𝜃𝑘 𝑋𝑘
(𝜃 ∈ S𝑛−1 , 𝑡 ∈ R).
𝑘=1
Hence, one may apply the general Proposition 4.8.2 in order to explore the rate of approximation of 𝑓 𝜃 (𝑡) by the corrected “normal” characteristic functions 𝑔 𝑝−1, 𝜃 (𝑡) for most directions 𝜃 ∈ S𝑛−1 . This can be done by√virtue of Proposition 20.1.1 on the 𝑡-intervals of a moderate size such as |𝑡| ≤ 𝑛/(𝑐𝛽3 ), which also allows one to consider an approximation for deviations of 𝑓 𝜃 (𝑡) from the mean (typical) characteristic function 𝑓 (𝑡) = E 𝜃 𝑓 𝜃 (𝑡), 𝑡 ∈ R, in terms of deviations of 𝑔 𝑝−1, 𝜃 (𝑡) from their means. Recall that, by Proposition 4.7.4, |𝑔 𝑝−1, 𝜃 (𝑡)| ≤ 𝑐 𝑝 max{𝐿 𝑝 (𝜃), 1}. Moreover, we have the following statement. Proposition 20.2.1 If 𝛽 𝑝 is finite for 𝑝 ≥ 3, then for all 𝜃 ∈ S𝑛−1 , | 𝑓 𝜃 (𝑡) − 𝑔 𝑝−1, 𝜃 (𝑡)| ≤ 𝑐 𝑝 𝐿 𝑝 (𝜃) min{1, |𝑡| 𝑝 } e−𝑡
2 /8
,
|𝑡| ≤
1 . 𝐿 3 (𝜃)
(20.6)
As a consequence, for all 𝜃 ∈√S𝑛−1 except for a set of measure ≤ 𝑐 𝑝 𝛽 𝑝 exp{−𝑛2/ 𝑝 }, we have in the interval |𝑡| ≤ 𝑛/(𝑐𝛽3 ) 𝑓 𝜃 (𝑡) − 𝑓 (𝑡) = 𝑔 𝑝−1, 𝜃 (𝑡) − E 𝜃 𝑔 𝑝−1, 𝜃 (𝑡) + 𝜀 with
𝑝−2 2 |𝜀| ≤ 𝑐 𝑝 𝛽 𝑝 𝑛− 2 |𝑡| 𝑝 e−𝑡 /8 + exp − 𝑛2/ 𝑝 .
(20.7)
20.2 Approximation of Characteristic Functions of Weighted Sums
393
Proof The inequality (20.6) is a partial case of Proposition 4.8.2. Using (20.5) with 𝑝 = 3 and 𝑟 = 1, we obtain that the inequality 𝑐𝛽3 1 𝐿 3 (𝜃) ≤ √ ≡ 𝑇𝑛 𝑛 holds true with some absolute constant 𝑐 > 0 for all 𝜃 from a set Ω0 on the sphere of measure at least 1 − exp{−𝑛2/3 }. Hence, by (20.6), for all 𝜃 ∈ Ω0 , sup |𝑡 | ≤𝑇𝑛
h
i 1 | 𝑓 (𝑡) − 𝑔 (𝑡)| ≤ 𝑐 𝑝 𝐿 𝑝 (𝜃). 𝜃 𝑝−1, 𝜃 min{1, |𝑡| 𝑝 } e−𝑡 2 /8
Applying once more (20.5) with 𝑟 = 1, we get that the inequality 𝑝−2 𝑓 𝜃 (𝑡) − 𝑔 𝑝−1, 𝜃 (𝑡) ≤ 𝑐 𝑝 𝛽 𝑝 𝑛− 2 |𝑡| 𝑝 e−𝑡 2 /8 ,
|𝑡| ≤ 𝑇𝑛 ,
(20.8)
holds for all 𝜃 in a set Ω on the sphere of measure at least 1 − 2 exp{−𝑛2/ 𝑝 }. One may replace 𝑓 𝜃 (𝑡) and 𝑔 𝑝−1, 𝜃 (𝑡) in this inequality by their mean values 𝑓 (𝑡) = E 𝜃 𝑓 𝜃 (𝑡) and 𝑔 𝑝−1 (𝑡) = E 𝜃 𝑔 𝑝−1, 𝜃 (𝑡) at the expense of an error not exceeding (1 − 𝔰𝑛−1 (Ω)) sup | 𝑓 𝜃 (𝑡) − 𝑔 𝑝−1, 𝜃 (𝑡)| ≤ 4 exp − 𝑛2/ 𝑝 . 𝑡
Averaging over 𝜃 in (20.8), this inequality thus yields in the same 𝑡-interval 𝑝−2 𝑓 (𝑡) − 𝑔 𝑝−1 (𝑡) ≤ 𝑐 𝑝 𝛽 𝑝 𝑛− 2 |𝑡| 𝑝 e−𝑡 2 /8 + 4 exp − 𝑛2/ 𝑝 . Finally, combining the latter with (20.8), one may bound the expression ( 𝑓 𝜃 (𝑡) − 𝑔 𝑝−1, 𝜃 (𝑡)) − ( 𝑓 (𝑡) − 𝑔 𝑝−1 (𝑡)) = 𝑓 𝜃 (𝑡) − 𝑓 (𝑡) − 𝑔 𝑝−1, 𝜃 (𝑡) − E 𝜃 𝑔 𝑝−1, 𝜃 (𝑡) by a similar quantity. As a result, we arrive at (20.7), and Proposition 20.2.1 is proved. □ Let us now express the representation (20.7) in a more explicit form in the particular cases 𝑝 = 4 and 𝑝 = 5, assuming that the random variables 𝑋1 , . . . , 𝑋𝑛 are identically distributed, with E𝑋1 = 0 and E𝑋12 = 1. Note that 𝛽¯ 𝑝 = 𝛽 𝑝 = E |𝑋1 | 𝑝 . For 𝑝 = 4, the representation (20.1) for the corrected normal characteristic function 𝑔3, 𝜃 (𝑡) contains the term 𝛾3 (𝜃) = 𝛼3 𝛼3 (𝜃),
𝛼3 (𝜃) =
𝑛 ∑︁
𝜃 3𝑘 ,
𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 ,
𝑘=1
where 𝛼3 = E𝑋13 = 𝛾3 (𝑋1 ). Since these functions have 𝑠 𝑛−1 -mean zero, from (20.7) we obtain:
394
20 Product Measures
𝑛−1 except for a set on the sphere Corollary 20.2.2 If 𝛽4 is finite, then for all 𝜃 ∈ S√ √ − 𝑛 of measure at most 𝑐𝛽4 e , in the interval |𝑡| ≤ 𝑛/(𝑐𝛽3 ), we have
(𝑖𝑡) 3 −𝑡 2 /2 e +𝜀 3!
𝑓 𝜃 (𝑡) − 𝑓 (𝑡) = 𝛼3 𝛼3 (𝜃) with |𝜀| ≤ 𝑐𝛽4
1
𝑡 4 e−𝑡
2 /8
√ + exp − 𝑛 .
𝑛
Turning to the next order of approximation with 𝑝 = 5, let us recall that 𝛾4 (𝑋1 ) = E𝑋14 − 3 = 𝛽4 − 3. Hence, we need to involve the cumulant function 𝛾4 (𝜃) = (𝛽4 − 3) 𝑙4 (𝜃),
𝑙4 (𝜃) =
𝑛 ∑︁
𝜃 4𝑘 .
𝑘=1
Moreover, assuming that 𝛼3 = 0, the expression in (20.1) for 𝑔4, 𝜃 (𝑡) is considerably 3 , we then obtain: simplified. Since E 𝜃 𝑙4 (𝜃) = 𝑛+2 Corollary 20.2.3 If 𝛽5 is finite and 𝛼3 = 0, then for all√𝜃 ∈ S𝑛−1 except for a set of measure at most 𝑐𝛽5 exp{−𝑛2/5 }, in the interval |𝑡| ≤ 𝑛/(𝑐𝛽3 ), we have 𝑓 𝜃 (𝑡) − 𝑓 (𝑡) = (𝛽4 − 3) 𝑙4 (𝜃) − with |𝜀| ≤ 𝑐𝛽5
3 𝑡 4 −𝑡 2 /2 e +𝜀 𝑛 + 2 4!
1 2 |𝑡| 5 e−𝑡 /8 + exp − 𝑛2/5 . 3/2 𝑛
20.3 Integral Bounds on Characteristic Functions Let us now return to the general (not necessarily i.i.d.) setting and consider the behavior of characteristic functions 𝑓 𝜃 (𝑡) on large 𝑡-intervals. More precisely, here we focus on integrals of the form ∫ 𝐼 𝑝 (𝜃) = 1 { 𝛿 ≤𝐿3 ( 𝜃) ≤1}
1/ 𝛿
1/𝐿3 ( 𝜃)
| 𝑓 𝜃 (𝑡)| d𝑡, 𝑡
(20.9)
which appear in the Berry–Esseen bound of Proposition 4.8.1. Here and in the sequel, we choose the parameter 𝛿 = 𝛽¯ 𝑝 𝑛−
𝑝−2 2
=𝑛
− 𝑝/2
𝑛 ∑︁ 𝑘=1
E |𝑋 𝑘 | 𝑝 .
20.3 Integral Bounds on Characteristic Functions
395
Proposition 20.3.1 Let 𝑝 ≥ 4. For some absolute constants 𝑐 > 0 and 𝐶 > 0, n o − 1 E 𝜃 𝐼 𝑝 (𝜃) ≤ 𝐶 𝑝 1 { 𝛿 ≤1} exp − 𝑐𝑛1/4 𝛽¯ 𝑝 2( 𝑝−2) .
(20.10)
Moreover, o n o n − 2 E 𝜃 𝐼 𝑝 (𝜃) ≤ 𝐶 𝑝 1 { 𝛿 ≤1} exp − 2𝑛2/3 + exp − 𝑐𝑛𝛽 𝑝 𝑝−2 .
(20.11)
Proof If 𝛿 > 1, then 𝐼 𝑝 (𝜃) = 0, and there is nothing to prove. So, let 𝛿 ≤ 1. On the sphere S𝑛−1 we consider the sets Ω0 = {𝛿 ≤ 𝐿 3 (𝜃) ≤ 1},
Ω1 = {𝐿 3 (𝜃) < 𝜀},
Ω2 = {𝐿 3 (𝜃) ≥ 𝜀}
with parameter 𝜀 > 0 to be chosen later on. To estimate the measure of the last set, 1 note that the bound 𝛽¯3 ≤ ( 𝛽¯ 𝑝 ) 𝑝−2 implies √
1 1 𝑛 √ ¯ − 𝑝−2 ≥ 𝑛 (𝛽𝑝) = 𝛿− 𝑝−2 . ¯ 𝛽3
Hence, applying the bound (20.4) of Proposition 20.1.1 with 𝑝 = 3, we have o n √𝑛 2/3 o n 2 𝜀 ≤ 2 exp − 𝑐 𝜀 2/3 𝛿− 3( 𝑝−2) 𝔰𝑛−1 (Ω2 ) ≤ 2 exp − 𝑐 𝛽¯3 with an absolute constant 𝑐 > 0. Since | 𝑓 𝜃 (𝑡)| ≤ 1, while the lower limit of integration in (20.9) is necessarily greater than or equal to 1, we have 𝐼 𝑝 (𝜃) ≤ log(1/𝛿) for all 𝜃 ∈ S𝑛−1 . This gives o n 2 E 𝜃 𝐼 𝑝 (𝜃) 1Ω2 ≤ 2 log(1/𝛿) exp − 𝑐 𝜀 2/3 𝛿− 3( 𝑝−2) . Turning to the integral over the complementary set and putting 𝜀 ′ = min{𝜀, 1}, note that, for all 𝜃 ∈ Ω1 , ∫ 1/ 𝛿 | 𝑓 𝜃 (𝑡)| d𝑡, 𝐼 𝑝 (𝜃) ≤ 1 { 𝛿 ≤ 𝜀′ } ′ 𝑡 1/𝜀 so that ∫ E 𝜃 𝐼 𝑝 (𝜃) 1Ω1 ≤ 1 { 𝛿 ≤ 𝜀′ }
1/ 𝛿
1/𝜀′
E | 𝑓 𝜃 (𝑡)| d𝑡. 𝑡 2
In this step we involve Lemma 13.1.4. Since 𝛽¯4 ≤ ( 𝛽¯ 𝑝 ) 𝑝−2 , this gives o n 𝑛 √ √ 2 2 ( 𝛽¯ 𝑝 ) − 𝑝−2 E 𝜃 | 𝑓 𝜃 (𝑡)| ≤ 3 e−𝑡 /4 + 6 exp − 32 o n 1 √ −𝑡 2 /4 √ 2 = 3e 𝛿− 𝑝−2 . + 6 exp − 32
396
20 Product Measures
Hence ∫
1/ 𝛿
n 1 o d𝑡 2 + 3 exp − 𝛿− 𝑝−2 32 𝑡 1/𝜀′ o o n n 2 1 1 − 𝑝−2 ≤ 4 exp − 2 + 3 log(1/𝛿) exp − 𝛿 . (20.12) 32 4𝜀
E 𝜃 𝐼 𝑝 (𝜃) 1Ω1 ≤ 1 { 𝛿 ≤ 𝜀′ }
2 e−𝑡
2 /4
One can now combine the two estimates, which leads to o n n 2 1 o E 𝜃 𝐼 𝑝 (𝜃) ≤ 4 exp − 2 + 5 log(1/𝛿) exp − 𝑐 𝜀 2/3 𝛿− 3( 𝑝−2) . 4𝜀 In order to (approximately) optimize the right-hand side over 𝜀, it is sufficient to equalize the two exponential terms on the right-hand side. Namely, choosing 1 𝜀 = 𝛿 4( 𝑝−2) ≤ 1, the resulting bound simplifies to o n 1 E 𝜃 𝐼 𝑝 (𝜃) ≤ (4 + 5 log(1/𝛿)) exp − 𝑐 𝛿− 2( 𝑝−2) . Choosing a smaller value of the constant 𝑐 > 0, one may replace the logarithmic term with a quantity not exceeding 𝐶 𝑝, since oi h n 𝑐 1 sup log(1/𝛿) exp − 𝛿− 2( 𝑝−2) ≤ 𝐶 𝑝. 2 0< 𝛿 ≤1 As a result, we arrive at (20.10). Turning to the second√bound of the lemma, define the sets Ω 𝑗 in a similar way with 𝜀 = 𝑐𝑏 𝑛 , 𝑏 𝑛 = 𝛽3 / 𝑛, where the constant 𝑐 > 0 is chosen according to the bound (20.5) of Proposition 20.1.1, so that 𝔰𝑛−1 (Ω2 ) ≤ exp − 3𝑛2/3 . Note that 𝛽¯ 𝑝 ≥ 1, which implies log(1/𝛿) ≤
𝑝−2 2
log 𝑛. Since | 𝑓 𝜃 (𝑡)| ≤ 1, we get
𝑝−2 𝔰𝑛−1 (Ω2 ) log 𝑛 ≤ 𝐶 𝑝 exp − 2𝑛2/3 E 𝜃 𝐼 𝑝 (𝜃) 1Ω2 ≤ 2 for some constant 𝐶. Using (20.12) once more, we also have o o n 1 2 1 𝛿− 𝑝−2 + 3 log(1/𝛿) exp − 2 32 4 (𝑐𝑏 𝑛 ) o n n 1 2 𝑛 o ≤ 4 exp − 2 + 𝐶 𝑝 exp − 𝛿− 𝑝−2 . 64 4𝛽3
n E 𝜃 𝐼 𝑝 (𝜃) 1Ω1 ≤ 4 exp −
1
𝑝−2
It remains to combine the two bounds and use 𝛽3 ≤ 𝛽 𝑝𝑝−2 together with 𝛿 ≤ 𝛽 𝑝 𝑛− 2 . Proposition 20.3.1 is proved. □
20.4 Approximation in the Kolmogorov Distance
397
20.4 Approximation in the Kolmogorov Distance The approximation of 𝐹𝜃 by means of the corrected normal “distribution” function Φ 𝑝−1, 𝜃 on average and actually for most of the weight vectors 𝜃 ∈ S𝑛−1 can be given in terms of the 𝑀 𝑝 -functional, which is equivalent in the case of independent components to the maximal 𝐿 𝑝 -norm 𝛽 𝑝 = max 𝑘 (E |𝑋 𝑘 | 𝑝 ) 1/ 𝑝 (within 𝑝-dependent constants, cf. Proposition 2.1.3). One may also use a more flexible functional 𝛽¯ 𝑝 . Recall that, by Proposition 4.7.4, for all 𝜃 ∈ S𝑛−1 , 𝜌(Φ 𝑝−1, 𝜃 , Φ) ≤ 𝑐 𝑝 max{𝐿 𝑝 (𝜃), 1} with some constant 𝑐 𝑝 ≥ 1 depending on 𝑝 only. Since 𝜌(Φ, 𝐹𝜃 ) ≤ 1, a similar inequality also holds if we replace Φ with 𝐹𝜃 so that 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≤ 𝑐 𝑝 max{𝐿 𝑝 (𝜃), 1}. As a consequence, applying (20.4), we obtain that, for any 𝑟 ≥ 1/ 𝛽¯ 𝑝 , 𝔰𝑛−1 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≥ 𝑐2𝑝 𝛽¯ 𝑝 𝑟 ≤ 𝔰𝑛−1 max{𝐿 𝑝 (𝜃), 1} ≥ 𝑐 𝑝 𝛽¯ 𝑝 𝑟 = 𝔰𝑛−1 𝐿 𝑝 (𝜃) ≥ 𝑐 𝑝 𝛽¯ 𝑝 𝑟 𝑝−2 (20.13) ≤ 2 exp − 𝑟 2/ 𝑝 𝑛 𝑝 . In other words, the distance from 𝐹𝜃 to Φ 𝑝−1, 𝜃 is essentially bounded by 1/ 𝛽¯ 𝑝 . We now derive much stronger inequalities. In particular, one may remove the requirement that the variable 𝑟 is bounded away from zero in the inequality (20.13). Proposition 20.4.1 If 𝑝 ≥ 4, we have E 𝜃 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≤ 𝑐 𝑝 𝛽¯ 𝑝 𝑛−
𝑝−2 2
.
(20.14)
Moreover, for all 𝑟 > 0, 𝑝−2 𝔰𝑛−1 𝑛 2 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≥ 𝑐 𝑝 𝛽¯ 𝑝 𝑟 ≤ 2 exp − 𝑟 2/ 𝑝 .
(20.15)
These large deviation bounds may be further sharpened in terms of the moment functional 𝛽 𝑝 (which is often of the same order as 𝛽¯ 𝑝 ). Proposition 20.4.2 If 𝑝 ≥ 4, we have, for all 𝑟 ≥ 1, 𝑝−2 𝔰𝑛−1 𝑛 2 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≥ 𝑐 𝑝 𝛽 𝑝 𝑟 ≤ 𝜀 𝑝 (𝑛, 𝑟),
(20.16)
where 𝑝−2 − 2 𝜀 𝑝 (𝑛, 𝑟) = exp − (𝑟𝑛) 2/ 𝑝 + 2 𝐼 𝛽¯ 𝑝 ≤ 𝑛 2 exp − min 𝑛2/3 , 𝑐𝑛𝛽 𝑝 𝑝−2 . Here, the last term should be removed when 𝛽¯ 𝑝 > 𝑛
𝑝−2 2
.
398
20 Product Measures
For example, choosing 𝑟 = 1 and assuming that 𝑛 is large enough, this bound gives 𝑝−2 𝔰𝑛−1 𝑛 2 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≥ 𝑐 𝑝 𝛽 𝑝 ≤ exp − 𝑛2/ 𝑝 . Our basic tool is Proposition 4.8.1, which relates the Kolmogorov distance to the Lyapunov coefficients 𝐿 𝑝 (𝜃) and the characteristic functions 𝑓 𝜃 (𝑡) via the upper bound 𝑐 𝑝 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≤ 𝐿 𝑝 (𝜃) + 𝐼 𝑝 (𝜃) + 𝛿, where 𝐼 𝑝 (𝜃) is the integral defined in (20.9) with arbitrary 𝛿 ≥ 0. As before, we 𝑝−2 choose the value 𝛿 = 𝛽¯ 𝑝 𝑛− 2 , so that 𝑐 𝑝 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≤ 𝐿 𝑝 (𝜃) + 𝐼 𝑝 (𝜃) + 𝛽¯ 𝑝 𝑛−
𝑝−2 2
.
(20.17)
With this approach we are reduced to the study of the distributions of these two functionals on the sphere, 𝐿 𝑝 (𝜃) and 𝐼 𝑝 (𝜃), considered in the previous two sections. Proof (of Proposition 20.4.1) Using e−𝑥 ≤ 𝑝 2 𝑝 𝑥 −2( 𝑝−2) (𝑥 > 0), from the bound (20.10) of Proposition 20.3.1 it follows that E 𝜃 𝐼 𝑝 (𝜃) ≤ 𝑐 𝑝 𝛽¯ 𝑝 𝑛−
𝑝−2 2
.
Applying this in (20.17) together with a similar bound (20.3) on E 𝜃 𝐿 𝑝 (𝜃), we arrive at the first inequality (20.14). By Markov’s inequality, (20.10) also implies that o n 𝑝−2 1 𝐶𝑝 exp − 𝑐𝛿− 2( 𝑝−2) . 𝔰𝑛−1 𝑛 2 𝐼 𝑝 (𝜃) ≥ 𝑐 𝑝 𝛽¯ 𝑝 ≤ 1 { 𝛿 ≤1} 𝑐𝑝 𝛿 Since sup 𝛿>0
(20.18)
oi 2 2 𝑝 n 𝑐 1 exp − 𝛿− 2( 𝑝−2) ≤ , 𝛿 2 𝑐
h1
one can choose suitable values of 𝑐 > 0 and 𝑐 𝑝 > 0 in order to simplify (20.18) to the form o n 𝑝−2 1 𝔰𝑛−1 𝑛 2 𝐼 𝑝 (𝜃) ≥ 𝑐 𝑝 𝛽¯ 𝑝 ≤ 1 { 𝛿 ≤1} exp − 𝑐𝛿− 2( 𝑝−2) . Let us now combine this with the deviation bound (20.4) on 𝐿 𝑝 (𝜃), and then from (20.17) we get that 𝑝−2 𝔰𝑛−1 𝑛 2 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≥ 2𝑐 𝑝 𝛽¯ 𝑝 𝑟 ≤ 2 exp − 𝑟 2/ 𝑝 o n 1 + 1 { 𝛿 ≤1} exp − 𝑐𝛿− 2( 𝑝−2) . If 𝛿 > 1, the last term on the right-hand side is vanishing, and we obtain the large deviation bound (20.15). But, if 𝛿 ≤ 1, we only have (replacing 2𝑐 𝑝 with 𝑐 𝑝 ) n o 𝑝−2 1 𝔰𝑛−1 𝑛 2 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≥ 𝑐 𝑝 𝛽¯ 𝑝 𝑟 ≤ 3 exp − min 𝑟 2/ 𝑝 , 𝑐𝛿− 2( 𝑝−2) .
20.4 Approximation in the Kolmogorov Distance
399
1
𝑝
In the range 𝑟 2/ 𝑝 ≤ 𝛿− 2( 𝑝−2) , that is, for 𝑟 ≤ 𝑟 0 = 𝛿− 4( 𝑝−2) , the above also leads to the required bound 𝑝−2 𝔰𝑛−1 𝑛 2 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≥ 𝑐 𝑝 𝛽¯ 𝑝 𝑟 ≤ 3 exp − 𝑐𝑟 2/ 𝑝 .
(20.19)
Let now 𝑟 > 𝑟 0 , in which case 𝛽¯ 𝑝 𝑟𝑛−
𝑝−2 2
3 𝑝−8
= 𝛿𝑟 > 𝛿𝑟 0 = 𝛿− 4( 𝑝−2) ≥ 1.
Hence, we may apply the inequality (20.13) with 𝑟𝑛− 𝑝−2 𝔰𝑛−1 𝑛 2
𝑝−2 2
in place of 𝑟, which gives 𝑝−2 o 𝑝−2 2 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≥ 𝑐 𝑝 𝛽¯ 𝑝 𝑟 ≤ 2 exp − (𝑟𝑛− 2 ) 𝑝 𝑛 𝑝 = 2 exp − 𝑟 2/ 𝑝 . n
Thus, we have obtained (20.19) in all cases with arbitrary 𝑟 > 0. Rescaling the variable 𝑟, this inequality may be modified to the form (20.15). Proposition 20.4.1 is proved. □ Proof (of Proposition 20.4.2) We now apply Markov’s inequality on the basis of 𝑝−2 (20.11). Since 𝛽 𝑝 𝑛 2 ≥ 𝛿, it gives 𝑝−2
𝔰𝑛−1 𝑛
𝑝−2 2
𝐼 𝑝 (𝜃) ≥ 𝑐 𝑝 𝛽 𝑝
𝐶𝑝 𝑛 2 exp − 2𝑛2/3 ≤ 1 { 𝛿 ≤1} 𝑐𝑝 𝛽𝑝 o n 2 𝐶𝑝 exp − 𝑐𝛿− 𝑝−2 . + 1 { 𝛿 ≤1} 𝑐𝑝𝛿
As before, choosing a smaller value of 𝑐 and a larger value of 𝑐 𝑝 , the coefficient 𝐶𝑝 𝑐 𝑝 𝛿 in the last term may be removed. Similarly, using 𝛽 𝑝 ≥ 1, the first term may be replaced with exp{−𝑛2/3 }. Hence, the above bound simplifies to o n 𝑝−2 − 2 𝔰𝑛−1 𝑛 2 𝐼 𝑝 (𝜃) ≥ 𝑐 𝑝 𝛽 𝑝 ≤ 1 { 𝛿 ≤1} exp − 𝑛2/3 + 1 { 𝛿 ≤1} exp − 𝑐𝑛𝛽 𝑝 𝑝−2 . It remains to combine this with (20.5). Recalling (20.17), we then get that with some constant 𝑐 𝑝 > 0, for all 𝑟 ≥ 1, 𝑝−2 𝔰𝑛−1 𝑛 2 𝜌(𝐹𝜃 , Φ 𝑝−1, 𝜃 ) ≥ 𝑐 𝑝 𝛽 𝑝 𝑟 ≤ exp − (𝑟𝑛) 2/ 𝑝 + 1 { 𝛿 ≤1} exp − 𝑛2/3 o n − 2 + 1 { 𝛿 ≤1} exp − 𝑐𝑛𝛽 𝑝 𝑝−2 . Here, the condition 𝛿 ≤ 1 is the same as 𝛽¯ 𝑝 ≤ 𝑛 (20.16).
𝑝−2 2
, which leads to the inequality □
400
20 Product Measures
20.5 Normal Approximation Under the 4-th Moment Condition As we have seen so far, the approximation of the distribution functions 𝐹𝜃 by the corrected normal “distribution” functions Φ 𝑝−1, 𝜃 can be made as good as we wish for most 𝜃 under natural moment-type conditions (with respect to the growing parameter 𝑛). However, these approximating functions still depend on 𝜃, and one may wonder if one can use a fixed distribution (or measure) to approximate 𝐹𝜃 , for example, by the average corrected normal distribution function 𝐺 𝑝−1 (𝑥) = E 𝜃 Φ 𝑝−1, 𝜃 (𝑥),
𝑥 ∈ R.
This seems to be a natural approach; however, in general the concentration of Φ 𝑝−1, 𝜃 around 𝐺 𝑝−1 may not be as strong as the concentration of 𝐹𝜃 around Φ 𝑝−1, 𝜃 . In the case 𝑝 = 4, the approximating functions are described in (20.2); let us rewrite them in a more explicit form as Φ3, 𝜃 (𝑥) = Φ(𝑥) −
𝛼3 (𝜃) 2 (𝑥 − 1)𝜑(𝑥), 6
𝛼3 (𝜃) =
𝑛 ∑︁
𝛼3,𝑘 𝜃 3𝑘 ,
𝑘=1
where 𝛼3,𝑘 = E𝑋 𝑘3 . Hence 𝐺 3 = Φ is the standard normal distribution function. Our basic functionals quantifying the strength of concentration of 𝐹𝜃 about Φ3, 𝜃 have been 𝑛 1 ∑︁ 𝛽4,𝑘 (𝛽4,𝑘 = E𝑋 𝑘4 ). 𝛽4 = max 𝛽4,𝑘 and 𝛽¯4 = 1≤𝑘 ≤𝑛 𝑛 𝑘=1 They can be used to quantify the strength of concentration of Φ3, 𝜃 around Φ as well. This leads to the following observation due to Klartag and Sodin [125]. Proposition 20.5.1 We have E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 ¯ 𝛽4 . 𝑛
Moreover, for any 𝑟 > 0, 𝔰𝑛−1 𝑛𝜌(𝐹𝜃 , Φ) ≥ 𝑐 𝛽¯4 𝑟 ≤ 2 exp − 𝑟 1/2 .
(20.20)
(20.21)
The strength of approximation may actually be sharpened in terms of 𝛽4 . Proposition 20.5.2 With some absolute constants 𝑐, 𝑐 0 > 0, for all 𝑟 > 0, 𝔰𝑛−1 𝑛𝜌(𝐹𝜃 , Φ) ≥ 𝑐𝑟 𝛽4 ≤ 2 exp − 𝑟 2/3 .
(20.22)
Moreover, if E𝑋 𝑘3 = 0 (𝑘 ≤ 𝑛) and 𝑛 ≥ 𝛽42 , then √ 𝔰𝑛−1 𝑛𝜌(𝐹𝜃 , Φ) ≥ 𝑐𝛽4 ≤ 2 exp − 𝑐 0 𝑛 .
(20.23)
20.5 Normal Approximation Under the 4-th Moment Condition
401
Proof (of Proposition 20.5.1) By Proposition 20.4.1, E 𝜃 𝜌(𝐹𝜃 , Φ3, 𝜃 ) ≤ that, by the triangle inequality, E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 ¯ 𝛽4 + E 𝜃 𝜌(Φ3, 𝜃 , Φ). 𝑛
𝑐 ¯ 𝑛 𝛽4 ,
so
(20.24)
To get a similar bound on the last integral, we use |𝑥 2 − 1| 𝜑(𝑥) ≤ 1, 𝑥 ∈ R. Hence, by the very definition of the 3-rd order corrected normal distribution function, 𝜌(Φ3, 𝜃 , Φ) ≤
1 |𝛼3 (𝜃)|. 6
(20.25)
Put 𝑎 𝑘 = E𝑋 𝑘3 and note that 𝑎 2𝑘 ≤ E𝑋 𝑘4 , which is due to the assumption E𝑋 𝑘2 = 1. Recalling the identity (10.18) with 𝑄 3 = 𝛼3 , we obtain that E 𝜃 𝜌 2 (Φ3, 𝜃 , Φ) ≤
𝑛 𝑛 ∑︁ 2 ∑︁ 5/12 1 E𝜃 𝑎 𝑘 𝜃 3𝑘 = 𝑎 2𝑘 . 36 𝑛(𝑛 + 2) (𝑛 + 4) 𝑘=1 𝑘=1
The last sum does not exceed 𝑛 𝛽¯4 , so that E 𝜃 𝜌 2 (Φ3, 𝜃 , Φ) ≤ E 𝜃 𝜌(Φ3, 𝜃 , Φ) ≤
1 𝑛
√︃
1 𝑛2
𝛽¯4 and hence
1 𝛽¯4 ≤ 𝛽¯4 𝑛
(since 𝛽¯4 ≥ 1). Applying this estimate in (20.24), we arrive at (20.20). Next, we apply Proposition 10.6.3 to 𝑄 3 with coefficients 𝑎 𝑘 = E𝑋 𝑘3 /( 𝛽¯4 ) 1/2 to obtain from (10.18) that, for any 𝑟 > 0, o n 1 𝑟 2/3 . 𝔰𝑛−1 𝑛𝜌(Φ3, 𝜃 , Φ) ≥ 6( 𝛽¯4 ) 1/2 𝑟 ≤ 2 exp − 23 After the change of variable, and using 𝛽¯4 ≥ 1, one may rewrite this as 𝔰𝑛−1 𝑛𝜌(Φ3, 𝜃 , Φ) ≥ 𝑐 𝛽¯4 𝑟 ≤ 2 exp − 𝑟 2/3 .
(20.26)
On the other hand, by the deviation bound (20.15) of Proposition 20.4.1 with 𝑝 = 4, 𝔰𝑛−1 𝑛𝜌(𝐹𝜃 , Φ3, 𝜃 ) ≥ 𝑐 𝛽¯4 𝑟 ≤ 2 exp − 𝑟 1/2 . Combining these two bounds, we thus get n o 𝔰𝑛−1 𝑛𝜌(𝐹𝜃 , Φ) ≥ 2𝑐 𝛽¯4 𝑟 ≤ 4 exp − min 𝑟 2/3 , 𝑟 1/2 , which can easily be simplified to the form (20.21) by changing the variable 𝑟 and choosing a suitable constant 𝑐 > 0. Proposition 20.5.1 is now proved. □
402
20 Product Measures
Proof (of Proposition 20.5.2) Turning to Proposition 20.5.2, we employ the inequality (20.16), which in the case 𝑝 = 4 gives, for all 𝑟 ≥ 1, n o 𝔰𝑛−1 𝑛𝜌(𝐹𝜃 , Φ3, 𝜃 ) ≥ 𝑐𝛽4 𝑟 ≤ 3 exp − min (𝑟𝑛) 1/2 , 𝑛2/3 , 𝑐 0 𝑛𝛽4−1 . (20.27) In the second assertion √ of the proposition, we necessarily have Φ3, 𝜃 = Φ, and, by the assumption, 𝑛𝛽4−1 ≥ 𝑛. Hence, choosing here 𝑟 = 1, we arrive at a concentration bound which is equivalent to (20.23). As for the first assertion, first assume that 𝑟 ≥ 1. We combine (20.27) with (20.26) and use 𝛽¯4 ≤ 𝛽4 , to get that 𝔰𝑛−1 𝑛𝜌(𝐹𝜃 , Φ) ≥ 𝑐𝛽4 𝑟 ≤ 5𝜀(𝑛, 𝑟) (20.28) with
n o 𝜀(𝑛, 𝑟) = exp − min (𝑟𝑛) 1/2 , 𝑛2/3 , 𝑐 0 𝑛𝛽4−1 , 𝑟 2/3 .
Since 𝜌(𝐹𝜃 , Φ) ≤ 1, one may restrict (20.28) to the region 1 ≤ 𝑟 ≤ 𝑐𝛽𝑛 4 (in particular, 𝛽4 ≤ 𝑛/𝑐). But then, 𝑟 2/3 is dominated by all other terms in the definition of 𝜀(𝑛, 𝑟), so that (20.28) simplifies to 𝔰𝑛−1 𝑛𝜌(𝐹𝜃 , Φ) ≥ 𝑐𝑟 𝛽4 ≤ 5 exp − 𝑟 2/3 , 𝑟 ≥ 1. This inequality also holds for 0 ≤ 𝑟 < 1, since its right-hand side is greater than 1 in this case. Hence it holds for all 𝑟 ≥ 0. Finally, rescaling the variable 𝑟 and choosing a larger constant 𝑐 > 0, the above inequality yields (20.22). Proposition 20.5.2 is proved. □
20.6 Approximation With Rate 𝒏−3/2 In order to improve the 𝑛1 -rate of normal approximation for most of the distribution functions 𝐹𝜃 , we need to involve the next term in the Edgeworth expansion. For simplicity, let us consider the i.i.d. case so that 𝑋1 , . . . , 𝑋𝑛 have equal distributions with E𝑋1 = 0, E𝑋22 = 1, 𝛽¯ 𝑝 = 𝛽 𝑝 = E |𝑋1 | 𝑝 . Recall that for 𝑝 = 5, the representation (20.2) for the corrected normal distribution function Φ4, 𝜃 contains the term 𝛾3 (𝜃) = 𝛼3 (𝜃 13 + · · · + 𝜃 3𝑛 ),
𝜃 = (𝜃 1 , . . . , 𝜃 𝑛 ) ∈ S𝑛−1 ,
where 𝛼3 = E𝑋13 . It has fluctuations on the sphere of order 1/𝑛, which follows from the formula (10.18), implying that E 𝜃 𝛾3 (𝜃) 2 =
15 𝛼2 . (𝑛 + 2) (𝑛 + 4) 3
20.6 Approximation With Rate 𝑛−3/2
403
Since the fluctuations of the other terms around their means in (20.2) are of a smaller order, the distances 𝜌(Φ4, 𝜃 , 𝐺 4 ) with 𝐺 4 = E 𝜃 Φ4, 𝜃 are going to be of the same order 1/𝑛, as long as 𝛼3 ≠ 0. This means that the improvement in comparison with the normal approximation obtained in the previous section can only be achieved when 𝛼3 = 0. This is not a strict condition. It is fulfilled, for example, when the underlying distribution (of 𝑋1 ) is symmetric about the origin. So, let us assume that 𝛼3 = 0. Since 𝛾4 (𝜃) = (𝛽4 − 3) 𝑙 4 (𝜃),
𝑙4 (𝜃) =
𝑛 ∑︁
𝜃 4𝑘 ,
𝑘=1
the representation (20.2) is simplified in this case to Φ4, 𝜃 (𝑥) = Φ(𝑥) −
𝛽4 − 3 𝐻3 (𝑥)𝜑(𝑥) 𝑙4 (𝜃), 4!
𝐻3 (𝑥) = 𝑥 3 − 3𝑥.
(20.29)
These functions have the spherical mean 𝐺 4 (𝑥) = Φ(𝑥) −
𝛽4 − 3 𝐻3 (𝑥)𝜑(𝑥), 8 (𝑛 + 2)
𝑥 ∈ R.
(20.30)
We shall verify below that the distances 𝜌(Φ4, 𝜃 , 𝐺 4 ) are of the order 𝑛−3/2 , which leads to the following statement. Proposition 20.6.1 If 𝛼3 = 0 and 𝛽5 < ∞, then E 𝜃 𝜌(𝐹𝜃 , 𝐺 4 ) ≤
𝑐 𝛽5 . 𝑛3/2
(20.31)
Proof Using (20.30), let us rewrite (20.29) as Φ4, 𝜃 (𝑥) = 𝐺 4 (𝑥) −
𝛽4 − 3 3 𝐻3 (𝑥) 𝜑(𝑥) 𝑙 4 (𝜃) − . 24 𝑛+2
It is easy to check that |𝐻3 (𝑥)| 𝜑(𝑥) < 1 for all 𝑥 ∈ R. Since 𝛽4 ≥ 1 so that |𝛽4 − 3| ≤ 2𝛽4 , it follows that 𝜌(Φ4, 𝜃 , 𝐺 4 ) ≤
𝛽4 3 𝑙4 (𝜃) − . 12 𝑛+2
Now, let us recall that, by (10.20) with 𝑎 𝑘 = 1, E 𝜃 𝑙 4 (𝜃) −
√ 3 √︁ ≤ Var 𝜃 (𝑙4 (𝜃)) ≤ 96 𝑛−3/2 , 𝑛+2
which implies that E 𝜃 𝜌(Φ4, 𝜃 , 𝐺 4 ) ≤ 𝛽4 𝑛−3/2 .
(20.32)
404
20 Product Measures
Also, according to Proposition 20.4.1 with 𝑝 = 5, E 𝜃 𝜌(𝐹𝜃 , Φ4, 𝜃 ) ≤ 𝑐𝛽5 𝑛−3/2 . Applying the triangle inequality for the Kolmogorov distance 𝜌 together with 𝛽4 ≤ 𝛽5 , the two bounds yield the required inequality (20.31). □ It is possible to strengthen this proposition in terms of deviation bounds. Proposition 20.6.2 If 𝛼3 = 0 and 𝛽5 < ∞, then, for all 𝑟 ≥ 0, 𝔰𝑛−1 𝑛3/2 𝜌(𝐹𝜃 , 𝐺 4 ) ≥ 𝑐𝛽5 𝑟 ≤ 2 exp − 𝑟 2/5 .
(20.33)
Moreover, if 𝑛 ≥ 𝛽5 , 𝔰𝑛−1 𝑛3/2 𝜌(𝐹𝜃 , 𝐺 4 ) ≥ 𝑐𝛽5 𝑟 ≤ 2 exp − 𝑟 1/2 .
(20.34)
Proof Rescaling the variable 𝑟 (if necessary), we may assume that 𝑟 ≥ 1. Applying Corollary 10.6.2, we have n 𝔰𝑛−1 𝑛3/2 𝑙4 (𝜃) −
o o n 1 3 𝑟 1/2 , ≥ 𝑟 ≤ 2 exp − 𝑛+2 38
𝑟 ≥ 0.
Hence, from (20.32) it also follows that o n n 1 𝛽4 𝑟 −3/2 o 𝑛 ≤ 2 exp − 𝑟 1/2 , 𝔰𝑛−1 𝜌(Φ4, 𝜃 , 𝐺 4 ) ≥ 12 38 which implies 𝔰𝑛−1 𝜌(Φ4, 𝜃 , 𝐺 4 ) ≥ 𝐶 𝛽5 𝑟 𝑛−3/2 ≤ 2 exp − 𝑟 1/2
(20.35)
for some absolute constant 𝐶 > 0. On the other hand, by the bound (20.15) of Proposition 20.4.1 with 𝑝 = 5, 𝔰𝑛−1 𝜌(𝐹𝜃 , Φ4, 𝜃 ) ≥ 𝐶 𝛽5 𝑟 𝑛−3/2 ≤ 2 exp − 𝑟 2/5 . In combination with (20.35), the latter yields an inequality which is equivalent to (20.33). To sharpen this bound, we employ Proposition 20.4.2. The inequality (20.16) specialized to the case 𝑝 = 5 provides the bound n o 𝔰𝑛−1 𝜌(𝐹𝜃 , Φ4, 𝜃 ) ≥ 𝐶 𝛽5 𝑟 𝑛−3/2 ≤ 3 exp − min (𝑟𝑛) 2/5 , 𝑛2/3 , 𝑐𝑛𝛽5−2/3 for some absolute constants 𝑐 > 0 and 𝐶 > 0. Combining this with (20.35), we get that n o 𝔰𝑛−1 𝜌(𝐹𝜃 , 𝐺 4 ) ≥ 𝐶 𝛽5 𝑟 𝑛−3/2 ≤ 5 exp − min 𝑟 1/2 , (𝑟𝑛) 2/5 , 𝑛2/3 , 𝑐𝑛𝛽5−2/3 .
20.6 Approximation With Rate 𝑛−3/2
If
405
n 2o 1 ≤ 𝑟 ≤ 𝑟 0 = min 𝑛4/3 , 𝑛𝛽5−2/3 ,
both (𝑟𝑛) 2/5 and 𝑛2/3 will dominate 𝑟 1/2 , and the above bound gives the desired relation (20.36) 𝔰𝑛−1 𝜌(𝐹𝜃 , 𝐺 4 ) ≥ 𝐶 𝛽5 𝑟 𝑛−3/2 ≤ 5 exp − 𝑐𝑟 1/2 . As for larger values of 𝑟, the usual Berry–Esseen bound 𝜌(𝐹𝜃 , Φ) ≤ 𝐶 𝐿 3 (𝜃) = 𝐶 𝛽3
𝑛 ∑︁
|𝜃 𝑘 | 3
𝑘=1
with a purely Gaussian approximation is more accurate (here one may take 𝐶 = 1). Indeed, applying the inequality (20.5) of Proposition 20.1.1 with 𝑝 = 3, we get that, for all 𝑟 ≥ 𝑛, n o 𝔰𝑛−1 𝜌(𝐹𝜃 , Φ) ≥ 𝐶 𝛽3 𝑟 𝑛−3/2 ≤ 𝔰𝑛−1 𝑛1/2 𝐿 3 (𝜃) ≥ 𝐶 𝛽3 𝑟𝑛−1 ≤ exp − 𝑟 2/3 with some absolute constant 𝐶. Since 𝛽3 ≤ 𝛽5 , we obtain that 𝔰𝑛−1 𝜌(𝐹𝜃 , Φ) ≥ 𝐶 𝛽5 𝑟 𝑛−3/2 ≤ exp − 𝑟 2/3 .
(20.37)
But this inequality also holds for 𝑟 > (𝑛𝛽5−2/3 ) 2 . Indeed, in this case 𝛽5 𝑟 𝑛−3/2 > 𝛽5 𝑛−3/2 𝑛𝛽5−2/3
2
=
𝑛3/2 1/3
> 1.
𝛽5
Hence, choosing 𝐶 > 1 in (20.37), the left 𝔰𝑛−1 -probability in this inequality is vanishing due to the property 𝜌(𝐹𝜃 , Φ) ≤ 1. Thus, the inequality (20.37) holds true for all 𝑟 > 𝑟 0 . For sufficiently large values of 𝑟, one may replace Φ with 𝐺 4 in (20.37). Indeed, from the representation (20.30) it follows that 𝜌(𝐺 4 , Φ) ≤
𝛽5 𝛽4 ≤ . 𝑛 𝑛
√ Here, the last ratio does not exceed 𝛽5 𝑟 𝑛−3/2 for all 𝑟 > 𝑟 0 if and only if 𝑟 0 ≥ 𝑛. By the definition of 𝑟 0 , the latter is equivalent to 𝑛 ≥ 𝛽58/9 (which is fulfilled as long as 𝑛 ≥ 𝛽). Thus, (20.37) holds true with 𝐺 4 in place of Φ for all 𝑟 > 𝑟 0 , and therefore the inequality (20.36) is fulfilled for some constants 𝑐 > 0 and 𝐶 > 0 without any constraint on the range of 𝑟. It remains to rescale the variable 𝑟 in (20.36) to obtain the bound (20.34). Proposition 20.4.1 is now proved. □
406
20 Product Measures
20.7 Lower Bounds Keeping the notations and the basic assumptions in the i.i.d. case, let us now examine the sharpness of the bounds of Propositions 20.5.1 and 20.6.1, E 𝜃 𝜌(𝐹𝜃 , Φ) ≤
𝑐 𝛽4 , 𝑛
E 𝜃 𝜌(𝐹𝜃 , 𝐺 4 ) ≤
𝑐 𝛽5 , 𝑛3/2
where 𝛽4 = E𝑋14 and 𝛽5 = E |𝑋1 | 5 . The second bound is optimal in the sense that it can be reversed in a typical situation, where the 4-th moment of 𝑋1 is different from the 4-th moment of the standard normal law. The same observation applies to the first inequality, when the 3-rd moment 𝛼3 = E𝑋13 is not zero. Denote by G the collection of all functions 𝐺 of bounded variation on the real line such that 𝐺 (−∞) = 0 and 𝐺 (∞) = 1. Proposition 20.7.1 If 𝛼3 ≠ 0, 𝛽4 < ∞, then the inequality inf E 𝜃 𝜌(𝐹𝜃 , 𝐺) ≥
𝐺∈G
𝑐 𝑛
(20.38)
holds for all 𝑛 with a constant 𝑐 > 0 depending on 𝛼3 and 𝛽4 only. Moreover, if 𝛼3 = 0, 𝛽4 ≠ 3, 𝛽5 < ∞, then inf E 𝜃 𝜌(𝐹𝜃 , 𝐺) ≥
𝐺∈G
𝑐 , 𝑛3/2
(20.39)
where the constant 𝑐 > 0 depends on 𝛽4 and 𝛽5 . Proof Our basic tool will be Lemma 14.5.4. According to the lower bound of this lemma, for any 𝑇 > 0, ∫ 𝑇 𝑡 1 E 𝜃 ( 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)) 1 − d𝑡 . (20.40) E 𝜃 𝜌(𝐹𝜃 , 𝐺) ≥ 6𝑇 𝑇 0 As before, here 𝑓 𝜃 (𝑡) and 𝑓 (𝑡) = E 𝜃 𝑓 𝜃 (𝑡) denote the characteristic functions of 𝐹𝜃 and the typical distribution 𝐹 = E 𝜃 𝐹𝜃 . We use 𝑐, 𝑐 ′, 𝑐 ′′ to denote positive absolute constants which may vary from place to place. First, let us apply the representation from Corollary 20.2.2, which holds for all 𝑛−1 of 𝔰 1/2 } in the interval 𝜃 from 𝑛−1 -measure at least 1 − 𝑐𝛽4 exp{−𝑛 √ a set Ω ⊂ S |𝑡| ≤ 𝑛/(𝑐𝛽3 ). Given 0 < 𝑇 ≤ 1, we have ∫
𝑇
𝑡 3 e−𝑡
2 /2
0
1−
𝑡 1 d𝑡 > 𝑇 4
∫
𝑇/2
𝑡 3 d𝑡 =
0
On the other hand, ∫ 0
𝑇
𝑡 4 e−𝑡
2 /8
1−
𝑡 1 d𝑡 < 𝑇 5 . 𝑇 5
1 4 𝑇 . 256
20.7 Lower Bounds
407
Therefore, for all 𝜃 ∈ Ω and 𝑛 ≥ (𝑐𝛽3 ) 2 , ∫ 𝑇 𝑡 ( 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)) 1 − d𝑡 ≥ 𝑐 |𝛼3 | |𝛼3 (𝜃)| 𝑇 4 𝑇 0 𝑐′ 𝛽4𝑇 5 − 𝑐 ′ 𝛽4 exp − 𝑛1/2 𝑇 . − 𝑛
(20.41)
To explore the behavior of the right-hand side on average for the growing parameter 𝑛, note that 𝑐 (20.42) E 𝜃 |𝛼3 (𝜃)| = E 𝜃 |𝜃 13 + · · · + 𝜃 3𝑛 | ≥ . 𝑛 To see this, write a standard normal random vector 𝑍 = (𝑍1 , . . . , 𝑍 𝑛 ) in R𝑛 in the form 𝑍 = 𝑟𝜃, where the random vector 𝜃 is uniformly distributed on S𝑛−1 and 𝑟 = |𝑍 | is independent of 𝜃. This gives E 𝜃 |𝛼3 (𝜃)| E 𝑟 3 = E |𝑆 𝑛 | where 𝑆 𝑛 = 𝑍13 + · · · + 𝑍 𝑛3 . Since E 𝑟 4 = 𝑛(𝑛 + 2) ≤ 3𝑛, we have, by Hölder’s inequality, E 𝑟 3 < 3𝑛3/2 , so, 1 E |𝑆 𝑛 |. 3𝑛3/2 √ By the central limit theorem, √1𝑛 E |𝑆 𝑛 | → 15 as 𝑛 → ∞, and we arrive at (20.42). E 𝜃 |𝛼3 (𝜃)| ≥
Moreover, since |𝛼3 (𝜃)| ≤ 1 for all 𝜃 ∈ S𝑛−1 , we have E 𝜃 |𝛼3 (𝜃)| 1Ω (𝜃) ≥ 𝑛𝑐 for all 𝑛 large enough. One can now integrate the inequality (20.41) over the set Ω, to conclude that ∫ 𝑇 1 𝑡 𝑇 3 E𝜃 ( 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)) 1 − d𝑡 ≥ 𝑐 |𝛼3 | − 𝑐 ′ 𝛽4𝑇 𝑇 𝑇 𝑛 0 − 𝑐 ′′ 𝛽4 exp − 𝑛1/2 for all 𝑛 ≥ 𝑛0 . Choosing an appropriate value of 𝑇 ∼ |𝛼3 |/𝛽4 and applying (20.40), we get a lower bound of the form E 𝜃 𝜌(𝐹𝜃 , 𝐺) ≥ 𝑐
|𝛼3 | 4 − 𝑐 ′′ 𝛽4 exp − 𝑛1/2 3 𝛽4 𝑛
with an arbitrary function 𝐺 of bounded variation such that 𝐺 (−∞) = 0, 𝐺 (∞) = 1. The latter immediately yields the required relation (20.37) for the range 𝑛 ≥ 𝑛0 with constant 𝑐 = 𝑐 0 |𝛼3 | 4 /𝛽43 and for a sufficiently large 𝑛0 depending 𝛼3 and 𝛽4 . In order to treat the remaining values 2 ≤ 𝑛 < 𝑛0 , let us note that the infimum in (20.38) is positive. Indeed, assuming the opposite for a fixed 𝑛, there would exist a 𝐺 ∈ G such that E 𝜃 𝜌(𝐹𝜃 , 𝐺) = 0 and hence 𝐹𝜃 (𝑥) = 𝐺 (𝑥) for all 𝜃 ∈ S𝑛−1 and 𝑥 ∈ R. In particular, all the weighted sums 𝑆 𝜃 would be equidistributed, which is only possible when all the random variables 𝑋 𝑘 have a standard normal distribution, according to the Pólya characterization theorem [158], cf. also [115]. But this contradicts the assumption 𝛼3 ≠ 0.
408
20 Product Measures
The second assertion, where 𝛼3 = 0 and 𝛽4 ≠ 3, is similar. We now apply the representation of Corollary 20.2.3, which holds for all 𝜃√in a set Ω ⊂ S𝑛−1 of measure at least 1 − 𝑐𝛽5 exp{−𝑛2/5 } in the same interval |𝑡| ≤ 𝑛/(𝑐𝛽3 ). Given 0 < 𝑇 ≤ 1, we have ∫ 𝑇/2 ∫ 𝑇 1 5 𝑡 1 4 −𝑡 2 /2 d𝑡 > 𝑡 4 d𝑡 = 𝑇 . 𝑡 e 1− 𝑇 4 0 640 0 On the other hand, ∫
𝑇
𝑡 5 e−𝑡
2 /8
1−
0
𝑡 1 d𝑡 < 𝑇 6 . 𝑇 6
Therefore, for all 𝜃 ∈ Ω and 𝑛 ≥ (𝑐𝛽3 ) 2 , ∫ 𝑇 𝑡 3 5 d𝑡 ≥ 𝑐 𝛽 − 3 (𝜃) ( 𝑓 (𝑡) − 𝑓 (𝑡)) 1 − − 𝑙 𝑇 4 4 𝜃 𝑇 𝑛+2 0 𝑐′ − 3/2 𝛽5 𝑇 6 − 𝑐 ′ 𝛽5 exp − 𝑛2/5 . 𝑛
(20.43)
Using the arguments based on the application of the central limit theorem, we similarly have 𝑐 3 (20.44) E 𝜃 𝑙4 (𝜃) − ≥ 3/2 . 𝑛+2 𝑛 Indeed, the representation 𝑍 = 𝑟𝜃 yields 𝑆𝑛 ≡
𝑛 ∑︁
(𝑍 𝑘4 − 3) = 𝑟 4 𝑙 4 (𝜃) −
𝑘=1
3 3 + (𝑟 4 − 𝑛(𝑛 + 2)), 𝑛+2 𝑛+2
implying that E |𝑆 𝑛 | ≤ 𝑛(𝑛 + 2) E 𝑙4 (𝜃) −
3 3 E |𝑟 4 − 𝑛(𝑛 + 2)|. + 𝑛+2 𝑛+2
To estimate the last expectation, recall that E |𝑍 | 2 𝑝 = 𝑛(𝑛 + 2) . . . (𝑛 + 2𝑝 − 2) for 𝑝 = 1, 2, . . . In particular, E 𝑟 4 = 𝑛(𝑛 + 2) and Var(𝑟 4 ) = E |𝑍 | 8 − (E |𝑍 | 4 ) 2 = 𝑛(𝑛 + 2) (𝑛 + 4) (𝑛 + 6) − 𝑛2 (𝑛 + 2) 2 = 8 𝑛(𝑛 + 2) (𝑛 + 3). Hence √︁ E |𝑟 4 − 𝑛(𝑛 + 2)| ≤ 2 2 𝑛(𝑛 + 2)(𝑛 + 3), so that √
𝑛 (𝑛 + 2) E 𝑙 4 (𝜃) −
√ 1 3 ≥ √ E |𝑆 𝑛 | − 6 2 𝑛+2 𝑛
√︂
𝑛+3 . 𝑛+2
(20.45)
20.8 Remarks
409
√︃ √ Here, by the central limit theorem, √1𝑛 E |𝑆 𝑛 | → Var(𝑍14 ) = 4 6 as 𝑛 → ∞, which √ is greater than 6 2. This shows that the right-hand side of (20.45) is bounded away from zero, and therefore the lower bound (20.44) holds true for all 𝑛 ≥ 2 with some 3 | ≤ 1 on S𝑛−1 , necessarily absolute constant 𝑐 > 0. Moreover, since |𝑙4 (𝜃) − 𝑛+2 E 𝜃 𝑙4 (𝜃) −
3 𝑐 1Ω (𝜃) ≥ 𝑛+2 𝑛
with some absolute constant 𝑐 > 0 for all 𝑛 large enough. One can now integrate the inequality (20.43) over the set Ω, to conclude that ∫ 𝑇 𝑇4 1 𝑡 E𝜃 ( 𝑓 𝜃 (𝑡) − 𝑓 (𝑡)) 1 − d𝑡 ≥ 3/2 𝑐 𝛽4 − 3 − 𝑐 ′ 𝛽5 𝑇 𝑇 𝑇 𝑛 0 − 𝑐 ′′ 𝛽5 exp − 𝑛2/5 . Choosing an appropriate value of 𝑇 ∼ |𝛽4 − 3|/𝛽5 and applying (20.40), we get E 𝜃 𝜌(𝐹𝜃 , 𝐺) ≥ 𝑐
|𝛽4 − 3| 5 − 𝑐 ′′ 𝛽5 exp − 𝑛2/5 . 4 𝛽5 𝑛
The latter yields the required relation (20.39) for the range 𝑛 ≥ 𝑛0 with constant 𝑐 = 𝑐 0 |𝛽4 − 3| 5 /𝛽54 and with a sufficiently large 𝑛0 depending on 𝛽5 . A similar argument based on an application of the Pólya theorem allows us to involve the remaining values 2 ≤ 𝑛 < 𝑛0 . Proposition 20.7.1 is proved. □
20.8 Remarks As mentioned before, Proposition 20.5.1 is due to Klartag and Sodin [125]. There it has also been shown that, in the i.i.d. case, a 𝑛1 -bound such as 𝜌(𝐹𝜃 , Φ) ≤
𝑐 𝑛
holds for some universal collections of coefficients. For example, when 𝑛 is divisible by 4, one may take √︂ √ √ √ √ 2 (1, 2, −1, − 2, 1, 2, −1, − 2, . . . ) 𝜃= 3𝑛 The rate of convergence is closely related to Diophantine approximations, cf. e.g. [34]. Most of the other material of the chapter is contained in [35].
Chapter 21
Coefficients of Special Type
There are a number of results on distribution functions 𝐹𝜃 of the weighted sums ⟨𝑋, 𝜃⟩ = 𝜃 1 𝑋1 + · · · + 𝜃 𝑛 𝑋𝑛 for coefficients 𝜃 𝑘 which have a special structure. In this chapter we briefly discuss some of these results while skipping proofs.
21.1 Bernoulli Coefficients When all 𝜃 𝑘 = ± √1𝑛 , we are dealing with distribution functions 𝐹𝜀 (𝑥) = P{𝑆 𝜀 ≤ 𝑥} of the randomized sums 1 𝑆 𝜀 = √ (𝜀1 𝑋1 + · · · + 𝜀 𝑛 𝑋𝑛 ), 𝑛
𝜀 = (𝜀1 , . . . , 𝜀 𝑛 ) ∈ {−1, 1} 𝑛 .
In this scheme, the uniform measure 𝔰𝑛−1 on the unit sphere is replaced with the normalized counting measure 𝜇 𝑛 on the discrete cube Ω𝑛 = {−1, 1} 𝑛 , and then the typical distribution is understood as the average ∫ 1 ∑︁ 𝐹 (𝑥) = 𝐹𝜀 (𝑥) d𝜇 𝑛 (𝜀) = 𝑛 𝐹𝜀 (𝑥), 𝑥 ∈ R. 2 𝜀 Ω𝑛 The following observation was made in [26]. Proposition 21.1.1 Given an isotropic random vector 𝑋 in R𝑛 , for any 𝑟 > 0, 𝜇 𝑛 𝐿 (𝐹𝜀 , 𝐹) ≥ 𝑟 ≤ 𝐶𝑛1/4 exp − 𝑐𝑛𝑟 8 (21.1) with some absolute constants 𝐶 > 0 and 𝑐 > 0.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9_21
411
412
21 Coefficients of Special Type
This statement may be viewed as a discrete analog of Propositions 14.4.1–14.4.2 for general spherical coefficients. It implies that 𝐿(𝐹𝜀 , 𝐹) ≤ 𝑐
log 𝑛 1/8 𝑛
for most ±1 sequences 𝜀. The property that the right-hand side of (21.1) has an exponential decay with respect to 𝑛 allows one to study the asymptotic normality of the weighted sums along arrays of random variables {𝑋𝑛,𝑖 }, 1 ≤ 𝑖 ≤ 𝑛. Let 𝜇 denote the uniform (infinite product Bernoulli) measure on {−1, 1}∞ , and assume that 𝑎) E𝑋𝑛,𝑖 𝑋𝑛, 𝑗 = 𝛿𝑖, 𝑗 for all 𝑛 ≥ 1 (isotropy); 𝑏)
√1 𝑛
𝑐)
1 𝑛
max𝑖 ≤𝑛 |𝑋𝑛,𝑖 | → 0 in probability as 𝑛 → ∞; Í𝑛 2 𝑖=1 𝑋𝑛,𝑖 → 1 in probability as 𝑛 → ∞ (law of large numbers for squares).
Corollary 21.1.2 Under the assumptions 𝑎)–𝑐), for 𝜇-almost all ±1-sequences (𝜀1 , 𝜀2 , . . . ), we have that, as 𝑛 → ∞, 𝑛 n 1 ∑︁ o 𝜀𝑖 𝑋𝑛,𝑖 ≤ 𝑥 → Φ(𝑥), P √ 𝑛 𝑖=1
𝑥 ∈ R,
We refer to [26] for details. Let us stress that this variant of the central limit theorem does not hold any longer for prescribed coefficients, say for 𝜀𝑖 = 1, even if we assume that all {𝑋𝑛,𝑖 }1≤𝑖 ≤𝑛 are uniformly bounded, symmetrically distributed and pairwise independent. For example, for the values 𝑛 = 𝑑 (𝑑 − 1)/2, 𝑑 = 1, 2, . . . , consider the family of random variables 𝑋𝑛, (𝑘, 𝑗) = 𝜉 𝑘 𝜉 𝑗 ,
1 ≤ 𝑘 < 𝑗 ≤ 𝑑,
where 𝜉1 , . . . , 𝜉 𝑑 are independent and have a symmetric Bernoulli distribution on {−1, 1}. In this case, the conditions 𝑏)–𝑐) are fulfilled automatically, since |𝑋𝑛, (𝑘, 𝑗) | = 1, while the isotropy condition 𝑎) follows from the pairwise independence of 𝑋𝑛, (𝑘, 𝑗) . On the other hand, by the central limit theorem for the sums of 𝜉 𝑗 ’s, weakly in distribution 1 √ 𝑛
𝑑 𝑑 𝑍2 − 1 1 ∑︁ 2 𝜉𝑘 − √ ⇒ √ 𝑋𝑛, (𝑘, 𝑗) = √ 2 𝑛 𝑘=1 2 𝑛 2 1≤𝑘< 𝑗 ≤𝑑
∑︁
(𝑑 → ∞),
where 𝑍 ∼ 𝑁 (0, 1). Thus, the limit law for the normalized sums exists, but it is not normal.
21.2 Random Sums
413
21.2 Random Sums Similar concentration-type observations can be made for the distribution functions 𝐹𝜏 of the sums 1 𝑆 𝜏 = √ (𝑋𝑖1 + · · · + 𝑋𝑖𝑘 ), 𝑘 where 𝜏 = {𝑖1 , . . . , 𝑖 𝑘 }, 1 ≤ 𝑖 1 < · · · < 𝑖 𝑘 ≤ 𝑛. In this case, the average distribution function is defined to be ∫ 1 ∑︁ 𝐹𝜏 (𝑥), 𝑥 ∈ R, 𝐹 (𝑥) = 𝐹𝜏 (𝑥) d𝜇 𝑛,𝑘 (𝜏) = 𝑘 (21.2) 𝐶𝑛 𝜏 G𝑛,𝑘 where 𝜇 𝑛,𝑘 (𝑘 = 1, . . . , 𝑛) denotes the normalized counting measure on the collection G𝑛,𝑘 of all subsets 𝜏 ⊂ {1, . . . , 𝑛} of cardinality |𝜏| = 𝑘. Note that |G𝑛,𝑘 | has 𝑛! . cardinality 𝐶𝑛𝑘 = 𝑘!(𝑛−𝑘)! Proposition 21.2.1 Given an isotropic random vector 𝑋 in R𝑛 , for any 𝑟 > 0, 𝜇 𝑛,𝑘 𝐿 (𝐹𝜏 , 𝐹) ≥ 𝑟 ≤ 𝐶 𝑘 3/4 exp − 𝑐𝑘𝑟 8 with some absolute constants 𝐶 > 0 and 𝑐 > 0. Here, the concentration property is telling us that the distribution of the resulting sum 𝑆 𝜏 does not essentially depend on the concrete times 𝑖 1 , . . . , 𝑖 𝑘 when the observations 𝑋 𝑗 are made, see [28]. The question of whether or not the average distribution 𝐹 in (21.2) is close to the normal distribution or, more flexibly, to an appropriate Gaussian mixture, is connected with certain elementary polynomials in 𝑛 complex variables of degree 1 ≤ 𝑘 ≤ 𝑛, namely 𝑃𝑛,𝑘 (𝑧) =
1 𝐶𝑛𝑘
∑︁
𝑧 𝑖1 . . . 𝑧𝑖𝑘 ,
𝑧 = (𝑧1 , . . . , 𝑧 𝑛 ) ∈ C𝑛 .
𝑖1 0, 𝑛 n √ o ∑︁ √ P |𝑋𝑖 | ≥ 𝜀 𝑛 P max |𝑋𝑖 | ≥ 𝜀 𝑛 ≤ 1≤𝑖 ≤𝑛
𝑖=1 𝑛 ∑︁ E 𝑋𝑖2 log(1 + |𝑋𝑖 |) 𝐶 ≤ √ ≤ 2 √ → 0 2 𝑛 log(1 + 𝜀 𝑛) 𝜀 𝜀 log(1 + 𝜀 𝑛) 𝑖=1
as 𝑛 → ∞. Recall that 𝑁 (0, 𝜌 2 ) denotes the law of the random variable of 𝜌𝑍, assuming that 𝜌 is independent of 𝑍 ∼ 𝑁 (0, 1). We have the following statement proved in [45].
21.3 Existence of Infinite Subsequences of Indexes
415
Proposition 21.3.1 Under the conditions 𝑎)–𝑐), weakly in distribution 𝑆 𝑛 ⇒ 𝑁 (0, 𝜌 2 )
(𝑛 → ∞)
(21.3)
for at least one sequence 𝑖 𝑛 . Moreover, for any prescribed sequence 𝑗 𝑛 such that 𝑗 𝑛 /𝑛 → ∞, we may have 𝑖𝑛 ≤ 𝑗𝑛
for all 𝑛 large enough.
(21.4)
The first observations of this kind with a purely normal limit are apparently due to Kac in his 1937 paper [111], and somewhat later there is a similar result by Fortet [90], cf. also [112]. On the unit interval Ω = (0, 1) equipped with Lebesgue measure P, they considered subsystems of 𝑋𝑛 (𝜔) = 𝑓 (𝑛𝜔), where 𝑓 is a fixed 1-periodic function on the real line. The existence of a sequence satisfying (21.3) without the tightness property (21.4) for 𝑖 𝑛 is actually known to hold in more general situations and for several schemes of weighted sums. In particular, for sequences of real numbers 𝑎 = (𝑎 𝑛 ) such that 𝐴𝑛 = (𝑎 12 + · · · + 𝑎 2𝑛 ) 1/2 → ∞,
𝑎 𝑛 = 𝑜( 𝐴𝑛 )
(which we call an admissible sequence), one may consider the weighted sums 𝑆 𝑛 (𝑎) =
𝑛 1 ∑︁ 𝑎 𝑘 𝑋𝑖𝑘 . 𝐴𝑛 𝑘=1
In 1955 Morgenthaler [147] considered an arbitrary uniformly bounded orthonormal system 𝑋𝑛 on the unit interval ((0, 1), P) (thus, satisfying 𝑎)–𝑐)). He proved that there exists an increasing sequence 𝑖 𝑛 and a measurable function 𝜌 ≥ 0 on Ω with ∥ 𝜌∥ 2 = 1 and ∥ 𝜌∥ ∞ ≤ sup𝑛 ∥ 𝑋𝑛 ∥ ∞ such that (21.3) holds true for any admissible 𝑎 = (𝑎 𝑛 ). Moreover, this convergence is stable in the sense that (21.3) holds on every measurable set 𝐵 ⊂ Ω of positive measure with respect to the normalized restriction of P to 𝐵. More general statements including necessary and sufficient conditions were obtained in the mid 60s – early 70s by Gaposhkin in a series of papers, and here we mention two results. Let (𝑋𝑛 ) 𝑛≥1 and 𝜌 ≥ 0 be random variables such that E𝑋𝑛2 = E𝜌 2 = 1. Proposition 21.3.2 The following properties are equivalent: 1) There exists an increasing sequence 𝑖 𝑛 such that the central limit theorem (21.3) holds true for any admissible 𝑎 = (𝑎 𝑛 ); 2) There exists an increasing sequence 𝑖 𝑛 such that 𝑋𝑖𝑛 → 0 weakly in 𝐿 2 and 𝑋𝑖2𝑛 → 𝜌02 weakly in 𝐿 1 as 𝑛 → ∞ for some 𝜌0 ≥ 0 on Ω equidistributed with 𝜌. We refer to [91], Theorem 5, and [94], Theorem 6.
416
21 Coefficients of Special Type
Here, the case where 𝜌 = 1 when the limit distribution is standard normal is of special interest. The existence of a random variable 𝜌 with ∥ 𝜌∥ 2 = 1 such that (21.3) holds for some increasing sequence 𝑖 𝑛 is equivalent to the weak convergence 𝑋𝑖𝑛 → 0 together with the uniform integrability of the sequence 𝑋𝑖2𝑛 (cf. [91], Theorem 8, or [92], Theorem 1.5.3). Proposition 21.3.3 If 𝑋𝑛 → 0 weakly in 𝐿 2 , then there exists an increasing sequence 𝑖 𝑛 such that (21.3) holds true for any admissible 𝑎 = (𝑎 𝑛 ) and some random variable 𝜌 ≥ 0. We refer to [92], Theorem 1.5.2, and [94], Theorem 5. In [94], Gaposhkin introduced an “equivalence lemma” which made it possible to reduce many problems on subsequences of 𝑋𝑛 to martingale differences (such as convergence of series, the central limit theorem, the law of the iterated logarithm) and eventually to extend the corresponding statements from Lebesgue measure and the space Ω = (0, 1) to arbitrary probability spaces. The above Proposition 21.3.3 was rediscovered by Chatterji [75] with a similar martingale approach. Chatterji introduced an informal statement known as the principle of subsequences; it states that any limit theorem about independent, identically distributed random variables continues to hold under proper moment assumptions for a certain subsequence of a given sequence of random variables. This general observation was made precise and developed by Aldous [3], and later by Berkes and Péter [13].
21.4 Selection of Indexes from Integer Intervals One should however note that not much is known about the speed of increase of a subsequence chosen to satisfy a central limit theorem like in Propositions 21.3.1 and 21.3.2. For example, sharpening results of Salem and Zygmund [164], [165] on lacunary trigonometric subsequences, Erdös [85] proved that any sequence 𝑋𝑖𝑛 (𝜔) = cos(2𝜋𝑖 𝑛 𝜔), with
𝑖𝑛+1 𝑖𝑛
≥ 1+
𝑐𝑛 √ , 𝑛
𝜔 ∈ (0, 1),
𝑐 𝑛 → ∞, satisfies 1 𝑆 𝑛 = √ (𝑋𝑖1 + · · · + 𝑋𝑖𝑛 ) ⇒ 𝑁 (0, 1/2) 𝑛
(21.5)
with respect to the uniform measure P on (0, 1); √ cf. also [14]. Note that the indexes 𝑖 𝑛 in Erdös’ theorem must grow faster than e 𝑛 . Using a randomization of indexes in a trigonometric orthonormal system, Berkes [15] showed that a sequence 𝑖 𝑛 can actually be chosen to satisfy 𝑖 𝑛+1 − 𝑖 𝑛 = 𝑂 ( 𝑗 𝑛 ) for any prescribed 𝑗 𝑛 → ∞. Proposition 21.3.1 above describes a similar property, with possible applications to other orthonormal systems.
21.4 Selection of Indexes from Integer Intervals
417
To describe Berkes’ construction, suppose that the set of all natural numbers is partitioned into non-empty consecutive intervals Δ 𝑘 , 𝑘 ≥ 1, of respective lengths (cardinalities) |Δ 𝑘 | → ∞. Berkes [14] proved that, if we select each 𝑖 𝑘 from Δ 𝑘 independently and at random according to the discrete uniform distribution on that interval, then for almost all choices of indexes, (21.5) still holds true. Hence, the gaps 𝑖 𝑛+1 − 𝑖 𝑛 may grow as slow as we wish. Moreover, as was shown in [46], the Berkes construction is also applicable to general systems of random variables {𝑋𝑛 } satisfying the following properties: 𝑎) E 𝑋𝑖 𝑋 𝑗 = 𝛿𝑖, 𝑗 for all 𝑖, 𝑗;
√ 𝑏) max1≤𝑘 ≤𝑛 max𝑖 ∈Δ𝑘 |𝑋𝑖 | = 𝑜( 𝑛) in probability as 𝑛 → ∞; Í𝑛 𝑋𝑖2 ⇒ 𝜌 2 weakly in distribution as 𝑛 → ∞, for some random 𝜌 ≥ 0. 𝑐) 𝑛1 𝑖=1 Proposition 21.4.1 Under the conditions 𝑎)–𝑐), for almost all indexes {𝑖 𝑘 } 𝑘 ≥1 selected independently and uniformly from Δ 𝑘 , weakly in distribution 𝑆 𝑛 ⇒ 𝑁 (0, 𝜌 2 )
(𝑛 → ∞).
In fact, here the condition 𝑎) may be weakened to 𝑀2 = sup sup E |𝜃 1 𝑋1 + · · · + 𝜃 𝑛 𝑋𝑛 | 2 < ∞, 𝑛≥1 𝜃 ∈S𝑛−1
which means that the spectral norms of the correlation operators of the random vectors (𝑋1 , . . . , 𝑋𝑛 ) remain bounded for growing 𝑛. This statement is based on a strong concentration property of the distribution functions 𝐹𝜏 of 𝑆 𝑛 = 𝑆 𝑛 (𝜏) with respect to finite selections 𝜏 = (𝑖1 , . . . , 𝑖 𝑛 ) from the space Δ1 × · · · × Δ𝑛 , which is equipped with the product 𝜇 𝑛 = 𝜆1 ⊗ · · · ⊗ 𝜆 𝑛 of the uniform measures 𝜆 𝑘 on Δ 𝑘 . The conditions 𝑏)–𝑐) are not needed, while 𝑎) is replaced with 𝑀2 < ∞. Proposition 21.4.2 For any 𝑟 > 0, 𝜇 𝑛 𝜏 : 𝐿(𝐹𝜏 , 𝐹) ≥ 𝑟 ≤ 𝐶e−𝑐𝑛 , ∫ where 𝐹 = 𝐹𝜏 d𝜇 𝑛 (𝜏) is the average distribution, and where 𝐶, 𝑐 denote positive constants depending on 𝑀2 and 𝑟, only. Returning to the cosine trigonometric system, Berkes raised the natural question of whether or not it is possible to find 𝑖 𝑛 with bounded gaps 𝑖 𝑛+1 − 𝑖 𝑛 , still satisfying (21.5). As was also shown in [46], the answer is negative: It turns out that a nontrivial limit 𝑆 𝑛 ⇒ 𝜉 is possible in the case of bounded gaps, but then necessarily E𝜉 2 < 12 (so, part of the second moment must be vanishing in the limit distribution). To see what may happen in the typical situation, the next observation from [46] gives a hint.
418
21 Coefficients of Special Type
Proposition 21.4.3 In the case of the cosine trigonometric system, for almost all indexes {𝑖 𝑘 } 𝑘 ≥1 selected independently and uniformly from the two point integer sets Δ 𝑘 = {2𝑘 − 1, 2𝑘 }, weakly in distribution we have √ 2 𝑆 𝑛 ⇒ 𝑁 (0, 𝜌 2 ) (𝑛 → ∞), where 𝜌 is distributed according to the arcsine law. More precisely, 𝜌 has the distribution function 𝐹 (𝑥) = 𝜋2 arccos(𝑥), 0 < 𝑥 < 1. √ Hence E𝜌 2 = 12 , while E ( 2 𝑆 𝑛 ) 2 = 1. Other typical distributions may appear in the limit for intervals Δ 𝑘 of a larger length.
References
1. Aida, S.; Masuda, T.; Shigekawa, I. Logarithmic Sobolev inequalities and exponential integrability. J. Func. Anal. 126, 83–101 (1994). 2. Aida, S.; Stroock, D. Moment estimates derived from Poincaré and logarithmic Sobolev inequalities. Math. Res. Lett. 1, no. 1, 75–86 (1994). 3. Aldous, D. J. Limit theorems for subsequences of arbitrarily-dependent sequences of random variables. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 40, no. 1, 59–82 (1977). 4. Antilla, M.; Ball, K.; Perissinaki, I. The central limit problem for convex bodies. Trans. Amer. Math. Soc. 355, no. 12, 4723–4735 (2003). 5. Alonso-Gutiérrez, D.; Bastero, J. Approaching the Kannan–Lovász-Simonovits and Variance Conjectures. Lecture Notes in Mathematics, 2131. Springer, Cham, (2015). x+148 pp. 6. Artstein-Avidan, S.; Giannopoulos, A.; Milman, V. D. Asymptotic Geometric Analysis. Part I. Mathematical Surveys and Monographs, 202. American Mathematical Society, Providence, RI (2015). xx+451 pp. 7. Bakry, D.; Emery, M. Diffusions hypercontractives. In: Sém. Probab., XIX, Lecture Notes in Math. 1123, pp. 177–206, Springer-Verlag, New York, Berlin (1985). 8. Bakry, D.; Gentil, I.; Ledoux, M. Analysis and Geometry of Markov Diffusion Operators. Springer, Berlin Heidelberg New York (2014). 9. Bakry, D.; Ledoux, M. Lévy–Gromov’s isoperimetric inequality for an infinite-dimensional diffusion generator. Invent. Math. 123, no. 2, 259–281 (1996). 10. Ball, K. Logarithmically concave functions and sections of convex sets in 𝑅 𝑛 . Studia Math. 88, no. 1, 69–84 (1988). 11. Bateman, H. Higher Transcendental Functions. Vol. II, McGraw-Hill Book Company, Inc., 1953, 396 pp. 12. Bentkus, V.; Götze, F. Optimal rates of convergence in the CLT for quadratic forms. Ann. Probab. 24, no. 1, 466–490 (1996). 13. Berkes, I.; Péter, E. Exchangeable random variables and the subsequence principle. Probab. Theory Related Fields 73, no. 3, 395–413(1986). 14. Berkes, I. On the central limit theorem for lacunary trigonometric series. Anal. Math. 4, no. 3, 159–180 (1978). 15. Berkes, I. A central limit theorem for trigonometric series with small gaps. Z. Wahrsch. Verw. Gebiete 47, no. 2, 157–161 (1979). 16. Bhattacharya, R. N.; Ranga Rao, R. Normal Approximation and Asymptotic Expansions. John Wiley & Sons, Inc. (1976). Also: Soc. for Industrial and Appl. Math., Philadelphia (2010). 17. Bikjalis, A. Remainder terms in asymptotic expansions for characteristic functions and their derivatives. (Russian) Litovsk. Mat. Sb. 7 (1967) 571–582 (1968). Selected Transl. in Math. Statistics and Probability 11, 149–162 (1973). © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 S. Bobkov et al., Concentration and Gaussian Approximation for Randomized Sums, Probability Theory and Stochastic Modelling 104, https://doi.org/10.1007/978-3-031-31149-9
419
420
References
18. Bobkov, S. G. Extremal properties of half-spaces for log-concave distributions. Ann. Probab. 24, no. 1, 35–48 (1996). 19. Bobkov, S. G. An isoperimetric inequality on the discrete cube, and an elementary proof of the isoperimetric inequality in Gauss space. Ann. Probab. 25, no. 1, 206–214 (1997). 20. Bobkov, S. G. Isoperimetric and analytic inequalities for log-concave probability measures. Ann. Probab. 27, no. 4, 1903–1921 (1999). 21. Bobkov, S. G. Remarks on the Gromov–Milman inequality. (Russian) Vestn. Syktyvkar. Univ., Ser. 1 Mat. Mekh. Inform. 3, 15–22 (1999). 22. Bobkov, S. G. Remarks on the growth of 𝐿 𝑝 -norms of polynomials. In: Geometric Aspects of Functional Analysis, pp. 27–35, Lecture Notes in Math., 1745, Springer, Berlin (2000). 23. Bobkov, S. G. Some generalizations of Prokhorov’s results on Khinchin-type inequalities for polynomials. (Russian) Teor. Veroyatnost. i Primenen. 45, no. 4, 745–748 (2000). Translation in: Theory Probab. Appl. 45, no. 4, 644–647 (2002). 24. Bobkov, S. G. A localized proof of the isoperimetric Bakry–Ledoux inequality and some applications. (Russian) Teor. Veroyatnost. i Primenen. 47, no. 2, 340–346 (2002). Translation in: Theory Probab. Appl. 47, no. 2, 308–314 (2003). 25. Bobkov, S. G. On concentration of distributions of random weighted sums. Ann. Probab. 31, no. 1, 195–215 (2003). 26. Bobkov, S. G. Concentration of distributions of the weighted sums with Bernoullian coefficients. In: Geometric Aspects of Functional Analysis, Lecture Notes in Math. 1807, pp. 27–36, Springer, Berlin (2003). 27. Bobkov, S. G. Spectral gap and concentration for some spherically symmetric probability measures. In: Geometric Aspects of Functional Analysis, Lecture Notes in Math. 1807, pp. 37–43, Springer, Berlin, 2003. 28. Bobkov, S. G. Concentration of normalized sums and a central limit theorem for noncorrelated random variables. Ann. Probab. 32, no. 4, 2884–2907 (2004). 29. Bobkov, S. G. Large deviations and isoperimetry over convex probability measures with heavy tails. Electr. J. Probab. 12, 1072–1100 (2007). 30. Bobkov, S. G. On isoperimetric constants for log-concave probability distributions. In: Geometric Aspects of Functional Analysis, Lecture Notes in Math. 1910, pp. 81–88, Springer, Berlin (2007). 31. Bobkov, S. G. On a theorem of V. N. Sudakov on typical distributions. (Russian) Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 368 (2009), Veroyatnost i Statistika. 15, 59–74. Translation in: J. Math. Sciences (N. Y.), 167, no. 4, 464–473 (2010). 32. Bobkov, S. G. Closeness of probability distributions in terms of Fourier–Stieltjes transforms. (Russian) Uspekhi. Matemat. Nauk, vol. 71, issue 6(432), 37–98 (2016). Translation in: Russian Math. Surveys 71:6, 1021–1079 (2016). 33. Bobkov, S. G. Asymptotic expansions for products of characteristic functions under moment assumptions of non-integer orders. In: Convexity and Concentration, pp. 297–357, IMA Vol. Math. Appl., 161, Springer, New York (2017). 34. Bobkov, S. G. Central limit theorem and Diophantine approximations. J. Theoret. Probab. 31, no. 4, 2390–2411 (2018). 35. Bobkov, S. G. Edgeworth corrections in randomized central limit theorems. In: Geometric Aspects of Functional Analysis, Lecture Notes in Math. 2256, 71–97 (2020). 36. Bobkov, S. G.; Chistyakov G. P.; Götze, F. Second order concentration on the sphere. Communications in Contemp. Math. 19, no. 5, 1650058 (2017). 20 pp. 37. Bobkov, S. G.; Chistyakov G. P.; Götze, F. Gaussian mixtures and normal approximation for V. N. 
Sudakov’s typical distributions. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 457, Veroyatnost i Statistika. 25, 37–52 (2017). Also in: J. Math. Sciences 238, no. 4, 366–376 (2019). 38. Bobkov, S. G.; Chistyakov G. P.; Götze, F. Berry–Esseen bounds for typical weighted sums. Electronic J. of Probability 23, no. 92, 1–22 (2018).
References
421
39. Bobkov, S. G.; Chistyakov G. P.; Götze, F. Normal approximation for weighted sums under a second order correlation condition. Ann. Probab. 48, no. 3, 1202–1219 (2020). 40. Bobkov, S. G.; Chistyakov G. P.; Götze, F. Poincaré inequalities and normal approximation for weighted sums. Electron. J. of Probab. 25 (2020), Paper no. 155 (2020). 31 pp. 41. Bobkov, S. G.; Chistyakov G. P.; Götze, F. Asymptotic expansions and two-sided bounds in randomized central limit theorems. Preprint (2020). 42. Bobkov, S. G.; Cordero-Erausquin, D. K-L-S-type isoperimetric bounds for log-concave probability measures. Ann. Mat. Pura Appl. (4) 195, no. 3, 681–695 (2016). 43. Bobkov, S. G.; Gentil, I.; Ledoux, M. Hypercontractivity of Hamilton–Jacobi equations. J. Math. Pures Appl. 80 7, 669–696 (2001). 44. Bobkov, S. G.; Götze, F. Exponential integrability and transportation cost related to logarithmic Sobolev inequalities. J. Func. Anal. 1, 1–28 (1999). 45. Bobkov, S. G.; Götze, F. On the central limit theorem along subsequences of noncorrelated observations. Teor. Veroyatnost. i Primenen. 48, no. 4, 745–765 (2003). Translation in: Theory Probab. Appl. 48, no. 4, 604–621 (2004). 46. Bobkov, S. G.; Götze, F. Concentration inequalities and limit theorems for randomized sums. Probab. Theory Related Fields 137, no. 1–2, 49–81 (2007). 47. Bobkov, S. G.; Götze, F. Hardy-type inequalities via Riccati and Sturm–Liouville equations. In: Sobolev Spaces in Mathematics, Sobolev Type Inequalities, Vol. I, Intern. Math. Series, Vol. 8, ed. by V. Maz’ya, Springer (2008). 48. Bobkov, S. G.; Götze, F. Concentration of empirical distribution functions with applications to non-i.i.d. models. Bernoulli 16, no. 4, 1385–1414 (2010). 49. Bobkov, S. G.; Houdré, C. Some Connections Between Isoperimetric and Sobolev-type Inequalities. Mem. Amer. Math. Soc. 129, no. 616 (1997). viii+111 pp. 50. Bobkov, S. G.; Houdré, C. Isoperimetric constants for product probability measures. Ann. Probab. 25, no. 1, 184–205 (1997). 51. Bobkov, S. G.; Koldobsky, A. On the central limit property of convex bodies. In: Geometric Aspects of Functional Analysis, Lecture Notes in Math. 1807, pp. 44–52, Springer, Berlin (2003). 52. Bobkov, S. G.; Ledoux, M. Poincaré’s inequalities and Talagrand’s concentration phenomenon for the exponential distribution. Probab. Theory Rel. Fields 107, 383–400 (1997). 53. Bobkov, S. G.; Ledoux, M. From Brunn–Minkowski to Brascamp–Lieb and to logarithmic Sobolev inequalities. Geom. Funct. Anal. 10, no. 5, 1028—1052 (2000). 54. Bobkov, S. G.; Ledoux, M. From Brunn–Minkowski to sharp Sobolev inequalities. Ann. Mat. Pura Appl. (4) 187, no. 3, 369–384 (2008). 55. Bobkov, S. G.; Ledoux, M. Weighted Poincaré-type inequalities for Cauchy and other convex measures. Ann. Probab. 37, 403–427 (2009). 56. Bobkov, S. G.; Ledoux, M. On weighted isoperimetric and Poincaré-type inequalities. In: High Dimensional Probability V: The Luminy Volume. Vol. 5, pp. 1–29, IMS Collections (2009). 57. Bobkov, S. G.; Ledoux, M. One-dimensional Empirical Measures, Order Statistics and Kantorovich Transport Distances. Memoirs of the AMS, 261, no. 1259 (2019). 126 pp. 58. Bobkov, S. G.; Nazarov, F. L. On convex bodies and log-concave probability measures with unconditional basis. In: Geometric Aspects of Functional Analysis, pp. 53–69, Lecture Notes in Math. 1807, Springer, Berlin, 2003. 59. Bogachev, V. I. Measure theory. Vol. I, II. Springer-Verlag, Berlin (2007). Vol. I: xviii+500 pp., Vol. II: xiv+575 pp. 60. Bogachev, V. I.; Kolesnikov, A. V. 
The Monge–Kantorovich problem: achievements, connections, and prospects. (Russian) Uspekhi Mat. Nauk 67, no. 5 (407), 3–110 (2012); translation in: Russian Math. Surveys 67, no. 5, 785–890 (2012). 61. Bonnefont, M.; Joulin, A.; Ma, Y. Spectral gap for spherically symmetric log-concave probability measures, and beyond. J. Funct. Anal. 270, 2456–2482 (2016).
422 62. 63. 64. 65. 66. 67.
68.
69.
70. 71. 72. 73.
74. 75. 76.
77. 78.
79. 80. 81.
82. 83.
84. 85.
References Borell, C. Complements of Lyapunov’s inequality. Math. Ann. 205, 323–331 (1973). Borell, C. Convex measures on locally convex spaces. Ark. Math. 12, 239–252 (1974). Borell, C. Convex set functions in 𝑑-space. Period. Math. Hungar. 6, no. 2, 111–136 (1975). Borell, C. The Brunn–Minkowski inequality in Gauss space. Invent. Math. 30 (1975), no. 2, 207–216. Borovkov A. A.; Utev S. A. On an inequality and a characterization of the normal distribution connected with it. Probab. Theory Appl. 28, 209–218 (1983). Bourgain, J. On the distribution of polynomials on high dimensional convex bodies. In: Geometric Aspects of Functional Analysis, pp. 127–137, Lecture Notes in Math., 1469, Springer, Berlin (1991). Brascamp, H. J.; Lieb, E. H. On extensions of the Brunn–Minkowski and Prékopa–Leindler theorems, including inequalities for log-concave functions, and with an application to the diffusion equation. J. Funct. Anal. 22, 366–389 (1976). Brazitikos, S.; Giannopoulos, A.; Valettas, P.; Vritsiou, B-H. Geometry of Isotropic Convex Bodies. Mathematical Surveys and Monographs, 196. American Mathematical Society, Providence, RI (2014). xx+594 pp. Brehm, U.; Hinow, P.; Vogt, H.; Voigt, J. Moment inequalities and central limit properties of isotropic convex bodies. Math. Z. 240, no. 1, 37–51 (2002). Brehm, U.; Voigt, J. Asymptotics of cross sections for convex bodies. Beiträge Algebra Geom. 41, no. 2, 437–454 (2000). Bulinskii, A. V. Estimates for mixed semi-invariants and higher covariances of bounded random variables. (Russian) Teor. Verojatnost. i Premenen. 19, 869–873 (1974). Burago, Yu. D.; Zalgaller, V. A. Geometric Inequalities. Translated from the Russian by A. B. Sosinskii. Fundamental Principles of Mathematical Sciences, 285. Springer Series in Soviet Mathematics. Springer-Verlag, Berlin (1988). xiv+331 pp. Caffarelli, L. A. Monotonicity properties of optimal transportation and the FKG and related inequalities. Comm. Math. Phys. 214, no. 3, 547–563 (2000). Chatterji, S. D. A principle of subsequences in probability theory: the central limit theorem. Advances in Math. 13, 31–54 (1974); correction, ibid. 14, 266–269 (1974). Cheeger, J. A lower bound for the smallest eigenvalue of the Laplacian. In: Problems in Analysis (Papers Dedicated to Salomon Bochner, 1969), pp. 195–199. Princeton Univ. Press, Princeton, N. J. (1970). Chen, Y. An almost constant lower bound of the isoperimetric coefficient in the KLS conjecture. Geom. Funct. Anal. 31, no. 1, 34–61 (2021). Davidovich, Ju. S.; Korenbljum, B. I.; Khacet, B. I. A certain property of logarithmically concave functions. (Russian) Dokl. Akad. Nauk SSSR 185, 1215–1218 (1969). English translation in: Soviet Math. Dokl. 10, 477–480 (1969). Diaconis, P.; Freedman, D. Asymptotics of graphical projection pursuit. Ann. Stat. 12, no. 3, 793–815 (1984). Dudley, R. M. Real Analysis and Probability. The Wadsworth & Brooks/Cole Mathematics Series. Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA (1989). Dvoretzky, A.; Kiefer, J.; Wolfowitz, J. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Statist. 27, 642– 669(1956). Ehrhard, A. Isoperimetric inequalities and Gaussian Dirichlet integrals. (French) Ann. Sci. École Norm. Sup. (4) 17, no. 2, 317–332 (1984). Eldan, R.; Klartag, B. Approximately Gaussian marginals and the hyperplane conjecture. In: Concentration, Functional Inequalities and Isoperimetry, pp. 55–68, Contemp. Math. 545, Amer. Math. 
Soc., Providence, RI (2011). Eldan, R. Thin shell implies spectral gap up to polylog via a stochastic localization scheme. Geom. Funct. Anal. 23, no. 2, 532–569 (2013). Erdös, P. On trigonometric sums with gaps. Magyar Tud. Akad. Mat. Kutató Int. Közl 7 37–42 (1962).
References
423
86. Esseen, C.-G. Fourier analysis of distribution functions. A mathematical study of the Laplace–Gaussian law. Acta Math. 77, 1–125 (1945).
87. Evans, L. C. Partial Differential Equations. Graduate Studies in Math., vol. 19, Amer. Math. Soc., Providence, RI (1998). xviii+662 pp.
88. Fainleib, A. S. A generalization of Esseen's inequality and its application in probabilistic number theory. (Russian) Izv. Akad. Nauk SSSR Ser. Mat. 32, 859–879 (1968).
89. Figiel, T.; Lindenstrauss, J.; Milman, V. D. The dimension of almost spherical sections of convex bodies. Acta Math. 139, no. 1–2, 53–94 (1977).
90. Fortet, R. Sur une suite également répartie. (French) Studia Math. 9, 54–70 (1940).
91. Gaposhkin, V. F. Sequences of functions and the central limit theorem. (Russian) Mat. Sbornik (N.S.) 70 (112), 145–171 (1966); translation in Select. Transl. Math. Statist. Probab. 9, 89–118 (1971).
92. Gaposhkin, V. F. Lacunary series and independent functions. (Russian) Uspekhi Mat. Nauk 21, no. 6 (132), 3–82 (1966); translation in Russian Math. Surveys 21, 1–82 (1966).
93. Gaposhkin, V. F. The rate of approximation to the normal law of the distributions of weighted sums of lacunary series. (Russian) Teor. Verojatnost. i Primenen. 13, 445–461 (1968).
94. Gaposhkin, V. F. Convergence and limit theorems for subsequences of random variables. (Russian) Teor. Verojatnost. i Primenen. 17, 401–423 (1972).
95. Götze, F. Asymptotic expansions in functional limit theorems. J. Multivariate Anal. 16, no. 1, 1–20 (1985).
96. Götze, F.; Naumov, A.; Ulyanov, V. Asymptotic analysis of symmetric functions. J. Theoret. Probab. 30, no. 3, 876–897 (2017).
97. Goldstein, L.; Shao, Q.-M. Berry–Esseen bounds for projections of coordinate symmetric random vectors. Electron. Commun. Probab. 14, 474–485 (2009).
98. Gozlan, N.; Léonard, C. Transport inequalities. A survey. Markov Process. Rel. Fields 16, 635–736 (2010).
99. Gozlan, N.; Roberto, C.; Samson, P.-M. Hamilton–Jacobi equations on metric spaces and transport-entropy inequalities. Rev. Mat. Iberoam. 30, no. 1, 133–163 (2014).
100. Gromov, M.; Milman, V. D. A topological application of the isoperimetric inequality. Amer. J. Math. 105, 843–854 (1983).
101. Guédon, O.; Milman, E. Interpolating thin-shell and sharp large-deviation estimates for isotropic log-concave measures. Geom. Funct. Anal. 21, no. 5, 1043–1068 (2011).
102. Guédon, O. Kahane–Khinchine type inequalities for negative exponent. Mathematika 46, no. 1, 165–173 (1999).
103. Gross, L. Logarithmic Sobolev inequalities. Amer. J. Math. 97, no. 4, 1061–1083 (1975).
104. Grünbaum, B. Partitions of mass-distributions and of convex bodies by hyperplanes. Pacific J. Math. 10, 1257–1261 (1960).
105. Holley, R.; Stroock, D. Logarithmic Sobolev inequalities and stochastic Ising models. J. Statist. Phys. 46, no. 5–6, 1159–1194 (1987).
106. Ibragimov, I. A.; Linnik, Yu. V. Independent and Stationary Sequences of Random Variables. With a supplementary chapter by I. A. Ibragimov and V. V. Petrov. Translation from the Russian edited by J. F. C. Kingman. Wolters-Noordhoff Publishing, Groningen (1971). 443 pp.
107. Ibragimov, R.; Sharakhmetov, Sh. On an exact constant for the Rosenthal inequality. (Russian) Teor. Veroyatnost. i Primenen. 42, no. 2, 341–350 (1997); translation in Theory Probab. Appl. 42, no. 2, 294–302 (1998).
108. Jambulapati, A.; Lee, Y. T.; Vempala, S. S. A slightly improved bound for the KLS constant. Preprint (2022), arXiv:2208.11644.
109. Johnson, W. B.; Schechtman, G.; Zinn, J. Best constants in moment inequalities for linear combinations of independent and exchangeable random variables. Ann. Probab. 13, no. 1, 234–253 (1985).
110. Joffe, A. On a sequence of almost deterministic pairwise independent random variables. Proc. Amer. Math. Soc. 29, no. 2, 381–382 (1971).
111. Kac, M. Sur les fonctions indépendantes. V. Studia Math. 7, 96–100 (1937).
112. Kac, M. On the distribution of values of sums of the type $\sum f(2^k t)$. Ann. of Math. (2) 47, 33–49 (1946).
113. Kac, I. S.; Krein, M. G. Criteria for the discreteness of the spectrum of a singular string. (Russian) Izv. Vysš. Učebn. Zaved. Matematika, no. 2 (3), 136–153 (1958).
114. Kaczmarz, S.; Steinhaus, H. Theory of Orthogonal Series. Warszawa, Lwów (1935); Russian ed.: Izdat. Fiz.-Mat. Lit., Moscow (1958). 507 pp.
115. Kagan, A. M.; Linnik, Yu. V.; Rao, C. R. Characterization Problems in Mathematical Statistics. Translated from the Russian by B. Ramachandran. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, New York–London–Sydney (1973). xii+499 pp.
116. Kannan, R.; Lovász, L.; Simonovits, M. Isoperimetric problems for convex bodies and a localization lemma. Discrete Comput. Geom. 13, 541–559 (1995).
117. Kantorovitch, L. On mass transportation. (Russian) Reprinted from C. R. (Doklady) Acad. Sci. URSS (N.S.) 37, no. 7–8 (1942). Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 312 (2004), Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 11, 11–14; translation in J. Math. Sci. (N.Y.) 133, no. 4, 1381–1382 (2006).
118. Kantorovitch, L. On a problem of Monge. (Russian) Reprinted from C. R. (Doklady) Acad. Sci. URSS (N.S.) 3, no. 2 (1948). Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 312 (2004), Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 11, 15–16; translation in J. Math. Sci. (N.Y.) 133, no. 4, 1383 (2006).
119. Karlin, S.; Proschan, F.; Barlow, R. E. Moment inequalities of Pólya frequency functions. Pacific J. Math. 11, 1023–1033 (1961).
120. Kashin, B. S.; Saakyan, A. A. Orthogonal Series. Translated from the Russian by Ralph P. Boas. Translation edited by Ben Silver. Translations of Mathematical Monographs 75. American Mathematical Society, Providence, RI (1989). xii+451 pp.
121. Klartag, B. A central limit theorem for convex sets. Invent. Math. 168, no. 1, 91–131 (2007).
122. Klartag, B. Power-law estimates for the central limit theorem for convex sets. J. Funct. Anal. 245, no. 1, 284–310 (2007).
123. Klartag, B. A Berry–Esseen type inequality for convex bodies with an unconditional basis. Probab. Theory Related Fields 145, no. 1–2, 1–33 (2009).
124. Klartag, B.; Lehec, J. Bourgain's slicing problem and KLS isoperimetry up to polylog. Geom. Funct. Anal. 32, no. 5, 1134–1159 (2022).
125. Klartag, B.; Sodin, S. Variations on the Berry–Esseen theorem. Teor. Veroyatn. Primen. 56, no. 3, 514–533 (2011); reprinted in Theory Probab. Appl. 56, no. 3, 403–419 (2012).
126. Koldobsky, A.; Lifshits, M. Average volume of sections of star bodies. In: Geometric Aspects of Functional Analysis, pp. 119–146, Lecture Notes in Math. 1745, Springer, Berlin (2000).
127. Latała, R. On the equivalence between geometric and arithmetic means for log-concave measures. In: Convex Geometric Analysis (Berkeley, CA, 1996), pp. 123–127, Math. Sci. Res. Inst. Publ. 34, Cambridge Univ. Press, Cambridge (1999).
128. Ledoux, M. Remarks on logarithmic Sobolev constants, exponential integrability and bounds on the diameter. J. Math. Kyoto Univ. 35, no. 2, 211–220 (1995).
129. Ledoux, M. Isoperimetry and Gaussian analysis. In: Lectures on Probability Theory and Statistics (Saint-Flour, 1994), pp. 165–294, Lecture Notes in Math. 1648, Springer, Berlin (1996).
130. Ledoux, M. Concentration of measure and logarithmic Sobolev inequalities. In: Séminaire de Probabilités XXXIII, pp. 120–216, Lecture Notes in Math. 1709, Springer (1999).
131. Ledoux, M. The Concentration of Measure Phenomenon. Math. Surveys and Monographs, vol. 89, Amer. Math. Soc. (2001).
132. Ledoux, M. Spectral gap, logarithmic Sobolev constant, and geometric bounds. In: Surveys in Differential Geometry, Vol. IX, pp. 219–240, Surv. Differ. Geom. IX, Int. Press, Somerville, MA (2004).
133. Lee, Y. T.; Vempala, S. S. Eldan's stochastic localization and the KLS hyperplane conjecture: an improved lower bound for expansion. In: 58th Annual IEEE Symposium on Foundations of Computer Science – FOCS 2017, pp. 998–1007, IEEE Computer Soc., Los Alamitos, CA (2017).
134. Lee, Y. T.; Vempala, S. S. The Kannan–Lovász–Simonovits Conjecture. In: Current Developments in Mathematics 2017, pp. 1–36, Int. Press, Somerville, MA (2019).
135. Matskyavichyus, V. K. A lower bound for the rate of convergence in the central limit theorem. (Russian) Theory Probab. Appl. 28, no. 3, 565–569 (1983).
136. Massart, P. The tight constant in the Dvoretzky–Kiefer–Wolfowitz inequality. Ann. Probab. 18, no. 3, 1269–1283 (1990).
137. Maurey, B. Some deviation inequalities. Geom. Funct. Anal. 1, 188–197 (1991).
138. Maz'ya, V. G. Classes of domains and embedding theorems for function spaces. (Russian) Dokl. Akad. Nauk SSSR, 527–530 (1960); translation in Soviet Math. Dokl. 1, 882–885 (1960).
139. Maz'ya, V. G. Sobolev Spaces. Springer-Verlag, Berlin–New York (1985).
140. Meckes, M. W.; Meckes, E. The central limit problem for random vectors with symmetries. J. Theoret. Probab. 20, no. 4, 697–720 (2007).
141. Meckes, M. W. Gaussian marginals of convex bodies with symmetries. Beiträge Algebra Geom. 50, no. 1, 101–118 (2009).
142. Milman, V. D. A new proof of A. Dvoretzky's theorem on cross-sections of convex bodies. Funct. Anal. Appl. 5, 28–37 (1971).
143. Milman, V. D. Asymptotic properties of functions of several variables that are defined on homogeneous spaces. Soviet Math. Dokl. 12, 1277–1281 (1971); translated from Dokl. Akad. Nauk SSSR 199, no. 6, 1247–1250 (1971) (Russian).
144. Milman, V. D.; Pajor, A. Isotropic position and inertia ellipsoids and zonoids of the unit ball of a normed $n$-dimensional space. In: Geometric Aspects of Functional Analysis (1987–88), pp. 64–104, Lecture Notes in Math. 1376, Springer, Berlin (1989).
145. Milman, V. D.; Schechtman, G. Asymptotic Theory of Finite-Dimensional Normed Spaces. With an appendix by M. Gromov. Lecture Notes in Mathematics 1200, Springer-Verlag, Berlin (1986). viii+156 pp.
146. Milman, E. On the role of convexity in isoperimetry, spectral gap and concentration. Invent. Math. 177, no. 1, 1–43 (2009).
147. Morgenthaler, G. W. A central limit theorem for uniformly bounded orthonormal systems. Trans. Amer. Math. Soc. 79, 281–311 (1955).
148. Muckenhoupt, B. Hardy's inequality with weights. Studia Math. XLIV, 31–38 (1972).
149. Mueller, C. E.; Weissler, F. B. Hypercontractivity for the heat semigroup for ultraspherical polynomials and on the $n$-sphere. J. Funct. Anal. 48, no. 2, 252–283 (1982).
150. Otto, F.; Villani, C. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal. 173, no. 2, 361–400 (2000).
151. Paouris, G. Concentration of mass on convex bodies. Geom. Funct. Anal. 16, no. 5, 1021–1049 (2006).
152. Paouris, G. Small ball probability estimates for log-concave measures. Trans. Amer. Math. Soc. 364, no. 1, 287–308 (2012).
153. Payne, L. E.; Weinberger, H. F. An optimal Poincaré inequality for convex domains. Arch. Rational Mech. Anal. 5, 286–292 (1960).
154. Petrov, V. V. Sums of Independent Random Variables. Translated from the Russian by A. A. Brown. Ergebnisse der Mathematik und ihrer Grenzgebiete, Band 82. Springer-Verlag, New York–Heidelberg (1975). x+346 pp. Russian ed.: Moscow, Nauka (1972). 414 pp.
155. Petrov, V. V. Limit Theorems for Sums of Independent Random Variables. Moscow, Nauka (1987) (in Russian).
156. Pinelis, I. Exact Rosenthal-type bounds. Ann. Probab. 43, no. 5, 2511–2544 (2015).
157. Pisier, G. The Volume of Convex Bodies and Banach Space Geometry. Cambridge Tracts in Mathematics 94. Cambridge University Press, Cambridge (1989). xvi+250 pp.
158. Pólya, G. Herleitung des Gaußschen Fehlergesetzes aus einer Funktionalgleichung. (German) Math. Z. 18, no. 1, 96–108 (1923).
159. Prawitz, H. Limits for a distribution, if the characteristic function is given in a finite domain. Skand. Aktuarietidskr. 1972, 138–154 (1973).
160. Prékopa, A. On logarithmic concave measures and functions. Acta Sci. Math. (Szeged) 34, 335–343 (1973).
161. Rokhlin, V. A. On the fundamental ideas of measure theory. (Russian) Mat. Sbornik N.S. 25 (67), 107–150 (1949). Also in: Selected Works. Edited and with a preface by Vershik. Moskovskii Tsentr Nepreryvnogo Matematicheskogo Obrazovaniya, Moscow (1999). 496 pp.
162. Rosenthal, H. P. On the subspaces of $L^p$ ($p > 2$) spanned by sequences of independent random variables. Israel J. Math. 8, 273–303 (1970).
163. Rothaus, O. S. Analytic inequalities, isoperimetric inequalities and logarithmic Sobolev inequalities. J. Funct. Anal. 64, no. 2, 296–313 (1985).
164. Salem, R.; Zygmund, A. On lacunary trigonometric systems. Proc. Nat. Acad. Sci. USA 33, 333–338 (1947).
165. Salem, R.; Zygmund, A. On lacunary trigonometric series. II. Proc. Nat. Acad. Sci. USA 34, 54–62 (1948).
166. Shevtsova, I. G. Refinement of estimates for the rate of convergence in Lyapunov's theorem. (Russian) Dokl. Akad. Nauk 435, no. 1, 26–28 (2010); translation in Dokl. Math. 82, no. 3, 862–864 (2010).
167. Sodin, S. Tail-sensitive Gaussian asymptotics for marginals of concentrated measures in high dimension. In: Geometric Aspects of Functional Analysis, pp. 271–295, Lecture Notes in Math. 1910, Springer, Berlin (2007).
168. Stein, E. M.; Weiss, G. Introduction to Fourier Analysis on Euclidean Spaces. Princeton Mathematical Series, No. 32. Princeton University Press, Princeton, NJ (1971). x+297 pp.
169. Sudakov, V. N. Typical distributions of linear functionals in finite-dimensional spaces of higher dimension. Soviet Math. Dokl. 19, no. 6, 1578–1582 (1978); translated from Dokl. Akad. Nauk SSSR 243, no. 6 (1978).
170. Sudakov, V. N.; Tsirelson, B. S. Extremal properties of half-spaces for spherically invariant measures: Problems in the theory of probability distributions, II. (Russian) Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI) 41, 14–24, 165 (1974).
171. Talagrand, M. A new isoperimetric inequality and the concentration of measure phenomenon. In: Geometric Aspects of Functional Analysis, Israel Seminar, pp. 94–124, Lecture Notes in Math. 1469, Springer-Verlag (1991).
172. Talagrand, M. Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l'I.H.E.S. 81, 73–205 (1995).
173. Talagrand, M. Transportation cost for Gaussian and other product measures. Geom. Funct. Anal. 6, 587–600 (1996).
174. Villani, C. Topics in Optimal Transportation. Graduate Studies in Mathematics 58. American Mathematical Society, Providence, RI (2003). xvi+370 pp.
175. Wang, F.-Y. Logarithmic Sobolev inequalities on noncompact Riemannian manifolds. Probab. Theory Related Fields 109, 417–424 (1997).
176. von Weizsäcker, H. Sudakov's typical marginals, random linear functionals and a conditional central limit theorem. Probab. Theory Related Fields 107, 313–324 (1997).
177. Zolotarev, V. M. Estimates of the difference between distributions in the Lévy metric. (Russian) In: Collection of Articles Dedicated to Academician Ivan Matveevich Vinogradov on his 80th Birthday, I. Trudy Mat. Inst. Steklov. 112, pp. 224–231 (1971).
Glossary
Algebra

$\mathbb{R}^n = \mathbb{R} \times \cdots \times \mathbb{R}$ : Euclidean $n$-dimensional space
$\langle x, y \rangle = x_1 y_1 + \cdots + x_n y_n$ : inner product of vectors (points) $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n) \in \mathbb{R}^n$
$|x| = (x_1^2 + \cdots + x_n^2)^{1/2}$ : Euclidean norm (length) of the vector $x \in \mathbb{R}^n$
$\mathbb{C}^n = \mathbb{C} \times \cdots \times \mathbb{C}$ : complex Euclidean $n$-dimensional space
$\langle x, y \rangle = x_1 \bar{y}_1 + \cdots + x_n \bar{y}_n$ : inner product of vectors (points) $x = (x_1, \dots, x_n)$, $y = (y_1, \dots, y_n) \in \mathbb{C}^n$
$|x| = (|x_1|^2 + \cdots + |x_n|^2)^{1/2}$ : Euclidean norm of the vector $x \in \mathbb{C}^n$
$\|A\|_{\mathrm{HS}} = \big( \sum_{i,j=1}^n |a_{ij}|^2 \big)^{1/2}$ : Hilbert–Schmidt norm of the $n \times n$ matrix $A = (a_{ij})_{i,j=1}^n$ with complex entries
$\|A\| = \sup\{|Ax| : x \in \mathbb{C}^n, \ |x| = 1\}$ : operator norm of the complex matrix $A$

Probability

$(\Omega, \mathcal{F}, \mathbf{P})$ : underlying probability space
$\mathbf{E}\,\xi = \int \xi \, d\mathbf{P}$ : expectation of the random variable (or vector) $\xi$
$\mathrm{Var}(\xi) = \mathbf{E}\,|\xi - \mathbf{E}\xi|^2$ : variance of the complex-valued random variable $\xi$
$\|\xi\|_p = (\mathbf{E}\,|\xi|^p)^{1/p}$ : $L^p$-norm of the random variable or vector $\xi$ (may be used for $p \geq 0$)
$\|\xi\|_\psi = \inf\{\lambda > 0 : \mathbf{E}\,\psi(\xi/\lambda) \leq 1\}$ : Orlicz norm of $\xi$ generated by the Young function $\psi$
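As a numerical aside (not part of the book's text): since $\mathbf{E}\,\psi(\xi/\lambda)$ is nonincreasing in $\lambda$, the defining infimum can be approximated from samples by bisection. A minimal Python sketch, where the helper name orlicz_norm, the bracketing interval, and the sampling setup are all illustrative assumptions:

import numpy as np

def orlicz_norm(sample, psi, lo=1e-3, hi=1e3, iters=60):
    # Approximates ||xi||_psi = inf{lam > 0 : E psi(xi/lam) <= 1} from an
    # i.i.d. sample: the empirical mean of psi(sample/lam) decreases in lam,
    # so bisection (here on a log scale) brackets the root.
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if np.mean(psi(sample / mid)) > 1.0:
            lo = mid   # lam too small: E psi(xi/lam) still exceeds 1
        else:
            hi = mid   # lam large enough
    return hi

rng = np.random.default_rng(0)
xi = rng.standard_normal(200_000)
psi2 = lambda r: np.expm1(r ** 2)   # psi_2(r) = exp(|r|^2) - 1
print(orlicz_norm(xi, psi2))

For a standard normal $\xi$ and $\psi_2$, the exact value is $\sqrt{8/3} \approx 1.63$; the Monte Carlo estimate is only rough, since $\exp(\xi^2/\lambda^2)$ has heavy tails near the root.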
$\|\xi\|_{\psi_p}$ : Orlicz norm of $\xi$ generated by the Young function $\psi_p(r) = \exp\{|r|^p\} - 1$
$\gamma_p(\xi) = \frac{d^p}{i^p \, dt^p}\, \mathbf{E} \exp\{it\xi\} \big|_{t=0}$ : cumulant of the random variable $\xi$ of order $p = 1, 2, \dots$

Moment- and variance-type functionals

$X = (X_1, \dots, X_n)$ : random vector in $\mathbb{R}^n$
$Y = (Y_1, \dots, Y_n)$ : independent copy of $X$
$M_p = M_p(X) = \sup_{\theta \in S^{n-1}} \|\langle X, \theta \rangle\|_p$ : moment of the random vector $X$ in $\mathbb{R}^n$ of order $p \geq 1$, 6
$m_p = m_p(X) = \frac{1}{\sqrt{n}}\, \|\langle X, Y \rangle\|_p$ : moment of $X$ of order $p$, using an independent copy $Y$ of $X$, 8
$\sigma_4^2 = \sigma_4^2(X) = \frac{1}{n}\, \mathrm{Var}(|X|^2)$ : variance-type functional, 11
$\sigma_{2p} = \sigma_{2p}(X) = \sqrt{n}\, \big( \mathbf{E}\, \big| \frac{|X|^2}{n} - 1 \big|^p \big)^{1/p}$ : variance-type functional of order $2p$, assuming that $\mathbf{E}\,|X|^2 = n$, 15
$\Lambda = \sup_{\sum a_{ij}^2 = 1} \mathrm{Var}\big( \sum_{i,j=1}^n a_{ij} X_i X_j \big)$ : functional responsible for the second order correlation condition
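To make these functionals concrete, here is an illustrative Monte Carlo sketch (not from the book: the sampler, sample sizes, and the random-direction approximation of the supremum in $M_p$ are all assumptions) for the standard Gaussian vector in $\mathbb{R}^n$, for which $\mathbf{E}\,|X|^2 = n$ and $\sigma_4^2 = 2$:

import numpy as np

rng = np.random.default_rng(1)
n, N = 20, 50_000
X = rng.standard_normal((N, n))   # N samples of an isotropic X in R^n
Y = rng.standard_normal((N, n))   # independent copy of X

def lp_norm(v, p):
    # empirical L^p-norm (E |v|^p)^(1/p)
    return np.mean(np.abs(v) ** p) ** (1 / p)

p = 4

# M_p: the supremum over theta in S^{n-1}, approximated over random
# directions (this only yields a lower bound on the true supremum)
thetas = rng.standard_normal((200, n))
thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
M_p = max(lp_norm(X @ th, p) for th in thetas)

# m_p = (1/sqrt(n)) ||<X, Y>||_p, with Y an independent copy of X
m_p = lp_norm(np.sum(X * Y, axis=1), p) / np.sqrt(n)

# sigma_4^2 = (1/n) Var(|X|^2); equals 2 for the standard Gaussian
sigma4_sq = np.var(np.sum(X ** 2, axis=1)) / n

print(M_p, m_p, sigma4_sq)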
Metric spaces

$(M, d)$ : (abstract) metric space $M$ with metric $d$, without isolated points
$|\nabla f(x)| = \limsup_{d(x,y) \to 0} \frac{|f(x) - f(y)|}{d(x,y)}$ : generalized modulus of the gradient of a function $f : M \to \mathbb{R}$, 92
$|\nabla^2 f(x)| = |\nabla\, |\nabla f(x)||$ : second order modulus of the gradient, 93
$(M, d, \mu)$ : metric space with a probability measure $\mu$
$\mu^+(A)$ : $\mu$-perimeter (outer Minkowski content) of a set $A \subset M$, 94
$\lambda_1 = \lambda_1(\mu)$ : spectral gap (Poincaré constant), 96
$h = h(\mu)$ : Cheeger constant, 102
$\rho = \rho(\mu)$ : logarithmic Sobolev constant, 134

Analysis on the sphere

$S^{n-1} = \{x \in \mathbb{R}^n : |x| = 1\}$ : unit sphere in $\mathbb{R}^n$
$f'(x) = \nabla f(x) = \big( \frac{\partial f(x)}{\partial x_i} \big)_{i=1}^n$ : Euclidean gradient (first derivative) of the function $f = f(x)$ at the point $x$
$f''(x) = \big( \frac{\partial^2 f(x)}{\partial x_i \partial x_j} \big)_{i,j=1}^n$ : Euclidean Hessian (second derivative)
$\nabla_S f(x)$ : spherical gradient (first derivative) of $f$ at the point $x \in S^{n-1}$, 169
$f_S''(x)$ : spherical second derivative of $f$ at $x$, 170
$|\nabla_S^2 f(x)| = |\nabla_S\, |\nabla_S f(x)||$ : spherical second order modulus of the gradient, 172
$\Delta_S f(x)$ : spherical Laplacian of $f$ at $x$, 175
$\mathfrak{s}_{n-1}$ : uniform distribution on $S^{n-1}$ (normalized Lebesgue measure)
$\mathbf{E}_\theta f(\theta) = \int f \, d\mathfrak{s}_{n-1}$ : expectation (integral) with respect to $\mathfrak{s}_{n-1}$
$J_n(t) = \int \exp\{it\theta_1\} \, d\mathfrak{s}_{n-1}(\theta)$ : characteristic function of the first coordinate of a point $\theta$ under $\mathfrak{s}_{n-1}$, 212
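As a quick illustration (an assumption-laden sketch, not the book's own computation): $J_n(t)$ is real by symmetry, so it can be estimated by averaging $\cos(t\theta_1)$ over uniform points of the sphere, obtained by normalizing Gaussian vectors; since $\theta_1$ is approximately $N(0, 1/n)$, one expects $J_n(t) \approx e^{-t^2/(2n)}$ for large $n$:

import numpy as np

rng = np.random.default_rng(2)

def J_n(t, n, N=200_000):
    # Monte Carlo estimate of J_n(t): a uniform point on S^{n-1} is a
    # normalized standard Gaussian vector; by symmetry the imaginary part
    # of E exp(i t theta_1) vanishes, so we average cos(t * theta_1).
    g = rng.standard_normal((N, n))
    theta1 = g[:, 0] / np.linalg.norm(g, axis=1)
    return np.mean(np.cos(t * theta1))

n, t = 50, 3.0
print(J_n(t, n), np.exp(-t ** 2 / (2 * n)))  # the two values nearly agree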
Weighted sums and probability metrics

$S_\theta = \langle X, \theta \rangle = \theta_1 X_1 + \cdots + \theta_n X_n$ : weighted sum of the coordinates of the random vector $X = (X_1, \dots, X_n)$ with weights $\theta_i$
$F_\theta(x) = \mathbf{P}\{S_\theta \leq x\}$ : distribution function of $S_\theta$, 223
$F(x) = \mathbf{E}_\theta F_\theta(x)$ : typical distribution function, 224
$f_\theta(t) = \mathbf{E}\, e^{itS_\theta}$ : characteristic function of $S_\theta$, 241
$f(t) = \mathbf{E}_\theta f_\theta(t)$ : characteristic function of $F$, 241
$W(F_\theta, F) = \int_{-\infty}^{\infty} |F_\theta(x) - F(x)| \, dx$ : $L^1$-distance (Kantorovich distance) between $F_\theta$ and $F$, 259
$L(F_\theta, F)$ : Lévy distance between $F_\theta$ and $F$, 269
$\rho(F_\theta, F) = \sup_x |F_\theta(x) - F(x)|$ : $L^\infty$-distance (Kolmogorov distance) between $F_\theta$ and $F$, 274
$\omega(F_\theta, F) = \big( \int_{-\infty}^{\infty} (F_\theta(x) - F(x))^2 \, dx \big)^{1/2}$ : $L^2$-distance between $F_\theta$ and $F$, 299
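A rough numerical sketch of these three metrics (illustrative only; the grid, the truncation to the window $[-6, 6]$, and the function name distances are assumptions), comparing the empirical distribution function of $S_\theta$ with a reference distribution function:

import numpy as np
from math import erf, sqrt

def distances(s_theta, F, grid):
    # Empirical versions of W (L^1/Kantorovich), rho (L^infty/Kolmogorov)
    # and omega (L^2) between the empirical distribution function of
    # S_theta and a reference distribution function F, on a finite grid.
    F_theta = np.searchsorted(np.sort(s_theta), grid, side="right") / len(s_theta)
    diff = F_theta - F(grid)
    dx = grid[1] - grid[0]
    return (np.sum(np.abs(diff)) * dx,        # W: L^1-distance
            np.max(np.abs(diff)),             # rho: Kolmogorov distance
            np.sqrt(np.sum(diff ** 2) * dx))  # omega: L^2-distance

# Example: X standard Gaussian in R^n and theta a fixed unit vector,
# so S_theta is exactly standard normal and all three distances are small.
rng = np.random.default_rng(3)
n, N = 10, 100_000
theta = np.ones(n) / np.sqrt(n)
S = rng.standard_normal((N, n)) @ theta
Phi = np.vectorize(lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0))))
grid = np.linspace(-6.0, 6.0, 2001)
print(distances(S, Phi, grid))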
Standard notations/abbreviations

$1_A$, $I(A)$ : indicator of the set (event) $A$
$x^+ = \max\{x, 0\}$ : positive part of a real number $x$
$x^- = \max\{-x, 0\}$ : negative part of $x$
a.e., a.s. : almost everywhere, almost surely
Index
approximation of characteristic functions of weighted sums, 392
approximation in total variation, 229
Bakry–Émery criterion, 136
Bernoulli coefficients, 411
Berry–Esseen bounds involving $\Lambda$, 333
Berry–Esseen inequality, 54
Berry–Esseen theorem, 72
Berry–Esseen-type bounds (Kolmogorov distance), 274
Berry–Esseen-type inequality, 54
Bessel function of the first kind, 212
beta distributions, 122
bounds involving relative entropy, 140
bounds involving second order derivatives, 144
bounds on the $L^2$-norm (Euclidean setup), 188
Brascamp–Lieb inequality, 123
canonical Gaussian measure, 110
Cauchy measures, 121
characteristic function, 51
  higher order approximations, 77
  integral bounds, 394
  of the first coordinate, 212
  of weighted sums (approximation), 392
  of weighted sums (typical distribution), 241
Chebyshev–Hermite polynomials, 80
Chebyshev polynomials, 364
Cheeger isoperimetric constant, 102, 108
  sphere, 182
Cheeger-type inequality, 101, 106, 181
co-area inequality, 94
coefficients of special type, 411
concentration functions of weighted sums, 244
concentration inequality (first order), 190
concentration of measure phenomenon, 133
concentration problems for weighted sums, 223
concentration (second order), 192
  with linear parts, 194
convolutions, 110, 136
coordinatewise symmetric distributions, 30, 375
coordinatewise symmetric log-concave distributions, 126
coordinatewise symmetry and log-concavity, 380
corrected normal characteristic function, 77
cost function, 163
covariance matrix, 4
cumulants, 63
deviation bounds (Kolmogorov distance), 287
deviations
  non-symmetric case, 251
  of characteristic functions, 245
  symmetric case, 248
  under moment conditions, 337
distributions
  with an unconditional basis, 30
  with symmetries, 375
Edgeworth approximation, 81
Edgeworth correction, 80
  of the normal law, 81
Edgeworth expansion for weighted sums, 389
entropy functional, 104, 131
exponential bounds, 137
exponential distributions, 106
  two-sided, 107
exponential integrability, 113
exponential logarithmic Sobolev inequality, 134
exponential measures, 122
finite second moment, 3
first order correlation condition, 5
fluctuation of distributions, 259
Gaussian bound for Lipschitz functions, 137
Gaussian measures, 107, 122, 136
Gaussian Poincaré-type inequality, 110
Gaussian transport-entropy inequality, 165
generalized Cauchy distribution, 121
generalized modulus of the gradient (unit sphere), 169
generalized Student distribution, 121
growth of $L^p$-norms, 116, 141
Hamilton–Jacobi equations, 156, 158
Hessian, 93
Hilbert–Schmidt norm, 170
Hopf–Lax formula, 158
hyperplane conjecture (slicing problem), 129
independence, 23
infimum-convolution, 149, 158
  inequality, 149, 159
  operator, 181
informational divergence, 131
integral bounds on characteristic functions, 394
isoperimetric function, 101
isoperimetric inequality, 181
isoperimetric problem, 101, 181
  for Gauss space, 111
isoperimetric profile, 101
isoperimetry, 101
isotropic, 3
  distribution, 3
isotropy, 3
Kantorovich distance
  comparison with Kolmogorov distance, 324
  large deviations, 264
  transport, 259
Khinchine-type inequality, 6, 7
  for norms and polynomials, 38
Kolmogorov distance, 51, 317
  approximation in, 397
  bounds, 278
  comparison with Kantorovich distance, 324
  comparison with $L^2$ distance, 323
  lower bounds, 60
Kullback–Leibler distance, 131, 140
$L^1$-Poincaré-type inequality, 111
$L^2$ distance (comparison with Kolmogorov distance), 323
$L^2$ estimates, 299
$L^2$ expansions, 299
lacunary systems, 370
Laplacian operator, 175
large deviations (weighted $\ell^p$-norms), 204
Lévy distance, 57, 269
Lévy–Prokhorov distance, 163
linear functionals on the sphere, 207
linear part of characteristic functions, 255
Lipschitz condition, 357
Lipschitz function, 91
Lipschitz semi-norm, 92
Lipschitz transforms, 109, 136
locally Lipschitz function, 92
logarithmically concave measure, 34
logarithmic Sobolev constant, 134
  optimal for unit sphere, 180
logarithmic Sobolev inequality, 131, 133, 134, 137, 178
  exponential, 134
  modified, 134
log-concave measure, 34, 107
$L^p$-distances to the normal law, 234
$L^p$-Kantorovich distance, 163
Lyapunov coefficient, 67
Lyapunov ratio, 67
minimal distance, 163
moduli of gradients (continuous setting), 91
moment functionals, 118
  using independent copies, 8
moments, 6
$\mu$-perimeter (outer), 94
non-symmetric distributions, 340
normal approximation, 72
  for Gaussian mixtures, 227
one-dimensional log-concave distributions, 43
Orlicz norm, 133, 141
pairwise independence, 29
Parseval's identity, 178
Payne–Weinberger inequality, 129
perimeter inequality, 94
Plancherel formula, 300, 310
Poincaré constant, 96, 118
Poincaré inequality, 178
Poincaré inequality (sphere), 181
Poincaré-type inequality, 96, 106, 113
  for standard Gaussian measure, 122, 123
  Orlicz norm, 135, 141
  second order, 185
  weighted, 121
pointwise fluctuations (distribution functions), 266
polynomial decay at infinity (characteristic function), 218
power transport distance, 163, 164
Prékopa–Leindler theorem, 36, 37, 48
principle of subsequences, 416
product measures, 108, 135, 389
product of characteristic functions (expansions), 74
random sums, 413
rates of approximation, 83
relations between Kantorovich, $L^2$ and Kolmogorov distances, 323
relative entropy, 131, 140
Rosenthal-type inequality, 69
rotationally invariant measures, 110
Rothaus functionals, 103
Rothaus' inequality, 133
second order correlation condition, 20
  applications, 331
second order modulus of gradient, 172
second order spherical concentration, 185
slicing problem, 50
small ball probabilities, 16, 118, 119
smoothing, 51
Sobolev-type inequality, 169
spectral gap, 180
spherical concentration inequality
  elementary polynomials, 197
  fourth degree polynomials, 200
spherical derivative, 169
  first, 169
  second, 170
spherical gradient, 169
spherical harmonic expansion, 177
spherical integration by parts, 178
spherical Laplacian, 175
spherical partial derivatives, 175
subadditivity property
  classical Shannon entropy, 134
  of entropy, 133
  of the variance functional, 108
Sudakov's theorem, 223, 224
sums of independent random variables, 63
supremum-convolution, 149
  inequality, 149, 159
  operator, 181
systems with Lipschitz condition, 357
tail-type sufficient conditions, 107
tensorization of the variance, 108
thin shell conjecture, 49, 129, 351
transport-entropy inequality, 163
trigonometric systems, 362
typical distribution, 223, 224
  structure, 225
uniform distribution
  on balls, 136
  on the sphere, 137
upper bounds (characteristic functions), 215
  typical distribution, 241
variance conjecture, 49
variance of the Euclidean norm, 11
Walsh system (on the discrete cube), 367
weighted Poincaré-type inequality, 121
Young's inequality, 68
Zolotarev's inequality, 57