English Pages 144 [152] Year 2023
Universitext
Daniel W. Stroock
Gaussian Measures in Finite and Infinite Dimensions
Universitext Series Editors Nathanaël Berestycki, Universität Wien, Vienna, Austria Carles Casacuberta, Universitat de Barcelona, Barcelona, Spain John Greenlees, University of Warwick, Coventry, UK Angus MacIntyre, Queen Mary University of London, London, UK Claude Sabbah, École Polytechnique, CNRS, Université Paris-Saclay, Palaiseau, France Endre Süli, University of Oxford, Oxford, UK
Universitext is a series of textbooks that presents material from a wide variety of mathematical disciplines at master’s level and beyond. The books, often well class-tested by their author, may have an informal, personal even experimental approach to their subject matter. Some of the most successful and established books in the series have evolved through several editions, always following the evolution of teaching curricula, into very polished texts. Thus as research topics trickle down into graduate-level teaching, first textbooks written for new, cutting-edge courses may make their way into Universitext.
Daniel W. Stroock Department of Mathematics Massachusetts Institute of Technology Cambridge, MA, USA
ISSN 0172-5939 ISSN 2191-6675 (electronic) Universitext ISBN 978-3-031-23121-6 ISBN 978-3-031-23122-3 (eBook) https://doi.org/10.1007/978-3-031-23122-3 Mathematics Subject Classification: 28C20, 60G15, 60B11, 46G12 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
This book is dedicated to Len Gross, the man whose ideas inspired much of it
Preface
During the spring semester of 2022 I taught a class at M.I.T. based on the material in this book. Although my choice of Gaussian measures as the topic was influenced by the renewed interest, resulting from Oded Schramm’s work on conformal fields, of M.I.T. students in Gaussian measures, I was equally motivated by my own admiration of the theory of Gaussian measures on Banach spaces developed by Irving Segal and Leonard Gross in connection with constructive quantum field theory. I realize that the program to construct quantum fields via Gaussian measures has come on hard times, but I believe that ideas like Gross’s notion of abstract Wiener spaces should not be forgotten and may well have a role to play in the future.

The contents are broken into four chapters. Chapter 1 contains some basic facts about characteristic functions. All of them are classical, many of them are familiar, but others have been either neglected or forgotten. Be that as it may, they will be useful later.

Chapter 2 begins with a number of results about Gaussian measures in finite dimensions. Included are the Lévy-Cramér Theorem for independent random variables whose sum is Gaussian, the Poincaré and Gross’s logarithmic inequalities for Gaussian measures, the relationship of Gaussian measure to Hermite polynomials and functions, the concentration theorem of Maurey and Pisier, and the Gaussian isoperimetric inequality. These are followed by the construction, via Kolmogorov’s Consistency Theorem and his continuity criterion, of Gaussian processes, and the chapter concludes with a discussion of Brownian motion and the Ornstein-Uhlenbeck processes.

Because they are required for an understanding of what follows, Chap. 3 begins with a summary of a few functional analytic results, like Bochner’s theory of integration for Banach space valued functions, X. Fernique’s remarkable inequality for Gaussian measures on a Banach space, and Gaussian measures on a Hilbert space.
Armed with those preliminaries, I next introduce the notion of an abstract Wiener space and show that there is one for every non-degenerate centered Gaussian measure on a separable Banach space.

Having introduced and provided examples of abstract Wiener spaces, I attempt in Chap. 4 to demonstrate the importance of the Cameron-Martin subspace, that invisible but critical Hilbert space which is the distinguishing feature of an abstract Wiener space, by proving several results in which it plays an essential part. These include the Cameron-Martin formula, the Banach space version of the Gaussian isoperimetric inequality, the Donsker-Varadhan large deviations principle for rescaled Gaussian measures, Rademacher’s differentiation theorem, and, somewhat later, Strassen’s law of the iterated logarithm. Along the way, I derive a result which shows that the definition I use of an abstract Wiener space is equivalent to Gross’s and leads to proofs of a couple of his fundamental theorems, which are used in the derivation of the Donsker-Varadhan large deviations principle as well as the construction of Banach space valued Brownian motion. The final topic is the construction of Euclidean free fields, first for one dimension and then for higher dimensions. In order to do so, I have to give a brief account in terms of Hermite functions of L. Schwartz’s theory of tempered distributions, after which the construction closely resembles ones given earlier.

I do not know how, or even if, this book will be used. It covers only a small part of the subject and does so from a particular perspective. A much more complete treatment is given in V. Bogachev’s Gaussian Measures, which contains not only an enormous amount of material but also an admirable summary of and references to other sources. Nonetheless, I think that the contents of this book constitute the basis for a one semester, special topics course in probability theory. It also might be used as reference material in a less specialized course or in research. In any case, I have enjoyed writing it and hope that there will be at least one other person who will read and enjoy it.

Nederland, CO, USA
Daniel W. Stroock
Contents

1 Characteristic Functions
    1.1 Some Basic Facts
    1.2 Infinitely Divisible Laws

2 Gaussian Measures and Families
    2.1 Gaussian Measures on R
    2.2 Cramér–Lévy Theorem
        2.2.1 Gaussian Measures and Cauchy’s Equation
    2.3 Gaussian Spectral Properties
        2.3.1 A Logarithmic Sobolev Inequality
        2.3.2 Hermite Polynomials
        2.3.3 Hermite Functions
    2.4 Gaussian Families
        2.4.1 A Few Basic Facts
        2.4.2 A Concentration Property of Gaussian Measures
        2.4.3 The Gaussian Isoperimetric Inequality
    2.5 Constructing Gaussian Families
        2.5.1 Continuity Considerations
        2.5.2 Some Examples
        2.5.3 Stationary Gaussian Processes

3 Gaussian Measures on a Banach Space
    3.1 Motivation
    3.2 Some Background
        3.2.1 A Little Functional Analysis
        3.2.2 Fernique’s Theorem
        3.2.3 Gaussian Measures on a Hilbert Space
    3.3 Abstract Wiener Spaces
        3.3.1 The Cameron–Martin Subspace and Formula
        3.3.2 Some Examples of Abstract Wiener Spaces

4 Further Properties and Examples of Abstract Wiener Spaces
    4.1 Wiener Series and Some Applications
        4.1.1 An Isoperimetric Inequality for Abstract Wiener Space
        4.1.2 Rademacher’s Theorem for Abstract Wiener Space
        4.1.3 Gross’s Operator Extension Procedure
        4.1.4 Orthogonal Invariance
        4.1.5 Large Deviations in Abstract Wiener Spaces
    4.2 Brownian Motion on a Banach Space
        4.2.1 Abstract Wiener Formulation
        4.2.2 Strassen’s Theorem
    4.3 One Dimensional Euclidean Fields
        4.3.1 Some Background
        4.3.2 An Abstract Wiener Space for $L^2(\lambda_{\mathbb{R}};\mathbb{R})$
    4.4 Euclidean Fields in Higher Dimensions
        4.4.1 An Abstract Wiener Space for $L^2(\lambda_{\mathbb{R}^N};\mathbb{R})$
        4.4.2 The Ornstein–Uhlenbeck Field in Higher Dimensions
        4.4.3 Is There any Physics Here?

References
Index
Notation

General

$\alpha^+$ & $\alpha^-$: The positive and negative parts of $\alpha \in \mathbb{R}$ (Sect. 1.2).
$a \vee b$ & $a \wedge b$: The maximum and minimum of $a, b \in \mathbb{R}$.
$\lfloor t \rfloor$: The integer part of $t \in \mathbb{R}$.
$f \restriction S$: The restriction of the function $f$ to the set $S$.
$\|\cdot\|_u$: The uniform (supremum) norm.
$i$: The imaginary number $\sqrt{-1}$.
$o(g(t))$: A function $f$ for which $\frac{f(t)}{g(t)}$ tends to 0 as $t$ tends to a limit.
$B_E(a, r)$: The ball of radius $r$ around $a$ in the metric space $E$.
$\mathring{\Gamma}$ & $\bar{\Gamma}$: The interior and closure of a subset in a topological space.
$\langle x, \xi \rangle$: The action of $\xi \in B^*$ on $x \in B$.

Sets, Functions, and Spaces

$\mathbb{C}$: The complex numbers.
$\mathbb{Z}$ & $\mathbb{Z}^+$: The integers and the positive integers.
$\mathbb{N}$: The non-negative integers: $\mathbb{N} = \{0\} \cup \mathbb{Z}^+$.
$\mathbb{Q}$: The set of rational numbers.
$A^\complement$: The complement of the set $A$.
$S^{N-1}$: The unit sphere in $\mathbb{R}^N$.
$1_A$: The indicator function of the set $A$.
$\operatorname{sgn}(x)$: The signum function: equal to 1 if $x \ge 0$ and $-1$ if $x < 0$.
$C_b(E; \mathbb{R})$ or $C_b(E; \mathbb{C})$: The space of bounded continuous functions from a topological space $E$ into $\mathbb{R}$ or $\mathbb{C}$.
$\|\varphi\|_u$: The uniform norm $\sup_{x \in E} |\varphi(x)|$ of $\varphi$.
$C^n(E; \mathbb{R}^N)$: The space of $f : E \to \mathbb{R}^N$ with $n \in \mathbb{N} \cup \{\infty\}$ continuous derivatives.
$C_c^n(E; \mathbb{R}^N)$: The space of $f \in C^n(E; \mathbb{R}^N)$ that vanish off of a compact set.
$\mathscr{S}(\mathbb{R}^N; \mathbb{R})$ or $\mathscr{S}(\mathbb{R}^N; \mathbb{C})$: The Schwartz space of smooth $\mathbb{R}$ or $\mathbb{C}$-valued functions with rapidly decreasing derivatives.

Measure Theoretic

(a.e., $\mu$): To be read almost everywhere with respect to $\mu$.
$\sigma(\mathcal{C})$: The $\sigma$-algebra generated by $\mathcal{C}$ (Sect. 2.1.2).
$\mathcal{B}_E$: The Borel $\sigma$-algebra $\sigma(G(E))$ over $E$.
$\bar{\mathcal{B}}^\mu$: The completion of the $\sigma$-algebra $\mathcal{B}$ with respect to the measure $\mu$.
$\delta_a$: The unit point mass at $a$.
$\lambda_S$: Lebesgue measure on the set $S \in \mathcal{B}_{\mathbb{R}^N}$.
$M_1(E)$: Space of probability Borel measures on $E$.
$\mu \ll \nu$: $\mu$ is absolutely continuous with respect to $\nu$.
$\mu \perp \nu$: $\mu$ is singular to $\nu$.
$\Phi_* \mu$: The pushforward (image) of $\mu$ under $\Phi$.
$\int_\Gamma f(x)\,dx$: Equivalent to the Lebesgue integral $\int_\Gamma f\,d\lambda_{\mathbb{R}^N}$ of $f$ on $\Gamma$.
$L^p(\mu; E)$ or $L^p(\mu; \mathbb{C})$: The Lebesgue space of $E$-valued or $\mathbb{C}$-valued functions $f$ for which $\|f\|_E^p$ is $\mu$-integrable.
$\langle \varphi, \mu \rangle$: The integral of $\varphi$ with respect to $\mu$.
$(R)\int_{[a,b]} \varphi(t)\,d\psi(t)$: The Riemann–Stieltjes integral of $\varphi$ on $[a, b]$ with respect to $\psi$.
$\mathbb{E}$ & $\mathbb{E}^\mu$: The expected value and expected value with respect to $\mu$.
$\mathbb{E}[X, A]$: The expected value of $X 1_A$.
Chapter 1
Characteristic Functions
This chapter is devoted to a few facts, several of which I learned from an interesting set of notes by Bryc [3], about and related to what probabilists call a characteristic function.
1.1 Some Basic Facts

Given an element $\mu$ of $M_1(\mathbb{R}^N)$, the set of Borel probability measures on $\mathbb{R}^N$, its characteristic function is its Fourier transform $\hat\mu$ defined by
$$\hat\mu(\xi) = \int e^{i(\xi,x)_{\mathbb{R}^N}}\,\mu(dx) \quad\text{for } \xi \in \mathbb{R}^N.$$
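Lemma 1.1.1 For every $\mu \in M_1(\mathbb{R}^N)$, $\hat\mu$ is a uniformly continuous $\mathbb{C}$-valued function such that $\|\hat\mu\|_u = 1 = \hat\mu(0)$ ($\|\cdot\|_u$ is the uniform norm) and $\hat\mu(-\xi) = \overline{\hat\mu(\xi)}$. Moreover, if $\varphi \in C_b(\mathbb{R}^N;\mathbb{C}) \cap L^1(\lambda_{\mathbb{R}^N};\mathbb{C})$ and $\hat\varphi \in L^1(\lambda_{\mathbb{R}^N};\mathbb{C})$, then
$$\int \varphi\,d\mu = (2\pi)^{-N} \int \hat\varphi(\xi)\,\hat\mu(-\xi)\,d\xi. \tag{1.1.1}$$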
In particular, $\mu = \nu$ if and only if $\hat\mu = \hat\nu$.

Proof That $\|\hat\mu\|_u \le 1 = \hat\mu(0)$ is obvious, as, by Lebesgue’s Dominated Convergence Theorem, is the continuity of $\hat\mu$. To see that $\hat\mu$ is uniformly continuous, note that
$$|\hat\mu(\eta) - \hat\mu(\xi)|^2 \le \int \bigl|1 - e^{i(\eta-\xi,x)_{\mathbb{R}^N}}\bigr|^2\,\mu(dx) = 2\int \bigl(1 - \cos(\eta-\xi,x)_{\mathbb{R}^N}\bigr)\,\mu(dx),$$
and therefore that
$$|\hat\mu(\eta) - \hat\mu(\xi)|^2 \le 2\bigl(1 - \operatorname{Re}\hat\mu(\eta-\xi)\bigr). \tag{1.1.2}$$
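Turning to (1.1.1), note that, because $\varphi$ and $\hat\varphi$ are $\lambda_{\mathbb{R}^N}$-integrable, the $L^1$-Fourier inversion formula says that
$$\varphi(x) = (2\pi)^{-N} \int e^{-i(\xi,x)_{\mathbb{R}^N}}\,\hat\varphi(\xi)\,d\xi.$$
Hence, by Fubini’s Theorem,
$$(2\pi)^{N} \int \varphi\,d\mu = \int\left(\int e^{-i(\xi,x)_{\mathbb{R}^N}}\,\hat\varphi(\xi)\,d\xi\right)\mu(dx) = \int \hat\varphi(\xi)\left(\int e^{-i(\xi,x)_{\mathbb{R}^N}}\,\mu(dx)\right)d\xi = \int \hat\varphi(\xi)\,\hat\mu(-\xi)\,d\xi.$$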
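Finally, to see that $\mu = \nu$ if $\hat\mu = \hat\nu$, remember that the Fourier transform takes the Schwartz space $\mathscr{S}(\mathbb{R}^N;\mathbb{C})$ (cf. Sect. 4.5.1) into itself. Thus, if $\varphi \in \mathscr{S}(\mathbb{R}^N;\mathbb{C})$, then both $\varphi$ and $\hat\varphi$ are in $L^1(\lambda_{\mathbb{R}^N};\mathbb{C})$, and so, by (1.1.1),
$$\hat\mu = \hat\nu \implies \int \varphi\,d\mu = \int \varphi\,d\nu \quad\text{for all } \varphi \in \mathscr{S}(\mathbb{R}^N;\mathbb{C}),$$
which means that $\mu = \nu$.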
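As a numerical sanity check on Lemma 1.1.1, the following Python sketch evaluates the characteristic function of a discrete probability measure on $\mathbb{R}$ and tests inequality (1.1.2) on a grid. The measure `atoms`, the helper names, and the grid are illustrative assumptions, not part of the text, and a grid check is of course no substitute for the proof.

```python
import cmath

def char_fn(atoms, xi):
    """Characteristic function of a discrete probability measure on R.

    atoms is a list of (point, weight) pairs whose weights sum to 1.
    """
    return sum(w * cmath.exp(1j * xi * x) for x, w in atoms)

def check_1_1_2(atoms, eta, xi, tol=1e-12):
    """True if |mu^(eta) - mu^(xi)|^2 <= 2 (1 - Re mu^(eta - xi)), up to rounding."""
    lhs = abs(char_fn(atoms, eta) - char_fn(atoms, xi)) ** 2
    rhs = 2.0 * (1.0 - char_fn(atoms, eta - xi).real)
    return lhs <= rhs + tol

# Hypothetical example measure with three atoms (an assumption for illustration).
atoms = [(-1.0, 0.2), (0.5, 0.5), (2.0, 0.3)]

grid = [k / 4.0 for k in range(-20, 21)]
all_ok = all(check_1_1_2(atoms, eta, xi) for eta in grid for xi in grid)
print(all_ok)
```

The same helper also makes the identities $\hat\mu(0) = 1$ and $\hat\mu(-\xi) = \overline{\hat\mu(\xi)}$ easy to test for any particular choice of atoms.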
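Lemma 1.1.2 If $\mu \in M_1(\mathbb{R})$ is the distribution of a random variable $X$, and, for some $n \ge 1$, $\mathbb{E}[|X|^n] < \infty$, then $\hat\mu \in C^n(\mathbb{R};\mathbb{C})$ and
$$\mathbb{E}\bigl[X^m e^{i\xi X}\bigr] = (-i)^m \partial^m \hat\mu(\xi) \quad\text{for } 0 \le m \le n.$$

Proof This is trivial when $m = 0$. If it holds for some $0 \le m < n$, then
$$(-i)^m\,\frac{\partial^m\hat\mu(\xi+\eta) - \partial^m\hat\mu(\xi)}{\eta} = \mathbb{E}\left[X^m e^{i\xi X}\,\frac{e^{i\eta X} - 1}{\eta}\right].$$
Since $\left|X^m e^{i\xi X}\,\frac{e^{i\eta X}-1}{\eta}\right| \le |X|^{m+1}$, the result follows from Lebesgue’s Dominated Convergence Theorem.

Because the exponent in the definition of $\hat\mu$ is pure imaginary, one cannot blithely differentiate $\hat\mu$ to derive expressions like $\partial_\xi^n \hat\mu(0) = i^n\,\mathbb{E}[X^n]$. To justify such computations, it is helpful to introduce the finite difference operators
$$\Delta_t \varphi(\xi) = \frac{\varphi\bigl(\xi + \tfrac{t}{2}\bigr) - \varphi\bigl(\xi - \tfrac{t}{2}\bigr)}{t}$$
for $t > 0$ and functions $\varphi : \mathbb{R} \longrightarrow \mathbb{C}$ to compute them. If $\varphi$ vanishes at 0 faster than $|\xi|^n$ for some $n \ge 1$, then $\lim_{t \searrow 0} \Delta_t^n \varphi(0) = 0$. Thus, for $f$’s which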
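are $n$ times continuously differentiable in a neighborhood of 0, $\lim_{t\searrow 0}\Delta_t^n f(0) = \lim_{t\searrow 0}\Delta_t^n P_n(0)$, where $P_n$ is the Taylor polynomial $\sum_{m=0}^n \frac{f^{(m)}(0)}{m!}\xi^m$. Therefore we will know that $\lim_{t\searrow 0}\Delta_t^n f(0) = f^{(n)}(0)$ once we show that $\Delta_t^n \xi^m = n!\,\delta_{m,n}$ for $0 \le m \le n$, which comes down to checking that $\Delta_t^m \xi^m = m!$ for all $m \ge 1$. That $\Delta_t \xi = 1$ is obvious. Now assume the result for $1 \le m < M$. Then $\Delta_t^{M-1}\xi^{M-m} = 0$ for $m > 1$, and so
$$t\Delta_t^M \xi^M = \Delta_t^{M-1}\Bigl(\bigl(\xi + \tfrac{t}{2}\bigr)^M - \bigl(\xi - \tfrac{t}{2}\bigr)^M\Bigr) = \sum_{m=0}^M \binom{M}{m}\Bigl(\frac{t}{2}\Bigr)^m \bigl(1 + (-1)^{m+1}\bigr)\Delta_t^{M-1}\xi^{M-m} = Mt\,\Delta_t^{M-1}\xi^{M-1} = t\,M!.$$

Lemma 1.1.3 Let $\mu \in M_1(\mathbb{R})$ be the distribution of a random variable $X$. If $n \ge 1$ and there exists a sequence $t_k \searrow 0$ such that $\sup_{k\ge1}|\Delta_{t_k}^{2n}\hat\mu(0)| < \infty$, then $\mathbb{E}[X^{2n}] < \infty$. In particular, this will be the case if $\hat\mu$ is $2n$ times continuously differentiable in a neighborhood of 0.

Proof Set $e_x(\xi) = e^{ix\xi}$, and observe that
$$\Delta_t^2 e_x(\xi) = 2e_x(\xi)\,\frac{\cos xt - 1}{t^2} = -4t^{-2}e_x(\xi)\sin^2\frac{xt}{2},$$
and therefore
$$\Delta_t^{2n} e_x(\xi) = (-4)^n t^{-2n} e_x(\xi)\sin^{2n}\frac{xt}{2}.$$
Thus
$$\bigl|\Delta_{t_k}^{2n}\hat\mu(0)\bigr| = 4^n t_k^{-2n}\,\mathbb{E}\Bigl[\sin^{2n}\frac{X t_k}{2}\Bigr] \ge 4^n t_k^{-2n}\,\mathbb{E}\Bigl[\sin^{2n}\frac{X t_k}{2},\ |X| \le \frac{\pi}{2t_k}\Bigr] \ge 2^{-n}\,\mathbb{E}\Bigl[X^{2n},\ |X| \le \frac{\pi}{2t_k}\Bigr],$$
and so, by the Monotone Convergence Theorem, $\mathbb{E}[X^{2n}] < \infty$.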
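The identity $\Delta_t^n \xi^m = n!\,\delta_{m,n}$ for $0 \le m \le n$, on which the argument above rests, can be checked exactly with rational arithmetic. The following Python sketch does so; the function names, the step `t`, and the evaluation point `xi0` are arbitrary illustrative choices rather than anything taken from the text.

```python
from fractions import Fraction
from math import factorial

def central_diff(phi, t):
    """Return Delta_t phi, where (Delta_t phi)(xi) = (phi(xi + t/2) - phi(xi - t/2)) / t."""
    return lambda xi: (phi(xi + t / 2) - phi(xi - t / 2)) / t

def iterated_diff(phi, t, n):
    """Return the n-fold composition Delta_t^n phi."""
    for _ in range(n):
        phi = central_diff(phi, t)
    return phi

def monomial(m):
    """The function xi -> xi^m."""
    return lambda xi: xi ** m

# Arbitrary rational step and evaluation point (assumptions for illustration).
t = Fraction(1, 3)
xi0 = Fraction(2, 7)

for m in range(1, 7):
    # Delta_t^m xi^m is the constant m!, whatever t and xi0 are.
    print(m, iterated_diff(monomial(m), t, m)(xi0), factorial(m))
```

Because `Fraction` arithmetic is exact, the two printed values on each line agree without any rounding tolerance. Each application of `central_diff` doubles the number of function evaluations, so this is only practical for small orders.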
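Theorem 1.1.4 If $\hat\mu$ admits an extension as an analytic function on $B_{\mathbb{C}}(0, r)$, then $\mathbb{E}\bigl[e^{\alpha|X|}\bigr] < \infty$ for $\alpha \in [0, r)$, and so
$$\zeta \in \{z \in \mathbb{C} : \operatorname{Re} z \in (-r, r)\} \longmapsto M(\zeta) = \mathbb{E}\bigl[e^{\zeta X}\bigr] \in \mathbb{C}$$
is analytic.

Proof Since $\hat\mu \in C^\infty(\mathbb{R};\mathbb{C})$, Lemma 1.1.3 implies that $\mathbb{E}[|X|^n] < \infty$ and, by Lemma 1.1.2, $\partial^n\hat\mu(0) = i^n\,\mathbb{E}[X^n]$ for all $n \ge 1$. Hence,
$$\hat\mu(\xi) = \sum_{n=0}^\infty \frac{\mathbb{E}[X^n]}{n!}(i\xi)^n,$$
where the radius of convergence of the series is at least $r$. In particular, for $\alpha \in [0, r)$,
$$\mathbb{E}\bigl[e^{\alpha X} + e^{-\alpha X}\bigr] = 2\sum_{n=0}^\infty \frac{\mathbb{E}[X^{2n}]}{(2n)!}\alpha^{2n} < \infty,$$
and so $\mathbb{E}\bigl[e^{\alpha|X|}\bigr] < \infty$. Thus, by Lebesgue’s Dominated Convergence Theorem, $\partial_\zeta \mathbb{E}\bigl[e^{\zeta X}\bigr] = \mathbb{E}\bigl[X e^{\zeta X}\bigr]$ if $|\operatorname{Re}\zeta| < r$.

Knowing that the map $\mu \in M_1(\mathbb{R}^N) \longmapsto \hat\mu \in C_b(\mathbb{R}^N;\mathbb{C})$ is one-to-one, one should ask about its continuity properties. For that purpose, introduce the notation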
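$\langle\varphi,\mu\rangle$ to denote the integral of a function $\varphi$ with respect to a (not necessarily finite) measure $\mu$, and say that a sequence $\{\mu_n : n \ge 1\} \subseteq M_1(\mathbb{R}^N)$ converges weakly to $\mu \in M_1(\mathbb{R}^N)$ if, for every $\varphi \in C_b(\mathbb{R}^N;\mathbb{C})$, $\langle\varphi,\mu_n\rangle \longrightarrow \langle\varphi,\mu\rangle$. Often I will write $\mu_n \xrightarrow{w} \mu$ to mean that $\{\mu_n : n \ge 1\}$ converges weakly to $\mu$.

It is clear that $\hat\mu_n \longrightarrow \hat\mu$ pointwise if $\mu_n \xrightarrow{w} \mu$. In fact, as we will see, the convergence of the $\hat\mu_n$’s is uniform on compact subsets. More importantly, we will show that uniform convergence on compacts of the $\hat\mu_n$’s implies weak convergence of the $\mu_n$’s.

Lemma 1.1.5 Let $\{\mu_n : n \ge 1\} \cup \{\mu\} \subseteq M_1(\mathbb{R}^N)$ be given, and assume that $\langle\varphi,\mu_n\rangle \longrightarrow \langle\varphi,\mu\rangle$ for all $\varphi \in C_c^\infty(\mathbb{R}^N;\mathbb{R})$. Then, for any $\psi \in C\bigl(\mathbb{R}^N;[0,\infty)\bigr)$, $\langle\psi,\mu\rangle \le \varliminf_{n\to\infty}\langle\psi,\mu_n\rangle$. Moreover, if $\psi \in C\bigl(\mathbb{R}^N;[0,\infty)\bigr)$ is $\mu_n$-integrable for each $n \in \mathbb{Z}^+$ and if $\langle\psi,\mu_n\rangle \longrightarrow \langle\psi,\mu\rangle \in [0,\infty)$, then for any sequence $\{\varphi_n : n \ge 1\} \subseteq C(\mathbb{R}^N;\mathbb{C})$ that converges uniformly on compact subsets to a $\varphi \in C(\mathbb{R}^N;\mathbb{C})$ and satisfies $|\varphi_n| \le C\psi$ for some $C < \infty$ and all $n \ge 1$, $\langle\varphi_n,\mu_n\rangle \longrightarrow \langle\varphi,\mu\rangle$.

Proof Choose $\rho \in C_c^\infty\bigl(\mathbb{R}^N;[0,\infty)\bigr)$ with total $\lambda_{\mathbb{R}^N}$-integral 1, and set $\rho_\epsilon(x) = \epsilon^{-N}\rho(\epsilon^{-1}x)$ for $\epsilon > 0$. Also, choose $\eta \in C_c^\infty\bigl(\mathbb{R}^N;[0,1]\bigr)$ so that $\eta = 1$ on $\overline{B_{\mathbb{R}^N}(0,1)}$ and 0 off of $B_{\mathbb{R}^N}(0,2)$, and set $\eta_R(x) = \eta(R^{-1}x)$ for $R > 0$.

Begin by noting that $\langle\varphi,\mu_n\rangle \longrightarrow \langle\varphi,\mu\rangle$ for all $\varphi \in C_c^\infty(\mathbb{R}^N;\mathbb{C})$. Next, suppose that $\varphi \in C_c(\mathbb{R}^N;\mathbb{C})$, and, for $\epsilon > 0$, set $\varphi_\epsilon = \rho_\epsilon * \varphi$, the convolution
$$\int_{\mathbb{R}^N}\rho_\epsilon(x-y)\varphi(y)\,dy$$
of $\rho_\epsilon$ with $\varphi$. Then, for each $\epsilon > 0$, $\varphi_\epsilon \in C_c^\infty(\mathbb{R}^N;\mathbb{C})$ and therefore $\langle\varphi_\epsilon,\mu_n\rangle \longrightarrow \langle\varphi_\epsilon,\mu\rangle$. In addition, there is an $R > 0$ such that $\operatorname{supp}(\varphi_\epsilon) \subseteq B_{\mathbb{R}^N}(0,R)$ (i.e., $\varphi_\epsilon = 0$ off of $B_{\mathbb{R}^N}(0,R)$) for all $\epsilon \in (0,1]$. Hence,
$$\varlimsup_{n\to\infty}\bigl|\langle\varphi,\mu_n\rangle - \langle\varphi,\mu\rangle\bigr| \le 2\|\varphi_\epsilon - \varphi\|_u + \varlimsup_{n\to\infty}\bigl|\langle\varphi_\epsilon,\mu_n\rangle - \langle\varphi_\epsilon,\mu\rangle\bigr| = 2\|\varphi_\epsilon - \varphi\|_u.$$
Since $\lim_{\epsilon\searrow0}\|\varphi_\epsilon - \varphi\|_u = 0$, we have now shown that $\langle\varphi,\mu_n\rangle \longrightarrow \langle\varphi,\mu\rangle$ for all $\varphi \in C_c(\mathbb{R}^N;\mathbb{C})$.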
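Now suppose that $\psi \in C\bigl(\mathbb{R}^N;[0,\infty)\bigr)$, and set $\psi_R = \eta_R\psi$, where $\eta_R$ is as above. Then, for each $R > 0$, $\langle\psi_R,\mu\rangle = \lim_{n\to\infty}\langle\psi_R,\mu_n\rangle \le \varliminf_{n\to\infty}\langle\psi,\mu_n\rangle$. Hence, by Fatou’s Lemma, $\langle\psi,\mu\rangle \le \varliminf_{R\to\infty}\langle\psi_R,\mu\rangle \le \varliminf_{n\to\infty}\langle\psi,\mu_n\rangle$.

Finally, suppose that $\psi \in C\bigl(\mathbb{R}^N;[0,\infty)\bigr)$ is $\mu_n$-integrable for each $n \in \mathbb{Z}^+$ and that $\langle\psi,\mu_n\rangle \longrightarrow \langle\psi,\mu\rangle \in [0,\infty)$. Given $\{\varphi_n : n \ge 1\} \subseteq C(\mathbb{R}^N;\mathbb{C})$ satisfying $|\varphi_n| \le C\psi$ and converging uniformly on compact subsets to $\varphi$, one has
$$\bigl|\langle\varphi_n,\mu_n\rangle - \langle\varphi,\mu\rangle\bigr| \le \bigl|\langle\varphi_n - \varphi,\mu_n\rangle\bigr| + \bigl|\langle\varphi,\mu_n\rangle - \langle\varphi,\mu\rangle\bigr|.$$
Moreover, for each $R > 0$,
$$\begin{aligned}
\varlimsup_{n\to\infty}\bigl|\langle\varphi_n - \varphi,\mu_n\rangle\bigr|
&\le \varlimsup_{n\to\infty}\,\sup_{x\in B_{\mathbb{R}^N}(0,2R)}|\varphi_n(x) - \varphi(x)|\,\langle\eta_R,\mu_n\rangle + \varlimsup_{n\to\infty}\bigl|\langle(1-\eta_R)(\varphi_n - \varphi),\mu_n\rangle\bigr| \\
&\le 2C\varlimsup_{n\to\infty}\langle(1-\eta_R)\psi,\mu_n\rangle = \lim_{n\to\infty}2C\bigl(\langle\psi,\mu_n\rangle - \langle\eta_R\psi,\mu_n\rangle\bigr) = 2C\langle(1-\eta_R)\psi,\mu\rangle,
\end{aligned}$$
and similarly
$$\begin{aligned}
\varlimsup_{n\to\infty}\bigl|\langle\varphi,\mu_n\rangle - \langle\varphi,\mu\rangle\bigr|
&\le \varlimsup_{n\to\infty}\bigl|\langle\eta_R\varphi,\mu_n\rangle - \langle\eta_R\varphi,\mu\rangle\bigr| + C\varlimsup_{n\to\infty}\langle(1-\eta_R)\psi,\mu_n\rangle + C\langle(1-\eta_R)\psi,\mu\rangle \\
&= 2C\langle(1-\eta_R)\psi,\mu\rangle.
\end{aligned}$$
Because $\psi$ is $\mu$-integrable, $\langle(1-\eta_R)\psi,\mu\rangle \longrightarrow 0$ as $R \to \infty$ by Lebesgue’s Dominated Convergence Theorem, and so we are done.

In the proof of the following theorem and elsewhere, a function $\varphi : \mathbb{R}^N \longrightarrow \mathbb{C}$ is said to be rapidly decreasing if $\sup_{x\in\mathbb{R}^N}(1 + |x|^{2n})|\varphi(x)| < \infty$ for all $n \ge 0$.

Theorem 1.1.6 Given a sequence $\{\mu_n : n \ge 1\} \subseteq M_1(\mathbb{R}^N)$ and a $\mu \in M_1(\mathbb{R}^N)$, $\hat\mu_n \longrightarrow \hat\mu$ uniformly on compact subsets if $\mu_n \xrightarrow{w} \mu$. Conversely, if $\hat\mu_n(\xi) \longrightarrow \hat\mu(\xi)$ pointwise, then $\langle\varphi_n,\mu_n\rangle \longrightarrow \langle\varphi,\mu\rangle$ whenever $\{\varphi_n : n \ge 1\}$ is a uniformly bounded sequence in $C_b(\mathbb{R}^N;\mathbb{C})$ that tends to $\varphi$ uniformly on compact subsets. In particular, $\mu_n \xrightarrow{w} \mu$.

Proof Since $e^{i(\xi_n,x)_{\mathbb{R}^N}} \longrightarrow e^{i(\xi,x)_{\mathbb{R}^N}}$ uniformly for $x$ in compact subsets if $\xi_n \to \xi$, Lemma 1.1.5 says that, if $\mu_n \xrightarrow{w} \mu$, then $\hat\mu_n(\xi_n) \longrightarrow \hat\mu(\xi)$ if $\xi_n \to \xi$, and therefore that $\mu_n \xrightarrow{w} \mu$ implies that $\hat\mu_n \to \hat\mu$ uniformly on compact subsets.

Turning to the second part of the theorem, suppose that $\hat\mu_n \longrightarrow \hat\mu$ pointwise. By Lemma 1.1.5, we need only check that $\langle\varphi,\mu_n\rangle \longrightarrow \langle\varphi,\mu\rangle$ when $\varphi \in C_c^\infty\bigl(\mathbb{R}^N;\mathbb{R}\bigr)$. But, for such a $\varphi$, $\hat\varphi$ is smooth and rapidly decreasing, and therefore the result follows immediately from (1.1.1) together with Lebesgue’s Dominated Convergence Theorem.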
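To see why Theorem 1.1.6 speaks of uniform convergence on compact subsets rather than on all of $\mathbb{R}^N$, consider the point masses $\mu_n = \delta_{1/n}$ on $\mathbb{R}$, which converge weakly to $\delta_0$ and have $\hat\mu_n(\xi) = e^{i\xi/n}$. The following Python sketch, an illustration added here with hypothetical helper names, estimates the gap to $\hat\delta_0 \equiv 1$ on a bounded grid and at a point that moves with $n$.

```python
import cmath
import math

def mu_hat(n, xi):
    """Characteristic function of the point mass at 1/n."""
    return cmath.exp(1j * xi / n)

def sup_gap(n, K, steps=2000):
    """Grid approximation of sup over |xi| <= K of |mu_hat(n, xi) - 1|."""
    grid = (-K + 2.0 * K * k / steps for k in range(steps + 1))
    return max(abs(mu_hat(n, xi) - 1.0) for xi in grid)

for n in (10, 100, 1000):
    # The first gap shrinks with n; the second stays near 2 because xi = pi * n escapes every compact set.
    print(n, sup_gap(n, 10.0), abs(mu_hat(n, math.pi * n) - 1.0))
```

On $[-K, K]$ the exact supremum is $2\sin\frac{K}{2n}$ once $K \le \pi n$, so the first printed column decays like $K/n$, while the second shows that the convergence is not uniform on the whole line.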
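The following is a generalization to $\mathbb{R}^N$ of the classical Helly-Bray Theorem. It was further generalized to complete separable metric spaces by Prohorov, and, with only minor changes (cf. Theorem 9.1.9 in [14]), the treatment here can be used to prove Prohorov’s result.

Lemma 1.1.7 A subset $S$ of $M_1(\mathbb{R}^N)$ is sequentially relatively compact in the weak topology if and only if
$$\lim_{R\to\infty}\sup_{\mu\in S}\mu\bigl(B_{\mathbb{R}^N}(0,R)^\complement\bigr) = 0. \tag{1.1.3}$$

Proof The first step is to recognize that there is a countable set $\{\varphi_k : k \in \mathbb{Z}^+\} \subseteq C_c(\mathbb{R}^N;\mathbb{R})$ of linearly independent functions whose span is dense, with respect to uniform convergence, in $C_c(\mathbb{R}^N;\mathbb{R})$. To see this, choose $\eta \in C_c\bigl(\mathbb{R}^N;[0,1]\bigr)$ so that $\eta = 1$ on $\overline{B_{\mathbb{R}^N}(0,1)}$ and 0 off $B_{\mathbb{R}^N}(0,2)$, and set $\eta_R(y) = \eta(R^{-1}y)$ for $R > 0$. Next, for each $\ell \in \mathbb{Z}^+$, apply the Stone–Weierstrass Theorem to choose a countable dense subset $\{\psi_{j,\ell} : j \in \mathbb{Z}^+\}$ of $C\bigl(\overline{B_{\mathbb{R}^N}(0,2\ell)};\mathbb{R}\bigr)$, and set $\varphi_{j,\ell} = \eta_\ell\psi_{j,\ell}$. Clearly $\{\varphi_{j,\ell} : (j,\ell) \in (\mathbb{Z}^+)^2\}$ is dense in $C_c(\mathbb{R}^N;\mathbb{R})$. Finally, using lexicographic ordering of $(\mathbb{Z}^+)^2$, extract a linearly independent subset $\{\varphi_k : k \ge 1\}$ by taking $\varphi_k = \varphi_{j_k,\ell_k}$, where $(j_1,\ell_1) = (1,1)$ and $(j_{k+1},\ell_{k+1})$ is the first $(j,\ell)$ such that $\varphi_{j,\ell}$ is linearly independent of $\{\varphi_1,\dots,\varphi_k\}$.

Given a sequence $\{\mu_n : n \ge 1\} \subseteq S$, we can use a diagonalization procedure to find a subsequence $\{\mu_{n_m} : m \ge 1\}$ such that $a_k = \lim_{m\to\infty}\langle\varphi_k,\mu_{n_m}\rangle$ exists for every $k \in \mathbb{Z}^+$. Next, define the linear functional $\Lambda$ on the span of $\{\varphi_k : k \ge 1\}$ so that $\Lambda(\varphi_k) = a_k$. Notice that if $\varphi = \sum_{k=1}^K \alpha_k\varphi_k$, then
$$|\Lambda(\varphi)| = \lim_{m\to\infty}\Bigl|\sum_{k=1}^K \alpha_k\langle\varphi_k,\mu_{n_m}\rangle\Bigr| = \lim_{m\to\infty}\bigl|\langle\varphi,\mu_{n_m}\rangle\bigr| \le \|\varphi\|_u,$$
and similarly that $\Lambda(\varphi) = \lim_{m\to\infty}\langle\varphi,\mu_{n_m}\rangle \ge 0$ if $\varphi \ge 0$. Therefore $\Lambda$ admits a unique extension as a non-negativity preserving linear functional on $C_c(\mathbb{R}^N;\mathbb{R})$ that satisfies $\Lambda(\varphi) = \lim_{m\to\infty}\langle\varphi,\mu_{n_m}\rangle$ and therefore $|\Lambda(\varphi)| \le \|\varphi\|_u$ for all $\varphi \in C_c(\mathbb{R}^N;\mathbb{R})$.

Now assume that (1.1.3) holds. For each $\ell \in \mathbb{Z}^+$, apply the Riesz Representation Theorem to produce a non-negative Borel measure $\nu_\ell$ supported on $\overline{B_{\mathbb{R}^N}(0,2\ell)}$ so that $\langle\varphi,\nu_\ell\rangle = \Lambda(\eta_\ell\varphi)$ for $\varphi \in C_c(\mathbb{R}^N;\mathbb{R})$. Since $\langle\varphi,\nu_{\ell+1}\rangle = \Lambda(\varphi) = \langle\varphi,\nu_\ell\rangle$ whenever $\varphi$ vanishes off of $B_{\mathbb{R}^N}(0,\ell)$, it is clear that
$$\nu_{\ell+1}\bigl(\Gamma\cap B_{\mathbb{R}^N}(0,\ell+1)\bigr) \ge \nu_{\ell+1}\bigl(\Gamma\cap B_{\mathbb{R}^N}(0,\ell)\bigr) = \nu_\ell\bigl(\Gamma\cap B_{\mathbb{R}^N}(0,\ell)\bigr)$$
for all $\Gamma \in \mathcal{B}_{\mathbb{R}^N}$. Hence the limit
$$\mu(\Gamma) \equiv \lim_{\ell\to\infty}\nu_\ell\bigl(\Gamma\cap B_{\mathbb{R}^N}(0,\ell)\bigr) = \sum_{\ell=1}^\infty \nu_\ell\Bigl(\Gamma\cap\bigl(B_{\mathbb{R}^N}(0,\ell)\setminus B_{\mathbb{R}^N}(0,\ell-1)\bigr)\Bigr)$$
exists and determines a non-negative Borel measure $\mu$ on $\mathbb{R}^N$ whose restriction to $B_{\mathbb{R}^N}(0,\ell)$ is $\nu_\ell$ for each $\ell \in \mathbb{Z}^+$. In particular, $\mu(\mathbb{R}^N) \le 1$ and $\langle\varphi,\mu\rangle = \lim_{m\to\infty}\langle\varphi,\mu_{n_m}\rangle$ for every $\varphi \in C_c(\mathbb{R}^N;\mathbb{R})$. Thus, by Lemma 1.1.5, all that remains is to check that $\mu(\mathbb{R}^N) = 1$. But
$$\mu(\mathbb{R}^N) \ge \langle\eta_\ell,\mu\rangle = \lim_{m\to\infty}\langle\eta_\ell,\mu_{n_m}\rangle \ge \varlimsup_{m\to\infty}\mu_{n_m}\bigl(B_{\mathbb{R}^N}(0,\ell)\bigr) = 1 - \varliminf_{m\to\infty}\mu_{n_m}\bigl(B_{\mathbb{R}^N}(0,\ell)^\complement\bigr),$$
and, by (1.1.3), the final term tends to 0 as $\ell \to \infty$.

To prove the converse assertion, suppose that $S$ is sequentially relatively compact. If (1.1.3) failed, then we could find a $\theta \in (0,1)$ and, for each $n \in \mathbb{Z}^+$, a $\mu_n \in S$ such that $\mu_n\bigl(B_{\mathbb{R}^N}(0,n)\bigr) \le \theta$. By sequential relative compactness, this would mean that there is a subsequence $\{\mu_{n_m} : m \ge 1\} \subseteq S$ and a $\mu \in M_1(\mathbb{R}^N)$ such that $\mu_{n_m} \xrightarrow{w} \mu$ and $\mu_{n_m}\bigl(B_{\mathbb{R}^N}(0,n_m)\bigr) \le \theta$. On the other hand, for any $R > 0$,
$$\mu\bigl(B_{\mathbb{R}^N}(0,R)\bigr) \le \lim_{m\to\infty}\langle\eta_R,\mu_{n_m}\rangle \le \varliminf_{m\to\infty}\mu_{n_m}\bigl(B_{\mathbb{R}^N}(0,n_m)\bigr) \le \theta,$$
and so we would arrive at the contradiction $1 = \lim_{R\to\infty}\mu\bigl(B_{\mathbb{R}^N}(0,R)\bigr) \le \theta$.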
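The following lemma provides a way to check (1.1.3) using the Fourier transform.

Lemma 1.1.8 Let $\mu \in M_1(\mathbb{R}^N)$. Then, for all $(r,R) \in [0,\infty)^2$ and $e \in S^{N-1}$,
$$\bigl|1 - \hat\mu(re)\bigr| \le rR + 2\mu\bigl(\{x \in \mathbb{R}^N : |(x,e)_{\mathbb{R}^N}| \ge R\}\bigr). \tag{1.1.4}$$
Next, define
$$s(r) = 1 - \sup_{\theta\ge r}\frac{\sin\theta}{\theta} \quad\text{for } r > 0.$$
Then $s(r) \in (0,1]$ for all $r > 0$, $\lim_{r\searrow0}\frac{s(r)}{r^2} = \frac16$, and
$$\mu\bigl(\{x \in \mathbb{R}^N : |x| \ge R\}\bigr) \le \frac{N}{s(N^{-\frac12}rR)}\max_{|\xi|\le r}|1 - \hat\mu(\xi)| \quad\text{for all } (r,R) \in (0,\infty)^2. \tag{1.1.5}$$

Proof The facts about $s(r)$ are easily checked using elementary calculus. To prove (1.1.4), simply observe that, since $|1 - e^{i\xi}| \le |\xi| \wedge 2$,
$$\bigl|1 - \hat\mu(re)\bigr| \le \int\bigl|1 - e^{ir(x,e)_{\mathbb{R}^N}}\bigr|\,\mu(dx) \le rR + 2\mu\bigl(\{x : |(x,e)_{\mathbb{R}^N}| \ge R\}\bigr).$$
Turning to (1.1.5), note that, for $t \ge 0$,
$$\bigl|1 - \hat\mu(te)\bigr| \ge \int\bigl(1 - \cos(tx,e)_{\mathbb{R}^N}\bigr)\,d\mu.$$
Now integrate this inequality with respect to $t \in [0,r]$, and divide by $r$ to see that
$$\begin{aligned}
\max_{|\xi|\le r}|1 - \hat\mu(\xi)| &\ge \frac1r\int_0^r\bigl|1 - \hat\mu(te)\bigr|\,dt \ge \int\left(1 - \frac{\sin(re,x)_{\mathbb{R}^N}}{(re,x)_{\mathbb{R}^N}}\right)\mu(dx) \\
&\ge \int_{|(e,x)_{\mathbb{R}^N}|\ge R}\left(1 - \frac{\sin(re,x)_{\mathbb{R}^N}}{(re,x)_{\mathbb{R}^N}}\right)\mu(dx) \ge s(rR)\,\mu\bigl(\{x : |(e,x)_{\mathbb{R}^N}| \ge R\}\bigr).
\end{aligned}$$
Finally, use the fact that
$$\mu\bigl(\{x : |x| \ge R\}\bigr) \le N\sup_{e\in S^{N-1}}\mu\bigl(\{x : |(e,x)_{\mathbb{R}^N}| \ge N^{-\frac12}R\}\bigr)$$
to arrive at (1.1.5).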
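The elementary facts about $s(r)$ and the estimate (1.1.4) lend themselves to a quick numerical check. The Python sketch below approximates $s(r) = 1 - \sup_{\theta\ge r}\frac{\sin\theta}{\theta}$ on a grid and tests (1.1.4) in dimension $N = 1$ with $e = 1$ for a discrete measure. The grid size, the helper names, and the measure `atoms` are assumptions made for illustration; the grid scan relies on the observation that the supremum over $\theta \ge r$ is attained in $[r, r + 2\pi]$.

```python
import cmath
import math

def s(r, steps=20000):
    """Grid approximation of s(r) = 1 - sup_{theta >= r} sin(theta)/theta for r > 0.

    Only the window [r, r + 2*pi] is scanned, since the supremum is attained there.
    """
    if r <= 0:
        raise ValueError("s(r) is defined for r > 0")
    thetas = (r + 2.0 * math.pi * k / steps for k in range(steps + 1))
    return 1.0 - max(math.sin(th) / th for th in thetas)

def char_fn(atoms, xi):
    """Characteristic function of a discrete probability measure on R."""
    return sum(w * cmath.exp(1j * xi * x) for x, w in atoms)

def check_1_1_4(atoms, r, R, tol=1e-12):
    """True if |1 - mu^(r)| <= r R + 2 mu({|x| >= R}), i.e. (1.1.4) with N = 1 and e = 1."""
    tail = sum(w for x, w in atoms if abs(x) >= R)
    return abs(1.0 - char_fn(atoms, r)) <= r * R + 2.0 * tail + tol

# Hypothetical example measure (an assumption for illustration).
atoms = [(-3.0, 0.1), (0.0, 0.6), (1.5, 0.3)]

print(s(0.01) / 0.01 ** 2)  # should be close to 1/6
print(all(check_1_1_4(atoms, r / 2.0, R / 2.0) for r in range(9) for R in range(9)))
```

For small $r$ the supremum is attained at $\theta = r$, so `s(r)` reduces to $1 - \frac{\sin r}{r}$, which is where the limit $\frac16$ comes from.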
The following theorem answers the question posed before Lemma 1.1.5 about the relationship between characteristic functions and weak convergence. It was proved by Lévy and is called Lévy’s Continuity Theorem.

Theorem 1.1.9 Let $\{\mu_n : n \ge 1\} \subseteq M_1(\mathbb{R}^N)$, and assume that $f(\xi) = \lim_{n\to\infty}\hat\mu_n(\xi)$ exists for each $\xi \in \mathbb{R}^N$. Then $f$ is the characteristic function of a $\mu \in M_1(\mathbb{R}^N)$ if and only if there is a $\delta > 0$ for which
$$\lim_{n\to\infty}\sup_{|\xi|\le\delta}\bigl|\hat\mu_n(\xi) - f(\xi)\bigr| = 0,$$
in which case $\mu_n \xrightarrow{w} \mu$.

Proof The only assertion not already covered by Lemmas 1.1.5 and 1.1.1 is the “if” part of the equivalence. But, if $\hat\mu_n \longrightarrow f$ uniformly in a neighborhood of 0, then it is easy to check that $\sup_{n\ge1}|1 - \hat\mu_n(\xi)|$ must tend to zero as $|\xi| \to 0$. Hence, by (1.1.5) and Lemma 1.1.7, we know that there exists a $\mu$ and a subsequence $\{\mu_{n_m} : m \ge 1\}$ such that $\mu_{n_m} \xrightarrow{w} \mu$. Since $\hat\mu$ must equal $f$, Lemma 1.1.1 says that $\mu_n \xrightarrow{w} \mu$.

Bochner found an interesting characterization of characteristic functions, one which is intimately related to Lévy’s Continuity Theorem. To describe his result, say that a function $f : \mathbb{R}^N \longrightarrow \mathbb{C}$ is non-negative definite if the matrix
$$\bigl(\bigl(f(\xi_j - \xi_k)\bigr)\bigr)_{1\le j,k\le n}$$
is non-negative definite for all $n \ge 2$ and $\xi_1,\dots,\xi_n \in \mathbb{R}^N$, which is equivalent to saying
$$\sum_{j,k=1}^n f(\xi_j - \xi_k)\alpha_j\overline{\alpha_k} \ge 0$$
for all $\alpha_1,\dots,\alpha_n \in \mathbb{C}$.
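Theorem 1.1.10 A function $f : \mathbb{R}^N \longrightarrow \mathbb{C}$ is a characteristic function if and only if $f$ is continuous, $f(0) = 1$, and $f$ is non-negative definite.

Proof Assume that $f = \hat\mu$ for some $\mu \in M_1(\mathbb{R}^N)$. Then it is obvious that $f$ is continuous and that $f(0) = 1$. To see that it is non-negative definite, observe that
$$\sum_{j,k=1}^n f(\xi_j - \xi_k)\alpha_j\overline{\alpha_k} = \int\Biggl(\sum_{j,k=1}^n e^{i(\xi_j - \xi_k,x)_{\mathbb{R}^N}}\alpha_j\overline{\alpha_k}\Biggr)\mu(dx) = \int\Biggl|\sum_{j=1}^n e^{i(\xi_j,x)_{\mathbb{R}^N}}\alpha_j\Biggr|^2\mu(dx) \ge 0.$$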
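The computation just made can be mirrored numerically. The Python sketch below, added as an illustration, evaluates both sides of the identity for a discrete measure on $\mathbb{R}$ and randomly chosen $\xi_j$ and $\alpha_j$; the measure `atoms`, the seed, and the helper names are assumptions, and agreement on one random sample is only a plausibility check of the necessity half of Bochner’s theorem.

```python
import cmath
import random

def char_fn(atoms, xi):
    """Characteristic function of a discrete probability measure on R."""
    return sum(w * cmath.exp(1j * xi * x) for x, w in atoms)

def quadratic_form(f, xis, alphas):
    """Sum over j, k of f(xi_j - xi_k) * alpha_j * conj(alpha_k)."""
    return sum(f(xj - xk) * aj * ak.conjugate()
               for xj, aj in zip(xis, alphas)
               for xk, ak in zip(xis, alphas))

def integral_form(atoms, xis, alphas):
    """Integral of |sum_j alpha_j e^{i xi_j x}|^2 against the discrete measure."""
    return sum(w * abs(sum(aj * cmath.exp(1j * xj * x) for xj, aj in zip(xis, alphas))) ** 2
               for x, w in atoms)

rng = random.Random(0)  # fixed seed so the run is reproducible
atoms = [(-1.0, 0.25), (0.0, 0.25), (2.0, 0.5)]  # hypothetical example measure
xis = [rng.uniform(-5.0, 5.0) for _ in range(6)]
alphas = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(6)]

q = quadratic_form(lambda xi: char_fn(atoms, xi), xis, alphas)
p = integral_form(atoms, xis, alphas)
print(abs(q - p), q.real)
```

The first printed number should be at the level of rounding error and the second non-negative, in line with the displayed identity.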
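Now assume that $f$ is a continuous, non-negative definite function with $f(0) = 1$. Because
$$A \equiv \begin{pmatrix} 1 & f(\xi) \\ f(-\xi) & 1 \end{pmatrix}$$
is non-negative definite, $f(\xi) + f(-\xi)$ and $if(\xi) - if(-\xi)$ are both real, and therefore $f(-\xi) = \overline{f(\xi)}$. Thus $A$ is Hermitian, and because it is non-negative definite, $1 - |f(\xi)|^2 \ge 0$. Therefore $|f(\xi)| \le 1$.

Next, let $\psi \in \mathscr{S}(\mathbb{R}^N;\mathbb{R})$, and use Riemann approximations to see that
$$\iint f(\xi - \eta)\hat\psi(\xi)\overline{\hat\psi(\eta)}\,d\xi\,d\eta \ge 0.$$
Assume for the moment that $f \in L^1(\lambda_{\mathbb{R}^N};\mathbb{C})$, and set
$$g(x) = (2\pi)^{-N}\int e^{-i(\xi,x)_{\mathbb{R}^N}}f(\xi)\,d\xi.$$
By Parseval’s identity, Fubini’s Theorem and the fact that $\overline{\hat\psi(\xi)} = \hat\psi(-\xi)$,
$$\begin{aligned}
(2\pi)^N\int g(x)\psi(x)^2\,dx &= \int f(\xi)\widehat{\psi^2}(-\xi)\,d\xi = (2\pi)^{-N}\int f(\xi)\bigl(\hat\psi * \hat\psi\bigr)(-\xi)\,d\xi \\
&= (2\pi)^{-N}\iint f(\xi)\overline{\hat\psi(\xi + \eta)}\hat\psi(\eta)\,d\xi\,d\eta = (2\pi)^{-N}\iint f(\xi - \eta)\overline{\hat\psi(\xi)}\hat\psi(\eta)\,d\xi\,d\eta \ge 0,
\end{aligned}$$
where the final inequality is the preceding one applied to $x \mapsto \psi(-x)$, whose Fourier transform is $\overline{\hat\psi}$. Hence, since $g$ is continuous, it follows that $g \ge 0$. In addition, $f = \hat g$ and so $\int g(x)\,dx = f(0) = 1$ and $f$ is the Fourier transform of the probability measure $d\mu = g\,d\lambda_{\mathbb{R}^N}$.

To remove the assumption that $f$ is integrable, choose a non-negative, even $\rho \in \mathscr{S}(\mathbb{R}^N;\mathbb{R})$ for which $\langle\rho,\lambda_{\mathbb{R}^N}\rangle = 1$, and set $\rho_t(x) = t^{-N}\rho(t^{-1}x)$ for $t > 0$. Then $\hat\rho \in \mathscr{S}(\mathbb{R}^N;\mathbb{R})$ and $\hat\rho_t(\xi) = \hat\rho(t\xi)$. Therefore $f_t \equiv \hat\rho_t f$ is a continuous, $\lambda_{\mathbb{R}^N}$-integrable function that is 1 at 0. To see that $f_t$ is non-negative definite, note that
$$\begin{aligned}
\sum_{j,k=1}^n f_t(\xi_j - \xi_k)\alpha_j\overline{\alpha_k} &= \sum_{j,k=1}^n f(\xi_j - \xi_k)\alpha_j\overline{\alpha_k}\int e^{i(\xi_j - \xi_k,x)_{\mathbb{R}^N}}\rho_t(x)\,dx \\
&= \int\Biggl(\sum_{j,k=1}^n f(\xi_j - \xi_k)\bigl(\alpha_j e^{i(\xi_j,x)_{\mathbb{R}^N}}\bigr)\overline{\bigl(\alpha_k e^{i(\xi_k,x)_{\mathbb{R}^N}}\bigr)}\Biggr)\rho_t(x)\,dx \ge 0.
\end{aligned}$$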
Thus $f_t = \hat\mu_t$ for some $\mu_t \in M_1(\mathbb{R}^N)$, and so, since $f_t \longrightarrow f$ uniformly on compact subsets as $t \searrow 0$, Lévy’s Continuity Theorem implies that $\mu_t$ tends weakly to a $\mu \in M_1(\mathbb{R}^N)$ for which $f = \hat\mu$.

Because it is difficult to check whether a function is non-negative definite, it is the more or less trivial necessity part of Bochner’s Theorem that turns out in practice to be more useful than the sufficiency conditions.

Exercise 1.1.1

(i) By combining Theorem 1.1.10 with (1.1.2), one sees that if $f$ is a continuous, non-negative definite function for which $f(0) = 1$, then $|f(\xi)| \le 1$ and $|f(\eta) - f(\xi)|^2 \le 2\bigl(1 - \operatorname{Re}f(\eta - \xi)\bigr)$. Show that these inequalities hold even if one drops the continuity assumption.
Hint: Use the non-negative definiteness of the matrices
$$\begin{pmatrix} 1 & f(-\xi) \\ f(\xi) & 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1 & f(-\xi) & f(-\eta) \\ f(\xi) & 1 & f(\xi - \eta) \\ f(\eta) & f(\eta - \xi) & 1 \end{pmatrix}$$
to see that $f(-\xi) = \overline{f(\xi)}$ and that
$$1 + 2\alpha\bigl|f(\eta) - f(\xi)\bigr| + 2\alpha^2\bigl(1 - \operatorname{Re}f(\eta - \xi)\bigr) \ge 0$$
for all $\alpha \in \mathbb{R}$.

(ii) Show that if $f_1$ and $f_2$ are non-negative definite functions, then so are $f_1f_2$ and, for any $a, b \ge 0$, $af_1 + bf_2$.
Hint: Show that if $A$ and $B$ are non-negative definite, Hermitian $N \times N$ matrices, then $\bigl(\bigl(A_{k,\ell}B_{k,\ell}\bigr)\bigr)_{1\le k,\ell\le N}$ is also.

(iii) Suppose that $f : \mathbb{R}^N \longrightarrow \mathbb{C}$ is a function for which $f(0) = 1$. Show that if $\lim_{|x|\searrow0}\frac{1 - f(x)}{|x|^2} = 0$, then $f$ cannot be a characteristic function unless $f \equiv 1$. In particular, if $\alpha > 2$, then $e^{-|\xi|^\alpha}$ is not a characteristic function.

(iv) Given a finite signed Borel measure $\mu$ on $\mathbb{R}^N$, define
$$\hat\mu(\xi) = \int e^{i(\xi,x)_{\mathbb{R}^N}}\,\mu(dx),$$
and show that $\hat\mu = 0$ if and only if $\mu = 0$.
Hint: Use the Hahn Decomposition Theorem to write $\mu$ as the difference of two, mutually singular, non-negative Borel measures on $\mathbb{R}^N$.

(v) Suppose that $f : \mathbb{R} \longrightarrow \mathbb{C}$ is a non-constant, twice continuously differentiable characteristic function. Show that $f''(0) < 0$ and that $\frac{f''}{f''(0)}$ is again a characteristic function. In addition, show that $\|f'\|_u^2 \vee \|f''\|_u \le |f''(0)|$ and that $|f(\eta) - f(\xi)| \le |f''(0)|^{\frac12}|\eta-\xi|$.

(vi) Suppose that $\{\mu_n : n \ge 1\} \subseteq M_1(\mathbb{R})$ and that $f(\xi) = \lim_{n\to\infty}\hat\mu_n(\xi)$ exists for each $\xi \in \mathbb{R}$. Show that $f$ is a characteristic function if and only if it is continuous at 0, and notice that this provides an alternative proof of Theorem 1.1.9.

(vii) Let $\mu_n \in M_1(\mathbb{R})$ be the measure for which $\frac{d\mu_n}{d\lambda_{\mathbb{R}}} = (2n)^{-1}\mathbf{1}_{[-n,n]}$. Show that $\hat\mu_n \longrightarrow \mathbf{1}_{\{0\}}$ pointwise, and conclude that $\{\mu_n : n \ge 1\}$ has no weak limits. This example demonstrates the important role that continuity plays in Bochner's and Lévy's theorems.
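As a numerical illustration of parts (i) and (iii) of Exercise 1.1.1, the following Python sketch tests the necessary condition $|f(\eta)-f(\xi)|^2 \le 2(1-\mathrm{Re}\,f(\eta-\xi))$ for $f(\xi) = e^{-|\xi|^\alpha}$ on $\mathbb{R}$. The function names, the sample points, and the grid are choices made here for illustration only. A violation shows that $f$ is not non-negative definite, whereas the absence of violations on a grid proves nothing.

```python
import math

def f(xi: float, alpha: float) -> float:
    """Candidate function exp(-|xi|^alpha) on the real line (real and even)."""
    return math.exp(-abs(xi) ** alpha)

def violates_necessary_bound(alpha: float, xi: float, h: float) -> bool:
    """True if |f(xi+h) - f(xi)|^2 > 2 (1 - f(h)).

    Since f is real valued, Re f = f, so this is the failure of the bound in
    Exercise 1.1.1 (i) with eta = xi + h.
    """
    lhs = (f(xi + h, alpha) - f(xi, alpha)) ** 2
    rhs = 2.0 * (1.0 - f(h, alpha))
    return lhs > rhs

# alpha = 4 > 2: the bound fails near xi = 1 for a small increment h.
print(violates_necessary_bound(4.0, 1.0, 0.1))

# alpha = 1 and alpha = 2: search a coarse grid of (xi, h) for a violation.
grid = [(0.1 * i, 0.01 * j) for i in range(50) for j in range(1, 50)]
print(any(violates_necessary_bound(a, x, h) for a in (1.0, 2.0) for x, h in grid))
```

The left hand side is of order $|f'(\xi)|^2h^2$ while the right hand side is of order $2h^\alpha$, which is why small increments expose the failure when $\alpha > 2$.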
1.2 Infinitely Divisible Laws

Except for Lemma 1.2.1, the contents of this subsection will not be used below.

Because sums of independent, identically distributed random variables play a prominent role in probability theory, Lévy and Khinchine asked what are the distributions of random variables that, for every $n \ge 1$, can be written as the sum of $n$ independent, identically distributed random variables. To express this in terms of measures, remember that the convolution of $\mu, \nu \in M_1(\mathbb{R}^N)$ is the measure $\mu*\nu \in M_1(\mathbb{R}^N)$ given by

$$\mu*\nu(\Gamma) = \iint \mathbf{1}_\Gamma(x+y)\,\mu(dx)\nu(dy)$$

and that, if $\mu$ and $\nu$ are the distributions of independent random variables $X$ and $Y$, then $\mu*\nu$ is the distribution of $X+Y$. Thus, what they were asking is which $\mu \in M_1(\mathbb{R}^N)$ have the property that, for every $n \ge 2$, $\mu$ admits an $n$th root $\mu_{\frac1n} \in M_1(\mathbb{R}^N)$ with respect to convolution. That is,

$$\mu = \bigl(\mu_{\frac1n}\bigr)^{*n} = \underbrace{\mu_{\frac1n}*\cdots*\mu_{\frac1n}}_{n},$$

which, since $\widehat{\mu*\nu} = \hat\mu\hat\nu$, is equivalent to $\hat\mu = \bigl(\widehat{\mu_{\frac1n}}\bigr)^n$. Denote by $\mathcal{I}(\mathbb{R}^N)$ the set of infinitely divisible $\mu \in M_1(\mathbb{R}^N)$. What Lévy and Khinchine proved is that $\mu \in \mathcal{I}(\mathbb{R}^N)$ if and only if
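$$\hat\mu(\xi) = \exp\Biggl(i(b,\xi)_{\mathbb{R}^N} - \frac12\bigl(\xi,A\xi\bigr)_{\mathbb{R}^N} + \int\Bigl(e^{i(\xi,y)_{\mathbb{R}^N}} - 1 - i\mathbf{1}_{B_{\mathbb{R}^N}(0,1)}(y)(\xi,y)_{\mathbb{R}^N}\Bigr)M(dy)\Biggr) \tag{1.2.1}$$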
for some $b \in \mathbb{R}^N$, non-negative definite, symmetric $A \in \mathrm{Hom}(\mathbb{R}^N;\mathbb{R}^N)$, and Borel measure $M$ on $\mathbb{R}^N$ such that $M(\{0\}) = 0$ and $\int\frac{|y|^2}{1+|y|^2}\,M(dy) < \infty$. The expression in (1.2.1) is called the Lévy–Khinchine formula, a measure $M$ satisfying the stated conditions is called a Lévy measure, and the triple $(b,A,M)$ is called a Lévy system.

It is clear that if the right hand side of (1.2.1) is a characteristic function for every Lévy system, then these are characteristic functions of infinitely divisible laws. Indeed, if $\mu$ corresponds to $(b,A,M)$ and $\mu_{\frac1n}$ corresponds to $\bigl(\frac bn,\frac An,\frac Mn\bigr)$, then $\hat\mu = \bigl(\widehat{\mu_{\frac1n}}\bigr)^n$.

Proving that the function $f_{(b,A,M)}$ on the right hand side of (1.2.1) is a characteristic function is relatively easy. To wit, $f_{(0,I,0)} = \hat\gamma$ (cf. (2.1.1)), where $\gamma(dx) = (2\pi)^{-\frac N2}e^{-\frac{|x|^2}{2}}\lambda_{\mathbb{R}^N}(dx)$, and so it is easy to check that $f_{(b,A,0)}$ is the characteristic function of the distribution of $b + A^{\frac12}x$ under $\gamma$. Also, if the Lévy measure $M$ is finite and $\pi_M$ is the Poisson measure given by

$$\pi_M = e^{-M(\mathbb{R}^N)}\sum_{n=0}^\infty\frac{M^{*n}}{n!}, \tag{1.2.2}$$

then $f_{(b_M,0,M)} = \widehat{\pi_M}$, where

$$b_M = \int_{B_{\mathbb{R}^N}(0,1)} y\,M(dy).$$
Hence, when $M$ is finite, $f_{(b,A,M)}$ is the characteristic function of $\gamma_{b-b_M,A}*\pi_M$. Finally, for general Lévy measures $M$, set $M_k(dy) = \mathbf{1}_{[\frac1k,\infty)}(|y|)\,M(dy)$. Then $M_k$ is finite, and so $f_{(b,A,M_k)}$ is a characteristic function. Therefore, since $f_{(b,A,M_k)} \longrightarrow f_{(b,A,M)}$ uniformly on compact subsets, Theorem 1.1.9 says that $f_{(b,A,M)}$ is a characteristic function.

There are several proofs that (1.2.1) describes the characteristic function of every $\mu \in \mathcal{I}(\mathbb{R}^N)$, but none of them is simple. Nonetheless, it is possible to explain the idea on which all approaches are based. Namely, elements of $\mathcal{I}(\mathbb{R}^N)$ have $n$th roots for all $n \ge 1$ in the Abelian semigroup $M_1(\mathbb{R}^N)$ with convolution. Thus, for any $m, n \ge 1$, a $\mu \in \mathcal{I}(\mathbb{R}^N)$ has an $\frac mn$th root $\mu_{\frac mn} = \bigl(\mu_{\frac1n}\bigr)^{*m}$. Hence one should expect that one can take the logarithm of $\mu$. Equivalently, one should expect that the limit

$$\ell_\mu(\xi) = \lim_{n\to\infty} n\bigl(\widehat{\mu_{\frac1n}}(\xi) - 1\bigr)$$

exists and that $\hat\mu = e^{\ell_\mu}$. As we will show, proving the existence of this limit is quite easy, but the proof that

$$\ell_\mu(\xi) = i(b,\xi)_{\mathbb{R}^N} - \frac12\bigl(\xi,A\xi\bigr)_{\mathbb{R}^N} + \int\Bigl(e^{i(\xi,y)_{\mathbb{R}^N}} - 1 - i\mathbf{1}_{B_{\mathbb{R}^N}(0,1)}(y)(\xi,y)_{\mathbb{R}^N}\Bigr)M(dy)$$

for some Lévy system $(b,A,M)$ is harder.

To prove that $\ell_\mu$ exists, we will need the following elementary fact about $\mathbb{C}$-valued functions. In its statement, $\log$ is the principal branch of the logarithm function on $\mathbb{C}\setminus(-\infty,0]$. That is, $\log(re^{i\theta}) = \log r + i\theta$ for $r > 0$ and $\theta \in (-\pi,\pi)$. In particular,

$$\log(\zeta) = -\sum_{n=1}^\infty\frac{(1-\zeta)^n}{n}\quad\text{when } |1-\zeta| < 1.$$
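Lemma 1.2.1 Given $R > 0$, suppose that $f : B_{\mathbb{R}^N}(0,R) \longrightarrow \mathbb{C}$ is a continuous function that equals 1 at 0 and never vanishes. Then there exists a unique continuous function $\ell_f : B_{\mathbb{R}^N}(0,R) \longrightarrow \mathbb{C}$ such that $\ell_f(0) = 0$ and $f = e^{\ell_f}$ on $B_{\mathbb{R}^N}(0,R)$. Moreover, if $\xi \in B_{\mathbb{R}^N}(0,R)$ and $r > 0$, then $\sup\Bigl\{\Bigl|1 - \frac{f(\eta)}{f(\xi)}\Bigr| : \eta \in B_{\mathbb{R}^N}(\xi,r)\cap B_{\mathbb{R}^N}(0,R)\Bigr\}$ […]

[…] $> 0$ so that $|1 - \hat\mu(\xi)| \le \frac12$ for $|\xi| \le r$. By Lemma 1.2.1, there is a unique continuous $\ell : B_{\mathbb{R}^N}(0,r) \longrightarrow \mathbb{C}$ such that $\ell(0) = 0$ and $\hat\mu = e^{\ell}$ on $B_{\mathbb{R}^N}(0,r)$. Now choose $\mu_{\frac1n}$ so that $\hat\mu = \bigl(\widehat{\mu_{\frac1n}}\bigr)^n$. Then $\widehat{\mu_{\frac1n}}$ doesn't vanish on $B_{\mathbb{R}^N}(0,r)$, and so there is an $\ell_n : B_{\mathbb{R}^N}(0,r) \longrightarrow \mathbb{C}$ such that $\ell_n(0) = 0$ and $\widehat{\mu_{\frac1n}} = e^{\ell_n}$ on $B_{\mathbb{R}^N}(0,r)$. Hence, $\hat\mu = e^{n\ell_n}$, and so, by uniqueness, $\ell = n\ell_n$ and therefore $\widehat{\mu_{\frac1n}} = e^{\frac{\ell}{n}}$ on $B_{\mathbb{R}^N}(0,r)$.

Next, because $\ell = \log\hat\mu$ and $|1 - \hat\mu| \le \frac12$ on $B_{\mathbb{R}^N}(0,r)$, $|\ell| \le 2$ there. Therefore, since $\mathrm{Re}\,\ell \le 0$, $\bigl|1 - \widehat{\mu_{\frac1n}}\bigr| = \bigl|1 - e^{\frac{\ell}{n}}\bigr| \le \frac2n$ on $B_{\mathbb{R}^N}(0,r)$. Now apply (1.1.5) to see that, for any $\rho > 0$,

$$\mu_{\frac1n}\bigl(\{x : |x| \ge \rho\}\bigr) \le \frac{2N}{n\,s\bigl(N^{-\frac12}\rho r\bigr)}$$

and therefore, by (1.1.4), that

$$\bigl|1 - \widehat{\mu_{\frac1n}}(\xi)\bigr| \le \rho R + \frac{4N}{n\,s\bigl(N^{-\frac12}r\rho\bigr)}\quad\text{for } \rho, R > 0 \text{ and } |\xi| \le R.$$

Finally, take $\rho = \frac{1}{8R}$ and $n \ge \frac{16N}{s(N^{-\frac12}r\rho)}$, and conclude that $\widehat{\mu_{\frac1n}}$, and therefore $\hat\mu$, doesn't vanish on $B_{\mathbb{R}^N}(0,R)$ for any $R > 0$.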
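As a consequence of Lemmas 1.2.2 and 1.2.1, we know that if $\mu \in \mathcal{I}(\mathbb{R}^N)$, then $\hat\mu = e^{\ell_\mu}$ for a unique continuous $\ell_\mu : \mathbb{R}^N \longrightarrow \mathbb{C}$ which vanishes at 0. Further, by the argument at the beginning of the preceding proof, $\mu = \bigl(\mu_{\frac1n}\bigr)^{*n} \implies \widehat{\mu_{\frac1n}} = e^{\frac{\ell_\mu}{n}}$. Thus $\ell_\mu$ is the limit which was predicted above. The challenge now is to show that

$$\ell_\mu(\xi) = \ell_{(b,A,M)}(\xi) \equiv i(b,\xi)_{\mathbb{R}^N} - \frac12\bigl(\xi,A\xi\bigr)_{\mathbb{R}^N} + \int\Bigl(e^{i(\xi,y)_{\mathbb{R}^N}} - 1 - i\mathbf{1}_{B_{\mathbb{R}^N}(0,1)}(y)(\xi,y)_{\mathbb{R}^N}\Bigr)M(dy)$$

for some Lévy system $(b,A,M)$.

Here is an outline of one approach. Using the estimates in Lemma 1.1.8, one can show (cf. (vi) in Exercise 1.2.1) that $\ell_\mu(\xi)$ has at most quadratic growth, thereby justifying

$$\int\hat\varphi(\xi)\ell_\mu(-\xi)\,d\xi = \lim_{t\searrow0}t^{-1}\int\hat\varphi(\xi)\bigl(e^{t\ell_\mu(-\xi)} - 1\bigr)\,d\xi = (2\pi)^N\lim_{t\searrow0}\frac{\langle\varphi,\mu_t\rangle - \varphi(0)}{t}$$

for $\varphi \in \mathscr{S}(\mathbb{R}^N;\mathbb{C})$. Next, define the linear functional $A_\mu$ on $\mathscr{S}(\mathbb{R}^N;\mathbb{R})$ by

$$A_\mu\varphi = \lim_{t\searrow0}\frac{\langle\varphi,\mu_t\rangle - \varphi(0)}{t}.$$

At this point the problem becomes that of showing that

$$A_\mu\varphi = \bigl(b,\nabla\varphi(0)\bigr)_{\mathbb{R}^N} + \frac12\sum_{j,k=1}^N A_{j,k}\partial_j\partial_k\varphi(0) + \int\Bigl(\varphi(x) - \varphi(0) - \mathbf{1}_{B_{\mathbb{R}^N}(0,1)}(x)\bigl(x,\nabla\varphi(0)\bigr)_{\mathbb{R}^N}\Bigr)M(dx)$$

for some Lévy system $(b,A,M)$. Indeed, define the Lévy operator $L_{(b,A,M)}$ on $C_b^2(\mathbb{R}^N;\mathbb{C})$ by

$$L_{(b,A,M)}\varphi(x) = \bigl(b,\nabla\varphi(x)\bigr)_{\mathbb{R}^N} + \frac12\sum_{j,k=1}^N A_{j,k}\partial_j\partial_k\varphi(x) + \int\Bigl(\varphi(x+y) - \varphi(x) - \mathbf{1}_{B_{\mathbb{R}^N}(0,1)}(y)\bigl(y,\nabla\varphi(x)\bigr)_{\mathbb{R}^N}\Bigr)M(dy).$$

Using Fubini's Theorem and elementary Fourier theory, one can check that

$$(2\pi)^{-N}\int\hat\varphi(\xi)\ell_{(b,A,M)}(-\xi)\,d\xi = L_{(b,A,M)}\varphi(0),$$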
and therefore one would know that

$$\int\hat\varphi(\xi)\ell_\mu(-\xi)\,d\xi = \int\hat\varphi(\xi)\ell_{(b,A,M)}(-\xi)\,d\xi$$

for $\varphi \in \mathscr{S}(\mathbb{R}^N;\mathbb{C})$, which is possible only if $\ell_\mu = \ell_{(b,A,M)}$.

The proof that $A_\mu\varphi = L_{(b,A,M)}\varphi(0)$ for some Lévy system $(b,A,M)$ relies on two facts, the most crucial of which is the simple observation that $A_\mu\varphi \ge 0$ if $\varphi \in \mathscr{S}(\mathbb{R}^N;\mathbb{R})$ satisfies $\varphi \ge \varphi(0)$, a property that is reminiscent of the minimum principle for second order elliptic operators and is obvious from the original expression for $A_\mu\varphi$. The second required fact is that $A_\mu$ has a quasi-local property. Namely, if $\varphi \in \mathscr{S}(\mathbb{R}^N;\mathbb{C})$ and $\varphi_R(x) = \varphi\bigl(\frac xR\bigr)$, then $A_\mu\varphi_R \longrightarrow 0$ as $R \to \infty$. Verifying this property is most easily done by using the expression for the action of $A_\mu$ in terms of $\ell_\mu$. Finally, based on these two properties alone, one can show that $A_\mu\varphi = L_{(b,A,M)}\varphi(0)$ for some $(b,A,M)$. See Sect. 3.1 in [14] for more details.

Exercise 1.2.1

(i) Show that if $\mu \in \mathcal{I}(\mathbb{R}^N)$, then $e^{t\ell_\mu}$ is a characteristic function of a $\mu_t \in \mathcal{I}(\mathbb{R}^N)$ for each $t > 0$. In addition, show that $\mu_s*\mu_t = \mu_{s+t}$.

(ii) Let $\mu \in M_1(\mathbb{R}^N)$ and $\{n_k : k \ge 1\} \subseteq \mathbb{Z}^+$ be a sequence that increases to $\infty$. Show that if, for each $k \ge 1$, there is a $\mu_{\frac1{n_k}} \in M_1(\mathbb{R}^N)$ such that $\mu = \bigl(\mu_{\frac1{n_k}}\bigr)^{*n_k}$, then $\mu \in \mathcal{I}(\mathbb{R}^N)$.

(iii) Show that for each $\alpha \in (0,2]$ there is a $\mu \in \mathcal{I}(\mathbb{R}^N)$ for which $e^{-|\xi|^\alpha}$ is the characteristic function. Thus, when combined with (iii) in Exercise 1.1.1, this proves that, for any $t > 0$, $e^{-t|\xi|^\alpha}$ is a non-negative definite function if and only if $\alpha \in (0,2]$.

(iv) Show that $\mathcal{I}(\mathbb{R}^N)$ is closed under weak convergence. Further, show that it is the weak closure in $M_1(\mathbb{R}^N)$ of Poisson measures, measures of the form (cf. (1.2.2)) $\pi_M$ where $M$ is a finite Lévy measure.

(v) The Poisson kernel for the upper halfspace $\mathbb{R}\times(0,\infty)$ is the function $p_y(x) = \frac{y}{\pi(x^2+y^2)}$. After checking that $\int p_y(x)\,dx = 1$, take $d\mu_y = p_y\,d\lambda_{\mathbb{R}}$, and show that $\widehat{\mu_y}(\xi) = e^{-y|\xi|}$.

Hint: Show that $\partial_y^2p_y(x) = -\partial_x^2p_y(x)$ and therefore that $\partial_y^2\widehat{p_y}(\xi) = \xi^2\widehat{p_y}(\xi)$. Next, show that $\lim_{y\searrow0}\widehat{p_y}(\xi) = 1$, and use the fact that $|\widehat{p_y}(\xi)| \le 1$ to conclude that $\widehat{p_y}(\xi) = e^{-y|\xi|}$.
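(vi) Suppose that $\mu \in \mathcal{I}(\mathbb{R}^N)$, $r > 0$, and that $|1 - \hat\mu(\xi)| \le \frac12$ if $|\xi| \le r$. Show that there is a $C < \infty$ which depends only on $r$ such that $|\ell_\mu(\xi)| \le C(1 + |\xi|^2)$ for all $\xi \in \mathbb{R}^N$.

Hint: Arguing as in the proof of Lemma 1.1.8 and using the notation there, show that

$$\mu_{\frac1n}\bigl(\{x : |(x,e)_{\mathbb{R}^N}| \ge R\}\bigr) \le \max_{|\xi|\le r}\frac{\bigl|1 - \widehat{\mu_{\frac1n}}(\xi)\bigr|}{s(rR)}$$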
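for all $e \in \mathbb{S}^{N-1}$ and $R > 0$. Next check that

$$|1 - \hat\mu(\xi)| \le \frac12 \implies \Bigl|\frac1n\ell_\mu(\xi)\Bigr| \le \frac2n \implies \bigl|1 - \widehat{\mu_{\frac1n}}(\xi)\bigr| \le \frac2n,$$

and therefore that $\mu_{\frac1n}\bigl(\{x : |(x,e)_{\mathbb{R}^N}| \ge R\}\bigr) \le \frac{2}{n\,s(rR)}$. Starting from this and using (1.1.4), conclude that

$$\bigl|1 - \widehat{\mu_{\frac1n}}(\xi)\bigr| \le \frac14 + \frac{4}{n\,s(r|\xi|^{-1})}$$

and therefore that $|\ell_\mu(\xi)| \le 2n$ if $n \ge \frac{16}{s(r|\xi|^{-1})}$. Finally, to complete the proof, note that there is an $\epsilon > 0$ such that $s(t) \ge \epsilon t^2$ for $t \in (0,1]$.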
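The two elementary facts about (1.2.1) used in this section, namely that the Lévy system $\bigl(\frac bn,\frac An,\frac Mn\bigr)$ yields an $n$th root and that $f_{(b_M,0,M)} = \widehat{\pi_M}$ for finite $M$, can be checked numerically in the simplest case. The Python sketch below assumes $N = 1$ and the one-point Lévy measure $M = c\,\delta_{y_0}$ with $|y_0| < 1$. The function names and parameter values are hypothetical choices made for this illustration.

```python
import cmath
import math

def lk_exponent(xi, b, a, c, y0):
    """Exponent in (1.2.1) for N = 1 and the Levy measure M = c * delta_{y0}."""
    in_unit_ball = 1.0 if abs(y0) < 1.0 else 0.0
    jump = c * (cmath.exp(1j * xi * y0) - 1.0 - 1j * in_unit_ball * xi * y0)
    return 1j * b * xi - 0.5 * a * xi * xi + jump

def lk_char(xi, b, a, c, y0):
    """The function f_(b,A,M)(xi) on the right hand side of (1.2.1)."""
    return cmath.exp(lk_exponent(xi, b, a, c, y0))

def poisson_char(xi, c, y0, terms=60):
    """Fourier transform of pi_M in (1.2.2), summed term by term:
    e^{-c} * sum_n c^n e^{i n xi y0} / n!  (truncated after `terms` terms)."""
    total = 0j
    term = 1.0 + 0j  # the n = 0 term
    for n in range(terms):
        total += term
        term *= c * cmath.exp(1j * xi * y0) / (n + 1)
    return math.exp(-c) * total

xi, b, a, c, y0, n = 1.3, 0.7, 2.0, 3.0, 0.5, 5

# n-th root: the system (b/n, a/n, M/n) raised to the n-th power recovers (b, a, M).
print(abs(lk_char(xi, b / n, a / n, c / n, y0) ** n - lk_char(xi, b, a, c, y0)))

# Poisson measure: with b_M = c * y0 (because |y0| < 1), f_(b_M, 0, M) matches pi_M-hat.
print(abs(poisson_char(xi, c, y0) - lk_char(xi, c * y0, 0.0, c, y0)))
```

Both printed differences should be at the level of floating point rounding.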
Chapter 2
Gaussian Measures and Families
This chapter deals with some of the properties of Gaussian measures and the construction of families of Gaussian random variables.
2.1 Gaussian Measures on $\mathbb{R}$

The standard Gaussian measure on $\mathbb{R}$ is the Borel probability measure $\gamma_{0,1} \ll \lambda_{\mathbb{R}}$ with Radon–Nikodym derivative

$$\frac{d\gamma_{0,1}}{d\lambda_{\mathbb{R}}}(x) = (2\pi)^{-\frac12}e^{-\frac{x^2}{2}}.$$

If $b \in \mathbb{R}$ and $a \ge 0$, then $\gamma_{b,a} \in M_1(\mathbb{R})$ denotes the distribution of $b + a^{\frac12}x$ under $\gamma_{0,1}$. Thus, $\gamma_{b,0} = \delta_b$, the unit point mass at $b$, and if $a > 0$, then $\gamma_{b,a}(dx) = (2\pi a)^{-\frac12}e^{-\frac{(x-b)^2}{2a}}\lambda_{\mathbb{R}}(dx)$. It is easy to check that a real valued random variable with distribution $\gamma_{b,a}$ has mean value $b$ and variance $a$. Some people say that such a random variable is Gaussian and others say it is normal, and I will sometimes use one term and sometimes the other. Finally, $N(b,a)$ will denote the set of normal random variables with mean $b$ and variance $a$.

The following lemma contains a few useful facts about Gaussian random variables.
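Lemma 2.1.1 If $a \ge 0$ and $X \in N(0,a)$, then

$$\mathbb{E}\bigl[e^{\zeta X}\bigr] = e^{\frac{a\zeta^2}{2}}\quad\text{for all } \zeta \in \mathbb{C}, \tag{2.1.1}$$

$$\mathbb{E}\bigl[e^{\frac{\alpha X^2}{2}}\bigr] = (1 - \alpha a)^{-\frac12}\quad\text{for } \alpha \in [0,a^{-1}), \tag{2.1.2}$$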
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. W. Stroock, Gaussian Measures in Finite and Infinite Dimensions, Universitext, https://doi.org/10.1007/978-3-031-23122-3_2
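$$\mathbb{E}\bigl[|X|^\alpha\bigr] = C_\alpha a^{\frac\alpha2}\quad\text{where } C_\alpha = \int|x|^\alpha\,\gamma_{0,1}(dx)\quad\text{for } \alpha \in [0,\infty), \tag{2.1.3}$$

and, for $n \ge 1$,

$$\mathbb{E}\bigl[X^{2n-1}\bigr] = 0\quad\text{and}\quad\mathbb{E}\bigl[X^{2n}\bigr] = \frac{(2n)!}{2^nn!}a^n = a^n\prod_{m=1}^n(2m-1). \tag{2.1.4}$$

In particular, $\mathbb{P}(|X| \ge R) \le 2e^{-\frac{R^2}{2a}}$ for $R > 0$. Moreover, if $\{X_n : n \ge 1\}$ is a sequence of Gaussian random variables and $X_n$ converges in probability to $X$, then $X$ is Gaussian and $\mathbb{E}\bigl[|X_n - X|^p\bigr] \longrightarrow 0$ for all $p \in [1,\infty)$.

Proof When $\zeta \in \mathbb{R}$, (2.1.1) follows by writing $\zeta x - \frac{x^2}{2a}$ as

$$-\frac{(x - a\zeta)^2}{2a} + \frac{a\zeta^2}{2}$$

and using the translation invariance of $\lambda_{\mathbb{R}}$. To handle general $\zeta \in \mathbb{C}$, note that both sides of the equation are entire functions that agree on $\mathbb{R}$ and therefore on $\mathbb{C}$.

To prove (2.1.2), set $c = \frac{a}{1-\alpha a}$, and observe that

$$\mathbb{E}\bigl[e^{\frac{\alpha X^2}{2}}\bigr] = (2\pi a)^{-\frac12}\int\exp\Bigl(-\frac{x^2}{2c}\Bigr)dx = \Bigl(\frac ca\Bigr)^{\frac12}.$$

Since $X$ has the same distribution as $a^{\frac12}$ times a standard normal random variable, (2.1.3) is trivial and (2.1.4) reduces to the case when $a = 1$. But, when $a = 1$, Theorem 1.1.4 justifies

$$\sum_{m=0}^\infty\frac{\alpha^m}{m!}\mathbb{E}\bigl[X^m\bigr] = \mathbb{E}\bigl[e^{\alpha X}\bigr] = e^{\frac{\alpha^2}{2}} = \sum_{m=0}^\infty\frac{\alpha^{2m}}{2^mm!},$$

from which (2.1.4) with $a = 1$ follows immediately.

Now suppose that $X_n \in N(b_n,a_n)$ and that $X_n \longrightarrow X$ in probability. Then, for all $\xi \in \mathbb{R}$,

$$e^{ib_n\xi - \frac{a_n\xi^2}{2}} = \mathbb{E}\bigl[e^{i\xi X_n}\bigr] \longrightarrow \mathbb{E}\bigl[e^{i\xi X}\bigr].$$

Choose $r > 0$ so that $\mathbb{E}\bigl[e^{i\xi X}\bigr] \ne 0$ for $|\xi| \le r$. Then, by applying Lemma 1.2.1 to $\xi \in [-r,r] \longmapsto \mathbb{E}\bigl[e^{i\xi X}\bigr]$, there is a unique continuous $\ell : [-r,r] \longrightarrow \mathbb{C}$ such that $\ell(0) = 0$ and $\mathbb{E}\bigl[e^{i\xi X}\bigr] = e^{\ell(\xi)}$, and so, by that lemma, $ib_n\xi - \frac{a_n\xi^2}{2} \longrightarrow \ell(\xi)$ for $|\xi| \le r$. Hence, there exist $b \in \mathbb{R}$ and $a \ge 0$ such that $b_n \longrightarrow b$ and $a_n \longrightarrow a$, and so $X \in N(b,a)$. Furthermore, since $b_n \longrightarrow b$, to see that $\mathbb{E}\bigl[|X_n - X|^p\bigr] \longrightarrow 0$, it suffices to do so when $b_n = 0$ for all $n$. But, since $\{a_n : n \ge 1\}$ is bounded, in that case, (2.1.2) says that there exists an $\alpha > 0$ such that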
$$\mathbb{E}\bigl[e^{\frac{\alpha X^2}{2}}\bigr] \le \sup_{n\ge1}\mathbb{E}\bigl[e^{\frac{\alpha X_n^2}{2}}\bigr] < \infty,$$

and so $\{|X_n - X|^p : n \ge 1\}$ is a sequence of uniformly integrable functions that tend to 0 in probability.

Remark: Suppose that $\{X_n : n \ge 1\}$ is a sequence of Gaussian random variables that converge in distribution (i.e., their distributions converge weakly) to a random variable $X$. Then the argument used in the preceding proof shows that $X$ is again a Gaussian random variable. Hence, the set of Gaussian measures is closed in the weak topology. By contrast (cf. part (iv) of Exercise 1.2.1), the weak closure of the set of Poisson measures $\pi_M$ is $\mathcal{I}(\mathbb{R}^N)$.
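The formulas (2.1.2) and (2.1.4) are easy to confirm numerically. The following Python sketch compares them with a midpoint-rule approximation of the corresponding integrals against $\gamma_{0,a}$. The helper names, the truncation of the integral to 12 standard deviations, and the parameter values are choices made here for illustration.

```python
import math

def gaussian_expectation(g, a, half_width=12.0, steps=24000):
    """Approximate E[g(X)] for X in N(0, a), a > 0, by the midpoint rule
    on [-half_width * sqrt(a), half_width * sqrt(a)]."""
    sd = math.sqrt(a)
    lo = -half_width * sd
    dx = 2.0 * half_width * sd / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += g(x) * math.exp(-x * x / (2.0 * a))
    return total * dx / math.sqrt(2.0 * math.pi * a)

def even_moment(n, a):
    """Right hand side of (2.1.4): (2n)! / (2^n n!) * a^n."""
    return math.factorial(2 * n) / (2 ** n * math.factorial(n)) * a ** n

a, alpha = 1.5, 0.4  # note alpha < 1 / a, as required in (2.1.2)

# (2.1.4) with n = 3: sixth moment versus the closed form and the product form.
print(gaussian_expectation(lambda x: x ** 6, a), even_moment(3, a),
      a ** 3 * math.prod(range(1, 6, 2)))

# (2.1.2): E[exp(alpha X^2 / 2)] versus (1 - alpha a)^(-1/2).
print(gaussian_expectation(lambda x: math.exp(alpha * x * x / 2.0), a),
      (1.0 - alpha * a) ** -0.5)
```

The numbers on each printed line should agree up to the quadrature error.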
2.2 Cramér–Lévy Theorem

The following remarkable theorem was discovered by Cramér and Lévy. So far as I know, there is no truly probabilistic or real analytic proof of it.

Theorem 2.2.1 If $X$ and $Y$ are independent random variables whose sum is Gaussian, then each of them is Gaussian.

Proof Without loss of generality, assume that $X + Y \in N(0,1)$. Choose $r > 0$ so that $\mathbb{P}(|X| \ge r)\vee\mathbb{P}(|Y| \ge r) \le \frac12$. Then

$$\mathbb{P}(|X| \ge r + R) \le 2\mathbb{P}(|X| \ge r + R\ \&\ |Y| \le r) \le 2\mathbb{P}(|X + Y| \ge R) \le 4e^{-\frac{R^2}{2}},$$

and similarly $\mathbb{P}(|Y| \ge r + R) \le 4e^{-\frac{R^2}{2}}$. Therefore $\mathbb{E}\bigl[e^{\alpha X^2}\bigr]\vee\mathbb{E}\bigl[e^{\alpha Y^2}\bigr] < \infty$ for some $\alpha > 0$.

Knowing that $X$ and $Y$ are integrable, one can reduce to the case when they have mean 0, and so we will proceed under that assumption. Set $f(\zeta) = \mathbb{E}\bigl[e^{\zeta X}\bigr]$ and $g(\zeta) = \mathbb{E}\bigl[e^{\zeta Y}\bigr]$ for $\zeta \in \mathbb{C}$. By Theorem 1.1.4, both $f$ and $g$ are entire functions whose product equals $e^{\frac{\zeta^2}{2}}$. In particular, neither of them vanishes anywhere, and so, by Lemma 1.2.1, there is an entire function $\theta$ such that $\theta(0) = 0$, $f(\zeta) = e^{\theta(\zeta)}$, and $g(\zeta) = \exp\bigl(\frac{\zeta^2}{2} - \theta(\zeta)\bigr)$. Furthermore, since $\mathbb{E}[X] = 0$ and therefore $\theta'(0) = 0$,

$$\theta(\zeta) = \sum_{n=2}^\infty c_n\zeta^n\quad\text{where } n!\,c_n = \partial_\xi^n\log\mathbb{E}\bigl[e^{\xi X}\bigr]\Big|_{\xi=0} \in \mathbb{R}.$$
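Observe that, by Hölder's inequality, $\log\circ f$ and $\log\circ g$ are convex functions of $\xi \in \mathbb{R}$. Thus, since they and their first derivatives vanish at $\xi = 0$, they must be non-negative on $\mathbb{R}$. That is,

$$0 \le \theta(\xi) \le \frac{\xi^2}{2}\quad\text{for } \xi \in \mathbb{R}.$$

Therefore, for $\zeta = \xi + i\eta$,

$$e^{\mathrm{Re}\,\theta(\zeta)} = \bigl|e^{\theta(\zeta)}\bigr| \le \mathbb{E}\bigl[\bigl|e^{\zeta X}\bigr|\bigr] = e^{\theta(\xi)} \le e^{\frac{\xi^2}{2}}\quad\text{and}\quad e^{\frac{\xi^2-\eta^2}{2} - \mathrm{Re}\,\theta(\zeta)} \le e^{\frac{\xi^2}{2} - \theta(\xi)},$$

which means that $-\eta^2 \le 2\,\mathrm{Re}\,\theta(\zeta) \le \xi^2$ and therefore that $|\mathrm{Re}\,\theta(\zeta)| \le \frac{|\zeta|^2}{2}$. Finally, by Cauchy's Theorem, for $n > 2$ and any $r > 0$,

$$2\pi c_n = r^{-n}\int_0^{2\pi}\theta(re^{it})e^{-int}\,dt = 2r^{-n}\int_0^{2\pi}\mathrm{Re}\,\theta(re^{it})e^{-int}\,dt$$

since

$$\int_0^{2\pi}\overline{\theta(re^{it})}e^{-int}\,dt = \overline{\int_0^{2\pi}\theta(re^{it})e^{int}\,dt} = 0.$$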
Thus $2\pi|c_n| \le 2\pi r^{2-n}$ for all $r > 0$, which means that $c_n = 0$ for $n > 2$. Therefore $f(\zeta) = e^{c_2\zeta^2}$, which is possible only if $c_2 \ge 0$.

In the following and elsewhere, I will employ the technique of symmetrization of a $\mu \in M_1(\mathbb{R})$. To describe this technique, define $\tilde\mu$ by $\tilde\mu(\Gamma) = \mu(-\Gamma)$. Then the symmetrization of $\mu$ is the measure $\mu*\tilde\mu$. Equivalently, $\widehat{\mu*\tilde\mu} = |\hat\mu|^2$, and if $X$ and $Y$ are independent random variables with distribution $\mu$, then $X - Y$ has distribution $\mu*\tilde\mu$. Obviously, $\mu*\tilde\mu$ is even in the sense that it assigns $-\Gamma$ the same measure as it does $\Gamma$. Less obvious is the fact that integrability properties of $X$ are intimately related to those of $X - Y$. Specifically, let $\alpha$ be a median of $X$ (i.e., $\mathbb{P}(X \ge \alpha) \ge \frac12$ and $\mathbb{P}(X \le \alpha) \ge \frac12$). For any $R > 0$,

$$\mathbb{P}(X \ge R + \alpha) \le 2\mathbb{P}(X \ge R + \alpha\ \&\ Y \le \alpha) \le 2\mathbb{P}(X - Y \ge R),$$

and similarly $\mathbb{P}(X \le -R + \alpha) \le 2\mathbb{P}(X - Y \le -R)$. Therefore $\mathbb{P}(|X - \alpha| \ge R) \le 2\mathbb{P}(|X - Y| \ge R)$, and so

$$\mathbb{E}\bigl[|X|^p\bigr]^{\frac1p} \le 2^{\frac1p}\,\mathbb{E}\bigl[|X - Y|^p\bigr]^{\frac1p} + |\alpha|$$

for all $p \in [1,\infty)$.

The following corollary, which is closely related to (iii) in Exercises 1.1.1 and 1.2.1, was proved originally by Marcinkiewicz.
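Corollary 2.2.2 If $\hat\mu$ is the exponential of a polynomial, then $\mu$ is Gaussian.

Proof By symmetrization and the Cramér–Lévy Theorem, we may assume that $\mu$ is even, and therefore that $\hat\mu(\xi) = e^{P(\xi)}$, where $P(\xi) = \sum_{m=0}^nc_m\xi^{2m}$ with $c_m \in \mathbb{R}$ for all $0 \le m \le n$. If $n = 0$, there is nothing to do, and so we will assume that $n \ge 1$ and $c_n \ne 0$, in which case, since $|\hat\mu(\xi)| \le 1$, it is clear that $c_n < 0$.

Now let $X$ be a random variable with distribution $\mu$. By Theorem 1.1.4, we know that $\mathbb{E}\bigl[e^{\zeta X}\bigr] = e^{P(-i\zeta)}$ for $\zeta \in \mathbb{C}$. Take $\zeta_K = iKe^{\frac{i\pi}{2n}}$, where $K > 0$. Then, on the one hand,

$$\bigl|\mathbb{E}\bigl[e^{\zeta_KX}\bigr]\bigr| = \Bigl|\exp\Bigl(P\bigl(Ke^{\frac{i\pi}{2n}}\bigr)\Bigr)\Bigr| = \exp\bigl(-c_nK^{2n} + o(K^{2n})\bigr),$$

and, on the other hand,

$$\bigl|\mathbb{E}\bigl[e^{\zeta_KX}\bigr]\bigr| \le \mathbb{E}\bigl[e^{\mathrm{Re}\,\zeta_KX}\bigr] = \mathbb{E}\bigl[e^{-K\sin\frac{\pi}{2n}X}\bigr] = e^{P(iK\sin\frac{\pi}{2n})} = \exp\Bigl((-1)^nc_nK^{2n}\sin^{2n}\frac{\pi}{2n} + o(K^{2n})\Bigr).$$

Hence

$$|c_n| \le (-1)^nc_n\sin^{2n}\frac{\pi}{2n},$$

which is possible only if $n = 1$.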
By combining Corollary 2.2.2 with Theorem 1.1.10, one sees that the only nonnegative definite functions which are the exponential of a polynomial are, apart from a multiplicative constant, characteristic functions of Gaussian random variables.
2.2.1 Gaussian Measures and Cauchy's Equation

Here we will develop a useful characterization of Gaussian measures.

Theorem 2.2.3 Let $\mu \in M_1(\mathbb{R})$ and $\alpha, \beta \in (0,1)$ with $\alpha^2 + \beta^2 = 1$ be given. Then $\mu$ is a centered Gaussian if and only if $\hat\mu(\xi) = \hat\mu(\alpha\xi)\hat\mu(\beta\xi)$.

Proof The necessity part is trivial. Now assume the sufficiency condition. Using induction, one sees that, for any $n \ge 0$,

$$\hat\mu(\xi) = \prod_{m=0}^n\hat\mu\bigl(\alpha^m\beta^{n-m}\xi\bigr)^{\binom nm}.$$

In particular, $\hat\mu$ never vanishes and so there is a unique continuous choice of $\log\hat\mu(\xi)$ which vanishes at $\xi = 0$.

First suppose that $t \equiv \int x^2\,\mu(dx) < \infty$. Then

$$\int x\,\mu(dx) = \iint(\alpha x + \beta y)\,\mu(dx)\mu(dy) = (\alpha + \beta)\int x\,\mu(dx),$$

which, since $(\alpha + \beta)^2 > \alpha^2 + \beta^2 = 1$, means that $\int x\,\mu(dx) = 0$. Thus, since

$$\hat\mu\bigl(\alpha^m\beta^{n-m}\xi\bigr) = 1 - \frac{t\alpha^{2m}\beta^{2(n-m)}\xi^2}{2} + o\bigl(\alpha^{2m}\beta^{2(n-m)}\xi^2\bigr)$$

and $\max_{0\le m\le n}\alpha^m\beta^{n-m} \longrightarrow 0$ as $n \to \infty$, for each $\xi \in \mathbb{R}$

$$\log\hat\mu(\xi) = \sum_{m=0}^n\binom nm\log\hat\mu\bigl(\alpha^m\beta^{n-m}\xi\bigr) = -\sum_{m=0}^n\binom nm\Bigl(\frac{t\alpha^{2m}\beta^{2(n-m)}\xi^2}{2} + o\bigl(\alpha^{2m}\beta^{2(n-m)}\xi^2\bigr)\Bigr) \longrightarrow -\frac{t\xi^2}{2},$$

where $\sum_{m=0}^n\binom nm\alpha^{2m}\beta^{2(n-m)} = 1$ was used in the final step. Thus, it suffices to show that $\int x^2\,\mu(dx) < \infty$.

Next suppose that $\mu$ is symmetric. Then $\hat\mu > 0$ everywhere and

$$\log\hat\mu(1) = \sum_{m=0}^n\binom nm\log\Bigl(\int\cos\bigl(\alpha^m\beta^{n-m}x\bigr)\,\mu(dx)\Bigr).$$

Hence, since $1 - t \le -\log t$ for $t \in (0,1]$,

$$-\log\hat\mu(1) \ge \sum_{m=0}^n\binom nm\int\bigl(1 - \cos(\alpha^m\beta^{n-m}x)\bigr)\,\mu(dx).$$

Because

$$0 \le \sum_{m=0}^n\binom nm\bigl(1 - \cos(\alpha^m\beta^{n-m}x)\bigr) \longrightarrow \frac{x^2}{2},$$

Fatou's lemma implies that $\int x^2\,\mu(dx) \le -2\log\hat\mu(1) < \infty$.

Finally, for general $\mu$'s, observe that the symmetrization $\mu*\tilde\mu$ will satisfy the hypothesis and use either Theorem 2.2.1 or the fact that $\mu$ has a second moment if $\mu*\tilde\mu$ does.

Say that a function $f : \mathbb{R} \longrightarrow \mathbb{R}$ is additive if $f(x+y) = f(x) + f(y)$ for all $x, y \in \mathbb{R}$, and say that $f : \mathbb{R} \longrightarrow \mathbb{R}$ is a.e. additive if $f(x+y) = f(x) + f(y)$ for $\lambda_{\mathbb{R}^2}$-almost every $(x,y) \in \mathbb{R}^2$. Cauchy asked which additive functions are linear, and Erdős asked which a.e. additive ones are a.e. equal to a linear function. Theorem 2.2.3 provides answers to both Cauchy's and Erdős's questions.

Theorem 2.2.4 Let $f : \mathbb{R} \longrightarrow \mathbb{R}$ be a Borel measurable function. If $f$ is additive, then $f(x) = f(1)x$ for all $x \in \mathbb{R}$. If $f$ is a.e. additive, then there is an $a \in \mathbb{R}$ such that $f(x) = ax$ for $\lambda_{\mathbb{R}}$-almost every $x \in \mathbb{R}$.

Proof Assume that $f$ is additive. It is easy to check that $f(qx) = qf(x)$ for all $q \in \mathbb{Q}$ and $x \in \mathbb{R}$. Thus, if $f$ is continuous, then $f(x) = f(1)x$. Now assume that $f$
is locally $\lambda_{\mathbb{R}}$-integrable, and choose $\rho \in C_c^\infty(\mathbb{R};\mathbb{R})$ with total integral 1. Then $f*\rho$ is smooth and $f*\rho(x) = f(x) + \int f(-y)\rho(y)\,dy$. Thus $f$ is smooth and therefore it is linear. In general, let $\mu$ be the distribution of $f$ under the standard Gaussian measure $\gamma_{0,1}$, and set $q_1 = \frac35$ and $q_2 = \frac45$. Then, because $q_1^2 + q_2^2 = 1$,

$$\begin{aligned}
\hat\mu(q_1\xi)\hat\mu(q_2\xi) &= \iint e^{i\xi(q_1f(x) + q_2f(y))}\,\gamma_{0,1}(dx)\gamma_{0,1}(dy)\\
&= \iint e^{i\xi f(q_1x + q_2y)}\,\gamma_{0,1}(dx)\gamma_{0,1}(dy) = \hat\mu(\xi),
\end{aligned}$$

and therefore, by Theorem 2.2.3, $\mu$ is a centered Gaussian measure. In particular,

$$\int f(x)^2\,\gamma_{0,1}(dx) = \int x^2\,\mu(dx) < \infty,$$

and so $f$ is locally $\lambda_{\mathbb{R}}$-integrable and therefore linear.

Now assume that $f$ is a.e. additive. The first step is to show that, for all $q_1, q_2 \in \mathbb{Q}^+$, $f(q_1x + q_2y) = q_1f(x) + q_2f(y)$ (a.e., $\lambda_{\mathbb{R}^2}$). To this end, let $n \ge 1$, and assume that $f(nx + y) = nf(x) + f(y)$ (a.e., $\lambda_{\mathbb{R}^2}$). Then, because the distribution of $(x, x+y)$ under $\lambda_{\mathbb{R}^2}$ is equivalent to that of $(x,y)$,

$$f\bigl((n+1)x + y\bigr) = f\bigl(nx + (x+y)\bigr) = nf(x) + f(x+y) = nf(x) + f(x) + f(y) = (n+1)f(x) + f(y)$$

for $\lambda_{\mathbb{R}^2}$-a.e. $(x,y) \in \mathbb{R}^2$. Hence, by induction, for all $n \ge 1$, $f(nx + y) = nf(x) + f(y)$ for $\lambda_{\mathbb{R}^2}$-a.e. $(x,y) \in \mathbb{R}^2$. At the same time, because the distribution of $(nx, y)$ under $\lambda_{\mathbb{R}^2}$ is equivalent to that of $(x,y)$, $f(nx) + f(y) = f(nx + y) = nf(x) + f(y)$ for $\lambda_{\mathbb{R}^2}$-a.e. $(x,y) \in \mathbb{R}^2$, and so, by Fubini's Theorem, $f(nx) = nf(x)$ (a.e., $\lambda_{\mathbb{R}}$). Similarly, because the $\lambda_{\mathbb{R}^2}$-distribution of $\bigl(\frac xn, y\bigr)$ is equivalent to that of $(x,y)$, for all $m, n \in \mathbb{Z}^+$, $f\bigl(\frac mnx\bigr) = \frac mnf(x)$ (a.e., $\lambda_{\mathbb{R}}$). Finally, given $q_1, q_2 \in \mathbb{Q}^+$, the $\lambda_{\mathbb{R}^2}$-distribution of $(q_1x, q_2y)$ is equivalent to that of $(x,y)$, and therefore $f(q_1x + q_2y) = f(q_1x) + f(q_2y) = q_1f(x) + q_2f(y)$ for $\lambda_{\mathbb{R}^2}$-a.e. $(x,y) \in \mathbb{R}^2$.

To complete the proof, again take $q_1 = \frac35$ and $q_2 = \frac45$, and again consider the distribution $\mu$ of $f$ under $\gamma_{0,1}$. Then, just as before, $\hat\mu(\xi) = \hat\mu(q_1\xi)\hat\mu(q_2\xi)$, and therefore $f$ is locally $\lambda_{\mathbb{R}}$-integrable. In addition, if $\rho \in C_c^\infty(\mathbb{R};\mathbb{R})$ has total integral 1, then

$$\tilde f(x) \equiv f*\rho(x) - \int f(-y)\rho(y)\,dy = f(x)\quad\text{for } \lambda_{\mathbb{R}}\text{-a.e. } x \in \mathbb{R}.$$
In particular, for each $y$, $\tilde f(x+y) = f(x+y)$ for $\lambda_{\mathbb{R}}$-a.e. $x \in \mathbb{R}$, and from this and Fubini's Theorem, it follows that $\tilde f$ is a smooth additive function. Hence, $\tilde f$ is linear, and so $f(x) = \tilde f(1)x$ for $\lambda_{\mathbb{R}}$-a.e. $x \in \mathbb{R}$.

Extensions of these results and ideas can be found in [4].
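The functional equation in Theorem 2.2.3 can be tried out on explicit characteristic functions. The Python sketch below uses the pair $(\alpha,\beta) = \bigl(\frac35,\frac45\bigr)$ from the proof of Theorem 2.2.4 and compares a centered Gaussian characteristic function with $e^{-|\xi|}$, the characteristic function from part (v) of Exercise 1.2.1 with $y = 1$. The function names are hypothetical, and the check is only a spot test at a few points, not a proof.

```python
import math

def gaussian_cf(xi: float, t: float) -> float:
    """Characteristic function exp(-t xi^2 / 2) of the centered Gaussian with variance t."""
    return math.exp(-t * xi * xi / 2.0)

def cauchy_cf(xi: float) -> float:
    """Characteristic function exp(-|xi|) of the Poisson kernel measure with y = 1."""
    return math.exp(-abs(xi))

def defect(cf, xi: float, alpha: float = 0.6, beta: float = 0.8) -> float:
    """|cf(xi) - cf(alpha xi) cf(beta xi)|, where alpha^2 + beta^2 = 1."""
    return abs(cf(xi) - cf(alpha * xi) * cf(beta * xi))

# The Gaussian satisfies the equation up to rounding error.
print(max(defect(lambda x: gaussian_cf(x, 2.0), 0.1 * k) for k in range(1, 40)))

# The Cauchy law does not: exp(-|xi|) differs from exp(-1.4 |xi|).
print(defect(cauchy_cf, 1.0))
```

For $e^{-|\xi|}$ the product equals $e^{-(\alpha+\beta)|\xi|}$, and $\alpha + \beta > 1$ whenever $\alpha^2 + \beta^2 = 1$ with $\alpha, \beta \in (0,1)$, which is the same observation used in the proof to show that the mean vanishes.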
2.3 Gaussian Spectral Properties

The first goal here is to prove that

$$\int\bigl(\varphi - \langle\varphi,\gamma_{0,1}\rangle\bigr)^2\,d\gamma_{0,1} \le \|\varphi'\|^2_{L^2(\gamma_{0,1};\mathbb{R})} \tag{2.3.1}$$

for $\varphi \in C^1(\mathbb{R};\mathbb{R})$. A closely related inequality was used to great effect by Poincaré, and such inequalities have ever since been called Poincaré inequalities. In the proof of (2.3.1) we will make use of the function

$$p(t,x,y) = \bigl(2\pi(1 - e^{-t})\bigr)^{-\frac12}\exp\Biggl(-\frac{\bigl(y - e^{-\frac t2}x\bigr)^2}{2(1 - e^{-t})}\Biggr)\quad\text{for } (t,x,y) \in (0,\infty)\times\mathbb{R}\times\mathbb{R}.$$

Observe that $\int p(t,x,y)\,dy = 1$ and

$$p(s+t,x,y) = \int p(s,x,\xi)p(t,\xi,y)\,d\xi\quad\text{and}\quad\partial_tp(t,x,y) = \tfrac12\bigl(\partial_x^2 - x\partial_x\bigr)p(t,x,y).$$

Hence, if $P_t$ is the operator on $C_b(\mathbb{R};\mathbb{C})$ given by

$$P_t\varphi(x) = \int\varphi(y)p(t,x,y)\,dy, \tag{2.3.2}$$
then it is a contraction on $C_b(\mathbb{R};\mathbb{C})$ into itself, and $P_{s+t} = P_t\circ P_s$. Thus $\{P_t : t > 0\}$ is a semigroup, known as the Ornstein–Uhlenbeck semigroup, of contractions on $C_b(\mathbb{R};\mathbb{C})$. Further,

$$\partial_tP_t\varphi = LP_t\varphi, \tag{2.3.3}$$

and so, if $\varphi \in C_b^2(\mathbb{R};\mathbb{C})$,

$$\lim_{t\searrow0}\frac{P_t\varphi - \varphi}{t} = L\varphi, \tag{2.3.4}$$

where

$$L = \tfrac12\bigl(\partial_x^2 - x\partial_x\bigr) \tag{2.3.5}$$

is the Ornstein–Uhlenbeck operator and the convergence is uniform on compact subsets. Note that

$$p(t,x,y)\,\gamma_{0,1}(dx) = (2\pi)^{-1}(1 - e^{-t})^{-\frac12}\exp\Biggl(-\frac{x^2 - 2e^{-\frac t2}xy + y^2}{2(1 - e^{-t})}\Biggr)\lambda_{\mathbb{R}}(dx),$$

and so
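$$\bigl(\varphi,P_t\psi\bigr)_{L^2(\gamma_{0,1};\mathbb{R})} = \bigl(P_t\varphi,\psi\bigr)_{L^2(\gamma_{0,1};\mathbb{R})}.$$

In particular,

$$\int P_t\varphi\,d\gamma_{0,1} = \bigl(1,P_t\varphi\bigr)_{L^2(\gamma_{0,1};\mathbb{R})} = \bigl(P_t1,\varphi\bigr)_{L^2(\gamma_{0,1};\mathbb{R})} = \int\varphi\,d\gamma_{0,1}.$$

Next, by Jensen's inequality, $|P_t\varphi|^p \le P_t|\varphi|^p$, and therefore $\|P_t\varphi\|_{L^p(\gamma_{0,1};\mathbb{R})} \le \|\varphi\|_{L^p(\gamma_{0,1};\mathbb{R})}$ for $p \in [1,\infty]$. Hence, $P_t$ extends to $L^2(\gamma_{0,1};\mathbb{R})$ as a self-adjoint contraction, and so $\{P_t : t > 0\}$ can be viewed as a strongly continuous semigroup of self-adjoint contractions on $L^2(\gamma_{0,1};\mathbb{R})$. Finally, note that

$$\lim_{t\to\infty}P_t\varphi(x) = \int\varphi\,d\gamma_{0,1}\quad\text{and}\quad\lim_{t\searrow0}P_t\varphi = \varphi,$$

first uniformly on compact subsets for $\varphi \in C_b(\mathbb{R};\mathbb{C})$ and then in $L^p(\gamma_{0,1};\mathbb{R})$ for $p \in [1,\infty)$ and $\varphi \in L^p(\gamma_{0,1};\mathbb{R})$.

The estimate (2.3.1) is equivalent to $\|\varphi\|^2_{L^2(\gamma_{0,1};\mathbb{R})} \le \|\varphi'\|^2_{L^2(\gamma_{0,1};\mathbb{R})} + \langle\varphi,\gamma_{0,1}\rangle^2$, and it suffices to check it when $\varphi \in \mathscr{S}(\mathbb{R};\mathbb{R})$. Thus let $\varphi \in \mathscr{S}(\mathbb{R};\mathbb{R})$ be given, and observe that

$$(P_t\varphi)'(x) = \partial_x\int\varphi\bigl(y + e^{-\frac t2}x\bigr)p(t,0,y)\,dy = e^{-\frac t2}P_t\varphi'(x).$$

Using (2.3.3), $2L\varphi = e^{\frac{x^2}{2}}\partial_x\bigl(e^{-\frac{x^2}{2}}\varphi'\bigr)$, integration by parts, and the preceding, conclude that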
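$$\frac{d}{dt}\bigl(\varphi,P_t\varphi\bigr)_{L^2(\gamma_{0,1};\mathbb{R})} = \bigl(\varphi,LP_t\varphi\bigr)_{L^2(\gamma_{0,1};\mathbb{R})} = -\tfrac12\bigl(\varphi',(P_t\varphi)'\bigr)_{L^2(\gamma_{0,1};\mathbb{R})} = -\frac{e^{-\frac t2}}{2}\bigl(\varphi',P_t\varphi'\bigr)_{L^2(\gamma_{0,1};\mathbb{R})}.$$

Since

$$\bigl|\bigl(\varphi',P_t\varphi'\bigr)_{L^2(\gamma_{0,1};\mathbb{R})}\bigr| \le \|\varphi'\|_{L^2(\gamma_{0,1};\mathbb{R})}\|P_t\varphi'\|_{L^2(\gamma_{0,1};\mathbb{R})} \le \|\varphi'\|^2_{L^2(\gamma_{0,1};\mathbb{R})},$$

$$-\frac{d}{dt}\bigl(\varphi,P_t\varphi\bigr)_{L^2(\gamma_{0,1};\mathbb{R})} \le \frac{e^{-\frac t2}}{2}\|\varphi'\|^2_{L^2(\gamma_{0,1};\mathbb{R})}.$$

Integrating the preceding over $t \in (0,\infty)$, one arrives at (2.3.1).

An important application of (2.3.1) is the following ergodic property of $\{P_t : t > 0\}$. Suppose that $\langle\varphi,\gamma_{0,1}\rangle = 0$. Then

$$\frac{d}{dt}\|P_t\varphi\|^2_{L^2(\gamma_{0,1};\mathbb{R})} = 2\bigl(P_t\varphi,LP_t\varphi\bigr)_{L^2(\gamma_{0,1};\mathbb{R})} = -\|(P_t\varphi)'\|^2_{L^2(\gamma_{0,1};\mathbb{R})} \le -\|P_t\varphi\|^2_{L^2(\gamma_{0,1};\mathbb{R})},$$

where the final inequality comes from (2.3.1) and the fact that $\langle P_t\varphi,\gamma_{0,1}\rangle = 0$. Hence, $\|P_t\varphi\|^2_{L^2(\gamma_{0,1};\mathbb{R})} \le e^{-t}\|\varphi\|^2_{L^2(\gamma_{0,1};\mathbb{R})}$, and so, even if $\langle\varphi,\gamma_{0,1}\rangle \ne 0$,

$$\bigl\|P_t\varphi - \langle\varphi,\gamma_{0,1}\rangle\bigr\|^2_{L^2(\gamma_{0,1};\mathbb{R})} \le e^{-t}\bigl\|\varphi - \langle\varphi,\gamma_{0,1}\rangle\bigr\|^2_{L^2(\gamma_{0,1};\mathbb{R})} \le e^{-t}\|\varphi\|^2_{L^2(\gamma_{0,1};\mathbb{R})}. \tag{2.3.6}$$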
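A feature of Poincaré inequalities like (2.3.1) is their behavior when one takes products. Namely, suppose that $\mu_j \in M_1(\mathbb{R}^{N_j})$, $j \in \{1,2\}$, and that

$$\|\varphi\|^2_{L^2(\mu_j;\mathbb{R})} \le c\bigl\||\nabla_j\varphi|\bigr\|^2_{L^2(\mu_j;\mathbb{R})} + \langle\varphi,\mu_j\rangle^2,$$

where $\nabla_j$ is the gradient for functions on $\mathbb{R}^{N_j}$. Then, if $N = N_1 + N_2$ and $\mu = \mu_1\times\mu_2$ on $\mathbb{R}^N$, an application of Fubini's Theorem shows that

$$\|\varphi\|^2_{L^2(\mu;\mathbb{R})} \le c\bigl\||\nabla\varphi|\bigr\|^2_{L^2(\mu;\mathbb{R})} + \langle\varphi,\mu\rangle^2,$$

where $\nabla$ is the gradient for functions on $\mathbb{R}^N$. As a consequence, we now see that (2.3.1) implies that, for any $N \in \mathbb{Z}^+$,

$$\bigl\|\varphi - \langle\varphi,\gamma_{0,1}^N\rangle\bigr\|^2_{L^2(\gamma_{0,1}^N;\mathbb{R})} \le \bigl\||\nabla\varphi|\bigr\|^2_{L^2(\gamma_{0,1}^N;\mathbb{R})}. \tag{2.3.7}$$

Define

$$P_t^{(N)}\varphi(x) = \int_{\mathbb{R}^N}\varphi(y)\prod_{j=1}^Np(t,x_j,y_j)\,\lambda_{\mathbb{R}^N}(dy)$$

for $\varphi \in C_b(\mathbb{R}^N;\mathbb{C})$. Then it is easy to check from (2.3.3) that $\partial_tP_t^{(N)}\varphi = L^{(N)}P_t^{(N)}\varphi$, where $L^{(N)}\psi(x) = \frac12\bigl(\Delta\psi(x) - \bigl(x,\nabla\psi(x)\bigr)_{\mathbb{R}^N}\bigr)$. Thus, proceeding as we did in the derivation of (2.3.6), one finds that

$$\bigl\|P_t^{(N)}\varphi - \langle\varphi,\gamma_{0,1}^N\rangle\bigr\|^2_{L^2(\gamma_{0,1}^N;\mathbb{R})} \le e^{-t}\bigl\|\varphi - \langle\varphi,\gamma_{0,1}^N\rangle\bigr\|^2_{L^2(\gamma_{0,1}^N;\mathbb{R})} \le e^{-t}\|\varphi\|^2_{L^2(\gamma_{0,1}^N;\mathbb{R})}. \tag{2.3.8}$$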
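For a concrete feel for (2.3.1) and (2.3.6), one can evaluate both sides numerically in dimension one. The Python sketch below rewrites (2.3.2) as $P_t\varphi(x) = \int\varphi\bigl(e^{-\frac t2}x + (1-e^{-t})^{\frac12}z\bigr)\,\gamma_{0,1}(dz)$, which follows from the change of variable $y = e^{-\frac t2}x + (1-e^{-t})^{\frac12}z$, and uses a midpoint rule on a truncated interval. The test function $\varphi(x) = x^3$, the helper names, and the quadrature parameters are choices made here for illustration.

```python
import math

def gauss_average(g, half_width=10.0, steps=2000):
    """Midpoint-rule approximation of the integral of g against gamma_{0,1}."""
    dz = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        z = -half_width + (k + 0.5) * dz
        total += g(z) * math.exp(-z * z / 2.0)
    return total * dz / math.sqrt(2.0 * math.pi)

def ou(t, phi, x, steps=400):
    """P_t phi(x) via y = e^{-t/2} x + sqrt(1 - e^{-t}) z with z distributed as gamma_{0,1}."""
    c = math.exp(-t / 2.0)
    s = math.sqrt(1.0 - math.exp(-t))
    return gauss_average(lambda z: phi(c * x + s * z), steps=steps)

def variance_after(t, phi, steps=400):
    """Squared L^2(gamma_{0,1}) norm of P_t phi - <phi, gamma_{0,1}>."""
    mean = gauss_average(phi)
    return gauss_average(lambda x: (ou(t, phi, x, steps) - mean) ** 2, steps=steps)

phi = lambda x: x ** 3          # test function
dphi = lambda x: 3.0 * x ** 2   # its derivative

mean = gauss_average(phi)
variance = gauss_average(lambda x: (phi(x) - mean) ** 2)
energy = gauss_average(lambda x: dphi(x) ** 2)

print(variance <= energy)                                      # Poincare inequality (2.3.1)
print(variance_after(1.0, phi) <= math.exp(-1.0) * variance)   # decay estimate (2.3.6) at t = 1
```

The nested quadrature costs `steps` squared evaluations of $\varphi$, so the default of 400 points per integral keeps the run time small.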
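2.3.1 A Logarithmic Sobolev Inequality

It turns out that one can prove a slightly stronger inequality than (2.3.1). Namely, for $\varphi \in C^1(\mathbb{R};\mathbb{R})$,

$$\int\varphi^2\log\frac{\varphi^2}{\|\varphi\|^2_{L^2(\gamma_{0,1};\mathbb{R})}}\,d\gamma_{0,1} \le 2\|\varphi'\|^2_{L^2(\gamma_{0,1};\mathbb{R})}. \tag{2.3.9}$$

This inequality was proved first by Gross [8] who called it a logarithmic Sobolev inequality. Obviously, (2.3.9) looks a lot like (2.3.1), especially when one rewrites it as

$$\int\varphi^2\log\varphi^2\,d\gamma_{0,1} \le 2\|\varphi'\|^2_{L^2(\gamma_{0,1};\mathbb{R})} + \|\varphi\|^2_{L^2(\gamma_{0,1};\mathbb{R})}\log\|\varphi\|^2_{L^2(\gamma_{0,1};\mathbb{R})}.$$

However, as Gross showed, it has regularity consequences similar to, albeit weaker than, a classical Sobolev inequality.

To prove (2.3.9), let $\varphi$ be a strictly positive element of $\mathscr{S}(\mathbb{R};\mathbb{R})$, and set (cf. (2.3.2)) $\varphi_t = P_t\varphi$. Then, since $\langle\partial_t\varphi_t,\gamma_{0,1}\rangle = \partial_t\langle\varphi_t,\gamma_{0,1}\rangle = 0$,

$$\frac{d}{dt}\langle\varphi_t\log\varphi_t,\gamma_{0,1}\rangle = \langle L\varphi_t\log\varphi_t,\gamma_{0,1}\rangle = -\frac12\Bigl\langle\frac{(\varphi_t')^2}{\varphi_t},\gamma_{0,1}\Bigr\rangle = -\frac{e^{-t}}{2}\Bigl\langle\frac{(P_t\varphi')^2}{P_t\varphi},\gamma_{0,1}\Bigr\rangle.$$

Observe that, by Schwarz's inequality,

$$(P_t\varphi')^2 = \Bigl(P_t\Bigl(\frac{\varphi'}{\varphi^{\frac12}}\,\varphi^{\frac12}\Bigr)\Bigr)^2 \le P_t\Bigl(\frac{(\varphi')^2}{\varphi}\Bigr)P_t\varphi,$$

and therefore

$$\frac{d}{dt}\langle\varphi_t\log\varphi_t,\gamma_{0,1}\rangle \ge -\frac{e^{-t}}{2}\Bigl\langle P_t\Bigl(\frac{(\varphi')^2}{\varphi}\Bigr),\gamma_{0,1}\Bigr\rangle = -\frac{e^{-t}}{2}\Bigl\langle\frac{(\varphi')^2}{\varphi},\gamma_{0,1}\Bigr\rangle.$$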
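After integrating over $t \in (0,\infty)$, one has that

$$\langle\varphi\log\varphi,\gamma_{0,1}\rangle \le \frac12\Bigl\langle\frac{(\varphi')^2}{\varphi},\gamma_{0,1}\Bigr\rangle + \langle\varphi,\gamma_{0,1}\rangle\log\langle\varphi,\gamma_{0,1}\rangle.$$

Knowing this inequality for strictly positive $\varphi$'s in $\mathscr{S}(\mathbb{R};\mathbb{R})$, it is obvious that it extends to non-negative $\varphi$'s in $C^1(\mathbb{R};\mathbb{R})$. Finally, given a $\varphi \in C^1(\mathbb{R};\mathbb{R})$, apply the inequality to $\varphi^2$ and thereby arrive at (2.3.9).

To see that (2.3.9) is optimal, suppose that it holds with $2\|\varphi'\|^2_{L^2(\gamma_{0,1};\mathbb{R})}$ replaced by $c\|\varphi'\|^2_{L^2(\gamma_{0,1};\mathbb{R})}$, and take $\varphi = e^{\frac{\alpha x^2}{4}}$ for $\alpha \in (0,1)$. Show that

$$\langle\varphi^2\log\varphi^2,\gamma_{0,1}\rangle = \frac{\alpha}{2(1-\alpha)^{\frac32}},\quad\langle\varphi^2,\gamma_{0,1}\rangle = (1-\alpha)^{-\frac12},\quad\|\varphi'\|^2_{L^2(\gamma_{0,1};\mathbb{R})} = \frac{\alpha^2}{4(1-\alpha)^{\frac32}},$$

and conclude that $\alpha \le \frac c2\alpha^2 - (1-\alpha)\log(1-\alpha)$. Now let $\alpha \nearrow 1$ to see that $c \ge 2$.

The regularity property alluded to above was discovered by Nelson [11] and is called hypercontraction. It states that the semigroup $\{P_t : t > 0\}$ is smoothing in the sense that

$$\|P_t\varphi\|_{L^{q(t)}(\gamma_{0,1};\mathbb{R})} \le \|\varphi\|_{L^p(\gamma_{0,1};\mathbb{R})} \tag{2.3.10}$$

when $p \in (1,\infty)$ and $q(t) = 1 + (p-1)e^t$. Gross's proof of this from (2.3.9) is the following. Let a uniformly positive $\varphi \in C_b^1(\mathbb{R};\mathbb{R})$ be given, and set $\varphi_t = P_t\varphi$ and $F(t) = \|\varphi_t\|_{L^{q(t)}(\gamma_{0,1})}$. Then, with $\dot q = \frac{dq}{dt}$, one sees that

$$\begin{aligned}
\frac{dF}{dt} &= -\frac{\dot q}{q^2}F\log F^q + \frac{\dot q}{q}F^{1-q}\bigl\langle\varphi_t^q\log\varphi_t,\gamma_{0,1}\bigr\rangle + F^{1-q}\bigl\langle\varphi_t^{q-1}L\varphi_t,\gamma_{0,1}\bigr\rangle\\
&= \frac{F^{1-q}}{q^2}\Biggl[\dot q\Bigl\langle\varphi_t^q\log\frac{\varphi_t^q}{\langle\varphi_t^q,\gamma_{0,1}\rangle},\gamma_{0,1}\Bigr\rangle - \frac{q^2(q-1)}{2}\bigl\langle\varphi_t^{q-2}(\varphi_t')^2,\gamma_{0,1}\bigr\rangle\Biggr].
\end{aligned}$$

Since $\dot q = q - 1$ and $\varphi_t^{q-2}(\varphi_t')^2 = \frac{4}{q^2}(\psi_t')^2$ where $\psi_t = \varphi_t^{\frac q2}$, the bracketed expression equals

$$(q-1)\Biggl[\Bigl\langle\psi_t^2\log\frac{\psi_t^2}{\|\psi_t\|^2_{L^2(\gamma_{0,1};\mathbb{R})}},\gamma_{0,1}\Bigr\rangle - 2\|\psi_t'\|^2_{L^2(\gamma_{0,1};\mathbb{R})}\Biggr],$$

which, by (2.3.9), is less than or equal to 0. Hence $F$ is non-increasing, and so (2.3.10) holds, first for uniformly positive $\varphi \in C_b^1(\mathbb{R};\mathbb{R})$ and then for $\varphi \in L^p(\gamma_{0,1};\mathbb{R})$.

Notice that, because $P_{s+t} = P_t\circ P_s$ and $q(s+t) = 1 + \bigl(q(s) - 1\bigr)e^t$, (2.3.10) implies that $\|P_t\varphi\|_{L^{q(t)}(\gamma_{0,1};\mathbb{R})}$ is a non-increasing function of $t \ge 0$, and therefore Gross's computation shows that (2.3.10) is equivalent to (2.3.9). In particular, $q(t)$ cannot be replaced by $1 + (p-1)e^{ct}$ for any $c > 1$. In fact, by direct computation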
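of $\|\varphi\|_{L^p(\gamma_{0,1};\mathbb{R})}$ and $\|P_t\varphi\|_{L^q(\gamma_{0,1};\mathbb{R})}$ when $\varphi = e^{\alpha x}$, one can show that, for any $q > q(t)$, there is a $\varphi \in L^p(\gamma_{0,1};\mathbb{R})$ for which $\|P_t\varphi\|_{L^q(\gamma_{0,1};\mathbb{R})} = \infty$.

Just as was the case for the Poincaré inequality, one can use Fubini's theorem to check that (2.3.9) implies

$$\int_{\mathbb{R}^N}\varphi^2\log\frac{\varphi^2}{\|\varphi\|^2_{L^2(\gamma_{0,1}^N;\mathbb{R})}}\,d\gamma_{0,1}^N \le 2\bigl\||\nabla\varphi|\bigr\|^2_{L^2(\gamma_{0,1}^N;\mathbb{R})} \tag{2.3.11}$$

for $\varphi \in C^2(\mathbb{R}^N;\mathbb{R})$. Thus, by Gross's argument, one knows that

$$\bigl\|P_t^{(N)}\varphi\bigr\|_{L^{q(t)}(\gamma_{0,1}^N;\mathbb{R})} \le \|\varphi\|_{L^p(\gamma_{0,1}^N;\mathbb{R})}\quad\text{when } p \in (1,\infty) \text{ and } q(t) = 1 + (p-1)e^t. \tag{2.3.12}$$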
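The role of the exponent $q(t) = 1 + (p-1)e^t$ can be seen in closed form for $\varphi(x) = e^{\alpha x}$. By (2.1.1), $\|\varphi\|_{L^p(\gamma_{0,1})} = e^{\frac{p\alpha^2}{2}}$, and by (2.3.2) together with (2.1.1), $P_t\varphi(x) = \exp\bigl(\alpha e^{-\frac t2}x + \frac{\alpha^2(1-e^{-t})}{2}\bigr)$, so that $\|P_t\varphi\|_{L^q(\gamma_{0,1})} = \exp\bigl(\frac{\alpha^2(1-e^{-t})}{2} + \frac{q\alpha^2e^{-t}}{2}\bigr)$. These formulas are derived here for the illustration rather than quoted from the text, and the function names in the Python sketch below are hypothetical.

```python
import math

def q_nelson(p: float, t: float) -> float:
    """The exponent q(t) = 1 + (p - 1) e^t appearing in (2.3.10)."""
    return 1.0 + (p - 1.0) * math.exp(t)

def log_norm_ratio(alpha: float, p: float, q: float, t: float) -> float:
    """log( ||P_t phi||_{L^q} / ||phi||_{L^p} ) for phi(x) = exp(alpha x),
    using the closed forms stated in the lead-in."""
    log_lp = p * alpha ** 2 / 2.0
    log_lq = alpha ** 2 * (1.0 - math.exp(-t)) / 2.0 + q * alpha ** 2 * math.exp(-t) / 2.0
    return log_lq - log_lp

p, t = 2.0, 0.7
q = q_nelson(p, t)

for alpha in (1.0, 3.0, 10.0):
    print(alpha,
          log_norm_ratio(alpha, p, q, t),         # zero up to rounding: equality in (2.3.10)
          log_norm_ratio(alpha, p, q + 0.1, t))   # positive and growing like alpha^2
```

The log ratio equals $\frac{\alpha^2}{2}e^{-t}\bigl(q - q(t)\bigr)$, so for any $q > q(t)$ the ratio of norms is unbounded as $\alpha \to \infty$, consistent with the claim that $q(t)$ cannot be improved.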
Because his goal was to construct quantum fields on infinite dimensional spaces, Nelson's interest in these matters was their dimension independence. Given a Borel measurable function $V : \mathbb{R}^N \longrightarrow \mathbb{R}$ which is bounded either below or above, a key step in his program was to prove a dimension independent estimate for

$$\Lambda(V) = \sup\Bigl\{\langle V\varphi^2,\gamma_{0,1}^N\rangle - \tfrac12\bigl\||\nabla\varphi|\bigr\|^2_{L^2(\gamma_{0,1}^N;\mathbb{R})} : \varphi \in C^1(\mathbb{R}^N;\mathbb{R}) \text{ with } \|\varphi\|_{L^2(\gamma_{0,1}^N;\mathbb{R})} = 1\Bigr\}, \tag{2.3.13}$$

which is the upper bound for the spectrum of the Schrödinger operator

$$L^{(N)} + V = \tfrac12\bigl(\Delta - (x,\nabla)_{\mathbb{R}^N}\bigr) + V$$

on $L^2(\gamma_{0,1}^N;\mathbb{R})$, and what he showed is that

$$\Lambda(V) \le \frac14\log\bigl\langle e^{4V},\gamma_{0,1}^N\bigr\rangle. \tag{2.3.14}$$
Because most familiar techniques for proving bounds on the spectrum of an operator are dimension dependent, anyone who has attempted to obtain such bounds will recognize that (2.3.14) is a remarkable result.

Although Nelson based his proof of (2.3.14) on (2.3.12), it can be seen as a direct consequence of (2.3.11) combined with an interesting variational formula for the quantity on the left hand side of (2.3.11). To be precise, given a pair of probability measures $\mu$ and $\nu$ on a measurable space $(E,\mathcal{F})$, define the relative entropy $H(\nu|\mu)$ of $\nu$ with respect to $\mu$ by

$$H(\nu|\mu) = \begin{cases}\displaystyle\int f\log f\,d\mu & \text{if } \nu \ll \mu \text{ and } f = \dfrac{d\nu}{d\mu},\\[2mm] \infty & \text{if } \nu \not\ll \mu.\end{cases}$$
Notice that, when $\|\varphi\|_{L^2(\gamma_{0,1}^N;\mathbb{R})} = 1$, $\varphi^2$ can be thought of as the density of the probability measure $\varphi^2\,d\gamma_{0,1}^N$ with respect to $\gamma_{0,1}^N$, and the left hand side of (2.3.11) is the relative entropy of this measure with respect to $\gamma_{0,1}^N$. Relative entropy is a quantity that appears in a variety of settings, including information theory and the theory of large deviations, and it has many interesting properties. The property that we will use here is the content of the following lemma.

Lemma 2.3.1 Let $B(E;\mathbb{R})$ denote the space of bounded, $\mathcal{F}$-measurable $\mathbb{R}$-valued functions on $E$, and, for any probability measures $\mu$ and $\nu$ on $(E,\mathcal{F})$, set

$$I_\mu(\nu) = \sup\bigl\{\langle\psi,\nu\rangle - \log\langle e^\psi,\mu\rangle : \psi \in B(E;\mathbb{R})\bigr\}.$$

Then $H(\nu|\mu) = I_\mu(\nu)$.

Before proving Lemma 2.3.1, notice that (2.3.14) is a relatively easy consequence of (2.3.11) combined with Lemma 2.3.1 applied to $\mu = \gamma_{0,1}^N$. Namely, given $\varphi \in C^1(\mathbb{R}^N;\mathbb{R})$ with $\|\varphi\|_{L^2(\gamma_{0,1}^N;\mathbb{R})} = 1$, set $f = \varphi^2$ and $\psi = 4V$, and take $d\nu = f\,d\mu$. Then $H(\nu|\gamma_{0,1}^N) = \langle\varphi^2\log\varphi^2,\gamma_{0,1}^N\rangle \le 2\bigl\||\nabla\varphi|\bigr\|^2_{L^2(\gamma_{0,1}^N;\mathbb{R})}$, and so, if $V$ is bounded, then, by Lemma 2.3.1,

$$2\bigl\||\nabla\varphi|\bigr\|^2_{L^2(\gamma_{0,1}^N;\mathbb{R})} \ge 4\langle V\varphi^2,\gamma_{0,1}^N\rangle - \log\langle e^{4V},\gamma_{0,1}^N\rangle,$$

which means that

$$\frac14\log\langle e^{4V},\gamma_{0,1}^N\rangle \ge \langle V\varphi^2,\gamma_{0,1}^N\rangle - \frac12\bigl\||\nabla\varphi|\bigr\|^2_{L^2(\gamma_{0,1}^N;\mathbb{R})},$$

which proves (2.3.14) for bounded $V$'s. Finally, knowing it for bounded $V$'s, it is easy to extend it to $V$'s which are bounded either above or below.

Proof of Lemma 2.3.1 First note that $\nu \longmapsto I_\mu(\nu)$ is non-negative and convex. Further, because, by Jensen's inequality, $\langle\psi,\mu\rangle \le \log\langle e^\psi,\mu\rangle$, $I_\mu(\mu) = 0$.

Next, given $\theta \in (0,1)$, set $\nu_\theta = (1-\theta)\nu + \theta\mu$. Then, because $H(\mu|\mu) = 0$, convexity implies $H(\nu_\theta|\mu) \le (1-\theta)H(\nu|\mu)$. On the other hand, $\nu \not\ll \mu \implies \nu_\theta \not\ll \mu \implies H(\nu_\theta|\mu) = \infty$, and if $d\nu = f\,d\mu$, then $d\nu_\theta = f_\theta\,d\mu$, where $f_\theta = (1-\theta)f + \theta$, and therefore, since $\log$ is non-decreasing and concave,

$$H(\nu_\theta|\mu) = \langle f_\theta\log f_\theta,\mu\rangle = \theta\langle\log f_\theta,\mu\rangle + (1-\theta)\langle f\log f_\theta,\mu\rangle \ge \theta\log\theta + (1-\theta)^2H(\nu|\mu).$$

Hence $H(\nu|\mu) = \lim_{\theta\searrow0}H(\nu_\theta|\mu)$.
https://avxhm.se/blogs/hill0
2.3 Gaussian Spectral Properties
33
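We now show that $I_\mu(\nu) \le H(\nu|\mu)$, and clearly we need do so only when $H(\nu|\mu) < \infty$ and therefore $d\nu = f\,d\mu$. If $f$ is everywhere positive, then, by Jensen's inequality,
$$\exp\bigl(\langle\psi,\nu\rangle - H(\nu|\mu)\bigr) = \exp\bigl(\langle\psi - \log f,\nu\rangle\bigr) \le \Bigl\langle\frac{e^\psi}{f},\nu\Bigr\rangle = \langle e^\psi,\mu\rangle,$$
and so $\langle\psi,\nu\rangle - \log\langle e^\psi,\mu\rangle \le H(\nu|\mu)$. If $f$ can vanish, consider $\nu_\theta$, and, after letting $\theta\searrow0$, conclude that this inequality continues to hold, and therefore that $I_\mu(\nu)\le H(\nu|\mu)$.

In proving that $H(\nu|\mu)\le I_\mu(\nu)$, we will assume $I_\mu(\nu)<\infty$. The first step is to show that $\nu\ll\mu$. Thus, suppose $\Gamma\in\mathcal F$ with $\mu(\Gamma) = 0$, and, given $r>0$, set $\psi_r = r\mathbf 1_\Gamma$. Then
$$I_\mu(\nu) \ge \langle\psi_r,\nu\rangle - \log\langle e^{\psi_r},\mu\rangle = r\nu(\Gamma)$$
for all $r>0$, and so $\nu(\Gamma) = 0$. Now assume that $d\nu = f\,d\mu$. If $f$ is bounded and uniformly positive, then $\log f\in B(E;\mathbb R)$, and so
$$I_\mu(\nu) \ge \langle\log f,\nu\rangle - \log\langle f,\mu\rangle = H(\nu|\mu).$$
If $f$ is uniformly positive and not necessarily bounded, set $f_n = f\wedge n$. Then, since $\lim_{n\to\infty}\langle f_n,\mu\rangle = 1$,
$$H(\nu|\mu) = \lim_{n\to\infty}\langle\log f_n,\nu\rangle = \lim_{n\to\infty}\bigl(\langle\log f_n,\nu\rangle - \log\langle f_n,\mu\rangle\bigr) \le I_\mu(\nu).$$
Finally, if $f$ can vanish, $H(\nu_\theta|\mu)\le I_\mu(\nu_\theta)\le(1-\theta)I_\mu(\nu)$, and so $H(\nu|\mu)\le I_\mu(\nu)$.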
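The variational formula in Lemma 2.3.1 is easy to test on a finite set, where every function is bounded and measurable. The following is a minimal sketch, not part of the text: the three-point measures `mu` and `nu` and the trial functions are arbitrary illustrative choices. It checks that the supremum is attained at $\psi = \log f$ with $f = \frac{d\nu}{d\mu}$ and that other choices of $\psi$ give smaller values.

```python
import math

def relative_entropy(nu, mu):
    """H(nu|mu) on a finite set; math.inf when nu is not absolutely continuous."""
    total = 0.0
    for n, m in zip(nu, mu):
        if n > 0.0:
            if m == 0.0:
                return math.inf
            total += n * math.log(n / m)
    return total

def functional(psi, nu, mu):
    """<psi, nu> - log <e^psi, mu>, the quantity maximized in I_mu(nu)."""
    mean_psi = sum(p * n for p, n in zip(psi, nu))
    log_mgf = math.log(sum(math.exp(p) * m for p, m in zip(psi, mu)))
    return mean_psi - log_mgf

# Illustrative (hypothetical) probability vectors on a three-point space.
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.2, 0.6]

h = relative_entropy(nu, mu)
psi_star = [math.log(n / m) for n, m in zip(nu, mu)]   # psi = log f
value_at_log_f = functional(psi_star, nu, mu)

# A few arbitrary trial functions; none should beat psi = log f.
trials = [[0.0, 0.0, 0.0], [1.0, -1.0, 2.0], [-3.0, 0.5, 0.25]]
trial_values = [functional(psi, nu, mu) for psi in trials]
```

Because $\langle f,\mu\rangle = 1$, the value at $\psi=\log f$ is exactly $\langle\log f,\nu\rangle = H(\nu|\mu)$, which is the computation used in the proof for bounded, uniformly positive $f$.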
2.3.2 Hermite Polynomials

For $n\ge0$, set $H_n(x) = (-1)^n e^{\frac{x^2}{2}}\partial_x^n e^{-\frac{x^2}{2}}$. Then $\partial H_n(x) = xH_n(x) - H_{n+1}(x)$, and so $H_{n+1} = a_+H_n$, where $a_+ = x - \partial_x$ is the raising operator. Proceeding by induction, one sees that $H_n$ is an $n$th order polynomial, known as the $n$th unnormalized Hermite polynomial, for which 1 is the coefficient of $x^n$ and $H_n(-x) = (-1)^nH_n(x)$. Next, observe
$$(a_+\varphi,\psi)_{L^2(\gamma_{0,1};\mathbb R)} = (\varphi,\partial\psi)_{L^2(\gamma_{0,1};\mathbb R)} \tag{2.3.15}$$
for $\varphi,\psi\in C^1(\mathbb R;\mathbb C)$ whose derivatives have at most exponential growth. Hence, if $m\le n$, then
$$(H_n,H_m)_{L^2(\gamma_{0,1};\mathbb R)} = (a_+^nH_0,H_m)_{L^2(\gamma_{0,1};\mathbb R)} = (H_0,\partial^nH_m)_{L^2(\gamma_{0,1};\mathbb R)} = n!\,\delta_{m,n},$$
and therefore $\|H_n\|_{L^2(\gamma_{0,1};\mathbb R)} = (n!)^{\frac12}$ and $\{H_n : n\ge0\}$ is an orthogonal sequence in $L^2(\gamma_{0,1};\mathbb R)$. In addition, for $n\ge1$,
$$(\partial H_n,H_m)_{L^2(\gamma_{0,1};\mathbb R)} = (H_n,a_+H_m)_{L^2(\gamma_{0,1};\mathbb R)} = \begin{cases} n! & \text{if } m = n-1,\\ 0 & \text{if } m\ne n-1.\end{cases}$$
Since $\partial H_n$ is in the span of $\{H_m : 0\le m<n\}$, $\partial H_n = \sum_{m=0}^{n-1}\alpha_mH_m$ for some $\{\alpha_m : 0\le m<n\}\subseteq\mathbb R$. By taking inner products in $L^2(\gamma_{0,1};\mathbb R)$, one finds that $\alpha_m = 0$ for $m<n-1$ and $(n-1)!\,\alpha_{n-1} = n!$, and thereby concludes that $\partial H_n = nH_{n-1}$. Summarizing our results thus far,
$$(x-\partial)H_n = H_{n+1},\quad \partial H_n = nH_{n-1},\quad (H_n,H_m)_{L^2(\gamma_{0,1};\mathbb R)} = n!\,\delta_{m,n}. \tag{2.3.16}$$
In particular,
$$LH_n = -\frac n2H_n, \tag{2.3.17}$$
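where $L = \frac12(\partial^2 - x\partial)$ is the Ornstein–Uhlenbeck operator. To show that $\{H_n : n\ge0\}$ is a basis in $L^2(\gamma_{0,1};\mathbb R)$, use the exponential Taylor's series to see that
$$e^{\zeta x-\frac{\zeta^2}{2}} = \sum_{n=0}^\infty\frac{\zeta^n}{n!}H_n(x)\quad\text{for } (\zeta,x)\in\mathbb C\times\mathbb R, \tag{2.3.18}$$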
where the convergence is uniform on compact subsets. In addition, using the preceding computations, one sees that, as a function of $x$, the convergence is in $L^2(\gamma_{0,1};\mathbb R)$ uniformly for $\zeta$ in compact subsets. Now suppose that $\varphi\in L^2(\gamma_{0,1};\mathbb R)$ is orthogonal to $\{H_n : n\ge0\}$, and set $\psi(x) = (2\pi)^{-\frac12}e^{-\frac{x^2}{2}}\varphi(x)$. Then, $\psi\in L^1(\lambda_{\mathbb R};\mathbb R)\cap L^2(\lambda_{\mathbb R};\mathbb R)$, and, by (2.3.18) with $\zeta = i\xi$,
$$\hat\psi(\xi) = e^{-\frac{\xi^2}{2}}\sum_{n=0}^\infty\frac{(i\xi)^n}{n!}(\varphi,H_n)_{L^2(\gamma_{0,1};\mathbb R)} = 0,$$
and so $\psi$, and therefore $\varphi$, vanish (a.e., $\lambda_{\mathbb R}$). Hence, we now know that if $\tilde H_n = \frac{H_n}{\sqrt{n!}}$ is the normalized $n$th Hermite polynomial, then $\{\tilde H_n : n\ge0\}$ is an orthonormal basis in $L^2(\gamma_{0,1};\mathbb R)$ consisting of eigenfunctions for the Ornstein–Uhlenbeck operator $L$ in (2.3.5).

Define the operators $\{P_t : t>0\}$ as in (2.3.2), and recall that $\partial_tP_t\varphi = LP_t\varphi$. In addition, using the semigroup property and (2.3.4), check that $LP_t\varphi = P_tL\varphi$ for $\varphi\in C^2(\mathbb R;\mathbb C)$ whose second derivative has at most polynomial growth. Thus, by (2.3.17), $\partial_tP_tH_n = -\frac n2P_tH_n$, and so $P_tH_n = e^{-\frac{nt}{2}}H_n$. Now let $\varphi\in L^2(\gamma_{0,1};\mathbb R)$ be given. Then
$$\varphi = \sum_{n=0}^\infty\frac{(\varphi,H_n)_{L^2(\gamma_{0,1};\mathbb R)}}{n!}H_n,$$
where the convergence is in $L^2(\gamma_{0,1};\mathbb R)$, and so, since $P_t$ is self-adjoint on $L^2(\gamma_{0,1};\mathbb R)$ and therefore $(P_t\varphi,H_n)_{L^2(\gamma_{0,1};\mathbb R)} = (\varphi,P_tH_n)_{L^2(\gamma_{0,1};\mathbb R)}$,
$$P_t\varphi = \sum_{n=0}^\infty e^{-\frac{nt}{2}}\frac{(\varphi,H_n)_{L^2(\gamma_{0,1};\mathbb R)}}{n!}H_n\quad\text{for } \varphi\in L^2(\gamma_{0,1};\mathbb R), \tag{2.3.19}$$
where the convergence is in $L^2(\gamma_{0,1};\mathbb R)$.
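The relations in (2.3.16) can be checked exactly with integer arithmetic. The sketch below is an illustration only, and its helper names are hypothetical. It builds the coefficients of $H_n$ from $H_{n+1} = (x-\partial)H_n$ and integrates products against $\gamma_{0,1}$ using the standard Gaussian moments, $(2k-1)!!$ for $x^{2k}$ and 0 for odd powers.

```python
from math import factorial

def hermite(n):
    """Integer coefficients (lowest degree first) of H_n via H_{k+1} = x H_k - H_k'."""
    h = [1]
    for _ in range(n):
        shifted = [0] + h                                  # coefficients of x * H_k
        deriv = [k * h[k] for k in range(1, len(h))]       # coefficients of H_k'
        deriv += [0] * (len(shifted) - len(deriv))
        h = [a - b for a, b in zip(shifted, deriv)]
    return h

def gaussian_moment(k):
    """Integral of x^k against gamma_{0,1}: 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0
    result = 1
    for j in range(1, k, 2):
        result *= j
    return result

def inner(p, q):
    """(p, q) in L^2(gamma_{0,1}) for polynomials given by coefficient lists."""
    return sum(a * b * gaussian_moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

def derivative(p):
    return [k * p[k] for k in range(1, len(p))]

# Gram matrix of H_0, ..., H_5; by (2.3.16) it should be diag(0!, 1!, ..., 5!).
gram = [[inner(hermite(m), hermite(n)) for n in range(6)] for m in range(6)]
```

Since everything is computed with Python integers, the orthogonality relation and $\partial H_n = nH_{n-1}$ can be compared for exact equality rather than up to a tolerance.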
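2.3.3 Hermite Functions

For $n\ge0$, the $n$th unnormalized Hermite function $h_n$ is given by $h_n(x) = e^{-\frac{x^2}{2}}H_n(2^{\frac12}x)$. Using (2.3.16), one can check that
$$(x-\partial)h_n = 2^{\frac12}h_{n+1},\quad (x+\partial)h_n = 2^{\frac12}nh_{n-1},\quad\text{and}\quad (h_n,h_m)_{L^2(\lambda_{\mathbb R};\mathbb R)} = \pi^{\frac12}n!\,\delta_{m,n}, \tag{2.3.20}$$
and therefore that
$$\mathcal Hh_n = -\bigl(n+\tfrac12\bigr)h_n\quad\text{where } \mathcal H = \tfrac12\bigl(\partial^2 - x^2\bigr). \tag{2.3.21}$$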
Mathematicians call $\mathcal H$ the Hermite operator, and physicists call it the one dimensional harmonic oscillator Hamiltonian. Clearly, the $h_n$'s are mutually orthogonal elements of $\mathscr S(\mathbb R;\mathbb R)$. To check that they form a basis in $L^2(\lambda_{\mathbb R};\mathbb C)$, let $\varphi\in L^2(\lambda_{\mathbb R};\mathbb C)$, set $\psi(x) = \pi^{\frac12}e^{\frac{x^2}{4}}\varphi(2^{-\frac12}x)$, and check that $\psi\in L^2(\gamma_{0,1};\mathbb C)$ and $(\varphi,h_n)_{L^2(\lambda_{\mathbb R};\mathbb C)} = (\psi,H_n)_{L^2(\gamma_{0,1};\mathbb C)}$. Hence, if $\varphi$ is orthogonal to $\{h_n : n\ge0\}$, then $\psi = 0$ (a.e., $\lambda_{\mathbb R}$), and so $\{h_n : n\ge0\}$ is an orthogonal basis in $L^2(\lambda_{\mathbb R};\mathbb R)$ consisting of eigenfunctions for $\mathcal H$. As the following lemma shows, they are also eigenfunctions for the Fourier transform.

Lemma 2.3.2 For each $n\ge0$, $\hat h_n = i^n(2\pi)^{\frac12}h_n$.

Proof First note that $\hat h_0 = (2\pi)^{\frac12}h_0$. Next, assume that $\hat h_n = i^n(2\pi)^{\frac12}h_n$, and conclude that
$$2^{\frac12}\hat h_{n+1}(\xi) = \int e^{i\xi x}xh_n(x)\,dx - \int e^{i\xi x}h_n'(x)\,dx = -i\partial_\xi\hat h_n(\xi) + i\xi\hat h_n(\xi) = i^{n+1}(2\pi)^{\frac12}(\xi-\partial_\xi)h_n(\xi) = i^{n+1}2^{\frac12}(2\pi)^{\frac12}h_{n+1}(\xi).$$
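We can use Lemma 2.3.2 to get uniform size estimates of the $h_n$'s. After multiplying both sides of (2.3.21) by $2h_n$ and integrating, we know that
$$\|xh_n\|^2_{L^2(\lambda_{\mathbb R};\mathbb R)} + \|h_n'\|^2_{L^2(\lambda_{\mathbb R};\mathbb R)} = (2n+1)\|h_n\|^2_{L^2(\lambda_{\mathbb R};\mathbb R)} = (2n+1)\pi^{\frac12}n!,$$
and therefore
$$\int(1+x^2)h_n(x)^2\,dx \le 2(n+1)\pi^{\frac12}n!.$$
As a consequence of this and Schwarz's inequality,
$$\|h_n\|_{L^1(\lambda_{\mathbb R};\mathbb R)} = \int(1+x^2)^{-\frac12}(1+x^2)^{\frac12}|h_n(x)|\,dx \le \pi^{\frac12}\bigl(2(n+1)\pi^{\frac12}n!\bigr)^{\frac12},$$
and so
$$\|h_n\|_{L^1(\lambda_{\mathbb R};\mathbb R)} \le 2^{\frac12}\pi^{\frac34}(n+1)^{\frac12}(n!)^{\frac12}.$$
At the same time, by Lemma 2.3.2, we have that
$$\|h_n\|_u = (2\pi)^{-\frac12}\|\hat h_n\|_u \le (2\pi)^{-\frac12}\|h_n\|_{L^1(\lambda_{\mathbb R};\mathbb R)}.$$
Starting from this and using the fact coming from (2.3.20) that $h_n' = 2^{-\frac12}\bigl(nh_{n-1} - h_{n+1}\bigr)$, one sees that
$$\|h_n\|_u \le \pi^{\frac14}\bigl((n+1)!\bigr)^{\frac12}\quad\text{and}\quad \|h_n'\|_u \le 2^{-\frac12}\pi^{\frac14}(2n+1)(n!)^{\frac12}, \tag{2.3.22}$$
from which it follows that
$$|H_n(x)| \le \pi^{\frac14}\bigl((n+1)!\bigr)^{\frac12}e^{\frac{x^2}{4}}\quad\text{and}\quad |H_n'(x)| \le \pi^{\frac14}n(n!)^{\frac12}e^{\frac{x^2}{4}} \tag{2.3.23}$$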
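since $H_n' = nH_{n-1}$.

Define the operator $Q_t$ for $t>0$ and $\varphi\in L^2(\lambda_{\mathbb R};\mathbb R)$ by
$$Q_t\varphi = \frac{e^{-\frac t2}}{\pi^{\frac12}}\sum_{n=0}^\infty e^{-nt}\frac{(\varphi,h_n)_{L^2(\lambda_{\mathbb R};\mathbb R)}}{n!}h_n. \tag{2.3.24}$$
Equivalently, $Q_t$ is the operator on $L^2(\lambda_{\mathbb R};\mathbb R)$ for which
$$Q_th_n = e^{-(n+\frac12)t}h_n. \tag{2.3.25}$$
Using the properties of $\{h_n : n\ge0\}$ in (2.3.20), one can check that $\{Q_t : t>0\}$ is a semigroup of self-adjoint operators and that $\|Q_t\varphi\|_{L^2(\lambda_{\mathbb R};\mathbb R)} \le e^{-\frac t2}\|\varphi\|_{L^2(\lambda_{\mathbb R};\mathbb R)}$. In addition, if $\varphi\in\mathscr S(\mathbb R;\mathbb C)$, then (2.3.21) says that
$$\partial_tQ_t\varphi = \frac{e^{-\frac t2}}{\pi^{\frac12}}\sum_{n=0}^\infty e^{-nt}\frac{(\varphi,\mathcal Hh_n)_{L^2(\lambda_{\mathbb R};\mathbb R)}}{n!}h_n,$$
and so
$$\partial_tQ_t\varphi = Q_t\mathcal H\varphi\quad\text{for } \varphi\in\mathscr S(\mathbb R;\mathbb C). \tag{2.3.26}$$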
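Recall that for $\varphi\in C_b(\mathbb R;\mathbb R)$,
$$\int p(t,x,y)\varphi(y)\,dy = \sum_{n=0}^\infty e^{-\frac{nt}{2}}\frac{(\varphi,H_n)_{L^2(\gamma_{0,1};\mathbb R)}}{n!}H_n(x)$$
$$\text{where}\quad p(t,x,y) = \bigl(2\pi(1-e^{-t})\bigr)^{-\frac12}\exp\Bigl(-\frac{(y-e^{-\frac t2}x)^2}{2(1-e^{-t})}\Bigr).$$
Using the first estimate in (2.3.23), observe that, for each $\epsilon>0$, the series
$$\sum_{n=0}^\infty e^{-\frac{nt}{2}}\frac{H_n(x)H_n(y)}{n!}$$
is absolutely convergent uniformly for $t\ge\epsilon$ and $(x,y)$ in compact subsets. Thus
$$(1-e^{-t})^{-\frac12}\exp\Bigl(-\frac{(y-e^{-\frac t2}x)^2}{2(1-e^{-t})}\Bigr) = \sum_{n=0}^\infty e^{-\frac{nt}{2}}\frac{H_n(x)H_n(y)}{n!}e^{-\frac{y^2}{2}},$$
and so,
$$(1-e^{-t})^{-\frac12}\exp\Bigl(-\frac{e^{-t}x^2 - 2e^{-\frac t2}xy + e^{-t}y^2}{2(1-e^{-t})}\Bigr) = \sum_{n=0}^\infty e^{-\frac{nt}{2}}\frac{H_n(x)H_n(y)}{n!}.$$
Equivalently, if, for $\theta\in\{z\in\mathbb C : \operatorname{Re}z\in(-1,1)\}$,
$$M(\theta,x,y) = (1-\theta^2)^{-\frac12}\exp\Bigl(-\frac{(\theta x)^2 - 2\theta xy + (\theta y)^2}{2(1-\theta^2)}\Bigr),$$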
then
$$M(\theta,x,y) = \sum_{n=0}^\infty\theta^n\frac{H_n(x)H_n(y)}{n!}, \tag{2.3.27}$$
first for $\theta\in(0,1)$ and then, by analytic continuation, for $\theta\in B_{\mathbb C}(0,1)$. The function $M(\theta,x,y)$ is called the Mehler kernel, and (2.3.27) is one of the many formulas in which it appears. Another formula in which it plays a role is in connection with the semigroup $\{Q_t : t>0\}$. Namely, notice that the estimate (2.3.22) guarantees that, for each $\epsilon>0$, the series
$$\sum_{n=0}^\infty e^{-nt}\frac{h_n(x)h_n(y)}{n!}$$
converges absolutely and uniformly for $t\ge\epsilon$ and $(x,y)\in\mathbb R^2$. Moreover,
$$\sum_{n=0}^\infty e^{-nt}\frac{h_n(x)h_n(y)}{n!} = M\bigl(e^{-t},2^{\frac12}x,2^{\frac12}y\bigr)e^{-\frac{x^2+y^2}{2}} = (1-e^{-2t})^{-\frac12}\exp\Bigl(-\frac{(1+e^{-2t})x^2 - 4e^{-t}xy + (1+e^{-2t})y^2}{2(1-e^{-2t})}\Bigr).$$
Hence, for $\varphi\in L^1(\lambda_{\mathbb R};\mathbb C)$,
$$Q_t\varphi(x) = \int q(t,x,y)\varphi(y)\,dy,$$
where $q(t,x,y)$ equals
$$\Bigl(\frac{e^{-t}}{\pi(1-e^{-2t})}\Bigr)^{\frac12}\exp\Bigl(-\frac{(1+e^{-2t})x^2 - 4e^{-t}xy + (1+e^{-2t})y^2}{2(1-e^{-2t})}\Bigr) = (2\pi\sinh t)^{-\frac12}\exp\Bigl(-\frac{(\cosh t)x^2 - 2xy + (\cosh t)y^2}{2\sinh t}\Bigr). \tag{2.3.28}$$
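The expansion (2.3.27) can be sanity-checked numerically. The sketch below is an illustration, not part of the text. It evaluates $H_n$ through the three-term recurrence $H_{n+1} = xH_n - nH_{n-1}$, which follows from (2.3.16), and compares a truncated series with the closed form $(1-\theta^2)^{-\frac12}\exp\bigl(-\frac{(\theta x)^2 - 2\theta xy + (\theta y)^2}{2(1-\theta^2)}\bigr)$ that the series sums to. The sample point, the value of $\theta$ and the truncation at 80 terms are arbitrary choices.

```python
import math

def hermite_values(n_max, x):
    """[H_0(x), ..., H_{n_max}(x)] using H_{n+1} = x H_n - n H_{n-1}."""
    values = [1.0, x]
    for n in range(1, n_max):
        values.append(x * values[n] - n * values[n - 1])
    return values[: n_max + 1]

def mehler_series(theta, x, y, terms=80):
    """Partial sum of sum_n theta^n H_n(x) H_n(y) / n!."""
    hx = hermite_values(terms, x)
    hy = hermite_values(terms, y)
    return sum(theta ** n * hx[n] * hy[n] / math.factorial(n)
               for n in range(terms + 1))

def mehler_closed_form(theta, x, y):
    """(1 - theta^2)^(-1/2) exp(-((theta x)^2 - 2 theta x y + (theta y)^2) / (2 (1 - theta^2)))."""
    q = (theta * x) ** 2 - 2.0 * theta * x * y + (theta * y) ** 2
    return (1.0 - theta ** 2) ** -0.5 * math.exp(-q / (2.0 * (1.0 - theta ** 2)))

THETA, X, Y = 0.5, 0.7, -1.2          # arbitrary sample values
series_value = mehler_series(THETA, X, Y)
closed_value = mehler_closed_form(THETA, X, Y)
```

The estimate (2.3.23) bounds the $n$th term by a constant multiple of $(n+1)\theta^n$, so for $\theta = \frac12$ the neglected tail is far below double precision.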
Finally, notice that this expression for $Q_t$ shows that $Q_t$ maps $L^1(\lambda_{\mathbb R};\mathbb R)$ into $\mathscr S(\mathbb R;\mathbb C)$ and therefore that, by (2.3.26), $\partial_sQ_sQ_t\varphi = Q_s\mathcal HQ_t\varphi$. After letting $s\searrow0$, this means that
$$\partial_tQ_t\varphi = \mathcal HQ_t\varphi\quad\text{for } \varphi\in L^1(\lambda_{\mathbb R};\mathbb R). \tag{2.3.29}$$

Exercise 2.3.1
(i) One of the ways in which relative entropy plays a role is as a measure of the distance between measures. In fact, show that
$$\|\nu-\mu\|^2_{\mathrm{var}} \le 2H(\nu|\mu). \tag{2.3.30}$$
Hint: Reduce to the case when $d\nu = f\,d\mu$. Next, check that $3(t-1)^2 \le (4+2t)\bigl(t\log t - t + 1\bigr)$ for $t\ge0$, and use this and Schwarz's inequality to estimate $\int|f-1|\,d\mu$.
(ii) Show that
$$\sum_{m=0}^\infty\frac{z^m}{m!}h_m(x) = \exp\Bigl(-\frac{x^2}{2} + 2^{\frac12}zx - \frac{z^2}{2}\Bigr),$$
where the convergence is in $L^2(\lambda_{\mathbb R};\mathbb C)$ uniformly for $z$ in compact subsets of $\mathbb C$, and use this to give another proof that $\hat h_n(\xi) = \sqrt{2\pi}\,i^nh_n(\xi)$ for $\xi\in\mathbb R$.
(iii) Using (2.3.20), show that, for any $n,k,\ell\in\mathbb N$ there exist
$$\bigl\{a^{(k,\ell)}_{n,j} : -(k+\ell)\wedge n\le j\le k+\ell\bigr\}\subseteq\mathbb R$$
and $C^{(k,\ell)}<\infty$ such that $|a^{(k,\ell)}_{n,j}|\le C^{(k,\ell)}$ and
$$x^k\partial^\ell\tilde h_n(x) = (n+1)^{\frac{k+\ell}{2}}\sum_{j=-(k+\ell)\wedge n}^{k+\ell}a^{(k,\ell)}_{n,j}\tilde h_{n+j}(x).$$
Conclude from this that, for each $m\in\mathbb N$, there is a $C_m<\infty$ such that
$$\max_{k+\ell=m}\sup_{x\in\mathbb R}\bigl|x^k\partial^\ell\tilde h_n(x)\bigr| \le C_mn^{\frac{m+1}{2}}.$$
2.4 Gaussian Families

If $X$ and $Y$ are square integrable random variables on a probability space $(\Omega,\mathcal F,\mathbb P)$, then the covariance $\operatorname{cov}(X,Y)$ of $X$ and $Y$ is the number
$$\mathbb E\bigl[(X-\mathbb E[X])(Y-\mathbb E[Y])\bigr] = \mathbb E[XY] - \mathbb E[X]\mathbb E[Y].$$
If $X_1,\dots,X_n\in L^2(\mathbb P;\mathbb R)$, then the covariance $\operatorname{cov}(X_1,\dots,X_n)$ of $\{X_1,\dots,X_n\}$ is the $n\times n$-matrix whose $(k,\ell)$th entry is $\operatorname{cov}(X_k,X_\ell)$.

Lemma 2.4.1 Given random variables $X_1,\dots,X_n\in L^2(\mathbb P;\mathbb R)$, set $X = (X_1,\dots,X_n)^\top$ and $A = \operatorname{cov}(X_1,\dots,X_n)$. If $\mathbb E[X_k] = 0$ for all $1\le k\le n$, then the span $\operatorname{span}\{X_1,\dots,X_n\}$ of $\{X_1,\dots,X_n\}$ in $L^2(\mathbb P;\mathbb R)$ equals $\bigl\{(\xi,X)_{\mathbb R^n} : \xi\perp\operatorname{Null}(A)\bigr\}$.

Proof Let $\Pi$ denote orthogonal projection from $\mathbb R^n$ onto $\operatorname{Null}(A)$. Then $X = (I-\Pi)X + \Pi X$ and $\mathbb E\bigl[|\Pi X|^2\bigr] = \operatorname{Trace}(\Pi A\Pi) = 0$. Hence, $X = (I-\Pi)X$ (a.s., $\mathbb P$).
Given a probability space (Ω, F, P), a Gaussian family is a subspace G of L 2 (P; R) each of whose elements is Gaussian. A Gaussian family is said to be centered if all its elements have mean value 0. By Lemma 2.1.1, the L 2 -closure of a Gaussian family is again a Gaussian family. If G is a Gaussian family, define the function m G : G −→ R by m G (X ) = E[X ] and the function CG : G 2 −→ R by CG (X, Y ) = cov(X, Y ). The functions m G and CG are known as the mean and covariance of G.
2.4.1 A Few Basic Facts

Set $\gamma_{0,I} = \gamma_{0,1}^N$. That is, $\gamma_{0,I}\ll\lambda_{\mathbb R^N}$ and
$$\frac{d\gamma_{0,I}}{d\lambda_{\mathbb R^N}}(x) = \prod_{j=1}^N(2\pi)^{-\frac12}e^{-\frac{x_j^2}{2}} = (2\pi)^{-\frac N2}e^{-\frac{|x|^2}{2}}.$$
Given $b\in\mathbb R^N$ and a non-negative, symmetric linear transformation $A$ on $\mathbb R^N$, take $\gamma_{b,A}$ to be the distribution of $b + A^{\frac12}x$ under $\gamma_{0,I}$. When $A$ is non-degenerate, it is easy to check that $\gamma_{b,A}\ll\lambda_{\mathbb R^N}$ and
$$\frac{d\gamma_{b,A}}{d\lambda_{\mathbb R^N}}(x) = g_{b,A}(x) \equiv \bigl((2\pi)^N\det(A)\bigr)^{-\frac12}\exp\Bigl(-\frac{\bigl(x-b,A^{-1}(x-b)\bigr)_{\mathbb R^N}}{2}\Bigr). \tag{2.4.1}$$
In keeping with the notation used in the real valued case, I will write $X\in N(b,A)$ to mean that $X$ is an $\mathbb R^N$-valued random variable whose distribution is $\gamma_{b,A}$.

Remark: When $N = 2$ and $A = \begin{pmatrix}1&\theta\\ \theta&1\end{pmatrix}$ for some $\theta\in(-1,1)$, a remark that statisticians have found useful is the relation between $g_{0,A}$ and the Mehler kernel. Namely,
$$g_{0,A}(x_1,x_2) = (2\pi)^{-1}M(\theta,x_1,x_2)e^{-\frac{x_1^2+x_2^2}{2}},$$
and so
$$\frac{d\gamma_{0,A}}{d\gamma_{0,I}}(x) = \sum_{n=0}^\infty\theta^n\frac{H_n(x_1)H_n(x_2)}{n!}.$$
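Lemma 2.4.2 $\gamma_{b,A}$ is the one and only $\mu\in M_1(\mathbb R^N)$ with the property that $\{(x,\xi)_{\mathbb R^N} : \xi\in\mathbb R^N\}$ is a Gaussian family $G$ under $\mu$ for which
$$m_G\bigl((x,\xi)_{\mathbb R^N}\bigr) = (b,\xi)_{\mathbb R^N}\quad\text{and}\quad C_G\bigl((x,\xi)_{\mathbb R^N},(x,\xi)_{\mathbb R^N}\bigr) = (\xi,A\xi)_{\mathbb R^N}\quad\text{for all } \xi\in\mathbb R^N.$$
In addition,¹
$$\int e^{(\zeta,y)_{\mathbb R^N}}\,\gamma_{b,A}(dy) = \exp\Bigl((b,\zeta)_{\mathbb R^N} + \frac{(\zeta,A\zeta)_{\mathbb R^N}}{2}\Bigr)\quad\text{for } \zeta\in\mathbb C^N, \tag{2.4.2}$$
and, if $\alpha<\|A\|_{\mathrm{op}}^{-1}$, then
$$\int e^{\frac{\alpha|x|^2}{2}}\,\gamma_{0,A}(dx) = \det(I-\alpha A)^{-\frac12}. \tag{2.4.3}$$

Proof To prove (2.4.2), begin with the case when $b = 0$ and $A = I$, in which case, since $\gamma_{0,I} = \gamma_{0,1}^N$, the desired result follows from (2.1.1). To handle the general case, check that
$$\int e^{(\zeta,y)_{\mathbb R^N}}\,\gamma_{b,A}(dy) = \int e^{(\zeta,b+A^{\frac12}x)_{\mathbb R^N}}\,\gamma_{0,I}(dx) = e^{(\zeta,b)_{\mathbb R^N}+\frac12(A^{\frac12}\zeta,A^{\frac12}\zeta)_{\mathbb R^N}}.$$
From (2.4.2) we know that
$$\widehat{\gamma_{b,A}}(\xi) = \exp\Bigl(i(b,\xi)_{\mathbb R^N} - \frac{(\xi,A\xi)_{\mathbb R^N}}{2}\Bigr) \tag{2.4.4}$$
and that any $\mu\in M_1(\mathbb R^N)$ with the stated properties will have the same characteristic function as $\gamma_{b,A}$. To prove (2.4.3), let $a_1,\dots,a_N$ be the eigenvalues of $A$, and observe that, by (2.1.2),
$$\int e^{\frac{\alpha|x|^2}{2}}\,\gamma_{0,A}(dx) = \prod_{k=1}^N\int_{\mathbb R}e^{\frac{\alpha a_kx_k^2}{2}}\,\gamma_{0,1}(dx_k) = \prod_{k=1}^N(1-\alpha a_k)^{-\frac12} = \det(I-\alpha A)^{-\frac12}.$$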
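A quick numerical check of (2.4.3) in dimension $N = 2$ is given below. It is only a sketch under stated assumptions: the matrix $A$, the value of $\alpha$ and the integration grid are arbitrary choices, and the Cholesky factor $L$ of $A$ is used in place of $A^{\frac12}$, which is harmless because $Lz$ and $A^{\frac12}z$ have the same distribution $\gamma_{0,A}$ under $\gamma_{0,I}$.

```python
import math

def cholesky_2x2(a, b, c):
    """Lower-triangular L with L L^T = [[a, b], [b, c]] (positive definite case)."""
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return l11, l21, l22

def exp_moment(a, b, c, alpha, half_width=12.0, step=0.1):
    """Riemann-sum approximation of the integral of exp(alpha |x|^2 / 2) against gamma_{0,A}."""
    l11, l21, l22 = cholesky_2x2(a, b, c)
    n = int(round(2 * half_width / step))
    grid = [-half_width + k * step for k in range(n + 1)]
    total = 0.0
    for z1 in grid:
        x1 = l11 * z1
        for z2 in grid:
            x2 = l21 * z1 + l22 * z2                  # x = L z has distribution gamma_{0,A}
            total += math.exp(0.5 * alpha * (x1 * x1 + x2 * x2)
                              - 0.5 * (z1 * z1 + z2 * z2))
    return total * step * step / (2.0 * math.pi)

def closed_form(a, b, c, alpha):
    """det(I - alpha A)^(-1/2) for A = [[a, b], [b, c]]."""
    det = (1.0 - alpha * a) * (1.0 - alpha * c) - (alpha * b) ** 2
    return det ** -0.5

A = (2.0, 0.5, 1.0)      # entries a, b, c of an illustrative covariance matrix
ALPHA = 0.2              # satisfies alpha < 1 / ||A||_op, since ||A||_op is about 2.21
numeric = exp_moment(*A, ALPHA)
exact = closed_form(*A, ALPHA)
```

The integrand is itself a Gaussian-type function, so an equally spaced sum over a wide window converges very quickly; no special quadrature rule is needed.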
The following result is arguably the most important property of Gaussian families. It says that when dealing with a Gaussian family, independence can be checked by computing covariances. In fact, although independence implies vanishing covariance for any random variables, for Gaussian families, vanishing covariance implies independence. Since independence is a statement that usually involves checking integrals of non-linear functions of random variables, it is remarkable that there is a rich family of random variables for which it suffices to check only linear functions of the random variables.

¹ Even though some of the vectors involved are complex, the inner product here is the Euclidean one, not the Hermitian one.

Theorem 2.4.3 Let $G$ be a centered Gaussian family on the probability space $(\Omega,\mathcal F,\mathbb P)$. For any subset $\emptyset\ne S\subseteq G$, set $\mathcal F_S = \sigma(S)$, the smallest $\sigma$-algebra with respect to which all the elements of $S$ are measurable, and take $S^\perp$ to be the perpendicular complement in $L^2(\mathbb P;\mathbb R)$ of $S$. Then $\mathcal F_S$ is independent of $\mathcal F_{S^\perp\cap G}$.

Proof What we must show is that if $\{X_1,\dots,X_m\}\subseteq S$ and $\{Y_1,\dots,Y_n\}\subseteq S^\perp\cap G$, then the distribution of $(X_1,\dots,X_m,Y_1,\dots,Y_n)^\top$ is the product of the distribution of $(X_1,\dots,X_m)^\top$ with the distribution of $(Y_1,\dots,Y_n)^\top$. To this end, set $A = \operatorname{cov}(X_1,\dots,X_m)$, $B = \operatorname{cov}(Y_1,\dots,Y_n)$, and observe that
$$C \equiv \operatorname{cov}(X_1,\dots,X_m,Y_1,\dots,Y_n) = \begin{pmatrix}A&0\\0&B\end{pmatrix}.$$
Hence, since $\gamma_{0,A}$, $\gamma_{0,B}$, and $\gamma_{0,C}$ are, respectively, the distributions of $(X_1,\dots,X_m)^\top$, $(Y_1,\dots,Y_n)^\top$, and $(X_1,\dots,X_m,Y_1,\dots,Y_n)^\top$,
$$\widehat{\gamma_{0,C}}\begin{pmatrix}\xi\\ \eta\end{pmatrix} = \exp\Bigl(-\frac{(\xi,A\xi)_{\mathbb R^m}+(\eta,B\eta)_{\mathbb R^n}}{2}\Bigr) = \widehat{\gamma_{0,A}}(\xi)\,\widehat{\gamma_{0,B}}(\eta) = \widehat{\gamma_{0,A}\times\gamma_{0,B}}\begin{pmatrix}\xi\\ \eta\end{pmatrix}$$
for all $\xi\in\mathbb R^m$ and $\eta\in\mathbb R^n$, from which it follows that $\gamma_{0,C} = \gamma_{0,A}\times\gamma_{0,B}$.
2.4.2 A Concentration Property of Gaussian Measures

In this subsection I will show that if a Gaussian measure gives positive measure to a set then it is nearly concentrated on a neighborhood of that set. The driving force behind the analysis here is the following beautiful result of Maurey and Pisier.

Theorem 2.4.4 Let $A$ be a strictly positive definite, symmetric transformation on $\mathbb R^N$, and let $X$ be a $N(0,A)$-random variable on $(\Omega,\mathcal F,\mathbb P)$. If $f:\mathbb R^N\longrightarrow\mathbb R$ is a continuous function satisfying
$$|f(y)-f(x)| \le \lambda\bigl|A^{-\frac12}(y-x)\bigr|\quad\text{for } x,y\in\mathbb R^N,$$
then, for all $t\in\mathbb R$,
$$\mathbb E\bigl[e^{tf(X)}\bigr]\,\mathbb E\bigl[e^{-tf(X)}\bigr] \le e^{\frac{\lambda^2\pi^2t^2}{8}},$$
and so
$$\mathbb E\bigl[e^{t(f(X)-\mathbb E[f(X)])}\bigr] \le e^{\frac{\lambda^2\pi^2t^2}{8}}\quad\text{and}\quad \mathbb E\bigl[e^{tf(X)}\bigr] \le e^{\frac{\lambda^2\pi^2t^2}{16}}\ \text{if } f \text{ is odd}.$$

Proof First observe that, by replacing $f$ by $\lambda^{-1}f$, we can reduce to the case when $\lambda = 1$. In addition, without loss in generality, we will assume that there is a second $N(0,A)$-random variable $Y$ which is independent of $X$. Finally, note that, after applying a standard mollification procedure, we may assume that $f$ is smooth and $|A^{\frac12}\nabla f|\le1$ everywhere.

Now let $f$ be a smooth function satisfying $|A^{\frac12}\nabla f|\le1$ everywhere, and observe that
$$\mathbb E\bigl[e^{tf(X)}\bigr]\,\mathbb E\bigl[e^{-tf(X)}\bigr] = \mathbb E\bigl[e^{t(f(X)-f(Y))}\bigr]. \tag{$*$}$$
Next, for $\theta\in\mathbb R$, set $X(\theta) = X\cos\theta + Y\sin\theta$ and $Y(\theta) = -X\sin\theta + Y\cos\theta$. Using characteristic functions, it is easy to check that, for each $\theta$, $X(\theta)$ and $Y(\theta)$ are again mutually independent, $N(0,A)$-random variables. Furthermore, by the Fundamental Theorem of Calculus,
$$f(X)-f(Y) = \int_0^{\frac\pi2}\bigl(\nabla f(X(\theta)),Y(\theta)\bigr)_{\mathbb R^N}\,d\theta,$$
and so, by Jensen's inequality,
$$e^{t(f(X)-f(Y))} = \exp\Bigl(t\int_0^{\frac\pi2}\bigl(\nabla f(X(\theta)),Y(\theta)\bigr)_{\mathbb R^N}\,d\theta\Bigr) \le \frac2\pi\int_0^{\frac\pi2}e^{\frac{\pi t}{2}(\nabla f(X(\theta)),Y(\theta))_{\mathbb R^N}}\,d\theta.$$
Hence, by ($*$) and Fubini's Theorem,
$$\mathbb E\bigl[e^{tf(X)}\bigr]\,\mathbb E\bigl[e^{-tf(X)}\bigr] = \mathbb E\bigl[e^{t(f(X)-f(Y))}\bigr] \le \frac2\pi\int_0^{\frac\pi2}\mathbb E\Bigl[e^{\frac{\pi t}{2}(\nabla f(X(\theta)),Y(\theta))_{\mathbb R^N}}\Bigr]\,d\theta.$$
Because $Y(\theta)$ is independent of $X(\theta)$, (2.4.2) implies that
$$\mathbb E\Bigl[e^{\frac{\pi t}{2}(\nabla f(X(\theta)),Y(\theta))_{\mathbb R^N}}\Bigr] = \mathbb E\Bigl[\exp\Bigl(\frac{\pi^2t^2\bigl|A^{\frac12}\nabla f(X(\theta))\bigr|^2}{8}\Bigr)\Bigr] \le e^{\frac{\pi^2t^2}{8}}.$$
To complete the proof, note that, by Jensen's inequality,
$$\mathbb E\bigl[e^{t(f(X)-\mathbb E[f(X)])}\bigr] = \mathbb E\bigl[e^{tf(X)}\bigr]e^{-\mathbb E[tf(X)]} \le \mathbb E\bigl[e^{tf(X)}\bigr]\,\mathbb E\bigl[e^{-tf(X)}\bigr],$$
and, when $f$ is odd, $\mathbb E\bigl[e^{tf(X)}\bigr] = \mathbb E\bigl[e^{-tf(X)}\bigr]$.
As a more or less immediate consequence of Theorem 2.4.4, we have that, for $R>0$,
$$\begin{aligned} \mathbb P\bigl(f(X)-\mathbb E[f(X)]\ge R\bigr) &\le e^{-\frac{2R^2}{\pi^2\lambda^2}},\\ \mathbb P\bigl(|f(X)-\mathbb E[f(X)]|\ge R\bigr) &\le 2e^{-\frac{2R^2}{\pi^2\lambda^2}},\\ \mathbb P\bigl(|f(X)|\ge R\bigr) &\le 2e^{-\frac{4R^2}{\pi^2\lambda^2}}\quad\text{if } f \text{ is odd}. \end{aligned} \tag{2.4.5}$$
Indeed, by Markov's inequality,
$$\mathbb P\bigl(f(X)-\mathbb E[f(X)]\ge R\bigr) \le e^{-tR+\frac{\lambda^2\pi^2t^2}{8}}\quad\text{for all } t\ge0,$$
and so the first of these follows when one takes $t = \frac{4R}{\lambda^2\pi^2}$. Further, given the first estimate, the second follows when the first one is applied to both $f$ and $-f$ and the two are added, and the same argument applies when $f$ is odd.

Perhaps the most interesting aspect of these results is what they say about the concentration property of Gaussian measures, especially the dimension independence of that property. For example, suppose that $X$ is a standard $\mathbb R^N$-valued Gaussian random variable. Then
$$\mathbb E\bigl[|X|\bigr] = \frac{2^{\frac12}\Gamma\bigl(\frac{N+1}{2}\bigr)}{\Gamma\bigl(\frac N2\bigr)} \sim N^{\frac12}\quad\text{as } N\to\infty.$$
On the other hand, by (2.4.5),
$$\mathbb P\Bigl(\bigl||X|-\mathbb E[|X|]\bigr|\ge R\Bigr) \le 2e^{-\frac{2R^2}{\pi^2}}.$$
Thus, with large probability independent of $N$, the values of $X$ will be concentrated in an annular region around the sphere of radius $N^{\frac12}$. The following theorem gives a more general statement of this Gaussian concentration phenomenon.

Theorem 2.4.5 If $X$ is an $\mathbb R^N$-valued, $N(0,A)$-random variable and $\Gamma\in\mathcal B_{\mathbb R^N}$, then²
$$\mathbb P(X\in\Gamma)\wedge\mathbb P\bigl(X\notin\Gamma^{(R)}\bigr) \le e^{-\frac{R^2}{2\pi^2\|A\|_{\mathrm{op}}}}\quad\text{for } R\ge0,$$
where $\Gamma^{(R)} = \{x\in\mathbb R^N : |x-\Gamma|\le R\}$. Hence, if $\epsilon\in(0,1)$ and $\mathbb P(X\in\Gamma)\ge\epsilon$, then
$$\mathbb P\bigl(X\notin\Gamma^{(R)}\bigr) \le e^{-\frac{R^2}{2\pi^2\|A\|_{\mathrm{op}}}}\quad\text{for } R>\pi\sqrt{2\|A\|_{\mathrm{op}}\log\tfrac1\epsilon}.$$

² $\|A\|_{\mathrm{op}}$ is the operator norm $\sup\{|Ax| : |x| = 1\}$ of $A$, which, because $A$ is symmetric and positive definite, equals $\sup\{(x,Ax)_{\mathbb R^N} : |x| = 1\}$.

Proof Set $f(x) = |x-\Gamma|$, and observe that $|f(y)-f(x)| \le \|A\|_{\mathrm{op}}^{\frac12}\bigl|A^{-\frac12}(y-x)\bigr|$. If $\mathbb E[f(X)]\le\frac R2$, then, by (2.4.5),
$$\mathbb P\bigl(X\notin\Gamma^{(R)}\bigr) \le \mathbb P\Bigl(f(X)-\mathbb E[f(X)]\ge\frac R2\Bigr) \le e^{-\frac{R^2}{2\pi^2\|A\|_{\mathrm{op}}}}.$$
If $\mathbb E[f(X)]\ge\frac R2$, then, by (2.4.5) applied to $-f$,
$$\mathbb P(X\in\Gamma) \le \mathbb P\Bigl(\mathbb E[f(X)]-f(X)\ge\frac R2\Bigr) \le e^{-\frac{R^2}{2\pi^2\|A\|_{\mathrm{op}}}}.$$
Hence, the first assertion is proved.

To prove the second assertion, let $R>\pi\sqrt{2\|A\|_{\mathrm{op}}\log\frac1\epsilon}$ be given. Then, because $\mathbb P(X\in\Gamma)\ge\epsilon>e^{-\frac{R^2}{2\pi^2\|A\|_{\mathrm{op}}}}$, $\mathbb P\bigl(X\notin\Gamma^{(R)}\bigr)\le e^{-\frac{R^2}{2\pi^2\|A\|_{\mathrm{op}}}}$.

As a consequence of Theorem 2.4.5, one sees that if $\mathbb P(X\in\Gamma)\ge\epsilon$, then, with large probability, $X$ lies within a distance on the order of $\sqrt{\|A\|_{\mathrm{op}}\log\frac1\epsilon}$ from $\Gamma$. In other words, once one knows that $\gamma_{0,A}(\Gamma)\ge\epsilon$, one knows that most of the mass of $\gamma_{0,A}$ is concentrated relatively nearby $\Gamma$, and the extent of this concentration depends only on $\|A\|_{\mathrm{op}}$ and not on dimension.
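To see the dimension independence concretely, one can evaluate the mean $\mathbb E[|X|] = 2^{\frac12}\Gamma\bigl(\frac{N+1}{2}\bigr)/\Gamma\bigl(\frac N2\bigr)$ of a standard $\mathbb R^N$-valued Gaussian for several $N$ and compare it with $N^{\frac12}$, alongside the $N$-free deviation bound $2e^{-\frac{2R^2}{\pi^2}}$ from (2.4.5). The following is a small sketch; the function names and the list of dimensions are illustrative choices, and the log-gamma function is used only to avoid overflow for large $N$.

```python
import math

def mean_norm(dim):
    """E|X| for a standard Gaussian on R^dim: sqrt(2) Gamma((dim+1)/2) / Gamma(dim/2)."""
    return math.sqrt(2.0) * math.exp(math.lgamma((dim + 1) / 2.0) - math.lgamma(dim / 2.0))

def deviation_bound(radius):
    """Right-hand side 2 exp(-2 R^2 / pi^2) of the bound on P(| |X| - E|X| | >= R)."""
    return 2.0 * math.exp(-2.0 * radius ** 2 / math.pi ** 2)

dims = [1, 10, 100, 10_000, 1_000_000]
ratios = {n: mean_norm(n) / math.sqrt(n) for n in dims}   # tends to 1 as n grows

# Width of the annulus outside of which the bound leaves at most 1% of the mass;
# the same radius works in every dimension.
radius_99 = math.pi * math.sqrt(math.log(200.0) / 2.0)
```

The ratio stays below 1 because $\mathbb E[|X|]\le\mathbb E[|X|^2]^{\frac12} = N^{\frac12}$, while `radius_99` does not involve the dimension at all.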
2.4.3 The Gaussian Isoperimetric Inequality

The goal in this subsection is to prove a result that can be thought of as an isoperimetric inequality for Gaussian measures and can be used to derive concentration results closely related to those in the preceding subsection. To describe this result, let $g(\tau) = (2\pi)^{-\frac12}e^{-\frac{\tau^2}{2}}$ be the standard Gauss kernel on $\mathbb R$, and take $\Phi(x) = \int_{-\infty}^xg(\tau)\,d\tau$ to be the error function.

Theorem 2.4.6 For any $N\in\mathbb Z^+$, $\Gamma\in\mathcal B_{\mathbb R^N}$, and $t\ge0$,
$$\gamma_{0,I}\bigl(\Gamma^{(t)}\bigr) \ge \Phi\bigl(\Phi^{-1}\bigl(\gamma_{0,I}(\Gamma)\bigr)+t\bigr), \tag{2.4.6}$$
where $\Gamma^{(t)} = \{x\in\mathbb R^N : |x-\Gamma|\le t\}$.

To understand in what sense (2.4.6) is an isoperimetric inequality, note that for a half-space $H = \{x : (x,e)_{\mathbb R^N}\le a\}$ with $e\in\mathbb S^{N-1}$ and $a\in\mathbb R$, $\gamma_{0,I}\bigl(H^{(t)}\bigr) = \Phi(a+t)$. Thus, if $a = \Phi^{-1}\bigl(\gamma_{0,I}(\Gamma)\bigr)$, then $\gamma_{0,I}(H) = \gamma_{0,I}(\Gamma)$ and $\gamma_{0,I}\bigl(\Gamma^{(t)}\bigr)\ge\gamma_{0,I}\bigl(H^{(t)}\bigr)$. In other words, among the sets $B\in\mathcal B_{\mathbb R^N}$ with $\gamma_{0,I}(B) = \gamma_{0,I}(\Gamma)$, half-spaces are ones for which the growth of $t\mapsto\gamma_{0,I}\bigl(B^{(t)}\bigr)$ is minimal.

The first derivations, given independently by Borell and by Tsirelson & Sudakov, of (2.4.6) were based on Lévy's isoperimetric inequality for spheres combined with the observation in Exercise 2.4.1 below that, for each $n\ge1$, $\gamma_{0,1}^n$ is the weak limit as $N\to\infty$ of the marginal distribution of the first $n$ coordinates under the normalized surface measure on the $N$-sphere of radius $N^{\frac12}$. The derivation that follows was given by Bobkov [2]. It too requires realizing $\gamma_{0,I}$ as a weak limit, but this time of measures based on sums of Bernoulli random variables rather than the surface measure on spheres.

The first step is to rewrite (2.4.6) as
$$\Phi^{-1}\bigl(\gamma_{0,I}(\Gamma^{(t)})\bigr) - \Phi^{-1}\bigl(\gamma_{0,I}(\Gamma)\bigr) \ge t. \tag{2.4.7}$$
It is clear that (2.4.7) is equivalent to (2.4.6) and that it will hold for all $\Gamma\in\mathcal B_{\mathbb R^N}$ if it holds for closed ones. In addition, since $\Phi^{-1}(1) = \infty$ and $\Phi^{-1}(0) = -\infty$, we may and will assume that $0<\gamma_{0,I}(\Gamma)\le\gamma_{0,I}(\Gamma^{(t)})<1$. Now consider the right-continuous, non-decreasing function $F(t) = \Phi^{-1}\bigl(\gamma_{0,I}(\Gamma^{(t)})\bigr)$ for $t\in[0,T)$, where $T = \sup\{t\ge0 : \gamma_{0,I}(\Gamma^{(t)})<1\}$. Lebesgue's Differentiation Theorem says that
$$f(t) \equiv \lim_{\tau\searrow0}\frac{F(t+\tau)-F(t)}{\tau}\quad\text{exists for } \lambda_{\mathbb R}\text{-a.e. } t\in[0,T)$$
and that $F(t)-F(0)\ge\int_0^tf(\tau)\,d\tau$. Hence, (2.4.7) will follow once we show that $f\ge1$ (a.e., $\lambda_{\mathbb R}$) on $[0,T)$. Next observe that $(\Phi^{-1})' = \frac1\Psi$, where $\Psi = g\circ\Phi^{-1}$, and therefore
$$\lim_{\tau\searrow0}\frac{F(t+\tau)-F(t)}{\tau} = \frac{1}{\Psi\bigl(\gamma_{0,I}(\Gamma^{(t)})\bigr)}\lim_{\tau\searrow0}\frac{\gamma_{0,I}\bigl(\Gamma^{(t+\tau)}\bigr)-\gamma_{0,I}\bigl(\Gamma^{(t)}\bigr)}{\tau},$$
which means that it suffices to show that
$$\lim_{\tau\searrow0}\frac{\gamma_{0,I}\bigl(\Gamma^{(t+\tau)}\bigr)-\gamma_{0,I}\bigl(\Gamma^{(t)}\bigr)}{\tau} \ge \Psi\bigl(\gamma_{0,I}(\Gamma^{(t)})\bigr).$$
In fact, because $\Gamma^{(t+\tau)} = (\Gamma^{(t)})^{(\tau)}$, we need only to prove that
$$\lim_{\tau\searrow0}\frac{\gamma_{0,I}\bigl(\Gamma^{(\tau)}\bigr)-\gamma_{0,I}(\Gamma)}{\tau} \ge \Psi\bigl(\gamma_{0,I}(\Gamma)\bigr). \tag{2.4.8}$$
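The second step is to show that (2.4.8) will follow once we know that
$$\Psi\bigl(\langle\varphi,\gamma_{0,I}\rangle\bigr) \le \langle\Psi\circ\varphi,\gamma_{0,I}\rangle + \langle|\nabla\varphi|,\gamma_{0,I}\rangle \tag{2.4.9}$$
for all $\varphi\in C_b^2\bigl(\mathbb R^N;[0,1]\bigr)$. To see that this suffices, let $\epsilon\in\bigl(0,\frac12\bigr)$ be given, and define
$$\eta_\tau(x) = 1 - \frac{|x-\Gamma^{(\epsilon\tau)}|}{(1-2\epsilon)\tau}\wedge1.$$
Then $\eta_\tau$ is a $[0,1]$-valued, Lipschitz continuous function with Lipschitz constant equal to $\bigl((1-2\epsilon)\tau\bigr)^{-1}$ for which
$$\eta_\tau(x) = \begin{cases}1 & \text{if } |x-\Gamma|\le\epsilon\tau,\\ 0 & \text{if } |x-\Gamma|\ge(1-\epsilon)\tau.\end{cases}$$
Now choose $\rho\in C_b^\infty\bigl(\mathbb R^N;[0,\infty)\bigr)$ so that $\rho = 0$ off $B_{\mathbb R^N}(0,\epsilon\tau)$ and $\int\rho\,d\lambda_{\mathbb R^N} = 1$, and define $\varphi_\tau = \rho*\eta_\tau$. Then $\varphi_\tau\in C^\infty\bigl(\mathbb R^N;[0,1]\bigr)$, $|\nabla\varphi_\tau|\le\bigl((1-2\epsilon)\tau\bigr)^{-1}$, and
$$\varphi_\tau(x) = \begin{cases}1 & \text{for } x\in\Gamma,\\ 0 & \text{for } x\notin\Gamma^{(\tau)}.\end{cases}$$
Hence, $\lim_{\tau\searrow0}\varphi_\tau(x) = \mathbf 1_\Gamma(x)$, and therefore $\lim_{\tau\searrow0}\Psi\circ\varphi_\tau(x) = 0$ for each $x\in\mathbb R^N$. Further, since $\varphi_\tau$ achieves its minimum value off of $\Gamma^{(\tau)}$ and its maximum on $\Gamma$, $|\nabla\varphi_\tau|\le\frac{\mathbf 1_{\Gamma^{(\tau)}}-\mathbf 1_\Gamma}{(1-2\epsilon)\tau}$, and so, by (2.4.9),
$$\Psi\bigl(\gamma_{0,I}(\Gamma)\bigr) = \lim_{\tau\searrow0}\Psi\bigl(\langle\varphi_\tau,\gamma_{0,I}\rangle\bigr) \le \lim_{\tau\searrow0}\langle|\nabla\varphi_\tau|,\gamma_{0,I}\rangle \le (1-2\epsilon)^{-1}\lim_{\tau\searrow0}\frac{\gamma_{0,I}(\Gamma^{(\tau)})-\gamma_{0,I}(\Gamma)}{\tau}.$$
Thus (2.4.8) follows after one lets $\epsilon\searrow0$.

For reasons that will become clear shortly, we will prove (2.4.9) by proving the slightly stronger inequality
$$\Psi\bigl(\langle\varphi,\gamma_{0,I}\rangle\bigr) \le \int_{\mathbb R^N}\bigl((\Psi\circ\varphi)^2+|\nabla\varphi|^2\bigr)^{\frac12}\,d\gamma_{0,I}. \tag{2.4.10}$$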
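This inequality looks somewhat like a Poincaré inequality, and, like a Poincaré inequality, it is preserved under products. To see this, assume that it holds for $1\le N\le M$, and let $\varphi\in C_b^2\bigl(\mathbb R^{M+1};[0,1]\bigr)$ be given. Writing $\gamma_{0,I}$ for $\mathbb R^{M+1}$ as the product of $\gamma_{0,I}$ for $\mathbb R^M$ and $\gamma_{0,1}$,
$$\int_{\mathbb R^{M+1}}\bigl((\Psi\circ\varphi)^2+|\nabla\varphi|^2\bigr)^{\frac12}\,d\gamma_{0,I} = \int_{\mathbb R}\Bigl(\int_{\mathbb R^M}\bigl(\Psi\circ\varphi(x,y)^2+|\nabla_x\varphi(x,y)|^2+|\partial_y\varphi(x,y)|^2\bigr)^{\frac12}\,\gamma_{0,I}(dx)\Bigr)\gamma_{0,1}(dy).$$
Because the triangle inequality implies that $(a,b)\in\mathbb R^2\longmapsto(a^2+b^2)^{\frac12}\in[0,\infty)$ is convex, Jensen's inequality followed by our induction hypothesis implies that
$$\begin{aligned} &\int_{\mathbb R^M}\bigl(\Psi\circ\varphi(x,y)^2+|\nabla_x\varphi(x,y)|^2+|\partial_y\varphi(x,y)|^2\bigr)^{\frac12}\,\gamma_{0,I}(dx)\\ &\quad\ge \Bigl[\Bigl(\int_{\mathbb R^M}\bigl(\Psi\circ\varphi(x,y)^2+|\nabla_x\varphi(x,y)|^2\bigr)^{\frac12}\,\gamma_{0,I}(dx)\Bigr)^2 + \Bigl(\int_{\mathbb R^M}|\partial_y\varphi(x,y)|\,\gamma_{0,I}(dx)\Bigr)^2\Bigr]^{\frac12}\\ &\quad\ge \bigl(\Psi\circ\psi(y)^2+|\psi'(y)|^2\bigr)^{\frac12}, \end{aligned}$$
where $\psi(y) = \int_{\mathbb R^M}\varphi(x,y)\,\gamma_{0,I}(dx)$. Since
$$\int_{\mathbb R}\bigl(\Psi\circ\psi(y)^2+|\psi'(y)|^2\bigr)^{\frac12}\,\gamma_{0,1}(dy) \ge \Psi\bigl(\langle\psi,\gamma_{0,1}\rangle\bigr) = \Psi\bigl(\langle\varphi,\gamma_{0,I}\rangle\bigr),$$
we now know that (2.4.10) holds for all $N\ge1$ if it does when $N = 1$. Thus what we need to show is that
$$\Psi\bigl(\langle\varphi,\gamma_{0,1}\rangle\bigr) \le \int_{\mathbb R}\bigl((\Psi\circ\varphi)^2+|\varphi'|^2\bigr)^{\frac12}\,d\gamma_{0,1} \tag{2.4.11}$$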
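for $\varphi\in C_b^2\bigl(\mathbb R;[0,1]\bigr)$.

In some ways, the next step is the most interesting. What we are going to show is that (2.4.11) follows from a discrete analog of itself. Namely, let $\beta$ be the symmetric Bernoulli measure on $\{-1,1\}$ (i.e., $\beta(\{\pm1\}) = \frac12$), and, given a function $f:\{-1,1\}\longrightarrow[0,1]$, define $Df(\pm1) = \pm\frac{f(1)-f(-1)}{2}$. Then the analog of (2.4.11) in this setting is
$$\Psi\bigl(\langle f,\beta\rangle\bigr) \le \int_{\{-1,1\}}\bigl((\Psi\circ f)^2+|Df|^2\bigr)^{\frac12}\,d\beta. \tag{2.4.12}$$
To understand why (2.4.12) implies (2.4.11), observe that, by exactly the same argument as we used above, (2.4.12) is self-replicating under products. Thus, if $\mathbb P = \beta^{\mathbb Z^+}$ on $\Omega = \{-1,1\}^{\mathbb Z^+}$ and $\tilde S_n(\omega) = n^{-\frac12}\sum_{m=1}^n\omega(m)$, then, for any $n\ge1$, (2.4.12) implies
$$\Psi\bigl(\mathbb E[\varphi\circ\tilde S_n]\bigr) \le \mathbb E\Bigl[\Bigl(\bigl(\Psi\circ\varphi\circ\tilde S_n\bigr)^2+\sum_{m=1}^n|D_m\varphi\circ\tilde S_n|^2\Bigr)^{\frac12}\Bigr],$$
where
$$D_m\varphi\circ\tilde S_n(\omega) = \frac12\Bigl(\varphi\bigl(\tilde S_n(\omega)\bigr)-\varphi\bigl(\tilde S_n(\omega)-2n^{-\frac12}\omega(m)\bigr)\Bigr).$$
Since, by Taylor's theorem,
$$\Bigl|D_m\varphi\circ\tilde S_n(\omega)-n^{-\frac12}\omega(m)\varphi'\bigl(\tilde S_n(\omega)\bigr)\Bigr| \le \frac{\|\varphi''\|_u}{n}$$
and therefore
$$\Bigl|\sum_{m=1}^n\bigl|D_m\varphi\circ\tilde S_n(\omega)\bigr|^2-\varphi'\circ\tilde S_n(\omega)^2\Bigr| \le \frac{C}{n^{\frac12}}$$
for some $C<\infty$, an application of the Central Limit Theorem to $\{\tilde S_n : n\ge1\}$ completes the proof that (2.4.11) follows from (2.4.12).

What remains to be done is give a proof of (2.4.12), an intricate but relatively elementary exercise in calculus. Given $f:\{-1,1\}\longrightarrow[0,1]$, set $c = \langle f,\beta\rangle = \frac{f(1)+f(-1)}{2}$ and $\xi = \frac{f(1)-f(-1)}{2}$. Then (2.4.12) becomes
$$\Psi(c) \le \frac12\Bigl[\bigl(\Psi(c+\xi)^2+\xi^2\bigr)^{\frac12}+\bigl(\Psi(c-\xi)^2+\xi^2\bigr)^{\frac12}\Bigr]. \tag{$*$}$$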
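Before working through the calculus, it can be reassuring to test the two-point inequality $\Psi(c)\le\frac12\bigl[(\Psi(c+\xi)^2+\xi^2)^{\frac12}+(\Psi(c-\xi)^2+\xi^2)^{\frac12}\bigr]$ numerically. The sketch below is an illustration and not a proof. It evaluates $\Psi = g\circ\Phi^{-1}$ with the standard library's `statistics.NormalDist`, sets $\Psi(0) = \Psi(1) = 0$ by continuity, and scans a grid of admissible pairs $(c,\xi)$ with $0\le c\pm\xi\le1$; the grid spacing is an arbitrary choice.

```python
import math
from statistics import NormalDist

_STANDARD = NormalDist()          # mean 0, standard deviation 1

def psi(theta):
    """Psi = g o Phi^{-1}, extended by Psi(0) = Psi(1) = 0."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    return _STANDARD.pdf(_STANDARD.inv_cdf(theta))

def gap(c, xi):
    """Right-hand side minus left-hand side of the two-point inequality."""
    rhs = 0.5 * (math.hypot(psi(c + xi), xi) + math.hypot(psi(c - xi), xi))
    return rhs - psi(c)

GRID = 50
gaps = []
for i in range(0, GRID + 1):                      # c = i / GRID
    for j in range(0, min(i, GRID - i) + 1):      # xi = j / GRID with 0 <= c - xi, c + xi <= 1
        gaps.append(gap(i / GRID, j / GRID))
smallest_gap = min(gaps)
```

On this grid the smallest gap occurs at $\xi = 0$, where both sides coincide, which is consistent with the inequality being sharp only in the degenerate case.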
To verify ($*$), we will make frequent use of the calculations in the following lemma.

Lemma 2.4.7 For any $\theta\in[0,1]$, $\Phi^{-1}(1-\theta) = -\Phi^{-1}(\theta)$ and $\Psi(1-\theta) = \Psi(\theta)$. Moreover, for $c\in\bigl(0,\frac12\bigr]$ and $\xi\in[0,c]$, $\Psi(c+\xi)\ge\Psi(c-\xi)$ and $|\Psi'(c+\xi)|\le\Psi'(c-\xi)$. Finally,
$$(\Psi^2)'' = 2\bigl((\Psi')^2-1\bigr)\quad\text{and}\quad \bigl((\Psi')^2\bigr)'' = 2\,\frac{(\Psi')^2+1}{\Psi^2}\quad\text{on } (0,1).$$

Proof Once one checks that $\Psi' = -\Phi^{-1}$, the calculations of derivatives are simple applications of the product and chain rules. Because
$$1-\Phi(t) = 1-\gamma_{0,1}\bigl((-\infty,t]\bigr) = \gamma_{0,1}\bigl((t,\infty)\bigr) = \gamma_{0,1}\bigl((-\infty,-t)\bigr) = \Phi(-t),$$
$\Phi^{-1}(1-\theta) = -\Phi^{-1}(\theta)$, from which it follows that $\Psi(1-\theta) = \Psi(\theta)$. Next observe that $\Psi$ is non-decreasing on $\bigl(0,\frac12\bigr]$. Now let $c\in\bigl(0,\frac12\bigr]$ and $\xi\in[0,c]$ be given. Then, either $c-\xi\le c+\xi\le\frac12$ and therefore $\Psi(c+\xi)\ge\Psi(c-\xi)$, or $c-\xi\le1-c-\xi\le\frac12$ and therefore $\Psi(c+\xi) = \Psi(1-c-\xi)\ge\Psi(c-\xi)$. Similarly, because $\Psi'$ is non-negative and non-increasing on $\bigl(0,\frac12\bigr]$, $|\Psi'(c+\xi)|\le\Psi'(c-\xi)$.

Returning to ($*$), first note that there is nothing to do if $c\in\{0,1\}$ and second that it suffices to handle $\xi>0$. Further, since $\Psi(1-\theta) = \Psi(\theta)$ and therefore ($*$) holds for $f$ if it does for $1-f$, we may and will assume that $c\in\bigl(0,\frac12\bigr]$ and $\xi\in(0,c)$. Now set $u(\xi) = \Psi(c+\xi)^2+\xi^2$, and, after rewriting ($*$) in terms of $u$ and squaring both sides, check that it is equivalent first to
$$4u(0)-\bigl(u(\xi)+u(-\xi)\bigr) \le 2\bigl(u(\xi)u(-\xi)\bigr)^{\frac12},$$
and then to
$$16u(0)^2+\bigl(u(\xi)-u(-\xi)\bigr)^2 \le 8u(0)\bigl(u(\xi)+u(-\xi)\bigr).$$
Thus, if $v(\xi) = u(\xi)-u(0)$, then ($*$) is equivalent to
$$\bigl(\Psi(c+\xi)^2-\Psi(c-\xi)^2\bigr)^2 \le 8\Psi(c)^2\bigl(v(\xi)+v(-\xi)\bigr). \tag{$**$}$$
Using the calculations in Lemma 2.4.7, one sees that $v''(\xi) = 2(\Psi')^2(c+\xi)$ and therefore that
$$v''(\xi)+v''(-\xi) = 2\bigl(\Psi'(c+\xi)^2+\Psi'(c-\xi)^2\bigr).$$
Since, by Lemma 2.4.7, $(\Psi')^2$ is convex, the right hand side of the preceding dominates $4\Psi'(c)^2$, and so, because $v(\xi)+v(-\xi)$ vanishes to first order at $\xi = 0$, $v(\xi)+v(-\xi)\ge2\Psi'(c)^2\xi^2$. Therefore ($**$) will hold if
$$\bigl(\Psi(c+\xi)^2-\Psi(c-\xi)^2\bigr)^2 \le 16\Psi(c)^2\Psi'(c)^2\xi^2.$$
Because, again by Lemma 2.4.7, $\Psi(c+\xi)\ge\Psi(c-\xi)$ and $\Psi'(c)\ge0$, what we need to show is that
$$\frac{\Psi(c+\xi)^2-\Psi(c-\xi)^2}{\xi} \le 4\Psi(c)\Psi'(c) = 2(\Psi^2)'(c).$$
But, from Lemma 2.4.7, we know that
$$(\Psi^2)''(c+\xi)-(\Psi^2)''(c-\xi) = 2\bigl(\Psi'(c+\xi)^2-\Psi'(c-\xi)^2\bigr) \le 0,$$
which means that $\xi\mapsto\Psi(c+\xi)^2-\Psi(c-\xi)^2$ is concave and therefore that
$$\frac{\Psi(c+\xi)^2-\Psi(c-\xi)^2}{2\xi} \le (\Psi^2)'(c).$$
Thus we have proved (2.4.6) and therefore (2.4.7).

As with the Maurey–Pisier estimate, the most significant feature of (2.4.6) is its dimension independence. For instance it says that, independent of dimension,
$$\gamma_{0,I}(\Gamma)\ge\frac12 \implies \gamma_{0,I}\bigl(\mathbb R^N\setminus\Gamma^{(t)}\bigr) \le 1-\Phi(t) \le e^{-\frac{t^2}{2}}.$$
To see how it can be used to prove estimates like the one in (2.4.5), consider a Lipschitz continuous function $f:\mathbb R^N\longrightarrow\mathbb R$ with Lipschitz constant $\lambda$, let $m$ be a median of $f$ under $\gamma_{0,I}$, and set $\Gamma_\pm(t) = \{\pm(f-m)\le\lambda t\}$. Then $\gamma_{0,I}\bigl(\Gamma_\pm(0)\bigr)\ge\frac12$, $\bigl(\Gamma_\pm(0)\bigr)^{(t)}\subseteq\Gamma_\pm(t)$, and so
$$\gamma_{0,I}\bigl(\Gamma_\pm(t)\bigr) \ge \Phi(t).$$
Therefore
$$\gamma_{0,I}\bigl(\{\pm(f-m)>\lambda t\}\bigr) \le 1-\Phi(t) \le e^{-\frac{t^2}{2}}. \tag{2.4.13}$$
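Finally, suppose that $g:\mathbb R\longrightarrow[0,\infty)$ is a continuous function with the property that $g(s)\le g(t)$ if $0\le s\le t$ or $0\ge s\ge t$. Then, starting from (2.4.13), one can show that
$$\int_{\mathbb R^N}g\bigl(f(x)-m\bigr)\,\gamma_{0,I}(dx) \le \int_{\mathbb R}g(\lambda t)\,\gamma_{0,1}(dt). \tag{2.4.14}$$
Indeed, it suffices to check this when $g$ is continuously differentiable and $g(0) = 0$, in which case, because $g'\ge0$ on $[0,\infty)$ and $g'\le0$ on $(-\infty,0]$,
$$\begin{aligned} \int_{\mathbb R^N}g\bigl(f(x)-m\bigr)\,\gamma_{0,I}(dx) &= \int_0^\infty g'(t)\,\gamma_{0,I}\bigl(\{x : f(x)-m\ge t\}\bigr)\,dt - \int_0^\infty g'(-t)\,\gamma_{0,I}\bigl(\{x : f(x)-m\le-t\}\bigr)\,dt\\ &\le \int_0^\infty g'(t)\bigl(1-\Phi(\lambda^{-1}t)\bigr)\,dt - \int_0^\infty g'(-t)\bigl(1-\Phi(\lambda^{-1}t)\bigr)\,dt\\ &= \lambda\int_0^\infty g'(\lambda t)\bigl(1-\Phi(t)\bigr)\,dt - \lambda\int_0^\infty g'(-\lambda t)\bigl(1-\Phi(t)\bigr)\,dt\\ &= \int_{\mathbb R}g(\lambda t)\,\gamma_{0,1}(dt). \end{aligned}$$
In particular,
$$\int_{\mathbb R^N}\exp\Bigl(\frac{\alpha|f(x)-m|^2}{2\lambda^2}\Bigr)\,\gamma_{0,I}(dx) \le \frac{1}{(1-\alpha)^{\frac12}}\quad\text{for } \alpha\in[0,1).$$

Exercise 2.4.1 For $N\ge2$, define $\mu_N\in M_1(\mathbb R^N)$ to be the distribution of $\frac{N^{\frac12}X}{|X|}$, where $X$ is a standard normal $\mathbb R^N$-valued random variable. Show that $\mu_N$ is the normalized surface measure on the sphere $\mathbb S^{N-1}(N^{\frac12})$ in $\mathbb R^N$. Next, using the strong law of large numbers, show that, for each $n\ge1$, the distribution of $x\in\mathbb R^N\longmapsto(x_1,\dots,x_n)\in\mathbb R^n$ under $\mu_N$ tends weakly to the standard normal distribution on $\mathbb R^n$. This is a version of a result discovered by Mehler and rediscovered by Borel. By using a change of variables, one can prove the stronger result that the convergence is taking place in variation norm.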
2.5 Constructing Gaussian Families

Let $\mathcal I$ be a non-empty index set and $(\Omega,\mathcal F,\mathbb P)$ a probability space. Given a subset $\{X(\xi) : \xi\in\mathcal I\}$ of $L^2(\mathbb P;\mathbb R)$, set
$$c(\xi,\eta) = \operatorname{cov}\bigl(X(\xi),X(\eta)\bigr) \equiv \mathbb E\bigl[(X(\xi)-\mathbb E[X(\xi)])(X(\eta)-\mathbb E[X(\eta)])\bigr] = \mathbb E\bigl[X(\xi)X(\eta)\bigr]-\mathbb E\bigl[X(\xi)\bigr]\mathbb E\bigl[X(\eta)\bigr].$$
The function $c$ is called the covariance function for $\{X(\xi) : \xi\in\mathcal I\}$. There are three obvious properties that such a covariance function possesses. Namely, it is $\mathbb R$-valued, symmetric (i.e., $c(\xi,\eta) = c(\eta,\xi)$), and non-negative definite in the sense that, for all $n\ge1$, $\{\xi_1,\dots,\xi_n\}\subseteq\mathcal I$, and $\{s_1,\dots,s_n\}\subseteq\mathbb R$,
$$\sum_{1\le k,\ell\le n}c(\xi_k,\xi_\ell)s_ks_\ell \ge 0. \tag{2.5.1}$$
To check this last property, simply observe that
$$\sum_{1\le k,\ell\le n}c(\xi_k,\xi_\ell)s_ks_\ell = \operatorname{var}\Bigl(\sum_{k=1}^ns_kX(\xi_k)\Bigr) \ge 0.$$
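For a finite set of indices, the content of (2.5.1) and the way a non-negative definite function produces a centered Gaussian vector can be illustrated directly. The sketch below is a hedged example rather than the construction used in the text. The covariance function $c(s,t) = s\wedge t$ on $[0,\infty)$ is a hypothetical choice (it is the covariance of Brownian motion), the index points are arbitrary, and sampling uses a hand-written Cholesky factorization $C = LL^\top$ so that $LZ$ has covariance $C$ when $Z$ is standard normal.

```python
import math
import random

def cov_min(s, t):
    """Hypothetical example covariance function c(s, t) = min(s, t)."""
    return min(s, t)

def gram(c, points):
    """Matrix (c(xi_k, xi_l)) for a finite list of index points."""
    return [[c(a, b) for b in points] for a in points]

def quad_form(matrix, s):
    """sum_{k,l} c(xi_k, xi_l) s_k s_l, the left-hand side of (2.5.1)."""
    n = len(s)
    return sum(matrix[k][l] * s[k] * s[l] for k in range(n) for l in range(n))

def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix, for a non-negative definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(max(acc, 0.0))
            else:
                lower[i][j] = acc / lower[j][j] if lower[j][j] > 0.0 else 0.0
    return lower

def sample(lower, rng):
    """One centered Gaussian vector with covariance L L^T."""
    z = [rng.gauss(0.0, 1.0) for _ in lower]
    return [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(lower))]

points = [0.5, 1.0, 1.5, 2.0]
C = gram(cov_min, points)
L = cholesky(C)
draw = sample(L, random.Random(0))     # (X(0.5), X(1.0), X(1.5), X(2.0)) for one outcome
```

This only handles finitely many indices at a time; passing from such finite-dimensional distributions to a whole family indexed by $\mathcal I$ requires a consistency argument of the kind supplied by Kolmogorov's theorem.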
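Given a symmetric function $c : \mathcal{I}^2 \to \mathbb{R}$ satisfying (2.5.1), the goal in this subsection is to show that it is the covariance function for a family of centered Gaussian random variables. That is, we will show that there exists a probability space $(\Omega,\mathcal{F},\mathbb{P})$ on which there is a collection $\{X(\xi) : \xi \in \mathcal{I}\}$ of square integrable random variables with covariance function $c$ whose span is a Gaussian family. Such a collection is called a Gaussian process with covariance $c$.

In order to prove this, we will use a famous theorem of Kolmogorov known as Kolmogorov's Consistency Theorem. To state his result, for each $\xi \in \mathcal{I}$, let $(E_\xi,\rho_\xi)$ be a complete, separable metric space. Given $\emptyset \ne S \subseteq \mathcal{I}$, set $\Omega_S = \prod_{\xi\in S} E_\xi$, and take $\Omega = \Omega_{\mathcal{I}}$. Thinking of $\Omega_S$ as the set of all functions $\omega_S : S \to \bigcup_{\xi\in S} E_\xi$ such that $\omega_S(\xi) \in E_\xi$ for each $\xi \in S$, define $\pi_S : \Omega \to \Omega_S$ so that $\pi_S\omega = \omega\restriction S$. If $\emptyset \ne F \subset\subset \mathcal{I}$ (i.e., $F$ is a non-empty, finite subset of $\mathcal{I}$), give $\Omega_F$ the product topology, and set $\mathcal{A}_F = \{\pi_F^{-1}\Gamma : \Gamma \in \mathcal{B}_{\Omega_F}\}$. Finally, take $\mathcal{F} = \sigma(\mathcal{A})$, where

$$\mathcal{A} = \bigcup\bigl\{\mathcal{A}_F : \emptyset \ne F \subset\subset \mathcal{I}\bigr\},$$

and note that $\mathcal{A}$ is an algebra of subsets of $\Omega$. If, for each $\emptyset \ne F \subset\subset \mathcal{I}$, $\mu_F \in M_1(\Omega_F)$, the family $\{\mu_F : \emptyset \ne F \subset\subset \mathcal{I}\}$ is said to be consistent if

$$\mu_{F_1}(\Gamma) = \mu_{F_2}\bigl(\{\omega_{F_2} \in \Omega_{F_2} : \omega_{F_2}\restriction F_1 \in \Gamma\}\bigr)$$

for all $\emptyset \ne F_1 \subset F_2 \subset\subset \mathcal{I}$ and $\Gamma \in \mathcal{B}_{\Omega_{F_1}}$.

Theorem 2.5.1 Referring to the preceding, if $\{\mu_F : \emptyset \ne F \subset\subset \mathcal{I}\}$ is a consistent family, then there is a unique probability measure $\mathbb{P}$ on $(\Omega,\mathcal{F})$ such that, for all $\emptyset \ne F \subset\subset \mathcal{I}$ and $\Gamma \in \mathcal{B}_{\Omega_F}$,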
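$$\mathbb{P}\bigl(\{\omega : \omega\restriction F \in \Gamma\}\bigr) = \mu_F(\Gamma).$$

Proof Uniqueness is trivial in any case, and, when $\mathcal{I}$ is finite, there is nothing to do. Next, suppose that $\mathcal{I}$ is countable, in which case we may assume that $\mathcal{I} = \mathbb{Z}^+$ and define

$$\rho(\omega,\omega') = \sum_{m=1}^\infty 2^{-m}\,\frac{\rho_m\bigl(\omega(m),\omega'(m)\bigr)}{1+\rho_m\bigl(\omega(m),\omega'(m)\bigr)}$$

for $\omega, \omega' \in \Omega$. One can easily check that $\rho$ is a complete, separable metric for $\Omega$ and that $\rho$-convergence of $\{\omega_k : k \ge 1\}$ in $\Omega$ is equivalent to $\rho_m$-convergence of $\{\omega_k(m) : k \ge 1\}$ in $E_m$ for each $m \ge 1$. In particular, the $\sigma$-algebra $\mathcal{F}$ described above coincides with the Borel field $\mathcal{B}_\Omega$ determined by $\rho$.

Set $F_n = \{1,\dots,n\}$, $\Omega_n = \Omega_{F_n}$, $\pi_n = \pi_{F_n}$, $\mathcal{A}_n = \mathcal{A}_{F_n}$, and $\mu_n = \mu_{F_n}$ for $n \ge 1$. From the consistency hypothesis, we know that, for $1 \le m < n$ and $\Gamma \in \mathcal{B}_{\Omega_m}$,

$$\mu_m(\Gamma) = \mu_n\bigl(\{\omega_{F_n} \in \Omega_n : \omega_{F_n}\restriction F_m \in \Gamma\}\bigr).$$

Next, for each $m \ge 1$, choose an element $e_m \in E_m$, and define $\Phi_n : \Omega_n \to \Omega$ so that

$$\Phi_n(\omega_{F_n})(m) = \begin{cases} \omega_{F_n}(m) & \text{if } 1 \le m \le n \\ e_m & \text{if } m > n. \end{cases}$$

Clearly, $\Phi_n$ is continuous, and $\pi_n \circ \Phi_n$ is the identity map on $\Omega_n$. Now define $\mathbb{P}_n \in M_1(\Omega)$ to be $(\Phi_n)_*\mu_n$. Then

$$\mathbb{P}_n\bigl(\{\omega : \omega\restriction F_m \in \Gamma\}\bigr) = \mu_m(\Gamma) \quad\text{for } \Gamma \in \mathcal{B}_{\Omega_m} \text{ and } 1 \le m \le n.$$

What we need to show is that there exists a $\mathbb{P} \in M_1(\Omega)$ such that $\mathbb{P}\restriction\mathcal{A}_n = \mathbb{P}_n\restriction\mathcal{A}_n$ for all $n \ge 1$, and to do so it suffices to show that there is a $\mathbb{P} \in M_1(\Omega)$ to which $\{\mathbb{P}_n : n \ge 1\}$ converges in the sense that $\langle\varphi,\mathbb{P}_n\rangle \to \langle\varphi,\mathbb{P}\rangle$ for all $\varphi \in C_b(\Omega;\mathbb{R})$. That there can be at most one $\mathbb{P}$ is obvious, and, by Prohorov's Theorem (cf. Theorem 9.1.9 in [14]), to prove the existence of such a $\mathbb{P}$, it suffices to prove that, for each $\epsilon > 0$, there is a compact set $K \subseteq \Omega$ such that $\inf_{n\ge1}\mathbb{P}_n(K) \ge 1 - \epsilon$.

To this end, let $\epsilon > 0$ be given, and, using Ulam's Lemma (cf. Lemma 9.1.7 in [14]), choose a compact $K_1 \ni e_1$ in $E_1$ such that $\mu_1(K_1) \ge 1 - \frac{\epsilon}{2}$, and, for $n \ge 2$, choose a compact $K_n \ni e_n$ in $E_n$ so that $\mu_n\bigl(\Omega_{n-1}\times K_n\bigr) \ge 1 - \frac{\epsilon}{2^n}$. A standard diagonalization argument shows that $K = \{\omega : \omega(n) \in K_n \text{ for } n \ge 1\}$ is a compact subset of $\Omega$. In addition, if $A_n = \{\omega : \omega(m) \notin K_m \text{ for some } 1 \le m \le n\}$, then $A_n \nearrow \Omega\setminus K$. Finally, $\mathbb{P}_1(A_1) = \mu_1(E_1\setminus K_1) \le \frac{\epsilon}{2}$, and, for each $n \ge 2$,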
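$$\begin{aligned}
\mathbb{P}_n(A_n) &\le \sum_{m=1}^n \mathbb{P}_n\bigl(\{\omega : \omega(m) \notin K_m\}\bigr)
= \mu_1(E_1\setminus K_1) + \sum_{m=2}^n \mathbb{P}_m\bigl(\{\omega : \omega(m) \notin K_m\}\bigr) \\
&= \mu_1(E_1\setminus K_1) + \sum_{m=2}^n \mu_m\bigl(\Omega_{m-1}\times(E_m\setminus K_m)\bigr) \le \sum_{m=1}^n \epsilon 2^{-m} \le \epsilon.
\end{aligned}$$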
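Therefore, for any $1 \le m \le n$, $\mathbb{P}_n(A_m) = \mathbb{P}_m(A_m) \le \epsilon$. At the same time, if $1 \le n < m$, then, because $e_j \in K_j$ for all $j \ge 1$, $\Phi_n^{-1}(A_m) = \Phi_n^{-1}(A_n)$ and therefore $\mathbb{P}_n(A_m) = \mathbb{P}_n(A_n) \le \epsilon$. It follows that $\mathbb{P}_n(\Omega\setminus K) = \lim_{m\to\infty}\mathbb{P}_n(A_m) \le \epsilon$ for all $n \ge 1$, which means that $\mathbb{P}$ exists.

It remains to treat the case when $\mathcal{I}$ is uncountable. For each countable subset $S \subset \mathcal{I}$, let $\mathbb{P}_S$ be the measure just constructed on $\Omega_S$. Then, by uniqueness, if $S_1 \subseteq S_2$,

$$\mathbb{P}_{S_2}\bigl(\{\omega_{S_2} : \omega_{S_2}\restriction S_1 \in A\}\bigr) = \mathbb{P}_{S_1}(A) \quad\text{for } A \in \mathcal{B}_{\Omega_{S_1}}.$$

Hence we can define a finitely additive function $\mathbb{P}$ on the algebra $\mathcal{A}$ by setting $\mathbb{P}(\pi_S^{-1}A) = \mathbb{P}_S(A)$ for $A \in \mathcal{B}_{\Omega_S}$. Furthermore, if $\{A_k : k \ge 1\} \subseteq \mathcal{A}$ and $A_k \searrow \emptyset$, then we can choose a countable $S \subseteq \mathcal{I}$ such that $\{\pi_S A_k : k \ge 1\} \subseteq \mathcal{B}_{\Omega_S}$, and clearly $\pi_S A_k \searrow \emptyset$. Hence, $\mathbb{P}(A_k) = \mathbb{P}_S(\pi_S A_k) \searrow 0$, and therefore, by the Daniell Extension Theorem, $\mathbb{P}$ admits an extension to $\mathcal{F} = \sigma(\mathcal{A})$ as a probability measure.

Corollary 2.5.2 Let $\mathcal{I}$ be a non-empty set and $c : \mathcal{I}^2 \to \mathbb{R}$ a symmetric function satisfying (2.5.1). Then there is a probability space $(\Omega,\mathcal{F},\mathbb{P})$ on which there is a collection of random variables $\{X(\xi) : \xi \in \mathcal{I}\}$ whose span is a centered Gaussian family for which $c$ is the covariance function.

Proof Take $\Omega = \mathbb{R}^{\mathcal{I}}$, the space of all maps $\omega : \mathcal{I} \to \mathbb{R}$, and define the $\sigma$-algebra $\mathcal{F}$ accordingly. Given $\emptyset \ne F \subset\subset \mathcal{I}$, take $\mu_F$ to be the centered Gaussian measure on $\mathbb{R}^F$ with covariance matrix $A_F = \bigl(\bigl(c(\xi_j,\xi_k)\bigr)\bigr)_{\xi_j,\xi_k\in F}$. Then it is easy to check that $\{\mu_F : \emptyset \ne F \subset\subset \mathcal{I}\}$ is a consistent family. Now apply Theorem 2.5.1 to produce a probability measure $\mathbb{P}$ on $(\Omega,\mathcal{F})$ with the property that, for all $n \ge 1$ and $\xi_1,\dots,\xi_n \in \mathcal{I}$, the distribution of $\bigl(\omega(\xi_1),\dots,\omega(\xi_n)\bigr)$ under $\mathbb{P}$ is $\mu_{\{\xi_1,\dots,\xi_n\}}$, and conclude that $\{\omega(\xi) : \xi \in \mathcal{I}\}$ are random variables with the required property.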
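The finite-dimensional measures $\mu_F$ in the proof of Corollary 2.5.2 can be sampled explicitly, which gives a concrete feel for the construction. The following is a minimal sketch, assuming the covariance matrix $A_F$ is strictly positive definite so that a Cholesky factorization exists; the helper names are hypothetical, and the degenerate case is not handled.

```python
import math
import random
from typing import Callable, List, Sequence


def covariance_matrix(c: Callable[[float, float], float], points: Sequence[float]) -> List[List[float]]:
    """The matrix A_F = ((c(xi_j, xi_k))) for the finite set F = points."""
    return [[c(p, q) for q in points] for p in points]


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def sample_centered_gaussian(
    c: Callable[[float, float], float], points: Sequence[float], rng: random.Random
) -> List[float]:
    """One sample of (X(xi_1), ..., X(xi_n)) with law mu_F, computed as L z for standard normal z."""
    low = cholesky(covariance_matrix(c, points))
    z = [rng.gauss(0.0, 1.0) for _ in points]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(points))]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_centered_gaussian(min, [1.0, 2.0, 3.0], rng))
```

Consistency is visible at the level of matrices: the covariance matrix for a subset $F_1 \subset F_2$ is the corresponding submatrix of $A_{F_2}$, so the marginal of $\mu_{F_2}$ on $\mathbb{R}^{F_1}$ is $\mu_{F_1}$.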
2.5.1 Continuity Considerations

Unless $\mathcal{I}$ is countable, there is an inherent weakness in the conclusion drawn in Theorem 2.5.1. To understand this weakness, consider the case in which $\mathcal{I} = \mathbb{R}$ and, for all $\xi \in \mathbb{R}$, $E_\xi = \mathbb{R}$. In this case, $\Omega$ is the set of all maps $\omega : \mathbb{R} \to \mathbb{R}$ and
$\mathcal{F}$ is the $\sigma$-algebra generated by the maps $\omega \mapsto \omega(\xi)$. Thus the only events to which the measure $\mathbb{P}$ can assign a probability are those which depend on the values of $\omega$ at a countable number of times. In particular, because, for any sequence $\{\xi_m : m \ge 1\} \subseteq \mathbb{R}$ and every $\omega \in C(\mathbb{R};\mathbb{R})$, there is a discontinuous $\omega' \in \Omega$ that equals $\omega$ at all the $\xi_m$'s, the only $\Gamma \in \mathcal{F}$ contained in $C(\mathbb{R};\mathbb{R})$ is empty. Hence, $C(\mathbb{R};\mathbb{R})$ has inner $\mathbb{P}$-measure 0, and therefore, unless $C(\mathbb{R};\mathbb{R})$ has outer $\mathbb{P}$-measure 0, it cannot be $\mathbb{P}$-measurable.

When presented with an uncountable collection $\{X(\xi) : \xi \in \mathcal{I}\}$ of random variables on a probability space $(\Omega,\mathcal{F},\mathbb{P})$, one way to overcome the kind of problem raised above is to ask whether there is another family $\{\tilde X(\xi) : \xi \in \mathcal{I}\}$ which has the same distribution as $\{X(\xi) : \xi \in \mathcal{I}\}$ and has the desired property. With this in mind, one says that $\{\tilde X(\xi) : \xi \in \mathcal{I}\}$ is a version of $\{X(\xi) : \xi \in \mathcal{I}\}$ if $\tilde X(\xi) = X(\xi)$ (a.s., $\mathbb{P}$) for each $\xi \in \mathcal{I}$. Clearly, any version of $\{X(\xi) : \xi \in \mathcal{I}\}$ will have the same distribution as $\{X(\xi) : \xi \in \mathcal{I}\}$.

To see how this idea applies to questions like continuity, again consider the setting described at the end of the preceding paragraph, and suppose that there exists a version $\{\tilde\omega(\xi) : \xi \in \mathbb{R}\}$ of $\{\omega(\xi) : \xi \in \mathbb{R}\}$ with the property that $\xi \mapsto \tilde\omega(\xi)$ is always continuous. Then, even though the inner $\mathbb{P}$-measure of $C(\mathbb{R};\mathbb{R})$ is 0, its outer $\mathbb{P}$-measure is 1. To see this, suppose $\Gamma \in \sigma\bigl(\{\omega(\xi_m) : m \ge 1\}\bigr)$ contains $C(\mathbb{R};\mathbb{R})$, and set $A = \{\omega : \omega(\xi_m) = \tilde\omega(\xi_m) \text{ for all } m \ge 1\}$. Then $\mathbb{P}(A) = 1$. Moreover, $\{\tilde\omega : \omega \in \Omega\} \subseteq C(\mathbb{R};\mathbb{R}) \subseteq \Gamma$ and membership in $\Gamma$ depends only on the values at the $\xi_m$'s, so $A \subseteq \Gamma$ and therefore $\mathbb{P}(\Gamma) = 1$.

We will now apply the preceding considerations to the Gaussian processes constructed in Corollary 2.5.2. Suppose that $\mathcal{I}$ is a metric space and that $c$ is a covariance function on $\mathcal{I}^2$. If there is a choice of random variables $X(\xi)$, $\xi \in \mathcal{I}$, that are continuous with respect to $\xi$ and form a centered Gaussian process for which $c$ is the covariance function, then $c$ must be a continuous function on $\mathcal{I}^2$.
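Indeed, suppose $(\xi_n,\eta_n) \to (\xi,\eta)$ in $\mathcal{I}^2$. Then, by Lemma 2.1.1, $X(\xi_n) \to X(\xi)$ and $X(\eta_n) \to X(\eta)$ in $L^2(\mathbb{P};\mathbb{R})$, and so

$$c(\xi_n,\eta_n) - c(\xi,\eta) = \mathbb{E}\bigl[\bigl(X(\xi_n)-X(\xi)\bigr)X(\eta_n)\bigr] + \mathbb{E}\bigl[X(\xi)\bigl(X(\eta_n)-X(\eta)\bigr)\bigr] \longrightarrow 0.$$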
However, the converse statement is false. That is, just because $c$ is continuous on $\mathcal{I}^2$, there need not exist a Gaussian process $\{X(\xi) : \xi \in \mathcal{I}\}$ on some probability space $(\Omega,\mathcal{F},\mathbb{P})$ with $c$ as its covariance function and the property that $\xi \in \mathcal{I} \mapsto X(\xi,\omega)$ is a continuous function for every $\omega \in \Omega$. In fact, even when $\mathcal{I} = \mathbb{R}$, the condition on $c$ that guarantees the existence of a continuous process is very technical.³ Nonetheless, the following theorem of Kolmogorov enables us to prove that if $\mathcal{I} = \mathbb{R}^N$ and, for some $\beta \in (0,1]$,

$$\sup\left\{\frac{c(\xi,\xi)+c(\eta,\eta)-2c(\xi,\eta)}{|\eta-\xi|^\beta} : |\eta|\vee|\xi| \le R\right\} < \infty \quad\text{for all } R > 0, \tag{2.5.2}$$

then a continuous choice exists.

³ Its sufficiency was proved by Dudley, and its necessity was proved later by Talagrand.
In the proof of the following theorem we will use the fact that if $Q$ is a closed cube in $\mathbb{R}^N$ and, for each vertex $v$ of $Q$, $a_v$ is an element of a vector space $E$, then there is a unique function $f : Q \to E$, known as the multi-linear extension of $v \mapsto a_v$, such that $f(v) = a_v$ for each vertex $v$ and $f$ is an affine function of each coordinate. For example, if $Q = [0,1]^2$, then

$$f(\xi_1,\xi_2) = (1-\xi_1)(1-\xi_2)\,a_{(0,0)} + (1-\xi_1)\xi_2\,a_{(0,1)} + \xi_1(1-\xi_2)\,a_{(1,0)} + \xi_1\xi_2\,a_{(1,1)}.$$

The general case can be handled by translation, scaling, and induction on $N$.

Theorem 2.5.3 Suppose that, for some cube $Q = [a,b]^N \subseteq \mathbb{R}^N$, $\{X(\xi) : \xi \in Q\}$ is a family of random variables taking values in a Banach space $E$, and assume that, for some $p \in [1,\infty)$, $C < \infty$, and $r \in (0,1]$,

$$\mathbb{E}\bigl[\|X(\eta)-X(\xi)\|_E^p\bigr]^{\frac1p} \le C|\eta-\xi|^{\frac Np+r} \quad\text{for all } \xi,\eta \in Q.$$

Then there exists a version $\{\tilde X(\xi) : \xi \in Q\}$ of $\{X(\xi) : \xi \in Q\}$ such that $\xi \in Q \mapsto \tilde X(\xi)(\omega) \in E$ is continuous for all $\omega \in \Omega$. In fact, for each $\alpha \in [0,r)$, there is a $K < \infty$, depending only on $N$, $p$, $r$, and $\alpha$, such that

$$\mathbb{E}\left[\sup_{\substack{\xi,\eta\in Q\\ \xi\ne\eta}}\left(\frac{\|\tilde X(\eta)-\tilde X(\xi)\|_E}{|\eta-\xi|^\alpha}\right)^p\right]^{\frac1p} \le KC(b-a)^{\frac Np+r-\alpha}.$$
Proof Given $\xi \in \mathbb{R}^N$, define $\|\xi\|_1 = \sum_{1\le j\le N}|\xi_j|$. First note that, by an elementary translation and rescaling argument, it suffices to treat the case when $Q = [0,1]^N$. Given $n \ge 0$, let $I_n$ be the set of unordered pairs $\{k,m\} \subseteq \mathbb{N}^N \cap [0,2^n]^N$ for which $\|m-k\|_1 = 1$, and define

$$M_n = \max_{\{k,m\}\in I_n}\bigl\|X(m2^{-n})-X(k2^{-n})\bigr\|_E \le \Biggl(\sum_{\{k,m\}\in I_n}\bigl\|X(m2^{-n})-X(k2^{-n})\bigr\|_E^p\Biggr)^{\frac1p}.$$

Observe that

$$\mathbb{E}\bigl[M_n^p\bigr]^{\frac1p} \le \Biggl(\sum_{\{k,m\}\in I_n}\mathbb{E}\Bigl[\bigl\|X(m2^{-n})-X(k2^{-n})\bigr\|_E^p\Bigr]\Biggr)^{\frac1p} \le 2^{N-1}C\,2^{-nr+\frac Np}.$$
Let $n \ge 0$ be given, and take $X_n(\,\cdot\,)$ to be the function that equals $X(\,\cdot\,)$ at the vertices of, and is multi-linear on, each cube $m2^{-n} + [0,2^{-n}]^N$. Because $X_{n+1}(\xi) - X_n(\xi)$ is a multi-linear function on $m2^{-n-1} + [0,2^{-n-1}]^N$,

$$\sup_{\xi\in[0,1]^N}\|X_{n+1}(\xi)-X_n(\xi)\|_E = \max_{m\in\mathbb{N}^N\cap[0,2^{n+1}]^N}\bigl\|X_{n+1}(m2^{-n-1})-X_n(m2^{-n-1})\bigr\|_E.$$

Since $X_{n+1}(m2^{-n-1}) = X(m2^{-n-1})$ and either $X_n(m2^{-n-1}) = X(m2^{-n-1})$ or

$$X_n(m2^{-n-1}) = \sum_{\{k:\ \{k,m\}\in I_n\}}\theta_{m,k}\,X(k2^{-n-1}),$$

where the $\theta_{m,k}$'s are non-negative and sum to 1, it follows that

$$\sup_{\xi\in[0,1]^N}\|X_{n+1}(\xi)-X_n(\xi)\|_E \le M_{n+1}$$

and therefore that

$$\mathbb{E}\Bigl[\sup_{\xi\in[0,1]^N}\|X_{n+1}(\xi)-X_n(\xi)\|_E^p\Bigr]^{\frac1p} \le 2^N C\,2^{-nr+\frac Np}.$$

Hence, for each $n \ge 0$,

$$\mathbb{E}\Bigl[\sup_{n'>n}\ \sup_{\xi\in[0,1]^N}\|X_{n'}(\xi)-X_n(\xi)\|_E^p\Bigr]^{\frac1p} \le \frac{2^N C\,2^{-nr+\frac Np}}{1-2^{-r}},$$

and so there exists a measurable map $\tilde X : [0,1]^N\times\Omega \to E$ such that $\xi \mapsto \tilde X(\xi,\omega)$ is continuous for each $\omega \in \Omega$ and

$$\mathbb{E}\Bigl[\sup_{\xi\in[0,1]^N}\|\tilde X(\xi)-X_n(\xi)\|_E^p\Bigr]^{\frac1p} \le \frac{2^N C\,2^{-nr+\frac Np}}{1-2^{-r}}. \tag{$*$}$$
Furthermore, $\tilde X(\xi) = X(\xi)$ (a.s., $\mathbb{P}$) if $\xi = m2^{-n}$ for some $n \ge 0$ and $m \in \mathbb{N}^N\cap[0,2^n]^N$, and therefore, since $\xi \mapsto \tilde X(\xi)$ is continuous and

$$\mathbb{E}\bigl[\|X(m2^{-n})-X(\xi)\|_E^p\bigr]^{\frac1p} \le C\,2^{-n(\frac Np+r)}$$

if $m_j2^{-n} \le \xi_j < (m_j+1)2^{-n}$ for $1 \le j \le N$, it follows that $X(\xi) = \tilde X(\xi)$ (a.s., $\mathbb{P}$) for each $\xi \in [0,1]^N$.
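The interpolants $X_n$ used in this proof are built from the multi-linear extension described before Theorem 2.5.3. The following is a minimal sketch of that extension on the unit cube $[0,1]^N$ for real-valued vertex data; the function name and the dictionary representation of $v \mapsto a_v$ are assumptions made for the example, and values in a general vector space $E$ would work the same way with vector arithmetic.

```python
from itertools import product
from typing import Dict, Sequence, Tuple


def multilinear_extension(vertex_values: Dict[Tuple[int, ...], float], xi: Sequence[float]) -> float:
    """Evaluate at xi in [0, 1]^N the unique function that is affine in each
    coordinate and takes the value vertex_values[v] at each vertex v in {0, 1}^N."""
    total = 0.0
    for v in product((0, 1), repeat=len(xi)):
        weight = 1.0
        for vj, xj in zip(v, xi):
            # Factor xi_j for a vertex coordinate 1 and (1 - xi_j) for a coordinate 0.
            weight *= xj if vj == 1 else 1.0 - xj
        total += weight * vertex_values[v]
    return total


if __name__ == "__main__":
    a = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 5.0}
    print(multilinear_extension(a, (0.25, 0.5)))
```

The weights are non-negative and sum to 1, which is the convexity property the proof relies on when it bounds $X_{n+1} - X_n$ by differences of $X$ at neighboring dyadic points.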
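To prove the final estimate, note that

$$\|X_n(\eta)-X_n(\xi)\|_E \le N^{\frac12}2^n|\xi-\eta|M_n,$$

and therefore, $\mathbb{P}$-almost surely,

$$\|\tilde X(\eta)-\tilde X(\xi)\|_E \le 2\sup_{\beta\in[0,1]^N}\|\tilde X(\beta)-X_n(\beta)\|_E + N^{\frac12}2^n|\xi-\eta|M_n.$$

Hence, by ($*$), $\mathbb{E}\bigl[\sup_{\xi,\eta\in[0,1]^N,\ 2^{-n-1}\,\ldots}$ […]

[…] $p$, and $C < \infty$ such that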
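$$\mathbb{E}\bigl[\|\tilde X(\eta)-\tilde X(\xi)\|_E^p\bigr]^{\frac1p} \le C|\eta-\xi|^\beta \quad\text{for all } \xi,\eta \in \mathbb{R}^N.$$

Then, for each $\gamma > \beta$,

$$\lim_{|\xi|\to\infty}\frac{\|\tilde X(\xi)-\tilde X(0)\|_E}{|\xi|^\gamma} = 0 \quad\text{(a.s., $\mathbb{P}$) and in } L^p(\mathbb{P};\mathbb{R}).$$

Proof Set $\|\xi\|_\infty = \max_{1\le j\le N}|\xi_j|$. Take $\alpha = 0$ in Theorem 2.5.3. Then, because

$$\sup_{2^{n-1}\le\|\xi\|_\infty\le2^n}\frac{\|\tilde X(\xi)-\tilde X(0)\|_E}{|\xi|^\gamma} \le 2^{-(n-1)\gamma}\sup_{2^{n-1}\le\|\xi\|_\infty\le2^n}\|\tilde X(\xi)-\tilde X(0)\|_E,$$

$$\mathbb{E}\left[\sup_{2^{n-1}\le\|\xi\|_\infty\le2^n}\left(\frac{\|\tilde X(\xi)-\tilde X(0)\|_E}{|\xi|^\gamma}\right)^p\right]^{\frac1p} \le 2^{-\gamma(n-1)}\,\mathbb{E}\Bigl[\sup_{2^{n-1}\le\|\xi\|_\infty\le2^n}\|\tilde X(\xi)-\tilde X(0)\|_E^p\Bigr]^{\frac1p} \le 2^{\beta+\gamma}KC\,2^{(\beta-\gamma)n},$$

and so

$$\mathbb{E}\left[\sup_{\|\xi\|_\infty\ge2^{m-1}}\left(\frac{\|\tilde X(\xi)-\tilde X(0)\|_E}{|\xi|^\gamma}\right)^p\right]^{\frac1p} \le \frac{2^{\beta+\gamma}KC}{1-2^{\beta-\gamma}}\,2^{(\beta-\gamma)m}.$$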
Corollary 2.5.5 Suppose that $c$ is a covariance function on $\mathbb{R}^N\times\mathbb{R}^N$ that satisfies (2.5.2). Then there exists a probability space $(\Omega,\mathcal{F},\mathbb{P})$ on which there is a centered Gaussian process $\{X(\xi) : \xi \in \mathbb{R}^N\}$ with covariance function $c$ such that $X(\,\cdot\,,\omega) \in C(\mathbb{R}^N;\mathbb{R})$ for each $\omega \in \Omega$. Moreover, for each $\alpha < \frac\beta2$, $X(\,\cdot\,,\omega)$ is Hölder continuous of order $\alpha$ on compact sets. Finally, if there is a $C < \infty$ such that $c(\eta,\eta)+c(\xi,\xi)-2c(\xi,\eta) \le C|\eta-\xi|^\beta$ for all $\xi,\eta \in \mathbb{R}^N$, then, for each $\gamma > \frac\beta2$, $\{X(\xi) : \xi \in \mathbb{R}^N\}$ can be chosen so that $\frac{|X(\xi,\omega)|}{|\xi|^\gamma} \to 0$ as $|\xi| \to \infty$.
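Proof Let $\{X(\xi) : \xi \in \mathbb{R}^N\}$ be a Gaussian process with covariance function $c$. Then, for each $R > 0$,

$$\mathbb{E}\bigl[|X(\eta)-X(\xi)|^2\bigr] = c(\eta,\eta)+c(\xi,\xi)-2c(\xi,\eta) \le C_R|\eta-\xi|^\beta$$

for some $C_R < \infty$ and all $\xi,\eta \in [-R,R]^N$, and so, because $X(\eta)-X(\xi)$ is a centered Gaussian random variable, for any $p \in [1,\infty)$,

$$\mathbb{E}\bigl[|X(\eta)-X(\xi)|^p\bigr]^{\frac1p} \le K_pC_R^{\frac12}|\eta-\xi|^{\frac\beta2},$$

where $K_p = \bigl(\int_{\mathbb{R}}|x|^p\,\gamma_{0,1}(dx)\bigr)^{\frac1p}$. Now, given $0 \le \alpha < \frac\beta2$, choose $0 < r < \frac\beta2-\alpha$, and determine $p \in [1,\infty)$ by $\frac Np = \frac\beta2 - r$. Then

$$\mathbb{E}\bigl[|X(\eta)-X(\xi)|^p\bigr]^{\frac1p} \le K_pC_R^{\frac12}|\eta-\xi|^{\frac Np+r},$$

and so the continuity assertion follows from Theorem 2.5.3. Similarly, when $C_R$ can be chosen independent of $R$, the concluding growth estimate follows from Corollary 2.5.4.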
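The constant $K_p$ in this proof has a closed form, since $\int_{\mathbb{R}}|x|^p\,\gamma_{0,1}(dx) = 2^{p/2}\,\Gamma\bigl(\tfrac{p+1}{2}\bigr)/\sqrt{\pi}$. The sketch below evaluates it and the resulting moment bound; the function names `gaussian_lp_norm` and `increment_lp_bound` are hypothetical, and the bound assumes the increment is centered Gaussian with variance at most $C_R|\eta-\xi|^\beta$, as in the proof.

```python
import math


def gaussian_lp_norm(p: float) -> float:
    """K_p = (E|Z|^p)^(1/p) for a standard normal Z, via the Gamma function."""
    if p < 1.0:
        raise ValueError("p must be at least 1")
    moment = 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)
    return moment ** (1.0 / p)


def increment_lp_bound(p: float, c_r: float, distance: float, beta: float) -> float:
    """Upper bound K_p * sqrt(C_R) * distance^(beta / 2) on the L^p norm of an
    increment whose variance is at most C_R * distance^beta."""
    return gaussian_lp_norm(p) * math.sqrt(c_r) * distance ** (beta / 2.0)


if __name__ == "__main__":
    for p in (1.0, 2.0, 4.0, 8.0):
        print(p, gaussian_lp_norm(p))
```

Because $K_p$ is finite for every $p$, one is free to take $p$ as large as the exponent bookkeeping in Theorem 2.5.3 requires, which is what makes Kolmogorov's criterion so effective for Gaussian processes.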
2.5.2 Some Examples

By far the most renowned Gaussian process parameterized by the real numbers is the one constructed originally by Wiener and usually called Brownian motion. The covariance function for this process is

$$w(s,t) = \begin{cases}|s|\wedge|t| & \text{if } st > 0\\ 0 & \text{if } st \le 0.\end{cases} \tag{2.5.3}$$

To check that $w$ is a covariance function, it suffices to check that its restriction to $[0,\infty)^2$ is one. To do so, observe that

$$s\wedge t = \int_0^\infty \mathbf{1}_{[0,\infty)}(s-u)\,\mathbf{1}_{[0,\infty)}(t-u)\,du \quad\text{for } s,t \ge 0,$$

and therefore that

$$\sum_{j,k=1}^n (s_j\wedge s_k)\,\alpha_j\alpha_k = \int_0^\infty\Biggl(\sum_{j=1}^n \alpha_j\mathbf{1}_{[0,\infty)}(s_j-u)\Biggr)^2du \ge 0$$

for all choices of $s_1,\dots,s_n \in [0,\infty)$ and $\alpha_1,\dots,\alpha_n \in \mathbb{R}$.

Theorem 2.5.6 There exists a probability space $(\Omega,\mathcal{F},\mathbb{P})$ on which there is a centered Gaussian process $\{B(t) : t \in \mathbb{R}\}$ with covariance function $w$ and having the property that, for each $\omega \in \Omega$, $B(\,\cdot\,,\omega)$ is a Hölder continuous function of every order $0 \le \alpha < \frac12$, and

$$\lim_{|t|\to\infty}\frac{|B(t,\omega)|}{|t|^\gamma} = 0 \quad\text{for every } \gamma > \frac12.$$

Proof Simply observe that $w(t,t)+w(s,s)-2w(s,t) = |t-s|$, and apply Corollary 2.5.5.

From now on I will say that a collection $\{X(t) : t \in \mathbb{R}\}$ of random variables on a probability space $(\Omega,\mathcal{F},\mathbb{P})$ is a Brownian motion if it is a centered Gaussian process with covariance $w$ and $X(\,\cdot\,,\omega)$ is continuous for $\mathbb{P}$-almost every $\omega \in \Omega$. Clearly, $\mathbb{P}$-almost all of the paths $X(\,\cdot\,,\omega)$ will possess all the properties described in Theorem 2.5.6.

Wiener constructed this process to provide a mathematically rigorous foundation for Einstein's model of a physical phenomenon, first reported by a botanist named Brown, on which he was basing his kinetic theory of gases. However, at least for mathematicians, its renown does not derive from its connection to Einstein or the frequency of its appearance in models of physical, engineering, and even financial phenomena, but from the sometimes startling properties it possesses. The following
provides a few elementary examples of these exotic properties, the fourth of which casts some doubt on the physical validity of Einstein's model.

Theorem 2.5.7 Let $\{B(t) : t \in \mathbb{R}\}$ be a Brownian motion on $(\Omega,\mathcal{F},\mathbb{P})$.

(i) $\sigma\bigl(\{B(-t) : t \ge 0\}\bigr)$ is independent of $\sigma\bigl(\{B(t) : t \ge 0\}\bigr)$; for each $s \in [0,\infty)$, $B(s)$ is independent of $\sigma\bigl(\{B(t_2)-B(t_1) : s \le t_1 \le t_2\}\bigr)$; and, for any $n \ge 1$ and $t_0 < \cdots < t_n$, $\{B(t_m)-B(t_{m-1}) : 1 \le m \le n\}$ are mutually independent, $N(0,t_m-t_{m-1})$-random variables.

(ii) Both $\{-B(t) : t \in \mathbb{R}\}$ and $\{B(-t) : t \in \mathbb{R}\}$ are Brownian motions, and, for any $\alpha > 0$, $\{\alpha^{-\frac12}B(\alpha t) : t \in \mathbb{R}\}$ is again a Brownian motion. In addition, for any $T \in \mathbb{R}$, $\{B(t+T)-B(T) : t \in \mathbb{R}\}$ is a Brownian motion.

(iii) Set $\tilde B(0) = 0$ and $\tilde B(t) = |t|B\bigl(\frac1t\bigr)$ for $t \ne 0$. Then $\{\tilde B(t) : t \in \mathbb{R}\}$ is a Brownian motion.

(iv) Define

$$V_n(t) = \sum_{m=0}^{\lfloor 2^nt\rfloor-1}\Bigl(B\bigl((m+1)2^{-n}\bigr)-B\bigl(m2^{-n}\bigr)\Bigr)^2 \quad\text{for } t > 0.$$

Then,

$$\lim_{n\to\infty}\sup_{s\in[0,t]}|V_n(s)-s| = 0 \quad\text{(a.s., $\mathbb{P}$) for all } t > 0.$$

In particular, $\mathbb{P}$-almost no path $B(\,\cdot\,)$ has locally bounded variation or is locally Hölder continuous of any order larger than $\frac12$.

Proof By Theorem 2.4.3, to verify (i) it suffices to observe that $\mathbb{E}\bigl[B(t)B(-t)\bigr] = 0$ for all $t \in \mathbb{R}$, $\mathbb{E}\bigl[\bigl(B(t_2)-B(t_1)\bigr)B(s)\bigr] = 0$ if $0 \le s \le t_1 \le t_2$, and $\mathbb{E}\bigl[\bigl(B(t_2)-B(t_1)\bigr)\bigl(B(t_4)-B(t_3)\bigr)\bigr] = 0$ for $t_1 < t_2 < t_3 < t_4$. Since all the processes described in (ii) are centered Gaussian processes with continuous paths, all that one needs to do is check that $w$ is their covariance function. Similarly, the collection in (iii) is a centered Gaussian process with covariance function $w$, and so it suffices to check that its paths are $\mathbb{P}$-almost surely continuous, which comes down to showing that they are continuous at 0. But we know that $\lim_{|t|\to\infty}\frac{B(t)}{t} = 0$ (a.s., $\mathbb{P}$), and so $\lim_{t\to0}|t|B\bigl(\frac1t\bigr) = 0$ (a.s., $\mathbb{P}$).

Turning to (iv), define $\Delta_{m,n} = B\bigl((m+1)2^{-n}\bigr)-B\bigl(m2^{-n}\bigr)$ for $m \ge 0$. Clearly, $\Delta_{m,n} \in N(0,2^{-n})$, and, by (i), for each $n \ge 0$, the $\Delta_{m,n}$'s are mutually independent. Now set $Y_{m,n} = \Delta_{m,n}^2 - 2^{-n}$ and $S_n(m) = \sum_{k=0}^m Y_{k,n}$. Then, for each $n \ge 0$, the $Y_{m,n}$'s are mutually independent random variables with mean 0 and variance $2^{1-2n}$. Since

$$|V_n(t)-t| \le \bigl|V_n(t)-2^{-n}\lfloor 2^nt\rfloor\bigr| + 2^{-n} = \bigl|S_n\bigl(\lfloor 2^nt\rfloor-1\bigr)\bigr| + 2^{-n},$$

it suffices to show that $\sup\bigl\{|S_n(m)| : 0 \le m \le 2^nt\bigr\} \to 0$ almost surely. But, by Kolmogorov's inequality,
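$$\mathbb{P}\Bigl(\sup_{0\le m\le2^nt}|S_n(m)| \ge \epsilon\Bigr) \le \epsilon^{-2}\,\mathbb{E}\bigl[S_n\bigl(\lfloor 2^nt\rfloor\bigr)^2\bigr] \le \epsilon^{-2}2^{1-n}t,$$

and so the proof of the first part of (iv) is complete. To prove the rest, use (ii) to see that it suffices to consider the paths restricted to $[0,1]$. Next observe that if $\varphi : [0,1] \to \mathbb{R}$ is Hölder continuous of order $\alpha > \frac12$, then

$$\sum_{m=0}^{2^n-1}\Bigl(\varphi\bigl((m+1)2^{-n}\bigr)-\varphi\bigl(m2^{-n}\bigr)\Bigr)^2 \le \|\varphi\|_{C^\alpha([0,1];\mathbb{R})}^2\,2^{n(1-2\alpha)} \longrightarrow 0.$$

Also, if $\varphi$ is continuous and of bounded variation and if $\rho$ is the modulus of continuity for $\varphi$, then

$$\sum_{m=0}^{2^n-1}\Bigl(\varphi\bigl((m+1)2^{-n}\bigr)-\varphi\bigl(m2^{-n}\bigr)\Bigr)^2 \le \|\varphi\|_{\mathrm{var}}\,\rho(2^{-n}) \longrightarrow 0.$$

Hence, $\mathbb{P}$-almost no Brownian path can have either of these properties.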
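Part (iv) of Theorem 2.5.7 is easy to observe numerically, because by (i) the dyadic increments $\Delta_{m,n}$ are independent $N(0,2^{-n})$ random variables and can be sampled directly. The following is a minimal Monte Carlo sketch; the function name, the seed, and the choice $t = 1$ are illustrative assumptions.

```python
import math
import random


def dyadic_quadratic_variation(n: int, t: float, rng: random.Random) -> float:
    """One sample of V_n(t): the sum of squares of the floor(2^n t) independent
    N(0, 2^-n) increments of a Brownian motion over the dyadic grid of mesh 2^-n."""
    step_sd = math.sqrt(2.0 ** (-n))
    count = math.floor((2 ** n) * t)
    return sum(rng.gauss(0.0, step_sd) ** 2 for _ in range(count))


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (4, 8, 12, 16):
        # The variance of V_n(1) is 2^(1 - n), so the values settle near 1.
        print(n, dyadic_quadratic_variation(n, 1.0, rng))
```

Each call draws a fresh path, so successive values of $n$ here are not refinements of a single path; the sketch illustrates the concentration of $V_n(t)$ around $t$, not the almost sure convergence along one path.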
Given a collection $\{X(t) : t \in \mathbb{R}\}$ of random variables, set $\mathcal{F}_s = \sigma\bigl(\{X(\tau) : \tau \in (-\infty,s]\}\bigr)$ for $s \in \mathbb{R}$. Then $\{X(t) : t \in \mathbb{R}\}$ is a Markov process if, for all $s < t$, the conditional distribution of $X(t)$ given $\sigma\bigl(\{X(s)\}\bigr)$ is the same as that of $X(t)$ given $\mathcal{F}_s$. If $\{X(t) : t \in \mathbb{R}\}$ is a Markov process and, for $s < t$, $x \in \mathbb{R} \mapsto P(s,x;t,\,\cdot\,) \in M_1(\mathbb{R})$ is a measurable map (i.e., $x \mapsto P(s,x;t,\Gamma)$ is measurable for all $\Gamma \in \mathcal{B}_{\mathbb{R}}$) such that

$$\mathbb{E}\bigl[\mathbf{1}_\Gamma\bigl(X(t)\bigr)\,\big|\,\mathcal{F}_s\bigr] = P\bigl(s,X(s);t,\Gamma\bigr) \quad\text{for } \Gamma \in \mathcal{B}_{\mathbb{R}},$$

then $P(s,x;t,\,\cdot\,)$ is called the transition probability function for $\{X(t) : t \in \mathbb{R}\}$. Finally, $\{X(t) : t \in \mathbb{R}\}$ is said to be a homogeneous Markov process if it has a transition probability function that, as a function of $s < t$, depends only on $t-s$. That is, $P(s,x;t,\,\cdot\,) = P(t-s,x,\,\cdot\,) \equiv P(0,x;t-s,\,\cdot\,)$.

An important fact about a Brownian motion $\{B(t) : t \in \mathbb{R}\}$ is that $\{B(t^+) : t \in \mathbb{R}\}$ is a Markov process. Indeed, if $\mathcal{F}_s = \sigma\bigl(\{B(\tau^+) : \tau \in (-\infty,s]\}\bigr)$, then, for all $s < t$ and bounded measurable $\varphi : \mathbb{R} \to \mathbb{R}$,

$$\mathbb{E}\bigl[\varphi\bigl(B(t^+)\bigr)\,\big|\,\mathcal{F}_s\bigr] = \int_{\mathbb{R}}\varphi\bigl(B(s^+)+y\bigr)\,\gamma_{0,t^+-s^+}(dy)$$

since, by (i) in Theorem 2.5.7, $B(t^+)-B(s^+)$ is independent of $\mathcal{F}_s$ and is a centered Gaussian with variance $t^+-s^+$. Therefore $\{B(t^+) : t \in \mathbb{R}\}$ is a Markov process with transition probability $\gamma_{x,t^+-s^+}$. This conclusion is a special case of the following.

Lemma 2.5.8 Let $\{X(t) : t \in \mathbb{R}\}$ be a centered Gaussian process with covariance function $c$. If $\{X(t) : t \in \mathbb{R}\}$ is a Markov process, then

$$c(r,t)\,c(s,s) = c(r,s)\,c(s,t) \quad\text{for all } r \le s < t. \tag{2.5.4}$$

Conversely, if $c$ satisfies (2.5.4) and $t \in \mathbb{R} \mapsto c(t,t) \in \mathbb{R}$ is continuous, then $\{X(t) : t \in \mathbb{R}\}$ is a Markov process with transition probability function $P(s,x;t,\,\cdot\,) = \gamma_{b(s,t)x,\,a(s,t)}$, where

$$b(s,t) = \begin{cases}\dfrac{c(s,t)}{c(s,s)} & \text{if } c(s,s) > 0\\[1ex] 0 & \text{if } c(s,s) = 0,\end{cases}
\qquad
a(s,t) = \begin{cases}c(t,t)-\dfrac{c(s,t)^2}{c(s,s)} & \text{if } c(s,s) > 0\\[1ex] c(t,t) & \text{if } c(s,s) = 0,\end{cases}$$

and $\gamma_{b,0} = \delta_b$.

Proof Note that Schwarz's inequality implies that $c(s,t)^2 \le c(s,s)\,c(t,t)$. First suppose that $\{X(t) : t \in \mathbb{R}\}$ is a Markov process. Given $r \le s < t$, either $c(s,s) = 0$, in which case $c(r,s) = 0$ and therefore (2.5.4) holds, or $c(s,s) > 0$, in which case $X(t)-\frac{c(s,t)}{c(s,s)}X(s)$ is independent of $X(s)$ and therefore $\mathbb{E}\bigl[X(t)\,\big|\,\mathcal{F}_s\bigr] = \frac{c(s,t)}{c(s,s)}X(s)$, which means that

$$c(r,t) = \mathbb{E}\bigl[X(t)X(r)\bigr] = \frac{c(s,t)}{c(s,s)}\,\mathbb{E}\bigl[X(s)X(r)\bigr] = \frac{c(r,s)\,c(s,t)}{c(s,s)}.$$

Conversely, assume $t \mapsto c(t,t)$ is continuous and that (2.5.4) holds. If $r \le s < t$ and $c(s,s) > 0$, then

$$\mathbb{E}\bigl[\bigl(X(t)-b(s,t)X(s)\bigr)X(r)\bigr] = c(r,t)-\frac{c(s,t)\,c(r,s)}{c(s,s)} = 0,$$

and so $X(t)-b(s,t)X(s)$ is a centered Gaussian with variance $a(s,t)$ that is independent of $\mathcal{F}_s$. What remains is to handle the case when $c(s,s) = 0$ for some $s \in \mathbb{R}$. To this end, let $t \in \mathbb{R}$ be given. There is no problem if either $c(t,t) = 0$ or $c(s,s) > 0$ for all $s \le t$. Now set $s_0 = \sup\{s \le t : c(s,s) = 0\}$, and assume that $s_0 > -\infty$. By continuity, $c(s_0,s_0) = 0$, and so we are done if $s_0 = t$. If $s_0 < t$, then we have to show that $c(t,r) = 0$ for all $r < s_0$. But $r < s_0 \implies c(t,r)\,c(r,r) = c(r,s_0)\,c(s_0,t) = 0$, which, since $c(r,r) = 0 \implies c(t,r) = 0$, is possible only if $c(t,r) = 0$.
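Identity (2.5.4) is a finite check once a grid of times is fixed, so it is easy to test candidate covariance functions against it. The sketch below is illustrative: the helper names are hypothetical, the check runs only over the supplied times, and passing it on a finite grid is evidence rather than a proof that (2.5.4) holds for all $r \le s < t$.

```python
import math
from typing import Callable, Sequence, Tuple

Covariance = Callable[[float, float], float]


def satisfies_markov_identity(c: Covariance, times: Sequence[float], tol: float = 1e-12) -> bool:
    """Check c(r, t) c(s, s) == c(r, s) c(s, t) for all r <= s < t taken from times."""
    ts = sorted(times)
    for i, r in enumerate(ts):
        for j in range(i, len(ts)):
            s = ts[j]
            for t in ts[j + 1:]:
                if abs(c(r, t) * c(s, s) - c(r, s) * c(s, t)) > tol:
                    return False
    return True


def transition_parameters(c: Covariance, s: float, t: float) -> Tuple[float, float]:
    """The pair (b(s, t), a(s, t)) from Lemma 2.5.8."""
    if c(s, s) > 0.0:
        return c(s, t) / c(s, s), c(t, t) - c(s, t) ** 2 / c(s, s)
    return 0.0, c(t, t)


def brownian(s: float, t: float) -> float:
    return min(abs(s), abs(t)) if s * t > 0 else 0.0


def ornstein_uhlenbeck(s: float, t: float) -> float:
    return math.exp(-abs(t - s) / 2.0)


def periodic(s: float, t: float) -> float:
    return math.cos(2.0 * math.pi * (t - s))


if __name__ == "__main__":
    grid = [0.0, 0.25, 0.5, 1.0, 2.0]
    for name, c in (("brownian", brownian), ("ou", ornstein_uhlenbeck), ("periodic", periodic)):
        print(name, satisfies_markov_identity(c, grid))
    print(transition_parameters(ornstein_uhlenbeck, 0.0, 1.0))
```

For the covariance $e^{-|t-s|/2}$ the parameters come out as $b = e^{-(t-s)/2}$ and $a = 1 - e^{-(t-s)}$, while the periodic covariance $\cos 2\pi(t-s)$ fails the identity.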
2.5.3 Stationary Gaussian Processes

A stochastic process $\{X(\xi) : \xi \in V\}$, where $V$ is a vector space, is said to be stationary if, for all $\eta \in V$, the distribution of $\{X(\xi+\eta) : \xi \in V\}$ is the same as that of $\{X(\xi) : \xi \in V\}$. For a centered Gaussian process, it is easy to show that stationarity is equivalent to its covariance function $c$ being translation invariant. That is, $c(\xi,\eta) = c(0,\eta-\xi)$ for all $\xi,\eta \in V$.
When $V = \mathbb{R}^N$, $c$ is translation invariant if and only if there is an $\mathbb{R}$-valued, non-negative definite function $f$ such that $c(\xi,\eta) = f(\eta-\xi)$, which means that $\frac{f}{f(0)}$ must be a characteristic function for a symmetric probability measure if $c$ is continuous. Thus we have a ready source of translation invariant covariance functions. Notice that, by Corollary 2.5.5, a centered Gaussian process with covariance function $f(\eta-\xi)$ will admit a continuous version if $f$ is Hölder continuous of some positive order at 0.

Theorem 2.5.9 Suppose that $f : \mathbb{R} \to \mathbb{R}$ is a continuous, non-negative definite function and that $\{X(t) : t \in \mathbb{R}\}$ is a continuous, centered Gaussian process with covariance function $f(t-s)$ on $(\Omega,\mathcal{F},\mathbb{P})$. If $f(t) \to 0$ as $t \to \infty$, then $\{X(t) : t \in \mathbb{R}\}$ is ergodic. In particular, for any bounded, Borel measurable $\varphi : \mathbb{R} \to \mathbb{R}$,

$$\lim_{T\to\infty}\frac1T\int_0^T\varphi\bigl(X(t)\bigr)\,dt = \mathbb{E}\bigl[\varphi\bigl(X(0)\bigr)\bigr] = \int_{\mathbb{R}}\varphi(y)\,\gamma_{0,f(0)}(dy) \quad\text{(a.s., $\mathbb{P}$)}.$$
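Proof Without loss of generality, we will assume that $\Omega = C(\mathbb{R};\mathbb{R})$ and $\mathcal{F} = \sigma\bigl(\{\omega(t) : t \in \mathbb{R}\}\bigr)$. For each $s \in \mathbb{R}$, define the time-shift map $T_s : \Omega \to \Omega$ by $T_s\omega(t) = \omega(s+t)$. Clearly each $T_s$ is an $\mathcal{F}$-measurable, $\mathbb{P}$-measure preserving map, and $T_s\circ T_t = T_{t+s}$ for all $s,t \in \mathbb{R}$. By the Individual Ergodic Theorem (cf. Theorem 6.2.12 in [14]), what we have to show is that if $A \in \mathcal{F}$ is time-shift invariant (i.e., $A = T_t^{-1}A$ for all $t \ge 0$), then $\mathbb{P}(A) \in \{0,1\}$. For that purpose, we will show that, for any $A \in \mathcal{F}$,

$$\mathbb{P}\bigl(A\cap T_t^{-1}A\bigr) \longrightarrow \mathbb{P}(A)^2 \quad\text{as } t \to \infty, \tag{$*$}$$

from which it is clear that $\mathbb{P}(A) = \mathbb{P}(A)^2$ if $A$ is time-shift invariant.

Let $\mathcal{A}$ be the collection of all subsets having the form $\bigl\{\omega : \bigl(\omega(t_1),\dots,\omega(t_n)\bigr) \in \Gamma\bigr\}$ for some $n \ge 1$, $t_1 < \cdots < t_n$, and $\Gamma \in \mathcal{B}_{\mathbb{R}^n}$, and let $\mathcal{C}$ be the set of $A \in \mathcal{F}$ for which ($*$) holds. Clearly $\mathcal{A}$ is an algebra which generates $\mathcal{F}$, and $\mathcal{C}$ is closed under complementation. Thus, if we can show that $\mathcal{C} \supseteq \mathcal{A}$ and is closed under non-decreasing limits, then we will know that $\mathcal{C} = \mathcal{F}$.

To prove that $\mathcal{A} \subseteq \mathcal{C}$, let $A = \bigl\{\omega : \bigl(\omega(t_1),\dots,\omega(t_n)\bigr) \in \Gamma\bigr\}$, where $t_1 < \cdots < t_n$. By Lemma 2.4.1, we may assume that $B = \bigl(\bigl(f(t_j-t_k)\bigr)\bigr)_{1\le j,k\le n}$ is non-degenerate. Next, for $t > t_n-t_1$, define

$$C(t) = \bigl(\bigl(f(t_j-t_k-t)\bigr)\bigr)_{1\le j,k\le n} \quad\text{and}\quad B(t) = \begin{pmatrix}B & C(t)\\ C(t)^\top & B\end{pmatrix}.$$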
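Then $\gamma_{0,B(t)}$ is the joint distribution of

$$\bigl(\omega(t_1),\dots,\omega(t_n)\bigr) \quad\text{and}\quad \bigl(\omega(t_1+t),\dots,\omega(t_n+t)\bigr),$$

and so $\mathbb{P}(A\cap T_t^{-1}A) = \gamma_{0,B(t)}(\Gamma\times\Gamma)$. Since

$$B(t) \longrightarrow \begin{pmatrix}B & 0\\ 0 & B\end{pmatrix} \quad\text{as } t \to \infty$$

and $B$ is non-degenerate, $\gamma_{0,B(t)}$ converges in variation to $\gamma_{0,B}^2$, which means that $\mathbb{P}(A\cap T_t^{-1}A) \to \gamma_{0,B}(\Gamma)^2 = \mathbb{P}(A)^2$.

Finally, suppose that $\{A_n : n \ge 1\} \subseteq \mathcal{C}$ and that $A_n \nearrow A$. Then

$$0 \le \mathbb{P}\bigl(A\cap T_t^{-1}A\bigr)-\mathbb{P}\bigl(A_n\cap T_t^{-1}A_n\bigr) \le \mathbb{P}(A\setminus A_n)+\mathbb{P}\bigl(T_t^{-1}A\setminus T_t^{-1}A_n\bigr) = 2\,\mathbb{P}(A\setminus A_n).$$

Hence, if ($*$) holds for each $A_n$, then it also holds for $A$.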
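The ergodic average in Theorem 2.5.9 can be watched converging in a simple case. The sketch below takes $f(t) = e^{-|t|/2}$, which tends to 0 at infinity, samples the corresponding stationary process on a grid using its exact Gaussian transition, and replaces the time integral by a Riemann sum. The function name, grid size, horizon, and seed are all assumptions of the example.

```python
import math
import random
from typing import Callable


def ou_time_average(phi: Callable[[float], float], horizon: float, step: float, rng: random.Random) -> float:
    """Riemann-sum approximation of (1/T) * integral_0^T phi(X(t)) dt for a
    stationary centered Gaussian process with covariance exp(-|t - s| / 2)."""
    decay = math.exp(-step / 2.0)            # conditional mean factor over one step
    noise_sd = math.sqrt(1.0 - math.exp(-step))  # conditional standard deviation
    x = rng.gauss(0.0, 1.0)                  # X(0) has the stationary law N(0, 1)
    count = int(horizon / step)
    total = 0.0
    for _ in range(count):
        total += phi(x)
        x = decay * x + noise_sd * rng.gauss(0.0, 1.0)
    return total / count


if __name__ == "__main__":
    rng = random.Random(7)
    # For bounded phi = cos, the limit is E[cos(Z)] = exp(-1/2) with Z ~ N(0, 1).
    print(ou_time_average(math.cos, 5000.0, 0.1, rng), math.exp(-0.5))
```

The grid samples have exactly the right joint law; only the replacement of the integral by a sum is an approximation.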
We now have two examples of the influence that properties of $f$ have on the properties of the associated Gaussian process: continuity properties of $f$ are reflected in continuity properties of the paths, and decay properties of $f$ at infinity influence long term properties, in particular ergodicity, of the process. Periodicity provides a more dramatic example. To wit, take $f(t) = \cos(2\pi t)$. Of course one can use Corollary 2.5.5 to construct an associated Gaussian process, but there is a more revealing way to do so. Namely, take $\Omega = \mathbb{R}^2$ and $\mathbb{P} = \gamma_{0,1}^2$, and define $X(t,\xi) = \xi_1\cos(2\pi t)+\xi_2\sin(2\pi t)$ for $t \in \mathbb{R}$ and $\xi \in \Omega$. Clearly $\{X(t) : t \in \mathbb{R}\}$ is a centered Gaussian process under $\mathbb{P}$. Moreover,

$$\mathbb{E}\bigl[X(s)X(t)\bigr] = \cos(2\pi s)\cos(2\pi t)+\sin(2\pi s)\sin(2\pi t) = \cos\bigl(2\pi(t-s)\bigr).$$

Thus $\cos\bigl(2\pi(t-s)\bigr)$ is the covariance function for $\{X(t) : t \in \mathbb{R}\}$.

As Doob noticed, there are very few $\mathbb{R}$-valued characteristic functions $f : \mathbb{R} \to \mathbb{R}$ for which the corresponding covariance function satisfies (2.5.4). Indeed, if (2.5.4) holds, $f(\xi+\eta) = f(\xi)f(\eta)$ for all $\xi,\eta \ge 0$. Thus, if $g(\xi) = \operatorname{sgn}(\xi)\log f(\xi)$, then $g(\xi+\eta) = g(\xi)+g(\eta)$ for all $\xi,\eta \in \mathbb{R}$, and so, by Theorem 2.2.4, $f(\xi) = e^{-\alpha|\xi|}$ for some $\alpha \ge 0$. As a consequence, if $\{X(t) : t \in \mathbb{R}\}$ is the centered Gaussian process with covariance function $e^{-\frac{|t|}2}$, then any other stationary, centered Gaussian process that is Markov will have the same distribution as $\{aX(bt) : t \in \mathbb{R}\}$ for some $a,b \ge 0$. Because they were introduced by Ornstein and Uhlenbeck, these are called stationary Ornstein–Uhlenbeck processes.

There is an interesting way to construct a stationary Ornstein–Uhlenbeck process from a Brownian motion. Namely, let $\{B(t) : t \in \mathbb{R}\}$ be a Brownian motion on a probability space $(\Omega,\mathcal{F},\mathbb{P})$ that supports a random variable $X_0 \in N(0,1)$ that is independent of $\{B(t) : t \in \mathbb{R}\}$, and define
$$X(t) = e^{-\frac{|t|}2}\Bigl(X_0+B\bigl(\operatorname{sgn}(t)(e^{|t|}-1)\bigr)\Bigr). \tag{2.5.5}$$
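The covariance of the process in (2.5.5) can be computed from the Brownian covariance $w$ in (2.5.3), using only that $X_0$ has variance 1 and is independent of the Brownian motion. The following sketch carries out that computation and compares it with $e^{-|t-s|/2}$; the function names are hypothetical.

```python
import math


def w(s: float, t: float) -> float:
    """Brownian covariance (2.5.3)."""
    return min(abs(s), abs(t)) if s * t > 0 else 0.0


def sgn(t: float) -> int:
    return (t > 0) - (t < 0)


def covariance_from_representation(s: float, t: float) -> float:
    """E[X(s) X(t)] for X(t) = exp(-|t|/2) (X_0 + B(sgn(t) (exp|t| - 1))),
    with X_0 ~ N(0, 1) independent of the Brownian motion B."""
    clock_s = sgn(s) * (math.exp(abs(s)) - 1.0)
    clock_t = sgn(t) * (math.exp(abs(t)) - 1.0)
    return math.exp(-(abs(s) + abs(t)) / 2.0) * (1.0 + w(clock_s, clock_t))


if __name__ == "__main__":
    for s, t in ((1.0, 3.0), (-1.0, 2.0), (0.0, 1.5), (-2.0, -2.0)):
        print(s, t, covariance_from_representation(s, t), math.exp(-abs(t - s) / 2.0))
```

When $s$ and $t$ have opposite signs the Brownian term contributes nothing, and the covariance $e^{-(|s|+|t|)/2} = e^{-|t-s|/2}$ comes entirely from $X_0$.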
It is not hard to check that $\{X(t) : t \in \mathbb{R}\}$ is a stationary Ornstein–Uhlenbeck process with covariance function $e^{-\frac{|t|}2}$. For aficionados of Brownian motion, this representation facilitates computations. For example, it makes clear that, for $s < t$, $X(t)-e^{-\frac{t-s}2}X(s)$ is independent of $\mathcal{F}_s = \sigma\bigl(\{X(\tau) : \tau \le s\}\bigr)$ and has distribution $\gamma_{0,1-e^{s-t}}$. Hence, $\{X(t) : t \in \mathbb{R}\}$ is a homogeneous Markov process, and the function $p(t,x,y)$ that appeared in (2.3.2) is the density of its transition probability function.

Exercise 2.5.1

(i) Show that if $c_1$ and $c_2$ are covariance functions on $\mathcal{I}^2$, then $c_1c_2$ is also a covariance function there. Hint: Construct a probability space on which there are mutually independent families $\{X_1(\xi) : \xi \in \mathcal{I}\}$ and $\{X_2(\xi) : \xi \in \mathcal{I}\}$ of centered random variables, the first with covariance function $c_1$ and the second with covariance function $c_2$. Alternatively, see if you can find a more direct proof.

(ii) Assume that $f \in C^2(\mathbb{R};\mathbb{R})$ is a characteristic function whose second derivative is Hölder continuous of some positive order. Set $\mathcal{I} = \{(s,t) \in \mathbb{R}^2 : st = 0\}$, and define

$$c\bigl((s,0);(t,0)\bigr) = f(t-s),\quad c\bigl((s,0);(0,t)\bigr) = c\bigl((0,t);(s,0)\bigr) = f'(t-s),\quad\text{and}\quad c\bigl((0,s);(0,t)\bigr) = -f''(t-s).$$

Show that $c$ is a covariance function on $\mathcal{I}$ and that there is a continuous centered Gaussian process $\{Z(s,t) : (s,t) \in \mathcal{I}\}$ for which $c$ is the covariance function. Next, set $X(t) = Z(t,0)$ and $Y(t) = Z(0,t)$, and show that $\{X(t) : t \in \mathbb{R}\}$ is a centered Gaussian process with covariance function $f(t-s)$, that $\{Y(t) : t \in \mathbb{R}\}$ is a centered Gaussian process with covariance function $-f''(t-s)$, and that, almost surely,

$$X(t)-X(s) = \int_s^tY(\tau)\,d\tau \quad\text{for all } s < t.$$

Equivalently, $X(\,\cdot\,) \in C^1(\mathbb{R};\mathbb{R})$ and $Y(\,\cdot\,) = \dot X(\,\cdot\,)$ almost surely.

(iii) If $f$ is an $\mathbb{R}$-valued characteristic function that is periodic and $\{X(t) : t \in \mathbb{R}\}$ is a continuous, centered Gaussian process for which $f(t-s)$ is the covariance function, show that almost surely $X(\,\cdot\,)$ is periodic with the same period as $f$. In addition, if $f \in C^2(\mathbb{R};\mathbb{R})$ and $f''$ is Hölder continuous of some positive order, show that $X(t)$ is independent of $\dot X(t)$ for each $t \in \mathbb{R}$.

(iv) Let $\{B(t) : t \in \mathbb{R}\}$ be a Brownian motion. Using the representation of an Ornstein–Uhlenbeck process in terms of a Brownian motion, show that, for each $\alpha > 0$,

$$\lim_{|t|\to\infty}\frac{|B(t)|}{|t|^{\frac12}(\log|t|)^\alpha} = 0$$
almost surely. This improves our earlier result but is still far short of the result coming from the law of the iterated logarithm (cf. Sect. 4.2.2), which says that

$$\limsup_{|t|\to\infty}\frac{|B(t)|}{\sqrt{2|t|\log(\log|t|)}} = 1.$$

(v) Let $\{B(t) : t \in \mathbb{R}\}$ be a Brownian motion, and define $B_0(t) = B(t)-\psi(t)B(1)$, where $\psi(t) = (t\wedge1)^+$. Show that $B(1)$ is independent of $\sigma\bigl(\{B_0(t) : t \in \mathbb{R}\}\bigr)$. Next, given a non-negative, $\sigma\bigl(\{B(t) : t \in \mathbb{R}\}\bigr)$-measurable function $F$, set $\bar F(y) = \mathbb{E}\bigl[F\circ(B_0+\psi y)\bigr]$ for $y \in \mathbb{R}$, and show that

$$\mathbb{E}\bigl[F\circ B,\ B(1) \in \Gamma\bigr] = \int_\Gamma\bar F(y)\,\gamma_{0,1}(dy)$$

for $\Gamma \in \mathcal{B}_{\mathbb{R}}$. Conclude that

$$\mathbb{E}\bigl[F\circ B\,\big|\,\sigma\bigl(\{B(1)\}\bigr)\bigr] = \bar F\bigl(B(1)\bigr).$$

Thus, the distribution of $\{B_0(t) : t \in \mathbb{R}\}$ is that of a Brownian motion conditioned to be at 0 at time $t = 1$. For this reason, $\{B_0(t) : t \in \mathbb{R}\}$ is sometimes called a pinned Brownian motion.

(vi) Referring to (v), show that $\{B_0(t^+) : t \in \mathbb{R}\}$ is a Markov process. Further, if $P(s,x;t,\,\cdot\,)$ is its transition probability function, show that

$$P(s,x;t,\,\cdot\,) = \gamma_{\frac{1-t}{1-s}x,\ \frac{(1-t)(t-s)}{1-s}} \quad\text{for } 0 < s < t < 1.$$

(vii) Let $\{X(t) : t \in \mathbb{R}\}$ be an Ornstein–Uhlenbeck process with covariance function $e^{-\frac{|t|}2}$, and set $\mathcal{F}_t = \sigma\bigl(\{X(\tau) : \tau \in (-\infty,t]\}\bigr)$ for $t \in \mathbb{R}$. If $H_n$ is the $n$th Hermite polynomial, show that, for each $n \in \mathbb{N}$,

$$\Bigl(e^{\frac{nt}2}H_n\bigl(X(t)\bigr),\ \mathcal{F}_t,\ \mathbb{P}\Bigr)$$

is a martingale.

(viii) Let $\{B(t) : t \in \mathbb{R}\}$ be a Brownian motion and set $\mathcal{F}_t = \sigma\bigl(\{B(\tau^+) : \tau \in (-\infty,t]\}\bigr)$ for $t \in \mathbb{R}$. Show that

$$\Bigl(\exp\Bigl(\xi B(t^+)-\frac{t^+\xi^2}2\Bigr),\ \mathcal{F}_t,\ \mathbb{P}\Bigr)$$
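is a martingale for each $\xi \in \mathbb{R}$. Next, for each $n \in \mathbb{N}$, check that $(t,x) \mapsto t^{\frac n2}H_n\bigl(t^{-\frac12}x\bigr)$ is a polynomial in $(t,x) \in \mathbb{R}^2$ and that

$$\Bigl((t^+)^{\frac n2}H_n\bigl((t^+)^{-\frac12}B(t^+)\bigr),\ \mathcal{F}_t,\ \mathbb{P}\Bigr)$$

is a martingale. Hint: Use (2.3.18).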
Chapter 3
Gaussian Measures on a Banach Space
3.1 Motivation
The theories of Gaussian measures and Hilbert spaces are inextricably related, and the goal of this and the next chapter is to explain and explore that relationship. To understand the relationship, it is best to begin in the finite dimensional setting. Let $H$ be an $N$-dimensional Hilbert space. A Borel probability measure $\mathcal{W}$ on $H$ is said to be the standard Gaussian measure on $H$ if $\{(\,\cdot\,, h)_H : h \in H\}$ is a centered Gaussian family under $\mathcal{W}$ with covariance function $(g, h)_H$. To see that such a $\mathcal{W}$ exists, let $\{h_m : 1 \le m \le N\}$ be an orthonormal basis in $H$, define $\Phi : \mathbb{R}^N \to H$ by $\Phi(x) = \sum_{m=1}^N x_m h_m$, and take $\mathcal{W} = \Phi_*\gamma_{0,1}^N$. To see that there is only one such $\mathcal{W}$, note that $\Phi$ is invertible and that $\Phi^{-1}(h) = \bigl((h, h_1)_H, \dots, (h, h_N)_H\bigr)$, and check that $(\Phi^{-1})_*\mathcal{W} = \gamma_{0,1}^N$. Alternatively, set $\lambda_H = \Phi_*\lambda_{\mathbb{R}^N}$, check that $\lambda_H$ is the unique translation invariant Borel measure on $H$ that assigns measure 1 to the unit cube $\{h \in H : (h, h_m)_H \in [0, 1] \text{ for } 1 \le m \le N\}$, and show that
\[ \mathcal{W}(dh) = (2\pi)^{-\frac N2}\, e^{-\frac{\|h\|_H^2}{2}}\, \lambda_H(dh). \]
Given a finite dimensional Banach space $B$, let $\langle x, \xi\rangle$ denote the action of $\xi \in B^*$ on $x \in B$, and say that a $\mathcal{W} \in M_1(B)$ is a centered Gaussian measure on $B$ if $\{\langle\,\cdot\,, \xi\rangle : \xi \in B^*\}$ is a centered Gaussian family under $\mathcal{W}$. Assuming that, in addition, $\mathcal{W}$ is non-degenerate in the sense that $\mathrm{var}\bigl(\langle\,\cdot\,, \xi\rangle\bigr) > 0$ if $\xi \neq 0$, there is a unique Hilbert structure on $B$ such that $\mathcal{W}$ is the standard Gaussian measure for $B$ with that Hilbert structure. Namely, let $c$ be the covariance function for $\{\langle\,\cdot\,, \xi\rangle : \xi \in B^*\}$, and observe that $c$ is a bilinear, symmetric quadratic form on $B^*$ such that $c(\xi, \xi) > 0$ unless $\xi = 0$. Thus, for each $\xi \in B^*$ there is a unique $A\xi \in B$ such that $\langle A\xi, \eta\rangle = c(\xi, \eta)$ for all $\eta \in B^*$, and the map $A : B^* \to B$ is a linear isomorphism. Now define $(x, y)_H = \langle x, A^{-1}y\rangle$ for $x, y \in B$. Clearly $(x, y) \mapsto (x, y)_H$ is bilinear. In addition, if $x = A\xi$ and $y = A\eta$, then $(x, y)_H = \langle A\xi, \eta\rangle = c(\xi, \eta)$,
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 D. W. Stroock, Gaussian Measures in Finite and Infinite Dimensions, Universitext, https://doi.org/10.1007/978-3-031-23122-3_3
from which it follows that $(\,\cdot\,, \cdot\cdot\,)_H$ is an inner product on $B$. Finally, if $y_1, y_2 \in B$, then
\[ \int (x, y_1)_H (x, y_2)_H\, \mathcal{W}(dx) = \int \langle x, A^{-1}y_1\rangle \langle x, A^{-1}y_2\rangle\, \mathcal{W}(dx) = c\bigl(A^{-1}y_1, A^{-1}y_2\bigr) = (y_1, y_2)_H. \]
Hence, $\mathcal{W}$ is the standard Gaussian measure for the Hilbert space $B$ with inner product $(\,\cdot\,, \cdot\cdot\,)_H$.
The situation is much more complicated when $B$ is infinite dimensional, and all the complications stem from the fact that there is no standard Gauss measure on an infinite dimensional Hilbert space. Indeed, suppose that $\mathcal{W}$ were the standard Gauss measure on the infinite dimensional Hilbert space $H$, and let $\{h_m : m \ge 0\}$ be an orthonormal sequence in $H$. Set $X_m(x) = (x, h_m)_H$, and observe that $\{X_m : m \ge 0\}$ would be a sequence of mutually independent $N(0, 1)$ random variables under $\mathcal{W}$. Thus
\[ \int \exp\Bigl(-\sum_{m=0}^\infty X_m(x)^2\Bigr)\, \mathcal{W}(dx) = \lim_{n\to\infty} \Bigl(\int e^{-t^2}\, \gamma_{0,1}(dt)\Bigr)^n = \lim_{n\to\infty} 3^{-\frac n2} = 0, \]
which leads to the contradiction that $\|x\|_H^2 \ge \sum_{m=0}^\infty X_m(x)^2 = \infty$ for $\mathcal{W}$-almost every $x \in H$.
When $H$ is separable, and therefore $\dim(H)$ is countable, there is a more intuitive reason why $\mathcal{W}$ cannot exist. Namely, if $\mathcal{W}$ existed, then one would guess that
\[ \mathcal{W}(dh) = \frac1Z\, e^{-\frac{\|h\|_H^2}{2}}\, \lambda_H(dh), \tag{3.1.1} \]
where $Z = (2\pi)^{\frac{\dim(H)}{2}}$ and $\lambda_H$ is the translation invariant Borel measure on $H$ that gives measure 1 to the unit cube $\{h \in H : (h, h_m)_H \in [0, 1] \text{ for } m \in \mathbb{N}\}$, $\{h_m : m \ge 0\}$ being an orthonormal basis in $H$. However, even though Feynman was able to make remarkable predictions based on related expressions, the formula in (3.1.1) defies mathematical rationalization: the measure $\lambda_H$ does not exist and $Z = \infty$. It was to overcome the problems caused by the preceding non-existence result that Segal and Gross [9] developed the theory to which this chapter is devoted.
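The computation behind the non-existence argument can be checked numerically. The following is a minimal sketch, not part of the original text: it approximates $\int e^{-t^2}\gamma_{0,1}(dt)$ by a plain Riemann sum (the grid step and cutoff are illustrative choices, and the function names are hypothetical) and then raises it to the $n$th power to show how quickly $3^{-n/2}$ tends to 0.

```python
import numpy as np


def gaussian_expectation_exp_neg_square(step=1e-3, cutoff=12.0):
    """Riemann-sum approximation of the integral of exp(-t^2) against N(0, 1).

    The exact value is 3 ** -0.5; step and cutoff are illustrative choices.
    """
    t = np.arange(-cutoff, cutoff + step, step)
    density = np.exp(-t**2 / 2.0) / np.sqrt(2.0 * np.pi)
    return float(np.sum(np.exp(-t**2) * density) * step)


def expectation_exp_neg_partial_sum(n):
    """E[exp(-(X_0^2 + ... + X_{n-1}^2))] for independent N(0, 1) variables.

    By independence this is the n-th power of the one-dimensional integral.
    """
    return gaussian_expectation_exp_neg_square() ** n


if __name__ == "__main__":
    print("one factor:", gaussian_expectation_exp_neg_square(), "exact:", 3.0 ** -0.5)
    for n in (1, 10, 50, 200):
        print(n, expectation_exp_neg_partial_sum(n))
```

The printed values decay geometrically, which is the quantitative content of the statement that $\sum_m X_m(x)^2 = \infty$ almost surely.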
3.2 Some Background
This section contains a few results which will play a central role in what follows.
3.2.1 A Little Functional Analysis
Besides standard results about Hilbert and Banach spaces, I will need a few less familiar ones. Throughout, the Banach spaces here will be over $\mathbb{R}$, and I will use $B^*$ to denote the dual of $B$ and $\langle x, \xi\rangle$ to denote the action of $\xi \in B^*$ on $x \in B$.
Lemma 3.2.1 If $B$ is a separable Banach space, then, for each $R > 0$, the weak* topology on $\overline{B_{B^*}(0, R)}$ is compact and metrizable. Hence the weak* topology on any bounded subset of $B^*$ is second countable (i.e., is generated by a countable neighborhood basis), and so $B^*$ with the weak* topology is separable.
Proof It suffices to show that the weak* topology on $\overline{B_{B^*}(0, 1)}$ is metrizable and compact. To this end, choose a sequence $\{x_n : n \ge 1\} \subseteq B_B(0, 1)$ of linearly independent elements whose span $S$ is dense in $B$, and define
\[ \rho(\xi, \eta) = \sum_{n=1}^\infty 2^{-n}\, \bigl|\langle x_n, \eta - \xi\rangle\bigr| \quad\text{for } \xi, \eta \in \overline{B_{B^*}(0, 1)}. \]
Let $\{\xi_k : k \ge 1\} \subseteq \overline{B_{B^*}(0, 1)}$ be given. Clearly, $\rho$ is a metric, and $\rho(\xi_k, \xi) \to 0$ if $\{\xi_k : k \ge 1\}$ is weak* convergent to $\xi$. Conversely, if $\rho(\xi_k, \xi) \to 0$, then $\langle x_n, \xi_k\rangle \to \langle x_n, \xi\rangle$ for each $n \ge 1$ and therefore $\langle x, \xi_k\rangle \to \langle x, \xi\rangle$ for all $x \in S$. Since $S$ is dense in $B$ and $\|\xi_k\|_{B^*} \le 1$ for all $k \ge 1$, it follows that $\{\xi_k : k \ge 1\}$ is weak* convergent to $\xi$. Thus $\rho$ is a metric for the weak* topology on $\overline{B_{B^*}(0, 1)}$. To prove compactness, use a diagonalization procedure to extract a subsequence $\{\xi_{k_j} : j \ge 1\}$ such that $a_m = \lim_{j\to\infty} \langle x_m, \xi_{k_j}\rangle$ exists for each $m \ge 1$, and set $f(x) = \sum_{m=1}^n \alpha_m a_m$ if $x = \sum_{m=1}^n \alpha_m x_m$. Because the $x_m$'s are linearly independent, $f$ is a well defined linear function on $S$. Furthermore, $f(x) = \lim_{j\to\infty} \langle x, \xi_{k_j}\rangle$ and therefore $|f(x)| \le \|x\|_B$ for $x \in S$. Hence there is a $\xi \in \overline{B_{B^*}(0, 1)}$ such that
\[ \lim_{j\to\infty} \langle x, \xi_{k_j}\rangle = f(x) = \langle x, \xi\rangle \quad\text{for } x \in S, \]
and, just as above, this means that $\xi_{k_j} \to \xi$ in the weak* topology.
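Given a Borel probability measure $\mu$ on a Banach space $B$, define its characteristic function $\hat\mu : B^* \to \mathbb{C}$ to be its Fourier transform
\[ \hat\mu(\xi) = \int e^{i\langle x, \xi\rangle}\, \mu(dx). \]
Observe that if $\xi_1, \dots, \xi_N \in B^*$ and $\mu_{\xi_1,\dots,\xi_N} \in M_1(\mathbb{R}^N)$ is the distribution of $x \mapsto \bigl(\langle x, \xi_1\rangle, \dots, \langle x, \xi_N\rangle\bigr)$ under $\mu$, then
\[ \widehat{\mu_{\xi_1,\dots,\xi_N}}(\eta) = \hat\mu\Bigl(\sum_{j=1}^N \eta_j \xi_j\Bigr) \quad\text{for } \eta \in \mathbb{R}^N. \]
Hence, by Lemma 1.1.1, $\mu_{\xi_1,\dots,\xi_N}$ is uniquely determined by $\hat\mu$.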
Lemma 3.2.2 If $B$ is a separable Banach space, then $\mathcal{B}_B$ is the smallest σ-algebra with respect to which $x \in B \mapsto \langle x, \xi\rangle \in \mathbb{R}$ is measurable for all $\xi \in B^*$. In particular, if $\mu, \nu \in M_1(B)$, then $\mu = \nu$ if and only if $\hat\mu = \hat\nu$.
Proof Clearly $x \in B \mapsto \langle x, \xi\rangle \in \mathbb{R}$ is $\mathcal{B}_B$-measurable for all $\xi \in B^*$. To prove that sets in $\mathcal{B}_B$ are measurable with respect to the σ-algebra generated by these maps, it suffices to show that $\|\cdot\|_B$ is measurable with respect to them. For that purpose, choose a sequence $\{\xi_n : n \ge 1\} \subseteq B^*$ which is weak* dense in $\overline{B_{B^*}(0, 1)}$, and check that $\|x\|_B = \sup_{n\ge1} \langle x, \xi_n\rangle$.
Given the preceding, we know that $\mu = \nu$ on $\mathcal{B}_B$ if, for all $n \ge 1$ and $\xi_1, \dots, \xi_n \in B^*$, the distribution of the map $x \in B \mapsto \bigl(\langle x, \xi_1\rangle, \dots, \langle x, \xi_n\rangle\bigr) \in \mathbb{R}^n$ is the same under $\mu$ as it is under $\nu$, and, by the preceding observation, this will be true if and only if $\hat\nu = \hat\mu$.
I will also be making use of Bochner's theory of integration for Banach space valued functions. What follows is a brief outline of his theory. (See Sect. 5.1.2 in [14] for more details.)
Let $B$ be a separable, real Banach space and $(\Omega, \mathcal{F}, \mathbb{P})$ a probability space. A function $X : \Omega \to B$ is said to be simple if $X$ is $\mathcal{F}$-measurable and $X$ takes only finitely many values, in which case its expected value with respect to $\mathbb{P}$ is the element of $B$ given by
\[ \mathbb{E}[X] = \int_\Omega X(\omega)\, \mathbb{P}(d\omega) \equiv \sum_{x \in B} x\, \mathbb{P}(X = x). \]
Notice that an equivalent description of $\mathbb{E}[X]$ is as the unique element of $B$ with the property that $\langle \mathbb{E}[X], \xi\rangle = \mathbb{E}\bigl[\langle X, \xi\rangle\bigr]$ for all $\xi \in B^*$, and therefore that the mapping taking simple $X$ to $\mathbb{E}[X]$ is linear. Next, observe that, by the same argument as was used in the proof of Lemma 3.2.2, $\omega \in \Omega \mapsto \|X(\omega)\|_B \in \mathbb{R}$ is $\mathcal{F}$-measurable if $X : \Omega \to B$ is $\mathcal{F}$-measurable, and say that $X$ is integrable if $\|X\|_B$ is. The space of $B$-valued integrable functions will be denoted by $L^1(\mathbb{P}; B)$, and, as usual, we will identify elements of $L^1(\mathbb{P}; B)$ that differ on a set of $\mathbb{P}$-measure 0 and will take $\|X\|_{L^1(\mathbb{P};B)} = \mathbb{E}\bigl[\|X\|_B\bigr]$.
Bochner's definition of the integral of a $B$-valued, integrable $X$ is completed in the following theorem.
Theorem 3.2.3 If $X : \Omega \to B$ is integrable, then there is a unique element $\mathbb{E}[X] \in B$ satisfying $\langle \mathbb{E}[X], \xi\rangle = \mathbb{E}\bigl[\langle X, \xi\rangle\bigr]$ for all $\xi \in B^*$. In particular, the mapping $X \in L^1(\mathbb{P}; B) \mapsto \mathbb{E}[X] \in B$ is linear and satisfies
\[ \bigl\|\mathbb{E}[X]\bigr\|_B \le \mathbb{E}\bigl[\|X\|_B\bigr]. \tag{3.2.1} \]
Finally, if $X \in L^1(\mathbb{P}; B)$, then there is a sequence $\{X_n : n \ge 1\}$ of $B$-valued, simple functions with the property that $\|X_n - X\|_{L^1(\mathbb{P};B)} \to 0$.
Proof Clearly uniqueness and linearity both follow immediately from the given characterization of $\mathbb{E}[X]$. As for (3.2.1), simply observe that $\langle \mathbb{E}[X], \xi\rangle = \mathbb{E}\bigl[\langle X, \xi\rangle\bigr] \le \mathbb{E}\bigl[\|X\|_B\bigr]$ for all $\xi \in \overline{B_{B^*}(0, 1)}$. Thus, what remains is to prove existence and the final approximation assertion. In fact, once the approximation assertion is proved, then existence will follow immediately from the observation that, by (3.2.1), $\mathbb{E}[X]$ can be taken equal to $\lim_{n\to\infty} \mathbb{E}[X_n]$ if $\|X - X_n\|_{L^1(\mathbb{P};B)} \to 0$.
To prove the approximation assertion, begin with the case when $M = \sup_{\omega\in\Omega} \|X(\omega)\|_B < \infty$. Choose a dense sequence $\{x_\ell : \ell \ge 1\}$ in $B$, and set $A_{0,n} = \emptyset$ and $A_{\ell,n} = \bigl\{\omega : \|X(\omega) - x_\ell\|_B < \tfrac1n\bigr\}$ for $\ell, n \ge 1$. Then, for each $n \in \mathbb{Z}^+$ there exists an $L_n \in \mathbb{Z}^+$ with the property that
\[ \mathbb{P}\Bigl(\Omega \setminus \bigcup_{\ell=1}^{L_n} A_{\ell,n}\Bigr) \]
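0. Then because, by Lemma 3.2.1, $\overline{B_{B^*}(0, 1)}$ with the weak* topology is also separable, there exists an $\epsilon > 0$ and a $\xi \in \overline{B_{B^*}(0, 1)}$ with the property that $\mathbb{P}\bigl(\langle X, \xi\rangle \ge \epsilon\bigr) > 0$, from which it follows that there is an $A \in \mathcal{F}$ for which $\bigl\langle \mathbb{E}[X, A], \xi\bigr\rangle = \mathbb{E}\bigl[\langle X, \xi\rangle, A\bigr] \neq 0$.
Turning to the uniqueness and other properties of $X_\Sigma$, it is obvious that uniqueness follows immediately from the preceding and that linearity follows from uniqueness. As for (3.2.3), notice that if $\xi \in B^*$ and $\|\xi\|_{B^*} \le 1$, then
\[ \mathbb{E}\bigl[\langle X_\Sigma, \xi\rangle, A\bigr] = \mathbb{E}\bigl[\langle X, \xi\rangle, A\bigr] \le \mathbb{E}\bigl[\|X\|_B, A\bigr] = \mathbb{E}\Bigl[\mathbb{E}\bigl[\|X\|_B \,\big|\, \Sigma\bigr], A\Bigr] \]
for every $A \in \Sigma$. Hence, by the theory of conditional expectation for $\mathbb{R}$-valued random variables, $\langle X_\Sigma, \xi\rangle \le \mathbb{E}\bigl[\|X\|_B \,\big|\, \Sigma\bigr]$ (a.s., $\mathbb{P}$) for each element $\xi$ from the unit ball in $B^*$; and so, because $\overline{B_{B^*}(0, 1)}$ with the weak* topology is separable, (3.2.3) follows.
Finally, to prove the existence of $X_\Sigma$, suppose that $X$ is simple, let $R$ denote its range, and check that
\[ X_\Sigma \equiv \sum_{x \in R} x\, \mathbb{P}\bigl(X = x \,\big|\, \Sigma\bigr) \]
has the required properties. In order to handle general $X \in L^1(\mathbb{P}; B)$, use the approximation result in Theorem 3.2.3 to find a sequence $\{X_n : n \ge 1\}$ of simple functions that tend to $X$ in $L^1(\mathbb{P}; B)$. Then, since $(X_n)_\Sigma - (X_m)_\Sigma = (X_n - X_m)_\Sigma$ (a.s., $\mathbb{P}$) and therefore, by (3.2.3), $\mathbb{E}\bigl[\|(X_n)_\Sigma - (X_m)_\Sigma\|_B\bigr] \le \mathbb{E}\bigl[\|X_n - X_m\|_B\bigr]$, we know that there exists a $\Sigma$-measurable $X_\Sigma \in L^1(\mathbb{P}; B)$ to which the sequence $\{(X_n)_\Sigma : n \ge 1\}$ converges in $L^1(\mathbb{P}; B)$; and clearly $X_\Sigma$ has the required properties.
In the future, I will call $X_\Sigma$ the conditional expectation of $X$ given $\Sigma$ and will use $\mathbb{E}[X \mid \Sigma]$ to denote it. The following are essentially immediate consequences of uniqueness:
\[ \mathbb{E}[Y X \mid \Sigma] = Y\, \mathbb{E}[X \mid \Sigma] \ \text{(a.s., $\mathbb{P}$) for bounded, $\mathbb{R}$-valued, $\Sigma$-measurable } Y, \]
and
\[ \mathbb{E}[X \mid T] = \mathbb{E}\bigl[\mathbb{E}[X \mid \Sigma] \,\big|\, T\bigr] \ \text{(a.s., $\mathbb{P}$)} \]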
whenever $T$ is a sub-σ-algebra of $\Sigma$.
Once one knows how to take the conditional expectation of Banach space valued random variables, one should investigate what can be said about Banach space valued martingales. That is, given a non-decreasing sequence $\{\mathcal{F}_n : n \ge 0\}$ of sub-σ-algebras and an adapted sequence $\{X_n : n \ge 0\} \subseteq L^1(\mathbb{P}; B)$ (i.e., $X_n$ is $\mathcal{F}_n$-measurable for each $n$) with the property that $X_n = \mathbb{E}[X_{n+1} \mid \mathcal{F}_n]$ for all $n \ge 0$, the triple $(X_n, \mathcal{F}_n, \mathbb{P})$ is called a martingale, and one should wonder how many of the properties of $\mathbb{R}$-valued martingales it has. As the following theorem demonstrates, many of the results for $\mathbb{R}$-valued martingales have easily proved analogs for Banach space valued ones. An exception is Doob's Martingale Convergence Theorem, which does not hold in general.
Theorem 3.2.5 Let $B$ be a separable Banach space and $(X_n, \mathcal{F}_n, \mathbb{P})$ a $B$-valued martingale. Then $\bigl(\|X_n\|_B, \mathcal{F}_n, \mathbb{P}\bigr)$ is a non-negative submartingale and therefore, for each $N \in \mathbb{Z}^+$ and all $R \in (0, \infty)$,
\[ \mathbb{P}\Bigl(\sup_{0\le n\le N} \|X_n\|_B \ge R\Bigr) \le \frac1R\, \mathbb{E}\Bigl[\|X_N\|_B,\ \sup_{0\le n\le N} \|X_n\|_B \ge R\Bigr]. \tag{3.2.4} \]
In particular, for each $p \in (1, \infty)$,
\[ \mathbb{E}\Bigl[\sup_{n\in\mathbb{N}} \|X_n\|_B^p\Bigr]^{\frac1p} \le \frac{p}{p-1}\, \sup_{n\in\mathbb{N}} \mathbb{E}\bigl[\|X_n\|_B^p\bigr]^{\frac1p}. \tag{3.2.5} \]
Finally, if $\mathcal{F} = \sigma\bigl(\bigcup_{n\ge0} \mathcal{F}_n\bigr)$, $X \in L^1(\mathbb{P}; B)$, and $X_n = \mathbb{E}[X \mid \mathcal{F}_n]$, then $\|X_n - X\|_B \to 0$ (a.s., $\mathbb{P}$) and in $L^1(\mathbb{P}; \mathbb{R})$.
Proof The fact that $\bigl(\|X_n\|_B, \mathcal{F}_n, \mathbb{P}\bigr)$ is a submartingale is an easy application of the inequality in (3.2.3); and, given this fact, the inequalities in (3.2.4) and (3.2.5) follow from the corresponding results in the $\mathbb{R}$-valued case.
Let $X \in L^1(\mathbb{P}; B)$ be given, and set $X_n = \mathbb{E}[X \mid \mathcal{F}_n]$, $n \in \mathbb{N}$. Because of (3.2.4), we know that the set of $X$ for which $X_n \to X$ (a.s., $\mathbb{P}$) is a closed subset of $L^1(\mathbb{P}; B)$. Moreover, if $X$ is simple, then the $\mathbb{P}$-almost everywhere convergence of $X_n$ to $X$ follows easily from the $\mathbb{R}$-valued result. Hence, we now know that $X_n \to X$ (a.s., $\mathbb{P}$) for each $X \in L^1(\mathbb{P}; B)$. To prove that the convergence is taking place in $L^1(\mathbb{P}; B)$, note that, by Fatou's Lemma,
\[ \|X\|_{L^1(\mathbb{P};B)} \le \liminf_{n\to\infty} \|X_n\|_{L^1(\mathbb{P};B)}, \]
whereas (3.2.3) guarantees that
\[ \|X\|_{L^1(\mathbb{P};B)} \ge \limsup_{n\to\infty} \|X_n\|_{L^1(\mathbb{P};B)}. \]
Hence, because
\[ \Bigl|\, \|X_n\|_B - \|X\|_B - \|X_n - X\|_B \Bigr| \le 2\|X\|_B, \]
the convergence in $L^1(\mathbb{P}; B)$ is an application of Lebesgue's Dominated Convergence Theorem.
3.2.2 Fernique's Theorem
A Borel probability measure $\mathcal{W}$ on a separable Banach space $B$ is said to be a centered Gaussian measure if $\{\langle\,\cdot\,, \xi\rangle : \xi \in B^*\}$ is a centered Gaussian family under $\mathcal{W}$. Equivalently, if
\[ c(\xi, \eta) = \int \langle x, \xi\rangle \langle x, \eta\rangle\, \mathcal{W}(dx) \quad\text{for } \xi, \eta \in B^*, \]
then $c$ is the covariance function for this family and $\hat{\mathcal{W}}(\xi) = e^{-\frac{c(\xi,\xi)}{2}}$ for $\xi \in B^*$. The engine that drives much of what follows is the following remarkable result of X. Fernique. See Corollary 4.1.4 for a sharper statement.
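Theorem 3.2.6 Let $\mathcal{W}$ be a centered Gaussian measure on the separable Banach space $B$. If
\[ R = \inf\Bigl\{r : \mathcal{W}\bigl(\|x\|_B \ge r\bigr) \le \tfrac1{10}\Bigr\}, \]
then
\[ \int e^{\frac{\|x\|_B^2}{18R^2}}\, \mathcal{W}(dx) \le K \equiv e^{\frac12} + \sum_{n=0}^\infty \Bigl(\frac e3\Bigr)^{2^n}. \tag{3.2.6} \]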
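Proof Set $\mathbb{P} = \mathcal{W}^2$ on $\Omega = B^2$, and define $X_1(x_1, x_2) = x_1$ and $X_2(x_1, x_2) = x_2$. Then $X_1$ and $X_2$ are independent $B$-valued random variables under $\mathbb{P}$. Furthermore, if $Y_1 = 2^{-\frac12}(X_1 + X_2)$ and $Y_2 = 2^{-\frac12}(X_1 - X_2)$, then, for all $\xi_1, \xi_2 \in B^*$,
\[ \begin{aligned} \mathbb{E}^{\mathbb{P}}\bigl[e^{i\langle Y_1, \xi_1\rangle} e^{i\langle Y_2, \xi_2\rangle}\bigr] &= \hat{\mathcal{W}}\bigl(2^{-\frac12}(\xi_1 + \xi_2)\bigr)\, \hat{\mathcal{W}}\bigl(2^{-\frac12}(\xi_1 - \xi_2)\bigr) \\ &= \exp\Bigl(-\frac{c(\xi_1 + \xi_2, \xi_1 + \xi_2)}{4} - \frac{c(\xi_1 - \xi_2, \xi_1 - \xi_2)}{4}\Bigr) = e^{-\frac{c(\xi_1,\xi_1)}{2}}\, e^{-\frac{c(\xi_2,\xi_2)}{2}} \\ &= \mathbb{E}^{\mathbb{P}}\bigl[e^{i\langle X_1, \xi_1\rangle} e^{i\langle X_2, \xi_2\rangle}\bigr], \end{aligned} \]
and so, by Lemma 3.2.2, $(Y_1, Y_2)$ has the same distribution under $\mathbb{P}$ as $(X_1, X_2)$.
Let $0 < s \le t$ be given, and use the preceding to justify
\[ \begin{aligned} \mathbb{P}\bigl(\|X_1\|_B \le s\bigr)\, \mathbb{P}\bigl(\|X_1\|_B \ge t\bigr) &= \mathbb{P}\bigl(\|X_1\|_B \le s \ \&\ \|X_2\|_B \ge t\bigr) \\ &= \mathbb{P}\bigl(\|X_1 - X_2\|_B \le 2^{\frac12}s \ \&\ \|X_1 + X_2\|_B \ge 2^{\frac12}t\bigr) \\ &\le \mathbb{P}\bigl(\bigl|\|X_1\|_B - \|X_2\|_B\bigr| \le 2^{\frac12}s \ \&\ \|X_1\|_B + \|X_2\|_B \ge 2^{\frac12}t\bigr) \\ &\le \mathbb{P}\bigl(\|X_1\|_B \wedge \|X_2\|_B \ge 2^{-\frac12}(t - s)\bigr) = \mathbb{P}\bigl(\|X_1\|_B \ge 2^{-\frac12}(t - s)\bigr)^2. \end{aligned} \]
Now suppose that $\mathbb{P}\bigl(\|X_1\|_B \le R\bigr) \ge \frac9{10}$, and define $\{t_n : n \ge 0\}$ by $t_0 = R$ and $t_n = R + 2^{\frac12} t_{n-1}$ for $n \ge 1$. Then
\[ \mathbb{P}\bigl(\|X_1\|_B \le R\bigr)\, \mathbb{P}\bigl(\|X_1\|_B \ge t_n\bigr) \le \mathbb{P}\bigl(\|X_1\|_B \ge t_{n-1}\bigr)^2 \]
and therefore
\[ \frac{\mathbb{P}\bigl(\|X_1\|_B \ge t_n\bigr)}{\mathbb{P}\bigl(\|X_1\|_B \le R\bigr)} \le \biggl(\frac{\mathbb{P}\bigl(\|X_1\|_B \ge t_{n-1}\bigr)}{\mathbb{P}\bigl(\|X_1\|_B \le R\bigr)}\biggr)^2 \]
for $n \ge 1$. Working by induction, one gets from this that
\[ \frac{\mathbb{P}\bigl(\|X_1\|_B \ge t_n\bigr)}{\mathbb{P}\bigl(\|X_1\|_B \le R\bigr)} \le \biggl(\frac{\mathbb{P}\bigl(\|X_1\|_B \ge R\bigr)}{\mathbb{P}\bigl(\|X_1\|_B \le R\bigr)}\biggr)^{2^n} \]
and therefore, since $t_n = \frac{2^{\frac{n+1}2} - 1}{2^{\frac12} - 1} R \le 3 \cdot 2^{\frac{n+1}2} R$, that $\mathbb{P}\bigl(\|X_1\|_B \ge 3 \cdot 2^{\frac n2} R\bigr) \le 3^{-2^n}$. Hence,
\[ \begin{aligned} \mathbb{E}^{\mathbb{P}}\Bigl[e^{\frac{\|X_1\|_B^2}{18R^2}}\Bigr] &\le e^{\frac12}\, \mathbb{P}\bigl(\|X_1\|_B \le 3R\bigr) + \sum_{n=0}^\infty e^{2^n}\, \mathbb{P}\bigl(3 \cdot 2^{\frac n2} R \le \|X_1\|_B \le 3 \cdot 2^{\frac{n+1}2} R\bigr) \\ &\le e^{\frac12} + \sum_{n=0}^\infty \Bigl(\frac e3\Bigr)^{2^n} = K. \end{aligned} \]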
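To get a feeling for the sizes involved in Theorem 3.2.6, the constants in the proof can be evaluated numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: the function names are hypothetical, the series for $K$ is truncated after a dozen terms (the remaining terms are negligible), and the one-dimensional case $B = \mathbb{R}$, $\mathcal{W} = \gamma_{0,1}$ is used as a sanity check via the standard identity $\mathbb{E}[e^{\alpha X^2}] = (1 - 2\alpha)^{-1/2}$ for $\alpha < \frac12$.

```python
import math
from statistics import NormalDist


def fernique_constant(terms=12):
    """K = e^{1/2} + sum_{n>=0} (e/3)^(2^n), truncated after `terms` summands."""
    ratio = math.e / 3.0
    return math.exp(0.5) + sum(ratio ** (2 ** n) for n in range(terms))


def fernique_levels(R, n_max):
    """Levels t_0 = R, t_n = R + sqrt(2) * t_{n-1} from the proof."""
    levels = [R]
    for _ in range(n_max):
        levels.append(R + math.sqrt(2.0) * levels[-1])
    return levels


def one_dimensional_check():
    """Case B = R with the standard Gaussian measure (an illustrative assumption).

    Returns R with two-sided tail mass 1/10 and the exact value of
    E[exp(X^2 / (18 R^2))] for X ~ N(0, 1).
    """
    R = NormalDist().inv_cdf(0.95)        # P(|X| >= R) = 1/10
    alpha = 1.0 / (18.0 * R * R)
    moment = (1.0 - 2.0 * alpha) ** -0.5  # valid because alpha < 1/2
    return R, moment


if __name__ == "__main__":
    print("K is approximately", fernique_constant())
    print("first levels t_n for R = 1:", fernique_levels(1.0, 5))
    print("one-dimensional (R, moment):", one_dimensional_check())
```

In one dimension the exponential moment is far below $K$, which reflects the fact that the bound in (3.2.6) is uniform over all separable Banach spaces rather than sharp in any particular one.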
A $B$-valued random variable $X$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is said to be a centered Gaussian random variable if, for all $\xi \in B^*$, $\langle X, \xi\rangle$ is a centered Gaussian random variable, which is equivalent to saying that $X_*\mathbb{P}$ is a centered Gaussian measure on $B$. Theorem 3.2.6 provides the following vast generalization of the estimate (2.4.3) in Lemma 2.4.2.
Corollary 3.2.7 Let $X$ be a $B$-valued, centered Gaussian random variable on $(\Omega, \mathcal{F}, \mathbb{P})$. Then
\[ \mathbb{P}\bigl(\|X\|_B \ge R\bigr) \le \tfrac1{10} \implies \mathbb{E}\bigl[e^{\alpha\|X\|_B^2}\bigr] \le K, \]
where $\alpha = (18R^2)^{-1}$ and $K$ is the one in Theorem 3.2.6. Hence, if $\{X_n : n \ge 1\}$ is a sequence of $B$-valued centered Gaussian random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ that converge in probability to a random variable $X$, then $X$ is a centered Gaussian random variable and $\mathbb{E}\bigl[\|X_n - X\|_B^p\bigr] \to 0$ for all $p \in [1, \infty)$.
Proof The first assertion follows from Theorem 3.2.6 applied to $X_*\mathbb{P}$. To prove the second assertion, use Lemma 2.1.1 to see that $X$ is a centered Gaussian. Next, observe that there is an $R < \infty$ for which $\sup_{n\ge1} \mathbb{P}\bigl(\|X_n\|_B \ge R\bigr) \le \frac1{10}$ and therefore that, by Theorem 3.2.6, $\{\|X - X_n\|_B^p : n \ge 1\}$ is uniformly integrable for each $p \in [1, \infty)$.
3.2.3 Gaussian Measures on a Hilbert Space
Let $H$ be a separable Hilbert space. If $\mathcal{W}$ is a centered Gaussian measure on $H$, then $\{(\,\cdot\,, h)_H : h \in H\}$ is a Gaussian family under $\mathcal{W}$ whose covariance function $c$ is a non-negative definite, symmetric bilinear function on $H^2$. Moreover, by Theorem 3.2.6, $C \equiv \int \|x\|_H^2\, \mathcal{W}(dx) < \infty$, and therefore
\[ c(h, h) = \int (x, h)_H^2\, \mathcal{W}(dx) \le C\|h\|_H^2. \]
Hence there exists a non-negative definite, symmetric operator $A : H \to H$ such that $c(g, h) = (g, Ah)_H$ for all $g, h \in H$. In particular, if $\{h_n : n \ge 1\}$ is an orthonormal basis in $H$, then
\[ \mathrm{Trace}(A) = \sum_{n=1}^\infty (h_n, Ah_n)_H = \sum_{n=1}^\infty \int (x, h_n)_H^2\, \mathcal{W}(dx) = \int \|x\|_H^2\, \mathcal{W}(dx) = C < \infty, \]
which means that $A$ is a symmetric trace class operator.
Conversely, if $A$ is a symmetric, non-negative definite trace class operator on $H$, then there is a $\mathcal{W} \in M_1(H)$ under which $\{(\,\cdot\,, h)_H : h \in H\}$ is a centered Gaussian family with covariance function $c(g, h) = (g, Ah)_H$. To see this, let $\{h_n : n \ge 0\}$ be an orthonormal basis in $H$ consisting of eigenfunctions for $A$, and set $\alpha_n = (h_n, Ah_n)_H$, the eigenvalue corresponding to $h_n$. Next, let $\{X_n : n \ge 0\}$ be a sequence of mutually independent $N(0, 1)$ random variables on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, and define
\[ S_n = \sum_{k=0}^n \alpha_k^{\frac12} X_k h_k \quad\text{for } n \ge 0. \]
Then, for $m < n$,
\[ \mathbb{E}\bigl[\|S_n - S_m\|_H^2\bigr] = \sum_{k=m+1}^n \alpha_k \le \sum_{k>m} \alpha_k, \]
and so there exists a Borel measurable $S : \Omega \to H$ such that
\[ \mathbb{E}\bigl[\|S - S_m\|_H^2\bigr] \le \sum_{k>m} \alpha_k. \]
Furthermore, for each $h \in H$, $(S, h)_H$ under $\mathbb{P}$ is a centered Gaussian random variable with variance
\[ \sum_{k=0}^\infty \alpha_k (h, h_k)_H^2 = (h, Ah)_H. \]
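The series construction above can be imitated on a computer by truncating the eigenbasis. The following is a minimal sketch under assumptions of my own choosing: the eigenvalues $\alpha_k = (k+1)^{-2}$, the truncation level, the sample size, and all function names are illustrative and do not come from the text. It samples the coordinates $\alpha_k^{1/2} X_k$ and compares the Monte Carlo estimate of $\mathbb{E}\|S_n - S_m\|_H^2$ with $\sum_{k=m+1}^n \alpha_k$, and the empirical variance of $(S_n, h)_H$ with $\sum_k \alpha_k (h, h_k)_H^2$.

```python
import numpy as np


def sample_coordinates(alphas, n_samples, rng):
    """Rows are samples of S_n in the eigenbasis: column k holds alpha_k^{1/2} X_k."""
    normals = rng.standard_normal((n_samples, len(alphas)))
    return np.sqrt(alphas) * normals


def mean_square_tail(coords, m):
    """Monte Carlo estimate of E||S_n - S_m||_H^2, where n is the last column index."""
    return float(np.mean(np.sum(coords[:, m + 1:] ** 2, axis=1)))


def empirical_variance(coords, h):
    """Empirical variance of (S_n, h)_H for a vector h given by its basis coordinates."""
    return float(np.var(coords @ h))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alphas = 1.0 / (np.arange(100) + 1.0) ** 2   # summable eigenvalues (illustrative)
    coords = sample_coordinates(alphas, 10_000, rng)
    print("tail estimate:", mean_square_tail(coords, 10), "exact:", alphas[11:].sum())
    h = rng.standard_normal(100)
    print("variance estimate:", empirical_variance(coords, h), "exact:", float(np.sum(alphas * h**2)))
```

Because the eigenvalues are summable, the tail sums shrink as $m$ grows, which is exactly what makes $\{S_n\}$ a Cauchy sequence in $L^2(\mathbb{P}; H)$.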
Finally, take $\mathcal{W} = S_*\mathbb{P}$. Combined with the preceding, we have now proved the following theorem.
Theorem 3.2.8 Let $A$ be a symmetric, non-negative definite operator on the separable Hilbert space $H$. Then there exists a $\mathcal{W} \in M_1(H)$ under which $\{(\,\cdot\,, h)_H : h \in H\}$ is a centered Gaussian family with covariance function $(g, Ah)_H$ if and only if $A$ is trace class.
Exercise 3.2.1
(i) Let $\{x_n : n \ge 0\}$ be a sequence in the separable Banach space $B$ with the property that $\sum_{n=0}^\infty \|x_n\|_B < \infty$. Show that $\sum_{n=0}^\infty |\xi_n|\, \|x_n\|_B < \infty$ for $\gamma_{0,1}^{\mathbb{N}}$-almost every $\xi \in \mathbb{R}^{\mathbb{N}}$, and define $X : \mathbb{R}^{\mathbb{N}} \to B$ so that $X(\xi) = \sum_{n=0}^\infty \xi_n x_n$ if $\sum_{n=0}^\infty |\xi_n|\, \|x_n\|_B < \infty$ and $X(\xi) = 0$ otherwise. Show that the distribution $\mathcal{W}$ of $X$ is a centered Gaussian measure on $B$. In addition, show that $\mathcal{W}$ is non-degenerate if and only if the span of $\{x_n : n \ge 0\}$ is dense in $B$.
(ii) Here is an application of Fernique's Theorem and part (i) to functional analysis. Let $E$ and $F$ be a pair of separable Banach spaces and $\psi$ a Borel measurable, linear map from $E$ to $F$. Given an $E$-valued centered Gaussian random variable $X$, use Theorem 2.2.3 to see that $\psi \circ X$ is an $F$-valued, centered Gaussian random variable, and apply Fernique's Theorem to conclude that $\psi \circ X$ is square integrable. Next, suppose that $\psi$ were not continuous, and choose $\{x_n : n \ge 0\} \subseteq E$ and $\{\eta_n : n \ge 0\} \subseteq F^*$ so that $\|x_n\|_E = 1 = \|\eta_n\|_{F^*}$ and $\langle \psi(x_n), \eta_n\rangle \ge (n + 1)^3$. Using part (i), show that there exist centered Gaussian $E$-valued random variables $\{X_n : n \ge 0\} \cup \{X\}$ such that $X_n(\xi) = (n + 1)^{-2} \xi_n x_n$ and $X(\xi) \equiv \sum_{n=0}^\infty X_n(\xi)$ exists for $\gamma_{0,1}^{\mathbb{N}}$-almost every $\xi \in \mathbb{R}^{\mathbb{N}}$. Check that
\[ \int \|\psi \circ X(\xi)\|_F^2\, \gamma_{0,1}^{\mathbb{N}}(d\xi) \ge \int \langle \psi \circ X(\xi), \eta_n\rangle^2\, \gamma_{0,1}^{\mathbb{N}}(d\xi) \ge \int \langle \psi \circ X_n(\xi), \eta_n\rangle^2\, \gamma_{0,1}^{\mathbb{N}}(d\xi) \ge (n + 1), \]
and thereby arrive at the contradiction that $\psi \circ X \notin L^2(\gamma_{0,1}^{\mathbb{N}}; F)$. Conclude that every Borel measurable, linear map from $E$ to $F$ is continuous. In conjunction with the Lusin–Souslin Theorem (which says that the inverse of an injective, Borel measurable function between complete, separable metric spaces is Borel measurable) this provides a proof of Schwartz's result that a linear map between Banach spaces is continuous if its graph is Borel measurable. See [17] for more information.
3.3 Abstract Wiener Spaces
Let $B$ be an infinite dimensional, separable Banach space and $\mathcal{W} \in M_1(B)$ a non-degenerate centered Gaussian measure with covariance $c$ (i.e., the covariance function for the Gaussian family $\{\langle\,\cdot\,, \xi\rangle : \xi \in B^*\}$). Even though, as explained in Sect. 3.1, we know that there is no Hilbert structure on $B$ for which $\mathcal{W}$ is the standard Gauss measure, one can hope that there is a structure which provides some of the same advantages, and Gross's notion of an abstract Wiener space is one such structure. The following lemma is needed in order to explain Gross's idea.
Lemma 3.3.1 Let $B$ be a separable, real Banach space, and suppose that $H \subseteq B$ is a real Hilbert space that is continuously embedded as a dense subspace of $B$.
(i) For each $\xi \in B^*$ there is a unique $h_\xi \in H$ with the property that $(h, h_\xi)_H = \langle h, \xi\rangle$ for all $h \in H$. Moreover, the map $\xi \in B^* \mapsto h_\xi \in H$ is linear, continuous, one-to-one, and onto a dense subspace of $H$. In fact, for any weak* dense subset $S^*$ of $B^*$, $\{h_\xi : \xi \in S^*\}$ is dense in $H$.
(ii) If $x \in B$, then $x \in H$ if and only if there is a $K < \infty$ such that $|\langle x, \xi\rangle| \le K\|h_\xi\|_H$ for all $\xi \in B^*$. Moreover, for each $h \in H$, $\|h\|_H = \sup\{\langle h, \xi\rangle : \xi \in B^* \ \&\ \|h_\xi\|_H \le 1\}$.
(iii) If $L^*$ is a weak* dense subspace of $B^*$, then there exists a sequence $\{\xi_n : n \ge 0\} \subseteq L^*$ such that $\{h_{\xi_n} : n \ge 0\}$ is an orthonormal basis for $H$. Moreover, if $x \in B$, then $x \in H$ if and only if $\sum_{n=0}^\infty \langle x, \xi_n\rangle^2 < \infty$. In particular, if $\|x\|_H \equiv \infty$ when $x \in B \setminus H$, then $\|\cdot\|_H : B \to [0, \infty]$ is a lower semi-continuous function and so $\mathcal{B}_H \subseteq \mathcal{B}_B$. Finally,
\[ (h, h')_H = \sum_{n=0}^\infty \langle h, \xi_n\rangle \langle h', \xi_n\rangle \quad\text{for all } h, h' \in H. \]
Proof Because $H$ is continuously embedded in $B$, there exists a $C < \infty$ such that $\|h\|_B \le C\|h\|_H$. Thus, if $\xi \in B^*$ and $f(h) = \langle h, \xi\rangle$, then $f$ is linear and $|f(h)| \le \|h\|_B \|\xi\|_{B^*} \le C\|\xi\|_{B^*}\|h\|_H$, and so, by the Riesz Representation Theorem for Hilbert spaces, there exists a unique $h_\xi \in H$ such that $f(h) = (h, h_\xi)_H$. In fact, $\|h_\xi\|_H \le C\|\xi\|_{B^*}$, and uniqueness can be used to check that $\xi \mapsto h_\xi$ is linear. To see that $\xi \mapsto h_\xi$ is one-to-one, it suffices to show that $\xi = 0$ if $h_\xi = 0$. But if $h_\xi = 0$, then $\langle h, \xi\rangle = 0$ for all $h \in H$, and therefore, because $H$ is dense in $B$, $\xi = 0$. Finally, to complete the proof of (i), let $S^*$ be a weak* dense subset of $B^*$, and suppose that $\{h_\xi : \xi \in S^*\}$ were not dense in $H$. Then there would exist an $h \in H \setminus \{0\}$ with the property that $\langle h, \xi\rangle = (h, h_\xi)_H = 0$ for all $\xi \in S^*$. But, since $S^*$ is weak* dense in $B^*$, this would lead to the contradiction that $h = 0$.
Obviously, if $h \in H$, then $|\langle h, \xi\rangle| = |(h, h_\xi)_H| \le \|h\|_H \|h_\xi\|_H$ for $\xi \in B^*$. Conversely, if $x \in B$ and $|\langle x, \xi\rangle| \le K\|h_\xi\|_H$ for some $K < \infty$ and all $\xi \in B^*$, set $f(h_\xi) = \langle x, \xi\rangle$ for $\xi \in B^*$. Then, because $\xi \mapsto h_\xi$ is one-to-one, $f$ is a well-defined, linear functional on $\{h_\xi : \xi \in B^*\}$. Moreover, $|f(h_\xi)| \le K\|h_\xi\|_H$, and therefore, since $\{h_\xi : \xi \in B^*\}$ is dense in $H$, $f$ admits a unique extension as a continuous, linear functional on $H$. Hence, by Riesz's theorem, there is an $h \in H$ such that
\[ \langle x, \xi\rangle = f(h_\xi) = (h, h_\xi)_H = \langle h, \xi\rangle, \quad \xi \in B^*, \]
which means that $x = h \in H$. In addition, if $h \in H$, then $\|h\|_H = \sup\{\langle h, \xi\rangle : \|h_\xi\|_H \le 1\}$ follows from the density of $\{h_\xi : \xi \in B^*\}$ in $H$, and this completes the proof of (ii).
Turning to (iii), note that, because bounded subsets of $B^*$ with the weak* topology are second countable, the same is true of bounded subsets of $L^*$. Thus, we can find a sequence in $L^*$ that is weak* dense in $B^*$ and then extract a subsequence $\{\eta_n : n \ge 1\}$ of linearly independent elements whose span $S^*$ is weak* dense in $B^*$. Since the $h_{\eta_n}$'s are linearly independent, one can apply the Gram–Schmidt orthogonalization procedure to produce a sequence $\{\xi_n : n \ge 0\}$ whose span is $S^*$ and for which $\{h_{\xi_n} : n \ge 0\}$ is orthonormal in $H$. Because the span of $\{h_{\xi_n} : n \ge 0\}$ is equal to $\{h_\xi : \xi \in S^*\}$, which, by what we proved in (i), is dense in $H$, $\{h_{\xi_n} : n \ge 0\}$ is an orthonormal basis in $H$. Knowing this, it is immediate that
\[ (h, h')_H = \sum_{n=0}^\infty (h, h_{\xi_n})_H (h', h_{\xi_n})_H = \sum_{n=0}^\infty \langle h, \xi_n\rangle \langle h', \xi_n\rangle. \]
In particular, $\|h\|_H^2 = \sum_{n=0}^\infty \langle h, \xi_n\rangle^2$. Finally, if $x \in B$ and $\sum_{n=0}^\infty \langle x, \xi_n\rangle^2 < \infty$, set $g = \sum_{n=0}^\infty \langle x, \xi_n\rangle h_{\xi_n}$. Then $g \in H$ and $\langle x - g, \xi\rangle = 0$ for all $\xi \in S^*$. Hence, since $S^*$ is weak* dense in $B^*$, $x = g \in H$. Finally, because $x \mapsto \sum_{m=0}^\infty \langle x, \xi_m\rangle^2$ is lower semi-continuous, $\|\cdot\|_H$ is also, and so $\mathcal{B}_H \subseteq \mathcal{B}_B$.
Given a separable Hilbert space $H$ which is dense and continuously embedded in a separable Banach space $B$, and a $\mathcal{W} \in M_1(B)$, the triple $(H, B, \mathcal{W})$ is called an abstract Wiener space¹ if
\[ \hat{\mathcal{W}}(\xi) = \exp\Bigl(-\frac{\|h_\xi\|_H^2}{2}\Bigr) \quad\text{for all } \xi \in B^*. \tag{3.3.1} \]
Equivalently, $(H, B, \mathcal{W})$ is an abstract Wiener space if and only if $\mathcal{W} \in M_1(B)$ and $\{\langle\,\cdot\,, \xi\rangle : \xi \in B^*\}$ is a centered Gaussian family under $\mathcal{W}$ for which $\mathrm{cov}(\xi, \eta) = (h_\xi, h_\eta)_H$. In the case when $B$ itself is a Hilbert space and one uses its inner product to identify $B^*$ with $B$, note that (3.3.1) is the same as
\[ \int e^{i(x, y)_B}\, \mathcal{W}(dx) = e^{-\frac{\|h_y\|_H^2}{2}}, \]
where $h_y$ is the element of $H$ such that $(h, y)_B = (h, h_y)_H$ for all $h \in H$.
As the following theorem shows, there is an abstract Wiener space associated with any non-degenerate centered Gaussian measure $\mathcal{W}$ on a Banach space $B$.
Theorem 3.3.2 If $B$ is a separable Banach space and $\mathcal{W} \in M_1(B)$, then $\mathcal{W}$ is a non-degenerate, centered Gaussian measure if and only if there is a separable Hilbert space $H$ for which $(H, B, \mathcal{W})$ is an abstract Wiener space, in which case there is only one such $H$. Conversely, if $H$ is a separable Hilbert space, then there is a separable Banach space $B$ and a $\mathcal{W} \in M_1(B)$ for which $(H, B, \mathcal{W})$ is an abstract Wiener space.
Proof Keep in mind that if $\mathcal{W}$ is a centered Gaussian measure on $B$, then, by Theorem 3.2.6, $\|x\|_B^2 \in L^1(\mathcal{W}; \mathbb{R})$. Hence, for any $\xi \in B^*$, $\langle x, \xi\rangle \in L^2(\mathcal{W}; \mathbb{R})$ and $x\langle x, \xi\rangle \in L^1(\mathcal{W}; B)$.
¹ Although this formulation of an abstract Wiener space is somewhat different from Gross's, it will be shown in Theorem 4.1.8 below that it is equivalent to his.
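Suppose that $(H, B, \mathcal{W})$ is an abstract Wiener space. Obviously, $\mathcal{W}$ is a centered Gaussian measure. To check that $\mathcal{W}$ is non-degenerate, let $\xi \in B^* \setminus \{0\}$ and observe that $\int \langle x, \xi\rangle^2\, \mathcal{W}(dx) = \|h_\xi\|_H^2 > 0$. Next, to see that $H$ is uniquely determined, note that, for any $\xi, \eta \in B^*$,
\[ \langle h_\xi, \eta\rangle = (h_\eta, h_\xi)_H = \int \langle x, \eta\rangle \langle x, \xi\rangle\, \mathcal{W}(dx) = \Bigl\langle \int x\langle x, \xi\rangle\, \mathcal{W}(dx),\ \eta\Bigr\rangle, \]
and so
\[ h_\xi = \int x\langle x, \xi\rangle\, \mathcal{W}(dx). \tag{$*$} \]
Since $\{h_\xi : \xi \in B^*\}$ is dense in $H$, this proves that $H$ is uniquely determined.
Now assume that $\mathcal{W}$ is a non-degenerate, centered Gaussian measure on $B$. Given $\xi \in B^*$, define $h_\xi$ by ($*$), and define the quadratic form $(\,\cdot\,, \cdot\cdot\,)_H$ on $\{h_\xi : \xi \in B^*\}$ by
\[ (h_\eta, h_\xi)_H = \int \langle x, \eta\rangle \langle x, \xi\rangle\, \mathcal{W}(dx) \quad\text{for } \eta \in B^*. \]
Because $\mathcal{W}$ is non-degenerate, $\|h_\xi\|_H^2 \equiv (h_\xi, h_\xi)_H = 0 \implies \xi = 0 \implies h_\xi = 0$. Hence $(\,\cdot\,, \cdot\cdot\,)_H$ is a non-degenerate inner product. Now let $H$ be the completion of $\{h_\xi : \xi \in B^*\}$ with respect to $\|\cdot\|_H$. Equivalently, let $L$ be the closure in $L^2(\mathcal{W}; \mathbb{R})$ of $\{\langle\,\cdot\,, \xi\rangle : \xi \in B^*\}$, and take
\[ H = \Bigl\{\int x\psi(x)\, \mathcal{W}(dx) : \psi \in L\Bigr\}. \]
It is clear that $\langle h, \xi\rangle = (h, h_\xi)_H$, first when $h = h_\eta$ for some $\eta \in B^*$ and then, by continuity, for all $h \in H$. In particular, for any $\xi \in B^*$ with $\|\xi\|_{B^*} = 1$,
\[ \langle h, \xi\rangle \le \|h\|_H \|h_\xi\|_H = \|h\|_H \Bigl(\int \langle x, \xi\rangle^2\, \mathcal{W}(dx)\Bigr)^{\frac12} \le C\|h\|_H, \quad\text{where } C = \Bigl(\int \|x\|_B^2\, \mathcal{W}(dx)\Bigr)^{\frac12}, \]
and so $\|h\|_B \le C\|h\|_H$. From this it follows that $H$ is continuously embedded in $B$ as a subspace. In addition, because $\|h_\xi\|_H^2 = \int \langle x, \xi\rangle^2\, \mathcal{W}(dx)$, $\hat{\mathcal{W}}(\xi) = \exp\bigl(-\frac{\|h_\xi\|_H^2}{2}\bigr)$. Finally, to see that $H$ is dense in $B$, suppose that it were not. Then, by the Hahn–Banach Theorem, there would be a $\xi \in B^* \setminus \{0\}$ such that $\langle h, \xi\rangle = 0$ for all $h \in H$. But this would mean that $\|h_\xi\|_H^2 = \langle h_\xi, \xi\rangle = 0$, which contradicts $\xi \neq 0$. Thus, we have shown that $(H, B, \mathcal{W})$ is an abstract Wiener space.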
Next let $H$ be a separable Hilbert space, and choose an orthonormal basis $\{h_n : n \ge 0\}$. For $\alpha > 1$, define
\[ \|h\|_{B^{(-\alpha)}} = \Bigl(\sum_{n=0}^\infty (n + 1)^{-\alpha} (h, h_n)_H^2\Bigr)^{\frac12} \in [0, \infty], \]
and let $B^{(-\alpha)}$ be the completion of $H$ with respect to $\|\cdot\|_{B^{(-\alpha)}}$. Clearly $B^{(-\alpha)}$ is a separable Hilbert space with inner product
\[ (x, y)_{B^{(-\alpha)}} = \sum_{n=0}^\infty (n + 1)^{-\alpha} (x, h_n)_H (y, h_n)_H, \]
and $H$ is dense in it. Further, if $h_n^{(-\alpha)} = (n + 1)^{\frac\alpha2} h_n$, then $\{h_n^{(-\alpha)} : n \ge 0\}$ is an orthonormal basis in $B^{(-\alpha)}$. Now define $A : B^{(-\alpha)} \to H \subseteq B^{(-\alpha)}$ by
\[ Ax = \sum_{n=0}^\infty (n + 1)^{-\frac\alpha2} \bigl(x, h_n^{(-\alpha)}\bigr)_{B^{(-\alpha)}} h_n. \]
Then $(y, Ax)_{B^{(-\alpha)}} = (x, Ay)_{B^{(-\alpha)}}$,
\[ (x, Ax)_{B^{(-\alpha)}} = \sum_{n=0}^\infty (n + 1)^{-\alpha} \bigl(x, h_n^{(-\alpha)}\bigr)_{B^{(-\alpha)}}^2 \ge 0, \]
and
\[ \sum_{n=0}^\infty \bigl(h_n^{(-\alpha)}, A h_n^{(-\alpha)}\bigr)_{B^{(-\alpha)}} = \sum_{n=0}^\infty (n + 1)^{-\alpha} < \infty, \]
and so $A$ is a positive definite, symmetric trace class linear transformation on $B^{(-\alpha)}$. Hence, by Theorem 3.2.8, there exists a non-degenerate, centered Gaussian measure $\mathcal{W} \in M_1(B^{(-\alpha)})$ for which $(x, y) \in B^{(-\alpha)} \times B^{(-\alpha)} \mapsto (y, Ax)_{B^{(-\alpha)}} \in \mathbb{R}$ is the covariance function. Finally, given $\xi \in (B^{(-\alpha)})^*$, let $y$ be the element of $B^{(-\alpha)}$ such that $\langle x, \xi\rangle = (x, y)_{B^{(-\alpha)}}$ for $x \in B^{(-\alpha)}$, and set $h_\xi = Ay$. Then $h_\xi \in H$ and
\[ \langle h, \xi\rangle = (h, y)_{B^{(-\alpha)}} = (h, Ay)_H = (h, h_\xi)_H \]
for $h \in H$. In particular, $\|h_\xi\|_H^2 = (y, Ay)_{B^{(-\alpha)}}$, and so
\[ \hat{\mathcal{W}}(\xi) = \int e^{i(x, y)_{B^{(-\alpha)}}}\, \mathcal{W}(dx) = e^{-\frac{\|h_\xi\|_H^2}{2}}. \]
This completes the proof that $(H, B^{(-\alpha)}, \mathcal{W})$ is an abstract Wiener space.
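In finite dimensions every object in Theorem 3.3.2 can be written down explicitly, which gives a quick way to check the formulas. The sketch below is an illustration under my own assumptions, not part of the text: $B = \mathbb{R}^d$ with the dual pairing given by the dot product, $\mathcal{W} = N(0, C)$ for a randomly generated positive definite matrix $C$, and hypothetical function names. In that setting ($*$) gives $h_\xi = C\xi$ and the Cameron–Martin inner product is $(g, h)_H = g \cdot C^{-1}h$.

```python
import numpy as np


def make_covariance(dim, rng):
    """A symmetric, strictly positive definite matrix (illustrative choice of W = N(0, C))."""
    M = rng.standard_normal((dim, dim))
    return M @ M.T + np.eye(dim)


def h_of(C, xi):
    """h_xi = integral of x <x, xi> W(dx), which equals C xi when W = N(0, C)."""
    return C @ xi


def inner_H(C, g, h):
    """Inner product (g, h)_H = g . C^{-1} h on R^dim."""
    return float(g @ np.linalg.solve(C, h))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    C = make_covariance(4, rng)
    xi, eta, h = rng.standard_normal((3, 4))
    # Defining property of h_xi: (h, h_xi)_H = <h, xi>.
    print(inner_H(C, h, h_of(C, xi)), float(h @ xi))
    # Covariance of <., xi> and <., eta> under W: (h_xi, h_eta)_H = xi . C eta.
    print(inner_H(C, h_of(C, xi), h_of(C, eta)), float(xi @ C @ eta))
```

The two printed pairs agree up to rounding, mirroring the identities $(h, h_\xi)_H = \langle h, \xi\rangle$ and $\mathrm{cov}(\xi, \eta) = (h_\xi, h_\eta)_H$.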
Remark: There is another, more functional analytic, way to describe the Hilbert space $H$ for a non-degenerate, centered Gaussian measure $\mathcal{W}$ on a separable Banach space $B$. Namely, let $c$ be the covariance function for the centered Gaussian family $\{\langle\,\cdot\,, \xi\rangle : \xi \in B^*\}$. By Theorem 3.2.6, $|c(\xi, \eta)| \le C\|\xi\|_{B^*}\|\eta\|_{B^*}$ for some $C < \infty$, and, because $\mathcal{W}$ is non-degenerate, $c(\xi, \xi) > 0$ if $\xi \neq 0$. Thus there exists a unique, bounded linear map $A : B^* \to B^{**}$ such that $c(\xi, \eta) = {}_{B^*}\langle \eta, A\xi\rangle_{B^{**}}$. What is less obvious is that, even when $B$ is not reflexive, $A$ determines a map of $B^*$ into $B$. That is, for each $\xi \in B^*$, there is a unique $h_\xi \in B$ such that
\[ c(\xi, \eta) = {}_{B^*}\langle \eta, A\xi\rangle_{B^{**}} = \langle h_\xi, \eta\rangle. \]
One way to see this is to check, as we did in the preceding proof, that one can take $h_\xi = \int x\langle x, \xi\rangle\, \mathcal{W}(dx)$. Alternatively, one can use the fact that an element of $B^{**}$ can be represented by an element of $B$ if and only if it is weak* continuous. Thus, since, by Lemma 2.1.1, $\langle\,\cdot\,, \eta_n\rangle \to \langle\,\cdot\,, \eta\rangle$ in $L^2(\mathcal{W}; \mathbb{R})$ if $\{\eta_n : n \ge 1\}$ is weak* convergent to $\eta$, and therefore $A\xi$ is weak* continuous, $h_\xi$ exists. At this point, one proceeds as above: one defines the inner product $(h_\xi, h_\eta)_H = c(\xi, \eta)$ and takes $H$ to be the completion of $\{h_\xi : \xi \in B^*\}$ with respect to the corresponding Hilbert norm.
As is evident from the proof of the second part of Theorem 3.3.2, there are many choices of the Banach space for a given Hilbert space even though there is only one Hilbert space for a given Banach space. Thus, when thinking in terms of abstract Wiener spaces, the canonical object is the Hilbert space. Metaphorically speaking, the Hilbert space is the skeleton, whereas the Banach space is simply a flesh coating that provides a comfy place for the measure to live, and there is a multitude of ways in which the same skeleton can be coated. A challenging problem is that of determining when one can find a better (i.e., thinner) coating than the one produced in the preceding proof.
Related to the construction of abstract Wiener spaces for a given Hilbert space is the following. All separable Hilbert spaces of the same dimension are isometric, isomorphic images of one another, and, as the following theorem shows, there is an analogous statement that one can make about their associated abstract Wiener spaces.
Theorem 3.3.3 Let $(H, B, \mathcal{W})$ be an abstract Wiener space and $F$ an isometric isomorphism from $H$ to the Hilbert space $G$. If $C$ is a separable Banach space in which $G$ is continuously embedded as a dense subspace and for which there exists
an extension of $F$ as a continuous, linear map $\tilde F$ from $B$ to $C$, then $(G, C, \tilde F_*\mathcal{W})$ is an abstract Wiener space. In particular, one can take $C$ to be the completion of $G$ with respect to the norm $\|\cdot\|_C$ on $G$ given by $\|g\|_C = \|F^{-1}g\|_B$ and $\tilde F$ to be the continuous extension of $F$ as a map from $B$ onto $C$.
Proof Let $\tilde F^\top : C^* \to B^*$ be the adjoint map for $\tilde F$. If $\eta \in C^*$, then, for all $g \in G$,
\[ (g, g_\eta)_G = \langle g, \eta\rangle = \langle F^{-1}g, \tilde F^\top\eta\rangle = \bigl(F^{-1}g, h_{\tilde F^\top\eta}\bigr)_H = \bigl(g, F h_{\tilde F^\top\eta}\bigr)_G, \]
and so $g_\eta = F h_{\tilde F^\top\eta}$. Hence,
\[ \widehat{\tilde F_*\mathcal{W}}(\eta) = \int e^{i\langle \tilde F x, \eta\rangle}\, \mathcal{W}(dx) = \hat{\mathcal{W}}\bigl(\tilde F^\top\eta\bigr) = \exp\Bigl(-\frac{\|h_{\tilde F^\top\eta}\|_H^2}{2}\Bigr) = \exp\Bigl(-\frac{\|g_\eta\|_G^2}{2}\Bigr), \]
which completes the proof that $(G, C, \tilde F_*\mathcal{W})$ is an abstract Wiener space.
Turning to the concluding assertion, it is clear that $\|\cdot\|_C$ is a norm on $G$ and that the completion $C$ of $G$ with respect to $\|\cdot\|_C$ is a separable Banach space in which $G$ is continuously embedded as a dense subspace. Moreover, $F$ has a unique extension $\tilde F$ as an isometric isomorphism from $B$ onto $C$.
Although Theorem 3.3.3 says that abstract Wiener spaces corresponding to Hilbert spaces of the same dimension are, from a sufficiently abstract perspective, the same, just as in practice it is foolish to identify all Hilbert spaces of the same dimension with one another, one should not think that the associated abstract Wiener spaces are all the same. Indeed, nearly nothing is of interest at that level of abstraction.
3.3.1 The Cameron–Martin Subspace and Formula

Given a centered, non-degenerate Gaussian measure $\mathcal W$ on $B$, the Hilbert space $H$ for which $(H,B,\mathcal W)$ is an abstract Wiener space is called its Cameron–Martin subspace. It is important to keep in mind the relationship between the Cameron–Martin subspace for $\mathcal W$ and the covariance function $c$ for the Gaussian family $\{\langle\,\cdot\,,\xi\rangle : \xi\in B^*\}$. Namely, as explained in the Remark following the proof of Theorem 3.3.2, the map $\xi\mapsto h_\xi$ is determined by $c$, and $H$ is the completion of $\{h_\xi : \xi\in B^*\}$ with respect to the norm $h_\xi\mapsto\sqrt{c(\xi,\xi)}$.

Theorem 3.3.4 If $(H,B,\mathcal W)$ is an abstract Wiener space, then the map $\xi\in B^*\longmapsto h_\xi\in H$ is continuous from the weak* topology on $B^*$ into the strong topology on $H$. In particular, for each $R>0$, $\{h_\xi : \xi\in\overline{B_{B^*}(0,R)}\}$ is a compact subset of $H$,
$\overline{B_H(0,R)}$ is a compact subset of $B$, and, when $H$ is infinite dimensional, $\mathcal W(H)=0$. Finally, there is a unique linear, isometric map $\mathcal I : H\longrightarrow L^2(\mathcal W;\mathbb R)$ such that $\mathcal I(h_\xi)=\langle\,\cdot\,,\xi\rangle$ for all $\xi\in B^*$, and $\{\mathcal I(h) : h\in H\}$ is a Gaussian family in $L^2(\mathcal W;\mathbb R)$ for which $(\,\cdot\,,\cdot\cdot\,)_H$ is the covariance function.

Proof To prove the initial assertion, suppose that $\{\xi_k : k\ge1\}\subseteq B^*$ is weak* convergent to $\xi$. Then $\langle x,\xi_k\rangle\longrightarrow\langle x,\xi\rangle$ for all $x\in B$, and therefore, by Lemma 2.1.1, $\langle\,\cdot\,,\xi_k\rangle\longrightarrow\langle\,\cdot\,,\xi\rangle$ in $L^2(\mathcal W;\mathbb R)$. Hence
$$\|h_{\xi_k}-h_\xi\|_H^2 = \bigl\|\langle\,\cdot\,,\xi_k\rangle-\langle\,\cdot\,,\xi\rangle\bigr\|_{L^2(\mathcal W;\mathbb R)}^2\longrightarrow0.$$

Given the first assertion, the compactness of $\{h_\xi : \xi\in\overline{B_{B^*}(0,R)}\}$ in $H$ follows from the compactness of $\overline{B_{B^*}(0,R)}$ in the weak* topology. To see that $\overline{B_H(0,R)}$ is compact in $B$, first note that $\overline{B_H(0,R)}$ is compact in the weak topology on $H$. Therefore, all that we have to show is that the embedding map $h\in\overline{B_H(0,R)}\longmapsto h\in B$ is continuous from the weak topology on $H$ into the strong topology on $B$. Thus, suppose that $h_k\longrightarrow h$ weakly in $H$. Because $\{h_\xi : \xi\in\overline{B_{B^*}(0,1)}\}$ is compact in $H$, for each $\epsilon>0$ there exist an $n\in\mathbb Z^+$ and $\{\xi_1,\dots,\xi_n\}\subseteq\overline{B_{B^*}(0,1)}$ such that
$$\{h_\xi : \xi\in\overline{B_{B^*}(0,1)}\}\subseteq\bigcup_{m=1}^nB_H(h_{\xi_m},\epsilon).$$
Now choose $\ell$ so that $|\langle h_k-h,\xi_m\rangle| = |(h_k-h,h_{\xi_m})_H|<\epsilon$ for $1\le m\le n$ and $k\ge\ell$. Then, for any $\xi\in\overline{B_{B^*}(0,1)}$ and all $k\ge\ell$,
$$|\langle h_k-h,\xi\rangle|\le\epsilon+\min_{1\le m\le n}\bigl|(h_k-h,h_\xi-h_{\xi_m})_H\bigr|\le(1+2R)\epsilon,$$
which proves that $\|h_k-h\|_B\le(2R+1)\epsilon$ for all $k\ge\ell$.

To see that $\mathcal W(H)=0$ when $H$ is infinite dimensional, choose $\{\xi_n : n\ge0\}$ as in (iii) of Lemma 3.3.1, and set $X_n(x)=\langle x,\xi_n\rangle$. Then $\{X_n : n\ge0\}$ is an infinite sequence of mutually independent $N(0,1)$-random variables, and so, since
$$H=\Bigl\{x\in B:\ \sum_{n=0}^\infty X_n(x)^2<\infty\Bigr\}\quad\text{and}\quad\int\exp\Bigl(-\sum_{n=0}^\infty X_n(x)^2\Bigr)\,\mathcal W(dx)=0,$$
$\mathcal W(H)=0$.

Turning to the map $\mathcal I$, define $\mathcal I(h_\xi)=\langle\,\cdot\,,\xi\rangle$. Then, for each $\xi$, $\mathcal I(h_\xi)$ is a centered Gaussian with variance $\|h_\xi\|_H^2$, and so $\mathcal I$ is a linear isometry from $\{h_\xi : \xi\in B^*\}$ into $L^2(\mathcal W;\mathbb R)$. Hence, since $\{h_\xi : \xi\in B^*\}$ is dense in $H$, $\mathcal I$ admits a unique extension as a linear isometry from $H$ into $L^2(\mathcal W;\mathbb R)$. Moreover, as the $L^2(\mathcal W;\mathbb R)$-limit of centered Gaussians, $\mathcal I(h)$ is a centered Gaussian with variance $\|h\|_H^2$ for each $h\in H$, and so $\operatorname{cov}\bigl(\mathcal I(g),\mathcal I(h)\bigr)=(g,h)_H$.
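The vanishing integral used to show that the Cameron–Martin subspace is a null set can be made concrete. Since the coordinates are independent standard normals, the integral of the exponential of minus the first N squares factors into N copies of the one-dimensional integral, which equals 3^(-1/2), and so the truncated integrals decay like 3^(-N/2). The following is only a numerical sketch in Python with NumPy; the helper name `truncated_integral` and the use of Gauss–Hermite quadrature are illustrative assumptions, not part of the text.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def truncated_integral(num_coords: int, quad_order: int = 80) -> float:
    """E[exp(-(X_0^2 + ... + X_{N-1}^2))] for independent N(0,1) variables."""
    nodes, weights = hermegauss(quad_order)       # weight function exp(-x^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)      # normalize to the N(0,1) law
    one_coord = float(np.sum(weights * np.exp(-nodes**2)))   # E[exp(-X^2)]
    return one_coord ** num_coords                # independence gives a product

# Truncations with 1, 5, 25 and 125 coordinates; the values shrink geometrically.
values = [truncated_integral(n) for n in (1, 5, 25, 125)]
```

Letting the number of coordinates grow shows why the full integral is 0, which is only possible if the sum of squares is infinite almost everywhere.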
The map $\mathcal I$ in Theorem 3.3.4 was introduced for the classical Wiener space by R.E.A.C. Paley and Wiener, and so I will call it the Paley–Wiener map. To appreciate its importance here, observe that $\{h_\xi : \xi\in B^*\}$ is the subspace of those $g\in H$ with the property that $h\in H\longmapsto(h,g)_H\in\mathbb R$ admits a continuous extension to $B$. Even though, when $\dim(H)=\infty$, no such continuous extension exists for general $g\in H$, $\mathcal I(g)$ can be thought of as an extension of $h\mapsto(h,g)_H$, albeit one that is defined only up to a $\mathcal W$-null set. Of course, one has to be careful when using this interpretation, since, when $H$ is infinite dimensional, $[\mathcal I(g)](x)$ for a given $x\in B$ is not well-defined simultaneously for all $g\in H$. Nonetheless, by adopting it, one gets further evidence for the idea that $\mathcal W$ wants to be the standard Gaussian measure on $H$. Namely, because
$$\int e^{i\mathcal I(h)}\,d\mathcal W=e^{-\frac{\|h\|_H^2}{2}},\quad h\in H,$$
if $\mathcal W$ lived on $H$, then it would certainly be the standard Gaussian measure there.

The Paley–Wiener map also provides strong support for the idea that it is the Hilbert space $H$ that is the canonical component in the triple $(H,B,\mathcal W)$. Indeed, suppose that $(H,B_1,\mathcal W_1)$ and $(H,B_2,\mathcal W_2)$ are both abstract Wiener spaces. Because there may be no obvious correspondence between elements of $B_1^*$ and $B_2^*$, in general there will be no obvious relationship between the Gaussian families $\{\langle\,\cdot\,,\xi_1\rangle : \xi_1\in B_1^*\}$ and $\{\langle\,\cdot\,,\xi_2\rangle : \xi_2\in B_2^*\}$, and so it will be hard to understand the connection between the measures $\mathcal W_1$ and $\mathcal W_2$ in terms of their Fourier transforms. On the other hand, if $\mathcal I_1$ and $\mathcal I_2$ are the Paley–Wiener maps for $(H,B_1,\mathcal W_1)$ and $(H,B_2,\mathcal W_2)$, then the Gaussian family $\{\mathcal I_1(h) : h\in H\}$ will have the same distribution under $\mathcal W_1$ as $\{\mathcal I_2(h) : h\in H\}$ has under $\mathcal W_2$. In particular, if $\xi_1\in B_1^*$, then the $\mathcal W_2$-random variable that should be associated with $\langle\,\cdot\,,\xi_1\rangle$ is $\mathcal I_2(h_{\xi_1})$, which need not equal $\langle\,\cdot\,,\xi_2\rangle$ for any $\xi_2\in B_2^*$.

Among the most important applications of the Paley–Wiener map is the following theorem about the behavior of Gaussian measures under translation. That is, if $y\in B$ and $T_y : B\longrightarrow B$ is given by $T_y(x)=x+y$, we will be looking at the measure $(T_y)_*\mathcal W$ and its relationship to $\mathcal W$. Using the reasoning suggested above, the result is easy to guess. Namely, if $\mathcal W$ really lived on $H$ and were given by a Feynman-type representation (cf. Sect. 3.1)
$$\mathcal W(dh)=\frac1Z\,e^{-\frac{\|h\|_H^2}{2}}\,\lambda_H(dh),$$
then $(T_g)_*\mathcal W$ should have the Feynman representation
$$\frac1Z\,e^{-\frac{\|h-g\|_H^2}{2}}\,\lambda_H(dh),$$
which could be rewritten as
$$\bigl[(T_g)_*\mathcal W\bigr](dh)=\exp\bigl[(h,g)_H-\tfrac12\|g\|_H^2\bigr]\,\mathcal W(dh).$$
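In finite dimensions the heuristic above is an exact statement, and it can be checked numerically. The sketch below takes the one-dimensional case, where the Hilbert space and the Banach space are both the real line and the measure is the standard normal law, and compares the integral of a test function against the translated measure with the integral against the density exp(g x - g^2/2). The function names, the choice of cosine as test function, and the use of Gauss–Hermite quadrature are assumptions made for illustration only.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

nodes, weights = hermegauss(80)
weights = weights / np.sqrt(2.0 * np.pi)    # quadrature rule for the N(0,1) law

def shifted_mean(f, g):
    """E[f(X + g)]: the integral of f against the translated measure."""
    return float(np.sum(weights * f(nodes + g)))

def reweighted_mean(f, g):
    """E[f(X) R_g(X)] with R_g(x) = exp(g * x - g**2 / 2)."""
    density = np.exp(g * nodes - 0.5 * g**2)
    return float(np.sum(weights * f(nodes) * density))

g = 0.7
lhs = shifted_mean(np.cos, g)
rhs = reweighted_mean(np.cos, g)
exact = np.exp(-0.5) * np.cos(g)            # closed form of E[cos(X + g)]
```

Both quadrature values agree with the closed form, which is the one-dimensional instance of the rewritten Feynman representation.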
Hence, if we are correct to interpret $\mathcal I(g)$ as $(\,\cdot\,,g)_H$, we are led to guess that, at least for $g\in H$,
$$\bigl[(T_g)_*\mathcal W\bigr](dx)=R_g(x)\,\mathcal W(dx),\quad\text{where }R_g=\exp\bigl[\mathcal I(g)-\tfrac12\|g\|_H^2\bigr].\tag{3.3.2}$$
That (3.3.2) is correct was proved for the classical Wiener space by R. Cameron and T. Martin, and for this reason it is called the Cameron–Martin formula. In fact, one has the following result, the second half of which is due to I. Segal.

Theorem 3.3.5 If $(H,B,\mathcal W)$ is an abstract Wiener space, then, for each $g\in H$, $(T_g)_*\mathcal W\ll\mathcal W$ and the $R_g$ in (3.3.2) is the corresponding Radon–Nikodym derivative. Conversely, if $y\in B\setminus H$, then $(T_y)_*\mathcal W$ is singular with respect to $\mathcal W$.

Proof Let $g\in H$, and set $\mu=(T_g)_*\mathcal W$. Then
$$\hat\mu(\xi)=\int e^{i\langle x+g,\xi\rangle}\,\mathcal W(dx)=\exp\bigl[i\langle g,\xi\rangle-\tfrac12\|h_\xi\|_H^2\bigr].\tag{$*$}$$
Now define $\nu$ by the right-hand side of (3.3.2). Clearly $\nu\in M_1(B)$. Thus, we will have proved the first part of this theorem once we show that $\hat\nu$ is given by the right-hand side of ($*$). To this end, observe that, for any $h_1,h_2\in H$,
$$\int e^{\alpha_1\mathcal I(h_1)+\alpha_2\mathcal I(h_2)}\,d\mathcal W=\exp\Bigl[\frac{\alpha_1^2}{2}\|h_1\|_H^2+\alpha_1\alpha_2(h_1,h_2)_H+\frac{\alpha_2^2}{2}\|h_2\|_H^2\Bigr]$$
for all $\alpha_1,\alpha_2\in\mathbb C$. Indeed, this is obvious when $\alpha_1$ and $\alpha_2$ are pure imaginary, and, since both sides are entire functions of $(\alpha_1,\alpha_2)\in\mathbb C^2$, it follows in general by analytic continuation. In particular, by taking $h_1=g$, $\alpha_1=1$, $h_2=h_\xi$, and $\alpha_2=i$, it is easy to check that the right-hand side of ($*$) is equal to $\hat\nu(\xi)$.

To prove the second assertion, begin by recalling from Lemma 3.3.1 that for any $y\in B$, $y\in H$ if and only if there is a $K<\infty$ with the property that $|\langle y,\xi\rangle|\le K$ for all $\xi\in B^*$ with $\|h_\xi\|_H=1$. Now suppose that $(T_y)_*\mathcal W\not\perp\mathcal W$, and let $R$ be the Radon–Nikodym derivative of its absolutely continuous part. Given $\xi\in B^*$ with $\|h_\xi\|_H=1$, let $\mathcal F_\xi$ be the $\sigma$-algebra generated by $x\mapsto\langle x,\xi\rangle$, and check that $(T_y)_*\mathcal W\restriction\mathcal F_\xi\ll\mathcal W\restriction\mathcal F_\xi$ with Radon–Nikodym derivative
$$Y(x)=\exp\Bigl[\langle y,\xi\rangle\langle x,\xi\rangle-\frac{\langle y,\xi\rangle^2}{2}\Bigr].$$
Hence,
$$Y\ge\mathbb E^{\mathcal W}\bigl[R\,\big|\,\mathcal F_\xi\bigr]\ge\mathbb E^{\mathcal W}\bigl[R^{\frac12}\,\big|\,\mathcal F_\xi\bigr]^2,$$
and so
$$\exp\Bigl(-\frac{\langle y,\xi\rangle^2}{8}\Bigr)=\bigl\langle Y^{\frac12},\mathcal W\bigr\rangle\ge\alpha\equiv\bigl\langle R^{\frac12},\mathcal W\bigr\rangle\in(0,1].$$
Since this means that $\langle y,\xi\rangle^2\le8\log\frac1\alpha$, $y\in H$, and so we have shown that $(T_y)_*\mathcal W\perp\mathcal W$ unless $y\in H$.

To appreciate that Gaussian measures behave as nicely under translation as any probability measures on an infinite dimensional space, one should know about the theorem of Sudakov below. Recall that a subset of a metric space is said to be meager if it can be written as the countable union of nowhere dense (i.e., with empty interior) closed sets. If the metric space is complete, then the Baire Category Theorem says that meager subsets have empty interior.

Lemma 3.3.6 If $\{K_n : n\ge1\}$ is a sequence of compact subsets of an infinite dimensional Banach space $E$, then $\bigcup_{n=1}^\infty K_n$ is meager and so its interior is empty.

Proof All that we have to show is that compact subsets of $E$ are nowhere dense. Thus, suppose that $K\subseteq E$ is compact. To show that its interior is empty, it suffices to prove that $K-K=\{y-x : x,y\in K\}$ contains no neighborhood of $0$. Indeed, if we knew that and if $B_E(x,r)\subseteq K$ for some $x\in E$ and $r>0$, then we would have the contradiction that $B_E(0,r)\subseteq B_E(x,r)-B_E(x,r)\subseteq K-K$. Thus, suppose that $r>0$ is given, and choose $x_1,\dots,x_n\in E$ so that $K\subseteq\bigcup_{m=1}^nB_E\bigl(x_m,\frac r4\bigr)$. Because $E$ is infinite dimensional, the Hahn–Banach Theorem says that there exists a $\xi\in E^*$ such that $\|\xi\|_{E^*}=1$ and $\langle x_m,\xi\rangle=0$ for $1\le m\le n$. Now choose $y\in B_E(0,r)$ so that $\langle y,\xi\rangle\ge\frac{3r}4$. Then, $|\langle x,\xi\rangle|\le\frac r4$ and $\langle x+y,\xi\rangle\ge\frac r2$ for all $x\in K$. Hence $x+y\notin K$ for any $x\in K$, and so $y\in B_E(0,r)\setminus(K-K)$.

Theorem 3.3.7 If $\mu$ is a Borel probability measure on an infinite dimensional, separable Banach space $E$, then the set of $x\in E$ for which $(T_x)_*\mu\not\perp\mu$ is contained in a meager set $S$, and therefore the set of $x\in E$ for which $(T_x)_*\mu\perp\mu$ is dense in $E$.

Proof Using Ulam's Lemma (cf. Lemma 9.1.7 in [14]), for each $n\ge1$ choose a compact $K_n\subseteq E$ for which $\mu(K_n)\ge1-\frac1n$, and set $\Gamma=\bigcup_{n=1}^\infty K_n$ and $\tilde\Gamma=\Gamma-\Gamma=\bigcup_{m,n=1}^\infty(K_n-K_m)$. Clearly $x\in\tilde\Gamma\iff-x\in\tilde\Gamma$, $\mu(\Gamma)=1$, and, by Lemma 3.3.6, $\tilde\Gamma$ is meager, which means its complement is dense. Finally, if $-x\notin\tilde\Gamma$, then $(-x+\Gamma)\cap\Gamma=\emptyset$, and so $(T_x)_*\mu(\Gamma)=\mu(-x+\Gamma)=0$, which means that $(T_x)_*\mu\perp\mu$.

In view of Sudakov's result, one sees that centered Gaussian measures provide as good a substitute for Lebesgue measure as one can hope to find: not only does one know the directions in which they are quasi-invariant under translation, one has a simple formula for the associated Radon–Nikodym derivatives. In particular, Theorem 3.3.5 provides the basis for an integration by parts formula that plays a central role in the development of the Sobolev type theory used in what is called Malliavin's calculus.

The integration by parts formula is based on the observation that if $h\in H$, then, for any non-negative, Borel measurable functions $f$ and $g$,
$$\mathbb E^{\mathcal W}\bigl[(f\circ T_h)\,g\bigr]=\mathbb E^{\mathcal W}\bigl[R_h\,f\,(g\circ T_{-h})\bigr].\tag{$*$}$$
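Hence, if $f\in L^2(\mathcal W;\mathbb R)$ and $g\in L^p(\mathcal W;\mathbb R)$ for some $p>2$, then $(f\circ T_h)g\in L^1(\mathcal W;\mathbb R)$ and ($*$) again holds. Now choose $\{\xi_j : j\ge1\}\subseteq B^*$ so that $\{h_j : j\ge1\}$ is an orthonormal basis in $H$ when $h_j=h_{\xi_j}$, and let $\mathcal P$ be the set of functions $\varphi$ of the form $\varphi(x)=p\bigl(\langle x,\xi_1\rangle,\dots,\langle x,\xi_N\rangle\bigr)$ for some $N\ge1$ and real polynomial $p$ on $\mathbb R^N$. Then, for any $f\in L^2(\mathcal W;\mathbb R)$ and $\varphi\in\mathcal P$,
$$\frac{d}{dt}\mathbb E^{\mathcal W}\bigl[(f\circ T_{th})\varphi\bigr]\Big|_{t=0}=\frac{d}{dt}\mathbb E^{\mathcal W}\bigl[R_{th}\,f\,(\varphi\circ T_{-th})\bigr]\Big|_{t=0}=\mathbb E^{\mathcal W}\bigl[f\,\partial_h^\top\varphi\bigr],$$
where $\partial_h^\top\varphi(x)=[\mathcal I(h)](x)\,\varphi(x)-\partial_h\varphi(x)$ and
$$\partial_h\varphi(x)=\frac{d}{ds}\varphi(x+sh)\Big|_{s=0}=\sum_{j=1}^N(h,h_j)_H\,\partial_jp\bigl(\langle x,\xi_1\rangle,\dots,\langle x,\xi_N\rangle\bigr),$$
$\partial_jp$ being the partial derivative of $p$ with respect to the $j$th coordinate. In particular,
$$\frac{d}{dt}\mathbb E^{\mathcal W}\bigl[(f\circ T_{th})\varphi\bigr]\Big|_{t=0}=\mathbb E^{\mathcal W}\bigl[f\,\partial_h^\top\varphi\bigr],$$
and so, since the left-hand side equals $\mathbb E^{\mathcal W}[(\partial_hf)\varphi]$ when $f\in\mathcal P$, if $\varphi,\psi\in\mathcal P$,
$$\mathbb E^{\mathcal W}\bigl[(\partial_h\varphi)\,\psi\bigr]=\mathbb E^{\mathcal W}\bigl[\varphi\,\partial_h^\top\psi\bigr].\tag{$**$}$$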
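The integration by parts identity can be verified exactly in the simplest case, where the space is the real line with the standard normal law, the direction is h = 1, the derivative is d/dx, and the adjoint acts by p(x) -> x p(x) - p'(x). The sketch below uses the exact Gaussian moments E[X^k] = (k-1)!! for even k, so no quadrature error enters. The helper names and the two sample polynomials are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

def gaussian_mean(p: Polynomial) -> float:
    """E[p(X)] for X ~ N(0,1), using E[X^k] = (k-1)!! for even k and 0 for odd k."""
    total, moment = 0.0, 1.0
    for k, coef in enumerate(p.coef):
        if k % 2 == 0:
            if k > 0:
                moment *= (k - 1)        # (k-1)!! = (k-1) * (k-3)!!
            total += coef * moment
    return total

x = Polynomial([0.0, 1.0])

def d_adjoint(p: Polynomial) -> Polynomial:
    """One-dimensional adjoint of d/dx in L^2 of the N(0,1) law: x p(x) - p'(x)."""
    return x * p - p.deriv()

phi = Polynomial([1.0, -2.0, 0.0, 3.0])      # 1 - 2x + 3x^3
psi = Polynomial([0.5, 1.0, 4.0])            # 0.5 + x + 4x^2

lhs = gaussian_mean(phi.deriv() * psi)       # E[(d phi) psi]
rhs = gaussian_mean(phi * d_adjoint(psi))    # E[phi (d^T psi)]
```

For these two polynomials both sides equal 103.5, as a direct expansion confirms.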
The equation in ($**$) is the basic result, but there is a better way to formulate it, a formulation which will require us to know that $\mathcal P$ is dense in $L^2(\mathcal W;\mathbb R)$. To see that it is, set $\mathcal F_n=\sigma\bigl(\{\langle\,\cdot\,,\xi_m\rangle : 1\le m\le n\}\bigr)$. Then, for any $f\in L^2(\mathcal W;\mathbb R)$, $\mathbb E^{\mathcal W}[f\mid\mathcal F_n]\longrightarrow f$ in $L^2(\mathcal W;\mathbb R)$, and so it suffices to show that $\mathcal F_n$-measurable $f$'s can be approximated in $L^2(\mathcal W;\mathbb R)$ by elements of $\mathcal P$. But this is equivalent to showing that if $\tilde f\in L^2(\gamma_{0,1}^N;\mathbb R)$, then $\tilde f$ can be approximated in $L^2(\gamma_{0,1}^N;\mathbb R)$ by polynomials, and such an approximation is provided by introducing the Hermite polynomials $H_m(x)=\prod_{j=1}^NH_{m_j}(x_j)$ for $m\in\mathbb N^N$ and $x\in\mathbb R^N$, checking that $\bigl(H_m,H_{m'}\bigr)_{L^2(\gamma_{0,1}^N;\mathbb R)}=m!\,\delta_{m,m'}$ where $m!=\prod_{j=1}^Nm_j!$, setting $\tilde H_m=(m!)^{-\frac12}H_m$, and concluding that $\{\tilde H_m : m\in\mathbb N^N\}$ is an orthonormal basis in $L^2(\gamma_{0,1}^N;\mathbb R)$. Hence the partial sums of the Hermite series for $\tilde f$ provide the required approximation by polynomials.

Now define $\nabla\varphi=\sum_{j=1}^N(\partial_{h_j}\varphi)h_j$ for $\varphi\in\mathcal P$. Clearly $\nabla$ is a densely defined linear map of $\mathcal P$ into the space $\mathcal P_H$ of functions $\Psi$ of the form $\sum_{j=1}^N\psi_jh_j$, where $N\ge1$ and $\{\psi_1,\dots,\psi_N\}\subseteq\mathcal P$. In addition, by applying ($**$) to each term in
$$\bigl(\nabla\varphi,\Psi\bigr)_H=\sum_{j=1}^N(\partial_{h_j}\varphi)\psi_j,$$
one sees that
$$\bigl(\nabla\varphi,\Psi\bigr)_{L^2(\mathcal W;H)}=\sum_{j=1}^N\bigl(\varphi,\partial_{h_j}^\top\psi_j\bigr)_{L^2(\mathcal W;\mathbb R)}.$$
Hence, $\mathcal P_H$ is contained in the domain of the adjoint $\nabla^\top$ of $\nabla$, and
$$\nabla^\top\Psi(x)=\sum_{j=1}^N\bigl[\langle x,\xi_j\rangle\psi_j(x)-\partial_{h_j}\psi_j(x)\bigr]\in\mathcal P\quad\text{for }\Psi\in\mathcal P_H.$$

Further, by the same argument with which we proved that $\mathcal P$ is dense in $L^2(\mathcal W;\mathbb R)$, one can show that $\mathcal P_H$ is dense in $L^2(\mathcal W;H)$, and so $\nabla^\top$ is a densely defined linear map from $L^2(\mathcal W;H)$ into $L^2(\mathcal W;\mathbb R)$. In particular, $\nabla^\top$ admits an adjoint $(\nabla^\top)^\top$. Based on the preceding, we now know that the operator $\nabla$ is closable in the sense that the closure of its graph is the graph of an operator. Indeed, the graph of the adjoint of any densely defined operator is closed, and the graph of $\nabla$ is contained in the graph of $(\nabla^\top)^\top$. Hence, if $W_1^2(\mathcal W;\mathbb R)$ is the set of $f\in L^2(\mathcal W;\mathbb R)$ for which there is an $F\in L^2(\mathcal W;H)$ such that $(f,F)$ is in the closure of the graph of $\nabla$ on $\mathcal P$, then we can extend $\nabla$ to $W_1^2(\mathcal W;\mathbb R)$ by taking $\nabla f=F$, in which case it will still be true that
$$\bigl(f,\nabla^\top\Psi\bigr)_{L^2(\mathcal W;\mathbb R)}=\bigl(\nabla f,\Psi\bigr)_{L^2(\mathcal W;H)}\quad\text{for }\Psi\in\mathcal P_H.$$
In order for this extension of $\nabla$ to be useful, one needs to have a good criterion for determining when an $f\in L^2(\mathcal W;\mathbb R)$ is an element of $W_1^2(\mathcal W;\mathbb R)$. Such a criterion is provided by the following theorem.

Theorem 3.3.8 The space $W_1^2(\mathcal W;\mathbb R)$ equals the domain of $(\nabla^\top)^\top$. Hence, if $f\in L^2(\mathcal W;\mathbb R)$, then $f\in W_1^2(\mathcal W;\mathbb R)$ if and only if
$$\bigl|\bigl(f,\nabla^\top\Psi\bigr)_{L^2(\mathcal W;\mathbb R)}\bigr|\le C\|\Psi\|_{L^2(\mathcal W;H)}\quad\text{for some }C<\infty\text{ and all }\Psi\in\mathcal P_H.$$

Proof Only the "if" assertion requires a proof. Let $\Gamma$ denote the closure in $L^2(\mathcal W;\mathbb R)\times L^2(\mathcal W;H)$ of the graph of $\nabla\restriction\mathcal P$. What we need to show is that if $(f,F)$ is in the graph of $(\nabla^\top)^\top$, then $(f,F)\in\Gamma$. To this end, suppose that $f$ is in the domain of $(\nabla^\top)^\top$ and $F=(\nabla^\top)^\top f$, and set $f_n=\mathbb E^{\mathcal W}[f\mid\mathcal F_n]$ and $F_n=\Pi_n\mathbb E^{\mathcal W}[F\mid\mathcal F_n]$, where $\mathcal F_n$ is as above and $\Pi_n$ is the orthogonal projection map from $H$ onto $\operatorname{span}\{h_j : 1\le j\le n\}$. Also, given $\Psi=\sum_{j=1}^N\psi_jh_j\in\mathcal P_H$, set
$$\Psi_n=\mathbb E^{\mathcal W}\bigl[\Pi_n\Psi\mid\mathcal F_n\bigr]=\sum_{j=1}^{n\wedge N}\mathbb E^{\mathcal W}\bigl[\psi_j\mid\mathcal F_n\bigr]h_j.$$
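Observe that if $j>n$, then
$$0=\frac{d}{dt}\mathbb E^{\mathcal W}\bigl[(f_n\circ T_{th_j})\psi_j\bigr]\Big|_{t=0}=\mathbb E^{\mathcal W}\bigl[f_n\,\partial_{h_j}^\top\psi_j\bigr],$$
and therefore
$$\bigl(f_n,\nabla^\top\Psi\bigr)_{L^2(\mathcal W;\mathbb R)}=\bigl(f_n,\nabla^\top\Pi_n\Psi\bigr)_{L^2(\mathcal W;\mathbb R)}=\bigl(f,\nabla^\top\Psi_n\bigr)_{L^2(\mathcal W;\mathbb R)}=\bigl(F,\Psi_n\bigr)_{L^2(\mathcal W;H)}=\bigl(F_n,\Psi\bigr)_{L^2(\mathcal W;H)}.$$
Hence $(f_n,F_n)$ is an element of the graph of $(\nabla^\top)^\top$, and so, since $(f_n,F_n)\longrightarrow(f,F)$ in $L^2(\mathcal W;\mathbb R)\times L^2(\mathcal W;H)$, we need only deal with pairs $(f,F)$ which are $\mathcal F_N$-measurable for some $N\in\mathbb Z^+$. That is, we need to show that if $f\in L^2(\gamma_{0,1}^N;\mathbb R)$, $F\in L^2(\gamma_{0,1}^N;\mathbb R^N)$, and, for all $\Psi=(\psi_1,\dots,\psi_N):\mathbb R^N\longrightarrow\mathbb R^N$ with polynomial coordinates,
$$\bigl(f,\nabla^\top\Psi\bigr)_{L^2(\gamma_{0,1}^N;\mathbb R)}=\bigl(F,\Psi\bigr)_{L^2(\gamma_{0,1}^N;\mathbb R^N)},$$
where
$$\nabla^\top\Psi(x)=\sum_{j=1}^N\partial_j^\top\psi_j=\sum_{j=1}^N\bigl(x_j\psi_j-\partial_j\psi_j\bigr),$$
then there exists a sequence $\{\varphi_k : k\ge1\}\subseteq\mathcal P$ such that $\varphi_k\longrightarrow f$ in $L^2(\gamma_{0,1}^N;\mathbb R)$ and $\nabla\varphi_k\longrightarrow F$ in $L^2(\gamma_{0,1}^N;\mathbb R^N)$. To see this, observe that, by (2.3.16), for $m\in\mathbb N^N$ and $1\le j\le N$,
$$(m_j+1)^{\frac12}\bigl(f,\tilde H_{m(j)}\bigr)_{L^2(\gamma_{0,1}^N;\mathbb R)}=\bigl(f,\partial_j^\top\tilde H_m\bigr)_{L^2(\gamma_{0,1}^N;\mathbb R)}=\bigl(F_j,\tilde H_m\bigr)_{L^2(\gamma_{0,1}^N;\mathbb R)},$$
where $m(j)$ is the element of $\mathbb N^N$ obtained by adding 1 to $m_j$. Starting from this, it is not hard to show first that
$$\|F\|_{L^2(\gamma_{0,1}^N;\mathbb R^N)}^2=\sum_{m\in\mathbb N^N}\|m\|_1\bigl(f,\tilde H_m\bigr)_{L^2(\gamma_{0,1}^N;\mathbb R)}^2$$
and then, again by (2.3.16), that
$$F_j=\sum_{\{m:\,m_j\ge1\}}\bigl(f,\tilde H_m\bigr)_{L^2(\gamma_{0,1}^N;\mathbb R)}\,\partial_j\tilde H_m.$$
Hence we can take
$$\varphi_k=\sum_{\|m\|_1\le k}\bigl(f,\tilde H_m\bigr)_{L^2(\gamma_{0,1}^N;\mathbb R)}\,\tilde H_m.$$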
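The two Hermite facts used in this proof can be checked numerically in one dimension: the orthogonality relation (H_m, H_n) = m! when m = n and 0 otherwise, and the identity expressing the squared norm of the derivative as the sum over m of m times the squared normalized Hermite coefficient. The sketch below assumes the probabilists' Hermite polynomials as implemented in `numpy.polynomial.hermite_e`; the sample coefficient vector `a` and the helper names are hypothetical choices made for illustration.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval, hermeder

nodes, weights = hermegauss(40)
weights = weights / np.sqrt(2.0 * np.pi)   # N(0,1) quadrature, exact up to degree 79

def inner(a, b):
    """L^2 inner product under N(0,1) of two Hermite series given by coefficients."""
    return float(np.sum(weights * hermeval(nodes, a) * hermeval(nodes, b)))

def unit(m):
    """Coefficient array of the unnormalized Hermite polynomial H_m."""
    c = np.zeros(m + 1)
    c[m] = 1.0
    return c

# Orthogonality: (H_m, H_n) = m! if m == n, and 0 otherwise.
gram = np.array([[inner(unit(m), unit(n)) for n in range(6)] for m in range(6)])

# A polynomial f = sum_m a_m H_m has normalized coefficients a_m * sqrt(m!).
a = np.array([0.3, -1.0, 0.5, 0.0, 2.0])
coeffs = a * np.sqrt([math.factorial(m) for m in range(len(a))])
norm_df_sq = inner(hermeder(a), hermeder(a))               # ||f'||^2
series_sq = float(np.sum(np.arange(len(a)) * coeffs**2))   # sum_m m * coeff_m^2
```

For this sample polynomial both quantities equal 386, in agreement with the norm identity for the gradient.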
3.3.2 Some Examples of Abstract Wiener Spaces

Theorem 3.3.2 says that, to every non-degenerate, centered Gaussian measure on a separable Banach space, there corresponds an abstract Wiener space. In this subsection, we will give several examples that illustrate how that correspondence is made in practice.

We begin with the measures discussed in Sect. 3.2.3, when $B$ is a separable Hilbert space and one uses the inner product to identify $B^*$ with $B$. In this case, for any non-degenerate, centered Gaussian measure $\mathcal W$, there exists a positive definite, symmetric, trace class operator $A : B\longrightarrow B$ for which $(\xi,\eta)\in B^2\longmapsto c(\xi,\eta)\equiv(\xi,A\eta)_B\in\mathbb R$ is the covariance function. Thus, if $h_\xi\equiv A\xi$ and $(h_\xi,h_\eta)_H\equiv(h_\xi,\eta)_B$ for $\xi,\eta\in B$, then $(\,\cdot\,,\cdot\cdot\,)_H$ is a non-degenerate inner product on $\{h_\xi : \xi\in B\}$ and $(h_\xi,h_\eta)_H=c(\xi,\eta)$. Therefore, if $H$ is the completion of $\{h_\xi : \xi\in B\}$ with respect to the norm $\|h_\xi\|_H\equiv\sqrt{c(\xi,\xi)}$, then we will know that $(H,B,\mathcal W)$ is an abstract Wiener space once we show that $\{A\xi : \xi\in B\}$ is dense in $B$. To that end, choose an orthonormal basis $\{e_m : m\ge0\}$ for $B$, where $e_m$ is an eigenvector for $A$ with eigenvalue $\alpha_m$. Then
$$A\xi=\sum_{m=0}^\infty\alpha_m(\xi,e_m)_B\,e_m,$$
and so $h\in H$ if and only if
$$\sum_{m=0}^\infty\alpha_m^{-1}(h,e_m)_B^2<\infty,$$
in which case this sum is equal to $\|h\|_H^2$. Hence, for any $x\in B$, $H\ni\sum_{m=0}^n(x,e_m)_B\,e_m\longrightarrow x$ in $B$ as $n\to\infty$.

When $\mathcal W$ is a non-degenerate, centered Gaussian measure on an infinite dimensional, separable Banach space $B$ that is not a Hilbert space, one cannot use spectral theory to analyze the map $A : B^*\longrightarrow B$ discussed in the remark following Theorem 3.3.2. Of course, one can always construct the Cameron–Martin subspace by using the Bochner integral representation given in the proof of that theorem, but seldom will that lead to a practical description. For that reason, it is often best to begin by making an educated guess.

For example, consider the Gaussian measure which is the distribution of Brownian motion. To be precise, take $\Omega_0$ to be the space of all continuous paths $\omega : \mathbb R\longrightarrow\mathbb R$ with the properties that $\omega(0)=0$ and $\lim_{|t|\to\infty}\frac{|\omega(t)|}{|t|}=0$. Then $\Omega_0$ becomes a separable Banach space when one takes $\|\omega\|_{\Omega_0}=\sup_{t\in\mathbb R}\frac{|\omega(t)|}{1+|t|}$, and, using Riesz's Representation Theorem, it is easy to identify $\Omega_0^*$ with the space of signed Borel measures $\mu$ on $\mathbb R$ for which $\mu(\{0\})=0$ and
$$\|\mu\|_{\Omega_0^*}\equiv\int_{\mathbb R}(1+|t|)\,|\mu|(dt)<\infty,$$
where $|\mu|$ denotes the variation measure for $\mu$ (i.e., if $\mu=\mu^+-\mu^-$ is the Hahn decomposition of $\mu$, then $|\mu|=\mu^++\mu^-$). It was shown in Theorem 2.5.7 that almost all Brownian paths are in $\Omega_0$, and so their distribution induces a centered Gaussian measure $\mathcal W_0$ on $\Omega_0$. The most intuitive way to guess the Cameron–Martin subspace for $\mathcal W_0$ is to think about its Feynman type representation. Namely, we are looking for a separable Hilbert space $H\subseteq\Omega_0$ with the property that, for any $n\ge1$, the $\mathcal W_0$-distribution of $\bigl\{h\bigl(\frac mn\bigr) : -n^2\le m\le n^2\bigr\}$ is
$$\Bigl(\frac{n}{2\pi}\Bigr)^{n^2}\exp\Biggl(-\frac n2\sum_{m=-n^2+1}^{n^2}\Bigl(h\bigl(\tfrac mn\bigr)-h\bigl(\tfrac{m-1}n\bigr)\Bigr)^2\Biggr)\prod_{\substack{-n^2\le m\le n^2\\ m\neq0}}dh\bigl(\tfrac mn\bigr).$$
Ignoring everything except the exponential in the density, and rewriting the exponent as
$$-\frac12\sum_{m=-n^2+1}^{n^2}\Biggl(\frac{h\bigl(\frac mn\bigr)-h\bigl(\frac{m-1}n\bigr)}{\frac1n}\Biggr)^2\frac1n,$$
one is led to the Feynman-type representation
$$\mathcal W_0(dh)=\frac1Z\exp\Bigl(-\frac12\int_{\mathbb R}|\dot h(t)|^2\,dt\Bigr)\lambda_H(dh).$$
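With this in mind, take $H_0^1(\mathbb R;\mathbb R)$ to be the set of absolutely continuous $h\in\Omega_0$ whose derivatives are in $L^2(\lambda_{\mathbb R};\mathbb R)$, and turn $H_0^1(\mathbb R;\mathbb R)$ into a separable Hilbert space with $\|h\|_{H_0^1(\mathbb R;\mathbb R)}=\|\dot h\|_{L^2(\lambda_{\mathbb R};\mathbb R)}$. Using a standard mollification procedure, one sees that $H_0^1(\mathbb R;\mathbb R)$ is dense in $\Omega_0$. Furthermore, $|h(t)|\le|t|^{\frac12}\|h\|_{H_0^1(\mathbb R;\mathbb R)}$, and so $\|h\|_{\Omega_0}\le\|h\|_{H_0^1(\mathbb R;\mathbb R)}$, which means that $H_0^1(\mathbb R;\mathbb R)$ is continuously embedded in $\Omega_0$.

Theorem 3.3.9 If $\mathcal W_0\in M_1(\Omega_0)$ is the distribution of a Brownian motion, then $\bigl(H_0^1(\mathbb R;\mathbb R),\Omega_0,\mathcal W_0\bigr)$ is an abstract Wiener space.

Proof $\{\omega(t) : t\in\mathbb R\}$ is a centered Gaussian process with covariance $w(s,t)=\mathbf 1_{[0,\infty)}(st)\,|s|\wedge|t|$ under $\mathcal W_0$. Thus, if $\mu\in\Omega_0^*$ and one uses the formula
$$\langle\omega,\mu\rangle=\lim_{n\to\infty}\sum_{m=-n^2}^{n^2}\omega\bigl(\tfrac mn\bigr)\,\mu\bigl(\bigl[\tfrac mn,\tfrac{m+1}n\bigr)\bigr),$$
it is easy to check that $\omega\mapsto\langle\omega,\mu\rangle$ is a centered Gaussian random variable with variance $\iint w(s,t)\,\mu(ds)\mu(dt)$, and therefore $\{\langle\,\cdot\,,\mu\rangle : \mu\in\Omega_0^*\}$ is a centered Gaussian family with covariance function
$$c(\mu,\nu)=\iint w(s,t)\,\mu(ds)\nu(dt).$$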
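Next, for $\mu\in\Omega_0^*$, let $h_\mu$ be the function given by $h_\mu(t)=\int w(s,t)\,\mu(ds)$. Because
$$w(s,t)=\mathbf 1_{[0,\infty)}(st)\int_0^\infty\mathbf 1_{[0,|s|]}(\tau)\,\mathbf 1_{[0,|t|]}(\tau)\,d\tau,$$
one sees that
$$h_\mu(t)=\begin{cases}\int_0^t\mu\bigl((\tau,\infty)\bigr)\,d\tau&\text{if }t\ge0\\[2pt]\int_t^0\mu\bigl((-\infty,\tau]\bigr)\,d\tau&\text{if }t<0.\end{cases}$$
Thus $h_\mu$ is absolutely continuous, and
$$\dot h_\mu(t)=\begin{cases}\mu\bigl((t,\infty)\bigr)&\text{if }t\ge0\\[2pt]-\mu\bigl((-\infty,t]\bigr)&\text{if }t<0.\end{cases}$$
Since $|\mu|\bigl(\mathbb R\setminus(-t,t)\bigr)\le(1+t)^{-1}\|\mu\|_{\Omega_0^*}$ for $t\ge0$, it follows that $h_\mu\in H_0^1(\mathbb R;\mathbb R)$. Furthermore, if $h\in H_0^1(\mathbb R;\mathbb R)$, then
$$(h,h_\mu)_{H_0^1}=\int_0^\infty\dot h(\tau)\,\mu\bigl([\tau,\infty)\bigr)\,d\tau-\int_{-\infty}^0\dot h(\tau)\,\mu\bigl((-\infty,\tau]\bigr)\,d\tau=\langle h,\mu\rangle.$$
In particular,
$$\|h_\mu\|_{H_0^1(\mathbb R;\mathbb R)}^2=\langle h_\mu,\mu\rangle=\iint w(s,t)\,\mu(ds)\mu(dt)=c(\mu,\mu),$$
and so $H_0^1(\mathbb R;\mathbb R)$ is the Cameron–Martin space for $\mathcal W_0$.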
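The identity between the squared Cameron–Martin norm of h_mu, the pairing of h_mu with mu, and c(mu, mu) can be checked exactly when mu is a finite combination of point masses, because h_mu is then piecewise linear. The sketch below does this for a hypothetical measure with five atoms (none at 0, as membership in the dual space requires); the atom locations `s`, weights `c`, and helper names are illustrative assumptions rather than anything taken from the text.

```python
import numpy as np

def w(s, t):
    """Covariance of two-sided Brownian motion: 1_{[0,inf)}(s t) * min(|s|, |t|)."""
    return np.where(s * t >= 0, np.minimum(np.abs(s), np.abs(t)), 0.0)

# A hypothetical discrete measure mu = sum_i c_i * delta_{s_i}, with no mass at 0.
s = np.array([-1.5, -0.5, 0.25, 1.0, 2.0])
c = np.array([0.7, -1.2, 2.0, 0.4, -0.9])

def h_mu(t):
    """h_mu(t) = integral of w(s, t) mu(ds)."""
    return np.sum(c * w(s, np.asarray(t, dtype=float)[..., None]), axis=-1)

# h_mu is piecewise linear with kinks only at 0 and at the atoms, and it is
# constant outside [min s_i, max s_i]. Summing (increment)^2 / (step) over the
# intervals between consecutive kinks therefore gives the integral of the
# squared derivative exactly.
grid = np.sort(np.concatenate(([0.0], s)))
values = h_mu(grid)
energy = float(np.sum(np.diff(values) ** 2 / np.diff(grid)))   # squared H_0^1 norm

covariance = float(c @ w(s[:, None], s[None, :]) @ c)          # c(mu, mu)
pairing = float(c @ h_mu(s))                                    # <h_mu, mu>
```

All three numbers coincide up to rounding, which is the discrete form of the final display in the proof.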
One should understand how the preceding result can be interpreted in terms of the operator $A : \Omega_0^*\longrightarrow\Omega_0$ given by $A\mu(t)=\int w(s,t)\,\mu(ds)$. Because
$$\partial_t^2\int w(s,t)\varphi(s)\,ds=-\varphi(t)\quad\text{for }\varphi\in C_c^2(\mathbb R;\mathbb R),$$
$\mu$ is minus the second distributional derivative of $h_\mu=A\mu$, and so $A^{-1}h_\mu=-\partial^2h_\mu$. Therefore, if $H$ is the Cameron–Martin space for $\mathcal W_0$, we should expect that
$$\bigl(h,h_\mu\bigr)_H=\bigl(-\partial^2h,h_\mu\bigr)_{L^2(\lambda_{\mathbb R};\mathbb R)}=\bigl(\dot h,\dot h_\mu\bigr)_{L^2(\lambda_{\mathbb R};\mathbb R)},$$
at least for $h\in C_c^2(\mathbb R;\mathbb R)$. Starting from this, it should be clear why $H$ must be $H_0^1(\mathbb R;\mathbb R)$.

There is an important interpretation of the Paley–Wiener map $h\in H_0^1(\mathbb R;\mathbb R)\longmapsto\mathcal I(h)\in L^2(\mathcal W_0;\mathbb R)$. For $\mu\in\Omega_0^*$, $\langle\omega,\mu\rangle$ is the Lebesgue integral
$$\int\omega(t)\,\mu(dt)=-(R)\!\int_0^\infty\omega(t)\,d\mu\bigl((t,\infty)\bigr)+(R)\!\int_{-\infty}^0\omega(t)\,d\mu\bigl((-\infty,t)\bigr),$$
where, taking advantage of the facts that $\omega$ is continuous and $t\mapsto\mu\bigl((t,\infty)\bigr)$ and $t\mapsto\mu\bigl((-\infty,t)\bigr)$ are functions of bounded variation, the integrals on the right are taken in the sense of Riemann–Stieltjes. A fundamental property (cf. Theorem 1.2.1 in [15]) of Riemann–Stieltjes integration is that if $\varphi$ is Riemann–Stieltjes integrable with respect to $\psi$ on an interval $[a,b]$, then $\psi$ is Riemann–Stieltjes integrable with respect to $\varphi$ and
$$(R)\!\int_a^b\varphi\,d\psi=\varphi(b)\psi(b)-\varphi(a)\psi(a)-(R)\!\int_a^b\psi\,d\varphi.$$
Hence, since $\omega(0)=0$ and $|\mu|\bigl((t,\infty)\bigr)\vee|\mu|\bigl((-\infty,-t)\bigr)\le\|\mu\|_{\Omega_0^*}(1+t)^{-1}$ for $t>0$,
$$\langle\omega,\mu\rangle=(R)\!\int_0^\infty\mu\bigl((t,\infty)\bigr)\,d\omega(t)-(R)\!\int_{-\infty}^0\mu\bigl((-\infty,t)\bigr)\,d\omega(t).\tag{3.3.3}$$
This means that $\mathcal I(h_\mu)$ is a bona fide Riemann–Stieltjes integral. Even though, for general $h\in H_0^1(\mathbb R;\mathbb R)$, $\mathcal I(h)$ will be neither a Lebesgue nor a Riemann–Stieltjes integral and is only defined up to a set of $\mathcal W_0$-measure 0, it is the limit in $L^2(\mathcal W_0;\mathbb R)$ of Riemann–Stieltjes integrals, and that accounts for the use by many authors of the term Paley–Wiener integral.

Paley and Wiener did not interpret their integral in terms of abstract Wiener spaces. Instead, they developed their theory along the lines in the preceding paragraph. That is, they observed that if $f\in C_c^1(\mathbb R;\mathbb R)$, then $(R)\!\int f(t)\,d\omega(t)$ is a well defined Riemann–Stieltjes integral which, under $\mathcal W_0$, is a centered Gaussian random variable with variance $\|f\|_{L^2(\lambda_{\mathbb R};\mathbb R)}^2$. They then used this observation and a mollification procedure to give meaning to $\int f(t)\,d\omega(t)$ for general $f\in L^2(\lambda_{\mathbb R};\mathbb R)$. That is, given $f\in L^2(\lambda_{\mathbb R};\mathbb R)$, they chose a sequence $\{f_n : n\ge1\}\subseteq C_c^1(\mathbb R;\mathbb R)$ that converges to $f$ in $L^2(\lambda_{\mathbb R};\mathbb R)$ and realized that the integrals $(R)\!\int f_n(t)\,d\omega(t)$ would converge in $L^2(\mathcal W_0;\mathbb R)$ to a centered Gaussian random variable, which they denoted by $\int f(t)\,d\omega(t)$, with variance $\|f\|_{L^2(\lambda_{\mathbb R};\mathbb R)}^2$. From the abstract Wiener space perspective, what they did is start with $\mu\in\Omega_0^*$ which are absolutely continuous with respect to $\lambda_{\mathbb R}$ and have a continuous, compactly supported Radon–Nikodym derivative and then use the same extension procedure as we used in the proof of Theorem 3.3.4.

The stationary Gaussian processes discussed in Sect. 2.5.3 are another source of abstract Wiener spaces. For example, recall the Ornstein–Uhlenbeck process discussed at the end of that section, only here rescale time and space so that its covariance function is $2^{-1}e^{-|t|}$ instead of $e^{-\frac{|t|}{2}}$. Again the paths of this process are continuous and have sublinear growth at infinity. Hence its distribution is a Borel measure on the space $\Omega$ of continuous functions $\omega : \mathbb R\longrightarrow\mathbb R$ satisfying $\lim_{|t|\to\infty}\frac{|\omega(t)|}{|t|}=0$, which becomes a separable Banach space when one uses the norm $\|\omega\|_\Omega=\sup_{t\in\mathbb R}\frac{|\omega(t)|}{1+|t|}$. Just as before, $\Omega^*$ can be identified as the space of signed Borel measures on $\mathbb R$ for which $\int(1+|t|)\,|\mu|(dt)<\infty$, only now $\mu(\{0\})$ need not be 0. Using the fact that this Ornstein–Uhlenbeck process is a homogeneous Markov process with transition probability density
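$$\bigl(\pi(1-e^{-2t})\bigr)^{-\frac12}\exp\Bigl(-\frac{(y-e^{-t}x)^2}{1-e^{-2t}}\Bigr),$$
one can use Feynman type formalism to see that its Cameron–Martin subspace must consist of absolutely continuous $h\in\Omega$ for which $(h+\dot h)\in L^2(\lambda_{\mathbb R};\mathbb R)$. Since elements of $\Omega$ are tempered distributions and therefore have Fourier transforms, the Fourier transform $\hat h$ of such an $h$ will have the property that $(1-i\xi)\hat h\in L^2(\lambda_{\mathbb R};\mathbb R)$, and therefore both $h$ and $\dot h$ are in $L^2(\lambda_{\mathbb R};\mathbb R)$ and $\|h+\dot h\|_{L^2(\lambda_{\mathbb R};\mathbb R)}^2=\|h\|_{L^2(\lambda_{\mathbb R};\mathbb R)}^2+\|\dot h\|_{L^2(\lambda_{\mathbb R};\mathbb R)}^2$. This line of reasoning is justified in the following theorem.

Theorem 3.3.10 Let $\mathcal U\in M_1(\Omega)$ be the distribution of the Ornstein–Uhlenbeck process with covariance function $\frac12e^{-|t|}$, and let $H^1(\mathbb R;\mathbb R)$ be the space of absolutely continuous $h : \mathbb R\longrightarrow\mathbb R$ for which both $h$ and $\dot h$ are in $L^2(\lambda_{\mathbb R};\mathbb R)$. Then $H^1(\mathbb R;\mathbb R)$ is a separable Hilbert space with norm
$$\|h\|_{H^1(\mathbb R;\mathbb R)}=\Bigl(\|h\|_{L^2(\lambda_{\mathbb R};\mathbb R)}^2+\|\dot h\|_{L^2(\lambda_{\mathbb R};\mathbb R)}^2\Bigr)^{\frac12},$$
$H^1(\mathbb R;\mathbb R)$ is dense in $\Omega$, and $\bigl(H^1(\mathbb R;\mathbb R),\Omega,\mathcal U\bigr)$ is an abstract Wiener space.

Proof Everything but the final statement is left as an exercise. Proceeding as in the proof of Theorem 3.3.9, one sees that
$$\int\langle\omega,\mu\rangle\langle\omega,\nu\rangle\,\mathcal U(d\omega)=\frac12\iint e^{-|t-s|}\,\mu(ds)\nu(dt)\quad\text{for }\mu,\nu\in\Omega^*.$$
Given $\mu\in\Omega^*$, define $h_\mu(t)=\frac12\int e^{-|t-s|}\,\mu(ds)$. Then
$$\dot h_\mu(t)=-\frac12\int\operatorname{sgn}(t-s)\,e^{-|t-s|}\,\mu(ds),$$
and so, for $h\in H^1(\mathbb R;\mathbb R)$,
$$\bigl(\dot h,\dot h_\mu\bigr)_{L^2(\lambda_{\mathbb R};\mathbb R)}=-\frac12\int\Bigl(\int\dot h(t)\operatorname{sgn}(t-s)\,e^{-|t-s|}\,dt\Bigr)\mu(ds)=\int h(s)\,\mu(ds)-\int h(t)h_\mu(t)\,dt,$$
the second equality being an application of integration by parts. Hence $\langle h,\mu\rangle=(h,h_\mu)_{H^1(\mathbb R;\mathbb R)}$. In addition,
$$\int\langle\omega,\mu\rangle^2\,\mathcal U(d\omega)=\frac12\iint e^{-|t-s|}\,\mu(ds)\mu(dt)=\langle h_\mu,\mu\rangle=\|h_\mu\|_{H^1(\mathbb R;\mathbb R)}^2,$$
which completes the proof that $\bigl(H^1(\mathbb R;\mathbb R),\Omega,\mathcal U\bigr)$ is an abstract Wiener space.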
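The final chain of equalities in this proof can be tested numerically for a measure made of finitely many point masses, which is permitted here because mass at 0 is allowed. The sketch below compares a midpoint-rule approximation of the squared H^1 norm of h_mu with the covariance c(mu, mu) and with the pairing of h_mu against mu. The atoms `s`, weights `c`, the truncation of the line to [-30, 30], and the grid size are all illustrative assumptions, so agreement is only expected up to discretization error.

```python
import numpy as np

s = np.array([-1.0, 0.0, 0.5])      # atoms of a hypothetical measure mu
c = np.array([1.0, -2.0, 1.5])      # their weights

def h_mu(t):
    """h_mu(t) = (1/2) * integral of exp(-|t - s|) mu(ds)."""
    return 0.5 * np.sum(c * np.exp(-np.abs(t[:, None] - s)), axis=1)

def h_mu_dot(t):
    """Derivative of h_mu away from the atoms."""
    d = t[:, None] - s
    return -0.5 * np.sum(c * np.sign(d) * np.exp(-np.abs(d)), axis=1)

# Midpoint rule on [-30, 30]; the atoms sit on cell edges and the tails are negligible.
edges = np.linspace(-30.0, 30.0, 60001)
step = edges[1] - edges[0]
mid = edges[:-1] + 0.5 * step
h1_norm_sq = float(np.sum(h_mu(mid) ** 2 + h_mu_dot(mid) ** 2) * step)

covariance = float(0.5 * c @ np.exp(-np.abs(s[:, None] - s[None, :])) @ c)  # c(mu, mu)
pairing = float(c @ h_mu(s))                                                 # <h_mu, mu>
```

The pairing and the covariance agree to rounding error, and the numerically integrated norm matches them to within the accuracy of the midpoint rule.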
Just as the Cameron–Martin subspace for the distribution of Brownian motion can be found using the covariance map, so too can the Cameron–Martin subspace for the distribution of the Ornstein–Uhlenbeck process. The operator $A : \Omega^*\longrightarrow\Omega$ here is given by $A\mu(t)=\frac12\int e^{-|t-s|}\,\mu(ds)$, and one can easily check that $A\mu-\partial^2A\mu=\mu$ in the sense of distributions. Using integration by parts and proceeding in the same way as we did before, one concludes that the Cameron–Martin space for $\mathcal U$ must be $H^1(\mathbb R;\mathbb R)$.

There is an interesting connection between stationary Gaussian processes and Paley–Wiener integrals. Namely, suppose that $f\in L^2(\lambda_{\mathbb R};\mathbb R)$ is an $\mathbb R$-valued characteristic function. Then its inverse Fourier transform $(2\pi)^{-1}\check f$ is the density of an even Borel probability measure on $\mathbb R$. Hence $\check f$ is a non-negative, even $\lambda_{\mathbb R}$-integrable function. Take $\psi$ to be the Fourier transform of $(2\pi)^{-1}\check f^{\frac12}$, and observe that $\psi$ is an even element of $L^2(\mathbb R;\mathbb R)$ and that $f(t)=\int\psi(t-s)\psi(s)\,ds$. Using Paley–Wiener integration, now define
$$X(t,\omega)=\int\psi(t-\tau)\,d\omega(\tau)\in L^2(\mathcal W_0;\mathbb R)\quad\text{for }t\in\mathbb R.$$
Then
$$\int X(s,\omega)X(t,\omega)\,\mathcal W_0(d\omega)=\int\psi(t-\tau)\psi(s-\tau)\,d\tau=\int\psi(t-s-\tau)\psi(\tau)\,d\tau=f(t-s),$$
and so $\{X(t) : t\in\mathbb R\}$ under $\mathcal W_0$ is a centered stationary Gaussian process with covariance $f(t-s)$. This representation of stationary Gaussian processes is a key ingredient in filtering and prediction theories.
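As a concrete instance of this representation, take f(t) = exp(-t^2/2), the characteristic function of the standard normal law. Assuming the Fourier convention in which the inverse transform of f is sqrt(2 pi) exp(-xi^2/2), a direct computation gives psi(t) = (2/pi)^(1/4) exp(-t^2). The sketch below checks numerically that the integral of psi(t - tau) psi(s - tau) reproduces f(t - s); the choice of f, the closed form of psi, the integration grid, and the sample points are assumptions made for this illustration.

```python
import numpy as np

def psi(t):
    """Kernel for f(t) = exp(-t^2 / 2), under the Fourier convention stated above."""
    return (2.0 / np.pi) ** 0.25 * np.exp(-t**2)

def f(t):
    """Characteristic function of the standard normal distribution."""
    return np.exp(-0.5 * t**2)

tau = np.linspace(-12.0, 12.0, 24001)     # integration grid with step 1e-3
step = tau[1] - tau[0]

def covariance(s, t):
    """Riemann-sum approximation of the integral of psi(t - tau) * psi(s - tau)."""
    return float(np.sum(psi(t - tau) * psi(s - tau)) * step)

pairs = [(0.0, 0.0), (0.3, 1.1), (-2.0, 0.5)]
errors = [abs(covariance(s, t) - f(t - s)) for s, t in pairs]
```

The errors are at the level of rounding, since the integrand is a rapidly decaying Gaussian and the grid covers its effective support.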
Chapter 4
Further Properties and Examples of Abstract Wiener Spaces
This chapter is a continuation of the preceding one. Here we will derive a few more general properties of abstract Wiener spaces and then construct examples of what physicists call free Euclidean fields.
4.1 Wiener Series and Some Applications

The following is a modern version of Wiener's approach to providing a mathematically satisfactory treatment of Brownian motion.

Theorem 4.1.1 Let $H$ be an infinite dimensional, separable, real Hilbert space and $B$ a Banach space into which $H$ is continuously embedded as a dense subspace. If for some orthonormal basis $\{h_m : m\ge0\}$ in $H$ the series
$$\sum_{m=0}^\infty y_mh_m\ \text{converges in }B\tag{4.1.1}$$
for $\gamma_{0,1}^{\mathbb N}$-almost every $y=(y_0,\dots,y_m,\dots)\in\mathbb R^{\mathbb N}$ and if $S : \mathbb R^{\mathbb N}\longrightarrow B$ is given by
$$S(y)=\begin{cases}\sum_{m=0}^\infty y_mh_m&\text{when the series converges in }B\\[2pt]0&\text{otherwise,}\end{cases}$$
then $(H,B,\mathcal W)$ with $\mathcal W=S_*\gamma_{0,1}^{\mathbb N}$ is an abstract Wiener space. Conversely, if $(H,B,\mathcal W)$ is an abstract Wiener space and $\{h_m : m\ge0\}$ is an orthogonal sequence in $H$ such that, for each $m\in\mathbb N$, either $h_m=0$ or $\|h_m\|_H=1$, then
$$\mathbb E^{\mathcal W}\Biggl[\sup_{n\ge0}\Bigl\|\sum_{m=0}^n\mathcal I(h_m)h_m\Bigr\|_B^p\Biggr]<\infty\quad\text{for all }p\in[1,\infty),\tag{4.1.2}$$
and, for $\mathcal W$-almost every $x\in B$, $\sum_{m=0}^\infty[\mathcal I(h_m)](x)h_m$ converges in $B$ to the $\mathcal W$-conditional expectation value of $x$ given $\sigma\bigl(\{\mathcal I(h_m) : m\ge0\}\bigr)$. Moreover,
$$\sum_{m=0}^\infty[\mathcal I(h_m)](x)h_m\ \text{is }\mathcal W\text{-independent of }x-\sum_{m=0}^\infty[\mathcal I(h_m)](x)h_m.$$
Finally, if $\{h_m : m\ge0\}$ is an orthonormal basis in $H$, then, for $\mathcal W$-almost every $x\in B$, $\sum_{m=0}^\infty[\mathcal I(h_m)](x)h_m$ converges in $B$ to $x$, and the convergence is also in $L^p(\mathcal W;B)$ for every $p\in[1,\infty)$.

Proof First assume that (4.1.1) holds for some orthonormal basis, and set $S_n(y)=\sum_{m=0}^ny_mh_m$ and $\mathcal W=S_*\gamma_{0,1}^{\mathbb N}$. Then, because $S_n(y)\longrightarrow S(y)$ in $B$ for $\gamma_{0,1}^{\mathbb N}$-almost every $y\in\mathbb R^{\mathbb N}$,
$$\widehat{\mathcal W}(\xi)=\lim_{n\to\infty}\mathbb E^{\gamma_{0,1}^{\mathbb N}}\bigl[e^{i\langle S_n,\xi\rangle}\bigr]=\lim_{n\to\infty}\prod_{m=0}^ne^{-\frac12(h_\xi,h_m)_H^2}=e^{-\frac12\|h_\xi\|_H^2},$$
which proves that $(H,B,\mathcal W)$ is an abstract Wiener space.

Next suppose that $(H,B,\mathcal W)$ is an abstract Wiener space and that $\{h_m : m\ge0\}$ is an orthogonal sequence with $\|h_m\|_H\in\{0,1\}$ for each $m\ge0$. For each $n\in\mathbb N$, set $\mathcal F_n=\sigma\bigl(\{\mathcal I(h_m) : 0\le m\le n\}\bigr)$. Clearly, $\mathcal F_n\subseteq\mathcal F_{n+1}$ and $\mathcal F\equiv\sigma\bigl(\bigcup_{n=0}^\infty\mathcal F_n\bigr)$ is the $\sigma$-algebra generated by $\{\mathcal I(h_m) : m\ge0\}$. Moreover, if $S_n=\sum_{m=0}^n\mathcal I(h_m)h_m$, then, since $\{\mathcal I(h_m) : m\ge0\}$ is a Gaussian family and $\langle x-S_n(x),\xi\rangle$ is perpendicular in $L^2(\mathcal W;\mathbb R)$ to $\mathcal I(h_m)$ for all $\xi\in B^*$ and $0\le m\le n$, $x-S_n(x)$ is $\mathcal W$-independent of $\mathcal F_n$. Thus $S_n=\mathbb E^{\mathcal W}[x\mid\mathcal F_n]$, and so, by Theorem 3.2.5, we know both that (4.1.2) holds and that $S_n\longrightarrow\mathbb E^{\mathcal W}[x\mid\mathcal F]$ $\mathcal W$-almost surely and in $L^p(\mathcal W;B)$ for all $p\in[1,\infty)$. In addition, the $\mathcal W$-independence of $S_n(x)$ from $x-S_n(x)$ implies that the limit quantities possess the same independence property.

In order to complete the proof at this point, all that we have to do is show that $x=\mathbb E^{\mathcal W}[x\mid\mathcal F]$ $\mathcal W$-almost surely when $\{h_m : m\ge0\}$ is an orthonormal basis. Equivalently, we must check that $\mathcal B_B$ is contained in the $\mathcal W$-completion $\overline{\mathcal F}^{\mathcal W}$ of $\mathcal F$. To this end, note that, for each $h\in H$, because $\sum_{m=0}^n(h,h_m)_Hh_m$ converges in $H$ to $h$,
$$\sum_{m=0}^n(h,h_m)_H\,\mathcal I(h_m)=\mathcal I\Bigl(\sum_{m=0}^n(h,h_m)_Hh_m\Bigr)\longrightarrow\mathcal I(h)\quad\text{in }L^2(\mathcal W;\mathbb R).$$
Hence, $\mathcal I(h)$ is $\overline{\mathcal F}^{\mathcal W}$-measurable for every $h\in H$. In particular, this means that $x\mapsto\langle x,\xi\rangle$ is $\overline{\mathcal F}^{\mathcal W}$-measurable for every $\xi\in B^*$, and so, since $\mathcal B_B$ is generated by $\{\langle\,\cdot\,,\xi\rangle : \xi\in B^*\}$, $\mathcal B_B\subseteq\overline{\mathcal F}^{\mathcal W}$.
It is important to acknowledge that the preceding theorem would not have made Wiener’s life easier. Wiener knew full well that what he had to do is prove that a series of the sort described in Theorem 4.1.1 converges in the space of continuous paths. Being an expert in harmonic analysis, he chose an orthonormal basis consisting of trigonometric functions. Namely, because he realized that it was sufficient to construct Brownian paths on the time interval $[0,1]$, he restricted his considerations to that interval and took
\[
h_m(t)=\begin{cases} t & \text{if } m=0\\[2pt] \dfrac{2^{1/2}\sin(m\pi t)}{m\pi} & \text{if } m\ge 1\end{cases}\qquad\text{for } t\in[0,1].
\]
Since $\{1\}\cup\{2^{1/2}\cos(m\pi t): m\ge 1\}$ is an orthonormal basis in $L^2(\lambda_{[0,1]};\mathbb R)$, $\{h_m : m\ge 0\}$ is an orthonormal basis in $H_0^1([0,1];\mathbb R)\equiv\{h\restriction[0,1] : h\in H_0^1(\mathbb R;\mathbb R)\}$. Thus, what Wiener had to do was show that, if $\{X_m : m\ge 0\}$ is a sequence of mutually independent $N(0,1)$-random variables, then
\[
tX_0+\sum_{m=1}^\infty X_m\,\frac{2^{1/2}\sin(m\pi t)}{m\pi}
\]
converges almost surely uniformly on $[0,1]$. Had he been willing to settle for Brownian paths in $L^2(\lambda_{[0,1]};\mathbb R)$, his task would have been much simpler. Indeed, using the orthonormality of $\{2^{1/2}\sin(m\pi t): m\ge 1\}$, one sees that
\[
\mathbb E\Biggl[\sup_{\ell>k}\Bigl\|\sum_{m=k+1}^{\ell}h_mX_m\Bigr\|_{L^2(\lambda_{[0,1]};\mathbb R)}^2\Biggr]=\mathbb E\Biggl[\sum_{m>k}\frac{X_m^2}{m^2\pi^2}\Biggr]=\pi^{-2}\sum_{m>k}m^{-2}.
\]
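The $L^2$ computation above can be checked numerically. The following is a minimal sketch, not part of Wiener’s argument: it evaluates the tail $\pi^{-2}\sum_{m>k}m^{-2}$ and also the kernel $\sum_m h_m(s)h_m(t)$, which for an orthonormal basis of $H_0^1([0,1];\mathbb R)$ should approach the Brownian covariance $s\wedge t$. The function names and truncation levels are illustrative assumptions.

```python
import math

def h(m, t):
    """Wiener's basis element h_m evaluated at t in [0, 1]."""
    if m == 0:
        return t
    return math.sqrt(2.0) * math.sin(m * math.pi * t) / (m * math.pi)

def kernel(s, t, n_terms):
    """Partial sum of sum_m h_m(s) h_m(t); it should approach min(s, t)."""
    return sum(h(m, s) * h(m, t) for m in range(n_terms + 1))

def l2_tail(k, n_terms):
    """Truncated version of pi^{-2} * sum_{m > k} m^{-2}."""
    return sum(1.0 / (m * m) for m in range(k + 1, n_terms + 1)) / math.pi ** 2

approx_cov = kernel(0.3, 0.7, 20000)   # compare with min(0.3, 0.7) = 0.3
tail_10 = l2_tail(10, 200000)          # mean-square L^2 error after 10 terms
```

The tail behaves like $1/(\pi^2 k)$, which is why the $L^2$ statement is easy while uniform convergence requires real work.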
Therefore, as distinguished from uniform convergence, proving convergence in $L^2(\lambda_{[0,1]};\mathbb R)$ is trivial. Wiener’s proof of uniform convergence is based on non-trivial results from harmonic analysis.

Corollary 4.1.2 If $\mathcal W$ is a non-degenerate, centered Gaussian measure on the separable Banach space $B$, then $\mathcal W(G)>0$ for all non-empty open subsets $G\subseteq B$.

Proof Let $H$ be the Cameron–Martin subspace for $\mathcal W$ on $B$, and assume that $\|\cdot\|_B\le\|\cdot\|_H$. Since $H$ is dense in $B$, it suffices to show that $\mathcal W\bigl(B_B(h,r)\bigr)>0$ for all $h\in H$ and $r>0$. In addition, because, by (3.3.2),
\[
\mathcal W\bigl(B_B(0,r)\bigr)=\int_{B_B(h,r)}R_{-h}\,d\mathcal W\le\|R_{-h}\|_{L^2(\mathcal W;\mathbb R)}\,\mathcal W\bigl(B_B(h,r)\bigr)^{\frac12}
\]
and
\[
\|R_{-h}\|_{L^2(\mathcal W;\mathbb R)}^2=e^{-\|h\|_H^2}\int e^{-2\mathcal I(h)}\,d\mathcal W=e^{\|h\|_H^2},
\]
it is enough to prove that $\mathcal W\bigl(B_B(0,r)\bigr)>0$ for all $r>0$. To this end, let $\{h_n : n\ge 0\}$ be an orthonormal basis in $H$, and set $S_n=\sum_{m=0}^n\mathcal I(h_m)h_m$. By Theorem 4.1.1, for each $r>0$ there is an $n$ such that
\[
\mathcal W\bigl(\|x-S_n\|_B\le\tfrac r2\bigr)\ge\tfrac12.
\]
At the same time, $\|S_n\|_B^2\le\|S_n\|_H^2=\sum_{m=0}^n\mathcal I(h_m)^2$, and therefore, since $x-S_n$ is independent of $S_n$,
\[
\mathcal W\bigl(B_B(0,r)\bigr)\ge\tfrac12\,\gamma_{0,1}^{n+1}\bigl(B_{\mathbb R^{n+1}}(0,\tfrac r2)\bigr)>0
\]
for all $r>0$.
4.1.1 An Isoperimetric Inequality for Abstract Wiener Space

By taking advantage of the dimension independence of (2.4.6), we will prove here the analog of (2.4.6) for abstract Wiener spaces. Let $(H,B,\mathcal W)$ be an infinite dimensional, non-degenerate abstract Wiener space. Given $A\in\mathcal B_B$ and $t\ge 0$, define
\[
A^{(t)}=\bigl\{x+h : x\in A\ \&\ h\in\overline{B_H(0,t)}\bigr\}.
\]
Equivalently, $x\in A^{(t)}$ if and only if there is a $y\in A$ such that $\|y-x\|_H\le t$. Notice that $A^{(t)}$ is significantly smaller than it would have been had we used $\|\cdot\|_B$ rather than $\|\cdot\|_H$ to define it. Also, observe that, because, by (iii) in Lemma 3.3.1, $\|\cdot\|_H$ is a lower semi-continuous function on $B$, $A^{(t)}$ is a closed subset of $B$ and therefore Borel measurable.

Theorem 4.1.3 For any $A\in\mathcal B_B$ and $t\ge 0$,
\[
\mathcal W\bigl(A^{(t)}\bigr)\ge\Phi\bigl(\Phi^{-1}\bigl(\mathcal W(A)\bigr)+t\bigr). \tag{4.1.3}
\]
In particular, if $A\in\mathcal B_B$ and $x\in A\implies x+h\in A$ for all $h\in H$, then $\mathcal W(A)\in\{0,1\}$.

Proof Choose a sequence $\{\xi_n : n\ge 0\}\subseteq B^*$ so that $\{h_n : n\ge 0\}$ is an orthonormal basis in $H$ when $h_n=h_{\xi_n}$. By Theorem 4.1.1, we know that $\mathcal B_B$ is contained in the $\mathcal W$-completion of $\sigma\bigl(\{\langle\,\cdot\,,\xi_n\rangle : n\ge 0\}\bigr)$. Thus, since the set of $A$ for which (4.1.3) holds is closed under monotone limits, it suffices for us to show that it holds when $A\in\mathcal F_N\equiv\sigma\bigl(\{\langle\,\cdot\,,\xi_n\rangle : 0\le n\le N-1\}\bigr)$ for some $N\ge 1$. To this end, define the projection map $\pi_N : B\longrightarrow\mathbb R^N$ by $\pi_N(x)=\bigl(\langle x,\xi_0\rangle,\dots,\langle x,\xi_{N-1}\rangle\bigr)$. Then, $A\in\mathcal F_N$ if and only if $A=\pi_N^{-1}(\Gamma)$ for some $\Gamma\in\mathcal B_{\mathbb R^N}$, in which case $A^{(t)}=\pi_N^{-1}\bigl(\Gamma^{(t)}\bigr)$, where, as in (2.4.6), $\Gamma^{(t)}=\{y\in\mathbb R^N : |y-\Gamma|\le t\}$. Further, $\mathcal W(A)=\gamma_{0,1}^N(\Gamma)$ and $\mathcal W\bigl(A^{(t)}\bigr)=\gamma_{0,1}^N\bigl(\Gamma^{(t)}\bigr)$, and so (4.1.3) for $A$ follows from (2.4.6) for $\Gamma$.

Given the first part, the second part follows from the fact that, under the stated condition, $A=A^{(t)}$ for all $t>0$, and therefore
\[
\mathcal W(A)>0\implies\mathcal W(A)=\mathcal W\bigl(A^{(t)}\bigr)\ge\Phi\bigl(\Phi^{-1}\bigl(\mathcal W(A)\bigr)+t\bigr)\longrightarrow 1\quad\text{as } t\to\infty.
\]
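To see what (4.1.3) says in the simplest possible setting, here is a small sketch for the standard Gaussian on the real line, where the $t$-enlargement of an interval is again an interval. It only illustrates the one-dimensional form of the inequality under the assumption $H=B=\mathbb R$; the helper names are hypothetical. Half-lines give equality, while symmetric intervals give strict inequality.

```python
from statistics import NormalDist

STD = NormalDist()  # standard Gaussian on the real line

def isoperimetric_bound(p, t):
    """Phi(Phi^{-1}(p) + t), the right-hand side of (4.1.3)."""
    return STD.cdf(STD.inv_cdf(p) + t)

def check_interval(a, t):
    """Both sides of (4.1.3) for A = [-a, a], whose enlargement is [-a-t, a+t]."""
    measure_a = STD.cdf(a) - STD.cdf(-a)
    enlarged = STD.cdf(a + t) - STD.cdf(-a - t)
    return enlarged, isoperimetric_bound(measure_a, t)

def check_half_line(c, t):
    """Both sides of (4.1.3) for A = (-inf, c], whose enlargement is (-inf, c+t]."""
    return STD.cdf(c + t), isoperimetric_bound(STD.cdf(c), t)
```

For instance, `check_interval(1.0, 1.0)` compares the measure of $[-2,2]$ with the bound computed from the measure of $[-1,1]$.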
An interesting consequence of (4.1.3) is the fact that if $L\in\mathcal B_B$ is a subspace of $B$ containing $H$, then $\mathcal W(L)\in\{0,1\}$. Thus, if $E$ is a Banach space of paths containing $H_0^1(\mathbb R;\mathbb R)$, then the probability that a Brownian path will be in $E$ is either 0 or 1.

Corollary 4.1.4 Let $f : B\longrightarrow[-\infty,\infty]$ be a Borel measurable function satisfying $|f(x+h)-f(x)|\le\lambda\|h\|_H$ for some $\lambda\in[0,\infty)$ and all $h\in H$ if $|f(x)|<\infty$. If $\mathcal W(|f|<\infty)>0$, then $\mathcal W(|f|<\infty)=1$, and
\[
\mathcal W\bigl(\{x\in B : \pm f(x)\mp m\ge\lambda t\}\bigr)\le 1-\Phi(t)\le e^{-\frac{t^2}{2}}, \tag{4.1.4}
\]
where $m$ is a median of $f$ under $\mathcal W$. Hence, for any $g\in C\bigl(\mathbb R;[0,\infty)\bigr)$ satisfying $g(s)\le g(t)$ when $st\ge 0$ and $|s|\le|t|$,
\[
\int_B g\bigl(f(x)-m\bigr)\,\mathcal W(dx)\le\int_{\mathbb R}g(\lambda t)\,\gamma_{0,1}(dt). \tag{4.1.5}
\]
In particular, if $\Sigma=\sup\bigl\{\|h_\xi\|_H : \xi\in\overline{B_{B^*}(0,1)}\bigr\}$ and $m$ is the smallest median of $x\rightsquigarrow\|x\|_B$ under $\mathcal W$, then
\[
2\bigl(1-\Phi(\Sigma^{-1}t)\bigr)\le\mathcal W\bigl(\{x : \|x\|_B\ge t\}\bigr)\le 2\bigl(1-\Phi\bigl(\Sigma^{-1}(t-m)^+\bigr)\bigr)\le 2e^{-\frac{((t-m)^+)^2}{2\Sigma^2}},
\]
and so
\[
\int\exp\Bigl(\frac{\alpha(\|x\|_B-m)^2}{2\Sigma^2}\Bigr)\mathcal W(dx)\le\frac{1}{(1-\alpha)^{\frac12}}\le\int\exp\Bigl(\frac{\alpha\|x\|_B^2}{2\Sigma^2}\Bigr)\mathcal W(dx)\quad\text{for }\alpha\in[0,1). \tag{4.1.6}
\]

Proof Since $|f(x)|<\infty\implies|f(x+h)|<\infty$ for all $h\in H$, $\mathcal W(|f|<\infty)>0\implies\mathcal W(|f|<\infty)=1$. The estimate in (4.1.4) is proved from (4.1.3) in exactly the same way as the one in (2.4.13) was derived from (2.4.6).

To prove the upper bounds in (4.1.6), one needs to know that $\|h\|_B\le\Sigma\|h\|_H$ for all $h\in H$. To see this, given $h\in H\setminus\{0\}$, choose $\xi\in B^*$ so that $\|\xi\|_{B^*}=1$ and $\langle h,\xi\rangle=\|h\|_B$. Then $\|h\|_B=(h,h_\xi)_H\le\|h\|_H\|h_\xi\|_H\le\Sigma\|h\|_H$. Now simply observe that $\bigl|\|y\|_B-\|x\|_B\bigr|\le\|x-y\|_B\le\Sigma\|y-x\|_H$.

The first step in proving the lower bound in (4.1.6) is to notice that, since $\overline{B_{B^*}(0,1)}$ is compact with respect to the weak* topology and $\xi\in B^*\longmapsto h_\xi\in H$ is weak* continuous into the strong topology on $H$, there is a $\xi\in\overline{B_{B^*}(0,1)}$ such that $\|h_\xi\|_H=\Sigma$, which is possible only if $\|\xi\|_{B^*}=1$. Finally, knowing that this $\xi$ exists, note that $\langle\,\cdot\,,\xi\rangle$ is a centered Gaussian under $\mathcal W$ with variance $\Sigma^2$. Hence, since $\|x\|_B\ge|\langle x,\xi\rangle|$,
\[
\mathcal W\bigl(\|x\|_B\ge t\bigr)\ge\mathcal W\bigl(|\langle x,\xi\rangle|\ge t\bigr)=\gamma_{0,1}\bigl(\mathbb R\setminus(-\Sigma^{-1}t,\Sigma^{-1}t)\bigr)=2\bigl(1-\Phi(\Sigma^{-1}t)\bigr).
\]

The assumption in Corollary 4.1.4 that $\mathcal W(|f|<\infty)>0$ is vital. Indeed, otherwise we would get a contradiction by taking $f(x)=\|x\|_H$.

The estimates in (4.1.6) and its application represent a significant sharpening of Fernique’s result. An asymptotic version of (4.1.6) was proved by Donsker and Varadhan using the result in Theorem 4.1.7 below. They showed that
\[
\lim_{R\to\infty}R^{-2}\log\mathcal W\bigl(\|x\|_B\ge R\bigr)=-\frac{1}{2\Sigma^2},
\]
and from this they concluded that $e^{\frac{\alpha\|x\|_B^2}{2\Sigma^2}}$ is $\mathcal W$-integrable if and only if $\alpha<1$. The proof given here of the lower bound in (4.1.6) is an adaptation of their argument.
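As a sanity check on the shape of these estimates, the following sketch evaluates them in the one-dimensional toy case $H=B=\mathbb R$ with the standard Gaussian measure, where the norm is $|x|$, its median is $\Phi^{-1}(3/4)$, and the constant in (4.1.6) equals 1. This is an illustration under those assumptions only; the quadrature routine and its parameters are hypothetical choices, not anything taken from the text.

```python
import math
from statistics import NormalDist

STD = NormalDist()            # standard Gaussian gamma_{0,1}
MEDIAN = STD.inv_cdf(0.75)    # median of |x| when x ~ N(0, 1)

def tail_chain(t):
    """Return (W(|x| >= t), Gaussian upper bound, exponential upper bound)."""
    excess = max(t - MEDIAN, 0.0)
    exact = 2.0 * (1.0 - STD.cdf(t))
    gauss_bound = 2.0 * (1.0 - STD.cdf(excess))
    exp_bound = 2.0 * math.exp(-excess ** 2 / 2.0)
    return exact, gauss_bound, exp_bound

def exp_moment(alpha, shift, half_width=40.0, steps=100_000):
    """Trapezoidal value of the N(0,1)-integral of exp(alpha * (|x| - shift)^2 / 2)."""
    width = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        x = -half_width + j * width
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.exp(alpha * (abs(x) - shift) ** 2 / 2.0 - x * x / 2.0)
    return total * width / math.sqrt(2.0 * math.pi)
```

With `shift=MEDIAN` the integral stays below $(1-\alpha)^{-1/2}$, and with `shift=0.0` it equals that value up to quadrature error, which matches the two sides of (4.1.6) in this toy case.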
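4.1.2 Rademacher’s Theorem for Abstract Wiener Space

An important theorem proved by H. Rademacher states that if $f : \mathbb R^N\longrightarrow\mathbb R$ is a uniformly Lipschitz continuous function, then it has a gradient at $\lambda_{\mathbb R^N}$-almost every point. More precisely, there is a Borel measurable function $\nabla f : \mathbb R^N\longrightarrow\mathbb R^N$ such that
\[
\lim_{t\searrow 0}\sup_{e\in\mathbb S^{N-1}}\frac{\bigl|f(x+te)-f(x)-t\bigl(\nabla f(x),e\bigr)_{\mathbb R^N}\bigr|}{t}=0\quad\text{for }\lambda_{\mathbb R^N}\text{-a.e. } x\in\mathbb R^N.
\]
In particular, $\nabla f$ can be chosen so that $\bigl\||\nabla f|\bigr\|_{\mathrm u}$ is the Lipschitz constant for $f$. The analog for abstract Wiener spaces given below appeared originally in [6].

Theorem 4.1.5 Let $(H,B,\mathcal W)$ be an abstract Wiener space and $f : B\longrightarrow\mathbb R$ a Borel measurable function for which there exists a $\lambda<\infty$ such that $|f(y)-f(x)|\le\lambda\|y-x\|_H$ for all $x,y\in B$. Then there exists an $F\in L^\infty(\mathcal W;H)$ and an $A\in\mathcal B_B$ such that $\mathcal W(A)=1$ and, for $x\in A$,
\[
\frac{f(x+th)-f(x)-t\bigl(F(x),h\bigr)_H}{t}\longrightarrow 0\quad\text{as } t\searrow 0
\]
uniformly for $h$ in compact subsets of $H$.

Proof First observe that, by Corollary 4.1.4, $f\in L^p(\mathcal W;\mathbb R)$ for all $p\in[1,\infty)$. Given $\xi\in B^*$ with $\|h_\xi\|_H=1$, define
\[
\Delta_\xi=\Bigl\{x\in B : \lim_{t\searrow 0}\frac{f(x+th_\xi)-f(x)}{t}\text{ exists in }\mathbb R\Bigr\}
\quad\text{and}\quad
\tilde\Delta_\xi=\bigl\{(x,s)\in B\times\mathbb R : x+sh_\xi\in\Delta_\xi\bigr\}.
\]
Since the function $t\rightsquigarrow f(x+th_\xi)$ is Lipschitz continuous and therefore absolutely continuous, Lebesgue’s Differentiation Theorem guarantees that, for each $x\in B$, $t\rightsquigarrow f(x+th_\xi)$ is $\lambda_{\mathbb R}$-almost everywhere differentiable, and therefore, for each $x\in B$, $\gamma_{0,1}\bigl(\{s : (x,s)\in\tilde\Delta_\xi\}\bigr)=1$. Next observe that
\[
\Delta_\xi=\bigl\{x : \bigl(x-\langle x,\xi\rangle h_\xi,\langle x,\xi\rangle\bigr)\in\tilde\Delta_\xi\bigr\},
\]
and therefore, since $x-\langle x,\xi\rangle h_\xi$ is $\mathcal W$-independent of $\langle x,\xi\rangle\in N(0,1)$,
\[
\mathcal W(\Delta_\xi)=\int\gamma_{0,1}\bigl(\bigl\{s : \bigl(x-\langle x,\xi\rangle h_\xi,s\bigr)\in\tilde\Delta_\xi\bigr\}\bigr)\,\mathcal W(dx)=1.
\]
Define
\[
F_\xi(x)=\begin{cases}\displaystyle\lim_{t\searrow 0}\frac{f(x+th_\xi)-f(x)}{t} & \text{if } x\in\Delta_\xi\\[4pt] 0 & \text{if } x\notin\Delta_\xi.\end{cases}
\]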
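Now choose $\{\xi_m : m\ge 0\}\subseteq B^*$ so that $\{h_m : m\ge 0\}$ is an orthonormal basis for $H$ when $h_m=h_{\xi_m}$, and take $D$ to be the set of $\xi\in\operatorname{span}\bigl(\{\xi_m : m\ge 0\}\bigr)$ for which $\{(h_\xi,h_m)_H : m\ge 0\}\subseteq\mathbb Q$. Clearly $\{h_\xi : \xi\in D\}$ is a dense subset of $H$, and since $D$ is countable, $\Delta\equiv\bigcap_{\xi\in D}\Delta_\xi$ has $\mathcal W$-measure 1. Given $\xi\in D$ and $\varphi\in\mathcal P$ (cf. the notation in the last part of Sect. 3.3.1),
\[
\begin{aligned}
\bigl(F_\xi,\varphi\bigr)_{L^2(\mathcal W;\mathbb R)}
&=\lim_{t\searrow 0}\int\frac{f(x+th_\xi)-f(x)}{t}\,\varphi(x)\,\mathcal W(dx)\\
&=\lim_{t\searrow 0}\int f(x)\,\frac{R_{th_\xi}(x)\,\varphi\circ T_{-th_\xi}(x)-\varphi(x)}{t}\,\mathcal W(dx)\\
&=\int f(x)\,\partial_{h_\xi}^{\top}\varphi(x)\,\mathcal W(dx)=\sum_{m=0}^\infty(h_\xi,h_m)_H\int f(x)\,\partial_{h_m}^{\top}\varphi(x)\,\mathcal W(dx).
\end{aligned}
\]
Since, for each $m$, by reversing the preceding one has
\[
\int f(x)\,\partial_{h_m}^{\top}\varphi(x)\,\mathcal W(dx)=\lim_{t\searrow 0}\int\frac{f(x+th_m)-f(x)}{t}\,\varphi(x)\,\mathcal W(dx)=\int F_{\xi_m}(x)\varphi(x)\,\mathcal W(dx),
\]
it follows that
\[
(F_\xi,\varphi)_{L^2(\mathcal W;\mathbb R)}=\sum_{m=0}^\infty(h_\xi,h_m)_H\,(F_{\xi_m},\varphi)_{L^2(\mathcal W;\mathbb R)}
\]
for all $\varphi\in\mathcal P$ and therefore, because $\mathcal P$ is dense in $L^2(\mathcal W;\mathbb R)$, that $F_\xi=\sum_{m=0}^\infty(h_\xi,h_m)_HF_{\xi_m}$ (a.s., $\mathcal W$) for each $\xi\in D$.

Let $A$ be the set of $x\in\Delta$ for which $F_\xi(x)=\sum_{m=0}^\infty(h_\xi,h_m)_HF_{\xi_m}(x)$ for all $\xi\in D$. Then $\mathcal W(A)=1$, and, for $x\in A$,
\[
\Bigl|\sum_{m=0}^\infty(h_\xi,h_m)_HF_{\xi_m}(x)\Bigr|\le\lambda\|h_\xi\|_H\quad\text{for all }\xi\in D.
\]
Thus $\sum_{m=0}^\infty F_{\xi_m}(x)^2\le\lambda^2$ for $x\in A$, and so, if we define the Borel measurable function $F : B\longrightarrow H$ by
\[
F(x)=\begin{cases}\sum_{m=0}^\infty F_{\xi_m}(x)h_m & \text{if } x\in A\\ 0 & \text{if } x\notin A,\end{cases}
\]
then $F_\xi(x)=\bigl(F(x),h_\xi\bigr)_H$ for $\xi\in D$ and $x\in A$.

Finally, for $x\in A$ and $h\in H$, define $t\in[0,1]\longmapsto[\rho(x,h)](t)\in\mathbb R$ so that $[\rho(x,h)](0)=0$ and
\[
[\rho(x,h)](t)=\frac{f(x+th)-f(x)}{t}-\bigl(F(x),h\bigr)_H\quad\text{for } t\in(0,1].
\]
Then,
\[
\sup_{t\in(0,1]}\bigl|[\rho(x,g)](t)-[\rho(x,h)](t)\bigr|\le(\lambda+1)\|g-h\|_H\quad\text{for } g,h\in H,
\]
and, for $x\in A$ and $\xi\in D$, $\rho(x,h_\xi)\in C\bigl([0,1];\mathbb R\bigr)$. Therefore $h\rightsquigarrow\rho(x,h)\restriction(0,1]$ is the continuous extension of its restriction to $\{h_\xi : \xi\in D\}$, and so $h\in H\longmapsto\rho(x,h)\in C\bigl([0,1];\mathbb R\bigr)$ is a continuous map for $x\in A$. Hence, if $x\in A$, then, as $t\searrow 0$,
\[
\frac{f(x+th)-f(x)}{t}-\bigl(F(x),h\bigr)_H=[\rho(x,h)](t)\longrightarrow 0
\]
uniformly for $h$’s in compact subsets of $H$.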
As a corollary of Theorem 4.1.5, one knows that a function $f$ satisfying its hypothesis is an element of the space $W_1^2(\mathcal W;\mathbb R)$ described in the last part of Sect. 3.3.1 and that $F=\nabla f$. Indeed, for any $\varphi\in\mathcal P$ and $m\ge 0$,
\[
\bigl(F_{\xi_m},\varphi\bigr)_{L^2(\mathcal W;\mathbb R)}=\bigl(f,\partial_{h_m}^{\top}\varphi\bigr)_{L^2(\mathcal W;\mathbb R)},
\]
and so
\[
\bigl(F,\Psi\bigr)_{L^2(\mathcal W;H)}=\bigl(f,\nabla^{\top}\Psi\bigr)_{L^2(\mathcal W;\mathbb R)}
\]
for all $\Psi\in\mathcal PH$. Hence, $f$ is in the domain of $(\nabla^{\top})^{\top}$ and therefore, by Theorem 3.3.8, it is an element of $W_1^2(\mathcal W;\mathbb R)$.

Here is an amusing application of Theorem 4.1.5. Given a weak* compact subset $K$ of $B^*$, set $f(x)=\max_{\xi\in K}\langle x,\xi\rangle$. Obviously, $f$ satisfies the hypothesis in the theorem. Let $A$ be the set of $x$ with the property that
\[
\bigl(F(x),h\bigr)_H=\lim_{t\searrow 0}\frac{f(x+th)-f(x)}{t}\quad\text{for all } h\in H,
\]
and take $\Xi_x=\{\xi\in K : \langle x,\xi\rangle=f(x)\}$. By Theorem 4.1.5, $\mathcal W(A)=1$. Furthermore, if $x\in A$ and $\xi\in\Xi_x$, then $\frac{f(x+th)-f(x)}{t}\ge\langle h,\xi\rangle$ for $t>0$, and so $\bigl(F(x),h\bigr)_H\ge\langle h,\xi\rangle$ for all $h\in H$. Since this means that $\bigl(F(x),h\bigr)_H=(h,h_\xi)_H$ for all $h\in H$, it follows that $h_\xi=F(x)$ and therefore that there is precisely one element of $\Xi_x$ for each $x\in A$. In other words, for each $x\in A$, there is a unique $\xi_x\in K$ such that $\langle x,\xi_x\rangle=\max\{\langle x,\xi\rangle : \xi\in K\}$. When applied to Brownian motion, this result says that, on each compact set of times, almost every Brownian path achieves its maximum value precisely once.
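The mechanism in this application can be seen in a finite-dimensional toy version, sketched below under the assumption that $H=B=\mathbb R^2$ and that $K$ is a finite set of vectors (both hypothetical simplifications). When the maximizer of $\langle x,\xi\rangle$ over $K$ is unique, the one-sided directional derivative of $f$ at $x$ in direction $h$ equals $\langle h,\xi_x\rangle$.

```python
def dot(u, v):
    """Euclidean inner product of two vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))

def support_function(x, K):
    """f(x) = max over xi in K of <x, xi>, for a finite set K."""
    return max(dot(x, xi) for xi in K)

def maximizers(x, K, tol=1e-12):
    """All xi in K at which <x, xi> attains f(x), up to a tolerance."""
    best = support_function(x, K)
    return [xi for xi in K if abs(dot(x, xi) - best) <= tol]

def directional_derivative(x, h, K, t=1e-6):
    """One-sided difference quotient (f(x + t h) - f(x)) / t."""
    shifted = tuple(a + t * b for a, b in zip(x, h))
    return (support_function(shifted, K) - support_function(x, K)) / t

K = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
x = (0.3, 0.8)
h = (0.5, -2.0)
unique_xi = maximizers(x, K)          # a single maximizer is expected here
slope = directional_derivative(x, h, K)
```

Points $x$ with more than one maximizer lie on finitely many hyperplanes, so they form a Gaussian null set, which is the finite-dimensional shadow of the uniqueness statement.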
4.1.3 Gross’s Operator Extension Procedure

A major goal of both Segal and Gross was to develop a procedure for extending operators defined on the Cameron–Martin subspace to the Banach space in which it is embedded, and this subsection contains a couple of Gross’s fundamental results in that direction. Throughout, $H$ will be an infinite dimensional, separable Hilbert space over $\mathbb R$.

Lemma 4.1.6 Let $(H,B,\mathcal W)$ be an abstract Wiener space and $\{h_m : m\ge 0\}$ an orthonormal basis in $H$. Then, for each $h\in H$, $\sum_{m=0}^\infty(h,h_m)_H\,\mathcal I(h_m)$ converges to $\mathcal I(h)$ $\mathcal W$-almost surely and in $L^p(\mathcal W;\mathbb R)$ for every $p\in[1,\infty)$.

Proof The $\mathcal W$-almost sure convergence is an application of Kolmogorov’s convergence theorem for sums of mutually independent, square integrable random variables. That the convergence is in $L^p(\mathcal W;\mathbb R)$ for $p\in[1,\infty)$ follows from Lemma 2.1.1, and that the limit is $\mathcal I(h)$ comes from the convergence of $\sum_{m=0}^n(h,h_m)_Hh_m$ to $h$ in $H$.
Theorem 4.1.7 Let $(H,B,\mathcal W)$ be an abstract Wiener space. For each finite dimensional subspace $L$ of $H$ there is a $\mathcal W$-almost surely unique map $P_L : B\longrightarrow H$ such that, for every $h\in H$ and $\mathcal W$-almost every $x\in B$, $\bigl(h,P_Lx\bigr)_H=\mathcal I(\Pi_Lh)(x)$, where $\Pi_L$ denotes the orthogonal projection map from $H$ onto $L$. In fact, if $\{g_1,\dots,g_{\dim(L)}\}$ is an orthonormal basis for $L$, then $P_Lx=\sum_{i=1}^{\dim(L)}[\mathcal I(g_i)](x)g_i\in L$ for $\mathcal W$-almost every $x\in B$. In particular, the distribution of $x\in B\longmapsto P_Lx\in L$ under $\mathcal W$ is the same as that of $(y_1,\dots,y_{\dim(L)})\in\mathbb R^{\dim(L)}\longmapsto\sum_{j=1}^{\dim(L)}y_jg_j\in L$ under $\gamma_{0,1}^{\dim(L)}$. Finally, $x\rightsquigarrow P_Lx$ is $\mathcal W$-independent of $x\rightsquigarrow x-P_Lx$.

Proof Set $\ell=\dim(L)$. In view of Theorem 4.1.1, it suffices to note that
\[
\mathcal I(\Pi_Lh)=\mathcal I\Bigl(\sum_{k=1}^{\ell}(h,g_k)_Hg_k\Bigr)=\sum_{k=1}^{\ell}(h,g_k)_H\,\mathcal I(g_k)=\Bigl(\sum_{k=1}^{\ell}\mathcal I(g_k)g_k,h\Bigr)_H\quad\text{for all } h\in H.
\]
The definition of an abstract Wiener space that I have been using is not the same as Gross’s. The difference is that his definition included the property that is derived in the following theorem.

Theorem 4.1.8 Let $(H,B,\mathcal W)$ be an abstract Wiener space and $\{h_n : n\ge 0\}$ an orthonormal basis for $H$. If $L_n=\operatorname{span}\bigl(\{h_0,\dots,h_n\}\bigr)$, then for all $\epsilon>0$ there exists an $n\in\mathbb N$ such that $\mathbb E^{\mathcal W}\bigl[\|P_Lx\|_B^2\bigr]\le\epsilon^2$ whenever $L$ is a finite dimensional subspace of $H$ that is perpendicular to $L_n$.

Proof Without loss in generality, we will assume that $\|\cdot\|_B\le\|\cdot\|_H$. Arguing by contradiction, we will show that if the asserted property did not hold, then there would exist an orthonormal basis $\{f_n : n\ge 0\}$ for $H$ such that $\sum_0^\infty\mathcal I(f_n)f_n$ fails to converge in $L^2(\mathcal W;B)$.

Suppose that there is an $\epsilon>0$ such that for all $n\in\mathbb N$ there exists a finite dimensional $L\perp L_n$ with $\mathbb E^{\mathcal W}\bigl[\|P_Lx\|_B^2\bigr]\ge\epsilon^2$. Under this assumption, define $\{n_m : m\ge 0\}\subseteq\mathbb N$, $\{\ell_m : m\ge 0\}\subseteq\mathbb N$, and $\bigl\{\{f_0,\dots,f_{n_m}\} : m\ge 0\bigr\}$, with $\{f_0,\dots,f_{n_m}\}\subseteq L_{n_m}$, inductively by the following prescription. First, take $n_0=0=\ell_0$ and $f_0=h_0$. Next, knowing $n_m$ and $\{f_0,\dots,f_{n_m}\}$, choose a finite dimensional subspace $L\perp L_{n_m}$ so that $\mathbb E^{\mathcal W}\bigl[\|P_Lx\|_B^2\bigr]\ge\epsilon^2$, set $\ell_m=\dim(L)$, and let $\{g_{m,1},\dots,g_{m,\ell_m}\}$ be an orthonormal basis for $L$. For any $\delta>0$ there exists an $n\ge n_m+\ell_m$ such that
\[
\sum_{j,k=1}^{\ell_m}\Bigl|\bigl(\Pi_{L_n}g_{m,j},\Pi_{L_n}g_{m,k}\bigr)_H-\delta_{j,k}\Bigr|\le\delta.
\]
In particular, if $\delta\in\bigl(0,\frac12\bigr)$, then the elements of $\{\Pi_{L_n}g_{m,i} : 1\le i\le\ell_m\}$ are linearly independent and the orthonormal set $\{\tilde g_{m,j} : 1\le j\le\ell_m\}$ obtained from them via the Gram–Schmidt orthogonalization procedure satisfies
\[
\sum_{j=1}^{\ell_m}\bigl\|\tilde g_{m,j}-\Pi_{L_n}g_{m,j}\bigr\|_H\le K_m\sum_{j,k=1}^{\ell_m}\Bigl|\bigl(\Pi_{L_n}g_{m,j},\Pi_{L_n}g_{m,k}\bigr)_H-\delta_{j,k}\Bigr|
\]
for some $K_m<\infty$ which depends only on $\ell_m$. Moreover, because $L\perp L_{n_m}$,
\[
\bigl(h,\Pi_{L_n}g_{m,j}\bigr)_H=\bigl(\Pi_{L_n}h,g_{m,j}\bigr)_H=\bigl(h,g_{m,j}\bigr)_H=0\quad\text{for } h\in L_{n_m},
\]
and so $\tilde g_{m,j}\perp L_{n_m}$ for all $1\le j\le\ell_m$. We therefore can find an $n_{m+1}\ge n_m+\ell_m$ for which $\operatorname{span}\bigl(\{h_n : n_m<n\le n_{m+1}\}\bigr)$ admits an orthonormal basis $\{f_{n_m+1},\dots,f_{n_{m+1}}\}\perp L_{n_m}$ with the property that $\sum_1^{\ell_m}\|g_{m,j}-f_{n_m+j}\|_H\le\frac{\epsilon}{4}$.

Clearly $\{f_n : n\ge 0\}$ is an orthonormal basis for $H$. On the other hand,
\[
\begin{aligned}
\mathbb E^{\mathcal W}\Biggl[\Bigl\|\sum_{n=n_m+1}^{n_m+\ell_m}\mathcal I(f_n)f_n\Bigr\|_B^2\Biggr]^{\frac12}
&\ge\epsilon-\mathbb E^{\mathcal W}\Biggl[\Bigl\|\sum_{j=1}^{\ell_m}\bigl(\mathcal I(g_{m,j})g_{m,j}-\mathcal I(f_{n_m+j})f_{n_m+j}\bigr)\Bigr\|_B^2\Biggr]^{\frac12}\\
&\ge\epsilon-\sum_{j=1}^{\ell_m}\mathbb E^{\mathcal W}\Bigl[\bigl\|\mathcal I(g_{m,j})g_{m,j}-\mathcal I(f_{n_m+j})f_{n_m+j}\bigr\|_H^2\Bigr]^{\frac12},
\end{aligned}
\]
and so, since $\mathbb E^{\mathcal W}\bigl[\|\mathcal I(g_{m,j})g_{m,j}-\mathcal I(f_{n_m+j})f_{n_m+j}\|_H^2\bigr]^{\frac12}$ is dominated by
\[
\mathbb E^{\mathcal W}\Bigl[\bigl(\mathcal I(g_{m,j})-\mathcal I(f_{n_m+j})\bigr)^2\|g_{m,j}\|_H^2\Bigr]^{\frac12}+\mathbb E^{\mathcal W}\bigl[\mathcal I(f_{n_m+j})^2\bigr]^{\frac12}\|g_{m,j}-f_{n_m+j}\|_H\le 2\|g_{m,j}-f_{n_m+j}\|_H,
\]
we have that
\[
\mathbb E^{\mathcal W}\Biggl[\Bigl\|\sum_{n=n_m+1}^{n_m+\ell_m}\mathcal I(f_n)f_n\Bigr\|_B^2\Biggr]^{\frac12}\ge\frac{\epsilon}{2}\quad\text{for all } m\ge 0,
\]
and this means that $\sum_{n=0}^\infty\mathcal I(f_n)f_n$ cannot be converging in $L^2(\mathcal W;B)$.
Besides showing that my definition of an abstract Wiener space is the same as Gross’s, Theorem 4.1.8 allows us to prove a very convincing statement, again due to Gross, of just how non-unique is the Banach space for which a given Hilbert space is the Cameron–Martin subspace.

Corollary 4.1.9 If $(H,B,\mathcal W)$ is an abstract Wiener space, then there exists a separable Banach space $B_0$ that is continuously embedded in $B$ as a measurable subset and has the properties that $\mathcal W(B_0)=1$, bounded subsets of $B_0$ are relatively compact in $B$, and $(H,B_0,\mathcal W\restriction B_0)$ is again an abstract Wiener space.

Proof Again we will assume that $\dim(H)=\infty$ and $\|\cdot\|_B\le\|\cdot\|_H$. Choose $\{\xi_n : n\ge 0\}\subseteq B^*$ so that $\{h_n : n\ge 0\}$ is an orthonormal basis in $H$ when $h_n=h_{\xi_n}$, and set $L_n=\operatorname{span}\bigl(\{h_0,\dots,h_n\}\bigr)$. Next, using Theorem 4.1.8, choose an increasing sequence $\{n_m : m\ge 0\}$ so that $n_0=0$ and $\mathbb E^{\mathcal W}\bigl[\|P_Lx\|_B^2\bigr]^{\frac12}\le 2^{-m}$ for $m\ge 1$ and finite dimensional $L\perp L_{n_m}$, and define $Q_\ell$ for $\ell\ge 0$ on $B$ into $H$ so that
\[
Q_0x=\langle x,\xi_0\rangle h_0\quad\text{and}\quad Q_\ell x=\sum_{n=n_{\ell-1}+1}^{n_\ell}\langle x,\xi_n\rangle h_n\quad\text{when }\ell\ge 1.
\]
Finally, set $S_m=P_{L_{n_m}}=\sum_{\ell=0}^mQ_\ell$, and define $B_0$ to be the set of $x\in B$ such that
\[
\|x\|_{B_0}\equiv\|Q_0x\|_B+\sum_{\ell=1}^\infty\ell^2\|Q_\ell x\|_B<\infty\quad\text{and}\quad\|S_mx-x\|_B\longrightarrow 0.
\]
To show that $\|\cdot\|_{B_0}$ is a norm on $B_0$ and that $B_0$ with norm $\|\cdot\|_{B_0}$ is a Banach space, first note that if $x\in B_0$, then
\[
\|x\|_B=\lim_{m\to\infty}\|S_mx\|_B\le\|Q_0x\|_B+\lim_{m\to\infty}\sum_{\ell=1}^m\|Q_\ell x\|_B\le\|x\|_{B_0},
\]
and therefore $\|\cdot\|_{B_0}$ is certainly a norm on $B_0$. Next, suppose that the sequence $\{x_k : k\ge 1\}\subseteq B_0$ is Cauchy convergent with respect to $\|\cdot\|_{B_0}$. By the preceding, we know that $\{x_k : k\ge 1\}$ is also Cauchy convergent with respect to $\|\cdot\|_B$, and so there exists an $x\in B$ such that $x_k\longrightarrow x$ in $B$. We need to show that $x\in B_0$ and that $\|x_k-x\|_{B_0}\longrightarrow 0$. Because $\{x_k : k\ge 1\}$ is bounded in $B_0$, it is clear that $\|x\|_{B_0}<\infty$. In addition, for any $m\ge 0$ and $k\ge 1$,
\[
\begin{aligned}
\|x-S_mx\|_B&=\lim_{\ell\to\infty}\|x_\ell-S_mx_\ell\|_B\le\lim_{\ell\to\infty}\|x_\ell-S_mx_\ell\|_{B_0}\\
&=\lim_{\ell\to\infty}\sum_{n>m}n^2\|Q_nx_\ell\|_B\le\sum_{n>m}n^2\|Q_nx_k\|_B+\sup_{\ell>k}\|x_\ell-x_k\|_{B_0}.
\end{aligned}
\]
Thus, by choosing $k$ for a given $\epsilon>0$ so that $\sup_{\ell>k}\|x_\ell-x_k\|_{B_0}<\epsilon$, we conclude that $\limsup_{m\to\infty}\|x-S_mx\|_B<\epsilon$ and therefore that $S_mx\longrightarrow x$ in $B$. Hence, $x\in B_0$. Finally, to see that $x_k\longrightarrow x$ in $B_0$, simply note that
\[
\begin{aligned}
\|x-x_k\|_{B_0}&=\|Q_0(x-x_k)\|_B+\sum_{m=1}^\infty m^2\|Q_m(x-x_k)\|_B\\
&\le\lim_{\ell\to\infty}\Bigl(\|Q_0(x_\ell-x_k)\|_B+\sum_{m=1}^\infty m^2\|Q_m(x_\ell-x_k)\|_B\Bigr)\le\sup_{\ell>k}\|x_\ell-x_k\|_{B_0},
\end{aligned}
\]
which tends to 0 as $k\to\infty$.
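To show that bounded subsets of $B_0$ are relatively compact in $B$, it suffices to show that if $\{x_\ell : \ell\ge 1\}\subseteq B_{B_0}(0,R)$, then there is an $x\in B$ to which a subsequence converges in $B$. For this purpose, observe that, for each $m\ge 0$, there is a subsequence $\{x_{\ell_k} : k\ge 1\}$ along which $\{S_mx_{\ell_k} : k\ge 1\}$ converges in $L_{n_m}$. Hence, by a diagonalization argument, $\{x_{\ell_k} : k\ge 1\}$ can be chosen so that $\{S_mx_{\ell_k} : k\ge 1\}$ converges in $L_{n_m}$ for all $m\ge 0$. Since, for $1\le j<k$,
\[
\begin{aligned}
\|x_{\ell_k}-x_{\ell_j}\|_B&\le\|S_mx_{\ell_k}-S_mx_{\ell_j}\|_B+\sum_{n>m}\|Q_n(x_{\ell_k}-x_{\ell_j})\|_B\\
&\le\|S_mx_{\ell_k}-S_mx_{\ell_j}\|_B+2R\sum_{n>m}\frac{1}{n^2},
\end{aligned}
\]
it follows that $\{x_{\ell_k} : k\ge 1\}$ is Cauchy convergent in $B$ and therefore that it converges in $B$.

We must still show that $B_0\in\mathcal B_B$ and that $(H,B_0,\mathcal W\restriction B_0)$ is an abstract Wiener space. To see the first of these, observe that $x\in B\longmapsto\|x\|_{B_0}\in[0,\infty]$ is lower semicontinuous and that $\{x : \|S_mx-x\|_B\longrightarrow 0\}\in\mathcal B_B$. In addition, because, by Theorem 4.1.1, $\|S_mx-x\|_B\longrightarrow 0$ for $\mathcal W$-almost every $x\in B$, we will know that $\mathcal W(B_0)=1$ once we show that $\mathcal W\bigl(\|x\|_{B_0}<\infty\bigr)=1$, which follows immediately from
\[
\begin{aligned}
\mathbb E^{\mathcal W}\bigl[\|x\|_{B_0}\bigr]&=\mathbb E^{\mathcal W}\bigl[\|Q_0x\|_B\bigr]+\sum_{m=1}^\infty m^2\,\mathbb E^{\mathcal W}\bigl[\|Q_mx\|_B\bigr]\\
&\le\mathbb E^{\mathcal W}\bigl[\|Q_0x\|_B\bigr]+\sum_{m=1}^\infty m^2\,\mathbb E^{\mathcal W}\bigl[\|Q_mx\|_B^2\bigr]^{\frac12}\\
&\le\mathbb E^{\mathcal W}\bigl[\|Q_0x\|_B\bigr]+\sum_{m=1}^\infty m^2\,2^{-m}<\infty.
\end{aligned}
\]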
The next step is to check that $H$ is continuously embedded in $B_0$. Certainly $h\in H\implies\|S_mh-h\|_B\le\|S_mh-h\|_H\longrightarrow 0$. Next suppose that $h\in H\setminus\{0\}$ and that $h\perp L_{n_m}$, and let $L$ be the line in $H$ spanned by $h$. Then $P_Lx=\|h\|_H^{-2}[\mathcal I(h)](x)h$, and so, because $L\perp L_{n_m}$,
\[
\frac{1}{2^m}\ge\frac{\|h\|_B}{\|h\|_H^2}\,\mathbb E^{\mathcal W}\bigl[\mathcal I(h)^2\bigr]^{\frac12}=\frac{\|h\|_B}{\|h\|_H}.
\]
Hence, we now know that $h\perp L_{n_m}\implies\|h\|_B\le 2^{-m}\|h\|_H$. In particular, $\|Q_{m+1}h\|_B\le 2^{-m}\|Q_{m+1}h\|_H\le 2^{-m}\|h\|_H$ for all $m\ge 0$ and $h\in H$, and so
\[
\|h\|_{B_0}=\|Q_0h\|_B+\sum_{m=1}^\infty m^2\|Q_mh\|_B\le\Bigl(1+2\sum_{m=1}^\infty\frac{m^2}{2^m}\Bigr)\|h\|_H=13\|h\|_H.
\]

To complete the proof, we must show that $H$ is dense in $B_0$ and that, for each $\xi\in B_0^*$, $\widehat{\mathcal W}(\xi)=e^{-\frac12\|h_\xi\|_H^2}$, where $h_\xi\in H$ is determined by $(h,h_\xi)_H=\langle h,\xi\rangle$ for $h\in H$. Both these facts rely on the observation that
\[
\|x-S_mx\|_{B_0}=\sum_{n>m}n^2\|Q_nx\|_B\longrightarrow 0\quad\text{for all } x\in B_0.
\]
Knowing this, the density of $H$ in $B_0$ is obvious. Finally, if $\xi\in B_0^*$, then, by the preceding and Lemma 4.1.6,
\[
\begin{aligned}
\langle x,\xi\rangle&=\lim_{m\to\infty}\langle S_mx,\xi\rangle=\lim_{m\to\infty}\Bigl\langle\sum_{n=0}^{n_m}\langle x,\xi_n\rangle h_n,\xi\Bigr\rangle\\
&=\lim_{m\to\infty}\sum_{n=0}^{n_m}(h_\xi,h_n)_H\,[\mathcal I(h_n)](x)=[\mathcal I(h_\xi)](x)
\end{aligned}
\]
for $\mathcal W$-almost every $x\in B_0$. Hence $\langle\,\cdot\,,\xi\rangle$ under $\mathcal W$ is a centered Gaussian with variance $\|h_\xi\|_H^2$.

Using the ideas in the preceding proof, one can show that, given a separable Hilbert space $H$, $H$ itself equals the intersection of all the separable Banach spaces in which $H$ is the Cameron–Martin subspace. Thus, as is usually true in life, when $\dim(H)=\infty$ there is always a better choice but never a best one. See [4] for details.
4.1.4 Orthogonal Invariance

Consider the standard Gauss distribution $\gamma_{0,I}$ on $\mathbb R^N$. Obviously, $\gamma_{0,I}$ is orthogonally invariant. That is, if $O$ is an orthogonal transformation of $\mathbb R^N$, then $\gamma_{0,I}$ is invariant under the transformation $T_O : \mathbb R^N\longrightarrow\mathbb R^N$ given by $T_Ox=Ox$. On the other hand, none of these transformations can be ergodic since any radial function on $\mathbb R^N$ is invariant under $T_O$ for every $O$.

Now think about the analogous situation when $\mathbb R^N$ is replaced by an infinite dimensional Hilbert space $H$ and $(H,B,\mathcal W)$ is an associated abstract Wiener space. As we are about to show, $\mathcal W$ is invariant under orthogonal transformations on $H$. On the other hand, because $\|x\|_H=\infty$ for $\mathcal W$-almost every $x\in B$, there are no non-trivial radial functions now, a fact that leaves open the possibility that some orthogonal transformations of $H$ give rise to ergodic transformations for $\mathcal W$. The purpose of this subsection is to investigate these matters, and I begin with the following formulation of the orthogonal invariance of $\mathcal W$.

Theorem 4.1.10 Let $(H,B,\mathcal W)$ be an abstract Wiener space and $O$ an orthogonal transformation on $H$. Then there is a $\mathcal W$-almost surely unique, Borel measurable map $T_O : B\longrightarrow B$ such that $\mathcal I(h)\circ T_O=\mathcal I(O^{\top}h)$ $\mathcal W$-almost surely for each $h\in H$. Moreover, $\mathcal W=(T_O)_*\mathcal W$.

Proof To prove uniqueness, note that if $T$ and $T'$ both satisfy the defining property for $T_O$, then, for each $\xi\in B^*$,
\[
\langle Tx,\xi\rangle=\mathcal I(h_\xi)(Tx)=\mathcal I(O^{\top}h_\xi)=\mathcal I(h_\xi)(T'x)=\langle T'x,\xi\rangle
\]
for $\mathcal W$-almost every $x\in B$. Hence, since $B_{B^*}(0,1)$ is separable in the weak* topology, $Tx=T'x$ for $\mathcal W$-almost every $x\in B$.

To prove existence, choose an orthonormal basis $\{h_m : m\ge 0\}$ for $H$, and let $C$ be the set of $x\in B$ for which both $\sum_{m=0}^\infty[\mathcal I(h_m)](x)h_m$ and $\sum_{m=0}^\infty[\mathcal I(h_m)](x)Oh_m$ converge in $B$. By Theorem 4.1.1, we know that $\mathcal W(C)=1$ and that
\[
x\rightsquigarrow T_Ox\equiv\begin{cases}\sum_{m=0}^\infty[\mathcal I(h_m)](x)Oh_m & \text{if } x\in C\\ 0 & \text{if } x\notin C\end{cases}
\]
has distribution $\mathcal W$. Hence, all that remains is to check that $\mathcal I(h)\circ T_O=\mathcal I(O^{\top}h)$ $\mathcal W$-almost surely for each $h\in H$. To this end, let $\xi\in B^*$, and observe that
\[
[\mathcal I(h_\xi)](T_Ox)=\langle T_Ox,\xi\rangle=\sum_{m=0}^\infty\bigl(h_\xi,Oh_m\bigr)_H[\mathcal I(h_m)](x)=\sum_{m=0}^\infty\bigl(O^{\top}h_\xi,h_m\bigr)_H[\mathcal I(h_m)](x)
\]
for $\mathcal W$-almost every $x\in B$. Thus, since, by Lemma 4.1.6, the last of these series converges $\mathcal W$-almost surely to $\mathcal I(O^{\top}h_\xi)$, we have that $\mathcal I(h_\xi)\circ T_O=\mathcal I(O^{\top}h_\xi)$ $\mathcal W$-almost surely. To handle general $h\in H$, simply note that both $h\in H\longmapsto\mathcal I(h)\circ T_O\in L^2(\mathcal W;\mathbb R)$ and $h\in H\longmapsto\mathcal I(O^{\top}h)\in L^2(\mathcal W;\mathbb R)$ are isometric, and remember that $\{h_\xi : \xi\in B^*\}$ is dense in $H$.

I discuss next the possibility of $T_O$ being ergodic for some orthogonal transformations $O$. First notice that $T_O$ cannot be ergodic if $O$ has a non-trivial, finite dimensional invariant subspace $L$, since if $\{h_1,\dots,h_n\}$ were an orthonormal basis for $L$, then $\sum_{m=1}^n\mathcal I(h_m)^2$ would be a non-constant, $T_O$-invariant function. Thus, the only candidates for ergodicity are $O$’s that have no non-trivial, finite dimensional, invariant subspaces. In a more general and highly abstract context, Segal [10] showed that the existence of a non-trivial, finite dimensional invariant subspace for $O$ is the only obstruction to $T_O$ being ergodic. Here I will show less.

Theorem 4.1.11 Let $(H,B,\mathcal W)$ be an abstract Wiener space. If $O$ is an orthogonal transformation on $H$ with the property that, for every $g,h\in H$, $\lim_{n\to\infty}\bigl(O^ng,h\bigr)_H=0$, then $T_O$ is ergodic.

Proof What we have to show is that any $T_O$-invariant element $\Phi\in L^2(\mathcal W;\mathbb R)$ is $\mathcal W$-almost surely constant, and for this purpose it suffices to check that
\[
\lim_{n\to\infty}\mathbb E^{\mathcal W}\bigl[(\Phi\circ T_{O^n})\Phi\bigr]=0 \tag{$*$}
\]
for all $\Phi\in L^2(\mathcal W;\mathbb R)$ with mean value 0. In fact, if $\{h_m : m\ge 1\}$ is an orthonormal basis for $H$, then it suffices to check ($*$) when
\[
\Phi(x)=F\bigl([\mathcal I(h_1)](x),\dots,[\mathcal I(h_N)](x)\bigr)
\]
for some $N\in\mathbb Z^+$ and bounded, Borel measurable $F : \mathbb R^N\longrightarrow\mathbb R$. The reason why it is sufficient to check it for such $\Phi$’s is that, because $T_O$ is $\mathcal W$-measure preserving, the set of $\Phi$’s for which ($*$) holds is closed in $L^2(\mathcal W;\mathbb R)$. Hence, if we start with any $\Phi\in L^2(\mathcal W;\mathbb R)$ with mean value 0, we can first approximate it in $L^2(\mathcal W;\mathbb R)$ by bounded functions with mean value 0 and then condition these bounded approximates with respect to $\sigma\bigl(\{\mathcal I(h_1),\dots,\mathcal I(h_N)\}\bigr)$ to give them the required form.

Now suppose that $\Phi=F\bigl(\mathcal I(h_1),\dots,\mathcal I(h_N)\bigr)$ for some $N$ and bounded, Borel measurable $F$ on $\mathbb R^N$. Then
\[
\mathbb E^{\mathcal W}\bigl[(\Phi\circ T_{O^n})\Phi\bigr]=\iint_{\mathbb R^N\times\mathbb R^N}F(\xi)F(\eta)\,\gamma_{0,C_n}(d\xi\times d\eta),
\]
where
\[
C_n=\begin{pmatrix}I & B_n\\ B_n^{\top} & I\end{pmatrix}\quad\text{with}\quad B_n=\Bigl(\bigl(h_k,O^nh_\ell\bigr)_H\Bigr)_{1\le k,\ell\le N}
\]
and the block structure corresponds to $\mathbb R^N\times\mathbb R^N$. Finally, by our hypothesis about $O$, we can find a subsequence $\{n_m : m\ge 0\}$ such that $\lim_{m\to\infty}B_{n_m}=0$, from which it is clear that $\gamma_{0,C_{n_m}}$ tends to $\gamma_{0,I}\times\gamma_{0,I}$ in variation and therefore
\[
\lim_{m\to\infty}\mathbb E^{\mathcal W}\bigl[(\Phi\circ T_{O^{n_m}})\Phi\bigr]=\mathbb E^{\mathcal W}[\Phi]^2=0.
\]
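For a concrete feel for the matrices $B_n$, consider the bilateral shift on $\ell^2(\mathbb Z)$, which is an illustrative choice of orthogonal transformation and does not come from the text. With the coordinate vectors $e_1,\dots,e_N$ in the role of $h_1,\dots,h_N$ and $Oe_j=e_{j+1}$, the entry $(e_k,O^ne_\ell)$ is 1 exactly when $k=\ell+n$, so $B_n$ vanishes as soon as $n\ge N$. A minimal sketch under those assumptions:

```python
def cross_covariance(N, n):
    """B_n = ((e_k, O^n e_l))_{1 <= k, l <= N} for the bilateral shift O e_j = e_{j+1}."""
    return [[1.0 if k == l + n else 0.0 for l in range(1, N + 1)]
            for k in range(1, N + 1)]

def is_zero(matrix):
    """True when every entry of the matrix vanishes."""
    return all(entry == 0.0 for row in matrix for entry in row)

N = 4
first_vanishing = next(n for n in range(1, 20) if is_zero(cross_covariance(N, n)))
```

In this example the two blocks of the Gaussian vector become exactly independent once $n\ge N$, which is the mechanism the proof exploits in the limit.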
Perhaps the best tests for whether an orthogonal transformation satisfies the hypothesis in Theorem 4.1.11 come from spectral theory. To be more precise, if $H_c$ and $O_c$ are the space and operator obtained by complexifying $H$ and $O$, the Spectral Theorem for normal operators allows one to write
\[
O_c=\int_0^{2\pi}e^{i\alpha}\,dE_\alpha,
\]
where $\{E_\alpha : \alpha\in[0,2\pi)\}$ is a resolution of the identity in $H_c$ by orthogonal projection operators. The spectrum of $O_c$ is said to be absolutely continuous if, for each $h\in H_c$, the non-decreasing function $\alpha\rightsquigarrow\bigl(E_\alpha h,h\bigr)_{H_c}$ is absolutely continuous, which, by polarization, means that $\alpha\rightsquigarrow\bigl(E_\alpha h,h'\bigr)_{H_c}$ is absolutely continuous for all $h,h'\in H_c$. The reason for introducing this concept here is that, by combining the Riemann–Lebesgue Lemma with Theorem 4.1.11, one can prove that $T_O$ is ergodic if the spectrum of $O_c$ is absolutely continuous. Indeed, given $h,h'\in H$, let $f$ be the Radon–Nikodym derivative of $\alpha\rightsquigarrow\bigl(E_\alpha h,h'\bigr)_{H_c}$, and apply the Riemann–Lebesgue Lemma to see that
\[
\bigl(O^nh,h'\bigr)_H=\int_0^{2\pi}e^{in\alpha}f(\alpha)\,d\alpha\longrightarrow 0\quad\text{as } n\to\infty.
\]
This conclusion is substantially weaker than the one proved by Segal. He proved that $T_O$ will be ergodic if and only if $\alpha\rightsquigarrow(E_\alpha h,h)_H$ is non-atomic for all $h\in H$.
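The decay supplied by the Riemann–Lebesgue Lemma is easy to observe numerically. The sketch below uses the hypothetical density $f(\alpha)=\alpha(2\pi-\alpha)$, chosen only because two integrations by parts give $\int_0^{2\pi}e^{in\alpha}f(\alpha)\,d\alpha=-4\pi/n^2$ for $n\ge 1$; the quadrature routine and its step count are likewise assumptions made for illustration.

```python
import cmath
import math

def density(a):
    """A hypothetical integrable density on [0, 2*pi], used only as an example."""
    return a * (2.0 * math.pi - a)

def fourier_coefficient(f, n, steps=20000):
    """Trapezoidal approximation of the integral of e^{i n a} f(a) over [0, 2*pi]."""
    width = 2.0 * math.pi / steps
    total = 0.5 * (f(0.0) + cmath.exp(1j * n * 2.0 * math.pi) * f(2.0 * math.pi))
    for j in range(1, steps):
        a = j * width
        total += cmath.exp(1j * n * a) * f(a)
    return total * width

magnitudes = [abs(fourier_coefficient(density, n)) for n in (1, 2, 5, 10)]
```

The magnitudes shrink like $n^{-2}$ for this smooth example; for a general integrable $f$ the lemma gives convergence to 0 but no rate.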
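4.1.5 Large Deviations in Abstract Wiener Spaces

The primary goal of this subsection is to derive the following result.

Theorem 4.1.12 Let $(H,B,\mathcal W)$ be an abstract Wiener space, and, for $\epsilon>0$, denote by $\mathcal W_\epsilon$ the $\mathcal W$-distribution of $x\rightsquigarrow\epsilon^{\frac12}x$. Then, for each $\Gamma\in\mathcal B_B$,
\[
-\inf_{h\in\mathring\Gamma}\frac{\|h\|_H^2}{2}\le\liminf_{\epsilon\searrow 0}\epsilon\log\mathcal W_\epsilon(\Gamma)\le\limsup_{\epsilon\searrow 0}\epsilon\log\mathcal W_\epsilon(\Gamma)\le-\inf_{h\in\overline\Gamma}\frac{\|h\|_H^2}{2}. \tag{4.1.7}
\]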
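In the one-dimensional toy case $H=B=\mathbb R$ with the standard Gaussian measure, $\mathcal W_\epsilon$ is the centered Gaussian with variance $\epsilon$, and for $\Gamma=[a,\infty)$ with $a>0$ both bounds in (4.1.7) equal $-a^2/2$. The following sketch, written under those assumptions, evaluates $\epsilon\log\mathcal W_\epsilon(\Gamma)$ directly from the complementary error function.

```python
import math

def scaled_log_tail(a, eps):
    """eps * log W_eps([a, inf)), where W_eps = N(0, eps) on the real line."""
    tail = 0.5 * math.erfc(a / math.sqrt(2.0 * eps))
    return eps * math.log(tail)

coarse = scaled_log_tail(1.0, 1e-2)
fine = scaled_log_tail(1.0, 1e-3)   # should be closer to -1/2 than `coarse`
```

The remaining gap is of order $\epsilon\log(1/\epsilon)$ and comes from the polynomial prefactor of the Gaussian tail, which the logarithmic scaling suppresses.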
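The original version of Theorem 4.1.12 was proved by M. Schilder for the classical Wiener measure using a method that does not extend easily to the general case. The statement that I have given is due to Donsker and Varadhan [5], and my proof derives from an approach that was introduced in this context by Varadhan.

The lower bound is an easy application of the Cameron–Martin formula. Indeed, all that one has to do is show that if $h\in H$ and $r>0$, then
\[
\liminf_{\epsilon\searrow 0}\epsilon\log\mathcal W_\epsilon\bigl(B_B(h,r)\bigr)\ge-\frac{\|h\|_H^2}{2}. \tag{$*$}
\]
To this end, note that, for any $\xi\in B^*$ and $\delta>0$,
\[
\begin{aligned}
\mathcal W_\epsilon\bigl(B_B(h_\xi,\delta)\bigr)&=\mathcal W\bigl(B_B(\epsilon^{-\frac12}h_\xi,\epsilon^{-\frac12}\delta)\bigr)\\
&=\mathbb E^{\mathcal W}\Bigl[e^{-\epsilon^{-\frac12}\langle x,\xi\rangle-\frac{1}{2\epsilon}\|h_\xi\|_H^2},\,B_B(0,\epsilon^{-\frac12}\delta)\Bigr]\\
&\ge e^{-\epsilon^{-1}\delta\|\xi\|_{B^*}-\frac{1}{2\epsilon}\|h_\xi\|_H^2}\,\mathcal W\bigl(B_B(0,\epsilon^{-\frac12}\delta)\bigr),
\end{aligned}
\]
which means that
\[
B_B(h_\xi,\delta)\subseteq B_B(h,r)\implies\liminf_{\epsilon\searrow 0}\epsilon\log\mathcal W_\epsilon\bigl(B_B(h,r)\bigr)\ge-\delta\|\xi\|_{B^*}-\frac{\|h_\xi\|_H^2}{2},
\]
and therefore, after letting $\delta\searrow 0$ and remembering that $\{h_\xi : \xi\in B^*\}$ is dense in $H$, that ($*$) holds.

The proof of the upper bound in (4.1.7) is a little more involved. The first step is to show that it suffices to treat the case when $\Gamma$ is relatively compact. To this end, refer to Corollary 4.1.9, and set $C_R$ equal to the closure in $B$ of $B_{B_0}(0,R)$. By Fernique’s Theorem applied to $\mathcal W$ on $B_0$, we know that $\mathbb E^{\mathcal W}\bigl[e^{\alpha\|x\|_{B_0}^2}\bigr]\le K$ for some $\alpha>0$ and $K<\infty$. Hence
\[
\mathcal W_\epsilon\bigl(B\setminus C_R\bigr)=\mathcal W\bigl(B\setminus C_{\epsilon^{-\frac12}R}\bigr)\le Ke^{-\frac{\alpha R^2}{\epsilon}},
\]
and so, for any $\Gamma\in\mathcal B_B$ and $R>0$,
\[
\mathcal W_\epsilon(\Gamma)\le 2\,\mathcal W_\epsilon(\Gamma\cap C_R)\vee\bigl(Ke^{-\frac{\alpha R^2}{\epsilon}}\bigr).
\]
Thus, if we can prove the upper bound for relatively compact $\Gamma$’s, then, because $\Gamma\cap C_R$ is relatively compact, we will know that, for all $R>0$,
\[
\limsup_{\epsilon\searrow 0}\epsilon\log\mathcal W_\epsilon(\Gamma)\le-\Bigl[\Bigl(\inf_{h\in\overline\Gamma}\frac{\|h\|_H^2}{2}\Bigr)\wedge\bigl(\alpha R^2\bigr)\Bigr],
\]
from which the general result is immediate.

To prove the upper bound when $\Gamma$ is relatively compact, we will show that, for any $y\in B$,
\[
\lim_{r\searrow 0}\limsup_{\epsilon\searrow 0}\epsilon\log\mathcal W_\epsilon\bigl(B_B(y,r)\bigr)\le\begin{cases}-\dfrac{\|y\|_H^2}{2} & \text{if } y\in H\\[4pt] -\infty & \text{if } y\notin H.\end{cases} \tag{$**$}
\]
To see that ($**$) is enough, assume that it is true, and let $\Gamma\in\mathcal B_B\setminus\{\emptyset\}$ be relatively compact. Given $\beta\in(0,1)$, for each $y\in\overline\Gamma$ choose $r(y)>0$ and $\epsilon(y)>0$ so that
\[
\mathcal W_\epsilon\bigl(B_B(y,r(y))\bigr)\le\begin{cases}e^{-\frac{(1-\beta)}{2\epsilon}\|y\|_H^2} & \text{if } y\in H\\[2pt] e^{-\frac{1}{\epsilon\beta}} & \text{if } y\notin H\end{cases}
\]
for all $0<\epsilon\le\epsilon(y)$. Because $\Gamma$ is relatively compact, we can find $N\in\mathbb Z^+$ and $\{y_1,\dots,y_N\}\subseteq\overline\Gamma$ such that $\Gamma\subseteq\bigcup_1^NB_B(y_n,r_n)$, where $r_n=r(y_n)$. Thus, for sufficiently small $\epsilon>0$,
\[
\mathcal W_\epsilon(\Gamma)\le N\exp\Bigl(-\frac{1}{\epsilon}\Bigl[\Bigl(\frac{1-\beta}{2}\inf_{h\in\overline\Gamma}\|h\|_H^2\Bigr)\wedge\frac{1}{\beta}\Bigr]\Bigr),
\]
and so
\[
\limsup_{\epsilon\searrow 0}\epsilon\log\mathcal W_\epsilon(\Gamma)\le-\Bigl[\Bigl(\frac{1-\beta}{2}\inf_{h\in\overline\Gamma}\|h\|_H^2\Bigr)\wedge\frac{1}{\beta}\Bigr].
\]
Now let $\beta\searrow 0$.

Finally, to prove ($**$), observe that
\[
\begin{aligned}
\mathcal W_\epsilon\bigl(B_B(y,r)\bigr)&=\mathcal W\bigl(B_B(\tfrac{y}{\sqrt\epsilon},\tfrac{r}{\sqrt\epsilon})\bigr)=\mathbb E^{\mathcal W}\Bigl[e^{-\epsilon^{-\frac12}\langle x,\xi\rangle}e^{\epsilon^{-\frac12}\langle x,\xi\rangle},\,B_B(\tfrac{y}{\sqrt\epsilon},\tfrac{r}{\sqrt\epsilon})\Bigr]\\
&\le e^{-\epsilon^{-1}(\langle y,\xi\rangle-r\|\xi\|_{B^*})}\,\mathbb E^{\mathcal W}\Bigl[e^{\epsilon^{-\frac12}\langle x,\xi\rangle}\Bigr]=e^{-\epsilon^{-1}\bigl(\langle y,\xi\rangle-\frac{\|h_\xi\|_H^2}{2}-r\|\xi\|_{B^*}\bigr)},
\end{aligned}
\]
for all $\xi\in B^*$. Hence,
\[
\lim_{r\searrow 0}\limsup_{\epsilon\searrow 0}\epsilon\log\mathcal W_\epsilon\bigl(B_B(y,r)\bigr)\le-\sup_{\xi\in B^*}\Bigl(\langle y,\xi\rangle-\tfrac12\|h_\xi\|_H^2\Bigr).
\]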
Note that the preceding supremum is the same as half the supremum of $\langle y,\xi\rangle^2$ over $\xi$ with $\|h_\xi\|_H=1$, which, by Lemma 3.3.1, is equal to $\frac{\|y\|_H^2}{2}$ if $y\in H$ and to $\infty$ if $y\notin H$.

Exercise 4.1.1
(i) Let $\mu$ be a Borel probability measure on the separable Banach space $E$, set $A=\{x\in E : (T_x)_*\mu\ll\mu\}$, and, for $x\in A$, let $R_x$ be the Radon–Nikodym derivative of $(T_x)_*\mu$ with respect to $\mu$. Show that $x,y\in A\implies x+y\in A$ and $R_{x+y}=R_y\circ T_{-x}\,R_x$. Use this to show that if $(H,B,\mathcal W)$ is an abstract Wiener space and $g,h\in H$, then $\mathcal I(g)\circ T_h=\mathcal I(g)+(g,h)_H$ (a.s., $\mathcal W$).
(ii) Let $(H,B,\mathcal W)$ be an abstract Wiener space, and set $\mathcal I(H)=\{\mathcal I(h) : h\in H\}$. Show that $\mathcal I(H)$ is a closed subspace of $L^2(\mathcal W;\mathbb R)$. Next, given $\psi\in L^2(\mathcal W;\mathbb R)$, set $g=\int\psi(x)x\,\mathcal W(dx)$. Show that $g\in H$ and that $\mathcal I(g)$ is the orthogonal projection of $\psi$ onto $\mathcal I(H)$. In particular, conclude that $h=\int\mathcal I(h)(x)x\,\mathcal W(dx)$ for all $h\in H$.
(iii) Let $\{X(t) : t\in\mathbb R\}$ be a centered Gaussian process with covariance function $\cos\bigl(2\pi(t-s)\bigr)$. Describe the distribution of this process in terms of an abstract Wiener space.
(iv) Let $(H,B,\mathcal W)$ be an abstract Wiener space, and suppose that $\{\xi_n : n\ge 0\}\subseteq B^*$ has the property that $\{h_n : n\ge 0\}$ is an orthonormal sequence when $h_n=h_{\xi_n}$. Set $S_n(x)=\sum_{m=0}^n\langle x,\xi_m\rangle h_m$, and show that
E sup
sup
Sn (t) − Sn (s) B0 (t − s)
n≥0 −T ≤s 0 and W N -almost every y ∈ B N , {Sn (t, y) : n ≥ 0 & t ∈ [−T, T ]} is relatively compact in B and {Sn ( · , y) [−T, T ] : n ≥ 0} is uniformly · B -equicontinuous, the Ascoli–Arzela Theorem guarantees that, W N -almost surely, {Sn ( · , y) : n ≥ 0} is relatively compact in C(R; B) with the topology of uniform convergence on compacts. Thus, what remains to be shown is that W N -almost surely, lim sup sup
T →∞ n≥0 |t|≥T
Sn (t, y) B = 0. |t|
But, sup |t|≥2k
Sn (t, y) B Sn (t, y) B ≤ sup |t| |t| +1 ≥k 2 ≤|t|≤2 ≤
≥k
2− 8
7
sup
Sn (t, y) B
0≤|t|≤2+1
1
|t| 8
,
4.2 Brownian Motion on a Banach Space
and therefore, by (∗),
\[
\mathbb E\Biggl[\sup_{n\ge0}\,\sup_{|t|\ge2^k}\biggl(\frac{\|S_n(t)\|_B}{|t|}\biggr)^{4}\Biggr]^{\frac14}
\le\frac{2^{\frac{1-k}{2}}}{2^{\frac12}-1}\,K.
\]
In addition to the preceding compactness result, we need the following simple criterion for checking when a relatively compact sequence in $\Omega_0(B)$ converges.

Lemma 4.2.3 Suppose that $\{\omega_n : n\ge0\}$ is a relatively compact sequence in $\Omega_0(B)$. If $\lim_{n\to\infty}\langle\omega_n(t),\xi\rangle$ exists for each $t$ in a dense subset of $\mathbb R$ and $\xi$ in a weak* dense subset of $B^*$, then $\{\omega_n : n\ge0\}$ converges in $\Omega_0(B)$.

Proof For a relatively compact sequence to be convergent, it is necessary and sufficient that every convergent subsequence have the same limit. Thus, suppose that $\omega$ and $\omega'$ are limit points of $\{\omega_n : n\ge0\}$. Then, by hypothesis, $\langle\omega(t),\xi\rangle=\langle\omega'(t),\xi\rangle$ for $t$ in a dense subset of $\mathbb R$ and $\xi$ in a weak* dense subset of $B^*$. But this means that the same equality holds for all $(t,\xi)\in\mathbb R\times B^*$ and therefore that $\omega=\omega'$.

Proof of Theorem 4.2.1 In view of Lemmas 4.2.2 and 4.2.3 and the separability of $\overline{B_{B^*}(0,1)}$ in the weak* topology, we will know that $\{S_n(\,\cdot\,,y) : n\ge0\}$ converges in $\Omega_0(B)$ for $\mathcal W^{\mathbb N}$-almost every $y\in B^{\mathbb N}$ once we show that, for each $(t,\xi)\in[0,\infty)\times B^*$, $\{\langle S_n(t,y),\xi\rangle : n\ge0\}$ converges in $\mathbb R$ for $\mathcal W^{\mathbb N}$-almost every $y\in B^{\mathbb N}$. But if $\xi\in B^*$, then $\langle S_n(t,y),\xi\rangle=\sum_{m=0}^n\langle y_m,\xi\rangle g_m(t)$, the random variables $y\rightsquigarrow\langle y_m,\xi\rangle g_m(t)$ are mutually independent, centered Gaussians under $\mathcal W^{\mathbb N}$ with variance $\|h_\xi\|_H^2\,g_m(t)^2$, and
\[
\sum_{m=0}^\infty g_m(t)^2=\sum_{m=0}^\infty(g_m,f_t)^2_{H_0^1(\mathbb R;\mathbb R)}=\|f_t\|^2_{H_0^1(\mathbb R;\mathbb R)}=t.
\]
Thus, by Kolmogorov's convergence theorem for sums of mutually independent, square integrable random variables, we have the required convergence. Further, because we are dealing with Gaussian random variables, almost sure convergence implies $L^2$-convergence.

Next, define $S:[0,\infty)\times B^{\mathbb N}\longrightarrow B$ so that
\[
S(t,y)=\begin{cases}\lim_{n\to\infty}S_n(t,y)&\text{if }\{S_n(\,\cdot\,,y) : n\ge0\}\text{ converges in }\Omega_0(B)\\ 0&\text{otherwise.}\end{cases}
\]
Given $\Lambda\in\Omega_0(B)^*$, determine $h_\Lambda\in H_0^1(\mathbb R;H)$ by $(h,h_\Lambda)_{H_0^1(\mathbb R;H)}=\langle h,\Lambda\rangle$ for all $h\in H_0^1(\mathbb R;H)$. We must show that, under $\mathcal W^{\mathbb N}$, $y\rightsquigarrow\langle S(\,\cdot\,,y),\Lambda\rangle$ is a centered Gaussian with variance $\|h_\Lambda\|^2_{H_0^1(\mathbb R;H)}$. To this end, define $\xi_m\in B^*$ so that ${}_B\langle x,\xi_m\rangle_{B^*}={}_{\Omega_0(B)}\langle g_mx,\Lambda\rangle_{\Omega_0(B)^*}$ for $x\in B$, where $g_mx$ is the element of $\Omega_0(B)$ such that $(g_mx)(t)=g_m(t)x$. Then,
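\[
\langle S(\,\cdot\,,y),\Lambda\rangle=\lim_{n\to\infty}\langle S_n(\,\cdot\,,y),\Lambda\rangle=\lim_{n\to\infty}\sum_{m=0}^n\langle y_m,\xi_m\rangle\quad\mathcal W^{\mathbb N}\text{-almost surely.}
\]
Hence, $\langle S(\,\cdot\,,y),\Lambda\rangle$ is certainly a centered Gaussian under $\mathcal W^{\mathbb N}$. To compute its variance, choose an orthonormal basis $\{h_k : k\ge0\}$ for $H$, and note that, for each $m\ge0$,
\[
\int\langle y_m,\xi_m\rangle^2\,\mathcal W^{\mathbb N}(dy)=\|h_{\xi_m}\|_H^2=\sum_{k=0}^\infty\langle g_mh_k,\Lambda\rangle^2.
\]
Thus, since $\{g_mh_k : (m,k)\in\mathbb N^2\}$ is an orthonormal basis in $H_0^1(\mathbb R;H)$,
\[
\mathbb E\bigl[\langle S(\,\cdot\,),\Lambda\rangle^2\bigr]=\sum_{m,k=0}^\infty\langle g_mh_k,\Lambda\rangle^2=\sum_{m,k=0}^\infty\bigl(g_mh_k,h_\Lambda\bigr)^2_{H_0^1(\mathbb R;H)}=\|h_\Lambda\|^2_{H_0^1(\mathbb R;H)}.
\]
Finally, if $\mathcal W_0^{(B)}$ is the $\mathcal W^{\mathbb N}$-distribution of $y\rightsquigarrow S(\,\cdot\,,y)$, then the preceding shows that $\bigl(H_0^1(\mathbb R;H),\Omega_0(B),\mathcal W_0^{(B)}\bigr)$ is an abstract Wiener space. To see that $\{\omega(t) : t\in\mathbb R\}$ is a Brownian motion under $\mathcal W_0^{(B)}$, it suffices to show that, for all $s<t$ and $\xi\in B^*$,
\[
\mathbb E\bigl[\langle\omega(s),\xi\rangle\langle\omega(t),\xi\rangle\bigr]=w(s,t)\|h_\xi\|_H^2,\quad\text{where }w(s,t)=\mathbf 1_{[0,\infty)}(st)\,\bigl(|s|\wedge|t|\bigr).
\]
To this end, define $\Lambda_s\in\Omega_0(B)^*$ by ${}_{\Omega_0(B)}\langle\omega,\Lambda_s\rangle_{\Omega_0(B)^*}={}_B\langle\omega(s),\xi\rangle_{B^*}$ for $\omega\in\Omega_0(B)$. Then $h_{\Lambda_s}=f_sh_\xi$, where $f_s\in H_0^1(\mathbb R;\mathbb R)$ is defined as in the proof of Lemma 4.2.2. Hence
\[
\mathbb E\bigl[\langle\omega(s),\xi\rangle\langle\omega(t),\xi\rangle\bigr]=\bigl(h_{\Lambda_t},h_{\Lambda_s}\bigr)_{H_0^1(\mathbb R;H)}=(f_t,f_s)_{H_0^1(\mathbb R;\mathbb R)}\|h_\xi\|_H^2=w(s,t)\|h_\xi\|_H^2.
\]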
4.2.2 Strassen's Theorem
This subsection is devoted to a beautiful version, proved originally by Strassen [13] for classical Brownian motion, of the law of the iterated logarithm. The classical statement of the law of the iterated logarithm says that if $\{X_m : m\ge1\}$ is a sequence of independent, identically distributed, square integrable $\mathbb R$-valued random variables with mean 0 and variance 1 and $\tilde S_n=\frac{S_n}{\Lambda_n}$, where $S_n=\sum_{m=1}^nX_m$ and $\Lambda_n=\sqrt{2n\log(\log n\vee3)}$, then, with probability 1, the sequence $\{\tilde S_n : n\ge1\}$ is relatively compact and the set of its limit points coincides with the interval $[-1,1]$. A. Khinchine was the first to prove such a result when he did so for Bernoulli random
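variables. Over time, various people, including Kolmogorov, extended the result to more general random variables, and it was finally proved in full generality by Hartman and Wintner. Strassen's version introduced an innovation that had not been anticipated.

Theorem 4.2.4 Let $\bigl(H_0^1(\mathbb R;H),\Omega_0(B),\mathcal W_0^{(B)}\bigr)$ be as in Theorem 4.2.1. Given $\omega\in\Omega_0(B)$, define $\tilde\omega_n(t)=\frac{\omega(nt)}{\Lambda_n}$ for $n\ge1$ and $t\in\mathbb R$, where $\Lambda_n=\sqrt{2n\log(\log n\vee3)}$. Then, for $\mathcal W_0^{(B)}$-almost every $\omega$, the sequence $\{\tilde\omega_n : n\ge1\}$ is relatively compact in $\Omega_0(B)$ and its set of limit points coincides with the closed unit ball $\overline{B_{H_0^1(\mathbb R;H)}(0,1)}$ in $H_0^1(\mathbb R;H)$. Equivalently, for $\mathcal W_0^{(B)}$-almost every $\omega$,
\[
\lim_{n\to\infty}\bigl\|\tilde\omega_n-\overline{B_{H_0^1(\mathbb R;H)}(0,1)}\bigr\|_{\Omega_0(B)}=0
\]
and, for each $h\in\overline{B_{H_0^1(\mathbb R;H)}(0,1)}$, $\liminf_{n\to\infty}\|\tilde\omega_n-h\|_{\Omega_0(B)}=0$.

Proof Without loss in generality, we will assume that $\|\cdot\|_B\le\|\cdot\|_H$. The proof relies on the Brownian scaling invariance property (cf. (vi) in Exercise 4.1.1), which says that $\mathcal W_0^{(B)}$ is invariant under the scaling maps $S_\alpha:\Omega_0(B)\longrightarrow\Omega_0(B)$ given by $S_\alpha\omega=\alpha^{-\frac12}\omega(\alpha\,\cdot\,)$ for $\alpha>0$ and is easily proved as a consequence of the fact that these maps are orthogonal transformations on $H_0^1(\mathbb R;H)$. In addition, we will use the fact that, for $R>0$, $r\in(0,1]$, and $\omega\in\Omega_0(B)$,
\[
\bigl\|\omega(r\,\cdot\,)-\overline{B_{H_0^1(\mathbb R;H)}(0,R)}\bigr\|_{\Omega_0(B)}\le\bigl\|\omega-\overline{B_{H_0^1(\mathbb R;H)}(0,R)}\bigr\|_{\Omega_0(B)}.
\]
To see this, let $h\in\overline{B_{H_0^1(\mathbb R;H)}(0,R)}$ be given, and check that $h(r\,\cdot\,)$ is again in $\overline{B_{H_0^1(\mathbb R;H)}(0,R)}$ and that $\|\omega(r\,\cdot\,)-h(r\,\cdot\,)\|_{\Omega_0(B)}\le\|\omega-h\|_{\Omega_0(B)}$.

To prove that $\tilde\omega_n$ tends to $\overline{B_{H_0^1(\mathbb R;H)}(0,1)}$, begin by observing that, for any $\beta\in(1,2)$,
\[
\limsup_{n\to\infty}\bigl\|\tilde\omega_n-\overline{B_{H_0^1(\mathbb R;H)}(0,1)}\bigr\|_{\Omega_0(B)}\le\limsup_{m\to\infty}\,\max_{\beta^{m-1}\le n\le\beta^m}\bigl\|\tilde\omega_n-\overline{B_{H_0^1(\mathbb R;H)}(0,1)}\bigr\|_{\Omega_0(B)}
\]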
. Taking the preceding Next, let δ ∈ (0, 1) be given, and set R = 1 + 2δ and β = 1+δ 2 comments into account and applying the upper bound in (4.1.7), one sees that
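\[
\begin{aligned}
&\mathcal W_0^{(B)}\Bigl(\max_{\beta^{m-1}\le n\le\beta^m}\bigl\|\tilde\omega_n-\overline{B_{H_0^1(\mathbb R;H)}(0,1)}\bigr\|_{\Omega_0(B)}\ge\delta\Bigr)\\
&\quad=\mathcal W_0^{(B)}\Bigl(\max_{\beta^{m-1}\le n\le\beta^m}\Bigl\|\frac{\beta^{\frac m2}\omega(n\beta^{-m}\,\cdot\,)}{\Lambda_n}-\overline{B_{H_0^1(\mathbb R;H)}(0,1)}\Bigr\|_{\Omega_0(B)}\ge\delta\Bigr)\\
&\quad\le\mathcal W_0^{(B)}\Bigl(\max_{\beta^{m-1}\le n\le\beta^m}\Bigl\|\omega(n\beta^{-m}\,\cdot\,)-\overline{B_{H_0^1(\mathbb R;H)}\Bigl(0,\tfrac{\Lambda_{\lfloor\beta^{m-1}\rfloor}}{\beta^{m/2}}\Bigr)}\Bigr\|_{\Omega_0(B)}\ge\frac{\delta\Lambda_{\lfloor\beta^{m-1}\rfloor}}{\beta^{\frac m2}}\Bigr)\\
&\quad\le\mathcal W_0^{(B)}\Bigl(\Bigl\|\omega-\overline{B_{H_0^1(\mathbb R;H)}\Bigl(0,\tfrac{\Lambda_{\lfloor\beta^{m-1}\rfloor}}{\beta^{m/2}}\Bigr)}\Bigr\|_{\Omega_0(B)}\ge\frac{\delta\Lambda_{\lfloor\beta^{m-1}\rfloor}}{\beta^{\frac m2}}\Bigr)\\
&\quad=\mathcal W_0^{(B)}\Bigl(\bigl\|\beta^{\frac m2}\Lambda_{\lfloor\beta^{m-1}\rfloor}^{-1}\omega-\overline{B_{H_0^1(\mathbb R;H)}(0,1)}\bigr\|_{\Omega_0(B)}\ge\delta\Bigr)\\
&\quad=\mathcal W^{(B)}_{\beta^m\Lambda^{-2}_{\lfloor\beta^{m-1}\rfloor}}\Bigl(\bigl\|\omega-\overline{B_{H_0^1(\mathbb R;H)}(0,1)}\bigr\|_{\Omega_0(B)}\ge\delta\Bigr)\\
&\quad=\mathcal W^{(B)}_{\beta^m\Lambda^{-2}_{\lfloor\beta^{m-1}\rfloor}}\Bigl(\omega\notin\overline{B_{H_0^1(\mathbb R;H)}(0,1+\delta)}\Bigr)\\
&\quad\le\exp\Bigl(-\frac{R^2\lfloor\beta^{m-1}\rfloor}{\beta^m}\log\log\lfloor\beta^{m-1}\rfloor\Bigr)=\bigl(\log\lfloor\beta^{m-1}\rfloor\bigr)^{-\frac{R^2\lfloor\beta^{m-1}\rfloor}{\beta^m}}
\end{aligned}
\]
for all sufficiently large $m$'s. Hence, since $\frac{R^2}{\beta}>1$,
\[
\sum_{m=1}^\infty\mathcal W_0^{(B)}\Bigl(\max_{\beta^{m-1}\le n\le\beta^m}\bigl\|\tilde\omega_n-\overline{B_{H_0^1(\mathbb R;H)}(0,1)}\bigr\|_{\Omega_0(B)}\ge\delta\Bigr)<\infty,
\]
and therefore
\[
\mathcal W_0^{(B)}\Bigl(\limsup_{n\to\infty}\bigl\|\tilde\omega_n-\overline{B_{H_0^1(\mathbb R;H)}(0,1)}\bigr\|_{\Omega_0(B)}\ge\delta\Bigr)=0.
\]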
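Because $H_0^1(\mathbb R;H)$ is separable, to prove that $\liminf_{n\to\infty}\|\tilde\omega_n-h\|_{\Omega_0(B)}=0$ for every $h\in\overline{B_{H_0^1(\mathbb R;H)}(0,1)}$ (a.s., $\mathcal W_0^{(B)}$), it suffices to prove that $\liminf_{n\to\infty}\|\tilde\omega_n-h\|_{\Omega_0(B)}=0$ (a.s., $\mathcal W_0^{(B)}$) for each $h\in B_{H_0^1(\mathbb R;H)}(0,1)$. In addition, because
\[
\lim_{T\to\infty}\,\sup_{\omega\in A}\,\sup_{|t|\notin[T^{-1},T]}\frac{\|\omega(t)\|_B}{1+|t|}=0
\]
for any relatively compact $A\subseteq\Omega_0(B)$, and, by the preceding and Theorem 3.3.4, for $\mathcal W_0^{(B)}$-almost every $\omega$, the union of $\{\tilde\omega_n : n\ge1\}$ and $\overline{B_{H_0^1(\mathbb R;H)}(0,1)}$ is relatively compact in $\Omega_0(B)$, it suffices to prove that
\[
\liminf_{n\to\infty}\,\sup_{t\in[k^{-1},k]}\frac{\bigl\|\bigl(\tilde\omega_n(\pm t)-\tilde\omega_n(\pm k^{-1})\bigr)-\bigl(h(\pm t)-h(\pm k^{-1})\bigr)\bigr\|_B}{1+t}=0\quad(\text{a.s., }\mathcal W_0^{(B)})
\]
for each $h\in B_{H_0^1(\mathbb R;H)}(0,1)$ and $k\ge2$. Since, for a fixed $k\ge2$, the random variables $\bigl(\tilde\omega_{k^{2m}}-\tilde\omega_{k^{2m}}(\pm k^{-1})\bigr)\restriction\{t : \pm t\in[k^{-1},k]\}$, $m\ge1$, are $\mathcal W_0^{(B)}$-independent, we can use the Borel–Cantelli Lemma to reduce the problem to showing that, if
\[
\check\omega_{k^m}(t)=\begin{cases}\tilde\omega_{k^m}(t+k^{-1})-\tilde\omega_{k^m}(k^{-1})&\text{if }t\ge0\\ \tilde\omega_{k^m}(t-k^{-1})-\tilde\omega_{k^m}(-k^{-1})&\text{if }t<0,\end{cases}
\]
then
\[
\sum_{m=1}^\infty\mathcal W_0^{(B)}\bigl(\|\check\omega_{k^{2m}}-h\|_{\Omega_0(B)}\le\delta\bigr)=\infty
\]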
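for each $\delta>0$, $k\ge2$, and $h\in B_{H_0^1(\mathbb R;H)}(0,1)$. Finally, given $h\in B_{H_0^1(\mathbb R;H)}(0,1)$, set $R=\frac{1+\|h\|_{H_0^1(\mathbb R;H)}}{2}\in\bigl(\|h\|_{H_0^1(\mathbb R;H)},1\bigr)$. Because $\bigl(\mathcal W_0^{(B)}\bigr)_{(k^m\Lambda^{-1}_{k^{2m}})^2}$ is the $\mathcal W_0^{(B)}$-distribution of $\omega\rightsquigarrow\check\omega_{k^{2m}}$, the lower bound in (4.1.7) with $\Gamma=B_{\Omega_0(B)}(h,\delta)$ says that
\[
\mathcal W_0^{(B)}\bigl(\|\check\omega_{k^{2m}}-h\|_{\Omega_0(B)}\le\delta\bigr)\ge e^{-R^2\log(\log k^{2m})}=(2m\log k)^{-R^2}
\]
for sufficiently large $m$. Since $R<1$, the sum over $m$ of the right hand side diverges.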
An essentially trivial corollary of Theorem 4.2.4 is the law of the iterated logarithm for centered Gaussian random variables with values in a separable Banach space $B$. Indeed, if $(H,B,\mathcal W)$ is an associated abstract Wiener space and $S_n$ is the $n$th partial sum of such random variables, then $\{S_n : n\ge1\}$ has the same distribution as $\{\omega(n) : n\ge1\}$ under $\mathcal W_0^{(B)}$, and therefore, almost surely,
\[
\lim_{n\to\infty}\Bigl\|\frac{S_n}{\Lambda_n}-\bigl\{h(1) : h\in\overline{B_{H_0^1(\mathbb R;H)}(0,1)}\bigr\}\Bigr\|_B=0
\]
and
\[
\liminf_{n\to\infty}\Bigl\|\frac{S_n}{\Lambda_n}-h(1)\Bigr\|_B=0\quad\text{for all }h\in\overline{B_{H_0^1(\mathbb R;H)}(0,1)}.
\]
Further, by taking $h(t)=(1\wedge t)^+g$ for $g\in H$, one sees that $\overline{B_H(0,1)}=\bigl\{h(1) : h\in\overline{B_{H_0^1(\mathbb R;H)}(0,1)}\bigr\}$.

The preceding corollary is far less interesting than the one Strassen had in mind. Namely, by combining his theorem when $H=\mathbb R$ with a beautiful idea of A. Skorokhod (cf. Chap. 7 in [12]), he realized that it provides an elegant proof of the Hartman–Wintner law of the iterated logarithm. What Skorokhod had shown is that if $X$ is an $\mathbb R$-valued random variable with mean value 0 and variance 1 and if $\{B(t) : t\in\mathbb R\}$ is a Brownian motion, then there is a stopping time $\sigma$ for $\{B(t) : t\ge0\}$ such that $B(\sigma)$ has the same distribution as $X$. For example, if $\mathbb P(X=\pm1)=\frac12$, then one can take $\sigma$ to be the first time $t\ge0$ that $|B(t)|=1$. In any case, the expected value of $\sigma$ is $\mathbb E[X^2]=1$, and $\{B(t+\sigma)-B(\sigma) : t\ge0\}$ is independent of $B(\sigma)$ and has the same distribution as $\{B(t) : t\ge0\}$. Now, use induction to construct a sequence of stopping times $\{\zeta_n : n\ge0\}$ so that $\zeta_0=0$ and, for $n\ge1$, $\tau_n=\zeta_n-\zeta_{n-1}$ is the stopping time $\sigma$ relative to $\{B(t+\zeta_{n-1})-B(\zeta_{n-1}) : t\ge0\}$. Then $\{\tau_n : n\ge1\}$ is a sequence of mutually independent, identically distributed random variables with mean value 1, and $\{B(\zeta_n)-B(\zeta_{n-1}) : n\ge1\}$ is a sequence of mutually independent random variables each of which has the same distribution as $X$. Therefore proving the law of the iterated logarithm for $X$ comes down to understanding the behavior of $\frac{B(\zeta_n)}{\Lambda_n}$ as $n\to\infty$. To this end, set $\tilde B_n(t)=\frac{B(nt)}{\Lambda_n}$, and observe that, by Theorem 4.2.4 applied when $H=\mathbb R$, $\{\tilde B_n : n\ge1\}$ is almost surely relatively compact in $\Omega_0(\mathbb R)$, and so, since, by the law of large numbers, $\frac{\zeta_n}{n}\longrightarrow1$ almost surely,
\[
\frac{B(\zeta_n)}{\Lambda_n}-\tilde B_n(1)=\tilde B_n\bigl(n^{-1}\zeta_n\bigr)-\tilde B_n(1)\longrightarrow0
\]
almost surely. Finally, we know that, almost surely, $\{\tilde B_n(1) : n\ge1\}$ is relatively compact and that its set of limit points is the closure of $\{h(1) : \|h\|_{H_0^1(\mathbb R;\mathbb R)}\le1\}$ in $\mathbb R$ and is therefore the interval $[-1,1]$.

Exercise 4.2.1 For $\omega\in\Omega_0(B)$, define
\[
\omega^*(t)=\begin{cases}|t|\,\omega\bigl(\tfrac1t\bigr)&\text{if }t\ne0\\ 0&\text{if }t=0.\end{cases}
\]
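(i) Show that $\omega^*\in\Omega_0(B)$ and that $\omega\rightsquigarrow\omega^*$ is an isometric, linear map of $\Omega_0(B)$ onto itself. Further show that $h^*\in H_0^1(\mathbb R;H)$ for $h\in H_0^1(\mathbb R;H)$ and that $h\rightsquigarrow h^*$ is an orthogonal transformation on $H_0^1(\mathbb R;H)$. In particular, conclude that $\omega^*$ has the same distribution under $\mathcal W_0^{(B)}$ as $\omega$.
(ii) Set
\[
X_n(t,\omega)=\frac{\sqrt n\,\omega\bigl(\frac tn\bigr)}{\sqrt{2\log_{(2)}(n\vee3)}},
\]
and show that, for $\mathcal W_0^{(B)}$-almost every $\omega\in\Omega_0(B)$, $\{X_n(\,\cdot\,,\omega) : n\ge1\}$ is a relatively compact sequence for which $\overline{B_{H_0^1(\mathbb R;H)}(0,1)}$ is the set of limit points.
Hint: Note that $\|X_n(\,\cdot\,,\omega)-h\|_{\Omega_0(B)}=\|X_n(\,\cdot\,,\omega)^*-h^*\|_{\Omega_0(B)}$ and that $X_n(t,\omega)^*=\Lambda_n^{-1}\,n|t|\,\omega\bigl(\frac{1}{nt}\bigr)$. Conclude that $\|X_n(\,\cdot\,,\omega)-h\|_{\Omega_0(B)}=\|\widetilde{(\omega^*)}_n-h^*\|_{\Omega_0(B)}$.
(iii) Let $(H,B,\mathcal W)$ be an abstract Wiener space, $X_0$ a $B$-valued random variable with distribution $\mathcal W$, and $\{B(t) : t\in\mathbb R\}$ a $B$-valued Brownian motion with distribution $\mathcal W_0^{(B)}$ that is independent of $X_0$. Define
\[
X(t)=2^{-\frac12}\Bigl(e^{-|t|}X_0+e^{-|t|}B\bigl(\operatorname{sgn}(t)(e^{2|t|}-1)\bigr)\Bigr),
\]
and show that, for each $\xi\in B^*$, $\{\langle X(t),\xi\rangle : t\in\mathbb R\}$ is an Ornstein–Uhlenbeck process with covariance $\frac12e^{-|t-s|}\|h_\xi\|_H^2$.
(iv) Continuing part (iii), let $H^1(\mathbb R;H)$ be the Hilbert space of absolutely continuous $h:\mathbb R\longrightarrow H$ for which
\[
\|h\|_{H^1(\mathbb R;H)}\equiv\Bigl(\|h\|^2_{L^2(\lambda_{\mathbb R};H)}+\|\dot h\|^2_{L^2(\lambda_{\mathbb R};H)}\Bigr)^{\frac12}<\infty.
\]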
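In addition, take $\Omega(B)$ to be the space of continuous $\omega:\mathbb R\longrightarrow B$ satisfying $\lim_{|t|\to\infty}\frac{\|\omega(t)\|_B}{1+|t|}=0$, and check that $\Omega(B)$ with norm $\|\omega\|_{\Omega(B)}=\sup_{t\in\mathbb R}\frac{\|\omega(t)\|_B}{1+|t|}$ is a separable Banach space. Show that the paths $t\rightsquigarrow X(t)$ are almost surely elements of $\Omega(B)$ and that $\bigl(H^1(\mathbb R;H),\Omega(B),\mathcal U^{(B)}\bigr)$ is an abstract Wiener space if $\mathcal U^{(B)}$ is the distribution of $X$.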
4.3 One Dimensional Euclidean Fields
In this section we will be studying Hilbert spaces for which the only associated Banach spaces contain generalized functions (i.e., distributions) on which point evaluation is not a continuous functional. Such abstract Wiener spaces describe what physicists call a field, and when, like $L^2(\lambda_{\mathbb R^N};\mathbb R)$, the Hilbert space is invariant under the Euclidean group, it is called a Euclidean field.

4.3.1 Some Background
Recall the normalized Hermite functions $\{\tilde h_n : n\ge0\}$ introduced in Sect. 2.3.3. Not only do they form an orthonormal basis in $L^2(\lambda_{\mathbb R};\mathbb R)$, they also form a basis in L. Schwartz's test function space $\mathscr S(\mathbb R;\mathbb R)$ of smooth functions $\varphi$ all of whose derivatives are rapidly decreasing in the sense that
\[
\lim_{|x|\to\infty}|x|^k\,\partial^\ell\varphi(x)=0\quad\text{for all }k,\ell\in\mathbb N.
\]
To be precise, using the results in Sect. 2.3.3 (cf. part (iii) in Exercise 2.3.1 and Sect. 7.3.4 in [15] for more details), especially (2.3.20) and (2.3.22), one can show that $\varphi\in\mathscr S(\mathbb R;\mathbb R)$ if and only if
\[
\|\varphi\|_{\mathscr S^{(m)}(\mathbb R;\mathbb R)}\equiv\Bigl(\sum_{n=0}^\infty\bigl(n+\tfrac12\bigr)^m\bigl(\varphi,\tilde h_n\bigr)^2_{L^2(\lambda_{\mathbb R};\mathbb R)}\Bigr)^{\frac12}<\infty\quad\text{for all }m\in\mathbb N,
\]
and that $\mathscr S(\mathbb R;\mathbb R)$ becomes a complete, separable metric space when one uses the metric
\[
\rho(\varphi,\psi)=\sum_{m=0}^\infty2^{-m}\frac{\|\varphi-\psi\|_{\mathscr S^{(m)}(\mathbb R;\mathbb R)}}{1+\|\varphi-\psi\|_{\mathscr S^{(m)}(\mathbb R;\mathbb R)}}.
\]
As a consequence, the dual space $\mathscr S^*(\mathbb R;\mathbb R)$ of tempered distributions can be described as the set of linear functionals $u$ on $\mathscr S(\mathbb R;\mathbb R)$ for which the sequence $\{\langle\tilde h_n,u\rangle : n\ge0\}$ has at most polynomial growth, and the action of $u\in\mathscr S^*(\mathbb R;\mathbb R)$ on $\varphi\in\mathscr S(\mathbb R;\mathbb R)$ is given by
\[
\langle\varphi,u\rangle=\sum_{n=0}^\infty\bigl(\varphi,\tilde h_n\bigr)_{L^2(\lambda_{\mathbb R};\mathbb R)}\langle\tilde h_n,u\rangle.\tag{4.3.1}
\]
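The preceding leads to a natural decomposition of $\mathscr S^*(\mathbb R;\mathbb R)$ into subspaces corresponding to the relationship of their elements to $L^2(\lambda_{\mathbb R};\mathbb R)$. Namely, set $\lambda_n=n+\frac12$, and, for $m\ge0$, define $\mathscr S^{(-m)}(\mathbb R;\mathbb R)$ to be the space of $u\in\mathscr S^*(\mathbb R;\mathbb R)$ for which
\[
\|u\|_{\mathscr S^{(-m)}(\mathbb R;\mathbb R)}\equiv\Bigl(\sum_{n=0}^\infty\lambda_n^{-m}\langle\tilde h_n,u\rangle^2\Bigr)^{\frac12}<\infty.
\]
Clearly $\mathscr S^{(-m)}(\mathbb R;\mathbb R)$ is a separable Hilbert space with inner product
\[
(u,v)_{\mathscr S^{(-m)}(\mathbb R;\mathbb R)}=\sum_{n=0}^\infty\lambda_n^{-m}\langle\tilde h_n,u\rangle\langle\tilde h_n,v\rangle
\]
and, if $h_n^{(-m)}=\lambda_n^{\frac m2}\tilde h_n$, then $\{h_n^{(-m)} : n\ge0\}$ is an orthonormal basis for it. Alternatively, let $\mathcal H$ be the Hermite operator in (2.3.21), and define the operators $(-\mathcal H)^{\frac m2}$ for $m\in\mathbb Z$ by
\[
(-\mathcal H)^{\frac m2}\varphi=\sum_{n=0}^\infty\lambda_n^{\frac m2}\bigl(\varphi,\tilde h_n\bigr)_{L^2(\lambda_{\mathbb R};\mathbb R)}\tilde h_n\quad\text{for }\varphi\in\mathscr S(\mathbb R;\mathbb R).
\]
These operators are self-adjoint and therefore extend to $\mathscr S^*(\mathbb R;\mathbb R)$ by defining $(-\mathcal H)^{\frac m2}u$ so that
\[
\bigl\langle\varphi,(-\mathcal H)^{\frac m2}u\bigr\rangle=\bigl\langle(-\mathcal H)^{\frac m2}\varphi,u\bigr\rangle.
\]
Since $(-\mathcal H)^{\frac m2}\tilde h_n=\lambda_n^{\frac m2}\tilde h_n$, it should now be clear that $u\in\mathscr S^{(-m)}(\mathbb R;\mathbb R)\iff(-\mathcal H)^{-\frac m2}u\in L^2(\lambda_{\mathbb R};\mathbb R)$ and that $(-\mathcal H)^{\frac m2}$ maps $L^2(\lambda_{\mathbb R};\mathbb R)$ isometrically onto $\mathscr S^{(-m)}(\mathbb R;\mathbb R)$. Finally, if $Q_t$ is the operator in (2.3.24), then, by (2.3.25),
\[
\langle\tilde h_n,Q_tu\rangle=\langle Q_t\tilde h_n,u\rangle=e^{-\lambda_nt}\langle\tilde h_n,u\rangle,
\]
and so $Q_t$ maps $\mathscr S^*(\mathbb R;\mathbb R)$ into $\mathscr S(\mathbb R;\mathbb R)$ and, as $t\searrow0$, $Q_tu\longrightarrow u$ in $\mathscr S^{(-m)}(\mathbb R;\mathbb R)$ if $u\in\mathscr S^{(-m)}(\mathbb R;\mathbb R)$.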
4.3.2 An Abstract Wiener Space for $L^2(\lambda_{\mathbb R};\mathbb R)$
The Hilbert space $L^2(\lambda_{\mathbb R};\mathbb R)$ is an example of a Hilbert space that requires the preceding considerations. Indeed, if $B$ were a Banach space of distributions for which evaluation at a point were represented by an element of $B^*$, then there would have to exist an $f_0\in L^2(\lambda_{\mathbb R};\mathbb R)$ such that $(\varphi,f_0)_{L^2(\lambda_{\mathbb R};\mathbb R)}=\varphi(0)$ when $\varphi\in\mathscr S(\mathbb R;\mathbb R)$. But
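this would mean that $|\varphi(0)|\le\|f_0\|_{L^2(\lambda_{\mathbb R};\mathbb R)}\|\varphi\|_{L^2(\lambda_{\mathbb R};\mathbb R)}$, and so $\|f_0\|_{L^2(\lambda_{\mathbb R};\mathbb R)}$ would have to be infinite.

Proceeding as in the proof of Theorem 3.3.2, define $A:\mathscr S^{(-2)}(\mathbb R;\mathbb R)\longrightarrow\mathscr S^{(-2)}(\mathbb R;\mathbb R)$ by
\[
Au=\sum_{n=0}^\infty\lambda_n^{-1}\bigl(h_n^{(-2)},u\bigr)_{\mathscr S^{(-2)}(\mathbb R;\mathbb R)}\tilde h_n,
\]
and observe that
\[
\sum_{n=0}^\infty\bigl(h_n^{(-2)},Ah_n^{(-2)}\bigr)_{\mathscr S^{(-2)}(\mathbb R;\mathbb R)}=\sum_{n=0}^\infty\lambda_n^{-2}<\infty.
\]
Hence, by Theorem 3.2.8, there is a centered Gaussian measure $\mathcal W_L\in M_1\bigl(\mathscr S^{(-2)}(\mathbb R;\mathbb R)\bigr)$ for which $(u,Av)_{\mathscr S^{(-2)}(\mathbb R;\mathbb R)}$ is the covariance function. Moreover, given $u\in\mathscr S^{(-2)}(\mathbb R;\mathbb R)$, set $f_u=Au$, note that $f_u\in L^2(\lambda_{\mathbb R};\mathbb R)$, and check that, for $g\in L^2(\lambda_{\mathbb R};\mathbb R)$,
\[
(g,u)_{\mathscr S^{(-2)}(\mathbb R;\mathbb R)}=\sum_{n=0}^\infty\bigl(g,h_n^{(-2)}\bigr)_{\mathscr S^{(-2)}(\mathbb R;\mathbb R)}\bigl(h_n^{(-2)},u\bigr)_{\mathscr S^{(-2)}(\mathbb R;\mathbb R)}=(g,f_u)_{L^2(\lambda_{\mathbb R};\mathbb R)}
\]
and that $\|f_u\|^2_{L^2(\lambda_{\mathbb R};\mathbb R)}=(u,Au)_{\mathscr S^{(-2)}(\mathbb R;\mathbb R)}$. Therefore, since $L^2(\lambda_{\mathbb R};\mathbb R)$ is continuously embedded as a dense subspace of $\mathscr S^{(-2)}(\mathbb R;\mathbb R)$, we have proved the following theorem.

Theorem 4.3.1 $\bigl(L^2(\lambda_{\mathbb R};\mathbb R),\mathscr S^{(-2)}(\mathbb R;\mathbb R),\mathcal W_L\bigr)$ is an abstract Wiener space.

There is an interesting connection between this abstract Wiener space and Brownian motion. Namely, define
\[
B(t)=\begin{cases}\mathcal I\bigl(\mathbf 1_{[0,t]}\bigr)&\text{for }t\ge0\\ -\mathcal I\bigl(\mathbf 1_{[t,0]}\bigr)&\text{for }t<0.\end{cases}
\]
Then $\{B(t) : t\in\mathbb R\}$ is a centered Gaussian process under $\mathcal W_L$ with the covariance function $w$ in (2.5.3). Hence there is a continuous version of $\{B(t) : t\in\mathbb R\}$, and this version will be a Brownian motion. If elements of $\mathscr S^{(-2)}(\mathbb R;\mathbb R)$ were bona fide functions and we pretended that $\mathcal I(f)(u)=(f,u)_{L^2(\lambda_{\mathbb R};\mathbb R)}$, then the preceding would be saying that
\[
B(t,u)=\begin{cases}\int_0^tu(\tau)\,d\tau&\text{if }t\ge0\\ -\int_t^0u(\tau)\,d\tau&\text{if }t<0.\end{cases}
\]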
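Alternatively, "$u(t)=\dot B(t,u)$." To make this mathematically kosher, we have to formulate it in the language of distribution theory. That is, given $v\in\mathscr S^*(\mathbb R;\mathbb R)$, $\partial v$ is the element of $\mathscr S^*(\mathbb R;\mathbb R)$ satisfying $\langle\varphi,\partial v\rangle=-\langle\varphi',v\rangle$ for all $\varphi\in\mathscr S(\mathbb R;\mathbb R)$. To see that $u=\partial B(\,\cdot\,,u)$ in this sense, set
\[
f_t=\begin{cases}\mathbf 1_{[0,t]}&\text{if }t\ge0\\ -\mathbf 1_{[t,0]}&\text{if }t<0\end{cases}
\]
and (cf. (2.3.24)) $B_s(t,u)=\langle Q_sf_t,u\rangle$ for $s>0$. Then, as $s\searrow0$,
\[
\|B_s(t,\,\cdot\,)-B(t,\,\cdot\,)\|_{L^2(\mathcal W_L;\mathbb R)}=\|Q_sf_t-f_t\|_{L^2(\lambda_{\mathbb R};\mathbb R)}\longrightarrow0\tag{$*$}
\]
uniformly for $t$ in compact subsets. At the same time,
\[
\langle Q_sf_t,u\rangle=\lim_{\sigma\searrow0}\bigl(Q_sf_t,Q_\sigma u\bigr)_{L^2(\lambda_{\mathbb R};\mathbb R)}=\lim_{\sigma\searrow0}\bigl(f_t,Q_{s+\sigma}u\bigr)_{L^2(\lambda_{\mathbb R};\mathbb R)}=\lim_{\sigma\searrow0}\bigl(Q_\sigma f_t,Q_su\bigr)_{L^2(\lambda_{\mathbb R};\mathbb R)}=\bigl(f_t,u_s\bigr)_{L^2(\lambda_{\mathbb R};\mathbb R)},
\]
where $u_s=Q_su\in\mathscr S(\mathbb R;\mathbb R)$, and so
\[
B_s(t,u)=\begin{cases}\int_0^tu_s(\tau)\,d\tau&\text{if }t\ge0\\ -\int_t^0u_s(\tau)\,d\tau&\text{if }t<0.\end{cases}
\]
Hence, for any $\varphi\in\mathscr S(\mathbb R;\mathbb R)$,
\[
\int\varphi'(t)B_s(t,u)\,dt=-\int\varphi(t)u_s(t)\,dt,
\]
and therefore
\[
\Bigl(\int\Bigl(\int\varphi'(t)B(t,u)\,dt+\langle\varphi,u\rangle\Bigr)^2\mathcal W_L(du)\Bigr)^{\frac12}
\le\Bigl(\int\Bigl(\int\varphi'(t)\bigl(B(t,u)-B_s(t,u)\bigr)\,dt\Bigr)^2\mathcal W_L(du)\Bigr)^{\frac12}+\Bigl(\int\langle\varphi,u-u_s\rangle^2\,\mathcal W_L(du)\Bigr)^{\frac12}.
\]
By Minkowski's inequality and ($*$), the first term on the right tends to 0 as $s\searrow0$, and, because $u_s\longrightarrow u$ in $\mathscr S^{(-2)}(\mathbb R;\mathbb R)$ and $\|u_s\|_{\mathscr S^{(-2)}(\mathbb R;\mathbb R)}\le\|u\|_{\mathscr S^{(-2)}(\mathbb R;\mathbb R)}\in$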
$L^2(\mathcal W_L;\mathbb R)$, the second term does also. Finally, because $\mathscr S(\mathbb R;\mathbb R)$ is separable, having proved that $\langle\varphi',B(\,\cdot\,,u)\rangle=-\langle\varphi,u\rangle$ (a.s., $\mathcal W_L$) for each $\varphi\in\mathscr S(\mathbb R;\mathbb R)$, it follows that $u=\partial B(\,\cdot\,,u)$ for $\mathcal W_L$-almost every $u\in\mathscr S^{(-2)}(\mathbb R;\mathbb R)$.

As a dividend of these considerations, we see that $\mathcal W_L$ lives on a much smaller class of distributions than $\mathscr S^{(-2)}(\mathbb R;\mathbb R)$. In fact, we now know that $\mathcal W_L$ is supported on the space of tempered distributions that are the first distributional derivative of functions that are Hölder continuous of every order less than $\frac12$ and grow at infinity slower than every power greater than $\frac12$.

In the engineering literature, the derivative of Brownian motion is known as white noise. That is because, if one pretends that $\dot B(t)$ exists in a classical sense, then the process $\{\dot B(t) : t\in\mathbb R\}$ would be a totally uncorrelated Gaussian process. In fact, its covariance function $c(s,t)$ would be $\infty$ when $s=t$ and 0 when $s\ne t$. Hence, for each $t\in\mathbb R$, $\dot B(t)$ would be a centered Gaussian random variable with infinite variance which is independent of $\{\dot B(s) : s\ne t\}$. Admittedly, this picture is much more intuitively appealing than the more mathematically correct one given above, but loss of intuition is a price that mathematicians are accustomed to paying in the pursuit of rigor.

The Ornstein–Uhlenbeck process is another process that can be constructed starting from $\bigl(L^2(\lambda_{\mathbb R};\mathbb R),\mathscr S^{(-2)}(\mathbb R;\mathbb R),\mathcal W_L\bigr)$ by the procedure we just used to construct Brownian motion. The reason why the Gaussian process $\{\mathcal I(f_t) : t\in\mathbb R\}$ under $\mathcal W_L$ has the distribution of a Brownian motion is that $(f_s,f_t)_{L^2(\lambda_{\mathbb R};\mathbb R)}=w(s,t)$. Thus, what is needed to construct an Ornstein–Uhlenbeck process is a family $\{\psi_t : t\in\mathbb R\}\subseteq L^2(\lambda_{\mathbb R};\mathbb R)$ for which $(\psi_s,\psi_t)_{L^2(\lambda_{\mathbb R};\mathbb R)}=\frac12e^{-|t-s|}$. To find such a family, remember (cf. (v) in Exercise 1.2.1) that $e^{-|t|}$ is the characteristic function of the Cauchy distribution $\frac{1}{\pi(1+\xi^2)}$, and conclude that we can take $\psi_t\in L^2(\lambda_{\mathbb R};\mathbb R)$ to be the function whose Fourier transform is $e^{it\xi}(1+\xi^2)^{-\frac12}$. Another expression of $\psi_t$ is given in (ii) of Exercise 4.4.1.
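The inner product claimed for the family $\{\psi_t\}$ can be sanity-checked numerically. The following is only a sketch under stated assumptions: it takes the Fourier convention $\hat f(\xi)=\int e^{i\xi x}f(x)\,dx$, so that Parseval's identity carries the factor $\frac{1}{2\pi}$, and it truncates the frequency integral at a finite cutoff. The function name and its parameters are illustrative, not taken from the text.

```python
import numpy as np


def ou_kernel_inner_product(tau, cutoff=2000.0, step=0.01):
    """Approximate (1/(2*pi)) * integral over [-cutoff, cutoff] of exp(i*tau*xi) / (1 + xi^2).

    Assuming the convention f_hat(xi) = int exp(i*xi*x) f(x) dx, Parseval's identity says
    this is the L^2 inner product of psi_s and psi_t when t - s = tau.
    """
    n = int(round(2.0 * cutoff / step))
    xi = np.linspace(-cutoff, cutoff, n + 1)
    h = 2.0 * cutoff / n
    # The sine part of exp(i*tau*xi) is odd and integrates to zero, so only the cosine is kept.
    integrand = np.cos(tau * xi) / (1.0 + xi**2)
    # Composite trapezoid rule, written out by hand.
    integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / (2.0 * np.pi)


if __name__ == "__main__":
    for tau in (0.0, 0.5, 1.0, 2.0):
        # Second column: truncated numerical value; third column: (1/2) * exp(-|tau|).
        print(tau, ou_kernel_inner_product(tau), 0.5 * np.exp(-abs(tau)))
```

The truncation error is of order $\frac{1}{\pi\cdot\text{cutoff}}$ at $\tau=0$ and smaller for $\tau\ne0$, so the two printed columns should agree to about three decimal places with the default cutoff.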
4.4 Euclidean Fields in Higher Dimensions
As we have seen, the abstract Wiener space for $L^2(\lambda_{\mathbb R};\mathbb R)$ already requires the introduction of distributions, albeit distributions of a mild order. Although the order of the distributions required goes up with dimension, only a few new ideas are required.
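4.4.1 An Abstract Wiener Space for $L^2(\lambda_{\mathbb R^N};\mathbb R)$
The first step is to introduce the Hermite functions for $\mathbb R^N$. Given $\mathbf n=(n_1,\dots,n_N)\in\mathbb N^N$, define
\[
\tilde h_{\mathbf n}(x)=\prod_{j=1}^N\tilde h_{n_j}(x_j)\quad\text{for }x\in\mathbb R^N.
\]
Then $\{\tilde h_{\mathbf n} : \mathbf n\in\mathbb N^N\}$ forms an orthonormal basis in $L^2(\lambda_{\mathbb R^N};\mathbb R)$. In addition, for each $\mathbf n$,
\[
\mathcal H\tilde h_{\mathbf n}=-\lambda_{\mathbf n}\tilde h_{\mathbf n},\quad\text{where }\mathcal H=\tfrac12\bigl(\Delta-|x|^2\bigr),\ \lambda_{\mathbf n}=\frac N2+\|\mathbf n\|_1,\ \text{and }\|\mathbf n\|_1=\sum_{j=1}^Nn_j.
\]
The first of these is just the standard construction of bases on product spaces from bases on the factors, and the second is an easy consequence of (2.3.21). Further, these Hermite functions play the same role in Schwartz's theory of tempered distributions on $\mathbb R^N$ as their antecedents do for his theory of tempered distributions on $\mathbb R$. That is, the test function space $\mathscr S(\mathbb R^N;\mathbb R)$ consists of $\varphi\in L^2(\lambda_{\mathbb R^N};\mathbb R)$ with the property that
\[
\sum_{\mathbf n\in\mathbb N^N}\lambda_{\mathbf n}^m\bigl(\varphi,\tilde h_{\mathbf n}\bigr)^2_{L^2(\lambda_{\mathbb R^N};\mathbb R)}<\infty\quad\text{for all }m\ge0,
\]
and the space $\mathscr S^*(\mathbb R^N;\mathbb R)$ of tempered distributions is the set of linear functionals $u$ on $\mathscr S(\mathbb R^N;\mathbb R)$ with the property that
\[
\sum_{\mathbf n\in\mathbb N^N}\lambda_{\mathbf n}^{-m}\langle\tilde h_{\mathbf n},u\rangle^2<\infty\quad\text{for some }m\ge0.
\]
Finally, for each $m\ge0$, one takes $\mathscr S^{(-m)}(\mathbb R^N;\mathbb R)$ to be the separable Hilbert space of $u\in\mathscr S^*(\mathbb R^N;\mathbb R)$ for which
\[
\|u\|_{\mathscr S^{(-m)}(\mathbb R^N;\mathbb R)}=\Bigl(\sum_{\mathbf n\in\mathbb N^N}\lambda_{\mathbf n}^{-m}\langle\tilde h_{\mathbf n},u\rangle^2\Bigr)^{\frac12}<\infty.
\]
Obviously, if $h_{\mathbf n}^{(-m)}=\lambda_{\mathbf n}^{\frac m2}\tilde h_{\mathbf n}$, then $\{h_{\mathbf n}^{(-m)} : \mathbf n\in\mathbb N^N\}$ is an orthonormal basis for $\mathscr S^{(-m)}(\mathbb R^N;\mathbb R)$.

In view of the preceding preparations, it should be clear how to go about constructing an abstract Wiener space for $L^2(\lambda_{\mathbb R^N};\mathbb R)$. Indeed, define $A:\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R)\longrightarrow\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R)$ by
\[
Au=\sum_{\mathbf n\in\mathbb N^N}\lambda_{\mathbf n}^{-\frac{N+1}{2}}\bigl(h_{\mathbf n}^{(-N-1)},u\bigr)_{\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R)}\tilde h_{\mathbf n}.
\]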
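Then
\[
\sum_{\mathbf n\in\mathbb N^N}\bigl(h_{\mathbf n}^{(-N-1)},Ah_{\mathbf n}^{(-N-1)}\bigr)_{\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R)}
=\sum_{\ell=0}^\infty\Bigl(\ell+\frac N2\Bigr)^{-N-1}\operatorname{card}\bigl\{\mathbf n : \|\mathbf n\|_1=\ell\bigr\}
\le\sum_{\ell=0}^\infty\frac{(\ell+1)^{N-1}}{\bigl(\ell+\frac N2\bigr)^{N+1}}<\infty.
\]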
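The counting bound used in the trace estimate above is easy to check by machine. The sketch below relies on the standard stars-and-bars count $\operatorname{card}\{\mathbf n\in\mathbb N^N : \|\mathbf n\|_1=\ell\}=\binom{\ell+N-1}{N-1}$, which is a known fact rather than something stated in the text; the function names are illustrative only.

```python
from itertools import product
from math import comb, pi


def count_multi_indices(N, ell):
    """Number of n in N^N (non-negative integer entries) with n_1 + ... + n_N = ell (stars and bars)."""
    return comb(ell + N - 1, N - 1)


def brute_force_count(N, ell):
    """The same count obtained by listing every candidate multi-index; only for small N and ell."""
    return sum(1 for n in product(range(ell + 1), repeat=N) if sum(n) == ell)


def trace_partial_sum(N, terms):
    """Partial sum over ell < terms of card{n : |n|_1 = ell} * (ell + N/2)^(-N-1)."""
    return sum(count_multi_indices(N, ell) / (ell + N / 2) ** (N + 1) for ell in range(terms))


if __name__ == "__main__":
    # The bound card{n : |n|_1 = ell} <= (ell + 1)^(N - 1) used in the text.
    for N in range(1, 6):
        assert all(count_multi_indices(N, ell) <= (ell + 1) ** (N - 1) for ell in range(200))
    # For N = 1 the series is sum (ell + 1/2)^(-2), whose value is pi^2 / 2.
    print(trace_partial_sum(1, 100_000), pi**2 / 2)
    for N in (2, 3, 4):
        print(N, trace_partial_sum(N, 10_000))
```

The partial sums stabilise quickly for each fixed $N$, which is consistent with the trace of $A$ being finite.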
Hence, by Theorem 3.2.8, there is a centered Gaussian measure $\mathcal W_{L_N}\in M_1\bigl(\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R)\bigr)$ for which $(v,Au)_{\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R)}$ is the covariance function. Next, for a given $u\in\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R)$, set $f_u=Au$, and check that $(g,u)_{\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R)}=(g,f_u)_{L^2(\lambda_{\mathbb R^N};\mathbb R)}$ for $g\in L^2(\lambda_{\mathbb R^N};\mathbb R)$ and $\|f_u\|^2_{L^2(\lambda_{\mathbb R^N};\mathbb R)}=(u,Au)_{\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R)}$. Therefore $\bigl(L^2(\lambda_{\mathbb R^N};\mathbb R),\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R),\mathcal W_{L_N}\bigr)$ is an abstract Wiener space.

When $N\ge2$, there is no true analog of Brownian motion because there is no way to interpret the Hilbert space $H_0^1(\mathbb R^N;\mathbb R)$ as a space of tempered distributions. One might naïvely guess that it should be the completion of $\{h\in C_c^1(\mathbb R^N;\mathbb R) : h(0)=0\}$ with respect to the norm $\|\nabla h\|_{L^2(\mathbb R^N;\mathbb R^N)}$, but, because, when $N\ge2$, elements of this completion are defined only up to a set of measure 0, the condition $h(0)=0$ gets lost and prevents this completion from being a space of distributions. See part (i) of Exercise 4.4.1 for more details.

In spite of the preceding, there is an approximate analog of Brownian motion, known as the Brownian sheet, in higher dimensions. To describe it, for $\mathbf t=(t_1,\dots,t_N)\in\mathbb R^N$ define
\[
f_{\mathbf t}(x)=\prod_{j=1}^N\Bigl(\mathbf 1_{[0,t_j^+]}(x_j)-\mathbf 1_{[t_j^-,0)}(x_j)\Bigr)\quad\text{for }x=(x_1,\dots,x_N)\in\mathbb R^N,
\]
and set $B(\mathbf t)=\mathcal I(f_{\mathbf t})$. Then $\{B(\mathbf t) : \mathbf t\in\mathbb R^N\}$ is a centered Gaussian family under $\mathcal W_{L_N}$ with covariance function
\[
\mathbb E^{\mathcal W_{L_N}}\bigl[B(\mathbf s)B(\mathbf t)\bigr]=\begin{cases}\prod_{j=1}^N|s_j|\wedge|t_j|&\text{if }s_jt_j\ge0\text{ for all }1\le j\le N\\ 0&\text{otherwise.}\end{cases}
\]
As a consequence, one finds that $B(\mathbf t)-B(\mathbf s)$ is independent of $B(\mathbf s)$ if $s_j\le t_j$ for all $1\le j\le N$ and that there is a $C<\infty$ such that
\[
\mathbb E^{\mathcal W_{L_N}}\bigl[\bigl(B(\mathbf t)-B(\mathbf s)\bigr)^2\bigr]\le CT^{N-1}|\mathbf t-\mathbf s|\quad\text{for }T\ge1\text{ and }\mathbf s,\mathbf t\in[-T,T]^N.
\]
Hence, by Corollary 2.5.5, there is a version of $\{B(\mathbf t) : \mathbf t\in\mathbb R^N\}$ which is Hölder continuous of every order less than $\frac12$. Finally, by essentially the same argument as we used when $N=1$, one can show that
\[
\frac{\partial^NB(\mathbf t,u)}{\partial t_1\cdots\partial t_N}=u\quad\text{for }\mathcal W_{L_N}\text{-almost every }u\in\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R).
\]
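To make the covariance structure of the Brownian sheet concrete, here is a small sketch that evaluates the covariance formula stated above and derives from it the variance of an increment by expanding the square. It is only an illustration; the function names are hypothetical, and nothing is simulated.

```python
from math import prod


def sheet_covariance(s, t):
    """E[B(s)B(t)]: the product of |s_j| ^ |t_j| if every s_j * t_j >= 0, and 0 otherwise."""
    if any(sj * tj < 0 for sj, tj in zip(s, t)):
        return 0.0
    return float(prod(min(abs(sj), abs(tj)) for sj, tj in zip(s, t)))


def increment_variance(s, t):
    """E[(B(t) - B(s))^2] = E[B(t)^2] - 2 E[B(s)B(t)] + E[B(s)^2]."""
    return sheet_covariance(t, t) - 2.0 * sheet_covariance(s, t) + sheet_covariance(s, s)


if __name__ == "__main__":
    s, t = (1.0, 2.0), (3.0, 1.0)
    print(sheet_covariance(s, t))      # overlap of the rectangles [0,1]x[0,2] and [0,3]x[0,1]
    print(increment_variance(s, t))    # area of their symmetric difference
    print(sheet_covariance((1.0, -2.0), (3.0, 1.0)))  # different orthants
```

When $\mathbf s$ and $\mathbf t$ lie in the same orthant, the increment variance is the Lebesgue measure of the symmetric difference of the two coordinate boxes, which is the geometric content of the bound $CT^{N-1}|\mathbf t-\mathbf s|$.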
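4.4.2 The Ornstein–Uhlenbeck Field in Higher Dimensions
As distinguished from $H_0^1(\mathbb R;\mathbb R)$, the analog $H^1(\mathbb R^N;\mathbb R)$ of $H^1(\mathbb R;\mathbb R)$ is a function space: one simply has to complete $\mathscr S(\mathbb R^N;\mathbb R)$ with respect to the Hilbert norm corresponding to the inner product
\[
(g,h)_{H^1(\mathbb R^N;\mathbb R)}=(g,h)_{L^2(\lambda_{\mathbb R^N};\mathbb R)}+(\nabla g,\nabla h)_{L^2(\lambda_{\mathbb R^N};\mathbb R^N)}.
\]
However, when $N\ge2$, it will be shown that $H^1(\mathbb R^N;\mathbb R)$ is not the Cameron–Martin space for a centered Gaussian measure on a Banach space of distributions for which evaluation at a point is given by an element of its dual. Thus, like the one for $L^2(\lambda_{\mathbb R^N};\mathbb R)$, a Gaussian measure for which $H^1(\mathbb R^N;\mathbb R)$ is the Cameron–Martin space lives on a space of distributions, and the associated abstract Wiener space will be a Euclidean field.

To understand what follows, it is helpful to re-interpret the construction that we made at the end of Sect. 4.3.2 of the Ornstein–Uhlenbeck process starting from $\bigl(L^2(\lambda_{\mathbb R};\mathbb R),\mathscr S^{(-2)}(\mathbb R;\mathbb R),\mathcal W_L\bigr)$. If $\varphi\in\mathscr S(\mathbb R;\mathbb R)$, then $(h,\varphi)_{H^1(\mathbb R;\mathbb R)}=(h,L\varphi)_{L^2(\lambda_{\mathbb R};\mathbb R)}$ for $h\in H^1(\mathbb R;\mathbb R)$, where $L$ is the Bessel operator $1-\partial^2$. Clearly $L$ maps $\mathscr S(\mathbb R;\mathbb R)$ continuously into itself, and, in terms of the Fourier transform, the action of $L$ is determined by $\widehat{L\varphi}(\xi)=(1+\xi^2)\hat\varphi(\xi)$. Further, if, for any $\alpha\in\mathbb R$ and $\varphi\in\mathscr S(\mathbb R;\mathbb R)$, $L^\alpha\varphi$ is determined by $\widehat{L^\alpha\varphi}(\xi)=(1+\xi^2)^\alpha\hat\varphi(\xi)$, then $L^\alpha$ also maps $\mathscr S(\mathbb R;\mathbb R)$ continuously into itself and $L^{\alpha+\beta}=L^\alpha\circ L^\beta$. Thus we can define $L^\alpha$ on $\mathscr S^*(\mathbb R;\mathbb R)$ by $\langle\varphi,L^\alpha u\rangle=\langle L^\alpha\varphi,u\rangle$. Since $(\psi,L^\alpha\varphi)_{L^2(\lambda_{\mathbb R};\mathbb R)}=(L^\alpha\psi,\varphi)_{L^2(\lambda_{\mathbb R};\mathbb R)}$ for $\varphi,\psi\in\mathscr S(\mathbb R;\mathbb R)$, this definition of $L^\alpha$ on $\mathscr S^*(\mathbb R;\mathbb R)$ is consistent with the one on $\mathscr S(\mathbb R;\mathbb R)$, and the function $\psi_t$ in Sect. 4.3.2 is $L^{-\frac12}\delta_t$.

When $\alpha\le0$, it is easy to check that $L^\alpha$ is a continuous, linear map of $\mathscr S^{(-m)}(\mathbb R;\mathbb R)$ into itself for every $m\in\mathbb N$ and that $L^{-\frac12}$ is an isometric isomorphism of $L^2(\lambda_{\mathbb R};\mathbb R)$ onto $H^1(\mathbb R;\mathbb R)$. In particular, we can apply Theorem 3.3.3 to see that $\bigl(H^1(\mathbb R;\mathbb R),B,(L^{-\frac12})_*\mathcal W_L\bigr)$ is an abstract Wiener space when $B$ is the Banach space $\{L^{-\frac12}u : u\in\mathscr S^{(-2)}(\mathbb R;\mathbb R)\}$ with norm $\|x\|_B=\|L^{\frac12}x\|_{\mathscr S^{(-2)}(\mathbb R;\mathbb R)}$. As we know, there is a more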
pleasing choice of Banach space, namely the one in Theorem 3.3.10, but the construction used here is too crude to arrive at that choice.

With the preceding in mind, it should be clear how to construct from $\bigl(L^2(\lambda_{\mathbb R^N};\mathbb R),\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R),\mathcal W_{L_N}\bigr)$ an abstract Wiener space whose Cameron–Martin space is $H^1(\mathbb R^N;\mathbb R)$. Indeed, one first observes that, for $\varphi\in\mathscr S(\mathbb R^N;\mathbb R)$, $(h,\varphi)_{H^1(\mathbb R^N;\mathbb R)}=(h,L\varphi)_{L^2(\lambda_{\mathbb R^N};\mathbb R)}$, where $L$ is the Bessel operator $1-\Delta$. Proceeding as in the case when $N=1$, note that $\widehat{L\varphi}(\xi)=(1+|\xi|^2)\hat\varphi(\xi)$, and use this to define the operators $L^\alpha$ for $\alpha\in\mathbb R$, first on $\mathscr S(\mathbb R^N;\mathbb R)$ and then on $\mathscr S^*(\mathbb R^N;\mathbb R)$. Check that $L^{-\frac12}$ is a bounded linear map of $\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R)$ into itself and an isometric isomorphism from $L^2(\lambda_{\mathbb R^N};\mathbb R)$ onto $H^1(\mathbb R^N;\mathbb R)$. Hence, by Theorem 3.3.3, if $\mathcal W_{F_N}=(L^{-\frac12})_*\mathcal W_{L_N}$ and $B$ is the Banach space $\{L^{-\frac12}u : u\in\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R)\}$ with norm $\|x\|_B=\|L^{\frac12}x\|_{\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R)}$, then $\bigl(H^1(\mathbb R^N;\mathbb R),B,\mathcal W_{F_N}\bigr)$ is an abstract Wiener space.

The critical difference between the cases $N=1$ and $N\ge2$ comes from the fact that $(1+|\xi|^2)^{-\frac12}$ is an element of $L^2(\lambda_{\mathbb R^N};\mathbb R)$ if and only if $N=1$, and therefore $L^{-\frac12}\delta_x\in L^2(\lambda_{\mathbb R^N};\mathbb R)$ if and only if $N=1$. Thus, when $N\ge2$, there are no functions $\psi_x$, $x\in\mathbb R^N$, for which $(\psi_x,\psi_y)_{L^2(\lambda_{\mathbb R^N};\mathbb R)}=k(y-x)$, where $\hat k(\xi)=(1+|\xi|^2)^{-1}$. As a consequence, when $N\ge2$, there is no centered Gaussian process $\{X(x) : x\in\mathbb R^N\}$ that plays the role for $\bigl(H^1(\mathbb R^N;\mathbb R),\mathscr S^{(-N-1)}(\mathbb R^N;\mathbb R),\mathcal W_{F_N}\bigr)$ that the Ornstein–Uhlenbeck process plays when $N=1$.

Alternatively, this difference can be seen in terms of $L^{-1}$. For $\delta_0$ to be an element of $B^*$, it is necessary that there exist an $h_0\in H^1(\mathbb R^N;\mathbb R)$ such that
\[
\langle\varphi,\delta_0\rangle=(\varphi,h_0)_{H^1(\mathbb R^N;\mathbb R)}=(L\varphi,h_0)_{L^2(\lambda_{\mathbb R^N};\mathbb R)}=\langle\varphi,Lh_0\rangle
\]
for all $\varphi\in\mathscr S(\mathbb R^N;\mathbb R)$. This means that $Lh_0=\delta_0$ and therefore $\hat h_0(\xi)=(1+|\xi|^2)^{-1}$. But $(1+|\xi|^2)^{-1}\in L^2(\lambda_{\mathbb R^N};\mathbb R)$ if and only if $N\in\{1,2,3\}$, and $\widehat{\nabla h_0}$, whose modulus is $(1+|\xi|^2)^{-1}|\xi|$, is in $L^2(\lambda_{\mathbb R^N};\mathbb R^N)$ if and only if $N=1$. Thus $h_0$ exists if and only if $N=1$.
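The dimension thresholds in the last two paragraphs reduce, after passing to polar coordinates, to the convergence of the radial integrals $\int_0^\infty r^{N-1}(1+r^2)^{-p}\,dr$: the case $p=1$ corresponds to $\|(1+|\xi|^2)^{-\frac12}\|_{L^2}^2$ and the case $p=2$ to $\|(1+|\xi|^2)^{-1}\|_{L^2}^2$, each up to the surface area of the unit sphere. A rough numerical sketch, with an illustrative function name and a finite cutoff $R$ standing in for $\infty$:

```python
import numpy as np


def radial_integral(N, p, R, step=0.01):
    """Trapezoid approximation of the integral over [0, R] of r^(N-1) * (1 + r^2)^(-p)."""
    n = int(round(R / step))
    r = np.linspace(0.0, R, n + 1)
    h = R / n
    y = r ** (N - 1) / (1.0 + r**2) ** p
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))


if __name__ == "__main__":
    for p in (1, 2):
        for N in (1, 2, 3, 4):
            # A convergent integral barely changes between the two cutoffs;
            # a divergent one keeps growing (logarithmically in the borderline cases).
            print(p, N, radial_integral(N, p, 1e2), radial_integral(N, p, 1e4))
```

With $p=1$ only $N=1$ stabilises (at $\frac\pi2$), while with $p=2$ the values stabilise for $N\in\{1,2,3\}$ and grow for $N=4$.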
4.4.3 Is There any Physics Here?
There are many reasons why the answer is a resounding NO. Physicists want non-trivial quantum fields, and all we have done is produce Euclidean fields, and not even particularly interesting ones. Feynman's path-integral formalism provides a way of understanding why we have not been doing physics. If Feynman were asked to describe the measure $\mathcal W$ in an abstract Wiener space $(H,B,\mathcal W)$, he would say that $\mathcal W$ is given by the formula in (3.1.1). Of course, from a mathematical standpoint, (3.1.1) is irreparably flawed when $H$ is infinite dimensional: as we have seen, the measure $\mathcal W$ lives on the Banach space $B$ and does not even see $H$. Nonetheless,
as we saw when we derived (3.3.2), one can make accurate predictions based on expressions like (3.1.1), and so I will ignore its flaws in the following.

The Feynman representation of the measure $\mathcal W_{F_4}$ in $\bigl(H^1(\mathbb R^4;\mathbb R),\mathscr S^{(-5)}(\mathbb R^4;\mathbb R),\mathcal W_{F_4}\bigr)$ is
\[
\mathcal W_{F_4}(dh)=\frac1Z\exp\Bigl(-\frac{\|h\|^2_{L^2(\mathbb R^4;\mathbb R)}+\|\nabla h\|^2_{L^2(\mathbb R^4;\mathbb R^4)}}{2}\Bigr)\lambda_{H^1(\mathbb R^4;\mathbb R)}(dh).
\]
Although we have given a mathematically satisfactory description of $\mathcal W_{F_4}$, that is only the first step in producing a physically satisfactory quantum field. A basic physical requirement is that a quantum field be invariant under Lorentz transformations of coordinates (i.e., ones that preserve the quadratic form $-x_1^2+\sum_{j=2}^4x_j^2$), whereas $\mathcal W_{F_4}$ is invariant under Euclidean transformations of coordinates (i.e., ones that preserve $|x|^2$). The obvious way to convert a Euclidean invariant field into a Lorentz invariant one is to replace $x_1$ by $ix_1$, a step that is easier to describe than it is to carry out.

Besides the issue raised in the preceding paragraph, there is another serious problem to be confronted even in the Euclidean setting. The problem is that the abstract Wiener space for $H^1(\mathbb R^N;\mathbb R)$ is a Euclidean model of a system of free particles, particles which do not interact. The simplest Euclidean model of interacting particles would be one in which the density
\[
\exp\Bigl(-\frac{\|h\|^2_{H^1(\mathbb R^4;\mathbb R)}}{2}\Bigr)
\]
with respect to $\lambda_{H^1(\mathbb R^4;\mathbb R)}$ is replaced by
\[
\exp\Bigl(-\frac{\|h\|^2_{H^1(\mathbb R^4;\mathbb R)}+\alpha\|h\|^4_{L^4(\mathbb R^4;\mathbb R)}}{2}\Bigr)
\]
for some $\alpha>0$. Equivalently, one is looking for a measure on $H^1(\mathbb R^4;\mathbb R)$ whose density with respect to $\mathcal W_{F_4}$ is proportional to $\exp\bigl(-\frac\alpha2\|h\|^4_{L^4(\mathbb R^4;\mathbb R)}\bigr)$. Since, by the Sobolev Embedding Theorem, $H^1(\mathbb R^4;\mathbb R)\subseteq L^4(\mathbb R^4;\mathbb R)$, this would make perfectly good sense if $\mathcal W_{F_4}$ lived on $H^1(\mathbb R^4;\mathbb R)$, but no naïve interpretation is available when one takes into account the fact that $\mathcal W_{F_4}$ lives on a space of distributions. Indeed, one knows how to apply lots of linear operations to distributions, but one doesn't know how to apply non-linear ones to them. Nonetheless, Nelson was able to construct a non-trivial two dimensional Euclidean field and showed that it could be transformed into a Lorentz invariant field. Using different techniques, in a sequence of articles Glimm and Jaffe carried out the same program for three dimensional fields. Seeing as the book [7] that they subsequently wrote is 535 pages long, I think that I can be forgiven, maybe
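even thanked, for not attempting to summarize their work here. As far as I know, to date, nobody has succeeded in constructing a non-trivial four dimensional quantum field, and there are results that indicate that nobody ever will.

Exercise 4.4.1

(i) In connection with the problems with the space $H_0^1(\mathbb{R}^N;\mathbb{R})$, let $\mathcal{C}$ be the space of $\varphi \in C^1(\mathbb{R}^2;\mathbb{R})$ for which $\varphi(0) = 0$ and $\nabla\varphi \in L^2(\mathbb{R}^2;\mathbb{R}^2)$. Then $H_0^1(\mathbb{R}^2;\mathbb{R})$ is the completion of $\mathcal{C}$ with respect to the Hilbert norm $\|\varphi\|_{H_0^1(\mathbb{R}^2;\mathbb{R})} = \|\nabla\varphi\|_{L^2(\mathbb{R}^2;\mathbb{R}^2)}$. Now define

$$\psi_n(x) = \begin{cases} 1 & \text{if } |x| \ge 1 \\ \dfrac{\log(n|x|)}{\log n} & \text{if } \frac{1}{n} \le |x| < 1 \\ 0 & \text{if } |x| < \frac{1}{n}. \end{cases}$$

Choose a $\rho \in C^\infty\bigl(\mathbb{R}^2;[0,\infty)\bigr)$ which vanishes off of $B_{\mathbb{R}^2}(0,1)$ and has integral 1, set $\rho_n(x) = n^2\rho(nx)$, and take $\varphi_n = \rho_n * \psi_n$. Clearly, $\varphi_n \in \mathcal{C}$. Show that $\varphi_n \longrightarrow 1$ in $\mathscr{S}^*(\mathbb{R}^2;\mathbb{R})$ and $\|\nabla\varphi_n\|_{L^2(\mathbb{R}^2;\mathbb{R}^2)} \longrightarrow 0$. Thus, just because the $\|\cdot\|_{H_0^1(\mathbb{R}^2;\mathbb{R})}$ norms of a sequence in $\mathcal{C}$ tend to 0, the sequence need not converge to 0 in $\mathscr{S}^*(\mathbb{R}^2;\mathbb{R})$, and so $H_0^1(\mathbb{R}^2;\mathbb{R})$ is not a subset of $\mathscr{S}^*(\mathbb{R}^2;\mathbb{R})$.

(ii) Our representation in Sect. 3.6.2 of powers of the Bessel operator $L$ was as Fourier multipliers. The goal here is to describe the distributions of which the Fourier multipliers for $L^{-1}$ and $L^{-\frac{1}{2}}$ are the Fourier transform. To do so, define

$$r_N(\lambda) = (4\pi)^{-\frac{N}{2}} \int_0^\infty t^{\frac{N}{2}-2} e^{-\lambda t - \frac{1}{t}}\,dt \quad \text{for } \lambda > 0,$$

and set

$$k_N(x) = r_N\left(\frac{|x|^2}{4}\right) \quad \text{and} \quad \psi_N(x) = 2\,r_{N+1}\left(\frac{|x|^2}{4}\right) \quad \text{for } x \in \mathbb{R}^N \setminus \{0\}.$$

Show that $\widehat{k_N}(\xi) = (1+|\xi|^2)^{-1}$ and $\widehat{\psi_N}(\xi) = (1+|\xi|^2)^{-\frac{1}{2}}$.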
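Hint: Observe that

$$r_N\left(\frac{|x|^2}{4}\right) = \int_0^\infty e^{-t} g_t(x)\,dt \quad \text{where } g_t(x) = (4\pi t)^{-\frac{N}{2}} e^{-\frac{|x|^2}{4t}}.$$

(iii) Continuing (ii), show that $r_1(\lambda) = 2^{-1} e^{-2\sqrt{\lambda}}$, and use Abelian asymptotics for Laplace transforms to show that

$$(4\pi)^{\frac{N}{2}}\, r_N(\lambda) \sim \begin{cases} \log\dfrac{1}{\lambda} & \text{if } N = 2 \\[2mm] \dfrac{2\,\Gamma\bigl(\frac{N}{2}\bigr)\,\lambda^{1-\frac{N}{2}}}{N-2} & \text{if } N \ge 3 \end{cases} \quad \text{as } \lambda \searrow 0.$$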
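
Before attempting (iii), it can be reassuring to check the claims numerically. The following Python sketch is only an illustration, not part of the exercise. It evaluates $r_N(\lambda) = (4\pi)^{-N/2}\int_0^\infty t^{N/2-2}e^{-\lambda t - 1/t}\,dt$ by the trapezoid rule after the substitution $t = e^u$, and compares the result with $2^{-1}e^{-2\sqrt{\lambda}}$ when $N = 1$ and with the leading term of $(4\pi)^{N/2} r_N(\lambda)$ for small $\lambda$. The function names `r` and `abelian_limit`, the step size `h`, and the truncation limits are assumptions made for this sketch.

```python
import math


def r(N, lam, h=0.01):
    """Approximate r_N(lam) = (4*pi)^(-N/2) * int_0^inf t^(N/2-2) exp(-lam*t - 1/t) dt.

    With t = exp(u) the integral becomes
        int_R exp((N/2 - 1)*u - lam*exp(u) - exp(-u)) du,
    whose integrand decays doubly exponentially in both directions, so a
    truncated trapezoid rule is accurate. The limits below are ad hoc
    choices and assume lam is not astronomically large.
    """
    lo = -8.0                      # exp(-u) is about 2981 here: integrand underflows to 0
    hi = math.log(200.0 / lam)     # lam*exp(u) equals 200 here: integrand is negligible
    steps = int(math.ceil((hi - lo) / h))
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        u = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp((N / 2 - 1) * u - lam * math.exp(u) - math.exp(-u))
    return (4 * math.pi) ** (-N / 2) * h * total


def abelian_limit(N, lam):
    """Leading behaviour of (4*pi)^(N/2) * r_N(lam) as lam -> 0, for N >= 2."""
    if N == 2:
        return math.log(1.0 / lam)
    return 2.0 * math.gamma(N / 2) * lam ** (1 - N / 2) / (N - 2)


if __name__ == "__main__":
    # N = 1: compare the quadrature with the closed form exp(-2*sqrt(lam)) / 2.
    for lam in (0.25, 1.0, 4.0):
        print(lam, r(1, lam), 0.5 * math.exp(-2.0 * math.sqrt(lam)))
    # N >= 2: the ratio to the leading term should drift toward 1 as lam decreases.
    for N in (2, 3, 4):
        for lam in (1e-2, 1e-4, 1e-6):
            ratio = (4 * math.pi) ** (N / 2) * r(N, lam) / abelian_limit(N, lam)
            print(N, lam, ratio)
```

For $N \ge 3$ the printed ratios approach 1 quickly. For $N = 2$ they approach 1 only at a logarithmic rate, which is consistent with an additive constant correction to $\log\frac{1}{\lambda}$, so one should not expect close agreement at moderate values of $\lambda$.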
References

1. Bogachev, V.: Gaussian Measures. AMS Math. Surv. Monogr. 62 (1998)
2. Bobkov, S.V.: An Isoperimetric Inequality on the Discrete Cube, and an Elementary Proof on the Isoperimetric Inequality on Gauss Space. Ann. Prob. 25(1), 206–214 (1997)
3. Bryc, W.: Normal Distribution: Characterizations with Applications. Lecture Notes in Statistics, vol. 100. Springer (1995)
4. Chen, L., Stroock, D.: Additive Functions and Gaussian Measures. Prokhorov and Contemporary Probability Theory, Springer Proceedings in Mathematics and Statistics, vol. 33. Springer, Berlin, Heidelberg (2013)
5. Donsker, M., Varadhan, S.R.S.: Large Deviations for Stationary Gaussian Processes. Comm. Math. Phys. 97, 187–210 (1985)
6. Enchev, O., Stroock, D.: Rademacher’s Theorem for Wiener Functionals. Ann. Probab. 21, 25–33 (1993)
7. Glimm, J., Jaffe, A.: Quantum Physics, 2nd edn. Springer (1987)
8. Gross, L.: Logarithmic Sobolev Inequalities. Am. J. Math. 97, 1061–1083 (1975)
9. Gross, L.: Abstract Wiener Spaces. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (Berkeley, California, 1965–1966), Vol. II: Contributions to Probability Theory, Part 1. Berkeley, California: University of California Press, pp. 31–42 (1965–1966)
10. Segal, I.: Ergodic Subgroups of the Orthogonal Group on a Real Hilbert Space. Ann. Math. 66(2) (1957)
11. Simon, B.: The P(φ)2 Euclidean (Quantum) Field Theory. Princeton Series in Physics. Princeton University Press, Princeton, N.J. (1974)
12. Skorohod, A.V.: Studies in the Theory of Random Processes. Dover Books on Mathematics (1982)
13. Strassen, V.: An Invariance Principle for the Law of the Iterated Logarithm. Z. W. Verw. Geb. 3, 211–226 (1964)
14. Stroock, D.W.: Probability Theory, An Analytic View, 2nd edn. Cambridge University Press (2010)
15. Stroock, D.W.: Essentials of Integration Theory for Analysis, 2nd edn. Springer GTM, 262 (2020)
16. Stroock, D.W.: Some Thoughts about Segal’s Ergodic Theorem. Colloq. Math. 118(1), 89–105 (2010)
17. Stroock, D.W.: On a Theorem of Laurent Schwartz. C. R. Acad. Sci. Paris 349(1–2), 5–6 (2011)
Index

A
abstract Wiener space, 82
  isoperimetric inequality, 104
  large deviations for, 117
additive function, 24
  a.e., 24

B
Bochner integration, 72
Bochner’s Theorem, 9
Brownian motion, 60
  Banach space valued, 121
  pinned, 67
  Strassen’s Theorem, 125
Brownian sheet, 135

C
Cameron–Martin
  formula, 89
  subspace, 86
characteristic function, 1, 71
consistent family of measures, 52
weak convergence of measures, 4
covariance, 39
  function, 52

E
Euclidean field, 129

F
Fernique’s Theorem, 76
Fourier transform
  finite dimensional, 1
  infinite dimensional, 71

G
Gaussian family, 40
  centered, 40
  process, 52
  stationary, 63
Gaussian measure
  concentration property, 44
  isoperimetric inequality
    finite dimensional, 45
    for abstract Wiener space, 104
  non-degenerate, 69
  on a Banach space, 76
  standard, 19

H
Hermite
  function, 35
  operator, 35
  polynomial
    normalized, 34
    unnormalized, 33
  raising operator, 33
hypercontraction, 30

I
isoperimetric inequality
  finite dimensional, 45
  infinite dimensional, 104

K
Kolmogorov
  Consistency Theorem, 52
  continuity criterion, 56

L
Lévy system, 12
  measure, 12
  operator, 15
Cramer–Lévy Theorem, 21
Lévy–Khinchine formula, 12
Lévy’s Continuity Theorem, 8
logarithmic Sobolev inequality, 29

M
Malliavin’s calculus, 90
Markov process, 62
  homogeneous, 62
martingale, 75
Maurey–Pisier Theorem, 42
Mehler kernel, 38

N
non-degenerate, 69
non-negative definite function, 8
normal random variable, 19

O
Ornstein–Uhlenbeck
  operator, 27
  process, 65
  semigroup, 26

P
Paley–Wiener map, 88
Poincaré inequality, 26

R
Rademacher’s Theorem, 106
rapidly decreasing, 5
relative entropy, 31, 38

S
Strassen’s law of the iterated logarithm, 125
Sudakov’s Theorem, 90
symmetrization of measure, 22

T
transition probability function, 62

W
white noise, 133