Diffusion, Quantum Theory, and Radically Elementary Mathematics
Mathematical Notes 47
edited by William G. Faris
PRINCETON UNIVERSITY PRESS PRINCETON AND OXFORD
Copyright
© 2006 by Princeton University Press
Published by Princeton University Press, 41 William Street, Princeton, New Jersey 08540
In the United Kingdom: Princeton University Press, 3 Market Place, Woodstock, Oxfordshire OX20 1SY
All Rights Reserved
Library of Congress Control Number 2006040533
ISBN-13: 978-0-691-12545-9 (pbk. : alk. paper)
ISBN-10: 0-691-12545-7 (pbk. : alk. paper)
British Library Cataloging-in-Publication Data is available.
The publisher would like to acknowledge the author of this volume for providing the camera-ready copy from which this book was printed.
This book has been composed in Times Roman using LaTeX.
Printed on acid-free paper. ∞
pup.princeton.edu
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
The Laplace operator in its various manifestations is the most beautiful and central object in all of mathematics. Probability theory, mathematical physics, Fourier analysis, partial differential equations, the theory of Lie groups, and differential geometry all revolve around this sun, and its light even penetrates such obscure regions as number theory and algebraic geometry. Edward Nelson, Tensor Analysis
Contents

Preface
Chapter 1. Introduction: Diffusive Motion and Where It Leads (William G. Faris)
Chapter 2. Hypercontractivity, Logarithmic Sobolev Inequalities, and Applications: A Survey of Surveys (Leonard Gross)
Chapter 3. Ed Nelson's Work in Quantum Theory (Barry Simon)
Chapter 4. Symanzik, Nelson, and Self-Avoiding Walk (David C. Brydges)
Chapter 5. Stochastic Mechanics: A Look Back and a Look Ahead (Eric Carlen)
Chapter 6. Current Trends in Optimal Transportation: A Tribute to Ed Nelson (Cedric Villani)
Chapter 7. Internal Set Theory and Infinitesimal Random Walks (Gregory F. Lawler)
Chapter 8. Nelson's Work on Logic and Foundations and Other Reflections on the Foundations of Mathematics (Samuel R. Buss)
Chapter 9. Some Musical Groups: Selected Applications of Group Theory in Music (Julian Hook)
Chapter 10. Afterword (Edward Nelson)
Appendix A. Publications by Edward Nelson
Index
Preface
Diffusive motion, displacement due to the cumulative effect of irregular fluctuations, has been a fundamental concept in mathematics and physics since the work of Einstein on Brownian motion. It is also relevant to understanding various aspects of quantum theory. This volume explains diffusive motion and its relation to both nonrelativistic quantum theory and quantum field theory. It also shows how diffusive motion concepts lead to a radical reexamination of the structure of mathematical analysis.

Einstein's original work on diffusion was already remarkable. He suggested a probability model describing a particle moving along a path in such a way that it has a definite position at each instant, but with a motion so irregular that it has no well-defined velocity. Since the main tool of a physicist is the relation between force and rate of change of velocity, it was astonishing that he could dispense with velocity and still make predictions about Brownian motion. The story of how he did this is told in Edward Nelson's Dynamical Theories of Brownian Motion. In brief, Einstein worked with an average velocity, or drift, defined in a subtle way by ignoring the irregular fluctuations. In his analysis this average velocity results from a balance of external force and frictional force. The external force might be gravity, but, as Nelson remarks, the beauty of the argument is that this force is entirely virtual. In other words, it only has to be nonzero, and it does not appear in Einstein's final result for the diffusion coefficient.

Diffusion is part of probability theory, while quantum mechanics involves waves with complex number amplitudes. However, diffusion is related to quantum mechanics in various ways. The Wiener integral over paths that underlies the Einstein model for Brownian motion has a complex number analog in the Feynman path integral of quantum mechanics. While the properties of the Feynman path integral are elusive, the probability models that describe diffusion are precise mathematical objects, and they have direct connections to quantum mechanics. These connections form a thread that runs through this book.

The introductory chapter by Faris describes the interrelationships between the book's various themes, many of which were first brought to light by Nelson. One major theme is Markovian diffusion, where a particle wanders randomly but also feels the influence of systematic drift. Diffusion is related to quantum theory and quantum field theory in several ways. These connections, both technical and conceptual, are particularly apparent in the chapters by Gross, Simon, Brydges, Carlen, and Villani. Another important theme is the need for a closer look at the irregular paths arising from the diffusion. Intuitively they each consist of an unlimited number of
infinitesimal random steps. Lawler's chapter makes this notion precise by using a syntactic approach to nonstandard analysis. This approach employs an augmented language that recognizes among the real numbers some that are infinitesimal and some that are unlimited in size. When this framework is applied to natural numbers, it happens that among the natural numbers there are some that are unlimited, that is, greater than each standard natural number. Furthermore, one way of describing a diffusion process is in terms of a finite sequence of random variables, where the number of variables, while finite, is unlimited in just this sense. This leads to a related topic, the syntactic description of natural numbers, which is explored in the chapter by Sam Buss. A final chapter explores the mathematical structure of musical composition, which turns out to parallel the structure of spacetime. This contribution is by Jay Hook, who was a Ph.D. student of Nelson in mathematics before turning to music theory.

The idea for this book came out of the conference Analysis, Probability, and Logic, held at the Mathematics Department of the University of British Columbia on June 17 and 18, 2004. It was in honor of Edward Nelson, professor at Princeton University, who has done beautiful and influential work in probability, functional analysis, mathematical physics, nonstandard analysis, stochastic mechanics, and logic. The conference was hosted and supported by the Pacific Institute of Mathematical Sciences (PIMS). The National Science Foundation (USA) provided travel support for some participants. The organizers were David Brydges, Eric Carlen, William Faris, and Greg Lawler. The presentations at this conference led to the present volume.

The editor thanks all who provided help with its preparation, including the authors, William Priestly, and Joseph McMahon. He is also grateful to the editors at Princeton University Press, Vickie Kearn and Linny Schenck, and to the copyeditor, Beth Gallagher, who gave generously of their expertise.
Chapter One

Introduction: Diffusive Motion and Where It Leads

William G. Faris*
1.1 DIFFUSION

The purpose of this introductory chapter is to point out the unity in the following chapters. At first this might seem a difficult enterprise. The authors of these chapters treat diffusion theory, quantum mechanics, and quantum field theory, as well as stochastic mechanics, a variant of quantum mechanics based on diffusion ideas. The contributions also include an infinitesimal approach to diffusion and related probability topics, an approach that is radically elementary in the sense that it relies only on simple logical principles. There is further discussion of foundational problems, and there is a final essay on the mathematics of music. What could these have in common, other than that they are in some way connected to the work of Edward Nelson?

In fact, there are important links between these topics, with the apparent exception of the chapter on music. However, the chapter on music is so illuminating, at least to those with some acquaintance with classical music, that it alone may attract many people to this collection. In fact, there is an unexpected connection to the other topics, as will become apparent in the following more detailed discussion.

The plan is to begin with diffusion and then see where this leads. In ordinary free motion distance is proportional to time:

Δx = v Δt.   (1.1)

This is sometimes called ballistic motion. Another kind of motion is diffusive motion. The characteristic feature of diffusion is that the motion is random, and distance is proportional to the square root of time:

Δx = ±σ √Δt.   (1.2)

As a consequence diffusive motion is irregular and inefficient. The mathematics of diffusive motion is explained in sections 1.1-1.3 of this chapter.

There is a close but subtle relation between diffusion and quantum theory. The characteristic indication of quantum phenomena is the occurrence of the Planck constant ℏ in the description. This constant has the dimensions ML²/T of angular momentum. The relation to diffusion derives from

σ² = ℏ/m,   (1.3)
* Department of Mathematics, University of Arizona, Tucson, AZ 85721, USA
where m is the mass of the particle in the quantum system. The diffusion constant σ² has the appropriate dimensions L²/T for a diffusion; that is, it characterizes a kind of motion where distance squared is proportional to time. In quantum mechanics it is customary to define the dynamics by quantities expressed in energy units, that is, with dimensions ML²/T². The determination of the time dynamics involves a division by ℏ, which changes the units to inverse time units 1/T. In the following exposition energy quantities, such as the potential energy function V(x), will be in inverse time units. This should make the comparison with diffusion theory more transparent.

One connection between quantum theory and diffusion is the relationship between real time in one theory and imaginary time in the other theory. This connection is precise and useful, both in the quantum mechanics of nonrelativistic particles and in quantum field theory. This connection is explored in sections 1.4-1.7.

The marriage of quantum theory and the special relativity theory of Einstein and Minkowski is through quantum field theory. In relativity theory a mass m has an associated momentum mc and an associated energy mc². These define in turn a spatial decay rate

m_L = mc/ℏ   (1.4)

and a time decay rate

m_T = mc²/ℏ.   (1.5)

These set the distance and time scales for quantum fluctuations in relativistic field theory. This theory is related to diffusion in an infinite-dimensional space of Euclidean fields. Some features of this story are explained in sections 1.8-1.10 of this introduction and in the later chapters by Leonard Gross, Barry Simon, and David Brydges.

The passage from real time to imaginary time is convenient but artificial. However, in the domain of nonrelativistic quantum mechanics of particles there is a closer connection between diffusion theory and quantum theory. In stochastic mechanics the real time of quantum mechanics is also the real time of diffusion, and in fact quantum mechanics itself is formulated as conservative diffusion. This subject is sketched in sections 1.11-1.12 of this introduction and in the chapters by Eric Carlen and Cedric Villani.

The conceptual importance of diffusion leads naturally to a closer look at mathematical foundations. In the calculus of Newton and Leibniz, motion on short time and distance scales looks like ballistic motion. This is not true for diffusive motion. On short time and distance scales it looks like the Wiener process, that is, like the Einstein model of Brownian motion. In fact, there are two kinds of calculus for the two kinds of motion, the calculus of Newton and Leibniz for ballistic motion and the calculus of Ito for diffusive motion. The calculus of Newton and Leibniz in its modern form makes use of the concept of limit, and the calculus of Ito relies on limits and on the measure theory framework for probability. However, there is another calculus that can describe either kind of motion and is quite elementary. This is the infinitesimal calculus of Abraham Robinson, where one interprets Δt
and Δx as infinitesimal real numbers. It may be that this calculus is particularly suitable for diffusive motion. This idea provides the theme in sections 1.13-1.14 of this introduction and leads to the later contributions by Greg Lawler and Sam Buss. The concluding section 1.15 connects earlier themes with a variation on musical composition, presented in the final chapter by Julian Hook.
1.2 THE WIENER WALK

The Wiener walk is a mathematical object that is transitional between random walk and the Wiener process. Here is the construction of the appropriate simple symmetric random walk. Let ξ_1, ..., ξ_n be a finite sequence of independent random variables, each having the values ±1 with equal probability. One way to construct such random variables is to take the set {-1, 1}^n of all sequences ξ of n values ±1 and give it the uniform probability measure. Then ξ_k is the kth element in the sequence, and the function ξ ↦ ξ_k is the corresponding random variable. The random walk is the sequence S_k = ξ_1 + ... + ξ_k defined for 0 ≤ k ≤ n. The underlying probability space in this construction is finite, with 2^n points.

Here is the construction of the n-step Wiener walk on the time interval [0, T]. Let Δt = T/n be the time step. Fix the diffusion constant σ² > 0, and let the corresponding space step be

Δx = σ √Δt.   (1.6)

If

t_k = k Δt   (1.7)

for k = 0, 1, 2, ..., n, define

w^(n)(t_k) = ξ_1 Δx + ... + ξ_k Δx.   (1.8)

Finally, define w^(n)(t) for real t with 0 ≤ t ≤ T by linear interpolation. Then w^(n) is a random real continuous function defined on [0, T]. This random function is the Wiener walk with time step Δt = T/n. A typical sample path is illustrated in Figure 1.1.

Let C([0, T]) be the metric space of all real continuous functions on the time interval [0, T]. Let μ^(n) be the probability measure induced on Borel subsets of C([0, T]) by the random function w^(n). That is, the probability of a Borel subset is the probability that the function w^(n) is in this subset. This probability measure μ^(n) is the distribution of the Wiener walk. It is concentrated on a finite set of 2^n piecewise linear continuous paths.
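The construction just described is easy to carry out numerically. The following sketch samples one Wiener walk and checks that the endpoint variance is close to σ²T; the function name and the particular values of n, T, and σ are illustrative choices, not part of the construction itself.

```python
import numpy as np

def wiener_walk(n, T=1.0, sigma=1.0, rng=None):
    """Sample one n-step Wiener walk on [0, T] (section 1.2).

    Returns the grid times t_k = k * dt and the values
    w(t_k) = (xi_1 + ... + xi_k) * dx, with dx = sigma * sqrt(dt).
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n
    dx = sigma * np.sqrt(dt)
    xi = rng.choice([-1.0, 1.0], size=n)          # independent +/-1 steps
    t = np.linspace(0.0, T, n + 1)
    w = np.concatenate([[0.0], np.cumsum(xi) * dx])
    return t, w

# For large n the endpoint w(T) is approximately Gaussian with variance sigma^2 * T.
samples = np.array([wiener_walk(n=1_000)[1][-1] for _ in range(2_000)])
print(samples.var())   # close to 1.0 = sigma^2 * T
```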
1.3 THE WIENER PROCESS

The Wiener process is a fundamental object in probability theory, describing a particular kind of random path. Another common name for it is Brownian motion, since it is closely related to the Einstein model for Brownian motion of a physical
Figure 1.1 A sample path of the Wiener walk.
particle. There are other models of the physical process of Brownian motion, so it is clearer to use "Wiener process" for the mathematical object. The Wiener process may be constructed in a number of ways, but one way to get an intuition for it is to think of it as a limit of the Wiener walk. In this limit the distribution of the Wiener walk, which is given by binomial probabilities, converges to the distribution of the Wiener process, which is Gaussian.

PROPOSITION 1.1 (Construction of Wiener measure) For each n = 1, 2, 3, ... let μ^(n) be the probability measure defined on Borel subsets of C([0, T]) by the Wiener walk with time step Δt satisfying nΔt = T. Then there is a probability measure μ defined on the Borel subsets of C([0, T]) such that μ^(n) → μ as n → ∞ in the sense of weak convergence of probability measures.
This result may be found in texts on probability [2]. The statement about weak convergence means that for each bounded continuous real function F defined on the space C([0, T]) the expectation ∫ F dμ^(n) → ∫ F dμ as n → ∞. This μ is the Wiener measure with diffusion parameter σ².

If t is fixed, then the map w ↦ w(t) is a function from the probability space C([0, T]) with Wiener measure μ to the real numbers, and hence is a random variable. For each t ≥ 0 this random variable w(t) has mean zero and variance σ²t. Furthermore, the increments of w corresponding to disjoint time intervals are independent. Since the random variable w(t) is the sum of an arbitrarily large number of independent increments, by the central limit theorem it must have a Gaussian distribution. The random continuous function w associated with the Wiener measure is the Wiener process. A typical sample path is sketched in Figure 1.2.

Figure 1.2 A sample path of the Wiener process.

So far the Wiener process has been defined as a random continuous function on a bounded interval [0, T] of time. However, it is not difficult to build the unbounded interval [0, +∞) out of a sequence of bounded intervals and thus give a definition of the Wiener process as a random continuous function on this larger time interval. In fact, it is even possible to define the Wiener process for the time interval (-∞, +∞) as a random continuous function satisfying the normalization w(0) = 0. Henceforth the Wiener process will refer to the probability space C((-∞, +∞)) with the probability measure μ defined in this way. In the following the expectation of a random variable F defined on the space C((-∞, +∞)) with respect to the Wiener measure μ is written
μ[F] = ∫ F dμ.   (1.9)
That is, the same notation is used for expectation as for probability. For example, the expectation of w(t) (as a function of w) is μ[w(t)] = 0, and the variance is

μ[w(t)²] = σ²|t|.
Another useful topic is weighted increments of the Wiener process and the corresponding Wiener stochastic integral. Let t_1 < t_2, and consider the corresponding increment w(t_2) - w(t_1). This is Gaussian with mean zero and variance σ²(t_2 - t_1) = σ²|[t_1, t_2]|, which is proportional to the length of the interval. Consider two such increments w(t_2) - w(t_1) and w(t_2') - w(t_1'). The condition of independent increments implies that they have covariance

μ[(w(t_2) - w(t_1))(w(t_2') - w(t_1'))] = σ² |[t_1, t_2] ∩ [t_1', t_2']|,   (1.10)
which is proportional to the length of the intersection of the two intervals. This generalizes to weighted increments. Let f be a real function such that ∫_{-∞}^{∞} f(t)² dt < +∞. Then the Wiener stochastic integral ∫_{-∞}^{∞} f(t) dw(t) is a well-defined Gaussian random variable with mean zero. Furthermore, the condition of independent increments implies that the covariance of two such stochastic integrals is

μ[ ∫_{-∞}^{∞} f(t) dw(t) ∫_{-∞}^{∞} g(t') dw(t') ] = σ² ∫_{-∞}^{∞} f(t) g(t) dt.   (1.11)
The independent increment property (1.10) is the special case when f and g are indicator functions of intervals.

Another description of the Wiener process is by a partial differential equation. For t > 0 let p(y, t) (as a function of y) be the probability density of the Wiener process at time t, so that

μ[f(w(t))] = ∫_{-∞}^{∞} f(y) p(y, t) dy.   (1.12)

Since the density p(y, t) is Gaussian with mean zero and variance σ²t, it follows that it satisfies the partial differential equation

∂p/∂t = (1/2) σ² ∂²p/∂y².   (1.13)
This is the simplest diffusion equation (or heat equation). For a Wiener process describing diffusion in finite-dimensional Euclidean space the probability density is jointly Gaussian. There is also a corresponding partial differential equation, in which the second derivative in the space variable is replaced by the Laplace operator. Later it will appear that in infinite-dimensional space it is preferable to deal with the Ornstein-Uhlenbeck velocity process instead of the Wiener process. The probability distributions are jointly Gaussian, but they have a more complicated time dependence. They still solve a second-order linear partial differential equation. However, this equation involves the sum of the Laplace operator with another operator, the first-order differential along the direction of a linear vector field.
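The Gaussian character of the density makes (1.13) easy to confirm symbolically. The following sketch checks the heat equation for the density with mean zero and variance σ²t; it is only a verification of the computation described above.

```python
import sympy as sp

y, t, sigma = sp.symbols('y t sigma', positive=True)

# Gaussian density of the Wiener process at time t: mean 0, variance sigma^2 * t.
p = sp.exp(-y**2 / (2 * sigma**2 * t)) / sp.sqrt(2 * sp.pi * sigma**2 * t)

# Check the diffusion (heat) equation (1.13): dp/dt = (1/2) sigma^2 d^2 p / dy^2.
lhs = sp.diff(p, t)
rhs = sp.Rational(1, 2) * sigma**2 * sp.diff(p, y, 2)
print(sp.simplify(lhs - rhs))   # 0
```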
1.4 DIFFUSION, KILLING, AND QUANTUM MECHANICS

The first remarkable discovery connecting quantum mechanics with diffusion theory is that the fundamental equation of quantum mechanics is closely related to an equation describing diffusion with killing. As we shall see, the connection is through the Feynman-Kac formula.

Quantum theory, of course, is the ultimate mystery of modern science. It has many strange features, such as remarkable correlations over long distances. These correlations are experimentally observed, and their peculiar nature takes mathematical shape in the form of a violation of Bell's inequalities. The appendix to [17] gives an account of this subject and its implications. However strange quantum mechanics may be, there is universal agreement that the wave function ψ for an isolated system satisfies the Schrödinger equation

∂ψ/∂t = i [ (1/2) σ² ∂²/∂x² - V(x) ] ψ.   (1.14)
Here σ² = ℏ/m and V(x) is the potential energy, here measured in inverse time units. This is often stated in terms of the Schrödinger operator H defined by

H = -(1/2) σ² ∂²/∂x² + V(x).   (1.15)
Figure 1.3 A killing rate (potential) function with subtraction V(x) - λ_0.

Then the Schrödinger equation (1.14) has the form
∂ψ/∂t = -iHψ.   (1.16)
This way of writing the equation differs slightly from the usual quantum mechanical convention. The usual quantum mechanical potential energy and total energy are obtained from the V and H in the present treatment by multiplication by the constant ℏ. This converts inverse time units to energy units. In quantum mechanics the dynamics is defined by dividing energy by ℏ, therefore returning to inverse time units. So in the present notation, with inverse time units for H, the solution of the Schrödinger equation with initial condition ψ(x) = ψ(x, 0) is
ψ(x, t) = (e^{-itH} ψ)(x).   (1.17)
This operator exponential may be interpreted via spectral theory or by the theory of one-parameter semigroups [9].

There are several connections between quantum mechanics and diffusion. The most obvious ones are treated here in sections 1.4-1.7. There is also a more profound connection between the Schrödinger equation and diffusion given by the equations of stochastic mechanics. That will be the subject of sections 1.11-1.12.

A particularly simple way to go from quantum mechanics to diffusion is to replace it by t. The resulting diffusion equation is then

∂r/∂t = [ (1/2) σ² ∂²/∂x² - V(x) ] r.   (1.18)
Here σ² is interpreted as a diffusion constant. The diffusing particle randomly vanishes at a certain rate V(x) ≥ 0 depending on its position x in space. Thus there is some chance that at a random time T it will cease to diffuse and vanish. In probability it is common to say that the particle is killed. A typical killing rate function V(x) is sketched in Figure 1.3. Actually, the sketch shows V(x) - λ_0, where the subtracted constant λ_0 is the least eigenvalue of H.

The solution r(x, t) has a probability interpretation. Let r(y) be a given function of the position variable y. Then r(x, t) is the expectation of r(y), when y is taken
as the random position of the diffusing particle at time t ≥ 0, provided that the particle was started at x at time 0 and has not yet vanished. (The contribution to the expected value for a particle that has vanished is zero.) The initial condition for the equation is r(x, 0) = r(x). The equation

∂r/∂t = -Hr   (1.19)
has an operator solution, this time of the form
r(x, t) = (e^{-tH} r)(x).   (1.20)
This operator exponential may also be interpreted via spectral theory or by the theory of one-parameter semigroups. However, in this case there is also a direct probabilistic solution of equation (1.18) given by the Feynman-Kac formula.

PROPOSITION 1.2 (Diffusion with killing: The Feynman-Kac formula) The solution of the equation for diffusion with killing is given by

r(x, t) = μ[ e^{-∫_0^t V(x + w(s)) ds} r(x + w(t)) ].   (1.21)
This says that the solution is obtained by letting the particle diffuse according to a Wiener process, but with a chance to vanish from its current location in space at a rate proportional to the value of V at this location. The exponential factor is the probability that the particle has not yet vanished at time t. There are discussions of the Feynman-Kac formula in Nelson's article [6] on the Feynman integral and in the book by Simon [16] on functional integration.
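The Feynman-Kac formula (1.21) lends itself to direct Monte Carlo evaluation: simulate many Wiener paths started at x, accumulate the integral of V along each path, and average. The sketch below does this with an Euler discretization of the path; the particular killing rate and initial function in the example are illustrative choices.

```python
import numpy as np

def feynman_kac(x, t, V, r, sigma=1.0, n_steps=200, n_paths=20_000, seed=0):
    """Monte Carlo estimate of r(x,t) = mu[ exp(-int_0^t V(x+w(s)) ds) r(x+w(t)) ],
    the Feynman-Kac formula (1.21), using an Euler discretization of the Wiener path."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    pos = np.full(n_paths, float(x))
    integral = np.zeros(n_paths)                 # running int_0^t V(x + w(s)) ds
    for _ in range(n_steps):
        integral += V(pos) * dt
        pos += sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return np.mean(np.exp(-integral) * r(pos))

# Example: harmonic killing rate V(x) = x^2/2 and initial data r(x) = 1.
print(feynman_kac(x=0.0, t=1.0, V=lambda z: 0.5 * z**2, r=lambda z: np.ones_like(z)))
```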
1.5 DIFFUSION, DRIFT, AND QUANTUM MECHANICS

Another connection of quantum mechanics with diffusion is more subtle. Instead of diffusion with killing, there is diffusion with a drift that maintains equilibrium. Let H be the Schrödinger operator (1.15) of section 1.4, expressed as before in inverse time units. Suppose that H has an eigenfunction ψ_0(x) > 0 with eigenvalue λ_0. Thus

H ψ_0 = λ_0 ψ_0.   (1.22)

This gives a particular decay mode solution of the diffusion with killing equation (1.18) of the form r_0(x, t) = e^{-λ_0 t} ψ_0(x). The interpretation in terms of diffusion with drift comes from the change of variable r(x, t) = f(x, t) e^{-λ_0 t} ψ_0(x). In other words, f(x, t) is the ratio of the solution to the decay mode solution, which is seen to satisfy the partial differential equation

∂f/∂t = [ (1/2) σ² ∂²/∂x² + u(x) ∂/∂x ] f.   (1.23)
Here the function u(x) represents a drift vector field given by

u(x) = σ² (1/ψ_0(x)) ∂ψ_0(x)/∂x.   (1.24)
Figure 1.4 A drift function u(x).

A typical drift function u(x) is sketched in Figure 1.4. Again this equation has an operator solution. Define the backward diffusion with drift operator Ĥ by a similarity transformation given by the operator product

Ĥ = (1/ψ_0) · (H - λ_0) · ψ_0.   (1.25)
The transformed operator has the form

Ĥ = -(1/2) σ² ∂²/∂x² - u(x) ∂/∂x.   (1.26)
The first term is the diffusion term, and the second term is the term corresponding to a drift u(x). The equation (1.23) may be written as the backward equation
∂f/∂t = -Ĥ f.   (1.27)
A particle starts at x and diffuses under the influence of the drift. Let f(y) be a given function of the position variable y. Then the solution f(x, t) is the expectation of f(y), where y is taken as the random position of the diffusing particle at time t ≥ 0, provided that the particle was started at x at time 0. The initial condition for the equation is f(x, 0) = f(x). The operator solution of this equation is given by
f(x, t) = (e^{-tĤ} f)(x).   (1.28)
One probabilistic solution arises from the Feynman-Kac formula (1.21), given by
f(x, t) = e^{λ_0 t} μ[ (1/ψ_0(x)) e^{-∫_0^t V(x + w(s)) ds} f(x + w(t)) ψ_0(x + w(t)) ].   (1.29)
However, this solution emphasizes the connection with killing. The true nature of this solution is revealed by looking at it directly as a diffusion process with drift vector field u(x). The diffusion process with drift u may be defined directly by the stochastic differential equation
dx(t) = u(x(t)) dt + dw(t)   (1.30)
10
CHAPTER 1
with the initial condition x(0) = x. The first term on the right represents the effect of the systematic drift, while the second term on the right represents the influence of random diffusion. This equation involves the Wiener path, which is nondifferentiable with probability 1. However, it may be formulated in integrated form as
x(t) - x = ∫_0^t u(x(s)) ds + w(t).   (1.31)
For each continuous path w(t) this equation determines a corresponding continuous path x(t). Since the w(t) paths are random (given by the Wiener process), the x(t) paths are also random. In particular, let f(y) be a function of the space variable, and consider the expectation
f(x, t) = μ[ f(x(t)) | x(0) = x ]   (1.32)
as a function of time and the starting position.

PROPOSITION 1.3 (Diffusion with drift: Stochastic differential equation) Suppose the diffusion with drift process x(t) is defined by the stochastic differential equation with the initial condition x(0) = x. Then the expectation f(x, t) satisfies the backward diffusion with drift equation. In other words, the expectation satisfies equation (1.23).

This result is standard in the theory of Markov diffusion processes. A charming account is found in Nelson's book [7].
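The stochastic differential equation (1.30) can be simulated by the elementary Euler scheme, replacing dt by a small time step and dw(t) by a Gaussian increment with variance σ² dt. The following sketch is a minimal version; the drift function supplied in the example is an illustrative choice.

```python
import numpy as np

def simulate_drift_diffusion(x0, u, T=10.0, sigma=1.0, n_steps=10_000, rng=None):
    """Euler approximation of dx(t) = u(x(t)) dt + dw(t), equation (1.30),
    where dw are increments of a Wiener process with diffusion constant sigma^2."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dw = sigma * np.sqrt(dt) * rng.standard_normal()
        x[k + 1] = x[k] + u(x[k]) * dt + dw
    return np.linspace(0.0, T, n_steps + 1), x

# Linear drift toward the origin (the Ornstein-Uhlenbeck case of section 1.7).
t, x = simulate_drift_diffusion(x0=2.0, u=lambda y: -1.0 * y)
```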
1.6 STATIONARY DIFFUSION

Another way of describing diffusion with drift is in terms of probability density. Say that instead of starting the diffusing particle at a fixed point, its starting point is random with probability density p(x). Then, after time t, the probability will have diffused to p(y, t). Thus, if one computes the expectation of a function f(y) of the position at time t, one gets

∫_{-∞}^{∞} f(y) p(y, t) dy = ∫_{-∞}^{∞} f(x, t) p(x) dx.   (1.33)
The density satisfies a partial differential equation, the forward equation (or Fokker-Planck equation). It is

∂p/∂t = [ (1/2) σ² ∂²/∂y² - ∂/∂y u(y) ] p.   (1.34)

Again this has an operator form. Define the forward diffusion with drift operator Ĥ' by another similarity transformation, this time given by the operator product

Ĥ' = ψ_0 · (H - λ_0) · (1/ψ_0).   (1.35)

This has the form

Ĥ' = -∂/∂y ( (1/2) σ² ∂/∂y - u(y) ).   (1.36)
Figure 1.5 A stationary density p_0(x).
The equation

∂p/∂t = -Ĥ' p   (1.37)

has an operator solution

p(y, t) = (e^{-tĤ'} p)(y).   (1.38)

Let

p_0(x) = ψ_0(x)².   (1.39)

Then p_0 is interpreted as an equilibrium probability density. The drift u may be expressed directly in terms of p_0 by

u(x) = (1/2) σ² (1/p_0(x)) ∂p_0(x)/∂x.   (1.40)
Say that the initial probability density p(x) = p_0(x), the probability density that defines the diffusion process. Then since Ĥ' p_0 = 0 the density remains the same; that is, p(x, t) = p_0(x) for all t. This is the stationary diffusion process. A typical stationary density is shown in Figure 1.5.

Since the stationary diffusion process is defined from the process with killing by a similarity transformation, the measure associated with the stationary diffusion process may be defined in terms of the killing rate V(y) by
μ̃[F] = e^{Tλ_0} ∫ p_0(x) μ[ F (1/ψ_0(x)) e^{-∫_0^T V(x + w(s)) ds} ψ_0(x + w(T)) ] dx.   (1.41)
Here F depends on the path x(s) for s between time 0 and time T. Since p_0(x) = ψ_0(x)² this is equivalent to
μ̃[F] = e^{Tλ_0} ∫ μ[ F ψ_0(x) e^{-∫_0^T V(x + w(s)) ds} ψ_0(x + w(T)) ] dx.   (1.42)
Figure 1.6 A sample path of a nonlinear diffusion dx = u(x) dt + dw(t).

This suggests that it should be possible to construct the stationary diffusion with drift measure directly from the Wiener measure with killing, without using the knowledge of the ground state wave function ψ_0(x). This sort of construction turns out to be useful in the field theory context discussed in sections 1.8-1.10. Let
μ̃_T[F] = (1/Z_T) μ[ F e^{-∫_0^T V(w(s)) ds} δ(w(T)) ].   (1.43)
This is the Wiener expectation, conditioned on returning to 0 after time T and on not vanishing. The delta function enforces the condition of return to the origin, and the Z_T in the denominator is the probability of not vanishing along the way. Suppose that T is large and that F depends only on part of the path well in the interior of the interval from 0 to T. The expectations should then look much like the expectations for the stationary diffusion process.

This gives a peculiar but useful view of the stationary diffusion process in terms of the killing process. The diffusing particle starts at zero and is lucky enough both to survive and to return to zero. Along the way it appears to be diffusing in equilibrium. However, this equilibrium is maintained at the cost of the many unsuccessful attempts that are discarded. A sketch of a typical sample path is shown in Figure 1.6.

The transition from quantum mechanics to diffusion is computationally difficult. In fact, the first step is to start with the function V(x) and solve the eigenvalue equation for the ground state wave function ψ_0(x). Going the other way is easier. Start with ψ_0(x) or with p_0(x) = ψ_0(x)². Then
u(x) = (1/2) σ² ∂ log(p_0(x))/∂x,   (1.44)

and V(x) - λ_0 is recovered from

V(x) - λ_0 = (1/(2σ²)) u(x)² + (1/2) ∂u(x)/∂x.   (1.45)

This, in fact, is the way the illustrations in the preceding sections were conceived.
The starting point was a density given by two displaced Gaussians of the form (1.46). From this it was easy to use equation (1.44) to compute the drift u(x) and equation (1.45) to get the subtracted potential V(x) - λ_0. Once the drift was available, it was easy to simulate the diffusion process by using the stochastic differential equation (1.30).
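The recipe just described is easy to carry out symbolically. The sketch below starts from an (unnormalized) density built from two displaced Gaussians, computes the drift from (1.44), and recovers the subtracted potential from (1.45); the particular constants a and s are illustrative, not the ones used for the book's figures.

```python
import sympy as sp

x, sigma, a, s = sp.symbols('x sigma a s', positive=True)

# An (unnormalized) density made of two displaced Gaussians.
p0 = sp.exp(-(x - a)**2 / (2 * s**2)) + sp.exp(-(x + a)**2 / (2 * s**2))

# Drift from (1.44): u = (1/2) sigma^2 d/dx log p0.
u = sp.Rational(1, 2) * sigma**2 * sp.diff(sp.log(p0), x)

# Subtracted potential from (1.45): V - lambda_0 = u^2/(2 sigma^2) + (1/2) du/dx.
V_minus_lambda0 = u**2 / (2 * sigma**2) + sp.Rational(1, 2) * sp.diff(u, x)

print(sp.simplify(u))
print(sp.simplify(V_minus_lambda0))
```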
1.7 THE ORNSTEIN-UHLENBECK PROCESS

This section is devoted to the Ornstein-Uhlenbeck process, which is a special case where everything can be computed. This process was originally introduced to provide a model for physical Brownian motion. In this model the Ornstein-Uhlenbeck process describes the velocity of the diffusing particle, so it might properly be called the Ornstein-Uhlenbeck velocity process. Thus it is a more detailed model than the Einstein model of Brownian motion using the Wiener process, in which the particle paths are nondifferentiable and hence do not have velocities. In the following discussion the Ornstein-Uhlenbeck process is used in another way, as a description of the position of a diffusing particle that has a tendency to drift toward the origin under the influence of a linear vector field.

The general importance of the Ornstein-Uhlenbeck process is that it is not only a Markov diffusion process, but it is also Gaussian. Thus it has all good properties at once. The corresponding object in quantum theory with all good properties is the quantum harmonic oscillator. In fact, the Ornstein-Uhlenbeck diffusion process may be used as a tool to understand the quantum harmonic oscillator.

For the quantum harmonic oscillator the killing rate depends quadratically on the distance from the origin, so

V(x) = (1/2) (ω²/σ²) x².   (1.47)

This is the harmonic oscillator potential energy in units of inverse time. This is simply a parabola, as sketched in Figure 1.7. Again the sketch shows the potential with the lowest eigenvalue subtracted. The usual harmonic oscillator potential energy expression in quantum mechanics is obtained by multiplying the right-hand side of equation (1.47) by ℏ and is (1/2) m ω² x². The energy operator H, also in inverse time units, is determined by
H = -(1/2) σ² ∂²/∂x² + (1/2) (ω²/σ²) x².   (1.48)

It is easy to see that H has least eigenvalue λ_0 = (1/2) ω with eigenfunction

ψ_0(x) = C e^{-(ω/(2σ²)) x²}.   (1.49)
The corresponding Gaussian probability density is

p_0(x) = ψ_0(x)² = C² e^{-(ω/σ²) x²}.   (1.50)
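The eigenvalue computation in (1.48)-(1.49) can be confirmed symbolically. The following sketch applies H to ψ_0 and recovers the eigenvalue ω/2; the normalization constant C is omitted since it cancels.

```python
import sympy as sp

x, sigma, w = sp.symbols('x sigma omega', positive=True)

psi0 = sp.exp(-w * x**2 / (2 * sigma**2))      # ground state (1.49), constant C omitted

# H in inverse time units, equation (1.48), applied to psi0.
H_psi0 = (-sp.Rational(1, 2) * sigma**2 * sp.diff(psi0, x, 2)
          + sp.Rational(1, 2) * (w**2 / sigma**2) * x**2 * psi0)

# H psi0 / psi0 should reduce to the lowest eigenvalue omega/2.
print(sp.simplify(H_psi0 / psi0))   # omega/2
```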
Figure 1.7 The Ornstein-Uhlenbeck killing rate (harmonic oscillator potential) with subtraction V(x) - λ_0 = (1/2)(ω²/σ²)x² - (1/2)ω.
Figure 1.8 The Ornstein-Uhlenbeck (harmonic oscillator) drift u(x) = -ωx.
The drift in the diffusion process works out to be the linear drift
u(x) = -ωx.   (1.51)
This linear drift is sketched in Figure 1.8. Nothing could be simpler. The Ornstein-Uhlenbeck process x(t) may be defined by the linear stochastic differential equation (the Langevin equation)
dx(t) = -ω x(t) dt + dw(t)   (1.52)
with the initial condition x(0) = x. The first term on the right is a drift toward the origin depending linearly on the distance, while the second term on the right is the random diffusion term.

PROPOSITION 1.4 (Ornstein-Uhlenbeck process: Differential equation) The Langevin stochastic differential equation for the Ornstein-Uhlenbeck process has the explicit solution given for t ≥ 0 by
x(t) = e^{-ωt} x + ∫_0^t e^{-ω(t-s)} dw(s).   (1.53)
This is a Wiener stochastic integral, so each x(t) is Gaussian. Furthermore, the conditional mean is given by the exponential decay factor
x̄(t) = μ[ x(t) | x(0) = x ] = e^{-ωt} x.   (1.54)
By (1.11) the conditional covariance is

μ[ (x(t) - x̄(t))(x(t') - x̄(t')) | x(0) = x ] = σ² ∫_0^{t∧t'} e^{-ω(t-s)} e^{-ω(t'-s)} ds.   (1.55)

Here t ∧ t' denotes the minimum of t and t'. This integral is elementary; the result is that the conditional covariance of x(t) is given by

μ[ (x(t) - x̄(t))(x(t') - x̄(t')) | x(0) = x ] = (σ²/(2ω)) (e^{-ω|t-t'|} - e^{-ω(t+t')}).   (1.56)

In particular, x(t) has conditional variance

μ[ (x(t) - x̄(t))² | x(0) = x ] = (σ²/(2ω)) (1 - e^{-2ωt}).   (1.57)
If the time t > 0 is large, then the Ornstein-Uhlenbeck process is in a stationary equilibrium state. In this case x(t) has mean zero and variance σ²/(2ω). Furthermore, it is Gaussian, as shown in Figure 1.9. The covariance is obtained by multiplying the variance by the exponential decay factor e^{-ω|t-t'|}.

For t > 0 let g_t be the Gaussian density with mean zero and variance given by (σ²/(2ω))(1 - e^{-2ωt}). The famous Mehler formula, expressed in terms of such a Gaussian density, follows immediately from the calculation of the conditional mean and variance and the fact that x(t) is Gaussian.

PROPOSITION 1.5 (Ornstein-Uhlenbeck process: Mehler formula) Let x(t) diffuse according to the Ornstein-Uhlenbeck process. For every bounded measurable function f the conditional expectation for t > 0 is given by the integral
f(x, t) = μ[ f(x(t)) | x(0) = x ] = ∫_{-∞}^{∞} g_t(y - e^{-ωt} x) f(y) dy.   (1.58)
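Because the solution (1.53) is an explicit Gaussian, the Ornstein-Uhlenbeck process can be sampled exactly on any time grid. The sketch below draws samples of x(t) given x(0) = x and compares the empirical mean and variance with (1.54) and (1.57); the parameter values are illustrative.

```python
import numpy as np

def ou_exact_step(x, dt, omega=1.0, sigma=1.0, rng=None):
    """One exact sampling step of the Ornstein-Uhlenbeck process (1.52)-(1.53):
    given x(0) = x, the value x(dt) is Gaussian with mean e^{-omega dt} x and
    variance (sigma^2 / (2 omega)) (1 - e^{-2 omega dt})."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.exp(-omega * dt) * x
    var = sigma**2 / (2 * omega) * (1.0 - np.exp(-2 * omega * dt))
    return mean + np.sqrt(var) * rng.standard_normal(np.shape(x))

rng = np.random.default_rng(1)
omega, sigma, t, x0 = 0.7, 1.0, 2.0, 3.0
samples = ou_exact_step(np.full(100_000, x0), dt=t, omega=omega, sigma=sigma, rng=rng)
print(samples.mean(), np.exp(-omega * t) * x0)                                 # (1.54)
print(samples.var(), sigma**2 / (2 * omega) * (1 - np.exp(-2 * omega * t)))    # (1.57)
```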
Figure 1.9 The Ornstein-Uhlenbeck (harmonic oscillator) stationary density p_0.

A mass m > 0 determines a time oscillation rate
m_T = mc²/ℏ   (1.65)

and a spatial decay rate

m_L = mc/ℏ.   (1.66)

While in Minkowski field theory the time coordinate plays a special role, in Euclidean field theory it is just another space coordinate. A mass thus determines both a time decay rate m_T and a spatial decay rate m_L, as given by the above formulas.

The easiest case of Euclidean fields is the free field, which is a mean-zero Gaussian field. As discussed in the appendix (section 1.16) to this introduction, such a field is automatically constructed merely by specifying its covariance. In the one-dimensional case the stationary Ornstein-Uhlenbeck process is a mean zero Gaussian process formulated in terms of a random function x(t) from time to the space in which a particle is moving. This process has covariance
C(t, s) = (σ²/(2ω)) e^{-ω|t-s|} = σ² (-d²/dt² + ω²)^{-1}(t, s).   (1.67)
The last expression for the covariance says that it is the kernel of an integral operator that inverts a differential operator.

In the Euclidean field point of view it is more natural to think of the Ornstein-Uhlenbeck process as a random function φ(x) of a space variable x. The value of the function is some field quantity. The covariance of the stationary Ornstein-Uhlenbeck process regarded as a random function of a space variable x in the one-dimensional Euclidean space E is
C_1(x, y) = (σ²/c) (1/(2 m_L)) e^{-m_L|x-y|} = (σ²/c) (-d²/dx² + m_L²)^{-1}(x, y).   (1.68)

The natural generalization to a space variable x belonging to a Euclidean space E of dimension n is

C(x, y) = (σ²/c) (-∇_x² + m_L²)^{-1}(x, y).   (1.69)

For example, in dimension 3 this is

C_3(x, y) = (σ²/c) (-∇_x² + m_L²)^{-1}(x, y) = (σ²/c) (1/(4π|x - y|)) e^{-m_L|x-y|}.   (1.70)
In all dimensions above 1 the covariance is singular on the diagonal x = y. In dimension 2 this is a logarithmic singularity, while in n > 2 dimensions it is an
inverse n - 2 power. This means that a random function with this covariance would have to have an infinite variance. Finiteness is restored by having random variables that are Schwartz distributions (generalized functions). That is, they are defined not on points but on test functions. A test function is a function that is smooth and sufficiently rapidly decreasing. The random variable is then

φ(f) = ∫ φ(x) f(x) d^n x,   (1.71)
where the right-hand side is only a formal expression. The field φ(x) itself is only a formal expression: it fails to exist as a function because it oscillates too much on small distance scales. The averaged object φ(f) is defined because the oscillations are cancelled by averaging with a smooth weight function. The covariance of φ(f) with φ(g) is
C(f, g) = ∫∫ f(x) C(x, y) g(y) d^n x d^n y.   (1.72)
This is well defined, since the integral on the right is finite. In particular, the variance of φ(f) is

C(f, f) = ∫∫ f(x) C(x, y) f(y) d^n x d^n y.   (1.73)
All we need to know about the covariance is that this variance is finite. Then we automatically have mean zero Gaussian random variables that are Schwartz distributions. Let μ be the probability measure defining the random Schwartz distributions that constitute the Euclidean free field. The natural analogy with the Feynman-Kac formula might lead us to define non-Gaussian measures by
P[F] = (1/Z_Λ) μ[ F(φ) exp( -∫_Λ V(φ(x)) d^n x ) ].   (1.74)
Here Λ is a large box, and the φ(x) is conditioned to be zero on the boundary of the box. This is the analog in higher dimensions of the stationary process started at a fixed time and conditioned to survive and return to the starting point at a later fixed time. In that case the exponential factor describes the survival in the face of possible killing in regions of space with large potential V, and the delta function enforces the return. In the field theory case the exponential factor penalizes field configurations with large values of the potential V, and the zero boundary condition pins the field values on the sides of the box.

The problem is that it is a delicate matter to define nonlinear functions of Schwartz distributions. A naive attempt would be to define V(φ)(f) to be V(φ(f)). The calculation
∫ V(φ(x)) f(x) d^n x ≠ V( ∫ φ(x) f(x) d^n x )   (1.75)
shows that this is misguided. The right-hand side V(φ(f)) is defined. But it obviously does not give a good definition of the left-hand side. Therefore it is striking that there are situations where it is possible to define a polynomial in the field,
in fact, a Wick power. As explained in the appendix (section 1.16), the kth Wick power :φ(x)^k: of a Gaussian field φ(x) looks formally like a polynomial of degree k in φ(x). It may exist even when the ordinary power φ(x)^k does not exist. The reason is that the expectation of the product of the Wick power :φ(x)^k: with the Wick power :φ(y)^k: is k! C(x, y)^k. The covariances C(x, x) and C(y, y) do not enter into the expectation.

PROPOSITION 1.6 (Wick powers of the Gaussian free field) Suppose that for some integer power k ≥ 1 the integral

∫∫ g(x) C(x, y)^k g(y) d^n x d^n y

is finite.

Think of x = (x, ct) as having a space part and an imaginary time part. Then this singularity is integrable when one only integrates over (n - 1)-dimensional space. For each test function f_0 on (n - 1)-dimensional space, define the sharp-time field as the integral
φ_t(f_0) = ∫ φ(x, t) f_0(x) d^{n-1} x.   (1.79)
This field is a well-defined Gaussian random variable with finite covariance
μ[φ_s(f_0) φ_t(g_0)] = ∫∫ f_0(x) C(x, s; y, t) g_0(y) d^{n-1} x d^{n-1} y.   (1.80)
The sharp-time fields φ_t(f_0) for fixed values of t belong to a certain Hilbert space of Schwartz distributions; this is a space in which diffusion can take place. Again consider Euclidean space-time with coordinates x = (x, ct) and condition on the fields φ(x, 0) at time zero. Write
C = σ² ( -∂²/∂t² + ω² )^{-1},   (1.81)
where ω is the positive square root of the partial differential operator
ω² = -c² ∇_x² + m_T².   (1.82)
Then, in analogy with equation (1.67), the time dependence of the covariance is

C_{st} = (1/2) σ² ω^{-1} e^{-ω|t-s|},   (1.83)

so the covariance of the time zero fields is C_{00} = (1/2) σ² ω^{-1}, which is invertible with inverse C_{00}^{-1} = (2/σ²) ω. According to the general results in the appendix (section 1.16), the conditional mean operator is K_t = C_{0t} C_{00}^{-1}. This is K_t = e^{-ω|t|}. The conditional mean of the time t field given the time zero field φ_0 is μ*[φ_t(f_0)] = (f_0, K_t φ_0). In short,

μ*[φ_t] = e^{-ω|t|} φ_0,   (1.84)

where φ_0 is the given time zero field. Again, from the general results the conditional covariance is the covariance minus the covariance of the conditional means. This gives
C*(s, t) = (1/2) σ² ω^{-1} (e^{-ω|t-s|} - e^{-ω(|t|+|s|)}).   (1.85)
In particular, the equal-time conditional covariance is

C*(t, t) = (1/2) σ² ω^{-1} (1 - e^{-2ω|t|}).   (1.86)

Choose a sufficiently small Hilbert space ℋ_1 of moderately smooth test functions on (n - 1)-dimensional Euclidean space with a sufficiently large dual Hilbert space ℋ_0 of moderately rough distributions. These are chosen so that C*(t, t) is of trace class from ℋ_1 to ℋ_0. For test functions f_0 in ℋ_1 the corresponding fixed-time fields φ_t(f_0) live in ℋ_0. Define a diffusive dynamics on the fixed-time fields by taking conditional expectations given the time zero fields. Then for the time t fields the conditional mean is e^{-ωt} φ_0 and the conditional covariance is the C*(t, t) given above.
PROPOSITION 1.7 (Gaussian free field: Mehler formula) Let γ_t be the Gaussian measure with mean zero and covariance C*(t, t). Consider a bounded continuous real function F on the Hilbert space ℋ_0 of fixed-time fields. Then the conditional expectation of F evaluated on the field at time t > 0 given the field at time zero is implemented by an infinite-dimensional Mehler formula
(e^{-tĤ} F)(φ_0) = ∫ F(e^{-ωt} φ_0 + x) dγ_t(x).   (1.87)
The theory of the infinite-dimensional Ornstein-Uhlenbeck semigroup with its Mehler formula is available from several sources; one brief account is [1]. The field X(t) diffuses in the field space ℋ_0 with diffusion constant σ² and drift given by the -ω from equation (1.82) acting on the field. Since ω ≥ m_T > 0 the action of exp(-tω) on ℋ_0 is stabilizing and produces a stationary process, the Euclidean free field. The Mehler formula yields the expectation of this process at time t, given the time-zero field.

Suppose a construction produces a random field on Euclidean space E of dimension n. One can think of the Euclidean space as having coordinates x = (x, ct), where x is an ordinary space coordinate and t is imaginary time. Do we then have a quantum field on Minkowski space M of dimension n? The strategy would be to replace t by it and leave the space variables x alone. It turns out that there is such an analytic continuation, but it is a subtle matter. The case of the free field is relatively simple. The replacement leads from the Ornstein-Uhlenbeck semigroup exp(-tĤ) in the case of the free Euclidean field to the harmonic oscillator evolution exp(-itĤ) in the case of the free Minkowski quantum field.

For non-Gaussian interacting fields the passage from Euclidean probability to Minkowski field theory requires more work. At the outset one needs estimates on the interaction for the Euclidean fields. The logarithmic Sobolev inequality (or equivalently, the hypercontractivity condition) gives such an estimate. The classic logarithmic Sobolev inequality is for the generator Ĥ of the Ornstein-Uhlenbeck process or, equivalently, the quantum mechanical harmonic oscillator. As explained in the contribution by Gross, a logarithmic Sobolev inequality is equivalent to a lower bound for certain Schrödinger operators. The following lower bound is a consequence of this equivalence and of the existence of a logarithmic Sobolev inequality for Gauss measure.

PROPOSITION 1.8 (Semiboundedness of Hamiltonians) Consider a real Hilbert space ℋ_1 and a covariance operator ω^{-1}: ℋ_1 → ℋ_0 from it to its dual space. Suppose that the time decay operators e^{-tω} for t ≥ 0 act in ℋ_0 as a strongly continuous semigroup of operators. Let σ² > 0 be a diffusion constant, and let γ_t
be the Gaussian measure with mean zero and covariance (σ²/2)(1 - e^{-2ωt}) ω^{-1}. Suppose that the covariance ω^{-1} is trace class, so that γ_t is supported on ℋ_0. Define the Ornstein-Uhlenbeck generator Ĥ by the Mehler formula

(e^{-tĤ} F)(φ) = ∫ F(e^{-ωt} φ + x) dγ_t(x).   (1.88)

This acts in the space L² of functions on ℋ_0 that are square integrable with respect to γ_∞. Suppose that there is a constant m_T with ω ≥ m_T > 0. Consider a real function V on ℋ_0 such that e^{-V/m_T} is in L². Then the sum of the Ornstein-Uhlenbeck generator with V is bounded below. That is,
(1.89) as quadratic forms.
The function V may be unbounded below, but the operator Ĥ + V is bounded below. At first this result looks incredibly weak, since it holds only if the negative singularity of V is extremely mild. The remarkable fact is that it is independent of dimension. It even holds in infinitely many dimensions, and this makes possible the application to quantum field theory. The contribution to this volume by Simon explains Nelson's application of this result to Wick powers [5] as a step in the construction of random fields on two-dimensional Euclidean space. The contribution by Brydges goes on to show how another idea of Nelson led to the construction of the quantum field from the Euclidean field. An exposition by Nelson himself is found in [10].
1.10 INTERSECTING PATHS AND LOCAL TIMES

The particle picture describes particles moving in space as a function of time. The field picture describes fields defined on Euclidean space E (space and imaginary time) or Minkowski space-time M (space and real time). There is a strong analogy, but it is mainly through the mathematics. It is striking that there is another way of describing fields in terms of a sea of particles (paths defined as functions of an auxiliary parameter) moving in space E or space-time M. This auxiliary parameter is a kind of artificial time; it should not be identified with ordinary time. The ultimate result is a picture of fields represented in terms of diffusing particles in space (functions of the artificial time). Each particle diffuses up to an exponentially distributed random "time" T (depending on the particle), at which point it vanishes. Furthermore, each particle has a random ± "charge." The field value φ(y) represents a "charge"-weighted sum involving the total amount of "time" each particle in the sea spends at y.

There are several variants of this theory. The one described here is an early version due to Wolpert. His first article [19] describes the theory of Wiener path intersections and local times, and his second article [18] gives the particle picture for Euclidean field theory. The idea behind these representations is that the central limit theorem causes a weighted sum of the local times of many particles to become a Gaussian field. In the contribution of Brydges there is some discussion of
a different mechanism whereby the sum of local times is related to the square of a Gaussian field.

Consider a Wiener process in Euclidean space starting at a point x; for convenience take the diffusion constant to be σ² = 2. This process describes random continuous paths t ↦ w(t) with values in E = ℝ^n, defined for t ≥ 0 and with w(0) = x. It is interpreted as describing the path of a particle diffusing in space (or Euclidean space-time) as a function of the auxiliary time parameter. The transition density of w(t) is Gaussian; that is,
g_t(y - x) = μ_x[ δ(w(t) - y) ]   (1.90)
as a function of y is a Gaussian density with mean x and with covariance 2t I_n, where I_n denotes the identity matrix. The connection with field theory is that the covariance of the free Euclidean field (1.69) on E = ℝ^n is proportional to
G(y - x) = ∫_0^∞ e^{-m²t} g_t(y - x) dt.   (1.91)
The role of the exponential factor e^{-m²t} is to represent the probability that the particle has not yet vanished at "time" t. Define the local time by
τ(w, y) = ∫_0^T δ(w(t) - y) dt.   (1.92)
This measures the "time" spent at y by a particle that diffuses from x and vanishes at random "time" T. The rate of vanishing is the constant rate m² > 0. This is a formal expression, but it can be interpreted in the sense of distributions. Multiply by a weight function that depends on y and then integrate over y; the result is a well-defined random variable. For instance, if the weight function were the indicator function of a region in space, this would represent the "time" that the particle spent in this region until its death at "time" T. The relation between the covariance and the local time is
G(y - x) = μ_x[ τ(w, y) ].   (1.93)
This is the covariance of the free Euclidean field. The expectation of a quantity quadratic in the field is expressed in terms of an expectation linear in the local time.

Consider a large box of volume V. Let N be a Poisson random variable with mean α. Consider a random number N of independent Wiener processes w_i(t) starting at random points x_i chosen independently and uniformly in the box. This is the sea of diffusing particles. Each particle has a positive or negative random sign σ_i = ±1, for i = 1, ..., N, also chosen independently and with equal probability for the two signs. (This might be thought of as a kind of charge.) Furthermore, each of the particles diffuses according to a Wiener process with killing at the rate m². The "time" the ith particle vanishes is T_i. Define a parameter λ by
1/λ = 2α/(m² V).   (1.94)
The α/V factor describes the average density of the starting points for the particles, while the 1/m² time factor represents how long the particle lives on the average. For each particle there is a local time

τ(w_i, y) = ∫_0^{T_i} δ(w_i(t) - y) dt,   (1.95)

and the weighted field is

φ_λ(y) = λ^{1/2} Σ_{i=1}^{N} σ_i τ(w_i, y).   (1.96)
For large expected number α of particles this is a small constant times a sum with many terms, each of which is the local time of a particle at y, weighted by the corresponding sign. As usual in the theory of Schwartz distributions this makes sense if we cancel fluctuations by averaging:
φ_λ(f) = ∫ φ_λ(y) f(y) dy   (1.97)
is a well-defined random variable.

PROPOSITION 1.9 (Local time and the free field (Wolpert)) The joint probability distribution of the random variables φ_λ(f) defined in terms of the local times with random signs converges as λ → 0 to the joint probability distribution of the random Euclidean free fields φ(f).
His proof proceeds by computing moments. The computation of the covariance helps explain the choice of the coefficient λ^{1/2}. Since the random signs produce considerable cancellation, the computation soon leads to the expression (1.98). The contribution to the expectation of the product of local times is twice the contribution from the situation where x is visited before y. The particle starts off uniformly at x_1 and diffuses until it reaches x and then continues to diffuse until it reaches y and then eventually vanishes. The contribution from the diffusion from x_1 to x is 1/V times 1/m², since it has to get from the random x_1 to x before it vanishes. The contribution from the diffusion from x to y is G(y - x). So the final result for the covariance is (1.99).

Wolpert proved that for diffusion in two-dimensional space one can define products of local times. In other words, for k independent diffusions in two dimensions it is meaningful to consider
τ(w_1, y) ··· τ(w_k, y) = ∫_0^{T_1} ··· ∫_0^{T_k} δ(w_1(t_1) - y) ··· δ(w_k(t_k) - y) dt_1 ··· dt_k
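Relation (1.93) can be checked by a rough Monte Carlo computation in dimension one, where G(y - x) = (1/(2m)) e^{-m|y-x|} for diffusion constant σ² = 2. The sketch below approximates the local time at y by the occupation time of a small interval around y, for paths killed at the constant rate m²; the discretization parameters are illustrative and introduce some bias.

```python
import numpy as np

def occupation_estimate(x, y, m=1.0, eps=0.1, dt=1e-3, n_paths=20_000, seed=0):
    """Monte Carlo estimate of G(y - x) = mu_x[tau(w, y)], relation (1.93), in dimension 1.
    The local time at y is approximated by the time the path spends within eps/2 of y,
    divided by eps.  The path has diffusion constant sigma^2 = 2 (as in the text) and is
    killed at the constant rate m^2, i.e. at an exponential time with mean 1/m^2."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_paths):
        T = rng.exponential(1.0 / m**2)                  # killing time
        n_steps = max(int(T / dt), 1)
        steps = np.sqrt(2.0 * dt) * rng.standard_normal(n_steps)
        path = x + np.cumsum(steps)
        total += dt * np.count_nonzero(np.abs(path - y) < eps / 2) / eps
    return total / n_paths

# In one dimension G(y - x) = exp(-m |y - x|) / (2 m) when sigma^2 = 2.
print(occupation_estimate(0.0, 0.5), np.exp(-0.5) / 2.0)
```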
Chapter Two

Hypercontractivity, Logarithmic Sobolev Inequalities, and Applications: A Survey of Surveys

Leonard Gross

then H_0 + V is bounded below. Ed used the Feynman-Kac formula to combine hypotheses (a) and (b) to prove the semiboundedness. Some later proofs avoided the Feynman-Kac formula, replacing it by Trotter product formula ideas. (I again suggest that the reader see Barry Simon's contribution to this volume.) Ed also showed in this paper that the hypothesis (a) actually does hold for the potential of interest. In addition he showed:

c. If the infinitely many variables are replaced by one variable then indeed e^{-tH_0} is a bounded operator from L²(ℝ, γ) into L^p(ℝ, γ) for t > t_N(2, p), for p > 2.

That is, in the notation of Example 2.1.2, Ed proved that ‖e^{-tA_n}‖_{2→p} < ∞ when n = 1 and t > t_N(2, p). For a full proof of semiboundedness of H it remained therefore to extend the one-dimensional result in item (c) to infinite dimensions. This proved to be a major point in the further progress of constructive quantum field theory. This point was settled by Jim Glimm in his paper [37], in which he showed that in one dimension, e^{-tH_0} is actually a contraction from L² to L^p for large enough t, much larger
than t_N(2, p). Since these semigroup operators are integral operators with positive kernels, a straightforward use of Minkowski's inequality shows that the tensor product of two such operators from L² to L^p is bounded from L² (product space) to L^p (product space) with norm at most the product of the norms. This point was elaborated by Irving Segal in [87]. By tensoring up it follows then that e^{-tH_0} is a contraction in infinite dimensions as well, for large enough t. Jim Glimm's paper was a vital step, not only for the further progress of constructive quantum field theory, but also for understanding hypercontractivity itself, as it developed over the next few years.

All of this work on hypercontractivity (and I've mentioned only three papers out of many up to that time) was done originally in the highly structured context of quantum field theory. The mathematical structures centered on certain tensor algebras over a Hilbert space and various isomorphic versions. A functional analyst who might have found Theorem 2.1, above, interesting would have had great difficulty reading it out of the existing literature if he or she was unfamiliar with these special structures. In his paper [87] (submitted November 15, 1969), Irving Segal began the process of isolating the key mathematical concepts that had been responsible for some of the successes of the constructive quantum field theory program up to that point. He abstracted some of the important features of these structures and brought the seemingly specialized subject matter into the general mathematical literature. His paper also showed how the special structures fit into the abstract theory and how it is that causality (signals don't propagate faster than the speed of light) forces the potential V to be so singular (by example rather than by a theorem).

Shortly after Irving Segal's paper [87] was submitted, Raphael Høegh-Krohn and Barry Simon, in [54] (submitted November 30, 1970), further abstracted and developed some of the important features of these structures. They introduced the terminology "hypercontractive semigroup" in that paper and established the viewpoint of this concept that we still have today. I want to emphasize that I have discussed these two papers here because of their central role in today's understanding of the general notion of hypercontractivity. But the importance of the model-dependent evolution of these kinds of inequalities by Nelson, Glimm, Jaffe, L. Rosen, and Segal, even before these two papers appeared, cannot be overemphasized. The introduction to [54] is a good source of history of the preceding four years (1966-1970), during which there was rapid progress on the problem of existence of quantum fields. For an exposition of the state of the art of the existence theory in 1987 as well as of the many physical properties of the quantum fields constructed, the reader may consult the book [38] by Glimm and Jaffe.

Ed returned to the study of (what we now call) hypercontractivity in [75] (submitted September 1972) to show that the inequality t ≥ t_N(p, q) was necessary and sufficient for contraction, as in Theorem 2.1 above. Up to that time it was known that contractivity occurred after some large time t_0(p, q). Ed's technique in this paper was extremely novel: he proved that the one-dimensional case of Theorem 2.1 yielded a contraction by showing that the infinite-dimensional case was bounded. Ed gave yet another distinct proof of Theorem 2.1 in his Erice lecture [76] (submitted 1973).
53
HYPERCONTRACTIVITY
I've focused very narrowly on these five papers so far because they are on a direct line to the main subject matter of this survey. Without these papers in the discussion the main sequence of ideas wouldn't make sense. 1 need to warn the reader, though, that this is not intended to be anything like a comprehensive survey of the history of that time, even when restricted to the narrow subject matter at hand, hypercontractivity and logarithmic Sobolev inequalities. A listing of (I believe all) of the relevant papers on this subject along with some discussion may be found in [44]. Before the notion of logarithmic Sobolev inequality came along there were still two other developments intermediate between hypercontractivity and logarithmic Sobolev inequalities: We see from Chapter I that Ed's precise bounds in [75], that is, Theorem 2.1, yield the semiboundedess inequality (SB) by means of the two intermediate implications: (HC) implies (LS), which in tum implies (SB). But even before the advent of the intermediate inequality (LS), Guerra, Rosen, and Simon observed in [51] (submitted September 29, 1972) (cf. their equation (1.1)) that one can derive the inequality (SB) directly from Ed's strongest form, (HC), via Trotter product formula methods. The use of Trotter product formula methods means that one really only needs to know (HC) (for p :::: 2) for small t. Of course we understand now that one actually needs only the derivative of (HC) (for p :::: 2) at t :::: 0 to derive (LS) and therefore to derive semiboundedness, (SB). I want to go back in time now to a paper of Paul Federbush, [33] (submitted April 12, 1968). Paul found a different way to derive semiboundedness, given just the information in item (a) in Ed's paper. Paul was motivated in part by a desire, rather widespread at that time, to eliminate use of the Feynman-Kac formula from Ed's proof. I'm going to go into a little detail here because, as it turned out, the central point of Paul's paper was superceded by Jim Glimm's proof of contraction, while at the same time, what appears to have been a side issue for Paul in this paper could be said, in retrospect, to be a preliminary step toward logarithmic Sobolev inequalities. Paul began with essentially the identity (2.9). If we don't take the q( t) root in the norm we find
G(f) ==
~I
r l(e-
dt t=o ix
:::: A
Ix 11(xW
tHo f)(x)lq(t)dJ.L(x)
log 11(x)ldJ.L(x) - 2(Hol, f)
(2.13)
if q(O) :::: 2 and q'(O) :::: A > O. Paul then used Young's inequality, pretty much as in the proof of Theorem 2.5 above, to derive (essentially)
2((Ho
+ V)I,I)
~ Alllll~
-
Alle-vIAII~ - G(f) if
111112:::: 1.
(2.14)
Thus the semiboundedness of Ho + V would follow (given that Ed had already shown that lIe- vI A II§ < 00) if one could show that
sup{G(f) : 111112
:::: I} < 00.
(2.15)
This could actually have been derived from Ed's one-dimensional inequality for the case of interest to Paul (wherein physical space is taken to be a circle). But
54
CHAPTER 2
Paul gave instead a novel combinatorial proof based on expansion of the function f in tenns of Hennite functions of many independent variables. In the end, therefore, Paul did not differentiate Ed's hypercontractive inequality, as we did above in the proof of Theorem 2.4, but rather differentiated his own inequality. One half of Paul's paper is in fact devoted to getting estimates of the function appearing after the t-derivative in (2.13) in order to show that (2. t 5) holds. Because of the way he organized these inequalities he never wrote down the inequality Ifl210g IfldJ1, :::; const.(Hof, f) + ... , although one could easily derive this by piecing together some of his inequalities. Nowadays we can recognize this as a Sobolev-type inequality because we appreciate more fully the fact that Ho is indeed a Dirichlet fonn operator-an infinite-dimensional analog of (2.4). But at that time this viewpoint seems not to have been in the air, even though the common expression, Ho = w(k)a'kakdk, strongly suggested this and even though physicists often wrote (and write) Ho explicitly as a second-order partial differential operator in infinitely many variables. In any case it is clear that Paul actually proved half of Theorem 2.5 even though he never stated it as such and never even wrote down any variant of a logarithmic Sobolev inequality or even any variant of (2.12). As I emphasized in the previous section the equivalence in Theorem 2.5 has nothing to do with logarithmic Sobolev inequalities itself. Nor does the proof that (HC) implies (LS). Only the proof that (LS) implies (HC) actually requires that the operator AI' be a Dirichlet fonn operator. This last implication was not in anyone's mind (as far as I know) until December 1972 when I found myself waiting in a Greyhound bus terminal in Syracuse, New York, with nothing to do. It is perhaps time for me to thank the Greyhound Bus Company for allowing me enough time in their tenninal to prove that (LS) implies (HC). The paper [43] was submitted in June 1973. The influence or noninfluence of physics on mathematics is often argued back and forth. History is forgotten. The origin of hypercontractivity and logarithmic Sobolev inequalities in mathematical physics may well be lost to sight already by many users. Besides the previous discussion, the book [89], by Barry Simon, captures a lot of the relation of hypercontractivity to constructive quantum field theory of that early period.
I
I
2.3 SURVEYS Hypercontractivity and logarithmic Sobolev inequalities have evolved together over the past thirty years. The associated concepts and techniques, along with their applications to other parts of mathematics, have been discussed in quite a few books and surveys. This body of work can be conveniently (but in truth, arbitrarily) divided into seventeen overlapping areas, each of which has already been the subject of one kind of detailed surveyor another, with the exception of some of the relatively new areas. My intent is to give a necessarily superficial description of what each of these areas is about-the flavor, so to speak. Wherever possible I will refer the reader to existing surveys for more details rather than to original papers. The breadth of influence of these two concepts has been large. I want to warn the reader
55
HYPERCONTRACTIVITY
that not only have I omitted mention of many papers, including some of my own, but also it is likely that this compilation is incomplete even at the next level up, the level of surveys. Moreover, it is not impossible that I have even failed to mention entire areas of mathematics that have been affected by hypercontractivity and logarithmic Sobolev inequalities. Be assured that this was not by intention. Glaring omissions and historical errors have been greatly reduced by help from Barry Simon, Laurent Sal off-Coste, and Brian Davies, to whom I am very thankful. Those omissions and errors that remain are clearly my responsibility. A broad sample of papers on our topic can be accessed easily these days by searching Math. Sci. Net under "Anywhere logarithmic Sobolev" and "Anywhere hypercontractivity". The former category produced 316 entries on August 24, 2004. The latter category produced 124 entries on the same day.
2.3.1 Herbst inequalities (,75), concentration of measures Question: For what other measures besides Gauss measure does a logarithmic Sobolev inequality hold (and consequently hypercontractivity)? Much of the work on (LS) and (HC) is concerned directly with this question. Sufficient conditions have been adduced in many contexts. In a letter to me in November 1975, Ira Herbst showed, by a quantum field theory computation, that if (LS) holds for a measure J.L on IR then
l
e€X
2
dJ.L(x)
O. This necessary condition for the validity of (LS) also plays a role in understanding ultracontractivity, which will be discussed in section 2.3.6. Herbst's necessary condition for the validity of (LS) is closely related to the problem of concentration of measure: Suppose that J.L is a probability measure on the Borel sets of a metric space X and C is a closed subset of X with J.L( C) = 1/2. Then 1 - /-L( {x EX: dist( x, C) '< r}) goes to zero as r -+ 00. But how fast? If it goes to zero "very fast" then one could say that the measure J.L concentrates around the boundary of C.
SURVEYS: These two topics have been surveyed recently by Michel Ledoux [65, 66, 68, 70], by the Toulouse group [91], and by Bobkov and Ledoux [12]. The survey [70] relates these concepts to geometric quantities such as diameter, Ricci curvature, and the first eigenvalue of the Laplacian when the underlying space is a Riemannian manifold, or more generally a metric measure space in the sense of Gromov. 2.3.2 Clifford algebra HC and LS (,73) About 1935 the differential geometer O. Veblen pointed out a natural linear isomorphism between the exterior algebra over IRn and an algebra introduced by Clifford late in the nineteenth century. It happens that the exterior algebra over an infinitedimensional Hilbert space has an interpretation as the state space for a quantum mechanical system consisting of a variable number of electrons. Motivated by this
56
CHAPTER 2
relation to quantum field theory, Irving Segal showed in [86] that the isomorphism is also isometric, provided one completes the infinite-dimensional Clifford algebra in the "L 2 " norm. What is involved here is a notion of noncom mutative integration, developed initially by von Neumann, and later by Segal. (A self-contained exposition of noncommutative integration may be found in the 1974 paper of Ed Nelson, [77].) The usefulness of the representation of the electron space by L2(Clifford algebra) for proving existence of lowest eigenstates of some infinite-dimensional quantum field Hamiltonians was already known to this author in 1972 [41]. So at the same time that I was establishing the equivalence of (LS) and (HC) for Dirichlet form operators over measure spaces, I tried to do the same for the Clifford algebra [42]. The Dirichlet form operator A." (cf. (2.4)) has a quite precise analog for Clifin the Laplacian ford algebras: one replaces the sum of squares of derivations by a sum of squares of certain antiderivations 8; on the Clifford algebra. However, although I had the correct time for contraction from L 2(Clifford) to L 4 (Clifford) in [41] I could not, by interpolation, get the coefficient in front of the energy term in (LS) down to I, as it should be, but only to In 3. This program was completed 20 years later by Eric Carlen and Elliott Lieb in [17], which introduced some beautiful structures along the way. The result is that hypercontractivity holds for the Clifford algebra for exactly the same time, tN(p, q), as in the Gaussian measure case.
8J
SURVEYS: Advances beyond [42] but prior to the work of Carlen and Lieb are surveyed in the Varenna survey [44]. An analytical use of the exterior algebra (closely related to the Clifford algebra) in quantum field theory is discussed in the book [34]. Hypercontractivity in other noncommutative algebras is surveyed in [95] by Boguslaw Zegarlinski.
2.3.3 1\vo points and Aline Bonami. Links to Fourier analysis and information theory Let S = {I, -I} and denote by f.L the measure on S which assigns measure 112 to each point. If f : S ~ ffi. define (D f)(x) = (1(1) - f( -1))/2 THEOREM 2.6 (1\vo-point hypercontractivity)
Ile-tD' Dlb(I')--tLq(l')
::;
1 if and only ift ~ tN(p, q).
(2.16)
I thought I was the first one to prove this theorem [43] (submitted 1973). But I learned years later that Aline Bonami [14} (submitted 1969) discovered it four years earlier than me. Her interest in this kind of theorem was stimulated by her investigations into harmonic analysis on groups. She also observed that this inequality tensors up and yields the analog on the infinite product group SN. My own interest in this kind of inequality was stimulated by my attempt to better understand Clifford algebra hypercontractivity. The Clifford algebra in lowest dimension (i.e., over a one-dimensional space) is commutative and is precisely the algebra of functions on S. The operator D* D is precisely the analog of the number operator (which the reader can interpret to mean an analog of the operator A." defined in (2.4». I had
HYPERCONTRACTIVITY
57
already proven (2.16) for p = 2 and q = 4 in the more general context of a Clifford algebra over an arbitrary dimensional space [41]. But getting this two-point space settled, that is, (2.16), didn't help for understanding the general Clifford algebra case (which remained unsettled for twenty years, as I mentioned in the previous section). On the other hand I discovered that by tensoring up (2.16) n times and then restricting to functions of Xl + ... + Xn the inequality (2.16) goes over to Ed's inequality in dimension 1 via the central limit theorem. Since all of these inequalities tensor up, this gives a proof of Gaussian hypercontractivity from an elementary calculus proof of (2.16). Aline Bonami's proof of (2.16) differs from mine in that she proved (2.16) directly, while I proved the associated (LS) directly and then used my equivalence theorem technique to deduce (2.16). Bill Beckner [6] subsequently extended the two-point inequality to complexvalued functions and applied the central limit theorem technique to derive the sharpest form of the Hausdorff-Young inequality, a weak version of which had previously been known from interpolation theory without best constants. This inequality expresses the precise bound of the Fourier transform as a map from LP(JRn) to Lpl (JRn) for 1 ~ p ~ 2. In the process it was necessary to transfer inequalities back and forth from Gauss measure to Lebesgue measure. Both Bill Beckner [6] and Eric Carlen [16] gave a'Teformulation of a stronger form of (LS) in terms of Lebesgue measure instead of Gauss measure. The stronger form is possible because of the use of complex-valued functions. Further insight into this stronger form and its relation to statistical mechanics was obtained recently by Mary Beth Ruskai [83]. In an old paper [31], Bill Faris pointed out that the Beckner-Carlen inequality may be stated as an entropy inequality that has both the classical Heisenberg uncertainty principle and the logarithmic Sobolev inequality as special cases! Some very recent applications of the two-point theory of Bonami and Beckner were made recently to noise stability and Boolean functions in [8, 35, 85]. See the bibliographies of these papers for other recent work in this direction. The Lebesgue measure version of (LS) makes contact with a vast and much earlier literature on information theory. The logarithmic Sobolev inequality, (LS), can be deduced from some earlier information-theoretic inequalities after transformation through several steps. An extensive survey of this literature is given by the Toulouse group [91] (see especially Chapter 10), showing relations between the, logarithmic Sobolev inequality and information-theoretic inequalities of Shannon, Cramer-Rao, Blachman-Stam, and also Fourier-analytic inequalities of BrascampLieb, Beckner-Hirschman and Hausdorff-Young.
SURVEYS: A little more information about these topics can be found in the survey -[44]. A self-contained exposition of hypercontractivity, logarithmic Sobolev inequalities, the two-point case, and how to apply the central limit theorem is given by Svante Janson in [60]. See especially Chapter 5. A very broad perspective on all the topics of this section is given in the Toulouse survey [91] ..
58
CHAPTER 2
2.3.4 Curvature and LS The Ricci curvature of a Riemannian manifold M relates to the degree to which the Laplacian commutes with the gradient. Define a bilinear form on functions to functions by 2r(f, g) = 6.(fg) - (6.f)g - 1(6.g). For example, if M = ]Rn it is easy to compute that this is just \l I(x), \l g(x). Now repeat this procedure, starting again with the Laplacian but with r(f,g) instead of Ig. Thus put 2r 2 (f, g) = 6.r(f,g) - f(6./, g) - r(f,6.g). This is zero on ]Rn. But it happens that on a Riemannian manifold it gives the Ricci curvature: r 2 (f, g) = Ric(\l I, \l g). Bakry and Emery abstracted this in 1985 [5] by replacing the Laplacian in the previous formulas by the generator of an arbitrary contraction semi group. They then used the resulting abstract notion of Ricci curvature to prove hypercontractivity of the semigroup under an assumption of lower boundedness on the abstract Ricci curvature. This beautiful interplay between geometry and semigroup theory has proven to be a powerful technique for proving hypercontractivity of a given semigroup. SURVEYS: There are now several thorough surveys of this approach by Dominique Bakry [3,4], Michel Ledoux [67], and the Toulouse group [91].
2.3.5 HC and LS as compactness tools: Ground states and large deviations The Sobolev embedding theorem not only gives a continuous embedding of a Sobolev space over a domain n in ]Rn into LP(n) but "also asserts that the embedding is a compact operator if p is not too large and n is bounded. The classical Sobolev theorems are specifically for finite dimensions. They don't make sense over an infinite-dimensional space. But hypercontractivity and logarithmic Sobolev inequalities are both meaningful and correct in infinite dimensions. Roughly speaking this derives from the fact that the constants in (HC) and (LS) are dimension independent while at the same time Gauss measure in infinite dimensions is a perfectly fine measure (on ]Roo, say). (HC) and (LS) have been used as a kind of compactness condition in at least three circumstances. I. This author used (HC) as a compactness condition to prove the existence of ground states for certain quantum field theoretic systems [41] and, more recently, to prove the existence of ground states for certain Schrooinger operators over loop groups. 2. Ichiro Shigekawa used (LS) to prove existence of invariant measures for certain infinite-dimensional diffusion processes [88]. 3. Deuschel and Stroock used (LS) as a kind of compactness condition in the theory of large deviations [24].
SURVEYS: We refer the reader to [24] for an exposition and to [65,66] for surveys.
59
HYPERCONTRACTIVITY
2.3.6 (Intrinsic) hyper-, super-, and u1tra-contractivity Suppose we have a Schrodinger operator H = -~ + V over ]Rn. Assume that H has a normalizable ground state'ljJ (which typically will be strictly positive if V doesn't have bad singularities). Then we can apply Jacobi's trick as in section 2.1.3 and unitarily transform H to a self-adjoint operator acting in L2(]Rn, IL), where IL is the probability measure IL = 'ljJ2 dx. After subtraction of a constant the resulting operator is exactly the Dirichlet form operator All" The link between (LS) and (HC) is therefore applicable in the spaces L 2(]Rn,JL). Davies and Simon [21] have referred to properties of AI' as intrinsic properties of the operator H. For example, the operator H is called intrinsically hypercontractive if AI' satisfies Ed's hypercontractivity inequalities. Now it can happen for some measures that the semigroup e- tA " has even better boundeness properties than the Gaussian semi group e-tA~ discussed so far. Jay Rosen [80] has defined a semi group e- tA " to be supercontractive if, for any p < 00, e- tA " is bounded from L 2(1L) into LP(JL) for all t > O. This should be contrasted with the statement of Ed's theorem, which (for Gauss measure) shows that e-tA~ is actually unbounded from L2 into LP if p is too large, given t. Jay Rosen showed, for example, that the Dirichlet form associated to the measure exp( -Ixlb)dnx on ]Rn with b > 2 is a supercontractive semigroup generator. Davies and Simon [21] introduced an even stronger notion: the semigroup e- tA " is ultracontractive if it is bounded from L2 into Loo for all t > O. Of course one must expect that as t -1- 0 the norm Ile- tA " 11£2-+£00 goes to infinity because the semigroup is doing less smoothing. In fact there are interesting links between the rate of blowup and the dimension, n. See [22]. There are two circumstances in which Davies and Simon apply these ideas. First one should note that supercontractivity and ultracontractivity are linked to families of logarithmic Sobolev inequalities in much the same way that hypercontractivity is linked to a single logarithmic Sobolev inequality. They used this connection to derive properties of the semigroup e- tA ,.. in terms of properties of the potential V. A typical theorem is a statement of the form that e- tH is intrinsically ultracontractive if the potential V grows fast enough at infinity (substantially faster than quadratically.) This portion of their paper has not been surveyed. But further developments along these lines have been made in [50] and [71]. The second kind of application of these ideas which Davies and Simon made was in the context of heat kernels for the Laplacian itself in a bounded open set n of ]Rn. They first carried out Jacobi's trick, starting with the ground state for the Dirichlet Laplacian in n. They then derived ultracontractive and used this to bounds for the intrinsic semigroup under mild conditions on derive boundary behavior of higher eigenfunctions of the Dirichlet Laplacian. They also derived bounds on the heat kernel itself. This portion of their paper, along with many other related topics, is discusse 0,
L
,
tnllAncpl1 v /2 (and p ~ 2). Since -A - c/r 2 for c large and v ~ 5 is not essentially self-adjoint on C8"(IRV), this condition would seem to be close to optimal since ~rl-:';l (r- 2)P dVr < 00 if P < v /2. What I discovered is that the "correct" conditions are asymmetric: positive singularities need only be L2. I proved that if V E L2(JRv, e- x2 dVx) and V ~ 0, then -A + V is essentially self-adjoint on (JRV) for any v.
Co
The proof [70] went as follows. By Ed's result on hypercontractivity of the fixed Hamiltonians [56], Ho = -A + x 2 generates a hypercontractive semigroup after translating to L2 (JRV, 05 dV x) with 0 0 the ground state of Ho. By Segal's theorem and a simple approximation argument, N = Ho + V is self-adjoint on Co(JRV). Now use [N, -A+ V) = [x 2 , -A+ V) to verify the hypotheses of Nelson's commutator theorem [57] to conclude that -A + V is essentially self-adjoint. Actually, in [70], I used a different argument than the Nelson commutator theorem, but I could have used it!
I conjectured that the weak growth restriction implicit in f V(x)2e- X2 dx < 00 was unnecessary and that V ~ and V E Lroc (JR V ) implied that - A + V was essentially self-adjoint on Co(JRV). Kato took up this conjecture and found the celebrated Kato's inequality approach to self-adjointness [43]. This is not the right place to describe this in detail (see [43] or [64, 73]), but what is important is that between the original draft he sent me and the final paper, he added magnetic fields and he used as an intermediate inequality
°
I(V' - ia)cpl ~ V'lcpl
(3.2)
pointwise in x. Formally, (3.2) is obvious, for if cp = Icple i11 , then it follows that Re(e- i11 (V' - ia)cp) = Re((V' - ia + i(V'1]))lcpl) = V'lcpl. What I realized two years later was that by integrating (3.2) in x, one has (Icpl, H(O, V)lcpl) ~ (cp, H(a, V)cp),
(3.3)
78
CHAPTER 3
which implies the diamagnetism of the ground state. The analog of (3.3) for finite temperature is
Tr(e- t3H (a,V))
~
Tr(e- t3H (o,v)),
and this led me to conjecture the diamagnetic inequality (3.1). At the time, every Thursday the mathematical physicists at Princeton got together for a "brown bag lunch." During 1973-1978, the postdocs/assistant professors included Michael Aizenman, Sergio Albeverio, Yosi Avron, Jiirg Frohlich, Ira Herbst, Lon Rosen, and Israel Sigal. Lieb, Wightman, and I almost always attended, and often Dyson and Nelson did. After lunch, various people talked about work in progress. I discussed (3.3) and my conjecture (3.1), explaining that I was working on proving it. After I finished, Ed announced: "Your conjecture is true; it follows from the correct variant of the Feynman-Kac formula with a magnetic field." So the first proof of (3.1) was Ed's. Characteristically, he refused my offer to coauthor the paper where this first appeared with another semi group-based proof [72]. I should mention that the simplest proof of (3.1) and my favorite [73] is very Nelsonian in spirit: one uses the Trotter product formula to get the semigroup (e-tH(a,v)) as a limit of products of one-dimensional operators e+t(8j-iaj)2 In and uses the fact that one-dimensional magnetic fields can be gauged away. This is Nelsonian for two reasons. The use of the Trotter product formula in such a context is due to Ed, but also the proof is a poor man's version of Ed's original proof: the gauge transformations are just a discrete approximation to an Ito stochastic integral. e. Point interactions. The subject of point interactions has been heavily studied (see, e.g., [3]). So far as I know, Ed was the first to study point interactions as limits of potentials with supports shrinking to a point. He presented this in his courses; an extension of the ideas then appeared in the theses of his students, Alberto Alonso and Charles Friedman [22]. The basic points are: i. If v ~ 4 and Vn is any sequence of potentials, say, each bounded (but not uniformly bounded in n), supported in {x Ilxl < n-l}, then -~+ Vn -t -~ in the strong resolvent sense. ii. If v
~
2 and Vn
~
0, (i) remains true.
iii. If v = 1,2,3, there are special negative Vn's that have strong limits different from -~, many with a single negative eigenvalue. These are the point integrations. Point (i) is an immediate consequence of the fact that {J E C8"(lR V ) I f == 0 in a neighborhood of O} is an operator core of -~ if v ~ 4. While (ii) can be obtained by similar consideration of form cores (and a suitable, somewhat subtle, limit theorem for quadratic forms), in typical fashion, Ed explained it not from this point of view, but by noting that in dimension 2 or more and with x, y f:. 0, almost every Brownian path from x to y in fixed finite time t avoids O. Thus, in a Feynman-Kac formula, if Vn has shrinking support, the
79
ED NELSON'S WORK IN QUANTUM THEORY
integrand goes to 1 (i.e., exp( - I~ Vn(w(v) ds) -+ 1); Vn ~ 0 is needed to use the dominated convergence theorem in path space.
3.3 THE NELSON MODEL A search in MathSciNet on "Nelson model" turns up nineteen papers, many of them recent [2, 5, 6, 7, 8, 9, II, 13, 14,23,25,37,38,39,40,47,48,49,78], so I'd be remiss to not mention the model, although I'll restrict myself to describing the model itself and noting that Ed introduced it in [55] and studied it further in [54]. The nucleon space 1{(N) is L2(JR3n) where n is fixed (most later papers take n = 1) with elements in 1{(N) written 'Ij!(Xl' ... ,xn ) and free nucleon Hamiltonian n H(N)
= - I>~j. j=l
The meson space is the Fock space, 1{(M), on JR3 with creation operators at(k) (k E JR3). The meson has mass IL (Ed took IL > 0; many applications take IL = 0) and free Hamiltonian
where
One defines the cutoff field for fixed x E JR3 by
CPx(x) = T 1/ 2 (27r)-3/2
f
w(k)-1/2(a(k)e ik 'x
+ at(k)e-ik.x)x(k) dk.
Ed took X to be a sharp cutoff (characteristic function of a large ball); some later authors take other smoother X's. One defines 1{ = 1{(N) 1811{(M),
and on 1{, n
HI = 9
L cp,,(Xj), j=l
where 9 is a coupling constant and now x is the nucleon coordinate. The Nelson model is the Hamiltonian
This has been a popular model because it is essentially the simplest example of an interacting field theory with an infinite number of particles.
80
CHAPTER 3
3.4 HYPERCONTRACTIVITY The next two sections concern outgrowths of Nelson's seminal paper [S6]. This paper of only five pages (and because of the format of the conference proceedings, they are short pages; in 1. Math. Phys., it would have been less than two pages!) is remarkable for its density of good ideas. The following abstracts a notion Ed discussed in [56]. DEFINITION 3.1 Let Ho ~ 0 be a positive self-adjoint operator on the Hilbert space L2(M, dJ.L) with dJ.L a probability measure. We say e- tHo is a hypercontractive semigroup if and only if
a.
Ile-tHocpllp ~ Ilcpllp, 1 ~ p
~ 00, t
> 0,
b. for some To and some C < 00,
Ile- THo cpll4
~
Cllcpl12.
(3.4)
Here the bounds are intended as a priori on cp E L2 n £P. Ed's key discovery in [S6] is that if V is a function with e- v E np -00). The simplest proof of this bounded ness result follows from the formula
IleA+BII
~
IleAeB11
(3.S)
for self-adjoint operators A and B. This formula is associated with the work of Golden, Thompson, and Segal (see the discussion of section 8a in Simon [74 D. It is proven by a suitable use of the Trotter product formula and the fact that II CD II ~ IICIlIiDIi. Typically, in Ed's application, he appeals to a Feynman-Kac formula, which has the Trotter formula built in, and to a use of H61der's inequality, which can replace IICDII ~ IICIlIiDII because in the path integral formulation, the operators become functions. I'd like to sketch a proof of (3.S) since it is not appreciated that it follows from Lowner's theorem on monotonicity of the square root ([SO]; see also [36,42]). We start with (3.6) By the functional calculus, it suffices to prove (3.6) when C is a number and by scaling when C = 1, in which case, by a change of variables, it reduces to an arctan integral. Since C(C + W)-l = 1 - w(C + W)-l, we have
o ~ C ~ D =* (C + W)-1
~ (D
+ W)-1 (3.7)
which is Lowner's result. Let A, B be finite Hermitian matrices. Since
o ~ C ~ D ¢:> IIC1 / 2 D- 1/ 211 ~ 1,
81
ED NELSON'S WORK IN QUANTUM THEORY
(3.7) can be rewritten
IIC1/ 2 D- 1/ 2 11 which, letting
C 1/ 2
=
eA ,
D 1 /2
=
~ 1~
e- B ,
IIC1 / 4 D- 1/ 411 2 ~
1,
implies
lIe A / 2 e B / 2112
~ 11e A eB II·
(3.8)
Iterating (3.8) implies
II(e A / 2n eB / 2n )2nll ~ Ile A / 2n eB / 2"11 2n ~
IleAeBII·
Taking n --t 00 and using the Trotter product formula implies (3.5) for bounded matrices, and then (3.5) follows by a limiting argument. Once one has (3.5), one gets lower boundedness by noting
lie-TV e- THo cpl12 ~ Ile-TVII41Ie-THocpIl4 ~
Clie- TV II41IcpI12,
so hypercontractivity and e- 4TV E L1 imply Ik~-T(Ho+V)11 < 00. The term "hypercontractive" appeared in my paper with Hl1Iegh-Krohn [41], which systematized and extended the ideas of Nelson [56], Glimm [27], Rosen [65], and Segal [67]. The name stuck, and I recall Ed commenting to me one day, with a twinkle in his eye that many know, that afte. all, "hypercontractive" was not really an accurate term since the theory only requires (3.4) with C < 00, not C ~ I! That is, e- THo is only bounded from L2 to L 4, not contractive. We should have used "hyperbounded," not "hypercontractive." Ed was correct (of course!), but I pointed out (correctly, I think!) that hypercontractive had a certain ring to it that hyperb0unded just didn't have. There was, of course, a double irony in Ed's complaint. The first involves an issue that wasn't explicitly addressed in [56]. What Ed proved, using LP properties of the Mehler kernel, is that for the one-dimensional intrinsicoscillator,Ho = ~+Xd~ onL 2 (1R,7l'-1/2 e -x 2 dx),e- tHo is bounded 4 from L2 to L if t is large enough with a bound on the norm between those spaces ofthe form 1 + O(e- t ) as t --t 00. Ed then applied this to a free quantum field in a box with periodic boundary conditions. Because the eigenvalues of relevant modes go Wi '" one has n~o(1 + e- w1t ) convergent, so this application is legitimate[56] does not discuss anything explicit about the passage to infinitely many degrees of freedom, but this step was made explicit in [21]. (I thank Lenny Gross for making this point to me at the conference in Vancouver.) To handle cases like Ho in infinite volume, itis important to know that for t large enough, e- tHo is actually a contraction from L2 to L 4 , so the discreteness of modes doesn't matter. This was accomplished by Glimm [27], who showed that if HoI = 0, Hol{I}.L ~ mo and (3.4) for some C, then by increasing T, (3.4) holds with C = 1. The second irony concerns Ed's second great contribution to hypercontractivity: the proof in [59] of optimal estimates for second quantized semigroups-exactly the kind of special Ho in e- tHo he considered in [56]. He proved that such an operator from LP to Lq either was not bounded or was contractive! His precise result is, if H ~ a ~ 0, then r( e- tHo ) is a contraction from Lq to LP if e- ta ~ (q _1)1/2 /(p _1)1/2 and is not bounded otherwise. Here r( . ) is the second quantization of operators; see [71].
-t
e,
82
CHAPTER 3
Ed's work in these two papers on hypercontractive estimates spawned an industry, especially after the discovery of log Sobolev inequalities by Federbush [21] and Gross [31]. Brian Davies, in his work on ultracontractivity [17] and on Gaussian estimates on heat kernels [15], found deep implications of extensions of these ideas. While I dislike this way of measuring significance, I note that eighty papers in MathSciNet mention "hypercontractive" in their titles or reviews and Google finds 269 hits. See [16, 32] for reviews of the literature on this subject.
3.5 TAMING WICK ORDERING There was a second element in [56] besides hypercontractivity, namely, the control of e- tv . I want to schematically explain the difficulty and the way Ed solved it. In adding the interaction to a free quantum field, one might start with a spatial cutoff and want to consider
=
Vun
(L I, and n is Lipschitz, bounded, and connected, then n is a ball.
6.12 A COMPUTATION Once again, the reader might find the above results fishy, since there is no mention of optimal transportation in the Sobolev problem. So here is a flavor of the recipe. Normalize f fP' = 1, and define V'cp as the optimal transport (with quadratic cost) in the Monge problem between fP' and hP' , where
'-
C(t)
h(x) .- (1 + II t x II P')(n-p )1' P
1 Bn
h
p'-
- 1 (t > 0).
For any value oft> 0, we shall find an inequality linking IIV'fllLP(o), IIfilLPu (ao)' and h. The problem will be solved by the family of these inequalities. The ingredients that will be used are very basic. • Definition of transport: f F(x)h(V'cp(x)) dx = f G(y)h(y) dy . • Monge-Ampere equation: F(x) = G(V'cp(x)) det D2cp(x) . • Arithmetic-geometric inequality: the arithmetic mean is no less than the geometric mean; in particular, (det D2cp) lin ~ !:J.cp/n.
154
CHAPTER 6
I X . Y ~ (J IIXII~) lip ( IIIYIIP,)IIP' , where the exponents satisfy (lip) + (lip') = l. Stokes's formula: In V' . ~ = I{m ~ . a, where a is the unit outer normal to o.
• Holder's inequality:
• • Chain rule, in the form V'(FO)
= exFo-lV' F.
All these elements together easily lead to the following calculation:
r hP~ ~ inr(det D2cp)l/n fPd ~.!. r!:::,cpfPd n in
iBn
~ _pu n
r V'cp. fPd-IV'1 +.!. r (V'cp. a) fP'
in
n ian
~ ~ (l IP'IV'CPIP') lip' (llV' liP) lip + ~ fan fP'lV'cpl ~~
(In hP'(y)lyIP' d
Y) lip'
(llV' liP) lip + ~ fan fP' .
This is the core of the argument!
6.13 CONCLUSION Although these problems are seemingly so different, there is a lot in common behind Monge's problem of deblais and remblais on one hand, and the trace Sobolev inequality which was exposed above on the other hand. The path from one of these issues to the other led us to discuss problems in fluid mechanics or diffusion processes. Many problems which looked unrelated have found their place in a nice and complex picture: for instance, it makes sense to argue that Sobolev inequalities are a manifestation of the convexity of the functional
p f---t
-
r p(x)l-l/n dx
iRn
along geodesics of optimal transportation in P2(lR n ). While it is doubtful that optimal transportation will revolutionize the gigantic theory of Sobolev-like variational problems, in that field it has at least the same merit as Ed Nelson's rewriting of quantum mechanics (the parallel that I am drawing here does not mean that both problems are of the same importance!!): it provides an occasion to think and change one's point of view about a well-established theory.
BIBLIOGRAPHY
155
Bibliography [1] J.-D. Benamou and Y. Brenier, Weak existence for the semigeostrophic equations formulated as a coupled Monge-Ampere/transport problem, SIAM J. Appl. Math. 58 (1998), no. 5,1450--1461. [2] Y. Brenier, Decomposition polaire et rearrangement monotone des champs de vecteurs, C. R. Acad. Sci. Paris Ser. I Math. 305 (1987), no. 19,805-808. [3] H. Brezis and E. Lieb, Sobolev inequalities with a remainder term, J. Funct. Anal. 62 (1985), 73-86. [4] J. A. Carrillo, R. J. McCann, and C. Villani, Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates, Rev. Mat. Iberoamericana 19 (2003), no. 3, 971-1018. [5] D. Cordero-Erausquin, R. J. McCann, and M. Schmuckenschliiger, A Riemannian interpolation inequality a la Borell. Brascamp and Lieb, Invent. Math. 146 (2001), no. 2, 219-257. [6] M. 1. P. Cullen and R. J. Douglas, Applications of the Monge-Ampere equation and Monge transport problem to meteorology and oceanography, Monge Ampere equation: Applications to geometry and optimization (Deerfield Beach, FL, 1997), American Mathematical Society, Providence, RI, 1999, pp.33-53. [7] A. Eliassen, The quasi-static equations of motion, Geofys. Publ. 17 (1948), no. 3. [8] U. Frisch, S. Matarrese, R. Mohayaee, and A. Sobolevski, A reconstruction of the initial conditions of the Universe using optimal transportation, Nature 417 (2002), 260--262. [9] R. Jordan, D. Kinderlehrer, and F. Otto, The variational formulation of the Fokker-Planck equation, SIAM J. Math. Anal. 29 (1998), no. I, 1-17. [10] F. Maggi and C. Villani, Balls have the worst best Sobolev inequalities, J. Geom. Anal. 15 (2005), 83-121. [II] R. McCann, A convexity theory for interacting gases and equilibrium crystals, Ph.D. thesis, Princeton University, Princeton, NJ, 1994. [12] Felix Otto, The geometry of dissipative evolution equations: the porous medium equation, Comm. Partial Differential Equations 26 (2001), no. 1-2, 101-174. [13] S. Rachev and L. Riischendorf, Mass transportation problems. Vol. I: Theory. Vol. II: Applications, Probability and Its Applications, Springer-Verlag, New York,1998.
156
CHAPTER 6
[14] Cedric Villani, Topics in optimal transponation, Graduate Studies in Mathematics, vol. 58, American Mathematical Society, Providence, RJ, 2003.
Chapter Seven Internal Set Theory and Infinitesimal Random Walks Gregory F. Lawler* 7.1 INTRODUCTION Let me start this chapter with a disclaimer: I am not an expert on logic or nonstandard analysis. I had the opportunity to learn Internal Set Theory (1ST) as a graduate student and to use it in my research. It has been many years since I have used nonstandard analysis (at least in public), and there are others who are better equipped to discuss the fine points of Internal Set Theory and other flavors of nonstandard analysis. However, I happily accepted the opportunity to make some remarks for a number of reasons. First, Internal Set Theory is intrinsically a very beautiful subject. Second, I believe that it has been a major influence in my own research, at least in how I have been very interested in the relation between discrete and continuous problems. And third, it gives a nice excuse to discuss some results and open questions about the relationship between random walk and Brownian motion (an area in which I have more expertise). While at this point I would be considered an "outsider" to the world of nonstandard analysis, I really am a convert in that I believe in infinitesimals and the essential finiteness of mathematics. (However, I have come to believe that the traditional styles of writing mathematics using such constructions as the standard real numbers are aesthetically pleasant and make for cleaner theories than those using infinitesimals. This does not mean that they are fundamentally more "real" than those that use infinitesimals.) There are times when I could use nonstandard language when describing results but I choose not to because most of the readers or listeners will not be comfortable with such terminology. I will start by giving an introduction to the basic ideas of Internal Set Theory and nonstandard analysis. I only give enough to whet the reader's appetite; I strongly recommend Ed's original article [21] on Internal Set Theory as well as the opening chapter of an unfinished book [20] for a thorough introduction. For the more usual model-theoretic approaches to nonstandard analysis, I recommend [4] or [8]. Robinson's original monograph [26] is worth a look but it is extremely difficult for those without a background in logic. For the remainder of this paper, I will look at a particular area where nonstandard "Department of Mathematics, Cornell University, Ithaca, NY 14853, USA Supported by NSF grant DMS-0405021
158
CHAPTER 7
analysis can very naturally be applied: the study of Brownian motion and other random processes. The idea is to study continuous processes by looking at elementary infinitesimal processes, or sometimes, to study discrete processes by considering them on infinitesimal scales and comparing them to continuous processes. I start by doing some very classical mathematics, combinatorics for simple random walk, that dates back at least to DeMoivre and Laplace. I then discuss the relationship between infinitesimal random walks (Wiener walk) and the Wiener process. From a practical perspective, it is useful to construct a random walk from a Wiener process; by choosing infinitesimal increment sizes one gets a Wiener walk and a Wiener process that are very close. Such constructions are called couplings or strong approximations; in this paper we describe both the well-known Skorokhod embedding and the strong approximation scheme of Koml6s, Major, and Tusnlidy which is essentially optimal. I follow by describing one property of the Wiener process that can be discussed more simply (at least in my opinion) from considering the elementary simple random walk-local time for one-dimensional Wiener process. Next I discuss an open problem concerning the idea of local time on the outer boundary of a Wiener process in two dimensions. I give a longer discussion on another process, loop-erased random walk, whose study takes me back to my graduate student days when Ed Nelson gave me an excellent thesis problem that can more or less be phrased as "erase the loops from a simple random walk path and see what you get." There are many interesting open problems about this process which can be formulated nicely in nonstandard terms. I finish the paper with a few comments and personal opinions about nonstandard analysis.
7.2 AN INFORMAL INTRODUCTION TO INTERNAL SET THEORY Nonstandard analysis, as first created by Abraham Robinson [26], is a means to do analysis with "infinitesimals." His observation was that the real numbers could be extended in a conservative fashion to include positive "numbers" smaller than all positive reals. Nelson's Internal Set Theory differs from Robinson's approach to nonstandard analysis in that Nelson writes down the axioms of the real numbers (actually axioms for sets) under the assumption that infinitesimals or "very large" numbers already exist inside the real numbers. I would like to give a somewhat informal introduction to infinitesimals which will lead to the axioms of Internal Set Theory. We start with the basic assumption that in our entire lifetime (or if you wish, in the entire lifetime of the universe) one cannot conceive/describe more than a finite number of numbers. We will use the word "conceivable" to indicate that a number is describable. The integers I, 2, 555, and 103456754378654329653 are all conceivable. • Any real number that can be defined using only a conceivable number of symbols is conceivable. The real numbers 7r and e are also conceivable. We can describe 7r as the ratio of the
159
INTERNAL SET THEORY AND INFINITESIMAL RANDOM WALKS
circumference of a circle to its diameter or in a number of other ways, for example, I 7r=
.
16n n
lIm -
n-too
(2n)-2 n
We can describe e by
e
= n-too lim
(1 + .!.)n n
=L 00
or
e
n=O
1
n!"
Note that limiting operations are allowed in the finite descriptions. Our first assumption becomes: • There is a finite set that contains all the conceivable numbers. However, we would also expect the following to be true:
• If n is a conceivable integer then n
+ 1 is a conceivable integer.
For many, the game would stop here because we have reached a contradiction to the principle of induction. For others they might try to do mathematics without the principle of induction. We do not want to lose induction, so let us state it here:
• If S is a subset of the positive integers with 1 E S and such that (n E S) (n + 1 E S), then S = {I, 2, 3, ... }.
==}
In order to reconcile this we must have the following: • There is no set S that consists solely of the conceivable positive integers. Note that this does not contradict our earlier assumption-there exists a finite set that contains the conceivable positive integers. What does not exist is a set that contains only the conceivable positive integers. 2 The "definition" of conceivable is at best vague at this point. The starting point for Internal Set Theory is not a definition of this concept but rather the introduction of it as an undefined predicate along with the axioms that it satisfies. In the formal theory we will replace the word "conceivable" with the usual word standard. We will also consider sets rather than numbers; of course, a real number can be associated to the singleton set containing that number. We can also talk about standard functions, since functions are subsets of direct products of sets. We write st for standard and we add to ZFC (Zermelo-Fraenkel set theory with the axiom of choice) the undefined predicate st and three axioms that are conveniently labelled (I), (S), and (T) which matches perfectly the TLA 3 for Internal Set Theory. Any formula from ZFC that does not contain the predicate st is called an internal formula; otherwise, it is called an external formula. The first axiom is the 'This limit is an exercise in Stirling's fonnula, see §7.4. 2 In other forms of nonstandard analysis one can talk about the set of conceivable integers. However,
this set does not have all the properties of other sets. It is called an external set while the sets that will be called sets in 1ST are called internal sets. The "sets" of Internal Set Theory are the "internal sets" of other theories. 3Three letter acronym.
160
CHAPTER 7
transfer principle (T): suppose A(x, tl, ... ,tn ) is an internal fonnula whose only free"variables are x, tl, ... ,tn' Then \;fttl \;ftt2 ... \;fttn [\;ftx A(x, t l , ...
,tn ) ==> Vx A(x, tl,' .. ,tn )].
An equivalent fonnulation of this principle is \;fttl \;ftt2 .. , \;fttn [3x B(x, iI, ... , tn)
==> 3st x B(x, t l , ... , tn)).
It says that if we have a formula that does not include the word standard and such that all the values other than x of the variables are standard, then if there exists some x that satisfies the fonnula, there must be a standard x. For example, any conceivable equation that has a solution must have a standard solution (all parameters in a "conceivable" equation must be conceivable). Next is idealization (I): if B(x, y) is an internal formula with free variables x, y, and possibly some other free variables, then [\;ftfin z 3xVy E z B(x,y))
¢::=>
[3x\;fty B(x,y)).
Here stfin stands for "standard, finite." This is the axiom that asserts the "finiteness" of the set of conceivable numbers. For example, by considering B(x, y) = "y E x and x finite," we see that there exists a finite set that contains all the standard sets. The final axiom is standardization (S): if C(z) is a fonnula (internal or external) with free variable z, and perhaps other free variables,
\;ftx3st y\;ftz [z E Y ¢::=> z E x/\C(z)). One can think of this in tenns of our inability to see anything more than standard sets. If we consider a fonnula and look at the z that satisfy it, there is a standard set such that standard z satisfy C(z) if and only if they are in the set. As an example, let C(z) = "z E Nand z standard" and suppose that x = N. Then we can choose y=N. Whenever one introduces axioms, the first question is whether or not the axioms are consistent. Unfortunately, it is still an open question whether or not 1ST, that is, ZFC with (T), (I), (S) added, is consistent. This has nothing to do with ISTthe problem is that it is unknown whether ZFC is consistent! However, one can prove [21] that if ZFC is consistent, then 1ST is consistent. Moreover, 1ST is a conservative extension of ZFC. This means any internal statement of ZFC that can be proved using 1ST can be proved without using 1ST. This is simultaneously one of the beauties and one of the major limitations of nonstandard analysis. We are free to use the predicate standard and the axioms of 1ST and we know that if we prove an internal theorem, then the theorem is valid without the addition of the predicate standard. However, we also know that we could have proved the theorem without using any nonstandard analysis! In 1ST all definitions not involving the concept "standard" are exactly the same as in usual mathematics. Thus a set is finite (with m elements) if and only if there is a bijection of the set with {I, ... ,m}. In this definition there is no mention of whether or not the number m is standard. Using (I) we can see that there is a finite subset of the positive integers that contains each of the standard positive integers. (In 1ST there is no set of all standard positive integers.) In particular, there exist nonstandard integers.
INTERNAL SET THEORY AND INFINITESIMAL RANDOM WALKS
161
More specifically, consider the set
{1,2, ... ,N}, where N is a nonstandard integer. This set contains each of the standard positive integers. To see this, note that (S) implies that there is a standard subset S of {I, 2, ... } with the property that for standard integers n, n E S if and only if n E {I, ... , N}. Also, for every standard n, n E {I, ... , N} implies that n + 1 E {I, ... , N}. Hence S is a standard subset of N containing 1 and such that for all standard n, (n E S) =} (n + 1 E S). By (T), the last implication holds for all positive integers, and hence by induction S = N. By a similar argument, one can see that the standard set S' with the property that for standard n, (n E S') ¢:> (n E {N, N + 1, ... }) is the empty set.ln particular, if n E N is standard and N E N is nonstandard, then n < N. This is consistent with our heuristic idea of a standard positive integer as being "conceivable." As mentioned before, Robinson's framework [26] for nonstandard analysis is different. His original approach, and the approach used by a large percentage of those who do nonstandard analysis, is to start with the real numbers (which are called the standard reals) and add to these numbers more elements called hyperrea/s as well as operations on the hyperreals that are extensions and have the same properties as the arithmetic operations on the reals. If we restrict to the positive integers we get a similar heuristic view as in 1ST:
1,2,3, ... ,n, ... , ... , ... ,N -l,N,N + 1, ... where all the standard positive numbers are listed after which all the nonstandard positive numbers follow. In the language of 1ST the set {I, 2, ... , N} is finite, while the standard positive numbers do not even form a set. However, in the terminology of Robinson the set {I, 2, ... , N} is not finite (since it contains all the standard integers, an infinite set), but rather is hyperfinite; that is, it satisfies many of the properties of a finite set. I believe that understanding Robinson's approach requires more understanding of the underlying issues in mathematical logic. It should be noted, however, that to prove that 1ST is a conservative extension of ZFC, one needs to confront these issues. One must show that given a model of ZFC (or, for simplicity, consider a model of the usual real numbers), then there is a model for 1ST; the construction actually adds to the model so, in effect, it is adding numbers to the real numbers. However, when the theorem is proved, one no longer needs to consider the numbers as having been added. One major issue here fundamentally is that we really4 do not understand the real numbers so well that we can distinguish between the real numbers with hyperreals added or real numbers with certain numbers designated as standard.
7.3 BASICS OF INFINITESIMALS The construction of nonstandard models or nonstandard syntactical systems for the reals is an issue in logic. Nonstandard analysis, on the other hand, should be 4Excuse the pun.
162
CHAPTER 7
considered as the use of these models to answer questions in analysis. For this reason, it is perhaps better called infinitesimal analysis, but this term has other meanings in mathematics. One of the main goals of 1ST is to make it easy for the mathematician to start doing analysis with infinitesimals without spending much time on the logic. In this section, I will describe some of the basics of nonstandard analysis; most of these ideas are the same for all "flavors" of nonstandard analysis. We call a real number x limited if Ixl ~ n for some standard integer n. We call x infinitesimal, written x ~ 0, if Ixl < lin for every standard integer n > 0 or, equivalently, Ixl < liN for some nonstandard integer N > O.S Write x ~ Y if x - Y ~ O. Also, x ~ 00 means that x is positive and unlimited. The notation x « y means that x < y and not x ~ y, while x « 00 means not x ~ 00. If x is any limited real number, then there is a unique standard real number, denoted st(x), such that x ~ st(x). A function f : lR -t lR is *-continuous6 at y if f(x) ~ f(y) for all x ~ y. This is not the same as continuity; however, a standard function is continuous if and only if it is *-continuous at every standard real. This can be seen by noting the equivalence (for standard y) of:
\IE> 036> 0 [Ix
- yl < 6 ==>
If(x) - f(y)1 < E)
and \lst E >
0 3st 6 > 0 [Ix -
yl < 6 ==>
If(x) - f(y)1 < E). Note that the standard continuous function f(x) = x 2 is not *-continuous at an unlimited integer N since f(N + N- 1 ) = N 2 + 2 + N- 2 'I- f(N). If N is an unlimited integer, then the function f(x) = Nx is continuous at the origin, but is not *-continuous there. If (Y, p) is a metric space, we write Yl ~ Y2 if P(Yl, Y2) ~ O. If (Y, p) is standard and Yl, Y2 E Yare standard, then P(Yl, Y2) is standard. Hence, for standard elements of standard metric spaces, P(Yl, Y2) ~ 0 implies Yl = Y2. A function f : Y -t lR is *-continuous if f(x) ~ f(y) for all x ~ y. A standard function is *continuous if and only if it is uniformly continuous; one can see this by considering the standard function
osc(f; 6)
= sup{lf(x) -
f(Y)1 : Ix
- yl < 6}.
An element Y E Y is called near-standard if there is a standard Yo E Y such that Y ~ Yo; the previous comment shows that such a Yo is unique (if it exists). A standard metric space (Y, p) is compact if and only if each Y E Y is near-standard. This is an example of a mathematical concept that has a more intuitive definition in nonstandard analysis than in standard analysis. 5It may not seem immediately obvious that these are equivalent. but {n EN: Ixl < lin} is a well-defined (not necessarily standard) finite subset of the integers and hence has a largest element. If the set contains all the standard integers. then the largest element is a nonstandard integer. 6This is often called S-continuous. but since we use S for simple random walk we choose this terminology.
163
INTERNAL SET THEORY AND INFINITESIMAL RANDOM WALKS
If Y = Cd[O, 1] is the set of continuous functions f : [0,1] -t lR d and p is the supremum norm, we write stf for the standard part of the function f, if it exists. Note that
f(x)
~
st/(x),
Let osc(f; 8) denote the oscillation of I,
osc(f; 8)
= max{lf(s) -
f(t)1 :
x E [0,1].
°: :; s, t :::;
1, Is
- tl :::; 8}.
A sufficient condition for a function f E Cd[O,I] to be near-standard is that 1/(0)1 :::; M for some standard integer M and there exists a standard continuous function g(.) with g(O+) = such that osc(f; 8) :::; g( 8). Indeed, if 1 satisfies these conditions, then 1 is in the compact set
°
{h: Ih(O)1 :::; M,osc(h;8) :::; g(8),0 < 8:::; I} (the Ascoli-Arzehi theorem shows that this is compact).
7.4 SIMPLE RANDOM WALK IN $\mathbb{Z}$

One of the advantages of nonstandard analysis is that it allows one to approximate continuous objects by elementary discrete objects. In this paper, we will focus on the relationship between the Wiener walk (a simple random walk with infinitesimal time and space increments) and the Wiener process as models for Brownian motion. Since we will be using the discrete object, we need to discuss the asymptotic behavior of the sums of random variables. Here we will quickly derive the facts that we need. This work is very classical, dating back at least to DeMoivre and Laplace. Most everything here is elementary, but it is nice to recall how much one can learn about the random walk by simple counting of the number of paths.$^7$ Let $X_1, X_2, \ldots$ denote a standard sequence of independent random variables with
\[ \mathbb{P}\{X_j = 1\} = \mathbb{P}\{X_j = -1\} = \tfrac{1}{2}, \]
and let $S_n = X_1 + \cdots + X_n$ be the corresponding simple random walk. Note that
\[ \mathbb{P}\{S_{2n} = 2j\} = \binom{2n}{n+j}\, 2^{-2n} = 2^{-2n}\, \frac{(2n)!}{(n+j)!\,(n-j)!}, \tag{7.1} \]
since the binomial coefficient gives the number of ways to choose $n+j$ "$+1$s" and $n-j$ "$-1$s" from the $2n$ steps. To find the asymptotics of the right-hand side, one uses Stirling's formula. In fact, approximation of probabilities as above was a major motivation in developing Stirling's formula. Since the derivation is short, we will give it here. Let
\[ b_n = \frac{e^{-n}\, n^{n+(1/2)}}{n!}. \]

$^7$Of course, if one wants to generalize these results to more complicated increment distributions than $\pm 1$, one needs other arguments.
Then
\[ \frac{b_{n+1}}{b_n} = e^{-1}\left(1 + \frac{1}{n}\right)^{n}\left(1 + \frac{1}{n}\right)^{1/2} = 1 + O(n^{-2}). \]
The last equality uses
\[ \left(1 + \frac{1}{n}\right)^{n} = e\left[1 - \frac{1}{2n} + O(n^{-2})\right], \]
which can be established by taking logarithms and using Taylor series. Therefore, if $m \ge n$,
\[ \frac{b_m}{b_n} = \prod_{j=n}^{m-1}\left[1 + O(j^{-2})\right] = 1 + O(n^{-1}). \]
Hence, there exists a constant $c^*$, which we will determine below, such that
\[ n! = c^*\, e^{-n}\, n^{n+(1/2)}\left[1 + O(n^{-1})\right]. \]
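As a quick numerical sanity check (a sketch, not part of the derivation), one can watch $b_n$ approach a limit, anticipating the value $c^* = \sqrt{2\pi}$ identified below; the sample values of $n$ are arbitrary.

```python
import math

# b_n = e^{-n} n^{n+1/2} / n! should approach 1/sqrt(2*pi), so that
# n! = c* e^{-n} n^{n+1/2} [1 + O(1/n)] with c* = sqrt(2*pi).
def b(n):
    # work with logarithms to avoid overflowing n! and n**n
    return math.exp(-n + (n + 0.5) * math.log(n) - math.lgamma(n + 1))

for n in (10, 100, 1000, 10000):
    print(n, b(n), 1.0 / math.sqrt(2.0 * math.pi))
```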
Plugging into (7.1), if $|j| \le n/2$, we see that $\mathbb{P}\{S_{2n} = 2j\}$ equals
\[ \left[1 + O(n^{-1})\right]\frac{\sqrt{2}}{c^*\sqrt{n}} \left(1 - \frac{j^2}{n^2}\right)^{-1/2} \left(1 - \frac{j^2}{n^2}\right)^{-n} \left(1 - \frac{2j}{n+j}\right)^{j}. \tag{7.2} \]
Note that
\[ \mathbb{E}\big[e^{S_{2n}/\sqrt{2n}}\big] = \mathbb{E}\big[e^{X_1/\sqrt{2n}}\big]^{2n} = \cosh\!\big(1/\sqrt{2n}\big)^{2n} = e^{1/2}\left[1 + O(n^{-1})\right]. \]
Hence,
\[ \mathbb{P}\{S_{2n} \ge c\sqrt{2n}\,\log n\} = \mathbb{P}\big\{e^{S_{2n}/\sqrt{2n}} \ge n^{c}\big\} \le n^{-c}\,\mathbb{E}\big[e^{S_{2n}/\sqrt{2n}}\big] = e^{1/2}\, n^{-c}\left[1 + O(n^{-1})\right]. \tag{7.3} \]
This implies that
\[ \lim_{n \to \infty} \sum_{|j| \le \sqrt{n}\,\log n} \mathbb{P}\{S_{2n} = 2j\} = 1. \]
But,
\[ \lim_{n \to \infty} \frac{1}{\sqrt{n}} \sum_{|j| \le \sqrt{n}\,\log n} e^{-j^2/n} = \int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi}. \]
This implies $c^* = \sqrt{2\pi}$ in Stirling's formula, and plugging into (7.2) we get the expression
\[ \mathbb{P}\{S_{2n} = 2j\} = \frac{1}{\sqrt{\pi n}}\, e^{-j^2/n}\, \exp\left\{ O\!\left(\frac{1}{n}\right) + O\!\left(\frac{j^4}{n^3}\right) \right\}. \tag{7.4} \]
(While our proof above holds only for $|j| \le n/2$, it is easy to check that this also holds for $n/2 \le |j| \le n$ using only the easy estimate $2^{-2n} \le \mathbb{P}\{S_{2n} = 2j\} \le 1$.) We can write
\[ \frac{1}{\sqrt{\pi n}}\, e^{-j^2/n} = \frac{2}{\sqrt{2\pi(2n)}}\, \exp\left\{ -\frac{(2j)^2}{2(2n)} \right\}. \]
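A short numerical check of the local approximation just derived (a sketch; the values of $n$ and $j$ below are arbitrary) compares the exact probability (7.1) with the main term of (7.4), ignoring the error factor.

```python
import math

# exact probability P{S_{2n} = 2j} from (7.1) versus the local central
# limit approximation (pi*n)**(-1/2) * exp(-j*j/n) from (7.4)
def exact(n, j):
    return math.comb(2 * n, n + j) * 2.0 ** (-2 * n)

def local_clt(n, j):
    return math.exp(-j * j / n) / math.sqrt(math.pi * n)

n = 500
for j in (0, 5, 10, 20, 40):
    print(j, exact(n, j), local_clt(n, j))
```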
This shows that the distribution of $S_{2n}$ is approximately that of a normal random variable with mean $0$ and variance $2n$, or equivalently, $S_{2n}/\sqrt{2n}$ is approximately a standard normal. We have just derived the classical central limit theorem for random walk; in fact, we have established a stronger result which is often called a local central limit theorem. In the remainder of this section we will derive the estimates needed to get a functional central limit theorem, that is, a theorem about the convergence of the random function
\[ \hat{f}_n(t) = \frac{S_{tn}}{\sqrt{n}}, \qquad 0 \le t \le 1, \]
where the random walk is linearly interpolated so that $\hat{f}_n$ is a continuous function from $[0,1]$ into $\mathbb{R}$. A useful tool to estimate the maximum value of a random walk is the reflection principle, which is essentially the following observation: if $k > j$ and we know that $S_j \ge y$, then the conditional probability that $S_k \ge y$ is at least $1/2$. Using the reflection principle at the first time that $S_j \ge a\sqrt{n}\,\log n$, we can see that for all $n$ sufficiently large,
\[ \mathbb{P}\Big\{\max_{j \le n} |S_j| \ge a\sqrt{n}\,\log n\Big\} \le 2\,\mathbb{P}\{|S_n| \ge a\sqrt{n}\,\log n\} = 4\,\mathbb{P}\{S_n \ge a\sqrt{n}\,\log n\} \le 5\, e^{1/2}\, n^{-a}. \]
The last inequality uses (7.3). More generally, define the oscillation of the random walk by
\[ \mathrm{osc}(S; m, n) = \max\{\, |S_k - S_j| : |k - j| \le m,\ 1 \le j, k \le n \,\}. \]
Note that if $m, n$ are positive integers,
\[ \mathrm{osc}(S; n, mn) \le 3 \max_{l = 0, \ldots, m-1}\ \max_{i = 0, \ldots, n} |S_{ln + i} - S_{ln}|. \]
Therefore, for all $n$ sufficiently large and all $a > 0$,
\[ \mathbb{P}\{\mathrm{osc}(S; n, mn) \ge 3a\sqrt{n}\,\log n\} \le m\, \mathbb{P}\{\mathrm{osc}(S; n, n) \ge a\sqrt{n}\,\log n\} \le 5mn^{-a}. \tag{7.5} \]
From this we can conclude the following "tightness" result: for every $\epsilon > 0$ there is an $R_\epsilon < \infty$ such that for every $n$,
\[ \mathbb{P}\Big\{ |S_{tn} - S_{sn}| \le R_\epsilon \sqrt{n}\, \sqrt{t-s}\, |\log(t-s)| \ \text{for all } 0 \le s < t \le 1 \Big\} \ge 1 - \epsilon. \tag{7.6} \]
This estimate is the technical tool needed to establish the functional central limit theorem, that is, that the functions $\hat{f}_n$, considered as random variables taking values in $C[0,1]$, converge in distribution to a particular $C[0,1]$-valued random variable, whose distribution is called Wiener measure. We discuss this in the next section. In a later section we will need the following well-known estimate for the simple random walk: there exist constants $0 < c_1 < c_2 < \infty$ such that
\[ \frac{c_1}{\sqrt{n}} \le \mathbb{P}\{S_j \ne 0 : j = 1, \ldots, n\} \le \frac{c_2}{\sqrt{n}}. \tag{7.7} \]
We sketch a proof here. Let $q(n)$ be the probability above and let $\eta_n = \max\{j \le 2n : S_j = 0\}$. Then, since $q$ is nonincreasing,
\[ 1 = \sum_{j=0}^{n} \mathbb{P}\{\eta_n = 2j\} = \sum_{j=0}^{n} \mathbb{P}\{S_{2j} = 0\}\, q(n-j) \ge c\, q(n) \sum_{j=0}^{n} (j+1)^{-1/2} \ge c\, q(n)\, n^{1/2}. \]
This gives the upper bound. For the lower bound, one can use the central limit theorem and reflection principle to show that there is a $\delta > 0$, independent of $n$, such that with probability at least $\delta$, $S_j \ne 0$ for $n < j \le 2n$. Hence
\[ \delta \le \sum_{j \le n/2} \mathbb{P}\{\eta_n = 2j\} = \sum_{j \le n/2} \mathbb{P}\{S_{2j} = 0\}\, q(n-j) \le c\, q(n/2) \sum_{j \le n/2} (j+1)^{-1/2} \le c\, q(n/2)\, n^{1/2}. \]
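A Monte Carlo sketch of (7.7) (the walk lengths and number of trials below are arbitrary): the empirical value of $q(n)\sqrt{n}$ should stay bounded between two constants, and in fact it hovers roughly near $\sqrt{2/\pi}$.

```python
import random

# q(n) = P{S_j != 0 for j = 1,...,n}; (7.7) says c1 <= q(n)*sqrt(n) <= c2
def no_return(n):
    s = 0
    for _ in range(n):
        s += random.choice((-1, 1))
        if s == 0:
            return False
    return True

trials = 10000
for n in (100, 400, 1600):
    q = sum(no_return(n) for _ in range(trials)) / trials
    print(n, q, q * n ** 0.5)
```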
7.5 WIENER WALK AND WIENER PROCESS

Let $N$ be a nonstandard positive integer. For ease, we will assume that $N = 2^K$, where $K$ is also a nonstandard integer. Let $X_1, X_2, \ldots$ be independent, identically distributed random variables with $\mathbb{P}\{X_j = 1\} = \mathbb{P}\{X_j = -1\} = 1/2$ and let $S_n$ be the associated random walk, that is, $S_0 = 0$ and $S_n = X_1 + \cdots + X_n$. Let $\Delta t = 1/N$ and for $k = 0, 1, 2, \ldots$, define the Wiener walk by $W_{k\Delta t} = (\Delta t)^{1/2}\, S_k$. We can define $W_t$ for other $t$ by linear interpolation. Consider $W_t$, $0 \le t \le 1$. The Wiener walk gives a (nonstandard) measure on $C[0,1]$; in fact, it is purely atomic, giving measure $2^{-N}$ to each of the $2^N$ different realizations of the walk. Let us denote this measure by $W_N$. If $j$ is a standard integer, then the random variables
\[ W_{2^{-j}},\ W_{2\cdot 2^{-j}} - W_{2^{-j}},\ \ldots,\ W_1 - W_{(2^j - 1)\cdot 2^{-j}} \]
are independent and identically distributed with
\[ \mathbb{P}\{W_{2^{-j}} = 2nN^{-1/2}\} = \mathbb{P}\{S_{N2^{-j}} = 2n\} = 2^{-N2^{-j}} \binom{N2^{-j}}{N2^{-j-1} + n}. \]
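A minimal construction of such a Wiener walk on the grid $\{k/N\}$, with an ordinary large $N = 2^K$ standing in for the nonstandard one (the choice $K = 16$ is arbitrary):

```python
import random

# W_{k/N} = N**(-1/2) * S_k, where S is a +/-1 simple random walk
K = 16
N = 2 ** K

S = [0]
for _ in range(N):
    S.append(S[-1] + random.choice((-1, 1)))

W = [s / N ** 0.5 for s in S]    # W[k] is the walk's value at time k/N
print(W[N // 2], W[N])           # e.g. the values W_{1/2} and W_1
```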
Although the path $W_t$ is clearly continuous, it is not as obvious that it is $*$-continuous; in fact, certain realizations of the path (e.g., $X_1 = X_2 = \cdots = X_N = 1$) give paths that are not $*$-continuous. However, nearly surely the paths are $*$-continuous; that is, for every standard $\epsilon > 0$, there is an event of probability at least $1 - \epsilon$ such that on this event the paths are $*$-continuous. This follows from the tightness result (7.6) for simple random walk paths. The Wiener walk gives a model for Brownian motion,$^8$ random continuous motion with independent, stationary increments.

$^8$It is standard in the mathematical literature to treat Brownian motion and Wiener process as synonymous. However, in other scientific literature, the term Brownian motion is often used for a physical process, while the Wiener process is one of the precise mathematical models that can be used to describe this process. We adopt the latter usage in this paper.
The model is elementary in that it is a uniform measure on a finite probability space (with $2^N$ elements). When viewed by a human who cannot distinguish points that differ by an infinitesimal, the process looks like Brownian motion. Can one understand Brownian motion by understanding the Wiener walk? The answer is yes, but there are different ways to formulate this precisely. The natural way for a (standard) mathematician to study Brownian motion is to construct the continuous model for Brownian motion, the Wiener process. A Wiener process is a collection of random variables $B_t$, $0 \le t \le 1$, such that with probability one $B_t$ is a continuous function of $t$, the distribution of $B_t - B_s$ depends only on $t - s$, and for each $0 \le s_0 \le s_1 \le \cdots \le s_n \le 1$, the random variables $B_{s_1} - B_{s_0}, \ldots, B_{s_n} - B_{s_{n-1}}$ are independent. It can be shown that for such a process, the increments $B_t - B_s$ would have to have a normal distribution. Since there are an uncountable number of values of $t$, a little work needs to be done to show that such a process can be constructed (the hardest part, which is not really very hard, is to derive an oscillation estimate similar to what we derived for the simple random walk). If we normalize so that $\mathbb{E}[B_1] = 0$, $\mathbb{E}[B_1^2] = 1$, then the process is essentially unique.

Using Robinson's framework for nonstandard analysis and a construction due to Loeb [18], Anderson [1] constructed a standard Wiener process$^9$ from a Wiener walk with $N \approx \infty$. Roughly speaking, once one proves that the Wiener walk is nearly surely $*$-continuous, then the path $W_t$, $0 \le t \le 1$, is a near-standard element of $C[0,1]$, so we define $B_t$, $0 \le t \le 1$, to be the standard part of $W$. In particular, nearly surely, $W_t \approx B_t$, $0 \le t \le 1$. A little thought tells us there are some subtleties in justifying this definition. What we are trying to do is to define a "function" from a finite space (the set of random walks) into an uncountable space ($C[0,1]$) in such a way that the induced probability measure on $C[0,1]$ is nonatomic. In Robinson's framework for nonstandard analysis, the number $N$ is not really a finite integer but rather a hyperfinite integer that has been added to the integers in such a way that it retains the properties of a finite integer. Therefore the set of random walk paths of $N$ steps is not finite. Needless to say, a little background in mathematical logic is needed to understand the construction!

In IST, the number $N \approx \infty$ is really an integer, so one cannot construct a standard Wiener process in this fashion. However, one can get around this by starting with a standard Wiener process $B_t$ and then defining the Wiener walk $W_t$, $0 \le t \le 1$, as a function of $B_t$, $0 \le t \le 1$, so that nearly surely $W_t \approx B_t$, $0 \le t \le 1$. The process of defining a random walk from a Wiener process so that the two processes are close goes under the name strong approximation or coupling. We describe two such couplings in the next section. One advantage of defining the Wiener walk directly from the Wiener process (rather than using, say, the Anderson-Loeb construction) is that one often gets a more precise estimate than just $W_t - B_t \approx 0$. (It is often important to know more than whether a number is infinitesimal since if $x \approx 0$, $y \approx 0$, $x/y$ can be anything.) Of course, one can use coupling to define Wiener walks in the usual framework for nonstandard analysis as well.

$^9$The term "standard" when referring to a Wiener process has two different meanings: one is standard as in nonstandard analysis, and the other is normalized so that $\mathbb{E}[B_1] = 0$, $\mathbb{E}[B_1^2] = 1$. In this section "standard" Wiener processes are standard in both senses.
When we fix one $N \approx \infty$ and consider the Wiener walk and Wiener process that are close, we are essentially doing weak convergence of random walk to the Wiener process. For many purposes, it is better to consider strong convergence. In this case, the coupling technique is better. One starts with a standard Wiener process and a coupling as in the next section, which gives a sequence of Wiener walks $W^{(n)}$ defined from the Wiener process. With the appropriate choices of coupling, we can then see that nearly surely for all $t \in [0,1]$ and all $N \approx \infty$, $W^{(N)}_t \approx B_t$. Finally, the "radical" approach suggested by Nelson [22] is to forget entirely about the standard Wiener process and to use the Wiener walk as the model for Brownian motion. This has the beauty of making a very elementary theory. However, one eventually does want to talk about the Wiener process since it has some properties that approximations do not have. For example, the Wiener process satisfies the exact scaling rule that states that if $B_t$ is a standard Wiener process and $a > 0$ is a standard real, then $\tilde{B}_t := a^{-1/2} B_{at}$ is another standard Wiener process.
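As a rough numerical check of the scaling rule, one can compare the sample mean and variance of $a^{-1/2} W_a$ with those of $W_1$, using the Wiener walk as a stand-in for the Wiener process; the values of $N$, $a$, and the number of samples below are arbitrary.

```python
import random

# sample a**(-1/2) * W_a; its distribution should be close to that of W_1
# (mean 0, variance 1), as the scaling rule predicts
N = 2 ** 12
a = 0.25
trials = 2000

def scaled_value():
    k = int(a * N)                       # number of steps up to time a
    s = sum(random.choice((-1, 1)) for _ in range(k))
    return (s / N ** 0.5) / a ** 0.5     # a**(-1/2) * W_a

samples = [scaled_value() for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
print(mean, var)                         # roughly 0 and 1
```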
7.6 COUPLING

A (strong) coupling$^{10}$ (of Wiener process and random walk) is a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ on which are defined a standard Wiener process $B_t$ and a standard sequence of simple random walks $S^{(n)}_t$, $n = 1, 2, \ldots$. Here the time parameter of the random walks is made continuous by linear interpolation. Associated to the random walks are the scaled random walks $W^{(n)}_t := n^{-1/2} S^{(n)}_{tn}$. When $N \approx \infty$, $W^{(N)}$ is a Wiener walk. Let
\[ \Delta^{(n)}_t = \max\{\, |W^{(n)}_s - B_s| : 0 \le s \le t \,\}. \]

$^{10}$The word "strong" refers to the fact that we have a sequence of random walks defined from the Wiener process.
We will call a coupling good if for every positive integer $N \approx \infty$ and every $t \ll \infty$, $\Delta^{(N)}_t \approx 0$ nearly surely. By "Robinson's lemma," this is equivalent to saying that for each $N \approx \infty$ there is a $T \approx \infty$ such that $\Delta^{(N)}_T \approx 0$ nearly surely.$^{11}$ Usually strong couplings are constructed by starting with a Wiener process and defining the random walks as functions of the paths of the Wiener process. A well-known example of a strong coupling is Skorokhod embedding. Essentially, for each $n$, we consider the Wiener process every time that it has made a new increment of absolute value $n^{-1/2}$; if the increment is positive, the random walk goes up, and if the increment is negative, the random walk goes down. It is easy to see for each $n$ that this gives a random walk. To be more precise, we define stopping times for the Wiener process $B_t$ as follows: let $\tau(0,n) = 0$ and
\[ \tau(j,n) = \min\{\, t > \tau(j-1,n) : |B_t - B_{\tau(j-1,n)}| = n^{-1/2} \,\}. \]
Then the random walks are defined by
\[ S^{(n)}_j = n^{1/2} B_{\tau(j,n)}, \qquad W^{(n)}_{j/n} = B_{\tau(j,n)}, \]
with appropriate linear interpolation. For fixed $n$, the random variables $\{\, n[\tau(j,n) - \tau(j-1,n)] : j = 1, 2, \ldots \,\}$ are independent, identically distributed with mean $1$ and finite variance. The central limit theorem implies (roughly) that $\tau(j,n) = (j/n) + O(n^{-1/2})$. Using the relation $|B_t - B_s| \approx |t - s|^{1/2}$, we see that we expect $|W^{(n)}_{j/n} - B_{j/n}| \approx n^{-1/4}$. By doing estimates more carefully it can be shown that there is a $c$ such that for all $a$,
\[ \mathbb{P}\{\Delta^{(n)}_t \ge ac\, n^{-1/4} \log n\} \le c\, t\, n^{1-a}, \]
which shows, in particular, that this is a good coupling.

$^{11}$Consider the set of positive integers $m$ such that $\mathbb{P}\{\Delta^{(N)}_m \ge 1/m\} \ge 1/m$.
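A minimal simulation sketch of the Skorokhod embedding, with the Wiener process itself approximated on a very fine time grid (so the condition $|B_t - B_{\tau(j-1,n)}| = n^{-1/2}$ is replaced by the first grid time at which the increment reaches or exceeds $n^{-1/2}$, and the small overshoot is ignored); the mesh, the number of steps, and the seed are arbitrary.

```python
import random

random.seed(0)
n = 100                # number of embedded random-walk steps
dt = 1e-5              # mesh of the auxiliary Brownian grid
h = n ** -0.5          # embedding threshold n**(-1/2)

B = 0.0                # current value of the (approximate) Brownian path
anchor = 0.0           # value of B at the previous stopping time
t = 0.0
S = [0]                # embedded simple random walk S_j^{(n)} (unscaled)
taus = [0.0]           # stopping times tau(j, n)

while len(S) <= n:
    B += random.gauss(0.0, dt ** 0.5)
    t += dt
    if abs(B - anchor) >= h:
        S.append(S[-1] + (1 if B > anchor else -1))
        taus.append(t)
        anchor = B     # overshoot of order sqrt(dt) is ignored here

# tau(n, n) should be roughly 1, and n**(1/2) * B roughly S_n
print(taus[n], S[n], n ** 0.5 * B)
```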
A different, sharper coupling was given by Komlós, Major, and Tusnády [10]. For this coupling, there exist $a, b, c$ with
\[ \mathbb{E}\big[\exp\{a\, n^{1/2} \Delta^{(n)}_1\}\big] \le c\, n^{b}. \tag{7.8} \]
In particular, for all $\beta > 0$,
\[ \mathbb{P}\{\Delta^{(n)}_1 \ge \beta n^{-1/2} \log n\} = \mathbb{P}\big\{\exp\{a n^{1/2} \Delta^{(n)}_1\} \ge n^{a\beta}\big\} \le n^{-a\beta}\, \mathbb{E}\big[\exp\{a n^{1/2} \Delta^{(n)}_1\}\big] \le c\, n^{b - a\beta}. \]
The definition of the coupling is fairly easy although it takes careful estimation to establish the bound (7.8). We will describe the coupling for $N = 2^K$ where $K \approx \infty$. Given the Wiener process $B_t$ we will first define $W_1 = W_1^{(N)}$ using only the value of $B_1$; we then define $W_{1/2}$ using only the values of $B_1, B_{1/2}$; we then define $W_{1/4}, W_{3/4}$ using only the values of $B_{1/4}, B_{1/2}, B_{3/4}, B_1$, etc. For the first step, we note that $B_1$ is a $N(0,1)$ random variable. Define $W_1$ from $B_1$ using quantile-coupling, which is the natural way to define a discrete random variable as a function of a normal random variable so that they are as close as possible. More precisely,
\[ W_1 = jN^{-1/2},\quad S_N = j \qquad \text{if } r_{j-1} < B_1 \le r_j, \]
where $r_k$ is defined by
\[ \sum_{i \le k} \mathbb{P}\{S_N = i\} = \mathbb{P}\{B_1 \le r_k\} = \int_{-\infty}^{r_k} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx. \]
Using (7.4), one can check that
\[ |W_1 - B_1| \le \frac{c}{\sqrt{N}}\left[1 + W_1^2\right], \]
provided that $|W_1| < N^{1/2}$. In other words, the typical error in the approximation is $O(N^{-1/2})$. The conditional distribution of $B_{1/2}$ given $B_1$ is $N(B_1/2, 1/2)$; this can be seen by writing
\[ B_{1/2} = \tfrac{1}{2} Z_{1/2} + \tfrac{1}{2} B_1, \]
where $Z_{1/2} = 2B_{1/2} - B_1$, and checking that $Z_{1/2}$ and $B_1$ are independent. The conditional distribution of $W_{1/2}$ given $W_1 = jN^{-1/2}$ is a hypergeometric distribution with mean $(j/2)N^{-1/2}$. Given $B_1, B_{1/2}$ we define
\[ W_{1/2} = \tfrac{1}{2} \bar{Z}_{1/2} + \tfrac{1}{2} W_1, \]
where $\bar{Z}_{1/2}$ is the random variable with the conditional distribution of $2W_{1/2} - W_1$ given $W_1$ that is quantile-coupled with $Z_{1/2}$. Since
\[ B_{1/2} - W_{1/2} = \tfrac{1}{2}\big[Z_{1/2} - \bar{Z}_{1/2}\big] + \tfrac{1}{2}\big[B_1 - W_1\big], \]
we can see that we would expect $B_{1/2} - W_{1/2}$ to be $O(N^{-1/2})$. We continue this process to get the coupling. The bound (7.8) can be obtained using essentially only (7.3) and some brute force; see [17]. While this coupling is very sharp, one does need to take care when using it because, although the Wiener walk and the Wiener process are separately Markov processes, if we look at both simultaneously we do not have a Markov process.
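A sketch of the first level of this construction, quantile-coupling $W_1 = S_N N^{-1/2}$ with a standard normal $B_1$: here $N$ is an ordinary (finite) power of two and the code simply inverts the two distribution functions, so it illustrates the idea of quantile-coupling rather than the full dyadic construction.

```python
import math, random

N = 512                                  # a finite stand-in for 2**K

def Phi(x):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# distribution function of S_N over its possible values -N, -N+2, ..., N
values = list(range(-N, N + 1, 2))
pmf = [math.comb(N, (N + v) // 2) * 2.0 ** (-N) for v in values]
cdf, total = [], 0.0
for p in pmf:
    total += p
    cdf.append(total)

def quantile_couple(B1):
    # W_1 = j * N**(-1/2) where r_{j-1} < B_1 <= r_j
    u = Phi(B1)
    for v, F in zip(values, cdf):
        if u <= F:
            return v / math.sqrt(N)
    return values[-1] / math.sqrt(N)

B1 = random.gauss(0.0, 1.0)
W1 = quantile_couple(B1)
print(B1, W1, abs(W1 - B1))              # difference typically O(N**(-1/2))
```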
7.7 LOCAL TIME

In many situations one wants to consider special times or points in a random curve, often a set of times whose fractal dimension is positive but smaller than $1$ or a set of points whose dimension is positive but smaller than that of the entire curve. In many cases, it is easy to describe these points on the discrete level, and one would hope to be able to use the discrete analog to help understand the continuous object. There are often nice heuristic arguments that relate the discrete to the continuous, and the problem is to justify the relation rigorously. In this section we consider a case, local time at the origin of a one-dimensional Brownian motion or random walk, where the sets are well understood and for which the link between the discrete and the continuous can be made rigorous. Let $B_t$ be a standard Wiener process with a standard good coupling $W^{(n)}_t := n^{-1/2} S^{(n)}_{tn}$ of random walks. Let us fix $N \approx \infty$ and write just $W = W^{(N)}$, $S = S^{(N)}$. Let
\[ R_j = \sum_{k=0}^{j} 1\{S_k = 0\} \]
denote the number of visits to the origin by $S$ in the first $j$ steps. Using (7.4), we see that as $j \to \infty$,
\[ \mathbb{E}[R_j] = \sum_{k=1}^{j/2} \mathbb{P}\{S_{2k} = 0\} + O(1) = O(1) + \sum_{k=1}^{j/2} \frac{1}{\sqrt{\pi k}} = \sqrt{2/\pi}\; j^{1/2} + O(1). \]
Define the local time of $W$ (at the origin) to be the random function
\[ L_{j/N} = N^{-1/2} R_j = N^{-1/2} \sum_{k \le j} 1\{W_{k/N} = 0\}. \]
The local time measures the amount of time that the Wiener walk spends at the origin. It is a random process depending on the realization of the walk and is normalized so that $\mathbb{E}[L_1] \approx \sqrt{2/\pi}$. It is clearly nondecreasing in time and only increases when the Wiener walk is at the origin.
It is not very difficult to show that if $m \ll \infty$, then nearly surely $L_t$, $0 \le t \le m$, is near-standard and $*$-continuous; this is a straightforward estimate for random walks, similar to the proof of $*$-continuity of the Wiener walk. The key random walk estimate is (7.7), which states that there is an $a > 0$ such that $\mathbb{P}\{R_j > 1\} \le 1 - (a/\sqrt{j})$. By iterating this estimate, we get the exponential bound
\[ \mathbb{P}\big\{ L_{(l+j)/N} - L_{l/N} > k\sqrt{j}\, N^{-1/2} \big\} = \mathbb{P}\big\{ R_{l+j} - R_l > k\sqrt{j} \big\} \le \Big(1 - \frac{a}{\sqrt{j}}\Big)^{k\sqrt{j}} \le e^{-ak}. \]
Taking standard parts, one obtains the local time of the Wiener process at the origin, a nondecreasing continuous process $L_t$ with the following property: if $s > 0$, and $T_s$ denotes the first time $u \ge s$ with $B_u = 0$, then, given $B_t$, $0 \le t \le s$, the distribution of $L_{T_s + r} - L_s = L_{T_s + r} - L_{T_s}$ is the same as that of $L_r$. This process is unique when we choose the normalization
\[ \mathbb{E}[L_1] = \sqrt{2/\pi}. \]
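A Monte Carlo sketch of the normalization (the walk length and number of trials are arbitrary): the sample mean of $L_1 = N^{-1/2} R_N$ should be close to $\sqrt{2/\pi} \approx 0.80$.

```python
import math, random

N = 4096
trials = 1000

def local_time_at_one():
    s, visits = 0, 1               # R_j counts the visit at time 0
    for _ in range(N):
        s += random.choice((-1, 1))
        if s == 0:
            visits += 1
    return visits / math.sqrt(N)   # L_1 = N**(-1/2) * R_N

est = sum(local_time_at_one() for _ in range(trials)) / trials
print(est, math.sqrt(2.0 / math.pi))
```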
While this construction of the continuous local time from its discrete analog is beautiful, a little bit of thought tells one that the proof must use much more structure about the processes than just "the Wiener walk is infinitely close to the Wiener process." The latter fact alone does not tell us that there is such a close relationship between the times that the Wiener walk is exactly at $0$ and the times that the Wiener process is exactly at $0$. The reason we can do this is that both the Wiener walk (i.e., the simple random walk) and the Wiener process have a great deal of homogeneity. For the Wiener process it can be shown that with probability $1$,
\[ L_t = \lim_{\epsilon \to 0} \frac{1}{2\epsilon} \int_0^t 1\{|B_s| \le \epsilon\}\, ds. \tag{7.9} \]
For $\epsilon \gg 0$, the quantities
\[ \frac{1}{2\epsilon} \int_0^t 1\{|B_s| \le \epsilon\}\, ds, \qquad \frac{1}{2\epsilon} \int_0^t 1\{|W_s| \le \epsilon\}\, ds \]
are "macroscopic" quantities that should be approximately the same if $B_t$ and $W_t$ are very close. Let us explain why the analog of (7.9) holds for the Wiener walk; a similar argument gives (7.9) for the Wiener process. Let $\rho_j$ be the time of the $j$th return to the origin by the simple random walk $S$,
\[ \rho_j = \min\{\, k : L_{k/N} = j/\sqrt{N} \,\}. \]
Let
\[ L^*(j, kN^{-1/2}) = N^{-1/2} \sum_{l \le \rho_j} 1\{S_l = k\} \]
denote the "local time at k/.JN" at the first time the local time at the origin reaches j /.IN, and if A is a subset of N- 1 /2 z, L*(j,A) =
L L*(j,x).
:rEA
Note that $L^*(j, 0) = j/\sqrt{N}$. Suppose $A_M = \{k : |k| \le M\}$ for some integer $M$ with $M/\sqrt{N} \approx 0$. Then $L^*(j, A_M)$ is the sum of $j$ independent and identically distributed random variables with the distribution of $Y_M/\sqrt{N}$, where $Y_M = \sqrt{N}\, L^*(1, A_M)$. A straightforward calculation using (7.7) shows that
\[ \mathbb{E}[Y_M] = 2M + 1, \qquad \mathrm{Var}[Y_M] \le \mathbb{E}[Y_M^2] \le c\, M^3, \]
for some standard $c > 0$. Hence,
\[ \mathbb{E}\left[\frac{L^*(\sqrt{N}, A_M)}{2M+1}\right] = 1, \qquad \mathrm{Var}\left[\frac{L^*(\sqrt{N}, A_M)}{2M+1}\right] \approx 0. \]
In other words, the first time that the origin is visited $\sqrt{N}$ times is approximately the same time as the first time $A_M$ is visited $(2M+1)\sqrt{N}$ times.
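A Monte Carlo sketch of the identity $\mathbb{E}[Y_M] = 2M+1$ (the value of $M$ and the walk length are arbitrary): over a long stretch of the walk, the number of visits to the sites $\{-M, \ldots, M\}$ per visit to the origin should be close to $2M+1$.

```python
import random

M = 5
n = 2 * 10 ** 6

s = 0
visits_band = visits_zero = 0
for _ in range(n):
    s += random.choice((-1, 1))
    if abs(s) <= M:
        visits_band += 1
        if s == 0:
            visits_zero += 1

print(visits_band / max(visits_zero, 1), 2 * M + 1)
```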
7.8 OUTER BOUNDARY OF WIENER PROCESS IN $\mathbb{R}^2$

As we have said, it is somewhat fortunate that the local time for the Wiener walk and the Wiener process are as closely related as they are. There are other quantities associated to Wiener paths for which there are elementary random walk equivalents and for which it is still open whether or not one can define the Wiener process quantity by taking the "standard part" of the corresponding Wiener walk quantity. I will give one interesting example for which the answer is unknown: parameterizing the outer boundary of the planar Wiener process. Let $B_t$, $0 \le t \le 1$, denote a two-dimensional Wiener bridge, that is, a Wiener process $B_t$ in $\mathbb{R}^2 = \mathbb{C}$ conditioned so that $B_0 = B_1 = 0$. This is conditioning on an event of probability zero, but this can be made precise by taking a limit. In fact, if $\tilde{B}_t$ is a complex Brownian motion, then $B_t$, $0 \le t \le 1$, has the distribution of $\tilde{B}_t - t\tilde{B}_1$. Let $H$ denote the unbounded component of $\mathbb{C} \setminus B[0,1]$ and let $\partial$ denote the boundary of $H$. Then $\partial$ is called the frontier or outer boundary of the Wiener bridge. It is not obvious, but can be proved by estimation of Brownian "intersection exponents" [3], that $\partial$ is a Jordan curve, that is, is topologically equivalent to a circle. In particular, this means that if $F$ is a conformal transformation of $\mathbb{C} \setminus \{|z| \le 1\}$ onto $H$ (the existence of such a transformation follows from the Riemann mapping theorem), then $F$ can be extended continuously to $\{|z| \ge 1\}$ in such a way that $F$ is a homeomorphism of the unit circle onto $\partial$. This parameterization of $\partial$ by $F$, which can be called parameterization by harmonic measure or capacity, is in many ways an unnatural parameterization from a local perspective on the curve. What we would like to do is to give the outer boundary a "natural" parameterization in a sense we now describe. Mandelbrot [19] observed that simulations of $\partial$ (or, more precisely, of a random walk approximation of $\partial$) suggested that its fractal dimension was $4/3$, which is
the same as the conjectured dimension for the scaling limit of planar self-avoiding walks or self-avoiding polygons. A few years ago, Oded Schramm, Wendelin Werner, and I [14] used the Schramm-Loewner evolution (SLE) to prove that the outer boundary of the Wiener bridge does have Hausdorff dimension $4/3$. This result, which is essentially the determination of a "disconnection exponent" for planar Wiener processes, also implies a similar result for random walk boundaries [13]. Choose an integer $N \approx \infty$ and let $W_t = W_{t,N}$ denote a Wiener walk/bridge of $2N^2$ steps. More precisely, for nonnegative integers $k \le 2N^2$,
\[ W_{k\Delta t} = N^{-1} S_k, \]
where $\Delta t = 1/(2N^2)$ and $S$ is a simple random walk conditioned so that $S_{2N^2} = 0$. $W_t$ for other $t \in [0,1]$ is defined by linear interpolation. Using the Komlós, Major, and Tusnády approximation scheme (see [10, 17]), we can find a standard $c < \infty$ and a coupling of a Wiener walk/bridge and a standard complex Wiener bridge on the same probability space so that, except for an event of probability not exceeding $cN^{-100}$,
\[ |W_t - B_t| \le cN^{-1}\log N, \qquad 0 \le t \le 1. \]
The set of points visited by the Wiener walk/bridge at times $0, \Delta t, 2\Delta t, \ldots, 1 - \Delta t, 1$ is contained in the lattice $N^{-1}\mathbb{Z}^2$. To each such lattice point $z$, let us associate the closed square of side length $N^{-1}$ centered at $z$ with sides parallel to the coordinate axes. Let $H^*$ denote the unbounded component of $\mathbb{C}$ with all these squares removed, and let $\partial^*$ denote the boundary of $H^*$. Then $\partial^*$ is the union of line segments of length $N^{-1}$; in fact, $\partial^*$ is a self-avoiding polygon (on the dual lattice). The number of these segments in $\partial^*$ is of order $N^{4/3}$. Hence we can parameterize $\partial^*$ by starting at some point and traversing the boundary at constant speed $N^{1/3}$, so that each segment takes time $N^{-4/3}$. This is a natural parameterization if we consider $\partial^*$ as a self-avoiding polygon. This leads to some open questions:

• Is there a way to take the "standard part" of this parameterization to get a parameterization of $\partial$?

• Is this standard part independent of the choice of $N$? In other words, is there a unique (up to a constant factor and a choice of "starting point") natural parameterization $\eta$ of $\partial$?

• Does the parameterization $\eta$ have finite $(4/3)$-variation? In other words, is
\[ \lim_{n \to \infty} \sum_{j \le an} \left| \eta\!\left(\frac{j}{n}\right) - \eta\!\left(\frac{j-1}{n}\right) \right|^{4/3} \in (0, \infty), \]
or, at least, is the corresponding limit $0$ when the exponent $4/3$ is replaced by $\alpha > 4/3$ and $\infty$ when it is replaced by $\alpha < 4/3$? Here $a > 0$ is the time duration of $\partial$ in this parameterization, which depends on the realization.

The conjectures on the variation do hold for the discrete model on the smallest "microscopic" level; that is, if we divide time into intervals with $\Delta t = N^{-4/3}$,
there are $O(N^{4/3})$ such intervals and the parameterization of $\partial^*$ moves a distance $N^{-1}$ over each interval. However, we do not know whether it holds for the discrete model at "mesoscopic" or "macroscopic" scales, and this prevents us from concluding something about the Wiener process.
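The claim that $\partial^*$ has on the order of $N^{4/3}$ segments can be probed numerically. The sketch below uses an unconditioned planar walk rather than a bridge, extracts the unbounded component of the complement of the occupied squares by a flood fill, and counts the unit edges separating it from occupied squares; with $2N^2$ steps the count should grow very roughly like $N^{4/3}$, that is, like (number of steps)$^{2/3}$. The walk lengths are arbitrary and logarithmic corrections are ignored.

```python
import random
from collections import deque

def outer_boundary_edges(steps):
    # simulate a planar simple random walk and record visited lattice points
    x = y = 0
    visited = {(0, 0)}
    for _ in range(steps):
        dx, dy = random.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
        x, y = x + dx, y + dy
        visited.add((x, y))

    xs = [p[0] for p in visited]
    ys = [p[1] for p in visited]
    x0, x1 = min(xs) - 1, max(xs) + 1
    y0, y1 = min(ys) - 1, max(ys) + 1

    # flood fill the unbounded complement component from a corner of the box
    outside = {(x0, y0)}
    queue = deque([(x0, y0)])
    while queue:
        cx, cy = queue.popleft()
        for nx, ny in ((cx+1, cy), (cx-1, cy), (cx, cy+1), (cx, cy-1)):
            if x0 <= nx <= x1 and y0 <= ny <= y1:
                if (nx, ny) not in visited and (nx, ny) not in outside:
                    outside.add((nx, ny))
                    queue.append((nx, ny))

    # each adjacency between an outside cell and a visited cell is one
    # unit segment of the outer boundary
    edges = 0
    for cx, cy in outside:
        for nx, ny in ((cx+1, cy), (cx-1, cy), (cx, cy+1), (cx, cy-1)):
            if (nx, ny) in visited:
                edges += 1
    return edges

for steps in (1000, 4000, 16000):
    e = outer_boundary_edges(steps)
    print(steps, e, e / steps ** (2.0 / 3.0))
```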
7.9 LOOP-ERASED RANDOM WALK

The loop-erased random walk (LERW) is a process obtained from simple random walk by erasing loops. For this section, we let $N \approx \infty$, $d \ge 2$ a standard integer, $B_t$ a standard $d$-dimensional Wiener process, and $W_t = N^{-1} S_{dtN^2}$ a Wiener walk with infinitesimal space increments $1/N$. The $d$ is included in the time variable of $S$ in order that the process makes the correct number of time steps; that is, by time $t$ the process has taken approximately $tN^2$ steps in each of the $d$ coordinates. We will assume that the Wiener walk and Wiener process are coupled$^{12}$ so that for some $T \approx \infty$, nearly surely
\[ W_t \approx B_t, \qquad 0 \le t \le T. \]

$^{12}$We have not discussed couplings of $d$-dimensional walks. It is not difficult to use Skorokhod embedding and a little extra randomness to get a coupling of a $d$-dimensional simple random walk and a $d$-dimensional Brownian motion, although the coupling is not an "embedding" of the random walk in the Brownian motion. It is harder to give an analog of the Komlós, Major, Tusnády approximation in $d$ dimensions but it can be done. See [29] for a survey of recent advances in this area.
Suppose $S_n$ is a simple random walk in $\mathbb{Z}^d$, $d \ge 3$. For these values of $d$ the process is transient, so the following is well defined:
\[ \sigma_0 = \max\{j : S_j = 0\}. \]
We then define recursively
\[ \sigma_{j+1} = \max\{k > \sigma_j : S_k = S_{\sigma_j + 1}\}. \]
The resulting process $\hat{S}_j := S_{\sigma_j}$ is called the loop-erased random walk. This is also called chronological loop-erasure because it is equivalent to erasing loops from the path whenever they appear.
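A sketch of the erase-loops-as-they-appear procedure (which, as noted above, produces the same self-avoiding path as the definition via the times $\sigma_j$); the dimension and walk length in the example are arbitrary.

```python
import random

def loop_erase(path):
    # run through the path and, each time a site is revisited,
    # erase the loop just created
    erased = []
    index = {}                 # current position of each site in `erased`
    for site in path:
        if site in index:
            k = index[site]
            for removed in erased[k + 1:]:
                del index[removed]
            del erased[k + 1:]
        else:
            index[site] = len(erased)
            erased.append(site)
    return erased

# example: a 1000-step simple random walk in Z^3
moves = [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]
path = [(0, 0, 0)]
for _ in range(1000):
    dx, dy, dz = random.choice(moves)
    x, y, z = path[-1]
    path.append((x + dx, y + dy, z + dz))

print(len(path), len(loop_erase(path)))
```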
We can think of this as a consistent family of measures $P_n = P_{n,d}$ on self-avoiding walks of $n$ steps starting at the origin. If $d = 2$, one can also define the loop-erased walk by a limiting procedure as follows. For any $n \ll \infty$, we take $M \approx \infty$; consider the measure on $n$-step walks obtained by looking at the first $n$ steps of the loop-erasure of $[S_0, S_1, \ldots, S_{\rho_M}]$, where $\rho_M$ is the first $j$ with $|S_j| \ge M$, and then taking the standard part of this measure. For convenience we will choose $M$ to be much larger than $N$; $M = N^{10}$ will do. We can consider $\hat{S}_t$ for continuous $t$ by linear interpolation. This is the same thing as defining $\hat{S}_t = S_{\sigma_t}$, where $\sigma_t = \sigma_k + (t - k)$ for $k \le t < k+1$. The loop-erased random walk is also called the Laplacian random walk because its transition probabilities are given by solving the discrete Laplace equation. If $V$ is a finite subset of $\mathbb{Z}^d$, there is a unique function $\phi_V : \mathbb{Z}^d \to [0, \infty)$ satisfying:
\[ \phi_V(z) = 0, \qquad z \in V; \]
lim