English Pages 375 [376] Year 2013
De Gruyter Studies in Mathematics 50 Editors Carsten Carstensen, Berlin, Germany Nicola Fusco, Napoli, Italy Fritz Gesztesy, Columbia, Missouri, USA Niels Jacob, Swansea, United Kingdom Karl-Hermann Neeb, Erlangen, Germany
Zoltán Sasvári
Multivariate Characteristic and Correlation Functions
De Gruyter
Mathematical Subject Classification 2010: 42B10, 43A35, 60F05, 60G10, 60G50, 60G51, 60E07, 60E10.
ISBN 978-3-11-022398-9 e-ISBN 978-3-11-022399-6 Set-ISBN 978-3-11-174044-7 ISSN 0179-0986 Library of Congress Cataloging-in-Publication Data A CIP catalog record for this book has been applied for at the Library of Congress. Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the internet at http://dnb.dnb.de. © 2013 Walter de Gruyter GmbH, Berlin/Boston Typesetting: P T P-Berlin Protago-TEX-Production GmbH, www.ptp-berlin.de Printing and binding: Hubert & Co. GmbH & Co. KG, Göttingen Printed on acid-free paper Printed in Germany www.degruyter.com
Preface
Our goal in writing this book is to make the basic concepts and results on multivariate characteristic and correlation functions easily accessible to both students and researchers in a comprehensive manner. Especially in the interest of students, every attempt has been made to make the book as self-contained as possible. This led to a relatively large appendix containing topics like infinite products, functional equations, special functions or compact operators. We hope that this appendix will make the usage of the book much easier, for example in seminars. Chapters 1, 2 and parts of Chapter 3 have been ‘tested’ in lectures and seminars given at the Technical University Dresden, Wroclaw University of Technology, and Swansea University.

In a certain sense, characteristic functions and correlation functions are the same: the common underlying concept is positive definiteness. Many results in probability theory, mathematical statistics and stochastic processes can be derived by using these functions. While there are books on characteristic functions of one variable, books devoting some sections to the multivariate case, and books treating the general case of locally compact groups, interestingly there is no book devoted entirely to the multidimensional case, which is extremely important for applications. This book is intended to fill this gap at least partially. Due to the abundance of results and of their applications, a single book cannot treat all of them. Most of the presented results are fairly well known. For this reason only a few references are included; rather, we cite papers and monographs where such references can be found.

A brief outline of the book is as follows. The first chapter presents basic results and should be read carefully since it is essential for the understanding of the subsequent chapters. The second chapter is devoted to correlation functions, their applications to stationary processes and some connections to harmonic analysis. In Chapter 3 we deal with several special properties, Chapter 4 is devoted to the extension problem, while Chapter 5 contains a few applications. Many results in the Appendix are presented with proofs. The reader might find some sections of the Appendix interesting on their own, e.g., the section Solutions of certain functional equations or the section Linear independence of exponential functions.

The numbering of chapters, sections, theorems, etc. is traditional. Equations are numbered consecutively within each paragraph, theorem, definition, and so forth. Equation (i) stands for the i-th equation within the current paragraph, while equation (l.k.j.i) stands for the i-th equation within the Paragraph (l.k.j).

My warmest thanks go to Georg Berschneider, who read the whole manuscript, gave valuable remarks, helped to find literature on special topics and solved several TeX problems. Björn Böttcher read substantial parts of the manuscript; I am grateful for his comments. I thank my colleagues at the Institut für Stochastik, Technische Universität Dresden, for many helpful discussions, in particular Christiane Weber for her help in preparing the figures, the index, and the bibliography, and Willi Schenk for the proof idea of Lemma B.1.2. I record my gratitude to the publishers, especially to the series editor Niels Jacob for his interest and encouragement. Last, but not least, I would like to thank my wife, Evelyne Sasvári, for her support and understanding.

Dresden, November 2012
Zoltán Sasvári
Contents

Preface

1 Characteristic functions
1.1 Basic properties
1.2 Differentiability
1.3 Inversion theorems
1.4 Basic properties of positive definite functions
1.5 Further properties of positive definite functions on $\mathbb{R}^d$
1.6 Lévy’s continuity theorem
1.7 The theorems of Bochner and Herglotz
1.8 Fourier transformation on $\mathbb{R}^d$
1.9 Fourier transformation on discrete commutative groups
1.10 Basic properties of Gaussian distributions
1.11 Some inequalities

2 Correlation functions
2.1 Random fields
2.2 Correlation functions of second order random fields
2.3 Continuity and differentiability
2.4 Integration with respect to complex measures
2.5 The Karhunen–Loève decomposition
2.6 Integration with respect to orthogonal random measures
2.7 The theorem of Karhunen
2.8 Stationary fields
2.9 Spectral representation of stationary fields
2.10 Unitary representations
2.11 Unitary representations and positive definite functions

3 Special properties
3.1 Strict positive definiteness
3.2 Infinitely differentiable and rapidly decreasing functions
3.3 Analytic characteristic functions of one variable
3.4 Holomorphic $L^2$ Fourier transforms
3.5 Further properties of Gaussian distributions
3.6 Fourier transformation of radial measures and functions
3.7 Radial characteristic functions
3.8 Schoenberg’s theorems on radial characteristic functions
3.9 Convex and completely monotone functions
3.10 Convolution roots with compact support
3.11 Infinitely divisible characteristic functions
3.12 Conditionally positive definite functions

4 The extension problem
4.1 General results
4.2 The cases $\mathbb{R}^d$ and $\mathbb{Z}^d$
4.3 Decomposition of locally defined positive definite functions
4.4 Extension of radial positive definite functions

5 Selected applications
5.1 Limit theorems
5.2 Sums of independent random vectors and the Jessen–Wintner purity law
5.3 Ergodic theorems for stationary fields
5.4 Filtration of discrete stationary fields

Appendix

A Basic notation
A.1 Standard notation
A.2 Multidimensional notation

B Basic analysis
B.1 Miscellaneous results from classical analysis
B.2 Uniform convergence of continuous functions
B.3 Infinite products
B.4 Convex functions
B.5 The Riemann–Stieltjes integral
B.6 Multivariate calculus
B.7 The Lebesgue integral on $\mathbb{R}^d$

C Advanced analysis
C.1 Functions of a complex variable
C.2 Almost periodic functions
C.3 Fourier series
C.4 The Gamma function and the formulae of Stirling and Binet
C.5 Bessel functions
C.6 The Mellin transform
C.7 The Laplace transform
C.8 Existence of continuous logarithms
C.9 Solutions of certain functional equations
C.10 Linear independence of exponential functions

D Functional analysis
D.1 Inner product spaces
D.2 Matrices and kernels
D.3 Hilbert spaces and linear operators
D.4 Convex sets and the theorem of Kreĭn and Milman
D.5 Weak topologies

E Measure theory
E.1 Borel measures, weak and vague convergence
E.2 Convolution of measures and functions

F Probability
F.1 Basic notions
F.2 Convergence of random vectors
F.3 Products of probability spaces
F.4 Conditional expectation

Bibliography

Index
Chapter 1
Characteristic functions
In this very first chapter we prove several classical results due to, among others, Herglotz, Bochner, Lévy, and Plancherel. We start with basic properties of characteristic functions and continue with the investigation of positive definite functions. As a byproduct, we obtain basic results on the Fourier transformation which will be needed in subsequent chapters. Before starting, we suggest taking a short look at Appendix A, which contains the basic notation used throughout the book.
1.1 Basic properties

This is probably the easiest section of the book. We first define characteristic functions and then show several simple properties. All the proofs are short and not difficult. In spite of their simplicity, these properties are extremely useful and frequently applied.

Definition 1.1.1. Let $X$ be a $d$-dimensional random vector and denote by $\mu$ the distribution of $X$. The characteristic function $f = f_X$ of $X$ is defined by
\[
f(t) := E\bigl(e^{i(t,X)}\bigr) = \int_{\mathbb{R}^d} e^{i(t,x)}\, d\mu(x), \qquad t \in \mathbb{R}^d .
\]

The existence of the expectation above is guaranteed by the fact that the complex-valued random variable $e^{i(t,X)}$ is bounded for each $t \in \mathbb{R}^d$. Sometimes we call $f$ the characteristic function of the distribution $\mu$, in which case we also write $f_\mu$. As we will see in Section 1.3, every distribution is uniquely determined by its characteristic function. If $X$ has a density $p$, then
\[
f(t) = \int_{\mathbb{R}^d} e^{i(t,x)}\, p(x)\, dx, \qquad t \in \mathbb{R}^d .
\]

If $X$ is discrete with values $x_1, x_2, \dots$ and corresponding probabilities $p_1, p_2, \dots$, then
\[
f(t) = \sum_j p_j\, e^{i(t,x_j)}, \qquad t \in \mathbb{R}^d .
\]
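Especially, the functions
\[
t \mapsto e^{i(t,x)} \qquad\text{and}\qquad t \mapsto \cos((t,x)) = \tfrac{1}{2}\, e^{i(t,x)} + \tfrac{1}{2}\, e^{-i(t,x)}
\]
are characteristic functions for every $x \in \mathbb{R}^d$. As a further example, assume that $X$ is uniformly distributed on the parallelepiped
\[
K = [a_1, b_1] \times \cdots \times [a_d, b_d] \subset \mathbb{R}^d,
\]
where $a_j < b_j$. Writing $\lambda(K) = \prod_{j=1}^d (b_j - a_j)$ for the volume of $K$, we then have
\[
f_X(t) = \frac{1}{\lambda(K)} \int_K e^{i(t,x)}\, dx
= \frac{1}{\lambda(K)} \int_K \prod_{j=1}^d e^{i t_j x_j}\, dx
= \frac{1}{\lambda(K)} \prod_{j=1}^d \int_{a_j}^{b_j} e^{i t_j x_j}\, dx_j
= \prod_{j=1}^d \frac{e^{i b_j t_j} - e^{i a_j t_j}}{i (b_j - a_j) t_j},
\]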
where the factors of the last product are equal to 1 (by definition) if $t_j = 0$.

In the last example, the characteristic function $f_X$ is real-valued whenever $a_j = -b_j$ for all $j$. We will show later (cf. Theorem 1.3.13) that $f_X$ is real-valued if and only if $X$ and $-X$ have the same distribution, i.e., if $\mu$ is symmetric: $\mu = \check{\mu}$. The proof of the “if part” is very simple, but for the “only if part” we need the fact that every distribution is uniquely determined by its characteristic function.

Theorem 1.1.2. Every characteristic function $f$ on $\mathbb{R}^d$ has the following properties:
(i) $f(0) = 1$;
(ii) $|f(t)| \le 1$, $t \in \mathbb{R}^d$;
(iii) $f(-t) = \overline{f(t)}$, $t \in \mathbb{R}^d$;
(iv) $f$ is uniformly continuous on $\mathbb{R}^d$;
(v) the inequality
\[
\sum_{i,j=1}^{n} f(t_i - t_j)\, c_i \overline{c_j} \ge 0
\]
holds for every positive integer $n$, for all $t_1, \dots, t_n \in \mathbb{R}^d$, and for all $c_1, \dots, c_n \in \mathbb{C}$.

Proof. The first three properties follow immediately from Definition 1.1.1. To prove (iv) assume that $f$ is the characteristic function of a random vector $X$. Then
\[
|f(t+h) - f(t)| = \bigl| E\bigl[ e^{i(t,X)} \bigl( e^{i(h,X)} - 1 \bigr) \bigr] \bigr| \le E\bigl( \bigl| e^{i(h,X)} - 1 \bigr| \bigr),
\]
where the upper bound is independent of $t$ and, by the dominated convergence theorem, tends to zero as $h \to 0$. This shows that $f$ is uniformly continuous on $\mathbb{R}^d$.

The last statement follows from
\[
\sum_{i,j=1}^{n} f(t_i - t_j)\, c_i \overline{c_j}
= \sum_{i,j=1}^{n} E\bigl( e^{i(t_i - t_j, X)} \bigr)\, c_i \overline{c_j}
= E\Bigl( \sum_{i,j=1}^{n} c_i e^{i(t_i,X)}\, \overline{c_j e^{i(t_j,X)}} \Bigr)
= E\Bigl( \Bigl| \sum_{i=1}^{n} c_i e^{i(t_i,X)} \Bigr|^2 \Bigr) \ge 0 .
\]
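The properties listed in Theorem 1.1.2 are easy to test numerically. The following Python sketch is an illustration added here, not part of the original text. It uses the characteristic function of the uniform distribution on an interval $[a, b]$ (the case $d = 1$ of the parallelepiped example) and evaluates the quadratic form of property (v) for randomly chosen points $t_i$ and coefficients $c_i$. The function names, the interval $[-1, 2]$ and the sample sizes are arbitrary choices made for this illustration.

```python
import cmath
import random


def cf_uniform(t, a=-1.0, b=2.0):
    """Characteristic function of the uniform distribution on [a, b] (d = 1)."""
    if t == 0.0:
        return 1.0 + 0.0j  # the factor equals 1 by definition at t = 0
    return (cmath.exp(1j * b * t) - cmath.exp(1j * a * t)) / (1j * (b - a) * t)


def quadratic_form(f, ts, cs):
    """Return sum_{i,j} f(t_i - t_j) * c_i * conj(c_j), as in property (v)."""
    return sum(
        f(ti - tj) * ci * cj.conjugate()
        for ti, ci in zip(ts, cs)
        for tj, cj in zip(ts, cs)
    )


def min_quadratic_form(f, n=6, trials=200, seed=0):
    """Smallest real part of the quadratic form over random t_i and c_i."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        ts = [rng.uniform(-10.0, 10.0) for _ in range(n)]
        cs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        q = quadratic_form(f, ts, cs)
        # By property (iii) the form is Hermitian, so q is real up to rounding.
        assert abs(q.imag) < 1e-9
        worst = min(worst, q.real)
    return worst


if __name__ == "__main__":
    print("f(0) =", cf_uniform(0.0))
    print("|f(3)| =", abs(cf_uniform(3.0)))
    print("min quadratic form:", min_quadratic_form(cf_uniform))
```

A random search of this kind can only fail to find a counterexample; it does not prove positive definiteness, which is what the computation in the proof above establishes.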
We will see in Section 1.7 that inequality (1.1.2.v) is a characteristic property of characteristic functions. The next theorem is very useful in the study of sums of independent random vectors.

Theorem 1.1.3. If $X$ and $Y$ are independent $d$-dimensional random vectors, then the characteristic function of their sum is
\[
f_{X+Y} = f_X \cdot f_Y .
\]

Proof. Let $t \in \mathbb{R}^d$ be arbitrary. Since $X$ and $Y$ are independent, the complex-valued random variables $e^{i(t,X)}$ and $e^{i(t,Y)}$ are independent as well. Thus,
\[
f_{X+Y}(t) = E\bigl(e^{i(t,X+Y)}\bigr) = E\bigl(e^{i(t,X)}\, e^{i(t,Y)}\bigr) = E\bigl(e^{i(t,X)}\bigr) \cdot E\bigl(e^{i(t,Y)}\bigr) = f_X(t) \cdot f_Y(t) .
\]

For arbitrary distributions $\mu$ and $\nu$ on $\mathbb{R}^d$ there exist independent $d$-dimensional random vectors $X$ and $Y$ such that $\mu$ and $\nu$ are the distributions of $X$ and $Y$, respectively.¹ From this and Theorem 1.1.3 we obtain:

Corollary 1.1.4. The product of two characteristic functions of $d$ variables is again a characteristic function.

Noting that $\overline{f}$ is the characteristic function of $-X$, we obtain the following corollary.

Corollary 1.1.5. If $X$ and $Y$ are independent identically distributed random vectors with characteristic function $f$, then $|f|^2$ is the characteristic function of $X - Y$.

Next we present further simple methods to construct characteristic functions from given ones.

¹ Note that the distribution of $X + Y$ is $\mu * \nu$, so that the equation in Theorem 1.1.3 can also be written as $f_{\mu * \nu} = f_\mu \cdot f_\nu$.
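For discrete distributions, Theorem 1.1.3 and Corollary 1.1.5 can be verified directly, since the distribution of a sum of independent variables is the convolution of the two probability mass functions. The sketch below is an added illustration under that assumption. The dictionaries `PMF_X` and `PMF_Y` and the helper names are hypothetical and chosen only for this example.

```python
import cmath
from collections import defaultdict

# Two arbitrary probability mass functions on the real line (value -> probability).
PMF_X = {0.0: 0.2, 1.0: 0.5, 3.0: 0.3}
PMF_Y = {-1.0: 0.6, 2.0: 0.4}


def cf(pmf, t):
    """Characteristic function of a discrete distribution: sum_j p_j e^{i t x_j}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())


def pmf_of_sum(pmf_x, pmf_y, sign=1):
    """Distribution of X + sign * Y for independent X and Y (a convolution)."""
    out = defaultdict(float)
    for x, p in pmf_x.items():
        for y, q in pmf_y.items():
            out[x + sign * y] += p * q
    return dict(out)


if __name__ == "__main__":
    t = 0.9
    # Theorem 1.1.3: the characteristic function of X + Y is the product.
    print(cf(pmf_of_sum(PMF_X, PMF_Y), t), cf(PMF_X, t) * cf(PMF_Y, t))
    # Corollary 1.1.5: for an independent copy X' of X, X - X' has
    # characteristic function |f|^2.
    print(cf(pmf_of_sum(PMF_X, PMF_X, sign=-1), t), abs(cf(PMF_X, t)) ** 2)
```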
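Theorem 1.1.6. Let $R$ be a random variable and $X$ be a $d$-dimensional random vector. If $R$ and $X$ are independent, then²
\[
f_{RX}(t) = \int_{-\infty}^{\infty} f_X(st)\, d\mu_R(s), \qquad t \in \mathbb{R}^d .
\]

Proof. Applying equation (F.1.3) we obtain
\[
\begin{aligned}
f_{RX}(t) = E\bigl(e^{i(t,RX)}\bigr) = E\bigl(e^{iR(t,X)}\bigr)
&= \int_{-\infty}^{\infty} e^{is}\, d\mu_{R(t,X)}(s) \\
&= \int_{-\infty}^{\infty} e^{is}\, d\bigl(\mu_R \star \mu_{(t,X)}\bigr)(s) \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{isu}\, d\mu_{(t,X)}(u)\, d\mu_R(s) \\
&= \int_{-\infty}^{\infty} \int_{\mathbb{R}^d} e^{is(t,x)}\, d\mu_X(x)\, d\mu_R(s) \\
&= \int_{-\infty}^{\infty} f_X(st)\, d\mu_R(s) .
\end{aligned}
\]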
Theorem 1.1.7. Let $X$ be a $d$-dimensional random vector. Then the equation
\[
f_{AX+b}(t) = e^{i(t,b)}\, f_X(A^T t), \qquad t \in \mathbb{R}^n,
\]
holds for every linear mapping $A \colon \mathbb{R}^d \to \mathbb{R}^n$ and $b \in \mathbb{R}^n$. In particular, if $X = (X_1, X_2)$ where $X_1$ and $X_2$ are random variables, then
\[
f_{a_1 X_1 + a_2 X_2}(t) = f_X(a_1 t, a_2 t), \qquad a_1, a_2, t \in \mathbb{R} .
\]

Proof. We have
\[
f_{AX+b}(t) = E\bigl[e^{i(t,AX+b)}\bigr] = E\bigl[e^{i(t,b)}\, e^{i(A^T t, X)}\bigr] = e^{i(t,b)}\, f_X(A^T t) .
\]
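The transformation rule of Theorem 1.1.7 can be illustrated with a random vector taking finitely many values. The sketch below is an added illustration. The support points, probabilities, matrix $A$ and vector $b$ are arbitrary assumptions. It computes the characteristic function of $AX + b$ once directly from the transformed support points and once from the formula $e^{i(t,b)} f_X(A^T t)$.

```python
import cmath


def dot(u, v):
    """Euclidean inner product (u, v)."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def cf(points, probs, t):
    """Characteristic function of a discrete random vector."""
    return sum(p * cmath.exp(1j * dot(t, x)) for x, p in zip(points, probs))


def cf_affine_direct(points, probs, A, b, t):
    """Characteristic function of AX + b, computed from the transformed support."""
    images = [[y + bi for y, bi in zip(mat_vec(A, x), b)] for x in points]
    return cf(images, probs, t)


def cf_affine_formula(points, probs, A, b, t):
    """Right-hand side of Theorem 1.1.7: e^{i(t,b)} f_X(A^T t)."""
    return cmath.exp(1j * dot(t, b)) * cf(points, probs, mat_vec(transpose(A), t))


# Example data (d = 2, n = 3); all values are arbitrary.
POINTS = [(0.0, 1.0), (1.0, -1.0), (2.0, 0.5)]
PROBS = [0.2, 0.5, 0.3]
A = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
B = [0.5, -1.0, 2.0]

if __name__ == "__main__":
    t = [0.3, -0.7, 1.1]
    print(cf_affine_direct(POINTS, PROBS, A, B, t))
    print(cf_affine_formula(POINTS, PROBS, A, B, t))
```

With the $1 \times 2$ matrix $A = (a_1, a_2)$ and $b = 0$, the same functions reproduce the special case $f_{a_1 X_1 + a_2 X_2}(t) = f_X(a_1 t, a_2 t)$.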
Theorem 1.1.8. Let $X_1$ and $X_2$ be independent $d_1$- and $d_2$-dimensional random vectors, respectively, and write $d = d_1 + d_2$. Then the characteristic function of the $d$-dimensional random vector $X = (X_1, X_2)$ is given by
\[
f_X(t) = f_{X_1}(t_1) \cdot f_{X_2}(t_2),
\]
where $t_1 \in \mathbb{R}^{d_1}$, $t_2 \in \mathbb{R}^{d_2}$, and $t = (t_1, t_2) \in \mathbb{R}^d$.³

² See also Lemma 1.5.7.
³ See also Theorem 1.3.10.

Proof. The theorem follows from
\[
f_X(t) = E\bigl(e^{i(t,X)}\bigr) = E\bigl(e^{i(t_1,X_1)}\, e^{i(t_2,X_2)}\bigr)
\]
using the independence of $X_1$ and $X_2$.

Applying the same argument as in the proof of Corollary 1.1.4, we obtain the following corollary.

Corollary 1.1.9. Let $f_i$ be a characteristic function of $d_i$ variables $(i = 1, 2)$. Then the function $f$ defined by
\[
f(t) = f_1(t_1) \cdot f_2(t_2),
\]
where $t_1 \in \mathbb{R}^{d_1}$, $t_2 \in \mathbb{R}^{d_2}$, and $t = (t_1, t_2) \in \mathbb{R}^d$, is a characteristic function of $d = d_1 + d_2$ variables.

Remark 1.1.10. It is a useful fact that the characteristic function $f_X$ of a random vector $X = (X_1, \dots, X_d)$ is determined by the characteristic functions $f_{L_y}$ of all linear combinations
\[
L_y = y_1 X_1 + \cdots + y_d X_d = (y, X), \qquad y \in \mathbb{R}^d,
\]
and vice versa. Indeed,
\[
f_{L_y}(s) = E\bigl(e^{isL_y}\bigr) = E\bigl(e^{is(y,X)}\bigr) = f_X(s\,y), \qquad y \in \mathbb{R}^d,\ s \in \mathbb{R}. \tag{1}
\]
In particular, $f_X(y) = f_{L_y}(1)$. From (1) we also see that $s \mapsto f_X(s\,y)$ is a characteristic function of one variable for all $y \in \mathbb{R}^d$.

Remark 1.1.11. Let $X = (X_1, \dots, X_d)$ be a random vector and $i_1, \dots, i_k$ be integers such that $1 \le i_1 < \cdots < i_k \le d$. It follows immediately from Definition 1.1.1 that the characteristic function of the random vector $(X_{i_1}, \dots, X_{i_k})$ can be obtained by setting $t_i = 0$ for $i \notin \{i_1, \dots, i_k\}$ in $f_X(t_1, \dots, t_d)$. The distribution of $(X_{i_1}, \dots, X_{i_k})$ is usually called a marginal distribution.

Lemma 1.1.12. Let $\mu$ be a distribution on $\mathbb{R}^d$. The real part
\[
\operatorname{Re} f(t) = \int_{\mathbb{R}^d} \cos((t,x))\, d\mu(x), \qquad t \in \mathbb{R}^d,
\]
of its characteristic function $f$ is the characteristic function of the distribution $(\mu + \check{\mu})/2$.

Proof. The statement follows immediately from the fact that
\[
\int h(t)\, d\check{\mu}(t) = \int h(-t)\, d\mu(t)
\]
for every bounded continuous function $h$ on $\mathbb{R}^d$.
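Lemma 1.1.12 has a concrete meaning for discrete distributions: the reflected distribution $\check{\mu}$ carries the mass of each point $x$ at $-x$, and the mixture $(\mu + \check{\mu})/2$ has characteristic function $\operatorname{Re} f$. The Python sketch below is an added illustration of this. The distribution `PMF` on $\mathbb{R}^2$ and the helper names are assumptions made for the example.

```python
import cmath
from collections import defaultdict

# An arbitrary, non-symmetric distribution on R^2 (point -> probability).
PMF = {(1.0, 0.0): 0.1, (0.0, 2.0): 0.4, (-1.0, 1.0): 0.3, (2.0, -1.0): 0.2}


def cf(pmf, t):
    """Characteristic function of a discrete distribution on R^d."""
    return sum(
        p * cmath.exp(1j * sum(ti * xi for ti, xi in zip(t, x)))
        for x, p in pmf.items()
    )


def symmetrize(pmf):
    """Return (mu + mu_check) / 2, where mu_check is mu reflected at the origin."""
    out = defaultdict(float)
    for x, p in pmf.items():
        out[x] += p / 2
        out[tuple(-xi for xi in x)] += p / 2
    return dict(out)


if __name__ == "__main__":
    t = (0.7, -1.3)
    print("Re f(t):", cf(PMF, t).real)
    print("characteristic function of the symmetrization:", cf(symmetrize(PMF), t))
```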
In the rest of this section we present some simple examples of characteristic functions of one variable. These functions are shown in Figures 1.1–1.5.

Example 1.1.13. (a) The exponential distribution with parameter $\alpha > 0$ has the density function $x \mapsto \alpha\, 1_{[0,\infty)}(x)\, e^{-\alpha x}$, $x \in \mathbb{R}$, and its characteristic function is given by
\[
\int_0^{\infty} \alpha e^{-\alpha x} e^{itx}\, dx = \alpha\, \frac{e^{(it-\alpha)x}}{it - \alpha} \Big|_{x=0}^{x=\infty} = \frac{\alpha}{\alpha - it}
\]
(see Figure 1.1).

(b) The Laplace distribution with parameter $\beta > 0$ has the density function $x \mapsto \frac{1}{2\beta}\, e^{-|x|/\beta}$, $x \in \mathbb{R}$. Its characteristic function is given by
\[
\begin{aligned}
\frac{1}{2\beta} \int_{-\infty}^{\infty} e^{-|x|/\beta}\, e^{itx}\, dx
&= \frac{1}{2\beta} \int_0^{\infty} e^{(-\frac{1}{\beta} + it)x}\, dx + \frac{1}{2\beta} \int_{-\infty}^{0} e^{(\frac{1}{\beta} + it)x}\, dx \\
&= \frac{1}{2\beta} \Bigl( -\frac{1}{it - \frac{1}{\beta}} + \frac{1}{it + \frac{1}{\beta}} \Bigr) = \frac{1}{1 + \beta^2 t^2}
\end{aligned}
\]
(see Figure 1.2). Using Corollary 1.1.5 and the last two examples we see that the difference of two independent identically distributed exponential random variables is Laplace distributed.
Figure 1.1. The function $t \mapsto \frac{1}{1 - it}$, $t \in [-10, 10]$, from Example 1.1.13(a) (plotted in the complex plane, axes Re and Im).

Figure 1.2. The function $t \mapsto \frac{1}{1 + t^2}$ from Example 1.1.13(b).
(c) To compute the characteristic function of the binomial distribution
\[
\mu = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k}\, \delta_k
\]
with parameters $n \ge 1$ and $p \in [0, 1]$, let $X_1, \dots, X_n$ be independent random variables such that $P(X_j = 0) = 1 - p$, $P(X_j = 1) = p$ for all $j$. Then $\mu$ is the distribution of $X = X_1 + \cdots + X_n$. Applying Theorem 1.1.3 we obtain
\[
f_\mu(t) = f_X(t) = [f_{X_1}(t)]^n = \bigl[1 - p + p\, e^{it}\bigr]^n
\]
(see Figure 1.3 and Figure 1.4).

(d) The characteristic function of the Poisson distribution
\[
e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}\, \delta_k
\]
with parameter $\lambda > 0$ is
\[
e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}\, e^{itk} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^{it})^k}{k!} = e^{\lambda(e^{it} - 1)}
\]
(see Figure 1.5).
Figure 1.3. The function $t \mapsto [1 - p + p\, e^{it}]^n$ from Example 1.1.13(c) with $p = 1/2$ and $n = 4$ (plotted in the complex plane, axes Re and Im).

Figure 1.4. The function $t \mapsto [1 - p + p\, e^{it}]^n$ from Example 1.1.13(c) with $p = 0.7$ and $n = 4$ (plotted in the complex plane, axes Re and Im).

Figure 1.5. The function $t \mapsto \exp(e^{it} - 1)$ from Example 1.1.13(d) (plotted in the complex plane, axes Re and Im).
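The closed forms of Example 1.1.13 can be cross-checked against the defining sums and integrals. The following Python sketch is an added illustration, not the author's computation. It compares the binomial and Poisson formulas with the sums over the probability mass function, and the exponential and Laplace formulas with a simple midpoint-rule approximation of the integral in Definition 1.1.1. The truncation limits, the number of quadrature steps and the number of Poisson terms are assumptions chosen so that the neglected parts are negligible for the parameters used.

```python
import cmath
import math


def cf_exponential(t, alpha):
    return alpha / (alpha - 1j * t)


def cf_laplace(t, beta):
    return 1.0 / (1.0 + (beta * t) ** 2)


def cf_binomial(t, n, p):
    return (1.0 - p + p * cmath.exp(1j * t)) ** n


def cf_poisson(t, lam):
    return cmath.exp(lam * (cmath.exp(1j * t) - 1.0))


def cf_from_pmf(t, pmf):
    """Sum p_k e^{itk} over a list of (k, p_k) pairs."""
    return sum(p * cmath.exp(1j * t * k) for k, p in pmf)


def binomial_pmf(n, p):
    return [(k, math.comb(n, k) * p**k * (1.0 - p) ** (n - k)) for k in range(n + 1)]


def poisson_pmf(lam, terms=80):
    """First `terms` Poisson probabilities, built by the recursion p_{k+1} = p_k * lam / (k+1)."""
    pmf, weight = [], math.exp(-lam)
    for k in range(terms):
        pmf.append((k, weight))
        weight *= lam / (k + 1)
    return pmf


def cf_from_density(t, density, lo, hi, steps=40000):
    """Midpoint-rule approximation of the integral of e^{itx} p(x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0j
    for m in range(steps):
        x = lo + (m + 0.5) * h
        total += density(x) * cmath.exp(1j * t * x)
    return h * total


if __name__ == "__main__":
    t = 1.3
    print(cf_binomial(t, 4, 0.7), cf_from_pmf(t, binomial_pmf(4, 0.7)))
    print(cf_poisson(t, 2.5), cf_from_pmf(t, poisson_pmf(2.5)))
    print(cf_exponential(t, 2.0),
          cf_from_density(t, lambda x: 2.0 * math.exp(-2.0 * x), 0.0, 20.0))
    # Laplace with beta = 1/alpha equals |f|^2 for the exponential distribution.
    print(cf_laplace(t, 0.5), abs(cf_exponential(t, 2.0)) ** 2)
```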
1.2 Differentiability

The characteristic functions in Example 1.1.13 are infinitely often differentiable (even analytic) on $\mathbb{R}$. We will show in Example 1.3.7 that the characteristic function of the Cauchy distribution with parameter $\alpha = 1$ is given by $f(t) = e^{-|t|}$. This function is not differentiable at 0. Example B.1.19 shows that characteristic functions may be nowhere differentiable.

Throughout this section, $\mu$ denotes a probability distribution on $\mathbb{R}^d$ with characteristic function $f$. We will see that differentiability of $f$ is closely related to the existence of moments of $\mu$.

Theorem 1.2.1. Let $\alpha \in \mathbb{N}_0^d$ be such that the moment $M_\alpha$ of $\mu$ exists. Then the partial derivative $D^\alpha f$ exists and we have
(i) $D^\alpha f(t) = i^{|\alpha|} \int_{\mathbb{R}^d} x^\alpha e^{i(t,x)}\, d\mu(x)$, $t \in \mathbb{R}^d$;
(ii) $D^\alpha f(0) = i^{|\alpha|} M_\alpha$;
(iii) $D^\alpha f$ is uniformly continuous and $|D^\alpha f|$ is bounded by the absolute moment $A_\alpha$ of $\mu$.

Proof. Suppose that $\alpha_j > 0$ for some $j$ and let $e_1, \dots, e_d$ be the standard basis of $\mathbb{R}^d$. For $h \in \mathbb{R} \setminus \{0\}$ we have
\[
\frac{f(t + h e_j) - f(t)}{h} = \int_{\mathbb{R}^d} x_j\, e^{i(t,x)}\, \frac{e^{ihx_j} - 1}{h x_j}\, d\mu(x) .
\]
Theorem B.1.5 with $n = 0$ shows that the integrand is dominated by $|x_j|$. Taking the limit $h \to 0$ and applying Lebesgue’s theorem on dominated convergence we obtain
\[
\frac{\partial}{\partial t_j} f(t) = i \int_{\mathbb{R}^d} x_j\, e^{i(t,x)}\, d\mu(x), \qquad t \in \mathbb{R}^d .
\]
Repeating this argument we obtain (i). Equation (ii) is obtained from (i) by setting $t = 0$. From (i) we infer that
\[
|D^\alpha f(t)| \le \int_{\mathbb{R}^d} |x^\alpha|\, d\mu(x) = A_\alpha .
\]
The uniform continuity of $D^\alpha f$ follows from
\[
|D^\alpha f(t+h) - D^\alpha f(t)| = \Bigl| \int_{\mathbb{R}^d} x^\alpha \bigl(e^{i(h,x)} - 1\bigr) e^{i(t,x)}\, d\mu(x) \Bigr| \le \int_{\mathbb{R}^d} |x^\alpha|\, \bigl|e^{i(h,x)} - 1\bigr|\, d\mu(x)
\]
by dominated convergence.

The previous theorem has the following immediate consequences.

Corollary 1.2.2. Let $X = (X_1, \dots, X_d)$ be a $d$-dimensional random vector with characteristic function $f$. If $X_j$ is square integrable for all $j$, then the covariance matrix $\operatorname{cov}(X) = (c_{jk})_{j,k=1}^d$ is given by
\[
c_{jk} = -\frac{\partial^2}{\partial t_j\, \partial t_k} f(0) + \frac{\partial}{\partial t_j} f(0) \cdot \frac{\partial}{\partial t_k} f(0) .
\]

Corollary 1.2.3. Assume that $\mu \ne \delta_0$ and that for some $\alpha \in \mathbb{N}_0^d$ the moment $M_{2\alpha}$ of $\mu$ exists. Then the function $g$ defined by
\[
g(t) = \frac{(-1)^{|\alpha|}}{M_{2\alpha}}\, D^{2\alpha} f(t), \qquad t \in \mathbb{R}^d,
\]
is a characteristic function. The corresponding distribution $\nu$ is given by
\[
d\nu(x) = \frac{1}{M_{2\alpha}}\, x^{2\alpha}\, d\mu(x) .
\]
Example 1.2.4. As an application of Theorem 1.2.1 we compute the mean and the variance of the distributions from Example 1.1.13. Let $f$ denote the characteristic function of a random variable $X$ with finite variance. Then, in view of (1.2.1.ii),
\[
E(X) = -i f'(0), \qquad E(X^2) = -f''(0),
\]
and hence
\[
\operatorname{Var}(X) = E(X^2) - (E X)^2 = -f''(0) + f'(0)^2 .
\]

(a) If $X$ is exponentially distributed with parameter $\alpha > 0$ then we have
\[
f'(t) = \frac{d}{dt}\, \frac{\alpha}{\alpha - it} = \frac{\alpha i}{(\alpha - it)^2}, \qquad f''(t) = -\frac{2\alpha}{(\alpha - it)^3},
\]
and hence $E(X) = \frac{1}{\alpha}$, $E(X^2) = \frac{2}{\alpha^2}$ and $\operatorname{Var}(X) = \frac{1}{\alpha^2}$.

(b) In the case of the Laplace distribution with parameter $\beta > 0$ we have
\[
f'(t) = \frac{d}{dt}\, \frac{1}{1 + \beta^2 t^2} = -\frac{2\beta^2 t}{(1 + \beta^2 t^2)^2}, \qquad f''(t) = \frac{2\beta^2 (3\beta^2 t^2 - 1)}{(1 + \beta^2 t^2)^3} .
\]
Thus, $E(X) = 0$ and $E(X^2) = \operatorname{Var}(X) = 2\beta^2$.

(c) Assume that $X$ has the binomial distribution with parameters $n$ and $p$. Writing $q := 1 - p$ we have
\[
\begin{aligned}
f'(t) &= \frac{d}{dt}\, (q + p e^{it})^n = i n p\, (q + p e^{it})^{n-1} e^{it}, \\
f''(t) &= -n p\, (q + p e^{it})^{n-2} (q + p n e^{it})\, e^{it} .
\end{aligned}
\]
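Consequently, $E(X) = np$, $E(X^2) = np(np + q)$ and $\operatorname{Var}(X) = npq$.

(d) If $X$ has the Poisson distribution with parameter $\lambda > 0$, then
\[
f'(t) = \frac{d}{dt}\, e^{\lambda(e^{it} - 1)} = i \lambda e^{it}\, e^{\lambda(e^{it} - 1)}, \qquad f''(t) = -\lambda e^{it} (1 + \lambda e^{it})\, e^{\lambda(e^{it} - 1)},
\]
and therefore $E(X) = \lambda$, $E(X^2) = \lambda + \lambda^2$ and $\operatorname{Var}(X) = \lambda$.

The next theorem follows immediately from Theorem 1.2.1 and Theorem B.6.1 with $a = 0$.

Theorem 1.2.5. If for some positive integer $n$ all moments $M_\alpha$, $\alpha \in \mathbb{N}_0^d$, $|\alpha| \le n$, of $\mu$ exist, then
\[
f(t) = \sum_{|\alpha| \le n} \frac{M_\alpha}{\alpha!}\, (it)^\alpha + \sum_{|\alpha| = n} R_\alpha(t)\, t^\alpha, \qquad t \in \mathbb{R}^d,
\]
where
\[
R_\alpha(t) = \frac{n}{\alpha!} \int_0^1 \bigl( D^\alpha f(xt) - D^\alpha f(0) \bigr) (1 - x)^{n-1}\, dx .
\]
Moreover, $|R_\alpha(t)| \le \frac{2 A_\alpha}{\alpha!}$ and $\lim_{t \to 0} R_\alpha(t) = 0$.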
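The relations $E(X) = -i f'(0)$ and $E(X^2) = -f''(0)$ used in Example 1.2.4 also lend themselves to a quick numerical check. The Python sketch below is an added illustration. It replaces the derivatives at 0 by central difference quotients and applies them to the four characteristic functions of Example 1.1.13. The step size `h` and the parameter values are arbitrary choices, and the results agree with the closed forms only up to the discretization error of the difference quotients.

```python
import cmath


def mean_and_variance(f, h=1e-3):
    """Approximate E(X) = -i f'(0) and Var(X) = -f''(0) + f'(0)^2 by central differences."""
    f0, f_plus, f_minus = f(0.0), f(h), f(-h)
    d1 = (f_plus - f_minus) / (2.0 * h)          # approximates f'(0)
    d2 = (f_plus - 2.0 * f0 + f_minus) / (h * h)  # approximates f''(0)
    mean = (-1j * d1).real
    second_moment = (-d2).real
    return mean, second_moment - mean**2


def cf_exponential(t, alpha=2.0):
    return alpha / (alpha - 1j * t)


def cf_laplace(t, beta=0.5):
    return 1.0 / (1.0 + (beta * t) ** 2)


def cf_binomial(t, n=4, p=0.7):
    return (1.0 - p + p * cmath.exp(1j * t)) ** n


def cf_poisson(t, lam=1.5):
    return cmath.exp(lam * (cmath.exp(1j * t) - 1.0))


if __name__ == "__main__":
    # Closed forms from Example 1.2.4 for comparison:
    # exponential (1/alpha, 1/alpha^2), Laplace (0, 2 beta^2),
    # binomial (np, npq), Poisson (lambda, lambda).
    for name, f in [("exponential", cf_exponential), ("Laplace", cf_laplace),
                    ("binomial", cf_binomial), ("Poisson", cf_poisson)]:
        print(name, mean_and_variance(f))
```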
Sometimes it is more convenient to use the following expansion which is obtained by using Theorem B.6.2.
Theorem 1.2.6. Under the conditions of the previous theorem there exist $\theta, \eta \in (0, 1)$ depending on $t$ such that
\[
f(t) = \sum_{|\alpha| \le n-1} \frac{M_\alpha}{\alpha!}\, (it)^\alpha + \sum_{|\alpha| = n} \frac{\operatorname{Re} D^\alpha f(\theta t) + i \operatorname{Im} D^\alpha f(\eta t)}{\alpha!}\, t^\alpha .
\]

The next special case is frequently used.

Corollary 1.2.7. Let $X = (X_1, \dots, X_d)$ be a random vector such that $E(X_j^2) < \infty$ for all $j$. For the characteristic function $f$ of $X$ we then have
\[
f(t) = 1 + i \sum_{j=1}^{d} E(X_j)\, t_j - \frac{1}{2} \sum_{i,j=1}^{d} \bigl[ E(X_i X_j) + R_{i,j}(t) \bigr]\, t_i t_j, \qquad t \in \mathbb{R}^d,
\]
where $|R_{i,j}(t)| \le 2 E(|X_i X_j|)$ and $\lim_{t \to 0} R_{i,j}(t) = 0$. Moreover, there exist $\theta, \eta \in (0, 1)$ depending on $t$ such that
\[
f(t) = 1 + i \sum_{j=1}^{d} E(X_j)\, t_j + \frac{1}{2} \sum_{i,j=1}^{d} \bigl[ \operatorname{Re} f_{t_i, t_j}(\theta t) + i \operatorname{Im} f_{t_i, t_j}(\eta t) \bigr]\, t_i t_j .
\]
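To see the second-order expansion of Corollary 1.2.7 at work, one can take a random vector with finitely many values, for which all expectations are finite sums. The Python sketch below is an added illustration. The support points, probabilities and the direction along which $t \to 0$ are arbitrary assumptions. It compares $f(t)$ with the approximation $1 + i \sum_j E(X_j) t_j - \frac{1}{2} \sum_{i,j} E(X_i X_j) t_i t_j$ and checks that the error, divided by $|t|^2$, decreases as $t \to 0$ and stays below the bound $\sum_{i,j} E(|X_i X_j|)\, |t_i t_j|$ implied by $|R_{i,j}(t)| \le 2 E(|X_i X_j|)$.

```python
import cmath

# A random vector in R^2 with four possible values (arbitrary example data).
POINTS = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0), (2.0, -1.0)]
PROBS = [0.1, 0.4, 0.3, 0.2]


def expect(g):
    """Expectation of g(X) for the discrete random vector above."""
    return sum(p * g(x) for x, p in zip(POINTS, PROBS))


def cf(t):
    return expect(lambda x: cmath.exp(1j * (t[0] * x[0] + t[1] * x[1])))


def second_order_approx(t):
    """1 + i sum_j E(X_j) t_j - (1/2) sum_{i,j} E(X_i X_j) t_i t_j."""
    d = len(t)
    first = sum(expect(lambda x, j=j: x[j]) * t[j] for j in range(d))
    second = sum(
        expect(lambda x, i=i, j=j: x[i] * x[j]) * t[i] * t[j]
        for i in range(d)
        for j in range(d)
    )
    return 1.0 + 1j * first - 0.5 * second


def remainder_bound(t):
    """sum_{i,j} E|X_i X_j| |t_i t_j|, an upper bound for the approximation error."""
    d = len(t)
    return sum(
        expect(lambda x, i=i, j=j: abs(x[i] * x[j])) * abs(t[i] * t[j])
        for i in range(d)
        for j in range(d)
    )


def relative_error(s, direction=(0.6, -0.8)):
    """Approximation error at t = s * direction, divided by |t|^2 = s^2."""
    t = (s * direction[0], s * direction[1])
    return abs(cf(t) - second_order_approx(t)) / (s * s)


if __name__ == "__main__":
    for s in (1e-1, 1e-2, 1e-3):
        print(s, relative_error(s))
```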
Lemma 1.2.8. Let $t \in \mathbb{R}^d$ be such that the inequality
\[
1 - \operatorname{Re} f(s\,t) \le C_t\, s^2, \qquad s \in \mathbb{R}, \tag{1}
\]
holds with some $C_t \ge 0$. Then,
\[
\int_{\mathbb{R}^d} (t, x)^2\, d\mu(x) \le 2\, C_t . \tag{2}
\]
In particular, if (1) holds for all $t \in \mathbb{R}^d$, then all moments of second order of $\mu$ are finite and hence all partial derivatives of order two of $f$ exist.⁴

Proof. Let $s \in \mathbb{R} \setminus \{0\}$, $t \in \mathbb{R}^d$ and $T > 0$ be arbitrary. We then have
\[
C_t \ge \frac{1 - \operatorname{Re} f(s\,t)}{s^2} = \int_{\mathbb{R}^d} \frac{1 - \cos(s(t,x))}{s^2}\, d\mu(x) \ge \int_{-T}^{T} \cdots \int_{-T}^{T} \frac{1 - \cos(s(t,x))}{s^2}\, d\mu(x) .
\]
Using the inequality (B.1.6.2) and Lebesgue’s theorem on dominated convergence we see that the right-hand side tends to
\[
\frac{1}{2} \int_{-T}^{T} \cdots \int_{-T}^{T} (t, x)^2\, d\mu(x)
\]
as $s \to 0$. Since $T > 0$ is arbitrary, we find that (2) holds.

⁴ Compare this result with Theorem 1.11.4.
The second statement is obtained by choosing special $t$’s with $t_j = 1$ and $t_i = 0$, $i \ne j$, in (2).

The existence of $D^\alpha f$ does not imply the existence of the moment $M_\alpha$ in general.⁵ However, the following is true.

Theorem 1.2.9. If for some $\alpha \in \mathbb{N}_0^d \setminus \{0\}$ and for $g = \operatorname{Re} f$ all partial derivatives $D^\beta g$, $\beta < 2\alpha$, exist in an open neighborhood of zero and if $D^{2\alpha} g$ exists at zero, then the moment $M_{2\alpha}$ of $\mu$ exists. Moreover, the partial derivative $D^{2\alpha} f$ exists on all of $\mathbb{R}^d$.

Proof. The second statement follows from the first one using Theorem 1.2.1. To prove the first statement assume first that $|\alpha| = 1$. By our assumption the function
\[
h(s) = h_\alpha(s) = \operatorname{Re} f(s\,\alpha), \qquad s \in \mathbb{R},
\]
is differentiable in a neighborhood of 0 and $h''(0)$ exists. Since $h$ is even, $h'(0) = 0$. Applying (1.1.2.ii) and the mean value theorem we see that
\[
0 \le \frac{1 - h(s)}{s^2} = -\frac{h'(\theta s)}{s} \le -\frac{h'(\theta s)}{\theta s}, \qquad s \ne 0,
\]
where $\theta = \theta(s) \in (0, 1)$. Since $h'(0) = 0$, the right-hand side tends to $-h''(0)$ as $s \to 0$. Consequently, there exists a constant $C = C_\alpha$ such that
\[
\frac{1 - h(s)}{s^2} \le C, \qquad s \ne 0
\]
(note that $|h(s)| \le 1$ for all $s$). Thus, by Lemma 1.2.8, the moment $M_{2\alpha}$ exists. The general case $|\alpha| > 1$ follows from Corollary 1.2.3 using induction on $|\alpha|$.

We continue with a simple application to conditional expectations.

Definition 1.2.10. Let $X_0$ be a random variable with finite expectation and let $X = (X_1, \dots, X_d)$ be a random vector. The random variable $X_0$ is said to have polynomial regression of order $k$ on $X$ if there exists a polynomial
\[
Q(t) = \sum_{\alpha \in \mathbb{N}_0^d,\ |\alpha| \le k} b_\alpha\, t^\alpha, \qquad t \in \mathbb{R}^d, \tag{1}
\]
of order $k$ such that
\[
E(X_0 \mid X) = Q(X) . \tag{2}
\]
If $k = 0, 1$ or $2$ we speak of constant, linear or quadratic regression, respectively.

⁵ See Remark 1.10.6 in [6] for an example.
Whether $X_0$ has polynomial regression on $X$ can be answered in terms of the characteristic function of the random vector $(X_0, X_1, \dots, X_d)$.

Theorem 1.2.11. Let $k \in \mathbb{N}_0$ and let $X_0, X_1, \dots, X_d$ be random variables such that all moments of order at most $k$ of the random vector $Y = (X_0, X_1, \dots, X_d)$ exist. Denote by $g$ the characteristic function of $Y$. Then the equation (1.2.10.2) holds if and only if⁶
\[
-i\, D^{(1,0,\dots,0)} g(0, t) = \sum_{\alpha \in \mathbb{N}_0^d,\ |\alpha| \le k} (-i)^{|\alpha|}\, b_\alpha\, D^{(0,\alpha_1,\dots,\alpha_d)} g(0, t), \qquad t \in \mathbb{R}^d .
\]

Proof. Write $X = (X_1, \dots, X_d)$. By Lemma F.4.2, equation (1.2.10.2) holds if and only if
\[
E\bigl( X_0\, e^{i(t,X)} \bigr) = \sum_{\alpha} b_\alpha\, E\bigl( X^\alpha e^{i(t,X)} \bigr), \qquad t \in \mathbb{R}^d .
\]
Hence, the statement follows immediately from Theorem 1.2.1.

Next we prove an interesting result: If the product of two characteristic functions is $2n$-times differentiable, then so are the factors.⁷

Theorem 1.2.12. Suppose that $f = f_1 f_2$ where $f_1, f_2$ are characteristic functions on $\mathbb{R}$ and let $\mu_1, \mu_2$ be the corresponding distributions. If $f$ is $2n$-times differentiable, then so is $f_1$ and
\[
|f_1^{(k)}(0)| \le 2 \bigl( A_k^{1/k} + |m_2| \bigr)^k, \qquad 1 \le k \le 2n, \tag{1}
\]
where $m_2$ is an arbitrary median⁸ of $\mu_2$ and $A_k$ is the absolute moment of order $k$ of $\mu := \mu_1 * \mu_2$. Analogous statements hold for $f_2$.

⁶ Further results of this type can be found in [33].
⁷ See also Theorem 3.3.9.
⁸ This means that $\mu_2((-\infty, m_2]) \ge \frac{1}{2}$ and $\mu_2([m_2, \infty)) \ge \frac{1}{2}$.

Proof. Note first that $f$ is the characteristic function of $\mu$. Since $f$ is $2n$-times differentiable, the moments $A_k$, $k = 1, \dots, 2n$, exist by Theorem 1.2.9. For every $a \in \mathbb{R}$ we have
\[
\infty > A_k(a) := \int_{\mathbb{R}} |x - a|^k\, d(\mu_1 * \mu_2)(x) = \int_{\mathbb{R}} \int_{\mathbb{R}} |t + s - a|^k\, d\mu_1(t)\, d\mu_2(s) .
\]
If $t$ and $s - a$ have the same sign, then $|t + s - a|^k \ge |t|^k$. Using this inequality and the fact that the integrand above is zero if $s = a$ and $t = 0$, we obtain
\[
\begin{aligned}
A_k(a) &\ge \int_{[a,\infty)} \int_{[0,\infty)} |t|^k\, d\mu_1(t)\, d\mu_2(s) + \int_{(-\infty,a]} \int_{(-\infty,0)} |t|^k\, d\mu_1(t)\, d\mu_2(s) \\
&= \mu_2([a, \infty)) \int_{[0,\infty)} |t|^k\, d\mu_1(t) + \mu_2((-\infty, a]) \int_{(-\infty,0)} |t|^k\, d\mu_1(t) .
\end{aligned}
\]
These inequalities show that if $a = m_2$ is a median of $\mu_2$, then
\[
\int_{-\infty}^{\infty} |t|^k\, d\mu_1(t) \le 2 A_k(m_2), \qquad k = 1, \dots, 2n. \tag{2}
\]
By virtue of Theorem 1.2.1, the function $f_1$ is $2n$-times differentiable. Moreover, by (2) and (1.2.1.iii), we have
\[
|f_1^{(k)}(0)| \le 2 A_k(m_2) . \tag{3}
\]
On the other hand,
\[
A_k(m_2) = \int_{-\infty}^{\infty} |x - m_2|^k\, d\mu(x) \le \int_{-\infty}^{\infty} \sum_{j=0}^{k} \binom{k}{j} |x|^j\, |m_2|^{k-j}\, d\mu(x) = \sum_{j=0}^{k} \binom{k}{j} A_j\, |m_2|^{k-j} .
\]
The last sum is, in view of inequality (F.1.5), majorized by
\[
\sum_{j=0}^{k} \binom{k}{j} A_k^{j/k}\, |m_2|^{k-j} = \bigl( A_k^{1/k} + |m_2| \bigr)^k
\]
which, together with (3), proves the inequality (1). Interchanging the roles of $f_1$ and $f_2$, we obtain the last statement of the theorem.

The next theorem generalizes the differentiability statement of Theorem 1.2.12.⁹

Theorem 1.2.13. Let $\delta > 0$ and $f$ be a $2n$-times differentiable function on the interval $(-\delta, \delta)$ such that $f(-t) = \overline{f(t)}$, $|t| < \delta$. Further, let $f_1, \dots, f_r$ be characteristic functions on $\mathbb{R}$ and $\alpha_1, \dots, \alpha_r$ be positive real numbers such that
\[
\prod_{j=1}^{r} |f_j(t)|^{\alpha_j} = |f(t)|, \qquad -\delta < t < \delta .
\]
Then the $f_j$’s are $2n$-times differentiable on $\mathbb{R}$.

Proof. Let
\[
g_j(t) = f_j(t)\, f_j(-t) = |f_j(t)|^2, \qquad t \in \mathbb{R},
\]
and $g(t) = f(t)\, f(-t) = |f(t)|^2$, $|t| < \delta$. The $g_j$’s are nonnegative characteristic functions; the corresponding distributions will be denoted by $\mu_j$. The function $g$ is even, nonnegative and $2n$-times differentiable. Moreover,
\[
\prod_{j=1}^{r} g_j^{\alpha_j}(t) = g(t), \qquad -\delta < t < \delta . \tag{1}
\]

⁹ See also Theorem 3.3.14.
Chapter 1 Characteristic functions
We prove by induction on $k$ that $g_j$ is $2k$-times differentiable for all $k = 1, \dots, n$. The statement for $f_j$ follows then from Theorem 1.2.12.

Assume first that $k = 1$. Since $g_j(t) \le 1$ we have $g(t) \le g_j^{\alpha_j}(t)$. Using this and the mean value theorem for the function $x \mapsto x^{\alpha_j}$ we see that
\[
  \frac{1 - g(t)}{t^2} \ge \frac{1 - g_j^{\alpha_j}(t)}{t^2}
  = \alpha_j \, s(t)^{\alpha_j - 1} \, \frac{1 - g_j(t)}{t^2},
\]
where $g_j(t) \le s(t) \le 1$. As $g$ is even, $g'(0) = 0$ and hence the left side tends to $-\frac12 g''(0)$ if $t$ tends to zero. On the other hand, $s(t)$ tends to 1 and therefore
\[
  \frac{1 - g_j(t)}{t^2} \le C, \qquad t \ne 0,
\]
for some constant $C$. Lemma 1.2.8 shows that $g_j$ is twice differentiable.

Suppose now that $g_j$ is $2k$-times differentiable for some $k < n$. Then
\[
  g_j^{(2l-1)}(0) = 0, \qquad l = 1, \dots, k; \; j = 1, \dots, r.
\]
We choose $\delta_0 \in (0, \delta]$ such that $g(t) \ne 0$ if $|t| < \delta_0$. Then we have
\[
  \sum_{j=1}^{r} \alpha_j \log \frac{1}{g_j(t)} = h(t), \qquad -\delta_0 < t < \delta_0, \tag{2}
\]
where $h = -\log g$. This equation and Faà di Bruno's formula (B.1.14) with $m = 2k$, $g = \log$ and $f = g_j$ (in the notation of that formula) show that for $t \in (-\delta_0, \delta_0)$ we have
\[
  h^{(2k)}(t) = \sum_{j=1}^{r} \alpha_j \sum_{b} c(b)
  \left( \frac{g_j'(t)}{g_j(t)} \right)^{b_1}
  \left( \frac{g_j''(t)}{g_j(t)} \right)^{b_2} \cdots
  \left( \frac{g_j^{(2k)}(t)}{g_j(t)} \right)^{b_{2k}}, \tag{3}
\]
where the last sum is over all $b \in \mathbb{N}_0^{2k}$ satisfying $b_1 + 2b_2 + \cdots + 2k\,b_{2k} = 2k$ and
\[
  c(b) = \frac{(2k)!\,(-1)^{|b|}\,(|b| - 1)!}{b_1!\, b_2! \cdots b_{2k}!\; (1!)^{b_1} (2!)^{b_2} \cdots ((2k)!)^{b_{2k}}}.
\]
Setting $t = 0$ in (3), subtracting the resulting expression from (3) and using that $c(b) = -1$ if $b = (0, 0, \dots, 1)$, we obtain
\[
  \sum_{j=1}^{r} \alpha_j \left( \frac{g_j^{(2k)}(t)}{g_j(t)} - g_j^{(2k)}(0) \right)
  = -\bigl( h^{(2k)}(t) - h^{(2k)}(0) \bigr)
  + \sum_{j=1}^{r} \alpha_j \sum_{b} c(b) \bigl[ S_{j,b}(t) - S_{j,b}(0) \bigr], \tag{4}
\]
where the last sum is over all $b \in \mathbb{N}_0^{2k}$ satisfying $b_{2k} = 0$, $b_1 + 2b_2 + \cdots + (2k-1)b_{2k-1} = 2k$, and
\[
  S_{j,b}(t) = \left( \frac{g_j'(t)}{g_j(t)} \right)^{b_1}
  \left( \frac{g_j''(t)}{g_j(t)} \right)^{b_2} \cdots
  \left( \frac{g_j^{(2k-1)}(t)}{g_j(t)} \right)^{b_{2k-1}}.
\]
Using that $g_j$ is $2k$-times continuously differentiable and $g_j(0) = 1$, $g_j^{(2l-1)}(0) = 0$, $1 \le l \le k$, we obtain
\[
  \frac{1}{g_j(t)} = 1 + O(t^2), \qquad t \to 0, \tag{5}
\]
and
\[
  g_j^{(2l-1)}(t) = O(t), \qquad g_j^{(2l-2)}(t) = g_j^{(2l-2)}(0) + O(t^2), \qquad t \to 0.
\]
These relations show that $S_{j,b}(t) - S_{j,b}(0) = O(t^2)$ if $b \ne (0, 0, \dots, 1)$. Since $h$ is even and $2n$-times differentiable with $2k + 2 \le 2n$, we have $h^{(2k)}(t) - h^{(2k)}(0) = O(t^2)$, and we conclude that the left-hand side of (4) is $O(t^2)$. Applying this and (5) we get
\[
  \sum_{j=1}^{r} \alpha_j \bigl( g_j^{(2k)}(t) - g_j^{(2k)}(0) \bigr) = O(t^2).
\]
Since $g_j^{(2k)} / g_j^{(2k)}(0)$ is a characteristic function (cf. Corollary 1.2.3), inequality (1.1.2.ii) shows that the terms in this sum have the same sign and hence they are $O(t^2)$. By Lemma 1.2.8 the function $g_j^{(2k)}$ is twice differentiable, completing the induction on $k$.

Remark 1.2.14. If all the numbers $\alpha_k$ are integers or rational numbers, then Theorem 1.2.13 follows easily from Theorem 1.2.12 using the fact that $f_j^{\alpha}$ is a characteristic function for all $\alpha \in \mathbb{N}$. If $g$ is a nonnegative characteristic function and $\alpha$ is not an integer, then $g^{\alpha}$ is not necessarily a characteristic function. To see a simple example, let $g(t) = \cos^2 t$, $t \in \mathbb{R}$. Then $g^{1/2}(t) = |\cos t|$ and $g^{1/2}$ is not a characteristic function since it is infinitely differentiable on $(-\pi/2, \pi/2)$ but it is not differentiable at $\pi/2$ (cf. Theorem 1.2.9).

Remark 1.2.15. The set of all points where a given characteristic function is differentiable can be quite complicated. To see examples, let $b \ge 2$ be an integer and let $\{\varepsilon_n\}_1^\infty$ be a sequence of nonnegative numbers tending to zero. Denote by $D$ the set of all $t \in \mathbb{R}$ where the characteristic function $g$ defined by
\[
  g(t) = C \sum_{n=1}^{\infty} \varepsilon_n b^{-n} \cos b^n t, \qquad t \in \mathbb{R},
\]
is differentiable (the constant $C$ is determined by the condition $g(0) = 1$). Then $g$ is differentiable at 0. Moreover:
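(i) For every open interval $I \subset \mathbb{R}$ the set $D \cap I$ has the same cardinality as $\mathbb{R}$.
(ii) If $\sum_{1}^{\infty} \varepsilon_n < \infty$ then $D = \mathbb{R}$ and $g'$ is continuous.
(iii) If $\sum_{1}^{\infty} \varepsilon_n^2 = \infty$ then $\lambda(D) = 0$.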
This result is due to A. Zygmund, a proof can be found in Section 1.10 of [6].
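The example in Remark 1.2.14 can also be checked through positive definiteness, which every characteristic function satisfies (Theorem 1.1.2). The following sketch is only an illustration: the four points and the alternating coefficients are a hypothetical choice made here, not taken from the text. It evaluates the quadratic form $\sum_{i,j} f(t_i - t_j) c_i \overline{c_j}$ for $g(t) = \cos^2 t$ and for $g^{1/2}(t) = |\cos t|$.

```python
import math

def quadratic_form(f, points, coeffs):
    """Return sum_{i,j} f(t_i - t_j) * c_i * conj(c_j) for a real-valued f."""
    total = 0.0 + 0.0j
    for ti, ci in zip(points, coeffs):
        for tj, cj in zip(points, coeffs):
            total += f(ti - tj) * ci * complex(cj).conjugate()
    return total.real

# Hypothetical test configuration: four equally spaced points, alternating signs.
points = [j * math.pi / 4 for j in range(4)]
coeffs = [(-1) ** j for j in range(4)]

g = lambda t: math.cos(t) ** 2        # a nonnegative characteristic function
sqrt_g = lambda t: abs(math.cos(t))   # its square root g^{1/2}

q_g = quadratic_form(g, points, coeffs)
q_sqrt_g = quadratic_form(sqrt_g, points, coeffs)
print(q_g, q_sqrt_g)
```

For this configuration the form for $|\cos t|$ equals $4 - 4\sqrt{2} < 0$ up to rounding, so $|\cos t|$ is not positive definite, while the form for $\cos^2 t$ vanishes up to rounding.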
1.3 Inversion theorems

The main result of this section is Theorem 1.3.3, which states that distributions are uniquely determined by their characteristic functions. This result allows us to work with functions instead of measures, which is very useful, for example, in the study of limit theorems. At the end of this section we present some simple applications of the uniqueness theorem.

Theorem 1.3.1. Let $f$ be the characteristic function of a $d$-dimensional distribution $\mu$ and let $a, b \in \mathbb{R}^d$ be such that $a_j < b_j$, $j = 1, \dots, d$. Then$^{10}$
\[
  \lim_{T \to \infty} \frac{1}{(2\pi)^d} \int_{-T}^{T} \cdots \int_{-T}^{T}
  \prod_{j=1}^{d} \frac{e^{-i a_j t_j} - e^{-i b_j t_j}}{i t_j} \, f(t) \, dt
  = \int_{\mathbb{R}^d} \prod_{j=1}^{d} \chi_j(x) \, d\mu(x),
\]
where
\[
  \chi_j(x) =
  \begin{cases}
    1 & \text{if } x_j \in (a_j, b_j), \\
    \frac12 & \text{if } x_j = a_j \text{ or } x_j = b_j, \\
    0 & \text{if } x_j \notin [a_j, b_j].
  \end{cases}
\]

$^{10}$ The value of $(e^{-i a_j t_j} - e^{-i b_j t_j}) / (i t_j)$ at $t_j = 0$ is defined by continuity. It is equal to $b_j - a_j$.

Proof. For $T > 0$ write
\[
  I(T) := \frac{1}{(2\pi)^d} \int_{-T}^{T} \cdots \int_{-T}^{T}
  \prod_{j=1}^{d} \frac{e^{-i a_j t_j} - e^{-i b_j t_j}}{i t_j} \, f(t) \, dt
\]
\[
  = \frac{1}{(2\pi)^d} \int_{-T}^{T} \cdots \int_{-T}^{T}
  \prod_{j=1}^{d} \frac{e^{-i a_j t_j} - e^{-i b_j t_j}}{i t_j}
  \int_{\mathbb{R}^d} e^{i(x,t)} \, d\mu(x) \, dt.
\]
In view of Theorem B.1.5,
\[
  \left| \frac{e^{-i a_j t_j} - e^{-i b_j t_j}}{i t_j} \, e^{i(x,t)} \right|
  = \left| \frac{1 - e^{-i(b_j - a_j) t_j}}{t_j} \right| \le b_j - a_j,
\]
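and hence we may apply Fubini's theorem:
\[
  I(T) = \int_{\mathbb{R}^d} \frac{1}{(2\pi)^d} \int_{-T}^{T} \cdots \int_{-T}^{T}
  \prod_{j=1}^{d} \frac{e^{i(x_j - a_j) t_j} - e^{i(x_j - b_j) t_j}}{i t_j} \, dt \, d\mu(x)
\]
\[
  = \int_{\mathbb{R}^d} \prod_{j=1}^{d} \left[ \frac{1}{2\pi} \int_{-T}^{T}
  \left( \frac{\sin (x_j - a_j) t_j}{t_j} - \frac{\sin (x_j - b_j) t_j}{t_j} \right) dt_j \right] d\mu(x)
  =: \int_{\mathbb{R}^d} \prod_{j=1}^{d} K_j(T, x) \, d\mu(x).
\]
It follows from Corollary B.1.8 that $\lim_{T \to \infty} K_j(T, x) = \chi_j(x)$. By Theorem B.1.9, we have $|K_j(T, x)| < 2$. Hence, the statement of the theorem follows by dominated convergence.

Corollary 1.3.2. If the $\mu$-measure of the boundary of the set $J = (a_1, b_1) \times \cdots \times (a_d, b_d)$ is equal to zero, then
\[
  \lim_{T \to \infty} \frac{1}{(2\pi)^d} \int_{-T}^{T} \cdots \int_{-T}^{T}
  \prod_{j=1}^{d} \frac{e^{-i a_j t_j} - e^{-i b_j t_j}}{i t_j} \, f(t) \, dt = \mu(J).
\]

From this corollary we immediately obtain the following theorem.

Theorem 1.3.3 (Uniqueness Theorem). Every distribution $\mu$ is uniquely determined by its characteristic function $f$.

Proof. It suffices to show that the $\mu$-measure of sets of the form $J = (a_1, b_1) \times \cdots \times (a_d, b_d)$ is uniquely determined by $f$. If the boundary $\partial J$ of $J$ has zero $\mu$-measure, then we can apply the previous corollary. Otherwise we consider the sets
\[
  J_\varepsilon = (a_1 + \varepsilon, b_1 - \varepsilon) \times \cdots \times (a_d + \varepsilon, b_d - \varepsilon), \qquad \varepsilon > 0.
\]
If $\varepsilon_1 \ne \varepsilon_2$ then $\partial J_{\varepsilon_1} \cap \partial J_{\varepsilon_2} = \emptyset$. Since $\mu$ is a finite measure, the set $\{\varepsilon > 0 : \mu(\partial J_\varepsilon) \ne 0\}$ is countable. Consequently, there exists a sequence $\{\varepsilon_n\}_1^\infty$ such that $\mu(\partial J_{\varepsilon_n}) = 0$ and
\[
  J = \bigcup_{n=1}^{\infty} J_{\varepsilon_n},
\]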
from which the theorem follows.
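As a numerical illustration of Corollary 1.3.2 in dimension $d = 1$, the sketch below approximates the inversion integral for the Laplace characteristic function $f(t) = 1/(1 + t^2)$, whose density is $\frac12 e^{-|y|}$. The truncation point $T$, the Simpson grid and the helper name `inversion_integral` are choices made for this illustration only.

```python
import cmath
import math

def inversion_integral(cf, a, b, T, n=40000):
    """Composite Simpson approximation of
    (1 / (2*pi)) * int_{-T}^{T} (e^{-iat} - e^{-ibt}) / (it) * cf(t) dt  (d = 1)."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = 2.0 * T / n

    def integrand(t):
        if abs(t) < 1e-9:
            kernel = b - a  # value at t = 0, defined by continuity
        else:
            kernel = (cmath.exp(-1j * a * t) - cmath.exp(-1j * b * t)) / (1j * t)
        return kernel * cf(t)

    total = integrand(-T) + integrand(T)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(-T + k * h)
    return (total * h / 3.0 / (2.0 * math.pi)).real

laplace_cf = lambda t: 1.0 / (1.0 + t * t)    # Laplace distribution with alpha = 1
approx = inversion_integral(laplace_cf, 0.0, 1.0, T=200.0)
exact = 0.5 * (1.0 - math.exp(-1.0))          # mass of (0, 1) under the density e^{-|y|}/2
print(approx, exact)
```

Since the Laplace distribution has no atoms, the boundary of $(0, 1)$ has measure zero and the two printed numbers should agree up to the truncation and quadrature error.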
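The next lemma deals with a special case of Gaussian distributions, which will be investigated in more detail in Sections 1.10 and 3.5.

Lemma 1.3.4. The function $\varphi_\sigma$, $\sigma > 0$, given by
\[
  \varphi_\sigma(x) = \frac{1}{(\sigma\sqrt{2\pi})^d} \, e^{-\frac{\|x\|^2}{2\sigma^2}}, \qquad x \in \mathbb{R}^d,
\]
is a probability density$^{11}$ with characteristic function
\[
  g_\sigma(t) = e^{-\frac{\sigma^2 \|t\|^2}{2}}, \qquad t \in \mathbb{R}^d.
\]
Moreover,
\[
  \varphi_\sigma = \frac{1}{(\sigma\sqrt{2\pi})^d} \, g_{1/\sigma}. \tag{1}
\]

Proof. We consider first the case $d = 1$. By Corollary B.1.11, the function $\varphi_\sigma$ is a probability density. The corresponding characteristic function is
\[
  g_\sigma(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2\sigma^2}} e^{ity} \, dy.
\]
Differentiating (cf. Theorem 1.2.1) and integrating by parts we get
\[
  g_\sigma'(t) = \frac{i}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} y \, e^{-\frac{y^2}{2\sigma^2}} e^{ity} \, dy
  = -\frac{\sigma^2 t}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2\sigma^2}} e^{ity} \, dy
  = -\sigma^2 t \, g_\sigma(t).
\]
Every solution of this differential equation has the form $t \mapsto C e^{-\frac{\sigma^2 t^2}{2}}$. Since $g_\sigma(0) = 1$ we must have $C = 1$. The case $d > 1$ is now obtained from the equation
\[
  \varphi_\sigma(x) = \prod_{j=1}^{d} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{x_j^2}{2\sigma^2}}, \qquad x \in \mathbb{R}^d.
\]
The last statement is trivial.

The next lemma is a special form of Parseval's identity (see also equation (i) in Plancherel's Theorem 1.8.9).

Lemma 1.3.5. Let $\mu$ and $\nu$ be distributions on $\mathbb{R}^d$ with characteristic functions $f_\mu$ and $f_\nu$, respectively. Then
\[
  \int_{\mathbb{R}^d} e^{i(t,y)} f_\mu(t) \, d\nu(t) = \int_{\mathbb{R}^d} f_\nu(x + y) \, d\mu(x)
\]
holds for all $y \in \mathbb{R}^d$.

$^{11}$ The distribution corresponding to the case $\sigma = 1$ is called standard Gaussian (see also Figure B.2).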
Proof. By definition of $f_\mu$,
\[
  e^{i(t,y)} f_\mu(t) = \int_{\mathbb{R}^d} e^{i(t, x+y)} \, d\mu(x).
\]
Integrating both sides with respect to $\nu$ and using Fubini's theorem we obtain the desired equation.

Throughout the rest of this section $f$ denotes the characteristic function of a probability distribution $\mu$ on $\mathbb{R}^d$. The most frequently applied inversion formula is presented in the next theorem.

Theorem 1.3.6. If $f \in L^1(\mathbb{R}^d)$ then $\mu$ is absolutely continuous and the corresponding density $p$ is given by
\[
  p(y) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} f(t) \, e^{-i(t,y)} \, dt, \qquad y \in \mathbb{R}^d.
\]
Moreover, $p$ is bounded and uniformly continuous.

Proof. Since $f$ is integrable it can be written as a (complex) linear combination of four nonnegative integrable functions. Thus, the left-hand side of the equation above represents a linear combination of four characteristic functions. As characteristic functions are bounded and uniformly continuous, it remains to prove the first part of the theorem.

Let $X$ and $Y$ be independent $d$-dimensional random vectors such that $\mu$ is the distribution of $X$ and $Y$ is standard Gaussian (cf. Lemma 1.3.4). For any $\sigma > 0$ the density and the characteristic function of $\sigma Y$ are given by the functions $\varphi_\sigma$ and $g_\sigma$ of Lemma 1.3.4, respectively. Applying equation (F.1.2) and Lemma 1.3.4 we see that the random vector $S_n = X + n^{-1} Y$, $n \in \mathbb{N}$, has a density $p_n$ given by
\[
  p_n(y) = \int_{\mathbb{R}^d} \varphi_{1/n}(y - x) \, d\mu(x)
  = \left( \frac{n}{\sqrt{2\pi}} \right)^{d} \int_{\mathbb{R}^d} g_n(y - x) \, d\mu(x), \qquad y \in \mathbb{R}^d.
\]
Lemma 1.3.5 shows that the integral on the right-hand side is equal to
\[
  \int_{\mathbb{R}^d} e^{-i(x,y)} f(x) \varphi_n(x) \, dx
\]
and therefore
\[
  p_n(y) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} e^{-i(t,y)} f(t) \, e^{-\frac{\|t\|^2}{2n^2}} \, dt. \tag{1}
\]
The sequence $\{S_n\}_1^\infty$ converges pointwise to $X$. Therefore, the corresponding sequence of distributions converges weakly to $\mu$. By Theorem E.1.7,
\[
  \lim_{n \to \infty} \int_{\mathbb{R}^d} h(y) p_n(y) \, dy = \int_{\mathbb{R}^d} h(y) \, d\mu(y) \tag{2}
\]
for every bounded continuous function $h$ on $\mathbb{R}^d$. Since the modulus of the integrand on the right-hand side of (1) is bounded by $|f|$, Lebesgue's dominated convergence theorem can be applied to show that $p_n(y)$ tends to
\[
  p(y) := \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} f(t) \, e^{-i(t,y)} \, dt
\]
for all $y \in \mathbb{R}^d$. From (1) we also see that the sequence $\{p_n\}_{n=1}^\infty$ is uniformly bounded. Applying Lebesgue's theorem once more and using (2) we obtain
\[
  \int_{\mathbb{R}^d} h(y) p(y) \, dy = \int_{\mathbb{R}^d} \lim_{n \to \infty} h(y) p_n(y) \, dy
  = \lim_{n \to \infty} \int_{\mathbb{R}^d} h(y) p_n(y) \, dy = \int_{\mathbb{R}^d} h(y) \, d\mu(y).
\]
From this we conclude that $\mu$ is absolutely continuous with density $p$.

Example 1.3.7. Let
\[
  f(t) = \frac{1}{1 + \alpha^2 t^2}, \qquad t \in \mathbb{R},
\]
be the characteristic function of the Laplace distribution with density
\[
  p(y) = \frac{1}{2\alpha} \, e^{-\frac{|y|}{\alpha}}, \qquad y \in \mathbb{R}
\]
(see (1.1.13.b)). By the previous theorem,
\[
  \frac{1}{2\alpha} \, e^{-\frac{|y|}{\alpha}} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1}{1 + \alpha^2 t^2} \, e^{-ity} \, dt.
\]
Replacing $y$ by $-y$, this equation shows that $y \mapsto e^{-\frac{|y|}{\alpha}}$ is the characteristic function of the Cauchy distribution with density $t \mapsto \frac{1}{\pi} \frac{\alpha}{1 + \alpha^2 t^2}$.

Theorem 1.3.8. For every $a \in \mathbb{R}^d$ we have
\[
  \lim_{T \to \infty} \frac{1}{(2T)^d} \int_{-T}^{T} \cdots \int_{-T}^{T} e^{-i(a,t)} f(t) \, dt = \mu(\{a\}).
\]

Proof. Applying Fubini's theorem we obtain
\[
  \frac{1}{(2T)^d} \int_{-T}^{T} \cdots \int_{-T}^{T} e^{-i(a,t)} f(t) \, dt
  = \frac{1}{(2T)^d} \int_{-T}^{T} \cdots \int_{-T}^{T} e^{-i(a,t)} \int_{\mathbb{R}^d} e^{i(t,x)} \, d\mu(x) \, dt
\]
\[
  = \int_{\mathbb{R}^d} \frac{1}{(2T)^d} \int_{-T}^{T} \cdots \int_{-T}^{T} e^{i(t, x-a)} \, dt \, d\mu(x)
  = \int_{\mathbb{R}^d} \prod_{j=1}^{d} \left( \frac{1}{2T} \int_{-T}^{T} e^{i t_j (x_j - a_j)} \, dt_j \right) d\mu(x)
  =: \int_{\mathbb{R}^d} \prod_{j=1}^{d} K_j(x_j, T) \, d\mu(x),
\]
where $K_j(a_j, T) = 1$ and
\[
  \lim_{T \to \infty} K_j(x_j, T) = \lim_{T \to \infty} \frac{\sin T(x_j - a_j)}{T(x_j - a_j)} = 0, \qquad x_j \ne a_j.
\]
Hence, the assertion of the theorem follows by dominated convergence.

Theorem 1.3.9. Assume that $\mu = \mu_d + \mu_c$ where $\mu_c$ is a continuous nonnegative measure and
\[
  \mu_d = \sum_{n} p_n \delta_{x_n}
\]
is a nonnegative discrete measure. Then
\[
  \lim_{T \to \infty} \frac{1}{(2T)^d} \int_{-T}^{T} \cdots \int_{-T}^{T} |f(t)|^2 \, dt = \sum_{n} p_n^2.
\]
Consequently, $\mu$ is continuous if and only if the limit on the left-hand side is equal to zero.

Proof. Let $X_1$ and $X_2$ be independent random vectors with characteristic function $f$. Then $|f|^2$ is the characteristic function of $X = X_1 - X_2$ and
\[
  P(X = 0) = \sum_{n} P\bigl([X_1 = x_n] \cap [X_2 = x_n]\bigr) = \sum_{n} p_n^2.
\]
Applying Theorem 1.3.8 with $a = 0$ and with $f$ replaced by $|f|^2$ we obtain the assertion of the theorem.

We close this section with some simple applications of the uniqueness Theorem 1.3.3.

Theorem 1.3.10. The random variables $X_1, \dots, X_d$ with characteristic functions $f_1, \dots, f_d$ are independent if and only if
\[
  f_X(t) = \prod_{j=1}^{d} f_j(t_j), \qquad t \in \mathbb{R}^d, \tag{1}
\]
where $f_X$ denotes the characteristic function of the random vector $X = (X_1, \dots, X_d)$.
Proof. Denoting by $\mu_j$ the distribution of $X_j$, the random variables $X_j$ are independent if and only if the distribution of $X$ is equal to the product measure $\mu_1 \otimes \cdots \otimes \mu_d$. The statement of the theorem follows from the uniqueness Theorem 1.3.3 and from the fact that the right-hand side of equation (1) is equal to the characteristic function of the product measure $\mu_1 \otimes \cdots \otimes \mu_d$.

Theorem 1.3.11. A probability measure $\mu$ on $\mathbb{R}^d$ is uniquely determined by its values assigned to the half spaces
\[
  H_{y,r} := \{x \in \mathbb{R}^d : (y, x) \le r\}, \qquad y \in \mathbb{R}^d, \; r \in \mathbb{R}.
\]

Proof. Let $X$ be the identity mapping on $\mathbb{R}^d$ and consider $X$ as a random vector on the probability space $(\Omega, \mathcal{A}, P) := (\mathbb{R}^d, \mathcal{B}^d, \mu)$. Then $\mu$ is the distribution of $X$ and
\[
  H_{y,r} = \{\omega \in \Omega : (y, X(\omega)) \le r\}.
\]
Thus, the values $\mu(H_{y,r})$, $r \in \mathbb{R}$, determine the distribution (and hence the characteristic function) of the random variable $\omega \mapsto (y, X(\omega))$ for all $y \in \mathbb{R}^d$. On the other hand, the characteristic functions of these random variables determine the characteristic function of $X$ (cf. Remark 1.1.10). This fact together with Theorem 1.3.3 proves the result.

The next result can be proved in the same way as the previous one. We omit the proof.

Theorem 1.3.12. The distribution of a random vector $X$ is uniquely determined by the distributions of all orthogonal projections of $X$ on lines going through the origin.

Recall that a probability distribution $\mu$ on $\mathbb{R}^d$ is called symmetric if $\mu = \check{\mu}$, i.e., $\mu(B) = \mu(-B)$ for all Borel subsets $B$ of $\mathbb{R}^d$. The distribution of a random vector $X$ is symmetric if and only if $X$ and $-X$ have the same distribution.

Theorem 1.3.13. For a random vector $X$ with distribution $\mu$ the following conditions are equivalent.
(i) The distribution of $X$ is symmetric.
(ii) The characteristic function $f$ of $X$ is real-valued.
(iii) $f(t) = \int_{\mathbb{R}^d} \cos((t, x)) \, d\mu(x)$, $t \in \mathbb{R}^d$.

Proof. If $\mu$ is symmetric then $X$ and $-X$ have the same distribution and hence
\[
  f(t) = E\, e^{i(t,X)} = E\, e^{i(t,-X)} = f(-t) = \overline{f(t)}, \qquad t \in \mathbb{R}^d.
\]
Thus, $f$ is real-valued.

Suppose that $f$ is real-valued. Then $f(t) = \overline{f(t)} = f(-t)$ for all $t \in \mathbb{R}^d$, hence $\mu$ and $\check{\mu}$ have the same characteristic function. Therefore $\mu = \check{\mu}$, by Theorem 1.3.3. The equivalence of (ii) and (iii) follows from
\[
  f(t) = E[\cos(t, X)] + i\,E[\sin(t, X)].
\]
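Theorems 1.3.8 and 1.3.9 of this section can be illustrated for a purely discrete distribution, where the averaged integrals have a closed form because $\frac{1}{2T}\int_{-T}^{T} e^{itu}\,dt = \sin(Tu)/(Tu)$. The atoms and masses below are hypothetical values chosen for the sketch.

```python
import math

def sinc(u):
    """sin(u)/u with the value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(u) / u

# Hypothetical discrete distribution: atoms x_n with masses p_n (d = 1).
atoms = [-1.0, 0.0, 2.5]
masses = [0.2, 0.5, 0.3]

def averaged_cf(a, T):
    """(1/(2T)) * int_{-T}^{T} e^{-iat} f(t) dt for f(t) = sum_n p_n e^{i t x_n},
    evaluated in closed form term by term."""
    return sum(p * sinc(T * (x - a)) for x, p in zip(atoms, masses))

def averaged_abs_cf_squared(T):
    """(1/(2T)) * int_{-T}^{T} |f(t)|^2 dt, again in closed form."""
    return sum(p * q * sinc(T * (x - y))
               for x, p in zip(atoms, masses)
               for y, q in zip(atoms, masses))

T = 1.0e6
print(averaged_cf(0.0, T))          # close to the mass 0.5 at a = 0
print(averaged_cf(1.0, T))          # close to 0: no atom at a = 1
print(averaged_abs_cf_squared(T))   # close to 0.2**2 + 0.5**2 + 0.3**2 = 0.38
```

The off-diagonal terms are bounded by $1/(T|x_n - x_m|)$, which is why a large $T$ isolates the atom masses and the sum of their squares.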
1.4 Basic properties of positive definite functions

An essential property of characteristic functions and covariance functions is that they are positive definite. In the present section we investigate positive definiteness on its own. In the main part of the book we deal with positive definite functions on the group $\mathbb{R}^d$, but occasionally we also need the group $\mathbb{Z}^d$. For this reason it is practical to treat the basic properties of these functions in a more general setting. Throughout this section the symbol $G$ denotes an arbitrary commutative group. We use additive notation for the group operation.$^{12}$

We have already shown in Theorem 1.1.2 that every characteristic function is positive definite in the following sense.

Definition 1.4.1. A complex-valued function $f$ on $G$ is said to be positive definite if the inequality
\[
  \sum_{i,j=1}^{n} f(t_i - t_j) c_i \overline{c_j} \ge 0
\]
holds for every positive integer $n$, for all $t_1, \dots, t_n \in G$, and for all $c_1, \dots, c_n \in \mathbb{C}$.

We denote by $P(G)$ the set of all positive definite functions on $G$, while $P^c(\mathbb{R}^d)$ and $P^m(\mathbb{R}^d)$ denote the sets of all continuous and of all Lebesgue measurable positive definite functions on $\mathbb{R}^d$, respectively.

Remark 1.4.2. By the definition above, $f$ is positive definite if and only if the matrix
\[
  A = \bigl( f(t_i - t_j) \bigr)_{i,j=1}^{n}
\]
is positive semidefinite$^{13}$ for an arbitrary choice of $n \in \mathbb{N}$ and $t_1, \dots, t_n \in G$.

The inequality in Definition 1.4.1 can be formulated in terms of finitely supported complex measures as follows. Let $t_j, c_j$ be as in Definition 1.4.1 and define the measure $\mu$ by
\[
  \mu = \sum_{j=1}^{n} c_j \delta_{t_j}.
\]
Then $\tilde{\mu} = \sum_{j=1}^{n} \overline{c_j} \delta_{-t_j}$ and
\[
  \mu * \tilde{\mu} * f(t) = \int_{G} f(t - x) \, d(\mu * \tilde{\mu})(x)
  = \sum_{i,j=1}^{n} f(t - t_i + t_j) c_i \overline{c_j}, \qquad t \in G.
\]
Setting here $t = 0$ and replacing $t_i$ by $-t_i$ we obtain the sum in Definition 1.4.1. We have thus proved the following lemma.

Lemma 1.4.3. A complex-valued function $f$ on $G$ is positive definite if and only if the inequality
\[
  \mu * \tilde{\mu} * f(0) \ge 0
\]
holds for all finitely supported complex measures $\mu$ on $G$.

Next we present two examples where we show the positive definiteness of a given function by proving that the matrix $A$ in Remark 1.4.2 is positive semidefinite.

Example 1.4.4. A simple example of a positive definite function on $G$ is given by the function $1_{\{0\}}$. In this case the matrix $A$ of Remark 1.4.2 is the $n \times n$ identity matrix whenever the $t_i$'s are mutually different. More generally, if $G_0$ is a subgroup of $G$, then $f = 1_{G_0}$ is positive definite. To see this let $F$ be a non-void finite subset of $G$. The set $F$ can be written as $F = \bigcup_{k=1}^{n} F_k$ where the sets $F_k$ are contained in mutually different cosets of $G_0$. Let $n_k$ denote the number of elements of $F_k$ and choose a numbering $t_1, t_2, \dots, t_m$ of the elements of $F$ such that $t_i \in F_k$ if and only if
\[
  \sum_{j=1}^{k-1} n_j < i \le \sum_{j=1}^{k} n_j,
\]
where the sum on the left-hand side is interpreted as 0 if $k = 1$. Then
\[
  \bigl( f(t_i - t_j) \bigr)_{i,j=1}^{m} =
  \begin{pmatrix}
    A_1 & 0 & \cdots & 0 \\
    0 & A_2 & \cdots & 0 \\
    \vdots & & \ddots & \vdots \\
    0 & 0 & \cdots & A_n
  \end{pmatrix},
\]
where all entries of $A_k$ are equal to 1. It is easy to check that this matrix is positive semidefinite.

We will see in Section 1.7 that the 'building stones' of positive definite functions are the characters defined below.

$^{12}$ Note that most of the results on positive definite functions presented in this book have their analogues on locally compact groups (see for example [49]). The restriction of considering only commutative groups without topology makes the treatment simpler. On the other hand, the results obtained are sufficient for many applications in probability theory.
$^{13}$ Section D.2 contains some basic results on positive semidefinite matrices.
Definition 1.4.5. A complex-valued function $\gamma \ne 0$ on $G$ is called a character of $G$ if
\[
  \gamma(x + y) = \gamma(x)\gamma(y) \quad\text{and}\quad \gamma(-x) = \overline{\gamma(x)}
\]
for all $x, y \in G$.

It is easy to check that $\gamma(0) = 1$ and $|\gamma(x)| = 1$ for all $x \in G$ if $\gamma$ is a character of $G$.

Example 1.4.6. Every character $\gamma$ of $G$ is positive definite. Indeed,
\[
  \sum_{i,j=1}^{n} \gamma(t_i - t_j) c_i \overline{c_j}
  = \sum_{i,j=1}^{n} \gamma(t_i) \overline{\gamma(t_j)} c_i \overline{c_j}
  = \left| \sum_{i=1}^{n} c_i \gamma(t_i) \right|^2 \ge 0.
\]

Lemma 1.4.7. Every continuous character $\gamma$ of $\mathbb{R}^d$ has the form
\[
  \gamma(t) = e^{i(x,t)}, \qquad t \in \mathbb{R}^d,
\]
with some $x \in \mathbb{R}^d$. If $\gamma$ is a character of $\mathbb{Z}^d$ then there exists $z \in \mathbb{T}^d$ such that
\[
  \gamma(n) = z^n, \qquad n \in \mathbb{Z}^d.
\]

Proof. The first statement follows immediately from Theorem C.9.2 using the fact that $|\gamma| = 1$. To prove the second one let $e_1, \dots, e_d$ be the standard basis of $\mathbb{Z}^d$. If $\gamma$ is a character of $\mathbb{Z}^d$ and $n = (n_1, \dots, n_d) \in \mathbb{Z}^d$ then
\[
  \gamma(n) = \gamma(n_1 e_1 + \cdots + n_d e_d) = \gamma(e_1)^{n_1} \cdots \gamma(e_d)^{n_d} = z^n,
\]
where $z_i = \gamma(e_i) \in \mathbb{T}$.

Next we prove several simple properties of positive definite functions. From Theorem D.2.3 we obtain the following lemma.

Lemma 1.4.8. Let $f \in P(G)$. Then $f$ is Hermitian, i.e., $f(-x) = \overline{f(x)}$ holds for all $x \in G$.

Theorem 1.4.9. Let $f_1, f_2 \in P(G)$. Then the functions $\overline{f_1}$, $\operatorname{Re} f_1$, $|f_1|^2$, and $f_1 f_2$ are positive definite. Moreover, $p_1 f_1 + p_2 f_2$ is positive definite for all $p_1, p_2 \ge 0$.

Proof. The positive definiteness of $\overline{f_1}$ and of $p_1 f_1 + p_2 f_2$ follows at once from Definition 1.4.1. We therefore have $\operatorname{Re} f_1 = \frac12 (f_1 + \overline{f_1}) \in P(G)$. The fact that $f_1 f_2$ is positive definite is an immediate consequence of Schur's theorem stating that the pointwise product of nonnegative definite matrices is nonnegative definite (cf. Theorem D.2.12). It follows that $|f_1|^2 = f_1 \overline{f_1} \in P(G)$.

The next theorem follows immediately from Definition 1.4.1.
Theorem 1.4.10. If a sequence (or net) of positive definite functions converges pointwise to a function $f$ then $f$ is positive definite.

As a consequence of Theorems 1.4.9 and 1.4.10 we obtain the following corollary.

Corollary 1.4.11. The set $P(G)$ is a convex cone closed in the topology of pointwise convergence.

Next we prove some frequently used inequalities.$^{14}$

Theorem 1.4.12. Let $f \in P(G)$. Then
(i) $|f(x)| \le f(0)$ for all $x \in G$;
(ii) $|f(x) - f(y)|^2 \le 2 f(0) [f(0) - \operatorname{Re} f(x - y)]$ for all $x, y \in G$;
(iii) $|f(0) f(x + y) - f(x) f(y)|^2 \le [f(0)^2 - |f(x)|^2][f(0)^2 - |f(y)|^2]$ for all $x, y \in G$;
(iv) $\left| \sum_{i=1}^{n} a_i f(x_i) \right|^2 \le f(0) \sum_{i,j=1}^{n} f(x_i - x_j) a_i \overline{a_j}$ for an arbitrary choice of $n \in \mathbb{N}$, $x_1, \dots, x_n \in G$, and $a_1, \dots, a_n \in \mathbb{C}$;
(v) $\left| \sum_{i=1}^{n} \sum_{j=1}^{m} f(x_i - y_j) a_i \overline{b_j} \right|^2 \le \left( \sum_{i,j=1}^{n} f(x_i - x_j) a_i \overline{a_j} \right) \left( \sum_{i,j=1}^{m} f(y_i - y_j) b_i \overline{b_j} \right)$ for all $n, m \in \mathbb{N}$, $x_1, \dots, x_n, y_1, \dots, y_m \in G$, and for all complex numbers $a_1, \dots, a_n, b_1, \dots, b_m$.

Proof. We prove (v) first. Denote by $L$ the complex linear space of all functions of the form $g = \sum_{1}^{n} a_i 1_{\{x_i\}}$, $a_i \in \mathbb{C}$, $x_i \in G$. For such $g$ and $h = \sum_{1}^{m} b_i 1_{\{y_i\}} \in L$ we define the inner product $(g, h)$ by
\[
  (g, h) := \sum_{i=1}^{n} \sum_{j=1}^{m} f(x_i - y_j) a_i \overline{b_j}.
\]
It is easy to check that this definition is correct, i.e., $(g, h)$ does not depend on the particular representations of $g$ and $h$. As $f \in P(G)$, this inner product is nonnegative definite. Thus, the Cauchy-Schwarz inequality
\[
  |(g, h)|^2 \le (g, g)\,(h, h)
\]
holds, and this is the same as (v).

Setting $m = 1$, $y_1 = 0$, and $b_1 = 1$ in (v), we get (iv). The inequality (ii) follows readily from (iv) by setting $n = 2$, $x_1 = x$, $x_2 = y$, $a_1 = 1$, $a_2 = -1$. Putting $n = 1$, $x_1 = x$, and $a_1 = 1$ in (iv), we get $|f(x)|^2 \le f(0)^2$ and, since $f(0) = (1_{\{0\}}, 1_{\{0\}}) \ge 0$, this gives $|f(x)| \le f(0)$.

$^{14}$ See also Theorem 1.11.6.
To prove (iii) we first remark that if a matrix of the form
\[
  \begin{pmatrix}
    1 & a & b \\
    \overline{a} & 1 & c \\
    \overline{b} & \overline{c} & 1
  \end{pmatrix}
\]
is positive semidefinite, then its determinant is nonnegative (cf. Theorem D.2.9). Computing this determinant we obtain
\[
  1 + a \overline{b} c + \overline{a} b \overline{c} \ge |a|^2 + |b|^2 + |c|^2
\]
or equivalently,
\[
  |c - \overline{a} b|^2 \le (1 - |a|^2)(1 - |b|^2). \tag{1}
\]
Assume first that $f(0) = 1$. Applying (1) to the (Hermitian) matrix
\[
  \bigl( f(x_i - x_j) \bigr)_{i,j=1}^{3} =
  \begin{pmatrix}
    1 & f(-x) & f(y) \\
    f(x) & 1 & f(x + y) \\
    f(-y) & f(-x - y) & 1
  \end{pmatrix},
\]
where $x_1 = 0$, $x_2 = x$, and $x_3 = -y$, we obtain (iii). If $f(0) = 0$, then $f = 0$ by (i) and so (iii) is satisfied. The case $f(0) \ne 0$ can be reduced to the case $f(0) = 1$ by considering the positive definite function $f / f(0)$.

Corollary 1.4.13. Let $f \in P(G)$ be such that $f(0) = 1$ and write
\[
  G_0 := \{ x \in G : |f(x)| = 1 \}.
\]
Then $G_0$ is a subgroup of $G$ and $f(x + y) = f(x) f(y)$ for all $x \in G_0$ and $y \in G$.

Proof. This follows immediately from inequality (1.4.12.iii) and Lemma 1.4.8.

Corollary 1.4.14. Let $f \in P(\mathbb{R})$ be such that $f(0) = 1$. If there exist $n \in \mathbb{N}$ and $h > 0$ such that $f(h)^n = 1$ then $f$ is periodic with period $nh$.

Proof. The case $n = 1$ follows readily from Corollary 1.4.13. Using this, the general case is obtained by noting that $|f(h)| = 1$ and therefore, by Corollary 1.4.13, $f(nh) = f(h)^n = 1$.

The functions
\[
  t \mapsto e^{i(t,x)}, \qquad t \mapsto \cos((t, x)) \qquad\text{and}\qquad t \mapsto e^{-\|t\|^2}, \qquad t, x \in \mathbb{R}^d,
\]
provide examples for the subgroup $G_0$ in the previous corollary. Example 1.4.4 shows that for an arbitrary subgroup $G_0$ of $G$ there exists $f \in P(G)$ such that $G_0 = \{ x \in G : |f(x)| = 1 \}$.

A simple corollary of inequality (1.4.12.iii) is the following.

Corollary 1.4.15. Let $f_n \in P(G)$ be such that $f_n(0) = 1$, $n \in \mathbb{N}$. Then the set of all $x \in G$ with $f_n(x) \to 1$, $n \to \infty$, is a subgroup of $G$. In particular, if $G = \mathbb{R}^d$ and $f_n \to 1$ on a neighborhood of 0 then $f_n \to 1$ on $\mathbb{R}^d$.

Next we list some simple results that can be used to construct positive definite functions.

Lemma 1.4.16. Let $f_i \in P(G_i)$ where $G_i$, $i = 1, 2$, is a commutative group. Then the function $(x_1, x_2) \mapsto f_1(x_1) f_2(x_2)$ is positive definite on the product group $G_1 \times G_2$.

Proof. The lemma follows immediately from the fact that the entrywise product of positive semidefinite matrices is positive semidefinite (see Schur's Theorem D.2.12).

Lemma 1.4.17. Let $g$ be a given positive definite function on $G$ and let $a_n$, $n \in \mathbb{N}_0$, be nonnegative numbers such that the series
\[
  \varphi(z) := \sum_{n=0}^{\infty} a_n z^n, \qquad z \in \mathbb{C},
\]
is convergent whenever $|z| \le g(0)$. Then the function $f(t) := \varphi(g(t))$, $t \in G$, is positive definite.

Proof. By (1.4.12.i), we have $|g(t)| \le g(0)$, and hence $f$ is well defined. The positive definiteness of $f$ follows from Theorems 1.4.9 and 1.4.10.

For example, the functions $e^{g}$ and $1 / (2g(0) - g)$ are positive definite if $g \in P(G)$ and $g(0) \ne 0$.

Lemma 1.4.18. Let $f \in P(G)$ and let
\[
  \mu = \sum_{j=1}^{n} c_j \delta_{t_j}, \qquad n \in \mathbb{N}, \; c_j \in \mathbb{C}, \; t_j \in G,
\]
be an arbitrary complex measure with finite support. Then the function
\[
  g(t) = \mu * \tilde{\mu} * f(t) = \sum_{i,j=1}^{n} f(t - t_i + t_j) c_i \overline{c_j}, \qquad t \in G,
\]
is positive definite. In particular, the function $f_c^y$ defined by
\[
  f_c^y(t) = (1 + |c|^2) f(t) + c f(t + y) + \overline{c} f(t - y), \qquad t \in G,
\]
is positive definite for all $y \in G$ and $c \in \mathbb{C}$.

Proof. Let $\nu$ be an arbitrary finitely supported complex measure on $G$. The first statement of the lemma follows from the equation
\[
  \nu * \tilde{\nu} * g(0) = (\nu * \mu) * (\nu * \mu)^{\sim} * f(0),
\]
using Lemma 1.4.3 and the positive definiteness of $f$. Setting $\mu = \delta_0 + c\,\delta_{-y}$, we obtain the second statement.

It follows from Corollary 1.4.11 that the set
\[
  P_0(G) := \{ f \in P(G) : f(0) = 1 \}
\]
is convex. Using the previous lemma we are now able to identify the extreme points (cf. D.4.1) of this set. First we prove a simple but useful statement.

Lemma 1.4.19. If $f$ is an extreme point of $P_0(G)$ and $f = f_1 + f_2$ where $f_i \in P(G)$, then $f_i = f_i(0) f$, $i = 1, 2$.

Proof. If $f_1(0) = 0$, then $f_1 = 0$ (cf. (1.4.12.i)) and hence $f_2 = f$. The case $f_2(0) = 0$ can be treated in the same way. If $f_1(0)$ and $f_2(0)$ are different from zero then $f_i / f_i(0) \in P_0(G)$ and
\[
  f = f_1(0) \frac{f_1}{f_1(0)} + f_2(0) \frac{f_2}{f_2(0)}.
\]
Since $f$ is extremal and $1 = f(0) = f_1(0) + f_2(0)$ we conclude that $f_i / f_i(0)$ is equal to $f$.

Theorem 1.4.20. A complex-valued function on $G$ is an extreme point of $P_0(G)$ if and only if it is a character of $G$.

Proof. Let $\gamma \in P_0(G)$ be extremal. For every $y \in G$ and $c \in \mathbb{C}$, the function
\[
  \gamma_c^y(x) = (1 + |c|^2) \gamma(x) + c \gamma(x + y) + \overline{c} \gamma(x - y)
\]
is positive definite by Lemma 1.4.18. Moreover,
\[
  \gamma = \frac{1}{2(1 + |c|^2)} \bigl( \gamma_c^y + \gamma_{-c}^y \bigr).
\]
Since $\gamma$ is extremal the previous lemma shows that $\gamma_c^y(x) = g(c, y) \gamma(x)$ with a certain function $g$. Thus,
\[
  \gamma_c^y(x) - \gamma_{-c}^y(x) = 2 [c \gamma(x + y) + \overline{c} \gamma(x - y)] = g_1(c, y) \gamma(x), \tag{1}
\]
where $g_1(c, y) = g(c, y) - g(-c, y)$. Setting $c = 1$ and $c = i$ in (1), we get
\[
  \gamma(x + y) = l(y) \gamma(x) \tag{2}
\]
with
\[
  l(y) = \frac14 [g_1(1, y) - i g_1(i, y)].
\]
Setting $x = 0$ in (2) we obtain $\gamma(y) = l(y)$, and therefore
\[
  \gamma(x + y) = \gamma(x) \gamma(y), \qquad x, y \in G.
\]
Noting that positive definite functions are Hermitian we see that $\gamma$ is a character of $G$.

We now prove that every character $\gamma$ is an extreme point of $P_0(G)$. We have already seen in Example 1.4.6 that $\gamma$ is positive definite. Assume that $\gamma = p \gamma_1 + (1 - p) \gamma_2$, where $\gamma_1, \gamma_2 \in P_0(G)$ and $0 < p < 1$. It follows immediately from the relations $|\gamma| = 1$, $|\gamma_1| \le 1$ and $|\gamma_2| \le 1$ that $\gamma = \gamma_1 = \gamma_2$, i.e., $\gamma$ is extremal.
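The inequalities (i) to (iii) of Theorem 1.4.12 are easy to test numerically for a concrete positive definite function. The sketch below uses the Gaussian characteristic function $f(t) = e^{-t^2/2}$ on $\mathbb{R}$ (Lemma 1.3.4 with $\sigma = 1$, $d = 1$); the grid and the rounding tolerance are assumptions of this illustration.

```python
import math

def f(t):
    """Gaussian characteristic function e^{-t^2/2}; real-valued with f(0) = 1."""
    return math.exp(-0.5 * t * t)

grid = [k * 0.25 for k in range(-12, 13)]   # hypothetical test grid in [-3, 3]
tol = 1e-12                                  # slack for rounding errors

# (i)   |f(x)| <= f(0)
ok_i = all(abs(f(x)) <= f(0) + tol for x in grid)
# (ii)  |f(x) - f(y)|^2 <= 2 f(0) [f(0) - Re f(x - y)]
ok_ii = all((f(x) - f(y)) ** 2 <= 2 * f(0) * (f(0) - f(x - y)) + tol
            for x in grid for y in grid)
# (iii) |f(0) f(x + y) - f(x) f(y)|^2 <= [f(0)^2 - |f(x)|^2][f(0)^2 - |f(y)|^2]
ok_iii = all((f(0) * f(x + y) - f(x) * f(y)) ** 2
             <= (f(0) ** 2 - f(x) ** 2) * (f(0) ** 2 - f(y) ** 2) + tol
             for x in grid for y in grid)
print(ok_i, ok_ii, ok_iii)
```

Equality in (iii) occurs for $y = -x$, which is why a small tolerance is added on the right-hand sides.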
1.5 Further properties of positive definite functions on $\mathbb{R}^d$

We continue the investigation of positive definite functions, but now we make use of the special structure of $\mathbb{R}^d$.

Lemma 1.5.1. Let $f \in P^c(\mathbb{R}^d)$ be such that $|f(t)| = 1$ holds for all $t$ from a neighborhood of $0 \in \mathbb{R}^d$. Then
\[
  f(t) = e^{i(x,t)}, \qquad t \in \mathbb{R}^d,
\]
for some $x \in \mathbb{R}^d$.

Proof. If a subgroup of $\mathbb{R}^d$ contains a neighborhood of zero, then it is equal to $\mathbb{R}^d$. The statement follows therefore from Corollary 1.4.13 and Theorem C.9.2.

Corollary 1.5.2. Let $f \in P(\mathbb{R}^d)$. If $\operatorname{Re} f$ is continuous at 0, then $f$ is uniformly continuous on $\mathbb{R}^d$.

Proof. This is a consequence of (1.4.12.ii).
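To spell out the step behind Corollary 1.5.2: applying (1.4.12.ii) to the points $x + h$ and $x$ gives a bound that no longer depends on $x$. The display below is a worked restatement of that inequality, not an additional result.

```latex
\[
  |f(x + h) - f(x)|^2 \le 2 f(0)\,\bigl[ f(0) - \operatorname{Re} f(h) \bigr],
  \qquad x, h \in \mathbb{R}^d .
\]
```

Since $f(0)$ is real, continuity of $\operatorname{Re} f$ at 0 means that the right-hand side tends to 0 as $h \to 0$, uniformly in $x$.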
The next two results can be applied to construct positive definite functions.

Lemma 1.5.3. Let $f \in P^c(\mathbb{R}^d)$ and let $\mu$ be an arbitrary complex measure on $\mathcal{B}^d$. Then the function
\[
  \mu * \tilde{\mu} * f(t) = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(t - x - y) \, d\mu(x) \, d\tilde{\mu}(y), \qquad t \in \mathbb{R}^d,
\]
is positive definite. In particular, the function
\[
  t \mapsto \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(t + x - y) h(x) \overline{h(y)} \, dx \, dy, \qquad t \in \mathbb{R}^d,
\]
is positive definite for all $h \in L^1(\mathbb{R}^d)$.

Proof. By Lemma 1.4.18, the statement is true if $\mu$ has finite support. Let now $\mu$ be arbitrary, and let $\{\mu_\alpha\}$ be a net of finitely supported complex measures converging weakly to $\mu$ (cf. Theorem E.1.8). By Theorem E.2.5, the net $\{\mu_\alpha * \tilde{\mu}_\alpha\}$ converges weakly to $\mu * \tilde{\mu}$, and hence $\{\mu_\alpha * \tilde{\mu}_\alpha * f\}$ converges pointwise to $\mu * \tilde{\mu} * f$. Thus, the first assertion follows from Theorem 1.4.10. The second assertion follows from the first one by setting $d\mu(x) = h(x) \, dx$.

Lemma 1.5.4. Let $g \in L^2(\mathbb{R}^d)$. Then the function
\[
  f(t) := g * \tilde{g}(t) = \int_{\mathbb{R}^d} g(t + y) \overline{g(y)} \, dy, \qquad t \in \mathbb{R}^d,
\]
belongs to $P^c(\mathbb{R}^d)$ and it vanishes at infinity.

Proof. For all $t_1, \dots, t_n \in \mathbb{R}^d$, and for all $c_1, \dots, c_n \in \mathbb{C}$ we have
\[
  \sum_{i,j=1}^{n} f(t_i - t_j) c_i \overline{c_j}
  = \int_{\mathbb{R}^d} \sum_{i,j=1}^{n} g(t_i - t_j + y) \overline{g(y)} c_i \overline{c_j} \, dy
  = \int_{\mathbb{R}^d} \sum_{i,j=1}^{n} g(t_i + y) \overline{g(y + t_j)} c_i \overline{c_j} \, dy
\]
\[
  = \int_{\mathbb{R}^d} \left| \sum_{i=1}^{n} g(t_i + y) c_i \right|^2 dy \ge 0,
\]
i.e., $f$ is positive definite. The remaining statements follow from Theorem E.2.4.
[Figure 1.6. The function $p_a$ from Remark 1.5.5 with $a = 2$.]
Remark 1.5.5. Setting $d = 1$ and $g = \frac{1}{\sqrt{a}} 1_{[-a/2, a/2]}$, $a > 0$, in the previous lemma we see that the function
\[
  \Delta_a(t) :=
  \begin{cases}
    1 - \frac{|t|}{a} & \text{if } |t| \le a, \\
    0 & \text{if } |t| > a,
  \end{cases}
  \qquad t \in \mathbb{R},
\]
is positive definite. We show that the function
\[
  p_a(t) = \frac{a}{2\pi} \left( \frac{\sin(ta/2)}{ta/2} \right)^2, \qquad t \in \mathbb{R},
\]
is a density and the corresponding characteristic function is $\Delta_a$. Indeed, let $X$ and $Y$ be independent random variables having $a^{-1} 1_{[-a/2, a/2]}$ as their density. The characteristic functions of $X$ and $Y$ are given by
\[
  f_X(t) = f_Y(t) = \frac{\sin(ta/2)}{ta/2}.
\]
In view of Theorem 1.1.3, the characteristic function of their sum is
\[
  f_{X+Y}(t) = f_X(t) f_Y(t) = \left( \frac{\sin(ta/2)}{ta/2} \right)^2 = \frac{2\pi}{a} \, p_a(t).
\]
Applying (F.1.2), a short computation shows that the density of $X + Y$ is equal to $a^{-1} \Delta_a$. From Theorem 1.3.6 with $f$ replaced by $f_{X+Y}$ we conclude that $p_a$ is a density and the corresponding characteristic function is $\Delta_a$.$^{15}$

$^{15}$ The function $\Delta_a$ is an example of the so-called Pólya-type characteristic functions, which we will treat in more detail in Section 3.9.
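The first claim of Remark 1.5.5, that the autocorrelation of $g = a^{-1/2} 1_{[-a/2, a/2]}$ is the triangle function $\Delta_a$, can be checked with a crude midpoint rule. The function names, the number of nodes and the sample points below are assumptions of this sketch.

```python
import math

def triangle(t, a):
    """Delta_a(t) = 1 - |t|/a for |t| <= a and 0 otherwise."""
    return max(0.0, 1.0 - abs(t) / a)

def autocorrelation(t, a, n=2000):
    """Midpoint-rule approximation of int g(t + y) g(y) dy
    for g = a^{-1/2} * indicator of [-a/2, a/2]."""
    h = a / n
    hits = 0
    for k in range(n):
        y = -a / 2 + (k + 0.5) * h          # midpoint inside [-a/2, a/2]
        if abs(t + y) <= a / 2:             # g(t + y) is nonzero
            hits += 1
    return hits * h / a                      # each hit contributes (1/a) * h

a = 2.0
for t in (-2.5, -1.0, 0.0, 0.5, 1.7):
    print(t, autocorrelation(t, a), triangle(t, a))
```

The two printed columns should agree up to an error of the order of the step size $a/n$.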
Lemma 1.5.6. The following statements hold.$^{16}$
(i) There exists a sequence $\{g_n\}_1^\infty$ of functions $g_n \in P^c(\mathbb{R}^d)$ of the form $g_n = h_n * \tilde{h}_n$ where $h_n \in C_{00}(\mathbb{R}^d)$ such that $g_n(0) = 1$ and $\{g_n\}_1^\infty$ tends to 1 uniformly on compact sets.
(ii) There exists a sequence $\{g_n\}_1^\infty$ of functions $g_n \in P(\mathbb{Z}^d)$ of the form $g_n = h_n * \tilde{h}_n$ where $h_n$ is a complex-valued function on $\mathbb{Z}^d$ with finite support such that $g_n(0) = 1$ and $\{g_n\}_1^\infty$ tends pointwise to 1.

Proof. (i) For $n \in \mathbb{N}$ let $S_n := \{t \in \mathbb{R}^d : 0 \le t_j \le n, \; j = 1, \dots, d\}$ and
\[
  h_n := \frac{1}{\sqrt{\lambda(S_n)}} \, 1_{S_n}.
\]
For all $r \in [0, n]$ we have
\[
  \lambda(S_n \cap (S_n + x)) \ge (n - r)^d, \qquad x \in \mathbb{R}^d, \; |x_i| \le r.
\]
Using this inequality we see that the sequence $\{h_n * \tilde{h}_n\}_1^\infty$ has the desired properties.

(ii) We define $S_n := \{t \in \mathbb{Z}^d : 0 \le t_j \le n, \; j = 1, \dots, d\}$ and
\[
  h_n := \frac{1}{\sqrt{|S_n|}} \, 1_{S_n},
\]
where $|\cdot|$ denotes the counting measure. For all $r \in [0, n]$ we then have
\[
  |S_n \cap (S_n + x)| \ge (n + 1 - r)^d, \qquad x \in \mathbb{Z}^d, \; |x_i| \le r,
\]
and hence the sequence $\{h_n * \tilde{h}_n\}_1^\infty$ has the desired properties.

The next lemma follows immediately from the definition of positive definiteness; we omit the proof.

Lemma 1.5.7. Let $f \in P(\mathbb{R}^d)$ be Borel measurable and let $\mu$ be a finite, nonnegative measure on $\mathcal{B}^1$. Then the function $g$ defined by
\[
  g(t) = \int_{\mathbb{R}} f(xt) \, d\mu(x), \qquad t \in \mathbb{R}^d,
\]
is positive definite. In particular, the function $t \mapsto f(xt)$, $t \in \mathbb{R}^d$, is positive definite for all $x \in \mathbb{R}$.$^{17}$

$^{16}$ See also Lemma 3.2.3 and Lemma 3.2.4.
$^{17}$ See also Theorem 1.1.6.
Lemma 1.5.8. A continuous function $f$ on $\mathbb{R}^d$ is positive definite if and only if the inequality
$$\int_{\mathbb{R}^d}\int_{\mathbb{R}^d} f(x-y)\,h(x)\,\overline{h(y)}\,dx\,dy \ge 0$$
holds for arbitrary functions $h \in C_{00}(\mathbb{R}^d)$.

Proof. The 'only if part' follows from Lemma 1.5.3 with $t = 0$ and $d\mu(x) = h(x)\,dx$, using the fact that positive definite functions are nonnegative at 0. The assumption of the 'if part' means that $h * \tilde h * f(0) \ge 0$ for all $h \in C_{00}(\mathbb{R}^d)$. Let $\mu$ be an arbitrary finitely supported complex measure. It follows from Theorems E.1.9 and E.2.5 that there exists a sequence $\{h_n\}_1^\infty$ of functions $h_n \in C_{00}(\mathbb{R}^d)$ such that $\mu_n * \tilde\mu_n$ converges weakly to $\mu * \tilde\mu$, where $d\mu_n(x) = h_n(x)\,dx$. From this we conclude that $\mu * \tilde\mu * f(0) \ge 0$. Thus, the positive definiteness of $f$ follows from Lemma 1.4.3.

Theorem 1.5.9. If $f \in P^c(\mathbb{R}^d) \cap L^1(\mathbb{R}^d)$, then$^{18}$
$$\int_{\mathbb{R}^d} f(x)\,e^{i(t,x)}\,dx \ge 0, \qquad t \in \mathbb{R}^d.$$
If $f \in P(\mathbb{Z}^d) \cap L^1(\mathbb{Z}^d)$, then
$$\sum_{n \in \mathbb{Z}^d} f(n)\,e^{i(t,n)} \ge 0, \qquad t \in \mathbb{R}^d.$$

Proof. Since the integrand is positive definite for all fixed $t$, it suffices to prove that
$$\int_{\mathbb{R}^d} f(x)\,dx \ge 0$$
for all $f \in P^c(\mathbb{R}^d) \cap L^1(\mathbb{R}^d)$. To show this inequality we choose a sequence $\{g_n\}$ as in Lemma 1.5.6. Then $|f g_n| \le |f|$ and Lebesgue's theorem on dominated convergence shows that
$$\int_{\mathbb{R}^d} f(x)\,dx = \lim_{n\to\infty} \int_{\mathbb{R}^d} f(x)\,g_n(x)\,dx \ge 0.$$
Applying Fubini's theorem and Lemma 1.5.8 we obtain
$$\int_{\mathbb{R}^d} f(x)\,g_n(x)\,dx = \int_{\mathbb{R}^d} f(x) \int_{\mathbb{R}^d} h_n(x+y)\,\overline{h_n(y)}\,dy\,dx = \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} f(x)\,h_n(x+y)\,\overline{h_n(y)}\,dx\,dy = \int_{\mathbb{R}^d}\int_{\mathbb{R}^d} f(x-y)\,h_n(x)\,\overline{h_n(y)}\,dx\,dy \ge 0.$$
The second statement can be proved in the same way by replacing the integrals by sums.

18 See Theorems 1.8.13 and 1.9.6 for closely related results.
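The second statement of Theorem 1.5.9 can be illustrated for $d = 1$ with the summable sequence $f(n) = \rho^{|n|}$, $0 < \rho < 1$, a standard example of a positive definite function on $\mathbb{Z}$. This choice of $f$ and the helper names below are assumptions made only for the illustration. The series $\sum_n \rho^{|n|} e^{itn}$ has the closed form $(1-\rho^2)/(1 - 2\rho\cos t + \rho^2)$ (the Poisson kernel), which is visibly nonnegative.

```python
import cmath
import math

def truncated_series(rho, t, terms=200):
    """Partial sum of sum_{n in Z} rho^{|n|} e^{itn} over |n| <= terms."""
    return sum(rho ** abs(n) * cmath.exp(1j * t * n)
               for n in range(-terms, terms + 1))

def poisson_kernel(rho, t):
    """Closed form (1 - rho^2) / (1 - 2 rho cos t + rho^2) of the full series."""
    return (1 - rho ** 2) / (1 - 2 * rho * math.cos(t) + rho ** 2)

if __name__ == "__main__":
    rho = 0.5
    for t in (0.0, 1.0, math.pi):
        s = truncated_series(rho, t)
        # Real part of the partial sum next to the closed form.
        print(t, s.real, poisson_kernel(rho, t))
```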
Section 1.5 Further properties of positive definite functions on Rd
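The next lemma will be used in the proof of Theorem 1.5.11.

Lemma 1.5.10. Let $K \subset \mathbb{R}^d$ be compact and let $\{g_n\}_1^\infty$ be a uniformly bounded sequence of Lebesgue measurable functions on $\mathbb{R}^d$ converging $\lambda$-almost everywhere to a function $g$. Then the sequence $\{\varphi_n\}_1^\infty$ with
$$\varphi_n(t) = \int_K g_n(t+x)\,dx, \qquad t \in \mathbb{R}^d,$$
converges uniformly on every compact set to the function $\varphi$ defined by
$$\varphi(t) = \int_K g(t+x)\,dx, \qquad t \in \mathbb{R}^d.$$

Proof. We may assume that $\lambda(K) > 0$, $g = 0$ and $|g_n| \le 1$. Suppose, on the contrary, that for some compact set $C \subset \mathbb{R}^d$ the sequence $\{\varphi_n\}$ does not converge uniformly to 0 on $C$. Then there exist a positive number $\delta$, a sequence $\{t_n\}_1^\infty \subset C$, and a subsequence $\{\varphi_{k_n}\}_{n=1}^\infty$ such that $|\varphi_{k_n}(t_n)| \ge \delta$. We claim that the Lebesgue measure of the set
$$S_n = \Big\{ x \in K : |g_{k_n}(t_n + x)| \ge \frac{\delta}{2\lambda(K)} \Big\}$$
is at least $\delta/2$. Indeed, the inequality $\lambda(S_n) < \delta/2$ would imply that
$$|\varphi_{k_n}(t_n)| \le \int_{S_n} |g_{k_n}(t_n + x)|\,dx + \int_{K \setminus S_n} |g_{k_n}(t_n + x)|\,dx \le \lambda(S_n) + \lambda(K)\,\frac{\delta}{2\lambda(K)} < \delta,$$
which contradicts $|\varphi_{k_n}(t_n)| \ge \delta$. By the definition of $S_n$ we have $|g_{k_n}(y)| \ge \delta/(2\lambda(K)) > 0$ for all $y \in \tilde S_n := S_n + t_n \subset K + C$, and $\lambda(\tilde S_n) = \lambda(S_n) \ge \delta/2$. Consequently,
$$\lambda(\limsup \tilde S_n) = \lambda\Big( \bigcap_{n=1}^\infty \bigcup_{m=n}^\infty \tilde S_m \Big) \ge \frac{\delta}{2},$$
which is a contradiction since the sequence $\{g_n\}_1^\infty$ does not converge to 0 on the set $\limsup \tilde S_n$, which is contained in the compact set $K + C$.

We close this section with an important result concerning the convergence of positive definite functions.

Theorem 1.5.11. If a sequence $\{f_n\}_1^\infty$ of functions $f_n \in P^c(\mathbb{R}^d)$ converges pointwise to a continuous function $f$, then the convergence is uniform on every compact subset of $\mathbb{R}^d$.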
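Proof. Let $C \subset \mathbb{R}^d$ be an arbitrary compact subset. For a bounded complex-valued function $h$ on $C$ write $\|h\|_C := \sup\{ |h(t)| : t \in C \}$. Using this notation we have to show that $\|f_n - f\|_C \to 0$. We may assume that $f_n(0) = f(0) = 1$. Since $f$ is continuous at 0, for every $\varepsilon > 0$ there exists a compact set $K$ with positive Lebesgue measure satisfying
$$\frac{2}{\lambda(K)} \int_K [1 - \operatorname{Re} f(x)]\,dx < \frac{\varepsilon^2}{9}. \qquad (1)$$
Since $f_n(x) \to f(x)$, this inequality also holds for $f_n$ if $n$ is sufficiently large, say $n \ge N_0$. Define $\varphi$ and $\varphi_n$ as in Lemma 1.5.10 with $g$ and $g_n$ replaced by $f$ and $f_n$, respectively. We show that
$$\Big\| \frac{1}{\lambda(K)}\varphi - f \Big\|_C < \frac{\varepsilon}{3} \quad\text{and}\quad \Big\| \frac{1}{\lambda(K)}\varphi_n - f_n \Big\|_C < \frac{\varepsilon}{3}. \qquad (2)$$
Indeed, using the Cauchy–Schwarz inequality, (1.4.12.ii), and (1) we obtain
$$\Big| f(t) - \frac{1}{\lambda(K)}\varphi(t) \Big|^2 = \Big| \frac{1}{\lambda(K)} \int_K f(t)\,dx - \frac{1}{\lambda(K)} \int_K f(t+x)\,dx \Big|^2 = \frac{1}{\lambda(K)^2} \Big| \int_K [f(t) - f(t+x)]\,dx \Big|^2$$
$$\le \frac{1}{\lambda(K)^2} \int_K 1\,dx \int_K |f(t) - f(t+x)|^2\,dx \le \frac{2}{\lambda(K)} \int_K [1 - \operatorname{Re} f(x)]\,dx < \frac{\varepsilon^2}{9}.$$
The second inequality in (2) can be proved in the same way. By Lemma 1.5.10 (note that $|f_n(t)| \le f_n(0) = 1$), there exists $N \ge N_0$ such that
$$\Big\| \frac{1}{\lambda(K)}\varphi_n - \frac{1}{\lambda(K)}\varphi \Big\|_C < \frac{\varepsilon}{3} \quad\text{if } n \ge N. \qquad (3)$$
From (2) and (3) we infer that $\|f - f_n\|_C < \varepsilon$, $n \ge N$.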
1.6 Lévy’s continuity theorem

We now give necessary and sufficient conditions for the weak convergence of probability distributions in terms of their characteristic functions.

Theorem 1.6.1. If a sequence $\{\mu_n\}_1^\infty$ of probability distributions on $\mathbb{R}^d$ converges weakly to a probability distribution $\mu$, then the sequence $\{f_n\}_1^\infty$ of the corresponding characteristic functions converges uniformly on every compact set to the characteristic function of $\mu$.
Proof. By weak convergence (cf. Theorem E.1.7),
$$\lim_{n\to\infty} f_n(t) = \lim_{n\to\infty} \int_{\mathbb{R}^d} e^{i(t,x)}\,d\mu_n(x) = \int_{\mathbb{R}^d} e^{i(t,x)}\,d\mu(x), \qquad t \in \mathbb{R}^d.$$
That the convergence is uniform on compact sets follows from Theorem 1.5.11.

Lemma 1.6.2. Let $f$ be the characteristic function of a distribution $\mu$ on $\mathbb{R}^d$. For $a > 0$ we then have
$$\mu(K_a) \ge 1 - 7 \sup_{t \in K_{1/a}} [1 - \operatorname{Re} f(t)]$$
where $K_a = [-a,a]^d$.

Proof. Using Fubini's theorem and the example after Definition 1.1.1 we obtain
$$\Big(\frac{2}{a}\Big)^d \sup_{t \in K_{1/a}} [1 - \operatorname{Re} f(t)] \ge \int_{K_{1/a}} [1 - \operatorname{Re} f(t)]\,dt = \int_{K_{1/a}} \int_{\mathbb{R}^d} [1 - \cos((t,y))]\,d\mu(y)\,dt = \int_{\mathbb{R}^d} \int_{K_{1/a}} [1 - \cos((t,y))]\,dt\,d\mu(y)$$
$$= \Big(\frac{2}{a}\Big)^d \int_{\mathbb{R}^d} \Big[ 1 - \prod_{j=1}^d \frac{\sin(y_j/a)}{y_j/a} \Big]\,d\mu(y) \ge \Big(\frac{2}{a}\Big)^d \int_{\mathbb{R}^d \setminus K_a} \Big[ 1 - \prod_{j=1}^d \frac{\sin(y_j/a)}{y_j/a} \Big]\,d\mu(y).$$
If $y \in \mathbb{R}^d \setminus K_a$, then $|y_j/a| \ge 1$ for at least one $j$. Since
$$\frac{\sin x}{x} < \frac{101}{120} < \frac{6}{7} \quad\text{if } |x| \ge 1 \qquad\text{and}\qquad \frac{\sin x}{x} < 1 \quad\text{if } x \in \mathbb{R} \setminus \{0\}$$
(cf. (B.1.6.5) and (B.1.6.1)), the last integrand is not less than $\frac17$ on the set $\mathbb{R}^d \setminus K_a$. We thus obtain
$$\sup_{t \in K_{1/a}} [1 - \operatorname{Re} f(t)] \ge \frac17\,[1 - \mu(K_a)],$$
from which the lemma follows.

The next theorem is frequently applied in probability theory.

Theorem 1.6.3 (Lévy's continuity theorem). Let $\{\mu_n\}_1^\infty$ be a sequence of probability distributions on $\mathbb{R}^d$ such that the sequence $\{f_n\}_1^\infty$ of the corresponding characteristic functions converges pointwise to a continuous function $f$. Then $f$ is the characteristic function of a probability distribution $\mu$ and the sequence $\{\mu_n\}_1^\infty$ converges weakly to $\mu$.

Proof. In view of Theorem E.1.13, the sequence $\{\mu_n\}_{n=1}^\infty$ contains a subsequence $\{\mu_{n_k}\}_{k=1}^\infty$ converging vaguely to some finite nonnegative measure $\mu$ satisfying $\mu(\mathbb{R}^d) \le 1$. We show that $\mu(\mathbb{R}^d) = 1$. By Lemma 1.6.2, the inequality
$$\mu_{n_k}(K_m) \ge 1 - 7 \sup_{t \in K_{1/m}} [1 - \operatorname{Re} f_{n_k}(t)]$$
holds for all $k$ and $m \in \mathbb{N}$. Taking $\limsup$ with respect to $k$ and using Theorem E.1.11 we obtain
$$\mu(K_m) \ge \limsup_k \mu_{n_k}(K_m) \ge 1 - 7 \sup_{t \in K_{1/m}} [1 - \operatorname{Re} f(t)].$$
Now letting $m \to \infty$ and noting that $f$ is continuous and $f(0) = 1$, we see that $\mu(\mathbb{R}^d) \ge 1$. Thus, $\mu(\mathbb{R}^d) = 1$ and hence $\{\mu_{n_k}\}_{k=1}^\infty$ converges weakly to $\mu$ (cf. Theorem E.1.12). From Theorem 1.6.1 we conclude that $f$ is the characteristic function of $\mu$. Since $\mu$ is uniquely determined by $f$, we see that every weakly convergent subsequence of $\{\mu_n\}_1^\infty$ has the same limit $\mu$. This shows that the whole sequence converges weakly to $\mu$.

Remark 1.6.4. In view of Theorem 1.4.10 and Corollary 1.5.2, it suffices to assume continuity of $\operatorname{Re} f$ at 0 in the previous theorem. This assumption cannot be dropped. Indeed, with the notation introduced in Remark 1.5.5, the functions introduced there with parameter $1/n$ tend to $1_{\{0\}}$ as $n \to \infty$. The limit is not a characteristic function though it is continuous at every point different from zero.

The next theorem is usually called the Cramér–Wold device.

Theorem 1.6.5. A sequence $\{X_n\}$ of $d$-dimensional random vectors converges in distribution to a random vector $X$ if and only if for all $a \in \mathbb{R}^d$ the sequence $\{(a, X_n)\}$ of random variables converges in distribution to $(a, X)$.

Proof. Suppose that $X_n \to X$ in distribution. For any $a \in \mathbb{R}^d$ the function $x \mapsto (a,x)$ is continuous on $\mathbb{R}^d$. Applying Corollary E.1.16 we see that $(a, X_n) \to (a, X)$ in distribution. Conversely, suppose that $(a, X_n) \to (a, X)$ in distribution for all $a \in \mathbb{R}^d$. Denoting by $f_{a,n}$ and $f_a$ the corresponding characteristic functions, Theorem 1.6.1 shows that
$$\lim_n f_{a,n}(s) = f_a(s), \qquad s \in \mathbb{R}.$$
Using Remark 1.1.10 we obtain
$$f_{X_n}(a) = f_{a,n}(1) \to f_a(1) = f_X(a), \qquad a \in \mathbb{R}^d.$$
Hence, by Lévy's continuity theorem, $X_n \to X$ in distribution.

The next result will be used in the proof of Schoenberg's Theorem 3.8.5.

Proposition 1.6.6. Let $p, p_n$, $n \in \mathbb{N}$, be probability densities on $\mathbb{R}^d$ with corresponding characteristic functions $f$ and $f_n$, respectively. If $p_n \to p$ pointwise, then $f_n \to f$ uniformly on $\mathbb{R}^d$.

Proof. We have
$$|f_n(rt) - f(rt)| = \Big| \int_{\mathbb{R}^d} e^{i(rt,x)} p_n(x)\,dx - \int_{\mathbb{R}^d} e^{i(rt,x)} p(x)\,dx \Big| \le \int_{\mathbb{R}^d} |p_n(x) - p(x)|\,dx.$$
By Scheffé's Lemma E.1.19 the integral above tends to zero as $n \to \infty$. This completes the proof.
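Lévy's continuity theorem (Theorem 1.6.3) and the uniform convergence on compact sets from Theorem 1.6.1 can be observed in a classical special case. The sketch below assumes independent random variables $X_k$ with $P(X_k = \pm 1) = 1/2$; the normalized sum $(X_1 + \dots + X_n)/\sqrt{n}$ then has characteristic function $\cos(t/\sqrt{n})^n$, which tends to $e^{-t^2/2}$, the characteristic function of the standard normal distribution. The function names are illustrative only.

```python
import math

def f_n(n, t):
    """Characteristic function of (X_1 + ... + X_n) / sqrt(n), X_k = ±1 with probability 1/2."""
    return math.cos(t / math.sqrt(n)) ** n

def f_limit(t):
    """Characteristic function of the standard normal distribution."""
    return math.exp(-t * t / 2)

def sup_error(n, radius=5.0, grid=1001):
    """Maximum of |f_n - f| over an equally spaced grid on [-radius, radius]."""
    points = (-radius + 2 * radius * k / (grid - 1) for k in range(grid))
    return max(abs(f_n(n, t) - f_limit(t)) for t in points)

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, sup_error(n))   # the error on [-5, 5] shrinks as n grows
```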
1.7 The theorems of Bochner and Herglotz

We already know that functions of the form
$$f(t) = \sum_{j=1}^n p_j\,\gamma_j(t), \qquad t \in G,$$
where $G$ is a commutative group, $p_j \ge 0$, and $\gamma_j$ is a character of $G$, are positive definite. We will show that every positive definite function on $G$ can be obtained if we replace the weighted sum above by an integral with respect to a nonnegative measure. First we consider the case $G = \mathbb{R}^d$.

Lemma 1.7.1. Let $\mu$ be a probability measure on $\mathbb{R}^d$ and for $n \in \mathbb{N}$ define the probability measure $\mu_n$ by $\mu_n(B) = \mu(nB)$, $B \in \mathcal{B}_d$. Then
$$\lim_{n\to\infty} \mu_n * g(t) = g(t), \qquad t \in \mathbb{R}^d,$$
holds for any bounded, continuous function $g$ on $\mathbb{R}^d$.

Proof. Let $\varepsilon > 0$ and $t \in \mathbb{R}^d$ be arbitrary and choose a bounded, open neighborhood $U$ of 0 such that
$$|g(t) - g(t-x)| < \varepsilon, \qquad x \in U.$$
We have
$$|g(t) - \mu_n * g(t)| = \Big| \int_{\mathbb{R}^d} [g(t) - g(t-x)]\,d\mu_n(x) \Big| \le \varepsilon\,\mu_n(U) + 2\|g\|_\infty\,\mu_n(\mathbb{R}^d \setminus U) \le \varepsilon + 2\|g\|_\infty\,\mu(\mathbb{R}^d \setminus nU).$$
The lemma follows now from the fact that $\lim_{n\to\infty} \mu(\mathbb{R}^d \setminus nU) = 0$.

Applying the previous lemma to an absolutely continuous measure we immediately obtain the following corollary:

Corollary 1.7.2. Let $p$ be a probability density on $\mathbb{R}^d$ and for $n \in \mathbb{N}$ put
$$p_n(t) = n^d p(nt), \qquad t \in \mathbb{R}^d.$$
Then
$$\lim_{n\to\infty} p_n * g(t) = g(t), \qquad t \in \mathbb{R}^d,$$
holds for any bounded, continuous function $g$ on $\mathbb{R}^d$.

We are now able to characterize continuous positive definite functions on $\mathbb{R}^d$. We formulate the result in terms of characteristic functions.

Theorem 1.7.3 (Bochner). A continuous complex-valued function $f$ on $\mathbb{R}^d$ is a characteristic function if and only if $f(0) = 1$ and $f$ is positive definite.

Proof. We have already seen in Theorem 1.1.2 that characteristic functions are positive definite. Conversely, let $f$ be a continuous positive definite function with $f(0) = 1$ and assume first that $f$ is integrable. Further, let $g_\sigma$ and $\varphi_\sigma$ be as in Lemma 1.3.4. Applying Fubini's theorem we obtain
$$f * g_\sigma(t) = \int_{\mathbb{R}^d} f(x)\,g_\sigma(t-x)\,dx = \int_{\mathbb{R}^d} f(x) \int_{\mathbb{R}^d} e^{i(t-x,y)} \varphi_\sigma(y)\,dy\,dx = \int_{\mathbb{R}^d} \varphi_\sigma(y) \int_{\mathbb{R}^d} f(x)\,e^{-i(y,x)}\,dx\; e^{i(t,y)}\,dy =: \int_{\mathbb{R}^d} \varphi_\sigma(y)\,q(y)\,e^{i(t,y)}\,dy.$$
Theorem 1.5.9 shows that $q(y) \ge 0$. Since $q(y) \le \int |f(x)|\,dx$, the function $y \mapsto \varphi_\sigma(y)\,q(y)$ is integrable. From this, using (1.3.4.1), we conclude that
$$\frac{f * g_\sigma}{f * g_\sigma(0)} = \frac{f * \varphi_{1/\sigma}}{f * \varphi_{1/\sigma}(0)} =: h_{1/\sigma}$$
is a characteristic function. By Corollary 1.7.2, the sequence $\{h_{1/n}\}_1^\infty$ converges pointwise to $f$. Application of Lévy's continuity Theorem 1.6.3 shows that $f$ is a characteristic function. In the general case where $f$ is not supposed to be integrable, we consider the functions $f g_{1/n}$, which are positive definite and integrable. Moreover, $f(t) = \lim_n f(t)\,g_{1/n}(t)$, $t \in \mathbb{R}^d$. A second application of Lévy's continuity theorem completes the proof.

An alternative formulation of Bochner's theorem is the following.

Theorem 1.7.4. A continuous complex-valued function $f$ on $\mathbb{R}^d$ is positive definite if and only if it can be represented in the form
$$f(t) = \int_{\mathbb{R}^d} e^{i(t,x)}\,d\mu(x), \qquad t \in \mathbb{R}^d,$$
with some nonnegative finite Borel measure $\mu$ on $\mathbb{R}^d$. The measure $\mu$ is uniquely determined by $f$.

In the case $d = 1$ the next theorem is due to G. Herglotz. We will deduce it from the more general Theorem 1.7.8.$^{19}$

Theorem 1.7.5. A complex-valued function $f$ on $\mathbb{Z}^d$ is positive definite if and only if it can be represented in the form
$$f(n) = \int_{[0,2\pi)^d} e^{i(n,t)}\,d\mu(t), \qquad n \in \mathbb{Z}^d,$$
with some nonnegative finite Borel measure $\mu$ on $[0,2\pi)^d$. The measure $\mu$ is uniquely determined by $f$.

Remark 1.7.6. Let $G$ be an arbitrary commutative group and denote by $L = L(G)$ the complex linear space of all bounded complex-valued functions on $G$. Equipped with the topology of pointwise convergence, $L$ becomes a locally convex linear space (cf. Section D.4). Notice that for fixed $x \in G$ the function $h \mapsto h(x)$ is a continuous linear functional on $L$. The convex set $P_0(G) = \{ f \in P(G) : f(0) = 1 \}$ is a closed subset of $L$. Since $|f(x)| \le 1$, $x \in G$, this set is compact (cf. Theorem B.2.8). We have already seen (cf. Theorem 1.4.20) that the set of all extreme points of $P_0(G)$ is equal to the set of all characters of $G$, which we will denote by $\hat G$. The set $\hat G$ is a closed subset of $P_0(G)$. Hence it is compact.

19 The formulations of Theorems 1.7.4, 1.7.5, and 1.7.8 can be unified within the framework of Fourier transformation on locally compact commutative groups, see for example [49].
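The easy direction of Theorem 1.7.4 (a function of the stated integral form is positive definite) rests on the identity $\sum_{j,k} f(t_j - t_k)\,c_j \overline{c_k} = \int |\sum_j c_j e^{i(t_j,x)}|^2\,d\mu(x)$. The sketch below checks this identity for $d = 1$ and a discrete measure with randomly chosen atoms and nonnegative masses. All names and the random test data are assumptions made for illustration.

```python
import cmath
import random

def f(t, atoms, weights):
    """f(t) = sum_m p_m exp(i t x_m): the integral of exp(i t x) against a discrete measure."""
    return sum(p * cmath.exp(1j * t * x) for x, p in zip(atoms, weights))

def quadratic_form(ts, cs, atoms, weights):
    """sum_{j,k} f(t_j - t_k) c_j conj(c_k)."""
    return sum(f(tj - tk, atoms, weights) * cj * ck.conjugate()
               for tj, cj in zip(ts, cs) for tk, ck in zip(ts, cs))

def integrated_square(ts, cs, atoms, weights):
    """sum_m p_m |sum_j c_j exp(i t_j x_m)|^2, which is nonnegative by construction."""
    return sum(p * abs(sum(c * cmath.exp(1j * t * x) for t, c in zip(ts, cs))) ** 2
               for x, p in zip(atoms, weights))

def random_instance(seed, n_atoms=4, n_points=6):
    """Random atoms, nonnegative masses, points t_j and complex coefficients c_j."""
    rng = random.Random(seed)
    atoms = [rng.uniform(-3, 3) for _ in range(n_atoms)]
    weights = [rng.uniform(0, 1) for _ in range(n_atoms)]
    ts = [rng.uniform(-5, 5) for _ in range(n_points)]
    cs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_points)]
    return ts, cs, atoms, weights

if __name__ == "__main__":
    inst = random_instance(0)
    print(quadratic_form(*inst), integrated_square(*inst))
```

Both printed numbers should agree up to rounding, and the second one is a sum of nonnegative terms.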
Remark 1.7.7. If $G = \mathbb{Z}^d$ then we can identify $\hat G$ with the set $\mathbb{T}^d$ (see Lemma 1.4.7). Moreover, the topology of pointwise convergence on $\hat G$ is the same as the metric topology of $\mathbb{T}^d$. Since every $z \in \mathbb{T}$ can uniquely be written as $z = e^{it}$ with some $t \in [0,2\pi)$, we can identify $\hat G$ also with $[0,2\pi)^d$. This is sometimes more convenient. However, in this case the topology of pointwise convergence is not the same as the metric topology (note that $[0,2\pi)^d$ is not compact). Nevertheless, both topologies generate the same Borel sets.

Theorem 1.7.8. A complex-valued function $f$ on a commutative group $G$ is positive definite if and only if it can be represented in the form
$$f(x) = \int_{\hat G} \gamma(x)\,d\mu(\gamma), \qquad x \in G,$$
with some nonnegative finite Radon measure $\mu$ on $\hat G$. The measure $\mu$ is uniquely determined by $f$.$^{20}$

Proof. If $f$ has the form above, then
$$\sum_{i,j=1}^n f(x_i - x_j)\,c_i \overline{c_j} = \int_{\hat G} \Big| \sum_{i=1}^n c_i\,\gamma(x_i) \Big|^2 d\mu(\gamma) \ge 0,$$
showing that $f$ is positive definite. To prove the other direction we may assume that $f \in P_0(G)$. By Theorem D.4.6 (see also Remark 1.7.6), there exists a probability measure $\mu = \mu_f$ on $\hat G$ such that
$$l(f) = \int_{\hat G} l(\gamma)\,d\mu(\gamma)$$
holds for every continuous linear functional $l$ on $L(G)$. Taking here the linear functionals $l_x$ defined by $l_x(h) = h(x)$, $h \in L(G)$, we obtain the desired representation of $f$.

To prove uniqueness, assume that $\nu$ is a finite Radon measure on $\hat G$ such that
$$\int_{\hat G} \gamma(x)\,d\mu(\gamma) = \int_{\hat G} \gamma(x)\,d\nu(\gamma), \qquad x \in G.$$
Denote by $A$ the linear space of (continuous) functions on $\hat G$ spanned by all functions of the form $\gamma \mapsto \gamma(x)$, $\gamma \in \hat G$, where $x \in G$. By the equation above, $\int g\,d\mu = \int g\,d\nu$ for all $g \in A$. It is easy to check that $A$ satisfies the first two conditions of Theorem B.2.4 (Stone–Weierstraß theorem). To show that the last condition is satisfied, let $x, y \in G$ be such that $x \ne y$. Applying the first part of the proof to the positive definite function $1_{\{0\}}$ we see that there exists a probability measure $\lambda$ on $\hat G$ such that
$$0 = 1_{\{0\}}(x - y) = \int_{\hat G} \gamma(x - y)\,d\lambda(\gamma).$$
This implies the existence of $\gamma \in \hat G$ such that $\gamma(x - y) \ne 1$, from which $\gamma(x) \ne \gamma(y)$ follows. Thus, $A$ satisfies the conditions of the Stone–Weierstraß theorem (cf. Theorem B.2.4) and therefore an arbitrary function $g \in C(\hat G)$ can be uniformly approximated by functions from $A$. This shows that $\int g\,d\mu = \int g\,d\nu$ for all $g \in C(\hat G)$, and hence $\mu = \nu$.

Applying Theorem E.1.8$^{21}$ we obtain the following corollary:

Corollary 1.7.9. Every positive definite function $f$ on $G$ is the pointwise limit of a net $\{f_\alpha\}$ of positive definite functions of the form
$$f_\alpha = \sum_{i=1}^{n_\alpha} p_i^\alpha\,\gamma_i^\alpha$$
where $n_\alpha \in \mathbb{N}$, $p_i^\alpha \ge 0$, $\sum_{i=1}^{n_\alpha} p_i^\alpha = f(0)$ and $\gamma_i^\alpha \in \hat G$.

20 The measure $\mu$ is sometimes called a representing measure or spectral measure of $f$.
The next statement has been shown in the proof of Theorem 1.7.8. We state it explicitly because of its importance.

Corollary 1.7.10. If $x$ and $y$ are distinct elements of $G$, then there exists a character $\gamma$ of $G$ such that $\gamma(x) \ne \gamma(y)$.

Another nice corollary of Theorem 1.7.8 is the existence and uniqueness of a translation invariant probability measure on $\hat G$, which is called the normalized Haar measure.

Corollary 1.7.11. There exists a unique Radon probability measure $\lambda$ on $\hat G$ such that
$$\int_{\hat G} f(\gamma)\,d\lambda(\gamma) = \int_{\hat G} f(\gamma\chi)\,d\lambda(\gamma), \qquad f \in C(\hat G),\ \chi \in \hat G.$$

Proof. Let $\lambda$ and $A$ be as in the proof of Theorem 1.7.8. It follows from the definition of $\lambda$ that the equation above holds for all $f \in A$, while the density property of $A$ implies that it holds for all $f \in C(\hat G)$. Assume that $\nu$ is a Radon probability measure satisfying
$$\int_{\hat G} f(\gamma)\,d\nu(\gamma) = \int_{\hat G} f(\gamma\chi)\,d\nu(\gamma), \qquad f \in C(\hat G),\ \chi \in \hat G.$$
Setting $f(\gamma) := \gamma(x)$ where $x \in G$ we obtain
$$\int_{\hat G} \gamma(x)\,d\nu(\gamma) = \int_{\hat G} (\gamma\chi)(x)\,d\nu(\gamma) = \chi(x) \int_{\hat G} \gamma(x)\,d\nu(\gamma), \qquad x \in G,\ \chi \in \hat G.$$
We have shown in the proof of Theorem 1.7.8 that for all $x \ne 0$ there exists $\chi \in \hat G$ such that $\chi(x) \ne \chi(0) = 1$. Using this and the relation above we conclude that
$$\int_{\hat G} \gamma(x)\,d\nu(\gamma) = 1_{\{0\}}(x), \qquad x \in G,$$
and hence $\nu = \lambda$.

Example 1.7.12. Identifying the character group of $\mathbb{Z}^d$ with $[0,2\pi)^d$, the restriction of the Lebesgue measure to $[0,2\pi)^d$ is translation invariant in the sense of the previous corollary. Thus, dividing this restriction by $(2\pi)^d$ we obtain the normalized Haar measure.

Corollary 1.7.13. If $\mu \in M_b(G)$ is such that
$$\int_G \gamma(x)\,d\mu(x) = 0, \qquad \gamma \in \hat G,$$
then $\mu = 0$.

Proof. For all $y \in G$ we have
$$\mu(\{y\}) = \int_G 1_{\{0\}}(x - y)\,d\mu(x) = \int_G \int_{\hat G} \gamma(x - y)\,d\lambda(\gamma)\,d\mu(x) = \int_{\hat G} \overline{\gamma(y)} \int_G \gamma(x)\,d\mu(x)\,d\lambda(\gamma) = 0.$$

21 The corollary also follows from Kreĭn–Milman's Theorem D.4.5.
Proof of Theorem 1.7.5. We identify the character group of $\mathbb{Z}^d$ with $[0,2\pi)^d$ (see Remark 1.7.7). Since the topology of pointwise convergence generates the same Borel $\sigma$-algebra as the metric topology of $[0,2\pi)^d$, Theorem 1.7.5 follows immediately from Theorem 1.7.8.

Alternative proof of Theorem 1.7.5 in the case $d = 1$. We show here only the existence of $\mu$. Definition 1.4.1 with $c_j = e^{-ijx}$, $x \in \mathbb{R}$, and $t_j = j$ gives
$$0 \le P_N(x) := \frac{1}{N} \sum_{j=1}^N \sum_{k=1}^N f(j-k)\,e^{-i(j-k)x} = \sum_{m=-N}^{N} \Big( 1 - \frac{|m|}{N} \Big) f(m)\,e^{-imx}$$
and therefore
$$\frac{1}{2\pi} \int_0^{2\pi} e^{inx} P_N(x)\,dx = \frac{1}{2\pi} \sum_{m=-N}^{N} \Big( 1 - \frac{|m|}{N} \Big) f(m) \int_0^{2\pi} e^{i(n-m)x}\,dx = \Big( 1 - \frac{|n|}{N} \Big)_+ f(n)$$
for all $n \in \mathbb{Z}$. Consequently, the equation
$$d\mu_N(x) := \frac{1}{2\pi} P_N(x)\,dx, \qquad x \in [0,2\pi),$$
defines a nonnegative measure on $[0,2\pi)$ such that
$$\int_{[0,2\pi)} e^{inx}\,d\mu_N(x) = \Big( 1 - \frac{|n|}{N} \Big)_+ f(n). \qquad (1)$$
In particular, $\mu_N([0,2\pi)) = f(0)$. By Theorem E.1.13, the sequence $\{\mu_N\}$ contains a subsequence converging weakly to some nonnegative measure $\mu$.$^{22}$ From (1) we conclude that
$$\int_{[0,2\pi)} e^{inx}\,d\mu(x) = f(n).$$
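The two facts used in the alternative proof, namely $P_N \ge 0$ and the formula for the Fourier coefficients of $P_N$, can be checked numerically. The sketch below assumes the positive definite sequence $f(m) = \rho^{|m|}$ with $\rho = 1/2$ as test data, and it evaluates the integral by an equally spaced rectangle rule, which is exact up to rounding for trigonometric polynomials of degree below the number of nodes. Function names are illustrative.

```python
import cmath
import math

def fejer_sum(N, x, f):
    """P_N(x) = sum_{|m| <= N} (1 - |m|/N) f(m) exp(-i m x)."""
    return sum((1 - abs(m) / N) * f(m) * cmath.exp(-1j * m * x)
               for m in range(-N, N + 1))

def coefficient(N, n, f, nodes=256):
    """(1/2π) ∫_0^{2π} exp(i n x) P_N(x) dx, computed by the rectangle rule."""
    xs = (2 * math.pi * k / nodes for k in range(nodes))
    return sum(cmath.exp(1j * n * x) * fejer_sum(N, x, f) for x in xs) / nodes

def f_example(m, rho=0.5):
    """Assumed test sequence f(m) = rho^{|m|}."""
    return rho ** abs(m)

if __name__ == "__main__":
    N = 20
    for n in (0, 3, -7, 25):
        expected = max(1 - abs(n) / N, 0.0) * f_example(n)
        print(n, coefficient(N, n, f_example).real, expected)
```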
1.8 Fourier transformation on $\mathbb{R}^d$

The aim of the present section is to prove some basic and frequently used results on the Fourier transform on $\mathbb{R}^d$. Our previous results on inversion theorems, on positive definite functions and Bochner's theorem enable us to give short proofs.

Definition 1.8.1. For $g \in L^1(\mathbb{R}^d)$ or $g \in \mathcal{L}^1(\mathbb{R}^d)$ let
$$\hat g(t) = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} g(x)\,e^{-i(t,x)}\,dx, \qquad t \in \mathbb{R}^d,$$
and
$$\check g(t) = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} g(x)\,e^{i(t,x)}\,dx, \qquad t \in \mathbb{R}^d.$$
The function $\hat g$ is called the Fourier transform of $g$ while $\check g$ is the inverse Fourier transform of $g$. In some cases it is convenient to consider the Fourier transformation on the Banach space $L^1(\mathbb{R}^d)$, in other cases it is more natural to deal with $\mathcal{L}^1(\mathbb{R}^d)$, the elements of which are functions and not equivalence classes of functions.
22 This follows also from Helly's selection theorem.
For a finite complex measure $\mu$ on $\mathbb{R}^d$ we define the Fourier–Stieltjes transform of $\mu$ by
$$\hat\mu(t) = \int_{\mathbb{R}^d} e^{-i(t,x)}\,d\mu(x), \qquad t \in \mathbb{R}^d,$$
while
$$\check\mu(t) = \int_{\mathbb{R}^d} e^{i(t,x)}\,d\mu(x), \qquad t \in \mathbb{R}^d,$$
is the inverse Fourier–Stieltjes transform of $\mu$.

Note that if $\varphi$ is a density, then the corresponding characteristic function is equal to $(2\pi)^{d/2}\check\varphi$. On the other hand, if $\mu$ is the distribution of a random vector, then the corresponding characteristic function is $\check\mu$. To see an example of the Fourier transform, let $\varphi_\sigma$ be the Gaussian density as defined in Lemma 1.3.4. Then, by the same lemma,
$$\hat\varphi_\sigma = \check\varphi_\sigma = \frac{1}{\sigma^d}\,\varphi_{1/\sigma}.$$
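The Gaussian example can be verified by numerical quadrature in dimension $d = 1$. The sketch below assumes that the density of Lemma 1.3.4 is $\varphi_\sigma(x) = (2\pi\sigma^2)^{-1/2} e^{-x^2/(2\sigma^2)}$ (the lemma is not reproduced here) and uses the convention of Definition 1.8.1, $\hat g(t) = (2\pi)^{-1/2}\int g(x)\,e^{-itx}\,dx$. The helper names and the truncation interval are illustrative choices.

```python
import cmath
import math

def phi(sigma, x):
    """Assumed Gaussian density with standard deviation sigma (d = 1)."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def fourier_transform(g, t, L, steps=20000):
    """Trapezoidal approximation of (2π)^(-1/2) ∫_{-L}^{L} g(x) exp(-i t x) dx."""
    h = 2 * L / steps
    total = 0j
    for k in range(steps + 1):
        x = -L + k * h
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * g(x) * cmath.exp(-1j * t * x)
    return total * h / math.sqrt(2 * math.pi)

if __name__ == "__main__":
    sigma, t = 2.0, 0.7
    lhs = fourier_transform(lambda x: phi(sigma, x), t, L=12 * sigma)
    rhs = phi(1 / sigma, t) / sigma
    print(lhs, rhs)   # both sides of the identity for d = 1
```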
Theorem 1.8.2. Every function $g \in L^1(\mathbb{R}^d)$ is uniquely determined by its Fourier transform $\hat g$ (inverse Fourier transform $\check g$), which is a bounded, uniformly continuous function on $\mathbb{R}^d$. Moreover, for all $g, h \in L^1(\mathbb{R}^d)$ we have
(i) $(g + h)^\wedge = \hat g + \hat h$;
(ii) $(cg)^\wedge = c\,\hat g$ for all $c \in \mathbb{C}$;
(iii) $(g * h)^\wedge = (2\pi)^{d/2}\,\hat g\,\hat h$;
(iv) $\overline{\hat g} = (\overline{g})^\vee = (\tilde g)^\wedge$;
(v) $\hat g$ is real-valued if and only if $g = \tilde g$;
(vi) $\sup\{ |\hat g(t)| : t \in \mathbb{R}^d \} \le \dfrac{1}{(2\pi)^{d/2}} \displaystyle\int_{\mathbb{R}^d} |g(x)|\,dx$;
(vii) $(g_y)^\wedge(t) = e^{i(t,y)}\,\hat g(t)$ and $(g_y)^\vee(t) = e^{-i(t,y)}\,\check g(t)$, where $g_y(x) = g(x + y)$.
The analogues of (i)–(iii), (v) and (vi) are also valid for the inverse Fourier transform $\check g$.

Proof. The properties (i)–(iv) and (vi) are simple consequences of Definition 1.8.1; we omit the details. Since $g$ is integrable it can be written as a linear combination of four density functions. Using this, uniform continuity of $\hat g$ (and of $\check g$) follows from (1.1.2.iv). As to uniqueness, the proof of Theorem 1.3.3 can also be applied to the complex measure $d\mu(x) = g(x)\,dx$. Property (v) follows from (iv) and from the uniqueness.

The next theorem can be proved in the same way as Theorem 1.8.2. We omit the proof.
Theorem 1.8.3. Every measure $\mu \in M_b(\mathbb{R}^d)$ is uniquely determined by its Fourier–Stieltjes transform $\hat\mu$, which is a bounded, uniformly continuous function on $\mathbb{R}^d$. For all $\mu, \nu \in M_b(\mathbb{R}^d)$ we have
(i) $(\mu + \nu)^\wedge = \hat\mu + \hat\nu$;
(ii) $(c\mu)^\wedge = c\,\hat\mu$ for all $c \in \mathbb{C}$;
(iii) $(\mu * \nu)^\wedge = \hat\mu\,\hat\nu$;
(iv) $(\mu * f)^\wedge = \hat\mu\,\hat f$ for all $f \in L^1(\mathbb{R}^d)$;
(v) $(\tilde\mu)^\wedge = \overline{\hat\mu}$;
(vi) $\hat\mu$ is real-valued if and only if $\mu = \tilde\mu$.
All these properties are also valid for $\check\mu$.

Next we investigate the behavior of the Fourier transform at infinity.

Theorem 1.8.4 (Riemann–Lebesgue lemma). If $g \in L^1(\mathbb{R}^d)$ then $\hat g \in C_0(\mathbb{R}^d)$.

Proof. Let $g$ first be the indicator function of a parallelepiped
$$[a_1, b_1] \times \dots \times [a_d, b_d] \subset \mathbb{R}^d.$$
Then
$$\hat g(t) = \frac{1}{(2\pi)^{d/2}} \prod_{j=1}^d \frac{e^{-ib_j t_j} - e^{-ia_j t_j}}{-it_j}, \qquad t \in \mathbb{R}^d,$$
and this tends to zero as $\|t\| \to \infty$, since $|t_j| \to \infty$ for at least one $j$. For each function $g \in L^1(\mathbb{R}^d)$ and $\varepsilon > 0$ there exists a function $h$ such that $h$ is a finite linear combination of indicator functions of parallelepipeds and $\|g - h\|_1 < \varepsilon/2$. From the first step we know that there exists a positive number $K$ such that $|\hat h(t)| < \varepsilon/2$ whenever $|t| > K$. For such $t$ we have
$$|\hat g(t)| = |(g - h)^\wedge(t) + \hat h(t)| \le \|g - h\|_1 + |\hat h(t)| < \varepsilon.$$

Theorem 1.8.5. If $\mu$ is a finite discrete measure on $\mathbb{R}^d$, then for all $t \in \mathbb{R}^d$ and $s \in \mathbb{R}$ we have
$$\liminf_{r\to\infty} |\hat\mu(st) - \hat\mu(rt)| = 0.$$

Proof. The function $f(s) := \hat\mu(st)$, $s \in \mathbb{R}$, has the form
$$f(s) = \sum_{n=1}^\infty c_n e^{ix_n s}, \qquad s \in \mathbb{R},$$
where $x_n \in \mathbb{R}$, $c_n \in \mathbb{C}$ and $\sum_{n=1}^\infty |c_n| < \infty$. Therefore, the theorem follows from Corollary C.2.5 and Theorem C.2.7.
Remark 1.8.6. Applying the previous theorem with $s = 0$ and $t = 1$ to a discrete probability measure $\mu$ on $\mathbb{R}$ we see that
$$\limsup_{r\to\infty} |\hat\mu(r)| = 1.$$
It can be shown (see, e.g., [6], Corollary 1.11.8) that for every $p \in [0,1]$ there exists a continuous, singular probability measure $\mu$ on $\mathbb{R}$ such that
$$\limsup_{r\to\infty} |\hat\mu(r)| = p.$$
To see another interesting example for the behavior at infinity of a Fourier transform, let $q \in (0,1)$ and let $X_n$, $n \in \mathbb{N}_0$, be independent random variables such that
$$P(X_n = -1) = P(X_n = +1) = \frac12.$$
The characteristic function of the random variable
$$X = \sum_{n=0}^\infty q^n X_n$$
does not tend to zero at infinity if and only if $\theta := q^{-1}$ is a Pisot number$^{23}$ different from 2. This is the so-called Zygmund–Salem theorem; we refer to [6, Sect. 1.11] for more details.

The Fourier transform of an integrable function may not be integrable (cf. Example 1.8.14). However, we have the following result:$^{24}$

Theorem 1.8.7. Let $f$ be a continuous and integrable positive definite function on $\mathbb{R}^d$. Then both $\hat f$ and $\check f$ are integrable, nonnegative and
$$f = (\hat f)^\vee = (\check f)^\wedge.$$

Proof. We may assume that $f(0) = 1$. Then $f$ is a characteristic function in view of Bochner's theorem. By Theorem 1.3.6, $f$ corresponds to some density $p$, i.e., $f = (2\pi)^{d/2}\check p$. On the other hand, Theorem 1.3.6 shows that $p = (1/2\pi)^{d/2}\hat f$. Combining these facts we see that $\hat f \ge 0$ and $f = (\hat f)^\vee$. The relations $\check f \ge 0$ and $f = (\check f)^\wedge$ follow from (1.8.2.iv).

We will extend the Fourier transformation to square integrable functions. For this we need the following result:

Lemma 1.8.8. The set
$$L := \{ \hat g : g \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d) \}$$
is a dense linear subspace of $L^2(\mathbb{R}^d)$ and
$$\|g\|_2 = \|\hat g\|_2, \qquad g \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d).$$

Proof. If $g \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$, then the function $f = g * \tilde g$ belongs to $P^c(\mathbb{R}^d) \cap L^1(\mathbb{R}^d)$ (cf. Lemma 1.5.4 and Theorem E.2.4) and $\hat f = (2\pi)^{d/2} |\hat g|^2$ in view of (1.8.2.iii) and (1.8.2.iv). By Theorem 1.8.7, the function $\hat f$ is integrable and therefore $\hat g \in L^2(\mathbb{R}^d)$, i.e., $L \subset L^2(\mathbb{R}^d)$. It is obvious that $L$ is a linear subspace. Applying the first equation in Theorem 1.8.7, we get
$$\int_{\mathbb{R}^d} |g(x)|^2\,dx = g * \tilde g(0) = f(0) = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} \hat f(x)\,dx = \int_{\mathbb{R}^d} |\hat g(x)|^2\,dx,$$
that is, $\|g\|_2 = \|\hat g\|_2$.

Suppose now that $\psi \in L^2(\mathbb{R}^d)$ and $\psi$ is orthogonal to $L$. Since $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ is translation-invariant, (1.8.2.vii) shows that $L$ is invariant under multiplication by the function $t \mapsto e^{i(t,x)}$ for all $x \in \mathbb{R}^d$. Define the function $\eta$ by $\eta(t) = e^{-(t,t)}$. Since $\psi$ and $\eta$ are square integrable their product is in $L^1(\mathbb{R}^d)$. By Lemma 1.3.4, the function $\eta$ is in $L$, thus the identity
$$\int_{\mathbb{R}^d} \psi(t)\,\eta(t)\,e^{i(t,x)}\,dt = 0$$
holds for all $x \in \mathbb{R}^d$, i.e., $(\psi\eta)^\wedge = 0$. The first part of Theorem 1.8.2 implies that $\psi\eta = 0$ and hence also $\psi = 0$. Thus, 0 is the only element in $L^2(\mathbb{R}^d)$ that is orthogonal to $L$; therefore $L$ is dense in $L^2(\mathbb{R}^d)$.

Theorem 1.8.9 (Plancherel). We consider the Fourier transformation, introduced in Definition 1.8.1, on the set $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$. This transformation can be extended in a unique way to a linear isometry from $L^2(\mathbb{R}^d)$ onto $L^2(\mathbb{R}^d)$. Denoting the extension by the same symbol, the equations$^{25}$
(i) $\displaystyle\int_{\mathbb{R}^d} f(x)\,\overline{g(x)}\,dx = \int_{\mathbb{R}^d} \hat f(t)\,\overline{\hat g(t)}\,dt$ and
(ii) $(2\pi)^{d/2} (fg)^\wedge = \hat f * \hat g$
hold for all $f, g \in L^2(\mathbb{R}^d)$.

Proof. Write $H = L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$. By Lemma 1.8.8, the Fourier transformation is a linear isometry from $H$ into $L^2(\mathbb{R}^d)$. The first statement follows immediately from the fact that $H$ is a dense subspace of $L^2(\mathbb{R}^d)$. Equation (i) is a consequence of the identity
$$4 f \bar g = |f + g|^2 - |f - g|^2 + i|f + ig|^2 - i|f - ig|^2.$$
Replacing $g(x)$ by $\overline{g(x)}\,e^{i(y,x)}$, $y \in \mathbb{R}^d$, in (i) we obtain (ii).

23 A real number $\theta > 1$ is said to be a Pisot number if it satisfies an equation $\theta^n + b_1 \theta^{n-1} + \dots + b_n = 0$ where $b_1, \dots, b_n$ are integers, and all other roots of this equation have moduli less than 1.
24 See also Theorem 1.8.13.
25 The first equation is usually referred to as Parseval's identity.
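The isometry $\|g\|_2 = \|\hat g\|_2$ of Lemma 1.8.8 and Theorem 1.8.9 can be observed on a concrete function. The sketch below takes $d = 1$ and $g = 1_{[-1,1]}$, for which $\|g\|_2^2 = 2$ and $\hat g(t) = \sqrt{2/\pi}\,\sin t / t$ under the convention of Definition 1.8.1. The integral of $|\hat g|^2$ is truncated to $[-T, T]$, so agreement is only expected up to a tail of order $1/T$; the names and the parameters `T` and `h` are illustrative.

```python
import math

def g_hat_squared(t):
    """|ĝ(t)|^2 for g = indicator of [-1, 1], where ĝ(t) = sqrt(2/π) sin(t)/t."""
    return (2 / math.pi) * (math.sin(t) / t) ** 2

def truncated_norm_squared(T=500.0, h=0.01):
    """Midpoint rule for ∫_{-T}^{T} |ĝ(t)|^2 dt, using that the integrand is even."""
    steps = int(round(T / h))
    return 2 * h * sum(g_hat_squared((k + 0.5) * h) for k in range(steps))

if __name__ == "__main__":
    # ||g||_2^2 equals 2, the length of the interval [-1, 1].
    print(truncated_norm_squared(), 2.0)
```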
Remark 1.8.10. If $g \in L^2(\mathbb{R}^d)$ then we call $\hat g$ the $L^2$ Fourier transform of $g$. Note that $\hat g$ is not a function but an equivalence class of functions, and relations such as $\hat g = \hat h$ or $\hat g \ge 0$ are relations between equivalence classes. However, if $g \in \mathcal{L}^1(\mathbb{R}^d) \cap \mathcal{L}^2(\mathbb{R}^d)$ then we can regard $\hat g$ as a function given by
$$\hat g(t) = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} g(x)\,e^{-i(t,x)}\,dx.$$
It will be clear from the context in every case whether we mean the fixed function $\hat g$ or the equivalence class in $L^2(\mathbb{R}^d)$ containing $\hat g$.

The inverse Fourier transformation can be extended to square integrable functions in the same way.

Theorem 1.8.11. We consider the inverse Fourier transformation, introduced in Definition 1.8.1, on the set $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$. This transformation can be extended in a unique way to a linear isometry from $L^2(\mathbb{R}^d)$ onto $L^2(\mathbb{R}^d)$. Denoting the extension by the same symbol, the equations
(i) $\psi = (\check\psi)^\wedge = (\hat\psi)^\vee$,
(ii) $(2\pi)^{d/2} (\psi\eta)^\vee = \check\psi * \check\eta$,
(iii) $(\psi * \eta)^\vee = (2\pi)^{d/2}\,\check\psi\,\check\eta$
hold for all $\psi, \eta \in L^2(\mathbb{R}^d)$.

Proof. The inverse Fourier transform of a function in $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ is square integrable. This follows from (1.8.2.iv) and the square integrability of the Fourier transform. If $\psi, g \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$, then
$$\int_{\mathbb{R}^d} \check\psi(x)\,\overline{g(x)}\,dx = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \psi(t)\,e^{i(t,x)}\,dt\;\overline{g(x)}\,dx = \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} \psi(t)\,\overline{\int_{\mathbb{R}^d} g(x)\,e^{-i(t,x)}\,dx}\,dt = \int_{\mathbb{R}^d} \psi(t)\,\overline{\hat g(t)}\,dt.$$
Using (1.8.9.i) and the equations above we see that
$$\int_{\mathbb{R}^d} \check\psi(x)\,\overline{g(x)}\,dx = \int_{\mathbb{R}^d} (\check\psi)^\wedge(t)\,\overline{\hat g(t)}\,dt = \int_{\mathbb{R}^d} \psi(t)\,\overline{\hat g(t)}\,dt.$$
Since the set $L$ in Lemma 1.8.8 is dense in $L^2(\mathbb{R}^d)$, we conclude that
$$\psi = (\check\psi)^\wedge, \qquad \psi \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d). \qquad (1)$$
It follows from (1) and (1.8.9.i) that $\|\psi\|_2 = \|\check\psi\|_2$. Thus, the mapping $\psi \mapsto \check\psi$ maps $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$, which is dense in $L^2(\mathbb{R}^d)$, isometrically into $L^2(\mathbb{R}^d)$, and therefore it extends to a unique isometric mapping from $L^2(\mathbb{R}^d)$ onto $L^2(\mathbb{R}^d)$. From (1) we conclude that (i) holds while (ii) is obtained in the same way as (1.8.9.ii). Taking the inverse Fourier transform of both sides of (1.8.9.ii) we obtain (iii).

Remark 1.8.12. We call $\check\psi$, $\psi \in L^2(\mathbb{R}^d)$, the inverse $L^2$ Fourier transform of $\psi$. Remark 1.8.10 holds for this transformation, too.

Theorem 1.8.13. Suppose that $f \in L^1(\mathbb{R}^d) \cap L^\infty(\mathbb{R}^d)$ and that $\hat f$ is real-valued and nonnegative. Then $\hat f$ is in $L^1(\mathbb{R}^d)$.

Proof. First we note that $f \in L^2(\mathbb{R}^d)$. Let $\varphi_\sigma$ be the Gaussian density as defined in Lemma 1.3.4. Applying (1.8.9.i) with $g = \varphi_{1/n}$ and using that $\hat\varphi_{1/n} = n^d \varphi_n$ we obtain
$$\frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} \hat f(t)\,e^{-\frac{\|t\|^2}{2n^2}}\,dt = \int_{\mathbb{R}^d} \hat f(t)\,\hat\varphi_{1/n}(t)\,dt = \int_{\mathbb{R}^d} f(x)\,\varphi_{1/n}(x)\,dx \le \|f\|_\infty.$$
The integrand on the left is nonnegative and monotone increasing as a function of $n$. Taking the limit $n \to \infty$ and applying Beppo Levi's theorem on monotone convergence we obtain
$$\int_{\mathbb{R}^d} \hat f(t)\,dt \le (2\pi)^{d/2} \|f\|_\infty < \infty.$$
Thus, $\hat f$ is integrable.

Next we give two simple examples where the Fourier transform is square integrable but not integrable.

Example 1.8.14. By Example 1.1.13(a), the Fourier transform of the function $p(x) = 1_{[0,\infty)}(x)\,e^{-x}$, $x \in \mathbb{R}$, is equal to $q(t) = \frac{1}{\sqrt{2\pi}} \frac{1}{1 + it}$. This function is square integrable but not integrable. By (1.8.11.i), the inverse $L^2$ Fourier transform of $q$ is $p$. A similar example is given by the indicator function $p$ of the set $[-1,1]^d$. The computations at the end of Definition 1.1.1 show that
$$q(t) := \hat p(t) = \check p(t) = \Big( \frac{2}{\pi} \Big)^{d/2} \prod_{j=1}^d \frac{\sin t_j}{t_j}, \qquad t \in \mathbb{R}^d.$$
The function $q$ belongs to $L^2(\mathbb{R}^d) \setminus L^1(\mathbb{R}^d)$ and $\hat q = \check q = p$.

The representing measure of the positive definite function $\mu * \tilde\mu * f$ from Lemma 1.5.3 can be easily determined.
Theorem 1.8.15. Assume that $f \in P^c(\mathbb{R}^d)$ admits the representing measure $\sigma$ and let $\mu \in M_b(\mathbb{R}^d)$ be arbitrary. Then the function $\mu * \tilde\mu * f$ is positive definite and its representing measure $\tau$ is given by $d\tau(s) = |\hat\mu(s)|^2\,d\sigma(s)$.

Proof. Using
$$f(t) = \int e^{i(t,s)}\,d\sigma(s)$$
(integration over $\mathbb{R}^d$) we obtain
$$\mu * \tilde\mu * f(t) = \int f(t - x)\,d(\mu * \tilde\mu)(x) = \iint f(t - x + y)\,d\mu(x)\,d\overline{\mu}(y) = \iiint e^{i(t - x + y, s)}\,d\sigma(s)\,d\mu(x)\,d\overline{\mu}(y)$$
$$= \int e^{i(t,s)} \int e^{-i(x,s)}\,d\mu(x) \int e^{i(y,s)}\,d\overline{\mu}(y)\,d\sigma(s) = \int e^{i(t,s)}\,|\hat\mu(s)|^2\,d\sigma(s).$$

Using $L^2$ Fourier transformation we can characterize characteristic functions corresponding to absolutely continuous distributions.

Theorem 1.8.16. Let $f$ be the characteristic function of a distribution $\mu$ on $\mathbb{R}^d$. Then $\mu$ is absolutely continuous if and only if there exists $g \in L^2(\mathbb{R}^d)$ such that
$$f(t) = g * \tilde g(t) = \int_{\mathbb{R}^d} g(t + x)\,\overline{g(x)}\,dx, \qquad t \in \mathbb{R}^d.$$
For this function $g$ we have $d\mu(x) = |\hat g(x)|^2\,dx$.

Proof. Assume first that $d\mu(t) = p(t)\,dt$ where $p \in L^1(\mathbb{R}^d)$, $p \ge 0$, so that
$$f = (2\pi)^{d/2}\,\check p.$$
Choose an arbitrary Lebesgue measurable function $q$ with $p = |q|^2$ (we can take $q = \sqrt{p}$). Then $q \in L^2(\mathbb{R}^d)$. Setting $g := \check q$ we have $g \in L^2(\mathbb{R}^d)$, $\hat g = q$ and
$$(g * \tilde g)^\wedge = (2\pi)^{d/2} |\hat g|^2 = (2\pi)^{d/2} |q|^2 = (2\pi)^{d/2} p = \hat f.$$
Thus, $f = g * \tilde g$. Assume now that $f = g * \tilde g$ with some function $g \in L^2(\mathbb{R}^d)$. Parseval's identity (1.8.9.i) shows that
$$f(t) = \int_{\mathbb{R}^d} g(t + x)\,\overline{g(x)}\,dx = \int_{\mathbb{R}^d} e^{i(t,x)}\,|\hat g(x)|^2\,dx,$$
i.e., $d\mu(x) = |\hat g(x)|^2\,dx$.
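Theorem 1.8.16 can be illustrated in dimension $d = 1$ with $g = 1_{[0,1]}$, a choice made here only as an example. Then $g * \tilde g(t) = \max(1 - |t|, 0)$ is the triangular characteristic function, and with the convention of Definition 1.8.1 one finds $|\hat g(x)|^2 = (1 - \cos x)/(\pi x^2)$. The sketch below integrates $e^{itx}\,|\hat g(x)|^2$ over a truncated interval and compares the result with the triangle function; the truncation bound `T`, the step `h` and the function names are illustrative.

```python
import math

def density(x):
    """|ĝ(x)|^2 = (1 - cos x) / (π x^2) for g = indicator of [0, 1]."""
    return (1 - math.cos(x)) / (math.pi * x * x)

def characteristic_function(t, T=1000.0, h=0.02):
    """Midpoint rule for ∫_{-T}^{T} exp(i t x) |ĝ(x)|^2 dx.

    The density is even, so only the cosine part contributes."""
    steps = int(round(T / h))
    return 2 * h * sum(math.cos(t * (k + 0.5) * h) * density((k + 0.5) * h)
                       for k in range(steps))

def triangle(t):
    """g * g~(t) = ∫ g(t + x) g(x) dx = max(1 - |t|, 0)."""
    return max(1 - abs(t), 0.0)

if __name__ == "__main__":
    for t in (0.0, 0.5, 1.5):
        print(t, characteristic_function(t), triangle(t))
```

The agreement is limited by the truncation of the slowly decaying density, so the two columns match only to about three decimal places.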
1.9 Fourier transformation on discrete commutative groups
In this section we prove some basic facts on the Fourier transformation on a commutative group $G$ and its character group $\hat G$, which we will need in the sequel.

Definition 1.9.1. The Fourier–Stieltjes transform $\hat\mu$ of a complex measure $\mu \in M_b(G)$ is defined by
$$\hat\mu(\gamma) = \int_G \overline{\gamma(x)}\,d\mu(x), \qquad \gamma \in \hat G.$$
If $d\mu(x) = f(x)\,dx$ with $f \in L^1(G)$, where $dx$ denotes integration with respect to the counting measure on $G$, then we write $\hat f$ for $\hat\mu$ and call $\hat f$ the Fourier transform of $f$.

If $G = \mathbb{Z}^d$ and we identify $\hat G$ with $[0,2\pi)^d$ (see Remark 1.7.7) then we have
$$\hat\mu(t) = \int_{\mathbb{Z}^d} e^{-i(t,n)}\,d\mu(n) = \sum_{n \in \mathbb{Z}^d} \mu(\{n\})\,e^{-i(t,n)}, \qquad t \in [0,2\pi)^d,$$
for all $\mu \in M_b(\mathbb{Z}^d)$.

Theorem 1.9.2. Every complex measure $\mu \in M_b(G)$ is uniquely determined by its Fourier–Stieltjes transform $\hat\mu$, which is a bounded continuous function. For all $\mu, \nu \in M_b(G)$ we have
(i) $(\mu + \nu)^\wedge = \hat\mu + \hat\nu$;
(ii) $(c\mu)^\wedge = c\,\hat\mu$ for all $c \in \mathbb{C}$;
(iii) $(\mu * \nu)^\wedge = \hat\mu\,\hat\nu$;
(iv) $(\tilde\mu)^\wedge = \overline{\hat\mu}$.

Proof. The uniqueness follows from Corollary 1.7.13; the remaining properties are simple consequences of the definitions.

Definition 1.9.3. The inverse Fourier–Stieltjes transform $\check\mu$ of a complex measure $\mu \in M_b(\hat G)$ is defined by
$$\check\mu(x) = \int_{\hat G} \gamma(x)\,d\mu(\gamma), \qquad x \in G.$$
Note that the function $\gamma \mapsto \gamma(x)$ is bounded and continuous on $\hat G$ for all $x$ in $G$, so the integral exists. If $d\mu(\gamma) = f(\gamma)\,d\lambda(\gamma)$ with $f \in L^1(\hat G)$, where $\lambda$ denotes the normalized Haar measure on $\hat G$, then we write $\check f$ for $\check\mu$ and call $\check f$ the inverse Fourier transform of $f$.
If G D Zd and we identify GO with Œ0; 2/d then we have Z ei.t;n/ d.t /; n 2 Zd .n/ L D Œ0;2/d
for all 2 Mb .Œ0; 2/d /. O is uniquely determined by its Theorem 1.9.4. Every complex measure 2 Mb .G/ inverse Fourier–Stieltjes transform L which is a bounded function. For all ; 2 O we have Mb .G/ (i)
. C /L D L C ; L
(ii)
.c/L D c L
(iii)
./L Q D . L
for all c 2 C;
Proof. The uniqueness can be proved by the same arguments used in the proof of Theorem 1.7.8. The remaining properties are simple consequences of the definitions.

The next theorem is the analogue of Theorem 1.8.15 and can be proved in the same way. We omit the proof.

Theorem 1.9.5. Assume that $f \in P(G)$ admits the representing measure $\mu \in M_b^+(\hat G)$ and let $\nu \in M_b(G)$ be arbitrary. Then the function $\nu * \tilde\nu * f$ is positive definite and its representing measure is given by $|\hat\nu(s)|^2\,d\mu(s)$.

The analogue of Theorem 1.8.7 is contained in the next result.

Theorem 1.9.6.
(i) The relation $f = (\hat f)^\vee$ holds for all $f \in L^1(G)$.
(ii) If $f \in P(G) \cap L^1(G)$, then $\hat f \ge 0$.
(iii) If $f \in L^1(G)$ and $\hat f \ge 0$, then $f \in P(G)$.
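The inversion formula and the multiplicativity of the Fourier transform can be checked numerically in the simplest case $G = \mathbb{Z}$. The following is a minimal sketch, not part of the original text: it assumes the sign convention $\hat f(t) = \sum_n f(n)\,e^{-itn}$, represents a finitely supported function as a Python dictionary, and approximates integration with respect to the normalized Haar measure on $[0,2\pi)$ by a uniform Riemann sum (which is exact for trigonometric polynomials of low degree). The helper names `fourier`, `inverse_fourier`, `convolve` and `tilde` are illustrative only.

```python
import cmath
import math

def fourier(f, t):
    """Fourier transform at t of a finitely supported f on Z (dict: n -> f(n))."""
    return sum(v * cmath.exp(-1j * t * n) for n, v in f.items())

def inverse_fourier(F, n, grid=64):
    """Inverse transform of F at n with respect to dt / (2*pi) on [0, 2*pi).

    A uniform Riemann sum is used; it is exact when F is a trigonometric
    polynomial whose frequencies differ from n by less than `grid`.
    """
    pts = [2 * math.pi * k / grid for k in range(grid)]
    return sum(F(t) * cmath.exp(1j * t * n) for t in pts) / grid

def convolve(f, g):
    """Convolution of two finitely supported functions on Z."""
    out = {}
    for n, a in f.items():
        for m, b in g.items():
            out[n + m] = out.get(n + m, 0) + a * b
    return out

def tilde(f):
    """The function n -> conj(f(-n))."""
    return {-n: complex(v).conjugate() for n, v in f.items()}

f = {-2: 1.0 + 0.5j, 0: 2.0, 3: -1.0j}
g = {-1: 0.5, 1: 0.25j}

# f is recovered from its Fourier transform, as in Theorem 1.9.6 (i).
recovered = {n: inverse_fourier(lambda t: fourier(f, t), n) for n in range(-4, 5)}

# The transform of a convolution is the product of the transforms.
t0 = 0.7
lhs = fourier(convolve(f, g), t0)
rhs = fourier(f, t0) * fourier(g, t0)

# The transform of f * tilde(f) equals |f^|^2 and is therefore nonnegative.
square = fourier(convolve(f, tilde(f)), t0)
```

Under the stated convention, `recovered` agrees with `f` on the sampled range, and `square` is real and nonnegative up to rounding, which matches the role of $f * \tilde f$ as a positive definite function.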
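Proof. (i) Applying Fubini's theorem we obtain
$$(\hat f)^\vee(t) = \int_{\hat G} \int_G f(x)\,\overline{\gamma(x)}\,dx\;\gamma(t)\,d\sigma(\gamma) = \int_G \int_{\hat G} \gamma(t - x)\,d\sigma(\gamma)\,f(x)\,dx = \int_G f(x)\,1_{\{t\}}(x)\,dx = f(t), \qquad t \in G.$$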
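(ii) By Theorem 1.7.8 there exists a unique nonnegative measure $\mu \in M_b^+(\hat G)$ such that $f = \check\mu$. By (i) we also have $f = (\hat f)^\vee$. Since $\mu$ is unique we must have $d\mu = \hat f\,d\sigma$. The nonnegativity of $\mu$ implies that $\hat f \ge 0$.
(iii) This statement follows immediately from (i).

The extended Fourier transform 1.9.7. If $\mu \in M_f(\mathbb{Z}^d)$ then we can extend the Fourier–Stieltjes transform of $\mu$ from $\mathbb{T}^d$ to $\mathbb{C}^d \setminus \{0\}$ by setting
$$\hat\mu(z) := \int_{\mathbb{Z}^d} z^{-n}\,d\mu(n), \qquad z \in \mathbb{C}^d \setminus \{0\}.$$
In the same way, if $f$ is a complex-valued function on $\mathbb{Z}^d$ having finite support, then we define the Fourier transform of $f$ on $\mathbb{C}^d \setminus \{0\}$ by
$$\hat f(z) := \sum_{n \in \mathbb{Z}^d} z^{-n} f(n), \qquad z \in \mathbb{C}^d \setminus \{0\}.$$
It is easy to see that the relations
$$(\tilde f)^\wedge(z) = \overline{\hat f(1/\bar z)}, \qquad (f * g)^\wedge(z) = \hat f(z)\,\hat g(z) \qquad (1)$$
hold for all $z \in \mathbb{C}^d \setminus \{0\}$. In particular,
$$(f * \tilde f)^\wedge(z) = \hat f(z)\,\overline{\hat f(1/\bar z)}, \qquad z \in \mathbb{C}^d \setminus \{0\}. \qquad (2)$$

1.10 Basic properties of Gaussian distributions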
1.10.1. In Lemma 1.3.4 we already introduced the standard Gaussian density
$$\varphi(x) = \frac{1}{(2\pi)^{d/2}}\,e^{-\frac{1}{2}(x,x)}, \qquad x \in \mathbb{R}^d$$
and showed that its characteristic function is
$$g(t) = e^{-\frac{1}{2}(t,t)}, \qquad t \in \mathbb{R}^d.$$
Suppose that a random vector $X$ has characteristic function $g$ and consider the random vector $Y = m + BX$ where $m \in \mathbb{R}^d$ and $B$ is a $d \times d$ real matrix. The covariance matrix of $Y$ is $C = BB^T$ (see page 349). By Theorem 1.1.7, the characteristic function $f_Y$ of $Y$ is given by
$$f_Y(t) = e^{i(m,t)}\,g(B^T t) = e^{i(m,t) - \frac{1}{2}(B^T t,\,B^T t)} = e^{i(m,t) - \frac{1}{2}(Ct,t)}.$$
Since $C$ is a covariance matrix, it is positive semidefinite (cf. Section F.1).
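The formula for $f_Y$ can be compared with an empirical characteristic function obtained by simulation. The sketch below is an illustration under stated assumptions, not part of the original text: the dimension $d = 2$, the particular $m$ and $B$, the sample size and the seed are arbitrary choices, and the small linear algebra helpers are hypothetical names written in plain Python.

```python
import cmath
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[dot(row, col) for col in zip(*B)] for row in A]

m = [1.0, -0.5]
B = [[1.0, 0.0],
     [0.5, 2.0]]
C = matmul(B, transpose(B))          # covariance matrix C = B B^T

def gaussian_cf(t):
    """The closed form exp(i(m,t) - (Ct,t)/2)."""
    return cmath.exp(1j * dot(m, t) - 0.5 * dot(matvec(C, t), t))

rng = random.Random(0)               # fixed seed so the run is reproducible

def sample_y():
    """One draw of Y = m + B X with X standard Gaussian in R^2."""
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    return [mi + bi for mi, bi in zip(m, matvec(B, x))]

def empirical_cf(t, n=50000):
    """Monte Carlo estimate of E exp(i(t, Y))."""
    return sum(cmath.exp(1j * dot(t, sample_y())) for _ in range(n)) / n

t = [0.3, -0.2]
estimate = empirical_cf(t)
exact = gaussian_cf(t)
```

With this sample size the two values should differ only by Monte Carlo error, which is of order $n^{-1/2}$.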
Now let $C$ be an arbitrary positive semidefinite real matrix. By Theorem D.2.11 there exists a symmetric real matrix $B$ such that $B^2 = C$. The above considerations show that $f_Y(t) = e^{i(m,t) - \frac{1}{2}(Ct,t)}$ is the characteristic function of $Y = m + BX$ where $X$ is standard Gaussian. The exponent $i(m,t) - \frac{1}{2}(Ct,t)$ is a polynomial of $t$ of degree at most two. We will show in Section 3.5 that a function of the form $e^P$ where $P$ is a polynomial of degree greater than two cannot be a characteristic function (cf. Theorem 3.5.1).

Definition 1.10.2. A $d$-dimensional random vector $Y$ is said to be Gaussian (or normal) if its characteristic function $f$ has the form
$$f(t) = e^{i(m,t) - \frac{1}{2}(Ct,t)}, \qquad t \in \mathbb{R}^d$$
with some $m \in \mathbb{R}^d$ and a $d \times d$ positive semidefinite real matrix $C$. For such $Y$ we will write $Y \sim N(m, C)$. It follows from Theorem 1.2.1 that $m = E(Y)$ and $C = \operatorname{cov}(Y)$.

The next lemma follows immediately from the definition of Gaussian random vectors; we omit the proof.

Lemma 1.10.3. Let $X$ be a $d$-dimensional Gaussian vector with $X \sim N(m, C)$. For an arbitrary $n \times d$ matrix $B$, the $n$-dimensional random vector $Y = BX$ is Gaussian and $Y \sim N(Bm, BCB^T)$. In particular, all marginal distributions of $X$ are Gaussian. Moreover, for each $a \in \mathbb{R}^d$ the random variable $(a, X)$ has a univariate Gaussian distribution with mean $(a, m)$ and variance $(a, Ca)$.

The next theorem gives a very useful characterization of Gaussian random vectors.

Theorem 1.10.4. A $d$-dimensional random vector $X$ is Gaussian if and only if for each $a \in \mathbb{R}^d$ the random variable $(a, X)$ is Gaussian.

Proof. The necessity of the condition has already been established in Lemma 1.10.3. To prove the sufficiency suppose that $X_a = (a, X)$ is Gaussian for any $a \in \mathbb{R}^d$. Setting $m = E(X)$ and $C = \operatorname{cov}(X)$ we have $E(X_a) = (m, a)$ and $\operatorname{cov}(X_a) = (Ca, a)$. Thus, the characteristic function $f_a$ of $X_a$ is given by
$$f_a(t) = E\,e^{it(a,X)} = e^{it(m,a) - \frac{1}{2}t^2(Ca,a)}, \qquad t \in \mathbb{R}.$$
Setting here $t = 1$ we obtain the required characteristic function of $X$, namely
$$E\,e^{i(a,X)} = e^{i(m,a) - \frac{1}{2}(Ca,a)}, \qquad a \in \mathbb{R}^d.$$
Theorem 1.10.5. Let $Y$ be a $d$-dimensional Gaussian vector with covariance matrix $C$ having rank $k > 0$. Then there exist a $k$-dimensional standard Gaussian vector $X$ and a $d \times k$ real matrix $B$ of rank $k$ such that $C = BB^T$ and
$$Y = E(Y) + BX.$$
Consequently, the distribution of $Y$ is supported by a $k$-dimensional linear manifold.

Proof. We may assume that $E(Y) = 0$. Let $Y = (Y_1, \dots, Y_d)$ and denote by $L$ the linear subspace of $L^2_r(\Omega, \mathcal{A}, P)$ spanned by the random variables $Y_j$. The elements of $L$ are Gaussian random variables with zero mean (cf. Theorem 1.10.4). By Theorem D.2.16, the dimension of $L$ is equal to the rank of $C$. Let $X_1, \dots, X_k$ be an arbitrary orthonormal basis of $L$. Then the random vector $X = (X_1, \dots, X_k)$ is standard Gaussian. The existence of $B$ follows from the fact that each $Y_j$ is a linear combination of the $X_i$'s. Moreover, $\operatorname{rank}(B) = \dim L = k$.

Using the previous theorem we can now express the density of absolutely continuous Gaussian vectors by means of the standard Gaussian density.

Theorem 1.10.6. Let $Y$ be a $d$-dimensional Gaussian vector with $Y \sim N(m, C)$. If $\det C > 0$ then $Y$ has the density
$$p(x) = \frac{1}{(2\pi)^{d/2}\sqrt{\det C}}\;e^{-\frac{1}{2}(C^{-1}(x-m),\,x-m)}, \qquad x \in \mathbb{R}^d.$$

Proof. We already know that the statement is true for a standard Gaussian vector (cf. 1.10.1). On the other hand, by Theorem 1.10.5, $Y$ can be written as $Y = m + BX$ where $X$ is $d$-dimensional, standard Gaussian and $B$ is a $d \times d$ matrix. The covariance of $m + BX$ is $BB^T$ and hence $\det C = (\det B)^2$. The theorem follows now immediately from the transformation formula in Corollary B.7.3.

Theorem 1.10.7. Let $X_n$ be a $d$-dimensional Gaussian random vector with mean $m_n$ and covariance matrix $C_n = (c^n_{i,j})$, $n \in \mathbb{N}$. The sequence $\{X_n\}_1^\infty$ converges in law to some random vector $X$ if and only if the sequences $\{m_n\}_1^\infty$ and $\{c^n_{i,j}\}_{n=1}^\infty$ converge for all $i$ and $j$. In case of convergence, $X$ is Gaussian with mean $m = \lim_n m_n$ and covariance matrix $C = (\lim_n c^n_{i,j})$.

Proof. The characteristic function of $X_n$ is given by
$$f_n(t) = e^{i(m_n,t) - \frac{1}{2}(C_n t,t)}, \qquad t \in \mathbb{R}^d.$$
If the sequences $\{m_n\}_1^\infty$ and $\{c^n_{i,j}\}_{n=1}^\infty$ converge, then
$$\lim_{n \to \infty} f_n(t) = e^{i(m,t) - \frac{1}{2}(Ct,t)}, \qquad t \in \mathbb{R}^d$$
and convergence in law of the sequence $\{X_n\}_1^\infty$ follows from Theorem 1.6.3. Assume now that $\{X_n\}_1^\infty$ converges in law. By Theorem 1.6.1, the sequence $\{f_n\}_1^\infty$ converges uniformly on every compact set. The same holds for the sequence $\{|f_n|\}_1^\infty$, hence the limit
$$\lim_{n \to \infty} (C_n t, t), \qquad t \in \mathbb{R}^d$$
exists. By Lemma D.2.17, the sequence $\{C_n\}_1^\infty$ converges entrywise to some positive semidefinite matrix $C$. Considering now the sequence $\{f_n/|f_n|\}_1^\infty$ we see that $e^{i(m_n,t)}$ converges to some continuous positive definite function. The modulus of this function is 1 and therefore it has the form $e^{i(m,t)}$ with some $m$ (cf. Lemma 1.5.1). Thus, the one point measures $\delta_{m_n}$ converge weakly to $\delta_m$ from which $m_n \to m$ follows.

Theorem 1.10.8. If $X = (X_1, \dots, X_d)$ is Gaussian and the random variables $X_j$ are pairwise uncorrelated, then $X_1, \dots, X_d$ are (completely) independent.

Proof. Since the $X_j$'s are uncorrelated the covariance matrix $C$ of $X$ is diagonal: $C = \operatorname{diag}(\sigma_1^2, \dots, \sigma_d^2)$. Thus,
$$f_X(t) = e^{i(EX,t) - \frac{1}{2}(Ct,t)} = \prod_{j=1}^{d} e^{i t_j E X_j - \frac{1}{2}\sigma_j^2 t_j^2}, \qquad t \in \mathbb{R}^d$$
and hence $X_1, \dots, X_d$ are independent (cf. Theorem 1.3.10).

Remark 1.10.9. There exist uncorrelated Gaussian random variables which are not independent. To see an example let
$$h(x) = \sqrt{2}\,e^{-\frac{1}{2}x^2} - e^{-x^2}, \qquad x \in \mathbb{R}$$
and
$$p(x,y) = \frac{1}{2\pi}\left[h(x)\,e^{-y^2} + h(y)\,e^{-x^2}\right], \qquad (x,y) \in \mathbb{R}^2.$$
It is easy to check that $p$ is a density and
$$\int_{-\infty}^{\infty} p(x,y)\,dy = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2}, \qquad \int_{-\infty}^{\infty} p(x,y)\,dx = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}y^2}.$$
Since $p$ is an even function in $x$ and in $y$ we have
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,p(x,y)\,dx\,dy = 0.$$
Now let $Z = (X, Y)$ be a two-dimensional random vector with density $p$. Note that $Z$ is not Gaussian. By what we have proved, $X$ and $Y$ are standard Gaussian and uncorrelated. Since $p$ is not the product of the marginal densities, they are not independent.

Theorem 1.10.10. Let $X = (X_1, \dots, X_d)$ where $X_1, \dots, X_d$ are independent standard Gaussian random variables. Further, let $a, b \in \mathbb{R}^d \setminus \{0\}$ and $O$ be a $d \times d$ real matrix.$^{26}$
(i) The random variables $Y_a = (a, X) = a_1 X_1 + \dots + a_d X_d$ and $Y_b = (b, X) = b_1 X_1 + \dots + b_d X_d$ are independent if and only if $(a, b) = 0$.
(ii) The coordinates of the random vector $OX$ are independent if and only if $O$ is an orthogonal matrix.

Proof. The second statement follows immediately from the first one. To prove (i) we consider the 2-dimensional random vector $Y = (Y_a, Y_b)$. It follows from Theorem 1.10.4 that $Y$ is Gaussian. Applying Theorem 1.10.8 we obtain that $Y_a$ and $Y_b$ are independent if and only if
$$0 = E(Y_a Y_b) = \sum_{j,k=1}^{d} a_j b_k\,E(X_j X_k) = (a, b).$$
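The claims of Remark 1.10.9 about the density $p(x,y) = \frac{1}{2\pi}[h(x)e^{-y^2} + h(y)e^{-x^2}]$ with $h(x) = \sqrt{2}\,e^{-x^2/2} - e^{-x^2}$ can be checked by numerical integration. The following sketch is an illustration only: the midpoint rule, the truncation of the real line to $[-8, 8]$ and the grid sizes are assumptions made for the example, and the function names are hypothetical.

```python
import math

def h(x):
    return math.sqrt(2.0) * math.exp(-0.5 * x * x) - math.exp(-x * x)

def p(x, y):
    """The joint density of Remark 1.10.9."""
    return (h(x) * math.exp(-y * y) + h(y) * math.exp(-x * x)) / (2.0 * math.pi)

def phi(x):
    """Standard Gaussian density on the real line."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def midpoint(func, a=-8.0, b=8.0, n=2000):
    """Midpoint rule on [a, b]; the tails beyond |x| = 8 are negligible here."""
    w = (b - a) / n
    return w * sum(func(a + (k + 0.5) * w) for k in range(n))

def marginal(x):
    """Integral of p(x, y) over y."""
    return midpoint(lambda y: p(x, y))

# Total mass and mixed moment on a coarser two-dimensional grid.
total_mass = midpoint(lambda x: midpoint(lambda y: p(x, y), n=200), n=200)
mixed_moment = midpoint(lambda x: midpoint(lambda y: x * y * p(x, y), n=200), n=200)

# At the origin the joint density differs from the product of the marginals.
gap_at_origin = p(0.0, 0.0) - phi(0.0) ** 2
```

The marginal agrees with the standard Gaussian density, the mixed moment vanishes, and `gap_at_origin` is nonzero, which is exactly the combination "Gaussian marginals, uncorrelated, not independent" described in the remark.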
1.11 Some inequalities
In the present section we mainly consider inequalities which are used in this book.27 Theorem 1.11.1. Let X be a d-dimensional random vector such that kX k R with some R > 0 and E.X / D 0. Then the inequality jfX .t /j 1 holds for all t 2 Rd with kt k
R Then the corresponding characteristic functions f and fR satisfy the inequality jRe f .t /j Re fR .t / for all t 2 Rd with kt k
2R .
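Proof. Denote by $\mu$ and $\mu_R$ the corresponding distributions and write
$$B = \{x \in \mathbb{R}^d : \|x\| \le R\}.$$
Note that
$$\mu_R = \mu_B + (1 - \mu(B))\,\delta_0$$
where $\mu_B$ is given by $\mu_B(A) = \mu(B \cap A)$, $A \in \mathcal{B}^d$. If $\|t\| \le \frac{\pi}{2R}$ and $\|x\| \le R$, then $\cos((t,x)) \ge 0$. Using this we obtain:
$$\begin{aligned}
|\operatorname{Re} f(t)| &= \left|\int_{\mathbb{R}^d} \cos((t,x))\,d\mu(x)\right| \\
&\le \left|\int_{B} \cos((t,x))\,d\mu(x)\right| + \left|\int_{\mathbb{R}^d \setminus B} \cos((t,x))\,d\mu(x)\right| \\
&= \int_{B} \cos((t,x))\,d\mu(x) + \left|\int_{\mathbb{R}^d \setminus B} \cos((t,x))\,d\mu(x)\right| \\
&\le \int_{B} \cos((t,x))\,d\mu(x) + \mu(\mathbb{R}^d \setminus B) \\
&= \int_{\mathbb{R}^d} \cos((t,x))\,d\mu_R(x) = \operatorname{Re} f_R(t).
\end{aligned}$$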
Theorem 1.11.4. Let $X$ be a $d$-dimensional random vector such that the support of its distribution is not contained in some hyperplane of $\mathbb{R}^d$. Then there exist positive numbers $\delta$ and $\varepsilon$ such that the inequality
$$|f_X(t)| \le 1 - \varepsilon\,\|t\|^2 \qquad (1)$$
holds for all $t \in \mathbb{R}^d$ with $\|t\| < \delta$.$^{28}$

Proof. We may suppose that $f := f_X$ is real-valued. Indeed, if the statement is true for real-valued characteristic functions, then for an arbitrary characteristic function $f$ we have
$$|f(t)|^2 \le 1 - \varepsilon\,\|t\|^2, \qquad \|t\| \le \delta$$
for some positive $\delta$ and $\varepsilon$, which yields
$$|f(t)| \le 1 - \frac{\varepsilon}{2}\,\|t\|^2, \qquad \|t\| \le \delta.$$
We choose $R > 0$ such that the random variable $X_R$ from Lemma 1.11.3 is not concentrated on a hyperplane.$^{29}$ Then the covariance matrix of $X_R$ is strictly positive definite and hence
$$(t, \operatorname{cov}(X_R)\,t) \ge \lambda\,\|t\|^2, \qquad t \in \mathbb{R}^d$$
where $\lambda\ (> 0)$ is the smallest eigenvalue of $\operatorname{cov}(X_R)$. The theorem now follows from Corollary 1.11.2 and Lemma 1.11.3, using the fact that $f$ and $f_R$ are real valued.

28 See also Lemma 1.2.8.
29 For this it is sufficient to choose $R$ such that $\{x \in \operatorname{supp}\mu : \|x\| \le R\}$ has a positive Lebesgue measure.
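Theorem 1.11.5. Let $n \in \mathbb{N}$ and $f$ be a real-valued $2n$-times differentiable characteristic function on $\mathbb{R}$. Then we have
$$\sum_{k=0}^{2\lfloor\frac{n-1}{2}\rfloor + 1} (-1)^k\,\frac{M_{2k}}{(2k)!}\,t^{2k} \;\le\; f(t) \;\le\; \sum_{k=0}^{2\lfloor\frac{n}{2}\rfloor} (-1)^k\,\frac{M_{2k}}{(2k)!}\,t^{2k}, \qquad t \in \mathbb{R},$$
where $M_{2k}$ is the moment of order $2k$ of the corresponding distribution.

Proof. From Corollary 1.2.3 and inequality (1.1.2.ii) we see that
$$-M_{2k} \le f^{(2k)}(t) \le M_{2k}, \qquad t \in \mathbb{R},\ k = 0, 1, \dots, n.$$
Since $f$ is an even function, all derivatives of odd order are zero at 0. Applying the upper estimate to $f^{(4\lfloor\frac{n}{2}\rfloor)}(\theta t)$ in the Taylor expansion
$$f(t) = \sum_{k=0}^{2\lfloor\frac{n}{2}\rfloor - 1} (-1)^k\,\frac{M_{2k}}{(2k)!}\,t^{2k} + \frac{t^{4\lfloor\frac{n}{2}\rfloor}}{(4\lfloor\frac{n}{2}\rfloor)!}\,f^{(4\lfloor\frac{n}{2}\rfloor)}(\theta t), \qquad t \in \mathbb{R},\ \theta \in [0,1],$$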
we obtain the upper estimate for $f(t)$. The lower estimate can be obtained in the same way.

As an application of Corollary 1.7.9 we prove two useful inequalities for positive definite functions.

Theorem 1.11.6. Let $f$ be a positive definite function on a commutative group $G$ such that $f(0) = 1$. Then for all $x \in G$ and $n \in \mathbb{N}$ we have
(i) $1 - \operatorname{Re} f(nx) \le n\,[1 - (\operatorname{Re} f(x))^n]$;
(ii) $1 - |f(nx)| \le n\,[1 - |f(x)|^n]$.

Proof. For each fixed $x \in G$ the function $n \mapsto f(nx)$ is positive definite on $\mathbb{Z}$. We may therefore assume that $G = \mathbb{Z}$ and $x = 1$. Thus, we have to show that
$$1 - \operatorname{Re} f(n) \le n\,[1 - (\operatorname{Re} f(1))^n] \qquad (1)$$
$$1 - |f(n)| \le n\,[1 - |f(1)|^n]. \qquad (2)$$
To prove the first inequality we define the set $K_n \subset \mathbb{R}^2$ by
$$K_n := \{(x,y) : |x| \le 1,\ |y| \le 1,\ n x^n + 1 - n \le y\}.$$
Inequality (1) is then equivalent to the relation $(\operatorname{Re} f(1), \operatorname{Re} f(n)) \in K_n$. The set $K_n$ is convex for all $n \in \mathbb{N}$. Using this and Corollary 1.7.9 we see that it is sufficient to show (1) for the positive definite functions $n \mapsto z^n$, $z \in \mathbb{T}$. Thus, it remains to prove that
$$1 - \operatorname{Re}(z^n) \le n\,[1 - (\operatorname{Re} z)^n], \qquad z \in \mathbb{T},$$
or equivalently,
$$h_n(t) := h(t) := n\cos^n t - \cos nt \le n - 1, \qquad t \in \mathbb{R} \qquad (3)$$
(see Figure 1.7).

[Figure 1.7. The function $h_n$ from the proof of Theorem 1.11.6 with $n = 5$.]

The function $h$ is periodic and hence it attains its maximum at some $y \in \mathbb{R}$. From
$$h'(y) = -n^2 \cos^{n-1} y\,\sin y + n \sin ny = 0$$
we see that either $\sin y = 0$ or
$$\sin y \ne 0 \quad\text{and}\quad n\cos^{n-1} y = \frac{\sin ny}{\sin y}. \qquad (4)$$
If $\sin y = 0$, then (3) holds. If $y$ satisfies (4), then
$$h(y) = \frac{\cos y\,\sin ny}{\sin y} - \cos ny = \frac{\cos y\,\sin ny - \sin y\,\cos ny}{\sin y} = \frac{\sin (n-1)y}{\sin y}.$$
Using induction on $n$ we see that $\sin (n-1)y / \sin y \le n - 1$, $n \in \mathbb{N}$, and therefore $h(y) \le n - 1$.

To prove inequality (2), let $z \in \mathbb{T}$ be such that $\operatorname{Re}(z f(1)) = |f(1)|$. Since the function $n \mapsto z^n f(n)$ is positive definite on $\mathbb{Z}$, inequality (1) shows that
$$n\,|f(1)|^n + 1 - n = n\,(\operatorname{Re} z f(1))^n + 1 - n \le \operatorname{Re}(z^n f(n)) \le |f(n)|$$
from which (2) follows.
Using the inequality $1 - r^n \le n(1 - r)$, $-1 \le r \le 1$, we obtain the following corollary:

Corollary 1.11.7. With the notation of the previous theorem we have
(i) $1 - \operatorname{Re} f(nx) \le n^2\,[1 - \operatorname{Re} f(x)]$;
(ii) $1 - |f(nx)| \le n^2\,[1 - |f(x)|]$.
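The inequalities of Theorem 1.11.6 and Corollary 1.11.7 are easy to probe numerically on $G = \mathbb{R}$. The sketch below is only an illustration: it assumes three standard characteristic functions (standard Gaussian, Poisson with an arbitrarily chosen mean 1.5, and $\cos$) as test cases, and the function `max_violation` is a hypothetical helper that reports the largest value of "left side minus right side" over a grid.

```python
import cmath
import math

def gaussian_cf(x):
    return cmath.exp(-0.5 * x * x)

def poisson_cf(x, lam=1.5):
    return cmath.exp(lam * (cmath.exp(1j * x) - 1.0))

def cosine_cf(x):
    return complex(math.cos(x))

def max_violation(f, xs, ns):
    """Largest value of (left side - right side) over the four inequalities.

    A value <= 0 (up to rounding) means no inequality is violated on the grid.
    """
    worst = float("-inf")
    for x in xs:
        for n in ns:
            fx, fnx = f(x), f(n * x)
            gaps = [
                (1 - fnx.real) - n * (1 - fx.real ** n),      # Theorem 1.11.6 (i)
                (1 - abs(fnx)) - n * (1 - abs(fx) ** n),      # Theorem 1.11.6 (ii)
                (1 - fnx.real) - n * n * (1 - fx.real),       # Corollary 1.11.7 (i)
                (1 - abs(fnx)) - n * n * (1 - abs(fx)),       # Corollary 1.11.7 (ii)
            ]
            worst = max(worst, max(gaps))
    return worst

xs = [0.05 * k for k in range(-100, 101)]
ns = range(1, 8)
results = {name: max_violation(f, xs, ns)
           for name, f in [("gaussian", gaussian_cf),
                           ("poisson", poisson_cf),
                           ("cosine", cosine_cf)]}
```

For the cosine, inequality (i) of the theorem is precisely the estimate $n\cos^n t - \cos nt \le n - 1$ from the proof, with equality at $t = 0$, so the reported gap is zero up to rounding.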
Chapter 2
Correlation functions
In the first section of this chapter we present a few basic results on processes with independent increments which can be obtained easily by using characteristic functions. Then we concentrate on those properties of second order fields which are related to the fact that their correlation functions represent positive semidefinite kernels. In the second part of the chapter we investigate the special case of stationary fields. At the end of the chapter we give an introduction to the harmonic analysis of stationary fields using unitary representations built from the correlation function. Throughout this chapter the symbol $T$ denotes a nonempty subset of $\mathbb{R}^d$ with some $d \in \mathbb{N}$ and $(\Omega, \mathcal{A}, P)$ is a probability space.

2.1 Random fields
Basic notions 2.1.1. A real (complex) random field $Z$ is a mapping from $T$ into the set of all real (complex, respectively) random variables on $(\Omega, \mathcal{A}, P)$. The random variable $Z(t)$, $t \in T$, is also written as $Z_t$. If $Z(t)$ is square integrable for all $t \in T$, then $Z$ is called a second order random field. In the special case $T = \mathbb{R}$ or $T = [0, \infty)$ we also use the terminology random process or simply process instead of random field. If $T = \mathbb{Z}$ or $T = \mathbb{N}_0$, then $Z$ is usually called a time series.

For any finite subset $\{t_1, \dots, t_n\}$ of $T$ the distribution of the random vector $X = (Z(t_1), \dots, Z(t_n))$ is called a finite-dimensional distribution of $Z$. If $Z$ is real and all finite-dimensional distributions are Gaussian (see Definition 1.10.2), then $Z$ is said to be a Gaussian field. Note that $Z$ is Gaussian if and only if the characteristic function $f_X$ of $X$ is given by
$$f_X(s) = e^{i(s,E(X)) - \frac{1}{2}(Cs,s)}, \qquad s = (s_1, \dots, s_n) \in \mathbb{R}^n$$
where $C$ is the covariance matrix of $X$. If $Z$ is complex and all random vectors $(\operatorname{Re} Z(t_1), \operatorname{Im} Z(t_1), \dots, \operatorname{Re} Z(t_n), \operatorname{Im} Z(t_n))$ are Gaussian, then $Z$ is called a complex Gaussian field.

A process $Z$ on $T = \mathbb{R}$ or $T = [0, \infty)$ is said to have independent increments if the random variables
$$Z(t_2) - Z(t_1),\ Z(t_3) - Z(t_2),\ \dots,\ Z(t_n) - Z(t_{n-1}),$$
the so-called increments of $Z$, are independent for every $n \ge 3$ and every $t \in \mathbb{R}^n$ such that $t_1 < \dots < t_n$ and $t_j \in T$.

Next we prove the existence of processes having special properties. The proof uses Kolmogorov's existence theorem formulated in terms of characteristic functions (cf. Remark F.3.4). We need the following simple lemma.

Lemma 2.1.2. A real process $Z$ on $T = [0, \infty)$ with $Z(0) = 0$ has independent increments if and only if for every $d \ge 1$ and for every $t \in \mathbb{R}^d$ such that $0 \le t_1 < \dots < t_d$ the characteristic function $f_t$ of the random vector $(Z(t_1), \dots, Z(t_d))$ satisfies the equation
$$f_t(x) = g_1(x_1 + \dots + x_d)\,g_2(x_2 + \dots + x_d) \cdots g_d(x_d), \qquad x \in \mathbb{R}^d \qquad (1)$$
where $g_j$, $j \ge 1$, denotes the characteristic function of the random variable $\xi_j = Z(t_j) - Z(t_{j-1})$, and $t_0 = 0$.

Proof. Denote by $g$ the characteristic function of the random vector $(\xi_1, \dots, \xi_d)$. We have
$$\begin{aligned}
f_t(x) &= E\left(\exp\,[i x_1 Z(t_1) + i x_2 Z(t_2) + \dots + i x_d Z(t_d)]\right) \\
&= E\left(\exp\,[i x_1 \xi_1 + i x_2 (\xi_1 + \xi_2) + \dots + i x_d (\xi_1 + \dots + \xi_d)]\right) \\
&= E\left(\exp\,[i (x_1 + \dots + x_d)\,\xi_1 + i (x_2 + \dots + x_d)\,\xi_2 + \dots + i x_d\,\xi_d]\right) \\
&= g(x_1 + \dots + x_d,\ x_2 + \dots + x_d,\ \dots,\ x_d).
\end{aligned}$$
These equations show that (1) holds whenever the random variables $\xi_j$ are independent. Assume now that (1) holds. The equations above show that
$$g(y) = g_1(y_1) \cdots g_d(y_d)$$
where $y_j = x_j + \dots + x_d$, $1 \le j \le d$. Since any $y \in \mathbb{R}^d$ can be represented in this way with some $x \in \mathbb{R}^d$ the independence of the $\xi_j$'s follows from Theorem 1.3.10.

Theorem 2.1.3. Let $h$ be a complex-valued function on $\mathbb{R}$ such that $e^{ph}$ is a characteristic function for all $p \in [0, \infty)$.$^{1}$ There exists a process $Z$ on $[0, \infty)$ satisfying the conditions
(i) $Z(0) = 0$;
(ii) $Z$ has independent increments;
(iii) the characteristic function of $Z(t) - Z(s)$ is $e^{(t-s)h}$ for all $t \ge s$.

1 We will investigate functions with this property in Section 3.11.
Proof. Let $d \ge 1$ and $t_0 = 0$. By assumption, the function
$$f_t(x) = \exp\left[\sum_{j=1}^{d} (t_j - t_{j-1})\,h(x_j + \dots + x_d)\right], \qquad x \in \mathbb{R}^d$$
is a characteristic function for all $t \in [0, \infty)^d$ such that $t_1 \le \dots \le t_d$. For arbitrary $t \in [0, \infty)^d$ we choose a permutation $\pi$ such that $t_{\pi(1)} \le \dots \le t_{\pi(d)}$ and define $f_t$ by the consistency condition (F.3.4.i). Then condition (F.3.4.ii) is satisfied as well. Consequently, there exists a process $Z$ on $[0, \infty)$ such that the finite-dimensional distributions of $Z$ have the $f_t$'s as characteristic functions. Obviously, $Z(0) = 0$. The equation above with $d = 2$ and $x_2 = -x_1$ shows that the characteristic function of $Z(t) - Z(s)$ is $e^{(t-s)h}$ if $t \ge s$ (cf. Theorem 1.1.7). By Lemma 2.1.2, the process $Z$ has independent increments.

As a corollary we establish the existence of two important processes, the Wiener process and the Poisson process.$^{2}$ The next two statements follow immediately from Theorem 2.1.3 by noting that the characteristic function of a Gaussian random variable with mean zero and variance $\sigma^2$ is $t \mapsto e^{-\frac{1}{2}\sigma^2 t^2}$ while the characteristic function of a Poisson distributed random variable with mean $\lambda \ge 0$ is $t \mapsto e^{\lambda(e^{it} - 1)}$ (see Examples 1.1.13 and 1.2.4).

Corollary 2.1.4 (Wiener process). There exists a process $W$ on $[0, \infty)$ satisfying the conditions
(i) $W(0) = 0$;
(ii) $W$ has independent increments;
(iii) $W(t) - W(s)$ has a Gaussian distribution with mean zero and variance $t - s$ for all $t \ge s$.

Corollary 2.1.5 (Poisson process). For each $\lambda \ge 0$ there exists a process $N$ on $[0, \infty)$ satisfying the conditions
(i) $N(0) = 0$;
(ii) $N$ has independent increments;
(iii) $N(t) - N(s)$ has a Poisson distribution with mean $\lambda(t - s)$ for all $t \ge s$.

2 Usually one also imposes conditions on the trajectories $t \mapsto Z_t(\omega)$ of these processes but we do not consider these questions in this book.
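On a finite set of time points, the two processes of Corollaries 2.1.4 and 2.1.5 can be simulated directly from their defining properties: start at 0 and add independent increments with the prescribed distributions. The sketch below is an illustration under stated assumptions, not a construction from the text: the time points, the intensity $\lambda = 2$, the sample size and the seed are arbitrary, and the Poisson sampler is the classical multiplication method, which is only adequate for small means.

```python
import math
import random

rng = random.Random(1)               # fixed seed so the run is reproducible

def poisson_sample(mean):
    """Draw from a Poisson distribution by multiplying uniforms (small means only)."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def path(times, increment):
    """Values Z(t_1), ..., Z(t_n) built from Z(0) = 0 and independent increments."""
    values, z, prev = [], 0.0, 0.0
    for t in times:
        z += increment(t - prev)     # the increment over [prev, t]
        values.append(z)
        prev = t
    return values

lam = 2.0

def wiener_increment(dt):
    return rng.gauss(0.0, math.sqrt(dt))     # mean 0, variance dt

def poisson_increment(dt):
    return poisson_sample(lam * dt)          # mean lam * dt

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return mean([(a - mu) * (b - mv) for a, b in zip(u, v)])

times = [0.5, 1.0, 2.5]
W = [path(times, wiener_increment) for _ in range(20000)]
N = [path(times, poisson_increment) for _ in range(20000)]

w_inc1 = [w[1] - w[0] for w in W]    # W(1.0) - W(0.5)
w_inc2 = [w[2] - w[1] for w in W]    # W(2.5) - W(1.0)
n_inc2 = [n[2] - n[1] for n in N]    # N(2.5) - N(1.0)
```

The empirical variance of `w_inc2` should be close to $2.5 - 1.0 = 1.5$, the empirical mean and variance of `n_inc2` close to $\lambda \cdot 1.5 = 3$, and the empirical covariance of the two Wiener increments close to zero.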
2.2 Correlation functions of second order random fields

To give a complete description of a random field it is necessary to specify all of its finite-dimensional distributions. Since the theoretical determination of these distributions is possible only in a few cases, it is natural to restrict oneself to investigating properties of the field which are determined by simple characteristics of the finite-dimensional distributions. The next definition uses only the first and second moments of the field.

Definition 2.2.1. Let $Z$ be a second order field on $T$. The function $C : T \times T \to \mathbb{C}$ defined by
$$C(s,t) = E\left(Z(s)\,\overline{Z(t)}\right)$$
is called the correlation function of $Z$. The covariance function is defined by
$$\sigma(s,t) = E\left([Z(s) - M(s)]\,\overline{[Z(t) - M(t)]}\right)$$
where $M(t) = E(Z(t))$ is the mean of the field. An easy computation shows that
$$\sigma(s,t) = C(s,t) - M(s)\,\overline{M(t)}.$$
If the mean is zero, the correlation function and the covariance function are equal. Note that all finite-dimensional distributions of a Gaussian field are completely determined if we know its mean and correlation function.

Example 2.2.2. (a) Let $X \in L^2(\Omega, \mathcal{A}, P)$ and $f : T \to \mathbb{C}$ be arbitrary. Then
$$Z(t) = f(t)\,X, \qquad t \in T$$
is a second order field with mean $M(t) = f(t)\,E(X)$ and correlation function
$$C(s,t) = f(s)\,\overline{f(t)}\,E(|X|^2).$$
(b) If $Z_0 = 0$ and $Z$ has independent increments, then for $0 \le s \le t$ we have
$$\begin{aligned}
C(s,t) = E(Z_s Z_t) &= E(Z_s (Z_t - Z_s)) + E(Z_s^2) \\
&= E(Z_s)\,E(Z_t - Z_s) + E(Z_s^2) \\
&= E(Z_s)\,E(Z_t) + \operatorname{Var}(Z_s)
\end{aligned}$$
and hence
$$\sigma(s,t) = \operatorname{Var}(Z_s), \qquad s \le t.$$
In particular, the covariance function of the Wiener process and of the Poisson process with $\lambda = 1$ is given by$^{3}$
$$\sigma(s,t) = \min(s,t), \qquad t, s \in [0, \infty).$$
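The kernel $(s,t) \mapsto \min(s,t)$ can be examined concretely through a Cholesky factorization of the matrix $(\min(t_i, t_j))$ for finitely many time points. The sketch below is illustrative and rests on assumptions of its own: the time points are arbitrary, and `cholesky` is a small plain-Python helper written for the example. For increasing time points the factor turns out to have entries $\sqrt{t_j - t_{j-1}}$ in column $j$, which mirrors the representation of $Z(t_i)$ as a sum of independent increments.

```python
import math

def cholesky(K):
    """Lower triangular L with L L^T = K, for a strictly positive definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not strictly positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

times = [0.5, 1.0, 2.5, 4.0]
K = [[min(s, t) for t in times] for s in times]
L = cholesky(K)

# Lengths of the successive time steps, starting from t_0 = 0.
steps = [times[0]] + [b - a for a, b in zip(times, times[1:])]

# Reconstruct K from the factor as a consistency check.
rebuilt = [[sum(L[i][k] * L[j][k] for k in range(len(times)))
            for j in range(len(times))] for i in range(len(times))]
```

That the factorization succeeds without raising shows that this particular matrix is strictly positive definite; multiplying $L$ by a vector of independent standard Gaussian variables would give a sample of $(W(t_1), \dots, W(t_n))$.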
The next theorem gives a characterization of correlation functions.

Theorem 2.2.3. A complex-valued (real-valued) function $K$ defined on $T \times T$ is the covariance or correlation function of a second order complex (real, respectively) field if and only if for any finite collection $t_1, \dots, t_n$ of elements of $T$ the matrix
$$\left(K(t_i, t_j)\right)_{i,j}^{n}$$
is positive semidefinite (and symmetric, respectively).

Proof. Assume that $K$ is the covariance function of a complex field $Z$. The case where $K$ is the correlation function can be treated in the same way. We have
$$\begin{aligned}
\sum_{i,j=1}^{n} K(t_i, t_j)\,c_i\,\overline{c_j} &= \sum_{i,j=1}^{n} E\left[(Z(t_i) - M(t_i))\,\overline{(Z(t_j) - M(t_j))}\right] c_i\,\overline{c_j} \\
&= E\left|\sum_{j=1}^{n} c_j\,(Z(t_j) - M(t_j))\right|^2 \ge 0
\end{aligned}$$
for all $c_1, \dots, c_n \in \mathbb{C}$, i.e., $K$ is positive semidefinite. If the field is real, then $K$ is symmetric and the relations above hold for all $c_1, \dots, c_n \in \mathbb{R}$. Theorem D.2.4 shows that $K$ is positive semidefinite. The remaining statements follow from Theorem 2.2.4.

Theorem 2.2.4. Let $M : T \to \mathbb{C}$ be arbitrary and let $K : T \times T \to \mathbb{C}$ be such that for any finite collection $t_1, \dots, t_n$ of elements of $T$ the matrix
$$\left(K(t_i, t_j)\right)_{i,j}^{n} \qquad (1)$$
is positive semidefinite. Then there exists a Gaussian field $Z$ on $T$ such that
$$E(Z(t)) = M(t) \qquad (2)$$
$$E\left(Z(s)\,\overline{Z(t)}\right) - M(s)\,\overline{M(t)} = K(s,t) \qquad (3)$$
$$E\left(Z(s)\,Z(t)\right) = M(s)\,M(t). \qquad (4)$$

3 Combining this with Theorem 2.2.3 we obtain an alternative, though not elementary, proof of the positive definiteness of the kernel $(s,t) \mapsto \min(s,t)$ (cf. Theorem D.2.10).
If $M$ and $K$ are real-valued and $K$ is symmetric, then there exists a real Gaussian field $Z$ such that (2) and (3) hold.$^{4}$

Proof. Assume first that $M$ and $K$ are real-valued and let $\{t_1, \dots, t_n\} \subset T$ be arbitrary. Since $K$ is positive semidefinite there exists a unique Gaussian distribution on $\mathbb{R}^n$ with mean vector $(M(t_1), \dots, M(t_n))$ and covariance matrix (1). The collection of all these distributions satisfies the conditions of Kolmogorov's consistency Theorem F.3.3. Hence there exists a real Gaussian field satisfying (2) and (3).

Now assume that $M$ and $K$ are complex-valued and let $\tilde T := T \times \{1, 2\}$. Further, let $\tilde M(t,1) = \operatorname{Re} M(t)$, $\tilde M(t,2) = \operatorname{Im} M(t)$ and define the real kernel $\tilde K$ on $\tilde T \times \tilde T$ by setting
$$\tilde K((s,1),(t,1)) = \tilde K((s,2),(t,2)) = \frac{1}{2}\operatorname{Re} K(s,t)$$
and
$$-\tilde K((s,1),(t,2)) = \tilde K((s,2),(t,1)) = \frac{1}{2}\operatorname{Im} K(s,t), \qquad s, t \in T.$$
It follows from Lemma D.2.14 that $\tilde K$ is positive semidefinite. By the first part of the proof there exists a real field $\tilde Z$ on $\tilde T$ such that the analogues of (2) and (3) are satisfied for $\tilde M$, $\tilde K$ and $\tilde Z$. Setting now
$$Z(t) := \tilde Z(t,1) + i\,\tilde Z(t,2), \qquad t \in T$$
an easy computation shows that $Z$ satisfies the equations (2)–(4).

4 If a real field satisfies (2) and (4), then $E([Z(t) - M(t)]^2) = 0$ and hence $Z(t) = M(t)$.

Example 2.2.5. By Lemma D.2.10 and Theorem 2.2.4, there exists a real Gaussian process $X$ on $T = [0, \infty)$ such that $E(X(t)) = 0$ and
$$E(X(t)X(s)) = \min(t,s), \qquad t, s \in [0, \infty). \qquad (1)$$
Let $t_1 \le t_2 \le s_1 \le s_2$. Using (1) we obtain
$$\begin{aligned}
E\left([X(t_2) - X(t_1)]\,[X(s_2) - X(s_1)]\right) &= C(t_2, s_2) - C(t_2, s_1) - C(t_1, s_2) + C(t_1, s_1) \\
&= t_2 - t_2 - t_1 + t_1 = 0
\end{aligned}$$
i.e., the increments $X(t_2) - X(t_1)$ and $X(s_2) - X(s_1)$ are uncorrelated. Since the increments of a Gaussian process are Gaussian, we infer that they are independent (cf. Theorem 1.10.8). Thus, we obtain the Wiener process constructed in Corollary 2.1.4.

Now let $h \in [0, \infty)$ be arbitrary and define the process $X^h$ by
$$X^h(t) = X(t + h) - X(t), \qquad t \in [0, \infty).$$
Denote by $C^h$ the correlation function of this process. The mean of $X^h$ is zero and for $s \le t$ we have
$$\begin{aligned}
C^h(s,t) &= E\left[(X(s + h) - X(s))\,(X(t + h) - X(t))\right] \\
&= C(s + h, t + h) - C(s + h, t) - C(s, t + h) + C(s, t) \\
&= s + h - \min(s + h, t) - s + s \\
&= \max(0, h - t + s).
\end{aligned}$$
In the same way we see that $C^h(s,t) = \max(0, h + t - s)$, $s \ge t$, and hence
$$C^h(s,t) = \max(0, h - |s - t|).$$
Therefore $C^h(s,t)$ depends only on $s - t$, a property that we will call stationarity (cf. Definition 2.8.1).

Notation 2.2.6. Recall that the set $\mathcal{L}^2(P) := \mathcal{L}^2(\Omega, \mathcal{A}, P)$ of all square integrable complex-valued random variables is a linear space with the usual notion of multiplication by a scalar and addition of random variables. This linear space endowed with the inner product
$$(X, Y) := \int_\Omega X\,\overline{Y}\,dP = E(X\,\overline{Y})$$
is a positive semidefinite inner product space. The inner product space $\mathcal{L}^2_r(P)$ of all square integrable real-valued random variables on $(\Omega, \mathcal{A}, P)$ is introduced in the same way. We denote the corresponding Hilbert spaces by $L^2(P)$ and $L^2_r(P)$, the elements of which are equivalence classes of random variables. In all these cases we will use the notation $\|\cdot\|$ for the norm:
$$\|X\| = \sqrt{(X, X)}.$$
We have by the Cauchy–Schwarz inequality
$$|E(X\,\overline{Y})| \le \|X\|\,\|Y\|.$$
Note that the expectation can be expressed using the inner product:
$$E(X) = (X, 1_\Omega).$$
Two square integrable random variables with zero mean are orthogonal if and only if they are uncorrelated.

For a field $Z$ of second order on $T$ we define the mapping $\mathbf{Z} : T \to L^2(P)$ by
$$\mathbf{Z}_t = Z_t + N$$
where $N$ is the class of random variables which are zero $P$-almost everywhere. We call $\mathbf{Z}$ the $L^2$-valued random field corresponding to $Z$. The notions mean, covariance function, correlation function and Gaussian field are defined for $\mathbf{Z}$ in the same way as for $Z$. By $H(Z)$ and $H(\mathbf{Z})$ we denote the closed linear subspaces generated by all $Z_t$ and $\mathbf{Z}_t$, $t \in T$, respectively. Let $\{e_\alpha\}_{\alpha \in I}$ be an orthonormal basis of $H(\mathbf{Z})$. The index set $I$ may be finite, countably infinite or uncountable.$^{5}$ In any case
$$\mathbf{Z}_t = \sum_{\alpha \in I} f_\alpha(t)\,e_\alpha, \qquad t \in T \qquad (1)$$
where $f_\alpha(t) = (\mathbf{Z}_t, e_\alpha)$ (for each $t$ only countably many summands are different from zero). Moreover,
$$\|Z_t\|^2 = \|\mathbf{Z}_t\|^2 = \sum_{\alpha \in I} |f_\alpha(t)|^2. \qquad (2)$$

Theorem 2.2.7. A complex-valued (real-valued) function $K$ on $T \times T$ is the correlation function or covariance function of a complex (real, respectively) field of second order if and only if it can be represented in the form
$$K(s,t) = \sum_{\alpha \in I} f_\alpha(s)\,\overline{f_\alpha(t)}, \qquad s, t \in T \qquad (1)$$
where $I$ is some nonempty set and $f_\alpha$, $\alpha \in I$, are complex-valued (real-valued, respectively) functions on $T$ such that
$$\sum_{\alpha \in I} |f_\alpha(t)|^2 < \infty, \qquad t \in T. \qquad (2)$$

Proof. We consider only correlation functions and the complex case, the rest is similar. Assume that (1) and (2) hold. Then $K$ is positive semidefinite:
$$\sum_{j,k=1}^{n} K(t_j, t_k)\,c_j\,\overline{c_k} = \sum_{\alpha \in I} \left|\sum_{j=1}^{n} f_\alpha(t_j)\,c_j\right|^2 \ge 0.$$
Theorem 2.2.3 shows that $K$ is a correlation function. Now let $K$ be the correlation function of a field $Z : T \to \mathcal{L}^2(\Omega, \mathcal{A}, P)$. The relations (1) and (2) follow immediately from (2.2.6.1) and (2.2.6.2) by noting that $(Z_s, Z_t) = (\mathbf{Z}_s, \mathbf{Z}_t)$.

5 To give an example where $I$ is uncountable, we take independent random variables $Z_t$, $t \in \mathbb{R}$, with mean 0 and variance 1. Then $\{\mathbf{Z}_t\}_{t \in \mathbb{R}}$ is an orthonormal basis of $H(\mathbf{Z})$. The fact that such random variables exist follows from Corollary F.3.2.
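The easy direction of Theorem 2.2.7 is a purely algebraic identity and can be verified on a small example. The sketch below assumes a finite index set with three arbitrarily chosen complex-valued functions $f_\alpha$; the names `features`, `quadratic_form` and `sum_of_squares` are hypothetical and serve only this illustration.

```python
import cmath
import random

# Three illustrative functions f_alpha on the real line (an arbitrary choice).
features = [
    lambda t: 1.0 + 0.0j,
    lambda t: cmath.exp(1j * t),
    lambda t: t * cmath.exp(-2j * t),
]

def K(s, t):
    """Kernel K(s, t) = sum over alpha of f_alpha(s) * conj(f_alpha(t))."""
    return sum(f(s) * f(t).conjugate() for f in features)

def quadratic_form(points, coeffs):
    """Sum over j, k of K(t_j, t_k) c_j conj(c_k)."""
    return sum(K(tj, tk) * cj * ck.conjugate()
               for tj, cj in zip(points, coeffs)
               for tk, ck in zip(points, coeffs))

def sum_of_squares(points, coeffs):
    """Sum over alpha of |sum over j of f_alpha(t_j) c_j|^2."""
    return sum(abs(sum(f(tj) * cj for tj, cj in zip(points, coeffs))) ** 2
               for f in features)

rng = random.Random(2)
points = [rng.uniform(-3.0, 3.0) for _ in range(6)]
coeffs = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(6)]

q = quadratic_form(points, coeffs)
s = sum_of_squares(points, coeffs)
```

Both quantities agree up to rounding, so the quadratic form is real and nonnegative, which is the positive semidefiniteness used in the proof.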
2.3 Continuity and differentiability

As we will see, continuity and differentiability of a second order field can be completely described in terms of its correlation function.

Definition 2.3.1. A second order field $Z$ is called strongly continuous at $t_0 \in T$ if
$$\lim_{t \to t_0} E\left(|Z(t) - Z(t_0)|^2\right) = 0.$$
The field $Z$ is called strongly continuous if it is strongly continuous at each $t_0 \in T$.

Theorem 2.3.2. A second order field $Z$ on $T$ is strongly continuous at $t \in T$ if and only if its correlation function $C$ is continuous at $(t, t)$. Moreover, if for some $t, s \in T$ the function $C$ is continuous at $(t, t)$ and $(s, s)$, then it is continuous at $(t, s)$ and $(s, t)$ as well.

Proof. For any $h$ such that $t + h \in T$ we have
$$\|Z(t + h) - Z(t)\|^2 = C(t + h, t + h) - C(t + h, t) - C(t, t + h) + C(t, t).$$
Thus, if $C$ is continuous at $(t, t)$, then $Z$ is strongly continuous at $t$. For $t$, $h$ and $\eta$ as above we have
$$\begin{aligned}
C(t + h, t + \eta) - C(t, t) = E\Bigl(&[Z(t + h) - Z(t)]\,\overline{[Z(t + \eta) - Z(t)]} \\
&+ [Z(t + h) - Z(t)]\,\overline{Z(t)} + \overline{[Z(t + \eta) - Z(t)]}\,Z(t)\Bigr).
\end{aligned}$$
Applying the Cauchy–Schwarz inequality we obtain
$$\begin{aligned}
|C(t + h, t + \eta) - C(t, t)| \le{}& \|Z(t + h) - Z(t)\|\,\|Z(t + \eta) - Z(t)\| \\
&+ \|Z(t + h) - Z(t)\|\,\|Z(t)\| + \|Z(t + \eta) - Z(t)\|\,\|Z(t)\|.
\end{aligned}$$
Thus, if $Z$ is strongly continuous at $t$, then $C$ is continuous at $(t, t)$. To prove the second statement assume that $C$ is continuous at $(t, t)$ and $(s, s)$. As we have seen, this implies that $Z$ is strongly continuous at $t$ and $s$. The proof is completed by noting that
$$C(t, s) = (Z(t), Z(s)).$$
Definition 2.3.3. A process $Z$ of second order on $(a, b)$, $-\infty \le a < b \le \infty$, is called strongly differentiable, if for all $t \in (a, b)$ the strong limit$^{6}$
$$\lim_{h \to 0} \frac{Z(t + h) - Z(t)}{h}$$
exists. The limit is denoted by $Z'(t)$. In the same way one can define partial derivatives of a field defined on an open set $T \subset \mathbb{R}^d$.

6 That is, limit with respect to the norm of $L^2(P)$.

Next we show that strong differentiability of a process is closely related to the differentiability of its correlation function. For a function $f$ of two variables we write
$$\Delta^2 f(t, s; \eta, h) = f(t + \eta, s + h) - f(t + \eta, s) - f(t, s + h) + f(t, s).$$

Theorem 2.3.4. Let $Z$ be a process of second order on $(a, b)$ with correlation function $C$. The process $Z$ is strongly differentiable if and only if the limit
$$\lim_{\eta, h \to 0} \frac{\Delta^2 C(t, t; \eta, h)}{\eta h}$$
exists for all $t \in (a, b)$. If $Z$ is strongly differentiable, then the limit
$$D(t, s) := \lim_{\eta, h \to 0} \frac{\Delta^2 C(t, s; \eta, h)}{\eta h}$$
exists for all $t, s \in (a, b)$ and $D$ is the correlation function of $Z'$. Moreover,
$$\frac{d}{dt}(Z(t), v) = (Z'(t), v) \qquad (1)$$
for all $v \in L^2(\Omega, \mathcal{A}, P)$. In particular, with $v = 1_\Omega$,
$$\frac{d}{dt} E(Z(t)) = E(Z'(t)). \qquad (2)$$

Proof. By Cauchy's criterion (cf. Lemma D.3.2), the process $Z$ is strongly differentiable at $t$ if and only if the limit
$$\lim_{\eta, h \to 0} \left(\frac{Z(t + \eta) - Z(t)}{\eta}, \frac{Z(t + h) - Z(t)}{h}\right) = \lim_{\eta, h \to 0} \frac{\Delta^2 C(t, t; \eta, h)}{\eta h}$$
exists. This shows the first statement. If $Z$ is strongly differentiable, then
$$\begin{aligned}
(Z'(t), Z'(s)) &= \left(\lim_{\eta \to 0} \frac{Z(t + \eta) - Z(t)}{\eta},\ \lim_{h \to 0} \frac{Z(s + h) - Z(s)}{h}\right) \\
&= \lim_{\eta, h \to 0} \left(\frac{Z(t + \eta) - Z(t)}{\eta},\ \frac{Z(s + h) - Z(s)}{h}\right) \\
&= \lim_{\eta, h \to 0} \frac{\Delta^2 C(t, s; \eta, h)}{\eta h} = D(t, s).
\end{aligned}$$
Equation (1) follows from the fact that strong convergence implies weak convergence:
$$\lim_{h \to 0} \left(\frac{Z(t + h) - Z(t)}{h}, v\right) = \left(\lim_{h \to 0} \frac{Z(t + h) - Z(t)}{h}, v\right).$$

Applying Lemma B.6.3 we obtain the following corollary:

Corollary 2.3.5. If $C$ has continuous partial derivatives of second order on $(a, b) \times (a, b)$, then $Z$ is strongly differentiable and the correlation function $D$ of $Z'$ is given by
$$D(s, t) = \frac{\partial^2 C}{\partial s\,\partial t}(s, t), \qquad s, t \in (a, b).$$

Lemma 2.3.6. If $Z$ is a strongly differentiable Gaussian process on $(a, b)$, then $Z'$ is Gaussian as well.

Proof. The random vectors
$$n\,\bigl(Z(t_1 + 1/n) - Z(t_1),\ \dots,\ Z(t_k + 1/n) - Z(t_k)\bigr)$$
where $n \in \mathbb{N}$ and $t_j,\ t_j + 1/n \in (a, b)$ are Gaussian and they converge in law to $(Z'(t_1), \dots, Z'(t_k))$ as $n \to \infty$ (see Section F.2). Thus, the statement of the Lemma follows from Theorem 1.10.7.
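The relation between the second difference quotient of $C$ and the mixed partial derivative in Corollary 2.3.5 can be illustrated numerically. The sketch below assumes, purely as an example, the smooth kernel $C(s,t) = e^{-(s-t)^2/2}$ (a standard positive definite kernel), for which $\partial^2 C / \partial s\,\partial t = (1 - (s-t)^2)\,e^{-(s-t)^2/2}$; the step sizes and function names are arbitrary choices for the illustration.

```python
import math

def C(s, t):
    """Example correlation function (an assumption of this sketch)."""
    return math.exp(-0.5 * (s - t) ** 2)

def second_difference(f, t, s, eta, h):
    """f(t+eta, s+h) - f(t+eta, s) - f(t, s+h) + f(t, s)."""
    return f(t + eta, s + h) - f(t + eta, s) - f(t, s + h) + f(t, s)

def D_numeric(t, s, eta=1e-3, h=2e-3):
    """Second difference quotient with two independent small steps."""
    return second_difference(C, t, s, eta, h) / (eta * h)

def D_exact(t, s):
    """Mixed partial derivative of C, computed by hand for this example."""
    return (1.0 - (t - s) ** 2) * math.exp(-0.5 * (t - s) ** 2)

checks = [(0.3, 1.1), (0.5, 0.5), (-1.0, 2.0)]
errors = [abs(D_numeric(t, s) - D_exact(t, s)) for t, s in checks]
```

On the diagonal the value $D(t,t) = 1$ is, by Theorem 2.3.4, the second moment $E|Z'(t)|^2$ of the derivative process with this correlation function.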
2.4 Integration with respect to complex measures
Throughout this section $T \subset \mathbb{R}^d$ denotes a nonempty Borel set and $Z$ is a strongly continuous random field of second order on $T$. As in Notation 2.2.6 we denote by $\mathbf{Z}$ the corresponding $L^2$-valued random field. First we define integrals of the form
\[
\int_T Z(t)\,d\mu(t)
\]
where $\mu \in M_b(T)$ is a complex Radon measure on $T$. Then we prove basic properties of this integral, paying special attention to properties which can be expressed in terms of the correlation function
\[
C(s,t) = E\bigl(Z(s)\,\overline{Z(t)}\bigr) = E\bigl(\mathbf{Z}(s)\,\overline{\mathbf{Z}(t)}\bigr), \qquad s,t \in T.
\]

Definition 2.4.1. Let $\mu \in M_b(T)$. The field $Z$ is called $\mu$-integrable if the function $t \mapsto (Z(t),h)$, $t \in T$, is $\mu$-integrable for all $h \in L^2(P)$ and there exists $I(Z) \in L^2(P)$ such that
\[
\int_T (Z(t),h)\,d\mu(t) = (I(Z),h), \qquad h \in L^2(P). \tag{1}
\]
Note that $I(Z)$ is uniquely determined by equation (2.4.1.1). We write
\[
\int Z\,d\mu := \int_T Z(t)\,d\mu(t) := I(Z).
\]
For a Lebesgue integrable function $f$ on $T$ let
\[
\int_T Z(t)f(t)\,dt := \int_T Z(t)\,d\mu(t)
\]
where $d\mu(t) = f(t)\,dt$. If $S \subset T$ is a Borel set, then we write
\[
\int_S Z(t)\,d\mu(t) = \int_T Z(t)\,1_S(t)\,d\mu(t).
\]
In the special case where $S = [a,b] \subset \mathbb{R}$ we also use the notation $\int_a^b Z\,d\mu$ instead of $\int_{[a,b]} Z\,d\mu$.
The next lemma follows immediately from the definition of the integral. We omit the proof.

Lemma 2.4.2. Assuming that all the integrals below exist, we have:

(i) $\displaystyle \int_T (Z(t),h)\,d\mu(t) = \left(\int_T Z(t)\,d\mu(t),\,h\right), \quad h \in L^2(P)$

(ii) $\displaystyle \int_T \bigl[a_1 Z_1(t) + a_2 Z_2(t)\bigr]\,d\mu(t) = a_1\int_T Z_1(t)\,d\mu(t) + a_2\int_T Z_2(t)\,d\mu(t), \quad a_1,a_2 \in \mathbb{C}$

(iii) $\displaystyle \int_T Z(t)\,d(\mu_1+\mu_2)(t) = \int_T Z(t)\,d\mu_1(t) + \int_T Z(t)\,d\mu_2(t)$

(iv) $\displaystyle \int_{S_1\cup S_2} Z(t)\,d\mu(t) = \int_{S_1} Z(t)\,d\mu(t) + \int_{S_2} Z(t)\,d\mu(t)$, where $S_1,S_2 \subset T$ are disjoint Borel sets

(v) If $Z(t) = h$ for some $h \in L^2(P)$ and all $t \in T$, then $\displaystyle \int_T Z(t)\,d\mu(t) = \mu(T)\,h$.
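Example 2.4.3. (a) Let
\[
\mu = \sum_{j=1}^n c_j\,\delta_{t_j}, \qquad c_j \in \mathbb{C},\ t_j \in T,
\]
be a measure with finite support. Then
\[
\int_T (Z(t),h)\,d\mu(t) = \sum_{j=1}^n c_j\,(Z(t_j),h) = \left(\sum_{j=1}^n c_j Z(t_j),\,h\right), \qquad h \in L^2(P).
\]
Thus, by Definition 2.4.1,
\[
\int Z\,d\mu = \sum_{j=1}^n c_j Z(t_j).
\]
Moreover, we have
\[
\left\|\int Z\,d\mu\right\|^2 = \left(\sum_{j=1}^n c_j Z(t_j),\,\sum_{k=1}^n c_k Z(t_k)\right) = \sum_{j,k=1}^n c_j\,\overline{c_k}\,C(t_j,t_k). \tag{1}
\]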
(b) Assume that $Z(t) = f(t)X$, $t \in T = [a,b]$, where $f$ is a continuous function on $T$ and $X \in L^2(P)$. For all $h \in L^2(P)$ we have
\[
\int_a^b (Z(t),h)\,d\mu(t) = \int_a^b (f(t)X,h)\,d\mu(t) = (X,h)\int_a^b f(t)\,d\mu(t) = \left(\int_a^b f(t)\,d\mu(t)\,X,\,h\right)
\]
and hence
\[
\int_a^b Z(t)\,d\mu(t) = \int_a^b f(t)\,d\mu(t)\;X.
\]
Using this and (2.4.2.ii) we obtain the more general relation
\[
\int_a^b \sum_{j=1}^n f_j(t)X_j\,d\mu(t) = \sum_{j=1}^n \int_a^b f_j(t)\,d\mu(t)\;X_j
\]
where $X_j \in L^2(P)$ and $f_j$ is a continuous function on $[a,b]$.
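Part (a) of Example 2.4.3 can be illustrated in a finite-dimensional model. The sketch below assumes that $L^2(P)$ is replaced by $\mathbb{C}^m$ with the inner product $(x,y)=\sum_i x_i\overline{y_i}$ and that the values $Z(t_j)$ are random vectors; these choices and all variable names are hypothetical and serve only to check identity (2.4.3.1).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4  # m: dimension of the model space C^m, n: number of support points

# Hypothetical stand-in for Z(t_1), ..., Z(t_n): row j holds the vector Z(t_j).
Z = rng.normal(size=(n, m)) + 1j * rng.normal(size=(n, m))
c = rng.normal(size=n) + 1j * rng.normal(size=n)  # weights c_j of the measure

def inner(x, y):
    """Inner product (x, y) = sum_i x_i * conj(y_i), linear in the first argument."""
    return np.sum(x * np.conj(y))

# Correlation matrix C[j, k] = (Z(t_j), Z(t_k)).
C = np.array([[inner(Z[j], Z[k]) for k in range(n)] for j in range(n)])

integral = c @ Z                               # sum_j c_j Z(t_j)
norm_sq = inner(integral, integral).real       # squared norm of the integral
quad_form = (c @ C @ np.conj(c)).real          # sum_{j,k} c_j conj(c_k) C(t_j, t_k)

print(norm_sq, quad_form)
```

Both numbers agree up to rounding, and the quadratic form is nonnegative, as identity (1) requires.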
Theorem 2.4.4. Assume that $t \mapsto (Z(t),h)$ is $\mu$-integrable for all $h \in L^2(P)$. Then $Z$ is $\mu$-integrable if and only if there exists a constant $K \in [0,\infty)$ with
\[
\left|\int_T (Z(t),h)\,d\mu(t)\right| \le K\,\|h\|, \qquad h \in L^2(P). \tag{1}
\]

Proof. If $Z$ is $\mu$-integrable, then the inequality (1) holds with $K = \|\int Z\,d\mu\|$. This follows from (2.4.1.1) and from the Cauchy–Schwarz inequality
\[
|(I(Z),h)| \le \|I(Z)\|\,\|h\|.
\]
Assume now that (1) holds with some constant $K$. Then
\[
h \mapsto \int_T (h,Z(t))\,d\overline{\mu}(t)
\]
is a bounded linear functional$^7$ on $L^2(P)$. By a theorem of Riesz (cf. Theorem D.3.4) there exists $I \in L^2(P)$ such that
\[
\int_T (h,Z(t))\,d\overline{\mu}(t) = (h,I), \qquad h \in L^2(P),
\]
i.e.,
\[
\int_T (Z(t),h)\,d\mu(t) = (I,h), \qquad h \in L^2(P).
\]
Thus, the field $Z$ is $\mu$-integrable.

7 To obtain a linear functional we use the conjugate of the integral in (1).

Corollary 2.4.5. If
\[
\int_T \|Z(t)\|\,d|\mu|(t) < \infty \tag{1}
\]
then $Z$ is $\mu$-integrable and
\[
\left\|\int_T Z(t)\,d\mu(t)\right\| \le \int_T \|Z(t)\|\,d|\mu|(t). \tag{2}
\]

Proof. The Cauchy–Schwarz inequality $|(Z(t),h)| \le \|Z(t)\|\,\|h\|$ and (1) imply that $t \mapsto (Z(t),h)$ is $|\mu|$-integrable and hence $\mu$-integrable as well. Moreover,
\[
\left|\int_T (Z(t),h)\,d\mu(t)\right| \le \int_T |(Z(t),h)|\,d|\mu|(t) \le \int_T \|Z(t)\|\,d|\mu|(t)\;\|h\|.
\]
By Theorem 2.4.4, the field $Z$ is $\mu$-integrable. The left-hand side of the inequality above is equal to $\bigl|\bigl(\int Z\,d\mu,\,h\bigr)\bigr|$. Setting $h = \int Z\,d\mu$ and dividing by $\|h\|$ if $\|h\| \ne 0$ we obtain (2).
Theorem 2.4.6. If $Z$ is integrable with respect to $\mu$ and $\nu$, then the correlation function $C$ is $\mu\otimes\overline{\nu}$-integrable and
\[
\left(\int_T Z\,d\mu,\,\int_T Z\,d\nu\right) = \int_{T\times T} C(t,s)\,d(\mu\otimes\overline{\nu})(t,s). \tag{1}
\]
In particular,
\[
\left\|\int_T Z\,d\mu\right\|^2 = \int_{T\times T} C(t,s)\,d(\mu\otimes\overline{\mu})(t,s). \tag{2}
\]

Proof. Equation (2.4.1.1) with $h = \int Z\,d\nu$ yields
\[
\left(\int Z(t)\,d\mu(t),\,\int Z(s)\,d\nu(s)\right) = \int \left(Z(t),\,\int Z(s)\,d\nu(s)\right) d\mu(t). \tag{3}
\]
Setting now $h = Z(t)$ in equation (2.4.1.1), replacing the integration variable $t$ by $s$, and replacing the measure $\mu$ by $\nu$ we obtain
\[
\left(Z(t),\,\int Z(s)\,d\nu(s)\right) = \overline{\int (Z(s),Z(t))\,d\nu(s)} = \int C(t,s)\,d\overline{\nu}(s).
\]
This relation together with equation (3) shows that
\[
\left(\int Z(t)\,d\mu(t),\,\int Z(s)\,d\nu(s)\right) = \int\!\!\int C(t,s)\,d\overline{\nu}(s)\,d\mu(t).
\]
By Fubini’s theorem $C$ is $\mu\otimes\overline{\nu}$-integrable and (1) holds.

Next we characterize integrability of a random field in terms of integrability of its correlation function.

Theorem 2.4.7. The field $Z$ is $\mu$-integrable if and only if the correlation function $C$ is $\mu\otimes\overline{\mu}$-integrable.

Proof. In view of Theorem 2.4.6 it remains to prove that $Z$ is $\mu$-integrable whenever $C$ is $\mu\otimes\overline{\mu}$-integrable. To show this, let $h \in L^2(P)$, $h \ne 0$, be arbitrary and write
\[
f_h(t) := (Z(t),h), \qquad Z_h(t) := Z(t) - \frac{f_h(t)}{\|h\|^2}\,h, \qquad t \in T.
\]
Then $(Z_h(t),h) = 0$ and
\[
C(t,s) = (Z(t),Z(s)) = (Z_h(t),Z_h(s)) + \frac{1}{\|h\|^2}\,f_h(t)\,\overline{f_h(s)}.
\]
Let $S \ne \emptyset$ be an arbitrary compact subset of $T$. By continuity, the functions $t \mapsto \|Z(t)\|$ and $t \mapsto \|Z_h(t)\|$ are bounded on $S$. Consequently, the restrictions of $Z$ and $Z_h$ to $S$ are integrable with respect to any complex measure (see Corollary 2.4.5). Applying Theorem 2.4.6 we obtain
\[
\begin{aligned}
\int_S\!\int_S C(t,s)\,d\mu(t)\,d\overline{\mu}(s)
&= \int_S\!\int_S (Z_h(t),Z_h(s))\,d\mu(t)\,d\overline{\mu}(s)
 + \frac{1}{\|h\|^2}\int_S\!\int_S f_h(t)\,\overline{f_h(s)}\,d\mu(t)\,d\overline{\mu}(s)\\
&\ge \frac{1}{\|h\|^2}\left|\int_S f_h(t)\,d\mu(t)\right|^2
\end{aligned}
\]
(note that the first integral after the equation sign is nonnegative by (2.4.6.2)). Consequently,
\[
\left|\int_S f_h(t)\,d\mu(t)\right| \le \left(\int_{T\times T} |C|\,d|\mu\otimes\overline{\mu}|\right)^{1/2} \|h\|.
\]
Using Lemma E.1.18 we conclude$^8$ that $f_h$ is $\mu$-integrable and the inequality above remains valid if we replace $S$ by $T$. By Theorem 2.4.4 the field $Z$ is $\mu$-integrable.

8 To apply this lemma we extend $\mu$ to a measure on $\mathbb{R}^d$ by setting $\mu(B) := \mu(B\cap T)$, $B \in \mathcal{B}(\mathbb{R}^d)$. The function $f_h$ is extended to $\mathbb{R}^d$ by setting it equal to zero on $\mathbb{R}^d \setminus T$.

We already know that a correlation function $K$ is positive semidefinite, i.e.,
\[
\sum_{i,j=1}^n K(t_i,t_j)\,c_i\,\overline{c_j} \ge 0
\]
for all $n \in \mathbb{N}$, $t_j \in T$ and $c_j \in \mathbb{C}$. The next corollary contains a more general inequality.

Corollary 2.4.8. Let $K$ be a continuous positive semidefinite kernel on $T$. Then the inequality
\[
\int_{T\times T} K(t,s)\,d(\mu\otimes\overline{\mu})(t,s) \ge 0
\]
holds for all $\mu \in M_b(T)$ such that $K$ is $\mu\otimes\overline{\mu}$-integrable. In particular, the inequality
\[
\int_T\!\int_T K(t,s)\,h(t)\,\overline{h(s)}\,dt\,ds \ge 0
\]
holds for every continuous function $h : T \to \mathbb{C}$ with compact support.

Proof. By Theorems 2.2.3 and 2.3.2 the kernel $K$ is the correlation function of a strongly continuous random field. This field is $\mu$-integrable in view of Theorem 2.4.7. The corollary follows now from (2.4.6.2).
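The second inequality of Corollary 2.4.8 can be checked numerically with a quadrature rule. The sketch below is a rough illustration under assumptions that are not taken from the text: the kernel $K(t,s)=e^{-|t-s|}$ on $T=[0,1]$, a particular complex test function $h$, and the midpoint rule.

```python
import numpy as np

# Illustrative choices (assumptions): kernel K(t, s) = exp(-|t - s|) on [0, 1]
# and a complex-valued continuous test function h.
n = 200
t = (np.arange(n) + 0.5) / n          # midpoints of a uniform grid on [0, 1]
w = 1.0 / n                           # midpoint-rule quadrature weight
K = np.exp(-np.abs(t[:, None] - t[None, :]))
h = np.sin(7 * t) + 1j * t * np.cos(3 * t)

# Midpoint-rule approximation of the double integral of K(t, s) h(t) conj(h(s)).
double_integral = (w * h) @ K @ np.conj(w * h)

print(double_integral)
```

The imaginary part vanishes up to rounding and the real part is nonnegative, because the discretised integral is itself a quadratic form of a positive semidefinite matrix.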
Theorem 2.4.9. Let $\{\mu_n\}$ be a sequence of complex measures on $T$ converging weakly to some complex measure $\mu$. If $Z$ is integrable with respect to $\mu$ and $\mu_n$, then
\[
\int_T Z\,d\mu = \lim_{n\to\infty}\int_T Z\,d\mu_n. \tag{1}
\]

Proof. We may suppose that $\mu$ and $\mu_n$ are nonnegative. If $T$ is compact, then, by continuity, the function $C$ is bounded and hence integrable with respect to any complex measure on $T\times T$. Using (2.4.6.2) we obtain
\[
\left\|\int_T Z\,d\mu - \int_T Z\,d\mu_n\right\|^2 = \left\|\int_T Z\,d(\mu-\mu_n)\right\|^2 = \int_{T\times T} C\,d\bigl((\mu-\mu_n)\otimes(\mu-\mu_n)\bigr).
\]
In view of Theorem E.1.14, the sequence $\{(\mu-\mu_n)\otimes(\mu-\mu_n)\}$ converges weakly to 0. Since $C$ is bounded, the right-hand side of the relation above tends to zero, from which (1) follows.

Now let $T$ be arbitrary. For each $\varepsilon > 0$ there exist disjoint Borel sets $T_b$ and $T_u$ such that $T_b$ is compact, $T = T_b \cup T_u$ and
\[
\int_{T_u\times T_u} C\,d(\mu\otimes\mu) < \varepsilon
\]
(this integral is nonnegative in view of (2.4.6.2)). Therefore,
\[
\left\|\int_T Z\,d\mu - \int_{T_b} Z\,d\mu\right\|^2 = \left\|\int_{T_u} Z\,d\mu\right\|^2 = \int_{T_u\times T_u} C\,d(\mu\otimes\mu) < \varepsilon.
\]
By the first part of the proof, the statement of the theorem is true for the compact set $T_b$. The inequality above shows that it is true for $T$ as well.

Using Theorem E.1.8 we obtain the following corollary:

Corollary 2.4.10. For every complex measure $\mu$ there exists a sequence $\{\mu_n\}$ of complex measures with finite support converging weakly to $\mu$ and satisfying the relation (2.4.9.1).

Corollary 2.4.11. If $Z$ is a $\mu_j$-integrable Gaussian field $(j = 1,\ldots,d)$, then
\[
\left(\int_T Z\,d\mu_1,\,\ldots,\,\int_T Z\,d\mu_d\right)
\]
is a Gaussian random vector.

Proof. The statement is obvious if the $\mu_j$’s have finite support. The general case follows from Corollary 2.4.10 and from the fact that the limit of weakly convergent Gaussian random vectors is Gaussian (cf. Theorem 1.10.7).
Theorem 2.4.12 (Newton–Leibniz formula for fields). Let $Z$ be a strongly continuous field on $(a,b)$, $-\infty \le a < b \le \infty$, and let $t_0 \in (a,b)$ be arbitrary.

(i) The field $t \mapsto \int_{t_0}^t Z(s)\,ds$ is strongly differentiable and
\[
Z(t) = \frac{d}{dt}\int_{t_0}^t Z(s)\,ds, \qquad t \in (a,b).
\]

(ii) If $C$ is twice continuously differentiable, then
\[
Z(t) = Z(t_0) + \int_{t_0}^t Z'(s)\,ds, \qquad t \in (a,b).
\]

Proof. (i) We have to show that
\[
\frac{1}{h}\left[\int_{t_0}^{t+h} Z(s)\,ds - \int_{t_0}^{t} Z(s)\,ds\right] = \frac{1}{h}\int_t^{t+h} Z(s)\,ds
\]
converges to $Z(t)$ as $h \to 0$. This follows from
\[
\begin{aligned}
\left\|\frac{1}{h}\int_t^{t+h} Z(s)\,ds - Z(t)\right\|
&= \frac{1}{|h|}\left\|\int_t^{t+h} \bigl[Z(s)-Z(t)\bigr]\,ds\right\|\\
&\le \frac{1}{|h|}\int_{[t,t+h]} \|Z(s)-Z(t)\|\,ds\\
&\le \sup_{s\in[t,t+h]} \|Z(s)-Z(t)\| \to 0.
\end{aligned}
\]

(ii) Corollary 2.3.5 shows that $Z$ is strongly differentiable and the correlation function of $Z'$ is continuous. Consequently, $Z'$ is strongly continuous and hence integrable with respect to the Lebesgue measure on $[t_0,t]$. By Theorem 2.3.4, the function $t \mapsto (Z(t),h)$ is continuously differentiable for all $h \in L^2(P)$. Applying the classical Newton–Leibniz formula to this function we obtain
\[
(Z(t),h) = (Z(t_0),h) + \int_{t_0}^t \frac{d}{ds}(Z(s),h)\,ds.
\]
Equation (2.3.4.1) and the definition of the integral show that
\[
(Z(t),h) = (Z(t_0),h) + \left(\int_{t_0}^t Z'(s)\,ds,\,h\right), \qquad h \in L^2(P),
\]
from which the statement follows.

Lemma 2.4.13 (dominated convergence). Let $Z, Z_n$, $n \in \mathbb{N}$, be strongly continuous fields on $T$ such that
\[
\lim_{n\to\infty} Z_n(t) = Z(t), \qquad t \in T. \tag{1}
\]
Assume further that there exists a $|\mu|$-integrable function $g$ on $T$ such that $\|Z_n(t)\| \le g(t)$ for all $n$ and $t$. Then $Z$ and $Z_n$ are $\mu$-integrable and
\[
\lim_{n\to\infty}\int_T Z_n(t)\,d\mu(t) = \int_T Z(t)\,d\mu(t).
\]

Proof. Relation (1) implies that $\lim_{n\to\infty}\|Z_n(t)\| = \|Z(t)\|$ and hence $\|Z(t)\| \le g(t)$. By Corollary 2.4.5, the fields $Z$ and $Z_n$ are $\mu$-integrable. Using the inequality $\|Z_n(t)-Z(t)\| \le 2g(t)$, the lemma follows from
\[
\left\|\int_T Z_n(t)\,d\mu(t) - \int_T Z(t)\,d\mu(t)\right\| = \left\|\int_T \bigl[Z_n(t)-Z(t)\bigr]\,d\mu(t)\right\| \le \int_T \|Z_n(t)-Z(t)\|\,d|\mu|(t)
\]
and from Lebesgue’s theorem on dominated convergence.

Definition 2.4.14. Assume that $T = \mathbb{R}^d$ or $T = \mathbb{Z}^d$ and let $\mu$ be a complex measure on $T$. Further, let $Z$ be a continuous field of second order on $T$ such that the field $x \mapsto Z(t-x)$ is $\mu$-integrable for all $t \in T$. We define the field $\mu * Z$, the convolution of $\mu$ and $Z$, by
\[
\mu * Z(t) = \int_T Z(t-x)\,d\mu(x), \qquad t \in T.
\]
If $f \in L^1(\lambda)$ or $f \in L^1(\mathbb{Z}^d)$, then we write
\[
f * Z(t) := \int_{\mathbb{R}^d} Z(t-x)\,f(x)\,d\lambda(x), \qquad t \in T,
\]
and
\[
f * Z(t) := \sum_{n\in\mathbb{Z}^d} Z(t-n)\,f(n), \qquad t \in T,
\]
respectively. In the next theorem we collect basic properties of the convolution.

Theorem 2.4.15. Assume that all convolutions below exist. Then we have
\[
(\mu+\nu) * Z = \mu * Z + \nu * Z \tag{1}
\]
\[
(c\mu) * Z = c\,(\mu * Z), \qquad c \in \mathbb{C} \tag{2}
\]
\[
\mu * Z(t) = \int Z\,d(\delta_t * \check{\mu}) \tag{3}
\]
\[
(\mu * \nu) * Z = \mu * (\nu * Z) \tag{4}
\]
for all $t \in T$. If a sequence $\{\mu_n\}$ converges weakly to $\mu$, then
\[
\lim_{n\to\infty} \mu_n * Z(t) = \mu * Z(t), \qquad t \in T. \tag{5}
\]

Proof. The first three equations follow immediately from the definition of the convolution, while (5) is a consequence of Theorem 2.4.9. It is easy to check that (4) holds for measures with finite support. In the general case we approximate $\mu$ and $\nu$ by measures with finite support (see Corollary 2.4.10) and apply Theorem E.2.5.

Remark 2.4.16. Let $Z$ be a not necessarily continuous, second order field on $\mathbb{R}^d$. If the complex measure $\mu$ has finite support, then we define the convolution of $\mu$ and $Z$, as in Definition 2.4.14, by
\[
\mu * Z(t) = \int_{\mathbb{R}^d} Z(t-x)\,d\mu(x), \qquad t \in \mathbb{R}^d.
\]
Then equations (1)–(4) in the previous theorem still remain valid.
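For measures with finite support on $\mathbb{Z}$, properties (1) and (4) of Theorem 2.4.15 reduce to identities between finite sums, which can be verified directly. The following sketch assumes a finite-dimensional model in which the field takes values in $\mathbb{C}^m$ and vanishes outside a finite window; the helper `convolve_field` and all data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def convolve_field(mu, Z):
    """(mu * Z)(t) = sum_x Z(t - x) mu(x) for sequences indexed from 0.

    mu: 1-D array of weights mu(0), ..., mu(p-1).
    Z:  array of shape (q, m); row t is the vector Z(t) in C^m (zero elsewhere).
    """
    return np.stack([np.convolve(mu, Z[:, i]) for i in range(Z.shape[1])], axis=1)

mu = rng.normal(size=3) + 1j * rng.normal(size=3)
nu = rng.normal(size=4) + 1j * rng.normal(size=4)
nu2 = rng.normal(size=3) + 1j * rng.normal(size=3)   # same support size as mu
Z = rng.normal(size=(5, 2)) + 1j * rng.normal(size=(5, 2))

# Property (4): (mu * nu) * Z = mu * (nu * Z)
lhs = convolve_field(np.convolve(mu, nu), Z)
rhs = convolve_field(mu, convolve_field(nu, Z))

# Property (1): (mu + nu2) * Z = mu * Z + nu2 * Z
lhs_sum = convolve_field(mu + nu2, Z)
rhs_sum = convolve_field(mu, Z) + convolve_field(nu2, Z)

print(np.allclose(lhs, rhs), np.allclose(lhs_sum, rhs_sum))
```

`np.convolve` in its default mode returns the full convolution, so both sides of (4) have the same length and can be compared entrywise.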
2.5 The Karhunen–Loève decomposition

In this section we use the notation
\[
I := [a,b] := [a_1,b_1]\times\cdots\times[a_d,b_d]
\]
where $a,b \in \mathbb{R}^d$ and $-\infty < a_j < b_j < \infty$. We will prove a decomposition theorem for continuous positive semidefinite kernels on $I\times I$. Using this result we derive a decomposition theorem for strongly continuous fields on $I$.

We denote by $\mathcal{L}^2(a,b)$ the positive semidefinite inner product space of all complex-valued functions on $[a,b]$ which are square integrable with respect to the Lebesgue measure. The corresponding Hilbert space is denoted by $L^2(a,b)$. We write $\int_a^b$ instead of $\int_I$. Finally, $C : I\times I \to \mathbb{C}$ will be a continuous kernel.

Definition 2.5.1. A function $\varphi \in \mathcal{L}^2(a,b)$ is called an eigenfunction of $C$ with eigenvalue $\lambda \in \mathbb{C}$ if
\[
\lambda\varphi(t) = \int_a^b C(t,s)\,\varphi(s)\,ds, \qquad t \in [a,b].
\]

Lemma 2.5.2. Let $h \in \mathcal{L}^2(a,b)$. The function $g$ defined by
\[
g(t) := \int_a^b C(t,s)\,h(s)\,ds, \qquad t \in [a,b],
\]
is continuous. In particular, if $\varphi$ is an eigenfunction of $C$ with eigenvalue $\lambda \ne 0$, then $\varphi$ is continuous.
Proof. For $t, t_0 \in I$ the Cauchy–Schwarz inequality shows that
\[
|g(t)-g(t_0)|^2 = \left|\int_a^b \bigl(C(t,s)-C(t_0,s)\bigr)h(s)\,ds\right|^2 \le \int_a^b |C(t,s)-C(t_0,s)|^2\,ds \int_a^b |h(s)|^2\,ds.
\]
The kernel $C$ is continuous and hence uniformly continuous on $I\times I$. Using this we see that the right-hand side of the relation above tends to zero as $t \to t_0$.

Lemma 2.5.3. The mapping $K$ defined by$^9$
\[
Kh(t) := \int_a^b C(t,s)\,h(s)\,ds, \qquad h \in L^2(a,b),\ t \in [a,b],
\]
is a compact linear operator in $L^2(a,b)$.

9 To be precise, $Kh$ is the equivalence class represented by the integral on the right-hand side.

Proof. Using the fact that $C$ is bounded on $I\times I$ it is easy to see that $K$ is a bounded linear operator in $L^2(a,b)$. We have to show that for every bounded sequence $\{h_n\}$, the sequence $\{Kh_n\}$ has a convergent subsequence. Write $g_n = Kh_n$. To simplify the notation, we also consider $g_n$ as a function defined pointwise by the integral above. Applying the Cauchy–Schwarz inequality as in Lemma 2.5.2, we see that the sequence $\{g_n\}$ is equicontinuous and uniformly bounded. In view of the theorem of Arzelà and Ascoli (cf. Theorem B.2.9), the sequence $\{g_n\}$ has a subsequence converging uniformly on $[a,b]$ to a continuous function. It follows that the operator $K$ is indeed compact.

Theorem 2.5.4 (Mercer). An arbitrary continuous positive semidefinite kernel $C : I\times I \to \mathbb{C}$ admits the decomposition
\[
C(t,s) = \sum_{n=1}^\infty \lambda_n\,\varphi_n(t)\,\overline{\varphi_n(s)}, \qquad t,s \in I, \tag{1}
\]
where

(i) each $\varphi_n$ is an eigenfunction of $C$ with eigenvalue $\lambda_n \ge 0$;

(ii) the series above is absolutely and uniformly convergent;

(iii) $\{\varphi_n + N\}_{n=1}^\infty$ is an orthonormal basis of $L^2(a,b)$, where $N = \{\varphi \in \mathcal{L}^2(a,b) : (\varphi,\varphi) = 0\}$.

Proof. To simplify the notation we identify continuous functions on $[a,b]$ with the corresponding elements of $L^2(a,b)$. We consider the compact linear operator $K$
defined in Lemma 2.5.3. Using that $C(t,s) = \overline{C(s,t)}$ it is easy to check that $(Kh,g) = (h,Kg)$, i.e., $K$ is self-adjoint. It follows from Corollary 2.4.8 that $K$ is nonnegative and hence all eigenvalues of $K$ are nonnegative. By Remark D.3.13, there exists an orthonormal basis $\{\varphi_n\}_1^\infty$ of $L^2(a,b)$ consisting of eigenvectors of $K$ with eigenvalues $\{\lambda_n\}_1^\infty$. Every $h \in L^2(a,b)$ admits the decomposition
\[
h = \sum_{n=1}^\infty (h,\varphi_n)\,\varphi_n.
\]
For fixed $t$ we apply the above decomposition to the function $h : s \mapsto C(s,t)$:
\[
C(t,\cdot) = \overline{C(\cdot,t)} = \sum_{n=1}^\infty (\varphi_n, C(\cdot,t))\,\overline{\varphi_n}, \qquad t \in [a,b].
\]
Since $\varphi_n$ is an eigenvector of $K$ with eigenvalue $\lambda_n$ we have
\[
(\varphi_n, C(\cdot,t)) = \int_a^b C(t,s)\,\varphi_n(s)\,ds = K\varphi_n(t) = \lambda_n\varphi_n(t). \tag{2}
\]
Thus,
\[
C(t,\cdot) = \sum_{n=1}^\infty \lambda_n\,\varphi_n(t)\,\overline{\varphi_n}.
\]
From here, again using (2) and the orthonormality of the vectors $\varphi_n$, we obtain
\[
0 = \lim_{n\to\infty}\left\|C(t,\cdot) - \sum_{k=1}^n \lambda_k\,\varphi_k(t)\,\overline{\varphi_k}\right\|^2
= \lim_{n\to\infty}\left[\int_a^b |C(t,s)|^2\,ds - \sum_{k=1}^n \lambda_k^2\,|\varphi_k(t)|^2\right], \qquad t \in [a,b].
\]
If $\lambda_k \ne 0$, then the function $\varphi_k$ is continuous (cf. Lemma 2.5.2). It is easy to see that the function
\[
t \mapsto \int_a^b |C(t,s)|^2\,ds
\]
is continuous as well. By Dini’s Theorem B.2.5, the series $\sum_{k=1}^\infty \lambda_k^2\,|\varphi_k(t)|^2$ converges uniformly on $[a,b]$. Using this and the Cauchy–Schwarz inequality
\[
\sum_{k=p}^q \bigl|\lambda_k\,\varphi_k(t)\,\overline{\varphi_k(s)}\bigr| \le \left(\sum_{k=p}^q \lambda_k\,|\varphi_k(t)|^2\right)^{1/2}\left(\sum_{k=p}^q \lambda_k\,|\varphi_k(s)|^2\right)^{1/2}
\]
we conclude that (ii) holds. To show equation (1) we consider for fixed $s$ the continuous nonnegative function
\[
t \mapsto \left|C(t,s) - \sum_{n=1}^\infty \lambda_n\,\varphi_n(t)\,\overline{\varphi_n(s)}\right|^2.
\]
Using (2) we see that its integral over $I$ is equal to zero. Thus, the function above is equal to zero everywhere.

Theorem 2.5.5 (Karhunen–Loève). Let $Z : I \to L^2(\Omega,\mathcal{A},P)$ be a strongly continuous field with correlation function $C$. Then $Z$ admits the expansion
\[
Z(t) = \sum_{n\in S} \sqrt{\lambda_n}\,\varphi_n(t)\,X_n, \qquad t \in I, \tag{1}
\]
where

(i) $\varphi_n$ and $\lambda_n$ are as in Mercer’s theorem and $S = \{n \in \mathbb{N} : \lambda_n > 0\}$;$^{10}$

(ii) $\displaystyle X_n = \frac{1}{\sqrt{\lambda_n}}\int_a^b Z(s)\,\overline{\varphi_n(s)}\,ds$;

(iii) the $X_n$’s form an orthonormal system in $L^2(\Omega,\mathcal{A},P)$;

(iv) the series (1) converges uniformly on $I$.
10 The correlation function $C$ is positive semidefinite, hence we may apply Mercer’s Theorem 2.5.4 to it. If $S = \emptyset$, then the sum (1) is zero by definition.

Proof. For $n \in S$ we define $X_n$ by equation (ii) and write $X_n = 0$ if $n \notin S$. Note that the function $s \mapsto \|Z(s)\,\overline{\varphi_n(s)}\|$ is bounded by continuity, hence the integral in (ii) exists (cf. Corollary 2.4.5). Applying Theorem 2.4.6 we see that
\[
\begin{aligned}
(X_n,X_m) &= \frac{1}{\sqrt{\lambda_n\lambda_m}}\left(\int_a^b Z(t)\,\overline{\varphi_n(t)}\,dt,\ \int_a^b Z(s)\,\overline{\varphi_m(s)}\,ds\right)\\
&= \frac{1}{\sqrt{\lambda_n\lambda_m}}\int_a^b\!\!\int_a^b C(t,s)\,\overline{\varphi_n(t)}\,\varphi_m(s)\,ds\,dt\\
&= \sqrt{\frac{\lambda_m}{\lambda_n}}\int_a^b \overline{\varphi_n(t)}\,\varphi_m(t)\,dt\\
&= \delta_{n,m}, \qquad n,m \in S,
\end{aligned}
\]
i.e., (iii) holds. Using property (2.4.2.i) of the integral and the definition of $X_n$, we obtain
\[
\begin{aligned}
(Z(t),X_n) &= \frac{1}{\sqrt{\lambda_n}}\left(Z(t),\ \int_a^b Z(s)\,\overline{\varphi_n(s)}\,ds\right)\\
&= \frac{1}{\sqrt{\lambda_n}}\int_a^b C(t,s)\,\varphi_n(s)\,ds\\
&= \sqrt{\lambda_n}\,\varphi_n(t), \qquad n \in \mathbb{N}.
\end{aligned}
\]
Applying this we see that
\[
\begin{aligned}
\left\|Z(t) - \sum_{n=1}^k \sqrt{\lambda_n}\,\varphi_n(t)\,X_n\right\|^2
&= C(t,t) - 2\,\mathrm{Re}\sum_{n=1}^k \sqrt{\lambda_n}\,\overline{\varphi_n(t)}\,(Z(t),X_n) + \sum_{n=1}^k \lambda_n\,|\varphi_n(t)|^2\\
&= C(t,t) - \sum_{n=1}^k \lambda_n\,|\varphi_n(t)|^2, \qquad k \in \mathbb{N}.
\end{aligned}
\]
In view of Theorem 2.5.4, the right-hand side converges to zero uniformly on $I$ as $k \to \infty$. This completes the proof.

Remark 2.5.6. If $Z$ is Gaussian, then in view of Corollary 2.4.11 the random variables $X_n$ in Theorem 2.5.5 are Gaussian. If the mean of $Z$ is zero, then $E(X_n) = 0$. This follows from (2.4.2.i) and from $E(X_n) = (X_n,1)$. By orthogonality, the random variables $X_n$ are then uncorrelated and hence independent (cf. Theorem 1.10.8).

The next theorem is the converse of the previous one and it is easier to prove.

Theorem 2.5.7. Let $\emptyset \ne S \subset \mathbb{N}$, for each $n \in S$ let $\lambda_n$ be a positive number and let $\{X_n\}_{n\in S}$ be an orthonormal system in $L^2(\Omega,\mathcal{A},P)$. Further, let $\{\varphi_n\}_{n\in S}$ be an orthonormal system of continuous functions in $L^2(a,b)$ such that the series
\[
Z(t) = \sum_{n\in S} \sqrt{\lambda_n}\,\varphi_n(t)\,X_n, \qquad t \in I,
\]
is uniformly convergent on $I$. Then this series defines a strongly continuous second order field $Z$ and the function $\varphi_n$ is an eigenfunction with eigenvalue $\lambda_n$ of the correlation function of $Z$.

Proof. To simplify the notation we assume that $S = \mathbb{N}$; the general case can be treated in the same way. It is easy to check that $Z$ is a field of second order with correlation function
\[
C(t,s) = (Z(t),Z(s)) = \sum_{n=1}^\infty \lambda_n\,\varphi_n(t)\,\overline{\varphi_n(s)}, \qquad t,s \in I,
\]
where the series is uniformly convergent on $I$. From this we conclude that $C$ and hence $Z$ is continuous. Since $\{\varphi_n\}_{n\in\mathbb{N}}$ is an orthonormal system, we have
\[
\begin{aligned}
\int_a^b C(t,s)\,\varphi_k(s)\,ds &= \int_a^b \sum_{n=1}^\infty \lambda_n\,\varphi_n(t)\,\overline{\varphi_n(s)}\,\varphi_k(s)\,ds\\
&= \sum_{n=1}^\infty \lambda_n\,\varphi_n(t)\int_a^b \overline{\varphi_n(s)}\,\varphi_k(s)\,ds\\
&= \lambda_k\,\varphi_k(t), \qquad t \in I,\ k \in \mathbb{N}.
\end{aligned}
\]
Example 2.5.8. We consider the Wiener process $W$ from Corollary 2.1.4 on a finite interval $I = [0,T]$, $T > 0$. The correlation function of $W$ is given by $C(t,s) = \min(t,s)$. First we compute the eigenfunctions of $C$ with positive eigenvalues. The equation
\[
\int_0^T C(t,s)\,\varphi(s)\,ds = \lambda\varphi(t), \qquad 0 \le t \le T,\ \lambda > 0,
\]
can be rewritten as
\[
\int_0^t s\,\varphi(s)\,ds + t\int_t^T \varphi(s)\,ds = \lambda\varphi(t).
\]
From here we see that $\varphi(0) = 0$. The left-hand side, hence also the right-hand side, is differentiable with respect to $t$.$^{11}$ We have
\[
\int_t^T \varphi(s)\,ds = \lambda\varphi'(t), \qquad t \in [0,T].
\]
This equation implies that $\varphi'(T) = 0$. Differentiating again we obtain
\[
-\varphi(t) = \lambda\varphi''(t).
\]
The solutions of this equation satisfying the condition $\varphi(0) = 0$ are given by
\[
\varphi(t) = A\sin\frac{t}{\sqrt{\lambda}}, \qquad t \in [0,T].
\]
Taking into account the condition $\varphi'(T) = 0$ it is easy to check that
\[
\lambda_n = \frac{T^2}{\bigl(n+\tfrac12\bigr)^2\pi^2}, \qquad n \in \mathbb{N}_0,
\]
are the only possible values of $\lambda$. We denote by $\varphi_n$ the corresponding function and set $A := \sqrt{2/T}$, i.e.,
\[
\varphi_n(t) = \sqrt{\frac{2}{T}}\,\sin\left(\Bigl(n+\frac12\Bigr)\frac{\pi t}{T}\right).
\]
The $L^2$-norm of $\varphi_n$ is equal to 1. From Theorem 2.5.5 we obtain the representation
\[
W(t) = \sqrt{2T}\,\sum_{n=0}^\infty \frac{\sin\bigl((n+\tfrac12)\,\pi t/T\bigr)}{(n+\tfrac12)\,\pi}\,X_n^T, \qquad 0 \le t \le T.
\]
Remark 2.5.6 shows that the random variables $X_n^T$ are independent and standard Gaussian.

11 At $t = 0$ and $t = T$ we have only one-sided derivatives.

Example 2.5.9. Let $Y_1,\ldots,Y_n \in L^2(\Omega,\mathcal{A},P)$ be uncorrelated with mean zero and standard deviation $\sigma_j > 0$. Further, let $k_1,\ldots,k_n \in \mathbb{N}$. The correlation function of the process
\[
Z(t) = \sum_{j=1}^n e^{itk_j}\,Y_j, \qquad t \in [0,2\pi],
\]
is given by
\[
C(t,s) = \sum_{j=1}^n \sigma_j^2\,e^{itk_j}\,e^{-isk_j} = \sum_{j=1}^n \sigma_j^2\,e^{i(t-s)k_j}, \qquad t,s \in [0,2\pi].
\]
The functions $\varphi_j(t) = \frac{1}{\sqrt{2\pi}}\,e^{itk_j}$, $t \in [0,2\pi]$, build an orthonormal system in $L^2(0,2\pi)$. Write $X_j := Y_j/\sigma_j$. Then $X_1,\ldots,X_n$ is an orthonormal system in $L^2(\Omega,\mathcal{A},P)$,
\[
Z(t) = \sqrt{2\pi}\,\sum_{j=1}^n \sigma_j\,\varphi_j(t)\,X_j, \qquad t \in [0,2\pi],
\]
and $\varphi_j$ is an eigenfunction of $C$ with eigenvalue $2\pi\sigma_j^2$ (cf. Theorem 2.5.7).
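The eigenvalues found in Example 2.5.8 can be compared with a discretisation of the integral operator with kernel $\min(t,s)$. The sketch below is an illustration only: it assumes $T = 1$, uses a midpoint-rule (Nyström) discretisation, which is not a method from the text, and also evaluates a truncated Mercer series at one arbitrarily chosen point.

```python
import numpy as np

T = 1.0
N = 200
h = T / N
t = (np.arange(N) + 0.5) * h                      # midpoints of a uniform grid on [0, T]

# Midpoint-rule discretisation of the integral operator with kernel min(t, s).
K = np.minimum(t[:, None], t[None, :]) * h
numeric = np.sort(np.linalg.eigvalsh(K))[::-1]    # eigenvalues in decreasing order

n = np.arange(3)
exact = T**2 / ((n + 0.5) ** 2 * np.pi**2)        # lambda_n from Example 2.5.8

def mercer_partial_sum(t_val, s_val, terms=2000):
    """Truncated series sum_k lambda_k phi_k(t) phi_k(s) for the kernel min(t, s)."""
    k = np.arange(terms)
    lam = T**2 / ((k + 0.5) ** 2 * np.pi**2)
    phi_t = np.sqrt(2 / T) * np.sin((k + 0.5) * np.pi * t_val / T)
    phi_s = np.sqrt(2 / T) * np.sin((k + 0.5) * np.pi * s_val / T)
    return np.sum(lam * phi_t * phi_s)

print(numeric[:3], exact)
print(mercer_partial_sum(0.3, 0.7), min(0.3, 0.7))
```

The leading numerical eigenvalues should be close to $\lambda_0, \lambda_1, \lambda_2$, and the truncated series should be close to $\min(0.3, 0.7)$; the series converges slowly because $\lambda_n$ decays only like $n^{-2}$.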
2.6 Integration with respect to orthogonal random measures

The results and methods presented in this section will be used to prove integral representations for random fields. Throughout the present section $(W,\mathcal{B})$ denotes a measurable space.

Definition 2.6.1. A mapping $\zeta : \mathcal{B} \to L^2(\Omega,\mathcal{A},P)$ is called a random orthogonal measure on $\mathcal{B}$ (or on $(W,\mathcal{B})$), if

(i) $(\zeta(B),\zeta(D)) = 0$ for all $B,D \in \mathcal{B}$ with $B\cap D = \emptyset$;

(ii) $\zeta\bigl(\bigcup_{j=1}^\infty B_j\bigr) = \sum_{j=1}^\infty \zeta(B_j)$ for all mutually disjoint sets $B_j \in \mathcal{B}$.

If $\zeta$ is a random orthogonal measure, then the mapping $\mu : \mathcal{B} \to [0,\infty)$, defined by
\[
\mu(B) := (\zeta(B),\zeta(B)) = \|\zeta(B)\|^2, \qquad B \in \mathcal{B},
\]
is called the structure measure of $\zeta$.$^{12}$ By $H(\zeta)$ we denote the smallest subspace of $L^2(\Omega,\mathcal{A},P)$ containing the set $\{\zeta(B) : B \in \mathcal{B}\}$.

Lemma 2.6.2. The structure measure of a random orthogonal measure is a finite nonnegative measure on $\mathcal{B}$.

Proof. Equation (2.6.1.i) with $B = D = \emptyset$ shows that $\mu(\emptyset) = 0$. If the sets $B_j \in \mathcal{B}$ are mutually disjoint, then by (2.6.1.i), the $\zeta(B_j)$’s are mutually orthogonal. From here and from (2.6.1.ii) we obtain
\[
\mu\Bigl(\bigcup_{j=1}^\infty B_j\Bigr) = \Bigl\|\zeta\Bigl(\bigcup_{j=1}^\infty B_j\Bigr)\Bigr\|^2 = \Bigl\|\sum_{j=1}^\infty \zeta(B_j)\Bigr\|^2 = \sum_{j=1}^\infty \|\zeta(B_j)\|^2 = \sum_{j=1}^\infty \mu(B_j).
\]
12 It is possible to define random orthogonal measures where the structure measure is not finite (see for example [19]). However, we do not need this more general notion in the present book.

Theorem 2.6.3. Let $\zeta$ be a random orthogonal measure on $\mathcal{B}$ with structure measure $\mu$. Then there exists an isometric linear operator $I$ from $L^2(\mu)$ onto $H(\zeta)$ such that
\[
I\Bigl(\sum_{j=1}^n c_j\,1_{B_j}\Bigr) = \sum_{j=1}^n c_j\,\zeta(B_j)
\]
holds for all $n \in \mathbb{N}$ and all $c_j \in \mathbb{C}$, $B_j \in \mathcal{B}$.

Proof. Let $L$ denote the linear manifold of all functions $f$ of the form $f = \sum_{j=1}^n c_j\,1_{B_j}$. This manifold is dense in $L^2(\mu)$. For $f \in L$ we define $I(f)$ by the equation above. It is easy to check that this definition is correct, i.e., the right-hand side does not depend on the special representation of $f$. It is clear that $I$ is a linear operator from $L$ into $H(\zeta)$. By the definition of $H(\zeta)$, the range of $I$ is dense in $H(\zeta)$.

An arbitrary step function $f$ can be written in the form above with mutually disjoint sets $B_j$. By orthogonality, we obtain
\[
(I(f),I(f)) = \sum_{j=1}^n |c_j|^2\,\mu(B_j) = (f,f)
\]
(we denote the inner product in both spaces by $(\cdot,\cdot)$). Thus, the mapping $I$ is isometric. By Theorem D.3.6, this mapping can be uniquely extended to an isometric linear operator from $L^2(\mu)$ onto $H(\zeta)$.

In the sequel we will use the notation
\[
\int_W f(t)\,d\zeta(t) := \int f(t)\,d\zeta(t) := \int f\,d\zeta := I(f), \qquad f \in L^2(\mu).
\]
For a set $A \in \mathcal{B}$ we write
\[
\int_A f(t)\,d\zeta(t) := \int 1_A(t)\,f(t)\,d\zeta(t), \qquad f \in L^2(\mu).
\]
In the next theorem we list the basic properties of this integral which follow immediately from Theorem 2.6.3.

Theorem 2.6.4. Let $\zeta$ be a random orthogonal measure with structure measure $\mu$. For all $f, g, f_n \in L^2(\mu)$ and $a,b \in \mathbb{C}$ we have

(i) $\displaystyle \int \bigl[a f(t) + b g(t)\bigr]\,d\zeta(t) = a\int f(t)\,d\zeta(t) + b\int g(t)\,d\zeta(t)$

(ii) $\displaystyle \left(\int f(t)\,d\zeta(t),\ \int g(t)\,d\zeta(t)\right) = \int f(t)\,\overline{g(t)}\,d\mu(t)$

(iii) $\displaystyle \lim_{n\to\infty}\int f_n(t)\,d\zeta(t) = \int \lim_{n\to\infty} f_n(t)\,d\zeta(t)$ whenever the sequence $\{f_n\}$ is convergent.

Example 2.6.5. Let $t_j \in W$ and $X_j \in L^2(\Omega,\mathcal{A},P)$, $j \in \mathbb{N}$, where the $t_j$’s are mutually distinct and the $X_j$’s are mutually orthogonal such that
\[
\sum_{j=1}^\infty \|X_j\|^2 < \infty.
\]
For an arbitrary subset $B \subset W$ we define
\[
\zeta(B) := \sum_{j\,:\,t_j\in B} X_j.
\]
Thus,
\[
\zeta = \sum_{j=1}^\infty X_j\,\delta_{t_j}. \tag{1}
\]
Then $\zeta$ is a random orthogonal measure with a discrete structure measure
\[
\mu = \sum_{j=1}^\infty \|X_j\|^2\,\delta_{t_j}.
\]
Choosing $B_j = \{t_j\}$ in Theorem 2.6.3 and taking the limit $n \to \infty$ we obtain
\[
\int f(t)\,d\zeta(t) = \sum_{j=1}^\infty f(t_j)\,X_j
\]
where $f$ is an arbitrary function on $W$.$^{13}$ It is easy to see that the following converse statement is true as well: If $\mu$ is discrete, then $\zeta$ can be represented in the form (1).

Definition 2.6.6. A second order process $Z$ on $T = [0,\infty)$ is called a process with orthogonal increments, if
\[
(Z(t_2)-Z(t_1),\ Z(t_4)-Z(t_3)) = 0, \qquad 0 \le t_1 \le t_2 \le t_3 \le t_4,\ t_j \in T.
\]
The Wiener process and the Poisson process on $[0,\infty)$ have orthogonal increments (cf. Theorem 2.1.4 and Theorem 2.1.5). The next theorem shows that for each process with orthogonal increments there is an associated random orthogonal measure.

Theorem 2.6.7. Let $Z$ be a bounded, continuous$^{14}$ process with orthogonal increments on $T = [0,\infty)$. Then there exists a unique orthogonal random measure $\zeta$ on $\mathcal{B}(T)$ such that $H(\zeta) \subset H(Z)$ and
\[
\zeta([t,s)) = Z(s) - Z(t), \qquad 0 \le t < s < \infty. \tag{1}
\]
13 If $\mu(\{t_j\}) > 0$ it is meaningful to write $f(t_j)$ for $f \in L^2(\mu)$. Otherwise $X_j = 0$ and hence it is not necessary to evaluate $f$ at $t_j$.

14 Actually, it is sufficient to assume continuity from the left.

Proof. We define the function $F$ by
\[
F(t) = \|Z(t)-Z(0)\|^2, \qquad t \in [0,\infty).
\]
This function is continuous and bounded. Since the increments are orthogonal we have
\[
F(t) = \|Z(t)-Z(s)+Z(s)-Z(0)\|^2 = \|Z(t)-Z(s)\|^2 + F(s)
\]
where $0 \le s \le t$. Thus, the function $F$ is increasing and
\[
F(t)-F(s) = \|Z(t)-Z(s)\|^2, \qquad 0 \le s \le t.
\]
By basic results of measure theory, there exists a finite nonnegative measure $\mu$ on $\mathcal{B}(T)$ such that
\[
\mu([s,t)) = F(t)-F(s), \qquad s \le t.
\]
For a set $B \subset T$ of the form $B = [t,s)$ we define $\zeta(B)$ by (1). In the same way as in the proof of Theorem 2.6.3 we define $I : L^2(\mu) \to H(Z)$ for step functions from $L^2(\mu)$; we use, however, only Borel sets of this special form. These functions span a dense linear manifold and hence $I$ can be extended to an isometric operator from $L^2(\mu)$ into $H(Z)$ (see Theorem D.3.6). Now, let $B \subset T$ be an arbitrary Borel set and write $\zeta(B) := I(1_B)$. The fact that $I$ is isometric implies that $\zeta$ is an orthogonal random measure. The uniqueness follows from the density of the linear manifold used above.

Example 2.6.8. Let $Z$ be the Wiener process or the Poisson process on $[0,\infty)$. Denote by $\zeta$ and $\mu$ the corresponding random orthogonal measure and the structure measure, in the sense of the previous theorem, respectively. For $B = [0,t) \subset [0,\infty)$ we have
\[
\mu(B) = (\zeta(B),\zeta(B)) = (Z(t),Z(t)) = t.
\]
From this we conclude that $\mu = \lambda|_{[0,\infty)}$.

Definition 2.6.9. Let $Z$ and $\zeta$ be as in Theorem 2.6.7. We define the integral $\int_a^b f(t)\,dZ(t)$, $0 \le a \le b$, by
\[
\int_a^b f(t)\,dZ(t) := \int_a^b f(t)\,d\zeta(t), \qquad f \in L^2(\mu),
\]
where $\mu$ is the structure measure of $\zeta$.

Theorem 2.6.10 (integration by parts). Let $f$ be an absolutely continuous function on $[0,t]$, $t > 0$, and let $Z$ be a continuous process on $[0,\infty)$ with orthogonal increments. Then we have
\[
\int_0^t f(s)\,dZ(s) = f(t)Z(t) - f(0)Z(0) - \int_0^t Z(s)\,f'(s)\,ds.
\]
Proof. Denote by $\mu$ the structure measure of the random orthogonal measure $\zeta$ corresponding to $Z$ (cf. Theorem 2.6.7). We write
\[
t_{i,n} := \frac{t\,i}{n}, \qquad 0 \le i \le n,
\]
and define the function $f_n$ on $[0,t]$ by $f_n(s) = f(t_{i,n})$ if $s \in [t_{i,n},t_{i+1,n})$ and $f_n(t) = f(t)$. The sequence $\{f_n\}$ converges uniformly, hence also in $L^2(\mu)$, to $f$. Using Theorem 2.6.4 we obtain
\[
\int_0^t f(s)\,dZ(s) = \int_0^t \lim_{n\to\infty} f_n(s)\,dZ(s) = \lim_{n\to\infty}\sum_{i=0}^{n-1} f_n(t_{i,n})\bigl[Z(t_{i+1,n})-Z(t_{i,n})\bigr].
\]
Using summing by parts$^{15}$ the last sum can be written as
\[
f(t)Z(t) - f(0)Z(0) - \sum_{i=0}^{n-1} Z(t_{i+1,n})\bigl[f(t_{i+1,n})-f(t_{i,n})\bigr].
\]
The sum above is equal to
\[
\int_0^t Z(g_n(s))\,f'(s)\,ds \tag{1}
\]
where $g_n(s) = t_{i+1,n}$ whenever $s \in [t_{i,n},t_{i+1,n})$. Denote by $C$ the correlation function of $Z$. Using the inequality $|g_n(s)-s| \le t/n$, the equation
\[
\|Z(s)-Z(g_n(s))\|^2 = C(s,s) - C(s,g_n(s)) - C(g_n(s),s) + C(g_n(s),g_n(s))
\]
and the uniform continuity of $C$ on $[0,t]\times[0,t]$ we see that for any $\varepsilon > 0$ the inequality
\[
\|Z(s)-Z(g_n(s))\|^2 < \varepsilon
\]
holds if $n$ is sufficiently large. Applying this and inequality (2.4.5.2) we conclude that the integral (1) converges to
\[
\int_0^t Z(s)\,f'(s)\,ds
\]
as $n \to \infty$. The proof is complete.

15 $\displaystyle \sum_{k=m}^n a_k\,(b_{k+1}-b_k) = \bigl[a_{n+1}b_{n+1} - a_m b_m\bigr] - \sum_{k=m}^n b_{k+1}\,(a_{k+1}-a_k)$

We end this section with a lemma which will be useful in Section 5.4.

Lemma 2.6.11. Let $\zeta$ be a random orthogonal measure on $(W,\mathcal{B})$ with structure measure $\mu$ and let $g \in L^2(\mu)$. Then
\[
\zeta_g(A) := \int_W 1_A\,g\,d\zeta = \int_A g\,d\zeta, \qquad A \in \mathcal{B},
\]
defines a random orthogonal measure $\zeta_g$ on $(W,\mathcal{B})$ with structure measure $\mu_g$ given
by $d\mu_g = |g|^2\,d\mu$. Moreover, we have

(i) $\displaystyle \int f\,d\zeta_g = \int f g\,d\zeta, \qquad f \in L^2(\mu_g)$;

(ii) the $\mu_g$-measure of the set of zeros of $g$ is equal to 0, the function $1/g$ is in $L^2(\mu_g)$ and
\[
\zeta(A) = \int_A \frac{1}{g}\,d\zeta_g, \qquad A \in \mathcal{B}.
\]

Proof. By equation (2.6.4.ii),
\[
(\zeta_g(A),\zeta_g(B)) = \int_W 1_A\,1_B\,|g|^2\,d\mu = \int_{A\cap B} |g|^2\,d\mu
\]
and hence (2.6.1.i) holds. Let $B_j \in \mathcal{B}$, $j \in \mathbb{N}$, be mutually disjoint. Lebesgue’s dominated convergence theorem shows that
\[
\lim_{N\to\infty} 1_{\bigcup_1^N B_j} = 1_{\bigcup_1^\infty B_j}
\qquad\text{and}\qquad
\lim_{N\to\infty} g\,1_{\bigcup_1^N B_j} = g\,1_{\bigcup_1^\infty B_j},
\]
the convergence being in $L^2(\mu)$. From (2.6.4.iii) we see that (2.6.1.ii) holds, i.e., $\zeta_g$ is a random orthogonal measure. The relation (i) is true for step functions by the definition of $\zeta_g$. Let $\{f_n\}$ be a sequence of step functions converging in $L^2(\mu_g)$ to $f$. Since $d\mu_g = |g|^2\,d\mu$, the sequence $\{f_n g\}$ converges in $L^2(\mu)$ to $fg$, from which (i) follows. The first two statements in (ii) follow from $d\mu_g = |g|^2\,d\mu$, while the third one follows from (i) replacing $f$ by $1_A\,g^{-1}$.
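The isometry (2.6.4.ii) can be illustrated for a discrete random orthogonal measure as in Example 2.6.5 with finitely many atoms. The sketch below assumes a finite-dimensional model in which $L^2(\Omega,\mathcal{A},P)$ is replaced by $\mathbb{C}^m$ and the $X_j$ are mutually orthogonal vectors obtained from a QR factorisation; all names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 8   # n atoms t_1, ..., t_n; model space C^m with m >= n

# Mutually orthogonal vectors X_1, ..., X_n in C^m (rows of X) with different norms.
Q, _ = np.linalg.qr(rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n)))
scales = rng.uniform(0.5, 2.0, size=n)
X = (Q * scales).T                     # row j is X_j, with norm scales[j]

mu = scales**2                         # structure measure: mu({t_j}) = ||X_j||^2

f = rng.normal(size=n) + 1j * rng.normal(size=n)   # values f(t_j)
g = rng.normal(size=n) + 1j * rng.normal(size=n)   # values g(t_j)

def integral(fvals):
    """Integral of f with respect to zeta: sum_j f(t_j) X_j."""
    return fvals @ X

lhs = np.sum(integral(f) * np.conj(integral(g)))   # (int f dzeta, int g dzeta)
rhs = np.sum(f * np.conj(g) * mu)                  # int f conj(g) dmu

print(lhs, rhs)
```

Both sides agree up to rounding, which is the finite-dimensional form of the statement that $f \mapsto \int f\,d\zeta$ is an isometry from $L^2(\mu)$ into the model space.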
2.7 The theorem of Karhunen

Let $\zeta$ be a random orthogonal measure on the measurable space $(W,\mathcal{B})$ with structure measure $\mu$. Further, let $g : T\times W \to \mathbb{C}$ be such that $g(t,\cdot) \in L^2(\mu)$ for all $t \in T$. We define the second order field $Z$ by
\[
Z(t) = \int_W g(t,x)\,d\zeta(x), \qquad t \in T.
\]
By virtue of (2.6.4.ii) the correlation function $C$ of $Z$ is given by
\[
C(t,s) = \int_W g(t,x)\,\overline{g(s,x)}\,d\mu(x), \qquad t,s \in T.
\]
In the next theorem we consider the converse direction. We assume that the correlation function admits an integral representation of this kind and deduce from it an integral representation for the field.

Theorem 2.7.1 (Karhunen). Let $Z : T \to L^2(\Omega,\mathcal{A},P)$ be a field with correlation function $C$. Assume that $C$ admits the representation
\[
C(t,s) = \int_W g(t,x)\,\overline{g(s,x)}\,d\mu(x), \qquad t,s \in T, \tag{1}
\]
where

(i) $\mu$ is a finite nonnegative measure on a measurable space $(W,\mathcal{B})$;

(ii) $g$ is a complex-valued function on $T\times W$ with $g(t,\cdot) \in L^2(\mu)$ for all $t \in T$;

(iii) the linear manifold $L_g := \operatorname{span}\{g(t,\cdot),\ t \in T\}$ is dense in $L^2(\mu)$.

Then there exists a uniquely determined random orthogonal measure $\zeta$ on $(W,\mathcal{B})$ with values in $L^2(\Omega,\mathcal{A},P)$ and with structure measure $\mu$ such that
\[
Z(t) = \int_W g(t,x)\,d\zeta(x), \qquad t \in T, \tag{2}
\]
and $H(Z) = H(\zeta)$.

Proof. Let
\[
f(x) = \sum_{k=1}^n c_k\,g(t_k,x), \qquad t_k \in T,\ x \in W,\ c_k \in \mathbb{C},
\]
be an arbitrary function in $L_g$. We define the linear mapping $I : L_g \to H(Z)$ by
\[
I(f) = \sum_{k=1}^n c_k\,Z(t_k).
\]
We have to show that this definition is correct, i.e., the sum above depends only on $f$ but not on the special choice of the $c_k$’s and $t_k$’s. Using equation (1) we obtain
\[
\Bigl(\sum_{k=1}^n c_k Z(t_k),\ Z(s)\Bigr) = \sum_{k=1}^n c_k\,C(t_k,s) = \sum_{k=1}^n c_k\int_W g(t_k,x)\,\overline{g(s,x)}\,d\mu(x) = \int_W f(x)\,\overline{g(s,x)}\,d\mu(x), \qquad s \in T.
\]
Using this and the fact that the linear span of the set $\{Z(s) : s \in T\}$ is dense in $H(Z)$ we conclude that $\sum_{k=1}^n c_k Z(t_k)$ depends only on $f$. It is clear that $I$ is linear. A similar computation as above shows that
\[
(I(f),I(f)) = \sum_{j,k=1}^n c_j\,\overline{c_k}\,C(t_j,t_k) = \int_W f(x)\,\overline{f(x)}\,d\mu(x) = (f,f),
\]
100
Chapter 2 Correlation functions
i.e., I is isometric. Consequently, I can be extended to an isometric linear operator from Lg D L2 . / into H.Z/ (cf. Theorem D.3.6). We denote also the extension by I and define by (3) .B/ D I.1B /; B 2 B: Using the fact that I is isometric we see that is a random orthogonal measure with structure measure . Write Z X.t / WD g.t; x/ d.x/; t 2 T: Next we show that Z D X . We have
\[ (Z(t), \zeta(B)) = (I(g(t, \cdot)), I(1_B)) = \int_W g(t,x)\, 1_B(x)\, d\sigma(x), \qquad B \in \mathcal{B}. \tag{4} \]
For fixed $t$, let $g_n(t, \cdot) = \sum_k c_k^n 1_{B_k^n}$, $n \in \mathbb{N}$, be step functions converging to $g(t, \cdot)$ in $L^2(\sigma)$ as $n \to \infty$. Then, by (2.6.4.iii),
\[ \lim_{n \to \infty} \sum_k c_k^n\, \zeta(B_k^n) = \lim_{n \to \infty} \int_W g_n(t,x)\, d\zeta(x) = X(t). \]
Combining this with (4) we conclude that
\[ (Z(t), X(s)) = \int_W g(t,x)\,\overline{g(s,x)}\, d\sigma(x) \]
and hence
\[ (Z(t), X(s)) = (Z(t), Z(s)), \qquad t, s \in T. \]
Since $\operatorname{span}\{Z(t) : t \in T\}$ is dense in $H(Z)$ we must have $X(s) = Z(s)$. The representation (2) shows that $H(Z) \subset H(\zeta)$. By the definition of $\zeta$ we also have $H(\zeta) \subset H(Z)$, i.e., $H(\zeta) = H(Z)$.
Finally, let $\eta$ be a random orthogonal measure on $(W, \mathcal{B})$ with structure measure $\sigma$ such that
\[ \int_W g(t,x)\, d\zeta(x) = \int_W g(t,x)\, d\eta(x), \qquad t \in T. \]
The assumption (iii) implies that
\[ \int_W h\, d\zeta = \int_W h\, d\eta, \qquad h \in L^2(\sigma). \]
Setting $h = 1_B$, $B \in \mathcal{B}$, we obtain $\zeta = \eta$.
It is possible to prove a similar result (see [34]) without using the density condition (2.7.1.iii). In this case, besides losing the equality of the spaces $H(Z)$ and $H(\zeta)$, the random orthogonal measure has values in a possibly larger space. This is the motivation for the next theorem where we use the same notation as in Theorem 2.7.1.
Theorem 2.7.2. Assume that $C$ admits the integral representation (2.7.1.1) satisfying the conditions (2.7.1.i) and (2.7.1.ii). Then the integral representation (2.7.1.2) with a random orthogonal measure $\zeta : \mathcal{B} \to L^2(\Omega, \mathcal{A}, P)$ having structure measure $\sigma$ is possible if and only if $\dim L_g^{\perp} \le \dim H(Z)^{\perp}$.
Proof. Assume first that $\dim L_g^{\perp} \le \dim H(Z)^{\perp}$ and consider the orthogonal decomposition
\[ L^2(W, \mathcal{B}, \sigma) = \overline{L_g} \oplus L_g^{\perp}. \]
Choose an orthonormal basis $\{e_\iota : \iota \in I\}$ of $L_g^{\perp}$, where $I$ is a nonempty index set such that $T \cap I = \emptyset$. By assumption there exists an orthonormal system $\{Y_\iota : \iota \in I\}$ in $H(Z)^{\perp}$. Now we consider the random field
\[ \tilde{X}(t) := \begin{cases} Z(t), & \text{if } t \in T, \\ Y_t, & \text{if } t \in I, \end{cases} \]
for all $t \in T \cup I$. We write further for $t \in T \cup I$
\[ \tilde{g}(t, \cdot) := \begin{cases} g(t, \cdot), & \text{if } t \in T, \\ e_t, & \text{if } t \in I. \end{cases} \]
Then $\tilde{X}(t) \in L^2(\Omega, \mathcal{A}, P)$ for all $t \in T \cup I$ and
\[ \tilde{C}(s,t) = (\tilde{X}(s), \tilde{X}(t)) = \int_W \tilde{g}(s,x)\,\overline{\tilde{g}(t,x)}\, d\sigma(x). \]
Since $L_{\tilde{g}}$ is dense in $L^2(W, \mathcal{B}, \sigma)$, Theorem 2.7.1 yields the existence of a uniquely determined random orthogonal measure $\zeta$ with structure measure $\sigma$ and values in $L^2(\Omega, \mathcal{A}, P)$ such that
\[ \tilde{X}(t) = \int_W \tilde{g}(t,x)\, d\zeta(x). \]
In particular,
\[ Z(t) = \int_W g(t,x)\, d\zeta(x) \]
for all $t \in T$.
For the converse direction assume that $\dim H(Z)^{\perp} < \dim L_g^{\perp}$. As before, choose an orthonormal basis $\{e_\iota : \iota \in I\}$ of $L_g^{\perp}$, where $I$ is nonempty and $T \cap I = \emptyset$. Assume $\zeta : \mathcal{B} \to L^2(\Omega, \mathcal{A}, P)$ to be a random orthogonal measure satisfying (2.7.1.2) and having structure measure $\sigma$. Set for $\iota \in I$
\[ Y_\iota := \int_W e_\iota(x)\, d\zeta(x) \in L^2(\Omega, \mathcal{A}, P). \]
We then have
\[ (Z(s), Y_\iota) = \Big( \int_W g(s,x)\, d\zeta(x),\, \int_W e_\iota(x)\, d\zeta(x) \Big) = \int_W g(s,x)\,\overline{e_\iota(x)}\, d\sigma(x) = 0, \qquad s \in T,\ \iota \in I, \]
and
\[ (Y_\iota, Y_\kappa) = \Big( \int_W e_\iota(x)\, d\zeta(x),\, \int_W e_\kappa(x)\, d\zeta(x) \Big) = \int_W e_\iota(x)\,\overline{e_\kappa(x)}\, d\sigma(x) = \delta_{\iota,\kappa}, \qquad \iota, \kappa \in I. \]
Thus, $\{Y_\iota : \iota \in I\}$ is an orthonormal system in $H(Z)^{\perp}$. Hence, this system can be extended to an orthonormal basis of $H(Z)^{\perp}$. Since $\dim L_g^{\perp}$ is equal to the cardinality of $I$, we obtain
\[ \dim L_g^{\perp} \le \dim H(Z)^{\perp}, \]
a contradiction to the assumption. The proof is complete.
Remark 2.7.3. For fixed functions $C$ and $g$ the representing measure $\sigma$ in (2.7.1.1) is in general not uniquely determined. To see classical examples consider the case of $W = \mathbb{R}$ equipped with the usual Borel $\sigma$-algebra $\mathcal{B}$. Let first
\[ g(t,x) = x^t, \qquad t \in T := \mathbb{N}_0,\ x \in \mathbb{R}. \]
A bounded positive measure $\sigma$ on the real line is in general not uniquely determined by the values of the integrals
\[ \int_{-\infty}^{\infty} g(t,x)\,\overline{g(s,x)}\, d\sigma(x) = \int_{-\infty}^{\infty} x^{t+s}\, d\sigma(x), \qquad t, s \in \mathbb{N}_0 \]
(see also Remark 3.3.7). The description of all measures with given moments is the power moment problem. For more information on the power moment problem we refer the reader to the monographs [1] and [55].
Another example is given by the trigonometric moment problem. In this case $T = [-a, a]$, $a > 0$, and
\[ g(t,x) = e^{itx}, \qquad t \in [-a, a],\ x \in \mathbb{R}. \]
We will see in Chapter 4 that $\sigma$ is in general not uniquely determined by the values of the integrals
\[ \int_{-\infty}^{\infty} g(t,x)\,\overline{g(s,x)}\, d\sigma(x) = \int_{-\infty}^{\infty} e^{i(t-s)x}\, d\sigma(x), \qquad t, s \in [-a, a]. \]
The paper [5] contains more details on the connection between Karhunen's theorem, moment problems and quadrature formulae.
2.8 Stationary fields
Recall that throughout this chapter $T$ denotes a nonempty subset of $\mathbb{R}^d$.
Definition 2.8.1. A second order field $Z$ on $T$ is said to be stationary,16 if the expectation $M := E(Z(t))$ does not depend on $t$ and $E\big(Z(t)\,\overline{Z(s)}\big)$ is a function of $t - s$:
\[ E\big(Z(t)\,\overline{Z(s)}\big) = C(t - s), \qquad t, s, t - s \in T, \]
where $C$ is a complex-valued function defined on the set
\[ T - T = \{t - s : t, s \in T\}. \]
With a little abuse of terminology we call the function $C$, like in Definition 2.1.1, the correlation function of $Z$. If $Z$ is stationary, then the covariance $E\big((Z(t) - M)\,\overline{(Z(s) - M)}\big)$ is a function of $t - s$ as well. This function is called the covariance function of $Z$. Denoting it by $K$ we have
\[ K(h) = C(h) - |M|^2, \qquad h \in T - T. \]
The notions above are defined for Z (cf. Notation 2.2.6) in the same way as for Z.
Definition 2.8.2. A field $Z$ on $T$ is said to be strongly stationary or stationary in the strong sense when all its finite-dimensional distributions remain the same when shifted. That is, for all $n \in \mathbb{N}$ and all $t_1, \ldots, t_n, h \in \mathbb{R}^d$ the random vectors
\[ (Z(t_1), \ldots, Z(t_n)) \quad \text{and} \quad (Z(t_1 + h), \ldots, Z(t_n + h)) \]
have the same distribution whenever $t_1 + h, \ldots, t_n + h \in T$.
If $Z$ is a strongly stationary field of second order then it is stationary. Indeed, the random variables $Z(t)$ and $Z(t + h)$ have the same distribution and hence the same expectation. Moreover, the random vectors
\[ (Z(t), Z(s)) \quad \text{and} \quad (Z(t + h), Z(s + h)) \]
have the same distribution and hence the same correlation.17 Next we show that for Gaussian fields the notions stationarity and strong stationarity are the same.
16 Stationary fields are also called stationary in the wide sense, widely stationary or second order stationary. These are the main fields studied in this book, hence we call them simply stationary for brevity.
17 Actually, we used only the cases $n = 1$ and $n = 2$ of the definition of strong stationarity.
Theorem 2.8.3. Every stationary Gaussian field $Z$ is strongly stationary.
Proof. Since $Z$ is Gaussian the random vectors
\[ (Z(t_1), \ldots, Z(t_n)) \quad \text{and} \quad (Z(t_1 + h), \ldots, Z(t_n + h)), \]
$t_j, t_j + h \in T$, $h \in \mathbb{R}^d$, are Gaussian. Since $Z$ is stationary, they have the same expectation and the same covariance. Consequently, they have the same distribution.
Lemma 2.8.4. Let $Z$ be stationary on $T = \mathbb{R}^d$ or $T = \mathbb{Z}^d$ with a continuous correlation function $C$. Then we have
\[ (\mu * Z(t), \nu * Z(s)) = (\mu * \tilde{\nu} * Z(t), Z(s)) = \mu * \tilde{\nu} * C(t - s) \tag{1} \]
for all complex measures $\mu, \nu$ on $T$ and for all $s, t \in T$.
Proof. It is easy to check that the lemma holds for measures with finite support. In the general case we approximate $\mu$ and $\nu$ by measures with finite support (see Corollary 2.4.10) and apply Theorem E.2.5.
Definition 2.8.5. A second order field $Z$ is called white noise with expectation $\mu$ and variance $\sigma^2$, $\sigma \ge 0$, if the random variables $Z(t)$, $t \in T$, are uncorrelated and $E(Z(t)) = \mu$ and $\operatorname{Var}(Z(t)) = \sigma^2$ for all $t \in T$. We will use the notation $Z \sim \mathrm{WN}(\mu, \sigma^2)$ to indicate that $Z$ is white noise with expectation $\mu$ and variance $\sigma^2$.
White noise is stationary with covariance function
\[ K(h) = \begin{cases} \sigma^2, & h = 0, \\ 0, & h \ne 0. \end{cases} \]
If the random variables $Z(t)$, $t \in T$, are not identically distributed, then $Z$ is not strongly stationary.
Example 2.8.6. (1)
Let $X \in L^2(\Omega, \mathcal{A}, P)$, $T = \mathbb{R}^d$ and $x \in \mathbb{R}^d$. The field
\[ Z(t) = e^{i(t,x)} X, \qquad t \in \mathbb{R}^d \tag{1} \]
is stationary since
\[ (Z(t + h), Z(t)) = e^{i(t+h,x)}\, e^{-i(t,x)}\, (X, X) = e^{i(h,x)} (X, X). \]
The question arises under which conditions on a complex-valued function $f$ on $\mathbb{R}^d$ the field $Z : t \mapsto f(t) X$ is stationary. Without loss of generality assume that $(X, X) = 1$ and $f(0) = 1$. The field $Z$ is stationary if and only if the relation
\[ (f(t + h) X, f(t) X) = f(t + h)\,\overline{f(t)} = g(h), \qquad t, h \in \mathbb{R}^d \]
holds with some function $g$. Setting $t = 0$ we see that $f = g$ while setting $h = 0$ we obtain $|f| = 1$. Thus, $f$ is a character of the group $\mathbb{R}^d$ (cf. Definition 1.4.5). If $f$ is continuous then, by Theorem C.9.2, the function $f$ has the form $f(h) = e^{i(h,x)}$ with some $x \in \mathbb{R}^d$.
(2)
The example given by equation (1) can be generalized as follows. Let $X_1, \ldots, X_n \in L^2(\Omega, \mathcal{A}, P)$ be an orthonormal system such that $E(X_j) = m$ and let $x_1, \ldots, x_n \in \mathbb{R}^d$. Then the field $Z$ defined by
\[ Z(t) = \sum_{j=1}^{n} e^{i(t,x_j)} X_j, \qquad t \in \mathbb{R}^d \]
is stationary with correlation function
\[ C(h) = \sum_{j=1}^{n} (X_j, X_j)\, e^{i(h,x_j)}, \qquad h \in \mathbb{R}^d. \]
(3) To give an example of a real stationary field let $A, B \in L^2_r(\Omega, \mathcal{A}, P)$ be such that $(A, A) = (B, B) = 1$ and $(A, B) = (A, 1) = (B, 1) = 0$. Further, let $\lambda \in \mathbb{R}^d$ and put
\[ Z(t) = A \cos \lambda t + B \sin \lambda t, \qquad t \in \mathbb{R}^d, \]
where we write $\lambda t$ instead of $(\lambda, t)$, for brevity. Then $E(Z(t)) = 0$. Using trigonometric identities we obtain
\[ (Z(t + h), Z(t)) = (A \cos \lambda(t + h) + B \sin \lambda(t + h),\, A \cos \lambda t + B \sin \lambda t) = \cos \lambda(t + h) \cos \lambda t + \sin \lambda(t + h) \sin \lambda t = \cos \lambda h. \]
Thus, $Z$ is stationary with correlation function $h \mapsto \cos((\lambda, h))$.
Example 2.8.7. (1) In this example we generalize the field defined in equation (2.8.6.1) by making the exponent random. Let $X$ be a real random variable and $Y$ be a $d$-dimensional real random vector. Assume that $X$ and $Y$ are independent, $E(X) = 0$ and $\operatorname{Var}(X) = \sigma^2 < \infty$. We define the field $Z$ by
\[ Z(t) = e^{i(t,Y)} X, \qquad t \in \mathbb{R}^d. \]
This field is stationary since $E(Z(t)) = E(X)\, E\big(e^{i(t,Y)}\big) = 0$ and
\[ (Z(t + h), Z(t)) = \big(e^{i(t+h,Y)} X,\, e^{i(t,Y)} X\big) = E\big(e^{i(h,Y)} X^2\big) = \sigma^2 f_Y(h), \qquad h \in \mathbb{R}^d, \]
where $f_Y$ denotes the characteristic function of $Y$.
(2) Now we construct a real stationary field in a similar way. Let $X$ and $Y$ be as above and choose a real random variable $\Psi$ such that $X, Y, \Psi$ are independent and18
\[ E(\cos \Psi) = E(\sin \Psi) = 0 \quad \text{and} \quad E(\cos^2 \Psi) = E(\sin^2 \Psi) = \frac{1}{2}. \tag{1} \]
We define the field $Z$ by
\[ Z(t) = X \cos((t, Y) + \Psi), \qquad t \in \mathbb{R}^d. \]
Then $E(Z(t)) = 0$ and
\[ (Z(t + h), Z(t)) = \sigma^2 E\big( \cos((t + h, Y) + \Psi) \cos((t, Y) + \Psi) \big). \]
Applying trigonometric identities and using (1) we obtain
\[ (Z(t + h), Z(t)) = \frac{\sigma^2}{2} E\big( \cos((t + h) Y) \cos(t Y) + \sin((t + h) Y) \sin(t Y) \big) = \frac{\sigma^2}{2} E \cos((h, Y)) = \frac{\sigma^2}{2} \operatorname{Re} f_Y(h), \]
where $f_Y$ is the characteristic function of $Y$.
Example 2.8.8. (1) Let $Z_j$, $j \in \mathbb{Z}$, be independent and identically distributed such that $E(Z_j) = 0$ and $\operatorname{Var}(Z_j) = \sigma^2$. For $\theta \in \mathbb{R}$ we define the time series $X$ by19
\[ X_j = Z_j + \theta Z_{j-1}, \qquad j \in \mathbb{Z}. \]
18 These equations hold, e.g., if $\Psi$ is uniformly distributed in $[0, 2\pi]$.
19 A time series of this form is called a moving average sequence of order 1.
This time series is stationary:
\[ (X_{j+h}, X_j) = (Z_{j+h} + \theta Z_{j-1+h},\, Z_j + \theta Z_{j-1}) = \begin{cases} (1 + \theta^2)\sigma^2, & h = 0, \\ \theta \sigma^2, & h = \pm 1, \\ 0, & |h| > 1. \end{cases} \]
It is even strongly stationary. To prove this we show that the characteristic function $f_Y$ of the random vector
\[ Y = (X_{1+h}, X_{2+h}, \ldots, X_{n+h}), \qquad h \in \mathbb{Z},\ n \in \mathbb{N} \]
does not depend on $h$. Denote by $f$ the characteristic function of $Z_j$. For $t = (t_1, \ldots, t_n) \in \mathbb{R}^n$ we have
\[ (Y, t) = \sum_{j=1}^{n} (Z_{j+h} + \theta Z_{j-1+h})\, t_j = \theta t_1 Z_h + t_n Z_{n+h} + \sum_{j=1}^{n-1} (t_j + \theta t_{j+1}) Z_{j+h} \]
and therefore
\[ f_Y(t) = E\big(e^{i(Y,t)}\big) = f(\theta t_1)\, f(t_n) \prod_{j=1}^{n-1} f(t_j + \theta t_{j+1}). \]
Thus, $f_Y$ does not depend on $h$.
(2) Let $T = \mathbb{Z}$, $Z \sim \mathrm{WN}(0, \sigma^2)$ and $a \in \mathbb{C}$ such that $|a| < 1$. We are looking for a stationary time series $X$ satisfying the equation20
\[ X_n = Z_n + a X_{n-1}, \qquad n \in \mathbb{Z}. \tag{1} \]
Applying this equation iteratively we obtain
\[ X_n = Z_n + a Z_{n-1} + a^2 X_{n-2} = \cdots = Z_n + a Z_{n-1} + \cdots + a^k Z_{n-k} + a^{k+1} X_{n-k-1}. \]
If $X$ is stationary, then $\|X_n\|$ is constant, so
\[ \Big\| X_n - \sum_{j=0}^{k} a^j Z_{n-j} \Big\|^2 = |a|^{2k+2}\, \|X_{n-k-1}\|^2 \to 0, \qquad k \to \infty. \]
Thus,
\[ X_n = \sum_{j=0}^{\infty} a^j Z_{n-j}, \qquad n \in \mathbb{Z}. \tag{2} \]
20 A time series satisfying this equation is called an autoregressive sequence of order 1.
By Theorem F.2.3 this series converges with probability one. Now we show that the time series $X = \{X_n\}_{n \in \mathbb{Z}}$ is stationary. Indeed,
\[ (X_n, 1) = \sum_{j=0}^{\infty} a^j (Z_{n-j}, 1) = 0 \]
for all $n$ and for $h \ge 0$ we have
\[ (X_{n+h}, X_n) = \Big( \sum_{j=0}^{\infty} a^j Z_{n+h-j},\, \sum_{k=0}^{\infty} a^k Z_{n-k} \Big) = \Big( \sum_{j=h}^{\infty} a^j Z_{n+h-j},\, \sum_{k=0}^{\infty} a^k Z_{n-k} \Big) = \Big( \sum_{j=0}^{\infty} a^{j+h} Z_{n-j},\, \sum_{k=0}^{\infty} a^k Z_{n-k} \Big) = \sigma^2 a^h \sum_{j=0}^{\infty} |a|^{2j} = \frac{\sigma^2 a^h}{1 - |a|^2}. \]
We find that $X$ is stationary and the correlation function $C$ is given by
\[ C(h) = \begin{cases} \dfrac{\sigma^2 a^{h}}{1 - |a|^2}, & h \ge 0, \\[2mm] \dfrac{\sigma^2\, \overline{a}^{\,|h|}}{1 - |a|^2}, & h < 0. \end{cases} \]
We have thus shown that equation (1) has exactly one stationary solution which is given by (2). If $a \ne 0$ then equation (1) has a unique solution for each fixed $X_0$ which can be determined iteratively: $X_1 = Z_1 + a X_0$, $X_{-1} = (X_0 - Z_0)/a$, and so on. If $X_0$ is not given by (2), i.e.,
\[ X_0 \ne \sum_{j=0}^{\infty} a^j Z_{-j}, \]
then the solution is not stationary.
Example 2.8.9. Let $Z$ be the Wiener process or the Poisson process constructed in Corollary 2.1.4 and Corollary 2.1.5, respectively. Further, let $h \in [0, \infty)$ be arbitrary and define the process $X^h$ by
\[ X^h(t) = Z(t + h) - Z(t), \qquad t \in [0, \infty). \]
The computations in Example 2.2.5 show that $X^h$ is stationary and its correlation function $C^h$ is given by
\[ C^h(t) = \max(0, h - t), \qquad t \in [0, \infty). \]
2.9 Spectral representation of stationary fields
First we generalize the notion of positive definiteness introduced in Definition 1.4.1.21
Definition 2.9.1. A function $f : T - T \to \mathbb{C}$ is said to be $T$-positive definite if the inequality
\[ \sum_{i,j=1}^{n} f(t_i - t_j)\, c_i \overline{c_j} \ge 0 \]
holds for every positive integer $n$, for all $t_1, \ldots, t_n \in T$ and for all $c_1, \ldots, c_n \in \mathbb{C}$.
Now we reformulate some of our previous results in terms of stationary fields. The next theorem follows from Theorem 2.2.3.
Theorem 2.9.2. The correlation function of a real or complex stationary field on $T$ is $T$-positive definite. Conversely, let $C$ be a real- or complex-valued function defined on $T - T$. If $C$ is $T$-positive definite, then there exists a real or complex, respectively, stationary Gaussian field on $T$ with correlation function $C$.
Analogous statements hold for the covariance function. In the remaining part of this section we consider only the correlation function. Combining the previous theorem with Bochner's Theorem 1.7.3 and Herglotz's Theorem 1.7.5 we obtain the following characterizations of correlation functions of stationary fields on $\mathbb{R}^d$ and $\mathbb{Z}^d$.
Theorem 2.9.3. A continuous complex-valued function $C$ on $\mathbb{R}^d$ is the correlation function of a continuous stationary field on $\mathbb{R}^d$ if and only if it can be represented in the form
\[ C(t) = \int_{\mathbb{R}^d} e^{i(t,x)}\, d\sigma(x), \qquad t \in \mathbb{R}^d \]
with some nonnegative finite Borel measure $\sigma$ on $\mathbb{R}^d$.
Theorem 2.9.4. A complex-valued function $C$ on $\mathbb{Z}^d$ is the correlation function of a stationary field on $\mathbb{Z}^d$ if and only if it can be represented in the form
\[ C(n) = \int_{[0,2\pi)^d} e^{i(n,t)}\, d\sigma(t), \qquad n \in \mathbb{Z}^d \]
with some nonnegative finite Borel measure $\sigma$ on $[0, 2\pi)^d$.
The measure $\sigma$ in the previous theorems is called the spectral measure of the field. If $\sigma$ is absolutely continuous with respect to the Lebesgue measure, then the corresponding density is called the spectral density.
21 See also Definition 4.1.1.
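For a spectral measure with finitely many atoms, the positive definiteness asserted by Theorem 2.9.2 can be seen directly: inserting the integral representation into the quadratic form of Definition 2.9.1 gives $\sum_m p_m \big| \sum_i c_i e^{i t_i x_m} \big|^2 \ge 0$. The following sketch checks this identity numerically for $d = 1$; the atoms, weights and function names are illustrative assumptions and not part of the text.

```python
import cmath


def correlation(n, atoms):
    """C(n) = sum_m p_m * exp(i n x_m) for the measure sum_m p_m * delta_{x_m}."""
    return sum(p * cmath.exp(1j * n * x) for x, p in atoms)


def quadratic_form(ts, cs, atoms):
    """sum_{i,j} C(t_i - t_j) * c_i * conj(c_j), as in Definition 2.9.1."""
    return sum(correlation(ti - tj, atoms) * ci * cj.conjugate()
               for ti, ci in zip(ts, cs)
               for tj, cj in zip(ts, cs))


def spectral_form(ts, cs, atoms):
    """sum_m p_m * |sum_i c_i exp(i t_i x_m)|^2, which is visibly nonnegative."""
    return sum(p * abs(sum(c * cmath.exp(1j * t * x) for t, c in zip(ts, cs))) ** 2
               for x, p in atoms)
```

With atoms such as `[(0.3, 1.0), (2.0, 0.5), (4.5, 2.0)]`, given as pairs (location, weight), the two forms should coincide up to rounding for any choice of integers `ts` and complex coefficients `cs`.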
Example 2.9.5. Let $Z \sim \mathrm{WN}(0, \sigma^2)$ be a white noise on $\mathbb{Z}^d$. Since
\[ \int_{[0,2\pi)^d} e^{i(n,x)}\, d\lambda_d(x) = 0, \qquad n \in \mathbb{Z}^d \setminus \{0\}, \]
the correlation function $C$ can be represented as
\[ C(n) = \frac{\sigma^2}{(2\pi)^d} \int_{[0,2\pi)^d} e^{i(n,x)}\, d\lambda_d(x), \qquad n \in \mathbb{Z}^d. \]
Thus, the spectral density is constant. Conversely, if a field $Z$ on $\mathbb{Z}^d$ has a constant spectral density, then $Z$ is a white noise.
Using Theorem 1.7.8 we can characterize the correlation functions of not necessarily continuous fields on $\mathbb{R}^d$. By $\Gamma_d$ we denote the set of all characters of the group $\mathbb{R}^d$. Recall that $\Gamma_d$ consists of all complex-valued functions $\gamma \ne 0$ on $\mathbb{R}^d$ such that $\gamma(x + y) = \gamma(x)\gamma(y)$ and $\gamma(-x) = \overline{\gamma(x)}$ for all $x$ and $y$. Endowing $\Gamma_d$ with the topology of pointwise convergence, $\Gamma_d$ becomes a compact Hausdorff space (cf. Remark 1.7.6).
Theorem 2.9.6. A complex-valued function $C$ on $\mathbb{R}^d$ is the correlation function of a stationary field on $\mathbb{R}^d$ if and only if it can be represented in the form
\[ C(t) = \int_{\Gamma_d} \gamma(t)\, d\sigma(\gamma), \qquad t \in \mathbb{R}^d \]
with some nonnegative finite Borel measure $\sigma$ on $\Gamma_d$.
Example 2.9.7. Let $Z \sim \mathrm{WN}(0, 1)$ be a white noise on $\mathbb{R}^d$. Then $C = 1_{\{0\}}$ and the measure $\sigma$ in the previous theorem is equal to the normalized Haar measure on $\Gamma_d$ (see Corollary 1.7.11 and its proof).
We are now able to prove integral representations for stationary fields using Karhunen's Theorem 2.7.1.
Theorem 2.9.8. Let $Z$ be a continuous stationary field on $\mathbb{R}^d$ with correlation function $C$ and spectral measure $\sigma$. Then there exists a uniquely determined random orthogonal measure $\zeta$ on $(\mathbb{R}^d, \mathcal{B}_d)$ with structure measure $\sigma$ such that
\[ Z(t) = \int_{\mathbb{R}^d} e^{i(t,x)}\, d\zeta(x), \qquad t \in \mathbb{R}^d \]
and $H(Z) = H(\zeta)$. Conversely, let $\zeta$ be a random orthogonal measure on $(\mathbb{R}^d, \mathcal{B}_d)$ with structure measure $\sigma$. Then the equation above defines a continuous stationary field $Z$ with spectral measure $\sigma$.
Proof. We have
\[ C(t - s) = \int_{\mathbb{R}^d} e^{i(t-s,x)}\, d\sigma(x) = \int_{\mathbb{R}^d} e^{i(t,x)}\,\overline{e^{i(s,x)}}\, d\sigma(x), \qquad t, s \in \mathbb{R}^d. \]
The linear span $L$ of the functions $e^{i(t,\cdot)}$, $t \in \mathbb{R}^d$, is dense in $L^2(\sigma)$. To see this assume that $g \in L^{\perp}$. Then
\[ \int_{\mathbb{R}^d} e^{i(t,x)}\,\overline{g(x)}\, d\sigma(x) = 0, \qquad t \in \mathbb{R}^d. \]
Since $g \in L^2(\sigma) \subset L^1(\sigma)$, the measure $\mu := \overline{g}\,\sigma$ is finite. The equation above shows that $\check{\mu} = 0$ and hence $g = 0$. Thus, $L$ is dense in $L^2(\sigma)$. The first statement of the theorem follows immediately from Karhunen's Theorem 2.7.1, while the second one is a consequence of basic properties of the integral with respect to $\zeta$ (cf. Theorem 2.6.4).
Example 2.9.9. Let $Z$ be a continuous stationary field on $\mathbb{R}^d$. If the spectral measure $\sigma$ of $Z$ is discrete, i.e.,
\[ \sigma = \sum_{j=1}^{\infty} p_j \delta_{x_j}, \qquad x_j \in \mathbb{R}^d,\ p_j \ge 0,\ \sum_{j=1}^{\infty} p_j < \infty, \]
then
\[ C(t) = \sum_{j=1}^{\infty} p_j\, e^{i(t,x_j)}, \qquad t \in \mathbb{R}^d \]
and
\[ Z(t) = \sum_{j=1}^{\infty} e^{i(t,x_j)} X_j, \qquad t \in \mathbb{R}^d, \]
where the $X_j$'s are orthogonal and $(X_j, X_j) = p_j$ (see Example 2.6.5). If $Z$ is a Gaussian field then the $X_j$'s are even independent.
The next two theorems can be proved in the same way as Theorem 2.9.8. We omit the proofs.
Theorem 2.9.10. Let $Z$ be a stationary field on $\mathbb{Z}^d$ with correlation function $C$ and spectral measure $\sigma$. Then there exists a uniquely determined random orthogonal measure $\zeta$ on $\big([0, 2\pi)^d, \mathcal{B}([0, 2\pi)^d)\big)$ with structure measure $\sigma$ such that
\[ Z(n) = \int_{[0,2\pi)^d} e^{i(n,x)}\, d\zeta(x), \qquad n \in \mathbb{Z}^d \]
and $H(Z) = H(\zeta)$.
Conversely, let $\zeta$ be a random orthogonal measure on $\big([0, 2\pi)^d, \mathcal{B}([0, 2\pi)^d)\big)$ with structure measure $\sigma$. Then the equation above defines a stationary field $Z$ with spectral measure $\sigma$.
Theorem 2.9.11. Let $Z$ be a stationary field on $\mathbb{R}^d$ with correlation function $C$ and spectral measure $\sigma$. Then there exists a uniquely determined random orthogonal measure $\zeta$ on $(\Gamma_d, \mathcal{B}(\Gamma_d))$ with structure measure $\sigma$ such that
\[ Z(t) = \int_{\Gamma_d} \gamma(t)\, d\zeta(\gamma), \qquad t \in \mathbb{R}^d \]
and $H(Z) = H(\zeta)$. Conversely, if $\zeta$ is a random orthogonal measure on $(\Gamma_d, \mathcal{B}(\Gamma_d))$, then the equation above defines a stationary field $Z$.
We will call the random orthogonal measure $\zeta$ occurring in Theorems 2.9.8, 2.9.10 and 2.9.11 the representing measure of the field $Z$.
Remark 2.9.12. Let $Z$ be a continuous stationary field on $\mathbb{R}^d$ (analogous considerations can be carried out for fields on $\mathbb{Z}^d$ or for not necessarily continuous fields on $\mathbb{R}^d$). We have seen in the proof of Karhunen's Theorem 2.7.1 that the mapping $e^{i(t,\cdot)} \mapsto Z(t)$, $t \in \mathbb{R}^d$, can be uniquely extended to an isometric linear operator $I_Z$ from $L^2(\sigma)$ onto $H(Z) = H(\zeta)$. Using the inverse of $I_Z$ we can reformulate problems that are formulated in the space $H(Z)$ to problems formulated in $L^2(\sigma)$.
To see examples, let $X$ be a stationary time series on $\mathbb{Z}$ with spectral measure $\sigma$. By isometry we have
\[ \Big\| X_{n+1} - \sum_{k=0}^{n} c_k X_k \Big\|^2 = \int_0^{2\pi} \Big| e^{i(n+1)x} - \sum_{k=0}^{n} c_k e^{ikx} \Big|^2 d\sigma(x). \]
This equation, for example, is important for the prediction of stationary processes.22 In this case the problem is to find those $c_k$'s which minimize the left-hand side.
Another problem is to determine $\zeta([a, b))$, $a < b$, where $\zeta$ is the random orthogonal measure in the spectral representation of $X$. For this we choose a sequence
\[ p_n(x) = \sum_j c_{j,n}\, e^{i k_{j,n} x}, \qquad n \in \mathbb{N},\ k_{j,n} \in \mathbb{Z},\ x \in [0, 2\pi) \]
of trigonometric polynomials converging in $L^2(\sigma)$ to $1_{[a,b)}$. Since $I_Z$ is isometric we have $\lim_{n \to \infty} I_Z(p_n) = I_Z(1_{[a,b)})$. Using the definition of $\zeta$ (see equation (2.7.1.3)) we obtain
\[ \zeta([a, b)) = \lim_{n \to \infty} \sum_j c_{j,n} X_{k_{j,n}}. \]
22 We will generalize the equation above in Theorem 2.9.13.
We will use this method in Theorems 2.9.16 and 2.9.17 to prove inversion formulae. In Section 2.10 we will investigate the interconnection between stationary fields and unitary representations which will provide another point of view for the operator $I_Z$.
Theorem 2.9.13. Let $Z$ be a continuous stationary field on $\mathbb{R}^d$ with spectral measure $\sigma$, correlation function $C$, and let $\mu$ and $\nu$ be complex measures on $\mathbb{R}^d$. Then
\[ \Big( \int_{\mathbb{R}^d} Z\, d\mu,\, \int_{\mathbb{R}^d} Z\, d\nu \Big) = \int_{\mathbb{R}^d} \check{\mu}\,\overline{\check{\nu}}\, d\sigma = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} C(t - s)\, d\mu(t)\, d\overline{\nu}(s) = \int_{\mathbb{R}^d} C\, d(\mu * \tilde{\nu}) \]
and
\[ \int_{\mathbb{R}^d} Z\, d\mu = I_Z(\check{\mu}), \tag{1} \]
where $I_Z$ is the isometric operator from $L^2(\sigma)$ onto $H(Z)$ introduced in Remark 2.9.12.
Proof. The integrability of $Z$ is a consequence of Theorem 2.4.4, since $\|Z(t)\|$ is constant. The equations
\[ \Big( \int_{\mathbb{R}^d} Z\, d\mu,\, \int_{\mathbb{R}^d} Z\, d\nu \Big) = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} C(t - s)\, d\mu(t)\, d\overline{\nu}(s) = \int_{\mathbb{R}^d} C\, d(\mu * \tilde{\nu}) \]
follow from Theorem 2.4.6 and from the definition of the convolution (cf. Section E.2). To prove the remaining two equations it suffices to consider nonnegative (finite) measures. Using the definition of $I_Z$ and the fact that it is isometric we see that the equations hold for measures with finite support. By Corollary 2.4.10 there exist sequences $\{\mu_n\}$ and $\{\nu_n\}$ of nonnegative measures with finite support converging weakly to $\mu$ and $\nu$, respectively, such that
\[ \int_{\mathbb{R}^d} Z\, d\mu = \lim_{n \to \infty} \int_{\mathbb{R}^d} Z\, d\mu_n \quad \text{and} \quad \int_{\mathbb{R}^d} Z\, d\nu = \lim_{n \to \infty} \int_{\mathbb{R}^d} Z\, d\nu_n. \]
Theorem 1.6.1 shows that the sequences $\{\check{\mu}_n\}$ and $\{\check{\nu}_n\}$ converge uniformly on compact sets (and hence also in $L^2(\sigma)$) to $\check{\mu}$ and $\check{\nu}$, respectively. Now the equations in question follow by taking the limit $n \to \infty$.
The next theorem can be proved in the same way as the previous one. We omit the proof.
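The prediction problem mentioned in Remark 2.9.12 can be made concrete in the real case. Minimizing $\|X_{n+1} - \sum_{k=0}^{n} c_k X_k\|^2$ is a projection problem, so the optimal coefficients satisfy the normal equations $\sum_k c_k C(k - j) = C(n + 1 - j)$, $j = 0, \ldots, n$. The sketch below, under the assumption of a real autoregressive correlation function $C(h) = \sigma^2 a^{|h|}/(1 - a^2)$, solves these equations with a small Gaussian elimination routine; the helper names are hypothetical and the solver is a plain textbook implementation, not a method taken from the text.

```python
def ar1_correlation(h, a, sigma2=1.0):
    """Correlation function of a real AR(1) sequence with |a| < 1."""
    return sigma2 * a ** abs(h) / (1.0 - a * a)


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def one_step_predictor(corr, n):
    """Coefficients c_0..c_n minimizing ||X_{n+1} - sum_k c_k X_k||^2 (real case)."""
    gram = [[corr(k - j) for k in range(n + 1)] for j in range(n + 1)]
    rhs = [corr(n + 1 - j) for j in range(n + 1)]
    return solve(gram, rhs)


def prediction_error(corr, coeffs):
    """Minimal squared error C(0) - sum_k c_k C(n + 1 - k) for optimal real c_k."""
    n = len(coeffs) - 1
    return corr(0) - sum(c * corr(n + 1 - k) for k, c in enumerate(coeffs))
```

For the autoregressive correlation function with $a = 0.6$ one expects all coefficients to vanish except $c_n = a$, with squared prediction error $\sigma^2$, in agreement with the defining equation $X_{n+1} = Z_{n+1} + a X_n$.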
Theorem 2.9.14. If $Z$ is a stationary field on $\mathbb{Z}^d$ with spectral measure $\sigma$ and $\mu$ and $\nu$ are complex measures on $\mathbb{Z}^d$, then
\[ \Big( \int_{\mathbb{Z}^d} Z\, d\mu,\, \int_{\mathbb{Z}^d} Z\, d\nu \Big) = \int_{[0,2\pi)^d} \check{\mu}\,\overline{\check{\nu}}\, d\sigma \]
and
\[ \int_{\mathbb{Z}^d} Z\, d\mu = I_Z(\check{\mu}). \tag{1} \]
As an application of the operator $I_Z$ we present the so-called sampling theorem attributed to E. T. Whittaker, H. Nyquist, C. Shannon, V. Kotelnikov, and É. Borel. It implies, in terms of signal processing, that a bandlimited signal can be recovered from uniformly spaced discrete samples as long as the sampling rate is not smaller than twice the bandwidth.
Theorem 2.9.15 (sampling theorem). Let $X$ be a continuous stationary process on $\mathbb{R}$ such that the support of its spectral measure $\sigma$ is contained in a finite interval $[-a, a]$, $a > 0$, and $\sigma(\{-a\}) = \sigma(\{a\}) = 0$. Then we have
\[ X(t) = \sum_{n=-\infty}^{\infty} \frac{\sin(at - n\pi)}{at - n\pi}\, X\Big(\frac{n\pi}{a}\Big), \qquad t \in \mathbb{R}.^{23} \]
Proof. The correlation function $C$ of $X$ admits the representation
\[ C(t) = \int_{-a}^{a} e^{itx}\, d\sigma(x), \qquad t \in \mathbb{R}. \]
Let $t \in \mathbb{R}$ be fixed and consider the function $f_t : x \mapsto e^{itx}$ on the interval $[-a, a]$. Its Fourier series is given by
\[ \sum_{n=-\infty}^{\infty} \frac{\sin(at - n\pi)}{at - n\pi}\, e^{i x n\pi / a}, \qquad x \in [-a, a]. \]
Since $f_t$ is continuously differentiable, the series converges in $L^2(\sigma)$ to $f_t$ (see Corollary C.3.2). Applying the isometry $I_X$ we obtain the statement of the theorem.
23 We use the convention $\frac{\sin 0}{0} := 1$.
Next we present two inversion theorems.
Theorem 2.9.16. Let $Z$ be a continuous stationary field on $\mathbb{R}^d$ with representing measure $\zeta$ and spectral measure $\sigma$. If the $\sigma$-measure of the boundary of the set
\[ J = (a_1, b_1) \times \cdots \times (a_d, b_d) \subset \mathbb{R}^d, \qquad a_j < b_j \]
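is equal to 0, then24
\[ \zeta(J) = \lim_{T \to \infty} \frac{1}{(2\pi)^d} \int_{-T}^{T} \cdots \int_{-T}^{T} \prod_{j=1}^{d} \frac{e^{-i a_j t_j} - e^{-i b_j t_j}}{i t_j}\, Z(t)\, dt. \]
Proof. We have
\[ K_T(x) := \frac{1}{(2\pi)^d} \int_{-T}^{T} \cdots \int_{-T}^{T} \prod_{j=1}^{d} \frac{e^{-i a_j t_j} - e^{-i b_j t_j}}{i t_j}\, e^{i(x,t)}\, dt = \frac{1}{(2\pi)^d} \int_{-T}^{T} \cdots \int_{-T}^{T} \prod_{j=1}^{d} \frac{e^{i(x_j - a_j) t_j} - e^{i(x_j - b_j) t_j}}{i t_j}\, dt \]
\[ = \prod_{j=1}^{d} \frac{1}{2\pi} \int_{-T}^{T} \Big( \frac{\sin (x_j - a_j) t_j}{t_j} - \frac{\sin (x_j - b_j) t_j}{t_j} \Big)\, dt_j =: \prod_{j=1}^{d} K_j(T, x). \]
By Corollary B.1.8,
\[ \lim_{T \to \infty} K_j(T, x) = \begin{cases} 1, & x_j \in (a_j, b_j), \\ \frac{1}{2}, & x_j = a_j \text{ or } x_j = b_j, \\ 0, & x_j \notin [a_j, b_j], \end{cases} \]
and hence
\[ \lim_{T \to \infty} K_T = 1_J \qquad \sigma\text{-almost everywhere}. \]
Theorem B.1.9 shows that $|K_j(T, x)| < 2$ and therefore we also have convergence in $L^2(\sigma)$. The isometric operator $I_Z$ being continuous we conclude that $\lim_{T \to \infty} I_Z(K_T) = \zeta(J)$. Now the theorem follows immediately from equation (2.9.13.1).
The next theorem can be proved in the same way by using Corollary C.3.2.
Theorem 2.9.17. Let $Z$ be a stationary field on $\mathbb{Z}^d$ with representing measure $\zeta$ and spectral measure $\sigma$. If the $\sigma$-measure of the boundary of the set
\[ J = (a_1, b_1) \times \cdots \times (a_d, b_d), \qquad 0 \le a_j < b_j < 2\pi \]
is equal to 0, then
\[ \zeta(J) = \lim_{n \to \infty} \frac{1}{(2\pi)^d} \sum_{|k| \le n} \prod_{j=1}^{d} \frac{e^{-i a_j k_j} - e^{-i b_j k_j}}{i k_j}\, Z(k). \]
24 In case of $t_j = 0$ we use the same convention as in Theorem 1.3.1.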
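The limit behaviour of the one-dimensional kernels $K_j(T, x)$ used in the proof of Theorem 2.9.16 can be observed numerically. Since $\int_{-T}^{T} \sin(\alpha t)/t\, dt = 2\operatorname{Si}(\alpha T)$, where $\operatorname{Si}$ denotes the sine integral, we have $K_j(T, x) = \big(\operatorname{Si}((x_j - a_j)T) - \operatorname{Si}((x_j - b_j)T)\big)/\pi$. The following sketch evaluates $\operatorname{Si}$ by a composite Simpson rule; the step count and the function names are assumptions made for this illustration only.

```python
import math


def sine_integral(z, steps=20000):
    """Si(z) = integral of sin(u)/u over [0, z], composite Simpson rule (steps even)."""
    if z < 0:
        return -sine_integral(-z, steps)   # Si is an odd function
    if z == 0:
        return 0.0

    def integrand(u):
        return 1.0 if u == 0.0 else math.sin(u) / u

    h = z / steps
    total = integrand(0.0) + integrand(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0


def kernel(T, x, a, b):
    """K_j(T, x) = (1 / 2 pi) * integral over [-T, T] of (sin((x-a)t) - sin((x-b)t)) / t."""
    return (sine_integral((x - a) * T) - sine_integral((x - b) * T)) / math.pi
```

For large $T$ the values of `kernel(T, x, a, b)` should be close to $1$ inside $(a, b)$, close to $1/2$ at the endpoints and close to $0$ outside $[a, b]$, as stated in the proof.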
Theorem 2.9.18. Let $Z$ be a continuous stationary field on $\mathbb{R}^d$ or $\mathbb{Z}^d$ with representing measure $\zeta$ and spectral measure $\sigma$. The following properties are equivalent:
(i) $H(Z)$ is finite-dimensional;
(ii) the support of $\sigma$ is finite;
(iii) $Z(t) = \sum_{j=1}^{n} e^{i(t,x_j)} X_j$ with some $n \in \mathbb{N}$, mutually orthogonal $X_j \in H(Z)$, and $x_j \in \mathbb{R}^d$ or $x_j \in [0, 2\pi)^d$, respectively.
Proof. Noting that $L^2(\sigma)$ is finite-dimensional if and only if $\operatorname{supp} \sigma$ is finite, the equivalence of (i) and (ii) follows from the fact that $H(Z)$ and $L^2(\sigma)$ are isomorphic. Assume that $\sigma = \sum_{j=1}^{n} p_j \delta_{x_j}$, where the $x_j$'s are mutually distinct. Then the functions $1_{\{x_j\}}$ form an orthogonal basis of $L^2(\sigma)$. For all functions $f$ we have
\[ f = \sum_{j=1}^{n} f(x_j)\, 1_{\{x_j\}}, \]
equality being in $L^2(\sigma)$, and hence
\[ \int f(x)\, d\zeta(x) = \sum_{j=1}^{n} f(x_j) X_j, \]
where the random variables $X_j := \int 1_{\{x_j\}}\, d\zeta$ are orthogonal. Relation (iii) follows from the equation above by setting $f(x) := f_t(x) := e^{i(t,x)}$ and from the spectral representation of $Z$. If (iii) holds, then $H(Z) = \operatorname{span}\{X_1, \ldots, X_n\}$, from which (i) follows.
Theorem 2.9.19. Let $Z$ be a continuous stationary field on $T = \mathbb{R}^d$ or on $T = \mathbb{Z}^d$ with correlation function $C$ and spectral measure $\sigma$. Then $\mu * Z$ is stationary with correlation function $\mu * \tilde{\mu} * C$ and spectral measure $\sigma_\mu$ given by
\[ d\sigma_\mu(x) = |\hat{\mu}(x)|^2\, d\sigma(x). \tag{1} \]
Moreover, $\mu * Z$ admits the representation
\[ \mu * Z(t) = \int e^{i(t,x)}\, \hat{\mu}(x)\, d\zeta(x), \qquad t \in T, \]
where $\zeta$ is the representing measure of $Z$ and integration is over $\mathbb{R}^d$ or $[0, 2\pi)^d$, respectively.
Proof. The statement on stationarity and on the correlation function follows from Lemma 2.8.4 with $\mu = \nu$ while equation (1) follows from Theorem 1.8.15 and Theorem 1.9.5, respectively.
The statement on the representation of $\mu * Z$ can be easily checked when the support of $\mu$ is finite. In the general case we approximate $\mu$ by measures with finite support using (2.4.15.5)25 and (2.6.4.iii).

2.10 Unitary representations
This section contains an introduction to unitary representations and their connection to stationary fields. This connection is useful for example in the study of ergodic properties of stationary fields (see Section 5.3). Throughout the present section $H$ is a complex Hilbert space and $Z$ denotes a stationary field on $T$, where $T = \mathbb{R}^d$ or $T = \mathbb{Z}^d$.
Unitary Operators 2.10.1. Recall that a linear operator $U$ from $H$ onto $H$ is called unitary if $(Uh, Uh) = (h, h)$ for all $h \in H$. If $U$ is unitary then
\[ (Uh, Ug) = (h, g) \quad \text{and} \quad (Uh, g) = (h, U^{-1} g), \qquad h, g \in H. \]
To see examples, let $H = L^2(\lambda_d)$. Since the Lebesgue measure is invariant under translations, the equation26
\[ U_t h(x) := h(x - t), \qquad h \in H,\ x \in \mathbb{R}^d \]
defines a unitary operator $U_t$ in $H$ for all $t \in \mathbb{R}^d$. Another example is given by
\[ U h(x) := g(x) h(x), \qquad h \in H,\ x \in \mathbb{R}^d, \]
where $g$ is a Lebesgue measurable function on $\mathbb{R}^d$ such that $|g| = 1$.
Definition 2.10.2. A unitary representation $(U_t)$ of $T$ in $H$ is a mapping $t \mapsto U_t$ from $T$ into the set of all unitary operators in $H$ such that
\[ U_{t+s} = U_t U_s, \qquad t, s \in T. \]
The representation $(U_t)$ is called continuous if the mapping $t \mapsto U_t h$ is continuous for all $h \in H$ with respect to the norm topology on $H$. If $T = \mathbb{R}^d$ and the function $t \mapsto (g, U_t h)$ is Lebesgue measurable for all $g, h \in H$, then $(U_t)$ is said to be weakly measurable.
25 See also Corollary 2.4.10.
26 Strictly speaking this definition is not correct since $h$ is an equivalence class and not a function. It can easily be made correct by using representatives of $h$.
Lemma 2.10.3. Let $(U_t)$ be a unitary representation of $T$ in $H$. Then $U_0 = I$ (the identity operator), $U_{-t} = U_t^{-1}$ and
\[ (U_t v, w) = (v, U_{-t} w), \qquad t \in T,\ v, w \in H. \]
Moreover, $(U_t)$ is continuous if and only if the complex-valued functions $t \mapsto (U_t v, w)$ are continuous for all $v, w \in L$, where $L$ is a dense subset of $H$.
Proof. Since $U_0 = U_{0+0} = U_0 U_0$ we see that $U_0 = I$. From $I = U_{t-t} = U_t U_{-t}$ it follows that $U_{-t} = U_t^{-1}$ and hence
\[ (U_t v, w) = (v, U_t^{*} w) = (v, U_t^{-1} w) = (v, U_{-t} w). \]
If $(U_t)$ is continuous, then the functions $t \mapsto (U_t v, w)$ are obviously continuous for all $v, w \in H$. Assume that these functions are continuous for all $v, w \in L$. Let $h \in H$ and $w \in L$ be arbitrary and choose a sequence $\{v_n\}$ in $L$ converging to $h$. Using the Cauchy–Schwarz inequality we get
\[ |(U_t v_n, w) - (U_t h, w)| = |(U_t (v_n - h), w)| \le \|v_n - h\|\, \|w\|. \]
Thus, the function $t \mapsto (U_t h, w)$ is continuous since it is the uniform limit of the sequence $t \mapsto (U_t v_n, w)$ of continuous functions. Applying the same argument once more, we see that $t \mapsto (U_t h, g)$ is continuous for all $h, g \in H$. The continuity of $(U_t)$ follows from the identity
\[ \|U_t h - U_s h\|^2 = 2\|h\|^2 - (U_t h, U_s h) - (U_s h, U_t h). \]
Example 2.10.4. Continuous unitary representations of $\mathbb{R}^d$ in $H = L^2(\lambda_d)$ are given by
\[ U_t h(x) := h(x - t), \qquad h \in H,\ x, t \in \mathbb{R}^d \]
and
\[ U_t h(x) := e^{i(t,x)} h(x), \qquad h \in H,\ t, x \in \mathbb{R}^d. \]
The continuity of the first example follows from Lemma 2.10.3 and from the fact that the function
\[ t \mapsto \int h(x - t)\,\overline{g(x)}\, d\lambda_d(x), \qquad t \in \mathbb{R}^d \]
is continuous for all $h, g \in L^2(\mathbb{R}^d)$ (cf. Theorem E.2.4). Replacing $\mathbb{R}^d$ by $\mathbb{Z}^d$ and $\lambda_d$ by the counting measure on $\mathbb{Z}^d$ we obtain unitary representations of $\mathbb{Z}^d$.
Definition 2.10.5. Let $(U_t)$ be a unitary representation of $T$ in $H$. A subspace $L$ of $H$ is said to be $(U_t)$-invariant if
\[ U_t L \subset L, \qquad t \in T. \]
A vector $h \in H$, $h \ne 0$, is called an eigenvector of $(U_t)$ if there exists a complex-valued function $\gamma$ on $T$ such that
\[ U_t h = \gamma(t) h, \qquad t \in T. \]
Let $\gamma$ be as above. It is easy to check that
\[ \gamma(0) = 1 \quad \text{and} \quad \gamma(t + s) = \gamma(t)\gamma(s), \qquad t, s \in T. \]
Thus, $\gamma$ is a character of $T$. If the representation is continuous, then $\gamma$ is continuous as well.
Definition 2.10.6. Let $(U_t)$ be a unitary representation of $T$ in $H$. A vector $v \in H$ is called a cyclic vector if the linear space $\operatorname{span}\{U_t v : t \in T\}$ is dense in $H$. Unitary representations having a cyclic vector are called cyclic.
Example 2.10.7. The constant function $1 \in H = L^2([0, 2\pi))$ is a cyclic vector for the unitary representation
\[ U_t h(x) := e^{itx} h(x), \qquad h \in H,\ x \in [0, 2\pi),\ t \in \mathbb{Z} \]
of $\mathbb{Z}$ since the trigonometric polynomials form a dense linear space in $H$. This follows immediately from the Stone–Weierstraß Theorem B.2.4.
Example 2.10.8. Let $H$ be a subspace of $L^2(\Omega, \mathcal{A}, P)$ and let $(U_t)$ be a unitary representation of $T$ in $H$. We choose an arbitrary random variable $X \in H$ and define the second order field $Z$ by
\[ Z(t) := U_t X, \qquad t \in T. \]
Then $Z$ is stationary. Indeed,
\[ (Z(t), Z(s)) = (U_t X, U_s X) = (U_{-s} U_t X, X) = (U_{t-s} X, X), \qquad t, s \in T. \]
It is easy to check that $H(Z)$ is $(U_t)$-invariant. Moreover, the restriction of $(U_t)$ to $H(Z)$ is a unitary representation with cyclic vector $X$.
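A finite-dimensional instance of Example 2.10.8 may help to fix ideas. The sketch below assumes $T = \mathbb{Z}$, $H = \mathbb{C}^3$ and the diagonal operators $U_t = \operatorname{diag}(e^{i t x_k})$ with hypothetical frequencies $x_k$; the vector `X` plays the role of the random variable, and all names are chosen for illustration only.

```python
import cmath

# Hypothetical frequencies x_k in [0, 2*pi) defining the diagonal operators U_t.
FREQS = [0.4, 1.7, 3.1]


def apply_U(t, vec):
    """U_t multiplies the k-th coordinate of vec by exp(i t x_k)."""
    return [cmath.exp(1j * t * x) * v for x, v in zip(FREQS, vec)]


def inner(u, v):
    """Inner product (u, v), linear in u and conjugate linear in v."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def field(t, X):
    """Z(t) := U_t X, the field generated by the vector X."""
    return apply_U(t, X)


def correlation(t, s, X):
    """(Z(t), Z(s)); by unitarity this should depend on t - s only."""
    return inner(field(t, X), field(s, X))
```

One can verify that `apply_U(2, apply_U(3, X))` agrees with `apply_U(5, X)` (the representation property) and that `correlation(t + h, t, X)` equals `inner(apply_U(h, X), X)` for every `t`, which is the stationarity computation of the example.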
Lemma 2.10.9. Let $(U_t)$ be a unitary representation of $T$ in $H$. The function
\[ f(t) = (h, U_t h), \qquad t \in T \]
is positive definite for all $h \in H$.
Proof. Let $t_j \in T$ and $c_j \in \mathbb{C}$, $j = 1, \ldots, n$, be arbitrary. Using that $U_{t_j - t_i} = U_{t_j} U_{t_i}^{-1}$ we obtain
\[ \sum_{i,j=1}^{n} f(t_j - t_i)\, c_i \overline{c_j} = \sum_{i,j=1}^{n} (c_i h,\, c_j U_{t_j - t_i} h) = \sum_{i,j=1}^{n} (c_i U_{t_i} h,\, c_j U_{t_j} h) = \Big\| \sum_{i=1}^{n} c_i U_{t_i} h \Big\|^2 \ge 0, \]
i.e., $f$ is positive definite.
Definition 2.10.10. Two unitary representations $(U_t)$ and $(V_t)$ of $T$ in $H_1$ and $H_2$, respectively, are called equivalent if there exists an isometric linear operator $M$ from $H_1$ onto $H_2$ such that
\[ M U_t = V_t M, \qquad t \in T. \]
Lemma 2.10.11. Let $(U_t)$ and $(V_t)$ be equivalent unitary representations of $\mathbb{R}^d$ in $H_1$ and $H_2$, respectively. Then $(U_t)$ is continuous if and only if $(V_t)$ is continuous.
Proof. Let $M$ be as in Definition 2.10.10. For all $v \in H_1$ and $w \in H_2$ we have
\[ (M U_t v, w) = (U_t v, M^{-1} w) = (V_t M v, w), \qquad t \in \mathbb{R}^d. \]
Since M maps H1 onto H2 , the statement follows from Lemma 2.10.3. Lemma 2.10.12. Let .Ut / and .Vt / be cyclic unitary representations of T in H1 and H2 with cyclic vectors h1 and h2 , respectively. If .h1 ; Ut h1 / D .h2 ; Vt h2 /;
t 2 T;
then these representations are equivalent.27
27
We use the same notation for the inner products in H1 and H2 .
Proof. We define the mapping $M$ from $L := \operatorname{span}\{U_t h_1 : t \in T\}$ into $H_2$ by
\[ M\Bigl( \sum_{i=1}^n c_i U_{t_i} h_1 \Bigr) := \sum_{i=1}^n c_i V_{t_i} h_2. \]
That this definition is correct, i.e., the left-hand side does not depend on the special choice of $c_i$ and $t_i$, follows from the relation
\[
\Bigl( \sum_{i=1}^n c_i U_{t_i} h_1, \sum_{j=1}^m d_j U_{s_j} h_1 \Bigr)
= \sum_{i=1}^n \sum_{j=1}^m c_i \overline{d_j}\, (h_1, U_{s_j - t_i} h_1)
= \Bigl( \sum_{i=1}^n c_i V_{t_i} h_2, \sum_{j=1}^m d_j V_{s_j} h_2 \Bigr)
\]
and from the fact that $h_1$ and $h_2$ are cyclic vectors. This mapping is linear and
\[ M h_1 = h_2, \qquad (g, h) = (M g, M h), \qquad M U_t g = V_t M g, \qquad t \in T;\ g, h \in L. \]
By Theorem D.3.6, $M$ can be extended to an isometric linear operator $\tilde{M}$ from $H_1$ onto $H_2$. It is clear that $\tilde{M} U_t = V_t \tilde{M}$ for all $t$, i.e., $(U_t)$ and $(V_t)$ are equivalent.

Lemma 2.10.13. Let $\mu$ and $\nu$ be complex measures with finite support on $T$. If the equation $\mu * Z(s) = \nu * Z(s)$ holds for some $s \in T$, then $\mu * Z(t) = \nu * Z(t)$ for all $t \in T$.

Proof. Indeed, applying Lemma 2.8.4 we see that
\[ \|(\mu - \nu) * Z(t)\|^2 = (\mu - \nu) * \widetilde{(\mu - \nu)} * C(0). \]
Thus, $\|(\mu - \nu) * Z(t)\|$ does not depend on $t$. Since it is equal to zero if $t = s$, it is zero for all $t$.

The next theorem establishes a close relation between unitary representations and stationary fields.

Theorem 2.10.14. For every (continuous) stationary field $Z$ on $T$ there exists a (continuous) cyclic unitary representation $(U_t)$ of $T$ in the Hilbert space $H(Z)$ such that
\[ U_t Z(s) = Z(s - t), \qquad s, t \in T. \tag{1} \]
We call this representation the canonical unitary representation of $Z$ in $H(Z)$. Conversely, if $(V_t)$ is a (continuous) cyclic unitary representation of $T$ in a Hilbert space $H$, then there exists a (continuous) stationary field $Z$ on $T$ such that the canonical unitary representation of $Z$ in $H(Z)$ is equivalent to $(V_t)$.

Proof. Denote by $L(Z)$ the linear space of all vectors $v \in H(Z)$ of the form
\[ v = \mu * Z(0) = \sum_j c_j Z(-x_j), \]
where $\mu = \sum_j c_j \delta_{x_j}$ is a complex measure with finite support. For $v$ as above we define $U_t v$ by
\[ U_t v := \delta_t * \mu * Z(0) = \mu * Z(-t), \qquad t \in T. \]
It follows from Lemma 2.10.13 that this definition is correct, i.e., $U_t v$ does not depend on the special representation of $v$. It is clear that $U_t$ is a linear operator from $L(Z)$ onto $L(Z)$. Moreover,
\[ U_t U_s v = U_{t+s} v, \qquad t, s \in T; \]
in particular, $U_t U_{-t} v = v$. It follows from Lemma 2.8.4 that $U_t$ is isometric:
\[ (U_t v, U_t v) = (v, v), \qquad v \in L(Z). \]
We conclude that $U_t$ can be extended to a unitary operator on $H(Z) = \overline{L(Z)}$; we denote this operator also by $U_t$. It is easy to check that $(U_t)$ is a unitary representation of $T$ with cyclic vector $Z(0)$. If $Z$ is continuous then, by the definition of $U_t$, the function $t \mapsto U_t v$ is continuous for all $v \in L(Z)$. In view of Lemma 2.10.3, the representation $(U_t)$ is continuous.

To prove the second part of the theorem, assume that $(V_t)$ is a cyclic unitary representation of $T$ in some Hilbert space $H$ and let $v_0$ be a cyclic vector. The function
\[ f(t) = (v_0, V_t v_0), \qquad t \in T, \]
is positive definite in view of Lemma 2.10.9. By Theorem 1.7.8 there exists a unique measure $\mu \in M_+(\Gamma)$ such that
\[ f(t) = \int_\Gamma \gamma(t) \, d\mu(\gamma), \qquad t \in T, \]
where $\Gamma$ denotes the character group of $T$. For each $t \in T$ the function
\[ g_t(\gamma) = \gamma(t), \qquad \gamma \in \Gamma, \]
is bounded and continuous on $\Gamma$, hence $g_t \in L^2(\mu)$. We define the second order random field $Z : T \to L^2(\mu)$ by $Z(t) = g_t$. Since
\[ (Z(t), Z(s)) = \int_\Gamma \gamma(t) \overline{\gamma(s)} \, d\mu(\gamma) = \int_\Gamma \gamma(t - s) \, d\mu(\gamma) = f(t - s), \]
this field is stationary. Denoting by $(U_t)$ the canonical unitary representation in $H(Z)$ we have
\[ U_t Z(s) = Z(s - t) = g_{s-t} = g_{-t}\, g_s = g_{-t}\, Z(s). \]
From this we see that the operator $U_t$ acts as multiplication with the function $g_{-t}$. By the uniqueness of $\mu$, the constant function $1$ is a cyclic vector for $(U_t)$. Moreover,
\[ (1, U_t 1) = \int_\Gamma \gamma(t) \, d\mu(\gamma) = f(t) = (v_0, V_t v_0). \]
Lemma 2.10.12 shows that the representations $(U_t)$ and $(V_t)$ are equivalent. If $(V_t)$ is continuous, then $(U_t)$ is continuous as well (cf. Lemma 2.10.11). The continuity of $Z$ follows from $U_{-t} Z(0) = Z(t)$.

Remark 2.10.15. Let $Z$ be a continuous stationary field on $\mathbb{R}^d$. The proof of the second part of the previous theorem can be carried out by using Bochner's Theorem 1.7.3 instead of Theorem 1.7.8, as follows. Let $\mu$ be the spectral measure of $Z$ and consider the isometric mapping $I_Z : L^2(\mu) \to H(Z)$ with $I_Z e^{i(t,\cdot)} = Z(t)$ (cf. Remark 2.9.12). We define the unitary representation $(V_t)$ by
\[ V_t g(x) := e^{-i(t,x)} g(x), \qquad g \in L^2(\mu). \]
The function $1$ is a cyclic vector of this representation.$^{28}$ Moreover, $(V_t)$ is equivalent with the unitary representation $(U_t)$ of $Z$ in $H(Z)$. This follows from the relation
\[ (1, V_t 1) = \bigl(1, e^{-i(t,\cdot)}\bigr) = \bigl(I_Z(1), I_Z e^{-i(t,\cdot)}\bigr) = (Z(0), Z(-t)) = (Z(0), U_t Z(0)) \]
and from Lemma 2.10.12.$^{29}$ We call $(V_t)$ the canonical unitary representation of $Z$ in $L^2(\mu)$. In Section 2.11 we will encounter another unitary representation for $Z$ which is constructed using the correlation function of $Z$.

$^{28}$ This follows from the fact that $\mu$ is uniquely determined by its inverse Fourier–Stieltjes transform.
$^{29}$ Note that $(Z(0), Z(-t)) = (Z(s), Z(s - t))$ by stationarity, so that $Z(0)$ can be replaced by $Z(s)$ where $s \in \mathbb{R}^d$ is arbitrary.

Lemma 2.10.16. Let $(U_t)$ be a unitary representation of $T$ in $H$, let $H_0$ be a subspace of $H$ and denote by $P_0$ the orthogonal projection from $H$ onto $H_0$. The following statements are equivalent:
(i) $H_0$ is $(U_t)$-invariant;
(ii) $U_t P_0 = P_0 U_t$, $t \in T$;
(iii) $H_0^\perp$ is $(U_t)$-invariant.
Proof. The equivalence of (i) and (iii) follows from the relations
\[
\begin{aligned}
H_0 \text{ is } (U_t)\text{-invariant}
&\iff U_t h \in H_0, && t \in T,\ h \in H_0 \\
&\iff U_t h \perp g, && t \in T,\ h \in H_0,\ g \in H_0^\perp \\
&\iff h \perp U_{-t}\, g, && t \in T,\ h \in H_0,\ g \in H_0^\perp \\
&\iff U_t g \in H_0^\perp, && t \in T,\ g \in H_0^\perp \\
&\iff H_0^\perp \text{ is } (U_t)\text{-invariant}.
\end{aligned}
\]
Assume that (i) holds. An arbitrary element $h \in H$ can be written as
\[ h = h_0 + h_0^\perp, \qquad h_0 \in H_0,\ h_0^\perp \in H_0^\perp. \]
From this we see that $U_t P_0 h = U_t h_0$. Since (iii) holds we obtain
\[ P_0 U_t h = P_0 U_t h_0 + P_0 U_t h_0^\perp = U_t h_0. \]
Thus the relation (i) implies (ii). Now assume that (ii) holds. For all $h_0 \in H_0$ we have $P_0 U_t h_0 = U_t P_0 h_0 = U_t h_0$ and hence $U_t h_0 \in H_0$. Thus, $H_0$ is $(U_t)$-invariant, completing the proof.

Lemma 2.10.17. Let $Z$ be a stationary field on $T$ with canonical unitary representation $(U_t)$ in $H(Z)$. If $H(Z)$ admits the decomposition $H(Z) = H_1 \oplus H_2$ where $H_1$ and $H_2$ are $(U_t)$-invariant subspaces, then $Z$ can be decomposed as $Z = Z_1 + Z_2$ where $Z_1$ and $Z_2$ are orthogonal stationary fields with $H(Z_1) = H_1$ and $H(Z_2) = H_2$.

Proof. Denote by $P_i$ the orthogonal projection onto $H_i$. Setting
\[ Z_i(t) := P_i Z(t) = P_i U_{-t} Z(0), \qquad t \in T,\ i = 1, 2, \]
we obviously have $Z = Z_1 + Z_2$ and $Z_1 \perp Z_2$. By Lemma 2.10.16 we have $Z_i(t) = U_{-t} P_i Z(0)$ and hence $Z_i$ is stationary (cf. Example 2.10.8) and $H(Z_i) \subset H_i$. From the relation
\[ H_1 \oplus H_2 = H(Z) = H(Z_1 + Z_2) \subset H(Z_1) \oplus H(Z_2) \subset H_1 \oplus H_2 \]
we conclude that $H(Z_1) \oplus H(Z_2) = H_1 \oplus H_2$ and therefore $H(Z_i) = H_i$.
Corollary 2.10.18. If $H(Z)$ is $n$-dimensional where $n \in \mathbb{N}$, then there exist one-dimensional $(U_t)$-invariant subspaces $H_j$, $j = 1, \dots, n$, such that$^{30}$
\[ H(Z) = H_1 \oplus \dots \oplus H_n. \]
Moreover,
\[ Z(t) = \gamma_1(t) X_1 + \dots + \gamma_n(t) X_n, \qquad t \in T, \]
where $X_j \in H_j$ and $\gamma_j$ is a character of $T$.

Proof. Since the unitary operators $U_t$ commute, they have a common eigenvector. Denoting by $H_1$ the subspace spanned by this eigenvector $X_1$ we have
\[ H(Z) = H_1 \oplus H_1^\perp. \]
By Lemma 2.10.16, the subspace $H_1^\perp$ is $(U_t)$-invariant. Using this, the corollary follows easily by induction on $n$ (see also the discussion in the first part of Example 2.8.6).

$^{30}$ See also Theorem 2.9.18.

2.11 Unitary representations and positive definite functions

In Theorem 2.10.14 we constructed a unitary representation for a given stationary field by using the integral representation of the correlation function. In the present section we follow another method where we construct a unitary representation without using the integral representation. We also present some applications of this method.

Throughout this section we assume that $T = \mathbb{R}^d$ or $T = \mathbb{Z}^d$. In the linear space of all complex-valued functions $g$ on $T$ we introduce the translation operator $E_t$ by
\[ E_t g(x) := g(x - t), \qquad t, x \in T. \]
We have $E_t g = \delta_t * g$ and $E_{s+t} = E_s E_t$. For a complex-valued function $g$ on $T$, we denote by $T(g)$ the linear space spanned by all translates of $g$. Note that
\[ T(g) = \{ \mu * g : \mu \in M_f(T) \}. \]

Theorem 2.11.1. To every positive definite function $f$ on $T$ there corresponds a Hilbert space $H(f)$ with inner product $(\cdot, \cdot) = (\cdot, \cdot)_f$ such that
(i) elements of $H(f)$ are bounded complex-valued functions on $T$ and $H(f)$ is invariant under translations;
(ii) $T(f) \subset H(f)$ and $T(f)$ is dense in $H(f)$;
(iii) setting $U_t g := E_t g$, $g \in H(f)$, $t \in T$, we obtain a unitary representation $(U_t)$ of $T$ in $H(f)$ with cyclic vector $f$;
(iv) $g(t) = (g, U_t f)$ for all $g \in H(f)$ and $t \in T$; in particular, $f(t) = (f, U_t f)$;
(v) convergence of a sequence $\{g_n\}$ in $H(f)$ implies uniform convergence on $T$;
(vi) weak convergence of a net $\{g_\alpha\}$ in $H(f)$ implies pointwise convergence;
(vii) if $f \in P^c(\mathbb{R}^d)$, then the representation $(U_t)$ and each $g \in H(f)$ are continuous;
(viii) if $f \in P^m(\mathbb{R}^d)$, then each $g \in H(f)$ is measurable and the representation $(U_t)$ is weakly measurable.

Proof. Let $g, h \in T(f)$ and choose finitely supported measures $\mu = \sum_{i=1}^n a_i \delta_{x_i}$ and $\nu = \sum_{j=1}^m b_j \delta_{y_j}$ such that $g = \mu * f$ and $h = \nu * f$. We write
\[ (g, h) := \mu * \tilde{\nu} * f(0) = \sum_{i=1}^n \sum_{j=1}^m f(y_j - x_i)\, a_i \overline{b_j}. \]
Since
\[ (g, h) = \sum_{j=1}^m g(y_j) \overline{b_j} = \sum_{i=1}^n \overline{h(x_i)}\, a_i, \]
the definition of $(g, h)$ does not depend on the particular choice of $\mu$ and $\nu$. Using Lemma 1.4.3 and Lemma 1.4.8 we see that $(\cdot, \cdot)$ is a positive semidefinite inner product on $T(f)$. It follows from the definition of $(\cdot, \cdot)$ that
\[ g(t) = (g, U_t f), \qquad t \in T,\ g \in T(f). \tag{1} \]
Moreover, the translation operators $E_t$ are isometric on $T(f)$:
\[ (E_t g, E_t g) = (g, g), \qquad g \in T(f). \]
Using equation (1), together with the Cauchy–Schwarz inequality, we obtain
\[ |g(t)|^2 = |(g, E_t f)|^2 \le (g, g)(E_t f, E_t f) = (g, g)(f, f). \tag{2} \]
From this we conclude that $(g, g) = 0$ if and only if $g = 0$, so $T(f)$ is a pre-Hilbert space. Moreover, every function $g \in T(f)$ is bounded.

Let $\{g_n\}$ be a Cauchy sequence in $T(f)$. It follows from (2) that $\{g_n\}$, as a sequence of functions on $T$, is a Cauchy sequence in the metric of uniform convergence on $T$. Consequently, there exists a complex-valued function $g$ on $T$ with
\[ \lim_{n \to \infty} g_n(t) = g(t), \qquad t \in T, \tag{3} \]
where the convergence is uniform. Let $H(f)$ be the completion of $T(f)$ constructed by means of functions on $T$. These functions are obviously bounded. It follows easily from (3) and from the fact that $E_t$ is isometric on $T(f)$ that $H(f)$ is invariant under $E_t$ and $(E_t g, E_t g) = (g, g)$ holds for all $g \in H(f)$. Denote by $U_t$ the restriction of $E_t$ to $H(f)$. Since $U_t U_{-t} = U_{-t} U_t = I$ (the identity operator), the operator $U_t$ is invertible and hence unitary. It is easy to check that $(U_t)$ is a unitary representation of $T$ in $H(f)$. The linear space spanned by the set $\{U_t f : t \in T\}$ is equal to $T(f)$ and hence it is dense in $H(f)$. Since $g \mapsto (g, U_t f)$ is continuous on $H(f)$ and $T(f)$ is dense in $H(f)$, equation (1) holds for all $g \in H(f)$. Using the Cauchy–Schwarz inequality again, we obtain
\[ |g_n(t) - g(t)|^2 = |(g_n - g, U_t f)|^2 \le \|g_n - g\|^2 \|f\|^2. \]
Thus, $g_n \to g$ uniformly on $T$ whenever $\{g_n\}$ converges to $g$ in $H(f)$. Property (vi) follows immediately from the equation $g_\alpha(t) = (g_\alpha, U_t f)$.

To prove (vii), let $g, h \in H(f)$ be arbitrary and choose $\mu_n, \nu_n \in M_f(T)$ such that $\lim_n \mu_n * f = g$ and $\lim_n \nu_n * f = h$, the convergence being in $H(f)$. Then the function $t \mapsto (g, U_t h)$ is the uniform limit of the functions
\[ t \mapsto (\mu_n * f, U_t\, \nu_n * f) = \tilde{\nu}_n * \mu_n * f(t), \]
which are continuous. Thus, the representation $(U_t)$ is continuous. The continuity of the functions in $H(f)$ now follows from (iv). The last statement can be proved using the same arguments.

Throughout the rest of this section $(U_t)$ denotes the unitary representation of $T$ in $H(f)$ constructed in the previous theorem. If $f = C$ is the correlation function of a stationary process $Z$, then $(U_t)$ is called the canonical unitary representation of $Z$ in $H(C)$.

Theorem 2.11.2. Let $f \in P(T)$ and let $H_1$ and $H_2$ be $(U_t)$-invariant subspaces of $H(f)$ such that $H(f) = H_1 \oplus H_2$. Then the orthogonal projection $f_i$ of $f$ onto $H_i$ $(i = 1, 2)$ is a positive definite function. Moreover, the equation
\[ g_i(t) = (g_i, U_t f_i) \]
holds for all $t \in T$ and $g_i \in H_i$ $(i = 1, 2)$.

Proof. Using that $H_1 \perp H_2$ and the fact that $H_i$ is $(U_t)$-invariant, we obtain
\[ g_i(t) = (g_i, U_t f) = (g_i, U_t(f_1 + f_2)) = (g_i, U_t f_i), \qquad g_i \in H_i. \]
In particular, $f_i(t) = (f_i, U_t f_i)$ and hence $f_i$ is positive definite in view of Lemma 2.10.9.
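The inner product on $T(f)$ from the proof of Theorem 2.11.1 can be written out concretely. The following sketch assumes $T = \mathbb{R}$ and the Gaussian $f(t) = e^{-t^2}$ as positive definite function; the measures `mu`, `nu` and the helper names are hypothetical. It checks the formula $(g, h) = \sum_j g(y_j)\overline{b_j}$, the reproducing identity $g(t) = (g, U_t f)$ and the bound $|g(t)|^2 \le (g, g)(f, f)$.

```python
import math


def f(t):
    """Assumed positive definite function on R (a Gaussian)."""
    return math.exp(-t * t)


# A finitely supported measure is a list of (point, complex weight) pairs.
mu = [(0.0, 1.0 + 0.0j), (0.7, -0.5j), (-1.3, 2.0 + 1.0j)]   # hypothetical
nu = [(0.4, 1.0j), (2.0, -1.5 + 0.0j)]                        # hypothetical


def conv(measure, t):
    """(measure * f)(t) = sum_i a_i f(t - x_i)."""
    return sum(a * f(t - x) for x, a in measure)


def inner(m1, m2):
    """(m1 * f, m2 * f) := sum_{i,j} f(y_j - x_i) a_i conj(b_j)."""
    return sum(a * b.conjugate() * f(y - x) for x, a in m1 for y, b in m2)


def translate(measure, t):
    """delta_t * measure, so that U_t (measure * f) = translate(measure, t) * f."""
    return [(x + t, a) for x, a in measure]


delta0 = [(0.0, 1.0 + 0.0j)]      # delta_0 * f = f

t = 0.9
g_t = conv(mu, t)                               # g(t) for g = mu * f
reproduced = inner(mu, translate(delta0, t))    # (g, U_t f)
alt_inner = sum(conv(mu, y) * b.conjugate() for y, b in nu)
norm_sq = inner(mu, mu).real                    # (g, g)
bound_ok = abs(g_t) ** 2 <= norm_sq * f(0.0) + 1e-12

print(abs(g_t - reproduced), abs(inner(mu, nu) - alt_inner), bound_ok)
```

Here $(f, f) = f(0) = 1$, so the bound reduces to $|g(t)|^2 \le (g, g)$; any other bounded positive definite function could be substituted for the Gaussian.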
Theorem 2.11.3. If $f = f_1 + f_2$, where $f, f_1, f_2 \in P(T)$, then $f_1, f_2 \in H(f)$ and there exists a nonnegative operator $Q$ on $H(f)$ such that $\|Q\| \le 1$ and
\[ Q U_t = U_t Q, \tag{1} \]
\[ f_1(t) = (Q f, U_t f), \tag{2} \]
\[ f_2(t) = ((I - Q) f, U_t f) \tag{3} \]
for all $t \in T$. Conversely, if $f \in P(T)$ and an operator $Q$ on $H(f)$ has the properties above, then the functions $f_1$ and $f_2$ defined by (2) and (3) are positive definite and $f = f_1 + f_2$.

Proof. We define in $T(f)$ a new positive semidefinite inner product $(\cdot, \cdot)_1$ by setting
\[ (\mu * f, \nu * f)_1 := \mu * \tilde{\nu} * f_1(0), \qquad \mu, \nu \in M_f(T). \tag{4} \]
Since $f_1$ and $f - f_1$ are positive definite, we have
\[ 0 \le (g, g)_1 \le (g, g), \qquad g \in T(f). \]
From Theorem D.3.5 we conclude that the inner product $(\cdot, \cdot)_1$ can be extended to a positive semidefinite inner product on $H(f)$, which we also denote by $(\cdot, \cdot)_1$. By the second part of Theorem D.3.5 there exists a nonnegative linear operator $Q$ on $H(f)$ such that $\|Q\| \le 1$ and
\[ (g, h)_1 = (Q g, h), \qquad g, h \in H(f). \]
From equation (4) we see that $(U_t g, h)_1 = (g, U_{-t} h)_1$ holds for all $g, h \in H(f)$ and $t \in T$. Using this, we obtain
\[ (U_t Q g, h) = (Q g, U_{-t} h) = (g, U_{-t} h)_1 = (U_t g, h)_1 = (Q U_t g, h). \]
This being true for all $g, h \in H(f)$, we must have $U_t Q = Q U_t$, $t \in T$. Taking into account the definition of $(\cdot, \cdot)_1$, we obtain
\[ f_1(t) = (f, U_t f)_1 = (Q f, U_t f) = (Q f)(t) \]
and
\[ f_2(t) = f(t) - f_1(t) = ((I - Q) f, U_t f) = ((I - Q) f)(t). \]
Therefore, $f_1 = Q f \in H(f)$ and $f_2 = (I - Q) f \in H(f)$.

Suppose now that $Q$ is a nonnegative operator on $H(f)$ with $\|Q\| \le 1$ and $Q U_t = U_t Q$, $t \in T$. Then the operator $I - Q$ is nonnegative as well. Setting $Q_1 := \sqrt{Q}$ and $Q_2 := \sqrt{I - Q}$, the operators $Q_1$ and $Q_2$ commute with $U_t$ for every $t \in T$ (cf. Theorem D.3.7). Thus, we have
\[ f_1(t) = (Q f, U_t f) = (Q_1 f, U_t Q_1 f) \]
and
\[ f_2(t) = ((I - Q) f, U_t f) = (Q_2 f, U_t Q_2 f). \]
It is obvious that $f = f_1 + f_2$. It follows from Lemma 2.10.9 that $f_1$ and $f_2$ are positive definite.

Corollary 2.11.4. Let $f_1, f_2$ be positive definite functions on $\mathbb{R}^d$. If $f_1 + f_2$ is continuous, then $f_1$ and $f_2$ are also continuous. If $f_1 + f_2$ is measurable (vanishes $\lambda$-almost everywhere), then $f_1$ and $f_2$ are measurable (vanish $\lambda$-almost everywhere, respectively).

Proof. Set $f := f_1 + f_2$. By the previous theorem $f_1, f_2 \in H(f)$ and therefore the corollary follows from the last two statements of Theorem 2.11.1.

Theorem 2.11.5. Let $f \in P^m(\mathbb{R}^d)$, $h \in L^1(\mathbb{R}^d)$ and define the mapping $A_h$ by $A_h g := h * g$, $g \in H(f)$.$^{31}$ Then
(i) $A_h$ is a bounded linear operator from $H(f)$ into $H(f)$;
(ii) if $H_0$ is a closed $(U_t)$-invariant subspace of $H(f)$, then $H_0$ is invariant under $A_h$;
(iii) $(A_h)^* = A_{\tilde{h}}$.
Proof. By Theorem 2.11.1, the function $t \mapsto (U_t g_1, g_2)$ is measurable for all $g_1, g_2 \in H(f)$. We define the sesquilinear functional $B_h$ by
\[ B_h(g_1, g_2) := \int_{\mathbb{R}^d} (U_t g_1, g_2)\, h(t) \, dt, \qquad g_1, g_2 \in H(f). \]
Since $|(U_t g_1, g_2)| \le \|g_1\| \, \|g_2\|$, we have
\[ |B_h(g_1, g_2)| \le \|g_1\| \, \|g_2\| \int_{\mathbb{R}^d} |h(t)| \, dt. \]
Consequently, by the second part of Theorem D.3.5, there exists a bounded linear operator $C_h : H(f) \to H(f)$ for which $B_h(g_1, g_2) = (C_h g_1, g_2)$ for all $g_1, g_2 \in H(f)$. Using the definition of $B_h$ we obtain
\[ (C_h g)(t) = (C_h g, U_t f) = \int_{\mathbb{R}^d} (U_s g, U_t f)\, h(s) \, ds = \int_{\mathbb{R}^d} g(t - s)\, h(s) \, ds = h * g(t) \]
and so $A_h g = C_h g \in H(f)$ if $g \in H(f)$.

$^{31}$ Note that $h * g$ is a continuous function because $g$ is bounded and measurable by Theorem 2.11.1.
Let $H_0$ be a closed $(U_t)$-invariant subspace of $H(f)$. If $g_1 \in H_0$, then
\[ (A_h g_1, g_2) = \int_{\mathbb{R}^d} (U_t g_1, g_2)\, h(t) \, dt = 0 \]
holds for every $g_2 \in H_0^\perp$. Thus, $A_h g_1 \in H_0^{\perp\perp} = H_0$, i.e., $H_0$ is invariant under $A_h$. We now compute the adjoint $A_h^*$ of $A_h$:
\[
\begin{aligned}
(A_h^* g)(x) &= (A_h^* g, U_x f) = (g, A_h U_x f) = \overline{(A_h U_x f, g)} \\
&= \overline{\int_{\mathbb{R}^d} (U_{x+y} f, g)\, h(y) \, dy} \\
&= \int_{\mathbb{R}^d} g(x + y)\, \overline{h(y)} \, dy \\
&= (\tilde{h} * g)(x) = (A_{\tilde{h}}\, g)(x).
\end{aligned}
\]
Thus, $A_h^* = A_{\tilde{h}}$.

Corollary 2.11.6. For all $f \in P^m(\mathbb{R}^d)$ we have:
(i) $(h * g_1, g_2) = (g_1, \tilde{h} * g_2)$, $g_1, g_2 \in H(f)$, $h \in L^1(\mathbb{R}^d)$;
(ii) $\displaystyle \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(y - x)\, h(x) \overline{h(y)} \, dx \, dy \ge 0$, $h \in L^1(\mathbb{R}^d)$.$^{32}$

Proof. Property (i) is a consequence of (2.11.5.iii) while (ii) is proved as follows:
\[ \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(y - x)\, h(x) \overline{h(y)} \, dx \, dy = \tilde{h} * h * f(0) = (\tilde{h} * h * f, f) = (h * f, h * f) \ge 0. \]

$^{32}$ See also Lemma 1.5.8 and Corollary 2.4.8.

We are now able to prove a decomposition theorem for measurable positive definite functions.

Theorem 2.11.7. Let $f \in P^m(\mathbb{R}^d)$. Then $f = f_c + f_0$ where $f_c \in P^c(\mathbb{R}^d)$, $f_0 \in P^m(\mathbb{R}^d)$ and $f_0 = 0$ $\lambda$-almost everywhere.

Proof. Denote by $H_c$ the set of all continuous functions in $H(f)$ and by $H_0$ the orthogonal complement of $H_c$. Then $H_c$ and $H_0$ are closed $(U_t)$-invariant subspaces of $H(f)$ and
\[ H(f) = H_c \oplus H_0. \]
By Theorem 2.11.2 the function $f$ can be decomposed as
\[ f = f_c + f_0, \qquad f_c \in P^c(\mathbb{R}^d) \cap H_c,\ f_0 \in P(\mathbb{R}^d) \cap H_0. \]
It remains to prove that $f_0 = 0$ $\lambda$-almost everywhere. Theorem 2.11.5 shows that $h * f_0 \in H_0$ for all $h \in L^1(\mathbb{R}^d)$. But $h * f_0$ is continuous, i.e., $h * f_0 \in H_c$, and therefore we must have $h * f_0 = 0$. This being true for all $h \in L^1(\mathbb{R}^d)$, we obtain $f_0 = 0$ $\lambda$-almost everywhere.

Theorem 2.11.8. If $f \in P(\mathbb{R}^d)$ is Lebesgue measurable on a neighborhood $V$ of $0$, then $f$ is Lebesgue measurable on $\mathbb{R}^d$.

Proof. Let $B$ be an open ball with center $0$ such that $B + B \subset V$ and denote by $H_B$ the subspace of $H(f)$ spanned by the functions $U_t f$, $t \in B$. As $f$ is measurable on $V$ and $B + B \subset V$, the function $U_t f$ is measurable on $B$ for all $t \in B$. Since convergence in $H(f)$ implies uniform convergence (cf. Theorem 2.11.1), every function in $H_B$ is measurable on $B$. If $g^\perp \in H_B^\perp$, then $g^\perp(t) = (g^\perp, U_t f) = 0$, $t \in B$. Applying the decomposition $H(f) = H_B \oplus H_B^\perp$ we see that every function $h \in H(f)$ can be written as
\[ h = g + g^\perp \]
where $g$ is measurable on $B$ and $g^\perp$ is identically zero on $B$. Thus, $h$ is measurable on $B$. In particular, $U_t f$ is measurable on $B$ for all $t \in \mathbb{R}^d$. From this it follows that $f$ is measurable on $\mathbb{R}^d$.
Chapter 3 Special properties
3.1 Strict positive definiteness

In several applications, for example in interpolation problems, it is more advantageous to use strictly positive definite functions.$^{1}$ The aim of the present section is to collect basic properties of these functions and to give necessary and sufficient conditions for a function to be strictly positive definite.$^{2}$

$^{1}$ We refer to [58] for an introduction to multivariate scattered data approximation.
$^{2}$ More specific results and references can be found in [23].

Definition 3.1.1. A function $f : \mathbb{R}^d \to \mathbb{C}$ is said to be strictly positive definite if the matrix
\[ A = \bigl( f(t_i - t_j) \bigr)_{i,j=1}^n \]
is positive definite for every positive integer $n$ and for all pairwise distinct $t_1, \dots, t_n \in \mathbb{R}^d$.

We denote by $SP(\mathbb{R}^d)$ the set of strictly positive definite functions on $\mathbb{R}^d$ while $SP^c(\mathbb{R}^d)$ is the set of all continuous functions in $SP(\mathbb{R}^d)$. It is easy to check that $SP(\mathbb{R}^d) \subset P(\mathbb{R}^d)$. The next lemma follows immediately from the definition. We omit the proof.

Lemma 3.1.2. Let $f$ be a complex-valued function on $\mathbb{R}^d$. Then $f \in SP(\mathbb{R}^d)$ if and only if the inequality
\[ \sum_{i,j=1}^n f(t_i - t_j)\, c_i \overline{c_j} > 0 \]
holds for every positive integer $n$, for all pairwise distinct $t_1, \dots, t_n \in \mathbb{R}^d$, and for all $c \in \mathbb{C}^n \setminus \{0\}$.

Theorem 3.1.3. Let $f \in SP(\mathbb{R}^d)$ and $g \in P(\mathbb{R}^d)$. We have
(i) $f(0) > 0$ and $|f(t)| < f(0)$ for all $t \ne 0$;
(ii) $\overline{f}, f + g \in SP(\mathbb{R}^d)$. In particular, $\operatorname{Re} f \in SP(\mathbb{R}^d)$;
(iii) if $r \in \mathbb{R} \setminus \{0\}$, then $f(r\,\cdot) \in SP(\mathbb{R}^d)$;
(iv) if $g \ne 0$, then $fg \in SP(\mathbb{R}^d)$. In particular, $|f|^2 \in SP(\mathbb{R}^d)$.
Proof. Setting $n = 1$ and $c = 1$ in Lemma 3.1.2 shows that $f(0) > 0$. Taking $n = 2$, $t_1 = x$, $t_2 = 0$ in Definition 3.1.1 and using that a positive definite $2 \times 2$ matrix has positive determinant yields
\[ f(0)^2 - f(x) f(-x) = f(0)^2 - |f(x)|^2 > 0 \]
whenever $x \ne 0$. (iv) is a consequence of Schur's Theorem D.2.12 while the remaining statements follow immediately from the definition of strict positive definiteness.

Recall that (complex) trigonometric polynomials on $\mathbb{R}^d$ are functions $P$ of the form
\[ P(t) = \sum_{j=1}^n c_j e^{i(t, x_j)}, \qquad t \in \mathbb{R}^d, \]
where $n \in \mathbb{N}$, $c \in \mathbb{C}^n$ and $x_j \in \mathbb{R}^d$.

Theorem 3.1.4. Let $f \in P^c(\mathbb{R}^d)$ and let $\mu$ be the corresponding spectral measure. Then $f \in SP^c(\mathbb{R}^d)$ if and only if there is no trigonometric polynomial $P \ne 0$ on $\mathbb{R}^d$ which vanishes on the support of $\mu$.

Proof. Let $P \ne 0$ be a trigonometric polynomial on $\mathbb{R}^d$. We write $P$ in the form
\[ P(t) = \sum_{j=1}^n c_j e^{i(t, x_j)}, \qquad t \in \mathbb{R}^d, \]
where $c \in \mathbb{C}^n \setminus \{0\}$ and the $x_j$'s are pairwise distinct. The proof of inequality (1.1.2.v) shows that
\[ \sum_{i,j=1}^n f(x_i - x_j)\, c_i \overline{c_j} = \int_{\mathbb{R}^d} |P(t)|^2 \, d\mu(t). \]
Since $|P|^2$ is continuous and nonnegative, the integral on the right-hand side is zero if and only if $P$ vanishes on the support of $\mu$. The theorem follows immediately from this observation.

The next two theorems provide simple sufficient conditions to ensure strict positive definiteness on $\mathbb{R}^d$.

Theorem 3.1.5. If $f \in P^c(\mathbb{R}^d)$, $d \ge 2$, is radial and not constant, then $f$ is strictly positive definite.

Proof. By Lemma 3.6.3 the spectral measure $\mu$ of $f$ is radial. Since $f$ is not constant, the support of $\mu$ contains a sphere of positive radius centered at the origin. Therefore, Theorem C.10.3 shows that there is no trigonometric polynomial $P \ne 0$ vanishing on the support of $\mu$. From Theorem 3.1.4 we infer that $f$ is strictly positive definite.
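The criterion of Theorem 3.1.4 can be illustrated numerically for $d = 1$. The sketch below is an assumed example, not taken from the text: it compares $\cos t$, whose spectral measure is $\tfrac12(\delta_1 + \delta_{-1})$, with the Gaussian $e^{-t^2}$, whose spectral measure has full support. The points and coefficients are chosen so that the trigonometric polynomial $P(t) = 1 + 2e^{i\pi t} + e^{2i\pi t}$ vanishes at $t = \pm 1$.

```python
import math


def quadratic_form(f, points, coeffs):
    """Return sum_{i,j} f(x_i - x_j) c_i conj(c_j)."""
    total = 0j
    for xi, ci in zip(points, coeffs):
        for xj, cj in zip(points, coeffs):
            total += f(xi - xj) * ci * complex(cj).conjugate()
    return total


points = [0.0, math.pi, 2.0 * math.pi]   # pairwise distinct x_j
coeffs = [1.0, 2.0, 1.0]                 # c_j, so that P(1) = P(-1) = 0

# cos is positive definite but not strictly: the form vanishes for c != 0.
q_cos = quadratic_form(math.cos, points, coeffs)

# The Gaussian is strictly positive definite: the form is positive.
q_gauss = quadratic_form(lambda t: math.exp(-t * t), points, coeffs)

print(abs(q_cos), q_gauss.real)
```

The vanishing of the quadratic form for $\cos$ reflects exactly the identity $\sum_{i,j} f(x_i - x_j) c_i \overline{c_j} = \int |P|^2 \, d\mu$ with $P = 0$ on the support of $\mu$.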
Theorem 3.1.6. Let $f \in P^c(\mathbb{R}^d)$ and let $\mu$ be the corresponding spectral measure. If the support of $\mu$ contains a nonempty open set, then $f$ is strictly positive definite.

Proof. The statement follows by the same argument as in the proof of Theorem 3.1.5. Instead of Theorem C.10.3 we apply Theorem C.10.2.

We now turn to the special case $d = 1$.$^{3}$

Proposition 3.1.7. Let $f \in P^c(\mathbb{R})$ and let $\mu$ be the corresponding spectral measure. If the support of $\mu$ has accumulation points, then $f$ is strictly positive definite.

Proof. If a trigonometric polynomial vanishes on the support of $\mu$ then, being analytic, it vanishes on $\mathbb{R}$. Thus, the statement follows from Theorem 3.1.4.

Corollary 3.1.8. If the support of $\mu$ is not countable, then $f$ is strictly positive definite. In particular, if $\mu$ is not purely discrete, then $f \in SP^c(\mathbb{R})$.

Corollary 3.1.9. If $f \in P^c(\mathbb{R})$ and
\[ \limsup_{x \to \infty} |f(x)| < f(0), \]
then $f$ is strictly positive definite.

Proof. Theorem 1.8.5 with $s = 0$ shows that the spectral measure of $f$ cannot be purely discrete, so we can apply Corollary 3.1.8.

$^{3}$ For special results in the one-dimensional case we refer to [12].

3.2 Infinitely differentiable and rapidly decreasing functions

In this section we construct positive definite functions which are rapidly decreasing (cf. Definition 3.2.6). These functions can be useful, for example, as technical tools in proofs. We also consider the restriction of the Fourier transformation to the Schwartz space $S(\mathbb{R}^d)$ of all rapidly decreasing functions (see Theorem 3.2.8).

3.2.1. Recall that $C^\infty(\mathbb{R}^d)$ denotes the set of infinitely differentiable functions on $\mathbb{R}^d$ while $C^\infty_{00}(\mathbb{R}^d)$ stands for the set of functions in $C^\infty(\mathbb{R}^d)$ which have compact support. It is easy to check that the function
\[ g(t) := \begin{cases} e^{-1/t} & \text{if } t > 0, \\ 0 & \text{if } t \le 0 \end{cases} \tag{1} \]
(see Figure 3.1) is infinitely differentiable on $\mathbb{R}$.

Figure 3.1. The function g from equation (3.2.1.1).

Since $x \mapsto \|x\|^2$ is infinitely differentiable on $\mathbb{R}^d$ we see that the function
\[ h(x) := g(1 - \|x\|^2), \qquad x \in \mathbb{R}^d \tag{2} \]
(see Figure 3.2) belongs to $C^\infty_{00}(\mathbb{R}^d)$. Moreover, $h$ is radial and its support is equal to the closed unit ball $B^c(1)$. The support of
\[ f := h * \tilde{h} \tag{3} \]
is the ball $B^c(2)$. In addition, $f$ is positive definite, infinitely differentiable and radial (cf. Lemma 3.6.5).

Theorem 3.2.2. A set $S \subset \mathbb{R}^d$ has the form
\[ S = \{ x \in \mathbb{R}^d : f(x) \ne 0 \} \]
for some infinitely differentiable positive definite function $f \ne 0$ if and only if
(i) $0 \in S$;
(ii) $S$ is open;
(iii) $S = -S$.

Proof. If $f \in P^c(\mathbb{R}^d)$ and $f \ne 0$, then the set $S := \{ x \in \mathbb{R}^d : f(x) \ne 0 \}$ obviously has the properties (i)–(iii).
Figure 3.2. The function h from equation (3.2.1.2) with d = 1.

Assume that $S \subset \mathbb{R}^d$ is such that (i)–(iii) hold and let $s_1, s_2, \dots$ be all points of $S$ having rational coordinates. For each $j \in \mathbb{N}$ choose $r_j > 0$ such that
\[ B^c(r_j) \cup [B^c(r_j) + s_j] \cup [B^c(r_j) - s_j] \subset S. \tag{1} \]
Let $h \in C^\infty(\mathbb{R}^d) \cap P^c(\mathbb{R}^d)$ be such that $h \ge 0$ and $\operatorname{supp} h = B^c(1)$. We define the function $g_j$, $j \in \mathbb{N}$, by
\[ g_j(x) = 2h(x/r_j) + h((x + s_j)/r_j) + h((x - s_j)/r_j), \qquad x \in \mathbb{R}^d. \tag{2} \]
By Lemma 1.4.18, the function $g_j$ is positive definite. In addition, $g_j$ is nonnegative, $g_j \in C^\infty(\mathbb{R}^d)$ and
\[ \operatorname{supp} g_j = B^c(r_j) \cup [B^c(r_j) + s_j] \cup [B^c(r_j) - s_j] \subset S. \]
Using equation (2) and Theorem 1.2.1 we see that
\[ |D^\alpha g_j| \le \frac{4 A_\alpha}{r_j^{|\alpha|}}, \qquad \alpha \in \mathbb{N}_0^d, \]
where $A_\alpha$ denotes the absolute moment corresponding to $h$. Using this, Lemma B.1.16 shows the existence of a sequence $\{p_k\}_1^\infty$ of positive real numbers such that the series
\[ \sum_{j=1}^\infty p_j |D^\alpha g_j| \]
converges for all $\alpha \in \mathbb{N}_0^d$. Setting
\[ f := \sum_{j=1}^\infty p_j g_j \]
we have $f \in P^c(\mathbb{R}^d)$. Theorem B.2.10 shows that $f \in C^\infty(\mathbb{R}^d)$. Noting that $f(s_j) \ge p_j g_j(s_j) > 0$, the relation
\[ \{ x \in \mathbb{R}^d : f(x) \ne 0 \} = S \]
follows from equation (1) and from the fact that for each $s \in S$ there exists a subsequence $\{s_{k_j}\}$ converging to $s$.

The next three lemmata are often useful as technical tools in proofs.

Lemma 3.2.3. There exists a sequence $\{g_n\}_1^\infty$ of radial positive definite functions $g_n \in C^\infty_{00}(\mathbb{R}^d)$ of the form $g_n = h_n * \tilde{h}_n$, where $h_n \in C^\infty_{00}(\mathbb{R}^d)$, such that $g_n(0) = 1$ and $\{g_n\}_1^\infty$ tends to $1$ uniformly on compact sets.

Proof. This lemma can be proved in the same way as Lemma 1.5.6 by using the function $f$ from (3.2.1.3).

Lemma 3.2.4. There exists a sequence $\{f_n\}$ of radial positive definite functions $f_n \in C^\infty_{00}(\mathbb{R}^d)$ such that
(i) $\hat{f}_n \to 1$ uniformly on compact sets;
(ii) $\int_{\mathbb{R}^d} f_n h \, d\lambda \to h(0)$ for every continuous function $h$.

Proof. Consider the positive definite function $f := h * \tilde{h}$ where $h$ is the function defined in (3.2.1.2). It is clear that $f$ is radial and $f \in C^\infty_{00}(\mathbb{R}^d)$. Scaling and normalizing this function we obtain a sequence $\{f_n\}$ of radial positive definite functions such that $f_n$ is a density and the support of $f_n$ is contained in $B^c(1/n)$. It is a routine exercise to show that (ii) holds. Replacing $h$ in (ii) by the characters of $\mathbb{R}^d$, we see that $\hat{f}_n \to 1$ pointwise. The uniform convergence follows from Theorem 1.5.11.

Lemma 3.2.5. For all $\varepsilon, \delta \in (0, 1)$ there exists a radial positive definite function $p \in C^\infty_{00}(\mathbb{R}^d)$ such that $p$ is a density and
\[ 0 \le \hat{p}(t) < \delta, \qquad t \in \mathbb{R}^d \setminus B^o(\varepsilon). \]

Proof. Let $f \in C^\infty_{00}(\mathbb{R}^d)$ be an arbitrary radial positive definite function which is a density (cf. the proof of Lemma 3.2.4). Then $0 \le \hat{f}(t) \le 1 = \hat{f}(0)$ for all $t$. By the Riemann–Lebesgue Lemma 1.8.4, the function $\hat{f}$ is in $C_0(\mathbb{R}^d)$. From Corollary 1.4.13 we conclude that $0 \le \hat{f}(t) < 1$, $t \ne 0$. Consequently,
\[ (\hat{f}(t))^n < \delta, \qquad t \in \mathbb{R}^d \setminus B^o(\varepsilon) \]
for some $n \in \mathbb{N}$. The function $p := f * \dots * f$ ($n$-fold convolution) has the desired properties.
Definition 3.2.6. A function $f \in C^\infty(\mathbb{R}^d)$ is called rapidly decreasing if
\[ \|f\|_{\alpha,\beta} := \sup_{x \in \mathbb{R}^d} |x^\alpha D^\beta f(x)| < \infty \]
for all $\alpha, \beta \in \mathbb{N}_0^d$. We denote by $S(\mathbb{R}^d)$ the linear space of all rapidly decreasing functions$^{4}$ on $\mathbb{R}^d$.

If $g$ is a complex-valued function on $\mathbb{R}^d$, then we write$^{5}$
\[ M_\alpha g(x) := x^\alpha g(x), \qquad x \in \mathbb{R}^d, \]
so that
\[ \|f\|_{\alpha,\beta} = \|M_\alpha D^\beta f\|_\infty. \]
It is clear that $C^\infty_{00}(\mathbb{R}^d) \subset S(\mathbb{R}^d)$. Another example of a rapidly decreasing function is given by
\[ f(x) = e^{-\|x\|^2}, \qquad x \in \mathbb{R}^d. \]
The next lemma is simple; we omit the proof.

$^{4}$ The function space $S(\mathbb{R}^d)$ is usually called Schwartz space.
$^{5}$ Note that the symbol $M_\alpha$ is also used to denote moments of measures.

Lemma 3.2.7. If $f \in S(\mathbb{R}^d)$, then also $M_\alpha D^\beta f \in S(\mathbb{R}^d)$ for all $\alpha, \beta \in \mathbb{N}_0^d$.

Theorem 3.2.8. The Fourier transformation maps $S(\mathbb{R}^d)$ onto $S(\mathbb{R}^d)$ and
\[ M_\beta D^\alpha \hat{f} = (-1)^{|\alpha|} (D^\beta M_\alpha f)^{\wedge}, \qquad f \in S(\mathbb{R}^d). \tag{1} \]
Moreover, if $f_n, f \in S(\mathbb{R}^d)$ and
\[ \lim_{n \to \infty} \|f_n - f\|_{\alpha,\beta} = 0 \]
for all $\alpha, \beta \in \mathbb{N}_0^d$, then
\[ \lim_{n \to \infty} \|\hat{f}_n - \hat{f}\|_{\alpha,\beta} = 0 \]
for all $\alpha, \beta \in \mathbb{N}_0^d$.

Proof. Equation (1) is a consequence of Theorem 1.2.1 and Theorem 1.2.9. Let $Q$ be a positive polynomial such that$^{6}$
\[ \int_{\mathbb{R}^d} \frac{1}{Q} \, d\lambda = 1. \]
If $g \in S(\mathbb{R}^d)$ we have
\[ (2\pi)^{d/2} |\hat{g}| \le \int_{\mathbb{R}^d} |g| \, d\lambda = \int_{\mathbb{R}^d} \frac{|gQ|}{Q} \, d\lambda \le \|gQ\|_\infty < \infty. \]
Replacing here $g$ by $D^\beta M_\alpha f$ and using equation (1) we see that $\hat{f} \in S(\mathbb{R}^d)$ and that the second statement of the theorem is true. In the same way, we see that $\check{f} \in S(\mathbb{R}^d)$ whenever $f \in S(\mathbb{R}^d)$. Using this, the fact that the Fourier transformation maps $S(\mathbb{R}^d)$ onto $S(\mathbb{R}^d)$ follows from the relation
\[ f = (\check{f})^{\wedge}, \qquad f \in S(\mathbb{R}^d). \]
The proof is complete.

$^{6}$ We can take for example $Q(x) = c(1 + \|x\|^2)^d$ with a suitable constant $c > 0$.
3.3 Analytic characteristic functions of one variable
Definition 3.3.1. A complex-valued function $f$ on $\mathbb{R}$ is said to be analytic if it is infinitely often differentiable and its Taylor series at $0$ has positive convergence radius. If the convergence radius is equal to infinity, then $f$ is said to be an entire function.

We note that $f$ is analytic if and only if there exists a complex-valued function $\varphi$ of one complex variable, which is holomorphic in an open disk around $0$, and $f(t) = \varphi(t)$ for all real $t$ in this disk. If $\varphi$ is holomorphic in the whole complex plane and $f(t) = \varphi(t)$ for all $t \in \mathbb{R}$, then $f$ is entire.

Characteristic functions that are not differentiable at $0$, like $e^{-|t|}$ or $\max(1 - |t|, 0)$, are not analytic. The characteristic functions $e^{-t^2}$ and $\cos t$ are entire, while $\frac{1}{1 - it}$ and $\frac{1}{1 + t^2}$ are analytic but not entire.

3.3.2. We have seen in Section 1.2 that the existence of derivatives of a characteristic function $f$ is closely related to the existence of moments of the corresponding distribution $\mu$. We will prove that analyticity is closely related to finiteness of integrals of the form $\int e^{rs} \, d\mu(s)$ for certain $r \in \mathbb{R}$. We introduce the numbers
\[ s^+ = s_f^+ = \sup \Bigl\{ r \in \mathbb{R} : \int_{-\infty}^\infty e^{-rx} \, d\mu(x) < \infty \Bigr\}, \]
\[ s^- = s_f^- = \inf \Bigl\{ r \in \mathbb{R} : \int_{-\infty}^\infty e^{-rx} \, d\mu(x) < \infty \Bigr\} \]
and the set
\[ S = S_f = \{ z \in \mathbb{C} : s^- < \operatorname{Im} z < s^+ \}. \]
Note that $s^- \le 0 \le s^+$ and hence $S$ is empty if and only if $s^- = s^+ = 0$.
If the support of $\mu$ is bounded, then $S = \mathbb{C}$. If $\mu$ is the exponential distribution with parameter $\alpha$, then $f(t) = \frac{\alpha}{\alpha - it}$ and $S = \{ z \in \mathbb{C} : -\alpha < \operatorname{Im} z < \infty \}$.

Theorem 3.3.3. Let $\mu$ be a probability distribution with characteristic function $f$. If $S$ is not empty, then the integral
\[ f(z) := \int_{-\infty}^{\infty} e^{izx}\, d\mu(x) \]
exists for all $z$ in the strip $S$ and it represents there a holomorphic function. Moreover,
\[ f^{(n)}(z) = \int_{-\infty}^{\infty} (ix)^n e^{izx}\, d\mu(x), \qquad n \in \mathbb{N} \tag{1} \]
for all $z \in S$.⁷

Proof. The fact that $f(z)$ exists follows from $|e^{izx}| = e^{-\operatorname{Im} z \cdot x}$. We now proceed in a similar way as in the proof of (1.2.1.i). Let $z \in S$ be fixed and choose a positive $\delta$ such that $z + h \in S$ for all $h \in \mathbb{C}$ with $|h| < \delta$. If, in addition, $h \ne 0$ then we have
\[ \frac{f(z+h) - f(z)}{h} = \int_{-\infty}^{\infty} x\, \frac{e^{ihx} - 1}{hx}\, e^{izx}\, d\mu(x). \tag{2} \]
Theorem B.1.5 with $n = 0$ shows that the integrand times $\frac{\delta}{2}$ is dominated by
\[ \frac{\delta |x|}{2}\, e^{|hx|}\, e^{-\operatorname{Im} z \cdot x}. \]
Using that $r < e^r$, $r \in \mathbb{R}$, we see that this expression is less than $e^{\delta |x|} e^{-\operatorname{Im} z \cdot x}$ whenever $|h| < \frac{\delta}{2}$. This function is $\mu$-integrable with respect to $x$ by our choice of $\delta$. Taking the limit $h \to 0$ in (2) and applying Lebesgue's theorem on dominated convergence we obtain (1) for $n = 1$. The general case is obtained by repeating the arguments above.

Corollary 3.3.4. If the support of a distribution is bounded, then its characteristic function is entire.

Corollary 3.3.5. The characteristic function of $\mu$ is analytic if and only if
\[ s^- < 0 < s^+. \]

In the sequel, we always assume that an analytic characteristic function $f$ is defined in the strip $S$.

⁷ See also Lemma 3.4.1.
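Theorem 3.3.3 can be checked numerically on the exponential distribution mentioned above. The following is a minimal sketch, not part of the original text: it approximates the integral in (1) by the trapezoidal rule at a non-real point of the strip and compares it with the derivatives of $\alpha/(\alpha - iz)$. The function names, the truncation point `upper` and the grid size `steps` are illustrative assumptions.

```python
import cmath
import math


def strip_integral(z, alpha, n_deriv=0, upper=60.0, steps=60000):
    """Trapezoidal approximation of int_0^upper (ix)^n e^{izx} alpha e^{-alpha x} dx.

    For Im z > -alpha the integrand decays exponentially, so truncating the
    half-line at `upper` (an assumption of this sketch) loses very little.
    """
    h = upper / steps
    total = 0j
    for k in range(steps + 1):
        x = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * (1j * x) ** n_deriv * alpha * cmath.exp((1j * z - alpha) * x)
    return total * h


def closed_form(z, alpha, n_deriv=0):
    """n-th derivative of alpha / (alpha - i z) with respect to z."""
    return alpha * (1j ** n_deriv) * math.factorial(n_deriv) / (alpha - 1j * z) ** (n_deriv + 1)


if __name__ == "__main__":
    alpha = 2.0
    z = 0.7 - 1.0j  # Im z = -1 lies in the strip (-alpha, infinity)
    for n in (0, 1):
        print(n, strip_integral(z, alpha, n), closed_form(z, alpha, n))
```

Moving `z` towards the boundary $\operatorname{Im} z = -\alpha$ makes the integrand decay more slowly, so `upper` would have to grow accordingly.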
Section 3.3 Analytic characteristic functions of one variable
Theorem 3.3.6. Let $f$ be an analytic characteristic function.

(i) If $g$ is an analytic characteristic function and $f$ and $g$ coincide on an interval $(-a, a) \subset \mathbb{R}$, then $f = g$ on $\mathbb{R}$.

(ii) If $g$ is an entire function and $f$ and $g$ coincide on an interval $(-a, a) \subset \mathbb{R}$, then $f = g$ on $\mathbb{R}$.⁸

(iii) If two probability distributions have analytic characteristic functions and the same moments, then they are equal.

⁸ See also Theorem 4.2.3.

Proof. (i) It follows from Theorem 3.3.3 that $f$ and $g$ have holomorphic extensions to the strip $S := S_f \cap S_g$. Since $S$ is connected and $f = g$ on $(-a, a)$, the extensions are equal on $S$ (cf. Theorem C.1.8). This implies that the corresponding distributions are equal, hence $S_f = S_g$.

(ii) We extend $f$ to the strip $S_f$ and consider $g$ on the whole complex plane. Since $S_f$ is simply connected and $f = g$ on $(-a, a)$, the extensions are equal on $S_f$.

(iii) By Theorem 1.2.1, the corresponding characteristic functions have the same Taylor expansion at zero. Hence (iii) follows from (i).

Remark 3.3.7. A distribution is in general not uniquely determined by its moments. To see an example we consider the log-normal distribution which has the density $p$ given by
\[ p(x) = \frac{1}{x \sqrt{2\pi}}\, e^{-(\log x)^2/2}, \qquad x > 0 \]
(see Figure 3.3). The moments $M_k$ of this distribution are given by
\[ M_k = \frac{1}{\sqrt{2\pi}} \int_0^{\infty} x^{k-1} e^{-(\log x)^2/2}\, dx, \qquad k \in \mathbb{N}. \]
Substituting $t = \log x$, we get
\[ M_k = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2 + kt}\, dt = e^{k^2/2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(t-k)^2/2}\, dt = e^{k^2/2}. \]
Figure 3.3. The function p from Remark 3.3.7.
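The moment formula $M_k = e^{k^2/2}$ derived above is easy to confirm numerically. The sketch below is illustrative only: it evaluates the substituted integral $\frac{1}{\sqrt{2\pi}} \int e^{-t^2/2 + kt}\, dt$ with the trapezoidal rule on a window centred at $t = k$. The function name, the window half-width and the grid size are assumptions of this sketch.

```python
import math


def lognormal_moment(k, half_width=12.0, steps=24000):
    """Trapezoidal value of (1/sqrt(2 pi)) * int e^{-t^2/2 + k t} dt.

    The integrand is a Gaussian bump centred at t = k, so the integration
    window [k - half_width, k + half_width] is an assumption that captures
    essentially all of its mass.
    """
    a = k - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        t = a + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * math.exp(-0.5 * t * t + k * t)
    return total * h / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    for k in range(5):
        print(k, lognormal_moment(k), math.exp(k * k / 2.0))
```

The moments grow like $e^{k^2/2}$, which is too fast for the moment generating function to exist on any interval around the origin.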
The same argument as above shows that
\[ \int_0^{\infty} x^k \sin(2\pi \log x)\, p(x)\, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2 + kt} \sin(2\pi t)\, dt = e^{k^2/2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-y^2/2} \sin(2\pi y)\, dy = 0 \]
for all $k \in \mathbb{N}_0$. Consequently, all the functions $p_a$, $a \in [-1, 1]$, defined by
\[ p_a(x) = p(x)\,\big(1 + a \sin(2\pi \log x)\big), \qquad x > 0 \]
are densities and they have the same moment sequence.⁹

Next we present an application of analytic characteristic functions.

Theorem 3.3.8. If the random variables $X$ and $Y$ have analytic characteristic functions and the equation
\[ E(X^k Y^l) = E(X^k)\, E(Y^l) \]
holds for all $k$ and $l$ in $\mathbb{N}$, then $X$ and $Y$ are independent.¹⁰

⁹ See, e.g., [1, 35, 55] or Chapter 2 in [6] for more information on this topic.
¹⁰ We refer to [7] for some related results.
Proof. Let $f_X$, $f_Y$, and $g$ be the characteristic functions of $X$, $Y$, and the random vector $(X, Y)$, respectively. If $s, t \in \mathbb{R}$ and $|s|$ and $|t|$ are small enough, then we have
\[ f_X(s)\, f_Y(t) = \sum_{k=0}^{\infty} \frac{E(X^k)}{k!} (is)^k \sum_{l=0}^{\infty} \frac{E(Y^l)}{l!} (it)^l = \sum_{k,l=0}^{\infty} \frac{E(X^k)\, E(Y^l)}{k!\, l!} (is)^k (it)^l = \sum_{k,l=0}^{\infty} \frac{E(X^k Y^l)}{k!\, l!} (is)^k (it)^l. \]
Using Theorem 1.2.1 we see that the last sum is the Taylor series of the function $g$ at zero. Now fix $s$ and $t$. The display above shows that $r \mapsto g(rs, rt)$, $r \in \mathbb{R}$, is an analytic characteristic function which is equal to the analytic characteristic function $r \mapsto f_X(rs)\, f_Y(rt)$ in a neighborhood of zero. By (3.3.6.ii), the equality holds on $\mathbb{R}$. In particular,
\[ g(s, t) = f_X(s)\, f_Y(t). \]
Thus, $X$ and $Y$ are independent (cf. Theorem 1.3.10).

Theorem 3.3.9. Let $f_1$ and $f_2$ be characteristic functions. If the characteristic function $f = f_1 f_2$ is analytic, then so are $f_1$ and $f_2$. Moreover, the relations $S_{f_j} \supseteq S_f$ $(j = 1, 2)$ hold.¹¹

Proof. Denote by $\mu_1$, $\mu_2$ and $\mu$ the corresponding distributions. For all $A, B > 0$ and $s^- < r < s^+$ we have
\[ \int_{-A}^{A} e^{-rx}\, d\mu_1(x) \int_{-B}^{B} e^{-ry}\, d\mu_2(y) = \int_{-A}^{A} \int_{-B}^{B} e^{-r(x+y)}\, d\mu_1(x)\, d\mu_2(y) \le \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-r(x+y)}\, d\mu_1(x)\, d\mu_2(y) = \int_{-\infty}^{\infty} e^{-rs}\, d(\mu_1 * \mu_2)(s) = \int_{-\infty}^{\infty} e^{-rs}\, d\mu(s) < \infty. \]
This being true for all $A, B > 0$, we conclude that
\[ \int_{-\infty}^{\infty} e^{-rx}\, d\mu_1(x) < \infty \quad \text{and} \quad \int_{-\infty}^{\infty} e^{-rx}\, d\mu_2(x) < \infty \]
whenever $s^- < r < s^+$.

[...]

(iii) $f(iy) > 0$, $iy \in S_f$.

(iv) $|f(x + iy)| \le f(iy)$, $x \in \mathbb{R}$, $iy \in S_f$.

(v) The function $y \mapsto \log f(iy)$ is convex on the interval $(s_f^-, s_f^+)$.
Proof. The first statement is obtained from Lemma 3.3.11 with $z_j = x_j + iy/2$ while (ii) and (iv) follow from (i) (see Lemma 1.4.8 and Theorem 1.4.12).¹² Let $\mu$ be the distribution corresponding to $f$. Then
\[ f(iy) = \int_{-\infty}^{\infty} e^{-yx}\, d\mu(x) > 0. \]

¹² Actually, these properties also follow easily from the integral representation of $f$.

By Lemma 3.3.11, the matrix $A = \big( f(z_j - \bar{z}_k) \big)_{j,k=1}^{n}$ is positive semidefinite and hence $\det A \ge 0$. In the special case where $n = 2$, $z_j = i y_j$, this inequality gives
\[ f(i[y_1 + y_2])^2 \le f(i 2 y_1)\, f(i 2 y_2), \qquad s_f^- < y_1,\ y_2,\ y_1 + y_2 < s_f^+. \tag{1} \]
From (iii) we see that $f$ is strictly positive on the imaginary axis. Inequality (1) and the continuity of $f$ imply that the function $y \mapsto \log f(iy)$ is convex.

Lemma 3.3.13. Let $f$ be an analytic characteristic function of the form $f = e^g$, where $g$ is holomorphic in $S_f$ and $g(0) = 0$. Then

(i) for $x, y \in \mathbb{R}$ such that $|y| \le r < \min(-s_f^-, s_f^+)$ we have
\[ \operatorname{Re} g(x + iy) \le g(iy) \le \max\{ g(ir),\ g(-ir) \}; \]

(ii) $g(iy) \ge y \cdot \dfrac{d g(iy)}{dy} \Big|_{y=0}$.
Proof. By (3.3.12.v), the function y 7! log f .iy/ is convex and hence the inequality g.iy/ max g.ir/; g.ir/ (1) holds. Since jf j D eRe g , we obtain (i) from (3.3.12.iv) and (1). In view of g.iy/ D log f .iy/, the function g is convex on the imaginary axis. Convexity and the fact that g.0/ D 0 imply (ii). The next theorem, which is essentially due to Yu. V. Linnik,13 is a generalization of Theorem 3.3.9. If all the numbers ˛k below are integers or rational numbers, then Theorem 3.3.14 follows easily from Theorem 3.3.9. Theorem 3.3.14 (Linnik). Let f be a holomorphic function on the open disk D.0; ı/; ı > 0, such that f .z/ D f .z/ for all z 2 D.0; ı/. Further, let f1 ; : : : ; fr be characteristic functions and ˛1 ; : : : ; ˛r be positive real numbers such that r Y
\[ \prod_{k=1}^{r} |f_k(t)|^{\alpha_k} = |f(t)|, \qquad -\delta < t < \delta. \]
Then $f_k$ is analytic and $S_{f_k} \supseteq D(0, \delta)$, $k = 1, \dots, r$.

Proof. As in the proof of Theorem 1.2.13 we write
\[ g_k(t) = f_k(t)\, f_k(-t) = |f_k(t)|^2, \qquad t \in \mathbb{R}, \]

¹³ Linnik's original formulation contains the conditions $f(z) \ne 0$, $z \in D(0, \delta)$, and
\[ \prod_{k=1}^{r} f_k^{\alpha_k}(t) = f(t), \qquad -\delta < t < \delta. \]
Our conditions are somewhat weaker and allow us to avoid the use of powers of complex numbers.
and g.z/ D f .z/f .z/ D jf .z/j2 ; z 2 D.0; ı/. Then g is an even holomorphic function on D.0; ı/, the gk ’s are characteristic functions and g; gk are nonnegative on the interval .ı; ı/. Moreover, r Y
\[ \prod_{k=1}^{r} g_k^{\alpha_k}(t) = g(t), \qquad -\delta < t < \delta. \tag{1} \]
We will prove the statement of the theorem for the functions $g_k$. The statement for $f_k$ follows then from Theorem 3.3.9. We choose $d \in (0, \delta]$ such that $g(z) \ne 0$ if $|z| < d$. From (1) we obtain
\[ \sum_{k=1}^{r} \alpha_k \log \frac{1}{g_k(t)} = h(t), \qquad -d < t < d \tag{2} \]
where h.t / D log g.t /. By Theorem 1.2.13, the functions gk are infinitely often differentiable. We prove first that they are analytic and Sgk D.0; R/ for some R > 0. We may suppose that ˛k 1 .k D 1; : : : ; r/. Indeed, if N0 2 N is sufficiently large, then N0 ˛k 1 for all k. Raising both sides of (1) to the power N0 we obtain an equation of the same type containing factors of the form gkN0 ˛k . 2q˛ We introduce the notation q D g 2q and q;k .t / D gk k .t /; t 2 .ı; ı/. Raising both sides of (1) to the power 2q, differentiating the new equation 2q-times using the Leibniz rule (cf. (B.1.13.1)), and setting t D 0 we obtain X .2q/Š .l / .lr / .0/: (3)
1 .0/ : : : q;r
q.2q/ .0/ D l1 Š : : : lr Š q;1 l1 CClr D2q
To compute the derivatives on the right-hand side we apply Faà di Bruno’s formula (B.1.14) which yields X .m/ .m/ c.b; m/ gk0 .0/b1 gk00 .0/b2 : : : gk .0/bm (4)
q;k .0/ D where the sum is over all b 2 Nm 0 satisfying b1 C 2b2 C C mbm D m and c.b; m/ D
mŠ2q˛k .2q˛k 1/ : : : .2q˛k jbj C 1/ : b1 Šb2 Š : : : bm Š .1Š/b1 .2Š/b2 : : : .mŠ/bm
Since ˛k 1 and jbj m, the number c.b; m/ is positive if m 2q. Noting that .2l1/ .2l/ gk .0/ D 0 and the sign of Œgk .0/b2l is .1/lb2l , we see that the sign of the nonvanishing terms14 on the right of (4) is equal to .1/m=2 . From this we conclude that the sign of the nonvanishing terms on the right-hand side of (3) is .1/q . Since 14
Note that m is even if there are nonvanishing terms.
Section 3.3 Analytic characteristic functions of one variable .2q/
the right-hand side of (3) contains the terms 2q˛k gk
we obtain
ˇ ˇX ˇ ˇ r .2q/ ˇ 2q˛k gk .0/ˇˇ j q.2q/ .0/j ˇ kD1
and hence
ˇ ˇ ˇ .2q/ ˇ ˇgk .0/ˇ
1 j .2q/ .0/j; k D 1; : : : ; r: (5) 2q˛k q Applying the inequality (C.1.13) to the Taylor expansion of q D g 2q with r replaced by ı0 D 2ı we get .2q/ ˇ ˇ j q .0/j 2 2q sup ˇRe g 2q .z/ ˇ 2M 2q .2q/Š ı0 jzjDı0
where M D
1 ı0
(6)
supjzjDı0 jg.z/j. The inequalities (5) and (6) show that .2q/
lim sup q!1
gk
.0/
1 ! 2q
M
.2q/Š
and hence gk is holomorphic on the disk D.0; 1=M /. Denote by ık the radius of convergence of the Taylor expansion of gk . We already know that ık > 0. Without loss of generality we may assume that ı1 ır . To finish the proof we have to show that ı1 D ı. Assume, on the contrary, that ı1 < ı. The equations .2j / 1 X gk .0/ .t /j ; t 2 .ık2 ; ık2 / Gk .t / D .2j /Š j D0
and
1 X g .2j / .0/ .t /j ; G.t / D .2j /Š
t 2 .ı 2 ; ı 2 /
j D0
define functions whose Taylor expansions have ık2 and ı 2 as radii of convergence, respectively. We have Gk .t 2 / D gk .it /; G.t 2 / D g.it / and therefore r Y
Gk˛k .t / D G.t /;
t 2 .ı12 ; ı12 /:
(7)
l 2 N0 :
(8)
kD1
Relation (1.2.1.ii) shows that .l/
Gk .0/ 0;
Let ı0 2 .0; ı12 / be such that G.ı0 / ¤ 0 and define the functions Hk by Hk .t / D Gk .ı0 C t /; jı0 C t j < ı12 ; k D 1; : : : ; r:
By (7) we have
r Y
Hk˛k .t / D G.ı0 C t /;
jı0 C t j < ı12 ;
kD1
hence
r
Y G.ı0 C t / Hk .t / ˛k D : Hk .0/ G.ı0 /
kD1
We raise this equation to the power n, differentiate the resulting equation n times and .n/ .l/ set t D 0. The left-hand side contains the term n˛1 H1 .0/=H1 .0/. Since Hk .0/ 0 for all l (cf. (8)) we conclude that ˇ dn G n .ı0 C t / ˇˇ .n/ : n˛1 H1 .0/ H1 .0/ n dt G n .ı / ˇ 0
tD0
We now choose arbitrary real numbers r1 and r2 such that ı12 < r1 < r2 < ı 2 . Applying the inequality (C.1.13) to the Taylor expansion of G n .ı0 Ct / with r replaced by r1 ı0 we get ˇ ˇ n ˇ ˇ 2 dn n 1 ˇRe G .ı0 C z/ ˇ 2M n ; ˇ G .ı C t / sup 0 ˇ nŠ dt n .r1 ı0 /n jzjDr1 ı0 tD0 where M D
1 r1 ı1
supjzjr2 jG.z/j. Consequently, .n/
H1 .0/ lim sup n!1 nŠ
! n1 M
and therefore H1 is holomorphic on the disk D.ı0 ; 1=M /. Since M does not depend on ı0 , we can choose ı0 such that this disk contains ı12 . This shows that G1 is bounded on .ı12 ; ı12 C / for some positive . On the other hand, the Taylor series of G1 at 0 has nonnegative coefficients (cf. inequality (8)) and hence this series cannot converge for values larger than ı12 (see Lemma B.1.15). This contradiction shows that ı1 D ı.
3.4 Holomorphic $L^2$ Fourier transforms

In this section we consider the Fourier transforms of square integrable functions as well as their extensions to certain regions of $\mathbb{C}$.¹⁵ We will use the notation
\[ \mathbb{C}^+_{\mathrm{im}} := \{ z \in \mathbb{C} : \operatorname{Im} z > 0 \}, \qquad \mathbb{C}^-_{\mathrm{im}} := \{ z \in \mathbb{C} : \operatorname{Im} z < 0 \} \]
and
\[ \mathbb{C}^+_{\mathrm{re}} := \{ z \in \mathbb{C} : \operatorname{Re} z > 0 \}, \qquad \mathbb{C}^-_{\mathrm{re}} := \{ z \in \mathbb{C} : \operatorname{Re} z < 0 \}. \]

¹⁵ For further reading on this topic we refer to Rudin's book [46].
Lemma 3.4.1. If $F \in L^2(0, \infty)$, then the function
\[ f(z) = \int_0^{\infty} F(t)\, e^{itz}\, dt, \qquad z \in \mathbb{C}^+_{\mathrm{im}} \]
is holomorphic. Moreover,
\[ \frac{1}{2\pi} \int_{-\infty}^{\infty} |f(x + iy)|^2\, dx \le \int_0^{\infty} |F(t)|^2\, dt, \qquad y > 0. \]

Proof. If $z = x + iy$ where $y > 0$, then $|e^{itz}| = e^{-ty}$ showing that the integral above exists. Assume that $y > \delta > 0$, $\operatorname{Im} z_n > \delta$, and $z_n \to z$. Using the Cauchy–Schwarz inequality we obtain
\[ |f(z) - f(z_n)|^2 \le \int_0^{\infty} |F(t)|^2\, dt \int_0^{\infty} \big| e^{itz} - e^{itz_n} \big|^2\, dt. \]
The second integrand on the right is bounded by the integrable function $t \mapsto 4 e^{-2\delta t}$ and tends to zero as $n \to \infty$. The dominated convergence theorem shows that $f$ is continuous. Let $\Gamma$ be a closed path in $\mathbb{C}^+_{\mathrm{im}}$. By Fubini's theorem we have
\[ \int_{\Gamma} f(z)\, dz = \int_0^{\infty} F(t) \int_{\Gamma} e^{itz}\, dz\, dt. \]
Cauchy's Theorem C.1.5 shows that
\[ \int_{\Gamma} e^{itz}\, dz = 0. \]
Thus, the integral of $f$ over $\Gamma$ is zero and hence, by Morera's Theorem C.1.6, $f$ is holomorphic on $\mathbb{C}^+_{\mathrm{im}}$. To prove the last statement let $y > 0$ be fixed. By the definition of $f$, we have
\[ \int_{-\infty}^{\infty} f(x + iy)\, dx = \int_{-\infty}^{\infty} \int_0^{\infty} F(t)\, e^{-ty} e^{itx}\, dt\, dx, \qquad y > 0. \]
Applying Plancherel's Theorem 1.8.9 we obtain
\[ \frac{1}{2\pi} \int_{-\infty}^{\infty} |f(x + iy)|^2\, dx = \int_0^{\infty} |F(t)|^2 e^{-2ty}\, dt \le \int_0^{\infty} |F(t)|^2\, dt. \]

Lemma 3.4.2. Let $0 < a < \infty$ and $F \in L^2(-a, a)$. The function
\[ f(z) = \int_{-a}^{a} F(t)\, e^{itz}\, dt, \qquad z \in \mathbb{C} \]
is entire and
\[ |f(z)| \le C e^{a|z|}, \qquad z \in \mathbb{C} \]
with some constant $C \ge 0$.

Proof. The fact that $f$ is entire can be proved by the same arguments as in the proof of Lemma 3.4.1 while the inequality follows from
\[ |f(z)| \le \int_{-a}^{a} |F(t)|\, e^{-ty}\, dt \le e^{a|y|} \int_{-a}^{a} |F(t)|\, dt \]
since $|y| \le |z|$.

Definition 3.4.3. For $a > 0$ we denote by $E(a)$ the set of all entire functions of exponential type $a$, i.e., $F \in E(a)$ provided that $F$ is entire and
\[ |F(w)| \le C e^{a|w|}, \qquad w \in \mathbb{C} \]
for some $C < \infty$.

Theorem 3.4.4 (Paley–Wiener). Let $a$ and $C$ be positive constants and $f \in E(a)$ be an entire function such that
\[ \int_{-\infty}^{\infty} |f(x)|^2\, dx < \infty. \tag{1} \]
Then the $L^2$ Fourier transform $F$ of $f|_{\mathbb{R}}$ vanishes outside $[-a, a]$ and
\[ f(z) = \int_{-a}^{a} F(t)\, e^{itz}\, dt \tag{2} \]
for all $z \in \mathbb{C}$.

Proof. For $\varepsilon > 0$ define the function $f_\varepsilon$ by
\[ f_\varepsilon(x) := f(x)\, e^{-\varepsilon |x|}, \qquad x \in \mathbb{R}. \]
We will prove that
\[ \lim_{\varepsilon \to 0} \int_{-\infty}^{\infty} f_\varepsilon(x)\, e^{-itx}\, dx = 0, \qquad t \in \mathbb{R} \setminus [-a, a]. \tag{3} \]
First we show that this relation implies the theorem. Considering the restriction of $f$ to the real axis, we have
\[ \lim_{\varepsilon \to 0} \| f_\varepsilon - f \|_2 = 0. \]
Plancherel's theorem (cf. Theorem 1.8.9) implies that the Fourier transforms of the $f_\varepsilon$'s converge in $L^2(\mathbb{R})$ to the $L^2$ Fourier transform $F$ of $f$. By equation (3) the function $F$ vanishes outside $[-a, a]$. Since $f$ is equal to the inverse Fourier transform of $F$ we see that equation (2) holds for almost every $z \in \mathbb{R}$. As each side of (2) is an entire function (see Corollary 3.3.4), we conclude that (2) holds for all $z \in \mathbb{C}$.

To prove (3) let $\Gamma_\alpha$ be the path defined by
\[ \Gamma_\alpha(s) = s e^{i\alpha}, \qquad \alpha \in \mathbb{R},\ s \in [0, \infty). \tag{4} \]
Write
\[ \Pi_\alpha = \{ w \in \mathbb{C} : \operatorname{Re}(w e^{i\alpha}) > a \}, \]
and for $w \in \Pi_\alpha$, define the function $\Psi_\alpha$ by¹⁶
\[ \Psi_\alpha(w) = \int_{\Gamma_\alpha} f(z)\, e^{-wz}\, dz = e^{i\alpha} \int_0^{\infty} f(s e^{i\alpha})\, e^{-w s e^{i\alpha}}\, ds. \]

¹⁶ See Section C.1 for the definition of line integrals.

Using the fact that $f \in E(a)$ we see that the modulus of the integrand is at most
\[ C e^{-(\operatorname{Re}(w e^{i\alpha}) - a) s}. \]
The same arguments as in the proof of Lemma 3.4.1 show that $\Psi_\alpha$ is holomorphic in the half plane $\Pi_\alpha$. In the cases $\alpha = 0$ and $\alpha = \pi$, we extend the definition of $\Psi_\alpha$ to larger half planes by
\[ \Psi_0(w) = \int_0^{\infty} f(x)\, e^{-wx}\, dx, \qquad w \in \mathbb{C}^+_{\mathrm{re}} \]
and
\[ \Psi_\pi(w) = -\int_{-\infty}^{0} f(x)\, e^{-wx}\, dx, \qquad w \in \mathbb{C}^-_{\mathrm{re}}. \]
Inequality (1) and Lemma 3.4.1 show that $\Psi_0$ and $\Psi_\pi$ are holomorphic in the indicated half planes. Moreover, it is easy to check that
\[ \int_{-\infty}^{\infty} f_\varepsilon(x)\, e^{-itx}\, dx = \Psi_0(\varepsilon + it) - \Psi_\pi(-\varepsilon + it), \qquad t \in \mathbb{R}. \]
Hence, it remains to prove that the right-hand side of the equation above tends to zero as $\varepsilon \to 0$ if $|t| > a$.

Next we show that any two of the functions $\Psi_\alpha$ are equal in the intersection of their domains of definition. Let $0 < \beta - \alpha < \pi$ and write
\[ \gamma = \frac{\alpha + \beta}{2}, \qquad \eta = \cos \frac{\beta - \alpha}{2}. \]
Note that $\eta$ is positive. If $w = r e^{-i\gamma}$, where $r \in (a/\eta, \infty)$, then $w \in \Pi_\alpha \cap \Pi_\beta$. This follows from
\[ \operatorname{Re}(w e^{i\alpha}) = r\eta = \operatorname{Re}(w e^{i\beta}). \]
Let $\Gamma$ be the curve given by $\Gamma(t) = R e^{it}$, $R > 0$, $\alpha \le t \le \beta$. If $z = \Gamma(t)$ is a point of this curve and $w$ is as above, then
\[ \operatorname{Re}(wz) = rR \cos(t - \gamma) \ge rR\eta \]
and hence the modulus of the integrand of
\[ I_R := \int_{\Gamma} f(z)\, e^{-wz}\, dz \]
is at most $C e^{(a - r\eta) R}$. Since $r \in (a/\eta, \infty)$, we conclude that the integral $I_R$ tends to zero as $R \to \infty$. By Cauchy's Theorem C.1.5
\[ \int_{[0, R e^{i\beta}]} f(z)\, e^{-wz}\, dz = I_R + \int_{[0, R e^{i\alpha}]} f(z)\, e^{-wz}\, dz. \]
Taking the limit $R \to \infty$ we see that
\[ \Psi_\alpha(w) = \Psi_\beta(w), \qquad w = r e^{-i\gamma},\ r > \frac{a}{\eta}. \]
Theorem C.1.8 shows that $\Psi_\alpha$ and $\Psi_\beta$ coincide in the intersection of their domains of definition. Thus, the difference on the right of equation (3) does not change if we replace $\Psi_0$ and $\Psi_\pi$ by $\Psi_{\pi/2}$ if $t < -a$, and by $\Psi_{-\pi/2}$ if $t > a$. From this we infer that the difference tends to 0 as $\varepsilon \to 0$.

Theorem 3.4.5 (Rudin). Suppose $F \in E(2a)$, $F$ is even, nonnegative on the real axis, and
\[ \int_{-\infty}^{\infty} F^2(u)\, du < \infty. \]
Then there are even functions $G_j, H_j \in E(a)$ such that
\[ F(u) = \sum_{j=1}^{\infty} |G_j(u)|^2 + u^2 \sum_{j=1}^{\infty} |H_j(u)|^2, \qquad u \in \mathbb{R}. \tag{1} \]
Proof. We may assume that $a = 1$. If $F(0) = 0$, then $F(w) = w^{2k} F_0(w)$ with some positive integer $k$ and an entire function $F_0$ having no zero at 0. It is easy to check that $F_0$ satisfies the conditions of the theorem. If we have shown that $F_0$ admits a decomposition analogous to (1), then we obtain the desired decomposition for $F$ by multiplying $G_j$ and $H_j$ by $w^k$ and changing the roles of $G_j$ and $H_j$ if $k$ is odd. Therefore we may assume, without loss of generality, that $F(0) = 1$.

Let $\pm\lambda_1, \pm\lambda_2, \dots$ be the zeros of $F$, counted with their multiplicities. Since $F \in E(2)$, Corollary C.1.16 shows that
\[ n \le \frac{\log C + 2|\lambda_n|}{\log 2} \]
with some constant $C > 0$ and hence $|\lambda_n| > cn$, for some $c > 0$. Thus,
\[ \sum_{n=1}^{\infty} \frac{1}{|\lambda_n|^2} < \infty. \]
Using this and the facts that $F$ is even and $F \in E(2)$, Hadamard's factorization Theorem C.1.19 shows that
\[ F(z) = \prod_{n=1}^{\infty} \Big( 1 - \frac{z^2}{\lambda_n^2} \Big), \qquad z \in \mathbb{C}. \tag{2} \]
Let $t_1 \le t_2 \le \cdots$ be those positive numbers for which $i t_n$ is a zero of $F$ and write
\[ A_m(z) = \prod_{n=1}^{m} \Big( 1 + \frac{z^2}{t_n^2} \Big), \qquad Q_m = F / A_m, \qquad Q = \lim_{m \to \infty} Q_m. \]
We show that $Q \in E(2)$. It is clear that $Q_m \in E(2)$. The Paley–Wiener Theorem 3.4.4 shows that the Fourier transform of the restriction of $Q_m$ to $\mathbb{R}$ vanishes outside $[-2, 2]$. By definition, $Q_m \to Q$ pointwise. Since $0 \le Q_m \le F$ on the real axis, Lebesgue's dominated convergence theorem shows that $Q_m \to Q$ in $L^2(\mathbb{R})$. Consequently, $\hat{Q}_m \to \hat{Q}$ in $L^2(\mathbb{R})$ in view of Theorem 1.8.9 and hence $\hat{Q}$ vanishes outside of $[-2, 2]$. Thus, $\hat{Q}$ is Lebesgue integrable. Taking the inverse Fourier transform of $\hat{Q}$ and using Lemma 3.4.2 we see that $Q \in E(2)$.

The function $Q$ is nonnegative on the real axis, it is even and has no pure imaginary zeros. Therefore, the real zeros of $Q$ have even multiplicity and the non-real zeros occur in conjugate pairs. Hence, we can write $Q$ as $Q = F_1 F_2$ where $F_1$ and $F_2$ are subproducts of the product in (2), $F_1$ and $F_2$ are even, entire and $F_2(w) = \overline{F_1(\bar{w})}$.

Next we show that $F_1 \in E(1)$ and $F_2 \in E(1)$. Since $|\lambda_n| > cn$, the subproducts of (2) are dominated by
\[ \prod_{n=1}^{\infty} \Big( 1 + \Big| \frac{z}{cn} \Big|^2 \Big). \]
Using this and the product representation of the sine function from Remark C.1.20, we see that $F_1$ and $F_2$ are of exponential type. The equation $|F_1|^2 = Q$ holds on both the real and imaginary axes. On the real axis, the function $Q$ is the inverse Fourier transform of the integrable function $\hat{Q}$, hence it is bounded. The function $R$ defined by $R(w) := e^{iw} F_1(w)$ is therefore bounded on the real axis. Using that $Q \in E(2)$ we see that
\[ |R(it)|^2 = e^{-2t} |F_1(it)|^2 = e^{-2t} Q(it) \le e^{-2t}\, C e^{2t}, \qquad t > 0, \]
with some $C > 0$, i.e., $R$ is bounded on the upper half of the imaginary axis. Since $F_1$ is of exponential type, $R$ is of exponential type as well. The Phragmén–Lindelöf Theorem C.1.21 shows that $R$ is bounded in the first and second quadrant. Applying the same argument to the lower half plane we conclude that $F_1 \in E(1)$ and $F_2 \in E(1)$. By the definition of $A_m$ and $Q$ we have
\[ F(z) = Q(z) \prod_{n=1}^{\infty} \Big( 1 + \frac{z^2}{t_n^2} \Big) = F_1(z)\, F_2(z) \sum_{k=0}^{\infty} c_k^2 z^{2k} \]
with some nonnegative numbers $c_k$. If $u$ is real, then $F_1(u) F_2(u) = |F_1(u)|^2$ and therefore
\[ F(u) = \sum_{k=0}^{\infty} |c_k u^k F_1(u)|^2 = \sum_{j=0}^{\infty} |c_{2j} u^{2j} F_1(u)|^2 + u^2 \sum_{j=0}^{\infty} |c_{2j+1} u^{2j} F_1(u)|^2, \qquad u \in \mathbb{R}. \]
Thus, the functions $G_j(z) := c_{2j} z^{2j} F_1(z)$ and $H_j(z) := c_{2j+1} z^{2j} F_1(z)$ have the desired properties.
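Lemma 3.4.2 and the Paley–Wiener Theorem 3.4.4 can be illustrated on the simplest band-limited example. The sketch below is an illustration under stated assumptions, not part of the original text: it takes $F = \tfrac12 \mathbf{1}_{[-a,a]}$, for which $f(z) = \int_{-a}^{a} F(t) e^{itz}\, dt = \sin(az)/z$, and checks both the closed form and the bound $|f(z)| \le C e^{a|z|}$ with $C = \int_{-a}^{a} |F(t)|\, dt = a$ from the proof of Lemma 3.4.2. All function names and the grid size are hypothetical choices.

```python
import cmath
import math


def f_numeric(z, a, steps=20000):
    """Trapezoidal value of int_{-a}^{a} (1/2) e^{itz} dt for complex z."""
    h = 2.0 * a / steps
    total = 0j
    for k in range(steps + 1):
        t = -a + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * 0.5 * cmath.exp(1j * t * z)
    return total * h


def f_closed(z, a):
    """Closed form sin(a z) / z of the same integral (z != 0)."""
    return cmath.sin(a * z) / z


def exponential_type_bound(z, a):
    """Bound C e^{a|z|} with C = int_{-a}^{a} |F(t)| dt = a."""
    return a * math.exp(a * abs(z))


if __name__ == "__main__":
    a = 1.0
    for z in (1.5 + 0j, 3.0 + 2.0j, -2.0 - 1.0j):
        print(z, f_numeric(z, a), f_closed(z, a), exponential_type_bound(z, a))
```

On the real axis $\sin(ax)/x$ is square integrable, so this $f$ satisfies hypothesis (1) of Theorem 3.4.4 as well.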
3.5 Further properties of Gaussian distributions

We now have all the necessary tools to prove further important properties of Gaussian distributions.

Theorem 3.5.1. Let $P$ be a polynomial on $\mathbb{R}^d$ with complex coefficients such that $P(0) = 0$. The function $f = e^P$ is the characteristic function of a $d$-dimensional random vector $X$ if and only if
\[ P(t) = i(m, t) - \frac{1}{2}(Ct, t), \qquad t \in \mathbb{R}^d \tag{1} \]
with some $m \in \mathbb{R}^d$ and a $d \times d$ positive semidefinite real matrix $C = (c_{ij})$.
Proof. First we show that if $f = e^P$ is a characteristic function, then the degree of $P$ is at most two. Suppose first that $d = 1$. Since $P(0) = 0$ the constant term of $P$ is 0. The case $P \equiv 0$ being trivial, assume that
\[ P(t) = \sum_{k=1}^{n} c_k t^k, \qquad c_k \in \mathbb{C},\ t \in \mathbb{R} \]
where $n \ge 1$ and $c_n \ne 0$. Since $|f(z)| = e^{\operatorname{Re} P(z)}$, $z \in \mathbb{C}$, the inequality (3.3.12.iv) implies that
\[ \operatorname{Re} \Big( \sum_{k=1}^{n} c_k z^k \Big) \le \sum_{k=1}^{n} c_k (i \operatorname{Im} z)^k, \qquad z \in \mathbb{C}. \]
Replacing here $z$ by $sz$, $s \in \mathbb{R}$, dividing both sides by $s^n$ and letting $s \to \infty$ we see that
\[ \operatorname{Re}(c_n z^n) \le c_n (i \operatorname{Im} z)^n. \]
Putting all $n$-th roots $z_1, \dots, z_n$ of $|c_n| / c_n$ into this inequality, we obtain
\[ |c_n| \le c_n (i \operatorname{Im} z_j)^n, \qquad j = 1, \dots, n. \]
Taking modulus and dividing by $|c_n|$ shows that $|\operatorname{Im} z_j| \ge 1$. Since $|z_j| = 1$, we conclude that $|\operatorname{Im} z_j| = 1$ for all $j = 1, \dots, n$. This is only possible if $n = 1$ or $n = 2$. Thus, the degree of $P$ is at most 2. The case where $d$ is arbitrary follows now from the fact that $s \mapsto f(st)$ is a characteristic function of one variable for all $t \in \mathbb{R}^d$ (see, e.g., Remark 1.1.10) by using Lemma B.6.7.

Next we show that $P$ has the form (1). We already know that
\[ P(t) = \sum_{j=1}^{d} a_j t_j + \sum_{j,k=1}^{d} b_{jk} t_j t_k \]
with some $a_j, b_{jk} \in \mathbb{C}$. We may obviously assume that $b_{jk} = b_{kj}$. Theorem 1.2.9 shows that all moments of $X$ exist. Hence, by Theorem 1.2.1,
\[ \frac{\partial f}{\partial t_j}(0) = a_j = i E(X_j), \]
\[ \frac{\partial^2 f}{\partial t_j \partial t_k}(0) = a_j a_k + b_{jk} + b_{kj} = a_j a_k + 2 b_{jk} = -E(X_j X_k). \]
Thus, $m := -i(a_1, \dots, a_d)$ is the expectation and $C := -2 (b_{jk})_{j,k=1}^{d}$ is the covariance matrix of $X$. From this it follows that $m \in \mathbb{R}^d$ and that $C$ is real and positive semidefinite (see page 349).

Finally, suppose that $C$ is a positive semidefinite real matrix and $m \in \mathbb{R}^d$. Then $f$ is the characteristic function of the random vector $Y = BX + m$ where $X$ is standard Gaussian and $B$ is a symmetric real matrix such that $B^2 = C$ (see 1.10.1).
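The degree bound in Theorem 3.5.1 can be made concrete with a small numerical experiment. The sketch below is only an illustration and relies on the standard fact that a characteristic function is positive definite, so every matrix $\big(f(t_j - t_k)\big)$ has a nonnegative determinant. It evaluates the $3 \times 3$ determinant for the points $0, h, 2h$, once for the Gaussian $e^{-t^2/2}$ and once for $e^{-t^4}$, whose exponent has degree four. The function names and the choice $h = 0.5$ are assumptions of this sketch.

```python
import math


def det3(f, h):
    """Determinant of the matrix (f(t_j - t_k)) for t = (0, h, 2h).

    Assumes f is real-valued and even with f(0) = 1, so the matrix is
    [[1, a, b], [a, 1, a], [b, a, 1]] with a = f(h), b = f(2h).
    """
    a, b = f(h), f(2.0 * h)
    return 1.0 - 2.0 * a * a + 2.0 * a * a * b - b * b


def gaussian(t):
    return math.exp(-0.5 * t * t)


def quartic(t):
    return math.exp(-t ** 4)


if __name__ == "__main__":
    h = 0.5
    print("Gaussian exponent:", det3(gaussian, h))  # nonnegative
    print("quartic exponent: ", det3(quartic, h))   # negative
```

A negative determinant rules out positive definiteness, so $e^{-t^4}$ cannot be a characteristic function, in line with the theorem.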
Theorem 3.5.2. Let $f$ be a characteristic function of a $d$-dimensional random vector $X$ and $P$ be a polynomial on $\mathbb{R}^d$. If $f = e^P$ in a neighborhood of zero, then $X$ is Gaussian.

Proof. For fixed $t \in \mathbb{R}^d$ consider the characteristic function $x \mapsto f(xt)$, $x \in \mathbb{R}$. By (3.3.6.ii), $f(xt) = e^{P(xt)}$ for all $x \in \mathbb{R}$ and hence $f = e^P$. We can now apply Theorem 3.5.1.

Remark 3.5.3. The function
\[ f(t) = e^{-\alpha |t| - |t|^{\alpha}}, \qquad t \in \mathbb{R} \]
is a characteristic function for all $\alpha \in [2, \infty)$ (see Figure 3.4). To see this we show that $f$ is convex on the interval $[0, \infty)$. The fact that $f$ is a characteristic function follows then from Pólya's Theorem 3.9.11. The second derivative of $f$ at $t > 0$ is
\[ \big[ \alpha^2 (1 + t^{\alpha - 1})^2 - \alpha(\alpha - 1) t^{\alpha - 2} \big]\, f(t). \]
It suffices to prove that the expression in the brackets is positive which is equivalent to
\[ t + t^{\alpha} > \sqrt{1 - 1/\alpha}\; t^{\alpha/2}. \]
This inequality holds because $t \ge t^{\alpha/2}$ if $0 \le t \le 1$ and $t^{\alpha} \ge t^{\alpha/2}$ if $t \ge 1$.
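The convexity argument of Remark 3.5.3 can be spot-checked numerically. The following sketch is illustrative only: it evaluates the bracket $\alpha^2 (1 + t^{\alpha-1})^2 - \alpha(\alpha-1) t^{\alpha-2}$ on a grid and compares the claimed second derivative with a central finite difference. The function names, the grid and the step size `h` are assumptions of this sketch.

```python
import math


def f(alpha, t):
    """f(t) = exp(-alpha t - t^alpha) for t >= 0."""
    return math.exp(-alpha * t - t ** alpha)


def bracket(alpha, t):
    """Factor multiplying f(t) in the second derivative, for t > 0."""
    return alpha ** 2 * (1.0 + t ** (alpha - 1.0)) ** 2 - alpha * (alpha - 1.0) * t ** (alpha - 2.0)


def second_derivative_fd(alpha, t, h=1e-4):
    """Central finite-difference approximation of f''(t)."""
    return (f(alpha, t + h) - 2.0 * f(alpha, t) + f(alpha, t - h)) / (h * h)


if __name__ == "__main__":
    for alpha in (2.0, 2.5, 3.0, 5.0, 10.0):
        smallest = min(bracket(alpha, 0.01 * k) for k in range(1, 1001))
        print(alpha, smallest)  # stays positive on (0, 10]
    print(second_derivative_fd(3.0, 0.7), bracket(3.0, 0.7) * f(3.0, 0.7))
```

A grid check is of course no proof; it only confirms that the displayed formula for the second derivative is consistent with the function.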
Figure 3.4. The function $f$ from Remark 3.5.3 with $\alpha = 3$.
Theorem 3.5.4 (Lévy–Cramér). If $f_1$ and $f_2$ are characteristic functions on $\mathbb{R}$ such that for some $a \in \mathbb{R}$ and $b > 0$
\[ f_1(t)\, f_2(t) = e^{iat - bt^2}, \qquad t \in \mathbb{R} \]
then
\[ f_j(t) = e^{i a_j t - b_j t^2}, \qquad t \in \mathbb{R} \]
for some $a_j \in \mathbb{R}$ and $b_j \ge 0$, $j = 1, 2$.

Proof. Without loss of generality we may assume that $a = 0$ and $b = 1$. From Corollary 3.3.10 we infer that $f_1$ and $f_2$ are entire characteristic functions. The equation
\[ e^{-z^2} = f_1(z)\, f_2(z) \]
holds clearly for every $z \in \mathbb{C}$. It follows that $f_1$ has no zeros and therefore, by Lemma C.1.11, $f_1 = e^g$ for some entire function $g$ of the form
\[ g(z) = \sum_{k=1}^{\infty} c_k z^k, \qquad c_k, z \in \mathbb{C}. \]
We may assume that $c_1 = g'(0) = 0$, for otherwise we replace $f_1(z)$ by $e^{-c_1 z} f_1(z)$ and $f_2(z)$ by $e^{c_1 z} f_2(z)$. Applying inequality (3.3.13.ii) to the functions $f_1(z) = e^{g(z)}$ and $f_2(z) = e^{-g(z) - z^2}$ we obtain
\[ g(iy) \ge 0 \quad \text{and} \quad -g(iy) + y^2 \ge 0, \qquad y \in \mathbb{R}. \]
Applying now the inequality on the left of (3.3.13.i) to $f_1$ and $f_2$, we see that
\[ \operatorname{Re} g(x + iy) \le g(iy) \quad \text{and} \quad \operatorname{Re}\big[ -g(x + iy) - (x + iy)^2 \big] \le -g(iy) + y^2 \]
for all $x, y \in \mathbb{R}$. These four inequalities imply that $|\operatorname{Re} g(z)| \le |z|^2$, $z \in \mathbb{C}$. Using Corollary C.1.14 we obtain that $c_n = 0$ for each $n > 2$. Since $f_1$ is bounded on $\mathbb{R}$ and $f_1(-x) = \overline{f_1(x)}$, $x \in \mathbb{R}$, we must have $c_2 \le 0$, completing the proof.

Corollary 3.5.5. If $X$ and $Y$ are independent $d$-dimensional random vectors and $X + Y$ is Gaussian, then $X$ and $Y$ are Gaussian as well.

Proof. Let $a \in \mathbb{R}^d$ be arbitrary. By Lemma 1.10.3 the random variable $(a, X + Y) = (a, X) + (a, Y)$ is Gaussian. The previous theorem shows that $(a, X)$ and $(a, Y)$ are Gaussian. The statement follows now from Theorem 1.10.4.

Theorem 3.5.6. Let $f_1, \dots, f_r$ be characteristic functions on $\mathbb{R}^d$ and let $\delta, \alpha_1, \dots, \alpha_r$ be positive real numbers such that
\[ \prod_{k=1}^{r} |f_k(t)|^{\alpha_k} = e^{P(t)}, \qquad t \in \mathbb{R}^d,\ \|t\| < \delta \]
holds for some real polynomial $P$. Then the distributions corresponding to the $f_k$'s are Gaussian.
Proof. Suppose first that $d = 1$. By Theorem 3.3.14, the functions $f_k$ are entire. As in the proofs of Theorems 3.3.14 and 1.2.13 we write
\[ g_k(t) = f_k(t)\, f_k(-t) = |f_k(t)|^2, \qquad t \in \mathbb{R} \]
and $Q(t) = P(t) + P(-t)$. Then $g_k$ is an entire, nonnegative characteristic function. Moreover,
\[ \prod_{k=1}^{r} g_k^{\alpha_k}(t) = e^{Q(t)}, \qquad t \in \mathbb{R}. \tag{1} \]
We will prove the statement of the theorem for the functions $g_k$. The statement for $f_k$ follows then from Theorem 3.5.4. By Lemma C.1.11 there exist entire functions $h_k$ such that $g_k = e^{h_k}$ and $h_k(0) = 0$. These functions satisfy the equation
\[ \sum_{k=1}^{r} \alpha_k h_k(z) = Q(z), \qquad z \in \mathbb{C}. \]
As $h_k$ is even, its first derivative at 0 is equal to 0. Inequality (3.3.13.ii) shows that $h_k(it) \ge 0$, $t \in \mathbb{R}$. Consequently, $Q(it) \ge 0$ and $0 \le \alpha_k h_k(it) \le Q(it)$. From (3.3.13.i) and Corollary C.1.14 we conclude that $h_k$ is a polynomial. That the distribution corresponding to $g_k$ is Gaussian follows from Theorem 3.5.1.

Now let $d$ be arbitrary. For each $a \in \mathbb{R}^d \setminus \{0\}$ the univariate characteristic functions $t \mapsto f_k(ta)$, $t \in \mathbb{R}$, satisfy the equation
\[ \prod_{k=1}^{r} |f_k(ta)|^{\alpha_k} = e^{P(ta)}, \qquad t \in \mathbb{R},\ |t| < \dots \]

[...] $m$. If $k > m$ and $a_k \ne 0$, then the factor $f_k(a_k t)$ appears on both sides of (1). The same holds for $f_k(b_k s)$ if $k > m$ and $b_k \ne 0$. We choose $\delta > 0$ such that none of the factors in (1) is equal to zero if $t, s \in (-\delta, \delta)$. Then we have
\[ \prod_{k=1}^{m} f_k(a_k t + b_k s) = \prod_{k=1}^{m} f_k(a_k t) \prod_{k=1}^{m} f_k(b_k s), \qquad t, s \in (-\delta, \delta). \tag{2} \]
We show that this equation implies that $X_k$ is Gaussian. By the same argument as in the proof of Theorem 3.5.6, we may assume that all the factors in (2) are positive. Setting $g_k = \log f_k$ we write (2) in the form
\[ \sum_{k=1}^{m} g_k(a_k t + b_k s) = \sum_{k=1}^{m} g_k(a_k t) + \sum_{k=1}^{m} g_k(b_k s), \qquad t, s \in (-\delta, \delta). \]
If the numbers $b_k / a_k$ are all different, then Theorem C.9.5 shows that $g_k$ is a polynomial in a neighborhood of 0 and hence, by Theorem 3.5.2, $X_k$ is Gaussian. Assume that $b_1/a_1 = \cdots = b_{n_1}/a_{n_1}$ for some $n_1$ and $b_1/a_1 \ne b_k/a_k$ if $k > n_1$. We then write $Y_a$ and $Y_b$ in the form
\[ Y_a = a_1 \Big[ X_1 + \frac{a_2}{a_1} X_2 + \cdots + \frac{a_{n_1}}{a_1} X_{n_1} \Big] + \cdots \]
and
\[ Y_b = b_1 \Big[ X_1 + \frac{b_2}{b_1} X_2 + \cdots + \frac{b_{n_1}}{b_1} X_{n_1} \Big] + \cdots \]
The expressions in the brackets are equal. Applying the same arguments to the remaining $X_j$'s, we obtain that the sums in the brackets are Gaussian. By Theorem 3.5.4 of Lévy and Cramér the summands are Gaussian as well.

The next theorem is a multivariate analogue of the previous one.¹⁸

¹⁸ We refer to [33] for further reading on results of this type and their applications.
Theorem 3.5.8. Let $X_1, \dots, X_r$ be independent random vectors in $\mathbb{R}^d$ and let $A_k, B_k \in \mathbb{R}^{d \times d}$, $k = 1, \dots, r$, be nonsingular matrices. If the random vectors
\[ Y_A = \sum_{k=1}^{r} A_k X_k \quad \text{and} \quad Y_B = \sum_{k=1}^{r} B_k X_k \]
are independent, then $X_k$ is Gaussian for all $k$.

Proof. Proceeding in the same way as in the proof of Theorem 3.5.7, we obtain the functional equation
\[ \sum_{k=1}^{r} g_k(A_k t + B_k s) = \sum_{k=1}^{r} g_k(A_k t) + \sum_{k=1}^{r} g_k(B_k s), \qquad t, s \in B^{\circ}(\delta) \]
with some $\delta > 0$, where $g_k = \log f_k$ and $f_k$ is the characteristic function of $X_k$. Theorem C.9.5 shows that both sums on the right-hand side are polynomials of $t$ and $s$, respectively. In view of Theorem 3.5.2, the random vectors $Y_A$ and $Y_B$ are Gaussian. Hence, by Theorem 3.5.4 of Lévy and Cramér, $A_k X_k$ is Gaussian as well. Since $A_k$ is nonsingular, we infer that $X_k$ is Gaussian.
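A simple special case of Theorem 3.5.8 is $d = 1$, $r = 2$, $A_1 = A_2 = B_1 = 1$, $B_2 = -1$, i.e. the sum and the difference of two independent copies of a random variable. The sketch below is an illustration under stated assumptions: for i.i.d. $X_1, X_2$ with a real, even characteristic function $\varphi$, the characteristic function of $(X_1 + X_2, X_1 - X_2)$ at $(t, s)$ is $\varphi(t+s)\varphi(t-s)$, and independence amounts to this being equal to $\varphi(t)^2 \varphi(s)^2$. The code compares both sides for the standard normal and for the Laplace distribution with characteristic function $1/(1+t^2)$. All function names are hypothetical.

```python
import math


def joint_cf(phi, t, s):
    """CF of (X1 + X2, X1 - X2) at (t, s) for i.i.d. X1, X2 with CF phi."""
    return phi(t + s) * phi(t - s)


def product_of_marginals(phi, t, s):
    """CF of X1 + X2 at t times CF of X1 - X2 at s (phi real and even)."""
    return phi(t) ** 2 * phi(s) ** 2


def normal_cf(t):
    return math.exp(-0.5 * t * t)


def laplace_cf(t):
    return 1.0 / (1.0 + t * t)


def max_gap(phi, grid):
    """Largest deviation from factorization over a grid of (t, s) pairs."""
    return max(abs(joint_cf(phi, t, s) - product_of_marginals(phi, t, s))
               for t in grid for s in grid)


if __name__ == "__main__":
    grid = [0.25 * k for k in range(-8, 9)]
    print("normal :", max_gap(normal_cf, grid))   # factorizes up to rounding
    print("Laplace:", max_gap(laplace_cf, grid))  # clearly does not factorize
```

For the Laplace distribution the factorization already fails at $t = s = 1$, so the sum and the difference are not independent, as the theorem predicts for a non-Gaussian law.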
3.6 Fourier transformation of radial measures and functions

Radial measures and functions have the advantage of a simple structure. In this section we start exploiting radiality in more detail.

Definition 3.6.1. A function $g : \mathbb{R}^d \to \mathbb{C}$ is called radial if
\[ g(t) = g(Ot), \qquad t \in \mathbb{R}^d \]
for every orthogonal matrix $O \in O(d)$. A measure $\mu \in M(\mathbb{R}^d)$ is called radial if
\[ \mu(B) = \mu(OB), \qquad B \in \mathcal{B}(\mathbb{R}^d) \]
for all $O \in O(d)$.

Setting $O = -I$ where $I$ is the identity matrix, we see that radial positive definite functions are real-valued and even.

Lemma 3.6.2. A complex-valued function $g$ on $\mathbb{R}^d$ is radial if and only if there exists a complex-valued function $h$ on $[0, \infty)$ such that
\[ g(t) = h(\|t\|), \qquad t \in \mathbb{R}^d. \]
Proof. Assume that $g$ is radial and define the function $h$ by
\[ h(r) := g(r e_1), \qquad r \in [0, \infty) \]
where $e_1 = (1, 0, \dots, 0)$ is the first vector of the standard basis of $\mathbb{R}^d$. Let $t \in \mathbb{R}^d$ be arbitrary and choose an orthogonal matrix $O_t \in O(d)$ such that $O_t t = \|t\| e_1$. We then have
\[ g(t) = g(O_t t) = g(\|t\| e_1) = h(\|t\|). \]
To prove the converse statement assume that $g(t) = h(\|t\|)$ with some function $h$. Then $g(Ot) = h(\|Ot\|) = h(\|t\|) = g(t)$, $t \in \mathbb{R}^d$, i.e., $g$ is radial.

Lemma 3.6.3. A measure $\mu \in M_b(\mathbb{R}^d)$ is radial if and only if the function $\hat{\mu}$ is radial.

Proof. If $\mu$ is radial, then
\[ \int g(t)\, d\mu(t) = \int g(Ot)\, d\mu(t) \tag{1} \]
for all functions $g$ of the form
\[ g = \sum_{j=1}^{n} c_j \mathbf{1}_{B_j}, \qquad n \in \mathbb{N},\ c_j \in \mathbb{C},\ B_j \in \mathcal{B}(\mathbb{R}^d) \]
and for all $O \in O(d)$. Since these functions span a dense linear subspace of $L^1(\mu)$, we conclude that equation (1) holds for all continuous bounded functions $g$. In particular,
\[ \hat{\mu}(Ot) = \int e^{i(Ot, s)}\, d\mu(s) = \int e^{i(t, O^{-1} s)}\, d\mu(s) = \hat{\mu}(t). \]
The converse statement can be proved in the same way by using the fact that functions of the form $t \mapsto e^{i(t, s)}$, $s \in \mathbb{R}^d$, span a dense linear subspace of $L^1(\mu)$.

Next we consider radiality in terms of random vectors.

Theorem 3.6.4. Let $X$ be a $d$-dimensional random vector. The following statements are equivalent:

(i) $X$ and $OX$ have the same distribution for every orthogonal matrix $O \in O(d)$;

(ii) the distribution of $X$ is radial;

(iii) the characteristic function of $X$ is radial;

(iv) for any $a \in \mathbb{R}^d$ the random variables $(a, X)$ and $\|a\| X_1$ have the same distribution.
Proof. The equivalence of (i) and (ii) is trivial while the equivalence of (ii) and (iii) follows from Lemma 3.6.3. Assume that (i) holds and let Oa 2 O.d / be such that Oa a D kake1 where e1 D .1; 0; : : : ; 0/ 2 Rd . Then .a; Oa1 X / D .Oa a; X / D kakX1 so that .a; X / and kakX1 have the same distribution. Finally, assume that .a; X / and kakX1 have the same distribution. We then have i h i h fX .a/ D E ei.a;X/ D E eikakX1 D fX1 .kak/ i.e., fX is radial. Lemma 3.6.5. Let g 2 L2 .Rd / be a radial function. Then the function Z g.t C y/g.y/ dy; t 2 Rd f .t / WD g g.t Q /D Rd
is radial, f 2 P c .Rd /, and f vanishes at infinity. Proof. In view of Lemma 1.5.4 it remains to prove that f is radial. Let O 2 O.d / be an orthogonal matrix. Using the fact that the Lebesgue measure is invariant under orthogonal transformations we obtain Z g.O t C y/g.y/ dy f .O t / D d R Z g.O t C Oy/g.Oy/ dy D d ZR D g.t C y/g.y/ dy D f .t /; t 2 Rd Rd
i.e., f is radial. Using Bessel functions, the Fourier transform of a radial function can be expressed as a univariate integral. Theorem 3.6.6. The Fourier transform of a radial function f 2 L1 .Rd / is given by Z 1 .2/d=2 O f .t / D r d=2 Jd=21 .rkt k/f .r e/ dr; t 2 Rd n f0g; kt kd=21 0 where e 2 Rd is an arbitrary element of unit length. q 2 cos x the equation above is trivial if d D 1. Suppose Proof. Since J1=2 .x/ D x that d > 1. By Lemma 3.6.3 the function fO is radial. We may therefore assume that
163
Section 3.6 Fourier transformation of radial measures and functions
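$e = (1, 0, \dots, 0)$. Writing $h(r) := f(re)$, $r \ge 0$, we have
\[ \hat f(t) = \hat f(\|t\| e) = \int_{\mathbb{R}^d} e^{i\|t\|x_1} h(\|x\|)\,dx = \int_{\mathbb{R}} e^{i\|t\|x_1} \int_{\mathbb{R}^{d-1}} h\Big(\sqrt{x_1^2 + \|y\|^2}\Big)\,dy\,dx_1. \]
Using Corollary B.7.7 we see that the integral over $\mathbb{R}^{d-1}$ is equal to
\[ \omega_{d-2}(S^{d-2}) \int_0^\infty h\Big(\sqrt{x_1^2 + u^2}\Big)\, u^{d-2}\,du \]
where, in view of Proposition B.7.8,
\[ \omega_{d-2}(S^{d-2}) = \frac{2\pi^{(d-1)/2}}{\Gamma((d-1)/2)}. \]
Therefore, we have
\[ \hat f(t) = \omega_{d-2}(S^{d-2}) \int_{-\infty}^{\infty} e^{i\|t\|x_1} \int_0^\infty h\Big(\sqrt{x_1^2 + u^2}\Big)\, u^{d-2}\,du\,dx_1. \]
Introducing polar coordinates $(r, \varphi)$ by $x_1 = r\cos\varphi$ and $u = r\sin\varphi$, it follows that
\[ \hat f(t) = \omega_{d-2}(S^{d-2}) \int_0^\infty h(r)\, r^{d-1} \int_0^\pi e^{i\|t\| r\cos\varphi} \sin^{d-2}\varphi\,d\varphi\,dr. \]
From Lemma C.5.3 we know that
\[ J_\nu(x) = \frac{(x/2)^\nu}{\sqrt{\pi}\,\Gamma(\nu + \tfrac12)} \int_0^\pi e^{ix\cos\varphi} \sin^{2\nu}\varphi\,d\varphi, \qquad x \ge 0,\ \nu \ge 0, \]
from which the assertion follows (take $\nu = d/2 - 1$).

Corollary 3.6.7. If $f \in L^1(\mathbb{R}^3)$, then
\[ \hat f(t) = \frac{4\pi}{\|t\|} \int_0^\infty r \sin(r\|t\|)\, f(re)\,dr, \qquad t \in \mathbb{R}^3 \setminus \{0\}. \]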
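As a numerical sanity check of Corollary 3.6.7, the following minimal sketch evaluates the one-dimensional integral for the radial profile $f(re) = e^{-r}$ and compares it with the closed form $8\pi/(1+\|t\|^2)^2$, which is the case $d = 3$ of Lemma 3.7.3. The helper names (`simpson`, `fourier_radial_3d`, `closed_form`) and the truncation of the integral at $r = 60$ are assumptions made for this illustration only; the transform is taken without a normalizing constant, as in Theorem 3.6.6.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def fourier_radial_3d(profile, s, upper=60.0):
    """(4*pi/s) * integral_0^upper r*sin(r*s)*profile(r) dr (Corollary 3.6.7, truncated)."""
    return 4 * math.pi / s * simpson(lambda r: r * math.sin(r * s) * profile(r), 0.0, upper)

def closed_form(s):
    """Lemma 3.7.3 with d = 3: 2^3 * pi * Gamma(2) / (1 + s^2)^2."""
    return 8 * math.pi / (1 + s * s) ** 2

for s in (0.5, 1.0, 3.0):
    # The two columns should agree up to quadrature error.
    print(s, fourier_radial_3d(lambda r: math.exp(-r), s), closed_form(s))
```

The truncation is harmless here because the integrand decays like $r e^{-r}$; for slowly decaying profiles a different quadrature would be needed.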
If $f$ is a complex-valued function on $[0, \infty)$ such that $F_d(t) := f(\|t\|)$, $t \in \mathbb{R}^d$, is Lebesgue integrable, then the Fourier transform of $F_d$ is radial. We will use the notation
\[ \hat F_d(t) = \mathcal{F}_d(f)(\|t\|), \qquad t \in \mathbb{R}^d. \]
Next we prove a recursion formula for the Fourier transformation for radial functions.^{19}

^{19} See [54] and [26] for more information on recursion formulae.
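Theorem 3.6.8. Let $f : [0, \infty) \to \mathbb{C}$ be such that the functions $t \mapsto f(\|t\|)$, $t \in \mathbb{R}^{d+2}$, and $t \mapsto f(\|t\|)$, $t \in \mathbb{R}^d$, are in $L^1(\mathbb{R}^{d+2})$ and $L^1(\mathbb{R}^d)$, respectively. Then, with the Fourier transform normalized as in Theorem 3.6.6, we have
\[ \mathcal{F}_{d+2}(f)(r) = -2\pi\, \frac{1}{r}\, \frac{d}{dr}\, \mathcal{F}_d(f)(r), \qquad r > 0. \tag{1} \]

Proof. In view of Theorem 3.6.6
\[ \mathcal{F}_d(f)(s) = (2\pi)^{d/2} \int_0^\infty \tilde J_{d/2-1}(rs)\, f(r)\, r^{d-1}\,dr, \qquad s > 0 \tag{2} \]
where $\tilde J_\nu(x) := x^{-\nu} J_\nu(x)$. By the second equation in Proposition C.5.5 we have^{20}
\[ \frac{d}{dx} \tilde J_\nu(x) = -x\, \tilde J_{\nu+1}(x), \tag{3} \]
while Lemma C.5.2 shows that
\[ |\tilde J_\nu(x)| \le C_\nu, \qquad \nu \ge 0, \]
with some positive constant $C_\nu$. Using equation (3) and differentiating both sides of equation (2) with respect to $s$ we obtain
\[ \frac{d}{ds}\mathcal{F}_d(f)(s) = -s\,(2\pi)^{d/2} \int_0^\infty \tilde J_{d/2}(rs)\, f(r)\, r^{d+1}\,dr = -\frac{s}{2\pi}\, \mathcal{F}_{d+2}(f)(s), \]
which is (1). The fact that interchanging differentiation and integration is permissible follows from
\[ \int_0^\infty \Big| \frac{d}{ds} \tilde J_{d/2-1}(rs) \Big|\, |f(r)|\, r^{d-1}\,dr = s \int_0^\infty \big| \tilde J_{d/2}(rs) \big|\, |f(r)|\, r^{d+1}\,dr \le s\, C_{d/2} \int_0^\infty |f(r)|\, r^{d+1}\,dr < \infty. \]
That the last integral is finite is a consequence of Corollary B.7.7 since, by assumption, the function $t \mapsto f(\|t\|)$, $t \in \mathbb{R}^{d+2}$, is in $L^1(\mathbb{R}^{d+2})$.

Using induction on $d$ we immediately obtain the following corollary:

Corollary 3.6.9. Let $f : [0, \infty) \to \mathbb{C}$ be such that the functions $t \mapsto f(\|t\|)$, $t \in \mathbb{R}^n$, are in $L^1(\mathbb{R}^n)$ for all $1 \le n \le 2d + 2$. Then we have
\[ \mathcal{F}_{2d+1}(f)(r) = (2\pi)^d \sum_{j=1}^{d} \frac{(-1)^j (2d-j-1)!}{2^{d-j} (d-j)!\,(j-1)!}\, \frac{1}{r^{2d-j}}\, \frac{d^j}{dr^j}\, \mathcal{F}_1(f)(r) \]
and
\[ \mathcal{F}_{2d+2}(f)(r) = (2\pi)^d \sum_{j=1}^{d} \frac{(-1)^j (2d-j-1)!}{2^{d-j} (d-j)!\,(j-1)!}\, \frac{1}{r^{2d-j}}\, \frac{d^j}{dr^j}\, \mathcal{F}_2(f)(r). \]

^{20} See also Remark C.5.6.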
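The dimension recursion can be checked on the example $f(r) = e^{-r}$, for which Lemma 3.7.3 gives the radial transform in closed form in every dimension. The sketch below assumes the Fourier transform without a normalizing constant (the convention of Theorem 3.6.6), in which case one step of the recursion is $\mathcal{F}_{d+2}(f)(r) = -(2\pi/r)\,\frac{d}{dr}\mathcal{F}_d(f)(r)$. The function names and the central-difference step are choices made for this illustration.

```python
import math

def transform_exp(d, s):
    """Lemma 3.7.3: transform of t -> exp(-||t||) on R^d, as a function of s = ||t||."""
    return (2 ** d * math.pi ** ((d - 1) / 2) * math.gamma((d + 1) / 2)
            / (1 + s * s) ** ((d + 1) / 2))

def recursion_step(d, r, h=1e-5):
    """-(2*pi/r) * d/dr F_d(f)(r), with the derivative taken by a central difference."""
    derivative = (transform_exp(d, r + h) - transform_exp(d, r - h)) / (2 * h)
    return -2 * math.pi * derivative / r

for d in (1, 2, 3):
    for r in (0.5, 2.0):
        # Left: one step of the recursion from dimension d; right: closed form in d + 2.
        print(d, r, recursion_step(d, r), transform_exp(d + 2, r))
```

With a differently normalized transform the constant in front of the recursion changes accordingly, so the factor $2\pi$ above should be read as tied to this convention.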
3.7 Radial characteristic functions
Radial positive definite functions have many applications in probability and statistics. They occur for example as characteristic functions of radial distributions and as covariance functions of isotropic random fields. We have already shown (cf. Lemma 1.3.4) that the characteristic function of the radial density
\[ \varphi(x) = \frac{1}{(2\pi)^{d/2}}\, e^{-\frac12 \|x\|^2}, \qquad x \in \mathbb{R}^d \]
is given by
\[ g(t) = e^{-\frac12 \|t\|^2}, \qquad t \in \mathbb{R}^d. \]
We continue with further examples of radial characteristic functions.

Lemma 3.7.1. Let $U$ be a random vector that is uniformly distributed on the closed ball $B^c(R) \subset \mathbb{R}^d$. Then its characteristic function $f_U$ (see Figure 3.5) is given by
\[ f_U(t) = \Gamma(d/2 + 1) \Big( \frac{2}{R\|t\|} \Big)^{d/2} J_{d/2}(R\|t\|), \qquad t \in \mathbb{R}^d \setminus \{0\}. \]

Proof. The case $d = 1$ being simple, assume that $d > 1$. Theorem 3.6.6 and Corollary C.5.5 show that
\[ f_U(t) = \frac{(2\pi)^{d/2}}{\lambda_d(B^c(R))}\, \|t\|^{1-d/2} \int_0^R r^{d/2} J_{d/2-1}(r\|t\|)\,dr = \frac{(2\pi)^{d/2}}{\lambda_d(B^c(R))}\, \|t\|^{-d} \int_0^{R\|t\|} x^{d/2} J_{d/2-1}(x)\,dx = \frac{(2\pi)^{d/2}}{\lambda_d(B^c(R))}\, \|t\|^{-d} \Big[ x^{d/2} J_{d/2}(x) \Big]_{x=0}^{R\|t\|} \]
from which the lemma follows, since $\lambda_d(B^c(R)) = \pi^{d/2} R^d / \Gamma(d/2 + 1)$.
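The formula of Lemma 3.7.1 can be compared with a direct computation in the case $d = 2$, where the characteristic function of the uniform distribution on the disc of radius $R$ reduces, after integrating out the second coordinate and substituting $x_1 = R\sin\theta$, to $\frac{2}{\pi}\int_{-\pi/2}^{\pi/2} \cos(sR\sin\theta)\cos^2\theta\,d\theta$ with $s = \|t\|$. The following sketch is an illustration under these assumptions; `bessel_j` sums the power series of $J_\nu$ (cf. Theorem C.5.4) and is only meant for moderate arguments.

```python
import math

def bessel_j(nu, x, terms=60):
    """Power series of J_nu(x); adequate for moderate x (say 0 <= x <= 15)."""
    total = 0.0
    for k in range(terms):
        total += ((-1) ** k / (math.factorial(k) * math.gamma(k + nu + 1))
                  * (x / 2) ** (2 * k + nu))
    return total

def cf_uniform_ball(d, R, s):
    """Lemma 3.7.1: Gamma(d/2 + 1) * (2/(R s))^(d/2) * J_{d/2}(R s) for s = ||t|| > 0."""
    return math.gamma(d / 2 + 1) * (2 / (R * s)) ** (d / 2) * bessel_j(d / 2, R * s)

def cf_uniform_disc_direct(R, s, n=2000):
    """(2/pi) * integral over [-pi/2, pi/2] of cos(s R sin(theta)) cos(theta)^2 (Simpson rule)."""
    a, b = -math.pi / 2, math.pi / 2
    h = (b - a) / n
    g = lambda th: math.cos(s * R * math.sin(th)) * math.cos(th) ** 2
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return 2 / math.pi * total * h / 3

for s in (0.5, 2.0, 7.0):
    print(s, cf_uniform_ball(2, 1.0, s), cf_uniform_disc_direct(1.0, s))
```

The first sign change of the printed values near $s \approx 3.83$ (the first positive zero of $J_1$) corresponds to the first zero crossing visible in Figure 3.5.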
Recall that a d-dimensional random vector is said to be uniformly distributed on the d-dimensional sphere of radius R if the distribution of X=R is equal to the measure d introduced in Theorem B.7.5. Lemma 3.7.2. Let U be a random vector that is uniformly distributed on the d-dimensional sphere of radius R. Then its characteristic function fU is given by
\[ f_U(t) = \Gamma(d/2) \Big( \frac{2}{R\|t\|} \Big)^{d/2-1} J_{d/2-1}(R\|t\|), \qquad t \in \mathbb{R}^d \setminus \{0\}. \]

Proof. To simplify the notation we may assume that $R = 1$. By Lemma 3.6.3 the function $f_U$ is radial and hence $f_U(t) = f_U(\|t\| e_1)$ where $e_1 = (1, 0, \dots, 0) \in \mathbb{R}^d$.
Figure 3.5. The characteristic function $f_U$ from Lemma 3.7.1 with $d = 2$ and $R = 1$ (shown as a function of the norm).
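Using this and equation (B.7.11.1) we obtain
\[ f_U(t) = \frac{1}{\omega_{d-1}(S^{d-1})} \int_{S^{d-1}} e^{i(t,u)}\,d\omega_{d-1}(u) = \frac{1}{\omega_{d-1}(S^{d-1})} \int_{S^{d-1}} e^{i\|t\|u_1}\,d\omega_{d-1}(u) = \frac{1}{\omega_{d-1}(S^{d-1})} \int_{S^{d-2}} \int_{-1}^{1} (1 - s^2)^{(d-3)/2}\, e^{i\|t\|s}\,ds\,d\omega_{d-2}(v). \]
The definition of $J_{d/2-1}$ (cf. Definition C.5.1 and Remark C.5.6) and Proposition B.7.8 lead immediately to the stated formula.

Lemma 3.7.3. The function $f$ defined by
\[ f(t) = e^{-\|t\|}, \qquad t \in \mathbb{R}^d,\ d \in \mathbb{N} \]
is a characteristic function and
\[ \hat f(t) = 2^d \pi^{(d-1)/2}\, \Gamma((d+1)/2)\, (1 + \|t\|^2)^{-(d+1)/2}, \qquad t \in \mathbb{R}^d. \tag{1} \]

Proof. Theorem 3.6.6 shows that
\[ \hat f(t) = (2\pi)^{d/2}\, \|t\|^{1-d/2} \int_0^\infty r^{d/2} J_{d/2-1}(r\|t\|)\, e^{-r}\,dr, \qquad t \in \mathbb{R}^d \setminus \{0\}. \]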
Using the formula
\[ J_{d/2-1}(x) = \Big( \frac{x}{2} \Big)^{d/2-1} \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\, \Gamma(k + d/2)} \Big( \frac{x}{2} \Big)^{2k} \]
(cf. Theorem C.5.4) we obtain
\[ \hat f(t) = 2\pi^{d/2} \sum_{k=0}^{\infty} \frac{(-1)^k \|t\|^{2k}}{2^{2k}\, k!\, \Gamma(k + d/2)} \int_0^\infty r^{2k+d-1} e^{-r}\,dr \]
for all $t$ where the power series above is absolutely convergent. The definition of the Gamma function and Legendre's duplication formula (cf. Proposition C.4.5) show that the integral above is equal to
\[ \frac{1}{\sqrt{\pi}}\, 2^{2k+d-1}\, \Gamma(k + d/2)\, \Gamma(k + (d+1)/2) \]
and hence
\[ \hat f(t) = 2^d \pi^{(d-1)/2} \sum_{k=0}^{\infty} \frac{(-1)^k\, \Gamma(k + (d+1)/2)}{k!}\, \|t\|^{2k}, \qquad \|t\| < 1. \]
Using this and Proposition B.1.3 we see that equation (1) holds whenever $\|t\| < 1$ while Theorem 3.3.6 implies that it holds for all $t \in \mathbb{R}^d$. The fact that $f$ is a characteristic function follows from $\hat f \ge 0$.

Corollary 3.7.4. Let $P$ be a polynomial with real coefficients such that $P(0) = 1$ and $P(z) \ne 0$ if $\operatorname{Re} z \ge 0$. Then
\[ t \mapsto 1/P(\|t\|)^b, \qquad t \in \mathbb{R}^d \]
is a characteristic function for all $b > 0$ and all $d \in \mathbb{N}$.

Proof. Since $P$ has no zero with nonnegative real part, it is the product of polynomials of the form $Q(t) = 1 + pt + qt^2$ where $p, q \ge 0$. Thus, it suffices to show that $t \mapsto 1/Q(\|t\|)^b$ is positive definite. We have
\[ \Gamma(b) = \int_0^\infty x^{b-1} e^{-x}\,dx = s^b \int_0^\infty u^{b-1} e^{-su}\,du, \qquad b, s > 0. \]
Setting $s = Q(\|t\|)$ we obtain
\[ \frac{1}{Q(\|t\|)^b} = \frac{1}{\Gamma(b)} \int_0^\infty u^{b-1} e^{-Q(\|t\|)u}\,du, \qquad b > 0,\ t \in \mathbb{R}^d. \tag{1} \]
Using Lemma 3.7.3 and Lemma 1.3.4 we see that
\[ t \mapsto e^{-Q(\|t\|)u} = e^{-u}\, e^{-up\|t\|}\, e^{-uq\|t\|^2}, \qquad t \in \mathbb{R}^d \]
is positive definite if $u \ge 0$. Therefore, the Corollary follows from Lemma 1.5.7.
Taking the inverse Fourier transform in equation (3.7.3.1), we see that the function $t \mapsto (1 + \|t\|^2)^{-(d+1)/2}$ is the characteristic function of an absolutely continuous distribution. The next theorem states more.

Theorem 3.7.5. The function
\[ f(t) = \frac{1}{(1 + \|t\|^2)^b}, \qquad t \in \mathbb{R}^d \]
is a characteristic function for all $b > 0$. If $b > \frac{d}{2}$, then^{21}
\[ \hat f(x) = \frac{2^{1-b}}{\Gamma(b)}\, \|x\|^{b-d/2}\, K_{b-d/2}(\|x\|), \qquad x \in \mathbb{R}^d \setminus \{0\}. \]

Proof. The first statement follows from Corollary 3.7.4. If $b > d/2$, then $f$ is Lebesgue integrable (cf. Corollary B.7.7). By equation (3.7.4.1) we have
\[ f(t) = \frac{1}{\Gamma(b)} \int_0^\infty u^{b-1} e^{-(1 + \|t\|^2)u}\,du. \]
Figure 3.6. The function $\hat f$ from Theorem 3.7.5 with $b = 3$ and $d = 4$ (shown as a function of the norm).
^{21} Recall that $K_\nu$ denotes the modified Bessel function (cf. Definition C.5.7). The function $\hat f$ is shown in Figure 3.6.
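Consequently,
\[ \hat f(x) = (2\pi)^{-d/2} \int_{\mathbb{R}^d} f(t)\, e^{i(t,x)}\,dt = (2\pi)^{-d/2}\, \frac{1}{\Gamma(b)} \int_{\mathbb{R}^d} \int_0^\infty u^{b-1} e^{-(1 + \|t\|^2)u}\, e^{i(t,x)}\,du\,dt = (2\pi)^{-d/2}\, \frac{1}{\Gamma(b)} \int_0^\infty u^{b-1} e^{-u} \int_{\mathbb{R}^d} e^{-\|t\|^2 u}\, e^{i(t,x)}\,dt\,du. \]
Using Lemma 1.3.4 we obtain
\[ \hat f(x) = \frac{1}{2^{d/2}\, \Gamma(b)} \int_0^\infty u^{b-d/2-1}\, e^{-u}\, e^{-\|x\|^2/(4u)}\,du. \]
From Lemma C.5.8 we know that
\[ K_\nu(r) = \frac{1}{2a^\nu} \int_0^\infty e^{-\frac{r}{2}\left( \frac{u}{a} + \frac{a}{u} \right)}\, u^{\nu-1}\,du, \qquad a, r > 0,\ \nu \in \mathbb{R}. \]
Setting $r = \|x\|$, $a = \|x\|/2$, and $\nu = b - d/2$ leads to
\[ K_{b-d/2}(\|x\|) = \frac12 \Big( \frac{\|x\|}{2} \Big)^{d/2-b} \int_0^\infty e^{-u}\, e^{-\|x\|^2/(4u)}\, u^{b-d/2-1}\,du = 2^{b-1}\, \Gamma(b)\, \|x\|^{d/2-b}\, \hat f(x), \qquad x \ne 0, \]
from which the second statement follows.

Next we compute the self-convolution of the indicator function of a ball.

Lemma 3.7.6 (Euclid's hat). The function^{22}
\[ g_d := \frac{1}{\lambda_d(B^c(1/2))}\, 1_{B^c(1/2)} * 1_{B^c(1/2)} \]
is a radial characteristic function on $\mathbb{R}^d$ and we have
\[ g_d(t) = \begin{cases} \dfrac{d\, \Gamma(d/2)}{\sqrt{\pi}\, \Gamma((d+1)/2)} \displaystyle\int_{\|t\|}^{1} (1 - x^2)^{(d-1)/2}\,dx, & t \in B^c(1), \\[2ex] 0, & t \in \mathbb{R}^d \setminus B^c(1). \end{cases} \]
The distribution corresponding to $g_d$ is absolutely continuous with density (see Figure 3.7)
\[ p_d(x) = \frac{\Gamma(d/2 + 1)}{\pi^{d/2}}\, \frac{J_{d/2}^2(\|x\|/2)}{\|x\|^d}, \qquad x \in \mathbb{R}^d \setminus \{0\}. \]

^{22} Scale mixtures of Euclid's hat have been investigated in [20].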
Figure 3.7. The density $p_d$ from Lemma 3.7.6 with $d = 2$ (shown as a function of the norm).
Proof. It follows from Lemma 3.6.5 that $g_d$ is a radial characteristic function. To prove the remaining statements we may assume that $d \ge 2$.^{23} We have
\[ q(t) := 1_{B^c(1/2)} * 1_{B^c(1/2)}(t) = \int_{\mathbb{R}^d} 1_{B^c(1/2)}(t + y)\, 1_{B^c(1/2)}(y)\,dy = \lambda_d\big( B^c(1/2) \cap (B^c(1/2) - t) \big). \]
If $\|t\| > 1$, then the measure above is zero. Assume that $\|t\| \le 1$ and let $e := (1, 0, \dots, 0)$. Since $q$ is radial, we have $q(t) = q(\|t\| e)$. Using the fact that the set $B^c(1/2) \cap (B^c(1/2) + \|t\| e)$ is symmetric with respect to the hyperplane $\{ y \in \mathbb{R}^d : y_1 = \|t\|/2 \}$ we obtain
\[ q(t) = 2 \lambda_d \big\{ y \in \mathbb{R}^d : \|t\|/2 \le y_1 \le 1/2,\ \|y\| \le 1/2 \big\} = 2 \int_{\|t\|/2}^{1/2} \lambda_{d-1}\Big( B^c\Big( \sqrt{\tfrac14 - s^2} \Big) \Big)\,ds. \]
By Proposition B.7.9
\[ q(t) = \frac{\pi^{(d-1)/2}}{2^{d-2}\, \Gamma((d+1)/2)} \int_{\|t\|/2}^{1/2} (1 - 4s^2)^{\frac{d-1}{2}}\,ds = \frac{\pi^{(d-1)/2}}{2^{d-1}\, \Gamma((d+1)/2)} \int_{\|t\|}^{1} (1 - x^2)^{\frac{d-1}{2}}\,dx \]
from which the representation of $g_d$ follows.

^{23} We already considered the case $d = 1$ in Remark 1.5.5.
From Lemma 3.7.1 we conclude that the Fourier transform of $1_{B^c(1/2)}$ is given by
\[ \hat 1_{B^c(1/2)}(x) = \frac{J_{d/2}(\|x\|/2)}{(2\|x\|)^{d/2}}. \]
Thus, the last statement is an immediate consequence of Theorem 1.3.6 and equation (1.8.2.iii).

Remark 3.7.7. Write the function $g_d$ in the form $g_d(t) = h_d(\|t\|)$. For $0 \le s \le 1$ we then have
\[ h_1(s) = 1 - s \qquad \text{and} \qquad h_2(s) = \frac{2}{\pi} \big( \arccos s - s (1 - s^2)^{1/2} \big) \]
(see Figure 3.8). Partial integration of the integral in Lemma 3.7.6 gives the recursion formula^{24}
\[ h_d(s) = h_{d-2}(s) - \frac{\Gamma(d/2)}{\sqrt{\pi}\, \Gamma((d+1)/2)}\, s\, (1 - s^2)_+^{(d-1)/2}, \qquad s \in \mathbb{R},\ d \ge 3. \]
The derivative
\[ h_d'(s) = -\frac{d\, \Gamma(d/2)}{\sqrt{\pi}\, \Gamma((d+1)/2)}\, (1 - s^2)_+^{(d-1)/2} \]
is nonpositive and nondecreasing, so that $h_d$ is convex and nonincreasing for all $d \in \mathbb{N}$.^{25}

Figure 3.8. The functions $h_2$, $h_3$ and $h_4$ from Remark 3.7.7.

^{24} See also the proof of Proposition C.1.10.
^{25} More information on positive definite functions related to Euclid's hat can be found in [20].
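The closed forms and the recursion of Remark 3.7.7 can be checked numerically against the integral representation of Lemma 3.7.6. The sketch below is an illustration only: the function name `hat` is an assumption, and the substitution $x = \cos\theta$, which turns $\int_s^1 (1 - x^2)^{(d-1)/2}\,dx$ into $\int_0^{\arccos s} \sin^d\theta\,d\theta$, is used merely to keep the integrand smooth for Simpson's rule.

```python
import math

def hat(d, s, n=2000):
    """h_d(s) from Lemma 3.7.6 for s >= 0, via Simpson's rule after substituting x = cos(theta)."""
    if s >= 1:
        return 0.0
    c = d * math.gamma(d / 2) / (math.sqrt(math.pi) * math.gamma((d + 1) / 2))
    b = math.acos(s)
    step = b / n
    total = math.sin(0.0) ** d + math.sin(b) ** d
    for i in range(1, n):
        total += (4 if i % 2 else 2) * math.sin(i * step) ** d
    return c * total * step / 3

def recursion_term(d, s):
    """Correction term Gamma(d/2) / (sqrt(pi) Gamma((d+1)/2)) * s * (1 - s^2)_+^((d-1)/2)."""
    return (math.gamma(d / 2) / (math.sqrt(math.pi) * math.gamma((d + 1) / 2))
            * s * max(1 - s * s, 0.0) ** ((d - 1) / 2))

for s in (0.0, 0.3, 0.8):
    h2_closed = 2 / math.pi * (math.acos(s) - s * math.sqrt(1 - s * s))
    print(s, hat(1, s), 1 - s, hat(2, s), h2_closed,
          hat(3, s), hat(1, s) - recursion_term(3, s))
```

For $d = 3$ the recursion gives $h_3(s) = 1 - \tfrac32 s + \tfrac12 s^3$ on $[0, 1]$, which is the third pair of columns in the output.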
Corollary 3.7.8. The function $\varphi_d$ defined by
\[ \varphi_d(t) = (1 - \|t\|)_+^{\lfloor \frac{d+2}{2} \rfloor}, \qquad t \in \mathbb{R}^d \]
is a characteristic function.

Proof. If $d$ is even, then $\varphi_d(t) = \varphi_{d+1}(t, 0)$, $t \in \mathbb{R}^d$. Therefore, it suffices to consider the case where $d = 2n + 1$, $n \in \mathbb{N}$. Lemma 3.7.6 shows that the function
\[ g_d(t) = c\, 1_{B^c(1)}(t)\, P_n(\|t\|), \qquad t \in \mathbb{R}^d \]
is positive definite, where $c$ is some positive constant and
\[ P_n(s) = \int_s^1 (1 - x^2)^n\,dx, \qquad s \in \mathbb{R}. \]
By Proposition C.1.10
\[ P_n(s) = (1 - s)^{n+1} Q_n(s), \]
where $Q_n$ is a polynomial of degree $n$, having only zeros with negative real part. Using the definition of $P_n$ we see that
\[ (1 - \|t\|)_+^{n+1} = \frac{1}{c\, Q_n(\|t\|)}\, g_d(t), \qquad t \in \mathbb{R}^d. \]
Thus, the statement follows from Corollary 3.7.4.
3.8 Schoenberg's theorems on radial characteristic functions

In this section we first characterize radial characteristic functions on $\mathbb{R}^d$ (Theorem 3.8.2). Then we investigate what happens if the dimension $d$ tends to infinity (Theorem 3.8.5).

3.8.1. We will use the notation
\[ \Omega_d(r) = \Gamma(d/2) \Big( \frac{2}{r} \Big)^{d/2-1} J_{d/2-1}(r), \qquad r > 0 \tag{1} \]
and $\Omega_d(0) = 1$ (see Figure 3.9). By Lemma 3.7.2, $t \mapsto \Omega_d(\|t\|)$ is the characteristic function of a random vector that is uniformly distributed on $S^{d-1}$.

Theorem 3.8.2 (Schoenberg). A real-valued function $f$ on $\mathbb{R}^d$ is a radial characteristic function if and only if $f(t) = \phi(\|t\|)$ where
\[ \phi(r) = \int_0^\infty \Omega_d(rs)\,d\mu(s) \tag{1} \]
for some probability measure $\mu$ on $[0, \infty)$.
Figure 3.9. The function $\Omega_4$ from equation (3.8.1.1).
Proof. Since $t \mapsto \Omega_d(\|t\| s)$ is a positive definite function on $\mathbb{R}^d$ for all $s$, the function $f(t) := \phi(\|t\|)$, where $\phi$ is defined by (1), is positive definite as well. Thus, $f$ is a characteristic function by Bochner's Theorem 1.7.3. The other direction follows from the fact that $f$ is the characteristic function of the random vector $RX$ where $R$ is a random variable with distribution $\mu$ and $X$ is a random vector uniformly distributed on $S^{d-1}$ and independent of $R$ (cf. Theorem 1.1.6).

As the proof of the preceding theorem shows, we have the following characterization of random vectors with a radial distribution.

Corollary 3.8.3. A $d$-dimensional random vector $Y$ has a radial distribution if and only if there exist a nonnegative random variable $R$ and a random vector $X$ which is uniformly distributed on $S^{d-1}$ such that $R$ and $X$ are independent and $RX$ has the same distribution as $Y$.

The set of all characteristic functions $\phi$ on $\mathbb{R}$ such that $t \mapsto \phi(\|t\|)$ is a characteristic function on $\mathbb{R}^d$ is denoted by $\Phi_d$. If $\phi \in \Phi_d$ for all $d$, then we write $\phi \in \Phi_\infty$. Notice that
\[ \Phi_1 \supset \Phi_2 \supset \dots \supset \Phi_\infty. \]

Lemma 3.8.4. The relation
\[ \lim_{d \to \infty} \Omega_d(r\sqrt{d}) = e^{-r^2/2} \]
holds uniformly for $r \in \mathbb{R}$.
Proof. Using the definition of the Bessel function $J_{d/2-1}$ (see Definition C.5.1) it is easy to see that $r \mapsto \Omega_d(r\sqrt{d})$, $r \in \mathbb{R}$, is the characteristic function corresponding to the density
\[ p_d(x) = \frac{\Gamma(d/2)}{\sqrt{\pi d}\, \Gamma((d-1)/2)} \Big( 1 - \frac{x^2}{d} \Big)^{(d-3)/2} 1_{(-\sqrt{d}, \sqrt{d})}(x), \qquad x \in \mathbb{R}. \]
In view of Stirling's formula (cf. Corollary C.4.10)
\[ \lim_{d \to \infty} \frac{\sqrt{2}\, \Gamma(d/2)}{\sqrt{d}\, \Gamma((d-1)/2)} = 1. \]
Using this and the relation
\[ \lim_{d \to \infty} \Big( 1 - \frac{x^2}{d} \Big)^{(d-3)/2} = e^{-x^2/2} \]
we see that
\[ \lim_{d \to \infty} p_d(x) = p_\infty(x) := \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad x \in \mathbb{R}. \]
The characteristic function corresponding to the density $p_\infty$ is $r \mapsto e^{-r^2/2}$. Therefore, the lemma follows from Proposition 1.6.6.

Theorem 3.8.5 (Schoenberg). A characteristic function $\phi$ on $\mathbb{R}$ belongs to $\Phi_\infty$ if and only if
\[ \phi(r) = \int_0^\infty e^{-r^2 s^2/2}\,d\mu(s), \qquad r \in \mathbb{R} \]
for some probability measure $\mu$ on $[0, \infty)$.

Proof. Since $t \mapsto e^{-\|t\|^2 s^2/2}$ is a characteristic function on $\mathbb{R}^d$ for all $s$ and for all $d$ (cf. Lemma 1.3.4), the same argument as in the proof of Theorem 3.8.2 shows that $\phi \in \Phi_\infty$. Next assume that $\phi \in \Phi_d$ for all $d$. Then for each $d$ there exists a probability measure $\mu_d$ on $[0, \infty)$ such that
\[ \phi(r) = \int_0^\infty \Omega_d(rs\sqrt{d})\,d\mu_d(s), \qquad r \in \mathbb{R}. \tag{1} \]
In view of Theorem E.1.13, the sequence $\{\mu_d\}_{d=1}^\infty$ contains a subsequence $\{\mu_{d_k}\}_{k=1}^\infty$ converging vaguely to some finite nonnegative measure $\mu$ satisfying $\mu(\mathbb{R}) \le 1$. Let $\varepsilon > 0$ and $r \in \mathbb{R} \setminus \{0\}$ be arbitrary. Equation (1) and Lemma 3.8.4 imply
\[ \phi(r) = \int_0^\infty e^{-r^2 s^2/2}\,d\mu_d(s) + \varepsilon_d(r) \]
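where $|\varepsilon_d| < \varepsilon$ for sufficiently large values of $d$. This implies that
\[ \phi(r) = \int_0^a e^{-r^2 s^2/2}\,d\mu_d(s) + \varepsilon_{d,a}(r) \]
where $|\varepsilon_{d,a}| < 2\varepsilon$ whenever both $a$ and $d$ are sufficiently large. If $a$ is a continuity point of $\mu$, letting $d \to \infty$ we get
\[ \phi(r) = \int_0^a e^{-r^2 s^2/2}\,d\mu(s) + \delta_a(r) \]
where $|\delta_a| < 2\varepsilon$. Finally, letting $a \to \infty$ and then $\varepsilon \to 0$ we obtain the desired representation of $\phi$.^{26}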
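The limit relation of Lemma 3.8.4, on which the proof above rests, is easy to observe numerically. The following sketch sums the power series of $\Omega_d$ obtained from the series of $J_{d/2-1}$, namely $\Omega_d(r) = \sum_{k \ge 0} \frac{(-1)^k\, \Gamma(d/2)}{k!\, \Gamma(k + d/2)} (r/2)^{2k}$, by a term recurrence; the function name `omega` and the number of terms are assumptions for this illustration, and the series is only evaluated for moderate arguments.

```python
import math

def omega(d, r, terms=200):
    """Omega_d(r) = Gamma(d/2) (2/r)^(d/2-1) J_{d/2-1}(r), summed as a power series in (r/2)^2."""
    q = (r / 2.0) ** 2
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # Ratio of consecutive series terms: -q / ((k + 1) (k + d/2)).
        term *= -q / ((k + 1) * (k + d / 2.0))
    return total

for d in (3, 30, 300, 3000):
    r = 1.5
    print(d, omega(d, r * math.sqrt(d)), math.exp(-r * r / 2))
```

For $d = 3$ the function reduces to $\Omega_3(r) = \sin(r)/r$, which gives an independent check of the series; as $d$ grows, the first column of values approaches the Gaussian value in the second.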
3.9 Convex and completely monotone functions

As we will see in this section, sufficient conditions for positive definiteness can be formulated in terms of convexity. Convexity conditions can be checked efficiently by taking derivatives and checking for nonnegativity. Using numerical algorithms this leads to simple and powerful tests for positive definiteness. One of the main results of the present section is Theorem 3.9.8 which describes the connection between radial characteristic functions and completely monotone functions.

Definition 3.9.1. For $a \in \mathbb{R}$ and $n \in \mathbb{N}_0$ we denote by $C_n^a$ the set of all functions $g : [a, \infty) \to \mathbb{R}$ having the following properties:
(i) $g$ is bounded;
(ii) $g$ is $n$-times differentiable;^{27}
(iii) $g^{(n)}$ is monotone.
Note that $C_0^a$ is the set of functions which are bounded and monotone.

Lemma 3.9.2. For all $n \in \mathbb{N}$ we have
\[ C_n^a \subset C_{n-1}^a \subset \dots \subset C_0^a. \]
Moreover, the derivatives $g^{(k)}$, $1 \le k \le n$, of a function $g \in C_n^a$ are alternately nonnegative and decreasing or nonpositive and increasing.

Proof. Assume that $g \in C_n^a$ for some $n \ge 1$. Without loss of generality we may further assume that $g^{(n)}$ is decreasing. We prove that $g^{(n)}$ is nonnegative and $g^{(n-1)}$ is nonpositive. The lemma follows then by induction on $n$.

^{26} See also the proof of Bernstein's Theorem 3.9.6 for a similar argument.
^{27} We consider differentiability from the right at $a$.
Since $g^{(n)}$ is monotone the limit
\[ \delta_n := \lim_{x \to \infty} g^{(n)}(x) \]
exists. We show that $\delta_n = 0$. If $\delta_n > 0$, then there exists $A_n > a$ such that $g^{(n)}(x) > \delta_n/2$ whenever $x \ge A_n$. Thus, the function $g^{(n-1)}$ is increasing on the interval $[A_n, \infty)$. Let $x \ge A_n$. By the mean value theorem there exists $\xi \in [x, x+1]$ such that
\[ g^{(n-1)}(x+1) = g^{(n-1)}(x) + g^{(n)}(\xi) > g^{(n-1)}(x) + \delta_n/2. \]
This relation shows that $\lim_{x \to \infty} g^{(n-1)}(x) = \infty$. Repeating this argument we obtain that $\lim_{x \to \infty} g(x) = \infty$, contradicting our assumption that $g$ is bounded. In the same way we see that $\delta_n < 0$ is not possible and hence $\delta_n = 0$. Using this and the fact that $g^{(n)}$ is decreasing, we conclude that $g^{(n)}$ must be nonnegative on $[a, \infty)$. This implies that $g^{(n-1)}$ is increasing. The same argument as above shows that $\lim_{x \to \infty} g^{(n-1)}(x) = 0$ and hence $g^{(n-1)}$ is nonpositive.

Lemma 3.9.3. If $n \in \mathbb{N}$ and $g \in C_n^a$, then $\lim_{x \to \infty} x^k g^{(k)}(x) = 0$ for all $k = 1, \dots, n$.

Proof. Lemma 3.9.2 shows that $g$ is monotone and hence the limit $\delta_0 := \lim_{x \to \infty} g(x)$ exists. Replacing $g$ by $g - \delta_0$ we may suppose that $\delta_0 = 0$. Without loss of generality we may further assume that $g$ is decreasing. Then $g \ge 0$, $g' \le 0$ and $g'$ is increasing. We consider first the case $n = 1$. For $0 < c < 1$ we have
\[ g(cx) = -\int_{cx}^{\infty} g'(u)\,du \ge -\int_{cx}^{x} g'(u)\,du \ge -(1 - c)\, x\, g'(x) \ge 0 \tag{1} \]
whenever $cx \ge a$, showing that $\lim_{x \to \infty} x g'(x) = 0$.
Assume that the lemma is true for some $n$ and let $g \in C_{n+1}^a$. Since $C_{n+1}^a \subset C_n^a$ we have $g^{(n)} \in C_1^a$. Applying the relation (1) for $g^{(n)}$ instead of $g$ we see that $\lim_{x \to \infty} x^{n+1} g^{(n+1)}(x) = 0$.

Theorem 3.9.4. Let $n \in \mathbb{N}_0$ and $g : [0, \infty) \to \mathbb{R}$ be an $n$-times differentiable bounded function such that $g^{(n)}$ is convex or concave. Then^{28}
\[ \lim_{x \to \infty} x^k g^{(k)}(x) = 0, \qquad k = 1, \dots, n + 1 \tag{1} \]

^{28} This result is essentially due to P. Lévy [40] though his formulation and proof is slightly different from ours.
where $g^{(n+1)}$ denotes the right-hand derivative.^{29} Moreover, the function
\[ F(u) := \frac{(-1)^{n+1}}{(n+1)!} \int_u^\infty v^{n+1}\,dg^{(n+1)}(v), \qquad u \in [0, \infty) \]
is monotone, bounded and
\[ g(x) = \int_0^\infty \Big( 1 - \frac{x}{u} \Big)_+^{n+1} dF(u) + \lim_{u \to \infty} g(u), \qquad x \in [0, \infty). \]

Proof. It suffices to consider the case where $g^{(n)}$ is convex. The same argument as in the proof of Lemma 3.9.3 shows that we may suppose $\lim_{u \to \infty} g(u) = 0$. By Theorem B.4.4 there exists $a \ge 0$ such that $g^{(n)}$ is monotone on $[a, \infty)$. Consequently, by Lemma 3.9.3, the relation (1) holds if $k \le n$. Since $g^{(n)}$ is convex it is absolutely continuous and $g^{(n+1)}$ is increasing (cf. Theorem B.4.7 and Theorem B.4.3). Consequently, we may replace $g$ by $g^{(n)}$ in relation (3.9.3.1). This leads to the inequality
\[ g^{(n)}(cx) \ge -(1 - c)\, x\, g^{(n+1)}(x) \ge 0, \qquad cx \ge a \]
from which (1) with $k = n + 1$ follows. Using the relation (1) repeated integration by parts (cf. relation (B.5.6.1)) gives
\[ g(0) = -\int_0^\infty g'(v)\,dv = \dots = \frac{(-1)^n}{(n+1)!} \int_0^\infty v^{n+1}\,dg^{(n+1)}(v). \]
This shows that the integral on the right-hand side converges. Thus, the integral in the definition of $F$ converges as well. It is clear that $F$ is monotone and bounded. Integrating by parts again and using Theorem B.5.4 we obtain
\[ g(x) = -\int_x^\infty g'(u)\,du = \frac{(-1)^n}{(n+1)!} \int_x^\infty (u - x)^{n+1}\,dg^{(n+1)}(u) = \int_x^\infty \frac{(u - x)^{n+1}}{u^{n+1}}\,dF(u) = \int_x^\infty \Big( 1 - \frac{x}{u} \Big)^{n+1} dF(u) = \int_0^\infty \Big( 1 - \frac{x}{u} \Big)_+^{n+1} dF(u). \]

Definition 3.9.5. An infinitely differentiable function $g : (0, \infty) \to \mathbb{R}$ is called completely monotone if
\[ (-1)^n g^{(n)}(x) \ge 0 \]
for all $n \in \mathbb{N}_0$ and $x > 0$. A continuous function $g : [0, \infty) \to \mathbb{R}$ is called completely monotone if $g|_{(0,\infty)}$ is completely monotone.

^{29} That the right-hand derivative exists follows from Theorem B.4.3.
The function $x \mapsto e^{-ux}$ is completely monotone for all $u \in [0, \infty)$. The next theorem shows that this function is the prototype of completely monotone functions.

Theorem 3.9.6 (Hausdorff–Bernstein–Widder). An infinitely differentiable function $g : [0, \infty) \to \mathbb{R}$ is completely monotone if and only if it admits the representation
\[ g(x) = \int_0^\infty e^{-ux}\,d\mu(u), \qquad x \in [0, \infty) \]
where $\mu$ is a finite nonnegative measure on $[0, \infty)$.

Proof. It is clear that $g$ is completely monotone if it admits the representation above. It follows from Theorem 3.9.4 that for all $n \in \mathbb{N}$ there exists a finite nonnegative measure $\mu_n$ on $[0, \infty)$ such that
\[ g(x) = \int_0^\infty \Big( 1 - \frac{xv}{n} \Big)_+^n d\mu_n(v), \qquad x \ge 0. \tag{1} \]
Since $g(0) = \mu_n([0, \infty))$, some subsequence $\{\mu_{n_k}\}$ converges weakly to a finite nonnegative measure $\mu$ (cf. Theorem E.1.13 and Theorem E.1.12). Moreover,
\[ \lim_{n \to \infty} \Big( 1 - \frac{xv}{n} \Big)_+^n = e^{-xv} \]
where for fixed $x$ the convergence is uniform in $v$. Using this and taking the limit along the subsequence $\{n_k\}$ in equation (1) we obtain the desired representation of $g$.^{30}

Corollary 3.9.7. An infinitely differentiable function $g : (0, \infty) \to \mathbb{R}$ is completely monotone if and only if it admits the representation
\[ g(x) = \int_0^\infty e^{-ux}\,d\mu(u), \qquad x \in (0, \infty) \]
where $\mu$ is a nonnegative measure on $[0, \infty)$.

Proof. It is easy to show that $g$ is completely monotone if it admits the representation above (cf. Lemma C.7.4). To prove the converse, assume that $g$ is completely monotone on $(0, \infty)$. Then for each $a > 0$ the function $x \mapsto g(x + a)$ is completely monotone on $[0, \infty)$. Hence, by Theorem 3.9.6, there is a finite nonnegative measure $\mu_a$ on $[0, \infty)$ such that
\[ g(x + a) = \int_0^\infty e^{-ux}\,d\mu_a(u), \qquad x \in [0, \infty). \]

^{30} We used the same idea as in the proof of Schoenberg's Theorem 3.8.5. Here the situation is simpler since we have uniform convergence of the integrands.
We define the measure $\mu$ by $d\mu(x) := e^{ax}\,d\mu_a(x)$. We then have
\[ g(x) = \int_0^\infty e^{-ux}\,d\mu(u) \tag{1} \]
for all $x > a$. Using this and the uniqueness of the Laplace transform (cf. Theorem C.7.5) we conclude that $e^{ax}\,d\mu_a(x) = e^{bx}\,d\mu_b(x)$ for all $b > 0$. Thus, $\mu$ does not depend on $a$ and therefore equation (1) remains valid for all $x > 0$.

Combining the Hausdorff–Bernstein–Widder Theorem 3.9.6 and Schoenberg's Theorem 3.8.5 we obtain the following characterization.

Theorem 3.9.8. For a continuous function $g : [0, \infty) \to \mathbb{R}$ the following properties are equivalent:
(i) $g(\|\cdot\|)$ is positive definite on every $\mathbb{R}^d$;
(ii) $g(\sqrt{\cdot})$ is completely monotone on $[0, \infty)$;
(iii) $g$ admits the representation
\[ g(r) = \int_0^\infty e^{-r^2 s}\,d\mu(s), \qquad r \in [0, \infty), \]
where $\mu$ is a finite nonnegative measure on $[0, \infty)$.

Theorem 3.9.9 (Askey). Let $d \in \mathbb{N}$, $k := \lfloor d/2 \rfloor$ and let $g : [0, \infty) \to \mathbb{R}$ be a continuous function such that
(i) $g(0) = 1$;
(ii) $\lim_{r \to \infty} g(r) = 0$;
(iii) $(-1)^k g^{(k)}$ is convex.
Then the function $f$ defined by
\[ f(t) := g(\|t\|), \qquad t \in \mathbb{R}^d \]
is a characteristic function in $\mathbb{R}^d$. Moreover, there exists a probability measure $\mu$ on $(0, \infty)$ such that
\[ f(t) = \int_0^\infty \varphi_d(rt)\,d\mu(r), \qquad t \in \mathbb{R}^d \]
where $\varphi_d$ is the characteristic function defined in Corollary 3.7.8.^{31}

Proof. The function $g$ is obviously bounded. Since $(-1)^k g^{(k)}$ is convex, the function $(-1)^k g^{(k+1)}$ is increasing where $g^{(k+1)}$ denotes the right-hand derivative. The statements of the theorem follow immediately from Theorem 3.9.4 (applied with $n = k$, so that the kernel $(1 - x/u)_+^{k+1}$ becomes $\varphi_d$ after the substitution $u = 1/r$) by setting $d\mu(r) := d(-F(1/r))$ and noting that the function $F$ is increasing.

^{31} Generalizations of Askey's theorem as well as further references can be found in [22].
Definition 3.9.10. An even continuous function $f : \mathbb{R} \to [0, \infty)$ with $f(0) = 1$ is said to be of Pólya-type if $f$ is convex on $(0, \infty)$ and $\lim_{t \to \infty} f(t) = 0$.

The case $d = 1$ of the previous theorem has been considered by G. Pólya in [43].

Theorem 3.9.11 (Pólya). Every function of the Pólya type is the characteristic function of an absolutely continuous distribution.

3.9.12. Below we give a more detailed statement. For its proof we will need the following facts. Every function $f$ of the Pólya type admits the representation
\[ f(t) = \int_0^\infty (1 - |t/y|)_+\,d\mu(y), \qquad t \in \mathbb{R} \]
where $d\mu(y) = d[1 - f(y) + y f'(y)]$ is a probability distribution on $(0, \infty)$ and $f'$ denotes the right-hand derivative. This follows from Theorem 3.9.4 but it can also be proved directly using the relation
\[ f(t) = -t \int_t^\infty \frac{d}{dy} \frac{f(y)}{y}\,dy = \int_t^\infty (1 - t/y)\,d[1 - f(y) + y f'(y)], \qquad t > 0 \]
where $\frac{d}{dy}$ denotes the right-hand derivative. The fact that the function
\[ F(y) := 1 - f(y) + y f'(y) \tag{1} \]
is increasing is illustrated by Figure 3.10.

Figure 3.10. The function $F$ from equation (3.9.12.1).

For $y \in (0, \infty)$ and $x \in \mathbb{R}$ let
\[ K(x, y) := \frac{y}{2\pi} \Big( \frac{\sin(yx/2)}{yx/2} \Big)^2 = \frac{2 \sin^2(yx/2)}{\pi y x^2} \]
if $x \ne 0$ and $K(0, y) := y/(2\pi)$. By Remark 1.5.5 the function $x \mapsto K(x, y)$ is a density for any fixed $y$ and the corresponding characteristic function is given by $t \mapsto (1 - |t/y|)_+$. That is,
\[ (1 - |t/y|)_+ = \int_{\mathbb{R}} K(x, y)\, e^{itx}\,dx. \tag{2} \]
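Theorem 3.9.13. Let $f$ be a Pólya-type function. Then we have
(i) $f$ is the characteristic function of an absolutely continuous distribution with density
\[ p(x) = \frac{2}{\pi} \int_0^\infty \frac{\sin^2(yx/2)}{y x^2}\,d\mu(y) \]
where $d\mu(y) = d[1 - f(y) + y f'(y)]$ is a probability distribution on $(0, \infty)$;
(ii) $p$ is finite and continuous on $\mathbb{R} \setminus \{0\}$;
(iii) $p$ is bounded if and only if
\[ \int_0^\infty y\,d\mu(y) < \infty. \]

Proof. (i) We have seen in 3.9.12 that $\mu$ is a probability distribution. By Fubini's theorem,
\[ 1 = \int_0^\infty 1\,d\mu(y) = \int_0^\infty \int_{-\infty}^{\infty} K(x, y)\,dx\,d\mu(y) = \int_{-\infty}^{\infty} \int_0^\infty K(x, y)\,d\mu(y)\,dx = \int_{\mathbb{R} \times (0, \infty)} K(x, y)\,d(\lambda \otimes \mu)(x, y) \]
showing that $K \in L^1(\lambda \otimes \mu)$. Moreover, the function $p$ defined by
\[ p(x) := \int_0^\infty K(x, y)\,d\mu(y), \qquad x \in \mathbb{R} \]
is a probability density. Since $K \in L^1(\lambda \otimes \mu)$, applying Fubini's theorem and (3.9.12.2) we obtain
\[ \int_{-\infty}^{\infty} p(x)\, e^{itx}\,dx = \int_{-\infty}^{\infty} \int_0^\infty K(x, y)\,d\mu(y)\, e^{itx}\,dx = \int_0^\infty \int_{-\infty}^{\infty} K(x, y)\, e^{itx}\,dx\,d\mu(y) = \int_0^\infty (1 - |t/y|)_+\,d\mu(y) = f(t). \]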
(ii) Using the inequalities
\[ K(x, y) \le \frac{1}{2\pi}, \quad 0 < y \le 1 \qquad \text{and} \qquad K(x, y) \le \frac{2}{\pi x^2}, \quad y \ge 1 \]
we see that $p(x) < \infty$, $x \ne 0$. Moreover, in view of these inequalities, Lebesgue's dominated convergence theorem shows that $p$ is continuous on $\mathbb{R} \setminus \{0\}$.
(iii) If $\int y\,d\mu(y) < \infty$ then, using the inequality $K(x, y) \le y/(2\pi) = K(0, y)$ we conclude that $p(x) \le p(0) < \infty$ for every $x$. Assume that $p$ is bounded. Then, by Fatou's lemma,
\[ \infty > \liminf_{x \to 0} p(x) = \liminf_{x \to 0} \int_0^\infty K(x, y)\,d\mu(y) \ge \int_0^\infty \liminf_{x \to 0} K(x, y)\,d\mu(y) = \frac{1}{2\pi} \int_0^\infty y\,d\mu(y). \]
This completes the proof.

Remark 3.9.14. The fact that a function $f$ of Pólya-type is positive definite can also be proved in the following way. For $m \in \mathbb{N}$ let the function $g_m : \mathbb{R} \to [0, \infty)$ be defined by the relations (see Figure 3.11)
(i) $g_m(j/m) = f(j/m)$, $j = -m^2, \dots, m^2$;
(ii) $g_m$ is linear on the intervals $[j/m, (j+1)/m]$, $j = -m^2, \dots, m^2 - 1$;
(iii) $g_m(x) = f(m)$ if $|x| > m$.
It is not hard to see that $g_m$ can be written as
\[ g_m = f(m) + \sum_{j=1}^{m^2} p_j\, \Delta_{j/m} \]
where $\Delta_a(x) = (1 - |x/a|)_+$ and $p_j \ge 0$. This shows that $g_m$ is positive definite. On the other hand, $g_m \to f$ uniformly on $\mathbb{R}$.

Figure 3.11. The functions $f$ (continuous line) and $g_m$ from Remark 3.9.14.

Example 3.9.15. We have shown in Remark 3.5.3 that the function
\[ f_\alpha(t) = e^{-\alpha|t| - |t|^\alpha}, \qquad t \in \mathbb{R} \]
is of Pólya-type for all $\alpha \in [2, \infty)$. It is easy to check that the functions
\[ g_\alpha(t) = e^{-|t|^\alpha} \qquad \text{and} \qquad h_\alpha(t) = \frac{1}{1 + |t|^\alpha}, \qquad t \in \mathbb{R} \]
are of the Pólya type if $0 < \alpha \le 1$. Note that $g_\alpha$ and $h_\alpha$ are characteristic functions also for $1 < \alpha \le 2$. To prove this we show that these functions are positive definite. As to $g_\alpha$, we may assume that $0 < \alpha < 2$ and write
\[ D_\alpha = \int_0^\infty \frac{1 - \cos s}{s^{1+\alpha}}\,ds. \]
The substitution $s = ty$ leads to the formula
\[ -|t|^\alpha = C_\alpha \int_{-\infty}^{\infty} \frac{\cos ty - 1}{|y|^{1+\alpha}}\,dy, \qquad t \in \mathbb{R} \]
where $C_\alpha = 1/(2D_\alpha) > 0$. Replacing the integral above by $\int_{|y| \ge 1/m}$, $m \in \mathbb{N}$, we see that $g_\alpha$ is the pointwise limit of functions of the form $q_m = e^{\varphi_m + c_m}$ where $\varphi_m$ is positive definite and $c_m \in \mathbb{R}$. By Lemma 1.4.17 the function $q_m$ is positive definite. The positive definiteness of $h_\alpha$ follows from the equation
\[ h_\alpha(t) = \int_0^\infty e^{-y(1 + |t|^\alpha)}\,dy \]
and from the positive definiteness of $g_\alpha$.
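Theorem 3.9.13 can be made concrete with the Pólya-type function $f(t) = e^{-|t|}$. Here $1 - f(y) + y f'(y) = 1 - e^{-y} - y e^{-y}$, so that $d\mu(y) = y e^{-y}\,dy$, and the density of part (i) should be the Cauchy density $1/(\pi(1 + x^2))$. The sketch below checks both the mixture representation of 3.9.12 and the density numerically; the helper names and the truncation of the integrals at $y = 60$ are assumptions made for this illustration.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def polya_mixture(t, upper=60.0):
    """integral of (1 - |t|/y)_+ dmu(y) with dmu(y) = y exp(-y) dy, i.e. int_{|t|}^upper (y - |t|) exp(-y) dy."""
    t = abs(t)
    return simpson(lambda y: (y - t) * math.exp(-y), t, upper)

def polya_density(x, upper=60.0):
    """p(x) = int K(x, y) dmu(y) for x != 0; the factor y of dmu cancels the 1/y in K(x, y)."""
    return simpson(lambda y: 2 * math.sin(y * x / 2) ** 2 / (math.pi * x * x) * math.exp(-y),
                   0.0, upper)

for v in (0.3, 1.0, 2.5):
    print(v, polya_mixture(v), math.exp(-v), polya_density(v), 1 / (math.pi * (1 + v * v)))
```

In this example $\int_0^\infty y\,d\mu(y) = 2 < \infty$, in agreement with part (iii): the Cauchy density is bounded.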
3.10 Convolution roots with compact support

If $f$ is a characteristic function and $f = h * \tilde h$ with some square integrable function $h$, then $h$ is called a convolution root of $f$. By Theorem 1.8.16, a characteristic function has a convolution root if and only if its spectral measure is absolutely continuous. In this section we assume that $f$ has compact support and investigate the existence of compactly supported convolution roots. This problem is closely related to factorization of entire functions (cf. Corollary 3.10.3). Applications of results of this type range from probability theory, time series, spatial statistics, to optics, crystallography, and signal processing.^{32}

We consider first the discrete one-dimensional case.

Theorem 3.10.1. Let $N \in \mathbb{N}$ and $f \in P(\mathbb{Z})$ be such that $\operatorname{supp} f \subset [-2N, 2N]$. Then there exists a complex-valued function $h$ on $\mathbb{Z}$ such that $\operatorname{supp} h \subset [-N, N]$ and
\[ f(n) = h * \tilde h(n) = \sum_{k=-N}^{N} h(n + k)\, \overline{h(k)}, \qquad n \in \mathbb{Z}. \]

Proof. Define the polynomial $p$ by
\[ p(z) = \sum_{n=-2N}^{2N} f(n)\, z^n, \qquad z \in \mathbb{C}. \]
It follows from Theorem 1.9.6 that $p$ is nonnegative on $\mathbb{T}$. By Theorem B.1.4, $p$ can be written in the form
\[ p(z) = q(z)\, \overline{q(1/\bar z)}, \qquad z \in \mathbb{C} \setminus \{0\} \]
where
\[ q(z) = \sum_{k=0}^{2N} b_k z^k, \qquad z, b_k \in \mathbb{C}. \]
Setting $b_k := 0$ if $k \in \mathbb{Z} \setminus [0, 2N]$ and equating the coefficients we obtain
\[ f(n) = \sum_{k=0}^{2N} b_{n+k}\, \overline{b_k}, \qquad n \in \mathbb{Z}. \]
The function $h$ defined by $h(k) := b_{k+N}$ has the desired properties.

^{32} See [15] for more details and references.
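The easy direction behind Theorem 3.10.1 can be illustrated directly: if $h$ is supported in $[-N, N]$, then $f = h * \tilde h$ is supported in $[-2N, 2N]$, satisfies $f(-n) = \overline{f(n)}$, and its symbol $\sum_n f(n) e^{in\theta}$ equals $|\sum_k h(k) e^{ik\theta}|^2 \ge 0$. The sketch below checks this for one arbitrarily chosen $h$; the sample values and function names are assumptions for the illustration, and the converse construction of $h$ from $f$ (which needs the factorization of Theorem B.1.4) is not attempted.

```python
import cmath

def convolution_square(h):
    """f(n) = sum_k h(n + k) * conj(h(k)) for h given as a dict {index: value}."""
    f = {}
    for j, hj in h.items():
        for k, hk in h.items():
            # The pair (j, k) contributes h(j) * conj(h(k)) to f(j - k).
            f[j - k] = f.get(j - k, 0) + hj * hk.conjugate()
    return f

def symbol(coeffs, theta):
    """Trigonometric polynomial sum_n coeffs[n] * exp(i n theta)."""
    return sum(c * cmath.exp(1j * n * theta) for n, c in coeffs.items())

# Hypothetical sample root with N = 1.
h = {-1: 1 + 2j, 0: -0.5, 1: 3j}
f = convolution_square(h)

print(sorted(f))  # indices of the support of f
for theta in (0.0, 0.7, 2.0, 3.1):
    print(theta, symbol(f, theta).real, abs(symbol(h, theta)) ** 2)
```

The two printed columns agree, and the imaginary part of the symbol of $f$ vanishes up to rounding because $f(-n) = \overline{f(n)}$.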
Theorem 3.10.2 (Boas–Kac, Kreĭn). Let $f \in P^c(\mathbb{R})$ be such that the support of $f$ is contained in $[-2r, 2r]$. Then there is a square integrable function $h$ vanishing outside $[-r, r]$ such that
\[ f(x) = h * \tilde h(x) = \int_{-r}^{r} h(x + y)\, \overline{h(y)}\,dy, \qquad x \in \mathbb{R}. \]

Proof. We may suppose that $r = \frac12$. For all $m \in \mathbb{N}$ the sequence $\{f(n/m)\}_{n \in \mathbb{Z}}$ is positive definite and vanishes outside $\{-m, -m+1, \dots, m\}$. Hence, there exists a sequence $b^{(m)} = \{b_n^{(m)}\}_{n=-\infty}^{\infty}$ vanishing outside $\{0, 1, \dots, m\}$ and satisfying
\[ f(n/m) = \sum_{k=0}^{m} b_{n+k}^{(m)}\, \overline{b_k^{(m)}}, \qquad n \in \mathbb{Z} \tag{1} \]
(see the proof of Theorem 3.10.1). For each $m$ let^{33}
\[ h_m := \sqrt{m}\, b^{(m)} * 1_{[-1/(2m), 1/(2m)]} \qquad \text{and} \qquad f_m := h_m * \tilde h_m. \]
It follows from (1) and from the definition of $h_m$ that $f_m$ takes the same values as $f$ at the points $n/m$ and is linear in between. Since $f$ is uniformly continuous, we see that $\lim_{m \to \infty} f_m(t) = f(t)$ uniformly on $\mathbb{R}$. As $\|h_m\|_2^2 = f_m(0) = f(0)$, there exists a subsequence $\{h_{m_k}\}$ converging weakly in $L^2(\mathbb{R})$ to some square integrable function $h$ (see Theorem D.5.6). If $g \in L^2(\mathbb{R})$ and $\operatorname{supp}(g) \subset \mathbb{R} \setminus [-1, 1]$ then
\[ \int_{-\infty}^{\infty} g(x) h(x)\,dx = \lim_{k \to \infty} \int_{-\infty}^{\infty} g(x) h_{m_k}(x)\,dx = 0 \]
and hence $\operatorname{supp}(h) \subset [-1, 1]$. Therefore, $h$ is integrable. Using that $h_{m_k} \xrightarrow{w} h$ and the fact that the supports of $h$ and $h_{m_k}$ are contained in $[-2, 2]$ we see that
\[ \lim_{k \to \infty} \hat h_{m_k}(t) = \hat h(t), \qquad t \in \mathbb{R}. \]
Consequently,
\[ \hat f(t) = \lim_{k \to \infty} \hat f_{m_k}(t) = \lim_{k \to \infty} |\hat h_{m_k}(t)|^2 = |\hat h(t)|^2, \]
i.e., $f = h * \tilde h$.

Corollary 3.10.3. If $F$ is an entire function of finite exponential type $2r$ which is nonnegative and integrable on $\mathbb{R}$, then there exists an entire function $H$ of exponential type $r$ such that
\[ F(z) = H(z)\, \overline{H(\bar z)}, \qquad z \in \mathbb{C}. \tag{1} \]

^{33} The first convolution below is defined by considering $b^{(m)}$ as a complex measure in $M_f(\mathbb{R})$.
Proof. Since $F$ is integrable and nonnegative on $\mathbb{R}$, the Fourier transform $f$ of $F|_{\mathbb{R}}$ is a continuous positive definite function. The Paley–Wiener Theorem 3.4.4 shows that $f$ vanishes outside $[-2r, 2r]$. By Theorem 3.10.2 there is a square integrable function $h$ vanishing outside $[-r, r]$ such that $f = h * \tilde h$. Setting
\[ H(z) := \int_{-r}^{r} h(t)\, e^{itz}\, dt, \qquad z \in \mathbb{C}, \]
Lemma 3.4.2 shows that $H$ is an entire function of exponential type $r$. Moreover, it is easy to check that equation (1) holds.

Recall that $B^o(r) \subset \mathbb{R}^d$ denotes the open ball with center 0 and radius $r \ge 0$.

Theorem 3.10.4 (Rudin). Let $f \in P^c(\mathbb{R}^d)$ be an infinitely differentiable radial function such that the support of $f$ lies in $B^o(2r)$, $r > 0$. Then $f$ is the sum of a uniformly convergent series$^{34}$
\[ f = \sum_{k=1}^{\infty} f_k * \tilde f_k \]
where each $f_k$ is infinitely differentiable and vanishes outside $B^o(r)$.

Proof. Define the function $F$ by
\[ F(w) = \int_{\mathbb{R}^d} e^{iwt_1} f(t)\, dt, \qquad w \in \mathbb{C}. \]
Since $f$ is even, $F$ is even as well. Note that $F$ can be written as
\[ F(w) = \int_{-2r}^{2r} e^{iwt_1} g(t_1)\, dt_1 \tag{1} \]
where $g = f$ if $d = 1$ and
\[ g(t_1) = \int_{\mathbb{R}^{d-1}} f(t_1, t_2, \dots, t_d)\, dt_2 \cdots dt_d, \qquad t_1 \in \mathbb{R}, \]
if $d > 1$. In both cases $g$ is an infinitely differentiable positive definite function vanishing outside $[-2r, 2r]$. By Theorem 3.2.8 the function $F|_{\mathbb{R}}$ is rapidly decreasing. Using the representation (1), Lemma 3.4.2 and Theorem 1.5.9 show that $F$ is an entire function of exponential type $2r$ which is nonnegative on the real axis. By Theorem 3.4.5 there are even functions $G_j, H_j \in E(r)$ such that
\[ F(u) = \sum_{j=1}^{\infty} |G_j(u)|^2 + u^2 \sum_{j=1}^{\infty} |H_j(u)|^2, \qquad u \in \mathbb{R}. \tag{2} \]
34 The paper [14] contains an analogue of Rudin’s result where $f$ is not supposed to be infinitely differentiable and $d \ge 3$. See also [15] where this analogue is stated without proof.
Since $|G_j(u)|^2 \le F(u)$ and $u^2 |H_j(u)|^2 \le F(u)$ by (2), the functions $G_j|_{\mathbb{R}}$ and $H_j|_{\mathbb{R}}$ are rapidly decreasing. Thus, $s \mapsto G_j(\|s\|)$ and $s \mapsto H_j(\|s\|)$ are integrable on $\mathbb{R}^d$. We define the function $g_j$ by
\[ g_j(t) := \int_{\mathbb{R}^d} e^{i(t,s)}\, G_j(\|s\|)\, ds. \tag{3} \]
By Lemma 3.6.3 the function $g_j$ is radial while Theorem 3.2.8 shows that $g_j$ is rapidly decreasing. Since $G_j$ is even and entire, for fixed $t_2, \dots, t_d \in \mathbb{R}$ there is an entire function $Q_j = Q_j^{t_2,\dots,t_d}$ such that
\[ Q_j(t_1) = G_j(\|(t_1, t_2, \dots, t_d)\|), \qquad t_1 \in \mathbb{R}. \]
Using that $G_j \in E(r)$ it is easy to check that $Q_j$ belongs to $E(r)$. By the Paley–Wiener Theorem 3.4.4 the Fourier transform of $Q_j$ vanishes outside $[-r, r]$. Hence,
\[ g_j(t_1, 0, \dots, 0) = \int_{\mathbb{R}^{d-1}} \int_{\mathbb{R}} e^{it_1 x}\, Q_j(x)\, dx\, dt_2 \cdots dt_d = 0 \]
whenever $|t_1| \ge r$ (if $d = 1$, then the relation above does not contain the outer integral). Since $g_j$ is radial we conclude that $g_j$ vanishes outside $B^o(r)$.

Next, we have
\[ \|u\|^2\, |H_j(\|u\|)|^2 = \sum_{k=1}^{d} |u_k H_j(\|u\|)|^2. \]
Associate the function $h_j$ with $H_j$ as the function $g_j$ was associated with $G_j$ by equation (3). Then $h_j$ is rapidly decreasing. Moreover, if $h_{kj}$ denotes the partial derivative of $h_j$ with respect to $t_k$, then $(2\pi)^{d/2}\, t_k H_j(\|t\|)$ is the Fourier transform of $h_{kj}$. From equation (2) we see that
\[ \hat f(t) = \sum_{j=1}^{\infty} |\hat g_j(t)|^2 + \sum_{k=1}^{d} \sum_{j=1}^{\infty} |\hat h_{kj}(t)|^2 \]
where the series converges in $L^1(\mathbb{R}^d)$. Taking inverse Fourier transforms we obtain the uniformly convergent representation
\[ f = \sum_{k=1}^{\infty} g_k * \tilde g_k + \sum_{k=1}^{d} \sum_{j=1}^{\infty} h_{kj} * \tilde h_{kj} \]
from which the theorem follows.
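The representation $f = h * \tilde h$ used in the proof of Corollary 3.10.3 can be illustrated in the simplest one-dimensional case. The following is only a sketch under assumptions that are not taken from the text: $h$ is chosen as the indicator function of $[-r, r]$ (so it is not the smooth radial setting of Rudin's theorem), the integral is approximated by a midpoint rule, and the function names are hypothetical. The self-convolution is then the triangle function $t \mapsto \max(0, 2r - |t|)$, which vanishes outside $[-2r, 2r]$.

```python
def h(t, r=1.0):
    """Indicator function of [-r, r]; an assumed, very simple convolution root."""
    return 1.0 if -r <= t <= r else 0.0


def self_convolution(t, r=1.0, n=4000):
    """Midpoint-rule approximation of (h * h~)(t) = integral of h(s) * h(s - t) ds.

    h is real here, so the complex conjugate in h~ can be dropped.
    """
    step = 2.0 * r / n
    total = 0.0
    for k in range(n):
        s = -r + (k + 0.5) * step   # midpoints of a grid on the support of h
        total += h(s, r) * h(s - t, r)
    return total * step


def triangle(t, r=1.0):
    """Closed form of the self-convolution: supported in [-2r, 2r]."""
    return max(0.0, 2.0 * r - abs(t))


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.5, 2.5):
        print(t, self_convolution(t), triangle(t))
```

The support of the root is half the support of $f$, which is the feature that Theorem 3.10.4 preserves (term by term) in the smooth radial case.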
3.11
Infinitely divisible characteristic functions
In this section we study a special class of characteristic functions on Rd known as infinitely divisible characteristic functions. This class plays an important role in the
study of decomposition of distributions, in the theory of processes with independent increments (cf. Theorem 2.1.3), as well as in the study of limit theorems.

Definition 3.11.1. A characteristic function $f$ on $\mathbb{R}^d$ is called infinitely divisible if for every positive integer $n$ there exists a characteristic function $f_n$ such that
\[ f = (f_n)^n. \]
Equivalently, a probability measure $\mu$ is said to be infinitely divisible if for every $n \in \mathbb{N}$ there exists a probability measure $\mu_n$ such that
\[ \mu = \underbrace{\mu_n * \cdots * \mu_n}_{n \text{ times}}. \]
The characteristic functions
\[ t \mapsto e^{i(a,t) - q\|t\|^2} \qquad \text{and} \qquad t \mapsto \frac{1}{(1 + \|t\|^2)^q}, \qquad t \in \mathbb{R}^d, \]
where $a \in \mathbb{R}^d$, $q \ge 0$, are easily seen to be infinitely divisible (cf. Theorem 3.7.5). The next lemma follows immediately from Definition 3.11.1. We omit the proof.

Lemma 3.11.2. If the characteristic functions $f$ and $g$ are infinitely divisible, then so are $\bar f$, $|f|^2$ and $fg$.

The next theorem shows that characteristic functions having zeros cannot be infinitely divisible.

Theorem 3.11.3. Let $f$ be an infinitely divisible characteristic function on $\mathbb{R}^d$. Then $f$ has no zeros.

Proof. For $n \in \mathbb{N}$ let $f_n$ be a characteristic function such that $f = (f_n)^n$. The function $|f|^{2/n} = |f_n|^2$ is a characteristic function for all $n$. Define $g$ by
\[ g(t) := \lim_{n\to\infty} |f_n(t)|^2 = \lim_{n\to\infty} |f(t)|^{2/n}, \qquad t \in \mathbb{R}^d. \]
If $f(t) \ne 0$, then $g(t) = 1$, otherwise $g(t) = 0$. From this we conclude that $g$ is equal to 1 in a neighborhood of 0. Since $g$ is positive definite, Corollary 1.5.2 and Lemma 1.5.1 show that $g(t) = 1$ and hence $f(t) \ne 0$ for all $t \in \mathbb{R}^d$.

Remark 3.11.4. Let $g : \mathbb{R}^d \to \mathbb{C} \setminus \{0\}$ be a continuous function such that $g(0) > 0$. By Theorem C.8.7 there exists a unique continuous function $\varphi : \mathbb{R}^d \to \mathbb{R}$ such that $\varphi(0) = 0$ and
\[ g(t) = |g(t)|\, e^{i\varphi(t)}, \qquad t \in \mathbb{R}^d. \]
We define the functions $\ln g$ and $g^p$, $p \ge 0$, by the relations
\[ \ln g(t) = \ln |g(t)| + i\varphi(t), \qquad g^p(t) = |g(t)|^p\, e^{ip\varphi(t)}, \qquad t \in \mathbb{R}^d, \]
where $\varphi$ denotes the continuous argument of $g$ with $\varphi(0) = 0$.

Theorem 3.11.5. Let $\{f_k\}_{k=1}^{\infty}$ be a sequence of infinitely divisible characteristic functions converging pointwise on $\mathbb{R}^d$ to a continuous function $f$. Then $f$ is an infinitely divisible characteristic function.

Proof. Since $f_k$ is infinitely divisible, $|f_k|^{2/n}$ is a characteristic function for all positive integers $k$ and $n$. Lévy’s continuity Theorem 1.6.3 and the relation
\[ |f(t)|^{2/n} = \lim_{k\to\infty} |f_k(t)|^{2/n}, \qquad t \in \mathbb{R}^d, \]
show that $f$ and $|f|^{2/n}$, $n \in \mathbb{N}$, are characteristic functions. Thus, $|f|^2$ is an infinitely divisible characteristic function and therefore it has no zeros. Using this we obtain
\[ f^{1/n}(t) = e^{\frac{1}{n} \ln f(t)} = \lim_{k\to\infty} e^{\frac{1}{n} \ln f_k(t)} = \lim_{k\to\infty} f_k^{1/n}(t), \qquad t \in \mathbb{R}^d. \]
This relation shows that $f$ is infinitely divisible.

Corollary 3.11.6. If $f$ is an infinitely divisible characteristic function, then so are $f^p$ and $|f|^p$ for all $p \ge 0$.

Proof. If $p$ is rational, then the statement follows easily from Definition 3.11.1. Using this, the general case is a consequence of Theorem 3.11.5.

In the next section we give an integral representation for the continuous logarithm of infinitely divisible characteristic functions.
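The limit $g(t) = \lim_{n\to\infty} |f(t)|^{2/n}$ from the proof of Theorem 3.11.3 can be made tangible with a small numerical sketch. The two test functions are assumptions chosen for illustration and do not appear in the text: the Gaussian $t \mapsto e^{-t^2}$ (infinitely divisible, no zeros) and the triangular characteristic function $t \mapsto \max(0, 1 - |t|)$, which vanishes for $|t| \ge 1$ and therefore cannot be infinitely divisible. A large fixed $n$ stands in for the limit.

```python
import math


def gaussian(t):
    """t -> exp(-t^2), a characteristic function without zeros."""
    return math.exp(-t * t)


def triangular(t):
    """t -> max(0, 1 - |t|), a characteristic function with zeros for |t| >= 1."""
    return max(0.0, 1.0 - abs(t))


def limit_profile(f, points, n=10**6):
    """Approximate g(t) = lim |f(t)|^(2/n) by evaluating at one large n."""
    return [abs(f(t)) ** (2.0 / n) for t in points]


if __name__ == "__main__":
    pts = [0.0, 0.5, 1.5, 3.0]
    print(limit_profile(gaussian, pts))    # all values close to 1
    print(limit_profile(triangular, pts))  # close to 1 for |t| < 1, exactly 0 beyond
```

For the triangular function the profile equals 1 near the origin and 0 for $|t| \ge 1$, which is exactly the behavior that the proof rules out for a positive definite function.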
3.12
Conditionally positive definite functions
In the present section we generalize the concept of positive definiteness and show that the new concept is closely related to infinite divisibility. We prove an integral representation and give a useful sufficient condition for a radial function to be conditionally positive definite.35
35
Conditionally positive definite functions and their generalizations have many applications, see for example [9, 58] and also the notes in Remark 3.12.16.
Definition 3.12.1. A complex-valued Hermitian function $f$ on $\mathbb{R}^d$ is called conditionally positive definite if the inequality
\[ \sum_{i,j=1}^{n} f(t_i - t_j)\, c_i \bar c_j \ge 0 \]
holds for all $n \in \mathbb{N}$, $t_1, \dots, t_n \in \mathbb{R}^d$ and for all complex numbers $c_1, \dots, c_n$ such that $c_1 + \cdots + c_n = 0$.

Note that the inequality above is satisfied if $f$ is constant. Therefore, we cannot conclude from this inequality alone that $f$ is Hermitian. Positive definite functions are obviously conditionally positive definite. Let $a \in \mathbb{R}^d$ be arbitrary and write
\[ l(t) := i(t, a), \qquad t \in \mathbb{R}^d. \]
If $a \ne 0$, then $l$ is not bounded and hence it is not positive definite. The function $l$ is Hermitian and we have
\[ \sum_{i,j=1}^{n} l(t_i - t_j)\, c_i \bar c_j = \sum_{i,j=1}^{n} [l(t_i) - l(t_j)]\, c_i \bar c_j = \Big(\sum_{i=1}^{n} l(t_i) c_i\Big)\Big(\sum_{j=1}^{n} \bar c_j\Big) - \Big(\sum_{j=1}^{n} l(t_j) \bar c_j\Big)\Big(\sum_{i=1}^{n} c_i\Big) = 0 \]
whenever $\sum c_i = 0$. Thus, $l$ is conditionally positive definite. The next lemma states elementary properties of conditionally positive definite functions which follow immediately from Definition 3.12.1. We omit the proof.

Lemma 3.12.2. Let $f$ and $g$ be conditionally positive definite functions on $\mathbb{R}^d$. Then so are the functions $\bar f$, $\operatorname{Re} f$ and $pf + qg$ for all $p, q \ge 0$. Moreover, the pointwise limit of conditionally positive definite functions is conditionally positive definite as well.

Note that the product of conditionally positive definite functions need not be conditionally positive definite. A simple example is given by the product of the functions $t \mapsto -1$ and $t \mapsto e^{it}$, $t \in \mathbb{R}$.

Lemma 3.12.3. If $f$ is conditionally positive definite, then the Hermitian matrix
\[ A = \big(f(t_i - t_j)\big)_{i,j=1}^{n} \]
has at most one negative eigenvalue$^{36}$ for all finite systems $t_1, \dots, t_n$ of elements of $\mathbb{R}^d$.

36 See also Remark 3.12.16.
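The two examples surrounding Lemma 3.12.2 can be checked by direct evaluation of the quadratic form in Definition 3.12.1. The sketch below works on the real line ($d = 1$); the helper name `quadratic_form`, the value of $a$ and the sample points and coefficients are illustrative assumptions. It shows that the form of $l(t) = i\,a\,t$ vanishes for coefficients summing to zero, and that the product $t \mapsto -e^{it}$ yields a negative value.

```python
import cmath


def quadratic_form(f, points, coeffs):
    """Sum over i, j of f(t_i - t_j) * c_i * conj(c_j) for points on the real line."""
    return sum(f(ti - tj) * ci * cj.conjugate()
               for ti, ci in zip(points, coeffs)
               for tj, cj in zip(points, coeffs))


A = 1.7                                 # assumed value of the vector a (here d = 1)


def linear(t):
    """l(t) = i * (t, a) in dimension one."""
    return 1j * A * t


def product(t):
    """Product of the conditionally positive definite functions -1 and e^{it}."""
    return -cmath.exp(1j * t)


if __name__ == "__main__":
    pts = [0.0, 0.4, 1.3]
    cs = [1.0 + 2.0j, -3.0 + 0.5j, 2.0 - 2.5j]       # coefficients summing to zero
    print(quadratic_form(linear, pts, cs))            # zero up to rounding
    print(quadratic_form(product, [0.0, 1.0], [1.0, -1.0]))  # -2 + 2 cos(1) < 0
```

For the product, the choice $c = (1, -1)$ gives $2f(0) - f(1) - f(-1) = -2 + 2\cos 1$, which is negative.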
Proof. The set $\{x \in \mathbb{C}^n : x_1 + \cdots + x_n = 0\}$ is an $(n-1)$-dimensional nonnegative subspace of $A$ (cf. Definition D.2.5). We conclude that $A$ has no negative subspace of dimension greater than 1. Hence, the statement follows from Theorem D.2.6.

Theorem 3.12.4. Let $f : \mathbb{R}^d \to \mathbb{C}$ be a Hermitian function with $f(0) \le 0$. Then $f$ is conditionally positive definite if and only if the matrix
\[ B = \big(f(t_i - t_j) - f(t_i) - \overline{f(t_j)}\big)_{i,j=1}^{n} \]
is positive semidefinite for all finite systems $t_1, \dots, t_n \in \mathbb{R}^d$.

Proof. To prove the “if part” let $t_1, \dots, t_n \in \mathbb{R}^d$ be arbitrary and let $c_1, \dots, c_n \in \mathbb{C}$ be such that $\sum_{i=1}^{n} c_i = 0$. Then we have
\[ \sum_{i,j=1}^{n} f(t_i - t_j)\, c_i \bar c_j = \sum_{i,j=1}^{n} \big(f(t_i - t_j) - f(t_i) - \overline{f(t_j)}\big)\, c_i \bar c_j \ge 0. \]
Thus, $f$ is conditionally positive definite.

Conversely, suppose that $f$ is conditionally positive definite. Let $t_1, \dots, t_n \in \mathbb{R}^d$ and $c_1, \dots, c_n \in \mathbb{C}$ be arbitrary and write $t_{n+1} := 0$, $c_{n+1} := -\sum_{i=1}^{n} c_i$. Then $\sum_{i=1}^{n+1} c_i = 0$ and we have
\[
\begin{aligned}
\sum_{i,j=1}^{n} \big(f(t_i - t_j) - f(t_i) - \overline{f(t_j)}\big)\, c_i \bar c_j
&= \sum_{i,j=1}^{n} f(t_i - t_j)\, c_i \bar c_j + \sum_{i=1}^{n} f(t_i)\, c_i \bar c_{n+1} + \sum_{j=1}^{n} \overline{f(t_j)}\, c_{n+1} \bar c_j \\
&= \sum_{i,j=1}^{n+1} f(t_i - t_j)\, c_i \bar c_j - f(0)\, |c_{n+1}|^2 \\
&\ge 0.
\end{aligned}
\]

The next theorem establishes an important connection between infinitely divisible characteristic functions and conditionally positive definite functions.

Theorem 3.12.5. A Hermitian function $f$ on $\mathbb{R}^d$ is conditionally positive definite if and only if the function
\[ t \mapsto e^{pf(t)}, \qquad t \in \mathbb{R}^d, \]
is positive definite for all $p > 0$.
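Theorems 3.12.4 and 3.12.5 can be probed numerically before reading the proof. The sketch below assumes the concrete one-dimensional function $f(t) = i\,a\,t - c\,t^2$ with $c \ge 0$ (so $f(0) = 0$); this choice, the parameter values and all helper names are illustrative assumptions. It samples random points and coefficients and records the smallest value of the quadratic forms belonging to the matrix $B$ and to the kernel $e^{pf(t_i - t_j)}$; nonnegative values are consistent with, but of course do not prove, positive semidefiniteness.

```python
import cmath
import random


def f(t, a=0.8, c=0.5):
    """Assumed test function f(t) = i*a*t - c*t^2 on the real line, f(0) = 0."""
    return 1j * a * t - c * t * t


def form(kernel, points, coeffs):
    """Sum over i, j of kernel(t_i, t_j) * c_i * conj(c_j)."""
    return sum(kernel(ti, tj) * ci * cj.conjugate()
               for ti, ci in zip(points, coeffs)
               for tj, cj in zip(points, coeffs))


def b_kernel(ti, tj):
    """Entry f(t_i - t_j) - f(t_i) - conj(f(t_j)) of the matrix B in Theorem 3.12.4."""
    return f(ti - tj) - f(ti) - f(tj).conjugate()


def exp_kernel(p):
    """Kernel (t_i, t_j) -> exp(p * f(t_i - t_j)) from Theorem 3.12.5."""
    return lambda ti, tj: cmath.exp(p * f(ti - tj))


def smallest_form_value(kernel, trials=200, n=5, seed=0):
    """Smallest real part of the quadratic form over random points and coefficients."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        points = [rng.uniform(-3.0, 3.0) for _ in range(n)]
        coeffs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        worst = min(worst, form(kernel, points, coeffs).real)
    return worst


if __name__ == "__main__":
    print(smallest_form_value(b_kernel))
    print(smallest_form_value(exp_kernel(0.3)), smallest_form_value(exp_kernel(2.0)))
```

For this particular $f$ the entries of $B$ reduce to $2c\,t_i t_j$, so the form equals $2c\,|\sum_i t_i c_i|^2$, which explains the nonnegative output.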
Proof. Suppose that $e^{pf(\cdot)}$ is positive definite for all $p > 0$. Then $(e^{pf(\cdot)} - 1)/p$ is conditionally positive definite for all $p > 0$. From the relation
\[ f(t) = \lim_{p \to +0} \frac{e^{pf(t)} - 1}{p}, \qquad t \in \mathbb{R}^d, \]
we conclude that the function $f$ is conditionally positive definite.

To prove the converse statement, suppose that $f$ is conditionally positive definite. Then so is $f - f(0)$ and therefore we may suppose that $f(0) = 0$. Since $pf$ is conditionally positive definite if $p > 0$, it suffices to consider the case $p = 1$. If $(a_{ij})$ is a positive semidefinite matrix, then so is $(\exp(a_{ij}))$ (cf. Corollary D.2.13). Using this, it follows from Theorem 3.12.4 that the matrix
\[ \Big(\exp\big(f(t_i - t_j) - f(t_i) - \overline{f(t_j)}\big)\Big)_{i,j=1}^{n} \]
is positive semidefinite. We have
\[
\begin{aligned}
\sum_{i,j=1}^{n} \exp(f(t_i - t_j))\, c_i \bar c_j
&= \sum_{i,j=1}^{n} \exp\big(f(t_i - t_j) - f(t_i) - \overline{f(t_j)}\big) \exp(f(t_i))\, \overline{\exp(f(t_j))}\, c_i \bar c_j \\
&= \sum_{i,j=1}^{n} \exp\big(f(t_i - t_j) - f(t_i) - \overline{f(t_j)}\big)\, d_i \bar d_j \ge 0
\end{aligned}
\]
where $d_i = \exp(f(t_i))\, c_i$.

As a corollary we obtain that the function
\[ t \mapsto a + i(x, t) - (Ct, t), \qquad t \in \mathbb{R}^d, \]
is conditionally positive definite for all $a \in \mathbb{R}$, $x \in \mathbb{R}^d$ and all $d \times d$ positive semidefinite real matrices $C$.

Lemma 3.12.6. A Hermitian function $f$ on $\mathbb{R}^d$ is conditionally positive definite if and only if the inequality
\[ \mu * \tilde\mu * f(0) \ge 0 \]
holds for all finitely supported complex measures $\mu$ on $\mathbb{R}^d$ such that $\hat\mu(0) = 0$.

Proof. If
\[ \mu = \sum_{j=1}^{n} c_j \delta_{t_j} \]
then $\hat\mu(0) = c_1 + \cdots + c_n$. Using this the statement can be proved in the same way as Lemma 1.4.3.

Lemma 3.12.7. Let $f$ be a conditionally positive definite function on $\mathbb{R}^d$ and let $\mu$ be a complex measure with finite support on $\mathbb{R}^d$. Then the function $g = \mu * \tilde\mu * f$ is conditionally positive definite. If $\hat\mu(0) = 0$, then $g$ is positive definite. In particular, the function
\[ t \mapsto 2f(t) - f(t + y) - f(t - y), \qquad t \in \mathbb{R}^d, \]
is positive definite for all $y \in \mathbb{R}^d$.

Proof. Let $\nu$ be an arbitrary finitely supported complex measure on $\mathbb{R}^d$. The first two statements of the lemma follow from the equation
\[ \nu * \tilde\nu * \mu * \tilde\mu * f(0) = (\nu * \mu) * (\nu * \mu)^{\sim} * f(0) \]
using Lemma 3.12.6 and the fact that $(\nu * \mu)^{\wedge}(0) = \hat\nu(0)\, \hat\mu(0)$. Setting $\mu = \delta_0 - \delta_y$ we obtain the last statement.

The previous simple lemma has several useful corollaries.

Corollary 3.12.8. Let $t \mapsto f(t)$, $t \in \mathbb{R}^d$, be a conditionally positive definite function. If $f$ is twice continuously differentiable, then the function
\[ -\frac{\partial^2 f}{\partial t_j^2} \]
is positive definite for all $j = 1, \dots, d$.

Proof. Let $e_1, \dots, e_d$ be the standard orthonormal basis in $\mathbb{R}^d$. Then we have
\[ -\frac{\partial^2 f}{\partial t_j^2}(t) = \lim_{\varepsilon \to 0} \frac{2f(t) - f(t + \varepsilon e_j) - f(t - \varepsilon e_j)}{\varepsilon^2}, \qquad t \in \mathbb{R}^d. \]
Hence, the statement follows from Lemma 3.12.7.

Corollary 3.12.9. Let $h : [0, \infty) \to \mathbb{R}$ be a continuous function which is twice continuously differentiable on $(0, \infty)$. If $h(\|\cdot\|^2)$ is conditionally positive definite on $\mathbb{R}^{d+1}$ for some $d \in \mathbb{N}$, then the function $-h'(\|\cdot\|^2)$ is positive definite on $\mathbb{R}^d$.

Proof. For $\varepsilon \in \mathbb{R}$ we consider the function
\[ t \mapsto 2h(\|t\|^2) - h(\|t + \varepsilon e_{d+1}\|^2) - h(\|t - \varepsilon e_{d+1}\|^2), \qquad t \in \mathbb{R}^{d+1}, \]
where $e_1, \dots, e_{d+1}$ is the standard basis of $\mathbb{R}^{d+1}$. Lemma 3.12.7 shows that this function is positive definite. Its restriction to $\mathbb{R}^d$ is given by
\[ s \mapsto 2\,[h(\|s\|^2) - h(\|s\|^2 + \varepsilon^2)], \qquad s \in \mathbb{R}^d. \]
Dividing by $\varepsilon^2$ and letting $\varepsilon$ tend to zero we obtain the statement.

Corollary 3.12.10. Let $f$ be a conditionally positive definite function and let $\mu, \nu \in M_f(\mathbb{R}^d)$ be such that $\hat\mu(0) = \hat\nu(0) = 0$. Then there exists a complex measure $\sigma$ such that $\mu * \nu * f = \check\sigma$.

Proof. The corollary follows from the identity
\[ \mu * \nu = \frac{1}{4} \sum_{l=0}^{3} i^l\, (\mu + i^l \tilde\nu) * (\mu + i^l \tilde\nu)^{\sim} \]
using Lemma 3.12.7 and the fact that $(\mu + i^l \tilde\nu)^{\wedge}(0) = 0$.

Lemma 3.12.11. For every continuous conditionally positive definite function $f$ there exists a finite radial measure $\mu_0$ such that
(i) the convolution $f * \mu_0$ exists and $f * \mu_0$ is a continuous positive definite function;
(ii) $\hat\mu_0(t) > 0$, $t \in \mathbb{R}^d \setminus \{0\}$.

Proof. In view of Lemma 3.2.5 for every $n \in \mathbb{N}$ there exists a radial probability measure $\nu_n$ with compact support satisfying
\[ 0 \le \hat\nu_n(t) \le 1/2, \qquad \|t\| \ge 1/n. \]
Setting $\mu_n := \delta_0 - \nu_n$ we have $\hat\mu_n(0) = 0$ and
\[ \hat\mu_n(t) \ge \frac{1}{2}, \qquad \|t\| \ge 1/n. \tag{1} \]
By Lemma 3.12.7, the function $\mu_n * \tilde\mu_n * f$ is positive definite. Consequently, writing $r_n := \mu_n * \tilde\mu_n * f(0)$, we have
\[ |\mu_n * \tilde\mu_n * f(t)| \le r_n, \qquad t \in \mathbb{R}^d. \tag{2} \]
Since $\|\mu_n\| \le 2$, the equation
\[ \mu_0 = \sum_{n=1}^{\infty} \frac{1}{2^n \max(1, r_n)}\, \mu_n * \tilde\mu_n \tag{3} \]
defines a finite radial measure. Using inequality (2) we see that the convolution $f * \mu_0$ exists and $f * \mu_0 \in P^c(\mathbb{R}^d)$. Property (ii) follows from inequality (1).
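The measure algebra behind Lemma 3.12.7 and the polarization identity in the proof of Corollary 3.12.10 can be verified on small examples. The sketch below is an illustration under assumptions: finitely supported complex measures on $\mathbb{R}$ are stored as dictionaries `{point: mass}` with integer points, and the helper names are hypothetical. It checks that $\mu * \tilde\mu = 2\delta_0 - \delta_y - \delta_{-y}$ for $\mu = \delta_0 - \delta_y$ and that the four-term sum reproduces $\mu * \nu$.

```python
def convolve(mu, nu):
    """Convolution of two finitely supported measures stored as {point: mass}."""
    out = {}
    for x, a in mu.items():
        for y, b in nu.items():
            out[x + y] = out.get(x + y, 0) + a * b
    return out


def tilde(mu):
    """The measure mu~ defined by mu~({x}) = conj(mu({-x}))."""
    return {-x: complex(a).conjugate() for x, a in mu.items()}


def combine(mu, nu, scalar):
    """The measure mu + scalar * nu."""
    out = dict(mu)
    for x, b in nu.items():
        out[x] = out.get(x, 0) + scalar * b
    return out


def polarization(mu, nu):
    """(1/4) * sum over l = 0..3 of i^l (mu + i^l nu~) * (mu + i^l nu~)~."""
    total = {}
    for l in range(4):
        w = 1j ** l
        m = combine(mu, tilde(nu), w)
        total = combine(total, convolve(m, tilde(m)), w / 4)
    return total


def distance(mu, nu):
    """Largest difference of masses over the union of the supports."""
    return max(abs(mu.get(k, 0) - nu.get(k, 0)) for k in set(mu) | set(nu))


if __name__ == "__main__":
    mu = {0: 1, 2: -1}            # delta_0 - delta_2, total mass zero
    nu = {0: 1j, 1: -1j}          # total mass zero as well
    print(convolve(mu, tilde(mu)))                  # 2 delta_0 - delta_2 - delta_{-2}
    print(distance(polarization(mu, nu), convolve(mu, nu)))   # zero up to rounding
```

Convolving the first output with $f$ gives $2f(t) - f(t + y) - f(t - y)$, the function named in Lemma 3.12.7.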
Theorem 3.12.12 (Lévy–Khinchin formula). Let $f$ be a continuous Hermitian function on $\mathbb{R}^d$ and let $h : \mathbb{R}^d \to \mathbb{R}^d$ be a bounded continuous function such that $h(t) = t$ in some neighborhood of zero and $h(-t) = -h(t)$, $t \in \mathbb{R}^d$. The function $f$ is conditionally positive definite if and only if
\[ f(t) = a + i(t, x) - (Ct, t) + \int_{\mathbb{R}^d} \big[e^{i(t,y)} - 1 - i(t, h(y))\big]\, d\tau(y), \qquad t \in \mathbb{R}^d, \tag{1} \]
where $a \in \mathbb{R}$, $x \in \mathbb{R}^d$, $C$ is a $d \times d$ positive semidefinite real matrix and $\tau$ is a nonnegative measure on $\mathbb{R}^d$ such that $\tau(\{0\}) = 0$, $\tau(\mathbb{R}^d \setminus U) < \infty$ and
\[ \int_U \|y\|^2\, d\tau(y) < \infty \tag{2} \]
for every neighborhood $U$ of zero. The representation (1) is unique.

Proof. Assume first that $f$ admits the integral representation (1). That $f$ is conditionally positive definite follows immediately from the fact that the functions
\[ t \mapsto a + i(t, x) - (Ct, t) \qquad \text{and} \qquad t \mapsto e^{i(t,y)} - 1 - i(t, h(y)) \]
are conditionally positive definite.

Next suppose that $f$ is conditionally positive definite and let $\mu_0$ be the measure from Lemma 3.12.11. Since $\mu_0 * f$ is positive definite it can be represented as $\mu_0 * f = \check\sigma_0$ with some nonnegative measure $\sigma_0$. We define the measure $\tau$ by $\tau(\{0\}) := 0$ and
\[ d\tau(y) := \frac{1}{\hat\mu_0(y)}\, d\sigma_0(y), \qquad y \ne 0. \]
Then $\tau$ is nonnegative and (3.12.11.ii) implies that $\tau(\mathbb{R}^d \setminus U) < \infty$ for every neighborhood $U$ of zero. By Corollary 3.12.10, for $\mu, \nu \in M_f(\mathbb{R}^d)$ with $\hat\mu(0) = \hat\nu(0) = 0$ there is a complex measure $\sigma$ such that
\[ \mu * \nu * f = \check\sigma. \]
It follows from $\mu_0 * (\mu * \nu * f) = \mu * \nu * (\mu_0 * f)$ that
\[ \hat\mu_0(y)\, d\sigma(y) = (\mu * \nu)^{\wedge}(y)\, d\sigma_0(y). \]
Using this relation and the definition of $\tau$ we obtain
\[ d\sigma(y) = (\mu * \nu)^{\wedge}(y)\, d\tau(y), \qquad y \ne 0. \tag{3} \]
In particular, the right-hand side represents a finite measure. The special case $\mu = \mu_x := \delta_0 - \delta_x$ and $\nu := \tilde\mu_x$ gives
\[ \int_{\mathbb{R}^d} |\hat\mu_x(y)|^2\, d\tau(y) = \int_{\mathbb{R}^d} |1 - e^{i(x,y)}|^2\, d\tau(y) = 2 \int_{\mathbb{R}^d} [1 - \cos((x, y))]\, d\tau(y) < \infty, \qquad x \in \mathbb{R}^d, \]
from which the relation (2) readily follows. Now we put
\[ p(t) := f(t) - \int_{\mathbb{R}^d} \big[e^{i(t,y)} - 1 - i(t, h(y))\big]\, d\tau(y), \qquad t \in \mathbb{R}^d. \]
That the integral above exists follows from (2) using the inequality in Lemma B.1.5 and the fact that $h(y) = y$ in some neighborhood of zero. Next we show that $\mu * \nu * p$ is a constant for all $\mu, \nu \in M_f(\mathbb{R}^d)$ with $\hat\mu(0) = \hat\nu(0) = 0$. Writing $\rho := \mu * \nu$ for short, and applying Proposition C.9.6 we obtain
\[ \rho * p(t) = \rho * f(t) - \int_{\mathbb{R}^d} \hat\rho(y)\, e^{i(t,y)}\, d\tau(y) = \check\sigma(t) - \int_{\mathbb{R}^d} \hat\rho(y)\, e^{i(t,y)}\, d\tau(y), \qquad t \in \mathbb{R}^d. \]
From equation (3) we see that the last integral is equal to
\[ \int_{\mathbb{R}^d \setminus \{0\}} e^{i(t,y)}\, d\sigma(y) = \check\sigma(t) - \sigma(\{0\}) \]
and hence
\[ \rho * p(t) = \sigma(\{0\}), \qquad t \in \mathbb{R}^d. \]
In particular, $(\delta_0 - \delta_x) * (\delta_0 - \delta_s) * p$ is constant for all $x, s \in \mathbb{R}^d$. Applying Lemma C.9.4 twice we see that $p$ is a polynomial of degree at most 2. Setting $\nu := \tilde\mu$, we obtain that $\mu * \tilde\mu * p(0) = \sigma(\{0\}) \ge 0$ and therefore, by Lemma 3.12.6, the function $p$ is conditionally positive definite. Thus, $e^p$ is positive definite in view of Theorem 3.12.5. The special form of $p$ follows from (the simple part of) Theorem 3.5.1.

To prove uniqueness of the representation (1) assume that
\[ p_1(t) + \int_{\mathbb{R}^d} \big[e^{i(t,y)} - 1 - i(t, h(y))\big]\, d\tau_1(y) = p_2(t) + \int_{\mathbb{R}^d} \big[e^{i(t,y)} - 1 - i(t, h(y))\big]\, d\tau_2(y), \qquad t \in \mathbb{R}^d, \]
where $p_j$ is a polynomial of degree at most 2 and $\tau_j$ is a measure having the properties of $\tau$, $j = 1, 2$. Let $x \in \mathbb{R}^d$ be arbitrary. Convolving both sides of the equation above
with $\rho := (\delta_0 - \delta_x) * (\delta_0 - \delta_x) * (\delta_0 - \delta_x)$ and applying Proposition C.9.6 we obtain
\[ \int_{\mathbb{R}^d} \hat\rho(y)\, e^{i(t,y)}\, d\tau_1(y) = \int_{\mathbb{R}^d} \hat\rho(y)\, e^{i(t,y)}\, d\tau_2(y), \qquad t \in \mathbb{R}^d. \]
Since $\hat\rho(y)\, d\tau_j(y) = \big[1 - e^{i(x,y)}\big]^3\, d\tau_j(y)$ is a finite measure, in view of property (2), we conclude that
\[ \big[1 - e^{i(x,y)}\big]^3\, d\tau_1(y) = \big[1 - e^{i(x,y)}\big]^3\, d\tau_2(y), \qquad x \in \mathbb{R}^d. \]
Taking into account that $\tau_1(\{0\}) = \tau_2(\{0\}) = 0$ we infer that $\tau_1 = \tau_2$. Hence we also have $p_1 = p_2$.

Remark 3.12.13. If $f$ is radial, then the measure $\tau$ in Theorem 3.12.12 is radial as well. This follows immediately from the construction of $\tau$. In this case, $x = 0$ and $C$ is a nonnegative multiple of the identity matrix.

Theorem 3.12.14 (Micchelli). Let $h : [0, \infty) \to \mathbb{R}$ be a continuous function which is infinitely differentiable on $(0, \infty)$. Then the function $f := h(\|\cdot\|^2)$ is conditionally positive definite on $\mathbb{R}^d$ for all $d$ if and only if $-h'$ is completely monotone on $(0, \infty)$.$^{37}$

Proof. If $-h'$ is completely monotone on $(0, \infty)$ then, by Corollary 3.9.7, it admits the representation
\[ -h'(r) = \int_0^{\infty} e^{-rs}\, d\mu(s), \qquad r \in (0, \infty), \]
where $\mu$ is a nonnegative measure on $[0, \infty)$. Let $\varepsilon > 0$ and define the function $h_\varepsilon$ by $h_\varepsilon(r) := h(r + \varepsilon)$, $r \in (-\varepsilon, \infty)$. Then we have
\[
\begin{aligned}
h_\varepsilon(r) &= h_\varepsilon(0) + \int_0^r h_\varepsilon'(t)\, dt \\
&= h_\varepsilon(0) - \int_0^r \int_0^{\infty} e^{-(t+\varepsilon)s}\, d\mu(s)\, dt \\
&= h_\varepsilon(0) - \int_0^{\infty} e^{-\varepsilon s} \int_0^r e^{-ts}\, dt\, d\mu(s) \\
&= h_\varepsilon(0) + \int_0^{\infty} \frac{e^{-rs} - 1}{s}\, e^{-\varepsilon s}\, d\mu(s).
\end{aligned}
\]
Since the function $e^{-\|\cdot\|^2 s} - 1$ is conditionally positive definite on $\mathbb{R}^d$ for all $s \in [0, \infty)$ and for all $d$, the function $h_\varepsilon(\|\cdot\|^2)$ is conditionally positive definite on $\mathbb{R}^d$ for all $d$. Letting $\varepsilon$ tend to zero, we conclude that the same is true for $h$.

37 This result can easily be generalized for conditionally positive definite functions of order $m$ (see [58]).
Next assume that $h(\|\cdot\|^2)$ is conditionally positive definite on every $\mathbb{R}^d$. Corollary 3.12.9 shows that the function $-h'(\|\cdot\|^2)$ is positive definite on every $\mathbb{R}^d$. From Schoenberg’s Theorem 3.8.5 we conclude that $-h'$ is completely monotone.

Example 3.12.15. Let $a > 0$, $0 < b < 1$, and write $g(r) := -(a + r)^b$, $r \ge 0$. From the relation
\[ g^{(k)}(r) = -b(b-1) \cdots (b-k+1)\, (a + r)^{b-k}, \qquad k \in \mathbb{N}, \]
we see that $-g'$ is completely monotone. Thus, $g(\|\cdot\|^2)$ is conditionally positive definite on every $\mathbb{R}^d$.

Remark 3.12.16. M. G. Kreĭn (see the historical remarks in [49]) generalized the concept of positive definiteness in the following way: A complex-valued Hermitian function $f$ on $\mathbb{R}^d$ is said to have $k$ negative squares if the Hermitian matrix
\[ A = \big(f(t_i - t_j)\big)_{i,j=1}^{n} \]
has at most $k$ negative eigenvalues (counted with their multiplicities) for any choice of $n$ and $t_1, \dots, t_n \in \mathbb{R}^d$, and for some choice of $n$ and $t_1, \dots, t_n$ the matrix $A$ has exactly $k$ negative eigenvalues. Denote by $P_k(\mathbb{R}^d)$ the set of all functions on $\mathbb{R}^d$ with $k$ negative squares. As we have seen in Lemma 3.12.3, conditionally positive definite functions belong to $P_0(\mathbb{R}^d) \cup P_1(\mathbb{R}^d)$. Further examples are given by the functions
\[
\begin{aligned}
&t \mapsto t^k e^{t} + (-t)^k e^{-t} \in P_{k+1}(\mathbb{R}), \\
&t \mapsto (-1)^k t^{2k} \in P_k(\mathbb{R}), \\
&t \mapsto (-1)^{k+1} t^{2k} \in P_{k+1}(\mathbb{R}), \\
&t \mapsto (-1)^k |t|^a \in P_k(\mathbb{R}), \qquad a \in (2k-2, 2k],\ k > 0,
\end{aligned}
\]
(see [49] for proofs). In [36] M. G. Kreĭn proved that every continuous function $f \in P_k(\mathbb{R})$ is definitizable in the following sense: there exists a polynomial $Q$ of degree $k$ such that the inequality
\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x - y)\, \overline{Q\Big(i \frac{d}{dy}\Big) h(y)}\; Q\Big(i \frac{d}{dx}\Big) h(x)\, dy\, dx \ge 0 \]
holds for every infinitely differentiable function $h$ with compact support. He obtained the integral representation
\[ f(x) = p(x) + \int_{-\infty}^{\infty} \frac{e^{itx} - S(x, t)}{|Q_0(t)|^2}\, d\sigma(t) \]
where $p$ is a Hermitian solution of the differential equation
\[ Q\Big(i \frac{d}{dx}\Big)\, \overline{Q}\Big(i \frac{d}{dx}\Big)\, p(x) = 0 \]
where $\overline{Q}(t) := \overline{Q(\bar t)}$, $Q_0$ is a polynomial that is obtained by deleting the non-real zeros of $Q$, $S$ is a regularizing correction compensating for the real zeros of $Q$, and $\sigma$ is a nonnegative measure satisfying
\[ \int_{-\infty}^{\infty} \frac{1}{(1 + t^2)^m}\, d\sigma(t) < \infty \]
where $m$ denotes the degree of $Q_0$.

The concepts of negative squares and definitizability, as well as Kreĭn’s integral representation have been generalized to commutative groups. Functions with $k$ negative squares on a commutative group are definitizable in the sense that certain linear combinations of their translates are positive definite. We refer to [49] for more information on this topic. Definitizable functions are closely related to certain random fields which are generalizations of stationary processes.$^{38}$

To see multivariate examples of functions with negative squares, denote by $s_r^d$, $1 \le r \le d$, the elementary symmetric polynomial of degree $r$ in $d$ real variables:
\[
\begin{aligned}
s_1^d(t) &= t_1 + t_2 + \cdots + t_d \\
s_2^d(t) &= t_1 t_2 + t_1 t_3 + \cdots + t_{d-1} t_d \\
&\ \,\vdots \\
s_r^d(t) &= t_1 \cdots t_r + \cdots + t_{d-r+1} \cdots t_d \\
&\ \,\vdots \\
s_d^d(t) &= t_1 t_2 \cdots t_d, \qquad t \in \mathbb{R}^d.
\end{aligned}
\]
Then we have
\[ \pm i s_1^d \in P_1(\mathbb{R}^d), \quad s_2^d \in P_2(\mathbb{R}^d), \quad -s_2^d \in P_d(\mathbb{R}^d), \quad \pm i s_3^d \in P_{d+1}(\mathbb{R}^d), \quad \pm i^d s_d^d \in P_{2^{d-1}}(\mathbb{R}^d) \]
(see [51] for proofs).

38 See, among others, [4, 9, 42, 52, 53] and the references therein.
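The notion of negative squares from Remark 3.12.16 can be explored by counting negative eigenvalues of the matrix $(f(t_i - t_j))$ for a fixed set of points. The sketch below is an assumption-laden illustration: it treats real even functions on $\mathbb{R}$ only, uses a small hand-written cyclic Jacobi eigenvalue routine (names hypothetical), and inspects a single choice of points, so it can confirm consistency with the listed classes but cannot establish membership in $P_k(\mathbb{R})$.

```python
import math


def eigenvalues_symmetric(matrix, sweeps=60):
    """Eigenvalues of a real symmetric matrix by the cyclic Jacobi rotation method."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):      # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):      # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def negative_squares(f, points, tol=1e-9):
    """Number of eigenvalues below -tol of the matrix (f(t_i - t_j)) for these points."""
    matrix = [[f(ti - tj) for tj in points] for ti in points]
    return sum(1 for ev in eigenvalues_symmetric(matrix) if ev < -tol)


if __name__ == "__main__":
    pts = [0.0, 1.0, 2.0, 3.5, 5.0]
    print(negative_squares(lambda t: -t * t, pts))   # consistent with -t^2 in P_1(R)
    print(negative_squares(lambda t: t * t, pts))    # consistent with  t^2 in P_2(R)
    print(negative_squares(lambda t: -abs(t), pts))  # consistent with -|t| in P_1(R)
```

The three functions correspond to the cases $k = 1$ of $(-1)^k t^{2k}$, $(-1)^{k+1} t^{2k}$ and $(-1)^k |t|^a$ with $a = 1$ in the list of Remark 3.12.16.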
Chapter 4
The extension problem
In this chapter, we study the extendability of positive definite functions defined on a certain subset V of a commutative group. After proving some general results we consider the groups Rd and Zd .
4.1 General results

Throughout this section $G$ is a discrete commutative group with character group $\hat G$ and $V$ denotes a symmetric subset$^{1}$ of $G$ containing 0.

Definition 4.1.1. A complex-valued function $f$ on $V$ is called positive definite if the inequality
\[ \sum_{i,j=1}^{n} f(x_i - x_j)\, c_i \bar c_j \ge 0 \tag{1} \]
holds for all $c_1, \dots, c_n \in \mathbb{C}$ and all $x_1, \dots, x_n \in V$ such that $x_i - x_j \in V$, $i, j = 1, \dots, n$. We denote by $P(V)$ the set of all positive definite functions on $V$ and write
\[ P_0(V) := \{f : f \in P(V),\ f(0) \le 1\}. \]

Remark 4.1.2. If $f \in P(V)$, then inequality (4.1.1.1) holds for all $x_1, \dots, x_n \in G$ such that $x_i - x_j \in V$. Indeed, setting $y_i := x_i - x_1$ we have $y_i \in V$ and $y_i - y_j = x_i - x_j$.

If $V = T - T$ with some $T \subset G$, then every $f \in P(V)$ is $T$-positive definite (cf. Definition 2.9.1). Now let $G = \mathbb{R}$ and $V = (-2a, 2a)$ with some $a > 0$. Then $V = T - T$ where $T = (-a, a)$. In this case a complex-valued function $f$ on $V$ is $T$-positive definite if and only if $f$ is positive definite on $V$. To see this, let $x_1, \dots, x_n \in V$ be such that $x_i - x_j \in V$ for all $i$ and $j$. Then the diameter of the set $\{x_1, \dots, x_n\}$ is less than $2a$ and hence there exists $x_0 \in \mathbb{R}$ such that $x_i \in (x_0 - a, x_0 + a)$ for all $i$. Setting $y_i := x_i - x_0$ we have $y_i \in T$ and $y_i - y_j = x_i - x_j$. Consequently, $f$ is positive definite on $V$.

Example 4.1.4 shows that there exist a finite set $T \subset \mathbb{R}^2$ and a function $f$ on $T - T$ such that $f$ is $T$-positive definite but not positive definite on $T - T$.

1 A subset $V$ of $G$ is called symmetric if $V = -V$.
Example 4.1.3. For $a \in (0, \pi/2)$ let $V_a := \{e^{it} : -a < t < a\} \subset G := \mathbb{T}$ and let $r \in \mathbb{R}$ be arbitrary.$^{2}$ Then the function
\[ \gamma_r(e^{it}) := e^{irt}, \qquad -a < t < a, \]
is positive definite on $V_a$. Indeed, if $x, y, x - y \in V_a$ then $\gamma_r(x - y) = \gamma_r(x)\, \overline{\gamma_r(y)}$ and hence we can repeat the computations in Example 1.4.6:
\[ \sum_{i,j=1}^{n} \gamma_r(x_i - x_j)\, c_i \bar c_j = \Big| \sum_{i=1}^{n} c_i \gamma_r(x_i) \Big|^2 \ge 0. \]

2 Below we use additive notation for the group operation in $\mathbb{T}$ which is actually multiplication.

Example 4.1.4. Write$^{3}$
\[ T := \Big\{ \Big(\cos\frac{2\pi k}{6},\ \sin\frac{2\pi k}{6}\Big) : k = 0, 1, \dots, 5 \Big\} \]
and $V := T - T$. We show that there is a function $\varphi : [0, 2] \to [-1, 1]$ such that the function $f(x) = \varphi(\|x\|)$ is $T$-positive definite but $f \notin P(V)$.

Note that $V$ consists of 19 points, of which 6 have norm 1, 6 have norm $\sqrt{3}$, and 6 have norm 2 (see Figure 4.1). Thus, apart from $\varphi(0) = 1$, it suffices to specify three numbers:
\[ \varphi(1), \quad \varphi(\sqrt{3}) \quad \text{and} \quad \varphi(2). \]
Choose any
\[ \gamma \in [-1, -1/2), \qquad a \in [-1/2, 1], \qquad \text{and} \qquad b \in [-1, 1/2] \]
and put
\[ \varphi(1) := \alpha := \frac{1}{2}\,[a(1 + \gamma) + b(1 - \gamma)], \qquad \varphi(\sqrt{3}) := \beta := \frac{1}{2}\,[a(1 + \gamma) - b(1 - \gamma)] \]
and $\varphi(2) := \gamma$. To see that $f$ is $T$-positive definite, write
\[ t_k = (\cos 2\pi k/6,\ \sin 2\pi k/6) \]
so we have to show that the matrix $A = (f(t_j - t_k))_{j,k=0}^{5}$ is positive semidefinite. Writing down $A$, we see that it is the Toeplitz matrix corresponding to the function $(1, \alpha, \beta, \gamma, \beta, \alpha)$ on the group $\{0, 1, 2, 3, 4, 5\} = \mathbb{Z}/6\mathbb{Z}$ (factor group). By (1.9.6.iii) it suffices to show that the Fourier transform of this function is nonnegative. So one has to verify
\[ 1 + \alpha\,(z^k + z^{-k}) + \beta\,(z^{2k} + z^{-2k}) + \gamma z^{3k} \ge 0 \]
for all $k = 0, \dots, 5$, where $z = e^{i 2\pi/6}$. This is readily done.$^{4}$

Figure 4.1. The set $V = T - T$ from Example 4.1.4 (axes: Re, Im).

To see that $f$ is not positive definite on $V$, write $R = \{0, 2t_0, 2t_1\}$ and note that $R - R \subset V$. The corresponding $3 \times 3$ matrix is
\[ B = \begin{pmatrix} 1 & \gamma & \gamma \\ \gamma & 1 & \gamma \\ \gamma & \gamma & 1 \end{pmatrix}. \]
Setting $v := [1, 1, 1]^T$ we have $(Bv, v) = 3(1 + 2\gamma) < 0$. Thus, $B$ is not positive semidefinite.

3 This example is due to T. M. Bisgaard.
4 By symmetry of the function it suffices to consider the values $k = 0, 1, 2, 3$.

In the next theorem we list basic properties of positive definite functions on $V$ which can be proved in the same way as in Section 1.4 and Section 1.5.

Theorem 4.1.5.
(i) If $f \in P(V)$, then $f$ is Hermitian.
(ii) Let $f_1, f_2 \in P(V)$. Then the functions $\bar f_1$, $\operatorname{Re} f_1$, $|f_1|^2$, and $f_1 f_2$ are positive definite. Moreover, $p_1 f_1 + p_2 f_2$ is positive definite for all $p_1, p_2 \ge 0$.
(iii) The set $P(V)$ is a convex cone closed in the topology of pointwise convergence, while $P_0(V)$ is a compact convex set.
(iv) The inequalities of Theorem 1.4.12 hold for a function $f \in P(V)$ whenever the used arguments of $f$ lie in $V$.
(v) Let $G = \mathbb{R}^d$ or $G = \mathbb{T}^d$, let $V$ be open and $f \in P(V)$. If $\operatorname{Re} f$ is continuous at the neutral element, then $f$ is uniformly continuous on $V$.
(vi) Let $V \subset \mathbb{R}^d$ and assume that $V$ is an open ball with center 0. If $f \in P^c(V)$ is such that $|f(t)| = 1$ for all $t$ from a neighborhood of 0, then
\[ f(t) = e^{i(x,t)}, \qquad t \in V, \]
for some $x \in \mathbb{R}^d$.

The next proposition can be proved in the same way as Lemma 1.5.8. We omit the proof.

Proposition 4.1.6. Let $r > 0$ and $f \in P^c(B^o(2r))$. Then the inequality
\[ \int_{B^o(r)} \int_{B^o(r)} f(x - y)\, c(x)\, \overline{c(y)}\, dx\, dy = \int_{B^o(2r)} f(x)\, (c * \tilde c)(x)\, dx \ge 0 \]
holds for every continuous function $c$ on $\mathbb{R}^d$ whose support lies in $B^o(r)$.

Notation 4.1.7. Recall that the support $\operatorname{supp}(\mu)$ of a measure $\mu = \sum_j c_j \delta_{x_j} \in M_f(G)$ is given by
\[ \operatorname{supp}(\mu) = \{x_j : c_j \ne 0\}. \]
We write
\[
\begin{aligned}
T_V &:= \{\hat\mu : \mu \in M_f(G),\ \operatorname{supp}(\mu) \subset V\}, \\
T_V^r &:= \{p \in T_V : p \text{ is real-valued}\}, \\
T_V^+ &:= \{p \in T_V : p \ge 0\}.
\end{aligned}
\]
Further, let $T_V^2$ denote the set of all $q \in T_V^+$ of the form
\[ q = \sum_{j=1}^{n} |q_j|^2 \qquad \text{for some } n \in \mathbb{N} \]
where
\[ q_j = \hat\mu_j \in T_V \qquad \text{and} \qquad \operatorname{supp}(\mu_j) - \operatorname{supp}(\mu_j) \subset V. \tag{1} \]
It is clear that
\[ T_V^2 \subset T_V^+ \subset T_V^r \subset T_V. \]
$T_V$ is a complex linear space, $T_V^r$ is a real linear space and the sets $T_V^2$ and $T_V^+$ are convex cones. Note that $T_V^2$ contains the constant character and hence $T_V^2 \ne \emptyset$. By (1.8.3.vi) a function $p = \hat\mu \in T_V$ belongs to $T_V^r$ if and only if $\mu = \tilde\mu$. On $T_V$ and $T_V^r$ we will use the supremum norm
\[ \|p\|_\infty = \sup\{|p(\gamma)| : \gamma \in \hat G\}. \]
Assume that $V$ is finite. Then both of the linear spaces $T_V$ and $T_V^r$ are finite dimensional and we have $\dim(T_V) = \dim(T_V^r) = d$, where $d$ is the number of points of $V$. For a complex-valued Hermitian function $f$ on $V$ we set
\[ L_f(p) := \sum_{x \in V} f(x)\, \check p(x) = \int_V f\, d\mu, \qquad p = \hat\mu \in T_V^r. \]
Then $L_f$ is a real linear functional on $T_V^r$. Conversely, if $L$ is a real linear functional on $T_V^r$, then $L = L_f$ with some Hermitian function $f$ on $V$.

Lemma 4.1.8. If $V$ is finite, then the cone $T_V^2$ is closed in $T_V^r$ with respect to the norm $\|\cdot\|_\infty$.

Proof. Let $d$ denote the dimension of $T_V^r$. We prove that every $q \in T_V^2$ is a sum of $d$ squares $|q_j|^2$ with some $q_j$ satisfying (4.1.7.1). Suppose that
\[ q = \sum_{j=1}^{m} |q_j|^2 \tag{1} \]
where $m > d$. Since $m > \dim(T_V^r)$, there exist real numbers $r_j$, not all zero, such that
\[ \sum_{j=1}^{m} r_j |q_j|^2 = 0. \]
We may assume, without loss of generality, that $|r_1| \ge |r_2| \ge \cdots \ge |r_m|$ and $r_1 \ne 0$. Solving the above equation for $|q_1|^2$ and substituting the solution into (1) we obtain
\[ q = \sum_{j=2}^{m} (1 - r_j/r_1)\, |q_j|^2 \]
which is a sum of $m - 1$ squares. Repeating these arguments we see that every $q \in T_V^2$ is a sum of $d$ squares.

Suppose that a sequence $\{q_n\}_1^{\infty} \subset T_V^2$ converges to $q \in T_V^r$ with respect to the norm $\|\cdot\|_\infty$. By the first part of the proof there exist functions $q_{j,n}$ satisfying (4.1.7.1) such that
\[ q_n = \sum_{j=1}^{d} |q_{j,n}|^2. \tag{2} \]
The sequence $\{q_n\}_1^{\infty}$ being convergent, there exists a constant $K$ such that $\|q_n\|_\infty \le K$ for all $n$. It follows from equation (2) that $\|q_{j,n}\|_\infty^2 \le K$ for all $j$ and $n$. The normed space $T_V$ being finite dimensional, there is a sequence $n_i \to \infty$ such that $q_{j,n_i}$ converges to some $q_j' \in T_V$, $j = 1, \dots, d$. Since the corresponding measures $\mu_{j,n_i}$ converge weakly, $q_j'$ satisfies the condition on the support in (4.1.7.1). Obviously we have
\[ q = \sum_{j=1}^{d} |q_j'|^2. \]
Thus, $q \in T_V^2$, showing that $T_V^2$ is closed.

Lemma 4.1.9. Let $f : V \to \mathbb{C}$ be a Hermitian function. Then $f$ is positive definite on $V$ if and only if$^{5}$
\[ L_f(q) \ge 0 \quad \text{for all } q \in T_V^2. \tag{1} \]
Moreover, $f$ can be extended to a positive definite function on $G$ if and only if
\[ L_f(p) \ge 0 \quad \text{for all } p \in T_V^+. \]

5 See Notation 4.1.7 for the definition of $L_f$.

Proof. To prove the first statement assume that $f \in P(V)$ and let $\mu = \sum_{i=1}^{n} c_i \delta_{x_i}$ be such that $x_i \in V$ and $x_i - x_j \in V$ for all $i, j$. Then $\operatorname{supp}(\mu) - \operatorname{supp}(\mu) \subset V$. Since $(\mu * \tilde\mu)^{\wedge} = |\hat\mu|^2$ we have
\[ L_f(|\hat\mu|^2) = \int_V f\, d(\mu * \tilde\mu) = \sum_{i,j=1}^{n} f(x_i - x_j)\, c_i \bar c_j. \]
This relation shows that $f$ is positive definite if and only if (1) holds.

To prove the second statement, suppose that $f$ can be extended to a positive definite function $\varphi \in P(G)$. If $p \in T_V^+$, then $\check p$ is a positive definite function on $G$ with $\operatorname{supp}(\check p) \subset V$. Using the fact that $\check p \in P(G)$ and applying Theorem 1.9.6 we obtain
\[ L_f(p) = \sum_{x \in V} f(x)\, \check p(x) = \sum_{x \in G} \varphi(x)\, \check p(x) = (\varphi \check p)^{\wedge}(0) \ge 0. \]
To prove the converse statement, assume that $L_f(p) \ge 0$ for every $p \in T_V^+$. Without loss of generality we may suppose that $f(0) = 1$. If $p \le q$, $p, q \in T_V^r$, then $q - p \in T_V^+$ and hence $L_f(p) \le L_f(q)$. From this we conclude that $|L_f(p)| \le L_f(1) = 1$ whenever $p \in T_V^r$ and $-1 \le p \le 1$. Thus, $L_f$ is a linear functional of norm$^{6}$ 1 on $T_V^r$. By the Hahn–Banach theorem (cf. Theorem D.4.4), $L_f$ can be extended to a linear functional $L$ of norm 1 on the linear space $C_r(\hat G)$. Since
\[ 1 = L(1) \le \|L\| = 1, \]
$L$ is nonnegative (cf. Lemma E.1.5). The Riesz representation theorem (see Theorem E.1.3) shows the existence of a (nonnegative) measure $\sigma \in M(\hat G)$ such that
\[ L(p) = L_f(p) = \int_{\hat G} p(\gamma)\, d\sigma(\gamma), \qquad p \in T_V^r. \tag{2} \]
Let $x \in V$ be arbitrary. Since $V$ is symmetric, the function $p$ defined by $p(\gamma) := \gamma(x) + \overline{\gamma(x)}$ belongs to $T_V^r$. Substituting this $p$ into (2) we obtain
\[ f(x) + f(-x) = \check\sigma(x) + \check\sigma(-x). \]
If we define $p$ by $p(\gamma) := i\,(\gamma(x) - \overline{\gamma(x)})$, then (2) gives
\[ f(x) - f(-x) = \check\sigma(x) - \check\sigma(-x). \]
Thus, $f(x) = \check\sigma(x)$ for all $x \in V$, i.e., $\check\sigma$ is an extension of $f$. Since $\sigma$ is nonnegative the function $\check\sigma$ is positive definite. The proof is complete.

6 Note that we consider the supremum norm on $C_r(\hat G)$.

Theorem 4.1.10. Let $V$ be finite. The following two conditions are equivalent:
(i) every function $f \in P(V)$ can be extended to a function in $P(G)$;
(ii) $T_V^+ = T_V^2$.

Proof. Suppose that $T_V^+ = T_V^2$ and let $f \in P(V)$. By the first statement of Lemma 4.1.9 the inequality $L_f(p) \ge 0$ holds for all $p \in T_V^2 = T_V^+$. It follows from the second statement of Lemma 4.1.9 that $f$ admits a positive definite extension.

By Lemma 4.1.8, the convex cone $T_V^2$ is closed. If $T_V^2 \ne T_V^+$ then there exists a real linear functional $L$ on $T_V^r$ such that $L$ is nonnegative on $T_V^2$ and $L(p_0) < 0$ for some $p_0 \in T_V^+$ (cf. Corollary D.4.3 with $F = T_V^2$ and $C = \{p_0\}$). Let $f$ be the Hermitian function on $V$ for which $L = L_f$. Lemma 4.1.9 shows that $f$ is positive definite on $V$ but has no positive definite extension.

Example 4.1.11. Let $V_a$ and $\gamma_r$ be as in Example 4.1.3 and assume that $\gamma_r$ can be extended to a function $f_r \in P(\mathbb{T})$. By (4.1.5.v), the function $f_r$ is continuous. Arguing as in the proof of Corollary 1.4.13, we see that $f_r$ must be a character of $\mathbb{T}$. Thus, $f_r(t) = e^{int}$ with some $n \in \mathbb{Z}$. Consequently, $\gamma_r$ can be extended to a positive definite function on $\mathbb{T}$ if and only if $r$ is an integer.

In the rest of this section $G_1$ and $G_2$ are commutative groups, $V_2$ is a symmetric subset of $G_2$ containing 0 and $G$ denotes the product group $G_1 \times G_2$. We identify $G_1$ and $G_2$ with the subgroups $G_1 \times \{0\}$ and $\{0\} \times G_2$, respectively. Doing so, every element $x \in G$ has a unique decomposition $x = x_1 + x_2$ with $x_i \in G_i$, $i = 1, 2$.
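Condition (ii) of Theorem 4.1.10 can be made concrete in the smallest nontrivial case. The sketch below assumes $G = \mathbb{Z}$ and $V = \{-1, 0, 1\}$, which is not an example from the text, and restricts to real coefficients: a function $p(\theta) = a + 2b\cos\theta$ with $a \ge 2|b|$ is nonnegative, and it is the single square $|c_0 + c_1 e^{i\theta}|^2$ of the transform of $c_0\delta_0 + c_1\delta_1$, whose support difference set lies in $V$. The function names are hypothetical.

```python
import cmath
import math


def square_root_factor(a, b):
    """Real (c0, c1) with |c0 + c1 e^{i theta}|^2 = a + 2 b cos(theta); needs a >= 2|b|."""
    if a < 2 * abs(b):
        raise ValueError("a + 2 b cos(theta) takes negative values")
    plus, minus = math.sqrt(a + 2 * b), math.sqrt(a - 2 * b)
    # c0^2 + c1^2 = a and c0 * c1 = b follow from these two formulas.
    return (plus + minus) / 2, (plus - minus) / 2


def max_deviation(a, b, samples=360):
    """Largest gap between p(theta) and the square |c0 + c1 e^{i theta}|^2 on a grid."""
    c0, c1 = square_root_factor(a, b)
    worst = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        p = a + 2 * b * math.cos(theta)
        q = abs(c0 + c1 * cmath.exp(1j * theta)) ** 2
        worst = max(worst, abs(p - q))
    return worst


if __name__ == "__main__":
    print(square_root_factor(3.0, 1.2), max_deviation(3.0, 1.2))
    print(square_root_factor(2.0, -1.0), max_deviation(2.0, -1.0))
```

Under these assumptions every element of $T_V^+$ with real coefficients already lies in $T_V^2$, which is the mechanism by which (ii) yields extendability in (i).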
We will investigate positive definite functions on the set $V := G_1 \times V_2 \subset G$.

Lemma 4.1.12. Let $f \in P(V)$, $y \in G_1$, and $c \in \mathbb{C}$. Then the function
$$f_c^y(x) := (1 + |c|^2) f(x) + c f(x + y) + \bar c f(x - y), \qquad x \in V,$$
is positive definite on $V$.

Proof. It is easy to see that $x - y,\ x + y \in V$ if $x \in V$ and $y \in G_1$. Thus, the function $f_c^y$ is well defined on $V$. Writing $h_1 := 0$, $h_2 := y$, $d_1 := 1$ and $d_2 := c$, we have
$$f_c^y(x) = \sum_{k,l=1}^{2} f(x + h_l - h_k)\, d_l \overline{d_k}, \qquad x \in V.$$
Now let $c_1, \dots, c_n \in \mathbb{C}$ be arbitrary and $x_1, \dots, x_n \in V$ be such that $x_i - x_j \in V$. Setting $x_{j,k} := x_j + h_k$, $c_{j,k} := c_j d_k$, using the equation above and the fact that $f \in P(V)$, we obtain
$$\sum_{i,j=1}^{n} f_c^y(x_i - x_j)\, c_i \overline{c_j} = \sum_{k,l=1}^{2} \sum_{i,j=1}^{n} f(x_{i,l} - x_{j,k})\, c_{i,l} \overline{c_{j,k}} \ge 0,$$
i.e., $f_c^y \in P(V)$.

Lemma 4.1.13. If $f$ is an extremal point of the convex set $P_0(V)$, then $f(x + y) = f(x) f(y)$ holds for all $x \in G_1$ and $y \in V$.

Proof. This lemma can be proved by the same argument that was used in the first part of the proof of Theorem 1.4.20. We omit the details.

Theorem 4.1.14. If every positive definite function on $V_2$ can be extended to a positive definite function on $G_2$, then every positive definite function on $V$ has a positive definite extension to $G$.

Proof. First we prove that every extremal point $f$ of $P_0(V)$ can be extended to a positive definite function on $G$. By the previous lemma we have
$$f(x + y) = f(x) f(y), \qquad x \in G_1,\ y \in V.$$
The function $f_2$ defined by $f_2(y) := f(y)$, $y \in V_2$, is positive definite on $V_2$. By assumption, it has an extension $\psi_2 \in P(G_2)$. Setting
$$\psi(x + y) := f(x)\, \psi_2(y), \qquad x \in G_1,\ y \in G_2,$$
the function $\psi$ is positive definite on $G$ (cf. Lemma 1.4.16). Moreover, $\psi$ is an extension of $f$.
The set $P_0(V)$ is compact in the topology of pointwise convergence. The Kreĭn–Milman Theorem D.4.5 shows that every function $g \in P_0(V)$ is the pointwise limit of a net $\{f_\alpha\}$, where $f_\alpha$ is a finite convex combination of extremal points. By what we have proved, $f_\alpha$ admits an extension $\psi_\alpha \in P(G)$ with $\psi_\alpha(0) = f_\alpha(0) = 1$. Since $P_0(G)$ is compact, the net $\{\psi_\alpha\}$ has a subnet converging pointwise to some complex-valued function $\psi$ on $G$. It is clear that $\psi$ is a positive definite extension of $g$.
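The statement of Lemma 4.1.12 can be tried out numerically in the simplest situation. The sketch below is only an illustration under assumptions that are not part of the text: it takes $G_1 = \mathbb{Z}$ and $G_2 = V_2 = \{0\}$ (so that $V = G = \mathbb{Z}$), uses the example $f(n) = 2^{-|n|}$, and samples the quadratic form $\sum_{i,j} g(x_i - x_j) c_i \overline{c_j}$ at randomly chosen points and coefficients. Sampling cannot prove positive definiteness; it can only fail to find a counterexample.

```python
import random


def f(n: int) -> float:
    """Assumed example of a positive definite function on Z: f(n) = 0.5 ** |n|."""
    return 0.5 ** abs(n)


def f_cy(x: int, y: int, c: complex) -> complex:
    """(1 + |c|^2) f(x) + c f(x + y) + conj(c) f(x - y), as in Lemma 4.1.12."""
    return (1 + abs(c) ** 2) * f(x) + c * f(x + y) + c.conjugate() * f(x - y)


def quadratic_form(g, points, coeffs) -> complex:
    """Evaluate sum_{i,j} g(x_i - x_j) c_i conj(c_j)."""
    return sum(
        g(xi - xj) * ci * cj.conjugate()
        for xi, ci in zip(points, coeffs)
        for xj, cj in zip(points, coeffs)
    )


def smallest_sampled_form(g, trials: int = 200, size: int = 6, seed: int = 0) -> float:
    """Smallest real part of the quadratic form over random points and coefficients."""
    rng = random.Random(seed)
    smallest = float("inf")
    for _ in range(trials):
        points = [rng.randint(-10, 10) for _ in range(size)]
        coeffs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(size)]
        smallest = min(smallest, quadratic_form(g, points, coeffs).real)
    return smallest


# Hypothetical parameters c and y chosen only for this illustration.
print(smallest_sampled_form(lambda x: f_cy(x, 3, 1 + 2j)))
```

Up to rounding, the printed value should not be negative, in agreement with the lemma.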
4.2 The cases $\mathbb{R}^d$ and $\mathbb{Z}^d$

We start with the one-dimensional case.

Theorem 4.2.1. Let $N$ be a nonnegative integer. If $V = \{n \in \mathbb{Z} : |n| \le N\}$, then every positive definite function on $V$ can be extended to a positive definite function on $\mathbb{Z}$.

Proof. By Theorem 4.1.10 it suffices to show that $T_V^+ = T_V^2$. The character group of $\mathbb{Z}$ is $\mathbb{T}$ and each function $p \in T_V$ can be written as
$$p(z) = \sum_{n=-N}^{N} c_n z^n, \qquad z \in \mathbb{T},$$
with some complex numbers $c_n$. The equality $T_V^+ = T_V^2$ follows now at once from Theorem B.1.4.

Theorem 4.2.2. Let $a \in (0, \infty)$ and set $V := (-a, a)$.
(i) Every function $f \in P(V)$ can be extended to a function $\varphi \in P(\mathbb{R})$.
(ii) If $f$ is measurable (continuous) on $V$, then every positive definite extension of $f$ is measurable (continuous) on $\mathbb{R}$.
Proof. (i) Let $g$ be a positive definite function on $\mathbb{R}$ such that the support of $g$ is a finite subset of $V$. We show that the function $h$, which is equal to $fg$ on $V$ and is equal to zero on $\mathbb{R} \setminus V$, is positive definite on $\mathbb{R}$. For this we prove that the matrix $\bigl(h(x_i - x_j)\bigr)_{i,j=1}^{n}$ is positive semidefinite for all $x_1, \dots, x_n \in \mathbb{R}$. Without loss of generality assume that $x_1 \le x_2 \le \dots \le x_n$. We set $S := \{(i, j) : x_i - x_j \in V\}$ and
$$c_{ij} := f(x_i - x_j), \qquad (i, j) \in S.$$
Then $S$ and $c_{ij}$ satisfy the conditions of Theorem D.2.19 (see also the beginning of Remark 4.1.2). Thus, there exists a positive semidefinite matrix $(a_{ij})_{i,j=1}^{n}$ such that $a_{ij} = c_{ij}$ for $(i, j) \in S$. Schur's Theorem D.2.12 shows that the matrix $\bigl(a_{ij}\, g(x_i - x_j)\bigr)_{i,j=1}^{n}$ is positive semidefinite. Since the support of $g$ is contained in $V$, we have $g(x_i - x_j) = 0$ for $(i, j) \notin S$ and hence
$$h(x_i - x_j) = a_{ij}\, g(x_i - x_j), \qquad i, j = 1, \dots, n.$$
Thus, $h$ is positive definite and hence (cf. Theorem 1.9.6)
$$\hat h(1) = \sum_{x \in V} f(x) g(x) \ge 0. \tag{1}$$
Setting${}^{7}$
$$L_f(q) := \sum_{x \in V} f(x)\, \check q(x), \qquad q \in T_V^r,$$
inequality (1) shows that $L_f$ is nonnegative on $T_V^+$. Considering $\mathbb{R}$ as a discrete group and arguing as in the second part of the proof of (4.1.9.ii), we see that there exists a nonnegative finite measure $\mu$ on the character group of $\mathbb{R}$ such that $f(x) = \check\mu(x)$, $x \in V$. Consequently, $\check\mu$ is a positive definite extension of $f$.
(ii) The statements concerning measurability and continuity of $\varphi$ follow from Theorem 2.11.8 and Corollary 1.5.2.

As a corollary we obtain the following interesting result.

Theorem 4.2.3. If an analytic function $g : \mathbb{R} \to \mathbb{C}$ is positive definite on an interval $(-a, a)$, then it is positive definite on $\mathbb{R}$.

Proof. By the previous theorem, the restriction of $g$ to $(-a, a)$ can be extended to a continuous positive definite function $f$ on $\mathbb{R}$. It follows from (3.3.6.ii) that $f = g$.

Remark 4.2.4. In a series of papers M. G. Kreĭn investigated the problem of describing all positive definite continuations to $\mathbb{R}$ of a given positive definite function on a finite interval.${}^{8}$ To give details would go beyond the scope of this book. Below we present an example which was studied in [38]. We consider the function
t 2R
and its restrictions fa WD f j.2a;2a/ ; a > 0. Using Pólya’s Theorem 3.9.11 we see that fa is positive definite if a 1. If a > 1, then fa is not positive definite since its 7 8
We use the Notation 4.1.7 with G D R. See, e.g., [38] or [25] for a list of references.
210
Chapter 4 The extension problem
modulus is not majorized by f .0/. If a 1, then the periodic extension fQa of fa is positive definite. This can be seen from its Fourier series fQa .t / D 1 a C
1 X kD1
a .k C
1 2 2 2/
1
eit.kC 2 / a ;
t 2 R:
If a D 1 and fQ1 is a positive definite continuation of f1 to R, then fQ1 .2/ D 1 and therefore fQ1 is periodic with period 4 in view of Corollary 1.4.14. Using Kre˘ın’s results it was shown in [38] that for a < 1 the relation Z 1 eizt fQa; .t / dt D i 0
tan az 1 .a 1/2 ; z2 z z 2 cos2 az. C .a 1/2 tan az C .a 1/=z/
Im z > 0
establishes a bijective correspondence between all continuations fQa; of fa and all functions of the form 1 or Z 1 tz C 1 .z/ D ˛ C ˇz C d .t /; z 2 C n R 1 t z where ˛ 2 R; ˇ 0 and is nonnegative finite measure.9 We consider two special extensions. If D 1, then the corresponding extension is the 4a-periodic extension of fa . If .z/ D .a 1/2 i; Imz > 0, and a; is the measure corresponding to fQa; . then da; .x/ D
.a 1/2 dx: ..a 1/2 x 2 C .a 1/x sin.2ax/ C cos2 .ax//
This implies the relation Z .a 1/2 cos.tx/ 2 1 dx D 1 jt j 0 .a 1/2 x 2 C .a 1/x sin.2ax/ C cos2 .ax/ where 0 < a < 1; 2a < t < 2a. We now turn to the multivariate extension problem. Below we show that the extension is not always possible if d > 1 (cf. Theorem 4.2.6 and Theorem 4.2.8). For M 2 N write d WD fi 2 Zd W jik j M for all kg: SM d Theorem 4.2.5. If G D Zd and V D SM where d and M are greater than 1, then C 2 TV 6D TV . 9
is a so-called Nevanlinna function: it is holomorphic in the upper half plane and has a nonnegative imaginary part there.
Proof. We consider only the case $d = 2$, the general case can be treated in the same way. Denote by $\mathcal{L}$ the real linear space of all polynomials
$$P(s, t) = \sum_{m,n=0}^{2M} a_{m,n}\, s^m t^n, \qquad a_{m,n} \in \mathbb{R},$$
of two real variables. Since the character group of $\mathbb{Z}^2$ is $\mathbb{T}^2$, the linear space $T_V^r$ consists of functions of the form
$$p(z_1, z_2) = \sum_{m,n=-M}^{M} c_{m,n}\, z_1^m z_2^n, \qquad z_1, z_2 \in \mathbb{T},$$
where $c_{m,n} \in \mathbb{C}$ and $c_{-m,-n} = \overline{c_{m,n}}$. For $p \in T_V^r$ we define the function $Lp$ by
$$(Lp)(s, t) := (1 + s^2)^M (1 + t^2)^M\, p\!\left( \frac{s + i}{s - i}, \frac{t + i}{t - i} \right), \qquad s, t \in \mathbb{R}.$$
Noting that $(r + i)/(r - i) \in \mathbb{T}$ for all $r \in \mathbb{R}$, we see that $Lp \in \mathcal{L}$. The mapping $L$ is obviously linear and $Lp \ne 0$ if $p \ne 0$. From $\dim(\mathcal{L}) = \dim(T_V^r) = (2M + 1)^2$ we conclude that $L$ is a linear isomorphism from $T_V^r$ onto $\mathcal{L}$. Moreover, $p \ge 0$ if and only if $Lp \ge 0$.
Let $P$ be the polynomial from Lemma B.6.4. Then $P \in \mathcal{L}$ and setting $p := L^{-1} P$ we have $p \in T_V^+$. We show that $p \notin T_V^2$. Suppose, on the contrary, that $p \in T_V^2$. Then there exist $q_j = \hat\mu_j \in T_V$ with $p = \sum_j |q_j|^2$ where
$$\operatorname{supp}(\mu_j) \subset V \quad \text{and} \quad \operatorname{supp}(\mu_j) - \operatorname{supp}(\mu_j) \subset V. \tag{1}$$
We may suppose that
$$\operatorname{supp}(\mu_j) \subset S_M^+ := \{ (k, l) \in \mathbb{Z}^2 : 0 \le k, l \le M \}.$$
Indeed, it follows from (1) that there exists $x_j \in \mathbb{Z}^2$ such that $\operatorname{supp}(\delta_{x_j} * \mu_j) \subset S_M^+$. Moreover, we have
$$\bigl| (\delta_{x_j} * \mu_j)^{\wedge} \bigr|^2 = |\hat\mu_j|^2 = |q_j|^2.$$
Since $\operatorname{supp}(\mu_j) \subset S_M^+$, the function
$$Q_j(s, t) := (s - i)^M (t - i)^M\, q_j\!\left( \frac{s + i}{s - i}, \frac{t + i}{t - i} \right)$$
is a complex polynomial in each of the variables $s$ and $t$. Applying the definition of $L$ we see that $|Q_j|^2 = L|q_j|^2$. Consequently,
$$P = Lp = \sum_j L|q_j|^2 = \sum_j |Q_j|^2.$$
Writing $Q_j = g_j + i h_j$ where $g_j$ and $h_j$ are polynomials with real coefficients, we obtain
$$P = \sum_j (g_j^2 + h_j^2)$$
in contradiction to the choice of $P$. Thus, $p \notin T_V^2$.

Combining Theorem 4.1.10 and Theorem 4.2.5 we obtain the following theorem:

Theorem 4.2.6 (Calderón–Pepinsky). If $d$ and $M$ are greater than 1, then there exists a positive definite function on $S_M^d$ which cannot be extended to a positive definite function on $\mathbb{Z}^d$.

Theorem 4.2.7. Let $M, d \in \mathbb{N}$ be arbitrary. Every positive definite function on $S_M^d$ can be extended to a continuous positive definite function on the set
$$T_M^d := \{ t \in \mathbb{R}^d : |t_j| \le M \text{ for all } j \} \subset \mathbb{R}^d.$$

Proof. Write $U := \{ t \in \mathbb{R}^d : |t_j| < 1/2 \text{ for all } j \}$ and choose a continuous function $h$ such that $\operatorname{supp}(h) \subset U$ and $\int_{\mathbb{R}^d} |h|^2\, d\lambda = 1$. Let $K := h * \tilde h$ and define $\varphi$ by
$$\varphi(t) := \sum_{n \in S_M^d} f(n)\, K(t - n), \qquad t \in \mathbb{R}^d.$$
Note that $K$ and $\varphi$ are continuous. Using that $K(0) = 1$ and $K(m) = 0$ if $m \in \mathbb{Z}^d \setminus \{0\}$, we see that $\varphi$ is an extension of $f$. It remains to prove that $\varphi$ is positive definite, i.e.,
$$\sum_{i,j=1}^{n} \varphi(x_i - x_j)\, c_i \overline{c_j} \ge 0 \tag{1}$$
for every choice of $n \in \mathbb{N}$, $c_i \in \mathbb{C}$ and points $x_i \in T_M^d$ with $x_i - x_j \in T_M^d$. We may suppose that the $x_i$'s have nonnegative coordinates. Indeed, denote by $r_k$ the smallest of the $k$-th coordinates of the points $x_i$ and write $x_0 := (r_1, \dots, r_d)$ and $y_i := x_i - x_0$. The points $y_i$ lie in $T_M^d$, have nonnegative coordinates, and $\varphi(x_i - x_j) = \varphi(y_i - y_j)$. Put
$$H(y) := \sum_{i=1}^{n} \overline{c_i}\, h(y - x_i), \qquad y \in \mathbb{R}^d. \tag{2}$$
Setting $f(n) := 0$ outside of $S_M^d$ and using the definition of $\varphi$ we see that the left-hand side of (1) is equal to
$$\sum_{n \in \mathbb{Z}^d} f(n) \int_{\mathbb{R}^d} H(y - n)\, \overline{H(y)}\, dy. \tag{3}$$
It follows from
$$\int_{\mathbb{R}^d} H(y - n)\, \overline{H(y)}\, dy = \sum_{m \in \mathbb{Z}^d} \int_U H(y - n - m)\, \overline{H(y - m)}\, dy$$
that (3) is equal to
$$\int_U \sum_{m,n \in \mathbb{Z}^d} f(m - n)\, \overline{H(y + m)}\, H(y + n)\, dy. \tag{4}$$
Using the definition of $H$ we see that $H(y + n)$, where $y \in U$ and $n \in \mathbb{Z}^d$, can be different from zero only if $n \in S_M^d$ and $n$ has nonnegative coordinates. Thus, (4) is not changed if we restrict $m$ and $n$ to lie in $S_M^d$ and to have nonnegative coordinates. For such $m$ and $n$ we have $m - n \in S_M^d$. Since $f$ is positive definite on $S_M^d$, the integrand in (4) is nonnegative from which (1) follows.

The next result follows from Theorems 4.2.6 and 4.2.7.

Theorem 4.2.8 (Rudin). Let $d \ge 2$ and $V \subset \mathbb{R}^d$ be a symmetric closed square. Then there exists a continuous positive definite function on $V$ which cannot be extended to a positive definite function on $\mathbb{R}^d$.

In Section 4.4 we return to the extension problem and show that if $V$ is a ball in $\mathbb{R}^d$ and $f$ is a continuous radial positive definite function on $V$, then there exists a radial positive definite function on $\mathbb{R}^d$ that extends $f$.
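The Fourier series of the periodic extension $\tilde f_a$ in Remark 4.2.4 lends itself to a quick numerical check. The following sketch is an illustration, not part of the original argument: it compares a truncated version of the series (the truncation length `terms` is an arbitrary choice) with the $4a$-periodic extension of $t \mapsto 1 - |t|$. All Fourier coefficients $a / (\pi^2 (k + \frac12)^2)$ are positive and the constant term $1 - a$ is nonnegative exactly when $a \le 1$, which is the reason why $\tilde f_a$ is positive definite in that range.

```python
import math


def f_periodic(t: float, a: float) -> float:
    """4a-periodic extension of t -> 1 - |t| from the interval [-2a, 2a)."""
    period = 4 * a
    s = (t + 2 * a) % period - 2 * a  # reduce t to [-2a, 2a)
    return 1 - abs(s)


def fourier_partial_sum(t: float, a: float, terms: int = 20000) -> float:
    """Truncated series 1 - a + sum_k a / (pi^2 (k+1/2)^2) exp(i t (k+1/2) pi / a)."""
    total = 1 - a
    for k in range(terms):
        # The terms with indices k and -k-1 are complex conjugates,
        # so together they contribute twice a cosine.
        freq = (k + 0.5) * math.pi / a
        total += 2 * a / (math.pi ** 2 * (k + 0.5) ** 2) * math.cos(freq * t)
    return total


for t in (0.0, 0.3, 0.9, 1.7):
    print(t, f_periodic(t, 0.5), fourier_partial_sum(t, 0.5))
```

The two printed columns should agree up to the truncation error of the series, which is of order $a / (\pi^2 \cdot \texttt{terms})$.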
4.3 Decomposition of locally defined positive definite functions

Throughout the present section $V$ denotes a symmetric open neighborhood of $0 \in \mathbb{R}^d$. The main result of this section is that every positive definite function $f$ on $V$ can be written as $f = f_c + f_0$ where $f_c$ and $f_0$ are positive definite, $f_c$ is continuous and $f_0$ is near to zero in a certain sense.

Definition 4.3.1. Let $h$ be a complex-valued function on $V$. We say that $h$ averages to zero on $V$ if for every neighborhood $W$ of 0 and for any $\varepsilon > 0$ there exist $x_1, \dots, x_n \in W$ and positive numbers $p_1, \dots, p_n$ summing to 1 such that $\bigl| \sum_{i=1}^{n} p_i h(x + x_i) \bigr|$ […] $\tfrac12$ for all $x \in W_0$. If $p_i \ge 0$ and $\sum p_i = 1$, then we have
$$\sum_{i=1}^{n} p_i\, (H * g)(x_i) > \frac12$$
for all $x_i \in W_0$. Since $h$ averages to zero, for all $\varepsilon > 0$ we can choose $p_i$ and $x_i$ such that the inequality
$$\Bigl| \sum_{i=1}^{n} p_i h(y + x_i) \Bigr| < \varepsilon$$
holds whenever $y,\ y + x_i \in V$. Using that $y \in x_0 + W$ implies $y + x_i \in x_i + x_0 + W \subset x_0 + W + W \subset V$, we obtain
$$\frac12 < \sum_{i=1}^{n} p_i\, (H * g)(x_i) = \int_{\mathbb{R}^d} \sum_{i=1}^{n} p_i H(y + x_i)\, g(-y)\, dy \le \int_{\mathbb{R}^d} |g(y)|\, dy \sup_{y \in x_0 + W} \Bigl| \sum_{i=1}^{n} p_i h(y + x_i) \Bigr| \le \varepsilon \int_{\mathbb{R}^d} |g(y)|\, dy$$
for all $\varepsilon > 0$. This contradiction shows that $h$ vanishes $\lambda$-almost everywhere.
The main result of this section is the following theorem:

Theorem 4.3.4 (Sasvári). Every function $f \in P(V)$ admits a unique decomposition $f = f_c + f_0$ where $f_c \in P^c(V)$, $f_0 \in P(V)$ and $f_0$ averages to zero. If $f$ is Lebesgue measurable, then $f_0$ vanishes $\lambda$-almost everywhere.${}^{10}$

First we introduce some notation and basic facts that will be used in the proof.

Notation 4.3.5. Let $\mu = \sum_i c_i \delta_{t_i}$ and $\nu = \sum_j d_j \delta_{s_j}$ be measures in $M_f(V)$ such that $s_j - t_i \in V$ for all $i, j$. We write${}^{11}$
$$(\mu, \nu) := \sum_{i,j} f(s_j - t_i)\, \overline{c_i}\, d_j.$$
Since $f$ is positive definite we have $(\mu, \mu) \ge 0$. Moreover, the Cauchy–Schwarz inequality
$$|(\mu, \nu)|^2 \le (\mu, \mu)(\nu, \nu) \tag{1}$$
holds whenever all expressions are defined. This can be proved in the same way as in positive semidefinite inner product spaces. For $t \in V$ and $\mu \in M_f(V)$ such that $\delta_t * \mu \in M_f(V)$ we set
$$U_t \mu := \delta_t * \mu.$$
Then the mapping $\mu \mapsto U_t \mu$ is linear, where it is defined, and we have
$$(U_t \mu, U_t \nu) = (\mu, \nu), \tag{2}$$
$$f(t) = (\delta_0, U_t \delta_0). \tag{3}$$

10 The case where $V$ is the whole group has been considered in [11] in a more general setting.
11 Note that if $V \ne \mathbb{R}^d$, then $(\cdot, \cdot)$ is not an inner product in $M_f(V)$ since it is not defined for all pairs of $\mu$ and $\nu$.

Let $\{W_n\}_{n=1}^{\infty}$ be a sequence of balls with center zero such that
$$W_1 + W_1 \subset V, \qquad W_{n+1} + W_{n+1} \subset W_n \qquad \text{and} \qquad \bigcap_{n=1}^{\infty} W_n = \{0\}.$$
For all $t \in V$ let $N(t) \in \mathbb{N}$ be the smallest integer such that $t + W_{N(t)} \subset V$. Then $N(0) = 1$ and $\{t + W_n\}_{n=N(t)}^{\infty}$ is a neighborhood basis of $t$. If $n \ge N(t)$, then $(\cdot, \cdot)$ is a well-defined semidefinite inner product on the linear manifold $M_f(t + W_n)$. Therefore, $M_f(t + W_n)$ can be completed to a Hilbert space $H_n(t)$. The element of $H_n(t)$ corresponding to a $\mu \in M_f(t + W_n)$ will also be denoted by $\mu$. We assume that the completions are chosen such that${}^{12}$ $t + W_n \subset s + W_m$ implies $H_n(t) \subset H_m(s)$.
For all $t, s \in V$ such that $t - s \in V$ let $N(t, s) \in \mathbb{N}$ be the smallest integer such that $t + W_{N(t,s)} - (s + W_{N(t,s)}) \subset V$. If $n \ge N(t, s)$, then $(\mu, \nu)$ is well-defined for all $\mu \in M_f(t + W_n)$ and $\nu \in M_f(s + W_n)$. In view of inequality (1), $(\cdot, \cdot)$ is a bounded sesquilinear functional on $M_f(t + W_n) \times M_f(s + W_n)$ and hence it has a unique extension to a bounded sesquilinear functional on $H_n(t) \times H_n(s)$ which we also denote by $(\cdot, \cdot)$. By continuity, the inequality
$$|(h, g)|^2 \le (h, h)(g, g)$$
holds for all $h \in H_n(t)$ and $g \in H_n(s)$ if $n \ge N(t, s)$.
Let $t \in V$ be fixed. Since $U_t \mu \in M_f(s - t + W_{N(t,s)})$ for all measures $\mu \in M_f(s + W_{N(t,s)})$, the isometric linear mapping
$$U_t : M_f(s + W_{N(t,s)}) \to M_f(s - t + W_{N(t,s)})$$
can be uniquely extended to an isometric mapping
$$U_t : H_{N(t,s)}(s) \to H_{N(t,s)}(s - t).$$
Doing so for all $s \in V$ we obtain a linear isometric mapping${}^{13}$ $U_t$ defined on
$$\operatorname{dom}(t) := \bigcup_{s \in V} H_{N(t,s)}(s).$$
12 That this is possible can be easily seen, e.g., by applying the completion procedure which uses Cauchy sequences.
13 We use the word "mapping" to indicate that domain and range of $U_t$ need not be Hilbert spaces.

Using the properties of $U_t$ on $M_f(s + W_{N(t)})$ we see that the relations
$$(U_{t+s} h, g) = (U_t U_s h, g) = (U_s h, U_{-t} g) \tag{4}$$
hold for all $t, s \in V$ and for all $h, g$ for which all the expressions above are well-defined. Set
$$H_0 := \bigcap_{n=1}^{\infty} H_n(0).$$
Then $\delta_0 \in H_0$ and $H_0$ is a Hilbert space. Since $H_{N(t,0)}(0) \subset \operatorname{dom}(t)$, the mapping $U_t$ is well-defined on $H_0$.
Next we define two semigroups of operators that will play an important role in the proof of Theorem 4.3.4. If $m > 1$, then $W_m + W_m \subset W_{m-1}$ and hence the relations $t \in W_m$ and $\mu \in M_f(W_m)$ imply that $U_t \mu \in M_f(W_{m-1})$. Thus, $U_t$ maps $H_m(0)$ and $H_0$ into $H_{m-1}(0)$. For $t \in W_m$ we denote by $U_t^0$ the restriction of $U_t$ to $H_0$. Then $U_t^0$ is an isometric linear operator on $H_0$ with range in $H_{m-1}(0)$. For $m > 1$ set
$$\mathcal{U}_m := \{ U_t^0 : t \in W_m \}$$
and denote by $\overline{\mathcal{U}}_m$ the closure of $\mathcal{U}_m$ in the weak operator topology.${}^{14}$ Write further
$$S_0 := \bigcap_{m > 1} \overline{\mathcal{U}}_m$$
and let $S$ be the closure, in the weak operator topology, of the convex hull of $S_0$. The set $\mathcal{U}_m$ consists of operators of norm 1 and hence the sets $\overline{\mathcal{U}}_m$ and $S_0$ are weakly compact. By Theorem D.5.9, the set $S$ is weakly compact as well. Moreover, in view of Theorem D.5.5, the norm of the operators in $S$ and $S_0$ is at most 1. We now show several properties of $S$, $S_0$ and the mappings $U_t$.

14 We consider the weak operator topology on the set of all bounded linear operators from $H_0$ to $H_{m-1}(0)$, cf. D.5.1.

Lemma 4.3.6. The operators in $S$ and $S_0$ map $H_0$ into $H_0$.

Proof. It suffices to prove the statement concerning $S_0$. If $m' > m > 1$, then we have $W_{m'} + W_{m'} \subset W_m$ and
$$U_t^0 H_0 \subset H_m(0), \qquad t \in W_{m'}.$$
Consequently,
$$\overline{\mathcal{U}}_{m''} H_0 \subset H_m(0), \qquad m'' \ge m'.$$
We conclude that $S_0 H_0 \subset H_m(0)$ whenever $m > 1$ and therefore $S_0 H_0 \subset H_0$.

In the sequel we will consider the elements of $S$ and $S_0$ as operators from $H_0$ into $H_0$.

Lemma 4.3.7. $S$ and $S_0$ are semigroups.

Proof. Let $s_1, s_2 \in S_0$ be arbitrary. We show that $s_1 s_2 \in \overline{\mathcal{U}}_m$ for all $m > 1$. If $m' > m$, then $W_{m'} + W_{m'} \subset W_m$ and $\mathcal{U}_{m'} \mathcal{U}_{m'} \subset \mathcal{U}_m$. Since multiplication of operators is separately continuous in the weak operator topology (cf. Lemma D.5.2), we infer that $\overline{\mathcal{U}}_{m'} \overline{\mathcal{U}}_{m'} \subset \overline{\mathcal{U}}_m$. From the definition of $S_0$ we see that $s_1 s_2 \in \overline{\mathcal{U}}_{m'} \overline{\mathcal{U}}_{m'}$ and hence $s_1 s_2 \in \overline{\mathcal{U}}_m$ for all $m > 1$. Consequently, $s_1 s_2 \in S_0$ showing that $S_0$ is a semigroup. Since the convex hull of $S_0$ is also a semigroup, the same is true for the weak operator closure $S$.

Notation 4.3.8. By Theorem D.5.11 the semigroup $S$ contains an orthogonal projection $P$ such that $sP = Ps = P$ for all $s \in S$. We introduce the notation $\mu_c := P \delta_0$ and $\mu_0 := \delta_0 - \mu_c$.
Lemma 4.3.9. Let $m' < m$. The mapping $t \mapsto U_t^0 \mu_c$ from $W_{m'}$ into $H_m(0)$ is weakly continuous at 0.

Proof. We show that $U_{t_n}^0 \mu_c \to \mu_c$ for an arbitrary sequence $t_n \to 0$, $t_n \in W_m$. The sequence $\{U_{t_n}^0\}$ can have cluster points only in $S_0$. Consequently, $\{U_{t_n}^0 \mu_c\}$ can only have cluster points in $S_0 \mu_c$. Using the relation
$$sP = P, \qquad s \in S_0 \subset S,$$
we see that $S_0 \mu_c = S_0 P \delta_0 = \{P \delta_0\} = \{\mu_c\}$ and therefore $U_{t_n} \mu_c \to \mu_c$.

Lemma 4.3.10. The function $g$ defined by
$$g(t) = (U_t \mu_c, h), \qquad t \in V,$$
is continuous on $V$ for all $h \in H_0$.

Proof. The function $g$ is continuous at $t_0 \in V$ if and only if the function
$$g_0(t) := (U_{t + t_0} \mu_c, h),$$
which is defined in a neighborhood of 0, is continuous at 0. Using (4.3.5.4) we obtain
$$(U_{t + t_0} \mu_c, h) = (U_t \mu_c, U_{-t_0} h). \tag{1}$$
Since $v \mapsto (v, U_{-t_0} h)$ is a continuous linear functional on $H_0$, the continuity of $g_0$ at 0 follows at once from equation (1) and from Lemma 4.3.9.

Lemma 4.3.11. For all $\varepsilon > 0$ and $m \in \mathbb{N}$ there exist $t_1, \dots, t_n \in W_m$ and $p_1, \dots, p_n \ge 0$ summing to 1 such that
$$\Bigl\| \sum_{i=1}^{n} p_i U_{t_i} \mu_0 \Bigr\| < \varepsilon. \tag{1}$$

Proof. Since $0 = P \mu_0 \in S \mu_0$, the lemma follows from the definitions of $S$ and $S_0$ and from the fact that the weak and norm closures of the convex hull of $S_0 \mu_0$ coincide.

Lemma 4.3.12. For all $t_0 \in V$ we have
$$(U_{t_0} \mu_c, \mu_0) = (U_{t_0} \mu_0, \mu_c) = 0.$$

Proof. In view of Lemma 4.3.10 the function $t \mapsto (U_t \mu_c, \mu_0)$ is continuous on $V$. Thus, for all $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that
$$|(U_{t_0 - t} \mu_c, \mu_0) - (U_{t_0} \mu_c, \mu_0)| < \varepsilon, \qquad t \in W_N.$$
Consequently, if $p_i \ge 0$, $p_1 + \dots + p_n = 1$ and $t_i \in W_N$, then
$$\Bigl| \sum_{i=1}^{n} p_i (U_{t_0 - t_i} \mu_c, \mu_0) - (U_{t_0} \mu_c, \mu_0) \Bigr| < \varepsilon. \tag{1}$$
By the previous lemma we can choose $p_i$ and $t_i$ such that inequality (4.3.11.1) holds. We then have
$$\Bigl| \sum_{i=1}^{n} p_i (U_{t_0 - t_i} \mu_c, \mu_0) \Bigr| = \Bigl| \Bigl( U_{t_0} \mu_c, \sum_{i=1}^{n} p_i U_{t_i} \mu_0 \Bigr) \Bigr| \le \varepsilon\, \| U_{t_0} \mu_c \|.$$
Comparing this with (1) we obtain $(U_{t_0} \mu_c, \mu_0) = 0$ and hence $(U_{t_0} \mu_0, \mu_c) = (\mu_0, U_{-t_0} \mu_c) = 0$.

Proof of Theorem 4.3.4. Setting
$$f_c(t) := (\mu_c, U_t \mu_c), \qquad f_0(t) := (\mu_0, U_t \mu_0), \qquad t \in V,$$
the functions $f_c$ and $f_0$ are positive definite on $V$. Indeed, write $\varphi(t) := (x, U_t x)$, $t \in V$, where $x \in H_0$ is arbitrary. Using the relation (4.3.5.4) we obtain
$$\sum_{i,j=1}^{n} \varphi(t_i - t_j)\, c_i \overline{c_j} = \Bigl( x, \sum_{i,j=1}^{n} c_i \overline{c_j}\, U_{t_i - t_j} x \Bigr) = \Bigl( \sum_{i=1}^{n} c_i U_{t_i} x, \sum_{i=1}^{n} c_i U_{t_i} x \Bigr) \ge 0$$
where $t_j \in V$ and $c_j \in \mathbb{C}$. By the previous lemma we have
$$f(t) = (\delta_0, U_t \delta_0) = (\mu_c + \mu_0, U_t(\mu_c + \mu_0)) = f_c(t) + f_0(t), \qquad t \in V.$$
The continuity of $f_c$ follows from Lemma 4.3.10. To prove that $f_0$ averages to zero, let $\varepsilon > 0$ and $m \in \mathbb{N}$ be arbitrary and choose $t_j$ and $p_j$ as in Lemma 4.3.11. We then have
$$\Bigl| \sum_{i=1}^{n} p_i f_0(t - t_i) \Bigr| = \Bigl| \Bigl( \mu_0, \sum_{i=1}^{n} p_i U_{t - t_i} \mu_0 \Bigr) \Bigr| \tag{2}$$
$$= \Bigl| \Bigl( \sum_{i=1}^{n} p_i U_{t_i} \mu_0, U_t \mu_0 \Bigr) \Bigr| \tag{3}$$
$$\le \| U_t \mu_0 \|\, \Bigl\| \sum_{i=1}^{n} p_i U_{t_i} \mu_0 \Bigr\|. \tag{4}$$
To prove the uniqueness, assume that $f = f_c + f_0 = g_c + g_0$ where $g_c \in P^c(V)$, $g_0 \in P(V)$ and $g_0$ averages to zero. Then the function $h := f_c - g_c = g_0 - f_0$ averages to zero and it is continuous. Theorem 4.3.3 shows that $h = 0$, i.e., we have $f_c = g_c$ and $f_0 = g_0$. The second statement follows from Theorem 4.3.3.
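A simple instance of a function that averages to zero is the discontinuous part $f_0 = p\, 1_{\{0\}}$ that appears later in Theorem 4.3.15. The sketch below is only an illustration and rests on an assumption about Definition 4.3.1, namely that the averages $\bigl| \sum_i p_i h(x + x_i) \bigr|$ are required to be smaller than $\varepsilon$ for all admissible $x$. With $n$ distinct shifts $x_i$ and equal weights $p_i = 1/n$, at most one of the points $x + x_i$ can be 0, so the average is at most $p/n$. The helper names are hypothetical.

```python
def f0(t: float, p: float = 0.5) -> float:
    """The function p * 1_{0}: equal to p at t = 0 and to 0 elsewhere."""
    return p if t == 0 else 0.0


def averaged_modulus(h, x: float, shifts, weights) -> float:
    """|sum_i p_i h(x + x_i)| for shifts x_i and weights p_i."""
    return abs(sum(w * h(x + s) for s, w in zip(shifts, weights)))


def uniform_shifts(n: int, radius: float):
    """n distinct points in (-radius, radius), each with weight 1/n."""
    shifts = [radius * (k + 1) / (n + 1) for k in range(n)]
    return shifts, [1.0 / n] * n


shifts, weights = uniform_shifts(100, 0.01)
# Worst case: x cancels one of the shifts, so that exactly one term is nonzero.
candidates = [0.0, 0.3] + [-s for s in shifts]
print(max(averaged_modulus(f0, x, shifts, weights) for x in candidates))
```

Increasing `n` makes the printed bound $p/n$ as small as desired, no matter how small the neighborhood radius is.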
Recall that $B^o(r) \subset \mathbb{R}^d$ denotes the open ball with center 0 and radius $r > 0$. A function $g$ defined on $B^o(r)$ is called a radial function if $g(t) = g(Ot)$ for all $t \in B^o(r)$ and for all orthogonal matrices $O \in O(d)$. Taking $O = -I$ we see that radial positive definite functions on $B^o(r)$ are real-valued. The same argument as in Lemma 3.6.2 shows that $g : B^o(r) \to \mathbb{C}$ is radial if and only if there exists a complex-valued function $h$ on $[0, r)$ such that
$$g(t) = h(\|t\|), \qquad t \in B^o(r).$$

Lemma 4.3.13. If $V = B^o(r)$ and $f \in P(V)$ is radial, then the functions $f_c$ and $f_0$ in Theorem 4.3.4 are radial as well.

Proof. Let $e \in \mathbb{R}^d$ be an arbitrary unit vector and define the function $h$ by
$$h(t) := f(te), \qquad t \in I := (-r, r).$$
Since $f$ is radial $h$ does not depend on $e$. By Theorem 4.3.4, $h = h_c + h_0$ where $h_c \in P^c(I)$, $h_0 \in P(I)$ and $h_0$ averages to zero. On the other hand, we have $h(t) = f_c(te) + f_0(te)$. The uniqueness statement of Theorem 4.3.4 shows that $f_c(te) = h_c(t)$ and $f_0(te) = h_0(t)$ for all $t$. Thus, $f_c$ and $f_0$ are radial.

Lemma 4.3.14. If $V = B_d^o(r)$, $d \ge 2$, and $f_0 \in P(V)$ is a radial function which vanishes Lebesgue almost everywhere, then $f_0 = p\, 1_{\{0\}}$ with some nonnegative constant $p$.${}^{15}$

Proof. We may assume that $f_0(0) = 1$. Let $k$ be a positive integer and let $0 < q < r$. Let $t \in \mathbb{R}^d$ be a fixed point of norm $q$ and choose $\delta > 0$ such that $S_\delta - S_\delta \subset B^o(r)$ where
$$S_\delta = \{ x \in \mathbb{R}^d : \|x\| = q,\ \|x - t\| \le \delta \}.$$
Choose independent random vectors $x_1, \dots, x_k$ on a probability space $(\Omega, \mathcal{A}, P)$ such that $x_j(\omega) \in S_\delta$ for all $\omega$ and $x_j$ is uniformly distributed in $S_\delta$ for all $j$. Further, let $x_0$ be the random variable on $\Omega$ which is identically zero. Write $\alpha_0 = 1$ and $\alpha_j = -f_0(t)$, $1 \le j \le k$. Since $f_0$ is positive definite the inequality
$$\sum_{i=0}^{k} \sum_{j=0}^{k} f_0(x_i(\omega) - x_j(\omega))\, \alpha_i \alpha_j = 1 - k f_0(t)^2 + 2 f_0(t)^2 \sum_{1 \le i < j \le k} f_0(x_i(\omega) - x_j(\omega)) \ge 0$$
[…] If $d > 2$, we consider the function $g_0(s) := f_0(s, 0, \dots, 0)$, $s \in B_2^o(r)$. By the first part of the proof, $g_0(s) = 0$ for all $s \ne 0$. Since $f_0$ is radial we infer that $f_0(t) = 0$ for all $t \in B^o(r) \setminus \{0\}$.

Combining Theorem 4.3.4 and the preceding lemmata we obtain the following result.${}^{17}$

Theorem 4.3.15. Let $d \ge 2$ and $f$ be a Lebesgue measurable radial positive definite function on $B^o(r) \subset \mathbb{R}^d$. Then $f$ admits the decomposition
$$f = f_c + p\, 1_{\{0\}}$$
where $f_c$ is a continuous radial positive definite function on $B^o(r)$ and $p$ is a nonnegative constant.
16 This argument is due to T. Gneiting.
17 See also [21] and [50] for decomposition results concerning more general norms.

4.4 Extension of radial positive definite functions

We have seen in Section 4.2 that there exist locally defined positive definite functions which cannot be extended to the whole of $\mathbb{Z}^d$ or $\mathbb{R}^d$. The situation changes if we assume radiality.

Theorem 4.4.1 (Rudin). Every radial function $\varphi \in P^c(B_d^o(2r))$ can be extended to a radial positive definite function on $\mathbb{R}^d$.

Proof. Denote by $C_r^\infty$ the set of all radial, infinitely differentiable functions on $\mathbb{R}^d$ whose support lies in $B^o(2r)$ and let $P_r$ be the set of all $f \in C_r^\infty$ which are positive definite. By Theorem 3.10.4, each $f \in P_r$ is the sum of a uniformly convergent series
$$f = \sum_{k=1}^{\infty} f_k * \tilde f_k \tag{1}$$
where $f_k$ is infinitely differentiable and vanishes outside $B^o(r)$. Note that $\hat f$ is nonnegative and radial.
Fix $f \in C_r^\infty$, write $M_f := \|\hat f\|_\infty$, and choose $\varepsilon \in (0, 1)$ such that the support of $f$ lies in $B^o(2r - \varepsilon)$. By Lemma 3.2.4 there exists a sequence $\{g_i\}$ of functions $g_i \in P_r$ such that
(i) $\operatorname{supp} g_i \subset B^o(\varepsilon)$;
(ii) $\hat g_i \to 1$ uniformly on compact sets;
(iii) $\int_{\mathbb{R}^d} g_i h\, d\lambda \to h(0)$ for every continuous function $h$.
Let $\delta > 0$ be arbitrary. There is a compact set $K \subset \mathbb{R}^d$ such that
$$|\hat f(y)| < \delta, \qquad y \in \mathbb{R}^d \setminus K,$$
and there is a positive integer $i_0$ such that
$$M_f\, \hat g_i(y) > M_f - \delta, \qquad y \in K,\ i \ge i_0.$$
Then $\hat f < M_f \hat g_i + \delta$ on $\mathbb{R}^d$. Multiplying this inequality by $\hat g_i$ we see that the Fourier transform of the function
$$h_i := M_f\, g_i * g_i + \delta g_i - f * g_i$$
is nonnegative. Noting that the support of $h_i$ lies in $B^o(2r)$ we conclude that $h_i \in P_r$. Proposition 4.1.6 together with (1) shows that
$$\int_{B^o(2r)} (f * g_i)(x)\, \varphi(x)\, dx \le \int_{B^o(2r)} \bigl[ \delta g_i(x) + M_f (g_i * g_i)(x) \bigr] \varphi(x)\, dx, \qquad i \ge i_0.$$
Letting $i \to \infty$ we obtain
$$\int_{B^o(2r)} f(x)\, \varphi(x)\, dx \le (\delta + M_f)\, \varphi(0).$$
Letting $\delta \to 0$ and applying the same argument to $-f$ in place of $f$ we obtain the inequality
$$\Bigl| \int_{B^o(2r)} f(x)\, \varphi(x)\, dx \Bigr| \le M_f\, \varphi(0), \qquad f \in C_r^\infty. \tag{2}$$
Denote by $E$ the linear subspace of all $g \in C_0(\mathbb{R}^d)$ which are integrable and satisfy $\check g \in C_r^\infty$. We endow $C_0(\mathbb{R}^d)$ with the supremum norm and define the linear functional $L$ by
$$Lg := \int_{B^o(2r)} \check g(x)\, \varphi(x)\, dx, \qquad g \in E.$$
By inequality (2) we have $|Lg| \le \varphi(0) \|g\|_\infty$, $g \in E$. The Hahn–Banach Theorem D.4.4 shows that $L$ can be extended to a linear functional $\tilde L$ on $C_0(\mathbb{R}^d)$ such that $|\tilde L g| \le \varphi(0) \|g\|_\infty$, $g \in C_0(\mathbb{R}^d)$. Therefore, in view of the Riesz–Markov Theorem E.1.4, there is a complex Radon measure $\mu$ on $\mathbb{R}^d$ such that
$$\int_{B^o(2r)} f(x)\, \varphi(x)\, dx = \int_{\mathbb{R}^d} \hat f(y)\, d\mu(y), \qquad f \in C_r^\infty, \tag{3}$$
and $|\mu|(\mathbb{R}^d) \le \varphi(0)$. It is not hard to check that $\mu$ is radial and real. Applying equation (3) to $g_i$ in place of $f$ and letting $i \to \infty$, we obtain $\varphi(0) = \mu(\mathbb{R}^d) \le |\mu|(\mathbb{R}^d)$.
Hence $\mu$ is nonnegative. Equation (3) is the same as
$$\int_{B^o(2r)} f(x)\, \varphi(x)\, dx = \int_{B^o(2r)} f(x) \int_{\mathbb{R}^d} e^{i(x,y)}\, d\mu(y)\, dx, \qquad f \in C_r^\infty.$$
Since this holds for all $f \in C_r^\infty$ we conclude that
$$\varphi(x) = \int_{\mathbb{R}^d} e^{i(x,y)}\, d\mu(y), \qquad x \in B^o(2r).$$
The right-hand side of this equation defines a radial positive definite function on the whole of $\mathbb{R}^d$. This completes the proof of the theorem.

Combining Theorem 4.4.1 with Theorem 4.3.15 we obtain the following result.

Theorem 4.4.2 (Gneiting–Sasvári). Every Lebesgue measurable radial function in $P(B_d^o(2r))$ can be extended to a Lebesgue measurable radial positive definite function on the whole of $\mathbb{R}^d$.
Chapter 5
Selected applications
A chapter on applications could easily fill several hundred pages so that we had to be very restrictive in our choice of the topics. We start with the probably ‘most classical’ application of characteristic functions: limit theorems. Even here we had to restrict ourselves to present only basic methods. In the remaining sections we treat topics which usually cannot be found in other textbooks in this form.
5.1 Limit theorems

First we prove a version of the central limit theorem.

Theorem 5.1.1. Let $\mu$ be a $d$-dimensional distribution such that the first order moments are all equal to zero while the second order moments are finite. If $X_1, X_2, \dots$ are independent random vectors all having the distribution $\mu$, then the distributions of the random vectors
$$S_n = \frac{X_1 + X_2 + \dots + X_n}{\sqrt{n}}$$
tend weakly, as $n \to \infty$, to the normal distribution having the same first and second moments as $\mu$.

Proof. Denoting by $f$ the characteristic function of $\mu$, the characteristic function $f_n$ of $S_n$ is given by $f_n(t) = f(t/\sqrt{n})^n$. By Corollary 1.2.7,
$$f_n(t) = \Bigl[ 1 - \frac{1}{2n} \sum_{i,j=1}^{d} \bigl[ E(X_i X_j) + R_{i,j}(t/\sqrt{n}) \bigr] t_i t_j \Bigr]^n, \qquad t \in \mathbb{R}^d,$$
where $\lim_{t \to 0} R_{i,j}(t) = 0$. Using Lemma B.1.2 we obtain
$$\lim_{n \to \infty} f_n(t) = e^{-\frac12 \sum_{i,j=1}^{d} E(X_i X_j)\, t_i t_j}.$$
The statement now follows from Lévy's continuity Theorem 1.6.3.

The next application is Khinchin's weak law of large numbers.
Theorem 5.1.2. Let $X_1, X_2, \dots$ be independent identically distributed $d$-dimensional random vectors and put
$$S_n = \frac{X_1 + X_2 + \dots + X_n}{n}.$$
If $E(X_1) = m$ exists then
$$\lim_{n \to \infty} P(\|S_n - m\| \ge \varepsilon) = 0$$
for all $\varepsilon > 0$.

Proof. Denote by $f$ the characteristic function of the random vectors $X_j$. Then the characteristic function $f_n$ of $S_n$ is given by $f_n(t) = f(t/n)^n$. By Theorem 1.2.6,
$$f_n(t) = \Bigl[ 1 + \frac{1}{n} \sum_{j=1}^{d} \bigl[ i m_j + R_j(t/n) \bigr] t_j \Bigr]^n, \qquad t \in \mathbb{R}^d,$$
where $\lim_{t \to 0} R_j(t) = 0$. Lemma B.1.2 shows that
$$\lim_{n \to \infty} f_n(t) = e^{i(m,t)}.$$
By Lévy's continuity Theorem 1.6.3, the distributions of the $S_n$'s converge weakly, and hence also in probability, to $\delta_m$.

The next result is Poisson's limit theorem.

Theorem 5.1.3. Let
$$\begin{array}{l} X_{1,1} \\ X_{2,1},\ X_{2,2} \\ \quad\vdots \\ X_{n,1},\ X_{n,2},\ \dots,\ X_{n,n} \\ \quad\vdots \end{array}$$
be random variables which are independent and identically distributed within each row of the array above and such that
$$P(X_{n,k} = 1) = p_n, \qquad P(X_{n,k} = 0) = 1 - p_n$$
where $\lim_{n \to \infty} n p_n = \lambda$. Writing
$$S_n = X_{n,1} + X_{n,2} + \dots + X_{n,n}$$
we have
$$\lim_{n \to \infty} P(S_n = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad k \in \mathbb{N}_0. \tag{1}$$

Proof. The characteristic function of $S_n$ is given by
$$f_n(t) = \bigl[ 1 + p_n (e^{it} - 1) \bigr]^n, \qquad t \in \mathbb{R}.$$
By our assumption, $p_n = (\lambda + \varepsilon_n)/n$ where $\varepsilon_n \to 0$. Using this and Lemma B.1.2 we obtain
$$\lim_{n \to \infty} f_n(t) = e^{\lambda (e^{it} - 1)}.$$
Thus, by Theorem 1.6.3, the distributions of the $S_n$'s converge weakly to the Poisson distribution with parameter $\lambda$ (see (1.1.13.d)) from which (1) follows.
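The convergence in Poisson's limit theorem can be observed numerically. The following sketch is an illustration under the simplifying assumption $p_n = \lambda / n$ (that is, $\varepsilon_n = 0$); it compares the characteristic function $[1 + p_n(e^{it} - 1)]^n$ of $S_n$ with the limit $e^{\lambda(e^{it} - 1)}$, and the binomial probabilities $P(S_n = k)$ with the Poisson probabilities. The function names are chosen for this example only.

```python
import cmath
import math


def binomial_cf(t: float, n: int, lam: float) -> complex:
    """Characteristic function of S_n when p_n = lam / n."""
    p = lam / n
    return (1 + p * (cmath.exp(1j * t) - 1)) ** n


def poisson_cf(t: float, lam: float) -> complex:
    """Characteristic function of the Poisson distribution with parameter lam."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))


def binomial_pmf(k: int, n: int, lam: float) -> float:
    """P(S_n = k) for the sum of n independent Bernoulli(lam / n) variables."""
    p = lam / n
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """exp(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


for n in (10, 1000, 100000):
    gap_cf = abs(binomial_cf(1.3, n, 2.0) - poisson_cf(1.3, 2.0))
    gap_pmf = max(abs(binomial_pmf(k, n, 2.0) - poisson_pmf(k, 2.0)) for k in range(8))
    print(n, gap_cf, gap_pmf)
```

Both gaps should shrink roughly like $1/n$ as $n$ grows.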
5.2 Sums of independent random vectors and the Jessen–Wintner purity law

In the present section we study infinite series $X_1 + X_2 + \cdots$ where the $X_j$'s are independent random vectors. The investigation of such series leads us to infinite convolutions of distributions and infinite products of characteristic functions. The main results are Kolmogorov's three-series theorem, the equivalence of convergence in law and almost sure convergence, and the Jessen–Wintner purity law (cf. Theorem 5.2.14, Corollary 5.2.12 and Theorem 5.2.15).${}^{1}$
If $\{\mu_n\}$ is a sequence of distributions converging weakly to some distribution $\mu$, then we simply write $\mu_n \to \mu$.
Recall that the sum $A + B$ of two sets $A, B \subset \mathbb{R}^d$ is defined as the set of those points in $\mathbb{R}^d$ that may be represented in at least one way as $a + b$ where $a \in A$ and $b \in B$. Next we extend this notation to an infinite sequence of sets.

Definition 5.2.1. By the limit $\lim B_n$ of a sequence $\{B_n\}$ of subsets of $\mathbb{R}^d$ we understand the set of those points in $\mathbb{R}^d$ that may be represented in at least one way as the limit of a sequence $\{x_n\}$ where $x_n \in B_n$. By the sum $B_1 + B_2 + \cdots$ we mean the set $\lim (B_1 + \dots + B_n)$.

It is easy to check that $\lim B_n$, which may be empty, is a closed set.

1 The presentation of the results is based on the paper [31].
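Definition 5.2.2. If $\{\mu_n\}_1^\infty$ is a sequence of distributions, we say that the infinite convolution $\mu_1 * \mu_2 * \cdots$ is convergent if there exists a distribution $\mu$ such that
$$\mu_1 * \dots * \mu_n \to \mu.$$

Lemma 5.2.3. If $\mu_n \to \mu$ and $\mu_n * \nu_n \to \mu$, then $\nu_n \to \delta_0$.

Proof. Denote by $f_n$, $f$ and $g_n$ the characteristic functions of $\mu_n$, $\mu$ and $\nu_n$, respectively. For all $t \in \mathbb{R}^d$ we have $f_n(t) \to f(t)$ and $f_n(t) g_n(t) \to f(t)$. This implies that $g_n(t) \to 1$ if $f(t) \ne 0$. Since the set of such $t$'s contains a neighborhood of 0, Corollary 1.4.15 shows that $g_n(t) \to 1$ for all $t \in \mathbb{R}^d$. Thus, $\nu_n \to \delta_0$ by Lévy's continuity Theorem 1.6.3.

Theorem 5.2.4. Let $\mu_n$ be a distribution with characteristic function $f_n$, $n \in \mathbb{N}$. The following conditions are equivalent:
(i) The infinite convolution $\mu_1 * \mu_2 * \cdots$ is convergent.
(ii) The sequence $\{p_n\}_1^\infty$ with $p_n = f_1 \cdots f_n$ converges uniformly on every compact subset of $\mathbb{R}^d$.
(iii) $\mu_{n+1} * \dots * \mu_{n+d(n)} \to \delta_0$ for an arbitrary function $d : \mathbb{N} \to \mathbb{N}$.

Proof. The equivalence of (i) and (ii) follows immediately from Theorems 1.6.1 and 1.6.3. Let $d : \mathbb{N} \to \mathbb{N}$ be arbitrary. Writing $\nu_n = \mu_{n+1} * \dots * \mu_{n+d(n)}$ and $\lambda_n = \mu_1 * \dots * \mu_n$ we have $\lambda_{n+d(n)} = \lambda_n * \nu_n$. Consequently, if $\lambda_n \to \mu$ we also have $\lambda_n * \nu_n \to \mu$, hence $\nu_n \to \delta_0$ by Lemma 5.2.3.
Conversely, assume that $\nu_n \to \delta_0$ for all $d : \mathbb{N} \to \mathbb{N}$ and let $K \subset \mathbb{R}^d$ be compact. Then
$$\|1 - \hat\nu_n\| = \|1 - f_{n+1} \cdots f_{n+d(n)}\| \to 0 \tag{1}$$
where $\|\cdot\|$ denotes the supremum norm on $K$. We show that $\{p_n\}$ is a Cauchy sequence with respect to the supremum metric on $K$ from which (ii) follows. Assume, on the contrary, that there exists $\varepsilon > 0$ such that for all $n \in \mathbb{N}$ there exist positive integers $a(n), b(n)$ with $n < a(n) < b(n)$ such that the distance of $p_{a(n)}$ and $p_{b(n)}$ is at least $\varepsilon$. We have
$$\begin{aligned} \|p_{a(n)} - p_{b(n)}\| &= \bigl\| p_{a(n)} \bigl( 1 - f_{a(n)+1} \cdots f_{b(n)} \bigr) \bigr\| \le \bigl\| 1 - f_{a(n)+1} \cdots f_{b(n)} \bigr\| \\ &\le \bigl\| 1 - f_{n+1} \cdots f_{b(n)} \bigr\| + \bigl\| f_{n+1} \cdots f_{b(n)} - f_{a(n)+1} \cdots f_{b(n)} \bigr\| \\ &= \bigl\| 1 - f_{n+1} \cdots f_{b(n)} \bigr\| + \bigl\| f_{a(n)+1} \cdots f_{b(n)} \bigl( 1 - f_{n+1} \cdots f_{a(n)} \bigr) \bigr\| \\ &\le \bigl\| 1 - f_{n+1} \cdots f_{b(n)} \bigr\| + \bigl\| 1 - f_{n+1} \cdots f_{a(n)} \bigr\| \to 0 \end{aligned}$$
where we have used (1). This contradiction completes the proof.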
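Condition (ii) of Theorem 5.2.4 can be illustrated with a classical example that is not taken from the text: let $\mu_n$ be the symmetric two-point distribution on $\{-2^{-n}, 2^{-n}\}$, so that $f_n(t) = \cos(t / 2^n)$. The partial products $p_n(t) = \prod_{k=1}^{n} \cos(t / 2^k)$ then converge to $\sin t / t$, the characteristic function of the uniform distribution on $[-1, 1]$, uniformly on compact sets. The sketch below estimates the supremum distance on a finite grid in $[-10, 10]$; the grid size and radius are arbitrary choices.

```python
import math


def partial_product(t: float, n: int) -> float:
    """p_n(t) = prod_{k=1}^{n} cos(t / 2^k)."""
    value = 1.0
    for k in range(1, n + 1):
        value *= math.cos(t / 2 ** k)
    return value


def limit_cf(t: float) -> float:
    """sin(t) / t, the characteristic function of the uniform distribution on [-1, 1]."""
    return 1.0 if t == 0 else math.sin(t) / t


def sup_distance(n: int, radius: float = 10.0, grid: int = 2001) -> float:
    """Largest distance between p_n and the limit on an equally spaced grid in [-radius, radius]."""
    ts = [-radius + 2 * radius * j / (grid - 1) for j in range(grid)]
    return max(abs(partial_product(t, n) - limit_cf(t)) for t in ts)


for n in (5, 10, 20):
    print(n, sup_distance(n))
```

The printed distances should decrease rapidly with $n$, in line with the uniform convergence on compact sets required in (ii).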
Using Theorem B.3.6 we obtain the following corollary:

Corollary 5.2.5. If for each compact set $K \subset \mathbb{R}^d$ there exists a constant $C_K \ge 0$ such that

$$\sum_{n=1}^{\infty} |1 - f_n(t)| \le C_K, \quad t \in K,$$

then the infinite convolution $\mu_1 * \mu_2 * \cdots$ is convergent.

5.2.6. If $x \mapsto x_j$ is $\mu$-integrable for each $j = 1, \dots, d$, then we denote by $E(\mu)$ the center of gravity or barycenter of $\mu$, i.e., $E(\mu)$ is the point of $\mathbb{R}^d$ which has the moments

$$M_j(\mu) = \int_{\mathbb{R}^d} x_j \, d\mu(x)$$

of order one as coordinates. We denote by $\mu^c$ the distribution given by

$$\mu^c(B) = \mu(B + E(\mu)), \quad B \in \mathcal{B}^d$$

so that $E(\mu^c) = 0$. For an arbitrary distribution $\mu$ on $\mathbb{R}^d$ with finite moments

$$M_{j,j}(\mu) = \int_{\mathbb{R}^d} x_j^2 \, d\mu(x), \quad j = 1, \dots, d$$

of order two, we write

$$\mathrm{Var}(\mu) = \int_{\mathbb{R}^d} \|x - E(\mu)\|^2 \, d\mu(x) = \int_{\mathbb{R}^d} \left[ (x_1 - M_1)^2 + \cdots + (x_d - M_d)^2 \right] d\mu(x)$$

where $M_j = M_j(\mu)$. Note that if $\mu$ is the distribution of a random vector $X$, then $E(\mu) = E(X)$, $\mathrm{Var}(\mu) = E(\|X - E X\|^2)$ and $\mu^c$ is the distribution of $X - E(X)$. This fact can be used to show easily the following simple properties (provided the existence of $E$ or $\mathrm{Var}$, respectively):

(i) $E(\mu_1 * \mu_2) = E(\mu_1) + E(\mu_2)$;

(ii) $\mathrm{Var}(\mu_1 * \mu_2) = \mathrm{Var}(\mu_1) + \mathrm{Var}(\mu_2)$;

(iii) $(\mu_1 * \mu_2)^c = \mu_1^c * \mu_2^c$;

(iv) $f^c(t) = e^{-i E(\mu) t} f(t)$, $t \in \mathbb{R}^d$, where $f$ and $f^c$ denote the characteristic functions of $\mu$ and $\mu^c$, respectively;

(v) if $\mu_n \to \mu$ and $\liminf_n \mathrm{Var}(\mu_n)$ is finite, then $\mathrm{Var}(\mu)$ is finite, namely $\mathrm{Var}(\mu) \le \liminf_n \mathrm{Var}(\mu_n)$.²

² This follows from the relation

$$\int f(x) \, d\mu(x) \le \liminf_n \int f(x) \, d\mu_n(x)$$

which holds for every nonnegative, continuous function $f$ on $\mathbb{R}^d$.
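Properties (i) and (ii) of 5.2.6 are easy to check by hand for discrete distributions. The following sketch assumes, purely for illustration, that a finitely supported distribution on $\mathbb{R}^2$ is stored as a dictionary mapping points to probabilities. The two distributions `mu1` and `mu2` are made-up examples, not taken from the text.

```python
from itertools import product

def convolve(mu, nu):
    """Convolution of two finitely supported distributions given as {point: prob}."""
    out = {}
    for (x, p), (y, q) in product(mu.items(), nu.items()):
        z = tuple(a + b for a, b in zip(x, y))
        out[z] = out.get(z, 0.0) + p * q
    return out

def mean(mu):
    """Barycenter E(mu), computed coordinate by coordinate."""
    d = len(next(iter(mu)))
    return tuple(sum(p * x[j] for x, p in mu.items()) for j in range(d))

def var(mu):
    """Var(mu) = integral of ||x - E(mu)||^2 with respect to mu."""
    m = mean(mu)
    return sum(p * sum((xj - mj) ** 2 for xj, mj in zip(x, m))
               for x, p in mu.items())

# Hypothetical example distributions on R^2.
mu1 = {(0, 0): 0.5, (1, 2): 0.25, (3, -1): 0.25}
mu2 = {(-1, 1): 0.4, (2, 2): 0.6}
conv = convolve(mu1, mu2)
```

Comparing `mean(conv)` with the coordinatewise sum of `mean(mu1)` and `mean(mu2)`, and `var(conv)` with `var(mu1) + var(mu2)`, reproduces (i) and (ii) up to rounding.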
Theorem 5.2.7. If $M_{j,j}(\mu_n)$ is finite for every $n \in \mathbb{N}$ and $j = 1, \dots, d$, then the convergence of the two series

$$E(\mu_1) + E(\mu_2) + \cdots \quad \text{and} \quad \mathrm{Var}(\mu_1) + \mathrm{Var}(\mu_2) + \cdots$$

implies that $\mu_1 * \mu_2 * \cdots$ converges to some distribution $\mu$. Furthermore, $\mathrm{Var}(\mu)$ is finite and the series above converge to $E(\mu)$ and $\mathrm{Var}(\mu)$, respectively.

Proof. Assume first that $E(\mu_n) = 0$. Since $M_{j,j}(\mu_n) < \infty$, the characteristic function $g_n$ of $\mu_n$ possesses continuous partial derivatives of order at least two and those of the first order vanish at 0. Using this, Corollary 1.2.7 shows that

$$g_n(t) - 1 = \frac{1}{2} \sum_{i,j=1}^{d} \left[ \mathrm{Re}\, g_{n,t_i,t_j}(\vartheta_n t) + i \, \mathrm{Im}\, g_{n,t_i,t_j}(\vartheta_n t) \right] t_i t_j$$

where the subscripts $t_i, t_j$ denote partial differentiation. By Theorem 1.2.1,

$$|g_{n,t_i,t_j}| \le \int_{\mathbb{R}^d} |t_i t_j| \, d\mu_n(t) \le \left( \int_{\mathbb{R}^d} t_i^2 \, d\mu_n(t) \right)^{1/2} \left( \int_{\mathbb{R}^d} t_j^2 \, d\mu_n(t) \right)^{1/2} \le \mathrm{Var}(\mu_n).$$

Using this we obtain

$$|g_n(t) - 1| \le \frac{1}{\sqrt{2}} \mathrm{Var}(\mu_n) \sum_{i,j=1}^{d} |t_i t_j| \le \frac{d}{\sqrt{2}} \mathrm{Var}(\mu_n) \|t\|^2 \qquad (1)$$

which proves that $\mu_1 * \mu_2 * \cdots$ converges to some distribution $\mu$ (cf. Corollary 5.2.5). From (5.2.6.v) we conclude that $\mathrm{Var}(\mu) \le \sum_{j=1}^{\infty} \mathrm{Var}(\mu_j) < \infty$. Since

$$\mathrm{Var}(\mu) = \mathrm{Var}(\mu_1 * \cdots * \mu_n) + \mathrm{Var}(\mu_{n+1} * \mu_{n+2} * \cdots),$$

we have $\mathrm{Var}(\mu) \ge \mathrm{Var}(\mu_1) + \cdots + \mathrm{Var}(\mu_n)$, and hence $\mathrm{Var}(\mu) \ge \sum_{j=1}^{\infty} \mathrm{Var}(\mu_j)$. Applying (1) to the characteristic function $f_n := g_1 \cdots g_n$ instead of $g_n$ and using (5.2.6.ii) we get

$$|f_n(t) - 1| \le \frac{d}{\sqrt{2}} \sum_{j=1}^{n} \mathrm{Var}(\mu_j) \, \|t\|^2.$$

Taking the limit $n \to \infty$ we see that the partial derivatives of order one of the characteristic function of $\mu$ are zero at 0, thus $E(\mu) = 0$. The general case where $E(\mu_n)$ is not supposed to be zero follows from the relation $g_n(t) = e^{i E(\mu_n) t} g_n^c(t)$.

Remark 5.2.8. The converse of the previous theorem is false. To see an example, let $X_1, X_2, \dots$ be independent random variables such that $P(X_n = 0) = 1 - p_n$ where $0 \le p_n \le 1$ and $\sum_1^\infty p_n < \infty$. The Borel–Cantelli Lemma F.2.1 shows that $\sum_1^\infty X_n$ is a finite sum with probability 1. On the other hand, the series $\sum_1^\infty E(X_n)$ and $\sum_1^\infty \mathrm{Var}(X_n)$ may not converge. We can take for example $p_n = \frac{1}{n^2}$ and choose $X_n$ such that $P(X_n = n^2) = \frac{1}{n^2}$. Then $E(X_n) = 1$ and $\mathrm{Var}(X_n) = n^2 - 1$. We have, however, the following theorem.

Theorem 5.2.9. If for some $R \ge 0$ the supports of all $\mu_n$ are contained in the sphere $\{x : \|x\| \le R\}$, then the convergence of the two series

$$E(\mu_1) + E(\mu_2) + \cdots \quad \text{and} \quad \mathrm{Var}(\mu_1) + \mathrm{Var}(\mu_2) + \cdots$$

is necessary and sufficient for the convergence of $\mu_1 * \mu_2 * \cdots$.

Proof. The sufficiency follows from Theorem 5.2.7. In order to prove the necessity assume that $\mu_1 * \mu_2 * \cdots$ converges to some distribution $\mu$ and let $g_n$, $g_n^c$ and $g$ be the characteristic functions of $\mu_n$, $\mu_n^c$ and $\mu$, respectively. Suppose that the series $\mathrm{Var}(\mu_1) + \mathrm{Var}(\mu_2) + \cdots$ is divergent. Then it follows from $\mathrm{Var}(\mu_n) = \sum_{j=1}^{d} M_{j,j}(\mu_n^c)$ that for some value of $j$ the series $M_{j,j}(\mu_1^c) + M_{j,j}(\mu_2^c) + \cdots$ is divergent. Without loss of generality assume that $j = 1$. In view of Corollary 1.11.2 there exists an open neighborhood $U$ of $0 \in \mathbb{R}$ such that

$$|g_n(s, 0, \dots, 0)| = |g_n^c(s, 0, \dots, 0)| \le 1 - \frac{1}{8} s^2 M_{1,1}(\mu_n^c)$$

holds for all $s \in U$ and for all $n$. The second part of Theorem B.3.4 shows that

$$g(s, 0, \dots, 0) = \prod_{n=1}^{\infty} g_n(s, 0, \dots, 0) = 0, \quad s \in U \setminus \{0\}.$$

This, however, contradicts the continuity of $g$.

A fundamental theorem of the theory of sums of independent random vectors is that for the sequence of partial sums there is equivalence between convergence in law and convergence almost everywhere. Theorem 5.2.11 is an immediate consequence of this fact. Below we present a different proof given in the paper [31] of B. Jessen and A. Wintner since it contains a nice application of Jessen's martingale Theorem 5.2.10.³ To formulate this theorem we need some notation (see Section F.3 for the definition of products of probability spaces).
3
I would like to thank Christian Berg for making Jessen’s original work [30] available to me.
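Returning briefly to the counterexample of Remark 5.2.8, the stated moments can be confirmed with exact rational arithmetic. The sketch below assumes exactly the choice made there, $P(X_n = n^2) = 1/n^2$ and $P(X_n = 0) = 1 - 1/n^2$. The helper name `moments` and the cutoff `N` are illustrative.

```python
from fractions import Fraction

def moments(n):
    """Exact mean and variance of X_n with P(X_n = n^2) = 1/n^2, P(X_n = 0) = 1 - 1/n^2."""
    p = Fraction(1, n * n)
    value = n * n
    first = p * value            # E(X_n)
    second = p * value ** 2      # E(X_n^2)
    return first, second - first ** 2

N = 200
means, variances = zip(*(moments(n) for n in range(1, N + 1)))

# Partial sum of the probabilities p_n = 1/n^2; the full series converges,
# which is what the Borel-Cantelli argument in the remark needs.
sum_p = sum(Fraction(1, n * n) for n in range(1, N + 1))
```

Every entry of `means` equals 1 and the entries of `variances` are $n^2 - 1$, so both series of Theorem 5.2.7 diverge, while the partial sums of $p_n$ stay bounded.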
For all $n \in \mathbb{N}$ let $(\Omega_n, \mathcal{A}_n, P_n)$ be a probability space and write

$$\tilde{\Omega} = \Omega_1 \times \Omega_2 \times \cdots, \quad \tilde{\mathcal{A}} = \mathcal{A}_1 \otimes \mathcal{A}_2 \otimes \cdots, \quad \tilde{P} = P_1 \otimes P_2 \otimes \cdots,$$
$$\tilde{\omega} = (\omega_1, \omega_2, \dots) \in \tilde{\Omega}, \quad \tilde{\omega}_n = (\omega_{n+1}, \omega_{n+2}, \dots),$$
$$\tilde{\Omega}_n = \Omega_{n+1} \times \Omega_{n+2} \times \cdots, \quad \tilde{\mathcal{A}}_n = \mathcal{A}_{n+1} \otimes \mathcal{A}_{n+2} \otimes \cdots, \quad \tilde{P}_n = P_{n+1} \otimes P_{n+2} \otimes \cdots.$$

We will identify $(\omega_1, \omega_2, \dots)$ and $(\omega_1, \dots, \omega_n, \tilde{\omega}_n)$.

Theorem 5.2.10 (Jessen). Let $f \in L^1(\tilde{P})$ and for each $n \in \mathbb{N}$ define the function $f_n$ by

$$f_n(\tilde{\omega}) = \int_{\tilde{\Omega}_n} f(\omega_1, \dots, \omega_n, \tilde{\omega}_n) \, d\tilde{P}_n(\tilde{\omega}_n), \quad \tilde{\omega} \in \tilde{\Omega}.$$

Then

$$\lim_{n \to \infty} f_n(\tilde{\omega}) = f(\tilde{\omega})$$

holds for $\tilde{P}$-almost all $\tilde{\omega} \in \tilde{\Omega}$.

Theorem 5.2.11. A necessary and sufficient condition for the convergence of the infinite convolution $\mu_1 * \mu_2 * \cdots$ is that if $X_1, X_2, \dots$ are independent random vectors on a probability space $(\Omega, \mathcal{A}, P)$ having the distributions $\mu_1, \mu_2, \dots$, then the series $X_1 + X_2 + \cdots$ is convergent $P$-almost everywhere. The distribution of the random vector $S = X_1 + X_2 + \cdots$ is then $\mu = \mu_1 * \mu_2 * \cdots$.

Proof. By independence, the distribution of the partial sum $S_n = X_1 + \cdots + X_n$ is $\sigma_n = \mu_1 * \cdots * \mu_n$. Moreover, by the very definition of the convolution, $\sigma_n$ is also the distribution of the random vector

$$\tilde{S}_n(\tilde{\omega}) = X_1(\omega_1) + \cdots + X_n(\omega_n), \quad \tilde{\omega} = (\omega_1, \omega_2, \dots) \in \tilde{\Omega}.$$

If $\{S_n\}$ converges $P$-almost everywhere to some random vector $S$, then $S_n \to S$ in law, showing that $\sigma_n \to \mu$ where $\mu$ is the distribution of $S$. Now assume that $\{\sigma_n\}$ is convergent and let $d : \mathbb{N} \to \mathbb{N}$ be arbitrary. The distribution of

$$\tilde{R}_n(\tilde{\omega}) = X_{n+1}(\omega_{n+1}) + \cdots + X_{n+d(n)}(\omega_{n+d(n)})$$

is $\nu_n = \mu_{n+1} * \cdots * \mu_{n+d(n)}$. From Theorem 5.2.4 we see that $\nu_n \to \delta_0$ implying that $\tilde{R}_n \to 0$ in probability (see Section F.2). This shows that $\tilde{S}_n$ converges to some random vector $\tilde{S}$ in probability. Setting $\tilde{Q}_n(\tilde{\omega}) = \sum_{k=n+1}^{\infty} X_k(\omega_k)$, $\tilde{Q}_n$ is defined $\tilde{P}$-almost everywhere and

$$f(\tilde{\omega}) := e^{i(\tilde{S}(\tilde{\omega}), y)} = e^{i(\tilde{S}_n(\tilde{\omega}), y)} \, e^{i(\tilde{Q}_n(\tilde{\omega}), y)}$$

for every $y \in \mathbb{R}^d$. The function $f$ is measurable and bounded, hence integrable, so that we may apply Theorem 5.2.10 with $(\Omega_n, \mathcal{A}_n, P_n) = (\Omega, \mathcal{A}, P)$ for all $n$. We find

$$f_n(\tilde{\omega}) = e^{i(\tilde{S}_n(\tilde{\omega}), y)} z_n(y)$$

with some $z_n(y) \in \mathbb{C}$ satisfying $\lim_n z_n(y) = 1$. Hence, $e^{i(\tilde{S}_n(\tilde{\omega}), y)}$ converges $\tilde{P}$-almost everywhere for every $y$. Lemma B.1.17 shows that $\tilde{S}_n$ is convergent $\tilde{P}$-almost everywhere. Since the random variables

$$\omega \mapsto (X_1(\omega), X_2(\omega), \dots) \quad \text{and} \quad \tilde{\omega} \mapsto (X_1(\omega_1), X_2(\omega_2), \dots)$$

defined on $\Omega$ and $\tilde{\Omega}$, respectively, have the same distribution (by independence), we conclude that $S_n$ is convergent $P$-almost everywhere.

The set of points where an infinite series of random vectors converges is a tail event. Combining the previous theorem and the zero–one law of Kolmogorov (cf. Theorem F.2.5) we obtain the following corollary.

Corollary 5.2.12. An infinite series $X_1 + X_2 + \cdots$ of independent random vectors is convergent in law if and only if it is convergent almost surely. Moreover, the series is always either almost surely convergent or almost surely divergent.

5.2.13. The following notation will be used in the next theorem. Let $\{X_n\}$ be a sequence of $d$-dimensional random vectors and, as in Lemma 1.11.3, define the random vectors $X_{n,R}$, where $n \in \mathbb{N}$ and $R \in (0, \infty)$, by

$$X_{n,R}(\omega) = \begin{cases} X_n(\omega) & \text{if } \|X_n(\omega)\| \le R \\ 0 & \text{if } \|X_n(\omega)\| > R \end{cases} \qquad \omega \in \Omega.$$

Denoting by $\mu_n$ and $\mu_{n,R}$ the corresponding distributions we have

$$\mu_{n,R} = \nu_{n,R} + (1 - \mu_n(B_R)) \, \delta_0$$

where $B_R = \{x \in \mathbb{R}^d : \|x\| \le R\}$ and $\nu_{n,R}(A) = \mu_n(B_R \cap A)$, $A \in \mathcal{B}^d$.

Theorem 5.2.14 (Kolmogorov's three-series theorem). A necessary and sufficient condition for the convergence of the infinite convolution $\mu_1 * \mu_2 * \cdots$ is the convergence of the three series

$$(1 - \mu_1(B_R)) + (1 - \mu_2(B_R)) + \cdots \qquad (1)$$

$$E(\mu_{1,R}) + E(\mu_{2,R}) + \cdots \quad \text{and} \quad \mathrm{Var}(\mu_{1,R}) + \mathrm{Var}(\mu_{2,R}) + \cdots \qquad (2)$$

for a fixed $R > 0$ (or for all $R > 0$).

Proof. Let $X_1, X_2, \dots$ be independent random vectors having the distributions $\mu_1, \mu_2, \dots$. By Theorem 5.2.11 the convergence of $\mu_1 * \mu_2 * \cdots$ is necessary and sufficient for the almost sure convergence of $X_1 + X_2 + \cdots$.
If $X_1 + X_2 + \cdots$ is almost surely convergent, then $X_n \to 0$ almost surely, hence the probability that the event $[\|X_n\| > R]$ happens for infinitely many $n$ is equal to 0. By Theorem F.2.1, we then have

$$\sum_{n=1}^{\infty} P(X_n \neq X_{n,R}) = \sum_{n=1}^{\infty} (1 - \mu_n(B_R)) < \infty$$

so that $X_{1,R} + X_{2,R} + \cdots$ is almost surely convergent in view of Lemma F.2.2. From Theorem 5.2.9 we see that the two series in (2) are convergent.

Assume now that the three series in (1) and (2) are convergent. Theorem 5.2.9 then shows that $X_{1,R} + X_{2,R} + \cdots$ is almost surely convergent. Applying again Lemma F.2.2 we see that $X_1 + X_2 + \cdots$ is almost surely convergent.

Theorem 5.2.15 (Jessen–Wintner purity law). If $\mu = \mu_1 * \mu_2 * \cdots$ is a convergent infinite convolution of distributions $\mu_n$ each of which is discrete, then the following three cases are possible: $\mu$ is (i) discrete; (ii) singular and continuous; (iii) absolutely continuous.

Proof. Let $X_1, X_2, \dots$ be a sequence of independent random vectors on a probability space $(\Omega, \mathcal{A}, P)$ such that $\mu_n$ is the distribution of $X_n$. By Theorem 5.2.11, the series $X_1(\omega) + X_2(\omega) + \cdots$ converges with probability 1. We may suppose that each $X_n$ takes values in a countable set and that $S = X_1 + X_2 + \cdots$ converges everywhere. Let $G$ denote the smallest group in $\mathbb{R}^d$ containing all possible values of the $X_n$'s. Then $G$ is countable. Let $E \subset \mathbb{R}^d$ be an arbitrary Borel set and write

$$A := \{\omega \in \Omega : S(\omega) \in E + G\}.$$

Since

$$X_1(\omega) + \cdots + X_n(\omega) \in G, \quad n \in \mathbb{N},\ \omega \in \Omega$$

and $G$ is a group, we see that

$$A = \{\omega : X_n(\omega) + X_{n+1}(\omega) + \cdots \in E + G\}, \quad n \in \mathbb{N}.$$

Thus, $A$ is a tail event (cf. F.2.4) with respect to $X_1, X_2, \dots$. By Theorem F.2.5, $P(A) = 1$ whenever $P(A) > 0$. This means that $\mu(E + G) = 1$ whenever $\mu(E + G) > 0$ and a fortiori when $\mu(E) > 0$ (since $E \subset E + G$).

If $\mu$ is not continuous, then there is a one-point set $E$ with $\mu(E) > 0$. The set $E + G$ is then countable, its $\mu$-measure is 1, showing that $\mu$ is discrete. Assume now that $\mu$ is continuous. If there is a Borel set $E$ such that $\lambda(E) = 0$ and $\mu(E) > 0$, then $\lambda(E + G) = 0$ (since $G$ is countable) and $\mu(E + G) = 1$, i.e., $\mu$ is singular. If there is no such set $E$ then, by the theorem of Radon and Nikodym, $\mu$ is absolutely continuous.
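Two classical infinite convolutions of two-point distributions show that cases (ii) and (iii) of the purity law both occur. The sketch below is an illustration under assumptions not made in the text: we take $X_n = \pm b^{-n}$ with equal probabilities, so that the characteristic function of the sum is $\prod_n \cos(t/b^n)$. For $b = 2$ the limit is uniform on $[-1, 1]$, and its characteristic function $\sin t / t$ tends to 0 at infinity. For $b = 3$ the limit is a shifted Cantor distribution, whose characteristic function has the same modulus at every point $t = \pi 3^k$, so it does not tend to 0. Since it fails the Riemann–Lebesgue property it cannot be absolutely continuous, and since it has no atoms it must be singular and continuous. The truncation at 60 factors is an arbitrary numerical choice.

```python
import math

def cf_product(t, base, terms=60):
    """Approximate prod_{n=1}^{terms} cos(t / base**n), the characteristic
    function of the sum of independent signs +-base**(-n)."""
    p = 1.0
    for n in range(1, terms + 1):
        p *= math.cos(t / base**n)
    return p

ks = range(1, 9)

# Base 3 (Cantor type): modulus of the characteristic function at t = pi * 3**k.
cantor = [abs(cf_product(math.pi * 3**k, 3.0)) for k in ks]

# Base 2 (uniform on [-1, 1]): modulus at t = pi * (2**k + 1/2), where |sin t| = 1.
uniform = [abs(cf_product(math.pi * (2**k + 0.5), 2.0)) for k in ks]
```

The list `cantor` is constant up to rounding and bounded away from 0, while `uniform` decreases like $1/t$.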
5.3 Ergodic theorems for stationary fields

We start with a simple lemma. As an immediate corollary, we obtain a limit theorem for stationary sequences by applying the isometric operator $I_Z$ from Remark 2.9.12.

Lemma 5.3.1. The sequence $\{s_n\}_{n=1}^{\infty}$ of complex-valued functions where

$$s_n(x) = \frac{1}{n} \sum_{k=1}^{n} e^{ikx}, \quad x \in \mathbb{R},\ n \in \mathbb{N}$$

is uniformly bounded and converges pointwise to $1_{\{0\}}$ (see Figure 5.1).

Figure 5.1. The function $s_5$ from Lemma 5.3.1.

Proof. We have $s_n(0) = 1$ and $|s_n(x)| \le 1$ for all $x$. If $x \neq 0$ then

$$|s_n(x)|^2 = \frac{1}{n^2} \left| \frac{e^{ix} - e^{i(n+1)x}}{1 - e^{ix}} \right|^2 \qquad (1)$$
$$\le \frac{1}{n^2} \frac{4}{|1 - e^{ix}|^2} \to 0, \quad n \to \infty. \qquad (2)$$

Theorem 5.3.2. Let $Z$ be a stationary time series on $\mathbb{Z}$ with representing measure $\zeta$. Then we have

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} Z(k) = \zeta(\{0\}).$$
Proof. In view of Lemma 5.3.1 the sequence $\{s_n\}$ converges in $L^2(\mu)$ to $1_{\{0\}}$, where $\mu$ denotes the spectral measure of $Z$. The theorem follows by applying the isometric operator $I_Z$ (cf. Remark 2.9.12) to $s_n$ and noting that $I_Z(1_{\{0\}}) = \zeta(\{0\})$ (cf. equation (2.7.1.3)).

Next we prove the continuous version of the previous result.

Theorem 5.3.3. If $Z$ is a continuous stationary process on $\mathbb{R}$ with representing measure $\zeta$, then

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T Z(t) \, dt = \zeta(\{0\}).$$

Proof. This theorem can be proved in the same way as the previous one by using the relations

$$\frac{1}{T} \int_0^T e^{itx} \, dt = \frac{e^{iTx} - 1}{iTx}, \quad x \neq 0,\ T > 0 \qquad (1)$$

and

$$I_Z\left( \frac{1}{T} \int_0^T e^{itx} \, dt \right) = \frac{1}{T} \int_0^T Z(t) \, dt$$

(see equation (2.9.13.1)).

Remark 5.3.4. Let $Z$ be as in Theorem 5.3.2 or in Theorem 5.3.3 and denote by $m$ the mean of $Z$. We say that $Z$ is ergodic if

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} Z(k) = m \quad \text{or} \quad \lim_{T \to \infty} \frac{1}{T} \int_0^T Z(t) \, dt = m$$

respectively. By the theorems above, $Z$ is ergodic if and only if $\zeta(\{0\}) = m$. If $Z$ is ergodic and $\mu$ denotes the spectral measure of $Z$, then

$$\mu(\{0\}) = \int 1_{\{0\}} \, d\mu = E\,|\zeta(\{0\})|^2 = |m|^2.$$

Consequently, if $m = 0$, then $Z$ is ergodic if and only if its spectral measure has no atom at the origin.

Remark 5.3.5. Let $Z$ be a stationary time series and write $S_n = Z_1 + \cdots + Z_n$, $n \in \mathbb{N}$. We already know that the sequence $\{S_n / n\}$ converges to some $Y \in H(Z)$. Denote by $(U_n)$ the canonical unitary representation of $\mathbb{Z}$ in $H(Z)$. We then have

$$U_k \frac{S_n}{n} = \frac{Z_{1+k} + \cdots + Z_{n+k}}{n} = \frac{S_{n+k} - S_k}{n} = \frac{S_{n+k}}{n+k} \cdot \frac{n+k}{n} - \frac{S_k}{n}$$
for all $k \in \mathbb{Z}$ such that $n + k \neq 0$. Taking the limit $n \to \infty$ and replacing $k$ by $-k$ we obtain

$$U_k Y = Y, \quad k \in \mathbb{Z}$$

i.e., $Y$ is a so-called fixed point of $(U_n)$. Similar computations can be carried out for continuous stationary processes and integrals instead of sums. This is the motivation for the following investigation of fixed points of unitary representations.

Notation 5.3.6. Let $(U_t)$ be a unitary representation of $T = \mathbb{R}^d$ or $T = \mathbb{Z}^d$ in a Hilbert space $H$. For $g \in H$ we denote by $H(g)$ the closure of the linear manifold

$$\mathrm{span}\,\{U_t g : t \in T\}.$$

It is easy to check that $H(g)$ is a $(U_t)$-invariant subspace of $H$. The orthogonal projection from $H$ onto $H(g)$ will be denoted by $P_g$. It follows from Lemma 2.10.16 that $P_g$ commutes with all operators $U_t$:

$$U_t P_g = P_g U_t, \quad t \in T,\ g \in H. \qquad (1)$$

An element $h \in H$ is called a fixed point of $(U_t)$ if

$$U_t h = h, \quad t \in T.$$

We denote by $H_0$ the set of all fixed points. This set is a $(U_t)$-invariant subspace, the orthogonal projection onto $H_0$ will be denoted by $P^0$. We have

$$P^0 U_t = U_t P^0 = P^0. \qquad (2)$$

The first equation follows from the $(U_t)$-invariance of $H_0$, the second one from $P^0 H = H_0$.

Lemma 5.3.7. For all $g \in H$ we have

(i) $P_g P^0 = P^0 P_g$

(ii) $H(g) \cap H_0 = \mathbb{C} \cdot P^0 g$.

Proof. (i) Equations (5.3.6.1) and (5.3.6.2) show that $U_t P_g P^0 = P_g P^0$. Thus, for all $h \in H$ the vector $P_g P^0 h$ is a fixed point, i.e., $P_g P^0 h \in H_0$. Consequently,

$$P^0 P_g P^0 h = P_g P^0 h, \quad h \in H$$

and hence

$$P^0 P_g P^0 = P_g P^0. \qquad (1)$$

Since $P_g$ and $P^0$ are orthogonal projections, we see that

$$(P^0 P_g h, h') = (h, P_g P^0 h') = (h, P^0 P_g P^0 h') = (P^0 P_g P^0 h, h')$$
for all $h, h' \in H$. This relation and (1) imply that $P^0 P_g = P^0 P_g P^0 = P_g P^0$.

(ii) By (i) we have $P^0 g = P^0 P_g g = P_g P^0 g \in H(g)$ and hence $\mathbb{C} \cdot P^0 g \subset H(g) \cap H_0$. If $h_0 \in H_0 \cap H(g)$, then

$$h_0 = \lim_{n \to \infty} \sum_{j=1}^{k_n} c_{j,n} U_{t_{j,n}} g$$

with some $k_n \in \mathbb{N}$, $c_{j,n} \in \mathbb{C}$ and $t_{j,n} \in T$. Using equation (5.3.6.2) we see that

$$h_0 = P^0 h_0 = \lim_{n \to \infty} \sum_{j=1}^{k_n} c_{j,n} P^0 U_{t_{j,n}} g = \lim_{n \to \infty} \sum_{j=1}^{k_n} c_{j,n} P^0 g$$

i.e., $h_0$ is proportional to $P^0 g$.

Theorem 5.3.8. Let $Z$ be a continuous stationary field on $T = \mathbb{R}^d$ or $T = \mathbb{Z}^d$ with representing measure $\zeta$, spectral measure $\mu$ and correlation function $C$. Denote by $(U_t)$, $(V_t)$ and $(W_t)$ the canonical unitary representations in $H(Z)$, $L^2(\mu)$ and $H(C)$, respectively. Then we have:

(i) The set of all fixed points of $(V_t)$ is $\mathbb{C} \cdot 1_{\{0\}}$.

(ii) The set of all fixed points of $(W_t)$ is $\mathbb{C} \cdot 1_T \cap H(C)$.

(iii) The set $H_0$ of all fixed points of $(U_t)$ is $\mathbb{C} \cdot \zeta(\{0\})$ and $\zeta(\{0\}) = P^0 Z(s)$ for all $s \in T$.

Proof. We consider only the case $\mathbb{R}^d$, the case $\mathbb{Z}^d$ can be treated in the same way.

(i) If $h$ is a fixed point of $(V_t)$, then

$$h(x) = V_t h(x) = e^{i(t,x)} h(x), \quad x, t \in \mathbb{R}^d.$$

We conclude that $h(x) = 0$ if $x \neq 0$, i.e., $h \in \mathbb{C} \cdot 1_{\{0\}}$. On the other hand, $1_{\{0\}}$ is a fixed point of $(V_t)$ so that (i) holds.

(ii) A function $h \in H(C)$ is a fixed point for the translation operators $W_t$ if and only if $h(x - t) = h(x)$ for all $x, t \in T$. This equation is satisfied if and only if $h$ is constant.

(iii) It is easy to see that the isometric operator $I_Z$ from Remark 2.10.15 maps the set of fixed points of $(V_t)$ onto $H_0$. From (i) we conclude that

$$H_0 = \mathbb{C} \cdot I_Z(1_{\{0\}}) = \mathbb{C} \cdot \zeta(\{0\}).$$
If $\mu(\{0\}) = 0$, then $\zeta(\{0\}) = 0$. The equation above shows that $H_0 = \{0\}$ and hence $P^0 = 0$. Thus, the equation in (iii) holds whenever $\mu(\{0\}) = 0$. If $\mu(\{0\}) \neq 0$, then $\zeta(\{0\}) \neq 0$, the subspace $H_0$ is one-dimensional and $P^0 \neq 0$. We obviously have $P^0 Z(s) \in H_0$. By equation (5.3.6.2),

$$P^0 Z(s) = P^0 U_s Z(0) = U_s P^0 Z(0) = P^0 Z(0)$$

and hence $P^0 Z(s)$ does not depend on $s$. From the definition of $H(Z)$ and from the fact that $P^0 \neq 0$ we conclude that $P^0 Z(s) \neq 0$. Since $H_0$ is one-dimensional,

$$P^0 Z(s) = P^0 Z(0) = c \, I_Z(1_{\{0\}})$$

with some $c \in \mathbb{C} \setminus \{0\}$. As $P^0$ is an orthogonal projection we have

$$(P^0 Z(0), P^0 Z(0)) = (Z(0), P^0 Z(0)).$$

Using the fact that $I_Z$ is isometric we obtain

$$(P^0 Z(0), P^0 Z(0)) = (c 1_{\{0\}}, c 1_{\{0\}})$$

and

$$(Z(0), P^0 Z(0)) = (1, c 1_{\{0\}}) = (1_{\{0\}}, c 1_{\{0\}}).$$

Thus, $c = 1$.

Corollary 5.3.9. $H_0 = \{0\}$ if and only if $\mu(\{0\}) = 0$.

We are now able to prove a generalization of Theorem 5.3.2 and Theorem 5.3.3. Note that the expressions

$$\frac{1}{n} \sum_{k=1}^{n} Z(k) \quad \text{and} \quad \frac{1}{T} \int_0^T Z(t) \, dt$$

can be written as $\int Z \, d\nu$ where

$$\nu = \frac{1}{n} \sum_{k=1}^{n} \delta_k \quad \text{or} \quad \nu = \frac{1}{T} \, \lambda|_{[0,T]}.$$

The Fourier–Stieltjes transforms of these measures converge to $1_{\{0\}}$ if $n \to \infty$ and $T \to \infty$, respectively (cf. Lemma 5.3.1 and equation (5.3.3.1)).

Theorem 5.3.10 (ergodic theorem). Let $Z$ be a continuous stationary field on the set $T = \mathbb{R}^d$ or $T = \mathbb{Z}^d$ and let $\{\nu_n\}$ be a sequence of probability measures such that

$$\lim_{n \to \infty} \check{\nu}_n(t) = 0, \quad t \in T \setminus \{0\}.$$

Then

$$\lim_{n \to \infty} \int_T Z(t) \, d\nu_n(t) = \zeta(\{0\}) = P^0 Z(s), \quad s \in T$$

where $\zeta$ denotes the representing measure of $Z$.
Proof. Since $|\check{\nu}_n| \le 1$, the sequence $\{\check{\nu}_n\}$ converges in $L^2(\mu)$ to $1_{\{0\}}$, where $\mu$ denotes the spectral measure of $Z$. By (2.9.13.1) and (2.9.14.1) we have $\int Z \, d\nu_n = I_Z(\check{\nu}_n)$. Using this and Theorem 5.3.8 we obtain

$$\lim_{n \to \infty} \int_T Z(t) \, d\nu_n(t) = I_Z 1_{\{0\}} = \zeta(\{0\}) = P^0 Z(s).$$

Example 5.3.11. (i) Let $\nu_n$ be the Gaussian distribution on $\mathbb{R}^d$ with density

$$x \mapsto \frac{1}{(2\pi n)^{d/2}} e^{-\frac{\|x\|^2}{2n}}$$

and characteristic function $t \mapsto \check{\nu}_n(t) = e^{-\frac{n}{2} \|t\|^2}$. By the previous theorem we have

$$\lim_{n \to \infty} \frac{1}{(2\pi n)^{d/2}} \int_{\mathbb{R}^d} Z(t) \, e^{-\frac{\|t\|^2}{2n}} \, dt = \zeta(\{0\}).$$

(ii) Let $\nu_n$ be the uniform distribution on the cube $[0, n]^d \subset \mathbb{R}^d$. The characteristic function is then

$$t \mapsto \prod_{j=1}^{d} \frac{e^{i t_j n} - 1}{i t_j n}.$$

The sequence $\{\nu_n\}$ satisfies the condition of the previous theorem.

(iii) Examples of sequences of distributions on $\mathbb{Z}^d$ satisfying the condition of Theorem 5.3.10 can be given by considering the uniform distribution on $\{1, \dots, n\}^d$ or on $\{-n, \dots, n\}^d$.
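For $d = 1$ the Fourier–Stieltjes transform of the uniform distribution on $\{1, \dots, n\}$ is exactly the function $s_n$ of Lemma 5.3.1, so the condition of Theorem 5.3.10 can be checked numerically in this case. The following is a minimal sketch: the sample points `xs` in $(0, 2\pi)$ and the values of `ns` are arbitrary illustrative choices, and the bound $2 / (n\,|1 - e^{ix}|)$ is the one obtained in the proof of Lemma 5.3.1.

```python
import cmath

def s(n, x):
    """s_n(x) = (1/n) * sum_{k=1}^{n} exp(i*k*x), the transform of the
    uniform distribution on {1, ..., n}."""
    return sum(cmath.exp(1j * k * x) for k in range(1, n + 1)) / n

def bound(n, x):
    """Bound 2 / (n * |1 - exp(i*x)|) from the proof of Lemma 5.3.1,
    valid when exp(i*x) != 1."""
    return 2.0 / (n * abs(1.0 - cmath.exp(1j * x)))

xs = [0.1, 1.0, 2.5, 3.14, 6.0]      # sample points in (0, 2*pi)
ns = [10, 100, 1000, 10000]

# |s_n(x)| for each sample point and each n.
table = {x: [abs(s(n, x)) for n in ns] for x in xs}
```

At $x = 0$ every $s_n$ equals 1, while at the sample points the moduli in `table` shrink like $1/n$, in line with the pointwise limit $1_{\{0\}}$.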
5.4 Filtration of discrete stationary fields

First we prove a more detailed version of Theorem 2.9.19 where we replace by a square integrable function.

Theorem 5.4.1. Let $Z$ be a stationary field on $\mathbb{Z}^d$ with representing measure $\zeta_Z$ and spectral measure $\mu_Z$. Further, let $a \in L^2(\mathbb{Z}^d)$ be such that the series

$$\hat{a}(\cdot) := \sum_{n \in \mathbb{Z}^d} a(n) e^{-i(\cdot, n)} \qquad (1)$$

converges in $L^2(\mu_Z)$. Then the series

$$a * Z(t) := \sum_{n \in \mathbb{Z}^d} a(n) Z(t - n), \quad t \in \mathbb{Z}^d$$

converges in $H(Z)$. The field $Y = a * Z$ is stationary with representing measure $\zeta_Y$ and spectral measure $\mu_Y$ given by

$$d\zeta_Y = \hat{a} \, d\zeta_Z \quad \text{and} \quad d\mu_Y = |\hat{a}|^2 \, d\mu_Z.$$

If $\mu_Z(\{x : \hat{a}(x) = 0\}) = 0$, then

$$Z(t) = \int_{[0, 2\pi)^d} e^{i(t,x)} \frac{1}{\hat{a}(x)} \, d\zeta_Y(x), \quad t \in \mathbb{Z}^d$$

i.e., the convolution $Z \mapsto a * Z$ is invertible.

Proof. Using the spectral representation of $Z$ we obtain

$$\sum_{j : |j| \le n} a(j) Z(t - j) = \int_{[0, 2\pi)^d} e^{i(t,x)} \sum_{j : |j| \le n} a(j) e^{-i(j,x)} \, d\zeta_Z(x).$$

The sum on the right-hand side converges in $L^2(\mu_Z)$. Using the isometric operator $I_Z$ we see that the sum on the left-hand side converges in $H(Z)$. Taking the limit $n \to \infty$ we obtain the spectral representation of $Y$. The rest follows from Lemma 2.6.11.

Remark 5.4.2. If $\mu_Z$ has a constant density, then the series (5.4.1.1) converges in $L^2(\mu_Z)$ for all $a \in L^2(\mathbb{Z}^d)$, since the functions $e^{-i(\cdot, n)}$ have the same norm and build a (complete) orthogonal system in this Hilbert space.

The next theorem is important in signal processing.

Theorem 5.4.3. A stationary field $Y$ on $\mathbb{Z}^d$ admits the representation

$$Y(t) = a * X(t) = \sum_{n \in \mathbb{Z}^d} a(n) X(t - n), \quad t \in \mathbb{Z}^d$$

with a white noise $X \sim WN(0, 1)$ on $\mathbb{Z}^d$ and $a \in L^2(\mathbb{Z}^d)$ if and only if the spectral measure $\mu$ of $Y$ is absolutely continuous.

Proof. Assume that $Y$ has the representation above. By Example 2.9.5 and Theorem 5.4.1, the spectral measure $\mu$ of $Y$ is given by

$$d\mu = \frac{1}{(2\pi)^d} |\hat{a}|^2 \, d\lambda_d$$

i.e., $\mu$ is absolutely continuous. Now assume that $\mu$ is absolutely continuous and let $C$ be the correlation function of $Y$. By Theorem 1.8.16 there exists a function $g \in L^2(\mathbb{Z}^d)$ such that

$$C(t) = g * \tilde{g}(t) = \sum_{n \in \mathbb{Z}^d} g(t + n) \overline{g(n)}, \quad t \in \mathbb{Z}^d$$

and $d\mu(x) = |\hat{g}(x)|^2 \, d\lambda_d(x)$. Let

$$Y(t) = \int_{[0, 2\pi)^d} e^{i(t,x)} \, d\zeta(x)$$

be the spectral representation of $Y$ where $\zeta$ is a random orthogonal measure with structure measure $\mu$. We define the random orthogonal measure $\zeta_h$ as in Lemma 2.6.11 by

$$\zeta_h(A) := \int_A h \, d\zeta, \quad A \in \mathcal{B}([0, 2\pi)^d)$$

where

$$h = \frac{1}{(2\pi)^{d/2} \, \hat{g}} \in L^2(\mu)$$

(since $d\mu(x) = |\hat{g}(x)|^2 \, d\lambda_d(x)$ we may ignore the zeros of $\hat{g}$). In view of

$$(\zeta_h(A), \zeta_h(B)) = \int_{[0, 2\pi)^d} 1_A 1_B |h|^2 |\hat{g}|^2 \, d\lambda_d = \frac{1}{(2\pi)^d} \int_{A \cap B} 1 \, d\lambda_d$$

the structure measure of $\zeta_h$ is $\lambda_d / (2\pi)^d$. Consequently,

$$X(t) = \int_{[0, 2\pi)^d} e^{i(t,x)} \, d\zeta_h(x), \quad t \in \mathbb{Z}^d$$

defines a white noise $X \sim WN(0, 1)$ (see Example 2.9.5). By Lemma 2.6.11 we have

$$Y(t) = \int e^{i(t,x)} \, d\zeta(x) = \int e^{i(t,x)} (2\pi)^{d/2} \hat{g}(x) \, d\zeta_h(x)$$
$$= \int e^{i(t,x)} (2\pi)^{d/2} \sum_{n \in \mathbb{Z}^d} e^{-i(x,n)} g(n) \, d\zeta_h(x)$$
$$= (2\pi)^{d/2} \sum_{n \in \mathbb{Z}^d} g(n) \int e^{i(t-n,x)} \, d\zeta_h(x)$$
$$= (2\pi)^{d/2} \sum_{n \in \mathbb{Z}^d} g(n) X(t - n)$$

completing the proof.
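The relation between a filter and the spectral density in Theorems 5.4.1 and 5.4.3 can be checked numerically in the simplest case $d = 1$ with a finitely supported filter. The sketch below rests on assumptions that are not part of the text: the coefficients in `a` are made up, the noise is white with unit variance, and the spectral integral over $[0, 2\pi)$ is replaced by an equally spaced Riemann sum, which is exact for trigonometric polynomials of degree below the number `M` of nodes. It compares the covariance $C(t) = \sum_n a(t+n)\overline{a(n)}$ with $\frac{1}{2\pi} \int_0^{2\pi} e^{itx} |\hat{a}(x)|^2 \, dx$.

```python
import cmath
import math

# Hypothetical filter coefficients a(n), supported on {0, 1, 2}.
a = {0: 1.0, 1: 0.5, 2: -0.25}

def a_hat(x):
    """a_hat(x) = sum_n a(n) * exp(-i*n*x), as in equation (5.4.1.1) for d = 1."""
    return sum(c * cmath.exp(-1j * n * x) for n, c in a.items())

def cov_time(t):
    """Covariance of Y = a * X at lag t: sum_n a(t + n) * conj(a(n))."""
    return sum(a.get(t + n, 0.0) * complex(c).conjugate() for n, c in a.items())

def cov_spectral(t, M=64):
    """(1/(2*pi)) * integral over [0, 2*pi) of exp(i*t*x) * |a_hat(x)|^2 dx,
    evaluated by an equally spaced Riemann sum with M nodes."""
    total = 0j
    for j in range(M):
        x = 2.0 * math.pi * j / M
        total += cmath.exp(1j * t * x) * abs(a_hat(x)) ** 2
    return total / M

lags = range(-3, 4)
pairs = [(cov_time(t), cov_spectral(t)) for t in lags]
```

Both entries of each pair in `pairs` agree up to rounding, which is the statement $d\mu = \frac{1}{2\pi} |\hat{a}|^2 \, d\lambda$ read at the level of correlation functions.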
Appendix A

Basic notation

A.1 Standard notation

The list below contains some standard notation that is used throughout the book.

$\mathbb{N}$    $= \{1, 2, 3, \dots\}$
$\mathbb{N}_0$    $= \{0, 1, 2, \dots\}$
$\mathbb{Z}$    $= \{0, \pm 1, \pm 2, \dots\}$
$\mathbb{Q}$    rational numbers
$\mathbb{R}$    real numbers
$\mathbb{C}$    complex numbers
$\mathbb{T}$    complex numbers with modulus one
$C^\infty(\mathbb{R}^d)$    the set of infinitely differentiable functions on $\mathbb{R}^d$
$C^\infty_{00}(\mathbb{R}^d)$    the set of infinitely differentiable functions on $\mathbb{R}^d$ with compact support
$\mathcal{L}^p(\mathbb{R}^d)$    the set of complex-valued Lebesgue measurable functions $f$ on $\mathbb{R}^d$ for which $|f|^p$ is Lebesgue integrable
$L^p(\mathbb{R}^d)$    $L^p$-space with respect to Lebesgue measure on $\mathbb{R}^d$
$L^p(\mathbb{Z}^d)$    $L^p$-space with respect to the counting measure on $\mathbb{Z}^d$
$\mathcal{L}^p_r, L^p_r$    the corresponding spaces of real-valued functions
$x_+$    $= \max(0, x)$, $x \in \mathbb{R}$
$\exp(z)$    $= e^z$, $z \in \mathbb{C}$
$1_A$    indicator function of a set $A$
$\delta_x$    one-point or Dirac measure concentrated at $x$
$\|\cdot\|_p$    $L^p$-norm
$\tilde{f}(x)$    $= \overline{f(-x)}$
span    linear span
$f(x + 0)$    right-hand limit
$f(x - 0)$    left-hand limit
$E(X)$    expectation of the random variable or random vector $X$
$\mathrm{cov}(X)$    covariance of the random variable or random vector $X$

Let $S$ be a topological space. By $C(S)$ we denote the set of continuous complex-valued functions on $S$. The symbol $C_{00}(S)$ denotes the set of continuous functions with compact support while $C_0(S)$ is the set of continuous functions vanishing at infinity.¹

¹ Recall that a continuous function $f$ on $S$ vanishes at infinity if and only if for all $\varepsilon > 0$ there exists a compact set $K \subset S$ such that $|f(x)| < \varepsilon$, $x \in S \setminus K$.
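A.2 Multidimensional notation

If $x \in \mathbb{R}^d$ ($\mathbb{C}^d$, $\mathbb{N}_0^d$, etc.), then $x_i$, $1 \le i \le d$, denotes the $i$-th coordinate of $x$ and we write $x$ as $x = (x_1, \dots, x_d)$. In expressions involving matrix operations, e.g., in $Ax$ where $A$ is a matrix, we consider $x$ as a column vector. The zero element of $\mathbb{R}^d$ will also be denoted by 0. The standard basis of $\mathbb{R}^d$ or $\mathbb{Z}^d$ consists of elements $e_j \in \mathbb{R}^d$, $1 \le j \le d$, such that the $j$-th coordinate of $e_j$ is equal to 1 and the other coordinates are zero. For $\alpha, \beta \in \mathbb{R}^d$ we write $\beta \le \alpha$ if $\beta_j \le \alpha_j$ for all $j$. The relation $\alpha < \beta$ holds if and only if $\alpha \le \beta$ and $\alpha \neq \beta$.

Let $x \in \mathbb{R}^d$ or $\mathbb{C}^d$ and let $\alpha, \beta \in \mathbb{N}_0^d$. Then we write

$$x^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_d^{\alpha_d}$$
$$\|x\| = \sqrt{|x_1|^2 + \cdots + |x_d|^2}$$
$$|x| = |x_1| + \cdots + |x_d|$$
$$\alpha! = \alpha_1! \cdots \alpha_d!$$
$$\binom{\alpha}{\beta} = \frac{\alpha!}{(\alpha - \beta)! \, \beta!} = \binom{\alpha_1}{\beta_1} \cdots \binom{\alpha_d}{\beta_d}, \quad \beta \le \alpha$$
$$D^\alpha = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}}$$

When $\alpha = (0, \dots, 0)$ then $x^\alpha = 1$. Note that $D^\alpha D^\beta g = D^{\alpha + \beta} g$ where $g$ denotes an arbitrary real- or complex-valued function on $\mathbb{R}^d$ for which the partial derivative $D^{\alpha + \beta} g$ exists. By definition, $D^\alpha g = g$ when $\alpha = (0, \dots, 0)$. We also use the notation

$$g_{x_i} := \frac{\partial g}{\partial x_i} \quad \text{or} \quad g_{x_i, x_j} := \frac{\partial^2 g}{\partial x_i \, \partial x_j}.$$

By $dx$, $d\lambda(x)$ or $d\lambda_d(x)$ we denote integration with respect to the Lebesgue measure $\lambda = \lambda_d$ on $\mathbb{R}^d$, while $(x, y)$ is the inner product of $x, y \in \mathbb{R}^d$ or $\mathbb{C}^d$. For $r \ge 0$ the open and closed balls with radius $r$ and center $0 \in \mathbb{R}^d$ are denoted by $B^o(r) = B_d^o(r)$ and $B^c(r) = B_d^c(r)$, respectively. That is,

$$B_d^o(r) = \{t \in \mathbb{R}^d : \|t\| < r\} \quad \text{and} \quad B_d^c(r) = \{t \in \mathbb{R}^d : \|t\| \le r\}.$$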
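The multi-index conventions above translate directly into a few lines of code. This is only an illustrative sketch, assuming multi-indices are stored as tuples of nonnegative integers. The helper names are hypothetical and `math.prod` and `math.comb` require Python 3.8 or later.

```python
import math

def mi_abs(alpha):
    """|alpha| = alpha_1 + ... + alpha_d for a multi-index alpha."""
    return sum(alpha)

def mi_factorial(alpha):
    """alpha! = alpha_1! * ... * alpha_d!."""
    return math.prod(math.factorial(a) for a in alpha)

def mi_binom(alpha, beta):
    """Multi-index binomial coefficient, the product of coordinatewise binomials."""
    return math.prod(math.comb(a, b) for a, b in zip(alpha, beta))

def mi_power(x, alpha):
    """x^alpha = x_1^alpha_1 * ... * x_d^alpha_d (Python evaluates 0**0 as 1)."""
    return math.prod(xi ** ai for xi, ai in zip(x, alpha))

alpha, beta = (2, 3), (1, 2)
diff = tuple(a - b for a, b in zip(alpha, beta))

# Both expressions for the binomial coefficient from the text.
lhs = mi_binom(alpha, beta)
rhs = mi_factorial(alpha) // (mi_factorial(diff) * mi_factorial(beta))
```

Here `lhs` and `rhs` coincide, which is the identity $\binom{\alpha}{\beta} = \frac{\alpha!}{(\alpha - \beta)!\,\beta!}$ for $\beta \le \alpha$.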
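Appendix B

Basic analysis

B.1 Miscellaneous results from classical analysis

Lemma B.1.1. The inequality

$$\left| \sum_{k=1}^{n} \frac{z^k - w^k}{k!} \right| \le |z - w| \, e^{\max(|z|, |w|)}$$

holds for all $z, w \in \mathbb{C}$ and $n \in \mathbb{N}$.

Proof. Using the identity

$$z^k - w^k = (z - w) \sum_{s=0}^{k-1} z^s w^{k-1-s}, \quad k \in \mathbb{N}$$

we obtain

$$\left| \sum_{k=1}^{n} \frac{z^k - w^k}{k!} \right| \le |z - w| \sum_{k=1}^{n} \frac{1}{k!} \sum_{s=0}^{k-1} |z|^s |w|^{k-1-s}$$
$$\le |z - w| \sum_{k=1}^{n} \frac{1}{k!} \, k \max(|z|, |w|)^{k-1}$$
$$= |z - w| \sum_{k=0}^{n-1} \frac{1}{k!} \max(|z|, |w|)^k$$
$$\le |z - w| \, e^{\max(|z|, |w|)}.$$

Lemma B.1.2. Let $\{z_n\}_1^\infty$ be a sequence of complex numbers tending to some complex number $z$. Then

$$\lim_{n \to \infty} \left( 1 + \frac{z_n}{n} \right)^n = e^z.$$

Proof. Setting

$$d_n = \sum_{k=0}^{n} \frac{z^k}{k!} - \left( 1 + \frac{z_n}{n} \right)^n, \quad n \in \mathbb{N}$$

it suffices to prove that $\lim_{n \to \infty} d_n = 0$. We have

$$|d_n| = \left| \sum_{k=1}^{n} \left( \frac{z^k}{k!} - \frac{n(n-1) \cdots (n-k+1)}{k!} \frac{z_n^k}{n^k} \right) \right|$$
$$= \left| \sum_{k=1}^{n} \frac{1}{k!} \left( z^k - z_n^k \prod_{j=1}^{k-1} \left( 1 - \frac{j}{n} \right) \right) \right|$$
$$\le \sum_{k=1}^{n} \frac{|z^k - z_n^k|}{k!} + \sum_{k=1}^{n} \frac{|z_n|^k}{k!} \left( 1 - \prod_{j=1}^{k-1} \left( 1 - \frac{j}{n} \right) \right).$$

By Lemma B.1.1, the first sum in the last display tends to zero if $n \to \infty$. Denote by $s_n$ the second sum. Applying Bernoulli's inequality¹ we obtain

$$|s_n| \le \sum_{k=1}^{n} \frac{|z_n|^k}{k!} \frac{(k-1)k}{2n} \le \frac{|z_n|^2}{2n} e^{|z_n|} \to 0 \quad (n \to \infty).$$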
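Lemma B.1.2 is easy to observe numerically. The following sketch uses a made-up limit $z = 1 + 2i$ and the hypothetical sequence $z_n = z + (3 - i)/n$. Neither is taken from the text, and any other sequence converging to $z$ would serve equally well.

```python
import cmath

z = 1.0 + 2.0j

def z_n(n):
    """A hypothetical sequence of complex numbers converging to z."""
    return z + (3.0 - 1.0j) / n

def approx(n):
    """(1 + z_n / n)^n, which should approach exp(z) by Lemma B.1.2."""
    return (1.0 + z_n(n) / n) ** n

# Distance to exp(z) for growing n.
errors = [abs(approx(n) - cmath.exp(z)) for n in (10, 100, 1000, 10000)]
```

For this particular sequence the entries of `errors` decay roughly like $1/n$. The lemma itself only asserts convergence, not a rate.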
¹ Bernoulli's inequality states that $(1 + x_1)(1 + x_2) \cdots (1 + x_n) \ge 1 + x_1 + x_2 + \cdots + x_n$ where $x_j \in (-1, 0]$ for all $j$ or $x_j \in [0, \infty)$ for all $j$. This inequality can easily be proved by induction on $n$.

Proposition B.1.3 (binomial series). For all $\alpha > 0$ and $s \in (-1, 1)$ we have

$$\frac{1}{(1 + s)^\alpha} = \sum_{k=0}^{\infty} (-1)^k \frac{\Gamma(k + \alpha)}{\Gamma(k + 1) \Gamma(\alpha)} s^k \qquad (1)$$

where the series is absolutely convergent.

Proof. Denote by $a_k$ the coefficient of $s^k$. Using the relation

$$\Gamma(x + 1) = x \Gamma(x) \qquad (2)$$

(cf. Lemma C.4.2) we see that $\lim_{k \to \infty} |a_{k+1} / a_k| = 1$. Thus, the radius of convergence of the power series above is equal to 1. Applying again (2), the relation $\Gamma(k + 1) = k!$, and using induction on $k$ it is easy to check that (1) is the Taylor series expansion of $s \mapsto \frac{1}{(1+s)^\alpha}$ at $s = 0$. This completes the proof.

Theorem B.1.4 (Fejér–Riesz). Let $p$ be a function of the form

$$p(z) = \sum_{n=-N}^{N} c_n z^n, \quad c_n \in \mathbb{C},\ z \in \mathbb{C} \setminus \{0\}.$$

If $p(z) \ge 0$ for all $z \in \mathbb{T}$, then there exists a polynomial $q$ of the form

$$q(z) = \sum_{n=0}^{N} b_n z^n, \quad z, b_n \in \mathbb{C}$$

such that

$$p(z) = q(z) \, \overline{q(1/\bar{z})}, \quad z \in \mathbb{C} \setminus \{0\}.$$

In particular, $p(z) = |q(z)|^2$ holds for all $z \in \mathbb{T}$.

Proof. Without loss of generality assume that $c_N \neq 0$. Since $p$ is real-valued on $\mathbb{T}$ we have

$$0 = p(z) - \overline{p(z)} = \sum_{n=-N}^{N} (c_n - \overline{c_{-n}}) z^n, \quad z \in \mathbb{T}$$

showing that $c_{-n} = \overline{c_n}$, $n = 0, 1, \dots, N$. Using this we see that the polynomial $g$ given by

$$g(z) := z^N p(z) = \overline{c_N} + \cdots + c_0 z^N + \cdots + c_N z^{2N}, \quad z \in \mathbb{C}$$

satisfies the equation

$$g(z) = z^{2N} \, \overline{g(1/\bar{z})}, \quad z \in \mathbb{C} \setminus \{0\}. \qquad (1)$$

Denote by $z_1, \dots, z_{2N}$ the zeros of $g$ counted according to their multiplicities. Since $c_N \neq 0$, all these zeros are different from 0. It follows from (1) that $1/\bar{z}_j$ is a zero of $g$ having the same multiplicity as $z_j$.

Next we show that if $z_j = e^{i t_j} \in \mathbb{T}$, then $z_j$ has even multiplicity. The function $\varphi(t) := p(e^{it})$, $t \in \mathbb{R}$, is analytic and nonnegative on $\mathbb{R}$. Moreover, $t_j$ is a zero of $\varphi$. It suffices to prove that the multiplicity of $t_j$ is even but this is a consequence of the fact that the function $\varphi$, being nonnegative, has a local minimum at $t_j$.

It follows from the statements concerning the zeros of $g$ that we can, if necessary, rearrange these zeros $z_j$ such that $z_1, \dots, z_N, 1/\bar{z}_1, \dots, 1/\bar{z}_N$ represent all the $2N$ zeros of $g$. We then have

$$g(z) = c_N \prod_{j=1}^{N} (z - z_j) \prod_{j=1}^{N} (z - 1/\bar{z}_j)$$

and hence

$$p(z) = z^{-N} g(z) = c_N \prod_{j=1}^{N} (z - z_j) \prod_{j=1}^{N} \left( 1 - \frac{1}{z \bar{z}_j} \right) = c \prod_{j=1}^{N} (z - z_j) \prod_{j=1}^{N} (1/z - \bar{z}_j) \qquad (2)$$

where $c = c_N \prod_{j=1}^{N} (-\bar{z}_j)^{-1}$. Since $p$ is nonnegative on $\mathbb{T}$ the constant $c$ is positive. Setting $q(z) := \sqrt{c} \prod_{j=1}^{N} (z - z_j)$ and applying (2) we obtain

$$p(z) = q(z) \, \overline{q(1/\bar{z})}, \quad z \in \mathbb{C} \setminus \{0\}.$$

Theorem B.1.5. The inequality
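$$\left| e^{iz} - 1 - iz - \cdots - \frac{(iz)^n}{n!} \right| \le \begin{cases} \dfrac{|z|^{n+1}}{(n+1)!}, & \text{if } \mathrm{Im}\, z \ge 0, \\[2ex] \dfrac{|z|^{n+1}}{(n+1)!} \, e^{|z|}, & \text{if } \mathrm{Im}\, z \le 0 \end{cases}$$

holds for all $n \in \mathbb{N}_0$ and $z \in \mathbb{C}$.

Proof. Assume first that $n = 0$ and let $z = a + ib$; $a, b \in \mathbb{R}$, be arbitrary. Using the identity

$$e^{iz} - 1 = iz \int_0^1 e^{itz} \, dt$$

we obtain

$$\left| e^{iz} - 1 \right| = |z| \left| \int_0^1 e^{itz} \, dt \right| \le |z| \int_0^1 \left| e^{itz} \right| dt = |z| \int_0^1 \left| e^{ita - tb} \right| dt = |z| \int_0^1 e^{-tb} \, dt \le |z| \cdot \begin{cases} 1, & \text{if } b \ge 0, \\ e^{|z|}, & \text{if } b \le 0. \end{cases}$$

Setting

$$r_n(z) := e^{iz} - 1 - iz - \cdots - \frac{(iz)^n}{n!}, \quad n \in \mathbb{N}_0$$

it is not hard to check that

$$r_{n+1}(z) = iz \int_0^1 r_n(tz) \, dt.$$

Using this equation, the theorem follows by induction on $n$.

B.1.6. Here we collect some simple but useful inequalities for trigonometric functions. Integrating the inequalities

$$-1 \le \cos x \le 1 \quad \text{and} \quad -1 \le \sin x \le 1, \quad x \in \mathbb{R}$$

several times we find

$$|\sin x| < |x|, \quad x \neq 0 \qquad (1)$$
$$1 - \frac{1}{2} x^2 < \cos x < 1 - \frac{1}{2} x^2 + \frac{1}{24} x^4, \quad x \neq 0 \qquad (2)$$
$$x - \frac{1}{6} x^3 < \sin x < x - \frac{1}{6} x^3 + \frac{1}{120} x^5, \quad x > 0. \qquad (3)$$

The inequality sin y ; 0 0, the function 'a; .x/ D p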
1
2
e
x2 2 2
;
x2R
is a probability density (see Figure B.2).

Figure B.2. The function $\varphi_{0,\sigma}$ from Corollary B.1.11 (curves for $\sigma = 0.3$ and $\sigma = 1$).

Lemma B.1.12. For all $x > 0$ and $a > -x$ we have

$$\int_0^\infty \frac{e^{-xt} - e^{-(x+a)t}}{t} \, dt = \log\left( 1 + \frac{a}{x} \right).$$

Proof. Denote by $f(x)$ and $g(x)$ the left- and right-hand side of the equation above. We have

$$\lim_{x \to \infty} f(x) = \lim_{x \to \infty} g(x) = 0$$

and $f'(x) = g'(x)$.² Consequently, $f(x) = g(x)$ for all $x > 0$.

B.1.13. The numbers

$$\binom{m}{k_1, k_2, \dots, k_n} := \frac{m!}{k_1! \, k_2! \cdots k_n!}$$

are called multinomial coefficients. They satisfy the multinomial formula

$$(x_1 + \cdots + x_n)^m = \sum_{|k| = m} \binom{m}{k_1, k_2, \dots, k_n} x^k, \quad n \in \mathbb{N},\ m \in \mathbb{N}_0$$

where $k \in \mathbb{N}_0^n$.³ This formula can be proved by induction on $n$, applying the binomial formula to the identity

$$(x_1 + x_2 + \cdots + x_n + x_{n+1})^m = (x_1 + x_2 + \cdots + [x_n + x_{n+1}])^m.$$

It follows from the multinomial formula that the multinomial coefficients have a combinatorial interpretation as the number of partitions⁴ of a set with $m$ elements in $n$ blocks, with $k_1$ elements in the first block, $k_2$ elements in the second block, and so on.

² Differentiating under the integral sign can be avoided by replacing $\frac{1}{t}$ by $\int_0^\infty e^{-st} \, ds$ and then using Fubini's theorem.

³ Expressions of the form $0^0$ are defined to be equal to 1.

⁴ A partition of a set $S$ is a collection of nonempty pairwise disjoint subsets of $S$ whose union is $S$. The subsets are also called blocks.
The multinomial coefficients also occur in the identity

$$\frac{d^m}{dt^m} \prod_{i=1}^{n} f_i(t) = \sum_{|k| = m} \binom{m}{k_1, \dots, k_n} \prod_{i=1}^{n} f_i^{(k_i)}(t) \qquad (1)$$

called the generalized Leibniz rule. This identity can be proved by induction on $n$ showing first that

$$(fg)^{(m)} = \sum_{k=0}^{m} \binom{m}{k} f^{(k)} g^{(m-k)}.$$

Next we prove Faà di Bruno's formula for the higher chain rule for differentiation.⁵

Theorem B.1.14 (Faà di Bruno's formula). If $f$ and $g$ are real functions with a sufficient number of derivatives, then

$$\frac{d^m}{dt^m} g(f(t)) = \sum \frac{m!}{b_1! \, b_2! \cdots b_m!} \, g^{(|b|)}(f(t)) \left( \frac{f'(t)}{1!} \right)^{b_1} \left( \frac{f''(t)}{2!} \right)^{b_2} \cdots \left( \frac{f^{(m)}(t)}{m!} \right)^{b_m} \qquad (1)$$

where the sum is over all different solutions in nonnegative integers $b_1, \dots, b_m$ of $b_1 + 2 b_2 + \cdots + m b_m = m$, and $|b| = b_1 + \cdots + b_m$.

Proof. We prove that

$$\frac{d^m}{dt^m} g(f(t)) = \sum g^{(k)}(f(t)) \, (f'(t))^{b_1} (f''(t))^{b_2} \cdots (f^{(m)}(t))^{b_m} \qquad (2)$$

where the sum is over all partitions of $S_m = \{1, 2, \dots, m\}$, and, for each partition, $k$ is its number of blocks and $b_i$ is the number of blocks with exactly $i$ elements. The number of partitions of $S_m$ into $b_1$ blocks with 1 element, $b_2$ blocks with 2 elements, etc. would be (cf. B.1.13) the multinomial coefficient

$$\binom{m}{\underbrace{1, \dots, 1}_{b_1\ 1\text{'s}},\ \underbrace{2, \dots, 2}_{b_2\ 2\text{'s}},\ \underbrace{3, \dots, 3}_{b_3\ 3\text{'s}},\ \dots}$$

except that this makes artificial distinctions among the $i$-blocks for each $i$. Therefore, we have to divide this number by $b_1! \, b_2! \cdots b_m!$. Thus, (1) follows from equation (2).

We prove (2) by induction on $m$. The case $m = 1$ being simple, assume that the statement is true for some $m$. Every partition of $S_{m+1}$ can be obtained in a unique way by adjoining $m + 1$ to a partition of $S_m$.

⁵ Our proof is taken from the survey paper [32] which contains interesting remarks on the history of this formula.
Section B.1 Miscellaneous results from classical analysis
If we add $\{m + 1\}$ as a new singleton block, we increase the number of blocks of size 1 by one, and the total number of blocks by one. This corresponds to applying $d/dt$ to $g^{(k)}(f(t))$ to get $g^{(k+1)}(f(t))\, f'(t)$. If we add $m + 1$ to an existing block of size $i$, then the number of such blocks decreases by one, the number of blocks of size $i + 1$ increases by one, and the total number of blocks remains the same. This corresponds to applying $d/dt$ to $(f^{(i)}(t))^{b_i}$, to get $b_i\, (f^{(i)}(t))^{b_i - 1} f^{(i+1)}(t)$.

Lemma B.1.15. Let $a_k \ge 0$, $k \in \mathbb{N}$, and denote by $R$ the radius of convergence of the power series
$$S(z) = \sum_{k=1}^{\infty} a_k z^k, \qquad z \in \mathbb{C}.$$
If $0 \le R < \infty$, then $S(R + \varepsilon) = \infty$ for all $\varepsilon > 0$.

Proof. Indeed, if $S(R + \varepsilon)$ were finite for some positive $\varepsilon$, then we would have
$$|S(z)| \le \sum_{k=1}^{\infty} a_k (R + \varepsilon)^k < \infty$$
whenever $|z| \le R + \varepsilon$, i.e., the radius of convergence would be larger than $R$.

Lemma B.1.16. Let $\{a_k\}_1^\infty$ be a sequence of nonnegative real numbers. There exists a sequence $\{p_k\}_1^\infty$ of positive real numbers such that
$$\sum_{k=1}^{\infty} p_k a_k^n < \infty$$
for all $n \in \mathbb{N}$.

Proof. Setting $b_k := \max(a_k, k)$ we have $a_k \le b_k$ and $b_k \to \infty$. Let $p_k = e^{-b_k}$. Then for each $n \in \mathbb{N}$ there exists $N(n) \in \mathbb{N}$ such that $p_k b_k^n \le e^{-b_k/2} \le e^{-k/2}$ if $k \ge N(n)$. Thus,
$$\sum_{k=1}^{\infty} p_k a_k^n \le \sum_{k=1}^{\infty} p_k b_k^n < \infty$$
for all $n \in \mathbb{N}$.

Lemma B.1.17. Let $x \in \mathbb{R}^d$ and let $\{x_n\}$ be a sequence in $\mathbb{R}^d$ such that
$$\lim_{n \to \infty} e^{i(t, x_n)} = e^{i(t, x)}$$
for all $t \in \mathbb{R}^d$. Then $\lim_{n \to \infty} x_n = x$.$^{6}$

$^{6}$ Note that for the divergent sequence $\{x_n\}$ in $\mathbb{R}$ with $x_n = 2\pi\, n!$ the relation above holds for all rational $t$.
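Proof. We may assume that $x = 0$ and $d = 1$ so that
$$\lim_{n \to \infty} e^{i t x_n} = 1 \tag{1}$$
for all $t$. Assume first that the sequence $\{x_n\}$ is bounded. If $s$ is an accumulation point of this sequence then, by equation (1), $e^{its} = 1$ for all $t$ and hence $s = 0$. From this we conclude that $\{x_n\}$ tends to 0. Next assume that $\{x_n\}$ is unbounded and choose a subsequence $\{y_n\}$ tending to $\infty$ or to $-\infty$. Using equation (1) we see that
$$1 = \int_0^1 \lim_{n \to \infty} e^{i t y_n}\, dt = \lim_{n \to \infty} \int_0^1 e^{i t y_n}\, dt = \lim_{n \to \infty} \frac{e^{i y_n} - 1}{i y_n} = 0.$$
Thus, the sequence $\{x_n\}$ cannot be unbounded.

Lemma B.1.18. For all $t, a \ge 0$ with $t \ne a$ we have
$$\lim_{\lambda \to \infty} e^{-\lambda t} \sum_{0 \le k \le \lambda a} \frac{(\lambda t)^k}{k!} = 1_{[0,a]}(t).$$

Proof. Let $X$ be a Poisson random variable with parameter $\lambda t$. Then
$$P(X \le \lambda a) = e^{-\lambda t} \sum_{0 \le k \le \lambda a} \frac{(\lambda t)^k}{k!}.$$
Thus, we have to prove that
$$\lim_{\lambda \to \infty} P(X \le \lambda a) = 1_{[0,a]}(t).$$
We have $E(X) = \operatorname{Var}(X) = \lambda t$ (cf. Example 1.2.4). If $t > a$, then Chebyshev's inequality (cf. (F.1.6)) shows that
$$P(X \le \lambda a) \le P(|X - \lambda t| \ge \lambda (t - a)) \le \frac{t}{\lambda (t - a)^2} \to 0, \qquad \lambda \to \infty.$$
If $t < a$, we find in the same way
$$P(X \le \lambda a) = 1 - P(X - \lambda t > \lambda (a - t)) \ge 1 - P(|X - \lambda t| > \lambda (a - t)) \to 1, \qquad \lambda \to \infty.$$

Example B.1.19. Let $h$ be the 2-periodic function on $\mathbb{R}$ with
$$h(x) = 1 - |x|, \qquad |x| \le 1.$$
We show that the function $f$ defined by (see Figure B.3)
$$f(x) = \frac{1}{4} \sum_{n=0}^{\infty} \left(\frac{3}{4}\right)^n h(4^n x)$$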
Figure B.3. The 5th partial sum of the series from Example B.1.19.
is nowhere differentiable.$^{7}$ Let $x \in \mathbb{R}$, $m \in \mathbb{N}$, and
$$\varepsilon_m := \pm \frac{1}{2 \cdot 4^m}$$
where the sign is chosen in such a way that there is no integer lying between $4^m x$ and $4^m (x + \varepsilon_m)$. Write
$$d_{nm} := \frac{h(4^n (x + \varepsilon_m)) - h(4^n x)}{\varepsilon_m}.$$
Using the obvious inequality $|h(s) - h(t)| \le |s - t|$, we see that $|d_{nm}| \le 4^n$ for all $n, m$. Since there is no integer between $4^m x$ and $4^m (x + \varepsilon_m)$, we conclude that $|d_{mm}| = 4^m$. If $n > m$, then $4^n \varepsilon_m$ is an even integer and hence $d_{nm} = 0$. Therefore,
$$\left| \frac{f(x + \varepsilon_m) - f(x)}{\varepsilon_m} \right| = \left| \frac{1}{4} \sum_{n=0}^{m} \left(\frac{3}{4}\right)^n d_{nm} \right| \ge \frac{1}{4} \left( 3^m - \sum_{n=0}^{m-1} 3^n \right) = \frac{1}{8} (3^m + 1).$$
Since $\lim_{m \to \infty} \varepsilon_m = 0$ we see that $f$ is not differentiable at $x$.

$^{7}$ Note that $h$ is equal to the function $\tilde f_{1/2}$ from Remark 4.2.4. Therefore $h$, and hence also $f$, are positive definite. Bochner's Theorem 1.7.3 shows that they are characteristic functions.
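The lower bound $\frac{1}{8}(3^m + 1)$ for the difference quotients in Example B.1.19 can be reproduced in exact rational arithmetic. The sketch below is only an illustration: the function names are made up for this purpose, the argument `x` is assumed to be a `fractions.Fraction`, and the sign rule in `epsilon` is one possible way to implement the choice of sign described in the example.

```python
from fractions import Fraction


def h(x):
    """2-periodic extension of h(x) = 1 - |x| on [-1, 1]; exact for Fractions."""
    y = (x + 1) % 2 - 1  # reduce x to the period interval [-1, 1)
    return 1 - abs(y)


def epsilon(x, m):
    """Return eps_m = +-1/(2 * 4^m), signed so that no integer lies strictly
    between 4^m * x and 4^m * (x + eps_m)."""
    u = 4 ** m * x
    frac = u - (u // 1)  # fractional part of 4^m * x
    sign = 1 if frac <= Fraction(1, 2) else -1
    return sign * Fraction(1, 2 * 4 ** m)


def difference_quotient(x, m):
    """(f(x + eps_m) - f(x)) / eps_m, using that the terms with n > m vanish."""
    eps = epsilon(x, m)
    total = Fraction(0)
    for n in range(m + 1):
        d_nm = (h(4 ** n * (x + eps)) - h(4 ** n * x)) / eps
        total += Fraction(3, 4) ** n * d_nm
    return total / 4
```

For instance, `difference_quotient(Fraction(1, 3), 1)` attains the bound with equality, while for growing `m` the quotients grow at least like $3^m / 8$.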
B.2 Uniform convergence of continuous functions

The first theorem of this section is due to S. N. Bernstein.

Theorem B.2.1 (Bernstein). Let $f$ be a continuous real-valued function on $[0, 1]$. Then
$$f(x) = \lim_{n \to \infty} \sum_{k=0}^{n} f\!\left(\tfrac{k}{n}\right) \binom{n}{k} x^k (1 - x)^{n-k}, \qquad x \in [0, 1],$$
where the convergence is uniform on $[0, 1]$.

Proof. For each $x \in [0, 1]$ and $n \in \mathbb{N}$ let $Y_n^x$ be a random variable on a probability space $(\Omega, \mathcal{A}, P)$ such that $n Y_n^x$ has binomial distribution with parameters $n$ and $x$. We then have $E(Y_n^x) = x$, $\operatorname{Var}(Y_n^x) = \frac{x(1-x)}{n}$ and
$$E(f(Y_n^x)) = \sum_{k=0}^{n} f\!\left(\tfrac{k}{n}\right) \binom{n}{k} x^k (1 - x)^{n-k}.$$
Therefore, it suffices to show that
$$\lim_{n \to \infty} E(f(Y_n^x) - f(x)) = 0, \qquad x \in [0, 1],$$
uniformly on $[0, 1]$. By Chebyshev's inequality (see (F.1.6))
$$P\left(|Y_n^x - x| > n^{-1/4}\right) \le n^{1/2} \operatorname{Var}(Y_n^x) = \frac{x(1-x)}{\sqrt{n}} \le \frac{1}{\sqrt{n}}. \tag{1}$$
Choose $K \in \mathbb{R}$ such that $|f| \le K$ and denote by $A$ the random event on the left-hand side of (1). Then $P(A) \le \frac{1}{\sqrt{n}}$ and
$$E(|f(Y_n^x) - f(x)|) = \int_\Omega |f(Y_n^x) - f(x)|\, dP = \int_A |f(Y_n^x) - f(x)|\, dP + \int_{\Omega \setminus A} |f(Y_n^x) - f(x)|\, dP \le \frac{2K}{\sqrt{n}} + \sup_{|y - z| \le n^{-1/4}} |f(y) - f(z)|.$$
The theorem follows now from the fact that $f$ is uniformly continuous.

Corollary B.2.2 (Weierstraß). Every continuous, real-valued (complex-valued) function on a finite interval $[a, b] \subset \mathbb{R}$ can be uniformly approximated by real (complex, respectively) polynomials.
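The approximating polynomials in Bernstein's theorem are easy to evaluate directly. The following sketch, with the illustrative names `bernstein` and `max_error` and an assumed uniform evaluation grid, shows the uniform error shrinking as $n$ grows. For $f(t) = t^2$ the sum has the closed form $x^2 + x(1-x)/n$, which follows from the mean and variance computed in the proof.

```python
from math import comb


def bernstein(f, n, x):
    """Evaluate sum_{k=0}^n f(k/n) * C(n, k) * x^k * (1 - x)^(n - k) for n >= 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return sum(
        f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )


def max_error(f, n, grid_size=100):
    """Largest deviation |B_n(f)(x) - f(x)| over the grid x = i / grid_size."""
    points = [i / grid_size for i in range(grid_size + 1)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in points)
```

A quick comparison such as `max_error(lambda t: abs(t - 0.5), 10)` against the same call with `n = 200` illustrates the (slow) uniform convergence for a function that is continuous but not differentiable.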
1 Figure B.4. The Bernstein polynomials B1 ; B2 and B5 from Remark B.2.3 for the function f W x 7! x 0:3 .
Remark B.2.3. Let $f$ be a function on $[0, 1]$. The polynomials $B_n(f)$, $n \in \mathbb{N}_0$, defined by
$$B_n(f)(x) := \sum_{k=0}^{n} f\!\left(\tfrac{k}{n}\right) \binom{n}{k} x^k (1 - x)^{n-k}, \qquad x \in [0, 1],$$
are called Bernstein polynomials (see Figure B.4).

Theorem B.2.4 (Stone–Weierstraß). Suppose $X$ is a compact Hausdorff space and $A$ is a linear subspace of $C(X)$ such that
(i) if $f, g \in A$, then $\bar f$ and $fg$ belong to $A$, as well;
(ii) $A$ contains the constant functions;
(iii) for all $x, y \in X$, $x \ne y$, there exists a function $f \in A$ such that $f(x) \ne f(y)$.
Then every function in $C(X)$ can be uniformly approximated by functions in $A$.

Proof. Denote by $\bar A$ the closure of $A$ in the topology of uniform convergence. We have to show that $\bar A = C(X)$. It is easy to check that $\bar A$ is a linear subspace and $\bar f, fg, \operatorname{Im} f, \operatorname{Re} f \in \bar A$ whenever $f, g \in \bar A$. From this we conclude that if $p$ is a polynomial, then $p(f) \in \bar A$ for all $f \in \bar A$.

Now let $g \in \bar A$ be a real-valued function. We show that $|g| \in \bar A$. We may suppose that $-1 \le g \le 1$. By Corollary B.2.2, for each $n \in \mathbb{N}$ there exists a real polynomial $p_n$ such that
$$\bigl|\, |t| - p_n(t) \bigr| \le \frac{1}{n}, \qquad -1 \le t \le 1.$$
We already know that $p_n(g)$ belongs to $\bar A$. The inequalities above show that the sequence $\{p_n(g)\}$ converges uniformly to $|g|$, and hence $|g| \in \bar A$. If $f, g \in \bar A$ are real-valued, then $\max(f, g)$ and $\min(f, g)$ belong to $\bar A$, since
$$\max(f, g) = \frac{f + g + |f - g|}{2}, \qquad \min(f, g) = \frac{f + g - |f - g|}{2}.$$

To finish the proof it suffices to show that every real-valued function $h \in C(X)$ belongs to $\bar A$. It follows at once from (iii) that for all $x, y \in X$, $x \ne y$, we can choose a real-valued function $f_{xy} \in \bar A$ such that $f_{xy}(x) = h(x)$ and $f_{xy}(y) = h(y)$. Let $\varepsilon > 0$ be arbitrary and define the sets $U_{xy}$ and $V_{xy}$ by
$$U_{xy} = \{t \in X : f_{xy}(t) < h(t) + \varepsilon\}, \qquad V_{xy} = \{t \in X : f_{xy}(t) > h(t) - \varepsilon\}.$$
These sets are open and $x, y \in U_{xy} \cap V_{xy}$. Therefore, $\{U_{xy}\}_{x \in X}$ is an open covering of $X$ for each $y$. By compactness of $X$, there exist $n = n(y) \in \mathbb{N}$ and $x_i = x_i(y) \in X$, $i = 1, \ldots, n$, such that
$$X = \bigcup_{i=1}^{n} U_{x_i y}.$$
By what we have already proved, $f_y := \min(f_{x_1 y}, \ldots, f_{x_n y}) \in \bar A$. Moreover,
$$f_y(t) < h(t) + \varepsilon, \quad t \in X; \qquad f_y(t) > h(t) - \varepsilon, \quad t \in V_y := \bigcap_{i=1}^{n} V_{x_i y}.$$
The sets $V_y$ are open and $y \in V_y$. Using again the compactness of $X$ we see that there exist $m \in \mathbb{N}$ and $y_1, \ldots, y_m \in X$ such that $\bigcup_{i=1}^{m} V_{y_i} = X$. Choosing $f := \max(f_{y_1}, \ldots, f_{y_m})$ we have $f \in \bar A$ and
$$h(t) - \varepsilon < f(t) < h(t) + \varepsilon, \qquad t \in X,$$
completing the proof.

Theorem B.2.5 (Dini). Let $\{f_n\}$ be a sequence of continuous real-valued functions on a compact space $X$ converging pointwise to a continuous function $g$. If this sequence is monotonically increasing or decreasing, then the convergence is uniform.

Proof. We assume that the sequence is increasing. For every $\varepsilon > 0$ and $x \in X$ there exists an index $n(x) = n(x, \varepsilon)$ such that
$$0 \le g(x) - f_m(x) \le \frac{\varepsilon}{3}, \qquad m \ge n(x).$$
Let $V(x)$ be an open neighborhood of $x$ such that the inequalities
$$|f_{n(x)}(x) - f_{n(x)}(y)| \le \frac{\varepsilon}{3}, \qquad |g(x) - g(y)| \le \frac{\varepsilon}{3}$$
hold for all $y \in V(x)$. For all $y \in V(x)$ we then have
$$0 \le g(y) - f_{n(x)}(y) \le \varepsilon.$$
By compactness we can choose finitely many points $x_i \in X$ such that the union of the sets $V(x_i)$ is equal to $X$. Let $n_0$ be the largest of the numbers $n(x_i)$. Since each $y \in X$ belongs to some $V(x_i)$ we have
$$g(y) - f_n(y) \le g(y) - f_{n(x_i)}(y) \le \varepsilon, \qquad n \ge n_0.$$
The proof is complete.

Definition B.2.6. Let $X$ be a topological space. A set $F$ of complex-valued functions on $X$ is called equicontinuous if for all $x \in X$ and $\varepsilon > 0$ there exists a neighborhood $V_x$ of $x$ such that the inequality $|f(x) - f(y)| < \varepsilon$ holds for all $f \in F$ and for all $y \in V_x$.

Theorem B.2.7 (Ascoli). Let $\{f_\alpha\}$ be an equicontinuous net of complex-valued functions on a topological space $X$ converging pointwise to a function $f$. Then this net converges uniformly on compact sets and $f$ is continuous.

Proof. We show first that $f$ is continuous. For all $x, y \in X$ and for all $\alpha$ we have
$$|f(x) - f(y)| \le |f(x) - f_\alpha(x)| + |f_\alpha(x) - f_\alpha(y)| + |f_\alpha(y) - f(y)|. \tag{1}$$
Let $x \in X$ and $\varepsilon > 0$ be arbitrary and let $V_x$ be a neighborhood of $x$ such that
$$|f_\alpha(x) - f_\alpha(y)| < \varepsilon, \qquad y \in V_x,$$
for all $\alpha$. For each $y \in V_x$ we choose an index $\alpha(y) = \alpha$ such that $|f(x) - f_\alpha(x)| < \varepsilon$ and $|f(y) - f_\alpha(y)| < \varepsilon$. It follows from inequality (1) that $|f(x) - f(y)| < 3\varepsilon$, $y \in V_x$. Thus, $f$ is continuous in $x$.

Obviously the set $\{f_\alpha, f\}$ is also equicontinuous. Let $K \subset X$ be compact. For all $x \in K$ and for all $\varepsilon > 0$ there exists an open neighborhood $V_x$ of $x$ such that the inequalities
$$|f(x) - f(y)| < \varepsilon \quad \text{and} \quad |f_\alpha(x) - f_\alpha(y)| < \varepsilon$$
hold for all $\alpha$ and for all $y \in V_x$. Since $K$ is compact there exist $x_1, \ldots, x_n$ such that $K \subset \bigcup_{i=1}^{n} V_{x_i}$. Choose $\beta$ such that
$$|f_\alpha(x_i) - f(x_i)| < \varepsilon, \qquad \alpha \ge \beta,\ i = 1, \ldots, n.$$
Now let $y \in K$ be arbitrary. Then $y \in V_{x_i}$ for some $i$. Using this we see that
$$|f_\alpha(y) - f(y)| \le |f_\alpha(y) - f_\alpha(x_i)| + |f_\alpha(x_i) - f(x_i)| + |f(x_i) - f(y)| < 3\varepsilon$$
holds whenever $\alpha \ge \beta$. Thus, $f_\alpha \to f$ uniformly on $K$.

Theorem B.2.8. Let $F$ be a set of complex-valued functions on a set $X \ne \emptyset$ and consider on $F$ the topology of pointwise convergence. If $\sup\{|f(x)| : f \in F\} < \infty$ for all $x \in X$, then the closure of $F$ is compact.

Proof. Let $C(x)$ be the closure of the set $\{f(x) : f \in F\} \subset \mathbb{C}$. Then $C(x)$ is bounded and hence compact. Now $F$ is a subset of the product space
$$C = \prod_{x \in X} C(x)$$
and pointwise convergence is the same as convergence in the product topology. By Tychonoff's theorem, $C$ is compact.

Combining Theorem B.2.7 and Theorem B.2.8 we obtain the following result.

Theorem B.2.9 (Arzelà–Ascoli). Let $\{f_\alpha\}$ be an equicontinuous net of complex-valued functions on a topological space $X$ such that $\sup_\alpha |f_\alpha(x)| < \infty$ for all $x \in X$. Then this net contains a subnet converging uniformly on compact sets to a continuous function.

A proof of the next theorem can be found for example in [47].

Theorem B.2.10. Let $\{f_n\}$ be a sequence of differentiable real-valued functions on a finite interval $I = [a, b] \subset \mathbb{R}$ such that
(i) for some $x_0 \in I$ the sequence $\{f_n(x_0)\}$ is convergent;
(ii) the sequence $\{f_n'\}$ is uniformly convergent on $I$.
Then the sequence $\{f_n\}$ converges uniformly on $I$ to a differentiable function $f$ and
$$f'(x) = \lim_{n \to \infty} f_n'(x), \qquad x \in I.$$
B.3 Infinite products

Definition B.3.1. Given a sequence $\{z_j\}_{j=1}^\infty$ of complex numbers let $p_n = z_1 z_2 \cdots z_n$, $n \ge 1$, be the $n$-th partial product. If there is a $p \in \mathbb{C}$ such that $\lim_n p_n = p$, then we write$^{8}$
$$p = \prod_{n=1}^{\infty} z_n.$$

Lemma B.3.2. For a sequence $\{z_j\}_1^\infty$ of complex numbers the following conditions are equivalent:
(i) The $n$-th partial products $p_n$ converge to a nonzero limit as $n \to \infty$.
(ii) $p_n \ne 0$ for all $n$ and for each $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that $|p_n / p_m - 1| < \varepsilon$ for all $n, m \ge N$.
(iii) There is a $\delta > 0$ such that $|p_n| > \delta$ for all $n$, and for each $\varepsilon > 0$ there is an $N \in \mathbb{N}$ such that $|p_n - p_m| < \varepsilon$ for all $n, m \ge N$.

Proof. The equivalence of (i) and (iii) is obvious. If (iii) holds then
$$\left| \frac{p_n}{p_m} - 1 \right| < \frac{\varepsilon}{|p_m|} < \frac{\varepsilon}{\delta}, \qquad n, m \ge N,$$
from which (ii) follows. Assume finally that (ii) is true and let $0 < \varepsilon < 1$. Then
$$\left|\, \left| \frac{p_n}{p_m} \right| - 1 \right| \le \left| \frac{p_n}{p_m} - 1 \right| < \varepsilon, \qquad n, m \ge N.$$
Replacing $m$ by $N$ and writing $A = |p_N|$ we obtain $(1 - \varepsilon) A < |p_n| < (1 + \varepsilon) A$ for $n \ge N$. Together with (ii) this yields (iii).

Taking $m = n - 1$ in (ii), we obtain the following corollary.

Corollary B.3.3. If $\prod_{n=1}^{\infty} z_n = p \ne 0$ then $\lim_{n \to \infty} z_n = 1$.

$^{8}$ Note that we do not introduce the terminology convergent infinite product which is treated differently in the literature.
Theorem B.3.4. Let $\{a_n\}_1^\infty$ be a sequence of nonnegative real numbers. Then
$$\prod_{n=1}^{\infty} (1 + a_n) < \infty$$
if and only if $\sum a_n < \infty$. If $a_n < 1$ for all $n$, then
$$\prod_{n=1}^{\infty} (1 - a_n) > 0$$
if and only if $\sum a_n < \infty$.

Proof. Writing as shorthands $s_n = a_1 + \cdots + a_n$, $p_n = (1 + a_1) \cdots (1 + a_n)$ and $q_n = (1 - a_1) \cdots (1 - a_n)$, the sequences $\{s_n\}$ and $\{p_n\}$ are increasing while $\{q_n\}$ is decreasing so that their (possibly infinite) limits exist. Moreover,
$$s_n < p_n \le e^{s_n}. \tag{1}$$
Indeed, the lower estimate for $p_n$ is a consequence of
$$p_n = (1 + a_1) \cdots (1 + a_n) = 1 + a_1 + \cdots + a_n + a_1 a_2 + \cdots$$
while the upper one follows by applying the inequalities $1 + a_k \le e^{a_k}$, $k = 1, \ldots, n$. From (1) we conclude that $\{s_n\}$ and $\{p_n\}$ are at the same time bounded or unbounded. This shows the first statement. If $a_k < 1$, then $0 < 1 - a_k \le e^{-a_k}$, hence $0 < q_n \le e^{-s_n}$. Thus, $\lim q_n = 0$ if $\lim s_n = \infty$. The case $\lim s_n < \infty$ follows from the next theorem.$^{9}$

Theorem B.3.5. Let $\{u_n\}_1^\infty$ be a sequence of complex numbers different from $-1$ such that
$$\sum_{n=1}^{\infty} |u_n| < \infty.$$
Then there exists $p \in \mathbb{C}$ such that
$$\prod_{n=1}^{\infty} (1 + u_n) = p \ne 0.$$

$^{9}$ Note that Theorem B.3.5 uses only the first part of Theorem B.3.4.

Proof. Write
$$a_n = |u_n|, \qquad q_n = \prod_{j=1}^{n} (1 + a_j), \qquad p_n = \prod_{j=1}^{n} (1 + u_j)$$
and let $\varepsilon > 0$ be arbitrary. It follows from the first part of Theorem B.3.4 that $\lim q_n < \infty$. By Lemma B.3.2, there exists $N \in \mathbb{N}$ such that
$$\frac{q_{n+d}}{q_n} - 1 < \varepsilon, \qquad n \ge N,\ d \in \mathbb{N}_0.$$
We have
$$\left| \frac{p_{n+d}}{p_n} - 1 \right| = |(1 + u_{n+1})(1 + u_{n+2}) \cdots (1 + u_{n+d}) - 1| = |u_{n+1} + \cdots + u_{n+d} + u_{n+1} u_{n+2} + \cdots| \le a_{n+1} + \cdots + a_{n+d} + a_{n+1} a_{n+2} + \cdots = \frac{q_{n+d}}{q_n} - 1 < \varepsilon, \qquad n \ge N,\ d \in \mathbb{N}_0.$$
Thus, the theorem follows from Lemma B.3.2.

Theorem B.3.6. Let $\{u_n\}_1^\infty$ be a sequence of complex-valued functions on some set $D \ne \emptyset$ such that the series $\sum |u_n(z)|$ converges uniformly on $D$ and, for some $K > 0$,
$$\sum_{n=1}^{\infty} |u_n(z)| \le K, \qquad z \in D.$$
Then the sequence $p_n(z) = (1 + u_1(z)) \cdots (1 + u_n(z))$ converges uniformly on $D$. The limit $u(z)$ is equal to zero if and only if $1 + u_n(z) = 0$ for some $n$.

Proof. We have $p_{n+1}(z) - p_n(z) = u_{n+1}(z)\, p_n(z)$ and hence for $m > n$
$$p_m(z) - p_n(z) = p_m(z) - p_{m-1}(z) + \cdots + p_{n+1}(z) - p_n(z) = \sum_{j=n}^{m-1} u_{j+1}(z)\, p_j(z).$$
Using this and the inequality $1 + x \le e^x$, $x \in \mathbb{R}$, we obtain
$$|p_m(z) - p_n(z)| = \left| \sum_{j=n}^{m-1} u_{j+1}(z)\, p_j(z) \right| \le \sum_{j=n}^{m-1} |u_{j+1}(z)|\, e^{|u_1(z)|} \cdots e^{|u_j(z)|} \le \sum_{j=n}^{m-1} |u_{j+1}(z)|\, e^{K}.$$
By our assumption the right-hand side tends to 0 uniformly as $m, n \to \infty$, i.e., the sequence $\{p_n\}$ is uniformly convergent. The last statement follows from Theorem B.3.5.
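The estimates $s_n < p_n \le e^{s_n}$ and $0 < q_n \le e^{-s_n}$ from the proof of Theorem B.3.4 can be observed numerically. The sketch below uses the illustrative helper name `partial_quantities`. The two sample sequences are chosen here only because their products telescope: $\prod_{n=2}^{M} (1 - 1/n^2) = (M+1)/(2M)$ and $\prod_{n=2}^{M} (1 - 1/n) = 1/M$.

```python
import math


def partial_quantities(a):
    """Return the lists (s_n), (p_n), (q_n) for a finite sequence a, where
    s_n = a_1 + ... + a_n, p_n = prod (1 + a_k) and q_n = prod (1 - a_k)."""
    s, p, q = 0.0, 1.0, 1.0
    sums, plus_products, minus_products = [], [], []
    for a_n in a:
        s += a_n
        p *= 1.0 + a_n
        q *= 1.0 - a_n
        sums.append(s)
        plus_products.append(p)
        minus_products.append(q)
    return sums, plus_products, minus_products


# Summable case a_n = 1/(n+1)^2: the product of (1 - a_n) stays away from 0.
summable = [1.0 / (k + 1) ** 2 for k in range(1, 1001)]
# Non-summable case a_n = 1/(n+1): the product of (1 - a_n) tends to 0.
harmonic = [1.0 / (k + 1) for k in range(1, 1001)]
```

In the summable case the last value of $q_n$ is close to $1/2$, whereas in the harmonic case it equals $1/1001$ up to rounding, in line with the second part of Theorem B.3.4.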
B.4 Convex functions

For proofs of the results presented in this section we refer to [61].

Definition B.4.1. A real-valued function $f$ on an interval $I \subset \mathbb{R}$ is said to be convex if the inequality
$$f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)$$
holds for all $x_1, x_2 \in I$ and $\lambda \in [0, 1]$.

Geometrically, a convex function $f$ has the property that for any two points $x_1 < x_2$ in $I$ the chord joining the points $(x_1, f(x_1))$ and $(x_2, f(x_2))$ lies on or above the graph of $f$.

Proposition B.4.2. Let $f$ be a convex function on an interval $I \subset \mathbb{R}$ and $u, v, w \in I$ be such that $u < v < w$. Then we have
$$\frac{f(v) - f(u)}{v - u} \le \frac{f(w) - f(u)}{w - u} \le \frac{f(w) - f(v)}{w - v}.$$

The inequalities in Proposition B.4.2 play a basic role in the proof of the next theorem.

Theorem B.4.3. Let $f$ be a convex function on an open interval $I \subset \mathbb{R}$. Then we have
(i) $f$ is continuous on $I$;
(ii) $f$ is both left- and right-differentiable at every $x_0 \in I$, i.e.,
$$(D_l f)(x_0) := \lim_{x \uparrow x_0} \frac{f(x) - f(x_0)}{x - x_0} \in \mathbb{R}, \qquad (D_r f)(x_0) := \lim_{x \downarrow x_0} \frac{f(x) - f(x_0)}{x - x_0} \in \mathbb{R};$$
(iii) $D_l f$ and $D_r f$ are increasing functions, $D_l f \le D_r f$ and
$$(D_r f)(x_1) \le \frac{f(x_2) - f(x_1)}{x_2 - x_1} \le (D_l f)(x_2)$$
for all $x_1, x_2 \in I$ such that $x_1 < x_2$;
(iv) $f$ is differentiable everywhere except at countably many points in $I$ and $f'$ is an increasing function on the subset of $I$ on which it exists.
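The chord-slope inequalities of Proposition B.4.2 are simple to test on sample points. The following sketch uses illustrative names (`chord_slope`, `three_chord_holds`) and an assumed small tolerance to absorb floating-point rounding. The inequalities are checked in their non-strict form, since for an affine function all three slopes coincide.

```python
import math


def chord_slope(f, a, b):
    """Slope of the chord joining (a, f(a)) and (b, f(b))."""
    return (f(b) - f(a)) / (b - a)


def three_chord_holds(f, u, v, w, tol=1e-12):
    """Check slope(u, v) <= slope(u, w) <= slope(v, w) for u < v < w."""
    if not u < v < w:
        raise ValueError("points must satisfy u < v < w")
    s_uv = chord_slope(f, u, v)
    s_uw = chord_slope(f, u, w)
    s_vw = chord_slope(f, v, w)
    return s_uv <= s_uw + tol and s_uw <= s_vw + tol
```

For the convex function `abs` the chord slopes around the kink at 0 also bracket the one-sided derivatives $-1$ and $1$, while a concave function such as the sine on $[0, \pi]$ fails the test.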
Theorem B.4.4. Let $f$ be a convex function on an open interval $I \subset \mathbb{R}$. Then either $f$ is monotone on $I$ or else there exists $x_0 \in I$ such that $f$ is decreasing on $I \cap (-\infty, x_0]$ and increasing on $I \cap [x_0, \infty)$.

Theorem B.4.7 below states that convex functions are absolutely continuous. Before formulating it we recall the definition of absolute continuity.

Definition B.4.5. A complex-valued function $F$, defined on an interval $[a, b] \subset \mathbb{R}$, is called absolutely continuous if there corresponds to every $\varepsilon > 0$ a $\delta > 0$ such that
$$\sum_{i=1}^{n} |F(b_i) - F(a_i)| < \varepsilon$$
for any $n \in \mathbb{N}$ and any disjoint collection of intervals $[a_i, b_i] \subset [a, b]$ satisfying
$$\sum_{i=1}^{n} |b_i - a_i| < \delta.$$

Taking $n = 1$ in the definition above, we see that such an $F$ is continuous. Moreover, $F$ is of bounded variation.$^{10}$ Indeed, let $\delta > 0$ correspond to $\varepsilon = 1$. If $a = x_0 < \cdots < x_m = b$, by inserting further subdivision points if necessary, we can collect the intervals $(x_{i-1}, x_i)$ into at most $N := \lfloor (b - a)/\delta \rfloor + 1$ groups such that the sum of the interval lengths within a group is less than $\delta$. It follows that the total variation of $F$ is at most $N$.

The next theorem is usually referred to as the fundamental theorem of calculus for Lebesgue integrals.

Theorem B.4.6 (Lebesgue). If $-\infty < a < b < \infty$ and $F : [a, b] \to \mathbb{C}$, then the following statements are equivalent:
(i) $F$ is absolutely continuous;
(ii) $F(x) - F(a) = \int_a^x f(t)\, dt$, $x \in [a, b]$, for some Lebesgue integrable function $f$ on $[a, b]$;
(iii) $F$ is differentiable Lebesgue almost everywhere, $F'$ is Lebesgue integrable and $F(x) - F(a) = \int_a^x F'(t)\, dt$, $x \in [a, b]$.

Theorem B.4.7. Let $f$ be a convex function on an interval $I \subset \mathbb{R}$.
(i) If $I$ is open, then $f$ is absolutely continuous on every finite closed interval contained in $I$.
(ii) If $I$ is a closed finite interval and $f$ is continuous at the endpoints of $I$, then $f$ is absolutely continuous on $I$.

$^{10}$ See page 288 for the definition of bounded variation.
Theorem B.4.8. Let $f$ be a real-valued function on an open interval $I \subset \mathbb{R}$. Suppose
(i) $f$ is absolutely continuous on every finite closed interval contained in $I$, and
(ii) $f'$ is an increasing function on the subset of $I$ on which it exists.
Then $f$ is a convex function on $I$.
B.5 The Riemann–Stieltjes integral

At a few places in the book we need the Riemann–Stieltjes integral. D. V. Widder's classic book [60] contains a short introduction to this topic.

Let $\alpha$ and $f$ be real-valued functions defined on some finite interval $[a, b] \subset \mathbb{R}$. Denote by $\Delta = (x_0, x_1, \ldots, x_n)$ a subdivision of this interval by the points $x_0, x_1, \ldots, x_n$, where
$$a = x_0 < x_1 < \cdots < x_n = b.$$
Further, let $\delta = \delta(\Delta)$ be the largest of the numbers $x_{i+1} - x_i$, $i = 0, \ldots, n - 1$.

Definition B.5.1. If the limit
$$\lim_{\delta \to 0} \sum_{i=0}^{n-1} f(s_i)\, [\alpha(x_{i+1}) - \alpha(x_i)],$$
where $x_i \le s_i \le x_{i+1}$, exists independently of the manner of subdivision and of the choice of the numbers $s_i$, then the limit is called the Riemann–Stieltjes integral of $f$ with respect to $\alpha$ from $a$ to $b$ and is denoted by
$$\int_a^b f(x)\, d\alpha(x).$$

The Riemann–Stieltjes integral reduces to the Riemann integral if $\alpha(x) = x$.

Theorem B.5.2. If $f$ is continuous and $\alpha$ is of bounded variation in $[a, b]$, then the Riemann–Stieltjes integral of $f$ with respect to $\alpha$ from $a$ to $b$ exists.

Theorem B.5.3 (integration by parts). If the Riemann–Stieltjes integral of $f$ with respect to $\alpha$ from $a$ to $b$ exists, then so does the Riemann–Stieltjes integral of $\alpha$ with respect to $f$, and
$$\int_a^b f(x)\, d\alpha(x) = f(b)\alpha(b) - f(a)\alpha(a) - \int_a^b \alpha(x)\, df(x).$$
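The sums in Definition B.5.1 translate directly into code. The sketch below makes two assumptions that the definition does not require: the subdivision is uniform, and the intermediate points $s_i$ are the midpoints. The function name `riemann_stieltjes` is illustrative. It can be used to check the integration by parts formula on a simple example such as $f(x) = x$, $\alpha(x) = x^2$ on $[0, 1]$, where $\int_0^1 x\, d(x^2) = 2/3$.

```python
def riemann_stieltjes(f, alpha, a, b, n=10_000):
    """Approximate the Riemann-Stieltjes integral of f with respect to alpha
    over [a, b] by the sum of f(s_i) * (alpha(x_{i+1}) - alpha(x_i)) for a
    uniform subdivision with n subintervals and midpoints s_i."""
    step = (b - a) / n
    total = 0.0
    for i in range(n):
        x_left = a + i * step
        x_right = a + (i + 1) * step
        midpoint = 0.5 * (x_left + x_right)
        total += f(midpoint) * (alpha(x_right) - alpha(x_left))
    return total


def identity(x):
    return x


def square(x):
    return x * x
```

With these helpers, `riemann_stieltjes(identity, square, 0.0, 1.0)` and `identity(1.0) * square(1.0) - riemann_stieltjes(square, identity, 0.0, 1.0)` agree up to discretization error, as Theorem B.5.3 predicts.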
Another useful property of the Riemann–Stieltjes integral is given next.
Theorem B.5.4. If $f$ and $\varphi$ are continuous and $\alpha$ is of bounded variation in $[a, b]$, and if
$$\beta(x) = \int_c^x \varphi(t)\, d\alpha(t), \qquad x \in [a, b],$$
where $c \in [a, b]$ is fixed, then
$$\int_a^b f(x)\, d\beta(x) = \int_a^b f(x) \varphi(x)\, d\alpha(x).$$

Definition B.5.5 (improper Riemann–Stieltjes integral). Let $\alpha$ and $f$ be real-valued functions defined on the interval $[a, \infty) \subset \mathbb{R}$. Then
$$\int_a^\infty f(x)\, d\alpha(x)$$
is said to be convergent and equal to $A$ provided $\int_a^b f(x)\, d\alpha(x)$ exists for all $b \ge a$ and
$$A = \lim_{b \to \infty} \int_a^b f(x)\, d\alpha(x)$$
is finite.

B.5.6. Using the notation from the previous definition and applying Theorem B.5.3 we obtain the formula
$$\int_a^\infty f(x)\, d\alpha(x) = \lim_{b \to \infty} [f(b)\alpha(b) - f(a)\alpha(a)] - \int_a^\infty \alpha(x)\, df(x) \tag{1}$$
provided the improper integrals and the limit exist.
B.6 Multivariate calculus

Theorem B.6.1 (Taylor's theorem for multivariate functions). Let $f$ be a real- or complex-valued function defined on an open convex set $S \subset \mathbb{R}^d$ such that for some positive integer $n$ all partial derivatives $D^\alpha f$, $\alpha \in \mathbb{N}_0^d$, $|\alpha| \le n$, exist and are continuous. Then for all $t, a \in S$ we have
$$f(t) = \sum_{|\alpha| \le n} \frac{D^\alpha f(a)}{\alpha!} (t - a)^\alpha + \sum_{|\alpha| = n} R_\alpha(t)\, (t - a)^\alpha$$
where
$$R_\alpha(t) = \frac{n}{\alpha!} \int_0^1 \bigl[ D^\alpha f(a + x(t - a)) - D^\alpha f(a) \bigr] (1 - x)^{n-1}\, dx.$$
The remainder terms $R_\alpha$ satisfy the inequality
$$|R_\alpha(t)| \le \frac{1}{\alpha!} \sup_{y \in L} |D^\alpha f(y) - D^\alpha f(a)|$$
where $L$ denotes the line segment connecting $a$ and $t$.

Proof. Since $S$ is open and convex we can choose an open interval $I \supset [0, 1]$ such that $a + s(t - a) \in S$ for all $s \in I$. The function $g$ of one real variable defined by
$$g(s) = f(a + s(t - a)), \qquad s \in I \tag{1}$$
is $n$-times continuously differentiable. Thus, for $n > 0$ the one-dimensional Taylor formula
$$g(s) = \sum_{k=0}^{n} \frac{g^{(k)}(0)}{k!} s^k + \frac{1}{(n-1)!} \int_0^s \bigl[ g^{(n)}(x) - g^{(n)}(0) \bigr] (s - x)^{n-1}\, dx, \qquad s \in I$$
holds. Consequently,
$$f(t) = g(1) = \sum_{k=0}^{n} \frac{g^{(k)}(0)}{k!} + \frac{1}{(n-1)!} \int_0^1 \bigl[ g^{(n)}(x) - g^{(n)}(0) \bigr] (1 - x)^{n-1}\, dx. \tag{2}$$
Using (1), the chain rule, and induction on $k$ we get
$$g^{(k)}(s) = \sum_{i_1 = 1}^{d} \cdots \sum_{i_k = 1}^{d} \frac{\partial^k f}{\partial t_{i_1} \cdots \partial t_{i_k}} (a + s(t - a))\, (t_{i_1} - a_{i_1}) \cdots (t_{i_k} - a_{i_k}) = \sum_{|\alpha| = k} \frac{k!}{\alpha!}\, D^\alpha f(a + s(t - a))\, (t - a)^\alpha.$$
Inserting this into (2) we obtain the first statement of the theorem. The remainders are easily seen to satisfy the given upper estimate.

Theorem B.6.2. Under the conditions of Theorem B.6.1 there exist $\theta, \eta \in (0, 1)$ depending on $t$ such that
$$f(t) = \sum_{|\alpha| \le n-1} \frac{D^\alpha f(a)}{\alpha!} (t - a)^\alpha + \sum_{|\alpha| = n} Q_\alpha(t)\, (t - a)^\alpha$$
where
$$Q_\alpha(t) = \frac{1}{\alpha!} \bigl[ \operatorname{Re} D^\alpha f(a + \theta(t - a)) + i \operatorname{Im} D^\alpha f(a + \eta(t - a)) \bigr].$$

Proof. The proof is almost the same as that of Theorem B.6.1, but now we apply the one-dimensional Taylor formula
$$h(s) = \sum_{k=0}^{n-1} \frac{h^{(k)}(0)}{k!} s^k + \frac{1}{n!}\, h^{(n)}(\theta s)\, s^n,$$
where $h$ is real-valued, for the real and imaginary part of $g$.

Lemma B.6.3. Let $f$ be a complex-valued function on $(a, b) \times (a, b) \subset \mathbb{R}^2$ having continuous partial derivatives of the second order. Then
$$D^{(1,1)} f(x, y) = f_{xy}(x, y) = \lim_{\varepsilon, h \to 0} \frac{\Delta^2 f(x, y; \varepsilon, h)}{\varepsilon h}, \qquad x, y \in (a, b),$$
where
$$\Delta^2 f(x, y; \varepsilon, h) = f(x + \varepsilon, y + h) - f(x + \varepsilon, y) - f(x, y + h) + f(x, y).$$

Proof. We may assume that $f$ is real-valued. Let $x, y \in (a, b)$ and choose $\delta > 0$ such that $x + \varepsilon, y + h \in (a, b)$ whenever $\varepsilon, h \in I_\delta := (-\delta, \delta)$. Setting
$$g(t) := f(x + t, y + h) - f(x + t, y) - f_{xy}(x, y)\, t h, \qquad t \in I_\delta,$$
the function $g$ is continuously differentiable on $I_\delta$. For $\varepsilon \in I_\delta$ we have
$$g(\varepsilon) - g(0) = \Delta^2 f(x, y; \varepsilon, h) - f_{xy}(x, y)\, \varepsilon h.$$
Applying the mean value theorem to $g$ we obtain
$$|g(\varepsilon) - g(0)| \le |\varepsilon| \sup_{s \in I_\delta} |g'(s)| = |\varepsilon| \sup_{s \in I_\delta} |f_x(x + s, y + h) - f_x(x + s, y) - f_{xy}(x, y) h|.$$
Now we apply the mean value theorem to the continuously differentiable function $r(h) = f_x(x + s, y + h) - f_{xy}(x, y) h$:
$$|r(h) - r(0)| = |f_x(x + s, y + h) - f_x(x + s, y) - f_{xy}(x, y) h| \le |h| \sup_{q \in I_\delta} |r'(q)| = |h| \sup_{q \in I_\delta} |f_{xy}(x + s, y + q) - f_{xy}(x, y)|.$$
Combining the two upper estimates we see that
$$\left| \frac{\Delta^2 f(x, y; \varepsilon, h)}{\varepsilon h} - f_{xy}(x, y) \right| \le \sup_{s, q \in I_\delta} |f_{xy}(x + s, y + q) - f_{xy}(x, y)|, \qquad \varepsilon, h \in I_\delta \setminus \{0\}.$$
Since $f_{xy}$ is continuous, the right-hand side tends to zero as $\delta \to 0$.
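The second difference in Lemma B.6.3 gives a simple numerical approximation of the mixed partial derivative. The sketch below uses illustrative function names, takes equal increments in both variables, and picks a step size of $10^{-4}$; these are assumptions made for the example, chosen to keep both truncation and rounding errors small in double precision.

```python
import math


def second_difference(f, x, y, eps, h):
    """f(x + eps, y + h) - f(x + eps, y) - f(x, y + h) + f(x, y)."""
    return f(x + eps, y + h) - f(x + eps, y) - f(x, y + h) + f(x, y)


def mixed_partial_estimate(f, x, y, step=1e-4):
    """Approximate f_xy(x, y) by the second difference divided by eps * h."""
    return second_difference(f, x, y, step, step) / (step * step)


def sample(x, y):
    """Test function sin(x) * exp(y), whose mixed partial is cos(x) * exp(y)."""
    return math.sin(x) * math.exp(y)
```

For example, `mixed_partial_estimate(sample, 0.3, 0.2)` should be close to `math.cos(0.3) * math.exp(0.2)`.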
Lemma B.6.4. Let $a, b \in (0, \infty)$ be such that $b > (a/3)^3$. Then the polynomial
$$P(s, t) = s^2 t^2 (s^2 + t^2 - a) + b, \qquad s, t \in \mathbb{R},$$
is strictly positive but it is not a sum of squares of polynomials with real coefficients.

Proof. It is easy to show that $P$ attains its minimum at $s = \pm\sqrt{a/3}$, $t = \pm\sqrt{a/3}$ and the minimum is equal to $b - (a/3)^3 > 0$. Assume that
$$P = P_1^2 + \cdots + P_n^2 \tag{1}$$
where $P_j$ is a polynomial with real coefficients. It follows from $P(s, 0) = P(0, t) = b$ that $P_j(s, 0)$ and $P_j(0, t)$ are constant for all $j$. Hence,
$$P_j(s, t) = a_j + s t\, Q_j(s, t) \tag{2}$$
where $Q_j$ is a polynomial and $a_j \in \mathbb{R}$. From (1) we see that the degree of $Q_j$ is at most one. Putting (2) into (1) and comparing the coefficients we obtain
$$s^2 + t^2 - a = \sum_{j=1}^{n} Q_j^2(s, t).$$
This, however, is a contradiction because the right-hand side is always nonnegative whereas the left-hand side is negative if $s^2 + t^2 < a$.

In the rest of this section we prove some simple results about multivariate polynomials.

Lemma B.6.5. Let $P$ be a polynomial of degree $m$ on $\mathbb{R}^d$. Then for all $t, s \in \mathbb{R}^d$ the function
$$Q_{t,s}(r) = P(t + r(s - t)), \qquad r \in \mathbb{R},$$
is a polynomial of degree at most $m$. Moreover, every nonempty open set $S \subset \mathbb{R}^d$ contains elements $t, s$ such that the degree of $Q_{t,s}$ is equal to $m$.

Proof. The first statement follows from the fact that $Q_{t,s}$ is a linear combination of terms of the form
$$\prod_{j=1}^{d} \bigl( t_j + r(s_j - t_j) \bigr)^{\alpha_j}$$
where $\alpha_j \in \mathbb{N}_0$ and $\alpha_1 + \cdots + \alpha_d \le m$. We now show the second statement. If $m = 0$, then there is nothing to show; otherwise we have
$$P(t) = \sum_{|\alpha| = m} a_\alpha t^\alpha + R(t)$$
where not all of the coefficients $a_\alpha$ are equal to zero and $R$ is a polynomial of degree less than $m$. We choose $t \in S$ such that the sum above is different from zero. Further, let $\delta \ne 0$ be a real number such that $s = (1 + \delta) t \in S$. Then
$$Q_{t,s}(r) = P\bigl((1 + r\delta) t\bigr) = (1 + r\delta)^m \sum_{|\alpha| = m} a_\alpha t^\alpha + R\bigl(t + r(s - t)\bigr).$$
Applying the first statement to $R$ we see that the degree of $Q_{t,s}$ is equal to $m$.

Theorem B.6.6. Let $f$ be a real- or complex-valued function defined on an open convex set $S \subset \mathbb{R}^d$. Assume that for some nonnegative integer $m$ and for all $t, s \in S$ the function
$$r \mapsto f(t + r(s - t)), \qquad r \in [0, 1],$$
is a polynomial of degree at most $m$. Then $f$ is a polynomial of degree at most $m$.

Proof. To simplify the notation we assume that $d = 2$ and write the function $f$ as $f(x, y)$, $(x, y) \in S$. We choose mutually different $a_j$, mutually different $b_j$, $0 \le j \le m$, such that $(a_i, b_j) \in S$ for all $i$ and $j$ and denote by $A_i$ and $B_i$ the corresponding Lagrange polynomials:
$$A_i(x) = \prod_{k=0,\, k \ne i}^{m} \frac{x - a_k}{a_i - a_k}, \qquad B_i(y) = \prod_{k=0,\, k \ne i}^{m} \frac{y - b_k}{b_i - b_k}.$$
The function
$$P(x, y) = \sum_{i=0}^{m} \bigl[ A_i(x) f(a_i, y) + B_i(y) f(x, b_i) \bigr] - \sum_{i,j=0}^{m} A_i(x) B_j(y) f(a_i, b_j)$$
is a polynomial of degree at most $m$ in each variable. Moreover, we have $P(a_k, b_l) = f(a_k, b_l)$ for all $0 \le k, l \le m$. Since $f$ and $P$ are polynomials of degree at most $m$ of the first variable, we conclude that $P(x, b_l) = f(x, b_l)$ whenever $(x, b_l) \in S$. Knowing this, the same argument shows that $P(x, y) = f(x, y)$ for all $(x, y) \in S$, i.e., $f$ is a polynomial. That the degree of $f$ is at most $m$ follows from Lemma B.6.5.

Lemma B.6.7. Let $P$ be a polynomial on $\mathbb{R}^d$ with complex coefficients such that for all $t \in \mathbb{R}^d$ the polynomial $s \mapsto P(st)$, $s \in \mathbb{R}$, of one variable is of degree at most $k$. Then the degree of $P$ is at most $k$.

Proof. We write $P$ in the form $P = P_0 + \cdots + P_n$ where $n$ is the degree of $P$ and
$$P_j(t) = \sum_{|\alpha| = j} c_\alpha t^\alpha, \qquad c_\alpha \in \mathbb{C},\ t \in \mathbb{R}^d.$$
Choose $t_0$ such that $P_n(t_0) \ne 0$. Using the relation $P_j(s t_0) = s^j P_j(t_0)$ we see that the degree of $s \mapsto P(s t_0)$ is equal to $n$ and hence $n \le k$.

B.7 The Lebesgue integral on $\mathbb{R}^d$

We refer to [18] for basic properties of the multidimensional Lebesgue integral.

B.7.1. Assume that the real-valued functions $h_1, \ldots, h_d$ are defined on the open set $U \subset \mathbb{R}^d$ and let $h(x) = (h_1(x), \ldots, h_d(x))$, $x \in U$. Suppose further that all partial derivatives $\frac{\partial h_i}{\partial x_j}$, $i, j = 1, \ldots, d$, exist. Then the matrix-valued function
$$Jh(x) = \left( \frac{\partial h_i}{\partial x_j}(x) \right)_{i,j=1}^{d} = \begin{pmatrix} \frac{\partial h_1}{\partial x_1}(x) & \cdots & \frac{\partial h_1}{\partial x_d}(x) \\ \vdots & & \vdots \\ \frac{\partial h_d}{\partial x_1}(x) & \cdots & \frac{\partial h_d}{\partial x_d}(x) \end{pmatrix}, \qquad x \in U,$$
is called the Jacobian matrix of $h$. The real-valued function $x \mapsto \det(Jh(x))$ is usually called the Jacobian of $h$.

Theorem B.7.2. Let $U$ be an open subset of $\mathbb{R}^d$ and $h : U \to \mathbb{R}^d$ be an injective function with continuous partial derivatives, the Jacobian of which is nonzero for every $x$ in $U$. A real- or complex-valued function $g$ is Lebesgue integrable over $h(U)$ if and only if $x \mapsto g(h(x)) \det(Jh(x))$, $x \in U$, is Lebesgue integrable over $U$. In this case$^{11}$
$$\int_{h(U)} g(y)\, dy = \int_U g(h(x))\, |\det(Jh(x))|\, dx.$$

Corollary B.7.3. Let $h$ be as in the previous theorem and write $\varphi = h^{-1}$. If $X$ is a $d$-dimensional random vector having density $p_X$, then the random vector $Y = h(X)$ has density
$$p_Y(x) = p_X(\varphi(x))\, |\det(J\varphi(x))|, \qquad x \in \mathbb{R}^d.$$
If $h(x) = Ax$ with a nonsingular matrix $A$, then
$$p_Y(x) = \frac{1}{|\det A|}\, p_X(A^{-1} x), \qquad x \in \mathbb{R}^d.$$

$^{11}$ Note that the assumptions imply that $h(U)$ is open.
B.7.4. Let $S^{d-1} = \{t \in \mathbb{R}^d : \|t\| = 1\}$. The map $F$ defined by
$$F(t) := (\|t\|,\, t/\|t\|), \qquad t \in \mathbb{R}^d \setminus \{0\},$$
is a continuous bijection from $\mathbb{R}^d \setminus \{0\}$ to $(0, \infty) \times S^{d-1}$ whose inverse is
$$F^{-1}(r, s) = rs, \qquad (r, s) \in (0, \infty) \times S^{d-1}.$$

Theorem B.7.5. There is a unique Borel measure $\sigma = \sigma_d$ on $S^{d-1}$ such that if $f$ is Borel measurable on $\mathbb{R}^d$ and $f \ge 0$ or $f \in L^1(\lambda)$, then
$$\int_{\mathbb{R}^d} f(t)\, dt = \int_0^\infty \int_{S^{d-1}} f(rs)\, r^{d-1}\, d\sigma(s)\, dr. \tag{1}$$
Moreover, $\sigma$ is rotation-invariant.

Proof. For $E \in \mathcal{B}(S^{d-1})$ let $\sigma(E) := d \cdot \lambda(E_1)$, where $\lambda$ denotes the Lebesgue measure on $\mathbb{R}^d$ and
$$E_a = F^{-1}((0, a] \times E) = \{rs : 0 < r \le a,\ s \in E\}, \qquad a > 0.$$
Since the map $E \mapsto E_1$ takes Borel sets to Borel sets and commutes with unions, intersections and complements, it is clear that $\sigma$ is a Borel measure on $S^{d-1}$. The rotation-invariance of $\lambda$ implies that $\sigma$ is rotation-invariant as well. Since $E_a$ is the image of $E_1$ under the map $t \mapsto at$, Theorem B.7.2 shows that $\lambda(E_a) = a^d \lambda(E_1)$. We denote by $\mu$ the Borel measure on $(0, \infty) \times S^{d-1}$ induced by the mapping $F$, i.e., $\mu(B) = \lambda(F^{-1}(B))$. Further, define the measure $\rho$ on $(0, \infty)$ by $\rho(A) := \int_A r^{d-1}\, dr$. We show that $\mu = \rho \times \sigma$ from which equation (1) follows. We have
$$\mu((a, b] \times E) = \lambda(E_b \setminus E_a) = \frac{b^d - a^d}{d}\, \sigma(E) = \int_a^b r^{d-1}\, dr\; \sigma(E) = \rho((a, b])\, \sigma(E), \qquad E \in \mathcal{B}(S^{d-1}).$$
Using this, a standard uniqueness argument shows that for fixed $E \in \mathcal{B}(S^{d-1})$ the measures $\mu$ and $\rho \times \sigma$ coincide on the $\sigma$-algebra
$$\{A \times E : A \in \mathcal{B}((0, \infty))\}.$$
Since $E \in \mathcal{B}(S^{d-1})$ is arbitrary we conclude that $\mu = \rho \times \sigma$ on all Borel sets of $(0, \infty) \times S^{d-1}$.

Remark B.7.6. By considering the completion of the measure $\sigma$, which we denote also by $\sigma$, equation (B.7.5.1) can be extended to Lebesgue measurable functions. Note that the completion is rotation-invariant, as well.
Figure B.5. The function $d \mapsto 2\pi^{d/2} / \Gamma(d/2)$, $d \in (0, \infty)$, from Proposition B.7.8.
Corollary B.7.7. If $f$ is a measurable function on $\mathbb{R}^d$, nonnegative or integrable, such that $f(t) = g(\|t\|)$, $t \in \mathbb{R}^d$, for some function $g$ on $(0, \infty)$, then
$$\int_{\mathbb{R}^d} f(t)\, dt = \sigma(S^{d-1}) \int_0^\infty g(r)\, r^{d-1}\, dr.$$

Proposition B.7.8. We have$^{12}$
$$\sigma(S^{d-1}) = \frac{2 \pi^{d/2}}{\Gamma(d/2)}.$$

Proof. Using Lemma B.1.10, Corollary B.7.7, and the substitution $s = r^2$ we obtain
$$\pi^{d/2} = \prod_{j=1}^{d} \int_{-\infty}^{\infty} e^{-t_j^2}\, dt_j = \int_{\mathbb{R}^d} e^{-\|t\|^2}\, dt = \sigma(S^{d-1}) \int_0^\infty r^{d-1} e^{-r^2}\, dr = \frac{\sigma(S^{d-1})}{2} \int_0^\infty s^{(d-2)/2} e^{-s}\, ds = \frac{\sigma(S^{d-1})\, \Gamma(d/2)}{2}.$$

$^{12}$ See Figure B.5 for a plot of this expression.
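The formula of Proposition B.7.8 and the Gaussian identity used in its proof can be verified numerically. In the sketch below the names `sphere_surface_area` and `radial_gaussian_integral` are illustrative, and the truncation of the radial integral at $r = 12$ together with the midpoint rule are assumptions made for the example, not part of the text.

```python
import math


def sphere_surface_area(d):
    """sigma(S^{d-1}) = 2 * pi^(d/2) / Gamma(d/2) for the unit sphere in R^d."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)


def radial_gaussian_integral(d, upper=12.0, n=20_000):
    """Midpoint-rule approximation of the integral of r^(d-1) * exp(-r^2)
    over (0, upper); the tail beyond upper = 12 is negligible."""
    step = upper / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * step
        total += r ** (d - 1) * math.exp(-r * r)
    return total * step
```

For $d = 2$ and $d = 3$ the first function returns the familiar values $2\pi$ and $4\pi$, and the product `sphere_surface_area(d) * radial_gaussian_integral(d)` reproduces $\pi^{d/2}$.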
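Proposition B.7.9. For $R \ge 0$ we have
\[ \lambda(\{t \in \mathbb{R}^d : \|t\| \le R\}) = \frac{R^d}{d}\,\sigma(S^{d-1}) = \frac{R^d \pi^{d/2}}{\Gamma(d/2+1)}. \]
Proof. By Corollary B.7.7 and the previous proposition
\[ \lambda(\{t \in \mathbb{R}^d : \|t\| \le R\}) = \sigma(S^{d-1}) \int_0^R r^{d-1}\,dr = \frac{R^d}{d}\,\sigma(S^{d-1}) = \frac{R^d \pi^{d/2}}{\Gamma(d/2+1)}. \]

Spherical coordinates B.7.10. Recall that the angle $\theta$ between $x, y \in \mathbb{R}^d \setminus \{0\}$ is defined by
\[ \cos\theta = \frac{(x,y)}{\|x\|\,\|y\|}, \qquad 0 \le \theta \le \pi. \]
If $x = 0$ or $y = 0$, then we set $\theta := 0$. Let $e_1, \dots, e_d$ be the standard orthonormal basis in $\mathbb{R}^d$, $d \ge 2$. For $x \in \mathbb{R}^d$ write $r = \|x\|$. Let $u_j$ be the unit vector in the direction of the orthogonal projection of $x$ onto the space spanned by $e_j, \dots, e_d$; $j = 2, \dots, d-1$. If the orthogonal projection is 0, then we set $u_j := 0$. Denote by $\theta_1$ the angle between $x$ and $e_1$, and by $\theta_j$ the angle between $u_j$ and $e_j$. Below we show that
\[ x = \sum_{j=1}^{d-2} \Big( r \prod_{k=1}^{j-1} \sin\theta_k \Big) \cos\theta_j\, e_j + \Big( r \prod_{k=1}^{d-2} \sin\theta_k \Big) u_{d-1}. \tag{1} \]
Since $u_{d-1}$ is a linear combination of $e_{d-1}$ and $e_d$, there is a unique $\varphi \in [0, 2\pi)$ such that $u_{d-1} = \cos\varphi\, e_{d-1} + \sin\varphi\, e_d$. From equation (1) we obtain
\[ x_1 = r\cos\theta_1 \tag{2} \]
\[ x_j = r\cos\theta_j \prod_{k=1}^{j-1} \sin\theta_k, \qquad j = 2, \dots, d-2 \tag{3} \]
\[ x_{d-1} = r\cos\varphi \prod_{k=1}^{d-2} \sin\theta_k \tag{4} \]
\[ x_d = r\sin\varphi \prod_{k=1}^{d-2} \sin\theta_k \tag{5} \]
where $r \in [0,\infty)$, $\theta_j \in [0,\pi]$, $1 \le j \le d-2$, and $\varphi \in [0, 2\pi)$. The numbers $r$, $\theta_1, \dots, \theta_{d-2}$ and $\varphi$ are called the $d$-dimensional spherical coordinates of $x$.$^{13}$
$^{13}$ This derivation of the spherical coordinates is taken from [8].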
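Equations (2) to (5) translate directly into a conversion routine. The sketch below is illustrative only: the function name and argument layout are assumptions, with the angles $\theta_1, \dots, \theta_{d-2}$ passed as a list and $\varphi$ as a separate argument.

```python
import math


def spherical_to_cartesian(r, thetas, phi):
    """Map (r, theta_1..theta_{d-2}, phi) to Cartesian coordinates in R^d.

    The dimension is d = len(thetas) + 2; the formulas are (2)-(5) of B.7.10.
    """
    coords = []
    sin_product = 1.0  # running product sin(theta_1) ... sin(theta_{j-1})
    for theta in thetas:
        coords.append(r * sin_product * math.cos(theta))
        sin_product *= math.sin(theta)
    coords.append(r * sin_product * math.cos(phi))  # x_{d-1}
    coords.append(r * sin_product * math.sin(phi))  # x_d
    return coords
```

Whatever the angles are, the Euclidean norm of the returned vector equals `r` up to rounding, which is a convenient sanity check.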
Proof of (1). Since $x_1 = (x, e_1) = r\cos\theta_1$,
\[ x = r\cos\theta_1\, e_1 + \sum_{j=2}^{d} x_j e_j. \]
We have
\[ r^2 = \|x\|^2 = r^2\cos^2\theta_1 + \sum_{j=2}^{d} x_j^2 \quad\text{and hence}\quad \sum_{j=2}^{d} x_j^2 = r^2\sin^2\theta_1. \]
Define $a_j$ by $x_j = a_j r\sin\theta_1$ if $\sin\theta_1 \ne 0$ and let $a_j = 0$, $j = 2, \dots, d$, otherwise. Then $\sum_{j=2}^{d} a_j e_j$ has unit length and
\[ x = r\cos\theta_1\, e_1 + r\sin\theta_1 \sum_{j=2}^{d} a_j e_j. \]
Since $\sin\theta_1 \ge 0$ we see that $u_2 = \sum_{j=2}^{d} a_j e_j$. We have $\cos\theta_2 = (u_2, e_2) = a_2$ and
\[ u_2 = \cos\theta_2\, e_2 + \sum_{j=3}^{d} a_j e_j. \]
Moreover,
\[ 1 = \|u_2\|^2 = \cos^2\theta_2 + \sum_{j=3}^{d} a_j^2 \quad\text{and hence}\quad \sum_{j=3}^{d} a_j^2 = \sin^2\theta_2. \]
Define $b_j$ by $a_j = b_j\sin\theta_2$ if $\sin\theta_2 \ne 0$ and let $b_j = 0$, $j = 3, \dots, d$, otherwise. Then $\sum_{j=3}^{d} b_j e_j$ has unit length and
\[ u_2 = \cos\theta_2\, e_2 + \sin\theta_2 \sum_{j=3}^{d} b_j e_j, \]
i.e., $u_3 = \sum_{j=3}^{d} b_j e_j$. Thus,
\[ x = r\cos\theta_1\, e_1 + r\sin\theta_1\cos\theta_2\, e_2 + r\sin\theta_1\sin\theta_2\, u_3. \]
Continuing this process we obtain the equation (1).

Remark B.7.11. Note that the Jacobian of the transformation (B.7.10.1) is given by
\[ J(r, \theta_1, \dots, \theta_{d-2}, \varphi) = r^{d-1} \prod_{k=1}^{d-2} \sin^{d-1-k}\theta_k. \]
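Let $d \ge 2$. Setting $r = 1$ in equation (B.7.10.1) we see that $S^{d-1}$ can be parameterized by angles $\theta_j \in [0,\pi]$, $1 \le j \le d-2$, and $\varphi \in [0, 2\pi)$, which are related to the Cartesian coordinates $x_1, \dots, x_d$ by
\[ x_1 = \cos\theta_1, \quad x_2 = \sin\theta_1\cos\theta_2, \quad x_3 = \sin\theta_1\sin\theta_2\cos\theta_3, \quad \dots, \]
\[ x_{d-1} = \sin\theta_1 \cdots \sin\theta_{d-2}\cos\varphi, \quad x_d = \sin\theta_1 \cdots \sin\theta_{d-2}\sin\varphi. \]
Using this parameterization the surface measure $\sigma_d$ on $S^{d-1}$ can be given explicitly by
\[ d\sigma_d(\theta_1, \dots, \theta_{d-2}, \varphi) = \sin^{d-2}\theta_1\, \sin^{d-3}\theta_2 \cdots \sin\theta_{d-2}\; d\theta_1 \cdots d\theta_{d-2}\, d\varphi. \]
Making the substitution $t = \cos\theta_1$ on the right-hand side we obtain the recursive relation
\[ d\sigma_d(\theta_1, \dots, \theta_{d-2}, \varphi) = (1 - t^2)^{(d-3)/2}\, d\sigma_{d-1}(\theta_2, \dots, \theta_{d-2}, \varphi)\, dt \tag{1} \]
for $d \ge 3$.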
Appendix C
Advanced analysis
C.1 Functions of a complex variable

We refer to [46] and [56] for proofs of the classical results not presented in the present section. Throughout this section the symbol $D$ denotes a nonempty open subset of $\mathbb{C}$. If $r \in (0, \infty]$ and $z_0 \in \mathbb{C}$, then we write
\[ D(z_0, r) = \{z \in \mathbb{C} : |z - z_0| < r\}. \]

Definition C.1.1. A complex-valued function $f$ on $D$ is said to be differentiable at $z_0 \in D$ if the limit
\[ \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0} \]
exists. This limit is then denoted by $f'(z_0)$ and is called the derivative of $f$ at $z_0$. Higher order derivatives are defined recursively in the same way as for real-valued functions. If $f$ is differentiable at all points of $D$, then $f$ is said to be holomorphic on $D$. The set of all holomorphic functions on $D$ will be denoted by $H(D)$. Functions that are holomorphic on $\mathbb{C}$ are also called entire.

Unlike the real case, holomorphic functions are infinitely often differentiable.

Theorem C.1.2. If $f \in H(D)$ then $f' \in H(D)$.

Power series and Taylor expansion C.1.3. A power series is a series of the form
\[ \sum_{k=0}^{\infty} c_k (z - z_0)^k \]
where $z$, $z_0$ and $c_k$ are complex numbers. The radius of convergence $r$ of this series is a nonnegative number or $\infty$ given by
\[ r = \frac{1}{\limsup_{k\to\infty} \sqrt[k]{|c_k|}}. \]
If $|z - z_0| < r$, then the series converges absolutely; if $|z - z_0| > r$, then the series is not convergent. If $r > 0$, then the function
\[ g(z) = \sum_{k=0}^{\infty} c_k (z - z_0)^k, \qquad z \in D(z_0, r) \]
is holomorphic on $D(z_0, r)$. If $f$ is holomorphic on $D$ and $D(z_0, q) \subset D$ for some $q > 0$, then the convergence radius of the power series
\[ \sum_{k=0}^{\infty} \frac{f^{(k)}(z_0)}{k!} (z - z_0)^k \]
is at least $q$ and the series coincides with $f$ on $D(z_0, q)$. This series is called the Taylor expansion of $f$ at $z_0$.

Line integrals C.1.4. A curve in the complex plane is a continuous mapping $\gamma$ of a finite interval $[a, b] \subset \mathbb{R}$ into $\mathbb{C}$. If $\gamma(a) = \gamma(b)$, then $\gamma$ is said to be closed. A curve $\gamma$ can be written in the form
\[ \gamma(t) = x(t) + i y(t), \qquad t \in [a, b] \]
where $x$ and $y$ are continuous real-valued functions on $[a, b]$. If $x$ and $y$ are piecewise continuously differentiable, then $\gamma$ is called a path. We define $\gamma'(t)$ by
\[ \gamma'(t) = x'(t) + i y'(t) \]
for all $t$ where the derivatives $x'(t)$ and $y'(t)$ exist. Let $f : D \to \mathbb{C}$ be a continuous function and $\gamma : [a, b] \to D$, $a, b \in \mathbb{R}$, be a path in $D$. The line integral $\int_\gamma f(z)\,dz$ of $f$ along $\gamma$ is defined by
\[ \int_\gamma f(z)\,dz = \int_a^b f(\gamma(t))\,\gamma'(t)\,dt. \]
We use the notation
\[ \int_{[z_0, z_1]} f(z)\,dz := \int_\gamma f(z)\,dz = (z_1 - z_0) \int_0^1 f(z_0 + (z_1 - z_0)t)\,dt, \qquad z_0, z_1 \in \mathbb{C} \]
where $\gamma$ is the path given by
\[ \gamma(t) = z_0 + (z_1 - z_0)t, \qquad 0 \le t \le 1. \]
For $z_0, z_1, z_2 \in \mathbb{C}$ let $\Delta = \Delta(z_0, z_1, z_2)$ denote the triangle with vertices at $z_0$, $z_1$ and $z_2$ and define
\[ \int_{\partial\Delta} f(z)\,dz := \int_{[z_0, z_1]} f(z)\,dz + \int_{[z_1, z_2]} f(z)\,dz + \int_{[z_2, z_0]} f(z)\,dz. \]
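The segment formula and the triangle integral of C.1.4 are easy to evaluate numerically. The following is a minimal sketch (function names and the use of Simpson's rule are illustrative assumptions). For the holomorphic function $z \mapsto z^2$ the integral over the boundary of a triangle vanishes, whereas for the non-holomorphic function $z \mapsto \bar z$ it does not.

```python
def segment_integral(f, z0, z1, n=200):
    """Approximate the integral of f over [z0, z1] via
    (z1 - z0) * int_0^1 f(z0 + (z1 - z0) t) dt, using Simpson's rule (n even)."""
    h = 1.0 / n
    total = f(z0) + f(z1)
    for k in range(1, n):
        t = k * h
        total += (4 if k % 2 else 2) * f(z0 + (z1 - z0) * t)
    return (z1 - z0) * total * h / 3


def triangle_boundary_integral(f, z0, z1, z2, n=200):
    """Sum of the three segment integrals along [z0,z1], [z1,z2], [z2,z0]."""
    return (segment_integral(f, z0, z1, n)
            + segment_integral(f, z1, z2, n)
            + segment_integral(f, z2, z0, n))
```

With vertices $0$, $1$, $i$ the second example evaluates to $i$, which is $2i$ times the area of the triangle.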
Theorem C.1.5 (Cauchy). Suppose $D$ is a convex open set and $f \in H(D)$. Then
\[ \int_\gamma f(z)\,dz = 0 \]
for every closed path $\gamma$ in $D$. If $z_0 \in D$, then the function
\[ F(z) := \int_{[z_0, z]} f(w)\,dw, \qquad z \in D \]
belongs to $H(D)$ and $F' = f$.

Theorem C.1.6 (Morera). Let $f : D \to \mathbb{C}$ be a continuous function such that
\[ \int_{\partial\Delta} f(z)\,dz = 0 \]
for every triangle $\Delta \subset D$. Then $f \in H(D)$.

Simply connected sets C.1.7. Suppose $\gamma_0$ and $\gamma_1$ are closed curves in $X \subset \mathbb{C}$, both with parameter interval $[0, 1]$. We say that these curves are $X$-homotopic if there is a continuous mapping $H$ from $[0,1] \times [0,1]$ into $X$ such that
\[ H(s, 0) = \gamma_0(s), \quad H(s, 1) = \gamma_1(s), \quad H(0, t) = H(1, t), \qquad t, s \in [0, 1]. \]
If $X$ is connected and every closed curve in $X$ is $X$-homotopic to a constant mapping, then $X$ is said to be simply connected. Intuitively, this means that every closed curve can be continuously deformed to a single point within $X$.

Theorem C.1.8. Suppose that $D$ is connected and let $\{z_n\}$ be a sequence in $D$ having at least one accumulation point in $D$. If $f$ and $g$ are holomorphic functions in $D$ and $f(z_n) = g(z_n)$ for all $n$, then $f(z) = g(z)$ for all $z \in D$.

Next we prove a result on the zeros of certain holomorphic functions.

Theorem C.1.9 (Eneström–Kakeya$^{1}$). Let $a_0 \ge a_1 \ge a_2 \ge \cdots \ge 0$ and $a_0 \ne 0$. Then
\[ f(z) := \sum_{j=0}^{\infty} a_j z^j \ne 0, \qquad |z| < 1. \]
Proof. We may assume that $a_0 = 1$. Setting
\[ g(z) := \sum_{j=1}^{\infty} (a_{j-1} - a_j) z^j, \qquad |z| < 1 \]
we have $(1 - z) f(z) = 1 - g(z)$. Since
\[ \sum_{j=1}^{\infty} (a_{j-1} - a_j) = 1 - \lim_{j\to\infty} a_j \le 1 \]
and $a_{j-1} - a_j \ge 0$ we see that
\[ |g(z)| \le \sum_{j=1}^{\infty} (a_{j-1} - a_j)|z|^j \le \sum_{j=1}^{\infty} (a_{j-1} - a_j)|z| \le |z|. \]
So
\[ |(1 - z) f(z)| \ge 1 - |g(z)| \ge 1 - |z| > 0, \qquad |z| < 1. \]
$^{1}$ The original Eneström–Kakeya theorem is on the zeros of polynomials. The present formulation and its simple proof is due to T. Koornwinder.
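The estimate $|(1-z)f(z)| \ge 1 - |z|$ from the proof can be observed numerically for a polynomial with decreasing nonnegative coefficients (a power series whose remaining coefficients are zero). The sketch below is illustrative; the function names and the polar sampling grid are assumptions, and a finite grid is of course no substitute for the proof.

```python
import cmath
import math


def power_series_partial(coeffs, z):
    """Evaluate sum_j a_j z^j by Horner's scheme."""
    value = 0j
    for a in reversed(coeffs):
        value = value * z + a
    return value


def smallest_margin(coeffs, radii=25, angles=90):
    """Minimum over a polar grid in |z| < 1 of |(1 - z) f(z)| - (1 - |z|),
    after normalizing the coefficients so that a_0 = 1."""
    a0 = coeffs[0]
    normalized = [a / a0 for a in coeffs]
    margin = math.inf
    for i in range(radii):
        rho = i / radii  # radii 0, 1/radii, ..., (radii - 1)/radii
        for k in range(angles):
            z = cmath.rect(rho, 2 * math.pi * k / angles)
            lhs = abs((1 - z) * power_series_partial(normalized, z))
            margin = min(margin, lhs - (1 - abs(z)))
    return margin
```

A nonnegative margin (up to rounding) on the grid is what the theorem predicts for coefficient lists such as `[5, 4, 4, 2, 1]`.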
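Proposition C.1.10. For $n \in \mathbb{N}_0$ we have
\[ \int_{[z, 1]} (1 - t^2)^n\,dt = (1 - z)^{n+1} Q_n(z) \]
where $Q_n$ is a polynomial of degree $n$, the zeros of which lie in the closed disc $\overline{D}(z_0, r)$ where$^{2}$
\[ z_0 = -\frac{1}{2}\Big(n + 1 + \frac{1}{n+1}\Big) \quad\text{and}\quad r = \frac{1}{2}\Big(n + 1 - \frac{1}{n+1}\Big). \]
Proof. We consider the polynomial
\[ P_j(z) = \int_{[z, 1]} (1 - t)^{n+j} (1 + t)^{n-j}\,dt, \qquad 0 \le j \le n,\ z \in \mathbb{C}. \]
Then
\[ P_n(z) = \frac{1}{2n+1} (1 - z)^{2n+1} \tag{1} \]
and integration by parts gives
\[ P_j(z) = \frac{1}{n+j+1} (1 - z)^{n+j+1} (1 + z)^{n-j} + \frac{n-j}{n+j+1} P_{j+1}(z) \tag{2} \]
for all $j < n$. Setting
\[ c_j := \frac{1}{n+1} \cdot \frac{n}{n+2} \cdot \frac{n-1}{n+3} \cdots \frac{n-j+1}{n+j+1} \]
from (1) and (2) we obtain
\[ P_0(z) = \sum_{j=0}^{n} c_j (1 - z)^{n+1+j} (1 + z)^{n-j} \]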
This proposition is taken from [39] where the authors state that the zeros of Qn lie in the open disc D.z0 ; r/. However, the real part of two zeros of Q1 is equal to 12 .
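and therefore
\[ Q_n(z) = (1 + z)^n \sum_{j=0}^{n} c_j \Big(\frac{1 - z}{1 + z}\Big)^j. \tag{3} \]
Using the notation
\[ a_j := \Big(\frac{n+2}{n}\Big)^j c_j, \qquad H(w) := \sum_{j=0}^{n} a_j w^j \]
equation (3) yields
\[ Q_n(z) = (1 + z)^n\, H\Big(\frac{n}{n+2} \cdot \frac{1 - z}{1 + z}\Big). \]
Since
\[ \frac{a_{j+1}}{a_j} = \frac{n-j}{n+j+2} \cdot \frac{n+2}{n} \le 1 \]
the Eneström–Kakeya Theorem C.1.9 shows that all of the zeros of $H$ lie in the set $S := \mathbb{C} \setminus D(0, 1)$. The proposition follows from the fact that the image of $S \cup \{\infty\}$ under the inverse of the map
\[ z \mapsto w = \frac{n}{n+2} \cdot \frac{1 - z}{1 + z} \]
is $\overline{D}(z_0, r)$.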
Theorem C.1.15 (Jensen’s formula). Suppose $0 < r < R$, and let $f$ be a holomorphic function on $D(0, R)$ such that $f(0) \ne 0$. Denote by $z_1, \dots, z_N$ the zeros of $f$ in the closed disc $\overline{D}(0, r)$, listed according to their multiplicities. Then
\[ |f(0)| \prod_{n=1}^{N} \frac{r}{|z_n|} = \exp\Big\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log|f(re^{i\theta})|\,d\theta \Big\}. \]

Corollary C.1.16. Let $f$ be an entire function such that $f(0) = 1$ and denote by $n(r)$, $0 < r < \infty$, the number of zeros of $f$ in the closed disc $\overline{D}(0, r)$. Then
\[ n(r) \le \frac{\log M(2r)}{\log 2} \]
where
\[ M(r) = \sup_{\theta \in [0, 2\pi)} |f(re^{i\theta})|. \]
Proof. Let $\{z_k\}$ be the sequence of zeros of $f$, listed according to their multiplicities and arranged so that the sequence $\{|z_k|\}$ is increasing. By Jensen’s formula,
\[ M(2r) \ge \exp\Big\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \log|f(2re^{i\theta})|\,d\theta \Big\} = \prod_{k=1}^{n(2r)} \frac{2r}{|z_k|} \ge \prod_{k=1}^{n(r)} \frac{2r}{|z_k|} \ge 2^{n(r)}. \]
Taking logarithms we obtain the desired inequality.

Definition C.1.17. An entire function $f$ is said to be of finite order if there exist positive numbers $A$ and $C$ such that
\[ |f(z)| \le C e^{|z|^A}, \qquad z \in \mathbb{C}. \]
The infimum of all numbers $A$ for which this is true is called the order of $f$. A polynomial is of order zero. The functions $e^z$, $\cos z$ and $\sin z$ are of order 1 while $e^{z^k}$, $k \in \mathbb{N}$, is of order $k$. The function $e^{e^z}$ is of infinite order.
Lemma C.1.18. If $f$ is an entire function of order $\rho$, $f(0) \ne 0$, and $r_1, r_2, \dots$ are the moduli of the zeros of $f$, then the series
\[ \sum_{n \ge 1} \frac{1}{r_n^{\alpha}} \]
is convergent if $\alpha > \rho$.

Theorem C.1.19 (Hadamard). If $f$ is an entire function of order $\rho$, with zeros $z_1, z_2, \dots$ such that $f(0) \ne 0$, then
\[ f(z) = e^{Q(z)} \prod_{n \ge 1} \Big(1 - \frac{z}{z_n}\Big) \exp\Big\{ \frac{z}{z_n} + \frac{1}{2}\Big(\frac{z}{z_n}\Big)^2 + \cdots + \frac{1}{p}\Big(\frac{z}{z_n}\Big)^p \Big\} \]
where $Q$ is a polynomial of degree not greater than $\rho$ and $p$ is the smallest integer for which
\[ \sum_{n \ge 1} \frac{1}{|z_n|^{p+1}} \]
is convergent.
Section C.2 Almost periodic functions
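Remark C.1.20. Since the functions $\cos$ and $\sin$ are of order 1, the well-known product formulae
\[ \sin z = z \prod_{n=1}^{\infty} \Big(1 - \frac{z^2}{\pi^2 n^2}\Big), \qquad \cos z = \prod_{n=1}^{\infty} \Big(1 - \frac{z^2}{(n - 1/2)^2 \pi^2}\Big) \]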
are consequences of Hadamard’s theorem.4 Theorem C.1.21 (Phragmen–Lindelöf). Let ˛ < ˇ ˛ C and let f be a function, holomorphic in the sector S D fz 2 C W ˛ < arg z < ˇg and continuous on its boundary. If jf .z/j M for all z on the boundary of S and jf .z/j C ejzj ; A
where 0 C and 0 A
0, there exists $n \in \mathbb{N}$ such that $|g(t) - g(s)| < \varepsilon$ whenever $t, s \in [0, 1]$ and $|t - s| \le 1/n$. Using this and the fact that $g$ is periodic with period 1 we see that
\[ U_j := \bigcup_{k=-\infty}^{\infty} [k + (j-1)/n,\ k + j/n], \qquad j = 1, \dots, n \]
is an $\varepsilon$-partition for $g$.

Lemma C.2.3. If $g$ and $h$ are almost periodic, then so is their sum.

Proof. If $\{U_1, \dots, U_n\}$ and $\{V_1, \dots, V_m\}$ are $\varepsilon$-partitions for $g$ and $h$, respectively, then the sets $U_i \cap V_j$; $i = 1, \dots, n$; $j = 1, \dots, m$ form a $2\varepsilon$-partition for $g + h$.

Lemma C.2.4. If $g$ is the uniform limit of almost periodic functions $g_n$, $n \in \mathbb{N}$, then $g$ is almost periodic.

Proof. For each $\varepsilon > 0$ there exists $n \in \mathbb{N}$ such that
\[ |g(t) - g_n(t)| < \varepsilon/3, \qquad t \in \mathbb{R}. \]
Since
\[ |g(s + x) - g(s + y)| \le |g(s + x) - g_n(s + x)| + |g_n(s + x) - g_n(s + y)| + |g_n(s + y) - g(s + y)| \le |g_n(s + x) - g_n(s + y)| + 2\varepsilon/3 \]
we see that every $\varepsilon/3$-partition for $g_n$ is an $\varepsilon$-partition for $g$.

Applying the previous lemmata we obtain the following corollary.

Corollary C.2.5. Every function $g$ of the form
\[ g(t) = \sum_{n=1}^{\infty} c_n e^{i\alpha_n t}, \qquad t \in \mathbb{R} \]
where $\alpha_n \in \mathbb{R}$, $c_n \in \mathbb{C}$ and $\sum_{n=1}^{\infty} |c_n| < \infty$, is almost periodic.
Theorem C.2.6. Let $g$ be a continuous almost periodic function. Then for all $\varepsilon > 0$ there exists a positive number $L = L(\varepsilon)$ such that every closed interval of the form $[a, a + L]$, $a \in \mathbb{R}$, contains a number $\tau$ satisfying$^{6}$
\[ |g(s + \tau) - g(s)| < \varepsilon, \qquad s \in \mathbb{R}. \tag{1} \]
Proof. Let $\{U_1, \dots, U_n\}$ be an $\varepsilon$-partition for $g$. Further, let $a \in \mathbb{R}$ and $a_j \in U_j$ be arbitrary and write $L := 2\max(|a_1|, \dots, |a_n|)$. Choose $j \in \{1, \dots, n\}$ such that $a + L/2 \in U_j$ and define $\tau$ by $\tau := a + L/2 - a_j$. Then $\tau \in [a, a + L]$. Since $a_j$ and $a + L/2$ are in $U_j$ we have
\[ |g(s + a + L/2) - g(s + a_j)| < \varepsilon, \qquad s \in \mathbb{R}. \]
Replacing here $s$ by $s - a_j$ we see that $\tau$ satisfies the inequality (1).

Corollary C.2.7. If $g$ is a continuous almost periodic function, then the relation
\[ \liminf_{t \to \infty} |g(t) - g(s)| = 0 \]
holds for all $s \in \mathbb{R}$.
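For a finite trigonometric sum as in Corollary C.2.5, numbers $\tau$ as in Theorem C.2.6 can be searched for numerically. The sketch below is illustrative and rests on the elementary bound $\sup_s |g(s+\tau) - g(s)| \le \sum_n |c_n|\,|e^{i\alpha_n\tau} - 1|$, which follows from the triangle inequality; the function names, the grid step and the example frequencies $1$ and $\sqrt 2$ are assumptions made for the illustration.

```python
import cmath
import math


def shift_defect_bound(tau, frequencies, coefficients):
    """Upper bound for sup_s |g(s + tau) - g(s)| where
    g(t) = sum_n c_n exp(i * alpha_n * t)."""
    return sum(abs(c) * abs(cmath.exp(1j * a * tau) - 1)
               for a, c in zip(frequencies, coefficients))


def find_almost_period(a, length, eps, frequencies, coefficients, step=0.01):
    """Scan the grid a, a + step, ... in [a, a + length] and return the first
    tau whose defect bound is below eps (None if the grid contains none)."""
    steps = int(length / step)
    for k in range(steps + 1):
        tau = a + k * step
        if shift_defect_bound(tau, frequencies, coefficients) < eps:
            return tau
    return None
```

For $g(t) = e^{it} + e^{i\sqrt{2}\,t}$ the shift $\tau = 2\pi \cdot 29$ already gives a bound below $0.1$, because $29\sqrt{2}$ is close to the integer $41$.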
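C.3 Fourier series

Let $f$ be a Lebesgue integrable function on the interval $[0, 2\pi)$. The series
\[ \sum_{n \in \mathbb{Z}} \hat f(n) e^{inx}, \qquad x \in [0, 2\pi) \]
where
\[ \hat f(n) = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-inx}\,dx, \qquad n \in \mathbb{Z} \]
is called the Fourier series of $f$. If $N$ is a nonnegative integer, the $N$-th symmetric partial sum of the Fourier series of $f$ is
\[ s_N f(x) = \sum_{|n| \le N} \hat f(n) e^{inx}. \]
If $f = 1_{[a,b)}$ where $[a, b) \subset [0, 2\pi)$, then
\[ s_N f(x) = \frac{1}{2\pi} \sum_{|j| \le N} \frac{e^{-ija} - e^{-ijb}}{ij}\, e^{ijx}, \]
where the term with $j = 0$ is to be read as $b - a$.

$^{6}$ It can be shown that this property implies almost periodicity; see Maak [41], p. 94.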
Recall that the total variation $V(f)$ of a complex-valued function $f$ defined on an interval $[a, b]$ is the quantity
\[ V(f) := \sup \sum_{i=1}^{m} |f(x_i) - f(x_{i-1})| \]
where the supremum is taken over all $m \in \mathbb{N}$ and all $x_i \in [a, b]$ such that $a = x_0 < x_1 < \cdots < x_m = b$. If $V(f)$ is finite, then $f$ is said to be of bounded variation. A real-valued function $f$ is of bounded variation if and only if it can be expressed as the difference of two monotone functions. A proof of the next theorem can be found, e.g., in Section 10.1 of [13].

Theorem C.3.1. Let $f$ be a Lebesgue integrable function on the interval $[0, 2\pi)$ and suppose that the total variation of $f$ is finite. Then
\[ \lim_{N \to \infty} s_N f(x) = \frac{1}{2}\,[f(x + 0) + f(x - 0)], \qquad x \in [0, 2\pi) \]
and
\[ |s_N f(x)| \le K \]
with some constant $K \ge 0$.$^{7}$

Applying Lebesgue’s theorem on dominated convergence we obtain the following corollary.

Corollary C.3.2. Let $f$ be as in the previous theorem and let $\mu$ be a complex Borel measure on $[0, 2\pi)$. If $\mu(\{x\}) = 0$ for all discontinuity points$^{8}$ of $f$, then $s_N f$ converges to $f$ in $L^2(\mu)$.
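Theorem C.3.1 can be observed numerically for the indicator function $f = 1_{[a,b)}$. Pairing the terms with indices $j$ and $-j$ in its Fourier series gives the real form $s_N f(x) = \frac{b-a}{2\pi} + \frac{1}{\pi} \sum_{j=1}^{N} \frac{\sin j(x-a) - \sin j(x-b)}{j}$, which the following sketch evaluates (the function name is an assumption made for the illustration).

```python
import math


def partial_sum_indicator(x, a, b, n_terms):
    """N-th symmetric Fourier partial sum of the indicator of [a, b),
    where [a, b) is contained in [0, 2*pi), written in real form."""
    total = (b - a) / (2 * math.pi)
    for j in range(1, n_terms + 1):
        total += (math.sin(j * (x - a)) - math.sin(j * (x - b))) / (math.pi * j)
    return total
```

At the jump $x = a$ the partial sums approach the mean value $1/2$, inside the interval they approach $1$, and outside they approach $0$.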
$^{7}$ $f(0 - 0)$ is defined by the $2\pi$-periodic extension of $f$.
$^{8}$ Discontinuity in 0 is defined by the $2\pi$-periodic extension of $f$.

C.4 The Gamma function and the formulae of Stirling and Binet

Definition C.4.1. The Gamma function $\Gamma$ is defined by
\[ \Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt, \qquad x \in (0, \infty). \]
Note that if $Z$ is an exponentially distributed random variable with expectation one, then
\[ E(Z^x) = \int_0^\infty t^x e^{-t}\,dt = \Gamma(x + 1), \qquad x > -1. \]
Figure C.1. The Gamma function (continuous line) and the function $x \mapsto n!\,n^x / (x(x+1)\cdots(x+n))$ for $n = 6$ and $n = 12$, from relation (C.4.2.v).
Lemma C.4.2. We have
(i) $\Gamma(x) < \infty$ and $\Gamma$ is continuous;
(ii) $\Gamma(x + 1) = x\,\Gamma(x)$;
(iii) $\Gamma(n + 1) = n!$, $n \in \mathbb{N}_0$;
(iv) $\Gamma(1/2) = \sqrt{\pi}$;
(v) $\Gamma(x) = \lim_{n \to \infty} \dfrac{n!\,n^x}{x(x+1)\cdots(x+n)}$.
The relation (C.4.2.v) is illustrated in Figure C.1.

Proof. (i) It is easy to check that $\Gamma(x) < \infty$ while the continuity can be proved by using Lebesgue’s theorem on dominated convergence.
(ii) Integrating by parts we obtain
\[ \Gamma(x + 1) = \int_0^\infty t^x e^{-t}\,dt = -t^x e^{-t}\Big|_0^\infty + x \int_0^\infty t^{x-1} e^{-t}\,dt = x\,\Gamma(x). \]
(iii) Since $\Gamma(1) = 1$, (iii) follows from (ii) by induction on $n$.
(iv) We have
\[ \Gamma(1/2) = \int_0^\infty t^{1/2 - 1} e^{-t}\,dt = \int_0^\infty \frac{1}{\sqrt{t}}\, e^{-t}\,dt = 2 \int_0^\infty e^{-u^2}\,du = \sqrt{\pi} \]
where we made the substitution $t = u^2$ and applied Lemma B.1.10.
(v) Repeated integration by parts gives
\[ \int_0^n t^{x-1} \Big(1 - \frac{t}{n}\Big)^n dt = \frac{1}{x} \int_0^n t^x \Big(1 - \frac{t}{n}\Big)^{n-1} dt = \cdots = \frac{n!\,n^x}{x(x+1)\cdots(x+n)} \]
from which (v) follows by dominated convergence.$^{9}$

Definition C.4.3. The Beta function $B$ (see Figure C.2) is defined by the equation
\[ B(a, b) = \int_0^1 (1 - t)^{a-1} t^{b-1}\,dt, \qquad a, b > 0. \]
Figure C.2. The functions $B_y : x \mapsto B(x, y)$ for $y = 0.2$, $y = 0.4$ and $y = 2.0$, where $B$ is the Beta function from Definition C.4.3.
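Relation (C.4.2.v) of Lemma C.4.2, plotted in Figure C.1, can also be explored numerically. The sketch below is illustrative (the function name is an assumption); it evaluates the quotient through logarithms to avoid overflow and compares it with `math.gamma` from the Python standard library.

```python
import math


def gamma_euler(x, n):
    """n! * n^x / (x (x+1) ... (x+n)), evaluated through logarithms."""
    log_value = math.lgamma(n + 1) + x * math.log(n)
    for k in range(n + 1):
        log_value -= math.log(x + k)
    return math.exp(log_value)
```

The convergence is slow (the relative error behaves roughly like $1/n$), and the approximations increase with $n$, in line with the footnote on monotonicity attached to the proof of (v).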
Proposition C.4.4. We have
\[ B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}, \qquad a, b > 0. \]
$^{9}$ It is easy to check by differentiating that $1_{[0,n]}(t)(1 - t/n)^n$ is an increasing function of $n$.
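Proof. It follows from the definition of the Gamma function that the function $p_a$ defined by
\[ p_a(x) := 1_{[0,\infty)}(x)\, \frac{x^{a-1} e^{-x}}{\Gamma(a)}, \qquad x \in \mathbb{R} \]
is a probability density. We have
\[ p_a * p_b(x) = \int_{-\infty}^{\infty} p_a(x - y) p_b(y)\,dy = \frac{e^{-x}}{\Gamma(a)\Gamma(b)} \int_0^x (x - y)^{a-1} y^{b-1}\,dy = \frac{x^{a+b-1} e^{-x}}{\Gamma(a)\Gamma(b)} \int_0^1 (1 - t)^{a-1} t^{b-1}\,dt \]
where we made the substitution $y = tx$. The relation above shows $p_a * p_b = c\,p_{a+b}$ with some constant $c$. Since $p_a * p_b$ and $p_{a+b}$ are probability densities, we conclude that $c = 1$ from which the assertion follows.

Proposition C.4.5 (Legendre’s duplication formula). We have
\[ \Gamma(2x) = \frac{2^{2x-1}}{\sqrt{\pi}}\, \Gamma(x)\, \Gamma\Big(x + \frac{1}{2}\Big), \qquad x > 0. \]
Proof. Let $f_1(x) = e^{-x}$, $f_2(x) = e^{-x} x^{1/2}$ and denote by $g$ the multiplicative convolution of $f_1$ and $f_2$, i.e.,
\[ g(x) = \int_0^\infty f_1(x/u) f_2(u)\, \frac{1}{u}\,du, \qquad x \in (0, \infty). \]
Then $Mf_1(s) = \Gamma(s)$, $Mf_2(s) = \Gamma(s + \frac{1}{2})$ and, by Lemma C.6.3, $Mg(s) = Mf_1(s) \cdot Mf_2(s)$ where $M$ denotes Mellin transformation. We prove that
\[ Mg(s) = \frac{\sqrt{\pi}\,\Gamma(2s)}{2^{2s-1}} \]
from which the assertion follows. We have
\[ g(x) = \int_0^\infty e^{-(x/u + u)}\, \frac{1}{\sqrt{u}}\,du = 2 \int_0^\infty e^{-(x/t^2 + t^2)}\,dt. \]
Changing $t$ to $\sqrt{x}/t$ yields
\[ g(x) = 2 \int_0^\infty e^{-(x/t^2 + t^2)}\, \frac{\sqrt{x}}{t^2}\,dt \]
and therefore
\[ g(x) = \int_0^\infty e^{-(x/t^2 + t^2)} \Big(1 + \frac{\sqrt{x}}{t^2}\Big)\,dt. \]
Substituting $u = t - \sqrt{x}/t$ we obtain
\[ g(x) = \int_{-\infty}^{\infty} e^{-(u^2 + 2\sqrt{x})}\,du = e^{-2\sqrt{x}} \int_{-\infty}^{\infty} e^{-u^2}\,du = \sqrt{\pi}\, e^{-2\sqrt{x}}. \]
Finally, set $y = 2\sqrt{x}$ and obtain
\[ Mg(s) = \int_0^\infty \sqrt{\pi}\, e^{-2\sqrt{x}}\, x^{s-1}\,dx = \frac{\sqrt{\pi}}{2^{2s-1}} \int_0^\infty y^{2s-1} e^{-y}\,dy = \frac{\sqrt{\pi}\,\Gamma(2s)}{2^{2s-1}}. \]

Theorem C.4.6 (Binet). We have
\[ \Gamma(x + 1) = \Big(\frac{x}{e}\Big)^x \sqrt{2\pi x}\; e^{\theta(x)}, \qquad x > 0 \tag{1} \]
where$^{10}$
\[ \theta(x) = \int_0^\infty \Big(\frac{1}{e^t - 1} - \frac{1}{t} + \frac{1}{2}\Big) e^{-xt}\, \frac{1}{t}\,dt. \]

C.4.7. For the proof of Binet’s formula we define the function $\varphi$ by the equation
\[ e^{\varphi(x)} = \frac{1}{\sqrt{2\pi}} \int_0^\infty \sqrt{x}\, \big(t e^{1-t}\big)^x\,dt, \qquad x > 0 \]
so that
\[ \Gamma(x + 1) = \Big(\frac{x}{e}\Big)^x \sqrt{2\pi x}\; e^{\varphi(x)}. \tag{1} \]
Binet’s formula is equivalent to $\theta(x) = \varphi(x)$. For the proof of this we need that $\theta$ and $\varphi$ satisfy a certain difference equation and that $\theta(\frac{1}{2}) = \varphi(\frac{1}{2})$. First we show these facts.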
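Lemma C.4.8. For all $x > 0$ we have
\[ \varphi(x) - \varphi(x + 1) = \theta(x) - \theta(x + 1) = \Big(x + \frac{1}{2}\Big) \log\Big(1 + \frac{1}{x}\Big) - 1. \]
Proof. Denote by $g(x)$ the right-hand side of the relation above. The equation
\[ \varphi(x) - \varphi(x + 1) = g(x) \tag{1} \]
follows immediately from (C.4.7.1) by using the relation
\[ \Gamma(x + 2) = (x + 1)\Gamma(x + 1). \]
To prove the statement on $\theta$, first note that
\[ \lim_{x \to \infty} \big(\theta(x) - \theta(x + 1)\big) = \lim_{x \to \infty} g(x) = 0. \]
Moreover,$^{11}$
\[ \theta'(x) - \theta'(x + 1) = \int_0^\infty \Big( \frac{e^{-xt} - e^{-(x+1)t}}{t} - \frac{e^{-xt} + e^{-(x+1)t}}{2} \Big)\,dt. \]
Applying Lemma B.1.12 we obtain
\[ \theta'(x) - \theta'(x + 1) = \ln\Big(1 + \frac{1}{x}\Big) - \frac{1}{2}\Big(\frac{1}{x} + \frac{1}{x + 1}\Big) = g'(x) \]
completing the proof.

Lemma C.4.9. $\varphi(\frac{1}{2}) = \theta(\frac{1}{2}) = \frac{1}{2} - \frac{1}{2}\log 2$.

Proof. Since $\Gamma(\frac{3}{2}) = \frac{1}{2}\sqrt{\pi}$, equation (C.4.7.1) shows that $\varphi(\frac{1}{2}) = \frac{1}{2} - \frac{1}{2}\log 2$. By an obvious substitution,
\[ \theta(1) = \int_0^\infty \Big(\frac{1}{e^{t/2} - 1} - \frac{2}{t} + \frac{1}{2}\Big) e^{-t/2}\, \frac{1}{t}\,dt. \]
Using this, we obtain
\[ \theta(1/2) = \big(\theta(1/2) - \theta(1)\big) + \theta(1) \]
\[ = \int_0^\infty \Big(\frac{e^{-t/2}}{t} - \frac{1}{e^t - 1}\Big) \frac{1}{t}\,dt + \int_0^\infty \Big(\frac{1}{e^t - 1} - \frac{1}{t} + \frac{1}{2}\Big) e^{-t}\, \frac{1}{t}\,dt \]
\[ = \int_0^\infty \Big(\frac{e^{-t/2} - e^{-t}}{t} - \frac{1}{2} e^{-t}\Big) \frac{1}{t}\,dt \]
\[ = \int_0^\infty \Big( -\frac{d}{dt}\, \frac{e^{-t/2} - e^{-t}}{t} - \frac{e^{-t/2} - e^{-t}}{2t} \Big)\,dt. \]
Applying Lemma B.1.12 we obtain the desired result.

$^{10}$ That the integral in Theorem C.4.6 exists follows from the relation
\[ \lim_{t \to 0} \Big(\frac{1}{e^t - 1} - \frac{1}{t} + \frac{1}{2}\Big) \frac{1}{t} = \frac{1}{12} \]
which can be shown, e.g., by using the Bernoulli–l’Hospital rule.
$^{11}$ The footnote to Lemma B.1.12 also holds for the differentiation in the proof of Lemma C.4.8.

Proof of Binet’s formula. We have to show that $\varphi(x) = \theta(x)$. By Lemma C.4.8,
\[ \theta(x) - \theta(x + 1) = \varphi(x) - \varphi(x + 1). \]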
Applying this to $x, x + 1, \dots, x + n - 1$ and summing these equations we see that
\[ \theta(x) - \theta(x + n) = \varphi(x) - \varphi(x + n). \]
Since $\lim_{n \to \infty} \theta(x + n) = 0$, we immediately obtain
\[ \theta(x) = \varphi(x) - \lim_{n \to \infty} \varphi(x + n) =: \varphi(x) - h(x). \tag{1} \]
Next we show that the function $h$ is decreasing. If $0 \le y \le x$ and $0 \le p \le 1$, then
\[ \sqrt{x + n}\, p^{x+n} - \sqrt{y + n}\, p^{y+n} \le \sqrt{x + n}\, p^{y+n} - \sqrt{y + n}\, p^{y+n} \le \big(\sqrt{x + n} - \sqrt{y + n}\big)\, p \]
for all $n \ge 1$. Noting that $0 \le t e^{1-t} \le 1$, $t \ge 0$, and using the definition of $\varphi$, we conclude that
\[ e^{\varphi(x + n)} - e^{\varphi(y + n)} \le \big(\sqrt{x + n} - \sqrt{y + n}\big)\, e^{\varphi(1)}. \]
Taking the limit as $n \to \infty$, we obtain $e^{h(x)} - e^{h(y)} \le 0$, i.e., $h(x) \le h(y)$. Since the function $h$ is also periodic with period 1, it must be constant. Applying (1) and Lemma C.4.9, we now obtain that $h(x) = h(\frac{1}{2}) = 0$ for all $x > 0$.

Since $\lim_{x \to \infty} \theta(x) = 0$, from Theorem C.4.6 we immediately obtain:

Corollary C.4.10 (Stirling’s formula).
\[ \lim_{n \to \infty} \frac{n!}{(n/e)^n \sqrt{2\pi n}} = 1. \]
To prove a more precise statement we will need the Bernoulli numbers.

C.4.11. The Bernoulli numbers $B_n$, $n \in \mathbb{N}_0$, are defined by$^{12}$
\[ \frac{z}{e^z - 1} = \frac{1}{\frac{1}{1!} + \frac{z}{2!} + \frac{z^2}{3!} + \cdots} = \sum_{n=0}^{\infty} \frac{B_n}{n!}\, z^n, \qquad z \in \mathbb{C},\ |z| < 2\pi. \]
$^{12}$ The expression on the left-hand side is defined by continuity at $z = 0$.
It follows from this definition that
\[ \Big(\frac{1}{1!} + \frac{z}{2!} + \frac{z^2}{3!} + \cdots\Big) \Big(\frac{B_0}{0!} + \frac{B_1}{1!}\, z + \frac{B_2}{2!}\, z^2 + \cdots\Big) = 1, \qquad |z| < 2\pi. \tag{1} \]
Comparing the coefficients we see that $B_0 = 1$ and
\[ \frac{1}{n!} \frac{B_0}{0!} + \frac{1}{(n-1)!} \frac{B_1}{1!} + \frac{1}{(n-2)!} \frac{B_2}{2!} + \cdots + \frac{1}{1!} \frac{B_{n-1}}{(n-1)!} = 0 \]
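if $n > 1$. Multiplying both sides by $n!$ we obtain
\[ \binom{n}{0} B_0 + \binom{n}{1} B_1 + \binom{n}{2} B_2 + \cdots + \binom{n}{n-1} B_{n-1} = 0. \tag{2} \]
In particular,
\[ 0 = 1 + 2B_1, \]
\[ 0 = 1 + 3B_1 + 3B_2, \]
\[ 0 = 1 + 4B_1 + 6B_2 + 4B_3, \]
\[ 0 = 1 + 5B_1 + 10B_2 + 10B_3 + 5B_4 \]
and hence
\[ B_1 = -\frac{1}{2}, \quad B_2 = \frac{1}{6}, \quad B_3 = 0, \quad B_4 = -\frac{1}{30}. \]
Using that $B_1 = -\frac{1}{2}$, relation (1) gives
\[ 1 + \frac{B_2}{2!}\, z^2 + \cdots = \frac{z}{e^z - 1} + \frac{z}{2} = \frac{z}{2} \cdot \frac{e^z + 1}{e^z - 1} = \frac{z}{2} \cdot \frac{e^{z/2} + e^{-z/2}}{e^{z/2} - e^{-z/2}}. \tag{3} \]
The function on the right-hand side is even, showing that $B_{2k+1} = 0$ for all $k \in \mathbb{N}$.

Theorem C.4.12. For all $n, N \in \mathbb{N}$ we have
\[ \sum_{j=1}^{2N} \frac{B_{2j}}{2j(2j-1)\, n^{2j-1}} < \log \frac{n!}{(n/e)^n \sqrt{2\pi n}} < \sum_{j=1}^{2N+1} \frac{B_{2j}}{2j(2j-1)\, n^{2j-1}}. \]
Proof. From equation (C.4.11.3) we see that
\[ \frac{1}{e^z - 1} - \frac{1}{z} + \frac{1}{2} = \sum_{j=1}^{\infty} \frac{B_{2j}}{(2j)!}\, z^{2j-1}. \]
By Problem 154 in Part I, Chapter 4 of [44], the series on the right-hand side has the so-called enveloping property, i.e.,
\[ \sum_{j=1}^{2N} \frac{B_{2j}}{(2j)!}\, t^{2j-1} < \frac{1}{e^t - 1} - \frac{1}{t} + \frac{1}{2} < \sum_{j=1}^{2N+1} \frac{B_{2j}}{(2j)!}\, t^{2j-1}, \qquad 0 < t < \infty. \]
Using the equation $j! = \int_0^\infty t^j e^{-t}\,dt$ and the definition of $\theta$ we immediately obtain the assertion.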
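The recursion (C.4.11.2) and the bounds of Theorem C.4.12 lend themselves to a short numerical illustration. The sketch below is not part of the original text: the function names are assumptions, exact rational arithmetic from the standard `fractions` module is used for the Bernoulli numbers, and `math.lgamma` supplies $\log n!$.

```python
from fractions import Fraction
from math import comb, lgamma, log, pi


def bernoulli_numbers(count):
    """Return [B_0, ..., B_{count-1}] from the recursion
    sum_{k=0}^{n-1} C(n, k) B_k = 0 for n > 1, with B_0 = 1."""
    numbers = [Fraction(1)]
    for m in range(1, count):
        n = m + 1
        partial = sum(comb(n, k) * numbers[k] for k in range(m))
        numbers.append(-partial / comb(n, m))  # comb(n, n - 1) == n
    return numbers


def stirling_log_ratio(n):
    """log( n! / ((n/e)^n * sqrt(2 pi n)) )."""
    return lgamma(n + 1) - (n * log(n) - n + 0.5 * log(2 * pi * n))


def stirling_partial_sum(n, terms, numbers):
    """sum_{j=1}^{terms} B_{2j} / (2j (2j-1) n^{2j-1})."""
    return sum(float(numbers[2 * j]) / (2 * j * (2 * j - 1) * n ** (2 * j - 1))
               for j in range(1, terms + 1))
```

For example, with `b = bernoulli_numbers(8)` and `n = 3`, the value `stirling_log_ratio(3)` lies between `stirling_partial_sum(3, 2, b)` and `stirling_partial_sum(3, 3, b)`, which is the case $N = 1$ of the theorem.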
C.5 Bessel functions

Definition C.5.1. For $\nu \ge 0$ we define$^{13}$ the Bessel function $J_\nu$ by
\[ J_\nu(x) = \frac{(x/2)^\nu}{\sqrt{\pi}\,\Gamma(\nu + \frac{1}{2})} \int_{-1}^{1} (1 - t^2)^{\nu - 1/2} e^{itx}\,dt, \qquad x \ge 0. \tag{1} \]
For $x = \nu = 0$ the expression $(x/2)^\nu$ is equal to 1 by definition. Using that $\Gamma(1/2) = \sqrt{\pi}$ we see that
\[ J_0(0) = \frac{1}{\pi} \int_{-1}^{1} \frac{1}{\sqrt{1 - t^2}}\,dt = \frac{1}{\pi} \arcsin t\,\Big|_{-1}^{1} = 1 \]
while $J_\nu(0) = 0$ if $\nu > 0$. If $\nu = 1/2$, then the integral above can be easily evaluated. We obtain:
\[ J_{1/2}(x) = \sqrt{\frac{2}{\pi x}}\, \sin x. \]
Note that $(1 - t^2)^{\nu - 1/2}$ is even in $t$, hence one can replace $e^{itx}$ by $\cos tx$ in Definition C.5.1.

Lemma C.5.2. The function $J_\nu$ is continuous and
\[ |J_\nu(x)| \le C_\nu x^\nu, \qquad x \ge 0 \]
where
\[ C_\nu = \frac{1}{\sqrt{\pi}\,\Gamma(\nu + \frac{1}{2})\, 2^\nu} \int_{-1}^{1} (1 - t^2)^{\nu - 1/2}\,dt. \]
Proof. The continuity follows from the fact that the integral in Definition C.5.1 represents a characteristic function, up to a constant factor. The second statement is obvious.

The substitution $t = \cos\varphi$ in Definition C.5.1 yields:

Lemma C.5.3. For $\nu \ge 0$ we have
\[ J_\nu(x) = \frac{(x/2)^\nu}{\sqrt{\pi}\,\Gamma(\nu + \frac{1}{2})} \int_0^\pi e^{ix\cos\varphi} \sin^{2\nu}\varphi\,d\varphi, \qquad x \ge 0. \]
$^{13}$ We will extend the definition of $J_\nu$ for all $\nu \in \mathbb{R}$ in Remark C.5.6. Note that the extension is possible for all complex $\nu$ but we do not need it in this book. The functions $J_0$ and $J_1$ are shown in Figure C.3.
Figure C.3. The Bessel functions $J_0$ and $J_1$ from Definition C.5.1.
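Theorem C.5.4. The formula
\[ J_\nu(x) = \Big(\frac{x}{2}\Big)^\nu \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,\Gamma(k + \nu + 1)} \Big(\frac{x}{2}\Big)^{2k} \]
holds for all $\nu \ge 0$ and $x \ge 0$.

Proof. Replacing $e^{itx}$ by its power series expansion in Definition C.5.1 we see that $J_\nu(x)$ is equal to
\[ \frac{(x/2)^\nu}{\sqrt{\pi}\,\Gamma(\nu + \frac{1}{2})} \sum_{k=0}^{\infty} \frac{1}{(2k)!} \int_{-1}^{1} (ixt)^{2k} (1 - t^2)^{\nu - 1/2}\,dt. \]
Proposition C.4.4 implies
\[ \int_{-1}^{1} t^{2k} (1 - t^2)^{\nu - 1/2}\,dt = \frac{\Gamma(k + 1/2)\,\Gamma(\nu + 1/2)}{\Gamma(k + \nu + 1)} \]
so
\[ J_\nu(x) = \frac{(x/2)^\nu}{\sqrt{\pi}\,\Gamma(\nu + \frac{1}{2})} \sum_{k=0}^{\infty} \frac{(ix)^{2k}}{(2k)!} \cdot \frac{\Gamma(k + 1/2)\,\Gamma(\nu + 1/2)}{\Gamma(k + \nu + 1)}. \]
Using $(2k)! = \Gamma(2k + 1)$, $\sqrt{\pi} = \Gamma(1/2)$, and the duplication formula (cf. Proposition C.4.5), which implies
\[ 2^{2k}\,\Gamma(k + 1/2) = \frac{\Gamma(1/2)\,\Gamma(2k + 1)}{\Gamma(k + 1)} \]
we obtain the assertion.

Proposition C.5.5. We have
\[ \frac{d}{dx}\big(x^\nu J_\nu(x)\big) = x^\nu J_{\nu-1}(x), \qquad \nu \ge 1,\ x \ge 0 \]
and
\[ \frac{d}{dx}\big(x^{-\nu} J_\nu(x)\big) = -x^{-\nu} J_{\nu+1}(x), \qquad \nu \ge 0,\ x > 0. \]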
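Proof. Applying Theorem C.5.4 we obtain
\[ \frac{d}{dx}\big(x^\nu J_\nu(x)\big) = \frac{d}{dx} \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,\Gamma(k + \nu + 1)} \cdot \frac{x^{2k + 2\nu}}{2^{2k + \nu}} = \sum_{k=0}^{\infty} \frac{(-1)^k (2k + 2\nu)}{k!\,\Gamma(k + \nu + 1)} \cdot \frac{x^{2k + 2\nu - 1}}{2^{2k + \nu}} \]
\[ = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,\Gamma(k + \nu)} \cdot \frac{x^{2k + 2\nu - 1}}{2^{2k + \nu - 1}} = x^\nu J_{\nu-1}(x). \]
The second equation can be proved in the same way:
\[ \frac{d}{dx}\big(x^{-\nu} J_\nu(x)\big) = \frac{d}{dx} \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,\Gamma(k + \nu + 1)} \cdot \frac{x^{2k}}{2^{2k + \nu}} = \sum_{k=1}^{\infty} \frac{(-1)^k\, 2k}{k!\,\Gamma(k + \nu + 1)} \cdot \frac{x^{2k - 1}}{2^{2k + \nu}} \]
\[ = \sum_{k=0}^{\infty} \frac{(-1)^{k+1} (2k + 2)}{(k + 1)!\,\Gamma(k + \nu + 2)} \cdot \frac{x^{2k + 1}}{2^{2k + 2 + \nu}} = -\sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,\Gamma(k + \nu + 2)} \cdot \frac{x^{2k + 1}}{2^{2k + 1 + \nu}} = -x^{-\nu} J_{\nu+1}(x). \]

Remark C.5.6. We use the first equation in Proposition C.5.5 to define $J_\nu(x)$ recursively for all $\nu \in \mathbb{R}$ and $x > 0$. Doing so, both equations in Proposition C.5.5 hold for all such $\nu$ and $x$. Note that the first equation can also be written in the form
\[ \Big(\frac{d}{dx} + \frac{\nu}{x}\Big) J_\nu(x) = J_{\nu-1}(x), \qquad x > 0. \tag{1} \]
From $J_{1/2}(x) = \sqrt{\frac{2}{\pi x}}\, \sin x$ we obtain
\[ J_{-1/2}(x) = \sqrt{\frac{2}{\pi x}}\, \cos x, \qquad x > 0. \]
The second equation in Proposition C.5.5 can also be written in the form
\[ \Big(\frac{d}{dx} - \frac{\nu}{x}\Big) J_\nu(x) = -J_{\nu+1}(x), \qquad x > 0. \tag{2} \]
Putting together (1) and (2) produces the equation
\[ \Big(\frac{d}{dx} - \frac{\nu - 1}{x}\Big) \Big(\frac{d}{dx} + \frac{\nu}{x}\Big) J_\nu(x) = -J_\nu(x) \]
which is equivalent to
\[ \Big(\frac{d^2}{dx^2} + \frac{1}{x} \frac{d}{dx} + 1 - \frac{\nu^2}{x^2}\Big) J_\nu(x) = 0 \]
known as Bessel’s equation. Adding and subtracting (1) and (2) we obtain the identities
\[ 2J_\nu'(x) = J_{\nu-1}(x) - J_{\nu+1}(x), \]
\[ \frac{2\nu}{x}\, J_\nu(x) = J_{\nu-1}(x) + J_{\nu+1}(x). \]

Definition C.5.7. The modified Bessel function of the second kind$^{14}$ of index $\nu \in \mathbb{C}$ is defined by
\[ K_\nu(z) = \int_0^\infty e^{-z\cosh t} \cosh(\nu t)\,dt, \qquad z \in \mathbb{C},\ \operatorname{Re} z > 0. \]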
It follows from the definition that $K_\nu$ is continuous, $K_\nu = K_{-\nu}$ if $\nu \in \mathbb{R}$, and $K_\nu(r) > 0$ if $\nu \in \mathbb{R}$ and $r > 0$. In this book we need only the following simple lemma.

Lemma C.5.8. For all $b \in \mathbb{R}$ and $a > 0$ we have
\[ K_b(r) = \frac{1}{2a^b} \int_0^\infty e^{-\frac{r}{2}\left(\frac{a}{u} + \frac{u}{a}\right)} u^{b-1}\,du, \qquad r > 0. \]
$^{14}$ This function has also been called by the now-rare name modified Bessel function of the third kind. The function $K_1$ is shown in Figure C.4.

Figure C.4. The modified Bessel function $K_1$ from Definition C.5.7.

Proof. Using the substitution $u = ae^t$ we obtain
\[ \frac{1}{2a^b} \int_0^\infty e^{-\frac{r}{2}\left(\frac{a}{u} + \frac{u}{a}\right)} u^{b-1}\,du = \frac{1}{2} \int_{-\infty}^{\infty} e^{-\frac{r}{2}(e^t + e^{-t})} e^{bt}\,dt = \frac{1}{2} \int_{-\infty}^{\infty} e^{-r\cosh t} e^{bt}\,dt = K_b(r). \]
Note that setting $a = r/2$ in the previous lemma leads to the inequality
\[ K_b(r) = 2^{b-1} r^{-b} \int_0^\infty e^{-u} e^{-r^2/(4u)} u^{b-1}\,du \le \Gamma(b)\, 2^{b-1} r^{-b}, \qquad r, b > 0. \]
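The inequality obtained from Lemma C.5.8 can be checked numerically. The sketch below is illustrative: the function names, the truncation point `t_max` and the trapezoidal rule are assumptions. It approximates $K_\nu(r)$ directly from the integral in Definition C.5.7 and compares it with the bound $\Gamma(b)\,2^{b-1} r^{-b}$; the well-known closed form $K_{1/2}(r) = \sqrt{\pi/(2r)}\,e^{-r}$, which is not stated in the text, serves as an independent check.

```python
import math


def bessel_k(nu, r, t_max=12.0, steps=12000):
    """Trapezoidal approximation of
    K_nu(r) = int_0^inf exp(-r cosh t) cosh(nu t) dt  (real nu, r > 0).
    The integrand is negligible beyond t_max for moderate r."""
    h = t_max / steps

    def integrand(t):
        return math.exp(-r * math.cosh(t)) * math.cosh(nu * t)

    total = 0.5 * (integrand(0.0) + integrand(t_max))
    for k in range(1, steps):
        total += integrand(k * h)
    return total * h


def gamma_bound(b, r):
    """Right-hand side Gamma(b) * 2^(b-1) * r^(-b) of the inequality."""
    return math.gamma(b) * 2 ** (b - 1) * r ** (-b)
```

The bound is fairly tight for small $r$ and becomes crude for large $r$, where $K_b(r)$ decays exponentially.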
C.6 The Mellin transform

C.6.1. We consider $G = (0, \infty)$ as a group with multiplication as operation. The mapping $\varphi : r \mapsto \ln(r)$, $r \in G$, is then an algebraical and topological isomorphism between $G$ and the additive group $\mathbb{R}$ of all real numbers. Using this isomorphism we see that $\chi$ is a continuous character of $G$ if and only if it has the form $\chi(r) = r^{ix}$ with some $x \in \mathbb{R}$. Denoting by $\mu$ the image of the Lebesgue measure by the mapping $\varphi^{-1}$, we have
\[ \mu((a, b)) = \ln(b) - \ln(a), \qquad 0 < a \le b < \infty. \]
More generally,
\[ \int_0^\infty f(s)\,d\mu(s) = \int_0^\infty \frac{f(s)}{s}\,ds, \qquad f \in L^1(\mu). \]
Since $\mu$ is invariant under translations we have
\[ \int_0^\infty f(rs)\,d\mu(s) = \int_0^\infty f(s)\,d\mu(s), \qquad f \in L^1(\mu). \]
The multiplicative convolution on $G$, which corresponds to the convolution on $\mathbb{R}$, is given by
\[ h(r) = \int_0^\infty f(r/s)\, g(s)\,d\mu(s), \qquad r \in G,\ f, g \in L^2(\mu). \]
Definition C.6.2. Let $f$ be a Lebesgue measurable, complex-valued function on $(0, \infty)$. For all $z \in \mathbb{C}$ such that $\int_0^\infty |f(t)\, t^{z-1}|\,dt < \infty$ we define the Mellin transform $Mf$ of $f$ by
\[ Mf(z) := \int_0^\infty f(s)\, s^{z-1}\,ds = \int_0^\infty f(s)\, s^z\,d\mu(s). \]

Lemma C.6.3. If $h$ is the multiplicative convolution of $f$ and $g$, then
\[ Mh(z) = Mf(z) \cdot Mg(z) \]
for all $z \in \mathbb{C}$ for which the Mellin transforms $Mf(z)$ and $Mg(z)$ exist.

Proof. Indeed, using that $\mu$ is invariant under multiplications, we obtain
\[ Mf(z)\, Mg(z) = \int_0^\infty f(r)\, r^z\,d\mu(r) \int_0^\infty g(s)\, s^z\,d\mu(s) = \int_0^\infty \int_0^\infty f(r)\, g(s)\, (rs)^z\,d\mu(r)\,d\mu(s) \]
\[ = \int_0^\infty \int_0^\infty f(r/s)\, g(s)\, r^z\,d\mu(r)\,d\mu(s) = Mh(z). \]

Remark C.6.4. Substituting $s = e^t$ and setting $g(t) := f(e^t)$ in Definition C.6.2 we obtain
\[ Mf(z) = \int_{-\infty}^{\infty} g(t)\, e^{tz}\,dt. \tag{1} \]
Setting here $z = iy$, $y \in \mathbb{R}$, we obtain
\[ Mf(iy) = \int_{-\infty}^{\infty} e^{iyt} g(t)\,dt = \hat g(y), \qquad y \in \mathbb{R}. \]
In the same way as in Section 3.3 we see that there exist $a, b \in [-\infty, \infty]$, $a \le b$, such that the integral in (1) exists if $a < \operatorname{Re} z < b$ and $Mf$ is holomorphic in this strip.

Example C.6.5. If $f(s) = e^{-s}$, then
\[ Mf(z) = \int_0^\infty e^{-t}\, t^{z-1}\,dt. \]
In this case, $Mf$ is holomorphic in the region $\operatorname{Re}(z) > 0$ and $Mf$ is equal to the Gamma function. Let $f(s) = 1/(1 + s)$. Using the substitution $s = x/(1 - x)$ we see that
\[ Mf(z) = \int_0^1 x^{z-1} (1 - x)^{-z}\,dx. \]
In this case, $Mf$ is holomorphic in the region $\{z \in \mathbb{C} : 0 < \operatorname{Re}(z) < 1\}$ and $Mf(z) = B(z, 1 - z)$.
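The second part of Example C.6.5 can be checked numerically through formula (1) of Remark C.6.4, which turns the Mellin transform of $f(s) = 1/(1+s)$ into $\int_{-\infty}^{\infty} e^{tz}/(1 + e^t)\,dt$. The sketch below is illustrative and restricted to real $z \in (0, 1)$; the function name, the cut-off `t_max` and the trapezoidal rule are assumptions. By Proposition C.4.4, $B(z, 1 - z) = \Gamma(z)\Gamma(1 - z)$.

```python
import math


def mellin_of_reciprocal(z, t_max=150.0, step=0.01):
    """Trapezoidal approximation of int_{-t_max}^{t_max} e^{t z} / (1 + e^t) dt,
    i.e. of the Mellin transform of s -> 1 / (1 + s), for real 0 < z < 1."""
    n = int(round(2 * t_max / step))
    total = 0.0
    for k in range(n + 1):
        t = -t_max + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(t * z) / (1.0 + math.exp(t))
    return total * step
```

For $z = 1/2$ the integrand equals $1/(2\cosh(t/2))$ and the integral is $\pi = \Gamma(1/2)^2$.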
C.7 The Laplace transform

Throughout this section $\mu$ denotes a nonnegative measure on $[0, \infty)$ such that the function $x \mapsto e^{-tx}$ is $\mu$-integrable for all $t > 0$.

Definition C.7.1. The function $L\mu$ defined by
\[ L\mu(t) = \int_0^\infty e^{-tx}\,d\mu(x), \qquad t > 0 \]
is called the Laplace transform of $\mu$.$^{15}$

Example C.7.2. If $d\mu(x) = x^a\,dx$, $a > -1$, then
\[ L\mu(t) = \int_0^\infty x^a e^{-tx}\,dx = \frac{1}{t^{a+1}} \int_0^\infty y^a e^{-y}\,dy = \frac{\Gamma(a + 1)}{t^{a+1}}, \qquad t > 0. \]

Lemma C.7.3. The measure $\mu$ is finite if and only if its Laplace transform is bounded.
$^{15}$ A classical reference for Laplace transforms is [60].
Proof. If $\mu$ is finite, then
\[ L\mu(t) = \int_0^\infty e^{-tx}\,d\mu(x) \le \int_0^\infty 1\,d\mu(x) = \mu([0, \infty)) < \infty, \qquad t > 0. \]
Assume that $L\mu \le K$ for some $K \in [0, \infty)$. By monotone convergence we have
\[ \lim_{t \to 0} L\mu(t) = \lim_{t \to 0} \int_0^\infty e^{-tx}\,d\mu(x) = \int_0^\infty 1\,d\mu(x) = \mu([0, \infty)) \le K \]
i.e., $\mu$ is finite.

Lemma C.7.4. The function $L\mu$ is infinitely differentiable and
\[ (L\mu)^{(k)}(t) = (-1)^k \int_0^\infty x^k e^{-tx}\,d\mu(x), \qquad t > 0 \]
for all $k \in \mathbb{N}_0$.

Proof. The lemma follows by successive differentiation which can be justified since
\[ x^k e^{-tx} = x^k e^{-tx/2} e^{-tx/2} \le K_{k,t}\, e^{-tx/2}, \qquad x > 0 \]
where $K_{k,t}$ is uniformly bounded for all $t$ from a fixed compact subset of $(0, \infty)$.

Theorem C.7.5. The measure $\mu$ is uniquely determined by its Laplace transform.

Proof. Writing $d\nu(x) := e^{-x}\,d\mu(x)$, the measure $\nu$ is finite. Since $L\nu(t) = L\mu(t + 1)$, it suffices to show that $\nu$ is uniquely determined by its Laplace transform. By Lemma C.7.4 we have
\[ (-1)^k (L\nu)^{(k)}(t) = \int_0^\infty x^k e^{-tx}\,d\nu(x), \qquad t > 0,\ k \in \mathbb{N}_0 \]
and hence
\[ \sum_{0 \le k \le ta} (-1)^k (L\nu)^{(k)}(t)\, \frac{t^k}{k!} = \int_0^\infty e^{-tx} \sum_{0 \le k \le ta} \frac{(tx)^k}{k!}\,d\nu(x), \qquad a, t > 0. \]
The integrand on the right-hand side is at most 1. Using this, dominated convergence and Lemma B.1.18 show that
\[ \lim_{t \to \infty} \sum_{0 \le k \le ta} (-1)^k (L\nu)^{(k)}(t)\, \frac{t^k}{k!} = \int_0^\infty 1_{[0,a]}(x)\,d\nu(x) = \nu([0, a]). \]
This relation shows that $\nu$ is uniquely determined by $L\nu$.
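The inversion relation at the end of the proof can be tried out on a simple example. The sketch below is illustrative, and the choice of measure is an assumption made for the illustration: for $d\nu(x) = e^{-x}\,dx$ one has $L\nu(t) = 1/(1+t)$ and $(-1)^k (L\nu)^{(k)}(t) = k!/(1+t)^{k+1}$, so the $k$-th summand is $t^k/(1+t)^{k+1}$ and the limit should be $\nu([0,a]) = 1 - e^{-a}$.

```python
import math


def inversion_partial_sum(t, a):
    """sum_{0 <= k <= t a} (-1)^k (L nu)^{(k)}(t) t^k / k!  for
    d nu(x) = exp(-x) dx, where the k-th term equals t^k / (1 + t)^(k + 1)."""
    ratio = t / (1.0 + t)
    term = 1.0 / (1.0 + t)  # term for k = 0
    total = 0.0
    for _ in range(int(math.floor(t * a)) + 1):
        total += term
        term *= ratio
    return total
```

For large `t` the value approaches $1 - e^{-a}$; the discrepancy is of order $1/t$.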
C.8 Existence of continuous logarithms

Definition C.8.1. Let $f$ be a complex-valued function defined on some topological space $X$. A continuous complex-valued function $F$ on $X$ is called a continuous logarithm of $f$ if
\[ f(x) = e^{F(x)}, \qquad x \in X. \]
By a continuous argument of $f$ we mean any real-valued continuous function $\theta$ on $X$ such that
\[ f(x) = |f(x)|\, e^{i\theta(x)}, \qquad x \in X. \]

Remark C.8.2. Note that if $F$ is a continuous logarithm of $f$, then $\theta := \operatorname{Im} F$ is a continuous argument of $f$. It is clear that functions which have a continuous logarithm cannot have zeros. If $X = \mathbb{T} \subset \mathbb{C}$ and $f(z) = z$, $z \in \mathbb{T}$, then $f$ is continuous and has no zeros. However, $f$ does not have a continuous logarithm. To see this, assume that
\[ z = e^{i\theta(z)}, \qquad z \in \mathbb{T} \]
with some continuous function $\theta : \mathbb{T} \to \mathbb{R}$. In particular, $\theta$ is injective and hence $\theta(z) - \theta(-z) \ne 0$. Consequently,
\[ g(z) := \frac{\theta(z) - \theta(-z)}{|\theta(z) - \theta(-z)|}, \qquad z \in \mathbb{T} \]
defines a continuous real-valued function on $\mathbb{T}$. Since $|g(z)| = 1$ we must have $g(\mathbb{T}) \subset \{-1, 1\}$. But $g(-z) = -g(z)$ and therefore $g(\mathbb{T}) = \{-1, 1\}$. This is a contradiction, since $\mathbb{T}$ is connected while $\{-1, 1\}$ is disconnected.

Lemma C.8.3. Let $X$ be a connected space and $f : X \to \mathbb{C} \setminus \{0\}$ be a continuous function. If $\theta_1$ and $\theta_2$ are continuous arguments of $f$, then there exists $k \in \mathbb{Z}$ such that
\[ \theta_2(x) = \theta_1(x) + 2k\pi, \qquad x \in X. \]
Proof. From
\[ f(x) = |f(x)|\, e^{i\theta_1(x)} = |f(x)|\, e^{i\theta_2(x)} \]
we see that
\[ \theta_2(x) = \theta_1(x) + 2k(x)\pi \]
with some function $k : X \to \mathbb{Z}$. Since $\theta_1$ and $\theta_2$ are continuous, so is $k$. The continuous image of connected sets is connected, therefore $k(X)$ consists of a single integer.

Lemma C.8.4. Every continuous function $f : X \to \mathbb{T}$ such that $f(X) \ne \mathbb{T}$ has a continuous argument.
Proof. Choose ˛ 2 R such that ei˛ 2 T n f .X /. The mapping t 7! eit ;
t 2 .˛; ˛ C 2/
is continuous, injective and it maps .˛; ˛ C 2/ onto T n fei˛ g. Denoting by inverse mapping we have ˚ z D ei .z/ ; z 2 T n ei˛ : Thus, the function WD
the
.f / is a continuous argument of f .
Lemma C.8.5. Let $f_1$ and $f_2$ be continuous functions on $X$ with values in $\mathbb{T}$ such that $f_1(x) \neq -f_2(x)$ for all $x \in X$. If $f_1$ has a continuous argument, then so does $f_2$.

Proof. Let $\theta_1$ be a continuous argument of $f_1$. The function $f := f_1/f_2$ is continuous, maps $X$ into $\mathbb{T}$ and $f(x) \neq -1$ for all $x$. By Lemma C.8.4 it has a continuous argument $\theta$. It follows from the definition of $f$ that $\theta_1 - \theta$ is a continuous argument of $f_2$.

Proposition C.8.6. Let $X$ be a compact topological space and $f : X \times [0, 1] \to \mathbb{C} \setminus \{0\}$ be a continuous function. If the restriction of $f$ to $X \times \{0\}$ has a continuous argument, then so does the restriction of $f$ to $X \times \{1\}$.

Proof. Replacing $f$ by $f/|f|$ we may assume that $|f| = 1$. The function $f$ is uniformly continuous on the compact set $X \times [0, 1]$. Therefore, there exists $n \in \mathbb{N}$ such that
$$|f(x, s) - f(x, t)| \le 1, \quad x \in X \tag{1}$$
whenever $|s - t| \le 1/n$. For $j = 0, 1, \dots, n$ we define the function $f_j$ by
$$f_j(x) := f(x, j/n), \quad x \in X.$$
Inequality (1) shows that $|f_j - f_{j+1}| \le 1$. Since $|f_j| = |f_{j+1}| = 1$ we conclude that $f_j(x) \neq -f_{j+1}(x)$ for all $x$. The statement of the proposition follows by applying Lemma C.8.5 $n$ times.

Theorem C.8.7. Let $f : \mathbb{R}^d \to \mathbb{C} \setminus \{0\}$ be a continuous function such that $f(0) > 0$. Then $f$ has a unique continuous argument $\theta$ such that $\theta(0) = 0$.$^{16}$

Proof. Replacing $f$ by $f/|f|$ we may assume that $|f| = 1$ and $f(0) = 1$. Denote by $f_n$ the restriction of $f$ to the compact set $B^c_d(n)$, $n \in \mathbb{N}$. We claim that $f_n$ has a continuous argument $\theta_n$ such that $\theta_n(0) = 0$. To show this define the function $g_n$ by
$$g_n(x, t) := f_n(tx), \quad (x, t) \in B^c_d(n) \times [0, 1].$$

$^{16}$ See also Lemma C.1.11.
We have $g_n(x, 0) = 1$ and $g_n(x, 1) = f_n(x)$. Hence, the existence of a continuous argument for $f_n$ follows from Proposition C.8.6. Using Lemma C.8.3 and the fact that $0 = \theta_n(0) = \theta_{n+1}(0)$ we see that $\theta_n(x) = \theta_{n+1}(x)$, $x \in B^c_d(n)$. Let $\theta$ be the unique function on $\mathbb{R}^d$ such that $\theta(x) = \theta_n(x)$, $x \in B^c_d(n)$. It is clear that $\theta$ is a continuous argument of $f$ while the uniqueness follows from Lemma C.8.3.
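The construction in Proposition C.8.6 and Theorem C.8.7 can be imitated numerically: one walks along the segment from $0$ to $x$ in small steps and adds up the small phase increments between consecutive values. The following Python sketch is only an illustration under stated assumptions. The names `continuous_argument` and `f`, the step count and the test function are hypothetical choices, and the step count is assumed large enough that consecutive values are never antipodal.

```python
import cmath
import math

def continuous_argument(f, x, steps=1000):
    """Approximate the continuous argument theta(x) with theta(0) = 0.

    Assumes f is continuous and nonvanishing with f(0) > 0, and that
    `steps` is large enough that consecutive values of f/|f| along the
    segment from 0 to x are never antipodal (compare Lemma C.8.5).
    """
    theta = 0.0
    prev = f(tuple(0.0 for _ in x))
    for j in range(1, steps + 1):
        t = j / steps
        cur = f(tuple(t * xi for xi in x))
        # phase(cur / prev) lies in (-pi, pi]; summing the increments
        # follows the argument continuously along the segment
        theta += cmath.phase(cur / prev)
        prev = cur
    return theta

def f(x):
    # hypothetical example on R^2 with continuous argument 3*x1 + x2**2
    return (2.0 + math.cos(x[0])) * cmath.exp(1j * (3.0 * x[0] + x[1] ** 2))

point = (2.0 * math.pi, 1.0)
theta_end = continuous_argument(f, point)
principal = cmath.phase(f(point))
```

Here `theta_end` approximates $6\pi + 1$, whereas `principal` is the principal value of the argument, which differs from it by a multiple of $2\pi$.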
C.9 Solutions of certain functional equations

Recall that $B^o(\delta) = B^o_d(\delta)$ denotes the open ball $\{t \in \mathbb{R}^d : \|t\| < \delta\}$.

Lemma C.9.1 (Cauchy's functional equation). Let $f$ be a continuous real-valued function on $B^o(\delta)$ such that
$$f(t + s) = f(t) + f(s), \quad t, s, t + s \in B^o(\delta). \tag{1}$$
Then there exists $c \in \mathbb{R}^d$ such that $f(t) = (c, t)$ for all $t \in B^o(\delta)$.

Proof. Assume first that $d = 1$. From $f(0) = f(0 + 0) = f(0) + f(0)$ we get $f(0) = 0$, and hence $f(-t) = -f(t)$. Repeatedly applying equation (1) gives
$$f(t) = f(t/n + \dots + t/n) = n f(t/n)$$
for all $n \in \mathbb{N}$ and for all $t$. Now let $r = m/n$ be a rational number such that $rt \in B^o(\delta)$. Then
$$f(rt) = f(mt/n) = m f(t/n) = \frac{m}{n} f(t) = r f(t).$$
By continuity, the equation $f(rt) = r f(t)$ holds for all $r \in \mathbb{R}$ such that $rt, t \in B^o(\delta)$. We replace in this equation $t$ by $t_0 \neq 0$ and $r$ by $t/t_0$ and obtain
$$f(t) = \frac{f(t_0)}{t_0}\, t, \quad t \in B^o(\delta).$$
To prove the general case, let $e_1, \dots, e_d$ be the standard basis of $\mathbb{R}^d$. Then
$$f(t) = f(t_1 e_1 + \dots + t_d e_d) = f_1(t_1) + \dots + f_d(t_d)$$
where $f_j(t_j) = f(t_j e_j)$. We now apply the first part of the proof to the functions $f_j$.

Theorem C.9.2. A continuous complex-valued function $\varphi \neq 0$ on $B^o(\delta)$ satisfies the equation
$$\varphi(t + s) = \varphi(t)\varphi(s), \quad t, s, t + s \in B^o(\delta) \tag{1}$$
if and only if there exists $c \in \mathbb{C}^d$ such that
$$\varphi(t) = e^{(c, t)}, \quad t \in B^o(\delta).$$

Proof. It is clear that any function of the above form satisfies the equation (1). To prove the other direction we consider only the case $d = 1$. The general case can be reduced to this one in the same way as in the proof of Lemma C.9.1. From $\varphi(t + 0) = \varphi(t)\varphi(0)$ we see that $\varphi(0) = 1$. Thus, $1 = \varphi(t - t) = \varphi(t)\varphi(-t)$ and therefore $\varphi(t) \neq 0$ for all $t$. The function $|\varphi|$ satisfies the same equation as $\varphi$. Applying Lemma C.9.1 to the logarithm of $|\varphi|$, we obtain $|\varphi(t)| = e^{rt}$ with some $r \in \mathbb{R}$. The function $\psi = \varphi/|\varphi|$ satisfies the same equation as $\varphi$. By continuity, there exists $\delta_0 > 0$ such that the real part of $\psi(t)$ is positive whenever $t \in (-\delta_0, \delta_0)$. Setting $f(t) = \arctan(\operatorname{Im} \psi(t) / \operatorname{Re} \psi(t))$ we have $\psi(t) = e^{if(t)}$, $t \in (-\delta_0, \delta_0)$. The function $f$ is continuous and satisfies equation (C.9.1.1). Applying Lemma C.9.1 and using $\varphi = |\varphi|\psi$ we conclude that
$$\varphi(t) = e^{ct}, \quad t \in (-\delta_0, \delta_0)$$
with some $c \in \mathbb{C}$. To show that this equation holds for arbitrary $t \in B^o(\delta)$, let $n \in \mathbb{N}$ be such that $t/n \in (-\delta_0, \delta_0)$. Repeated application of (1) shows that
$$\varphi(t) = \varphi(t/n + \dots + t/n) = \varphi(t/n)^n = e^{ct}.$$

Lemma C.9.3 (Pexider's functional equation). Let $f$, $g$ and $h$ be real-valued functions on $B^o(\delta)$ such that $f$ is continuous and
$$f(t + s) = g(t) + h(s), \quad t, s, t + s \in B^o(\delta).$$
Then there exist $a, b \in \mathbb{R}$ and $c \in \mathbb{R}^d$ such that $f(t) = (c, t) + a + b$, $g(t) = (c, t) + a$, $h(t) = (c, t) + b$ for all $t \in B^o(\delta)$.

Proof. Let $a = g(0)$ and $b = h(0)$. Setting $s = 0$ in the above equation we get $g(t) = f(t) - b$. Similarly, $h(s) = f(s) - a$. Thus, $f(t + s) = f(t) + f(s) - a - b$ and therefore the function $f - a - b$ satisfies Cauchy's functional equation. The assertion follows now from Lemma C.9.1.

Lemma C.9.4. Let $k \in \mathbb{N}_0$ and let $f$ be a continuous real-valued function on $B^o(\delta)$ such that for all $s \in B^o(\delta)$ the function
$$t \mapsto f(t + s) - f(t), \quad t, t + s \in B^o(\delta)$$
is a polynomial of degree at most $k$. Then $f$ is a polynomial of degree at most $k + 1$.

Proof. First, let $d = 1$. By assumption,
$$f(t + s) - f(t) = a_k(s) t^k + \dots + a_1(s) t + a_0(s), \quad t, s, t + s \in (-\delta, \delta) \tag{1}$$
where $a_0, \dots, a_k$ are real-valued functions. First we show that the functions $a_j$ are continuous. Let $s$ be fixed and choose mutually different $t_j$ such that $t_j, s + t_j \in (-\delta, \delta)$, $j = 0, \dots, k$. Replacing $t$ by $t_j$ in (1) we obtain $k + 1$ equations which can be written in the form $\mathbf{f} = V\mathbf{a}$, where $V$ is the Vandermonde matrix corresponding to the $t_j$'s and $\mathbf{f}$ and $\mathbf{a}$ are column vectors with coordinates $f(t_j + s) - f(t_j)$ and $a_j(s)$, respectively. Since $V$ is invertible, we see that each $a_j$ can be written as
$$a_j(s) = \sum_{i=0}^{k} r_i\, [f(t_i + s) - f(t_i)]$$
with some $r_i = r_i(t_0, \dots, t_k) \in \mathbb{R}$, from which the continuity of $a_j$ follows.

We now prove the lemma by induction on $k$. The case $k = 0$ follows from Lemma C.9.3 with $g = f$. Assume that $k > 0$ and that the statement is true for $0, 1, \dots, k - 1$. Replacing $s$ by $s + x$ in (1) gives
$$f(t + s + x) - f(t) = \sum_{j=0}^{k} a_j(s + x)\, t^j$$
while replacing $t$ by $t + x$ gives
$$f(t + s + x) - f(t + x) = \sum_{j=0}^{k} a_j(s)\, (t + x)^j.$$
These equations hold whenever $s, t, x, t + x, s + x, s + t + x \in (-\delta, \delta)$. Subtracting the second equation from the first one yields
$$f(t + x) - f(t) = \sum_{j=0}^{k} \left[ a_j(s + x)\, t^j - a_j(s)\, (t + x)^j \right].$$
On the other hand, (1) shows that
$$f(t + x) - f(t) = \sum_{j=0}^{k} a_j(x)\, t^j.$$
For fixed $s$ and $x$ these equations hold for all $t$ in a neighborhood of zero. Comparing the coefficients of $t^k$ we obtain
$$a_k(s + x) = a_k(s) + a_k(x), \quad s, x, s + x \in (-\delta, \delta).$$
Thus, in view of Lemma C.9.1, $a_k(s) = c\,s$ with some $c \in \mathbb{R}$. Define the function $g$ by
$$g(t) = f(t) - \frac{c}{k + 1}\, t^{k+1}.$$
Then
$$g(t + s) - g(t) = b_{k-1}(s)\, t^{k-1} + \dots + b_0(s)$$
with some functions $b_j$. By the induction assumption $g$ is a polynomial of degree at most $k$, completing the proof in the case $d = 1$.
Now let $d$ be arbitrary. By assumption,
$$f(t + s) - f(t) = \sum_{|\alpha| \le k} a_\alpha(s)\, t^\alpha, \quad t, s, t + s \in B^o(\delta) \tag{2}$$
with some real-valued functions $a_\alpha$. For fixed $t, s \in B^o(\delta)$ define the function $h$ by
$$h(x) = f(t + x(s - t)), \quad x \in I := \{r \in \mathbb{R} : t + r(s - t) \in B^o(\delta)\}.$$
Note that $I$ is an open interval containing $[0, 1]$. If $x, y, x + y \in I$, then
$$h(x + y) - h(x) = f(t + x(s - t) + y(s - t)) - f(t + x(s - t)) = \sum_{|\alpha| \le k} a_\alpha(y(s - t))\, (t + x(s - t))^\alpha.$$
The right-hand side is a polynomial in $x$ of degree at most $k$. By the first part of the proof, $h$ is a polynomial of degree at most $k + 1$.$^{17}$ Application of Theorem B.6.6 completes the proof.

Theorem C.9.5. Let $f_1, \dots, f_m$, $g$ and $h$ be continuous real-valued functions on $B^o(\delta)$. Further, let $A_i, B_i \in \mathbb{R}^{d \times d}$ be invertible matrices. If for some $\delta_0 \in (0, \delta]$ the equation
$$\sum_{i=1}^{m} f_i(A_i t + B_i s) = g(t) + h(s), \quad t, s \in B^o(\delta_0) \tag{1}$$
holds, then in some neighborhood of zero the functions $g$ and $h$ are polynomials of degree at most $m$. If the matrices $B_i A_i^{-1} - B_j A_j^{-1}$ are invertible for all $i \neq j$, then all functions $f_i$ are polynomials of degree at most $m$.

Proof. Without loss of generality we may assume that $A_i = I_d$ (the identity matrix) for all $i$. We prove the assertions by induction on $m$. The case $m = 1$ follows from Lemma C.9.3. Let $m > 1$ and assume that the assertions are true for $1, \dots, m - 1$. For $t$, $s$ and $x$ near to the origin we have
$$\sum_{i=1}^{m} f_i(t + (B_m - B_i)x + B_i s) = \sum_{i=1}^{m} f_i(t + B_m x + B_i(s - x)) = g(t + B_m x) + h(s - x).$$
Subtracting from this equation (1) with $A_i = I_d$ we obtain
$$\sum_{i=1}^{m-1} \left[ f_i(t + (B_m - B_i)x + B_i s) - f_i(t + B_i s) \right] = [g(t + B_m x) - g(t)] + [h(s - x) - h(s)].$$

$^{17}$ The first part remains valid if we replace $(-\delta, \delta)$ by the possibly nonsymmetric interval $I$.
We consider the functions $F_i(t) = f_i(t + (B_m - B_i)x) - f_i(t)$ and
$$G(t) = g(t + B_m x) - g(t), \quad H(s) = h(s - x) - h(s).$$
These functions satisfy the equation
$$\sum_{i=1}^{m-1} F_i(t + B_i s) = G(t) + H(s). \tag{2}$$
The induction assumption (concerning $g$ and $h$) and Lemma C.9.4 show that $g$ and $h$ are polynomials of degree at most $m$. If $B_m - B_i$ is nonsingular, then the induction assumption (concerning $f_i$) and Lemma C.9.4 show that $f_i$ is a polynomial of degree at most $m$.

Proposition C.9.6. If $f : \mathbb{R}^d \to \mathbb{C}$ is a polynomial of degree $k > 0$, then for all $s \in \mathbb{R}^d$ the function $t \mapsto f(t + s) - f(t)$ is a polynomial of degree at most $k - 1$. More generally, if the measure
$$\mu = \sum_{i=1}^{n} c_i \delta_{x_i}, \quad c \in \mathbb{R}^n,\ x_1, \dots, x_n \in \mathbb{R}^d$$
is such that $\hat{\mu}(0) = \sum_{i=1}^{n} c_i = 0$, then $f * \mu$ is a polynomial of degree at most $k - 1$.

Proof. It is easy to check that the statement is true for polynomials of the form
$$f(t) = \prod_{j=1}^{d} t_j^{\alpha_j}, \quad t \in \mathbb{R}^d,\ \alpha \in \mathbb{N}_0^d$$
from which the general statement follows.
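For $d = 1$ the degree drop in Proposition C.9.6 can be checked with exact integer arithmetic. The sketch below is only illustrative: the helper names `shift`, `degree` and `convolve_measure` are hypothetical, polynomials are stored as coefficient lists, and the convolution is assumed to be $(f * \mu)(t) = \sum_i c_i f(t - x_i)$.

```python
from math import comb

def shift(coeffs, s):
    """Coefficients of t -> f(t + s), where f(t) = sum(coeffs[j] * t**j)."""
    out = [0] * len(coeffs)
    for j, a in enumerate(coeffs):
        # binomial expansion of a * (t + s)**j
        for i in range(j + 1):
            out[i] += a * comb(j, i) * s ** (j - i)
    return out

def degree(coeffs):
    """Degree of the polynomial, with -1 for the zero polynomial."""
    nonzero = [j for j, a in enumerate(coeffs) if a != 0]
    return max(nonzero) if nonzero else -1

def convolve_measure(coeffs, weights, points):
    """Coefficients of t -> sum_i c_i f(t - x_i) for mu = sum_i c_i delta_{x_i}."""
    out = [0] * len(coeffs)
    for c, x in zip(weights, points):
        for j, a in enumerate(shift(coeffs, -x)):
            out[j] += c * a
    return out

f = [5, 0, -2, 1]                                     # f(t) = 5 - 2 t^2 + t^3
diff = [a - b for a, b in zip(shift(f, 4), f)]        # f(t + 4) - f(t)
second = convolve_measure(f, [1, -2, 1], [0, 1, 2])   # weights sum to zero
```

In this example `diff` has degree 2 and `second` has degree 1, both below the degree 3 of `f`.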
C.10 Linear independence of exponential functions

C.10.1. Let $P$ be a trigonometric polynomial of the form
$$P(t) = \sum_{j=1}^{n} c_j e^{i(t, x_j)}, \quad t \in \mathbb{R}^d$$
where $n \in \mathbb{N}$, $c \in \mathbb{C}^n$ and the $x_j$'s are pairwise distinct. Then $P$ can be regarded as the inverse Fourier transform of the measure
$$\mu = \sum_{j=1}^{n} c_j \delta_{x_j}.$$
Since $\mu$ is uniquely determined by its inverse Fourier transform, we conclude that $P = 0$ implies $\mu = 0$. In other words, the functions $t \mapsto e^{i(t, x_j)}$, $t \in \mathbb{R}^d$, are linearly independent (over $\mathbb{C}$). In this section we generalize this statement. We will use the notation
$$\chi_x(t) := e^{i(x, t)}, \quad x, t \in \mathbb{R}^d.$$

Theorem C.10.2. Suppose that $x_1, \dots, x_n \in \mathbb{R}^d$ are pairwise distinct and let $\emptyset \neq U \subseteq \mathbb{R}^d$ be open. Then the functions $\chi_{x_j}|_U$, $1 \le j \le n$, are linearly independent.

Proof. For $c \in \mathbb{C}^n$ write
$$P(t) := \sum_{j=1}^{n} c_j \chi_{x_j}(t), \quad t \in \mathbb{R}^d$$
and assume that $P(t) = 0$ for all $t \in U$. For each $t_0 \in U$ the function $\varphi$ defined by
$$\varphi(x) := P(x t_0), \quad x \in \mathbb{R}$$
is analytic on $\mathbb{R}$. Since $\varphi$ vanishes on an open interval containing $1$, we conclude that $\varphi = 0$. Consequently, $P(t) = 0$ for all $t \in \mathbb{R}^d$. The remarks in the preceding paragraph show that $c = 0$, from which the theorem follows.

Theorem C.10.3. Suppose that $x_1, \dots, x_n \in \mathbb{R}^d$, $d \ge 2$, are pairwise distinct and let $S \subset \mathbb{R}^d$ be a sphere of positive radius. Then the functions $\chi_{x_j}|_S$, $1 \le j \le n$, are linearly independent.$^{18}$

Proof. We may suppose that $S = S^{d-1}$. We have to show that the relation
$$\sum_{j=1}^{n} c_j \chi_{x_j}(t) = \sum_{j=1}^{n} c_j e^{i(t, x_j)} = 0, \quad t \in S^{d-1} \tag{1}$$
where $c_j \in \mathbb{C}$, implies that $c = 0$. Assume, on the contrary, that $c \neq 0$. Removing the zeros from the numbers $c_j$ we suppose that $c_j \neq 0$ for all $j$. The case $n = 1$ being trivial, let $n \ge 2$.

$^{18}$ The proof of this result is taken from [24].

Write
$$R := \max_{1 \le j \le n} \|x_j\| > 0$$
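and let $e_1, \dots, e_d$ be the standard basis of $\mathbb{R}^d$. We may assume that $x_1 = R e_1$. Denoting by $P$ the orthogonal projection onto $\operatorname{span}\{e_1, e_2\}$ we have
$$P x_j = r_j [\cos(\varphi_j)\, e_1 + \sin(\varphi_j)\, e_2]$$
where $r_j = \|P x_j\| \le \|x_j\| \le R$ and $0 \le \varphi_j < 2\pi$. Noting that $r_1 = R$ we may assume that for some $n_0 \in \mathbb{N}$ we have $r_j = R$ if $1 \le j \le n_0$ and $r_j < R$ if $j > n_0$. Then $\|P x_j\| = \|x_j\|$ and hence $P x_j = x_j$ for all $j \le n_0$. Since the $x_j$'s are pairwise distinct we may also assume that
$$0 = \varphi_1 < \varphi_2 < \dots < \varphi_{n_0} < 2\pi. \tag{2}$$
For all $\varphi \in \mathbb{R}$ we have $t := \cos(\varphi)\, e_1 + \sin(\varphi)\, e_2 \in S^{d-1}$. Putting this $t$ into equation (1) and using that $(t, x_j) = (P t, x_j) = (t, P x_j)$ we obtain
$$\sum_{j=1}^{n} c_j e^{i r_j \cos(\varphi - \varphi_j)} = 0, \quad \varphi \in \mathbb{R}.$$
It follows from Theorem C.1.8 that this equation holds in the whole complex plane:
$$\sum_{j=1}^{n} c_j e^{i r_j \cos(z - \varphi_j)} = 0, \quad z = x + iy \in \mathbb{C}. \tag{3}$$
Since
$$\cos(z - \varphi_j) = \frac{1}{2} \left[ e^{i(z - \varphi_j)} + e^{-i(z - \varphi_j)} \right]$$
we have
$$\left| e^{i r_j \cos(z - \varphi_j)} \right| = \exp\left( -\frac{r_j}{2} \operatorname{Im}\left[ e^{i(z - \varphi_j)} + e^{-i(z - \varphi_j)} \right] \right) = e^{r_j \sinh(y) \sin(x - \varphi_j)}.$$
Let $\psi_j(z) \in [0, 2\pi)$ be the argument of $e^{i r_j \cos(z - \varphi_j)}$. Then, by relation (3),
$$\sum_{j=1}^{n} c_j e^{r_j \sinh(y) \sin(x - \varphi_j)} e^{i\psi_j(z)} = 0.$$
Multiplying this equation by $e^{-R \sinh(y)}$, we obtain
$$\sum_{j=1}^{n_0} c_j e^{R \sinh(y) [\sin(x - \varphi_j) - 1]} e^{i\psi_j(z)} + \sum_{j=n_0+1}^{n} c_j e^{\sinh(y) [r_j \sin(x - \varphi_j) - R]} e^{i\psi_j(z)} = 0. \tag{4}$$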
Set $x := \varphi_1 + \frac{\pi}{2}$. Then $\sin(x - \varphi_j) - 1$ is equal to $0$ if $j = 1$ and it is negative if $1 < j \le n_0$, in view of (2). Since $r_j < R$ if $j > n_0$ we conclude that $r_j \sin(x - \varphi_j) - R$ is negative for all such $j$. Taking the limit $y \to \infty$ in (4), we obtain $c_1 = 0$. This contradiction shows that the relation (1) implies that $c = 0$. The theorem is proved.
Appendix D
Functional analysis
D.1 Inner product spaces

Let $V$ and $W$ be linear spaces over $\mathbb{K}$ where $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}$. The dimension of $V$ is denoted by $\dim(V)$. A linear operator from $V$ into $W$ is a mapping $A : V \to W$ satisfying
$$A(av + bw) = aA(v) + bA(w), \quad a, b \in \mathbb{K};\ v, w \in V.$$
If $W = V$ then we say that $A$ is a linear operator in $V$. If $W = \mathbb{K}$ then $A$ is called a linear functional on $V$. A mapping $(\cdot, \cdot)$ from $V \times V$ into $\mathbb{K}$ satisfying
(i) $(c_1 v_1 + c_2 v_2, w) = c_1 (v_1, w) + c_2 (v_2, w)$
(ii) $(w, c_1 v_1 + c_2 v_2) = \overline{c_1}\, (w, v_1) + \overline{c_2}\, (w, v_2)$
for all $v, v_1, v_2, w \in V$ and $c_1, c_2 \in \mathbb{K}$, is called a sesquilinear form on $V$. A positive semidefinite inner product is a sesquilinear form such that $(v, w) = \overline{(w, v)}$ and $(v, v) \ge 0$. If $(v, v) > 0$ whenever $v \neq 0$, then the inner product $(\cdot, \cdot)$ is called positive definite. The linear space $V$ together with the inner product $(\cdot, \cdot)$ is called a (positive definite or positive semidefinite, respectively) inner product space. Two elements $v, w$ of an inner product space $V$ are called orthogonal if $(v, w) = 0$. Orthogonality of two sets is defined correspondingly. Orthogonality of elements and sets is denoted by the symbol $\perp$. The orthogonal complement $M^\perp$ of a set $M \subseteq V$ is the set of all elements of $V$ which are orthogonal to $M$: $M^\perp = \{v \in V : v \perp M\}$. The orthogonal complement is a linear space. If $L, M \subseteq V$ are linear spaces, then the notation
$$V = L \oplus M$$
means that $V = L + M$ and $L \perp M$. If $V$ is finite dimensional then any linear functional $l$ on $V$ can be written as
$$l(v) = (v, w), \quad v \in V$$
with some $w \in V$.$^{1}$ In case of a positive definite inner product, $w$ is uniquely determined.

$^{1}$ See also Theorem D.3.4.

Any positive semidefinite inner product $\langle \cdot, \cdot \rangle$ on the linear space $\mathbb{K}^d$ can be
written as $\langle v, w \rangle = (Av, w)$ where $A \in \mathbb{K}^{d \times d}$ is positive semidefinite and
$$(v, w) = \sum_{j=1}^{d} v_j \overline{w_j}$$
is the standard inner product of $\mathbb{K}^d$. Writing
$$\|v\| = \sqrt{(v, v)}, \quad v \in V$$
for all $v, w \in V$ and $a \in \mathbb{K}$ we have
(i) $\|v\| \ge 0$
(ii) $\|av\| = |a|\, \|v\|$
(iii) $|(v, w)| \le \|v\|\, \|w\|$ (Cauchy–Schwarz inequality)
(iv) $\|v + w\| \le \|v\| + \|w\|$ (triangle inequality)
(v) $\|v + w\|^2 + \|v - w\|^2 = 2\|v\|^2 + 2\|w\|^2$ (parallelogram law).

Any mapping $\|\cdot\| : V \to [0, \infty)$ satisfying (ii) and (iv) is called a seminorm. If $\|v\| > 0$ for all $v \in V \setminus \{0\}$ then $\|\cdot\|$ is called a norm. A linear space equipped with a norm is called a normed linear space. Defining the distance $\rho(v, w)$ of $v$ and $w$ by $\rho(v, w) := \|v - w\|$, the linear space $V$ becomes a metric space. If this metric space is complete, i.e., if every Cauchy sequence is convergent, then $V$ is called a Banach space. The inner product $(\cdot, \cdot)$ of a semidefinite inner product space can be expressed in terms of the corresponding seminorm:
$$(v, w) = \frac{1}{4} \left( \|v + w\|^2 - \|v - w\|^2 \right), \quad \text{if } \mathbb{K} = \mathbb{R} \tag{1}$$
$$(v, w) = \frac{1}{4} \sum_{k=0}^{3} i^k \|v + i^k w\|^2, \quad \text{if } \mathbb{K} = \mathbb{C}. \tag{2}$$
The two equations above are called the polarization identity. A result, attributed to M. R. Fréchet, J. von Neumann and P. Jordan, states that if a seminorm (norm) $\|\cdot\|$ satisfies the parallelogram law, then the equations above define a semidefinite (positive definite, respectively) inner product $(\cdot, \cdot)$.
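The complex polarization identity (2) can be checked numerically for the standard inner product of $\mathbb{C}^d$. The following Python sketch is only a sanity check. The function names `inner`, `norm_sq` and `polarization` and the sample vectors are hypothetical, and the inner product is taken linear in the first argument, as above.

```python
def inner(v, w):
    """Standard inner product on C^d, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def norm_sq(v):
    """Squared norm ||v||^2 = (v, v)."""
    return inner(v, v).real

def polarization(v, w):
    """Right-hand side of (2): (1/4) * sum_k i^k * ||v + i^k w||^2."""
    total = 0j
    for ik in (1, 1j, -1, -1j):          # the powers i^0, i^1, i^2, i^3
        shifted = [a + ik * b for a, b in zip(v, w)]
        total += ik * norm_sq(shifted)
    return total / 4

v = [1 + 2j, -3j, 0.5 + 0j]
w = [2 - 1j, 1 + 1j, -4 + 0j]
lhs = inner(v, w)
rhs = polarization(v, w)
```

Up to rounding errors, `lhs` and `rhs` agree.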
D.2 Matrices and kernels

Basic facts and notation

D.2.1. The identity matrix is $I = I_n = (\delta_{ij}) \in \mathbb{R}^{n \times n}$, where $\delta_{ii} = 1$ and $\delta_{ij} = 0$ if $i \neq j$. A permutation matrix is obtained by reordering the columns of the identity matrix. The symbol $\operatorname{diag}(c_1, \dots, c_n)$ where $c_j \in \mathbb{C}$ denotes the diagonal matrix $(d_{ij})$ with $d_{ii} = c_i$ and $d_{ij} = 0$, $i \neq j$. More generally, if
$D_1, \dots, D_n$ are matrices, then we write
$$\operatorname{diag}(D_1, \dots, D_n) = \begin{bmatrix} D_1 & & \\ & \ddots & \\ & & D_n \end{bmatrix}$$
where we use the convention that blank entries in a matrix denote zeros. For a matrix $A = (a_{ij}) \in \mathbb{C}^{m \times n}$, $A^T$ denotes the transpose $(a_{ji}) \in \mathbb{C}^{n \times m}$ and $A^*$ denotes the conjugate transpose $(\overline{a_{ji}})$. The rank of $A$ is defined by $\operatorname{rank}(A) = \dim(\operatorname{range}(A))$, where$^{2}$
$$\operatorname{range}(A) = A\mathbb{C}^n = \{Ax : x \in \mathbb{C}^n\} \subseteq \mathbb{C}^m.$$
The rank of $A$ is equal to the maximum number of linearly independent column vectors of $A$, which is the same as the maximum number of linearly independent row vectors. If the product $AB$ is defined then
$$\operatorname{rank}(AB) \le \min(\operatorname{rank}(A), \operatorname{rank}(B)).$$
Let $A \in \mathbb{C}^{n \times n}$ be a square matrix for the rest of this paragraph. $A$ is diagonalizable if there exists an invertible matrix $X \in \mathbb{C}^{n \times n}$ such that $XAX^{-1}$ is diagonal. $A$ is symmetric if $A = A^T$ and $A$ is Hermitian if $A = A^*$. $A$ is called unitary if $A^* = A^{-1}$. If $A$ is real and $A^T = A^{-1}$, then $A$ is called orthogonal. We denote by $O(n)$ the set of all $n \times n$ orthogonal matrices. $A$ is said to be normal if $A^*A = AA^*$. A matrix is normal if and only if it is diagonalizable by a unitary matrix. $A$ is positive semidefinite if $x^*Ax \ge 0$ for all $x \in \mathbb{C}^n$ and positive definite if $x^*Ax > 0$ for all nonzero $x \in \mathbb{C}^n$. We show below (cf. Theorem D.2.3) that positive semidefinite matrices are Hermitian. The determinant of $A$ is denoted by $\det(A)$. Key properties are
$$\det(AB) = \det(A)\det(B), \qquad \det(cA) = c^n \det(A), \quad c \in \mathbb{C}.$$
The complex number $\lambda$ is an eigenvalue of $A$ with corresponding eigenvector $x \in \mathbb{C}^n \setminus \{0\}$ if $Ax = \lambda x$. The eigenvalues are the zeros of the characteristic polynomial $q(t) = \det(tI - A)$, $t \in \mathbb{C}$, which has degree $n$. The eigenspace of $A$ corresponding to an eigenvalue $\lambda$ is the linear space $\{x \in \mathbb{C}^n : Ax = \lambda x\}$. The algebraic multiplicity of $\lambda$ is its multiplicity as a zero of the characteristic polynomial. The geometric multiplicity of $\lambda$ is the dimension of the eigenspace of $A$ corresponding to $\lambda$. If $A$ is Hermitian, then the eigenvalues are real; if $A$ is positive semidefinite, then they are nonnegative.

$^{2}$ Recall our convention from A.2 that in expressions involving matrix operations, e.g., as in $Ax$, we consider $x \in \mathbb{C}^n$ as a column vector.
The trace of $A$ is defined by
$$\operatorname{trace}(A) = \sum_{i=1}^{n} a_{ii}.$$
It is easy to check that $\operatorname{trace}(AB) = \operatorname{trace}(BA)$ and hence $\operatorname{trace}(X^{-1}AX) = \operatorname{trace}(A)$ for an arbitrary invertible matrix $X \in \mathbb{C}^{n \times n}$. The trace of $A$ is the sum of all eigenvalues of $A$ while the determinant is the product of all eigenvalues of $A$. A Vandermonde matrix $V$ has the form
$$V = V(c_1, \dots, c_n) = \begin{bmatrix} 1 & c_1 & c_1^2 & \dots & c_1^{n-1} \\ 1 & c_2 & c_2^2 & \dots & c_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & c_n & c_n^2 & \dots & c_n^{n-1} \end{bmatrix};$$
its determinant is equal to
$$\prod_{1 \le i < j \le n} (c_j - c_i)$$
1i 0
j;kD1
for arbitrary real numbers $r_1$ and $r_2$, not both zero.

Theorem D.2.4. A real matrix $A = (a_{jk})_{j,k=1}^{n}$ is positive semidefinite if and only if it is symmetric and
$$\sum_{j,k=1}^{n} a_{jk} r_j r_k \ge 0$$
for all real numbers $r_1, \dots, r_n$.

Proof. The “only if” part follows immediately from Theorem D.2.3. If $A$ is real and symmetric, then
$$\sum_{j,k=1}^{n} a_{jk} c_j \overline{c_k} = \sum_{j,k=1}^{n} a_{jk} \operatorname{Re}(c_j \overline{c_k}) \in \mathbb{R}$$
for all $c_j \in \mathbb{C}$. Using this and writing $c_j = x_j + i y_j$; $x_j, y_j \in \mathbb{R}$ we have
$$\sum_{j,k=1}^{n} a_{jk} c_j \overline{c_k} = \sum_{j,k=1}^{n} a_{jk} (x_j x_k + y_j y_k) + i \sum_{j,k=1}^{n} a_{jk} (y_j x_k - x_j y_k) = \sum_{j,k=1}^{n} a_{jk} x_j x_k + \sum_{j,k=1}^{n} a_{jk} y_j y_k$$
from which the “if” part follows.

Definition D.2.5. Suppose $A = (a_{jk})_{j,k=1}^{n}$ is a Hermitian matrix. Then $A$ has $n$ real eigenvalues, counted with multiplicity, and there exists an orthonormal basis of $\mathbb{C}^n$ consisting of eigenvectors of $A$. The number of negative squares of $A$ is the number of negative eigenvalues of $A$, counted with multiplicity. A negative (respectively nonnegative, respectively positive) subspace for $A$ is a linear subspace $E$ of $\mathbb{C}^n$ such that
$$\sum_{j,k=1}^{n} a_{jk}\, \overline{x_j}\, x_k = (Ax, x) < 0$$
(respectively $\ge 0$, respectively $> 0$) for all $x \in E \setminus \{0\}$. Note that, by orthogonality, the linear space spanned by eigenvectors corresponding to negative (nonnegative, positive, nonpositive) eigenvalues is negative (nonnegative, positive, nonpositive, respectively).

Theorem D.2.6. If $A$ is a Hermitian matrix, then the number of negative squares of $A$ is equal to the maximal dimension of a negative subspace for $A$.

Proof. Let the order of $A$ be $n$ and let $k$ be the number of negative squares of $A$. In view of the remark after Definition D.2.5, the matrix $A$ has a negative subspace of dimension $k$ and $A$ has a nonnegative subspace $P$ of dimension $n - k$. Now let $N$ be an arbitrary negative subspace for $A$. Since $N$ cannot intersect $P$ except in $0$ we must have $\dim(N) \le k$.

Theorem D.2.7. If $A \in \mathbb{C}^{n \times n}$ is an invertible Hermitian matrix, then the number of negative squares of $A$ is equal to the number of changes of sign in the sequence
$$(1, \det A_1, \dots, \det A_n)$$
where $A_k = (a_{ij})_{i,j=1}^{k}$.

Proof. For $1 \le k \le n$ we identify $\mathbb{C}^k$ with the linear subspace
$$\{(x_1, \dots, x_k, 0, \dots, 0) : (x_1, \dots, x_k) \in \mathbb{C}^k\}$$
of $\mathbb{C}^n$. Further, let $m_k$ be the number of negative squares of $A_k$. It follows from Theorem D.2.6 that $m_{k-1} \le m_k$ for $k = 2, \dots, n$. If $k \ge 2$ and if $E$ is a negative subspace for $A_k$ of dimension $m_k$, then $E \cap \mathbb{C}^{k-1}$ is a negative subspace for $A_{k-1}$ of co-dimension at most $1$ in $E$, hence of dimension at least $m_k - 1$. Thus, $m_{k-1} \le m_k \le m_{k-1} + 1$ for $k = 2, \dots, n$. The determinant of a Hermitian matrix is equal to the product of all eigenvalues (considered with multiplicities). Therefore, the sign of $\det A_k$ is $(-1)^{m_k}$, from which the statement follows.

Noting that positive definite matrices are nonsingular, we obtain the following corollary.

Corollary D.2.8. An invertible Hermitian matrix $A \in \mathbb{C}^{n \times n}$ is positive definite if and only if $\det A_k > 0$ for $k = 1, \dots, n$.
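The criterion of Corollary D.2.8 and the sign-change count of Theorem D.2.7 are easy to evaluate with exact arithmetic. The sketch below is only an illustration for real symmetric matrices with rational entries. The function names `det`, `leading_minors`, `is_positive_definite` and `negative_squares` are hypothetical, and `negative_squares` assumes that all leading principal minors are nonzero.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over exact rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result            # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result

def leading_minors(m):
    """The determinants det A_1, ..., det A_n of the leading submatrices."""
    return [det([row[:k] for row in m[:k]]) for k in range(1, len(m) + 1)]

def is_positive_definite(m):
    """Criterion of Corollary D.2.8 for a symmetric matrix m."""
    return all(d > 0 for d in leading_minors(m))

def negative_squares(m):
    """Sign changes in (1, det A_1, ..., det A_n); assumes nonzero minors."""
    seq = [Fraction(1)] + leading_minors(m)
    return sum(1 for a, b in zip(seq, seq[1:]) if a * b < 0)

tridiagonal = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
indefinite = [[1, 2], [2, 1]]           # eigenvalues 3 and -1
```

For `tridiagonal` the leading minors are 2, 3, 4, so the matrix is positive definite. For `indefinite` the sequence is (1, 1, -3), which has one change of sign.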
As the example
$$A = \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix}$$
shows, there exist matrices with $\det A_k \ge 0$ for all $k$ that are not positive semidefinite. However, the following result holds.

Theorem D.2.9. A Hermitian matrix $A \in \mathbb{C}^{n \times n}$ is positive semidefinite if and only if all subdeterminants of $A$ are nonnegative.

Proof. If $A = (a_{jk})$ is positive semidefinite, then the eigenvalues and hence the determinant of $A$ are nonnegative. On the other hand, let us assume that all subdeterminants are nonnegative. For $\varepsilon > 0$ we define the matrix $A_\varepsilon = A + \varepsilon I_n$. Then
$$\det(A_\varepsilon) = \sum_{l=0}^{n} d_l\, \varepsilon^l$$
where $d_n = 1$ and
$$d_l = \sum_{S_l} \det\left( (a_{jk})_{j,k \in S_l} \right) \ge 0, \quad l = 0, 1, \dots, n - 1.$$
Here the sum is over all sets $S_l \subseteq \{1, \dots, n\}$ having $n - l$ elements. From this we conclude that $\det(A_\varepsilon) \ge \varepsilon^n > 0$. Corollary D.2.8 shows that $A_\varepsilon$ is positive definite. Consequently, the pointwise limit $A = \lim_{\varepsilon \to 0} A_\varepsilon$ is positive semidefinite.

As an application of Theorem D.2.9 we present the following result:

Lemma D.2.10. The kernels
$$K_1(x, y) = \min(x, y), \quad x, y \in [0, \infty)$$
and
$$K_2(x, y) = \frac{1}{2} \left( |x| + |y| - |x - y| \right), \quad x, y \in \mathbb{R}$$
are positive semidefinite.$^{4}$

Proof. We consider first the kernel $K_1$ and show that the matrix
$$A = \left( \min(x_i, x_j) \right)_{i,j=1}^{n}$$
is positive semidefinite for all $x_1, \dots, x_n \in [0, \infty)$. We may assume that $x_1 \le x_2 \le \dots \le x_n$. We then have
$$D_n(x_1, \dots, x_n) := \det(A) = \begin{vmatrix} x_1 & x_1 & x_1 & \dots & x_1 \\ x_1 & x_2 & x_2 & \dots & x_2 \\ x_1 & x_2 & x_3 & \dots & x_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_1 & x_2 & x_3 & \dots & x_n \end{vmatrix}.$$
Subtracting the first row from all other rows and expanding the determinant along the first column we see that
$$D_n(x_1, x_2, \dots, x_n) = x_1 D_{n-1}(x_2 - x_1, \dots, x_n - x_1).$$
Induction on $n$ leads to
$$D_n(x_1, x_2, \dots, x_n) = x_1 (x_2 - x_1)(x_3 - x_2) \cdots (x_n - x_{n-1})$$
from which $\det A \ge 0$ follows. Since all subdeterminants of $A$ have the same structure, they are also nonnegative. Theorem D.2.9 shows that $A$ is positive semidefinite.

We now consider the kernel $K_2$. If $xy \le 0$ then $K_2(x, y) = 0$, otherwise $K_2(x, y) = K_1(|x|, |y|)$. Thus, for $x_1 \le \dots \le x_n$,
$$\left( K_2(x_i, x_j) \right)_{i,j=1}^{n} = \operatorname{diag}(A_1, A_2)$$
where $A_1$ and $A_2$ are, by the first part of the lemma, positive semidefinite matrices. This completes the proof.

Alternative proof. We present an alternative proof for the positive semidefiniteness of the matrix $A = (\min(x_i, x_j))_{i,j=1}^{n}$, where $0 \le x_1 \le x_2 \le \dots \le x_n$. By continuity, it suffices to consider positive rational $x_j$'s. Multiplying $A$ by a positive integer we may even assume that the $x_j$'s are positive integers. Then for some $m$ the matrix $A$ is a submatrix of $(\min(i, j))_{i,j=1}^{m}$ which is positive definite since it is the square of the symmetric matrix $B = (b_{ij})$ where $b_{ij} = 0$ if $i + j \le m$ and $b_{ij} = 1$ if $i + j > m$. For example,
$$\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 \\ 1 & 2 & 3 & 3 \\ 1 & 2 & 3 & 4 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}^2.$$

$^{4}$ Note that $K_1(x, y) = K_2(x, y)$ if $x, y \ge 0$.
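The factorization used in the alternative proof can be confirmed for small $m$ with integer arithmetic. The sketch below is only a check of the identity $(\min(i, j))_{i,j=1}^{m} = B^2$. The helper names `anti_triangular`, `matmul` and `min_matrix` are hypothetical.

```python
def anti_triangular(m):
    """B = (b_ij) with b_ij = 1 if i + j > m and 0 otherwise (1-based)."""
    return [[1 if i + j > m else 0 for j in range(1, m + 1)]
            for i in range(1, m + 1)]

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def min_matrix(m):
    """The matrix (min(i, j)) for i, j = 1, ..., m."""
    return [[min(i, j) for j in range(1, m + 1)] for i in range(1, m + 1)]

B = anti_triangular(4)
square = matmul(B, B)
```

For $m = 4$ the list `square` reproduces the matrix on the left-hand side of the example above.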
Theorem D.2.11. A matrix $A \in \mathbb{C}^{n \times n}$ is positive semidefinite if and only if there exists a Hermitian matrix $B \in \mathbb{C}^{n \times n}$ such that $A = B^2$. If $A$ is real then $B$ can be chosen real as well.

Proof. If $A = B^2$ for some $n \times n$ Hermitian matrix $B$, then
$$(Ax, x) = (B^2 x, x) = (Bx, Bx) \ge 0, \quad x \in \mathbb{C}^n.$$
Thus, $A$ is positive semidefinite. For the converse, let $\{e_1, \dots, e_n\}$ be an orthonormal basis of $\mathbb{C}^n$ (or of $\mathbb{R}^n$ if $A$ is real), where $e_j$ is an eigenvector of $A$ with corresponding eigenvalue $r_j \ge 0$. Consider $e_j$ as a column vector and let $Q = [e_1, \dots, e_n]$. Using orthonormality we see that $Q$ is unitary, i.e., $Q^*Q$ is the identity matrix. Moreover,
$$Q \operatorname{diag}(r_1, \dots, r_n)\, Q^* e_j = r_j e_j = A e_j$$
and hence $A = Q \operatorname{diag}(r_1, \dots, r_n)\, Q^*$. It is now easy to check that the square of the Hermitian matrix
$$B = Q \operatorname{diag}(\sqrt{r_1}, \dots, \sqrt{r_n})\, Q^*$$
is equal to $A$.

Theorem D.2.12 (Schur). Let $A = (a_{ij})$ and $B = (b_{ij})$ be $n \times n$ positive semidefinite matrices. Then the matrix $C = (a_{ij} b_{ij})_{i,j=1}^{n}$ is positive semidefinite. If $A \neq 0$ and $B$ is positive definite, then $C$ is positive definite, as well.

Proof. Suppose first that $A$ and $B$ are positive semidefinite. We assume that $A \neq 0$. By Theorem D.2.11 there exists an $n \times n$ Hermitian matrix $D$ such that $A = D^2$. Writing $D = (d_{ij})$, we have for $x = (x_i) \in \mathbb{C}^n$
$$(Cx, x) = \sum_{i,j=1}^{n} a_{ij} b_{ij} x_j \overline{x_i} = \sum_{i,j=1}^{n} \sum_{l=1}^{n} \overline{d_{li}}\, d_{lj}\, b_{ij} x_j \overline{x_i} = \sum_{l=1}^{n} \sum_{i,j=1}^{n} b_{ij}\, y_{lj}\, \overline{y_{li}} = \sum_{l=1}^{n} (B y_l, y_l) \ge 0$$
by the positive semidefiniteness of $B$, where $y_l = (y_{l1}, \dots, y_{ln}) \in \mathbb{C}^n$ with components $y_{li} = d_{li} x_i$; $i, l = 1, \dots, n$. The second statement can be proved in the same way.
Corollary D.2.13. Let $A = (a_{ij})$ be an $n \times n$ positive semidefinite matrix and let $p_k$, $k \in \mathbb{N}_0$, be nonnegative numbers such that the series
$$\varphi(z) := \sum_{k=0}^{\infty} p_k z^k$$
is convergent for all $z \in \mathbb{C}$. Then the matrix $(\varphi(a_{ij}))$ is positive semidefinite.

Lemma D.2.14. Let $C = (c_{jk})_{j,k=1}^{n}$ be a positive semidefinite matrix and $A = (a_{jk}) = \operatorname{Re} C$, $B = (b_{jk}) = \operatorname{Im} C$. Then the $2n \times 2n$ matrix
$$D = (d_{jk}) = \begin{bmatrix} A & -B \\ B & A \end{bmatrix}$$
is positive semidefinite as well.

Proof. Since $c_{jk} = \overline{c_{kj}}$ we conclude that $A^T = A$, $B^T = -B$ and hence $D = D^T$. In view of Theorem D.2.4 it suffices to show that
$$\sum_{j,k=1}^{2n} d_{jk} r_j r_k \ge 0 \tag{1}$$
for all $r_1, \dots, r_{2n} \in \mathbb{R}$. By assumption,
$$0 \le \sum_{j,k=1}^{n} c_{jk} (r_j - i r_{n+j})(r_k + i r_{n+k}) = \sum_{j,k=1}^{n} c_{jk} (r_j r_k + r_{n+j} r_{n+k}) - i \sum_{j,k=1}^{n} c_{jk} (r_{n+j} r_k - r_j r_{n+k}) =: S_1 - i S_2.$$
Using the relation $c_{jk} = \overline{c_{kj}}$ we see that $S_1$ is real, while $S_2$ is purely imaginary. Consequently,
$$S_1 - i S_2 = \sum_{j,k=1}^{n} a_{jk} (r_j r_k + r_{n+j} r_{n+k}) + \sum_{j,k=1}^{n} b_{jk} (r_{n+j} r_k - r_j r_{n+k}) = \sum_{j,k=1}^{2n} d_{jk} r_j r_k$$
showing the inequality (1).
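The identity at the heart of the proof of Lemma D.2.14 can be checked numerically. The Python sketch below is only an illustration. The function names, the matrix `M` and the vector `r` are hypothetical choices, and `C` is built as $M M^*$ so that it is positive semidefinite by construction.

```python
def real_representation(c):
    """D = [[A, -B], [B, A]] for C = A + iB, as nested lists of floats."""
    n = len(c)
    top = [[c[j][k].real for k in range(n)] + [-c[j][k].imag for k in range(n)]
           for j in range(n)]
    bottom = [[c[j][k].imag for k in range(n)] + [c[j][k].real for k in range(n)]
              for j in range(n)]
    return top + bottom

def quadratic_form(d, r):
    """sum_{j,k} d_jk r_j r_k for a real matrix d and a real vector r."""
    size = len(r)
    return sum(d[j][k] * r[j] * r[k] for j in range(size) for k in range(size))

def hermitian_form(c, r):
    """sum_{j,k} c_jk (r_j - i r_{n+j}) (r_k + i r_{n+k}), as in the proof."""
    n = len(c)
    z = [complex(r[j], r[n + j]) for j in range(n)]
    return sum(c[j][k] * z[j].conjugate() * z[k]
               for j in range(n) for k in range(n))

# C = M M^* is positive semidefinite for any complex matrix M
M = [[1 + 1j, 2 + 0j], [0j, 1 - 2j]]
C = [[sum(M[j][l] * M[k][l].conjugate() for l in range(2)) for k in range(2)]
     for j in range(2)]
r = [0.5, -1.0, 2.0, 3.0]
lhs = hermitian_form(C, r)
rhs = quadratic_form(real_representation(C), r)
```

For this choice both `lhs` and `rhs` are equal to 69.5 up to rounding, and `lhs` has vanishing imaginary part.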
Definition D.2.15. Let $V$ be a positive definite inner product space over $\mathbb{K}$ where $\mathbb{K} = \mathbb{C}$ or $\mathbb{K} = \mathbb{R}$ and let $v_1, \dots, v_n \in V$. The Gram matrix $G$ associated with $v_1, \dots, v_n$ is defined by
$$G = G(v_1, \dots, v_n) = \left( (v_i, v_j) \right)_{i,j=1}^{n}$$
where $(\cdot, \cdot)$ denotes the inner product of $V$. Choosing an orthonormal basis in the linear span of the vectors $v_j$ and building a matrix $A$ by writing the coordinates of $v_j$ as the $j$-th row of $A$ we have $G = AA^*$.

Theorem D.2.16. (With the notation of the previous definition.)
(i) The Gram matrix $G(v_1, \dots, v_n)$ is positive semidefinite.
(ii) If $w_1, \dots, w_m \in V$ are any vectors for which
$$v_i = \sum_{j=1}^{m} t_{ij} w_j, \quad t_{ij} \in \mathbb{K},\ i = 1, \dots, n$$
then
$$G(v_1, \dots, v_n) = T\, G(w_1, \dots, w_m)\, T^* \tag{1}$$
where $T = (t_{ij})$.
(iii) The dimension of the linear space $L$ spanned by the vectors $v_1, \dots, v_n$ is equal to the rank of $G(v_1, \dots, v_n)$.

Proof. The first statement follows from$^{5}$
$$\sum_{i,j=1}^{n} (v_i, v_j)\, c_i \overline{c_j} = \left( \sum_{i=1}^{n} v_i c_i, \sum_{i=1}^{n} v_i c_i \right) \ge 0, \quad c_i \in \mathbb{K}$$
while the second one is a consequence of
$$(v_i, v_k) = \sum_{j,l=1}^{m} t_{ij}\, (w_j, w_l)\, \overline{t_{kl}}.$$
To prove the last statement choose arbitrary vectors $w_1, \dots, w_n$ which span the same linear space $L$. Since the rank of the product of matrices is not greater than that of any factor, equation (1) shows that
$$\operatorname{rank} G(v_1, \dots, v_n) \le \operatorname{rank} G(w_1, \dots, w_n).$$
Exchanging the role of the $v$'s and $w$'s we conclude that we have equality in the above display. The result now follows by choosing the $w$'s such that $w_1, \dots, w_k$ form an orthonormal basis of $L$ and $w_j = 0$ for $j > k = \dim L$.

$^{5}$ The positive semidefiniteness also follows from the representation $G = AA^*$ after Definition D.2.15.
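Statement (iii) of Theorem D.2.16 can be illustrated with exact rational arithmetic. The sketch below is restricted to real vectors with the standard inner product. The names `gram` and `rank` and the sample vectors are hypothetical, and the rank is computed by plain Gaussian elimination.

```python
from fractions import Fraction

def gram(vectors):
    """Gram matrix of real vectors for the standard inner product."""
    return [[sum(Fraction(a) * Fraction(b) for a, b in zip(v, w))
             for w in vectors] for v in vectors]

def rank(m):
    """Rank of a matrix by Gaussian elimination over exact rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    rows, cols = len(a), len(a[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if a[i][c] != 0), None)
        if pivot is None:
            continue                    # no pivot in this column
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, rows):
            factor = a[i][c] / a[r][c]
            for k in range(c, cols):
                a[i][k] -= factor * a[r][k]
        r += 1
        if r == rows:
            break
    return r

# v3 = v1 + v2 and v4 = 2 v1 - v2, so the span has dimension 2
vectors = [(1, 0, 2), (0, 1, 1), (1, 1, 3), (2, -1, 3)]
G = gram(vectors)
```

Here `rank(G)` and `rank(vectors)` both equal 2, the dimension of the span.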
Lemma D.2.17. Let $C_n \in \mathbb{C}^{d \times d}$, $n \in \mathbb{N}$, be positive semidefinite matrices such that the limit
$$q(t) = \lim_{n \to \infty} (C_n t, t)$$
exists for all $t \in \mathbb{R}^d$. Then there exists a positive semidefinite matrix $C \in \mathbb{C}^{d \times d}$ such that $q(t) = (Ct, t)$, $t \in \mathbb{R}^d$. Moreover, the sequence $\{C_n\}_1^\infty$ converges to $C$ entrywise.

Proof. It is clear that the above limit also exists for all $t \in \mathbb{C}^d$; we denote it in this case also by $q(t)$. Since $C_n$ is positive semidefinite, $\|t\|_n = \sqrt{(C_n t, t)}$ is a seminorm on $\mathbb{C}^d$ satisfying the parallelogram law (cf. (D.1.v)). The same is true for the limit $\sqrt{q}$. By the theorem of Fréchet, von Neumann and Jordan, $q$ is generated by a positive semidefinite inner product $\langle \cdot, \cdot \rangle$ on $\mathbb{C}^d$:
$$q(t) = \langle t, t \rangle, \quad t \in \mathbb{C}^d.$$
The first statement of the lemma follows from the fact that $\langle t, t \rangle = (Ct, t)$ with some positive semidefinite matrix $C \in \mathbb{C}^{d \times d}$. Let $e_1, \dots, e_d$ be the standard basis of $\mathbb{C}^d$. Using the polarization identity (D.1.2) with $v = e_j$ and $w = e_k$, we obtain the second statement.

Lemma D.2.18. If $A$ is a positive semidefinite matrix, then $A_\varepsilon := A + \varepsilon I$ is positive definite for all $\varepsilon > 0$.

Proof. The matrix $A_\varepsilon$ is obviously positive semidefinite. If it were not positive definite, then it would have an eigenvector with eigenvalue $0$. This vector would be an eigenvector of $A$ with eigenvalue $-\varepsilon$, contradicting the positive semidefiniteness of $A$.

Theorem D.2.19. Let $n \in \mathbb{N}$ and let $S$ be a subset of the square
$$S_n := \{(i, j) \in \mathbb{Z}^2 : 1 \le i, j \le n\}$$
such that
(i) $(i, i) \in S$ whenever $1 \le i \le n$;
(ii) if $(l, k) \in S$ then $\{(i, j) : \min(l, k) \le i, j \le \max(l, k)\} \subseteq S$.
Suppose that for each $(i, j) \in S$ a complex number $c_{ij}$ is given such that
(iii) for all $(l, k) \in S$ with $l \le k$ the matrix $(c_{ij})_{i,j=l}^{k}$ is positive semidefinite.
Then there exists a positive semidefinite matrix $A = (a_{ij})_{i,j=1}^{n}$ such that $a_{ij} = c_{ij}$ for $(i, j) \in S$.

Proof. The case $n = 1$ being trivial suppose that $n > 1$.
We consider first the special case where $S = S_n \setminus \{(1, n), (n, 1)\}$. For $z \in \mathbb{C}$ we define the matrix $A(z)$ by
$$A(z) := \begin{bmatrix} c_{11} & c_{12} & \dots & c_{1,n-1} & z \\ c_{21} & c_{22} & \dots & c_{2,n-1} & c_{2n} \\ \vdots & \vdots & & \vdots & \vdots \\ c_{n-1,1} & c_{n-1,2} & \dots & c_{n-1,n-1} & c_{n-1,n} \\ \overline{z} & c_{n2} & \dots & c_{n,n-1} & c_{nn} \end{bmatrix}.$$
Set
$$A_1 := (c_{ij})_{i,j=1}^{n-1}, \quad A_3 := (c_{ij})_{i,j=2}^{n}, \quad A_5 := (c_{ij})_{i,j=2}^{n-1}$$
and let $A_2(z)$ ($A_4(z)$) denote the matrix obtained by canceling the first (last) column and the last (first) row of $A(z)$. Assume that all matrices in (iii) are positive definite. By Corollary D.2.8 it suffices to prove the existence of a complex number $z$ for which the determinant $\det(A(z))$ is positive. We have
$$\det(A(z)) = \frac{\det(A_1)\det(A_3) - \det(A_2(z))\det(A_4(z))}{\det(A_5)}.$$
Since $A(z)$ is Hermitian, $\det(A_4(z)) = \overline{\det(A_2(z))}$ and hence the inequality $\det(A(z)) > 0$ holds if and only if
$$|\det(A_2(z))|^2 < \det(A_1)\det(A_3). \tag{1}$$
By assumption, the right-hand side of this inequality is positive. Expanding the determinant $\det(A_2(z))$ along the first row we see that $\det(A_2(z)) = (-1)^n \det(A_5)\, z + c$ where $\det(A_5) > 0$ and $c$ is some complex number not depending on $z$. We conclude that there exist infinitely many $z \in \mathbb{C}$ satisfying (1).

If the matrices in (iii) are not all positive definite, then we replace $A(z)$ by the matrices $A_\varepsilon(z) := A(z) + \varepsilon I$ where $I$ denotes the $n \times n$ identity matrix and $\varepsilon > 0$. By Lemma D.2.18 the corresponding matrices are positive definite. For $m \in \mathbb{N}$ we choose $z_m \in \mathbb{C}$ such that $A_{1/m}(z_m)$ is positive semidefinite. Then
$$\det \begin{bmatrix} c_{11} + \frac{1}{m} & z_m \\ \overline{z_m} & c_{nn} + \frac{1}{m} \end{bmatrix} \ge 0$$
and hence the sequence $\{z_m\}$ is bounded. Consequently, there exists a subsequence converging to some $z \in \mathbb{C}$. The matrix $A(z)$ is then positive semidefinite.

We now turn to the general case. We may suppose that $S \neq S_n$. Let $k$ be the greatest integer with the property that $(1, 1), \dots, (1, k - 1) \in S$ and let $l$ be the greatest integer with $(1, k), \dots, (l, k) \notin S$. Using (i) and (ii) we see that $1 \le l < k \le n$ and
$$S \cap \{(i, j) : l \le i, j \le k\} = \{(i, j) : l \le i, j \le k\} \setminus \{(l, k), (k, l)\}.$$
The first part of the proof shows the existence of a complex number $z$ such that the matrix $(c_{ij})_{i,j=l}^{k}$ is positive semidefinite, where $c_{kl} := \bar z$, $c_{lk} := z$. Replacing $S$ by $S \cup \{(l,k),(k,l)\}$, this new index set and the corresponding complex numbers $c_{ij}$ also satisfy the conditions of the theorem. Thus, repeating the arguments above we obtain the desired matrix $A$.
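The first step of the proof can be checked by hand for $n = 3$. The following sketch uses made-up real symmetric data (the entries `c11`, ..., `c33` are purely illustrative and not taken from the text), verifies the determinant identity in exact rational arithmetic, and evaluates the determinant at the choice $z = c_{12}c_{23}/c_{22}$, which makes $\det(A_2(z))$ vanish. For $n = 3$ one has $\det(A_2(z)) = c_{12}c_{23} - c_{22}z$.

```python
from fractions import Fraction as F

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical real symmetric data; the corner entries (1,3) and (3,1) are unspecified.
c11, c12, c22, c23, c33 = F(2), F(1), F(2), F(1), F(2)

def A(z):
    """The completed matrix A(z) for a real corner entry z."""
    return [[c11, c12, z], [c12, c22, c23], [z, c23, c33]]

def identity_holds(z):
    """Check det(A) det(A5) = det(A1) det(A3) - det(A2) det(A4) exactly."""
    a1 = [[c11, c12], [c12, c22]]
    a3 = [[c22, c23], [c23, c33]]
    a2 = [[c12, z], [c22, c23]]   # first column and last row removed
    a4 = [[c12, c22], [z, c23]]   # last column and first row removed
    a5 = c22
    return det3(A(z)) * a5 == det2(a1) * det2(a3) - det2(a2) * det2(a4)

z_best = c12 * c23 / c22          # makes det(A2(z)) = 0
print(identity_holds(z_best), det3(A(z_best)))
```

In this example the inequality (1) reads $|1 - 2z| < 3$, so every real $z$ with $-1 < z < 2$ gives a positive definite completion.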
D.3 Hilbert spaces and linear operators
Hilbert spaces

D.3.1. Let $V$ be a normed linear space. We say that a sequence $\{v_n\}_1^\infty$ of elements of $V$ converges strongly to, or simply converges to, $v \in V$ if

$$\lim_{n\to\infty} \|v - v_n\| = 0.$$

In case of strong convergence we write $v = \lim_{n\to\infty} v_n$ or $v_n \to v$. When speaking about convergence, continuity, closed or open sets, etc., we always refer to the corresponding notion for the topology generated by the norm.⁶

A Hilbert space $H$ over $K$, where $K = \mathbb{R}$ or $K = \mathbb{C}$, is a positive definite inner product space over $K$ which is complete with respect to the norm

$$\|h\| = \sqrt{(h,h)}, \qquad h \in H,$$

where $(\cdot,\cdot)$ denotes the inner product of $H$. Examples of Hilbert spaces are $\mathbb{R}^d$, $\mathbb{C}^d$ and $L^2(\Omega, \mathcal{A}, P)$. Throughout the rest of this section the symbol $H$ denotes a Hilbert space.

We call a nonempty set $L \subset H$ a linear manifold if for any two vectors $g, h \in L$ and any $a, b \in K$ we have $ag + bh \in L$. A linear manifold is called a subspace if it is closed. The orthogonal complement $M^\perp$ of a nonempty subset $M \subset H$ is a subspace of $H$. If $L$ is a subspace of $H$, then

$$H = L \oplus L^\perp.$$

Thus, each $h \in H$ can be represented in the form

$$h = h_L + h_{L^\perp}, \qquad h_L \in L,\ h_{L^\perp} \in L^\perp.$$

The element $h_L$ is called the orthogonal projection of $h$ onto $L$. The orthogonal projection $h_L$ is the (unique) element of $L$ having minimal distance to $h$:

$$\|h - h_L\| = \inf_{v \in L} \|h - v\|.$$

⁶ See also D.5.1 for the definition of weak convergence.
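As a small numerical illustration of the orthogonal projection described in D.3.1, the following sketch works in $\mathbb{R}^3$ with a two-dimensional subspace $L$ spanned by an orthonormal pair. The vectors `basis` and `h` and the helper names are chosen here for illustration only. It computes $h_L$, checks that $h - h_L$ is orthogonal to $L$, and compares the distance from $h$ to $h_L$ with the distance to a few other points of $L$.

```python
import math

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(h, basis):
    """Orthogonal projection of h onto span(basis); basis must be orthonormal."""
    coeffs = [inner(h, e) for e in basis]
    return [sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(len(h))]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

r = 1.0 / math.sqrt(2.0)
basis = [[r, r, 0.0], [0.0, 0.0, 1.0]]    # orthonormal basis of the subspace L
h = [3.0, 1.0, 2.0]

h_L = project(h, basis)                    # component of h in L
h_perp = [a - b for a, b in zip(h, h_L)]   # component of h in the orthogonal complement

# A few other points of L, written as a*e1 + b*e2, for comparison.
others = [[a * basis[0][i] + b * basis[1][i] for i in range(3)]
          for a in (-1.0, 0.0, 2.0, 4.0) for b in (-1.0, 1.0, 3.0)]
print(h_L, dist(h, h_L), min(dist(h, v) for v in others))
```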
Lemma D.3.2 (Cauchy criterion). For a sequence $\{h_n\}_1^\infty$ of elements of $H$ the following conditions are equivalent:
(i) $\{h_n\}_1^\infty$ is convergent;
(ii) $\lim_{n,m\to\infty} \|h_n - h_m\| = 0$;
(iii) the limit $\lim_{n,m\to\infty} (h_n, h_m)$ exists.
If the limit in (iii) exists, then it is equal to $\|\lim_{n\to\infty} h_n\|^2$.

Linear operators

D.3.3. Let $V$ and $W$ be normed linear spaces. A linear operator $A$ from $V$ into $W$ is called bounded if there exists a $C \ge 0$ such that $\|Av\| \le C\|v\|$ for all $v \in V$. We denote by $B(V,W)$ the set of all bounded linear operators from $V$ to $W$. A linear operator is bounded if and only if it is continuous. The norm $\|A\|$ of a bounded operator is defined by

$$\|A\| := \inf\{C \ge 0 : \|Av\| \le C\|v\| \text{ for all } v \in V\}. \tag{1}$$

If $\|A\| \le 1$, then $A$ is called a (linear) contraction. We have

$$\|A\| = \sup\{\|Av\| : v \in V,\ \|v\| = 1\} = \sup\{\|Av\| : v \in V,\ \|v\| \le 1\}$$

and

$$\|Av\| \le \|A\|\,\|v\| \qquad \text{and} \qquad \|AB\| \le \|A\|\,\|B\|$$

where $B$ is a bounded linear operator with values in $V$. If $W = H$ is a Hilbert space, then

$$\|A\| = \sup\{|(Ag,h)| : g \in V,\ h \in H,\ \|g\| \le 1,\ \|h\| \le 1\}.$$

A linear operator $A$ in $H$ is called isometric if $\|Ah\| = \|h\|$ for all $h \in H$. An isometric operator $A$ is obviously bounded and its norm is equal to 1. If $A$ is an isometric operator in $H$ and $A(H) = H$, then $A$ is called unitary.

For an arbitrary bounded linear operator $A : H \to H$ there exists a uniquely determined bounded linear operator $A^* : H \to H$, the adjoint of $A$, such that

$$(Ah, g) = (h, A^*g), \qquad h, g \in H.$$

Key properties are $\|A\| = \|A^*\|$ and $(AB)^* = B^*A^*$. The operator $A$ is called self-adjoint if $A^* = A$. It is called normal if $AA^* = A^*A$. If $U$ is unitary, then $U$ is invertible and $U^* = U^{-1}$. Unitary and self-adjoint operators are normal. Two eigenvectors of a normal operator are orthogonal whenever they correspond to different eigenvalues. All eigenvalues of a self-adjoint operator are real, while the eigenvalues of a unitary operator have modulus 1.

For a self-adjoint operator $A$, the inner product $(Ah,h)$ is real for all $h \in H$. If, in addition, $(Ah,h) \ge 0$ for all $h \in H$, then $A$ is said to be nonnegative. The eigenvalues of nonnegative operators are nonnegative.

A linear manifold $L \subset H$ is said to be invariant under an operator $A$, or $A$-invariant, if $A(L) \subset L$. If $L$ is $A$-invariant, then $L^\perp$ is invariant under $A^*$.
Theorem D.3.4 (Riesz–Fréchet). For each bounded linear functional $l$ on $H$ there exists a unique $v \in H$ such that

$$l(h) = (h, v), \qquad h \in H,$$

and $\|l\| = \|v\|$.

Theorem D.3.5. Let $\langle\cdot,\cdot\rangle_L$ be a sesquilinear form on a dense linear manifold $L \subset H$ such that

$$|\langle g,h\rangle_L| \le C\,\|g\|\,\|h\|, \qquad g, h \in L,$$

with some $C \in [0,\infty)$. This form can be uniquely extended to a sesquilinear form $\langle\cdot,\cdot\rangle$ on $H$ such that

$$|\langle g,h\rangle| \le C\,\|g\|\,\|h\|, \qquad g, h \in H.$$

Moreover, there exists a linear operator $A$ on $H$ such that $\|A\| \le C$ and

$$\langle g,h\rangle = (Ag, h), \qquad g, h \in H.$$

Theorem D.3.6. Let $H_1$ and $H_2$ be two Hilbert spaces and let $I : L \to H_2$ be an isometric linear operator defined on a linear manifold $L \subset H_1$. Then $I$ can be uniquely extended to an isometric linear operator from the closure of $L$ onto the closure of $I(L)$.

Theorem D.3.7. For every bounded nonnegative operator $P$ on $H$ there exists a unique bounded nonnegative operator $\sqrt{P}$ such that $P = \sqrt{P}\sqrt{P}$. If a bounded linear operator commutes with $P$, then it commutes with $\sqrt{P}$ as well.

Theorem D.3.8 (Banach–Steinhaus, principle of uniform boundedness). Let $B$ be a Banach space and $V$ be a normed linear space. If $\mathcal{A} \subset B(B,V)$ is such that for each $v \in B$

$$\sup\{\|Av\| : A \in \mathcal{A}\} < \infty$$

then

$$\sup\{\|A\| : A \in \mathcal{A}\} < \infty.$$

Definition D.3.9. A bounded linear operator $K$ in $H$ is called compact if $\{K(x_n)\}_1^\infty$ contains a convergent subsequence for all bounded sequences $\{x_n\}_1^\infty$ in $H$.

The next lemma states basic properties of compact operators which follow immediately from the definition of compactness.
Lemma D.3.10.
(i) If $K_1$ and $K_2$ are compact operators, then so is $a_1K_1 + a_2K_2$ for all $a_1, a_2 \in K$.
(ii) If $K$ is a compact and $A$ is a bounded linear operator, then $AK$ and $KA$ are compact.
Theorem D.3.11. If $K$ is a compact self-adjoint operator in $H$, then $K$ has an eigenvalue $\lambda \in \mathbb{R}$ such that $|\lambda| = \|K\|$.

Proof. The case $K = 0$ being trivial, we may assume that $\|K\| \ne 0$. We choose a sequence $\{h_n\}$ in $H$ such that $\|h_n\| = 1$ and $\|Kh_n\|$ converges to $\|K\|$. Since $K$ is self-adjoint we have $(K^2h_n, h_n) = (Kh_n, Kh_n) = \|Kh_n\|^2$. Using this we obtain

$$0 \le \big\|K^2h_n - \|Kh_n\|^2 h_n\big\|^2 = \|K^2h_n\|^2 - 2\|Kh_n\|^2 (K^2h_n, h_n) + \|Kh_n\|^4 = \|K^2h_n\|^2 - \|Kh_n\|^4 \le \|K\|^2\|Kh_n\|^2 - \|Kh_n\|^4.$$

Since the sequence $\{\|Kh_n\|\}$ converges to $\|K\|$ we see that

$$\lim_{n\to\infty} \big\|K^2h_n - \|Kh_n\|^2 h_n\big\| = 0. \tag{1}$$

By Lemma D.3.10, the operator $K^2$ is compact. Thus, we can choose a subsequence $\{K^2h_{n_k}\}$ converging to an element, which we write in the form $\|K\|^2 g$. Relation (1) shows that $\{h_{n_k}\}$ converges to $g$ and

$$K^2 g = \|K\|^2 g.$$

To complete the proof we observe that $\|g\| = 1$ and

$$(K - \|K\|)(K + \|K\|)g = 0.$$

If $h := (K + \|K\|)g = 0$, then $-\|K\|$ is an eigenvalue of $K$. Otherwise we have $(K - \|K\|)h = 0$, i.e., $\|K\|$ is an eigenvalue of $K$.

Theorem D.3.12. Let $K \ne 0$ be a self-adjoint compact operator in $H$. For all $h \in H$ we have

$$h = \sum_{n \in S} (h, \varphi_n)\,\varphi_n \tag{1}$$

and

$$Kh = \sum_{n \in S} (h, \varphi_n)\,\lambda_n \varphi_n \tag{2}$$
where
(i) $S = \{1, \dots, N\}$ with some $N \in \mathbb{N}$ or $S = \mathbb{N}$;
(ii) $\{\lambda_n : n \in S\} \subset \mathbb{R}$ is the set of all nonzero eigenvalues of $K$;
(iii) $\{\varphi_n : n \in S\}$ is an orthonormal system in $H$ and $\varphi_n$ is an eigenvector of $K$ with eigenvalue $\lambda_n$;
(iv) if $S = \mathbb{N}$, then $\lim_{n\to\infty} \lambda_n = 0$.
Proof. By Theorem D.3.11 there exists an eigenvalue $\lambda_1 \in \mathbb{R}$ of $K$ with $|\lambda_1| = \|K\|$. Let $\varphi_1$ be an eigenvector with eigenvalue $\lambda_1$ such that $(\varphi_1, \varphi_1) = 1$ and denote by $H_1$ the subspace generated by $\varphi_1$. We have $H = H_1 \oplus H_1^\perp$ where $H_1$ is $K$-invariant. Since $K$ is self-adjoint, $H_1^\perp$ is $K$-invariant as well. The restriction of $K$ to $H_1^\perp$ is a compact self-adjoint operator with norm at most $\|K\|$. If this restriction is not the zero operator, then we apply Theorem D.3.11 to it. Proceeding in this way we obtain a finite or infinite sequence $\lambda_1, \lambda_2, \dots$ of eigenvalues such that $|\lambda_n| \ge |\lambda_{n+1}| > 0$ and a sequence of corresponding subspaces $H_n$ spanned by eigenvectors $\varphi_n$ of $K$ with eigenvalue $\lambda_n$.

If this sequence is finite, then

$$H = H_1 \oplus \cdots \oplus H_N \oplus \mathcal{N}$$

where $N \in \mathbb{N}$ and $K(\mathcal{N}) = \{0\}$, from which (1) and (2) follow.

Assume that this sequence is infinite and suppose that the sequence $\{|\lambda_n|\}$ has a positive lower bound. Then the sequence $\{\varphi_n/\lambda_n\}$ is bounded but the sequence $\{K\varphi_n/\lambda_n\} = \{\varphi_n\}$ does not contain a convergent subsequence, for $\|\varphi_n - \varphi_m\|^2 = 2$ if $m \ne n$, so that no subsequence can be a Cauchy sequence. Thus, $\lim_n \lambda_n = 0$. We have

$$H = \bigoplus_{n=1}^{\infty} H_n \oplus \mathcal{N}$$

where $\mathcal{N}$ is a $K$-invariant subspace. By our procedure, $\|K|_{\mathcal{N}}\| \le |\lambda_n|$ for all $n$, where $K|_{\mathcal{N}}$ denotes the restriction of $K$ to $\mathcal{N}$. Thus $K(\mathcal{N}) = \{0\}$. The decompositions (1) and (2) follow as in the finite case.

Remark D.3.13. If the Hilbert space $H$ in the previous theorem is separable and infinite-dimensional, then there exists an orthonormal basis $\{\varphi_n\}_1^\infty$ of $H$ consisting of eigenvectors $\varphi_n$ of $K$ with (possibly zero) eigenvalues $\lambda_n$. This follows from the previous proof by selecting an orthonormal basis of the subspace $\mathcal{N}$.
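In finite dimensions the expansions of Theorem D.3.12 reduce to the familiar eigen-decomposition of a symmetric matrix. The sketch below checks both expansions for one invertible $2 \times 2$ example in exact arithmetic. The matrix `K`, its eigenpairs and the test vector `h` are illustrative choices, and the term $(h,\varphi)\varphi$ is computed from an unnormalized eigenvector $u$ as $(h,u)u/(u,u)$ to avoid square roots.

```python
from fractions import Fraction as F

K = [[F(2), F(1)], [F(1), F(2)]]
# Eigenvalues with (unnormalized) eigenvectors; the largest modulus equals the norm of K.
eigenpairs = [(F(3), [F(1), F(1)]), (F(1), [F(1), F(-1)])]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def apply(m, v):
    return [inner(row, v) for row in m]

def rank_one_term(h, u):
    """(h, phi) phi for the unit vector phi = u / ||u||, computed without square roots."""
    c = inner(h, u) / inner(u, u)
    return [c * x for x in u]

def expand(h, weights):
    """Sum of weights[n] * (h, phi_n) phi_n over all eigenvectors."""
    total = [F(0)] * len(h)
    for w, (_, u) in zip(weights, eigenpairs):
        term = rank_one_term(h, u)
        total = [t + w * x for t, x in zip(total, term)]
    return total

h = [F(5), F(-2)]
h_rebuilt = expand(h, [F(1), F(1)])                      # expansion of h
Kh_rebuilt = expand(h, [lam for lam, _ in eigenpairs])   # expansion of Kh
print(h_rebuilt == h, Kh_rebuilt == apply(K, h))
```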
D.4 Convex sets and the theorem of Kreĭn and Milman

A short introduction to the theory of locally convex spaces can be found in Chapter 1 of [3].

D.4.1. Throughout this section the symbol $L$ will denote a linear space over a field $K$ which is either $\mathbb{R}$ or $\mathbb{C}$. We assume that $L$ is equipped with a Hausdorff topology $\mathcal{O}$ such that the following hold:
(i) The mappings $(x,y) \mapsto x + y$ of $L \times L$ into $L$ and $(r,x) \mapsto rx$ of $K \times L$ into $L$ are continuous.⁷
(ii) Each neighborhood of $0 \in L$ contains a convex neighborhood of 0.
The linear space $L$ together with such a topology $\mathcal{O}$ is called locally convex.

Let $A$ and $B$ be subsets of $L$ such that $A \subset B$. Then $A$ is called an extreme subset of $B$ if for all $x, y \in B$ and $p \in (0,1)$ the relation $px + (1-p)y \in A$ implies that $x, y \in A$. In the special case $A = \{a\}$ we say that $a$ is an extreme point of $B$. The set of all extreme points of $B$ will be denoted by $\operatorname{ex}(B)$.

The next separation result is often useful when dealing with convex sets. A proof can be found, e.g., in [3] (cf. Theorem 2.3 in Chapter 1).

Theorem D.4.2. Let $E$ be a locally convex space over $K$. Further, let $F$ and $C$ be disjoint nonempty convex subsets of $E$ such that $F$ is closed and $C$ is compact. Then there exists a continuous linear functional $l : E \to K$ such that

$$\sup_{x \in C} \operatorname{Re} l(x) < \inf_{x \in F} \operatorname{Re} l(x).$$

Since a locally convex space over $\mathbb{C}$ can also be considered as a locally convex space over $\mathbb{R}$, we obtain:

Corollary D.4.3. Under the assumptions of the preceding theorem there exists a continuous $\mathbb{R}$-linear functional $l : E \to \mathbb{R}$ such that

$$\sup_{x \in C} l(x) < \inf_{x \in F} l(x).$$

Theorem D.4.4 (Hahn–Banach). Let $E$ be a linear space over $K$, let $p$ be a seminorm on $E$ and let $L \subset E$ be a linear manifold. For every linear functional $l$ on $L$ such that $|l(x)| \le p(x)$ for all $x \in L$ there exists a linear functional $e$ on $E$ such that $e(x) = l(x)$ for all $x \in L$ and $|e(x)| \le p(x)$ for all $x \in E$.

⁷ Here $K$ is equipped with its usual topology and the product sets are equipped with the product topology.
In the proof of the next theorem we will repeatedly use the following simple fact: If $A$ is an extreme subset of $B$ and $B$ is an extreme subset of $C$, then $A$ is an extreme subset of $C$.

Theorem D.4.5 (Kreĭn–Milman). Every compact convex subset $K$ of a locally convex space $L$ is the closed convex hull of its extreme points.

Proof. First we prove that an arbitrary non-void compact set $C \subset L$ has at least one extreme point. Applying Zorn's lemma we see that $C$ contains a closed extreme subset $M$ which is minimal with respect to inclusion. We claim that $M$ has only one element. This element is then an extreme point of $C$. Assume, on the contrary, that $M$ has more than one element. Then, by Corollary D.4.3, there exists a continuous real linear functional $l$ on $L$ such that $l$ is not constant on $M$. It is easy to check that the set

$$N = \{x \in M : l(x) = \sup l(M)\}$$

is a proper, closed and extreme subset of $M$. This implies that $N$ is a closed extreme subset of $C$, contradicting the minimality of $M$.

Denote by $K_0$ the closed convex hull of the extreme points of $K$. Assume that there exists a $y \in K \setminus K_0$. Corollary D.4.3 with $C = K_0$ and $F = \{y\}$ shows the existence of a continuous real linear functional $l$ on $L$ such that

$$\sup l(K) \ge l(y) > \sup l(K_0). \tag{1}$$

By the first part of the proof, the compact convex set

$$N = \{x \in K : l(x) = \sup l(K)\},$$

which is obviously not empty, contains at least one extreme point $x_0$. As $N$ is an extreme subset of $K$, $x_0$ is an extreme point of $K$ as well and hence $x_0 \in K_0$. Since this contradicts (1), we must have $K = K_0$.

Corollary D.4.6. Let $K$ be a compact convex subset of $L$. For every $x \in K$ there exists a Radon probability measure $\mu_x$ on $\operatorname{ex}(K)$ such that

$$l(x) = \int l(y)\, d\mu_x(y)$$

holds for every continuous linear functional $l$ on $L$.

Proof. By the previous theorem there exists a net $\{x_\alpha\}$ of points of $K$ of the form

$$x_\alpha = \sum_{j=1}^{n_\alpha} p_\alpha^j x_\alpha^j, \qquad n_\alpha < \infty,$$

where $x_\alpha^j \in \operatorname{ex}(K)$, $p_\alpha^j \ge 0$, $\sum_j p_\alpha^j = 1$, such that $\lim_\alpha x_\alpha = x$. We define the probability measures $\mu_\alpha$ on $\operatorname{ex}(K)$ by

$$\mu_\alpha = \sum_{j=1}^{n_\alpha} p_\alpha^j\, \delta_{x_\alpha^j}.$$
Since $\operatorname{ex}(K)$ is compact, Theorem E.1.13 shows the existence of a subnet $\{\mu_\beta\}$ converging weakly to a probability measure $\mu_x$ on $\operatorname{ex}(K)$. For an arbitrary continuous linear functional $l$ we then have

$$\int l(y)\, d\mu_x(y) = \lim_\beta \int l(y)\, d\mu_\beta(y) = \lim_\beta \sum_j p_\beta^j\, l(x_\beta^j) = l(x),$$

completing the proof.
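For a triangle in the plane the representing measure of Corollary D.4.6 is simply given by barycentric coordinates. The sketch below is a finite-dimensional illustration only: the vertices, the point `x` and the functional `l` are invented for the example. It computes the weights $p_j$ and checks that $l(x) = \sum_j p_j\, l(x_j)$, where the $x_j$ are the three extreme points.

```python
from fractions import Fraction as F

def barycentric(x, a, b, c):
    """Weights (p_a, p_b, p_c) with p_a*a + p_b*b + p_c*c = x and p_a + p_b + p_c = 1."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    p_b = ((x[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (x[1] - a[1])) / det
    p_c = ((b[0] - a[0]) * (x[1] - a[1]) - (x[0] - a[0]) * (b[1] - a[1])) / det
    return (1 - p_b - p_c, p_b, p_c)

def l(y):
    """A hypothetical linear functional on the plane."""
    return 3 * y[0] - y[1]

# Extreme points of the compact convex set K (a triangle) and a point x of K.
vertices = [(F(0), F(0)), (F(4), F(0)), (F(0), F(4))]
x = (F(1), F(2))

weights = barycentric(x, *vertices)
integral = sum(p * l(v) for p, v in zip(weights, vertices))  # integral of l against the discrete measure
print(weights, integral == l(x))
```

The weights are nonnegative and sum to one precisely because `x` lies in the triangle; for a point outside, at least one weight would be negative.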
D.5 Weak topologies

D.5.1. Let $V$ be a normed linear space and denote by $V^*$ the linear space of all bounded linear functionals on $V$. Equipped with the norm (D.3.3.1), $V^*$ is a Banach space. The weak topology on $V$ is the coarsest topology (the topology with the fewest open sets) such that each element of $V^*$ is a continuous function. The weak-$*$ topology on $V^*$ is the coarsest topology such that for all $v \in V$ the maps $l \mapsto l(v)$, $l \in V^*$, are continuous. Thus, a net $\{v_\alpha\}$ in $V$ converges weakly to $v \in V$ if and only if

$$\lim_\alpha l(v_\alpha) = l(v), \qquad l \in V^*,$$

while a net $\{l_\alpha\}$ in $V^*$ converges in the weak-$*$ topology to $l \in V^*$ if and only if

$$\lim_\alpha l_\alpha(v) = l(v), \qquad v \in V.$$

In case of weak and weak-$*$ convergence we write $v_\alpha \xrightarrow{w} v$ and $l_\alpha \xrightarrow{w*} l$, respectively. The limits of weakly and weak-$*$ convergent sequences are unique. Moreover, strong convergence implies weak and weak-$*$ convergence.

If $V = H$ is a Hilbert space then, by Theorem D.3.4, $V^*$ can be identified with $H$. Moreover, the weak and weak-$*$ topologies on $H$ coincide and a net $\{h_\alpha\}$ converges weakly to $h$ if and only if

$$\lim_\alpha (h_\alpha, g) = (h, g), \qquad g \in H.$$

In $H$ any orthonormal sequence $\{e_n\}_{n=1}^\infty$ converges weakly to 0. This follows from Bessel's inequality

$$\sum_{n=1}^{\infty} |(e_n, h)|^2 \le \|h\|^2, \qquad h \in H.$$

Since $\|e_n\| = 1$, the sequence $\{e_n\}$ does not converge strongly to 0.

Let $H_1$ and $H_2$ be Hilbert spaces. The weak operator topology on the set $B(H_1,H_2)$ of all bounded linear operators from $H_1$ to $H_2$ (cf. D.3.3) is the coarsest topology rendering all of the maps

$$A \mapsto (Ag, h), \qquad g \in H_1,\ h \in H_2,$$
continuous, where $(\cdot,\cdot)$ denotes the inner product of $H_2$. A net $\{A_\alpha\}$ of operators $A_\alpha \in B(H_1,H_2)$ converges in the weak operator topology to an operator $A \in B(H_1,H_2)$ if and only if

$$\lim_\alpha (A_\alpha g, h) = (Ag, h), \qquad g \in H_1,\ h \in H_2.$$

The relation above is equivalent to

$$A_\alpha g \xrightarrow{w} Ag, \qquad g \in H_1.$$

In case of convergence in the weak operator topology we write $A_\alpha \xrightarrow{wo} A$.

Lemma D.5.2. Multiplication of bounded operators in a Hilbert space $H$ is separately continuous in the weak operator topology, i.e., if $A_\alpha \xrightarrow{wo} A$ then $A_\alpha B \xrightarrow{wo} AB$ and $BA_\alpha \xrightarrow{wo} BA$.

Proof. Assume that $A_\alpha \xrightarrow{wo} A$ so that

$$\lim_\alpha (A_\alpha g, h) = (Ag, h), \qquad g, h \in H.$$

Replacing $g$ by $Bg$ we see that $A_\alpha B \xrightarrow{wo} AB$. The second statement follows from

$$\lim_\alpha (BA_\alpha g, h) = \lim_\alpha (A_\alpha g, B^*h) = (Ag, B^*h) = (BAg, h).$$

Remark D.5.3. Multiplication of operators is not jointly continuous in the weak operator topology. To see an example, let $H$ be a Hilbert space with orthonormal basis $\{e_n\}_{n=-\infty}^{\infty}$ and write $Ue_n := e_{n+1}$. It is easy to check that $U$ can be extended to a unitary operator in $H$. Using that any orthonormal sequence in $H$ converges weakly to 0 (cf. D.5.1) we see that $U^n \xrightarrow{wo} 0$ and $U^{-n} \xrightarrow{wo} 0$, $n \to \infty$. On the other hand, $U^n U^{-n}$ is the identity operator for all $n$.

Theorem D.5.4. Convergence of a net in the weak, weak-$*$ or weak operator topology implies that the net is bounded.

Proof. Let $V$ be a normed linear space and let $\{l_\alpha\}$ be a net in $V^*$ with $l_\alpha \xrightarrow{w*} l$. Then for each $v \in V$ the net $\{l_\alpha(v)\}$ converges to $l(v)$ and hence it is bounded. Theorem D.3.8 shows that the net $\{l_\alpha\}$ is bounded.

Assume that $\{v_\alpha\}$ is a weakly convergent net in $V$ and define the linear functional $L_\alpha$ on $V^*$ by $L_\alpha(l) := l(v_\alpha)$. Then $\|L_\alpha\| = \|v_\alpha\|$ and the net $\{L_\alpha\}$ converges in the weak-$*$ topology. By the first part of the proof $\{L_\alpha\}$, and hence also $\{v_\alpha\}$, is bounded.

Let $\{A_\alpha\}$ be a net in $B(H_1,H_2)$ where $H_1$ and $H_2$ are Hilbert spaces. If $A_\alpha \xrightarrow{wo} A$, then $A_\alpha g \xrightarrow{w} Ag$ for all $g \in H_1$. It follows that $\{A_\alpha g\}$ is bounded for all $g \in H_1$. Applying again Theorem D.3.8, we conclude that $\{A_\alpha\}$ is bounded.
Theorem D.5.5. Let $V$ be a normed linear space, $H_1, H_2$ be Hilbert spaces, $\{v_\alpha\}$, $\{l_\alpha\}$ and $\{A_\alpha\}$ be nets in $V$, $V^*$ and $B(H_1,H_2)$, respectively.⁹
(i) If $v_\alpha \xrightarrow{w} v$, then $\|v\| \le \liminf_\alpha \|v_\alpha\|$.
(ii) If $l_\alpha \xrightarrow{w*} l$, then $\|l\| \le \liminf_\alpha \|l_\alpha\|$.
(iii) If $A_\alpha \xrightarrow{wo} A$, then $\|A\| \le \liminf_\alpha \|A_\alpha\|$.

⁹ For a net $\{r_\alpha\}$ of real numbers $\liminf_\alpha r_\alpha$ is defined by $\liminf_\alpha r_\alpha = \lim_\alpha \inf\{r_\beta : \beta \ge \alpha\}$.

Proof. (i) Using the linear functionals $L_\alpha$ from Theorem D.5.4 we see that (i) follows from (ii).

(ii) Let $\{l_\beta\}$ be a subnet of $\{l_\alpha\}$ such that $\lim_\beta \|l_\beta\| = \liminf_\alpha \|l_\alpha\|$. For all $v \in V$ we have

$$|l(v)| = \lim_\beta |l_\beta(v)| \le \lim_\beta \|l_\beta\|\,\|v\| = \liminf_\alpha \|l_\alpha\|\,\|v\|$$

from which (ii) follows.

(iii) The proof is similar to that of (ii). We choose a subnet $\{A_\beta\}$ of $\{A_\alpha\}$ such that $\lim_\beta \|A_\beta\| = \liminf_\alpha \|A_\alpha\|$. For all $g \in H_1$ and $h \in H_2$ we have

$$|(Ag,h)| = \lim_\beta |(A_\beta g, h)| \le \lim_\beta \|A_\beta\|\,\|g\|\,\|h\| = \liminf_\alpha \|A_\alpha\|\,\|g\|\,\|h\|$$

from which the inequality in (iii) follows.

Theorem D.5.6 (Banach–Alaoglu). Let $V$ be a normed linear space and $K \subset V^*$ be a bounded set which is closed in the weak-$*$ topology. Then $K$ is compact in the weak-$*$ topology.

Proof. It suffices to consider the special case $K = \{l \in V^* : \|l\| \le r\}$, $r > 0$. This set is weak-$*$ closed in view of (D.5.5.ii) and $|l(v)| \le r\|v\|$ for all $l \in K$ and all $v \in V$. The weak-$*$ topology is the topology of pointwise convergence. Hence, by Tychonoff's theorem, each net in $K$ has a subnet converging pointwise to some function $l_0$ on $V$. This function is obviously linear and $|l_0(v)| \le r\|v\|$, i.e., $l_0 \in K$. Thus, $K$ is compact in the weak-$*$ topology.

Corollary D.5.7. Let $H_1$ and $H_2$ be Hilbert spaces and let $S \subset B(H_1,H_2)$ be closed in the weak operator topology. Then $S$ is compact in the weak operator topology if and only if $Sg$ is bounded for all $g \in H_1$.

Proof. The "if part" follows by the same argument as in the proof of the previous theorem. Here we use the fact that the weak operator topology is the topology of pointwise convergence, considering the weak-$*$ topology on $H_2$. To show the "only if part", assume that $Sg$ is unbounded for some $g \in H_1$. Then for all $n \in \mathbb{N}$ there exists $s_n \in S$ such that $\|s_n g\| \ge n$. By compactness we can choose a subsequence $\{s_{n_k}\}$ which is convergent in the weak operator topology. In view of
Theorem D.5.4 this subsequence is bounded and hence $\{s_{n_k} g\}$ is bounded as well. This contradiction completes the proof.

Theorem D.5.8. Let $V$ be a normed linear space and let $K \subset V$ be convex. Then the weak and strong closures of $K$ coincide.

Proof. It suffices to prove that every strongly closed convex set $E \subset V$ is weakly closed. By Corollary D.4.3, for all $x \notin E$ there exists a continuous real linear functional $l_x$ on $V$ such that

$$m_x := \inf\{l_x(y) : y \in E\} > l_x(x).$$

Then

$$E = \bigcap_{x \notin E} \{y \in V : l_x(y) \ge m_x\}.$$

Since each set $\{y \in V : l_x(y) \ge m_x\}$ is weakly closed, the set $E$ is weakly closed as well.

Theorem D.5.9. If $S \subset B(H_1,H_2)$ is compact in the weak operator topology, then so is the weak operator closure of its convex hull.

Proof. Corollary D.5.7 and Theorem D.3.8 show that $S$ is bounded and hence so is its convex hull. By (D.5.5.iii) the weak operator closure $Q$ of the convex hull is bounded as well. Therefore, the compactness of $Q$ follows from Corollary D.5.7.

Lemma D.5.10. Let $B$ be a Banach space such that $B$ ($B^*$) is strictly convex.¹⁰ If $K$ is a convex subset of $B$ ($B^*$) and $K$ is closed in the weak (weak-$*$) topology, then there exists a unique element of $K$ with minimal norm.

Proof. We consider only $B$; the statement about $B^*$ can be proved in the same way. Choose a sequence $\{v_n\}$ in $K$ such that

$$\lim_{n\to\infty} \|v_n\| = \inf\{\|v\| : v \in K\}.$$

By Theorem D.5.6 this sequence has a subsequence converging weakly to some element $v \in B$. Since $K$ is weakly closed we have $v \in K$. Theorem D.5.5 shows that the norm of $v$ is minimal. The uniqueness of $v$ follows from the fact that $B$ is strictly convex.

Theorem D.5.11. Let $S$ be a convex semigroup of linear contractions in $H$. If $S$ is compact in the weak operator topology, then there exists a unique orthogonal projection $s_0 \in S$ such that $s_0 s = s s_0 = s_0$ holds for all $s \in S$.¹¹

¹⁰ A normed space $V$ is called strictly convex if the relations $\|v\| = \|w\|$ and $v \ne w$ imply the inequality $\|(v + w)/2\| < \|v\|$. Hilbert spaces are strictly convex.
¹¹ A more general result can be found in [48].
Proof. Denote by $H_0$ the set of common fixed points of the operators in $S$. Then $0 \in H_0$ and $H_0$ is an $S$-invariant subspace of $H$. Write

$$S^* := \{s^* : s \in S\}.$$

It is easy to check that $S^*$ is a compact convex semigroup of contractions. Moreover, the subspace $H_0^\perp$ is $S^*$-invariant. Indeed, if $h_0 \in H_0$ and $h \in H_0^\perp$ are arbitrary then

$$(s^*h, h_0) = (h, sh_0) = (h, h_0) = 0, \qquad s \in S.$$

For all $h \in H$ the set $S^*h$ is convex. Using the fact that $S^*$ is compact we see that $S^*h$ is weakly compact. By Lemma D.5.10 there exists a unique $h_0 \in S^*h$ having minimal norm. Since $S^*$ is a semigroup of contractions, $s^*h_0$ is in $S^*h$ and $\|s^*h_0\| \le \|h_0\|$ for all $s \in S$. By the uniqueness of $h_0$ we must have $s^*h_0 = h_0$.

We prove that $h_0 = 0$ if $h \in H_0^\perp$. Let $g \in H$ be arbitrary and consider the weakly compact convex set $Sg$. The same arguments as above show the existence of $g_0 \in Sg$ such that $sg_0 = g_0$ for all $s \in S$. Thus, $g_0 \in H_0$ and hence $(g_0, h_0) = 0$. Choose $s_0 \in S$ such that $g_0 = s_0 g$. Then

$$(g, h_0) = (g, s_0^*h_0) = (s_0 g, h_0) = (g_0, h_0) = 0, \qquad g \in H,$$

and hence $h_0 = 0$. We have thus shown that for all $h_0 \in H_0^\perp$ the set $S^*h_0$ contains 0.

Now let $h_1, \dots, h_n \in H_0^\perp$ be arbitrary and choose $s_1 \in S$ such that $s_1^*h_1 = 0$. Next we choose $s_2 \in S$ such that $s_2^*(s_1^*h_2) = 0$. Continuing this process we obtain an operator $s \in S$ with $s^*h_j = 0$ for all $j = 1, \dots, n$. A simple compactness argument shows the existence of an operator $s_0 \in S$ such that $s_0^*h = 0$ for all $h \in H_0^\perp$. It follows that $s_0 H \subset H_0$ and hence $ss_0 = s_0$.

It remains to prove that $s_0 s = s_0$. Note first that the equations $s_0 s s_0 = s_0$ and $s_0 s s_0 s = s_0 s$ hold because of $ss_0 = s_0$. We have

$$\|s^*s_0^*h\| = \|s^*s_0^*s^*s_0^*h\| \le \|s_0^*s^*s_0^*h\| \le \|s^*s_0^*h\|, \qquad h \in H.$$

Using this and the relation $s_0^*s^*s_0^* = s_0^*$ we see that $\|s^*s_0^*h\| = \|s_0^*h\|$. If $s_0^*h \ne s^*s_0^*h$ for some $h \in H$, then

$$\|s_0^*h\| = \|s^*s_0^*h\| = \tfrac{1}{2}\,\|s_0^*(s^*s_0^*h + s_0^*h)\| \le \tfrac{1}{2}\,\|s^*s_0^*h + s_0^*h\| < \|s_0^*h\|.$$

This contradiction shows that $s_0^* = s^*s_0^*$ and therefore $s_0 s = s_0$. The uniqueness of $s_0$ follows at once from $ss_0 = s_0 s = s_0$.
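A two-dimensional toy case may help to visualize Theorem D.5.11. In the sketch below, which is an illustrative example rather than anything taken from the text, $S$ consists of the convex combinations $s(t) = tI + (1-t)P$, $0 \le t \le 1$, of the identity and the coordinate swap $P$ on $\mathbb{R}^2$. This is a compact convex semigroup of contractions, and $s_0 = s(1/2)$ is the orthogonal projection onto the common fixed points, that is, onto the diagonal.

```python
from fractions import Fraction as F

I = [[F(1), F(0)], [F(0), F(1)]]
P = [[F(0), F(1)], [F(1), F(0)]]   # swaps the two coordinates

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def s(t):
    """The element t*I + (1 - t)*P of the hypothetical semigroup S, for 0 <= t <= 1."""
    return [[t * I[i][j] + (1 - t) * P[i][j] for j in range(2)] for i in range(2)]

s0 = s(F(1, 2))                    # candidate for the projection in the theorem
samples = [F(0), F(1, 3), F(1, 2), F(3, 4), F(1)]

absorbing = all(mul(s0, s(t)) == s0 == mul(s(t), s0) for t in samples)
print(absorbing, mul(s0, s0) == s0)
```

The product of two elements is again of the form $s(r)$ with $r = tu + (1-t)(1-u)$, which is why $S$ is closed under multiplication.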
Appendix E Measure theory

E.1 Borel measures, weak and vague convergence
Mainly we need only measures on $\mathbb{R}^d$ or on $\mathbb{Z}^d$. However, at a few places we use Borel measures on topological spaces in general. In this section we collect some basic facts that will be sufficient for our purposes. We refer to Chapter 2 of [3] or to §§30–31 of [2] for the proofs.¹

Throughout this section $X$ will denote a Hausdorff topological space. The measures considered will be defined on the $\sigma$-algebra $\mathcal{B}(X)$ of all Borel subsets of $X$, i.e., the $\sigma$-algebra generated by the open subsets of $X$. A measure (nonnegative, real or complex) defined on $\mathcal{B}(X)$ will be called a Borel measure. In our terminology a measure need not be nonnegative. If $\mu$ and $\nu$ are finite Borel measures and $c \in \mathbb{C}$, then the measures $c\mu$ and $\mu + \nu$ are defined by $(c\mu)(B) = c\,\mu(B)$ and $(\mu + \nu)(B) = \mu(B) + \nu(B)$, $B \in \mathcal{B}(X)$.

Definition E.1.1. A nonnegative Radon measure on $X$ is a nonnegative Borel measure $\mu$ such that
(i) $\mu(K) < \infty$ for each compact set $K \subset X$;
(ii) $\mu(B) = \sup\{\mu(K) : K \subset B,\ K \text{ compact}\}$ for each Borel set $B \subset X$.

E.1.2. The set of all nonnegative Radon measures is denoted by $M_+(X)$ while $M_+^b(X)$ denotes the set of all finite measures in $M_+(X)$. A complex Radon measure on $X$ is a complex measure of the form

$$\mu = (\mu_1 - \mu_2) + i(\mu_3 - \mu_4)$$

where the $\mu_j$'s are in $M_+^b(X)$. If a Borel measurable function $f$ is $\mu_j$-integrable for all $j$, then we say that $f$ is $\mu$-integrable. The integral $\int f\, d\mu$ is then defined by

$$\int f\, d\mu = \int f\, d\mu_1 - \int f\, d\mu_2 + i\int f\, d\mu_3 - i\int f\, d\mu_4.$$

We write $M^b(X)$ for the set of all complex Radon measures on $X$ while $M^f(X)$ stands for the set of all complex measures with finite support.

¹ We remark that there are some small differences in the terminology of these books, e.g., in the definitions of Borel and Radon measures.
We define the measure $\bar\mu$ by $\bar\mu(B) = \overline{\mu(B)}$, $B \in \mathcal{B}(X)$. If $\mu \in M^b(\mathbb{R}^d)$, then the measures $\check\mu$ and $\tilde\mu$ are defined by

$$\check\mu(B) = \mu(-B) \qquad \text{and} \qquad \tilde\mu(B) = \overline{\mu(-B)}, \qquad B \in \mathcal{B}(\mathbb{R}^d),$$

respectively. Note that on $\mathbb{R}^d$ each nonnegative Borel measure that is finite on compact sets is a Radon measure.

If $\mu$ and $\nu$ are complex Radon measures on a locally compact space $X$ and

$$\int f\, d\mu = \int f\, d\nu, \qquad f \in C_{00}(X),$$

then $\mu = \nu$.

Let $\mu$ be a complex Radon measure on a locally compact space $X$. Then there exist a finite nonnegative Borel measure $|\mu|$ and a Borel measurable function $g$ on $X$ such that² $|g(x)| = 1$ for all $x \in X$ and

$$\int f\, d\mu = \int fg\, d|\mu| \tag{1}$$

for every $\mu$-integrable function $f$. This equation implies the inequality

$$\left|\int f\, d\mu\right| \le \int |f|\, d|\mu|.$$

Theorem E.1.3 (Riesz representation theorem). Let $X$ be locally compact. Then there is a bijection between all nonnegative linear functionals³ $L$ on $C_{00}(X)$ and all Radon measures $\mu$ on $X$ given by

$$L(f) = \int f\, d\mu, \qquad f \in C_{00}(X).$$

Theorem E.1.4 (Riesz–Markov theorem). Let $X$ be locally compact. For any continuous⁴ linear functional $L$ on $C_0(X)$, there is a unique complex Radon measure $\mu$ on $X$ such that

$$L(f) = \int f\, d\mu, \qquad f \in C_0(X).$$

The norm of $L$ as a linear functional is $|\mu|(X)$.

Lemma E.1.5. Let $\mathcal{F}$ be a linear space of real-valued bounded functions on some set $X$ such that $1 := 1_X \in \mathcal{F}$. Further, let $L$ be a real linear functional on $\mathcal{F}$ such that $L(1) = 1$ and

$$|L(f)| \le \sup_{x \in X} |f(x)|, \qquad f \in \mathcal{F}.$$

Then $L$ is nonnegative, i.e., $L(p) \ge 0$ whenever $p \ge 0$.

² See Theorem 14.14 and Corollary 11.41 in [27].
³ Nonnegativity means that $L(f) \ge 0$ whenever $f \ge 0$.
⁴ We consider the supremum norm on $C_0(X)$.
Proof. If $L$ was not nonnegative, then we could find a function $p \in \mathcal{F}$ such that $0 \le p \le 1$ and $L(p) < 0$. From $1 = L(1) = L(p) + L(1 - p)$ it follows that $L(1 - p) > 1$. This contradicts the assumptions since $0 \le 1 - p \le 1$.

Definition E.1.6. The weak topology on $M_+^b(X)$ is the coarsest topology (the topology with the fewest open sets) such that the functions $\mu \mapsto \int f\, d\mu$ become lower semicontinuous for every bounded lower semicontinuous real-valued function $f$ on $X$.⁵

The space $M_+^b(X)$ is a Hausdorff space in the weak topology. Convergence in the weak topology is characterized by the next theorem.

Theorem E.1.7. For $\mu \in M_+^b(X)$ and a net $\{\mu_\alpha\}$ in $M_+^b(X)$ the following properties are equivalent:
(i) $\mu_\alpha \to \mu$ in the weak topology;
(ii) $\liminf \mu_\alpha(G) \ge \mu(G)$ for all open $G \subset X$ and $\lim \mu_\alpha(X) = \mu(X)$;
(iii) $\limsup \mu_\alpha(F) \le \mu(F)$ for all closed $F \subset X$ and $\lim \mu_\alpha(X) = \mu(X)$;
(iv) $\liminf \int f\, d\mu_\alpha \ge \int f\, d\mu$ for all bounded lower semicontinuous $f : X \to \mathbb{R}$;
(v) $\limsup \int f\, d\mu_\alpha \le \int f\, d\mu$ for all bounded upper semicontinuous $f : X \to \mathbb{R}$.
If (i)–(v) are fulfilled, then
(vi) $\lim \int f\, d\mu_\alpha = \int f\, d\mu$ for all bounded continuous $f : X \to \mathbb{R}$.
Finally, if $X$ is a completely regular space,⁶ then property (vi) implies (i)–(v).

We say that a net $\{\mu_\alpha\}$ of complex Radon measures on a completely regular space $X$ converges weakly to some complex measure $\mu$ if the relation (E.1.7.vi) holds.

The set of finitely supported nonnegative measures is a dense subset of $M_+^b(X)$ with respect to the weak topology. This set is dense even in a stronger sense:

Theorem E.1.8. For every $\mu \in M_+^b(X)$ there exists a net $\{\mu_\alpha\}$ of nonnegative finitely supported measures such that

$$\lim_\alpha \mu_\alpha(B) = \mu(B), \qquad B \in \mathcal{B}(X).$$

⁵ Note that a real-valued function $f$ on $X$ is called lower (upper) semicontinuous if the set $\{x \in X : f(x) > r\}$ (the set $\{x \in X : f(x) < r\}$, respectively) is open for every $r \in \mathbb{R}$.
⁶ A topological space $X$ is completely regular if, given any closed set $F$ and any point $x \in X \setminus F$, there is a continuous real-valued function $f$ on $X$ such that $f(x) = 0$ and $f(y) = 1$ for every $y \in F$. Locally compact Hausdorff spaces and metric spaces are completely regular.
The next simple result shows that finitely supported measures on $\mathbb{R}^d$ can be approximated by absolutely continuous measures with compact support.

Lemma E.1.9. For every finitely supported nonnegative measure $\mu$ on $\mathbb{R}^d$ there exists a sequence $\{h_n\}$ of nonnegative functions $h_n \in C_{00}(\mathbb{R}^d)$ such that the sequence $\{\mu_n\}$ where $d\mu_n(t) = h_n(t)\, d\lambda(t)$ converges weakly to $\mu$.

Proof. Consider first the special case $\mu = \delta_x$, $x \in \mathbb{R}^d$, and denote by $B_n(x)$ the open ball with center $x$ and radius $1/n$, $n \in \mathbb{N}$. For each $n$ we choose a nonnegative function $h_n$ such that $\int h_n\, d\lambda = 1$ and $h_n(t) = 0$, $t \notin B_n(x)$, and define $\mu_n$ as in the statement of the lemma. Now let $f$ be an arbitrary continuous real-valued function on $\mathbb{R}^d$. For an arbitrary $\varepsilon > 0$ there exists $N(\varepsilon) \in \mathbb{N}$ such that

$$f(x) - \varepsilon < f(t) < f(x) + \varepsilon, \qquad t \in B_n(x),\ n \ge N(\varepsilon).$$

Integrating both sides with respect to $\mu_n$ gives

$$f(x) - \varepsilon < \int f(t)\, d\mu_n(t) < f(x) + \varepsilon, \qquad n \ge N(\varepsilon).$$

Noting that $\int f\, d\mu = f(x)$ and applying Theorem E.1.7 we see that $\{\mu_n\}$ converges weakly to $\mu$. The general case follows by linearity.

Definition E.1.10. The vague topology on $M_+(X)$ is the coarsest topology such that the functions $\mu \mapsto \int f\, d\mu$ are continuous for every function $f \in C_{00}(X)$.

Convergence in the vague topology is characterized by the next theorem.

Theorem E.1.11. Let $X$ be locally compact. For $\mu \in M_+(X)$ and a net $\{\mu_\alpha\}$ in $M_+(X)$ the following properties are equivalent:
(i) $\mu_\alpha \to \mu$ in the vague topology;
(ii) $\limsup \mu_\alpha(K) \le \mu(K)$ for all compact $K \subset X$ and $\liminf \mu_\alpha(G) \ge \mu(G)$ for all relatively compact⁷ open sets $G \subset X$;
(iii) $\lim \mu_\alpha(B) = \mu(B)$ for all relatively compact sets $B \in \mathcal{B}(X)$ such that $\mu(\partial B) = 0$.⁸

⁷ A set is called relatively compact if its closure is compact.
⁸ The symbol $\partial B$ denotes the boundary of $B$. It consists of all points of the closure of $B$ which are not inner points of $B$.

The following relationship holds between the weak and the vague topologies:
Theorem E.1.12. Suppose that $X$ is locally compact and let $\{\mu_\alpha\}$ be a net in $M_+^b(X)$. Then $\{\mu_\alpha\}$ converges weakly to a measure $\mu \in M_+^b(X)$ if and only if it converges vaguely to $\mu$ and $\lim \mu_\alpha(X) = \mu(X)$.

The next theorem is very useful. A special case of it is known as Helly's selection theorem.

Theorem E.1.13. Suppose that $X$ is locally compact and let $p > 0$. The set $\{\mu \in M_+^b(X) : \mu(X) \le p\}$ is vaguely compact. If $X$ is compact, then the set of all Radon probability measures on $X$ is compact in the weak topology.

If $\mu$ and $\nu$ are Radon measures on spaces $X$ and $Y$, then there is a unique Radon measure $\mu \otimes \nu$ on $X \times Y$, called product measure, satisfying

$$(\mu \otimes \nu)(A \times B) = \mu(A)\,\nu(B) \qquad \text{for all compact } A \subset X,\ B \subset Y.$$

The next theorem states that the product measure depends continuously on both of its arguments.

Theorem E.1.14. Let $X$ and $Y$ be Hausdorff spaces. Then $(\mu, \nu) \mapsto \mu \otimes \nu$ is weakly continuous as a mapping from $M_+^b(X) \times M_+^b(Y)$ to $M_+^b(X \times Y)$. If $X$ and $Y$ are locally compact, then $(\mu, \nu) \mapsto \mu \otimes \nu$ is vaguely continuous as a mapping from $M_+(X) \times M_+(Y)$ to $M_+(X \times Y)$.

Theorem E.1.15. Let $X$ and $Y$ be Hausdorff spaces and let $f : X \to Y$ be a continuous mapping. Then for any $\mu \in M_+^b(X)$ the image measure $\mu^f$ belongs to $M_+^b(Y)$. Moreover, the transformation $\mu \mapsto \mu^f$ from $M_+^b(X)$ to $M_+^b(Y)$ is continuous in the weak topology.

The previous theorem has a useful corollary.

Corollary E.1.16. Let $f : \mathbb{R}^d \to \mathbb{R}^k$ be continuous. If a sequence $\{X_n\}$ of $d$-dimensional random vectors converges in distribution⁹ to a random vector $X$, then $\{f(X_n)\}$ converges in distribution to $f(X)$.

Theorem E.1.17. Let $(O_\alpha)_{\alpha \in I}$ be an open covering of $X$ and on each $O_\alpha$ let a Radon measure $\mu_\alpha$ be given such that $\mu_\alpha(B) = \mu_\beta(B)$ for each pair of indices $\alpha, \beta \in I$ and for each Borel set $B \subset O_\alpha \cap O_\beta$. Then there is a uniquely determined Radon measure $\mu$ on $X$ such that $\mu(B) = \mu_\alpha(B)$ for every Borel set $B \subset O_\alpha$.

⁹ Recall that, by definition, a sequence of random vectors converges in distribution to a random vector $X$ if the sequence of the corresponding distributions converges weakly to the distribution of $X$.
Lemma E.1.18. Let $f$ be a Borel measurable function on a locally compact Hausdorff space $X$ and $\mu$ be a complex Radon measure on $X$. Suppose that there exists $K \ge 0$ such that the inequality

$$\left| \int_S f \, d\mu \right| \le K$$

holds for all compact sets $S \subset X$. Then $f$ is $\mu$-integrable and the same inequality holds with $S$ replaced by $X$.

Proof. Let $g$ be as in (E.1.2.1) and write $h = fg$. Then

$$\left| \int_S f \, d\mu \right| = \left| \int_S h \, d|\mu| \right| \le K$$

holds for all compact sets $S \subset X$. We show first that $h$ is $|\mu|$-integrable. Since $|\mathrm{Re}\, z| \le |z|$ and $|\mathrm{Im}\, z| \le |z|$, $z \in \mathbb{C}$, we may suppose that $h$ is real-valued. Setting $h^+ = \max(0, h)$, $h^- = -\min(0, h)$ and $S_h^+ = \{x \in X : h(x) \ge 0\}$, we have $h = h^+ - h^-$ and

$$0 \le \int_S h^+ \, d|\mu| = \int_{S \cap S_h^+} h \, d|\mu| \le K.$$

The validity of the last inequality follows from the fact that $\mu$ is a Radon measure and hence the integral over $S \cap S_h^+$ can be approximated by integrals over compact sets. From this we conclude that $h^+$ is $|\mu|$-integrable. The integrability of $h^-$ follows in the same way. Thus $h$ is integrable. Since $|f| = |h|$, the function $f$ is integrable as well. As $\mu$ is a Radon measure, there exist compact sets $K_n \subset X$, $n \in \mathbb{N}$, such that

$$\lim_{n \to \infty} \int_{K_n} f \, d\mu = \int_X f \, d\mu,$$

from which the required inequality follows.

Lemma E.1.19 (Scheffé). Let $\mu$ be a nonnegative measure on a measurable space $(\Omega, \mathcal{A})$ and let $f, f_n$ be nonnegative, $\mu$-integrable functions such that

$$\lim_{n \to \infty} f_n = f$$

$\mu$-almost everywhere and

$$\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu.$$

Then

$$\lim_{n \to \infty} \int |f_n - f| \, d\mu = 0.$$
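Proof. The function $f_n + f - |f_n - f|$ is obviously nonnegative. Applying Fatou's lemma we obtain

$$2 \int f \, d\mu = \int \liminf_{n \to \infty} \big(f_n + f - |f_n - f|\big) \, d\mu \le \liminf_{n \to \infty} \int \big(f_n + f - |f_n - f|\big) \, d\mu = 2 \int f \, d\mu - \limsup_{n \to \infty} \int |f_n - f| \, d\mu,$$

from which the statement follows.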
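As a small numerical illustration of Lemma E.1.19, the sketch below takes the counting measure on the nonnegative integers as the underlying measure and Poisson probability mass functions as the functions $f_n$ and $f$. This setting, the truncation of the sums at a finite cutoff, and the helper names `poisson_pmf` and `l1_distance` are assumptions made only for this example and do not come from the text. The masses with parameter $2 + 1/n$ converge pointwise to those with parameter $2$ and all have total mass one, so the lemma predicts that the $L^1$ distances tend to zero.

```python
import math

def poisson_pmf(rate, k):
    """Poisson probability mass at k, evaluated in log-space for stability."""
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))

def l1_distance(rate_a, rate_b, cutoff=200):
    """Sum of |f_a(k) - f_b(k)| over k = 0, ..., cutoff - 1.

    The cutoff truncates the counting measure; the neglected tail is
    negligible for the small rates used here.
    """
    return sum(abs(poisson_pmf(rate_a, k) - poisson_pmf(rate_b, k))
               for k in range(cutoff))

# f_n = Poisson(2 + 1/n) converges pointwise to f = Poisson(2), and each
# f_n has total mass 1, so Scheffe's lemma predicts L^1 convergence.
distances = [l1_distance(2.0 + 1.0 / n, 2.0) for n in (1, 10, 100, 1000)]
```

The list `distances` can be inspected directly; in this example it shrinks as $n$ grows, which is the behaviour the lemma describes.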
E.2 Convolution of measures and functions
This section contains basic facts about the convolution of functions and measures. Proofs and more details can be found in the books [2, 3, 27, 28]. All measures in this section will be complex Borel measures on $\mathbb{R}^d$ (cf. Section E.1).

Definition E.2.1. The additive convolution or simply convolution $\mu * \nu$ of two finite Borel measures $\mu$ and $\nu$ is defined as the image of the product measure $\mu \otimes \nu$ under the mapping $A : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ where $A(x, y) = x + y$. If $d = 1$, then the multiplicative convolution $\mu \star \nu$ is the image of the product measure $\mu \otimes \nu$ under the mapping $M : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ where $M(x, y) = xy$.

E.2.2. The convolution $\mu * \nu$ and the multiplicative convolution $\mu \star \nu$ are again finite Borel measures satisfying

$$\int f \, d(\mu * \nu) = \iint f(x + y) \, d\mu(x) \, d\nu(y)$$

and

$$\int g \, d(\mu \star \nu) = \iint g(xy) \, d\mu(x) \, d\nu(y)$$

for arbitrary bounded continuous functions $f$ on $\mathbb{R}^d$ and $g$ on $\mathbb{R}$, respectively. In the sequel we consider only the additive convolution, analogous formulae hold for the multiplicative convolution as well. We have

(i) $\mu * \nu = \nu * \mu$;
(ii) $\mu * (\nu * \sigma) = (\mu * \nu) * \sigma$;
(iii) $\mu * (c\nu) = c\,(\mu * \nu)$, $c \in \mathbb{C}$;
(iv) $\mu * (\nu + \sigma) = \mu * \nu + \mu * \sigma$;
(v) $\delta_x * \delta_y = \delta_{x+y}$.
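The last three equations imply that the convolution of the finitely supported measures $\mu = \sum_{j=1}^{n} c_j \delta_{x_j}$ and $\nu = \sum_{k=1}^{m} d_k \delta_{y_k}$, where $x_j, y_k \in \mathbb{R}^d$; $c_j, d_k \in \mathbb{C}$, is given by

$$\mu * \nu = \sum_{j=1}^{n} \sum_{k=1}^{m} c_j d_k \, \delta_{x_j + y_k}.$$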
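The double-sum formula for finitely supported measures translates directly into code. The following is a minimal sketch, assuming a measure is stored as a dictionary that maps each support point (a tuple of coordinates) to its weight; the function name `convolve` and this representation are choices made for illustration only.

```python
from collections import defaultdict
from fractions import Fraction

def convolve(mu, nu):
    """Additive convolution of two finitely supported measures on R^d.

    A measure sum_j c_j * delta_{x_j} is stored as a dict mapping the
    support point x_j (a tuple of coordinates) to its weight c_j.
    """
    result = defaultdict(int)
    for x, c in mu.items():
        for y, d in nu.items():
            point = tuple(a + b for a, b in zip(x, y))  # x_j + y_k
            result[point] += c * d                      # weight c_j * d_k
    return dict(result)

# Uniform distribution of one fair die, viewed as a measure on R^1.
die = {(k,): Fraction(1, 6) for k in range(1, 7)}

# Distribution of the sum of two independent dice.
two_dice = convolve(die, die)
```

Weights that land on the same point $x_j + y_k$ are added together, which is why the result for two dice has eleven support points rather than thirty-six. Complex weights work unchanged because only addition and multiplication are used.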
E.2.3. Let $\nu \in M^b(\mathbb{R}^d)$ and $f, g \in L^1(\mathbb{R}^d)$. Recall that $\lambda = \lambda_d$ denotes the Lebesgue measure on $\mathbb{R}^d$. We define the functions $f * \nu$ and $f * g$ for $x \in \mathbb{R}^d$ by

$$f * \nu(x) = \int f(x - y) \, d\nu(y)$$

and

$$f * g(x) = \int f(x - y)\, g(y) \, d\lambda(y)$$

if the integrals exist, otherwise we set $f * \nu(x) = 0$ or $f * g(x) = 0$, respectively. It can be shown that the integrals above exist $\lambda$-almost everywhere (see also Theorem E.2.4 below). Moreover, the analogues of (E.2.2.i–iv) hold for convolutions of functions and measures and for convolutions of functions as well ($\lambda$-almost everywhere). If $\mu$ is absolutely continuous with respect to the Lebesgue measure, i.e., $d\mu(x) = f(x)\, d\lambda(x)$ with some $f \in L^1(\mathbb{R}^d)$, then $\mu * \nu$ is absolutely continuous, as well, and

$$d(\mu * \nu)(x) = f * \nu(x) \, d\lambda(x).$$

If $\nu$ is absolutely continuous, too, i.e., $d\nu(x) = g(x)\, d\lambda(x)$ with some $g \in L^1(\mathbb{R}^d)$, then

$$d(\mu * \nu)(x) = f * g(x) \, d\lambda(x).$$

Theorem E.2.4. If $f, g \in L^2(\mathbb{R}^d)$ then the integral

$$f * g(x) := \int f(x - y)\, g(y) \, d\lambda(y)$$

exists for all $x \in \mathbb{R}^d$ and $f * g \in C_0(\mathbb{R}^d)$. If $f \in L^1(\mathbb{R}^d)$ and $g \in L^p(\mathbb{R}^d)$, $1 \le p \le \infty$, then the integral above exists $\lambda$-almost everywhere, $f * g \in L^p(\mathbb{R}^d)$ and $\|f * g\|_p \le \|f\|_1 \|g\|_p$.

The next theorem states that the convolution depends continuously on both of its arguments.

Theorem E.2.5. If the nets $\{\mu_\alpha\}$, $\{\nu_\alpha\}$ converge weakly to $\mu$ and $\nu$, respectively, then $\{\mu_\alpha * \nu_\alpha\}$ converges weakly to $\mu * \nu$.
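The norm inequality of Theorem E.2.4 has a discrete counterpart that is easy to check by hand or by machine. The sketch below is not the theorem on $\mathbb{R}^d$ itself: as an assumption for this example it replaces the Lebesgue measure by the counting measure on the integers and works with finitely supported sequences. The helper names `convolve_sequences` and `norm` are hypothetical.

```python
def convolve_sequences(f, g):
    """Convolution (f*g)(n) = sum_k f(n-k) g(k) of finitely supported sequences.

    Sequences are lists indexed from 0; entries outside the list are zero.
    """
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for k, b in enumerate(g):
            out[i + k] += a * b
    return out

def norm(seq, p):
    """l^p norm with respect to the counting measure (p = inf is allowed)."""
    if p == float("inf"):
        return max(abs(v) for v in seq)
    return sum(abs(v) ** p for v in seq) ** (1.0 / p)

f = [0.5, -1.0, 2.0, 0.25]
g = [1.0, 3.0, -2.0]
h = convolve_sequences(f, g)

# Check ||f*g||_p <= ||f||_1 * ||g||_p for several exponents p.
# The tiny tolerance only guards against floating-point rounding.
checks = {p: norm(h, p) <= norm(f, 1) * norm(g, p) + 1e-12
          for p in (1, 2, 3, float("inf"))}
```

A single pair of sequences proves nothing, of course; the point is only to make the roles of the $L^1$ factor and the $L^p$ factor in the inequality concrete.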
Appendix F
Probability
In this chapter we collect some basic definitions, facts, and notation of probability theory. For more details see, e.g., [10, 16, 17, 37, 45].
F.1 Basic notions

A probability space is a triple $(\Omega, \mathcal{A}, P)$, where $\Omega$ is a nonempty set, $\mathcal{A}$ is a $\sigma$-algebra of subsets of $\Omega$, and $P$ is a probability measure¹ on $\mathcal{A}$. A $d$-dimensional random vector on $(\Omega, \mathcal{A}, P)$ is an $(\mathcal{A}, \mathcal{B}^d)$-measurable function $X : \Omega \to \mathbb{R}^d$, where $\mathcal{B}^d$ denotes the $\sigma$-algebra of Borel subsets of $\mathbb{R}^d$. If $d = 1$, then we call $X$ a random variable. If we mention two or more random vectors in a theorem, definition, etc., we implicitly assume that all random vectors are defined on the same probability space.

¹ That is, a nonnegative measure such that $P(\Omega) = 1$.

Two random vectors $X$ and $Y$ are called independent if the equality

$$P(X \in B,\ Y \in D) = P(X \in B) \cdot P(Y \in D)$$

holds for all $B, D \in \mathcal{B}^d$. Let $g$ and $h$ be $(\mathcal{B}^d, \mathcal{B}^n)$-measurable functions from $\mathbb{R}^d$ into $\mathbb{R}^n$. If $X$ and $Y$ are independent, then the random vectors $g(X)$ and $h(Y)$ are independent as well. Complex-valued random variables, complex random vectors and their independence are defined in the same way by considering the $\sigma$-algebra of Borel subsets of $\mathbb{C}^d$. If not stated otherwise, random vectors are supposed to be real.

The probability measure $\mu_X$ defined by

$$\mu_X(B) := P(X \in B), \quad B \in \mathcal{B}^d$$

is called the distribution of the random vector $X$, while the distribution function $F_X$ of $X$ is defined by

$$F_X(t) = \mu_X\big((-\infty, t_1) \times \cdots \times (-\infty, t_d)\big), \quad t \in \mathbb{R}^d.$$

A random vector $X$ is said to be continuous if $\mu_X$ is continuous, i.e., if the equation $\mu_X(\{t\}) = P(X = t) = 0$ holds for all $t \in \mathbb{R}^d$. If there exists a nonnegative Borel measurable function $p$ on $\mathbb{R}^d$ such that

$$\mu_X(B) = \int_B p(x) \, d\lambda(x), \quad B \in \mathcal{B}^d,$$

i.e., if $\mu_X$ is absolutely continuous with respect to the Lebesgue measure, then $p$ is called the density of $X$ (or of $\mu_X$). In this case we say that $X$ is an absolutely continuous random vector. If there exists a Borel subset $S$ of $\mathbb{R}^d$ such that $\lambda(S) = 0$ and $\mu_X(S) = 1$, then $X$ and $\mu_X$ are said to be singular (with respect to the Lebesgue measure). If there exists a denumerable subset $S$ of $\mathbb{R}^d$ such that $\mu_X(S) = 1$, then $X$ and $\mu_X$ are said to be discrete. Note that discrete distributions are singular. For arbitrary $X$ the distribution $\mu_X$ can be written as

$$\mu_X = p_1 \mu_d + p_2 \mu_{ac} + p_3 \mu_{sc}$$

where $p_1, p_2, p_3$ are nonnegative numbers summing to 1 and $\mu_d, \mu_{ac}, \mu_{sc}$ are probability distributions such that $\mu_d$ is discrete, $\mu_{ac}$ is absolutely continuous, and $\mu_{sc}$ is continuous and singular.

If $X$ and $Y$ are independent, then

$$\mu_{X+Y} = \mu_X * \mu_Y$$

where $*$ denotes convolution² of measures, respectively. We have

$$\int_{\mathbb{R}^d} h(t) \, d(\mu_X * \mu_Y)(t) = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} h(x + y) \, d\mu_X(x) \, d\mu_Y(y) \tag{1}$$

for every bounded, continuous complex-valued function $h$ on $\mathbb{R}^d$. If $Y$ is absolutely continuous, then $X + Y$ is absolutely continuous as well, and its density $p_{X+Y}$ is given by

$$p_{X+Y}(y) = \int_{\mathbb{R}^d} p_Y(y - x) \, d\mu_X(x) \tag{2}$$

where $p_Y$ denotes the density of $Y$.

² For properties of convolution of functions and measures we refer to Section E.2.

If $d = 1$, then

$$\mu_{XY} = \mu_X \star \mu_Y$$

where $\star$ denotes multiplicative convolution. We have

$$\int_{\mathbb{R}} h(t) \, d(\mu_X \star \mu_Y)(t) = \int_{\mathbb{R}} \int_{\mathbb{R}} h(xy) \, d\mu_X(x) \, d\mu_Y(y)$$

for every bounded, continuous complex-valued function $h$ on $\mathbb{R}$.

Let $X$ be a $P$-integrable real- or complex-valued random variable. The number

$$E(X) := \int_\Omega X(\omega) \, dP(\omega)$$

is called the expectation of $X$ while the variance $\mathrm{Var}(X)$ of $X$ is defined by

$$\mathrm{Var}(X) := E\big(|X - E(X)|^2\big). \tag{3}$$

Note that this definition allows the variance to be $\infty$.
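Formula (2) for the density of a sum becomes a finite weighted sum when the distribution of $X$ is finitely supported. The following sketch illustrates this under assumptions chosen purely for the example: $X$ takes the values $0$ and $1/2$ with probabilities $1/4$ and $3/4$, $Y$ is uniformly distributed on $(0, 1)$, and the names `uniform_density`, `mu_X` and `density_of_sum` are hypothetical.

```python
def uniform_density(y):
    """Density p_Y of the uniform distribution on (0, 1)."""
    return 1.0 if 0.0 < y < 1.0 else 0.0

# Discrete distribution mu_X: P(X = 0) = 1/4 and P(X = 1/2) = 3/4.
mu_X = {0.0: 0.25, 0.5: 0.75}

def density_of_sum(y, mu=mu_X, p_Y=uniform_density):
    """Formula (2): p_{X+Y}(y) = integral of p_Y(y - x) d mu_X(x).

    For a finitely supported mu_X the integral is a finite weighted sum.
    """
    return sum(weight * p_Y(y - x) for x, weight in mu.items())

# Midpoint Riemann sum of the density over [-1, 2]; a probability
# density should integrate to (approximately) one.
steps = 3000
width = 3.0 / steps
total_mass = sum(density_of_sum(-1.0 + (i + 0.5) * width)
                 for i in range(steps)) * width
```

Although $X$ is discrete here, the sum $X + Y$ has a density, in line with the statement that $X + Y$ is absolutely continuous whenever $Y$ is.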
If $X$ and $Y$ are independent and integrable, then $XY$ is integrable and we have

$$E(XY) = E(X) \cdot E(Y).$$

More generally, let $X = (X_1, \ldots, X_d)$ be a $d$-dimensional real or complex random vector such that $X_j$ is $P$-integrable for all $j$. Then the expectation of $X$ is defined by

$$E(X) := (EX_1, \ldots, EX_d).$$

Suppose that $X_j$ is square integrable for all $j$ and put $m_j = EX_j$. The matrix $\mathrm{cov}(X) = (c_{jk})_{j,k=1}^{d}$ where

$$c_{jk} = E\big[(X_j - m_j)\,\overline{(X_k - m_k)}\big] = E\big[X_j \overline{X_k}\big] - m_j \overline{m_k}$$

is called the covariance matrix of $X$. The covariance matrix is positive semidefinite (cf. Section D.2), since

$$\sum_{j,k=1}^{d} c_{jk}\, z_j \overline{z_k} = E\left| \sum_{j=1}^{d} z_j (X_j - m_j) \right|^2 \ge 0, \quad z_j \in \mathbb{C}.$$

If the entries of a matrix $Y = (Y_{jk})$ are integrable random variables, then we write

$$E(Y) := (EY_{jk}).$$

Using this notation, the covariance matrix can also be written in the form³

$$\mathrm{cov}(X) = E\big[(X - E(X)) \cdot (X - E(X))^*\big].$$

Let $A \in \mathbb{C}^{d \times d}$ be a square matrix. Then, as it is easy to verify,

$$\mathrm{cov}(AX) = A \cdot \mathrm{cov}(X) \cdot A^*.$$

Let $\alpha = (\alpha_1, \ldots, \alpha_d) \in \mathbb{N}_0^d$ be a $d$-tuple of nonnegative integers and let $\mu$ be the distribution of a $d$-dimensional random vector $X$. If the function $x \mapsto x^\alpha$ is $\mu$-integrable, then we call the number

$$M_\alpha = M_\alpha(\mu) = M_\alpha(X) = \int_{\mathbb{R}^d} x^\alpha \, d\mu(x)$$

the moment of order $\alpha$ of $\mu$ (of $X$). In this case we also say that the moment of order $\alpha$ of $\mu$ exists. Note that $M_\alpha$ exists if and only if

$$A_\alpha = A_\alpha(\mu) = A_\alpha(X) = \int_{\mathbb{R}^d} |x^\alpha| \, d\mu(x) < \infty.$$

³ Recall our convention from Section A.2 saying that in expressions involving matrix operations the elements of $\mathbb{R}^d$ and $\mathbb{C}^d$ are considered as column vectors.
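Returning to the covariance matrix, the transformation rule for $\mathrm{cov}(AX)$ stated above can be verified exactly for a small discrete distribution. The sketch below is restricted, as an assumption, to the real case, where the adjoint of $A$ is its transpose; the distribution `dist`, the matrix `A` and all helper names are hypothetical choices for illustration. Exact rational arithmetic is used so that the two sides can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical discrete distribution of a 2-dimensional random vector X:
# each support point (a tuple) is mapped to its probability.
dist = {
    (0, 0): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
    (2, 1): Fraction(1, 3),
    (3, 5): Fraction(1, 6),
}

def expectation(dist):
    """E(X) = (E X_1, ..., E X_d) for a finitely supported distribution."""
    d = len(next(iter(dist)))
    return [sum(p * x[j] for x, p in dist.items()) for j in range(d)]

def cov(dist):
    """Covariance matrix c_jk = E[(X_j - m_j)(X_k - m_k)] (real case)."""
    m = expectation(dist)
    d = len(m)
    return [[sum(p * (x[j] - m[j]) * (x[k] - m[k]) for x, p in dist.items())
             for k in range(d)] for j in range(d)]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def push_forward(dist, a):
    """Distribution of AX when X has distribution `dist`."""
    image = {}
    for x, p in dist.items():
        y = tuple(sum(a[i][j] * x[j] for j in range(len(x)))
                  for i in range(len(a)))
        image[y] = image.get(y, 0) + p
    return image

A = [[1, 2], [-1, 3]]
lhs = cov(push_forward(dist, A))                   # cov(AX)
rhs = matmul(matmul(A, cov(dist)), transpose(A))   # A cov(X) A^T
```

Comparing `lhs` and `rhs` entry by entry confirms the identity for this example; for complex vectors the same computation would need conjugates in `cov` and the conjugate transpose in place of `transpose`.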
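The number $A_\alpha$ is called the absolute moment of order $\alpha$. The existence of $M_\alpha$ implies the existence of $M_\beta$ for all $\beta \le \alpha$. In the case $d = 1$ the inequality

$$|x|^{2k-1} \le \frac{x^{2k} + x^{2k-2}}{2}, \quad x \in \mathbb{R},\ k \ge 1$$

shows that

$$A_{2k-1} \le \frac{M_{2k} + M_{2k-2}}{2}, \quad k \in \mathbb{N}. \tag{4}$$

Let $n > k \ge 1$ and suppose that $A_n$ exists. By Hölder's inequality,

$$A_k = \int_{\mathbb{R}} |x|^k \cdot 1 \, d\mu(x) \le \left( \int_{\mathbb{R}} |x|^{pk} \, d\mu(x) \right)^{1/p}$$

for every $p > 1$. Setting $p = \frac{n}{k}$ we find

$$A_k^{1/k} \le A_n^{1/n}, \quad 1 \le k \le n. \tag{5}$$

We will also use the simple but useful inequality

$$P\big(|X - E(X)| \ge \varepsilon \sigma\big) \le \frac{1}{\varepsilon^2}, \quad \varepsilon > 0 \tag{6}$$

where $\sigma = \sqrt{\mathrm{Var}(X)}$. This inequality is called Chebyshev's inequality, it holds for every random variable $X$ with finite variance.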
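Both Chebyshev's inequality (6) and the moment inequality (5) can be checked on a concrete distribution. The sketch below uses a fair die as an illustrative assumption; the names `tail_probability` and `absolute_moment` are hypothetical. The tail event is tested through squares, so that the standard deviation never has to be computed and the Chebyshev check stays in exact rational arithmetic.

```python
from fractions import Fraction

# Fair die: P(X = k) = 1/6 for k = 1, ..., 6 (an illustrative choice).
dist = {k: Fraction(1, 6) for k in range(1, 7)}

mean = sum(p * x for x, p in dist.items())
variance = sum(p * (x - mean) ** 2 for x, p in dist.items())

def tail_probability(eps):
    """P(|X - E X| >= eps * sigma), tested via squares to avoid square roots."""
    return sum(p for x, p in dist.items()
               if (x - mean) ** 2 >= eps ** 2 * variance)

def absolute_moment(k):
    """A_k = E |X|^k."""
    return sum(p * abs(x) ** k for x, p in dist.items())

# Chebyshev's inequality (6) for a few multiples of the standard deviation.
multiples = (Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2))
chebyshev_ok = all(tail_probability(e) <= 1 / e ** 2 for e in multiples)

# Inequality (5): the map k -> A_k^(1/k) is nondecreasing.
roots = [float(absolute_moment(k)) ** (1.0 / k) for k in range(1, 7)]
lyapunov_ok = all(a <= b for a, b in zip(roots, roots[1:]))
```

For the die the bound in (6) is far from sharp, which is typical: the inequality trades precision for the fact that it needs nothing beyond a finite variance.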
F.2 Convergence of random vectors

A sequence $\{X_n\}$ of random vectors is said to converge in distribution, or converge weakly, or converge in law to a random vector $X$ if the sequence $\{\mu_n\}$ of their distributions converges weakly (see Section E.1) to the distribution of $X$. We say that the sequence $\{X_n\}$ converges in probability to $X$ if for all $\varepsilon > 0$

$$\lim_{n \to \infty} P\big(\|X - X_n\| \ge \varepsilon\big) = 0.$$

If $g : \mathbb{R}^d \to \mathbb{R}^n$ is continuous and $\{X_n\}$ is a sequence of $d$-dimensional random vectors which converges in distribution (in probability) to $X$, then $\{g(X_n)\}$ converges in distribution⁴ (in probability, respectively) to $g(X)$. Finally, $\{X_n\}$ converges to $X$ almost surely if

$$P\big(\{\omega \in \Omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\}\big) = 1.$$

Note that the set of all $\omega$ for which $\lim X_n(\omega)$ exists belongs to $\mathcal{A}$. Almost sure convergence implies convergence in probability which implies convergence in distribution. The converse is not true, but if a sequence $\{X_n\}$ converges in distribution to a constant random vector, then the sequence is convergent in probability, as well.

⁴ See Corollary E.1.16 for the convergence in distribution.

Theorem F.2.1 (zero–one law of Borel–Cantelli). Let $(\Omega, \mathcal{A}, P)$ be a probability space and $\{A_n\}_{n=1}^{\infty}$ be a sequence of events in $\mathcal{A}$.

(i) If $\sum P(A_n) < \infty$, then the probability that infinitely many of the $A_n$'s occur is 0.
(ii) If the events are pairwise independent and $\sum P(A_n) = \infty$, then the probability that infinitely many of them occur is 1.
Lemma F.2.2. Let $\{X_n\}$ and $\{Y_n\}$ be sequences of random vectors such that

$$\sum P(X_n \ne Y_n) < \infty.$$

Then $\sum X_n$ is almost surely convergent if and only if $\sum Y_n$ is almost surely convergent (and they converge to the same limit).

Proof. The lemma follows immediately from the fact that, by the zero–one law of Borel–Cantelli (see Theorem F.2.1), the probability that the event $[X_n \ne Y_n]$ happens for infinitely many $n$ is zero.

Theorem F.2.3. Let $X_1, X_2, \ldots$ be random variables and $c_1, c_2, \ldots$ be complex numbers such that

$$\sup_{n \in \mathbb{N}} E|X_n| < \infty \quad \text{and} \quad \sum_{n=1}^{\infty} |c_n| < \infty.$$

Then the series

$$\sum_{n=1}^{\infty} c_n X_n \tag{1}$$

converges absolutely with probability one. If in addition $\sup_{n \in \mathbb{N}} E|X_n|^2 < \infty$, then the series converges in mean square to the same limit.⁵

⁵ Note that $E|X| \le (E|X|^2)^{1/2}$ in view of (F.1.5) and therefore the condition $\sup_n E|X_n|^2 < \infty$ implies that $\sup_n E|X_n| < \infty$.

Proof. The monotone convergence theorem gives

$$E\left( \sum_{n=1}^{\infty} |c_n X_n| \right) = \lim_{N \to \infty} E\left( \sum_{n=1}^{N} |c_n X_n| \right) \le \lim_{N \to \infty} \sum_{n=1}^{N} |c_n| \, \sup_n E|X_n| < \infty$$

from which it follows that $\sum_n |c_n X_n(\omega)|$ is finite with probability one. Thus, the sequence of partial sums of (1) converges almost surely to a limit $X$. If $K := \sup_n E|X_n|^2 < \infty$ and $0 < m < n$, then
X 2 ˇ ˇ X X X ˇ ˇ cj Xj ˇ D cj ck E .Xj X k / K jcj j : Eˇ ˇ ˇ m