Foundations of Constructive Probability Theory. Using Bishop's work on constructive analysis as a framework, this monograph gives a systematic, detailed, and general constructive theory of probability and stochastic processes. It is the first extended account of this theory: almost all of the constructive existence and continuity theorems that permeate the book are original. It also contains results and methods hitherto unknown in the constructive and nonconstructive settings. The text uses logic only in the commonsense way and, beyond a certain mathematical maturity, requires no prior training in either constructive mathematics or probability theory. It will thus be accessible and of interest both to probabilists interested in the foundations of their specialty and to constructive mathematicians who wish to see Bishop's theory applied to a particular field.
Yuen-Kwok Chan completed a PhD in constructive mathematics with Errett Bishop before leaving academia for a career in private industry. He is now an independent researcher in probability and its applications.
ENCYCLOPEDIA OF MATHEMATICS AND ITS APPLICATIONS

This series is devoted to significant topics or themes that have wide application in mathematics or mathematical science and for which a detailed development of the abstract theory is less important than a thorough and concrete exploration of the implications and applications. Books in the Encyclopedia of Mathematics and Its Applications cover their subjects comprehensively. Less important results may be summarized as exercises at the ends of chapters. For technicalities, readers can be referred to the bibliography, which is expected to be comprehensive. As a result, volumes are encyclopedic references or manageable guides to major subjects. For a complete series listing, visit www.cambridge.org/mathematics.
Foundations of Constructive Probability Theory

YUEN-KWOK CHAN
Citigroup
University Printing House, Cambridge CB2 8BS, United Kingdom One Liberty Plaza, 20th Floor, New York, NY 10006, USA 477 Williamstown Road, Port Melbourne, VIC 3207, Australia 314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India 79 Anson Road, #06–04/06, Singapore 079906 Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence. www.cambridge.org Information on this title: www.cambridge.org/9781108835435 DOI: 10.1017/9781108884013 © Yuen-Kwok Chan 2021 This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published 2021 A catalogue record for this publication is available from the British Library. Library of Congress Cataloging-in-Publication Data Names: Chan, Yuen-Kwok, author. Title: Foundations of constructive probability theory / Yuen-Kwok Chan. Description: Cambridge, UK ; New York, NY : Cambridge University Press, 2021. | Series: Encyclopedia of mathematics and its applications | Includes bibliographical references and index. Identifiers: LCCN 2020046705 (print) | LCCN 2020046706 (ebook) | ISBN 9781108835435 (hardback) | ISBN 9781108884013 (epub) Subjects: LCSH: Probabilities. | Stochastic processes. | Constructive mathematics. Classification: LCC QA273 .C483 2021 (print) | LCC QA273 (ebook) | DDC 519.2–dc23 LC record available at https://lccn.loc.gov/2020046705 LC ebook record available at https://lccn.loc.gov/2020046706 ISBN 978-1-108-83543-5 Hardback Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Dedicated to the memory of my father, Tak-Sun Chan
Contents

Acknowledgments
Nomenclature

PART I  INTRODUCTION AND PRELIMINARIES

1  Introduction

2  Preliminaries
   2.1  Natural Numbers
   2.2  Calculation and Theorem
   2.3  Proof by Contradiction
   2.4  Recognizing Nonconstructive Theorems
   2.5  Prior Knowledge
   2.6  Notations and Conventions

3  Partition of Unity
   3.1  Abundance of Compact Subsets
   3.2  Binary Approximation
   3.3  Partition of Unity
   3.4  One-Point Compactification

PART II  PROBABILITY THEORY

4  Integration and Measure
   4.1  Riemann–Stieltjes Integral
   4.2  Integration on a Locally Compact Metric Space
   4.3  Integration Space: The Daniell Integral
   4.4  Complete Extension of Integration
   4.5  Integrable Set
   4.6  Abundance of Integrable Sets
   4.7  Uniform Integrability
   4.8  Measurable Function and Measurable Set
   4.9  Convergence of Measurable Functions
   4.10 Product Integration and Fubini's Theorem

5  Probability Space
   5.1  Random Variable
   5.2  Probability Distribution on Metric Space
   5.3  Weak Convergence of Distributions
   5.4  Probability Density Function and Distribution Function
   5.5  Skorokhod Representation
   5.6  Independence and Conditional Expectation
   5.7  Normal Distribution
   5.8  Characteristic Function
   5.9  Central Limit Theorem

PART III  STOCHASTIC PROCESS

6  Random Field and Stochastic Process
   6.1  Random Field and Finite Joint Distributions
   6.2  Consistent Family of f.j.d.'s
   6.3  Daniell–Kolmogorov Extension
   6.4  Daniell–Kolmogorov–Skorokhod Extension

7  Measurable Random Field
   7.1  Measurable r.f. That Is Continuous in Probability
   7.2  Measurable Gaussian Random Field

8  Martingale
   8.1  Filtration and Stopping Time
   8.2  Martingale
   8.3  Convexity and Martingale Convergence
   8.4  Strong Law of Large Numbers

9  a.u. Continuous Process
   9.1  Extension from Dyadic Rational Parameters to Real Parameters
   9.2  C-Regular Family of f.j.d.'s and C-Regular Process
   9.3  a.u. Hoelder Process
   9.4  Brownian Motion
   9.5  a.u. Continuous Gaussian Process

10 a.u. Càdlàg Process
   10.1  Càdlàg Function
   10.2  Skorokhod Space D[0,1] of Càdlàg Functions
   10.3  a.u. Càdlàg Process
   10.4  D-Regular Family of f.j.d.'s and D-Regular Process
   10.5  Right-Limit Extension of D-Regular Process Is a.u. Càdlàg
   10.6  Continuity of the Right-Limit Extension
   10.7  Strong Right Continuity in Probability
   10.8  Sufficient Condition for an a.u. Càdlàg Martingale
   10.9  Sufficient Condition for Right-Hoelder Process
   10.10 a.u. Càdlàg Process on [0,∞)
   10.11 First Exit Time for a.u. Càdlàg Process

11 Markov Process
   11.1  Markov Process and Strong Markov Process
   11.2  Transition Distribution
   11.3  Markov Semigroup
   11.4  Markov Transition f.j.d.'s
   11.5  Construction of a Markov Process from a Semigroup
   11.6  Continuity of Construction
   11.7  Feller Semigroup and Feller Process
   11.8  Feller Process Is Strongly Markov
   11.9  Abundance of First Exit Times
   11.10 First Exit Time for Brownian Motion

APPENDICES

Appendix A  Change of Integration Variables
Appendix B  Taylor's Theorem

References
Index
Acknowledgments
Yuen-Kwok Chan is retired from Citigroup's Mortgage Analytics unit. All opinions expressed by the author are his own. The author is grateful to the late Professor E. Bishop for teaching him constructive mathematics, to the late Professors R. Getoor and R. Blumenthal for teaching him probability and for mentoring him, and to the late Professors R. Pyke and W. Birnbaum and the other statisticians in the Mathematics Department of the University of Washington, circa 1970s, for their moral support. The author is also thankful to the constructivists in the Mathematics Department of New Mexico State University, circa 1975, for hosting a sabbatical visit and for valuable discussions, especially to Professors F. Richman, D. Bridges, M. Mandelkern, W. Julian, and the late Professor R. Mines. Professors Melody Chan and Fritz Scholz provided incisive and valuable critiques of the introduction chapter of an early draft of this book. Professor Douglas Bridges gave many thoughtful comments on the draft. The author also wishes to thank Ms Jill Hobbs for her meticulous copyediting, and Ms Niranjana Harikrishnan for her aesthetically pleasing typography.
Nomenclature
≡ .......... by definition equal to
R .......... set of real numbers
d_eucld .......... Euclidean metric
a ∨ b .......... max(a,b)
a ∧ b .......... min(a,b)
a+ .......... max(a,0)
a− .......... min(a,0)
A ∪ B .......... union of sets A and B
A ∩ B, AB .......... intersection of sets A and B
[a]_1 .......... an integer [a]_1 ∈ (a, a+2) for given a ∈ R
X|A .......... restriction of a function X on a set Ω to a subset A
X′ ◦ X, X′(X) .......... composite of functions X and X′
(X ≤ a) .......... {ω ∈ domain(X) : X(ω) ≤ a}
X(·,ω″) .......... function of the first variable, given a value ω″ of the second variable, for a function X of two variables
T∗(Y) ≡ T(·,Y) .......... dual function of Y relative to a certain mapping T
(S,d) .......... metric space, with metric d on set S
x ≠ y .......... d(x,y) > 0, where x,y are in some metric space
J_c .......... metric complement of subset J in a metric space
⊗ .......... direct product of functions or sets
C_u(S,d), C_u(S) .......... space of uniformly continuous real-valued functions on metric space (S,d)
C_ub(S,d), C_ub(S) .......... subspace of C_u(S,d) whose members are bounded
C_0(S,d), C_0(S) .......... subspace of C_u(S,d) whose members vanish at infinity
C(S,d), C(S) .......... subspace of C_u(S,d) whose members have bounded supports
d̄ .......... 1 ∧ d
O, o .......... bounds for real-valued functions
□ .......... mark for end of proof or end of definition
ξ .......... binary approximation of a metric space
‖ξ‖ .......... modulus of local compactness corresponding to ξ
(S̄,d̄) .......... one-point compactification of (S,d)
Δ .......... point at infinity
F|B .......... {f|B : f ∈ F}
∫_{−∞}^{+∞} X(x) dF(x) .......... Riemann–Stieltjes integral
1_A .......... indicator of measurable set A
A^c .......... measure-theoretic complement of measurable set A
♦ .......... ordering between certain real numbers and functions
(G,λ) .......... profile system
(a,b) ♦ α .......... the interval (a,b) is bounded in profile by α
(Ω,L,E) .......... probability space
∫E(dω)X(ω) .......... E(X)
ρ_Prob .......... the probability metric on r.v.'s
L(G) .......... probability subspace generated by the family G of r.v.'s
J(S,d) .......... set of distributions on complete metric space (S,d)
⇒ .......... weak convergence of distributions, or convergence in distribution of r.r.v.'s
ρ_Dist,ξ .......... metric on distributions on a locally compact metric space relative to binary approximation ξ
F_X .......... P.D.F. induced on R by an r.r.v. X
Sk,ξ .......... Skorokhod representation of distributions on (S,d), determined by ξ
L|L′ .......... subspace of conditionally integrable r.r.v.'s given the subspace L′
L|G .......... subspace of conditionally integrable r.r.v.'s, given L(G)
E_A .......... conditional expectation given an event A with positive probability
ϕ_{μ,σ} .......... multivariate normal p.d.f.
Φ_{μ,σ} .......... multivariate normal distribution
ϕ_{0,I} .......... multivariate standard normal p.d.f.
Φ_{0,I} .......... multivariate standard normal distribution
Φ̄ .......... tail of the univariate standard normal distribution
ψ_X .......... characteristic function of r.v. X with values in R^n
ψ_J .......... characteristic function of distribution J on R^n
ĝ .......... Fourier transform of complex-valued function g on R^n
f ∗ g .......... convolution of complex-valued functions f and g on R^n
ρ_char .......... metric on characteristic functions on R^n
R(Q×Ω,S) .......... set of r.f.'s with parameter set Q, state space (S,d), and sample space (Ω,L,E)
X|K .......... restriction of X ∈ R(Q×Ω,S) to parameter subset K ⊂ Q
δ_Cp,K .......... modulus of continuity in probability of X|K
δ_cau,K .......... modulus of continuity a.u. of X|K
δ_auc,K .......... modulus of a.u. continuity of X|K
F(Q,S) .......... set of consistent families of f.j.d.'s with parameter set Q and state space S
ρ_Marg,ξ,Q .......... marginal metric for the set F(Q,S) relative to the binary approximation ξ
F_Cp(Q,S) .......... subset of F(Q,S) whose members are continuous in probability
ρ_Cp,ξ,Q,Q(∞) .......... metric on F_Cp(Q,S) relative to dense subset Q_∞ of parameter metric space Q
ρ_Prob,Q .......... probability metric on R(Q×Ω,S)
ρ_Sup,Prob .......... metric on F_Cp(Q,S)
R_Cp(Q×Ω,S) .......... subset of R(Q×Ω,S) whose members are continuous in probability
R_Meas(Q×Ω,S) .......... subset of R(Q×Ω,S) whose members are measurable
R_Meas,Cp(Q×Ω,S) .......... R_Meas(Q×Ω,S) ∩ R_Cp(Q×Ω,S)
L .......... filtration in probability space (Ω,L,E)
L^X .......... natural filtration of a process X
L+ .......... right-limit extension of filtration L
L(τ) .......... probability subspace of observables at stopping time τ relative to filtration L
λ .......... the special convex function on R
Q_m, Q̄_m, Q̃_m, Q_∞, Q̄_∞ .......... certain subsets of dyadic rationals in [0,∞)
(C[0,1], ρ_C[0,1]) .......... metric space of a.u. continuous processes on [0,1]
Lim .......... extension by limit of a process with parameter set Q_∞ to parameter set [0,1]
D[0,1] .......... set of all a.u. càdlàg processes on [0,1]
δ_aucl .......... modulus of a.u. càdlàg
D_{δ(aucl),δ(cp)}[0,1] .......... subset of D[0,1] whose members have moduli δ_Cp and δ_aucl
ρ_D[0,1] .......... metric on D[0,1]
(R_Dreg(Q_∞×Ω,S), ρ_Prob,Q(∞)) .......... metric space of D-regular processes
rLim .......... extension by right limit of a process with parameter set Q_∞ to parameter set [0,1]
β_auB .......... modulus of a.u. boundedness
δ_SRCp .......... modulus of strong right continuity in probability
τ_{f,a,N}(X) .......... certain first exit times by the process X
T .......... a Markov semigroup
δ_T .......... a modulus of strong continuity of T
α_T .......... a modulus of smoothness of T
F^{∗,T}_{r(1),···,r(m)} .......... a finite joint transition distribution generated by T
(T, ρ_T) .......... metric space of Markov semigroups
V .......... a Feller semigroup
δ_V .......... a modulus of strong continuity of V
α_V .......... a modulus of smoothness of V
κ_V .......... a modulus of nonexplosion of V
F^{∗,V}_{r(1),···,r(m)} .......... a finite joint transition distribution generated by V
((S,d),(Ω,L,E),{U^{x,V} : x ∈ S}) .......... Feller process
((R^m,d^m),(Ω,L,E),{B^x : x ∈ R^m}) .......... Brownian motion as a Feller process
Part I Introduction and Preliminaries
1 Introduction
The present work on probability theory is an outgrowth of the constructive analysis in [Bishop 1967] and [Bishop and Bridges 1985].

Perhaps the simplest explanation of constructive mathematics is by way of focusing on the following two commonly used theorems. The first, the principle of finite search, states that, given a finite sequence of 0-or-1 integers, either all members of the sequence are equal to 0, or there exists a member that is equal to 1. We use this theorem without hesitation because, given the finite sequence, a finite search would determine the result. The second theorem, which we may call the principle of infinite search, states that, given an infinite sequence of 0-or-1 integers, either all members of the sequence are equal to 0, or there exists a member that is equal to 1. The name "infinite search" is perhaps unfair, but it brings into sharp focus the point that the computational meaning of this theorem is not clear. The theorem is tantamount to an infinite loop in computer programming without any assurance of termination.

Most mathematicians acknowledge the important distinction between the two theorems but regard the principle of infinite search as an expedient tool to prove theorems, with the belief that theorems so proved can then be specialized to constructive theorems, when necessary. Contrary to this belief, many classical theorems proved directly or indirectly via the principle of infinite search are actually equivalent to the latter: as such, they do not have a constructive proof. Oftentimes, not even the numerical meaning of the theorems in question is clear.

We believe that, for the constructive formulations and proofs of even the most abstract theorems, the easiest way is to employ a disciplined and systematic approach, by using only finite searches and by quantifying mathematical objects and theorems at each and every step, with natural numbers as a starting point. The references cited earlier show that this approach is not only possible but also fruitful.

It should be emphasized that we do not claim that theorems whose proofs require the principle of infinite search are untrue or incorrect. They are certainly correct and consistent derivations from commonly accepted axioms. There is
indeed no reason why we cannot discuss such classical theorems alongside their constructive counterparts. The term "nonconstructive mathematics" is not meant to be pejorative. We will use, in its place, the more positive term "classical mathematics."

Moreover, it is a myth that constructivists use a different system of logic. The only logic we use is commonsense logic; no formal language is needed. The present author considers himself a mathematician who is neither interested in nor equipped to comment on the formalization of mathematics, whether classical or constructive.

Since a constructively valid argument is also correct from the classical viewpoint, a reader of the classical persuasion should have no difficulties understanding our proofs. Proofs using only finite searches are surely agreeable to any reader who is accustomed to infinite searches. Indeed, the author would consider the present book a success if the reader, but for this introduction and occasional remarks in the text, finishes reading without realizing that this is a constructive treatment. At the same time, we hope that a reader of the classical persuasion might consider the more disciplined approach of constructive mathematics for his or her own research an invitation to a challenge.

Cheerfully, we hasten to add that we do not think that finite computations in constructive mathematics are the end. We would prefer a finite computation with n steps to one with n! steps. We would be happy to see a systematic and general development of mathematics that is not only constructive but also computationally efficient. That admirable task will, however, be left to abler hands.

Probability theory, which is rooted in applications, can naturally be expected to be constructive. Indeed, the crowning achievements of probability theory, the laws of large numbers, the central limit theorems, the analysis of Brownian motion processes and their stochastic integrals, and the analysis of Lévy processes, to name just a few, are exemplars of constructive mathematics. Kolmogorov, the grandfather of modern probability theory, actually took an interest in the formalization of general constructive mathematics. Nevertheless, many a theorem in modern probability actually implies the principle of infinite search.

The present work attempts a systematic constructive development. Each existence theorem will be a construction. The input data, the construction procedure, and the output objects are the essence and integral parts of the theorem. Incidentally, by inspecting each step in the procedure, we can routinely observe how the output varies with the input. Thus a continuity theorem in epsilon–delta terms routinely follows an existence theorem. For example, we will construct a Markov process from a given semigroup and prove that the resulting Markov process varies continuously with the semigroup.

The reader familiar with the probability literature will notice that our constructions resemble Kolmogorov's construction of the Brownian motion process, which is replete with Borel–Cantelli estimates and rates of convergence. This is in contrast to popular proofs of existence via Prokhorov's theorem. The reader can
regard Part III of this book, Chapters 6–11, the part on stochastic processes, as an extension of Kolmogorov's constructive methods to stochastic processes: Daniell–Kolmogorov–Skorokhod construction of random fields, measurable random fields, a.u. continuous processes, a.u. càdlàg processes, martingales, strong Markov processes, and Feller processes, all with locally compact state spaces. Such a systematic, constructive, and general treatment of stochastic processes, we believe, has not previously been attempted.

The purpose of this book is twofold. First, a student with a general mathematics background at the first-year graduate-school level can use it as an introduction to probability or to constructive mathematics. Second, an expert in probability can use it as a reference for further constructive development in his or her own research specialties.

Part II of this book, Chapters 3–5, is a repackaging and expansion of the measure theory in [Bishop and Bridges 1985]. It enables us to have a self-contained probability theory in terms familiar to probabilists. For expositions of constructive mathematics, see the first chapters of the last cited reference. See also [Richman 1982] and [Stolzenberg 1970]. We give a synopsis in Chapter 2, along with basic notations and terminologies for later reference.
2 Preliminaries
2.1 Natural Numbers

We start with the natural numbers as known in elementary schools. All mathematical objects are constructed from natural numbers, and every theorem is ultimately a calculation on the natural numbers. From natural numbers are constructed the integers and the rational numbers, along with the arithmetical operations, in the manner taught in elementary schools. We claim to have a natural number only when we have provided a finite method to calculate it, i.e., to find its decimal representation. This is the fundamental difference from classical mathematics, which requires no such finite method; an infinite procedure in a proof is considered just as good in classical mathematics. The notion of a finite natural number is so simple and so immediate that no attempt is needed to define it in even simpler terms. A few examples would suffice as clarification: 1, 2, and 3 are natural numbers. So are $9^9$ and $9^{9^9}$; the multiplication method will give, at least in principle, their decimal expansions in a finite number of steps. In contrast, the "truth value" of a particular mathematical statement is a natural number only if a finite method has been supplied that, when carried out, would prove or disprove the statement.
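To make "at least in principle" concrete, the following small sketch (in Python; ours, not part of the text) carries out the finite method for $9^9$ outright, and bounds the length of the decimal expansion of $9^{9^9}$ without producing it:

    import math

    # Illustrative sketch: 9^9 and 9^(9^9) are natural numbers because a
    # finite method (repeated multiplication) yields their decimal digits.
    n1 = 9**9
    print(n1)        # 387420489, computed outright

    # 9^(9^9) is far too large to print here, but the finite method still
    # exists in principle; we can even bound its number of decimal digits.
    digits = math.floor(n1 * math.log10(9)) + 1
    print(digits)    # roughly 3.7 * 10^8 digits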
2.2 Calculation and Theorem

An algorithm or a calculation means any finite, step-by-step procedure. A mathematical object is defined when we specify the calculations that need to be done to produce this object. We say that we have proved a theorem if we have provided a step-by-step method that translates the calculations doable in the hypothesis to a calculation in the conclusion of the theorem. The statement of the theorem is merely a summary of the algorithm contained in the proof. Although we do not, for good reasons, write mathematical proofs in a computer language, the reader would do well to compare constructive mathematics to the development of a large computer software library, with successive objects and library functions being built from previous ones, each with a guarantee to finish in a finite number of steps.
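To make the software analogy concrete, here is a minimal sketch (Python; the class name and conventions are our own invention, not the book's formalism) of one common constructive packaging: a real number is an algorithm that, for each n, returns a rational approximation within $2^{-n}$, and every operation on reals must supply such an algorithm for its output:

    from fractions import Fraction

    class CReal:
        # A "constructive real": approx(n) returns a rational within 2^-n.
        def __init__(self, approx):
            self.approx = approx

        def __add__(self, other):
            # To get x + y within 2^-n, query both summands at n + 1:
            # the two errors of at most 2^-(n+1) add up to at most 2^-n.
            return CReal(lambda n: self.approx(n + 1) + other.approx(n + 1))

    # The real number 1/3, given by rounding 2^n / 3 to the nearest integer.
    third = CReal(lambda n: Fraction(round(Fraction(2**n, 3)), 2**n))
    x = third + third
    print(float(x.approx(30)))    # ~0.6666666666, within 2^-30 of 2/3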
2.3 Proof by Contradiction

There is a trivial form of proof by contradiction that is valid and useful in constructive mathematics. Suppose we have already proved that one of two given alternatives, A and B, must hold, meaning that we have given a finite method that, when unfolded, gives either a proof for A or a proof for B. Suppose subsequently we also prove that A is impossible. Then we can conclude that we have a proof of B; we need only exercise said finite method and see that the resulting proof is for B.
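In programming terms (a sketch of ours, not the book's formalism), a constructive proof of "A or B" is a procedure returning a tagged witness; the trivial proof by contradiction just runs it and reads the tag:

    def a_or_b():
        # Stand-in for the finite method promised by the hypothesis; a
        # real instance would perform some finite search or computation.
        return ("B", "witness-for-B")

    tag, witness = a_or_b()
    # If A has meanwhile been shown impossible, the tag cannot be "A",
    # so running the method necessarily yields a proof (witness) of B.
    assert tag == "B"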
2.4 Recognizing Nonconstructive Theorems

Consider the simple theorem "if a is a real number, then a ≤ 0 or 0 < a," which may be called the principle of excluded middle for real numbers. We can see that this theorem implies the principle of infinite search by the following argument. Let $(x_i)_{i=1,2,\ldots}$ be any given sequence of 0-or-1 integers. Define the real number $a = \sum_{i=1}^{\infty} x_i 2^{-i}$. If a ≤ 0, then all members of the given sequence are equal to 0; if 0 < a, then some member is equal to 1. Thus the theorem implies the principle of infinite search, and therefore cannot have a constructive proof. Consequently, any theorem that implies this limited principle of excluded middle cannot have a constructive proof. This observation provides a quick test to recognize certain theorems as nonconstructive. Then it raises the interesting task of examining the theorem for constructivization of a part or the whole, or the task of finding a constructive substitute of the theorem that will serve all future purposes in its stead. For the aforementioned principle of excluded middle of real numbers, an adequate constructive substitute is the theorem "if a is a real number, then, for arbitrarily small ε > 0, we have a < ε or 0 < a." Heuristically, this is a recognition that a general real number a can be computed with arbitrarily small, but nonzero, error.
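Both the argument and the constructive substitute can be phrased as programs. In the sketch below (ours; the sequence x is an arbitrary 0-or-1 function), deciding "a ≤ 0 or 0 < a" exactly would require inspecting infinitely many terms, whereas the substitute "a < ε or 0 < a" is decided by a terminating loop:

    from fractions import Fraction

    def approx(x, n):
        # Rational approximation of a = sum_i x(i) * 2^-i within 2^-n,
        # using only the finitely many terms i = 1, ..., n.
        return sum(Fraction(x(i), 2**i) for i in range(1, n + 1))

    def compare(x, eps):
        # Decide "a < eps or 0 < a" by a finite computation.
        n = 1
        while True:
            a_n, err = approx(x, n), Fraction(1, 2**n)
            if a_n - err > 0:
                return "0 < a"
            if a_n + err < eps:
                return "a < eps"
            # Terminates: a >= 0, and either a > 0 or a < eps holds with
            # room to spare once 2^-n is small enough.
            n += 1

    print(compare(lambda i: 0 if i < 5 else 1, Fraction(1, 100)))  # "0 < a"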
2.5 Prior Knowledge We assume that the reader of this book has familiarity with calculus, real analysis, and metric spaces, as well as some rudimentary knowledge of complex analysis. These materials are presented in the first chapters of [Bishop and Bridges 1985]. We will also quote results from typical undergraduate courses in calculus or linear algebra, with the minimal constructivization wherever needed. We assume also that the reader has had an introductory course in probability theory at the level of [Feller I 1971] or [Ross 2003]. The reader should have no difficulty in switching back and forth between constructive mathematics and classical mathematics, or at least no more than in switching back and forth between classical mathematics and computer programming. Indeed, the reader is urged to read, concurrently with this book if not before delving into it, the many classical texts in probability.
2.6 Notations and Conventions
If x,y are mathematical objects, we write x ≡ y to mean “x is defined as y,” “x, which is defined as y,” “x, which has been defined earlier as y,” or any other grammatical variation depending on the context.
2.6.1 Numbers

Unless otherwise indicated, N, Q, and R will denote the set of integers, the set of rational numbers in the decimal or binary system, and the set of real numbers, respectively. We will also write {1,2,...} for the set of positive integers. The set R is equipped with the Euclidean metric d ≡ d_eucld. Suppose a,b,a_i ∈ R for i = m, m+1, ... for some m ∈ N. We will write lim_{i→∞} a_i for the limit of the sequence a_m, a_{m+1}, ... if it exists, without explicitly referring to m. We will write a ∨ b, a ∧ b, a+, and a− for max(a,b), min(a,b), a ∨ 0, and a ∧ 0, respectively. The sum $\sum_{i=m}^{n} a_i \equiv a_m + \cdots + a_n$ is understood to be 0 if n < m. The product $\prod_{i=m}^{n} a_i \equiv a_m \cdots a_n$ is understood to be 1 if n < m. Suppose a_i ∈ R for i = m, m+1, ... We write $\sum_{i=m}^{\infty} a_i < \infty$ if and only if $\sum_{i=m}^{\infty} |a_i| < \infty$, in which case $\sum_{i=m}^{\infty} a_i$ is taken to be $\lim_{n\to\infty} \sum_{i=m}^{n} a_i$. In other words, unless otherwise specified, convergence of a series of real numbers means absolute convergence.

Regarding real numbers, we quote Lemma 2.18 from [Bishop and Bridges 1985], which will be used, extensively and without further comments, in the present book.

Limited proof by contradiction of an inequality of real numbers. Let x,y be real numbers such that the assumption x > y implies a contradiction. Then x ≤ y.

This lemma remains valid if the relations > and ≤ are replaced by < and ≥, respectively. We note, however, that if the relations > and ≤ are replaced by ≥ and <, respectively, the resulting statement is not constructively valid: a proof of x < y requires the calculation of some ε > 0 such that y − x > ε, which is more than a proof of x ≤ y; the latter requires only a proof that x > y is impossible and does not require the calculation of anything. The reader should ponder on the subtle but important difference.
2.6.2 Set, Operation, and Function

Set. In general, a set is a collection of objects equipped with an equality relation. To define a set is to specify how to construct an element of the set, and how to prove that two elements are equal. A set Ω is also called a family. A member ω in the collection is called an element of the latter, or, in symbols, ω ∈ Ω. The usual set-theoretic notations are used. Let two subsets A and B of a set Ω be given. We will write A ∪ B for the union, and A ∩ B or AB for the intersection. We write A ⊂ B if each member ω of A is a member of B. We write A ⊃ B for
B ⊂ A. The set-theoretic complement of a subset A of the set Ω is defined as the set {ω ∈ Ω : ω ∈ A implies a contradiction}. We write ω ∉ A if ω ∈ A implies a contradiction.

Nonempty set. A set Ω is said to be nonempty if we can construct some element ω ∈ Ω.

Empty set. A set Ω is said to be empty if it is impossible to construct an element ω ∈ Ω. We will let φ denote an empty set.

Operation. Suppose A,B are sets. A finite, step-by-step method X that produces an element X(x) ∈ B given any x ∈ A is called an operation from A to B. The element X(x) need not be unique. Two different applications of the operation X with the same input element x can produce different outputs. An example of an operation is [·]_1, which assigns to each a ∈ R an integer [a]_1 ∈ (a, a+2). This operation is a substitute for the classical operation [·] and will be used frequently in the present work (an illustrative sketch follows below).

Function. Suppose Ω, Ω′ are sets. Suppose X is an operation that, for each ω in some nonempty subset A of Ω, constructs a unique member X(ω) in Ω′. Then the operation X is called a function from Ω to Ω′, or simply a function on Ω. The subset A is called the domain of X. We then write X : Ω → Ω′, and write domain(X) for the set A. Thus a function X is an operation that has the additional property that if ω1 = ω2 in domain(X), then X(ω1) = X(ω2) in Ω′. To specify a function X, we need to specify its domain as well as the operation that produces the image X(ω) from each given member ω of domain(X). Two functions X,Y are considered equal, X = Y in symbols, if domain(X) = domain(Y), and if X(ω) = Y(ω) for each ω ∈ domain(X). When emphasis is needed, this equality will be referred to as the set-theoretic equality, in contradistinction to almost everywhere equality, to be defined later. A function is also called a mapping.

Domain of a function on a set need not be the entire set. The nonempty domain(X) is not required to be the whole set Ω. This will be convenient when we work with functions defined only almost everywhere, in a sense to be made precise later in the setting of a measure/integration space. Henceforth, we write X(ω) only with the implicit or explicit condition that ω ∈ domain(X).

Miscellaneous set notations and function notations. Separately, we sometimes use the expression ω → X(ω) for a function X whose domain is understood. For example, the expression ω → ω² stands for the function X : R → R defined by X(ω) ≡ ω² for each ω ∈ R. Let X : Ω → Ω′ be a function, and let A be a subset of Ω such that A ∩ domain(X) is nonempty. Then the restriction X|A of X to A is defined as the function from A to Ω′ with domain(X|A) ≡ A ∩ domain(X) and (X|A)(ω) ≡ X(ω) for each ω ∈ domain(X|A).
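Here is the promised sketch of how the operation [·]_1 can be carried out (ours, with an invented helper name): from any rational q known to satisfy |a − q| ≤ 1/4, which a finite method can supply for a constructive real a, an integer strictly inside (a, a+2) is obtained. The output may depend on which approximation q was supplied; that is exactly why [·]_1 is an operation rather than a function:

    import math
    from fractions import Fraction

    def bracket1(q):
        # Input: a rational q with |a - q| <= 1/4 for the real a in question.
        # Output: an integer m with a < m < a + 2, because
        #   m > q + 1/2 >= a + 1/4 > a   and   m <= q + 3/2 <= a + 7/4 < a + 2.
        return math.floor(q + Fraction(1, 2)) + 1

    print(bracket1(Fraction(19, 10)))   # for a near 1.9 this returns 3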
The set
$$B \equiv \{\omega' \in \Omega' : \omega' = X(\omega) \text{ for some } \omega \in \mathrm{domain}(X)\}$$
is called the range of the function X, and is denoted by range(X). A function X : A → B is called a surjection if range(X) = B; in that case, there exists an operation Y : B → A, not necessarily a function, such that X(Y(b)) = b for each b ∈ B. The function X is called an injection if for each a,a′ ∈ domain(X) with X(a) = X(a′), we have a = a′. It is called a bijection if domain(X) = A and if X is both a surjection and an injection.

Let X : B → A be a surjection with domain(X) = B. Then the triple (A,B,X) is called an indexed set. In that case, we write X_b ≡ X(b) for each b ∈ B. We will, by abuse of notations, call A or {X_b : b ∈ B} an indexed set, and write A ≡ {X_b : b ∈ B}. We will call B the index set, and say that A is indexed by the members b of B.

Finite set, enumerated set, countable set. A set A is said to be finite if there exists a bijection v : {1,...,n} → A for some n ≥ 1, in which case we write |A| ≡ n and call it the size of A. We will then call v an enumeration of the set A, and call the pair (A,v) an enumerated set. When the enumeration v is understood from the context, we will abuse notations and simply call the set A ≡ {v_1,...,v_n} an enumerated set. A set A is said to be countable if there exists a surjection v : {1,2,...} → A. A set A is said to be countably infinite if there exists a bijection v : {1,2,...} → A. We will then call v an enumeration of the set A, and call the pair (A,v) an enumerated set. When the enumeration v is understood from the context, we will abuse notations and simply call the set A ≡ {v_1,v_2,...} an enumerated set.

Composite function. Suppose X : Ω → Ω′ and X′ : Ω′ → Ω″ are such that the set A defined by A ≡ {ω ∈ domain(X) : X(ω) ∈ domain(X′)} is nonempty. Then the composite function X′ ◦ X : Ω → Ω″ is defined to have domain(X′ ◦ X) = A and (X′ ◦ X)(ω) = X′(X(ω)) for ω ∈ A. The alternative notation X′(X) will also be used for X′ ◦ X.

Sequence. Let Ω be a set and let n ≥ 1 be an arbitrary integer. A function ω : {1,...,n} → Ω that assigns to each i ∈ {1,...,n} an element ω(i) ≡ ω_i ∈ Ω is called a finite sequence of elements in Ω. A function ω : {1,2,...} → Ω that assigns to each i ∈ {1,2,...} an element ω(i) ≡ ω_i ∈ Ω is called an infinite sequence of elements in Ω. We will then write ω ≡ (ω_1,...,ω_n) or (ω_i)_{i=1,...,n} in the first case, and write (ω_1,ω_2,...) or (ω_i)_{i=1,2,...} in the second case, for the sequence ω. If, in addition, j is a sequence of integers in domain(ω) such that j_k < j_h for each k < h in domain(j), then the sequence ω ◦ j : domain(j) → Ω is called a subsequence of ω. Throughout this book, we will write a subscripted symbol a_b interchangeably with a(b) to lessen the burden on subscripts. Thus, a_{b(c)} stands for a_{b_c}. Similarly, ω_{j_k} ≡ ω_{j(k)} ≡ ω(j(k)) for each k ∈ domain(j), and we write (ω_{j(1)},ω_{j(2)},...) or (ω_{j(k)})_{k=1,2,...}, or simply (ω_{j(k)}), for the subsequence when the domain of j is clear.
If (ω_1,...,ω_n) is a sequence, we will write {ω_1,...,ω_n} for the range of ω. Thus an element ω_0 ∈ Ω is in {ω_1,...,ω_n} if and only if there exists i = 1,...,n such that ω_0 = ω_i. Suppose (ω_i)_{i=1,2,...} and (ω′_i)_{i=1,2,...} are two infinite sequences. We will write (ω_i,ω′_i)_{i=1,2,...} for the merged sequence (ω_1,ω′_1,ω_2,ω′_2,...). Similar notations are used for several sequences.

Cartesian product of a sequence of sets. Let (Ω_n)_{n=0,1,...} be a sequence of nonempty sets. Consider any 0 ≤ n ≤ ∞, i.e., n is a nonnegative integer or the symbol ∞. We will let Ω^{(n)} denote the Cartesian product $\prod_{j=0}^{n} \Omega_j$. Consider 0 ≤ k < ∞ with k ≤ n. The coordinate function π_k is the function with domain(π_k) = Ω^{(n)} and π_k(ω_0,ω_1,...) = ω_k. If Ω_n = Ω for each n ≥ 0, then we will write Ω^n for Ω^{(n)} for each n ≥ 0. Let X be a function on Ω_k and let Y be a function on Ω^{(k)}. When confusion is unlikely, we will use the same symbol X also for the function X ◦ π_k on Ω^{(n)}, which depends only on the kth coordinate. Likewise, we will use Y also for the function Y ◦ (π_0,...,π_k) on Ω^{(n)}, which depends only on the first k+1 coordinates. Thus every function on Ω_k or Ω^{(k)} is identified with a function on Ω^{(∞)}. Accordingly, sets of functions on Ω_k, Ω^{(k)} are regarded also as sets of functions on Ω^{(n)}.

Function of several functions. Let M be the family of all real-valued functions on Ω, equipped with the set-theoretic equality for functions. Suppose X,Y ∈ M and suppose f is a function on R × R such that the set D ≡ {ω ∈ domain(X) ∩ domain(Y) : (X(ω),Y(ω)) ∈ domain(f)} is nonempty. Then f(X,Y) is defined as the function with domain(f(X,Y)) ≡ D and f(X,Y)(ω) ≡ f(X(ω),Y(ω)) for each ω ∈ D. The definition extends to a sequence of functions in the obvious manner. Examples are where f(x,y) ≡ x + y for each (x,y) ∈ R × R, or where f(x,y) ≡ xy for each (x,y) ∈ R × R.

Convergent series of real-valued functions. Suppose (X_i)_{i=m,m+1,...} is a sequence of real-valued functions on a set Ω. Suppose the set
$$D \equiv \Big\{\omega \in \bigcap_{i=m}^{\infty} \mathrm{domain}(X_i) : \sum_{i=m}^{\infty} |X_i(\omega)| < \infty\Big\}$$
is nonempty. Then $\sum_{i=m}^{\infty} X_i$ is defined as the function with $\mathrm{domain}\big(\sum_{i=m}^{\infty} X_i\big) \equiv D$ and with value $\sum_{i=m}^{\infty} X_i(\omega)$ for each ω ∈ D. This function $\sum_{i=m}^{\infty} X_i$ is then called a convergent series. Thus convergence for series means absolute convergence.

Ordering of functions. Suppose X,Y ∈ M and A is a subset of Ω, and suppose a ∈ R. We say X ≤ Y on A if (i) A ∩ domain(X) = A ∩ domain(Y) and (ii) X(ω) ≤ Y(ω) for each ω ∈ A ∩ domain(X). If X ≤ Y on Ω, we will simply write X ≤ Y. Thus X ≤ Y implies domain(X) = domain(Y). We write X ≤ a if X(ω) ≤ a for each ω ∈ domain(X). We will write (X ≤ a) ≡ {ω ∈ domain(X) : X(ω) ≤ a}.
We make similar definitions when the relation ≤ is replaced by ≥, <, >, or =. We say X is nonnegative if X ≥ 0. Suppose a ∈ R. We will abuse notations and write a also for the constant function X with domain(X) = Ω and with X(ω) = a for each ω ∈ domain(X).

Regarding one of several variables as a parameter. Let X be a function on the product set Ω′ × Ω″. Let ω′ ∈ Ω′ be such that (ω′,ω″) ∈ domain(X) for some ω″ ∈ Ω″. Define the function X(ω′,·) on Ω″ by domain(X(ω′,·)) ≡ {ω″ ∈ Ω″ : (ω′,ω″) ∈ domain(X)}, and by X(ω′,·)(ω″) ≡ X(ω′,ω″) for each ω″ ∈ domain(X(ω′,·)). Similarly, let ω″ ∈ Ω″ be such that (ω′,ω″) ∈ domain(X) for some ω′ ∈ Ω′. Define the function X(·,ω″) on Ω′ by domain(X(·,ω″)) ≡ {ω′ ∈ Ω′ : (ω′,ω″) ∈ domain(X)}, and by X(·,ω″)(ω′) ≡ X(ω′,ω″) for each ω′ ∈ domain(X(·,ω″)). More generally, given a function X on the Cartesian product Ω′ × Ω″ × ··· × Ω^{(n)}, for each (ω′,ω″,...,ω^{(n)}) ∈ domain(X), we define similarly the functions X(·,ω″,...,ω^{(n)}), X(ω′,·,ω‴,...,ω^{(n)}), ..., X(ω′,ω″,...,ω^{(n−1)},·) on the sets Ω′, Ω″, ..., Ω^{(n)}, respectively.

Let M′, M″ denote the families of all real-valued functions on two sets Ω′, Ω″, respectively, and let L″ be a subset of M″. Suppose
$$T : \Omega' \times L'' \to R \tag{2.6.1}$$
is a real-valued function. We can define the function T∗ : L″ → M′ with domain(T∗) ≡ {Y ∈ L″ : domain(T(·,Y)) is nonempty} and by T∗(Y) ≡ T(·,Y). When there is no risk of confusion, we write T also for the function T∗, we write TY for T(·,Y), and we write T : L″ → M′ interchangeably with the expression 2.6.1. Thus the duality
$$T(\cdot,Y)(\omega') \equiv T(\omega',Y) \equiv T(\omega',\cdot)(Y). \tag{2.6.2}$$
2.6.3 Metric Space The definitions and notations related to metric spaces in [Bishop and Bridges 1985], with few exceptions, are familiar to readers of classical texts. A summary of these definitions and notations follows. Metric complement. Let (S,d) be a metric space. If J is a subset of S, its metric complement is the set {x ∈ S : d(x,y) > 0 for all y ∈ J }. Unless otherwise specified, Jc will denote the metric complement of J .
Condition valid for all but countably many points in a metric space. A condition is said to hold for all but countably many members of S if it holds for each member in the metric complement J_c of some countable subset J of S.

Inequality in a metric space. We will say that two elements x,y ∈ S are unequal, and write x ≠ y, if d(x,y) > 0.

Metrically discrete subset of a metric space. We will call a subset A of S metrically discrete if, for each x,y ∈ A, we have x = y or d(x,y) > 0. Classically, each subset A of S is metrically discrete.

Limit of a sequence of functions with values in a metric space. Let (f_n)_{n=1,2,...} be a sequence of functions from a set Ω to S such that the set
$$D \equiv \Big\{\omega \in \bigcap_{i=1}^{\infty} \mathrm{domain}(f_i) : \lim_{i\to\infty} f_i(\omega) \text{ exists in } S\Big\}$$
is nonempty. Then lim_{i→∞} f_i is defined as the function with domain(lim_{i→∞} f_i) ≡ D and with value
$$\Big(\lim_{i\to\infty} f_i\Big)(\omega) \equiv \lim_{i\to\infty} f_i(\omega)$$
for each ω ∈ D. We emphasize that lim_{i→∞} f_i is well defined only if it can be shown that D is nonempty.

Continuous function. A function f : S → S′ is said to be uniformly continuous on a subset A ⊂ domain(f), relative to the metrics d,d′ on S,S′, respectively, if there exists an operation δ : (0,∞) → (0,∞) such that d′(f(x),f(y)) < ε for each x,y ∈ A with d(x,y) < δ(ε), for each ε > 0. When there is a need to be precise as to the metrics d,d′, we will say that f : (S,d) → (S′,d′) is uniformly continuous on A. The operation δ is called a modulus of continuity of f on A.

Lipschitz continuous function. If there exists a coefficient c ≥ 0 such that d′(f(x),f(y)) ≤ c·d(x,y) for all x,y ∈ A, then the function f is said to be Lipschitz continuous on A, and the constant c is then called a Lipschitz constant of f on A. In that case, we will say simply that f has Lipschitz constant c.

Totally bounded metric space, compact metric space. A metric space (S,d) is said to be totally bounded if, for each ε > 0, there exists a finite subset A ⊂ S such that for each x ∈ S there exists y ∈ A with d(x,y) < ε. The subset A is then called an ε-approximation of S. A compact metric space K is defined as a complete and totally bounded metric space.

Locally compact metric space. A subset A ⊂ S is said to be bounded if there exist x ∈ S and a > 0 such that A ⊂ (d(·,x) ≤ a). A subset S′ ⊂ S is said to be locally compact if every bounded subset of S′ is contained in some compact subset. The metric space (S,d) is said to be locally compact if the subset S is locally compact.

Continuous function on a metric space. A function f : (S,d) → (S′,d′) is said to be continuous if domain(f) = S and if it is uniformly continuous on each bounded subset K of S.
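Constructively, the modulus δ is part of the data carried by a uniformly continuous function. A small sketch (ours; the class name is invented) of the resulting "function plus certificate" packaging:

    from fractions import Fraction

    class UCFunction:
        # A uniformly continuous function packaged with its modulus of
        # continuity: modulus(eps) returns a delta that works for eps.
        def __init__(self, f, modulus):
            self.f = f
            self.modulus = modulus

    # f(x) = x^2 restricted to A = [0, 10] has Lipschitz constant 20 there,
    # so delta(eps) = eps/20 is a valid modulus of continuity on A.
    square = UCFunction(lambda x: x * x, lambda eps: eps / 20)

    print(square.modulus(Fraction(1, 1000)))   # 1/20000: how well the
                                               # input must be known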
Product of a finite sequence of metric spaces. Suppose (S_n,d_n)_{n=1,2,...} is a sequence of metric spaces. For each integer n ≥ 1, define
$$d^{(n)}(x,y) \equiv \Big(\bigotimes_{i=1}^{n} d_i\Big)(x,y) \equiv (d_1 \otimes \cdots \otimes d_n)(x,y) \equiv \sum_{i=1}^{n} d_i(x_i,y_i)$$
for each $x,y \in \prod_{i=1}^{n} S_i$. Then
$$(S^{(n)},d^{(n)}) \equiv \bigotimes_{i=1}^{n}(S_i,d_i) \equiv \Big(\prod_{i=1}^{n} S_i,\ \bigotimes_{i=1}^{n} d_i\Big)$$
is a metric space called the product metric space of S_1,...,S_n.

Product of an infinite sequence of metric spaces. Define the infinite product metric $\bigotimes_{i=1}^{\infty} d_i$ on $\prod_{i=1}^{\infty} S_i$ by
$$d^{(\infty)}(x,y) \equiv \Big(\bigotimes_{i=1}^{\infty} d_i\Big)(x,y) \equiv \sum_{i=1}^{\infty} 2^{-i}\,(1 \wedge d_i(x_i,y_i))$$
for each $x,y \in \prod_{i=1}^{\infty} S_i$. Define the infinite product metric space
$$(S^{(\infty)},d^{(\infty)}) \equiv \bigotimes_{i=1}^{\infty}(S_i,d_i) \equiv \Big(\prod_{i=1}^{\infty} S_i,\ \bigotimes_{i=1}^{\infty} d_i\Big).$$
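Although $d^{(\infty)}$ is an infinite series, each term is at most $2^{-i}$, so the metric is computable to any accuracy from finitely many coordinates: the tail beyond index n contributes at most $2^{-n}$. A sketch of ours (float rounding aside, and taking every $S_i = R$ with the usual metric):

    def d_infty(x, y, eps):
        # x, y: maps i -> i-th coordinate (i >= 1); returns the infinite
        # product metric within eps, truncating where the tail < eps/2.
        n = 1
        while 2.0 ** (-n) >= eps / 2:
            n += 1
        return sum(2.0 ** (-i) * min(1.0, abs(x(i) - y(i)))
                   for i in range(1, n + 1))

    print(d_infty(lambda i: 1.0 / i, lambda i: 0.0, 1e-6))   # ~0.693147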
i=1
Powers of a sequence of metric spaces. Suppose, in addition, (Sn,dn ) is a copy of the same metric space (S,d) for each n ≥ 1. Then we simply write (S n,d n ) ≡ (S (n),d (n) ) and (S ∞,d ∞ ) ≡ (S (∞),d (∞) ). Thus, in this case, d n (x,y) ≡
n
d(xi ,yi )
i=1
for each x = (x1, . . . ,xn ),y = (y1, . . . ,yn ) ∈ S n , and ∞
d (x,y) ≡
∞
2−i (1 ∧ d(xi ,yi ))
i=1
for each x = (x1,x2, . . .),y = (y1,y2, . . .) ∈ S ∞ . If, in addition, (Sn,dn ) is locally compact for each n ≥ 1, then the finite product space (S (n),d (n) ) is locally compact for each n ≥ 1, while the infinite product space (S (∞),d (∞) ) is complete but not necessarily locally compact. If (Sn,dn ) is compact for each n ≥ 1, then both the finite and infinite product spaces are compact. Spaces of real-valued continuous functions. Suppose (S,d) is a metric space. We will write Cu (S,d), or simply Cu (S), for the space of real-valued functions functions on (S,d) with domain(f ) = S that are uniformly continuous on S. We will write Cub (S,d), or simply Cub (S), for the subspace of Cu (S) whose members are bounded. Let x◦ be an arbitrary, but fixed, reference point in (S,d). A uniformly continuous function f on (S,d) is then said to vanish at infinity if, for
each ε > 0, there exists a > 0 such that |f(x)| ≤ ε for each x ∈ S with d(x,x◦) > a. Write C_0(S,d), or simply C_0(S), for the space of continuous functions on (S,d) that vanish at infinity.

Space of real-valued continuous functions with bounded support. A real-valued function f on S is said to have a subset A ⊂ S as support if x ∈ domain(f) and |f(x)| > 0 together imply x ∈ A. Then we also say that f is supported by A, or that A supports f. We will write C(S,d), or simply C(S), for the subspace of C_u(S,d) whose members have bounded supports. In the case where (S,d) is locally compact, C(S) consists of continuous functions on (S,d) with compact supports. Summing up, C(S) ⊂ C_0(S) ⊂ C_ub(S) ⊂ C_u(S).

Infimum and supremum. Suppose a subset A of R is nonempty. A number b ∈ R is called a lower bound of A, and A is said to be bounded from below, if b ≤ a for each a ∈ A. A lower bound b of A is called the greatest lower bound, or infimum, of A if b ≥ b′ for each lower bound b′ of A. In that case, we write inf A ≡ b. Similarly, a number b ∈ R is called an upper bound of A, and A is said to be bounded from above, if b ≥ a for each a ∈ A. An upper bound b of A is called the least upper bound, or supremum, of A if b ≤ b′ for each upper bound b′ of A. In that case, we write sup A ≡ b. In contrast to classical mathematics, there is no general constructive proof of the existence of an infimum for a subset of R that is bounded from below. Existence needs to be proved before each usage for each special case, much as in the case of limits. In that regard, [Bishop and Bridges 1985] prove that if a nonempty subset A of R is totally bounded, then both inf A and sup A exist. Moreover, suppose f is a continuous real-valued function on a compact metric space (K,d). Then the last cited text proves that inf_K f ≡ inf{f(x) : x ∈ K} and sup_K f ≡ sup{f(x) : x ∈ K} exist.
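The reason total boundedness rescues the infimum is visible in a short computation (a sketch of ours): the minimum of a finite ε-approximation of A, which is itself a finite subset of A, is within ε of inf A, so inf A can be computed to any accuracy:

    # For totally bounded A, an eps-approximation is a finite subset of A
    # coming within eps of every point of A; its minimum m satisfies
    #   inf A <= m <= inf A + eps,
    # so producing eps-approximations for every eps computes inf A.
    def inf_within(approximation, eps):
        return min(approximation(eps))

    # Hypothetical example: A = {1/n + (-1)^n : 1 <= n <= 999}; being
    # finite, A serves as its own eps-approximation for every eps.
    A = [1.0 / n + (-1) ** n for n in range(1, 1000)]
    print(inf_within(lambda eps: A, 0.01))   # about -0.999, near inf A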
2.6.4 Miscellaneous

Notations for if, only if, etc. The symbols ⇒, ⇐, and ⇔ will in general stand for "only if," "if," and "if and only if," respectively. An exception will be made where the symbol ⇒ is used for weak convergence, as defined later. The intended meaning will be clear from the context.

Capping a metric at 1. If (S,d) is a metric space, we will write d̄ ≡ 1 ∧ d.

Abbreviation for subsets. We will often write "x,y,...,z ∈ A" as an abbreviation for "{x,y,...,z} ⊂ A."

Default notations for numbers. Unless otherwise indicated by the context, the symbols i,j,k,m,n,p will denote integers, the symbols a,b will denote real
numbers, and the symbols ε,δ will denote positive real numbers. For example, the statement "for each i ≥ 1" will mean "for each integer i ≥ 1."

Notations for convergence. Suppose (a_n)_{n=1,2,...} is a sequence of real numbers. Then a_n → a stands for lim_{n→∞} a_n = a. We write a_n ↑ a if (a_n) is a nondecreasing sequence and a_n → a. Similarly, we write a_n ↓ a if (a_n) is a nonincreasing sequence and a_n → a. More generally, suppose f is a function on some subset A ⊂ R. Then f(x) → a stands for lim_{x→x_0} f(x) = a, where x_0 can stand for a real number or for one of the symbols ∞ or −∞.

Big O and small o. Suppose f and g are functions with domains equal to some subset A ⊂ R. Let x_0 ∈ A be arbitrary. If for some c > 0 we have |f(x)| ≤ c|g(x)| for all x ∈ A in some neighborhood B of x_0, then we write f(x) = O(g(x)). If for each ε > 0 we have |f(x)| ≤ ε|g(x)| for each x ∈ A in some neighborhood B of x_0, then we write f(x) = o(g(x)). Here, a subset B ⊂ R is a neighborhood of x_0 if there exists an open interval (a,b) such that x_0 ∈ (a,b) ⊂ B.

End-of-proof or end-of-definition marker. Finally, we sometimes use the symbol □ to mark the end of a proof or a definition.
3 Partition of Unity
In the previous chapter, we summarized the basic concepts and theorems about metric spaces from [Bishop and Bridges 1985]. Locally compact metric space was introduced. It is a simple and wide-ranging generalization of the real line. With few exceptions, the metric spaces used in the present book are locally compact. In the present chapter, we will define and construct binary approximations of a locally compact metric space (S,d), then define and construct a partition of unity relative to each binary approximation. Roughly speaking, a binary approximation is a digitization of (S,d), a generalization of the dyadic rationals that digitize the space R of real numbers. A partition of unity is then a sequence in C(S,d) that serves as a basis for the linear space C(S,d) of continuous functions on (S,d) with compact supports, in the sense that each f ∈ C(S,d) can be approximated by linear combinations of members in the partition of unity. A partition of unity provides a countable set of basis functions for the metrization of probability distributions on the space (S,d). Because of that important role, we will study binary approximations and partitions of unity in detail in this chapter.
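Before the formal definitions, the prototype to keep in mind is S = R: the dyadic rationals k·2^{-n} in [−M,M] form a finite 2^{-n}-approximation of that bounded set, and refining n digitizes R. A sketch of ours (the formal notion of binary approximation is defined in Section 3.2):

    from fractions import Fraction

    def dyadic_grid(n, M):
        # The dyadic points k / 2^n in [-M, M]: a finite 2^-n-approximation
        # of the bounded subset [-M, M] of R (M a positive integer).
        step = Fraction(1, 2**n)
        k_max = M * 2**n
        return [k * step for k in range(-k_max, k_max + 1)]

    grid = dyadic_grid(3, 2)               # spacing 1/8 on [-2, 2]
    print(len(grid), grid[0], grid[-1])    # 33 points, from -2 to 2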
3.1 Abundance of Compact Subsets First we cite a theorem from [Bishop and Bridges 1985] that guarantees an abundance of compact subsets. Theorem 3.1.1. Abundance of compact sets. Let f : K → R be a continuous function on a compact metric space (K,d) with domain(f ) = K. Then, for all but countably many real numbers α > infK f , the set (f ≤ α) ≡ {x ∈ K : f (x) ≤ α} is compact. Proof. See theorem (4.9) in chapter 4 of [Bishop and Bridges 1985].
Classically, the set (f ≤ α) is compact for each α ≥ inf_K f, without exception. Such a general theorem would, however, imply the principle of infinite search and is therefore nonconstructive. Theorem 3.1.1 is sufficient for all our purposes.
18
Introduction and Preliminaries
Definition 3.1.2. Convention for compact sets (f ≤ α). We hereby adopt the convention that if the compactness of the set (f ≤ α) is required in a discussion, compactness has been explicitly or implicitly verified, usually by proper prior selection of the constant α, enabled by an application of Theorem 3.1.1. The following simple corollary of Theorem 3.1.1 guarantees an abundance of compact neighborhoods of a compact set. Corollary 3.1.3. Abundance of compact neighborhoods. Let (S,d) be a locally compact metric space, and let K be a compact subset of S. Then the subset Kr ≡ (d(·,K) ≤ r) ≡ {x ∈ S : d(x,K) ≤ r} is compact for all but countably many r > 0. Proof. 1. Let n ≥ 1 be arbitrary. Then An ≡ (d(·,K) ≤ n) is a bounded set. Since (S,d) is locally compact, there exists a compact set Kn such that An ⊂ Kn ⊂ S. The continuous function f on the compact metric space (Kn,d) defined by f ≡ d(·,K) has infimum 0. Hence, by Theorem 3.1.1, the set {x ∈ Kn : d(x,K) ≤ r} is compact for all but countably many r ∈ (0,∞). In other words, there exists a countable subset J of (0,∞) such that for each r in the metric complement Jc of J in (0,∞), the set {x ∈ Kn : d(x,K) ≤ r} is compact. 2. Now let r ∈ Jc be arbitrary. Take n ≥ 1 so large that r ∈ (0,n). Then Kr ⊂ An . Hence the set Kr = Kr An = Kr Kn = {x ∈ Kn : d(x,K) ≤ r} is compact according to Step 1. Since J is countable and r ∈ Jc is arbitrary, we see that Kr is compact for all but countably many r ∈ (0,∞). Separately, the next elementary metric-space lemma will be convenient. Lemma 3.1.4. If (S, d) is compact, then the subspace of C(S ∞, d ∞ ) consisting of members that depend on finitely many coordinates is dense in C(S ∞, d ∞ ). Suppose (S,d) is a compact metric space. Let x◦ be an arbitrary but fixed reference point in (S,d). Let n ≥ 1 be arbitrary. Define the projection mapping jn∗ : S ∞ → S ∞ by jn∗ (x1,x2, . . .) ≡ (x1,x2, . . . ,xn,x◦,x◦, . . .) for each (x1,x2, . . .) ∈ S ∞ . Then jn∗ ◦ jm∗ = jn∗ for each m ≥ n. Let L0,n ≡ {f ∈ C(S ∞,d ∞ ) : f = f ◦ jn∗ }. Then L0,n ⊂ L0,n+1 .
(3.1.1)
Partition of Unity
19
∞
Let L0,∞ ≡ n=1 L0,n . Then the following conditions hold: 1. L0,n and L0,∞ are linear subspaces of C(S ∞,d ∞ ), and consist of functions that depend, respectively, on the first n coordinates and on finitely many coordinates. 2. The subspace L0,∞ is dense in C(S ∞,d ∞ ) relative to the supremum norm
· . Specifically, let f ∈ C(S ∞,d ∞ ) be arbitrary, with a modulus of continuity ∗ δf . Then f ◦ jn ∈ L0,n . Moreover, for each ε > 0 we have f − f ◦ jn∗ ≤ ε if n > − log2 (δf (ε)). In particular, if f has Lipschitz constant c > 0, then f − f ◦ jn∗ ≤ ε if n > log2 (cε−1 ). Proof. Let m ≥ n ≥ 1 and w ∈ S ∞ be arbitrary. Then, for each (x1,x2, . . .) ∈ S ∞ , we have jn∗ (jm∗ (x1,x2, . . .)) = jn∗ (x1,x2, . . . ,xm,x◦,x◦, . . .) = (x1,x2, . . . ,xn,x◦,x◦, . . .) = jn∗ (x1,x2, . . .). Thus jn∗ ◦ jm∗ = jn∗ . 1. It is clear from the defining equality 3.1.1 that L0,n is a linear subspace of C(S ∞,d ∞ ). Let f ∈ L0,n be arbitrary. Then f = f ◦ jn∗ = f ◦ jn∗ ◦ jm∗ = f ◦ jm∗ .
∞ Hence f ∈ L0,m . Thus L0,n ⊂ L0,m . Consequently, L0,∞ ≡ p=1 L0,p is ∞ a union of a nondecreasing sequence of linear subspaces of C(S ,d ∞ ) and, therefore, is also a linear subspace of C(S ∞,d ∞ ). 2. Let f ∈ C(S ∞,d ∞ ) be arbitrary, with a modulus of continuity δf . Let ε > 0 be arbitrary. Suppose n > − log2 (δf (ε)). Then 2−n < δf (ε). Let (x1,x2, . . .) ∈ S ∞ be arbitrary. Then d ∞ ((x1,x2, . . .),jn∗ (x1,x2, . . .)) = d ∞ ((x1,x2, . . .),(x1,x2, . . . ,xn,x◦,x◦, . . .)) ≡
n
−k
2
d(xk ,xk ) +
k=1
∞
k ,x◦ ) ≤ 0 + 2−n < δf (ε), 2−k d(x
k=n+1
where d ≡ 1 ∧ d. Hence |f (x1,x2, . . .) − f ◦ jn∗ (x1,x2, . . .)| < ε, where (x1,x2, . . .) ∈ S ∞ is arbitrary. We conclude that f − f ◦ jn∗ ≤ ε, as alleged.
3.2 Binary Approximation Let (S,d) be an arbitrary locally compact metric space. Then S contains a countable dense subset. A binary approximation, defined presently, is a structured and well-quantified countable dense subset.
20
Introduction and Preliminaries
Recall that (i) |A| denotes the number of elements in an arbitrary finite set A; (ii) a subset A of S is said to be metrically discrete if for each y,z ∈ A, either y = z or d(y,z) > 0; and (iii) a finite subset A of a subset K ⊂ S is called an ε-approximation of K if for each x ∈ K, there exists y ∈ A with that d(x,y) ≤ ε. Classically, each subset of (S,d) is metrically discrete. Condition (iii) can be written more succinctly as (d(·,x) ≤ ε). K⊂ x∈A
Definition 3.2.1. Binary approximation and modulus of local compactness. Let (S,d) be a locally compact metric space, with an arbitrary but fixed reference point x◦ . Let A0 ≡ {x◦ } ⊂ A1 ⊂ A2 ⊂ . . . be a sequence of metrically discrete and finite subsets of S. For each n ≥ 1, write κn ≡ |An |. Suppose (d(·,x) ≤ 2−n ) (3.2.1) (d(·,x◦ ) ≤ 2n ) ⊂ x∈A(n)
and
(d(·,x) ≤ 2−n+1 ) ⊂ (d(·,x◦ ) ≤ 2n+1 )
(3.2.2)
x∈A(n)
for each n ≥ 1. Then the sequence ξ ≡ (An )n=1,2,... of subsets is called a binary approximation for (S,d) relative to x◦ , and the sequence of integers
ξ ≡ (κn )n=1,2,... ≡ (|An |)n=1,2,... is called the modulus of local compactness of (S,d) corresponding to ξ . Thus a binary approximation is an expanding sequence of 2−n -approximation for (d(·,x◦ ) ≤ 2n ) as n → ∞. The next proposition shows that the definition is not vacuous. First note that ∞ n=1 An is dense in (S,d) in view of relation 3.2.1. In the case where (S,d) is compact, for n ≥ 1 so large that S = (d(·,x◦ ) ≤ 2n ), relation 3.2.1 says that we need at most κn points to make a 2−n -approximation of S.1 Lemma 3.2.2. Existence of metrically discrete ε-approximations. Let K be a compact subset of the locally compact metric space (S,d). Let A0 ≡ {x1, . . . ,xn } be an arbitrary metrically discrete finite subset of K. Let ε > 0 be arbitrary. Then the following conditions hold: 1. There exists a metrically discrete finite subset A of K such that (i) A0 ⊂ A and (ii) A is an ε-approximation of K. 2. In particular, there exists a metrically discrete finite set A that is an ε-approximation of K.
1 Incidentally, the number log κ is a bound for Kolmogorov’s 2−n -entropy of the compact metn ric space (S,d), which represents the informational content in a 2−n -approximation of S. See
[Lorentz 1966] for a definition of ε-entropy.
Partition of Unity
21
Proof. 1. By hypothesis, the set A0 ≡ {x1, . . . ,xn } is a metrically discrete finite subset of the compact set K. Let ε > 0 be arbitrary. Take an arbitrary ε0 ∈ (0,ε). Take any ε0 -approximation {y1, . . . ,ym } of K. Write α ≡ m−1 (ε − ε0 ). 2. Trivially, the set A0 ∪ {y1, . . . ,ym } is an ε0 -approximation of K. Moreover, A0 is metrically discrete. 3. Consider y1 . Then either (i) there exists x ∈ A0 such that d(x,y1 ) < α or (ii) for each x ∈ A0 we have d(x,y1 ) > 0. In case (i), define A1 ≡ A0 . In case (ii), define A1 ≡ A0 ∪{y1 }. Then, in either case, A1 is a metrically discrete finite subset of K. Moreover, A0 ⊂ A1 . By the assumption in Step 2, the set A0 ∪ {y1, . . . ,ym } is an ε0 -approximation of K. Hence there exists z ∈ A0 ∪ {y1, . . . ,ym } such that d(z,y) < ε0 . There are three possibilities: (i ) z ∈ A0 , (ii ) z ∈ {y2, . . . ,ym }, or (iii ) z = y1 . 4. In cases (i ) and (ii ), we have, trivially, z ∈ A1 ∪ {y2, . . . ,ym } and d(z,y) < ε0 + α. Let w ≡ z ∈ A0 ∪ {y1, . . . ,ym }. 5. Consider case (iii ), where z = y1 . Then, according to Step 3, either (i) or (ii) holds. In case (ii), we have, again trivially, z = y1 ∈ A0 ∪ {y1 } ≡ A1, and d(z,y) < ε0 . Consider case (i). Then A1 ≡ A0, and there exists x ∈ A0 such that d(x,y1 ) < α. Consequently, x ∈ A1 and d(x,y) ≤ d(x,y1 ) + d(y1,y) = d(x,y1 ) + d(z,y) < α + ε0 . Let w ≡ x ∈ A0 ∪ {y1, . . . ,ym }. 6. Combining Steps 4 and 5, we see that, in any case, there exists w ∈ A1 ∪ {y2, . . . ,ym } such that d(w,y) < ε0 + α. Since y ∈ K is arbitrary, we conclude that the set A1 ∪ {y2, . . . ,ym } is an (ε0 + α)-approximation of K, and that the set A1 is metrically discrete. Moreover, A0 ⊂ A1 . 7. Repeat Steps 3–6 with A1 ∪ {y2, . . . ,ym } and ε0 + α in the roles of A0 ∪ {y1, . . . ,ym } and ε0 , respectively. We obtain a metrically discrete subset A2 such that A2 ∪ {y3, . . . ,ym } is an (ε0 + 2α)-approximation of K. Moreover A0 ⊂ A2 . 8. Recursively on y3, . . . ,ym , we obtain a metrically discrete subset Am that is an (ε0 + mα)-approximation of K. Moreover, A0 ⊂ Am . Since ε0 + mα ≡ ε0 + (ε − ε0 ) = ε,
22
Introduction and Preliminaries
it follows that Am is an ε-approximation of K. In conclusion, the set A ≡ Am has the desired properties in Assertion 1. 9. To prove Assertion 2, take an arbitrary x1 ∈ K and define A0 ≡ {x1 }. Then, by Assertion 1, there exists a metrically discrete finite subset A of K such that (i) A0 ⊂ A and (ii) A is an ε-approximation of K. Thus the set A has the desired properties in Assertion 2. Proposition 3.2.3. Existence of binary approximations. Each locally compact metric space (S,d) has a binary approximation. Proof. Let x◦ ∈ S be an arbitrary but fixed reference point. Proceed inductively on n ≥ 0 to construct a metrically discrete and finite subset An of S to satisfy Conditions 3.2.1 and 3.2.2 in Definition 3.2.1. To start, let A0 ≡ {x◦ }. Suppose the set An has been constructed for some n ≥ 0 such that if n ≥ 1, then (i) An is metrically discrete and finite and (ii) Conditions 3.2.1 and 3.2.2 in Definition 3.2.1 are satisfied. Proceed to construct An+1 . To that end, write ε ≡ 2−n−2 and take any r ∈ [2n+1,2n+1 + ε) such that K ≡ (d(·,x◦ ) ≤ r) is compact. This is possible in view of Corollary 3.1.3. If n = 0, then An ≡ {x◦ } ⊂ K trivially. If n ≥ 1, then, according to the induction hypothesis, the set An is metrically discrete, and by Condition 3.2.2 in Definition 3.2.1, we have An ⊂ (d(·,x) ≤ 2−n+1 ) ⊂ (d(·,x◦ ) ≤ 2n+1 ) ⊂ K. x∈A(n)
Hence we can apply Lemma 3.2.2 to construct a 2−n−1 -approximation An+1 of K, which is metrically discrete and finite, such that An ⊂ An+1 . It follows that (d(·,x◦ ) ≤ 2n+1 ) ⊂ K ⊂ (d(·,x) ≤ 2−n−1 ), x∈A(n+1)
proving Condition 3.2.1 in Definition 3.2.1 for n + 1. Now let (d(·,x) ≤ 2−n ) y∈ x∈A(n+1)
be arbitrary. Then d(y,x) ≤ 2−n for some x ∈ An+1 ⊂ K. Hence d(x,x◦ ) ≤ r < 2n+1 + ε. Consequently, d(y,x◦ ) ≤ d(y,x) + d(x,x◦ ) ≤ 2−n + 2n+1 + ε ≡ 2−n + 2n+1 + 2−n−2 ≤ 2n+2 .
Partition of Unity Thus we see that
23
(d(·,x) ≤ 2−n ) ⊂ (d(·,x◦ ) ≤ 2n+2 ),
x∈A(n+1)
proving Condition 3.2.2 in Definition 3.2.1 for n + 1. Induction is completed. The sequence ξ ≡ (An )n=1,2,... satisfies all the conditions in Definition 3.2.1 to be a binary approximation of (S,d). Definition 3.2.4. Finite product and power of binary approximations. Let n ≥ 1 be arbitrary. For each i = 1, . . . ,n, let (Si ,di ) be a locally compact metric space, with a reference point xi,◦ ∈ Si and with a binary approximation
n n ξi ≡ (Ai,p )p=1,2,... relative to xi,◦ . Let (S (n),d (n) ) ≡ i=1 Si , i=1 di be the product metric space, with x◦(n) ≡ (x1,◦, . . . ,xn,◦ ) designated as the reference point in (S (n),d (n) ). (n) For each p ≥ 1, let Ap ≡ A1,p × · · · × An,p . The next lemma proves that (n) (n) (Ap )p=1,2,... is a binary approximation of (S (n),d (n) ) relative to x◦ . We will call (n) ξ (n) ≡ (Ap )p=1,2,... the product binary approximation of ξ1, . . . ,ξn , and write ξ (n) ≡ ξ1 ⊗ · · · ⊗ ξn . If (Si ,di ) = (S,d) for some locally compact metric space, with xi,◦ = x◦ and ξi = ξ for each i = 1, . . . ,n, then we will call ξ (n) the nth power of the binary approximation ξ , and write ξ n ≡ ξ (n) . Lemma 3.2.5. Finite product binary approximation is indeed a binary approximation. Use the assumptions and notations in Definition 3.2.4. Then the finite product binary approximation ξ (n) is indeed a binary approximation of (n) (S (n),d (n) ) relative to x◦ . Let ξi ≡ (κi,p )p=1,2,... ≡ (|Ai,p |)p=1,2,... be the modulus of local compactness of (Si ,di ) corresponding to ξi , for each i = 1, . . . ,n. Let ξ (n) be the modulus of local compactness of (S (n),d (n) ) corresponding to ξ (n) . Then n (n) κi,p . ξ = i=1
p=1,2,...
In particular, if ξi ≡ ξ for each i = 1, . . . ,n for some binary approximation ξ of some locally compact metric space (S,d), then the finite power binary approximation is indeed a binary approximation of (S n,d n ) relative to x◦n . Moreover, the modulus of local compactness of (S n,d n ) corresponding to ξ n is given by n ξ = (κ n )p=1,2,... . p (n)
(n)
Proof. Recall that Ap ≡ A1,p × · · · × An,p for each p ≥ 1. Hence A1 (n) A2 ⊂ · · · . 1. Let p ≥ 1 be arbitrary. Let x ≡ (x1, . . . ,xn ),y ≡ (y1, . . . ,yn ) ∈ A(n) p ≡ A1,p × · · · × An,p
⊂
24
Introduction and Preliminaries
be arbitrary. For each i = 1, . . . ,n, because (Ai,q )q=1,2,... is a binary approximation of (Si ,di ), the set Ai,p is metrically discrete. Hence either (i) xi = yi for each i = 1, . . . ,n or (ii) di (xi ,yi ) > 0 for some i = 1, . . . ,n. In case (i), we have x = y. In case (ii), we have d
(n)
(x,y) ≡
n
dj (xj ,yj ) ≥ di (xi ,yi ) > 0.
j =1 (n)
Thus the set Ap is metrically discrete. 2. Next note that (d
(n)
(·,x◦(n) )
≤ 2 ) ≡ (y1, . . . ,yn ) ∈ S p
(n)
:
n
di (yi ,xi,◦ ) ≤ 2
p
i=1
=
n
{(y1, . . . ,yn ) ∈ S (n) : di (yi ,xi,◦ ) ≤ 2p }
i=1
⊂C≡
n
{(y1, . . . ,yn ) ∈ S (n) : di (yi ,xi ) ≤ 2−p },
i=1 x(i)∈A(i,p)
(3.2.3) where the last inclusion is due to Condition 3.2.1 applied to the binary approximation (Ai,q )q=1,2,... . Basic Boolean operations then yield
C=
n
{(y1, . . . ,yn ) ∈ S (n) : di (yi ,xi ) ≤ 2−p }
(x(1),...,x(n))∈A(1,p)×···×A(n,p) i=1
=
(y1, . . . ,yn ) ∈ S
:
n
−p
di (yi ,xi ) ≤ 2
i=1
(x(1),...,x(n))∈A(1,p)×···×A(n,p)
≡
(n)
(x(1),...,x(n))∈A(1,p)×···×A(n,p)
× {(y1, . . . ,yn ) ∈ S (n) : d (n) ((y1, . . . ,yn ),(x1, . . . ,xn )) ≤ 2−p } ≡ (d (n) (·,(x1, . . . ,xn )) ≤ 2−p ) (x(1),...,x(n))∈A(1,p)×···×A(n,p)
=
(d (n) (·,x) ≤ 2−p ).
(3.2.4)
(n)
x∈Ap
Combining with relation 3.2.3, this yields (d (n) (·,x◦(n) ) ≤ 2p ) ⊂ (d (n) (·,x) ≤ 2−p ), (n)
x∈Ap
where p ≥ 1 is arbitrary. Thus Condition 3.2.1 has been verified for the sequence (n) ξ (n) ≡ (Ap )p=1,2,... .
Partition of Unity
25
3. In the other direction, we have, similarly, (d (n) (·,x) ≤ 2−p+1 ) (n)
x∈Ap
≡
n
(di (·,x) ≤ 2−p+1 )
(n) x∈Ap i=1
=
n
{(y1, . . . ,yn ) ∈ S (n) : di (yi ,xi ) ≤ 2−p+1 }
i=1 x(i)∈A(i,p)
⊂
n
{(y1, . . . ,yn ) ∈ S (n) : di (yi ,xi,◦ ) ≤ 2p+1 }
i=1
= (y1, . . . ,yn ) ∈ S
(n)
:
n
di (yi ,xi,◦ ) ≤ 2
p+1
i=1
= (d (n) (·,x◦(n) ) ≤ 2p+1 ), where p ≥ 1 is arbitrary. This verifies Condition 3.2.2 for the sequence ξ (n) ≡ (n) (Ap )p=1,2,... . Thus all the conditions in Definition 3.2.1 have been proved for (n) the sequence ξ (n) to be a binary approximation of (S (n),d (n) ) relative to x◦ . Moreover, n n (n) (n) ξ ≡ (|A |)q=1,2,... = |Ai,q | ≡ κi,q . q i=1
q=1,2,...
i=1
q=1,2,...
We next extend the construction of the powers of binary approximations to the infinite power (S ∞,d ∞ ) of a compact metric space (S,d). Recall that d ≡ 1 ∧ d. Definition 3.2.6. Countable power of binary approximation for a compact metric space. Suppose (S,d) is a compact metric space, with a reference point x◦ ∈ S and a binary approximation ξ ≡ (An )n=1,2,... relative to x◦ . Let (S ∞,d ∞ ) be the countable power of the metric space (S,d), with x◦∞ ≡ (x◦,x◦, . . .) designated as the reference point in (S ∞,d ∞ ). For each n ≥ 1, define the subset ∞ Bn ≡ An+1 n+1 × {x◦ }
= {(x1, . . . ,xn+1,x◦,x◦ · · · ) : xi ∈ An+1 for each i = 1, . . . ,n + 1}. The next lemma proves that ξ ∞ ≡ (Bn )n=1,2,... is a binary approximation of (S ∞,d ∞ ) relative to x◦∞ . We will call ξ ∞ the countable power of the binary approximation ξ . Lemma 3.2.7. Countable power of binary approximation of a compact metric space is indeed a binary approximation. Suppose (S,d) is a compact
26
Introduction and Preliminaries
metric space, with a reference point x◦ ∈ S and a binary approximation ξ ≡ (An )n=1,2,... relative to x◦ . Without loss of generality, assume that d ≤ 1. Then the sequence ξ ∞ ≡ (Bn )n=1,2,... in Definition 3.2.6 is indeed a binary approximation of (S ∞,d ∞ ) relative to x◦∞ . Let ξ ≡ (κn )n=1,2,... ≡ (|An |)n=1,2,... denote the modulus of local compactness of (S,d) corresponding to ξ . Then the modulus of local compactness of (S ∞,d ∞ ) corresponding to ξ ∞ is given by ∞ ξ = (κ n+1 )n=1,2,... . n+1 Proof. Let n ≥ 1 be arbitrary. 1. Let x ≡ (x1, . . . ,xn+1,x◦,x◦, . . .),y ≡ (y1, . . . ,yn+1,x◦,x◦, . . .) ∈ Bn be arbitrary. Since An+1 is metrically discrete, we have either (i) xi = yi for each i ,yi ) > 0 for some i = 1, . . . ,n + 1. In case (i), we i = 1, . . . ,n + 1 or (ii) d(x have x = y. In case (ii), we have ∞
d (x,y) ≡
∞
j ,yj ) ≥ 2−i d(x i ,yi ) > 0. 2−j d(x
j =1
Thus we see that Bn is metrically discrete. 2. Next, let y ≡ (y1,y2, . . .) ∈ S ∞ be arbitrary. Let j = 1, . . . ,n + 1 be arbitrary. Then (d(·,z) ≤ 2−n−1 ), yj ∈ (d(·,x◦ ) ≤ 2n+1 ) ⊂ z∈A(n+1)
where the first containment relation is a trivial consequence of the hypothesis that d ≤ 1, and where the second is an application of Condition 3.2.1 of Definition 3.2.1 to the binary approximation ξ ≡ (An )n=1,2,... . Hence there exists some uj ∈ An+1 with d(yj ,uj ) ≤ 2−n−1 , where j = 1, . . . ,n + 1 is arbitrary. It follows that ∞ u ≡ (u1, . . . ,un+1,x◦,x◦, . . .) ∈ An+1 n+1 × {x◦ } ≡ Bn,
and that ∞
d (y,u) ≤
n+1
j =1
≤
n+1
−j
2
d(yj ,uj ) +
∞
2−j
j =n+2
2−j 2−n−1 + 2−n−1 < 2−n−1 + 2−n−1 = 2−n,
j =1
where y ∈
S∞
is arbitrary. We conclude that (d ∞ (·,x◦∞ ) ≤ 2n ) = S ∞ ⊂ (d ∞ (·,u) ≤ 2−n ). u∈B(n)
Partition of Unity
27
where the equality is trivial because d ∞ ≤ 1. Thus Condition 3.2.1 of Definition 3.2.1 is verified for the sequence (Bn )n=1,2,... . At the same time, again because because d ∞ ≤ 1, we have trivially (d ∞ (·,u) ≤ 2−n+1 ) ⊂ S ∞ = (d ∞ (·,x◦∞ ) ≤ 2n+1 ). u∈B(n)
Thus Condition 3.2.2 of Definition 3.2.1 is also verified for the sequence (Bn )n=1,2,... . All the conditions in Definition 3.2.1 have been verified for the sequence ξ ∞ ≡ (Bn )n=1,2,... to be a binary approximation of (S ∞,d ∞ ) relative to x◦∞ . Moreover, ∞ ξ ≡ (|Bn |)n=1,2,... = (|An+1 |)n=1,2,... ≡ (κ n+1 )n=1,2,... . n+1 n+1
The lemma is proved.
3.3 Partition of Unity In this section, we define and construct a partition of unity relative to a binary approximation of a locally compact metric space (S,d). There are many different versions of partitions of unity in the mathematics literature, providing approximate linear bases in the analysis of various linear spaces of functions. The present version, roughly speaking, furnishes an approximate linear basis for C(S,d), the space of continuous functions with compact supports on a locally compact metric space. In this version, the basis functions will be endowed with specific properties that make later applications simpler. For example, each basis function will be Lipschitz continuous. First we prove an elementary lemma for Lipschitz continuous functions. Lemma 3.3.1. Definition and basics for Lipschitz continuous functions. Let (S,d) be an arbitrary metric space. A real-valued function f on S is said to be Lipschitz continuous, with Lipschitz constant c ≥ 0, if |f (x) − f (y)| ≤ cd(x,y) for each x,y ∈ S. We will then also say that the function has Lipschitz constant c. Let x◦ ∈ S be an arbitrary but fixed reference point. Let f ,g be real-valued functions with Lipschitz constants a,b, respectively, on S. Then the following conditions hold: 1. d(·,x◦ ) has Lipschitz constant 1. 2. αf + βg has Lipschitz constant |α|a + |β|b for each α,β ∈ R. If, in addition, |f | ≤ 1 and |g| ≤ 1, then f g has Lipschitz constant a + b. 3. f ∨ g and f ∧ g have Lipschitz constant a ∨ b. 4. 1 ∧ (1 − cd(·,x◦ ))+ has Lipschitz constant c for each c > 0. 5. If f ∨ g ≤ 1, then f g has Lipschitz constant a + b. 6. Suppose (S ,d ) is a locally compact metric space. Suppose f is a realvalued function on S , with Lipschitz constant a > 0. Suppose f ∨ f ≤ 1.
28
Introduction and Preliminaries
Then f ⊗ f : S × S → R has Lipschitz constant a + a , where S × S is equipped with the product metric d ≡ d ⊗ d , and where f ⊗ f (x,x ) ≡ f (x)f (x ) for each (x,x ) ∈ S × S . 7. Assertion 6 can be generalized to a p-fold product f ⊗ f ⊗ · · · ⊗ f (p) . Proof. Let x,y ∈ S be arbitrary. 1. By the triangle inequality, we have |d(x,x◦ ) − d(y,x◦ )| ≤ d(x,y). Assertion 1 follows. 2. Note that |αf (x) + βg(x) − (αf (y) + βg(y))| ≤ |α(f (x) − f (y))| + |β(g(x) − g(y))| ≤ (|α|a + |β|b)d(x,y) and that if |f | ≤ 1 and |g| ≤ 1, then |f (x)g(x) − f (y)g(y)| ≤ |(f (x) − f (y))g(x)| + |(g(x) − g(y))f (y)|. ≤ |f (x) − f (y)| + |g(x) − g(y)| ≤ (a + b)d(x,y). Assertion 2 is proved. 3. To prove Assertion 3, first consider arbitrary r,s,t,u ∈ R. We will show that α ≡ |r ∨ s − t ∨ u| ≤ β ≡ |r − t| ∨ |s − u|.
(3.3.1)
To see this, we may assume, without loss of generality, that r ≥ s. If t ≥ u, then α = |r − t| ≤ β. On the other hand, if t ≤ u, then −β ≤ −|s − u| ≤ s − u ≤ r − u ≤ r − t ≤ |r − t| ≤ β, whence α = |r − u| ≤ β. By continuity, we see that inequality 3.3.1 holds for each r,s,t,u ∈ R. Consequently, |f (x) ∨ g(x) − f (y) ∨ g(y)| ≤ |f (x) − f (y)| ∨ |g(x) − g(y)| ≤ (a ∨ b)d(x,y) for each x,y ∈ S. Thus f ∨ g has Lipschitz constant a ∨ b. Assertion 2 then implies that the function f ∧ g = −((−f ) ∨ (−g)) also has Lipschitz constant a ∨ b. Assertion 3 is proved. 4. Assertion 4 follows immediately from Assertions 1, 2, and 3. 5. Assertion 5 follows from |f (x)g(x) − f (y)g(y)| ≤ |f (x)(g(x) − g(y))| + |(f (x) − f (y))g(y)| ≤ (b + a)d(x,y). 6. Suppose f ∨ f ≤ 1. Then for each (x,x ),(y,y ) ∈ S × S . |f (x)f (x ) − f (y)f (y )| ≤ |f (x)(f (x ) − f (y ))| + |(f (x) − f (y))f (y )| ≤ (a + a)(d(x ,y ) ∨ d(x,y)) ≡ (a + a)d ⊗ d ((x,x )),d(y,y ))
Partition of Unity
29
Thus the function f ⊗ f : S × S → R has Lipschitz constant a + a . Assertion 6 follows. 7. The proof of Assertion 7 is omitted. The next definitions and propositions embellish proposition 6.15 on page 119 of [Bishop and Bridges 1985]. Definition 3.3.2. ε-Partition of unity. Let A be an arbitrary metrically discrete and finite subset of a locally compact metric space (S,d). Because the set A is finite, we can write A = {x1, . . . ,xκ } for some sequence x ≡ (x1, . . . ,xκ ), where x : {1, . . . ,κ} → A is an enumeration of the finite set A. Thus |A| ≡ κ. Let ε > 0 be arbitrary. Define, for each k = 1, . . . ,κ, the function ηk ≡ 1 ∧ (2 − ε−1 d(·,xk ))+ ∈ C(S,d)
(3.3.2)
gk+ ≡ η1 ∨ · · · ∨ ηk ∈ C(S,d).
(3.3.3)
and
In addition, define
g0+
≡ 0. Also, for each k = 1, . . . ,κ, define + . gx(k) ≡ gk+ − gk−1
(3.3.4)
Then the subset {gy : y ∈ A} of C(S,d) is called the ε-partition of unity of (S,d), determined by the enumerated set A. The members of {gy : y ∈ A} are called the basis functions of the ε-partition of unity. Proposition 3.3.3. Properties of ε-partition of unity. Let A = {x1, . . . ,xκ } be an arbitrary metrically discrete and enumerated finite subset of a locally compact metric space (S,d). Let ε > 0 be arbitrary. Let {gx : x ∈ A} be the ε-partition of unity determined by the enumerated set A. Then the following conditions hold: 1. gx has values in [0,1] and has the set (d(·,x) < 2ε) as support, for each x ∈ A. 2. x∈A gx ≤ 1 on S. 3. x∈A gx = 1 on x∈A (d(·,x) ≤ ε). 4. For each x ∈ A, the functions gx , y∈A;y 0. By the defining equality 3.3.4, it follows + (y). Hence ηk (y) > 0 by equality 3.3.3. Equality 3.3.2 then that gk+ (y) > gk−1 implies that d(y,xk ) < 2ε. Thus we see that the function gx(k) has (d(·,xk ) < 2ε) as support. In general, gx(k) has values in [0,1], thanks to equalities 3.3.2, 3.3.3, and 3.3.4. Assertion 1 is proved. 2. Note that x∈A gx = gκ+ ≡ η1 ∨ · · · ∨ ηκ ≤ 1. Assertion 2 is verified. 3. Suppose y ∈ S is such that d(y,xk ) ≤ ε for some k = 1, . . . ,κ. Then ηk (y) = 1 according to equality 3.3.2. Hence x∈A gx (y) = gk+ (y) = 1 by equality 3.3.3. Assertion 3 is proved.
30
Introduction and Preliminaries
4. Now let k = 1, . . . ,κ be arbitrary. Refer to Lemma 3.3.1 for the basic properties of Lipschitz constants. Then, in view of the defining equality 3.3.2, the function ηk has Lipschitz constant ε−1 . Hence gk+ ≡ η0 ∨ · · · ∨ ηk has Lipschitz constant ε−1 . In particular, y∈A gy ≡ gκ+ has Lipschitz constant ε−1 . Moreover, for each k = 1, . . . ,κ, the function
gy ≡
y∈A;y 0 be arbitrary. Then the following conditions hold: 1. There exists k ≥ 1 so large that (i) the function f has the set B ≡ (d(·,x◦ ) ≤ 2k )n ⊂ S n as support, and that (ii) 2−k < 2−1 δf (3−1 ε). 2. Take any k ≥ 1 that satisfies Conditions (i) and (ii). Then sup (y,y ,...,y (n) )∈S n
−
|f (y,y , . . . ,y (n) )
f (x,x , . . . ,x (n) )gx (y)gx (y ) · · · gx (n) (y (n) )| ≤ ε.
(x,x ,...,x (n) )∈A(k)n
(3.3.6) In other words,
(n) f − f (x,x , . . . ,x )gx ⊗ gx ⊗ · · · ⊗ gx (n) ≤ ε, (x,x ,...,x (n) )∈A(k)n where · signifies the supremum norm in C(S n,d n ).
(3.3.7)
32
Introduction and Preliminaries
3. The function represented by the sum in inequality 3.3.7 is Lipschitz continuous. 4. In particular, suppose n = 1 and |f | ≤ 1. Take any k ≥ 1 that satisfies Conditions (i) and (ii). Then
f − g ≤ ε for some Lipschitz continuous function with Lipschitz constant 2k+1 |Ak |. Proof. We will give the proof only for the case where n = 2; the other cases are similar. 1. Let K be a compact subset of C(S 2,d 2 ) such that the function f ∈ C(S 2,d 2 ) has K as support. Since the compact set K is bounded, there exists k ≥ 1 so large that K ⊂ B ≡ (d(·,x◦ ) ≤ 2k ) × (d(·,x◦ ) ≤ 2k ) and that (ii) 2−k < 2−1 δf (3−1 ε). Conditions (i) and (ii) follow. Assertion 1 is proved. 2. Now fix any k ≥ 1 that satisfies Conditions (i) and (ii). Note that (d(·,x◦ ) ≤ k 2 ) ⊂ x∈A(k) (d(·,x) ≤ 2−k ) by relation 3.2.1 of Definition 3.2.1. Hence Condition (i) implies that (i ) the function f has the set (d(·,x) ≤ 2−k ) × (d(·,x ) ≤ 2−k ) x∈A(k)
x ∈A(k)
as support. For abbreviation, write α ≡ 2−k , A ≡ Ak , and gx ≡ gk,x for each x ∈ A. 3. Let (y,y ) ∈ S 2 be arbitrary. Suppose x,x ∈ A are such that gx (y)gx (y ) > 0. Then, since gx ,gx have (d(·,x) < 2−k+1 ),(d(·,x ) < 2−k+1 ), respectively, as support, according to Assertion 1 of Proposition 3.3.5, it follows that d 2 ((y,y ), (x,x )) < 2α < δf (3−1 ε). Consequently, the inequality |f (y,y ) − f (x,x )|gx (y)gx (y ) ≤ 3−1 εgx (y)gx (y )
(3.3.8)
holds for arbitrary (x,x ) ∈ A2 such that gx (y)gx (y ) > 0 and, consequently, holds for arbitrary (x,x ) ∈ A2 . 4. There are two possibilities: (i ) |f (y,y )| > 0 or (ii ) |f (y,y )| < 3−1 ε. 5. First consider case (i ). Then, by Condition (i ), we have (y,y ) ∈ (d(·,x◦ ) ≤ 2k ) × (d(·,x◦ ) ≤ 2k ) ⊂ B ≡ (d(·,x) < 2−k ) × (d(·,x ) < 2−k ). (x,x )∈A×A
Hence, by Assertion 3 of Proposition 3.3.3, we have x ∈A gx (y ) = 1. Therefore
x∈A gx (y)
= 1 and
Partition of Unity
f (x,x )gx (y)gx (y ) f (y,y ) − x∈A x ∈A
= f (y,y )gx (y)gx (y ) − f (x,x )gx (y)gx (y ) x∈A x ∈A x∈A x ∈A
≤ |f (y,y ) − f (x,x )|gx (y)gx (y )
33
x∈A x ∈A
≤ 3−1 ε
gx (y)gx (y ) ≤ 3−1 ε < ε,
x∈A x ∈A
where the second inequality follows from inequality 3.3.8, and where the third inequality is thanks to Assertion 2 of Proposition 3.3.3. 6. Now consider case (ii), where |f (y,y )| < 3−1 ε.
(3.3.9)
Then
f (x,x )gx (y)gx (y ) < 3−1 ε + |f (x,x )|gx (y)gx (y ) f (y,y ) − x∈Ax ∈A
−1
≤3
ε+
x∈Ax ∈A
−1
(|f (y,y )| + 3
ε)gx (y)gx (y )
x∈A x ∈A
≤ 3−1 ε +
(3−1 ε + 3−1 ε)gx (y)gx (y )
x∈A x ∈A
≤ 3−1 ε + (3−1 ε + 3−1 ε) = ε,
(3.3.10)
where the first and third inequalities are by inequality 3.3.9, where the second inequality is by inequality 3.3.8, and the last inequality is thanks to Assertion 2 of Proposition 3.3.3. 7. Summing up, Steps 5 and 6 show that
f (x,x )gx (y)gx (y ) ≤ ε, f (y,y ) − x∈A x ∈A
(y,y ) ∈ S 2
where is arbitrary. The desired inequality 3.3.6 follows for the case where n = 2. The proof for the general case n ≥ 1 is similar. Assertion 2 of the present proposition is verified. 8. Let x,x ∈ A be arbitrary. Then the functions gx ,gx are Lipschitz, according to Assertion 4 of Proposition 3.3.3. Hence the function gx ⊗ gx is Lipschitz, by Assertion 6 of Lemma 3.3.1. Assertion 3 of the present proposition follows. 9. Now suppose, in addition, that n = 1 and |f | ≤ 1. According to Proposition 3.3.3, each of the functions gx in the last sum has Lipschitz constant 2k+1 , while f (x) is bounded by 1 by hypothesis. By Assertion 2, we have
34
Introduction and Preliminaries
f − f (x)gx ≤ ε, x∈A(k) where the function g ≡ x∈A(k) f (x)gx has Lipschitz constant
|f (x)|2k+1 ≤ 2k+1 |Ak |,
(3.3.11)
x∈A(k)
according to Assertion 2 of Lemma 3.3.1. Assertion 4 of the present proposition is proved.
3.4 One-Point Compactification The countable power of a locally compact metric space (S,d) is not necessarily locally compact, while the countable power of a compact metric space remains compact. For that reason, we will often find it convenient to embed a locally compact metric space into a compact metric space such that while the metric is not preserved, the continuous functions are. This embedding is made precise in the present section, by an application of partitions of unity. The next definition is an embellishment of definition 6.6, proposition 6.7, and theorem 6.8 of [Bishop and Bridges 1985]. Definition 3.4.1. One-point compactification. A one-point compactification of a locally compact metric space (S,d) is a compact metric space (S,d) with an element , called the point at infinity, such that the following five conditions hold: 1. d ≤ 1 and S ∪ {} is a dense subset of (S,d). 2. Let K be an arbitrary compact subset of (S,d). Then there exists c > 0 such that d(x,) ≥ c for each x ∈ K. 3. Let K be an arbitrary compact subset of (S,d). Let ε > 0 be arbitrary. Then there exists δK (ε) > 0 such that for each y ∈ K and z ∈ S with d(y,z) < δK (ε), we have d(y,z) < ε. In particular, the identity mapping ι¯ : (S,d) → (S,d), defined by ι(x) ≡ x for each x ∈ S, is uniformly continuous on each compact subset K of (S,d). 4. The identity mapping ι : (S,d) → (S,d), defined by ι(x) ≡ x for each x ∈ S, is uniformly continuous on (S,d). In other words, for each ε > 0, there exists δd (ε) > 0 such that d(x,y) < ε for each x,y ∈ S with d(x,y) < δd (ε). 5. For each n ≥ 1, we have (d(·,x◦ ) > 2n+1 ) ⊂ (d(·,) ≤ 2−n ). Thus, as a point x ∈ S moves away from x◦ relative to d, it converges to the point at infinity relative to d. First we provide some convenient notations. Definition 3.4.2. Restriction of a family of functions. Let A,A be arbitrary sets and let B be an arbitrary subset of A. Recall that the restriction of a function
Partition of Unity
35
f : A → A to a subset B ⊂ A is denoted by f |B. Suppose F is a family of functions from A to A and suppose B ⊂ A. Then we call the family F |B ≡ {f |B : f ∈ F } the restriction of F to B. The next theorem constructs a one-point compactification. The proof loosely follows the lines of theorem 6.8 in chapter 4 of [Bishop and Bridges 1985]. Theorem 3.4.3. Construction of a one-point compactification from a binary approximation. Let (S,d) be a locally compact metric space. Let the sequence ξ ≡ (An )n=1,2,... of subsets be a binary approximation of (S,d) relative to x◦ . Then there exists a one-point compactification (S,d) of (S,d) such that the following conditions hold: (i) For each p ≥ 1 and for each y,z ∈ S with d(y,z) < p−1 2−p−1, we have d(y,z) < 2−p+1 . (ii) For each n ≥ 1, for each y ∈ (d(·,x◦ ) ≤ 2n ), and for each z ∈ S with d(y,z) < 2−n−1 |An |−2, we have d(y,z) < 2−n+2 . The one-point compactification (S,d) constructed in the proof is said to be relative to the binary approximation ξ . Proof. 1. Let π ≡ ({gn,x : x ∈ An })n=1,2,... be the partition of unity of (S,d) determined by ξ . Let n ≥ 1 be arbitrary. Then {gn,x : x ∈ An } is a 2−n -partition of unity corresponding to the metrically discrete and enumerated finite set An . Moreover, by Assertion 4 of Proposition 3.3.5, the function gn,x has Lipschitz constant 2n+1 for each x ∈ An . 2. Define S ≡ {(x,i) ∈ S × {0,1} : i = 0 or (x,i) = (x◦,1)} S. Thus S=S∪ and define ≡ (x◦,1). Identify each x ∈ S with x¯ ≡ (x,0) ∈ {}. Extend each function f ∈ C(S,d) to a function on S by defining f () ≡ 0. S, with gn,x () ≡ 0, In particular, the function gn,x is extended to a function on S × S by for each x ∈ An . Define a function d on
36
Introduction and Preliminaries d(y,z) ≡
∞
2−n |An |−1
n=1
|gn,x (y) − gn,x (z)|
(3.4.1)
x∈A(n)
S. Symmetry and triangle for each y,z ∈ S. Then d(y,y) = 0 for each y ∈ inequality of the function d are immediate consequences of equality 3.4.1. Moreover, d ≤ 1 since the functions gn,x have values in [0,1]. 3. Let K be an arbitrary compact subset of (S,d) and let y ∈ K be arbitrary. Let n ≥ 1 be so large that y ∈ K ⊂ (d(·,x◦ ) ≤ 2n ). Then y∈
⎛ (d(·,x) ≤ 2−n ) ⊂ ⎝
x∈A(n)
⎞ gn,x = 1⎠ ,
x∈A(n)
where the membership relation is by Condition 3.2.1 in Definition 3.2.1, and where the set inclusion is according to Assertion 3 of Proposition 3.3.3. Hence the defining equality 3.4.1, where z is replaced by , yields
d(y,) ≥ 2−n |An |−1 gn,x (y) = c ≡ 2−n |An |−1 > 0, (3.4.2) x∈A(n)
establishing Condition 2 in Definition 3.4.1. Note that the constant c is independent of y ∈ K. 4. Let n ≥ 1 be arbitrary. Let y ∈ (d(·,x◦ ) ≤ 2n ) and z ∈ S be arbitrary such that d(y,z) < δξ,n ≡ 2−n−1 |An |−2 . As seen in Step 3,
gn,x (y) = 1.
x∈A(n)
Hence there exists x ∈ An such that gn,x (y) > 2−1 An |−1 > 0. At the same time, |gn,x (y) − gn,x (z)| ≤
(3.4.3)
|gn,u (y) − gn,u (z)|
u∈A(n)
≤ 2n |An |d(y,z) < 2n |An |δξ,n ≡ 2−1 |An |−1 . Hence inequality 3.4.3 implies that gn,x (z) > 0. Recall that gn,x ∈ C(S,d) has support (d(·,x) ≤ 2−n+1 ). Consequently, y,z ∈ (d(·,x) < 2−n+1 ). Thus d(y,z) < 2−n+2 . This establishes Assertion (ii) of the present theorem. Now let K be an arbitrary compact subset of (S,d) and let ε > 0 be arbitrary. Let n ≥ 1 be so large that K ⊂ (d(·,x◦ ) ≤ 2n ) and that 2−n+2 < ε. Let δK (ε) ≡ δξ,n .
Partition of Unity
37
Then, by the preceding paragraph, for each y ∈ K and z ∈ S with d(y,z) < δK (ε) ≡ δξ,n , we have d(y,z) < ε. Condition 3 in Definition 3.4.1 has been verified. In particular, suppose y,z ∈ S are such that d(y,z) = 0. Then either y = z = or y,z ∈ S, in view of inequality 3.4.2. Suppose y,z ∈ S. Then the preceding paragraph applied to the compact set K ≡ {y,z}, implies that d(y,z) = 0. Since (S,d) is a metric space, we conclude that y = z. In view of the last two statements of Step 2, ( S,d) is a metric space. S,d). Then (S,d) is a complete Let (S,d) be the completion of the metric space ( S is a dense subset of (S,d). metric space, with d ≤ 1 on S × S, and 5. Recall that gn,x has values in [0,1] and, as remarked earlier, has Lipschitz constant 2n+1 , for each x ∈ An , for each n ≥ 1. Now let p ≥ 1 be arbitrary. Let y,z ∈ S be arbitrary such that d(y,z) < p−1 2−p−1 . Then d(y,z) ≡
∞
2−n |An |−1
n=1
≤
p
|gn,x (y) − gn,x (z)|
x∈A(n)
2−n 2n+1 d(y,z) +
n=1
∞
2−n
n=p+1
< p · 2 · p−1 2−p−1 + 2−p = 2−p + 2−p = 2−p+1 . In short, for arbitrary y,z ∈ S such that d(y,z) < p−1 2−p−1 , we have d(y,z) < 2−p+1 .
(3.4.4)
This establishes Assertion (i) of the present theorem. Since 2−p+1 is arbitrarily small for sufficiently large p ≥ 1, we see that the identity mapping ι : (S,d) → (S,d) is uniformly continuous. This establishes Condition 4 in Definition 3.4.1. 6. Let n ≥ 1 be arbitrary. Consider each y ∈ (d(·,x◦ ) > 2n+1 ). Let m = 1, . . . ,n be arbitrary. Then (d(·,x) ≥ 2−m+1 ). y ∈ (d(·,x◦ ) > 2m+1 ) ⊂ x∈A(m)
Let x ∈ Am be arbitrary. Suppose d(y,x) < 2−m+1 . Then (d(·,x) ≤ 2−m+1 ) ⊂ (d(·,x◦ ) ≤ 2m+1 ) y∈
(3.4.5)
x∈A(m)
by Condition 3.2.2 in Definition 3.2.1 of a binary approximation. This is a contradiction. Hence d(y,x) ≥ 2−m+1 . Since gm,x has support (d(·,x) ≤ 2−m+1 ) according to Assertion 1 of Proposition 3.3.5, we infer that gm,x (y) = 0, where x ∈ Am and m = 1, . . . ,n are arbitrary. Hence the defining equality 3.4.1, where z is replaced by , reduces to
38
Introduction and Preliminaries d(y,) ≡
∞
2−m |Am |−1
m=1
=
∞
2−m |Am |−1
m=n+1
≤
∞
gm,x (y)
x∈A(m)
gm,x (y)
x∈A(m)
2−m = 2−n .
(3.4.6)
m=n+1
Since y ∈ (d(·,x◦ ) > 2n+1 ) is arbitrary, we conclude that (d(·,x◦ ) > 2n+1 ) ⊂ (d(·,) ≤ 2−n ).
(3.4.7)
This proves Condition 5 in Definition 3.4.1. 7. We will prove next that ( S,d) is totally bounded. To that end, let p ≥ 1 be arbitrary. Let m ≡ mp ≡ [(p + 2) + log2 (p + 2)]1, where we recall that [·]1 is the operation that assigns to each a ∈ [0,∞) an integer [a]1 in (a,a + 2). Then 2−m < δ p ≡ (p + 2)−1 2−p−2 . Note that S ≡ S ∪ {} ⊂ (d(·,x◦ ) < 2m ) ∪ (d(·,x◦ ) > 2m−1 ) ∪ {} ⊂ (d(·,x) ≤ 2−m ) ∪ (d(·,) ≤ 2−m+2 ) ∪ {}, x∈A(m)
where the second set inclusion is due to Condition 3.2.1 in Definition 3.2.1, and to relation 3.4.7 where n is replaced by m − 2. Continuing, (d(·,x) ≤ (p + 1)−1 2−p−2 ) ∪ (d(·,) < (p + 2)−1 2−p ) ∪ {} S⊂ x∈A(m)
⊂
(d(·,x) < 2−p ) ∪ (d(·,) < 2−p ) ∪ {},
x∈A(m)
where the last set inclusion is by applying inequality 3.4.4 with p replaced by p + 1. Consequently, the set Ap ≡ Am(p) ∪ {} is a metrically discrete 2−p -approximation of ( S,d). Since 2−p is arbitrarily small, the metric space ( S,d) is totally bounded. Hence its completion (S,d) is compact, and S is dense in (S,d), proving Condition 1 in Definition 3.4.1. 8. Incidentally, since S ≡ S ∪ {} is a dense subset of (S,d), the metrically discrete finite subset Ap is a 2−p -approximation of (S,d).
Partition of Unity
39
Summing up, (S,d) satisfies all the conditions in Definition 3.4.1 to be a onepoint compactification of (S,d). Corollary 3.4.4. Compactification of a binary approximation. Use the same notations and assumptions as in Theorem 3.4.3. Let (S,d) be a locally compact metric space. Let the sequence ξ ≡ (An )n=1,2,... of subsets be a binary approximation of (S,d) relative to the reference point x◦ . For each n ≥ 1, let An ≡ {xn,1, . . . ,xn,κ(n) }. Thus
ξ ≡ (|An |)n=1.2.... = (|κn |)n=1.2.··· . For each p ≥ 1 , write mp ≡ [(p + 2) + log2 (p + 2)]1 . Define Ap ≡ Am(p) ∪ {} ≡ {xm(p),1, . . . ,xm(p),κ(m(p)),}. Then ξ ≡ (Ap )p=1,2,... is a binary approximation of (S,d) relative to x◦ , called the compactification of the binary approximation ξ . The corresponding modulus of local compactness of (S,d) is given by ξ ≡ (|Ap |)p=1,2,... = (|Am(p) | + 1)p=1,2,... and is determined by ξ . Proof. Let p ≥ 1 be arbitrary. According to Step 8 of the proof of Theorem 3.4.3, the finite set Ap is a metrically discrete 2−p -approximation of (S,d). Hence
(d(·,x) ≤ 2−p ). (d(·,x◦ ) ≤ 2p ) ⊂ S ⊂ x∈A(p)
At the same time, Condition 1 of Definition 3.4.1 says that d ≤ 1. Hence, trivially, (d(·,x) ≤ 2−p+1 ) ⊂ S ⊂ (d(·,x◦ ) ≤ 1) ⊂ (d(·,x◦ ) ≤ 2p+1 ). x∈A(p)
Thus all the conditions in Definition 3.2.1 have been verified for ξ ≡ (Ap )p=1,2,... to be a binary approximation of (S,d) relative to x◦ . Let (S,d) be an arbitrary locally compact metric space, and let (S,d) be a onepoint compactification of (S,d). Let n ≥ 1 be arbitrary. Note that if n ≥ 2, then the n n power metric space (S ,d ) is compact, but need not be a one-point compactification of (S n,d n ). Recall that Cub (S,d) denotes the space of bounded and uniformly continuous functions on a locally compact metric space (S,d). The next proposition proves the promised preservation of continuous functions for a one-point compactification. Proposition 3.4.5. Continuous functions on (S, d) and continuous functions on (S, d). Let (S,d) be a locally compact metric space, with a fixed reference point x◦ ∈ S. Let (S,d) be a one-point compactification of (S,d), with the point at infinity . Then the following conditions hold: 1. Each compact subset K of (S,d) is also a compact subset of (S,d).
40
Introduction and Preliminaries
2. Let n ≥ 1 be arbitrary. Then n
n
C(S n,d n ) ⊂ C(S ,d )|S n ⊂ Cub (S n,d n ).
(3.4.8)
n n Moreover, for each f ∈ C(S n,d n ), there exists f ∈ C(S ,d ) such that f(x1, . . . ,xn ) ≡ f (x1, . . . ,xn ) or f(x1, . . . ,xn ) ≡ 0 according as (x1, . . . ,xn ) ∈ Sn. S n or xi = for some i = 1, . . . ,n, for each (x1, . . . ,xn ) ∈
Proof. 1. Suppose K is a compact subset of (S,d). By Conditions 3 and 4 of Definition 3.4.1, the identity mapping ι : (K,d) → (K,d) and its inverse are uniformly continuous. Since, by assumption, (K,d) is compact, so is (K,d). Assertion 1 is proved. 2. Let n ≥ 1 be arbitrary. Consider each f ∈ C(S n,d n ), with a modulus of continuity δf . Let the compact subset K of (S,d) be such that K n is a compact support of f . Write S ≡ S ∪ {}. Extend f to a function f : S n → R by defining f (x1, . . . ,xn ) ≡ f (x1, . . . ,xn ) or f (x1, . . . ,xn ) ≡ 0 according as (x1, . . . ,xn ) ∈ S n . We will show S n or xi = for some i = 1, . . . ,n, for each (x1, . . . ,xn ) ∈ n n that f is uniformly continuous on (S ,d ). 3. To that end, let ε > 0 be arbitrary. By Condition 3 in Definition 3.4.1, there exists δK (δf (ε)) > 0 such that for each y ∈ K and z ∈ S with d(y,z) < δ ≡ δK (δf (ε)), we have d(y,z) < δf (ε). 4. By Condition 2 in Definition 3.4.1, we have d(K,) > 0. Hence δ ≡ δ ∧ d(K,) > 0. S n with 5. Now consider each (x1, . . . ,xn ),(y1, . . . ,yn ) ∈ n
d ((x1, . . . ,xn ),(y1, . . . ,yn )) < δ ≤ δ.
(3.4.9)
There are four possibilities: (i) |f(x1, . . . ,xn ) − f(y1, . . . ,yn )| > 0 and xi = for some i = 1, . . . ,n. (ii) |f(x1, . . . ,xn ) − f(y1, . . . ,yn )| > 0 and yi = for some i = 1, . . . ,n. (iii) |f(x1, . . . ,xn ) − f(y1, . . . ,yn )| > 0 and (x1, . . . ,xn ),(y1, . . . ,yn ) ∈ S n . (iv) |f(x1, . . . ,xn ) − f(y1, . . . ,yn )| < ε. 6. Consider case (i). Then f(x1, . . . ,xn ) = 0 by the definition of the function f. Hence |f(y1, . . . ,yn )| > 0. Therefore (y1, . . . ,yn ) ∈ S n by the definition of the function f, and |f (y1, . . . ,yn )| ≡ |f(y1, . . . ,yn )| > 0. Since, by assumption, the compact set K n is a support of the function f , we see that (y1, . . . ,yn ) ∈ K n . Hence yi ∈ K. Therefore d(yi ,xi ) = d(yi ,) ≥ d(K,) ≥ δ ∧ d(K,) ≡ δ. Consequently, n
d ((x1, . . . ,xn ),(y1, . . . ,yn )) ≡
n j =1
a contradiction to inequality 3.4.9.
d(yj ,xj ) ≥ d(yi ,xi ) ≥ δ,
Partition of Unity
41
7. Similarly, case (ii) leads to a contradiction. Only cases (iii) and (iv) remain possible. 8. Consider case (iii). Then |f (x1, . . . ,xn ) − f (y1, . . . ,yn )| ≡ |f(x1, . . . ,xn ) − f(y1, . . . ,yn )| > 0. (3.4.10) Hence |f (x1, . . . ,xn )| > 0 or |f (y1, . . . ,yn )| > 0. Assume, without loss of generality, that |f (x1, . . . ,xn )| > 0. Since the compact set K n is a support of f , we have (x1, . . . ,xn ) ∈ K n , while (y1, . . . ,yn ) ∈ S n by assumption. Let i = 1, . . . ,n be arbitrary. Then xi ∈ K and yi ∈ S. At the same time, inequality 3.4.9 implies that d(yi ,xi ) < δ. Therefore, according to Step 3, we have d(yi ,xi ) < δf (ε), where i = 1, . . . ,n is arbitrary. Combining, we obtain d n ((x1, . . . ,xn ),(y1, . . . ,yn )) ≡
n
d(yi ,xi ) < δf (ε).
(3.4.11)
i=1
It follows that |f(x1, . . . ,xn ) − f(y1, . . . ,yn )| ≡ |f (x1, . . . ,xn ) − f (y1, . . . ,yn )| < ε. (3.4.12) 9. Summing up, we see that each of the only possible cases (iii) and (iv) implies that |f(x1, . . . ,xn ) − f(y1, . . . ,yn )| < ε, where ε > 0 and (x1, . . . ,xn ),(y1, . . . ,yn ) ∈ S n are arbitrary with n
d ((x1, . . . ,xn ),(y1, . . . ,yn )) < δ.
(3.4.13)
Thus the function f : ( S n,d ) → R is uniformly continuous. Since S n is a dense n n subset of the compact metric space (S ,d ), the function f can be extended to n n a continuous function f : (S ,d ) → R. Thus f = f|S n = (f | S n )|S n = f |S n , n n n n where f ∈ C(S ,d ). In short, f ∈ C(S ,d )|S n , where f ∈ C(S n,d n ) is arbin n trary. We conclude that C(S n,d n ) ⊂ C(S ,d )|S n . This proves the first half of the desired Condition 3.4.8 in Assertion 2. n n 10. Now consider each f ∈ C(S ,d ), with a modulus of continuity δf . Then n
n
f |S n : (S n,d ) → R is bounded and uniformly continuous with the modulus of continuity δf . At the same time, by Condition 4 in Definition 3.4.1, the identity mapping ι : (S,d) → (S,d), defined by ι(x) ≡ x for each x ∈ S, is uniformly n continuous on (S,d). Consequently, the mapping ιn : (S n,d n ) → (S n,d ), defined by ιn (x1, . . . ,xn ) ≡ (x1, . . . ,xn ) for each (x1, . . . ,xn ) ∈ S n , is uniformly continuous. Hence the composite mapping (f |S n ) ◦ ιn : (S n,d n ) → R is bounded and uniformly continuous. Since (f |S n ) ◦ ιn = f |S n on S n , we see that the function f |S n : (S n,d n ) → R is bounded and uniformly continuous.
42
Introduction and Preliminaries n
n
In short, f |S n ∈ Cub (S n,d n ), where f ∈ C(S ,d ) is arbitrary. We conclude n n that C(S ,d )|S n ⊂ Cub (S n,d n ). This proves the second half of the desired Condition 3.4.8 in Assertion 2. The proposition is proved. Proposition 3.4.6. Continuous functions on (S ∞, d ∞ ) and continuous func∞ ∞ ∞ ∞ tions on (S , d ). Let g ∈ C(S ,d ) be arbitrary, with modulus of continuity ∞ ∞ ∞ δg . Define g ≡ g|S . Then g ∈ Cub (S ∞,d ∞ ). In short, C(S ,d )|S ∞ ⊂ Cub (S ∞,d ∞ ). Proof. Let ε > 0 be arbitrary. Take n ≥ 1 so large that 2−n ≤ ε ≡ 2−1 δg (ε). By Condition 4 of Definition 3.4.1, there exists δd (ε ) ∈ (0,1) such that d(x ,y ) < ε for each x ,y ∈ S with d(x ,y ) < δd (ε ). Define δg (ε) ≡ 2−n δd (ε ). Let x,y ∈ S ∞ be arbitrary such that d ∞ (x,y) ≡
∞
2−i (1 ∧ d(xi ,yi )) < δg (ε).
i=1
Then, for each i = 1, . . . ,n, we have 1 ∧ d(xi ,yi ) < 2n δg (ε) ≡ δd (ε ), whence d(xi ,yi ) < δd (ε ) and so d(xi ,yi ) < ε . Hence ∞
d (x,y) ≡
∞
2−i (1 ∧ d(xi ,yi ))
i=1
≤
n
2−i (1 ∧ d(xi ,yi )) + 2−n ≤
i=1
n
2−i ε + ε = 2ε ≡ δg (ε).
i=1
Hence |g(x) − g(y)| = |g(x) − g(y)| < ε. Thus g ∈ Cub (S ∞,d ∞ ), as alleged.
Part II Probability Theory
4 Integration and Measure
We introduce next the Riemann–Stieljes integral on R. Then we present a general treatment of integration and measure theory in terms of Daniell integration, adapted from [Bishop and Bridges 1985]. The standard course in measure theory usually starts with measurable sets, before defining a measure function and an integration function. In contrast, the Daniell integration theory starts with the integration function and the integrable functions. Thus we discuss the computation of the integration early on. The end products of the measure- and integration functions are essentially the same in both approaches; the dissimilarity is only superficial. However, Daniell integrals are more natural, and cleaner, in the constructive approach.
4.1 Riemann–Stieljes Integral Definition 4.1.1. Distribution function. A distribution function is a nondecreasing real-valued function F on R, with domain(F ) dense in R. Definition 4.1.2. Riemann–Stieljes sum relative to a distribution function and a partition of the real line R. Let F be a distribution function, and let X ∈ C(R) be arbitrary. An arbitrary finite and increasing sequence (x0, . . . ,xn ) in domain(F ) is called a partition of R relative to the distribution function F . One partition is said to be a refinement of another if the former contains the latter as a subsequence. For any partition (x1, . . . ,xn ) relative to the distribution function F , define its mesh as n (xi − xi−1 ) , mesh(x1, . . . ,xn ) ≡ i=1
and define the Riemann–Stieljes sum as S(x0, . . . ,xn ) ≡
n
X(xi )(F (xi ) − F (xi−1 )).
i=1
45
46
Probability Theory
Theorem 4.1.3. Existence of Riemann–Stieljes integral. Let F be a distribution function, and let X ∈ C(R) be arbitrary. Then the Riemann–Stieljes sum converges as mesh(x1, . . . ,xn ) → 0 with x0 → −∞ and xn → +∞. The limit will be called the Riemann–Stieljes integral of X with respect to the distribution +∞ function F , and will be denoted by −∞ X(x)dF (x), or simply by X(x)dF (x). Proof. 1. Suppose X has modulus of continuity δX and vanishes outside the compact interval [a,b], where a,b ∈ domain(F ). Let ε > 0 be arbitrary. Consider an arbitrary partition (x0, . . . ,xn ) with (i) x0 < a − 2 < b + 2 < xn and (ii) mesh(x1, . . . ,xn ) < 1 ∧ δX (ε), where δX is a modulus of continuity for X. 2. Let i be any index with 0 < i ≤ n. Suppose we insert m points between (xi−1,xi ) and make a refinement (. . . ,xi−1,y1, . . . ,ym−1,xi , . . .). Let y0 and ym denote xi−1 and xi , respectively. Then the difference in Riemann–Stieljes sums for the new and old partitions is bounded by m
X(xi )(F (xi ) − F (xi−1 )) − X(yj )(F (yj ) − F (yj −1 ) j =1 m
= (X(xi ) − X(yj ))(F (yj ) − F (yj −1 ) j =1 m
≤ ε(F (yj ) − F (yj −1 ) = ε(F (xi ) − F (xi−1 ) j =1 Moreover, the difference is 0 if xi < a − 2 or xi−1 > b + 2. Since xi − xi−1 < 1, the difference is 0 if xi−1 < a − 1 or xi > b + 1. 3. Since any refinement of (x0, . . . ,xn ) can be obtained by inserting points between the pairs (xi−1,xi ), we see that the Riemann–Stieljes sum of any refine ment differs from that for (x0, . . . ,xn ) by at most ε(F (xi ) − F (xi−1 )), where the sum is over all i for which a < xi−1 and xi < b. The difference is therefore at most ε(F (b) − F (a)). 4. Consider a second partition (u0, . . . ,up ) satisfying Conditions (i) and (ii). Because the domain of F is dense, we can find a third partition (v0, . . . ,vq ) satisfying the same conditions and the additional condition that |vk − xi | > 0 and |vk − uj | > 0 for all i,j,k. Then (v0, . . . ,vq ) and (x0, . . . ,xn ) have a common refinement – namely, the merged sequence rearranged in increasing order. Thus their Riemann–Stieljes sums differ from each other by at most 2ε(F (b) − F (a)) according to Step 3. Similarly, the Riemann–Stieljes sum for (u0, . . . ,up ) differs from that of (v0, . . . ,vq ) by at most 2ε(F (b)−F (a)). Hence the Riemann–Stieljes sums for (u0, . . . ,up ) and (x0, . . . ,xn ) differ by at most 4ε(F (b) − F (a)). 5. Since ε is arbitrary, the asserted convergence is proved. Theorem 4.1.4. Basic properties of the Riemann–Stieljes integral. The Riemann-Stieljes integral is linear on C(R). It is also positive in the sense that if X(x)dF (x) > 0, then there exists x ∈ R such that X(x) > 0.
Integration and Measure
47
Proof. Linearity follows trivially from the defining formulas. Suppose a,b ∈ domain(F ) are such that X vanishes outside [a,b]. If the integral is greater than some positive real number c, then the Riemann–Stieljes sum S(x0, . . . ,xn ) for some partition with x1 = a and xn = b is greater than c. It follows that X(xi )(F (xi ) − F (xi−1 )) is greater than or equal to n−1 c for some index i. Hence X(xi )(F (b) − F (a)) ≥ X(xi )(F (xi ) − F (xi−1 )) ≥ n−1 c. This implies X(xi ) > n−1 c(F (b) − F (a)) > 0.
Definition 4.1.5. Riemann sums and Riemann integral. In the special case where domain(F ) ≡ R and F (x) ≡ x for each x ∈ R, the Riemann–Stieljes sums and Riemann–Stieljes integral of a function X ∈ C(R) relative to the distribution F are called the Riemann sums and the Riemann integral of X, respectively.
4.2 Integration on a Locally Compact Metric Space In this section, the Riemann–Stieljes integration is generalized to functions X ∈ C(S,d), where (S,d) is a locally compact metric space (S,d). Traditionally, integration is usually defined in terms of a measure, a function on a family of subsets that is closed relative to the operations of countable unions, countable intersections, and relative to complements. In the case of a metric space, one such family can be generated via these three operations from the family of all open subsets. Members of the family thus generated are called Borel sets. In the special case of R, the open sets can in turn be generated from a countable subfamily of intervals in successive partitions of R, wherein ever smaller intervals cover any compact interval in R. The intervals in the countable family can thus serve as building blocks in the analysis of measures on R. The Daniell integration theory is a more natural choice for the constructive development. Integrals of functions, rather than measures of sets, are the starting point. In the special case of a locally compact metric space (S,d), the family C(S,d) of continuous functions with compact supports supplies the basic integrable functions. Definition 4.2.1. Integration on a locally compact metric space. An integration on a locally compact metric space (S,d) is a real-valued linear function I on the linear space C(S,d) such that (i) I (X) > 0 for some X ∈ C(S,d) and (ii) for each X ∈ C(S,d) with I (X) > 0, there exists a point x in S for which X(x) > 0. Condition (ii) will be called the positivity condition of the integration I . It immediately follows that if X ∈ C(S,d) is such that X ≤ 0, then I (X) ≤ 0. By the linearity and the positivity of the function I , we see that if X ∈ C(S,d) is such that X ≥ 0, then I (X) ≥ 0. The Riemann–Stieljes integration defined for a distribution function F on R is an integration on (R,d), where d is the Euclidean metric and is denoted by
48 Probability Theory ·dF , with I (X) written as X(x)dF (x) for each X ∈ C(S). Riemann–Stieljes integrals provide an abundance of examples for integration on locally compact metric spaces. It follows from the positivity Condition (ii) and from the linearity of I that if X,Y ∈ C(S,d) are such that I (X) > I (Y ), then there exists a point x in S for which X(x) > Y (x). The positivity Condition (ii), extended in the next proposition, is a powerful tool in proving existence theorems. It translates a condition on integrals into the existence of a point in S with certain properties. To prove the next proposition, we need the following lemma, which will be used again in a later chapter. This lemma, obtained from [Chan 1975], is a pleasant surprise because, in general, the convergence of a series of nonnegative real numbers does not follow constructively from the boundedness of partial sums. Lemma 4.2.2. Positivity of a linear function on a linear space of functions. Suppose I is a linear function on a linear space L of functions on a set S such that I (X) ≥ 0 for each nonnegative function X ∈ L. Suppose I satisfies the following condition: for each X0 ∈ L, there exists a nonnegative function Z ∈ L such that for each sequence (Xi )i=1,2,... of nonnegative functions in L with ∞ i=1 I (Xi ) < I (X0 ), there exists x ∈ S with (i) Z(x) = 1 p and (ii) i=1 Xi (x) ≤ X0 (x) for each p > 0. Then, for each X0 ∈ L and for each sequence (Xi )i=1,2,... of nonnegative ∞ functions in L with ∞ i=1 I (Xi ) < I (X0 ), there exists x ∈ S such that i=1 Xi (x) converges and is less than X0 (x). ∞ Proof. Classically, the convergence of i=1 Xi (x) follows trivially from the boundedness of the partial sums. Note that if the constant function 1 is a member of L, then the lemma can be simplified with Z ≡ 1 or, equivalently, with Z omitted altogether. 1. Suppose X0 ∈ L and suppose (Xi )i=1,2,... is a sequence of nonnegative functions in L with ∞ i=1 I (Xi ) < I (X0 ). Let Z ∈ L be as given in the hypothesis. Take a positive real number α so small that αI (Z) +
∞
I (Xi ) + α < I (X0 ).
i=1
Take an increasing sequence (nk )k=1,2,... of integers such that ∞
i=n(k)
for each k ≥ 1.
I (Xi ) < 2−2k α
(4.2.1)
Integration and Measure
49
2. Consider the sequence of functions ⎛ ⎞ n(3) n(2)
⎝αZ,X1,2 Xi ,X2,22 Xi ,X3, . . . ⎠ i=n(1)
i=n(2)
It can easily be verified that the series of the corresponding values for the function I then converges to a sum less than αI (Z) + ∞ i=1 I (Xi ) + α, which is in turn less than I (X0 ) according to inequality 4.2.1. 3. Hence, by Conditions (i) and (ii) in the hypothesis, there exists a point x ∈ S with Z(x) = 1 such that αZ(x) + X1 (x) + · · · + Xk (x) + 2
k
n(k+1)
Xi (x) ≤ X0 (x)
i=n(k)
n(k+1) for each k ≥ 1. In particular, i=n(k) Xi (x) ≤ 2−k X0 (x) so ∞ i=1 Xi (x) < ∞. The last displayed inequality implies also that αZ(x) +
∞
Xi (x) ≤ X0 (x).
i=1
Because Z(x) = 1, we obtain α+
∞
Xi (x) ≤ X0 (x)
i=1
as desired.
Proposition 4.2.3. Positivity of an integration on a locally compact metric space. Let I be an integration on a locally compact metric space (S,d). Let (Xi )i= 0,1,2,... be a sequence in C(S,d) such that Xi is nonnegative for i ≥ 1, and ∞ such that ∞ i=1 I (Xi ) < I (X0 ). Then there exists x ∈ S such that i=1 Xi (x) < X0 (x). Proof. 1. Let K be a compact support of X0 . The set B ≡ {x ∈ S : d(x,K) ≤ 2} is bounded. Hence, since S is locally compact, there exists a compact subset K such that B ⊂ K. Define Z ≡ (1 − d(·,K))+ . 2. Let ε ∈ (0,1) be arbitrary. By Lemma 3.2.2, there exists a metrically discrete and enumerated finite set A ≡ {y1, . . . ,yn } that is an ε-approximation of K. Let {Yy(1), . . . ,Yy(n) } be the ε-partition of unity determined by A, as in Definition 3.3.2. For short, we abuse notations and write Yk ≡ Yy(k) for each k = 1, . . . ,n. By Proposition 3.3.3, we have Yk ≥ 0 for each k = 1, . . . ,n, and nk=1 Yk ≤ 1, with equality prevailing on K ⊂ x∈A (d(·,x) ≤ ε). It n n follows that k=1 Xi Yk ≤ Xi , and k=1 X0 Yk = X0 since K is a support of the
50
Probability Theory n function X0 . Consequently, k=1 I (Xi Yk ) ≤ I (Xi ) for each i ≥ 0, with equality in the case i = 0. Therefore n
∞
I (Xi Yk ) ≤
k=1 i=1
∞
I (Xi ) < I (X0 ) =
i=1
n
I (X0 Yk )
k=1
3. Hence there exists some k = 1, . . . ,n for which ∞
I (Xi Yk ) < I (X0 Yk )
i=1
Again by Proposition 3.3.3, for each x ∈ S with Yk (x) > 0, we have d(x,yk ) < 2ε, whence d(x,K) ≤ 2 and x ∈ B. Therefore (Yk (x) > 0 and Yk (x ) > 0) ⇒ (d(x,x ) ≤ 4ε and x,x ∈ B ⊂ K) for each x,x ∈ S. Define Z1 ≡ Yk . 4. Let m ≥ 1 be arbitrary, By repeating the previous argument with εm = (4m)−1 , we can construct inductively a sequence of nonnegative continuous functions (Zm )m=1,2,... such that for each m ≥ 1 and for each x,x ∈ S, we have (Zm (x) > 0 and Zm (x ) > 0) ⇒ (d(x,x ) ≤ m−1 and x,x ∈ K)
(4.2.2)
and such that ∞
I (Xi Z1 . . . Zm ) < I (X0 Z1 . . . Zm ).
(4.2.3)
i=1
5. Since all terms in relation 4.2.3 are nonnegative, the same inequality holds if the infinite sum is replaced by the partial sum of the first m terms. By the positivity of I , this implies, for each m ≥ 1 , that there exists xm ∈ (S,d) such that m
Xi Z1 . . . Zm (xm ) < X0 Z1 . . . Zm (xm )
(4.2.4)
i=1
6. Inequality 4.2.4 immediately implies that Zp (xm ) > 0 for each p ≤ m. Therefore the inference 4.2.2 yields xp ∈ K and d(xp,xm ) ≤ p−1 for each p ≤ m. Hence (xm )m=1,2,... is a Cauchy sequence in K and converges to some point x ∈ K. By the definition of the function Z at the beginning of this proof, we have Z(x) = 1. Canceling positive common factors on both sides of inequality 4.2.4, we obtain p p i=1 Xi (xm ) < X0 (xm ) for each p ≤ m. Letting m → ∞ yields i=1 Xi (x) ≤ X0 (x) for each p ≥ 1. 7. The conditions in Lemma 4.2.2 have thus been established. Accordingly, there exists x ∈ S such that ∞ i=1 Xi (x) converges and is less than X0 (x). The proposition is proved.
Integration and Measure
51
4.3 Integration Space: The Daniell Integral Integration on a locally compact space is a special case of Daniell integration, introduced next. Definition 4.3.1. Integration Space. An integration space is a triple (,L,I ) where is a nonempty set, L is a set of real-valued functions on , and I is a real-valued function with domain(I ) = L, satisfying the following conditions: 1. If X,Y ∈ L and a,b ∈ R, then aX + bY,|X|, and X ∧ 1 belong to L, and I (aX + bY ) = aI (X) + bI (Y ). In particular, if X,Y ∈ L, then there exists ω ∈ domain(X) ∩ domain(Y ). 2. If a sequence (Xi )i=0,1,2,... of functions in L is such that Xi is nonnegative I (X ) < I (X0 ), then there exists a point for each i ≥ 1 and such that ∞ i=1 ∞ i ω∈ ∞ domain(X ) such that i i=0 i=1 Xi (ω) < X0 (ω). This condition will be referred to as the positivity condition for I . 3. There exists X0 ∈ L such that I X0 = 1. 4. For each X ∈ L, we have I (X ∧ n) → I (X) and I (|X| ∧ n−1 ) → 0 as n → ∞. Then the function I is called an integration on (,L), and I (X) is called the integral of X, for each X ∈ L. Given X ∈ L, the function X ∧ n = n((n−1 X) ∧ 1) belongs to L by Condition 1 of Definition 4.3.1. Similarly, the function |X| ∧ n−1 belongs to L. Hence the integrals I (X ∧ n) and I (|X| ∧ n−1 ) in Condition 4 make sense. In the following discussion, to minimize clutter, we will write I X for I (X) and I XY etc. for I (XY ) etc. when there is no risk of confusion. The positivity condition, the innocent-looking Condition 2, is a powerful tool that is useful in many constructive existence proofs. Lemma 4.3.2. An existence theorem in an integration space. Let (,L,I ) be an arbitrary integration space. 1. Let (Yi )i=1,2,... be an arbitrary sequence in L such that ∞ i=1 I |Yi | < ∞. Then there exists an element ω in the set ∞ ∞
domain(Yi ) : |Yi (ω)| < ∞ . D≡ ω∈ i=1
i=1
In other words, the set D is nonempty. 2. Let (Yi )i=1,2,... be an arbitrary sequence in L. Then the set ∞ i=1 domain(Yi ) is nonempty. Proof. 1. Let (Yi )i=1,2,... be an arbitrary sequence in L such that ∞ i=1 I |Yi | < ∞. By Condition 3 in Definition 4.3.1, there exists X0 ∈ L such that I X0 = 1. Hence ∞ i=1 I |Yi | < I Y0 , where Y0 ≡ aX0 for some sufficiently large a > 0. It follows from Condition 2 in Definition 4.3.1 that there exists a point
52
Probability Theory ω∈
∞
domain(|Yi |) =
i=0
∞
domain(Yi )
i=0
such that ∞ i=1 |Yi (ω)| < Y0 (ω). Assertion 1 is proved. 2. Let (Yi )i=1,2,... be an arbitrary sequence in L. Then the sequence (0Y1,0Y2, . . .) satisfies the hypothesis in Assertion 1. 4.3.1. Accordingly, there exists a an element ω in the set ∞ ∞ ∞
domain(Yi ) : |0Yi (ω)| < ∞ = domain(Yi ). ω∈ i=1
i=1
i=1
Assertion 2 is proved.
One trivial but useful example of an integration space is the triple (,L,δω ), where ω is a given point in a given set , L is the set of all functions X on whose domains contain ω, and δω is defined on L by δω (X) = X(ω). The integration δω is called the point mass at ω. The next proposition gives an abundance of nontrivial examples. Proposition 4.3.3. An integration on a locally compact space yields an integration space. Let I be an integration on the locally compact metric space (S,d) as defined in Definition 4.2.1. Then (S,C(S,d),I ) is an integration space. Proof. The positivity condition in Definition 4.3.1 has been proved for (S,C(S,d),I ) in Proposition 4.2.3. The other conditions in Definition 4.3.1 are easily verified. The next proposition collects some more useful properties of integration spaces. Proposition 4.3.4. Basic properties of an integration space. Let (,L,I ) be an integration space. Then the following hold: 1. If X,Y ∈ L, then X ∨ Y,X ∧ Y ∈ L. If, in addition, a > 0, then X ∧ a ∈ L and I (X ∧ a) is continuous in a. 2. If X ∈ L, then X+,X− ∈ L and I X = I X+ + I X− . 3. For each X ∈ L with I X > 0, there exists ω such that X(ω) > 0. 4. Suppose X ∈ L is such that X ≥ 0. Then I X ≥ 0. Let X,Y,Z ∈ L be arbitrary. Then I |X − Z| ≤ I |X − Y | + I |Z − Y |
(4.3.1)
and I |Y − Z| = I |Z − Y |. In particular, we obtain I |X − Z| ≤ I |X| + I |Z| when we set Y ≡ 0X. 5. There exists a nonnegative X ∈ L such that I X = 1. Proof. 1. The first part of Assertion 1 follows from X ∨ Y = (X + Y + |X − Y |)/2 and X ∧ Y = (X + Y − |X − Y |)/2. The second part follows from X ∧ a = a(a −1 X ∧ 1) and, in view of Condition 4 in Definition 4.3.1, from I |X ∧ a − X ∧ b| ≤ I (|b − a| ∧ |X|) for a,b > 0.
Integration and Measure
53
2. Assertion 2 follows from X+ = X ∨ 0X, X− = X ∧ 0X, and X = X+ + X− . 3. Suppose X ∈ L has integral I X > 0. The positivity condition in Definition 4.3.1, when applied to the sequence (X,0X,0X, . . .), guarantees an ω ∈ domain(X) such that X(ω) > 0. Assertion 3 is proved. 4. Suppose X(ω) ≥ 0 for each ω ∈ domain(X). Suppose I X < 0. Then I (−X) > 0, and Assertion 3 would yield ω ∈ domain(X) with −X(ω) > 0 or, equivalently, X(ω) < 0, a contradiction. We conclude that I X ≥ 0. The first part of Assertion 4 is verified. Now let X,Y,Z ∈ L be arbitrary. Then |X − Y | + |Z − Y | − |X − Z| ≥ 0. Hence by the first part of Assertion 4, and by linearity, we obtain I |X − Y | + I |Z − Y | − I |X − Z| ≥ 0, which proves inequality 4.3.1 in Assertion 4. In the particular case where X ≡ Y , this reduces to I |Z − Y | ≥ I |Y − Z|. Similarly, I |Y − Z| ≥ I |Z − Y |. Hence I |Y − Z| = I |Z − Y |. Assertion 4 is proved. 5. By Condition 3 of Definition 4.3.1, there exists X ∈ L such that I (X) > 0. By Assertion 4, and by the linearity of I , we see that I X ≤ I |X| and so I |X| > 0. Let X0 denote the function |X|/I |X|. Then X0 ∈ L is nonnegative and I X0 = 1. Assertion 5 and the proposition are proved. Definition 4.3.5. Integration subspace. Let (,L,I ) be an integration space. Let L be a linear subspace of L such that (,L ,I ) is an integration space. We will then call (,L ,I ) an integration subspace of (,L,I ). When confusion is unlikely, we will abuse terminology and simply call L an integration subspace of L, with and I understood. Proposition 4.3.6. A linear subspace closed to absolute values and minimum with constants is an integration subspace. Let (,L,I ) be an integration space. Let L be a linear subspace of L such that if X,Y ∈ L , then |X|,X ∧ 1 ∈ L . Then (,L ,I ) is an integration subspace of (,L,I ). Proof. By hypothesis, L is closed to linear operations, absolute values, and the operation of taking minimum with the constant 1. Condition 1 in Definition 4.3.1 for an integration space is thus satisfied by L . Conditions 2 and 3 are trivially inherited by (,L ,I ) from (,L,I ). Proposition 4.3.7. Integration induced by a surjection. Let (,L,I ) be an integration space. Let π : → be a surjection. For each f ∈ L, write f (π ) ≡ f ◦ π . Define L ≡ {f (π ) : f ∈ L} and define I : L → R by I X ≡ I (f ) for each f ∈ L and for each X ≡ f (π ) ∈ L. Then (,L,I ) is an integration space. Proof. Suppose X = f (π ) = g(π ) for some f ,g ∈ L. Let ω ∈ domain(f ) be arbitrary. Since π is a surjection, there exists ∈ such that π( ) = ω ∈ domain(f ). It follows that ∈ domain(f (π )) = domain(g(π )) and so
54
Probability Theory
ω = π( ) ∈ domain(g). Since ω ∈ domain(f ) is arbitrary, we see that domain(f ) ⊂ domain(g) and, by symmetry, domain(f ) = domain(g). Moreover, f (ω) = f (π( )) = g(π( )) = g(ω). We conclude that f = g. Next let and a,b ∈ R be arbitrary. Let X ≡ f (π ) and Y ≡ h(π ), where f ,h ∈ L are arbitrary. Then af + bh ∈ L and so aX + bY = (af + bh)(π ) ∈ L. Furthermore I (aX + bY ) = I (af + bh) = aI (f ) + bI (h) ≡ aI X + bI Y . Thus L is a linear space and I is a linear function. Similarly, |X| = |f |(π ) ∈ L and a ∧ X = (a ∧ f )(π ) ∈ L. Furthermore, I (a ∧ X) ≡ I (a ∧ f ) → I (f ) ≡ I X as a → ∞, while I (a ∧ |X|) ≡ I (a ∧ |f |) → 0 as a → 0. Thus Conditions 1, 3, and 4 in Definition 4.3.1 for an integration space are verified for the triple (,L,I ). It remains to prove the positivity condition (Condition 2) in Definition 4.3.1. To that end, let (Xi )i=0,1,2,... be a sequence in L such that Xi is nonnegative for each i ≥ 1 and such that ∞ i=1 I Xi < I X0 . For each i ≥ 0, let fi ∈ L be such that X = fi (π ). Then, since π is a surjection, fi ≥ 0 for each i ≥ 1. Moreover, ∞ i I (fi ) ≡ ∞ ≡ I (f0 ). Since I is an integration, there exists i=1 i=1 I Xi < I X0 ∞ ω∈ ∞ domain(f ) such that i i=0 i=1 fi (ω) < f0 (ω). Let ∈ be such that π( ) = ω. Then ∈
∞ i=0
domain(fi (π )) =
∞
domain(Xi ).
i=0
∞ ∞ By hypothesis, i=1 Xi ( ) = i=1 fi (ω) < f0 (ω) = X0 ( ). All the conditions in Definition 4.3.1 have been established. Accordingly, (,L,I ) is an integration space.
4.4 Complete Extension of Integration A common and useful integration space is (S,C(S,d),I ), where (S,d) is a locally compact metric space and where I is an integration in the sense of Definition 4.2.1. However, the family C(S,d) of continuous functions with compact support is too narrow to hold all the interesting integrable functions in applications. For example, in the case where (S,d) is the unit interval [0,1] with the Euclidean metric and where I is the Lebesgue integral, it will be important to be able to integrate simple step functions like the indicator 1[2−1,1] . For that reason, we will expand the family C(S,d) to a family of integrable functions that includes functions which need not be continuous, and that includes an abundance of indicators. More generally, given an integration space (,L,I ), we will expand the family L to a larger family L1 and extend the integration I to L1 . We will do so by summing certain series of small pieces in L, in a sense to be made precise presently. This is analogous to of the expansion of the set of rational numbers to the set of
Integration and Measure
55
real numbers by representing a real number as the sum of a convergent series of rational numbers. Definition 4.4.1. Integrable functions and complete extension of an integration space. Let (,L,I ) be an arbitrary integration space. Recall that each function X on is required to have a nonempty domain(X). A function X on is called an integrable function if there exists a sequence (Xn )n=1,2,... in L satisfying the following two conditions: (i) ∞ i=1 I |Xi | < ∞. ∞ (ii) For each ω ∈ ∞ i=1 domain(Xi ) such that i=1 |Xi (ω)| < ∞, we have ω ∈ domain(X) and X(ω) =
∞
Xi (ω).
i=1
The sequence (Xn )n=1,2,... is then called a representation of the integrable function X by elements of L, relative to I . The set of integrable functions will be denoted by L1 . The sum I1 (X) ≡
∞
I Xi
(4.4.1)
i=1
is then called the integral of X. The function I1 : L1 → R is called the complete extension, or simply completion, of I . Likewise, L1 and (,L1,I1 ) are called the complete extensions, or simply completions, of L and (,L,I ), respectively. The next proposition and theorem prove that (i ) the function I1 is well defined on L1 and (ii ) the completion (,L1,I1 ) is an integration space, with L ⊂ L1 and I = I1 |L. Proposition 4.4.2. Complete extension of an integration is well defined. Let (,L,I ) be an arbitrary integration space. Let X be an arbitrary integrable function. If (Xn )n=1,2,... and (Yn )n=1,2,... are two representations of the inte∞ grable function X, then ∞ i=1 I Xi = i=1 I Yi . Proof. By Condition (i) of Definition 4.4.1 for a representation, we have ∞ ∞ i=1 I |Xi | < ∞ and i=1 I |Yi | < ∞. Therefore, by Assertion 4 of Proposition 4.3.4, we have ∞
I |Yi − Xi | ≤
i=1
∞
I |Xi | +
i=1
∞
I |Yi | < ∞.
i=1
∞ Suppose, for the sake of a contradiction, that ∞ i=1 I Xi < i=1 I Yi . Then, for some sufficiently large number m ≥ 1, we have ∞
i=m+1
I |Xi | +
∞
i=m+1
I |Yi |
0 be arbitrary. Then it follows from the convergence condition 4.4.10 that there exists m ≥ 1 so large that I1 |X − m i=1 Xi | < ε. Write Y ≡ m i=1 Xi ∈ L. Then I1 |X − Y | < ε. Moreover, since Y ∈ L, there exists, according to Condition 4 of Definition 4.3.1 for (,L,I ), some p ≥ 1 so large that |I1 (Y ∧ n) − I1 Y | = |I (Y ∧ n) − I Y | < ε for each n ≥ p. Consider each n ≥ p. Then, since |X ∧ n − Y ∧ n| ≤ |X − Y |, we have I1 |X ∧ n − Y ∧ n| ≤ I1 |X − Y | < ε. Hence I1 X ≥ I1 (X ∧ n) > I1 (Y ∧ n) − ε > I1 Y − 2ε > I1 X − 3ε. Since ε > 0 is arbitrary, the last displayed inequality implies that I1 (X∧n) → I1 X as n → ∞. 10. Separately, again by Condition 4 of Definition 4.3.1 for (,L,I ), there exists p ≥ 1 so large that I (|Y | ∧ n−1 ) < ε for each n ≥ p. Therefore, using the elementary inequality a ∧ c − b ∧ c ≤ |a − b| for each a,b,c ∈ (0,∞), we obtain I1 (|X| ∧ n−1 ) ≤ I1 (|Y | ∧ n−1 ) + I1 |(|X| − |Y |)| ≤ I1 (|Y | ∧ n−1 ) + I1 |X − Y | < 2ε, for each n ≥ p. Since ε > 0 is arbitrary, the last displayed inequality implies that I1 (|X| ∧ n−1 ) → 0 as n → ∞. 11. In Steps 9 and 10, we have verified Condition 4 in Definition 4.3.1 for (,L1,I1 ). Summing up, all the conditions in Definition 4.3.1 have been verified for (,L1,I1 ) to be an integration space. The theorem is proved. Corollary 4.4.4. L is dense in L1 . Let (,L,I ) be an arbitrary integration space. Let (,L1,I1 ) be its completion. Let X ∈ L1 be arbitrary, with a representation (Xi )i=1,2,... in L. Then I1 |X − m i=1 Xi | → 0 as m → ∞. Proof. See inequality 4.4.10 in Step 8 of the proof of Theorem 4.4.3.
Definition 4.4.5. Notational convention for a complete integration space. From this point forward, we will use the same symbol I to denote both the given integration and its complete extension. Thus we write I also for its I1 . An integration space (,L,I ) is said to be complete if (,L,I ) = (,L1,I ). If, in addition, 1 ∈ L and I 1 = 1, then (,L,I ) is called a probability integration space. Lemma 4.4.6. Integrable function from the sum of a series in L. Let (,L,I ) be an arbitrary integration space. Let (Xn )n=1,2,... be an arbitrary sequence in L such that ∞ i=1 I |Xi | < ∞. Define the function X on by
62
Probability Theory
domain(X) ≡ ω ∈
∞
domain(Xi ) :
i=1
∞
|Xi (ω)| < ∞
i=1
and by X(ω) ≡ ∞ i=1 Xi (ω) for each ω ∈ domain(X). Then domain(X) is nonempty, and the function X is integrable, with I Xi . We will then (Xn )n=1,2,... as a representation in L, and with I X = ∞ i=1 call X ∈ L1 the sum of the sequence (Xn )n=1,2,... and define ∞ i=1 Xi ≡ X ∈ L1 . ∞ Proof. By hypothesis, we have i=1 I |Xi | < ∞. Thus Condition (i) of Definition 4.4.1 is satisfied by the sequence (Xn )n=1,2,... . Moreover, by Assertion 1 of Lemma 4.3.2, domain(X) is nonempty. Now consider each ω ∈ ∞ ∞ i=1 domain(Xi ) such that i=1 |Xi (ω)| < ∞. Then, by the definition of the function X in the hypothesis, we have ω ∈ domain(X) and X(ω) =
∞
Xi (ω).
i=1
Thus Condition (ii) of Definition 4.4.1 is also satisfied for the sequence (Xn )n=1,2,... to be a representation of the function X in L. Accordingly, X ∈ L1 and I X = ∞ i=1 I Xi . Theorem 4.4.7. Nothing is gained from further complete extension. Let (,L,I ) be an arbitrary integration space. Let (,L1,I ) denote its complete extension. In turn, let (,(L1 )1,I ) denote the complete extension of (,L1,I ). Then (,(L1 )1,I ) = (,L1,I ). In other words, the complete extension of an arbitrary integration space is complete, and further completion yields nothing new. Proof. 1. Because L1 ⊂ (L1 )1 according to Theorem 4.4.3, it remains only to prove that (L1 )1 ⊂ L1 . 2. To that end, let Z ∈ (L1 )1 be arbitrary. Then there exists a sequence (Zk )k=1,2,... in L1 that is a representation of the function Z. Hence, by Definition 4.4.1, the following two conditions hold: ∞. (i) ∞ k=1 I |Zk | < ∞ (ii) For each ω ∈ ∞ k=1 domain(Zk ) such that k=1 |Zk (ω)| < ∞, we have ω ∈ domain(Z) and Z(ω) =
∞
Zk (ω).
k=1
3. Let k ≥ 1 be arbitrary. Let the sequence (Xk,m )m=1,2,... in L be a representation of the function Zk ∈ L1 . Then < ∞. (i ) ∞ m=1 I |Xk,m | ∞ (ii ) For each ω ∈ ∞ m=1 domain(Xk,m ) such that m=1 |Xk,m (ω)| < ∞, we have ω ∈ domain(Zk ) and Zk (ω) =
∞
m=1
Xk,m (ω).
Integration and Measure 63 p 4. By Corollary 4.4.4, we have I1 |Zk − m=1 Xk,m | → 0 as p → ∞. −k and Hence there exists mk ≥ 1 so large that I |Zk − m(k) m=1 Xk,m | < 2 ∞ −k m=m(k)+1 I |Xk,m | < I |Zk | + 2 . Consider the sequence ⎛ ⎞ m(k)
(Vk,1,Vk,2, . . .) ≡ ⎝ Xk,m,Xk,m(k)+1,Xk,m(k)+2, . . . ⎠ m=1
in L. Let the sequence (Un )n=1,2,... be an enumeration of the double sequence (Vk,m )k,m=1,2,... in L. We will verify that the sequence (Un )n=1,2,... in L is a representation of the function Z. 5. Note that (i ) m(k)
∞ ∞
∞ ∞ ∞ ∞
I |Un | = I |Vk,m | = I Xk,m + I |Xk,m | k=1 m=m(k)+1 n=1 k=1 m=1 k=1 m=1 ∞ ∞ ∞
−k −k ≤ (I |Zk | + 2 ) + (I |Zk | + 2 ) = 2 I |Zk | + 2 < ∞. k=1
6. Consider each ω ∈ ω∈
∞
k=1
∞
n=1 domain(Un )
domain(Un ) =
n=1
=
such that
∞ ∞ k=1 m=1 ∞ ∞
∞
k=1
n=1 |Un (ω)|
< ∞. Then
domain(Vk,m ) domain(Xk,m ).
(4.4.11)
k=1 m=1
Moreover,
∞ m(k) ∞ ∞
+ X (ω) |Xk,m (ω)| k,m k=1 m=m(k)+1 k=1 m=1 =
∞ ∞
|Vk,m (ω)| =
k=1 m=1
∞
|Un (ω)| < ∞.
(4.4.12)
n=1
Let k ≥ 1 be arbitrary. Then inequality 4.4.12 implies that ∞
|Xk,m (ω)| < ∞,
m=m(k)+1
whence ∞ m=1 |Xk,m (ω)| < ∞. Consequently, by Condition (ii ), we have ω ∈ domain(Zk ) with Zk (ω) =
∞
m=1
Xk,m (ω).
(4.4.13)
64
Probability Theory
Therefore ∞
Un (ω) =
n=1
∞
∞
Vk,m (ω)
k=1 m=1
=
∞
⎛ ⎝
k=1
=
m(k)
Xk,m (ω) +
m=1
∞
∞
∞
⎞ Xk,m (ω)⎠
m=m(k)+1
Xk,m (ω) =
k=1 m=1
∞
Zk (ω) = Z(ω),
(4.4.14)
k=1
where the last two equalities are due to equality 4.4.13 and Condition (ii), respectively. Summing up, we have proved that: ∞ (ii ) for each ω ∈ ∞ n=1 domain(Un ) such that n=1 |Un (ω)| < ∞,we have ω ∈ domain(Z) and Z(ω) =
∞
Un (ω).
n=1
7. Conditions (i ) and (ii ) together show that the sequence (Un )n=1,2,... in L is a representation of the function Z. Thus we have proved that Z ∈ L1 , where Z ∈ (L1 )1 is arbitrary. We conclude that (L1 )1 ⊂ L1 , as alleged. Following is a powerful tool for the construction of integrable functions on a complete integration space. Theorem 4.4.8. Monotone Convergence Theorem. Let (,L,I ) be a complete integration space. Then the following holds: 1. Suppose (Xi )i=1,2,... is a sequence in L such that Xi−1 ≤ Xi for each i ≥ 2, and such that limi→∞ I (Xi ) exists. Then there exists X ∈ L such that (i) domain(X) ⊂ ∞ i=1 domain(Xi ) and (ii) X = limi→∞ Xi on domain(X). Moreover, limi→∞ I |X − Xi | = 0 and I Xi ↑ I X as i → ∞. 2. Suppose (Yi )i=1,2,... is a sequence in L such that Yi−1 ≥ Yi for each i ≥ 2, and such that limi→∞ I (Yi ) exists. Then there exists Y ∈ L such that (iii) domain(Y ) ⊂ ∞ i=1 domain(Yi ) and (iv) Y = limi→∞ Yi on domain(Y ). Moreover, limi→∞ I |Y − Yi | = 0 and I Yi ↓ I Y as i → ∞. Proof. 1. Suppose (Xi )i=1,2,... is a sequence in L such that Xi−1 ≤ Xi for each i ≥ 2, and such that limi→∞ I (Xi ) exists. Consider the sequence (Vn )n=1,2,... ≡ (X1,X2 − X1,X3 − X2, . . .) in L. Then ∞ n=1 I |Vn | = limi→∞ I (Xi ) < ∞. Hence, according to Lemma 4.4.6, the function X defined by ∞ ∞
domain(Vi ) : |Vi (ω)| < ∞ , domain(X) ≡ ω ∈ and by X(ω) ≡
∞
i=1 Vi (ω)
i=1
i=1
for each ω ∈ domain(X), is integrable. Moreover,
Integration and Measure domain(X) ⊂
∞
domain(Vi ) =
i=1
∞
65
domain(Xi ),
i=1
which proves the desired Condition (i). Furthermore, consider each ω ∈ domain ∞ (X). Then X(ω) ≡ i=1 Vi (ω) = limi→∞ Xi (ω), which proves the desired Condition (ii). In addition, according to Corollary 4.4.4, we have m
Vn → 0 I |X − Xm | = I X − n=1
as m → ∞. Finally, |I X − I Xm | ≤ I |X − Xm | → 0. Hence I Xm ↑ I X. Assertion 1 of the theorem is proved. 2. Suppose (Yi )i=1,2,... is a sequence in L such that Yi−1 ≥ Yi for each i ≥ 2, and such that limi→∞ I (Yi ) exists. For each i ≥ 1, define Xi ≡ −Yi . Then (Xi )i=1,2,... is a sequence in L such that Xi−1 ≤ Xi for each i ≥ 2, and such that limi→∞ I (Xi ) exists. Hence, by Assertion 1, there exists X ∈ L such that (i) domain(X) ⊂ ∞ i=1 domain(Xi ) and (ii) X = limi→∞ Xi on domain(X). Moreover, limi→∞ I |X − Xi | = 0 and I Xi ↑ I X as i → ∞. Now define Y ≡ −X ∈ L. Then (iii) domain(Y ) = domain(X) ⊂
∞
domain(Xi ) =
i=1
∞
domain(Yi )
i=1
and (iv) Y = −X = − limi→∞ Xi = limi→∞ Yi on domain(Y ). Moreover, limi→∞ I |Y −Yi | = limi→∞ I |X−Xi | = 0 and I Yi = −I Xi ↓ −I X = I Y as i → ∞. Assertion 2 is proved.
4.5 Integrable Set To model an event in an experiment of chance that may or may not occur depending on the outcome, we can use a function of the outcome with only two possible values, 1 or 0. Equivalently, we can specify the subset of those outcomes that realize the event. We make these notions precise in the present section. Definition 4.5.1. Indicators and mutually exclusive subsets. A function X on a set with only two possible values, 1 or 0, is called an indicator. Subsets A1, . . . ,An of a set are said to be mutually exclusive if Ai Aj = φ for each i,j = 1, . . . ,n with i j . Indicators X1, . . . ,Xn are said to be mutually exclusive if the sets {ω ∈ domain(Xi ) : Xi (ω) = 1} (i = 1, . . . ,n) are mutually exclusive. In the remainder of this section, let (,L,I ) be a complete integration space. Recall that an integrable function need not be defined everywhere. However, such functions are defined almost everywhere in the sense of the next definition. Definition 4.5.2. Full set, and conditions holding almost everywhere. A subset D of is called a full set if D ⊃ domain(X) for some integrable function
66
Probability Theory
X ∈ L = L1 . By Definition 4.4.1, domain(X) is nonempty for each integrable function X ∈ L1 . Hence each full set D is nonempty. Two integrable functions Y,Z ∈ L1 are said to be equal almost everywhere (or Y = Z a.e. in symbols) if there exists a full set D ⊂ domain(Y ) ∩ domain(Z) such that Y = Z on D. More generally, a condition about a general element ω of is said to hold almost everywhere (a.e. for short) if it holds for each ω in a full set D. For example, according to the terminology established in the Introduction of this book, the statement Y ≤ Z means that for each ω ∈ we have (i ) ω ∈ domain(Y ) iff ω ∈ domain(Z), and (ii ) Y (ω) ≤ Z(ω) if ω ∈ domain(Y ). Hence the statement Y ≤ Z a.e. means that there exists some full set D such that, for each ω ∈ D, Conditions (i ) and (ii ) hold. A similar argument holds when ≤ is replaced by ≥ or by = . If A,B are subsets of , then A ⊂ B a.e. iff AD ⊂ BD for some full set D. Proposition 4.5.3. Properties of full sets. Let X,Y,Z ∈ L denote integrable functions. 1. A subset that contains a full set is a full set. The intersection of a sequence of full sets is again a full set. 2. Suppose W is a function on and W = X a.e. Then W is an integrable function with I W = I X. 3. If D is a full set, then D = domain(X) for some X ∈ L 4. X = Y a.e. if and only if I |X − Y | = 0. 5. If X ≤ Y a.e., then I X ≤ I Y . 6. If X ≤ Y a.e. and Y ≤ Z a.e., then X ≤ Z a.e., Moreover, if X ≤ Y a.e. and X ≥ Y a.e., then X = Y a.e. 7. Almost everywhere equality is an equality relation in L. In other words, for all X,Y,Z ∈ L, we have (i) X = X a.e.; (ii) if X = Y a.e., then Y = X a.e.; and (iii) if X = Y a.e. and Y = Z a.e., then X = Z a.e. Proof. 1. Suppose a subset A contains a full set D. By definition, D ⊃ domain(X) for some X ∈ L. Hence A ⊃ domain(X). Thus A is a full set. Now suppose Dn ⊃ domain(Xn ), where Xn ∈ L for each n ≥ 1. Define the function X by domain(X) = ∞ n=1 domain(Xn ), and by X(ω) = X1 (ω) for each ω ∈ domain(X). Then the function X has the sequence (X1,0X2,0X3, . . .) as a representation. Hence X is integrable, and so domain(X) is a full set. Since ∞
Dn ⊃
n=1
we see that
∞
n=1 Dn
∞
domain(Xn ) = domain(X),
n=1
is a full set.
Integration and Measure
67
2. By the definition of a.e. equality, there exists a full set D such that D ∩ domain(W ) = D ∩ domain(X) and W (ω) = X(ω) for each ω ∈ D ∩ domain(X). By the definition of a full set, D ⊃ domain(Z) for some Z ∈ L. It follows that the sequence (X,0Z,0Z, . . .) is a representation of the function W . Therefore W ∈ L1 = L with I W = I X. 3. Suppose D is a full set. By definition, D ⊃ domain(X) for some X ∈ L. Define a function W by domain(W ) ≡ D and W (ω) ≡ 0 for each ω ∈ D. Then W = 0X on the full set domain(X). Hence by Assertion 2, W is an integrable function, with D = domain(W ). 4. Suppose X = Y a.e. Then |X − Y | = 0(X − Y ) a.e. Hence I |X − Y | = 0 according to Assertion 2. Suppose, conversely, that I |X − Y | = 0. Then the function defined by Z ≡ ∞ n=1 |X − Y | is integrable. By definition, ∞
domain(Z) ≡ ω ∈ domain(X − Y ) : |X(ω) − Y (ω)| < ∞ n=1
= {ω ∈ domain(X) ∩ domain(Y ) : X(ω) = Y (ω)}. Thus we see that X = Y on the full set domain(Z). 5. Suppose X ≤ Y a.e. Then Y − X = |Y − X| a.e. Hence, by Assertion 4, I (Y − X) = I |Y − X| ≥ 0. 6. Suppose X ≤ Y a.e. and Y ≤ Z a.e. Then there exists a full set D such that D ∩ domain(X) = D ∩ domain(Y ) and X(ω) ≤ Y (ω) for each ω ∈ D ∩ domain(X). Similarly, there exists a full set D such that D ∩ domain(Y ) = D ∩ domain(Z) and Y (ω) ≤ Z(ω) for each ω ∈ D ∩ domain(Y ). By Assertion 1, the set DD is a full set. Furthermore, DD ∩ domain(X) = DD ∩ domain(Y ) = DD ∩ domain(Z) and X(ω) ≤ Y (ω) ≤ Z(ω) for each ω ∈ DD ∩ domain(X). It follows that X ≤ Z a.e. The remainder of Assertion 6 is similarly proved. 7. This is a trivial consequence of Assertion 4. Definition 4.5.4. Integrable set, measure of integrable set, measure-theoretic complement, and null set. A subset A of is called an integrable set if there exists an indicator X that is an integrable function such that A = (X = 1). We then call X an indicator of A and define 1A ≡ X; we also define the measure of A to be μ(A) ≡ I X. We will call the set Ac ≡ (X = 0) a measure-theoretic complement of the set A. An arbitrary subset of an integrable set A with measure μ(A) = 0 is called a null set. Note that two distinct integrable indicators X and Y can be indicators of the same integrable set A. Hence 1A and Ac are not uniquely defined relative to the
68
Probability Theory
set-theoretic equality for functions. However, the next proposition shows that they are uniquely defined relative to a.e. equality. Proposition 4.5.5. Uniqueness of integrable indicators and measure-theoretic complement relative to a.e. equality. Let A and B be integrable sets. Let X,Y be integrable indicators of A,B, respectively. Then the following conditions hold: 1. A = B a.e. iff X = Y a.e. Hence 1A is a well-defined integrable function relative to a.e. equality, and the measure μ(A) is well defined. Note that, in general, the set Ac need not be an integrable set. 2. If A = B a.e., then (X = 0) = (Y = 0) a.e. Hence Ac is a well-defined subset relative to equality a.e. 3. is a full set. The empty set φ is a null set. 4. Suppose μ(A) = 0. Then Ac is a full set. 5. If C is a subset of such that C = A a.e., then C is an integrable set, with 1C = 1A a.e. and μ(C) = μ(A). Proof. By the definition of an indicator for an integrable set, we have A = (X = 1) and B = (Y = 1). Let D be an arbitrary full set. Then the intersection D ≡ D ∩ domain(X) ∩ domain(Y ) is a full set. Since D D = D , we have D DA = D D(X = 1) and D DB = D D(Y = 1). 1. Suppose A = B a.e. Then A = B on some full set D. Then DA = DB. It follows from the previous paragraph that D (X = 1) = D D(X = 1) = D D(Y = 1) = D (Y = 1). By the remark following Definition 4.5.1, we see that for each ω ∈ D , X(ω) and Y (ω) are defined and equal. Hence X = Y a.e. Moreover, it follows from Proposition 4.5.3 that μ(A) ≡ I X = I Y ≡ μ(B). Conversely, suppose X = Y a.e. with D ∩ domain(X) = D ∩ domain(Y ) and X(ω) = Y (ω) for each ω ∈ D ∩ domain(X). Then D A = D DA = D D(X = 1) = D D(Y = 1) = D (Y = 1) = D B. Hence A = B a.e. 2. Suppose A = B a.e. In the proof for Assertion 1, we saw that for each ω in the full set D , we have X(ω) = 0 iff Y (ω) = 0. In short, (X = 0) = (Y = 0) a.e. Thus Assertion 2 is proved. 3. Let X be an arbitrary integrable function. Trivially ⊃ domain(X). Hence is a full set. Now define a function Y ≡ 0X. Then Y is an integrable indicator, with I Y = 0. Moreover, (Y = 1) = {ω ∈ domain(Y ) : Y (ω) = 1} = {ω ∈ domain(Y ) : 0X(ω) ≡ Y (ω) = 1} = φ. Thus Y is an indicator of the empty subset φ. Consequently, μ(φ) ≡ I Y = 0, whence φ is a null set. 4. Suppose μ(A) = 0. Then I X = μ(A) = 0. Hence the function Z ≡ ∞ i=1 X is integrable. Moreover,
Integration and Measure
domain(Z) = ω ∈ domain(X) :
∞
69
|X(ω)| = 0 = (X = 0) = Ac .
i=1
Thus Ac contains the domain of the integrable function Z, and is therefore a full set by Definition 4.5.2. 5. Suppose C = A on some full set D. Define a function W by domain(W ) ≡ C ∪ (X = 0), and by W (ω) ≡ 1 or 0 according to ω ∈ C or ω ∈ (X = 0), respectively. Then W = 1 = X on D(X = 1), and W = 0 = X on D(X = 10). Hence W = X on the full set D ∩ domain(X) = D(X = 1) ∪ D(X = 0). In short, W = X a.e. Therefore, by Assertion 2 of Proposition 4.5.3, the function W is integrable. Hence the set C = (W = 1) has the integrable indicator, and is therefore an integrable set, with 1C ≡ W = X ≡ 1A a.e. Moreover, μ(C) = I W = I X = μ(A). Definition 4.5.6. Convention: unless otherwise specified, equality of integrable functions will mean equality a.e.; equality of integrable sets will mean equality a.e. Let (,L,I ) be an arbitrary complete integration space. Let X,Y ∈ L and A,B ⊂ be arbitrary integrable sets. Henceforth, unless otherwise specified, the statement X = Y will mean X = Y a.e. Similarly, the statements X ≤ Y , X < Y , X ≥ Y , and X > Y will mean X ≤ Y a.e., X < Y a.e., X ≥ Y a.e., and X > Y a.e., respectively. Likewise, the statements A = B, A ⊂ B, and A ⊃ B will mean A = B a.e., A ⊂ B a.e., and A ⊃ B a.e., respectively. Suppose each of a sequence of statements is valid a.e. Then, in view of Assertion 1 of Proposition 4.5.3, there exists a full set on which all of these statements are valid; in other words, a.e., we have the validity of all the statements. For example if (An )n=1,2,... is a sequence of integrable sets with An ⊂ An+1 a.e. for each n ≥ 1, then A1 ⊂ A2 ⊂ · · · a.e. Proposition 4.5.7. Basics of measures of integrable sets. Let (,L,I ) be an arbitrary complete integration space. Let A,B be arbitrary integrable sets, with measure-theoretic complements Ac,B c , respectively. Then the following conditions hold: 1. A ∪ Ac is a full set, and AAc = φ. 2. A ∪ B is an integrable set, with integrable indicator 1A∪B = 1A ∨ 1B . 3. AB is an integrable set, with indicator 1AB = 1A ∧ 1B . 4. AB c is an integrable set, with 1AB c = 1A − 1A ∧ 1B . Furthermore, A(AB c )c = AB. 5. μ(A ∪ B) + μ(AB) = μ(A) + μ(B). 6. If A ⊃ B, then μ(AB c ) = μ(A) − μ(B). Proof. Let A,B be arbitrary integrable sets, with integrable indicators X,Y , respectively.
70
Probability Theory
1. We have A = (X = 1) and Ac = (X = 0). Hence AAc = φ. Moreover, A ∪ Ac = domain(X) is a full set. 2. By Assertion 1 of Proposition 4.3.4, the function X ∨ Y is integrable. At the same time, A ∪ B = (X ∨ Y = 1). Hence the set A ∪ B is integrable, with integrable indicator 1A∪B = X ∨ Y = 1A ∨ 1B . 3. By Assertion 1 of Proposition 4.3.4, the function X ∧ Y is integrable. At the same time, AB = (X ∧ Y = 1). Hence the set AB is integrable, with integrable indicator 1AB = X ∧ Y = 1A ∧ 1B 4. By Assertion 1 of Proposition 4.3.4, the function X − X ∧ Y is integrable. Hence the set AB c = (X = 1)(Y = 0) = (X − X ∧ Y = 1) is integrable, with integrable indicator 1AB c = X − X ∧ Y = 1A − 1A ∨ 1B . Furthermore, on the full set domain(X) ∩ domain(Y ), we have A(AB c )c = (X = 1)(X − X ∧ Y = 0) = (X = 1)(Y = 1) = AB. Thus A(AB c )c = AB. 5. Since 1A ∨ 1B + 1A ∧ 1B = 1A + 1B , Assertion 6 follows from the linearity of I . 6. Suppose AD ⊃ BD for some full set D. Then X ≥ Y on D. Hence X ∧ Y = Y . Consequently, I X ∧ Y = I Y . By Assertion 5, we have μ(AB c ) = I (X − X ∧ Y ) = I X − I Y = μ(A) − μ(B). Assertion 7 is proved. Proposition 4.5.8. Sequence of integrable sets. For each n ≥ 1, let An be an integrable set with a measure-theoretic complement Acn . Then the following conditions hold: A 1. If An ⊂ An+1 for each n ≥ 1, and if μ(An ) converges, then ∞
c n ∞ ∞ n=1 is an integrable set with μ = n=1 An = limn→∞ μ(An ) and n=1 An ∞ c. A n=1 n An 2. If An ⊃ An+1 for each n ≥ 1, and if μ(An ) converges, then ∞
c ∞ ∞ n=1 is an integrable set with μ = n=1 An = limn→∞ μ(An ) and n=1 An ∞ c. A n=1 n 3. If An Am = φ for each n > m ≥ 1, and if ∞ μ(An ) converges, then
n=1 ∞ ∞ ∞ n=1 An is an integrable set with μ n=1 An = n=1 μ(An ). ∞ ∞ 4. If μ(A ) converges, then A is an integrable set with n n=1 n=1 n
∞ ∞ μ n=1 An ≤ n=1 μ(An ). Proof. For each n ≥ 1, let 1A(n) be the integrable indicator of An . Then Acn = (1A(n) = 0). 1. Define a function Y by ∞ ∞ c An ∪ An domain(Y ) ≡ n=1
n=1
Integration and Measure 71 ∞ ∞ with Y (ω) ≡ 1 or 0 according to ω ∈ n=1 An or ω ∈ n=1 Acn , respectively. ∞ c Then (Y = 1) = ∞ n=1 An and (Y = 0) = n=1 An . For each n ≥ 1, we have An ⊂ An+1 and so 1A(n+1) ≥ 1A(n) . By assumption, we have the convergence of I (1A(1) ) + I (1A(2) − 1A(1) ) + · · · + I (1A(n) − 1A(n−1) ) = I 1A(n) = μ(An ) → lim μ(Ak ) k→∞
as n → ∞. Hence X ≡ 1A(1) +(1A(2) −1A(1) )+(1A3 −1A2 )+· · · is an integrable function. Consider an arbitrary ω ∈ domain(X). The limit lim 1A(n) (ω)
n→∞
= lim (1A(1) (ω) + (1A(2) − 1A(1) )(ω) + · · · + (1A(n) − 1A(n−1) )(ω)) = X(ω) n→∞
exists, and is either 0 or 1 since it is the limit of a sequence in {0,1}. Suppose X(ω) = 1. Then 1A(n) (ω) = 1 for some n ≥ 1. Hence ω ∈ ∞ n=1 An and so Y (ω) ≡ 1 = X(ω). Suppose X(ω) = 0. Then 1A(n) (ω) = 0 for each n ≥ 1. c Hence ω ∈ ∞ n=1 An and so Y (ω) ≡ 0 = X(ω). Combining, we see that Y = X on the full set domain(X). According to Proposition 4.5.3, we therefore have Y ∈ L. Thus ∞ n=1 An = (Y = 1) is an integrable set with Y as its indicator, and has measure equal to I Y = I X = lim I 1A(n) = lim μ(An ). ∞
n→∞
c
∞
n→∞
Moreover n=1 An = (Y = 0) = 2. Similar. 3. Write Bn = ni=1 Ai . Repeated application of Proposition 4.5.7 leads to n ∞ Assertion 1 we see that ∞ A = μ(Bn ) = i=1 μ(Ai ). From n=1 Bn
∞
n=1 n ∞ = μ = lim is an integrable set with μ A B μ(B n→∞ n) = n=1 n n=1 n ∞ i=1 μ(Ai ).
n−1 c n 4. Define B1 = A1 and Bn = k=1 Ak k=1 Ak for n > 1. Let D denote ∞ c c the full set k=1 (Ak ∪ Ak )(Bk ∪ Bk ). Clearly, Bn Bk = φ on D for each positive integer k < n. This implies μ(Bn Bk ) = 0 for each positive integer k < n. Furthermore, for every ω ∈ D, we have ω ∈ ∞ k=1 Ak iff there is a smallest n A . Since for every ω ∈ D either ω ∈ Ak or n > 0 such that ω ∈ ∞k=1 k c ω ∈ Ak , we have ω ∈ k=1 Ak iff there is an n > 0 such that ω ∈ Bn . In other
n−1 n ∞ words, ∞ k=1 Ak = k=1 Bk . Moreover, μ(Bn ) = μ k=1 Ak − μ k=1 Ak . Hence the sequence (Bn ) of integrable sets satisfies the hypothesis in Assertion 3. Therefore ∞ k=1 Bk is an integrable set, with ∞ ∞ n n
Ak = μ Bk = lim μ(Bk ) ≤ lim μ(Ak ) μ k=1
c n=1 An .
n→∞
k=1
=
∞
n=1
k=1
μ(An ).
n→∞
k=1
72
Probability Theory
Proposition 4.5.9. Convergence in L implies an a.e. convergent subsequence. Let X ∈ L and let (Xn )n=1,2,... be a sequence in L. If I |Xn − X| → 0, then there exists a subsequence (Yn )n=1,2,... such that Yn → X a.e. Proof. Let (Yn )n=1,2,... be a subsequence such that I |Yn − X| < 2−n . Define the sequence (Zn )n=1,2,... ≡ (X, − X + Y1,X − Y1, − X + Y2,X − Y2, . . .). ∞ Then n=1 I |Zn | < I |X| + 2 < ∞. Hence the set ∞
ω∈ domain(Zn ) : |Zn (ω)| < ∞ n=1
is a full set. Moreover, for each ω ∈ D, we have X(ω) =
∞
Zn (ω) = lim (Z1 (ω) + · · · + Z2n (ω)) = lim Yn (ω). n→∞
n=1
n→∞
We will use the next theorem many times to construct integrable functions. Theorem 4.5.10. A sufficient condition for a function to be integrable. Suppose X is a function defined a.e. on . Suppose there exist two sequences (Yn )n=1,2,... and (Zn )n=1,2,... in L such that |X − Yn | ≤ Zn a.e. for each n ≥ 1 and such that I Zn → 0 as n → ∞. Then X ∈ L. Moreover, I |X − Yn | → 0 as n → ∞. Proof. According to Proposition 4.5.9, there exists a subsequence (Zn(k) )k=1,2,... such that Zn(k) → 0 a.e. and such that I Zn(k) ≤ 2−k for each k ≥ 1. Since, by assumption, |X − Yn(k) | ≤ Zn(k) a.e. for each k ≥ 1, it follows that Yn(k) → X a.e. In other words, Yn(k) → X on domain(V ) for some V ∈ L. Moreover, |Yn(k+1) − Yn(k) | ≤ |X − Yn(k) | + |X − Yn(k+1) | ≤ Zn(k) + Zn(k+1) a.e. Hence 2I |V | +
∞
I |Yn(k+1) − Yn(k) | ≤ 2I |V | +
k=1
≤ 2I |V | +
∞
(I Zn(k) + I Zn(k+1) )
k=1 ∞
(2−k + 2−k+1 ) < 2I |V | + 2 < ∞. k=1
Therefore the sequence (V , − V ,Yn(1),Yn(2) − Yn(1),Yn(3) − Yn(2), . . .) in L is a representation of the function X. Consequently, X ∈ L1 = L. Moreover, since |X − Yn | ≤ Zn a.e. for each n ≥ 1, we have I |X − Yn | ≤ I Zn → 0 as n → ∞.
Integration and Measure
73
4.6 Abundance of Integrable Sets In this section, let (,L,I ) be a complete integration space. Let X be any function defined on a subset of and let t be a real number. Recall from the notations and conventions described in the Introduction that we use the abbreviation (t ≤ X) for the subset {ω ∈ domain(X) : t ≤ X(ω)}. Similar notations are used for (X < t), (X ≤ t), and (X < t). We will also write (t < X ≤ u) and similar for the intersection (t < X)(X ≤ u) and similar. Recall also the definition of the metric complement Jc of a subset J of a metric space. In the remainder of this section, let X be an arbitrary but fixed integrable function. We will show that the sets (t ≤ X) and (t < X) are integrable sets for each positive t in the metric complement of some countable subset of R. In other words, (t ≤ X) and (t < X) are integrable sets for all but countably many t ∈ (0,∞). First define some continuous functions that will serve as surrogates for step functions on R. Specifically, for real numbers s,t with 0 < s < t, define gs,t (x) ≡ (t − s)−1 (x ∧ t − x ∧ s). Then, by Assertion 1 of Proposition 4.3.4 and by linearity, the function gs,t (X) ≡ (t − s)−1 (X ∧ t − X ∧ s) is integrable for each s,t ∈ R with 0 < s < t. Moreover, 1 ≥ gt ,t ≥ gs,s ≥ 0 for each t ,t,s,s ∈ R with t < t ≤ s < s . If we can prove that lims↑t Igs,t (X) exists, then we can use the Monotone Convergence Theorem to show that the limit function lims↑t gs,t (X) is integrable and is an indicator of (t ≤ X), proving that the latter set is integrable. Classically the existence of lims↑t Igs,t (X) is trivial since, for fixed t, the integral Igs,t (X) is nonincreasing in s and bounded from below by 0. A nontrivial constructive proof that the limit exists for all but countably many t’s is given by [Bishop and Bridges 1985], who devised a theory of profiles for that purpose. In the following, we give a succinct presentation of this theory. Definition 4.6.1. Profile system. Let K be a nonempty open interval in R. Let G be a family of continuous functions on R with values in [0,1]. Let t ∈ K and g ∈ G be arbitrary. If g = 0 on (−∞,t] ∩ K, then we say that t precedes g and write t ♦ g. If g = 1 on [t,∞) ∩ K, then we say that g precedes t and write g ♦ t. Let t,s ∈ K and g ∈ G be arbitrary with t < s. If both t ♦ g and g ♦ s, then we say that the function g separates the points t and s, and write t ♦ g ♦ s. If for each t,s ∈ K with t < s, there exists g ∈ G such that t ♦ g ♦ s, then we say that the family G separates points in K. Suppose the family G separates points in K, and suppose λ is a real-valued function on G that is nondecreasing in the sense that, for each g,g with g ≤ g on K, we have λ(g) ≤ λ(g ). Then we say that (G,λ) is a profile system on the interval K. Definition 4.6.2. Profile bound. Let (G,λ) be an arbitrary profile system on a nonempty open interval K. Then we say that a closed interval [t,s] ⊂ K has a
74
Probability Theory
positive real number α as a profile bound, and write [t,s] α, if there exist t ,s ∈ K and f ,g ∈ G such that (i) f ♦ t , t < t ≤ s < s , s ♦ g and (ii) λ(f ) − λ(g) < α. Suppose a,b ∈ R and a ≤ b are such that (a,b) ⊂ K. Then we say an open interval (a,b) ⊂ K has a positive real number α as a profile bound, and write (a,b) α, if [t,s] α for each closed subinterval [t,s] of (a,b). Note that the open interval (a,b), defined as the set {x ∈ R : a < x < b}, can be empty. Note that t ♦ g is merely an abbreviation for 1[t,∞) ≥ g and g ♦ t is an abbreviation for g ≥ 1[t,∞) . The motivating example of a profile system is when K ≡ (0,∞),G ≡ {gs,t : s,t ∈ K and 0 < s < t}, and the function λ is defined on G by λ(g) ≡ Ig(X) for each g ∈ G. It can easily be verified that (G,λ) is then a profile system on K. The next lemma lists some basic properties of a profile system. Lemma 4.6.3. Basics of a profile system. Let (G,λ) be an arbitrary profile system on an open interval K in R. Then the following conditions hold: 1. If f ♦ t, t ≤ s, and s ♦ g, then f ≥ g and λ(f ) ≥ λ(g). 2. If t ≤ s and s ♦ g, then t ♦ g. 3. If g ♦ t and t ≤ s, then g ♦ s. 4. In view of the transitivity in Assertions 2 and 3, we can rewrite, without ambiguity, Condition (i) in Definition 4.6.1 as f ♦ t < t ≤ s < s ♦ g. 5. Suppose [t,s] α and t0 < t ≤ s < s0 . Let ε > 0 be arbitrary. Then there exist t1,s1 ∈ K and f1,g1 ∈ G such that (i) t0 ♦ f1 ♦ t1 < t ≤ s < s1 ♦ g1 ♦ s0 , (ii) λ(f1 ) − λ(g1 ) < α, and (iii) t − ε < t1 < t and s < s1 < s + ε. 6. Every closed subinterval of K has a finite profile bound. Proof. We will prove only Assertions 5 and 6, as the rest of the proofs are trivial. 1. Suppose [t,s] α and t0 < t ≤ s < s0 . Then, according to Definition 4.6.2, there exist t ,s ∈ K and f ,g ∈ G such that (i ) f ♦ t < t ≤ s < s ♦ g and (ii ) λ(f ) − λ(g) < α. Let ε > 0 be arbitrary. Then t0 ∨ t ∨ (t − ε) < t. Hence there exist real numbers t ,t1 such that t0 ∨ t ∨ (t − ε) < t < t1 < t.
(4.6.1)
Since G separates points in K, there exists f1 ∈ G such that t0 < t ♦ f1 ♦ t1 < t.
(4.6.2)
Then f ♦ t < t ♦ f1 . Hence, in view of Assertion 1, we have λ(f ) ≥ λ(f1 ). Similarly, we obtain s ,s1 ∈ K and g1 ∈ G such that s < s1 < s < s0 ∧ s ∧ (s + ε)
(4.6.3)
s < s1 ♦ g1 ♦ s < s0
(4.6.4)
and
Integration and Measure
75
with λ(g1 ) ≥ λ(g). Hence λ(f1 ) − λ(g1 ) ≤ λ(f ) − λ(g) < α. Condition (ii) in Assertion 5 is established. Condition (i) in Assertion 5 follows from relations 4.6.2 and 4.6.4. Condition (iii) in Assertion 5 follows from relations 4.6.1 and 4.6.3. Assertion 5 is proved. 2. Given any interval [t,s] ⊂ K, let t ,t ,s ,s be members of K such that t < t < t ≤ s < s < s . Since G separates points in K, there exist f ,g ∈ G such that t ♦ f ♦ t < t ≤ s < s ♦ g ♦ s . Take any real number α such that λ(f ) − λ(g) < α. Then [t,s] α. Assertion 6 is proved. Lemma 4.6.4. Subintervals with small profile bounds of a given interval with a finite profile bound. Let (G,λ) be a profile system on a proper open interval K in R. Let [a,b] be a closed subinterval of K with [a,b] α. Let ε > 0 be arbitrary. Let q be an arbitrary integer with q ≥ αε−1 . Then there exists a sequence s0 = a ≤ s1 ≤ . . . ≤ sq = b of points in the interval [a,b] such that (sk−1,sk ) ε for each k = 1, . . . ,q. Proof. 1. As an abbreviation, write dn ≡ 2−n (b−a) for each n ≥ 1. By hypothesis, [a,b] α. Hence there exist a ,b ∈ K and f ,f ∈ G such that (i) f ♦ a < a ≤ b < b ♦ f and (ii) λ(f ) − λ(f ) < α ≤ qε. 2. Let n ≥ 1 be arbitrary. For each i = 0, . . . ,2n , define tn,i ≡ a + idn . Thus tn,0 = a and tn,2n = b. Define Dn ≡ {tn,i : 0 ≤ i ≤ 2n }. Then, for each i = 0, . . . ,2n , we have tn,i ≡ a + idn = a + 2idn+1 ≡ tn+1,2i ∈ Dn+1 . Hence Dn ⊂ Dn+1 . 3. Let n ≥ 1 be arbitrary. Let i = 1, . . . ,2n be arbitrary. Then we have tn,i−1,tn,i ∈ [a,b] ⊂ K, with tn,i−1 < tn,i . By hypothesis, the pair (G,λ) is a profile system on the interval K. Hence the family G of functions separates points in K. Therefore there exists a function fn,i ∈ G with tn,i−1 ♦ fn,i ♦ tn,i . In addition, define fn,0 ≡ f and fn,2n +1 ≡ f . Then, according to Condition (i) in Step 1, we have fn,0 ♦ tn,0 = a and b = tn,2n ♦ fn,2n +1 . Combining with Step 3, we obtain fn,0 ♦ tn,0 ♦ fn,1 ♦ tn,1 · · · ♦ fn,2n ♦ tn,2n ♦ fn,2n +1 .
(4.6.5)
By Assertion 1 of Lemma 4.6.3, relation 4.6.5 implies that λ(fn,0 ) ≥ λ(fn,1 ) ≥ · · · ≥ λ(fn,2n ) ≥ λ(fn,2n +1 ).
(4.6.6)
4. Next, let t = tn,i ∈ Dn and s ≡ tn+1,j ∈ Dn+1 be arbitrary with s ≤ t − dn . Then t > a. Hence i ≥ 1 and s ≤ tn,i−1 . Moreover, fn+1,j ♦ tn+1,j = s ≤ tn,i−1 ♦ fn,i .
76
Probability Theory
Therefore, by Assertion 1 of Lemma 4.6.3, we obtain λ(fn+1,j ) ≥ λ(fn,i ),
(4.6.7)
where t = tn,i ∈ Dn and s ≡ tn+1,j ∈ Dn+1 are arbitrary with s ≤ t − dn . 5. Similarly, let t = tn,i ∈ Dn and s ≡ tn+1,j ∈ Dn+1 be arbitrary with s ≥ t + dn+1 . Then s > a. Hence j ≥ 1 and tn+1,j −1 ≥ t. Hence fn,i ♦ tn,i = t ≤ tn+1,j −1 ♦ fn+1,j . Therefore, by Assertion 1 of Lemma 4.6.3, we obtain λ(fn,i ) ≥ λ(fn+1,j ),
(4.6.8)
for each t = tn,i ∈ Dn and s ≡ tn+1,j ∈ Dn+1 with s ≥ t + dn+1 . 6. For each n ≥ 1 and for each i = 0, . . . ,2n , write, as an abbreviation, Sn,i ≡ λ(f ) − λ(fn,i ). In view of Condition (ii) in Step 1, there exists ε > 0 such that qε > qε > λ(f ) − λ(f )
(4.6.9)
|ε − Sn,i k −1 | > 0
(4.6.10)
and such that
for each k = 1, . . . ,q, for each i = 0, . . . ,2n , for each n ≥ 1. 7. Let n ≥ 1 be arbitrary. Note that Sn,0 ≡ λ(f ) − λ(fn,0 ) ≡ λ(f ) − λ(f ) = 0. and Sn,2n ≡ λ(f ) − λ(fn,2n ) ≤ λ(f ) − λ(f ) < qε . Hence inequality 4.6.6 implies that 0 = Sn,0 ≤ Sn,1 ≤ · · · ≤ Sn,2n < qε .
(4.6.11)
8. Now let k = 1, . . . ,q be arbitrary. Let n ≥ 1 be arbitrary. Note from inequality 4.6.10 that |kε − Sn,i | > 0 for each i = 0, . . . ,2n . Therefore, from inequality 4.6.11, we see that there exists a largest integer in,k = 0, . . . ,2n such that Sn,i(n,k) < kε . In other words, in,k is the largest integer among 0, . . . ,2n such that λ(f ) − λ(fn,i(n,k) ) < kε . Define sn,k ≡ tn,i(n,k) ∈ [a,b].
(4.6.12)
Integration and Measure
77
Note that in,q = 2n by inequality 4.6.11. Hence sn,q ≡ tn,2n = b. For convenience, define in,0 ≡ 0. 9. Suppose k ≤ q − 1. Then in,k+1 is, by definition, the largest integer among 0, . . . ,2n such that Sn,i(n,k+1) < (k + 1)ε . At the same time, by inequality 4.6.12, we have Sn,i(n,k) < kε < (k + 1)ε . Hence in,k ≤ in,k+1 . Consequently, sn,k ≡ tn,i(n,k) ≤ tn,i(n,k+1) ≡ sn,k+1,
(4.6.13)
provided that k < q. 10. With k = 1, . . . ,q arbitrary but fixed until further notice, we will show that the sequence (sn,k )n=1,2,... converges as n → ∞. To that end, let n ≥ 1 be arbitrary, and proceed to estimate |sn,k −sn+1,k |. As an abbreviation, write i ≡ in,k and j ≡ in+1,k . 11. First suppose, for the sake of a contradiction, that sn+1,k − sn,k > dn . Then tn,i + dn ≡ sn,k + dn < sn+1,k ≡ tn+1,j ≤ b. Hence i ≤ 2n − 1 and tn,i+1 = tn,i + dn < tn+1,j . Therefore, since tn,i+1,tn+1,j ∈ Dn+1 , we obtain tn,i+1 + dn+1 ≤ tn+1,j . Inequality 4.6.8, where i is replaced by i + 1, therefore implies that λ(fn,i+1 ) ≥ λ(fn+1,j ).
(4.6.14)
At the same time, by the definitions of in,k and in+1,k , we have Sn+1,j < kε ≤ Sn,i+1, whence λ(fn+1,j ) > λ(fn,i+1 ), contradicting inequality 4.6.14. We conclude that sn+1,k − sn,k ≤ dn .
(4.6.15)
12. Now suppose, again for the sake of a contradiction, that sn,k − sn+1,k > dn + dn+1 . Then tn+1,j + dn+1 ≡ sn+1,k + dn+1 < sn,k ≡ tn,i ≤ b.
78
Probability Theory
Hence j ≤ 2n+1 − 1 and tn+1,j +1 + dn = tn+1,j + dn+1 + dn = sn+1.k + dn+1 + dn < sn,k = tn,i . Inequality 4.6.7, where j is replaced by j + 1, therefore implies that λ(fn+1,j +1 ) ≥ λ(fn,i ).
(4.6.16)
At the same time, by the definition of in,k and in+1,k , we have Sn,i < kε ≤ Sn+1,j +1, whence λ(fn,i ) > λ(fn+1,j +1 ), contradicting inequality 4.6.16. We conclude that sn,k − sn+1,k ≤ dn + dn+1 .
(4.6.17)
13. Combining inequalities 4.6.15 and 4.6.17, we obtain |sn,k − sn+1,k | ≤ dn + dn+1 ≡ (2−n + 2−n−1 )(b − a) < 2−n+1 (b − a). Hence, for each p ≥ n, we can inductively obtain |sn,k − sp,k | = |(sn,k − sn+1,k ) + (sn+1,k − sn+2,k ) + · · · + (sp−1,k − sp,k )| ≤ (2−n+1 + 2−n + . . .)(b − a) = 2−n+2 (b − a) = 4dn . (4.6.18) Thus we see that the sequence (sn,k )n=1,2,... is Cauchy, and converges to some sk ∈ [a,b] as n → ∞. Letting p → ∞ in inequality 4.6.18, we obtain |sn,k − sk | ≤ 4dn,
(4.6.19)
where n ≥ 1 is arbitrary. 14. For ease of notations, we will also define sn,0 ≡ s0 ≡ a. Then inequality 4.6.13 implies that a ≡ sn,0 ≤ sn,1 ≤ · · · ≤ sn,q = b.
(4.6.20)
Letting n → ∞,we obtain a = s0 ≤ s1 ≤ · · · ≤ sq = b.
(4.6.21)
15. Now let k = 1, . . . ,q be arbitrary. Suppose [u,v] ⊂ (sk−1,sk ) for some real numbers u ≤ v. We will show that [u,v] ε. To that end, let n ≥ 1 be so large that sk−1 + 5dn < u ≤ v < sk − 5dn . This implies, in view of inequality 4.6.19, that sn,k−1 + dn ≤ sk−1 + 5dn < u ≤ v < sk − 5dn < sn,k − dn .
(4.6.22)
Integration and Measure
79
As abbreviations, write i ≡ in,k−1 and j ≡ in,k . Then tn,i ≡ sn,k−1 < sn,k ≤ b. Hence in,k−1 ≡ i < 2n . By the definitions of the integers in,k−1 and in,k , we then have (k − 1)ε ≤ Sn,i(n,k−1)+1
(4.6.23)
Sn,i(n,k) < kε .
(4.6.24)
tn,i+1 = tn,i + dn = sn,k−1 + dn < u,
(4.6.25)
and
Consequently,
where the inequality is by inequality 4.6.22. At the same time, v < sn,k − dn = tn,j − dn = tn,j −1,
(4.6.26)
where the inequality is by inequality 4.6.22. Combining inequalities 4.6.25 and 4.6.26, we obtain fn,i+1 ♦ tn,i+1 < u ≤ v < tn,j −1 ♦ fn,j .
(4.6.27)
Furthermore, inequalities 4.6.23 and 4.6.24 together imply that Sn,j − Sn,i+1 < kε − (k − 1)ε = ε . Equivalently, λ(fn,i+1 ) − λ(fn,j ) < ε . Thus Conditions (i) and (ii) in Definition 4.6.2, where [t,s], t ,s ,f,g, and α are replaced by [u,v], tn,i+1,tn,j −1,fn,i+1,fn,j , and ε , respectively, are satisfied. Accordingly, [u,v] ε < ε. Since [u,v] is an arbitrary closed subinterval of (sk−1,sk ), we have established that (sk−1,sk ) ε according to Definition 4.6.1, where k = 1, . . . ,q is arbitrary. This, together with inequality 4.6.21, prove the lemma. Theorem 4.6.5. All but countably many points have an arbitrarily low profile bound. Let (G,λ) be an arbitrary profile system on a proper open interval K. Then there exists a countable subset J of K such that for each t ∈ K ∩ Jc , we have [t,t] ε for arbitrarily small ε > 0. Proof. Let [a,b] ⊂ [a2,b2 ] ⊂ · · · be a sequence of subintervals of K such that K = ∞ p=1 [ap ,bp ]. Let p ≥ 1 be arbitrary. Then [ap ,bp ] αp for some real number αp , according to Assertion 6 of Lemma 4.6.3. Now let qp be an arbitrary integer with qp ≥ αp p. Then, according to Lemma 4.6.4, there exists a finite sequence (p)
s0 (p)
(p)
such that (sk−1,sk )
1 p
(p)
= ap ≤ s1
(p)
≤ · · · ≤ sq(p) = bp
for each k = 1, . . . ,qp .
(4.6.28)
80
Probability Theory (p)
Define J ≡ {sk : 1 ≤ k ≤ qp ;p ≥ 1}. Suppose t ∈ K ∩ Jc . Let ε > 0 be arbitrary. Let p ≥ 1 be so large that t ∈ [ap,bp ] and that p1 < ε. By the definition (p)
of the metric complement Jc , we have |t −sk | > 0 for each k = 1, . . . ,qp . Hence, (p) (p) by inequality 4.6.28, we have t ∈ (sk−1,sk ) for some k = 1, . . . ,qp . Since (p)
(p)
(sk−1,sk )
1 p,
we have [t,t]
1 p
< ε according to Definition 4.6.2.
An immediate application of Theorem 4.6.4 is to establish the abundance of integrable sets, in the following main theorem of this section. Theorem 4.6.6. Abundance of integrable sets. Given an integrable function X on the complete integration space (,L,I ), there exists a countable subset J of (0,∞) such that for each positive real number t in the metric complement Jc of J , the following conditions hold: 1. The sets (t ≤ X) and (t < X) are integrable sets, with (t ≤ X)c = (X < t) and (t < X)c = (X ≤ t). 2. The measures μ(t ≤ X) and μ(t < X) are equal and are continuous at each t > 0 with t ∈ Jc . Proof. 1. Let K ≡ (0,∞). Define the family of functions G ≡ {gs,t : s,t ∈ K and 0 < s < t}, where gs,t denotes the function defined on R by gs,t (x) ≡ (t − s)−1 (x ∧ t − x ∧ s) for each x ∈ R. Define the real-valued function λ on G by λ(g) ≡ Ig(X) for each g ∈ G. For each t,s ∈ K with t < s, we have t ♦ gs,t ♦ s in the sense of Definition 4.6.1. In other words, the family G separates points in K. As observed in the beginning of this section, the function gs,t (X) ≡ (t − s)−1 (X ∧ t − X ∧ s) is integrable, for each s,t ∈ R with 0 < s < t, by Assertion 1 of Proposition 4.3.4 and by linearity. Hence the function λ is welldefined. Moreover, by Assertion 4 of Proposition 4.3.4, for each g,g with g ≤ g on K, we have λ(g) ≤ λ(g ). Thus the function λ is is nondecreasing. Summing up, the couple (G,λ) is a profile system according to Definition 4.6.1. 2. Let the countable subset J of K be constructed for the profile system (G,λ) as in Theorem 4.6.5. 3. Let t ∈ K ∩ Jc be arbitrary. Then, by Theorem 4.6.5, we have [t,t] p1 for each p ≥ 1. Recursively applying Assertion 5 of Lemma 4.6.3, we can construct two sequences (up )p= 0,1,... and (vp )p= 0,1,... in K, and two sequences (fp )p=1,2,... and (gp )p=1,2,... in G, such that for each p ≥ 1 we have (i) up−1 ♦ fp ♦ up < t < vp ♦ gp ♦ vp−1 , (ii) λ(fp ) −λ(gp ) < p1 , and (iii) t − p1 < up < vp < t + p1 . Then Condition (i) implies that fp ♦ up ♦ fp+1 for each p > 1. Hence, by Assertion 1 of Lemma 4.6.3, we have fp ≥ fp+1 for each p ≥ 1.
Integration and Measure
81
4. Consider p,q ≥ 1. We have fq ♦ uq < t < vp ♦ gp . Hence λ(fq ) ≥ λ(gp ) > λ(fp ) −
1 , p
(4.6.29)
where the second inequality is by Condition (ii). By symmetry, we also have λ(fp ) > λ(fq ) − q1 . Combining, we obtain |λ(fq ) − λ(fp )| < p1 + q1 . Hence (λ(fp ))p=1,2,... is a Cauchy sequence and converges. Similarly, (λ(gp ))p=1,2,... converges. In view of Condition (ii), the two limits are equal. 5. By the definition of λ, we see that limp→∞ Ifp (X) ≡ limp→∞ λ(fp ) exists. Since fp ≥ fp+1 for each p > 1, as proved in Step 3, the Monotone Convergence Theorem (Theorem 4.4.8) implies that Y ≡ limp→∞ fp (X) is an integrable function, with Ifp (X) ↓ I Y . Likewise, Z ≡ limp→∞ gp (X) is an integrable function, with Igp (X) ↑ I Z. Furthermore, I |Y − Z| = I Y − I Z = lim (λ(fp ) − λ(gp )) = 0, p→∞
(4.6.30)
where the last equality is thanks to equality 4.6.29. Hence, according to Assertion 4 of Proposition 4.5.3, we have Y = Z a.e. 6. We next show that Y is an indicator with (Y = 1) = (t ≤ X). To that end, consider each ω ∈ domain(Y ). Suppose Y (ω) > 0. Let p ≥ 2 be arbitrary. Then ω ∈ domain(X) and fp (X(ω)) ≥ Y (ω) > 0. Condition (i) in Step 3 implies that up−1 ♦ fp . Therefore we have, according to Definition 4.6.1, fp (u) = 0 for each u ∈ (−∞,up−1 ] ∩ K. Since fp (X(ω)) > 0, we infer that X(ω) ∈ [up−1,∞). At the same time, Condition (i) in Step 3 implies that fp−1 ♦ up−1 . Combining, fp−1 (X(ω)) = 1, where p ≥ 2 is arbitrary. Letting p → ∞, we conclude that X(ω) ∈ [t,∞) and Y (ω) = 1, assuming Y (ω) > 0. In particular, Y can have only two possible values, 0 or 1. Thus Y is an indicator. We have also seen that (Y = 1) ⊂ (Y > 0) ⊂ (t ≤ X).
(4.6.31)
7. Conversely, consider each ω ∈ domain(X) such that t ≤ X(ω). Let p ≥ 1 be arbitrary. Condition (i) in Step 3 implies that fp ♦ t. Therefore we have, according to Definition 4.6.1, fp (u) = 1 for each u ∈ [t,∞) ∩ K. Combining, we conclude that fp (X(ω)) = 1, where p ≥ 1 is arbitrary. It follows that limp→∞ fp (X(ω)) = 1. Thus ω ∈ domain(Y ) and Y (ω) = 1. Thus (t ≤ X) ⊂ (Y = 1). 8. Combined with relation 4.6.31, this implies that (t ≤ X) = (Y = 1). In other words, the set (X ≥ t) has the integrable indicator Y as an indicator and is therefore an integrable set,with (X ≥ t)c ≡ (Y = 0). Now consider each ω ∈ (Y = 0). Then limp→∞ fp (X(ω)) = Y (ω) = 0. Hence X(ω) < t, and so ω ∈ (X < t). Conversely, consider each ω ∈ (X < t). Then Y (ω) = limp→∞ fp (X(ω)) = 0. Hence ω ∈ (Y = 0). Combining, we obtain (X ≥ t)c ≡ (Y = 0) = (X < t). Summing up, the set (t ≤ X) is an integrable set, with Y as an integrable indicator and with (t ≤ X)c = (X < t).
82
Probability Theory
9. Similarly, we can prove that the set (t < X) is an integrable set, with Z as an integrable indicator and with (t < X)c ≡ (Z = 0) = (X ≤ t). Thus Assertion 1 of the theorem is proved. 10. To prove Assertion 2, first note that μ(t ≤ X) = IY = IZ = μ(t < X), where the second equality follows from equality 4.6.30. 11. It remains for us to show that μ(t ≤ X) is continuous at t. To that end, let p > 1 be arbitrary. Recall Conditions (i–iii) in Step 3 – to wit, (i) up−1 ♦ fp ♦ up < t < vp ♦ gp ♦ vp−1, (ii) λ(fp) − λ(gp) < 1/p, and (iii) t − 1/p < up < vp < t + 1/p. By the monotone convergence in Step 5, we have
λ(fp) ≡ Ifp(X) ≥ μ(t ≤ X) ≥ Igp(X) ≡ λ(gp).
Hence
λ(fp) − μ(t ≤ X) ≤ λ(fp) − λ(gp) ≤ 1/p.
12. With p > 1 arbitrary but fixed, consider any t′ ∈ K ∩ Jc such that t′ ∈ (up,vp). Similarly to Step 3, by recursively applying Assertion 5 of Lemma 4.6.3, we can construct two sequences (u′q)q=0,1,... and (v′q)q=0,1,... in K, and two sequences (f′q)q=1,2,... and (g′q)q=1,2,... in G, such that for each q ≥ 1 we have (i′) u′q−1 ♦ f′q ♦ u′q < t′ < v′q ♦ g′q ♦ v′q−1, (ii′) λ(f′q) − λ(g′q) < 1/q, and (iii′) t′ − 1/q < u′q < v′q < t′ + 1/q. 13. Let q̄ ≥ 1 be so large that u′q−1,v′q−1 ∈ (up,vp) for each q ≥ q̄. Consider each q ≥ q̄. Because u′q−1,v′q−1 ∈ (up,vp), Condition (i′) can be extended to
fp ♦ up < u′q−1 ♦ f′q ♦ u′q < t′ < v′q ♦ g′q ♦ v′q−1 < vp ♦ gp.
It follows that
0 ≤ λ(fp) − λ(f′q) ≤ λ(fp) − λ(gp) ≤ 1/p.
By Step 5, (t ≤ X) is an integrable set, with
0 ≤ λ(fp) − μ(t ≤ X) ≤ 1/p.
Similarly, (t′ ≤ X) is an integrable set, with
0 ≤ λ(f′q) − μ(t′ ≤ X) ≤ 1/q.
Combining the three last displayed inequalities, we obtain
|μ(t′ ≤ X) − μ(t ≤ X)| ≤ 2/p + 1/q.
Since q ≥ q̄ is arbitrarily large, the last displayed inequality implies
|μ(t′ ≤ X) − μ(t ≤ X)| ≤ 2/p,
where t′ ∈ (up,vp) ∩ Jc is arbitrary. Since p > 1 is arbitrary, continuity of μ(t ≤ X) at t has thus been established. Corollary 4.6.7. Abundance of integrable sets. Let X be an arbitrary integrable function. There exists a countable subset J of R such that for each t in the metric complement Jc of J, the following conditions hold: 1. If t > 0, then the sets (t < X) and (t ≤ X) are integrable, with equal measures that are continuous at t. 2. If t < 0, then the sets (X < t) and (X ≤ t) are integrable, with equal measures that are continuous at t. Proof. Assertion 1 is a restatement of Theorem 4.6.6. To prove Assertion 2, apply Assertion 1 to the integrable function −X. Definition 4.6.8. Regular and continuity points of an integrable function relative to an integrable set. Let X be an integrable function, let A be an integrable set, and let t ∈ R. 1. We say that t is a regular point of X relative to A if (i) there exists a sequence (sn)n=1,2,... of real numbers decreasing to t such that (sn < X)A is integrable for each n ≥ 1 and such that limn→∞ μ(sn < X)A exists, and (ii) there exists a sequence (rn)n=1,2,... of real numbers increasing to t such that (rn < X)A is integrable for each n ≥ 1 and such that limn→∞ μ(rn < X)A exists. 2. If, in addition, the two limits in (i) and (ii) are equal, then we call t a continuity point of X relative to A. 3. We say that a positive real number t > 0 is a regular point of X if Conditions (i) and (ii), with A replaced by Ω, are satisfied. We say that a negative real number t < 0 is a regular point of X if −t is a regular point of −X. Corollary 4.6.9. Simple properties of regular and continuity points. Let X be an integrable function, let A be an integrable set, and let t be a regular point of X relative to A. Then the following conditions hold: 1. If u is a regular point of X, then u is a regular point of X relative to any integrable set B. If u is a continuity point of X, then u is a continuity point of X relative to any integrable set B. 2. All but countably many positive real numbers are continuity points of X. 3. All but countably many real numbers are continuity points of X relative to A. Hence all but countably many real numbers are regular points of X relative to A.
4. The sets A(t < X), A(t ≤ X), A(X < t), A(X ≤ t), and A(X = t) are integrable sets. 5. (X ≤ t)A = A((t < X)A)c and (t < X)A = A((X ≤ t)A)c. 6. (X < t)A = A((t ≤ X)A)c and (t ≤ X)A = A((X < t)A)c. 7. For a.e. ω ∈ A, we have t < X(ω), t = X(ω), or t > X(ω). Thus we have a limited, but very useful, version of the principle of excluded middle. 8. Let ε > 0 be arbitrary. There exists δ > 0 such that if r ∈ (t − δ,t] and A(X < r) is integrable, then μ(A(X < t)) − μ(A(X < r)) < ε. There exists δ > 0 such that if s ∈ [t,t + δ) and A(X ≤ s) is integrable, then μ(A(X ≤ s)) − μ(A(X ≤ t)) < ε. 9. If t is a continuity point of X relative to A, then μ((t < X)A) = μ((t ≤ X)A). Proof. 1. Suppose u > 0 and u is a regular point of X. Then by Definition 4.6.8, (i′) there exists a sequence (sn)n=1,2,... of real numbers decreasing to u such that (sn < X) is integrable and limn→∞ μ(sn < X) exists, and (ii′) there exists a sequence (rn)n=1,2,... of positive real numbers increasing to u such that (rn < X) is integrable and limn→∞ μ(rn < X) exists. Now let B be any integrable set. Then for all m > n ≥ 1, we have
0 ≤ μ((sm < X)B) − μ((sn < X)B) = μ(B(sm < X)(sn < X)c) ≤ μ((sm < X)(sn < X)c) = μ(sm < X) − μ(sn < X) ↓ 0
as n → ∞. Therefore (μ((sn < X)B))n=1,2,... is a Cauchy sequence and converges, verifying Condition (i) of Definition 4.6.8. Condition (ii) of Definition 4.6.8 is similarly verified. Hence u is a regular point of X relative to B. Suppose, in addition, that u is a continuity point of X. Then for each n ≥ 1, we have
0 ≤ μ((rn < X)B) − μ((sn < X)B) = μ(B(rn < X)(sn < X)c) ≤ μ((rn < X)(sn < X)c) = μ(rn < X) − μ(sn < X) ↓ 0.
Therefore the two sequences (μ((rn < X)B))n=1,2,... and (μ((sn < X)B))n=1,2,... have the same limit. Thus u is a continuity point of X relative to B, according to Definition 4.6.8. Thus Assertion 1 is proved for the case u > 0. The case u < 0 is similar. Assertion 1 is proved. 2. By Corollary 4.6.7, there exists a countable subset J of R such that for each t > 0 in the metric complement Jc of J, the sets (t < X) and (t ≤ X) are integrable, with equal measures that are continuous at t. Consider each t ∈ (0,∞)Jc. Take any sequence (sn)n=1,2,... in (0,∞)Jc such that sn ↓ t. Then (sn < X) is integrable for each n ≥ 1, with μ(sn < X) ↑ μ(t < X) by continuity. Similarly, we can take a sequence (rn)n=1,2,... in (0,∞)Jc such that rn ↑ t. Then (rn < X) is integrable for each n ≥ 1, with μ(rn < X) ↓ μ(t < X) by continuity. Combining, we see that the conditions in Definition 4.6.8 are satisfied for t to be a continuity point of X. Assertion 2 is proved.
3. By Assertion 2, there exists a countable subset J of R such that each t ∈ (0,∞)Jc is a continuity point of both X and −X. There is no loss of generality in assuming that 0 ∈ J. Consider each r ∈ (0,∞)Jc. Then r is a continuity point of X. Hence, by Assertion 1, r is a continuity point of X relative to A. Now consider each r ∈ (−∞,0)Jc. Then t ≡ −r ∈ (0,∞)Jc is a continuity point of −X. Hence, by Assertion 1, t is a continuity point of −X relative to A. It follows that r is a continuity point of X relative to A. Thus Assertion 3 is verified. 4. By hypothesis, t is a regular point of X relative to the integrable set A. Hence there exist two sequences of real numbers (sn)n=1,2,... and (rn)n=1,2,... satisfying Conditions (i) and (ii) in Definition 4.6.8. Then (sn < X)A is an integrable set for n ≥ 1, and a ≡ limn→∞ μ(sn < X)A exists. Since (sn) decreases to t, we have
A(t < X) = ⋃_{n=1}^{∞} A(sn < X).
Hence the Monotone Convergence Theorem implies that the union A(t < X) is integrable, with
μ(A(t < X)) = a.   (4.6.32)
Consequently, the set A(X ≤ t) = A((t < X)A)c is integrable by Proposition 4.5.7. Similarly, the set (rn < X)A is an integrable set for n ≥ 1, and b ≡ limn→∞ μ(rn < X)A exists. Since (rn) increases to t, we have
A(t ≤ X) = ⋂_{n=1}^{∞} A(rn < X).
Hence the Monotone Convergence Theorem implies that the intersection A(t ≤ X) is integrable, with
μA(t ≤ X) = b.   (4.6.33)
Hence, by Assertion 3 of Proposition 4.5.7, the set
A(X < t) = A((t ≤ X)A)c   (4.6.34)
is integrable, with μA(X < t) = μA − b. Likewise, the Monotone Convergence Theorem implies that the intersection
A(X = t) = ⋂_{n=1}^{∞} A(rn < X ≤ sn)
is integrable, with
μ(A(X = t)) = limn→∞ (μ((rn < X)A) − μ((sn < X)A)) = b − a.   (4.6.35)
Assertion 4 is proved.
5. Using Proposition 4.5.7, we obtain (X ≤ t)A = A((t < X)A)c and (t < X)A = A((X ≤ t)A)c. This proves Assertion 5. 6. Similar. 7. Combining equalities 4.6.33, 4.6.32, and 4.6.35, we see that
μA(t ≤ X) − μA(t < X) = b − a = μ(A(X = t)).
Hence
μA = μA(t ≤ X) + μA(X < t) = μA(t < X) + b − a + μA(X < t) = μ(A(t < X)) + μ(A(X = t)) + μA(X < t) = μ(A((t < X) ∪ (X = t) ∪ (X < t))).
Since A ⊃ A((t < X) ∪ (X = t) ∪ (X < t)), the last displayed equality implies that A = A((t < X) ∪ (X = t) ∪ (X < t)) a.e. This proves Assertion 7. 8. Let ε > 0 be arbitrary. Let n ≥ 1 be so large that μ((t < X)A) − μ((sn < X)A) < ε. Define δ ≡ sn − t. Suppose s ∈ [t,t + δ) and A(X ≤ s) is integrable. Then s < sn and so
μ(A(X ≤ s)) − μ(A(X ≤ t)) = (μ(A) − μ((s < X)A)) − (μ(A) − μ((t < X)A)) = μ((t < X)A) − μ((s < X)A) ≤ μ((t < X)A) − μ((sn < X)A) < ε.
This proves the second half of Assertion 8, the first half being similar. 9. Suppose t is a continuity point of X relative to A. Then the limits a ≡ limn→∞ μ(sn < X)A and b ≡ limn→∞ μ(rn < X)A are equal. Hence, by equalities 4.6.32 and 4.6.33,
μ((t ≤ X)A) = b = a = μ((t < X)A).
Assertion 9 and the corollary are proved.
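To make the role of the countable exceptional set J concrete, here is a minimal computational sketch on a hypothetical finite probability space (not an object from the text), where every subset is integrable and the measures μ(t ≤ X) and μ(t < X) can be tabulated exactly; the names omega, weight, and X below are illustrative choices.

```python
# Tabulate t -> mu(t <= X) and t -> mu(t < X) for a function X taking
# finitely many values; the two measures differ exactly at the (here three)
# jump points, which play the role of the exceptional set J of
# Theorem 4.6.6 and Corollary 4.6.9.
omega = [0, 1, 2, 3]
weight = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
X = {0: 0.5, 1: 1.0, 2: 1.0, 3: 2.5}

def mu(event):
    # measure of a subset of omega, i.e., I applied to its indicator
    return sum(weight[w] for w in omega if event(w))

for t in [0.3, 0.5, 0.8, 1.0, 1.7, 2.5, 3.0]:
    le = mu(lambda w: X[w] >= t)   # mu(t <= X)
    lt = mu(lambda w: X[w] > t)    # mu(t < X)
    flag = "jump (t in J)" if le != lt else "continuity point"
    print(f"t={t}: mu(t<=X)={le:.2f}  mu(t<X)={lt:.2f}  {flag}")
```

At every t outside the value set of X the two measures agree and are locally constant in t, exactly as Assertion 2 of the theorem predicts.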
Theorem 4.6.10. Chebychev’s inequality. Let X ∈ L be arbitrary. Then the following conditions hold: 1. (First and common version of Chebychev’s inequality.) If t > 0 is a regular point of the integrable function |X|, then we have μ(|X| > t) ≤ t −1 I |X|.
2. (Second version.) If I|X| < b for some b > 0, then for each s > 0, we have (|X| > s) ⊂ B for some integrable set B with μ(B) < s−1b. This second version of Chebychev’s inequality is useful when a real number s > 0 is given without any assurance that the set (|X| > s) is integrable. Proof. 1. We have 1(|X|>t) ≤ t−1|X|. Assertion 1 follows. 2. Take an arbitrary regular point t of the integrable function |X| in the open interval (b−1I|X|s, s). Let B ≡ (|X| > t). By Assertion 1, we then have μ(B) ≤ t−1I|X| < s−1b. Moreover, (|X| > s) ⊂ (|X| > t) ≡ B. Assertion 2 is proved. Definition 4.6.11. Convention: implicit assumption of regular points of integrable functions. Let X be an integrable function, and let A be an integrable set. Henceforth, if the integrability of the set (X < t)A or (X ≤ t)A, for some t ∈ R, is required in a discussion, then it is understood that the real number t can be chosen, and has been chosen, from the regular points of the integrable function X relative to the integrable set A. Likewise, if the integrability of the set (t < X) or (t ≤ X), for some t > 0, is required in a discussion, then it is understood that the number t > 0 has been chosen from the regular points of the integrable function X. For brevity, we will sometimes write (X < t; Y ≤ s; ...) for (X < t)(Y ≤ s)···. Recall that Cub(R) is the space of bounded and uniformly continuous functions on R. Proposition 4.6.12. The product of a bounded continuous function of an integrable function and an integrable indicator is integrable. Suppose X ∈ L, A is an integrable set, and f ∈ Cub(R). Then f(X)1A ∈ L. In particular, if X ∈ L is bounded, then X1A is integrable. Proof. 1. By Assertion 3 of Corollary 4.6.9, all but countably many real numbers are regular points of X relative to A. In other words, there exists a countable subset J of R such that each t ∈ Jc is a regular point of the integrable function X relative to the integrable set A. Here Jc denotes the metric complement of the set J in R. 2. Let c > 0 be so large that |f| ≤ c on R. Let δf be a modulus of continuity of the function f. Let ε > 0 be arbitrary. Since X is integrable, there exists a > 0 so large that I|X| − I(|X| ∧ (a − 1)) < ε. Since f is uniformly continuous, there exists a sequence −a = t0 < t1 < · · · < tn = a of points in Jc such that ⋁_{i=1}^{n}(ti − ti−1) < δf(ε). Then
Y ≡ Σ_{i=1}^{n} f(ti)1(t(i−1)<X≤t(i))A
is integrable. Moreover, since 1(|X|≥a)A ≤ |X| − |X| ∧ (a − 1), we have
|f(X)1A − Y| ≤ Σ_{i=1}^{n} |f(X) − f(ti)|1(t(i−1)<X≤t(i))A + c1(|X|≥a)A ≤ ε1A + c(|X| − |X| ∧ (a − 1)),
whence I|f(X)1A − Y| ≤ εμ(A) + cε. Since ε > 0 is arbitrarily small, Theorem 4.5.10 implies that f(X)1A ∈ L. 3. Suppose X ∈ L is bounded. Let b > 0 be such that |X| ≤ b. Define f ∈ Cub(R) by f(x) ≡ (−b) ∨ x ∧ b for each x ∈ R. Then X = f(X) and so, according to the first part of this proposition, X1A ∈ L. The second part of the proposition is also proved.
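The step-function approximation in the proof above can be checked numerically. The following sketch is only an illustration under stated assumptions: a sample average stands in for I, a Gaussian sample stands in for X, and the set A and the function f are arbitrary choices of ours, not objects from the text.

```python
# Approximate f(X)1A in L1 by Y = sum_i f(t_i) 1_{(t_{i-1} < X <= t_i) A},
# with mesh smaller than a modulus of continuity of f, as in the proof of
# Proposition 4.6.12.
import math
import random

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(20_000)]  # stand-in for X
A = lambda x: x > -1.0                                     # an "integrable set"
f = math.tanh                                              # a member of Cub(R)

def I(g):  # sample-average stand-in for the integration I
    return sum(g(x) for x in samples) / len(samples)

a, n = 4.0, 64
ts = [-a + 2 * a * i / n for i in range(n + 1)]            # -a = t0 < ... < tn = a

def Y(x):
    if A(x):
        for i in range(1, n + 1):
            if ts[i - 1] < x <= ts[i]:
                return f(ts[i])
    return 0.0

err = I(lambda x: abs((f(x) if A(x) else 0.0) - Y(x)))
print("I|f(X)1A - Y| ~", err)   # small: mesh 0.125 and tanh is 1-Lipschitz
```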
4.7 Uniform Integrability
In this section, let (Ω,L,I) be a complete integration space. Proposition 4.7.1. Moduli of integrability for integrable functions. Let X ∈ L be arbitrary. Recall Definition 4.6.11. Then the following conditions hold: 1. Let A be an arbitrary integrable set. Then X1A ∈ L. 2. I(|X|1A) → 0 as μ(A) → 0, where A is an arbitrary integrable set. More precisely, there exists an operation δ : (0,∞) → (0,∞) such that I(|X|1A) ≤ ε for each integrable set A with μ(A) < δ(ε), for each ε > 0. 3. I(|X|1(|X|>a)) → 0 as a → ∞. More precisely, suppose I|X| ≤ b for some b > 0, and let the operation δ be as described in Assertion 2. For each ε > 0, if we define η(ε) ≡ b/δ(ε), then I(|X|1(|X|>a)) ≤ ε for each a > η(ε). 4. Suppose an operation η > 0 is such that I(|X|1(|X|>a)) ≤ ε for each a > η(ε), for each ε > 0. Then the operation δ defined by δ(ε) ≡ (ε/2)/η(ε/2) satisfies the conditions in Assertion 2.
Proof. 1. Let n ≥ 1 be arbitrary. Then n1A is integrable. Hence, by Assertion 1 of Proposition 4.3.4, the function |X| ∧ (n1A) is integrable. Now consider each ω ∈ domain(X) ∩ domain(1A). Suppose |X(ω)1A(ω)| ∧ n ≠ |X(ω)| ∧ (n1A(ω)). Then 1A(ω) ≠ 0. Hence 1A(ω) = 1. It follows that |X(ω)| ∧ n ≠ |X(ω)| ∧ n, which is a contradiction. Thus |X(ω)1A(ω)| ∧ n = |X(ω)| ∧ (n1A(ω)) for each ω ∈ domain(X) ∩ domain(1A). In other words, |X1A| ∧ n = |X| ∧ (n1A). We saw earlier that |X| ∧ (n1A) is integrable. Hence |X1A| ∧ n is integrable. Moreover, let n,p ≥ 1 be arbitrary with n > p. Consider each ω ∈ domain(X) ∩ domain(1A). Suppose
|X(ω)1A(ω)| ∧ n − |X(ω)1A(ω)| ∧ p > |X(ω)| ∧ n − |X(ω)| ∧ p.
Then 1A(ω) ≠ 0. Hence 1A(ω) = 1. It follows that
|X(ω)| ∧ n − |X(ω)| ∧ p > |X(ω)| ∧ n − |X(ω)| ∧ p,
which is a contradiction. Thus
|X(ω)1A(ω)| ∧ n − |X(ω)1A(ω)| ∧ p ≤ |X(ω)| ∧ n − |X(ω)| ∧ p
for each ω ∈ domain(X) ∩ domain(1A). In other words, |X1A| ∧ n − |X1A| ∧ p ≤ |X| ∧ n − |X| ∧ p. Consequently, since |X| ∈ L, we have
I(|X1A| ∧ n − |X1A| ∧ p) ≤ I(|X| ∧ n − |X| ∧ p) → 0
as p → ∞. Thus limn→∞ I(|X1A| ∧ n) exists. By the Monotone Convergence Theorem, the limit function |X1A| = limn→∞ |X1A| ∧ n is integrable. Similarly, |X+1A| is integrable, and so is X1A = 2|X+1A| − |X1A|. Assertion 1 is proved. 2. Let ε > 0 be arbitrary. Since X is integrable, there exists a > 0 so large that I|X| − I(|X| ∧ a) < 2−1ε. Define δ(ε) ≡ 2−1ε/a. Now let A be an arbitrary integrable set with μ(A) < δ(ε) ≡ 2−1ε/a. Then, since |X|1A ≤ (|X| − |X| ∧ a)1A + a1A, we have
I(|X|1A) ≤ I|X| − I(|X| ∧ a) + aμ(A) < 2−1ε + a·2−1ε/a = ε.
Assertion 2 is proved. 3. Take any b ≥ I|X|. Define η(ε) ≡ b/δ(ε), where δ is an operation as in Assertion 2. Let a > η(ε) be arbitrary. Define the integrable set A ≡ (|X| > a). Chebychev’s inequality then gives
μ(A) = μ(|X| > a) ≤ I|X|/a ≤ b/a < δ(ε).
Hence I(1A|X|) < ε by Assertion 2. Thus Assertion 3 is verified.
4. Suppose an operation η > 0 is such that I(|X|1(|X|>a)) ≤ ε for each a > η(ε), for each ε > 0. Define the operation δ by δ(ε) ≡ (ε/2)/η(ε/2). Suppose an integrable set A is such that μ(A) < δ(ε) ≡ (ε/2)/η(ε/2). Take any a > η(ε/2). Then
I(|X|1A) ≤ I(a1(A(|X|≤a)) + 1(|X|>a)|X|) ≤ I(a1A + 1(|X|>a)|X|) ≤ aμ(A) + ε/2 ≤ a(ε/2)/η(ε/2) + ε/2.
By taking a arbitrarily close to η(ε/2), we obtain
I(|X|1A) ≤ η(ε/2)(ε/2)/η(ε/2) + ε/2 = ε/2 + ε/2 = ε.
Assertion 4 and the proposition are proved.
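As a small sketch of how the operation δ in Assertion 2 can actually be computed, consider a hypothetical finite integration space; the weights and values below are arbitrary choices of ours, the doubling search for a mirrors the proof, and the exhaustive check confirms the stated guarantee.

```python
# Compute delta(eps) = (eps/2)/a, where a satisfies I|X| - I(|X| ^ a) < eps/2,
# and verify I(|X| 1A) <= eps for every set A with mu(A) < delta(eps).
from itertools import combinations

omega = range(10)
weight = [0.3, 0.2, 0.2, 0.1, 0.1, 0.05, 0.03, 0.01, 0.005, 0.005]
X = [(-1) ** i * (i + 1) for i in omega]      # an integrable function

def I_abs_trunc(a):                           # I(|X| ^ a)
    return sum(min(abs(x), a) * w for x, w in zip(X, weight))

I_abs = I_abs_trunc(float("inf"))             # I|X|

def delta(eps):
    a = 1.0
    while I_abs - I_abs_trunc(a) >= eps / 2:  # a exists since X is integrable
        a *= 2.0
    return eps / (2 * a)

eps = 0.2
d = delta(eps)
for r in range(len(weight) + 1):              # check all integrable sets A
    for A in combinations(omega, r):
        if sum(weight[i] for i in A) < d:
            assert sum(abs(X[i]) * weight[i] for i in A) <= eps
print("delta(eps) =", d, "(guarantee verified on all sufficiently small sets)")
```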
Note that in the proof of Assertion 4 of Proposition 4.7.1, we use a real number a > η(ε) arbitrarily close to η(ε) rather than simply a = η(ε). This ensures that a can be a regular point of |X|, as required in Definition 4.6.11. Definition 4.7.2. Uniform integrability and simple modulus of integrability. A family G of integrable functions is said to be uniformly integrable if for each ε > 0, there exists η(ε) > 0 so large that I(|X|1(|X|>a)) ≤ ε for each a > η(ε), for each X ∈ G. The operation η is then called a simple modulus of integrability of G. Proposition 4.7.1 ensures that each family G consisting of finitely many integrable functions is uniformly integrable. Proposition 4.7.3. Alternative definition of uniform integrability, and modulus of integrability, in the special case of a probability space. Suppose the integration space (Ω,L,I) is such that 1 ∈ L and I1 = 1. Then a family G of integrable r.r.v.’s is uniformly integrable in the sense of Definition 4.7.2 iff (i) there exists b ≥ 0 such that I|X| ≤ b for each X ∈ G, and (ii) for each ε > 0, there exists δ(ε) > 0 such that I|X|1A ≤ ε for each integrable set A with μ(A) < δ(ε), and for each X ∈ G. The operation δ is then called a modulus of integrability of G. Proof. 1. First suppose the family G is uniformly integrable. In other words, for each ε > 0, there exists η(ε) such that I(|X|1(|X|>a)) ≤ ε for each a > η(ε), and for each X ∈ G. Define b ≡ η(1) + 2. Let X ∈ G be arbitrary. Take any a ∈ (η(1),η(1) + 1). Then
I|X| = I(1(|X|>a)|X|) + I(1(|X|≤a)|X|) ≤ 1 + aI1 = 1 + a < 1 + η(1) + 1 ≡ b,
where the second equality follows from the hypothesis that I1 = 1. This verifies Condition (i) of the present proposition. Now let ε > 0 be arbitrary. Define δ(ε) ≡ (ε/2)/η(ε/2). Then Assertion 4 of Proposition 4.7.1 implies that I(|X|1A) ≤ ε for each integrable set A with μ(A) < δ(ε), for each X ∈ G. This verifies Condition (ii) of the present proposition.
2. Conversely, suppose Conditions (i) and (ii) of the present proposition hold. For each ε > 0, define η(ε) ≡ b/δ(ε). Then, according to Assertion 3 of Proposition 4.7.1, we have I(|X|1(|X|>a)) ≤ ε for each a > η(ε), for each X ∈ G. Thus the family G is uniformly integrable in the sense of Definition 4.7.2. Proposition 4.7.4. Dominated uniform integrability. If there exists an integrable function Y such that |X| ≤ Y for each X in a family G of integrable functions, then G is uniformly integrable. Proof. Note that b ≡ I|Y| satisfies Condition (i) of Proposition 4.7.3. Then Assertion 3 of Proposition 4.7.1 guarantees an operation η such that for each ε > 0, we have I(1(|Y|>a)|Y|) ≤ ε for each a > η(ε). Hence, for each ε > 0, for each X ∈ G, and for each a > η(ε), we have
I(1(|X|>a)|X|) ≤ I(1(Y>a)Y) ≤ ε.
Thus η is a common simple modulus of integrability for members X of G. The conditions in Definition 4.7.2 have been verified for the family G to be uniformly integrable. Proposition 4.7.5. Each integrable function is the L1 limit of some sequence of linear combinations of integrable indicators. The following conditions hold: 1. Suppose X is an integrable function with X ≥ 0. Then there exists a sequence (Yk)k=1,2,... such that for each k ≥ 1 we have (i) Yk ≡ Σ_{i=1}^{n(k)−1} tk,i 1(t(k,i)<X≤t(k,i+1)) for some real numbers 0 < tk,1 < · · · < tk,n(k), and (ii) I|X − Yk| → 0 as k → ∞; moreover, Yk ↑ X on the full set D ≡ ⋂_{k=1}^{∞} domain(Yk). 2. Each integrable function X has a representation (Uk)k=1,2,... in L in which each Uk is a linear combination of integrable indicators. 3. If X and X′ are integrable functions, bounded in absolute value, then the product XX′ is integrable. Proof. 1.–3. Consider each ω ∈ D and each k ≥ 1, and suppose Yk(ω) > 0. Then 0 < Yk(ω) = tk,i 1(t(k,i)<X≤t(k,i+1))(ω) for some m ≥ 1 and i ≥ 0 with i ≤ nk − 1, so that tk,i < X(ω) ≤ tk,i+1 and
X(ω) − Yk(ω) ≤ tk,i+1 − tk,i   (4.7.3)
for each k ≥ 1. Next, let ε > 0 be arbitrary. Then either (i′) X(ω) > 0 or (ii′) X(ω) < ε. In case (i′), we have tm,i < X(ω) for some m ≥ 1 and i ≥ 0 with i ≤ nm − 1. Hence Ym(ω) > 0. Therefore Yk(ω) ≥ Ym(ω) > 0 for each k ≥ m. Therefore, for each k ≥ k0 ≡ m ∨ log2(aε−1), inequality 4.7.3 implies that
X(ω) − Yk(ω) ≤ tk,i+1 − tk,i = 2−k a ≤ 2−k(0) a < ε.   (4.7.4)
In case (ii′), we have, trivially, X(ω) − Yk(ω) < ε for each k ≥ k0. Combining, we see that
X(ω) − Yk(ω) < ε in either case, for each k ≥ k0. Since ε > 0 is arbitrarily small, we see that Yk ↑ X on D. Moreover, we see that X − Yk < ε for each k ≥ k0. 4. Consequently, I(X − Yk) ≤ εI1 = ε for each k ≥ k0. Since ε > 0 is arbitrary, we conclude that I|Yk − X| → 0. Assertion 1 is proved. 5. By Assertion 1, we see that there exists a sequence (Yk+)k=1,2,... of linear combinations of mutually exclusive indicators such that I|X+ − Yk+| < 2−k−1 for each k ≥ 1 and such that Yk+ ↑ X+ on D+ ≡ ⋂_{k=1}^{∞} domain(Yk+). By the same token, there exists a sequence (Yk−)k=1,2,... of linear combinations of mutually exclusive indicators such that I|X− − Yk−| < 2−k−1 for each k ≥ 1 and such that Yk− ↑ X− on D− ≡ ⋂_{k=1}^{∞} domain(Yk−). Let k ≥ 1 be arbitrary. Define Zk ≡ Yk+ − Yk−. Then
I|X − Zk| ≤ I|X+ − Yk+| + I|X− − Yk−| < 2−k.
Moreover, we see from the proof of Assertion 1 that Yk+ can be taken to be a linear combination of mutually exclusive indicators of subsets of (X+ > 0). By the same token, Yk− can be taken to be a linear combination of mutually exclusive indicators of subsets of (X− > 0). Since (X+ > 0) and (X− > 0) are disjoint, Zk ≡ Yk+ − Yk− is a linear combination of mutually exclusive indicators. Since Yk+ ↑ X+ on D+ and Yk− ↑ X− on D−, we have Zk → X = X+ − X− on ⋂_{k=1}^{∞} domain(Zk) = D+ ∩ D−. Next, define Z0 ≡ 0 and define Uk ≡ Zk − Zk−1 for each k ≥ 1. Then Σ_{k=1}^{∞} I|Uk| < ∞ and Σ_{k=1}^{∞} Uk = X on ⋂_{k=1}^{∞} domain(Uk). Hence (Uk)k=1,2,... is a representation of X in L. Assertion 2 is proved. 6. Assertion 3 is trivial if X and X′ are integrable indicators. Hence it is also valid when X and X′ are linear combinations of integrable indicators. Now suppose X and X′ are arbitrary integrable functions bounded in absolute value by some a > 0. By Assertion 2, there exist sequences (Zn)n=1,2,... and (Z′n)n=1,2,... of linear combinations of integrable indicators such that I|X − Zn| → 0 and I|X′ − Z′n| → 0. Then, for each n ≥ 1, the product function ZnZ′n is integrable by the second statement in this paragraph. Moreover,
|XX′ − ZnZ′n| ≤ a|X − Zn| + a|X′ − Z′n|
for each n ≥ 1. Therefore, by Theorem 4.5.10, the function XX′ is integrable. Assertion 3 and the proposition are proved.
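The simple-function approximation of Assertion 1 is easy to emulate numerically. In the sketch below the levels are taken equally spaced with mesh 2−k·a, as suggested by the appearance of tk,i+1 − tk,i = 2−k a in the proof; the cap a, the sample, and the sample-average integral are our own illustrative assumptions, not the text's construction.

```python
# Y_k takes the value t_{k,i} = i * 2**-k * a on (t_{k,i} < X <= t_{k,i+1}),
# so 0 <= X - Y_k pointwise and I(X - Y_k) -> 0 as k grows.
import random

random.seed(1)
samples = [abs(random.gauss(0, 1)) for _ in range(20_000)]  # stand-in for X >= 0

def Yk(x, k, a=8.0):
    step = a * 2.0 ** -k
    i = min(int(x // step), int(a / step) - 1)  # dyadic level just below x
    return i * step

for k in [1, 2, 4, 8]:
    err = sum(x - Yk(x, k) for x in samples) / len(samples)
    print(f"k={k}:  I(X - Y_k) ~ {err:.5f}")
```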
4.8 Measurable Function and Measurable Set
In this section, let (Ω,L,I) be a complete integration space, and let (S,d) be a complete metric space with a fixed reference point x◦ ∈ S. In the case where S = R, it is understood that d is the Euclidean metric and that x◦ = 0. Recall that, as an abbreviation, we write AB ≡ A ∩ B for subsets A and B of Ω. Recall from the notations and conventions described in the Introduction that if X is a real-valued function on Ω and if t ∈ R, then we use the abbreviation (t ≤ X) for the subset {ω ∈ domain(X) : t ≤ X(ω)}. Similarly, “≤” may be replaced by “≥,” “<,” “>,” or “=.” Definition 4.8.1. Measurable function. A function X from Ω to S is said to be measurable (relative to (Ω,L,I)) if, for each integrable set A and each f ∈ Cub(S), we have (i) f(X)1A ∈ L, and (ii) μ((d(x◦,X) > a)A) → 0 as a → ∞ while a remains in the metric complement Jc of J, for some countable subset J of R. In the special case where the constant function 1 is integrable, Conditions (i) and (ii) reduce to (i′) f(X) ∈ L and (ii′) μ(d(x◦,X) > a) → 0 as a → ∞ while a ∈ Jc. It is obvious that if Condition (ii) holds for one point x◦ ∈ S, then it holds for any point x◦ ∈ S. The next lemma shows that, given Condition (i) and given an arbitrary integrable set A, the measure in Condition (ii) is well defined for all but countably many a ∈ R. Hence Condition (ii) makes sense. Let X be a function defined a.e. on a complete integration space (Ω,L,I). Then it can be proved that X is measurable iff it satisfies Condition (*): for each integrable set A and ε > 0, there exist an integrable set B and an integrable function Y such that B ⊂ A, μ(ABc) < ε, and |X − Y| < ε on B. Condition (*) is used as the definition of a measurable function in [Bishop and Bridges 1985]. Thus Definition 4.8.1 of a measurable function, which we find more convenient, is equivalent to the one in [Bishop and Bridges 1985]. Definition 4.8.2. Measurable set and measure-theoretic complement. Suppose a subset B of Ω is such that B = (X = 1) for some real-valued measurable indicator function X. Then we say that B is a measurable set, and call the function 1B ≡ X the measurable indicator of B. The set Bc ≡ (X = 0) is then called the measure-theoretic complement of B. Lemma 4.8.4 proves that 1B and Bc are uniquely defined relative to a.e. equality.
Lemma 4.8.3. Integrability of some basic sets. Let X be a function from Ω to S that satisfies Condition (i) in Definition 4.8.1. Let A be an arbitrary integrable set. Then the set (d(x◦,X) > a)A is integrable for all but countably many a ∈ R. Thus μ(d(x◦,X) > a)A is well defined for all but countably many a ∈ R. Proof. Let n ≥ 0 be arbitrary. Then hn ≡ 1 ∧ (n + 1 − d(x◦,·))+ ∈ Cub(S) and so hn(X)1A ∈ L by hypothesis. Hence all but countably many b ∈ (0,1) are regular points of hn(X)1A. Therefore the set (d(x◦,X) > n + 1 − b)A = (hn(X)1A < b)A is integrable for all but countably many b ∈ (0,1). Equivalently, (d(x◦,X) > a)A is integrable for all but countably many a ∈ (n,n + 1). Since n ≥ 0 is arbitrary, we see that (d(x◦,X) > a)A is integrable for all but countably many points a > 0. For each a ≤ 0, the set (d(x◦,X) > a)A = A is integrable by hypothesis. Lemma 4.8.4. The indicator and measure-theoretic complement of a measurable set are well defined relative to a.e. equality. Let B,C be arbitrary measurable subsets of Ω. Let X,Y be measurable indicators of B,C, respectively. 1. B = C a.e. iff X = Y a.e. Hence 1B is a well-defined measurable indicator relative to a.e. equality. 2. If B = C a.e., then (X = 0) = (Y = 0) a.e. Hence Bc is a well-defined subset relative to equality a.e. Proof. 1. Suppose B = C a.e. Then BD = CD for some full set D. Let A be an arbitrary integrable set. Then X1A and Y1A are integrable indicators. Hence D′ ≡ D ∩ domain(X1A) ∩ domain(Y1A) is a full set. Note that, on the full set D′, the functions X and Y can have only two possible values, 1 or 0. Moreover, D′(X = 1) = D′B = D′C = D′(Y = 1). Hence X = Y on the full set D′. Thus X = Y a.e. Conversely, suppose X = Y a.e. Then X = Y on some full set D. Hence BD = (X = 1)D = (Y = 1)D = CD. In short, B = C a.e. Assertion 1 is proved. 2. Suppose B = C a.e. Then Assertion 1 implies that X = Y a.e., say X = Y on some full set D. Then (X = 0)D = (Y = 0)D. Thus (X = 0) = (Y = 0) a.e. The lemma is proved. Definition 4.8.5. Convention: unless otherwise specified, equality of measurable functions will mean equality a.e.; equality of measurable sets will mean equality a.e. Let (Ω,L,I) be an arbitrary complete integration space. Let X, Y be measurable functions with values in some metric space (S,d). Let A,B ⊂ Ω be arbitrary measurable sets. Henceforth, unless otherwise specified, the statement X = Y will mean X = Y a.e. Similarly, the statements X ≤ Y, X < Y, X ≥ Y, and X > Y will mean X ≤ Y a.e., X < Y a.e., X ≥ Y a.e., and
X > Y a.e., respectively. Likewise, the statements A = B, A ⊂ B, and A ⊃ B will mean A = B a.e., A ⊂ B a.e., and A ⊃ B a.e., respectively. The next proposition gives an equivalent condition to Condition (ii) in Definition 4.8.1. Proposition 4.8.6. Alternative definition of measurable functions. For each n ≥ 0, define the function hn ≡ 1 ∧ (n + 1 − d(x◦,·))+ ∈ Cub(S). Then a function X from (Ω,L,I) to the complete metric space (S,d) is a measurable function iff, for each integrable set A and for each f ∈ Cub(S), we have (i′) f(X)1A ∈ L, and (ii′) Ihn(X)1A → μ(A) as n → ∞. In the anticipated case where (Ω,L,I) is a probability space, where μ(Ω) = 1, Conditions (i′) and (ii′) can be replaced by (i″) f(X) ∈ L and (ii″) Ihn(X) → 1 as n → ∞. Proof. 1. Suppose Conditions (i′) and (ii′) hold. We need to verify that the function X is measurable. Let A be an arbitrary integrable set. Note that since hn ∈ Cub(S) for each n ≥ 0, we have hn(X)1A ∈ L by Condition (i′). Then, for each n ≥ 1 and a > n + 1, we have
Ihn(X)1A ≤ μ(d(x◦,X) ≤ a)A ≤ μ(A).
Letting n → ∞, Condition (ii′) and the last displayed inequality imply that μ(d(x◦,X) ≤ a)A → μ(A). Equivalently, μ(d(x◦,X) > a)A → 0 as a → ∞. The conditions in Definition 4.8.1 are satisfied for X to be measurable. 2. Conversely, suppose X is measurable. Then Definition 4.8.1 of measurable functions implies Condition (i′) of the present proposition. It implies also that μ(d(x◦,X) ≤ a)A → μ(A) as a → ∞. At the same time, for each a > 0 and n > a, we have
μ(d(x◦,X) ≤ a)A ≤ Ihn(X)1A ≤ μ(A).
Letting a → ∞, we see that Ihn(X)1A → μ(A). Thus Condition (ii′) of the present proposition is also proved. Proposition 4.8.7. Basics of a measurable function. Let X : (Ω,L,I) → S be an arbitrary measurable function with values in the complete metric space (S,d). 1. Then domain(X) is a full set. In other words, X is defined a.e. In particular, if A is a measurable set, then A ∪ Ac is a full set. 2. Each function Y : (Ω,L,I) → S that is equal a.e. to the measurable function X is itself measurable. 3. Each integrable function Z : (Ω,L,I) → R is a real-valued measurable function. Each integrable set is measurable. Proof. 1. Let X be an arbitrary measurable function, and let A be an arbitrary integrable set. Let f ≡ 0 denote the constant 0 function. Then f ∈ Cub(S). Hence, by Condition (i) in Definition 4.8.1, we have f(X)1A ∈ L. Consequently, D ≡ domain(f(X)1A) is a full set. Since domain(X) = domain(f(X)) ⊃ D, we see that domain(X) is a full set.
Next, let A be an arbitrary measurable set. In other words, its indicator 1A is measurable. Hence the set A ∪ Ac = domain(1A) is a full set according to the previous paragraph. Assertion 1 is proved. 2. Now suppose Y is a function on Ω with values in S such that Y = X a.e., where X is some measurable function. Let A be any integrable set. Let f ∈ Cub(S) be arbitrary. Then, by Condition (i) in Definition 4.8.1, we have f(X)1A ∈ L. Moreover, because Y = X a.e., we have f(Y)1A = f(X)1A a.e. Consequently, f(Y)1A ∈ L. Again because Y = X a.e.,
μ(d(x◦,Y) > a)A = μ(d(x◦,X) > a)A → 0
as a → ∞. Thus the conditions in Definition 4.8.1 are verified for Y to be measurable. 3. Next, let Z be any integrable function. Let f ∈ Cub(R) be arbitrary and let A be an arbitrary integrable set. By Proposition 4.6.12, we have f(Z)1A ∈ L, which establishes Condition (i) of Definition 4.8.1 for the function Z. By Chebychev’s inequality,
μ(|Z| > a)A ≤ μ(|Z| > a) ≤ a−1I|Z| → 0
as a → ∞. Condition (ii) of Definition 4.8.1 follows. Hence Z is measurable. In particular, 1A and A are measurable. The next proposition will be used repeatedly to construct measurable functions from given ones. Proposition 4.8.8. Construction of a measurable function from pieces of given measurable functions on measurable sets in a disjoint union. Let (S,d) be a complete metric space. Let (Xi,Ai)i=1,2,... be a sequence where, for each i,j ≥ 1, Xi is a measurable function on (Ω,L,I) with values in S, and where (i) Ai is a measurable subset of Ω, (ii) if i ≠ j, then AiAj = ∅, (iii) ⋃_{k=1}^{∞} Ak is a full set, and (iv) Σ_{k=1}^{∞} μ(AkA) = μ(A) for each integrable set A. Define a function X by
domain(X) ≡ ⋃_{i=1}^{∞} domain(Xi)Ai
and by X ≡ Xi on domain(Xi)Ai, for each i ≥ 1. Then X is a measurable function on Ω with values in S. The same conclusion holds for a finite sequence (Xi,Ai)i=1,...,n. Proof. We will give the proof for the infinite sequence only. For each n ≥ 1, define hn ≡ 1 ∧ (n + 1 − d(x◦,·))+ ∈ Cub(S). Let f ∈ Cub(S) be arbitrary, with |f| ≤ c on S for some c > 0. Let A be an arbitrary integrable set. Since
Σ_{i=1}^{∞} I|f(Xi)1A(i)A| ≤ c Σ_{i=1}^{∞} μ(AiA) < ∞,
the function Y ≡ Σ_{i=1}^{∞} f(Xi)1A(i)A is integrable. At the same time, f(X)1A = Y on the full set
(⋃_{i=1}^{∞} Ai) ∩ (⋂_{i=1}^{∞} domain(Xi)).
Hence f(X)1A is integrable. In particular, hn(X)1A is integrable for each n ≥ 1. Moreover, Ihn(Xi)1A(i)A ↑ μ(AiA) as n → ∞, for each i ≥ 1, according to Condition (ii′) of Proposition 4.8.6. Consequently,
Ihn(X)1A = Σ_{i=1}^{∞} Ihn(Xi)1A(i)A ↑ Σ_{i=1}^{∞} μ(AiA) = μ(A).
Hence, by Proposition 4.8.6, X is a measurable function.
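A toy version of this gluing, on a hypothetical finite space with three pieces, shows the two quantities that the proof equates; all names and numerical values below are our own illustrative choices, not objects from the text.

```python
# Glue X from (X_i, A_i): the A_i are pairwise disjoint, cover omega, and
# I f(X) 1A splits as the sum of the integrals over the pieces.
omega = range(6)
weight = [0.25, 0.25, 0.2, 0.1, 0.1, 0.1]
pieces = {1: {0, 1}, 2: {2, 3}, 3: {4, 5}}         # A_1, A_2, A_3
Xi = {1: lambda w: 0.0, 2: lambda w: float(w), 3: lambda w: -1.0}

def X(w):                                           # X = X_i on A_i
    for i, Ai in pieces.items():
        if w in Ai:
            return Xi[i](w)

def I(g):                                           # integral against the weights
    return sum(g(w) * weight[w] for w in omega)

f = lambda x: x * x                                 # a bounded continuous test f
lhs = I(lambda w: f(X(w)))                          # I f(X), here with A = omega
rhs = sum(I(lambda w, i=i: f(Xi[i](w)) if w in pieces[i] else 0.0)
          for i in pieces)
print(lhs, rhs)                                     # equal, as in the proof
```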
We next provide a metric space lemma. Lemma 4.8.9. Sufficient condition for uniform continuity on a metric space. Let (S,d) be an arbitrary metric space. Let A,B be arbitrary subsets of S, and let a > 0 be arbitrary such that for each x ∈ S, we have either (i) (d(·,x) < a) ⊂ A or (ii) (d(·,x) < a) ⊂ B. Suppose λ : S → R is a function with domain(λ) = S such that λ is uniformly continuous on each of A and B. Then λ is uniformly continuous on S. Proof. Let ε > 0 be arbitrary. Since λ is uniformly continuous on each of A and B, there exists δ0 > 0 so small that |λ(x) − λ(y)| < ε for each x,y with d(x,y) < δ0, provided that either x,y ∈ A or x,y ∈ B. Let δ ≡ a ∧ δ0. Consider each x,y ∈ S with d(x,y) < δ. By hypothesis, either Condition (i) or Condition (ii) holds. Assume that Condition (i) holds. Then, since d(x,x) = 0 < a and d(y,x) < δ ≤ a, we have x,y ∈ A. Hence, since d(y,x) < δ ≤ δ0, we have |λ(x) − λ(y)| < ε. Similarly, if Condition (ii) holds, then |λ(x) − λ(y)| < ε, where x,y ∈ S are arbitrary with d(x,y) < δ. Thus λ is uniformly continuous on S. Proposition 4.8.10. A continuous function of a measurable function is measurable. Let (S,d) and (S′,d′) be complete metric spaces. Let X : (Ω,L,I) → (S,d) be an arbitrary measurable function. Suppose a function f : (S,d) → (S′,d′) with domain(f) = S is uniformly continuous and bounded on each bounded subset of S. Then the composite function f(X) ≡ f ◦ X : (Ω,L,I) → (S′,d′) is measurable. In particular, d(x,X) is a real-valued measurable function for each x ∈ S. Proof. We need to prove that Y ≡ f(X) is measurable. 1. To that end, let g ∈ Cub(S′) be arbitrary, with |g| ≤ b for some b > 0. Consider an arbitrary integrable set A and an arbitrary ε > 0. Since X is measurable by hypothesis, there exists a > 1 so large that μ(B) < ε, where B ≡ (d(x◦,X) > a − 1)A. Define h ≡ 1 ∧ (a − d(x◦,·))+ ∈ Cub(S).
The function f is, by hypothesis, uniformly continuous on the bounded set G ≡ (d(·,x◦) < 2 + a). By assumption, g is uniformly continuous. Therefore (g ◦ f) and (g ◦ f)h are uniformly continuous on G. At the same time, h = 0 on H ≡ (d(·,x◦) > a). Hence (g ◦ f)h = 0 on H. Thus (g ◦ f)h is uniformly continuous on H. Now consider each x ∈ S. Either (i) d(x,x◦) < a + 3/2 or (ii) d(x,x◦) > a + 1/2. In case (i), we have (d(·,x) < 1/2) ⊂ (d(·,x◦) < 2 + a) ≡ G. In case (ii), we have (d(·,x) < 1/2) ⊂ (d(·,x◦) > a) ≡ H. Combining, Lemma 4.8.9 implies that (g ◦ f)h is uniformly continuous on S. Moreover, since (g ◦ f)h is bounded on G by hypothesis, and is equal to 0 on H, it is bounded on S. In short, (g ◦ f)h ∈ Cub(S). Since X is measurable, the function g(Y)h(X)1A = (g ◦ f)(X)h(X)1A is integrable. At the same time,
|g(Y)1A − g(Y)h(X)1A| ≤ b(1 − h(X))1A,   (4.8.1)
where I(1 − h(X))1A ≤ μ(d(x◦,X) > a − 1)A = μ(B) < ε. Since ε > 0 is arbitrary, by Theorem 4.5.10, inequality 4.8.1 implies that the function g(Y)1A is integrable. We have verified Condition (i) of Definition 4.8.1 for the function Y. 2. Now let c > a be arbitrary. By hypothesis, there exists c′ > 0 so large that d′(x◦′,f) ≤ c′ on (d(x◦,·) < c). Then, for each x ∈ S with d′(x◦′,f(x)) > c′, we have d(x◦,x) ≥ c. Hence
(d′(x◦′,f(X)) > c′) ⊂ (d(x◦,X) ≥ c).
Therefore
μ(d′(x◦′,Y) > c′)A ≡ μ(d′(x◦′,f(X)) > c′)A ≤ μ(d(x◦,X) ≥ c)A ≤ μ(d(x◦,X) > a − 1)A < ε.
By Lemma 4.8.3, the left-hand side of the last displayed inequality is well defined for all but countably many c′ > 0. Since ε > 0 is arbitrary, we conclude that μ(d′(x◦′,Y) > c′)A → 0 as c′ → ∞. Thus we have verified all the conditions of Definition 4.8.1 for the function Y to be measurable. In other words, f(X) is measurable. Corollary 4.8.11. Condition for measurability of the identity function, and of a continuous function of a measurable function. Let (S,d) be a complete metric space. Suppose (S,Cub(S),I) is an integration space, with completion (S,L,I). Define the identity function X : S → S by X(x) ≡ x for each x ∈ S. Define hk ≡ 1 ∧ (k + 1 − d(x◦,·))+ ∈ Cub(S) for each k ≥ 1. Suppose Ihk1A ↑ μ(A) for each integrable set A. Then the following conditions hold: 1. The identity function X : (S,L,I) → (S,d) is a measurable function.
2. Let (S′,d′) be a second complete metric space. Suppose a function f : (S,d) → (S′,d′) with domain(f) = S is uniformly continuous and bounded on each bounded subset of S. Then the function f : (S,L,I) → (S′,d′) is measurable. In particular, d(x,·) is a real-valued measurable function for each x ∈ S. Proof. 1. Let f ∈ Cub(S) be arbitrary, and let A be an arbitrary integrable set. Then f(X) ≡ f ∈ Cub(S) ⊂ L. Hence f(X)1A ∈ L. Moreover, Ihk(X)1A = Ihk1A ↑ μ(A) by hypothesis. Hence X is measurable according to Proposition 4.8.6. 2. The conditions in the hypothesis of Proposition 4.8.10 are satisfied by the functions X : (S,L,I) → (S,d) and f : (S,d) → (S′,d′). Accordingly, the function f ≡ f(X) is measurable. The next proposition says that in the case where (S,d) is locally compact, the conditions for measurability in Definition 4.8.1 can be relaxed by replacing Cub(S) with the subfamily C(S). Proposition 4.8.12. Sufficient condition for measurability in case S is locally compact. Let (S,d) be a locally compact metric space. Define hn ≡ 1 ∧ (n + 1 − d(x◦,·))+ ∈ C(S) for each n ≥ 1. Let X be a function from (Ω,L,I) to (S,d) such that f(X)1A ∈ L for each integrable set A and each f ∈ C(S). Then the following conditions hold: 1. If Ihn(X)1A ↑ μ(A) for each integrable set A, then X is measurable. 2. If μ(d(x◦,X) > a)A → 0 as a → ∞ for each integrable set A, then X is measurable. Proof. Let n ≥ 1 be arbitrary. Note that the function hn has the bounded set (d(x◦,·) < n + 1) as support, which is contained in some compact subset because the metric space (S,d) is locally compact. Hence hn ∈ C(S). 1. Now let A be an arbitrary integrable set. Let g ∈ Cub(S) be arbitrary, with |g| ≤ c on S. Since hn ∈ C(S), we have hn, hng ∈ C(S). Hence, by the hypothesis of the proposition, we have hn(X)g(X)1A ∈ L. Moreover, Ihn(X)1A ↑ μ(A) by the hypothesis of Assertion 1. Hence
I|g(X)1A − hn(X)g(X)1A| ≤ cI(1A − hn(X)1A) = c(μ(A) − Ihn(X)1A) → 0   (4.8.2)
as n → ∞. Therefore Theorem 4.5.10 is applicable; it implies that g(X)1A ∈ L, where g ∈ Cub(S) is arbitrary. Thus the conditions in Proposition 4.8.6 are satisfied for X to be measurable. Assertion 1 is proved. 2. For each a > 0 and for each n > a,
0 ≤ μ(A) − Ihn(X)1A = I(1 − hn(X))1A ≤ μ(d(x◦,X) > a)A,
which, by the hypothesis of Assertion 2, converges to 0 as a → ∞. Hence, by Assertion 1, the function X is measurable. Assertion 2 and the proposition are proved.
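The truncation test Ihn(X)1A ↑ μ(A) is easy to visualize numerically. In the sketch below, S = R with x◦ = 0, A = Ω with μ(Ω) = 1, and the heavy-tailed sample and the sample-average integral are hypothetical stand-ins of ours.

```python
# With h_n = 1 ^ (n + 1 - d(x0, .))+, measurability of X is reflected in
# I h_n(X) increasing to mu(A) = 1 as n grows (Propositions 4.8.6/4.8.12).
import random

random.seed(2)
X = [random.paretovariate(1.5) for _ in range(100_000)]  # d(x0, X) = |X|

def h(n, x):
    return min(1.0, max(0.0, n + 1 - abs(x)))

for n in [1, 2, 5, 10, 50]:
    Ihn = sum(h(n, x) for x in X) / len(X)
    print(f"n={n}:  I h_n(X) = {Ihn:.4f}  (increasing toward mu(A) = 1)")
```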
Definition 4.8.13. Regular and continuity points of a measurable function relative to each integrable set. Suppose X is a real-valued measurable function on (Ω,L,I). We say that t ∈ R is a regular point of X relative to an integrable set A if (i) there exists a sequence (sn)n=1,2,... of real numbers decreasing to t such that (sn < X)A is an integrable set for each n ≥ 1 and such that limn→∞ μ(sn < X)A exists, and (ii) there exists a sequence (rn)n=1,2,... of real numbers increasing to t such that (rn < X)A is integrable for each n ≥ 1 and such that limn→∞ μ(rn < X)A exists. If, in addition, the two limits in (i) and (ii) are equal, then we call t a continuity point of X relative to A. If a real number t is a regular point of X relative to each integrable set A, then we call t a regular point of X. If a real number t is a continuity point of X relative to each integrable set A, then we say t is a continuity point of X. The next proposition shows that regular points and continuity points of a real-valued measurable function are abundant, and that they inherit the properties of regular points and continuity points of integrable functions. Proposition 4.8.14. All but countably many real numbers are continuity points of a real-valued measurable function, relative to each given integrable set A. Let X be a real-valued measurable function on (Ω,L,I). Let A be an integrable set and let t be a regular point of X relative to A. Then the following conditions hold: 1. All but countably many u ∈ R are continuity points of X relative to A. Hence all but countably many u ∈ R are regular points of X relative to A. 2. Ac is a measurable set. 3. The sets A(t < X), A(t ≤ X), A(X < t), A(X ≤ t), and A(X = t) are integrable sets. 4. (X ≤ t)A = A((t < X)A)c and (t < X)A = A((X ≤ t)A)c. 5. (X < t)A = A((t ≤ X)A)c and (t ≤ X)A = A((X < t)A)c. 6. For a.e. ω ∈ A, we have t < X(ω), t = X(ω), or t > X(ω). 7. Let ε > 0 be arbitrary. There exists δ > 0 such that if r ∈ (t − δ,t] and if A(X < r) is integrable, then μ(A(X < t)) − μ(A(X < r)) < ε. Similarly, there exists δ > 0 such that if s ∈ [t,t + δ) and if A(X ≤ s) is integrable, then μ(A(X ≤ s)) − μ(A(X ≤ t)) < ε. 8. If t is a continuity point of X relative to A, then μ((t < X)A) = μ((t ≤ X)A). Proof. In the special case where X is an integrable function, Assertions 1, 3, 4, 5, 6, 7, and 8 of the present proposition are restatements of Assertions 3, 4, 5, 6, 7, 8, and 9, respectively, of Corollary 4.6.9. 1. In general, suppose X is a real-valued measurable function. Let n ≥ 1 be arbitrary. Then, by Definition 4.8.1, ((−n) ∨ X ∧ n)1A ∈ L. Hence all but countably many u ∈ (−n,n) are continuity points of the integrable function ((−n) ∨ X ∧ n)1A
relative to A. At the same time, for each t ∈ (−n,n), we have (X < t)A = (((−n) ∨ X ∧ n)1A < t)A. Hence a point u ∈ (−n,n) is a continuity point of ((−n) ∨ X ∧ n)1A relative to A iff it is a continuity point of X relative to A. Combining, we see that all but countably many points in the interval (−n,n) are continuity points of X relative to A. Therefore all but countably many points in R are continuity points of X relative to A. This proves Assertion 1 of the proposition. 2. To prove that Ac is a measurable set, consider the function 1Ac : (Ω,L,I) → R. Let x◦ ≡ 0 be the reference point in (R,d), where d is the Euclidean metric. Let B be an arbitrary integrable set and let f ∈ Cub(R) be arbitrary. Then the sets B and BA are integrable. Hence, by Assertion 3 of Proposition 4.5.7, the set B(BA)c is integrable. Since BAc = B(BA)c, the set BAc is integrable. Consequently, the function
f(1Ac)1B = f(1)1BAc + f(0)1BA
is integrable. At the same time, we have (|1Ac| > a) = ∅ for each a > 1. Hence, trivially, μ(|1Ac| > a)B → 0 as a → ∞. Thus the conditions in Definition 4.8.1 are satisfied for the function 1Ac to be measurable, and the set Ac is measurable. Assertion 2 is proved. 3. The remaining assertions are proved similarly to Step 1 – that is, by reducing the measurable function X to the integrable functions ((−n) ∨ X ∧ n)1A and then applying Corollary 4.6.9. Let X be an arbitrary real-valued measurable function. We have proved that, given each integrable set A, there exists a countable subset JA of R such that each point t in the metric complement JA,c is a regular point of X relative to A. In the special case where the integration space Ω is the union of a sequence (Ak)k=1,2,... of integrable sets, it follows that each point t in the metric complement (⋃_{k=1}^{∞} JA(k))c is a regular point of X relative to Ak, for each k ≥ 1. We will show that, as a consequence, all but countably many points t ∈ R are regular points of the measurable function X. The following definition and proposition will make this precise. Definition 4.8.15. Finite or σ-finite integration space. The complete integration space (Ω,L,I) is said to be finite if the constant function 1 is integrable. It is said to be sigma finite, or σ-finite, if there exists a sequence (Ak)k=1,2,... of integrable sets with positive measures such that (i) Ak ⊂ Ak+1 for k = 1,2,..., (ii) ⋃_{k=1}^{∞} Ak is a full set, and (iii) for any integrable set A we have μ(AkA) → μ(A). The sequence (Ak)k=1,2,... is then called an I-basis for (Ω,L,I). If (Ω,L,I) is finite, then it is σ-finite, with an I-basis given by (Ak)k=1,2,..., where Ak ≡ Ω for each k ≥ 1. An example is where (S,Cub(S),I) is an integration space with completion (S,L,I). Then the constant function 1 is integrable, and so (S,L,I) is finite. The next lemma provides other examples that include the Lebesgue integration space (R,L,∫·dx).
Lemma 4.8.16. Completion of an integration on a locally compact metric space results in a σ-finite integration space. Suppose (S,d) is a locally compact metric space. Let (S,L,I) be the completion of some integration space (S,C(S),I). Then (Ω,L,I) ≡ (S,L,I) is σ-finite. Specifically, there exists an increasing sequence (ak)k=1,2,... of positive real numbers with ak ↑ ∞ such that (Ak)k=1,2,... is an I-basis for (S,L,I), where Ak ≡ (d(x◦,·) ≤ ak) for each k ≥ 1. Proof. Consider each k ≥ 1. Define Xk ≡ 1 ∧ (k + 1 − d(x◦,·))+ ∈ C(S) ⊂ L. Let c ∈ (0,1) be arbitrary and let ak ≡ k + 1 − c. Then the set Ak ≡ (d(x◦,·) ≤ ak) = (Xk ≥ c) is integrable. Conditions (i) and (ii) of Definition 4.8.15 are easily verified. For Condition (iii) of Definition 4.8.15, consider any integrable set A. According to Assertion 2 of Corollary 4.8.11, the real-valued function d(x◦,·) is measurable on (Ω,L,I) ≡ (S,L,I). Hence μA − μAkA = μ(d(x◦,·) > ak)A → 0. Thus Condition (iii) in Definition 4.8.15 is also verified for (Ak)k=1,2,... to be an I-basis. Proposition 4.8.17. In the case of a σ-finite integration space, all but countably many points are continuity points of an arbitrary given real-valued measurable function. Suppose X is a real-valued measurable function on a σ-finite integration space (Ω,L,I). Then all but countably many real numbers t are continuity points, and hence regular points, of X. Proof. Let (Ak)k=1,2,... be an I-basis for (Ω,L,I). According to Proposition 4.8.14, for each k there exists a countable subset Jk of R such that if t ∈ (Jk)c, where (Jk)c stands for the metric complement of Jk in R, then t is a continuity point of X relative to Ak. Define J ≡ ⋃_{k=1}^{∞} Jk. Consider each t ∈ Jc. Let the integrable set A be arbitrary. According to Condition (iii) in Definition 4.8.15, we can select a subsequence (Ak(n))n=1,2,... of (Ak) such that μ(A) − μ(AAk(n)) < 1/n for each n ≥ 1. Let n ≥ 1 be arbitrary. Write Bn ≡ Ak(n). Then t ∈ (Jk(n))c, and so t is a continuity point of X relative to Bn. Consequently, according to Proposition 4.8.14, the sets (X < t)Bn and (X ≤ t)Bn are integrable, with μ(X < t)Bn = μ(X ≤ t)Bn. Furthermore, according to Assertion 7 of Proposition 4.8.14, there exists δn > 0 such that (i) if r ∈ (t − δn,t] and if (X < r)Bn is integrable, then μ(X < t)Bn − μ(X < r)Bn < 1/n, and (ii) if s ∈ [t,t + δn) and if (X ≤ s)Bn is integrable, then μ(X ≤ s)Bn − μ(X ≤ t)Bn < 1/n. Let r0 ≡ t − 1 and s0 ≡ t + 1. Inductively, we can select rn ∈ (t − δn,t) ∩ (rn−1,t) ∩ (t − 1/n, t) such that rn is a regular point of X relative to both Bn and A. Similarly, we can select sn ∈ (t,t + δn) ∩ (t,sn−1) ∩ (t, t + 1/n) such that sn is a regular point of X relative to both Bn and A. Then, for each n ≥ 1, we have
μ(rn < X)A − μ(sn < X)A = μ((rn < X)(X ≤ sn)A)
≤ μ((rn < X)(X ≤ sn)Bn) + μ(ABnc)
= μ(X ≤ sn)Bn − μ(X ≤ rn)Bn + μ(A) − μ(ABn)
≤ (μ(X ≤ t)Bn + 1/n) − (μ(X < t)Bn − 1/n) + (μ(A) − μ(AAk(n)))
≤ 1/n + 1/n + 1/n.   (4.8.3)
Since the sequence (μ(rn < X)A) is nonincreasing and the sequence (μ(sn < X)A) is nondecreasing, inequality 4.8.3 implies that both sequences converge, and to the same limit. By Definition 4.8.13, t is a continuity point of X relative to A, where t ∈ Jc is arbitrary. The proposition is proved. We now expand the convention in Definition 4.6.11 to cover measurable functions. Definition 4.8.18. Convention regarding regular points of measurable functions. Let X be a real-valued measurable function, and let A be an integrable set. Henceforth, when the integrability of the set (X < t)A or (X ≤ t)A is required in a discussion, for some t ∈ R, then it is understood that the real number t has been chosen from the regular points of the measurable function X relative to the given integrable set A. Furthermore, if (Ω,L,I) is a σ-finite integration space, when the measurability of the set (X < t) or (X ≤ t) is required in a discussion, for some t ∈ R, then it is understood that the real number t has been chosen from the regular points of the measurable function X. Corollary 4.8.19. Properties of regular points. Let X be a real-valued measurable function on a σ-finite integration space (Ω,L,I), and let t be a regular point of X. Then
(X ≤ t) = (t < X)c   (4.8.4)
and (t < X) = (X ≤ t)c are measurable sets. Similarly, (X < t) = (t ≤ X)c and (t ≤ X) = (X < t)c are measurable sets. According to the convention in Definition 4.8.5, the equalities here are understood to be a.e. equalities. Proof. We will prove only equality 4.8.4, the proofs of the rest being similar. Define an indicator function Y, with domain(Y) = (X ≤ t) ∪ (t < X), by Y = 1 on (X ≤ t) and Y = 0 on (t < X). It suffices to show that Y satisfies Conditions (i) and (ii) of Definition 4.8.1 for a measurable function. To that end, consider an arbitrary f ∈ Cub(R) and an arbitrary integrable subset A. By hypothesis, and by Definition 4.8.18, t is a regular point of X relative to A. Moreover,
f(Y)1A = f(1)1(X≤t)A + f(0)1(t<X)A ∈ L.
Thus Condition (i) of Definition 4.8.1 is satisfied. Furthermore, the set (|Y| > a) is empty for each a > 1. Hence, trivially, μ(|Y| > a)A → 0 as a → ∞, thereby establishing Condition (ii) of Definition 4.8.1. Consequently, Y is a measurable indicator for (X ≤ t), and (t < X) = (Y = 0) = (X ≤ t)c. This proves equality 4.8.4. Proposition 4.8.20. A vector of measurable functions with values in locally compact metric spaces constitutes a measurable function with values in the product metric space. Let (Ω,L,I) be a complete integration space. 1. Let n ≥ 2 be arbitrary. For each i = 1,...,n, let (S(i),d(i)) be an arbitrary locally compact metric space, with some arbitrary but fixed reference point x◦(i) ∈ S(i), and let X(i) : (Ω,L,I) → (S(i),d(i)) be an arbitrary measurable function. Let (Ŝ,d̂) ≡ Π_{i=1}^{n} (S(i),d(i)) be the product locally compact metric space. Define the function X : (Ω,L,I) → (Ŝ,d̂) by domain(X) ≡ ⋂_{i=1}^{n} domain(X(i)) and by X(ω) ≡ (X(1)(ω),...,X(n)(ω)) ∈ Ŝ for each ω ∈ domain(X). Then X is a measurable function. 2. In addition, let (S,d) be an arbitrary complete metric space, and let g : (Ŝ,d̂) → (S,d) be an arbitrary function that is uniformly continuous on compact subsets of (Ŝ,d̂). Then
g(X) ≡ g ◦ X : (Ω,L,I) → (S,d)
is a measurable function. 3. As a special case of Assertion 2, let (S,d) be an arbitrary locally compact metric space. Let X(1),X(2) : (Ω,L,I) → (S,d) be arbitrary measurable functions. Then the function d(X(1),X(2)) : (Ω,L,I) → R is measurable. Proof. 1. Let i = 1,...,n be arbitrary. Let ξ(i) ≡ (A(i,k))k=1,2,... be an arbitrary but fixed binary approximation of the locally compact metric space (S(i),d(i)) relative to the reference point x◦(i) ∈ S(i), in the sense of Definition 3.2.1. Let
π(i) ≡ ({g(i)k,x(i) : xi ∈ A(i,k)})k=1,2,...
be the partition of unity of (S(i),d(i)) relative to the binary approximation ξ(i), in the sense of Definition 3.3.4. Then, by Definition 3.2.1, we have
(d(i)(·,x◦(i)) ≤ 2k) ⊂ ⋃_{x(i)∈A(i,k)} (d(i)(·,xi) ≤ 2−k)   (4.8.5)
for each k ≥ 1. Moreover, by Proposition 3.3.5, g(i)k,x(i) ∈ C(S(i),d(i)) has values in [0,1] and has support (d(i)(·,xi) ≤ 2−k+1), for each xi ∈ A(i,k). Furthermore, by the same Proposition 3.3.5,
Σ_{x(i)∈A(i,k)} g(i)k,x(i) ≤ 1   (4.8.6)
on S(i). 2. For ease of notations, assume that n = 2 for the remainder of this proof. The proof for the general case is similar. 3. We need to show that the function X with values in the product space (Ŝ,d̂) is measurable. To that end, first let f ∈ C(Ŝ,d̂) be arbitrary, with a modulus of continuity δf. Let A be an arbitrary integrable subset of Ω. Let α > 0 be arbitrary. Let k ≥ 1 be so large that (i) the function f has the set (d̂(·,(x◦(1),x◦(2))) ≤ 2k−1) as support and (ii) 2−k+1 < δf(α). Define the function g on (Ŝ,d̂) by
g(y1,y2) ≡ Σ_{x(1)∈A(1,k)} Σ_{x(2)∈A(2,k)} f(x1,x2)g(1)k,x(1)(y1)g(2)k,x(2)(y2)   (4.8.7)
for each (y1,y2) ∈ (Ŝ,d̂). 4. By hypothesis, the function X(1) is measurable. Hence, for each x1 ∈ A(1,k), the function g(1)k,x(1)(X(1))1A is bounded and integrable. Similarly, for each x2 ∈ A(2,k), the function g(2)k,x(2)(X(2))1A is bounded and integrable. Thus equality 4.8.7 implies that the function g(X(1),X(2))1A is integrable. 5. Let (y1,y2) ∈ (Ŝ,d̂) be arbitrary. We will show that
f(y1,y2) = h(y1,y2) ≡ Σ_{x(1)∈A(1,k)} Σ_{x(2)∈A(2,k)} f(y1,y2)g(1)k,x(1)(y1)g(2)k,x(2)(y2).
Suppose, for the sake of a contradiction, that |f(y1,y2) − h(y1,y2)| > 0. Then |f(y1,y2)| > 0. Hence, according to Condition (i) in Step 3, we have
d(1)(y1,x◦(1)) ∨ d(2)(y2,x◦(2)) ≡ d̂((y1,y2),(x◦(1),x◦(2))) ≤ 2k−1 < 2k.
Therefore, by relation 4.8.5, there exists (x1,x2) ∈ A(1,k) × A(2,k) such that
d(i)(yi,xi) ≤ 2−k
for each i = 1,2. Therefore, according to Assertion 3 of Proposition 3.3.5, we have
Σ_{x(i)∈A(i,k)} g(i)k,x(i)(yi) = 1
for each i = 1,2. It follows that
h(y1,y2) = f(y1,y2) (Σ_{x(1)∈A(1,k)} g(1)k,x(1)(y1)) (Σ_{x(2)∈A(2,k)} g(2)k,x(2)(y2))
= f(y1,y2), which is a contradiction. We conclude that f(y1,y2) = h(y1,y2) for each (y1,y2) ∈ (Ŝ,d̂). 6. Hence
|f(y1,y2) − g(y1,y2)| = |h(y1,y2) − g(y1,y2)|
= |Σ_{x(1)∈A(1,k)} Σ_{x(2)∈A(2,k)} (f(y1,y2) − f(x1,x2))g(1)k,x(1)(y1)g(2)k,x(2)(y2)|
≤ Σ_{x(1)∈A(1,k)} Σ_{x(2)∈A(2,k)} |f(y1,y2) − f(x1,x2)|g(1)k,x(1)(y1)g(2)k,x(2)(y2).   (4.8.8)
Suppose, in the last displayed sum, the summand corresponding to x1,x2 is positive. Then g(1)k,x(1)(y1) > 0 and g(2)k,x(2)(y2) > 0. Hence, according to the last statement in Step 1, we have
d(i)(yi,xi) ≤ 2−k+1 < δf(α)
for each i = 1,2, where the last inequality is by Condition (ii) in Step 3. It follows that d̂((y1,y2),(x1,x2)) < δf(α) and therefore that |f(y1,y2) − f(x1,x2)| < α. Thus
|f(y1,y2) − f(x1,x2)|g(1)k,x(1)(y1)g(2)k,x(2)(y2) ≤ αg(1)k,x(1)(y1)g(2)k,x(2)(y2).
Consequently, inequality 4.8.8 implies that
|f(y1,y2) − g(y1,y2)| ≤ α (Σ_{x(1)∈A(1,k)} g(1)k,x(1)(y1)) (Σ_{x(2)∈A(2,k)} g(2)k,x(2)(y2)) ≤ α,
where the last inequality is thanks to inequality 4.8.6, and where (y1,y2) ∈ (Ŝ,d̂) is arbitrary. 7. It follows that |f(X(1),X(2))1A − g(X(1),X(2))1A| ≤ α1A. Since α > 0 is arbitrarily small, and since g(X(1),X(2))1A is integrable, as established in Step 4, Theorem 4.5.10 implies that the function f(X(1),X(2))1A is integrable, where the integrable set A is arbitrary. 8. At the same time,
μ(d̂((X(1),X(2)),(x◦(1),x◦(2))) > a)A ≤ μ((d(1)(X(1),x◦(1)) > a)A) + μ((d(2)(X(2),x◦(2)) > a)A) → 0,
because each measure on the right-hand side converges to 0 as a → ∞, thanks to the measurability of the functions X(1),X(2). 9. In view of Steps 7 and 8, Proposition 4.8.12 implies that the function X ≡ (X(1),X(2)) : (Ω,L,I) → (Ŝ,d̂) is measurable. Assertion 1 is proved. 10. Next, suppose (S,d) is a complete metric space and suppose a function g : (Ŝ,d̂) → (S,d) is uniformly continuous on compact subsets of (Ŝ,d̂). Since the product space (Ŝ,d̂) is locally compact, each bounded subset is contained in a compact subset. Hence the function g is uniformly continuous and bounded on bounded subsets. Therefore, according to Proposition 4.8.10, the composite function g(X) : (Ω,L,I) → (S,d) is measurable. Assertion 2 is proved. 11. Now let (S,d) be an arbitrary locally compact metric space. Let X(1),X(2) : (Ω,L,I) → (S,d) be arbitrary measurable functions. Then the function (X(1),X(2)) : (Ω,L,I) → (Ŝ,d̂) is measurable, by Assertion 1. At the same time, the distance function d : (Ŝ,d̂) → R is uniformly continuous on compact subsets of (Ŝ,d̂). Hence the composite function d(X(1),X(2)) is measurable by Assertion 2. Thus Assertion 3 and the proposition are proved. Corollary 4.8.21. Some operations that preserve measurability for real-valued measurable functions. Let X,Y : (Ω,L,I) → R be real-valued measurable functions. Then aX + bY, 1, X ∨ Y, X ∧ Y, |X|, and |X|α are measurable functions for any real numbers a,b,α with α ≥ 0. Let A,B be measurable sets. Then A ∪ B and AB are measurable. Moreover, (A ∪ B)c = AcBc and (AB)c = Ac ∪ Bc. Proof. Each of the real-valued functions aX + bY, 1, X ∨ Y, X ∧ Y, |X|, and |X|α can be written as g(X,Y) for some function g : R2 → R that is uniformly continuous on compact subsets of R2. Thus these functions are measurable by Assertion 2 of Proposition 4.8.20. Let A,B be measurable sets with indicators U,V, respectively. Then U ∨ V is a measurable indicator, with (U ∨ V = 1) = (U = 1) ∪ (V = 1) = A ∪ B. Hence A ∪ B is a measurable set, with U ∨ V as a measurable indicator. Moreover, (A ∪ B)c = (U ∨ V = 0) = (U = 0)(V = 0) = AcBc. Similarly, AB is measurable, with (AB)c = Ac ∪ Bc. Proposition 4.8.22. A real-valued measurable function dominated by an integrable function is integrable. If X is a real-valued measurable function such that |X| ≤ Y for some nonnegative integrable function Y, then X is integrable. In particular, if A is a measurable set and Z is an integrable function, then Z1A is integrable.
Proof. Let (an)n=1,2,... be an increasing sequence of positive real numbers with an → ∞, and let (bn)n=1,2,... be a decreasing sequence of positive real numbers with bn → 0. Let n ≥ 1 be arbitrary. The function f : R → R defined by f(x) ≡ (−an) ∨ x ∧ an for each x ∈ R is a member of Cub(R). Hence the function
Xn ≡ ((−an) ∨ X ∧ an)1(b(n)<Y)   (4.8.9)
is integrable, where (bn < Y) is an integrable set by the convention in Definition 4.6.11. Consider each ω ∈ domain(X) ∩ domain(Y) ∩ domain(Xn). Suppose
|X(ω) − Xn(ω)| > 0.   (4.8.10)
Suppose also that bn < Y(ω) and |X(ω)| ≤ an. Then, by equality 4.8.9, we obtain Xn(ω) = X(ω), which is a contradiction. Hence (i) bn ≥ Y(ω), or (ii) bn < Y(ω) and X(ω) > an, or (iii) bn < Y(ω) and X(ω) < −an. Consider case (i). Then 1(Y≤b(n))(ω) = 1, and Xn(ω) = 0 by equality 4.8.9. Hence
|X(ω) − Xn(ω)| = |X(ω)| ≤ Y(ω) ≤ Y(ω)1(Y≤b(n))(ω),
where the first inequality is by the hypothesis that |X| ≤ Y. Next consider case (ii). Then 1(b(n)<Y)(ω) = 1 and Xn(ω) = an, whence
|X(ω) − Xn(ω)| = X(ω) − an ≤ X(ω) ≤ Y(ω)1(|X|>a(n))(ω).
Case (iii) is similar. Combining the three cases, we obtain
|X − Xn| ≤ Y(1(Y≤b(n)) + 1(|X|>a(n))).   (4.8.11)
Now note that μ(|X| > an) → 0 as n → ∞. Hence IY1(|X|>a(n)) → 0 as n → ∞, by Assertion 2 of Proposition 4.7.1. At the same time, since Y1(Y≤b(n)) ↓ 0 a.e. as n → ∞, the Monotone Convergence Theorem implies that I(Y1(Y≤b(n))) → 0. Combining, we obtain
IY(1(Y≤b(n)) + 1(|X|>a(n))) → 0.
Hence, by Theorem 4.5.10, inequality 4.8.11 implies that the real-valued measurable function X is integrable, as alleged.
4.9 Convergence of Measurable Functions In this section, let (,L,I ) be a complete integration space, and let (S,d) be a complete metric space, with a fixed reference point x◦ ∈ S. In the case where S = R, it is understood that d is the Euclidean metric and that x◦ = 0. We will
110
Probability Theory
introduce several notions of convergence of measurable functions on (,L,I ) with values in (S,d). We will sometimes write (ai ) for short for a given sequence (ai )i=1,2,.... Recall the following definition. Definition 4.9.1. Limit of a sequence of functions. If (Yi )i=1,2,... is a sequence of functions from a set to the metric space (S,d), and if the set ∞ domain(Yi ) : lim Yi (ω) exists in (S,d) D≡ ω∈ i=1
i→∞
is nonempty, then the function limi→∞ Yi is defined by domain(limi→∞ Yi ) ≡ D and by (limi→∞ Yi )(ω) ≡ limi→∞ Yi (ω) for each ω ∈ D. Definition 4.9.2. Convergence in measure, a.u., a.e., and in L1 . For each n ≥ 1, let X,Xn be functions on the complete integration space (,L,I ), with values in the complete metric space (S,d). 1. The sequence (Xn ) is said to converge to X uniformly on a subset A of if for each ε > 0, there exists p ≥ 1 so large that A ⊂ ∞ n=p (d(Xn,X) ≤ ε). 2. The sequence (Xn ) is said to converge to X almost uniformly (a.u.) if for each integrable set A and for each ε > 0, there exists an integrable set B with μ(B) < ε such that Xn converges to X uniformly on AB c . 3. The sequence (Xn ) is said to converge to X in measure if for each integrable set A and for each ε > 0, there exists p ≥ 1 so large that for each n ≥ p, there exists an integrable set Bn with μ(Bn ) < ε and ABnc ⊂ (d(Xn,X) ≤ ε). 4. The sequence (Xn ) is said to be Cauchy in measure if for each integrable set A and for each ε > 0, there exists p ≥ 1 so large that for each m,n ≥ p there c ⊂ (d(Xn,Xm ) ≤ ε). exists an integrable set Bm,n with μ(Bm,n ) < ε and ABm,n 5. Suppose S = R and X,Xn ∈ L for n ≥ 1. The sequence (Xn ) is said to converge to X in L1 if I |Xn − X| → 0. 6. The sequence (Xn ) is said to converge to X on an integrable subset A if A ⊂ domain(limn→∞ Xn ) and if X = limn→∞ Xn on A. The sequence (Xn ) is said to converge to X almost everywhere (a.e.) on an integrable subset A if (Xn ) converges to X on DA for some full set D. We will use the abbreviation Xn → X to stand for “(Xn ) converges to X” in whichever sense specified. Proposition 4.9.3. a.u. Convergence implies convergence in measure, etc. For each n ≥ 1, let X,Xn be functions on the complete integration space (,L,I ), with values in the complete metric space (S,d). Then the following conditions hold: 1. If Xn → X a.u. then (i) X is defined a.e. on each integrable set A, (ii) Xn → X in measure, and (iii) Xn → X a.e. on each integrable set A. 2. If (i ) Xn is a measurable function for each n ≥ 1 and (ii ) Xn → X in measure, then X is a measurable function.
Integration and Measure
111
3. If (i ) Xn is a measurable function for each n ≥ 1, and (ii ) Xn → X a.u., then X is a measurable function. Proof. 1. Suppose Xn → X a.u. Let the integrable set A and n ≥ 1 be arbitrary. Then, by Definition 4.9.2 for a.u. convergence, there exists an integrable set Bn with μ(Bn ) < 2−n such that Xh → X uniformly on ABnc . Hence, by Definition 4.9.2 for uniform convergence, there exists p ≡ pn ≥ 1 so large that ∞
ABnc ⊂
(d(Xh,X) ≤ 2−n ).
(4.9.1)
h=p(n)
Consequently, ABnc ⊂ domain(X). Define the integrable set B ≡ Then μ(B) ≤
∞
μ(Bn )
0 be arbitrary. Then, by Definition 4.9.2 for a.u. convergence, there exists an integrable set C with μ(C) < ε such that Xn → X uniformly on AC c . Hence there exists p ≥ 1 so large that AC c ⊂ ∞ n=p (d(Xn,X) ≤ ε). Next let n ≥ p be arbitrary. Define Cn ≡ C. Then μ(Cn ) < ε and ACnc = AC c ⊂ (d(Xn,X) < ε). Thus the condition in Definition 4.9.2 is verified for the convergence Xn → X in measure. Part (ii) of Assertion 1 is proved. 3. By Step 1, we have Xh → X uniformly on ABnc , for each n ≥ 1. It follows c c that Xh → X at each point in AD = ∞ n=k ABn , where D ≡ B is a full k=1 set. Thus Xn → X a.e. on A, where A is an arbitrary integrable set. Part (iii) of Assertion 1 is also verified. Assertion 1 is proved. 4. Next, suppose (i )Xn is measurable for each n ≥ 1, and suppose (ii ) Xn → X in measure. We need to prove that X is a measurable function. To that end, let f ∈ Cub (S) be arbitrary. Then |f | ≤ c on S for some c > 0, and f has a modulus of continuity δf . Let A be an arbitrary integrable set. Let k ≥ 1 be arbitrary. Because Xn → X in measure, there exists n ≡ nk ≥ 1 and an integrable set Bn ≡ Bn(k) , with μ(Bn ) < k −1 ∧ δf (k −1 ) and with ABnc ⊂ (d(Xn,X) < k −1 ∧ δf (k −1 )) ⊂ (|f (X) − f (Xn )| < k −1 ).
(4.9.2)
112
Probability Theory
This implies that (|f (X) − f (Xn )| ≥ k −1 ) ∩ (ABn ∪ ABnc ) ⊂ ABn and that ABnc ⊂ (d(Xn,X) < 1).
(4.9.3)
Because Xn is measurable, we have f (Xn(k) )1A ∈ L. Moreover, on the full set Bnc ∪ Bn , we have |f (X)1A − f (Xn(k) )1A | = |f (X) − f (Xn )|1AB(n)c + |f (X) − f (Xn )|1AB(n) ≤ k −1 1AB(n)c + 2c1AB(n) ≤ k −1 1A + 2c1B(n(k)), (4.9.4) where I (k −1 1A + 2c1B(n(k)) ) = k −1 μ(A) + 2cμ(Bn ) < k −1 (μ(A) + 2c) → 0 as k → ∞. Hence, by Theorem 4.5.10, inequality 4.9.4 implies that f (X)1A ∈ L. Condition (i) in Definition 4.8.1 has been proved for X. Hence, by Lemma 4.8.3, there exists a countable subset J of R such that the set (d(X,x◦ ) > a)A is integrable for each a ∈ Jc . 5. It remains to verify Condition (ii) in Definition 4.8.1 for X to be measurable. To that end, consider each ε > 0. Suppose k ≥ 1 is so large that k −1 < ε. Let n ≡ nk , as defined in Step 4. Since Xn is a measurable function, there exists αk > 0 so large that μ(d(Xn,x◦ ) > αk )A < k −1 .
(4.9.5)
Consider each b ∈ Jc such that b > αk + 1. Then (d(X,x◦ ) > b)A ⊂ (d(Xn,x◦ ) > αk )A ∪ (d(Xn,X) > 1)A ⊂ (d(Xn,x◦ ) > αk )A ∪ (d(Xn,X) > 1)ABnc ∪ ABn ⊂ (d(Xn,x◦ ) > αk )A ∪ (d(Xn,X) > 1)(d(Xn,X) < 1) ∪ ABn ⊂ (d(Xn,x◦ ) > αk )A ∪ φ ∪ Bn, where the third set-inclusion relation follows from relation 4.9.3. Hence, in view of inequalities 4.9.5 and 4.9.2, μ(d(X,x◦ ) > b)A ≤ μ(d(Xn,x◦ ) > αk )A + μ(Bn ) < k −1 + k −1 = 2k −1 < 2ε, where b ∈ Jc is arbitrary with b > αk + 1 and where ε > 0 is arbitrary. We conclude that μ(d(X,x◦ ) > b)A → 0 as b → ∞. Thus Condition (ii) in Definition 4.8.1 is verified. Accordingly, the function X is measurable. Assertion 2 is proved. 6. Suppose (i) Xn is measurable for each n ≥ 1 and (ii) Xn → X a.u. Then, by Assertion 1, we have Xn → X in measure. Hence, by Assertion 2, the function X is measurable. Assertion 3 and the proposition are proved.
Integration and Measure
113
Proposition 4.9.4. In the case of σ -finite (,L,I ), each sequence Cauchy in measure converges in measure and contains an a.u. convergent subsequence. Suppose (,L,I ) is σ -finite. For each n ≥ 1, let Xn be a function on (,L,I ), with values in the complete metric space (S,d). Suppose the sequence (Xn )n=1,2,... is Cauchy in measure. Then there exists a subsequence (Xn(k) )k=1,2,... such that X ≡ limk→∞ Xn(k) is a measurable function, with Xn(k) → X a.u. and Xn(k) → X a.e. Moreover, Xn → X in measure. Proof. 1. Let (Ak )k=1,2,... be a sequence of integrable sets that is an I -basis of (,L,I ). Thus (i) Ak ⊂ Ak+1 for each k ≥ 1, (ii) ∞ k=1 Ak is a full set, and (iii) for each integrable set A we have μ(Ak A) → μ(A). By hypothesis, (Xn ) is Cauchy in measure. By Definition 4.9.2, for each k ≥ 1 there exists nk ≥ 1 such that for each m,n ≥ nk , there exists an integrable set Bm,n,k with μ(Bm,n,k ) < 2−k
(4.9.6)
c ⊂ (d(Xn,Xm ) ≤ 2−k ). Ak Bm,n,k
(4.9.7)
and with
By inductively replacing nk with n1 ∨ . . . ∨ nk , we may assume that nk+1 ≥ nk for each k ≥ 1. Define Bk ≡ Bn(k+1),n(k),k for each k ≥ 1. Then μ(Bk ) < 2−k , and Ak Bkc ⊂ (d(Xn(k+1),Xn(k) ) ≤ 2−k )
(4.9.8)
for each k ≥ 1. Let i ≥ 1 be arbitrary. Define ∞
Ci ≡
Bk .
(4.9.9)
k=i
Then μ(Aci Ci ) Hence
2−k = 2−i+1 .
(4.9.10)
k=i
∞
c i=1 Ai Ci
Ai Cic ⊂
≤ μ(Ci ) ≤
∞
∞ k=i
is a null set and D ≡
Ai Bkc ⊂
∞ k=i
Ak Bkc ⊂
∞
∞
c i=1 Ai Ci
is a full set. Moreover,
(d(Xn(k+1),Xn(k) ) ≤ 2−k ),
(4.9.11)
k=i
in view of relation 4.9.8. Here the second inclusion is because Ai ⊂ Ak for each k ≥ i. Therefore, since (S,d) is complete, the sequence (Xn(k) )k=1,2,... converges uniformly on Ai Cic . In other words, Xn(k) → X uniformly on Ai Cic , where X ≡ limk→∞ Xn(k) .
114
Probability Theory
Next let A be an arbitrary integrable set. Let ε > 0 be arbitrary. In view of Condition (iii), there exists i ≥ 1 so large that 2−i+1 < ε and μ(AAci ) = μ(A) − μ(Ai A) < ε.
(4.9.12)
Define B ≡ AAci ∪Ci . Then μ(B) < 2ε. Moreover, AB c ⊂ Ai Cic . Hence relation 4.9.11 implies that Xn(k) → X uniformly on AB c . Since ε > 0 is arbitrary, we conclude that Xn(k) → X a.u. It then follows from Assertion 1 of Proposition 4.9.3 that Xn(k) → X in measure, Xn(k) → X a.e. on each integrable set, and X is measurable. 2. It remains only to prove that Xn → X in measure. To that end, define, for each m ≥ ni , B m ≡ AAci ∪ Bm,n(i),i ∪ Ci . Then, in view of inequalities 4.9.12, 4.9.6, and 4.9.10, we have, for each m ≥ ni , μ(B m ) = μ(AAci ∪ Bm,n(i),i ∪ Ci ) < ε + 2−i + 2−i+1 < 3ε.
(4.9.13)
Moreover, c
c AB m = A(Ac ∪ Ai )Bm,n(i),i Cic c c = AAi Bm,n(i),i Cic = (Ai Bm,n(i),i )(AAi Cic )
⊂ (d(Xm,Xn(i) ) ≤ 2−i )(d(X,Xn(i) ) ≤ 2−i+1 ) ⊂ (d(X,Xm ) ≤ 2−i + 2−i+1 ) ⊂ (d(X,Xm ) < 2ε)
(4.9.14)
for each m ≥ n(i), where the first inclusion is because of relations 4.9.7 and 4.9.11. Since the integrable set A and the positive real number ε > 0 are arbitrary, we have verified the condition in Definition 4.9.2 for Xm → X in measure. The proposition is proved. Proposition 4.9.5. Convergence in measure in terms of convergence of integrals. For each n ≥ 1, let X,Xn be functions on (,L,I ), with values in the complete metric space (S,d), such that d(Xn,Xm ) and d(Xn,X) are measurable for each n,m ≥ 1. Then the following conditions hold: 1. If I (1 ∧ d(Xn,X))1A → 0 for each integrable set A, then Xn → X in measure. 2. Conversely, if Xn → X in measure, then I (1 ∧ d(Xn,X))1A → 0 for each integrable set A. 3. The sequence (Xn ) is Cauchy in measure iff I (1 ∧ d(Xn,Xm ))1A → 0 as n,m → ∞ for each integrable set A. Proof. Let the integrable set A and the positive real number ε ∈ (0,1) be arbitrary. 1. Suppose I (1 ∧ d(Xn,X))1A → 0. Then, by Chebychev’s inequality, μ(d(Xn,X) > ε)A) ≤ μ(1 ∧ d(Xn,X))1A > ε) ≤ ε−1 I (1 ∧ d(Xn,X))1A → 0
Integration and Measure
115
as n → ∞. In particular, there exists p ≥ 1 so large that μ(1 ∧ d(Xn,X) > ε)A < ε for each n ≥ p. Now consider each n ≥ p. Define the integrable set Bn ≡ (1 ∧ d(Xn,X) > ε)A. Then μ(Bn ) < ε and ABnc ⊂ (d(Xn,X) ≤ ε). Thus Xn → X in measure. 2. Conversely, suppose Xn → X in measure. Let ε > 0 be arbitrary. Then there exists p ≥ 1 so large that for each n ≥ p, there exists an integrable set Bn with μ(Bn ) < ε and ABnc ⊂ (d(Xn,X) ≤ ε). Hence I (1 ∧ d(Xn,X))1A = I (1 ∧ d(Xn,X))1AB(n) + I (1 ∧ d(Xn,X))1AB(n)c ≤ I 1B(n) + I ε1A < ε + εμ(A), where ε > 0 is arbitrary. Thus I (1 ∧ d(Xn,X))1A → 0. 3. The proof of Assertion 3 is similar to that of Assertions 1 and 2, and is omitted. The next proposition will be convenient for establishing a.u. convergence. Proposition 4.9.6. Sufficient condition for a.u. convergence. Suppose (,L,I ) is σ -finite, with an I -basis (Ai )i=1,2,... . For each n,m ≥ 1, let Xn be a function on , with values in the complete metric space (S,d), such that d(Xn,Xm ) is measurable. Suppose that for each i ≥ 1, there exists a sequence (εn )n=1,2,... of positive 2 real numbers such that ∞ n=1 εn < ∞ and such that I (1∧d(Xn,Xn+1 ))1A(i) < εn for each n ≥ 1. Then X ≡ limn→∞ Xn exists on a full set, and Xn → X a.u. If, in addition, Xn is measurable for each n ≥ 1, then the limit X is measurable. Proof. 1. As an abbreviation, write Zn ≡ 1 ∧ d(Xn+1,Xn ) for each n ≥ 1. Let A be an arbitrary integrable set and let ε > 0 be arbitrary. Take i ≥ 1 so large that μ(AAci ) < ε. By hypothesis, there exists a sequence (εn )n=1,2,... of positive real 2 numbers such that ∞ n=1 εn < ∞ and such that I Zn 1A(i) < εn for each n ≥ 1. Chebychev’s inequality then implies that μ(Zn > εn )Ai ≤ μ(Zn 1A(i) > εn ) ≤ I (εn−1 Zn 1A(i) ) < εn ∞ for each n ≥ 1. Let p ≥ 1 be so large that ∞ n=p εn < 1∧ε. Let C ≡ n=p (Zn > ∞ c εn )Ai and let B ≡ AAi ∪ C. Then μ(B) < ε + n=p εn < 2ε. Moreover, AB c = A((Ac ∪ Ai )C c ) = AAi C c = AAi
∞
(Zn ≤ εn )
n=p
⊂
∞ n=p
(Zn ≤ εn ) ⊂
∞
(d(Xn+1,Xn ) ≤ εn ).
n=p
c Since ∞ n=p εn < ∞, it follows that Xn → X uniformly on AB , where X ≡ limn→∞ Xn . Since A and ε > 0 are arbitrary, we see that Xn → X a.u. 2. If, in addition, Xn is measurable for each n ≥ 1, then X is a measurable function by Assertion 3 of Proposition 4.9.3.
116
Probability Theory
Proposition 4.9.7. A continuous function preserves convergence in measure. Let (,L,I ) be a complete integration space. Let (S ,d ),(S ,d ) be locally ≡ (S ,d ) ⊗ (S ,d ) denote the product compact metric spaces and let ( S, d) metrics space. Let X ,X1 X2, . . . be a sequence of measurable functions with values in S such that Xn → X in measure. Similarly, let X ,X1 X2 , . . . be a sequence of measurable functions with values in S such that Xn → X in measure. → (S,d) be a continuous function with values in a complete Let f : ( S, d) Then metric space (S,d) that is uniformly continuous on compact subsets of ( S, d). f (Xn ,Xn ) → f (X ,X ) in measure as n → ∞. Generalization to m ≥ 2 sequences of measurable functions is similar. Proof. 1. Let x◦ and x◦ be fixed reference points in S andS , respectively. Write S. For each x,y ∈ S, write x ≡ (x ,x ) and y ≡ (y ,y ). x◦ ≡ (x◦ ,x◦ ) ∈ Likewise, write X ≡ (X ,X ) and Xn ≡ (Xn ,Xn ) for each n ≥ 1. For each n ≥ 1, the functions f (X),f (Xn ) are measurable functions with values in S, thanks to Assertion 2 of Proposition 4.8.20. 2. Let A be an arbitrary integrable set. Let ε > 0 be arbitrary. By Condition (ii) in Definition 4.8.1, there exists a > 0 so large that μ(B ) < ε and μ(B ) < ε, S, d) where B ≡ (d (x◦ ,X ) > a)A and B ≡ (d (x◦ ,X ) > a)A. Since ( ◦,·) ≤ a + 1) is contained in some is locally compact, the bounded subset (d(x → (S,d) compact subset. On the other hand, by hypothesis, the function f : ( S, d) is uniformly continuous on each compact subset of S. Hence there exists δ1 > 0 so small that we have d(f (x),f (y)) < ε
(4.9.15)
for each ◦,·) ≤ a + 1) x,y ∈ (d(x with d(x,y) < δ1 . Define δ ∈ (0,1 ∧ δ1 ). For each n ≥ 1, define Cn ≡ (d (Xn,X ) ≥ δ)A and Cn ≡ (d (Xn ,X ) ≥ δ)A. By hypothesis, Xn → X , and Xn → X in measure as n → ∞. Hence there exists p ≥ 1 so large that μ(Cn ) < ε and μ(Cn ) < ε for each n ≥ p. Consider any n ≥ p. Then μ(B ∪ B ∪ Cn ∪ Cn ) < 4ε. Moreover, A(B ∪ B ∪ Cn ∪ Cn )c = AB c B c Cn c Cn c = A(d (x◦ ,X ) ≤ a;d (x◦ ,X ) ≤ a;d (Xn ,X ) < δ;d (Xn ,X ) < δ) ⊂ A(d (x◦ ,X ) ≤ a;d (x◦ ,Xn ) ≤ a + 1;d (x◦ ,X ) ≤ a;d (x◦ ,Xn ) ≤ a + 1;d (Xn ,X ) < δ;d (Xn ,X ) < δ)
Integration and Measure
117
◦,X) ≤ a)(d(x ◦,Xn ) ≤ a + 1)(d(X n,X) < δ1 ) = A(d(x ◦,X) ≤ a + 1)(d(x ◦,Xn ) ≤ a + 1)(d(X n,X) < δ1 ) ⊂ (d(x ⊂ (d(f (X),f (Xn )) < ε), where the last set-inclusion relation is thanks to inequality 4.9.15. Since ε > 0 and A are arbitrary, the condition in Definition 4.9.2 is verified for f (Xn ) → f (X) in measure. Theorem 4.9.8. Dominated Convergence Theorem. Let (Xn )n=1,2,... be a sequence of real-valued measurable functions on the complete integration space (,L,I ), and let X be a real-valued function defined a.e. on , with Xn → X in measure. Suppose there exists an integrable function Y such that |X| ≤ Y a.e. and |Xn | ≤ Y a.e. for each n ≥ 1. Then X,Xn are integrable for each n ≥ 1, and I |Xn − X| → 0. Proof. 1. By Assertion 2 of Proposition 4.9.3, the function X is a measurable function. Since |X| ≤ Y a.e. and |Xn | ≤ Y a.e. for each n ≥ 1, Proposition 4.8.22 implies that X,Xn are integrable for each n ≥ 1. 2. Let ε > 0 be arbitrary. Since Y is integrable and is nonnegative a.e., there exists a > 0 so small that I (Y ∧ a) < ε.
(4.9.16)
Define A ≡ (Y > a). Then |Xn −X| ≤ 2Y = 2(Y ∧a) a.e. on Ac , for each n ≥ 1. By Proposition 4.7.1, there exists δ ∈ (0,ε/(1 + μA)) so small that I Y 1B < ε for each integrable set B with μ(B) < δ. On the other hand, by hypothesis, Xn → X in measure. Hence there exists m > 0 so large that for each n ≥ m, we have ABnc ⊂ (|Xn − X| ≤ δ) for some integrable set Bn with μ(Bn ) < δ. Then I Y 1B(n) < ε.
(4.9.17)
Combining, for each n ≥ m, we have, on the full set ABn ∪ ABnc ∪ Ac, |X − Xn | ≤ |X − Xn |1AB(n) + |X − Xn |1AB(n)c + |X − Xn |1Ac ≤ 2Y 1B(n) + δ1A + 2(Y ∧ a).
(4.9.18)
In view of inequalities 4.9.17 and 4.9.16, it follows that I |X − Xn | ≤ I (2Y 1B(n) + δ1A + 2(Y ∧ a)) ≤ 2ε + δμ(A) + 2ε ≤ 2ε + ε + 2ε. Since ε > 0 is arbitrary, we see that I |Xn − X| → 0.
(4.9.19)
The next definition introduces Newton’s notation for the Riemann–Stieljes integration relative to a distribution function. Definition 4.9.9. Lebesgue integration. Suppose F is a distribution function on R. Let I be the Riemann–Stieljes integration with respect to F , and let (R,L,I )
118
Probability Theory
be the completionof (R,C(R),I ). We will use the notation ·dF for I . For each X ∈ L, we write XdF or X(x)dF (x) for I X. An integrable function in L is then said to be integrable relative to the distribution function F , and a measurable function on (R,L,I ) is said to be measurable relative to F . Suppose X is a measurable function relative to F , and suppose s,t ∈ R such that the functions 1(s∧t,s] X and 1(s∧t,t] X are integrable relative to F . Then we write ! t ! ! ! t XdF ≡ X(x)dF (x) ≡ X1(s∧t,t] dF − X1(s∧t,s] dF . s
s
Thus
!
t
!
s
XdF = −
s
XdF . t
If A is a measurable set relative to F such that X1A is integrable, then we write ! ! ! XdF ≡ X(x)dF (x) ≡ X1A dF . A
x∈A
In the special case where F (x) ≡ x for x ∈ R,we write ·dx for ·dF. Let s < t t in R be arbitrary. The integration spaces (R,L, ·dx) and ([s,t],L[s,t], s ·dx) are the Lebesgue integration spaces on R and [s,t], respectively, and ·dx and called t ·dx are called the Lebesgue integrations. Then an integrable function in L or s L[s,t] is said to be Lebesgue integrable and a measurable function is said to be Lebesgue measurable. Since the identity function Z, defined by Z(x) ≡ x for each x ∈ R, is continuous and therefore a measurable function on (R,L, ·dF ), all but countably many t ∈ R are regular points of Z. Hence (s,t] = (s < Z ≤ t) is a measurable set in (R,L, ·dF ) for all but countably many s,t ∈ R. In other words, 1(s,t] is measurable relative to F for all but countably many s,t ∈ R. Therefore the t definition of s X(x)dF (x) is not vacuous. Proposition 4.9.10. Intervals are Lebesgue integrable. Let s,t ∈ R be arbitrary with s ≤ t. Then each of the intervals [s,t], (s,t), (s,t], and [s,t) is Lebesgue integrable, with the same Lebesgue measure equal to t − s, and with measuretheoretic complements (−∞,s) ∪ (t,∞), (−∞,s] ∪ [t,∞), (−∞,s] ∪ (t,∞), and (−∞,s) ∪ [t,∞), respectively. Each of the intervals (−∞,s), (−∞,s], (s,∞), and [s,∞) is Lebesgue measurable. Proof. Consider the Lebesgue integration ·dx and the Lebesgue measure μ. Let a,b ∈ R be such that a < s ≤ t < b. Define f ≡ fa,s,t,b ∈ C(R) such that f ≡ 1 on [s,t], f ≡ 0 on (−∞,a] ∪ [b,∞), and f is linear on [a,s] and on [t,b]. Let t0 < · · · < tn be any partition in the definition of a Riemann–Stieljes sum S ≡ ni=1 f (ti )(ti − ti−1 ) such that a = tj and b = tk for some j,k = 1, . . .,n
Integration and Measure
119
k
with j ≤ k. Then S = i=j +1 f (ti )(ti −ti−1 ) since f has [a,b] as support. Hence 0 ≤ S ≤ ki=j +1 (ti − ti−1 ) = tk − tj = b − a. Now let n → ∞,t0 → −∞, tn → ∞, and let the mesh of the partition approach 0. It follows from the last inequality that f (x)dx ≤ b − a. Similarly, t − s ≤ f (x)dx. Now, with s,t fixed, let (ak )k=1,2,... and (bk )k=1,2,... be sequences in R such that ak ↑ s and bk ↓ t, and let gk ≡ fak ,s,t,bk Then, by the previous argument, we have t − s ≤ gk (x)dx ≤ bk − ak ↓ t − s. Hence, by the Monotone Convergence Theorem, the limit g ≡ limk→∞ gk is integrable, with integral t − s. It is obvious that g = 1 or 0 on domain(g). In other words, g is an indicator function. Moreover, [s,t] = μ([s,t]) = (g = 1). Hence [s,t] is an integrable set, with 1[s,t] = g, with measure g(x)dx = t − s, and with measure-theoretic complement [s,t]c = (−∞,s) ∪ (t,∞). " # 1 Next consider the half-open interval (s,t]. Since (s,t] = ∞ k=1 s + k ,t , where " # "
1 s + k1 ,t is integrable for each k ≥ 1 and μ s + k1 ,t] = t − s − " s as k ↑ t −# 1 k → ∞, we have the integrability of (s,t], and μ([s,t]) = limk→∞ μ s+ k ,t = t − s. Moreover, % ∞ $ 1 c s + ,t (s,t]c = k k=1 ∞ 1 −∞,s + ∪ (t,∞) = (−∞,s] ∪ (t,∞). = k k=1
The proofs for the intervals (s,t) and [s,t) are similar. Now consider the interval (−∞,s). Define the function X on the full set D by X(x) = 1 or 0 according as x ∈ (−∞,s) or x ∈ [s,∞). Let A be any integrable subset of R. Then, for each n ≥ 1 with n > −s, we have |X1A − 1[−n,s) 1A | ≤ 1(−∞,−n) 1A on the full set D(A ∪ Ac )([−n,s) ∪ [−n,s)c ). At the same time, 1(−∞,−n) (x)1A (x)dx → 0. Therefore, by Theorem 4.5.10, the function X1A is integrable. It follows that for any f ∈ C(R), the function f (X)1A = f (1)X1A + f (0)(1A − X1A ) is integrable. We have thus verified Condition (i) in Definition 4.8.1 for X to be measurable. At the same time, since |X| ≤ 1, we have trivially μ(|X| > a) = 0 for each a > 1. Thus Condition (ii) in Definition 4.8.1 is also verified. We conclude that (−∞,s) is measurable. Similarly, we can prove that (−∞,s], (s,∞), and [s,∞) are all measurable.
4.10 Product Integration and Fubini’s Theorem In this section, let ( ,L ,I ) and ( ,L ,I ) be two arbitrary but fixed complete integration spaces. Let ≡ × denote the product set. We will construct the product integration space and embed the given integration spaces in it. The definitions and results can easily be generalized to more than two given integration spaces.
120
Probability Theory
Definition 4.10.1. Direct product of functions. Let X ,X be arbitrary members of L ,L , respectively. Define the function X ⊗ X : → R by domain(X ⊗ X ) ≡ domain(X ) × domain(X ) and by (X ⊗ X )(ω ,ω ) ≡ X (ω )X (ω ) for each ω ∈ . The function X ⊗ X is then called the direct product of the functions X and X . When the risk of confusion is low, we will write X ⊗ X and X X interchangeably. Definition 4.10.2. Simple functions. Let n,m ≥ 1 be arbitrary. Let X1 , . . . ,Xn ∈ ∈ L be mutually L be mutually exclusive indicators, and let X1 , . . . ,Xm exclusive indicators. For each i = 1, . . . ,n and j = 1, . . . ,m, let ci,j ∈ R be arbitrary. Then the real-valued function X=
n
m
ci,j X i Xj
i=1 j =1
is called a simple function relative to L ,L . Let L0 denote the set of simple functions on × . Two simple functions are said to be equal if they have the same domain and the same values on the common domain. In other words, equality in L0 is the set-theoretic equality: I (X) =
n
m
ci,j I (X i )I (Xj ).
(4.10.1)
i=1 j =1
Lemma 4.10.3. Simple functions constitute a linear space. As in Definition 4.10.2, let L0 be the set of simple functions on × relative to L ,L . Then the following conditions hold: 1. If X ∈ L0 , then |X|,a ∧ X ∈ L0 for each a > 0. 2. L0 is a linear space. 3. The function I on L0 is linear. Proof. 1. Let X,Y ∈ L0 be arbitrary. We may assume that X is the linear combination in Definition 4.10.2, and that Y is a similar linear combination p q Y ≡ k=1 h=1 bk,h Y k Yh in the notations of Definition 4.10.2. Define X0 ≡ p 1 − ni=1 Xi and X0 ≡ 1 − m j =1 Xj . Similarly, define Y0 ≡ 1 − k=1 Yk and q Y0 ≡ 1− h=1 Yh . For convenience, define ci,0 ≡ c0,j ≡ 0 for each i = 0, . . . ,n and j = 0, . . . ,m. Define bk,0 ≡ b0,h ≡ 0 for each k = 0, . . .,p and h = 0, . . .,q. 2. Let a > 0 be arbitrary. Consider each (ω ,ω ) ∈ domain(X). We have either (i) Xi (ω )Xj (ω ) = 0 for each i = 1, . . . ,n and j = 1, . . . ,m, or (ii)Xk (ω )Xh (ω ) = 1 for exactly one pair of k,h with k = 1, . . . ,n and h = 1, . . . ,m. In case (i), we have
n
n
m m
ci,j Xi (ω )Xj (ω ) = 0 = |ci,j |Xi (ω )Xj (ω ) |X(ω ,ω )| = i=1 j =1 i=1 j =1
Integration and Measure
121
and a ∧ X(ω ,ω ) = a ∧
m n
ci,j Xi (ω )Xj (ω )|
i=1 j =1 m n
(a ∧ ci,j )Xi (ω )Xj (ω ).
=a∧0=
i=1 j =1
Consider case (ii). Then
n
m |X(ω ,ω )| = ci,j Xi (ω )Xj (ω ) = |ck,h Xk (ω )Xh (ω )| i=1 j =1 = |ck,h |Xk (ω )Xh (ω ) =
m n
|ci,j |Xi (ω )Xj (ω ),
i=1 j =1
and a ∧ X(ω ,ω ) = a ∧
n
m
ci,j Xi (ω )Xj (ω ) = a ∧ (ck,h Xk (ω )Xh (ω ))
i=1 j =1
= (a ∧ ck,h )Xk (ω )Xh (ω ) =
m n
(a ∧ ci,j )Xi (ω )Xj (ω ). i=1 j =1
Thus, in either case, the conditions in Definition 4.10.2 are satisfied for the functions |X|,a ∧ X to be simple functions. Assertion 1 is proved. 3. Similarly, we can prove that L0 is closed under scalar multiplication. It remains to show that L0 is also closed under addition, 4. To that end, first note that if Y ∈ L is an arbitrary integrable indicator, then we have Y X0 ≡ Y − ni=1 Y Xi ∈ L , whence Y Xi ∈ L is an integrable indicator for each i = 0, . . . ,n. Similarly for L . 5. Then, in view of the observation in the previous paragraph, (Xi Yk )i=0,...,n;k=0,...,p;(i,k)(0,0) is a double sequence of N mutually exclusive integrable indicators in L , where N ≡ (n + 1)(p + 1) − 1. Let Yk(ν) )ν=0,...,N (Zν )ν=0,...,N ≡ (Xi(ν)
be an arbitrary but fixed rearrangement of the double sequence (Xi Yk )i=0,...,n; k=0,...,p into a single sequence, such that i0 = k0 = 0. Similarly, (Xj Yh )j =0,...,m; h=0,...,q;(j,h)(0,0) is a sequence of M mutually exclusive integrable integrable indicators in L , where M ≡ (m + 1)(q + 1) − 1. Let )μ=0,...,M (Zμ )μ=0,...,M ≡ (Xj (μ) Yh(μ)
122
Probability Theory
be an arbitrary but fixed rearrangement of the double sequence (Xj Yh )j =0,...,m;h=0,...,q into a single sequence, such that j0 = h0 = 0. 6. For each ν = 0, . . . ,N and for each μ = 0, . . . ,M, define the real number aν,μ ≡ ci(ν),j (μ) + bk(ν),h(μ) . Then, for each ν = 0, . . . ,N, we have aν,0 = ci(ν),j (0) + bk(ν),h(0) = ci(ν),0 + bk(ν),0 = 0 + 0 = 0. Similarly, for each μ = 0, . . . ,M, we have a0,μ = 0. 7. Then X+Y ≡
n
m
ci,j X i Xj +
i=0 j =0
=
n
m
ci,j X
i Xj
p q
Y
k Yh
k=0 h=0
q p
⎛ ⎞ m n
bk,h Y k Yh ⎝ X i Xj ⎠ i=0 j =0
k=0 h=0
=
bk,h Y k Yh
k=0 h=0
i=0 j =0
+
q p
p
q m
n
(ci,j + bk,h )(X i Y k )(Xj Yh )
i=0 k=0 j =0 h=0
=
M N
(ci(ν),j (μ) + bk(ν),h(μ) )(X i(ν) Y k(ν) )(Xj (μ) Yh(μ) )
ν=0 μ=0
=
N
M
aν,μ Z ν Zμ =
ν=0 μ=0
N
M
aν,μ Z ν Zμ .
(4.10.2)
ν=1 μ=1
) and (Z , . . . ,Z ) 8. Summing up, we see that the two sequences (Z1 , . . . ,ZN M 1 of mutually exclusive integrable indicators in L and L , respectively, together with the sequence (aν,κ )ν=1,...,N ;μ=1,...,M of real numbers, satisfy the conditions in Definition 4.10.2 for the function
X+Y =
N
M
aν,μ Z ν Zμ
ν=1 μ=1
to be a member of L0 , where X,Y ∈ L0 are arbitrary. Thus L0 is also closed relative to addition. We conclude that L0 is a linear space. Assertion 2 is proved. 9. To see that the operation I is additive, we work backward from equality 4.10.2. More precisely, by the definition of the function I , we have, in view of equality 4.10.2,
Integration and Measure I (X + Y ) ≡
N
M
123
aν,μ I (Z ν )I (Zμ )
ν=1 μ=1
=
N
M
aν,μ I (Z ν )I (Zμ )
ν=0 μ=0
=
N
M
(ci(ν),j (μ) + bk(ν),h(μ) )I (X i(ν) Y k(ν) )I (Xj (μ) Yh(μ) )
ν=0 μ=0
=
p
q n
m
(ci,j + bk,h )I (X i Y k )I (Xj Yh ) i=0 k=0 j =0 h=0
=
m n
ci,j I Xi
i=0 j =0
+
p
q p
m n
Y k I Xj
bk,h I Y k
n
=
⎛
q
Yh
m
⎞ Xj ⎠
j =0
i=0
ci,j I (Xi )I (Xj ) +
h=0
Xi I ⎝Yh
i=0 j =0 n
m
k=0
k=0 h=0
=
q p
bk,h I (Y k )I (Yh )
k=0 h=0
ci,j I (Xi )I (Xj ) +
i=1 j =1
q p
bk,h I (Y k )I (Yh )
k=1 h=1
≡ I (X) + I (Y ). Thus the operation I is additive. Finally, let a ∈ R be arbitrary. Then aX ≡ n m i=1 j =1 (aci,j )X i Xj . Hence I (aX) ≡
n
m
(aci,j )I (X i )I (Xj ) i=1 j =1
⎛ ⎞ m n
= a⎝ ci,j I (X i )I (Xj )⎠ = aI (X). i=1 j =1
Combining, we see that the operation I is linear. 10. Now suppose two simple functions U,V are equal. Suppose |I (U ) − I (V )| > 0. Then, by linearity, we know that X ≡ U − V is a simple function, with |I (X)| > 0. Let X ≡ ni=1 m j =1 ci,j X i Xj in the notations of Definition 4.10.2. Then
m n
ci,j I (X i )I (Xj ) ≡ |I (X)| > 0. i=1 j =1
124
Probability Theory
Hence |ci,j I (X i )I (Xj )| > 0 for some i = 1, . . . ,n and j = 1, . . . ,m. Consequently, |ci,j | > 0 and Xi (ω ) = 1 and Xj (ω ) = 1 for some ω ≡ (ω ,ω ) ∈ domain(Xi Xj ), thanks to the positivity of the integrations I ,I . Therefore |U (ω) − V (ω)| ≡ |X(ω)| = |ci,j X i (ω )Xj (ω )| = |ci,j | > 0, which is a contradiction. Thus we see that I (U ) = I (V ) for arbitrary simple functions with U = V . Thus I is a well-defined function, and we saw earlier that it is linear. Assertion 3 and the lemma are proved. Theorem 4.10.4. Integration on space of simple functions. Let the set L0 of simple functions and the function I on L0 be defined as in Definition 4.10.2. Then the triple (,L0,I ) is an integration space. Proof. We need to verify the three conditions in Definition 4.3.1. 1. The linearity of the space L0 and the linearity of I were proved in Lemma 4.10.3. 2. Next consider any X ∈ L0 , with X = ni=0 m j =0 ci,j X i Xj in the notations of Definition 4.10.2. By Step 2 in the proof of Lemma 4.10.3, we see that n
m
|X| =
|ci,j |X i Xj ∈ L0
i=0 j =0
and a∧X =
n
m
(a ∧ ci,j )X i Xj ∈ L0
i=0 j =0
for each a > 0. Hence n
m
I (X ∧ a) = (a ∧ ci,j )I (X i )I (Xj ) i=0 j =0 m n
→
ci,j I (Xi )I (Xj ) ≡ I (X)
i=0 j =0
as a → ∞. Likewise, I (|X| ∧ a) =
n
m
(a ∧ |ci,j |)I (X i )I (Xj ) → 0 i=0 j =0
as a → 0. Conditions 1 and 4 in Definition 4.3.1 are thus satisfied by the triple (,L0,I ). 3. Next, since I is an integration, there exists, according to Proposition 4.3.4, some nonnegative X ∈ L such that I X = 1. By Condition 4 in Definition 4.3.1, there exists n ≥ 2 so large that I X ∧ n − I X ∧ n−1 > 0. By Assertion 1 of Corollary 4.6.7, there exists t ∈ (0,n−1 ) such that (t < X ) is an integrable set.
Integration and Measure
125
Let X1 be the integrable indicator of the integrable set (t < X ) in ( ,L ,I ). Suppose X (ω ) ∧ n − X (ω ) ∧ n−1 > 0. Then X (ω ) ≥ n−1 > t, whence X1 (ω ) = 1 by the definition of an indicator. Thus X ∧ n − X ∧ n−1 ≤ X1 . Hence I X1 ≥ I X ∧ n − I X ∧ n−1 > 0. Similarly, there exists an integrable indicator X1 in ( ,L ,I ) such that I X1 > 0. Now define the simple function X ≡ X1 X1 . Then I X ≡ I (X1 )I (X1 ) > 0. This proves Condition 3 in Definition 4.3.1 for the triple (,L0,I ). 4. It remains to prove Condition 2 in Definition 4.3.1, the positivity condition. To that end, suppose (Xk )k=0,1,2,... is a sequence of simple functions in L0 , such that Xk ≥ 0 for k ≥ 1 and such that ∞ k=1 I (Xk ) < I (X0 ). For each k ≥ 0, we have Xk =
n(k) m(k)
ck,i,j X k,i Xk,j
i=1 j =1
as in Definition 4.10.2. It follows that n(k) m(k) ∞
ck,i,j I (X k,i )I (Xk,j )≡
k=1 i=1 j =1
∞
I (Xk ) < I (X0 )
k=1
=
n(0) m(0)
c0,i,j I (X 0,i )I (X0,j ).
i=1 j =1
In view of the positivity condition on the integration I , there exists ω ∈ such that n(k) m(k) ∞
ck,i,j X k,i (ω )I (Xk,j )
0 is any bound for f ∈ Cub (S). Hence f (X)Y =
∞
k=1
f (X)Yk ∈ L.
(4.10.5)
Integration and Measure
131
Similarly, hn (X)Y ∈ L and I hn (X)Y =
∞
I hn (X)Yk
k=1
for each n ≥ 0. Now I |hn (X)Yk | ≤ I |Yk | and, by Condition (ii), I hn (X)Yk → Y as n → ∞, for each k ≥ 1. Hence I hn (X)Y → ∞ k=1 I Yk = I Y as n → ∞, where Y ∈ L is arbitrary. 2. In particular, if A is an arbitrary integrable subset of , then I hn (X)1A ↑ I 1A ≡ μ(A), where μ is the measure function relative to the integration I . We have thus verified the conditions in Proposition 4.8.6 for X to be measurable. Assertion 1 is proved. Assertion 2 is proved similarly by symmetry. Assertion 3 follows from Assertions 1 and 2 by induction. Assertion 4 is a special case of Assertion 3, where Ak ≡ (k) for each k = 1, . . . ,n. For products of integrations based on locally compact spaces, the following proposition will be convenient. Proposition 4.10.11. Product of integration spaces based on locally compact metric spaces. Let (S1,d1 ) be an arbitrary locally compact metric space. Let (S1,C(S1,d1 ),I1 ) be an integration space, with completion (S1,L1,I1 ). Let n ≥ 2 be arbitrary. Let (S,d) ≡ (S1n,d1n ) be the nth power metric space of (S1,d1 ). Let ⊗n (S,L,I ) ≡ (S1n,L⊗n 1 ,I1 )
be the nth power integration space of (S1,L1,I1 ). Then C(S,d) ⊂ L, and (S,C(S,d),I ) is an integration space with (S,L,I ) as its completion. Proof. Consider only the case n = 2, with the general case being similar. By Definition 4.10.5, the product integration space (S,L,I ) is the completion of the subspace (S,L0,I ), where L0 is the space of simple functions relative to L1,L1 . Let X ,X be arbitrary members of L1 . When the risk of confusion is low, we will write X ⊗ X and X X interchangeably. 1. Let X ∈ C(S,d) be arbitrary. We need to show that X ∈ L. Since X has compact support, there exists V1,V2 ∈ C(S1,d1 ) such that (i) 0 ≤ V1,V2 ≤ 1 , (ii) if x ≡ (x1,x2 ) ∈ S is such that |X(x)| > 0, then V1 (x1 ) = 1 = V2 (x2 ), and (iii) I V1 > 0, I V2 > 0. 2. Let ε > 0 be arbitrary. Then, according to Assertion 2 of Proposition 3.3.6, there exists a family {gx : x ∈ A} of Lipschitz continuous functions indexed by some discrete finite subset A of S1 such that
X − (4.10.6) X(x,x )gx ⊗ gx ≤ ε, (x,x )∈A2
132
Probability Theory
where · signifies the supremum norm in C(S n,d n ). Multiplication by V1 ⊗ V2 yields, in view of Condition (ii),
X − X(x,x )(gx V1 ⊗ gx V2 ) (4.10.7) < εV1 ⊗ V2 . (x,x )∈A2 Since C(S1,d1 ) ⊂ L1 , we have V1 ⊗ V2 ∈ L and gx V1 ⊗ gx V2 ∈ L for each (x,x ) ∈ A2 , according to Proposition 4.10.6. Since I (εV1 V2 ) > 0 is arbitrarily small, inequality 4.10.7 implies that X ∈ L, thanks to Theorem 4.5.10. Since X ∈ C(S,d) is arbitrary, we conclude that C(S,d) ⊂ L. 3. Since I is a linear function on L, and since C(S,d) is a linear subspace of L, the function I is a linear function on C(S,d). Since I (V1 V2 ) = (I1 V1 )(I1 V2 ) > 0, the triple (S,C(S,d),I ) satisfies Condition (i) of Definition 4.2.1. Condition (ii) of Definition 4.2.1, the positivity condition, follows trivially from the positivity condition of (S,L,I ). Hence (S,C(S,d),I ) is an integration space. Since C(S,d) ⊂ L and since (S,L,I ) is complete, the completion L of C(S,d) relative to I is such that L ⊂ L. 4. We will now show that, conversely, L ⊂ L. To that end, consider any Y1 Y2 ∈ L1 . Let ε > 0 be arbitrary. Then there exists Ui ∈ C(S1,d1 ) such that I1 |Ui − Yi | < ε for each i = 1,2 because L1 is the completion of C(S1,d1 ) relative to I1 , by hypothesis. Consequently, I |Y1 Y2 − U1 U2 | ≤ I |Y1 (Y2 − U2 )| + I |(Y1 − U1 )U2 | = I1 |Y1 | · I1 |Y2 − U2 | + I1 |Y1 − U1 | · I1 |U2 | < I1 |Y1 |ε + ε(I1 |Y2 | + ε), where the equality is provided by Fubini’s Theorem. Since ε > 0 is arbitrarily small, while U1 U2 ∈ C(S,d) ⊂ L, we see that Y1 Y2 ∈ L. Each simple function on S, as in Definition 4.10.2, is a linear combination of functions of the form Y1 Y2 , where Y1,Y2 ∈ L1 . Thus we conclude that L0 ⊂ L. At the same time, L is complete relative to I . Hence the completion L of L0 is a subspace of L. Thus L ⊂ L, as alleged. 5. Summing up the results of Steps 3 and 4, we obtain L = L. In other words, the completion of (S,C(S,d),I ) is (S,L,I ). The proposition is proved. Proposition 4.10.12. The product of σ -finite integration spaces is σ -finite. Let ( ,L ,I ) and ( ,L ,I ) be arbitrary integration spaces that are σ -finite, with I -bases (A k )k=1,2,... and (A k )k=1,2,... , respectively. Then the product integration space (,L,I ) ≡ ( × ,L ⊗ L ,I ⊗ I ) is σ -finite, with an I -basis (Ak )k=1,2,... ≡ (A k × A k )k=1,2,... .
Integration and Measure
133
Proof. By the definition of an I -basis, we have A k ⊂ A k+1 and A k ⊂ A k+1 for each k ≥ 1. Hence, Ak ≡ A k × A k ⊂ A k+1 × A k+1 for each k ≥ 1. Moreover, ∞ ∞ ∞ (Ak × Ak ) = Ak × Ak . k=1
k=1
k=1
Again by the definition of an I -basis, the two unions on the right-hand side are full subsets in , , respectively. Hence the union on the left-hand side is, according to Assertion 2 of Proposition 4.10.6, a full set in . Now let f ≡ 1B 1B , where B ,B are arbitrary integrable subsets in , , respectively. Then I (1A(k) f ) = I (1A (k) 1B )I (1A (k) 1B ) → I (1B )I (1B ) = If as k → ∞. By linearity, it follows that I (1A(k) g) → Ig for each simple function g on × relative to L ,L . Consider each h ∈ L. Let ε > 0 be arbitrary. Since (,L,I ) is the completion of (,L0,I ), where L0 is the space of simple functions on × relative to L ,L , it follows that I |h−g| < ε for some g ∈ L0 . Hence |I 1A(k) f − If | ≤ |I 1A(k) f − I 1A(k) g| + |I 1A(k) g − Ig| + |Ig − If | < ε ≤ I |f − g| + |I 1A(k) g − Ig| + |Ig − If | < 3ε for sufficiently large k ≥ 1. Since ε > 0 is arbitrary, we conclude that I (1A(k) h) → I h. In particular, if A is an arbitrary integrable subset of , we have I (1A(k) 1A ) → I 1A . In other words, μ(Ak A) → μ(A) for each integrable set A ⊂ . We have verified the conditions in Definition 4.8.15 for (,L,I ) to be σ -finite, with (Ak )k=1,2,... ≡ (A k × A k )k=1,2,... as an I -basis. The next definition establishes some familiar notations for the special cases of the Lebesgue integration space on R n . Definition 4.10.13. Lebesgue integration on R n . Let (R,L, ·dx) denote the Lebesgue integration space on R, as in Definition 4.9.9. The product integration space n ! ! n n ! n R, L, ·dx R ,L, · · · ·dx1 . . . dxn ≡ i=1
i=1
i=1
is called the Lebesgue integration space of dimension n. Similar terminology n n applies when R n is replaced by an interval i=1 [si ,ti ] ⊂ R . When confusion is unlikely, we will also abbreviate · · · ·dx1 . . . dxn to ·dx, with the undern function standing that thedummy variable x is now a member of R . An integrable relative to · · · ·dx1 . . . dxn will be called Lebesgue integrable on R n .
134
Probability Theory
Corollary 4.10.14. Restriction of the Lebesgue integration on R n to C(R n ) is an integration whose completion is the Lebesgue integration on R n . Let n ≥ 1 be arbitrary. Then, in the of Definition 4.10.13, we have C(R n ) ⊂ L. notations n n and its comMoreover, (R ,C(R ), · · · ·dx1 . . . dxn ) is an integration space, pletion is equal to the Lebesgue integration space (R n,L, · · · ·dx1 . . . dxn ). Proof. Let Si ≡ R for each i = 1, . . . ,n. Let S ≡ R n . Proposition 4.10.11 then applies and yields the desired conclusions. Definition 4.10.15. Product of countably many probability integration spaces. For each n ≥ 1, let ((n),L(n),I (n) ) be a complete integration space such that (i) 1 ∈ L(n) with I (n) 1 = 1. Consider the Cartesian product ≡ ∞ i=1 . Let n n n (i) (i) (i) be the product of the I n ≥ 1 be arbitrary. Let i=1 , i=1 L , i=1 n (i) , define a function L first n complete integration spaces. For each g ∈ i=1 ∞ (i) g on by domain(g) ≡ domain(g) × i=n+1 , and by g(ω1,ω2, . . .) ≡ g(ω1, . . . ,ωn ) for each (ω1,ω2, . . .) ∈ domain(g). Let Gn ≡ g : g ∈
n
L(i) .
i=1
Then Gn ⊂ Gn+1 . Let L ≡ ∞ n=1 Gn and define a function I : L → R by n (i) (g) if g ∈ G , for each g ∈ L. Theorem 4.10.16 says that L I (g) ≡ I n i=1 is a linear space, that I is a well-defined linear function, and that (,L,I ) is an integration ∞space. ∞ (i) ∞ (i) (i) denote the completion of (,L,I ), and Let i=1 , i=1 L , i=1 I call it the product of the sequence ((n),L(n),I (n) )n=1,2,... of complete integration spaces. In the special case where ((i),L(i),I (i) ) = (0,L0,I0 ) for each i ≥ 1, for some complete integration space (0,L0,I0 ), then we write ⊗∞ ⊗∞ (∞ ) 0 ,L0 ,I0
≡
∞ i=1
(i)
,
∞ i=1
(i)
L ,
∞
I
(i)
i=1
and call it the countable power of the integration space (0,L0,I0 ). Theorem 4.10.16. The countable product of probability integration spaces is a well-defined integration space. For each n ≥ 1, let ((n),L(n),I (n) ) be a complete integration space such that 1 ∈ L(n) with I (n) 1 = 1. Then the following conditions hold: 1. The set L of functions is a linear space. Moreover, I is a well-defined linear function and (,L,I ) is an integration space. 2. Let N ≥ 1 be arbitrary. Let Z (N ) be a measurable function on ((N ), (N L ),I (N ) ) with values in some complete metric space (S,d). Define the function (N ) (N ) Z : → S by Z (ω) ≡ Z (N ) (ωi ) for each ω ≡ (ω1,ω2, . . .) ∈ such
Integration and Measure
135
that ωN ≡ (ω1,ω2, . . . ,ωN ) ∈ domain(Z (N ) ). Let M ≥ 1 be arbitrary. Let fj ∈ Cub (S,d) be arbitrary for each j ≤ M. Then ⎛ ⎞ M M (j ) (j ) I⎝ fj (Z ⎠ = I (j ) fj (Z ). j =1
j =1
3. For N ≥ 1, the function Z each ∞ (i) ∞ (i), L ,I . space i=1 i=1
(N )
is measurable on the countable product
Proof. 1. Obviously, Gn and L are linear spaces. Suppose g = h for some g ∈ Gn and h ∈ Gm with n ≤ m. Then h(ω1, . . . ,ωm ) ≡ h(ω1,ω2, . . .) = g(ω1,ω2, . . .) ≡ g(ω1, . . . ,ωn ) = g(ω1, . . . ,ωn )1A (ωn+1, . . . ,ωm ), where A ≡ i=n+1 (i) . Then 1A = m i=n+1 1(i) . Since, by hypothesis, 1(i) = (i) (i) 1 ∈ L with I 1 = 1, for each
i ≥ 1, Fubini’s Theorem implies that 1A ∈ m m (i), with (i) (1 ) = 1, and that L I A i=n+1 i=n+1 m m I (h) ≡ I (i) (h) = I (i) (g ⊗ 1A ) m
i=1
i=1
⎛ ⎞ n m =⎝ I (i) ⊗ I (i) ⎠ (g ⊗ 1A ) i=1
=
n i=1
i=n+1
I (i)
⎞ n ⎛ m (i) ⎠ (i) ⎝ I I (1A ) = (g) · (g) · 1 = I (g). i=n+1
i=1
Thus the function I is well defined. The linearity of I is obvious. The verification of the other conditions in Definition 4.3.1 is straightforward. Accordingly, (,L,I ) is an integration space. 2. In view of Fubini’s Theorem (Theorem 4.10.7), the proofs of Assertions 2 and 3 are straightforward and omitted. Following are two results that will be convenient. Proposition 4.10.17. The region below the graph of a nonnegative integrable function is an integrable set in the product space. Let (Q,L,I ) be a complete integration space that is σ -finite. Let (,,I0 ) ≡ (,, ·dr) be the Lebesgue integration space based on ≡ R or ≡ [0,1]. Let λ : Q → R be an arbitrary measurable function on (Q,L,I ). Then the following conditions hold: 1. The sets Aλ ≡ {(t,r) ∈ Q × : r ≤ λ(t)}
136
Probability Theory
and A λ ≡ {(t,r) ∈ Q × : r < λ(t)} are measurable on (Q,L,I ) ⊗ (,,I0 ). 2. Suppose, in addition, that λ is a nonnegative integrable function. Then the sets Bλ ≡ {(t,r) ∈ Q × : 0 ≤ r ≤ λ(t)} and Bλ ≡ {(t,r) ∈ Q × : 0 ≤ r < λ(t)} are integrable, with (I ⊗ I0 )Bλ = I λ = (I ⊗ I0 )Bλ .
(4.10.8)
Proof. 1. Let g be the identity function on , with g(r) ≡ r for each r ∈ . By Proposition 4.10.10, g and λ can be regarded as measurable functions on Q × , with values in . Define the function f : Q × → R by f (t,r) ≡ g(r) − λ(t) ≡ r − λ(t) for each (t,r) ∈ Q × . Then f is the difference of two real-valued measurable functions on Q × . Hence f is measurable. Therefore there exists a sequence (an )n=1,2,... in (0,∞) with an ↓ 0 such that (f ≤ an ) is measurable for each n ≥ 1. We will write an and a(n) interchangeably. 2. Let A ⊂ Q and B ⊂ be arbitrary integrable subsets of Q and , respectively. Let h : → be the identity function, with h(r) ≡ r for each r ∈ . Let m ≥ n be arbitrary. Then
I ⊗ I0 1(f ≤a(n))(A×B) − 1(f ≤a(m))(A×B) = I ⊗ I0 (1(a(m) t) < a for each i = 1,2, . . ..” The purpose of this convention is to obviate unnecessary distraction from the main arguments. If, for another example, the measurability of the set (X ≤ 0) is required in a discussion, we would need to first supply a proof that 0 is a regular point of X or, instead of (X ≤ 0), use (X ≤ a) as a substitute, where a is some regular point near 0. Unless the exact value 0 is essential to the discussion, the latter, usually effortless alternative will be used. The implicit assumption of regularity of the point a is clearly possible, for example, when we have the freedom to pick the number a from some open interval. This is thanks to Proposition 4.8.14, which says that all but countably many real numbers are regular points of X. Classically, all t ∈ R are regular points for each r.r.v. X, so this convention would be redundant classically. In the case of a measurable indicator X, the only possible values, 0 and 1, are regular points. Separately, recall that the indicator 1A and the complement Ac of an event are uniquely defined relative to a.s. equality. Proposition 5.1.4. Basic properties of r.v.’s. Let (,L,E) be a probability space. 1. Suppose A is an event. Then Ac is an event. Moreover, (Ac )c = A and P (Ac ) = 1 − P (A).
Probability Space
141
2. A subset A of is a full set iff it is an event with probability 1. 3. Let (S,d) be a complete metric space. A function X : → S is an r.v. with values in (S,d) iff (i) f (X) ∈ L for each f ∈ Cub (S,d) and (ii) P (d(X,x◦ ) ≥ a) → 0 as a → ∞. Note that if the metric d is bounded, then Condition (ii) is automatically satisfied. 4. Let (S,d) be a complete metric space, with a reference point x◦ . For each n ≥ 1, define hn ≡ 1 ∧ (1 + n − d(·,x◦ ))+ ∈ Cub (S). Then a function X : → S is an r.v. iff (i) f (X) ∈ L for each f ∈ Cub (S) and (iii) Ehn (X) ↑ 1 as n → ∞. In that case, we have E|f (X) − f (X)hn (X)| → 0, where f hn ∈ C(S) 5. Let (S,d) be a locally compact metric space, with a reference point x◦ . For each n ≥ 1, define the function hn as in Assertion 4. Then hn ∈ C(S). A function X : → S is an r.v. iff (iv) f (X) ∈ L for each f ∈ C(S) and (iii) Ehn (X) ↑ 1 as n → ∞. In that case, for each f ∈ Cub (S), there exists a sequence (gn )n=1,2,... in C(S) such that E|f (X) − gn (X)| → 0. 6. If X is an integrable r.r.v. and A is an event, then EX = E(X;A)+E(X;Ac ). 7. A point t ∈ R is a regular point of an r.r.v. X iff it is a regular point of X relative to . 8. If X is an r.r.v. such that (t − ε < X < t) ∪ (t < X < t + ε) is a null set for some t ∈ R and ε > 0, then the point t ∈ R is a regular point of X. Proof. 1. Suppose A is an event with indicator 1A and complement Ac = (1A = 0). Because 1 is integrable, so is 1 − 1A . At the same time, Ac = (1A = 0) = (1 − 1A = 1). Hence Ac is an event with indicator 1 − 1A . Moreover, P (Ac ) = E(1 − 1A ) = 1 − P (A). Repeating the argument with the event Ac , we see that (Ac )c = (1 − (1 − 1A ) = 1) = (1A = 1) = A. 2. Suppose A is a full set. Since any two full sets are equal a.s., we have A = a.s. Hence P (A) = P () = 1. Conversely, if A is an event with P (A) = 1, then according to Assertion 1, Ac is a null set with A = (Ac )c . Hence by Assertion 4 of Proposition 4.5.5, A is a full set. 3. Suppose X is an r.v. Since is an integrable set, Conditions (i) and (ii) of the present proposition hold as special cases of Conditions (i) and (ii) in Definition 4.8.1 when we take A = . Conversely, suppose Conditions (i) and (ii) of the present proposition hold. Let f ∈ Cub (S) be arbitrary and let A be an arbitrary integrable set. Then f (X) ∈ L by Condition (i), and so f (X)1A ∈ L. Moreover P (d(x◦,X) ≥ a)A ≤ P (d(x◦,X) ≥ a) → 0 as a → ∞. Thus Conditions (i) and (ii) in Definition 4.8.1 are established for X to be a measurable function. In other words, X is an r.v. Assertion 3 is proved. 4. Given Condition (i), then Conditions (ii) and (iii) are equivalent to each other, according to Proposition 4.8.6, whence Condition (ii) holds iff the function X is an r.v. Thus Assertion 4 follows from Assertion 3.
142
Probability Theory
5. Suppose (S,d) is locally compact. Suppose Condition (iii) holds. We will first verify that Conditions (i) and (iv) are then equivalent. Trivially, Condition (i) implies Condition (iv). Conversely, suppose Condition (iv) holds. Let f ∈ Cub (S) be arbitrary. We need to prove that f (X) ∈ L. There is no loss of generality in assuming that 0 ≤ f ≤ b for some b > 0. Then E(f (X)hm (X) − f (X)hn (X)) ≤ bE(hm (X) − hn (X)) → 0 as m ≥ n → ∞, thanks to Condition (iii). Thus Ef (X)hn (X) converges as n → ∞. Hence the Monotone Convergence Theorem implies that limn→∞ f (X)hn (X) is integrable. Since limn→∞ f hn = f on S, it follows that f (X) = limn→∞ f (X)hn (X) ∈ L. Thus Condition (i) holds. Summing up, given Condition (iii), the Conditions (i) and (iv) are equivalent to each other, as alleged. Hence Conditions (iii) and (iv) together are equivalent to Conditions (iii) and (i) together, which is equivalent to the function X being an r.v. Moreover, in that case, the Monotone Convergence Theorem implies that E|f (X)hn (X) − f (X)| → 0, where f hn ∈ C(S) for each n ≥ 1. Assertion 5 is proved. 6. EX = EX(1A +1Ac ) = EX1A +EX1Ac ≡ E(X;A)+E(X;Ac ). Assertion 6 is verified. 7. Assertion 7 follows easily from Definition 4.6.8. 8. Suppose X is an r.r.v. such that B ≡ (t −ε < X < t)∪(t < X < t +ε) is a null set for some t ∈ R and ε > 0. Let (sn )n=1,2,... be a sequence of regular points of X in (t,t + ε) that decreases to t. Then (sn < X) = (sn+1 < X) a.s. for each n ≥ 1, because (sn+1 < X ≤ sn ) ⊂ B is a null set. Hence P (sn < X) = P (sn+1 < X) for each n ≥ 1. Therefore limn→∞ P (sn < X) trivially exists. Similarly, there exists a sequence (rn )n=1,2,... of regular points of X in (t − ε,t) that increases to t such that limn→∞ P (rn < X) exists. The conditions in Definition 4.8.13 have been proved for t to be a regular point of X. The proposition is proved. We will make heavy use of the following Borel–Cantelli Lemma. Proposition 5.1.5. First Borel–Cantelli Lemma. Suppose (An )n=1,2,... is a only a finite sequence of events such that ∞ n=1 P (An ) converges. Then ∞ a.s. ∞ c number of the events An occur. More precisely, we have P k=1 n=k An = 1. Proof. By Assertion 4 of Proposition 4.5.8, for each k ≥ 1, the union Bk ≡ ∞ ∞ ∞ c An is an event, with P (Bk ) ≤ n=k P (An ) → 0. Then n=k An ⊂ n=k ∞ c n=k+1 An for each k ≥ 1, and P
∞
Acn
= P (Bkc ) → 1.
n=k
Therefore, by Assertion 1 of Proposition 4.5.8, the union
∞ ∞ c = 1. A event, with P k=1 n=k n
∞ ∞ k=1
n=k
Acn is an
Probability Space
143
Definition 5.1.6. Lp space. Let X,Y be arbitrary r.r.v.’s. Let p ∈ [1,∞) be arbitrary. If Xp is integrable, define X p ≡ (E|X|p )1/p . Define Lp to be the family of all r.r.v.’s X such that Xp is integrable. We will refer to X p as the Lp -norm of X. Let n ≥ 1 be an integer. If X ∈ Ln , then E|X|n is called the n-th absolute moment, and EXn the n-th moment, of X. If X ∈ L1 , then EX is also called the mean of X. If X,Y ∈ L2 , then according to Proposition 5.1.7 (proved next), X,Y, and (X − EX)(Y − EY ) are integrable. Then E(X − EX)2 and E(X − EX)(Y − EY ) are called the variance of X and the covariance of X and Y , respectively. The square root of the variance of X is called the standard deviation of X. Next are several basic inequalities for Lp . Proposition 5.1.7. Basic inequalities in Lp . Let p,q ∈ [1,∞) be arbitrary. 1. Hoelder’s inequality. Suppose p,q > 1 and p1 + q1 = 1. If X ∈ Lp and Y ∈ Lq , then XY ∈ L1 and E|XY | ≤ X p Y q . The special case where p = q = 2 is referred to as the Cauchy–Schwarz inequality. 2. Minkowski’s inequality. If X,Y ∈ Lp , then X + Y ∈ Lp and X + Y p ≤
X p + Y p . 3. Lyapunov’s inequality. If p ≤ q and X ∈ Lq , then X ∈ Lp and X p ≤
X q . Proof. 1. Write α,β for p1 , q1 , respectively. Then x α y β ≤ αx + βy for nonnegative x,y. This can be seen by noting that with y fixed, the function f defined by f (x) ≡ αx + βy − x α y β is equal to 0 at x = y, is decreasing for x < y, and is increasing for x > y. Let a,b ∈ R be arbitrary with a > X p and b > Y q . Replacing x,y by |X/a|p,|Y /b|q , respectively, we see that |XY | ≤ (α|X/a|p + β|Y /b|q )ab. It follows that |XY | is integrable, with the integral bounded by p
q
E|XY | ≤ (α X p /a p + β Y q /bq )ab. As a → X p and b → Y q , the last bound approaches X p Y q . p . Then p1 + q1 = 1. Because |X+Y |p ≤ 2. Suppose first that p > 1. Let q ≡ p−1 (2(|X| ∨ |Y |))p ≤ 2p (|X|p + |Y |p ), we have X + Y ∈ Lp . It follows trivially that |X + Y |p−1 ∈ Lq . Applying Hoelder’s inequality, we estimate E|X + Y |p ≤ E|X + Y |p−1 |X| + E|X + Y |p−1 |Y | ≤ (E|X + Y |(p−1)q )1/q ( X p + Y p ) = (E|X + Y |p )1/q ( X p + Y p ).
(5.1.1)
Suppose X + Y p > X p + Y p . Then inequality 5.1.1, when divided by (E|X + Y |p )1/q , would imply X + Y p = (E|X + Y |p )1−1/q ≤ X p + Y p ,
144
Probability Theory
which is a contradiction. This proves Minkowski’s inequality for p > 1. Suppose now p ≥ 1. Then |X|r ,|Y |r ∈ Lp/r for any r < 1. The preceding proof of the special case of Minkowski’s inequality for the exponent pr > 1 therefore implies (E(|X|r + |Y |r )p/r )r/p ≤ (E(|X|p )r/p + (E(|Y |p )r/p .
(5.1.2)
Since (|X|r + |Y |r )p/r ≤ 2p/r (|X|r ∨ |Y |r )p/r = 2p/r (|X|p ∨ |Y |p ) ≤ 2p/r (|X|p + |Y |p ) ∈ L, we can let r → 1 and apply the Dominated Convergence Theorem to the lefthand side of inequality 5.1.2. Thus we conclude that (|X| + |Y |)p ∈ L and that (E(|X| + |Y |)p )1/p ≤ (E(|X|p )1/p + (E(|X|p )1/p . Minkowski’s inequality is proved. 3. Since |X|p ≤ 1∨|X|q ∈ L, we have X ∈ Lp . Suppose E|X|p > (E|X|q )p/q . Let r ∈ (0,p) be arbitrary. Clearly, |X|r ∈ Lq/r . Applying Hoelder’s inequality to |X|r and 1, we obtain E|X|r ≤ (E|X|q )r/q . At the same time, |X|r ≤ 1 ∨ |X|q ∈ L. As r → p, the Dominated Convergence Theorem yields E|X|p ≤ (E|X|q )p/q , establishing Lyapunov’s inequality. Next we restate and simplify some definitions and theorems of convergence of measurable functions, in the context of r.v.’s. Definition 5.1.8. Convergence in probability, a.u., a.s., and in L1 . For each n ≥ 1, let Xn,X be functions on the probability space (,L,E), with values in the complete metric space (S,d). 1. The sequence (Xn ) is said to converge to X almost uniformly (a.u.) on the probability space (,L,E) if Xn → X a.u. on the integration space (,L,E). In that case, we write X = a.u. limn→∞ Xn . Since (,L,E) is a probability space, is a full set. It can therefore be easily verified that Xn → X a.u. iff for each ε > 0, there exists a measurable set B with P (B) < ε such that Xn converges to X uniformly on B c . 2. The sequence (Xn ) is said to converge to X in probability on the probability space (,L,E) if Xn → X in measure. Then we write Xn → X in probability. It can easily be verified that Xn → X in probability iff for each ε > 0, there exists p ≥ 1 so large that for each n ≥ p, there exists a measurable set Bn with P (Bn ) < ε such that Bnc ⊂ (d(Xn,X) ≤ ε). 3. The sequence (Xn ) is said to be Cauchy in probability if it is Cauchy in measure. It can easily be verified that (Xn ) is Cauchy in probability iff for each ε > 0 there exists p ≥ 1 so large that for each m,n ≥ p, there exists a c ⊂ (d(Xn,Xm ) ≤ ε). measurable set Bm,n with P (Bm,n ) < ε such that Bm,n 4. The sequence (Xn ) is said to converge to X almost surely (a.s.) if Xn → X a.e.
Probability Space
145
Proposition 5.1.9. a.u. Convergence implies convergence in probability, etc. For each n ≥ 1, let X,Xn be functions on the probability space (,L,E), with values in the complete metric space (S,d). Then the following conditions hold: 1. If Xn → X a.u., then (i) X is defined a.e., (ii) Xn → X in probability, and (iii) Xn → X a.s. 2. If (i) Xn is an r.v. for each n ≥ 1 and (ii) Xn → X in probability, then X is an r.v. 3. If (i) Xn is an r.v. for each n ≥ 1 and (ii) Xn → X a.u., then X is an r.v. 4. If (i) Xn is an r.v. for each n ≥ 1 and (ii) (Xn )n=1,2,... is Cauchy in probability, then there exists a subsequence (Xn(k) )k=1,2,... such that X ≡ limk→∞ Xn(k) is an r.v., with Xn(k) → X a.u. and Xn(k) → X a.s. Moreover, Xn → X in probability. 5. Suppose (i) Xn,X are r.r.v.’s for each n ≥ 1, (ii) Xn ↑ X in probability, and (iii) a ∈ R is a regular point of Xn,X for each n ≥ 0. Then P ((Xn > a)B) ↑ P ((X > a)B) as n → ∞, for each measurable set B. Proof. Assertions 1–3 are trivial consequences of the corresponding assertions in Proposition 4.9.3. Assertion 4 is a trivial consequence of Proposition 4.9.4. It remains to prove Assertion 5. To that end, let ε > 0 be arbitrary. Then, by Condition (iii), a ∈ R is a regular point of the r.r.v. X. Hence there exists a > a such that P (a ≥ X > a) < ε. Since by Condition (ii), Xn ↑ X in probability, there exists m ≥ 1 so large that P (X − Xn > a − a) < ε for each n ≥ m. Now let n ≥ m be arbitrary. Let A ≡ (a ≥ X > a) ∪ (X − Xn > a − a). Then P (A) < 2ε. Moreover, P ((X > a)B) − P ((Xn > a)B) ≤ P (X > a;Xn ≤ a) = P ((X > a;Xn ≤ a)Ac ) + P (A) < P ((X > a) ∩ (Xn ≤ a) ∩ ((a < X) ∪ (X ≤ a)) ∩ (X − Xn ≤ a − a)) + 2ε = P ((Xn ≤ a) ∩ (a < X) ∩ (X − Xn ≤ a − a)) + 2ε = 0 + 2ε = 2ε. Since P (A) < 2ε is arbitrarily small, we see that P ((Xn > a)B) ↑ P ((X > a)B), as alleged in Assertion 5. The next definition and proposition say that convergence in probability can be metrized. Definition 5.1.10. Probability metric on the space of r.v.’s. Let (,L,E) be a probability space. Let (S,d) be a complete metric space. We will let M(,S) denote the space of r.v.’s on (,L,E) with values in (S,d), where two r.v.’s are considered equal if they are equal a.s. Define the metric ρP rob (X,Y ) ≡ E(1 ∧ d(X,Y ))
(5.1.3)
for each X,Y ∈ M(,S). The next proposition proves that ρP rob is indeed a metric. We will call ρP rob the probability metric on the space M(,S) of r.v.’s.
146
Probability Theory
Proposition 5.1.11. Basics of the probability metric ρP rob on the space M(,S) of r.v.’s. Let (,L,E) be a probability space. Let X,X1,X2, . . . be arbitrary r.v.’s with values in the complete metric space (S,d). Then the following conditions hold: 1. The pair (M(,S),ρP rob ) is a metric space. Note that ρP rob ≤ 1. 2. Xn → X in probability iff for each ε > 0, there exists p ≥ 1 so large that P (d(Xn,X) > ε) < ε for each n ≥ p. 3. Sequential convergence relative to ρP rob is equivalent to convergence in probability. 4. The metric space (M(,S),ρP rob ) is complete. 5. Suppose there exists a sequence (εn )n=1,2,... of positive real numbers such 2 that ∞ n=1 εn < ∞ and such that ρP rob (Xn,Xn+1 ) ≡ E(1 ∧ d(Xn,Xn+1 )) < εn for each n ≥ 1. Then Y ≡ limn→∞ Xn is an r.v., and Xn → Y a.u. Proof. 1. Let X,Y ∈ M(,S) be arbitrary. Then d(X,Y ) is an r.r.v according to Proposition 4.8.20. Hence 1 ∧ d(X,Y ) is an integrable function, and ρP rob is well defined in equality 5.1.3. Symmetry and triangle inequality for the function ρP rob are obvious from its definition. Suppose ρP rob (X,Y ) ≡ E(1 ∧ d(X,Y )) = 0. Let (εn )n=1,2,... be a sequence in (0,1) with εn ↓ 0. Then Chebychev’s inequality implies P (d(X,Y ) > εn ) = P (1 ∧ d(X,Y ) > εn ) ≤ εn−1 E(1 ∧ d(X,Y )) = 0 for each n ≥ 1. Hence A ≡ ∞ n=1 (d(X,Y ) > εn ) is a null set. On the full set c A , we have d(X,Y ) ≤ εn for each n ≥ 1. Therefore d(X,Y ) = 0 on the full set Ac . Thus X = Y in M(,S). Summing up, ρP rob is a metric. 2. Suppose Xn → X in probability. Let ε > 0 be arbitrary. Then, according to Definition 5.1.8, there exists p ≥ 1 so large that for each n ≥ p, there exists an integrable set Bn with P (Bn ) < ε and Bnc ⊂ (d(Xn,X) ≤ ε). Now consider each n ≥ p. Then P (d(Xn,X) > ε) ≤ P (Bn ) < ε for each n ≥ p. Conversely, suppose, for each ε > 0, there exists p ≥ 1 so large that P (d(Xn,X) > ε) < ε for each n ≥ p. Let ε > 0 be arbitrary. Consider each n ≥ 1. Define the integrable set Bn ≡ (d(Xn,X) > ε). Then P (Bn ) < ε and Bnc ⊂ (d(Xn,X) ≤ ε). Hence Xn → X in probability according to Definition 5.1.8. Assertion 2 is proved. 3. Suppose ρP rob (Xn,X) ≡ E(1 ∧ d(Xn,X)) → 0. Let ε > 0 be arbitrary. Take p ≥ 1 so large that E(1 ∧ d(Xn,X)) < ε(1 ∧ ε) for each n ≥ p. Then Chebychev’s inequality implies that P (d(Xn,X) > ε) ≤ P (1 ∧ d(Xn,X) ≥ 1 ∧ ε) ≤ (1 ∧ ε)−1 E(1 ∧ d(Xn,X)) < ε for each n ≥ p. Thus Xn → X in probability, by Assertion 2. Conversely, suppose Xn → X in probability. Then, by Assertion 2, there exists p ≥ 1 so large that P (d(Xn,X) > ε) < ε for each n ≥ p. Hence
Probability Space
147
E(1 ∧ d(Xn,X)) = E(1 ∧ d(Xn,X))1(d(X(n),X)>ε) + E(1 ∧ d(Xn,X))1(d(X(n),X)≤ε) ≤ E1(d(X(n),X)>ε) + ε = P (d(Xn,X) > ε) + ε < 2ε for each n ≥ p. Thus ρP rob (Xn,X) ≡ E(1 ∧ d(Xn,X)) → 0. Assertion 3 is proved. 4. Suppose ρP rob (Xn,Xm ) ≡ E(1 ∧ d(Xn,Xm )) → 0 as n,m → ∞. Let ε > 0 be arbitrary. Take p ≥ 1 so large that E(1 ∧ d(Xn,Xm )) < ε(1 ∧ ε) for each n,m ≥ p. Then Chebychev’s inequality implies that P (d(Xn,Xm ) > ε) ≤ P (1 ∧ d(Xn,Xm ) ≥ 1 ∧ ε) ≤ (1 ∧ ε)−1 E(1 ∧ d(Xn,Xm )) < ε for each n,m ≥ p. Thus the sequence (Xn )n=1,2,... of functions is Cauchy in probability. Hence Proposition 4.9.4 implies that X ≡ limk→∞ Xn(k) is an r.v. for some subsequence (Xn(k) )k=1,2,... of (Xn )n=1,2,... , and that Xn → X in probability. By Assertion 3, it then follows that ρP rob (Xn,X) → 0. Thus the metric space (M(,S),ρP rob ) is complete, and Assertion 4 is proved. 5. Suppose there exists a sequence (εn )n=1,2,... of positive real numbers such that ∞ 2 n=1 εn < ∞, and such that ρP rob (Xn,Xn+1 ) ≡ E(1 ∧ d(Xn,Xn+1 )) < εn for each n ≥ 1. Trivially, the probability space (,L,E) is σ -finite with the sequence (Ai )i=1,2,... as an I -basis, where Ai ≡ for each i ≥ 1. Then, for each i ≥ 1, we have I (1 ∧ d(Xn,Xn+1 ))1A(i) < εn2 for each n ≥ 1. Hence Proposition 4.9.6 is applicable; it implies that X ≡ limn→∞ Xn is an r.v., and Xn → X a.u. Assertion 5 and the proposition are proved. Corollary 5.1.12. Reciprocal of an a.s. positive r.r.v. Let X be a nonnegative r.r.v. such that P (X < a) → 0 as a → 0. Define the function X−1 by domain(X−1 ) ≡ D ≡ (X > 0) and X−1 (ω) ≡ (X(ω))−1 for each ω ∈ D. Then X−1 is an r.r.v. Proof. Let a1 > a2 > · · · > 0 be a sequence such that P Dkc → 0, where Dk ≡ P (X ≥ ak ) for each k ≥ 1. Then D = ∞ k=1 Dk , whence D is a full set. Let j ≥ k ≥ 1 be arbitrary. Define the r.r.v. Yk ≡ (X ∨ ak )−1 1D(k) . Then X−1 1D(k) = Yk . Moreover, Yj ≥ Yk and (Yj − Yk > 0) ⊂ (1D(j ) − 1D(k) > 0) = Dj Dkc ⊂ Dkc . Consequently, since P Dkc → 0 as k → ∞, the sequence (Yk )k=1,2,... converges a.u. Hence, according to Proposition 5.1.9, Y ≡ limk→∞ Yk is an r.r.v. Since X−1 = Y on the full set D, the function X−1 is an r.r.v. We saw in Proposition 5.1.11 that convergence in L1 of r.r.v.’s implies convergence in probability. The next proposition gives the converse in the case of uniform integrability.
148
Probability Theory
Proposition 5.1.13. Uniform integrability of a sequence of r.r.v.’s and convergence in probability together imply convergence in L1 . Suppose (Xn )n=1,2,... is a uniformly integrable sequence of r.r.v.’s. Suppose Xn → X in probability for some r.r.v. X. Then the following conditions hold: 1. limn→∞ E|Xn | exists. 2. X is integrable and Xn → X in L1 . Proof. 1. Let ε > 0 be arbitrary. By hypothesis, the sequence (Xn )n=1,2,... is uniformly integrable. Hence, by Proposition 4.7.3, there exist (i) b ≥ 0 so large that E|Xn | ≤ b for each n ≥ 1 and (ii) δ(ε) ∈ (0,ε) so small that E(|Xn |;A) < ε for each n ≥ 1 and for each event A with P A < δ(ε). 2. Since, again by hypothesis, Xn → X in probability, there exists p(ε) ≥ 1 so large that P (|X − Xn | > δ(ε)) < δ ≡ δ(ε) for each n ≥ p(ε). Let m,n ≥ p(ε) be arbitrary. Then |E|Xm | − E|Xn || ≤ E|Xm − Xn | ≤ (E|Xm − Xn |;|Xm − Xn | ≤ 2δ) + (E|Xm − Xn |;|Xm − Xn | > 2δ) ≤ 2δ + E(|Xm | + |Xn |;|X − Xn | ∨ |X − Xm | > δ) ≤ 2ε + E(|Xm | + |Xn |;(|X − Xn | > δ) ∪ (|X − Xm | > δ)) = 2ε + E(|Xm |;|X − Xn | > δ) + E(|Xm |;|X − Xm | > δ) + E(|Xn |;|X − Xn | > δ) + E(|Xn |;|X − Xm | > δ) < 2ε + ε + ε + ε + ε = 6ε.
(5.1.4)
Thus (E|Xn |)n=1,2,... is a Cauchy sequence. Consequently, c ≡ limn→∞ E|Xn | exists. Assertion 1 is proved. 3. Letting m → ∞ while keeping n = p(ε) in inequality 5.1.4, we obtain |c − E|Xp(ε) || ≤ 6ε.
(5.1.5)
Moreover, inequality 5.1.4 implies that |E(|Xm | ∧ a) − E(|Xn | ∧ a)| ≤ |E|Xm | − E|Xn || < 6ε
(5.1.6)
for each m,n ≥ p(ε), for each a > 0. Again, let m → ∞, while keeping n = p(ε). Then E(|Xm | ∧ a) → E(|X| ∧ a) as m → ∞, by the Dominated Convergence Theorem. Hence inequality 5.1.6 yields |E(|X| ∧ a) − E(|Xp(ε) | ∧ a)| ≤ 6ε.
(5.1.7)
4. Next note that Xp(ε) is, by hypothesis, an integrable r.r.v. Hence there exists aε > 0 so large that |E|Xp(ε) | − E(|Xp(ε) | ∧ a)| < ε for each a > aε . Then, for each a > aε , we have
(5.1.8)
Probability Space
149
|c − E(|X| ∧ a)| ≤ |c − E|Xp(ε) | + |E|Xp(ε) | − E(|Xp(ε) | ∧ a)| + |E(|Xp(ε) | ∧ a) − E(|X| ∧ a)| ≤ 6ε + ε + 6ε = 13ε, where the second inequality is thanks to inequalities 5.1.5, 5.1.7, and 5.1.8. Since ε > 0 is arbitrary, we see that E(|X| ∧ a) ↑ c as a → ∞. 5. Therefore, by the Monotone Convergence Theorem, the r.r.v. |X| is integrable, with expectation equal to c ≡ limn→∞ E|Xn |. Consequently, the r.r.v. X is integrable. 6. At the same time, inequality 5.1.4 yields E|Xm − Xn | ≤ 6ε for each m,n ≥ p(ε). Now consider each fixed n ≥ p(ε). Then the nonnegative sequence ((Xm − Xn ) ∨ 0)m=1,2,... of r.r.v.’s converges in probability to the r.r.v. (X − Xn ) ∨ 0. Hence, by Step 5, E((X − Xn ) ∨ 0) = limm→∞ E((Xm − Xn ) ∨ 0). Similarly, E(−((X − Xn ) ∧ 0)) = limm→∞ E(−((Xm − Xn ) ∧ 0)). Combining, E|X − Xn | = E((X − Xn ) ∨ 0) − E((X − Xn ) ∧ 0) = lim E((Xm − Xn ) ∨ 0) − lim E((Xm − Xn ) ∧ 0) m→∞
m→∞
= lim E((Xm − Xn ) ∨ 0) − E((Xm − Xn ) ∧ 0) m→∞
= lim E|Xm − Xn | ≤ 6ε, m→∞
where n ≥ p(ε) is arbitrary. Thus E|X − Xn | → 0 as n → ∞. In other words, Xn → X in L1 . The proposition is proved. Proposition 5.1.14. Necessary and sufficient condition for a.u. convergence. For n ≥ 1, let X,Xn be r.v.’s with values in the locally compact metric space (S,d). Then the following two conditions are equivalent: (i) for each ε > 0, there exist an integrable set B with P (B) < ε and an integer m ≥ 1 such that for each n ≥ m, we have d(X,Xn ) ≤ ε on B c , and (ii) Xn → X a.u. Proof. Suppose Condition (i) holds. Let (εk )k=1,2,... be a sequence of positive real numbers with ∞ k=1 εk < ∞. By hypothesis, for each k ≥ 1 there exist an integrable set Bk with P (Bk ) < εk and an integer mk ≥ 1 such that for each n ≥ mk , we have d(X,Xn ) ≤ εk on Bkc . Let ε > 0 be arbitrary. Let p ≥ 1 be so ∞ ε and define A ≡ ∞ large that ∞ k=p εk < k=p Bk . Then P (A) ≤ k=p εk < ε. ∞ c c Moreover, on A = k=p Bk , we have d(X,Xn ) ≤ εk for each n ≥ mk and each k ≥ p. Therefore Xn → X uniformly on Ac . Since P (A) is arbitrarily small, we have Xn → X a.u. Thus Condition (ii) is verified. Conversely, suppose Condition (ii) holds. Let ε > 0 be arbitrary. Then, by Definition 4.9.2, there exists a measurable set B with P (B) < ε such that Xn converges to X uniformly on B c . Hence there exists m ≥ 1 so large that ∞ n=m (d(Xn,X) > ε) ⊂ B. In particular, for each n ≥ m, we have (d(Xn,X) > ε) ⊂ B, whence d(X,Xn ) ≤ ε on B c . Condition (i) is established.
150
Probability Theory
Definition 5.1.15. Probability subspace generated by a family of r.v.’s with values in a complete metric space. Let (,L,E) be a probability space and let L be a subset of L. If (,L ,E) is a probability space, then we call (,L ,E) a probability subspace of (,L,E). When confusion is unlikely, we will abuse terminology and simply call L a probability subspace of L, with and E understood. Let G be a nonempty family of r.v.’s with values in a complete metric space (S,d). Define LCub (G) ≡ {f (X1, . . . ,Xn ) : n ≥ 1;f ∈ Cub (S n,d n );X1, . . . ,Xn ∈ G}. Then (,LCub (G),E) is an integration subspace of (,L,E). Its completion L(G) ≡ L(X : X ∈ G) ≡ {f (X1, . . . ,Xn ) : n ≥ 1;f ∈ Cub (S n,d n );X1, . . . ,Xn ∈ G}− will be called the probability subspace of L generated by the family G. If G is a finite or countably infinite set {X1,X2, . . .}, we write L(X1,X2, . . .) for L(G). Note that LCub (G) is a linear subspace of L containing constants and is closed to the operation of maximum and absolute values. Hence (,LCub (G),E) is indeed an integration space, according to Proposition 4.3.6. Since 1 ∈ LCub (G) with E1 = 1, the completion (,L(G),E) is a probability space. Any r.r.v. in L(G) has its value determined once all the values of the r.v.’s in the generating family G have been observed. Intuitively, L(G) contains all the information obtainable by observing the values of all X ∈ G. Proposition 5.1.16. Probability subspace generated by a family of r.v.’s with values in a locally compact metric space. Let (,L,E) be a probability space. Let G be a nonempty family of r.v.’s with values in a locally compact metric space (S,d). Let LC (G) ≡ {f (X1, . . . ,Xn ) : n ≥ 1;f ∈ C(S n );X1, . . . ,Xn ∈ G}. Then (,LC (G),E) is an integration subspace of (,L,E). Moreover, its completion LC is equal to L(G) ≡ LCub . Proof. Note first that LC (G) ⊂ LCub (G), and that LC (G) is a linear subspace of LCub such that if U,V ∈ LC (G), then |U |,U ∧ 1 ∈ LC (G). Hence LC (G) is an integration subspace of the complete integration space (,LCub,E) according to Proposition 4.3.6. Consequently, LC ⊂ LCub . Conversely, let U ∈ LCub (G) be arbitrary. Then U = f (X1, . . . ,Xn ) for some f ∈ Cub (S n,d n ) and some X1, . . . ,Xn ∈ G. Then, by Assertion 4 of Proposition 5.1.4, there exists a sequence (gk )k=1,2,... in C(S n,d n ) such that E|f (X) − gk (X)| → 0, where we write X ≡ (X1, . . . ,Xn ). Since gk (X) ∈ LC(G) ⊂ LC
Probability Space
151
for each k ≥ 1, and since LC is complete, we see that U = f (X) ∈ LC . Since U ∈ LCub (G) is arbitrary, we have LCub (G) ⊂ LC . Consequently, LCub ⊂ LC . Summing up, LC = LCub ≡ L(G), as alleged. The next lemma sometimes comes in handy. Lemma 5.1.17. The intersection of probability subspaces is a probability be a nonempty family of subspace. Let (,L,E) be a probability space. Let L probability subspaces L of L. Then L ≡ L ∈L L is a probability subspace of L. Proof. Clearly, the intersection L is a linear subspace of L, contains the constant function 1 with E1 = 1, and is such that if X,Y ∈ L , then |X|,X ∧ 1 ∈ L . Hence it is an integration subspace of L, according to Proposition 4.3.6. At the are closed in the space L relative to same time, since the sets L in the family L the norm E| · |, so is their intersection L . Since L is complete relative to E, so is the closed subspace L . Summing up, (,L ,I ) is a probability subspace of (,L,I ).
5.2 Probability Distribution on Metric Space Definition 5.2.1. Distribution on a complete metric space. Suppose (S,d) is a complete metric space. Let n ≥ 1 be arbitrary. Recall the function hn ≡ 1 ∧ (1 + n − d(·,x◦ ))+ ∈ Cub (S,d), where x◦ ∈ S is an arbitrary but fixed reference point. Note that the function hn has bounded support. Hence hn ∈ C(S,d) if (S,d) is locally compact. Let J be an integration on (S,Cub (S,d)), in the sense of Definition 4.3.1. Suppose J hn ↑ 1 as n → ∞. Then the integration J is called a probability distribution, or simply a distribution, on (S,d). We will let J(S,d) denote the set of distributions on the complete metric space (S,d). Lemma 5.2.2. Distribution basics. Suppose (S,d) is a complete metric space. Then the following conditions hold: 1. Let J be an arbitrary distribution on (S,d). Let (S,L,J ) ≡ (S,Cub (S),J ) be the complete extension of the integration space (S,Cub (S),J ). Then (S,L,J ) is a probability space. 2. Suppose the metric space (S,d) is bounded. Let J be an integration on (S,Cub (S)) such that J 1 = 1. Then the integration J is a distribution on (S,d). 3. Suppose (S,d) is locally compact. Let J be an integration on (S,C(S,d)) in the sense of Definition 4.2.1. Suppose J hn ↑ 1 as n → ∞. Then J is a distribution on (S,d). Proof. 1. By Definition 5.2.1, J hn ↑ 1 as n → ∞. At the same time, hn ↑ 1 on S. The Monotone Convergence Theorem therefore implies that 1 ∈ L and J 1 = 1. Thus (S,L,J ) is a probability space.
152
Probability Theory
2. Suppose (S,d) is bounded. Then hn = 1 on S, for sufficiently large n ≥ 1. Hence, trivially J hn ↑ J 1 = 1, where the equality is by assumption. Therefore the integration J ion (S,Cub (S)) satisfies the conditions in Definition 5.2.1 to be a distribution. 3. Since (S,d) is locally compact, we have hn ∈ C(S) for each n ≥ 1. Moreover, J hn ↑ 1 by hypothesis. Let (S,L,J ) denote the completion of (S,C(S),J ). Let f ∈ Cub (S) be arbitrary, with some bound b ≥ 0 for |f |. Then J |hm f − hn f | ≤ bJ |hm − hn | = bJ (hm − hn ) → 0 as m ≥ n → ∞. Hence the sequence (hn f )n=1,2,... is Cauchy in the complete integration space L relative to J . Therefore g ≡ limn→∞ (hn f ) ∈ L, with J g = limn→∞ J hn f . At the same time, limn→∞ (hn f ) = f on S. Hence f = g ∈ L, with Jf = J g = limn→∞ J hn f . Since f ∈ Cub (S) is arbitrary, we conclude that Cub (S) ⊂ L. Consequently, (S,Cub (S),J ) is an integration subspace of (S,L,J ). Moreover, in the special case f ≡ 1, we obtain 1 ∈ L with J 1 = limn→∞ J hn = 1. Thus the integration J on Cub (S) satisfies the conditions in Definition 5.2.1 to be a distribution. Definition 5.2.3. Distribution induced by an r.v. Let X be an r.v. on a probability space (,L,E) with values in the complete metric space (S,d). For each f ∈ Cub (S), define EX f ≡ Ef (X). Lemma 5.2.4 (next) proves that EX is a distribution on (S,d). We will call EX the distribution induced on the complete metric space (S,d) by the r.v. X. The completion (S,LX,EX ) ≡ (S,Cub (S),EX ) of (S,Cub (S),EX ) is a probability space, called the probability space induced on the complete metric space (S,d) by the r.v. X. Lemma 5.2.4. Distribution induced by an r.v. is indeed a distribution. Let X be an arbitrary r.v. on a probability space (,L,E) with values in the complete metric space (S,d). Then the function EX introduced in Definition 5.2.3 is indeed a distribution. Proof. Let f ∈ Cub (S) be arbitrary. By Assertion 4 of Proposition 5.1.4, we have f (X) ∈ L. Hence EX f ≡ Ef (X) is well defined. The space Cub (S) is linear, contains constants, and is closed to absolute values and taking minimums. The remaining conditions in Definition 4.3.1 for EX to be an integration on (S,Cub (S)) follow from the corresponding conditions for E. Moreover, EX hn ≡ Ehn (X) ↑ 1 as n → ∞, where the convergence is again by Assertion 4 of Proposition 5.1.4. All the conditions in Definition 5.2.1 have been verified for EX to be a distribution. Proposition 5.2.5. Each distribution is induced by some r.v. Suppose J is a distribution on a complete metric space (S,d). Let (S,L,J ) denote the completion of the integration space (S,Cub (S),J ). Then the following conditions hold: 1. The identity function X : (S,L,J ) → (S,d), defined by X(x) = x for each x ∈ S, is an r.v.
Probability Space
153
2. The function d(·,x◦ ) is an r.r.v. on (S,L,J ). 3. J = EX . Thus each distribution on a complete metric space is induced by some r.v. Proof. By Lemma 5.2.2, (S,L,J ) is a probability space and J hn ↑ 1 as n → ∞. Hence the hypothesis in Corollary 4.8.11 is satisfied. Accordingly, the identity function X is an r.v. on (,L,E) ≡ (S,L,J ), and d(·,x◦ ) is an r.r.v. Moreover, for each f ∈ Cub (S), we have Jf ≡ Ef ≡ Ef (X) ≡ EX f . Hence the integration spaces (S,Cub (S,J ),J ) and (S,Cub (S,J ),EX ) are equal. Therefore their completions are the same. In other words, (S,L,J ) = (S,LX,EX ). Proposition 5.2.6. Relation between probability spaces generated and induced by an r.v. Suppose X is an r.v. on the probability space (,L,E) with values in a complete metric space (S,d). Let (,L(X),E) be the probability subspace of (,L,E) generated by {X}. Let (S,LX,EX ) be the probability space induced on (S,d) by X. Let f : S → R be an arbitrary function. Then the following conditions hold: 1. f ∈ LX iff f (X) ∈ L(X), in which case EX f = Ef (X). In words, a function f is integrable relative to EX iff f (X) is integrable relative to E, in which case EX f = Ef (X). 2. f is an r.r.v. on (S,LX,EX ) iff f (X) is an r.r.v. on (,L(X),E). Proof. 1. Suppose f ∈ LX . Then, by Corollary 4.4.4, there exists a sequence (fn )n=1,2,... in Cub (S) such that EX |f − fn | → 0 and f = limn→∞ fn . Consequently, E|fn (X) − fm (X)| ≡ EX |fn − fm | → 0. Thus (fn (X))n=1,2,... is a Cauchy sequence in L(X) relative to the expectation E. Since L(X) is complete, we have Y ≡ limn→∞ fn (X) ∈ L(X) with E|fn (X) − Y | → 0, whence EY = lim Efn (X) ≡ lim EX fn = EX f . n→∞
n→∞
Since f (X) = limn→∞ fn (X) = Y on the full set domain(Y ), it follows that f (X) ∈ L(X), with Ef (X) = EY = EX f . Conversely, suppose Z ∈ L(X). We will show that Z = f (X) for some integrable function f relative to EX . Since L(X) is, by definition, the completion of LCub (X) ≡ {f (X) : f ∈ Cub (S)}, the latter is dense in the former, relative to the norm E| · |. Hence there exists a sequence (fn )n=1,2,... in Cub (S) such that E|Z − fn (X)| → 0. Consequently, EX |fn − fm | ≡ E|fn (X) − fm (X)| → 0.
(5.2.1)
154
Probability Theory
Hence EX |fn − f | → 0 where f ≡ limn→∞ fn ∈ Cub (S) ≡ LX . By the first part of this proof in the previous paragraph, we have E|fn (X) − Y | → 0,
(5.2.2)
where Y = f (X)
a.s.
(5.2.3)
Convergence expressions 5.2.1 and 5.2.2 together imply that Z = Y a.s., which, together with equality 5.2.3, yields Z = f (X), where f ∈ LX . Assertion 1 is proved. 2. For each n ≥ 1, define gn ≡ 1 ∧ (1 + n − | · |)+ ∈ Cub (R). Suppose the function f is an r.r.v. on (S,LX,EX ). Then, by Proposition 5.1.4, we have (i) g ◦ f ∈ LX for each g ∈ Cub (R) and (ii) EX gn ◦ f ↑ 1 as n → ∞. In view of Condition (i), we have g(f (X)) ≡ g ◦ f (X) ∈ L(X) for each g ∈ Cub (R) by Assertion 1. Moreover, Egn (f (X)) = EX gn ◦ f ↑ 1 as n → ∞. Combining, we can apply Assertion 4 of Proposition 5.1.4 to the function f (X) : → R in the place of X : → S, and we conclude that f (X) is an r.r.v. on (,L(X),E). Conversely, suppose f (X) is an r.r.v. on (,L(X),E). Then, again by Assertion 4 of Proposition 5.1.4, we have (i ) g(f (X)) ∈ L(X) for each g ∈ Cub (R) and (ii ) E(gn (f (X)) ↑ 1 as n → ∞. In view of Condition (i ), we have g ◦ f ∈ LX for each g ∈ Cub (R) by Assertion 1 of the present proposition. Moreover, EX gn ◦ f = Egn (f (X)) =↑ 1 as n → ∞. Combining, we see that f is an r.r.v. on (S,LX,EX ), again by Assertion 4 of Proposition 5.1.4. Proposition 5.2.7. Regular points of an r.r.v. f relative to induced distribution by an r.v. X are the same as regular points of f (X). Suppose X is an r.v. on the probability space (,L,E) with values in a complete metric space (S,d). Suppose f is an r.r.v. on (S,LX,EX ). Then t ∈ R is a regular point of the r.r.v. f on (S,LX,EX ) iff it is a regular point of the r.r.v. f (X) on (,L,E). Similarly, t ∈ R is a continuity point of f iff it is a continuity point of f (X). Proof. Suppose f is an r.r.v. on (S,LX,EX ). By Definition 4.8.13, t is a regular point of f iff (i) there exists a sequence (sn )n=1,2,... of real numbers decreasing to t such that (sn < f ) is integrable relative to EX for each n ≥ 1, and limn→∞ PX (sn < f ) exists, and (ii) there exists a sequence (rn )n=1,2,... of real numbers increasing to t such that (rn < f ) is integrable relative to EX for each n ≥ 1, and limn→∞ PX (rn < f ) exists. In view of Proposition 5.2.6, Conditions (i) and (ii) are equivalent to (i ) there exists a sequence (sn )n=1,2,... of real numbers decreasing to t such that (sn < f (X)) is integrable relative to E for each n ≥ 1, and limn→∞ P (sn < f (X)) exists, and (ii ) there exists a sequence (rn )n=1,2,... of real numbers increasing to t such that (rn < f (X)) is integrable relative to E for each n ≥ 1, and limn→∞ P (rn < f (X)) exists. In other words, t is a regular point of f iff t is a regular point of f (X).
Probability Space
155
Moreover, a regular point t of f is a continuity point of f iff the two limits in Conditions (i) and (ii) exist and are equal. Equivalently, t is a continuity point of f iff the two limits in Conditions (i ) and (ii ) exist and are equal. Combining, we conclude that t is a continuity point of f iff it is a continuity point of f (X).
5.3 Weak Convergence of Distributions Recall that if X is an r.v. on a probability space (,L,E) with values in S, then EX denotes the distribution induced on S by X. Definition 5.3.1. Weak convergence of distributions on a complete metric space. Recall that J(S,d) denotes the set of distributions on a complete metric space (S,d). A sequence (Jn )n=1,2,... in J(S,d) is said to converge weakly to J ∈ J(S,d) if Jn f → Jf for each f ∈ Cub (S). We then write Jn ⇒ J . Suppose X,X1,X2, . . . are r.v.’s with values in S, not necessarily on the same probability space. The sequence (Xn )n=1,2,... of r.v.’s is said to converge weakly to r.v. X, or to converge in distribution, if EX(n) ⇒ EX . We then write Xn ⇒ X. Proposition 5.3.2. Convergence in probability implies weak convergence. Let (Xn )n=0,1,... be a sequence of r.v.’s on the same probability space (,L,E), with values in a complete metric space (S,d). If Xn → X0 in probability, then Xn ⇒ X0 . Proof. Suppose Xn → X0 in probability. Let f ∈ Cub (S) be arbitrary, with |f | ≤ c for some c > 0, and with a modulus of continuity δf . Let ε > 0 be arbitrary. By Definition 5.1.8 of convergence in probability, there exists p ≥ 1 so large that for each n ≥ p, there exists an integrable set Bn with P (Bn ) < ε and Bnc ⊂ (d(Xn,X0 ) < δf (ε)) ⊂ (|f (Xn ) − f (X0 )| < ε). Consider each n ≥ p. Then |Ef (Xn ) − Ef (X0 )| = E|f (Xn ) − f (X0 )|1B(n) + E|f (Xn ) − f (X0 )|1B(n)c ≤ 2cP (Bn ) + ε < 2cε + ε. Since ε > 0 is arbitrarily small, we conclude that Ef (Xn ) → Ef (X0 ). Equivalently, JX(n) f → JX(0) f . Since f ∈ Cub (S) is arbitrary, we have JX(n) ⇒ JX(0) . In other words, Xn ⇒ X0 . Lemma 5.3.3. Weak convergence of distributions on a locally compact metric space. Suppose (S,d) is locally compact. Suppose J,J ,Jp ∈ J(S,d) for each p ≥ 1. Then Jp ⇒ J iff Jp f → Jf for each f ∈ C(S). Moreover, J = J if Jf = J f for each f ∈ C(S). Consequently, a distribution on a locally compact metric space is uniquely determined by the expectation of continuous functions with compact supports.
156
Probability Theory
Proof. Since C(S) ⊂ Cub (S), it suffices to prove the “if” part. To that end, suppose that Jp f → Jf for each f ∈ C(S). Let g ∈ Cub (S) be arbitrary. We need to prove that Jp g → J g. We assume, without loss of generality, that 0 ≤ g ≤ 1. Let ε > 0 be arbitrary. Since J is a distribution, there exists n ≥ 1 so large that J (1 − hn ) < ε, where hn ∈ Cub (S) as defined at the beginning of this chapter. Since hn,ghn ∈ C(S), we have, by hypothesis, Jm hn → J hn and Jm ghn → J ghn as m → ∞. Hence |Jm g − J g| ≤ |Jm g − Jm ghn | + |Jm ghn − J ghn | + |J ghn − J g| ≤ |1 − Jm hn | + |Jm ghn − J ghn | + |J hn − 1| < ε + ε + ε for sufficiently large m ≥ 1. Since ε > 0 is arbitrary, we conclude that Jm g → J g, where g ∈ Cub (S) is arbitrary. Thus Jp ⇒ J . Now suppose Jf = J f for each f ∈ C(S). Define Jp ≡ J for each p ≥ 1. Then Jp f ≡ J f = Jf for each f ∈ C(S). Hence by the previous paragraph, J g ≡ Jp g → J g for each g ∈ Cub (S). Thus J g = J g for each g ∈ Cub (S). In other words, J = J on Cub (S). We conclude that J = J as distributions. In the important special case of a locally compact metric space (S,d), the weak convergence of distributions on (S,d) can be metrized, as in the next definition and proposition. Definition 5.3.4. Distribution metric for distributions on a locally compact metric space. Suppose the metric space (S,d) is locally compact, with an arbitrary but fixed reference point x◦ ∈ S. Let ξ ≡ (An )n=1,2,... be an arbitrary but fixed binary approximation of (S,d) relative to x◦ . Let π ≡ ({gn,x : x ∈ An })n=1,2,... be the partition of unity of (S,d) determined by ξ , as in Definition 3.3.4. Let J(S,d) denote the set of distributions on the locally compact metric space (S,d). Let J,J ∈ J(S,d) be arbitrary. Define ρDist,ξ (J,J ) ≡
∞
n=1
2−n |An |−1
|J gn,x − J gn,x |
(5.3.1)
x∈A(n)
and call ρDist,ξ the distribution metric on J(S,d) relative to the binary approximation ξ . The next proposition shows that the function ρDist,ξ is indeed a metric, and that sequential convergence relative to ρDist,ξ is equivalent to weak convergence. Note that ρDist,ξ ≤ 1. In the following, recall that [·]1 is the operation that assigns to each a ∈ [0,∞) an integer [a]1 in (a,a + 2). Proposition 5.3.5. Sequential metrical convergence is equivalent to weak convergence on a locally compact metric space. Suppose the metric space (S,d) is locally compact, with the reference point x◦ ∈ S. Let ξ ≡ (An )n=1,2,... be a
Probability Space
157
binary approximation of (S,d) relative to x◦ , with a corresponding modulus of local compactness ξ ≡ (|An |)n=1,2,... of (S,d). Let ρDist,ξ be the operation introduced in Definition 5.3.4. Let Jp ∈ J(S,d) for p ≥ 1. Let f ∈ C(S) be arbitrary, with a modulus of continuity δf , with |f | ≤ 1, and with (d(·,x◦ ) ≤ b) as support for some b > 0. Then the following conditions hold: 1. Let α > 0 be arbitrary. Then there exists δJ(α) ≡ δJ(α,δf ,b, ξ ) > 0 such that for each J,J ∈ J(S,d) with ρDist,ξ (J,J ) < δJ(α), we have |Jf − J f | < α. 2. Let π ≡ ({gn,x : x ∈ An })n=1,2,... be the partition of unity of (S,d) determined by ξ . Suppose Jp gn,x → J gn,x as p → ∞, for each x ∈ An , for each n ≥ 1. Then ρDist,ξ (Jp,J ) → 0. 3. Jp f → Jf for each f ∈ C(S) iff ρDist,ξ (Jp,J ) → 0. Thus Jp ⇒ J
ρDist,ξ (Jp,J ) → 0.
iff
4. The function ρDist,ξ is a metric.
" Proof.# 1. Let α > 0 be arbitrary. Let ε ≡ 3−1 α. Let n ≡ 0 ∨ 1 − log2 δf 3ε ∨ log2 b 1 . We will show that δJ(α) ≡ δJ(α,δf ,b, ξ ) ≡ 2−n |An |−1 ε has the desired property in Assertion 1. To that end, suppose J,J ∈ J(S,d) are such that ρDist,ξ (J,J ) < δJ(α). By Definition 3.3.4 of π , the sequence {gn,x : x ∈ An } is a 2−n -partition of unity of (S,d) determined by An . 2. Separately, by hypothesis, the function f has support (d(·,x) ≤ 2−n ), (d(·,x◦ ) ≤ b) ⊂ (d(·,x◦ ) ≤ 2n ) ⊂ x∈A(n) n where the first inclusion is because 1 b < 2 and the second inclusion is by 1 −n Definition 3.2.1. Since 2 < 2 δf 3 ε , Proposition 3.3.6 then implies that
f − g ≤ ε, where g≡
f (x)gn,x .
x∈A(n)
3. By the definition of ρDist,ξ , we have
|J gn,x − J gn,x | ≤ ρDist,ξ (J,J ) < δJ(α). 2−n |An |−1 x∈A(n)
(5.3.2)
158
Probability Theory
Therefore
|J g − J g| ≡ f (x)(J gn,x − J gn,x ) x∈A(n)
≤ |J gn,x − J gn,x | < 2n |An |δJ(α) ≡ ε. x∈A(n)
Combining with inequality 5.3.2, we obtain |Jf − J f | ≤ |J g − J g| + 2ε < ε + 2ε = 3ε ≡ α, where J,J ∈ J(S,d) are arbitrary such that ρDist,ξ (J,J ) < δJ(α). Assertion 1 is proved. 4. Suppose Jp gn,x → J gn,x as p → ∞, for each x ∈ An , for each n ≥ 1. Let ε > 0 be arbitrary. Note that ρDist,ξ (J,Jp ) ≡ ≤
∞
n=1 m
n=1
2−n |An |−1
|J gn,x − Jp gn,x |
x∈A(n)
2−n |An |−1
|J gn,x − Jp gn,x | + 2−m .
x∈A(n)
We can first fix m ≥ 1 so large that 2−m < 2−1 ε. Then, for sufficiently large p ≥ 1, the last double summation yields a sum also less than 2−1 ε, whence ρDist,ξ (J,Jp ) < ε. Since ε > 0 is arbitrary, we see that ρDist,ξ (J,Jp ) → 0. Assertion 2 is proved. 5. Suppose ρDist,ξ (Jp,J ) → 0. Then Assertion 1 implies that Jp f → Jf for each f ∈ C(S). Hence Jp ⇒ J , thanks to Lemma 5.3.3. Conversely, suppose Jp f → Jf for each f ∈ C(S). Then, in particular, Jp gn,x → J gn,x as p → ∞, for each x ∈ An , for each n ≥ 1. Hence ρDist,ξ (Jp,J ) → 0 by Assertion 2. Assertion 3 is verified. Applying it to the special case where Jp = J for each p ≥ 1, we obtain ρDist,ξ (J ,J ) = 0 iff J = J . 6. Symmetry and the triangle inequality required for a metric follow trivially from the defining equality 5.3.1. Hence ρDist,ξ is a metric. From the defining equality 5.3.1, we have ρDist,ξ (J,J ) ≤ 1 for each J,J ∈ J(S,d). Hence the metric space (J(S,d),ρDist,ξ ) is bounded. It is not necessarily complete. An easy counterexample is found by taking S ≡ R with the Euclidean metric, and taking Jp to be the point mass at p for each p ≥ 0. In other words Jp f ≡ f (p) for each f ∈ C(R). Then ρDist,ξ (Jp,Jq ) → 0 as p,q → ∞. On the other hand Jp f → 0 for each f ∈ C(R). Hence if ρDist,ξ (Jp,J ) → 0 for some J ∈ J(S,d), then Jf = 0 for each f ∈ C(R), and so J = 0, contradicting the condition for J to be a distribution and an integration. The obvious problem here is that the mass of the distributions Jp escapes to infinity as p → ∞. The condition of tightness, defined next for a subfamily of J(S,d), prevents this from happening.
Probability Space
159
Definition 5.3.6. Tightness. Suppose the metric space (S,d) is locally compact. Let β : (0,∞) → [0,∞) be an operation. Let J be a subfamily of J(S,d) such that for each ε > 0 and for each J ∈ J , we have PJ (d(·,x◦ ) > a) < ε for each a > β(ε), where PJ is the probability function of the distribution J . Then we say the subfamily J is tight, with β as a modulus of tightness relative to the reference point x◦ . We say that a distribution J has modulus of tightness β if the singleton family {J } has modulus of tightness β. A family M of r.v.’s with values in the locally compact metric space (S,d), not necessarily on the same probability space, is said to be tight, with modulus of tightness β, if the family {EX : X ∈ M} is tight with modulus of tightness β. We will say that an r.v. X has modulus of tightness β if the singleton{X} family has modulus of tightness β. We emphasize that we have defined tightness of a subfamily J of J(S,d) only when the metric space (S,d) is locally compact, even as weak convergence in J(S,d) is defined for the more general case of any complete metric space (S,d). Note that, according to Proposition 5.2.5, d(·,x◦ ) is an r.r.v. relative to each distribution J . Hence, given each J ∈ J , the set (d(·,x◦ ) > a) is integrable relative to J for all but countably many a > 0. Therefore the probability PJ (d(·,x◦ ) > a) makes sense for all but countably many a > 0. However, the countable exceptional set of values of a depends on J . A modulus of tightness for a family M of r.v.’s gives the uniform rate of convergence P (d(x◦,X) > a)) → 0 as a → ∞, independent of X ∈ M, where the probability function P and the corresponding expectation E are specific to X. This is analogous to a modulus of uniform integrability for a family G of integrable r.r.v.’s, which gives the rate of convergence E(|X|;|X| > a) → 0 as a → ∞, independent of X ∈ G. Lemma 5.3.7. A family of r.r.v.’s bounded in Lp is tight. Let p > 0 be arbitrary. Let M be a family of r.r.v.’s such that E|X|p ≤ b for each X ∈ M, for some b ≥ 0. Then the family M is tight, with a modulus of tightness β relative to 0 ∈ R defined 1
by β(ε) ≡ b p ε
− p1
for each ε > 0.
Proof. Let X ∈ M be arbitrary. Let ε > 0 be arbitrary. Then, for each a > β(ε) ≡ 1
bp ε
− p1
, we have P (|X| > a) = P (|X|p > a p ) ≤ a −p E|X|p ≤ a −p b < ε,
where the first inequality is Chebychev’s inequality, and the second is by the definition of the constant b in the hypothesis. Thus X has the operation β as a modulus of tightness relative to 0 ∈ R. If a family J of distributions is tight relative to a reference point x◦ , then it is tight relative to any other reference point x0 , thanks to the triangle inequality. Intuitively, tightness limits the escape of mass to infinity as we go through
160
Probability Theory
distributions in J . Therefore a tight family of distributions remains so after a finitedistance shift of the reference point. Proposition 5.3.8. Tightness, in combination with convergence of a sequence of distributions at each member of C(S), implies weak convergence to some distribution. Suppose the metric space (S,d) is locally compact. Let {Jn : n ≥ 1} be a tight family of distributions, with a modulus of tightness β relative to the reference point x◦ . Suppose J (f ) ≡ limn→∞ Jn (f ) exists for each f ∈ C(S). Then J is a distribution, and Jn ⇒ J . Moreover, J has the modulus of tightness β + 2. Proof. 1. Clearly, J is a linear function on C(S). Suppose f ∈ C(S) is such that Jf > 0. Then, in view of the convergence in the hypothesis, there exists n ≥ 1 such that Jn f > 0. Since Jn is an integration, there exists x ∈ S such that f (x) > 0. We have thus verified Condition (ii) in Definition 4.2.1 for J . 2. Next let ε ∈ (0,1) be arbitrary, and take any a > β(ε). Then Pn (d(·,x◦ ) > a) < ε for each n ≥ 1, where Pn ≡ PJ (n) is the probability function for Jn . Define hk ≡ 1 ∧ (1 + k − d(·,x◦ ))+ ∈ C(S) for each k ≥ 1. Take any m ≡ m(ε,β) ∈ (a,a + 2). Then hm ≥ 1(d(·,x(◦))≤a) , whence Jn hm ≥ Pn (d(·,x◦ ) ≤ a) > 1 − ε
(5.3.3)
for each n ≥ 1. By hypothesis, Jn hm → J hm as n → ∞. Inequality 5.3.3 therefore yields J hm ≥ 1 − ε > 0.
(5.3.4)
We have thus verified Condition (i) in Definition 4.2.1 for J to be an integration on (S,d). Therefore, by Proposition 4.3.3, (S,C(S),J ) is an integration space. At the same time, inequality 5.3.4 implies that J hm ↑ 1. We conclude that J is a distribution. Since Jn f → Jf for each f ∈ C(S) by hypothesis, Lemma 5.3.3 implies that Jn ⇒ J . 3. Now note that inequality 5.3.4 implies that PJ (d(·,x◦ ) ≤ a + 2) = J 1(d(·,x◦ )≤a+2) ≥ J hm ≥ 1 − ε > 0,
(5.3.5)
where a > β(ε) is arbitrary. Thus J is tight with the modulus of tightness β + 2. The proposition is proved. Corollary 5.3.9. A tight ρDist,ξ -Cauchy sequence of distributions converges. Let ξ be a binary approximation of a locally compact metric space (S,d) relative to a reference point x◦ ∈ S. Let ρDist,ξ be the distribution metric on the space J(S,d) of distributions on (S,d), determined by ξ . Suppose the subfamily {Jn : n ≥ 1} ⊂ J(S,d) of distributions is tight, with a modulus of tightness β relative to x◦ . Suppose ρDist,ξ (Jn,Jm ) → 0 as n,m →∞. Then Jn ⇒J and ρDist,ξ (Jn,J ) →0, for some J ∈ J(S,d) with the modulus of tightness β + 2.
Probability Space
161
Proof. Suppose ρDist,ξ (Jn,Jm ) → 0 as n,m → ∞. Let f ∈ C(S) be arbitrary. We will prove that J (f ) ≡ limn→∞ Jn (f ) exists. Let ε > 0 be arbitrary. Then there exists a > β(ε) such that Pn (d(·,x◦ ) > a) < ε for each n ≥ 1, where Pn ≡ PJ (n) is the probability function for Jn . Let k ≥ 1 be so large that k ≥ a, and recall that hk ≡ 1 ∧ (1 + k − d(·,x◦ ))+ . Then Jn hk ≥ Pn (d(·,x◦ ) ≤ a) > 1 − ε for each n ≥ 1. At the same time, f hk ∈ C(S). Since ρDist,ξ (Jn,Jm ) → 0, it follows that (Jn f hk )n=1,2,... is a Cauchy sequence of real numbers, according to Assertion 1 of Proposition 5.3.5. Hence Jf hk ≡ limn→∞ Jn (f hk ) exists. Consequently, |Jn f − Jm f | ≤ |Jn f − Jn f hk | + |Jn f hk − Jm f hk | + |Jm f hk − Jm f | ≤ |1 − Jn hk | + |Jn f hk − Jm f hk | + |Jm hk − 1| ≤ ε + |Jn f hk − Jm f hk | + ε < ε + ε + ε for sufficiently large n,m ≥ 1. Since ε > 0 is arbitrary, we conclude that J (f ) ≡ limn→∞ Jn f exists for each f ∈ C(S). By Proposition 5.3.8, J is a distribution with the modulus of tightness β + 2 and Jn ⇒ J . Proposition 5.3.5 then implies that ρDist,ξ (Jn,J ) → 0. Proposition 5.3.10. A weakly convergent sequence of distributions on a locally compact metric space is tight. Suppose the metric space (S,d) is locally compact. Let J,Jn be distributions for each n ≥ 1. Suppose Jn ⇒ J . Then the family {J,J1,J2, . . .} is tight. In particular, any finite family of distributions on S is tight, and any finite family of r.v.’s with values in S is tight. Proof. For each n ≥ 1, write P and Pn for PJ and PJ (n) , respectively. Since J is a distribution, we have P (d(·,x◦ ) > a) → 0 as a → ∞. Thus any family consisting of a single distribution J is tight. Let β0 be a modulus of tightness of {J } with reference to x◦ , and for each k ≥ 1, let βk be a modulus of tightness of {Jk } with reference to x◦ . Let ε > 0 be arbitrary. Let a > β0 2ε and define f ≡ 1 ∧ (a + 1 − d(·,x◦ ))+ . Then f ∈ C(S) with 1(d(·,x(◦))>a+1) ≤ 1 − f ≤ 1(d(·,x(◦))>a) . Hence 1 − Jf ≤ P (d(·,x◦ ) > a) < 2ε . By hypothesis, we have Jn ⇒ J . Hence there exists m ≥ 1 so large that |Jn f − Jf | < 2ε for each n > m. Consequently, ε m. Define β(ε) ≡ (a + 1) ∨ β1 (ε) ∨ · · · ∨ βm (ε). Then, for each a > β(ε) we have Pn (d(·,x◦ ) > a + 1) ≤ 1 − Jn f < 1 − Jf +
162
Probability Theory
(i) P (d(·,x◦ ) > a ) ≤ P (d(·,x◦ ) > a) < 2ε . (ii) Pn (d(·,x◦ ) > a ) ≤ Pn (d(·,x◦ ) > a + 1) < ε for each n > m. (iii) a > βn (ε) and so Pn (d(·,x◦ ) > a ) ≤ ε for each n = 1, . . . ,m. Since ε > 0 is arbitrary, the family {J,J1,J2 . . .} is tight.
The next proposition provides alternative characterizations of weak convergence in the case of locally compact (S,d). Proposition 5.3.11. Modulus of continuity of the function J → Jf , for a fixed Lipschitz continuous function f. Suppose (S,d) is locally compact, with a reference point x◦ . Let ξ ≡ (An )n=1,2,... be a binary approximation of (S,d) relative to x◦ , with a corresponding modulus of local compactness ξ ≡ (|An |)n=1,2,... of (S,d). Let ρDist,ξ be the distribution metric on the space J(S,d) of distributions on (S,d), determined by ξ , as introduced in Definition 5.3.4. Let J,J ,Jp be distributions on (S,d), for each p ≥ 1. Let β be a modulus of tightness of {J,J } relative to x◦ . Then the following conditions hold: 1. Let f ∈ C(S,d) be arbitrary with |f | ≤ 1 and with modulus of continuity (ε,δf ,β, ξ ) > 0 such that if δf . Then, for each ε > 0, there exists (ε,δf ,β, ξ ), then |Jf − J f | < ε. ρDist,ξ (J,J ) < 2. The following three conditions are equivalent: (i) Jp f → Jf for each Lipschitz continuous f ∈ C(S), (ii) Jp ⇒ J , and (iii) Jp f → Jf for each Lipschitz continuous f that is bounded. Proof. 1. By Definition 3.2.1, we have (d(·,x◦ ) ≤ 2n ) ⊂
(d(·,x) ≤ 2−n )
(5.3.6)
x∈A(n)
and
(d(·,x) ≤ 2−n+1 ) ⊂ (d(·,x◦ ) ≤ 2n+1 )
(5.3.7)
x∈A(n)
for each n ≥ 1. 2. Let ε > 0 be arbitrary. As an abbreviation, write α ≡ 2ε . Let & ε ' b ≡ 1+β . 4 1 Define h ≡ 1 ∧ (b − d(·,x◦ ))+ ∈ C(S). Then the functions h and f h have support (d(·,x◦ ) ≤ b), and we have h = 1 on (d(·,x◦ ) ≤ b − 1). 3. Since the function h has a Lipschitz constant of 1, the function f h has a γ γ modulus of continuity δf h defined by δf h (γ ) ≡ 2 ∧ δf 2 for each γ > 0. As in Step 1 of the proof of Proposition 5.3.5, let n ≡ [0 ∨ (1 − log2 δf h (3−2 α)) ∨ log2 b]1 , and define δJ(α) ≡ δJ(α,δf h,b, ξ ) ≡ 3−1 2−n |An |−1 α.
Probability Space
163
Then, by Proposition 5.3.5, |Jf h − J f h| < α,
(5.3.8)
provided that J,J ∈ J are such that ρDist,ξ (J,J ) < δJ(α). 4. Now define (ε) ≡ (ε,δf ,β, ξ ) ≡ δJ(α). (ε). We need to prove that |Jf − J f | < ε. To that Suppose ρDist,ξ (J,J ) < end, note that since J,J have tightness ε modulus β, and since 1 − h = 0 on (d(·,x◦ ) ≤ b − 1) where b − 1 > β 4 , we have J (1 − h) ≤ PJ (d(·,x◦ ) ≥ b − 1) ≤
ε . 4
Consequently, |Jf − Jf h| = |Jf (1 − h)| ≤ J (1 − h) ≤
ε . 4
(5.3.9)
Similarly, |J f − J f h| ≤
ε . 4
(5.3.10)
Combining inequalities 5.3.8, 5.3.9, and 5.3.10, we obtain |Jf − J f | ≤ |Jf − Jf h| + |Jf h − J f h| + |J f − J f h| ε ε ε ε ε < + α + ≡ + + = ε, 4 4 4 2 4 as desired. Assertion 1 is thus proved. 5. We need to prove that Conditions (i–iii) are equivalent. To that end, first suppose (i) Jp f → Jf for each Lipschitz continuous f ∈ C(S). Let π ≡ ({gn,x : x ∈ An })n=1,2,... be the partition of unity of (S,d) determined by ξ . Then, for each n ≥ 1 and each x ∈ An , we have Jp gn,x → J gn,x as p → ∞, because gn,x ∈ C(S) is Lipschitz continuous by Proposition 3.3.5. Hence ρDist,ξ (Jp,J ) → 0 by Assertion 2 of Proposition 5.3.5. Consequently, by Assertion 3 of Proposition 5.3.5, we have Jp ⇒ J . Thus we have proved that Condition (i) implies Condition (ii). 6. Suppose next that Jp ⇒ J . Then since (S,d) is locally compact, we have ρDist,ξ (Jp,J ) → 0 by Assertion 3 of Proposition 5.3.5. Separately, in view of Proposition 5.3.10, the family {J,J1,J2, . . .} is tight, with some modulus of tightness β. Let f ∈ C(S) be arbitrary and Lipschitz continuous. We need to prove that Jp f → Jf . By linearity, we may assume that |f | ≤ 1, whence Jp f → Jf by Assertion 1. Thus Condition (ii) implies Condition (iii). 7. Finally, Condition (iii) trivially implies Condition (i). Assertion 2 and the proposition are proved.
164
Probability Theory 5.4 Probability Density Function and Distribution Function
In this section, we discuss two simple and useful methods to construct distributions on a locally compact metric space (S,d). The first starts with one integration I on (S,d) in the sense of Definition 4.2.1, where the full set S need not be integrable. Then, for each nonnegative integrable function g with integral 1, we can construct a distribution on (S,d) using g as a density function. A second method is for the special case where (S,d) = (R,d) is the real line, equipped with the Euclidean metric. Let F be an arbitrary distribution function on R, in the sense of Definition 4.1.1, such that F (t) → 0 as t → −∞, and F (t) → 1 as t → ∞. Then the Riemann–Stieljes integral corresponding to F constitutes a distribution on (R,d). Definition 5.4.1. Probability density function. Let I be an integration on a locally compact metric space (S,d) in the sense of Definition 4.2.1. Let (S,,I ) denote the completion of the integration space (S,C(S),I ). Let g ∈ be an arbitrary nonnegative integrable function with Ig = 1. Define Ig f ≡ Igf
(5.4.1)
for each f ∈ C(S,d). According to the following lemmas, the function Ig is a probability distribution on (S,d),in the sense of Definition 5.2.1. In such a case, g will be called a probability density function, or p.d.f . for short, relative to the integration I , and the completion (S,g ,Ig ) of (S,C(S,d),Ig ) will be called the probability space generated by the p.d.f. g. Suppose, in addition, that X is an arbitrary r.v. on some probability space (,L,E) with values in S such that EX = Ig , where EX is the distribution induced on the metric space (S,d) by the r.v. X, in the sense of Definition 5.2.3. Then the r.v. X is said to have the p.d.f. g relative to I . Frequently used p.d.f.’s are defined on (S,,I ) ≡ (R n,, ·dx), the n-dimensional Euclidean space equipped with the Lebesgue integral, and on (S,,I ) ≡ ({1,2, . . .},,I ) with the counting measure I defined by If ≡ ∞ n=1 f (n) for each f ∈ C(S). In the following discussion, for each n ≥ 1, define hn ≡ 1∧(1+n−d(·,x◦ ))+ ∈ C(S,d). Lemma 5.4.2. Ig is indeed a probability distribution. Let I be an arbitrary integration on a locally compact metric space (S,d) in the sense of Definition 4.2.1. Let g be a p.d.f. relative to the integration I . Then Ig is indeed a probability distribution on (S,d). Proof. 1. By Definition 5.4.1, we have g ∈ with g ≥ 0 and Ig = 1. Let ε > 0 be arbitrary. By Corollary 4.4.4, there exists f ∈ C(S,d) such that I |f − g| < ε.
Probability Space
165
Let n ≥ 1 be so large that the compact support of f is contained in the set (d(·,x◦ ) ≤ n). Then (i) f hn = f , (ii) |If hn − Ighn | ≤ I |f − g| < ε, and (iii) |If − 1| = |If − Ig| < ε. Combining, |Ighn − 1| ≤ |Ighn − If hn | + |If hn − If | + |If − 1| < ε + 0 + ε = 2ε. Since ε > 0 is arbitrarily small, we see that Ighn ↑ 1 as n → ∞. 2. By the defining equality 5.4.1, the function Ig is linear on the space C(S,d). Moreover, Ig hn ≡ Ighn > 0 for some sufficiently large n ≥ 1. This verifies Condition (i) of Definition 4.2.1 for the linear function Ig on C(S,d). Next suppose f ∈ C(S,d) with Ig f > 0. Then Igf ≡ Ig f > 0. Hence, by the positivity condition, Condition (ii) of Definition 4.2.1 for the integration I , there exists x ∈ S such that g(x)f (x) > 0. Consequently, f (x) > 0. Thus Condition (ii) of Definition 4.2.1 is also verified for the function Ig on C(S,d). 3. Accordingly, (S,C(S,d),Ig ) is an integration space. Since Ig hn ≡ I hn g ↑ 1, Assertion 3 of Lemma 5.2.2 implies that Ig is a probability distribution on (S,d). The lemma is proved. Distributions on R can be studied in terms of their corresponding distribution functions, as introduced in Definition 4.1.1 and specialized to probability distribution functions. Recall the convention that if F is a function, then we write F (t) only with the explicit or implicit assumption that t ∈ domain(F ). Definition 5.4.3. Probability distribution functions. Suppose F is a distribution function on R satisfying the following conditions: (i) F (t) → 0 as t → −∞, and F (t) → 1 as t → ∞; (ii) for each t ∈ domain(F ), the left limit limrt;s→t F (s) exists and is equal to F (t); (iv) domain(F ) contains the metric complement Ac of some countable subset A of R; and (v) if t ∈ R is such that both the left and right limits exist as just defined then t ∈ domain(F ). Then F is called a probability distribution function, or a P.D.F. for short. A point t ∈ domain(F ) is called a regular point of F . A point t ∈ domain(F ) at which the just-defined left and right limits exist and are equal is called a continuity point of F . Suppose X is an r.r.v. on some probability space (,L,E). Let FX be the function defined by (i ) domain(FX ) ≡ {t ∈ R : t is a regular point of X} and (ii ) FX (t) ≡ P (X ≤ t) for each t ∈ domain(FX ). Then FX is called the P.D.F. of X. Recall in the following that ·dF denotes the Riemann–Stieljes integration relative to a distribution function F on R. Proposition 5.4.4. FX is indeed a P.D.F. Let X be an r.r.v. on a probability space (,L,E) with FX : R → [0,1] as in Definition 5.4.3. Let EX : LX → R denote
166
Probability Theory
the distribution induced on R by X, in the sense of Definition 5.2.3. Then the following conditions hold: 1. F X is a P.D.F. 2. ·dFX = EX . Proof. As an abbreviation, write J ≡ EX and F ≡ FX , and write P for the probability function associated to the probability expectation E. 1. We need to verify Conditions (i) through (v) in Definition 5.4.3 for F . Condition (i) holds because P (X ≤ t) → 0 as t → −∞ and P (X ≤ t) = 1 − P (X > t) → 1 as t → ∞, by the definition of a measurable function. Next consider any t ∈ domain(F ). Then t is a regular point of X, by the definition of FX . Hence there exists a sequence (sn )n=1,2,... of real numbers decreasing to t such that (X ≤ sn ) is integrable for each n ≥ 1 and such that limn→∞ P (X ≤ sn ) exists. Since F is a nondecreasing function, it follows that lims>t;s→t F (s) exists. Similarly, limrt;s→t
n→∞
Conditions (ii) and (iii) in Definition 5.4.3 have thus been verified. Condition (iv) in Definition 5.4.3 follows from Assertion 1 of Proposition 4.8.14. Condition (v) remains. Suppose t ∈ R is such that both limrt;s→t F (s) exist. Then there exists a sequence (sn )n=1,2,... in domain(F ) decreasing to t such that F (sn ) converges. This implies that (X ≤ sn ) is an integrable set, for each n ≥ 1, and that P (X ≤ sn ) converges. Hence (X > sn ) is an integrable set, and P (X > sn ) converges. Similarly, there exists a sequence (rn )n=1,2,... increasing to t such that (X > rn ) is an integrable set and P (X > rn ) converges. We have thus verified the conditions in Definition 4.8.13 for t to be a regular point of X. In other words, t ∈ domain(F ). Condition (v) in Definition 5.4.3 has also been verified. Summing up,F ≡ FX is a P.D.F. 2. Note that both ·dFX and EX are complete extensions of integrations defined on (R,C(R)). Hence, to show that they are equal, it suffices to prove that they are equal on C(R). To that end, let f ∈ C(R) be arbitrary, with a modulus of continuity for δf . Let ε > 0 be arbitrary. The Riemann–Stieljes integral f (t)dFX (t) is, by definition, the limit of Riemann–Stieljes sums S(t1, . . . ,tn ) = n i=1 f (ti )(FX (ti ) − FX (ti−1 )) as t1 → −∞ and tn → ∞, with the mesh of the partition t1 < · · · < tn approaching 0. Consider such a Riemann–Stieljes sum where the mesh is smaller than δf (ε), and where [t1,tn ] contains a support of f . Then n
(f (ti ) − f (X))1(ti−1 k ≥ 1. Then 0 ≤ fk,n ≤ fj,n ≤ 1, and fj,n − fk,n has [rj +1,rk ] as support. Therefore, as seen in the previous paragraph, we have Jfj,n − Jfk,n ≤ F (rk ) − F (rj +1 ) → 0 as j ≥ k → ∞. Hence the Monotone
168
Probability Theory
Convergence Theorem implies that fn ≡ limk→∞ fk,n is integrable, with Jfn = limk→∞ Jfk,n . Moreover, fn = 1 on (−∞,sn+1 ], fn = 0 on [sn,∞), and fn is linear on [sn+1,sn ]. Now consider any m ≥ n ≥ 1. Then 0 ≤ fm ≤ fn ≤ 1, and fn −fm has [t,sn ] as support. Therefore, as seen earlier, Jfn − Jfm ≤ F (sn ) − F (t) → 0 as m ≥ n → ∞. Hence, the Monotone Convergence Theorem implies that g ≡ limn→∞ fn is integrable, with J g = limn→∞ Jfn . It is evident that on (−∞,t] we have fn = 1 for each n ≥ 1. Hence g is defined and equal to 1 on (−∞,t]. Similarly, g is defined and equal to 0 on (t,∞). Consider any x ∈ domain(g). Then either g(x) > 0 or g(x) < 1. Suppose g(x) > 0. Then the assumption x > t would imply g(x) = 0, which is a contradiction. Hence x ∈ (−∞,t] and so g(x) = 1. On the other hand, suppose g(x) < 1. Then fn (x) < 1 for some n ≥ 1, whence x ≥ sn+1 for some n ≥ 1. Hence x ∈ (t,∞) and so g(x) = 0. Combining, we see that 1 and 0 are the only possible values of g. In other words, g is an integrable indicator. Moreover, (g = 1) = (−∞,t] and (g = 0) = (t,∞). Thus the interval (−∞,t] is an integrable set with 1(−∞,t] = g. Finally, for any k,n ≥ 1, we have F (sn+1 )−F (rk ) ≤ Jfk,n ≤ F (sn )−F (rk+1 ). Letting k → ∞, we obtain F (sn+1 ) ≤ Jfn ≤ F (sn ) for n ≥ 1. Letting n → ∞, we obtain, in turn, J g = F (t). In other words J 1(−∞,t] = F (t). Assertion 2 is proved. 3. Suppose two P.D.F.’s F and F are equal on some dense subset D of domain(F ) ∩ domain(F ). We will show that then F = F . To that end, consider any t ∈ domain(F ). Let (sn )n=1,2,... be a decreasing sequence in D converging to t. By hypothesis, F (sn ) = F (sn ) for each n ≥ 1. At the same time, F (sn ) → F (t) since t ∈ domain(F ). Therefore F (sn ) → F (t). By the monotonicity of F , it follows that lims>t;s→t F (s) = F (t). Similarly, limrt;s→t F (s) = F (t). We have thus proved that domain(F ) ⊂ domain(F ) and F = F on domain(F ). By symmetry, domain(F ) = domain(F ). We conclude that F = F . Assertion 3 is verified. ·dF . Write 4. Suppose two F and F are such that ·dF = P.D.F.’s J ≡ ·dF = ·dF . Consider any t ∈ D ≡ domain(F ) ∩ domain(F ). By Assertion 2, the interval (−∞,t] is integrable relative to J , with F (t) = J 1(−∞,t] = F (t). Since D is a dense subset of R, we have F = F by Assertion 3. This proves Assertion 4. 5. Let F be any P.D.F. By Assertion 1, we have ·dF = ·dFX for some r.r.v. X. Therefore F = FX according to Assertion 4. Assertion 5 is proved. 6. Let F be any P.D.F. By Assertion 5, F = FX for some r.r.v. X. Hence F (t) = FX (t) ≡ P (X ≤ t) for each regular point t of X. Consider any continuity point t of X. Then, by Definition 4.8.13, we have limn→∞ P (X ≤ sn ) = limn→∞ P (X ≤ rn ) for some decreasing sequence (sn ) with sn → t and for
Probability Space
169
some increasing sequence (rn ) with rn → t. Since the function is nondecreasing, it follows that lims>t;s→t F (s) = limr 0) ⊂ (d(·,x) ≤ 2−n+1 ) for each x ∈ An , and x∈A(n)
⎛ (d(·,x) ≤ 2−n ) ⊂ ⎝
x∈A(n)
(5.5.3) ⎞
gn,x = 1⎠ .
(5.5.4)
Probability Space Define Kn ≡ κn + 1, and define the sequence ⎛
171 ⎛
(fn,1, . . . ,fn,K(n) ) ≡ ⎝gn,x(n,1), . . . ,gn,x(n,κ(n)), ⎝1 −
⎞⎞ gn,x ⎠⎠ (5.5.5)
x∈A(n)
of nonnegative continuous functions on S. Then K(n)
fn,k = 1
(5.5.6)
k=1
on S, where n ≥ 1 is arbitrary. 2. For the purpose of this proof, an open interval (a,b) in [0,1] is defined by an arbitrary pair of endpoints a,b, where 0 ≤ a ≤ b ≤ 1. Two open intervals (a,b),(a ,b ) are considered equal if a = a and b = b . For arbitrary open intervals (a,b),(a ,b ) ⊂ [0,1] we will write, as an abbreviation, (a,b) < (a ,b ) if b ≤ a . 3. Let n ≥ 1 be arbitrary. Define the product set Bn ≡ {1, . . . ,K1 } × · · · × {1, . . . ,Kn }. Let μ denote the Lebesgue measure on [0,1]. Define the open interval ≡ (0,1). Then, since K(1)
k=1
Efn,k = E
K(1)
fn,k = E1 = 1,
k=1
we can subdivide the open interval into mutually exclusive open subintervals 1, . . . ,K(1), such that μk = Efn,k for each k = 1, . . . ,K1 , and such that k < j for each k = 1, . . . ,κ1 with k < j. 4. We will construct, for each n ≥ 1, a family of mutually exclusive open subintervals {k(1),...,k(n) : (k1, . . . ,kn ) ∈ Bn } of (0,1) such that for each (k1, . . . ,kn ) ∈ Bn , we have (i) μk(1),...,k(n) = Ef1,k(1) . . . fn,k(n) . (ii) k(1),...,k(n) ⊂ k(1),...,k(n−1) if n ≥ 2. (iii) k(1),...,k(n−1),k < k(1),...,k(n−1),j for each k,j = 1, . . . ,κn with k < j . 5. Proceed inductively. Step 3 gave the construction for n = 1. Now suppose the construction has been carried out for some n ≥ 1 such that Conditions (i–iii) are satisfied. Consider each (k1, . . . ,kn ) ∈ Bn . Then
172
Probability Theory K(n+1)
Ef1,k(1) . . . fn,k(n) fn+1,k = Ef1,k(1) . . . fn,k(n)
K(n+1)
k=1
fn+1,k
k=1
= Ef1,k(1) . . . fn,k(n) = μk(1),...,k(n), where the last equality is because of Condition (i) in the induction hypothesis. Hence we can subdivide k(1),...,k(n) into Kn+1 mutually exclusive open subintervals k(1),...,k(n),1, . . . ,k(1),...,k(n),K(n+1) such that μk(1),...,k(n),k(n+1) = Ef1,k(1) . . . fn,k(n) fn+1,k(n+1)
(5.5.7)
for each kn+1 = 1, . . . ,Kn+1 . Thus Condition (i) holds for n + 1. In addition, we can arrange these open subintervals such that k(1),...,k(n),k < x(1),...,x(n),j for each k,j = 1, . . . ,Kn+1 with k < j . This establishes Condition (iii) for n + 1. Condition (ii) also holds for n + 1 since, by construction, k(1),...,k(n),k(n+1) is a subinterval of k(1),...,k(n) for each (k1, . . . ,kn+1 ) ∈ Bn+1 . Induction is completed. 6. Note that mutual exclusiveness and Condition (i) together imply that
k(1),...,k(n) = μk(1),...,k(n) μ (k(1),...,k(n))∈B(n)
(k(1),...,k(n))∈B(n)
⎛
=E⎝
K(1)
⎞
⎛
f1,K(1) ⎠ . . . ⎝
k(1)=1
K(n)
⎞ fn,K(n) ⎠
k(n)=1
= E1 = 1
(5.5.8)
for each n ≥ 1. Hence the set D≡
∞
k(1),...,k(n)
(5.5.9)
n=1 (k(1),...,k(n))∈B(n)
is a full subset of [0,1]. 7. Let θ ∈ D be arbitrary. Consider each n ≥ 1. Then θ ∈ k(1),...,k(n) for some unique sequence (k1, . . . ,kn ) ∈ Bn since the intervals in each union in equality 5.5.9 are mutually exclusive. By the same token, θ ∈ j (1),...,j (n+1) for some unique (j1, . . . ,jn+1 ) ∈ Bn+1 . Then θ ∈ j (1),...,j (n) in view of Condition (ii) in Step 4. Hence, by uniqueness of the sequence (k1, . . . ,kn ), we have (j1, . . . ,jn ) = (k1, . . . ,kn ). Now define kn+1 ≡ j n+1 . It follows that θ ∈ k(1),...,k(n+1) . Thus we obtain inductively a unique sequence (kp )p=1,2,... such that kp ∈ {1, . . . ,Kp } and θ ∈ k(1),...,k(p) , for each p ≥ 1.
Probability Space
173
Since the open interval k(1),...,k(n) contains the given point θ , it has positive Lebesgue measure. Hence, in view of Condition (i) in Step 4, it follows that Ef1,k(1) . . . fn,k(n) > 0, where n ≥ 1 is arbitrary. 8. Define the function Xn : [0,1] → (S,d) by domain(Xn ) ≡
(5.5.10)
k(1),...,k(n)
(k(1),...,k(n))∈B(n)
and by Xn ≡ xn,k(n) or x◦
on k(1),...,k(n), based on either kn ≤ κn or kn = Kn, (5.5.11)
for each (k1, . . . ,kn ) ∈ Bn . Then, according to Proposition 4.8.8, Xn is a r.v. with values in the metric space (S,d). In other words, Xn ∈ M(0,S). Now define the function X ≡ limn→∞ Xn . We proceed to prove that the function X is a well-defined r.v. by showing that Xn → X a.u. 9. To that end, let n ≥ 1 be arbitrary, but fixed until further notice. Define m ≡ mn ≡ n ∨ [log2 (1 ∨ β(2−n ))]1,
(5.5.12)
where β is the assumed modulus of tightness of the distribution E relative to the reference point x◦ ∈ S. Then 2m > β(2−n ). Take an arbitrary αn ∈ (β(2−n ),2m ). Then E(d(·,x◦ ) > αn ) ≤ 2−n,
(5.5.13)
because β is a modulus of tightness of E. At the same time, (d(·,x◦ ) ≤ αn ) ⊂ (d(·,x◦ ) ≤ 2 ) ⊂ m
−m
(d(·,x) ≤ 2
⎛
) ⊂⎝
x∈A(m)
⎞ gm,x = 1⎠,
x∈A(m)
(5.5.14) where the second and third inclusions are by relations 5.5.2 and 5.5.4, respectively. Define the Lebesgue measurable set Dn ≡ k(1),...,k(m) ⊂ [0,1]. (5.5.15) (k(1),...,k(m))∈B(m);k(m)≤κ(m)
Then, in view of equality 5.5.8, we have
μ(Dnc ) =
μk(1),...,k(m)
(k(1),...,k(m))∈B(m);k(m)=K(m)
=
(k(1),...,k(m))∈B(m);k(m)=K(m)
Ef1,k(1) . . . fm,k(m)
174
Probability Theory =
K(1)
k(1)=1
...
K(m−1)
Ef1,k(1) . . . fm−1,k(m)−1 fm,K(m)
k(m−1)=1
⎛
= Efm,K(m) = E ⎝1 −
⎞
gm,x ⎠
x∈A(m)
≤ E(d(·,x◦ ) > αn ) ≤ 2−n,
(5.5.16)
where the first inequality is thanks to relation 5.5.14, and the second is by inequality 5.5.13. 10. Consider each θ ∈ D. By Step 7, there exists a unique sequence (kp )p=1,2,... such that kp ∈ {1, . . . ,Kp } and θ ∈ k(1),...,k(p) for each p ≥ 1. In particular, θ ∈ k(1),...,k(m) . Suppose km ≤ κm . Then fm,k(m) ≡ gm,x(m,k(m)), according to the defining equality 5.5.5. Moreover, by Condition (ii) in Step 4, we have θ ∈ k(1),...,k(p) ⊂ k(1),...,k(m) for each p ≥ m. Suppose, for the sake of a contradiction, that km+1 = Km+1 . Then, by the defining equality 5.5.5, we have
fm+1,k(m+1) ≡ 1 − gm+1,x . x∈A(m+1)
Separately, inequality 5.5.10 applied to m + 1 yields 0 < Ef1,k(1) . . . fm,k(m) fm+1,k(m+1)
(5.5.17)
On the other hand, using successively the relations 5.5.3, 5.5.1, 5.5.2, and 5.5.4, we obtain (fm,k(m) > 0) = (gm,x(m,k(m)) > 0) ⊂ (d(·,xm,k(m) ) ≤ 2−m+1 ) ⊂ (d(·,x◦ ) ≤ 2m + 2−m+1 ) ⊂ (d(·,x◦ ) ≤ 2m+1 ) ⊂ (d(·,x) ≤ 2−m−1 ) ⎛ ⊂⎝
x∈A(m+1)
⎞
gm+1,x = 1⎠ = (fm+1,k(m+1) = 0).
x∈A(m+1)
Consequently, fm,k(m) fm+1,k(m+1) = 0. Hence the right-hand side of inequality 5.5.17 vanishes, while the left-hand side is 0, which is a contradiction. We conclude that km+1 ≤ κm+1 . Summing up, from the assumption that km ≤ κm , we can infer that km+1 ≤ κm+1 . Repeating these steps, we obtain, for each q ≥ m, the inequality kq ≤ κq , whence fq,k(q) ≡ gq,x(q,k(q)),
(5.5.18)
Probability Space
175
where q ≥ m is arbitrary. It follows from the defining equality 5.5.11 that Xq (θ ) = xq,k(q)
(5.5.19)
for each q ≥ m ≡ mn , provided that km ≤ κm . 11. Next, suppose θ ∈ DDn . Then km ≤ κm according to the defining equality 5.5.15 for the set Dn . Hence equality 5.5.19 holds for each q ≥ m. Thus we see that ∞
DDn ⊂
(Xq = xq,k(q) ).
(5.5.20)
q=m(n)
Now let q ≥ p ≥ m be arbitrary. As an abbreviation, write yp ≡ xp,k(p) . Then inequality 5.5.10 and equality 5.5.18 together imply that Ef1,k(1) . . . fm−1,k(m−1) gm,y(m) . . . gp,y(p) . . . gq,y(q) = Ef1,k(1) . . . fq,k(q) > 0. Hence there exists z ∈ S such that (f1,k(1) . . . fm−1,k(m−1) gm,y(m) . . . gp,y(p) . . . gq,y(q) )(z) > 0, whence gp,y(p) (z) > 0 and gq,y(q) (z) > 0. Consequently, by relation 5.5.3, we obtain d(yp,yq ) ≤ d(yp,z) + d(z,yq ) ≤ 2−p+1 + 2−q+1 → 0
(5.5.21)
as p,q → ∞. Since (S,d) is complete, we have Xp (θ ) ≡ xp,k(p) ≡ yp → y as p → ∞, for some y ∈ S. Hence θ ∈ domain(X), with X(θ ) ≡ y, where we recall that X ≡ limn→∞ Xn .. Moreover, with q → ∞ in inequality 5.5.21, we obtain d(Xp (θ ),X(θ )) ≤ 2−p+1,
(5.5.22)
where p ≥ m ≡ mn and θ ∈ DDn are arbitrary. Since μ(DDn )c = μDnc ≤ 2−n is arbitrarily small when n ≥ 1 is sufficiently large, we conclude that Xn → X a.u. relative to the Lebesgue measure I , as n → ∞. It follows that the function X : [0,1] → S is measurable. In other words, X is a r.v. 12. It remains to verify that IX = E, where IX is the distribution induced by X on S. For that purpose, let h ∈ C(S) be arbitrary. We need to prove that I h(X) = Eh. Without loss of generality, we assume that |h| ≤ 1 on S. Let δh be a modulus of continuity of the function h. Let ε > 0 be arbitrary. Let n ≥ 1 be so large that (i ) 1 ε (5.5.23) 2−n < ε ∧ δh 2 3
176
Probability Theory
and (ii ) h is supported by (d(·,x◦ ) ≤ 2n ). Then relation 5.5.2 implies that h is supported by x∈A(n) (d(·,x) ≤ 2−n ). At the same time, by the defining equality 5.5.11 of the simple r.v. Xn : [0,1] → S, we have
I h(Xn ) = h(xn,k(n) )μk(1),...,k(n) + h(x◦ )μ(Dnc ) (k(1),...,k(n))∈B(n);k(n)≤κ(n)
=
K(1)
k(1)=1
···
K(n−1)
κ(n)
h(xn,k(n) )Ef1,k(1) . . . fn−1,k(n−1) fn,k(n)
k(n−1)=1 k(n)=1
+ h(x◦ )μ(Dnc ) =
κ(n)
h(xn,k )Efn,k + h(x◦ )μ(Dnc )
k=1
=
κ(n)
h(xn,k(n) )Egn,x(n,k) + h(x◦ )μ(Dnc )
k=1
=
h(x)Egn,x + h(x◦ )μ(Dnc ),
x∈A(n)
where the third equality is thanks to equality 5.5.6. Hence
I h(Xn ) − E h(x)gn,x ≤ |h(x◦ )μ(Dnc )| ≤ μ(Dnc ) ≤ 2−n < ε. (5.5.24) x∈A(n) ε 1 At the same time, since An is a 2−n -partition of unity of (S,d), with 2−n < 2 δh 3 , Assertion 2 of Proposition 3.3.6 implies that x∈A(n) h(x)gn,x −h ≤ ε. Hence
E ≤ ε. h(x)g − Eh n,x x∈A(n) Inequality 5.5.24 therefore yields |I h(Xn ) − Eh| < 2ε. Since ε > 0 is arbitrarily small, we have I h(Xn ) → Eh as n → ∞. At the same time, h(Xn ) → h(X) a.u. relative to I . Hence the Dominated Convergence Theorem implies that I h(Xn ) → I h(X). It follows that Eh = I h(X), where h ∈ C(S) is arbitrary. We conclude that E = IX . Define Sk,ξ (E) ≡ X, and the theorem is proved. Theorem 5.5.2. Metrical continuity of Skorokhod representation. Let ξ ≡ (An )n=1,2,... be a binary approximation of the locally compact metric space (S,d) relative to the reference point x◦ ∈ S. Let ξ ≡ (κn )n=1,2,... be the modulus of local compactness of (S,d) corresponding to ξ . In other words, κn ≡ |An | is the number of elements in the enumerated finite set An , for each n ≥ 1.
Probability Space
177
Let J(S,d) be the set of distributions on (S,d). Let Jβ (S,d) be an arbitrary tight subset of J(S,d), with a modulus of tightness β relative to x◦ . Recall the probability metric ρP rob on M(0,S) defined in Definition 5.1.10. Then the Skorokhod representation Sk,ξ : (J(S,d),ρDist,ξ ) → (M(0,S),ρP rob ) constructed in Theorem 5.5.1 is uniformly continuous on the subset Jβ (S,d), with a modulus of continuity δSk (·, ξ ,β) depending only on ξ and β. Proof. Refer to the proof of Theorem 5.5.1 for notations. In particular, let π ≡ ({gn,x : x ∈ An })n=1,2,... denote the partition of unity of (S,d) determined by ξ . 1. Let n ≥ 1 be arbitrary. Recall from Proposition 3.3.5 that for each x ∈ An , the functions gn,x and y∈A(n) gn,y in C(S) have Lipschitz constant 2n+1 and have values in [0,1]. Consequently, each of the functions fn,1, . . . ,fn,K(n) defined in formula 5.5.5 has a Lipschitz constant 2n+1 and has values in [0,1]. 2. Let (k1, . . . ,kn ) ∈ Bn ≡ {1, . . . ,K1 } × · · · × {1, . . . ,Kn } be arbitrary. Then, by Lemma 3.3.1 on the basic properties of Lipschitz functions, the function hk(1),...,k(n) ≡ n−1
n k(i)−1
f1,k(1) . . . fi−1,k(i−1) fi,k ∈ Cub (S)
i=1 k=1
has a Lipschitz constant given by n−1
n k(i)−1
(21+1 + 22+1 + · · · + 2i+1 )
i=1 k=1
≤ 22 n−1
n
κi (1 + 2 + · · · + 2i−1 )
i=1
< 22 n−1
n
κn 2i < 22 n−1 κn 2n+1 = n−1 2n+3 κn,
i=1
where n ≥ 1 is arbitrary. Moreover, 0 ≤ hk(1),...,k(n) ≤ n−1
n k(i)−1
i=1 k=1
fi,k ≤ n−1
n
1 = 1.
i=1
3. Now let E,E ∈ Jβ (S,d) be arbitrary. Let the objects {k(1),...,k(n) : (k1, . . . ,kn ) ∈ Bn ;n ≥ 1}, D, (Xn )n=1,2,... , and let X be constructed as in Theorem 5.5.1 relative to E. Let the objects { k(1),...,k(n) : (k1, . . . ,kn ) ∈ Bn ; n ≥ 1}, D , (Xn )n=1,2,... , and let X be similarly constructed relative to E .
178
Probability Theory
4. Let ε > 0 be arbitrary. Fix n ≡ [3 − log2 ε]1 . Thus 2−n+3 < ε. As in the proof of Theorem 5.5.1, let m ≡ mn ≡ n ∨ [log2 (1 ∨ β(2−n ))]1,
(5.5.25)
c ≡ m−1 2m+3 κm,
(5.5.26)
and let where β is the given modulus of tightness of the distributions E in Jβ (S,d) relative to the reference point x◦ ∈ S. Let α ≡ 2−n
m
Ki −1 = 2−n |Bm |−1 .
(5.5.27)
i=1
(m−1 α,c,β, ξ ) > 0 such By Assertion 1 of Proposition 5.3.11, there exists that if (m−1 α,c,β, ξ ), ρDist,ξ (E,E ) < δSk (ε, ξ ,β) ≡ then |Ef − E f | < m−1 α
(5.5.28)
for each f ∈ Cub (S) with Lipschitz constant c > 0 and with |f | ≤ 1. 5. Now suppose ρDist,ξ (E,E ) < δSk (ε, ξ ,β).
(5.5.29)
We will prove that ρP rob (X,X ) < ε. To that end, consider each (k1, . . . ,km ) ∈ Bm . We will calculate the endpoints of the corresponding open interval (ak(1),...,k(m),bk(1),...,k(m) ) ≡ k(1),...,k(m) . Recall that, by construction, {k(1),...,k(m−1),k : 1 ≤ k ≤ Km } is the set of subintervals in a partition of the open interval k(1),...,k(m−1) into mutually exclusive open subintervals, with k(1),...,k(m−1),k < k(1),...,k(m−1),j if 1 ≤ k < j ≤ Km . Hence the left endpoint of k(1),...,k(m) is ak(1),...,k(m) = ak(1),...,k(m−1) +
k(m)−1
μk(1),...,k(m−1),k
k=1
= ak(1),...,k(m−1) +
k(m)−1
k=1
Ef1,k(1) . . . fm−1,k(m−1) fm,k ,
(5.5.30)
Probability Space
179
where the second equality is due to Condition (i) in Step 4 of the proof of Theorem 5.5.1. Recursively, we then obtain ak(1),...,k(m) = ak(1),...,k(m−2) +
k(m−1)−1
Ef1,k(1) . . . fm−2,k(m−2) fm−1,k
k=1
+
k(m)−1
Ef1,k(1) . . . fm−1,k(m−1) fm,k
k=1
= ··· =
m k(i)−1
Ef1,k(1) . . . fi−1,k(i−1) fi,k ≡ mEhk(1),...,k(m) .
i=1 k=1
6. Similarly, write ,bk(1),...,k(m) ) ≡ k(1),...,k(m) . (ak(1),...,k(m)
Then ak(1),...,k(m) = mE hk(1),...,k(m) .
Combining, |ak(1),...,k(m) − ak(1),...,k(m) | = m|Ehk(1),...,k(m) − E hk(1),...,k(m) | < mm−1 α = α, (5.5.31)
where the inequality is obtained by applying inequality 5.5.28 to the function f ≡ hk(1),...,k(m) ; in Step 2, the latter was observed to have values in [0,1] and to have Lipschitz constant c ≡ m−1 2m+3 κm . By symmetry, we can similarly prove that |bk(1),...,k(m) − bk(1),...,k(m) | < α.
(5.5.32)
7. Inequality 5.5.22 in Step 11 of the proof of Theorem 5.5.1 gives d(Xm,X) ≤ 2−m+1
(5.5.33)
on DDn . Now partition the set Bm ≡ Bm,0 ∪Bm,1 ∪Bm,2 into three disjoint subsets such that Bm,0 = {(k1, . . . ,km ) ∈ Bm : km = Km }, Bm,1 = {(k1, . . . ,km ) ∈ Bm : km ≤ κm ;μk(1),...,k(m) > 2α}, Bm,2 = {(k1, . . . ,km ) ∈ Bm : km ≤ κm ;μk(1),...,k(m) < 3α}. 8. For each (k1, . . . ,km ) ∈ Bm,1, define the α-interior k(1),...,k(m) ≡ (ak(1),...,k(m) + α,bk(1),...,k(m) − α) of the open interval k(1),...,k(m) . Define the set k(1),...,k(m) ⊂ . H ≡ (k(1),...,k(m))∈B(m,1)
180
Probability Theory
Then
Hc =
c k(1),...,k(m) k(1),...,k(m) ∪
(k(1),...,k(m))∈B(m,1)
(k(1),...,k(m))∈B(m,2)
Hence
2α +
(k(1),...,k(m))∈B(m,1)
+μ
(k(1),...,k(m))∈B(m,1)
μk(1),...,k(m)
(k(1),...,k(m))∈B(m,2)
k(1),...,k(m)
(k(1),...,k(m))∈B(m);k(m)=K(m)
0 is arbitrary. Thus the mapping Sk,ξ : (J(S,d),ρDist,ξ ) → (M(0,S),ρP rob ) is uniformly continuous on the subspace Jβ (S,d), with δSk (·, ξ ,β) as a modulus of continuity. The theorem is proved. Skorokhod’s continuity theorem in [Skorokhod 1956], in terms of a.u. convergence, is a consequence of the preceding proof. Theorem 5.5.3. Continuity of Skorokhod representation in terms of weak convergence and a.u. convergence. Let ξ be a binary approximation of the locally compact metric space (S,d), relative to the reference point x◦ ∈ S. Let E,E (1),E (2), . . . be a sequence of distributions on (S,d) such that (n) E ⇒ E. Let X ≡ Sk,ξ (E) and X(n) ≡ Sk,ξ (E (n) ) for each n ≥ 1. Then X(n) → X a.u. on (0,L0,I ). Proof. Let ξ be the modulus of local compactness of (S,d) corresponding to ξ . By hypothesis, E (n) ⇒ E. Hence ρDist,ξ (E,E (n) ) → 0 by Proposition 5.3.5. By Proposition 5.3.10, the family Jβ (S,d) ≡ {E,E (1),E (2), . . .} is tight, with some modulus of tightness β. Let ε > 0 be arbitrary. Let δSk (ε, ξ ,β) > 0 be defined as in Theorem 5.5.2. By relation 5.5.38 in Step 10 of the proof of Theorem 5.5.2, we see that there exists a Lebesgue measurable subset H of [0,1] which depends only on E, with μH c < ε, such that for each E ∈ Jβ (S,d) we have H ⊂ (d(X,X ) < ε)
a.s.,
(5.5.39)
where X ≡ Sk,ξ (E ), provided that ρDist,ξ (E,E ) < δSk (ε, ξ ,β). Take any p ≥ 1 so large that ρDist,ξ (E,E (n) ) < δSk (ε, ξ ,β) for each n ≥ p. Then
182
Probability Theory H ⊂
∞
(d(X,X(n) ) < ε)
a.s.
n=p
Consequently, Xn → X a.u. according to Proposition 5.1.14.
5.6 Independence and Conditional Expectation The product space introduced in Definition 4.10.5 gives a model for compounding two independent experiments into one. This section introduces the notion of conditional expectations, which is a more general method of compounding probability spaces. Definition 5.6.1. Independent set of r.v.’s. Let (,L,E) be a probability space. A finite set {X1, . . . ,Xn } of r.v.’s where Xi has values in a complete metric space (Si ,di ), for each i = 1, . . . ,n, is said to be independent if Ef1 (X1 ) . . . fn (Xn ) = Ef1 (X1 ) . . . Efn (Xn )
(5.6.1)
for each f1 ∈ Cub (S1 ), . . . ,fn ∈ Cub (Sn ). In that case, we will also simply say that X1, . . . ,Xn are independent r.v.’s. A sequence of events A1, . . . ,An is said to be independent if their indicators 1A(1), . . . ,1A(n) are independent r.r.v.’s. An arbitrary set of r.v.’s is said to be independent if every finite subset is independent. Proposition 5.6.2. Independent r.v.’s from product space. Let F1, . . . ,Fn be distributions on the locally compact metric spaces (S1,d1 ), . . . ,(Sn,dn ), respectively. Let (S,d) ≡ (S1 × . . . ,Sn,d1 ⊗ . . . ⊗ dn ) be the product metric space. Consider the product integration space (,L,E) ≡ (S,L,F1 ⊗ · · · ⊗ Fn ) ≡
n (Sj ,Lj ,Fj ), j =1
where (Si ,Li ,Fi ) is the probability space that is the completion of (Si ,Cub (Si ),Fi ), for each i = 1, . . . ,n. Then the following conditions hold: 1. Let i = 1, . . . ,n be arbitrary. Define the coordinate r.v. Xi : → Si by Xi (ω) ≡ ωi for each ω ≡ (ω1, . . . ,ωn ) ∈ . Then the r.v.’s X1, . . . ,Xn are independent. Moreover, Xi induces the distribution Fi on (Si ,di ) for each i = 1, . . . ,n. 2. F1 ⊗ · · · ⊗ Fn is a distribution on (S,d). Specifically, it is the distribution F induced on (S,d) by the r.v. X ≡ (X1, . . . ,Xn ). Proof. 1. By Proposition 4.8.10, the continuous functions X1, . . . ,Xn on (S,L,E) are measurable. Let fi ∈ Cub (Si ) be arbitrary, for each i = 1, . . . ,n. Then Ef1 (X1 ) . . . fn (Xn ) = (F1 ⊗ · · · ⊗ Fn )(f1 ⊗ · · · ⊗ fn ) = F1 f1 . . . Fn fn (5.6.2)
Probability Space
183
by Fubini’s Theorem (Theorem 4.10.9). Let i = 1, . . . ,n be arbitrary. In the special case where fj ≡ 1 for each j = 1, . . . ,n with j i, we obtain, from equality 5.6.2, Efi (Xi ) = Fi fi .
(5.6.3)
Hence equality 5.6.2 yields Ef1 (X1 ) . . . fn (Xn ) = Ef1 (X1 ) . . . Efn (Xn ) where fi ∈ Cub (Si ) is arbitrary for each i = 1, . . . ,n. Thus the r.v.’s X1, . . . ,Xn are independent. Moreover, equality 5.6.3 shows that the r.v. Xi induces the distribution Fi on (Si ,di ) for each i = 1, . . . ,n. Assertion 1 is proved. 2. Since X is an r.v. with values in S, it induces a distribution EX on (S,d). Hence EX f ≡ Ef (X) = (F1 ⊗ · · · ⊗ Fn )f for each f ∈ Cub (S). Thus F1 ⊗ · · · ⊗ Fn = EX is a distribution F on (S,d). Assertion 2 is proved. Proposition 5.6.3. Basics of independence. Let (,L,E) be a probability space. For i = 1, . . . ,n, let Xi be an r.v. with values in a complete metric space (Si ,di ), and let (Si ,LX(i),EX(i) ) be the probability space it induces on (Si ,di ). Suppose the r.v.’s X1, . . . ,Xn are independent. Then, for arbitrary f1 ∈ LX(1), . . . ,fn ∈ LX(n) , we have E
n
fi (Xi ) =
i=1
n
Efi (Xi ).
(5.6.4)
i=1
Proof. 1. Consider each i = 1, . . . ,n. Let fi ∈ LX(i) be arbitrary. By Definition 5.2.3, LX(i) is the completion of (,Cub (Si ),EX(i) ). The r.r.v. fi ∈ LX(i) is therefore the L1 -limit relative to EX(i) of a sequence (fi,h )h=1,2,... in Cub (Si ) as h → ∞. Moreover, according to Assertion 1 of Proposition 5.2.6, we have fi (Xi ) ∈ L(Xi ) and E|fi,h (Xi ) − fi (Xi )| = EX(i) |fi,h − fi | → 0 as h → ∞. By passing to subsequences if necessary, we may assume that fi,h (Xi ) → fi (Xi )
a.u.
(5.6.5)
as h → ∞, for each i = 1, . . . ,n. 2. Consider the special case where fi ≥ 0 for each i = 1, . . . ,n. Then we may assume that fi,h ≥ 0 for each h ≥ 1, for each i = 1, . . . ,n. Let a > 0 be arbitrary. By the independence of the r.v.’s X1, . . . ,Xn , we then have E
n n n (fi,h (Xi ) ∧ a) = E(fi,h (Xi ) ∧ a) ≡ EX(i) (fi,h ∧ a). i=1
i=1
i=1
184
Probability Theory
In view of the a.u. convergence 5.6.5, we can let h → ∞ and apply the Dominated Convergence Theorem to obtain E
n
(fi (Xi ) ∧ a) =
i=1
n
EX(i) (fi ∧ a).
i=1
Now let a → ∞ and apply the Monotone Convergence Theorem to obtain E
n i=1
fi (Xi ) =
n i=1
EX(i) (fi ) =
n
Efi (Xi ).
i=1
3. The same equality for arbitrary f1 ∈ LX(1), . . . ,fn ∈ LX(n) follows by linearity. We next define the conditional expectation of an r.r.v. as the revised expectation given the observed values of all the r.v.’s in a family G. Definition 5.6.4. Conditional expectation. Let (,L,E) be a probability space, and let L be a probability subspace of L. Let Y ∈ L be arbitrary. If there exists X ∈ L such that EZY = EZX for each indicator Z ∈ L , then we say that X is the conditional expectation of Y given L , and define E(Y |L ) ≡ X. We will call L|L ≡ {Y ∈ L : E(Y |L ) exists} the subspace of conditionally integrable r.r.v.’s given the subspace L . In the special case where L ≡ L(G) is the probability subspace generated by a given family of r.v.’s with values in some complete metric space (S,d), we will simply write E(Y |G) ≡ E(Y |L ) and say that L|G ≡ L|L is the subspace of conditionally integrable r.r.v.’s given the family G. In the special case where G ≡ {V1, . . . ,Vm } for some m ≥ 1, we write also E(Y |V1, . . . ,Vm ) ≡ E(Y |G) ≡ E(Y |L ). In the further special case where m = 1, and where V1 = 1A for some measurable set A with P (A) > 0, it can easily be verified that, for arbitrary Y ∈ L, the conditional E(Y |1A ) exists and is given by E(Y |1A ) = P (A)−1 E(Y 1A )1A . In that case, we will write EA (Y ) ≡ P (A)−1 E(Y 1A ) for each Y ∈ L, and write PA (B) ≡ EA (1B ) for each measurable set B. The next lemma proves that (,L,EA ) is a probability space, called the conditional probability space given the event A. More generally, if Y1, . . . ,Yn ∈ L|L then we define the vector E((Y1, . . . ,Yn )|L ) ≡ (E(Y1 |L ), . . . ,E(Yn |L )) of integrable r.r.v.’s in L . Let A be an arbitrary measurable subset of (,L,E). If 1A ∈ L|L we will write P (A|L ) ≡ E(1A |L ) and call P (A|L ) the conditional probability of the event A given the probability subspace L . If 1A ∈ L|G for some give family
Probability Space
185
of r.v.’s with values in some complete metric space (S,d), we will simply write P (A|G) ≡ E(1A |G). In the case where G ≡ {V1, . . . ,Vm }, we write also P (A|V1, . . . ,Vm ) ≡ E(1A |V1, . . . ,Vm ). Before proceeding, note that the statement E(Y |L ) = X asserts two things: E(Y |L ) exists, and it is equal to X. We have defined the conditional expectation without the sweeping classical assertion of its existence. Before we use a particular conditional expectation, we will first supply a proof of its existence. Lemma 5.6.5. Conditional probability space given an event is indeed a probability space. Let the measurable set A be arbitrary, with P (A) > 0. Then the triple (,L,EA ) is indeed a a probability space. Proof. We need to verify the conditions in Definition 4.3.1 for an integration space. 1. Clearly, EA is a linear function on the linear L. 2. Let (Yi )i=0,1,2,... be an arbitrary sequence of functions in L such that Yi is nonnegative for each i ≥ 1 and such that ∞ i=1 EA (Yi ) < EA (Y0 ). Then ∞ E(Y 1 ) < E(Y 1 ) by the definition of the function EA . Hence, since i A 0 A i=1 ∞ ∞ E is an integration, there exists ω ∈ i=0 domain(Yi 1A ) such that i=1 Yi (ω)1A (ω) < Y0 (ω)1A (ω). It follows that 1A (ω) > 0. Dividing by 1A (ω), we obtain ∞ i=1 Yi (ω) < Y0 (ω). 3. Trivially, EA (1) ≡ P (A)−1 E(1A ) = 1. 4. Let Y ∈ L be arbitrary. Then EA (Y ∧ n) ≡ E(Y ∧ n)1A → E(Y )1A ≡ EA (Y ) as n → ∞. Similarly, EA (|Y | ∧ n−1 ) ≡ E(|Y | ∧ n−1 )1A → 0 as n → ∞. Summing up, all four conditions in Definition 4.3.1 are satisfied by the triple (,L,EA ). Because L is complete relative to the integration E in the sense of Definition 4.4.1, it is also complete relative to the integration EA . Because 1 ∈ L with EA (1) = 1, the complete integration space (,L,EA ) is a probability space. We will show that the conditional expectation, if it exists, is unique in the sense of equality a.s. The next two propositions prove basic properties of conditional expectations. Proposition 5.6.6. Basics of conditional expectation. Let (,L ,E) be a probability subspace of a probability space (,L,E). Then the following conditions hold: 1. Suppose Y1 = Y2 a.s. in L, and suppose X1,X2 ∈ L are such that EZYj = EZXj for each j = 1,2, for each indicator Z ∈ L . Then X1 = X2 a.s. Consequently, the conditional expectation, if it exists, is uniquely defined. 2. Suppose X,Y ∈ L|L . Then aX + bY ∈ L|L , and E(aX + bY |L ) = aE(X|L ) + bE(Y |L )
186
Probability Theory
for each a,b ∈ R. If, in addition, X ≤ Y a.s., then E(X|L ) ≤ E(Y |L ) a.s. In particular, if, in addition, |X| ∈ L|L , then |E(X|L )| ≤ E(|X||L ) a.s. 3. E(E(Y |L )) = E(Y ) for each Y ∈ L|L . Moreover, L ⊂ L|L , and E(X|L ) = X for each X ∈ L . 4. Let U ∈ L and Y ∈ L|L be arbitrary. In other words, suppose the conditional expectation E(Y |L ) exists. Then U Y ∈ L|L and E(U Y |L ) = U E(Y |L ). 5. Let Y ∈ L be arbitrary. Let G be an arbitrary set of r.v.’s with values in some complete metric space (S,d). Suppose there exists X ∈ L(G) such that EY h(V1, . . . ,Vk ) = EXh(V1, . . . ,Vk )
(5.6.6)
for each h ∈ Cub (S k ), for each finite subset {V1, . . . ,Vk } ⊂ G, and for each k ≥ 1; then E(Y |G) = X. 6. Let L be a probability subspace with L ⊂ L . Suppose Y ∈ L|L with X ≡ E(Y |L ). Then Y ∈ L|L iff X ∈ L , in which case E(Y |L ) = E(Y |L ). Moreover, E(E(Y |L )|L ) = E(Y |L ). 7. If Y ∈ L, and if Y,Z are independent for each indicator Z ∈ L , then E(Y |L ) = EY . 8. Let Y be an r.r.v.. with Y 2 ∈ L. Suppose X ≡ E(Y |L ) exists. Then X2,(Y − X)2 ∈ L and EY 2 = EX2 + E(Y − X)2 . Consequently, EX2 ≤ EY 2 and E(Y − X)2 ≤ EY 2 . Proof. 1. Suppose Y1 = Y2 a.s. in L, and suppose X1,X2 ∈ L are such that EZYj = EZXj for each j = 1,2, for each indicator Z ∈ L . Let t > 0 be arbitrary and let Z ≡ 1(t 0, with Z ≡ 1(U −V >t) ∈ L . Hence tP (U − V > t) = Et1(U −V >t) ≤ E1(U −V >t) (U − V ) = EZ(U − V ) = EZ(X − Y ) ≤ 0.
Probability Space
187
Consequently, (U − V > t) is a null set and (U − V ≤ t) is a full set, for each t > 0. It follows that D ≡ ∞ n=1 (U − V ≤ tn ) is a full set, where we let (tn )n=1,2,... be an arbitrary decreasing sequence in (0,∞) with tn ↓ 0. Moreover, E(X|L ) − E(Y |L ) ≡ U − V ≤ 0 on the full set D. Assertion 2 is proved. 3. If Y ∈ L|L , then E(E(Y |L )) = E(1E(Y |L )) = E(1Y ) = E(Y ). Moreover, if Y ∈ L , then for each indicator Z ∈ L , we have trivially E(ZY ) = E(ZY ), whence E(Y |L ) = Y . Assertion 3 is proved. 4. Suppose Y ∈ L|L , with X ≡ E(Y |L ). Then, by definition, EZY = EZX
(5.6.7)
for each indicator Z ∈ L . The equality extends to all linear combinations of integrable indicators in L’. The reader can verify that such linear combinations are dense in L relative to the L1 -norm, relative to the integration E. Hence equality 5.6.7 extends, by the Dominated Convergence Theorem, to all such bounded integrable r.r.v.’s Z ∈ L . Moreover, if U,Z ∈ L are bounded and integrable r.r.v.’s, so is U Z, and equality 5.6.7 implies that E(U ZY ) = E(U ZX). It follows that E(U Y |L ) = U X = U E(Y |L ). This verifies Assertion 4. 5. Let Y ∈ L be arbitrary, such that equality 5.6.6 holds for some X ∈ L(G). Let Z be an arbitrary indicator in L ≡ L(G). Then Z is the L1 -limit, relative to E, of some sequence (hn (Vn,1, . . . ,Vn,k(n) ))n=1,2,... , where hn ∈ Cub (S k(n) ) with 0 ≤ hn ≤ 1, for each n ≥ 1. It follows that EYZ = lim EY hn (Vn,1, . . . ,Vn,k(n) ) = lim EXhn (Vn,1, . . . ,Vn,k(n) ) = EXZ, n→∞
n→∞
where the second equality is due to equality 5.6.6. Thus E(Y |G) ≡ E(Y |L ) = X, proving Assertion 5. 6. Let L be a probability subspace with L ⊂ L . Suppose Y ∈ L|L . Let X ≡ E(Y |L ). Suppose, in addition, Y ∈ L|L . Let W ≡ E(Y |L ) ∈ L . Then E(U Y ) = E(U W ) for each U ∈ L . Hence E(U W ) = E(U Y ) = E(U X) for each U ∈ L ⊂ L . Thus X ∈ L|L with E(X|L ) = W ≡ E(Y |L ). We have proved the “only if” part of Assertion 6. Conversely, suppose X ∈ L|L . Consider each U ∈ L ⊂ L . Then, because X ≡ E(Y |L ), we have EU Y ≡ EU X. Thus Y ∈ L|L . We have proved also the “if” part of Assertion 6. 7. Suppose Y ∈ L, and suppose Y,Z are independent for each indicator Z ∈ L . Then, for each indicator Z ∈ L , we have E(ZY ) = (EZ)(EY ) = E(ZEY ). Since trivially EY ∈ L , it follows that E(Y |L ) = EY . 8. Let Y be an r.r.v. with Y 2 ∈ L. Suppose X ≡ E(Y |L ) exists. Since Y ∈ Y 2 ∈ L, there exists a deceasing sequence ε1 > ε2 > · · · of positive real numbers such that EY 2 1A < 2−k for each measurable set A with P (A) < εk , for each k ≥ 1.
188
Probability Theory
Since X is an r.r.v., there exists a sequence 0 ≡ a0 < a1 < a2 < · · · of positive real numbers with ak → ∞ such that P (|X| ≥ ak ) < εk . Let k ≥ 1 be arbitrary. Then EY 2 1(|X|≥a(k)) < 2−k . Write Zk ≡ 1(a(k+1)>|X|≥a(k)) . Then Zk ,XZk ,X2 Zk ∈ L are bounded in absolute 2 , respectively. Hence value by 1,ak+1,ak+1 EY 2 1(a(k+1)>|X|≥a(k)) = E((Y − X) + X)2 Zk = E(Y − X)2 Zk + 2E(Y − X)XZk + EX2 Zk = E(Y − X)2 Zk + 2E(Y XZk ) − 2E(X2 Zk ) + EX2 Zk = E(Y − X)2 Zk + 2E(E(Y |L )XZk ) − EX2 Zk ≡ E(Y − X)2 Zk + 2E(XXZk ) − EX2 Zk = E(Y − X)2 Zk + EX2 Zk ,
(5.6.8)
where the fourth equality is because XZk ∈ L . Since Y 2 ∈ L by assumption, equality 5.6.8 implies that ∞
EX2 Zk ≤
k=0
∞
EY 2 1(a(k+1)>|X|≥a(k)) = EY 2 .
k=0
Consequently, X2 =
∞
X2 1(a(k+1)>|X|≥a(k)) ≡
k=0
∞
X2 Zk ∈ L.
k=0
Similarly, (Y − X) ∈ L2 . Moreover, summing equality 5.6.8 over k = 0,1, . . . , we obtain EY 2 = E(Y − X)2 + EX2 . Assertion 8 and the proposition are proved.
Proposition 5.6.7. The space of conditionally integrable functions given a probability subspace is closed relative to the L1 -norm. Let (,L,E) be a probability space. Let (,L ,E) be a probability subspace of (,L,E). Let L|L be the space of r.r.v.’s conditionally integrable given L . Then the following conditions hold: 1. Let X,Y ∈ L be arbitrary. Suppose EU X ≤ EU Y for each indicator U ∈ L . Then EZX ≤ EZY for each bounded nonnegative r.r.v. Z ∈ L . 2. Suppose Y ∈ L|L . Then E|E(Y |L )| ≤ E|Y |. 3. The linear subspace L|L of L is closed relative to the L1 -norm. Proof. 1. Suppose EU X ≤ EU Y for each indicator U ∈ L . Then, by linearity, EV X ≤ EV Y for each nonnegative linear combination Y of indicators in L . Now consider each bounded nonnegative r.r.v. Z ∈ L . We may assume, without
Probability Space
189
loss of generality, that Z has values in [0,1]. Then E|Z − Vn | → 0 for some sequence (Vn )n=1,2,... of nonnegative linear combinations of indicators in L , with values in [0,1]. By passing to a subsequence, we may assume that Vn → Z a.s. Hence, by the Dominated Convergence Theorem, we have EZX = lim EVn X ≤ lim EVn Y = EZY . n→∞
n→∞
2. Suppose Y ∈ L|L , with E(Y |L ) = X ∈ L . Let ε > 0 be arbitrary. Then, since X is integrable, there exists a > 0 such that EX1(|X|≤a) < ε. Hence E|X| = EX1(X>a) − EX1(Xa) − EX1(Xa) − EY 1(Y a) + E|Y |1(X 0 is arbitrarily small, we conclude that E|X| ≤ E|Y |, as alleged. 3. Let (Yn )n=1,2,... be a sequence in L|L such that E|Yn − Y | → 0 for some Y ∈ L. For each n ≥ 1, let Xn ≡ E(Yn |L ). Then, by Assertion 2, we have E|Xn − Xm | = E|E(Yn − Ym |L )| ≤ E|Yn − Ym | → 0 as n,m → ∞. Thus (Xn )n=1,2,... is a Cauchy sequence in the complete metric space L relative to the L1 -norm. It follows that E|Xn −X| → 0 for some X ∈ L , as n → ∞. Hence, for each indicator Z ∈ L , we have EXZ = lim EXn Z = lim EYn Z = EY Z. n→∞
It follows that
E(Y |L )
n→∞
= X and Y ∈ L|L .
5.7 Normal Distribution The classical development of the topics in the remainder of this chapter is an exemplar of constructive mathematics. However, some tools in this development have been given many proofs – some constructive and others not. An example is the spectral theorem for symmetric matrices discussed in this section. For ease of reference, we therefore present some of these topics here, using only constructive proofs. Recall some notations and basic theorems from matrix algebra. Definition 5.7.1. Matrix notations. For an arbitrary m × n matrix ⎤ ⎡ θ1,1, . . . , θ1,n ⎢ · ... · ⎥ ⎥ ⎢ ⎥ ⎢ θ ≡ [θi,j ]i=1,...,m;j =1,...,n ≡ ⎢ · ... · ⎥, ⎥ ⎢ ⎣ · ... · ⎦ θm,1, . . . , θm,n
190
Probability Theory
of real or complex elements θi,j , we will let ⎡ θ T ≡ [θj,i ]j =1,...,n;1=1,...m
⎢ ⎢ ⎢ =⎢ ⎢ ⎣
θ1,1, · · · θ1,n
..., ... ... ... ,...,
θm,1 · · · θm,n
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
denote the transpose, which is an n × m matrix. If n = m and θ = θ T , then θ is said to be symmetric. If θi,j = 0 for each i,j = 1, . . . ,n with i j , then θ is called a diagonal matrix. For each sequence of complex numbers (λ1, . . . ,λn ), write diag(λ1, . . . ,λn ) for the diagonal matrix θ with θi,i = λi for each i = 1, . . . ,n. A matrix θ is said to be real if all its elements θi,j are real numbers. Unless otherwise specified, all matrices in the following discussion are assumed to be real. For an arbitrary sequence μ ≡ (μ1, . . . ,μn ) ∈ R n , we will abuse notations and let μ denote also the column vector ⎤ ⎡ μ1 ⎢ . ⎥ ⎥ ⎢ ⎥ ⎢ μ ≡ (μ1, . . . ,μn ) ≡ ⎢ . ⎥ . ⎥ ⎢ ⎣ . ⎦ μn Thus μT = [μ1, . . . ,μn ]. A 1 × 1 matrix is identified with its only entry. Hence, if μ ∈ R n , then / 0 n . 0
μ2i . |μ| ≡ μ ≡ μT μ = 1 i=1
We will let In denote the n × n diagonal matrix diag(1, . . . ,1). When the dimension n is understood, we write simply I ≡ In . Likewise, we will write 0 for any matrix whose entries are all equal to the real number 0, with dimensions understood from the context. The determinant of an n × n matrix θ is denoted by det θ . The n complex roots λ1, . . . ,λn of the polynomial det(θ − λI ) of degree n are called the eigenvalues of θ . Then det θ = λ1 . . . λn . Let j = 1, . . . ,n be arbitrary. Then there exists a nonzero column vector xj , whose elements are in general complex, such that θ xj = λj xj . The vector xj is called an eigenvector for the eigenvalue λj . If θ is real and symmetric, then the n eigenvalues λ1, . . . ,λn are real. Let σ be a symmetric n × n matrix whose elements are real. Then σ is said to be nonnegative definite if x T σ x ≥ 0 for each x ∈ R n . In that case, all its eigenvalues are nonnegative and, for each eigenvalue, there exists a real eigenvector whose elements are real. It is said to be positive definite if x T σ x > 0 for each nonzero x ∈ R n . In that case, all its eigenvalues are positive, whence σ is nonsingular, with
Probability Space
191
an inverse σ −1 . An n × n real matrix U is said to be orthogonal if U T U = I . This is equivalent to saying that the column vectors of U form an orthonormal basis of R n . Theorem 5.7.2. Spectral theorem for symmetric matrices. Let θ be an arbitrary n × n symmetric matrix. Then the following conditions hold: 1. There exists an orthogonal matrix U such that U T θ U = , where Λ ≡ diag(λ1, . . . ,λn ) and λ1, . . . ,λn are eigenvalues of θ . 2. Suppose, in addition, that λ1, . . . ,λn are nonnegative. Define the symmetric 1
1
1
1
matrix A ≡ U Λ 2 U T , where Λ 2 = diag(λ12 , . . . ,λn2 ). Then θ = AAT . Proof. 1. Proceed by induction on n. The assertion is trivial if n = 1. Suppose the assertion has been proved for n − 1. Recall that for an arbitrary unit vector vn , there exist v1, . . . ,vn−1 ∈ R n such that v1, . . . ,vn−1,vn form an orthonormal basis of R n . Now let vn be an eigenvector of θ corresponding to λn . Let V be the n × n matrix whose ith column is vi for each i = 1, . . . ,n. Then V is an orthogonal matrix. Define an (n − 1) × (n − 1) symmetric matrix η by ηi,j ≡ vTi θ vj for each i,j = 1, . . . ,n − 1. By the induction hypothesis, there exists an (n − 1) × (n − 1) orthogonal matrix ⎤ ⎡ w1,1, ..., w1,n−1 ⎥ ⎢ · ... · ⎥ ⎢ ⎥ ⎢ W ≡⎢ · ... · ⎥ ⎥ ⎢ ⎦ ⎣ · ... · wn−1,1, . . . , wn−1,n−1 such that W T ηW = n−1 = diag(λ1, . . . ,λn−1 )
(5.7.1)
for some λ1, . . . ,λn−1 ∈ R. Define the n × n matrices ⎡ ⎢ ⎢ ⎢ ⎢ W ≡ ⎢ ⎢ ⎢ ⎣
w1,1, · · · wn−1,1,
..., ... ... ... ...,
w1,n−1, · · · wn−1,n−1,
0 · · · 0
0,
...,
0,
1
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
and U ≡ V W . Then it is easily verified that U is orthogonal. Moreover,
192
Probability Theory
U T θ U = W T V T θ V W ⎡ T v1 θ v1, . . . , ⎢ · ... ⎢ ⎢ · ... ⎢ = W T ⎢ · ... ⎢ ⎢ T θ v , . .., v ⎣ n−1 1
vT1 θ vn−1, · · · vTn−1 θ vn−1,
vTn θ v1, ⎡ ⎢ ⎢ ⎢ ⎢ =⎢ ⎢ ⎢ ⎣
w1,1, ..., · ... · ... · ... wn−1,1, . . . , 0,
...,
⎤ vT1 θ vn · ⎥ ⎥ · ⎥ ⎥ W · ⎥ ⎥ ⎥ T vn−1 θ vn⎦
vTn θ vn−1,
...,
w1,n−1, · · · wn−1,n−1,
0 · · · 0
0,
1
⎡
vTn θ vn ⎤T⎡ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎦⎣
η1,1, ..., · ... · ... · ... ηn−1,1, . . . , 0,
...,
η1,n−1, · · · ηn−1,n−1,
0 · · · 0
0,
λn
⎤
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
w1,1, ..., w1,n−1, 0 ⎢ · . . . · · ⎥ ⎢ ⎥ ⎢ · ... · · ⎥ ⎢ ⎥ ⎢ ⎥ · ... · · ⎥ ⎢ ⎢ ⎥ ⎣ wn−1,1, . . . , wn−1,n−1, 0 ⎦ 0, ..., 0, 1 ⎡ ⎤ λ1 , . . . , 0, 0 ⎢ · ... · · ⎥ ⎢ ⎥ ⎢ · ... · · ⎥ ⎢ ⎥ =⎢ ⎥ ≡ Λ ≡ diag(λ1, . . . ,λn ), ... · · ⎥ ⎢ · ⎢ ⎥ ⎣ 0 . . . , λn−1, 0 ⎦ 0 ..., 0, λn where the fourth equality is thanks to equality 5.7.1. Induction is completed. The equality U T θ U = implies that θ U = U and that λi is an eigenvalue of θ with an eigenvector given by the ith column of U . Assertion 1 is thus proved. Since 1
1
1
1
θ = U ΛU T = U Λ 2 Λ 2 U T = U Λ 2 U T U Λ 2 U T = AAT , Assertion 2 is proved.
Definition 5.7.3. Normal distribution with positive definite covariance. Let n ≥ 1 and μ ∈ R n be arbitrary. Let σ be an arbitrary positive definite n × n matrix. Then the function defined on R n by 1 − n2 − 12 ϕμ,σ (y) ≡ (2π ) (det σ ) exp − (y − μ)T σ −1 (y − μ) (5.7.2) 2
Probability Space
193
for each y ∈ R n is a p.d.f. Let μ,σ be the corresponding distribution on R n , and let Y ≡ (Y1, . . . ,Yn ) be any r.v. with values in R n and with μ,σ as its distribution. Then ϕμ,σ , μ,σ , Y , and the sequence Y1, . . . ,Yn are said to be the normal p.d.f., the normal distribution, normally distributed, and jointly normal, respectively, with mean μ and covariance matrix σ . Proposition 5.7.6 later in this section justifies the terminology. The p.d.f. ϕ0,I and the distribution 0,I are said to be standard normal, where I is the identity √ matrix. In the case where n = 1, define σ ≡ σ and write μ,σ 2 also for the P.D.F. with the distribution μ,σ 2 , and call it a normal P.D.F. Thus 0,1 (x) = associated x ϕ (u)du for each x ∈ R. −∞ 0,1 In Definition 5.7.7, we will generalize the definition of normal distributions to an arbitrary nonnegative definite matrix σ . Proposition 5.7.4. Basics of standard normal distribution. Consider the case n = 1. Then the following conditions hold: 1. The function ϕ0,1 on R defined by 1 1 ϕ0,1 (x) ≡ √ exp − x 2 2 2π is a p.d.f. on R relative to the Lebesgue measure. Thus 0,1 is a P.D.F. on R.# 2. Write ≡ 0,1 . We will call the function ≡ 1 − : [0,∞) → 0, 12 the tail of . Then (−x) = 1 − (x) for each x ∈ R. Moreover, (x) ≤ e−x
2 /2
for each x ≥ 0. # 3. The inverse function : 0, 12 → [0,∞) of is a decreasing 2 function from (0,1) to R# such that (ε) → ∞ as ε → 0. Moreover, (ε) ≤ −2 log ε for each ε ∈ 0, 12 . Proof. 1. We calculate ! +∞ 1 2 e−x /2 dx √ 2π −∞
2
=
1 2π
!
+∞ ! +∞ −∞
−∞
e−(x
2 +y 2 )/2
dxdy = 1,
(5.7.3)
where the last equality is by Corollary A.0.13, which also says that the function 2 2 (x,y) → e−(x +y )/2 on R 2 is Lebesgue integrable. Thus ϕ0,1 is Lebesgue integrable, with an integral equal to 1, and hence is a p.d.f. on R. 2. First, ! −x ϕ(u)du (−x) ≡ −∞ ! ∞ ! ∞ = ϕ(−v)dv = ϕ(v)dv = 1 − (x), x
x
where we made a change of integration variables v = −u and noted that ϕ(−v) = ϕ(v).
194
Probability Theory
Next, if x ∈ √1 ,∞ , then 2π ! ∞ 1 2 e−u /2 du (x) ≡ √ 2π x ! ∞ u −u2 /2 1 1 −x 2 /2 1 2 e e du = √ ≤ e−x /2 . ≤√ 2π x x 2π x
" On the other hand, if x ∈ 0, √1 , then 2π 23 1 1 2 2 ≤ e−x /2 . (x) ≤ (0) = < exp − √ 2 2π "
Therefore, by continuity, (x) ≤ e−x /2 for 2each x ∈ [0,∞). 2 3. Consider any ε ∈ (0,1). Define x ≡ −2 log ε. Then (x) ≤ e−x /2 = ε by Assertion 2. Since is a decreasing function, it follows that 2 −2 log ε ≡ x = ( (x)) ≥ (ε). 2
Proposition 5.7.5. Moments of standard normal r.r.v. Suppose an r.r.v. X has 2 the standard normal distribution 0,1 , with p.d.f. ϕ0,1 (x) ≡ √1 e−x /2 . Then Xm 2π is integrable for each m ≥ 0. Moreover, for each even integer m ≡ 2k ≥ 0, we have EXm = EX2k = (2k − 1)(2k − 3) . . . 3 · 1 = (2k)! 2−k /k! , while EXm = 0 for each odd integer m > 0. Proof. Let m ≥ 0 be any even integer. Let a > 0 b arbitrary. Then, integrating by parts, we have ! a 1 1 2 2 x m+2 e−x /2 dx = √ (−x m+1 e−x /2 )|a−a √ 2π −a 2π ! a 1 2 + (m + 1) √ x m e−x /2 dx. (5.7.4) 2π −a Since the function gm defined by gm (x) ≡ 1[−a,a] (x)x m for each x ∈ R is Lebesgue measurable and is bounded by the integrable function x m relative to the P.D.F. 0,1 , the function gm is integrable relative to the P.D.F. 0,1 , which has ϕ0,1 as its p.d.f. Moreover, equality 5.7.4 can be rewritten as ! ! 1 2 gm+2 (x)d 0,1 (x) = √ (−x m+1 e−x /2 )|a−a + (m + 1) gm (x)d 0,1 (x) 2π or, in view of Proposition 5.4.4, as 1 2 Egm+2 (X) = √ (−x m+1 e−x /2 )|a−a + (m + 1)Egm (X). 2π
(5.7.5)
The lemma is trivial for m = 0. Suppose the lemma has been proved for integers up to and including the even integer m ≡ 2k − 2. By the induction hypothesis,
Probability Space
195
Xm is integrable. At the same time, gm (X) → Xm in probability as a → ∞. Hence, by the Dominated Convergence Theorem, we have Egm (X) ↑ EXm as 2 a → ∞. Since |a|m+1 e−a /2 → 0 as a → ∞, equality 5.7.5 yields Egm+2 (X) ↑ (m + 1)EXm as a → ∞. The Monotone Convergence Theorem therefore implies that Xm+2 is integrable, with EXm+2 = (m + 1)EXm , or EX2k = (2k − 1)EX2k−2 = · · · = (2k − 1)(2k − 3) . . . 1 = (2k)! 2−k /k! Since Xm+2 is integrable, so is Xm+1 , according to Lyapunov’s inequality. Moreover, ! ! m+1 m+1 = x d 0,1 (x) = x m+1 ϕ0,1 (x)dx = 0 EX since x m+1 ϕ0,1 (x) is an odd function of x ∈ R. Induction is completed.
The next proposition shows that ϕμ,σ and μ,σ in Definition 5.7.3 are well defined. Proposition 5.7.6. Basics of normal distributions with positive definite covariance. Let n ≥ 1 and μ ∈ R n be arbitrary. Let σ be an arbitrary positive definite n × n matrix. Use the notations in Definition 5.7.3. Then the following conditions hold: 1. ϕμ,σ is indeed a p.d.f. on R n , i.e., ϕμ,σ (x)dx = 1, where ·dx stands for the Lebesgue integration on R n . Thus the corresponding distribution μ,σ on R n is well defined. Moreover, μ,σ is equal to the distribution of the r.v. Y ≡ μ+AX, where A is an arbitrary n × n matrix with σ = AAT , and where X is an arbitrary r.v. with values in R n and with the standard normal distribution 0,I . In short, linear combinations of a finite set of standard normal r.r.v.’s and the constant 1 are jointly normal. More generally, linear combinations of a finite set of jointly normal r.r.v.’s are jointly normal. 2. Let Z ≡ (Z1, . . . ,Zn ) be an r.v. with values in R n with distribution μ,σ . Then EZ = μ and E(Z − μ)(Z − μ)T = σ . 3. Let Z1, . . . ,Zn be jointly normal r.r.v.’s. Then Z1, . . . ,Zn are independent iff they are pairwise uncorrelated. In particular, if Z1, . . . ,Zn are jointly standard normal, then they are independent. Proof. For each x ≡ (x1, . . . ,xn ) ∈ R n , we have, by Definition 5.7.3, 1 − n2 ϕ0,I (x1, . . . ,xn ) ≡ ϕ0,I (x) ≡ (2π ) exp − x T x 2 n 1 1 = ϕ0,1 (x1 ) . . . ϕ0,1 (xn ). = √ exp − xi2 2 2π i=1 Since ϕ0,1 is a p.d.f. on R according to Proposition 5.7.4, Proposition 4.10.6 implies that the Cartesian product ϕ0,I is a p.d.f. on R n . Let X ≡ (X1, . . . ,Xn ) be an arbitrary r.v. with values in R n and with p.d.f. ϕ0,I . Then
196
!
Ef1 (X1 ) . . . fn (Xn ) = =
! ···
n !
Probability Theory f1 (x1 ) . . . fn (xn )ϕ0,1 (x1 ) . . . ϕ0,1 (xn )dx1 . . . dxn
fi (xi )ϕ0,1 (xi )dxi = Ef1 (X1 ) . . . Efn (Xn )
i=1
(5.7.6) for each f1, . . . ,fn ∈ C(R). Separately, for each i = 1, . . . ,n, the r.r.v. Xi has distribution ϕ0,1 , whence Xi has an mth moment for each m ≥ 0, with EXim = 0 if m is odd, according to Proposition 5.7.5. 1. Next let σ,μ be as given. Let A be an arbitrary n × n matrix such that σ = AAT . By Theorem 5.7.2, such a matrix A exists. Then det(σ ) = det(A)2 . Since σ is positive definite, it is nonsingular and so is A. Let X be an arbitrary r.v. with values in R n and with the standard normal distribution 0,I . Define the r.v. Y ≡ μ + AX. Then, for arbitrary f ∈ C(R n ), we have ! Ef (Y ) = Ef (μ + AX) = f (μ + Ax)ϕ0,I (x)dx n
≡ (2π )− 2
n
= (2π )− 2
n
= (2π )− 2 ! ≡
1 f (μ + Ax) exp − x T x dx 2 ! 1 −1 det(A) f (y) exp − (y − μ)T (A−1 )T A−1 (y − μ) dy 2 ! 1 1 det(σ )− 2 f (y) exp − (y − μ)T σ −1 (y − μ) dy 2 !
f (y)ϕμ,σ (y)dy,
where the fourth equality is by the change of integration variables y = μ + Ax. Thus ϕμ,σ is the p.d.f. on R n of the r.v. Y , and μ,σ is the distribution of Y . 2. Next, let Z1, . . . ,Zn be jointly normal r.r.v.’s with distribution μ,σ . By Assertion 1, there exist a standard normal r.v. X ≡ (X1, . . . ,Xn ) on some probability space ( ,L ,E ), and an n × n matrix AAT = σ , such that E f (μ + AX) = μ,σ (f ) = Ef (Z) for each f ∈ C(R n ). Thus Z and Y ≡ μ+AX induce the same distribution on R n . Let i,j = 1, . . . ,n be arbitrary. Since Xi ,Xj ,Xi Xj and, therefore, Yi ,Yj ,Yi Yj are integrable, so are Zi ,Zj ,Zi Zj , with EZ = E Y = μ + AE X = μ and E(Z − μ)(Z − μ)T = E (Y − μ)(Y − μ)T = AE XXT AT = AAT = σ . 3. Suppose Z1, . . . ,Zn are pairwise uncorrelated. Then σ i,j = E(Zi −μi )(Zj − μj ) = 0 for each i,j = 1, . . . ,n with i j . Thus σ and σ −1 are diagonal matrices, with (σ −1 )i,j = σ i,i or 0 depending on whether i = j or not. Hence, for each f1, . . . ,fn ∈ C(R), we have
Probability Space
197
Ef1 (Z1 ) . . . fn (Zn ) n
1
= (2π )− 2 (det σ )− 2 ! ! 1 × · · · f (z1 ) . . . f (zn ) exp − (z − μ)T σ −1 (z − μ) dz1 . . . dzn 2 1
n
= (2π )− 2 (σ 1,1 . . . σ n,n )− 2 ! ! n
1 − (zi − μi )σ −1 × · · · · · · f (z1 ) . . . f (zn )exp i,i (zi − μi ) dz1 . . . dzn 2 i=1 ! 1 1 = (2π σ i,i )− 2 f (zi ) exp − (zi − μi )σ −1 i,i (zi − μi ) dzi 2 = Ef1 (Z1 ) . . . Efn (Zn ). We conclude that Z1, . . . ,Zn are independent if they are pairwise uncorrelated. The converse is trivial. Next we generalize the definition of a normal distribution to include the case where the covariance matrix is nonnegative definite. Definition 5.7.7. Normal distribution with nonnegative definite covariance. Let n ≥ 1 and μ ∈ R n be arbitrary. Let σ be an arbitrary nonnegative definite n × n. Define the normal distribution μ,σ on R n by μ,σ (f ) ≡ lim μ,σ +εI (f ) ε→0
(5.7.7)
for each f ∈ C(R n ), where for each ε > 0, the function μ,σ +εI is the normal distribution on R n introduced in Definition 5.7.3 for the positive definite matrix σ + εI . Lemma 5.7.8 (next) proves that μ,σ is well defined and is indeed a distribution. A sequence Z1, . . . ,Zn of r.r.v.’s is said to be jointly normal, with μ,σ as its distribution, if Z ≡ (Z1, . . . ,Zn ) has the distribution μ,σ on R n . Lemma 5.7.8. Normal distribution with nonnegative definite covariance is well defined. Use the notations and assumptions in Definition 5.7.7. Then the following conditions hold: 1. The limit limε→0 μ,σ +εI (f ) in equality 5.7.7 exists for each f ∈ C(R n ). Moreover, μ,σ is the distribution of Y ≡ μ + AX for some standard normal T X ≡ (X1, . . . ,Xn ) and some n × n matrix A with AA = σ . 2. If σ is positive definite, then μ,σ (f ) = f (y)ϕμ,σ (y)dy, where ϕμ,σ was defined in Definition 5.7.3. Thus Definition 5.7.7 of μ,σ for a nonnegative definite σ is consistent with Definition 5.7.3 for a positive definite σ . 3. Let Z ≡ (Z1, . . . ,Zn ) be an arbitrary r.v. with values in R n and with k(1) k(n) distribution μ,σ . Then Z1 . . . Zn is integrable for each k1, . . . ,kn ≥ 0. In particular, Z has mean μ and covariance matrix σ .
198
Probability Theory
Proof. 1. Let ε > 0 be arbitrary. Then σ + εI is positive definite. Hence, the normal distribution μ,σ +εI has been defined. Separately, Theorem 5.7.2 implies that there exists an orthogonal matrix U such that U T σ U = , where Λ ≡ diag(λ1, . . . ,λn ) is a diagonal matrix whose diagonal elements consist of the eigenvalues λ1, . . . ,λn of σ . These eigenvalues are nonnegative since σ is nonnegative definite. Hence, again by Theorem 5.7.2, we have σ + εI = Aε ATε ,
(5.7.8)
where 1
Aε ≡ U Λε2 U T ,
(5.7.9)
1 √ √ where Λε2 = diag( λ1 + ε, . . . , λn + ε). Now let X be an arbitrary r.v. on R n with the standard normal distribution 0,I . In view of equality 5.7.8, Proposition 5.7.6 implies that μ,σ +εI is equal to the distribution of the r.v.
Y (ε) ≡ μ + Aε X. √ √ 1 1 Define A ≡ U Λ 2 U T , where Λ 2 = diag( λ1, . . . , λn ) and define Y ≡ μ + AX. Then E|Aε X − AX|2 = EXT (Aε − A)T (Aε − A)X = = = =
n
n
n
2 2 EXi Ui,j ( λj + ε − λj )2 Uj,k Xk
i=1 j =1 k=1 n
n
2 2 Ui,j ( λj + ε − λj )2 Uj,i
i=1 j =1 n
n
2 2 ( λ j + ε − λ j )2 Ui,j Uj,i
j =1 n
i=1
2 2 ( λ j + ε − λ j )2 → 0
j =1
as ε → 0. Lyapunov’s inequality then implies that 1
E|Y (ε) − Y | = E|Aε X − AX| ≤ (E|Aε X − AX|2 ) 2 → 0 as ε → 0. In other words, Y (ε) → Y in probability. Consequently, the distribution μ,σ +εI converges to the distribution FY of Y . We conclude that the limit μ,σ (f ) in equality 5.7.7 exists and is equal to EF (Y ). In other words, μ,σ is the distribution of the r.v. Y ≡ μ + AX. Moreover, 1
1
AAT = U Λ 2 U T U Λ 2 U T = U ΛU T = σ . Assertion 1 is proved.
Probability Space
199
2. Next suppose σ is positive definite. Then ϕμ,σ +εI → ϕμ,σ uniformly on compact subsets of R n . Hence ! ! lim μ,σ +εI (f ) = lim f (y)ϕμ,σ +εI (y)dy = f (y)ϕμ,σ (y)dy ε→0
ε→0
for each f ∈ C(R n ). Therefore Definition 5.7.7 is consistent with Definition 5.7.3, proving Assertion 2. 3. Now let Z ≡ (Z1, . . . ,Zn ) be any r.v. with values in R n and with distribution μ,σ . By Assertion 1, μ,σ is the distribution of Y ≡ μ + AX for some standard normal X ≡ (X1, . . . ,Xn ) and some n × n matrix A with AAT = σ . Thus Z and Y has the same distribution. Let k1, . . . ,kn ≥ 0 be arbitrary. Then the j (1) j (n) k(1) k(n) is a linear combination of products X1 . . . Xn inter.r.v. Y1 . . . Yn grable where j1, . . . ,jn ≥ 0, each of which is integrable in view of Proposition k(1) k(n) 5.7.5 and Proposition 4.10.6. Hence Y1 . . . Yn is integrable. It follows that k(1) k(n) Z1 . . . Zn is integrable. EZ = EY = μ and E(Z − μ)(Z − μ)T = E(Y − μ)(Y − μ)T = EAXXT AT = AAT = σ . In other words, Z has mean μ and covariance matrix σ , proving Assertion 3.
We will need some bounds related to the normal p.d.f. in later sections. Recall from Proposition 5.7.4 the standard normal P.D.F. on R, its tail , and the inverse function of the latter. Lemma 5.7.9. Some bounds for normal probabilities. 1. Suppose h is a measurable function on R relative to the Lebesgue integration. If |h| ≤ a on [−α,α] and |h| ≤ b on [−α,α]c for some a,b,α > 0, then ! α |h(x)|ϕ0,σ (x)dx ≤ a + 2b σ for each σ > 0. 2. In general, let n ≥ 1 be arbitrary. Let I denote the n × n identity matrix. Suppose f is a Lebesgue integrable function on R n , with |f | ≤ 1. Let σ > 0 be arbitrary. Define a function fσ on R n by ! fσ (x) ≡ f (x − y)ϕ0,σ I (y)dy y∈R n
for each x ∈ R n . Suppose f is continuous at some t ∈ R n . In other words, suppose, for arbitrary ε > 0, there exists δf (ε,t) > 0 such that |f (t)−f (r)| < ε n for each
r ∈ R with |r − t| < δf (ε,t). Let ε > 0 be arbitrary. Let α ≡ ε δf 2 ,t > 0 and let σ < α/
ε n1 1 1− 1− 2 4
.
(5.7.10)
200
Probability Theory
Then |fσ (t) − f (t)| ≤ ε. 3. Again consider the case n = 1. Let ε > 0 be arbitrary. Suppose σ > 0 is so small that σ < ε/ ( 8ε ). Let r,s ∈ R be arbitrary with r + 2ε < s. Let f ≡ 1(r,s] . Then 1(r+ε,s−ε] − ε ≤ fσ ≤ 1(r−ε,s+ε] + ε. Proof. 1. We estimate ! 1 2 2 |h(x)|e−x /(2σ ) dx √ 2π σ ! α ! 1 1 2 2 2 2 ae−x /(2σ ) dx + √ be−x /(2σ ) dx ≤√ 2π σ −α 2π σ |x|>α ! α 1 2 . ≤a+√ be−u /2 du = a + 2b σ 2π |u|> σα 2. Let f ,t,ε,δf ,α, and σ be as given. Then inequality 5.7.10 implies that α ε n1 1− 1− > 2 , 4 σ whence
α n ε 2 1 − 1 − 2 < . σ 2
(5.7.11)
u ≡ |u1 | ∨ · · · ∨ |un | < α. Then, |f (t − u) − f (t)| < 2ε for u ∈ R n with By hypothesis, σ ≤ α/ 8ε . Hence σα ≥ 8ε and so σα ≤ 8ε . Hence, by Assertion 1, we have ! |fσ (t) − f (t)| = (f (t − u) − f (t))ϕ0,σ I (u)du !
≤
|f (t − u) − f (t)|ϕ0,σ I (u)du u: u 0. Let α ≡ δf ( 2ε ,t) ≡ ε. Then, by hypothesis, ε −1 ε 1 1− 1− . σ < ε = α/ 8 2 4
(5.7.12)
Hence, by Assertion 2, we have |f (t) − fσ (t)| ≤ ε, where t ∈ (r + ε,s − ε] is arbitrary. Since 1(r+ε,s−ε] (t) ≤ f (t), it follows that 1(r+ε,s−ε] (t) − ε ≤ fσ (t)
(5.7.13)
for each t ∈ (r + ε,s − ε]. Since fσ ≥ 0, inequality 5.7.13 is trivially satisfied for t ∈ (−∞,r + ε] ∪ (s − ε,∞). We have thus proved that inequality 5.7.13 holds on domain(1(r+ε,s−ε] ). Next consider any t ∈ (−∞,r − ε] ∪ (s + ε,∞). Again, for arbitrary θ > 0, we have |f (t)−f (u)| = 0 < θ for each u ∈ (t −δf (θ ),t +δf (θ )). Hence, by Assertion 2, we have fσ (t) = fσ (t) − f (t) < ε. It follows that fσ (t) ≤ 1(r−ε,s+ε] (t) + ε
(5.7.14)
for each t ∈ (−∞,r − ε] ∪ (s + ε,∞). Since fσ ≤ 1, inequality 5.7.14 is trivially satisfied for t ∈ (r − ε,s + ε]. We have thus proved that inequality 5.7.14 holds on domain(1(r−ε,s+ε] ). Assertion 3 is proved.
5.8 Characteristic Function In previous sections we analyzed distributions J on a locally compact metric space (S,d) in terms of their values J g at basis functions g in a partition of unity. In the special case where (S,d) is the Euclidean space R, the basis functions can be replaced by the exponential√ functions hλ , where λ ∈ R, where hλ (x) ≡ eiλx for each x ∈ R, and where i ≡ −1. The result is characteristic functions, which are most useful in the study of distributions of r.r.v.’s. The classical development of this tool, such as in [Chung 1968] or [Loeve 1960], is constructive, except for infrequent and nonessential appeals to the principle of infinite search. The bare essentials of this material are presented here for completeness and for ease of reference. The reader who is familiar with the topic and is comfortable with the notion that the classical treatment is constructive, or easily made so, can skip over this and the next section and come back only for reference. We will be working with complex-valued measurable functions. Let C denote the complex plane equipped with the usual metric. Definition 5.8.1. Complex-valued integrable function. Let I be an integration on a locally compact metric space (S,d), and let (S,,I ) denote the completion of the integration space (S,C(S),I ). A function X ≡ I U + iI V : S → C whose real
202
Probability Theory
part U and imaginary part V are measurable on (S,,I ) is said to be measurable on (S,,I ). If both U,V are integrable, then X is said to be integrable, with integral I X ≡ I U + iI V . By separation into real and imaginary parts, the complex-valued functions immediately inherit the bulk of the theory of integration developed hitherto in this book for real-valued functions. One exception is the very basic inequality |I X| ≤ I |X| when |X| is integrable. Its trivial proof in the case of real-valued integrable functions relies on the linear ordering of R, which is absent in C. The next lemma gives a proof for complex-valued integrable functions. Lemma 5.8.2. |I X| ≤ I |X| for complex-valued integrable function X. Use the notations in Definition 5.8.1. Let X : S → C be an arbitrary complex-valued function. Then the function X is measurable in the sense of Definition 5.8.1 iff it is measurable in the sense of Definition 5.8.1. In other words, the former is consistent with the latter. Moreover, if X is measurable and if |X| ∈ L, then X is integrable with |I X| ≤ I |X|. Proof. Write X ≡ I U + iI V , where U,V are the real and imaginary parts of X, respectively. 1. Suppose X is measurable in the sense of Definition 5.8.1. Then U,V : (S,, I ) → R are measurable functions. Therefore the function (U,V ) : (S,,I ) → R 2 is measurable. At the same time, we have X = f (U,V ), where the continuous function f : R 2 → C is defined by f (u,v) ≡ u + iv. Hence X is measurable in the sense of Definition 4.8.1, according to Proposition 4.8.10. Conversely, suppose X(S,,I ) → C is measurable in the sense of Definition 4.8.1. Note that U,V are continuous functions of X. Hence, again by Proposition 4.8.10, both U,V are measurable. Thus X is measurable in the sense of Definition 5.8.1. 2. Suppose X is measurable and |X| ∈ L. Then, by Definition 5.8.1, both U and V are measurable, with |U | ∨ |V | ≤ |X| ∈ L. It follows that U,V ∈ L. Thus X is integrable according to Definition 5.8.1. Let ε > 0 be arbitrary. Then either (i) I |X| < 3ε or (ii) I |X| > 2ε. First consider case (i). Then |I X| = |I U + iI V | ≤ |I U | + |iI V | ≤ I |U | + I |V | ≤ 2I |X| < I |X| + 3ε. Now consider case (ii). By the Dominated Convergence Theorem, there exists a > 0 so small that I (|X| ∧ a) < ε. Then I |X|1(|X|≤a) ≤ I (|X| ∧ a) < ε. Write A ≡ (a < |X|). Then |I U 1A − I U | = |I U 1(|X|≤a) | ≤ I |U |1(|X|≤a) ≤ I |X|1(|X|≤a) < ε. Similarly, |I V 1A − I V | < ε. Hence
(5.8.1)
Probability Space
203
|I (X1A ) − I X| = |I (U 1A − I U ) + i(I V 1A − I V )| < 2ε.
(5.8.2)
Write c ≡ I |X|1A . Then it follows that c ≡ I |X|1A = I |X| − I |X|1(|X|≤a) > 2ε − 2ε = 0, where the inequality is on account of Condition (ii) and inequality 5.8.1. Now define a probability integration space (S,L,E) using g ≡ c−1 |X|1A as a probability density function on the integration space (S,,I ). Thus E(Y ) ≡ c−1 I (Y |X|1A ) for each Y ∈ L. Then X |c−1 I (X1A )| ≡ E 1A |X| ∨ a E = ≤ E = E
U V = E + iE 1 1A A |X| ∨ a |X| ∨ a
U 1A |X| ∨ a
2
+ E
V 1A |X| ∨ a
2
1 2
1 2 U2 V2 1 1 + E A A (|X| ∨ a)2 (|X| ∨ a)2 |X|2 1A (|X| ∨ a)2
1 2
≤ 1,
where the first inequality is thanks to Lyapunov’s inequality. Hence |I (X1A )| ≤ c ≡ I |X|1A . Inequality 5.8.2 therefore yields |I X| < |I (X1A )| + 2ε ≤ I |X|1A + 2ε < I |X| + 3ε. Summing up, we have |I X| < I |X| + 3ε regardless of case (i) or case (ii), where ε > 0 is arbitrary. We conclude that I |X| ≤ I |X|. Lemma 5.8.3. Basic inequalities for exponentials. Let x,y,x ,y ∈ R be arbitrary, with y ≤ 0 and y ≤ 0. Then |eix − 1| ≤ 2 ∧ |x|
(5.8.3)
and
|eix+y − eix +y | ≤ 2 ∧ |x − x | + 1 ∧ |y − y |. Proof. If x ≥ 0, then |eix − 1|2 = | cos x − 1 + i sin x|2 = 2(1 − cos x) ! x ! x sin udu ≤ 2 udu ≤ x 2 . =2 0
(5.8.4)
0
Hence, by symmetry and continuity, |eix − 1| ≤ |x| for arbitrary x ∈ R. At the same time, |eix − 1| ≤ 2. Equality 5.8.3 follows.
204
Probability Theory
Now assume y ≥ y .
|eix+y − eix +y | ≤ |eix+y − eix +y | + |eix +y − eix +y |
≤ |eix − eix |ey + |ey − ey |
≤ |ei(x−x ) − 1|ey + ey (1 − e−(y−y ) )
≤ (2 ∧ |x − x |)ey + (1 − e−(y−y ) ) ≤ 2 ∧ |x − x | + 1 ∧ |y − y |, where the last inequality is because y ≤ y ≤ 0 by assumption. Hence, by symmetry and continuity, the same inequality holds for arbitrary y,y ≤ 0. Recall the matrix notations and basics from Definition 5.7.1. Moreover, we will 1 write |x| ≡ (x12 + · · · + xn2 ) 2 and write x ≡ |x1 | ∨ · · · ∨ |xn | for each x ≡ (x1, . . . ,xn ) ∈ R n . Definition 5.8.4. Characteristic function, Fourier transform, and convolution. Let n ≥ 1 be arbitrary. 1. Let X ≡ (X1, . . . ,Xn ) be an r.v. with values in R n . The characteristic function of X is the complex-valued function ψX on R n defined by ψX (λ) ≡ E exp iλT X ≡ E cos(λT X) + iE sin(λT X). for each λ ∈ R n . 2. Let J be an arbitrary distribution on R n . The characteristic function of J is defined to be ψJ ≡ ψX , where X is any r.v. with values in R n such that EX = J . Thus ψJ (λ) ≡ J hλ , where hλ (x) ≡ exp iλT X for each λ,x ∈ R n . 3. If g is a complex-valued integrable function on R n relative to the Lebesgue integration, the Fourier transform of g is defined to be the complex-valued function g on R n with ! (exp iλT x)g(x)dx g (λ) ≡
x∈R n
for λ ∈ R n, where ·dx signifies the Lebesgue integration on R n , and where x ∈ R n is the integration variable. The convolution of two complex-valued Lebesgue integrable functions f ,g on R n is the complex-valued function f g defined by domain(f g) ≡ {x ∈ R n : f (x − ·)g ∈ L} and by (f g)(x) ≡ u∈R n f (x − y)g(y)dy for each x ∈ domain(f g). 4. Suppose n = 1. Let F be an P.D.F. on R. The characteristic function of F is defined as ψF ≡ ψJ , where J ≡ ·dF . If, in addition, F has a p.d.f. f on R, then the characteristic function of f is defined as ψf ≡ ψF . In that case, ψF (λ) = eiλt f (t)dt ≡ f(λ) for each λ ∈ R. We can choose to express the characteristic function in terms of the r.v. X, or in terms of the distribution J , or, in the case n = 1, in terms of the P.D.F., as a matter
Probability Space
205
of convenience. A theorem proved in one set of notations will be used in another set without further comment. Lemma 5.8.5. Basics of convolution. Let f ,g,h be complex-valued Lebesgue integrable functions on R n . Then the following conditions hold: 1. 2. 3. 4. 5.
f g is Lebesgue integrable. f g =gf (f g) h = f (g h) (af + bg) h = a(f h) + b(g h) for all complex numbers a,b. Suppose n = 1, and suppose g is a p.d.f. If |f | ≤ a for some a ∈ R, then |f g| ≤ a. If f is real-valued with a ≤ f ≤ b for some a,b ∈ R, then a ≤ f g ≤ b. g 6. f g = f 7. |f| ≤ f 1 ≡ x∈R n |f (x)|dx 4 4 4
Proof. If f and g are real-valued, then the integrability of f g follows from Corollary A.0.9 in Appendix A. Assertion 1 then follows by linearity. We will prove Assertions 6 and 7; the remaining assertions are left as an exercise. For Assertion 6, for each λ ∈ R n , we have ! ! f g (λ) ≡ (exp iλT x) f (x − y)g(y)dy dx 4 4 4
! = =
!
∞
−∞ −∞ ! ∞ ! ∞
! =
∞
−∞
−∞
(exp iλT (x − y))f (x − y)dx (exp iλT y)g(y)dy (exp iλT u)f (u))du (exp iλT y)g(y)dy
f(λ)(exp iλT y)g(y)dy = f(λ) g (λ),
as asserted. At the same time, for each λ ∈ R n , we have ! ! T |f (λ)| ≡ (exp iλ x)f (x)dx ≤ |(exp iλT x)f (x)|dx| x∈R n x∈R n ! |f (x)|dx, = x∈R n
where the inequality is by Lemma 5.8.2. Assertion 7 is verified.
Proposition 5.8.6. Uniform continuity of characteristic functions. Let X be an r.v. with values in R m . Let βX be a modulus of tightness of X. Then the following conditions hold: 1. |ψX (λ)| ≤ 1 and ψa+BX (λ) = exp(iλT a)ψX (λT B) for each a,λ ∈ R n and for each n × m matrix B. More precisely, ψX has a modulus of continuity 2. ψX is uniformly continuous. given by δ(ε) ≡ 3ε /β 3ε for ε > 0.
206
Probability Theory
3. If g is a Lebesgue integrable function on R n , then g is uniformly continuous. More precisely, for each ε > 0, there exists γ ≡ γg (ε) > 0 such that 1(|x|>γ ) |g(x)|dx g is given by δ(ε) ≡
< ε. Then a modulus of continuity of ε ε
g for ε > 0, where ≡ |g(t)|dt. /γ g
g +2
g +2 Proof. 1. For each λ ∈ R n , we have |ψX (λ)| ≡ |E exp(iλT X)| ≤ E| exp(iλT X)| = E1 = 1. Moreover, ψa+BX (λ) = exp(iλT a)E(iλT BX) = exp(iλT a)ψX (λT B) for each a,λ ∈ R n and for each n × m matrix B. 2. Let ε > 0. Let δ(ε) ≡ 3ε /β 3ε . Suppose h ∈ R n is such that |h| < δ(ε). Then ε ε β 3ε < 3|h| . Then P (|X| > a) < 3ε by the definition of β. . Pick a ∈ β 3ε , 3|h| On the other hand, for each x ∈ R n with |x| ≤ a, we have | exp(ihT x) − 1| ≤ |hT x| ≤ |h|a < 3ε . Hence, for each λ ∈ R n , |ψX (λ + h) − ψX (λ)| ≤ E| exp(iλT X)(exp(ihT X) − 1)| ≤ E(| exp(ihT X) − 1|;|X| ≤ a) + 2P (|X| > a) ε ε < + 2 = ε. 3 3 3. Proceed in the same manner as in Step 2. Let ε > 0. Write ε ≡
ε
g +2 . Let ε < |h| . Pick
δ(ε) ≡ γgε(ε ) . Suppose h ∈ R n is such that |h| < δ(ε). Then γg (ε )
ε a ∈ γg (ε ), |h| . Then |x|>a |g(x)|dx < ε by the definition of γg . Moreover, for each x ∈ R n with |x| ≤ a, we have | exp(ihT x) − 1| ≤ |hT x| ≤ |h|a < ε . Hence, for each λ ∈ R n , ! | g (λ + h) − g (λ)| ≤ | exp(iλT x)(exp(ihT x) − 1)g(x)|dx ! ! T ≤ |(exp(ih x) − 1)g(x)|dx + |2g(x)|dx |x|≤a
≤ ε
!
|x|>a
|g(x)|dx + 2ε = ε ( g + 2) = ε.
Lemma 5.8.7. Characteristic function of normal distribution. Let μ,σ be an arbitrary normal distribution on R n , with mean μ and covariance matrix σ . Then the characteristic function of μ,σ is given by 1 ψμ,σ (λ) ≡ exp iμT λ − λT σ λ 2 for each λ ∈ R n . Proof. 1. Consider the special case where n = 1, μ = 0, and σ = 1. Let X be an r.r.v. with the standard normal distribution 0,1 . By Proposition 5.7.5, Xp is integrable for each p ≥ 0, with mp ≡ EXp = (2k)! 2−k /k! if p is equal to
Probability Space
207
some even integer 2k, and with mp ≡ EXp = 0 otherwise. Using these moment formulas, we compute the characteristic function ! !
∞ (iλx)p −x 2 /2 1 1 2 e dx ψ0,1 (λ) = √ eiλx e−x /2 dx = √ p! 2π 2π p=0
=
∞
p=0
=
∞
k=0
=e
∞
(−1)k λ2k (iλ)p mp = m2k p! (2k)! k=0
∞
(−λ2 /2)k (−1)k λ2k (2k)! 2−k /k! = (2k)! k!
−λ2 /2
k=0
,
where Fubini’s Theorem justifies any change in the order of integration and summation. 2. Now consider the general case. By Lemma 5.7.8, μ,σ is the distribution of an r.v. Y = μ + AX for some matrix A with σ ≡ AAT and for some r.v. X with the standard normal p.d.f. ϕ0,I on R n , where I is the n × n identity matrix. Let λ ∈ R n be arbitrary. Write θ ≡ AT λ. Then ψμ,σ (λ) ≡ E exp(iλT Y ) ≡ E exp(iλT μ + iλT AX) ! = exp(iλT μ + iθ T x)ϕ0,I (x)dx x∈R n ⎛ ⎞ ! ! n
θj xj ⎠ ϕ0,1 (x1 ) . . . ϕ0,1 (xn )dx1 . . . dxn . = exp(iλT μ) · · · exp ⎝i j =1
By Fubini’s Theorem and by the first part of this proof, this reduces to n ! ψμ,σ (λ) = exp(iλT μ) exp(iθj xj )ϕ0,1 (xj )dxj j =1 n
1 = exp(iλT μ) exp − θ T θ 2 j =1 1 T 1 T T = exp iλ μ − λ AA λ ≡ exp iλT μ − λT σ λ . 2 2 = exp(iλT μ)
1 exp − θj2 2
Corollary 5.8.8. Convolution with normal density. Suppose f is a Lebesgue integrable function on R n . Let σ > 0 be arbitrary. Write σ ≡ σ 2 I , where I is the n × n identity matrix. Define fσ ≡ f ϕ0,σ . Then ! 1 fσ (t) = (2π )−n exp iλT t − σ 2 λT λ f(−λ)dλ 2 for each t ∈ R n .
208
Probability Theory
Proof. In view of Lemma 5.8.7, we have, for each t ∈ R n , ! fσ (t) ≡ ϕ0,σ (t − x)f (x)dx n
= (2π σ 2 )− 2
2 − n2
= (2π σ )
2 − n2
! !
1 exp − 2 (t − x)T (t − x) f (x)dx 2σ ψ0,I (σ −1 (t − x))f (x)dx
! !
= (2π σ )
(2π )
− n2
1 exp iσ −1 (t − x)T y − y T y f (x)dxdy 2
1 exp −iσ −1 y T t − y T y f(−σ −1 y)dy 2 ! 1 = (2π )−n exp −iλT t − σ 2 λT λ f(−λ)dλ. 2 = (2π σ )−n
!
Note that in the double integral, the integrand is a continuous function in (x,y) and is bounded in absolute value by a constant multiple of exp − 12 y T y f (x) that is, by Proposition 4.10.6, Lebesgue integrable on R 2n . This justifies the changes in order of integration, thanks to Fubini. The next theorem recovers a distribution on R n from its characteristic function. Theorem 5.8.9. Inversion formula for characteristic functions. Let J,J be a distribution on R n , with characteristic functions ψJ ,ψJ respectively. Let f be an arbitrary Lebesgue integrable function on R n . Let fdenote the Fourier transform of f . Let σ > 0 be arbitrary. Write σ ≡ σ 2 I , where I is the n × n identity matrix. Define fσ ≡ f ϕ0,σ . Then the following conditions hold: 1. We have Jfσ = (2π )−n
!
1 exp − σ 2 λT λ f(−λ)ψJ (λ)dλ. 2
(5.8.5)
2. Suppose f ∈ Cub (R n ) and |f | ≤ 1. Let ε > 0 be arbitrary. Suppose σ > 0 is so small that 5 ε3 ε n1 1 . 1− 1− −2 log σ ≤ δf 2 2 4 Then |Jf − Jfσ | ≤ ε. Consequently, Jf = limσ →0 Jfσ . 3. Suppose f ∈ Cub (R) is arbitrary such that f is Lebesgue integrable on R n . Then ! Jf = (2π )−n f(−λ)ψJ (λ)dλ.
Probability Space
209
4. If ψJ is Lebesgue integrable on R n , then J has a p.d.f. Specifically, then ! −n Jf = (2π ) f (x)ψˆJ (−x)dx for each f ∈ C(R n ). 5. J = J iff ψJ = ψJ . Proof. Write f ≡ |f (t)|dt < ∞. Then |f| ≤ f .
1. Consider the function Z(λ,x) ≡ exp(iλT x − 12 σ 2 λT λ)f(−λ) on the product n n n n space (R ,L0,I0 ) ⊗ (R ,L,J ), where (R ,L0,I0 ) ≡ (R ,L0, ·dx) is the Lebesgue integration space and where (R n,L,J ) is the probability space that is the completion of (R n,Cub (R n ),J ). The function Z is a continuous function of 2 2 (λ,x). Hence Z is measurable. Moreover, |Z| ≤ U where U (λ,x) ≡ f e−σ λ /2 is integrable. Hence Z is integrable by the Dominated Convergence Theorem. Define hλ (t) ≡ exp(it T λ) for each t,λ ∈ R n . Then J (hλ ) = ψJ (λ) for each λ ∈ R n . Corollary 5.8.8 then implies that ! 1 −n J (hλ ) exp − σ 2 λT λ f(−λ)dλ Jfσ = (2π ) 2 ! 1 (5.8.6) ≡ (2π )−n ψJ (λ) exp − σ 2 λT λ f(−λ)dλ, 2 proving Assertion 1. n 2. Now suppose 1f# ∈ Cub (R ) with modulus of continuity δf with |f | ≤ 1. Recall that : 0, 2 → [0,∞) denotes the inverse of the tail function 2 ≡ 1− (α) ≤ −2 log α of the standard normal P.D.F. . Proposition 5.7.4 says that # for each α ∈ 0, 12 . Hence 5 ε3 ε n1 1 1− 1− σ ≤ δf −2 log 2 2 4 ε3 ε n1 1 , 1− 1− ≤ δf 2 2 4 where the first inequality is by hypothesis. Therefore Lemma 5.7.9 implies that |fσ − f | ≤ ε. Consequently, |Jf − Jfσ | ≤ ε. Hence Jf = limσ →0 Jfσ . This proves Assertion 2. 3. Now let f ∈ Cub (R n ) be arbitrary. Then, by linearity, Assertion 2 implies that lim Jfσ = Jf
σ →0
Jfσ = (2π )−n
!
1 exp − σ 2 λT λ f(−λ)ψJ (λ)dλ. 2
(5.8.7)
(5.8.8)
Suppose f is Lebesgue integrable on R n . Then the integrand in equality 5.8.5 is dominated in absolute value by the integrable function |f|, and converges a.u. on
210
Probability Theory
R n to the function f(−λ)ψJ (λ) as σ → 0. Hence the Dominated Convergence Theorem implies that ! −n lim Jfσ = (2π ) f(−λ)ψJ (λ)dλ. σ →0
Combining with equality 5.8.7, Assertion 3 is proved. 4. Next consider the case where ψJ is Lebesgue integrable. Suppose f ∈ 2 2 C(R n ) with |f | ≤ 1. Then the function Uσ (x,λ) ≡ f (x)ψJ (λ)e−iλx−σ λ /2 is an integrable function relative to the product Lebesgue integration on R 2n , and is dominated in absolute value by the integrable function f (x)ψJ (λ). Moreover, Uσ → U0 uniformly on compact subsets of R 2n where U0 (x,λ) ≡ f (x)ψJ (λ)e−iλx . Hence Uσ → U0 in measure relative to I0 ⊗ I0 . The Dominated Convergence Theorem therefore yields, as σ → 0, ! 1 −n exp − σ 2 λT λ f(−λ)ψJ (λ)dλ Jfσ = (2π ) 2 ! ! 1 = (2π )−n exp − σ 2 λT λ ψJ (λ) exp(−iλx)f (x)dxdλ 2 ! ! −n ψJ (λ) exp(−iλx)f (x)dxdλ → (2π ) ! ˆ = (2π )−n ψ(−x)f (x)dx. On the other hand, by Assertion 2, we have Ifσ → If as σ → 0. Assertion 4 is proved. 5. Assertion 5 follows from Assertion 4. Definition 5.8.10. Metric of characteristic functions. Let n ≥ 1 be arbitrary. Let ψ,ψ be arbitrary characteristic functions on R n . Define
ρchar (ψ,ψ ) ≡ ρchar,n (ψ,ψ ) ≡
∞
j =1
2−j sup |ψ(λ) − ψ (λ)|. |λ|≤j
(5.8.9)
Then ρchar is a metric. We saw earlier that characteristic functions are continuous and bounded in absolute values by 1. Hence the supremum inside the parentheses in equality 5.8.9 exists and is bounded by 2. Thus ρchar is well defined. In view of Theorem 5.8.9, it is easily seen that ρchar is a metric. Convergence relative to ρchar is equivalent to uniform convergence on each compact subset of R n . The next theorem shows that the correspondence between distributions on R n and their characteristic functions is uniformly continuous when restricted to a tight subset. Theorem 5.8.11. Continuity theorem for characteristic functions. Let ξ be an arbitrary binary approximation of R. Let n ≥ 1 be arbitrary but fixed. Let
Probability Space
211
ξ n ≡ (Ap )p=1,2,... be the binary approximation of R n that is the nth power of ξ . Let ξ n be the modulus of local compactness of R n associated with ξ n . Let ρDist,ξ n be the corresponding distribution metric on the space of distributions on R n , as in Definition 5.3.4. Let J0 be a family of distributions on R n . Let J,J ∈ J0 be arbitrary, with corresponding characteristic functions ψ,ψ . Then the following conditions hold: 1. For each ε > 0, there exists δch,dstr (ε,n) > 0 such that if ρchar (ψ,ψ ) < δch,dstr (ε,n), then ρDist,ξ n (J,J ) < ε. 2. Suppose J0 is tight, with some modulus of tightness β. Then, for each ε > 0, there exists δdstr,ch (ε,β, ξ n ) > 0 such that if ρDist,ξ n (J,J ) < δdstr,ch (ε,β), then ρchar (ψ,ψ ) < ε. 3. If (Jm )m=0,1,... is a sequence of distributions on R n with a corresponding sequence (ψm )m=0,1,... of characteristic functions such that ρchar (ψm,ψ0 ) → 0, then Jm ⇒ J0 . Proof. Let πR n ≡ ({gp,x : x ∈ Ap })p=1,2,...
(5.8.10)
be the partition of unity of R n determined by ξ n , as introduced in Definition 3.3.4. Thus ξ n ≡ (|Ap |)p=1,2,... . Let ! dy, vn ≡ |y|≤1
the volume of the unit n-ball {y ∈ : |y| ≤ 1} in R n . 1. Let ε > 0 be arbitrary. As an abbreviation, write Rn
α≡
1 ε. 8
Let p ≡ [0 ∨ (1 − log2 ε)]1 . Thus 2−p
0, define δp (θ ) ≡ 2−p−1 θ > 0.
(5.8.11)
Recall from Proposition 5.7.4 its# decreasing the# standard normal P.D.F. on R, tail function : [0,∞) → 0, 12 , and the inverse function : 0, 12 → [0,∞) of the latter. Define 5 α 3 α n1 1 1− 1− > 0. (5.8.12) −2 log σ ≡ δp 2 2 4 Define
$ % 1 n 1 −5 2 σ n n−1 m ≡ σ −1 n 2 ∧ v−1 ε2 (2π ) . n 2 1
212
Probability Theory
Thus m ≥ 1 is so large that n
1
vn 22 (2π )− 2 σ −n n (σ n− 2 m)
0. n ε2
(5.8.14)
Now suppose the characteristic functions ψ,ψ on R n are such that ρchar (ψ,ψ ) ≡
∞
j =1
2−j sup |ψ(λ) − ψ (λ)| < δch,dstr (ε). |λ|≤j
(5.8.15)
We will prove that ρDist,ξ n (J,J ) < ε. To that end, first note that with m ≥ 1 as defined earlier, the last displayed inequality implies sup |ψ(λ) − ψ (λ)| < 2m δch,dstr (ε).
|λ|≤m
(5.8.16)
Next, let k = 1, . . . ,p and x ∈ Ak be arbitrary. Write f ≡ gk,x as an abbreviation. Then, by Proposition 3.3.5, f has values in [0,1] and has a Lipschitz constant 2k+1 ≤ 2p+1 . Consequently, f has the modulus of continuity δp defined in equality 5.8.11. Hence, in view of equality 5.8.12, Theorem 5.8.9 implies that |Jf − Jfσ | ≤ α ≡ and that Jfσ = (2π )−n
!
1 ε, 8
1 exp − σ 2 λT λ f(−λ)ψ(λ)dλ, 2
(5.8.17)
(5.8.18)
where fσ ≡ f ϕ0,σ 2 I , where I is the n×n identity matrix, and where fstands for the Fourier transform of f . Moreover, by Proposition 3.3.5, the function f ≡ gk,x has the sphere {y ∈ R n : |y − x| ≤ 2−k+1 } as support. Therefore ! |f | ≤ f (y)dy ≤ vn (2−k+1 )n < vn, (5.8.19) where vn is the volume of the unit n-sphere in R n , as defined previously. By equality 5.8.18 for J and a similar equality for J , we have ! 1 2 T −n |Jfσ − J fσ | = (2π ) exp − σ λ λ f (−λ)(ψ(λ) − ψ (λ))dλ 2 ! 1 ≤ (2π )−n exp − σ 2 λT λ |f(−λ)(ψ(λ) − ψ (λ)|dλ 2 |λ|≤m ! 1 exp − σ 2 λT λ |f(−λ)(ψ(λ) − ψ (λ)|dλ). + (2π )−n 2 |λ|>m (5.8.20)
Probability Space
213
In view of inequalities 5.8.16 and 5.8.19, the first summand in the last sum is bounded by ! 1 (2π )−n vn 2m δch,dstr (ε) exp − σ 2 λT λ dλ 2 n
≤ (2π )−n vn 2m δch,dstr (ε)(2π ) 2 σ −n =
1 ε, 8
where the last equality is from the defining equality 5.8.14. The second summand is bounded by ! 1 −n (2π ) vn 2 exp − σ 2 λT λ dλ 2 |λ|>m ! ! 1 2 2 −n 2 ≤ (2π ) vn 2 · · · √ exp − 2 σ (λ1 + · · · + λn ) |λ1 |∨···∨|λn |>m/ n × dλ1 . . . dλn ! ! n −n −n 2 ≤ (2π ) vn 2(2π ) σ ···
√ ϕ0,1 (λ1 ) . . . ϕ0,1 (λn ) |λ1 |∨···∨|λn |>σ m/ n
× dλ1 . . . dλn n
≤ vn 2(2π )− 2 σ −n
n !
! ···
√ ϕ0,1 (λ1 ) . . . ϕ0,1 (λn )dλ1 . . . dλn |λj |>σ m/ n
j =1
= vn 2(2π )
− n2
σ
−n
n !
√ ϕ0,1 (λj )dλj j =1 |λj |>σ m/ n 1
n
= vn 22 (2π )− 2 σ −n n (σ n− 2 m)
0 be arbitrary. Write p ≡ [0∨(2−log 2 ε)]1 . For each θ > 0, ε ,δp,β, ξR n > 0 define δp (θ ) ≡ p−1 θ . By Proposition 5.3.11, there exists 4 such that if ε ,δp,β, ξ n ρDist,ξ n (J,J ) < 4 then for each f ∈ Cub (R n ) with modulus of continuity δp and with |f | ≤ 1, we have ε (5.8.21) |Jf − J f | < . 4 Define ε ,δp,β, ξ n . δdstr,ch (ε,β, ξ n ) ≡ (5.8.22) 4 We will prove that δdstr,ch (ε,β, ξ n ) has the desired properties. To that end, suppose ε ,δp,β, ξ n . ρDist,ξ n (J,J ) < δdstr,ch (ε,β, ξ n ) ≡ 4 Let λ ∈ R n be arbitrary with |λ| ≤ p. Define the function hλ (x) ≡ exp(iλT x) ≡ cos λT x + i sin λT x for each x ∈ R n . Then, using inequality 5.8.3, we obtain | cos λT x − cos λT y| ≤ | exp(iλT x) − exp(iλT y)| = | exp(iλT (x − y)) − 1| ≤ |λT (x − y)| ≤ p|x − y|, for each x,y ∈ R n . Hence the function cos(λT ·) on R n has modulus of continuity δp . Moreover, | cos(λT ·)| ≤ 1. Hence, inequality 5.8.21 is applicable and yields |J cos(λT ·) − J cos(λT ·)|
0. 2. For each λ ∈ R with
ε ε − n1 η 2b 2
n! ε n! ε η |λ| < 2b 2
− n1
,
we have |rn (λ)| < ε|λ|n . 3. Suppose Xn+1 is integrable. Then, for each t ∈ R, we have ψ(t) ≡
n
k=0
ψ (k) (t0 )(t − t0 )k /k! +¯rn (t),
216
Probability Theory
where |¯rn (t)| ≤ |t − t0 |n+1 E|X|n+1 /(n + 1)! . Proof. We first observe that (eiax − 1)/a → ix uniformly for x in any compact interval [−t,t] as a → 0. This can be shown by first noting that, for arbitrary ε > 0, Taylor’s Theorem (Theorem B.0.1 in Appendix B), implies that |eiax − 1 − iax| ≤ a 2 x 2 /2 and so |a −1 (eiax − 1) − ix| ≤ ax 2 /2 < ε for each x ∈ [−t,t], provided that a < 2ε/t 2 . 1. Let λ ∈ R be arbitrary. Proceed inductively. The assertion is trivial if n = 0. Suppose the assertion has been proved for k = 0, . . . n − 1. Let ε > 0 be arbitrary, and let t be so large that P (|X| > t) < ε. For a > 0, define Da ≡ i k Xk eiλX (eiaX − 1)/a. By the observation at the beginning of this proof, Da converges uniformly to i k+1 Xk+1 eiλX on (|X| ≤ t) as a → 0. Thus we see that Da converges a.u. to i k+1 Xk+1 eiλX . At the same time, |Da | ≤ |X|k |(eiaX − 1)/a| ≤ |X|k+1 where |X|k+1 is integrable. The Dominated Convergence Theorem applies, yielding lima→0 EDa = i k+1 EXk+1 eiλX . On the other hand, by the induction hypothesis, EDa ≡ a −1 (i k EXk ei(λ+a)X − i k EXk eiλX ) = d ψ (k) (λ) exists and is a −1 (ψ (k) (λ + a) − ψ (k) (λ)). Combining, we see that dλ k+1 k+1 iλX equal to i EX e . Induction is completed. We next prove the continuity of ψ (n) . To that end, let ε > 0 be arbitrary. Let λ,a ∈ R be arbitrary with |a| < δψ,n (ε) ≡
ε ε − n1 η . 2b 2
Then |ψ (n) (λ + a) − ψ (n) (λ)| = |EXn ei(λ+a)X − EXn eiλX | ≤ E|X|n |eiaX − 1| ≤ E|X|n (2 ∧ |aX|) ≤ 2E|X|n 1(|X|n >η( ε )) + E|X|n (|aX|)1(|X|n ≤η( ε )) 2
ε 1 ε ε ε n E|X|n < + E|X|n ≤ + |a| η 2 2 2 2b ε ε ≤ + = ε. 2 2
2
Thus δψ,n is the modulus of continuity of ψ (n) on R. Assertion 1 is verified. 2. Assertion 2 is an immediate consequence of Assertion 1. 3. Suppose Xn+1 is integrable. Then ψ (n+1) exists on R, with |ψ (n+1) | ≤ E|X|n+1 by equality 5.8.24. Hence r¯n (t) ≤ E|X|n+1 |t −t0 |n+1 /(n+1)! according to Taylor’s Theorem.
Probability Space
217
For the proof of a partial converse, we need some basic equalities for binomial coefficients. Lemma 5.8.13. Binomial coefficients. For each n ≥ 1, we have n
n k
(−1)k k j = 0
k=0
for j = 0, . . . ,n − 1, and n
n k
(−1)k k n = (−1)n n! .
k=0
Proof. Differentiate j times the binomial expansion (1 − et )n =
n
n k
(−1)k ekt
k=0
to get n(n − 1) . . . (n − j + 1)(−1)j (1 − et )n−j =
n
n k
(−1)k k j ekt ,
k=0
and then set t to 0.
Classical proofs for the next theorem in familiar texts rely on Fatou’s Lemma, which is not constructive because it trivially implies the principle of infinite search. The following proof contains an easy fix. Proposition 5.8.14. Moments of r.r.v. and derivatives of its characteristic function. Let ψ denote the characteristic function of X. Let n ≥ 1 be arbitrary. If ψ has a continuous derivative of order 2n in some neighborhood of λ = 0, then X2n is integrable. Proof. Write λk ≡ 2−k for each k ≥ 1. Then sin2 (2λk+1 X) (2 sin(λk+1 X) cos(λk+1 X))2 sin2 (λk X) = = (2λk+1 )2 (2λk+1 )2 λ2k sin2 (λk+1 X) sin2 (λk+1 X) cos2 (λk+1 X) ≤ 2 λk+1 λ2k+1 n 2 for each k ≥ 1. Thus we see that the sequence sin (λ2k X) of integrable =
r.r.v.’s is nondecreasing. Since
ψ (2n)
ψ(λ) =
λk
2n
ψ (j ) (0) j =0
k=1,2,...
exists, we have, by Taylor’s Theorem
j!
λj + o(λ2n )
218
Probability Theory
as λ → 0. Hence for any λ ∈ R we have iλX 2n − e−iλX sin λX 2n e E =E λ 2iλ = (2iλ)−2n E
2n
2n k
(−1)k ei(2n−2k)λX
k=0
= (2iλ)−2n
2n
2n k
(−1)k ψ((2n − 2k)λ)
k=0
= (2iλ)−2n
2n
2n k
⎧ 2n ⎨
ψ (j ) (0)
(−1)k ⎩
j =0
k=0
j!
⎫ ⎬ (2n − 2k)j λj + o(λ2n ) ⎭
⎧ ⎫ 2n 2n ⎨ ⎬ (j ) (0)λj
ψ 2n (−1)k (2n − 2k)j = (2iλ)−2n o(λ2n ) + k ⎩ ⎭ j! j =0
−2n
= o(1) + (2iλ)
{ψ
k=0
(0)λ 2 } = (−1)n ψ (2n) (0)
2n in view of Lemma 5.8.13. Consequently, E sinλλkk X → (−1)n ψ (2n) (0). At the
2n same time, sinλλk k t → t uniformly for t in any compact interval. Hence sinλλkk X ↑ 2n X a.u. as k → ∞. Therefore,
2nby the Monotone Convergence Theorem, the limit is integrable. r.r.v. X2n = limk→∞ sinλλkk X (2n)
2n 2n
Proposition 5.8.15. Product distribution and direct product of characteristic function. Let F1,F2 be distributions on R n and R m , respectively, with the characteristic functions ψ1,ψ2 , respectively. Let the function ψ1 ⊗ψ2 be defined by (ψ1 ⊗ ψ2 )(λ) ≡ ψ1 (λ1 )ψ2 (λ2 ) for each λ ≡ (λ1,λ2 ) ∈ R n+m , whereλ1 ∈ R n and λ2 ∈ R m . Let F be a distribution on R n+m with characteristic function ψ. Then F = F1 ⊗ F2 iff ψ = ψ1 ⊗ ψ2 . Proof. Suppose F = F1 ⊗ F2 . Let λ ≡ (λ1,λ2 ) ∈ R n+m be arbitrary, where λ1 ∈ R n and λ2 ∈ R m . Let exp(iλT ·) be the function on R n+m whose value at arbitrary x ≡ (x1,x2 ) ∈ R n+m , where x1 ∈ R n and x2 ∈ R m , is exp(iλT x). Similarly, let exp(iλT1 ·), exp(iλT2 ·) be the functions whose values at (x1,x2 ) ∈ R n+m are exp(iλT1 x1 ), exp(iλT2 x2 ), respectively. Then ψ(λ) ≡ F exp(iλT ·) = F exp(iλT1 ·) exp(iλT2 ·) = (F1 ⊗ F2 ) exp(iλT1 ·) exp(iλT2 ·) = (F1 exp(iλT1 ·))(F2 exp(iλT2 ·)) = ψ1 (λ1 )ψ2 (λ2 ) = (ψ1 ⊗ ψ2 )(λ). Thus ψ = ψ1 ⊗ ψ2 .
Probability Space
219
Conversely, suppose ψ = ψ1 ⊗ψ2 . Let G ≡ F1 ⊗F2 . Then G has characteristic function ψ1 ⊗ ψ2 by the previous paragraph. Thus the distributions F and G have the same characteristic function ψ. By Theorem 5.8.9, it follows that F = G ≡ F1 ⊗ F 2 . Corollary 5.8.16. Independence in terms of characteristic functions. Let X1 : → R n and X2 : → R m be r.v.’s on a probability space (,L,E), with characteristic functions ψ1,ψ2 , respectively. Let ψ be the characteristic function of the r.v. X ≡ (X1,X2 ) : → R n+m . Then X1,X2 are independent iff ψ = ψ1 ⊗ ψ2 . Proof. Let F,F1,F2 be the distributions induced by X,X1,X2 on R n+m,R n,R m , respectively. Then X1,X2 are independent iff F = F1 ⊗ F2 , by Definition 5.6.1. Since F = F1 ⊗F2 iff ψ = ψ1 ⊗ψ2 , according to Proposition 5.8.15, the corollary is proved. Proposition 5.8.17. Conditional expectation of jointly normal r.r.v.’s. Let Z1, . . . , Zn,Y1, . . . ,Ym be arbitrary jointly normal r.r.v.’s with mean 0. Suppose the covariance matrix σ Z ≡ EZZ T of Z ≡ (Z1, . . . ,Zn ) is positive definite. Let σ Y ≡ EY Y T be the covariance matrix of Y ≡ (Y1, . . . ,Ym ). Define the n × m cross-covariance matrix cZ,Y ≡ EZY T , and define the n × m matrix bY ≡ σ −1 Z cZ,Y . Then the following conditions hold: T σ −1 c 1. The m × m matrix σY |Z ≡ σ Y − cZ,Y Z Z,Y is nonnegative definite. 2. For each f ∈ LY , we have
E(f (Y )|Z) = bT Z,σY |Z f . Y
Heuristically, given Z, the conditional distribution of Y is normal with mean T σ −1 Z. bYT Z and covariance matrix σY |Z . In particular, E(Y |Z) = bYT Z = cZ,Y Z 3. The r.v.’s V ≡ E(Y |Z) and X ≡ Y − E(Y |Z) are independent normal r.v.’s with values in R m . 4. EY T Y = EV T V + EXT X. Proof. 1. Let X ≡ (X1, . . . ,Xm ) ≡ Y − bYT Z. Thus Y = bYT Z + X. Then Z1, . . . , Zn,X1, . . . ,Xm are jointly normal according to Proposition 5.7.6. Furthermore, EZXT = EZY T − EZZ T bY ≡ cZ,Y − σ Z bY = 0, while the covariance matrix of X is given by σ X ≡ EXXT = EY Y T − EY Z T bY − EbYT ZY T + EbYT ZZ T bY T = σ Y − cZ,Y bY − bYT cZ,Y + bYT σ Z bY T T T = σ Y − cZ,Y σ −1 Z cZ,Y − bY cZ,Y + bY cZ,Y T = σ Y − cZ,Y σ −1 Z cZ,Y ≡ σY |Z ,
whence σY |Z is nonnegative definite.
220
Probability Theory
2. Hence the r.v. U ≡ (Z,X) in R n+m has mean 0 and covariance matrix % $ % $ σZ 0 σZ 0 ≡ . σU ≡ 0 σX 0 σY |Z Accordingly, U has the characteristic function
1 E exp(iλT U ) = ψ0,σ U (λ) ≡ exp − λT σ U λ 2 1 1 T = exp − θ σ Z θ exp − γ T σ X γ 2 2 = E exp(iθ T Z)E exp(iγ T X),
for each λ ≡ (θ1, . . . ,θn,γ1, . . . ,γm ) ∈ R n+m . It follows from Corollary 5.8.16 that Z,X are independent. In other words, the distribution E(Z,X) induced by (Z,X) on R n+m is given by the product distribution E(Z,X) = EZ ⊗ EX of EZ ,EX induced on R n,R m , respectively, by Z,X, respectively. Now let f ∈ LY be arbitrary. Thus f (Y ) ∈ L. Let z ≡ (z1, . . . ,zn ) ∈ R n and z ≡ (x1, . . . ,xm ) ∈ R m be arbitrary. Define f(z,x) ≡ f (bYT z + x) and f (z) ≡ EX f(z,·) ≡ E f(z,X) ≡ Ef (bYT z + X) = bT z,σY |Z f . Y
We will prove that the r.r.v. f (Z) is the conditional expectation of f (Y ) given Z. To that end, let g ∈ C(R n ) be arbitrary. Then, by Fubini’s Theorem, Ef (Y )g(Z) = E f(Z,X)g(Z) = E(Z,X) (fg) = EZ ⊗ EX (fg) = EZ (EX (fg)) = EZ f g = Ef (Z)g(Z). It follows that E(f (Y )|Z) = f (Z) ≡ bT Z,σY |Z f . In particular, E(Y |Z) = Y
T σ −1 Z. bYT Z = cZ,Y Z 3. By Step 2, the r.v.’s Z,X are independent normal. Hence the r.v.’s V ≡ E(Y |Z) = bYT Z and X ≡ Y − E(Y |Z) are independent normal. 4. Hence EV T X = (EV T )(EX) = 0. It follows that
EY T Y = E(V + X)T (V + X) = EV T V + EXT X.
5.9 Central Limit Theorem Let X1, . . . ,Xn be independent r.r.v.’s with mean 0 and standard deviations σ1, . . . ,σn , respectively. Define σ by σ 2 = σ12 + · · · + σn2 and consider the
Probability Space
221
distribution F of the scaled sum X = (X1 + · · · + Xn )/σ . By replacing Xi with Xi /σ we may assume that σ = 1. The Central Limit Theorem says that if each individual summand Xi is small relative to the sum X, then F is close to the standard normal distribution 0,1 . One criterion, due to Lindberg and Feller, for the summands Xk (k = 1, . . . ,n) to be individually small relative to the sum, is for θ (r) ≡
n
(E1|Xk |>r Xk2 + E1|Xk |≤r |Xk |3 )
k=1
to be small for some r ≥ 0. Lemma 5.9.1. Lindberg–Feller bound. Suppose r ≥ 0 is such that θ (r) < Then n
σk3 ≤ θ (r)
1 8.
(5.9.1)
k=1
Proof. Consider each k = 1, . . . ,n. Then, since θ (r) < 18 by hypothesis, we have z ≡ E1|Xk |>r Xk2 < 18 and a ≡ E1|Xk |≤r |Xk |3 < 18 . A consequence is that (z + a 2/3 )3/2 ≤ z + a, which can be seen by noting that the two sides are equal at z = 0 and by comparing first derivatives relative to z on [0, 18 ]. Lyapunov’s inequality then implies that σk3 = (EXk2 1(|Xk |>r) + EXk2 1(|Xk |≤r) )3/2 ≤ (EXk2 1(|Xk |>r) + (E|Xk |3 1(|Xk |≤r) )2/3 )3/2 ≡ (z + a 2/3 )3/2 ≤ z + a ≡ EXk2 1(|Xk |>r) + E|Xk |3 1(|Xk |≤r) . Summing over k, we obtain inequality 5.9.1.
Theorem 5.9.2. Central Limit Theorem. Let f ∈ C(R) and ε > 0 be arbitrary. Then there exists δ > 0 such that if θ (r) < δ for some r ≥ 0, then ! ! f (x)dF (x) − f (x)d 0,1 (x) < ε. (5.9.2) Proof. Let ξR be an arbitrary but fixed binary approximation of R relative to the reference point 0. We assume, without loss of generality, that |f (x)| ≤ 1. Let δf be a modulus of continuity of f , and let b > 0 be so large that f has [−b,b] as support. Let ε > 0 be arbitrary. By Proposition 5.3.5, there exists δJ(ε,δf ,b, ξR ) > 0 such that if the distributions F, 0,1 satisfy ρξ(R) (F, 0,1 ) < ε ≡ δJ(ε,δf ,b, ξR ),
(5.9.3)
then inequality 5.9.2 holds. Separately, according to Corollary 5.8.11, there exists δch,dstr (ε ) > 0 such that if the characteristic functions ψF ,ψ0,1 of F, 0,1 , respectively, satisfy
222
Probability Theory ρchar (ψF ,ψ0,1 ) ≡
∞
j =1
2−j sup |ψF (λ) − ψ0,1 (λ)| < ε ≡ δch,dstr (ε ), |λ|≤j
(5.9.4) then inequality 5.9.3 holds. Now take m ≥ 1 be so large that 2−m+2 < ε , and define δ≡
1 1 −3 ∧ m ε . 8 6
Suppose θ (r) < δ for some r ≥ 0. Then θ (r) < 18 . We will show that inequality 5.9.2 holds. To that end, let λ ∈ [−m,m] and k = 1, . . . ,n be arbitrary. Let ϕk denote the characteristic function of Xk , and let Yk be a normal r.r.v. with mean 0, variance 2 2 σk2 , and characteristic function e−σk λ /2 . Then ! ∞ 2 1 2 3 3 E|Yk | = √ y exp − 2 y dy. 2σk 2π σk 0 < ! ∞ 4σk3 4σk3 2 3 σ , =2 u exp(−u)du = √ =√ π k 2π 0 2π where we made a change of integration variables u ≡ − 1 2 y 2 . Moreover, since 2σk n 2 k=1 σk = 1 by assumption, and since all characteristic functions have absolute value bounded by 1, we have n n −λ2 /2 −σk2 λ2 /2 |= ϕk (λ) − e |ϕF (λ) − e k=1
≤
n
k=1
|ϕk (λ) − e−σk λ
2 2 /2
|.
(5.9.5)
k=1
By Proposition 5.8.12, the Taylor expansions up to degree 2 for the characteristic 2 2 functions ϕk (λ) and e−σk λ /2 are equal because the two corresponding distributions have equal first and second moments. Hence the difference of the two functions is equal to the difference of the two remainders in their respective Taylor expansions. Again by Proposition 5.8.12, the remainder for ϕk (λ) is bounded by λ2 EXk2 1(|Xk |>r) +
|λ|3 E|Xk |3 1(|Xk |≤r) ≤ m3 (EXk2 1(|Xk |>r) + E|Xk |3 1(|Xk |≤r) ). 3!
By the same token, the remainder for e−σk λ /2 is bounded by a similar expression, where Xk is replaced by Yk and where r ≥ 0 is replaced by s ≥ 0, which becomes, as s → ∞, 2 2
Probability Space < 2 3 3 3 3 m σk < 2m3 σk3 . m E|Yk | = 2 π
223
Combining, inequality 5.9.5 yields, for each λ ∈ [−m,m], |ϕF (λ) − e−λ
2 /2
| ≤ m3
n
(EXk2 1(|Xk |>r) + E|Xk |3 1(|Xk |≤r) ) + 2m3
k=1
n
σk3
k=1
≤ 3m3 θ (r) ≤ 3m3 δ ≤
ε 2
,
where the second inequality follows from the definition of θ (r) and from Lemma 5.9.1. Hence, since |ψF − ψ0,1 | ≤ 2, we obtain ρchar (ψF ,ψ0,1 ) ≤ ≤
m
j =1 ε
2
2−j sup |ψF (λ) − ψ0,1 (λ)| + 2−m+1 |λ|≤j
+
ε = ε ≡ δch,dstr (ε ), 2
establishing inequality 5.9.4. Consequently, inequality 5.9.3 and, in turn, inequality 5.9.2 follow. The theorem is proved. Corollary 5.9.3. Lindberg’s Central Limit Theorem. For each p = 1,2, . . ., let np ≥ 1 be arbitrary, and let (Xp,1, . . . ,Xp,n(p) ) be an independent sequence n(p) 2 2 such that of r.r.v.’s with mean 0 and variance σp,k k=1 σp,k = 1. Suppose for each r > 0 we have lim
p→∞
n(p)
2 EXp,k 1(|X(p,k)|>r) = 0.
(5.9.6)
k=1
n(p) Then k=1 Xp,k converges in distribution to the standard normal distribution 0,1 as p → ∞. Proof. Let δ > 0 be arbitrary. According to Theorem 5.9.2, it suffices to show that there exists r > 0 such that, for sufficiently large p, we have n(p)
2 EXp,k 1(|X(p,k)|>r)
0, there exists δcau (ε) > 0 such that for each s ∈ Q, there exists a measurable set Ds ⊂ domain(Xs ) with P (Dsc ) < ε, such that for each ω ∈ Ds and for each t ∈ domain(X(·,ω)) with dQ (t,s) < δcau (ε), we have d(X(t,ω),X(s,ω)) ≤ ε. Then the r.f. X is said to be continuous a.u., with the operation δcau as a modulus of continuity a.u. on Q. 3. Suppose, for each ε > 0, there exist δauc (ε) > 0 and a measurable set D with P (D c ) < ε such that for each ω ∈ D, we have (i) domain(X(·,ω)) = Q and (ii) d(X(t,ω),X(s,ω)) ≤ ε, for each t,s ∈ domain(X(·,ω)) with dQ (t,s) < δauc (ε). Then the r.f. X is said to be a.u. continuous, with the operation δauc as a modulus of a.u. continuity. The reader can give simple examples of stochastic processes that are continuous in probability but not continuous a.u., and of processes that are continuous a.u. but not a.u. continuous. Definition 6.1.3. Continuity of r.f. on an arbitrary metric parameter space. Let X : Q × → S be an r.f., where (S,d) is a locally compact metric space and where (Q,dQ ) is an arbitrary metric space. The r.f. X is said to be continuous in probability if, for each bounded subset K of Q, the restricted r.f. X|K : K × → S is continuous in probability with modulus of continuity in probability δCp,K . The r.f. X is said to be continuous a.u. if, for each bounded subset K of Q, the restricted r.f. X|K : K × → S is continuous a.u., with some modulus of continuity a.u. δcau,K .
Random Field and Stochastic Process
229
The r.f. X is said to be a.u. continuous if, for each bounded subset K of Q, the restricted r.f. X|K : K × → S is a.u. continuous, with some modulus of a.u. continuity δauc,K . If, in addition, Q = [0,∞), and if δauc,[M,M+1] = δauc,[0,1] for each M ≥ 0, then X is said to be time-uniformly a.u. continuous. Proposition 6.1.4. Alternative definitions of r.f. continuity. Let X : Q× → S be an r.f., where (S,d) is a locally compact metric space and where (Q,dQ ) is a bounded metric space. Then the following conditions hold: 1. Suppose the r.f. X is continuous in probability, with a modulus of continuity in probability δ Cp . Let ε > 0 be arbitrary. Define δ cp (ε) ≡ δ Cp (2−2 (1 ∧ ε)2 ) > 0. Then for each s,t ∈ Q with dQ (t,s) < δ cp (ε), there exists measurable set Dt,s c ≤ ε such that with P Dt,s d(X(t,ω),X(s,ω)) ≤ ε for each ω ∈ Dt,s . Conversely, if there exists an operation δ cp with the just described properties, then the r.f. X is continuous in probability, with a modulus of continuity in probability δ Cp defined by δ Cp (ε) ≡ δ cp (2−1 ε) for each ε > 0. 2. The r.f. X is continuous a.u. iff for each ε > 0 and s ∈ Q, there exists a measurable set Ds ⊂ domain(Xs ) with P (Dsc ) < ε, such that for each α > 0, (α,ε) > 0 such that there exists δcau d(X(t,ω),X(s,ω)) ≤ α (α,ε), for each ω ∈ D . for each t ∈ Q with dQ (t,s) < δcau s 3. Suppose the r.f. X is a.u. continuous. Then for each ε > 0, there exists a mea (α,ε) > 0 surable set D with P (D c ) < ε such that for each α > 0, there exists δauc such that
d(X(t,ω),X(s,ω)) ≤ α (α,ε), for each ω ∈ D. Conversely, if for each s,t ∈ Q with dQ (t,s) < δauc such an operation δauc exists, then X has a modulus of a.u. continuity given by (ε,ε) for each ε > 0. δ auc (ε) ≡ δauc
Proof. As usual, write d ≡ 1 ∧ d. 1. Suppose X is continuous in probability, with a modulus of continuity in probability δ Cp . Let ε > 0 be arbitrary. Write ε ≡ 1 ∧ ε. Suppose s,t ∈ Q are arbitrary dQ (t,s) < δ cp (ε) ≡ δ Cp (2−2 (1 ∧ ε)2 ) ≡ δ Cp (2−2 ε 2 ). Then, by Definition 6.1.2 of δ Cp as a modulus of continuity in probability, we t ,Xs ) ≤ 2−2 ε 2 < ε 2 . Take any α ∈ (2−1 ε ,ε ) such that the set have E d(X t ,Xs ) ≤ α) is measurable. Then Chebychev’s inequality implies Dt,s ≡ (d(X c ) < α −1 2−2 ε 2 < ε ≤ ε. Moreover, for each ω ∈ D , we have that P (Dt,s t,s d(X(t,ω),X(s,ω)) ≤ α < ε ≤ ε. Thus the operation δ cp has the properties described in Assertion 1.
230
Stochastic Process
Conversely, suppose δ cp is an operation with the properties described in Asser tion 1. Let ε > 0 be arbitrary. Let s,t ∈ Q be arbitrary with dQ (t,s) < δ cp 12 ε . c ≤ 1 ε such Then by hypothesis, there exists a measurable subset Dt,s with P Dt,s 2 1 that for each ω ∈ Dt,s , we have d(Xt (ω),Xs (ω)) ≤ 2 ε. It follows that E(1 ∧ c ≤ ε. Thus X is continuous in probability. Assertion 1 d(Xt ,Xs )) ≤ 12 ε + P Dt,s is proved. 2. Suppose X is continuous a.u., with δcau as a modulus of continuity a.u. Let ε > 0 and s ∈ Q be arbitrary. Then there exists, for each k ≥ 1, a measurable set c ) < 2−k ε such that for each ω ∈ D , we have Ds,k with P (Ds,k s,k d(X(t,ω),X(s,ω)) ≤ 2−k ε c for each t ∈ Q with dQ (t,s) < δcau (2−k ε). Let Ds ≡ ∞ k=1 Ds,k . Then P (Ds ) < ∞ −k −k < α, k=1 2 ε = ε. Now let α > 0 be arbitrary. Let k ≥ 1 be so large that 2 and let (α,ε) ≡ δcau (2−k ε). δcau (α,ε). Then ω ∈ D , and Consider each ω ∈ Ds and t ∈ Q with dQ (t,s) < δcau s,k dQ (t,s) < δcau (2−k ε). Hence
d(X(t,ω),X(s,ω)) ≤ 2−k ε < α. Thus the operation δcau has the described properties in Assertion 2. Conversely, let δcau be an operation with the properties described in Assertion 2. Let ε > 0 be arbitrary. Let s ∈ Q be arbitrary. Then there exists a measurable set Ds with P (Dsc ) < ε such that for each ω ∈ Ds , and t ∈ Q with dQ (t,s) < (ε,ε), we have d(X(t,ω),X(s,ω)) ≤ ε. Thus the r.f. X is continδ cau (ε) ≡ δcau uous a.u., with the operation δ cau as a modulus of continuity a.u. Assertion 2 is proved. 3. For Assertion 3, proceed almost verbatim as in the proof of Assertion 2. Suppose the r.f. X is a.u. continuous, with δauc as a modulus of a.u. continuity. Let ε > 0 be arbitrary. Then there exists, for each k ≥ 1, a measurable set Dk with P (Dkc ) < 2−k ε such that for each ω ∈ Dk , we have
d(X(t,ω),X(s,ω)) ≤ 2−k ε
c for each s,t ∈ Q with dQ (t,s) < δauc (2−k ε). Let D ≡ ∞ k=1 Dk . Then P (D ) < ∞ −k −k < α, k=1 2 ε = ε. Now let α > 0 be arbitrary. Let k ≥ 1 be so large that 2 and let (α,ε) ≡ δauc (2−k ε). δauc (α,ε). Then ω ∈ D and Consider each ω ∈ D and s,t ∈ Q with dQ (t,s) < δauc k dQ (t,s) < δauc (2−k ε). Hence
d(X(t,ω),X(s,ω)) ≤ 2−k ε < α. Thus the operation δauc has the properties described in Assertion 3.
Random Field and Stochastic Process
231
Conversely, suppose there exists an operation δauc with the properties described (ε,ε). Then there in Assertion 3. Let ε > 0 be arbitrary. Define δ auc (ε) ≡ δauc c exists a measurable set D with P (D ) < ε such that for each ω ∈ D and s,t ∈ Q (ε,ε), we have d(X(t,ω),X(s,ω)) ≤ ε. Thus the with dQ (t,s) < δ auc (ε) ≡ δauc r.f. X is a.u. continuous, with the operation δ auc as a modulus of a.u. continuity. Assertion 3 is proved.
Proposition 6.1.5. a.u. Continuity implies continuity a.u., etc. Let X : (Q,dQ )× → (S,d) be an r.f., where (S,d) is a locally compact metric space and where (Q,dQ ) is a bounded metric space. Then a.u. continuity of X implies continuity a.u., which in turn implies continuity in probability. Proof. Let ε > 0 be arbitrary. Suppose X is a.u. continuous, with modulus of a.u. continuity given by δauc . Define δcau ≡ δauc . Let D be a measurable set satisfying Condition 3 in Definition 6.1.2. Then Ds ≡ D satisfies Condition 2 in Definition 6.1.2. Accordingly, X is continuous a.u. Now suppose X is continuous a.u., with modulus of continuity a.u. given by δcau . Let Ds be a measurable set satisfying Condition 2 in Definition 6.1.2. Then Dt,s ≡ Ds satisfies the conditions in Assertion 1 of Proposition 6.1.4, provided that we define δ cp ≡ δcau . Accordingly, X is continuous in probability. Definition 6.1.6. Marginal distributions of an r.f. Let (S,d) be an arbitrary complete metric space. Let X : Q × (,L,E) → (S,d) be an r.f. Let n ≥ 1 be arbitrary, and let t ≡ (t1, . . . ,tn ) be an arbitrary sequence in the parameter set Q. Let Ft (1),...,t (n) denote the distribution induced on (S n,d n ) by the r.v. (Xt (1), . . . ,Xt (n) ). Thus Ft (1),...,t (n) f ≡ Ef (Xt (1), . . . ,Xt (n) )
(6.1.1)
for each f ∈ Cub (S n ). Then we call the indexed family F ≡ {Ft (1),...,t (n) : n ≥ 1 and t1, . . . ,tn ∈ Q} the family of marginal distributions of X. We will say that the r.f. X extends the family F of finite joint distributions and that X is an extension of F . Let X : Q × ( ,L ,E ) → (S,d) be an r.f. with sample space ( ,L ,E ). Then X and X are said to be equivalent if their marginal distributions at each finite sequence in Q are the same. In other words, X and X are said to be equivalent if Ef (Xt (1), . . . ,Xt (n) ) = E f (Xt (1), . . . ,Xt (n) ) for each f ∈ Cub (S n ), for each sequence (t1, . . . ,tn ) in Q, for each n ≥ 1. In short, two r.f.’s are equivalent if they extend the same family of finite joint distributions.
6.2 Consistent Family of f.j.d.’s In the last section, we saw that each r.f. gives rise to a family of marginal distributions. Conversely, in this section, we seek conditions for a family F of finite
232
Stochastic Process
joint distributions to be the family of marginal distributions of some r.f. We will show that a necessary condition is consistency, to be defined next. In the following chapters we will present various sufficient conditions on F for the construction of r.f.’s with F as the family of marginal distributions of r.f.’s and processes with various desired properties of sample functions. Recall that, unless otherwise specified, (S,d) will denote a locally compact metric space, with an arbitrary but fixed reference point x◦ , and (,L,E) will denote an arbitrary probability space. Definition 6.2.1. Consistent family of f.j.d.’s. Let Q be a set. Suppose, for each n ≥ 1 and for each finite sequence t1, . . . ,tn in Q, a distribution Ft (1),...,t (n) is given on the locally compact metric space (S n,d (n) ), which will be called a finite joint distribution, or f.j.d. for short. Then the indexed family F ≡ {Ft (1),...,t (n) : n ≥ 1 and t1, . . . ,tn ∈ Q} is said to be a consistent family of f.j.d.’s with parameter set Q and state space S, if the following Kolmogorov consistency condition is satisfied. Let n,m ≥ 1 be arbitrary. Let t ≡ (t1, . . . ,tm ) be an arbitrary sequence in Q, and let i ≡ (i1, . . . ,in ) be an arbitrary sequence in {1, . . . ,m}. Define the continuous function i ∗ : S m → S n by i ∗ (x1, . . . ,xm ) ≡ (xi(1), . . . ,xi(n) )
(6.2.1)
for each (x1, . . . ,xm ) ∈ S m , and call i ∗ the dual function of the sequence i. Then, for each f ∈ Cub (S n ), we have Ft (1),...,t (m) ( f ◦ i ∗ ) = Ft (i(1)),...,t (i(n)) f
(6.2.2)
or, in short, Ft ( f ◦ i ∗ ) = Ft◦i ( f ). (Q,S) denote the set of consistent families of f.j.d.’s with parameter We will let F set Q and state space S. Two consistent families F ≡ {Ft (1),...,t (n) : n ≥ 1 and t1, . . . ,tn ∈ Q} and F ≡ {Ft (1),...,t (n) : n ≥ 1 and t1, . . . ,tn ∈ Q} are considered equal if Ft (1),...,t (n) = Ft (1),...,t (n) as distributions for each n ≥ 1 and t1, . . . ,tn ∈ Q. In that case, we write simply F = F . When there is little risk of confusion, we will call a consistent family of f.j.d.’s simply a consistent family. Note that for an arbitrary f ∈ Cub (S n ), we have f ◦ i ∗ ∈ Cub (S m ), so f ◦ i ∗ is integrable relative to Ft (1),...,t (m) . Hence the left-hand side of equality 6.2.2 makes sense. Definition 6.2.2. Notations for sequences. Given any sequence (a1, . . . ,am ) of objects, we will use the shorter notation a to denote the sequence. When there is little risk of confusion, we will write κσ ≡ κ ◦σ for the composite of two functions σ : A → B and κ : B → C. Separately, for each m ≥ n ≥ 1, define the sequence
Random Field and Stochastic Process
233
κ ≡ κn,m : {1, . . . ,m − 1} → {1, . . . ,m} by n, . . . ,m) ≡ (1, . . . ,n − 1,n + 1, . . . ,m), (κ1, . . . ,κm−1 ) ≡ (1, . . . , where the caret on the top of an element in a sequence signifies the omission of ∗ denote the dual function of sequence κ. Thus that element. Let κ ∗ ≡ κn,m κ ∗ x = κ ∗ (x1, . . . ,xm ) ≡ xκ = (xκ(1), . . . ,xκ(m−1) ) = (x1, . . . , xn, . . . ,xm ) ∗ deletes the nth entry for each x ≡ (x1, . . . ,xm ) ∈ S m . In words, the function κn,m of the sequence (x1, . . . ,xm ).
Lemma 6.2.3. Consistency when parameter set is a metrically discrete subset of R. Let (S,d) be a locally compact metric space. Let Q be an arbitrary metrically discrete subset of R. Suppose, for each m ≥ 1 and nonincreasing sequence r ≡ (r1, . . . ,rm ) in Q, there exists a distribution Fr(1),...,r(m) on (S m,d m ) such that ∗ f = Fr(1),...,r(m) ( f ◦ κn,m ) Fr(1),..., r(n)...,r(m) =
(6.2.3)
∗ ), Fr◦κ(n,m) f = Fr ( f ◦ κn,m
(6.2.4)
or, equivalently,
for each f ∈ Cub (S m−1 ), for each n = 1, . . . ,m. Then the family F ≡ {Fr(1),...,r(n) : n ≥ 1;r1 ≤ . . . ≤ rn in Q} of f.j.d.’s can be uniquely extended to a consistent family of f.j.d.’s F = {Fs(1),...,s(m) : m ≥ 1;s1, . . . ,sm ∈ Q}
(6.2.5)
with parameter Q. Proof. 1. Let the integers m,n, with m ≥ n ≥ 1, and the increasing sequence r ≡ (r1, . . . ,rm ) in Q be arbitrary. Let r ≡ (r1 , . . . ,rn ) be an arbitrary subsequence of r. Then m > h ≡ m − n ≥ 0. Moreover, r can be obtained by deleting h elements in the sequence r. Specifically, r = κ ∗ r = rκ, where κ ≡ κn(h),m κn(h−1),m−1 . . . κn(1),m−h : {1, . . . ,m − h} → {1, . . . ,m}
(6.2.6)
if h > 0, and where κ is the identity function if h = 0. Call such an operation κ a deletion. Then, by repeated application of equality 6.2.3 in the hypothesis, we obtain Fr f = Frκ f = Fr ( f ◦ κ ∗ ) m−h
(6.2.7)
for each f ∈ C(S ), for each deletion κ, for each increasing sequence r ≡ (r1, . . . ,rm ) in Q. 2. Let the sequence s ≡ (s1, . . . ,sp ) in Q be arbitrary. Let r ≡ (r1, . . . ,rm ) be an arbitrary increasing sequence in Q such that s is a sequence in {r1, . . . ,rm }.
234
Stochastic Process
Then because the sequence r is increasing, there exists a unique function σ : {1, . . . ,p} → {1, . . . ,m} such that s = rσ . Let f ∈ Cub (S p,d p ) be arbitrary. Define F s(1),...,s(p) f ≡ F s f ≡ Fr ( f ◦ σ ∗ ).
(6.2.8)
) be a We will verify that F s f is well defined. To that end, let r ≡ (r1 , . . . ,rm second increasing sequence in Q such that s is a sequence in {r1, . . . ,rm }, and let σ : {1, . . . ,p} → {1, . . . ,m } be the corresponding function such that s = r σ . We need to verify that Fr ( f ◦σ ∗ ) = Fr ( f ◦σ ∗ ). To that end, let r¯ ≡ (¯r1, . . . , r¯m ) be an arbitrary supersequence of r and r . Then r = r¯ κ and r = r¯ κ for some deletions κ and κ , respectively. Hence s = rσ = r¯ κσ and s = r σ = r¯ κ σ . Then by the uniqueness alluded to in the previous paragraph, we have κσ = κ σ and so σ ∗ ◦ κ ∗ = σ ∗ ◦ κ ∗ . Consequently,
Fr ( f ◦ σ ∗ ) = Fr¯ κ ( f ◦ σ ∗ ) = Fr¯ ( f ◦ σ ∗ ◦ κ ∗ ) = Fr¯ ( f ◦ σ ∗ ◦ κ ∗ ) = Fr¯ κ ( f ◦ σ ∗ ) = Fr ( f ◦ σ ∗ ), where the second and fourth equalities are thanks to equality 6.2.7. This shows that F s f is well defined in equality 6.2.8. The same equality says that F s is the distribution induced by the r.v. σ ∗ : (S m,Cub (S m,d m ),Fr ) → (S p,d p ), where Cub (S m,d m )− stands for the completion of C(S m,d m ) relative to the distribution Fr . In particular, F s(1),...,s(p) ≡ F s is a distribution. 3. Next, let s ≡ (s1, . . . ,sq ) be an arbitrary sequence in Q, and let (si(1), . . . ,si(p) ) be an arbitrary subsequence of s. Write i ≡ (i1, . . . ,ip ). Let the increasing sequence r ≡ (r1, . . . ,rm ) be arbitrary such that s is a sequence in {r1, . . . ,rm }, and let σ : {1, . . . ,q} → {1, . . . ,m} such that s = rσ . Then si = rσ i. Hence, for each f ∈ Cub (S p,d p ), we have F si f = F rσ i f ≡ Fr ( f ◦ i ∗ ◦ σ ∗ ) ≡ F s ( f ◦ i ∗ ). Thus the family F ≡ {F s(1),...,s(p) : p ≥ 1;s1, . . . ,sp ∈ Q} of f.j.d.’s is consistent. 4. Lastly, let s ≡ (s1, . . . ,sq ) be an arbitrary increasing sequence in Q. Write r ≡ s. Then s = rσ , where σ : {1, . . . ,q} → {1, . . . ,q} is the identity function. Hence F s(1),...,s(q) f ≡ Fr(1),...,r(q) f ◦ σ ∗ = Fs(1),...,s(q) f for each f ∈ Cub (S q ,d q ). In other words, F s(1),...,s(q) ≡ Fs(1),...,s(q) . Thus the family F is an extension of the family F , and we can simply write F for F . The lemma is proved.
Random Field and Stochastic Process
235
Lemma 6.2.4. Two consistent families of f.j.d.’s are equal if their corresponding f.j.d.’s on initial sections are equal. Let (S,d) be a locally compact metric space. Let Q ≡ {t1,t2, . . .} be a countably infinite set. Let F ≡ {Fs(1),...,s(m) : : m ≥ 1;s1, . . . ,sm ∈ Q} be two m ≥ 1;s1, . . . ,sm ∈ Q} and F ≡ {Fs(1),...,s(m) members of F (Q,S) such that Ft (1),...,t (n) = Ft (1),...,t (n) for each n ≥ 1. Then F = F . Proof. Let m ≥ 1 be arbitrary, and let (s1, . . . ,sm ) be an arbitrary sequence in Q. Then (s1, . . . ,sm ) is a subsequence of (t1, . . . ,tn ) for some sufficiently large n ≥ 1. Hence (s1, . . . ,sm ) = (ti(1), . . . ,ti(m) ) for some sequence i ≡ (i1, . . . ,im ) in {1, . . . ,n}. Let i ∗ : S n → S m be the dual function of the sequence i. Let f ∈ Cub (S m,d m ) be arbitrary. Then Fs(1),...,s(m)) f = Ft (i(1)),...,t (i(m)) f = Ft (1),...,t (n) ( f ◦ i ∗ ), where the last equality is by the consistency of the family F . Similarly, Fs(1),...,s(m)) f = Ft (i(1)),...,t (i(m)) f = Ft (1),...,t (k) ( f ◦ i ∗ ).
Since the right-hand sides of the two last displayed equalities are equal by hypothesis, the left-hand sides are equal – namely, Fs(1),...,s(m)) f = Fs(1),...,s(m)) f,
where f ∈ Cub (S m,d m ), m ≥ 1, and the sequence (s1, . . . ,sm ) in Q is arbitrary. We conclude that F = F . The next lemma extends the consistency condition 6.2.2 to integrable functions. Proposition 6.2.5. The consistency condition extends to cover integrable functions. Suppose the consistency condition 6.2.2 holds for each f ∈ Cub (S n ), for the family F of f.j.d.’s. Then a real-valued function f on S n is integrable relative to Ft (i(1)),...,t (i(n)) iff f ◦ i ∗ is integrable relative to Ft (1),...,t (m) , in which case equality 6.2.2 also holds for f . Proof. 1. Write E ≡ Ft (1),...,t (m) . Since i ∗ : (S m,d (m) ) → (S n,d (n) ) is uniformly continuous, i ∗ is an r.v. on the completion (S m,L,E) of (S m,C(S m ),E) with values in S n , whence it induces a distribution Ei ∗ on S n . Equality 6.2.2 then implies that Ei ∗ f ≡ E( f ◦ i ∗ ) ≡ Ft (1),...,t (m) ( f ◦ i ∗ ) = Ft (i(1)),...,t (i(n)) f
(6.2.9)
for each f ∈ Cub (S n,d (n) ). Hence the set of integrable functions relative to Ei ∗ is equal to the set of integrable functions relative to Ft (i(1)),...,t (i(n)) . Moreover, Ei ∗ = Ft (i(1)),...,t (i(n)) on this set.
236
Stochastic Process
2. According to Proposition 5.2.6, a function f : S n → R is integrable relative to Ei ∗ iff the function f ◦ i ∗ is integrable relative to E, in which case Ei ∗ f = E( f ◦ i ∗ ). In other words, a function f : S n → R is integrable relative to Ft (i(1)),...,t (i(n)) iff the function f ◦ i ∗ is integrable relative to Ft (1),...,t (m) , in which case Ft (i(1)),...,t (i(n)) f = Ft (1),...,t (m) f ◦ i ∗ .
Proposition 6.2.6. Marginal distributions are consistent. Let X : Q × → S be an r.f. Then the family F of marginal distributions of X is consistent. Proof. Let n,m ≥ 1 and f ∈ Cub (S n ) be arbitrary. Let t ≡ (t1, . . . ,tm ) be an arbitrary sequence in Q, and let i ≡ (i1, . . . ,in ) be an arbitrary sequence in {1, . . . ,m}. Using the defining equalities 6.1.1 and 6.2.1, we obtain Ft (1),...,t (m) ( f ◦ i ∗ ) ≡ E(( f ◦ i ∗ )(Xt (1), . . . ,Xt (m) )) ≡ Ef (Xt (i(1)), . . . ,Xt (i(n)) ) ≡ F t (i(1)),...,t (i(n)) f .
Thus the consistency condition 6.2.2 holds.
Definition 6.2.7. Restriction to a subset of the parameter set. Let (S,d) be a (Q,S) is the set of consistent families locally compact metric space. Recall that F of f.j.d.’s with parameter set Q and state space S. Let Q be any subset of Q. For (Q,S) define each F ∈ F F |Q ≡ Q,Q (F ) ≡ {Fs(1),...,s(n) : n ≥ 1;s1, . . . ,sn ∈ Q } and call
F |Q
the restriction of the consistent family F to
Q .
(6.2.10)
The function
(Q,S) → F (Q ,S) Q,Q : F will be called the restriction mapping of consistent families with parameter set Q to consistent families with parameter set Q . (Q,S) be arbitrary. Denote its image under the mapping Q,Q by 0 ⊂ F Let F 0 |Q ≡ Q,Q (F 0 ) = {F |Q : F ∈ F 0 }, F
(6.2.11)
0 |Q the restriction of the set F 0 of consistent families to Q . and call F (Q,S) when Q is countably infinite. We next introduce a metric on the set F Definition 6.2.8. Marginal metric on the space of consistent families of f.j.d.’s with a countably infinite parameter set. Let (S,d) be a locally compact metric space, with a binary approximation ξ relative to some fixed reference point x◦ . Let n ≥ 1 be arbitrary. Recall that ξ n is the nth power of ξ and a binary approximation (n) of (S n,d n ) relative to x◦ ≡ (x◦, . . . ,x◦ ) ∈ S n , as in Definition 3.2.4. Recall, from Definition 5.3.4, the distribution metric ρDist,ξ n on the set of distributions on (S n,d n ). Recall, from Assertion 3 of Proposition 5.3.5, that sequential convergence relative to ρDist,ξ n is equivalent to weak convergence.
Random Field and Stochastic Process
237
(Q,S) Let Q ≡ {t1,t2, . . .} be a countably infinite parameter set. Recall that F is the set of consistent families of f.j.d.’s with parameter set Q and state space S. (Q,S) by Define a metric ρ Marg,ξ,Q on F ρ Marg,ξ,Q (F,F ) ≡
∞
2−n ρDist,ξ n (Ft (1),...,t (n),Ft (1),...,t (n) )
(6.2.12)
n=1
(Q,S). The next lemma proves that ρ for each ∈ F Marg,ξ,Q is indeed (Q,S) of a metric. We will call ρ Marg,ξ,Q the marginal metric for the space F consistent families of f.j.d.’s, relative to the binary approximation ξ of the locally compact state space (S,d). Note that ρ Marg,ξ,Q ≤ 1 because ρDist,ξ n ≤ 1 for each n ≥ 1. We emphasize that the metric ρ Marg,ξ,Q depends on the enumeration t of the set Q. Two different enumerations lead to two different metrics, though those metrics are equivalent. As observed earlier, sequential convergence relative to ρDist,ξ n is equivalent to weak convergence of distributions on (S n,d n ), for each n ≥ 1. Hence, for each (Q,S), we have ρ Marg,ξ,Q (F (m),F (0) ) → 0 iff sequence (F (m) )m=0,1,2,... in F (m) (0) Ft (1),...,t (n) ⇒ Ft (1),...,t (n) as m → ∞, for each n ≥ 1. F,F
Lemma 6.2.9. A marginal metric is indeed a metric. Let (S,d) be a locally compact metric space, with a binary approximation ξ relative to some fixed reference point x◦ . Let Q ≡ {t1,t2, . . .} be a countably infinite parameter set.. Then (Q,S) in Definition 6.4.2 is indeed a the marginal metric ρ Marg,ξ,Q defined on F metric. Proof. 1. Symmetry and triangle inequality for ρ Marg,ξ,Q follow from their respective counterparts for ρDist,ξ n for each n ≥ 1 in the defining equality 6.2.12. (Q,S) be arbitrary such that ρ Marg,ξ,Q (F,F ) = 0. Then 2. Let F,F ∈ F equality 6.2.12 implies that ρDist,ξ n (Ft (1),...,t (n),Ft (1),...,t (n) ) = 0 for each n ≥ 1. Hence, since ρDist,ξ n is a metric, we have Ft (1),...,t (n) = Ft (1),...,t (n) , for each n ≥ 1. Hence F = F by Lemma 6.2.4. 3. Summing up, ρ Marg,ξ,Q is a metric. Definition 6.2.10. Continuity in probability of consistent families. Let (S,d) be a locally compact metric space. Write d ≡ 1∧d. Let (Q,dQ ) be a metric space. (Q,S) is the set of consistent families of f.j.d.’s with parameter space Recall that F (Q,S) be arbitrary. Q and state space (S,d). Let F ∈ F 1. Suppose (Q,dQ ) is bounded. Suppose, for each ε > 0, there exists δCp (ε) > 0 such that Fs,t d ≤ ε for each s,t ∈ Q with dQ (s,t) < δCp (ε). Then the consistent family F of f.j.d.’s is said to be continuous in probability, with δCp as a modulus of continuity in probability. 2. More generally, let the metric space (Q,dQ ) be arbitrary, not necessarily bounded. Then the consistent family F of f.j.d.’s is said to be continuous in
238
Stochastic Process
probability if, for each bounded subset K of Q, the restricted consistent family Cp (Q,S) denote the subset of F |K is continuous in probability. We will let F F (Q,S) whose members are continuous in probability. Lemma 6.2.11. Continuity in probability extends to f.j.d.’s of higher dimensions. Let (S,d) be a locally compact metric space. Let (Q,dQ ) be a bounded metric space. Suppose F is a consistent family of f.j.d.’s with state space S and parameter space Q that is continuous in probability, with a modulus of continuity in probability δCp . Then the following conditions hold: 1. Let m ≥ 1 be arbitrary. Let f ∈ Cub (S m,d m ) be arbitrary with a modulus of continuity δf and with | f | ≤ 1. Let and ε > 0 be arbitrary. Then there exists δfj d (ε,m,δf ,δCp ) > 0 such that for each s1, . . . ,sm,t1, . . . ,tm ∈ Q with m
dQ (sk ,tk ) < δfj d (ε,m,δf ,δCp ),
(6.2.13)
|Fs(1),...,s(m) f − Ft (1),...,t (m) f | ≤ ε.
(6.2.14)
k=1
we have
2. Suppose, in addition, F is a consistent family of f.j.d.’s with state space S and parameter space Q that is continuous in probability. Suppose there exists a dense of (Q,dQ ) such that Fs(1),...,s(m) = F subset Q s(1),...,s(m) for each s1, . . . ,sm ∈ Q. Then F = F . Proof. 1. Let m ≥ 1 and f ∈ Cub (S m,d m ) be as given. Write α ≡ 2−3 m−1 ε(1 ∧ δf (2−1 ε)) and define δfj d (ε,m,δf ,δCp ) ≡ δCp (α). Suppose s1, . . . ,sm,t1, . . . ,tm ∈ Q satisfy inequality 6.2.13. Then m
dQ (sk ,tk ) < δCp (α).
(6.2.15)
k=1
Let i ≡ (1, . . . ,m) and j ≡ (m + 1, . . . ,2m). Thus i and j are sequences in {1, . . . ,2m}. Let x ∈ S 2m be arbitrary. Then ( f ◦ i ∗ )(x1, . . . ,x2m ) ≡ f (xi(1), . . . ,xi(m) ) = f (x1, . . . ,xm ) and ( f ◦ j ∗ )(x1, . . . ,x2m ) ≡ f (xj (1), . . . ,xj (m) ) = f (xm+1, . . . ,x2m ), where i ∗,j ∗ are the dual functions of the functions i,j , respectively, in the sense of Definition 6.2.1. Consider each k = 1, . . . ,m. Let h ≡ (h1,h2 ) ≡ (k,m + k).
Random Field and Stochastic Process
239
Thus h : {1,2} → {1, . . . ,2m} is a sequence in {1, . . . ,2m}. Let h∗ denote the dual function. Define (r1, . . . ,r2m ) ≡ (s1, . . . ,sm,t1, . . . ,tm ). Then Fr(1),...,r(2m) (d ◦ h∗ ) = Fr(h(1)),r(h(2)) d = Fr(k),r(m+k) d = Fs(k),t (k) d < α,
(6.2.16)
where the inequality follows from inequality 6.2.15 in view of the definition of δCp as a modulus of continuity in probability of the family F . Now take any δ0 ∈ (2−1 (1 ∧ δf (2−1 ε)),1 ∧ δf (2−1 ε)). Let k ,xm+k ) > δ0 } = (d ◦ h∗ > δ0 ) ⊂ S 2m . Ak ≡ {x ∈ S 2m : d(x In view of inequality 6.2.16, Chebychev’s inequality yields Fr(1),...,r(2m) (Ak ) = Fr(1),...,r(2m) (d ◦ h∗ > δ0 ) < δ0−1 α < 2(1 ∧ δf (2−1 ε))−1 α ≡ 2(1 ∧ δf (2−1 ε))−1 α
Let A ≡
≡ 2(1 ∧ δf (2−1 ε))−1 2−3 m−1 ε(1 ∧ δf (2−1 ε)) = 2−2 m−1 ε.
m
k=1 Ak
⊂ S 2m . Then
Fr(1),...,r(2m) (A) ≤
m
Fr(1),...,r(2m) (Ak ) ≤ 2−2 ε.
k=1
Now consider each x ∈
Ac .
We have x ∈ Ack , whence
k ,xm+k ) ≤ δ0 < 1 1 ∧ d(xk ,xm+k ) ≡ d(x for each k = 1, . . . ,m. Therefore d ((x1, . . . ,xm ),(xm+1, . . . ,x2m )) ≡ m
m
d(xk ,xm+k ) ≤ δ0 < δf (2−1 ε).
k=1
Consequently, |( f ◦ i ∗ )(x) − ( f ◦ j ∗ )(x)| = | f (x1, . . . ,xm ) − f (xm+1, . . . ,x2m )| < 2−1 ε where x ∈ Ac is arbitrary. By hypothesis, | f | ≤ 1. Hence |Fs(1),...,s(m) f − Ft (1),...,t (m) f | = |Fr(i(1)),...,r(i(n)) f − Fr(j (1)),...,r(j (n)) f | = |Fr(1),...,r(2m) ( f ◦ i ∗ ) − Fr(1),...,r(2m) ( f ◦ j ∗ )|
240
Stochastic Process = |Fr(1),...,r(2m) ( f ◦ i ∗ − f ◦ j ∗ )| = |Fr(1),...,r(2m) ( f ◦ i ∗ − f ◦ j ∗ )|1Ac + |Fr(1),...,r(2m) ( f ◦ i ∗ − f ◦ j ∗ )|1A ≤ 2−1 ε + 2Fr(1),...,r(2m) (A) ≤ 2−1 ε + 2 · 2−2 ε = ε,
(6.2.17)
as desired. Assertion 1 is proved. 2. From relations 6.2.13 and 6.2.14 in Assertion 1, we see that for each arbitrary but fixed m ≥ 1 and f ∈ Cub (S m,d m ), the expectation Fs(1),...,s(m) f is a continu f is a continuous ous function of (s1, . . . ,sm ) ∈ (Qm,d m ). Similarly, Fs(1),...,s(m) m m function of (s1, . . . ,sm ) ∈ (Q ,d ). At the same time, by the assumption in f for each (s1, . . . ,sm ) in the Assertion 2, we have Fs(1),...,s(m) f = Fs(1),...,s(m) m m m f for dense subset Q of (Q ,d ). Consequently, Fs(1),...,s(m) f = Fs(1),...,s(m) m m m m each (s1, . . . ,sm ) ∈ (Q ,d ), where m ≥ 1 and f ∈ Cub (S ,d ) are arbitrary. Thus F = F . Assertion 2 and the lemma are proved. Definition 6.2.12. Metric space of consistent families that are continuous in probability. Let (S,d) be a locally compact metric space, with a reference point x◦ ∈ S and a binary approximation ξ ≡ (An )n=1,2,... relative to x◦ . Let (Q,dQ ) be a locally compact metric space. Let Q∞ ≡ {q1,q2, . . .} be an arbitrary countably infinite and dense subset of Q. (Q,S) is the set of consistent families of f.j.d.’s with parameter set Recall that F (Q,S) whose members Cp (Q,S) denote the subset of F Q and state space S. Let F are continuous in probability. Relative to the countably infinite parameter subset Q∞ and the binary approxCp (Q,S) by imation ξ , define a metric ρ Cp,ξ,Q,Q(∞) on F ρ Cp,ξ,Q,Q(∞) (F,F ) ≡ ρ Marg,ξ,Q(∞) (F |Q∞,F |Q∞ ) ≡
∞
2−n ρDist,ξ n (Fq(1),...,q(n),Fq(1),...,q(n) )
(6.2.18)
n=1
Cp (Q,S), where ρ for each F,F ∈ F Marg,ξ,Q(∞) is the marginal metric on (Q∞,S) introduced in Definition 6.2.8. In other words, F ρ Cp,ξ,Q,Q(∞) (F,F ) ≡ ρ Marg,ξ,Q(∞) ( Q,Q(∞) (F ), Q,Q(∞) (F )) Cp (Q,S). Lemma 6.2.13 (next) shows that ρ for each F,F ∈ F Cp,ξ,Q,Q(∞) is indeed a metric. Then, trivially, the mapping Cp (Q,S), Cp (Q∞,S), ρCp,ξ,Q,Q(∞) ) → (F ρMarg,ξ,Q(∞) ) Q,Q(∞) : (F is an isometry. Note that 0 ≤ ρ Cp,ξ,Q,Q(∞) ≤ 1. Lemma 6.2.13. ρ Cp,ξ,Q,Q(∞) is indeed a metric. Let Q∞ ≡ {q1,q2, . . .} be an arbitrary countably infinite and dense subset of Q. Then the function ρ Cp,ξ,Q,Q(∞) Cp (Q,S). defined in Definition 6.2.12 is a metric on F
Random Field and Stochastic Process
241
Cp (Q,S) are such that ρ Proof. Suppose F,F ∈ F Cp,ξ,Q,Q(∞) (F,F ) = 0. By the defining equality 6.2.18, we then have ρ Marg,ξ,Q(∞) (F |Q∞,F |Q∞ ) = 0. Hence, since ρ Marg,ξ,Q(∞) is a metric on F (Q∞,S), we have F |Q∞ = F |Q∞ . In other words, Fq(1),...,q(n) = Fq(1),...,q(n)
for each n ≥ 1. Consider each m ≥ 1 and each s1, . . . ,sm ∈ Q∞ . Let n ≥ 1 be so large that {s1, . . . ,sm } ⊂ {q1, . . . ,qn } and obtain, by the consistency of F and F , . Fs(1),...,s(m) = Fs(1),...,s(m)
(6.2.19)
Now let m ≥ 1 and t1, . . . ,tm ∈ Q be arbitrary. Let f ∈ C(S m ) be arbitrary. For (p) (p) each i = 1, . . . ,m, let (si )p=1,2,... be a sequence in Q∞ with d(si ,ti ) → 0 as p → ∞. Then there exists a bounded subset K ⊂ Q such that the sequences (p) (t1, . . . ,tm ) and (si )p=1,2,...;i=1,...,m are in K. Since F |K is continuous in probability, we have, by Lemma 6.2.11, Fs(p,1),...,s(p,m) f → Ft (1),...,t (m) f , (p)
where we write s(p,i) ≡ si
to lessen the burden on subscripts. Similarly,
f → Ft (1),...,t (m) f . Fs(p,1),...,s(p,m)
On the other hand, f Fs(p,1),...,s(p,m) f = Fs(p,1),...,s(p,m)
thanks to equality 6.2.19. Combining, Ft (1),...,t (m) f = Ft (1),...,t (m) f . We conclude that F = F . Cp,ξ,Q,Q(∞) (F,F ) = 0 from Conversely, suppose F = F . Then, trivially, ρ equality 6.2.18. Separately, the triangle inequality and symmetry of ρ Cp,ξ,Q,Q(∞) n follow from equality 6.2.18 and from the fact that ρDist,ξ is a metric for each n ≥ 1. Summing up, ρ Cp,ξ,Q,Q(∞) is a metric.
6.3 Daniell–Kolmogorov Extension In this and the next section, unless otherwise specified, (S,d) will be a locally compact metric space, and Q ≡ {t1,t2, . . .} will denote an enumerated countably infinite parameter set, enumerated by the bijection t : {1,2, . . .} → Q. Let x◦ be an arbitrary but fixed reference point in (S,d). (Q,S) is the set of consistent families of f.j.d.’s with Recall that F parameter set Q and the locally compact state space (S,d). We will prove the Daniell–Kolmogorov Extension Theorem, which constructs, for each member (Q,S), a probability space (S Q,L,E) and an r.f. U : Q × (S Q,L,E) → F ∈F (S,d) with marginal distributions given by F .
242
Stochastic Process
Furthermore, we will prove the uniform metrical continuity of the Daniell– (Q,S), in a sense Kolmogorov Extension on each pointwise tight subset of F to be made precise in the following discussion. This metrical continuity implies sequential continuity relative to weak convergence. Recall that [·]1 is an operation that assigns to each c ≥ 0 an integer [c]1 in the interval (c,c + 2). As usual, for arbitrary symbols a and b, we will write ab and a(b) interchangeably. Definition 6.3.1. Path space, coordinate function, and distributions on path space. Let S Q ≡ t∈Q S denote the space of functions from Q to S, called the path space. Relative to the enumerated set Q, define a complete metric d Q on S Q by d Q (x,y) ≡ d ∞ (x ◦ t,y ◦ t) ≡
∞
2−i (1 ∧ d(xt (i),yt (i) ))
i=0
for arbitrary x,y ∈ S Q . Define the function U : Q × S Q → S by U (r,x) ≡ xr for each (r,x) ∈ Q×S Q . The function U is called the coordinate function of Q×S Q . Note that the function t ∗ : (S ∞,d ∞ ) → (S Q,d Q ), defined by t ∗ (x1,x2, . . .) ≡ (xt (1),xt (2), . . .) for each (x1,x2, . . .) ∈ S ∞ , is an isometry. Note that d Q ≤ 1 and that the path space (S Q,d Q ) is complete. (S Q,d Q ) need not be locally compact. However, (S Q,d Q ) is compact if (S,d) is compact. In keeping with the terminology used in Definition 5.2.1, we will let J(S Q,d Q ) denote the set of distributions on the complete path space (S Q,d Q ). Theorem 6.3.2. Compact Daniell–Kolmogorov Extension. Suppose the metric space (S,d) is compact. Then there exists a function (Q,S) → J(S Q,d Q ) DK : F (Q,S), the distribution E ≡ such that for each consistent family of f.j.d.’s F ∈ F DK (F ) on the path space satisfies the following two conditions: (i) The coordinate function U : Q × (S Q,L,E) → (S,d) is an r.f., where (S Q,L,E) is the probability space that is the completion of (S Q,C(S Q,d Q ),E), and (ii) the r.f. U has marginal distributions given by the family F . The function DK will be called the Compact Daniell–Kolmogorov Extension. Proof. Without loss of generality, assume that Q ≡ {t1,t2, . . .} ≡ {1,2, . . .}. Then (S Q,d Q ) = (S ∞,d ∞ ). Since (S,d) is compact by hypothesis, its countable power (S Q,d Q ) = (S ∞,d ∞ ) is also compact. Hence Cub (S Q,d Q ) = C(S Q,d Q ). (Q,S). Let 1. Consider each F ≡ {Fs(1),...,s(m) : m ≥ 1;s1, . . . ,sm ∈ Q} ∈ F ∞ ∞ f ∈ C(S ,d ) be arbitrary, with a modulus of continuity δf . For each n ≥ 1, define the function fn ∈ C(S n,d n ) by
Random Field and Stochastic Process fn (x1,x2, . . . ,xn ) ≡ f (x1,x2, . . . ,xn,x◦,x◦, . . .)
243 (6.3.1)
for each (x1,x2, . . . ,xn ) ∈ S n . Let ε > 0. Consider each m ≥ n ≥ 1 so large that 2−n < δf (ε). Define the function fn,m ∈ C(S m,d m ) by fn,m (x1,x2, . . . ,xm ) ≡ fn (x1,x2, . . . ,xn ) ≡ f (x1,x2, . . . ,xn,x◦,x◦, . . .) (6.3.2) for each (x1,x2, . . . ,xm ) ∈ S m . Consider the initial-section subsequence i ≡ (i1, . . . ,in ) ≡ (1, . . . ,n) of the sequence (1, . . . ,m). Let i ∗ : S m → S n be the dual function of the sequence i, as in Definition 6.2.1. Then for each (x1, . . . ,xm ) ∈ S m , we have fn,m (x1,x2, . . . ,xm ) ≡ fn (x1,x2, . . . ,xn ) = fn (xi(1),xi(2), . . . ,xi(n) ) ≡ fn ◦ i ∗ (x1,x2, . . . ,xm ). In short, fn,m = fn ◦ i ∗, whence, by the consistency of the family F of f.j.d.’s, we obtain F1,...,m fn,m = F1,...,m ( fn ◦ i ∗ ) = Fi(1),...,i(n) fn = F1,...,n fn .
(6.3.3)
At the same time, d ∞ ((x1,x2, . . . ,xm,x◦,x◦, . . .),(x1,x2, . . . ,xn,x◦,x◦, . . .)) ≡
n
2−i (1 ∧ d((xi ,xi )) +
i=1
2−i (1 ∧ d((xi ,x◦ ))
i=n+1 ∞
+
m
2−i (1 ∧ d((x◦,x◦ ))
i=m+1
=0+
m
2−i (1 ∧ d((xi ,x◦ )) + 0 ≤ 2−n < δf (ε)
i=n+1
for each (x1,x2, . . . ,xm ) ∈ S m . Hence | fm (x1,x2, . . . ,xm ) − fn,m (x1,x2, . . . ,xm )| ≡ | f (x1,x2, . . . ,xm,x◦,x◦, . . .) − f (x1,x2, . . . ,xn,x◦,x◦, . . .)| < ε for each (x1,x2, . . . ,xm ) ∈ S m . Consequently, |F1,...,m fm − F1,...,m fn,m | ≤ ε. Combined with equality 6.3.3, this yields |F1,...,m fm − F1,...,n fn, | ≤ ε,
(6.3.4)
where m ≥ n ≥ 1 are arbitrary with 2−n < δf (ε). Thus we see that the sequence (F1,...,n fn, )n=1,2,... of real numbers is Cauchy and has a limit. Define Ef ≡ lim F1,...,n fn, . n→∞
(6.3.5)
244
Stochastic Process
2. Letting m → ∞ in inequality 6.3.4, we obtain |Ef − F1,...,n fn, | ≤ ε,
(6.3.6)
where n ≥ 1 is arbitrary with 2−n < δf (ε). 3. Proceed to prove that E is an integration on the compact metric space (S Q,d Q ), in the sense of Definition 4.2.1. We will first verify that the function E is linear. To that end, let f ,g ∈ C(S ∞,d ∞ ) and a,b ∈ R be arbitrary. For each n ≥ 1, define the function fn relative to f as in equality 6.3.1. Similarly, define the functions gn,(af + bg)n relative to the functions g,af + bg, respectively, for each n ≥ 1. Then the defining equality 6.3.1 implies that (af + bg)n = afn + bgn for each n ≥ 1. Hence E(af + bg) ≡ lim F1,...,n (af + bg)n = lim F1,...,n (afn + bgn ) n→∞
n→∞
= a lim F1,...,n fn + b lim F1,...,n gn n→∞
n→∞
≡ aEf + bEg. Thus E is a linear function. Moreover, in the special case where f ≡ 1, we have E1 ≡ Ef ≡ lim F1,...,n fn = lim F1,...,n 1 = 1 > 0. n→∞
n→∞
(6.3.7)
Inequality 6.3.7 shows that the triple (S Q,C(S Q,d Q ),E) satisfies Condition (i) of Definition 4.2.1. It remains to verify Condition (ii), the positivity condition, of Definition 4.2.1. To that end, let f ∈ C(S ∞,d ∞ ) be arbitrary with Ef > 0. Then by the defining equality 6.3.5, we have F1,...,n fn > 0 for some n ≥ 1. Hence, since F1,...,n is a distribution, there exists (x1,x2, . . . ,xn ) ∈ S n such that fn, (x1,x2, . . . ,xn ) > 0. Therefore f (x1,x2, . . . ,xn,x◦,x◦, . . .) ≡ fn, (x1,x2, . . . ,xn ) > 0. Thus the positivity condition of Definition 4.2.1 is also verified. Accordingly, E is an integration on the compact metric space (S Q,d Q ). 4. Since the compact metric space (S Q,d Q ) is bounded, and since E1 = 1, Assertion 2 of Lemma 5.2.2 implies that the integration E is a distribution on (S Q,d Q ), and that the completion (S Q,L,E) of the integration space (S Q,(S Q, d Q ),E) is a probability space. In symbols, E ∈ J(S Q,d Q ). Define DK (F ) ≡ E. (Q,S) → J(S Q,d Q ). Thus we have constructed a function DK : F 5. It remains to show that the coordinate function U : Q × (S Q,L,E) → S is an r.f. with marginal distributions given by the family F . To that end, let n ≥ 1 be arbitrary. Let ε ∈ (0,1) be arbitrary. By Definition 6.3.1, we have Un (x) ≡ U (n,x) ≡ xn for each x ≡ (x1,x2, . . .) ∈ S ∞ . Hence, for each x ≡ (x1,x2, . . .),y ≡ (y1,y2, . . .) ∈ S ∞ such that d ∞ (x,y) < 2−n ε, we have
Random Field and Stochastic Process 1 ∧ d(Un (x),Un (y)) = 1 ∧ d(xn,yn ) ≤ 2n
∞
245
2−i (1 ∧ di (xi ,yi ))
i=1 n ∞
n −n
≡ 2 d (x,y) < 2 2
ε = ε.
Since ε < 1, it follows that d(Un (x),Un (y)) < ε. We conclude that Un ∈ C(S Q,d Q ) ⊂ L, where n ∈ Q is arbitrary. Thus U : Q × (S Q,L,E) → S is an r.f. 6. To prove that the r.f. U has marginal distributions given by the family F , let n ≥ 1 and let g ∈ C(S n,d n ) be arbitrary. Define the function f ∈ C(S ∞,d ∞ ) by f (x1,x2, . . .) ≡ g(x1, . . . ,xn )
(6.3.8)
for each x ≡ (x1,x2, . . .) ∈ S ∞ . Let m ≥ n be arbitrary. As in Step 1, define the function fm ∈ C(S m,d m ) by fm (x1,x2, . . . ,xm ) ≡ f (x1,x2, . . . ,xm,x◦,x◦, . . .) ≡ g(x1, . . . ,xn ),
(6.3.9)
for each (x1,x2, . . . ,xm ) ∈ S m . Then fm (x1,x2, . . . ,xm ) = g(x1, . . . ,xn ) = fn (x1,x2, . . . ,xn ) = fn ◦ i ∗ (x1,x2, . . . ,xm ) for each (x1,x2, . . . ,xm ) ∈ S m , where i ≡ (i1, . . . ,in ) ≡ (1, . . . ,n) is the initialsection subsequence of the sequence (1, . . . ,m), and where i ∗ : S m → S n is the dual function of the sequence i. In short, fm = fn ◦ i ∗ . At the same time, g(U1, . . . ,Un )(x) ≡ g(U1 (x), . . . ,Un (x)) = g(x1, . . . ,xn ) = f (x) for each x ≡ (x1,x2, . . .) ∈ S ∞ . Hence g(U1, . . . ,Un ) = f ∈ C(S ∞,d ∞ ).
(6.3.10)
Combining, Eg(U1, . . . ,Un ) = Ef ≡ lim F1,...,m fm m→∞
= lim F1,...,m fn ◦ i ∗ = F1,...,n fn = F1,...,n g, m→∞
(6.3.11)
where n ≥ 1 and g ∈ C(S n,d n ) are arbitrary, and where the fourth equality is by the consistency of the family F of f.j.d.’s. Thus the r.f. U has marginal distributions given by the family F . The theorem is proved. Proceed to prove the metrical continuity of the Compact Daniell–Kolmogorov Extension. First, we will specify the metrics. Definition 6.3.3. Specification of the binary approximation of the compact path space, and the distribution metric on the set of distributions on said path space. Suppose the state space (S,d) is compact. Recall that Q ≡ {t1,t2, . . .} is an enumerated countable parameter set. Let ξ ≡ (Ak )k=1,2, be an arbitrary binary
246
Stochastic Process
approximation of (S,d) relative to the reference point x◦ . Since the metric space (S,d) is compact, the countable power ξ ∞ ≡ (Bk )k=1,2, of ξ is defined and is a binary approximation of (S ∞,d ∞ ), according to Definition 3.2.6. For each k ≥ 1, define k ≡ {x ∈ S Q : x ◦ t ∈ Bk } ≡ (t ∗ )−1 Bk . B Because the function t ∗ : (S ∞,d ∞ ) → (S Q,d Q ) is an isometry, the sequence k )k=1,2, is a binary approximation of (S Q,d Q ). ξ Q ≡ (B Moreover, since (S Q,d Q ) is compact, and therefore locally compact and complete, the set J(S Q,d Q ) of distributions on (S Q,d Q ) is defined and is equipped with the distribution metric ρDist,ξ Q relative to ξ Q in Definition 5.3.4. Recall that sequential convergence relative to the metric ρDist,ξ Q is equivalent to weak convergence. Theorem 6.3.4. Continuity of the Compact Daniell–Kolmogorov Extension. Suppose the state space (S,d) is compact. Recall that Q ≡ {t1,t2, . . .}is an enumerated countable parameter set. Let ξ ≡ (Ak )k=1,2, be an arbitrary binary approximation of (S,d) relative to the reference point x◦ . Let the correk )k=1,2, , J(S Q,d Q ), and ρDist,ξ Q be as in sponding objects (S Q,d Q ),ξ Q ≡ (B Definition 6.3.3. (Q,S) of consistent families of f.j.d.’s is equipped with the Recall that the set F marginal metric ρ Marg,ξ,Q relative to ξ , as in Definition 6.2.8. Then the Compact Daniell–Kolmogorov Extension (Q,S), DK : (F ρMarg,ξ,Q ) → (J(S Q,d Q ),ρDist,ξ Q ) constructed in Theorem 6.3.2 is uniformly continuous, with a modulus of continuity δ DK (·, ξ ) dependent only on the modulus of local compactness
ξ ≡ (|Ak |)k=1,2,... of the compact metric space (S,d). Proof. Without loss of generality, assume that d ≤ 1, and that Q ≡ {t1,t2, . . .} = k )k=1,2,... are equal to (S ∞,d ∞ ) {1,2, . . .}. Then the objects (S Q,d Q ) and ξ Q ≡ (B ∞ and ξ ≡ (Bk )k=1,2,... . 1. Let ε ∈ (0,1) be arbitrary. As an abbreviation, write c ≡ 24 ε−1 and α ≡ 2−1 ε. Let m ≡ [log2 2ε−1 ]1 . Define the operation δc by δc (ε ) ≡ c−1 ε
(6.3.12)
for each ε > 0. Fix n ≥ 1 so large that 2−n < 3−1 c−1 α ≡ δc (3−1 α). By the definition of the operation [·]1 , we have 2ε−1 < 2m < 2ε−1 · 22 ≡ 2−1 c. Hence 2m+1 < c
Random Field and Stochastic Process
247
and 2−m < 2−1 ε. (Q,S) be arbitrary, with F ≡ {Fs(1),...,s(m) : m ≥ 1;s1, . . . ,sm ∈ 2. Let F,F ∈ F Q} and F ≡ {Fs(1),...,s(m) : m ≥ 1;s1, . . . ,sm ∈ Q}. Consider the distributions ∈ F ’. Since, by assumption, d ≤ 1, we have d n ≤ 1 as F1,...,n ∈ F and F1,...,n on (S n,d n ) have modulus of tightness well. Hence, the distributions F1,...,n,F1,...,n n equal to β ≡ 1. Let ξ be the nth power of ξ , as in Definition 3.2.4. Thus ξ n is a binary approximation for (S n,d n ). Recall the distribution metric ρDist,ξ n relative to ξ n on the set of distributions on (S n,d n ), as introduced in Definition 5.3.4. Then Assertion 1 of Proposition 5.3.11 applies to the compact metric space (S n,d n ) and the distribution metric ρDist,ξ n , to yield ≡ (3−1 α,δc,1, ξ n ) > 0 (6.3.13) such that, if , ρDist,ξ n (F1,...,n,F1,...,n ) 0. δ DK (ε) ≡ δ DK (ε, ξ ) ≡ 2−n
(6.3.15)
We will prove that the operation δ DK is a modulus of continuity of the Compact Daniell–Kolmogorov Extension DK . 3. Suppose, for that purpose, that ρ Marg,ξ,Q (F,F ) ≡
∞
2−k ρDist,ξ k (F1,...,k ,F1,...,k ) < δ DK (ε).
(6.3.16)
k=1
We need to show that ρDist,ξ ∞ (E,E ) < ε, where E ≡ DK (F ) and E ≡ DK (F ). 4. To that end, let π ≡ ({gk,x : x ∈ Bk })k=1,2,... be the partition of unity of the compact metric space (S ∞,d ∞ ) determined by its binary approximation ξ ∞ ≡ (Bk )k=1,2, , in the sense of Definition 3.3.4. Thus the family {gk,x : x ∈ Bk } of basis functions is the 2−k -partition of unity of (S ∞,d ∞ )
248
Stochastic Process
determined by the enumerated finite subset Bk , for each k ≥ 1. According to Definition 5.3.4, we have ρDist,ξ ∞ (E,E ) ≡
∞
2−k |Bk |−1
k=1
|Egk,x − E gk,x |.
(6.3.17)
x∈B(k)
5. The assumed inequality 6.3.16 yields . ρDist,ξ n (F1,...,n,F1,...,n ) < 2n δ DK (ε) ≡
(6.3.18)
Consider each k = 1, . . . ,m. Let x ∈ Bk be arbitrary. Proposition 3.3.3 says that the basis function gk,x has values in [0,1] and has Lipschitz constant 2k+1 on (S ∞,d ∞ ), where 2k+1 ≤ 2m+1 < c. Hence the function gk,x has Lipschitz constant c and, equivalently, has modulus of continuity δc . Now define the function gk,x,n ∈ C(S n,d n ) by gk,x,n (y1, . . . ,yn ) ≡ gk,x (y1, . . . ,yn,x◦,x◦, . . .) for each (y1, . . . ,yn ) ∈ S n . Then, for each (z1,z2, . . . ,zn ),(y1,y2, . . . ,yn ) ∈ (S n,d n ), we have |gk,x,n (z1,z2, . . . ,zn ) − gk,x,n (y1,y2, . . . ,yn )| ≡ |gk,x (z1,z2, . . . ,zn,x◦,x◦, . . .) − gk,x (y1,y2, . . . ,yn,x◦,x◦, . . .)| ≤ cd ∞ ((z1,z2, . . . ,zn,x◦,x◦, . . .),(y1,y2, . . . ,yn,x◦,x◦, . . .)) =c
n
k=1
2−k d(zk ,yk ) ≤ c
n
d(zk ,yk )
k=1
≡ cd n ((z1,z2, . . . ,zn ),(y1,y2, . . . ,yn )). Thus the function gk,x,n also has Lipschitz constant c and, equivalently, has modulus of continuity δc . In addition, |gk,x | ≤ 1, whence |gk,x,n | ≤ 1. In view of inequality 6.3.18, inequality 6.3.14 therefore holds for gk,x,n , to yield gk,x,n | < 3−1 α. |F1,...,n gk,x,n − F1,...,n
At the same time, since 2−n < δc (3−1 α), where δc is a modulus of continuity of the function gk,x,n , inequality 6.3.6 in the proof of Theorem 6.3.2 applies to the functions gk,x ,gk,x,n in the place of f ,fn , and to the constant 3−1 α in the place of ε. This yields |Egk,x − F1,...,n gk,x,n | ≤ 3−1 α,
(6.3.19)
with a similar inequality when E,F are replaced by E ,F , respectively. The triangle inequality therefore leads to gk,x,n | + 3−1 α + 3−1 α |Egk,x − E gk,x | ≤ |F1,...,n gk,x,n − F1,...,n
< 3−1 α + 2 · 3−1 α = α,
(6.3.20)
Random Field and Stochastic Process
249
where k = 1, . . . ,m and x ∈ Bk are arbitrary. It follows that ρDist,ξ ∞ ( DK (F ), DK (F )) ≡ ρDist,ξ ∞ (E,E ) ≡
∞
2−k |Bk |−1
k=1
≤
m
|Egk,x − E gk,x |
x∈B(k)
2−k α +
k=1
∞
2−k < α + 2−m < α + 2−1 ε ≡ ε,
k=m+1
(Q,S) F,F ∈ F
are arbitrary, provided that ρ Marg,ξ,Q (F,F ) < δ DK (ε, where
ξ ). Since ε > 0 is arbitrary, we conclude that the Compact Daniell–Kolmogorov Extension (Q,S), DK : (F ρMarg,ξ,Q ) → (J(S Q,d Q ),ρDist,ξ Q ) (Q,S), with modulus of continuity δ DK (·, ξ ). The is uniformly continuous on F theorem is proved. To generalize Theorems 6.3.2 and 6.3.4 to a locally compact, but not necessarily compact, state space (S,d), we (i) map each consistent family of f.j.d.’s on the latter to a corresponding family on the one-point compactification (S,d), whose f.j.d.’s assign probability 1 to powers of S; (ii) apply Theorems 6.3.2 and 6.3.4 to the Q Q compact state space (S,d), resulting in a distribution on the path space (S d ); and (iii) prove that this latter distribution assigns probability 1 to the path subspace S Q and is a distribution on the complete metric space (S Q,d Q ). The remainder of this section makes this precise. First we define the modulus of pointwise tightness for an arbitrary consistent family of f.j.d.’s. Definition 6.3.5. Modulus of pointwise tightness. Let (S,d) be a locally compact metric space with the reference point x◦ . For each n ≥ 1, define the function hn ≡ 1 ∧ (1 + n − d(·,x◦ ))+ ∈ C(S,d). Let Q be an arbitrary set. Recall that (Q,S) is the set of consistent families of f.j.d.’s with parameter set Q and state F (Q,S) be arbitrary. Let t ∈ Q be arbitrary, and consider space S. Let F ∈ F the distribution Ft on (S,d). According to Definition 5.2.1, we have Ft hn ↑ 1 as n → ∞. Hence, for each ε > 0, there exists some β(ε,t) > 0 so large that Ft (1 − hn ) < ε for each n ≥ β(ε,t). Note that β(·,t) is equivalent to a modulus of tightness of the distribution Ft , in the sense of Definition 5.3.6. We will call such an operation β : (0,∞) × Q → (0,∞) a modulus of (Q,S) β (Q,S) as the subset of F pointwise tightness of the family F . Define F whose members share a common modulus of pointwise tightness β. Lemma 6.3.6. Mapping each consistent family of f.j.d.’s with a locally compact state space S to a consistent family of f.j.d.’s with the one-point compactification S as state space. Suppose (S,d) is locally compact, not necessarily
250
Stochastic Process
compact, with a one-point compactification (S,d). Recall that Q ≡ {t1,t2, . . .} is an enumerated countably infinite parameter set. Define the function (Q,S) → F (Q,S) ψ :F
(6.3.21)
(Q,S) be arbias follows. Let F ≡ {Fs(1),...,s(m) : m ≥ 1;s1, . . . ,sm ∈ Q} ∈ F m m trary. For each m ≥ 1 and f ∈ C(S ,d ), define F s(1),...,s(m) f ≡ Fs(1),...,s(m) (f |S m ).
(6.3.22)
(Q,S). Define ψ(F ) ≡ F . Then F ≡ {F s(1),...,s(m) : m ≥ 1;s1, . . . ,sm ∈ Q} ∈ F Hence the function ψ is well defined. Proof. Assume, without loss of generality, that Q ≡ {1,2, . . .}. (Q,S). Let 1. Consider each F ≡ {Fs(1),...,s(m) : m ≥ 1;s1, . . . ,sm ∈ Q} ∈ F m m m m ≥ 1 be arbitrary. Let f ∈ C(S ,d ) be arbitrary. Then f |S ∈ Cub (S m,d m ) by Assertion 2 of Proposition 3.4.5. Hence f |S m is integrable relative to the distribution Fs(1),...,s(m) on (S m,d m ). Therefore equality 6.3.22 makes sense. Since Fs(1),...,s(m) is a distribution, the right-hand side of equality 6.3.22 is a linear m m function of f . Hence F s(1),...,s(m) is a linear function on C(S ,d ). Suppose F s(1),...,s(m) f > 0. Then Fs(1),...,s(m) (f |S m ) > 0. Again, since Fs(1),...,s(m) is a distribution, it follows that there exists x ∈ S m such f (x) = (f |S m )(x) > 0. Thus m m F s(1),...,s(m) is an integration on the compact metric space (S ,d ). Moreover, F s(1),...,s(m) 1 ≡ Fs(1),...,s(m) (1) = 1. Therefore F s(1),...,s(m) is a distribution. 2. Next, we need to verify that the family F ≡ {F s(1),...,s(m) : m ≥ 1;s1, . . . ,sm ∈ Q} is consistent. To that end, let n,m ≥ 1 be arbitrary. Let s ≡ (s1, . . . ,sm ) be an arbitrary sequence in Q, and let i ≡ (i1, . . . ,in ) be an arbitrary sequence in m n {1, . . . ,m}. Define the dual function i ∗ : S → S by i ∗ (x1, . . . ,xm ) ≡ (xs(i(1)), . . . ,xs(i(n)) )
(6.3.23)
m
for each (x1, . . . ,xm ) ∈ S . Define j ≡ (j1, . . . ,jn ) ≡ (i1, . . . ,in ) and define the dual function j ∗ : S m → S n by j ∗ (x1, . . . ,xm ) ≡ (xs(j (1)), . . . ,xs(j (n)) ) ≡ (xs(i(1)), . . . ,xs(i(n)) ) n
(6.3.24)
n
for each (x1, . . . ,xm ) ∈ S m . Consider each f ∈ C(S ,d ). Then, for each (x1, . . . ,xm ) ∈ S m , we have ((f ◦ i ∗ )|S m )(x1, . . . ,xm ) = (f ◦ i ∗ )(x1, . . . ,xm ) = f (xs(i(1)), . . . ,xs(i(n)) ) = (f |S n )(xs(i(1)), . . . ,xs(i(n)) ) = (f |S n )(xs(j (1)), . . . ,xs(j (n)) ) = (f |S m ) ◦ j ∗ (x1, . . . ,xm ).
Random Field and Stochastic Process
251
In short, (f ◦ i ∗ )|S m = (f |S m ) ◦ j ∗ . It follows that F s(1),...,s(m) (f ◦ i ∗ ) ≡ Fs(1),...,s(m) ((f ◦ i ∗ )|S m ) = Fs(1),...,s(m) ((f |S m ) ◦ j ∗ ) = Fs(j (1)),...,s(j (n)) (f |S m ) = Fs(i(1)),...,s(i(n)) (f |S m ) ≡ F s(i(1)),...,s(i(n)) f ,
(6.3.25)
where the first and last equalities are by the defining equality 6.3.22, and where (Q,S). Equality the third equality is thanks to the consistency of the family F ∈ F (Q,S). Thus 6.3.25 shows that the family F ≡ ψ(F ) is consistent. In short, F ∈ F (Q,S) → F (Q,S) is well defined. The lemma is proved. the function ψ : F We are now ready to construct, and prove the metrical continuity of, the Daniell– Kolmogorov Extension, where the state space is required only to be locally compact. Theorem 6.3.7. Construction, and metrical continuity, of the Daniell– Kolmogorov Extension, with locally compact state space. Q ≡ {t1,t2, . . .} will denote an enumerated countably infinite parameter set, enumerated by the bijection t : {1,2, . . .} → Q. Suppose (S,d) is a locally compact metric space, not necessarily compact, with a one-point compactification (S,d), relative to some arbitrary but fixed binary approximation ξ , and with the point at infinity. Recall that J(S Q,d Q ) denotes the set of distributions on the complete metric space (S Q,d Q ). Then the following conditions hold: 1. There exists a function (Q,S) → J(S Q,d Q ) DK : F (Q,S), the coordinate function U : Q × (S Q,L,E) → such that for each F ∈ F (S,d) is an r.f. with marginal distributions given by the family F , where E ≡ DK (F ), and where (S Q,L,E) is the probability space that is the completion of the integration space (S Q,Cub (S Q,d Q ),E). The function DK is called the Daniell–Kolmogorov Extension. (Q,S) whose members share a common β (Q,S) be an arbitrary subset F 2. Let F modulus of pointwise tightness β. Define β (Q,S)) ⊂ J(S Q,d Q ). Jβ (S Q,d Q ) ≡ DK (F Then the Daniell–Kolmogorov Extension
β (Q,S), DK : (F ρMarg,ξ,Q ) → Jβ (S Q,d Q ),ρDist,ξ Q,∗
(6.3.26)
(Q,S), as is uniformly continuous relative to the marginal metric ρMarg,ξ,Q on F defined in Definition 6.2.8, and relative to the metric ρDist,ξ Q,∗ on Jβ (S Q,d Q ) to be defined in the following proof. Proof. Assume, without loss of generality, that Q ≡ {1,2, . . .}.
252
Stochastic Process
1. Consider the function (Q,S) → F (Q,S) ψ :F
(6.3.27)
defined in Lemma 6.3.6. Separately, by applying Theorem 6.3.4, to the compact metric space (S,d), we have the uniformly continuous Compact Daniell– Kolmogorov Extension (Q,S), DK : (F ρMarg,ξ,Q ) → (J(S ,d ),ρDist,ξ Q ). Q
Q
(6.3.28)
Define Q Q (Q,S))). JDK (S ,d ) ≡ DK (ψ(F Q Q (Q,S) such that 2. Let E ∈ JDK (S ,d ) be arbitrary. Then there exists F ∈ F (Q,S). Theorem 6.3.2, when applied to the E = DK (F ), where F ≡ ψ(F ) ∈ F compact metric space (S,d) and the consistent family F of f.j.d.’s with state space (S,d), says that the coordinate function Q
U : Q × (S ,L,E) → (S,d) Q
is an r.f. with marginal distributions given by the family F , where (S ,L,E) is Q Q Q the completion of (S ,C(S ,d ),E). In the next several steps, we will prove, Q successively, that (i) S Q is a full set in (S ,L,E), (ii) Cub (S Q,d Q ) ⊂ L, and (iii) the restricted function E ≡ E|Cub (S Q,d Q ) is a distribution on (S Q,d Q ). Note Q that each function f ∈ Cub (S Q,d Q ) can be regarded as a function on S with Q domain( f ) ≡ S Q ⊂ S . 3. To proceed, let m ≥ 1 be arbitrary. Define hm ≡ 1 ∧ (1 + m − d(·,x◦ ))+ ∈ C(S,d). Note that |hm (z) − hm (z)| ≤ d(x,y) for each z,y ∈ S. Separately, by Proposition 3.4.5, we have C(S,d) ⊂ C(S,d)|S. Hence there exists hm ∈ C(S,d) such that hm = hm |S. 4. Now let n ≥ 1 be arbitrary. Consider the r.r.v. hm (U n ), with values in [0,1]. Then Ehm (U n ) = F n hm ≡ Fn (hm |S) = Fn hm ↑ 1
(6.3.29)
as m → ∞, where the first equality is because the r.f. U has marginal distributions given by the family F , where the second equality is by the defining equality 6.3.22 in Lemma 6.3.6, and where the convergence is because Fn is a distribution on the locally compact metric space (S,d). The Monotone Convergence Theorem there∞ fore implies that the limit Yn ≡ limm→∞ hm (U n ) is integrable on (S ,L,E), with values in [0,1] and with integral EYn = 1. It follows that P (Yn = 1) = 1. Here P denotes the probability function associated with the probability integration E.
Random Field and Stochastic Process
253
5. With n ≥ 1 be arbitrary, but fixed until further notice, consider each x ≡ (x1,x2, . . .) ∈ (Yn = 1). Then lim hm (xn ) = lim hm (U n (x)) ≡ Yn (x) = 1.
m→∞
m→∞
(6.3.30)
Let α ∈ (0,1) be arbitrary. Then there exists m ≥ 1 such that hm (xn ) > α. Note that the bounded subset (d(·,x◦ ) ≤ 1 + m) of (S,d) is contained in some compact subset Km ⊂ S. By Assertion 1 of Proposition 3.4.5, the set Km is also S ≡ S ∪ {} is dense in (S,d), and a compact subset of (S,d). Now, because S ≡ S ∪ {} such because hm ∈ C(S,d), there exists a sequence (yk )k=1,2,... in that d(yk ,xn ) → 0 as k → ∞, and such that |hm (yk ) − hm (xn )| < α for each k ≥ 1. Then for each k ≥ 1, we have hm (yk ) > hm (xn ) − α > 0, whence yk , yk ∈ S and hm (yk ) = hm (yk ) > 0. It follows that yk ∈ (d(,x◦ ) ≤ 1 + m) ⊂ Km . Since d(yk ,xn ) → 0 as k → ∞, and since Km is a compact subset of (S,d), we infer that xn ∈ Km ⊂ S. Summing up, ∞
(Yn = 1) ⊂ {x ≡ (x1,x2, . . .) ∈ S ;xn ∈ S}, where n ≥ 1 is arbitrary. 6. It follows that ∞ n=1
(Yn = 1) ⊂
∞
{(x1,x2, . . .) ∈ S
∞
: xn ∈ S} = S ∞ .
n=1
S∞
Hence contains the full set that is the intersection of the sequence of full ∞ subsets of (S ,L,E) on the left-hand side. Thus S ∞ is itself a full subset of ∞ (S ,L,E). This proves the desired Condition (i) in Step 2. 7. Proceed to verify Condition (ii). To that end, consider the coordinate function U : Q × S∞ → S as in Definition 6.3.1. First, let f ∈ C(S n,d n ) be arbitrary. By Proposition 3.4.5, we have n n n n C(S n,d n ) ⊂ C(S ,d )|S n . Hence there exists f ∈ C(S ,d ) such that f = f |S n . ∞ Because U is an r.f. with sample space (S ,L,E) and with values in ∞ (S,d), the function f (U 1, . . . ,U n ) on (S ,L,E) is an r.r.v. Consequently, f (U 1, . . . ,U n ) ∈ L because f is bounded. At the same time, we have ∞ f (U1, . . . ,Un ) = f (U 1, . . . ,U n )1S ∞ on the full subset S ∞ of (S ,L,E). It follows that f (U1, . . . ,Un ) = f (U 1, . . . ,U n )1S ∞ ∈ L
(6.3.31)
and that Ef (U1, . . . ,Un ) = Ef (U 1, . . . ,U n ) = F 1,...,n f ≡ F1,...,n (f |S n ) = F1,...,n ( f ),
(6.3.32)
254
Stochastic Process
where f ∈ C(S n,d n ) is arbitrary, where the first equality is from equality 6.3.31, Q where the second equality is because the r.f. U : Q × (S ,L,E) → (S,d) has marginal distributions given by the family F, and where the third equality is by the defining equality 6.3.22 in Lemma 6.3.6. 8. Next, let f ∈ Cub (S n,d n ) be arbitrary, with 0 ≤ f ≤ b for some b > 0. Let m ≥ 1 be arbitrary. From Step 3, we have the function hm ≡ 1 ∧ (1 + m − d(·,x◦ ))+ ∈ C(S,d). Define the function by gm on S n by gm (x1, . . . ,xn ) ≡ hm (x1 ) . . . hm (xn ) for each (x1, . . . ,xn ) ∈ S n . Then gm ≡ hm ⊗ · · · ⊗ hm ∈ C(S n,d n ). Hence f gm ∈ C(S n,d n ). Moreover, gm (U1, . . . ,Un ) ∈ L and Egm (U1, . . . ,Un ) = F1,...,n (gm ),
(6.3.33)
according to equalities 6.3.31 and 6.3.32. Let ε > 0 be arbitrary. Because F1,...,n is a distribution on (S n,d n ), there exists m ≥ 1 so large that F1,...,n hm > 1 − ε,
(6.3.34)
where we define hm ≡ 1 ∧ (m − d n (·,(x◦, . . . ,x◦ )))+ ∈ C(S n,d n ). Let (x1, . . . ,xn ) ∈ S n be arbitrary such that hm (x1, . . . ,xn ) > 0. Then n
d(xi ,x◦ ) ≡ d n ((x1, . . . ,xn ),(x◦, . . . ,x◦ )) ≤ m,
i=1
whence gm (x1, . . . ,xn ) ≡ hm (x1 ) . . . hm (xn ) = 1. We conclude that h m ≤ gm and therefore that 1 ≥ F1,...,n gm ≥ F1,...,n hm > 1 − ε,
(6.3.35)
where the last inequality is from inequality 6.3.34. Hence, using the monotonicity of the sequence (gm )m=1,2,... and the nonnegativity of the function f , we obtain F1,...,n |f gm − f gm | ≤ bF1,...,n (gm − gm ) < bε,
(6.3.36)
for each m ≥ m. Thus F1,...,n f gm converges as m → ∞. It follows from the Monotone Convergence Theorem that the function f = limm→∞ F1,...,n f gm is integrable relative to F1,...,n , and that F1,...,n f = limm→∞ F1,...,n f gm . Similarly, note that f gm ∈ C(S n,d n ). Hence relation 6.3.31 yields (f gm )(U1, . . . ,Un ) ∈ L. At the same time, E(1 − gm (U1, . . . ,Un )) = 1 − F1,...,n (gm ) < ε,
Random Field and Stochastic Process
255
according to equality 6.3.33. Consequently, E|(fgm )(U1, . . . ,Un ) − (f gm )(U1, . . . ,Un )| ≤ bE|gm (U1, . . . ,Un ) − gm (U1, . . . ,Un )| < bε,
(6.3.37)
m
≥ m. Hence E(f gm )(U1, . . . ,Un ) converges as m → ∞. It follows for each from the Monotone Convergence Theorem that f (U1, . . . ,Un ) = lim (f gm )(U1, . . . ,Un ) ∈ L m→∞
(6.3.38)
and that Ef (U1, . . . ,Un ) = lim E(f gm )(U1, . . . ,Un ) m→∞
= lim F1,...,n (f gm ) = F1,...,n ( f ), m→∞
(6.3.39)
where the second equality is thanks to equality 6.3.32, and where f ∈ Cub (S n,d n ) is arbitrary with f ≥ 0. By linearity, relation 6.3.38 and equality 6.3.39 hold for each f ∈ Cub (S n,d n ). 9. Now let j ∈ Cub (S ∞,d ∞ ) be arbitrary, with a modulus of continuity δj and with |j | ≤ b for some b > 0. We will prove that j ∈ L. To that end, let ε > 0 be arbitrary. Let n ≥ 1 be arbitrary with 2−n < δj (ε). Define jn ∈ Cub (S n,d n ) by jn (x1,x2, . . . ,xn ) ≡ j (x1,x2, . . . ,xn,x◦,x◦, . . .)
(6.3.40)
for each (x1,x2, . . . ,xn ) ∈ S n . Then d ∞ ((x1,x2, . . . ,xn,x◦,x◦, . . .),(x1,x2, . . .)) ≤
∞
2−i = 2−n < δj (ε).
i=n+1
for each x = (x1,x2, . . .) ∈
S∞.
Hence
|jn (U1 (x), . . . ,Un (x)) − j (x1,x2, . . .)| = |jn (x1,x2, . . . ,xn ) − j (x1,x2, . . .)| = |j (x1,x2, . . . ,xn,x◦,x◦, . . .) − j (x1,x2, . . .)| < ε. for each x = (x1,x2, . . .) ∈ S ∞ . In other words,
on the full set
S∞,
|jn (U1, . . . ,Un ) − j | < ε
(6.3.41)
E|jn (U1, . . . ,Un ) − j | ≤ ε,
(6.3.42)
whence
where n ≥ 1 is arbitrary with 2−n < δj (ε). Inequality 6.3.41 trivially implies that Q
jn (U1, . . . ,Un ) → j in probability on (S ,L,E), as n → ∞. At the same time, jn ∈ Cub (S n,d n ), whence jn (U1, . . . ,Un ) ∈ L according to relation 6.3.38. The Dominated Convergence Theorem therefore implies that j ∈ L and that Ej = lim Ejn (U1, . . . ,Un ), n→∞
(6.3.43)
256
Stochastic Process
where j ∈ Cub (S ∞,d ∞ ) is arbitrary. We conclude that Cub (S ∞,d ∞ ) ⊂ L. This proves the desired Condition (ii) in Step 2. 10. In view of Condition (ii), we can define the function E ≡ E|Cub (S ∞,d ∞ ).
(6.3.44)
Ej ≡ Ej = lim Ejn (U1, . . . ,Un )
(6.3.45)
In other words, n→∞
for each j ∈ Cub (S ∞,d ∞ ). In particular, for each f ∈ Cub (S n,d n ),we have Ef (U1, . . . ,Un ) = Ef (U1, . . . ,Un ) = F1,...,n ( f ),
(6.3.46)
where the second equality is from equality 6.3.39. 11. Next observe that Cub (S ∞,d ∞ ) is a linear subspace of L such that if f ,g ∈ Cub (S ∞,d ∞ ), then | f |,f ∧ 1 ∈ Cub (S ∞,d ∞ ). Proposition 4.3.6 therefore implies that (S ∞,Cub (S ∞,d ∞ ),E) is an integration space. Since d ∞ ≤ 1 and E1 = 1, Assertion 2 of Lemma 5.2.2 implies that the integration E is a distribution on (S ∞,d ∞ ). This proves the desired Condition (iii) in Step 2. 12. Thus we can define a function ϕ : JDK (S ,d ) → J(S Q,d Q ) Q
Q
by ϕ(E) ≡ E ≡ E|Cub (S ∞,d ∞ ) ∈ J(S Q,d Q ) for each E ∈ JDK (S ,d ). 13. Now define the composite function Q
Q
(Q,S) → J(S Q,d Q ). DK ≡ ϕ ◦ DK ◦ ψ : F (Q,S) be arbitrary. Let E ≡ DK (F ) ∈ J(S Q,d Q ). Then E = ϕ(E), Let F ∈ F where E ≡ DK (F ) and F ≡ ψ(F ). Let (S Q,L,E) be the probability space that is the completion of the integration space (S Q,Cub (S Q,d Q ),E). Equality 6.3.46 shows that the coordinate function U : Q × (S Q,L,E) → (S,d) is an r.f. with marginal distributions given by the family F . We conclude that the function DK satisfies all the conditions of Assertion 1 of the present theorem. Thus Assertion 1 is proved. (Q,S) whose β (Q,S) be an arbitrary subset F 14. To prove Assertion 2, let F members share a common modulus of pointwise tightness β. We will first show that the restricted mapping (Q,S), β (Q,S), ρMarg,ξ,Q ) → (F ρMarg,ξ,Q ) ψ : (F
(6.3.47)
is uniformly continuous, where the marginal metrics ρ Marg,ξ,Q and ρ Marg,ξ,Q are as in Definition 6.2.8. 15. Recall from Definition 6.3.3 that ξ ≡ (An )n=1,2,... is a binary approximation of (S,d), relative to some fixed reference point x◦ . For each n ≥ 1, let
Random Field and Stochastic Process
257
An ≡ {xn,1, . . . ,xn,κ(n) } ⊂ S. Let ξ ≡ (Ap )p=1,2,... be the compactification of the binary approximation ξ . Thus there exists an increasing sequence (mn )n=1,2,... of integers such that An ≡ Am(n) ∪ {} ≡ {xm(n),1, . . . ,xm(n),κ(m(n)),}, for each n ≥ 1, where is the point at infinity of the one-point compactification (S,d) of (S,d) relative to ξ . Then ξ is a binary approximation of (S,d) relative to x◦ , according to Corollary 3.4.4. 16. Now let ν ≥ 1 be arbitrary but fixed until further notice. For each n ≥ 1, let ν An ≡ An × · · · × An ⊂ S ν . Then ξ ν ≡ (Aνn )n=1,2,... is the binary approximation ν of (S ν ,d ν ) relative to the reference point (x◦, . . . ,x◦ ) ∈ S ν . Similarly, ξ ≡ ν ν ν (An )n=1,2,... is the binary approximation of (S ,d ) relative to the fixed reference ν point (x◦, . . . ,x◦ ) ∈ S . To lessen the burden on subscripts, we will write Aνn and ν A(ν,n) interchangeably, and write An and A(ν,n) interchangeably. Let the sequence π ν ≡ ({gν,n,x : x ∈ Aνn })n=1,2,... in C(S ν ,d ν ) be the partition of unity of (S ν ,d ν ) determined by the binary approximation ξ ν , in the sense of Definition 3.3.4. Likewise, let the sequence ν
π ν ≡ ({g ν,n,x : x ∈ An })n=1,2,... ν
ν
ν
ν
in C(S ,d ) be the partition of unity of (S ,d ) determined by the binary ν approximation ξ . : 17. Let F ≡ {Fs(1),...,s(m) : m ≥ 1;s1, . . . ,sm ∈ Q}, F ≡ {Fs(1),...,s(m) β (Q,S) be arbitrary. Let F ≡ {F s(1),...,s(m) : m ≥ 1; m ≥ 1;s1, . . . ,sm ∈ Q} ∈ F
s1, . . . ,sm ∈ Q} ≡ ψ(F ), F ≡ {F s(1),...,s(m) : m ≥ 1;s1, . . . ,sm ∈ Q} ≡ ψ(F ). Then, according to Definition 5.3.4, we have ρDist,ξ ν (F1,....ν ,F1,...,ν )≡
∞
2−n |Aνn |−1
n=1
|F1,....ν gν,n,x − F1,...,ν gν,n,x |.
x∈A(ν,n)
(6.3.48) Similarly, ρDist,ξ ν (F 1,...,ν ,F 1,...,ν ) ≡
∞
2−n |An |−1 ν
n=1
=
∞
n=1
|F 1,...,ν g ν,n,x − F 1,...,ν g ν,n,x |
x∈A(ν,n)
2−n |An |−1 ν
|F1,...,ν (g ν,n,x |S ν ) − F 1,...,ν (g ν,n,x |S ν )|,
x∈A(ν,n)
(6.3.49) where the last equality is due to the defining equality 6.3.22.
258
Stochastic Process
18. Let ε > 0 be arbitrary. Take N ≥ ν so large that 2−N < ε. Suppose ) < 2−N |AνN |−1 ε. ρDist,ξ ν (F1,....ν ,F1,...,ν
Consider each n = 1, . . . ,N. Then it follows from equality 6.3.48 that gν,n,x | ≤ 2n |Aνn |ρDist,ξ ν (F1,....ν ,F1,...,ν ) |F1,....ν gν,n,x − F1,...,ν
< 2n |Aνn |2−N |AνN |−1 ε ≤ ε,
(6.3.50)
for each x ∈ Aνn . 19. Take any m ≥ νi=1 β(N −1 ε,i), where, by hypothesis, β is a modulus β (Q,S). From Step 8, the of pointwise tightness of the consistent family F ∈ F function gm ≡ hm ⊗ · · · ⊗ hm ∈ C(S n,d n ). Thus gm (x1, . . . ,xν ) ≡ hm (x1 ) . . . hm (xν ) for each (x1, . . . ,xν ) ∈ (S ν ,d ν ). Let i = 1, . . . ,ν be arbitrary. Then m ≥ β(N −1 ε,i). Hence, by Definition 6.3.5, we have 1 − Fi hm = Fi (1 − hm ) < N −1 ε. Therefore 0 ≤ 1 − F1,...,ν gm = 1 − Egm (U1, . . . ,Uν ) ≡ 1 − Ehm (U1 ) . . . hm (Uν ) = (1 − Ehm (U1 )) + E(hm (U1 ) − hm (U1 )hm (U2 )) + · · · + E(hm (U1 ) . . . hm (Uν−1 ) − hm (U1 ) . . . hm (Uν )) ≤ (1 − Ehm (U1 )) + E(1 − hm (U2 )) + · · · + E(1 − hm (Uν )) = (1 − F1 hm ) + (1 − F2 hm ) + · · · + (1 − Fν hm ) < νN −1 ε < ε,
(6.3.51)
where the first equality is because the coordinate function U : Q × (S Q,L,E) → (S,d) is an r.f. with marginal distributions given by the family F . ν Consider each x ∈ An . It follows that 0 ≤ F1,...,ν (g ν,n,x |S ν ) − F1,...,ν (g ν,n,x |S ν )gm = F1,...,ν (g ν,n,x |S ν )(1 − gm ) ≤ 1 − F1,...,ν gm < ε.
(6.3.52)
Similarly, (g ν,n,x |S ν ) − F1,...,ν (g ν,n,x |S ν )gm < ε. 0 ≤ F1,...,ν
(6.3.53)
Moreover, the function f ≡ (g ν,n,x |S ν ) ∈ Cub (S ν ,d ν ) has some modulus of continuity δν,n,x . Suppose z ≡ (x1, . . . ,xν ),y ≡ (y1, . . . ,yν ) ∈ S ν are such that > δν,n (ε) ≡ (2−1 ν −1 ε) ∧ δν,n,x (2−1 ε). d ν (z,y) < x∈A(ν,n)
Random Field and Stochastic Process
259
Then |gm (z) − gm (y)| ≡ |hm (z1 ) . . . hm (zν ) − hm (y1 ) . . . hm (yν )| ≤ |hm (z1 ) − hm (y1 )| + · · · + |hm (zν ) − hm (yν )| ≤ν
ν
d(zi ,yi ) ≡ νd ν (z,y) < ν(2−1 ν −1 ε) < 2−1 ε
i=1
and | f (z) − f (y)| < 2−1 ε. Hence |fgm (z) − f gm (y)| ≤ | f (z)gm (z) − f (y)gm (z)| + | f (y)gm (z) − f (y)gm (y)| ≤ | f (z) − f (y)| + |gm (z) − gm (y)| < 2−1 ε + 2−1 ε = ε. We conclude that the function f gm ≡ (g ν,n,x |S ν )gm has the modulus of continuity δν,n . We emphasize that the modulus δν,n is regardless of m. 20. Therefore we can take ? ν @ N −1 −1 −1 β(N ε,i) ∨ (− log2 (2 δν,k (3 ε)) . m ≡ mν ≡ ν
i=1
β(N −1 ε,i)
k=1
2−m
1
δν,n (3−1 ε). 2−1
and < Then m > i=1 β (Q,S) are such that Suppose F,F ∈ F
ρDist,ξ ν (F1,....ν ,F1,...,ν ) ≤ 2−m |Aνm |−1 ε ≡ 2−m(ν) |Aνm(ν) |−1 ε.
(6.3.54)
Note that the function hm ∈ C(S,d) has the set (d(·,x◦ ) ≤ m + 1) as support. Hence the function gm ≡ hm ⊗ · · · ⊗ hm ∈ C(S ν ,d ν ) has the set (d ν (·,(x◦, . . . ,x◦ )) ≤ m + 1) ⊂ B ≡ (d ν (·,(x◦, . . . ,x◦ )) ≤ 2m ) ⊂ S ν ν
as support. Let x ∈ Ak be arbitrary, and write f ≡ g ν,n,x |S ν . Then (i) the function δν,n (3−1 ε), where fgm ∈ C(S ν ,d ν ) has the set B as support and (ii) 2−m < 2−1 δν,n is a modulus of continuity of f gm . Hence Conditions (i) and (ii) in Assertion 2 of Proposition 3.3.6 are satisfied, where we replace the objects n, (S,d), ξ , f , δf , and π by 1, (S ν ,d ν ), ξ ν ≡ δν,n , and π ν ≡ ({gν,k,y : y ∈ Aνk })k=1,2,... , respectively. (Aνk )k=1,2,... , f gm , Accordingly,
f gm − (f gm )(y)gν,m,y ≤ ε. x∈A(ν,m) Consequently,
0 ≤ F 1,...,ν f gm − (f gm )(y)F1,...,ν gν,m,y ≤ ε x∈A(ν,m)
(6.3.55)
260
Stochastic Process
and
0 ≤ F 1,...,ν f gm − (f gm )(y)F1,...,ν gν,m,y ≤ ε. x∈A(ν,m)
Combining the two last displayed inequalities, we obtain |F1,...,ν (g ν,n,x |S ν )gm − F 1,...,ν (g ν,n,x |S ν )gm | ≡ |F1,...,ν f gm − F 1,...,ν f gm |
(f gm )(y)(F1,...,ν gν,m,y − F1,...,ν gν,m,y ) + 2ε ≤ y∈A(ν,m)
≤ |F1,...,ν gν,m,y − F1,...,ν gν,m,y | + 2ε y∈A(ν,m) ≤ 2m |Aνm |ρDist,ξ ν (F1,....ν ,F1,...,ν ) + 2ε
≤ ε + 2ε = 3ε,
(6.3.56)
where the third inequality is from the defining equality 6.3.48, where the last inequality is by inequality 6.3.54, and where n = 1, . . . ,N is arbitrary. From equality 6.3.49, we therefore obtain ρDist,ξ ν (F 1,...,ν ,F 1,...,ν ) ≤
N
2−n |An |−1 ν
n=1
≤
N
0 and ν ≥ 1, we have ρDist,ξ ν (F 1,...,ν ,
F 1,...,ν ) < 6ε, provided that
ρDist,ξ ν (F1,....ν ,F1,...,ν ) ≤ 2−m(ν) |Aνm(ν) |−1 ε.
2κ
(6.3.57)
22. Now let α > 0 be arbitrary. Write ε ≡ 2−3 α. Take κ ≥ 1 so large that β (Q,S) be arbitrary such that < ε. Let F,F ∈ F
Random Field and Stochastic Process ρ Marg,ξ,Q (F,F ) < δψ (α) ≡ 2−κ
κ >
2−m(ν) |Aνm(ν) |−1 ε.
261 (6.3.58)
ν=1
Let F ≡ ψ(F ), and F ≡ ψ(F ). Then, for each ν = 1, . . . ,κ, we have ρDist,ξ ν (F1,....ν ,F1,...,ν ) ≤ 2κ ρ Marg,ξ,Q (F,F ) ≤ 2−m(ν) |Aνm(ν) |−1 ε,
whence, according to Step 21, we have ρDist,ξ ν (F 1,...,ν ,F 1,...,ν ) Consequently,
0 is arbitrary, we see that the mapping β (Q,S), (Q,S), ψ : (F ρMarg,ξ,Q ) → (F ρMarg,ξ,Q )
(6.3.59)
in relation 6.3.47 is uniformly continuous, as alleged, with a modulus of continuity δψ defined in equality 6.3.58. 23. Next consider the function Q Q ϕ : JDK (S ,d ) → J(S Q,d Q ).
We will define a metric on its range ϕ(JDK (S ,d )). Since the metric space Q ∞ (S,d) is compact, the countable power ξ = ξ of ξ is a binary approximation Q Q of (S ,d ), according to Definition 3.2.6. Hence the distribution metric ρDist,ξ Q Q
Q
Q Q on the set JDK (S ,d ) of distributions is defined according to Definition 5.3.4. Q Q Q Q Now consider each E,E ∈ ϕ(JDK (S ,d )). Let E,E ∈ JDK (S ,d ) be such ∞ ∞ that E = ϕ(E) and E = ϕ(E ). Suppose E = E . Let f ∈ C(S ,d ) be arbitrary. Then f ≡ f |S ∞ ∈ Cub (S ∞,d ∞ ). Hence
Ef = Ef = Ef = E f = E f = E f , where the first and last equalities are because f = f on the full subset S ∞ relative to E and to E , and where the second and fourth equalities are thanks to the defining equality 6.3.45. Thus E = E as distributions on the compact metric ∞ ∞ space (S ,d ), provided that ϕ(E) = ϕ(E ). We conclude that the function ϕ is an injection. Now define ρDist,ξ Q,∗ (E,E ) ≡ ρDist,ξ Q (E,E ).
(6.3.60)
262
Stochastic Process
Suppose ρDist,ξ Q,∗ (E,E ) = 0. Then ρDist,ξ Q (E,E ) = 0. Hence E = E because ρDist,ξ Q is a metric. Moreover, the defining equality 6.3.45 immediately implies the symmetry of, and triangular inequality for, the function ρDist,ξ Q,∗ , Q Q which is therefore a metric on ϕ(JDK (S ,d )). 24. It is easy to see that the function
ϕ : (JDK (S ,d ),ρDist,ξ Q ) → (ϕ(JDK (S ,d )),ρDist,ξ Q,∗ ) Q
Q
Q
Q
(6.3.61)
Q Q is continuous. To be precise, let E,E ∈ JDK (S ,d ) arbitrary. Let E = ϕ(E) and E = ϕ(E ). Then
ρDist,ξ Q,∗ (ϕ(E),ϕ(E )) = ρDist,ξ Q,∗ (E,E ) ≡ ρDist,ξ ∞ (E,E ). Thus ϕ is an isometry, and hence continuous. 25. Now define the composite β (Q,S), ρMarg,ξ,Q ) → (ϕ(JDK (S ,d )),ρDist,ξ Q,∗ ) DK ≡ ϕ ◦ DK ◦ ψ : (F Q
Q
(6.3.62) of the three uniformly continuous functions in formulas 6.3.47, 6.3.28, and 6.3.61. Call DK the Daniell–Kolmogorov Extension. Then DK is itself uniformly continuous. Moreover, its range β (Q,S)) ⊂ ϕ( DK (ψ(F (Q,S))) ≡ ϕ(JDK (S Q,d Q )) Jβ (S Q,d Q ) ≡ DK (F Q Q is a subset of ϕ(JDK (S ,d )). Hence formula 6.3.62 can be rewritten as the uniformly continuous function
β (Q,S), ρMarg,ξ,Q ) → (Jβ (S Q,d Q ),ρDist,ξ Q,∗ ). DK : (F
(6.3.63)
(Q,S) be arbitrary. Let F ≡ ψ(F ), E ≡ DK (F ), and 26. Let F ∈ F E ≡ ϕ(E). Then DK (F ) = E. Let (S Q,L,E) be the completion of (S Q,Cub (S Q,d Q ),E). Then, according to equality 6.3.39, we have Ef (U1, . . . ,Un ) = F1,...,n ( f ) where f ∈ Cub (S n,d n ) is arbitrary. Thus the coordinate function U : Q × (S Q, L,E) → (S,d) is an r.f. with marginal distributions given by the family F . Assertion 1 of the present theorem is proved. 27. Assertion 2 has been proved in Step 25. The theorem is proved.
6.4 Daniell–Kolmogorov–Skorokhod Extension Use the notations as in the previous section. Unless otherwise specified, (S,d) will be a locally compact metric space, and Q ≡ {t1,t2, . . .} will denote an enumerated countably infinite parameter set, enumerated by the bijection t : {1,2, . . .} → Q. Let x◦ be an arbitrary but fixed reference point in (S,d).
Random Field and Stochastic Process
263
For two consistent families F and F of f.j.d.’s with the parameter set Q and the locally compact state space (S,d), the Daniell–Kolmogorov Extension in Section 6.3 constructs two corresponding distributions E and E on the path space (S Q,d Q ), such that the families F and F of f.j.d.’s are the marginal distributions of the r.f.’s U : Q × (S Q,L,E) → (S,d) and U : Q × (S Q,L ,E ) → S, respectively, even as the underlying coordinate function U remains the same. In contrast, theorem 3.1.1 in [Skorokhod 1956] combines the Daniell– Kolmogorov Extension with Skorokhod’s Representation Theorem, presented as Theorem 5.5.1 in this book. It produces (i) as the sample space, the fixed probability space ! (0,L0,I0 ) ≡ [0,1],L0, ·dx based on the uniform distribution ·dx on the unit interval [0,1], and (ii) for each (Q,S), an r.f. Z : Q × (0,L0,I0 ) → (S,d) with marginal distributions F ∈F given by F . The sample space is fixed, but two different families F and F of f.j.d.’s result in two r.f.’s Z and Z that are distinct functions on the same sample space. Theorem 3.1.1 in [Skorokhod 1956] shows that the Daniell–Kolmogorov– Skorokhod Extension thus obtained is continuous relative to weak convergence (Q,S). Because the r.f.’s produced can be regarded as r.v.’s on one and the in F same probability space (0,L0,I0 ) with values in the path space (S Q,d Q ), we will have at our disposal the familiar tools of making new r.v.’s, including taking a continuous function of r.v.’s Z and Z , and taking limits of r.v.’s in various senses. In our Theorem 5.5.1, we recast Skorokhod’s Representation Theorem in terms of partitions of unity in the sense of Definition 3.3.4. Namely, where Borel sets are used in [Skorokhod 1956], we use continuous basis functions with compact support. This will facilitate the subsequent proof of metrical continuity of the Daniell–Kolmogorov–Skorokhod Extension, as well as the derivation of an accompanying modulus of continuity. This way, we gain the advantage of metrical continuity, which is stronger than sequential weak convergence. × ,S) denotes the set of r.f.’s with Recall from Definition 6.1.1 that R(Q parameter set Q, sample space (,L,E), and state space (S,d). We will next identify each member of R(Q × ,S) with an r.v. on with values in the Q Q path space (S ,d ), with the latter having been introduced in Definition 6.3.1. As observed in Definition 6.3.1, the path space (S Q,d Q ) is complete, but not necessarily locally compact. However, it is compact if (S,d) is compact. Lemma 6.4.1. A random field with a countable parameter set can be regarded as an r.v. with values in the path space. Let (S,d) be a locally compact metric space. Let Q = {t1,t2, . . .} be an enumerated countably infinite parameter set. Let (,L,E) be a probability space. Then the following holds. 1. Let X : Q × (,L,E) → (S,d) be an arbitrary r.f. Define the function : (,L,E) → (S Q,d Q ) X
264
Stochastic Process ∞
≡ i=1 domain(Xt (i) ), and by X(ω)(t by domain(X) i ) ≡ Xt (i) (ω) for each Then the function X is an r.v. on (,L,E) with i ≥ 1, for each ω ∈ domain(X). values in (S Q,d Q ). : (,L,E) → (S Q,d Q ) be an arbitrary r.v. For each s ∈ Q, 2. Conversely, let X and define the function Xs : (,L,E) → (S,d) by domain(Xs ) ≡ domain(X) Then X : Q × (,L,E) → (S,d) is an r.f. by Xs (ω) ≡ X(s,ω) ≡ (X(ω))(s). Convention: Henceforth, for brevity of notations, we will write X also for X, when it is clear from context that we intend to mean the latter. Proof. 1. Let x◦ be an arbitrary but fixed reference point in S. 2. Let m ≥ 1 be arbitrary. Let x,y ∈ S Q be arbitrary. Define jm (x) ≡ x (m) ∈ Q S by x (m) (ti ) ≡ x(ti ) or x (m) (ti ) ≡ x◦ according as i ≤ m or i > m, for each i ≥ 1. Then, by Definition 6.3.1, we have d Q (jm (x),jm (y)) ≡ d Q (x (m),y (m) ) ≡
∞
2−i (1 ∧ d(x (m) (ti ),y (m) (ti )))
i=1
=
m
i=1
≤
∞
∞
2−i (1 ∧ d(x(ti ),y(ti ))) +
2−i (1 ∧ d(x◦,x◦ ))
i=m+1
2−i (1 ∧ d(x(ti ),y(ti ))) ≡ d Q (x,y).
i=1
We see that the function jm : (S Q,d Q ) → (S Q,d Q ) is a contraction, and hence continuous. 3. Similarly, let u ≡ (u1, . . . ,um ),v ≡ (v1, . . . ,vm ) ∈ S m be arbitrary. Define km (u) ≡ u(m) ∈ S Q by u(m) (ti ) ≡ ui or u(m) (ti ) ≡ x◦ according as i ≤ m or i > m, for each i ≥ 1. Then d Q (km (u),km (v)) ≡ d Q (u(m),v(m) ) ≡
∞
2−i (1 ∧ d(u(m) (ti ),v(m) (ti )))
i=1
=
m
2−i 1 ∧ d(ui ,vi ) +
i=1
≤
m
i=1
∞
2−i (1 ∧ d(x◦,x◦ ))
i=m+1
2−i 1 ∧ d(ui ,vi ) ≤
m
d(ui ,vi ) ≡ d m (u,v).
i=1
We see that the function km : (S m,d m ) → (S Q,d Q ) is a contraction, and hence continuous. 4. Consider each f ∈ Cub (S Q,d Q ), with a modulus of continuity δf . Then f ◦ km ∈ Cub (S m,d m ). Since X : Q × (,L,E) → (S,d) is an r.f., it follows that ( f ◦ km )(Xt (1), . . . ,Xt (m) ) ∈ L. Let ε > 0 be arbitrary. Let m ≥ 1 be so large that 2−m < δf (ε).
Random Field and Stochastic Process
265
5. Let x ∈ S Q be arbitrary. Define u ≡ (xt (1), . . . ,xt (m) ) ∈ S m . Then we have x (m) (ti ) ≡ x(ti ) = ui ≡ u(m) (ti ) or x (m) (ti ) ≡ x◦ ≡ u(m) (ti ) depending on whether i ≤ m or i > m, for each i ≥ 1. It follows that x (m) = u(m) ≡ km (u) = km (xt (1), . . . ,xt (m) ), for each x ∈ S Q . Consequently, f (X(m) ) = f ◦ km (Xt (1), . . . ,Xt (m) ) ∈ L. At the same time, (m) ) ≡ d ∞ (X ◦ t,X(m) ◦ t) d Q (X,X =
∞
2−i (1 ∧ d(Xt (i),x◦ )) ≤
i=m+1
∞
2−i = 2−m < δf (ε).
i=m+1
Consequently, − f (X(m) )| ≤ ε. | f (X) is the uniform limit of a sequence Since ε > 0 is arbitrary, the function f (X) ∈ L, where f ∈ Cub (S Q,d Q ) is arbitrary. Since the complete in L. Hence f (X) metric space (S Q,d Q ) is bounded, Assertion 4 of Proposition 5.1.4 implies that is an r.v. on (,L,E). Assertion 1 is proved. the function X : (,L,E) → (S Q,d Q ) be an arbitrary r.v. Consider each 6. Conversely, let X s ∈ Q. Then the function js : (S Q,d Q ) → (S,d), defined by js (x) ≡ x(s) for : (,L,E) → (S,d) is an each x ∈ S Q , is uniformly continuous. Hence js (X) r.v. At the same time, for each ω ∈ domain(X), we have Xs (ω) ≡ X(s,ω) ≡ and Xs is an r.v. In other words, X is Hence Xs = js (X) (X(ω))(s) = js (X(ω)). an r.f. Assertion 2 and the lemma are proved. Definition 6.4.2. Metric space of r.f.’s with a countable parameter set. Let (S,d) be a locally compact metric space, not necessarily compact. Let Q = {t1,t2, . . .} be an enumerated countably infinite parameter set. Let (,L,E) be a probability space. Let Z,Z : Q × (,L,E) → (S,d) be arbitrary r.f.’s. In symbols, Z,Z ∈ × ,S). Lemma 6.4.1, with the notations therein, says that the functions R(Q : (,L,E) → (S Q,d Q ) Z Z, are r.v.’s on (,L,E) with values in the complete metric space (S Q,d Q ). In ∈ M(,S Q ). Z symbols, Z, Define ) ≡ E(1 ∧ d Q (Z, )) Z Z ρ P rob,Q (Z,Z ) ≡ ρP rob (Z, ) = Z = Ed Q (Z,
∞
2−n E(1 ∧ d(Zt (n),Zt (n) )).
(6.4.1)
n=1
P rob,Q is a metric, called the probability metric on Note that ρ P rob,Q ≤ 1. Then ρ × ,S) of r.f.’s. the space R(Q
266
Stochastic Process
In view of the right-hand side of the defining equality 6.4.1, the metric ρ P rob,Q is determined by the enumeration {t1,t2, . . .} of the countably infinite set Q. A adifferent enumeration would produce a different, albeit equivalent, metric. × Equality 6.4.1 implies that convergence of a sequence (Z (k) ) of r.f.’s in R(Q ,S) relative to the metric ρ P rob,Q is equivalent to convergence in probability and, therefore, to the weak convergence of the sequence Zs(k) , for each s ∈ Q. However, the metrical continuity is stronger. Theorem 6.4.3. Compact Daniell–Kolmogorov–Skorokhod Extension. Let (S,d) be a compact metric space. Let Q = {t1,t2, . . .} be an enumerated countably infinite parameter set. Let ξ ≡ (Ak )k=1,2,... be an arbitrary binary approximation of (S,d) relative to the reference point x◦ . Then there exists a function (Q,S) → R(Q × 0,S) DKS,ξ : F (Q,S), the r.f. Z ≡ DKS,ξ (F ) : Q × 0 → such that for each F ∈ F (S,d) has marginal distributions given by the family F . The function DKS,ξ constructed in the proof that follows will be called the Daniell–Kolmogorov– Skorokhod Extension relative to the binary approximation ξ of (S,d). k )k=1,2, , J(S Q,d Q ), and Proof. Let the corresponding objects (S Q,d Q ), ξ Q ≡ (B ρDist,ξ Q be as in Definition 6.3.3. Without loss of generality, assume that Q ≡ {t1,t2, . . .} ≡ {1,2, . . .}. 1. Consider the Compact Daniell–Kolmogorov Extension (Q,S) → J(S Q,d Q ), DK : F (Q,S) of f.j.d.’s to in Theorem 6.3.2, which maps each consistent family F ∈ F Q Q a distribution E ≡ DK (F ) on (S ,d ), such that (i) the coordinate function U : Q × (S Q,L,E) → S is an r.f., where L is the completion of C(S Q,d Q ) relative to the distribution E, and (ii) U has marginal distributions given by the family F . 2. Since the state space (S,d) is compact by hypothesis, the countable power ξ Q ≡ ξ ∞ ≡ (Bk )k=1,2, of ξ is defined and is a binary approximation of (S Q,d Q ) = (S ∞,d ∞ ), according to Definition 3.2.6. Recall that the space J(S Q,d Q ) is then equipped with the distribution metric ρDist,ξ Q defined relative to ξ ∞ , according to Definition 5.3.4, and that convergence of a sequence of distributions on (S Q,d Q ) relative to the metric ρDist,ξ Q is equivalent to weak convergence. 3. Recall from Definition 5.1.10 that M(0,S Q ) denotes the set of r.v.’s Z ≡ (Zt (1),Zt (2), . . .) ≡ (Z1,Z2, . . .) on (0,L0,I0 ), with values in the compact path space (S Q,d Q ). Theorem 5.5.1 constructed the Skorokhod representation Sk,ξ ∞ : J(S Q,d Q ) → M(0,S Q )
Random Field and Stochastic Process
267
such that for each distribution E ∈ J(S Q,d Q ), with Z ≡ Sk,ξ ∞ (E) : 0 → S Q , we have E = I0,Z ,
(6.4.2)
where I0,Z is the distribution induced on the compact metric space (S Q,d Q ) by the r.v. Z, in the sense of Definition 5.2.3. 4. We will now verify that the composite function (Q,S) → M(0,S Q ) DKS,ξ ≡ Sk,ξ ∞ ◦ DK : F
(6.4.3)
(Q,S) has the desired properties. To that end, let the consistent family F ∈ F of f.j.d.’s be arbitrary. Let Z ≡ DKS,ξ (F ). Then Z ≡ Sk,ξ ∞ (E), where E ≡ DK (F ). We need only verify that the r.v. Z ∈ M(0,S Q ), when viewed as an r.f. Z : Q × (0,L0,I0 ) → (S,d), has marginal distributions given by the family F . 5. For that purpose, let n ≥ 1 and g ∈ C(S n,d n ) be arbitrary. Define the function f ∈ C(S ∞,d ∞ ) by f (x1,x2, . . .) ≡ g(x1, . . . ,xn ), for each (x1,x2, . . .) ∈ S ∞ . Then, for each x ≡ (x1,x2, . . .) ∈ S ∞ , we have, by the definition of the coordinate function U, f (x) = f (x1,x2, . . .) = f (U1 (x),U2 (x), . . .) = g(U1, . . . ,Un )(x).
(6.4.4)
Therefore I0 g(Z1, . . . ,Zn ) = I0 f (Z1,Z2 . . .) = I0 f (Z) = I0,Z f = Ef = Eg(U1, . . . ,Un ) = F1,...,n g,
(6.4.5)
where the third equality is by the definition of the induced distribution I0,Z , where the fourth equality follows from equality 6.4.2, where the fifth equality is by equality 6.4.4, and where the last equality is by Condition (ii) in Step 1. Since n ≥ 1 and g ∈ C(S n,d n ) are arbitrary, we conclude that the r.f. DKS,ξ (F ) = Z : Q × (0,L0,I0 ) → (S,d) has marginal distributions given by the family F . The theorem is proved. Theorem 6.4.4. Continuity of Compact Daniell–Kolmogorov–Skorokhod Extension. Use the same assumptions and notations as in Theorem 6.4.3. In particular, suppose the state space (S,d) is compact. Recall that the modulus of local compactness of (S,d) corresponding to the binary approximation ξ ≡ (Ap )p=1,2,... is defined as the sequence
ξ ≡ (|Ap |)p=1,2,... of integers. Then the Compact Daniell–Kolmogorov–Skorokhod Extension (Q,S), × 0,S), DKS,ξ : (F ρMarg,ξ,Q ) → (R(Q ρP rob,Q )
(6.4.6)
268
Stochastic Process
is uniformly continuous with a modulus of continuity δ DKS (·, ξ ) dependent only P rob,Q were on ξ . The marginal metric ρ Marg,ξ,Q and the probability metric ρ introduced in Definitions 6.2.8 and 6.4.2, respectively. Proof. 1. By the defining equality 6.4.3 in Theorem 6.4.3, we have DKS,ξ ≡ Sk,ξ ∞ ◦ DK ,
(6.4.7)
where the Compact Daniell–Kolmogorov Extension (Q,S), DK : (F ρMarg,ξ,Q ) → (J(S Q,d Q ),ρDist,ξ Q ) is uniformly continuous according to Theorem 6.3.4, with modulus of continuity δ DK (·, ξ ) dependent only on the modulus of local compactness ξ ≡ (|Ak |)k=1,2,... of the compact metric space (S,d). 2. Separately, the metric space (S,d) is compact by hypothesis, and hence its countable power (S Q,d Q ) is compact. Moreover, the countable power ξ Q is defined and is a binary approximation of (S Q,d Q ). Furthermore, since d Q ≤ 1, the set J(S Q,d Q ) of distributions on (S Q,d Q ) is trivially tight, with a modulus of tightness β ≡ 1. Hence Theorem 5.5.2 is applicable to the metric space (S Q,d Q ), along with its binary approximation ξ Q , and implies that the Skorokhod representation Sk,ξ ∞ : (J(S Q,d Q ),ρDist,ξ Q ) → (M(0,S Q ),ρP rob ) Q is uniformly Q continuous, with a modulus of continuity δSk (·, ξ ,1) depending only on ξ . × 0,S), this is equivalent to saying that With M(0,S Q ) identified with R(Q × 0,S), Sk,ξ ∞ : (J(S Q,d Q ),ρDist,ξ Q ) → (R(Q ρP rob,Q ) Q is uniformly continuous, with a modulus of continuity δSk (·, ξ ,1). 3. Combining, we see that the composite function DKS,ξ in equality 6.4.6 is uniformly continuous, with a modulus of continuity given by the composite operation δ DKS (·, ξ ) ≡ δ DK (δSk (·, ξ Q ,1), ξ ), where we observe that the modulus of local compactness ξ Q of the countable power (S Q,d Q ) is determined by the modulus of local compactness ξ of the compact metric space (S,d), according to Lemma 3.2.7. The theorem is proved. We will next prove the Daniell–Kolmogorov–Skorokhod Extension Theorem for the general case of a state space is required only to be locally compact, not necessarily compact.
Random Field and Stochastic Process
269
In the remainder of this section, let Q ≡ {t1,t2, . . .} denote a countable parameter set. For simplicity of presentation, we will assume, in the proofs, that tn = n for each n ≥ 1. Theorem 6.4.5. Daniell–Kolmogorov–Skorokhod Extension and its continuity. Suppose (S,d) is locally compact, not necessarily compact, with a binary approximation ξ . Let Q = {t1,t2, . . .} be an enumerated countably infinite parameter set. Then the following conditions hold: 1. Existence: There exists a function (Q,S) → R(Q × 0,S) DKS,ξ : F (Q,S), the r.f. Z ≡ DKS,ξ (F ) : Q × 0 → S such that for each F ∈ F has marginal distributions given by the family F . The function DKS,ξ will be called the Daniell–Kolmogorov–Skorokhod Extension relative to the binary approximation ξ of (S,d). (Q,S), with a β (Q,S) be a pointwise tight subset of F 2. Continuity: Let F modulus of pointwise tightness β : (0,∞) × Q → (0,∞). Then the Daniell– Kolmogorov–Skorokhod Extension β (Q,S), × 0,S), ρMarg,ξ,Q ) → (R(Q ρP rob,Q ), DKS,ξ : (F
(6.4.8)
is uniformly continuous, where ρ Marg,ξ,Q is the marginal metric introduced in Definition 6.2.8. Moreover, the uniformly continuous function DKS,ξ has a modulus of continuity δDKS (·, ξ ,β) dependent only on the modulus of local compactness ξ ≡ (|Ak |)k=1,2,... of the locally compact state space (S,d), and on the modulus of pointwise tightness β. Proof. Without loss of generality, assume that Q ≡ {t1,t2, . . .} ≡ {1,2, . . .}. 1. Let ξ be the compactification of the given binary approximation ξ , as constructed in Corollary 3.4.4. Thus ξ is a binary approximation of (S,d) relative to the fixed reference point x◦ ∈ S. Since the metric space (S,d) is compact, the Q Q Q countable power ξ of ξ is defined and is a binary approximation of (S ,d ), according to Definition 6.3.3. 2. Recall from Lemma 6.3.6 the mapping (Q,S) → F (Q,S). ψ :F
(6.4.9)
3. Apply Theorems 6.4.3 and 6.4.4 to the compact metric space (S,d) to obtain the Compact Daniell–Kolmogorov–Skorokhod Extension (Q,S), × 0,S), DKS,ξ ≡ Sk,ξ ∞ ◦ DK : (F ρMarg,ξ,Q ) → (R(Q ρP rob,Q ), (6.4.10) which is uniformly continuous with a modulus of continuity δ DKS (·, ξ ) depen dent only on ξ .
270
Stochastic Process
4. Separately, let (Q,S)) ⊂ R(Q × 0,S). Z ≡ DKS,ξ (F (Q,S), (Q,S) such that F ≡ ψ(F ) ∈ F be arbitrary. Then there exists F ∈ F E ≡ DK (F ), and Z = Sk,ξ ∞ (E). 5. According to Step 2 of of the proof of Theorem 6.3.7, we have (i) S Q is Q a full set in (S ,L,E), (ii) Cub (S Q,d Q ) ⊂ L, and (iii) the restricted function E ≡ ϕ(E) ≡ E|Cub (S Q,d Q ) is a distribution on (S Q,d Q ). Moreover, according to Theorem 6.3.7, the coordinate function U : Q × (S Q,L,E) → (S,d) is an r.f. with marginal distributions given by the family F . 6. By the definition of the Skorokhod representation Sk,ξ ∞ , we have E = I0,Z . Q
At the same time, S Q is a full set in (S ,L,E). Hence S Q is a full subset of Q (S ,L,I0,Z ). Equivalently, I0 1(Z∈S Q ) = I0,Z 1S Q = 1, where we view the r.f. Z as a r.v. on (0,L0,I0 ). In other words, D ≡ {θ ∈ 0 : Z(θ ) ∈ S Q } is a full subset of (0,L0,I0 ). For each t ∈ Q, define Zt ≡ Z t |D. Then Zt = Z t
a.s.
on (0,L0,I0 ). Let f ∈ Cub (S n,d n ) be arbitrary. Define g ∈ Cu,b (S Q,d Q ) by g(x) ≡ f (x(t1 ), . . . ,x(tn )) = f (Ut (1), . . . ,Ut (n) )(x) for each x ∈ S Q . Then g = f (Ut (1), . . . ,Ut (n) ). Hence I0 f (Zt (1), . . . ,Zt (n) ) = I0 g(Z) = I0 g(Z) = I0,Z g = Eg ≡ Eg = Ef (Ut (1), . . . ,Ut (n) ) = Ft (1),...,t (n) f . × 0,S) and has marginal distribution given by the family F . Thus Z ∈ R(Q Define γ (Z) ≡ Z. Thus we have a function (Q,S)) → R(Q × 0,S). γ : DKS,ξ (F
(6.4.11)
Define the composite function DKS,ξ ≡ γ ◦ DKS,ξ ◦ ψ. (Q,S) be arbitrary. Let Z ≡ DKS,ξ (F ). We have just shown that, Let F ∈ F × 0,S) and has marginal distribution given by the family F . then, Z ∈ R(Q Thus Assertion 1 is proved. 7. It remains to prove that the composite function DKS,ξ is uniformly continβ (Q,S). From Step 3, we saw that the function uous on F (Q,S), × 0,S), DKS,ξ : (F ρMarg,ξ,Q ) → (R(Q ρP rob,Q )
(6.4.12)
is uniformly continuous. From Step 22 in the proof of Theorem 6.3.7, we saw that the function (Q,S), β (Q,S), ρMarg,ξ,Q ) → (F ρMarg,ξ,Q ) ψ : (F
(6.4.13)
Random Field and Stochastic Process
271
is uniformly continuous. Hence we need only show that the function γ is uniformly β (Q,S)). continuous on DKS,ξ ◦ ψ(F β (Q,S)) be arbitrary. Let Z ≡ γ (Z) To that end, let Z,Z ∈ DKS,ξ ◦ ψ(F β (Q,S) such that F ≡ ψ(F ), E ≡ and Z ≡ γ (Z ). Then there exist F ∈ F DK (F ),Z = Sk,ξ ∞ (E), and Z ≡ γ (Z) = DKS,ξ (F ). Similarly, there exist β (Q,S) such that F ≡ ψ(F ), E ≡ DK (F ),Z = ∞ (E ), and F ∈ F Sk,ξ
Z ≡ γ (Z ) = DKS,ξ (F ). Let ε > 0 be arbitrary. Let ν ≥ 1 be so large that 2−ν < ε. Take any b > β(ε,ν). Then (d(·,x◦ ) ≤ b) ⊂ K for some compact subset K of (S,d). Then, by Condition 3 of Definition 3.4.1, there exists δK (ε) > 0 such that for each y,z ∈ K with d(y,z) < δK (ε), we have d(y,z) < ε. 8. Take any α ∈ (ε,2ε). Suppose ρ P rob,Q (Z,Z ) < 2−ν αδK (ε). Consider each n = 1, . . . ,ν. Then
2−n I0 d(Zn,Zn ) = 2−n I0 d(Z n,Z n ) ≤
∞
2−m I0 d(Z m,Z m ) ≡ ρ P rob,Q (Z,Z ) < 2−ν αδK (ε),
m=1
whence
I0 d(Zn,Zn )
< αδK (ε). Chebychev’s inequality therefore implies that μ0 (d(Zn,Zn ) ≥ δK (ε)) < α < 2ε,
(6.4.14)
where μ0 is the Lebesgue measure function on 0 . We estimate I0 (1 ∧ d(Zn,Zn )) ≤ I0 (1 ∧ d(Zn,Zn ))1(d(Z(n),x(◦))∨d(Z(n),x(◦))≤b) + I0 (d(Zn,x◦ ) > b) + I0 (d(Zn,x◦ ) > b) ≤ I0 (1 ∧ d(Zn,Zn ))1(d(Z(n),x(◦))∨d(Z(n),x(◦))≤b) + ε + ε ≤ I0 (1 ∧ d(Zn,Zn ))1(d(Z(n),Z (n)) 0 so large that I0 B c < 2−n, where
B≡
n
d(Zt(0) (k),x◦ )
≤b .
(6.4.20)
k=1
Let h ≡ n ∨ [log2 b]1 . In view of the a.u. convergence 6.4.18, there exist m ≥ h and a measurable subset A of (0,L0,I0 ) with I0 Ac < 2−n such that 1A
∞
(p) (0) −2h−1 2−k d(Zt (k),Z (0) |Ah |−2 t (k) ) ≡ 1A d (Z ,Z ) ≤ 2 Q
(p)
(6.4.21)
k=1
for each p ≥ m. 3. Now consider each θ ∈ AB and each k = 1, . . . ,n. Then, k ≤ h and, by inequality 6.4.21, we have d(Zt (k) (θ ),Z t (k) (θ )) ≤ 2k 2−2h−1 |Ah |−2 ≤ 2−h−1 |Ah |−2, (p)
(0)
(6.4.22)
while equality 6.4.20 yields (0)
d(Zt (k) (θ ),x◦ ) ≤ b < 2h .
(6.4.23)
4. According to Assertion (ii) of Theorem 3.4.3, regarding the one-point compactification (S,d) of (S,d) relative to the binary approximations ξ ≡ (Ap )p=1,2,... , if y ∈ (d(·,x◦ ) ≤ 2h ) and z ∈ S are such that d(y,z) < 2−h−1 |Ah |−2, then d(y,z) < 2−h+2 .
274
Stochastic Process
Hence inequalities 6.4.22 and 6.4.23 together yield
(p) (0) d Zt (k) (θ ),Z t (k) (θ ) < 2−h+2 ≤ 2−n+2, where θ ∈ AB, and k = 1, . . . ,n are arbitrary. Consequently, 1AB
∞
n ∞
(p) (0)
(p) (0)
2−k 1 ∧ d Zt (k),Z t (k) ≤ 1AB 2−k 1 ∧ di Zt (k),Z t (k) + 2−k
k=1
k=1
≤ 1AB
n
k=n+1
2−k 2−n+2 +
k=1 −n+2
0 be arbitrary. Let j ≥ 1 and t,t ∈ Q be arbitrary with dQ (t,t ) < δCp (ε). Since Q∞ is dense in Q, there exist s,s ∈ Q∞ with dQ (s,s ) < δCp (ε) and dQ (t,s) ∨ dQ (t ,s )
0 is arbitrarily small. Summing up, the r.f. X is continuous in probability on Q, with δCp as a modulus of continuity in probability. Condition 3 has been established. 13. It remains to prove Condition 4. For each s ∈ Q∞ , letting j → ∞ in inequality 7.1.20 with t = s, we obtain Z s ,Xs ) ≡ I1 ⊗ E0 d( Z s ,Xs ) = 0. E d( Hence
D2 ≡
s = Xs ) (Z
(7.1.24)
(7.1.25)
s∈Q(∞)
is a full subset of (,L,E). Define the full subset D ≡ D2 ∩ (D1 × D0 ) of the sample space (,L,E). 14. Consider each ω ≡ (θ,ω0 ) ∈ D ⊂ D2 . Then θ ∈ D1 and ω0 ∈ D0 . Let t ∈ domain(X(·,ω)) be arbitrary. Then (t,ω) ∈ domain(X). Hence, by the defining equality 7.1.13, we have X(t,ω) = lim X(m(j )) (t,ω). j →∞
(7.1.26)
Let ε > 0 be arbitrary. Then there exists J ≥ 1 so large that d(X(t,ω),X(m(j )) (t,ω)) < ε for each j ≥ J . Consider each j ≥ J . Then (t,ω) ∈ domain(X
(m(j ))
)≡
γ (m(j ))
m(j ),i × D0 .
(7.1.27)
i=1
Hence there exists ij = 1, . . . ,γm(j ) such that (t,θ,ω0 ) ≡ (t,ω) ∈ m(j ),i(j ) × D0 .
(7.1.28)
Consequently, m(j ),i(j ),ω) = X(qm(j ),i(j ),ω), X(m(j )) (t,ω) = Z(q
(7.1.29)
where the first equality follows from equality 7.1.7, and the second from equality 7.1.25 in view of the relation ω ∈ D2 . At the same time, since (t,θ ) ∈ m(j ),i(j ) by equality 7.1.28, we have λm(j ),q(m(j ),i(j )) (t) > 0. In addition, the continuous function λm(j ),q(m(j ),i(j )) on Q has support (dQ (·,qm(j ),i(j ) ) ≤ 2−m(j )+1 ), as specified in Step 2 of Definition 7.1.4. Consequently, dQ (t,qm(j ),i(j ) ) ≤ 2−m(j )+1 .
(7.1.30)
Measurable Random Field
285
Summing up, for each ω in the full set D, and for each t ∈ domain(X(·,ω)), the sequence (sj )j =‘1,2,... ≡ (qm(j ),i(j ) )j =‘1,2,... in Q∞ is such that dQ (t,sj ) → 0 and d(X(t,ω),X(sj ,ω)) ≡ d(X(t,ω),X(qm(j ),i(j ),ω)) = d(X(t,ω),X(m(j )) (t,ω)) → 0, as j → ∞, where the second equality is from equality 7.1.29 and where the convergence is from formula 7.1.26. Since ω ∈ D and t ∈ domain(X(·,ω)) are arbitrary, Condition 4 of the conclusion of the lemma has also been proved. We are now ready to construct the extension of an arbitrary family of f.j.d.’s that is continuous in probability to a measurable r.f. We will prove, at the same time, that this construction is uniformly metrically continuous on any subset of consistence families whose members share some common modulus of pointwise tightness. Definition 7.1.6. Specification of a space of consistent families whose members share a common modulus of pointwise tightness as well as a common (Q,S) of consistent modulus of continuity in probability. Recall the set F families of f.j.d.’s with the parameter set Q and state space S. Recall also the Cp (Q,S), whose members are continuous in probability, equipped with subset F the metric ρ Cp,ξ,Q,Q(∞) defined in Definition 6.2.12 by ρ Cp,ξ,Q,Q(∞) (F,F ) ≡ ρ Marg,ξ,Q(∞) (F |Q∞,F |Q∞ ) ≡
∞
2−n ρDist,ξ n (Fq(1),...,q(n),Fq(1),...,q(n) )
(7.1.31)
n=1
Cp (Q,S). Recall the subset F β (Q,S) of F (Q,S), whose memfor each F,F ∈ F Cp,β (Q,S) ≡ bers share some common modulus of pointwise tightness β. Define F Cp,ξ,Q,Q(∞) . FCp (Q,S) ∩ Fβ (Q,S). Then FCp,β (Q,S) inherits the metric ρ Meas,Cp (Q×,S),ρSup,P rob ) Recall, from Definition 7.1.3, the metric space (R of measurable r.f.’s which are continuous in probability. Theorem 7.1.7. Construction of a measurable r.f. from a family of consistent f.j.d.’s that is continuous in probability. Let the compact metric parameter space (Q,dQ ), its dense subset Q∞ , and related auxiliary objects be as specified as in Definition 7.1.4. Consider the locally compact metric space (S,d), which is not required to have any linear structure or ordering. Let ! (1,L1,I1 ) ≡ (0,L0,E0 ) ≡ [0,1],L0, ·dθ
286
Stochastic Process
denote the Lebesgue integration space based on the interval [0,1]. Define the product sample space (,L,E) ≡ (1,L1,I1 ) ⊗ (0,L0,E0 ). Cp,β , Meas,Cp (Q × ,S), and ρSup,P rob ρCp,ξ,Q,Q(∞) ,R Let the objects δCp,β, F be as specified in Definition 7.1.6. Then there exists a uniformly continuous mapping Cp,β (Q,S), Meas,Cp (Q × ,S),ρSup,P rob ) ρCp,ξ,Q,Q(∞) ) → (R meas,ξ,ξ(Q) : (F (7.1.32) Cp,β (Q,S), the measurable r.f. such that for each F ∈ F X ≡ meas,ξ,ξ(Q) (F ) : Q × (,L,E) → S has marginal distributions given by the family F . the mapping has a modulus of continuity δmeas,ξ,ξ(Q (·,δCp,β, ξ , Moreover, ξQ ) determined by the parameters δCp,β, ξ , ξQ . We will refer to the mapping meas,ξ,ξ(Q) as the measurable extension relative to the binary approximations ξ and ξQ of (S,d) and (Q,dq ), respectively. Proof. 1. Recall from Definition 6.2.12 the isometry Cp (Q,S), Cp (Q∞,S), ρCp,ξ,Q,Q(∞) ) → (F ρMarg,ξ,Q(∞) ), Q,Q(∞) : (F Cp (Q,S). If the family F ∈ F Cp (Q,S) is where Q,Q(∞) (F ) ≡ F |Q∞ for each F pointwise tight with a modulus of pointwise tightness β, then so is Q,Q(∞) (F ) ≡ F |Q∞ . In other words, we have the isometry Cp,β (Q,S), Cp,β (Q∞,S), ρCp,ξ,Q,Q(∞) ) → (F ρMarg,ξ,Q(∞) ). Q,Q(∞) : (F (7.1.33) Cp,β (Q,S) share a common modulus of Likewise, by assumption, members of F Cp,β (Q,S)). continuity in probability δCp . Hence, so do members of Q,Q(∞) (F 2. Theorem 6.4.5 describes the uniformly continuous Daniell–Kolmogorov– Skorokhod Extension β (Q∞,S), ∞ × 0,S), ρMarg,ξ,Q(∞) ) → (R(Q ρP rob,Q(∞) ). DKS,ξ : (F (7.1.34) The uniformly continuous mapping DKS,ξ has a modulus of continuity δDKS (·, ξ ,β) dependent only on the modulus of local compactness ξ ≡ (|Ak |)k=1,2,... of the locally compact state space (S,d), and on the modulus of pointwise tightness β. Hence Cp,β (Q∞,S), ∞ × 0,S), ρMarg,ξ,Q(∞) ) → (R(Q ρP rob,Q(∞) ) DKS,ξ : (F (7.1.35) Cp,β (Q∞,S). Let Z ≡ DKS,ξ is uniformly continuous. Consider each F ∈ F (F |Q∞ ). Then E0 d(Zs ,Zt ) = Ft,s d for each s,t ∈ Q∞ . It follows that the r.f.
Measurable Random Field
287
Cp (Q∞ × 0,S). Thus we Z is continuous in probability. In other words, Z ∈ R have the uniformly continuous Daniell–Kolmogorov–Skorokhod Extension Cp,β (Q∞,S), Cp (Q∞ × 0,S), ρMarg,ξ,Q(∞) ) → (R ρP rob,Q(∞) ) DKS,ξ : (F (7.1.36) with the modulus of continuity δDKS (·, ξ ,β). 3. Now let Cp (Q∞ × 0,S) → R Meas,Cp (Q × ,S) meas,ξ(Q) : R be the mapping constructed in Theorem 7.1.5. 4. We will show that the composite mapping meas,ξ,ξ(Q) ≡ meas,ξ(Q) ◦ DKS,ξ ◦ Q,Q(∞)
(7.1.37)
Cp,β (Q,S) be arbitrary. Let X ≡ has the desired properties. To that end, let F ∈ F meas,ξ,ξ(Q) (F ). Then X ≡ meas,ξ(Q) (Z), where Z ≡ DKS,ξ ( Q,Q(∞) (F )) = DKS,ξ (F |Q∞ ). Consequently, Z has marginal distributions given by F |Q∞ . According to Theorem 7.1.5, X is continuous in probability. Moreover, X|Q∞ is equivalent to Z. Hence X|Q∞ has marginal distributions given by F |Q∞ . 5. We need to prove that the r.f. X has marginal distributions given by F . To that end, consider each k ≥ 1, r1, . . . ,rk ∈ Q, s1, . . . ,sk ∈ Q∞ , and f ∈ Cub (S k ,d k ). Then Ef (Xs(1), . . . ,Xs(k) ) = Fs(1),...,s(k) f . Now let sk → rk in Q for each k = 1, . . . ,k. Then the left-hand side converges to Ef (Xr(1), . . . ,Xr(k) ), on account of the continuity in probability of X. The right-hand side converges to Fr(1),...,r(k) f by the continuity in probability of F , according to Lemma 6.2.11. Hence Ef (Xr(1), . . . ,Xr(k) ) = Fr(1),...,r(k) f . We conclude that the measurable r.f. X ≡ meas,ξ,ξ(Q) (F ) has marginal distributions given by the family F . 6. It remains to prove that the composite mapping meas,ξ,ξ(Q) is continuous. Since the two constituent mappings Q,Q(∞) and DKS,ξ in expressions 7.1.33 and 7.1.36, respectively, are uniformly continuous, it suffices to prove that the remaining constituent mapping Cp (Q∞ × 0,S), ρP rob,Q(∞) ) meas,ξ(Q) : (R Meas,Cp (Q × ,S),ρSup,P rob ) → (R also is uniformly continuous. 7. To that end, define, as in the proof of Lemma 7.1.5, for each j ≥ 1, εj ≡ 2−j ,
(7.1.38)
nj ≡ j ∨ [(2 − log2 δCp (εj ))]1,
(7.1.39)
288
Stochastic Process
and mj ≡ mj −1 ∨ nj ,
(7.1.40)
where m0 ≡ 0. 8. Now let ε > 0 be arbitrary. Fix j ≡ [0 ∨ (5 − log2 ε)]1 . Then 2−j < 2−5 ε. Define
where (|γk |)k=1,2,... arbitrary such that
δmeas (ε,δCp, ξQ ) ≡ 2−γ (m(j ))−4 ε2, Cp (Q∞ × 0,S) be ≡ (|Bk |)k=1,2,... ≡ ξQ . Let Z,Z ∈ R
ρ P rob,Q(∞) (Z,Z ) < δmeas (ε,δCp, ξQ ).
(7.1.41)
We will verify that ρSup,P rob ( meas,ξ(Q) (Z), meas,ξ(Q) (Z )) < ε. 9. Write X ≡ meas,ξ(Q) (Z), and X ≡ meas,ξ(Q) (Z ). Thus Z,Z : Q∞ × 0 → S are r.f.’s, and X,X : Q × → S are measurable r.f.’s. As in the proof of Lemma 7.1.5, define the full subset domain(Zq ) D0 ≡ q∈Q(∞)
of (0,L0,E0 ). Similarly, define the full subset D0 ≡ q∈Q(∞) domain(Zq ) of (0,L0,E0 ). Then D0 D0 is a full subset of 0 . Note that inequality 7.1.41 is equivalent to E0
∞
t (i),Z ) < 2−γ (m(j ))−4 ε2 . 2−i d(Z t (i)
(7.1.42)
i=1
Hence, by Chebychev’s inequality, there exists a measurable set A ⊂ 0 with E0 Ac < 2−2 ε, such that ∞
−γ (m(j ))−2 2−i d(Z(t ε, i ,ω0 ),Z (ti ,ω0 )) ≤ 2
(7.1.43)
i=1
for each ω0 ∈ A. 10. Consider each t ∈ Q. According to Step 5 in the proof of Lemma 7.1.5, the set t ≡
γ (m(j )) i=1
m(j ),i,t ≡
γ (m(j ))
+ (λ+ m(j ),i−1 (t),λm(j ),i (t))
i=1
t × AD0 D be arbitrary. Then is a full subset of 1 ≡ [0,1]. Let (θ,ω0 ) ∈ 0 (θ,ω0 ) ∈ m(j ),i,t × AD0 D0 for some i = 1, . . . ,γm(j ) . Therefore the defining equality 7.1.9 in the proof of Lemma 7.1.5 says that
Measurable Random Field (m(j ))
Xt
289
(θ,ω0 ) = Z(qm(j ),i ,ω0 )
(7.1.44)
(θ,ω0 ) = Z (qm(j ),i ,ω0 ).
(7.1.45)
and (m(j ))
Xt
Separately, inequality 7.1.43 implies that γ (m(j ))
−2 d(Z(t k ,ω0 ),Z (tk ,ω0 )) ≤ 2 ε.
(7.1.46)
k=1
Consequently, t(m(j )) (θ,ω0 ),Xt (m(j )) (θ,ω0 )) = d(Z(q d(X m(j ),i ,ω0 ),Z (qm(j ),i ,ω0 )) γ (m(j ))
≤
d(Z(q m(j ),k ,ω0 ),Z (qm(j ),k ,ω0 ))
k=1 γ (m(j ))
=
−2 d(Z(t k ,ω0 ),Z (tk ,ω0 )) ≤ 2 ε,
k=1
(7.1.47) where the second equality is thanks to the enumeration equality 7.1.5 in Definition 7.1.4, where the last inequality is inequality 7.1.46, and where t × AD0 D is arbitrary. Hence, since d ≤ 1, it follows from inequality (θ,ω0 ) ∈ 0 7.1.47 that t E d(X
(m(j ))
(m(j ))
,Xt
t(m(j )),Xt (m(j )) )1 ) ≤ E d(X (t)×AD(0)D (0) + E1 (t)×(AD(0)D (0))c ≤ 2−2 ε + (I1 ⊗ E0 )1 (t)×Ac = 2−2 ε + E0 1Ac < 2−2 ε + 2−2 ε = 2−1 ε.
(7.1.48)
11. Separately, inequality 7.1.21 says that t E d(X
(m(k))
(m(k+1))
,Xt
) < 2−k+1,
(7.1.49)
for arbitrary k ≥ 1. Hence t E d(X
(m(j ))
t(m(j )),Xt(m(j +1)) ) + E d(X t(m(j +1)),Xt(m(j +2)) ) + · · · ,Xt ) ≤ E d(X ≤ 2−j +1 + 2−j + · · · = 2−j +2 .
(7.1.50)
Similarly, (m(j ))
t E d(X
,Xt ) ≤ 2−j +2 .
(7.1.51)
Combining inequalities 7.1.48, 7.1.50, and 7.1.51, we obtain t ,Xt ) < 2−1 ε + 2−j +2 + 2−j +2 = 2−1 ε + 2−j +3 < 2−1 ε + 2−1 ε = ε, E d(X
290
Stochastic Process
where t ∈ Q is arbitrary. Therefore t ,Xt ) < ε. ρSup,P rob (X,X ) ≡ sup E d(X t∈Q
In other words, ρSup,P rob ( meas,ξ(Q) (Z), meas,ξ(Q) (Z )) < ε, provided that
ρ P rob,Q(∞) (Z,Z ) < δmeas (ε,δCp, ξQ ). Thus δmeas (·,δCp, ξQ ) is a modulus of continuity of meas,ξ(Q) on
(7.1.52)
Cp,δ(Cp) (Q,S)|Q∞ ) = DKS,ξ ( Q,Q(∞) (F Cp,δ(Cp) (Q,S))). DKS,ξ (F 12. Combining, we conclude that the composite function meas,ξ,ξ(Q) in expression 7.1.32 is indeed uniformly continuous, with the composite modulus of continuity δmeas,ξ,ξ(Q (·,δCp,β, ξ , ξQ ) ≡ δQ,Q(∞) (δDKS (δmeas (·,δCp, ξQ ), ξ ,β)) = δDKS (δmeas (·,δCp, ξQ ), ξ ,β),
as alleged.
7.2 Measurable Gaussian Random Field In this section, let (S,d) ≡ (R,decld ), where decld is the Euclidean metric. Let (Q,dQ ) be a compact metric space with an arbitrary but fixed distribution IQ . As an application of Theorem 7.1.7, we will construct a measurable Gaussian r.f. X : Q × → R from its continuous mean and covariance functions; we will then prove the continuity of this construction. For that purpose we need only prove that from the mean and covariance functions we can construct a consistent family of normal f.j.d.’s that is continuous in probability, and that the construction is continuous. Let ξ , and ξQ ≡ (Bn )n=1,2,... be arbitrary but fixed binary approximations of the Euclidean state space (S,d) ≡ (R,d), and of the compact parameter space (Q,dQ ), respectively, as specified in Definitions 7.1.1 and 7.1.4, respectively, along with associated objects. In particular, recall the enumerated, countably infinite, dense subset Q∞ ≡ {t1,t2, . . .} ≡
∞
Bn
(7.2.1)
n=1
of Q, where Bn ≡ {qn,1, . . . ,qn,γ (n) } = {t1, . . . ,tγ (n) } are sets, for each n ≥ 1. Definition 7.2.1. Gaussian r.f. Let (,L,E) be an arbitrary probability space. A real-valued r.f. X : Q × → R is said to be Gaussian if its marginal distributions are normal. The functions μ(t) ≡ EXt and σ (t,s) ≡ E(Xt − EXt )(Xs − EXs ) for each t,s ∈ Q are then called the mean and covariance functions,
Measurable Random Field
291
respectively, of the r.f. X. A Gaussian r.f. is said to be centered if μ(t) ≡ EXt = 0 for each t ∈ Q. Without loss of generality, we treat only Gaussian r.f.’s that are centered. Meas,Cp (Q×,S) whose members Gauss (Q×,S) denote the subset of R Let R are Gaussian r.f.’s. Recall the matrix terminologies in Definition 5.7.1, especially those related to nonnegative definite matrices. Definition 7.2.2. Nonnegative definite functions. Let σ : Q2 → [0,∞) be an arbitrary symmetric function. If, for each m ≥ 1 and for each r1, . . . ,rm ∈ Q, the square matrix [σ (rk ,rh )]k=1,...,m;h=1,...,m is nonnegative definite, then σ is said to be a nonnegative definite function on the set Q2 . If, for each m ≥ 1 and for each r1, . . . ,rm ∈ Q such that dQ (rk ,rh ) > 0 for each k,h = 1, . . . ,m with k h, the matrix [σ (rk ,rh )]k=1,...,m;h=1,...,m is positive definite, then the function σ is said to be positive definite on the set Q2 . Let Gδ(cov),b (Q) denote the set of continuous nonnegative definite functions 2 ) whose members share some common modulus on the metric space (Q2,dQ 2 ). Equip of continuity δcov , as well as some common bound b > 0, on (Q2,dQ cov defined by Gδ(cov),b (Q) with the metric ρ ρ cov (σ,σ ) ≡
sup
|σ (t,s) − σ (t,s)|
(t,s)∈Q×Q
for each σ,σ ∈ Gδ(cov),b (Q). Proposition 7.2.3. Consistent family of normal f.j.d.’s from covariance function. Let Gδ(cov),b (Q) be as described in Definition 7.2.2. Let σ ∈ Gδ(cov),b (Q) be arbitrary. Thus σ : Q × Q → [0,∞) is a continuous nonnegative definite function. Let m ≥ 1 and r ≡ (r1, . . . ,rm ) ∈ Qm be arbitrary. Write the nonnegative definite matrix σ (r) ≡ [σ (rk ,rh )]k=1,...,m;h=1,...,m
(7.2.2)
and define the normal distribution σ ≡ 0,σ (r) Fr(1),...,r(m)
(7.2.3)
on R n with mean 0 and covariance matrix σ (r). Then the following conditions hold: 1. The family σ : m ≥ 1;r1, . . . ,rm ∈ Q} F σ ≡ covar,fj d (σ ) ≡ {Fr(1),...,r(m)
(7.2.4)
of f.j.d.’s is consistent. 2. The consistent family F σ is continuous in probability, with a modulus of continuity in probability δCp ≡ δCp,δ(·,cov) defined by
292
Stochastic Process δCp (ε) ≡ δCp,δ(·,cov) (ε) ≡ δcov (2−1 ε2 )
for each ε > 0. 3. The consistent family F σ is pointwise tight on Q, with a modulus of pointwise tightness β ≡ βb defined by 2 βb (ε,r) ≡ ε−1 b for each ε > 0, for each r ∈ Q . 4. Thus we have a function Cp,β (Q,R), covar,fj d : Gδ(cov),b (Q) → F (Q,R) whose members share the Cp,β (Q,R) is a subset of F where F common modulus of continuity in probability δCp ≡ δCp,δ(·,cov) as well as the common modulus of pointwise tightness β ≡ βb . Proof. 1. First prove the consistency of the family F σ . To that end, let n,m ≥ 1 be arbitrary. Let r ≡ (r1, . . . ,rm ) be an arbitrary sequence in Q, and let j ≡ (j1, . . . ,jn ) be an arbitrary sequence in {1, . . . ,m}. Let the matrix σ (r) be defined σ ≡ 0,σ (r) is the distribution of as in equality 7.2.2. By Lemma 5.7.8, Fr(1),...,r(m) an r.v. Y ≡ AZ, where A is an m × m matrix such that σ (r) = AAT , and where Z is a standard normal r.v. on some probability space (,L,E), with values in R m . Let the dual function j ∗ : R m → R n be defined by j ∗ (x1, . . . ,xm ) ≡ (xj (1), . . . ,xj (n) ), for each x ≡ (x1, . . . ,xm ) ∈ R m . Then j ∗ (x) = Bx for each x ∈ R m , where the n × m matrix B ≡ [bk,h ]k=1,...,n;h=1,...,m ≡ BA. is defined by bk,h ≡ 1 or 0 depending on whether h = jk or h jk . Let A Define the n × n matrix A T = BAAT B T = Bσ (r)B T σ (r) ≡ A = [σ j (k),j (h) ]k=1,...,n;h=1,...,n = [σ (rj (k),rj (h) )]k=1,...,n;h=1,...,n, where the fourth equality is by using the definition of the matrix B and carrying out the matrix multiplication. Then, by the defining formula 7.2.3, σ Fr(j σ (r) . (1)),...,r(j (n)) = 0,
At the same time, the r.v. ≡ j ∗ (Y ) = BY = BAZ ≡ AZ Y has the normal characteristic function defined by
Measurable Random Field = exp − 1 λT A A T λ = exp − 1 λT E(exp iλT AZ) σ (r)λ 2 2
293
has the normal distribution 0, for each λ ∈ R n . Hence Y σ (r) . Combining, we see n that for each f ∈ C(R ), σ ) Fr(1),...,r(m) ( f ◦ j ∗ ) = E( f ◦ j ∗ (Y )) = Ef (Y σ = 0, σ (r) ( f ) = Fr(j (1)),...,r(j (n)) f .
We conclude that the family F σ of f.j.d.’s is consistent. Assertion 1 is verified. 2. To prove the continuity in probability of the family F σ , consider the case where m = 2. Consider each r ≡ (r1,r2 ) ∈ Q2 with 1 2 (7.2.5) ε . dQ (r1,r2 ) < δCp (ε) ≡ δcov 2 Let d denote the Euclidean metric for R. As in Step 1, there exists an r.v. Y ≡ (Y1,Y2 ) with values in R 2 with the normal distribution 0,σ (r) , where σ (r) ≡ [σ (rk ,rh )]k=1,2;h=1,2 . Then σ Fr(1),r(2) (1 ∧ d) = E1 ∧ |Y1 − Y2 | = 0,σ (r) (1 ∧ d) ≤ 0,σ (r) d 2 = σ (r1,r1 ) − 2σ (r1,r2 ) + σ (r2,r2 ) 2 ≤ |σ (r1,r1 ) − σ (r1,r2 )| + |σ (r2,r2 ) − σ (r1,r2 )| < 1 2 1 2 ε + ε = ε, ≤ 2 2
where the second inequality is Lyapunov’s inequality, and where the last inequality is thanks to inequality 7.2.5. Thus F σ is continuous in probability, with δCp (·,δcov ) as a modulus of continuity in probability. Assertion 2 is proved. 3. Let r ∈ Q be arbitrary. Let √ ε > 0 be arbitrary. Let n ≥ 1 be arbitrary and so large that n > βb (ε,r) ≡ ε−1 b. Define the function hn ≡ 1 ∧ (1 + n − | · |)+ ∈ C(R,d). Then there√exists an r.r.v. Y with the normal distribution 0,[r] ≡ 0,σ (r,r) . Take any α ∈ ( ε−1 b,n). Then 1 − Frσ hn ≡ 1 − 0,[r] hn ≡ E(1 − hn (Y )) ≤ E1(|Y |>α) ≤ α −2 EY 2 = α −2 σ (r,r) ≤ (εb−1 )b = ε, where the second inequality is by Chebychev’s inequality. Thus βb is a modulus of pointwise tightness of the family F σ in the sense of Definition 6.3.5, as alleged in Assertion 3. 4. Assertion 4 is a summary of Assertions 1–3.
294
Stochastic Process
Proposition 7.2.4. Normal f.j.d.’s depend continuously on a covariance function. Let Gδ(cov),b (Q) be as in Definition 7.2.2. Equip Gδ(cov),b (Q) with the metric ρ cov defined by ρ cov (σ,σ ) ≡
|σ (t,s) − σ (t,s)|
sup (t,s)∈Q×Q
for each
σ,σ
∈ Gδ(cov),b (Q). Recall from Definition 7.1.6 the metric space Cp,β (Q,R), (F ρCp,ξ,Q,Q(∞) )
of consistent families of f.j.d.’s with parameter space (Q,dQ ) and state space R, whose members share the common modulus of continuity in probability δCp ≡ δCp,δ(·,cov) as well as the common modulus of pointwise tightness β ≡ βb . Then the function Cp,β (Q,R), ρcov ) → (F ρCp,ξ,Q,Q(∞) ) covar,fj d : (Gδ(cov),b (Q), in Proposition 7.2.3 is uniformly continuous, with a modulus of continuity δcovar,fj d defined in equality 7.2.8 in the following proof. Proof. Recall that Q∞ ≡ {t1,t2, . . .} is an enumeration of the dense subset of Q. 1. Let ε > 0 be arbitrary. Let n ≥ 1 be arbitrary. By Theorem 5.8.11, there exists δch,dstr (ε,n) > 0 such that, for arbitrary distributions J,J on R n whose respective characteristic functions ψ,ψ satisfy
ρchar,n (ψ,ψ ) ≡
∞
j =1
2−j sup |ψ(λ) − ψ (λ)| < δch,dstr (ε,n), |λ|≤j
(7.2.6)
we have ρDist,ξ n (J,J ) < ε,
(7.2.7)
where ρDist,ξ n is the metric on the space of distributions on R n , as in Definition 5.3.4. 2. Let ε > 0 be arbitrary. Let m ≥ 1 be so large that 2−m+1 < ε. Let K ≥ 1 be so large that 2−K < α ≡ 2−1
m >
δch,dstr (2−1 ε,n).
n=1
Define δcovar,fj d (ε) ≡ 2K −2 m−2 α.
(7.2.8)
We will verify that δcovar,fj d is a desired modulus of continuity of the function covar,fj d . 3. To that end, let σ,σ ∈ Gδ(cov),b (Q) be arbitrary such that ρ cov (σ,σ ) < δcovar,fj d (ε).
(7.2.9)
Measurable Random Field
295
Let F σ ≡ covar,fj d (σ ) and F σ ≡ covar,fj d (σ ) be constructed as in Theorem 7.2.3. We will show that ρ Cp,ξ,Q,Q(∞) (F σ ,F σ ) < ε. 4. First note that inequality 7.2.9 is equivalent to |σ (t,s) − σ (t,s)| < 2K −2 m−2 α.
sup
(7.2.10)
(t,s)∈Q×Q
Next, consider each n = 1, . . . ,m. By Theorem 7.2.3, the joint normal distribution Ftσ(1),...,t (n) has mean 0 and covariance matrix σ ≡ [σ (tk ,th )]k=1,...,n;h=1,...,n .
(7.2.11)
Hence it has a characteristic function defined by n n 1
σ χt (1),...,t (n) (λ) ≡ exp − λk σ (tk ,th )λh 2 k=1 h=1
for each λ ≡ (λ1, . . . ,λn ) ∈ R n , with a similar equality for σ . Let λ ≡ (λ1, . . . ,λn ) ∈ R n be arbitrary. As an abbreviation, write 1
γ (λ) ≡ − λk σ (tk ,th )λh 2 n
n
k=1 h=1
and γ (λ) ≡ −
1
λk σ (tk ,th )λh . 2 n
n
k=1 h=1
Then, because the function σ is nonnegative definite, we have γ (λ) ≤ 0 and γ (λ) ≤ 0. Consequently, |eγ (λ) − eγ
(λ)
| ≤ |γ (λ) − γ (λ)|.
5. Suppose |λ| ≤ K. Then |eγ (λ) − eγ
(λ)
| ≤ |γ (λ) − γ (λ)| 1
|λk σ (tk ,th )λh − λk σ (tk ,th )λh | 2 n
≤
n
k=1 h=1
= ≤
n n 1
2
k=1 h=1
n n 1
2
|λk | · |σ (tk ,th ) − λk σ (tk ,th )| · |λh | |λk | · 2K −2 m−2 α · |λh |
k=1 h=1
≤ n K · K −2 m−2 α · K ≤ α, 2
where the third inequality is by inequality 7.2.10.
296
Stochastic Process
6. We estimate
ρchar,n (χtσ(1),...,t (n),χtσ(1),...,t (n) ) ≡
∞
2−j sup |χtσ(1),...,t (n) (λ) − χtσ(1),...,t (n) (λ)| |λ|≤j
j =1
≤ sup |eγ (λ) − eγ |λ|≤K
(λ)
|+
∞
2−j ≤ α +
j =K+1 −K
=α+2
∞
2−j
j =K+1 −1
< α + α ≤ δch,dstr (2
ε,n).
Hence, according to inequality 7.2.7, we have
ρDist,ξ n (Ftσ(1),...,t (n),F tσ(1),...,t (n) ) < 2−1 ε, where n = 1, . . . ,m is arbitrary. Therefore, according to Definition 6.2.12,
ρ Cp,ξ,Q,Q(∞) (F σ ,F σ ) ≡
∞
2−n ρDist,ξ n (Ftσ(1),...,t (n),F tσ(1),...,t (n) )
n=1
≤
m
2−nρDist,ξ n (Ftσ(1),...,t (n),F σt(1),...,t (n) ) +
n=1
≤
m
∞
2−n
n=m+1
2−n 2−1 ε + 2−m < 2−1 ε + 2−1 ε = ε,
(7.2.12)
n=1
where we used the bounds 0 ≤ ρDist,ξ n ≤ 1 for each n ≥ 1. Since ε > 0 is arbitrarily small, we conclude that the function covar,fj d is uniformly continuous, with modulus of continuity δcovar,fj d . The proposition is proved. Now we can apply the theorems from Section 7.1. As before, let ! (0,L0,I0 ) ≡ (1,L1,I1 ) ≡ [0,1],L1, ·dθ be the Lebesgue integration space based on the interval [0,1], and let (,L,E) ≡ (1,L1,E1 ) ⊗ (0,L0,E0 ). Theorem 7.2.5. Construction of a measurable Gaussian r.f. from a continuous covariance function, and continuity of said construction. Let Gδ(cov),b (Q) be cov defined by as in Definition 7.2.2. Equip Gδ(cov),b (Q) with the metric ρ ρ cov (σ,σ ) ≡
sup
|σ (t,s) − σ (t,s)|
(t,s)∈Q×Q
for each σ,σ ∈ Gδ(cov),b (Q). Recall from Definition 7.1.3 the metric space Meas,Cp (Q × ,R),ρSup,P rob ) of measurable r.f.’s X : Q × → R that (R are continuous in probability.
Measurable Random Field
297
Then there exists a uniformly continuous mapping Meas,Cp (Q × ,R),ρSup,P rob ), ρcov ) → (R cov,gauss,ξ,ξ(Q) : (Gδ(cov),b (Q), with a modulus of continuity δcov,gauss (·,δcov,b, ξ , ξQ ), with the following properties: 1. Let σ ∈ Gδ(cov),b (Q) be arbitrary. Then X ≡ cov,gauss,ξ,ξ(Q) (σ ) is a Gaussian r.f.. 2. Let r1,r2 ∈ Q be arbitrary. Then EXr(1) = 0,and EXr(1) Xr(2) = σ (r1,r2 ). We will call the function cov,gauss,ξ,ξ(Q) the measurable Gaussian extension relative to the binary approximations ξ and ξQ . Cp,β (Q,R), Proof. 1. Recall from Definition 7.1.6 the metric space (F ρ Cp,ξ,Q,Q(∞) ) of consistent families of f.j.d.’s with parameter space (Q,dQ ) and state space R, whose members share the common modulus of continuity in probability δCp ≡ δCp,δ(·,cov) as well as the common modulus of pointwise tightness β ≡ βb . 2. Recall, from Theorem 7.1.7, the uniformly continuous mapping Cp,β (Q,S), Meas,Cp (Q × ,S),ρSup,P rob ), ρCp,ξ,Q,Q(∞) ) → (R meas,ξ,ξ(Q) : (F (7.2.13) with a modulus of continuity δmeas,ξ,ξ(Q (·,δCp,β, ξ , ξQ ). 3. Recall, from Proposition 7.2.4, the uniformly continuous mapping Cp,β (Q,R), covar,fj d : (Gδ(cov),b (Q), ρcov ) → (F ρCp,ξ,Q,Q(∞) ),
(7.2.14)
with a modulus of continuity δcovar,fj d . 4. We will verify that the composite mapping ρcov ) cov,gauss,ξ,ξ(Q) ≡ meas,ξ,ξ(Q) ◦ covar,fj d : (Gδ(cov),b (Q), Meas,Cp (Q × ,R),ρSup,P rob ) → (R has the desired properties. 5. First note that the composite function cov,gauss,ξ,ξ(Q) is uniformly continuous, with a modulus of continuity defined by the composite operation δcov,gauss (·,δcov,b, ξ , ξQ ) ≡ δcovar,fj d (δmeas,ξ,ξ(Q (·,δCp,β, ξ , ξQ )). 6. Next, let σ ∈ Gδ(cov),b (Q) be arbitrary. Let X ≡ cov,gauss,ξ,ξ(Q) (σ ) ≡ meas,ξ,ξ(Q) ◦ covar,fj d (σ ) : Q × → R. Then X ≡ meas,ξ,ξ(Q) (F σ ) ≡ meas,ξ,ξ(Q) ◦ covar,fj d (σ ), where F σ ≡ covar,fj d (σ ). By Proposition 7.2.3, F σ ≡ covar,fj d (σ ) is a consistent family of of normal f.j.d.’s and is continuous in probability. According
298
Stochastic Process
to Theorem 7.1.7, the measurable r.f. X has marginal distributions given by F σ . Hence X is a Gaussian r.f and is continuous in probability. Assertion 1 is proved. 7. Now let r1,r2 ∈ Q. Then the r.r.v.’s Xr(1),Xr(2) have joint normal distribution σ = 0,σ , where σ ≡ [σ (ri ,rj )]i,j =1,2 . Hence, by Lemma 5.7.8, we Fr(1),r(2) have EXr(1) = 0,and EXr(1) Xr(2) = σ (r1,r2 ). Assertion 2 and the theorem are proved.
8 Martingale
In this chapter, we define a martingale X ≡ {Xt : t = 1,2, . . .} for modeling one’s fortune in a fair game of chance. Then we will prove the basic theorems on martingales, which have wide-ranging applications. Among these is the a.u. convergence of Xt as t → ∞. Our proof is constructive and quantifies rates of convergence by means of a maximal inequality. There are proofs in traditional texts that also are constructive and quantify rates similarly by means of maximal inequalities. These traditional maximal inequalities, however, require the integrability of |Xt |p for some p > 1, or at least the integrability of |Xt | log |Xt |. For the separate case of p = 1, the classical proof of a.u. convergence is by a separate inference from certain upcrossing inequalities. Such inference is essentially equivalent to the principle of infinite search and is not constructive. In contrast, the maximal inequality we present requires only the integrability of |Xt |. Therefore, thanks to Lyapunov’s inequality, it is at once applicable to the case of integrable |Xt |p for any given p ≥ 1, without having to first determine whether p > 1 or p = 1. For readers who are uninitiated in the subject, the previous paragraphs are perhaps confusing, but will become clear as we proceed. For the rich body of classical results on, and applications of, martingales, see, e.g., [Doob 1953; Chung 1968; Durret 1984].
8.1 Filtration and Stopping Time Definition 8.1.1. Assumptions and notations. In this chapter, let (S,d) be a locally compact metric space with an arbitrary but fixed reference point x◦ . Let (,L,E) be an arbitrary probability space. Unless otherwise specified, an r.v. refers to a measurable function with values in S. Let Q denote an arbitrary nonempty subset of R, called the time parameter set. If (,L ,E) is a probability subspace of (,L,E), we will simply call L a probability subspace of L when and E are understood. As an abbreviation, we will write A ∈ L if A is a measurable subset of (,L,E). Thus A ∈ L iff 1A ∈ L, in which case we will write P (A), P A, E1A , and EA 299
300
Stochastic Process
interchangeably, and write E(X;A) ≡ EX1A for each X ∈ L. As usual, we write a subscripted expression xy interchangeably with x(y). Recall the convention in Definition 5.1.3 regarding regular points of r.r.v.’s. Definition 8.1.2. Filtration and adapted process. Suppose that for each t ∈ Q, there exists a probability subspace (,L(t),E) of (,L,E), such that L(t) ⊂ L(s) for each t,s ∈ Q with t ≤ s. Then the family L ≡ {L(t) : t ∈ Q} is called a filtration in (,L,E). The filtration L is said to be right continuous if, for each t ∈ Q, we have L(s) . L(t) = s∈Q;s>t
is a subset of Q. Then a stochastic process X : Q× Separately, suppose Q → (t) S is said to be adapted to the filtration L if Xt is a r.v. on (,L ,E) for each t ∈ Q. The probability space L(t) can be regarded as the observable history up to the time t. Thus a process X adapted to L is such that Xt is observable at the time t, for each t ∈ Q. Definition 8.1.3. Natural filtration of a stochastic process. Let X : Q× → S be an arbitrary stochastic process. For each t ∈ Q, define the set G(X,t) ≡ {Xr : r ∈ Q; r ≤ t}, and let L(X,t) ≡ L(Xr : r ∈ Q; r ≤ t) ≡ L(G(X,t) ) be the probability subspace of L generated by the set G(X,t) of r.v.’s. Then the family LX ≡ {L(X,t) : t ∈ Q} is called the natural filtration of the process X. Lemma 8.1.4. A natural filtration is indeed a filtration. Let X : Q× → S be an arbitrary stochastic process. Then the natural filtration LX of X is a filtration to which the process X is adapted. Proof. For each t ≤ s in Q, we have G(X,t) ⊂ G(X,s) , whence L(X,t) ⊂ L(X,s) . Thus LX is a filtration. Let t ∈ Q be arbitrary. Then f (Xt ) ∈ L(G(X,t) ) ≡ L(X,t) for each f ∈ Cub (S,d). At the same time, because Xt is an r.v. on (,L,E), we have P (d(Xt ,x◦ ) ≥ a) → 0 as a → ∞. Hence Xt is an r.v. on (,L(X,t),E) according to Proposition 5.1.4. Thus the process X is adapted to its natural filtration LX . Definition 8.1.5. Right-limit extension and right continuity of a filtration. Suppose (i) Q = [0,∞) or (ii) Q ≡ [0,a] for some a > 0. Suppose Q is a subset that is dense in Q and that, in case (ii), contains the endpoint a. Let L ≡ {L(t) : t ∈ Q} be an arbitrary filtration of a given probability space (,L,E).
Martingale
301
In case (i), define, for each t ∈ Q, the probability subspace {L(s) : s ∈ Q ∩ (t,∞)}. L(t+) ≡
(8.1.1)
of L. In case (ii), define, for each t ∈ Q, the probability subspace {L(s) : s ∈ Q ∩ (t,a] ∪ {a})}. L(t+) ≡
(8.1.2)
Then the filtration L+ ≡ {L(t+) : t ∈ Q} is called the right-limit extension of the filtration L. If Q = Q and L(t) = L(t+) for each t ∈ Q, then L is said to be a right continuous filtration. Lemma 8.1.6. Right-limit extension of a filtration is right continuous. In the notations of Definition 8.1.5, we have (L+ )+ = L+ . In words, the right-limit extension of the filtration L ≡ {L(t) : t ∈ Q} is right continuous. Proof. We will give the proof only for the case where Q = [0,∞), with the proof for the case where Q ≡ [0,a] being similar. To that end, let t ∈ Q be arbitrary. Then {L(s+) : s ∈ Q ∩ (t,∞)} (L(t+)+ ) ≡ B A ≡ {L(u) : u ∈ Q ∩ (s,∞)} : s ∈ Q ∩ (t,∞) = {L(u) : u ∈ Q ∩ (t,∞)} ≡ L(t+), where the third equality is because u ∈ Q ∩ (t,∞) iff u ∈ Q ∩ (s,∞) for some s ∈ Q ∩ (t,∞), thanks to the assumption that Q is dense in Q. Definition 8.1.7. r.r.v. with values in a subset of R. Let A denote an arbitrary nonempty subset of R. We say that an r.r.v. η has values in the subset A if (η ∈ A) is a full set. Lemma 8.1.8. r.r.v. with values in an increasing sequence in R. Let the subset A ≡ {t0,t1, . . .} ⊂ R be arbitrary such that tn−1 < tn for each n ≥ 1. Then an r.r.v. η has values in A iff (i) (η = tn ) is measurable for each n ≥ 0 and (ii) ∞ n=1 P (η = tn ) = 1. Proof. Recall Definition 4.8.13 of regular points of a real-valued measurable function. Suppose the r.r.v. η has values in A. For convenience, write t−1 ≡ t0 −(t1 −t0 ). Consider each n ≥ 0. Write n ≡ 2−1 ((tn −tn−1 )∧(tn+1 −tn )) > 0. Then there exist regular points t,s of the r.r.v. η such that tn−1 < tn − n < s < tn < t < t + n < tn+1 . Then (η = tn ) = (η ≤ t)(η ≤ s)c (η ∈ A). Since (η ≤ t), (η ≤ s), and (η ∈ A) are measurable subsets, it follows that the set (η = tn ) is measurable. At the same time, P (η ≤ tm ) ↑ 1 as m → ∞ since η is an r.r.v. Hence
302
Stochastic Process m
P (η = tn ) = P (η ≤ tm ) ↑ 1
n=1
as m → ∞. In other words, ∞ n=1 P (η = tn ) = 1. Thus we have proved that if the r.r.v. η has values in A, then Conditions (i) and (ii) hold. Conversely, if Conditions (i) and (ii) hold, then ∞ n=1 (η = tn ) is a full set that is contained in (η ∈ A), whence the latter set is a full set. Thus the r.r.v. η has values in A. Definition 8.1.9. Stopping time, space of integrable observables at a stopping time, and simple stopping time. Let Q denote an arbitrary nonempty subset of R. Let L be an arbitrary filtration with time parameter set Q. Then an r.r.v. τ with values in Q is called a stopping time relative to the filtration L if (τ ≤ t) ∈ L(t)
(8.1.3)
for each regular point t ∈ Q of the r.r.v. τ . We will omit the reference to L when it is understood from the context, and simply say that τ is a stopping time. Each r.v. relative to the probability subspace L(τ ) ≡ {Y ∈ L : Y 1(τ ≤t) ∈ L(t) for each regular point t ∈ Q of τ } is said to be observable at the stopping time τ . Each member of L(τ ) is called an integrable observable at the stopping time τ . Let X : Q × → S be an arbitrary stochastic process adapted to the filtration L. Define the function Xτ by domain(Xτ ) ≡ {ω ∈ domain(τ ) : (τ (ω),ω) ∈ domain(X)} and by Xτ (ω) ≡ X(τ (ω),ω)
(8.1.4)
for each ω ∈ domain(Xτ ). Then the function Xτ is called the observation of the process X at the stopping time τ . In general, Xτ need not be a well-defined r.v. We will need to prove that Xτ is a well-defined r.v. in each application before using it as such. A stopping time τ with values in some metrically discrete finite subset of Q is called a simple stopping time. We leave it as an exercise to verify that L(τ ) is indeed a probability subspace. A trivial example of a stopping time is a deterministic time τ ≡ s, where s ∈ Q is arbitrary. Definition 8.1.10. Specialization to a metrically discrete parameter set. In the remainder of this section, unless otherwise specified, assume that the parameter set Q ≡ {0,,2, . . .} is equally spaced, with some fixed > 0, and let L ≡ {L(t) : t ∈ Q} be an arbitrary but fixed filtration in (,L,E) with parameter Q.
Martingale
303
Proposition 8.1.11. Basic properties of stopping times: a metrically discrete case. Let τ and τ be stopping times with values in Q ≡ {0,,2, . . .}, relative to the filtration L. For each n ≥ −1, write tn ≡ n for convenience. Then the following conditions hold: 1. Let η be an r.r.v. with values in Q . Then η is a stopping time iff (η = tn ) ∈ L(t (n)) for each n ≥ 0. 2. τ ∧ τ , τ ∨ τ are stopping times. 3. If τ ≤ τ , then L(τ ) ⊂ L(τ ) . 4. Let X : Q × → S be an arbitrary stochastic process adapted to the filtration L. Then Xτ is a well-defined r.v. on the probability space (,L(τ ),E). Proof. 1. By Lemma 8.1.8, the set (η = tn ) is measurable for each n ≥ 0, and ∞ n=1 P (η = tn ) = 1. Suppose η is a stopping time. Let n ≥ 0 be arbitrary. Then (η ≤ tn ) ∈ L(t (n)) . Moreover, if n ≥ 1, then (η ≤ tn−1 )c ∈ L(t (n−1)) ⊂ L(t (n)) . If n = 0, then (η ≤ tn−1 )c = (η ≥ 0) is a full set, whence (η ≥ 0) ∈ L(t (n)) . Combining, we see that (η = tn ) = (η ≤ tn )(η ≤ tn−1 )c ∈ L(t (n)) . We have proved the “only if” part of Assertion 1. Conversely, suppose (η = tn ) ∈ L(t (n)) for each n ≥ 0. Let t ∈ Q be arbitrary. Then t = tm for some m ≥ 0. Hence (η ≤ t) = m n=0 (η = tn ), where, by assumption, (η = tn ) ∈ L(t (n)) ⊂ L(t (m)) for each n = 0, . . . ,m. Thus we see that (η ≤ t) is observable at time t, where t ∈ Q is arbitrary. We conclude that η is a stopping time. 2. Let t ∈ Q be arbitrary. Then, since Q is countable and discrete, we have (τ ∧ τ ≤ t) = (τ ≤ t) ∪ (τ ≤ t) ∈ L(t) and (τ ∨ τ ≤ t) = (τ ≤ t)(τ ≤ t) ∈ L(t) . Thus τ ∧ τ and τ ∨ τ are stopping times. 3. Let Y ∈ L(τ ) be arbitrary. Consider each t ∈ Q. Then, since τ ≤ τ ,
Y 1(τ ≤t) = Y 1(τ =s) 1(τ ≤t) = Y 1(τ =s) 1(τ ≤t) ∈ L(t) . s∈Q
s∈[0,t]Q
Thus Y ∈ L(τ ), where Y ∈ L(τ ) is arbitrary. We conclude that L(τ ) ⊂ L(τ ) . 4. Let X : Q× → S be an arbitrary stochastic process adapted to the filtration ∞ ∞ L. Define the full sets A ≡ n=0 domain(Xt (n) ) and B ≡ n=0 (τ = tn ). Consider each ω ∈ AB. Then (τ (ω),ω) = (tn,ω) ∈ domain(X) on (τ = tn ), for each n ≥ 0. In short, Xτ is defined and is equal to the r.v. Xt (n) on (τ = tn ), for each n ≥ 0. Since ∞ n=0 (τ = tn ) is a full set, the function Xτ is therefore an r.v. according to Proposition 4.8.8. Assertion 4 is proved. Simple first exit times from a time-varying neighborhood, introduced next, are examples of simple stopping times.
304
Stochastic Process
Definition 8.1.12. Simple first exit time. Let Q ≡ {s0, . . . ,sn } be a finite subset of Q ≡ {0,,2, . . .}, where (s0, . . . ,sn ) is an increasing sequence. Recall that L ≡ {L(t) : t ∈ Q} is a filtration. 1. Let x : Q → S be an arbitrary given function. Let b : Q → (0,∞) be an arbitrary function such that for each t,r,s ∈ Q’, we have either b(s) ≤ d(xt ,xr ) or b(s) > d(xt ,xr ). Let t ∈ Q be arbitrary. Define
r1(d(x(t),x(r))>b(r)) 1(d(x(t),x(s))≤b(s)) ηt,b,Q (x) ≡ r∈Q ;t≤r
s∈Q ;t≤s 0, we will write simply ηt,α,Q (x) for ηt,b,Q (x). 2. More generally, let X : Q × → S be an arbitrary process adapted to the filtration L. Let b : Q → (0,∞) be an arbitrary function such that for each t,r, s ∈ Q , the real number b(s) is a regular point for the r.r.v. d(Xt ,Xr ). Let t ∈ Q be arbitrary. Define the r.r.v. ηt,b,Q (X) on as
ηt,b,Q (X) ≡ r1(d(X(t),X(r))>b(r)) 1(d(X(t),X(s))≤b(s)) r∈Q ;t≤r
+ sn
s∈Q ;t≤s b(ηt,b,Q (ω)). In words, if the simple first exit time occurs before the final time, then the sample path exits successfully at the simple first exit time. 4. If t ≤ s < ηt,b,Q (ω), then d(X(t,ω),X(s,ω)) ≤ b(s). In words, before the simple first exit time, the sample path on [t,sn ]Q remains in the b-neighborhood of the initial point. Moreover, if d(X(ηt,b,Q (ω),ω),X(t,ω)) ≤ b(ηt,b,Q (ω)), then d(X(s,ω),X(t,ω)) ≤ b(s) for each s ∈ Q with t ≤ s ≤ ηt,b,Q (ω). In words, if the sample path is in the b-neighborhood at the simple first exit time, then it is in the b-neighborhood at any time prior to the simple first exit time. Conversely, if r ∈ [t,sn )Q is such that d(X(t,ω),X(s,ω)) ≤ b(s) for each s ∈ (t,r]Q , then r < ηt,b,Q (ω). In words, if the sample path stays within the b-neighborhood up to and including a certain time, then the simple first exit time can come only after that certain time. 5. Suppose s0 = sk(0) < sk(1) < · · · < s k(p) = sn is a subsequence of s0 < s1 < · · · < sn . Define Q ≡ {sk(1), . . . ,s k(p) }. Let t ∈ Q ⊂ Q be arbitrary. Then ηt,b,Q ≤ ηt,b,Q . In other words, if the process X is sampled at more time points, then the more densely sampled simple first exit time can occur no later. Proof. By hypothesis, the process X is adapted to the filtration L. 1. Assertion 1 is obvious from the defining equality 8.1.6. 2. By equality 8.1.6, for each r ∈ {t, . . . ,sn−1 }, we have (ηt,b,Q = r) = (d(Xr ,Xt ) > b(r)) (d(Xs ,Xt ) ≤ b(s)) ∈ L(r) ⊂ L(s(n)) . s∈Q ;t≤s b(ηt,b,Q (ω)), as alleged in Assertion 3. 4. Suppose t < s < r ≡ ηt,b,Q (ω). Then d(X(t,ω),X(s,ω)) ≤ b(s)
(8.1.8)
by equality 8.1.7. The last inequality is trivially satisfied if t = s. Hence if r ≡ ηt,b,Q (ω) = sn with d(X(t,ω),X(r,ω)) ≤ b(r), then inequality 8.1.8 holds for each s ∈ Q with t ≤ s ≤ r. Conversely, suppose r ∈ Q is such that t ≤ r < sn and such that d(X(t,ω),X(s,ω)) ≤ b(s) for each s ∈ (t,r]Q . Suppose s ≡ ηt,b,Q (ω) ≤ r < sn . Then d(X(t,ω), X(s,ω)) > b(s) by Assertion 3, which is a contradiction. Hence ηt,b,Q (ω) > r. Assertion 4 is verified. 5. Let t ∈ Q ⊂ Q be arbitrary. Suppose, for the sake of contradiction, that s ≡ ηt,b,Q (ω) < ηt,b,Q (ω) ≤ sn . Then t < s and s ∈ Q ⊂ Q . Hence, by Assertion 4 applied to the time s and to the simple first exit time ηt,b,Q , we have d(X(t,ω),X(s,ω)) ≤ b(s). On the other hand, by Assertion 3 applied to the time s and to the simple first exit time ηt,b,Q , we have d(X(t,ω),X(s,ω)) > b(s), which is a contradiction. Hence ηt,b,Q (ω) ≥ ηt,b,Q (ω). Assertion 5 is proved.
8.2 Martingale Definition 8.2.1. Martingale and submartingale. Let Q be an arbitrary nonempty subset of R. Let L ≡ {L(t) : t ∈ Q} be an arbitrary filtration in (,L,E). Let X : Q × → R be a stochastic process such that Xt ∈ L(t) for each t ∈ Q. 1. The process X is called a martingale relative to L if, for each t,s ∈ Q with t ≤ s, we have EZXt = EZXs for each indicator Z ∈ L(t) . According to Definition 5.6.4, the last condition is equivalent to E(Xs |L(t) ) = Xt for each t,s ∈ Q with t ≤ s. 2. The process X is called a wide-sense submartingale relative to L if, for each t,s ∈ Q with t ≤ s, we have EZXt ≤ EZXs for each indicator Z ∈ L(t) . If, in addition, E(Xs |L(t) ) exists for each t,s ∈ Q with t ≤ s, then X is called a submartingale relative to L.
Martingale
307
3. The process X is called a wide-sense supermartingale relative to L if, for each t,s ∈ Q with t ≤ s, we have EZXt ≥ EZXs for each indicator Z ∈ L(t) . If, in addition, E(Xs |L(t) ) exists for each t,s ∈ Q with t ≤ s, then X is called a supermartingale relative to L. When there is little risk of confusion, we will omit the explicit reference to the given filtration L. With a martingale, the r.r.v. Xt can represent a gambler’s fortune at the current time t. Then the conditional expectation of said fortune at a later time s, given all information up to and including the current time t, is exactly the gambler’s current fortune. Thus a martingale X can be used as a model for a fair game of chance. Similarly, a submartingale can be used to model a favorable game. Clearly, a submartingale is also a wide-sense submartingale. The two notions are classically equivalent because, classically, with the benefit of the principle of infinite search, the conditional expectation always exists. Hence, any result that we prove for wide-sense submartingales holds classically for submartingales. Proposition 8.2.2. Martingale basics. Let X : Q × → R be an arbitrary process adapted to the filtration L ≡ {L(t) : t ∈ Q}. Unless otherwise specified, all martingales and wide-sense submartingales are relative to the filtration L. Then the following conditions hold: 1. The process X is a martingale iff it is both a wide-sense submartingale and a wide-sense supermartingale. 2. The process X is a wide-sense supermartingale iff −X is a wide-sense submartingale. 3. The expectation EXt is constant for t ∈ Q if X is a martingale. The expectation EXt is nondecreasing in t if X is a wide-sense submartingale. 4. Suppose X is a martingale. Then |X| is a wide-sense submartingale. In particular, E|Xt | is nondecreasing in t ∈ Q. 5. Suppose X is a martingale. Let a ∈ Q be arbitrary. Then the family {Xt : t ∈ (−∞,a]Q} is uniformly integrable. (t) (t) 6. Let L ≡ {L : t ∈ Q} be an arbitrary filtration such that L(t) ⊂ L for each t ∈ Q. Suppose X is a wide-sense submartingale relative to the filtration L. Then X is a wide-sense submartingale relative to the filtration L. The same assertion holds for martingales. 7. Suppose X is a wide-sense submartingale relative to the filtration L. Then it is a wide-sense submartingale relative to the natural filtration LX ≡ {L(X,t) : t ∈ Q} of the process X. Proof. 1. Assertions 1–3 being trivial, we will prove Assertions 4–7 only. 2. To that end, let t,s ∈ Q with t ≤ s be arbitrary. Let the indicator Z ∈ L(t) and the real number ε > 0 be arbitrary. Then E(|Xs |Z;Xt > ε) ≥ E(Xs Z;Xt > ε) = E(Xt Z;Xt > ε) = E(|Xt |Z;Xt > ε),
308
Stochastic Process
where the first equality is from the definition of a martingale. Since −X is also a martingale, we have similarly E(|Xs |Z;Xt < −ε) ≥ E(−Xs Z;Xt < −ε) = E(−Xt Z;Xt < −ε) = E(|Xt |Z;Xt < −ε). Adding the last two displayed inequalities, we obtain E(|Xs |;Z) ≥ E(|Xs |Z;Xt > ε) + E(|Xs |Z;Xt < −ε) ≥ E(|Xt |Z;Xt > ε) + E(|Xt |Z;Xt < −ε) = E(|Xt |Z) − E(|Xt |Z;|Xt | ≤ ε). Since E(|Xt |Z;|Xt | ≤ ε) ≤ E(|Xt |;|Xt | ≤ ε) → 0 as ε → 0, we conclude that E(|Xs |;Z) ≥ E(|Xt |;Z), where t,s ∈ Q with t ≤ s and the indicator Z ∈ L(t) are arbitrary. Thus the process |X| is a wide-sense submartingale. Assertion 4 is proved. 3. Suppose X is a martingale. Consider each a ∈ Q. Let t ∈ Q be arbitrary with t ≤ a, and let ε > 0 be arbitrary. Then, since Xa is integrable, there exists δ ≡ δX(a) (ε) > 0 so small that E|Xa |1A < ε for each measurable set A with P (A) < δ. Now let γ > β(ε) ≡ E|Xa |δ −1 be arbitrary. Then by Chebychev’s inequality, P (|Xt | > γ ) ≤ E|Xt |γ −1 ≤ E|Xa |γ −1 < δ, where the second inequality is because |X| is a wide-sense submartingale by Assertion 4. Hence E|Xt |1(X(t)>γ ) ≤ E|Xa |1(X(t)>γ ) < ε, where the first inequality is because |X| is a wide-sense submartingale. Since t ∈ Q is arbitrary with t ≤ a, we conclude that the family {Xt : t ∈ (−∞,a]Q} is uniformly integrable, with a simple modulus of uniform integrability β, in the sense of Definition 4.7.2. Assertion 5 has been verified. 4. To prove Assertion 6, assume that the process X is a wide-sense submartin(t) gale relative to some filtration L such that L(t) ⊂ L for each t ∈ Q. Let t,s ∈ Q (t) with t ≤ s be arbitrary. Consider each indicator Z ∈ L(t) . Then Z ∈ L . Hence EZXt ≤ EZXs , where the indicator Z ∈ L(t) is arbitrary. Thus X is a wide-sense submartingale relative to L. The proof for martingales is similar. Assertion 6 is verified. 5. To prove Assertion 7, suppose X is a wide-sense submartingale relative to L. Consider each t ∈ Q. We have Xr ∈ L(t) for each r ∈ [0,t]Q. Hence L(X,t) ≡ L(Xr : r ∈ [0,t]Q) ⊂ L(t) . Therefore Assertion 6 implies that X is a wide-sense
Martingale
309
submartingale relative to LX , and similarly for martingales. Assertion 7 and the proposition are proved. Definition 8.2.3. Specialization to uniformly spaced parameters. Recall that, unless otherwise specified, we assume in the remainder of this section that Q ≡ {0,,2, . . .} with some fixed > 0, and that L ≡ {L(t) : t ∈ Q} denotes an arbitrary but fixed filtration in (,L,E) with parameter Q. For ease of notations, we will further assume, by a change of units if necessary, that = 1. Theorem 8.2.4. Doob decomposition. Let Y : {0,1,2, . . .} × (,L,E) → R be a process that is adapted to the filtration L ≡ {L(n) : n ≥ 0}. Suppose the conditional expectation E(Ym |L(n) ) exists for each m,n ≥ 0 with n ≤ m. For each n ≥ 0, define Xn ≡ Y0 +
n
(Yk − E(Yk |L(k−1) ))
(8.2.1)
k=1
and An ≡
n
(E(Yk |L(k−1) ) − Yk−1 ),
(8.2.2)
k=1
where an empty sum is by convention equal to 0. Then the process X : {0,1,2, . . .} × → R is a martingale relative to the filtration L. Moreover, An ∈ L(n−1) and Yn = Xn + An for each n ≥ 1. Proof. From the defining equality 8.2.1, we see that Xn ∈ L(n) for each n ≥ 1. Hence the process X : {0,1,2, . . .} × → R is adapted to the filtration L. Let m > n ≥ 1 be arbitrary. Then ⎫ ⎛⎧ ⎞ m ⎨ ⎬
(Yk − E(Yk |L(k−1) )) |L(n) ⎠ E(Xm |L(n) ) = E ⎝ Xn + ⎩ ⎭ k=n+1
= Xn + = Xn +
m
{E(Yk |L(n) ) − E(E(Yk |L(k−1) )|L(n) )}
k=n+1 m
{E(Yk |L(n) ) − E(Yk |L(n) )} = Xn,
k=n+1
where we used basic properties of conditional expectations in Proposition 5.6.6. Thus the process X is a martingale relative to the filtration L. Moreover, An ∈ L(n−1) because all the summands in the defining equality 8.2.2 are members of L(n−1) .
310
Stochastic Process
Intuitively, Theorem 8.2.4 says that a multi-round game Y can be turned into a fair game X by charging a fair price determined at each round as the conditional expectation of payoff at the next round, with the cumulative cost of entry equal to An by the time n. The next theorem of Doob and its corollary are key to the analysis of martingales. It proves that under reasonable conditions, a fair game can never be turned into a favorable one by sampling at a sequence of stopping times, or by stopping at some stopping time, with an honest scheme, short of peeking into the future. The reader can look up “gambler’s ruin” in the literature for a counterexample where these reasonable conditions are not assumed, where a fair coin tossing game can be turned into an almost sure win by stopping when and only when the gambler is ahead by one dollar. This latter strategy sounds intriguing except for the lamentable fact that to achieve almost sure winning against a house with infinite capital, the strategy would require the gambler both to stay in the game for an unbounded number of rounds and to have infinite capital to avoid bankruptcy first. The next theorem and its proof are essentially restatements of parts of theorems 9.3.3 and 9.3.4 in [Chung 1968], except that for the case of wide-sense submartingales, we add a condition to make the theorem constructive. Theorem 8.2.5. Doob’s optional sampling theorem. Let X : {0,1,2, . . .} × (,L,E) → R be a wide-sense submartingale relative to a filtration L ≡ {L(k) : k ≥ 0}. Let τ ≡ (τn )n=1,2,... be a nondecreasing sequence of stopping times with values in {0,1,2, . . .} relative to the filtration L. Define the function Xτ : {0,1,2, . . .} × (,L,E) → R by Xτ,n ≡ Xτ (n) for each n ≥ 0. Suppose one of the following three conditions holds: (i) The function Xτ (n) is an integrable r.r.v. for each n ≥ 0, and the family {Xn : n ≥ 0} of r.r.v.’s is uniformly integrable. (ii) For each m ≥ 1, there exists some Mm ≥ 0 such that τm ≤ Mm . (iii) The given process X is a martingale, and the family {Xn : n ≥ 0} of r.r.v.’s is uniformly integrable. Then Xτ is a wide-sense submartingale relative to the filtration Lτ ≡ {L(τ (n)) : n ≥ 0}. If, in addition, the given process X is a martingale, then Xτ is a martingale relative to the filtration Lτ . Proof. Recall that Q ≡ {0,1,2, . . .}. Let m ≥ n and the indicator Z ∈ L(τ (n)) be arbitrary. We need to prove that the function Xτ (n) is integrable and that E(Xτ (m) Z) ≥ E(Xτ (n) Z).
(8.2.3)
First we will prove that Xτ (n) is integrable. 1. Suppose Condition (i) holds. Then the function Xτ (n) is integrable by assumption.
Martingale
311
2. Suppose Condition (ii) holds. Then the function Xτ (n) =
M(n)
Xτ (n) 1(τ (n)=u) =
u=0
M(n)
Xu 1(τ (n)=u)
u=0
is a finite sum of integrable r.r.v.’s. Hence Xτ (n) is itself an integrable r.r.v. 3. Suppose Condition (iii) holds. Then X is a martingale. Hence |X| is a widesense submartingale and E|Xt | is nondecreasing in t ∈ Q, according to Assertion 4 of Proposition 8.2.2. Consider each v,v ∈ Q with v ≤ v . Then it follows that
v
E|Xτ (n) |1(τ (n)=u) =
u=v
v
E|Xu |1(τ (n)=u) ≤
u=v
v
E|Xv |1(τ (n)=u)
u=v
= E|Xv |1(v≤τ (n)≤v ) ≤ αv,v ≡ E|Xv |1(v≤τ (n)) . (8.2.4) Let v → ∞. Since τn is a nonnegative r.r.v., we have P (v ≤ τn ) → 0. Therefore αv,v → 0, thanks to the uniform integrability of the family {Xt : t ∈ Q} of r.r.v.’s under Condition (iii). Summing up, we conclude that vu=v E|Xτ (n) |1(τ (n)=u) → 0 ∞ as v → ∞. Thus u=0 E|Xτ (n) |1(τ (n)=u) < ∞. Consequently, the Monotone Convergence Theorem implies that the function Xτ (n) = ∞ u=0 Xτ (n) 1(τ (n)=u) is an integrable r.r.v. 4. Summing up, we see that Xτ (n) is an integrable r.r.v. under each one of the three Conditions (i–iii). It remains to prove the relation 8.2.3. To that end, let u,v ∈ Q be arbitrary with u ≤ v. Then Z1τ (n)=u ∈ L(u) ⊂ L(v) . Hence Yu,v ≡ Xv Z1τ (n)=u ∈ L(v) . Moreover, EYu,v 1τ (m)≥v ≡ EXv Z1τ (n)=u 1τ (m)≥v = EXv Z1τ (n)=u 1τ (m)=v + EXv Z1τ (n)=u 1τ (m)≥v+1 = EXτ (m) Z1τ (n)=u 1τ (m)=v + EXv Z1τ (n)=u 1τ (m)≥v+1 ≤ EXτ (m) Z1τ (n)=u 1τ (m)=v + EXv+1 Z1τ (n)=u 1τ (m)≥v+1, where the inequality is because the indicator Z1τ (n)=u 1τ (m)≥v+1 = Z1τ (n)=u (1 − 1τ (m)≤v ) is a member of L(v) , and because X is, by hypothesis, a wide-sense submartingale. In short, EYu,v 1τ (m)≥v ≤ EXτ (m) Z1τ (n)=u 1τ (m)=v + EYu,v+1 1τ (m)≥v+1,
(8.2.5)
where v ∈ [u,∞)Q is arbitrary. Let κ ≥ 0 be arbitrary. Applying inequality 8.2.5 successively to v = u,u + 1,u + 2, . . . ,u + κ, we obtain
312
Stochastic Process
EYu,u 1τ (m)≥u ≤ EXτ (m) Z1τ (n)=u 1τ (m)=u + EYu,u+1 1τ (m)≥u+1 ≤ EXτ (m) Z1τ (n)=u 1τ (m)=u + EXτ (m) Z1τ (n)=u 1τ (m)=u+1 + EYu,u+2 1τ (m)≥u+2 ≤ · · ·
≤ EXτ (m) Z1τ (n)=u 1τ (m)=v + EYu,u+(κ+1) 1τ (m)≥u+(κ+1) v∈[u,u+κ]Q
= EXτ (m) Z1τ (n)=u 1u≤τ (m)≤u+κ + EXu+(κ+1) Z1τ (n)=u 1τ (m)≥u+(κ+1) = EXτ (m) Z1τ (n)=u 1τ (m)≤u+κ + EXu+(κ+1) Z1τ (n)=u 1τ (m)≥u+(κ+1). = EZXτ (m) Z1τ (n)=u − EXτ (m) Z1τ (n)=u 1τ (m)≥u+(κ+1) + EXu+(κ+1) Z1τ (n)=u 1τ (m)≥u+(κ+1). ≡ EXτ (m) Z1τ (n)=u − EXτ (m) 1A(κ) + EXu+(κ+1) 1A(κ),
(8.2.6)
where the second equality is because τn ≤ τm , and where Aκ is the measurable set whose indicator is 1A(κ) ≡ Z1τ (n)=u 1τ (m)≥u+(κ+1) and whose probability is therefore bounded by P (Aκ ) ≤ P (τm ≥ u + (κ + 1)). Now let κ → ∞. Then P (Aκ ) → 0 because Xτ (m) is an integrable r.r.v., as proved in Steps 1–3. Consequently, the second summand on the right-hand side of inequality 8.2.6 tends to 0. Now consider the third summand on the right-hand side of inequality 8.2.6. Suppose Condition (ii) holds. Then, as soon as κ is so large that u + (κ + 1) ≥ Mm , we have P (Aκ ) = 0, whence said two summands vanish as κ → ∞. Suppose, alternatively, that Condition (i) or (iii) holds. Then the last summand tends to 0, thanks to the uniform integrability of the family {Xt : t ∈ [0,∞)} of r.r.v.’s guaranteed by Condition (i) or (iii). Summing up, the second and third summands both tend to 0 as κ → ∞, with only the first summand on the right-hand side of inequality 8.2.6 surviving, to yield EYu,u 1τ (m)≥u ≤ EXτ (m) Z1τ (n)=u . Equivalently, EXu Z1τ (n)=u 1τ (m)≥u ≤ EXτ (m) Z1τ (n)=u . Since (τn = u) ⊂ (τm ≥ u), this last inequality simplifies to EXτ (n) Z1τ (n)=u ≤ EXτ (m) Z1τ (n)=u, where u ∈ Q ≡ {0,1,2, . . .} is arbitrary. Summation over u ∈ Q then yields the desired equality 8.2.3. Thus Xτ is a wide-sense submartingale relative to the filtration Lτ ≡ {L(τ (n)) : n = 0,1, . . .}. The first part of the conclusion of the theorem, regarding wide-sense submartingales, has been proved.
Martingale
313
5. Finally, suppose the given wide-sense submartingale X is actually a martingale. Then −X is a wide-sense submartingale, and so by the preceding arguments, both processes Xτ and −Xτ are a wide-sense submartingale relative to the filtration Lτ . Combining, we conclude that Xτ is a martingale if X is a martingale, provided that one of the three Conditions (i–iii) holds. The theorem is proved. Corollary 8.2.6. Doob’s optional stopping theorem for a finite game. Let n ≥ 1 be arbitrary. Write Q ≡ {0,1, . . . ,n} ≡ {t0,t1, . . . ,tn } ⊂ Q. Let X : Q × → R be a process adapted to the filtration L ≡ {L(t) : t ∈ Q}. Let τ be an arbitrary simple stopping time relative to L with values in Q . Define the r.r.v.
Xt 1(τ =t) ∈ L (τ ) . Xτ ≡ t∈Q
Define the process X : {0,1,2} × → R by (X0 ,X1 ,X2 ) ≡ (Xt (0) Xτ ,Xt (n) ). Define the filtration L ≡ {L (i) : i = 0,1,2} by (L (0),L (1),L (2) ) ≡ (L(t (0)),L(τ ),L(t (n)) ). Then the following conditions hold: 1. If the process X is a wide-sense submartingale relative to L, then the process X is a wide-sense submartingale relative to the filtration L . 2. If the process X is a martingale relative to L, then the process X is a martingale relative to L . Proof. Extend the process X to the parameter set Q ≡ {0,1, . . .} by Xt ≡ Xt∧n for each t ∈ {0,1, . . .}. Likewise, extend the filtration L by defining L(t) ≡ L(t∧n) for each t ∈ {0,1, . . .}. We can verify that the extended process X : {0,1, . . .} × → R retains the same property of being a martingale or wide-sense submartingale, respectively, as the given process being a martingale or wide-sense submartingale, relative to the extended filtration L. Now define a sequence τ ≡ (τ0,τ1, . . .) of stopping times by τ0 ≡ t0 , τ1 ≡ τ, and τm ≡ tn for each m ≥ 2. Then it can easily be verified that the sequence τ satisfies Condition (ii) of Theorem 8.2.5. Hence the process Xτ defined in Theorem 8.2.5 is a martingale if X is a martingale, and it is a wide-sense submartingale if X is a wide-sense submartingale. Since (X0 ,X1 ,X2 ) ≡ (Xt (0) Xτ ,Xt (n) ) = (Xτ (0) Xτ (1),Xτ (2) ) and (L (0),L (1),L (2) ) ≡ (L(t (0)),L(τ ),L(t (n)) ) = (L(τ (0)),L(τ (1)),L(τ (2)) ), the conclusion of the corollary follows.
314
Stochastic Process 8.3 Convexity and Martingale Convergence
In this section, we consider the a.u. convergence of martingales. Suppose X : {1,2, . . .} × → R is a martingale relative to some filtration L ≡ {L(n) : n = 1, 2, . . .}. A classical theorem says that if E|Xn | is bounded as n → ∞, then Xn converges a.u. as n → ∞. The theorem can be proved, classically, by the celebrated upcrossing inequality of Doob, thanks to the principle of infinite search. See, for example, [Durret 1984]. While the upcrossing inequality is constructive, the inference of a.u. convergence from it is not. The following example shows that the martingale convergence theorem, as stated earlier, actually implies the principle of infinite search. Let (an )n=1,2,... be an arbitrary nondecreasing sequence in {0,1}. Let Y be an arbitrary r.r.v. that takes the value −1 or +1 with equal probabilities. For each n ≥ 1, define Xn ≡ 1+an Y . Then the process X : {1,2, . . .}× → {0,1,2} is a martingale relative to its natural filtration, with E|Xn | = EXn = 1 for each n ≥ 1. Suppose Xn → U a.u. for some r.r.v. U . Then there exists b ∈ (0,1) such that the set (U < b) is measurable. Either (i) P (U < b) < 12 or (ii) P (U < b) > 0. In case (i), we must have an = 0 for each n ≥ 1. In case (ii), because of a.u. convergence, there exists b ∈ (b,1) such that P (Xn < b ) > 0 for some n ≥ 1, whence an = 1 for some n ≥ 1. Since the nondecreasing sequence (an )n=1,2,... is arbitrary, we have deduced the principle of infinite search from the classical theorem of martingale convergence. Thus the boundedness Xn of together with the constancy of E|Xn | is not sufficient for the constructive a.u. convergence. Boundedness is not the issue; convexity is. The function |x| simply does not have any positive convexity away from x = 0. Bishop’s maximal inequality for martingales, given as theorem 3 in chapter 8 of [Bishop 1967], uses an admissible symmetric convex function λ(x) in the place of the function |x|. It proves a.u. convergence of martingales under a condition in terms of the convergence of Eλ(Xn ), thus obviating the use of upcrossing inequalities. We will modify Bishop’s theorem by using strictly convex functions λ(x) that are not necessarily symmetric, but which have positive and continuous second derivatives, as a natural alternative to the function |x|. This allows the use of a special strictly convex function λ such that |λ(x)| ≤ 3|x| for each x ∈ R. Then the boundedness and convergence of Eλ(Xn ) follow, classically, from the boundedness of E|Xn |. Thus we will have a criterion for constructive a.u. convergence that, from the classical view point, imposes no additional condition beyond the boundedness of E|Xn |. The proof, being constructive, produces rates of a.u. convergence. Definition 8.3.1. Strictly convex function. A continuous function λ : R → R is said to be strictly convex if it has a positive continuous second derivative λ on R. This definition generalizes the admissible functions in chapter 8 of [Bishop 1967]. The conditions of symmetry and nonnegativity of λ are dropped
Martingale
315
here. Correspondingly, we need to generalize Bishop’s version of Jensen’s inequality, given as lemma 2 in chapter 8 of [Bishop 1967], to the following version. Theorem 8.3.2. Bishop–Jensen inequality. Let λ : R → R be a strictly convex function. Define the continuous function θ on [0,∞) by θ (x) ≡ infy∈[−x,x] λ (y) > 0 for each x > 0. Define the continuous function g : R 2 → [0,∞) by g(x0,x1 ) ≡
1 (x1 − x0 )2 θ (|x0 | ∨ |x1 |) 2
(8.3.1)
for each (x0,x1 ) ∈ R 2 . Let X0 and X1 be integrable r.r.v.’s on (,L,E) such that λ(X0 ),λ(X1 ) are integrable. Suppose either (i) E(X1 |X0 ) = X0 or (ii) the strictly convex function λ : R → R is nondecreasing and EU X0 ≤ EU X1 for each indicator U ∈ L(X0 ). Then the r.r.v. g(X0,X1 ) is integrable and 0 ≤ Eg(X0,X1 ) ≤ Eλ(X1 ) − Eλ(X0 ). Proof. 1. Let (x0,x1 ) ∈ R 2 be arbitrary. Then ! x(1) ! v 0 ≤ g(x0,x1 ) = θ (|x0 | ∨ |x1 |) ! ≤
x(1) v=x(0)
!
dudv
v=x(0) u=x(0) v
(8.3.2)
!
λ (u)du dv =
u=x(0)
x(1)
(λ (v) − λ (x0 ))dv
v=x(0)
= λ(x1 ) − λ(x0 ) − λ (x0 )(x1 − x0 ),
(8.3.3)
where the last equality is by the Fundamental Theorem of Calculus. 2. Let (,L(X0 ),E) denote the probability subspace of (,L,E) generated by the r.r.v. X0 . Let V ∈ L(X0 ) be an arbitrary indicator such that λ (X0 )V is bounded. Suppose Condition (i) in the hypothesis holds: E(X1 |X0 ) = X0 . Then, by the properties of conditional expectations and the assumed boundedness of the r.r.v. λ (X0 )V ∈ L(X0 ), we obtain E(X1 − X0 )λ (X0 )V = E(X0 − X0 )λ (X0 )V = 0. Suppose Condition (ii) holds. Then EU X0 ≤ EU X1 for each indicator U ∈ L(X0 ), and the function λ : R → R is nondecreasing. Hence λ ≥ 0 and the bounded r.r.v. λ (X0 )V ∈ L(X0 ) is nonnegative. Therefore, by Assertion 1 of Proposition 5.6.7, where X,Y,Z,L are replaced by X0,X1,λ (X0 )V ,L(X0 ), respectively, we have Eλ (X0 )V X0 ≤ Eλ (X0 )V X1 . Summing up, under either Condition (i) or (ii), we have E(X1 − X0 )λ (X0 )V ≥ 0,
(8.3.4)
for each indicator V ∈ L(X0 ) such that λ (X0 )V is bounded. 3. Now the r.r.v.’s λ(X0 ),λ(X1 ),X0,X1 are integrable by hypothesis. Let b > a > 0 be arbitrary. Since the function λ is continuous, it is bounded on [−b,b].
316
Stochastic Process
Hence the r.r.v. λ (X0 )1(a≥|X(0|) is bounded. Consequently, inequality 8.3.3 implies that the r.r.v. g(X0,X1 )1(a≥|X(0|) is bounded by the integrable r.r.v. λ(X1 )1(a≥|X(0|) − λ(X0 )1(a≥|X(0|) − λ (X0 )(X1 − X0 )1(a≥|X(0|) and is therefore itself integrable. 4. At the same time, λ (X0 )1(b≥|X(0)|>a) is bounded. Therefore inequality 8.3.4 holds with V ≡ 1(b≥|X(0)|>a) . Combining, 0 ≤ Eg(X0,X1 )1(b≥|X(0|) − Eg(X0,X1 )1(a≥|X(0|) = Eg(X0,X1 )V ≤ E(λ(X1 ) − λ(X0 ) − λ (X0 )(X1 − X0 ))V ≤ E(λ(X1 ) − λ(X0 ))V ≡ E(λ(X1 ) − λ(X0 ))1(b≥|X(0)|>a) → 0 as b > a → ∞, where the second inequality is due to inequality 8.3.3, and where the third inequality is thanks to inequality 8.3.4. Hence the integral Eg(X0,X1 )1(a≥|X(0|) converges as a → ∞. It follows from the Monotone Convergence Theorem that the r.r.v. g(X0,X1 ) = lim g(X0,X1 )1(a≥|X(0|) a→∞
is integrable, with Eg(X0,X1 ) = lim Eg(X0,X1 )1(a≥|X(0|) a→∞
≤ lim E(λ(X1 ) − λ(X0 ) − λ (X0 )(X1 − X0 ))1(a≥|X(0|) a→∞
≤ lim E(λ(X1 ) − λ(X0 ))1(a≥|X(0|) = Eλ(X1 ) − Eλ(X0 ), a→∞
where the first inequality follows from inequality 8.3.3, and where the second inequality follows from inequality 8.3.4 in which V is replaced by 1(a≥|X(0)|) . The desired inequality 8.3.2 is proved. Now we are ready to formulate and prove the advertised maximal inequality. Definition 8.3.3. The special convex function. Define the continuous function λ : R → R by λ(x) ≡ 2x + (e−|x| − 1 + |x|)
(8.3.5)
for each x ∈ R. We will call λ the special convex function. Theorem 8.3.4. A maximal inequality for martingales. 1. The special convex function λ is increasing and strictly convex, with |x| ≤ |λ(x)| ≤ 3|x|
(8.3.6)
for each x ∈ R. 2. Let Q ≡ {t0,t1, . . . ,tn } be an arbitrary enumerated finite subset of R, with t0 < t1 < · · · < tn . Let X : Q × → R be an arbitrary martingale relative to the filtration L ≡ {L(t (i)) : i = 1, . . . ,n}. Let ε > 0 be arbitrary. Suppose
Martingale Eλ(Xt (n) ) − Eλ(Xt (0) ) < Then
P
n
317
1 3 ε exp(−3(E|Xt (0) | ∨ E|Xt (n) |)ε−1 ). 6
(8.3.7)
|Xt (k) − Xt (0) | > ε
< ε.
(8.3.8)
k=0
We emphasize that the last two displayed inequalities are regardless of how large n ≥ 0 is. We also note that in view of inequality 8.3.6, the r.r.v. λ(Y ) is integrable for each integrable r.r.v. Y . Thus inequality 8.3.7 makes sense when |Xt (n) | is p integrable, in contrast to the classical counterpart, which requires either Xt (n) is integrable for some p > 1 or |Xt (n) | log |Xt (n) | is integrable. Proof. 1. First note that λ(0) = 0. Elementary calculus yields a continuous first derivative λ on R such that
λ (x) = 2 + (−e−x + 1) ≥ 2
(8.3.9)
for each x ≥ 0, and such that
λ (x) = 2 + (ex − 1) ≥ 1
(8.3.10)
for each x ≤ 0. Therefore the function λ is increasing. Moreover, λ has a positive and continuous second derivative
λ (x) = e−|x| > 0 for each x ∈ R. Thus the function λ is strictly convex, and θ (x) ≡
inf
y∈[−x,x]
λ (y) = e−x > 0
for each x > 0. Furthermore, since 0 ≤ e−r − 1 + r ≤ r for each r ≥ 0, the triangle inequality yields, for each x ∈ R, |x| = 2|x| − |x| ≤ |2x| − |(e−|x| − 1 + |x|)| ≤ |2x + (e−|x| − 1 + |x|)| ≡ |λ(x)| ≤ |2x| + |(e−|x| − 1 + |x|)| ≤ 2|x| + |x| = 3|x|. This establishes the desired inequality 8.3.6 in Assertion 1. 2. As in Theorem 8.3.2, define the continuous function g : R 2 → [0,∞) by g(x0,x1 ) ≡
1 1 (x1 − x0 )2 θ (|x0 | ∨ |x1 |) = (x1 − x0 )2 exp(−(|x0 | ∨ |x1 |)) 2 2 (8.3.11)
for each (x0,x1 ) ∈ R 2 . 3. By relabeling if necessary, we assume, without loss of generality, that Q ≡ {t0,t1, . . . ,tn } = {0,1, . . . ,n}
318
Stochastic Process
as enumerated sets. Let ε > 0 be arbitrary. For abbreviation, write K ≡ E|X0 | ∨ E|Xn |, b ≡ 3Kε−1 , and 1 1 1 2 ε θ (b) = ε2 e−b ≡ ε2 exp(−3Kε−1 ). 2 2 2 Then inequality 8.3.7 in the hypothesis can be rewritten as γ ≡
1 εγ . (8.3.12) 3 4. Let τ ≡ η0,ε,Q be the simple first exit time of the process X to exit, after t = 0, from the ε-neighborhood of X0 , in the sense of Definition 8.1.12. Define the probability subspace L(τ ) relative to the simple stopping time τ , as in Definition 8.1.9. Define the r.r.v.
Xt 1(τ =t) ∈ L(τ ) . Xτ ≡ Eλ(Xn ) − Eλ(X0 )
ε⎠ t∈Q
k=0
≤ P (AB1 B2 )c
1, the following theorem requires no Lp -integrability for p > 1. Theorem 8.3.5. a.u. Convergence of a martingale, and rate of said a.u. convergence. Suppose the parameter set Q is either Q ≡ {1,2, . . .} or Q ≡ {· · · , − 2, − 1}. Let X : Q × → R be an arbitrary martingale relative to its natural filtration L. Then the following conditions hold: 1. a.u. convergence. Suppose β ≡ lim|t|→∞ E|Xt | exists, and suppose α ≡ lim|t|→∞ E exp(−|Xt |) exists. Then Xt → Y a.u. as |t| → ∞ in Q, for some r.r.v. Y . 2. Rate of a.u. convergence. Specifically, for each h ≥ 0, define bh ≡
sup t∈Q;|t|≥h
|(E|Xt | − β)|,
(8.3.14)
320
Stochastic Process ah ≡
|(E exp(−|Xt |) − α)|,
sup t∈Q;|t|≥h
and δh ≡ ah + bh . Then ah,bh,δh → 0 as h → ∞. Define k0 ≡ 0. Inductively, for each m ≥ 1, take any km ≥ km−1 so large that δk(m) ≤
1 −3m 2 exp(−2m 3(β + bk(m−1) )). 12
(8.3.15)
Let ε > 0 be arbitrary. Take m ≥ 1 so large that 2−m+2 < ε. Then there exists a measurable set A with P (Ac ) < ε where ∞
A⊂
(|Xt − Y | ≤ 2−p+3 ).
(8.3.16)
p=m t∈Q;|t|≥k(p)
Proof. 1. Recall the special convex function λ : R → R, defined by λ(x) ≡ 2x + (e−|x| − 1 + |x|)
(8.3.17)
for each x ∈ R. Define, for convenience, ι ≡ 1 or ι ≡ −1 according as Q ≡ {1,2, . . .} or Q ≡ {· · · , − 2, − 1}. Define γ ≡ 2EXι + α − 1 + β. Then, since X : Q × → R is a martingale, the r.r.v. λ(Xt ) ≡ 2Xt + (e−|Xt | − 1 + |Xt |)
(8.3.18)
is integrable for each t ∈ Q, with Eλ(Xt ) ≡ 2EXι + (Ee−|Xt | − 1 + E|Xt |). Hence lim Eλ(Xt ) = 2EXι + α − 1 + β ≡ γ .
|t|→∞
Moreover, for each h ≥ 0, we have sup t∈Q;|t|≥h
|Eλ(Xt ) − γ | =
|(Ee−|Xt | + E|Xt |) − (α + β)|
sup t∈Q;|t|≥h
≤ ah + bh ≡ δh . 2. Let m ≥ 1 be arbitrary. Then it follows that sup t∈Q;|t|≥k(m)
|Eλ(Xt ) − γ | ≤ δk(m) ≤
1 −3m 2 exp(−2m 3(β + bk(m−1) )), 12 (8.3.19)
where the second inequality is by inequality 8.3.15.
Martingale
321
3. To finish the proof, consider first the special case where Q ≡ {0,1,2, . . .}. Then inequality 8.3.19 yields |Eλ(Xk(m+1) ) − Eλ(Xk(m) )| ≤ 2δk(m) ≤
1 −3m 2 exp(−2m 3(β + bk(m−1) )). 6 (8.3.20)
4. Separately, from equality 8.3.14, we then have E|Xk(m) | ∨ E|Xk(m+1) | ≤ β + bk(m−1) .
(8.3.21)
Take any εm ∈ (2−m,2−m+1 ). The last two displayed inequalities combine to yield 1 −3m 2 exp(−2m 3(β + bk(m−1) )) 6 1 ≤ 2−3m exp(−2m 3(E|Xk(m) | ∨ E|Xk(m+1) |)) 6 1 3 −1 < εm exp(−3εm (E|Xk(m) | ∨ E|Xk(m+1) |)), 6 (8.3.22)
Eλ(Xk(m+1) ) − Eλ(Xk(m) ) ≤
where the last inequality is because 2−m < εm . 5. In view of inequality 8.3.22, Theorem 8.3.4 is applicable and implies that P (Bm ) < εm , where ⎛ ⎞ k(m+1) Bm ≡ ⎝ |Xi − Xk(m) | > εm ⎠ , (8.3.23) i=k(m)
where m ≥ 1 is arbitrary. c 6. Now define Am ≡ ∞ h=m Bh . Then P (Acm ) ≤
∞
P (Bh )
c
|f(−u)| · |ψk (u) − ψ0 (u)|du |f(−u)| · 2du
4 24 · 2du = π + a 2 u2 a2c
24 = π + π = 2π, 2−2n π −1 22n+4
where the third inequality is thanks to inequality 8.4.9, and where the last inequality is because a ∈ (2−n,2−n+1 ). Hence |Jk f − J0 f | ≤ 1. At the same time, from the defining formula 8.4.8, we see that 1[−a,a] ≥ af . Hence P (|Sk | > a) = Jk (1 − 1[−a,a] ) ≤ 1 − aJk f ≤ 1 − a(J0 f − 1) = 1 − a( f (0) − 1) = 1 − a(a −1 − 1) = a,
(8.4.10)
where a ∈ (2−n,2−n+1 ), k ≥ pn , and n ≥ 1 are arbitrary. 8. Now let m ≥ 1 be arbitrary. Let n ≡ nm . Take an arbitrary a ∈ (2−n,2−n+1 ). Consider each k ≥ qm ≡ pn(m) ≡ pn . Then, by inequality 8.4.10, we have P (|Sk | > a) < a < 2−n+1 ≡ 2−n(m)+1 < δ(2−m−1 ), where the last inequality is by the defining equality 8.4.5. Hence, since δ is a modulus of integrability of Zκ for each κ = 1, . . . ,k, it follows that
Martingale E|Sk |1(|S(k)|>a) ≤ k −1
k
E|Zκ |1(|S(k)|>a) ≤ k −1
κ=1
327 k
2−m−1 = 2−m−1 .
κ=1
Consequently, E|Sk | ≤ E|Sk |1(|S(k)|≤a) + E|Sk |1(|S(k)|>a) ≤ a + 2−m−1 < 2−n(m)+1 + 2−m−1 ≤ 2−m−1 + 2−m−1 = 2−m,
(8.4.11)
where the last inequality is, again, by the defining equality 8.4.5, and where m ≥ 1 and k ≥ qm are arbitrary. We conclude that E|Sk | → 0 as k → ∞. Moreover, inequality 8.4.11 shows that bm ≡ sup E|Sk | ≤ 2−m
(8.4.12)
k≥q(m)
for each m ≥ 1. The theorem is proved.
Theorem 8.4.2. Strong Law of Large Numbers. Suppose Z1,Z2, . . . is a sequence of integrable, independent, and identically distributed r.r.v’s with mean 0, on some probability space (,L,E). Let η be a simple modulus of integrability of Z1 in the sense of Definition 4.7.2. Then the following conditions hold: 1. The partial sums Sk ≡ k −1 (Z1 + · · · + Zk ) → 0
a.u.
as k → ∞. 2. More precisely, for each m ≥ 1 there exists an integer km,η and a measurable set A, with P (Ac ) < 2−m+2 and with A⊂
∞
∞
(|Sk | ≤ 2−p+3 ).
(8.4.13)
p=m k=k(p,η)
Proof. 1. Let m ≥ j ≥ 1 be arbitrary, and let Ij denote the distribution of Zj on R. Then, in view of the hypothesis of independence and identical distribution, the r.v. (Z1, . . . ,Zj , · · · ,Zm ) with values in R m has the same distribution as the r.v (Zj , . . . ,Z1, · · · ,Zm ), where, for brevity, the latter stands for the sequence obtained from (Z1, . . . ,Zj , · · · ,Zm ) by swapping the first and the j th members. Thus Eh(Z1, . . . ,Zj , . . . ,Zm ) = Eh(Zj , . . . ,Z1, . . . ,Zm )
(8.4.14)
for each integrable function h on R m
relative to the joint distribution EZ(1),...,Z(m) . 2. Let Q ≡ {· · · , − 2, − 1}. For each t ≡ −k ∈ Q, define Xt ≡ S|t| ≡ Sk .
Let L ≡ LX ≡ {L(X,t) : t ∈ Q} be the natural filtration of the process X : Q × → R. Let t ∈ Q be arbitrary. Then t = −n for some n ≥ 1. Hence (,L(X,t),E) is the probability subspace of (,L,E) generated by the family
328
Stochastic Process G(X,t) ≡ {Xr : r ∈ Q; r ≤ t} = {Sm : m ≥ n}.
In other words, (,L(X,t),E) is the completion of the integration space (,LCub (G(X,t) ),E), where LCub (G(X,t) ) = {f (Sn, . . . ,Sm ) : m ≥ n;f ∈ Cub (R m−n+1 )}.
(8.4.15)
By Lemma 8.1.4, the process X is adapted to its natural filtration L ≡ LX . 3. We will prove that the process X is a martingale relative to the filtration L. To that end, let s,t ∈ Q be arbitrary with t ≤ s. Then t = −n and s = −k for some n ≥ k ≥ 1. Let Y ∈ LCub (G(X,t) ) be arbitrary. Then, in view of equality 8.4.15, we have Y = f (Sn, . . . ,Sm ) ≡ f ((Z1 + · · · + Zn )n−1, . . . ,(Z1 + · · · + Zm )m−1 ) for some f ∈ Cub (R m−n+1 ), for some m ≥ n. Let j = 1, . . . ,n be arbitrary. Then, since the r.r.v. Y is bounded, the r.r.v. Y Zj is integrable. Hence, EY Zj ≡ Ef ((Z1 + · · · + Zj + · · · + Zn )n−1, . . . , (Z1 + · · · + Zj + · · · + Zm )m−1 )Zj = Ef ((Zj + · · · + Z1 + · · · + Zn )n−1, . . . , (Zj + · · · + Z1 + · · · + Zm )m−1 )Z1 = Ef ((Z1 + · · · + Zj + · · · + Zn )n−1, . . . , (Z1 + · · · + Zj + · · · + Zm )m−1 )Z1 ≡ EY Z1, where the second equality is by equality 8.4.14, and where the third equality is because addition is commutative. In short, EY Zj = EY Z1 for each j = 1, . . . ,n. Therefore, since k ≤ n, we have EY Sk = k −1 E(Y Z1 + · · · + Y Zk ) = EY Z1 . In particular, EY Sn = EY Z1 = EY Sk . In other words, EY Xt = EY Xs , where Xt ∈ L(X,t) and where Y ∈ LCub (G(X,t) ) is arbitrary. Hence, according to Assertion 5 of Proposition 5.6.6, we have E(Xs |L(X,t) ) = Xt . Since s,t ∈ Q are arbitrary with t ≤ s, the process X is a martingale relative to its natural filtration L ≡ {L(X,t) : t ∈ Q}.
Martingale
329
4. Thanks to Theorem 8.4.1, the Weak Law of Large Numbers, the supremum bh ≡
sup
E|Xt |
(8.4.16)
t∈Q;|t|≥h
is well defined for each h ≥ 1. Moreover, by Theorem 8.4.1, there exists a sequence (qh )h=1,2,... ≡ (qh,η )h=1,2,... of positive integers, which depends only on η, and is such that bq(h) ≡ sup E|Sk | ≤ 2−h .
(8.4.17)
k≥q(h)
Thus lim E|Xt | ≡ lim E|Sk | = β ≡ 0.
|t|→∞
k→∞
(8.4.18)
Moreover, inequality 8.4.17 can be rewritten as bq(h) ≡
|(E|Xt | − β)| ≤ 2−h,
sup
(8.4.19)
t∈Q;|t|≥q(h)
for each h ≥ 1. 5. Next, let h ≥ 1 be arbitrary, and consider each k ≥ qh . Then E|e−|S(k)| − 1| ≤ E|Sk | ≤ 2−h . Thus lim Ee−|X(t)| ≡ lim Ee−|S(k)| = α ≡ 1,
|t|→∞
k→∞
(8.4.20)
with aq(h) ≡
|(E exp(−|Xt |) − α)|
sup t∈Q;|t|≥q(h)
≡
|(E exp(−|Xt |) − 1)| ≤
sup t∈Q;|t|≥q(h)
sup
E|Xt | ≤ 2−h . (8.4.21)
t∈Q;|t|≥q(h)
6. In view of equalities 8.4.18 and 8.4.20, Theorem 8.3.5 implies that Xt → Y a.u. as |t| → ∞ in Q, for some r.r.v. Y . In other words, Sk → Y a.u. as k → ∞. Hence, by the Dominated Convergence Theorem, we have E(1 ∧ |Y |) = lim E(1 ∧ |Sk |) ≤ lim E|Sk | = 0. k→∞
k→∞
It follows that Y = 0. Thus Sk → 0 a.u. as k → ∞. This proves Assertion 1 of the present theorem. 7. For the rate of a.u. convergence, we follow Assertion 2 of Theorem 8.3.5 and its proof. Specifically, define δh ≡ ah + bh for each h ≥ 1. Define k0 ≡ 0. Inductively, for each m ≥ 1, take any km ≡ km,η ≥ km−1 so large that δk(m) ≤
1 −3m 2 exp(−2m 3(β + bk(m−1) )). 12
(8.4.22)
330
Stochastic Process
Let ε > 0 be arbitrary. Take m ≥ 1 so large that 2−m+2 < ε. Then there exists a measurable set A with P (Ac ) < ε where A⊂
∞
(|Xt − Y | ≤ 2−p+3 ).
(8.4.23)
(|Sk | ≤ 2−p+3 ),
(8.4.24)
p=m t∈Q;|t|≥k(p)
In other words, A⊂
∞ ∞ p=m k=k(p)
as desired.
9 a.u. Continuous Process
In this chapter, let (S,d) be a locally compact metric space. Unless otherwise specified, this will serve as the state space for the processes in this chapter. Consider an arbitrary consistent family F of f.j.d.’s that is continuous in probability, with state space S and parameter set [0,1]. We will find conditions on the f.j.d.’s in F under which an a.u. continuous process X can be constructed with marginal distributions given by the family F . A classical proof of the existence of such processes X, as elaborated in [Billingsley 1974], uses the following theorem. Theorem 9.0.1. Prokhorov’s Relative Compactness Theorem. Each tight family J of distributions on a locally compact metric space (H,dH ) is relatively compact, in the sense that each sequence in J contains a subsequence that converges weakly to some distribution on (H,dH ). Prokhorov’s theorem implies the principle of infinite search, and is therefore not constructive. This can be seen as follows. Let (rn )n=1,2,... be an arbitrary nondecreasing sequence in H ≡ {0,1}. Let the doubleton H be endowed with the Euclidean metric dH defined by dH (x,y) = |x − y| for each x,y ∈ H . For each n ≥ 1, let Jn be the distribution on (H,dH ) that assigns unit mass to rn ; in other words, Jn (f ) ≡ f (rn ) for each f ∈ C(H,dH ). Then the family J ≡ {J1,J2, . . .} is tight, and Prokhorov’s theorem implies that Jn converges weakly to some distribution J on (H,dH ). It follows that Jn g converges as n → 0, where g ∈ C(H,dH ) is defined by g(x) = x for each x ∈ H . Thus rn ≡ g(rn ) ≡ Jn g converges as n → 0. Since (rn )n=1,2,... is an arbitrary nondecreasing sequence in {0,1}, the principle of infinite search follows. In our constructions, we will bypass any use of Prokhorov’s theorem or of unjustified supremums, in favor of direct proofs using Borel–Cantelli estimates. We will give a necessary and sufficient condition on the f.j.d.’s in the family F for F to be extendable to an a.u. continuous process X. We will call this condition C-regularity. We will derive a modulus of a.u. continuity of the process X from a given modulus of continuity in probability and a given modulus of C-regularity of the consistent family F , in a sense defined presently. We will also prove that 331
332
Stochastic Process
0 of such the extension is uniformly metrically continuous on an arbitrary set F consistent families F that share a common modulus of C-regularity. In essence, the material presented in Sections 9.1 and 9.2 is a constructive and more general version of material from section 7 of chapter 2 of [Billingsley 1974], though the latter treats only the special case where S = R. We remark that the generalization to the arbitrary locally compact state space (S,d) is not entirely trivial, because we forego the convenience of linear interpolation in R. Chapter 10 of this book will introduce the condition of D-regularity, analogous to C-regularity, for the treatment of processes that are, almost uniformly, right continuous with left limits, again with a general locally compact metric space as state space. In Section 9.3, we will prove a generalization of Kolmogorov’s theorem for a.u. Hoelder continuity, to the case of the state space (S,d), in a sense to be made precise. In Section 9.4, we will apply this result to construct a Brownian motion. In Section 9.5, in the case of Gaussian processes, we will present a hithertounknown sufficient condition on the covariance function to guarantee a.u. Hoelder continuity. Then we will present the sufficient condition on the covariance function, due to [Garsia, Rodemich, and Rumsey 1970], for a.u. continuity along with a modulus of a.u. continuity. We will present their proof with a minor modification to make it strictly constructive. For a more general parameter space that is a subset of R m for some m ≥ 0, with some restriction on its local ε-entropy, [Potthoff 2009-2] gives sufficient conditions on the pair distributions Ft,s to guarantee the construction of an a.u. continuous or an a.u. Hoelder, real-valued, random field. In this and later chapters we will use the following notations for the dyadic rationals. Definition 9.0.2. Notations for dyadic rationals. For each m ≥ 0, define pm ≡ 2m , m ≡ 2−m , and the enumerated sets of dyadic rationals Qm ≡ {t0,t1, . . . ,tp(m) } = {qm,0, . . . ,qm,p(m) } ≡ {0,m,2m, . . . ,pm m } ≡ {0,m,2m, . . . ,1} ⊂ [0,1], Qm ≡ {u0,u1, . . . ,up(2m) } ≡ {0,2−m,2 · 2−m, . . . ,2m } ⊂ [0,2m ], m ≡ {0,m,2m, . . .} ≡ {0,2−m,2 · 2−m, . . .} ⊂ [0,∞). Q Define Q∞ ≡
∞
Qm = {t0,t1, . . .}.
m=0
Let m ≥ 0 be arbitrary. Then the enumerated set Qm is a 2−m -approximation of [0,1], with Qm ⊂ Qm+1 . Conditions in Definition 3.2.1 can easily be verified for the sequence
a.u. Continuous Process
333
ξ[0,1] ≡ (Qm )m=1,2,... to be a binary approximation of [0,1] relative to the reference point q◦ ≡ 0. In addition, define Q∞ ≡
∞
Qm = {u0,u1, . . .}.
m=0
Definition 9.0.3. Miscellaneous notations and conventions. As usual, to lighten the notational burden, we will write an arbitrary subscripted symbol ab interchangeably with a(b). We will write T U for a composite function T ◦ U . If f : A → B is a function from a set A to a set B, and if A is a nonempty subset A, then the restricted function f |A : A → B will also be denoted simply by f when there is little risk of confusion. If A is a measurable subset on a probability space (,L,E), then we will write P A, P (A), E(A), EA, or E1A interchangeably. For arbitrary r.r.v. Y ∈ L and measurable subsets A,B, we will write (A;B) ≡ AB, E(Y ;A) ≡ E(Y 1A ), and A ∈ L. As a further abbreviation, we drop the parentheses when there is little risk of confusion. For example, we write 1Y ≤β;Z>α ≡ 1(Y ≤β)(Z>α) . For an arbitrary integrable function X ∈ L, we will sometimes use the more suggestive notation E(dω)X(ω) for EX, where ω is a dummy variable. Let Y be an arbitrary r.r.v. Recall from Definition 5.1.3 the convention that if measurability of the set (Y < β) or (Y ≤ β) is required in a discussion, for some β ∈ R, then it is understood that the real number β has been chosen from the regular points of the r.r.v. Y . × ,S) denotes the set of r.f.’s with an arbitrary Separately, recall that R(Q parameter set Q , sample space (,L,E), and state space (S,d). Recall that [·]1 is an operation that assigns to each a ∈ R an integer [a]1 ∈ (a,a + 2). As usual, write d ≡ 1 ∧ d.
9.1 Extension from Dyadic Rational Parameters to Real Parameters Our approach to extend a given family F of f.j.d.’s that is continuous in probability on the parameter set [0,1] is as follows. First note that F carries no more useful information than its restriction F |Q∞ , where Q∞ is the dense subset of dyadic rationals in [0,1], because the family can be recovered from the F |Q∞ , thanks to continuity in probability. Hence we can first extend the family F |Q∞ to a process Z : Q∞ × → S by applying the Daniell–Kolmogorov Theorem or the Daniell–Kolmogorov–Skorokhod Theorem. Then any condition of the family F is equivalent to a condition to Z. In particular, in the current context, any condition on f.j.d.’s to make F extendable to an a.u. continuous process X : [0,1] × → S can be stated in terms of a process Z : Q∞ × → S, with the latter to be extended by limit to the process X. It is intuitively obvious that any a.u. continuous process Z : Q∞ × → S is
334
Stochastic Process
extendable to an a.u. continuous process X : [0,1] × → S, because Q∞ is dense [0,1]. In this section, we will make this precise, and we will define metrics such that the extension operation is itself a metrically continuous construction. Definition 9.1.1. Metric space of a.u. continuous processes. Let C[0,1] be the space of continuous functions x : [0,1] → (S,d), endowed with the uniform metric defined by dC[0,1] (x,y) ≡ sup d(x(t),y(t))
(9.1.1)
t∈[0,1]
for each x,y ∈ C[0,1]. Write dC[0,1] ≡ 1 ∧ dC[0,1] . Let (,L,E) be an arbitrary probability space. Let C[0,1] denote the set of stochastic processes X : [0,1] × (,L,E) → (S,d) that are a.u. continuous on on C[0,1] by [0,1]. Define a metric ρC[0,1] t ,Yt ) ≡ E dC[0,1] (X,Y ) ρC[0,1] (X,Y ) ≡ E sup d(X
(9.1.2)
t∈[0,1]
for each X,Y ∈ C[0,1]. Lemma 9.1.2 (next) says that (C[0,1],ρ ) is a well C[0,1] defined metric space. t ,Yt ) is an r.r.v. The is a metric. The function supt∈[0,1] d(X Lemma 9.1.2. ρC[0,1] is well defined and is a metric. function ρC[0,1] X ,δ Y , Proof. Let X,Y ∈ C[0,1] be arbitrary, with moduli of a.u. continuity δauc auc t ,Yt ) is defined a.s., on respectively. First note that the function supt∈[0,1] d(X account of continuity on [0,1] of X,Y on a full subset of . We need to prove that it is measurable, so that the expectation in the defining formula 9.1.2 makes sense. To that end, let ε > 0 be arbitrary. Then there exist measurable sets DX,DY c ) ∨ P (D c ) < ε, on the intersection of which we have with P (DX Y
d(Xt ,Xs ) ∨ d(Yt ,Ys ) ≤ ε X (ε) ∧ δ Y (ε). Now let the sequence for each t,s ∈ [0,1] with |t − s| < δ ≡ δauc auc s0, . . . ,sn be an arbitrary δ-approximation of [0,1], for some n ≥ 1. Then, for each t ∈ [0,1], we have |t − sk | < δ for some k = 0, . . . ,n, whence
d(Xt ,Xs(k) ) ∨ d(Yt ,Ys(k) ) ≤ ε on D ≡ DX DY . This, in turn, implies |d(Xt ,Yt ) − d(Xs(k),Ys(k) )| ≤ 2ε on D. It follows that n t ,Yt ) − s(k),Ys(k) ) ≤ 2ε d(X sup d(X t∈[0,1] k=1
(9.1.3)
a.u. Continuous Process
335
n
c on D, where Z ≡ k=1 d(Xt (k),Yt (k) ) is an r.r.v., and where P (D ) < 2ε. 1 For each p ≥ 1, we can repeat this argument with ε ≡ p . Thus we obtain a t ,Yt ) in probability as p → ∞. sequence Zp of r.r.v.’s with Zp → supt∈[0,1] d(X t ,Yt ) is accordingly an r.r.v. and, being bounded by 1, The function supt∈[0,1] d(X is is integrable. Summing up, the expectation in equality 9.1.2 exists, and ρC[0,1] well defined. to be a metric is straightVerification of the conditions for the function ρC[0,1] forward and is omitted here.
Definition 9.1.3. Extension by limit, if possible, of a process with parameter set Q∞ . Let Z : Q∞ × → S be an arbitrary process. Define a function X ≡ Lim (Z) by domain(X) ≡ {(r,ω) ∈ [0,1] × :
lim
s→r;s∈Q(∞)
Z(s,ω) exists}
and X(r,ω) ≡
lim
s→r;s∈Q(∞)
Z(s,ω)
for each (r,ω) ∈ domain(X). We will call X the extension-by-limit of the process Z to the parameter set [0,1]. A similar definition is made where the interval [0,1] is replaced by the interval [0,∞), and where the set Q∞ of dyadic rationals in [0,1] is replaced by the set Q∞ of dyadic rationals in [0,∞). We emphasize that, absent any additional conditions on the process Z, the function X need not be a process. Indeed, need not even be a well-defined function. Theorem 9.1.4. Extension by limit of a.u. continuous process on Q∞ to a.u. continuous process on [0,1]; and metrical continuity of said extension. Recall ∞ × ,S),ρP rob,Q(∞) ) of processes from Definition 6.4.2 the metric space (R(Q ∞ × ,S) whose members are a.u. Z : Q∞ × → S. Let R0 be a subset of R(Q continuous with a common modulus of a.u. continuity δauc . Then the following conditions hold: 0 be arbitrary. Then its extension-by-limit X ≡ Lim (Z) is an 1. Let Z ∈ R a.u. continuous process such that Xt = Zt on domain(Xt ) for each t ∈ Q∞ . Moreover, the process X has the same modulus of a.u. continuity δauc as Z. 2. The extension-by-limit 0,ρP rob,Q(∞) ) → (C[0,1],ρ ) Lim : (R C[0,1] is uniformly continuous, with a modulus of continuity δLim (·,δauc ). 0 be arbitrary. Let ε > 0 be arbitrary. Then, by hypothesis, Proof. 1. Let Z ∈ R there exists δauc (ε) > 0 and a measurable set D ⊂ with P (D c ) < ε such that for each ω ∈ D and for each s,s ∈ Q∞ with |s − s | < δauc (ε), we have d(Z(s,ω),Z(s ,ω)) ≤ ε.
(9.1.4)
336
Stochastic Process
Next let ω ∈ D and r,r ∈ [0,1] be arbitrary with |r − r | < δauc (ε). Letting s.s → r with s,s ∈ Q∞ , we have |s − s | → 0, and so d(Z(s,ω),Z(s ,ω)) → 0 as s,s → r. Since (S,d) is complete, we conclude that the limit X(r,ω) ≡
lim
s→r;s∈Q(∞)
Z(s,ω)
exists. Moreover, letting s → r with s,s ∈ Q∞ , inequality 9.1.4 yields d(Z(s,ω),X(r,ω)) ≤ ε.
(9.1.5)
Since ε > 0 is arbitrary, we see that Zs → Xr a.u. as s → r. Hence Xr is an r.v. Thus X : [0,1] × → S is a stochastic process. Now let s → r and s → r with s,s ∈ Q∞ in inequality 9.1.4. Then we obtain d(X(r,ω),X(r ,ω)) ≤ ε,
(9.1.6)
where ω ∈ D and r,r ∈ [0,1] are arbitrary with |r − r | < δauc (ε). Thus X has the same modulus of a.u. continuity δauc as Z. 2. It remains to verify that the mapping Lim is a continuous function. To that end, let ε > 0 be arbitrary. Write α ≡ 13 ε. Let m ≡ m(ε,δauc ) ≥ 1 be so large that 2−m < δauc (α). Define δLim (ε,δauc ) ≡ 2−p(m)−1 α 2 . 0 be arbitrary such that Let Z,Z ∈ R ρP rob,Q(∞) (Z,Z ) < δLim (ε,δauc ). Equivalently, E
∞
t (n),Z ) < 2−p(m)−1 α 2 . 2−n−1 d(Z t (n)
(9.1.7)
n=0
Then, by Chebychev’s inequality, there exists a measurable set A with P (Ac ) < α such that for each ω ∈ A, we have ∞
−p(m)−1 2−n−1 d(Z(t α, n,ω),Z (tn,ω)) < 2
n=0
whence d(Z(t n,ω),Z (tn,ω)) < α
(9.1.8)
for each n = 0, . . . ,pm . Now let X ≡ Lim (Z) and X ≡ Lim (Z ). By Assertion 1, the processes X and X have the same modulus of a.u. continuity δauc as Z and Z . Hence, there exist measurable sets D,D with P (D c ) ∨ P (D c ) < α such that for each ω ∈ DD , we have (r,ω),X (s,ω)) ≤ α d(X(r,ω),X(s,ω)) ∨ d(X for each r,s ∈ [0,1] with |r − s| < δauc (α).
(9.1.9)
a.u. Continuous Process
337
Now consider each ω ∈ ADD . Let r ∈ [0,1] be arbitrary. Since t0, . . . ,tp(m) is a 2−m -approximation of [0,1], there exists n = 0, . . . ,pm such that |r − tn | < 2−m < δauc (α). Then inequality 9.1.9 holds with s ≡ tn . Combining inequalities 9.1.8 and 9.1.9, we obtain (r,ω)) d(X(r,ω),X (r,ω),X (s,ω)) ≤ d(X(r,ω),X(s,ω)) + d(X(s,ω),X (s,ω)) + d(X (r,ω),X (s,ω)) < 3α, = d(X(r,ω),X(s,ω)) + d(Z(s,ω),Z (s,ω)) + d(X
where ω ∈ ADD and r ∈ [0,1] are arbitrary. It follows that r ,Xr ) (X,X ) ≡ E sup d(X ρC[0,1] r∈[0,1]
r ,Xr )1ADD + P (ADD )c ≤ E sup d(X r∈[0,1]
< 3α + 3α = 6α ≡ ε. We conclude that δLim (·,δauc ) is a modulus of continuity of the function Lim .
9.2 C-Regular Family of f.j.d.’s and C-Regular Process Definition 9.2.1. C-regularity. Let (,L,E) be an arbitrary sample space. Let Z : Q∞ × → S be an arbitrary process. We will say that Z is a C-regular process if there exists an increasing sequence m ≡ (mn )n=0,1,... of positive integers, called the modulus of C-regularity of the process Z, such that for each n ≥ 0 and for each βn > 2−n such that the set (n)
At,s ≡ (d(Zt ,Zs ) > βn ) is measurable for each s,t ∈ Q∞ , we have P (Cn ) < 2−n, where Cn ≡
t∈Q(m(n)) s∈[t,t ]Q(m(n+1))
(9.2.1)
(n)
(n)
At,s ∪ As,t .
(9.2.2)
Here, for each t ∈ Qm(n) , we abuse notations and write t ≡ 1 ∧ (t + 2−m(n) ) ∈ Qm(n) . Let F be an arbitrary consistent family of f.j.d.’s that is continuous in probability on [0,1]. Then the family F of consistent f.j.d.’s is said to be C-regular with the sequence m ≡ (mn )n=0,1,... as a modulus of C-regularity if F |Q∞ is the family of marginal distributions of some C-regular process Z. Let X : [0,1] × → S be an arbitrary process that is continuous in probability on [0,1]. Then the process X is said to be C-regular, with the sequence m ≡ (mn )n=0,1,... as a modulus of C-regularity if its family F of marginal distributions is C-regular with the sequence m as a modulus of C-regularity.
338
Stochastic Process
We will prove that a process on [0,1] is a.u. continuous iff it is C-regular. Theorem 9.2.2. a.u. Continuity implies C-regularity. Let (,L,E) be an arbitrary sample space. Let X : [0,1] × → S be an a.u. continuous process, with a modulus of a.u. continuity δauc . Then the process X is C-regular, with a modulus of C-regularity given by m ≡ (mn )n=0,1,... , where m0 ≡ 1 and mn ≡ [mn−1 ∨ (− log2 δauc (2−n ))]1 for each n ≥ 1. Proof. 1. First note that X is continuous in probability. Let F denote its family of marginal distributions. Then F |Q∞ is the family of marginal distributions of the process Z ≡ X|Q∞ . We will show that Z is C-regular. 2. To that end, let n ≥ 0 be arbitrary. By Definition 6.1.2 of a.u. continuity, there exists a measurable set Dn with P (Dnc ) < 2−n such that for each ω ∈ Dn and for each s,t ∈ [0,1] with |t − s| < δauc (2−n ), we have d(X(t,ω),X(s,ω)) ≤ 2−n .
(9.2.3)
3. Let βn > 2−n be arbitrary and let A(n) t,s ≡ (d(Zt ,Zs ) > βn ) ≡ (d(Xt ,Xs ) > βn ) for each s,t ∈ Q∞ . Define Cn ≡
t∈Q(m(n)) s∈[t,t ]Q(m(n+1))
(n)
(n)
At,s ∪ As,t ,
(9.2.4)
where for each t ∈ Qm(n) , we abuse notations and write t ≡ 1 ∧ (t + 2−m(n) ) ∈ Qm . Suppose, for the sake of a contradiction, that P (Dn Cn ) > 0. Then there exists some ω ∈ Dn Cn . Hence, by equality 9.2.4, there exists t ∈ Qm(n) and s ∈ [t,t ]Qm(n+1) with d(Z(t,ω),Z(s,ω)) ∨ d(Z(s,ω),Z(t ,ω)) > βn .
(9.2.5)
It follows from s ∈ [t,t ] that |s − t| ∨ |s − t | ≤ 2−m(n) < δauc (2−n ),
(9.2.6)
whence d(Z(t,ω),Z(s,ω)) ∨ d(Z(s,ω),Z(t ,ω)) ≤ 2−n < β n, contradicting inequality 9.2.5. We conclude that P (Dn Cn ) = 0. Consequently, P (Cn ) = P (Dn ∪ Dnc )Cn = P (Dnc Cn ) ≤ P (Dnc ) < 2−n . Thus the conditions in Definition 9.2.1 are satisfied for the process Z, the family F |Q∞ , the family F , and the process X to be C-regular, all with modulus of C-regularity given by m.
a.u. Continuous Process
339
The next theorem is the converse of Theorem 9.2.2. Theorem 9.2.3. C-regularity implies a.u. continuity. Let (,L,E) be an arbitrary sample space. Let F be a C-regular family of consistent f.j.d.’s. Then there exists an a.u. continuous process X : [0,1] × → S with marginal distributions given by F . Specifically, let m ≡ (mn )n=0,1,... be a modulus of C-regularity of F . Let Z : Q∞ × → S be an arbitrary process with marginal distributions given by F |Q∞ . Let ε > 0 be arbitrary. Define h ≡ [0 ∨ (4 − log2 ε)]1 and δauc (ε,m) ≡ 2−m(h) . Then δauc (·,m) is a modulus of a.u. continuity of Z. Moreover, the extension-by-limit X ≡ Lim (Z) : [0,1]× → S of the process Z to the full parameter set [0,1] is a.u. continuous, with the same modulus of a.u. continuity δauc (·,m), and with marginal distributions given by F . Proof. 1. Note that F |Q∞ is C-regular, with m as a modulus of C-regularity. Let Z : Q∞ × → S be an arbitrary process with marginal distributions given by F |Q∞ . We will verify that Z is a.u. continuous. 2. To that end, let n ≥ 0 be arbitrary. Take any βn ∈ (2−n,2−n+1 ). Then, by Definition 9.2.1,
where Cn ≡
P (Cn ) < 2−n,
(9.2.7)
(d(Zt ,Zs ) > βn ) ∪ (d(Zs ,Zt ) > βn ).
(9.2.8)
t∈Q(m(n)) s∈[t,t ]Q(m(n+1))
As before, for each t ∈ Qm(n) , we abuse notations and write t ≡ 1∧(t +2−m(n) ) ∈ Qm(n) .
c ∞ 2. Now define Dn ≡ j =n Cj . Then P (Dnc ) ≤
∞
j =n
P (Cj )
0 be arbitrary. Let n ≡ [0 ∨ (4 − log2 ε)]1 and δauc (ε,m) ≡ −m(n) , as in the hypothesis. In Step 6, we saw that the measurable set Dn ≡ 2 ∞ c c −n+1 < ε and such that d(Z(r,ω),Z(s,ω)) < j =n Cj is such that P (Dn ) ≤ 2 −n+4 2 < ε for each r,s ∈ Q∞ with |s − r| < δauc (ε,m) ≡ 2−m(n). . Thus the process Z is a.u. continuous, with δauc (·,m) as a modulus of a.u. continuity of Z. 8. By Theorem 9.1.4, the extension-by-limit X ≡ Lim (Z) of the process Z to the full parameter set [0,1] is a.u. continuous with the same modulus of a.u. continuity δauc (·,m). 9. It remains to verify that the process X has marginal distributions given by the family F . To that end, let r1, . . . ,rk be an arbitrary sequence in [0,1], and let s1, . . . ,sk be an arbitrary sequence in Q∞ . Let f ∈ Cub (S k ,d k ) be arbitrary. Then Fs(1),...,s(k) f = E(Zs(1), . . . ,Zs(k) ) = E(Xs(1), . . . ,Xs(k) ).
(9.2.14)
Now let si → ri for each i = 1, . . . ,k. Then Fs(1),...,s(k) f → Fr(1),...,r(k) f because the family F is continuous in probability, and E(Xs(1), . . . ,Xs(k) ) → E(Xr(1), . . . ,Xr(k) ) because the process X is a.u. continuous. Hence equality 9.2.14 yields Fr(1),...,r(k) f = E(Xr(1), . . . ,Xr(k) ),
(9.2.15)
where r1, . . . ,rk is an arbitrary sequence in [0,1], and where f ∈ Cub (S k ,d k ) is arbitrary. In short, the process X has marginal distributions given by the family F . The theorem is proved. Theorem 9.2.4. Continuity of extension-by-limit of C-regular processes. ∞ × ,S),ρP rob,Q(∞) ) of Recall from Definition 6.4.2 the metric space (R(Q processes Z : Q∞ × → S with parameter set Q∞ ≡ {t0,t1, . . .}, sample ∞ × ,S) 0 be a subset of R(Q space (,L,E), and state space (S,d). Let R whose members are C-regular with a common modulus of C-regularity m ≡ ) be the metric space of a.u. continuous processes (mn )n=0,1,... . Let (C[0,1],ρ C[0,1] on [0,1], as in Definition 9.1.1. Then the extension-by-limit 0,ρP rob,Q(∞) ) → (C[0,1],ρ ), Lim : (R C[0,1]
342
Stochastic Process
as in Definition 9.1.3, is uniformly continuous with a modulus of continuity δCreg2auc (·,m). Proof. 1. Let ε > 0 be arbitrary. Define j ≡ [0 ∨ (6 − log2 ε)]1, hj ≡ 2m(j ), and δCreg2auc (ε,m) ≡ 2−h(j )−2j −1 . 0 . We will prove that δCreg2auc (·,m) is a modulus of continuity of Lim on R 2. Let Z,Z ∈ R0 be arbitrary and let X ≡ Lim (Z), X ≡ Lim (Z ). Suppose ρprob,Q(∞) (Z,Z ) ≡ E
∞
t (n),Z ) < δCreg2auc (ε,m). 2−n−1 d(Z t (n)
(9.2.16)
n=0
We need to prove that ρC[0,1] (X,X ) < ε. 3. By Step 6 of the proof of Theorem 9.2.3, there exist measurable sets Dj ,Dj with P (Djc ) ∨ P (Dj c ) < 2−j +1 such that d(Xr ,Xs ) ∨ d(Xr ,Xs ) = d(Zr ,Zs ) ∨ d(Zr ,Zs ) ≤ 2−j +4
(9.2.17)
on Dj Dj , for each r,s ∈ Q∞ with |r − s| < 2−m(j ) . Consider each ω ∈ Dj Dj and t ∈ [0,1]. Then there exists s ∈ Qm(j ) such that |t − s| < 2−m(j ) . Letting r → t with r ∈ Q∞ and |r − s| < 2−m(j ) , inequality 9.2.17 yields d(Xt (ω),Xs (ω)) ∨ d(Xt (ω),Xs (ω)) ≤ 2−j +4 . Consequently, |d(Xt (ω),Xt (ω)) − d(Xs (ω),Xs (ω))| < 2−j +5, where ω ∈ Dj Dj and t ∈ [0,1] are arbitrary. Therefore sup d(Xt ,X ) − d(Xs ,Xs ) ≤ 2−j +5 t t∈[0,1] s∈Q(m(j ))
(9.2.18)
on Dj Dj . Note here that Lemma 9.1.2 earlier proved that the supremum is an r.r.v. 4. Separately, take any α ∈ (2−j ,2−j +1 ) and define ⎛ ⎞ Aj ≡ ⎝ d(Xs ,Xs ) ≤ α ⎠ . (9.2.19) s∈Q(m(j ))
Then inequality 9.2.18 and equality 9.2.19 together yield
Gj ≡ Dj Dj Aj ⊂
a.u. Continuous Process
343
sup d(Xt ,Xt ) ≤ 2−j +5 + 2−j +1 .
(9.2.20)
t∈[0,1]
5. By inequality 9.2.16, we have ρprob,Q(∞) (Z,Z ) < δCreg2auc (ε,m) ≡ 2−h(j )−2j −1,
(9.2.21)
where hj ≡ 2m(j ) . Hence E
s ,Xs ) = E d(X
h(j )
t (k),X ) d(X t (k)
k=0
s∈Q(m(j ))
≤ 2h(j )+1 E
h(j )
t (k),X ) 2−k−1 d(X t (k)
k=0
≤ 2h(j )+1 E
h(j )
t (k),Z ) 2−k−1 d(Z t (k)
k=0
≤2
h(j )+1
ρprob,Q(∞) (Z,Z ) < 2−2j < α 2 .
(9.2.22)
Chebychev’s inequality therefore implies that P (Acj ) < α < 2−j +1 . Hence, in view of relation 9.2.20, we obtain t ,Xt ) ≤ E1G(j ) sup d(X t ,Xt ) + P (Gcj ) (X,X ) ≡ E sup d(X ρC[0,1] t∈[0,1]
−j +5
≤ (2
t∈[0,1]
−j +1
+2
) + P (Acj ) + P (Djc ) + P (Dj c )
< (2−j +5 + 2−j +1 ) + 2−j +1 + 2−j +1 + 2−j +1 < 2−j +6 < ε. Since ε > 0 is arbitrary, we see that δCreg2auc (·,m) is a modulus of continuity of Lim . Theorems 9.2.3 and 9.2.4 can now be restated in terms of C-regular consistent families of f.j.d.’s. Corollary 9.2.5. Construction of a.u. continuous process from C-regular family of f.j.d.’s. Let ! (0,L0,I0 ) ≡ [0,1],L0, ·dx denote the Lebesgue integration space based on the interval 0 ≡ [0,1]. Let ξ be a fixed binary approximation of (S,d) relative to a reference point x◦ ∈ S. Cp ([0,1],S), ρCp,ξ,[0,1],Q(∞) ) of Recall from Definition 6.2.12 the metric space (F
344
Stochastic Process
consistent families of f.j.d.’s that are continuous in probability, with parameter set [0,1] and state space (S,d). Cp ([0,1],S) whose members are C-regular and share 0 be a subset of F Let F a common modulus of C-regularity m ≡ (mn )n=0,1,2,... . Define the restriction 0 → F 0 |Q∞ by [0,1],Q(∞) (F ) ≡ F |Q∞ for each F ∈ function [0,1],Q(∞) : F 0 . Then the following conditions hold: F 1. The function 0, ρCp,ξ,[0,1],Q(∞) ) fj d,auc,ξ ≡ Lim ◦ DKS,ξ ◦ [0,1],Q(∞) : (F → (C[0,1],ρ ) C[0,1]
(9.2.23)
is well defined, where DKS,ξ is the Daniell–Kolmogorov–Skorokhod extension constructed in Theorem 6.4.3, and where Lim is the extension-by-limit constructed in Theorem 9.2.3. 0 , the a.u. continuous process X ≡ fj d,auc,ξ 2. For each consistent family F ∈ F (F ) has marginal distributions given by F . 0 such that {F0 ;F ∈ F 0,γ } is tight with a certain 0,γ be a subset of F 3. Let F modulus of tightness γ . Then the construction fj d,auc,ξ is uniformly continuous 0,γ . on the subset F 0 be arbitrary. By hypothesis, F is C-regular, with m Proof. 1. Let F ∈ F as a modulus of C-regularity. Since the process Z ≡ DKS,ξ (F |Q∞ ) : Q∞ × 0 → S extends F |Q∞ , Z is C-regular, with m as a modulus of C-regularity. 0 is the set of C-regular processes on Q∞ , 0 , where R In other words, Z ∈ R with sample space (0,L0,I0 ) and whose members have m as a modulus of C-regularity. Hence the a.u. continuous process X ≡ Lim (Z) is well defined by Theorem 9.2.3, with X|Q∞ = Z. Thus the composite mapping in equality 9.2.23 is well defined. Assertion 1 is verified. 0 is continuous in probability. Hence, for 2. Being C-regular, the family F ∈ F each r1, . . . ,rn ∈ [0,1] and f ∈ Cub (S n ), we have Fr(1),...,r(n) f = = =
lim
Fs(1),...,s(n) f
lim
I0 f (Zs(1), . . . ,Zs(n) )
lim
I0 f (Xs(1), . . . ,Xs(n) )
s(i)→r(i);s(i)∈Q(∞);i=1,...,n s(i)→r(i);s(i)∈Q(∞);i=1,...,n s(i)→r(i);s(i)∈Q(∞);i=1,...,n
= I0 f (Xr(1), . . . ,Xr(n) ), where the last equality follows from the a.u. continuity of X. We conclude that F is the family of marginal distributions of X, proving Assertion 2. 0 such that {F0 ;F ∈ F 0,γ } is tight, with a 0,γ be an arbitrary subset of F 3. Let F certain modulus of tightness γ . Consider each F ∈ F0,γ . Then X ≡ fj d,auc,ξ (F ) is a.u. continuous with a modulus of a.u. continuity δauc (·,m). Let ε0 > 0 be arbitrary. Write ε ≡ 2−1 ε0 . Then there exists a measurable set A with P (Ac ) < ε such that for each ω ∈ A, we have d(Xt (ω),Xs (ω)) ≤ ε for each
a.u. Continuous Process
345
t,s ∈ [0,1] with |t − s| < δauc (ε,m). Take n ≥ 1 so large that n−1 < δauc (ε,m). Separately, there exists c > γ (ε) such that P (C c ) < ε, where C ≡ (d(X0, x◦ ) ≤ c). Let t ∈ [0,1] be arbitrary. Define β(ε0,t) ≡ β(ε0,t,γ ,m) ≡ c + nε. Consider each ω ∈ AC. Then d(x◦,Xt (ω)) ≤ d(x◦,X0 (ω)) +
n
d(X(i−1)t/n (ω),Xit/n (ω))
i=1
≤ c + nε ≡ β(ε0,t). In other words, (d(x◦,Xt ) > β(ε0,t)) ⊂ Ac ∪ C c . For each k ≥ 1, define the function hk ≡ 1 ∧ (1 + k − d(·,x◦ ))+ ∈ C(S,d). Then, for each k ≥ β(ε0,t),we have (1 − hk (Xt ) > 0) ⊂ (d(Xt ,x◦ ) > k) ⊂ (d(Xt ,x◦ ) > β(ε0,t)) ⊂ Ac ∪ C c . Hence Ft (1 − hk ) = E(1 − hk (Xt )) ≤ P (Ac ∪ C c ) < 2ε ≡ ε0 . Thus β(·, · ,γ ,m) is a modulus of pointwise tightness of the family F , according β ≡ F β ([0,1],S). 0,γ ⊂ F to Definition 6.3.5. Summing up, F 4. Recall the metric space (R(Q∞ ×0,S),ρprob,Q(∞) ) of processes Z : Q∞ × 0 → S. Then the function 0,γ , 0,γ |Q∞, [0,1],Q(∞) : (F ρCp,ξ,[0,1],Q(∞) ) → (F ρMarg,ξ,Q(∞) )
(9.2.24)
is an isometry, according to Definition 6.2.12. Separately, the Daniell– Kolmogorov–Skorokhod Extension β |Q∞, ∞ × 0,S),ρprob,Q(∞) ) DKS,ξ : (F ρMarg,ξ,Q(∞) ) → (R(Q 0,γ ⊂ F β , the is uniformly continuous, according to Theorem 6.4.5. Hence, since F function 0,γ |Q∞, ∞ × 0,S),ρprob,Q(∞) ) (9.2.25) ρMarg,ξ,Q(∞) ) → (R(Q DKS,ξ : (F 0 |Q∞ ) ⊂ is uniformly continuous. Moreover, by Step 1, we see that DKS,ξ (F R0 ,where the R0 is a set of C-regular processes on Q∞ whose members share the common modulus of C-regularity m. Therefore we have the uniformly continuous function 0,γ |Q∞, 0,ρprob,Q(∞) ). ρMarg,ξ,Q(∞) ) → (R DKS,ξ : (F
(9.2.26)
5. Finally, Theorem 9.2.4 says that 0,ρprob,Q(∞) ) → (C[0,1],ρ Lim : (R ) C[0,1]
(9.2.27)
346
Stochastic Process
is uniformly continuous. In view of expressions 9.2.24, 9.2.26, and 9.2.27, we see that the composite function 0,γ , ρCp,ξ,[0,1],Q(∞) ) fj d,auc,ξ ≡ Lim ◦ DKS,ξ ◦ [0,1],Q(∞) : (F ) (9.2.28) → (C[0,1],ρ C[0,1] is uniformly continuous. Assertion 3 and the corollary are proved.
9.3 a.u. Hoelder Process In this section, let (S,d) be a locally compact metric space. We will prove a theorem, due to Kolmogorov and Chentsov, that gives a sufficient condition on pairwise joint distributions for the construction of an a.u. Hoelder continuous process X : [0,1] × → S, in a sense to be made precise presently. Refer to Sections 9.1 and 9.2 and to Definitions 9.0.2 and 9.0.3 for notations and conventions, especially for the sets Qk and Q∞ of dyadic rationals in [0,1], for each k ≥ 0. Refer to Definition 9.1.3 for the operation Lim that extends, if possible, a process by limit. Lemma 9.3.1. A sufficient condition on pair distributions for a.u. continuity. Let κ ≥ 1 be arbitrary. Let γ ≡ (γk )k=κ,κ+1,... and ε ≡ (εk )k=κ,κ+1,... be two arbitrary sequences of positive real numbers with ∞ k=κ γk < ∞ and ∞ k=κ εk < ∞. Let Z : Q∞ × (,L,E) → (S,d) be an arbitrary process such that for each k ≥ κ and for each α k ≥ γk , we have
P (d(Zt ,Zt+(k+1) ) > αk ) t∈[0,1)Q(k)
+
P (d(Zt+(k+1),Zt+(k) ) > αk ) ≤ 2εk .
(9.3.1)
t∈[0,1)Q(k)
Then the following conditions hold: 1. The extension-by-limit X ≡ Lim (Z) : [0,1] × → (S,d) is an a.u. continuous process, with a modulus of a.u. continuity δ auc (·,γ ,ε,κ) defined as ∞ follows. Let ε > 0 be arbitrary. Take n ≥ κ so large that 12 ∞ k=n γk ∨ 2 k=n εk < ε. Define δ auc (ε,γ ,ε,κ) ≡ 2−n−1 . 2. There exists a sequence (Dn )n=κ,κ+1,... of measurable sets such that (i) Dκ ⊂ Dκ+1 ⊂ . . . ; (ii) for each n ≥ κ, we have P (Dnc ) ≤ 2
∞
εk ;
(9.3.2)
k=n
and (iii) for each n ≥ κ and each ω ∈ Dn , we have d(Xr (ω),Xs (ω)) < 12
∞
k=n
for each r,s ∈ [0,1] with |r − s| < 2−n−1 .
γk
(9.3.3)
a.u. Continuous Process
347
Proof. 1. Let r ∈ Q∞ and k ≥ 1 be arbitrary. There exists a unique uk (r) ≡ u(r,k) ∈ Qk such that r ∈ [uk (r),uk (r) + 2−k ). In words, uk : Q∞ → Qk is the function that assigns to each r ∈ Q∞ the largest member in Qk not to exceed r. Then either r ∈ [uk (r),uk (r) + 2−k−1 ) or r ∈ [uk (r) + 2−k−1,uk (r) + 2−k ). Note that uk (r),uk (r) + 2−k−1 ∈ Qk+1 . Hence we have either uk+1 (r) = uk (r) or uk+1 (r) = uk (r) + 2−k−1 . In either case, |uk (r) − uk+1 (r)| ≤ 2−k−1 ≡ k+1 .
(9.3.4)
Separately, if r ∈ Qk , then uk (r) = r. 2. Let Z : Q∞ × → S be as given. Let k ≥ κ be arbitrary, and fix any αk ∈ [γk ,2γk ). Define Ck ≡ (d(Zt ,Zt+(k+1) ) > αk ) ∪ (d(Zt+(k+1),Zt+(k) ) > αk ). t∈[0,1)Q(k)
(9.3.5) Then P (Ck ) ≤ 2εk , thanks to inequality 9.3.1 in the hypothesis. Let ω ∈ Ckc be arbitrary. Consider each u ∈ Qk and v ∈ Qk+1 with |u − v| ≤ k+1 . There are three possibilities: (i ) v = u, (ii ) v = u + k+1 , or (iii ) v = r + k+1 and u = r + k for some r ∈ [0,1)Qk . In view of equality 9.3.5, each of Conditions (i –iii ) yields d(Zu (ω),Zv (ω)) ≤ αk ,
(9.3.6)
where u ∈ Qk and v ∈ Qk+1 are arbitrary with |u−v| ≤ k+1 , and where ω ∈ Ckc and k ≥ κ are arbitrary. 3. Let n ≥ κ be arbitrary, but fixed until further notice. Define the measurable set ∞ c Ck . Dn ≡ k=n
Then Dn ⊂ Dn+1 and
P Dnc
≡P
∞
k=n
Ck
≤
∞
2εk .
(9.3.7)
k=n
4. Consider each ω ∈ Dn , t ∈ [0,1)Qn , and k ≥ n. Then ω ∈ Dn ⊂ Ckc . Let r ∈ Q∞ [t,t + 2−n ) be arbitrary. According to inequalities 9.3.4 and 9.3.6, we have d(Zu(r,k) (ω),Zu(r,k+1) (ω)) ≤ αk . The triangle inequality therefore yields, for each i ≥ n, d(Zu(r,n) (ω),Zu(r,i) (ω)) ≤
i−1
k=n
d(Zu(r,k) (ω),Zu(r,k+1) (ω)) ≤
∞
k=n
αk . (9.3.8)
348
Stochastic Process
At the same time, because r ∈ [t,t + 2−n ) and r ∈ [un (r),un (r) + 2−n ), the uniqueness of un (r) in Qn implies that t = un (r). Moreover, because r ∈ Q∞ , there exists i ≥ n such that r ∈ Qi , whence ui (r) = r. Inequality 9.3.8 therefore leads to d(Zt (ω),Zr (ω)) = d(Zu(r,n) (ω),Zu(r,i) (ω)) ≤
∞
αk ,
(9.3.9)
k=n
where r ∈ Q∞ [t,t + 2−n ), t ∈ [0,1)Qn , and ω ∈ Dn are arbitrary. 5. Next, consider the endpoint r = t + 2−n of [t,t + 2−n ). Then, because ω ∈ Dn ⊂ Cnc , the defining equality 9.3.5, with k replaced by n, implies that d(Zt (ω),Zt+(n+1) (ω)) ∨ d(Zt+(n+1) (ω),Zt+(n) (ω)) ≤ αn . Hence d(Zt (ω),Zr (ω)) = d(Zt (ω),Zt+(n) (ω)) ≤ d(Zt (ω),Zt+(n+1) (ω)) + d(Zt+(n+1) (ω),Zt+(n) (ω)) ≤ 2αn ≤ 2
∞
αk ,
k=n
where r = t
+ 2−n .
Combining with inequality 9.3.9, this yields d(Zt (ω),Zr (ω)) ≤ 2
∞
αk ,
(9.3.10)
k=n
where r ∈ Q∞ [t,t + 2−n ], t ∈ [0,1)Qn , and ω ∈ Dn are arbitrary. The same inequality 9.3.10 also holds when the condition t ∈ [0,1)Qn is relaxed to t ∈ [0,1]Qn = Qn , because if t = 1, then r = 1, and inequality 9.3.10 holds trivially. 6. Again, consider each ω ∈ Dn . Write m ≡ n+1. Then ω ∈ Dm . Let r,s ∈ Q∞ be arbitrary such that |s − r| < 2−m ≡ 2−n−1 . Assume, without loss of generality, that s ≤ r. Let t ≡ um (r) ∈ Qm . Then r ∈ Q∞ [t,t + 2−m ]. Hence, by inequality 9.3.10, we have d(Zt (ω),Zr (ω)) ≤ 2
∞
αk .
(9.3.11)
k=m
Moreover, s ∈ [t − 9.3.10, we have
2−m,t]
or, s ∈ [t,t +
2−m ].
d(Zt (ω),Zs (ω)) ≤ 4
Therefore, again by inequality
∞
αk .
(9.3.12)
k=m
Combining inequalities 9.3.11, and 9.3.12, we obtain d(Zr (ω),Zs (ω)) ≤ 6
∞
k=n+1
αk < 12
∞
γk
k=n
for arbitrary r,s ∈ Q∞ with |s − r| < 2−n−1 , where ω ∈ Dn is arbitrary.
(9.3.13)
a.u. Continuous Process 349 7. Since P (Dnc ) ≤ k=n 2εk → 0 and 12 ∞ k=n γk → 0 as n → ∞, it follows that the process Z is a.u. continuous, with a modulus of a.u. continuity δ auc defined as follows. Let ε > 0 be arbitrary. Let n ≡ n(ε,γ ,ε,κ) ≥ κ be so large that ∞
12
∞
k=n
γk ∨ 2
∞
εk < ε.
k=n
Define δ auc (ε) ≡ δ auc (ε,γ ,ε,κ) ≡ 2−n−1 . 8. Theorem 9.1.4 then says that the extension-by-limit X ≡ Lim (Z) : [0,1] × → (S,d) is an a.u. continuous process, with the same modulus of a.u. continuity δ auc (·,γ ,ε,κ). Moreover, inequality 9.3.13 implies that d(Xr (ω),Xs (ω)) ≤ 12
∞
γk ,
(9.3.14)
k=n
where r,s ∈ Q∞ are arbitrary with |r − s| < 2−n−1 , and where ω ∈ Dn is arbitrary. Since X is a.u. continuous on [0,1] and since Q∞ is dense in [0,1], it follows that inequality 9.3.14 holds for arbitrary r,s ∈ [0,1] with |r − s| < 2−n−1 and for arbitrary ω ∈ Dn . The lemma is proved. Definition 9.3.2. a.u. Hoelder continuous process. Let a > 0 be arbitrary. A process X : [0,a] × → (S,d) is said to be a.u. globally Hoelder, or simply a.u. Hoelder, if there exists a constant θ > 0 and for each ε > 0, there exists a measurable set D with P (D c ) < ε and a real number cH (ε) such that for each ω ∈ D, we have d(Xr (ω),Xs (ω)) < cH (ε)|r − s|θ
(9.3.15)
for each r,s ∈ [0,a]. The constant θ is called a Hoelder exponent of the process X, and the operation cH is called an a.u. Hoelder coefficient of the process X. We emphasize that θ and cH are independent of r,s, which explains our use of the adverb globally. Theorem 9.3.3. A sufficient condition on pair distributions for a.u. Hoelder continuity. Let (S,d) be a locally compact metric space. Let c0,u,w > 0 be arbitrary. Let θ > 0 be arbitrary such that θ < u−1 w. Then the following conditions hold: 1. Suppose Z : Q∞ × (,L,E) → (S,d) is an arbitrary process such that P (d(Zr ,Zs ) > b) ≤ c0 b−u |r − s|1+w
(9.3.16)
for each b > 0, for each r,s ∈ Q∞ . Then the extension-by-limit X ≡ Lim (Z) : [0,1] × → (S,d) is a.u. Hoelder with exponent θ , and with some a.u. Hoelder coefficient cH ≡ cH (·,c0,u,w,θ ).
350
Stochastic Process
2. Inequality 9.3.16 is satisfied if Ed(Zr ,Zs )u ≤ c0 |r − s|1+w
(9.3.17)
for each r,s ∈ Q∞ . 3. Suppose F is a consistent family of f.j.d.’s such that Fr,s 1(d>b) ≤ c0 b−u |r − s|1+w
(9.3.18)
for each b > 0, for each r,s ∈ [0,1]. Then F is the family of marginal distributions of some process X : [0,1] × → (S,d) that is a.u. Hoelder with exponent θ , and with some a.u. Hoelder coefficient cH ≡ cH (·,c0,u,w,θ ). Inequality 9.3.18 is satisfied if Fr,s d u ≤ c0 |r − s|1+w
(9.3.19)
for each r,s ∈ [0,1]. Proof. Let Z : Q∞ × (,L,E) → (S,d) be an arbitrary process such that inequality 9.3.16 holds. 1. As an abbreviation, define the positive constants a ≡ (w − θ u), c1 ≡ 12(1 − 2−θ )−1, and c ≡ c1 22θ . Fix κ ≡ κ(c0,a,w) so large that (i) κ ≥ 2 ∨ 2c0 2−w−1 and (ii) 2−ka k 2 ≤ 1 for each k ≥ κ. Thus K is determined by c0,u,w, and θ . We will verify that the process X is a.u. Hoelder with Hoelder exponent θ . 2. To that end, let k ≥ κ be arbitrary. Define εk ≡ c0 2−w−1 k −2
(9.3.20)
γk ≡ 2−kw/u k 2/u .
(9.3.21)
and
Take any αk ≥ γk such that the set (d(Zt ,Zs ) > αk ) is measurable for each t,s ∈ Q∞ . Let t ∈ [0,1)Qk be arbitrary. We estimate P (d(Zt ,Zt+(k+1) ) > αk ) ≤ c0 αk−u 1+w k+1 ≤ c0 γk−u 2−(k+1)w 2−(k+1) = c0 2kw k −2 2−(k+1)w 2−(k+1) = c0 2−w−1 k −2 2−k , where the first inequality is thanks to inequality 9.3.16 in the hypothesis. Similarly, P (d(Zt+(k+1),Zt+(k) ) > αk ) ≤ c0 2−w−1 k −2 2−k . Combining, we obtain
P (d(Zt ,Zt+(k+1) ) > αk ) + t∈[0,1)Q(k)
P (d(Zt+(k+1),Zt+(k) ) > αk )
t∈[0,1)Q(k)
≤ 2 · 2k (c0 2−w−1 k −2 2−k ) = 2c0 2−w−1 k −2 ≡ 2εk , where k ≥ κ is arbitrary.
a.u. Continuous Process 351 ∞ 3. Since k=κ γk < ∞ and k=κ εk < ∞, the conditions in the hypothesis of Theorem 9.3.1 are satisfied by the objects Z,(γk )k=κ,κ+1···,(εk )k=κ,κ+1··· . Accordingly, the extension-by-limit X ≡ Lim (Z) : [0,1] × → (S,d) is a.u. continuous. 4. Moreover, according to Assertion 2 of Lemma 9.3.1, there exists a sequence (Dn )n=κ,κ+1,... of measurable sets such that (i) Dκ ⊂ Dκ+1 ⊂ · · · , (ii) for each n ≥ κ, we have ∞
P (Dnc ) ≤ 2
∞
εk ;
(9.3.22)
k=n
and (iii) for each n ≥ κ and each ω ∈ Dn , we have d(Xr (ω),Xs (ω)) < 12
∞
γk
(9.3.23)
k=n
for each r,s ∈ [0,1] with |r − s| ≤ 2−n−1 . Because the process X is a.u. continuous, we may assume that X(·,ω) is continuous on [0,1] for each ω ∈ Dn , for each n ≥ κ. 5. We will now estimate bounds for the partial sum on the right-hand side of each of the inequalities 9.3.22 and 9.3.23. To that end, consider each n ≥ κ. Then ! ∞ ∞ ∞
−w−1 −2 −w−1 εk ≡ 2 c0 2 k ≤ 2c0 2 y −2 dy = 2c0 2−w−1 (n − 1)−1 . 2 k=n
y=n−1
k=n
(9.3.24) At the same time, 12
∞
γk ≡ 12
k=n
∞
k=n
= 12
∞
k=n
2−kw/u k 2/u ≡ 12
∞
2−kθu/u 2−ka/u k 2/u
k=n
2−kθ 2−ka/u k 2/u ≤ 12
∞
2−kθ
k=n
= 12 · 2−nθ (1 − 2−θ )−1 ≡ c1 2−nθ ,
(9.3.25)
where the second equality is because a ≡ (w − θ u), and where the inequality is because 2−ka k 2 ≤ 1 for each k ≥ n ≥ κ. 6. Recall that κ ≥ 2 ∨ 2c0 2−w−1 . Now let ε > 0 be arbitrary. Take m ≥ κ so large that 2c0 2−w−1 (m − 1)−1 < ε.
(9.3.26)
Thus m is determined by ε, κ, c0 , and w. In Step 1, we saw that κ is determined by c0 , θ,u, and w. Hence m is determined by ε, θ,u,c0 , and w. It follows that cH ≡ c2(m+1) is determined by ε, θ,u,c0 , and w. Inequalities 9.3.22, 9.3.24, and c ) ≤ ε. Consider each ω ∈ D ⊂ D 9.3.26 together imply that P (Dm m m+1 ⊂ . . ..
352
Stochastic Process
Then, for each n ≥ m, we have ω ∈ Dn , whence inequalities 9.3.23 and 9.3.25 together imply that d(Xr (ω),Xs (ω)) < c1 2−nθ
(9.3.27)
for each (r,s) ∈ Gn ≡ {(r,s) ∈ Q2∞ : 2−n−2 ≤ |r − s| ≤ 2−n−1 }. 7. Hence d(Xr (ω),Xs (ω)) < c1 22θ 2−(n+2)θ ≤ c1 22θ |r − s|θ ≡ c|r − s|θ
(9.3.28)
for each (r,s) ∈ Gn , for each n ≥ m. 8. Therefore d(Xr (ω),Xs (ω)) < c|r − s|θ
(9.3.29)
for each (r,s) ∈ Gm ≡ {(r,s) ∈ Q2∞ : |r − s| ≤ 2−m−1 } =
∞
Gn .
n=m
9. Let r,s ∈ Q∞ be arbitrary. Now write n ≡ 2m+1 ≥ m. For each i = 0, . . . ,n, define ri ≡ (1 − in−1 )r + in−1 s ∈ Q∞ . Then r0 = r, rn = s, and |ri − ri−1 | = n−1 |r − s| ≡ 2−m−1 |r − s| ≤ 2−m−1 . Hence (ri−1,ri ) ∈ Gm for each i = 1, . . . ,n. Therefore, in view of inequality 9.3.29, we obtain d(Xr (ω),Xs (ω)) ≤
n
d(Xr(i−1) (ω),Xr(i) (ω)) < nc|r − s|θ
i=1 m+1
≡2
c|r − s|θ ≡ cH |r − s|θ ,
(9.3.30)
where r,s ∈ Q∞ and ω ∈ Dm are arbitrary. Because X(·,ω) is continuous on [0,1], inequality 9.3.30 implies that d(Xr (ω),Xs (ω)) ≤ cH |r − s|θ ,
(9.3.31)
for each r,s ∈ [0,1], where ω ∈ Dm is arbitrary. 10. Thus the process X is a.u. Hoelder, with exponent θ and with coefficient cH , as alleged. Assertion 1 is proved. 11. Assertion 2 follows from Chebychev’s inequality. 12. Suppose F is a consistent family of f.j.d.’s such that Fr,s 1(d>b) ≤ c0 b−u |r − s|1+w
(9.3.32)
for each b > 0, for each r,s ∈ [0,1]. Then F is a consistent family of f.j.d.’s with state space S and parameter space [0,1] that is continuous in probability.
a.u. Continuous Process
353
Let Z : Q∞ × (,L,E) → (S,d) be an arbitrary process whose marginal distributions are given by F |Q∞ . For example, we can take Z to be the Daniell– Kolmogorov–Skorokhod Extension of the family F |Q∞ . Then inequality 9.3.32 shows that the a.u. continuous process Z satisfies inequality 9.3.16. Hence, by Assertion 1, the extension-by-limit X ≡ Lim (Z) : [0,1] × → (S,d) is a.u. Hoelder with exponent θ , and with a.u. Hoelder coefficient cH . Let F denote the family of marginal distributions of the process X. Then, since the process X is a.u. continuous, the family F is continuous in probability. At the same time, for each sequence s1, . . . ,sm ∈ Q∞ , the distribution Fs(1),...,s(m) is the joint distribution of the r.v.’s Zs(1), . . . ,Zs(m) by the construction of the process is the joint distribution of the r.v.’s Xs(1), . . . ,Xs(m) . Now Z, while Fs(1),...,s(m) (Xs(1), . . . ,Xs(m) ) = (Zs(1), . . . ,Zs(m) ) because X ≡ Lim (Z). We conclude , where m ≥ 1 and s1, . . . ,sm ∈ Q∞ are arbitrary. that Fs(1),...,s(m) = Fs(1),...,s(m) Therefore, by Assertion 2 of Lemma 6.2.11, we have F = F . In other words, the marginal distributions of the a.u. Hoelder process X are given by the family F . Assertion 3 of the present theorem is proved.
9.4 Brownian Motion One application of Theorem 9.3.3 is the construction of the Brownian motion. In the this section, let the dimension m ≥ 1 be arbitrary, but fixed. Definition 9.4.1. Brownian motion in R m . An a.u. continuous process B : [0,∞) × (Ω,L,E) → R m is called a Brownian motion in R m if (i) B0 = 0; (ii) for each sequence 0 ≡ t0 ≤ t1 ≤ · · · ≤ tn−1 ≤ tn in [0,∞), the r.v.’s Bt (1) − Bt (0), . . . ,Bt (n) − Bt (n−1) are independent; and (iii) for each s,t ∈ [0,∞), the r.v. Bt − Bs is normal with mean 0 and covariance matrix |t − s|I . We first construct a Brownian motion in R 1 . In the following, recall that Q∞ stands for the set of dyadic rationals in [0,∞). Theorem 9.4.2. Construction of Brownian motion in R. Brownian motion in R exists. Specifically, the following conditions hold: 1. Let Z : Q∞ × (,L,E) → R be an arbitrary process such that (i) Z0 = 0; (ii) for each sequence 0 ≡ t0 ≤ t1 ≤ · · · ≤ tn−1 ≤ tn in Q∞ , the r.r.v.’s Zt (1) − Zt (0), . . . ,Zt (n) − Zt (n−1) are independent; and (iii) for each s,t ∈ Q∞ , the r.r.v. Zt − Zs is normal with mean 0 and variance |t − s|. Then the extension-by-limit B ≡ Lim (Z) : [0,∞) × → R is a Brownian motion. 2. For each n ≥ 1 and for each t1, . . . ,tn ∈ Q∞ , define the f.j.d. Ft (1),...,t (m) ≡ 0,σ ,
354
Stochastic Process
where σ ≡ [σ (tk ,tj )]k=1,...,n;j =1,...,n ≡ [tk ∧ tj ]k=1,...,n;j =1,...,n . Then the family F ≡ {Ft (1),...,t (m) : m ≥ 1;t1, . . . ,tm ∈ [0,∞)} of f.j.d.’s is consistent and is continuous in probability. 3. Let Z : Q∞ × (,L,E) → R be an arbitrary process with marginal distributions given by the family F |Q∞ , where F is defined in Assertion 2. Then the extension-by-limit B ≡ Lim (Z) : [0,∞) × → R is a Brownian motion. Proof. For convenience, let U,U1,U2, . . . be an independent sequence of standard E). Such a sequence can be , L, normal r.r.v.’s. on some probability space ( E) to be the infinite product of the probability , L, seen to exist by taking ( 0, ·d 0,1 ), and taking U,U1,U2, . . . to be the successive coordinate space (R, L . functions on 1. Let Z : Q∞ × (,L,E) → R be an arbitrary process such that Conditions (i–iii) hold. Let b > 0 and s1,s2 ∈ Q∞ be arbitrary. Then, by Condition (iii), the r.r.v. Zs(1) − Zs(2) is normal with mean 0 and variance |s1 − s2 |. Consequently, by the formulas in Proposition 5.7.5 for moments of standard normal r.r.v.’s, we obtain E(Zs(1) − Zs(2) )4 = 3|s1 − s2 |2 . Chebychev’s inequality then implies that, for each b > 0, we have P (|Zs(1) − Zs(2) | > b) = P ((Zs(1) − Zs(2) )4 > b4 ) ≤ b−4 E(Zs(1) − Zs(2) )4 = 3b−4 |s1 − s2 |2,
(9.4.1)
where s1,s2 ∈ Q∞ are arbitrary. 2. Let N ≥ 0 be arbitrary and consider the shifted process Y ≡ Z N : Q∞ × (,L,E) → R defined by ZsN ≡ ZN +s for each s ∈ Q∞ . Note that Z0N = Z1N −1 if N ≥ 1. Then, for each b > 0 and s1,s2 ∈ Q∞ , we have P (|Ys(1) − Ys(2) | > b) ≡ P (|ZN +s(1) − ZN +s(2) | > b) ≤ 3b−4 (|(N + s1 ) − (N + s2 )|2 = 3b−4 |s1 − s2 |2, where the inequality follows from inequality 9.4.1. Thus the process Y satisfies the hypothesis of Theorem 9.3.3, with c0 = 3, u = 4, and w = 1. Accordingly, the extension-by-limit W ≡ Lim (Y ) : [0,1] × → R is a.u. Hoelder, and hence a.u. continuous. In particular, for each t ∈ [N,N + 1], the limit Bt ≡
lim
r→t−N ;r∈Q(∞)
ZrN ≡
lim
r→t−N ;r∈Q(∞)
Yr ≡ Wt−N
exists as an r.r.v. In other words, B|[N,N + 1] : [N,N + 1] × → R is a well-defined process. Moreover, since the process W is a.u. Hoelder, we see that B|[N,N + 1] is a.u. Hoelder, where N ≥ 0 is arbitrary. Combining, the process
a.u. Continuous Process
355
B : [0,∞) × → R is a.u. continuous, in the sense of Definition 6.1.3. Note that B0 = Z0 = 0 by Condition (i). 3. Let the sequence 0 ≡ t0 ≤ t1 ≤ · · · ≤ tn−1 ≤ tn in [0,∞) and the sequence 0 ≡ s0 ≤ s1 ≤ · · · ≤ sn−1 ≤ sn in Q∞ be arbitrary. Let fi ∈ Cub (R) be arbitrary for each i = 1, . . . ,n. Then si ∈ [Ni ,Ni + 1] for some Ni ≥ 0, for each i = 0, . . . ,n. We may assume that 0 = N0 ≤ N1 ≤ · · · ≤ Nn . Therefore Bs(i) − Bs(i−1) =
N (i)
lim
r→s(i)−N (i);r∈Q(∞)
−
Zs(i)−N (i)
lim
r→s(i−1)−N (i−1);r∈Q(∞)
N (i)
N (i−1) Zs(i−1)−N (i−1)
N (i−1)
= Zs(i)−N (i) − Zs(i−1)−N (i−1) N (i)−1 N (i) N (i) N (i)−1 + Z1 = Zs(i)−N (i) − Z0 − Z0
+ · · · + Z1N (i−1)−1 − Z0N (i−1)−1 N (i−1)
N (i−1) + Z1 (9.4.2) − Zs(i−1)−N (i−1) , where, according to Conditions (ii) and (iii) in the hypothesis, the terms in parentheses are independent, normal, with mean 0 and variances si −Ni ,1, . . . ,1− (si−1 − Ni−1 ), respectively. Hence Bs(i) − Bs(i−1) is normal, with mean 0 and variance si − si−1 . Moreover, in the special case where s1 = s ∈ Q∞ and s0 = 0, equality 9.4.2 shows that Bs = Zs , where s ∈ Q∞ is arbitrary. 4. Therefore n n E fi (Bs(i) − Bs(i−1) ) = E fi (Zs(i) − Zs(i−1) ) i=1
i=1
=
n
Efi (Zs(i) − Zs(i−1) )
i=1
=
n !
0,s(i)−s(i−1) (du)fi (u),
(9.4.3)
i=1 R
where the second and third equalities are due to Conditions (ii) and (iii), respectively. Now let si → ti for each i = 1, . . . ,n. Since the process B is a.u. continu ous, the left-hand side of equality 9.4.3 converges to E ni=1 fi (Bt (i) − Bt (i−1) ). At the same time, since, for each i = 1, . . . ,n, the integral ! √ ( rU ) 0,r (du)fi (u) = Ef R
is a continuous function of r ∈ [0,∞), the right-hand side of equality 9.4.3 converges to n ! 0,t (i)−t (i−1) (du)fi (u). i=1 R
356
Stochastic Process
Combining, equality 9.4.3 leads to E
n
fi (Bt (i) − Bt (i−1) ) =
i=1
n !
0,t (i)−t (i−1) (du)fi (u).
i=1 R
Consequently, the r.r.v.’s Bt (1) − Bt (0), . . . ,Bt (n) − Bt (n−1) are independent, with normal distributions, with mean 0 and variances given by t1 − t0, . . . ,tn − tn−1 , respectively. All the conditions in Definition 9.4.1 have been verified for the process B to be a Brownian motion. Assertion 1 is proved. 5. To prove Assertion 2, define the function σ : [0,∞)2 → [0,∞) by σ (s,t) ≡ s ∧ t for each (s,t) ∈ [0,∞)2 . Then the function σ is symmetric and continuous. We will verify that it is nonnegative definite in the sense of Definition 7.2.2. To that end, let n ≥ 1 and t1, . . . ,tn ∈ [0,∞) be arbitrary. We need only show that the square matrix σ ≡ [σ (tk ,tj )]k=1,...,n;j =1,...,n ≡ [tk ∧ tj ]k=1,...,n;j =1,...,n is nonnegative definite. Let (λk , . . . ,λk ) ∈ R n be arbitrary. We wish to prove that n
n
λk (tk ∧ tj )λj ≥ 0.
(9.4.4)
k=1 j =1
First assume that |tk − tj | > 0 if k j . Then there exists a permutation π of the indices 1, . . . ,n such that tπ(k) ≤ tπ(j ) iff k ≤ j . It follows that n
n
λk (tk ∧ tj )λj =
k=1 j =1
n
n
λπ(k) (tπ(k) ∧ tπ(j ) )λπ(j ) ≡
k=1 j =1
n
n
θk (sk ∧ sj )θj ,
k=1 j =1
(9.4.5) where we write sk ≡ tπ(k) and θk ≡ λπ(k) for each k = 1, . . . ,n. Recall the E). , L, independent standard normal r.r.v.’s.U1, . . . ,Un on the probability space ( k Thus EUk Uj = 1 or 0 according as k = j or k j . Define Vk ≡ i=1 √ k = 0 and si − si−1 Ui for each k = 1, . . . ,n, where s0 ≡ 0. Then EV k Vj = EV
k∧j k∧j
i2 = (si − si−1 )EU (si − si−1 ) = sk∧j = sk ∧ sj i=1
(9.4.6)
i=1
for each k,j = 1, . . . ,n. Consequently, the right-hand side of equality 9.4.5 becomes n 2 n n n
n
θk (sk ∧ sj )θj = E θk Vk Vj θj = E θk Vk ≥ 0. k=1 j =1
k=1 j =1
k=1
Hence the sum on the left-hand side of equality 9.4.5 is nonnegative. In other words, inequality 9.4.4 is valid if the point (t1, . . . ,tn ) ∈ [0,∞)n is such that |tk − tj | > 0 if k j . Since the set of such points is dense in [0,∞)n , inequality 9.4.4
a.u. Continuous Process
357
holds, by continuity, for each (t1, . . . ,tn ) ∈ [0,∞)n . In other words, the function σ : [0,∞)2 → [0,∞) is nonnegative definite according to Definition 7.2.2. 6. Consider each m ≥ 1 and each sequence t1, . . . ,tm ∈ [0,∞). Write the nonnegative definite matrix σ ≡ [σ (tk ,th )]k=1,...,m;h=1,...,m,
(9.4.7)
Ft (1),...,t (m) ≡ 0,σ ,
(9.4.8)
and define
where 0,σ is the normal distribution with mean 0 and covariance matrix σ . Take any M ≥ 1 so large that t1, . . . ,tm ∈ [0,M]. Proposition 7.2.3 says that the family F (M) ≡ {Fr(1),...,r(m) : m ≥ 1;r1, . . . ,rm ∈ [0,M]} is consistent and is continuous in probability. Hence, for each f ∈ C(R n ) and for each sequence mapping i : {1, . . . ,n} → {1, . . . ,m}, we have Ft (1),...,t (m) (f ◦ i ∗ ) = Ft (i(1)),...,t (i(n)) f ,
(9.4.9)
where the dual function i ∗ : R m → R n is defined by i ∗ (x1, . . . ,xm ) ≡ (xi(1), . . . ,xi(n) ).
(9.4.10)
7. To prove the remaining Assertion 3, let Z : Q∞ × (,L,E) → R be an arbitrary process with marginal distributions given by the family F |Q∞ , where F is defined in Assertion 2. Such a process Z exists by the Daniell–Kolmogorov Theorem or the Daniell–Kolmogorov–Skorokhod Extension Theorem. 8. Let t1,t2 ∈ Q∞ be arbitrary. Then the r.r.v.’s Zt (1),Zt (2) have a jointly normal distribution given by Ft (1),t (2) ≡ 0,σ , where σ ≡ [tr ∧ th ]k=1,2;h=1,2 . Hence EZt (1) Zt (2) = t1 ∧ t2 . It follows that Zt (1) − Zt (2) is a normal r.r.v. with mean 0, and with variance given by E(Zt (1) − Zt (2) )2 = EZt2(1) + ZBt2(2) − 2EZt (1) Zt (2) = t1 + t2 − 2t1 ∧ t2 = |t1 − t2 |. 9. Now let 0 ≡ t0 ≤ t1 ≤ · · · ≤ tn−1 ≤ tn be arbitrary in Q∞ . Then the r.r.v.’s Zt (1), · · · ,Zt (n) have joint distribution Ft (1),...,t (n) according to Step 7. Hence Zt (1), · · · ,Zt (n) are jointly normal. Therefore the r.r.v.’s Zt (1) − Zt (0), . . . ,Zt (n) − Zt (n−1) are jointly normal. Moreover, for each i,k = 1, . . . ,n with i < k, we have E(Zt (i) − Zt (i−1) )(Zt (k) − Zt (k−1) ) = EZt (i) Zt (k) − EZt (i) Zt (k−1) − EZt (i−1) Zt (k) + EZt (i−1) Zt (k−1) = tt − ti − ti−1 + ti−1 = 0.
(9.4.11)
358
Stochastic Process
Thus the jointly normal r.r.v.’s Zt (1) − Zt (0), . . . ,Zt (n) − Zt (n−1) are pairwise uncorrelated. Hence, by Assertion 3 of Proposition 5.7.6, they are mutually independent. Summing up Steps 8 and 9, all of Conditions (i–iii) of Assertion 1 have been verified for the process Z. Accordingly, the extension-by-limit B ≡ Lim (Z) : [0,∞) × → R
is a Brownian motion. Assertion 3 and the Theorem are proved.
The following corollary is Levy’s well-known result on the a.u. Hoelder continuity of a Brownian motion on each finite interval. A stronger theorem by Levy gives the best modulus 1 of a.u. continuity of a Brownian motion, but shows that an arbitrary θ ∈ 0, 2 is the best global Hoelder exponent that can be hoped for. Namely, a.u. Hoelder continuity for Brownian motion with global Hoelder exponent θ = 12 fails. Corollary 9.4.3. Levy’s Theorem: Brownian motion on a finite interval is a.u. Hoelder with any global Hoelder exponent less than 12 . Let
B : [0,∞) × (Ω,L,E) → R be an arbitrary Brownian motion. Let θ ∈ 0, 12 and a > 0 be arbitrary but fixed. Then B|[0,a] is a.u. Hoelder, with Hoelder exponent θ and with some a.u. Hoelder coefficient cH (·,a,θ ). We emphasize that θ and cH are independent of the time parameters r,s ∈ [0,a]. For that reason, θ and cH may be called global coeffiecients. Proof. Since θ < 12 , there exists m ≥ 0 so large that θ < (2 + 2m)−1 m. Consider the process X : [0,1] × (Ω,L,E) → R defined by Xt ≡ Bat for each t ∈ [0,1]. Consider each b > 0 and each r,s ∈ [0,1]. Then the r.r.v. Xs − Xr ≡ Bas − Bar is normally distributed with mean 0 and variance a|r − s|. Therefore 2 E|Xs − Xr |2+2m = E|U a|r − s||2+2m = c0 a 1+m |r − s|1+m, where U is a standard normal r.r.v. and c0 ≡ EU 2+2m . Thus the process X|Q∞ satisfies inequality 9.3.17 in the hypothesis of Theorem 9.3.3 with u ≡ 2 + 2m, c0 ≡ c0 a 1+m , and w ≡ m. Note that θ < u−1 w by the choice of m. Hence, accordingly to Theorem 9.3.3, the process X is a.u. Hoelder, with exponent θ and with some a.u. Hoelder coefficient cH ≡ cH (·,a,θ ) ≡ cH (·,c0,u,w,θ ). Let ε > 0 be arbitrary. Then, according to Definition 9.3.2, there exists a measurable set D with P (D c ) < ε such that for each ω ∈ D, we have |Xr (ω) − Xs (ω)| < cH (ε)|r − s|θ
(9.4.12)
for each r,s ∈ [0,1]. Now consider each ω ∈ D and each t,u ∈ [0,a] with |t − u| < a. Then inequality 9.4.12 yields |Bt (ω) − Bu (ω)| ≡ |Xt/a (ω) − Xu/a (ω)| < cH (ε)|a −1 t − a −s s|θ = cH (ε)a −1 |t − s|θ .
(9.4.13)
a.u. Continuous Process
359
Thus we see that the process B|[0,a] is a.u. Hoelder, with Hoelder exponent θ and a.u. Hoelder coefficient cH ≡ a −1 cH , according to Definition 9.3.2, as alleged. Theorem 9.4.4. Construction of Brownian motion in R m . Brownian motion in R m exists. Proof. Let U1, . . . ,Um be an independent sequence of standard normal r.r.v.’s. on E). Such a sequence exists by taking ( E) to be , L, , L, some probability space ( 0, ·d 0,1 ), and taking U1, . . . ,Um the mth power of the probability space (R, L to be the coordinate mappings. 1. According to Theorem 9.4.2, there exists a Brownian motion B : [0,∞) × (,L,E) → R with some sample space (,L,E). Now define the mth direct product (,L,E) ≡ (m,L⊗m,E ⊗m ). Let t,s ∈ [0,∞), define B t (ω) ≡ (Bt (ω1 ), . . . ,Bt (ωm ))
(9.4.14)
for each ω ≡ (ω1, . . . ,ωm ) ∈ ≡ m . Then B t is an r.v. on (,L,E) with values in R m , with EB t = 0. Thus B : [0,∞)×(,L,E) → R is a process with values in R m . Equality 9.4.14 says that the j th coordinate of B t (ω) is (B t (ω))j = Bt (ωj ) for each j = 1, . . . ,m. 2. Let a > 0 be arbitrary. Then B|[0,a] is a.u. Hoelder according to Corollary 9.4.3, and hence a.u. continuous. Let ε > 0 be arbitrary. Then, by Definition 6.1.2, there exist δauc (ε) > 0 and a measurable set D ⊂ with P (D c ) < ε such that for each ω ∈ D, we have (i) domain(B(·,ω)) = [0,a] and (ii) |B(t,ω)−B(s,ω)| ≤ ε for each t,s ∈ [0,a] with |t − s| < δauc (ε). Define D≡
m
{(ω1, . . . ,ωm ) ∈ m : ωi ∈ D}.
i=1
Then c
P (D ) ≤
m
P {(ω1, . . . ,ωm ) ∈ : ωi ∈ D } = m
i=1
c
m
P (D c ) < mε,
i=1
where the equality is by Fubini’s Theorem. 3. Consider each ω ∈ D. Then (i ) domain(B(·,ω)) = m i=1 domain(B(·,ωi )) = [0,a] and (ii ) |B(t,ω) − B(s,ω)| ≡ |(Bt (ω1 ), . . . ,Bt (ωm )) − (Bs (ω1 ), . . . ,Bs (ωm ))| ≤
m
i=1
|B(t,ωi ) − B(s,ωi )| ≤ mε
360
Stochastic Process
for each t,s ∈ [0,a] with |t − s| < δauc (ε). Thus the conditions in Definition 6.1.2 are satisfied for the process B to be a.u. continuous on [0,a], where a > 0 is arbitrary. Hence, according to Definition 6.1.3, the process B is a.u. continuous on [0,∞). 4. By the defining equality 9.4.14, we have B 0 = 0 ≡ (0, . . . ,0) ∈ R m . 5. Let 0 ≡ t0 ≤ t1 ≤ · · · ≤ tn−1 ≤ tn in [0,∞) be arbitrary. Consider each i = 1, . . . ,n and j = 1, . . . ,m. Let fi,j ∈ Cub (R) be arbitrary. Define Vi,j (ω) ≡ Bt (i) (ωj ) − Bt (i−1) (ωj ) for each ω ≡ (ω1, . . . ,ωm ) ∈ ≡ m . Then (Vi,1, . . . ,Vi,m )(ω) ≡ (Bt (i) (ω1 ) − Bt (i−1) (ω1 ), . . . ,Bt (i) (ωm ) − Bt (i−1) (ωm )) = (B(ti ) − B(ti−1 ))(ω) for each ω ≡ (ω1, . . . ,ωm ) ∈ ≡ m . Hence (Vi,1, . . . ,Vi,m ) = B(ti ) − B(ti−1 ).
(9.4.15)
Moreover, E
n m
fi,j (Vi,j )
j =1 i=1
!
≡
E(d(ω1, . . . ,ωm ))
n m
fi,j (Vi,j (ω1, . . . ,ωm ))
j =1 i=1
! ≡
E(d(ω1, . . . ,ωm ))
m n
fi,j (Bt (i) (ωj ) − Bt (i−1) (ωj ))
j =1 i=1
! =
! ···
E(dω1 ) · · · E(dωm )
m n
fi,j (Bt (i) (ωj ) − Bt (i−1) (ωj ))
j =1 i=1
=
m !
E(dωj )
j =1
≡
m j =1
n
fi,j (Bt (i) (ωj ) − Bt (i−1) (ωj ))
i=1
E
n i=1
fi,j (Bt (i) − Bt (i−1) ) =
n m
Efi,j (Bt (i) − Bt (i−1) ), (9.4.16)
j =1 i=1
where we used Fubini’s Theorem. 6. In the special case where fi ,j = 0 for each i = 1, . . . ,n, for each j = 1, . . . ,m such that i i and j j , equality 9.4.16 reduces to Efi,j (Vi,j ) = Efi,j (Bt (i) − Bt (i−1) ).
(9.4.17)
a.u. Continuous Process
361
Substituting this back into equality 9.4.16, we obtain E
n m
fi,j (Vi,j ) =
j =1 i=1
n m
Efi,j (Vi,j ),
j =1 i=1
where fi,j ∈ Cub (R) is arbitrary for each i = 1, . . . ,n, for each j = 1, . . . ,m. This implies that the r.r.v. s (Vi,j )j =1,...,m;i=1,...,n are mutually independent. 7. Now let gi ∈ Cub (R m ) be arbitrary for each i = 1, . . . ,n. Then E
n i=1
gi (Vi,1, . . . ,Vi,m ) =
n
Egi (Vi,1, . . . ,Vi,m ).
i=1
It follows that the r.v.’s (V1,1, . . . ,V1,m ), . . . ,(Vn,1, . . . ,Vn,m ) are independent. Equivalently, in view of equality 9.4.15, the r.v.’s B t (1) −B t (0), . . . ,B t (n) −B t (n−1) are independent, where 0 ≡ t0 ≤ t1 ≤ · · · ≤ tn−1 ≤ tn is an arbitrary sequence in [0,∞). 8. Let s,t ∈ [0,∞) be arbitrary. Let n = 2. Let (t0,t1,t2 ) ≡ (0,s,t). Let j = 1, . . . ,m be arbitrary. Then equality 9.4.17 shows that the r.r.v. V2,j has the same distribution as the r.r.v. Bt (2) − Bt (1) = Bt − Bs . Hence the independent r.r.v.’s V2,1, . . . ,V2,m are normally distributed with mean 0 and variance |t − s|. Consequently, the r.v. (V2,1, . . . ,V2,m ) is normally distributed with mean 0 ∈ R m and covariance matrix |t − s|I, where I is the m × m identity matrix. Therefore the r.v. B t − B s ≡ B t (2) − B t (1) ≡ (V2,1, . . . ,V2,m ) is normal with mean 0 ∈ R m and covariance matrix |t − s|I . 9. All the conditions in Definition 9.4.1 have been verified for the process B to be a Brownian motion. Corollary 9.4.5. Basic properties of Brownian motion in R m . Let B : [0,∞) × (Ω,L,E) → R m be an arbitrary Brownian motion in R m . Then the following conditions hold: 1. Let A be an arbitrary orthogonal k × m matrix. Thus AAT = I is the k × k identity matrix. Then the process AB : [0,∞) × (Ω,L,E) → R k is a Brownian motion in Rk . 2. Let b be an arbitrary unit vector. Then the process bT B : [0,∞) × (Ω, L,E) → R is a Brownian motion in R 1 . 3. Suppose the process B is adapted to some filtration L ≡ {L(t) : t ∈ [0,∞)}. : [0,∞) × (Ω,L,E) → R m by Let γ > 0 be arbitrary. Define the process B −1/2 Bγ t for each t ∈ [0,∞). Then B is a Brownian motion in R m adapted Bt ≡ γ to the filtration Lγ ≡ {L(γ t) : t ∈ [0,∞)}. Proof. 1. Let A be an orthogonal k × m matrix. Then trivially AB0 = A0 = 0 ∈ R k . Thus Condition (i) of Definition 9.4.1 holds for the process AB. Next,
362
Stochastic Process
let the sequence 0 ≡ t0 ≤ t1 ≤ · · · ≤ tn−1 ≤ tn in [0,∞) be arbitrary. Then the r.v.’s Bt (1) − Bt (0), . . . ,Bt (n) − Bt (n−1) are independent. Hence the r.v.’s A(Bt (1) − Bt (0) ), . . . ,A(Bt (n) − Bt (n−1) ) are independent, establishing Condition (ii) of Definition 9.4.1 for the process AB. Now let s,t ∈ [0,∞) be arbitrary. Then the r.v. Bt − Bs is normal with mean 0 and covariance matrix |t − s|I , where I stands for the k × k identity matrix. Hence A(Bt − Bs ) is normal with mean 0, with covariance matrix E(A(Bt − Bs )(Bt − Bs )T AT ) = A(E(Bt − Bs )(Bt − Bs )T )AT = A(|t − s|I AT ) = |t − s|AAT = |t − s|I . This proves Condition (iii) of Definition 9.4.1 for the process AB. Assertion 1 is proved. 2. Let b be an arbitrary unit vector. Then bT is a 1 × m orthogonal matrix. Hence, according to Assertion 1, the process bT B : [0,∞) × (Ω,L,E) → R is a Brownian motion in R 1 . Assertion 2 is proved. t ≡ γ −1/2 Bγ t for each : [0,∞)×(Ω,L,E) → R m by B 3. Define the process B t ∈ [0,∞). Trivially, Conditions (i) and (ii) of Definition 9.4.1 hold for the process Let s,t ∈ [0,∞) be arbitrary. Then the r.v. Bγ t − Bγ s is normal with mean 0 B. s ≡ γ −1/2 Bγ t −γ −1/2 Bγ s t − B and covariance matrix |γ t −γ s|I . Hence the r.v. B has mean 0 and covariance matrix (γ −1/2 )2 |γ t − γ s|I = |t − s|I . to be a Thus Condition (iii) of Definition 9.4.1 is also verified for the process B t ≡ γ −1/2 Bγ t ∈ Brownian motion. Moreover, for each t ∈ [0,∞), we have B is adapted to the filtration Lγ ≡ {L(γ t) : t ∈ [0,∞)}. L(γ t) . Hence the process B Assertion 3 and the corollary are proved.
9.5 a.u. Continuous Gaussian Process In this section, we will restrict our attention to real-valued Gaussian processes with parameter set [0,1]. Definition 9.5.1. Specification of a covariance function and a Gaussian process. In this section, let σ : [0,1] × [0,1] → R be an arbitrary continuous symmetric nonnegative definite function. For each r,s ∈ [0,1], define r,s ≡ σ (r,s) ≡ σ (r,r) + σ (s,s) − 2σ (r,s).
(9.5.1)
Let X : [0,1] × → R be a measurable Gaussian process such that (i) EXr = 0 and EXr Xs = σ (r,s) for each r,s ∈ [0,1] and (ii) X = Lim (Z), where Z ≡ X|Q∞ : Q∞ × → R is the restriction of X to the countable parameter set Q∞ . By Theorem 7.2.5 and its proof, such a process X exists. Then Xr − Xs is a normal r.r.v. with mean 0 and variance E|Xr − Xs |2 = r,s .
(9.5.2)
a.u. Continuous Process
363
Separately, for each u > 0, define the constant αu to be the absolute uth moment of the standard normal distribution on R. The next theorem seems to be hitherto unknown in the probability literature. Theorem 9.5.2. Sufficient condition for a Gaussian process to be a.u. Hoelder. Suppose there exist constants c0,u,w > 0 such that αu ur,s ≤ c0 |r − s|1+w
(9.5.3)
for each r,s ∈ [0,1]. Then the process X is a.u. globally Hoelder continuous with exponent θ , where θ < u−1 w is arbitrary. Proof. We will give the proof only for the case where σ is positive definite. Let r,s ∈ [0,1] be arbitrary. Then, from the defining equalities 9.5.1 and 9.5.2, we see that −1 r,s (Xr − Xs ) is a standard normal r.r.v. Hence its absolute uth moment is equal to the absolute constant αu . Therefore u u 1+w E|Xr − Xs |u = ur,s E|−1 , r,s (Xr − Xs )| = r,s αu ≤ c0 |r − s|
where the last inequality is by hypothesis. Hence E|Zr − Zs |u ≤ c0 |r − s|1+w for each r,s ∈ Q∞ . Theorem 9.3.3 therefore implies that the process Lim (Z) is a.u. globally Hoelder with exponent θ , where θ < u−1 w is arbitrary. Since X = Lim (Z) by Condition (ii) in Definition 9.5.1, the present theorem is proved. Thus we have a condition in terms of a bound on r,s to guarantee a.u. Hoelder continuity. The celebrated paper by [Garsia, Rodemich, and Rumsey 1970] also gives a condition in terms of a bound on r,s , under which the Gaussian process has an explicit modulus of a.u. continuity. The Garsia–Rodemich–Rumsey (GRR) proof shows that the partial sums of the Karhunen–Loeve expansion relative to σ are, under said condition, a.u. convergent to an a.u. continuous process. We will quote the key real-variable lemma in [Garsia, Rodemich, and Rumsey 1970]. We will then present a proof, which is constructive in every detail, of their main theorem; our proof is otherwise essentially the proof in the cited paper. We dispense with the authors’ use of a submartingale derived from the Karhunen– Loeve expansion, and dispense with their subsequent appeal to a version of the submartingale convergence theorem that asserts the a.u. convergence of each submartingale with bounded expectations. This version of the submartingale convergence implies the principle of infinite search and is not constructive. In the place of the Karhunen–Loeve expansion and the use of submartingales, we will derive Borel–Cantelli estimates on conditional expectations, thus sticking to elementary time-domain analysis and obviating the need, for the present purpose, of more groundwork on spectral analysis of the covariance function. Note that the direct use of conditional expectations in relation to the Karhunen– Loeve expansion is mentioned in [Garsia, Rodemich, and Rumsey 1970] as part of a related result.
364
Stochastic Process 1 In the following discussion, recall that 0 ·dp denotes the Riemann–Stieljes integration relative to an arbitrary distribution function p on [0,1]. Also, note that Y ≡ X|Q∞ : Q∞ × → R is a centered Gaussian process with the continuous nonnegative definite covariance function σ . Definition 9.5.3. Two auxiliary functions. Introduce the auxiliary function 1 2 (9.5.4) v (v) ≡ exp 4 for each v ∈ [0,∞), with its inverse
2 −1 (u) ≡ 2 log u
(9.5.5)
for each u ∈ [1,∞). Next we cite, without the proof from [Garsia, Rodemich, and Rumsey 1970], a remarkable real variable lemma. It is key to the GRR theorem. Lemma 9.5.4. Garsia–Rodemich–Rumsey real-variable lemma. Let the function and its inverse −1 be as in Definition 9.5.3. Let p : [0,1] → [0,∞) be an arbitrary continuous nondecreasing function with p(0) = 0 that is increasing in some neighborhood of 0. Let f be an arbitrary continuous function on [0,1] such that the function (
|f (t) − f (s)| ) p(|t − s|)
of (t,s) ∈ [0,1]2 is integrable, with ! 1! 1 |f (t) − f (s)| dtds ≤ B p(|t − s|) 0 0
(9.5.6)
for some B > 0. Then !
|t−s|
|f (t) − f (s)| ≤ 8 0
−1
4B u2
dp(u)
(9.5.7)
for each (t,s) ∈ [0,1]2 . Proof. See [Garsia, Rodemich, and Rumsey 1970]. The verification that their proof is constructive is left to the reader. Recall from Definition 9.0.2 some notations for dyadic rationals in [0,1]. For each N ≥ 0, we have pN ≡ 2N , N ≡ 2−N , and the sets of dyadic rationals QN ≡ {t0,t1, . . . ,tp(N ) } = {qN,0, . . . ,qN,p(N ) } ≡ {0,N ,2N , . . . ,1} and Q∞ ≡
∞ N =0
QN = {t0,t1, . . .}.
a.u. Continuous Process
365
Recall that [·]1 is the operation that assigns to each a ∈ R an integer [a]1 ∈ (a,a + 2). Recall also the matrix notations in Definition 5.7.1 and the basic properties of conditional distributions established in Propositions 5.6.6 and 5.8.17. As usual, to lessen the burden on subscripts, we write the symbols xy and x(y) interchangeably for arbitrary expressions x and y. Lemma 9.5.5. Interpolation of a Gaussian process by conditional expectations. Let Y : Q∞ × → R be an arbitrary centered Gaussian process with a continuous positive definite covariance function σ . Thus EYt Ys = σ (t,s) and E(Yt − Ys )2 = σ (t,s) for each t,s ∈ Q∞ . Let p : [0,1] → [0,∞) be an arbitrary continuous nondecreasing function such that (σ (s,t))1/2 ≤ p(u) 0≤s,t≤1;|s−t|≤u
for each u ∈ [0,1]. Then the following conditions hold: 1. Let n ≥ 0 and t ∈ [0,1] be arbitrary. Define the r.r.v. (n)
≡ E(Yt |Yt (0), . . . ,Yt (n) ).
Yt
Then, for each fixed n ≥ 0, the process Y (n) : [0,1] × → R is an a.u. continuous (n) centered Gaussian process. Moreover, Yr = Yr for each r ∈ {t0, . . . ,tn }. We will call the process Y (n) the interpolated approximation of Y by conditional expectations on {t0, . . . ,tn }. (·) 2. For each fixed t ∈ [0,1], the process Yt : {0,1, . . .} × → R is a martingale relative to the filtration L ≡ {L(Yt (0), . . . ,Yt (n) ) : n = 0,1, . . .}. 3. Let m > n ≥ 1 be arbitrary. Define (m,n)
Zt
(m)
≡ Yt
(n)
− Yt
∈ L(Yt (0), . . . ,Yt (m) )
for each t ∈ [0,1]. Let ∈ (0,1) be arbitrary. Suppose n is so large that the subset {t0, . . . ,tn } is a -approximation of [0,1]. Define the continuous nondecreasing function p : [0,1] → [0,∞) by p (u) ≡ 2p(u) ∧ 2p() for each u ∈ [0,1]. Then (m,n)
E(Zt
− Zs(m,n) )2 ≤ p2 (|t − s|)
for each t,s ∈ [0,1]. Proof. Since Y : Q∞ × → R is centered Gaussian with covariance function σ , we have E(Yt − Ys )2 = EYt2 − 2EYt Ys + EYs2 = σ (t,s) − 2σ (t,s) + σ (s,s) ≡ σ (t,s) for each t,s ∈ Q∞ 1. Let n ≥ 0 be arbitrary. Then the r.v. Un ≡ (Yt (0), . . . ,Yt (n) ) with values in R n+1 is normal, with mean 0 ∈ R n+1 and the positive definite covariance matrix
366
Stochastic Process σ n ≡ EUn UnT = [σ (th,tj )]h=0,...;n;j =0,...,n .
For each t ∈ [0,1], define cn,t ≡ (σ (t,t0 ), . . . ,σ (t,tn )) ∈ R n+1 . Define the Gaussian process Y
(n)
: [0,1] × → R by
(n)
Yt
T ≡ cn,t σ −1 n Un
for each t ∈ [0,1], where the inverse matrix σ −1 n is defined because the function (n)
σ is positive definite. Then, since cn,t is continuous in t, the process Y is a.u. continuous. Moreover, for each t ∈ Q∞ , the conditional expectation of Yt given Un is, according to Proposition 5.8.17, given by T σ −1 E(Yt |Yt (0), . . . ,Yt (n) ) = E(Yt |Un ) = cn,t n Un ≡ Y t , (n)
(n)
(n)
(9.5.8)
(n)
whence Yt = Y t . Since Y is an a.u. continuous and centered Gaussian process, so is Y (n) . Assertion 1 is proved. Note that for each r ∈ {t0, . . . ,tn } and for each m ≥ n, we have r ∈ {t0, . . . ,tm }, whence Yr ∈ L(Um ) and Yr(m) = E(Yr |Um ) = Yr ,
(9.5.9)
where the second equality is by the properties of the conditional expectation. 2. Let m > n ≥ 1 be arbitrary. Then, for each t ∈ Q∞ , we have (m)
E(Yt
(n)
|Un ) = E(E(Yt |Um )|Un ) = E(Yt |Un ) = Yt ,
where the first and third equalities are by equality 9.5.8, and where the second equality is because L(Un ) ⊂ L(Um ). Hence, for each V ∈ L(Un ), we have EYt(m) V = EYt(n) V (m)
for each t ∈ Q∞ and, by continuity, also for each t ∈ [0,1]. Thus E(Yt |Un ) = (n) Yt for each t ∈ [0,1]. We conclude that for each fixed t ∈ [0,1], the process (·) Yt : {0,1, . . .} × → R is a martingale relative to the filtration {L(Un ) : n = 0,1, . . .}. Assertion 2 is proved. 3. Let m > n ≥ 1 and t,s ∈ [0,1] be arbitrary. Then (m,n)
Zt
(m)
− Yt
(m)
− Ys(m) − E(Yt
− Zs(m,n) ≡ Yt = Yt
(n)
− Ys(m) + Ys(n) (m)
− Ys(m) |Un ).
Hence, by Assertion 8 of Proposition 5.6.6, we have (m,n)
E(Zt
(m)
− Zs(m,n) )2 ≤ E(Yt
− Ys(m) )2 .
Suppose t,s ∈ Q∞ . Then equality 9.5.8 implies that (m)
Yt
− Ys(m) = E(Yt − Ys |Um ).
(9.5.10)
a.u. Continuous Process
367
Hence, (m,n)
E(Zt
(m)
− Zs(m,n) )2 ≤ E(Yt
− Ys(m) )2 ≤ E(Yt − Ys )2 = σ (t,s),
where t,s ∈ Q∞ . are arbitrary, and where the second inequality is thanks to equality 9.5.10 and to Proposition 5.6.6. By continuity, we therefore have (n,m)
E(Zt
− Zs(n,m) )2 ≤ σ (t,s) ≤ p2 (|t − s|),
(9.5.11)
where t,s ∈ [0,1] are arbitrary. Now let > 0 be arbitrary. Suppose n ≥ 1 is so large that the subset {t0, . . . ,tn } is a -approximation of [0,1]. Then there exist t ,s ∈ {t0, . . . ,tn } such that |t − t | ∨ |s − s | < . Then equality 9.5.9 implies that (m,n)
Zt
(m)
≡ Yt
(n)
− Yt
= Yt − Yt = 0,
(9.5.12)
with a similar inequality for s . Applying inequality 9.5.11 to t,t in place of t,s, we obtain E(Zt(m,n) − Zt(m,n) )2 ≤ p 2 (|t − t |) ≤ p2 (), and a similar inequality for the pair s,s in place of t,t . In addition, equality 9.5.12 implies (m,n)
Zt
(m,n)
− Zs(m,n) = (Zt
(m,n)
− Zt
(m,n)
) − (Zs(m,n) − Zs
).
Hence Minkowski’s inequality yields . . . (m,n) (m,n) 2 (m,n) (m,n) (m,n) (m,n) E(Zt − Zs ) ≤ E(Zt − Zt )2 + E(Zs − Zs )2 ≤ 2p().
(9.5.13)
Combining inequalities 9.5.11 and 9.5.13, we obtain (m,n)
E(Zt
− Zs(m,n) )2 ≤ (2p(|t − s|) ∧ 2p())2 ≡ p2 (|t − s|).
Assertion 3 and the lemma are proved.
The next lemma prepares for the proof of the GRR theorem. Lemma 9.5.6. Modulus of a.u. continuity of an a.u. continuous Gaussian process. Let V : [0,1] × → R be an arbitrary a.u. continuous and centered Gaussian process, with a continuous covariance function σ . Suppose p : [0,1] → [0,∞) is a continuous 2function that is increasing in some neighborhood of 0, with p(0) = 0, such that − log u is integrable relative to the distribution function p on [0,1]. Thus ! 12 − log udp(u) < ∞. (9.5.14) 0
368
Stochastic Process
Suppose
σ (t,s) ≤ p(u)2
(9.5.15)
0≤s,t≤1;|s−t|≤u
√ for each u ∈ [0,1]. Then there exists an integrable r.r.v. B with EB ≤ 2 such that ! |t−s| 5 4B(ω) dp(u) log (9.5.16) |V (t,ω) − V (s,ω)| ≤ 16 u2 0 for each t,s ∈ [0,1], for each ω in the full set domain(B). Proof. 1. By hypothesis, p is a nondecreasing function on [0,1] that is increasing in some neighborhood of 0, with p(0) = 0. It follows that p(u) > 0 for each u ∈ (0,1]. 2. Define the full subset D ≡ {(t,s) ∈ [0,1]2 : |t − s| > 0} of [0,1]2 relative to the product Lebesgue integration. Because the process V is a.u. continuous, there exists a full set A ⊂ such that V (·,ω) is continuous on [0,1], for each ω ∈ A. Moreover, V is a measurable function on [0,1] × . Define the function U : [0,1]2 × → R by domain(U ) ≡ D × A and
|V (t,ω) − V (s,ω)| p(|t − s|) 1 (V (t,ω) − V (s,ω))2 ≡ exp 4 p(|t − s|)2
U (t,s,ω) ≡
(9.5.17)
for each (t,s,ω) ∈ domain(U ). Then U is measurable on [0,1]2 × . 3. Using the Monotone Convergence Theorem, the reader can prove that the right-hand side of equality 9.5.17 is an integrable function on [0,1]2 × relative to the product integration, with integral bounded by ! 1! 1√ ! 1! 1! ∞ √ 1 2 1 2dtds = 2. √ exp − u dudtds = 4 2π 0 0 −∞ 0 0 Therefore, by Fubini’s Theorem, the function ! 1! 1 ! 1! 1 |Vt − Vs | dtds B≡ U dtds ≡ p(|t − s|) 0 0 0 0
a.u. Continuous Process
369
on (,L,E) is an integrable r.r.v, with expectation given by ! 1! 1 √ E(B) ≡ E U dtds ≤ 2. 0
0
Consider each ω ∈ domain(B). Then ! 1! 1 |V (t,ω) − V (s,ω)| dtds. B(ω) ≡ p(|t − s|) 0 0 In view of equality 9.5.18, Lemma 9.5.4 implies that ! |t−s| −1 4B(ω) dp(u) |V (t,ω) − V (s,ω)| ≤ 8 u2 0 ! |t−s| 5 4B(ω) log ≡ 16 dp(u), u2 0 where the equality is by equality 9.5.5. The lemma is proved.
(9.5.18)
(9.5.19)
Theorem 9.5.7. Garsia–Rodemich–Rumsey Theorem. Let p 2 : [0,1] → [0,∞) be a continuous increasing function with p(0) = 0, such that − log u is integrable relative to the distribution function p on [0,1]. Thus ! 12 − log udp(u) < ∞. (9.5.20) 0
Let σ : [0,1] × [0,1] → R be an arbitrary symmetric positive definite function such that (σ (s,t))1/2 ≤ p(u). (9.5.21) 0≤s,t≤1;|s−t|≤u
Then there exists an a.u. continuous centered Gaussian process X : [0,1]× →R √ with σ as covariance function and an integrable r.r.v. B with EB ≤ 2 such that ! |t−s| 5 4B(ω) dp(u) log |X(t,ω) − X(s,ω)| ≤ 16 u2 0 for each t,s ∈ [0,1], for each ω ∈ domain(B). Proof. 1. By hypothesis, p is an increasing function with p(0) = 0. It follows that p(u) > 0 for each u ∈ (0,1]. 2. Let F σ ≡ covar,fj d (σ )
(9.5.22)
be the consistent family of normal f.j.d.’s on the parameter set [0,1] associated with mean function 0 and the given covariance function σ , as defined in equalities 7.2.4 and 7.2.3 of and constructed in Theorem 7.2.5. Let Y : Q∞ × → R be an
370
Stochastic Process
arbitrary process with marginal distributions given by F σ |Q∞ , the restriction of the family F σ of the normal f.j.d.’s to the countable parameter subset Q∞ . 3. By hypothesis, the function p is continuous at 0, with p(0) = 0. Hence there is a modulus of continuity δp,0 : (0,∞) → (0,∞) such that p(u) < ε for each u with 0 ≤ u < δp,0 (ε), for each ε > 0. 2 1 4. Also by hypothesis, the function − log u is integrable relative to 0 ·dp. Hence there exists a modulus of integrability δp,1 : (0,∞) → (0,∞) such that ! 2 − log udp(u) < ε (0,c]
for each c ∈ [0,1] with (0,c] dp = p(c) < δp,1 (ε). 5. Let k ≥ 1 arbitrary. Then, for each u ∈ (0,1], we have log k 2 ≥ 0 and − log u ≥ 0, whence 2 k2 2 log k − 2 log u −1 ≡ 2 u2 √ 2 √ 2 2 = 2 2 log k − log u ≤ 2 2( log k + − log u). (9.5.23) The functions of u on both ends have domain (0,1] 1 and are continuous on (0,1]. Hence these functions are measurable relative to 0 ·dp. Since the right-hand side 1 of inequality 9.5.23 is an integrable function of u relative to the integration 0 ·dp, so is the function on the left-hand side. 6. Define m0 ≡ 1. Let k ≥ 1 be arbitrary. In view of the conclusion of Step 5, there exists mk ≡ mk (δp,0,δp,1 ) ≥ mk−1 so large that ! (m(k)) 2 −1 k 16 (9.5.24) dp(u) < k −2, 2 u 0 where m(k) ≡ 2−m(k) . 7. Let n ≥ 0 be arbitrary. Define, as in Lemma 9.5.5, the interpolated process m(n) : [0,1] × → R of Y : Q∞ × → R by conditional expectations on Y {t0, . . . ,tm(n) }. Lemma 9.5.5 implies that (i) Y m(n) is a centered Gaussian process, m(n) = Yr for each r ∈ {t0, . . . ,tm(n) }. (ii) Y m(n) is a.u. continuous, and (iii) Yr Consequently, the difference process Z (n) ≡ Z (m(n),m(n−1)) ≡ Y (m(n)) − Y (m(n−1)) is a.u. continuous if n ≥ 1. From Condition (iii), we have (n)
(m(n))
Z0 ≡ Y0 if n ≥ 1.
(m(n−1))
− Y0
= Y0 − Y0 = 0
a.u. Continuous Process
371
8. Let t,s ∈ [0,1] be arbitrary with |t − s| > 0. Then, since {t0, . . . ,tm(k−1) } is a m(k−1) -approximation of [0,1], we have, by Assertion 3 of Lemma 9.5.5, E(Zt(k) − Zs(k) )2 ≡ E(Zt(m(k),m(k−1)) − Zs(m(k),m(k−1)) )2 ≤ p2k (|t − s|), (9.5.25) where we define pk (u) ≡ p(m(k−1)) (u) ≡ 2p(u) ∧ 2p(m(k−1) ) for each u ≥ 0. Note that pk (u) = 2p(u) for each u ≤ m(k−1) , and pk (u) is constant for each u > m(k−1) . The definition of Riemann–Stieljes integrals then implies that for each nonnegative function f on [0,1] that is integrable relative to the distribution function p, we have ! (m(k−1)) ! 1 f (u)dpk (u) = f (u)dp k (u) 0
0
!
(m(k−1))
=2
f (u)dp(u) < ∞.
(9.5.26)
0
In particular, ! 0
12
− log udpk (u) = 2
!
(m(k−1)) 2
− log udp(u) < ∞.
(9.5.27)
0
9. Inequalities 9.5.25 and 9.5.27 together imply that the a.u. continuous process and the function pk satisfy the conditions in the hypothesis √ of Lemma 9.5.6. Accordingly, there exists an integrable r.r.v. Bk with EBk ≤ 2 such that ! |t−s| 5 4Bk (ω) dp k (u) |Z (k) (t,ω) − Z (k) (s,ω)| ≤ 16 log u2 0 ! |t−s|∧(m(k−1)) 5 4Bk (ω) dp(u) = 16 log u2 0 (9.5.28)
Z (k)
for each t,s ∈ [0,1], for each ω ∈ domain(Bk ). 10. Suppose k ≥ 2. Let αk ∈ (2−3 (k − 1)2,2−2 (k − 1)2 ) be arbitrary, and define Ak ≡ (Bk ≤ αk ). Chebychev’s inequality then implies that √ √ P (Ack ) ≡ P (Bk > αk ) ≤ αk−1 2 < 23 2(k − 1)−2 . Consider each ω ∈ n=∞ n=k An . Let n ≥ k be arbitrary. Then inequality 9.5.28 implies that for each t,s ∈ [0,1], we have
372
Stochastic Process ! (m(n−1)) 5 4αn (n) (n) dp(u) |Zt (ω) − Zs (ω)| ≤ 16 log u2 0 ! (m(n−1)) 5 (n − 1)2 ≤ 16 dp(u) log u2 0 ! (m(n−1)) 2 −1 −1 (n − 1) 2 dp(u) ≡ 16 u2 0 ! (m(n−1)) 2 −1 (n − 1) < 16 dp(u) < (n − 1)−2, u2 0 (9.5.29)
where the last inequality is by inequality 9.5.24. In particular, if we set s = 0 and (n) recall that Z0 = 0, we obtain |Yt (ω) − Yt (ω)| ≡ |Zt (ω)| < (n − 1)−2, √ c 3 2(n − 1)−2 , we conclude where ω ∈ n=∞ n=k An is arbitrary. Since P (An ) < 2 (m(k)) (m(k)) that Yt converges a.u. to the limit r.r.v. X t ≡ limk→∞ Yt . Thus we obtain the limiting process X : [0,1] × → R. 11. We will next prove that the process X is a.u. continuous. To that end, note that since Y (m(n)) is an a.u. continuous process for each n ≥ −1 there exist a measurable set Dk with P (Dkc ) < k −1 and a δk > 0, such that for each ω ∈ Dk , we have (m(n))
(m(n−1))
(m(n))
k−1 (Yt |!n=0
(n)
(ω) − Ys(m(n)) (ω))| < k −1
for each t,s ∈ [0,1] with |s − t| < δk . A similar inequality holds with n replaced c by n − 1.√Separately, define√the measurable set Ck ≡ ∞ n=k An . Then P (Ck ) ≤ ∞ 3 −2 3 −1 2(n − 1) ≤ 2 2(k − 1) . n=k 2 12. Now consider each ω ∈ Dk Ck , and each t,s ∈ [0,1] with |s − t| < δk . Then ∞
(m(n)) (m(n−1)) k+1 X t (ω) = !n=0 + (ω) − Yt (ω)), (Yt n=k
with a similar equality when t is replaced by s. Hence |Xt (ω) − X s (ω)| ≤ 2k −1 + 2
∞
(n − 1)−2 < 9k −1, n=k
where ω ∈ Dk Ck and t,s √ ∈ [0,1] with |s − t| < δk are arbitrary. Since P (Dk Ck )c < 2k −1 + 23 22k −1 and 9k −1 are arbitrarily small if k ≥ 2 is sufficiently large, we see that X : [0,1] × → R is an a.u. continuous process. Consequently, the process X is continuous in probability.
a.u. Continuous Process
373
13. Now we will verify that the process X is Gaussian, centered, with covariance function σ . Note that X|Q∞ = Y . Hence X|Q∞ has marginal distributions given by the family F σ |Q∞ of f.j.d.’s. Since the process X and the family F σ are continuous in probability, and since the subset Q∞ is dense in the parameter set [0,1], it follows that X has marginal distributions given by the family F σ . Thus X is Gaussian, centered, with covariance function σ . 14. In view of inequalities 9.5.20 and 9.5.21 in the hypothesis, the conditions in Lemma 9.5.6 are satisfied by the process X and the function p.√ Hence Lemma 9.5.6 implies the existence of an integrable r.r.v. B with EB ≤ 2 such that ! |t−s| 5 4B(ω) dp(u) |X(t,ω) − X(s,ω)| ≤ 16 log (9.5.30) u2 0 for each t,s ∈ [0,1], for each ω ∈ domain(B), as alleged.
10 a.u. Càdlàg Process
In this chapter, let (S,d) be a locally compact metric space, with a binary approximation ξ relative to some fixed reference point x◦ . As usual, write d ≡ 1 ∧ d. We will study processes X : [0,∞) × → S whose sample paths are right continuous with left limits, or càdlàg (a French acronym for “continue à droite, limite à gauche”). Classically, the proof of existence of such processes relies on Prokhorov’s Relative Compactness Theorem. As discussed in the beginning of Chapter 9, this theorem implies the principle of infinite search. We will therefore bypass Prokhorov’s theorem, in favor of direct proofs using Borel–Cantelli estimates. Section 10.1 presents a version of Skorokhod’s definition of càdlàg functions from [0,∞) to S. Each càdlàg function will come with a modulus of càdlàg, much as a continuous function comes with a modulus of continuity. In Section 10.2, we study a Skorokhod metric dD on the space D of càdlàg functions. In Section 10.3, we define an a.u. càdlàg process X : [0,1]× → S as a process that is continuous in probability and that has, almost uniformly, càdlàg sample functions. In Section 10.4, we introduce a D-regular process Z : Q∞ × → S, in terms of the marginal distributions of Z, where Q∞ is the set of dyadic rationals in [0,1]. We then prove, in Sections 10.4 and 10.5, that a process X : [0,1]× → S is a.u. càdlàg iff its restriction X|Q∞ is D-regular or, equivalently, iff X is the extension, by right limit, of a D-regular process Z. Thus we obtain a characterization of an a.u. càdlàg processes in terms of conditions on its marginal distributions. Equivalently, we have a procedure to construct an a.u. càdlàg process X from a consistent family F of f.j.d.’s that is D-regular. We will derive the modulus of a.u. càdlàg of X from the given modulus of D-regularity of F . In Section 10.6, we will prove that this construction is metrically continuous, in epsilon–delta terms. Such continuity of construction seems to be hitherto unknown. In Section 10.7, we apply the construction to obtain a.u. càdlàg processes with strongly right continuous marginal distributions; in Section 10.8, to a.u. càdlàg martingales; and in Section 10.9, to processes that are right-Hoelder in a sense to be made precise there. In Section 10.10, we state the generalization of definitions and results in Sections 10.1–10.9, to the parameter interval [0,∞). 374
a.u. Càdlàg Process
375
Finally, in Section 10.11, we will prove an abundance of first exit times for each a.u càdlàg process. Before proceeding, we remark that our constructive method for a.u. càdlàg processes is by using certain accordion functions, defined in Definition 10.5.3, as time-varying boundaries for hitting times. This point will be clarified as we go along. This method was first used in [Chan 1974] to construct an a.u. càdlàg Markov process from a given strongly continuous semigroup. Definition 10.0.1. Notations for dyadic rationals. For ease of reference, we restate the following notations in Definition 9.0.2 related to dyadic rationals. For each m ≥ 0, define pm ≡ 2m , m ≡ 2−m ; recall the enumerated sets of dyadic rationals Qm ≡ {t0,t1, . . . ,tp(m) } = {qm,0, . . . ,qm,p(m) } ≡ {0,m,2m, . . . ,1} ⊂ [0,1], and Q∞ ≡
∞
Qm ≡ {t0,t1, . . .}.
m=0
Thus Qm ≡ {0,2−m,2 · 2−m, . . . ,1} is a 2−m -approximation of [0,1], with Qm ⊂ Qm+1 , for each m ≥ 0. Moreover, for each m ≥ 0, recall the enumerated sets of dyadic rationals Qm ≡ {u0,u1, . . . ,up(2m) } ≡ {0,2−m,2 · 2−m, . . . ,2m } ⊂ [0,2m ] and Q∞ ≡
∞
Qm = {u0,u1, . . .}.
m=0
10.1 Càdlàg Function Recall some notations and conventions. To minimize clutter, a subscripted expression ab will be written interchangeably with a(b). For an arbitrary function x, we write x(t) only with the explicit or implicit condition that t ∈ domain(x). If X : A × → S is an r.f., and if B is a subset of A, then X|B ≡ X|(B × ) denotes the r.f. obtained by restricting the parameter set to B. Definition 10.1.1. Pointwise left and right limits. Let Q be an arbitrary subset of [0,∞). Let the function x : Q → S be arbitrary. The function x is said to be right continuous at a point t ∈ domain(x) if limr→t;r≥t x(r) = x(t). The function x is said to have a left limit at a point t ∈ Q if limr→t;r 0, there exist δcdlg (ε) ∈ (0,1), p ≥ 1, and a sequence 0 = τ0 < τ1 < · · · < τp−1 < τp = 1
(10.1.1)
in domain(x) such that (i) for each i = 1, . . . ,p, we have τi − τi−1 ≥ δcdlg (ε) and (ii) for each i = 0, . . . ,p − 1, we have d(x,x(τi )) ≤ ε
(10.1.2)
on the interval θi ≡ [τi ,τi+1 ) or θi ≡ [τi ,τi+1 ] depending on whether i ≤ p − 2 or i = p − 1. We will call (τi )i=0,...,p a sequence of ε-division points of x with separation at least δcdlg (ε). Then x said to be a càdlàg function on [0,1] with values in S, with the operation δcdlg as a modulus of càdlàg. We will let D[0,1] denote the set of càdlàg functions. Two members of D[0,1] are considered equal if they are equal as functions – i.e., if they have the same domain and have equal values at each point in the common domain. Note that Condition 3 implies that the endpoints 0,1 are in domain(x). Condition 3 implies also that p ≤ δcdlg (ε)−1 . Let x,y ∈ D[0,1] be arbitrary, , respectively. Then the operation δcdlg ∧ δcdlg with moduli of càdlàg δcdlg ,δcdlg is obviously a common modulus of càdlàg of x,y. The next lemma is a simple consequence of right continuity and generalizes its counterpart for C[0,1]. Lemma 10.1.3. A càdlàg function is uniquely determined by its values on a dense subset of its domain. Let x,y ∈ D[0,1] be arbitrary. Let A be an arbitrary dense subset of [0,1] that is contained in B ≡ domain(x) ∩ domain(y). Then the following conditions hold: 1. Let t ∈ B and α > 0 be arbitrary. Then there exists r ∈ [t,t + α) ∩ A such that
a.u. Càdlàg Process d(x(t),x(r)) ∨ d(y(t),y(r)) ≤ α.
377 (10.1.3)
2. Let f : S 2 → R be a uniformly continuous function. Let c ∈ R be arbitrary such that f (x(r),y(r)) ≤ c for each r ∈ A. Then f (x,y) ≤ c. In other words, f (x(t),y(t)) ≤ c for each t ∈ domain(x) ∩ domain(y). The same assertion holds when “≤” is replaced by “≥” or by “=”. In particular, if d(x(r),y(r)) ≤ ε for each r ∈ A, for some ε > 0, then f (x,y) ≤ ε on domain(x) ∩ domain(y). 3. Suppose x(r) = y(r) for each r ∈ A. Then x = y. In other words, domain(x) = domain(y) and x(t) = y(t) for each t ∈ domain(x). 4. Let λ : [0,1] → [0,1] be an arbitrary continuous and increasing function with λ(0) = 0 and λ(1) = 1. Then x ◦ λ ∈ D[0,1]. Proof. Let δcdlg be a common modulus of càdlàg of x and y. 1. Let t ∈ B and α > 0 be arbitrary. Let (τi )i=0,...,p and (τi )i=0,...,p be sequenα ces of α 2 -division points of x and y, respectively, with separation of at least δcdlg 2 . Then τp−1 ∨ τp −1 < 1. Hence either (i) t < 1 or (ii) τp−1 ∨ τp −1 < t. Consider case (i). Since x,y are right continuous at t according to Definition 10.1.2, and since A is dense in [0,1], there exists r in A ∩ [t,1 ∧ (t + α)) such that d(x(t),x(r)) ∨ d(y(t),y(r)) ≤ α, as desired. Consider case (ii). Take r ∈ (τp−1 ∨ τp −1,t) ∩ A. Then t,r ∈ [τp−1,1] ∩ [τp −1,1]. Hence Condition 3 in Definition 10.1.2 implies that d(x(t),x(r)) ≤ d(x(τp−1 ),x(t)) + d(x(τp−1 ),x(r)) ≤
α α + = α. 2 2
Similarly, d(y(t),y(r)) ≤ α. Assertion 1 is proved. 2. Let t ∈ B be arbitrary. By Assertion 1, for each k ≥ 1, there exists rk ∈ [t,t + k −1 )A such that d(x(t),x(rk )) ∨ d(y(t),y(rk )) ≤ k −1 . By hypothesis, f (x(rk ),y(rk )) ≤ c because rk ∈ A, for each k ≥ 1. Consequently, by right continuity of x,y and continuity of f , we have f (x(t),y(t)) = lim f (x(rk ),y(rk )) ≤ c. k→∞
3. By hypothesis, d(x(r),y(r)) = 0 for each r ∈ A. Consider each t ∈ domain (x). We will first verify that t ∈ domain(y). To that end, let ε > 0 be arbitrary. Since x is right continuous at t, there exists c > 0 such that d(x(r),x(t)) < ε
(10.1.4)
for each r ∈ [t,t + c) ∩ domain(x). Consider each s ∈ [t,t + c) ∩ domain(y). Let α ≡ (t + c − s) ∧ ε. By Assertion 1 applied to the pair y,y in D[0,1], there exists r ∈ [s,s + α) ∩ A such that d(y(s),y(r)) ≤ α ≤ ε. Then r ∈ [t,t + c) ∩ A, whence inequality 10.1.4 holds. Combining,
378
Stochastic Process
d(y(s),x(t)) ≤ d(y(s),y(r)) + d(y(r),x(r)) + d(x(r),x(t)) < ε + 0 + ε = 2ε. Since ε > 0 is arbitrary, we see that lims→t;s>t y(s) exists and is equal to x(t). Hence the right completeness Condition 2 in Definition 10.1.2 implies that t ∈ domain(y), as alleged. Condition 1 in Definition 10.1.2 then implies that y(t) = lims→t;s>t y(s) = x(t). Since t ∈ domain(x) is arbitrary, we conclude that domain(x) ⊂ domain(y), and that x = y on domain(x). By symmetry, domain(x) = domain(y). 4. Since λ is continuous and increasing, it has an inverse λ−1 that is also continuous and increasing, with some modulus of continuity δ. Let δcdlg be a modulus of càdlàg of x. We will prove that x ◦ λ is càdlàg, with δ1 ≡ δ ◦ δcdlg as a modulus of càdlàg. To that end, let ε > 0 be arbitrary. Let 0 ≡ τ0 < τ1 < · · · < τp−1 < τp = 1
(10.1.5)
be a sequence of ε-division points of x with separation at least δcdlg (ε). Thus, for each i = 1, . . . ,p, we have τi − τi−1 ≥ δcdlg (ε). Suppose λτi − λτi−1 < δ(δcdlg (ε)) for some i = 1, . . . ,p. Then, since δ is a modulus of continuity of the inverse function λ−1 , it follows that τi − τi−1 = λ−1 λτi − λ−1 λτi−1 < δcdlg (ε), which is a contradiction. Hence λτi − λτi−1 ≥ δ(δcdlg (ε)) ≡ δ1 (ε) for each i = 1, . . . ,p. Moreover, for each i = 0, . . . ,p − 1, we have d(x,x(τi )) ≤ ε
(10.1.6)
on the interval θi ≡ [τi ,τi+1 ) or θi ≡ [τi ,τi+1 ] depending on whether i ≤ p −2 or i = p − 1. Since the function λ is increasing, it follows that d(x ◦ λ,x ◦ λ(τi )) ≤ ε
(10.1.7)
on the interval θi ≡ [λ−1 τi ,λ−1 τi+1 ) or θi ≡ [λ−1 τi ,λ−1 τi+1 ] depending on whether i ≤ p − 2 or i = p − 1. Thus the sequence 0 = λ−1 τ0 < λ−1 τ1 < · · · < λ−1 τp−1 < λ−1 τp = 1 is a sequence of ε-division points of x ◦ λ with separation at least δ1 (ε). Condition 3 in Definition 10.1.2 has been proved for the function x ◦ λ. In view of the monotonicity and continuity of the function λ, the other conditions can also be easily verified. Accordingly, the function x ◦ λ is càdlàg, with a modulus of càdlàg δ1 . Assertion 4 and the lemma are proved. Proposition 10.1.4. Each càdlàg function is continuous at all but countably many time points. Let x ∈ D[0,1] be arbitrary with a modulus of càdlàg δcdlg . Then the function x on [0,1] is continuous at the endpoints 0 and 1.
a.u. Càdlàg Process
379
For each k ≥ 1, let (τk,i )i=0,...,p(k) be a sequence of k −1 -division points of x with separation of at least δcdlg (k −1 ). Then the following conditions hold: p(k)−1 1. Define the set A ≡ ∞ θ k,i , where θ k,i ≡ [τk,i ,τk,i+1 ) or θ k,i ≡ k=1 i=0 [τk,i ,τk,i+1 ] depending on whether i = 0, . . . ,pk − 2 or i = pk − 1. Then the set A contains all but countably many points in [0,1] and is a subset of domain(x). p(k)−1 θk,i , where θk,i ≡ [0,τk,1 ] or θk,i ≡ 2. Define the set A ≡ ∞ k=1 i=0 (τk,i ,τk,i+1 ] depending on whether i = 0 or i = 1, . . . ,pk − 1. Then the set A contains all but countably many points in [0,1] and the function x has a left limit at each t ∈ A . p(k)−1 (τk,i ,τk,i+1 ). Then the set A contains 3. Define the set A ≡ ∞ k=1 i=0 all but countably many points in [0,1] and the function x is continuous at each t ∈ A . 4. The function x is bounded on domain(x). Specifically, d(x◦,x(t)) ≤ b ≡
p(1)−1
d(x◦,x(τ1,i )) + 1
i=0
for each t ∈ domain(x), where x◦ is an arbitrary but fixed reference point in S. Proof. By Definition 10.1.2, we have 1 ∈ domain(x). Condition 3 in Definition 10.1.2 implies that 0 = τ1,0 ∈ domain(x) and that x is continuous at 0 and 1. 1. Let t ∈ A be arbitrary. Let k ≥ 1 be arbitrary. Then t ∈ θ k,i for some i. = 0, . . . ,pk −1. Let δ0 ≡ τk,i+1 −t or δ0 ≡ 2 depending on whether i = 0, . . . ,pk −2 or i = pk − 1. Then domain(x) ∩ [t,t + δ0 ) is a nonempty subset of θ k,i . Moreover, by Condition 3 of Definition 10.1.2, we have d(x(r),x(τk,i )) ≤ k −1 for each r ∈ domain(x) ∩ [t,t + δ0 ). Hence d(x(r),x(s)) ≤ 2k −1 for each r,s ∈ domain(x) ∩ [t,t +δ0 ). Since 2k −1 is arbitrarily small, and since the metric space (S,d) is complete, we see that limr→t;r≥t x(r) exists. The right completeness Condition 2 of Definition 10.1.2 therefore implies that t ∈ domain(x). We conclude that A ⊂ domain(x). Assertion 1 follows. for some i = 0, . . . ,p − 1. 2. Let t ∈ A and k ≥ 1 be arbitrary. Then t ∈ θk,i . k Let δ0 ≡ 2 or δ0 ≡ t −τk,i depending on whether i. = 0 or i. = 1, . . . ,pk −1. Then domain(x) ∩ (t − δ0,t) is a nonempty subset of θ k,i . Moreover, by Condition 3 of Definition 10.1.2, we have d(x,x(τk,i )) ≤ k −1 for each r ∈ domain(x) ∩ (t − δ0,t). An argument similar to that made in the previous paragraph then shows that limr→t;r 0, let h ≡ [2 + 0 ∨ − log2 α]1 , and define δrc (α,δcdlg ) ≡ 2−h δcdlg (2−h ) > 0. Let ε > 0 be arbitrary. Then there exists a Lebesgue measurable subset A of domain(x) with Lebesgue measure μ(A) < ε such that for each α ∈ (0,ε), we have d(x(t),x(s)) < α for each t ∈ Ac ∩ domain(x) and s ∈ [t,t + δrc (α,δcdlg )) ∩ domain(x). Note that the operation δrc (·,δcdlg ) is determined by δcdlg . Proof. 1. Let h ≥ 0 be arbitrary. Write αh ≡ 2−h . Then, according to Condition (iii) of Definition 10.1.2, there exist an integer ph ≥ 1 and a sequence 0 = τh,0 < τh,1 < · · · < τh,p−1 < τh,p(h) = 1
(10.1.8)
in domain(x), such that (i) for each i = 1, . . . ,ph , we have τh,i − τh,i−1 ≥ δcdlg (αh )
(10.1.9)
and (ii) for each i = 0, . . . ,ph − 1, we have d(x,x(τh,i )) ≤ αh
(10.1.10)
on the interval θh,i ≡ [τh,i ,τh,i+1 ) or θh,i ≡ [τh,i ,τh,i+1 ] depending on whether i ≤ ph − 2 or i = ph − 1. 2. Let i = 0, . . . ,ph − 1 be arbitrary. Define θ h,i ≡ [τh,i ,τh,i+1 − αh (τh,i+1 − τh,i )) ⊂ θh,i . p(h)−1 Define θ h ≡ i=0 θ h,i . Then μ(θ h ) =
p(h)−1
i=0
μ(θ h,i ) =
p(h)−1
(10.1.11)
(τh,i+1 − τh,i )(1 − αh ) = 1 − αh,
i=0
whence μθ h c = αh ≡ 2−h , where h ≥ 0 is arbitrary.
a.u. Càdlàg Process
381
3. Now let ε > 0 be arbitrary, and let k ≡ [1 + 0 ∨ − log2 ε]1 . Define A ≡ ∞ c h=k+1 θ h . Then μ(A) ≤
∞
2−h = 2−k < ε.
h=k+1
∩ domain(x). Let α ∈ (0,ε) be arbitrary, and let h ≡ Consider each t ∈ [3 + 0 ∨ − log2 α]1 . Then Ac
h > 3 + 0 ∨ − log2 α > 3 + 0 ∨ − log2 ε > k. Hence h ≥ k +1, and so Ac ⊂ θ h . Therefore t ∈ θ h,i ≡ [τh,i ,τh,i+1 −αh (τh,i+1 − τh,i )) for some i = 0, . . . ,ph − 1. 4. Now consider each s ∈ [t,t + δrc (α,δcdlg )) ∩ domain(x). Then τh,i ≤ s ≤ t + δrc (α,δcdlg ) < τh,i+1 − αh · (τh,i+1 − τh,i ) + δrc (α,δcdlg ) ≤ τh,i+1 − 2−h δcdlg (2−h ) + 2−h δcdlg (2−h ) = τh,i+1 . Hence s,t ∈ [τh,i ,τh,i+1 ). It follows that d(x(s),x(τh,i )) ∨ d(x(t),x(τh,i )) ≤ αh and therefore that d(x(s),x(t)) ≤ 2αh = 2−h+1 < α.
The next proposition shows that if a function satisfies all the conditions in Definition 10.1.2 except perhaps the right completeness Condition 2, then it can be right completed and extended to a càdlàg function. This is analogous to the extension of a uniformly continuous function on a dense subset of [0,1]. Proposition 10.1.6. Right-limit extension and càdlàg completion. Let (S,d) be a locally compact metric space. Suppose Q = [0,1] or Q = [0,∞). Let x : Q → S be a function whose domain contains a dense subset A of Q, and which is right continuous at each t ∈ domain(x). Define its right-limit extension x : Q → S by A B (10.1.12) domain(x) ≡ t ∈ Q; lim x(r) exists r→t;r≥t
and by x(t) ≡
lim x(r)
r→t;r≥t
(10.1.13)
for each t ∈ domain(x). Then the following conditions hold: 1. The function x is right continuous at each t ∈ domain(x). 2. Suppose t ∈ Q is such that limr→t;r≥t x(r) exists. Then t ∈ domain(x). 3. Suppose Q = [0,1]. Suppose, in addition, that δcdlg : (0,∞) → (0,∞) is an operation such that x and δcdlg satisfy Conditions 1 and 3 in Definition 10.1.2.
382
Stochastic Process
Then x ∈ D[0,1]. Moreover, x has δcdlg as a modulus of càdlàg. Furthermore, x = x|domain(x). We will then call x the càdlàg completion of x. Proof. 1. Since, by hypothesis, x is right continuous at each t ∈ domain(x), it follows from the definition of x that domain(x) ⊂ domain(x) and that x = x on domain(x). In other words, x = x|domain(x). Since domain(x) contains the dense subset A of Q, so does domain(x). Now let t ∈ domain(x) and ε > 0 be arbitrary. Then, by the defining equality 10.1.13, x(t) ≡
lim x(r).
r→t;r≥t
Hence there exists δ0 > 0 such that d(x(t),x(r)) ≤ ε
(10.1.14)
for each r ∈ domain(x) ∩ [t,t + δ0 ). Let s ∈ domain(x) ∩ [t,t + δ0 ) be arbitrary. Then, again by the defining equalities 10.1.12 and 10.1.13, there exists a sequence (rj )j =1,2,... in domain(x) ∩ [s,t + δ0 ) such that rj → s and x(s) = limj →∞ x(rj ). For each j ≥ 1, we then have rj ∈ domain(x) ∩ [t,t + δ0 ). Hence inequality 10.1.14 holds for r = rj , for each j ≥ 1. Letting j → ∞, we therefore obtain d(x(t),x(s)) ≤ ε. Since ε > 0 is arbitrary, we conclude that x is right continuous at each t ∈ domain(x). Assertion 1 has been verified. 2. Next suppose limr→t;r≥t x(r) exists. Then, since x = x|domain(x), the right limit lim x(r) =
r→t;r≥t
lim x(r)
r→t;r≥t
exists. Hence t ∈ domain(x) by the defining equality 10.1.12. Condition 2 of Definition 10.1.2 has been proved for x. 3. Now let ε > 0 be arbitrary. Because x = x|domain(x), each sequence (τi )i=0,...,p of ε-division points of x, with separation of at least δcdlg (ε), is also a sequence of ε-division points of x with separation of at least δcdlg (ε). Therefore Condition 3 in Definition 10.1.2 holds for x and the operation δcdlg . Summing up, the function x is càdlàg, with δcdlg as a modulus of càdlàg. The next definition introduces simple càdlàg functions as càdlàg completion of step functions. Definition 10.1.7. Simple càdlàg function. Let 0 = τ0 < · · · < τp−1 < τp = 1 be an arbitrary sequence of dyadic rationals in [0,1] such that p >
(τi − τi−1 ) ≥ δ0
i=1
for some δ0 > 0. Let x0, . . . ,xp−1 be an arbitrary sequence in S.
a.u. Càdlàg Process
383
Define a function z : [0,1] → S by domain(z) ≡
p−1
θi ,
(10.1.15)
i=0
where θi ≡ [τi ,τi+1 ) or θi ≡ [τi ,τi+1 ] depending on whether i = 0, . . . ,p − 2 or i = p − 1, and by z(r) ≡ xi for each r ∈ θi , for each i = 0, . . . ,p − 1. Let x ≡ z ∈ D[0,1] be the càdlàg completion of z. Then x is called the simple càdlàg function determined by the pair of sequences ((τi )i=0,...,p−1,(xi )i=0,...,p−1 ). In symbols, we then write x ≡ smpl ((τi )i=0,...,p−1,(xi )i=0,...,p−1 ) or simply x ≡ smpl ((τi ),(xi )) when the range of subscripts is understood. The sequence (τi )i=0,...,p is called the sequence of division points of the simple càdlàg function x. The next lemma verifies that x is a well-defined càdlàg function, with the constant operation δcdlg (·) ≡ δ0 as a modulus of càdlàg. Lemma 10.1.8. Simple càdlàg functions are well defined. Use the notations and assumptions in Definition 10.1.7. Then z and δcdlg satisfy the conditions in Proposition 10.1.6. Accordingly, the càdlàg completion z ∈ D[0,1] of z is well defined. Proof. First note that domain(z) contains the dyadic rationals in [0,1]. Let t ∈ domain(z) be arbitrary. Then t ∈ θi for some i = 0, . . . ,p − 1. Hence, for each r ∈ θi , we have z(r) ≡ xi ≡ z(t). Therefore z is right continuous at t. Condition 1 in Definition 10.1.2 has been verified for z. The proof of Condition 3 in Definition 10.1.2 for z and δcdlg being trivial, the conditions in Proposition 10.1.6 are satisfied. Lemma 10.1.9. Insertion of division points leaves a simple càdlàg function unchanged. Let p ≥ 1 be arbitrary. Let 0 ≡ q0 < q1 < · · · < qp ≡ 1 be an arbitrary sequence of dyadic rationals in [0,1], with an arbitrary subsequenc 0 ≡ qi(0) < qi(1) < · · · < qi(κ) ≡ 1. Let (w0, . . . ,wκ−1 ) be an arbitrary sequence in S. Let x ≡ smpl ((qi(k) )k=0,...,κ−1,(wk )k=0,...,κ−1 ). Let y = smpl ((qj )j =0,...,p−1,(x(qj ))j =0,...,p−1 ). Then x = y.
(10.1.16)
384
Stochastic Process
Proof. Let j = 0, . . . ,p −1 and t ∈ [qj ,qj +1 ) be arbitrary. Then t ∈ [qj ,qj +1 ) ⊂ [qi(k),qi(k−1) ) for some unique k = 0, . . . ,κ − 1. Hence y(t) = x(qj ) = wk = p−1 x(t). Thus y = x on the dense subset j =0 [qj ,qj +1 ) of domain(y ∩ domain(x). Hence, by Assertion 3 of Lemma 10.1.3, we have y = x.
10.2 Skorokhod Space D[0,1] of Càdlàg Functions Following Skorokhod, via [Billingsley 1968], we proceed to define a metric on the space D[0,1] of càdlàg functions. This metric is similar to the supremum metric in C[0,1], except that it allows and measures a small error in time parameter, in terms of a continuous and increasing function λ : [0,1] → [0,1] with λ(0) = 0 and λ(1) = 1. Let λ,λ be any such continuous and increasing functions. We will write, as an abbreviation, λt for λ(t), for each t ∈ [0,1]. We will write λ−1 for the inverse of λ, and λ λ ≡ λ ◦ λ for the composite function. Definition 10.2.1. Skorokhod metric. Let denote the set of continuous and increasing functions λ : [0,1] → [0,1] with λ0 = 0 and λ1 = 1, such that there exists c > 0 with log λt − λs ≤ c (10.2.1) t −s or, equivalently, e−c (t − s) ≤ λt − λs ≤ ec (t − s), for each 0 ≤ s < t ≤ 1. We will call the set of admissible functions on [0,1]. Let x,y ∈ D[0,1] be arbitrary. Let Ax,y denote the set consisting of all pairs (c,λ) ∈ [0,∞) × such that (i) inequality 10.2.1 holds for each 0 ≤ s < t ≤ 1, and (ii) d(x(t),y(λt)) ≤ c
(10.2.2)
for each t ∈ domain(x) ∩ λ−1 domain(y). Let Bx,y ≡ {c ∈ [0,∞) : (c,λ) ∈ Ax,y for some λ ∈ }. Define the metric dD[0,1] on D[0,1] by dD[0,1] (x,y) ≡ inf Bx,y .
(10.2.3)
We will presently prove that dD[0,1] is well defined and is indeed a metric, called the Skorokhod metric on D[0,1]. When the interval [0,1] is understood, we write dD for dD[0,1] . Intuitively, the number c bounds both (i) the error in the time measurement, represented by the distortion λ, and (ii) the supremum distance between the
a.u. Càdlàg Process
385
functions x and y when allowance is made for said error. Existence of the infimum in equality 10.2.3 would follow easily from the principle of infinite search. We will supply a constructive proof in Lemmas 10.2.4 through 10.2.8. Proposition 10.2.9 will then complete the proof that dD[0,1] is a metric. Finally, we will prove that the Skorokhod metric space (D[0,1],dD ) is complete. First two elementary lemmas. Lemma 10.2.2. A sufficient condition for existence of infimum or supremum. Let B be an arbitrary nonempty subset of R. 1. Suppose for each k ≥ 0, there exists αk ∈ R such that (i) αk ≤ c + 2−k for each c ∈ B and (ii) c ≤ αk + 2−k for some c ∈ B. Then inf B exists and inf B = limk→∞ αk . Moreover, αk − 2−k ≤ inf B ≤ αk + 2−k for each k ≥ 0. 2. Suppose for each k ≥ 0, there exists αk ∈ R such that (iii) αk ≥ c − 2−k for each c ∈ B and (iv) c ≥ αk − 2−k for some c ∈ B. Then sup B exists and sup B = limk→∞ αk . Moreover, αk − 2−k ≤ sup B ≤ αk + 2−k for each k ≥ 0. Proof. Let h,k ≥ 0 be arbitrary. Then, by Condition (ii), there exists c ∈ B such that c ≤ αk + 2−k . At the same time, by Condition (i), we have αh ≤ c + 2−h ≤ αk + 2−k + 2−h . Similarly, αk ≤ αh + 2−h + 2−k . Consequently, |αh − αk | ≤ 2−h + 2−k . We conclude that the limit α ≡ limk→∞ αk exists. Let c ∈ B be arbitrary. Letting k → ∞ in Condition (i), we see that α ≤ c. Thus α is a lower bound for the set B. Suppose β is a second lower bound for B. By Condition (ii) there exists c ∈ B such that c ≤ αk + 2−k . Then β ≤ c ≤ αk + 2−k . Letting k → ∞, we obtain β ≤ α. Thus α is the greatest lower bound of the set B. In other words, inf B exists and is equal to α, as alleged. Assertion 1 is proved. The proof of Assertion 2 is similar. Lemma 10.2.3. Logarithm of certain difference quotients. Let 0 = τ0 < · · · < τp−1 < τp ≡ 1 be an arbitrary sequence in [0,1]. Suppose the function λ ∈ is linear on [τi ,τi+1 ], for each i = 0, . . . ,p − 1. Then p−1 λt − λs λτi+1 − λτi =α≡ sup log log τ . t −s i+1 − τi 0≤s 0, there exists j = 0, . . . ,m such that log λτ j +1 − λτ j > α − ε. τj +1 − τj Thus
λu − λv + ε, α < log u−v
(10.2.6)
where u ≡ τ j +1 and v ≡ λτj . Since ε > 0 is arbitrary, inequalities 10.2.5 and 10.2.6 together imply the desired equality 10.2.4, according to Lemma 10.2.2. Next, we prove some metric-like properties of the sets Bx,y introduced in Definition 10.2.1. Lemma 10.2.4. Metric-like properties of the sets Bx,y . Let x,y,z ∈ D[0,1] be arbitrary. Then Bx,y is nonempty. Moreover, the following conditions hold: 1. 0 ∈ Bx,x . More generally, if d(x,y) ≤ b for some b ≥ 0, then b ∈ Bx,y . 2. Bx,y = Bx,y . 3. Let c ∈ Bx,y and c ∈ By,z be arbitrary. Then c + c ∈ Bx,z . Specifically, suppose (c,λ) ∈ Ax,y and (c ,λ ) ∈ Ay,z for some λ,λ ∈ . Then (c + c ,λλ ) ∈ Ax,z . Proof. 1. Let λ0 : [0,1] → [0,1] be the identity function. Then, trivially, λ0 is admissible and d(x,x ◦ λ0 ) = 0. Hence (0,λ0 ) ∈ Ax,x . Consequently, 0 ∈ Bx,x . More generally, if d(x,y) ≤ b, then for some b ≥ 0, we have d(x,y ◦ λ0 ) = d(x,y) ≤ b, whence b ∈ Bx,y . 2. Next consider each c ∈ Bx,y . Then there exists (c,λ) ∈ Ax,y , satisfying inequalities 10.2.1 and 10.2.2. For each 0 ≤ s < t ≤ 1, if we write u ≡ λ−1 t and v ≡ λ−1 s, then u − v λu − λv λ−1 t − λ−1 s log = ≤ c. (10.2.7) = log log t −s λu − λv u−v Separately, consider each t ∈ domain(y) ∩ (λ−1 )−1 domain(x). Then u ≡ λ−1 t ∈ domain(x) ∩ λ−1 domain(y). Hence
a.u. Càdlàg Process d(y(t),x(λ−1 t)) = d(y(λu),x(u)) ≤ c.
387 (10.2.8)
Thus (c,λ−1 ) ∈ Ay,x . Consequently c ∈ By,x . Since c ∈ Bx,y is arbitrary, we conclude that Bx,y ⊂ By,x and, by symmetry, that Bx,y = By,x . 3. Consider arbitrary c ∈ Bx,y and c ∈ By,z . Then (c,λ) ∈ Ax,y and (c ,λ ) ∈ Ay,z for some λ,λ ∈ . The composite function λ λ on [0,1] then satisfies log λ λt − λ λs = log λ λt − λ λs + log λt − λs ≤ c + c . t −s λt − λs t −s Let r ∈ A ≡ domain(x) ∩ λ−1 domain(y) ∩ (λ λ)−1 domain(z) be arbitrary. Then d(x(r),z(λ λr)) ≤ d(x(r),y(λr)) + d(y(λr),z(λ λr)) ≤ c + c . By Proposition 10.1.4, the set A contains all but countably many points in [0,1], Hence it is dense in [0,1]. It therefore follows from Assertion 2 of Lemma 10.1.3 that d(x,z ◦ (λ λ)) ≤ c + c . Combining, we see that (c + c ,λ λ) ∈ Bx,z .
Definition 10.2.5. Simple càdlàg function and related notations. 1. Recall, from Definition 10.2.1, the set of admissible functions on [0,1]. Let λ0 ∈ be the identity function: λ0 t = t for each t ∈ [0,1]. Let m ≥ m be arbitrary. Let m,m denote the finite subset of consisting of functions λ such that (i) λQm ⊂ Qm and (ii) λ is linear on [qm,i ,qm,i+1 ] for each i = 0, . . . ,pm −1. 2. Let B be an arbitrary compact subset of (S,d), and let δ : (0,∞) → (0,∞) be an arbitrary operation. Then DB,δ [0,1] will denote the subset of D[0,1] consisting of càdlàg functions x with values in the compact set B and with δcdlg ≡ δ as a modulus of càdlàg. 3. Let U ≡ {u1, . . . ,uM } be an arbitrary finite subset of (S,d). Then Dsimple,m,U [0,1] will denote the finite subset of D[0,1] consisting of simple càdlàg functions with values in U and with qm ≡ (qm,0, . . . ,qm,p(m) ) as a sequence of division points. In symbols, Dsimple,m,U [0,1] ≡ { smpl ((qm,i )i=0,...,p(m)−1,(xi )i=0,...,p(m)−1 ) : xi ∈ U for each i = 0, . . . ,pm − 1}. 4. Let δlog @1 be a modulus of continuity at 1 of the natural logarithm function log. Specifically, let δlog @1 (ε) ≡ 1 − e−ε for each ε > 0. Note that 0 < δlog @1 < 1.
388
Stochastic Process
Lemma 10.2.6. dD[0,1] is well defined on the subset Dsimple,m,U [0,1]. Let M,m ≥ 1 be arbitrary. Let U ≡ {u1, . . . ,uM } be an arbitrary finite subset of (S,d). Let x,y ∈ Dsimple,m,U [0,1] be arbitrary. Then dD (x,y) ≡ inf Bx,y exists. Moreover, the following conditions hold: 1. Take any b > M i,j =0 d(ui ,uj ). Note that x ≡ smpl ((qm,i )i=0,...,p(m)−1,(xi )i=0,...,p(m)−1 ) for some sequence (xi )i=0,...,p(m)−1 in U . Similarly, y ≡ smpl ((qm,i )i=0,...,p(m)−1,(yi )i=0,...,p(m)−1 ) for some sequence (yi )i=0,...,p(m)−1 in U . Let k ≥ 0 be arbitrary. Take mk ≥ m so large that 2−m(k) ≤ 2−m−2 e−b δlog @1 (2−k ) < 2−m−2 e−b .
(10.2.9)
For each ψ ∈ m,m(k) , define βψ ≡
log ψqm,i+1 − ψqm,i ∨ d(y(ψqm,i ),x(qm,i )) qm,i+1 − qm,i ⎞
p(m)−1 i=0
∨ d(y(qm,i ),x(ψ −1 qm,i ))⎠ .
(10.2.10)
Then there exists ψk ∈ m,m(k) with (βψ(k),ψk ) ∈ Ax,y such that |dD (x,y) − βψ(k) | ≤ 2−k+1 . 2. For each k ≥ 0, we have (dD (x,y) + 2−k+1,ψk ) ∈ Ax,y . Proof. 1. Let M,m,U,x,y,b and k be as given. As an abbreviation, write m ≡ mk . Inequality 10.2.9 then implies that (m − m − 2) − b > 0. 2. As an abbreviation, write ε ≡ 2−k , p ≡ pm ≡ 2m , and τi ≡ qm,i ≡ i2−m ∈ Qm
(10.2.11)
for each i = 0, . . . ,p. Similarly, write n ≡ pm ≡ 2m , ≡ 2−m and
ηj ≡ qm ,j ≡ j 2−m ∈ Qm
(10.2.12)
for each j = 0, . . . ,n. Then, by hypothesis, x ≡ smpl ((τi )i=0,...,p−1,(xi )i=0,...,p−1 ) and y ≡ smpl ((τi )i=0,...,p−1,(yi )i=0,...,p−1 ). By the definition of simple càdlàg functions, we have x = xi and y = yi on [τi ,τi+1 ), for each i = 0, . . . ,p − 1.
a.u. Càdlàg Process 3. By hypothesis, b >
b>b >
M
M
i,j =0 d(ui ,uj ).
d(ui ,uj ) ≥
i,j =0
389
Hence there exists b > 0 such that
p−1
d(xi ,yj ) ≥ d(x(t),y(t))
i,j =0
for each t ∈ domain(x) ∩ domain(y). Hence b,b ∈ Bx,y by Assertion 1 of Lemma 10.2.4. 4. Define > βψ , αk ≡ ψ∈(m,m )
where βψ has been defined for each ψ ∈ m,m in equality 10.2.10. We will prove that (i) αk ≤ c + 2−k for each c ∈ Bx,y and (ii) c ≤ αk + 2−k for some c ∈ Bx,y . It will then follow from Lemma 10.2.2 that inf Bx,y exists and that inf Bx,y = limk→∞ αk . 5. To prove Condition (i), we will first show that it suffices to prove that (iii) αk ≤ c + 2−k for each c ∈ Bx,y with c < b. Suppose Condition (iii) is proved. Then αk ≤ b + 2−k . Now consider an arbitrary c ∈ Bx,y . Then either c < b or b < c. In the former case, we have αk ≤ c + 2−k on account of Condition (iii). In the latter case, we have αk ≤ b + 2−k < c + 2−k . Combining, we see that Condition (i) follows from Condition (iii), in any case. 6. Proceed to prove Condition (iii), with c ∈ Bx,y arbitrary and with c < b. Since c ∈ Bx,y , there exists λ ∈ such that (c,λ) ∈ Ax,y . In other words,
and, for each 0 ≤ s < t ≤ 1,
d(x,y ◦ λ) ≤ c
(10.2.13)
log λt − λs ≤ c, t −s
(10.2.14)
where the last inequality is equivalent to e−c (t − s) ≤ λt − λs ≤ ec (t − s).
(10.2.15)
7. Now consider each i = 1, . . . ,p − 1. Then there exists ji = 1, . . . ,n − 1 such that ηj (i)−1 < λτi < ηj (i)+1 .
(10.2.16)
Either (i ) d(y(ηj (i)−1 ),x(τi )) < d(y(ηj (i) ),x(τi )) + ε
(10.2.17)
d(y(ηj (i) ),x(τi )) < d(y(ηj (i)−1 ),x(τi )) + ε.
(10.2.18)
or (ii )
390
Stochastic Process
In case (i ), define ζi ≡ ηj (i)−1 . In case (ii ), define ζi ≡ ηj (i) . Then, in both cases (i ) and (ii ), inequality 10.2.16 implies that ζi − < λτi < ζi + 2.
(10.2.19)
8. Moreover, inequalities 10.2.17 and 10.2.18 together imply that d(y(ζi ),x(τi )) ≤ d(y(ηj (i)−1 ),x(τi )) ∧ d(y(ηj (i) ),x(τi )) + ε.
(10.2.20)
At the same time, in view of inequality 10.2.16, there exists a point s ∈ (λτi ,λτi+1 ) ∩ ((ηj (i)−1,ηj (i) ) ∪ (ηj (i),ηj (i)+1 )). λ−1 s
∈ (τi ,τi+1 ), whence x(τi ) = x(t). Moreover, either s ∈ (ηj (i)−1, Then t ≡ ηj (i) ), in which case y(ηj (i)−1 ) = y(s), or s ∈ (ηj (i),ηj (i)+1 ), in which case y(ηj (i) ) = y(s). Here we used the fact that the simple càdlàg function y is constant over the interval (ηh−1,ηh ), for each h = 1, . . . ,n. In both cases, inequality 10.2.20 yields d(y(ζi ),x(τi )) ≤ d(y(s),x(τi )) + ε = d(y(λt),x(t)) + ε ≤ c + ε,
(10.2.21)
where the last inequality is from inequality 10.2.13. 9. Now let ζ0 ≡ 0 and ζp ≡ 1. Then, for each i = 0, . . . ,p − 1, inequality 10.2.19 implies that ζi+1 − ζi > (λτi+1 − 2) − (λτi + ) > λτi+1 − λτi − 4
≥ e−c (τi+1 − τi ) − 4 > e−b 2−m − 2−m +2 ≥ 0, where the third inequality follows from inequality 10.2.15, where the fourth inequality is from the assumption c < b in Condition (iii), and where the last inequality is from inequality 10.2.9. Thus we see that (ζi )i=0,...,p is an increasing sequence in Qm . As such, it determines a unique function ψ ∈ m,m that is linear on [τi ,τi+1 ] for each i = 0, . . . ,p − 1, such that ψτi ≡ ζi for each i = 0, . . . ,p. 10. By Lemma 10.2.3, we then have p−1 ψt − ψs ψτi+1 − ψτi = sup log log τ t −s i+1 − τi 0≤s
(ηi+1 − ηi ) >
i=0
n−1 >
(τi+1 − τi )−δcdlg (2−1 ε) ≥ 0.
i=0
Thus (ηi )i=0,...,n is an increasing sequence in [0,1]. Therefore we can define an increasing function ψ ∈ by (i) ψτi = ηi for each i = 0, . . . ,n, and (ii) ψ is linear on [τi ,τi+1 ] for each i = 1, . . . ,n − 1. 3. By Lemma 10.2.3, n−1 ψt − ψs ψτi+1 − ψτi = log sup log t −s τi+1 − τi 0≤s 0 is arbitrary, it follows that dD (x,z) ≤ dD (x,y) + dD (y,z). 2. It remains to prove that if dD (x,y) = 0, then x = y. To that end, suppose dD (x,y) = 0. Let ε > 0 be arbitrary. Let (τi )i=0,...,p and (ηj )j =0,...,n be sequences of ε-division points of x,y, respectively. Note that τ0 = η0 = 0 and τp = ηn = 1. Let m ≥ 1 ∨ ε−1/2 be arbitrary. Consider each k ≥ m ∨ 2. Then εk ≡ k −2 ≤ m−2 ≤ ε. Moreover, since dD (x,y) = 0 < εk , we have εk ∈ Bx,y . Therefore there exists, by Definition 10.2.1, some λk ∈ such that log λk r − λk s ≤ εk (10.2.37) r −s for each r,s ∈ [0,1] with s ≤ r, and such that d(x(t),y(λk t)) ≤ εk
(10.2.38)
for each t ∈ domain(x) ∩ λ−1 k domain(y). Then inequality 10.2.37 implies that e−ε(k) r ≤ λk r ≤ eε(k) r
(10.2.39)
a.u. Càdlàg Process
397
for each r ∈ [0,1]. Define the subset ⎞c ⎛ n " −ε(k) # Ck ≡ ⎝ e ηj ,eε(k) ηj ⎠ j =0
= (eε(k) η0,e−ε(k) η1 ) ∪ (eε(k) η1,e−ε(k) η2 ) ∪ · · · ∪ (eε(k) ηn−1,e−ε(k) ηn ) of [0,1], where the superscript c signifies the measure-theoretic complement of a Lebesgue measurable set in [0,1]. Let μ denote the Lebesgue measure on [0,1]. Then μ(Ck ) ≥ 1 −
n
(eε(k) ηj − e−ε(k) ηj )
i=0
≥1−
n
(eε(k) − e−ε(k) ) ≥ 1 − (n + 1)(e2ε(k) − 1) ≥ 1 − 2(n + 1)eεk
j =0
≡ 1 − 2(n + 1)ek −2, where k ≥ m∨2 is arbitrary, and where we used the elementary inequality er −1 ≤ er for each r ∈ (0,1). Now define C≡
∞ ∞
Ck .
h=m k=h+1
Then, for each h ≥ m, we have μ(C ) ≤ c
∞
k=h+1
Hence
μ(C c )
μ(Ckc )
≤
∞
2(n + 1)ek −2 ≤ 2(n + 1)eh−1 .
k=h+1
= 0 and C is a full subset of [0,1]. Consequently, ∞
A ≡ C ∩ domain(x) ∩ domain(y) ∩
λ−1 k domain(y)
k=m
is a full subset of [0,1] and, as such, is dense in [0,1]. Now let t ∈ A be arbitrary. Then t ∈ C. Hence there exists h ≥ m such that t ∈ ∞ k=h+1 Ck . Consider each k ≥ h + 1. Then t ∈ Ck . Hence there exists j = 0, . . . ,n − 1 such that t ∈ (eε(k) ηj ,e−ε(k) ηj +1 ). It then follows from inequality 10.2.39 that ηj < e−ε(k) t ≤ λk t ≤ eε(k) t < ηj +1, whence λk t,t ∈ (ηj ,ηj +1 ). Since (ηj )j =0,...,n is a sequence of ε-division points of the càdlàg function y, it follows that
398
Stochastic Process d(y(λk t),y(t)) ≤ d(y(λk t),y(ηj )) + d(y(ηj ),y(t)) ≤ 2ε.
Consequently, d(x(t),y(t)) = d(x(t),y(λk t)) + d(y(λk t),y(t)) ≤ εk + 2ε ≤ 3ε, where the first inequality is by inequality 10.2.38. Let k → ∞ and then let ε → 0. Then we obtain d(x(t),y(t)) = 0, where t ∈ A is arbitrary. Summing up, x = y on the dense subset A. Therefore Lemma 10.1.3 implies that x = y. We conclude that dD is a metric. 3. Finally, suppose x,y ∈ D[0,1] are such that d(x(t),y(t)) ≤ c for each t in a dense subset of domain(x) ∩ domain(y). Then c ∈ Bx,y by Lemma 10.2.4. Hence dD (x,y) ≡ inf Bx,y ≤ c. The proposition is proved. The next theorem is now a trivial consequence of Lemma 10.2.7. Theorem 10.2.10. Arzela–Ascoli Theorem for (D[0,1],dD ). Let B be an arbitrary compact subset of (S,d). Let k ≥ 0 be arbitrary. Let U ≡ {v1, . . . ,vM } be a 2−k−1 -approximation of of B. Let x ∈ D[0,1] be arbitrary with values in the compact set B and with a modulus of càdlàg δcdlg . Let m ≥ 1 be so large that m ≥ m(k,δcdlg ) ≡ [1 − log2 (δlog @1 (2−k )δcdlg (2−k−1 ))]1 . Then there exist an increasing sequence (ηi )i = 0,...,n in Qm with η0 = 0 and ηn = 1, and a sequence (ui )i=0,...,n−1 in U such that dD[0,1] (x,x) < 2−k , where x ≡ smpl ((ηi )i=0,...,n−1,(ui )i=0,...,n−1 ) ∈ Dsimple,m,U [0,1]. Proof. Write m ≡ m(k,δcdlg ). By Lemma 10.2.7, there exist an increasing sequence (ηi )i=0,...,n in Qm ⊂ Qm with η0 = 0 and ηn = 1, and a sequence (ui )i = 0,...,n−1 in U such that 2−k ∈ Bx,x . At the same time, dD[0,1] (x,x) ≡ inf Bx,x , according to Definition 10.2.1. Hence dD[0,1] (x,x) < 2−k , as desired. Theorem 10.2.11. The Skorokhod space is complete. The Skorokhod space (D[0,1],dD ) is complete. Proof. 1. Let (yk )k=1,2,... be an arbitrary Cauchy sequence in (D[0,1],dD ). We need to prove that dD (yk ,y) → 0 for some y ∈ D[0,1]. Since (yk )k=1,2,... is Cauchy, it suffices to show that some subsequence converges. Hence, by passing to a subsequence if necessary, there is no loss in generality in assuming that dD (yk ,yk+1 ) < 2−k
(10.2.40)
for each k ≥ 1. Let δk ≡ δcdlg,k be a modulus of càdlàg of yk , for each k ≥ 1. For convenience, let y0 ≡ x◦ denote the constant càdlàg function on [0,1]. 2. Let k ≥ 1 be arbitrary. Then dD (y0,yk ) ≤ dD (y0,y1 ) + 2−1 + · · · + 2−k < b0 ≡ dD (y0,y1 ) + 1.
a.u. Càdlàg Process
399
Hence d(x◦,yk (t)) ≤ b0 for each t ∈ domain(yk ). Therefore the values of yk are in some compact subset B of (S,d) that contains (d(x◦,·) ≤ b0 ). Define b ≡ 2b0 > 0. Then d(yk (t),yh (s)) ≤ 2b0 ≡ b, for each t ∈ domain(yk ), s ∈ domain(yh ), for each h,k ≥ 0. 3. Next, refer to Definition 9.0.2 for notations related to dyadic rationals, and refer to Definitions 10.2.1 and 10.2.5 for the notations related to the Skorokhod metric. In particular, for each x,y ∈ D[0,1], recall the sets , m,m , Ax,y and Bx,y . Thus dD (x,y) ≡ inf Bx,y . Let λ0 ∈ denote the identity function on [0,1]. 4. The next two steps will approximate the given Cauchy sequence (yk )k=1,2,... with a sequence (x k )k=1,2,... of simple càdlàg functions whose division points are dyadic rationals. To that end, fix an arbitrary m0 ≡ 0 and, inductively for each k ≥ 1, define mk ≡ [2 + mk−1 − log2 (e−b δlog @1 (2−k )δcdlg,k (2−k−1 ))]1 .
(10.2.41)
mk ≥ [1 − log2 (δlog @1 (2−k )δcdlg,k (2−k−1 ))]1 .
(10.2.42)
Then
5. Define the constant càdlàg functions x 0 ≡ y0 ≡ x◦ . Let k ≥ 1 be arbitrary, and write nk ≡ pm(k) . Let Uk ≡ {vk,1, . . . ,vk,M(k) } be a 2−k−1 -approximation of of B, such that Uk ⊂ Uk+1 for each k ≥ 1. Then, in view of inequality 10.2.42, Theorem 10.2.10 (the Arzela–Ascoli Theorem for the Skorokhod space) applies to the càdlàg function yk . It yields a simple càdlàg function x k ≡ smpl ((ηk,i )i=0,...,n(k)−1,(uk,i )i=0,...,n(k)−1 ) ∈ Dsimple,m(k),U (k) [0,1] (10.2.43) such that dD[0,1] (yk ,x k ) < 2−k ,
(10.2.44)
where (ηk,i )i=0,...,n(k) is an increasing sequence in Qm(k) with ηk,0 = 0 and ηk,n(k) = 1, and where (uk,i )i=0,...,n(k)−1 is a sequence in Uk . Thus the sequence (ηk,i )i=0,...,n(k) comprises the division points of the simple càdlàg function x k . 6. Recall that Qm(k) ≡ {qm(k),0,qm(k),1, . . . ,qm(k),p(m(k)) } ≡ {0,2−m(k),2 · 2−m(k), . . . ,1}. Hence (ηk,i )i= 0,...,n(k) is a subsequence of qm(k) ≡ (qm(k),0,qm(k),1, . . . , qm(k),p(m(k)) ). By Lemma 10.1.9, we may insert all points in Qm(k) that are not already in the sequence (ηk,i )i=0,...,n(k) into the latter, without changing the simple càdlàg function x k . Hence we may assume that ηk ≡ (ηk,0,ηk,1, . . . ,ηk,n(k) ) ≡ (qm(k),0,qm(k),1, . . . ,qm(k),p(m(k)) ).
(10.2.45)
7. It follows from inequalities 10.2.40 and 10.2.44 that dD (x k ,x k+1 ) ≤ dD (x k ,yk ) + dD (yk ,yk+1 ) + dD (yk+1,x k+1 ) < 2−k + 2−k + 2−k−1 < 2−k+2 .
(10.2.46)
400
Stochastic Process
Moreover, since x 0 ≡ x◦ ≡ y0 , we have dD (x◦,x k ) ≤ dD (y0,yk ) + 2−k < dD (y0,y1 ) + dD (y1,y2 ) + · · · + dD (yk−1,yk ) + 2−k < dD (y0,y1 ) + 2−1 + · · · + 2−(k−1) + 2−k < dD (y0,y1 ) + 1 ≡ b0 . Hence d(x◦,x k (t)) ≤ b0 for each t ∈ domain(x k ), where k ≥ 1 is arbitrary. Consequently, d(x k (t),x h (s)) ≤ 2b0 ≡ b
(10.2.47)
for each t ∈ domain(x k ), s ∈ domain(x h ), for each h,k ≥ 0. 8. Now let k ≥ 0 be arbitrary. We will construct λk+1 ∈ m(k+1),m(k+2) such that (2−k+2,λk+1 ) ∈ Ax(k),x(k+1) .
(10.2.48)
First note, from equalities 10.2.43 and 10.2.45, that x k ≡ smpl ((qm(k),i )i=0,...,p(m(k))−1,(uk,i )i=0,...,p(m(k))−1 ) for some sequence (uk,i )i=0,...,p(m(k))−1 in Uk ⊂ Uk+1 . Similarly, x k+1 ≡ smpl ((qm(k+1),i )i=0,...,p(m(k+1))−1,(uk+1,i )i=0,...,p(m(k+1))−1 ) for some sequence (uk+1,i )i=0,...,p(m(k+1))−1 in Uk+1 . 9. Separately, equality 10.2.41, where k is replaced by k + 2, implies that mk+2 ≡ [2 + mk+1 − log2 (e−b δlog @1 (2−k−2 )δcdlg,k (2−k−3 ))]1,
(10.2.49)
whence 2−m(k+2) ≤ 2−m(k+1)−2 e−b δlog @1 (2−k−2 ).
(10.2.50)
Therefore we can apply Lemma 10.2.6, with m,m ,k,x,y,U replaced by mk+1, mk+2,k + 2, x k ,x k+1,Uk+1 , respectively, to yield some λk+1 ∈ m(k+1),m(k+2) such that (dD (x k ,x k+1 ) + 2−k−1,λk+1 ) ∈ Ax(k),x(k+1) .
(10.2.51)
From inequality 10.2.46, we have dD (x k ,x k+1 ) + 2−k−1 < (2−k + 2−k + 2−k−1 ) + 2−k−1 < 2−k+2, so relation 10.2.51 trivially implies the desired relation 10.2.48. 10. From relation 10.2.48, it follows that log λk+1 t − λk+1 s ≤ 2−k+2 t −s for each s,t ∈ [0,1] with s < t, and that
(10.2.52)
a.u. Càdlàg Process d(x k+1 (λk+1 t),x k (t)) ≤ 2−k+2
401 (10.2.53)
for each t ∈ domain(x k ) ∩ λ−1 k+1 domain(x k+1 ). 11. For each k ≥ 0, define the composite admissible function μk ≡ λk λk−1 · · · λ0 ∈ . We will prove that μk → μ uniformly on [0,1] for some μ ∈ . To that end, let h > k ≥ 0 be arbitrary. By Lemma 10.2.4, relation 10.2.48 implies that (2−k+2 + 2−k+1 + · · · + 2−h+3,λh λh−1 · · · λk+1 ) ∈ Ax(k),x(h) . Hence, since 2−k+2 + 2−k+1 + · · · + 2−h+3 < 2−k+3 , we also have (2−k+3,λh λh−1 · · · λk+1 ) ∈ Ax(k),x(h) .
(10.2.54)
12. Let t,s ∈ [0,1] be arbitrary with s < t. Write t ≡ μk t and s ≡ μk s. Then log μh t − μh s = log λh λh−1 · · · λk+1 t − λh λh−1 · · · λk+1 s < 2−k+3, μk t − μk s t − s (10.2.55) where the inequality is thanks to relation 10.2.54. Equivalently, (t − s ) exp(−2−k+3 ) < λh λh−1 · · · λk+1 t − λh λh−1 · · · λk+1 s < (t − s ) exp(2−k+3 )
(10.2.56)
for each t ,s ∈ [0,1] with s < t . In the special case where s = 0, inequality 10.2.55 reduces to μh t < 2−k+3 . (10.2.57) | log μh t − log μk t| = log μ t k
Hence the limit μt ≡ lim μh t h→∞
exists, where t ∈ (0,1] is arbitrary. Moreover, letting k = 0 and h → ∞ in inequality 10.2.55, we obtain log μt − μs ≤ 23 = 8. (10.2.58) t −s Therefore μ is an increasing function that is uniformly continuous on (0,1]. Furthermore, te−8 ≤ μt ≤ te8, where t ∈ (0,1] is arbitrary. Hence μ can be extended to a continuous increasing function on [0,1], with μ0 = 0. Since μk 1 = 1 for each k ≥ 0, we have μ1 = 1. In view of inequality 10.2.58, we conclude that μ ∈ .
402
Stochastic Process
13. By letting h → ∞ in inequality 10.2.55, we obtain log μt − μs ≤ 2−k+3, μk t − μk s
(10.2.59)
where t,s ∈ [0,1] are arbitrary with s < t. Replacing t,s by μ−1 t,μ−1 s, respectively, we obtain t −s μk μ−1 t − μk μ−1 s ≤ 2−k+3, (10.2.60) = log log t −s μk μ−1 t − μk μ−1 s where t,s ∈ [0,1] are arbitrary with s < t, and where k ≥ 0 is arbitrary. 14. Recall from Steps 5 and 6 that for each k ≥ 0, the sequence ηk ≡ (ηk,0,ηk,1, . . . ,ηk,n(k) ) ≡ qm(k) ≡ (qm(k),0,qm(k),1, . . . ,qm(k),p(m(k)) ) (10.2.61) is a sequence of division points of the simple càdlàg function x k . Define the set A≡
∞
μμ−1 h ([ηh,0,ηh,1 ) ∪ [ηh,1,ηh,2 ) ∪ · · · ∪ [ηh,n(h)−1,ηh,n(h) ]). (10.2.62)
h=0
Then A contains all but countably many points in [0,1], and is therefore dense in [0,1]. 15. Let k ≥ 0 be arbitrary. Define the function zk ≡ x k ◦ μk μ−1 .
(10.2.63)
Then zk ∈ D[0,1] by Assertion 4 of Lemma 10.1.3. Moreover, zk+1 ≡ x k+1 ◦ μk+1 μ−1 = x k+1 ◦ λk+1 μk μ−1 .
(10.2.64)
Now consider each r ∈ A. Let h ≥ k be arbitrary. Then, by the defining equality 10.2.62 of the set A, there exists i = 0, . . . ,nh − 1 such that μh μ−1 r ∈ [ηh,i ,ηh,i+1 ) ⊂ domain(x h ).
(10.2.65)
Hence, λh+1 μh μ−1 r ∈ λh+1 [ηh,i ,ηh,i+1 ). Moreover, from the defining equality 10.2.62, we have −1 r ∈ A ⊂ μμ−1 h (domain(x h )) = domain(x h ◦ μh μ ) ≡ domain(zh ). (10.2.66)
From equalities 10.2.64 and 10.2.63 and inequality 10.2.53, we obtain d(zk+1 (r),zk (r)) ≡ d(x k+1 (λk+1 μk μ−1 r),x k (μk μ−1 r)) ≤ 2−k+2 . (10.2.67) 16. Hence, since r ∈ A and k ≥ 0 are arbitrary, we conclude that limk→∞ zk exists on A. Define the function z : [0,1] → S by domain(z) ≡ A and by z(t) ≡ limk→∞ zk (t) for each t ∈ domain(z). Inequality 10.2.67 then implies that
a.u. Càdlàg Process d(z(r),zk (r)) ≤ 2−k+3,
403 (10.2.68)
where r ∈ A and k ≥ 0 are arbitrary. We proceed to verify the conditions in Proposition 10.1.6 for the function z to have a càdlàg completion. 17. For that purpose, let r ∈ A and h ≥ k be arbitrary. Then, as observed in Step 15, we have r ∈ domain(zh ). Hence, since zh is càdlàg, it is right continuous at r. Therefore there exists ch > 0 such that d(zh (t),zh (r)) < 2−h+3
(10.2.69)
for each t ∈ [r,r + ch ) ∩ A. In view of inequality 10.2.68, it follows that d(z(t),z(r)) ≤ d(z(t),zh (t)) + d(zh (t),zh (r)) + d(zh (r),z(r)) < 3 · 2−h+3 for each t ∈ [r,r + ch ) ∩ A. Thus z is right continuous at each point r ∈ A ≡ domain(z). Condition 1 in Definition 10.1.2 has been verified for the function z. 18. We will next verify Condition 3 in Definition 10.1.2 for the function z. To that end, let ε > 0 be arbitrary. Take k ≥ 0 so large that 2−k+4 < ε, and define δ(ε) ≡ exp(−2−k+3 )2−m(k) . Let j = 0, . . . ,nk be arbitrary. For brevity, define ηj ≡ μμ−1 k ηk,j . By inequality 10.2.52, we have λh+1 t − λh+1 s ≥ (t − s) exp(−2−h+2 ) for each s,t ∈ [0,1] with s < t, for each h ≥ 0. Hence, for each j = 0, . . . ,nk −1, we have −1 ηj +1 − ηj ≡ μμ−1 k ηk,j +1 − μμk ηk,j −1 = lim (μh μ−1 k ηk,j +1 − μh μk ηk,j ) h→∞
= lim (λh · · · λk+1 ηk,j +1 − λh · · · λk+1 ηk,j ) h→∞
≥ exp(−2−k+3 )(ηk,j +1 − ηk,j ) = exp(−2−k+3 )2−m(k) ≡ δ(ε), (10.2.70) where the inequality is by the first half of inequality 10.2.56. 19. Now consider each j = 0, . . . ,nk − 1 and each −1 t ∈ domain(z) ∩ [ηj ,ηj +1 ) ≡ A ∩ [μμ−1 k ηk,j ,μμk ηk,j +1 ).
We will show that d(z(t ),z(ηj ))) ≤ ε.
(10.2.71)
To that end, write t ≡ μ−1 t , and write s ≡ μk t ≡ μk μ−1 t ∈ [ηk,j ,ηk,j +1 ). Then x k (s) = x k (ηk,j )
(10.2.72)
since x k is a simple càdlàg function with (ηk,0,ηk,1, . . . ,ηk,n(k) ) as a sequence of division points. Let h > k be arbitrary, and define
404
Stochastic Process r ≡ μh μ−1 t = μh t = λh λh−1 · · · λk+1 s.
Then, according to the defining equality 10.2.63, we have zh (t ) ≡ x h (μh μ−1 t ) ≡ x h (r) ≡ x h (λh λh−1 · · · λk+1 s). Combining with equality 10.2.72, we obtain d(zh (t ),x k (ηk,j )) = d(x h (λh λh−1 · · · λk+1 s),x k (s)) ≤ 2−k+3, where the last inequality is a consequence of relation 10.2.54. Then d(zh (t ),zh (ηj )) ≤ d(zh (t ),x k (ηk,j )) + d(zh (ηj ),x k (ηk,j )) ≤ 2−k+4 < ε, (10.2.73) where t ∈ domain(z) ∩ [ηj ,ηj +1 ) is arbitrary. Letting h → ∞, we obtain the desired inequality 10.2.71. Inequalities 10.2.71 and 10.2.70 together say that (ηj )j =0,...,n(k) is a sequence of ε-division points for the function z, with separation of at least δ(ε). Condition 3 in Definition 10.1.2 has been verified for the objects z,η , and δ. 20. Thus Conditions 1 and 3 in Definition 10.1.2 have been verified for the objects z,η , and δ. Proposition 10.1.6 therefore says that (i ) the completion y ∈ D[0,1] of z is well defined, (ii ) y|domain(z) = z, and (iii ) δ is a modulus of càdlàg of y. 21. Finally, we will prove that dD (yh,y) → 0 as h → ∞. To that end, let h ≥ 0 and r ∈ A ⊂ domain(z) ⊂ domain(y) be arbitrary. By the Condition (ii ), we have y(r) = z(r). Hence, by inequality 10.2.68, d(y(r),zh (r)) = d(z(r),zh (r)) ≤ 2−h+3 . Consequently, since A is a dense subset of [0,1], Lemma 10.1.3 says that d(y(r),zh (r)) ≤ 2−h+3
(10.2.74)
for each r ∈ domain(y) ∩ domain(zh ). In other words, d(y(r),x h ◦ μh μ−1 (r)) ≤ 2−h+3
(10.2.75)
for each r ∈ domain(y) ∩ μμ−1 h domain(x h ). 22. Inequalities 10.2.75 and 10.2.60 together imply that (2−h+3,μh μ−1 ) ∈ Ay,x(h) , whence 2−h+3 ∈ By,x(h) . Accordingly, dD (y,x h ) ≡ inf By,x(h) ≤ 2−h+3 . Together with inequality 10.2.44, this implies dD (y,yh ) ≤ 2−h+3 + 2−h, where h ≥ 0 is arbitrary. Thus dD (yh,y) → 0 as h → ∞.
a.u. Càdlàg Process
405
23. Summing up, for each Cauchy sequence (yh )h=1,2,... , there exists y ∈ D[0,1] such that dD (yh,y) → 0 as h → ∞. In other words, (D[0,1],dD ) is complete, as alleged.
10.3 a.u. Càdlàg Process Let (,L,E) be a probability space, and let (S,d) be a locally compact metric space. Let (D[0,1],dD ) be the Skorokhod space of càdlàg functions on the unit interval [0,1] with values in (S,d), as introduced in Section 10.2. Recall Definition 9.0.2 for notations related to the enumerated set of dyadic rationals Q∞ in [0,1]. Definition 10.3.1. Random càdlàg function. An arbitrary r.v. Y : (,L,E) → (D[0,1],dD ) with values in the Skorokhod space is called a random càdlàg function if, for each ε > 0, there exists a measurable set A with P (A) < ε such that members of the family {Y (ω) : ω ∈ Ac } of càdlàg functions share a common modulus of càdlàg. Of special interest is the subclass of the random càdlàg functions corresponding to a.u. càdlàg processes on [0,1], defined next. Definition 10.3.2. a.u. Càdlàg process. Let X : [0,1]× → (S,d) be a stochastic process that is continuous in probability on [0,1], with a modulus of continuity in probability δCp . Suppose there exists a full set B ⊂ t∈Q(∞) domain(Xt ) with the following properties: 1. (Right continuity.) For each ω ∈ B, the function X(·,ω) is right continuous at each t ∈ domain(X(·,ω)). 2. (Right completeness.) Let ω ∈ B and t ∈ [0,1] be arbitrary. If limr→t;r≥t X(r,ω) exists, then t ∈ domain(X(·,ω)). 3. (Approximation by step functions.) Let ε > 0 be arbitrary. Then there exist (i) δaucl (ε) ∈ (0,1), (ii) a measurable set A ⊂ B with P (Ac ) < ε, (iii) an integer h ≥ 1, and (iv) a sequence of r.r.v.’s 0 = τ0 < τ1 < · · · < τh−1 < τh = 1
(10.3.1)
such that, for each i = 0, . . . ,h − 1, the function Xτ (i) is an r.v., and such that (v) for each ω ∈ A, we have h−1 >
(τi+1 (ω) − τi (ω)) ≥ δaucl (ε)
(10.3.2)
i=0
with d(X(τi (ω),ω),X(·,ω)) ≤ ε
(10.3.3)
on the interval θi (ω) ≡ [τi (ω),τi+1 (ω)) or θi (ω) ≡ [τi (ω),τi+1 (ω)] depending on whether 0 ≤ i ≤ h − 2 or i = h − 1.
406
Stochastic Process
Then the process X : [0,1] × → S is called an a.u. càdlàg process, with δCp as a modulus of continuity in probability and with δaucl as a modulus of a.u. δ(aucl),δ(cp) [0,1] denote the set of all such processes. càdlàg. We will let D We will let D[0,1] denote the set of all a.u. càdlàg processes. Two members X,Y of D[0,1] are considered equal if there exists a full set B such that for each ω ∈ B , we have X(·,ω) = Y (·,ω) as functions on [0,1]. Lemma 10.3.3. a.u. Continuity implies a.u. càdlàg. Let X : [0,1]×(,L,E) → (S,d) be an arbitrary a.u. continuous process. Then X is a.u. càdlàg.
Proof. Easy and omitted.
Definition 10.3.4. Random càdlàg function from an a.u. càdlàg process. Let X ∈ D[0,1] be arbitrary. Define a function X∗ : → D[0,1] by domain(X∗ ) ≡ {ω ∈ : X(·,ω) ∈ D[0,1]} and by X∗ (ω) ≡ X(·,ω) for each ω ∈ domain(X∗ ). We will call X∗ the random càdlàg function from the a.u. càdlàg process X. Proposition 10.3.5 (next) proves that the function X∗ is well defined and is indeed a random càdlàg function. Proposition 10.3.5. Each a.u. càdlàg process gives rise to a random càdlàg function. Let X ∈ D[0,1] be an arbitrary a.u. càdlàg process, with some modulus of a.u. càdlàg δaucl . Then the following conditions hold: 1. Let ε > 0 be arbitrary. Then there exists a measurable set A with P (A) < ε, such that members of the set {X∗ (ω) : ω ∈ Ac }of functions on [0,1] are càdlàg functions that share a common modulus of càdlàg. 2. X∗ is an r.v. with values in the complete metric space (D[0,1],dD ). Thus X∗ is a random càdlàg function. Proof. 1. Define the full subset B ≡ t∈Q(∞) domain(Xt ) of . By Conditions 1 and 2 in Definition 10.3.2, members of the set {X(·,ω) : ω ∈ B} of functions satisfy the corresponding Conditions 1 and 2 in Definition 10.1.2. 2. Let n ≥ 1 be arbitrary. By Condition 3 in Definition 10.3.2, there exist (i) δaucl (2−n ) > 0, (ii) a measurable set An ⊂ B with P (Acn ) < 2−n , (iii) an integer hn ≥ 0, and (iv) a sequence of r.r.v.’s 0 = τn,0 < τn,1 < · · · < τn,h(n) = 1
(10.3.4)
such that for each i = 0, . . . ,hn − 1, the function Xτ (n,i) is an r.v., and such that for each ω ∈ An , we have
a.u. Càdlàg Process h(n)−1 >
(τn,i+1 (ω) − τn,i (ω)) ≥ δaucl (2−n )
407 (10.3.5)
i=0
with d(X(τn,i (ω),ω),X(·,ω)) ≤ 2−n
(10.3.6)
on the interval θn,i (ω) ≡ [τn,i (ω),τn,i+1 (ω)) or θn,i (ω) ≡ [τn,i (ω),τn,i+1 (ω)] depending on whether 0 ≤ i ≤ hn − 2 or i = hn − 1. 3. Let ε > 0 be arbitrary. Take j ≥ 1 so large that 2−j < ε. Define A ≡ ∞ c c −n = 2−j < ε. Consider each ω ∈ Ac . B ∪ ∞ n=j +1 An . Then P (A) < n=j +1 2
Let ε > 0 be arbitrary. Consider each n ≥ j + 1 so large that 2−n < ε . Define δcdlg (ε ) ≡ δaucl (2−n ). Then ω ∈ An . Hence inequalities 10.3.5 and 10.3.6 hold, and imply that the sequence 0 = τn,0 (ω) < τn,1 (ω) < · · · < τn,h(n) (ω) = 1
(10.3.7)
is a sequence of ε -division points of the function X(·,ω), with separation of at least δcdlg (ε ). Summing up, all the conditions in Definition 10.1.2 have been verified for the function X∗ (ω) ≡ X(·,ω) to be càdlàg, with the modulus of càdlàg δcdlg , where ω ∈ Ac is arbitrary. Assertion 1 is proved. 4. Now let n ≥ 1 be arbitrary. We will construct a random càdlàg function Vn as follows. Write decld for the Euclidean metric on (0,1). Fix a 0 ≡ 0 and a h(n) ≡ 1. Let (a1, . . . ,ah(n)−1 ) ∈ (0,1)h(n−1) be arbitrary. As an abbreviation, −n write δ ≡ h−1 n δaucl (2 ) > 0. Define, for each i = 1, . . . ,hn − 1, a i ≡ fi (a1, . . . ,ah(n)−1 ) ≡ iδ ∨ (a0 ∨ · · · ∨ ai ) ∧ (iδ + 1 − hn δ). Then a i+1 > a i + δ for each i = 1, . . . ,hn − 1. Hence 0 ≡ a 0 < a 1 < · · · < a h(n) ≡ 1. Therefore the simple càdlàg function smpl ((a 0,a 1, . . . ,a h(n)−1 ),(x0, . . . ,xh(n)−1 )), first introduced in Definition 10.1.7, is well defined for each ((a1, . . . ,ah(n)−1 ),(x0, . . . ,xh(n)−1 )) ∈ (0,1)h(n−1) × S h(n)−1 . 5. Thus we have a function h(n)−1
n : ((0,1)h(n−1) × S h(n)−1,decld
⊗ d h(n)−1 ) → (D[0,1],dD )
defined by n ((a1, . . . ,ah(n)−1 ),(x0, . . . ,xh(n)−1 )) ≡ smpl ((a 0,a 1, . . . ,a h(n)−1 ),(x0, . . . ,xh(n)−1 )), for each ((a1, . . . ,ah(n)−1 ),(x0, . . . ,xh(n)−1 )) ∈ (0,1)h(n−1) × S h(n)−1 . It can easily be proved that this function n is uniformly continuous.
408
Stochastic Process
6. Hence Proposition 4.8.10 applies. It implies that the function Vn ≡ n ((τn,1, . . . ,τn,h(n)−1 ),(Xτ (n,0), . . . ,Xτ (n,h(n)−1) )) : (,L,E) → (D[0,1],dD ) is an r.v. 7. Define, for each i = 1, . . . ,hn − 1, the r.r.v. τ n,i ≡ fi (τn,1, . . . ,τn,h(n)−1 ). Consider each ω ∈ An . Let i = 1, . . . ,hn − 1 be arbitrary. Then inequality 10.3.5 holds and implies that τn,i (ω) ≥ δaucl (2−n ) > iδ and that τn,i (ω) ≤ τn,i+1 (ω) − δaucl (2−n ) ≤ · · · ≤ τn,h(n) (ω) − (hn − 1)δaucl (2−n ) ≤ 1 − (hn − 1)iδ = iδ + 1 − hn iδ ≤ iδ + 1 − hn δ. Therefore, in view of inequality 10.3.4, we have, τ n,i (ω) ≡ iδ ∨ (τn,0 (ω) ∨ · · · ∨ τn,i (ω)) ∧ (iδ + 1 − hn δ) = iδ ∨ τn,i (ω) ∧ (iδ + 1 − hn δ) = τn,i (ω). 8. Hence Vn (ω) ≡ n ((τn,1 (ω), . . . ,τn,h(n)−1 (ω)),(Xτ (n,0) (ω), . . . ,Xτ (n,h(n)−1) (ω))) ≡ smpl ((τ n,0 (ω), . . . ,τ n,h(n)−1 (ω)),(Xτ (n,0) (ω), . . . ,Xτ (n,h(n)−1) (ω))) ≡ smpl ((τn,0 (ω), . . . ,τn,h(n)−1 (ω)),(Xτ (n,0) (ω), . . . ,Xτ (n,h(n)−1) (ω))). 9. Now let t ∈ [τn,i (ω),τn,i+1 (ω)) be arbitrary, for some i = 0, . . . ,hn − 1. Then, by Definition 10.1.7 for the function smpl , we have Vn (ω)(t) = Xτ (n,i) (ω) ≡ X(τn,i (ω),ω). At the same time, inequality 10.3.6 implies that d(X(τn,i (ω),ω),X∗ (ω)(t)) ≡ d(X(τn,i (ω),ω),X(t,ω)) ≤ 2−n . Combining, we obtain d(Vn (ω)(t),X∗ (ω)(t)) ≤ 2−n h(n)−1
(10.3.8)
for each t in i=0 [τn,i (ω),τn,i+1 (ω)). Since this union is a dense subset of [0,1], inequality 10.3.8 holds for each t ∈ domain(Vn (ω)) ∩ domain(X∗ (ω)). It follows that dD (Vn (ω),X∗ (ω)) ≤ 2−n,
a.u. Càdlàg Process
409
where ω ∈ An is arbitrary. Since P (Acn ) < 2−n , it follows that Vn → X∗ in probability, as functions with values in the complete metric space (D[0,1],dD ). At the same time Vn is an r.v. for each n ≥ 1. Therefore Assertion 2 of Proposition 4.9.3 says that X∗ is an r.v. with values in (D[0,1],dD ). Assertion 2 and the present proposition are proved. Definition 10.3.6. Metric for the space of a.u. càdlàg processes. Define the on D[0,1] by metric ρD[0,1] ! ρD[0,1] (X,Y ) ≡ E(dω)dD (X(·,ω),Y (·,ω)) ≡ E dD (X∗,Y ∗ ) (10.3.9) for each X,Y ∈ D[0,1], where dD ≡ 1 ∧ dD . Lemma 10.3.7 (next) justifies the definition.
is a metric space. The function ρD[0,1] is well Lemma 10.3.7. D[0,1],ρ D[0,1] defined and is a metric. Proof. Let X,Y ∈ D[0,1] be arbitrary. Then, according to Proposition 10.3.5, the random càdlàg functions X∗,Y ∗ associated with X,Y , respectively, are r.v.’s with values in (D[0,1],dD ). Therefore the function dD (X∗,Y ∗ ) is an integrable r.r.v., and the defining equality 10.3.9 makes sense. . Now Symmetry and the triangle inequality can be trivially verified for ρD[0,1] suppose X = Y in D[0,1]. Then, by the definition of the set D[0,1], we have X(·,ω) = Y (·,ω) in D[0,1], for a.e. ω ∈ . Hence dD (X∗ (ω),Y ∗ (ω)) ≡ dD ((X(·,ω),Y (·,ω)) = 0. (X,Y ) ≡ E dD (X∗,Y ∗ ) = 0 Thus dD (X∗,Y ∗ ) = 0 a.s. Consequently, ρD[0,1] according to the defining equality 10.3.9. The converse is proved similarly. is a metric. Combining, we conclude that ρ D[0,1]
10.4 D-Regular Family of f.j.d.’s and D-Regular Process In this and the following two sections, we construct an a.u. càdlàg process from a consistent family F of f.j.d.’s with the locally compact state space (S,d) and parameter set [0,1] that satisfies a certain D-regularity condition, to be defined presently. The construction is by (i) taking any process Z : Q∞ × → S with marginal distributions given by F |Q∞ and (ii) extending the process Z to an a.u. càdlàg process X : [0,1] × → S by taking right limits of sample paths. Step (i) can be done by the Daniell–Kolmogorov Extension or Daniell–Kolmogorov– Skorokhod Extension, for example. As a matter of fact, we can define D-regularity for F as D-regularity of any process Z with marginal distributions given by F |Q∞ . The key Step (ii) then involves proving that a process X : [0,1] × → S is a.u. càdlàg iff it is the right-limit extension of some D-regular process Z : Q∞ × → S. In this section, we prove the “only if” part. In Section 10.5, we will prove the “if” part.
410
Stochastic Process
Definition 10.4.1. D-regular processes and D-regular families of f.j.d.’s with parameter set Q∞ . Let Z : Q∞ × → S be a stochastic process, with marginal distributions given by the family F of f.j.d.’s. Let m ≡ (mn )n=0.1.··· be an increasing sequence of nonnegative integers. Suppose the following conditions are satisfied: 1. Let n ≥ 0 be arbitrary. Let β > 2−n be arbitrary such that the set β
At,s ≡ (d(Zt ,Zs ) > β)
(10.4.1)
is measurable for each s,t ∈ Q∞ . Then P (Dn ) < 2−n,
(10.4.2)
where we define the exceptional set β β β β β β (At,r ∪ At,s )(At,r ∪ At ,s )(At ,r ∪ At ,s ), Dn ≡ t∈Q(m(n)) r,s∈(t,t )Q(m(n+1));r≤s
(10.4.3) where for each t ∈ Qm(n) we abuse notations and write t ≡ 1 ∧ (t + 2−m(n) ). 2. The process Z is continuous in probability on Q∞, with a modulus of continuity in probability δCp . Then the process Z : Q∞ × → S and the family F of f.j.d.’s are said to be Dregular, with the sequence m as a modulus of D-regularity and with the operation δCp as a modulus of continuity in probability. Let Dreg,m,δ(Cp) (Q∞ × ,S) R Dreg (Q∞ × ,S) denote the set denote the set of all such processes. Let R Dreg (Q∞ × of all D-regular processes. Thus RDreg,m,δ(Cp) (Q∞ × ,S) and R ρP rob,Q(∞) ) introduced in ,S) are subsets of the metric space (R(Q∞ × ,S), Definition 6.4.2 and, as such, inherit the metric ρ P rob,Q(∞) . Thus we have the metric space Dreg (Q∞ × ,S), ρP rob,Q(∞) ). (R In addition, let Dreg,m,δ(Cp) (Q∞,S) F Dreg (Q∞,S) denote the set of denote the set of all such families F of f.j.d.’s. Let F Dreg (Q∞,S) Dreg,m,δ(Cp) (Q∞,S) and F all D-regular families of f.j.d.’s. Then F ρMarg,ξ,Q(∞) ) of consistent families are subsets of the metric space (F (Q∞,S), of f.j.d.’s introduced in Definition 6.2.8, where the metric ρ Marg,ξ,Q(∞) is defined relative to an arbitrarily given, but fixed, binary approximation ξ ≡ (Aq )q=1,2,... of (S,d). Condition 1 in Definition 10.4.1 is, in essence, equivalent to condition (13.10) in the key theorem 13.3 of [Billingsley 1999]. The crucial difference between our construction and Billingsley’s theorem is that the latter relies on Prokhorov’s
a.u. Càdlàg Process
411
Theorem (theorem 5.1 in [Billingsley 1999]). As we observed earlier, Prokhorov’s Theorem implies the principle of infinite search. This is in contrast to our simple and direct construction developed in this and the next section. First we extend the definition of D-regularity to families of f.j.d.’s with a parameter interval [0,1] that are continuous in probability on the interval. Definition 10.4.2. D-regular families of f.j.d.’s with parameter interval [0,1]. Cp ([0,1],S), ρCp,ξ,[0,1],Q(∞) ) of Recall from Definition 6.2.12 the metric space (F families of f.j.d.’s that are continuous in probability on [0,1], where the metric is defined relative to the enumerated, countable, dense subset Q∞ of [0,1]. Define Cp ([0,1],S) by two subsets of F Cp ([0,1],S) : F |Q∞ ∈ F Dreg (Q∞,S)} Dreg ([0,1],S) ≡ {F ∈ F F and Cp ([0,1],S) : F |Q∞ ∈ F Dreg,m,δ(Cp) (Q∞,S)}. Dreg,m,δ(Cp) ([0,1],S) ≡ {F ∈ F F These subsets inherit the metric ρ Cp,ξ,[0,1],Q(∞) . We will prove that a process X : [0,1] × → S is a.u càdlàg iff it is the extension by right limit of a D-regular process Z : Q∞ × → S. Theorem 10.4.3. Restriction of each a.u. càdlàg process to Q∞ is D-regular. Let X : [0,1] × → S be an a.u càdlàg process with a modulus of a.u. càdlàg δaucl and a modulus of continuity in probability δCp . Let m ≡ (mn )n=0,1,2,... be an arbitrary increasing sequence of integers such that 2−m(n) < δaucl (2−n−1 )
(10.4.4)
for each n ≥ 0. Then the process Z ≡ X|Q∞ is D-regular with a modulus of D-regularity m and with the same modulus of continuity in probability δCp . Proof. 1. By Definition 10.3.2, the a.u càdlàg process X is continuous in probability on [0,1], with some modulus of continuity in probability δCp . Hence so is Z on Q∞ , with the same modulus of continuity in probability δCp . Consequently, Condition 2 in Definition 10.4.1 is satisfied. 2. Now let n ≥ 0 be arbitrary. Write εn ≡ 2−n . By Condition 3 in Definition 10.3.2, there exist (i) δaucl (2−1 εn ) > 0, (ii) a measurable set An ⊂ B ≡ t∈Q(∞) domain(Xt ) with P (Acn ) < 2−1 εn , (iii) an integer hn ≥ 0, and (iv) a sequence of r.r.v.’s 0 = τn,0 < τn,1 < · · · < τn,h(n)−1 < τn,h(n) = 1
(10.4.5)
such that for each i = 0, . . . ,hn − 1, the function Xτ (n,i) is an r.v., and such that for each ω ∈ An , we have h(n)−1 >
(τn,i+1 (ω) − τn,i (ω)) ≥ δaucl (2−1 εn )
i=0
(10.4.6)
412
Stochastic Process
with d(X(τn,i (ω),ω),X(·,ω)) ≤ 2−1 εn
(10.4.7)
on the interval θn,i ≡ [τn,i (ω),τn,i+1 (ω)) or θn,i ≡ [τn,i (ω),1] depending on whether i ≤ hn − 2 or i = hn − 1. 3. Take any β > 2−n such that the set β
At,s ≡ (d(Xt ,Xs ) > β)
(10.4.8)
is measurable for each s,t ∈ Q∞ . Let Dn be the exceptional set defined in equality 10.4.3. Suppose, for the sake of a contradiction, that P (Dn ) > εn ≡ 2−n . Then P (Dn ) > εn > P (Acn ) by Condition (ii). Hence P (Dn An ) > 0. Consequently, there exists ω ∈ Dn An . Since ω ∈ An , inequalities 10.4.6 and 10.4.7 hold at ω. At the same time, since ω ∈ Dn, there exist t ∈ Qm(n) and r,s ∈ Qm(n+1) , with t < r ≤ s < t ≡ t + 2−m(n) , such that β
β
β
β
β
β
ω ∈ (At,r ∪ At,s )(At,r ∪ At ,s )(At ,r ∪ At ,s ).
(10.4.9)
4. Note that τn,i+1 (ω) − τn,i (ω) > t − t, for each i = 0, . . . ,hn − 1. Hence there exists i = 1, . . . ,hn − 1 such that τn,i−1 (ω) ≤ t < t ≤ τn,i+1 (ω). There are three possibilities regarding the order of r,s in relation to the points τn,i−1 (ω),τn,i (ω), and τn,i+1 (ω) in the interval [0,1]: (i ) τn,i−1 (ω) ≤ t < r ≤ s < τn,i (ω), (i ) τn,i−1 (ω) ≤ t < r < τn,i (ω) < s < t ≤ τn,i+1 (ω), and (iii ) τn,i (ω) < r ≤ s < t ≤ τn,i+1 (ω). In case (i ), we have, in view of inequality 10.4.7, d(Xt (ω),Xr (ω)) ≤ (d(Xτ (n,i−1) (ω),Xt (ω)) + d(Xτ (n,i−1) (ω),Xr (ω))) ≤ εn < β β
β
and similarly d(Xt (ω),Xs (ω)) < β. Hence ω ∈ (At,r ∪ At,s )c . Similarly, in case (ii ), we have d(Xt (ω),Xr (ω)) ∨ d(Xs (ω),Xt (ω)) < β, whence ω ∈ (At,r ∪ At ,s )c . Likewise, in case (iii ), we have β
β
d(Xt (ω),Xr (ω)) ∨ d(Xt (ω),Xs (ω)) < β, β
β
whence ω ∈ (At ,r ∪ At ,s )c . Combining, in each case, we have
a.u. Càdlàg Process β
β
β
β
413 β
β
ω ∈ (At,r ∪ At,s )c ∪ (At,r ∪ At ,s )c ∪ (At ,r ∪ At ,s )c, which contradicts relation 10.4.9. Thus the assumption that P (Dn ) > 2−n leads to a contradiction. We conclude that P (Dn ) ≤ 2−n , where n ≥ 0 and β > 2−n are arbitrary. Hence the process Z ≡ X|Q∞ satisfies the conditions in Definition 10.4.1 to be D-regular, with the sequence (mn )n=0,1,2,... as a modulus of D-regularity. The converse to Theorem 10.4.3 will be proved in Section 10.5. From a D-regular family F of f.j.d.’s with parameter set [0,1] we will construct an a.u. càdlàg process with marginal distributions given by the family F .
10.5 Right-Limit Extension of D-Regular Process Is a.u. Càdlàg Refer to Definition 9.0.2 for the notations related to the enumerated set Q∞ of dyadic rationals in the interval [0,1]. We proved in Theorem 10.4.3 that the restriction to Q∞ of each a.u. càdlàg process on [0,1] is D-regular. In this section, we will prove the converse, which is the main theorem in this chapter: the extension by right limit of each D-regular process on Q∞ is a.u. càdlàg. Then we will prove the easy corollary that, given an D-regular family F of f.j.d.’s with parameter set [0,1], we can construct an a.u. càdlàg process X with marginal distributions given by F and with a modulus of a.u. càdlàg in terms of the modulus of D-regularity of F . We will use the following assumption and notations. Definition 10.5.1. Assumption of a D-regular process. Recall that (S,d) is a Dreg,m,δ(Cp) (Q∞ × ,S) be arbitrary locally compact metric space. Let Z ∈ R but fixed for the remainder of this section. In other words, Z : Q∞ × → S is a fixed D-regular process with a fixed modulus of D-regularity m ≡ (mk )k=0,1,... and with a fixed modulus of continuity in probability δCp . Definition 10.5.2. Notation for the range of a sample function. Let Y : Q × → S be an arbitrary process, and let A ⊂ Q and ω ∈ be arbitrary. Then we write Y (A,ω) ≡ {x ∈ S : x = Y (t,ω) for some t ∈ A}. Definition 10.5.3. Accordion function. In the following discussion, unless otherwise specified, (βh )h=0,1,... will denote an arbitrary but fixed sequence of real numbers such that for each k,h ≥ 0 with k ≤ h, and for each r,s ∈ Q∞, we have (i) βh ∈ (2−h+1,2−h+2 ),
(10.5.1)
(d(Zr ,Zs ) > βk + · · · + βh )
(10.5.2)
(ii) the set
414
Stochastic Process
is measurable, and, in particular, (iii) the set Aβ(h) r,s ≡ (d(Zr ,Zs ) > βh )
(10.5.3)
is measurable. Let h,n ≥ 0 be arbitrary. Define βn,h ≡ where, by convention,
βi ,
(10.5.4)
i=n
h
i=n βi
≡ 0 if h < n. Define βn,∞ ≡
Note that βn,∞ < Q∞ , write
h
∞
βi .
(10.5.5)
i=n
∞
−i+2 i=n 2
= 2−n+3 → 0 as n → ∞. For each subset A of
A− ≡ {t ∈ Q∞ : t s for each s ∈ A}, the metric complement of A in Q∞ . Let s ∈ Q∞ = ∞ h=0 Qm(h) be arbitrary. Define h(s) ≡ h ≥ 0 to be the smallest integer such that s ∈ Qm(h) . Let n ≥ 0 be arbitrary. Define n (s) ≡ βn, β h(s) ≡
h(s)
βi < βn,∞ .
(10.5.6)
i=n
h(u) ≤ k, whence Note that, for each u ∈ Qm(k) , we have n (u) ≡ β
h(u)
i=n
βi ≤
k
βi ≤ βn,k < βn,∞ .
(10.5.7)
i=n
Thus we have the functions h : Q∞ → {0,1,2, . . .} and n : Q∞ → (0,βn,∞ ) β for each n ≥ 0. These functions are defined relative to the sequences (βn )n=0,1,... and (mn )n=0,1,... . n an accordion function, For want of a better name, we might call the function β because its graph resembles a fractal-like accordion. It will furnish a time-varying boundary for some simple first exit times in the proof of the main theorem. Note n (s) ≤ h(s) ≤ k and so β that for arbitrary s ∈ Qm(k) for some k ≥ 0, we have βn,k .
a.u. Càdlàg Process
415
Definition 10.5.4. Some small exceptional sets. Let k ≥ 0 be arbitrary. Then βk+1 > 2−k . Hence, by the conditions in Definition 10.4.1 for m ≡ (mh )h=0,1,... to be a modulus of D-regularity of the process Z, we have P (Dk ) ≤ 2−k , where
Dk ≡
(10.5.8)
u∈Q(m(k)) r,s∈(u,u )Q(m(k+1));r≤s
A
β(k+1)
β(k+1) β(k+1) β(k+1) (Au,r ∪ Au,s )(Au,r ∪ Au ,s
β(k+1)
)(Au ,r
β(k+1)
∪ Au ,s
B ) , (10.5.9)
where, for each u ∈ Qm(k) , we abuse notations and write u ≡ u + m(k) . For each h ≥ 0, define the small exceptional set ∞
Dh+ ≡
Dk
(10.5.10)
k=h
with P (Dh+ ) ≤
∞
2−k = 2−h+1 .
(10.5.11)
k=h
Lemma 10.5.5. Existence of certain supremums as r.r.v.’s. Let Z : Q∞ × → S be a D-regular process, with a modulus of D-regularity m ≡ (mk )k=0,1,... . Let h ≥ 0 and v,v,v ∈ Qm(h) be arbitrary with v ≤ v ≤ v . Then the following conditions hold: 1. For each r ∈ [v,v ]Q∞ , we have h+1 (r) d(Zv,Zu ) + β (10.5.12) d(Zv,Zr ) ≤ u∈[v,v ]Q(m(h)) c . on Dh+ 2. The supremum
sup
u∈[v,v ]Q(∞)
exists as an r.r.v. Moreover, 0≤
sup
u∈[v,v ]Q(∞)
d(Zv,Zu )
d(Zv,Zu ) −
u∈[v,v ]Q(m(h))
c , where P (D ) ≤ 2−h+1 . on Dh+ h+
d(Zv,Zu ) ≤ βh+1,∞ ≤ 2−h+4
416
Stochastic Process
3. Write d ≡ 1 ∧ d. Then 0≤E
sup
u∈[v,v ]Q(∞)
v,Zu ) − E d(Z
v,Zu ) ≤ 2−h+5, d(Z
u∈[v,v ]Q(m(h))
where h ≥ 0 and v,v ∈ Qm(h) are arbitrary with v ≤ v . Proof. 1. First let k ≥ 0 and v,v,v ∈ Qm(k) be arbitrary with v ≤ v ≤ v . Consider each ω ∈ Dkc . We will prove that d(Zv (ω),Zu (ω)) − d(Zv (ω),Zu (ω)) ≤ βk+1 . 0≤ u∈[v,v ]Q(m(k+1))
u∈[v,v ]Q(m(k))
(10.5.13) To that end, consider each r ∈ [v,v ]Qm(k+1) . Then r ∈ [s,s + m(k) ]Qm(k+1) for some s ∈ Qm(k) such that [s,s + m(k) ] ⊂ [v,v ]. Write s ≡ s + m(k) . We need to prove that d(Zv (ω),Zu (ω)) + βk+1 . (10.5.14) d(Zv (ω),Zr (ω)) ≤ u∈[v,v ]Q(m(k))
If r = s or r = s , then r ∈ [v,v ]Qm(k) and inequality 10.5.14 holds trivially. Hence we may assume that r ∈ (s,s )Qm(k+1) . Since ω ∈ Dkc by assumption, the defining equality 10.5.9 implies that β(k+1) c
β(k+1) c β(k+1) c β(k+1) c ) (As,r ) ∪ (As,r ) (As ,r ω ∈ (As,r
β(k+1) c
) ∪ (As ,r
β(k+1) c
) (As ,r
) .
Consequently, by the defining equality 10.5.3 for the sets in the last displayed expression, we have d(Zs (ω),Zr (ω)) ∧ d(Zs (ω),Zr (ω)) ≤ βk+1 . Hence the triangle inequality implies that d(Zv (ω),Zr (ω)) ≤ (d(Zs (ω),Zr (ω)) + d(Zv (ω),Zs (ω))) + d(Zv (ω),Zs (ω)))
≤ (d(Zs (ω),Zr (ω)) + >
d(Zv (ω),Zu (ω)))
u∈[v,v ]Q(m(k))
(d(Zs (ω),Zr (ω)) +
d(Zv (ω),Zu (ω)))
u∈[v,v ]Q(m(k))
≤ (βk+1 + >
> (d(Zs (ω),Zr (ω))
d(Zv (ω),Zu (ω)))
u∈[v,v ]Q(m(k))
(βk+1 +
d(Zv (ω),Zu (ω)))
u∈[v,v ]Q(m(k))
= βk+1 +
u∈[v,v ]Q(m(k))
d(Zv (ω),Zu (ω)),
a.u. Càdlàg Process
417
establishing inequality 10.5.14 for arbitrary r ∈ [v,v ]Qm(k+1) . The desired inequality 10.5.13 follows. 2. Now let h ≥ 0 and v,v,v ∈ Qm(h) be arbitrary as given, with v ≤ v ≤ v . c . Then ω ∈ D c for each k ≥ h. Hence, inequality 10.5.13 Consider each ω ∈ Dh+ k can be applied repeatedly to h,h + 1, . . . ,k + 1, to yield d(Zv (ω),Zu (ω)) − d(Zv (ω),Zu (ω)) 0≤ u∈[v,v ]Q(m(k+1))
u∈[v,v ]Q(m(h))
≤ βk+1 + · · · + βh+1 = βh+1,k+1 < βh+1,∞ < 2−h+2 .
(10.5.15)
3. Consider each r ∈ [v,v ]Q∞ . We will prove that h+1 (r). d(Zv (ω),Zr (ω)) ≤ d(Zv (ω),Zu (ω)) + β
(10.5.16)
u∈[v,v ]Q(m(h))
This is trivial if r ∈ Qm(h) . Hence we may assume that r ∈ Qm(k+1) Q− m(k) for some k ≥ h. Then h(r) ≡ k + 1 and βh (r) = βh+1,k+1 . Therefore the first half of inequality 10.5.15 implies that d(Zv (ω),Zu (ω)) + βh+1,k+1 d(Zv (ω),Zr (ω)) ≤ u∈[v,v ]Q(m(h))
=
h+1 (r). d(Zv (ω),Zu (ω)) + β
u∈[v,v ]Q(m(h)) c is arbitrary. Inequality 10.5.12 Inequality 10.5.16 is proved, where ω ∈ Dh+ follows. Assertion 1 is proved. 4. Next consider the special case where v = v. Since P (Dh+ ) ≤ 2−h+1 , it follows from inequality 10.5.15 that the a.u. limit d(Zv,Zu ) Yv,v ≡ lim k→∞
u∈[v,v ]Q(m(k+1))
exists and is an r.r.v. Moreover, for each ω ∈ domain(Yv,v ), it is easy to verify that Yv,v (ω) gives the supremum supu∈[v,v ]Q(∞) d(Zv (ω),Zu (ω)). Thus this supremum is defined and equal to the r.r.v. Yv,v on a full set, and is therefore itself an r.r.v. Letting k → ∞ in inequality 10.5.15, we obtain d(Zv,Zu ) ≤ βh+1,∞ ≤ 2−h+4 (10.5.17) 0 ≤ Yv,v − u∈[v,v ]Q(m(h)) c ∩ domain(Y ). This proves Assertion 2. on Dh+ v,v 5. Write d ≡ 1 ∧ d. Then v,Zu ) − E 0≤E sup d(Z u∈[v,v ]Q(∞)
= E(1 ∧ Yv,v − 1 ∧
v,Zu ) d(Z
u∈[v,v ]Q(m(h))
u∈[v,v ]Q(m(h))
d(Zv,Zu ))
418
Stochastic Process ≤ E1D(h+)c (1 ∧ Yv,v − 1 ∧
d(Zv,Zu )) + E1D(h+)
u∈[v,v ]Q(m(h))
≤ 2−h+4 + P (Dh+ ) ≤ 2−h+4 + 2−h+1 < 2−h+5 .
Assertion 3 and the lemma are proved.
Definition 10.5.6. Right-limit extension of a process with dyadic rational parameters. Recall the convention that if f is an arbitrary function, we write f (x) only with the implicit or explicit condition that x ∈ domain( f ). 1. Let Q∞ stand for the set of dyadic rationals in [0,1]. Let Y : Q∞ × → S be an arbitrary process. Define a function X : [0,1] × → S by domain(X) ≡ {(r,ω) ∈ [0,1] × :
lim
u→r;u≥r
Y (u,ω) exists}
and by X(r,ω) ≡
lim
u→r;u≥r
Y (u,ω)
(10.5.18)
for each (r,ω) ∈ domain(X). We will call rLim (Y ) ≡ X the right-limit extension of the process Y to the parameter set [0,1]. 2. Let Q∞ stand for the set of dyadic rationals in [0,∞). Let Y : Q∞ × → S be an arbitrary process. Define a function X : [0,∞) × → S by D C domain(X) ≡ (r,ω) ∈ [0,∞) × : lim Y (u,ω) exists u→r;u≥r
and by X(r,ω) ≡
lim
u→r;u≥r
Y (u,ω)
(10.5.19)
for each (r,ω) ∈ domain(X). We will call rLim (Y ) ≡ X the right-limit extension of the process Y to the parameter set [0,∞). In general, the right-limit extension X need not be a well-defined process. Indeed, it need not be a well-defined function at all. In the following proposition, recall, as remarked after Definition 6.1.2, that continuity a.u. is a weaker condition than a.u. continuity. Proposition 10.5.7. The right-limit extension of a D-regular process is a welldefined stochastic process and is continuous a.u. Let Z be the arbitrary Dregular process as specified in Definition 10.5.1, along with the related objects specified in Definitions 10.5.3 and 10.5.4. Let X ≡ rLim (Z) : [0,1] × → S denote the right-limit extension of the process Z. Then the following conditions hold:
a.u. Càdlàg Process
419
1. Let ε > 0 be arbitrary. Fix any n ≥ 0 so large that 2−n+5 ≤ ε. Take an arbitrary N ≥ n so large that m(N ) ≡ 2−m(N ) < 2−2 δCp (2−2n+2 ).
(10.5.20)
δcau (ε) ≡ δcau (ε,m,δCp ) ≡ m(N ) > 0.
(10.5.21)
Define
Then, for each t ∈ [0,1], there exists an exceptional set Gt,ε with P (Gt,ε ) < ε such that d(X(t,ω),X(t ,ω)) < ε for each t ∈ [t − δcau (ε),t + δcau (ε)] ∩ domain(X(·,ω)), for each ω ∈ Gct,ε . 2. For each t ∈ [0,1], there exists a null set Ht such that for each ω ∈ Htc , we have t ∈ domain(X(·,ω)) and X(t,ω) = limu→t Z(u,ω). 3. There exists a null set H such that for each ω ∈ H c, we have Q∞ ⊂ domain(X(·,ω)) and X(·,ω)|Q∞ = Z(·,ω). 4. The function X is a stochastic process that is continuous a.u., with δcau as modulus of continuity a.u. 5. Furthermore, the process X has the same modulus of continuity in probability δCp as the process Z. 6. (Right continuity.) For each ω ∈ H c , the function X(·,ω) is right continuous at each t ∈ domain(X(·,ω)). 7. (Right completeness.) For each ω ∈ H c and for each t ∈ [0,1] such that limr→t;r≥t X(r,ω) exists, we have t ∈ domain(X(·,ω)). Proof. 1. Let ε > 0 be arbitrary. Fix n ≥ 0 and N ≥ n as in the hypothesis of Assertion 1. Write ≡ m(N ) ≡ δcau (ε). Recall that δCp is the given modulus of continuity in probability of the process Z, and recall that βn ∈ (2−n+1,2−n+2 ) as in Definition 10.5.3. When there is little risk of confusion, suppress the subscript mN , write p ≡ pm(N ) ≡ 2m(N ) , and write qi ≡ qm(N ),i ≡ i ≡ i2−m(N ) for each i = 0,1, . . . ,p. 2. Consider each t ∈ [0,1]. Then there exists i = 0, . . . ,p − 2 such that t ∈ [qi ,qi+2 ]. The neighborhood θt,ε ≡ [t − ,t + ] ∩ [0,1] of t in [0,1] is a subset of [q(i−1)∨0,q(i+3)∧p ]. Write v ≡ vt ≡ q(i−1)∨0 and v ≡ q(i+3)∧p . Then (i) v,v ∈ Qm(N ) , (ii) v < v , and (iii) the set [v,v ]Qm(N ) = {q(i−1)∨0,qi ,qi+1,qi+2,q(i+3)∧p } contains four or five distinct and consecutive elements of Qm(N ) . Therefore, for each u ∈ [v,v ]Qm(N ) , we have |v − u| ≤ 4 < δCp (2−2n+2 ), whence E1 ∧ d(Zv,Zu )) ≤ 2−2n+2 < βn2 and, by Chebychev’s inequality, P (d(Zv,Zu ) > βn ) ≤ βn .
420
Stochastic Process
Hence the measurable set
An ≡ An,t ≡
(d(Zv,Zu ) > βn )
u∈[v,v ]Q(m(N ))
has probability bounded by P (An ) ≤ 4βn < 2−n+4. . Define Gt,ε ≡ Dn+ ∪ An,t . It follows that P (Gt,ε ) ≤ P (Dn+ ) + P (An ) < 2−n+1 + 2−n+4 < 2−n+5 ≤ ε.
(10.5.22)
c Ac . Then, by the definition of the set A , we 3. Consider each ω ∈ Gct,ε = Dn+ n n have d(Z(v,ω),Z(u,ω)) ≤ βn . (10.5.23) u∈[v,v ]Q(m(N )) c ⊂ D c , inequality 10.5.12 of Lemma 10.5.5 – At the same time, since ω ∈ Dn+ N+ where h,v are replaced by N,v, respectively – implies that, for each r ∈ [v,v ]Q∞ , we have N +1 (r) + d(Zv (ω),Zu (ω)) d(Zr (ω),Zv (ω)) ≤ β u∈[v,v ]Q(m(N ))
N +1 (r) + βn ≤ βN +1,∞ + βn ≤ βn+1,∞ + βn = βn,∞, =β where the second inequality is by inequality 10.5.23. Then the triangle inequality yields d(Z(r,ω),Z(r ,ω)) ≤ 2βn,∞ < 2−n+4 < ε
(10.5.24)
for each r,r ∈ θt,ε Q∞ ⊂ [v,v ]Q∞ , where ω ∈ Gct,ε is arbitrary. 4. Now write εk ≡ 2−k for each k ≥ 1. Then the measurable set G(t) κ ≡ ∞ ∞ (t) −κ+1 . Hence G has probability bounded by P (G ) ≤ ε = 2 k κ t,ε(k) k=κ k=κ (t) (t) c c Ht ≡ ∞ κ=1 Gκ is a null set. Consider each ω ∈ Ht . Then ω ∈ (Gκ ) for some κ ≥ 1. Hence ω ∈ Gct,ε(k) for each k ≥ κ. Inequality10.5.24 therefore implies that limu→t Z(u,ω) exists, whence the right limit X(t,ω) is well defined, with X(t,ω) ≡
lim
u→t;u≥t
Z(u,ω) = lim Z(u,ω). u→t
(10.5.25)
In short, t ∈ domain(X(·,ω)) . This verifies Assertion 2. Assertion 1 then follows from inequality 10.5.24 by the right continuity of X. 5. Define the null set H ≡ u∈Q(∞) Hu . Consider each ω ∈ H c and each u ∈ Q∞ . Then ω ∈ Huc . Hence Assertion 2 implies that u ∈ domain(X(·,ω)), and that X(u,ω) = limv→u Z(v,ω) = Z(u,ω). In short, X(·,ω)|Q∞ = Z(·,ω) and X(·,ω)|Q∞ = Z(·,ω). Assertion 3 is proved. 6. For each k ≥ 1, fix an arbitrary rk ∈ θt,ε(k) Q∞ . Consider each κ ≥ 1 and each ω ∈ Gcκ . Then for each k ≥ κ, we have rk ,rκ ∈ θt,ε(κ) Q∞ . So by inequality 10.5.24, where ε is replaced by εκ , we obtain d(Z(rk ,ω),Z(rκ ,ω)) ≤ εκ .
(10.5.26)
a.u. Càdlàg Process
421
Letting k → ∞, this yields d(X(t,ω),Z(rκ ,ω)) ≤ εκ ≡ 2−k ,
(10.5.27)
where ω ∈ Gcκ is arbitrary. Since P (Gκ ) ≤ εκ ≡ 2−k , we conclude that Zr(κ) → Xt a.u. Consequently, the function Xt is an r.v., where t ∈ [0,1] is arbitrary. Thus the function X ≡ rLim (Z) : [0,1] × → S is a stochastic process. 7. Let t ∈ θt,ε ≡ [t − δcau (ε),t + δcau (ε)] ∩ domain(X(·,ω)) be arbitrary. Letting r ↓ t and r ↓ t in inequality 10.5.24 while r,r ∈ θt,ε Q∞ , we obtain d(X(t,ω),X(t ,ω)) < ε, where ω ∈ Gct,ε is arbitrary. Since t ∈ [0,1] is arbitrary, and since P (Gt,ε ) < ε is arbitrarily small, we see that the process X is continuous a.u. according to Definition 6.1.2, with δcau as a modulus of continuity a.u. Assertion 4 has been proved. 8. Next, we will verify that the process X has the same modulus of continuity in probability δCp as the process Z. To that end, let ε > 0 be arbitrary, and let t,s ∈ [0,1] be such that |t − s| < δCp (ε). In Step 6 we saw that there exist sequences (rk )k=1,2,... and (vk )k=1,2,... in Q∞ such that rk → t, vk → s, Zr(k) → Xt a.u. and Zv(k) → Xs a.u. Then, for sufficiently large k ≥ 0 we have |rk − vk | < δCp (ε), whence E1∧ d(Zr(k),Zv(k) ) ≤ ε. The last cited a.u. convergence therefore implies that E1 ∧ d(Xt ,Xs ) ≤ ε. Summing up, δCp is a modulus of continuity of probability of the process X. Assertion 5 is proved. 9. Let ω ∈ H c be arbitrary. Then domain(Z(·,ω)) = Q∞ . Consider each u ∈ Q∞ . Step 5 says that u ∈ domain(X(·,ω)) and that the function Z(·,ω) is right continuous at u. Hence Proposition 10.1.6, where the functions x,x are replaced by Z(·,ω) and X(·,ω), respectively, implies that the function X(·,ω) is right continuous at each t ∈ domain(X(·,ω)), and that for each t ∈ [0,1] such that limr→t;r≥t X(r,ω) exists, we have t ∈ domain(X(·,ω)). Assertions 6 and 7 have been proved. We now prove the main theorem of this section. Theorem 10.5.8. The right-limit extension of a D-regular process is a.u. càdlàg. Let Z be the arbitrary D-regular process as specified in Definition 10.5.1, along with the related objects in Definitions 10.5.3 and 10.5.4. Then the right-limit extension X ≡ rLim (Z) : [0,1] × → S is a.u. càdlàg. Specifically, (i) the process X has the same modulus of continuity in probability δCp as the given D-regular process Z and (ii) it has a modulus of a.u. càdlàg δaucl (·,m,δCp ) defined as follows. Let ε > 0 be arbitrary. Let n ≥ 0 be so large that 2−n+9 < ε. Let N ≥ mn + n + 6 be so large that m(N ) ≡ 2−m(N ) < 2−2 δCp (2−2m(n)−2n−10 ).
(10.5.28)
Define δaucl (ε,m,δCp ) ≡ m(N ) . We emphasize that the operation δaucl (·,m,δCp ) depends only on m and δCp .
422
Stochastic Process
Proof. We will refer to Definitions 10.5.3 and 10.5.4 for the properties of the β(h) n and Dk ,Dk+ relative to the D-regular process Z. h, β objects (βh )h=0,1,... ,Ar,s , Assertion 3 of Proposition 10.5.7 says that there exists a null set H such that X(·,ω)|Q∞ = Z(·,ω) for each ω ∈ H c . Assertion 6 of Proposition 10.5.7 says that for each ω ∈ H c , the function X(·,ω) is right continuous at each t ∈ domain (X(·,ω)). Moreover, Assertions 4 and 5 of Proposition 10.5.7 say that X is continuous a.u., with some modulus of continuity a.u.. δcau , and is continuous in probability with the same modulus of continuity in probability δCp as the process Z. Refer to Proposition 8.1.13 for basic properties of the simple first exit times. 1. To start, let ε > 0 be arbitrary but fixed. Then let n,N,and δaucl (ε,m,δCp ) be fixed as described in the hypothesis. Let i = 1, . . . ,pm(n) be arbitrary but fixed until further notice. When there is little risk of confusion, we suppress references to n and i, and write simply p ≡ pm(n) ≡ 2m(n), ≡ m(n) ≡ 2−m(n), t ≡ qi−1 ≡ qm(n),i−1 ≡ (i − 1)2−m(n), and t ≡ qi ≡ qm(n),i ≡ i2−m(n) . Thus t,t ∈ Qm(n) and 0 ≤ t < t = t + ≤ 1. 2. With n fixed, write εn = 2−m(n)−n−1 and ν ≡ mn + n + 6. Then N ≥ ν, −ν+5 = εn and 2 m(N ) ≡ 2−m(N ) < 2−2 δCp (2−2ν+2 ).
(10.5.29)
Hence δcau (εn,m,δCp ) = m(N ), where δcau (·,m,δCp ) is the modulus of continuity a.u. of the process X defined in Proposition 10.5.7. Note that εn ≤ βn ∈ (2−n+1,2−n+2 ), where we recall the sequence (βh )h=0,1,... specified relative to the process Z in Definition 10.5.3. In the next several steps, we will prove that, except on a set of small probability, the set Z([t,t ]Q∞,ω) is the union of two subsets Z([t,τ (ω))Q∞,ω) and Z([τ (ω),t ]Q∞,ω), each of which is contained in a ball in (S,d) with small radius, where τ is some r.r.v. Recall here the notations from Definition 10.5.2 for the range of the sample function Z(·,ω). 3. First introduce some simple first exit times of the process Z. Let k ≥ n be arbitrary. As in Definition 8.1.12, define the simple first exit time ηk ≡ ηk,i ≡ ηt, β(n),[t,t ]Q(m(k))
(10.5.30)
a.u. Càdlàg Process
423
n -neighborhood of Zt . Note for the process Z|[t,t ]Qm(k) to exit the time-varying β that the r.r.v. ηk has values in [t + m(k),t ]Qm(k) . Thus t + m(k) ≤ ηk ≤ t .
(10.5.31)
In the case where k = n, this yields ηn ≡ ηn,i = t = qm(n),i .
(10.5.32)
Since Qm(k) ⊂ Qm(k+1) , the more frequently sampled simple first exit time ηk+1 comes no later than ηk , according to Assertion 5 of Proposition 8.1.13. Hence ηk+1 ≤ ηk . 4. Let κ ≥ k ≥ n be arbitrary. Consider each ω ∈ that
(10.5.33) c Dk+
⊂
c . Dκ+
t ≤ ηk (ω) − 2−m(k) ≤ ηκ (ω) − 2−m(κ) ≤ ηκ (ω) ≤ ηk (ω).
We will prove (10.5.34)
The first of these inequalities is from the first part of inequality 10.5.31. The third is trivial, and the last is by repeated applications of inequality 10.5.33. It remains only to prove the second inequality. To that end, write, as an abbreviation, α ≡ t, s ≡ ηk (ω) and α ≡ s − m(k) . Then α,t,s,α ∈ [t,t ]Qm(k) . Since α < s ≡ ηk (ω), the sample path Z(·,ω)|[t,t ]Qm(k) has not exited the n -neighborhood of Zt (ω) at time α . In other words, time-varying β n (u) ≤ βn,k d(Zt (ω),Zu (ω)) ≤ β
(10.5.35)
for each u ∈ [t,α ]Qm(k) , where the last inequality is from equality 10.5.7 in Definition 10.5.3. − Next consider each u ∈ [t,α ]Qm(κ) Q− m(k) , where Qm(k) is the metric complement of Qm(k) . Then inequality 10.5.12 of Lemma 10.5.5, where h,v,v,v ,u,r are replaced by k,t,t,α ,w,u, respectively, implies that k+1 (u) d(Zt (ω),Zu (ω)) ≤ d(Zt (ω),Zw (ω)) + β w∈[t,α ]Q(m(k))
k+1 (u) = β n (u), ≤ βn,k + β Here the second inequality is from inequality 10.5.35 and the equality is thanks to the first half of inequality 10.5.6, while u ∈ [t,α ]Qm(κ) Q− m(k) is arbitrary. We can combine the last displayed inequality with inequality 10.5.35 to obtain n (u) d(Zt (ω),Zu (ω)) ≤ β
(10.5.36)
for each u ∈ [t,α ]Qm(κ) . Thus the sample path Z(·,ω)|[t,t ]Qm(κ) has not exited n -neighborhood of Z(t,ω) at time α ≡ ηk (ω)−m(k) . In other the time-varying β words, ηk (ω) − m(k) < ηκ (ω).
424
Stochastic Process
Since both sides of this strict inequality are members of Qm(κ) , it follows that ηk (ω) − m(k) ≤ ηκ (ω) − m(κ) . Thus inequality 10.5.34 has been verified. 5. Inequality 10.5.34 immediately implies that 0 ≤ ηk (ω) − ηκ (ω) ≤ 2−m(k) , c is arbitrary. Hence the limit where κ ≥ k ≥ n are arbitrary, and where ω ∈ Dk+ τ ≡ τi ≡ lim ηκ κ→∞
c . Moreover, with κ exists uniformly on Dk+
→ ∞, inequality 10.5.34 implies that
t ≤ ηk − 2−m(k) ≤ τ ≤ ηk ≤ t
(10.5.37)
c . Since P (D ) ≤ 2−k+3 is arbitrarily small, we conclude that on Dk+ k+
ηκ ↓ τ
a.u.
Therefore τ ≡ τi is an r.r.v., with t < τ ≤ t . Recall that t = qi−1 and t = qi . This last equality can be rewritten as qi−1 < τi ≤ qi .
(10.5.38)
c be arbitrary. Consider each 6. Now let h ≥ n be arbitrary, and let ω ∈ Dh+ u ∈ [t,τ (ω))Q∞ . Then u ∈ [t,ηk (ω))Qm(k) for some k ≥ h. Hence, by the basic properties of the simple first exit time ηk , we have
n (u) < βn,∞ . d(Z(t,ω),Z(u,ω)) ≤ β
(10.5.39)
Since u ∈ [t,τ (ω))Q∞ is arbitrary, inequality 10.5.39 implies Z([qi−1,τi (ω))Q∞,ω) ⊂ {x ∈ S : d(x,Z(qi−1,ω)) < βn,∞ } ⊂ {x ∈ S : d(x,Z(qi−1,ω)) < 2−n+3 }.
(10.5.40)
7. To obtain a similar bounding relation for the set Z([τi (ω),qi ],ω), we will first prove that d(Z(w,ω),Z(ηh (ω),ω)) ≤ βh+1,∞
(10.5.41)
for each w ∈ [ηh+1 (ω),ηh (ω)]Q∞ . To that end, write, as an abbreviation, u ≡ ηh (ω) − m(h) , r ≡ ηh+1 (ω), and u ≡ ηh (ω). From inequality 10.5.34, where k,κ are replaced by h,h + 1, respectively, we obtain u < r ≤ u . The desired inequality 10.5.41 holds trivially if r = u . Hence we may assume that u < r < u . Consequently, since u,u are consecutive points in the set Qm(h) of dyadic rationals, we have r ∈ Qm(h+1) Q− m(h) , whence βn (r) = βn,h+1 according to Definition 10.5.3. Moreover, since u ≤ t according to inequality 10.5.31, we have ηh+1 (ω) ≡ r < u ≤ t .
a.u. Càdlàg Process
425
In words, the sample path Z(·,ω)|[t,t ]Qm(h+1) successfully exits the timen -neighborhood of Z(t,ω), for the first time at r. Therefore varying β n (r) = βn,h+1 . d(Z(t,ω),Z(r,ω)) > β
(10.5.42)
However, since u ∈ Qm(h) ⊂ Qm(h+1) and u < r, exit has not occurred at time u. Hence n (u) ≤ βn,h . d(Z(t,ω),Z(u,ω)) ≤ β
(10.5.43)
Inequalities 10.5.42 and 10.5.43 together yield, by the triangle inequality, d(Z(u,ω),Z(r,ω)) > βn,h+1 − βn,h = βh+1 . In other words, β(h+1) ω ∈ Au,r ≡ (d(Zu,Zr ) > βh+1 ),
(10.5.44)
where the equality is the defining equality 10.5.3 in Definition 10.5.3. Now consider an arbitrary s ∈ [r,u )Qm(h+1) . Then relation 10.5.44 implies, trivially, that β(h+1)
β(h+1) β(h+1) β(h+1) β(h+1) ω ∈ Au,r ⊂ (Au,r ∪ Au,s )(Au,r ∪ Au ,s
).
(10.5.45)
c ⊂ D c and At the same time, r,s ∈ (u,u )Qm(h+1) with r ≤ s. Since ω ∈ Dh+ h since u,u ∈ Qm(h) with u ≡ u+m(h) , we can apply the defining formula 10.5.9 of the exceptional set Dh to obtain β(h+1) c
β(h+1) β(h+1) c β(h+1) ω ∈ Dhc ⊂ (Au,r ∪ Au,s ) ∪ (Au,r ∪ Au ,s
β(h+1)
) ∪ (Au ,r
β(h+1)
∪ Au ,s )c. (10.5.46)
Relations 10.5.45 and 10.5.46 together then imply that β(h+1)
ω ∈ (Au ,r
β(h+1) c
∪ Au ,s
β(h+1) c
) ⊂ (Au ,s
β(h+1)
Consequently, by the definition of the set Au ,s
) .
, we have
d(Z(s,ω),Z(u ,ω)) ≤ βh+1,
(10.5.47)
where s ∈ [r,u )Qm(h+1) is arbitrary. Inequality 10.5.47 trivially holds for s = u . Summing up, d(Zu (ω),Zs (ω)) ≤ βh+1 . (10.5.48) s∈[r,u ]Q(m(h+1))
Then, for each w ∈ [r,u ]Q∞ ≡ [ηh+1 (ω),ηh (ω)]Q∞ , we can apply inequality 10.5.12 of Lemma 10.5.5, where h,v,v ,v,r,u are replaced by h + 1,r,u ,u ,w,s, respectively, to obtain h+2 (w) + d(Zu (ω),Zw (ω)) ≤ β d(Zu (ω),Zs (ω)) s∈[r,u ]Q(m(h+1))
h+2 (w) + βh+1 ≤ βh+2,∞ + βh+1 = βh+1,∞ . (10.5.49) ≤β
426
Stochastic Process
Here, the second inequality is by inequality 10.5.48 and the third inequality is due to inequality 10.5.6 in Definition 10.5.3. In other words, inequality 10.5.41 is verified. 8. Proceed to prove that Xτ is a well-defined r.v. To that end, let r ∈ (τ (ω), ηh (ω)]Q∞ be arbitrary. Then r ∈ [ηk+1 (ω),ηk (ω)]Q∞ for some k ≥ h. Hence d(Z(r ,ω),Z(ηh (ω),ω)) ≤ d(Z(ηk+1 (ω),ω),Z(r ,ω)) + d(Z(ηk+1 (ω),ω),Z(ηk (ω),ω)) + · · · + d(Z(ηh+1 (ω),ω),Z(ηh (ω),ω) ≤ 2βk+1,∞ + · · · + 2βh+1,∞ < 2−h+5,
(10.5.50)
where the second inequality is by repeated applications of inequality 10.5.41, c , and h ≥ n are arbitrary. where r ∈ (τ (ω),ηh (ω)]Q∞ , ω ∈ Dh+ 9. Continuing, let k ≥ h be arbitrary. Define εk ≡ 2−k+6−m(k) . Then, by Assertion 1 of Proposition 10.5.7, there exists δk ≡ δcau (εk ,m,δCp ) > 0
(10.5.51)
such that for each u ∈ [0,1], there exists an exceptional set Gu,ε (k) with P (Gu,ε (k) ) < εk such that d(X(u,ω),X(u ,ω)) < εk
(10.5.52)
for each u ∈ [u − δk ,u + δk ] ∩ domain(X(·,ω)), for each ω ∈ Gcu,ε (k) . Define the exceptional sets Gu,ε (k) (10.5.53) Bk ≡ u∈Q(m(k))
and Bk+ ≡
∞
Bj .
(10.5.54)
j =k
Then P (Bk ) ≤
u∈Q(m(k))
εk ≡
2−k+6−m(k) = (2m(k) + 1)2−k+6−m(k) ≤ 2−k+7
u∈Q(m(k))
(10.5.55) and P (Bk+ ) ≤ 2−k+8 . Separately, define the full set C≡
∞
(ηj = u).
j =n u∈Q(m(j ))
Then P (C c ∪ Bk+ ∪ Dk+ ) < 2−k+8 + 2−k+4 < 2−k+9 .
(10.5.56)
a.u. Càdlàg Process
427
c D c . Then 10. Next, with k ≥ h ≥ n arbitrary, consider each ω ∈ CBh+ h+ c D c ⊂ C. Hence there exists u ∈ Q ω ∈ CBk+ m(k) such that ηk (ω) = u. Let k+ r ∈ (τ (ω),ηk (ω) + δk )Q∞ be arbitrary. Then either (i ) r ∈ (τ (ω),ηk (ω)]Q∞ or (ii ) r ∈ [ηk (ω),ηk (ω) + δk )Q∞ . In case (i ), we have
d(Z(r ,ω),Z(ηk (ω),ω)) < 2−k+5,
(10.5.57)
according to inequality 10.5.50, where h is replaced by k. Consider case (ii ). Then r ∈ [u,u + δk )Q∞ . At the same time, ω ∈ Bkc ⊂ Gcu,ε (k) . Hence d(Z(ηk (ω),ω),Z(r ,ω)) = d(Z(u,ω),Z(r ,ω)) = d(X(u,ω),X(r ,ω)) < εk ≡ 2−k+6−m(k) ≤ 2−k+5, where the first inequality is according to inequality 10.5.52. Summing up, in each of cases (i ) and (ii ), inequality 10.5.57 holds, where r ∈ (τ (ω),ηk (ω) + δk )Q∞ is arbitrary. Therefore the triangle inequality yields d(Z(r ,ω),Z(r ,ω)) < 2−k+6
(10.5.58)
for each r ,r ∈ (τ (ω),ηk (ω) + δk )Q∞ . 11. Now let s ,s ∈ [τ (ω),ηk (ω) + δk )Q∞ be arbitrary. Let (rj )j =1,2,... and (rj )j =1,2,... be two sequences in (τ (ω),ηk (ω) + δk )Q∞ such that rj ↓ s and rj ↓ s ’. Then d(X(rj ,ω),X(rj ,ω)) = d(Z(rj ,ω),Z(rj ,ω)) < 2−k+6
(10.5.59)
for each j ≥ 1. At the same time, the process X is right continuous, according to Proposition 10.5.7. It follows that d(X(s ,ω),X(s ,ω)) ≤ 2−k+6,
(10.5.60)
where s ,s ∈ [τ (ω),ηk (ω) + δk )Q∞ are arbitrary. Consequently, d(Z(s ,ω),Z(s ,ω)) = d(X(s ,ω),X(s ,ω)) ≤ 2−k+6,
(10.5.61)
c D c and where s ,s ∈ [τ (ω),τ (ω) + δk )Q∞ are arbitrary, where ω ∈ Ch Bh+ h+ k ≥ h are arbitrary. Thus lims →τ (ω);s ≥τ (ω) Z(s ,ω) exists. In other words, τ (ω) ∈ domain(X(·,ω)), with X(τ (ω),ω) ≡ lims →τ (ω);s ≥τ (ω) Z(s ,ω). Equivalently, ω ∈ domain(Xτ ), with Xτ (ω) ≡ lims →τ (ω);s ≥τ (ω) Zs (ω). Moreover, setting s ≡ ηk (ω) and letting s ↓ τ (ω) in inequality 10.5.60, we obtain
d(Zη(k) (ω),Xτ (ω)) ≤ 2−k+6 .
(10.5.62)
Similarly, letting s ↓ τ (ω) in relation 10.5.60, we obtain d(Xs (ω),Xτ (ω)) ≤ 2−k+6, c D c is arbitrary. for each s ∈ [τ (ω),τ (ω) + δk )Q∞ , where ω ∈ CBh+ h+
(10.5.63)
428
Stochastic Process
12. Since P (C c ∪Bh+ ∪Dh+ ) < 2−h+9 for each h ≥ n, it follows that Zη(k) → Xτ a.u. Hence Xτ is a well-defined r.v., where τ ≡ τi , and where i = 1, . . . ,p ≡ 2m(n) is arbitrary. 13. Continue with ε > 0 arbitrary, and n,i,p,,t,t fixed accordingly. Consider c D c be arbitrary. By the the special case where h = k = n. Let ω ∈ CBn+ n+ defining equality 10.5.30 of ηn ≡ ηn,i , we see that the r.r.v. ηn has values in (t,t ]Qm(n) = {t }, where the last equality is because t ≡ qi−1 and t ≡ qi are consecutive members of Qm(n) . Therefore [τ (ω),qi ]Q∞ = [τ (ω),ηn (ω)]Q∞ ⊂ [τ (ω),τ (ω) + δn )Q∞ ∪ (τ (ω),ηn (ω)]Q∞ . According to inequality 10.5.62, we have d(Zη(n) (ω),Xτ (ω)) ≤ 2−n+6, while, according to 10.5.63, we have d(Xs (ω),Xτ (ω)) ≤ 2−n+6 for each s ∈ [τ (ω),τ (ω) + δn )Q∞ . Combining, we obtain d(Xs (ω),Zη(n) (ω)) ≤ 2−n+7
(10.5.64)
for each s ∈ [τ (ω),τ (ω) + δn )Q∞ . Hence Z([τi (ω),qi ]Q∞,ω) ⊂ Z([τ (ω),τ (ω) + δn )Q∞,ω) ∪ Z((τ (ω),ηn (ω)]Q∞,ω) ⊂ {x ∈ S : d(x,Zη(n) (ω)) ≤ 2−n+8 } = {x ∈ S : d(x,Z(qi ,ω)) ≤ 2−n+8 },
(10.5.65)
c D c is arbitrary, and where the second containment relation is where ω ∈ CBn+ n+ thanks to inequalities 10.5.64 and 10.5.50. 14. For convenience, define the constant r.r.v.’s τ0 ≡ 0 and τp+1 ≡ 1. Then relations 10.5.65 and 10.5.40 can be combined to yield
Z([τi−1 (ω),τi (ω))Q∞,ω) = Z([τi−1 (ω),qi−1 ]Q∞,ω) ∪ Z([qi−1,τi (ω))Q∞,ω) ⊂ {x ∈ S : d(x,Z(qi−1,ω)) ≤ 2−n+8 }. In addition, relation 10.5.65 gives Z([τp (ω),τp+1 (ω)]Q∞,ω) = Z([τp (ω),qp ]Q∞,ω) ⊂ {x ∈ S : d(x,Z(qp,ω)) < 2−n+8 }. Define the random interval θi−1 ≡ [τi−1,τi ) and θp ≡ [τp,τp+1 ]. Then the last two displayed relations can be restated as Z(θj (ω)Q∞,ω) ⊂ {x ∈ S : d(x,Z(qj ,ω)) ≤ 2−n+8 } c D c is arbitrary. for each j = 0, . . . ,p, where ω ∈ CBn+ n+
(10.5.66)
a.u. Càdlàg Process
429
15. We will next estimate a lower bound for lengths of the random intervals θ0, . . . ,θp−1,θp . With ε,n,N fixed as in the hypothesis, recall from Step 2 that εn ≡ 2−m(n)−n−1 ≤ 2−n−1 < βn and N ≥ ν = mn +n+6. Then 2−ν+5 = εn and ≡ m(N ) ≡ 2−m(N ) < 2−2 δCp (2−2ν+2 ).
(10.5.67)
Now write q ≡ 1 − , and note that δcau (εn ) = < 2−2 δCp (2−2ν+2 ). Thus the conditions in Assertion 1 of Proposition 10.5.7 are satisfied with ε,n,t,t , replaced by εn,ν,r,r respectively. Accordingly, there exists, for each r ∈ [0,1], c an exceptional set Gr,ε(n) with P (Gr,ε(n) ) < εn such that for each ω ∈ Gr,ε(n) , we have d(Z(r,ω),Z(r ,ω)) < εn < βn for each r ∈ [r − ,r + ]Q∞ . Define the exceptional sets Cn ≡ Gr,ε(n),
(10.5.68)
(10.5.69)
r∈Q(m(n))
C n+ ≡
∞
C k,
(10.5.70)
k=n
and c
c c An ≡ CH c Bn+ Dn+ C n+,
where H is the null set introduced at the beginning of this proof. Then
εn < (2m(n) + 1)εn ≡ (2m(n) + 1)2−m(n)−n−1 ≤ 2−n, P (C n ) ≤ r∈Q(m(n))
P (C n+ ) ≤ 2−n+1, and P (An ) ≤ 0 + 0 + 2−n+8 + 2−n+3 + 2−n+1 < 2−n+9 < ε. c
c
c Dc C 16. Now let ω ∈ An ≡ CH c Bn+ n+ n+ be arbitrary, and let s ∈ [t,t +)Q∞ c c be arbitrary. Let k ≥ n be arbitrary. Then ω ∈ C n ⊂ Gt,ε(n) because t ∈ Qm(n) . Therefore inequality 10.5.68, with t in the place of r, holds:
d(Z(t,ω),Z(r ,ω)) < βn
(10.5.71)
for each r ∈ [t,s]Qm(k) ⊂ (t,t + )Q∞ . Consider each r ∈ [t,s]Qm(k) . Then m(n) . Hence we have r ∈ Q− m(n) because t ∈ Qm(n) and r − t ≤ s − t < ≤ 2 n (r ) ≥ equality 10.5.6 in Definition 10.5.3 applies with h(r ) ≥ n+1 and yields β βn + βn+1 . Therefore inequality 10.5.71 implies n (r ) d(Z(t,ω),Z(r ,ω)) < β
(10.5.72)
430
Stochastic Process
for each r ∈ [t,s]Qm(k) . Inequality 10.5.72 says that for each k ≥ n, the sample n -neighborhood of Z(t,ω) path Z(·,ω)|[t,t ]Qm(k) stays within the time-varying β up to and including time s. Hence, according to Assertion 4 of Proposition 8.1.13, the simple first exit time ηk (ω) ≡ ηt, β(n),[t,t ]Q(m(k)) (ω) can come only after time s. In other words, s < ηk (ω). Letting k → ∞, we therefore obtain s ≤ τ (ω). Since s ∈ [t,t + )Q∞ is arbitrary, it follows that t + ≤ τ (ω). Therefore, |θi−1 (ω)| = τi (ω) − τi−1 (ω) ≥ τi (ω) − qi−1 ≡ τ (ω) − t ≥ ,
(10.5.73)
where i = 1, . . . ,p is arbitrary. 17. The interval θp (ω) ≡ [τp (ω),1] remains, with a length possibly less than . To deal with this nuisance, we will replace the r.r.v. τp with the r.r.v. τ p ≡ τp ∧ (1 − ), while keeping τ i ≡ τi if i ≤ p − 1. For convenience, define the constant r.r.v.’s τ 0 ≡ 0 and τ p+1 ≡ 1. Define the random interval θ i−1 ≡ [τ i−1,τ i ) and θ p ≡ [τ p,τ p+1 ]. First note that τ j +1 (ω) − τ j (ω) ≡ τj +1 (ω) − τj (ω) ≥
(10.5.74)
for each j = 0, . . . ,p − 2. Moreover, for j = p − 1, we have τ p (ω) − τ p−1 (ω) ≡ τ p (ω) − τp−1 (ω) ≥ τ p (ω) − qp−1 ≡ τp (ω) ∧ (1 − ) − qp−1 = (τp (ω) − qp−1 ) ∧ (1 − − qp−1 ) = (τp (ω) − qp−1 ) ∧ ( − ) ≥ ∧ = ,
(10.5.75)
where the last inequality follows from the second half of inequality 10.5.73 and from the inequality − ≡ 2−m(n) − 2−m(N ) ≥ 2−m(N )+1 − 2−m(N ) = 2−m(N ) ≡ . Furthermore, for j = p, we have τ p+1 (ω) − τ p (ω) ≡ 1 − τp (ω) ∧ (1 − ) ≥ .
(10.5.76)
Combining inequalities 10.5.74, 10.5.75, and 10.5.76, we see that p >
(τ j +1 (ω) − τ j (ω)) ≥ ≡ δaucl (ε,m,δCp ),
(10.5.77)
j =0
where ω ∈ An is arbitrary. 18. We will now verify that relation 10.5.66 still holds when θj is replaced by θ j , for each j = 0, . . . ,p. First note that for j = 0, . . . ,p − 1, we have θ j ≡ [τ j ,τ j +1 ) ⊂ [τj ,τj +1 ) ≡ θj , whence, according to relation 10.5.66, we have Z(θ j (ω)Q∞,ω) = Z(θj (ω)Q∞,ω) ⊂ {x ∈ S : d(x,Z(qj ,ω)) ≤ 2−n+8 }. (10.5.78)
a.u. Càdlàg Process
431
19. We still need to verify a similar range-bound relation for j = p. To that end, let r ∈ θ p (ω)Q∞ ≡ [τp (ω)∧(1−),1]Q∞ be arbitrary. Either (i ) r < 1− or (ii ) r ≥ 1 − . Consider case (i ). Then the assumption r < τp (ω) would imply r < τp (ω) ∧ (1 − ), a contradiction. Therefore r ∈ [τp (ω),1]Q∞ , whence d(Z(r ,ω),Z(1,ω)) ≤ 2−n+8 by relation 10.5.66 with j = p. In case (ii ), because 1 ∈ Qm(n),r ∈ [1−,1]Q∞ , c and ω ∈ G1,ε(n) , inequality 10.5.68 applies to r ≡ 1, yielding d(Z(1,ω),Z(r ,ω)) < εn < βn < 2−n+8 . Thus, in both cases (i ) and (ii ), we have d(Z(r ,ω),Z(1,ω)) ≤ 2−n+8, where r ∈ θ p (ω)Q∞ is arbitrary. We conclude that Z(θ p (ω)Q∞,ω) ⊂ {x ∈ S : d(x,Z(qp,ω)) ≤ 2−n+8 }.
(10.5.79)
20. Inequalities 10.5.78 and 10.5.79 can be summarized as Z(θ j (ω)Q∞,ω) ⊂ {x ∈ S : d(x,Z(qj ,ω)) ≤ 2−n+8 }
(10.5.80)
for each j = 1, . . . ,p, where ω ∈ An is arbitrary. 21. Separately, as observed at the beginning of this proof, the process X is continuous a.u., with modulus of continuity a.u. δcau . Hence, for each
h ≥ 0, there exists a measurable set Aq,h ⊂ domain(Xq ) with P Acq,h < 2−h such that for each ω ∈ Aq,h and for each u ∈ (q−δcau (2−h ),q+δcau (2−h )) ∩ domain(X(·,ω )), we have d(X(u,ω ),X(q,ω )) ≤ 2−h .
(10.5.81)
Take regular points bh,ch of the r.r.v. τp such that q − δcau (2−h ) < bh < q < ch < q + δcau (2−h )) ∩ domain(X(·,ω)). 22. Now let h ≥ n be arbitrary. Define the measurable sets A h ≡ (τp ≤ bh )Aq,h , A h ≡ (bh < τp ≤ ch )Aq,h , and A h ≡ (ch < τp )Aq,h . Define the r.r.v. Yh by c domain(Yh ) ≡ Ah ∪ Ah ∪ Ah ∪ Aq,h and by Yh ≡ Xτ (p),Xq ,Xq ,Xq
c on A h,A h,A h ,Aq,h,
respectively. Then P (d(Xτ (p),Yh ) > 2−h ) ≤ P (d(Xτ (p),Yh ) > 2−h )A h + P (d(Xτ (p),Yh ) > 2−h )A h
c + P (d(Xτ (p),Yh ) > 2−h )A + P A h q,h .
432
Stochastic Process
Consider the for summands on the right-hand side of this inequality. For the first summand, we have (d(Xτ (p),Yh ) > 2−h )A h ⊂ (d(Xτ (p),Xτ (p) ) > 2−h )(τp ≤ bh < q) ⊂ (d(Xτ (p),Xτ (p) ) > 2−h )(τp < q) = φ. For the second summand, note that (bh < τ p ≤ ch )Aq,h = (bh < τ p = τp ∧ q ≤ ch )Aq,h ≡ (q − δcau (2−h ) < bh < τ p ≤ ch < q + δcau (2−h ))Aq,h ⊂ (d(Xτ (p),Xq ) ≤ 2−h ), where the last inclusion relation is thanks to inequality 10.5.81, whence (d(Xτ (p),Yh ) > 2−h )A h ≡ (d(Xτ (p),Xq ) > 2−h )(bh < τ p ≤ ch )Aq,h ⊂ (d(Xτ (p),Xq ) > 2−h )(d(Xτ (p),Xq ) ≤ 2−h ) = φ. For the third summand, we have −h (d(Xτ (p),Yh ) > 2−h )A h = (d(Xτ (p),Xq ) > 2 )(q < ch < τp )Aq,h
⊂ (d(Xτ (p),Xq ) > 2−h )(τ p = q) = φ. Combining, we see that P (d(Xτ (p),Yh ) > 2−h ) = 0 + 0 + 0 + P (Acq,h ) ≤ 2−h . Thus Yh → Xτ (p) a.u. Consequently, Xτ (p) is an r.v. 23. For each j = 1, . . . ,p − 1, we have τ j = τj , whence Xτ (j ) = Xτ (j ) , and so Xτ (j ) is an r.v. In view of the concluding remark of Step 22, we see that Xτ (j ) is an r.v. for each j = 1, . . . ,p. c D c C c arbitrary, let j = 1, . . . ,p 24. Continuing, with ω ∈ An ≡ CH c Bn+ n+ n+ be arbitrary. Inequality 10.5.80 and right continuity of X(·,ω) imply that d(X(r,ω),X(qj ,ω)) ≤ 2−n+8
(10.5.82)
for each r ∈ θ j (ω)domain(X(·,ω)). In particular, d(X(τj (ω),ω),X(qj ,ω)) ≤ 2−n+8 .
(10.5.83)
Hence the triangle inequality yields d(X(τj (ω),ω),X(r,ω)) ≤ 2−n+9 < ε,
(10.5.84)
for each r ∈ θ j (ω)domain(X(·,ω)), where ω ∈ An is arbitrary. 25. Summing up, the process X is continuous in probability on [0,1], with a ≡ H c ∩ u∈Q(∞) domain modulus of continuity in probability δCp . The set B (Xu ) is a full set, where H is the null set defined at the beginning of this Assertions 6 and 7 of Proposition 10.5.7 say that for n ≡ An B. proof. Define A
a.u. Càdlàg Process
433
⊂ H c , the function X(·,ω) is right continuous at each t ∈ domain each ω ∈ B (X(·,ω)), and that for each t ∈ [0,1] such that limr→t;r≥t X(r,ω) exists, we have t ∈ domain(X(·,ω)). In other words, the right-continuity condition and the rightcompleteness condition in Definition 10.3.2 have been proved for the process X. Moreover, with ε > 0, we have constructed (i ) δaucl (ε) ∈ (0,1), (ii ) a measur with P (A cn ) = P (Acn ) < ε, (iii ) an integer p + 1 ≥ 1, and (iv ) n ⊂ B able set A a sequence of r.r.v.’s 0 ≡ τ 0 < τ 1 < · · · < τ p < τ p+1 ≡ 1 such that for each i = 0, . . . ,p, the function Xτ (i) is an r.v., and such that (v ) for n ⊂ An , we have, in view of inequalities 10.5.77 and 10.5.84, each ω ∈ A p >
(τ i+1 (ω) − τ i (ω)) ≥ δaucl (ε)
(10.5.85)
d(X(τ i (ω),ω),X(·,ω)) ≤ ε
(10.5.86)
i=0
with
on the interval θ i (ω) ≡ [τ i (ω),τ i+1 (ω)) or θ i (ω) ≡ [τ i (ω),τ i+1 (ω)] depending on whether 0 ≤ i ≤ p − 1 or i = p. Thus all the conditions in Definition 10.3.2 have been verified for the process X. Accordingly, X is an a.u. càdlàg process that specifically satisfies Conditions (i) and (ii) in this theorem. 26. Note, for later reference, that inequality 10.5.38 implies q0 ≡ 0 < τ1 ≤ q1 < τ2 ≤ q2 ≤ · · · < τp−1 ≤ qp−1 < τp ≤ qp = 1. (10.5.87) By definition, τ i ≡ τi if i ≤ p − 1, while τ p ≡ τp ∧ (1 − ) > qp−1 because τp > qp−1 and (1 − ) > (1 − 2−m(n) ) = qp−1 . Therefore 10.5.87 yields τ 0 ≡ q0 ≡ 0 < τ 1 ≤ q1 < τ 2 ≤ q2 ≤ · · · < τ p−1 ≤ qp−1 < τ p ≤ qp = 1 ≡ τ p+1 . It follows that qj ≡ j 2−m(n) ∈ θ j , for each j = 0, . . . ,p + 1.
10.6 Continuity of the Right-Limit Extension We will prove that the right-limit extension of D-regular processes, defined in Section 10.5, is a continuous function. Let (S,d) be a locally compact metric space. Refer to Definition 9.0.2 for notations related to the enumerated set Q∞ ≡ {t0,t1, . . .} of dyadic rationals in the interval [0,1] and its subset Qm ≡ {qm,0,qm,1, . . . ,qm,p(m) } = {t0, . . . ,tp(m) } for each m ≥ 0. ∞ × ,S) denotes the space of stochastic Recall from Definition 6.4.2 that R(Q processes with parameter set Q∞ , sample space (,L,E), and state space (S,d), and that it is equipped with the metric ρ P rob,Q(∞) defined by
434
Stochastic Process ρ P rob,Q(∞) (Z,Z ) ≡
∞
2−j −1 E1 ∧ d(Zt (j ),Zt (j ) )
(10.6.1)
j =0
∞ × ,S). for each Z,Z ∈ R(Q ) of Recall from Definitions 10.3.2 and 10.3.6 the metric space (D[0,1],ρ D[0,1] a.u. càdlàg processes on the interval [0,1], with ! ... (X,X ) ≡ E(dω)dD (X(·,ω),X (·,ω)) ρD[0,1] for each X,X ∈ D[0,1], where dD ≡ 1 ∧ dD , where dD is the Skorokhod metric on the space D[0,1] of càdlàg functions. Recall that [·]1 is an operation that assigns to each a ∈ R an integer [a]1 ∈ (a,a + 2). Theorem 10.6.1. Continuity of the construction of a.u. càdlàg process by Dreg,m,δ(Cp) (Q∞ × ,S) right-limit extension of D-regular process. Let R ρP rob,Q(∞) ) whose members share some denote the subspace of (R(Q∞ × ,S), common modulus of continuity in probability δCp and some common modulus of D-regularity m ≡ (mn )n=0.1.··· . Let Dreg,m,δ(Cp) (Q∞ × ,S), δ(aucl),δ(Cp) [0,1],ρD[0,1] rLim : (R ρP rob,Q(∞) ) → (D ) (10.6.2) be the right-limit extension as constructed in Theorem 10.5.8, where δ(aucl),δ(Cp) [0,1],ρ (D D[0,1] ) is defined as the metric subspace of a.u. cadl ` ag ` processes that share the common modulus of continuity in probability δCp , and that share the common modulus of a.u. càdlàg δaucl ≡ δaucl (·,m,δCp ) as defined in Theorem 10.5.8. Then the function rLim is uniformly continuous, with a modulus of continuity δrLim (·,m,δCp ) that depends only on m and on δCp . Proof. 1. Let ε0 > 0 be arbitrary. Write ε ≡ 2−4 ε0 . According to Theorem 10.5.8, the real number δaucl (ε,m,δCp ) > 0 is defined as follows. Take any n ≥ 0 so large that 2−n+9 < ε. Take N ≥ mn + n + 6 so large that ≡ m(N ) ≡ 2−m(N ) < 2−2 δCp (2−2m(n)−2n−10 ).
(10.6.3)
Then, we have δaucl (ε,m,δCp ) ≡ m(N ) ≡ 2−m(N ) . 2. Now take k > N so large that 2−m(k)+2 < (1 − e−ε ). ≡ 2−m(k) . Define Write p ≡ pm(n) ≡ 2m(n) , p ≡ pm(k) ≡ 2m(k) , and δrLim (ε0,m,δCp ) ≡ δ ≡ 2−1 p −1 ε2 .
(10.6.4)
a.u. Càdlàg Process
435
We will prove that the operation δrLim (·,m,δCp ) is a modulus of continuity for the function rLim in expression 10.6.2. Dreg,m,δ(Cp) (Q∞ × ,S) be arbitrary such that 3. To that end, let Z,Z ∈ R ρ P rob,Q(∞) (Z,Z ) ≡ E
∞
t (j ),Z ) < δ ≡ δrLim (ε0,m,δCp ). 2−j −1 d(Z t (j )
j =0
(10.6.5) Let X ≡ rLim (Z) and X ≡ rLim (Z ). By Theorem 10.5.8, X,X are a.u. càdlàg processes. We need only verify that (X,X ) < ε0 . ρD[0,1]
(10.6.6)
4. Then, in view of the bound 10.6.5, Chebychev’s inequality implies that there exists a measurable set G with P (G) < ε such that for each ω ∈ Gc , we have ∞
t (j ) (ω),Z (ω)) < δε−1, 2−j −1 d(Z t (j )
j =0
whence
d(Z(r,ω),Z (r,ω))
r∈Q(m(k))
=
p(m(k))
d(Zt (j ) (ω),Zt (j ) (ω))
j =0
≤
p(m(k))
d(Zt (j ) (ω),Zt (j ) (ω)) ≤ 2m(k))+1
j =0
p(m(k))
t (j ) (ω),Z (ω)) 2−j −1 d(Z t (j )
j =0
≤ 2m(k))+1
∞
t (j ) (ω),Z (ω)) ≤ 2 2−j −1 d(Z pδε−1 t (j )
j =0
≡ 2 p2−1 p −1 ε2 ε−1 = ε.
(10.6.7)
5. According to Steps 25 and 26 in the proof of Theorem 10.5.8, we have cn ) = P (Acn ) < ε and a sequence of n with P (A constructed a measurable set A r.r.v.’s τ 0 ≡ q0 ≡ 0 < τ 1 ≤ q1 < τ 2 ≤ q2 ≤ · · · < τ p−1 ≤ qp−1 < τ p ≤ qp = 1 ≡ τ p+1,
(10.6.8)
where qi ≡ i2−m(n) ∈ θ i and Xτ (i) is an r.v., for each i = 0, . . . ,p. Moreover, for n , we have each ω ∈ A p > i=0
(τ i+1 (ω) − τ i (ω)) ≥ δaucl (ε) ≡ 2−m(N ) > 2−m(k)
(10.6.9)
436
Stochastic Process
with d(X(τ i (ω),ω),X(·,ω)) ≤ ε
(10.6.10)
on the interval θ i (ω) ≡ [τ i (ω),τ i+1 (ω)) or θ i (ω) ≡ [τ i (ω),τ i+1 (ω)] depending on whether 0 ≤ i ≤ p − 1 or i = p. 6. In particular, d(X(τ i (ω),ω),Z(qi ,ω)) ≤ ε
(10.6.11)
for each i = 0, . . . ,p. Combined with inequality 10.6.10, this yields X(θ i (ω),ω) ⊂ (d(·,Z(qi ,ω)) ≤ 2ε)
(10.6.12)
n is arbitrary. for each i = 0, . . . ,p, where ω ∈ A 7. Similarly, we can construct, relative to the processes Z and X, a measurable n )c < ε and a sequence of r.r.v.’s n with P (A set A τ 0 ≡ q0 ≡ 0 < τ 1 ≤ q1 < τ 2 ≤ q2 ≤ · · · < τ p−1 ≤ qp−1 < τ p ≤ qp = 1 ≡ τ p+1,
(10.6.13)
where qi ≡ i2−m(n) ∈ θ i and Xτ (i) is an r.v., for each i = 0, . . . ,p. Moreover, for n , we have each ω ∈ A p >
(τ i+1 (ω) − τ i (ω)) > 2−m(k)
(10.6.14)
i=0
with d(X (τ i (ω),ω),X (·,ω)) ≤ ε
(10.6.15)
with θ i (ω) ≡ [τ i (ω),τ i+1 (ω)) or θ i (ω) ≡ [τ i (ω),τ i+1 (ω)] depending on whether 0 ≤ i ≤ p − 1 or i = p. 8. In particular, (X (τ i (ω),ω),X (·,ω)) ≤ ε
(10.6.16)
for each i = 0, . . . ,p. Combined with inequality 10.6.15, this yields X(θ i (ω),ω) ⊂ (d(·,Z(qi ,ω)) ≤ 2ε)
(10.6.17)
n is arbitrary. for each i = 0, . . . ,p, where ω ∈ A 9. Because X,X are a.u. càdlàg processes, Proposition 10.3.5 says that there exists a full set G such that for each ω ∈ G, the functions X(·,ω),X (·,ω) are a.u. càdlàg functions on [0,1]. n A n . Let i = 0, . . . ,p be arbitrary. Then 9. Now consider each ω ∈ GGc A qi ∈ Qm(n) ⊂ Qm(k) . Hence inequality 10.6.7 implies that d(Z(qi ,ω),Z (qi ,ω)) ≤ d(Z(r,ω),Z (r,ω)) ≤ ε. (10.6.18) r∈Q(m(k))
a.u. Càdlàg Process
437
To simplify notations, we will now suppress both the reference to ω and the overline, writing τi ,τi ,x,x ,z,z ,θi ,θi for τ i (ω),τ i (ω),X(·,ω),X (·,ω),Z(·,ω), Z (·,ω),θ i (ω),θ i (ω), respectively. Then inequality 10.6.18 can be rewritten as d(z(qi ),z (qi )) ≤ ε
(10.6.19)
and inequality 10.6.12 can be rewritten as x(θi ) ⊂ (d(·,z(qi )) ≤ 2ε).
(10.6.20)
x (θi ) ⊂ (d(·,z (qi )) ≤ 2ε).
(10.6.21)
Similarly,
10. Next, partition the set {0, . . . ,p + 1} into the union of two disjoint subsets for each i ∈ A, A and B such that (i) {0, . . . ,p + 1} = A ∪ B, (ii) |τi − τi | < 2 for each i ∈ B. Consider each i ∈ B. We will verify that and (iii) |τi − τi | > 1 ≤ i ≤ p and that d(z(qi ),z(qi−1 )) ∨ d(z (qi ),z (q i−1 )) ≤ 6ε.
(10.6.22)
In view of Condition (iii), we may assume, without loss of generality, that τi −τi > ≡ 2−m(k) . Then there exists u ∈ [τi ,τ )Qm(k) . Since [τ0,τ ) = [0,0) = φ and i 0 ) = [1,1) = φ, it follows that 1 ≤ i ≤ p. Consequently, inequalities [τp+1,τp+1 10.6.8 and 10.6.13 together imply that ,τ i+1 ). u ∈ [τi ,τi ) ⊂ [qi−1,qi ) ⊂ [τi−1 ,τ ) = θ θ , and so u ∈ θ θ Q Hence u ∈ [τi ,τ i+1 ) ∩ [τi−1 i i−1 i i−1 m(k) . Therefore, i using inequalities 10.6.7, 10.6.19, 10.6.20, and 10.6.8, we obtain, respectively,
d(z(u),z (u)) ≤ ε, d(z(qi−1 ),z (qi−1 )) ≤ ε, z(u) ∈ x(θi ) ⊂ (d(·,z(qi )) ≤ 2ε), and ) ⊂ (d(·,z (qi−1 )) ≤ 2ε). z (u) ∈ x (θi−1
The triangle inequality then yields d(z(qi ),z(qi−1 )) ≤ d(z(qi ),z(u)) + d(z(u),z (u)) + d(z (u),z (qi−1 )) + d(z (qi−1 ),z(qi−1 )) ≤ 2ε + ε + 2ε + ε = 6ε. Similarly, d(z (qi ),z (q i−1 )) ≤ 6ε. Thus inequality 10.6.22 is verified for each i ∈ B.
(10.6.23)
438
Stochastic Process
11. Now define an increasing function λ : [0,1] → [0,1] by λτi ≡ τi or λτi ≡ τ i depending on whether i ∈ A or i ∈ B, for each i = 0, . . . ,p + 1, and by linearity on [τi ,τi+1 ] for each i = 0, . . . ,p. Here we write λt ≡ λ(t) for each t ∈ [0,1] for brevity. Then, in view of the definition of the index sets A and B, we for each i = 0, . . . ,p + 1. Now consider each i = 0, . . . ,p, have |τi − λτi | < 2 and write λτi+1 − τi+1 + τi − λτi ui ≡ . τi+1 − τi Then, since τi+1 − τi ≥ 2−m(N ) ≡ according to inequality 10.6.9, we have |ui | ≤ |λτi+1 − τi+1 |
−1
+ |λτi − τi |
−1
−1 + 2 −1 = 2−m(k)+2 −1 < (1 − e−ε ), ≤ 2 where the last inequality is from inequality 10.6.4. Note that the function log(1+u) of u ∈ [−1 + e−ε,1 − e−ε ] vanishes at u = 0; it has a positive first derivative and a negative second derivative on the interval. Hence the maximum of its absolute value is attained at the right endpoint of the interval and | log(1 + ui )| ≤ | log(1 − 1 + e−ε )| = ε.
(10.6.24)
Lemma 10.2.3 therefore implies the bound p λt − λs λτi+1 − λτi = sup log log τ t −s i+1 − τi 0≤s 0, recall, from Part 2 of Definition 8.1.12, the simple first exit time
ηt,α,[t,1]Q(h) ≡ r1(d(Z(t),Z(r))>α) 1(d(Z(t),Z(s))≤α) r∈[t,1]Q(h)
+
s∈[t,r)Q(h)
1(d(Z(t),Z(s))≤α)
(10.7.1)
s∈[t,1]Q(h)
for the process Z|[t,1]Qh to exit the closed α-neighborhood of Zt . As usual, an empty product is equal to 1, by convention. Similarly, for each γ > 0, define the r.r.v.
r1(d(x(◦),Z(r))>γ ) 1d(x(◦),Z(s))≤γ ζh,γ ≡ r∈Q(h)
+
s∈[0,r)Q(h)
1d(x(◦),Z(s))≤γ ,
(10.7.2)
s∈Q(h)
where x◦ is an arbitrary but fixed reference point in the state space (S,d). It can easily be verified that ζh,γ is a simple stopping time relative to the filtration L(h) .
a.u. Càdlàg Process
445
Intuitively, ζh,γ is the first time r ∈ Qh when the process Z|Qh is outside the bounded set (d(x◦,·) ≤ γ ), with ζh,γ set to 1 if no such s ∈ Qh exists. Refer to Propositions 8.1.11 and 8.1.13 for basic properties of simple stopping times and simple first exit times. Definition 10.7.2. a.u. Boundedness on Q∞ . Let (S,d) be a locally compact metric space. Suppose the process Z : Q∞ × (,L,E) → (S,d) is such that for each ε > 0, there exists βauB (ε) > 0 so large that ⎞ ⎛ d(x◦,Zr ) > γ ⎠ < ε (10.7.3) P⎝ r∈Q(h)
for each h ≥ 0, for each γ > βauB (ε). Then we will say that the process Z and the family F of its marginal distributions are a.u. bounded, with the operation βauB as a modulus of a.u. boundedness, relative to the reference point x◦ ∈ S. Note that this condition is trivially satisfied if d ≤ 1, in which case we can take βauB (ε) ≡ 1 for each ε > 1. Definition 10.7.3. Strong right continuity in probability on Q∞ . Let (S,d) be a locally compact metric space. Let Z : Q∞ × (,L,E) → (S,d) be an arbitrary process. Suppose that for each ε,γ > 0, there exists δSRCp (ε,γ ) > 0 such that for arbitrary h ≥ 0 and s,r ∈ Qh with s ≤ r < s + δSRCp (ε,γ ), we have PA (d(Zs ,Zr ) > α) ≤ ε,
(10.7.4)
A ∈ L(s,h) ≡ L(Zr : r ∈ [0,s]Qh ) for each α > ε and for each A ∈ L(s,h) with A ⊂ (d(x◦,Zs ) ≤ γ ) and P (A) > 0. Then we will say that the process Z is strongly right continuous in probability, with the operation δSRCp as a modulus of strong right continuity in probability. An arbitrary consistent family F of f.j.d.’s, with state space (S,d) and with parameter set Q∞ or [0,1], is said to be strongly right continuous in probability, with the operation δSRCp as a modulus of strong right continuity in probability, if (i) it is continuous in probability and (ii) F |Q∞ is the family of marginal distributions of some process Z : Q∞ × (,L,E) → (S,d) that is strongly right continuous in probability, with δSRCp as a modulus of strong right continuity in probability. Note that the operation δSRCp has two variables, but is independent of the sampling frequency h. This definition will next be restated without the assumption of P (A) > 0 or the reference to the probability PA . Lemma 10.7.4. Equivalent definition of strong right continuity in probability. Let (S,d) be a locally compact metric space. Then a process Z : Q∞ × (,L,E) → (S,d) is strongly right continuous in probability, with a modulus of strong right continuity in probability δSRCp , iff for each ε,γ > 0, there exists
446
Stochastic Process
δSRCp (ε,γ ) > 0 such that for arbitrary h ≥ 0 and s,r ∈ Qh with s ≤ r < s + δSRCp (ε,γ ), we have P (d(Zs ,Zr ) > α;A) ≤ εP (A),
(10.7.5)
for each α > ε and for each A ∈ L(s,h) with A ⊂ (d(x◦,Zs ) ≤ γ ). Proof. Suppose Z is strongly right continuous in probability, with a modulus of strong right continuity in probability δSRCp . Let ε,γ > 0, h ≥ 0, and s,r ∈ Qh be arbitrary with s ≤ r < s + δSRCp (ε,γ ). Consider each α > 0 and A ∈ L(s,h) with A ⊂ (d(x◦,Zs ) ≤ γ ). Suppose, for the sake of contradiction, that P (d(Zs ,Zr ) > α;A) > εP (A). Then P (A) ≥ P (d(Zs ,Zr ) > α;A) > 0. We can divide both sides of the last displayed inequality by P (A) to obtain PA (d(Zs ,Zr ) > α) ≡ P (A)−1 P (d(Zs ,Zr ) > α;A) > ε, which contradicts inequality 10.7.4 in Definition 10.7.3. Hence inequality 10.7.5 holds. Thus the “only if” part of the lemma is proved. The “if” part is equally straightforward, so its proof is omitted. Three more lemmas are presented next to prepare for the main theorem of this section. The first two are elementary. Lemma 10.7.5. Minimum of a real number and a sum of two real numbers. For each a,b,c ∈ R, we have a ∧ (b + c) = b + c ∧ (a − b) or, equivalently, a ∧ (b + c) − b = c ∧ (a − b). Proof. Write c ≡ a − b. Then a ∧ (b + c) = (b + c ) ∧ (b + c) = b + c ∧ c = b + (a − b) ∧ c. Lemma 10.7.6. Function on two contiguous intervals. Let Z : Q∞ × → S be an arbitrary process. Let α > 0 and β > 2α be arbitrary, such that the set Aβr,s ≡ (d(Zr ,Zs ) > β)
(10.7.6)
β β is measurable for each r,s ∈ Q∞ . Let ω ∈ r,s∈Q(∞) (Ar,s ∪(Ar,s )c ) and let h ≥ 0 be arbitrary. Let Aω,Bω be arbitrary intervals, with endpoints in Qh , such that the right endpoint of Aω is equal to the left endpoint of Bω . Let t,t ∈ (Aω ∪Bω )Qh be arbitrary such that t < t . Suppose there exist xω,yω ∈ S with d(xω,Z(r,ω)) ∨ d(yω,Z(s,ω)) ≤ α. (10.7.7) r∈A(ω)Q(h)
Then ω∈
r,s∈(t,t )Q(h);r≤s
s∈B(ω)Q(h)
β
β
β
β
β
β
((At,r ∪ At,s )(At,r ∪ At ,s )(At ,r ∪ At ,s ))c .
(10.7.8)
a.u. Càdlàg Process
447
Proof. With ω fixed, write zr ≡ Z(r,ω) ∈ S for each r ∈ Q∞ , and write A ≡ Aω , B ≡ Bω , x ≡ xω, and y ≡ yω . Then inequality 10.7.7 can be restated as d(x,zr ) ∨ d(y,zs ) ≤ α. (10.7.9) r∈AQ(h)
s∈BQ(h)
Let r,s ∈ (t,t )Qh ⊂ AQh ∪ BQh be arbitrary with r ≤ s. By assumption, the endpoints of the intervals A and B are members of Qh . Then, since the right endpoint of A is equal to the left endpoint of B, there are only three possibilities: (i) t,r,s ∈ AQh , (ii) t,r ∈ AQh and s,t ∈ BQh , or (iii) r,s,t ∈ BQh . In case (i), inequality 10.7.9 implies that d(zt ,zr ) ∨ d(zt ,zs ) ≤ (d(x,zt ) + d(x,zr )) ∨ (d(x,zt ) + d(x,zs )) ≤ 2α ∨ 2α = 2α. Similarly, in case (ii), we have d(zt ,zr ) ∨ d(zt ,zs ) ≤ (d(x,zt ) + d(x,zr )) ∨ (d(y,zt ) + d(y,zs )) ≤ 2α ∨ 2α = 2α. Similarly, in case (iii), we have d(zt ,zr ) ∨ d(zt ,zs ) ≤ (d(y,zt ) + d(y,zr )) ∨ (d(y,zt ) + d(y,zs )) ≤ 2α ∨ 2α = 2α. Thus, in each case, we have (d(zt ,zr ) ∨ d(zt ,zs )) ∧ (d(zt ,zr ) ∨ d(zt ,zs )) ∧ (d(zt ,zr ) ∨ d(zt ,zs )) ≤ 2α ≤ β, where r,s ∈ (t,t )Qh are arbitrary with r ≤ s. Equivalently, the desired relation 10.7.8 holds. Next is the key lemma. Lemma 10.7.7. Lower bound for mean waiting time before exit after a simple stopping time. Let (S,d) be a locally compact metric space. Suppose the process Z : Q∞ × → S is strongly right continuous in probability, with a modulus of strong right continuity in probability δSRCp . Let ε,γ > 0 be arbitrary. Take any m ≥ 0 so large that m ≡ 2−m < δSRCp (ε,γ ). Let h ≥ m be arbitrary. Define the simple stopping time ζh,γ to be the first time when the process Z|Qh is outside the bounded set (d(x◦,·) ≤ γ ), as in Definition 10.7.1. Then the following conditions hold: 1. Let the point t ∈ Qh and α > ε be arbitrary. Let η be an arbitrary simple stopping time with values in [t,t + m ]Qh relative to the natural filtration L(h) of the process Z|Qh . Let A ∈ L(η,h) be an arbitrary measurable set such that A ⊂ (d(x◦,Zη ) ≤ γ ). Then
448
Stochastic Process P (d(Zη,Z1∧(t+(m)) ) > α;A) ≤ εP (A)
(10.7.10)
for each α > ε. 2. Suppose ε ≤ 2−2 . Let α > 2ε be arbitrary. As an abbreviation, write ηt ≡ ηt,α,[t,1]Q(h) for each t ∈ Qh . Let τ be an arbitrary simple stopping time with values in Qh , relative to the filtration L(h) . Then the r.r.v.
ητ ≡ (ηt 1η(t) ε be arbitrary. Let t ∈ Qh, the simple stopping time η with values in [t,t + m ]Qh , and the set A ∈ L(η,h) with A ⊂ (d(x◦,Zη ) ≤ γ ) be as given. Write r ≡ 1 ∧ (t + m ). Then η has values in [t,r]Qh . Let s ∈ [t,r]Qh be arbitrary. Then s ≤ r ≤ t + m < s + δSRCp (ε,γ ) and (η = s;A) ∈ L(s,h) . Therefore, we can apply inequality 10.7.5 in Lemma 10.7.4 to the modulus of strong right continuity δSRCp , the points s,r ∈ Qh , and the measurable set (η = s;A) ∈ L(s,h) to obtain P (d(Zs ,Zr ) > α;η = s;A) ≤ εP (η = s;A), where s ∈ [t,r]Qh is arbitrary. Consequently,
P (d(Zs ,Zr ) > α;η = s;A) P (d(Zη,Zr ) > α;A) = s∈[t,r]Q(h)
≤
εP (η = s;A) = εP (A).
s∈[t,r]Q(h)
Assertion 1 is proved. 2. To prove Assertion 2, suppose ε ≤ 2−2 , and let α > 2ε be arbitrary. First consider each t ∈ Qh . Then, as a special case of the defining equality 10.7.11 in the hypothesis, we have the simple stopping time
ηt ≡ (ηt 1η(t) α;t < ζh,γ ;A) ≤ P (d(Zt ,Zη ) > α;d(x◦,Zt ) ≤ γ ;A) ≤ P (d(Zt ,Zη ) > α;d(Zt ,Zr ) ≤ 2−1 α;d(x◦,Zt ) ≤ γ ;A) + P (d(Zt ,Zr ) > 2−1 α;d(x◦,Zt ) ≤ γ ;A) ≤ P (d(Zη,Zr ) > 2−1 α;d(x◦,Zt ) ≤ γ ;A) + P (d(Zt ,Zr ) > 2−1 α;d(x◦,Zt ) ≤ γ ;A).
(10.7.16)
Here, the second inequality is thanks to relation 10.7.15, the third inequality is by the definition of the simple stopping time time ηt , and the fifth inequality is by the definition of the simple stopping time ζh,γ . Since 2−1 α > ε, we have
450
Stochastic Process P (d(Zη,Zr ) > 2−1 α;d(x◦,Zt ) ≤ γ ;A) ≤ εP (d(x◦,Zt ) ≤ γ ;A)
by applying inequality 10.7.10 where α,A are replaced by 2−1 α, (d(x◦,Zt ) ≤ γ ;A), respectively. Similarly, we have P (d(Zt ,Zr ) > 2−1 α;d(x◦,Zt ) ≤ γ ;A) ≤ εP (d(x◦,Zt ) ≤ γ ;A) by applying inequality 10.7.10 where η,α,A are replaced by t,2−1 α, (d(x◦,Zt ) ≤ γ ;A), respectively. Combining, inequality 10.7.16 can be continued to yield P (ηt < r;A) ≤ 2εP (d(x◦,Zt ) ≤ γ ;A) ≤ 2εP (A).
(10.7.17)
Consequently, E(ηt − t;A) ≥ E(r − t;ηt ≥ r;A) = (r − t)P (ηt ≥ r;A) = ((1 − t) ∧ m )(P (A) − P (ηt < r;A)) ≥ ((1 − t) ∧ m )(P (A) − 2εP (A)) ≥ ((1 − t) ∧ m )2−1 P (A),
(10.7.18)
where the second inequality is by inequality 10.7.17, and where the last inequality is because 1 − 2ε ≥ 2−1 by the assumption that ε ≤ 2−2 . 3. To complete the proof of Assertion 2, let the simple stopping time τ be arbitrary, with values in Qh relative to the filtration L(h) . Then
P (ηt < 1 ∧ (t + m );τ = t) P (ητ < 1 ∧ (τ + m )) = t∈Q(h)
≤
2εP (τ = t) = 2ε,
(10.7.19)
t∈Q(h)
where the inequality is by applying inequality 10.7.17 to the measurable set A ≡ (τ = t) ∈ L(t,h) , for each t ∈ Qh . Similarly,
E(ηt − t;τ = t) E(ητ − τ ) = t∈Q(h)
≥
((1 − t) ∧ m )2−1 P (τ = t)
t∈Q(h)
= 2−1 E((1 − τ ) ∧ m ),
(10.7.20)
where the first inequality is by inequality 10.7.18. Summing up, inequalities 10.7.19 and 10.7.20 yield, respectively, the desired inequalities 10.7.12 and 10.7.13. The lemma is proved. Theorem 10.7.8. Strong right continuity in probability and a.u. boundedness together imply D-regularity and extendability by the right limit to an a.u. càdlàg process. Let (S,d) be a locally compact metric space. Suppose an arbitrary process Z : Q∞ × (,L,E) → S is (i) a.u. bounded, with a modulus of a.u.
a.u. Càdlàg Process
451
boundedness βauB , and (ii) strongly right continuous in probability, with a modulus of strong right continuity in probability δSRCp . Then the following conditions hold: 1. The process Z is D-regular, with a modulus of D-regularity m ≡ m(βauB , δSRCp ) and a modulus of continuity in probability δ Cp (·,βauB ,δSRCp ). 2. The right-limit extension X ≡ rLim (Z) : [0,1] × → S is an a.u. càdlàg process, with the same modulus of continuity in probability δ Cp (·,βauB ,δSRCp ) and a modulus of a.u. càdlàg δaucl (·,βauB ,δSRCp ). Recall here the right-limit extension rLim from Definition 10.5.6. Proof. 1. Condition (i), the a.u. boundedness condition in the hypothesis, says that for each ε > 0, we have ⎞ ⎛ d(x◦,Zu ) > γ ⎠ < ε (10.7.21) P⎝ u∈Q(h)
for each h ≥ 0, for each γ > βauB (ε). 2. Let ε > 0 and α ∈ (2−2 ε,2−1 ε) be arbitrary. Take an arbitrary γ ∈ (βauB (2−2 ε),βauB (2−2 ε) + 1). Define m ≡ [− log2 (1 ∧ δSRCp (2−2 ε,γ ))]1 and δ Cp (ε) ≡ δ Cp (ε,βauB ,δSRCp ) ≡ m ≡ 2−m < δSRCp (2−2 ε,γ ). We will verify that δ Cp (·,βauB ,δSRCp ) is a modulus of continuity in probability of the process Z. For each h ≥ 0, define the measurable set ⎛ ⎞ Gh ≡ ⎝ d(x◦,Zu ) > γ ⎠ . u∈Q(h)
Then, since γ > βauB (2−2 ε), inequality 10.7.21, with ε replaced by 2−2 ε, implies that P (Gh ) < 2−2 ε,
(10.7.22)
where h ≥ 0 is arbitrary. 3. Consider each s,r ∈ Q∞ with |s − r| < δ Cp (ε). Define the measurable set Ds,r ≡ (d(Zs ,Zr ) ≤ α) ⊂ (d(Zs ,Zr ) ≤ 2−1 ε).
(10.7.23)
First assume that s ≤ r. Then s ≤ r < s + δ Cp (ε) < s + δSRCp (2−2 ε,γ ).
(10.7.24)
452
Stochastic Process
Take h ≥ m so large that s,r ∈ Qh . Since α > 2−2 ε, we can apply inequality 10.7.5 in Lemma 10.7.4, where ε and A are replaced by 2−2 ε and (d(x◦,Zs ) ≤ γ ), respectively, to obtain c
c Gh ) ≤ P (d(Zs ,Zr ) > α;d(x◦,Zs ) ≤ γ ) P (Ds,r
≤ 2−2 εP (d(x◦,Zs ) ≤ γ ) ≤ 2−2 ε. Combining with inequality 10.7.22, we obtain c c P (Ds,r ) ≤ P (Ds,r Gh ) + P (Gh ) ≤ 2−2 ε + 2−2 ε = 2−1 ε. c
(10.7.25)
Consequently, in view of relation 10.7.23, we see that c c E1 ∧ d(Zs ,Zr ) ≤ P (Ds,r ) + E(1 ∧ d(Zs ,Zr );Ds,r ) ≤ P (Ds,r ) + 2−1 ε ≤ ε.
By symmetry, the same inequality holds for each s,r ∈ Q∞ with |s −r| < δ Cp (ε), where ε > 0 is arbitrary. Hence the process Z is continuous in probability, with a modulus of continuity in probability δ Cp ≡ δ Cp (·,βauB ,δSRCp ). Thus the process Z satisfies Condition 2 of Definition 10.4.1. 4. It remains to prove Condition 1 in Definition 10.4.1 for Z to be D-regular. To that end, define m−1 ≡ 0. Let n ≥ 0 be arbitrary, but fixed until further notice. Write εn ≡ 2−n . Take any γn ∈ (βauB (2−3 εn ),βauB (2−3 εn ) + 1). Define the integer mn ≡ [mn−1 ∨ − log2 (1 ∧ δSRCp (2−m(n)−n−6 εn,γn ))]1 .
(10.7.26)
We will prove that the increasing sequence m ≡ (mk )k=0,1,... is a modulus of D-regularity of the process Z. As an abbreviation, define Kn ≡ 2m(n)+n+3 , hn ≡ mn+1 . Define the measurable set ⎛ ⎞ d(x◦,Zu ) > γn ⎠ . (10.7.27) Gn ≡ ⎝ u∈Q(h(n))
Then, since γn > βauB (2−3 εn ), inequality 10.7.21 implies that P (Gn ) < 2−3 εn . Moreover, hn ≡ mn+1 > mn ≥ n ≥ 0. Equality 10.7.26 can be rewritten as mn ≡ [mn−1 ∨ − log2 (1 ∧ δSRCp (Kn−1 2−3 εn,γn ))]1 .
(10.7.28)
Then m(n) ≡ 2−m(n) < δSRCp (Kn−1 2−3 εn,γn ).
(10.7.29)
a.u. Càdlàg Process
453
5. Now let β > εn be arbitrary such that the set Aβr,s ≡ (d(Zr ,Zs ) > β)
(10.7.30)
is measurable for each r,s ∈ Q∞ . Define the exceptional set β β β β β β Dn ≡ (At,r ∪ At,s )(At,r ∪ At ,s )(At ,r ∪ At ,s ), t∈Q(m(n)) r,s∈(t,t )Q(m(n+1));r≤s
(10.7.31) where for each t ∈ [0,1)Qm(n) , we write t ≡ t + 2−m(n) . It remains only to prove that P (Dn ) < 2−n , as required in Condition 1 of Definition 10.4.1. 6. For that purpose, let the simple first stopping time ζ ≡ ζh(n),γ (n) be as in Definition 10.7.1. Thus ζ is the first time s ∈ Qh(n) when the process Z|Qh(n) is outside the γn -neighborhood of the reference point x◦ , with ζ set to 1 if no such s ∈ Qh exists. Take any αn ∈ (2−1 εn,2−1 β). This is possible because β > εn by assumption. 7. Now let t ∈ Qh(n) be arbitrary, but fixed until further notice. Define the simple first exit time ηt ≡ ηt,α(n),[t,1]Q(h(n)) and define the simple stopping time
ηt ≡ ηt 1η(t) 2(2−3 εn ), we can apply Part 2 of Lemma 10.7.7, where ε,γ ,m,h,τ, are replaced by 2−3 εn,γn,mn,hn,τk−1,m(n),
454
Stochastic Process
respectively, to obtain the bounds P (ητ (k−1) < 1 ∧ (τk−1 + m(n) )) ≤ 2(2−3 εn )
(10.7.33)
E(ητ (k−1) − τk−1 ) ≥ 2−1 E((1 − τk−1 ) ∧ m(n) ).
(10.7.34)
and
Hence P (τk − τk−1 < (1 − τk−1 ) ∧ m(n) ) ≡ P (ητ (k−1) < τk−1 + (1 − τk−1 ) ∧ m(n) )) = P (ητ (k−1) < 1 ∧ (τk−1 + m(n) )) ≤ 2(2−3 εn ),
(10.7.35)
where the inequality is due to inequality 10.7.33. Similarly, E(τk − τk−1 ) ≡ E(ητ (k−1) − τk−1 ) ≥ 2−1 E((1 − τk−1 ) ∧ m(n) ), (10.7.36) where the inequality is due to inequality 10.7.34. Recall here that k = 1, . . . ,Kn is arbitrary. Consequently, 1 ≥ E(τK(n) ) =
K(n)
E(τk − τk−1 ) ≥ 2−1
k=1
≥ 2−1
K(n)
E((1 − τk−1 ) ∧ m(n) )
k=1 K(n)
E((1 − τK(n)−1 ) ∧ m(n) )
k=1
= 2−1 Kn E((1 − τK(n)−1 ) ∧ m(n) ) ≥ 2−1 Kn E((1 − τK(n)−1 ) ∧ m(n) ;1 − τK(n)−1 > m(n) ) = 2−1 Kn E(m(n) ;1 − τK(n)−1 > m(n) ) = 2−1 Kn m(n) P (1 − τK(n)−1 > m(n) ) = 2−1 Kn m(n) P (τK(n)−1 < 1 − m(n) ), where the second inequality is from inequality 10.7.36. Dividing by 2−1 Kn m(n) , we obtain −m(n)−n−3 m(n) 2 = 2−2 εn . P (τK(n)−1 < 1 − m(n) ) < 2Kn−1 −1 m(n) ≡ 2 · 2 (10.7.37)
At the same time, P (τK(n) < 1;τK(n)−1 ≥ 1 − m(n) ) ≡ P (ητ (K(n)−1) < 1;τK(n)−1 ≥ 1 − m(n) ) ≤ P (ητ (K(n))−1) < 1 ∧ (τK(n)−1 + m(n) )) ≤ 2(2−3 εn ) = 2−2 εn, where the last inequality is by applying inequality 10.7.35 to k = Kn .
(10.7.38)
a.u. Càdlàg Process
455
8. Next define the exceptional set Hn ≡ (τK(n) < 1). Then, combining inequalities 10.7.38 and 10.7.37, we obtain P (Hn ) ≡ P (τK(n) < 1) ≤ P (τK(n) < 1;τK(n)−1 ≥ 1 − m(n) ) + P (τK(n)−1 < 1 − m(n) ) ≤ 2−2 εn + 2−2 εn = 2−1 εn . Summing up, except for the small exceptional set Gn ∪ Hn , there are at most the finite number Kn of simple stopping times 0 < τ1 < · · · < τK(n) = 1, each of which is the first time in Qh(n) when the process Z strays from the previous stopped state by a distance greater than αn , while still staying in the bounded set (d(x◦,·) < γn ). At the same time, inequality 10.7.35 says that each waiting time τk − τk−1 exceeds a certain lower bound with some probability close to 1. We wish, however, to be able to say that with some probability close to 1, all these Kn waiting times simultaneously exceed a certain lower bound. For that purpose, we will relax the lower bound and specify two more small exceptional sets, as follows. 9. Define two more exceptional sets, Bn ≡
K(n)
(τk − τk−1 < (1 − τk−1 ) ∧ m(n) )
(10.7.39)
k=1
and Cn ≡
K(n)
(d(Zτ (k−1)∨(1−(m(n)),Z1 ) > αn ),
(10.7.40)
k=1
and proceed to estimate P (Bn ) and P (Cn ). 10. To estimate P (Cn ), let k = 1, . . . ,Kn be arbitrary. Trivially, αn > 2−1 εn > 2Kn−1 2−3 εn . Recall from inequality 10.7.29 that m(n) ≡ 2−m(n) < δSRCp (Kn−1 2−3 εn,γn ). As an abbreviation, write t ≡ 1 − m(n) . Define the simple stopping time η ≡ τk−1 ∨ t ≡ τk−1 ∨ (1 − m(n) ) with values in [t,1]Qh(n) relative to the filtration L(h(n)) . Then (d(x◦,Z η ) ≤ γn ) ∈ η,h(n)) . Hence we can apply relation 10.7.10 in Lemma 10.7.7, where L( η,τk−1, ε,γ ,α,m,h,t,r,η,τ,A are replaced by Kn−1 2−3 εn,γn,αn,mn,hn,t,1, (d(x◦,Z η ) ≤ γn ), respectively, to obtain −1 −3 −1 −3 P (d(Zη,Z1 ) > αn ;d(x◦,Z η ) ≤ γn ) ≤ Kn 2 εn P (d(x◦,Z η ) ≤ γn ) ≤ Kn 2 εn . (10.7.41)
456
Stochastic Process
Recalling the defining equalities 10.7.40 and 10.7.27 for the measurable sets Cn and Gn , respectively, we can now estimate ⎛ ⎞ K(n) P (Cn Gcn ) ≡ P ⎝ (d(Z d(x◦,Zu ) ≤ γn ⎠ η,Z1 ) > αn ; ⎛ ≤P⎝
k=1
u∈Q(h(n))
K(n)
⎞
(d(Z η,Z1 ) > αn ;d(x◦,Z η ) ≤ γ n )⎠
k=1
≤
K(n)
Kn−1 2−3 εn = 2−3 εn,
k=1
where the last inequality is by inequality 10.7.41 11. To estimate P (Bn ), apply relation 10.7.12 in Lemma 10.7.7, where ε,γ ,α,m,h,τ are replaced by Kn−1 2−3 εn,γn,αn,mn,hn,τk−1 , respectively, to obtain P (ητ (k−1) < 1 ∧ (τk−1 + m(n) )) ≤ 2Kn−1 2−3 εn .
(10.7.42)
Since ητ (k−1) ≡ τk and 1 ∧ (τk−1 + m(n) ) = τk−1 + m(n) ∧ (1 − τk−1 ) according to Lemma 10.7.5, inequality 10.7.42 is equivalent to P (τk − τk−1 < m(n) ∧ (1 − τk−1 )) ≤ 2Kn−1 2−3 εn .
(10.7.43)
Recalling the defining equality 10.7.39 for the measurable sets Bn , we obtain ⎛ ⎞ K(n) K(n)
P (Bn ) ≡ P ⎝ (τk − τk−1 < n ∧ (1 − τk−1 ))⎠ ≤ 2Kn−1 2−3 εn = 2−2 εn . k=1
k=1
Combining, we see that P (Gn ∪ Hn ∪ Bn ∪ Cn ) = P (Gn ∪ Hn ∪ Bn ∪ Cn Gcn ) ≤ 2−3 εn + 2−1 εn + 2−2 εn + 2−3 εn = εn . 12. Finally, we will prove that Dn ⊂ Gn ∪ Hn ∪ Bn ∪ Cn . To that end, consider each ω ∈ Gcn Hnc Bnc Cnc . Then, since ω ∈ Gcn, we have t∈Q(h(n)) d(x◦,Zt (ω)) ≤ γn . Consequently, ζh(n),γ (n) (ω) = 1 according to the defining equality 10.7.2. Hence, by the defining equality 10.7.32, we have ηt (ω) ≡ ηt (ω)1η(t,ω) τk+1 (ω) − τk (ω) ≥ m(n) ∧ (1 − τk (ω)),
(10.7.48)
where the last inequality is because ω ∈ Bnc . Consequently, m(n) > 1 − τk (ω). Hence m(n) ∧ (1 − τk (ω)) = 1 − τk (ω). Therefore the second half of inequality 10.7.48 yields τk+1 (ω) − τk (ω) ≥ 1 − τk (ω), which is equivalent to 1 = τk+1 (ω). Hence inequality 10.7.47 implies that t = 1 and τk (ω) > t = 1 − m(n) . It follows that d(Zτ (k) (ω),Z1 (ω)) = d(Zτ (k)∨(1−(m(n)) (ω),Z1 (ω)) ≤ αn,
(10.7.49)
where the inequality is because ω ∈ Cnc . At the same time, because 1 = τk+1 (ω) and ω ∈ Gcn Hnc Bnc Cnc , we have [τk (ω),1)Qh(n) = [τk (ω),τk+1 (ω))Qh(n) ≡ [τk (ω),ητ (k) (ω))Qh(n) = [τk (ω),ητ (k) (ω))Qh(n), where the last equality follows from equality 10.7.44, where t is replaced by τk (ω). The basic properties of the simple first exit time ητ (k) imply that d(Z(τk (ω),ω),Z(u,ω)) ≤ αn
(10.7.50)
for each u ∈ [τk (ω),1)Qh(n) . Combining with inequality 10.7.49 for the endpoint u = 1, we see that d(Z(τk (ω),ω),Z(u,ω)) ≤ αn
(10.7.51)
for each u ∈ [τk (ω),1]Qh(n) . Note that β > 2αn and t,t ∈ [τk−1 (ω),τk (ω)) ∪ [τk (ω),t ], with t = 1. In view of inequalities 10.7.46 and 10.7.51, we can
458
Stochastic Process
apply Lemma 10.7.6, where α,h,Aω,Bω t,t xω,yω are replaced by αn,hn , [τk−1 (ω),τk (ω)),[τk (ω),t ], t,t , Z(τk−1 (ω),ω), Z(τk (ω),ω), respectively, to obtain β β β β β β ((At,r ∪ At,s )(At,r ∪ At ,s )(At ,r ∪ At ,s ))c ω∈ r,s∈(t,t )Q(h(n));r≤s
≡
β
r,s∈(t,t )Q(m(n+1));r≤s
β
β
β
β
β
((At,r ∪ At,s )(At,r ∪ At ,s )(At ,r ∪ At ,s ))c ⊂ Dnc . (10.7.52)
(ii ) Now suppose the interval (t,t ] contains zero or one member in the sequence τ1 (ω), . . . ,τK(n)−1 (ω). Either way, t,t ∈ [τk−1 (ω),τk (ω)) ∪ [τk (ω),τk+1 (ω)]. Then inequality 10.7.46 holds for both k and k + 1. Hence we can apply Lemma 10.7.6, where α,h,Aω,Bω t,t xω,yω are replaced by αn,hn , [τk (ω),τk+1 (ω)), [τk−1 (ω),τk (ω)), t,t , Z(τk−1 (ω),ω), Z(τk (ω),ω), respectively, to again obtain relation 10.7.52. 13. Summing up, we have proved that ω ∈ Dnc for each ω ∈ Gcn Hnc Bnc Cnc . Consequently, Dn ⊂ Gn ∪ Hn ∪ Bn ∪ Cn , whence P (Dn ) ≤ P (Gn ∪ Hn ∪ Bn ∪ Cn ) < εn ≡ 2−n, where n ≥ 0 is arbitrary. Condition 1 of Definition 10.4.1 has also been verified. Accordingly, the process Z is D-regular, with the sequence m ≡ (mn )n=0.1.... as a modulus of D-regularity, and with a modulus of continuity in probability δ Cp ≡ δ Cp (·,βauB ,δSRCp ). Assertion 1 is proved. 14. Therefore, by Theorem 10.5.8, the right-limit extension X ≡ rLim (Z) : [0,1] × → S is a.u. càdlàg, with the same modulus of continuity in probability δ Cp ≡ δ Cp (·,βauB ,δSRcp ), and with a modulus of a.u. càdlàg δaucl (·,m,δCp ) ≡ δaucl (·,βauB ,δSRCp ).
Assertion 2 is proved.
10.8 Sufficient Condition for an a.u. Càdlàg Martingale Using Theorem 10.7.8 from Section 10.7, we will prove a sufficient condition for a martingale X : [0,1] × → R to be equivalent to one that is a.u. càdlàg. To that end, recall from Definition 8.3.3 the special convex function λ : R → R, defined by λ(x) ≡ 2x + (e−|x| − 1 + |x|)
(10.8.1)
for each x ∈ R. Theorem 8.3.4 says that the function λ is increasing and strictly convex, with
a.u. Càdlàg Process
459
|x| ≤ |λ(x)| ≤ 3|x|
(10.8.2)
for each x ∈ R. Also recall Definition 9.0.2 for the enumerated set Q∞ ≡ {t0,t1, . . .} of dyadic rationals in [0,1], and the enumerated subset Qh ≡ {0,h,2h, . . . ,1} = {t0,t1,t2, . . . ,tp(h) }, where ph ≡ 2h , h ≡ 2−h , for each h ≥ 0. Lemma 10.8.1. Each martingale on Q∞ is a.u. bounded. Let Z : Q∞ × → R be a martingale relative to some filtration L ≡ {L(t) : t ∈ Q∞ }. Suppose b > 0 is an upper bound of E|Z0 | ∨ E|Z1 | ≤ b. Then the process Z is a.u. bounded in the sense of Definition 10.7.2, with a modulus of a.u. boundedness βauB ≡ βauB (·,b). Proof. Let ε > 0 be arbitrary. Take an arbitrary α ∈ (2−2 ε,2−1 ε). Take an arbitrary real number K > 0 so large that 6b
α ⎠ < α. P⎝ r∈Q(h)
r∈Q(h)
(10.8.5) Separately, Chebychev’s inequality implies that P (|Z0 | > bα −1 ) ≤ b−1 αE|Z0 | ≤ α.
460
Stochastic Process
Combining with inequality 10.8.5, we obtain ⎛ ⎞ ⎞ ⎛ P⎝ |Zr | > Kα + bα −1 ⎠ ≤ P ⎝ |Zr − Z0 | > Kα ⎠ r∈Q(h)
r∈Q(h)
+ P (|Z0 | > bα −1 ) < 2α < ε.
Consequently, P ( r∈Q(h) |Zr | > γ ) < ε for each γ > Kα + bα −1 ≡ βauB (ε), where h ≥ 0 is arbitrary. In other words, the process Z is a.u. bounded in the sense of Definition 10.7.2, with βauB as a modulus of a.u. boundedness relative to the reference point 0 in R. Lemma 10.8.2. Martingale after an event observed at time t. Let Z : Q∞ × (,L,E) → R be a martingale relative to some filtration L ≡ {L(t) : t ∈ Q∞ }. Let t ∈ Q∞ and A ∈ L(t) be arbitrary with P (A) > 0. Recall from Definition 5.6.4 the conditional probability space (,L,EA ). Then the process Z|[t,1]Q∞ : [t,1]Q∞ × (,L,EA ) → R is a martingale relative to the filtration L. Proof. Consider each s,r ∈ [t,1]Q∞ with s ≤ r. Let U ∈ L(s) be arbitrary, with U bounded. Then U 1A ∈ L(s) . Hence, since Z : Q∞ × (,L,E) → R is a martingale relative to the filtration L, we have EA (Zr U ) ≡ P (A)−1 E(Zr U 1A ) = P (A)−1 E(Zs U 1A ) ≡ EA (Zs U ), where Zs ∈ L(s) . Hence EA (Zr |L(s) ) = Zs , where s,r ∈ [t,1]Q∞ are arbitrary with s ≤ r. Thus the process Z|[t,1]Q∞ : [t,1]Q∞ × (,L,EA ) → R is a martingale relative to the filtration L.
Theorem 10.8.3. Sufficient condition for a martingale on Q∞ to have an a.u. càdlàg martingale extension to [0,1]. Let Z : Q∞ × → R be an arbitrary martingale relative to some filtration L ≡ {L(t) : t ∈ Q∞ }. Suppose the following conditions are satisfied: (i). There exists b > 0 such that E|Z1 | ≤ b. (ii). For each α,γ > 0, there exists δ Rcp (α,γ ) > 0 such that for each h ≥ 1 and t,s ∈ Qh with t ≤ s < t + δ Rcp (α,γ ), and for each A ∈ L(t,h) with A ⊂ (|Zt | ≤ γ ) and P (A) > 0, we have EA |Zs | − EA |Zt | ≤ α
(10.8.6)
|EA e−|Z(s)| − EA e−|Z(t)| | ≤ α.
(10.8.7)
and
Then the following conditions hold:
a.u. Càdlàg Process
461
1. The martingale Z is D-regular, with a modulus of continuity in probability δ Cp ≡ δ Cp (·,b,δ Rcp ) and a modulus of D-regularity m ≡ m(b,δ Rcp ). 2. Let X ≡ rLim (Z) : [0,1] × → R be the right-limit extension of Z. Then X is an a.u. càdlàg martingale relative to the right-limit extension L+ ≡ {L(t+) : t ∈ [0,1]} of the given filtration L, with δ Cp (·,b,δ Rcp ) as a modulus of continuity in probability, and with a modulus of a.u. càdlàg δaucl (·,b,δ Rcp ). ∞ × ,R), ρP rob,Q(∞) ) 3. Recall from Definition 6.4.2 the metric space (R(Q of stochastic processes with parameter set Q∞, where sequential convergence relative to the metric ρ P rob,Q(∞) is equivalent to convergence in probability when stochastic processes are viewed as r.v.’s. Let R Mtgl,b,δ(Rcp) (Q∞ × ,R) denote the subspace of (R(Q∞ × ,R), ρP rob,Q(∞) )) consisting of all martingales Z : Q∞ × → R satisfying Conditions (i–ii) with a common bound b > 0 and a common operation δ Rcp . Then the right-limit extension ρP rob,Q(∞) )) rLim : (R Mtgl,b,δ(Rcp) (Q∞ × ,R), δ(aucl),δ(cp) [0,1],ρD[0,1] → (D )
(10.8.8)
is a well-defined uniformly continuous function, where δ(aucl),δ(cp) [0,1],ρD[0,1] ) (D is the metric space of a.u. cadl ` ag ` processes that share the common modulus of continuity in probability δ Cp ≡ δ Cp (·,b,δ Rcp ), and that share the common modulus of a.u. càdlàg δaucl ≡ δaucl (·,m,δCp ). Moreover, the mapping rLim has a modulus of continuity δrLim (·,b,δ Rcp ) depending only on b and δ Rcp . Proof. 1. Because Z is a martingale, we have E|Z0 | ≤ E|Z1 | by Assertion 4 of Proposition 8.2.2. Hence E|Z0 | ∨ E|Z1 | = E|Z1 | ≤ b. Therefore, according to Lemma 10.8.1, the martingale Z is a.u. bounded, with the modulus of a.u. boundedness βauB ≡ βauB (·,b), in the sense of Definition 10.7.2. 2. Let ε,γ > 0 be arbitrary. Define α ≡ αε,γ ≡ 1 ∧
1 3 ε exp(−3(γ ∨ b + 1)ε−1 ). 12
Take κ ≥ 0 so large that 2−κ < δ Rcp (α,γ ), where δ Rcp is the operation given in the hypothesis. Define δSRCp (ε,γ ) ≡ δSRCp (ε,γ ,δ Rcp ) ≡ 2−κ < δ Rcp (α,γ ). We will show that the operation δSRCp is a modulus of strong right continuity in probability of the process Z in the sense of Definition 10.7.3. 3. To that end, let h ≥ 0 and t,s ∈ Qh be arbitrary with t < s < t +δSRCp (ε,γ ). Then t < s < t ≡ t + δSRCp (ε,γ ) ≡ t + 2−κ < t + δ Rcp (α,γ ).
462
Stochastic Process
Hence inequalities 10.8.6 and 10.8.7 hold. Moreover, 2−h ≤ s − t < 2−κ , whence h > κ. Now let A ∈ L(t,h) be arbitrary with A ⊂ (|Zt | ≤ γ ) and P (A) > 0. By Lemma 10.8.2, the process Z|[t,1]Q∞ : [t,1]Q∞ × (,L,EA ) → R
(10.8.9)
is a martingale relative to the filtration L. Hence the process Z|[t,t ]Qh : [t,t ]Q∞ × (,L,EA ) → R
(10.8.10)
is also a martingale relative to the filtration L. Consequently, EA Zt = EA Zt and EA |Zt | ≥ EA |Zt |. Moreover, EA |Zt | ≡ E(|Zt |;A)/P (A) ≤ E(γ ;A)/P (A) = γ . Taken together, equality 10.8.1 and inequalities 10.8.6 and 10.8.7 imply that
EA λ(Zt ) − EA λ(Zt ) ≤ |EA e−|Z(t )| − EA e−|Z(t)| | + |EA |Zt | − EA |Zt ||
= |EA e−|Z(t )| − EA e−|Z(t)| | + EA |Zt | − EA |Zt | ≤ α + α = 2α ≤
1 3 ε exp(−3(b ∨ γ + 1)ε−1 ) 6
1 3 ε exp(−3bε−1 ) 6 1 ≤ ε3 exp(−3(EA |Zt | ∨ EA |Zt |)ε−1 ). 6 With this inequality, we can apply Assertion 2 of Theorem 8.3.4, with Q,X replaced by [t,t ]Qh , Z|[t,t ]Qh , respectively, to obtain the bound
ε) < ε
(10.8.11)
for each u ∈ [t,t + δSRCp (ε,γ ))Qh ⊂ [t,t ]Qh , where ε > 0, γ > 0, h ≥ 0, and A ∈ L(t,h) ≡ L(Zr : r ∈ [0,t]Qh ) are arbitrary with A ⊂ (|Xt | ≤ γ ) and P (A) > 0. Thus the process Z is strongly right continuous in probability in the sense of Definition 10.7.3, with the operation δSRCp as a modulus of strong right continuity in probability. In Step 1, we observed that the process Z is a.u. bounded, with the modulus of a.u. boundedness βauB ≡ βauB (·,b), in the sense of Definition 10.7.2. 4. Thus the process Z satisfies the conditions of Theorem 10.7.8. Accordingly, the process Z is D-regular, with a modulus of D-regularity m ≡ m(b,δ Rcp ) ≡ m(βauB ,δSRCp ), and with a modulus of continuity in probability δ Cp (·,b, δ Rcp ) ≡ δ Cp (·,b,δSRCp ). In addition, Theorem 10.7.8 says that its right-limit extension X ≡ rLim (Z) is an a.u. càdlàg process, with the same modulus of continuity in probability δ Cp (·,b,δ Rcp ), and with the modulus of a.u. càdlàg δaucl (·,b,δ Rcp ) ≡ δaucl (·,βauB ,δSRCp ). Assertions 1 is proved. Assertion 2 is proved except that we still need to verify that X is a martingale relative to the filtration L+ .
a.u. Càdlàg Process
463
5. To that end, let t,r ∈ [0,1] be arbitrary with t < r or r = 1. Take any sequence (uk )k=1,2,... in [t,r)Q∞ such that uk → t. Since the process Z is continuous in probability, the sequence (Zu(k) )k=1,2,... of r.r.v.’s is Cauchy in probability. Hence, according to Proposition 4.9.4, there exists a subsequence (Zu(k(n)) )n=1,2,... such that Y ≡ limn→∞ Zu(k(n)) is an r.r.v., with Zu(k(n)) → Y a.u. Now consider each ω ∈ A ≡ domain(Xt ) ∩ domain(Y ). Then by the definition of the right-limit extension, limv∈[t,∞)Q(∞) Zv (ω) exists and equals Xt (ω). It follows that limn→∞ Zu(k(n)) (ω) = Xt (ω). At the same time, Y (ω) ≡ limn→∞ Zu(k(n)) (ω). Hence Xt = Y on the full set A. Separately, Assertion 5 of Proposition 8.2.2 implies that the family H ≡ {Zu : u ∈ Q∞ } = {Zt : t ∈ [0,1]Q∞ } is uniformly integrable. It follows from Proposition 5.1.13 that the r.r.v. Y is integrable, with E|Zu(k(n)) − Y | → 0. Hence the r.r.v. Xt is integrable, with E|Zu(k(n)) − Xt | → 0. Consequently, since Zu(k(n)) ∈ L(u(k(n)) ⊂ L(r) , we have Xt ∈ L(r) , where r ∈ [0,1] is arbitrary with t < r or r = 1. It follows that Xt ∈ L(t+), where t ∈ [0,1] is arbitrary. In short, the process X is adapted to the filtration L+ . 6. We will next show that the process X is a martingale relative to the filtration L+ . To that end, let t,s ∈ [0,1] be arbitrary with t < s. Now let r,u ∈ Q∞ be arbitrary such that t < r ≤ u and s ≤ u. Let the indicator Y ∈ L(t+) be arbitrary. Then Y,Xt ∈ L(t+) ⊂ L(r) . Hence, since Z is a martingale relative to the filtration L, we have EY Zr = EY Zu . Let r ↓ t and u ↓ s. Then E|Zr − Xt | = E|Xr − Xt | → 0 and E|Zu − Xs | = E|Xu − Xs | → 0. It then follows that EY Xt = EY Xs ,
(10.8.12)
where t,s ∈ [0,1] are arbitrary with t < s. Consider each t,s ∈ [0,1] with t ≤ s. Suppose EY Xt EY Xs . If t < s, then equality 10.8.12 would hold, which is a contradiction. Hence t = s. Then trivially EY Xt = EY Xs , again a contradiction. We conclude that EY Xt = EY Xs for each t,s ∈ [0,1] with t ≤ s, and for each indicator Y ∈ L(t+) . Thus X is a martingale relative to the filtration L+ . Assertion 2 is proved. 7. Assertion 3 remains. Since Z ∈ R Mtgl,b,δ(Rcp) (Q∞ × ,R) is arbitrary, we proved in Step 5 that R Mtgl,b,δ(Rcp) (Q∞ × ,R) ⊂ RDreg,m,δ(Cp) (Q∞ × ,S). At the same time, Theorem 10.6.1 says that the right-limit extension function Dreg,m,δ(Cp) (Q∞ × ,R), ρP rob,Q(∞) )) rLim : (R δ(aucl),δ(Cp) [0,1],ρ → (D D[0,1] )
(10.8.13)
464
Stochastic Process
is uniformly continuous, with a modulus of continuity δrLim (·,m,δCp ) ≡ δrLim (·,b,δ Rcp ). Assertion 3 and the theorem are proved. Theorem 10.8.3 can be restated in terms of the continuous construction of a.u. càdlàg martingales X : [0,1] × → R from their marginal distributions. More precisely, let ξ be an arbitrary but fixed binary approximation of R relative to 0 ∈ R. cp ([0,1],R), ρCp,ξ,[0,1],Q(∞) ) of Recall from Definition 6.2.12 the metric space (F 1 ≡ consistent families of f.j.d.’s on [0,1] that are continuous in probability. Let F FMtgl,b,δ(Rcp) ([0,1],R) be the subspace of consistent families F such that F |Q∞ gives the marginal distributions of some martingale Z ∈ R Mtgl,b,δ(Rcp) (Q∞ × ,R). Let F ∈ F1 and t ∈ Q∞ be arbitrary. Then E|Zt | ≤ E|Z1 | ≤ b. Hence, by Lemma 5.3.7, the r.r.v. Zt has modulus of tightness β(·,t) relative to 0 ∈ R defined by β(ε,t) ≡ bε−1 for each ε > 0. In other words, the distribution Ft 1 |Q∞ is pointwise tight with a modulus of has modulus of tightness β. Thus F pointwise tightness β. Now let ! (0,L0,I0 ) ≡ [0,1],L0, ·dx denote the Lebesgue integration space based on the interval [0,1]. Recall from Theorem 6.4.5 that the Daniell–Kolmogorov–Skorokhod Extension 1 |Q∞, ∞ × 0,R),ρQ(∞)×(0),R ) ρMarg,ξ,Q(∞) ) → (R(Q DKS,ξ : (F is uniformly continuous, with a modulus of continuity δDKS (·, ξ ,β) dependent only on ξ ,b. Hence the composite mapping ρMarg,ξ,Q(∞) ) rLim ◦ DKS,ξ : (F Mtgl,b,δ(Rcp) ([0,1],R)|Q∞, δ(aucl),δ(cp) [0,1],ρ → (D D[0,1] ) is uniformly continuous, with a modulus of continuity depending only on b,δ Rcp, ξ .
10.9 Sufficient Condition for Right-Hoelder Process will denote a set of consistent families F of f.j.d.’s with In this section, F parameter set [0,1], and with a locally compact state space (S,d). We will give to a sufficient condition, in terms of triple joint distributions, for a member F ∈ F be D-regular. Theorem 10.5.8 will then be applied to construct a corresponding a.u. càdlàg process. As an application, we will prove that under an additional continuity condition on the double joint distributions, the a.u. càdlàg process so constructed is rightHoelder, in a sense made precise in the following discussion. This result seems to be new.
a.u. Càdlàg Process
465
, and for each f.j.d. Fr(1),...,r(n) in the In this section, for each family F ∈ F family F , we will use the same symbol Fr(1),...,r(n) for the associated probability function. For example, we write Fr,s,t (A) ≡ Fr,s,t (1A ) for each subset A of S 3 that is measurable relative to a triple joint distribution Fr,s,t . Recall Definition 9.0.2 for the notations associated with the enumerated sets (Qk )k=0,1,... and Q∞ of dyadic rationals in [0,1]. In particular, pk ≡ 2k and k ≡ 2−k for each k ≥ 0. Separately, (0,L0,E0 ) ≡ ([0,1],L0, ·dx) denotes the Lebesgue integration space based on the interval 0 ≡ [0,1]. This will serve as a sample space. Let ξ ≡ (Aq )q=1,2,... be an arbitrary but fixed binary approximation of the locally compact metric space (S,d) relative to some fixed reference point x◦ ∈ S. Recall the operation [·]1 that assigns to each a ∈ R an integer [a]1 ∈ (a,a + 2). The next theorem is in essence due to Kolmogorov. Theorem 10.9.1. Sufficient condition for D-regularity in terms of triple joint distributions. Let F be an arbitrary consistent family of f.j.d.’s with parameter set [0,1] and state space (S,d). Suppose F is continuous in probability. Let Z : Q∞ ×(,L,E) → S be an arbitrary process with marginal distributions given by F |Q∞ . Suppose there exist two sequences (γk )k=0,1··· and (αk )k=0,1,... of positive ∞ k real numbers such that (i) ∞ k=0 2 αk < ∞ and k=0 γk < ∞; (ii) the set A (k) r,s ≡ (d(Zr ,Zs ) > γk+1 )
(10.9.1)
is measurable for each s,t ∈ Q∞ , for each k ≥ 0; and (iii) (k)
(k)
P (Av,v Av ,v ) < αk , where v ≡ v + k and v ≡ v + k+1 = v − k+1 , for each v ∈ [0,1)Qk , for each k ≥ 0. Then the family F |Q∞ of f.j.d.’s is D-regular. Specifically, let m0 ≡ 0. For k −n each n ≥ 1, let mn ≥ mn−1 + 1 be so large that ∞ k=m(n) 2 αk < 2 ∞ and k=m(n)+1 γk < 2−n−1 . Then the sequence (mn )n=0,1,... is a modulus of D-regularity of the family F |Q∞ . Proof. 1. Let k ≥ 0 be arbitrary. Define Dk ≡
(k)
(k)
Av,v Av ,v .
(10.9.2)
v∈[0,1)Q(k)
Then P (Dk ) ≤ 2k αk according to Condition (iii) in the hypothesis. 2. Inductively, for each n ≥ 0, take any βn ∈ (2−n,2−n+1 )
(10.9.3)
such that, for each s,t ∈ Q∞, and for each k = 0, . . . ,n, the sets (d(Zt ,Zs ) > βk + · · · + βn )
(10.9.4)
466
Stochastic Process
and (10.9.5) A(n) t,s ≡ (d(Zt ,Zs ) > βn+1 ) ∞ −k+1 = 2−n+2 for each are measurable. Note that βn,∞ ≡ ∞ k=n βk ≤ k=n 2 n ≥ 0. 3. Let n ≥ 0 be arbitrary, but fixed until further notice. For ease of notations, suppress some symbols signifying dependence on n and write qi ≡ qm(n),i ≡ 2−p(m(n)) for each i = 0, . . . ,pm(n) . Let β > 2−n > βn+1 be arbitrary such that the set β
At,s ≡ (d(Zt ,Zs ) > β)
(10.9.6)
is measurable for each s,t ∈ Q∞ . Define the exceptional set β β β β β β (At,r ∪ At,s )(At,r ∪ At ,s )(At ,r ∪ At ,s ) Dn ≡ t∈[0,1)Q(m(n)) r,s∈(t,t )Q(m(n+1));r≤s
⊂
(n)
(n)
(n)
(n)
(n)
(n)
(At,r ∪ At,s )(At,r ∪ At ,s )(At ,r ∪ At ,s ),
t∈[0,1)Q(m(n))r,s∈(t,t )Q(m(n+1));r≤s
(10.9.7) where for each t ∈ [0,1)Qm(n) , we write t ≡ t + m(n) ∈ Qm(n) . To verify Condition 1 in Definition 10.4.1 for the sequence (m)n=0,1,... and the process Z, we need only show that (10.9.8) P (Dn ) ≤ 2−n . m(n+1) c 4. To that end, consider each ω ∈ k=m(n) Dk . Let t ∈ [0,1)Qm(n) be arbitrary, and write t ≡ t + m(n) . We will show inductively that for each k = mn, . . . ,mn+1 , there exists rk ∈ (t,t ]Qk such that
d(Zt (ω),Zu (ω)) ≤
u∈[t,r(k)−(k)]Q(k)
k
γj
(10.9.9)
j =m(n)+1
and such that
d(Zv (ω),Zt (ω)) ≤
v∈[r(k),t ]Q(k)
k
γj ,
(10.9.10)
j =m(n)+1
where an empty sum, by convention, is equal to 0. Start with k = mn . Define rk ≡ t , whence rk − k = t. Then inequalities 10.9.9 and 10.9.10 are trivially satisfied. Suppose, for some k = mn, . . . ,mn+1 − 1, we have constructed rk ∈ (t,t ]Qk that satisfies inequalities 10.9.9 and 10.9.10. According to the defining equality 10.9.2, we have (k)
(k)
ω ∈ Dk c ⊂ (Ar(k)−(k),r(k)−(k+1) )c ∪ (Ar(k)−(k+1),r(k) )c .
a.u. Càdlàg Process
467
Hence, by the defining equality 10.9.1, we have (i ) d(Zr(k)−(k) (ω),Zr(k)−(k+1) (ω)) ≤ γk+1
(10.9.11)
d(Zr(k)−(k+1) (ω),Zr(k) (ω)) ≤ γk+1 .
(10.9.12)
or (ii )
In case (i ), define rk+1 ≡ rk . In case (ii ), define rk+1 ≡ rk − k+1 . We wish to prove inequalities 10.9.9 and 10.9.10 for k + 1. To prove inequality 10.9.9 for k + 1, consider each u ∈ [t,rk+1 − k+1 ]Qk+1 = [t,rk − k ]Qk ∪ [t,rk − k ]Qk+1 Qck ∪ (rk − k ,rk+1 − k+1 ]Qk+1 . (10.9.13) Suppose u ∈ [t,rk − k ]Qk . Then inequality 10.9.9 in the induction hypothesis trivially implies d(Zt (ω),Zu (ω)) ≤
k+1
γj .
(10.9.14)
j =m(n)+1
Suppose next that u ∈ [t,rk − k ]Qk+1 Qck . Then u ≤ rk − k − k+1 . Let v ≡ u − k+1 and v ≡ v + k = u + k+1 . Then v ∈ [0,1)Qk , so the defining inequality 10.9.2 implies that (k)
c c ω ∈ Dk c ⊂ (A (k) v,u ) ∪ (Au,v ) ,
and therefore that d(Zu (ω),Zv (ω)) ∧ d(Zu (ω),Zv (ω)) ≤ γk+1 . It follows that d(Zu (ω),Zt (ω)) ≤ (d(Zu (ω),Zv (ω)) + d(Zv (ω),Zt (ω))) ∧ (d(Zu (ω),Zv (ω)) + d(Zv (ω),Zt (ω))) ≤ γk+1 + d(Zv (ω),Zt (ω)) ∨ d(Zv (ω),Zt (ω)) ≤ γk+1 +
k
j =m(n)+1
γj =
k+1
γj ,
j =m(n)+1
where the last inequality is due to inequality 10.9.9 in the induction hypothesis. Thus we have also verified inequality 10.9.14 for each u ∈ [t,rk − k ]Qk+1 Qck . Now suppose u ∈ (rk − k ,rk+1 − k+1 ]Qk+1 . Then rk+1 > rk − k + k+1 = rk − k+1,
468
Stochastic Process
which, by the definition of rk+1 , rules out case (ii ). Hence case (i ) must hold, where rk+1 ≡ rk . Consequently, u ∈ (rk − k ,rk − k+1 ]Qk+1 = {rk − k+1 }, so u = rk − k+1 . Let v ≡ rk − k . Then inequality 10.9.11 implies that d(Zv (ω),Zu (ω)) ≡ d(Zr(k)−(k) (ω),Zr(k)−(k+1) (ω)) ≤ γk+1 .
(10.9.15)
Hence d(Zu (ω),Zt (ω)) ≤ d(Zu (ω),Zv (ω)) + d(Zv (ω),Zt (ω)) ≤ γk+1 + d(Zv (ω),Zt (ω)) ≤ γk+1 +
k
γj =
j =m(n)+1
k+1
γj ,
j =m(n)+1
where the last inequality is due to inequality 10.9.9 in the induction hypothesis. Combining, we see that inequality 10.9.14 holds for each u ∈ [t,rk+1 − k+1 ] Qk+1 . Similarly, we can verify that d(Zu (ω),Zt (ω)) ≤
k+1
γj
(10.9.16)
j =m(n)+1
for each u ∈ [rk+1,t ]Qk+1 . Summing up, inequalities 10.9.9 and 10.9.10 hold for k + 1. Induction is completed. Thus inequalities 10.9.9 and 10.9.10 hold for each k = mn, . . . ,mn+1 . 5. Continuing, let r,s ∈ (t,t )Qm(n+1) be arbitrary with r ≤ s. Write k ≡ mn+1 . Then r,s ∈ (t,t )Qk , and, by the construction in Step 4, we have rk ∈ (t,t ]Qk . Hence there are three possibilities: (i ) r,s ∈ [t,rk − k ]Qk , (ii ) r ∈ [t,rk − k ]Qk and s ∈ [rk ,t ]Qk , or (iii ) r,s ∈ [rk ,t ]Qk . In case (i ), inequality 10.9.9 applies to r,s, yielding d(Zt (ω),Zr (ω)) ∨ d(Zt (ω),Zs (ω)) ≤
k
γj < 2−n−1 < βn+1 .
j =m(n)+1
Hence ω ∈ (At,r ∪ At,s )c ⊂ Dnc . In case (ii ), inequalities 10.9.9 and 10.9.10, apply to r and s, respectively, yielding (n)
(n)
d(Zt (ω),Zr (ω)) ∨ d(Zs (ω),Z (ω)) ≤ t
k
γj < 2−n−1 < βn+1 .
j =m(n)+1
In other words, ω ∈ (At,r ∪ As,t )c ⊂ Dnc . Similarly, in case (iii ), we can prove (n)
that ω ∈
(n) (Ar,t
(n) ∪ As,t )c
(n)
⊂ Dnc .
6. Summing up, we have shown that ω ∈ Dnc where ω ∈ m(n+1) arbitrary. Consequently, Dn ⊂ k=m(n) Dk . Hence
c k=m(n) Dk
m(n+1)
is
a.u. Càdlàg Process P (Dn ) ≤
m(n+1)
k=m(n)
P (Dk )
≤
∞
469 2k αk < 2−n,
k=m(n)
where the second inequality follows from the observation in Step 1, and where n ≥ 0 is arbitrary. This proves inequality 10.9.8 and verifies Condition 1 in Definition 10.4.1 for the sequence (m)n=0,1,... and the process Z. At the same time, since the family F is continuous in probability by hypothesis, Condition 2 of Definition 10.4.1 is also satisfied for F |Q∞ and for Z. We conclude that the family F |Q∞ of f.j.d.’s is D-regular, with the sequence (m)n=0,1,... as modulus of D-regularity. Definition 10.9.2. Right-Hoelder process. Let C0 ≥ 0 and λ > 0 be arbitrary. Let X : [0,1] × (,L,E) → S be an arbitrary a.u. càdlàg process. Suppose, for each ε > 0, there exist (i) δ > 0 and (ii) a measurable subset B ⊂ with P (B c ) < ε, and (iii) for each ω ∈ B, there exists a Lebesgue measurable subset θ (ω) of [0,1] with Lebesgue measure μ θk (ω)c < ε such that for each t ∈ θ (ω) ∩ domain(X(·,ω)) and for each s ∈ [t,t + δ ) ∩ domain(X(·,ω)), we have d(X(t,ω),X(s,ω)) ≤ C0 (s − t)λ . Then the a.u. càdlàg process X is said to be right-Hoelder, with right-Hoelder exponent λ, and with right-Hoelder coefficient C0 . Assertion 3 of the next theorem, concerning right-Hoelder processes, seems hitherto unknown. Assertion 2 is in essence due to [Chentsov 1956]. Theorem 10.9.3. Sufficient condition for a right-Hoelder process. Let u ≥ 0 and w,K > 0 be arbitrary. Let F be an arbitrary consistent family of f.j.d.’s with parameter set [0,1] and state space (S,d). Suppose F is continuous in probability, with a modulus of continuity in probability δCp , and suppose Ft,r,s {(x,y,z) ∈ S 3 : d(x,y) ∧ d(y,z) > b} ≤ b−u (Ks − Kt)1+w
(10.9.17)
for each b > 0 and for each t ≤ r ≤ s in [0,1]. Then the following conditions hold: 1. The family F |Q∞ is D-regular. 2. There exists an a.u. càdlàg process X : [0,1] × → S with marginal distributions given by the family F , and with a modulus of a.u. càdlàg dependent only on u,w and the modulus of continuity in probability δCp . 3. Suppose, in addition, that there exists α > 0 such that ≤ |Ks − Kt|α Ft,s (d)
(10.9.18)
for each t,s ∈ [0,1]. Then there exist constants λ(K,u,w,α) > 0 and C0 (K,u, w,α) > 0 such that each a.u. càdlàg process X : [0,1] × → S
(10.9.19)
470
Stochastic Process
with marginal distribution given by the family F is right-Hoelder, with rightHoelder exponent λ and right-Hoelder constant C0 . Proof. Let u ≥ 0 and w,K > 0, and the family F be as given. 1. Let Z : Q∞ × → S be an arbitrary process with marginal distributions given by F |Q∞ . Define u0 ≡ u + 2−1 and u1 ≡ u + 1. Then γ0 ≡ 2−w/u(0) < γ1 ≡ 2−w/u(1) . Take an arbitrary γ ∈ (γ0,γ1 ) such that the subset k+1 ) A (k) r,s ≡ (d(Zr ,Zs ) > γ
= ((d(Zr ,Zs ))1/(k+1) > γ )
(10.9.20)
of is measurable for each r,s ∈ Q∞ and for each k ≥ 0. Then, since 2−w/v is a strictly increasing continuous function of v ∈ (u0,u1 ) with range (γ0,γ1 ), there exists a unique v ∈ (u0,u1 ) ≡ (u + 2−1,u + 1) such that γ = 2−w/v .
(10.9.21)
0 < γ −u 2−w = 2uw/v−w < 2w−w = 1.
(10.9.22)
Note that 0 < γ < 1 and that
2. Let k ≥ 0 be arbitrary. Define the positive real numbers γk ≡ γ k and αk ≡ K 1+w 2−k γ −(k+1)u 2−kw . Then ∞
γj =
j =0
∞
γj < ∞
j =0
and, in view of inequality 10.9.22, ∞
2 αj = K j
1+w −u
γ
j =0
j =0
Let t ∈ [0,1)Qk be arbitrary. Write Then (k)
∞
(γ −u 2−w )j < ∞.
t
≡ t + k and t ≡ t + k+1 = t − k+1 .
(k)
P (At,t At ,t ) = P (d(Zt ,Zt ) ∧ d(Zt ,Zt ) > γ k+1 ) ≤ γ −(k+1)u (Kt − Kt)1+w = γ −(k+1)u (Kk )1+w
a.u. Càdlàg Process
471
= γ −(k+1)u (K2−k )1+w = K 1+w 2−k γ −(k+1)u 2−kw ≡ αk ,
(10.9.23)
where the inequality follows from inequality 10.9.17 in the hypothesis. 3. Thus we have verified the conditions in Theorem 10.9.1 for the consistent family F |Q∞ of f.j.d.’s to be D-regular, with a modulus of D-regularity (mn )n=0,1,... dependent only on the sequences (αk )k=0,1,... . and (γk )k=0,1,... , which, in turn, depend only on the constants K,u,w. Assertion 1 is proved. 4. Corollary 10.6.2 can now be applied to construct an a.u. càdlàg process X : [0,1]× → S with marginal distributions given by the family F , and with a modulus of a.u. càdlàg depending only on the modulus of D-regularity (mn )n=0,1,... and the modulus of continuity in probability δCp . Thus the process X has a modulus of a.u. càdlàg depending only on the constants K,u,w and the modulus of continuity in probability δCp . Assertion 2 is proved. 5. Proceed to prove Assertion 3. Suppose, in addition, the positive constant α is given and satisfies inequality 10.9.18. Let X : [0,1] × → S be an arbitrary a.u. càdlàg process with marginal distributions given by the family F . Such a process exists by Assertion 2. Consider the D-regular processes Z ≡ X|Q∞ . Then X is equal to the right-limit extension of Z. Use the notations in Steps 1–2. We need to show that X is right-Hoelder. As an abbreviation, define the constants c0 ≡ w + (1 + w) log2 K − log2 (1 − γ −u 2−w ), c ≡ w(1 − uv−1 ) = w − uwv−1 > 0, c1 ≡ − log2 (1 − γ ) > 0, c2 ≡ − log2 γ = wv−1 > 0, κ0 ≡ κ0 (K,u,w) ≡ [(c−1 c0 − 1) ∨ c2−1 (c1 + 1)]1 ≥ 1,
(10.9.24)
and κ ≡ κ(K,u,w) ≡ [c−1 ∨ c2−1 ]1 ≥ 1.
(10.9.25)
6. Define the constant integer valued coefficients b0 ≡ κ0 + (κ + 1)8 + 6 + [2 + α log2 K + (2κ0 + 2(κ + 1)8 + 10)]1 (10.9.26) and b1 ≡ κ0 + (κ + 1) + 6 + [α −1 2(κ + 1)]1 .
(10.9.27)
λ ≡ (κb1 + 2−1 )−1
(10.9.28)
Define
and write, as an abbreviation, η ≡ 2−1 = λ−1 − κb1 .
472
Stochastic Process
7. Define m0 ≡ 0. Let n ≥ 1 be arbitrary. Define mn ≡ κ0 + κn.
(10.9.29)
Then mn ≥ mn−1 + 1. Moreover, ∞
log2
∞
2h αh = log2
h=m(n)
2h K 1+w 2−h γ −(h+1)u 2−hw
h=m(n)
= log2 γ −(m(n)+1)u 2−m(n)w K 1+w (1 − γ −u 2−w )−1 = log2 2(w/v)(m(n)+1)u 2−m(n)w K 1+w (1 − γ −u 2−w )−1 = −(mn + 1)(w − uwv−1 ) + w + (1 + w) log2 K − log2 (1 − γ −u 2−w ) ≡ −(κ0 + κn + 1)c + c0 < −((c−1 c0 − 1) + κn + 1)c + c0 = −κnc < −c−1 nc = −n. where the last two inequalities are thanks to the defining equalities 10.9.24 and 10.9.25. Hence ∞
2k αk < 2−n . (10.9.30) k=m(n)
Similarly, log2
∞
k=m(n)+1
γk < log2
∞
γ k = log2 γ m(n) (1 − γ )−1
k=m(n)
= mn log2 γ − log2 (1 − γ ) ≡ −(κ0 + κn)c2 + c1 = −κ0 c2 − κc2 n + c1 < −c2−1 (c1 + 1)c2 − κc2 n + c1 = −1 − κc2 n < −1 − n, where the first inequality is thanks to equality 10.9.24. Hence ∞
γk < 2−n−1 .
(10.9.31)
k=m(n)+1
In view of inequalities 10.9.23, 10.9.30, and 10.9.31, Theorem 10.9.1 applies, and says that the sequence m ≡ (mn )n=0,1,... is a modulus of D-regularity of the family F |Q∞ and of the process Z. 8. By the hypothesis of Assertion 3, the operation δ Cp defined by δ Cp (ε) ≡ K −α ε
(10.9.32)
for each ε > 0 is a modulus of continuity in probability of the family F of f.j.d.’s and of the D-regular process
a.u. Càdlàg Process
473
Z : Q∞ × → S.
(10.9.33)
9. Theorem 10.5.8 says that the right-limit extension X of Z has a modulus of a.u. càdlàg δaucl defined as follows. Let ε > 0 be arbitrary. Let n ≥ 0 be so large that 2−n+9 < ε. Let N > mn + n + 6 be so large that m(N ) ≡ 2−m(N ) < 2−2 δ Cp (2−2m(n)−2n−10 ).
(10.9.34)
Define δaucl (ε,m,δ Cp ) ≡ m(N ) .
(10.9.35)
10. Now fix k ≡ [2 − 2 log2 (1 − 2−η )]1 . Let k ≥ k be arbitrary, but fixed until further notice. Then k − 2 > 2 log2 (1 − 2−η ). Write εk ≡ 2−k+2 , δk ≡ 2−(k−2)η , nk ≡ k + 8, and β ≡ (2κ0 + 2(κ + 1))α −1 . Then 2−n(k)+9 < εk . Define Nk ≡ b0 + b1 k.
(10.9.36)
Then Nk > b0 > κ0 + (κ + 1)(k + 8) + 6 = κ0 + (κ + 1)nk + 6 = κ0 + κnk + nk + 6 = mn(k) + nk + 6 At the same time, Nk ≡ b0 + b1 k > 2 + α log2 K + (2κ0 + 2(κ + 1)8 + 10) + 2(κ + 1)k = 2 + α log2 K + (2κ0 + 2(κ + 1)(k + 8) + 10) = 2 + α log2 K + (2κ0 + 2κnk + 2nk + 10) = 2 + α log2 K + (2mn(k) + 2nk + 10). Hence −m(Nk ) ≤ −Nk < −2 − α log2 K − (2mn(k) + 2nk + 10). Consequently, m(N (k)) ≡ 2−m(N (k)k ) < 2−2 K −α 2−2m(n(k))−2n(k)−10 = 2−2 δ Cp (2−2m(n(k))−2n(k)−10 ), where the last equality is from the defining equality 10.9.32. Therefore
(10.9.37)
474
Stochastic Process δaucl (εk ,m,δ Cp ) ≡ m(N (k))
according to the defining equality 10.9.35. 11. Hence, by Condition 3 in Definition 10.3.2 of a modulus of a.u. càdlàg, there exist an integer hk ≥ 1, a measurable set Ak with P (Ack ) < εk ≡ 2−k+2, and a sequence of r.r.v.’s 0 = τk,0 < τk,1 < · · · < τk. h(k)−1 < τk, h(k) = 1,
(10.9.38)
such that for each i = 0, . . . , hk − 1, the function Xτ (k,i) is an r.v., and such that for each ω ∈ Ak , we have h(k)−1 >
(τk,i+1 (ω) − τk,i (ω)) ≥ δaucl (εk ,m,δ Cp )
(10.9.39)
i=0
with d(X(τk,i (ω),ω),X(·,ω)) ≤ εk
(10.9.40)
on the interval θk,i (ω) ≡ [τk,i (ω),τk,i+1 (ω)) or θk,i (ω) ≡ [τk,i (ω),τk,i+1 (ω)] hk − 1. depending on whether i ≤ hk − 2 or i = hk − 1, define the subinterval 12. Consider each ω ∈ Ak For each i = 0, . . . , θ k,i (ω) ≡ [τk,i (ω),τk,i+1 (ω) − δk · (τk,i+1 (ω) − τk,i (ω))] of θk,i (ω), with Lebesgue measure μθ k,i (ω) = (1 − δk )(τk,i+1 (ω) − τk,i (ω)). Note that the intervals θk,0 (ω), . . . ,θk, h(k)−1 (ω) are mutually exclusive. Therefore ⎛ ⎞c h(k)−1 h(k)−1
μ⎝ θ k,i (ω)⎠ = δk · (τk,i+1 (ω) − τk,i (ω)) = δk . (10.9.41) i=0
i=0
13. Define the measurable subset Bk ≡
∞
Ah .
(10.9.42)
h=k
Then P (Bkc ) ≤
∞
P (Ach )
0 be arbitrary. Let G : [0,1] → [0,1] be a nondecreasing continuous function. Let F be an arbitrary consistent family of f.j.d.’s with parameter set [0,1] and state space (S,d). Suppose F is continuous in probability, and suppose Ft,r,s {(x,y,z) ∈ S 3 : d(x,y) ∧ d(y,z) > b} ≤ b−u (G(s) − G(t))1+w (10.9.53)
a.u. Càdlàg Process
477
for each b > 0 and for each t ≤ r ≤ s in [0,1]. Then the following conditions hold: 1. There exists an a.u. càdlàg process Y : [0,1] × → S with marginal distributions given by the family F . 2. Suppose, in addition, that there exists α > 0 such that ≤ |G(s) − G(t)|α Ft,s (d)
(10.9.54)
for each t,s ∈ [0,1]. Then Y (r,ω) = X(h(r),ω) for each (r,ω) ∈ domain(Y ), for some right-Hoelder process X and for some continuous increasing function h : [0,1] → [0,1]. Proof. Write a0 ≡ G(0) and a1 ≡ G(1). Write K ≡ a1 − a0 + 1 > 0. Define the continuous increasing function h : [0,1] → [0,1] by h(r) ≡ K −1 (G(r) − a0 + r) for each r ∈ [0,1]. Then its inverse g ≡ h−1 is also continuous and increasing. Moreover, for each s,t ∈ [0,1] with t ≤ s, we have G(g(s)) − G(g(t)) = (Kh(g(s)) + a0 − g(s)) − (Kh(g(t)) + a0 − g(t)) = (Ks − Kt) − (g(s) − g(t)) ≤ Ks − Kt.
(10.9.55)
1. Since the family F of f.j.d.’s is, by hypothesis, continuous in probability, the reader can verify that the singleton set {F } satisfies the conditions in the hypothesis of Theorem 7.1.7. Accordingly, there exists a process V : [0,1] × → S with marginal distributions given by the family F . Define the function U : [0,1] × → S by domain(U ) ≡ {(t,ω) : (g(t),ω) ∈ domain(V )} and by U (t,ω) ≡ V (g(t),ω)
(10.9.56)
for each (t,ω) ∈ domain(U ). Then U (t,·) ≡ V (g(t),·) is an r.v. for each t ∈ [0,1]. Thus U is a stochastic process. Let F denote the family of its marginal distributions. Then, for each b > 0 and for each t ≤ r ≤ s in [0,1], we have {(x,y,z) ∈ S 3 : d(x,y) ∧ d(y,z) > b) Ft,r,s
= P (d(Ut ,Ur ) ∧ d(Ur ,Us ) > b) ≡ P (d(Vg(t),Vg(r) ) ∧ d(Vg(r),Vg(s) ) > b) = Fg(t),g(r),g(s) {(x,y,z) ∈ S 3 : d(x,y) ∧ d(y,z) > b) ≤ b−u (G(g(s)) − G(g(t)))1+w ≤ b−u (Ks − Kt)1+w, where the next-to-last inequality follows from inequality 10.9.53 in the hypothesis, and the last inequality is from inequality 10.9.55. Thus the family F and the constants K,u,w satisfy the hypothesis of Theorem 10.9.3. Accordingly, there
478
Stochastic Process
exists an a.u. càdlàg process X : [0,1] × → S with marginal distributions given by the family F . Now define a process Y : [0,1] × → S by domain(Y ) ≡ {(r,ω) : (h(r),ω) ∈ domain(X)} and by Y (r,ω) ≡ X(h(r),ω) for each (rω) ∈ domain(Y ). Because the function h is continuous, it can be easily verified that the process Y is a.u. càdlàg. Moreover, in view of the defining equality 10.9.56, we have V (r,ω) = U (h(r),ω) for each (r,ω) ∈ domain(V ). Since the processes X and U share the same marginal distributions given by the family F , the last two displayed equalities imply that the processes Y and V share the same marginal distributions. Since the process V has marginal distributions given by the family F , so does the a.u. càdlàg process Y . Assertion 1 of the corollary is proved. 2. Suppose, in addition, that there exists α > 0 such that inequality 10.9.54 holds for each t,s ∈ [0,1]. Then g(t),Vg(s) )) = Fg(t),g(s) (d) (d) = E(d(V Ft,s
≤ |G(g(s)) − G(g(t))|α ≤ |Ks − Kt|α , where the first inequality is from inequality 10.9.54 in the hypothesis, and where the last inequality is from inequality 10.9.55. Thus the family F of marginal distributions of the process X and the constants K,α, satisfy the hypothesis of Assertion 3 of Theorem 10.9.3. Accordingly, the a.u. càdlàg process X : [0,1] × → S is right-Hoelder. The corollary is proved.
10.10 a.u. Càdlàg Process on [0,∞) In the preceding sections, a.u. càdlàg processes are constructed with the unit interval [0,1] as the parameter set. We now generalize to the parameter interval [0,∞) by constructing the process piecewise on unit subintervals [M,M + 1], for M = 0,1, . . ., and then stitching the results back together. For later reference, we will state several related definitions and will extend an arbitrary D-regular process on Q∞ to an a.u. càdlàg process on [0,∞). Recall here that Q∞ and Q∞ ≡ {u0,u1, . . .} are the enumerated sets of dyadic rationals in [0,1] and [0,∞), respectively. Let (,L,E) be a sample space. Let (S,d) be a locally compact metric space, which will serve as the state space. Let ξ ≡ (Aq )q=1,2,... be a binary approximation of (S,d). Recall that D[0,1] stands for the space of càdlàg functions on [0,1], and that D[0,1] stands for the space of a.u. càdlàg processes on [0,1].
a.u. Càdlàg Process
479
Definition 10.10.1. Skorokhod metric space of càdlàg functions on [0,∞). Let x : [0,∞) → S be an arbitrary function whose domain contains the enumerated set Q∞ of dyadic rationals. Let M ≥ 0 be arbitrary. Then the function x is said to be càdlàg on the interval [M,M + 1] if the shifted function x M : [0,1] → S, defined by x M (t) ≡ x(M + t) for each t with M + t ∈ domain(x), is a member of D[0,1]. The function x : [0,∞) → S is said to be càdlàg if it is càdlàg on the interval [M,M + 1] for each M ≥ 0. We will write D[0,∞) for the set of càdlàg functions on [0,∞). Recall, from Definition 10.2.1, the Skorokhod metric dD[0,1] on D[0,1]. Define the Skorokhod metric on D[0,∞) by dD[0,∞) (x,y) ≡
∞
2−M−1 (1 ∧ dD[0,1] (x M ,y M ))
M=0
for each x,y ∈ D[0,∞). We will call (D[0,∞),dD[0,∞) ) the Skorokhod space on [0,∞). The reader can verify that this metric space is complete. Definition 10.10.2. D-regular processes with parameter set Q∞ . Let Z : Q∞ × → S be a stochastic process. Recall from Definition 10.4.1 the metric Dreg (Q∞ × ,S), ρP rob,Q(∞) ) of D-regular processes with parameter space (R set Q∞ . Suppose, for each M ≥ 0, (i) the process Z|Q∞ [0,M + 1] is continuous in probability, with a modulus of continuity in probability δ Cp,M , and (ii) the shifted process Z M : Q∞ × → S, defined by Z M (t) ≡ Z(M + t) for each t ∈ Q∞ , is Dreg (Q∞ × ,S) with a modulus of D-regularity mM . a member of the space R Then the process Z : Q∞ × → S is said to be D-regular, with a modulus of continuity in probability δ Cp ≡ (δ Cp,M )M=0,1,... and a modulus of D-regularity Dreg (Q∞ × ,S) denote the set of all D-regular m ≡ (mM )M=0,1,... . Let R Dreg,δ (Cp), m processes with parameter set Q∞ . Let R . (Q∞ × ,S) denote the subset whose members share the common modulus of continuity in probability δ Cp and the common modulus of D-regularity m ≡ (mM )M=0,1,... . If, in addition, δ Cp,M = δ Cp,0 and mM = m0 for each M ≥ 0, then we say that the process Z time-uniformly D-regular on Q∞ . Definition 10.10.3. Metric space of a.u. càdlàg processes on [0,∞). Let X : [0,∞) × (,L,E) → (S,d) be an arbitrary process. Suppose, for each M ≥ 0, (i) the process X|[0,M + 1] is continuous in probability, with a modulus of continuity in probability δ Cp,M , and (ii) the shifted process XM : [0,1] × → S, defined by X(t) ≡ X(M + t) for each t ∈ [0,∞), is a member of the space D[0,1] with some modulus of a.u. M càdlàg δaucl . Then the process X is said to be a.u. càdlàg on the interval [0,∞), with a M ) modulus of a.u. càdlàg δaucl ≡ (δaucl M=0,1,... and a modulus of continuity in
480
Stochastic Process
0 0 M ) M M probability δCp ≡ (δCp M=0,1,... . If, in addition, δaucl = δaucl and δCp = δCp for each M ≥ 0, then we say that the process is time-uniformly a.u. càdlàg on the interval [0,∞). We will write D[0,∞) for the set of a.u. càdlàg processes on [0,∞), and equip defined by it with the metric ρ D[0,∞)
(X,X ) ≡ ρP rob,Q(∞) (X|Q∞,X |Q∞ ) ρ D[0,∞) where, according to Definition 6.4.2, we have for each X,X ∈ D[0,∞), ρP rob,Q(∞) (X|Q∞,X |Q∞ ) ≡ E
∞
2−n−1 (1 ∧ d(Xu(n),Xu(n) )).
(10.10.1)
n=0
Thus (X,X ) ≡ E ρ D[0,∞)
∞
2−n−1 (1 ∧ d(Xu(n),Xu(n) )
n=0
for each X,X ∈ D[0,∞). δ (aucl),δ (Cp) [0,∞) denote the subspace of the metric space (D[0,∞), Let D ) whose members share a common modulus of continuity in probability ρ D[0,∞) δCp ≡ (δ Cp,M )M=0,1,... and share a common modulus of a.u. càdlàg δaucl ≡ (δaucl (·,mM ,δCp,M ))M=0,1,... . Definition 10.10.4. Random càdlàg function from an a.u. càdlàg process on [0,∞). Let X ∈ D[0,∞) be arbitrary. Define a function X∗ : → D[0,∞) by domain(X∗ ) ≡ {ω ∈ : X(·,ω) ∈ D[0,∞)} and by X∗ (ω) ≡ X(·,ω) for each ω ∈ domain(X∗ ). We will call X∗ the random càdlàg function from the a.u. càdlàg process X. The reader can prove that the function X∗ is a welldefined r.v. with values in the Skorokhod metric space (D[0,∞),dD[0,∞) ) of càdlàg functions. Lemma 10.10.5. a.u. Continuity implies a.u. càdlàg, on [0,∞). Let X : [0,∞) × (,L,E) → (S,d) be an arbitrary a.u. continuous process. Then X is a.u. càdlàg. If, in addition, X is time-uniformly a.u. continuous, then X is timeuniformly a.u. càdlàg. Proof. Straightforward and omitted.
a.u. Càdlàg Process
481
Corollary 10.10.6. Brownian motion in R m is time-uniformly a.u. continuous on [0,∞). Let B : [0,∞) × (Ω,L,E) → R m be an arbitrary Brownian motion. Then B is time-uniformly a.u. continuous, and hence time-uniformly a.u. càdlàg. Proof. Straightforward and omitted.
In the following, recall the right-limit extension mappings rLim and rLim from Definition 10.5.6. Lemma 10.10.7. The right-limit extension of a D-regular process on Q∞ is continuous in probability on [0,∞). Suppose Z : Q∞ × (,L,E) → (S,d) is an arbitrary D-regular process on Q∞ , with a modulus of continuity in probability δCp ≡ (δ Cp,M )M=0,1,... . Then the right-limit extension X ≡ rLim (Z) : [0,∞) × (,L,E) → (S,d) of Z is a well-defined process that is continuous in probability, with the same modulus of continuity in probability δCp ≡ (δ Cp,M )M=0,1,... as Z. Proof. Let N ≥ 0 be arbitrary. Consider each t,t ∈ [0,N + 1]. Let (rk )k=1,2,... be a sequence in [0,N + 1]Q∞ such that rk ↓ t. Since the process Z is continuous in probability on [0,N + 1]Q∞ , with a modulus of continuity in probability δ Cp,N , it follows that E(1 ∧ d(Zrk),Zr(h) )) → 0 as k,h → 0. By passing to a subsequence if necessary, we can assume E(1 ∧ d(Zr(k),Zr(k+1) )) ≤ 2−k for each k ≥ 1. Then E(1 ∧ d(Zr(k),Zr(h) )) ≤ 2−k+1
(10.10.2)
for each h ≥ k ≥ 1. It follows that Zr(k) → Ut a.u. for some r.v. Ut . Consequently, Zrk) (ω) → Ut (ω) for each ω in some full set B. Consider each ω ∈ B. Since X ≡ rLim (Z) and since rk ↓ t with Zr(k) (ω) → Ut (ω), we see that ω ∈ domain(Xt ) and Xt (ω) = Ut (ω). In short, Xt = Ut a.s. Consequently, Xt is an r.v., where t ∈ [0,N +1] and N ≥ 0 are arbitrary. Since [0,∞) = ∞ M=0 [0,M +1], it follows that Xu is an r.v. for each u ∈ [0,∞). Thus X : [0,∞) × (,L,E) → (S,d) is a well-defined process. Letting h → ∞ in equality 10.10.2, we obtain E(1 ∧ d(Zr(k),Xt )) = E(1 ∧ d(Zr(k),Ut )) ≤ 2−k+1 for each k ≥ 1. Similarly, we can construct a sequence (rk )k=1,2,... in [0,N +1]Q∞ such that rk ↓ t and such that
482
Stochastic Process E(1 ∧ d(Zr (k),Xt )) ≤ 2−k+1 .
Now let ε > 0 be arbitrary, and suppose t,t ∈ [0,N + 1] are such that |t − t | < δ Cp,N (ε). Then |rk − rk | < δ Cp,N (ε), whence E(1 ∧ d(Zr(k),Zr (k) )) ≤ ε for sufficiently large k ≥ 1. Combining, we obtain E(1 ∧ d(Xt ,Xt )) ≤ 2−k+1 + ε + 2−k+1 for sufficiently large k ≥ 1. Hence E(1 ∧ d(Xt ,Xt )) ≤ ε, where t,t ∈ [0,N +1] are arbitrary with |t −t | < δ Cp,N (ε). Thus we have verified that the process X is continuous in probability, with a modulus of continuity in probability δCp ≡ (δCp.N )N =0,1,... , which is the same as the modulus of continuity in probability of the given D-regular process. Theorem 10.10.8. The right-limit extension of a D-regular process on Q∞ is an a.u. càdlàg process on [0,∞). Suppose Z : Q∞ × (,L,E) → (S,d) is an arbitrary D-regular process on Q∞ , with a modulus of continuity in ≡ probability δCp ≡ (δ Cp,M )M=0,1,... , and with a modulus of D-regularity m (mM )M=0,1,... . In symbols, suppose Dreg,δ (Cp), m Z∈R . (Q∞ × ,S). Then the right-limit extension X ≡ rLim (Z) : [0,∞) × (,L,E) → (S,d) of Z is an a.u. càdlàg process, with the same modulus of continuity in probability δCp ≡ (δ Cp,M )M=0,1,... as Z, and with a modulus of a.u. càdlàg δaucl ≡ δaucl ( m, δCp ). In other words, δ (aucl, m X ≡ rLim (Z) ∈ D , δ (Cp)), δ (Cp) [0,∞) ⊂ D[0,∞). Proof. 1. Lemma 10.10.7 says that the process X is continuous in probability, with the same modulus of continuity in probability δCp ≡ (δ Cp,M )M=0,1,... as Z. Let N N ≥ 0 be arbitrary. Then Z : Q∞ × (,L,E) → (S,d) is a D-regular process with a modulus of continuity in probability δ Cp,N and a modulus of D-regularity mN . Theorem 10.5.8 therefore implies that the right-limit extension process YN ≡ rLim (Z N ) : [0,1] × (,L,E) → (S,d) is a.u. càdlàg, with the same modulus of continuity in probability δ Cp,N and a modulus of a.u. càdlàg δaucl (·,mN ,δCp,N ). Separately, Proposition 10.5.7 implies that the process YN is continuous a.u., with a modulus of continuity
a.u. Càdlàg Process
483
a.u. δcau (·,mN ,δCp,N ). Here the reader is reminded that continuity a.u. is not to be confused with the stronger condition of a.u. continuity. Note that YN = Z N on Q∞ and X = Z on Q∞ . 2. Let k ≥ 1 be arbitrary. Define δk ≡ δcau (2−k ,mN ,δCp,N ) ∧ δcau (2−k ,mN +1,δCp,N +1 ) ∧ 2−k . Then, since δcau (·,mN ,δCp,N ) is a modulus of continuity a.u. of the process YN , there exists, according to Definition 6.1.2, a measurable set D1,k ⊂ domain(YN,1 ) c ) < 2−k such that for each ω ∈ D with P (D1,k 1,k and for each r ∈ domain (YN (·,ω)) with |r − 1| < δk , we have d(YN (r,ω),Z(N + 1,ω)) = d(YN (r,ω),YN (1,ω)) ≤ 2−k .
(10.10.3)
Likewise, since δcau (·,mN +1,δCp,N+1 ) is a modulus of continuity a.u. of the process YN +1 , there exists, according to Definition 6.1.2, a measurable set D0,k ⊂ c ) < 2−k such that for each ω ∈ D domain(YN +1,0 ) with P (D0,k 0,k and for each r ∈ domain(YN +1 (0,ω)) with |r − 0| < δk , we have d(YN +1 (r,ω),Z(N + 1,ω)) = d(YN +1 (r,ω),YN +1 (0,ω)) ≤ 2−k . (10.10.4) ∞ c −k+2 . Define Dk+ ≡ ∞ h=k D1,h D0,h and B ≡ κ=1 Dκ+. Then P (Dk+ ) < 2 Hence P (B) = 1. In words, B is a full set. 3. Consider each t ∈ [N,N + 1). Since YN ≡ rLim (Z N ) and X ≡ rLim (Z), we have A B lim ZsN (ω) exists domain(YN,t−N ) ≡ ω ∈ : A = ω∈: A = ω∈: A = ω∈: A = ω∈:
s→t−N ;s∈[t−N,∞)Q(∞)
B
lim
ZsN (ω) exists
lim
Z(N + s,ω) exists
s→t−N ;s∈[t−N,1]Q(∞) s→t−N ;s∈[t−N,1]Q(∞)
lim
r→t;r∈[t,N +1]Q(∞)
lim
B
B
Z(r,ω) exists B
Z(r,ω) exists
r→t;r∈[t,∞)Q(∞)
≡ domain(Xt ), because each limit that appears in the previous equality exists iff all others exist, in which case they are equal. Hence Xt (ω) =
lim r→t;r∈[t,∞)Q(∞)
Z(r,ω) =
lim
s→t−N ;s∈[t−N,∞)Q(∞)
ZsN (ω) = YN,t−N (ω)
for each ω ∈ domain(YN,t−N ). Thus the two functions Xt and YN,t−N have the same domain and have equal values on the common domain. In short, Xt = YN,t−N ,
(10.10.5)
484
Stochastic Process
where t ∈ [N,N +1) is arbitrary. As for the endpoint t = N +1, we have, trivially, XN +1 = ZN +1 = Z1N = YN,1 = YN,(N +1)−1 . Hence Xt = YN,t−N
(10.10.6)
for each t ∈ [N,N + 1) ∪ {N + 1}. 4. We wish to extend equality 10.10.6 to each t ∈ [N,N + 1]. To that end, consider each t ∈ [N,N + 1]. We will prove that Xt = YN,t−N
(10.10.7)
on the full set B. 5. Let ω ∈ B be arbitrary. Suppose ω ∈ domain(Xt ). Then, since X ≡ rLim (Z), the limit limr→t;r∈[t,∞)Q(∞) Z(r,ω) exists and is equal to Xt (ω). Consequently, each of the following limits lim
r→t;r∈[t,N +1]Q(∞)
Z(r,ω) = = =
lim
Z(N + s,ω)
lim
Z(N + s,ω)
lim
ZsN (ω)
N +s→t;s∈[t−N,1]Q(∞) s→t−N ;s∈[t−N,1]Q(∞) s→t−N ;s∈[t−N,1]Q(∞)
exists and is equal to Xt (ω). Since YN ≡ rLim (Z N ), the existence of the last limit implies that ω ∈ domain(YN,t−N ) and YN,t−N (ω) = Xt (ω). Thus domain(Xt ) ⊂ domain(YN,t−N )
(10.10.8)
Xt = YN,t−N
(10.10.9)
and
on domain(Xt ). We have proved half of the desired equality 10.10.7. 6. Conversely, suppose ω ∈ domain(YN,t−N ). Then y ≡ YN,t−N (ω) ∈ S is defined. Hence, since YN ≡ rLim (Z N ), the limit lim
s→t−N ;s∈[t−N,∞)Q(∞)
Z N (s,ω)
exists and is equal to y. Let ε > 0 be arbitrary. Then there exists δ > 0 such that d(Z N (s,ω),y) < ε for each s ∈ [t − N,∞)Q∞ = [t − N,∞)[0,1]Q∞
a.u. Càdlàg Process
485
such that s − (t − N ) < δ . In other words, d(Z(u,ω),y) < ε
(10.10.10)
for each u ∈ [t,t + δ )[N,N + 1]Q∞ . 7. Recall the assumption that ω ∈ B ≡ ∞ κ=1 Dκ+. . Therefore there exists ∞ some κ ≥ 1 such that ω ∈ Dκ+ ≡ k=κ D1,k D0,k . Take k ≥ κ so large that 2−k < ε. Then ω ∈ D1,k D0,k . Therefore, for each r ∈ domain(YN +1 (0,ω)) with |r − 0| < δk , we have, according to inequality 10.10.4, d(YN +1 (r,ω),Z(N + 1,ω)) ≤ 2−k < ε.
(10.10.11)
Similarly, for each r ∈ domain(YN (·,ω)) with |r − 1| < δk , we have, according to inequality 10.10.3, d(YN (r,ω),Z(N + 1,ω)) ≤ 2−k < ε.
(10.10.12)
8. Now let u,v ∈ [t,t + δk ∧ δ )Q∞ be arbitrary with u < v. Then u,v ∈ [t,t + δk ∧ δ )[N,∞)Q∞ . Since u,v are dyadic rationals, there are three possibilities: (i) u < v ≤ N + 1, (ii) u ≤ N + 1 < v, or (iii) N + 1 < u < v. Consider case (i). Then u,v ∈ [t,t + δ )[N,N + 1]Q∞ . Hence inequality 10.10.10 applies to u and v and yields d(Z(u,ω),y) ∨ d(Z(v,ω),y) < ε, whence d(Z(u,ω),Z(v,ω)) < 2ε. Next consider case (ii). Then |(u − N ) − 1| < v − t < δk . Hence, by inequality 10.10.12, we obtain d(Z(u,ω),Z(N + 1,ω)) ≡ d(Z N (u − N,ω),Z(N + 1,ω)) = d(YN (u − N,ω),Z(N + 1,ω)) < ε.
(10.10.13)
Similarly, |(v − (N + 1)) − 0| < v − t < δk . Hence, by inequality 10.10.11, we obtain d(Z(v,ω),Z(N + 1,ω)) ≡ d(Z N +1 (v − (N + 1),ω),Z(N + 1,ω)) = d(YN +1 (v − (N + 1),ω),Z(N + 1,ω)) < ε. (10.10.14) Combining 10.10.13 and 10.10.14, we obtain d(Z(u,ω),Z(v,ω)) < 2ε in case (ii) as well. Now consider case (iii). Then |(u − (N + 1)) − 0| < v − t < δk and |(v − (N + 1)) − 0| < v − t < δk . Hence inequality 10.10.11 implies
486
Stochastic Process d(Z(u,ω),Z(N + 1,ω)) = d(Z N +1 (u − (N + 1),ω),Z(N + 1,ω)) = d(YN +1 (u − (N + 1),ω),Z(N + 1,ω)) < ε (10.10.15)
and, similarly, d(Z(v,ω),Z(N + 1,ω)) < ε.
(10.10.16)
Hence d(Z(u,ω),Z(v,ω)) < 2ε in case (iii) as well. 9. Summing up, we see that d(Z(u,ω),Z(v,ω)) < 2ε for each u,v ∈ [t,t + δk ∧ δ )Q∞ with u < v. Since ε > 0 is arbitrary, we conclude that limu→t;u∈[t,∞)Q(∞) Z(u,ω) exists. Thus (t,ω) ∈ domain( rLim (Z)) ≡ domain(X). In other words, we have ω ∈ domain(Xt ), where ω ∈ B ∩ domain(YN,t−N ) is arbitrary. Hence B ∩ domain(YN,t−N ) ⊂ domain(Xt ) ⊂ domain(YN,t−N ), where the second inclusion is by relation 10.10.8 in Step 5. Consequently, B ∩ domain(YN,t−N ) = B ∩ domain(Xt ), while, according to equality 10.10.9, N Xt−N = Xt = YN,t−N
(10.10.17)
N = on B ∩ domain(Xt ), In other words, on the full subset B, we have Xt−N N YN,t−N for each t ∈ [N,N + 1]. Equivalently, X = YN on the full subset B. Since the process
YN ≡ rLim (Z N ) : [0,1] × (,L,E) → (S,d) is a.u. càdlàg with a modulus of a.u. càdlàg δaucl (·,mN ,δCp,N ), the same is true for the process XN . Thus, by Denition 10.10.3, the process X is a.u. càdlàg, with a modulus of continuity in probability δCp ≡ (δ Cp,M )M=0,1,... , and a modulus of a.u. càdlàg δaucl ≡ (δaucl (·,mM ,δCp,M ))M=0,1,... . In other words, δ (aucl),δ(Cp) [0,∞). X∈D The theorem is proved. The next theorem is straightforward, and is proved here for future reference.
a.u. Càdlàg Process
487
Theorem 10.10.9. rLim is an isometry on a properly restricted domain. Dreg,δ (Cp), m Recall from Definition 10.10.2 the metric space (R . (Q∞ × ,S), ρP rob,Q(∞) ) of D-regular processes whose members Z share a given modulus of continuity in probability δCp ≡ (δ Cp,M )M=0,1,... as well as a given modulus of D-regularity m ≡ (mM )M=0,1,... . ) of a.u. Recall from Definition 10.10.3 the metric space (D[0,∞), ρD[0,∞) càdlàg processes on [0,∞), where (X,X ) ≡ ρP rob,Q(∞) (X|Q∞,X |Q∞ ) ρ D[0,∞) for each X,X ∈ D[0,∞). Then the mapping Dreg,δ (Cp), m rLim : (R ρP rob,Q(∞) ) . (Q∞ × ,S), δ (aucl, m ρD[0,∞) ) →D , δ (Cp)), δ (Cp) [0,∞) ⊂ (D[0,∞), is a well-defined isometry on its domain, where the modulus of a.u. càdlàg δaucl ≡ δaucl ( m, δCp ) is defined in the following proof. Dreg,δ (Cp), m Proof. 1. Let Z ∈ R . (Q∞ × ,S) be arbitrary. In other words, Z, : Q∞ × (,L,E) → (S,d) is a D-regular process, with a modulus of continuity in probability δCp ≡ ≡ (mM )M=0,1,... . Consider (δ Cp,M )M=0,1,... and a modulus of D-regularity m each N ≥ 0. Then the shifted processes Z N : Q∞ × (,L,E) → (S,d) is D-regular, with a modulus of continuity in probability δ Cp,N and a modulus of D-regularity mN . In other words, Dreg,m,δ(Cp) (Q∞ × ,S), ρP rob,Q(∞) ). Z N ∈ (R Hence, by Theorem 10.6.1, the process Y N ≡ rLim (Z N ) is a.u. càdlàg, with a modulus of continuity in probability δ Cp,N and a modulus of a.u. càdlàg δaucl (·,mN ,δ Cp,N ). It is therefore easily verified that X ≡ rLim (Z) is a welldefined process on [0,∞), with XN ≡ rLim (Z)N = rLim (Z N ) Hence the function for each N ≥ 0. In other words, rLim (Z) ≡ X ∈ D[0,∞). rLim is well defined on RDreg,δ (Cp), m (Q × ,S). In other words, ∞ . δ (aucl,),δ (Cp) [0,∞), X ≡ rLim (Z) ∈ D δaucl ≡ (δaucl (·,mM ,δCp,M ))M=0,1,... . where δCp ≡ (δ Cp,M )M=0,1,... and 2. It remains to prove that the function rLim is uniformly continuous on its Dreg,δ (Cp), m domain. To that end, let Z,Z ∈ R . (Q∞ × ,S) be arbitrary. Define X ≡ rLim (Z) and X ≡ rLim (Z ) as in Step 1. Then
488
Stochastic Process
ρ D[0,∞) (X,X ) ≡ E
∞
2−n−1 (1 ∧ d(Xu(n),Xu(n) )
n=0
=E
∞
2−n−1 (1 ∧ d(Zu(n),Zu(n) )≡ρ P rob,Q(∞) (Z,Z ).
n=0
Hence the function rLim is an isometry on its domain.
10.11 First Exit Time for a.u. Càdlàg Process In this section, let X : [0,∞) × (,L,E) → (S,d) be an arbitrary a.u. càdlàg process that is adapted to some right continuous filtration L. Let f be an arbitrary bounded and uniformly continuous function on (S,d). In symbols, f ∈ Cub (S,d). Definition 10.11.1. First exit time. Let N ≥ 1 be arbitrary. Suppose τ is a stopping time relative to L, with values in (0,N], such that the function Xτ is a well-defined r.v. relative to L(τ ) , where L(τ ) is the probability subspace first introduced in Definition 8.1.9. Suppose also that for each ω in some full set, we have (i) f (X(·,ω)) < a on the interval [0,τ (ω)) and (ii) f (X(τ (ω),ω)) ≥ a if τ (ω) < N . Then we say that τ is the first exit time in [0,M] of the open subset ( f < a) by the process X, and define τ f ,a,N ≡ τ f ,a,N (X) ≡ τ . Note that there is no requirement that the process X ever actually exits ( f < a). Observation stops at time N if exit does not occur by then. The next lemma makes precise some intuition. Lemma 10.11.2. Basics of first exit times. Let a ∈ (a0,∞) be such that the first exit times τ f ,a,N ≡ τ f ,a,N (X) and τ f ,a,M ≡ τ f ,a,M (X) exist for some N ≥ M. Then the following conditions hold: 1. τ f ,a,M ≤ τ f ,a,N . 2. (τ f ,a,M < M) ⊂ (τ f ,a,N = τ f ,a,M ). 3. (τ f ,a,N ≤ r) = (τ f ,a,M ≤ r) for each r ∈ (0,M). Proof. 1. Let ω ∈ domain(τ f ,a,M ) ∩ domain(τ f ,a,N ) be arbitrary. For the sake of a contradiction, suppose t ≡ τ f ,a,M (ω) > s ≡ τ f ,a,N (ω). Then s < τ f ,a,M (ω). Hence we can apply Condition (i) of Definition 10.11.1 to the first exit time τ f ,a,M , obtaining f (Xs (ω)) < a. At the same time, τ f ,a,N (ω) < t ≤ N. Hence we can apply Condition (ii) to the first exit time τ f ,a,N , obtaining f (Xτ ( f ,a,N ) (ω)) ≥ a. In other words, f (Xs (ω)) ≥ a, which is a contradiction. We conclude that τ f ,a,N (ω) ≥ τ f ,a,M (ω), where ω ∈ domain(τ f ,a,M ) ∩ domain(τ f ,a,M ) is arbitrary. Assertion 1 is proved.
a.u. Càdlàg Process
489
2. Next, suppose t ≡ τ f ,a,M (ω) < M. Then Condition (ii) of Definition 10.11.1 implies that f (Xt (ω)) ≥ a. For the sake of a contradiction, suppose t < τ f ,a,N (ω). Then Condition (i) of Definition 10.11.1 implies that f (Xt (ω)) < a, which is a contradiction. We conclude that τ f ,a,M (ω) ≡ t ≥ τ f ,a,N (ω). Combining with Assertion 1, we see that τ f ,a,M (ω) = τ f ,a,N (ω). Assertion 2 is proved. 3. Note that (τ f ,a,N ≤ r) ⊂ (τ f ,a,M ≤ r) = (τ f ,a,M ≤ r)(τ f ,a,M < M) ⊂ (τ f ,a,M ≤ r)(τ f ,a,M = τ f ,a,N ) ⊂ (τ f ,a,M ≤ r)(τ f ,a,N ≤ r) = (τ f ,a,N ≤ r), (10.11.1) where we used the just established Assertions 1 and 2 repeatedly. Since the leftmost set and the rightmost set in relation 10.11.1 are the same, all inclusions therein can be replaced by equality. Assertion 3 and the lemma are proved. Proposition 10.11.3. Basic properties of stopping times relative to a right continuous filtration. All stopping times in the following discussion will be relative to some given filtration L ≡ {L(t) : t ∈ [0,∞)} and will have values in [0,∞). Let τ ,τ be arbitrary stopping times. Then the following conditions hold: 1. (Approximating stopping time by stopping times that have regularly spaced dyadic rational values.) Let τ be an arbitrary stopping time relative to the filtration L. Then for each regular point t of the r.r.v. τ , we have (τ < t),(τ = t) ∈ L(t) . Moreover, there exists a sequence (ηh )h=0,1,... of stopping times relative to the filtration L, such that for each h ≥ 0, (i) the stopping time ηh has positive values in the enumerated set h ≡ {s0,s1,s2, . . .} ≡ {0,h,2h, . . .}; Q (ii) τ + 2−h−1 < ηh < τ + 2−h+2 ;
(10.11.2)
and (iii) for each j ≥ 1, there exists a regular point rj of τ with rj < sj such that (ηh ≤ sj ) ∈ L(r(j )) . 2. (Construction of stopping time as right limit of sequence of stopping times.) Suppose the filtration L is right continuous. Suppose (ηh )h=0,1,... is a sequence of stopping times relative to L, such that for some r.r.v. τ , we have (i ) τ ≤ ηh for each h ≥ 0 and (ii) ηh → τ in probability. Then τ is a stopping time. 3. Suppose the filtration L is right continuous. Then τ ∧τ , τ ∨τ , and τ +τ are stopping times. 4. If τ ≤ τ then L(τ ) ⊂ L(τ ) . 5. Suppose the filtration L is right continuous. Suppose τ is a stopping time. Define the probability subspace L(τ +) ≡ s>0 L(τ +s) . Then L(τ +) = L(τ ) , where L(τ ) is the probability space first introduced in Definition 8.1.9.
490
Stochastic Process
Proof. 1. Suppose τ is a stopping time. Let t be an arbitrary regular point of the r.r.v. τ . Then, according to Definition 5.1.2 there exists an increasing sequence (rk )k=1,2,... of regular points of τ such that rk ↑ t and P (τ ≤ rk ) ↑ P (τ < t). Consequently, E|1τ ≤r(k) − 1τ t be arbitrary. Take any common regular point s > t of the three r.r.v.’s τ ,τ ,τ ∧ τ . Then (τ ≤ s) ∪ (τ > s) and (τ ≤ s) ∪ (τ > s) are full sets. Hence (τ ∧ τ ≤ s) = (τ ≤ s) ∪ (τ ≤ s) ∈ L(s) ⊂ L(r) .
(10.11.5)
Now let s ↓ t. Then, since t is a regular point of the r.r.v. τ ∧ τ , we have E1(τ ∧τ ≤s) ↓ E1(τ ∧τ ≤t) . Then E|1(τ ∧τ ≤s) − 1(τ ∧τ ≤t) | → 0. Consequently, since 1(τ ∧τ ≤s) ∈ L(r) according to relation 10.11.5, it follows that 1(τ ∧τ ≤t) ∈ L(r) , where r > t is arbitrary. Hence 1(τ ∧τ ≤t) ∈ L(r) ≡ L(t+) = L(t), r∈(t,∞)
where the last equality is again thanks to the right continuity of the filtration L. Thus τ ∧ τ is a stopping time relative to L. Similarly, we can prove that τ ∨ τ and τ + τ are stopping times relative to L. Assertion 3 is verified. 7. Suppose the stopping times τ ,τ are such that τ ≤ τ . Let Y ∈ L(τ ) be arbitrary. Consider each regular point t of the stopping time τ . Take an arbitrary regular point t of τ such that t t . Then Y 1(τ ≤t ) = Y 1(t 0. Since T1,2 is a distribution, there exists, in turn, x is an integration on (S ,d ) in the some z ∈ S2 such that f (z) > 0. Thus T0,2 2 2 x is a sense of Definition 4.2.1. Since 1 ∈ C(S2,d2 ) and d2 ≤ 1, it follows that T0,2 distribution on (S2,d2 ) in the sense of Definition 5.2.1, where x ∈ S0 is arbitrary. The conditions in Definition 11.2.1 have been verified for T0,2 to be a transition distribution.
By Definition 11.2.1, the domain and range of a transition distribution are spaces of continuous functions. We next extend both to spaces of integrable functions. Recall that for each x ∈ S, we let δx denote the distribution concentrated at x. Proposition 11.2.3. Complete extension of a transition distribution relative to an initial distribution. Let (S0,d0 ) and (S1,d1 ) be compact metric spaces, with d0 ≤ 1 and d1 ≤ 1. Let T be a transition distribution from (S0,d0 ) to (S1,d1 ). Let E0 be a distribution on (S0,d0 ). Define the composite function E1 ≡ E0 T : C(S1 ) → R
(11.2.1)
Then the following conditions hold: 1. E1 ≡ E0 T is a distribution on (S1,d1 ). 2. For each i = 0,1, let (Si ,LE(i),Ei ) be the complete extension of (Si ,C(Si ),Ei ). Let f ∈ LE(1) be arbitrary. Define the function Tf on S0 by domain(Tf ) ≡ {x ∈ S0 : f ∈ Lδ(x)T }
(11.2.2)
(Tf )(x) ≡ (δx T )f
(11.2.3)
and by
for each x ∈ domain(Tf ). Then (i) Tf ∈ LE(0) and (ii) E0 (Tf ) = E1 f ≡ (E0 T )f . 3. Moreover, the extended function T : LE(1) → LE(0), thus defined, is a contraction mapping and hence continuous relative to the norm E1 | · | on LE(1) and the norm E0 | · | on LE(0) . Proof. 1. By the defining equality 11.2.2, the function E1 is clearly linear and nonnegative. Suppose E1 f ≡ E0 Tf > 0 for some C(S1,d1 ). Then, since E0 is a distribution, there exists x ∈ S0 such that T x f > 0. In turn, since T x is a distribution, there exists y ∈ S1 such that f (y) > 0. Thus E1 is an integration. Since 1 ∈ C(S1,d1 ) and d1 ≤ 1, it follows that E1 is a distribution. 2. For each i = 0,1, let (Si ,LE(i),Ei ) be the complete extension of (Si ,C(Si ),Ei ). Let f ∈ LE(1) be arbitrary. Define the function Tf on S0 by equalities 11.2.2 and 11.2.3. Then, by Definition 4.4.1 of complete extensions,
Markov Process
497 ∞
there exists a sequence (fn )n=1,2,... in C(S1 ) such that (i ) ∞ n=1 E1 |fn | < ∞, (ii ) ∞
x ∈ S1 : |fn (x)| < ∞ ⊂ domain(f ),
n=1 E0 T |fn |
=
n=1
∞
and (iii ) f (x) = n=1 fn (x) for each x ∈ S1 with ∞ n=1 |fn (x)| < ∞. Condition (i ) implies that the subset ∞ ∞
x D0 ≡ x ∈ S0 : T |fn |(x) < ∞ ≡ x ∈ S0 : T |fn | < ∞ n=1
n=1
of the probability space (S0,LE(0),E0 ) is a full subset. It implies also that the ∞ function g ≡ n=1 Tfn , with domain(g) ≡ D0 , is a member of LE(0) . Now consider an arbitrary x ∈ D0 . Then ∞
|T x fn | ≤
n=1
∞
T x |fn | < ∞.
(11.2.4)
n=1
Together with Condition (iii ), this implies that f ∈ Lδ(x)T , with (δx T )f = ∞ n=1 (δx T )fn . Hence, according to the defining equalities 11.2.2 and 11.2.3, we have x ∈ domain(Tf ), with ∞
(Tf )(x) ≡ (δx T )f = Tfn (x) = g(x). n=1
Thus Tf = g on the full subset D0 of (S0,LE(0),E0 ). Since g ∈ LE(0) , it follows that Tf ∈ LE(0) . The desired Condition (i) is verified. Moreover, E0 (Tf ) = E0 g ≡ E0
∞
n=1
Tfn =
∞
n=1
E0 Tfn = E0 T
∞
fn = E1 f ,
n=1
where the third and fourth equalities are both justified by Condition (i ). The desired Condition (ii) is also verified. Thus Assertion 2 is proved. 3. Let f ∈ LE(1) be arbitrary. Then, in the notations of Step 2, we have N N
E0 |Tf | = E0 |g| = lim E0 Tfn = lim E0 T fn N →∞ N →∞ n=1 n=1 N N ≤ lim E0 T fn = lim E1 fn = E1 |f |. N →∞ N →∞ n=1
n=1
In short, E0 |Tf | ≤ E1 |f |. Thus the mapping T : LE(1) → LE(0) is a contraction, as alleged in Assertion 3. Definition 11.2.4. Convention regarding automatic completion of a transition distribution. We hereby make the convention that, given each transition distribution T from a compact metric space (S0,d0 ) to a compact metric space (S1,d1 )
498
Stochastic Process
with d0 ≤ 1 and d1 ≤ 1, and given each initial distribution E0 on (S0,d0 ), the transition distribution T : C(S1,d1 ) → C(S0,d0 ) is automatically completely extended to the nonnegative linear function T : LE(1) → LE(0) in the manner of Proposition 11.2.3, where E1 ≡ E0 T and (Si ,LE(i),Ei ) is the complete extension of (Si ,C(Si ),Ei ), for each i = 0,1, Thus Tf is integrable relative to E0 for each integrable function f relative to E0 T , with (E0 T )f = E0 (Tf ). In the special case where E0 is the point mass distribution δx concentrated at some x ∈ S0 , we have x ∈ domain(Tf ) and T x f = (Tf )x , for each integrable function f relative to T x . Lemma 11.2.5. One-step transition distribution at step m. Let (S,d) be a compact metric space with d ≤ 1. Let T be a transition distribution from (S,d) to (S,d), with a modulus of smoothness αT . Define 1 T ≡ T . Let m ≥ 2 and f ∈ C(S m,d m ) be arbitrary. Define a function (m−1 T )f on S m−1 by ! ((m−1 T )f )(x1, . . . ,xm−1 ) ≡ T x(m−1) (dxm )f (x1, . . . ,xm−1,xm ) (11.2.5) for each x ≡ (x1, . . . ,xm−1 ) ∈ S m−1 . Then the following conditions hold: 1. If f ∈ C(S m,d m ) has values in [0,1] and has a modulus of continuity δf , then the function (m−1 T )f is a member of C(S m−1,d m−1 ), with values in [0,1], and has the modulus of continuity αα(T ) (δf ) : (0,∞) → (0,∞) defined by αα(T ) (δf )(ε) ≡ αT (δf )(2−1 ε) ∧ δf (2−1 ε)
(11.2.6)
for each ε > 0. 2. For each x ≡ (x1, . . . ,xm−1 ) ∈ S m−1 , the function m−1
T : C(S m,d m ) → C(S m−1,d m−1 )
is a transition distribution from (S m−1,d m−1 ) to (S m,d m ), with modulus of smoothness αα(T ) . We will call m−1 T the one-step transition distribution at step m according to T . Proof. 1. Let f ∈ C(S m,d m ) be arbitrary, with values in [0,1] and with a modulus of continuity δf . Since T is a transition distribution, T x(m−1) is a distribution on (S,d) for each xm−1 ∈ S. Hence the integration on the right-hand side of equality 11.2.5 makes sense and has values in [0,1]. Therefore the left-hand side is well defined and has values in [0,1]. We need to prove that the function (m−1 T )f is a continuous function.
Markov Process
499
2. To that end, let ε > 0 be arbitrary. Let x ≡ (x1, . . . ,xm−1 ),x ≡ (x1 , . . . , ) ∈ S m−1 be arbitrary such that xm−1 αα(T ) (δf )(ε). d m−1 (x,x ) < As an abbreviation, write y ≡ xm−1 and y ≡ xm−1 . With x,y fixed, the function f (x,·) on S also has a modulus of continuity δf . Hence the function Tf (x,·) has a modulus of continuity αT (δf ), by Definition 11.2.1. Therefore, since
αα(T ) (δf )(ε) ≤ αT (δf )(2−1 ε), d(y,y ) ≤ d m−1 (x,x ) < where the last inequality is by the defining equality 11.2.6, it follows that |(Tf (x,·))(y) − (Tf (x,·))(y )| < 2−1 ε. In other words,
!
! T y (dz)f (x,z) =
T y (dz)f (x,z) ± 2−1 ε.
(11.2.7)
At the same time, for each z ∈ S, since d m ((x,z),(x ,z)) = d m−1 (x,x ) < δf (2−1 ε), we have |f (x,z) − f (x ,z)| < 2−1 ε. Hence ! ! ! T y (dz)f (x,z) = T y (dz)(f (x ,z) ± 2−1 ε) = T y (dz)f (x ,z) ± 2−1 ε. Combining with equality 11.2.7, we obtain ! ! T y (dz)f (x,z) = T y (dz)(f (x ,z) ± ε. In view of the defining equality 11.2.5, this is equivalent to ((m−1 T )f )(x) = ((m−1 T )f )(x ) ± ε. Thus (m−1 T )f is continuous, with a modulus of continuity αα(T ) (δf ). Assertion 1 is proved. 3. By linearity, we see that (m−1 T )f ∈ C(S m−1,d m−1 ) for each f ∈ C(S m,d m ). Therefore the function m−1
T : C(S m,d m ) → C(S m−1,d m−1 )
is well defined. It is clearly linear and nonnegative from the defining formula 11.2.5. Consider each x ≡ (x1, . . . ,xm−1 ) ∈ S m−1 . Suppose (m−1 T )x f ≡ x(m−1) T (dy)f (x,y) > 0. Then, since T x(m−1) is a distribution, there exists y ∈ S such that f (x,y) > 0. Hence (m−1 T )x is an integration on (S m,d m ) in the sense of Definition 4.2.1. Since d m ≤ 1 and 1 ∈ C(S m,d m ), the function (m−1 T )x is a distribution on (S m,d m ) in the sense of Definition 5.2.1. We have verified all the conditions in Definition 11.2.1 for m−1 T to be a transition distribution. Assertion 2 is proved.
500
Stochastic Process 11.3 Markov Semigroup
Recall that Q denotes one of the three parameter sets {0,1, . . .}, Q∞ , or [0,∞). Let (S,d) be a compact metric space with d ≤ 1. As discussed at the beginning of Section 11.2, the assumption of compactness is no loss of generality. Definition 11.3.1. Markov semigroup. Let (S,d) be a compact metric space with d ≤ 1. Unless otherwise specified, the symbol · will stand for the supremum norm for the space C(S,d). Let T ≡ {Tt : t ∈ Q} be a family of transition distributions from (S,d) to (S,d), such that T0 is the identity mapping. Suppose the following three conditions are satisfied: 1. (Smoothness.) For each N ≥ 1, for each t ∈ [0,N ]Q, the transition distribution Tt has some modulus of smoothness αT,N , in the sense of Definition 11.2.1. Note that the modulus of smoothness αT,N is dependent on the finite interval [0,N ], but is otherwise independent of t. 2. (Semigroup property.) For each s,t ∈ Q, we have Tt+s = Tt Ts . 3. (Strong continuity.) For each f ∈ C(S,d) with a modulus of continuity δf and with f ≤ 1, and for each ε > 0, there exists δT (ε,δf ) > 0 so small that for each t ∈ [0,δT (ε,δf ))Q, we have
f − Tt f ≤ ε.
(11.3.1)
Note that this strong continuity condition is trivially satisfied if Q = {0,1, . . .}. Then we call the family T a Markov semigroup of transition distributions with state space (S,d) and parameter space Q. For short, we will simply call T a semigroup. The operation δT is called a modulus of strong continuity of T. The sequence αT ≡ (αT,N )N =1,2,... is called the modulus of smoothness of the semigroup T. We will let T denote the set of semigroups T with state space (S,d) and with parameter set Q. The next lemma strengthens the continuity of Tt at t = 0 to uniform continuity over t ∈ Q. Lemma 11.3.2. Uniform strong continuity on the parameter set. Let (S,d) be a compact metric space with d ≤ 1. Suppose Q is one of the three parameter sets {0,1, . . .}, Q∞ , or [0,∞). Let T, T be arbitrary semigroups with state space (S,d) and parameter space Q, and with a common modulus of strong continuity δT . Then the following conditions hold: 1. Let f ∈ C(S,d) be arbitrary, with a modulus of continuity δf and with |f | ≤ 1. Let ε > 0 and r,s ∈ Q be arbitrary, with |r − s| < δT (ε,δf ). Then
Tr f − Ts f ≤ ε. 2. Let ι : (0,∞) → (0,∞) denote the identity operation, defined by ι(ε) ≡ ε for each ε > 0. Let ε > 0 and t ∈ [0,δT (ε,ι))Q be arbitrary. Then
Tt (d(x,·)) − d(x,·) ≤ ε for each x ∈ S.
Markov Process
501
3. Suppose Tr = T r for each r ∈ Q , for some dense subset of Q of Q. Then Tt = T t for each t ∈ Q. In short, T = T. Proof. 1. The proof for the case where Q = {0,1, . . .} is trivial and is omitted. 2. Suppose Q = Q∞ or Q = [0,∞). Let ε > 0 and r,s ∈ Q be arbitrary with 0 ≤ s − r < δT (ε,δf ). Then, for each x ∈ S, we have |Tsx f − Trx f | = |Trx (Ts−r f − f )| ≤ Ts−r f − f ≤ ε, where the equality is by the semigroup property, where the first inequality is because Trx is a distribution on (S,d), and where the last inequality is by the definition of δT as a modulus of strong continuity. Thus
Tr f − Ts f ≤ ε.
(11.3.2)
3. Let ε > 0 and r,s ∈ Q∞ be arbitrary with |s − r| < δT (ε,δf ). Either 0 ≤ s − r < δT (ε,δf ), in which case inequality 11.3.2 holds according to Step 2, or 0 ≤ r − s < δT (ε,δf ), in which case inequality 11.3.2 holds similarly. Thus Assertion 1 is proved if Q = Q∞ . 4. Now suppose Q = [0,∞). Let ε > 0 and r,s ∈ [0,∞) be arbitrary with |r − s| < δT (ε,δf ). Let ε > 0 be arbitrary. Let t,v ∈ Q∞ be arbitrary such that (i) r ≤ t < r + δT (ε ,δf ), (ii) s ≤ v < s + δT (ε ,δf ), and (iii) |t − v| < δT (ε,δf ). Then, according to inequality 11.3.2 in Step 2, we have Tt f − Tr f ≤ ε and
Tv f − Ts f ≤ ε . According to Step 3, we have Tt f − Tv f ≤ ε. Combining, we obtain
Tr f − Ts f ≤ Tt f − Tr f + Ts f − Tv f + Tt f − Tv f < ε + ε + ε. Letting ε → 0, we obtain Tr f − Ts f ≤ ε. Thus Assertion 1 is also proved for the case where Q = [0,∞). 5. To prove Assertion 2, consider each x ∈ S. Then the function fx ≡ (1 − d(·,x)) ∈ C(S,d) is nonnegative and has a modulus of continuity δf (x) defined by δf (x) (ε) ≡ ι(ε) ≡ ε for each ε > 0. Now let ε > 0 be arbitrary. Let t ∈ [0,δT (ε,ι))Q be arbitrary. Then t < δT (ε,δf (x) ). Hence, by Condition 3 of Definition 11.3.1, we have
Tt (d(x,·)) − d(x,·) = Tt fx − fx ≤ ε. Assertion 2 is proved. 6. Let T and Q be as given in Assertion 3. Consider each t ∈ Q. Let (rk )k=1,2,... be a sequence in Q such that rk → t. Let f ∈ C(S,d) be arbitrary. Then Tr(k) f = T r(k) f for each k ≥ 1, by hypothesis. At the same time, Tr(k) f − Tt f → 0 by Assertion 1. Similarly, T r(k) f − T t f → 0. Combining, Tt f = T t f , where f ∈ C(S,d) is arbitrary. Thus Tt = T t as transition distributions. Assertion 3 and the lemma are proved.
502
Stochastic Process 11.4 Markov Transition f.j.d.’s
In this section, we will define a consistent family of f.j.d.’s generated by an initial distribution and a semigroup. The parameter set Q is assumed to be one of the three sets {0,1, . . .}, Q∞ , or [0,∞). We will refer loosely to the first two as the metrically discrete parameter sets. For ease of presentation, we assume that the state space (S,d) is compact with d ≤ 1. Let ξ ≡ (Ak )k=1,2,... be a binary approximation of (S,d) relative to x◦ . Let π ≡ ({gk,x : x ∈ Ak })k=1,2,... be the partition of unity of (S,d) determined by ξ , as in Definition 3.3.4. Definition 11.4.1. Family of transition f.j.d.’s generated by an initial distribution and a Markov semigroup. Let Q be one of the three sets {0,1, . . .}, Q∞ , or [0,∞). Let T be an arbitrary Markov semigroup, with the compact state space (S,d) where d ≤ 1, and with parameter set Q. Let E0 be an arbitrary distribution on (S,d). For arbitrary m ≥ 1, f ∈ C(S m,d m ), and nondecreasing sequence r1 ≤ · · · ≤ rm in Q, define ! ! ! E(0),T x(0) x(1) Fr(1),...,r(m) f ≡ E0 (dx0 ) Tr(1) (dx1 ) Tr(2)−r(1) (dx2 ) · · · ! x(m−1) × Tr(m)−r(m−1) (dxm )f (x1, . . . ,xm ). (11.4.1) In the special case where E0 ≡ δx is the distribution that assigns probability 1 to some point x ∈ S, we will simply write ∗,T x,T (x) ≡ Fr(1),...,r(m) ≡ Fr(1),...,r(m) . Fr(1),...,r(m) δ(x),T
(11.4.2)
∗,T The next theorem will prove that Fr(1),...,r(m) : C(S m,d m ) → C(S,d) is then a well-defined transition distribution. 1. An arbitrary consistent family
{Fr(1),...,r(m) f : m ≥ 0;r1, . . . ,rm in Q} of f.j.d.’s satisfying Condition 11.4.1 is said to be generated by the initial distribution E0 and the semigroup T. 2. An arbitrary process X : Q × (,L,E) → (S,d), whose marginal distributions are given by a consistent family generated by the initial distribution E0 and the semigroup T, will itself be called a process generated by the initial distribution E0 and semigroup T. We will see later that such processes are Markov processes. 3. In the special case where E0 ≡ δx , the qualifier “generated by the initial distribution E0 ” will be replaced simply by “generated by the initial state x.” Theorem 11.4.2. Construction of a family of transition f.j.d.’s from an initial distribution and semigroup. Let (S,d) be a compact metric space with d ≤ 1.
Markov Process
503
Let Q be one of the three sets {0,1, . . .}, Q∞ , or [0,∞). Let T be an arbitrary semigroup with state space (S,d), with parameter set Q, with a modulus of strong continuity δT , and with a modulus of smoothness αT ≡ (αT,N )N =1,2,... , in the sense of Definition 11.3.1. Then the following conditions hold: 1. Let the sequence 0 ≡ r0 ≤ r1 ≤ · · · ≤ rm in Q be arbitrary. Then the function ∗,T : C(S m,d m ) → C(S,d), Fr(1),...,r(m)
as defined by equality 11.4.2 in Definition 11.4.1, is a well-defined transition distribution. Specifically, it is equal to the composite transition distribution ∗,T = (1 Tr(1)−r(0) )(2 Tr(2)−r(1) ) · · · (m Tr(m)−r(m−1) ), Fr(1),...,r(m)
(11.4.3)
where the factors on the right hand side are one-step transition distributions defined in Lemma 11.2.5. 2. Let the sequence 0 ≡ r0 ≤ r1 ≤ · · · ≤ rm in Q be arbitrary. Let N ≥ 1 be so αα(T ,N ) be the modulus of smoothness of the one-step large that rm ≤ N. Let transition distribution i
Tr(i)−r(i−1) : C(S i ,d i ) → C(S i−1,d i−1 )
for each i = 1, . . . ,m, as constructed in Lemma 11.2.5. Then the transition dis∗,T has a modulus of smoothness tribution Fr(1),...,r(m) (m)
αα(T,N ) ◦ · · · ◦ αα(T,N ) . αα(T,N ) ≡ 3. For each m ≥ 1 and ε > 0, there exists δm (ε,δf ,δT,αT ) > 0 such that for each f ∈ C(S m,d m ) with values in [0,1] and a modulus of continuity δf , and for arbitrary nondecreasing sequences r1 ≤ · · · ≤ rm and s1 ≤ · · · ≤ sm in Q with m
|ri − si | < δm (ε,δf ,δT,αT ),
i=1
we have
∗,T ∗,T Fr(1),...,r(m) f − Fs(1),...s(m) f ≤ ε.
4. Suppose Q = {0,1, . . .} or Q = Q∞ . Let E0 be an arbitrary distribution on (S,d). Then the family E(0),T f : m ≥ 1;r1 ≤ · · · ≤ rm in Q} F E(0),T ≡ {Fr(1),...,r(m)
can be uniquely extended to a consistent family E(0),T
Sg,fj d (E0,T) ≡ F E(0),T ≡ {Fr(1),...,r(m) f : m ≥ 1;r1, . . . ,rm in Q} of f.j.d.’s with parameter set Q and state space (S,d), with said family F E(0),T being continuous in probability, with a modulus of continuity in probability δCp (·,δT,αT ). Moreover, the consistent family Sg,fj d (E0,T) is generated by the
504
Stochastic Process
initial distribution E0 and the semigroup T, in the sense of Definition 11.4.1. Thus, in the special case where Q = {0,1, . . .} or Q = Q∞ , we have a mapping Cp (Q,S), Sg,fj d : J(S,d) × T → F where J(S,d) is the space of distributions E0 on (S,d), where T is the space of Cp (Q,S) is semigroups with state space (S,d) and parameter set Q, and where F the set of consistent families of f.j.d.’s with parameter set Q and state space S, whose members are continuous in probability. 5. Suppose Q = {0,1, . . .} or Q = Q∞ , and consider the special case where E0 ≡ δx for some x ∈ S. Write Sg,fj d (x,T) ≡ Sg,fj d (δx ,T) ≡ F x,T . Then we have a function Cp (Q,S). Sg,fj d : S × T → F Proof. 1. Let E0 be an arbitrary distribution on (S,d). Let the sequence 0 ≡ r0 ≤ r1 ≤ · · · ≤ rm in Q be arbitrary. Consider each f ∈ C(S m,d m ). By the defining equality 11.4.1, we have ! ! ! E(0),T x(0) x(1) (dx2 ) · · · Fr(1),...,r(m) f = E0 (dx0 ) Tr(1) (dx1 ) Tr(2)−r(1) ! x(m−1) × Tr(m)−r(m−1) (dxm )f (x1, . . . ,xm ). (11.4.4) By the defining equality 11.2.5 of Lemma 11.2.5, the rightmost integral is equal to ! x(m−1) Tr(n)−r(n−1) (dxm )f (x1, . . . ,xm−1,xm ) ≡ ((m−1 Tr(m)−r(m−1) )f )(x1, . . . ,xm−1 ). (11.4.5) Recursively backward, Equality 11.4.4 becomes ! E(0),T Fr(1),...,r(m) f = E0 (dx0 )(0 Tr(1)−r(0) ) · · · (m−1 Tr(m)−r(m−1) )f . In particular, x,T x Fr(1),...,r(m) f = (0 Tr(1)−r(0) ) · · · (m−1 Tr(m)−r(m−1) )f .
for each x ∈ S. In other words, ∗,T = (1 Tr(1)−r(0) )(2 Tr(2)−r(1) ) · · · (m Tr(m)−r(m−1) ). Fr(1),...,r(m)
(11.4.6)
2. Now Lemma 11.2.5 says the factors (1 Tr(1)−r(0) ),(2 Tr(2)−r(1) ), . . . , (m Tr(m)−r(m−1) ) on the right-hand side are a transition distribution with the common modulus of smoothness αα(T ,N ) defined therein. Hence, by repeated ∗,T is a transition applications of Lemma 11.2.2, the composite Fr(1),...,r(m) distribution, with a modulus of smoothness that is the m-fold composite operation
Markov Process
505
(m)
αα(T ,N ) ≡ αα(T,N ) ◦ · · · ◦ αα(T,N ) . Assertions 1 and 2 of the present theorem are proved. 3. Proceed to prove Assertion 3 by induction on m ≥ 1. Consider each f ∈ C(S m,d m ) with values in [0,1] and a modulus of continuity δf . Let ε > 0 be arbitrary. In the case where m = 1, define δ1 ≡ δ1 (ε,δf ,δT,αT ) ≡ δT (ε,δf ). Suppose r1,s1 in Q are such that |r1 − s1 | < δ1 (ε,δf ,δT,αT ) ≡ δT (ε,δf ). Then 1 ∗,T ∗,T Fr(1) f − Fs(1) f = Tr(1)−r(0) f − 1 Ts(1)−s(0) f = Tr(1) f − Ts(1) f ≤ ε, where the inequality is by Lemma 11.3.2. Assertion 3 is thus proved for the starting case m = 1. 4. Suppose, inductively, for some m ≥ 2, that the operation δm−1 (·,δf ,δT,αT ) has been constructed with the desired properties. Define δm (ε,δf ,δT,αT ) ≡ 2−1 δm−1 (2−1 ε,δf ,δT,αT ) ∧ δm−1 (ε,δf ,δT,αT ).
(11.4.7)
Suppose m
|ri − si | < δm (ε,δf ,δT,αT ).
(11.4.8)
i=1
Define the function h ≡ (2 Tr(2)−r(1) ) · · · (m Tr(m)−r(m−1) )f ∈ C(S,d).
(11.4.9)
Then, by the induction hypothesis for an (m − 1)-fold composite, the function h has modulus of continuity δ1 (·,δf ,δT,αT ). We emphasize here that as m ≥ 2, the modulus of smoothness of the one-step transition distribution 2 Tr(2)−r(1) on the right-hand side of equality 11.4.9 actually depends on the modulus αT , according to Lemma 11.2.5. Hence the modulus of continuity of function h indeed depends on αT , which justifies the notation. At the same time, inequality 11.4.8 and the defining equality 11.4.7 together imply that |r1 − s1 | < δm (ε,δf ,δT,αT ) ≤ · · · ≤ δ1 (ε,δf ,δT,αT ). Hence ∗,T 1 ∗,T ( Tr(1)−r(0) )h − (1 Ts(1)−s(0) )h = Fr(1) h − Fs(1) h ≤ 2−1 ε,
(11.4.10)
where the inequality is by the induction hypothesis for the starting case where m = 1.
506
Stochastic Process
5. Similarly, inequality 11.4.8 and the defining equality 11.4.7 together imply that m
|(ri − ri−1 ) − (si − si−1 )| ≤ 2
i=2
m
(ri − si | < δm−1 (2−1 ε,δf ,δT,αT ).
i=1
Hence 2 ( Tr(2)−r(1) ) · · · (m Tr(m)−r(m−1) )f − (2 Ts(2)−s(1) ) · · · (m Ts(m)−s(m−1) )f ∗,T ∗,T ≡ Fr(2),...,r(m) f − Fs(2),...,s(m) f < 2−1 ε, (11.4.11) where the inequality is by the induction hypothesis for the case where m − 1. 6. Combining, we estimate, for each x ∈ S, the bound x,T x,T f − Fs(1),...,s(m) f| |Fr(1),...,r(m) x )(2 Tr(2)−r(1) ) · · · (m Tr(m)−r(m−1) )f = |(1 Tr(1)−r(0) x )(2 Ts(2)−s(1) ) · · · (m Ts(m)−s(m−1) )f | − (1 Ts(1)−s(0) x x )h − (1 Ts(1)−s(0) )(2 Ts(2)−s(1) ) · · · (m Ts(m)−s(m−1) )f | ≡ |(1 Tr(1)−r(0) x x )h − (1 Ts(1)−s(0) )h| ≤ |(1 Tr(1)−r(0) x x )h − (1 Ts(1)−s(0) )(2 Ts(2)−s(1) ) · · · (m Ts(m)−s(m−1) )f | + |(1 Ts(1)−s(0) x )(2 Tr(2)−r(1) ) · · · (m Tr(m)−r(m−1) )f ≤ 2−1 ε + |(1 Ts(1)−s(0) x )(2 Ts(2)−s(1) ) · · · (m Ts(m)−s(m−1) )f | − (1 Ts(1)−s(0) x )|(2 Tr(2)−r(1) ) · · · (m Tr(m)−r(m−1) )f ≤ 2−1 ε + (1 Ts(1)−s(0)
− (2 Ts(2)−s(1) ) · · · (m Ts(m)−s(m−1) )f | < 2−1 ε + 2−1 ε = ε, where the second inequality is by inequality 11.4.10, and where the last inequality is by inequality 11.4.11. Since x ∈ S is arbitrary, it follows that ∗,T ∗,T Fr(1),...,r(m) f − Fs(1),...s(m) f ≤ ε. Induction is completed, and Assertion 3 is proved. 7. To prove Assertion 4, assume that Q = {0,1, . . .} or Q = Q∞ . We need to prove that the family B A E(0),T Fr(1),...,r(m) : m ≥ 1;r1, . . . ,rm ∈ Q;r1 ≤ · · · ≤ rm can be uniquely extended to a consistent family A B E(0),T Fs(1),...,s(m) : m ≥ 1;s1, . . . ,sm ∈ Q
Markov Process
507
of f.j.d.’s with parameter set Q. We will give the proof only for the case where Q = Q∞ , with the case of {0,1, . . .} being similar. Assume in the following that Q = Q∞ . ∗,T is a transition distribution, Proposition 11.2.3 says that 8. Because Fr(1),...,r(m) the composite function ∗,T = E0 (1 Tr(1)−r(0) )(2 Tr(2)−r(1) ) · · · (m Tr(m)−r(m−1) ) Fr(1),...,r(m) = E0 Fr(1),...,r(m) (11.4.12) E(0),T
is a distribution on (S m,d m ), for each m ≥ 1, and for each sequence 0 ≡ r0 ≤ r1 ≤ · · · ≤ rm in Q∞ . 9. To proceed, let m ≥ 2 and r1, . . . ,rm ∈ Q∞ be arbitrary, with r1 ≤ · · · ≤ rm . Let n = 1, . . . ,m be arbitrary. Define the sequence n, . . . ,m), κ ≡ κn,m ≡ (κ1, . . . ,κm−1 ) ≡ (1, . . . , where the caret on the top of an element in a sequence signifies the omission of that ∗ : S m → S m−1 denote the dual function element in the sequence. Let κ ∗ ≡ κn,m of the sequence κ, defined by κ ∗ (x1, . . . ,xm ) ≡ κ ∗ (x) ≡ x ◦ κ = (xκ(1), . . . ,xκ(m−1) ) = (x1, . . . , xn, . . . ,xm ) for each x ≡ (x1, . . . ,xm ) ∈ S m . Let f ∈ C(S m−1 ) be arbitrary. We will prove that ∗ Fr(1),..., r(n),··· = ,r(m) f = Fr(1),...,r(m) f ◦ κn,m . E(0),T
E(0),T
(11.4.13)
To that end, note that equality 11.4.4 yields ! ! ! E(0),T x(0) x(n−1) E0 (dx0 ) Tr(1) (dx1 ) · · · Tr(n+1)−r(n−1) (dyn ) Fr(1),..., r(n)··· = ,r(m) f ≡ C!
×
y(n)
Tr(n+2)−r(n+1) (dyn+1 ) · · ·
D ! y(m−2) × Tr(m)−r(m−1) (dym−1 )f (x1, . . . ,xn−1,yn, . . . ,ym−1 ) For each fixed (x1, . . . ,xn−1 ), the expression in braces is a continuous function of the one variable yn . Call this function gx(1),...,x(n−1) ∈ C(S,d). Then the last displayed equality can be continued as ! ! ! x(0) x(n−1) ≡ E0 (dx0 ) Tr(1) (dx1 ) · · · Tr(n+1)−r(n−1) (dyn )gx(1),...,x(n−1) (yn ) ! ! x(0) = E0 (dx0 ) Tr(1) (dx1 ) · · · ! ! x(n−1) x(n) × Tr(n)−r(n−1) (dxn ) Tr(n+1)−r(n) (dyn )gx(1),...,x(n−1) (yn ) ,
508
Stochastic Process
where the last equality is thanks to the semigroup property of T. Combining, we obtain ! ! ! E(0),T x(0) x(n−1) f = E0 (dx0 ) Tr(1) (dx1 ) · · · Tr(n)−r(n−1) (dxn ) Fr(1),..., r(n)...,r(m) = ! x(n) × Tr(n+1)−r(n) (dyn )gx(1),...,x(n−1) (yn ) ! ! ! x(0) x(n−1) ≡ E0 (dx0 ) Tr(1) (dx1 ) · · · Tr(n)−r(n−1) (dxn ) ! × ! ×
C! x(n) Tr(n+1)−r(n) (dyn )
D y(m−2)
Tr(m)−r(m−1) (dym−1 )f (x1, . . . ,xn−1,yn, . . . ,ym−1 )
!
!
=
E0 (dx0 ) ! ×
x(0) Tr(1) (dx1 ) · · ·
!
x(n−1)
Tr(n)−r(n−1) (dxn )
C! x(n) Tr(n+1)−r(n) (dxn+1 )
x(n+1)
Tr(n+2)−r(n+1) (dxn+2 ) · · · D
! ×
y(n)
Tr(n+2)−r(n+1) (dyn+1 ) · · ·
x(m−1)
Tr(m)−r(m−1) (dxm )f (x1, . . . ,xn−1,xn+1, . . . ,xm )
∗ ), = Fr(1),...,r(m) (f ◦ κn,m E(0),T
where the third equality is by a trivial change of the dummy integration variables yn, . . . ,ym−1 to xn+1, . . . ,xm , respectively. Thus equality 11.4.13 has been proved for the family B A E(0),T Fr(1),...,r(m) : m ≥ 1;r1, . . . ,rm ∈ Q;r1 ≤ · · · ≤ rm of f.j.d.’s. Consequently, the conditions in Lemma 6.2.3 are satisfied, to yield a unique extension of this family to a consistent family A B E(0),T Fs(1),...,s(m) : m ≥ 0;s0, . . . ,sm ∈ Q of f.j.d.’s with parameter set Q. Finally, equality 11.4.4 says that F E(0),T is generated by the initial distribution E0 and the semigroup T. 10. It remains to verify that the family F E(0),T is continuous in probability. To that end, let ε > 0 be arbitrary. Consider the function d ≡ 1 ∧ d ∈ C(S 2,d 2 ), with a modulus of continuity given by the identity operation ι. Consider each r,s ∈ Q with |s − r| < δCp (ε,δT,αT ) ≡ δ2 (ε,ι,δT,αT ). First assume r ≤ s. From Assertion 3, we then have ∗,T ∗,T d ≤ ε. Fr,s d − Fr,r
Markov Process
509
At the same time, by Definition 11.4.1, we have ! ! ! x(1) x,T 1,x2 ) = Trx (dx1 )d(x 1,x1 ) = 0. Fr,r d = Trx (dx1 ) T0 (dx2 )d(x (11.4.14) Hence
∗,T E(0),T ∗,T ∗,T Fr,s d = E0 Fr,s d = E0 Fr,s d − Fr,r d ≤ ε.
Similarly, we can prove the same inequality in the case where r ≥ s. Since ε > 0 is arbitrary, we conclude that the family F E(0),T is continuous in probability, with a modulus of continuity in probability δCp (ε,δT,αT ). This proves Assertions 4. Assertion 5 is a special case of Assertion 4. Corollary 11.4.3. Continuity of Markov f.j.d.’s. Let (S,d) be a compact metric space with d ≤ 1. Let T be an arbitrary semigroup with state space (S,d), with parameter set [0,∞), with a modulus of strong continuity δT , and with a modulus of smoothness αT ≡ (αT,N )N =1,2,... , in the sense of Definition 11.3.1. Let m ≥ 1 and ε > 0 be arbitrary. Then there exists δ > 0 such that for each f ∈ C(S m,d m ) with values in [0,1] and a modulus of continuity δf , and for arbitrary nondecreasing sequences r1 ≤ · · · ≤ rm and s1 ≤ · · · ≤ sm in [0,∞), we have x,T y,T Fr(1),...,r(m) f − Fs(1),...,s(m) f < ε provided that d(x,y) ∨
m
i=1 |ri
x,T − si | < δ. In short, the function Fr(1),...,r(m) f of
(x,r1, . . . ,rm ) ∈ S × {(r1, . . . ,rm ) ∈ [0,∞)m : r1 ≤ · · · ≤ rm } m , where d is uniformly continuous relative to the metric d⊗decld ecld is the Euclidean metric.
Proof. Let m ≥ 1 and ε > 0. Write ε0 ≡ 2−1 ε. There is no loss of generality to assume that the function f ∈ C(S m,d m ) has values in [0,1] and a modulus of continuity δf . Then Assertion 3 of Theorem 11.4.2 yields some δm (ε0,δf ,δT,αT ) > 0 such that for arbitrary nondecreasing sequences r1 ≤ · · · ≤ rm and s1 ≤ · · · ≤ sm in [0,∞) with m
|ri − si | < δ1 ≡ δm (ε0,δf ,δT,αT ),
i=1
we have
∗,T ∗,T Fr(1),...,r(m) f − Fs(1),...s(m) f ≤ ε0 .
Hence, for each x,y ∈ S , we have x,T x,T f − Fs(1),...,s(m) f | ≤ ε0, |Fr(1),...,r(m)
with a similar inequality for y in the place of x.
(11.4.15)
510
Stochastic Process
∗,T At the same time, Assertion 1 of Theorem 11.4.2 says that Fr(1),...,r(m) is a
∗,T transition function. Hence the function Fr(1),...,r(m) f is a member of C(S,d). Therefore, there exists δ2 > 0 such that x,T y,T Fr(1),...,r(m) f − Fr(1),...,r(m) f < ε0,
provided that d(x,y) < δ2 . Combining with inequality 11.4.15, we obtain x,T y,T Fr(1),...,r(m) f − Fs(1),...,s(m) f x,T x,T y,T y,T ≤ Fr(1),...,r(m) f − Fr(1),...,r(m) f + Fr(1),...,r(m) f − Fs(1),...,s(m) f < 2ε0 = ε provided that d(x,y) ∨
m
i=1 |ri
− si | < δ ≡ δ1 ∧ δ2 . The corollary is proved.
11.5 Construction of a Markov Process from a Semigroup In this section, we construct a Markov process from a Markov semigroup and an initial state. The next theorem gives the construction for the discrete parameter set {0,1, . . .} or Q∞ . Subsequently, Theorem 11.5.6 will do the same for the parameter set [0,∞), with the resulting Markov process having the additional property being a.u. càdlàg. First some notations. Definition 11.5.1. Notations for two natural filtrations. Let X : [0,∞) × (,L,E) → (S,d) be an arbitrary process that is continuous in probability, whose state space (S,d) is a locally compact metric space. Let Z ≡ X|Q∞ : Q∞ ×(,L,E) → (S,d). In this section, we will then use the following notations: 1. LX ≡ {L(X,t) : t ∈ [0,∞)} denotes the natural filtration of the process X, where for each t ∈ [0,∞), we have the probability subspace L(X,t) ≡ L(Xr : r ∈ [0,t]). 2. LZ ≡ {L(Z,t) : t ∈ Q∞ } denotes the natural filtration of the process Z, where for each t ∈ Q∞ , we have the probability subspace L(Z,t) ≡ L(Zr : r ∈ [0,t]Q∞ ). Lemma 11.5.2. Two natural filtrations. Let X : [0,∞) × (,L,E) → (S,d) be an arbitrary process that is continuous in probability, whose state space (S,d) is a locally compact metric space. Let Z ≡ X|Q∞ : Q∞ × (,L,E) → (S,d). Let the filtrations LX and LZ be as in Definition 11.5.1. Then for each t ∈ Q∞, we have L(Z,t) = L(X,t) . Proof. Let t ∈ Q∞ be arbitrary. Then L(Z,t) ≡ L(Zr : r ∈ [0,t]Q∞ ) = L(Xr : r ∈ [0,t]Q∞ ) ⊂ L(Xr : r ∈ [0,t]) ≡ L(X,t) .
(11.5.1)
Markov Process
511
Conversely, let Y ∈ L(X,t) be arbitrary. Then, for each ε > 0, there exists r1, . . . ,rn ∈ [0,t] and g ∈ C(S n,d n ) such that E|Y − g(Xr(1), . . . ,Xr(n) )| < ε. By continuity in probability, there exists s1, . . . ,sn ∈ [0,t]Q∞ such that E|g(Xr(1), . . . ,Xr(n) ) − g(Zs(1), . . . ,Zs(n) )| = E|g(Xr(1), . . . ,Xr(n) ) − g(Xs(1), . . . ,Xs(n) )| < ε. Combining, we have E|Y − Y | < 2ε, where Y ≡ g(Zs(1), . . . ,Zs(n) ) ∈ L(Z,t), and where ε > 0 is arbitrary. Since L(Z,t) is complete, it follows that Y ∈ L(Z,t) , where Y ∈ L(X,t) is arbitrary. Thus L(X,t) ⊂ L(Z,t) . It follows, in view of relation 11.5.1 in the opposite direction, that L(X,t) = L(Z,t) . Lemma 11.5.3. Approximation of certain stopping times. Let X : [0,∞) × (,L,E) → (S,d) be an arbitrary process that is continuous in probability, whose state space (S,d) is a locally compact metric space. Let LX ≡ {L(X,t+) : t ∈ [0,∞)} denote the right-limit extension of the natural filtration LX ≡ {L(X,t) : t ∈ [0,∞)}. Define Z ≡ X|Q∞ : Q∞ × (,L,E) → (S,d) and let LZ ≡ {L(Z,t) : t ∈ Q∞ } denote the natural filtration of the process Z. Suppose τ is a stopping time relative to LX . Then there exists a sequence (ηh )h=0,1,... of stopping times relative to LZ , such that for each h ≥ 0, (i) the stopping time ηh has positive values in the enumerated set h ≡ {s0,s1,s2, . . .} ≡ {0,h,2h, . . .} Q and (ii) τ + 2−h−1 < ηh < τ + 2−h+2 .
(11.5.2)
Proof. 1. Since τ is a stopping time relative to LX , there exists, by Assertion 1 of Proposition 10.11.3, a sequence (ηh )h=0,1,... of stopping times relative to LZ , such that for each h ≥ 0, (i) the stopping time ηh has positive values in the enumerated set h ≡ {s0,s1,s2, . . .} ≡ {0,h,2h, . . .}, Q (ii) τ + 2−h−1 < ηh < τ + 2−h+2,
(11.5.3)
and (iii) for each j ≥ 1, there exists a regular point rj of τ , with rj < sj such that (ηh ≤ sj ) ∈ L(X,r(j )+) . 2. Consider the stopping time ηh , for each h ≥ 0. By Condition (iii), for each possible value sj of the r.r.v. ηh , we have (ηh ≤ sj ) ∈ L(X,r(j )+) ≡ L(X,u) ⊂ L(X,s(j )) = L(Z,s(j )), u>r(j )
512
Stochastic Process
where the last equality is due to Lemma 11.5.2. Thus ηh is actually a stopping time relative to the filtration LZ . The present lemma is proved. Theorem 11.5.4. Existence of a Markov process with a given semigroup, and with discrete parameters. Let (S,d) be a compact metric space with d ≤ 1. Suppose Q is one of the two parameter sets {0,1, . . .} or Q∞ . Let ! (,L,E) ≡ (0,L0,I0 ) ≡ [0,1],L0, ·dx denote the Lebesgue integration space based on the unit interval [0,1]. Let T ≡ {Tt : t ∈ Q} be an arbitrary Markov semigroup with state space (S,d), and with a modulus of strong continuity δT . Let x ∈ S be arbitrary. Let F x,T ≡ Sg,fj d (x,T) be the corresponding consistent family of f.j.d.’s constructed in Theorem 11.4.2. Let Z x,T ≡ DKS,ξ (F x,T ) : Q × (,L,E) → S be the Compact Daniell–Kolmogorov–Skorokhod Extension of the consistent family F x,T , as constructed in Theorem 6.4.3. Then the following conditions hold: 1. The process Z ≡ Z x,T is generated by the initial state x and semigroup T, in the sense of Definition 11.4.1. Moreover, the process Z x,T is continuous in probability. 2. The process Z ≡ Z x,T is a Markov process relative to its natural filtration LZ ≡ {L(Z,t) : t ∈ Q}, where L(Z,t) ≡ L(Zs : s ∈ [0,t]Q} ⊂ L for each t ∈ Q. Specifically, let the nondecreasing sequence 0 ≡ s0 ≤ s1 ≤ · · · ≤ sm in Q, the function f ∈ C(S m+1,d m+1 ), and t ∈ Q be arbitrary. Then E(f (Zt+s(0),Zt+s(1), . . . ,Zt+s(m) )|L(Z,t) ) Z(t),T
= E(f (Zt+s(0),Zt+s(1), . . . ,Zt+s(m) )|Zt ) = Fs(0),...,s(m) (f )
(11.5.4)
∗,T is the transition distribution in Definition 11.4.1. as r.r.v.’s, where Fs(0),...,s(m) 3. The process Z x,T has a modulus of continuity in probability δCp,δ(T) that is completely determined by δT , and is independent of x. Hence so does the family F x,T of the marginal distributions of Z x,T .
Proof. 1. By Theorem 6.4.3, the process Z ≡ Z x,T ≡ DKS,ξ (F x,T ) has marginal distributions given by the family F x,T . Moreover, Theorem 11.4.2 says that the consistent family F x,T is generated by the initial state x and the semigroup T, in the sense of Definition 11.4.1, and that the family F x,T is continuous
Markov Process
513
in probability. Hence the process Z x,T|Q is continuous in probability. Assertion 1 is proved. 2. Let t ∈ Q be arbitrary. Let 0 ≡ r0 ≤ r1 ≤ · · · ≤ rn ≡ t and 0 ≡ s0 ≤ s1 ≤ · · · ≤ sm be arbitrary sequences in Q. Write rn+j ≡ t + sj for each j = 0, . . . ,m. Thus sj = rn+j −rn for each j = 0, . . . ,m. Consider each f ∈ C(S m+1, d m+1 ). Let h ∈ C(S n+1,S n+1 ) be arbitrary. Then Eh(Zr(0), . . . ,Zr(n) )f (Zr(n), . . . ,Zr(n+m) ) ! x,T d(x0, . . . ,xn+m )h(x0, . . . ,xn )f (xn, . . . ,xn+m ) = Fr(0),...,r(n+m) ! ! x(1) x ≡ Tr(1) (dx1 ) Tr(2)−r(1) (dx2 ) · · · ! × T x(n+m−1) (dxn+m )h(x1, . . . ,xn )f (xn, . . . ,xn+m ) ! ! x(1) x = Tr(1) (dx1 ) Tr(2)−r(1) (dx2 ) · · · ! x(n−1) × Tr(n)−r(n−1) (dxn )h(x1, . . . ,xn ) C! x(n)
Tr(n+1)−r(n) (dxn+1 ) · · ·
× ! ×
D x(n+m−1) Tr(n+m)−r(n+m−1) (dxn+m )f (xn,xn+1 . . . ,xn+m ) .
(11.5.5)
The term inside the braces in the last expression is, by changing the names of the dummy integration variables, equal to ! ! y(1) x(n) Tr(n+1)−r(n) (dy1 ) Tr(n+2)−r(n+1) (dy2 ) · · · ! y(m−1) Tr(n+m)−r(n+m−1) (dym )f (xn,y1, . . . ,ym ) ! ! ! y(1) y(m−1) x(n) = Ts(1) (dy0 ) Ts(2) (dy2 ) · · · Ts(m) (dym )f (xn,y1, . . . ,ym ) x(n),T
= Fs(1),...,s(m) f . Substituting back into equality 11.5.5, we obtain Eh(Zr(0), . . . ,Zr(n) )f (Zr(n), . . . ,Zr(n+m) ) ! ! ! x(1) x(n−1) x(n),T x (dx1 ) Tr(2)−r(1) (dx2 ) · · · Tr(n)−r(n−1) (dxn )Fs(1),...,s(m) f = Tr(1) Z(r(n)),T
= Eh(Zr(0), . . . ,Zr(n) )Fs(1),...,s(m) f ,
(11.5.6)
∗,T ∗,T where Fs(1),...,s(m) f ∈ C(S,d) because Fs(1),...,s(m) is a transition distribution according to Assertion 1 of Theorem 11.4.2.
514
Stochastic Process
3. Next note that the set of r.r.v.’s h(Zr(0), . . . ,Zr(n) ), with arbitrary 0 ≡ r0 ≤ r1 ≤ · · · ≤ rn ≡ t and arbitrary h ∈ C(S n+1,d n+1 ), is dense in L(Z,t) relative to the norm E| · |. Hence equality 11.5.6 implies, by continuity relative to the norm E| · |, that Z(r(n)),T
EYf (Zr(n), . . . ,Zr(n+m) ) = EY Fs(0),...,s(m) (f ) Z(r(n)),T
(11.5.7)
Z(t),T
for each Y ∈ L(x,t) . Since Fs(0),...,s(m) (f ) = Fs(0),...,s(m) (f ) ∈ L(Z,t) , it follows that Z(t),T
E(f (Zr(n), . . . ,Zr(n+m) )|L(Z,t) ) = Fs(0),...,s(m) (f )
(11.5.8)
or, equivalently, Z(t),T E(f (Zt ,Zt+s(1), . . . ,Zt+s(m) )|L(Z,t) ) = Fs(0),...,s(m) (f ).
(11.5.9)
In the special case where Y is arbitrary in L(Zt ) ⊂ L(Z,t), inequality 11.5.7 holds, whence Z(t),T (f ). E(f (Zt ,Zt+s(1), . . . ,Zt+s(m) )|Zt ) = Fs(0),...,s(m)
(11.5.10)
Equalities 11.5.9 and 11.5.10 together prove the desired equality 11.5.4 in Assertion 2. 4. Assertion 3 remains. We will give the proof only in the case where Q = Q∞ , with the case where Q = {0,1, . . .} being trivial. To that end, let ε > 0 be arbitrary. Let t ∈ Q∞ be arbitrary with t < δCp,δ(T) (ε) ≡ δT (ε,ι). Then, by Assertion 2 of Lemma 11.3.2, we have Tt (d(x,·)) = Tt (d(x,·)) − d(x,·) ≤ ε, − d(x,·)
(11.5.11)
for each x ∈ S. Now let r1,r2 ∈ Q∞ be arbitrary with |r2 − r1 | < δCp,δ(T) (ε). Then r(1)∧r(2),Zr(1)∨r(2) ) = F x,T r(1),Zr(2) ) = E d(Z E d(Z r(1)∧r(2),r(1)∨r(2) d ! ! x(1) x 1,x2 ) (dx1 ) Tr(1)∨r(2)−r(1)∧r(2) (dx2 )d(x = Tr(1)∧r(2) ! ! x(1) x 1,x2 ) = Tr(1)∧r(2) (dx1 ) T|r(2)−r(1)| (dx2 )d(x ! x(1) x 1,·) = Tr(1)∧r(2) (dx1 )T|r(2)−r(1)| d(x ! x 1,x1 ) + ε) ≤ Tr(1)∧r(2) (dx1 )(d(x ! x = Tr(1)∧r(2) (dx1 )ε = ε,
Markov Process
515
where the inequality is by applying inequality 11.5.11 to t ≡ |r2 − r1 | and x = x1 . Thus we have shown that δCp,δ(T) ≡ δT (·,ι) is a modulus of continuity in probability for the process Z, according to Definition 6.1.3. Note here that the operation δCp,δ(T) depends only on δT ; it is independent of x. Assertion 3 is proved. The next proposition implies that, given a Markov semigroup T ≡ {Tt : t ∈ [0,∞)}, the Markov process Z ≡ Z x,T|Q(∞) : Q∞ × (,L,E) → (S,d) can be extended by right limit to an a.u. càdlàg process Xx,T : [0,∞) × (,L,E) → (S,d). A subsequent theorem then says that the resulting process Xx,T is Markov. Proposition 11.5.5. Markov process with a semigroup on dyadic rationals is D-regular, and is extendable to a time-uniformly a.u. càdlàg process. Let (S,d) be a compact metric space with d ≤ 1. Let T ≡ {Tt : t ∈ [0,∞)} be an arbitrary Markov semigroup with state space (S,d) and with a modulus of strong continuity δT . Let ! (,L,E) ≡ (0,L0,I0 ) ≡ [0,1],L0, ·dx denote the Lebesgue integration space based on the unit interval [0,1]. Let x ∈ S be arbitrary. Let Z ≡ Z x,T|Q(∞) : Q∞ × (,L,E) → (S,d) denote the process generated by the initial state x and semigroup T|Q∞ , in the sense of Definition 11.4.1, as constructed in Theorem 11.5.4. Let N ≥ 0 be arbitrary. Define the shifted process Z N : Q∞ × → S by Z N (t,·) ≡ Z(N + t,·) for each t ∈ Q∞ . Then the following conditions hold: 1. The process Z N is continuous in probability, with a modulus of continuity in probability δCp,δ(T) ≡ δCp (·,δT ) that is completely determined by δT and is independent of x and N . 2. The process Z N is strongly right continuous in probability, in the sense of Definition 10.7.3, with a modulus of strong right continuity in probability given by the operation δSRCp,δ(T) defined by δSRCp,δ(T) (ε,γ ) ≡ δCp (ε2,δT ) for each ε,γ > 0. Note that δSRCp,δ(T) (ε,γ ) is completely determined by δT and is independent of x,N and γ . 3. The process Z N : Q∞ × → S is D-regular, with a modulus of D-regularity mδ(T) that is completely determined by δT . 4. The process Z : Q∞ × (,L,E) → (S,d) is time-uniformly D-regular in the sense of Definition 10.10.2, with a modulus of continuity in probability δ Cp,δ(T) ≡ (δCp,δ(T),δCp,δ(T), . . .) and a modulus of D-regularity m δ(T) ≡ (mδ(T),mδ(T), . . .). 5. The right-limit extension X ≡ rLim (Z) : [0,∞) × (,L,E) → (S,d)
516
Stochastic Process
is a time-uniformly a.u. càdlàg process in the sense of Definition 10.10.3, with a modulus of a.u. càdlàg δaucl,δ(T) ≡ (δaucl,δ(T),δaucl,δ(T), . . .) and a modulus of continuity in probability δ Cp,δ(T) . Moreover, both of these two moduli are completely determined by δT . Here we recall the mapping rLim from Definition 10.5.6. In the notations of Definition 10.10.3, we have δ (aucl,δ(T)),δ (Cp,δ(T)) [0,∞). X ≡ rLim (Z) ∈ D Proof. 1. By hypothesis, the process Z has initial state x and Markov semigroup T|Q∞ . In other words, the process Z has marginal distributions given by the consistent family F x,T|Q(∞) of f.j.d.’s. as constructed in Theorem 11.5.4. Therefore, by Assertion 3 of Theorem 11.5.4, the process Z is continuous in probability, with a modulus of continuity in probability δCp,δ(T) ≡ δCp (·,δT ) that is is completely determined by δT . 2. Consequently, the shifted process Y ≡ Z N : Q∞ × → S is continuous in probability, with a modulus of continuity in probability δCp,δ(T) ≡ δCp (·,δT ) that is completely determined by δT . Assertion 1 is proved. 3. To prove Assertion 2, let ε,γ > 0 be arbitrary. Define δSRCp,δ(T) (ε,γ ) ≡ δT (ε2,ι), where the operation ι is defined by ι(ε )
(11.5.12)
ε for each ε
≡ > 0. We will show that the operation δSRCp,δ(T) is a modulus of strong right continuity of the process Y . To that end, let h ≥ 0 and s,r ∈ Qh be arbitrary, with s ≤ r < s + δSRCp,δ(T) (ε,γ ). Recall the enumerated set Qh = {0,h,2h, . . . ,1} ≡ {qh,0, . . . ,qh,p(h) }, 2−h,p
h where h ≡ h ≡ 2 , and qh,i ≡ ih, for each i = 0, . . . ,ph . Then s = qh,i and r = qh,j for some i,j = 0, . . . ,ph with i ≤ j . Now let g ∈ C(S i+1,d i+1 ) be arbitrary. Then
Eg(Yq(h,0), . . . ,Yq(h,i) )d(Ys ,Yr ) = Eg(Yq(h,0), . . . ,Yq(h,i) )d(Yq(h,i),Yq(h,j ) ) ≡ Eg(ZN +q(h,0), . . . ,ZN +q(h,i) )d(ZN +q(h,i)s ,ZN +q(h,i) ) = Eg(ZN +0, . . . ,ZN +i )d(ZN +i,ZN +j ) = E(g(ZN +0, . . . ,ZN +i )E(d(ZN +i,ZN +j )|L(Z,N +i) )) Z(N +i),T = E g(ZN +0, . . . ,ZN +i )F0,(j −i) d , (11.5.13) where the fourth equality is because g(ZN +0, . . . ,ZN +i ) ∈ LZ,(N +i) , and where the fifth equality is by equality 11.5.4 in Theorem 11.5.4. Continuing, let z ∈ S be arbitrary. Since
Markov Process
517
(j − i) = r − s ∈ [0,δSRCp,δ(T) (ε,γ )) ≡ [0,δT (ε2,ι)), we have, according to Assertion 2 of Lemma 11.3.2, the bound T(j −i) (d(z,·)) − d(z,·) ≤ ε2 .
(11.5.14)
In particular, z,T z z 2 F0,(j −i) d = T(j −i) d(z,·) = |T(j −i) d(z,·) − d(z,z)| ≤ ε ,
(11.5.15)
where z ∈ S is arbitrary. Hence equality 11.5.13 can be continued to yield Eg(Yq(h,0), . . . ,Yq(h,i) )d(Ys ,Yr ) ≤ ε2 Eg(ZN +0, . . . ,ZN +i ), where g ∈ C(S i+1,d i+1 ) is arbitrary. It follows that EU d(Ys ,Yr ) ≤ ε2 EU
(11.5.16)
for each U ∈ L(Yq(h,0), . . . ,Yq(h,i) ). Now let γ > 0 be arbitrary, and take an arbitrary measurable set A ∈ L(Yr : r ∈ [0,s]Qh ) = L(Yq(h,0), . . . ,Yq(h,i) )
(11.5.17)
with A ⊂ (d(x◦,Ys ) ≤ γ ) and P (A) > 0. Let U ≡ 1A denote the indicator of A. Then the membership relation 11.5.17 is equivalent to U ≡ 1A ∈ L(Yq(h,0),Yq(h,1), . . . ,Yq(h,i) ). Hence equality 11.5.16 applies, to yield E1A d(Ys ,Yr ) ≤ ε2 E1A .
(11.5.18)
Dividing by P (A), we obtain EA d(Ys ,Yr ) ≤ ε2, where EA is the conditional expectation given the event A. Chebychev’s inequality therefore implies PA (d(Ys ,Yr ) > α) ≤ ε for each α > ε. Here h ≥ 0,ε,γ > 0, and s,r ∈ Qh are arbitrary with s ≤ r < s + δSRCp,δ(T) (ε,γ ). Summing up, the process Y ≡ Z N is strongly right continuous in the sense of Definition 10.7.3, with a modulus of strong right continuity δSRCp ≡ δSRCp,δ(T) . Assertion 2 is proved. 4. Proceed to prove Assertion 3. To that end, recall that, by hypothesis, d ≤ 1. Hence the process Y ≡ Z N : Q∞ × → (S,d) is trivially a.u. bounded, with a modulus of a.u. boundedness βauB ≡ 1. Combining with Assertion 2, we see that the conditions for Theorem 10.7.8 are satisfied for the process Y ≡ Z N : Q∞ × → (S,d)
518
Stochastic Process
to be D-regular, with a modulus of D-regularity mδ(T) ≡ m(1,δSRCp,δ(T) ) and a modulus of continuity in probability δ Cp,δ(T) ≡ δ Cp (·,1,δSRCp,δ(T) ). Note that both moduli are completely determined by δT . Assertion 3 of the present proposition is proved. 5. Since the moduli δ Cp,δ(T) and mδ(T) are independent of the integer N ≥ 0, we see that the process Z : Q∞ × → (S,d) is time-uniformly D-regular in the sense of Definition 10.10.2, with a modulus of continuity in probability δ Cp,δ(T) ≡ (δ Cp,δ(T),δ Cp,δ(T), . . .) and a modulus of D-regularity m δ(T) ≡ (mδ(T),mδ(T), . . .). Assertion 4 of the present proposition is verified. 6. In view of Assertion 3 of the present proposition, Assertion 2 of Theorem 10.7.8 says that the right-limit extension rLim (Z N ) is an a.u. càdlàg process, with the same modulus of continuity in probability δ Cp (·,1,δSRCp ) as Z N , and with a modulus of a.u. càdlàg δaucl,δ(T) ≡ δaucl (·,1,δSRCp,δ(T) ) ≡ δaucl (·,βauB ,δSRCp,δ(T) ). Consider the right-limit extension process X ≡ rLim (Z) : [0,∞) × (,L,E) → (S,d). Consider each N ≥ 0. Then XN = rLim (Z N ) on the interval [0,1). Near the endpoint 1, things are a bit more complicated. Recall that since rLim (Z N ) is a.u. càdlàg, it is continuous a.u. on [0,1]. Hence, for a.e. ω ∈ , the function Z(·,ω) is continuous at 1 ∈ [0,1]. Therefore XN = rLim (Z N ) on the interval [0,1]. We saw earlier that the process rLim (Z N ) is a.u. càdlàg, with a modulus of a.u. càdlàg δaucl,δ(T) . Hence XN is a.u. càdlàg, with the same modulus of a.u. càdlàg δaucl,δ(T) . 7. As an immediate consequence of Assertion 4 of the present proposition, the process Z|]0,N +1]Q∞ is continuous in probability, with a modulus of continuity in probability δ Cp,δ(T) . It follows that the process X|[0,N + 1] is continuous in probability, with the same modulus of continuity in probability δ Cp,δ(T) . 8. Summing up the results in Steps 6 and 7, we see that the process X is time-uniformly a.u. càdlàg in the sense of Definition 10.10.3, with a modulus of continuity in probability δ Cp,δ(T) ≡ (δ Cp,δ(T),δ Cp,δ(T), . . .) and a modulus of a.u. càdlàg δaucl,δ(T) ≡ (δaucl,δ(T),δaucl,δ(T), . . .). Assertion 5 and the proposition are proved. In the following, to minimize notational clutter, we will write a = b ±c to mean |a − b| ≤ c, for arbitrary real-valued expressions a,b,c.
Markov Process
519
Theorem 11.5.6. Existence; construction of an a.u. càdlàg Markov process from an initial state and a semigroup, with parameter set [0,∞). Let (S,d) be a compact metric space with d ≤ 1. Let T ≡ {Tt : t ∈ [0,∞)} be an arbitrary Markov semigroup with state space (S,d) and a modulus of strong continuity δT . Let ! (,L,E) ≡ (0,L0,I0 ) ≡ ([0,1],L0, ·dx) denote the Lebesgue integration space based on the unit interval 0 . Let x ∈ S be arbitrary. Let Z ≡ Z x,T|Q(∞) : Q∞ × (,L,E) → (S,d) denote the Markov process generated by the initial state x and semigroup T|Q∞ , in the sense of Definition 11.4.1, as constructed in Theorem 11.5.4. Let LZ ≡ {L(Z,t) : t ∈ Q∞ } be the natural filtration of the process Z ≡ Z x,T|Q(∞) . Let X ≡ Xx,T ≡ rLim (Z) : [0,∞) × → (S,d) be the right-limit extension of the process Z. Let LX ≡ {L(X,t) : t ∈ [0,∞)} denote the natural filtration of the process X ≡ Xx,T . Then the following conditions hold: 1. The function X ≡ Xx,T is a time-uniformly a.u. càdlàg process, with a modulus of continuity in probability δ Cp,δ(T) ≡ (δ Cp,δ(T),δ Cp,δ(T), . . .) and a modulus of a.u. càdlàg δaucl,δ(T) ≡ (δaucl,δ(T),δaucl,δ(T), . . .). Note that both moduli are completely determined by δT . 2. The marginal distributions of the process Xx,T is given by the family F x,T of f.j.d.’s generated by the initial state x and the semigroup T, in the sense of Definition 11.4.1. 3. The process X ≡ Xx,T is Markov relative to its natural filtration LX ≡ {L(X,t) : t ∈ [0,∞)}. Specifically, let v ≥ 0 and let t0 ≡ 0 ≤ t1 ≤ · · · ≤ tm be an arbitrary sequence in [0,∞), with m ≥ 1. Let f ∈ Cub (S m+1,d m+1 ) be arbitrary. Then E(f (Xv+t (0), . . . ,Xv+t (m) )|L(X,v) ) X(v),T = E(f (Xv+t (0), . . . ,Xv+t (m) )|Xv ) = F0,t (1)··· ,t (m) f ,
(11.5.19)
∗,T where F0,t (1)··· ,t (m) is the transition distribution in Definition 11.4.1.
Proof. 1. Let x ∈ S be arbitrary. Note that T|Q∞ is a Markov semigroup with a parameter set Q∞ and a modulus of strong continuity δT . Assertion 5 of Proposition 11.5.5 therefore applies; it says that the right-limit extension Xx,T ≡ rLim (Z x,T|Q(∞) ) is a time-uniformly a.u. càdlàg process in the sense
520
Stochastic Process
of Definition 10.10.3, with a modulus of a.u. càdlàg δaucl,δ(T) ≡ (δaucl,δ(T), δaucl,δ(T), . . .) that is completely determined by δT , and with a modulus of continuity in probability δ Cp,δ(T) ≡ (δCp,δ(T),δCp,δ(T), . . .). Assertion 1 of the present theorem is proved. 2. To prove Assertion 2, note that, by Assertion 4 of Theorem 11.4.2, the consistent family F x,T|Q(∞) is generated by the initial state x and the semigroup T|Q∞ . Hence, for each sequence 0 ≡ r0 ≤ r1 ≤ · · · ≤ rm in Q∞ and for each f ∈ C(S m,d m ), we have x,T Fr(1),...,r(m) f x,T|Q(∞)
≡ Ef (Xr(1), . . . ,Xr(m) ) = Ef (Zr(1), . . . ,Zr(m) ) = Fr(1),...,r(m) f ! ! ! x(1) x(m−1) x (dx1 ) Tr(2)−r(1) (dx2 ) · · · Tr(m)−r(m−1) (dxm )f (x1, . . . ,xm ). = Tr(1) (11.5.20) Because the process X is continuous in probability, and because the semigroup T is strongly continuous, this equality extends to x,T Fr(1),...,r(m) f
= Ef (Xr(1), . . . ,Xr(m) ) ! ! ! x(1) x(m−1) x = Tr(1) (dx1 ) Tr(2)−r(1) (dx2 ) · · · Tr(m)−r(m−1) (dxm )f (x1, . . . ,xm ) (11.5.21) for each sequence 0 ≡ r0 ≤ r1 ≤ · · · ≤ rm in [0,∞). Thus the marginal distributions of the process X is given by the family F x,T, and the latter is generated by the initial state x and the semigroup T, in the sense of Definition 11.4.1. Assertion 2 is proved. 3. To prove Assertion 3, first let LZ ≡ {L(Z,t) : t ∈ Q∞ } be the natural filtration of the process Z ≡ Z x,T|Q(∞) . Assertion 2 of Theorem 11.5.4 says that the process Z is a Markov process relative to the filtration LZ . Specifically, let the nondecreasing sequence 0 ≡ s0 ≤ s1 ≤ · · · ≤ sm in Q∞ , the function f ∈ C(S m+1,d m+1 ), and the point t ∈ Q∞ be arbitrary. Then Assertion 2 of Theorem 11.5.4 says that E(f (Zt+s(0),Zt+s(1), . . . ,Zt+s(m) )|L(Z,t) ) Z(t),T|Q(∞)
= E(f (Zt+s(0),Zt+s(1), . . . ,Zt+s(m) )|Zt )) = Fs(0),...,s(m) (f ) ∗,T|Q(∞)
(11.5.22)
as r.r.v.’s, where Fs(0),...,s(m) is the transition distribution as in Definition 11.4.1. 4. Next, let LX ≡ {L(X,t) : t ∈ [0,∞)} be the natural filtration of the process X. Let t ∈ Q∞ be arbitrary. Take any sequence 0 ≡ r0 ≤ r1 ≤ · · · ≤ rn ≤ · · · ≤ rn+m in Q∞ such that rn = t. Let the function g ∈ C(S n+1,d n+1 ) be arbitrary,
Markov Process
521
with values in [0,1] and a modulus of continuity δg . Let f ∈ C(S m+1,d m+1 ) be arbitrary, with a modulus of continuity δf . Write (s0,s1, . . . ,sm ) ≡ (0,rn+1 − rn, . . . ,rm − rn ). Then (rn, . . . ,rm ) = (t + s0, . . . ,t + sm ). Moreover, g(Zr(0), . . . ,Zr(n) ) ∈ L(Z,r(n)) . Hence equality 11.5.22 yields Eg(Zr(0), . . . ,Zr(n) )f (Zr(n), . . . ,Zr(n+m) ) = E(g(Zr(0), . . . ,Zr(n) )E(f (Zt+s(0), . . . ,Zt+s(m) )|L(Z,r(n)) )) Z(r(n)),T|Q(∞)
= Eg(Zr(0), . . . ,Zr(n) )Fs(0)··· ,s(m)
(f ).
Z(r(n)),T|Q(∞)
= Eg(Zr(0), . . . ,Zr(n) )F0,r(n+1)−r(n),...,r(n+m)−r(n) (f ). Since X ≡ rLim (Z), we have Xr = Zr for each r ∈ Q∞ . Hence the last displayed equality is equivalent to Eg(Xr(0), . . . ,Xr(n) )f (Xr(n), . . . ,Xr(n+m) ) X(r(n)),T
= Eg(Xr(0), . . . ,Xr(n) )F0,r(n+1)−r(n),...,r(n+m)−r(n) (f ),
(11.5.23)
∗,T is the transition distribution as in Definition where F0,r(n+1)−r(n),...,r(n+m)−r(n) 11.4.1. 5. Now let v ≥ 0 and the sequence t0 ≡ 0 ≤ t1 ≤ · · · ≤ tm in [0,∞) be arbitrary. Let n ≥ 1 and v0 ≡ 0 ≤ v1 ≤ · · · ≤ vn−1 in [0,v] be arbitrary. Define vn+i ≡ v + ti for each i = 0, . . . ,m. Thus vn ≡ v and
v0 ≡ 0 ≤ v1 ≤ · · · ≤ vn+m . Fix any integer N ≥ 0 so large that vn+m ∈ [0,N − 1]. Take any sequence 0 ≡ r0 ≤ r1 ≤ · · · ≤ rn ≤ · · · ≤ rn+m
in Q∞ [0,N ],
and let ri ↓ vi for each i = 0, . . . ,n + m. Then the left-hand side of equality 11.5.23 converges to the limit Eg(Xv(0), . . . ,Xv(n) )f (Xv(n), . . . ,Xv(n+m) ), thanks to the continuity in probability of the process X|[0,N + 1]. 6. Consider the right-hand side of equality 11.5.23. Let ε > 0 be arbitrary. As observed in Step 1, the process X|[0,N + 1] has a modulus of continuity in probability δCp,δ(T) . Consequently, there exists δ0 > 0 so small that E|g(Xr(0), . . . ,Xr(n) ) − g(Xv(0), . . . ,Xv(n) )| < ε provided that ni=0 (ri − vi ) < δ0 .
(11.5.24)
522
Stochastic Process
7. Separately, Assertion 3 of Theorem 11.4.2 implies that there exists δm+1 (ε,δf ,δT,αT ) > 0 such that for arbitrary nondecreasing sequences 0 ≡ r 0 ≤ r 1 ≤ · · · ≤ r m and 0 ≡ s 0 ≤ s 1 ≤ · · · ≤ s m in [0,∞) with m
|r i − s i | < δm+1 (ε,δf ,δT,αT ),
i=0
we have
∗,T ∗,T Fr(0),r(1),...,r(m) f − Fs(0),s(1),...s(m) f < ε.
(11.5.25)
8. Now suppose n+m
(ri − vi ) < 2−1 δm+1 (ε,δf ,δT,αT ) ∧ δ Cp,δ(T) (εδf (ε)) ∧ δ0 .
(11.5.26)
i=0
Write r i ≡ rn+i − rn and s i ≡ ti = vn+i − vn , for each i = 0, . . . ,m. Then r i ,s i ∈ [0,N ] for each i = 0, . . . ,m, with m
|r i − s i | ≤ 2
i=0
n+m
(ri − vi ) < δm+1 (ε,δf ,δT,αT ),
i=0
where the last equality is by inequality 11.5.26. Therefore inequality 11.5.25 holds. ∗,T ∗,T f and h ≡ Fs(0),s(1),...s(m) f . Then As an abbreviation, define h ≡ Fr(0),r(1),...,r(m) inequality 11.5.25 can be rewritten as h − h < ε. (11.5.27) 9. Since 0 ≡ r 0 ≤ r 1 ≤ · · · ≤ r m is a sequence in [0,N ], Theorem 11.4.2 ∗,T implies that the transition distribution Fr(1),...,r(m) has a modulus of smoothness given by the m-fold composite product operation (m)
αα(T,N ),ξ ◦ · · · ◦ αα(T,N ),ξ , αα(T,N ),ξ ≡ where each factor on the right-hand side is as defined in Lemma 11.2.5. Hence the ∗,T f is a member of C(S,d), with values in [0,1] and a function h ≡ Fr(0),r(1),...,r(m) (m) modulus of continuity δh ≡ αα(T,N ),ξ (δf ). 10. Now the bound 11.5.26 implies n+m
(m) (ri − vi ) < δ Cp,δ(T) (εδh (ε)) ≡ δ Cp,δ(T) (ε αα(T,N ),ξ (δf )(ε)).
(11.5.28)
i=0
In particular, we have rn − vn < δ Cp,δ(T) (εδh (ε)). Hence Ed(Xr(n),Xv(n) ) < εδh (ε) by the definition of δ Cp,δ(T) as a modulus of continuity in probability of the process X|[0,N + 1]. Therefore Chebychev’s inequality yields a measurable set A ⊂ with EAc < ε such that A ⊂ (d(Xr(n),Xv(n) ) < δh (ε)).
(11.5.29)
Markov Process
523
11. Since the function h has modulus of continuity δh , relation 11.5.29 immediately implies that A ⊂ (d(Xr(n),Xv(n) ) < δh (ε)) ⊂ (|h(Xr(n) ) − h(Xv(n) )| < ε).
(11.5.30)
As a result, we obtain the estimate X(r(n)),T
Eg(Xr(0), . . . ,Xr(n) )F0,r(n+1)−r(n),...,r(n+m)−r(n) f ≡ (Eg(Xr(0), . . . ,Xr(n) )h(Xr(n) )1A ) + (Eg(Xr(0), . . . ,Xr(n) )h(Xr(n) )1Ac ) = (Eg(Xr(0), . . . ,Xr(n) )h(Xr(n) )1A ) ± ε = (Eg(Xr(0), . . . ,Xr(n) )h(Xv(n) )1A ) ± ε ± ε,
(11.5.31)
where the second equality is because EAc < ε, and where the third inequality is thanks to relation 11.5.30. At this point, note that the bound 11.5.26 implies also that ni=0 (ri − vi ) < δ0 . Hence inequality 11.5.24 holds, and leads to E(g(Xr(0), . . . ,Xr(n) )h(Xv(n) )1A ) = (Eg(Xv(0), . . . ,Xv(n) )h(Xv(n) )1A ) ± ε. Therefore equality 11.5.31 can be continued, to yield X(r(n)),T
Eg(Xr(0), . . . ,Xr(n) )F0,r(n+1)−r(n),...,r(n+m)−r(n) f = (Eg(Xv(0), . . . ,Xv(n) )h(Xv(n) )1A ) ± 3ε = (Eg(Xv(0), . . . ,Xv(n) )h(Xv(n) )1A ) + (Eg(Xv(0), . . . ,Xv(n) )h(Xv(n) )1Ac ) ± 4ε = Eg(Xv(0), . . . ,Xv(n) )h(Xv(n) ) ± 4ε = Eg(Xv(0), . . . ,Xv(n) )h(Xv(n) ) ± ε ± 4ε = Eg(Xv(0), . . . ,Xv(n) )h(Xv(n) ) ± 5ε, where we used the condition that the functions f ,g,h have values in [0,1], and where the fourth equality is thanks to equality 11.5.27. Summing up, X(r(n)),T Eg(Xr(0), . . . ,Xr(n) )F0,r(n+1)−r(n),...,r(n+m)−r(n) f − Eg(Xv(0), . . . ,Xv(n) )h(Xv(n) ) ≤ 5ε provided that the bound 11.5.26 is satisfied. Since ε > 0 is arbitrarily small, we have proved the convergence of the right-hand side of equality 11.5.23 with, specifically, X(r(n)),T
Eg(Xr(0), . . . ,Xr(n) )F0,r(n+1)−r(n),...,r(n+m)−r(n) f → Eg(Xv(0), . . . ,Xv(n) )h(Xv(n) ), as ri ↓ vi for each i = 0, . . . ,n + m. In view of the convergence of the left-hand side of the same equality 11.5.23, as observed in Step 5, the two limits are equal. Namely,
524
Stochastic Process
Eg(Xv(0), . . . ,Xv(n) )f (Xv(n), . . . ,Xv(n+m) ) = Eg(Xv(0), . . . ,Xv(n) )h(Xv(n) ), (11.5.32) where the function g ∈ C(S n+1,d n+1 ) is arbitrary with values in [0,1], and where the integer n ≥ 1 and the sequence 0 ≡ v0 ≤ v1 ≤ · · · ≤ vn−1 ≤ vn = v are arbitrary. In other words, EYf (Xv(n), . . . ,Xv(n+m) ) = EY h(Xv ) for each r.r.v. Y in the linear subspace F Gv ≡ g(Xv(0), . . . ,Xv(n−1),Xv(n) ) : g ∈ C(S n+1,d n+1 ); G v0 ≤ v1 ≤ · · · ≤ vn−1 ≤ vn = v . Since the linear subspace Gv is dense in the space L(X,v) relative to the norm E| · |, it follows that EYf (Xv(n), . . . ,Xv(n+m) ) = EY h(Xv ) for each Y ∈ expectation
L(X,v) .
Since h(Xv ) ∈ Gv ⊂
L(X,v) ,
(11.5.33)
we obtain the conditional
E(f (Xv(n), . . . ,Xv(n+m) )|L(X,v) ) = h(Xv ) ≡ F X(v),T
s(0),s(1),...s(m))
f.
(11.5.34)
In the special case of an arbitrary Y ∈ L(Xv ) ⊂ L(X,v) , equality 11.5.33 implies that E(f (Xv(n), . . . ,Xv(n+m) )|Xv ) = h(Xv ) ≡ F X(v),T
s(0),s(1),...s(m))
f.
(11.5.35)
Recall that vn+i ≡ v + ti and s i ≡ ti , for each i = 0, . . . ,m, and recall that t0 ≡ 0. Equalities 11.5.34 and 11.5.35 can then be rewritten as X(v),T
E(f (Xv+t (0), . . . ,Xv+t (m) )|L(X,v) ) = F0,t (1),...,t (m) f
(11.5.36)
and X(v),T
E(f (Xv+t (0), . . . ,Xv+t (m) )|Xv ) = F0,t (1),...,t (m),
(11.5.37)
respectively. The last two equalities together yield the desired equality 11.5.19. Assertion 3 has been verified. Assertion 3 and the theorem are proved.
11.6 Continuity of Construction In this section, we will prove that the construction in Theorem 11.5.6 of a Markov process with parameter set [0,∞), from an initial state x and a semigroup T, is uniformly metrically continuous over each subspace of semigroups T whose members share a common modulus of strong continuity and share a common modulus of smoothness. First we specify a compact state space, and define a metric on the space of Markov semigroups.
Markov Process
525
Definition 11.6.1. Specification of state space, its binary approximation, and partition of unity. In this section, unless otherwise specified, (S,d) will denote a given compact metric space, with d ≤ 1 and a fixed reference point x◦ ∈ S. Recall that ξ ≡ (Ak )k=1,2,... is a binary approximation of (S,d) relative to x◦ , and that π ≡ ({gk,x : x ∈ Ak })k=1,2,... is the partition of unity of (S,d) determined by ξ , as in Definition 3.3.4. Recall that |Ak | denotes the number of elements in the metrically discrete finite subset Ak ⊂ S, for each k ≥ 1. For each n ≥ 1, let ξ n denote the nth power of ξ , and let π (n) denote the corresponding partition of unity for (S n,d n ). Thus, for each n ≥ 1, the sequence (n) ξ n ≡ (Ak )k=1,2,... is the product binary approximation for (S n,d n ) relative to (n) the reference point x◦ ≡ (x◦, . . . x◦ ) ∈ S n , and G F (n) π (n) ≡ gk,x : x ∈ A(n) k k=1,2,... is the partition of unity of (S n,d n ) determined by the binary approximation ξ ≡ (1) (n) (Ak )k=1,2,... ≡ (Ak )k=1,2,... of (S,d). For each k ≥ 1, the set Ak is a 2k approximation of the bounded subset (d n (·,(x◦, . . . ,xo ) ≤ 2k ) ⊂ S n . (n)
To lessen the burden of subscripts, we write Ak and An,k interchangeably, for each n ≥ 1. Recall from Definition 11.0.2 the enumerated set Q∞ ≡ {u0,u1, . . .} of dyadic rationals. Definition 11.6.2. Metric on the space of Markov semigroups. Let (S,d) be the specified compact metric space, with d ≤ 1. Suppose Q = Q∞ or Q = [0,∞). Let T be set of Markov semigroups on the parameter set Q and with the compact metric state space (S,d). For each n ≤ 0, write n ≡ 2−n . Define the metric ρT ,ξ on the set T by ρT ,ξ (T,T) ≡
∞
n=0
2−n−1
∞
k=1
2−k |Ak |−1
T(n) gk,z − T (n) gk,z z∈A(k)
(11.6.1) for arbitrary members T ≡ {Tt : t ∈ Q} and T ≡ {T t : t ∈ Q} of the set T . Here
· stands for the supremum norm on C(S,d). The next lemma proves that ρT is indeed a metric. Note that ρT ≤ 1. Let (S × T ,d ⊗ ρT ) denote the product metric metric space of (S,d) and (T ,ρT ). For each T ≡ {Tt : t ∈ Q} ∈ T , define the semigroup T|Q∞ ≡ {Tt : t ∈ Q∞ }. Then ρT ,ξ (T|Q∞,T|Q∞ ) = ρT ,ξ (T,T).
526
Stochastic Process
In other words, the mapping : (T ,ρT ,ξ ) → (T |Q∞,ρT ,ξ ), defined by (T) ≡ T|Q∞ for each T ∈ T , is an isometry. Note here that we abuse notations and omit the reference to the parameter set Q or Q∞ in the symbol ρT ,ξ . Lemma 11.6.3. ρT ,ξ is a metric. The function ρT ≡ ρT ,ξ is indeed a metric. Proof. Symmetry and triangle inequality for the function ρT are obvious from the defining equality 11.6.1. Suppose T,T ∈ T are such that ρT (T,T) = 0. Consider each r ∈ Q∞ . Then r = κn for some n,κ ≥ 0, while equality 11.6.1 implies that ∞
2−k |Ak |−1
k=1
T(n) gk,z − T (n) gk,z = 0.
(11.6.2)
z∈A(k)
Now let x ∈ S be arbitrary. Then ∞
x 2−k |Ak |−1 |T(n) gk,z − T
x
(n) gk,z |
= 0.
(11.6.3)
k=1 x ,T In other words, ρDist,ξ (T(n)
x
(n) )
= 0, where ρDist,ξ is the distribution metric x
x = T (n) as distributions. introduced in Definition 5.3.4. It follows that T(n) Since x ∈ S is arbitrary, we see that T(n) = T (n) as transition distributions. Consequently, by the semigroup property, we obtain
Tr = Tκ(n) = T(n) ◦ · · · ◦ T(n) = T (n) ◦ · · · ◦ T (n) = T κ(n) = T r . where r ∈ Q∞ . is arbitrary. Since Q∞ is dense in [0,∞), Assertion 3 of Lemma 11.3.2 implies that T = T. Summing up, ρT is a metric. Lemma 11.6.4. Continuity of construction of a family of transition f.j.d.’s from an initial state and semigroup. Let (S,d) be the specified compact metric space, with d ≤ 1. Let T (δ,α) be an arbitrary family of Markov semigroups with parameter set Q∞ and state space (S,d), such that all its members T ∈ T (δ,α) share a common modulus of strong continuity δT = δ and a common modulus of smoothness αT = α, in the sense of Definition 11.3.1. Thus T (δ,α) is a subset of of the metric space (T ,ρT ,ξ ) introduced in Definition 11.6.2 and, as such, inherits the metric ρT ,ξ . Recall from Definition 6.2.8 the metric space (Q∞,S), ρMarg,ξ,Q(∞) ) of consistent families of f.j.d.’s with parameter set Q∞ (F Cp (Q∞,S) consisting of members and state space (S,d). Recall the subspace F that are continuous in probability, as in Definition 6.2.10.
Markov Process
527
Then the mapping Cp (Q∞,S), Sg,fj d : (S × T (δ,α),d ⊗ ρT ,ξ ) → (F ρMarg,ξ,Q(∞) ), constructed in Assertion 5 of Theorem 11.4.2, is uniformly continuous, with a modulus of continuity δSg,fj d (·,δ,α, ξ ) determined by the moduli δ,α and by the modulus of local compactness ξ ≡ (|An |)n=1,2,... of (S,d). Proof. 1. Let (x,T),(x,T) ∈ S × T (δ,α) be arbitrary, but fixed. As an abbreviation, write F ≡ F x,T ≡ Sg,fj d (x,T) and F ≡ F x,T ≡ Sg,fj d (x,T). Define the distance ρ 0 ≡ (d ⊗ ρT )((x,T),(x,T)) ≡ d(x,x) ∨ ρT (T,T).
(11.6.4)
2. Let ε0 > 0 be arbitrary, but fixed until further notice. Take M ≥ 0 so large that 2−p(2M)−1 < 3−1 ε0 , where p2M ≡ N ≡ 22M . As an abbreviation, also write ≡ M ≡ 2−M and K ≡ N + 1. Then 2−K = 2−N −1 < 3−1 ε0 . Moreover, in the notations of Definitions 11.0.2, we have QM ≡ {u0,u1, . . . ,up(2M) } ≡ {u0,u1, . . . ,uN } = {0,,2, . . . ,N } (11.6.5) as enumerated sets. Thus un = n for each n = 0, . . . ,N. 3. By Definition 6.2.8, we have ρ Marg,ξ,Q(∞) (F,F ) ≡
∞
2−n−1 ρDist,ξ n+1 (Fu(0),...,u(n),F u(0),...,u(n) )
n=0
≤
N
2−n−1 ρDist,ξ n+1 (Fu(0),...,u(n),F u(0),...,u(n) ) + 2−N −1
n=0
≤
N
2−n−1 ρDist,ξ n+1 (F0,,...,n,F 0,,...,n ) + 3−1 ε0 .
n=0
(11.6.6) For each n ≥ 0, the metric ρDist,ξ n+1 was introduced in Definition 5.3.4 for the space of distributions on (S n+1,d n+1 ), where it is observed that ρDist,ξ n+1 ≤ 1 and that sequential convergence relative to ρDist,ξ n+1 is equivalent to weak convergence. 4. We will prove that the summand indexed by n in the last sum in equality11.6.6 is bounded by 2−n−1 2 · 3−1 ε0 , provided that the distance ρ 0 is sufficiently small. To that end, let n = 0, . . . ,N be arbitrary, but fixed until further notice. Then the summand is bounded by ρDist,ξ n+1 (F0,,...,n,F 0,,...,n ) ≡
∞
k=1
2−k |An+1,k |−1
F0,,...,n g (n+1) − F 0,,...,n g (n+1) k,y
y∈A(n+1,k)
k,y
528 ≤
Stochastic Process K
2−k |An+1,k |−1
k=1
0, we have (p T )g − (p T )g < ε g∈G(p)
provided that the distance ρ 0 is bounded by ρ 0 < ρp (ε ). 8. Start with p = n. Arbitrarily define ρn ≡ 1. Then the operation ρp trivially (n+1) satisfies Condition (ii) in Step 7. Consider each g ∈ Gp . Then g = gk,y for some k = 0, . . . ,K and y ∈ An+1,k . According to Proposition 3.3.3, the basis (n+1) ∈ C(S n+1,d n+1 ) has a Lipschitz constant 2 · 2k ≤ 2K+1 ≡ 2N +2 function gk,y and has values in [0,1]. Thus the function g has a modulus of continuity δn defined δn by δn (ε ) ≡ 2−N −2 ε for each ε > 0. Since g ∈ Gn is arbitrary, the modulus satisfies Condition (i) in Step 7. The pair δn,ρn has been constructed to satisfy Conditions (i) and (ii), in the case where p = n. 9. Suppose, for some p = n,n − 1, . . . ,1, the pair of operations δp,ρp has been constructed to satisfy Conditions (i) and (ii) in Step 7. Proceed to construct the pair δp−1,ρp−1 . Consider each (n+1)
g ∈ Gp−1 ≡ {(p T ) ) · · · (n T )gk,y
: k = 0, . . . ,K and y ∈ An+1,k }
= (p T )Gp ⊂ C(S p,d p ). Then g = (p T )g for some g ∈ Gp . By Condition (ii) in the backward induction δp . Lemma 11.2.5 therefore hypothesis, the function g has a modulus of continuity says that the function g = (p T )g has a modulus of continuity given by δp−1 ≡ αα(T ()),ξ ( δp ), where the modulus of smoothness αα(T ()),ξ is as defined in Lemma 11.2.5. Since g ∈ Gp−1 is arbitrary, we see that δp−1 satisfies Condition (i) in Step 7. It remains to construct ρp−1 to satisfy Condition (ii) in Step 7. 10. To that end, let ε > 0 be arbitrary. Take h ≥ 1 so large that 1 δp−1 (3−2 ε ). 2−h < 2
(11.6.9)
530
Stochastic Process
Define ρp−1 (ε ) ≡ ρp (3−1 ε ) ∧ 2−M−1 2−h |Ah |−1 (3−1 ε ).
(11.6.10)
Consider each g ∈ Gp−1 . By Step 9, the function g has a modulus of continuity δp−1 . Let (w1, . . . ,wp−1 ) ∈ S p−1 be arbitrary, and consider the function f ≡ g(w1, . . . ,wp−1,·) ∈ C(S,d). Because d ≤ 1 by assumption, we have S ⊂ (d(·,x◦ ) ≤ 1) ⊂ (d(·,x◦ ) ≤ 2h ), whence, trivially, the function f ∈ C(S,d) has the set (d(·,x◦ ) ≤ 2h ) as support. δp−1 as Moreover, the function f has the same modulus of continuity δf ≡ the function g. Thus the conditions in Assertion 4 of Proposition 3.3.6, where n,k,f ,x,gx ,ε are replaced by 1,h,f ,z,gh,z,3−1 ε , respectively, are satisfied. Accordingly,
−1 f − f (z)gh,z (11.6.11) ≤3 ε z∈A(h) on S. Consequently, since T is a contraction mapping, we have
−1 T f − f (z)T g h,z ≤ 3 ε . z∈A(h)
(11.6.12)
Similarly,
−1 T f − f (z)T gh,z ≤3 ε. z∈A(h)
(11.6.13)
11. Now suppose ρ 0 < ρp−1 (ε ). Then ∞
j =0
2−j −1
∞
2−h |Ah |−1
h =1
T(j ) gh ,z − T (j ) gh ,z z∈A(h )
≡ ρT ,ξ (T,T) ≤ ρ 0 < ρp−1 (ε ), where the first inequality is by equality 11.6.4. Consequently,
T(M) gh,z − T (M) gh,z < ρp−1 (ε ). 2−M−1 2−h |Ah |−1 z∈A(h)
Recall that ≡ M ≡ 2−M . The last displayed inequality can be rewritten as
T gh,z − T gh,z < ρp−1 (ε ). (11.6.14) 2−M−1 2−h |Ah |−1 z∈A(h)
Markov Process
531
Therefore, since the function f has values in [0,1], we have
f (z)T gh,z − f (z)T gh,z f (z)T gh,z − f (z)T gh,z ≤ z∈A(h) z∈A(h) z∈A(h)
T gh,z − T gh,z ≤ z∈A(h)
< 2M+1 2h |Ah |ρp−1 (ε ) ≤ 3−1 ε , (11.6.15) where the third inequality is by inequality 11.6.14, and where the last inequality follows from the defining formula 11.6.10. 12. Combining inequalities 11.6.15, 11.6.13, and 11.6.12, we obtain T f − T f ≤ ε . In other words, T g(w1, . . . ,wp−1,·) − T g(w1, . . . ,wp−1,·) ≤ ε , where (w1, . . . ,wp−1 ) ∈ S p is arbitrary. Consequently, p−1 T g − p−1 T g ≤ ε . where ε > 0 and g ∈ Gp−1 are arbitrary, provided that ρ 0 < ρp−1 (ε ). 13. In short, the operation ρp−1 has been constructed to satisfy Condition (ii) in Step 7. The backward induction is completed, and we have obtained the pair ( δp,ρp ) for each p = n,n − 1, . . . ,0 to satisfy Conditions (i) and (ii) in Step 7. In particular, we obtained the pair ( δ0,ρ0 ). of operations. 14. Now let ε ≡ 3−1 (n + 2)−1 ε0 . Suppose ρ 0 < δSg,fj d (ε0,δ,α, ξ ) ≡ ρ0 (ε ) ∧ δ0 (ε ). Let p = 1, . . . n and g ∈ Gp, be arbitrary. Then (p T )g ∈ Gp−1,
(11.6.16)
Moreover, as a result of the monotonicity of the operations ρp in the defining formula 11.6.10, we have ρ 0 < ρp (ε ). Hence, by Condition (ii) in Step 7, we have (p T )g = (p T )g ± ε . Therefore, since the transition distributions (1 T ), . . . ,(p−1 T ) are contraction mappings, it follows that (1 T )(2 T ) · · · (p−2 T )(p−1 T )(p T )g = (1 T )(2 T ) · · · (p−2 T )(p−1 T )(p T )g ± ε .
(11.6.17)
532
Stochastic Process (n+1)
15. Finally, let k = 0, . . . ,K and y ∈ An+1,k . be arbitrary. Let g ≡ gk,y Gn . Hence, using equality 11.6.17 repeatedly, we obtain
∈
x,T F0,,...,n g = δx (1 T )(2 T ) · · · (n−1 T )(n T )g
= δx (1 T )(2 T ) · · · (n−1 T )(n T )(n+1 T )g ± ε = δx (1 T )(2 T ) · · · (n−1 T )(n T )(n+1 T )g ± 2ε = ··· = δx (1 T )(2 T ) · · · (n−1 T )(n T )(n+1 T )g ± (n + 1)ε x,T = F0,,...,n g ± (n + 1)ε .
(11.6.18)
16. Moreover, by Condition (i) in Step 7, the function ∗,T F0,,...,n g = (1 T )(2 T ) · · · (n−1 T )(n T )g ∈ G0
has a modulus of continuity δ0 . Therefore x,T x,T F0,,...,n g = F0,,...,n g ± ε
because d(x,x) ≤ ρ 0 < δSg,fj d (ε0,δ,α, ξ ) ≡ ρ0 (ε ) ∧ δ0 (ε ) ≤ δ0 (ε ). By symmetry, x,T x,T F0,,...,n g = F0,,...,n g ± ε .
(11.6.19)
Equalities 11.6.18 and 11.6.19 together imply x,T x,T x,T F0,,...,n g = F0,,...,n g ± (n + 1)ε = F0,,...,n g ± (n + 1)ε ± ε x,T x,T = F0,,...,n g ± (n + 2)ε ≡ F0,,...,n g ± 3−1 ε0 .
Equivalently, (n+1)
|δx (1 T )(2 T ) · · · (n T )gk,y
(n+1)
− δx (1 T )(2 T ) ) · · · (n T )gk,y
| ≤ 3−1 ε0, (11.6.20)
where k = 0, . . . ,K and y ∈ An+1,k . are arbitrary. The desired equality 11.6.8 follows for each n = 0, . . . ,N. Inequalities 11.6.7 and 11.6.6 then imply that HIJK
ρ Marg,ξ,Q(∞) Sg,fj d (x,T), Sg,fj d x, T =ρ Marg,ξ,Q(∞) (F,F ) ≤ 3−1 ε0 + 3−1 ε0 + 3−1 ε0 = ε0, provided that the distance ρ 0 ≡ (d ⊗ ρT )((x,T),(x,T)) is bounded by ρ 0 < δSg,fj d (ε0,δ,α, ξ ). Summing up, δSg,fj d (·,α, ξ ) is a modulus of continuity of the mapping Sg,fj d . The theorem is proved.
Markov Process
533
Following is the main theorem of this section. Theorem 11.6.5. Construction of a time-uniformly a.u. càdlàg Markov process from an initial state and Markov semigroup on [0,∞), and continuity of said construction. Let (S,d) be the specified compact metric space, with d ≤ 1. Let T (δ,α) be an arbitrary family of Markov semigroups with parameter set [0,∞) and state space (S,d), such that all its members T ∈ T (δ,α) share a common modulus of strong continuity δT ≡ δ and a common modulus of smoothness αT ≡ α, in the sense of Definition 11.3.1. Thus T (δ,α) is a subset of the metric space (T ,ρT ,ξ ) introduced in Definition 11.6.2 and, as such, inherits (Q∞,S) of consistent families of the metric ρT ,ξ . Separately, recall the space F f.j.d.’s, equipped with the marginal metric ρ Marg,ξ,Q(∞) as in Definition 6.2.8. Cp ([0,∞),S) of consistent Similarly, recall from Definition 6.2.10 the space F families of f.j.d.’s that are continuous in probability, equipped with the metric ρ Cp,ξ,[0,∞),Q(∞) introduced in Definition 6.2.12. Then the following conditions hold: 1. There exists a uniformly continuous mapping Cp (Q∞,S), ρMarg,ξ,Q(∞) ) Sg,fj d : (S × T (δ,α)|Q∞,d ⊗ ρT ,ξ ) → (F such that for each (x,T|Q∞ ) ∈ S×T (δ,α)|Q∞ , the family F x,T|Q(∞) ≡ Sg,fj d (x,T|Q∞ ) of f.j.d.’s is generated by the initial state x and the semigroup T|Q∞ , in the sense of Definition 11.4.1. In particular, for each fixed T ∈ T (δ,α), the function Cp (Q∞,S), ρMarg,ξ,Q(∞) ) F ∗,T|Q(∞) ≡ Sg,fj d (·,T|Q∞ ) : (S,d) → (F is uniformly continuous. Consequently, for each m ≥ 1, f ∈ C(S m,d m ), and for each nondecreasing sequence r1 ≤ · · · ≤ rm in Q∞ , the function ∗,T|Q(∞)
Fr(1),...,r(m) f : (S,d) → R is uniformly continuous on (S,d). 2. There exists a uniformly continuous mapping ρD[0,∞) ), Sg,CdlgMrkv : (S × T (δ,α),d ⊗ ρT ,ξ ) → (D[0,∞), ) is the metric space of an a.u. càdlàg process with where (D[0,∞), ρD[0,∞) parameter set [0,∞), as defined in Definition 10.10.3, such that for each (x,T) ∈ S × T (δ,α), the a.u. càdlàg process Xx,T ≡ Sg,CdlgMrkv (x,T) is generated by the initial state x and the semigroup T, in the sense of Definition 11.4.1. For each x ∈ S, the family F x,T of marginal distributions of Xx,T is therefore generated by the initial state x and the semigroup T. Specifically, Sg,CdlgMrkv ≡ rLim ◦ DKS,ξ ◦ Sg,fj d ◦ , where the component mappings on the right-hand side will be defined precisely in the proof.
534
Stochastic Process
Moreover, Xx,T has a modulus of a.u. càdlàg δaucl,δ ≡ (δaucl,δ ,δaucl,δ , . . .) and a modulus of continuity in probability δ Cp,δ ≡ (δCp,δ ,δCp,δ , . . .). In other words, Xx,T ∈ D δ (aucl,δ), δ (Cp,δ) [0,∞) is time-uniformly a.u. càdlàg. Hence the function Sg,CdlgMrkv has a range in D δ (aucl,δ), δ (Cp,δ) [0,∞) and can be regarded as a uniformly continuous mapping Sg,CdlgMrkv : (S × T (δ,α),d ⊗ ρT ,ξ ) → D δ (aucl,δ), δ (Cp,δ) [0,∞). Note that the moduli δ Cp,δ and δaucl,δ are completely determined by δ ≡ δT . 3. Let (x,T) ∈ S × T (δ,α) be arbitrary. Then the a.u. càdlàg process X ≡ Xx,T ≡ Sg,CdlgMrkv (x,T) : [0,∞) × (,L,E) → (S,d) (X,t)
: t ∈ [0,∞)} of its is Markov relative to the right-limit extension LX ≡ {L natural filtration LX ≡ {L(X,t) : t ∈ [0,∞)}. More precisely, let the nondecreasing sequence 0 ≡ s0 ≤ s1 ≤ · · · ≤ sm in [0,∞), the function f ∈ C(S m+1,d m+1 ), and t ≥ 0 be arbitrary. Then (X,t)
E(f (Xt+s(0),Xt+s(1), . . . ,Xt+s(m) )|L
)
X(t),T (f ) = E(f (Xt+s(0),Xt+s(1), . . . ,Xt+s(m) )|Xt ) = Fs(0),...,s(m)
(11.6.21)
as r.r.v.’s.
Proof. Let (,L,E) ≡ (0,L0,I0 ) ≡ ([0,1],L0, ·dx) denote the Lebesgue integration space based on the unit interval 0 . 1. Let T ∈ T (δ,α) be arbitrary. Then, since T is a Markov semigroup with parameter set [0,∞), its restriction T|Q∞ satisfies the conditions in Definition 11.3.1 to be a Markov semigroup with parameter set Q∞ , with the same modulus of strong continuity δT = δ and the same modulus of smoothness αT = α. 2. Define the set T (δ,α)|Q∞ ≡ {T|Q∞ : T ∈ T (δ,α)}. Then the sets T (δ,α) and T (δ,α)|Q∞ both inherit the metric ρT ,ξ introduced in Definition 11.6.2. As observed in the previous paragraph, members of the family T (δ,α)|Q∞ share the common moduli δ and α. As observed in Definition 11.6.2, the mapping : (T (δ,α),ρT ,ξ ) → (T (δ,α)|Q∞,ρT ,ξ ), defined by (T) ≡ T|Q∞ for each T ∈ T (δ,α), is an isometry. Hence the mapping : (S × T (δ,α),d ⊗ ρT ,ξ ) → (S × T (δ,α)|Q∞,d ⊗ ρT ,ξ ), defined by (x,T) ≡ (x, (T)) for each (x,T) ∈ S × T (δ,α), is uniformly continuous. 3. Separately, Lemma 11.6.4 says that the mapping Cp (Q∞,S), ρMarg,ξ,Q(∞) ), Sg,fj d : (S × T (δ,α)|Q∞,d ⊗ ρT ,ξ ) → (F
Markov Process
535
constructed in Theorem 11.4.2, is uniformly continuous, with a modulus of continuity δSg,fj d (·,δ,α, ξ ) completely determined by the moduli δ,α and the modulus of local compactness ξ ≡ (|An |)n=1,2,... of (S,d). Assertion 1 is proved. 4. Moreover, Theorem 6.4.4 says that the Compact Daniell–Kolmogorov– Skorokhod Extension (Q∞,S), ∞ × 0,S), DKS,ξ : (F ρMarg,ξ,Q(∞) ) → (R(Q ρP rob,Q(∞) ) is uniformly continuous with a modulus of continuity δ DKS (·, ξ ) dependent only on ξ . 5. Combining, we see that the composite mapping DKS,ξ ◦ Sg,fj d ◦ is uniformly continuous. Now consider the range of this composite mapping. Specifically, take an arbitrary (x,T) ∈ S ×T (δ,α) and consider the image process Z ≡ DKS,ξ ( Sg,fj d ( (x,T))). Write F ≡ F x,T|Q(∞) ≡ Sg,fj d (x,T|Q∞ ) for the consistent family constructed in Theorem 11.4.2, generated by the initial state x and the semigroup T|Q∞ , in the sense of Definition 11.4.1. Thus Z ≡ DKS,ξ (F x,T|Q(∞) ) and, by the definition of the mapping DKS,ξ , the process Z has marginal distributions given by the family F x,T|Q(∞) . In particular Z0 = x. 6. The semigroup T|Q∞ has a modulus of strong continuity δ. At the same time, Assertion 1 of Theorem 11.5.4 implies that the process Z ≡ DKS,ξ (F x,T|Q(∞) ) is generated by the initial state x and semigroup T|Q∞ , in the sense of Definition 11.4.1. Hence, by Assertion 4 of Proposition 11.5.5, the process Z is time-uniformly D-regular in the sense of Definition 10.10.2, with some modulus of continuity in probability δ Cp,δ ≡ (δCp,δ ,δCp,δ , . . .) and some modulus of D-regularity m δ ≡ (mδ ,mδ , . . .). In the notations of Definition 10.10.2, we thus have Z ∈ R Dreg, δ (Cp,δ), m (δ). (Q∞ × ,S). Summing up, we see that the range of the composite mapping DKS,ξ ◦ Sg,fj d ◦ is contained in the subset ρP rob,Q(∞) ). Hence we have R Dreg, δ (Cp,δ), m (δ). (Q∞ × ,S) of (R(Q∞ × 0,S), the uniformly continuous mapping DKS,ξ ◦ Sg,fj d ◦ : (S × T (δ,α),d ⊗ ρT ,ξ ) → (R ρP rob,Q(∞) ). Dreg, δ (Cp,δ), m (δ). (Q∞ × ,S), 7. By Assertion 5 in Proposition 11.5.5, the right-limit extension X ≡ rLim (Z) : [0,∞) × (,L,E) → (S,d) is a time-uniformly a.u. càdlàg process on [0,∞), in the sense of Definition 10.10.3, with some modulus of a.u. càdlàg δaucl,δ ≡ (δaucl,δ ,δaucl,δ , . . .), and with the same modulus of continuity in probability δ Cp,δ as Z. In short,
536
Stochastic Process δ (aucl,δ(T)),δ (Cp,δ(T)) [0,∞). X ≡ rLim (Z) ∈ D
8. Since (x,T) ∈ S × T (δ,α) is arbitrary, the composite mapping rLim ◦ DKS,ξ ◦ Sg,fj d ◦ is well defined. We have already seen that the mapping DKS,ξ ◦ Sg,fj d ◦ is uniformly continuous. In addition, Theorem 10.10.9 says that rLim : (R ρP rob,Q(∞) ) → (D[0,∞), ρD[0,∞) ) Dreg, δ (Cp,δ), m (δ). (Q∞ × ,S), is an isometry. Combining, we see that the composite construction mapping Sg,CdlgMrkv ≡ rLim ◦ DKS,ξ ◦ Sg,fj d ◦ : (S × T (δ,α),d ⊗ ρT ,ξ ) ) → (D[0,∞), ρD[0,∞) is well defined and is uniformly continuous. Moreover, X|Q∞ = Z has marginal distributions given by the family F x,T|Q(∞) . 9. Let m ≥ 1, f ∈ C(S m,d m ), s1 ≤ · · · ≤ sm in [0,∞), and r1 ≤ · · · ≤ rm in Q∞ . Then, because Z ≡ DKS,ξ (F x,T|Q(∞) ) is generated by the initial state x and semigroup T|Q∞ , we have Ef (Xr(1), . . . ,Xr(m) ) ! ! ! x(1) x(m−1) x = Tr(1) (dx1 ) Tr(2)−r(1) (dx2 ) · · · Tr(m)−r(m−1) (dxm )f (x1, . . . ,xm ). (11.6.22) Now let ri ↓ si for each i = 1, . . . ,m. Then the left-hand side of equality 11.6.22 converges to Ef (Xs(1), . . . ,Xs(m) ) because the process X is continuous in probability. At the same time, by Assertion 1 of Lemma 11.3.2, the right-hand side of equality 11.6.22 converges to ! ! ! x(1) x(m−1) x Ts(1) (dx1 ) Ts(2)−s(1) (dx2 ) · · · Ts(m)−s(m−1) (dxm )f (x1, . . . ,xm ). Consequently, Ef (Xs(1), . . . ,Xs(m) ) ! ! x(1) x(m−1) x = Ts(1) (dx1 ) Ts(2)−s(1) (dx2 ) · · · Ts(m)−s(m−1) (dxm )f (x1, . . . ,xm ). where m ≥ 1, f ∈ C(S m,d m ), and s1 ≤ · · · ≤ sm in [0,∞) are arbitrary. Thus the process X is generated by the initial state x and semigroup T, according to Definition 11.4.1. Assertion 2 of the present theorem is proved. 10. Finally, note that Assertion 3 of the present theorem is merely a restatement of Assertions 3 and 4 of Theorem 11.5.6. The present theorem is proved.
11.7 Feller Semigroup and Feller Process In the previous sections, we constructed and studied Markov processes with a compact metric state space. In the present section, we will construct and study Feller processes, which are Markov processes with a locally compact metric state space.
Markov Process
537
Definition 11.7.1. Specification of locally compact state space and related objects. In this section, let (S,d) be a locally compact metric space, as specified in Definition 11.0.1, along with related objects, including a reference point x◦ ∈ S and a binary approximation ξ . In addition, for each n ≥ 0 and for each y ∈ S, define the function hy,n ≡ (1 ∧ (1 + n − d(·,y))+ ∈ C(S,d). Then, for each fixed y ∈ S, we have hy,n ↑ 1 as n → ∞, uniformly on compact subsets of (S,d). Define hy,n ≡ 1 − hy,n ∈ Cub (S,d). The continuous functions hy,n and hy,n will be surrogates for the indicators 1(d(y,·)≤n) and 1(d(y,·)>n) , respectively. Let (S,d) be a one-point compactification of the metric space (S,d), where d ≤ 1 and where is called the point at infinity. For ease of reference, we list here almost verbatim the conditions from Definition 3.4.1 for the one-point compactification (S,d). 1. S ∪ {} is a dense subset of (S,d). Moreover, d ≤ 1. 2. For each compact subset K of (S,d), there exists c > 0 such that d(x,) ≥ c for each x ∈ K. 3. Let K be an arbitrary compact subset of (S,d). Let ε > 0 be arbitrary. Then there exists δK (ε) > 0 such that for each y ∈ K and z ∈ S with d(y,z) < δK (ε), we have d(y,z) < ε. In particular, the identity mapping ι¯ : (S,d) → (S,d) is uniformly continuous on each compact subset of S. 4. The identity mapping ι : (S,d) → (S,d), defined by ι(x) ≡ x for each x ∈ S, is uniformly continuous on (S,d). In other words, for each ε > 0, there exists δd (ε) > 0 such that d(x,y) < ε for each x,y ∈ S with d(x,y) < δd (ε). 5. For each n ≥ 1, we have (d(·,x◦ ) > 2n+1 ) ⊂ (d(·,) ≤ 2−n ). Separately, refer to Definition 11.0.2 for notations related to the enumerated 0 ,Q 1 · · · ,Q∞ of dyadic rationals in [0,∞), and to the enumerated sets sets Q Q0 ,Q1 · · · ,Q∞ of dyadic rationals in [0,1]. Definition 11.7.2. Feller semigroup. Let V ≡ {Vt : t ∈ [0,∞)} be an arbitrary family of nonnegative linear mappings Vt from Cub (S,d) to Cub (S,d) such that V0 is the identity mapping. Suppose, for each t ∈ [0,∞) and for each y ∈ S, the function y
Vt ≡ Vt (·)(y) : Cub (S,d) → R is a distribution on the locally compact space (S,d). Suppose, in addition, the following four conditions are satisfied: 1. (Smoothness.) For each N ≥ 1, for each t ∈ [0,N ], and for each f ∈ Cub (S,d) with a modulus of continuity δf and with |f | ≤ 1, the function
538
Stochastic Process
Vt f ∈ Cub (S,d) has a modulus of smoothness αV,N (δf ) that depends on N and on δf , and that is otherwise independent of the function f . 2. (Semigroup property.) For each s,t ∈ [0,∞), we have Vt+s = Vt Vs . 3. (Strong continuity.) For each f ∈ Cub (S,d) with a modulus of continuity δf and with |f | ≤ 1, and for each ε > 0, there exists δV (ε,δf ) > 0 so small that for each t ∈ [0,δV (ε,δf )), we have |f − Vt f | ≤ ε
(11.7.1)
as functions on (S,d). 4. (Non-explosion.) For each N ≥ 1, for each t ∈ [0,N ], and for each ε > 0, there exists an integer κV,N (ε) > 0 so large that if n ≥ κV,N (ε), then y
Vt hy,n ≤ ε for each y ∈ S. Then we call the family V a Feller semigroup. The operation δV is called a modulus of strong continuity of V. The sequence αV ≡ (αV,N )N =1,2,... of operations is called a modulus of smoothness of V. The sequence κV ≡ (κV,N )N =1,2,... of operations is called a modulus of non-explosion of V. Lemma 11.7.3. Each Markov semigroup, with a compact state space, is a Feller semigroup. Each Markov semigroup T, with a compact state space, in the sense of Definition 11.3.1, is a Feller semigroup. Proof. Straightforward and omitted.
Consequently, all the following results for Feller semigroups and Feller processes will be applicable to Markov processes with Markov semigroups. Definition 11.7.4. Family of transition distributions generated by Feller semigroup. Suppose, for each x ∈ S, we are given a consistent family F x,V of f.j.d.’s with state space (S,d) and parameter set [0,∞). 1. Suppose, for arbitrary m ≥ 1, f ∈ Cub (S m,d m ), and nondecreasing sequence r1 ≤ · · · ≤ rm in [0,∞), we have, for each x ∈ S, x,V Fr(1),...,r(m) f ! ! ! x(1) x(m−1) x = Vr(1) (dx1 ) Vr(2)−r(1) (dx2 ) · · · Vr(m)−r(m−1) (dxm )f (x1, . . . ,xm ).
(11.7.2) Then, for each x ∈ S, the family F x,V is called the consistent family of f.j.d.’s generated by the initial state x and Feller semigroup V. 2. Suppose, in addition, for arbitrary m ≥ 1, f ∈ Cub (S m,d m ), and sequence
∗,V r1 ≤ · · · ≤ rm in [0,∞), the function Fr(1),...,r(m) f : (S,d) → R, defined by ∗,V x,V f )(x) ≡ Fr(1),...,r(m) f for each x ∈ S, is uniformly continuous and (Fr(1),...,r(m)
Markov Process
539
∗,V ∗,V bounded or, in symbols, Fr(1),...,r(m) f ∈ Cub (S,d). Then {Fs(1),...,s(m) : m ≥ 1; s1, . . . ,sm ∈ [0,∞)} is called the family of transition distributions generated by the Feller semigroup V.
To use the results developed in previous sections for Markov semigroups and Markov processes, where the state space is assumed to be compact, we embed each given Feller semigroup on the locally compact state space (S,d) into a Markov semigroup on the one-point compactification (S,d) state space, as follows. Lemma 11.7.5. Compactification of a Feller semigroup into a Markov semigroup with a compact state space. Let V ≡ {Vt : t ∈ [0,∞)} be an arbitrary Feller semigroup on the locally compact metric space (S,d), with moduli δV , αV , κV as in Definition 11.7.2. Then there exists a Markov semigroup T ≡ {Tt : t ∈ [0,∞)} with state space (S,d), such that (Tt g)() ≡ Tt g ≡ g() and
!
y
(Tt g)(y) ≡ Tt g ≡
y
!
Tt (dz)g(z) ≡
(11.7.3)
y
Vt (dz)g(z)
(11.7.4)
z∈S
z∈S
for each y ∈ S, for each g ∈ C(S,d), for each t ∈ [0,∞). Equality 11.7.4 is equivalent to y
y
Tt g ≡ Vt (g|S) ≡ Vt (g|S)(y) for each y ∈ S, for each g ∈ C(S,d), for each t ∈ [0,∞). Moreover, S is a y full subset, and {} is a null subset, of S relative to the distribution Tt , for each t ∈ [0,∞). For want of a better name, such a Markov semigroup T will be called a compactification of the Feller semigroup V. Proof. Let t ∈ [0,∞) be arbitrary. Let N ≥ 1 be such that t ∈ [0,N ]. Let g ∈ C(S,d) be arbitrary, with a modulus of continuity δ g . There is no loss of generality in assuming that g has values in [0,1]. As an abbreviation, write f ≡ g|S ∈ Cub (S,d). Let ε > 0 be arbitrary. 1. Let δd be the operation listed in Condition 4 of Definition 11.7.1. Consider arbitrary points y,z ∈ S with d(y,z) < δd (δ g (ε)). Then, according to Condition 4 of Definition 11.7.1, we have d(y,z) < δ g (ε). Hence |f (y) − f (z)| = |g(y) − g(z)| < ε. Thus the function f ≡ g|S has a modulus of continuity δd ◦ δ g . Therefore, according to the definition of the modulus of smoothness αV in Definition 11.7.2, the function Vt f ∈ Cub (S,d) has a modulus of continuity αV,N (δd ◦ δ g ).
540
Stochastic Process
2. Let k ≥ 0 be so large that 2−k+1 < δ g (ε).
(11.7.5)
Define n ≡ 2k+1 ∨ κV,N (ε). Then, by the definition of κV,N , we have y
0 ≤ Vt hy,n ≤ ε
(11.7.6)
for each y ∈ S. 3. By Condition 5 of Definition 11.7.1, we have, for each u ∈ S, if d(u,x◦ ) > n ≥ 2k+1 , then d(u,) ≤ 2−k < δ g (ε), whence |f (u) − g()| = |g(u) − g()| ≤ ε. 4. Take an arbitrary a ∈ (2n + 1,2n + 2) such that the set K ≡ {u ∈ S : d(x◦,u) ≤ a} is a compact subset of (S,d). Define K ≡ {u ∈ S : d(x◦,u) > 2n + 1}.
(11.7.7)
∪ K .
Then S = K Recall here that S ∪ {} is a dense subset of (S,d) by Condition 1 of Definition 11.7.1. At the same time, by Condition 2 of Definition 11.7.1, there exists cK > 0 such that d(x,) ≥ cK for each x ∈ K. 5. Define ε ≡ αV,N (δd ◦ δ g )(ε). Then, by Condition 3 of Definition 11.7.1, there exists δK (ε ) > 0 such that for each y ∈ K and z ∈ S with d(y,z) < δK (ε ), we have d(y,z) < ε ≡ αV,N (δd ◦ δ g )(ε), whence, in view of the last statement in Step 1, we have y
|Vt f − Vtz f | < ε.
(11.7.8)
6. Now define δ(ε) ≡ δK (ε ) ∧ cK and consider each y,z ∈ S ∪ {} with d(y,z) < δ(ε). We will verify that |(Tt g)(y) − (Tt g)(z)| ≤ 4ε.
(11.7.9)
To that end, note that there are five possibilities: (i) y ∈ K and z ∈ S, (ii) y ∈ K and z = , (iii) y,z ∈ K , (iv) y ∈ K and z = , and (v) y = z = . Consider case (i). Then inequality 11.7.8 holds. Moreover, the left-hand side of inequality 11.7.9 is equal to the left-hand side of inequality 11.7.8. Therefore the desired inequality 11.7.9 holds. Next consider case (ii). Then y ∈ K and z = . Hence d(y,z) ≥ cK ≥ δ(ε), which is a contradiction. Therefore case (ii) can be ruled out. 7. Suppose y ∈ K . Then d(x◦,y) > 2n + 1. Therefore, for each point u ∈ S with hy,n (u) > 0, we have d(y,u) < n + 1, and so d(u,x◦ ) ≥ d(x◦,y) − d(y,u) > (2n + 1) − (n + 1) = n.
Markov Process
541
In view of Step 3, it follows that for each point u ∈ S with hy,n (u) > 0, we have |f (u) − g()| ≤ ε. Therefore |f hy,n − g()hy,n | ≤ ε on S. Consequently, y
y
y
y
y
0 ≤ (Tt g)(y) ≡ Vt f = Vt hy,n f + Vt hy,n f ≤ Vt hy,n f + Vt hy,n y
y
≤ Vt hy,n f + ε ≤ Vt (g()hy,n + ε) + ε y
= g()Vt hy,n + 2ε ≤ g() + 2ε,
(11.7.10)
where we have used equality 11.7.6. 8, To continue, consider case (iii). Then, with z in the role of y in the previous paragraph, we can prove, similarly, that 0 ≤ (Tt g)(z) ≤ g() + 2ε,Combining with equality 11.7.10, we again obtain inequality 11.7.9. Now consider case (iv). Then equality 11.7.10 holds, while (Tt g)(z) ≡ g(). Combining, we obtain, once more, inequality 11.7.9. Finally, consider case (v). Then (Tt g)(y) ≡ g() ≡ (Tt g)(z). Hence inequality 11.7.9 trivially holds. 9. Summing up, inequality 11.7.9 holds for each y,z ∈ S ∪ {} with d(y,z) < δ(ε), where ε > 0 is arbitrary. Thus the function Tt g is continuous on the dense subset S of (S,d), with a modulus of continuity αT,N (δ g ) defined by αT,N (δ g ) ≡ α (δ g ,αV,N ) ≡ δ ≡ cK ∧ δK ◦ αV,N ◦ δd (δ g ),
(11.7.11)
where δ g is the modulus of continuity of the arbitrary function g ∈ C(S,d). 10. Hence Tt g can be extended to a continuous function Tt g ∈ C(S,d), with the modulus of continuity αT,N (δ g ). Thus Tt : C(S,d) → C(S,d) is a well-defined function. By the defining equality 11.7.4, it is a nonnegative linear function, with y Tt 1 = 1. Hence, for each y ∈ S, the linear and nonnegative function Tt is an y integration with Tt 1 = 1. Moreover, for each t ∈ [0,∞), for each N ≥ 1 such that t ∈ [0,N ], and for each g ∈ C(S,d) with modulus of continuity δ g , the function Tt g has a modulus of continuity αT,N (δ g ). We conclude that Tt is a transition distribution from (S,d) to (S,d), where N ≥ 1 and t ∈ [0,N ] are arbitrary. It is also clear from the defining equality 11.7.4 that T0 is the identity mapping. 11. It remains to verify the conditions in Definition 11.3.1 for the family T ≡ {Tt : t ∈ [0,∞)} to be a Markov semigroup. The smoothness condition follows immediately from the first sentence in Step 10, which says that the operation αT,N is a modulus of smoothness for the transition distribution Tt , for each N ≥ 1, for each t ∈ [0,N ]. 12. For the semigroup property, consider each s,t ∈ [0,∞). Let y ∈ S be arbitrary. Then, by inequality 11.7.6, we have y
y
Ts hy,k = Vs hy,k ↑ 1
(11.7.12)
and hy,k ↑ 1S , as k → ∞. Consequently, S is a full subset and {} is a null subset y of S relative to the distribution Ts . Hence g|S is equal to the r.r.v. g on a full set, y and is itself an r.r.v. relative to Ts , with
542
Stochastic Process y
y
y
Tt g = Tt (g|S) ≡ Vt (g|S),
(11.7.13)
where y ∈ S and g ∈ C(S,d) are arbitrary. Therefore (Tt g)|S = Vt (g|S),
(11.7.14)
where g ∈ C(S,d) is arbitrary. Equality 11.7.13, with Tt g,s in the roles of g,t, respectively, implies that y
y
y
y
Ts (Tt g) = Vs ((Tt g)|S) = Vs (Vt (g|S)) = Vs+t (g|S),
(11.7.15)
where the second equality is from equality 11.7.14, and where the last equality is by the semigroup property of the Feller semigroup V . Applying equality 11.7.15, y y with t,s replaced by 0,t + s, respectively, we obtain Ts+t (g) = Vs+t (g|S). Substituting back into equality 11.7.15, we have y
y
Ts (Tt g) = Ts+t (g), where y ∈ S is arbitrary. At the same time, the defining equality 11.7.3 implies that (g). Ts (Tt g) ≡ (Tt g)() ≡ g() = Ts+t
Thus we have proved that Ts (Tt g) = Ts+t (g) on the dense subset S ∪ {} of (S,d). Hence, by continuity, Ts (Tt g) = Ts+t (g), where g ∈ C(S,d) is arbitrary. The semigroup property is proved for the family T. 13. It remains to verify strong continuity of the family T. To that end, let ε > 0 be arbitrary, and let g ∈ C(S,d) be arbitrary, with a modulus of continuity δ g and
g ≤ 1. Define δT (ε,δ g ) ≡ δ (ε,δ g ,δV ) ≡ δV (ε,δd ◦ δ g ).
(11.7.16)
g − Tt g ≤ ε,
(11.7.17)
We will prove that
provided that t ∈ [0,δT (ε,δ g )). First note that, by the defining equality 11.7.3, we have g() − (Tt g)() = 0.
(11.7.18)
Next, recall from Step 1 that the function g|S has a modulus of continuity δd ◦ δ g . Hence, by the strong continuity of the Feller semigroup, there exists δV (ε,δd ◦ δ g ) > 0 so small that for each t ∈ [0,δV (ε,δd ◦ δ g )), we have |(g|S) − Vt (g|S)| ≤ ε as functions on S. Then, for each y ∈ S, we have y
y
|Tt g − g(y)| ≡ |Vt (g|S) − g(y)| ≤ ε,
(11.7.19)
Markov Process
543
where the inequality is from inequality 11.7.19. Combining with equality 11.7.18, we obtain |Tt g−g| ≤ ε on the dense subset S ∪{} of (S,d). Hence, by continuity, we have
Tt g − g ≤ ε, where t ∈ [0,δT (ε,δ g )) and g ∈ C(S,d) are arbitrary, with a modulus of continuity δ g and g ≤ 1. Thus we have also verified the strong continuity condition in Definition 11.3.1 for the family T ≡ {Tt : t ∈ [0,∞)} to be a Markov semigroup. Definition 11.7.6. Feller process. Let V ≡ {Vt : t ∈ [0,∞)} be an arbitrary Feller semigroup on the locally compact metric space (S,d). Let (,L,E) be an arbitrary probability space. For each x ∈ S, let U x,V : [0,∞) × (,L,E) → (S,d) be a process such that 1. U x,V is a.u. càdlàg and 2. U x,V has marginal distributions given by the family F x,V of f.j.d.’s generated by the initial state x and Feller semigroup V, in the sense of Definition 11.7.4. Then the triple ((S,d),(,L,E),{U x,V : x ∈ S}) is called a Feller process with the Feller semigroup V. From a given Feller semigroup V with locally compact state space (S,d), the preceding theorem constructed a compactification Markov semigroup T with compact state space (S,d). For each x ∈ S, we can construct an a.u. càdlàg strong Markov process Xx,T with compact state space (S,d), from which we can extract a Feller process with the given locally compact state space along with all the nice sample properties. Thus the next theorem proves the existence of a Feller process with an arbitrarily given Feller semigroup. Theorem 11.7.7. Construction of a Feller process and Feller transition f.j.d.’s from a Feller semigroup. Let V ≡ {Vt : t ∈ [0,∞)} be an arbitrary Feller semigroup on the locally compact metric space (S,d), with moduli δV , αV , κV as in Definition 11.7.2. Let T ≡ {Tt : t ∈ [0,∞)} be a compactification of V, in the sense of Lemma 11.7.5. Let x ∈ S be arbitrary. Let X ≡ Xx,T : [0,∞) × (,L,E) → (S,d) be the a.u. càdlàg Markov process generated by the initial state x and semigroup T, as constructed in Theorem 11.6.5. Then the following conditions hold: 1. For each t ∈ [0,∞), we have P (Xt ∈ S) = 1. 2. Let M ≥ 1 be arbitrary. For each ε0 > 0, there exists β ≡ βM (ε0 ) > 0 such that for each h ≥ 0 we have
544
⎛ P⎝
Stochastic Process
⎞
(d(x,Xv ) > β)⎠ ≤ ε0 .
(11.7.20)
v∈[0,M]Q(h)
We emphasize that the bound βM exists regardless of how large h is. 3. Let t ∈ [0,∞) be arbitrary. Define the function Utx,V : (,L,E) → (S,d) by domain(Utx,V ) ≡ Dtx,T ≡ (Xtx,T ∈ S) and by Utx,V (ω) ≡ Xtx,T (ω) for each ω ∈ domain(Utx,V ). More succinctly, we define Utx,V ≡ ι(Xtx,T |Dtx,T ),
(11.7.21)
where the identity mapping ι¯ : (S,d) → (S,d), defined by ι(y) ≡ y for each y ∈ S, is uniformly continuous on each compact subset of (S,d), according to Condition 3 of Definition 11.7.1. Then Utx,V = Xtx,T a.s. Moreover, the function U ≡ U x,V : [0,∞) × (,L,E) → (S,d) is a well-defined process. 4. The process U x,V has marginal distributions given by the consistent family x,V of f.j.d.’s generated by the initial state x and the Feller semigroup V, in the F sense of Definition 11.7.4. In particular, for arbitrary m ≥ 1, f ∈ Cub (S m,d m ), and nondecreasing sequence r1 ≤ · · · ≤ rm in [0,∞), we have x,T x,V Ef (Ur(1), . . . ,Ur(m) ) = Fr(1),...,r(m) f = Fr(1),...,r(m) f.
5. The process U x,V is time-uniformly a.u. càdlàg in the sense of Definition 10.10.1, with a modulus of continuity in probability (δCp,δ(V),δCp,δ(V), . . .) and a modulus of a.u. càdlàg (δaucl,δ(V),δaucl,δ(V), . . .) that are completely determined by δV and independent of x. 6. The triple ((S,d),(,L,E),{U x,V : x ∈ S}) is a Feller process, with the Feller semigroup V. Proof. 1. Lemma 11.7.5 says that S is a full subset, and {} is a null subset, of S relative to the distribution Ttx , whence P (Xt ∈ S) = Tsx 1S = 1, where x ∈ S and t ∈ [0,∞) are arbitrary. Assertion 1 follows. 2. Let t ≥ 0 and M ≥ 1 be arbitrary such that t ∈ [0,M]. Let ε0 > 0 be arbitrary. Write ε ≡ 2−1 ε0 . Take any n ≥ κV,M (ε) and any b ≥ n + 1. Then, by the non-explosion condition in Definition 11.7.2, we have y
Vt hy,n ≤ ε for each y ∈ S. Define the function gn ∈ Cub (S 2,d 2 ) by gn (y,z) ≡ hy,n (z)
Markov Process
545
for each (y,z) ∈ S 2 . Then, using equality 11.4.1 in Definition 11.4.1, we obtain ! ! ! y,T y y x(1) F0,t gn = T0 (dx1 ) Tt (dx2 )gn (x1,x2 ) = T0 (dx1 )Ttx(1) gn (x1,·) y
y
y
= Tt gn (y,·) = Vt gn (y,·) ≡ Vt hy,n ≤ ε,
(11.7.22)
where we used the fact that T0 is the identity mapping, and where y ∈ S and xh t ∈ [0,M] are arbitrary. Moreover, TMx hx,n = VM x,n ≤ ε. Consequently, P (d(x,XM ) > b) ≤ Ehx,n (XM ) = TMx hx,n ≤ ε.
(11.7.23)
3. Next, take any β > 2b such that (d(x,·) ≤ β) is a compact subset of (S,d). h ) be the simple first exit Consider each h ≥ 0. Let η ≡ η0,β,[0,M]Q(h) (X|[0,M]Q time for the process X|[0,M]Qh to exit the β-neighborhood of X0 , in the sense of Definition 8.1.12. Then ⎞ ⎛ (d(x,Xv ) > β)⎠ P⎝ v∈[0,M]Q(h)
≤ P (d(x,Xη ) > β) ≤ P (d(x,Xη ) > β;d(x,XM ) ≤ b) + P (d(x,XM ) > b) ≤ P (d(x,Xη ) > β;d(x,XM ) ≤ b) + ε
= P (d(x,Xv ) > β,η = v;d(x,XM ) ≤ b) + ε v∈[0,M]Q(h)
≤
P (d(Xv,XM ) > β − b,η = v) + ε
v∈[0,M]Q(h)
≤
P (d(Xv,XM ) > b,η = v) + ε
v∈[0,M]Q(h)
≤
E(hX(v),n (XM ),η = v) + ε
v∈[0,M]Q(h)
≡
v∈[0,M]Q(h)
=
E(gn (Xv,XM ),η = v) + ε X(v),T
E F0,M−v gn,η = v + ε,
v∈[0,M]Q(h)
≤
E(ε;η = v) + ε = 2ε ≡ ε0,
(11.7.24)
v∈[0,M]Q(h)
where the last inequality is thanks to inequality 11.7.22; where the third inequality is by inequality 11.7.23; where the sixth inequality is because b ≥ n + 1, whence hX(v),n (XM ) ≥ 1b 2−m(k) ≡ m(k)
(11.8.5)
d(U (0,ω),U (·,ω)) ≤ 2−k
(11.8.6)
and
on the interval θ0 (ω) ≡ [0,τ1 (ω)). Inequalities 11.8.6 and 11.8.5 together imply that for each ω ∈ A, we have d(U0 (ω),Uu (ω)) ≤ 2−k
(11.8.7)
for each u ∈ [0,m(k) ] ∩ domain(U (·,ω)). 3. Consider each ω ∈ AB and each κ ≥ k. Write Jκ ≡ Jκ.k ≡ 2m(κ)−m(k) . Then Jκ m(κ) = m(k) . Since ω ∈ B, we have j m(κ) ∈ [0,m(k) ] ∩ domain(U (·,ω)) for each j = 0, . . . ,Jκ . Therefore, according to inequality 11.8.7, we have
552
Stochastic Process J (κ)
0,Uj (m(κ)) )1A ≤ 2−k d(U
j =0
on AB, where we recall that d ≡ Cub (S J (κ)+1,d J (κ)+1 ) by
1 ∧ d. Define the function fκ
fκ (x0,x1, . . . ,xJ (κ) ) ≡
J (κ)
∈
0,xj ) d(x
j =0
for each (x0,x1, . . . ,xJ (κ) ) ∈ S J (κ)+1 . Then Efκ (U0,U(m(κ)), . . . ,UJ (κ)(m(κ)) ) =E
J (κ)
0,Uj (m(κ)) ) d(U
j =0
≤E
J (κ)
0,Uj (m(κ)) )1AB + P ((AB)c ) ≤ 2−k + 2−k = 2−k+1, d(U
j =0
(11.8.8) In terms of marginal distributions of the process U ≡ U x,V , inequality 11.8.8 can be rewritten as x,V −k+1 , F0,(m(κ)),...,J (κ)(m(κ)) fκ ≤ 2
(11.8.9)
where x ∈ S is arbitrary. 4. Separately, because U |Q∞ is D-regular, with the sequence m as a modulus of D-regularity, Assertion 2 of Lemma 10.5.5, where Z,v,v are replaced by U |Q∞,0,m(k) , respectively, implies that the supremum V0,k,∞ ≡
sup
0,Uu ) = 1 ∧ d(U
u∈[0,(m(k))]Q(∞)
sup
d(U0,Uu )
u∈[0,(m(k))]Q(∞)
is a well-defined r.r.v. 5. Recall that κ ≥ k is arbitrary. Assertion 3 of Lemma 10.5.5, where Z,v,v ,h are replaced by U |Q∞,0,m(k),κ, respectively, says that 0,Uu ) − E 0,Uu ) ≤ 2−κ+5 . 0≤E sup d(U d(U u∈[0,(m(k))]Q(∞)
u∈[0,(m(k))]Q(m(κ))
(11.8.10) As an abbreviation, define the r.r.v. 0,Uu ) d(U V0,k,κ ≡ u∈[0,(m(k))]Q(m(κ))
=
J (κ) j =0
0,Uj (m(κ)) ) ≡ fκ (U0,U(m(κ)), . . . ,UJ (κ)(m(κ)) ). (11.8.11) d(U
Markov Process
553
Then inequality 11.8.10 can be rewritten compactly as 0 ≤ EV0,k.∞ − EV0,k,κ ≤ 2−κ+5,
(11.8.12)
where κ ≥ k is arbitrary. Consequently, for each κ ≥ κ ≥ k, we have 0 ≤ EV0,k,κ − EV0,k,κ ≤ 2−κ+5 .
(11.8.13)
Equivalently, for each κ ≥ κ ≥ k, we have x,V x,V −κ+5 0 ≤ F0,(m(κ , )),...,J (κ )(m(κ )) fκ − F0,(m(κ)),...,J (κ)(m(κ)) fκ ≤ 2 (11.8.14)
where x ∈ S is arbitrary. h for some h ≥ 0, 6. Now let η be an arbitrary stopping time with values in Q (Z,t) : t ∈ Q∞ }. Generalize the defining relative to the filtration L ≡ LZ ≡ {L equality 11.8.11 for V0,k,κ to η,Uη+u ) Vη,k,κ ≡ d(U u∈[0,(m(k))]Q(m(κ))
≡
J (κ)
η,Uη+j (m(κ)) ) ≡ fκ (Uη,Uη+(m(κ)), . . . ,Uη+J (κ)(m(κ)) ). d(U
j =0
(11.8.15) h, whence Then, 1(η=t) ∈ L(Z,t) for each t ∈ Q
Efκ (Uη,Uη+(m(κ)), . . . ,Uη+J (κ)(m(κ)) )1(η=t) EVη,k,κ = t∈Q(h)
=
Efκ (Ut ,Ut+(m(κ)), . . . ,Ut+J (κ)(m(κ)) )1(η=t)
t∈Q(h)
=
U (t),V
EF0,(m(κ)),...,J (κ)(m(κ)) (fκ )1(η=t)
t∈Q(h)
=
U (η),V
EF0,(m(κ)),...,J (κ)(m(κ)) (fκ )1(η=t)
t∈Q(h) U (η),V
= EF0,(m(κ)),...,J (κ)(m(κ)) (fκ ),
(11.8.16)
where the third equality is by Lemma 11.8.4. Hence 0 ≤ EVη,k,κ − EVη,k,κ U (t),V U (t),V = E F0,(m(κ)),...,J (κ )(m(κ )) (fκ ) − F0,(m(κ)),...,J (κ)(m(κ)) (fκ ) ≤ 2−κ+5, where the equality is by equality 11.8.16, where the inequality is by inequality 11.8.14, and where κ ,κ are arbitrary with κ ≥ κ ≥ k.
554
Stochastic Process
7. Consequently, EVη,k,κ converges. By the Monotone Convergence Theorem, we have E|Vη,k,κ − W | → 0 for some r.r.v. W ∈ L, as κ → ∞. Moreover, Vη,k,κ ↑ W a.u. as κ → ∞. In particular, Vη,k,κ (ω) ↑ W (ω) as κ → ∞, for each ω in some full set B . By replacing B with BB if necessary, we may assume that B ⊂ B. Then, for each ω ∈ B , the supremum (η(ω),ω),U (η(ω) + u,ω)) = W (ω) d(U
sup u∈[0,(m(k))]Q(∞)
exists. Since ω ∈ B, the function U (·,ω) is right continuous on its domain. Hence the last displayed equality implies (η(ω),ω),U (η(ω) + v,ω)) = W (ω). d(U
sup v∈[0,(m(k))]
Therefore, in terms of the function Vη,k defined in the hypothesis, we have Vη,k (ω) ≡
η (ω),Uη+v (ω)) = W (ω), d(U
sup v∈[0,(m(k))]
where ω ∈ B ∩ domain(W ) is arbitrary. We conclude that Vη,k = W a.s., and therefore that Vη,k ∈ L is a well-defined integrable r.r.v. Assertion 1 is proved. Moreover, EVη,k = EW = lim EVη,k,κ = lim EF0,(m(κ)),...,J (κ)(m(κ)) fκ ≤ 2−k+1, U (η),T
κ→∞
κ→∞
where the third equality is by equality 11.8.16, and where the inequality follows from inequality 11.8.9. Assertion 2 and the lemma are proved. Lemma 11.8.6. Observability of Feller process at stopping times. Recall the abbreviations U ≡ U x,V and Z ≡ Z x,V ≡ U x,V |Q∞ . Recall the natural filtration L ≡ LU ≡ {L(U,t) : t ∈ [0,∞)} of the process U , and the natural filtration (U,t) : LZ ≡ {L(Z,t) : t ∈ Q∞ } of the process Z. In addition, let L ≡ LU ≡ {L t ∈ [0,∞)} be the right-limit extension of LU . Let τ be an arbitrary stopping time relative to the right continuous filtration L ≡ LU . Then the following conditions hold. 1. The function Uτ is a well-defined r.v. which is measurable relative to the probability subspace L
(U,τ )
≡ {Y ∈ L : Y 1(τ ≤t) ∈ L
(U,t)
for each regular point t ∈ [0,∞) of τ }, (11.8.17)
introduced in Definition 8.1.9. 2. There exists a nonincreasing sequence (ηh )h=0,1,... of stopping times, relative to the filtration LZ ≡ {L(Z,t) : t ∈ Q∞ }, such that for each h ≥ 0, the r.r.v. ηh h and such that has values in Q
Markov Process τ + 2h < ηh < τ + 2h−1 .
555 (11.8.18)
Then Uη(h) is an r.v. for each h ≥ 0, and Uη(h) → Uτ a.u. as h → ∞. Moreover, for each ω in some full set A, the function U (·,ω) is right continuous at τ (ω). 3. More generally, let m ≥ 1 and r0 ≤ · · · ≤ rm be arbitrary. Then, for each i = 0, · · · ,m,Uτ +r(i) is a well-defined r.v. that is measurable relative to the (U,τ +r(i))
. Moreover, for each i = 0, · · · ,m and for each probability subspace L h , (ii) uh,0 < · · · < uh,m , h ≥ 0, (i) there exists uh,i ∈ (ri ,ri + (m + 2)2−h ) ∩ Q and (iii) Uη(h)+u(h,i) → Uτ +r(i) a.u. as h → ∞. Proof. 1. For each h ≥ 0, recall the set h ≡ {0,h,2h, · · · } ≡ {0,2−h,2 · 2−h, · · · } ⊂ [0,∞). Q Let the increasing sequence (mk )k=1,2, . . . ≡ (mδ(V),k )k=1,2, · · · of positive integers be as constructed in Lemma 11.8.5 relative to Feller semigroup V. h is countable, there exists αh ∈ 2. Let h ≥ 0 be arbitrary. Because the set Q h . Define (h,2h ) such that the set (τ + αh ≤ s) is measurable for each s ∈ Q the topping time ηh ≡
∞
(s + h )1s−(h) εj ) is measurable. Then P (Aj ) ≤ εj−1 EWj ≤ εj−1 2−j −1 < 2j/2+1 2−j −1 = 2−j/2, where the first inequality is by Chebychev’s inequality, and where the second inequality is by inequality 11.8.22. Therefore, we can define the measurable set Ak+ ≡ ∞ j =k Aj , with P (Ak+ )
0 is arbitrary. Consequently, f (Uτ )1(τ ≤t) ∈
L(U,t+s) ≡ L(U,t+) ≡ L
(U,t)
.
(11.8.31)
s>0
In view of relation 11.8.31, we can apply the defining equality 11.8.17 to the stop(U ) ping time τ to obtain f (Uτ ) ∈ L τ . Since f ∈ C(S,d) is arbitrary, we conclude (U ) that the r.v. Uτ is measurable relative to L τ . Assertion 1 is proved. 8. Let k ≥ 0 be arbitrary. Then τ ≤ ηK < τ + δk for sufficiently large K ≥ 0. Hence, by inequality 11.8.27, we have η(κ),Uζ (k+1) ) ≤ 2 d(U
∞
εj
(11.8.32)
j =k
on Ack+ . Combining with inequality 11.8.29, we obtain η(κ),Uτ ) ≤ 2 d(U
∞
εj
(11.8.33)
j =k
on Ack+ , for sufficiently large κ ≥ 0. Since P (Ak+ ) is arbitrarily small, we conclude that Uη(κ) → Uτ a.u. as κ → ∞. Furthermore, equality 11.8.28 shows that U (·,ω) is right continuous at τ (ω). Assertion 2 is verified. 9. To prove Assertion 3, let (ri )0,··· ,m be an arbitrary nondecreasing sequence in [0,∞). Let i = 0, · · · ,m be arbitrary. Consider the stopping time τ + ri . By Assertion 2, there exists a full set Ai such that for each ω ∈ Ai , the function U (·,ω) h is right continuous at τ (ω) + ri . Note that the set Bh,i ≡ (ri ,ri + (m + 2)2−h )Q contains at least m + 1 members. Let uh,i be the i-th member of Bh,i . Suppose i ≥ 1. Then uh,i is at least equal to the i-th member of Bh,i−1 . Hence it is greater than the (i − 1)-st member of Bh,i−1 . In other words, uh,i > uh,i−1 . Conditions (i) and (ii) of Assertion 3 are proved. Since ηh + uh,i ↓ τ + ri , the right continuity of U at τ (ω) + ri implies that Uη(h)+u(h,i) → Uτ +r(i) a.u. as h → ∞. Condition (iii) of Assertion 3 is verified. The lemma is proved. Theorem 11.8.7. Feller processes are strongly Markov. Let x ∈ S be arbitrary. Recall the abbreviations U ≡ U x,V and Z ≡ Z x,V ≡ U x,V |Q∞ . Recall the natural filtration L ≡ LU ≡ {L(U,t) : t ∈ [0,∞)} of the process U , and the natural filtration LZ ≡ {L(Z,t) : t ∈ Q∞ } of the process Z. In addition, let L ≡ (U,t) LU ≡ {L : t ∈ [0,∞)} be the right-limit extension of LU . Then the process U ≡ U x,V is strongly Markov relative to the right-continuous filtration L. More precisely, let τ be an arbitrary stopping time relative to the right-continuous filtration L ≡ LU . Then
Markov Process (U,τ )
E(f (Uτ +r(0), · · · ,Uτ +r(m) )|L
559
) U (τ ),V
= E(f (Uτ +r(0), · · · ,Uτ +r(m) )|Uτ ) = Fr(0),...,r(m) (f ),
(11.8.34)
for each nondecreasing sequence 0 ≡ r0 ≤ r1 ≤ · · · ≤ rm , for each f ∈ Cub (S m+1,d m+1 ). Proof. 1. Let the nondecreasing sequence 0 ≡ r0 ≤ r1 ≤ · · · ≤ rm , be arbitrary. Let f ∈ Cub (S m+1,d m+1 ) be arbitrary. By Lemma 11.8.6, there exists a nonincreasing sequence (ηh )h=0,1, . . . of stopping times, relative to the filtration h , and such that LZ , such that for each h ≥ 0, the r.r.v. ηh has values in Q τ + 2h < ηh < τ + 2h−1 .
(11.8.35)
2. Consider each i = 0, · · · ,m. According to Assertion 3 of Lemma 11.8.6, there exists a sequence (uh,i )h=0,1, . . . such that for each h ≥ 0, we have (i) uh,i ∈ h , (ii) uh,0 < · · · < uh,m , and (iii) Uη(h)+u(h,i) → Uτ +r(i) (ri ,ri +(m+2)2−h )∩ Q a.u. as h → ∞. (U,τ ) . Recall, from Definition 8.1.9, that 3. Consider each Y ∈ L F G (U,τ ) (U,t) L ≡ Y ∈ L : Y 1(τ ≤t) ∈ L for each regular point t ∈ [0,∞) of τ . Consider each h ≥ 0. Take any regular point t of the r.r.v. τ . Then (U,t) Y 1(τ ≤t) ∈ L = L(U,u) u∈(t,∞)
⊂
u∈(t,∞)Q(∞)
L(U,u) =
L(Z,u) .
(11.8.36)
u∈(t,∞)Q(∞)
where the last equality is due to Lemma 11.5.2. h be arbitrary such that P (ηh+2 = u) > 0. Then, on the set 4. Now let u ∈ Q (ηh+2 = u), we have, by inequality 11.8.35, τ + 2h+2 < ηh+2 = u. Consequently, u − 2h+2 > 0. Take a regular point t ∈ (u − 2h+2,u) of the r.r.v. τ . Then, on (ηh+2 = u), we have τ < u − 2h+2 < t. It follows that 1η(h+2)=u = 1τ ≤t 1η(h+2)=u . Because t < u by the choice of t, we have Y 1(τ ≤t) ∈ L(Z,u) according to relation 11.8.36. Hence Y 1η(h+2)=u = Y 1τ ≤t 1η(h+2)=u ∈ L(Z,u), n(h+2) is arbitrary such that P (ηh+2 = u) > 0. where u ∈ Q
560
Stochastic Process
5. Next, write k ≡ h + 2. Then Ef (Uη(k)+u(k,0), · · · ,Uη(k)+u(k,m) )Y
= Ef (Uη(k)+u(k,0), · · · ,Uη(k)+u(k,m) )Y 1(η(k)=u) u∈Q(k)
=
Ef (Uu+u(k,0), · · · ,Uu+u(k,m) )Y 1(η(k)=u)
u∈Q(k)
=
U (u),V
EFu(k,0),··· ,u(k,m) (f )Y 1(η(k)=u)
u∈Q(k)
=
U (η(k)),V
EFu(k,0),··· ,u(k,m)) (f )Y 1(η(k)=u)
u∈Q(k)
U (η(k)),V
= EFu(k,0),··· ,u(k,m)) (f )Y
1(η(k)=u)
u∈Q(k) U (η(k)),V
= EFu(k,0),··· ,u(k,m)) (f )Y,
(11.8.37)
where the third equality is by applying relation 11.8.4 to the r.v. Y 1(η(k) = u) ∈ L(Z,u) . 6. Now let h → ∞. Then Uη(k)+u(h,j ) → Uτ +r(j ) a.u., for each j = 0, · · · ,m. Hence, in view of the boundedness and uniform continuity of the function f , the left-hand side of equality 11.8.38 converges to Ef (Uτ +r(0), · · · Uτ +r(m) )Y . 7. Consider the right-hand side, Consider first the case where f ∈ C(S m+1, x,V m+1 ). Then Corollary 11.7.8 says that the function Fr(0),··· d ,r(m) f of (x,r0, · · · ,rm ) ∈ S × {(r0, · · · ,rm ) ∈ [0,∞)m+1 : r0 ≤ · · · ≤ rm } m+1 is continuous relative to the metric d ⊗ decld , where decld is the Euclidean metric. Hence U (η(k)),V
U (τ ),V
Fu(k,0),··· ,u(k,m)) (f )Y → Fr(0),··· ,r(m) f
a.u.
(11.8.38)
E(Fu(k,0),··· ,u(k,m) f )Y → E(Fr(0),··· ,r(m) f )Y .
(11.8.39)
as k → ∞. Therefore U (η(k)),V
U (τ ),V
Combining with Step 7, we see that as k → ∞, equality 11.8.38 yields U (τ ),V Ef (Uτ +r(0), · · · ,Uτ +r(m) )Y = E(Fr(0),··· ,r(m) f )Y,
(11.8.40)
where f ∈ C(S m+1,d m+1 ). is arbitrary. Since C(S m+1,d m+1 ) is dense in Cub (S m+1,d m+1 ) relative to the L1 -norm with respect to each distribution, we see that (U,τ ) equality 11.8.40 holds for each Cub (S m+1,d m+1 ), where Y ∈ L is arbitrary. Thus (U,τ )
E(f (Uτ +r(0), · · · ,Uτ +r(m) )|L
U (τ ),V
) = Fr(0),··· ,r(m) f .
(11.8.41)
Markov Process In particular, equality 11.8.40 holds for each Y ∈ L(Uτ ) ⊂ L E(f (Uτ +r(0), · · · ,Uτ +r(m) )|Uτ ) =
561 (U,τ )
. Hence
U (τ ),V Fr(0),··· ,r(m) f .
The desired equality 11.8.34 has been verified. The theorem is proved.
(11.8.42)
11.9 Abundance of First Exit Times In this section, let the arbitrary Feller process ((S,d),(,L,E),{U y,V : y ∈ S}) be as specified, along with related objects, in Definitions 11.8.2 and 11.8.1. In (U,t) : t ∈ [0,∞)} is particular, we have x ∈ S, U ≡ U x,V , and L ≡ LU ≡ {L the right-limit extension of the natural filtration LU of U . In addition, let f ∈ Cub (S,d) and a0 ∈ R be arbitrary but fixed, such that a0 ≥ f (x). Let δf be a modulus of continuity of the function f . Let M ≥ 1 be an arbitrary integer. Recall Definition 10.11.1 of first exit times τ f ,a,N (U ) related to the right continuous filtration L ≡ LU , and recall Lemma 10.11.2 for their basic properties. Theorem 11.9.1. Abundance of first exit times for Feller process. There exists a countable subset G of R such that for each a ∈ (a0,∞)Gc , the first exit time τ f ,a,M (U ) exists relative to the filtration L. Here Gc denotes the metric complement of G in R. Proof. 1. Consider the process U ≡ U x,V : [0,∞) × (,L,E) → (S,d). Define the process Z ≡ U |Q∞ . 2. Let N = 0, . . . ,M − 1 be arbitrary. Consider the shifted processes U N and Z N . By assumption, the process U is a.u. càdlàg. Hence the process U N : [0,1] × (,L,E) → (S,d) is a.u. càdlàg, with some modulus of continuity of probability δCp ≡ δCp,N , and modulus of a.u. càdlàg δaucl ≡ δaucl,N . Therefore, by Theorem 10.4.3, the restricted process Z N = U N |Q∞ : Q∞ × (,L,E) → (S,d) is D-regular with some modulus of D-regularity m ≡ (mk )k=0,1,... . 3. Let w ∈ [0,M]Q∞ be arbitrary. Define the function Vw ≡ sup f (Uu ) : (,L,E) → R. u∈[0,w]
In the following steps, we will prove that (i) Vw is a well-defined r.r.v., whence (ii) the function V : [0,M]Q∞ × (,L,E) → R is a well-defined process. The motivation for this alleged process V is that the process U exits the open subset (f < a) approximately when V exceeds the level a, and the fact that as a nondecreasing real-valued process with a countable parameter set, V is simpler to handle. 4. First note that Definition 10.3.2 says that there exists a full set B ⊂ N t∈Q(∞) domain(Ut ) with the following properties. For each ω ∈ B, the funcN tion U (·,ω) satisfies the right-continuity condition and the right-completeness
562
Stochastic Process
condition in Definition 10.3.2. Moreover, for each k ≥ 0 and εk > 0, there exist (i ) δk ≡ δaucl,N (εk ) > 0, (ii ) a measurable set Ak ⊂ B with P (Ack ) < εk , (iii ) an integer hk ≥ 1, and (iv ) a sequence of r.r.v.’s 0 = τk,0 < τk,1 < · · · < τk,h(k)−1 < τk,h(k) = 1,
(11.9.1)
such that for each i = 0, . . . ,hk − 1, the function UτN(k,i) is an r.v., and such that (v ) for each ω ∈ Ak , we have h(k)−1 >
(τk,i+1 (ω) − τk,i (ω)) ≥ δk
(11.9.2)
d(U N (τk,i (ω),ω),U N (·,ω)) ≤ εk
(11.9.3)
i=0
with
on the interval θk,i (ω) ≡ [τk,i (ω),τk,i+1 (ω)) or θk,i (ω) ≡ [τk,i (ω),τk,i+1 (ω)] depending on whether 0 ≤ i ≤ hk − 2 or i = hk − 1. 5. For each k ≥ 0, let εk ≡ 2−k ∧ 2−2 δf (2−k ). Then Conditions (i –v ) in Step 4 hold. Now define n−1 ≡ 0. Inductively, for each k ≥ 0, fix an integer nk ≥ nk−1 + 1 so large that 2−n(k) < δk ≡ δaucl,N (εk ). 6. Now let s ∈ Q∞ be arbitrary. We will show that supu∈[0,s] f (UuN ) ∈ L(N +s) . If s = 0, then sup f (UuN ) = f (U0N ) = YN,0 ≡ f (UN ) ∈ L(N +s) .
u∈[0,s]
Hence we may proceed with the assumption that s > 0. To that end, take k ≥ 0 so large that s ∈ (0,1]Qn(k) . Consider each ω ∈ Ak . Let r ∈ Gk,ω ≡ [0,s] ∩
h(k)−1
θk,i (ω) ∩ Q∞
i=0
be arbitrary. Then there exists i = 0, . . . ,hk − 1 such that r ∈ [0,s] ∩ θk,i (ω). According to 11.9.2, we have |θk,i (ω)| ≡ τk,i+1 (ω) − τk,i (ω) ≥ δk > 2−n(k), so there exists some t ∈ θk,i (ω)Qn(k) . Since the set Qn(k) is discrete, we have either t ≤ s or s < t. 7. Consider first the case where t ≤ s. Then t ∈ [0,s]Qn(k) . From Step 6, note that t,r ∈ θk,i (ω). Hence d(U N (τk,i (ω),ω),U N (t,ω)) ≤ εk ≤ 2−2 δf (2−k )
Markov Process
563
and d(U N (τk,i (ω),ω),U N (r,ω)) ≤ εk ≤ 2−2 δf (2−k ), according to inequality 11.9.3. Consequently, the triangle inequality implies that d(UtN (ω),UrN (ω)) ≡ d(U N (t,ω),U N (r,ω)) ≤ 2−1 δf (2−k ) < δf (2−k ), where we recall that δf is a modulus of continuity of the function f . Therefore f (UuN (ω)) + 2−k , (11.9.4) f (UrN (ω)) < f (UtN (ω)) + 2−k ≤ u∈[0,s]Q(n(k))
where the last inequality is because t ∈ [0,s]Qn(k) . 8. Now consider the other case, where t > s. Then s ∈ [r,t)Qn(k) ⊂ θk,i (ω), because r,t ∈ θk,i (ω). Thus r,s ∈ θk,i (ω). Hence d(U N (τk,i (ω),ω),U N (s,ω)) ≤ εk ≤ 2−2 δf (2−k ) and d(U N (τk,i (ω),ω),U N (r,ω)) ≤ εk ≤ 2−2 δf (2−k ), according to inequality 11.9.3. Consequently, d(UsN (ω),UrN (ω)) ≡ d(U N (s,ω),U N (r,ω)) ≤ 2−1 δf (2−k ) < δf (2−k ). Therefore
f (UrN (ω)) < f (UsN (ω)) + 2−k ≤
f (UuN (ω)) + 2−k ,
(11.9.5)
u∈[0,s]Q(n(k))
where the last inequality is because s ∈ [0,s]Qn(k) . Combining inequalities 11.9.4 and 11.9.5, we see that f (UuN (ω)) + 2−k , (11.9.6) f (UrN (ω)) ≤ u∈[0,s]Q(n(k))
where r ∈ Gk,ω is arbitrary. Since the set Gk,ω is dense in Gω ≡ [0,s] ∩ domain(U N (·,ω)), inequality 11.9.6 holds for each r ∈ Gω , thanks to the right continuity of the function U N (·,ω). In particular, it holds for each r ∈ [0,s]Qn(k+1) ⊂ Gω, where ω ∈ Ak is arbitrary. Thus f (UrN ) ≤ f (UuN ) + 2−k r∈[0,s]Q(n(k+1))
on Ak . Consequently, 0≤
r∈[0,s]Q(n(k+1))
u∈[0,s]Q(n(k))
f (UrN ) −
u∈[0,s]Q(n(k))
f (UuN ) ≤ 2−k
564
Stochastic Process
on Ak , where P (Ack ) < εk ≤ 2−k and k ≥ 0 is arbitrary. Since it follows that the a.u.- and L1 -limit YN,s ≡ lim f (UuN ) κ→∞
∞
−κ κ=0 2
< ∞,
(11.9.7)
u∈[0,s]Q(n(κ))
exists as an r.r.v., where s ∈ Q∞ is arbitrary. 9. The equality 11.9.7 implies that for each ω in the full set domain(YN,s ), the supremum sup u∈[0,s]Q(∞)
f (UuN (ω))
exists and is given by YN,s (ω). Now the function U N (·,ω) is right continuous for each ω in the full set B. Hence, by right continuity, we have sup f (UuN ) =
u∈[0,s]
sup u∈[0,s]Q(∞)
f (UuN ) = YN,s
(11.9.8)
on the full set B ∩ domain(YN,s ). Therefore supu∈[0,s] f (UuN ) is a well-defined r.r.v., where N = 0, . . . ,M − 1 and s ∈ Q∞ are arbitrary. 10. Moreover, from equality 11.9.7, we see that YN,s is the L1 -limit of a sequence in L(N +s) . Hence, sup f (UuN ) ∈ L(N +s)
(11.9.9)
u∈[0,s]
for each s ∈ Q∞ . Equivalently, sup f (Uu ) ∈ L(w)
(11.9.10)
u∈[N,w]
for each w ∈ [N,N + 1]Q∞ , where N = 0, . . . ,M − 1 is arbitrary. 11. Now let w ∈ [0,M]Q∞ be arbitrary. Then w ∈ [N,N + 1]Q∞ for some N = 0, . . . ,M − 1. Write s ≡ w − N ∈ Q∞ . There are two possibilities: (i ) N = 0, in which case s ≡ w and sup f (Uu ) = sup f (Uu0 ) = sup f (Uu0 ) ∈ L(s) = L(w),
u∈[0,w]
u∈[0,w]
(11.9.11)
u∈[0,s]
where the set membership is by relation 11.9.9, or (ii ) N ≥ 1, in which case the function sup f (Uu ) u∈[0,w]
= sup f (Uu ) ∨ sup f (Uu ) ∨ · · · ∨ u∈[0,1]
u∈[1,2]
sup
u∈[N −1,N ]
f (Uu ) ∨ sup f (Uu ) u∈[N,w]
is a member of L(w) , thanks to relation 11.9.9 and to the filtration relation L(1) ⊂ L(2) ⊂ · · · ⊂ L(N ) ⊂ L(w) .
Markov Process
565
Summing up, we conclude that the function Vw ≡ sup f (Uu )
(11.9.12)
u∈[0,w]
is an r.r.v. in L(w) , for each w ∈ [0,M]Q∞ . This verifies Conditions (i–ii) in Step 3, and proves that the function V : [0,M]Q∞ × (,L,E) → R is a nondecreasing real-valued process adapted to the filtration L. 12. Since the set {Vw : w ∈ [0,M]Q∞ } of r.r.v.’s is countable, there exists a countable subset G of R such that each point a ∈ Gc is a continuity point of the r.r.v. Vw for each w ∈ [0,M]Q∞ . Here Gc ≡ {a ∈ R : |a − b| > 0 for each b ∈ G} denotes the metric complement of G. 13. Consider each a ∈ (a0,∞)Gc . Thus a is a continuity point of the r.r.v. Vw for each w ∈ [0,M]Q∞ . Hence the set (Vw < a) is measurable, for each w ∈ [0,M]Q∞ . Now let k ≥ 0 be arbitrary. Recall that n(k) ≡ 2−n(k) and that n(k) ≡ {0,n(k),2n(k), . . .}. Define the r.r.v. Q
u1(V (u)≥a) 1(V (w) ηk+1 (ω) and so Vη(k+1) (ω) ≥ a, by the defining equality 11.9.13 applied to k + 1. Therefore, due to the monotonicity of the process V , we obtain Vw (ω) ≥ Vη(k+1) (ω) ≥ a, again a contradiction. We conclude that ηk − n(k) ≤ ηk+1 − n(k+1) . 15. In the opposite direction, suppose ηk+1 (ω) > ηk (ω). Since ηk (ω) ∈ n(k+1) , it follows from the defining equality 11.9.13, applied to k + 1, that Q
566
Stochastic Process
Vη(k) (ω) ≤ a. At the same time, M > ηk (ω), whence Vη(k) (ω) > a by equality 11.9.13. This, again, is a contradiction. Thus we conclude that ηk+1 (ω) ≤ ηk (ω). 16. Combining, we see that on the full set ∞ j =0 domain(ηj ), we have ηk − n(k) ≤ ηk+1 − n(k+1) < ηk+1 ≤ ηk
(11.9.14)
and, iterating inequality 11.9.14, ηk − n(k) ≤ ηκ − n(κ) ≤ ηκ ≤ ηk
(11.9.15)
for each κ ≥ k + 1. Since n(k) ≡ 2−n(k) → 0, it follows that ηκ ↓ τ uniformly on the full set ∞ j =0 domain(ηj ), for some r.r.v. τ with ηk − n(k) ≤ τ ≤ ηk ,
(11.9.16)
where k ≥ 0 is arbitrary. Consequently, Assertion 2 of Proposition 10.11.3 implies that τ is a stopping time relative to the right continuous filtration L. 17. It remains to verify that the stopping time τ is a first exit time in the sense of Definition 10.11.1. In view of the right continuity of the filtration L, Assertion 1 of Lemma 11.8.6 says that Uτ is an r.v. Assertion 2 of Lemma 11.8.6 then says that for each ω in some full set A, the function U (·,ω) is right continuous at τ (ω). Hence, for each ≡ domain(Uτ ) ∩ ω∈D
∞
domain(ηj ) ∩ A,
j =0
we have U (ηk (ω),ω) → U (τ (ω),ω). Let t ∈ domain(U (·,ω)) ∩ [0,τ (ω)) be arbitrary. 18. Consider each ω ∈ D. −n(k) for some sufficiently large k ≥ 0. In view of inequality Then t < τ (ω) − 2 11.9.16, it follows that t < τ (ω) − 2−n(k) ≤ ηk (ω) − n(k) . n(k) . Consequently, the defining equality Hence w ≡ ηk (ω) − n(k) ∈ (t,ηk (ω))Q 11.9.13 implies that Vw (ω) < a. Therefore f (U (t,ω)) ≤ sup f (Uu (ω)) ≡ Vw (ω) < a, u∈[0,w]
where t ∈ domain(U (·,ω)) ∩ [0,τ (ω)) is arbitrary, and where ω is arbitrary in Condition (i) of Definition 10.11.1 is proved for the stopping time the full set D. τ to be the first exit time in [0,M] of the open subset (f < a). 19. Proceed to verify Condition (ii) of Definition 10.11.1. To that end, let k ≥ 0 be arbitrary. The defining equality 11.9.13 says that the stopping time ηk has values n(k) . Let μk denote the number of elements in this finite in the finite set (0,M]Q n(k) . By assumption, we have a ∈ Gc . set (0,M]Qn(k) . Consider each t ∈ (0,M]Q Hence a is a continuity point of Vt . Therefore there exists εk < μ−1 k αk so small −k . Define 2 that P (a ≤ Vt < a + εk ) < μ−1 k
Markov Process (a ≤ Vt < a + εk )c .
Dk ≡
567
t∈(0,M]Q(n(k))
Then −k = 2−k . P (Dkc ) < μk μ−1 k 2
Hence D≡
∞ ∞
Dk
j =0 k=j
is a full set. 20. Next, take a sequence (αk )k=0,1,... such that αk ↓ 0. Let k ≥ 0 be arbitrary. Define the set Ak ≡ (τ < M − αk ) ∩ (f (Uτ ) < a − αk ). Then A0 ⊂ A1 ⊂ · · · . Consider each ω ∈ D ≡ domain(Uτ ) ∩
∞
domain(ηj ) ∩ A ∩ D ∩
j =0
∞
Aj .
j =0
Then ω ∈ Aj for some j ≥ 0. Hence τ (ω) < M − αj and f (Uτ (ω)) < a − αj . Since ω ∈ A, the function U (·,ω) is right continuous at τ (ω). Hence there exists k ≥ j so large that for each κ ≥ k, we have τ (ω) ≤ ηκ (ω) < M − αj and f (Uη(κ) (ω)) < a − αj . 21. Consider each κ ≥ k. Then, since ηκ (ω) < M, we have Vη(κ) (ω) ≥ a according to the defining equality 11.9.13. At the same time, because ω ∈ D ⊂ ∞ k =κ Dk , there exists k ≥ κ such that ω ∈ Dk ≡ (a ≤ Vt < a + εk )c )) t∈(0,M]Q(n(k
⊂
(a ≤ Vt < a + εk )c .
(11.9.17)
t∈(0,M]Q(n(κ))
Because ηκ (ω) < M, we have Vη(κ) (ω) ≥ a according to the defining equality n(κ) . Hence Vt (ω) < a or 11.9.13. On the other hand, t ≡ ηκ (ω) ∈ (0,M]Q a + εk ≤ Vt (ω) according to relation 11.9.17. Combining, we infer that a + εk ≤ Vt (ω). In other words, f (Uu (ω)) ≥ a + εk .
sup u∈[0,t]Q(∞)
Hence there exists uκ ∈ [0,t]Q∞ ≡ [0,ηκ (ω)]Q∞ such that f (Uu(κ) (ω)) ≥ a, where κ ≥ k is arbitrary.
(11.9.18)
568
Stochastic Process
22. Suppose, for the sake of a contradiction, that uκ ∈ [0,τ (ω)). Then f (Uu(κ) (ω)) < a, according to Condition (i) established previously, which is a contradiction to inequality 11.9.18. Therefore uκ ≥ τ (ω), whence uκ ∈ [τ (ω), ηκ (ω)]Q∞ . Consequently, uκ → τ (ω) with uκ ≥ τ (ω) and f (Uu(κ) (ω)) ≥ a, as κ → ∞. Since ω ∈ A, the function U (·,ω) is right continuous at τ (ω). It follows that f (Uτ (ω)) ≥ a, where ω is arbitrary in the full set D. Condition (ii) of Definition 10.11.1 is also verified. Accordingly, τ f ,a,M (U ) ≡ τ is the first exit time in [0,M], by the process U , of the open subset (f < a). The theorem is proved.
11.10 First Exit Time for Brownian Motion Definition 11.10.1. Specification of a Brownian motion. In this section, let m ≥ 1 be arbitrary. Let B : [0,∞) × (Ω,L,E) → R m be an arbitrary but fixed Brownian motion in the sense of Definition 9.4.1, where B0 = 0. For each x ∈ R m, define the process B x ≡ x + B : [0,∞) × (Ω,L,E) → R m with initial state x. In the following, let d denote the Euclidean metric on R. We will prove that the triple ((R m,d m ),(,L,E),{B x : x ∈ R m }) is a Feller process relative to a certain Feller semigroup V. Then the Brownian motion can inherit Theorem 11.9.1, with an abundance of first exit times. These exit times are useful in the probability analysis of classical potential theory. Definition 11.10.2. Specification of some independent standard normal r.v.’s. Let U,U1,U2, . . . be an arbitrary independent sequence of R m -valued standard E). Thus each of these r.v.’s has , L, normal r.v.’s on some probability space ( distribution 0,I , where I is the m × m identical matrix. The only purpose for these r.v.’s is to provide compact notations for the normal distributions. Such a sequence can be constructed by taking a product probability space and using Fubini’s Theorem. Proceed to construct the desired Feller semigroup V. Lemma 11.10.3. Brownian semigroup. Let t ≥ 0 be arbitrary. Define the function Vt : Cub (R m,d m ) → Cub (R m,d m ) by (x + Vt (f )(x) ≡ Vtx (f ) ≡ Ef
√
! tU ) =
0,I (du)f (x +
Rm m (R ,d m ).
√ tu)
(11.10.1)
for each x ∈ for each f ∈ Cub Then the family V ≡ {Vt : t ∈ [0,∞)} is a Feller semigroup in the sense of Definition 11.7.2. For lack of a better name, we will call V the Brownian semigroup. Rm,
Markov Process
569
Proof. We need to verify the conditions in Definition 11.7.2 for the family V to be a Feller semigroup. First, note from equality 11.10.1 that for each t ∈ [0,∞) and for each x ∈ R m , the function Vtx is the distribution on (R m,d m ) induced by the √ r.v. x + tU . In particular, Vtx is a distribution on (R m,d m ). 1. Let N ≥ 1, t ∈ [0,N ], and f ∈ Cub (R m,d m ) be arbitrary with a modulus of continuity δf and with |f | ≤ 1. Consider the function Vt f defined in equality 11.10.1. Let ε > 0 be arbitrary. Let x,y ∈ R m be arbitrary with |x − y| < δf (ε). Then √ √ |(x + tU ) − (y + tU )| ≤ |x − y| < δf (ε). Hence √ (y + tU ))| tU ) − Ef √ √ (x + tU ) − f (y + tU ))| ≤ Eε = ε. ≤ E|f
(x + |Vt (f )(x) − Vt (f )(y)| ≡ |Ef
√
Thus Vt (f ) has the same modulus of continuity δf as the function f . In other words, the family V has a modulus of smoothness αV ≡ (ι,ι, . . .), where ι is the identity operation ι. The smoothness condition, Condition 1 of Definition 11.7.2, has been verified for the family V. In particular, we have a function Vt : Cub (R m,d m ) → Cub (R m,d m ). 2. We will next prove that Vtx (f ) is continuous in t. To that end, let ε > 0, x ∈ R m be arbitrary. Note that the standard normal r.v. U is an r.v. Hence there exists a ≡ a(ε) > 0 so large that (|U |>a(ε)) < 2−1 ε. E1
(11.10.2) √ Note also that t is a uniformly continuous function of t ∈ [0,∞), with some modulus of continuity δsqrt . Let t,s ≥ 0 be such that (11.10.3) |t − s| < δV (ε,δf ) ≡ δsqrt (a(ε)−1 δf (2−1 ε)). √ √ Then | t − s| < a(ε)−1 δf (2−1 ε). Hence, on the measurable subset (|U | ≤ E), we have , L, a(ε)) of ( √ √ √ √ |(x + tU ) − (x + sU )| = | t − s| · |U | < δf (2−1 ε), √ √ whence |f (x + tU ) − f (x + sU )| < 2−1 ε. Therefore √ √ (x + tU ) − f (x + sU )| |Vtx (f ) − Vsx (f )| ≤ E|f √ √ (x + tU ) − f (x + sU )|1(|U |≤a(ε)) + E1 (|U |>a(ε)) ≤ E|f (|U |≤a(ε)) + E1 (|U |>a(ε)) ≤ 2−1 εE1 ≤ 2−1 ε + 2−1 ε = ε.
(11.10.4)
Thus the function Vt·x f is uniformly continuous in t ∈ [0,∞), with a modulus of continuity δV (·,δf )√independent of x. In the special case where s = 0, we have (x + 0U ) = f (x), whence inequality 11.10.4 yields Vsx (f ) = Ef
570
Stochastic Process |Vt (f ) − f | ≤ ε
for each t ∈ [0,δV (ε,δf )). The strong-continuity condition, Condition 3 in Definition 11.7.2, is thus verified for the family V, with δV being the modulus of strong continuity. 3. Proceeding to Condition 2, the semigroup property, let t,s ≥ 0 be arbitrary. √ 1 √ First assume that s > 0. Then (t + s)− 2 ( tW1 + sW2 ) is a standard normal r.v. with values in R m . Hence, for each x ∈ R m , we have √ √ √ x (x + t + sU ) = Ef (x + tW1 + sW2 ) (f ) = Ef Vt+s ! ! ϕ0,t (w1 )ϕ0,s (w2 )f (x + w1 + w2 )dw1 dw2 = w(1)∈R m
! =
w(2)∈R m
C!
w(2)∈R m
w(1)∈R m
! =
w(2)∈R m
D ϕ0,t (w1 )f (x + w1 + w2 )dw1 ϕ0,s (w2 )dw2
{(Vt f )(x + w2 )}ϕ0,s (w2 )dw2
= Vsx (Vt f ),
(11.10.5)
provided that s ∈ (0,∞). At the same time, inequality 11.10.4 shows that both ends of equality 11.10.5 are continuous functions of s ∈ [0,∞). Hence equality 11.10.5 can be extended, by continuity, to x (f ) = Vsx (Vt f ) Vt+s
for each s ∈ [0,∞). It follows that Vs+t = Vs Vt for each t,s ∈ [0,∞). Thus the semigroup property, Condition 2 in Definition 11.7.2, is proved for the family V. 4. It remains to verify the nonexplosion condition, Condition 4 in Definition 11.7.2. To that end, let N ≥ 1, t ∈ [0,N ], and ε > 0 be arbitrary. Let n ≥ κV,N (ε) ≡ N −1/2 a(ε) be arbitrary. Define the functions hx,n ≡ (1 ∧ (1 + n − d m (·,x))+ ∈ C(R m,d m ) and hx,n ≡ 1 − hx,n ∈ Cub (R m,d m ). Then x,n (x + Vtx hx,n ≡ Eh
√ m √ tU )) ≤ E1 (d (x+ tU,x)≥n)
−1 √ √ = E1 (| tU |≥n) ≤ E1(| N U |≥n)) ≤ E1(|U |≥a(ε)) < 2 ε < ε,
where the next-to-last inequality is inequality 11.10.2. Condition 4 in Definition 11.7.2 is verified for the family V. 5. Summing up, all the conditions in Definition 11.7.2 hold for the family V to be a Feller semigroup with state space (R m,d m ).
Markov Process
571
Theorem 11.10.4. Brownian motion in R m as a Feller process. Let V be the mdimensional Brownian semigroup constructed in Lemma 11.10.3. Then the triple ((R m,d m ),(,L,E),{B x : x ∈ R m }), where B x ≡ x + B for each x ∈ R m , is a Feller process with the Feller semigroup V, in the sense of Definition 11.7.6. Proof. 1. Let x ∈ R m be arbitrary. Then, according to Corollary 10.10.6, the process B : [0,∞) × (,L,E) → (R m,d m ) is time-uniformly a.u. continuous. It follows that the process B x ≡ x + B is time-uniformly a.u. càdlàg. Let F x,V be the family of f.j.d.’s generated by the initial state x and Feller semigroup V, as defined in 11.7.4 and constructed in Assertion 4 of Theorem 11.7.7. Let y,V E),{U , L, : y ∈ R m }) ((R m,d m ),(
be an arbitrary Feller process, with the Feller semigroup V. Then the family F x,V gives the marginal distributions of the process E) → (R m,d m ). , L, U x,V : [0,∞) × ( 2. Now let n ≥ 1, f ∈ Cub ((R m )n,(d m )n ), and nondecreasing sequence r1 ≤ · · · ≤ rn in Q∞ be arbitrary. We proceed to prove, by induction on n ≥ 1, that x,V x x f = Ef (Br(1) , . . . ,Br(n) ). Fr(1),...,r(n)
(11.10.6)
To start, suppose n = 1. Then, by the defining equality 11.10.1 for V, we have ! √ x,V x (x + r1 U ) (dx1 )f (x1 ) = Vr(1) (f )(x) = Ef Fr(1) f = Vr(1) x ), = Ef (x + Br(1) ) ≡ Ef (Br(1)
which proves equality 11.10.6 for n = 1. 3. Inductively, suppose n ≥ 2 and equality 11.10.6 has been proved for n − 1. Define, for each (x1, . . . ,xn−1 ) ∈ R n, 2 (x1, . . . ,xn−1,xn−1 + rn − rn−1 U ) g(x1, . . . ,xn−1 ) ≡ Ef x(n−1)
= Vr(n)−r(n−1) f (x1, . . . ,xn−1,·) ! x(n−1) ≡ Vr(n)−r(n−1) (dxn )f (x1, . . . ,xn−1,xn ),
(11.10.7)
where the second equality is by the defining equality 11.10.1 for V, and where the third equality is by equality 11.7.2 of Definition 11.7.4. Then x x x , . . . ,Br(n−1) ,Br(n) ) Ef (Br(1) x x x x x , . . . ,Br(n−1) ,Br(n−1) + (Br(n) − Br(n−1) )) = Ef (Br(1)
572
Stochastic Process 2 x (B x , . . . ,B x = E Ef r(1) r(n−1),Br(n−1) + rn − rn−2 U ) x x , . . . ,Br(n−1) ) = Eg(Br(1) x,V = Fr(1),...,r(n−1) g ! ! x(n−2) x (dx1 ) · · · Vr(n−1)−r(n−2) (dxn−1 )g(x1, . . . ,xn−1 ) = Vr(1) ! ! x(n−2) x = Vr(1) (dx1 ) · · · Vr(n−1)−r(n−2) (dxn−1 ) ! x(n−1) Vr(n)−r(n−1) (dxn )f (x1, . . . ,xn−1,xn ) x,V f, = Fr(1),...,r(n)
where the third equality is by the first part of equality 11.10.7, where the fourth equality is by equality 11.10.6 in the induction hypothesis, where the fifth and the last equalities are by equality 11.7.2 in Definition 11.7.4, and where the sixth equality is again thanks to equality 11.10.7. 4. Induction is completed. Equality 11.10.6 has been proved for an arbitrary nondecreasing sequence r1 ≤ · · · ≤ rn in Q∞ . By consistency, it follows that x,V x x Fr(1),...,r(n) f = Ef (Br(1) , . . . ,Br(n) )
(11.10.8)
for each sequence r1, . . . ,rn in Q∞ . Since Q∞ is dense in [0,∞), equality 11.10.8 extends, by continuity, to each sequence r1, . . . ,rn in [0,∞). Thus F x,V is the family of marginal distributions of the process B x , where x ∈ R m is arbitrary. 5. In addition, F x,V gives the marginal distributions of the process U x,V, for each x ∈ R m . At the same time, Condition 2 of Definition 11.7.6, applied to the x,V : x ∈ R m }), yields the continuity of the E),{U , L, Feller process ((R m,d m ),( ∗,V m m function Fs(1),...,s(n) f ∈ Cub (R ,d ). Said Condition 2 is thus satisfied also by the triple ((R m,d m ),(,L,E),{B x : x ∈ R m }) relative to the semigroup V. 6, Now let x ∈ R m be arbitrary. Then B x ≡ x + B is a.u. continuous, and hence a.u. càdlàg. Thus the triple ((R m,d m ),(,L,E),{B x : x ∈ R m }) satisfies all the conditions of Definition 11.7.6. Accordingly, it is a Feller process with the Feller semigroup V. Definition 11.10.5. Specification of some filtrations. Let L ≡ LB ≡ {L(B,t) : (B,t) t ∈ [0,∞)} be the natural filtration of the process B. Let L ≡ LB ≡ {L :t ∈ [0,∞)} be the right-limit extension of LB . Theorem 11.10.6. Brownian motion is strongly Markov. For each x ∈ R m, the process B x is strongly Markov relative to the right continuous filtration L. More precisely, let τ be an arbitrary stopping time relative to L ≡ LB . Then (B,τ ) E f (Bτx+r(0), . . . ,Bτx+r(m) )|L B x (τ ),V
= E(f (Bτx+r(0), . . . ,Bτx+r(m) )|Bτx ) = Fr(0),...,r(m) (f ),
(11.10.9)
Markov Process
573
for each nondecreasing sequence 0 ≡ r0 ≤ r1 ≤ · · · ≤ rm , for each f ∈ Cub (S m+1,d m+1 ). Proof. The present theorem is an immediate corollary of Theorem 11.8.7
Theorem 11.10.7. Abundance of first exit times for Brownian motion. Let x ∈ R m be arbitrary. Let f ∈ Cub (R m,d m ) and a0 ∈ R be arbitrary, such that a0 ≥ f (x). Then there exists a countable subset H of R such that for each a ∈ (a0,∞)Hc and for each M ≥ 1, the first exit time τ f ,a,M (U ) exists relative to the filtration L. Here Hc denotes the metric complement of H in R. Proof. Let M ≥ 1 be arbitrary. Theorem 11.9.1 says that there exists a countable subset GM of R such that for each a ∈ (a0,∞)GM,c , the first exit time τ f ,a,M (B x ) exists relative to the filtration L. Then the countable set H ≡ ∞ M=1 GM has the desired properties. An application of Theorem 11.10.7 is where f ≡ a ∧ | · | ∈ Cub (R m,d m ) for some a > 0, and where x ∈ R m and a0 ∈ R are such that |x| < a0 < a. Then, for each a ∈ (a0,a)Hc, the stopping time τ f ,a,M (B x ) is the first exit time, on or before M, for the process B x to exit the sphere in R m of center 0 and radius a. We end this section, and this book, with the remark, without giving any proof, that in the case of a Brownian motion, (i) the exceptional set H can be taken to be empty and (ii) τ f ,a (B x ) ≡ limM→∞ τ f ,a,M (B x ) exists a.u. Hence, if the starting state x is in the sphere with center 0 ∈ R m and arbitrary radius a, then, a.s., the process B x exits said sphere at some finite time τ f ,a (B x ).
Appendices
Several times in this book, we apply the method of change of integration variables for the calculation of Lebesgue integrals in R n . In these two appendices, we prove those theorems, and only those theorems, that justify the several applications. In the proofs of the following theorems, we assume no prior knowledge of Euclidean geometry. We will use some matrix algebra and calculus, and will use theorems on metric spaces and integration spaces that are treated in the first four chapters of this book. Note that the following theorems are used only in Chapter 5 and later.
575
Appendix A Change of Integration Variables
In this appendix, let n ≥ 1 be an arbitrary, but fixed, integer. As usual, we write ab and a(b) interchangeably for arbitrary expressions a,b. We write AB and A ∩ B interchangeably for arbitrary sets A,B. Recall that for each locally compact metric space (S,d), we write C(S,d) or C(S) for the space of continuous functions on S with compact support. At some small risk of confusion, for real-valued expressions a,b, and c, we will write a = b ± c to mean |a − b| ≤ c. Definition A.0.1. Some notations for Lebesgue integrations and for matrices. Let μ1 and μ denote the measures with respect to the Lebesgue integrations ·dx
⊗n ·dx in R 1 and R n , respectively. Unless otherand · · · ·dx1 . . . dxn ≡ wise specified, all measure-theoretic terms will be relative to these Lebesgue inte grations. Moreover, when the risk of confusion is low, we will write x∈A f (x)dx for · · · 1A (x1, . . . ,xn )f (x1, . . . ,xn )dx1 . . . dxn , for each integrable function f and measurable integrator 1A on R n . n × m matrices. For Let m ≥ 1 be arbitrary. We will write Mn×m for the set of all . n m n m 2 each α ∈ Mn×m , define |α| ≡ i=1 j =1 |α i,j | and α ≡ i=1 j =1 |α i,j | . Note that | · | and · are equivalent norms for Mn×m because |α| ≤ α ≤ √ nm|α|. Unless otherwise indicated, the space Mn×m is equipped with the norm | · |, and with the metric associated with this norm. Continuity on Mn×m means continuity relative to the latter norm and metric. Each point x ≡ (x1, . . . ,xn ) ∈ R n, will be identified with the n × 1 matrix ⎤ ⎡ x1 n ⎢ .. ⎥ ⎣ . ⎦, also called a column vector. Thus |x| ≡ i=1 |xi | and x ≡ x .n n
i=1 |xi |
2.
Moreover, | · | and · are equivalent norms on R n .
The determinant of α is defined as
det α ≡ sign(σ )α 1,σ (1) α 2,σ (2) . . . α n,σ (n), σ ∈#
577
(A.0.1)
578
Appendices
where # is the set of all permutations on {1, . . . ,n}, and where sign(σ ) is +1 or −1 depending on whether σ is an even or odd permutation, for each σ ∈ #. Suppose n ≥ 2. For each i,j = 1, . . . ,n, let α i,j denote the (n − 1) × (n − 1) matrix obtained by deleting the i-th row and j -th column from α. Then the number det α i,j is called the (i,j )-minor of α. The number (−1)i+j det α i,j is called the (i,j )-cofactor of α. Finally, if ϕ(x) is a given expression of x for each member x of a set G, then we write x → ϕ(x) for the function ϕ defined on G by ϕ(x) ≡ ϕ(x) for each x ∈ G. For example, the expression x → x + sin x will stand for the function ϕ defined on R by ϕ(x) ≡ x + sin x for each x ∈ R. Likewise, α → det α will stand for the function ϕ defined on Mn×n by ϕ(α) ≡ det α, for each α ∈ Mn×n . We will write α· for the function x → αx on R n , for each α ∈ Mn×n . Thus α· is the function of left multiplication with the matrix α. Lemma A.0.2. Matrix basics. Let α,β ∈ Mn×n be arbitrary. Then the following conditions hold: 1. (det αβ) = (det α) · (det β). 2. The function α → det α is uniformly continuous on compact subsets of Mn×n . If |α| ≤ b for some b ≥ 0, then |det α| ≤ n! bn . 3. Cramer’s rule. Suppose n ≥ 2. Suppose | det α| > 0. Then the inverse matrix α −1 is well defined and is given by (α −1 )i,j ≡ (det α)−1 (−1)i+j det α j,i ,
(A.0.2)
for each i,j = 1, . . . ,n. 4. Let c > 0 be arbitrary. Define Mn×n,c ≡ {α ∈ Mn×n : | det α| ≥ c}. Then the function α → α −1 is uniformly continuous on compact subsets of Mn×n,c . Proof. 1. For Assertions 1 and 3, consult any textbook on matrix algebra. 2. Assertion 2 follows from the defining equality A.0.1, which also says that det α is a polynomial in the entries of α, whence α → det α is a uniformly continuous function of these entries when these entries are restricted to a compact subset of R. In short, the function α → det α is uniformly continuous on compact subsets of Mn×n relative to the norm | · | on Mn×n . 3. Let i,j = 1, . . . ,n be arbitrary. Then each entry of the (n − 1) × (n − 1) matrix α j,i is equal to some entry of α. Hence each entry of the matrix α j,i is a uniformly continuous function on Mn×n . Therefore the function α → α j,i is a uniformly continuous function from Mn×n to M(n−1)×(n−1) . Since γ → det γ is, in turn, a uniformly continuous function on compact subsets of M(n−1)×(n−1) , we see that α → det α j,i is a uniformly continuous function on compact subsets of Mn×n , for each i,j = 1, . . . ,n. At the same time, Assertion 2 says that α → det α is a uniformly continuous function on compact subsets of Mn×n . Hence the function α → (det α)−1 is uniformly continuous on compact subsets
Change of Integration Variables
579
of Mn×n,c . Combining, we see that the right-hand side of equality A.0.2 is a uniformly continuous function on compact subsets of Mn×n,c . Hence so is the left-hand side of equality A.0.2. Thus the function α → (α −1 )i,j is uniformly continuous on compact subsets of Mn×n,c . Therefore the function α → α −1 is uniformly continuous on compact subsets of Mn×n,c . Definition A.0.3. (n-interval). In the following discussion, any open subinterval of R, whether proper or not, will be called an open 1-interval. Let a,b ∈ R be arbitrary with a < b.Then the intervals [a,b] and [a,b) will be called, respectively, closed and half-open 1-intervals. Each open, closed, and half-open 1-interval will also simply be called a 1-interval, and denoted by . If is a proper subinterval, with endpoints a ≤ b in R, then || ≡ b − a > 0 will be called the length of the 1-interval . Note that, as defined here, each closed or half-open interval is proper and has a well-defined length. More generally, a product subset ≡ ni=1 i of R n , where i is a 1-interval for each i = 1, . . . ,n, will be called an n-interval. The intervals 1, . . . ,n in R are then called the factors of the n-interval . Suppose all the factors are proper. Then the n-interval is said to be proper. The real number || ≡ ni=1.|i | > 0 will be called the length of the n-interval n 2 . The real number ≡ i=1 |i | will then be called the diameter of . The center x of the n-interval is then defined to be x = (x1, . . . ,xn ), where xi is the midpoint of i for each i = 1, . . . ,n. If all the factors of the n-interval are closed/half-open/open 1-intervals, then is called a closed/half-open/open n-interval. Lemma A.0.4. Proper n-intervals are Lebesgue integrable. Each proper n-interval ≡ ni=1 i is Lebesgue integrable, with μ() = ni=1 |i |. Proof. 1. Consider the case where n = 1, and where = [a,b]. For each k ≥ 1, define the continuous function hk on R by (i) hk (a) ≡ hk (b) ≡ 1; (ii) hk (a − k −1 ) ≡ hk (b + k −1 ) ≡ 0; (iii) hk is linear on each of the intervals [a − k −1,a], [a,b], and [b,b + k −1 ]; and(iv) hk vanishes off the interval [a − k −1,b + k −1 ]. Then the Riemann integral hk (x)dx is nonincreasing with k, and converges to b − a as k → ∞. Hence, by the Monotone Convergence Theorem, the indicator 1[a,b] is Lebesgue integrable, with integral b − a. In other words, the interval [a,b] is Lebesgue integrable, with Lebesgue measure ||. 2. Next, with n = 1, consider the half-open interval [a,b). Then, for each k ≥ 1, the interval [a,a ∨ (b − k −1 )] is integrable, with measure (b − a − k −1 )+ which converges to b − a as k → ∞. Hence, again by the Monotone Convergence Theorem, the interval [a,b) is integrable, with Lebesgue integral b − a. Similarly, the interval (a,b) is integrable, with Lebesgue integral || = b − a. 3. Consider now the general case where n ≥ 2. Then ≡ ni=1 i of R n , where i is a proper 1-interval for each i = 1, . . . ,n. If n = 2, then 1 ≡ ⊗ni=1
⊗n 1(i) is a simple function relative to ·dx ≡ · · · ·dx1 . . . dxn in the sense
580
Appendices 2
of Definition 4.10.2, whence i=1 i is an integrable set in R 2 . Repeating this argument, we can prove, inductively, that ni=1 i is an integrable subset of R n , where n ≥ 2 is arbitrary, and that, by Fubini’s Theorem, ! ! n ! n |i |. 1(i) dx = · · · 1 dx1 . . . dxn = Equivalently, μ() =
n
i=1 |i |.
i=1
i=1
The lemma is proved.
We need some more elementary (no pun intended) results in matrix algebra. Definition A.0.5. Elementary matrix. An n × n matrix T ≡ [Ti,j ]i,j =1,...,n is called an elementary matrix if one of the following three conditions holds: 1. Row swapping. Suppose there exist i,j ∈ {1, . . . ,n} with i < j such that, for each i,j = 1, . . . ,n, we have (i) Ti,j = 1, if i = j i and i = j j ; (ii) Ti,j = 1, if (i,j ) = (i,j ) or (j,i) = (i,j ); and (iii) Ti,j = 0, if i j , (i,j ) (i,j ), and (i,j ) (j,i). In other words, T is obtained by swapping the i-th row and the j -th row of the identity matrix I . We will then write T ≡ TSwp,i,j . The left multiplication by TSwp,i,j to any n × k matrix A then produces a matrix B that is obtainable by swapping the i-th row and the j -th row in the matrix A. Moreover, det TSwp,i,j = −1, whence det(TSwp,i,j A) = − det A for each n × n matrix A. 2. Multiplication of one row by a nonzero constant. Suppose there exist i ∈ {1, . . . ,n} and c ∈ R with |c| > 0 such that, for each i,j = 1, . . . ,n, we have (i) Ti,j = 1, if i = j i; (ii) Ti,j = c, if i = j = i; and (iii) Ti,j = 0 if i j . In other words, T is obtained by multiplying the i-th row of the identity matrix I with the constant c. We will then write T ≡ TMult,i,c . The left multiplication by TMult,i,c to any n×k matrix A then produces a matrix B that is obtainable by multiplying the i-th row of the matrix A by the constant c. It follows that det TMult,i,c = c, whence det(TMult,i,c A) = c det A for each n × n matrix A. 3. Adding a constant multiple of one row to another. Suppose there exist i,j ∈ {1, . . . ,n} with i j , and c ∈ R such that, for each i,j = 1, . . . ,n, we have (i) Ti,j = 1 if i = j ; (ii) Ti,j = c, if i j and (i,j ) = (i,j ); and (iii) Ti,j = 0 if i j and (i,j ) (i,j ). In other words, T is obtained by adding c times the i-th row of the identity matrix I to the j -th row. We will then write T ≡ TAdd,c,i,j . The left multiplication by TAdd,c,i,j to any n×k matrix A then produces a matrix B that is obtainable by adding c times the i-th row of the matrix A to the j -th row. It follows that det TAdd,c,i,j = 1, whence det(TAdd,c,i,j A) = det A for each n × n matrix A. An elementary matrix whose entries are rational numbers is called a rational elementary matrix. For example, TAdd,c,i,j is rational iff c is rational. Proposition A.0.6. Gauss–Jordan method of finding the inverse of an n × n matrix. Let α be an n × n matrix with | det α| > 0. Then there exists a finite
Change of Integration Variables
581
sequence (T (k) )k=0,...,κ of elementary matrices, where κ ≥ 0 and where T (0) = I , such that α −1 = T (κ) . . . T (0) . If, in addition, all the entries of the matrix α are rational, then there exists a finite sequence (T (k) )k=0,...,κ of rational elementary matrices, where κ ≥ 0 and where T (0) = I , such that α −1 = T (κ) . . . T (0) . Consequently, the inverse matrix α −1 has rational entries iff the matrix α does.
Proof. See any matrix algebra text on Gauss–Jordan elimination.
Lemma A.0.7. Change of integration variables in R 1 , by a multiplication and a shift. Let f ∈ C(R) be arbitrary. Let v,c ∈ R be arbitrary with |c| > 0. Then (i) ! ! f (x)dx = f (cu)|c|du and (ii)
!
! f (x)dx =
f (v + u)du.
Proof. 1. First assume that c > 0. By Definitions 4.9.9 and 4.1.2, the Lebesgue integral of f is the limit of Riemann sums. Specifically, ! f (x)dx ≡ lim n α(0)→−∞;α(n)→+∞;
×
n
i=1 (α(i)−α(i−1))→0
f (αi )(αi − αi−1 ) : α0 < · · · < αn
i=1
=
lim n
α(0)/c→−∞;α(n)/c→+∞;
×
n
f (cc
−1
αi )(cc
−1
i=1 (α(i)/c−α(i−1)/c)→0
αi − cc
−1
αi−1 ) : c
−1
α0 < · · · < c
−1
αn
i=1
=
lim n
β(0)→−∞;β(n)→+∞;
× ! ≡
n
i=1 (β(i)−β(i−1))→0
f (cβi )c(βi − βi−1 ) : β0 < · · · < βn
i=1
f (cu)cdu,
as alleged in Assertion (i). In the case where c < 0, Assertion (i) can be proved similarly. 2. Assertion (ii) can be proved similarly. A special case of the next theorem proves, in terms of Lebesgue measures, the intuitive notion of areas being invariant under Euclidean transformations. Note that we assume no prior knowledge of Euclidean geometry.
582
Appendices
Theorem A.0.8. Change of integration variables in R n by a matrix multiplication and a shift. Let g be an arbitrary function on R n . Let v ∈ R n and the n × n-matrix α be arbitrary, with | det α| > 0. Then the following conditions hold: 1. The function g is integrable on R n iff the function x → g(α · x) is integrable on R n , in which case ! ! ! ! | det α| · · · g(α · x)dx1 . . . dxn = · · · g(u)du1 . . . dun . 2. The function g is integrable on R n iff the function x → g(v + x) is integrable on R n , in which case ! ! ! ! · · · g(v + x)dx1 . . . dxn = · · · g(u)du1 . . . dun . Here we write x ≡ (x1, . . . ,xn ) and u ≡ (u1, . . . ,un ). Proof. We will prove only Assertion 1. The proof of Assertion 2 is similar. First assume that α has rational entries. 1. Then, by Proposition A.0.6, there exists a finite sequence (T (k) )k=0,...,κ of elementary matrices with rational entries, where κ ≥ 0 and where T (0) = I , such that α −1 = T (κ) . . . T (0) .
(A.0.3)
2. Suppose, for some k = 1, . . . ,κ, we have T (k) = TAdd,c,i,j for some c. Then c is a rational number. Either c = 0 or c 0. If c = 0, then TAdd,c,i,j = I , whence the factor T (k) can be removed from the right-hand side of equality A.0.3. If c 0, then TAdd,c,i,j = TMult,i,1/c TAdd,1,i,j TMult,i,c,
(A.0.4)
whence the factor T (k) in the right-hand side of equality A.0.3 can be removed and replaced by the right-hand side of equality A.0.4. Summing up, we may assume that, if T (k) = TAdd,c,i,j for some k = 1, . . . ,κ, then T (k) = TAdd,1,i,j . We proceed to prove Assertion 1, by induction on κ. 3. For that purpose, first assume that g ∈ C(R n ). Then there exists M ≥ 0 such that [−M,M]n is a support of the function g. Define the continuous function f ≡ g ◦ α· on R n . Suppose | f (x)| > 0 for some x ∈ R n . Then |g(α · x)| > 0. Hence y ≡ α · x ∈ [−M,M]n . Consequently, |y| ≤ M. Therefore ⎛ ⎛ ⎞ ⎞ n n n n
⎝ ⎝ ⎠ ⎠ |x| = |α −1 · y| ≤ |α −1 |α −1 i,j | · |yj | ≤ i,j | · |y| i=1
≤
n i=1
j =1
i=1
j =1
(n|α −1 |) · |y| ≤ M ≡ n|α −1 | · M.
Change of Integration Variables
583
In short, the continuous function f is supported by the compact subset [−M,M]n . Thus f ∈ C(R n ). Consequently, f is integrable. In other words, the function x → g(α · x) is integrable on R n . To start the induction, consider the case where κ = 1. According to Definition A.0.5 and the previous paragraph, the rational elementary matrix T (κ) can be one of three kinds: (i) T (κ) ≡ TSwp,i,j for some i,j ∈ {1, . . . ,n} with i < j ; (ii) T (κ) ≡ TMult,i,c for some i ∈ {1, . . . ,n}, for some rational number c 0; and (iii) T (k) = TAdd,1,i,j . 4. Consider Case (i), where T (κ) ≡ TSwp,i,j for some i,j ∈ {1, . . . ,n} with i < j . For ease of notations, there is no loss of generality to assume that i = 1 and j = 2. Then ! ! ! ! ··· g(α · x)dx1 dx2 dx3 . . . dxn x(1)∈R
!
x(2)∈R
!
x(3)∈R
!
x(n)∈R
≡
!
··· x(1)∈R
!
x(2)∈R
!
!
x(3)∈R
=
f (x1,x2,x3, . . . ,xn )dx1 dx2 dx3 . . . dxn x(n)∈R
! ···
x(2)∈R
!
!
x(1)∈R
!
x(3)∈R
=
f (x1,x2,x3, . . . ,xn )dx2 dx1 dx3 . . . dxn !
x(n)∈R
··· u(1)∈R
!
u(2)∈R
!
!
u(3)∈R
≡
f (u2,u1,u3, . . . ,un )du1 du2 du3 . . . dun !
u(n)∈R
··· u(1)∈R
!
u(2)∈R
!
f (TSwp,1,2 (u1,u2,u3, . . . ,un ))
u(3)∈R
u(n)∈R
× du1 du2 du3 . . . dun
!
!
≡
··· u(1)∈R
u(2)∈R
u(3)∈R
u(n)∈R
f (TSwp,1,2 (u1,u2,u3, . . . ,un )) · | det TSwp,1,2 | · du1 du2 du3 . . . dun ! ! ! ! ≡ ··· u(1)∈R
u(2)∈R
u(3)∈R
u(n)∈R
× f (α −1 (u1,u2,u3, . . . ,un )) · | det α −1 | · du1 du2 du3 . . . dun ! ! ! ! ≡ ··· g(u1,u2,u3, . . . ,un ) · | det α −1 | u(1)∈R
u(2)∈R
u(3)∈R
u(n)∈R
× du1 du2 du3 . . . dun, where the second equality is by Fubini’s Theorem, where the third equality is by a change of the names in the dummy variables, and where the third-to-last equality is because det TSwp,i,j = −1. This proves Assertion 1 in Case (i) if κ = 1, assuming g ∈ C(R n ), and assuming α has rational entries.
584
Appendices
6. With κ = 1, next consider Case (ii), where T (κ) ≡ TMult,i,c for some i ∈ {1, . . . ,n} and for some rational number c 0. For ease of notations, there is no loss of generality to assume that i = 1. Then ! ! ! ··· g(α · x)dx1 dx2 . . . dxn x(1)∈R
!
x(2)∈R
x(n)∈R
!
!
≡
··· !
x(1)∈R
=
f (x1,x2, . . . ,xn )dx1 dx2 . . . dxn
x(2)∈R
x(n)∈R
C!
!
D f (x1,x2, . . . ,xn )dx1 dx2 . . . dxn
··· !
x(2)∈R
=
!
x(n)∈R
C!
x(1)∈R
x(n)∈R
!
u(1)∈R
D f (cu1,x2, . . . ,xn )|c|du1 dx2 . . . dxn
··· !
x(2)∈R
!
=
··· !
u(1)∈R
!
f (cu1,x2, . . . ,xn )|c|du1 dx2 . . . dxn
x(2)∈R
=
!
x(n)∈R
··· u(1)∈R
f (TMult,1,c (u1,u2, . . . ,un ))
u(2)∈R
= | det α −1 |
u(n)∈R
!
× | det TMult,1,c |du1 du2 . . . dun ! ··· f (α −1 · (u1,u2, . . . ,un ))du1 du2 . . . dun
!
u(1)∈R u(2)∈R
= | det α −1 |
!
u(n)∈R
!
!
··· u(1)∈R
g(u1,u2, . . . ,un )du1 du2 . . . dun,
u(2)∈R
u(n)∈R
where the third equality is by Lemma A.0.7, and where the third-to-last equality is because det TMult,i,c = c. This proves Assertion 1 in Case (ii) if κ = 1, assuming f ∈ C(R n ), and assuming α has rational entries. 7. Continuing with κ = 1, consider Case (iii), where T (κ) = TAdd,1,i,j for some i,j ∈ {1, . . . ,n}. For ease of notations, there is no loss of generality to assume that i = 1 and j = 2. Then ! ! ! ! ··· g(α · x)dx2 dx3 . . . dxn x(1)∈R
!
x(2)∈R
!
x(3)∈R
x(n)∈R
!
!
≡
··· !
x(1)∈R
!
x(2)∈R
=
x(3)∈R
!
f (x1,x2,x3, . . . ,xn )dx1 dx2 dx3 . . . dxn
C!
x(n)∈R
··· !
x(1)∈R
!
x(3)∈R
=
D f (x1,x2,x3, . . . ,xn )dx2 dx1 dx3 . . . dxn
x(n)∈R
!
x(2)∈R
C!
··· x(1)∈R
!
x(3)∈R
x(n)∈R
!
!
=
··· u(1)∈R
u(3)∈R
D f (x1,x1 + u2,x3, . . . ,xn )du2
u(n)∈R
u(2)∈R
× dx1 dx3 . . . dxn C! D f (u1,u1 + u2,u3, . . . ,un )du2 u(2)∈R
× du1 du3 . . . dun
!
!
!
= u(1)∈R
u(2)∈R
Change of Integration Variables ! ··· f (TAdd,1,1,2 (u1,u2,u3, . . . ,un ))
u(3)∈R
!
585
u(n)∈R
!
× du1 du2 du3 . . . dun
= | det TAdd,1,1,2 | ··· f (TAdd,1,1,2 (u1, . . . ,un ))du1 . . . dun u(1)∈R u(n)∈R ! ! = | det α −1 | ··· f (α −1 · (u1, . . . ,un ))du1 . . . dun u(1)∈R u(n)∈R ! ! = | det α −1 | ··· g(u1, . . . ,un )du1 . . . dun, u(1)∈R
u(n)∈R
where the third equality is by Lemma A.0.7, and where the third-to-last equality is because det TAdd,1,1,2 = 1. Summing up, Assertion 1 is proved if κ = 1, assuming g ∈ C(R n ) and assuming α has rational entries. assuming κ = 1, Assertion 1 says that the Lebesgue 8. Note that, integration n n ·dμ on R agrees on C(R ) with the completion of the integration ·dμ defined by gdμ ≡ | det α| g(α · x)dx for each g ∈ C(R n ). Hence thecompletions n are equal. In particular, a function g on R is integrable relative to ·dμ iff it is integrable relative to ·dμ, in which case ! ! ! gdμ = gdμ ≡ | det α| g(α · x)dx. Assertion 1 is thus proved in the case where κ = 1, assuming α has rational entries. 9. Still assuming α has rational entries, suppose, inductively, that Assertion 1 has been proved for κ = 1, . . . ,κ − 1 for some κ ≥ 2. Now suppose α −1 = T (κ) . . . T (0) .
(A.0.5)
Define the n × n matrix α by its inverse α −1 = T (κ−1) . . . T (0) .
(A.0.6)
Then α −1 = T (κ) α −1 . Let g be an arbitrary integrable function on R n . Then, in view of equality A.0.6, the g ≡ g ◦ α induction hypothesis applies to κ = κ − 1, and implies that the function is integrable on R n , with ! ! ! ! α |−1 · · · ··· g (u)du1 . . . dun = | det g ( α −1 · v)dv1 . . . dvn . (A.0.7) In turn, the induction hypothesis applies to κ = 1, with (T (κ) )−1 in place of α, and implies that ! ! ! ! g (u)du1 . . . dun . g ((T (κ) )−1 · x)dx1 . . . dxn = · · · | det T (κ) |−1 · · · (A.0.8)
586
Appendices
Combining equalities A.0.8 and A.0.7, we obtain ! ! | det T (κ) |−1 · · · g ◦ α ((T (κ) )−1 · x)dx1 . . . dxn = | det α|
−1
!
! ···
g ◦ α ( α −1 · v)dv1 . . . dvn .
Since | det T (κ) | · | det α |−1 = | det α|−1, it follows that ! ! ! ! · · · g( α · (T (κ) )−1 · x)dx1 . . . dxn = | det α|−1 · · · g(v)dv1 . . . dvn . Equivalently, ! ! ! ! −1 · · · g(α · x)dx1 . . . dxn = | det α| · · · g(v)dv1 . . . dvn . Induction is completed, and Assertion 1 is proved with the additional assumption that the matrix α has rational entries. 10. Now let α be an arbitrary n × n matrix, with entries not necessarily rational. We proceed to prove that Assertion 1 remains valid. Let g ∈ C(R n ) be arbitrary, with compact support K. Then the real-valued ≡ function v → g( α · v) is also a continuous function, with compact support K −1 α | and | det α | > c > 0. α · K. Take any b,c ∈ R such that b > max |g| ∨ | det Let ε > 0 be arbitrary. By Assertion 4 of Lemma A.0.2, the function α → α −1 is uniformly continuous on α | ≤ b}. Mn×n,c,b ≡ {α ∈ Mn×n,c : | det Since b > | det α | > c, there exists an n × n matrix α ∈ Mn×n,c,b with rational α | so small that (i) | det α −1 − det α −1 | < ε; (ii) |g(α · v) − entries and with |α − n g( α · v)| < ε for each v ∈ R ; and (iii) there exists an integrable compact subset of R n that is a support of both the function g ◦ α · and the function g(α · x). By K the last statement in Step 9, we have ! ! ! ! −1 · · · g(v)dv1 . . . dvn . · · · g(α · x)dx1 . . . dxn = | det α| Conditions (i) and (iii) therefore lead to ! ! ! ! −1 g(α · x)dx1 . . . dxn = (| det α | ± ε) · · · g(v)dv1 . . . dvn . ··· x∈K
Condition (ii) then implies ! ! ! ! −1 ··· (g( α · x) ± ε)dx1 . . . dxn = (| det α | ± ε) · · · g(v)dv1 . . . dvn x∈K
Letting ε → 0, we obtain ! ! ! ! −1 · · · g( α · x)dx1 . . . dxn = | det α | · · · · g(v)dv1 . . . dvn,
Change of Integration Variables
587
where g ∈ C(R n ) is arbitrary. Since C(R n ) is dense in the space of integrable functions, the last displayed equality holds for each integrable function g. Since α is an arbitrary n × n matrix with | det α | > 0, Assertion 1 is proved. The proof of Assertion 2 is similar. Corollary A.0.9. Integrability of convolution of two integrable functions. Let L stand for the Lebesgue integrable functions on R n . Let f ,g ∈ L be arbitrary. The convolution f g : R n → R is the function defined by domain( f g) ≡ {x ∈ R n : f (x − ·)g ∈ L}, and by ( f g)(x) ≡ u∈R n f (x − y)g(y)dy for each x ∈ domain( f g). Then f g ∈ L. Proof. 1. First assume that f ,g ∈ C(R n ). Define f(x,y) ≡ f (x − y) and g is Lebesgue integrable g (x,y) ≡ g(y). Then f g ∈ C(R 2n ). Consequently, f 2n g (x,·) is a member on R . Therefore, by Fubini’s Theorem, the function f (x,·) of L, for each x in some full subset D of R n, with ! ! ! ! f (x,y) g (x,y)dxdy = f(x,y) g (x,y)dx dy ! ! ≡ f (x − y)dx g(y)dy ! ! ! ! = f (x)dx g(y)dy = f (x)dx · g(y)dy, (A.0.9) where the third equality is thanks to Theorem A.0.8. Note that the set {f(x,·) g (x,·) : x ∈ D} is dense in the metric subspace A ≡ {f(x,·) g (x,·) : x ∈ R n } ≡ {f (x − ·)g : x ∈ R n } of C(R n ), relative to the supremum norm. Hence equality A.0.9 implies that each member of A is integrable and also satisfies ! ! ! ! f(x,y) g (x,y)dxdy = f (x)dx · g(y)dy. In other words, for each x ∈ R n , we have x ∈ domain( f g) ≡ {x ∈ R n : f (x − ·)g ∈ L}, and, in addition, ( f g)(x) ≡
!
! u∈R n
f (x − u)g(u)du =
! f (x)dx ·
g(y)dy,
where f ,g ∈ C(R n ) are arbitrary. 2. Next let g ∈ C(R n ) be arbitrary with g(y)dy > 0. Define Lg ≡ {f ∈ L : f g ∈ L}.
(A.0.10)
588
Appendices
Then C(R n ) ⊂ Lg ⊂ L according to Step 1. Define the functions I1,I2 : C(R n ) → R by ! ! I1 ( f ) ≡ f (x − y)dx g(y)dy and
! I2 ( f ) ≡
! f (x)dx ·
g(y)dy,
for each f ∈ C(R n ). Equality A.0.10 shows that (R n,C(R n ),I1 ) and (R n,C(R n ), I2 ) are equal as integration spaces. Hence a function f is integrable relative to the integration I1 iff it is integrable relative to I2 . Since I2 is a positive constant multiple of the Lebesgue integration, the completion L of C(R n ) relative to the Lebesgue integration is equal to the completion of C(R n ) relative to I2 . At the same time, since C(R n ) ⊂ Lg ⊂ L, we have Lg = L. Thus we see that each f ∈ L is a member of Lg , and that I1 ( f) = I2 ( f ) for each f ∈ L. Summing up, for each f ∈ L and g ∈ C(R n ) with g(y)dy > 0, we have f g ∈ L and ! ! ! ( f g)(y)dy = f (x)dx · g(y)dy. (A.0.11) 3. Now let f ∈ L be arbitrary with f (x)dx > 0, but fixed. Then equality A.0.11 holds for each g ∈ C(R n ) with g(y)dy > 0, Hence, by linearity and continuity, it holds for each g ∈ C(R n ). Define the functions I3,I4 : C(R n ) → R by ! ! I3 (g) ≡ f (x − y)dx g(y)dy and
! I4 (g) ≡
! f (x)dx ·
g(y)dy,
(A.0.12)
n n for each g ∈ C(R n ). Then equality A.0.11 shows that (R ,C(R ),I3 ) and n n (R ,C(R ), I4 )are equal as integration spaces. Since f (x)dx > 0, we see from relation A.0.12 that ! ! (R n,C(R n ),I4 ) = R n,C(R n ), f (x)dx · ·dy .
Hence a function g is integrable relative to I3 iff it is integrable relative to f (x)dx · ·dy, in which case ! f (x − y)dx g(y)dy = I3 (g) ! ! = I4 (g) = f (x)dx · g(y)dy. (A.0.13) Now let g ∈ L be arbitrary. Then g is integrable relative to f (x)dx · ·dy.
f (x − · dx)g ∈ L, or, Therefore g is integrable relative to I3 . In other words,
Change of Integration Variables
589
equivalently, f g ∈ L, where g,f ∈ L are arbitrary, with f (x)dx > 0. By linearity and continuity, we conclude that f g ∈ L for arbitrary g,f ∈ L. The corollary is proved. Theorem A.0.10. Change of integration variables in R 1 . Let B be an open interval in R. Suppose: 1. β : B → R is a diferentiable function whose derivative dβ du : B → R is uniformly continuous on each compact subset K ⊂ B. 2. For each compact subset K ⊂ B, there exists some c > 0 such that dβ du (v) ≥ c for each v ∈ K. Then, for an arbitrary function f on R, the function 1β(B) · f is integrable relative to the Lebesgue integration iff the function 1B · ( f ◦ β) · dβ du is integrable relative to the Lebesgue integration, in which case ! ! dβ (A.0.14) f (x)dx = f (β(u)) · (u) du. du x∈β(B) u∈B Proof. 1. Let K ≡ [a,b] be an arbitrary closed interval with K ⊂ B. Then | dβ du (v)| ≥ c for each v ∈ K, for some c > 0. Hence, there is no loss of generality to assume that dβ du ≥ c on K. Thus β is an increasing function on the interval K, whence β(K) = [β(a),β(b)] is a closed interval. Then ! ! dβ dβ (u) · du = (u)du du β(u)∈β(K) u∈K du ! b dβ = (u)du = β(b) − β(a) u=a du ! β(b) ! = dx = 1β(K) (x)dx, x=β(a)
where the third equality is by the Second Fundamental Theorem of Calculus, which can be found in textbooks on advanced calculus or in [Bishop 1967]. 2. Next, let K ≡ [a,b] be an arbitrary closed interval. Let (Ki )i=1,2,... be an arbitrary sequence of closed intervals with Ki ⊂ Ki+1 ⊂ B for each i = 1,2, . . ., such that (i) ∞ i=1 Ki = B and (ii) u∈K(i) G(u)du ↑ u∈B G(u)du as i → ∞, for each nonnegative integrable function G on R. Consider each i ≥ 1. Then KKi ⊂ B, Hence, by Step 1 of this proof, we have ! ! dβ (u) du = 1β(KK(i)) (x)dx. β(u)∈β(KK(i)) du Consequently, ! ! ! dβ dβ dβ 1K (u) · (u) du = (u) du = (u) du du u∈K(i) u∈KK(i) du β(u)∈β(KK(i)) du ! ! = 1β(KK(i)) (x)dx = 1β(K(i)) (x)1β(K) (x)dx. (A.0.15)
590
Appendices dβ Let i → ∞. Then, since 1K du is integrable on R, Condition (ii) implies that the left-hand side of equality A.0.15 converges monotonically to u∈B 1K (u) · dβ (u)du. Hence so does the right-hand side of equality A.0.15. Therefore, by du the Monotone Convergence Theorem, the function 1β(B) 1β(K) is integrable on R and ! ! ! dβ 1K (u) · (u) du = 1β(B) (x)1β(K) (x)dx = 1β(K) (x)dx, du u∈B
x∈β(B)
where K ≡ [a,b] is an arbitrary closed interval. 3. Next, let K ≡ [a,b) be an arbitrary half-open interval where a < b < ∞. Let (bj )j =1,2,... be an increasing sequence in [a,b) such that bj ↑ b, and let Kj ≡ [a,b ] ⊂ [a,b) for each j ≥ 1. Then Kj ⊂ Kj +1 for each j ≥ 1. Moreover, ∞j j =1 Kj = K. Furthermore, by Step 2 of this proof, we have ! ! dβ 1K (j ) (u) · (u) du = 1β(K (j )) (x)dx (A.0.16) du u∈B x∈β(B) for each j ≥ 1. The Dominated Convergence Theorem implies that the left hand side of equality A.0.16 converges to u∈B 1K (u) · | dβ du (u)|du monotonically. Therefore, so does the right-hand side. The Monotone Convergence Theorem then, in turn, implies that the function 1β(B) 1β(K) = 1β(B) 1
β
∞ j =1 K (j )
= 1β(B) 1∞ j =1 β(K (j ))
is integrable, with integral ! ! 1β(K) (x)dx =
dβ 1K (u) · (u) du, du u∈B
x∈β(B)
where K ≡ [a,b) is an arbitrary half-open interval. 4. Now let f ∈ C(R) be arbitrary. Then f ◦β ∈ C(R) has support in some finite half-open interval [a,b). Let κ ≥ 1 be arbitrary. Then we can partition [a,b) into κ,1, . . . , K κ,m(κ) , each so a finite number mκ ≥ 1 of disjoint half-open intervals K −κ κ,j ), for each j = 1, . . . ,mκ . Define small that | f − cκ,j | < 2 1β([a,b]) on β(K the function fκ ≡
m(κ)
cκ,j 1β(K(κ,j )) .
j =1
Then | f − fκ | < 2−κ 1β([a,b]) . Moreover, fκ ◦ β ≡
m(κ)
j =1
cκ,j 1β(K(κ,j )) ◦ β =
m(κ)
cκ,j 1K(κ,j )
j =1
and | f ◦ β − fκ ◦ β| < 2−κ 1[a,b] . κ,1, . . . , K κ,m(κ,) are half-open intervals, we have, by Step 3, Since K
!
Change of Integration Variables ! dβ 1β(K(κ,j (x)dx = 1 (u) · (u) K(κ,j ) )) du du x∈β(B) u∈B
591
for each 1, . . . ,mκ . Hence !
m(κ)
x∈β(B) j =1
Equivalently,
! cκ,j 1β(K(κ,j )) (x)dx =
m(κ)
u∈B j =1
dβ (u) du. cκ,j 1K(κ,j ) (u) du
dβ fκ (x)dx = fκ (β(u)) · (u) du. du x∈β(B) u∈B
!
!
Letting κ → ∞, we obtain ! ! f (x)dx = x∈β(B)
dβ f (β(u)) · (u) du, du u∈B
where f ∈ C(R) is arbitrary. 5. Now define the functions I1,I2 : C(R) → R by ! I1 ( f ) ≡ f (x)dx x∈β(B)
and
dβ f (β(u)) · (u) du, I2 ( f ) ≡ du u∈B !
for each f ∈ C(R). Then both I1 and I2 are integrations on R, in the sense of Definition 4.2.1. Proposition 4.3.3 says that (R,C(R),I1 ) and (R,C(R),I2 ) are integration spaces. Moreover, Step 4 of this proof shows that I1 = I2 . Hence the complete extensions of (R,C(R),I1 ) and (R,C(R),I2 ) are equal. In other words, a function f on R is integrable relative to I1 iff f is integrable relative to I2 . Hence a function f is such that 1β(B) f is integrable relative to the Lebesgue integral iff the function 1B · ( f ◦ β) · dβ du is integrable relative to the Lebesgue integral, in which case ! ! dβ f (x)dx = f (β(u)) · (u) du. du x∈β(B) u∈B The theorem is proved.
The next lemma will be key to our subsequent proof of Theorem A.0.12 for the change of integration variables from rectangular to polar coordinates in the half plane. The proof of the lemma is longer than a few lines because, alas, Theorem A.0.12 is yet to be established. Lemma A.0.11. Lebesgue measure of certain fan-shaped subsets of a half disk.
592
Appendices
1. For each r > 0, the half disk Dr ≡ {(x,y) ∈ (0,∞) × R :
. x 2 + y 2 < r}
is integrable, with measure 2−1 π r 2 . 2. For each r,s > 0 with s < r, the subset Ds,r ≡ {(x,y) ∈ (0,∞) × R : s ≤
.
x 2 + y 2 < r}
is integrable, with measure 2−1 π(r 2 − s 2 ).
3. For each r,s > 0 with s < r, and for each u,v ∈ − π2 , π2 with u < v, the subset B A y (A.0.17) As,r,u,v ≡ (x,y) ∈ Ds,r : u < arctan < v x is integrable, with measure μ(As,r,u,v ) = 2−1 (v − u)(r 2 − s 2 ).
(A.0.18)
Proof. 1. First note that the set B0 ≡ {(x,y) : x > 0} = (0,∞)×R is a measurable set in R 2 because its indicator 1B(0) = 1(0,∞) ⊗1R is the product of two measurable to Assertion 1 of Theorem 4.10.10. Separately, the realfunctions on R 2 , according2 valued function (x,y) → x 2 + y 2 is continuous, and hence measurable on R 2 . Thus, according to Proposition 4.8.14, there exists a countable subset A ⊂ (0,∞) such that the set C D . B t ≡ (x,y) : x 2 + y 2 < t ⊂ [−M,M] × [−M,M] is integrable for each t ∈ [−M,M] ∩ Ac , for each M ≥ 1, where Ac is the metric complement in (0,∞) of the set A. Let t ∈ Ac be arbitrary. Then Dt = B0 B t . Hence Dt is an integrable subset of (0,∞) × R. Using Fubini’s Theorem, we compute ! ! ! 2 t 2 − x 2 dx μ(Dt ) = √ dy dx = 2 √ x∈(0,t)
y∈ −
t 2 −x 2,
t 2 −x 2
x∈(0,t)
.
! =2
! t 2 − t 2 sin2 u (t cos u)du = 2t 2
t sin u∈(0,t)
!
= 2t 2
π/2 u=0
(cos2 u)du u∈(0,π/2)
π/2 1 1 + cos 2u π du = t 2 u + sin 2u = t2 , u−0 2 2 2
where the third equality is by a change of integration variables x = t sin u in R 1 , justified by Theorem A.0.10, and where the sixth equality is by a second change of integration variables in R 1 , also justified by Theorem A.0.10. 2. Now consider each r > 0. Let (tk )k=1,2,... be an increasing sequence in Ac such that tk ↑ r. Then, according to Step 1, Dt (k) is an integrable set, with Dt (k) ⊂ Dt (k+1) for each k ≥ 1, and with μ(Dt (k) ) = tk2 π2 ↑ r 2 π2 . At the same time,
Change of Integration Variables
593
∞
Dr = k=1 Dt (k) . Hence, by the Monotone Convergence Theorem, the set Dr is an integrable set, with μ(Dr ) = r 2 π2 . Assertion 1 is proved. 3. Assertion 2 follows from Ds,r = Dr Dsc and from μ(Ds,r ) = μ(Dr Dsc ) = μ(Dr ) − μ(Ds ) = 2−1 π r 2 − 2−1 π s 2 . 4. It remains to prove Assertion 3. To that end,
3. Assertion 2 follows from $D_{s,r} = D_r \cap D_s^c$ and from
$$\mu(D_{s,r}) = \mu(D_r \cap D_s^c) = \mu(D_r) - \mu(D_s) = 2^{-1}\pi r^2 - 2^{-1}\pi s^2.$$

4. It remains to prove Assertion 3. To that end, let $r,s > 0$ be arbitrary with $s < r$. Define the function $\gamma: R^2 \to (-\frac{\pi}{2},\frac{\pi}{2})$ by (i) $\gamma(x,y) \equiv \arctan\frac{y}{x}$ for each $(x,y)$ with $x > 0$ and (ii) $\gamma(x,y) \equiv 0$ for each $(x,y)$ with $x \leq 0$. Thus the function $\gamma$ is defined a.e. on $R^2$. Separately, let $k \geq 0$ be arbitrary. Define the function $\gamma_k: R^2 \to (-\frac{\pi}{2},\frac{\pi}{2})$ by (i$'$) $\gamma_k(x,y) \equiv \gamma(x,y)$ for each $(x,y)$ with $x \geq 2^{-k}$; (ii$'$) $\gamma_k(x,y) \equiv 0$ for each $(x,y)$ with $x \leq 2^{-k-1}$; and (iii$'$)
$$\gamma_k(x,y) \equiv \frac{(x - 2^{-k-1})\,\gamma(2^{-k},y)}{2^{-k} - 2^{-k-1}}$$
for each $(x,y)$ with $2^{-k-1} < x < 2^{-k}$. Then $\gamma_k$ is uniformly continuous on $R^2$, and hence measurable. At the same time, by Assertion 2, the set $D_{s,r}$ is an integrable subset of $R^2$. Hence the product $\gamma_k 1_{D(s,r)}$ is an integrable function. Moreover, $(|\gamma_k 1_{D(s,r)} - \gamma 1_{D(s,r)}| > 0) \subset [0,2^{-k}]\times[-r,r]$, where $\mu([0,2^{-k}]\times[-r,r]) = 2^{-k+1}r \to 0$ as $k \to \infty$. Thus $\gamma_k 1_{D(s,r)} \to \gamma 1_{D(s,r)}$ in measure as $k \to \infty$. Since $|\gamma 1_{D(s,r)}| \leq \pi 1_{D(s,r)}$, and $|\gamma_k 1_{D(s,r)}| \leq \pi 1_{D(s,r)}$ for each $k \geq 0$, the Dominated Convergence Theorem applies; it implies that the function $\gamma 1_{D(s,r)}$ is integrable.

5. Hence, according to Assertion 1 of Proposition 4.8.14, all but countably many $t \in R$ are continuity points of the function $\gamma 1_{D(s,r)}$ relative to $D_{s,r}$. In other words, there exists a countable subset $A$ of $R$ such that each point $t$ in the metric complement $A^c$ is a continuity point of the function $\gamma 1_{D(s,r)}$ relative to the integrable set $D_{s,r}$. Therefore Assertion 3 of Proposition 4.8.14 implies that, for each $u,v \in A^c$, the sets $D_{s,r} \cap (\gamma 1_{D(s,r)} \leq u)$ and $D_{s,r} \cap (\gamma 1_{D(s,r)} < v)$ are integrable. Consequently, the set
$$\{(x,y) \in D_{s,r} : u < \gamma(x,y)1_{D(s,r)}(x,y) < v\} = \Big\{(x,y) \in D_{s,r} : u < \arctan\frac{y}{x} < v\Big\} \equiv A_{s,r,u,v}$$
is integrable, where $u,v \in A^c$ are arbitrary with $u < v$.

6. Next, let $u,v \in (-\frac{\pi}{2},\frac{\pi}{2})$ be arbitrary with $u < v$, such that $A_{s,r,u,v}$ is an integrable set. Take an arbitrary $\lambda \in (-\frac{\pi}{2}-u, \frac{\pi}{2}-v)$. Then $v + \lambda < \frac{\pi}{2}$ and $u + \lambda > -\frac{\pi}{2}$. Consider the matrix $\alpha \equiv \begin{pmatrix} \cos\lambda & -\sin\lambda \\ \sin\lambda & \cos\lambda \end{pmatrix}$. Then $\det\alpha = 1$ and $\alpha^{-1} = \begin{pmatrix} \cos\lambda & \sin\lambda \\ -\sin\lambda & \cos\lambda \end{pmatrix}$. We will show that
$$\alpha \cdot A_{s,r,u,v} = A_{s,r,u+\lambda,v+\lambda}. \qquad (A.0.19)$$
This follows from
$$\alpha\cdot A_{s,r,u,v} \equiv \{\alpha\cdot(x,y) : (x,y) \in A_{s,r,u,v}\}$$
$$\equiv \bigcup_{(x,y)\in A(s,r,u,v)}\Big\{(\bar x,\bar y)\in R^2 : (x,y) = \alpha^{-1}\cdot(\bar x,\bar y);\ s \leq \sqrt{x^2+y^2} < r;\ x\sin u < y\cos u;\ y\cos v < x\sin v\Big\}$$
$$= \bigcup_{(x,y)\in A(s,r,u,v)}\Big\{(\bar x,\bar y)\in R^2 : (x,y) = (\bar x\cos\lambda + \bar y\sin\lambda,\ -\bar x\sin\lambda + \bar y\cos\lambda);\ s \leq \sqrt{\bar x^2+\bar y^2} < r;$$
$$(\bar x\cos\lambda + \bar y\sin\lambda)\sin u < (-\bar x\sin\lambda + \bar y\cos\lambda)\cos u;\ (-\bar x\sin\lambda + \bar y\cos\lambda)\cos v < (\bar x\cos\lambda + \bar y\sin\lambda)\sin v\Big\}$$
$$= \bigcup_{(x,y)\in A(s,r,u,v)}\Big\{(\bar x,\bar y)\in D_{s,r} : (x,y) = (\bar x\cos\lambda + \bar y\sin\lambda,\ -\bar x\sin\lambda + \bar y\cos\lambda);$$
$$\bar x\cos\lambda\sin u + \bar x\sin\lambda\cos u < \bar y\cos\lambda\cos u - \bar y\sin\lambda\sin u;\ \bar y\cos\lambda\cos v - \bar y\sin\lambda\sin v < \bar x\cos\lambda\sin v + \bar x\sin\lambda\cos v\Big\}$$
$$= \bigcup_{(x,y)\in A(s,r,u,v)}\Big\{(\bar x,\bar y)\in D_{s,r} : (x,y) = (\bar x\cos\lambda + \bar y\sin\lambda,\ -\bar x\sin\lambda + \bar y\cos\lambda);\ \bar x\sin(\lambda+u) < \bar y\cos(\lambda+u);\ \bar y\cos(\lambda+v) < \bar x\sin(\lambda+v)\Big\}$$
$$= \bigcup_{(x,y)\in A(s,r,u,v)}\Big\{(\bar x,\bar y)\in D_{s,r} : (x,y) = (\bar x\cos\lambda + \bar y\sin\lambda,\ -\bar x\sin\lambda + \bar y\cos\lambda);\ \tan(\lambda+u) < \frac{\bar y}{\bar x} < \tan(\lambda+v)\Big\}$$
$$= \bigcup_{(x,y)\in A(s,r,u,v)}\Big\{(\bar x,\bar y)\in A_{s,r,\lambda+u,\lambda+v} : (x,y) = (\bar x\cos\lambda + \bar y\sin\lambda,\ -\bar x\sin\lambda + \bar y\cos\lambda)\Big\} = A_{s,r,\lambda+u,\lambda+v}. \qquad (A.0.20)$$
Equality A.0.19 has been verified, where $\lambda \in (-\frac{\pi}{2}-u, \frac{\pi}{2}-v)$ is arbitrary, assuming that $A_{s,r,u,v}$ is an integrable set.
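As an editorial aside, equality A.0.19 lends itself to a quick Monte Carlo membership test, sketched below in Python (numpy assumed; the particular values of $s, r, u, v, \lambda$ are arbitrary choices subject to the stated constraints). This is classical numerics for the reader's reassurance, not part of the constructive proof.

```python
# Informal Monte Carlo check of (A.0.19): rotating A_{s,r,u,v} by the angle
# lambda lands inside A_{s,r,u+lambda,v+lambda}. Editorial aside; assumes numpy.
import numpy as np

def in_sector(x, y, s, r, u, v):
    rho = np.hypot(x, y)
    ang = np.arctan2(y, x)                      # equals arctan(y/x) when x > 0
    return (x > 0) & (s <= rho) & (rho < r) & (u < ang) & (ang < v)

rng = np.random.default_rng(1)
s, r, u, v = 0.5, 2.0, -0.3, 0.9
lam = 0.4                                       # lies in (-pi/2 - u, pi/2 - v)
pts = rng.uniform(-2.5, 2.5, size=(2, 1_000_000))
mask = in_sector(pts[0], pts[1], s, r, u, v)
c, sn = np.cos(lam), np.sin(lam)
xr = c * pts[0][mask] - sn * pts[1][mask]       # alpha . (x, y)
yr = sn * pts[0][mask] + c * pts[1][mask]
assert np.all(in_sector(xr, yr, s, r, u + lam, v + lam))
```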
At the same time, since, by assumption, $A_{s,r,u,v}$ is an integrable set, its indicator $g \equiv 1_{A(s,r,u,v)}$ is an integrable function on $R^2$. Hence, according to Assertion 1 of Theorem A.0.8, the function $g \circ (\alpha^{-1}\cdot)$ is integrable on $R^2$, with
$$|\det\alpha^{-1}|\cdot\iint g(\alpha^{-1}\cdot(x,y))\,dx\,dy = \iint g(\bar u,\bar v)\,d\bar u\,d\bar v. \qquad (A.0.21)$$
Consequently, the indicator $1_{\alpha\cdot A(s,r,u,v)} = 1_{A(s,r,u,v)}\circ\alpha^{-1} \equiv g\circ\alpha^{-1}$ is integrable. In other words, the set $\alpha\cdot A_{s,r,u,v}$ is an integrable set. Since $|\det\alpha^{-1}| = 1$, equality A.0.21 simplifies to
$$\iint 1_{A(s,r,u,v)}(\alpha^{-1}\cdot(x,y))\,dx\,dy = \iint 1_{A(s,r,u,v)}(\bar u,\bar v)\,d\bar u\,d\bar v.$$
Equivalently,
$$\iint 1_{\alpha\cdot A(s,r,u,v)}(x,y)\,dx\,dy = \iint 1_{A(s,r,u,v)}(\bar u,\bar v)\,d\bar u\,d\bar v,$$
or
$$\mu(\alpha\cdot A_{s,r,u,v}) = \mu(A_{s,r,u,v}). \qquad (A.0.22)$$
Moreover, equality A.0.19 says that
$$\alpha\cdot A_{s,r,u,v} = A_{s,r,u+\lambda,v+\lambda}. \qquad (A.0.23)$$
Hence $A_{s,r,u+\lambda,v+\lambda}$ is an integrable set. Equalities A.0.22 and A.0.23 together yield
$$\mu(A_{s,r,u+\lambda,v+\lambda}) = \mu(A_{s,r,u,v}), \qquad (A.0.24)$$
where $\lambda \in (-\frac{\pi}{2}-u,\frac{\pi}{2}-v)$ is arbitrary, assuming that $u,v \in (-\frac{\pi}{2},\frac{\pi}{2})$ are such that $u < v$ and $A_{s,r,u,v}$ is an integrable set.

7. Continuing with the same assumptions as in Step 6, proceed to estimate a bound for $\mu(A_{s,r,u,v})$. For that purpose, define the midpoint $\lambda \equiv -2^{-1}(u+v)$ of the interval $(-\frac{\pi}{2}-u, \frac{\pi}{2}-v)$. Then $\lambda \in (-\frac{\pi}{2}-u, \frac{\pi}{2}-v)$. Write $\varepsilon \equiv \tan(2^{-1}(v-u))$. Then
$$A_{s,r,u+\lambda,v+\lambda} \equiv \Big\{(x,y)\in D_{s,r} : \tan(u+\lambda) < \frac{y}{x} < \tan(v+\lambda)\Big\}$$
$$\equiv \Big\{(x,y)\in D_{s,r} : \tan(u - 2^{-1}(u+v)) < \frac{y}{x} < \tan(v - 2^{-1}(u+v))\Big\}$$
$$= \Big\{(x,y)\in D_{s,r} : -\varepsilon < \frac{y}{x} < \varepsilon\Big\} = \{(x,y)\in D_{s,r} : -\varepsilon x < y < \varepsilon x\} \subset [0,r)\times(-\varepsilon r,\varepsilon r).$$
Hence, using equality A.0.24, we obtain the bound
$$\mu(A_{s,r,u,v}) = \mu(A_{s,r,u+\lambda,v+\lambda}) \leq 2r^2\varepsilon \equiv 2r^2\tan(2^{-1}(v-u)), \qquad (A.0.25)$$
assuming that $u,v \in (-\frac{\pi}{2},\frac{\pi}{2})$ are such that $u < v$ and $A_{s,r,u,v}$ is an integrable set.

8. Next, let $u,v \in (-\frac{\pi}{2},\frac{\pi}{2})$ be arbitrary with $u < v$. We will show that $A_{s,r,u,v}$ is an integrable set. Recall the countable exceptional set $A$ defined in Step 5. Take a decreasing sequence $(u_h)_{h=1,2,\ldots}$ in $A^c$ such that $u_h > u$ and $\tan(2^{-1}(u_h - u)) \to 0$ as $h \to \infty$. Take an increasing sequence $(v_i)_{i=1,2,\ldots}$ in $A^c$ such that $v_i < v$ and $\tan(2^{-1}(v - v_i)) \to 0$ as $i \to \infty$. Since $u < v$, we may assume, without loss of generality, that $u_h < v_i$ for each $h,i \geq 1$. According to the conclusion of Step 5, $A_{s,r,u(h),v(i)}$ is then an integrable set, for each $h,i \geq 1$.

Let $h \geq 1$ be arbitrary. Note that, for each $i' \geq i+2$, we have $A_{s,r,u(h),v(i)} \subset A_{s,r,u(h),v(i')}$ and, because $(u_h, v_{i'}) \subset (u_h, v_i) \cup (v_{i-1}, v_{i'})$, we have $A_{s,r,u(h),v(i')} \subset A_{s,r,u(h),v(i)} \cup A_{s,r,v(i-1),v(i')}$. Hence
$$\mu(A_{s,r,u(h),v(i')}) - \mu(A_{s,r,u(h),v(i)}) \leq \mu(A_{s,r,v(i-1),v(i')}) \leq 2r^2\tan(2^{-1}(v_{i'} - v_{i-1})) \leq 2r^2\tan(2^{-1}(v - v_{i-1})) \to 0$$
as $i \to \infty$, with $i' \geq i+2$, where the second inequality is by inequality A.0.25. Consequently, by the Monotone Convergence Theorem, the union
$$A_{s,r,u(h),v} = \bigcup_{i=1}^\infty A_{s,r,u(h),v(i)}$$
is an integrable set, where $h \geq 1$ is arbitrary. Similarly, note that $A_{s,r,u(h),v} \subset A_{s,r,u(h'),v}$ and that $A_{s,r,u(h'),v} \subset A_{s,r,u(h'),u(h-1)} \cup A_{s,r,u(h),v}$, because $(u_{h'}, v) \subset (u_{h'}, u_{h-1}) \cup (u_h, v)$, for each $h \geq 1$ and $h' \geq h+2$. Hence
$$\mu(A_{s,r,u(h'),v}) - \mu(A_{s,r,u(h),v}) \leq \mu(A_{s,r,u(h'),u(h-1)}) \leq 2r^2\tan(2^{-1}(u_{h-1} - u_{h'})) \leq 2r^2\tan(2^{-1}(u_{h-1} - u)) \to 0$$
as $h \to \infty$, with $h' \geq h+2$, where the second inequality is by inequality A.0.25. Consequently, by the Monotone Convergence Theorem, the union
$$A_{s,r,u,v} = \bigcup_{h=1}^\infty A_{s,r,u(h),v}$$
is an integrable set, where $r,s > 0$ are arbitrary with $s < r$, and where $u,v \in (-\frac{\pi}{2},\frac{\pi}{2})$ are arbitrary with $u < v$.
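As an editorial aside, the bound A.0.25 used repeatedly above can be eyeballed numerically. The Monte Carlo sketch below (Python with numpy assumed; the sample sectors are arbitrary choices) estimates $\mu(A_{s,r,u,v})$ by sampling the box $(0,r)\times(-r,r)$ and compares it with $2r^2\tan(2^{-1}(v-u))$; it is not part of the constructive development.

```python
# Informal Monte Carlo illustration of the bound (A.0.25):
# mu(A_{s,r,u,v}) <= 2 r^2 tan((v - u)/2). Editorial aside; assumes numpy.
import numpy as np

rng = np.random.default_rng(2)
s, r = 0.3, 1.5
for (u, v) in [(-0.2, 0.1), (0.5, 0.7), (-1.2, -1.0)]:
    x = rng.uniform(0.0, r, 2_000_000)
    y = rng.uniform(-r, r, 2_000_000)
    rho, ang = np.hypot(x, y), np.arctan2(y, x)
    hit = (s <= rho) & (rho < r) & (u < ang) & (ang < v)
    mu = hit.mean() * (2.0 * r * r)             # area of the sampling box
    bound = 2.0 * r * r * np.tan((v - u) / 2.0)
    print(mu, bound)                            # mu stays below the bound
    assert mu < bound
```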
9. Proceed to prove equality A.0.18, first with the additional assumption that $u,v \in (-\frac{\pi}{2},\frac{\pi}{2})$ are rational multiples of $\frac{\pi}{2}$. Take an arbitrarily large $q \geq 1$ such that $u = -\frac{\pi}{2} + jq^{-1}\frac{\pi}{2}$ and $v = -\frac{\pi}{2} + kq^{-1}\frac{\pi}{2}$ for some $j,k \geq 1$ with $1 \leq j < k \leq 2q-1$. For each $i = 1,\ldots,2q-1$, define $\eta_i \equiv -\frac{\pi}{2} + iq^{-1}\frac{\pi}{2}$. Thus $u = \eta_j$ and $v = \eta_k$. As an abbreviation, write
$$\tilde u \equiv \eta_1 \equiv -\frac{\pi}{2} + q^{-1}\frac{\pi}{2} \quad\text{and}\quad \tilde v \equiv \eta_{2q-1} \equiv \frac{\pi}{2} - q^{-1}\frac{\pi}{2}.$$
Then
$$\cos\tilde u \equiv \cos\Big(-\frac{\pi}{2} + q^{-1}\frac{\pi}{2}\Big) = \sin\Big(q^{-1}\frac{\pi}{2}\Big) \quad\text{and}\quad \cos\tilde v = \cos\tilde u = \sin\Big(q^{-1}\frac{\pi}{2}\Big).$$
Define $\lambda \equiv q^{-1}\frac{\pi}{2}$, and define the matrix $\alpha \equiv \begin{pmatrix}\cos\lambda & -\sin\lambda \\ \sin\lambda & \cos\lambda\end{pmatrix}$. Then $\eta_{i+1} - \eta_i = \lambda$ for each $i = 1,\ldots,2q-2$. Moreover, $\det\alpha = 1$. Consider each $i = 1,\ldots,2q-3$. Then $\lambda + \eta_i = \eta_{i+1}$ and $\lambda + \eta_{i+1} = \eta_{i+2}$. Hence
$$\mu(A_{s,r,\eta(i),\eta(i+1)}) = \iint 1_{A(s,r,\eta(i),\eta(i+1))}(x,y)\,dx\,dy = |\det\alpha|^{-1}\iint 1_{A(s,r,\eta(i),\eta(i+1))}(\alpha^{-1}\cdot(\bar u,\bar v))\,d\bar u\,d\bar v$$
$$= \iint 1_{\alpha\cdot A(s,r,\eta(i),\eta(i+1))}(\bar u,\bar v)\,d\bar u\,d\bar v = \iint 1_{A(s,r,\lambda+\eta(i),\lambda+\eta(i+1))}(\bar u,\bar v)\,d\bar u\,d\bar v$$
$$= \iint 1_{A(s,r,\eta(i+1),\eta(i+2))}(\bar u,\bar v)\,d\bar u\,d\bar v = \mu(A_{s,r,\eta(i+1),\eta(i+2)}), \qquad (A.0.26)$$
where the second equality is by Assertion 1 of Theorem A.0.8, and where the fourth equality is by the recently proved equality A.0.19.

10. Based on the defining equality A.0.17, the sets $A_{s,r,\eta(1),\eta(2)}, \ldots, A_{s,r,\eta(2q-2),\eta(2q-1)}$ are mutually exclusive. Hence
$$\sum_{i=1}^{2q-2}\mu(A_{s,r,\eta(i),\eta(i+1)}) = \mu\Big(\bigcup_{i=1}^{2q-2} A_{s,r,\eta(i),\eta(i+1)}\Big) \leq \mu(D_{s,r}) = 2^{-1}\pi(r^2-s^2). \qquad (A.0.27)$$
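As an editorial aside, both A.0.26 and A.0.27 can be spot-checked by simulation. The sketch below (Python, numpy assumed; the values of $s$, $r$, $q$ are arbitrary) estimates the measures of the $2q-2$ congruent sectors and their sum; again this is classical numerics, not part of the constructive argument.

```python
# Informal Monte Carlo check of (A.0.26)-(A.0.27): congruent sectors
# A_{s,r,eta_i,eta_{i+1}} have equal measure, and their measures sum to at
# most mu(D_{s,r}) = pi (r^2 - s^2) / 2. Editorial aside; assumes numpy.
import numpy as np

rng = np.random.default_rng(5)
s, r, q = 0.5, 2.0, 6
eta = -np.pi / 2 + np.arange(1, 2 * q) * np.pi / (2 * q)   # eta_1 .. eta_{2q-1}
x = rng.uniform(0.0, r, 2_000_000)
y = rng.uniform(-r, r, 2_000_000)
rho, ang = np.hypot(x, y), np.arctan2(y, x)
box = 2.0 * r * r                                          # sampling-box area
mus = [((s <= rho) & (rho < r) & (eta[i] < ang) & (ang < eta[i + 1])).mean() * box
       for i in range(2 * q - 2)]
print(np.round(mus, 3))                                    # approximately equal
print(sum(mus), np.pi * (r * r - s * s) / 2)               # sum below mu(D_{s,r})
```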
Consider each $i = 1,\ldots,2q-2$. Let $u_i', u_i'' \in (-\frac{\pi}{2},\frac{\pi}{2})$ with $u_i' < \eta_i < u_i''$ be arbitrary such that $\tan(2^{-1}(u_i'' - u_i')) < q^{-2}$. Let $v_i', v_i'' \in (-\frac{\pi}{2},\frac{\pi}{2})$ with $v_i' < \eta_{i+1} < v_i''$ be arbitrary such that $\tan(2^{-1}(v_i'' - v_i')) < q^{-2}$. Then
$$A_{s,r,u'(i),v''(i)} \subset A_{s,r,\eta(i),\eta(i+1)} \cup A_{s,r,u'(i),u''(i)} \cup A_{s,r,v'(i),v''(i)},$$
whence
$$\mu(A_{s,r,u'(i),v''(i)}) \leq \mu(A_{s,r,\eta(i),\eta(i+1)}) + \mu(A_{s,r,u'(i),u''(i)}) + \mu(A_{s,r,v'(i),v''(i)})$$
$$\leq \mu(A_{s,r,\eta(i),\eta(i+1)}) + 2r^2\tan(2^{-1}(u_i'' - u_i')) + 2r^2\tan(2^{-1}(v_i'' - v_i')) \leq \mu(A_{s,r,\eta(i),\eta(i+1)}) + 4r^2q^{-2}, \qquad (A.0.28)$$
where the second inequality is by inequality A.0.25. At the same time, $(\eta_j,\eta_k) \subset \bigcup_{i=j}^{k-1}(u_i', v_i'')$. Therefore
$$A_{s,r,\eta(j),\eta(k)} \subset \bigcup_{i=j}^{k-1} A_{s,r,u'(i),v''(i)},$$
whence
$$\mu(A_{s,r,\eta(j),\eta(k)}) \leq \sum_{i=j}^{k-1}\mu(A_{s,r,u'(i),v''(i)}). \qquad (A.0.29)$$
Combining,
$$\sum_{i=j}^{k-1}\mu(A_{s,r,\eta(i),\eta(i+1)}) = \mu\Big(\bigcup_{i=j}^{k-1} A_{s,r,\eta(i),\eta(i+1)}\Big) \leq \mu(A_{s,r,\eta(j),\eta(k)}) \leq \sum_{i=j}^{k-1}\mu(A_{s,r,u'(i),v''(i)}) \leq \sum_{i=j}^{k-1}\mu(A_{s,r,\eta(i),\eta(i+1)}) + (k-j)4r^2q^{-2},$$
where the equality is because the members of the union are mutually exclusive, where the second inequality is inequality A.0.29, and where the last inequality is thanks to inequality A.0.28. Thus
$$\mu(A_{s,r,\eta(j),\eta(k)}) = \sum_{i=j}^{k-1}\mu(A_{s,r,\eta(i),\eta(i+1)}) \pm (k-j)4r^2q^{-2}. \qquad (A.0.30)$$
11. In the special case where $j = 1$ and $k = 2q-1$, inequality A.0.30 yields
$$\mu(A_{s,r,\tilde u,\tilde v}) \equiv \mu(A_{s,r,\eta(1),\eta(2q-1)}) = \sum_{i=1}^{2q-2}\mu(A_{s,r,\eta(i),\eta(i+1)}) \pm (2q-2)4r^2q^{-2}. \qquad (A.0.31)$$
At the same time, $A_{s,r,\tilde u,\tilde v} \subset D_{s,r}$. Moreover, $D_{s,r} \subset A_{s,r,\tilde u,\tilde v} \cup ([0, r\cos\tilde v]\times[-r,r])$ a.e. Hence
$$\mu(D_{s,r}) = \mu(A_{s,r,\tilde u,\tilde v}) \pm 2r^2\cos\tilde v = \sum_{i=1}^{2q-2}\mu(A_{s,r,\eta(i),\eta(i+1)}) \pm (2q-2)4r^2q^{-2} \pm 2r^2\sin\Big(q^{-1}\frac{\pi}{2}\Big),$$
where the last equality is due to equality A.0.31. Equivalently,
$$\sum_{i=1}^{2q-2}\mu(A_{s,r,\eta(i),\eta(i+1)}) = \mu(D_{s,r}) \pm (2q-2)4r^2q^{-2} \pm 2r^2\sin\Big(q^{-1}\frac{\pi}{2}\Big) = 2^{-1}\pi(r^2-s^2) \pm (2q-2)4r^2q^{-2} \pm 2r^2\sin\Big(q^{-1}\frac{\pi}{2}\Big).$$
Since all the summands on the left-hand side are equal, according to equality A.0.26, it follows that
$$\mu(A_{s,r,\eta(i),\eta(i+1)}) = (2q-2)^{-1}2^{-1}\pi(r^2-s^2) \pm 4r^2q^{-2} \pm (2q-2)^{-1}2r^2\sin\Big(q^{-1}\frac{\pi}{2}\Big), \qquad (A.0.32)$$
for each $i = 1,\ldots,2q-2$. Consequently,
$$\mu(A_{s,r,u,v}) = \mu(A_{s,r,\eta(j),\eta(k)}) = \sum_{i=j}^{k-1}\mu(A_{s,r,\eta(i),\eta(i+1)}) \pm (k-j)4r^2q^{-2}$$
$$= (k-j)\cdot\Big[(2q-2)^{-1}2^{-1}\pi(r^2-s^2) \pm 4r^2q^{-2} \pm (2q-2)^{-1}2r^2\sin\Big(q^{-1}\frac{\pi}{2}\Big)\Big] \pm (k-j)4r^2q^{-2}$$
$$= (k-j)q^{-1}\frac{\pi}{2}\cdot q\frac{2}{\pi}\Big[(2q-2)^{-1}2^{-1}\pi(r^2-s^2) \pm 4r^2q^{-2} \pm (2q-2)^{-1}2r^2\sin\Big(q^{-1}\frac{\pi}{2}\Big)\Big] \pm (k-j)q^{-1}\frac{\pi}{2}\cdot q\frac{2}{\pi}4r^2q^{-2}$$
$$= (v-u)\cdot\Big[q(2q-2)^{-1}(r^2-s^2) \pm 4r^2q^{-1}\frac{2}{\pi} \pm q\frac{2}{\pi}(2q-2)^{-1}2r^2\sin\Big(q^{-1}\frac{\pi}{2}\Big)\Big] \pm (v-u)\cdot q\frac{2}{\pi}4r^2q^{-2}, \qquad (A.0.33)$$
where the second equality is equality A.0.30, and where the third equality is from equality A.0.32.
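As an editorial aside, the bookkeeping in equality A.0.33 can be tabulated numerically. The sketch below (Python, numpy assumed; $r$ and $s$ are arbitrary choices) prints the main coefficient and the combined error terms for increasing $q$, exhibiting the limits used in the next paragraph.

```python
# Informal numerical look at the limit q -> infinity in (A.0.33): the main
# coefficient q/(2q-2) * (r^2 - s^2) tends to (r^2 - s^2)/2 while the error
# terms vanish, recovering (A.0.18). Editorial aside; assumes numpy.
import numpy as np

r, s = 1.5, 0.3
for q in [10, 100, 1_000, 10_000]:
    main = q / (2 * q - 2) * (r * r - s * s)
    err = (4 * r * r / q) * (2 / np.pi) \
        + (2 * q / np.pi) / (2 * q - 2) * 2 * r * r * np.sin(np.pi / (2 * q)) \
        + (2 * q / np.pi) * 4 * r * r / (q * q)
    print(q, main, err)          # main -> (r^2 - s^2)/2 = 1.08, err -> 0
```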
Since $q \geq 1$ is arbitrary, we can let $q \to \infty$. After all the vanishing terms on the right-hand side of equality A.0.33 drop out, we obtain
$$\mu(A_{s,r,u,v}) = \frac{1}{2}(v-u)(r^2-s^2),$$
provided that $u,v$ are rational multiples of $\frac{\pi}{2}$.

12. Finally, let $u,v \in (-\frac{\pi}{2},\frac{\pi}{2})$ be arbitrary, not necessarily rational multiples of $\frac{\pi}{2}$, with $v > u$. Let $(v_i)_{i=1,2,\ldots}$ be an increasing sequence in $(-\frac{\pi}{2},\frac{\pi}{2})$ such that $v_i$ is a rational multiple of $\frac{\pi}{2}$ for each $i \geq 1$, and such that $v_i \uparrow v$. Similarly, let $(u_h)_{h=1,2,\ldots}$ be a decreasing sequence in $(-\frac{\pi}{2},\frac{\pi}{2})$ such that $u_h$ is a rational multiple of $\frac{\pi}{2}$ for each $h \geq 1$, and such that $u_h \downarrow u$. Let $h \geq 1$ be arbitrary. Then, by the preceding paragraph, we have
$$\mu(A_{s,r,u(h),v(i)}) = \frac{1}{2}(v_i - u_h)(r^2-s^2)$$
for each $i \geq 1$. Let $i \to \infty$. Then the Monotone Convergence Theorem implies that the set $A_{s,r,u(h),v}$ is integrable, with measure
$$\mu(A_{s,r,u(h),v}) = \frac{1}{2}(v - u_h)(r^2-s^2).$$
Now let $h \to \infty$. Then the Monotone Convergence Theorem implies that the set $A_{s,r,u,v}$ is integrable, with measure
$$\mu(A_{s,r,u,v}) = \frac{1}{2}(v-u)(r^2-s^2),$$
which is the desired equality A.0.18. Assertion 3 and the lemma are proved.
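As an editorial aside, the lemma's conclusion A.0.18 admits a direct Monte Carlo corroboration. The sketch below (Python, numpy assumed; the parameter values are arbitrary) is classical numerics, offered only as a sanity check on the closed form.

```python
# Informal Monte Carlo check of the lemma's conclusion (A.0.18):
# mu(A_{s,r,u,v}) = (v - u)(r^2 - s^2)/2. Editorial aside; assumes numpy.
import numpy as np

rng = np.random.default_rng(3)
s, r, u, v = 0.4, 1.8, -0.9, 1.1
n = 2_000_000
x = rng.uniform(0.0, r, n)
y = rng.uniform(-r, r, n)
rho, ang = np.hypot(x, y), np.arctan2(y, x)
hit = (s <= rho) & (rho < r) & (u < ang) & (ang < v)
mu_mc = hit.mean() * (2.0 * r * r)              # area of the sampling box
mu_exact = 0.5 * (v - u) * (r * r - s * s)
print(mu_mc, mu_exact)                          # agree to a few decimals
assert abs(mu_mc - mu_exact) < 0.02
```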
Theorem A.0.12. Change of integration variables from rectangular to polar coordinates in the half plane. Let $H \equiv (0,\infty)\times R$ denote the open half plane, equipped with the Euclidean metric. Let $g$ be an arbitrary function on $H$. Then $g$ is integrable on $H$ relative to the Lebesgue integration iff the function $(r,u) \to rg(r\cos u, r\sin u)$ is integrable on $(0,\infty)\times(-\frac{\pi}{2},\frac{\pi}{2})$, relative to the Lebesgue integration, in which case
$$\int_{x=0}^\infty\int_{y=-\infty}^\infty g(x,y)\,dx\,dy = \int_{r=0}^\infty\int_{u=-\pi/2}^{\pi/2} rg(r\cos u, r\sin u)\,dr\,du.$$

Proof. 1. Define the function $\beta: [0,\infty)\times R \to R^2$ by $\beta(r,u) \equiv (r\cos u, r\sin u)$ for each $(r,u) \in [0,\infty)\times R$. Then the function $\beta$ is uniformly continuous, with some modulus of continuity $\delta_\beta$. Moreover,
$$\beta\Big(\sqrt{x^2+y^2},\ \arctan\frac{y}{x}\Big) = (x,y) \qquad (A.0.34)$$
for each $(x,y) \in (0,\infty)\times R$.

2. First, let $g \in C(H)$ be arbitrary, with some modulus of continuity $\delta_g$. Take $M > \tilde x > 0$ such that $g$ has $[\tilde x, M]\times[-M,M]$ as support. Write $r \equiv \sqrt{2}M$ and $s \equiv \tilde x$. Then $[\tilde x, M]\times[-M,M] \subset D_{s,r}$. Hence there exist $u,v \in (-\frac{\pi}{2},\frac{\pi}{2})$, for instance $v \equiv \arccos(\tilde x/r) \equiv -u$, such that $[\tilde x, M]\times[-M,M] \subset A_{s,r,u,v} \subset D_{s,r}$,
whence
$$\int_{x=0}^\infty\int_{y=-\infty}^\infty g(x,y)\,dx\,dy = \iint_{(x,y)\in A(s,r,u,v)} g(x,y)\,dx\,dy. \qquad (A.0.35)$$
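As an editorial aside, the containment invoked for A.0.35 can be tested by simulation. The sketch below (Python, numpy assumed) uses the choices $r \equiv \sqrt2 M$, $s \equiv \tilde x$, $v \equiv \arccos(\tilde x/r) \equiv -u$ adopted in the reconstruction above; boundary points of the rectangle may fall on the closure of $A_{s,r,u,v}$, which is harmless for the integrals, since the discrepancy is a null set.

```python
# Informal check that the support rectangle [x0, M] x [-M, M] sits inside
# the sector A_{s,r,u,v} with r = sqrt(2) M, s = x0, v = arccos(x0/r) = -u.
# Editorial aside under the reconstructed choices; assumes numpy.
import numpy as np

M, x0 = 2.0, 0.25
r, s = np.sqrt(2.0) * M, x0
v = np.arccos(x0 / r); u = -v
rng = np.random.default_rng(4)
x = rng.uniform(x0, M, 1_000_000)
y = rng.uniform(-M, M, 1_000_000)
rho, ang = np.hypot(x, y), np.arctan2(y, x)
assert np.all((s <= rho) & (rho < r) & (u < ang) & (ang < v))
```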
3. Let $\varepsilon > 0$ be arbitrary. Take $q \geq 1$ so large that
$$\delta_r \equiv (r-s)q^{-1} < \frac{\sqrt2}{2}\delta_\beta(\delta_g(\varepsilon)) \quad\text{and}\quad \delta_u \equiv (v-u)q^{-1} < \frac{\sqrt2}{2}\delta_\beta(\delta_g(\varepsilon)).$$
For each $h = 0,\ldots,q$, define $r_h \equiv s + h\delta_r$. For each $i = 0,\ldots,q$, define $u_i \equiv u + i\delta_u$. Then
$$\bigcup_{h=0}^{q-1}\bigcup_{i=0}^{q-1} A_{r(h),r(h+1),u(i),u(i+1)} \subset A_{s,r,u,v}, \qquad (A.0.36)$$
while, according to Assertion 3 of Lemma A.0.11, we have
$$\mu\Big(\bigcup_{h=0}^{q-1}\bigcup_{i=0}^{q-1} A_{r(h),r(h+1),u(i),u(i+1)}\Big) = \sum_{h=0}^{q-1}\sum_{i=0}^{q-1}\mu(A_{r(h),r(h+1),u(i),u(i+1)})$$
$$= \sum_{h=0}^{q-1}\sum_{i=0}^{q-1} 2^{-1}(u_{i+1}-u_i)(r_{h+1}^2 - r_h^2) = 2^{-1}(v-u)(r^2-s^2) = \mu(A_{s,r,u,v}).$$
Thus the union on the left-hand side of relation A.0.36 is actually a full subset of the right-hand side. Consequently,
$$\iint_{(x,y)\in A(s,r,u,v)} g(x,y)\,dx\,dy = \iint_{(x,y)\in\bigcup A(r(h),r(h+1),u(i),u(i+1))} g(x,y)\,dx\,dy = \sum_{h=0}^{q-1}\sum_{i=0}^{q-1}\iint_{(x,y)\in A(r(h),r(h+1),u(i),u(i+1))} g(x,y)\,dx\,dy.$$
Combining with equality A.0.35, we obtain
$$\int_{x=0}^\infty\int_{y=-\infty}^\infty g(x,y)\,dx\,dy = \sum_{h=0}^{q-1}\sum_{i=0}^{q-1}\iint_{(x,y)\in A(r(h),r(h+1),u(i),u(i+1))} g(x,y)\,dx\,dy. \qquad (A.0.37)$$
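As an editorial aside, the telescoping that makes the union in A.0.36 a full subset is immediate from A.0.18 and can be verified mechanically (Python sketch below; the parameter values are arbitrary, and this is no part of the constructive argument).

```python
# Informal check that the grid-sector measures in (A.0.36) telescope to
# (v - u)(r^2 - s^2)/2, the measure of A_{s,r,u,v}. Editorial aside.
import numpy as np

s, r, u, v, q = 0.4, 1.8, -0.9, 1.1, 7
rh = s + (r - s) / q * np.arange(q + 1)
ui = u + (v - u) / q * np.arange(q + 1)
total = sum(0.5 * (ui[i + 1] - ui[i]) * (rh[h + 1] ** 2 - rh[h] ** 2)
            for h in range(q) for i in range(q))
print(total, 0.5 * (v - u) * (r * r - s * s))   # equal up to rounding
```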
4. Let $h,i = 0,\ldots,q-1$ be arbitrary. Define $x_{h,i} \equiv r_h\cos u_i$ and $y_{h,i} \equiv r_h\sin u_i$. Consider each $(x,y) \in A_{r(h),r(h+1),u(i),u(i+1)}$. Then, by the defining equality A.0.17, we have
$$\sqrt{x^2+y^2},\ \sqrt{x_{h,i}^2+y_{h,i}^2} \in [r_h, r_{h+1}) \quad\text{and}\quad \arctan\frac{y}{x},\ \arctan\frac{y_{h,i}}{x_{h,i}} \in [u_i, u_{i+1}).$$
Hence
$$\Big|\Big(\sqrt{x^2+y^2},\ \arctan\frac{y}{x}\Big) - \Big(\sqrt{x_{h,i}^2+y_{h,i}^2},\ \arctan\frac{y_{h,i}}{x_{h,i}}\Big)\Big| < \sqrt{(r_{h+1}-r_h)^2 + (u_{i+1}-u_i)^2} = \sqrt{\delta_r^2+\delta_u^2} < \delta_\beta(\delta_g(\varepsilon)).$$
Therefore
$$|(x,y) - (x_{h,i},y_{h,i})| = \Big|\beta\Big(\sqrt{x^2+y^2},\ \arctan\frac{y}{x}\Big) - \beta\Big(\sqrt{x_{h,i}^2+y_{h,i}^2},\ \arctan\frac{y_{h,i}}{x_{h,i}}\Big)\Big| < \delta_g(\varepsilon),$$
where the equality is thanks to equality A.0.34. Consequently, $|g(x,y) - g(x_{h,i},y_{h,i})| < \varepsilon$.

5. Equality A.0.37 therefore yields
$$\int_{x=0}^\infty\int_{y=-\infty}^\infty g(x,y)\,dx\,dy = \sum_{h=0}^{q-1}\sum_{i=0}^{q-1}\iint_{(x,y)\in A(r(h),r(h+1),u(i),u(i+1))}(g(x_{h,i},y_{h,i}) \pm \varepsilon)\,dx\,dy$$
$$= \sum_{h=0}^{q-1}\sum_{i=0}^{q-1}\iint_{(x,y)\in A(r(h),r(h+1),u(i),u(i+1))}(g(r_h\cos u_i, r_h\sin u_i) \pm \varepsilon)\,dx\,dy$$
$$= \sum_{h=0}^{q-1}\sum_{i=0}^{q-1}\mu(A_{r(h),r(h+1),u(i),u(i+1)})\,(g(r_h\cos u_i, r_h\sin u_i) \pm \varepsilon)$$
$$= \sum_{h=0}^{q-1}\sum_{i=0}^{q-1} 2^{-1}(u_{i+1}-u_i)(r_{h+1}^2 - r_h^2)(g(r_h\cos u_i, r_h\sin u_i) \pm \varepsilon)$$
$$= \sum_{h=0}^{q-1}\sum_{i=0}^{q-1}(u_{i+1}-u_i)(r_{h+1}-r_h)\,\frac{r_{h+1}+r_h}{2}\,(g(r_h\cos u_i, r_h\sin u_i) \pm \varepsilon)$$
$$= \sum_{h=0}^{q-1}\sum_{i=0}^{q-1}\int_{r(h)}^{r(h+1)}\int_{u(i)}^{u(i+1)}(r_h \pm \varepsilon)(g(r_h\cos u_i, r_h\sin u_i) \pm \varepsilon)\,dr\,du$$
$$\equiv \sum_{h=0}^{q-1}\sum_{i=0}^{q-1}\int_{r(h)}^{r(h+1)}\int_{u(i)}^{u(i+1)}(r \pm 2\varepsilon)(g(r\cos u, r\sin u) \pm 2\varepsilon)\,dr\,du$$
$$\equiv \int_{r=0}^\infty\int_{u=-\pi/2}^{\pi/2}(r \pm 2\varepsilon)(g(r\cos u, r\sin u) \pm 2\varepsilon)\,dr\,du,$$
where $\varepsilon > 0$ is arbitrarily small. Letting $\varepsilon \to 0$, we obtain
$$\int_{x=0}^\infty\int_{y=-\infty}^\infty g(x,y)\,dx\,dy = \int_{r=0}^\infty\int_{u=-\pi/2}^{\pi/2} rg(r\cos u, r\sin u)\,dr\,du, \qquad (A.0.38)$$
where $g \in C(H)$ is arbitrary.
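As an editorial aside, equality A.0.38 can be spot-checked for one concrete integrand, say $g(x,y) = xe^{-(x^2+y^2)}$ on $H$ (an integrable, though not compactly supported, choice), for which both sides equal $\sqrt{\pi}/2$. The sketch below uses crude rectangle-rule sums in Python (numpy assumed); the grid sizes and cutoffs are arbitrary.

```python
# Informal spot check of (A.0.38) for g(x, y) = x * exp(-(x^2 + y^2)) on the
# half plane H: both sides should approximate sqrt(pi)/2. Editorial aside.
import numpy as np

n = 1000
x = np.linspace(1e-4, 6.0, n); y = np.linspace(-6.0, 6.0, 2 * n)
X, Y = np.meshgrid(x, y, indexing="ij")
lhs = np.sum(X * np.exp(-(X**2 + Y**2))) * (x[1] - x[0]) * (y[1] - y[0])

r = np.linspace(1e-4, 6.0, n); u = np.linspace(-np.pi / 2, np.pi / 2, n)
R, U = np.meshgrid(r, u, indexing="ij")
rhs = np.sum(R * (R * np.cos(U)) * np.exp(-R**2)) * (r[1] - r[0]) * (u[1] - u[0])

print(lhs, rhs, np.sqrt(np.pi) / 2)             # agree to about three decimals
```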
6. Define the functions $I_1, I_2: C(H) \to R$ by
$$I_1(g) \equiv \int_{x=0}^\infty\int_{y=-\infty}^\infty g(x,y)\,dx\,dy \quad\text{and}\quad I_2(g) \equiv \int_{r=0}^\infty\int_{u=-\pi/2}^{\pi/2} rg(r\cos u, r\sin u)\,dr\,du,$$
for each $g \in C(H)$. Then each of $I_1$ and $I_2$ is an integration on $H$, in the sense of Definition 4.2.1. Proposition 4.3.3 says that $(H, C(H), I_1)$ and $(H, C(H), I_2)$ are integration spaces. Moreover, equality A.0.38 implies that $I_1 = I_2$. Hence the complete extensions of $(H, C(H), I_1)$ and $(H, C(H), I_2)$ are equal. In other words, a function $g$ on $H$ is integrable relative to $I_1$ iff $g$ is integrable relative to $I_2$. Thus a function $g$ on $H$ is integrable relative to the Lebesgue integral iff the function $(r,u) \to rg(r\cos u, r\sin u)$ is integrable relative to the Lebesgue integral, in which case
$$\int_{x=0}^\infty\int_{y=-\infty}^\infty g(x,y)\,dx\,dy = \int_{r=0}^\infty\int_{u=-\pi/2}^{\pi/2} rg(r\cos u, r\sin u)\,dr\,du.$$
The theorem is proved.
Corollary A.0.13. Integral of a function related to the normal p.d.f. The function $(x,y) \to e^{-(x^2+y^2)/2}$ on $R^2$ is Lebesgue integrable, with
$$\int_{x=-\infty}^\infty\int_{y=-\infty}^\infty e^{-(x^2+y^2)/2}\,dx\,dy = 2\pi.$$
Proof. 1. The function $f: R \to R$ defined by $f(x) \equiv e^{-x^2/2}$ for each $x \in R$ is integrable relative to the Lebesgue integration. Hence, by Fubini's Theorem, the function $f \otimes f$ is integrable on $R^2$. In other words, the function $(x,y) \to e^{-(x^2+y^2)/2}$ is integrable relative to Lebesgue integration on $R^2$. Moreover,
$$\int_{x=-\infty}^\infty\int_{y=-\infty}^\infty e^{-(x^2+y^2)/2}\,dx\,dy = \int_{y=-\infty}^\infty\left(\int_{x=-\infty}^\infty e^{-(x^2+y^2)/2}\,dx\right)dy$$
$$= \int_{y=-\infty}^\infty\left(\int_{x=-\infty}^0 e^{-(x^2+y^2)/2}\,dx + \int_{x=0}^\infty e^{-(x^2+y^2)/2}\,dx\right)dy$$
$$= \int_{y=-\infty}^\infty\left(\int_{z=0}^\infty e^{-(z^2+y^2)/2}\,dz + \int_{x=0}^\infty e^{-(x^2+y^2)/2}\,dx\right)dy$$
$$= 2\int_{y=-\infty}^\infty\int_{x=0}^\infty e^{-(x^2+y^2)/2}\,dx\,dy$$
$$= 2\int_{u=-\pi/2}^{\pi/2}\int_{r=0}^\infty e^{-(r^2\cos^2 u + r^2\sin^2 u)/2}\,r\,dr\,du$$
$$= 2\int_{u=-\pi/2}^{\pi/2}\int_{r=0}^\infty e^{-r^2/2}\,r\,dr\,du = 2\int_{u=-\pi/2}^{\pi/2}\int_{s=0}^\infty e^{-s}\,ds\,du = 2\int_{u=-\pi/2}^{\pi/2} 1\,du = 2\pi,$$
where the third equality is by the change of integration variables $z = -x$ in $R^1$, as justified by Theorem A.0.10; where the seventh equality is by the change of integration variables $s = r^2/2$ in $R^1$, again as justified by Theorem A.0.10; and where the fifth equality is thanks to Theorem A.0.12. The corollary is proved.
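As an editorial aside, the corollary's value $2\pi$ is corroborated by the classical observation that the double integral is the square of the one-dimensional Gaussian integral. A minimal numerical check in Python (numpy assumed; grid choices arbitrary) follows; it is no part of the constructive argument.

```python
# Informal check of Corollary A.0.13: the integral of exp(-(x^2+y^2)/2) over
# R^2 equals the square of the one-dimensional Gaussian integral, i.e. 2*pi.
import numpy as np

x = np.linspace(-8.0, 8.0, 20001)
one_d = np.sum(np.exp(-x * x / 2.0)) * (x[1] - x[0])   # approx sqrt(2*pi)
print(one_d ** 2, 2 * np.pi)                           # agree to high accuracy
assert abs(one_d ** 2 - 2 * np.pi) < 1e-8
```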
Appendix B
Taylor's Theorem
For ease of reference, we cite here Taylor's Theorem from [Bishop and Bridges 1985].

Theorem B.0.1. Taylor's Theorem. Let $D$ be a nonempty open interval in $R$. Let $f$ be a complex-valued function on $D$. Let $n \geq 0$ be arbitrary. Suppose $f$ has continuous derivatives up to order $n$ on $D$. For $k = 1,\ldots,n$ write $f^{(k)}$ for the $k$-th derivative of $f$. Let $t_0 \in D$ be arbitrary, and define
$$r_n(t) \equiv f(t) - \sum_{k=0}^n f^{(k)}(t_0)(t-t_0)^k/k!$$
for each $t \in D$. Then the following conditions hold:
1. If $|f^{(n)}(t) - f^{(n)}(t_0)| \leq M$ on $D$ for some $M > 0$, then $|r_n(t)| \leq M|t-t_0|^n/n!$.
2. $r_n(t) = o(|t-t_0|^n)$ as $t \to t_0$. More precisely, suppose $\delta_{f,n}$ is a modulus of continuity of $f^{(n)}$ at the point $t_0$. Let $\varepsilon > 0$ be arbitrary. Then $|r_n(t)| < \varepsilon|t-t_0|^n$ for each $t \in D$ with $|t-t_0| < \delta_{f,n}(n!\,\varepsilon)$.
3. If $f^{(n+1)}$ exists on $D$ and $|f^{(n+1)}| \leq M$ for some $M > 0$, then $|r_n(t)| \leq M|t-t_0|^{n+1}/(n+1)!$.

Proof. See [Bishop and Bridges 1985].
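As an editorial aside, Assertion 3 is easy to illustrate numerically, e.g. for $f = \exp$, $t_0 = 0$, $n = 3$ on $D = (-1,1)$, where $|f^{(4)}| \leq e$ on $D$. The Python sketch below (standard library only; the sample points are arbitrary) checks the bound $|r_n(t)| \leq M|t-t_0|^{n+1}/(n+1)!$.

```python
# Informal numerical illustration of Assertion 3 of Taylor's Theorem for
# f = exp, t0 = 0, n = 3 on D = (-1, 1), where |f^(4)| <= M = e on D.
import math

t0, n, M = 0.0, 3, math.e
for t in [-0.9, -0.3, 0.2, 0.7]:
    taylor = sum(t ** k / math.factorial(k) for k in range(n + 1))
    r_n = math.exp(t) - taylor                       # Taylor remainder
    bound = M * abs(t - t0) ** (n + 1) / math.factorial(n + 1)
    assert abs(r_n) <= bound
    print(t, r_n, bound)
```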
References

[Aldous 1978] Aldous, D.: Stopping Times and Tightness, Annals of Probability, Vol. 6, no. 2, 335–340, 1978
[Billingsley 1968] Billingsley, P.: Convergence of Probability Measures. New York: John Wiley & Sons, 1968
[Billingsley 1974] Billingsley, P.: Conditional Distributions and Tightness, Annals of Probability, Vol. 2, no. 3, 480–485, 1974
[Billingsley 1999] Billingsley, P.: Convergence of Probability Measures (2nd ed.). New York: John Wiley & Sons, 1999
[Bishop 1967] Bishop, E.: Foundations of Constructive Analysis. New York, San Francisco, St. Louis, Toronto, London, and Sydney: McGraw-Hill, 1967
[Bishop and Bridges 1985] Bishop, E., and Bridges, D.: Constructive Analysis. Berlin, Heidelberg, New York, and Tokyo: Springer, 1985
[Bishop and Cheng 1972] Bishop, E., and Cheng, H.: Constructive Measure Theory, AMS Memoir no. 116, 1972
[Blumenthal and Getoor 1968] Blumenthal, R. M., and Getoor, R. K.: Markov Processes and Potential Theory. New York and London: Academic Press, 1968
[Chan 1974] Chan, Y. K.: Notes on Constructive Probability Theory, Annals of Probability, Vol. 2, no. 1, 51–75, 1974
[Chan 1975] Chan, Y. K.: A Short Proof of an Existence Theorem in Constructive Measure Theory, Proceedings of the American Mathematical Society, Vol. 48, no. 2, 435–437, 1975
[Chentsov 1956] Chentsov, N.: Weak Convergence of Stochastic Processes Whose Trajectories Have No Discontinuities of the Second Kind, Theory of Probability & Its Applications, Vol. 1, no. 1, 140–144, 1956
[Chung 1968] Chung, K. L.: A Course in Probability Theory. New York, Chicago, San Francisco, and Atlanta: Harcourt, Brace & World, 1968
[Doob 1953] Doob, J. L.: Stochastic Processes. New York, London, and Sydney: John Wiley & Sons, 1953
[Durret 1984] Durrett, R.: Brownian Motion and Martingales in Analysis. Belmont: Wadsworth, 1984
[Feller I 1971] Feller, W.: An Introduction to Probability and Its Applications, Vol. 1 (3rd ed.). New York: John Wiley & Sons, 1971
[Feller II 1971] Feller, W.: An Introduction to Probability and Its Applications, Vol. 2 (2nd ed.). New York: John Wiley & Sons, 1971
[Garsia, Rodemich, and Rumsey 1970] Garsia, A. M., Rodemich, E., and Rumsey, H. Jr.: A Real Variable Lemma and the Continuity of Paths of Some Gaussian Processes, Indiana University Mathematics Journal, Vol. 20, no. 6, 565–578, 1970
[Grigelionis 1973] Grigelionis, B.: On the Relative Compactness of Sets of Probability Measures in D[0,∞)(X), Mathematical Transactions of the Academy of Sciences of the Lithuanian SSR, Vol. 13, no. 4, 576–586, 1973
[Kolmogorov 1956] Kolmogorov, A. N.: Asymptotic Characteristics of Some Completely Bounded Metric Spaces, Proceedings of the USSR Academy of Sciences, Vol. 108, no. 3, 585–589, 1956
[Loeve 1960] Loeve, M.: Probability Theory (2nd ed.). Princeton, Toronto, New York, and London: Van Nostrand, 1960
[Lorentz 1966] Lorentz, G. G.: Metric Entropy and Approximation, Bulletin of the American Mathematical Society, Vol. 72, no. 6, 903–937, 1966
[Mines, Richman, and Ruitenburg 1988] Mines, R., Richman, F., and Ruitenburg, W.: A Course in Constructive Algebra. New York: Springer, 1988
[Neveu 1965] Neveu, J.: Mathematical Foundations of the Calculus of Probability (translated by A. Feinstein). San Francisco, London, and Amsterdam: Holden-Day, 1965
[Pollard 1984] Pollard, D.: Convergence of Stochastic Processes. New York, Berlin, Heidelberg, and Tokyo: Springer, 1984
[Potthoff 2009] Potthoff, J.: Sample Properties of Random Fields, I: Separability and Measurability, Communications on Stochastic Analysis, Vol. 3, no. 1, 143–153, 2009
[Potthoff 2009-2] Potthoff, J.: Sample Properties of Random Fields, II: Continuity, Communications on Stochastic Analysis, Vol. 3, no. 3, 331–348, 2009
[Potthoff 2009-3] Potthoff, J.: Sample Properties of Random Fields, III: Differentiability, Communications on Stochastic Analysis, Vol. 4, no. 3, 335–353, 2010
[Richman 1982] Richman, F.: Meaning and Information in Constructive Mathematics, American Mathematical Monthly, Vol. 89, 385–388, 1982
[Ross 2003] Ross, S.: Introduction to Probability Models. San Diego, London, and Burlington: Academic Press, 2003
[Skorokhod 1956] Skorokhod, A. V.: Limit Theorems for Stochastic Processes, Theory of Probability and Its Applications, Vol. 1, no. 3, 1956
[Stolzenberg 1970] Stolzenberg, G.: Review of "Foundations of Constructive Analysis," Bulletin of the American Mathematical Society, Vol. 76, 301–323, 1970
Index

absolute moment, 143
accordion function, 414
adapted process, 300
admissible functions on [0,1], 384
all but countably many, 13
almost everywhere, 66
almost sure, 139
a.u. boundedness, 444, 445
a.u. càdlàg process, 406
a.u. càdlàg process on [0,∞), 479
a.u. continuity of r.f., 229
a.u. globally Hoelder process, 349
a.u. Hoelder coefficient, 349
a.u. Hoelder continuous process, 349
basis functions of an ε-partition of unity, 29
bijection, 10
binary approximation, 20
bounded set, 13
Brownian motion in R^m, 353
Brownian semigroup, 568
càdlàg completion, 382
càdlàg function on [0,∞), 479
càdlàg functions, 376
Cauchy in measure, 110
Cauchy in probability, 144
Cauchy–Schwarz inequality, 143
centered Gaussian r.f., 291
Central Limit Theorem, 221
characteristic function, 204
Chebychev's inequality, 87
Compact Daniell–Kolmogorov Extension, 242
compact metric space, 13
compactification of a binary approximation, 39
compactification of a Feller semigroup, 539
complete extension of an integration, 55
complete extension of an integration space, 55
complete integration space, 61
composite function, 10
composite modulus of smoothness, 495
composite transition distribution, 495
conditional expectation, 184
conditional probability, 184
conditional probability space given an event, 184
conditionally integrable r.r.v.'s, 184
consistency condition, 232
consistent family, 232
consistent family of f.j.d.'s, 232
consistent family of f.j.d.'s from initial distribution and semigroup, 502, 533
continuity a.u. of r.f., 229
continuity point of an integrable function, 83, 84
continuity point of a measurable function, 101
continuity point of a P.D.F., 165
continuity point of an r.r.v., 140
continuity in probability of r.f., 229
continuous function on locally compact space, 13
continuous function that vanishes at infinity, 15
convergence almost everywhere, 110
convergence almost surely, 144
convergence almost uniformly, 110, 144
convergence in distribution, 155
convergence in L^1, 110
convergence in measure, 110
convergence in probability, 144
convergence of r.v.'s in distribution, 155
convergence uniformly, 110
convolution, 204
coordinate function, 11, 242
countable power of binary approximation, 25
countable power integration space, 134
countable set, 10
countably infinite set, 10
covariance, 143
C-regularity, 337
Daniell–Kolmogorov Extension, 241, 251
Daniell–Kolmogorov–Skorokhod Extension, 266
deletion, 233
direct product of two functions, 120
distribution function, 45
distribution induced by an r.v., 152
distribution metric for a locally compact space, 156
distribution on a metric space, 151
division points of simple càdlàg function, 383
domain of a function, 9
Dominated Convergence Theorem, 117
D-regular family of f.j.d.'s on Q∞, 410
D-regular process on Q∞, 410, 479
dual function of a sequence, 232
eigenvector, 190
empty set, 9
enumerated set, 10
enumeration, 10
ε-approximation of a totally bounded metric space, 13
ε-division points, 376
ε-entropy, 20
ε-partition of unity, 29
equivalent stochastic processes, 231
event, 139
expectation, 138
expected value, 138, 139
extension of family of f.j.d.'s, 231
family, 8
family of functions that separates points, 73
Feller semigroup, 538
filtration, 300
filtration generated by a process, 300
finite integration space, 102
finite joint distribution, 232
finite power of binary approximation, 23
finite sequence, 10
finite set, 10
first Borel–Cantelli Lemma, 142
first exit time, 488
f.j.d., 232
f.j.d.'s generated by Feller semigroup, 538
f.j.d.'s, continuous in probability, 237
Fourier transform, 204
Fubini's Theorem, 127, 129
full set, 65, 141
function, 9
function that separates two points, 73
Gaussian r.f., 290
greatest lower bound, 15
Hoelder exponent, 349
Hoelder's inequality, 143
I-basis, 102
independent events, 182
independent r.v.'s, 182
indexed set, 10
indicator, 65
indicator of an integrable set, 67
indicator of a measurable set, 94
infimum, 15
infinite sequence, 10
injection, 10
integrable function, 55
integrable function, complex-valued, 202
integrable real random variable, 139
integrable set, 67
integral, 51
integration, 51
integration on locally compact metric space, 47
integration space, 51
integration subspace, 53
interpolated Gaussian process by conditional expectations, 365
jointly normal, 193
Kolmogorov's ε-entropy, 20
L^p-norm, 143
least upper bound, 15
Lebesgue integration space, 118, 133
Lebesgue measurable function, 118
left limit, 375
Lipschitz constant, 27
Lipschitz continuity, 13
Lipschitz continuous function, 27
locally compact metric space, 13
Lyapunov's inequality, 143
mapping, 9
marginal distributions, 231
marginal metric, 237
Markov process, 494
Markov property, 494
Markov semigroup, 500
martingale, 306
mean, 143
measurable extension, 286
measurable function, complex-valued, 202
measurable r.f., 276
measurable set, 94
measure of integrable set, 67
measure-theoretic complement, 67, 94
mesh, 45
metric complement, 12
metric space, 12
metrically discrete subset, 13
Minkowski's inequality, 143
modulus of a.u. boundedness, 445
modulus of a.u. càdlàg, 406
modulus of a.u. continuity of r.f., 228
modulus of càdlàg, 376
modulus of continuity, 13
modulus of continuity a.u. of r.f., 228
modulus of continuity of f.j.d.'s, 237
modulus of continuity in probability, 228
modulus of C-regularity, 337
modulus of D-regularity, 410
modulus of integrability, 90
modulus of local compactness, 20
modulus of non-explosion of Feller semigroup, 538
modulus of pointwise tightness, 249
modulus of smoothness of Feller semigroup, 538
modulus of smoothness of Markov semigroup, 500
modulus of smoothness of transition distribution, 495
modulus of strong continuity of Feller semigroup, 538
modulus of strong continuity of Markov semigroup, 500
modulus of strong right continuity, 445
modulus of tightness, 159
moment, 143
Monotone Convergence Theorem, 64
mutually exclusive, 65
nonempty set, 9
nonnegative definite function, 291
nonnegative definite matrix, 190
normal distribution, 193, 197
normal p.d.f., 193
normal P.D.F., 193
normally distributed, 193
null set, 67
observation of a process at a simple stopping time, 302
one-point compactification, 34
one-point compactification from binary approximation, 35
one-step transition distribution, 498
operation, 9
outcome, 139
parameter set, 227
partition of R, 45
partition of unity, 277
partition of unity of locally compact metric space, 30
path space, 242
P.D.F., 165
P.D.F. of an r.r.v., 165
point at infinity, 34
point mass, 52
pointwise continuity, 376
positive definite function, 291
positive definite matrix, 190
positivity condition for integration, 51
positivity condition of integration on locally compact metric space, 47
power integration space, 129
principle of finite search, 3
principle of infinite search, 3
probability density function, 164
probability distribution function, 165
probability of an event, 139
probability function, 139
probability integration space, 61
probability metric, 145, 265
probability space, 138
probability space induced by an r.v., 152
probability subspace, 150
probability subspace generated by family of r.v.'s, 150
process with Markov semigroup, 502
product integration, 125
product integration space, 125, 129
product metric space, 14
product of a sequence of complete integration spaces, 134
profile bound, 74
profile system, 73
quantile mapping, 169
random càdlàg function, 405, 406, 480
random field, 227
random variable, 138
range of a function, 10
real random variable, 139
refinement of a partition, 45
regular point of integrable function, 83, 141
regular point of measurable function, 101
regular point of an r.r.v., 139
representation of integrable function, 55
restriction of a consistent family, 236
restriction of a family of functions, 35
restriction of a function, 9
restriction of an r.f., 227
r.f., a.u. continuous, 228
r.f., continuity a.u., 228
r.f., continuous in probability, 228
Riemann–Stieltjes integral, 46
Riemann–Stieltjes sum, 45
right complete, 375
right continuity, 375
right continuous filtration, 300, 301
right-Hoelder constant, 469
right-Hoelder exponent, 469
right-Hoelder process, 469
right-limit extension of a filtration, 301
right-limit extension of process, 418
r.v. observable at stopping time, 302
sample, 139
sample function, 227
sample space, 139, 227
semigroup, 500
set, 8
set of distributions on complete metric space, 151
set-theoretic complement, 9
set-theoretic equality of functions, 9
σ-finite integration space, 102
simple càdlàg functions, 383
simple first exit time, 304
simple function, 120
simple modulus of integrability, 90
simple stopping time, 302
size of a finite set, 10
Skorokhod metric, 384
Skorokhod representation, 170
Skorokhod space on [0,∞), 479
special convex function, 316
standard deviation, 143
standard normal p.d.f., 193
state space, 227
state-uniformly a.u. càdlàg, 548
stochastic approximation, 276, 279
stochastic process, 227
stopping time, 302
strictly convex function, 314
strong Markov process, 494
strong right continuity in probability, 444, 445
submartingale, 306, 307
subsequence, 10
sum of a sequence in an integration space, 62
support, 15
supremum, 15
surjection, 10
tight family of distributions, 159
tight family of r.v.'s, 159
time parameter, 227
time-uniformly a.u. càdlàg process on [0,∞), 480
time-uniformly a.u. continuous, 229
time-uniformly D-regular process on Q∞, 479
transition distribution, 495
transition distributions generated by Feller semigroup, 539
unequal elements of metric space, 13
uniform continuity, 13
uniform distribution on [0,1], 169
uniform integrability, 90
uniform metric, 334
variance, 143
weak convergence of distributions, 155
weak convergence of r.v.'s, 155
wide-sense submartingale, 306
wide-sense supermartingale, 307